Eighteen years after a computer beat then-reigning world champion Garry Kasparov at chess, a machine has defeated a professional player at the ancient eastern board game Go. The new advance is much bigger, artificial intelligence (AI) researchers say, as Go is such a computationally demanding game that even a decade ago some researchers thought a computer would never defeat a human expert. What's more, the machine won not by virtue of overwhelming computational power, but by employing "machine learning" tools that enable it to teach itself and to think more like humans do.
"A lot of people will be shocked because for many years people have tried to sell the notion of Go as a game in which computers can never beat humans," says Rémi Coulom, a freelance AI researcher in Lille, France, who wrote the previous best Go program, Crazy Stone. Jonathan Schaeffer, a computer scientist at the University of Alberta (UA), Edmonton in Canada calls the work a tour de force: "This is a paper that is going to have huge immediate impact.”
Go looks childishly simple. The board consists of a square, 19-by-19 grid of points. One player has a supply of black stones, the other a supply of white, and they take turns placing their stones on the board, with black going first. A player tries to surround a "chain" of one or more of the opponent's stones so that no open spaces exist next to it, while preventing the opponent from doing the same thing. Surrounded stones are removed from the board and, roughly speaking, scoring is based on how much area your stones finally enclose.
However, Go defies easy computer analysis for two reasons. First, because the board is so large, the number of possible moves is astronomical. At the beginning of the game, each player has roughly 360 options for placing each stone. So after five plays, the board can be in any of more than 5 trillion different arrangements. In total, the number of different possible arrangements of stones stretches beyond 10100, rendering it impossible for a computer to play by brute force computation of all possible outcomes.
Second, Go has vexed computers because for any particular arrangement of stones on the board it is hard to estimate whether black or white has an advantage. In chess, for example, players or computers can judge approximately how strong their positions on the board are by simply comparing the pieces they've captured with ones they've lost, Coulom says. But such on-the-fly accounting is much harder in Go, he says. That's especially true as much of the tallying is done at the end of the game.
Previous programs focused less on evaluating the state of the board and more on speeding up simulations of how the game might play out. Crazy Stone employs an algorithm called a Monte Carlo tree search that, instead of trying to calculate every possible sequence of plays, samples only some of them by using a random number generator at each step to choose between different possible moves. It uses a small amount of adaptive programming, or machine learning, to learn how to avoid the least helpful moves. Crazy Stone has beaten strong human players, says Bruno Bouzy, a computer scientist at Paris Descartes University (PDU), but only when given the benefit of placing the first three or four stones unopposed.
Now, however, David Silver, Demis Hassabis, and 18 other computer scientists at Google DeepMind, an AI company in London acquired by Google 2 years ago, have developed a program that confronts the challenges of Go directly. Instead of randomly exploring sequences of moves, the program, AlphaGo, learns to distinguish a better move from a poorer one and to evaluate the strength of its position the board. To do those things, the program relies on "deep neural networks" —computer programs that mimic the connections of neurons in the brain and have the capacity to learn, as the team reports online today in Nature.
Such networks consist of layers of abstract interconnected "neurons." One neuron can cause or inhibit another from firing, and learning occurs as the system adjusts connections between neurons. For example, AlphaGo uses a "policy network" to judge the viability of possible moves. The bottom layer of the network consists of a 19-by-19 array of neurons that basically takes a snapshot of the state of the board and uses it as an input. The top layer consists of a similar array that shows all the possible locations for laying the next stone and the probability for making each of those moves. In between lie 11 more layers. (Deep neural networks are literally deeper than their predecessors.)
The goal is to get the network to automatically come up with the best next move given any input configuration. To train the network, researchers fed it a database of 30 million board configurations and subsequent plays by expert players. To match all the inputs to all the outputs, the computer adjusts the connections in the intervening network, which then subtly encode all the "knowledge" in the data. They then let the program further teach itself by playing against itself over and over again. Thus, AlphaGo learned from experience to tell a better move from a poorer one. "The way we've developed the system, it plays more like a human does," Hassabis says.
The researchers developed a separate "value network" that, given a configuration of the board, could evaluate whether black or white holds the advantage by estimating probability that one side or the other would eventually win the game. To train it, the researchers fed the network configurations and outcomes from games AlphaGo played with itself. The value network helped AlphaGo play faster, Silver says. Instead of playing out many scenarios to the very end, as in a Monte Carlo tree search, AlphaGo could play the game forward a few moves and use the network to estimate the final result, Silver says.
AlphaGo easily defeated Crazy Stone and other Go-playing programs, winning 99.8% of games when it ran on a single high-power computer and 100% of games when it ran on multiple computers. And it bested Fan Hui, a lower-ranked professional player and the 2013 European Go champion, five games to none—although Hui took two of five "informal games" played before the official trial. AlphaGo will get its chance to play a top-ranked professional in March, Silver says, when it takes on Lee Sedol, a 32-year-old South Korean Go champion of the highest rank, in a match in Seoul. "You can think of him as the Roger Federer of Go," Silver says.
But even now, AlphaGo marks a major advance, says UA's Schaeffer, especially as it uses the generic, completely automated tools of deep learning and not specialty programming or just more computing power. "It's not an incremental advance," he says. "It's a huge leap forward."
Coulom agrees, but he notes that there isn't one key new invention that makes the whole program work. "It's more like a great engineering achievement," he says. "The way they put all the pieces together is really innovative."
Deep neural networks and deeply learning are already finding use in fields such as pattern recognition, automated translation, medical diagnostics, and smartphone assistance. So the concepts in AlphaGo may already be in use around us.
Even if AlphaGo beats Sedol, its triumph won't feel like the kick in the stomach that was Kasparov's defeat in chess, computer scientists predict. People have just grown used to the idea that in games eventually computers will prevail, they say. But computers haven't yet won them all, notes PDU's Bouzy. Ironically, he says, humans may still hold the greatest advantage in video games. "They're typically very complex," he says, "with lots of characters, lots of action, lots of places to go." For now, the 13-year-old human mind can better cope with all that better than a computer.