The dream of an artificially intelligent computer that can study a problem and gain expertise all on its own is now reality. A system debuted today by a team of Google researchers is not clever enough to perform surgery or drive a car safely, but it did master several dozen classic arcade games, in many cases surpassing the best human players without ever observing how they play.
“The results are impressive,” says Tomaso Poggio, director of the Center for Brains, Minds and Machines at the Massachusetts Institute of Technology in Cambridge.
In theory, computers could learn new skills at incredible speed if they didn’t have to wait for human teachers to give feedback on whether they are on the right track. But this approach—known as unsupervised learning—has rarely worked for skills more complicated than correctly recognizing handwritten ZIP codes or recorded samples of pop songs.
Then a year ago, Demis Hassabis, a wunderkind computer scientist and former video game designer, gave a talk at a technology conference in Paris and offered a glimpse of success. Just a few months earlier, Hassabis's small artificial intelligence startup DeepMind, with about 50 employees based in London, had been acquired by Google for more than $500 million. The video playing behind Hassabis showed what seemed impossible: a computer learning on its own to play complicated video games like Breakout (see video below), in which you have to break down a wall by bouncing a ball off it. After exploring the game by playing it, the computer discovered advanced strategies that few humans know about, such as digging a hole to bounce the ball along the back side of the wall.
Video credit: Google DeepMind (with permission from Atari Interactive Inc.)
In a study published online today in Nature, the DeepMind team finally reveals how they pulled it off. The researchers dub their computer learning system the Deep-Q-Network (DQN) because it combines two different strategies: deep neural networks and Q-learning. The deep neural network is a perception system, very loosely inspired by animal vision, of a kind that has made huge strides in recent years. The DQN sees and interacts with the game the way a human player does: making moves and watching the game pixels change.
The “Q” in DQN is how the system knows that it’s on the right track. Q-learning is a mathematical version of a concept from psychology called reinforcement learning, a reward system thought to guide the process of learning in humans and other animals. In this case, the DQN’s reward comes in the form of game points. As it tries out different moves in the game, it keeps track of which combinations lead to higher points.
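To make the idea of "keeping track of which moves lead to higher points" concrete, here is a minimal sketch of textbook tabular Q-learning in Python. It is not the DeepMind system: the DQN replaces the lookup table with a deep neural network reading game pixels. The toy "corridor" game, the function name `q_learning`, and the values of `alpha`, `gamma` and `epsilon` are all assumptions chosen for illustration.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor game.

    The player starts in state 0 and scores 1 point on reaching the
    last state, which ends the episode. Actions: 0 = left, 1 = right.
    """
    rng = random.Random(seed)
    steps = (-1, +1)
    terminal = n_states - 1
    # q[state][action] estimates the future points from taking that action.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        while state != terminal:
            if rng.random() < epsilon:
                # Explore: occasionally try a random move.
                action = rng.randrange(2)
            else:
                # Exploit: take the best-known move, breaking ties randomly.
                best = max(q[state])
                action = rng.choice([a for a in range(2) if q[state][a] == best])

            next_state = min(max(state + steps[action], 0), terminal)
            reward = 1.0 if next_state == terminal else 0.0
            # No further points can be earned after the episode ends.
            future = 0.0 if next_state == terminal else max(q[next_state])

            # Standard Q-learning update: nudge the estimate toward
            # (points just earned) + (discounted best estimate from here on).
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
    return q

q_table = q_learning()
for s, (left, right) in enumerate(q_table[:-1]):
    print(f"state {s}: left={left:.3f} right={right:.3f}")
```

In this toy setting the table ends up rating "right" above "left" in every position, even though only the final step ever earns a point: the reward signal has been passed backward along the moves that led to it.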
To test the system, the DeepMind researchers let it loose on 49 classic Atari 2600 games from the 1980s. These games are at a “sweet spot,” Hassabis says—not so easy as to be trivial, but hard enough that humans actually struggle to become experts. They gave the DQN only modest resources: just 2 weeks of play for each game with the power of a single desktop computer.
It was far from a sure thing that this strategy would work. Researchers have previously tried to make computers learn video games by simply optimizing for points, but computers get stuck on games like Breakout or Space Invaders, where long, complicated strategies are often required to score big. In Breakout, for example, you have to have the patience to discover that opening a hole through to the wall’s back side will pay off later.
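One common way reinforcement learning weighs patience against quick points is a discounted sum of rewards, in which each later point counts slightly less than an earlier one. The sketch below illustrates that arithmetic only; the two reward sequences and the discount factor `gamma=0.99` are made-up assumptions, not figures from the study.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum the rewards, scaling the reward at step t by gamma**t."""
    total = 0.0
    # Work backward so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical Breakout-like choices over 20 moves:
quick_points = [1] * 20             # chip away for one point per move
patient_play = [0] * 15 + [10] * 5  # score nothing while setting up, then cash in

print(discounted_return(quick_points))
print(discounted_return(patient_play))
```

With these invented numbers the patient sequence comes out ahead even after discounting, which is the kind of delayed payoff a learner has to discover rather than be lured away from by immediate points.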
For about half of the games, not only did the computer not get stuck in a rut, but it also learned how to outperform the best human players. The DQN scored about 20% to 30% more points than humans at classic games like Space Invaders and Pong, and for others, such as Breakout and Video Pinball, it racked up more than 10 times the number of points. The next step, Hassabis says, is “knowledge transfer”: teaching the system to apply what it has already learned from one game to another. For example, it should learn to play games with paddles and bouncing balls faster now that it knows how to play one such game.
The finding “suggests that [computers using] reinforcement learning may be able to learn similar realistic tasks such as driving a car,” Poggio says. However, he is skeptical that this approach alone can enable computers “to learn abstract thinking from scratch, or reasoning, or abilities such as social perception.” Even a self-driving car needs to do more than rack up a high score.