Many great ideas in artificial intelligence languish in textbooks for decades because we don’t have the computational power to apply them. That’s what happened with neural networks, a technique inspired by our brains’ wiring that has recently succeeded in translating languages and driving cars. Now, another old idea—improving neural networks not through teaching, but through evolution—is revealing its potential. Five new papers from Uber in San Francisco, California, demonstrate the power of so-called neuroevolution to play video games, solve mazes, and even make a simulated robot walk.
Neuroevolution, a process of mutating and selecting the best neural networks, has previously led to networks that can compose music, control robots, and play the video game Super Mario World. But these were mostly simple neural nets that performed relatively easy tasks or relied on programming tricks to simplify the problems they were trying to solve. “The new results show that—surprisingly—you may actually not need any tricks at all,” says Kenneth Stanley, a computer scientist at Uber and a co-author on all five studies. “That means that complex problems requiring a large network are now accessible to neuroevolution, vastly expanding its potential scope of application.”
At Uber, such applications might include driving autonomous cars, setting customer prices, or routing vehicles to passengers. But the team, part of a broad research effort, had no specific uses in mind when doing the work. In part, they merely wanted to challenge what Jeff Clune, another Uber co-author, calls “the modern darlings” of machine learning: algorithms that use something called “gradient descent,” a system that gradually improves a solution by reducing its error. Nearly all methods of training neural networks to perform tasks rely on gradient descent.
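To make "reducing its error" concrete, here is a minimal sketch of gradient descent on a toy problem with a single adjustable number. The error function, starting point, learning rate, and step count are all illustrative assumptions and do not come from the Uber papers; a real neural network adjusts millions of numbers the same way.

```python
def gradient_descent(error_gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly nudge x in the direction that lowers the error."""
    x = start
    for _ in range(steps):
        # The gradient points uphill, so step the opposite way.
        x -= learning_rate * error_gradient(x)
    return x


# Toy error (an assumption for illustration): squared distance from 3.0.
def error(x):
    return (x - 3.0) ** 2


# Its derivative with respect to x.
def error_gradient(x):
    return 2.0 * (x - 3.0)


best_x = gradient_descent(error_gradient, start=0.0)
print(best_x, error(best_x))
```

Each step refines the same single candidate, which is the behavior the evolutionary methods described in this article set out to challenge.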
The most novel Uber paper uses a completely different approach that tries many solutions at once. A large collection of randomly programmed neural networks is tested (on, say, an Atari game), and the best are copied, with slight random mutations, replacing the previous generation. The new networks play the game, the best are copied and mutated, and so on for several generations. The advantage of this method over gradient descent is that it tries a variety of strategies instead of putting all its effort into perfecting a single solution. When compared with two of the most widely used methods for training neural networks, this exploratory approach outscored them on five of 13 Atari games. It also managed to teach a virtual humanoid robot to walk, developing a neural network a hundred times larger than any previously developed through neuroevolution to control a robot.
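The copy-and-mutate loop described above can be sketched in a few lines. This is a toy illustration, not Uber's implementation: each "network" is just a short list of weights, the fitness function is a made-up stand-in for a game score, and the population size, number of parents, mutation scale, and the choice to carry the single best individual over unchanged are all assumptions.

```python
import random


def evolve(fitness, num_weights, population_size=50, num_parents=10,
           mutation_scale=0.1, generations=100, seed=0):
    """Evolve a list of weights by repeated selection and random mutation."""
    rng = random.Random(seed)
    # Start from a population of random "networks".
    population = [[rng.gauss(0.0, 1.0) for _ in range(num_weights)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Test every individual and keep the top scorers as parents.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:num_parents]
        # The next generation replaces the old one: the best parent is
        # kept as-is (an assumption), the rest are mutated copies.
        children = [
            [w + rng.gauss(0.0, mutation_scale) for w in rng.choice(parents)]
            for _ in range(population_size - 1)
        ]
        population = [parents[0]] + children
    return max(population, key=fitness)


# Hypothetical stand-in for a game score: closer to TARGET is better.
TARGET = [0.5, -1.0, 2.0]


def toy_fitness(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


best = evolve(toy_fitness, num_weights=len(TARGET))
print(best, toy_fitness(best))
```

Note that no gradient is computed anywhere: the only feedback is each individual's score, and many candidates are tried in every generation.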
Clune says the fact that the exploratory algorithm worked on such large networks was “eye-popping,” because millions of connections were being randomly mutated simultaneously. Further, he was surprised that their very basic “vanilla” version of the exploratory algorithm beat the industry-standard algorithms. That means researchers should be able to enhance it in a variety of ways. In fact, when they combined it with two techniques to improve its evolutionary selection process (one of which they invented and report in a companion paper), it showed big jumps in performance. In one case, it reached the end of a maze while three comparison algorithms, all using gradient descent, remained stuck in dead ends.
The other three Uber papers built on a pseudoevolutionary approach advanced by the San Francisco–based nonprofit OpenAI last year. OpenAI used the algorithm (which approximates gradient descent) to create networks that could master Atari games such as Pong and Skiing. The Uber team’s versions improved on it in a few ways, and provided insights into how the original version worked.
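One way to see how a population-based method can approximate gradient descent is the sketch below, written in the spirit of an evolution strategy. It is a simplified illustration under stated assumptions, not OpenAI's or Uber's code: the toy fitness function, the standardization of scores, and every constant are assumptions. Instead of selecting winners, it perturbs a single set of weights with random noise many times, then moves the weights toward the perturbations that scored best.

```python
import random
import statistics


def evolution_strategy(fitness, weights, noise_scale=0.1, learning_rate=0.005,
                       population_size=50, iterations=200, seed=0):
    """Estimate an uphill direction from random perturbations and follow it."""
    rng = random.Random(seed)
    weights = list(weights)
    for _ in range(iterations):
        # Sample one random noise vector per population member.
        noises = [[rng.gauss(0.0, 1.0) for _ in weights]
                  for _ in range(population_size)]
        scores = [fitness([w + noise_scale * e for w, e in zip(weights, noise)])
                  for noise in noises]
        # Standardize scores so the step size does not depend on their scale
        # (an assumption made here for stability).
        mean = statistics.fmean(scores)
        spread = statistics.pstdev(scores) or 1.0
        advantages = [(s - mean) / spread for s in scores]
        # A score-weighted average of the noise approximates the gradient.
        for j in range(len(weights)):
            estimate = sum(a * noise[j] for a, noise in zip(advantages, noises))
            estimate /= population_size * noise_scale
            weights[j] += learning_rate * estimate
    return weights


# Hypothetical stand-in for a game score: closer to GOAL is better.
GOAL = [0.5, -1.0, 2.0]


def toy_score(weights):
    return -sum((w - g) ** 2 for w, g in zip(weights, GOAL))


result = evolution_strategy(toy_score, weights=[0.0, 0.0, 0.0])
print(result, toy_score(result))
```

Unlike the copy-and-mutate approach, this loop maintains only one solution and improves it step by step, which is why it behaves like an approximate form of gradient descent.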
Risto Miikkulainen, Stanley’s Ph.D. adviser and a computer scientist at the University of Texas at Austin and Sentient Technologies in San Francisco, says he’s “pretty excited” about the success of the exploratory algorithm, which was not simple to scale up. Tim Salimans, one of the computer scientists behind OpenAI’s algorithm, says that for solving tough problems, the exploratory algorithm “definitely adds one additional option to the mix.” All of the researchers suggested that going forward, the best solutions might involve hybrids of existing techniques, each of which has unique strengths. Evolution is good for finding diverse solutions, and gradient descent is good for refining them. With the new tools offered by the OpenAI and Uber papers, Clune says 2017 will be seen as an “inflection point” for neuroevolution.