Ali Rahimi, a researcher in artificial intelligence (AI) at Google in San Francisco, California, took a swipe at his field last December—and received a 40-second ovation for it. Speaking at an AI conference, Rahimi charged that machine learning algorithms, in which computers learn through trial and error, have become a form of "alchemy." Researchers, he said, do not know why some algorithms work and others don't, nor do they have rigorous criteria for choosing one AI architecture over another. Now, in a paper presented on 30 April at the International Conference on Learning Representations in Vancouver, Canada, Rahimi and his collaborators document examples of what they see as the alchemy problem and offer prescriptions for bolstering AI's rigor.
"There's an anguish in the field," Rahimi says. "Many of us feel like we're operating on an alien technology."
The issue is distinct from AI's reproducibility problem, in which researchers can't replicate each other's results because of inconsistent experimental and publication practices. It also differs from the "black box" or "interpretability" problem in machine learning: the difficulty of explaining how a particular AI has come to its conclusions. As Rahimi puts it, "I'm trying to draw a distinction between a machine learning system that's a black box and an entire field that's become a black box."
Without deep understanding of the basic tools needed to build and train new algorithms, he says, researchers creating AIs resort to hearsay, like medieval alchemists. "People gravitate around cargo-cult practices," relying on "folklore and magic spells," adds François Chollet, a computer scientist at Google in Mountain View, California. For example, he says, they adopt pet methods to tune their AIs' "learning rates"—how much an algorithm corrects itself after each mistake—without understanding why one is better than others. In other cases, AI researchers training their algorithms are simply stumbling in the dark. For example, they implement what's called "stochastic gradient descent" in order to optimize an algorithm's parameters for the lowest possible failure rate. Yet despite thousands of academic papers on the subject, and countless ways of applying the method, the process still relies on trial and error.
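To make the learning-rate point concrete, here is a minimal sketch of stochastic gradient descent on a toy problem. It is an illustration only, not something taken from Rahimi's paper. The four-point dataset, the target weight of 3, the step count, and the function names `sgd_linear_fit` and `final_error` are all assumptions chosen for the example. The same update rule is run with three learning rates to show how strongly the outcome depends on that one setting.

```python
import random

# Hypothetical toy data: inputs for which the correct output is y = 3 * x.
INPUTS = (-1.0, -0.5, 0.5, 1.0)
TRUE_WEIGHT = 3.0


def sgd_linear_fit(learning_rate, steps=200, seed=0):
    """Fit y = w * x with plain SGD on squared error, one random sample per step."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    w = 0.0
    for _ in range(steps):
        x = rng.choice(INPUTS)
        y = TRUE_WEIGHT * x
        error = w * x - y
        gradient = 2.0 * error * x  # derivative of (w*x - y)**2 with respect to w
        w -= learning_rate * gradient  # the learning rate scales each correction
    return w


def final_error(learning_rate):
    """Distance between the learned weight and the true weight after training."""
    return abs(sgd_linear_fit(learning_rate) - TRUE_WEIGHT)


if __name__ == "__main__":
    # Report every setting tried, not only the one that worked best.
    for lr in (0.001, 0.1, 5.0):
        print(f"learning rate {lr}: final error {final_error(lr):.3g}")
```

In this sketch the smallest rate barely moves the weight in 200 steps, the middle rate converges, and the largest rate overshoots on every step and diverges. On a one-parameter problem the reason can be worked out by hand; the complaint in the article is that for large networks such choices are usually made by trial and error.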
Rahimi's paper highlights the wasted effort and suboptimal performance that can result. For example, it notes that when other researchers stripped most of the complexity from a state-of-the-art language translation algorithm, the simplified version actually translated from English to German or French better and more efficiently, showing that the original algorithm's creators didn't fully grasp what those extra parts were good for. Conversely, sometimes the bells and whistles tacked onto an algorithm are the only good parts, says Ferenc Huszár, a machine learning researcher at Twitter in London. In some cases, he says, the core of an algorithm is technically flawed, implying that its good results are "attributable entirely to other tricks applied on top."
Rahimi offers several suggestions for learning which algorithms work best, and when. For starters, he says, researchers should conduct "ablation studies" like those done with the translation algorithm: deleting parts of an algorithm one at a time to see the function of each component. He calls for "sliced analysis," in which an algorithm's performance is analyzed in detail to see how improvement in some areas might have a cost elsewhere. And he says researchers should test their algorithms with many different conditions and settings, and should report performances for all of them.
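The first two suggestions can be sketched in a few lines of code. The example below is a hypothetical illustration, not the translation study or any method from Rahimi's paper. The tiny sentiment classifier, its three components (`lowercase`, `strip_punct`, `negation`), the eight labeled sentences, and the "short" and "long" slices are all invented for the purpose. `ablation_study` removes one component at a time and re-measures accuracy, and `sliced_accuracy` breaks a single accuracy figure down by slice.

```python
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "awful"}
ALL_COMPONENTS = frozenset({"lowercase", "strip_punct", "negation"})

# Hypothetical examples: (text, label, slice), where label 1 means positive.
DATA = [
    ("good", 1, "short"),
    ("bad", 0, "short"),
    ("Great", 1, "short"),
    ("awful!", 0, "short"),
    ("not good at all", 0, "long"),
    ("this is not bad", 1, "long"),
    ("a GREAT film!", 1, "long"),
    ("not great, really", 0, "long"),
]


def predict(text, components):
    """Toy word-counting classifier; each optional step is one 'component'."""
    if "lowercase" in components:
        text = text.lower()
    if "strip_punct" in components:
        text = "".join(ch for ch in text if ch.isalnum() or ch.isspace())
    score, flip = 0, 1
    for token in text.split():
        if token == "not" and "negation" in components:
            flip = -1  # invert the sentiment of the words that follow
            continue
        if token in POSITIVE:
            score += flip
        elif token in NEGATIVE:
            score -= flip
    return 1 if score > 0 else 0


def accuracy(examples, components):
    correct = sum(predict(text, components) == label for text, label, _ in examples)
    return correct / len(examples)


def ablation_study(examples):
    """Accuracy of the full system, then with each component removed in turn."""
    results = {"full": accuracy(examples, ALL_COMPONENTS)}
    for name in sorted(ALL_COMPONENTS):
        results["without " + name] = accuracy(examples, ALL_COMPONENTS - {name})
    return results


def sliced_accuracy(examples, components):
    """Accuracy computed separately for each slice of the data."""
    slices = {}
    for example in examples:
        slices.setdefault(example[2], []).append(example)
    return {name: accuracy(group, components) for name, group in slices.items()}


if __name__ == "__main__":
    for setting, score in ablation_study(DATA).items():
        print(setting, score)
    print(sliced_accuracy(DATA, ALL_COMPONENTS - {"negation"}))
```

On this invented data, removing punctuation stripping changes nothing, which is the kind of dead weight an ablation is meant to expose, while removing negation handling costs the most. The sliced view adds what the overall number hides: without negation handling, short inputs are still all classified correctly and nearly all of the damage falls on the long ones.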
Ben Recht, a computer scientist at the University of California, Berkeley, and coauthor of Rahimi's alchemy keynote talk, says AI needs to borrow from physics, where researchers often shrink a problem down to a smaller "toy problem." "Physicists are amazing at devising simple experiments to root out explanations for phenomena," he says. Some AI researchers are already taking that approach, testing image recognition algorithms on small black-and-white handwritten characters before tackling large color photos, to better understand the algorithms' inner mechanics.
Csaba Szepesvári, a computer scientist at DeepMind in London, says the field also needs to reduce its emphasis on competitive testing. At present, a paper is more likely to be published if the reported algorithm beats some benchmark than if the paper sheds light on the software's inner workings, he says. That's how the fancy translation algorithm made it through peer review. "The purpose of science is to generate knowledge," he says. "You want to produce something that other people can take and build on."
Not everyone agrees with Rahimi and Recht's critique. Yann LeCun, Facebook's chief AI scientist in New York City, worries that shifting too much effort away from bleeding-edge techniques toward core understanding could slow innovation and discourage AI's real-world adoption. "It's not alchemy, it's engineering," he says. "Engineering is messy."
Recht sees a place for methodical and adventurous research alike. "We need both," he says. "We need to understand where failure points come so that we can build reliable systems, and we have to push the frontiers so that we can have even more impressive systems down the line."