
An artificial intelligence (AI) trained on the photos of a dog, crab, and duck (top) would be vulnerable to deception because these photos contain subtle features that could be manipulated. The images on the bottom row don’t contain these subtle features, and are thus better for training secure AI.

Ilyas, Santurkar, Tsipras, Engstrom, Tran, Madry

Scientists help artificial intelligence outsmart hackers

NEW ORLEANS, LOUISIANA—A hacked message in a streamed song makes Alexa send money to a foreign entity. A self-driving car crashes after a prankster strategically places stickers on a stop sign, causing the car to misread it as a speed limit sign. Fortunately, neither of these has happened yet, but hacks like them, sometimes called adversarial attacks, could become commonplace—unless artificial intelligence (AI) finds a way to outsmart them. Now, researchers have found a new way to give AI a defensive edge, they reported here last week at the International Conference on Learning Representations.

The work could not only protect the public; it could also help reveal why AI, notoriously difficult to understand, falls victim to such attacks in the first place, says Zico Kolter, a computer scientist at Carnegie Mellon University in Pittsburgh, Pennsylvania, who was not involved in the research. The research suggests some AIs are too smart for their own good: They spot patterns in images that humans can't, which leaves them vulnerable to attackers who manipulate those patterns, so they need to be trained with that weakness in mind.

To identify this vulnerability, researchers created a special set of training data: images that look to us like one thing, but look to AI like another—a picture of a dog, for example, that, on close examination by a computer, has catlike fur. Then the team mislabeled the pictures—calling the dog picture an image of a cat, for example—and trained an algorithm to learn the labels. Once the AI had learned to see dogs with subtle cat features as cats, they tested it by asking it to recognize fresh, unmodified images. Even though the AI had been trained in this odd way, it could correctly identify actual dogs, cats, and so on nearly half the time. In essence, it had learned to match the subtle features with labels, whatever the obvious features.
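The logic of that experiment can be sketched in miniature. The toy below is not the paper's dataset or model; it is a made-up two-feature stand-in (an "obvious" feature a human would see and a "subtle" one we wouldn't), with invented numbers, in which every training example is given a random label and only the subtle feature is edited to match that label. A plain classifier trained on these seemingly mislabeled examples still does well on fresh, unedited data, because the subtle feature alone is enough to learn from:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each sample has an "obvious" feature (big, what a human sees) and a
# "subtle" feature (small, invisible to us); both track the true label.
def natural_batch(n):
    y = rng.choice([-1.0, 1.0], size=n)
    obvious = y * 1.0 + rng.normal(0, 0.3, n)
    subtle = y * 0.1 + rng.normal(0, 0.03, n)
    return np.stack([obvious, subtle], axis=1), y

# Build the mislabeled training set: assign a RANDOM label t, then edit only
# the subtle feature so it matches t. The obvious feature still reflects the
# original class, so it carries no information about the assigned labels.
n = 2000
X, _ = natural_batch(n)                           # original labels discarded
t = rng.choice([-1.0, 1.0], size=n)               # assigned ("wrong") labels
X_train = X.copy()
X_train[:, 1] = t * 0.1 + rng.normal(0, 0.03, n)  # subtle feature follows t

# Standardize, then fit plain logistic regression by gradient descent.
mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
Xs = (X_train - mu) / sd
w = np.zeros(2)
for _ in range(500):
    p = 1 / (1 + np.exp(Xs @ w * t))              # sigmoid(-t * score)
    w += 1.0 * (Xs * (t * p)[:, None]).mean(axis=0)

# Despite training only on "wrong" labels, the model classifies fresh,
# unedited data well: it learned to read the subtle feature.
X_test, y_test = natural_batch(n)
acc = np.mean(np.sign(((X_test - mu) / sd) @ w) == y_test)
print(f"test accuracy on correctly labeled data: {acc:.2f}")
```

The learned weights end up concentrated on the subtle feature, mirroring the article's point: the model matches subtle features to labels, whatever the obvious features say.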

The training experiment suggests AIs use two types of features: obvious, macro ones like ears and tails that people recognize, and micro ones that we can only guess at. It further suggests adversarial attacks aren't just confusing an AI with meaningless tweaks to an image. In those tweaks, the AI is smartly seeing traces of something else. An AI might see a stop sign as a speed limit sign, for example, because something about the stickers actually makes it subtly resemble a speed limit sign in a way humans are simply unable to perceive.
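Why can tweaks too small to notice flip a model's answer? A classic illustration, not from this paper, is the fast-gradient-sign idea on a toy linear classifier with invented numbers: shift every pixel a tiny amount in the direction that hurts the score, and the tiny shifts add up across hundreds of pixels:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 784                              # a 28-by-28 "image", flattened
w = rng.normal(size=d)               # weights of a toy linear classifier

# Construct an input the classifier scores just barely positive (+1.0).
x = rng.uniform(0.0, 1.0, d)
x += w * (1.0 - w @ x) / (w @ w)

# The attack: shift each pixel by at most 0.05 -- tiny next to pixel values
# of order 1 -- in whichever direction lowers the score. Summed over
# hundreds of pixels, the tiny shifts swamp the classifier's small margin.
eps = 0.05
x_adv = x - eps * np.sign(w)

print("clean score:", round(float(w @ x), 2))
print("adversarial score:", round(float(w @ x_adv), 2))
```

No single pixel changes by more than 0.05, yet the score swings from positive to strongly negative: the attack is invisible to a person but decisive to the model.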

Some in the AI field suspected this was the case, but it’s good to have a research paper showing it, Kolter says. Bo Li, a computer scientist at the University of Illinois in Champaign who was not involved in the work, says distinguishing apparent from hidden features is a “useful and good research direction,” but that “there is still a long way” to doing so efficiently.

So now that researchers have a better idea of why AI makes such mistakes, can that be used to help them outsmart adversarial attacks? Andrew Ilyas, a computer scientist at the Massachusetts Institute of Technology (MIT) in Cambridge, and one of the paper’s authors, says engineers could change the way they train AI. Current methods of securing an algorithm against attacks are slow and difficult. But if you modify the training data to have only human-obvious features, any algorithm trained on it won’t recognize—and be fooled by—additional, perhaps subtler, features.

And, indeed, when the team trained an algorithm on images without the subtle features, their image recognition software was fooled by adversarial attacks only 50% of the time, the researchers reported at the conference and in a preprint paper posted online last week. That compares with a 95% rate of vulnerability when the AI was trained on images with both obvious and subtle patterns.
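The mechanism behind that improvement can be caricatured in the same two-feature toy used above; the numbers here are invented, not the team's data or their 95%/50% results. One model trains on data containing both features; a second trains on "robustified" data with the subtle, attackable feature stripped out. When an attacker flips only the subtle feature, the first model is fooled and the second barely notices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: the "obvious" feature is what a human would use but is only a
# weak predictor; the "subtle" feature is highly predictive, yet easy for
# an attacker to flip without a human noticing (by stipulation, here).
def natural_batch(n):
    y = rng.choice([-1.0, 1.0], size=n)
    obvious = 0.3 * y + rng.normal(0, 1.0, n)
    subtle = y + rng.normal(0, 0.1, n)
    return np.stack([obvious, subtle], axis=1), y

def train_logreg(X, y, steps=3000, lr=0.5):
    w = np.zeros(2)
    for _ in range(steps):
        p = 1 / (1 + np.exp(X @ w * y))   # sigmoid(-y * score)
        w += lr * (X * (y * p)[:, None]).mean(axis=0)
    return w

X, y = natural_batch(4000)
w_std = train_logreg(X, y)        # standard training: sees both features

X_rob = X.copy()
X_rob[:, 1] = 0.0                 # robustified data: subtle feature removed,
w_rob = train_logreg(X_rob, y)    # so only the obvious feature is learnable

# The attack: flip only the subtle feature on fresh test data.
X_test, y_test = natural_batch(4000)
X_atk = X_test.copy()
X_atk[:, 1] = -y_test + rng.normal(0, 0.1, len(y_test))

acc_std = np.mean(np.sign(X_atk @ w_std) == y_test)
acc_rob = np.mean(np.sign(X_atk @ w_rob) == y_test)
print(f"accuracy under attack, standard training:    {acc_std:.2f}")
print(f"accuracy under attack, robustified training: {acc_rob:.2f}")
```

The robustly trained model pays a price in clean accuracy, because it has given up its most predictive feature, but the attacker no longer has anything invisible to grab onto.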

Overall, the findings suggest an AI’s vulnerabilities lie in its training data, not its programming, says Dimitris Tsipras of MIT, a co-author. According to Kolter, “One of the things this paper does really nicely is it drives that point home with very clear examples”—like the demonstration that apparently mislabeled training data can still make for successful training—“that make this connection very visceral.”