VANCOUVER, CANADA—No, the red player in the video above isn’t having a seizure. And the blue player isn’t drunk. Instead, you’re watching what happens when one artificial intelligence (AI) gets the better of the other, simply by behaving in an unexpected way.
One way to make AI smarter is to have it learn from its environment. Cars of the future, for example, will be better at reading street signs and avoiding pedestrians as they gain more experience. But hackers can exploit these systems with “adversarial attacks”: By subtly and precisely modifying an image, say, you can fool an AI into misidentifying it. A stop sign with a few stickers on it might be seen as a speed limit sign, for example. The new study reveals that AI can be fooled not only into seeing something it shouldn’t, but also into behaving in a way it shouldn’t.
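Attacks of this kind are often illustrated with the fast gradient sign method: nudge every pixel a tiny amount in the direction that most increases the model’s error. A minimal sketch in numpy, assuming the attacker can already compute the loss gradient with respect to the input image (the function name and the size of `epsilon` here are illustrative, not from the study):

```python
import numpy as np

def fgsm_perturb(image, gradient, epsilon=0.01):
    """Fast-gradient-sign-style perturbation (illustrative sketch).

    image    -- array of pixel values in [0, 1]
    gradient -- gradient of the model's loss with respect to the image
    epsilon  -- maximum per-pixel change; small enough to be near-invisible
    """
    # Shift each pixel by +/- epsilon, following the sign of the gradient.
    perturbed = image + epsilon * np.sign(gradient)
    # Keep the result a valid image.
    return np.clip(perturbed, 0.0, 1.0)
```

To a human the perturbed image looks identical, yet the accumulated pixel shifts can push the model across a decision boundary, turning a stop sign into a speed limit sign.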
The study takes place in the world of simulated sports: soccer, sumo wrestling, and a game in which one player tries to stop a runner from crossing a line. Typically, both competitors train by playing against each other. Here, the red bot trains against an already expert blue bot. But instead of letting the blue bot continue to learn, the red bot hacks the system, falling down and not playing the game as it should. As a result, the blue bot begins to play terribly, wobbling to and fro like a drunk pirate, and losing up to twice as many games as it should, according to research presented here this month at the Neural Information Processing Systems conference.
Such adversarial attacks could cause real-world problems for autonomous driving, financial trading, or product recommendation systems, like those seen on Amazon. One can imagine a car owned by a prankster or terrorist jiggling its steering wheel in just such a way as to cause a nearby car to needlessly swerve off the road, or an algorithm executing trades that cause others to go haywire and create a market crash.