Of Robots and Cocktails

All together now. A robot mimics a real person’s head motions.
Credit: Hirohito M. Kondo et al., PNAS

Scientists call it the "cocktail party problem." To understand the person talking to you in a noisy room, you've got to filter out all of the conversations, clinking glass, and other noises in the background. Fortunately, our brains are up to the challenge, and now—thanks to a little help from a humanoid robot—researchers have found new clues to how we do it.

A group of hearing scientists at NTT Communication Science Laboratories in Kanagawa, Japan, conducted the study. The researchers didn't head out to their local bar to investigate the cocktail party problem. Instead, they opted for a far more antisocial environment. They recruited volunteers and asked them to sit alone in a small room and face a speaker. Then the team played a combination of two different tones. At first, just like people at a loud party, the volunteers heard the sound as one cacophonous noise. But within a few seconds, they were able to isolate one tone from the other.

Then the researchers brought in the robot. Team leader Hirohito Kondo explains that he and his colleagues wanted to know if head movements could reset the cocktail party effect—that is, once we've filtered out the background noise, does turning our heads bring back the cacophony? It's not an easy question to answer. When we turn our heads, sounds reach our ears in a different way, the sources of various sounds appear to shift, and we even pay attention to different things. The robot—a humanoid head with built-in microphones designed to mimic how we hear—helped Kondo's team figure out which, if any, of these factors could reset the cocktail party effect.

Here's how the experiments worked. In one room, a human volunteer listened to the two-toned sound relayed from microphones in the robot's ear canals. At various points, the researchers would instruct the volunteer to turn his or her head. In another room, the robot turned its head in sync with the person's. With a series of trials using this setup (some in which only the source of the sound changed, some in which there was only head motion, and some with both), the team was able to isolate the consequences of head motion on the cocktail party effect.

Rapid head motion resets the cocktail party effect, Kondo and colleagues report online today in the Proceedings of the National Academy of Sciences. But just changing the things we pay attention to (in this case, having the volunteer look at different LED lights placed across the room) does not. And when the cocktail party effect is reset and the cacophony returns, the researchers found, it takes only a few seconds for our brains to sift out the noises again. So a quick swivel of the head makes us reset our perception of what we're hearing. We then start the process of teasing out the separate parts of the noise again.

Josh McDermott, a hearing scientist at New York University, describes the research as "highly novel" and points out that the key result is surprising. "If you move your head such that the acoustic stimulus at the ears changes, but the environment itself doesn't, you wouldn't have thought you would need to restart the process of interpretation," he says.

This setup does not appear to be optimal, he adds, because it means that you're re-evaluating the auditory scene for no good reason. "For me the take home from this paper is that this is not actually adaptive. It reveals a little bit of a bug in the system: that your brain can't completely discount the effect of the head motion."

Al Bregman, a hearing scientist at McGill University in Montreal, Canada, was impressed by the research but is reluctant to believe that such a flaw in the brain exists. Instead, he suggests, there could be a problem with the sound stimulus used in these sorts of studies. "The system is so exquisite in its capabilities, able to detect sub-millisecond asynchronies between the signals at the two ears," he says, "that it is hard for me to believe that the Kondo et al. results reflect a crude flaw in the system."