Somehow, even in a room full of loud conversations, our brains can focus on a single voice in something called the cocktail party effect. But the louder it gets—or the older you are—the harder it is to do. Now, researchers may have figured out how to fix that—with a machine learning technique called the cone of silence.
Computer scientists trained a neural network, which roughly mimics the brain’s wiring, to locate and separate the voices of several people speaking in a room. The network did so in part by measuring how long it took for the sounds to hit a cluster of microphones in the room’s center.
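The core idea, localizing a sound by comparing when it arrives at different microphones, can be illustrated outside of any neural network. The sketch below (a hypothetical example, not the researchers' actual system) cross-correlates the signals from an assumed pair of microphones to recover the time difference of arrival, then converts that delay into a bearing angle:

```python
import numpy as np

# Assumed setup: two microphones 0.1 m apart, 48 kHz sampling.
SPEED_OF_SOUND = 343.0  # m/s in room-temperature air
MIC_SPACING = 0.1       # meters between the microphone pair
SAMPLE_RATE = 48000     # samples per second

def estimate_angle(sig_a, sig_b):
    """Estimate a source's bearing from the delay between two mics.

    The lag that maximizes the cross-correlation is the time
    difference of arrival (TDOA); under a far-field assumption,
    sin(angle) = TDOA * speed_of_sound / mic_spacing.
    """
    corr = np.correlate(sig_b, sig_a, mode="full")
    lag = np.argmax(corr) - (len(sig_a) - 1)  # delay in samples
    tdoa = lag / SAMPLE_RATE                  # delay in seconds
    sin_theta = np.clip(tdoa * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))

# Simulate a voice 30 degrees off-axis: mic B hears it slightly later.
rng = np.random.default_rng(0)
source = rng.standard_normal(2048)
delay = int(round(MIC_SPACING * np.sin(np.radians(30.0))
                  / SPEED_OF_SOUND * SAMPLE_RATE))
sig_a = source
sig_b = np.concatenate([np.zeros(delay), source[:len(source) - delay]])

print(round(estimate_angle(sig_a, sig_b)))  # recovers roughly 30
```

The actual system learns to separate overlapping voices as well as locate them, and uses a cluster of microphones rather than a single pair, but the timing cue it exploits is the same one sketched here.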
When the researchers tested their setup with extremely loud background noise, they found that the cone of silence located two voices to within 3.7° of their sources, they reported this month at the online-only Conference on Neural Information Processing Systems. That compares with an accuracy of 11.5° for the previous state-of-the-art technology. When the researchers trained their new system on additional voices, it managed the same trick with eight voices, locating them to within 6.3°, even though it had never heard more than four at once.
Such a system could one day be used in hearing aids, surveillance setups, speakerphones, or laptops. The new technology, which can also track moving voices, might even make your Zoom calls easier, by separating out and silencing background noise, from vacuum cleaners to rambunctious children.