
Hirotaka Nakasone


Reuters/Joe Burbank/Pool

Vocal ‘fingerprints’ could help nab criminals

On the night George Zimmerman fatally shot 17-year-old Trayvon Martin in Sanford, Florida, a 911 call captured the sound of someone screaming. But who? An expert for the prosecution testified it was Martin, begging for help in his last moments. But at a pretrial hearing, several scientists said the recording quality was too poor to tell. The call was not admitted as evidence.

The case illustrates the problems in speaker recognition, a forensic field with a checkered history that is trying to find solid scientific ground. Police and lawyers “can’t tell the difference between somebody who’s deluded or who is a charlatan, and somebody who is actually doing solid scientific work,” says Geoffrey Stewart Morrison, an independent forensic scientist in Vancouver, Canada, and former chair of the Forensic Acoustics Subcommittee of the Acoustical Society of America.

In the 1960s, analysts began converting recordings into images using spectrograph machines and making subjective judgments about how similar the images looked, a method once commonly used in courts but now widely discredited. More reliable alternatives have since emerged. Signal processing engineers developed automated systems that typically measure the frequency components of speech every few milliseconds. Phonetics experts split recordings into individual sounds, then analyze those elements using statistical tests or their own judgment.
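To make “measuring the frequency components of speech every few milliseconds” concrete, here is a minimal sketch of short-time spectral analysis in plain Python. It assumes audio sampled at 8,000 samples per second, cut into 25-millisecond frames that advance every 10 milliseconds; those values, and all the function names below, are illustrative choices rather than details of any system mentioned in this article. Real systems use fast Fourier transform libraries and much further processing; this only shows the basic idea of slicing a recording into short windows and asking which frequencies each one contains.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames of frame_len, advancing by hop samples."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 of one frame, after a Hann window."""
    n = len(frame)
    # The Hann window tapers the frame edges to reduce spectral leakage.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def short_time_spectra(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Return one magnitude spectrum per frame (hypothetical helper, assumed defaults)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [magnitude_spectrum(f) for f in frame_signal(samples, frame_len, hop)]


def dominant_frequency(spectrum, frame_len, sample_rate):
    """Frequency in Hz of the strongest bin; each bin spans sample_rate / frame_len Hz."""
    k = max(range(len(spectrum)), key=lambda i: spectrum[i])
    return k * sample_rate / frame_len


if __name__ == "__main__":
    sr = 8000
    # A synthetic 400 Hz tone lasting 0.1 s stands in for a voice recording.
    tone = [math.sin(2 * math.pi * 400 * i / sr) for i in range(800)]
    spectra = short_time_spectra(tone, sr)
    print(len(spectra))                              # → 8
    print(dominant_frequency(spectra[0], 200, sr))   # → 400.0
```

Each 25 ms frame yields its own spectrum, so a one-second recording becomes roughly a hundred snapshots of how energy is spread across frequencies. Comparing those snapshots between recordings, rather than eyeballing a printed spectrogram, is what lets such analysis be automated and tested.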


Automated systems now work very well (some banks rely on them to identify their clients), but only if the speaker clearly says a standard sentence into a microphone. Comparing real-world samples is much more error-prone, says Hirotaka Nakasone, a senior scientist in the Federal Bureau of Investigation’s voice recognition program who testified in the Trayvon Martin case. The same person will sound different during a bar fight than when speaking calmly in an interrogation room, and recording quality is often poor. That’s why the admissibility of voice recognition systems in court is contentious, even though the systems are widely used in criminal investigations.

To improve accuracy, scientists are studying how factors like inebriation, emotional state, and recording devices influence voice samples. They are also testing how well existing systems perform and developing standards for things like data selection and the presentation of results. For example, a panel chaired by Nakasone is working on standards for the U.S. National Institute of Standards and Technology. The field is moving away from subjective systems, Morrison says: “Automatic systems are more robust to cognitive bias and are more easily tested.”