If you’re a bird enthusiast, you can pick out the “chick-a-DEE-dee” song of the Carolina chickadee with just a little practice. But if you’re an environmental scientist faced with parsing thousands of hours of recordings of birdsongs in the lab, you might want to enlist some help from your computer. A new approach to automatic classification of birdsong borrows techniques from human voice recognition software to sort through the sounds of hundreds of species and decides on its own which features make each one unique.
Collectors of animal sounds are facing a data deluge. Thanks to cheap digital recording devices that can capture sound for days in the field, “it’s really, really easy to collect sound, but it’s really difficult to analyze it,” says Aaron Rice, a bioacoustics researcher at Cornell University, who was not involved in the new work. His lab has collected 6 million hours of underwater recordings, from which they hope to pick out the signature sounds of various marine mammals.
Knowing where and when a certain species is vocalizing might help scientists understand habitat preferences, track its movements or population changes, and recognize when a species is disrupted by human development. But to keep these detailed records, researchers rely on software that can reliably sort through the cacophony they capture in the field. Typically, scientists build one computer program to recognize one species, and then start all over for another species, Rice says. Training a computer to recognize lots of species in one pass is “a challenge that we’re all facing.”
That challenge is even bigger in the avian world, says Dan Stowell, a computer scientist at Queen Mary University of London who studied human voice analysis before turning his attention to the treetops. “I realized there are quite a lot of unsolved problems in birdsong,” says Stowell, who is lead author on the new paper. Among the biggest issues: There are hundreds of species with distinct and complex calls—and in tropical hotspots, many of them sing all at once.
Most methods for classifying birdsong rely on a human to define which features separate one species from another. For example, if researchers know that a chickadee’s tweet falls within a predictable range of frequencies, they can program a computer to recognize sounds in that range as chickadee-esque. The computer gets better and better at deciding how to use these features to classify a new sound clip, based on “training” rounds where it examines clips with the species already correctly labeled.
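To make that concrete, here is a deliberately tiny sketch of the hand-crafted-feature approach, in the spirit of the chickadee example. Everything in it is invented for illustration: the species names, the frequencies, the choice of peak frequency as the single feature, and the nearest-centroid rule. Real systems use many features and far more sophisticated classifiers.

```python
import numpy as np

SR = 22050  # sample rate in Hz (illustrative choice)

def peak_frequency(clip, sr=SR):
    """The human-chosen feature: the dominant frequency of a clip."""
    spectrum = np.abs(np.fft.rfft(clip))
    freqs = np.fft.rfftfreq(len(clip), d=1.0 / sr)
    return freqs[np.argmax(spectrum)]

def tone(freq, dur=0.5, sr=SR):
    """Stand-in for a recorded call: a pure tone at a given pitch."""
    t = np.arange(int(dur * sr)) / sr
    return np.sin(2 * np.pi * freq * t)

# "Training" clips with the species already correctly labeled.
training = [(tone(4000), "chickadee"), (tone(4100), "chickadee"),
            (tone(1500), "dove"), (tone(1600), "dove")]

# Learn one typical value (centroid) of the feature per species.
centroids = {}
for species in {label for _, label in training}:
    feats = [peak_frequency(c) for c, lbl in training if lbl == species]
    centroids[species] = np.mean(feats)

def classify(clip):
    """Label a new clip by whichever species' typical pitch is closest."""
    f = peak_frequency(clip)
    return min(centroids, key=lambda s: abs(centroids[s] - f))

print(classify(tone(3900)))  # a high-pitched mystery clip -> "chickadee"
```

The key point the article makes is that the feature itself (peak frequency here) is chosen by a person in advance; only the decision rule is learned from the labeled training clips.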
In the new paper, Stowell and his Queen Mary colleague, computer scientist Mark Plumbley, used a different approach, known as unsupervised feature learning. Instead of telling the computer which features of a birdsong are going to be important, they let it decide for itself, so to speak. The computer has to figure out “what are the jigsaw pieces” that make up any birdsong it hears, Stowell says. For example, some of the jigsaw pieces it selects are split-second upsweeps or downsweeps in frequency—the sharp pitch changes that make up a chirp. After seeing correctly labeled examples of which species produce which kinds of sounds, the program can spit out a list—ranked in order of confidence—of the species it thinks are present in a recording.
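A rough sense of how a computer can discover its own "jigsaw pieces": cluster short patches of a spectrogram and treat the recurring patch shapes as the learned features. This is only a toy sketch of the general idea, run on a synthetic chirp; the function names, parameters, and plain k-means clustering here are all illustrative choices, not the authors' actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def spectrogram(clip, n_fft=256, hop=128):
    """Magnitude spectrogram: rows are time frames, columns frequency bins."""
    frames = [clip[i:i + n_fft] for i in range(0, len(clip) - n_fft, hop)]
    return np.abs(np.fft.rfft(np.array(frames) * np.hanning(n_fft), axis=1))

def learn_pieces(spec, k=8, patch_len=4, iters=20):
    """Unsupervised step: k-means over short spectrogram patches.

    The cluster centroids play the role of the 'jigsaw pieces':
    recurring spectro-temporal shapes (quick upsweeps, downsweeps)
    that the algorithm picks out on its own, with no labels involved."""
    patches = np.array([spec[i:i + patch_len].ravel()
                        for i in range(len(spec) - patch_len)])
    centroids = patches[rng.choice(len(patches), k, replace=False)]
    for _ in range(iters):
        # assign each patch to its nearest centroid, then re-average
        d = ((patches[:, None] - centroids[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = patches[labels == j].mean(0)
    return centroids

def encode(spec, centroids, patch_len=4):
    """Describe a clip by how often each learned piece appears in it."""
    patches = np.array([spec[i:i + patch_len].ravel()
                        for i in range(len(spec) - patch_len)])
    d = ((patches[:, None] - centroids[None]) ** 2).sum(-1)
    return np.bincount(d.argmin(1), minlength=len(centroids))

# A toy "chirp": a tone sweeping upward in pitch over 0.3 seconds.
sr = 22050
t = np.arange(int(0.3 * sr)) / sr
chirp = np.sin(2 * np.pi * (2000 + 4000 * t) * t)

pieces = learn_pieces(spectrogram(chirp))
print(encode(spectrogram(chirp), pieces))  # feature vector for a classifier
```

Only after this unsupervised step does labeled training enter the picture: the piece-count vectors from clips of known species are fed to an ordinary supervised classifier, which is what lets the system emit its ranked list of likely species.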
Stowell and Plumbley tested this approach on several natural recordings, including birdsong from the British Library Sound Archive, and a large data set recorded in Brazil (77 hours; 501 species) that was publicly released as part of an annual classification challenge organized by the Scaled Acoustic BIODiversity platform project. Their unsupervised approach performed better than the more traditional methods of classification—those based on a set of predetermined features—and managed to reach up to 85.4% accuracy in the large Brazilian data set, they report today in PeerJ.
The new system’s accuracy fell short of the top programs that analyzed the same data sets in the annual competition. But Ilyas Potamitis, a computer scientist at the Technological Educational Institute of Crete in Greece, says that the new system deserves credit for applying unsupervised computer learning to the complex world of birdsong for the first time. He also suggests that this approach could be combined with other ways of processing and classifying sound, because it “can squeeze out some info that other techniques may miss.”
Eighty-five percent accuracy on a choice between more than 500 calls and songs is impressive, Rice says, and shows “both the biological community and the computer community what you can do with these large sound archives.” The next step, he says, is to test the technology with new recordings to see if it can hold its own.