Metadata. It's an obscure data science term that was unknown to most people until 2013, when they learned that the U.S. National Security Agency (NSA) was harvesting vast amounts of it from telephone calls. Government officials have downplayed the sensitivity of such data, but a crowdsourced study of phone metadata now finds that highly revealing information can be gleaned from a simple list of who called whom.
NSA's intrusion into citizens' private lives may have roiled academics, but it has remained unclear what the spy agency was learning from phone metadata. A White House spokesperson reassured the public in 2013 that the metadata harvesting "does not allow the government to listen in on anyone's telephone calls," leaving privacy intact. Ever since then, a trio of computer scientists from Stanford University in Palo Alto, California (Jonathan Mayer, Patrick Mutchler, and John Mitchell) has been harvesting phone metadata themselves to see what it can reveal.
Unlike NSA, the researchers collected their data with consent, from people who downloaded an app called MetaPhone. Once installed on a smartphone, the app records the phone numbers and timing of every call and text message the user makes and receives. More than 800 people downloaded the app and consented to take part. If their privacy really is protected, then the records of their 1.2 million text messages and 250,000 calls should reveal little.
In fact, the metadata revealed quite a lot. By using public information and cheap commercial databases to map phone numbers to businesses, organizations, and social media profiles, the researchers could determine the location and identity of most of the participants, the team reports today in the Proceedings of the National Academy of Sciences. Even deeply private details, such as chronic health problems, religious affiliations, and drug use, emerged simply by linking people through their call records to various clinics, stores, and organizations.
The sensitivity of phone metadata is "common knowledge in the security and privacy communities," Mutchler says. The goal of the study was to "put hard data behind these hunches." The more important revelation, he says, is the shape of a graph that charts phone call networks. In an attempt to limit the scope of phone metadata surveillance, NSA is purportedly following a "two-hop" rule: For any given person of interest, metadata can be harvested only from the people that person called and from the people those contacts called in turn. But the study found that a large proportion of the subjects were connected to one another not through personal relationships but through customer service lines, telemarketers, and two-factor authentication services such as those used by Google. Even with a two-hop limit, an NSA analyst could in principle "hop" from any one individual to an additional 25,000 people.
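Why a few shared numbers undermine a two-hop limit can be illustrated with a short breadth-first search that stops after two hops. This is a toy sketch, not the study's method or data: the helper names `build_graph` and `within_hops`, the phone "numbers", and the 1,000-caller hub are all hypothetical, and it assumes that a call in either direction links two numbers.

```python
from collections import defaultdict


def build_graph(calls):
    """Build an undirected contact graph from (caller, callee) pairs.

    Assumption for this sketch: a call in either direction links two numbers.
    """
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def within_hops(graph, start, max_hops=2):
    """Return every number reachable from `start` in at most `max_hops` hops,
    excluding `start` itself (a breadth-first search cut off at max_hops)."""
    frontier = {start}
    seen = {start}
    for _ in range(max_hops):
        next_frontier = set()
        for number in frontier:
            for neighbor in graph.get(number, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.add(neighbor)
        frontier = next_frontier
    return seen - {start}


# Hypothetical toy data: "target" talks to two friends, each of whom has one
# other contact. Separately, 1,000 people call one shared hub number
# (standing in for, e.g., a customer service line).
personal = [("target", "friend1"), ("target", "friend2"),
            ("friend1", "friend3"), ("friend2", "friend4")]
hub_calls = [(f"customer{i}", "hub") for i in range(1000)]

without_hub = within_hops(build_graph(personal), "target")
with_hub = within_hops(
    build_graph(personal + hub_calls + [("target", "hub")]), "target")

print(len(without_hub))  # 4: friend1..friend4
print(len(with_hub))     # 1005: the 4 friends, the hub, and 1,000 customers
```

In this toy graph, a single call to a shared number raises the two-hop count from 4 to 1,005. The roughly 25,000 figure reported above comes from the study's real data, not from this model.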
"The existing literature on telephone graph structure hadn't really mentioned these hubs," Mutchler says. "Their presence makes some of the legal limitations on the NSA's access to metadata totally ineffective," assuming the goal of the two-hop limit is to reduce unnecessary intrusion into people's private lives.
"The study has important implications for surveillance law and policy," says Arvind Narayanan, a computer scientist and data privacy expert at Princeton University. "Our intuition for terms such as 'two hops,' [and how it limits the number of people connected to you], proves wildly inaccurate when applied to modern telephone networks." And he notes that NSA has vastly more data and resources than academic researchers. "With access to millions of records and sophisticated machine learning techniques, it is likely that one can obtain a far more complete picture of individuals' sensitive personal details, behavior, and more."