Psychologists grow increasingly dependent on virtual research subjects

New studies suggest that volunteer research subjects on Amazon Mechanical Turk—an online crowdsourcing service—are less numerous and diverse than hoped.


In May, 23,000 people voluntarily took part in thousands of social science experiments without ever visiting a lab. All they did was log on to Amazon Mechanical Turk (MTurk), an online crowdsourcing service run by the Seattle, Washington–based company better known for its massive internet-based retail business. Those research subjects completed 230,000 tasks on their computers in 3.3 million minutes—more than 6 years of effort in total.

The prodigious output demonstrates the popularity of an online platform that scientists had only begun to exploit 5 years ago. In 2011, according to Google Scholar, just 61 studies using MTurk were published; last year the number topped 1,200. “This is a revolution in social and behavioral science,” says psychologist Leib Litman of the Lander College for Men in New York City, who generated the May data from TurkPrime, a website that he created last year with computer scientist Jonathan Robinson, also at Lander, to facilitate MTurk studies. “Research is moving from the lab to the cloud.”

Why bother with the cloud? A social science study with hundreds of live subjects normally requires weeks of work just to gather the data, not to mention finding people and signing them up. Last month’s studies on MTurk—which include a test of the limits of people’s generosity, a comparison of religiosity and humility, and a measurement of the psychological impact of graphic warnings on cigarette packages—took only days from start to finish.

But the platform’s popularity has raised concerns, as researchers discussed at the Association for Psychological Science meeting in Chicago, Illinois, last month. Some worry that they are becoming too dependent on a commercial platform. “Academic research would be really screwed if Amazon decided to shut it down,” says Todd Gureckis, a psychologist at New York University (NYU) in New York City. Others question whether the research volunteers are paid fairly and treated ethically. And looming over it all are questions about who these anonymous volunteers actually are, and concerns that they are less numerous and diverse than researchers hope.

MTurk’s ascendancy in the social sciences—more than 1000 researchers have registered experiments using it on TurkPrime—is unexpected given the clunkiness of its interface. When Litman first tried to use the platform, he found it baffling. “It looks like a website designed in the 1990s by computer engineers,” he says. That shouldn’t be surprising, considering that Amazon created MTurk as a tool for harnessing humans to improve artificial intelligence software. For example, when a computer struggles to identify the content of a photograph, Turkers can be hired to name objects, helping the computer learn.

But researchers have more complex needs, and adapting MTurk for social sciences often requires computer programming skills that few have. Besides TurkPrime, another research tool for MTurk is psiTurk, created by Gureckis, which is like an “app store” for experiments. Rather than write programs from scratch, you can browse free, open-source code from other researchers’ experiments and just tweak it.

The thorniest issue for researchers has been getting a handle on just how big and diverse the Turker population is. Amazon has boasted that MTurk harnesses more than 500,000 workers from around the globe, but what researchers want to know is how many unique, active users are willing to participate in their studies at any given time. If that number is small, then the same people could be recirculating through experiments, and that can bias the results.

When Turkers register, Amazon assigns each one an ID that is buried in the raw code after a work session is complete. Those IDs have enabled researchers to study Turker demographics with a method borrowed from wildlife ecology called capture-recapture: To estimate the number of fish in a lake, capture some, mark them, and return them; the smaller and more slowly changing the population, the higher the proportion of marked fish that will be recaptured in a later sample. By comparing the Turker IDs from experiments across multiple labs, it is possible to conduct a virtual capture-recapture survey retrospectively.
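The arithmetic behind such a survey can be sketched with the classic Lincoln–Petersen estimator from wildlife ecology. The worker IDs and sample sizes below are hypothetical, and the actual studies used more sophisticated multi-sample models; this is only a minimal illustration of the principle.

```python
def lincoln_petersen(first_sample, second_sample):
    """Estimate total population size from two overlapping samples.

    N ≈ (n1 * n2) / m, where m is the number of "marked" individuals
    (here, worker IDs) that appear in both samples. A large overlap
    implies a small underlying population.
    """
    n1, n2 = len(first_sample), len(second_sample)
    m = len(first_sample & second_sample)  # "recaptured" IDs
    if m == 0:
        raise ValueError("no overlap between samples; cannot estimate")
    return n1 * n2 / m

# Hypothetical worker IDs collected by two different labs:
lab_a = {f"W{i}" for i in range(0, 600)}     # 600 IDs
lab_b = {f"W{i}" for i in range(400, 1000)}  # 600 IDs, 200 shared with lab_a
print(lincoln_petersen(lab_a, lab_b))  # → 1800.0
```

A third of lab B’s subjects had already been “captured” by lab A, so the estimated pool (1,800) is only three times the size of either sample—the same logic that shrank Amazon’s half-million workers to Stewart’s much smaller effective population.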


Neil Stewart, a psychologist at the University of Warwick in Coventry, U.K., led the first effort to estimate the effective MTurk research population with this method—and the results sent shock waves through the community last year. Seven psychology labs in the United States, Europe, and Australia ran 114,000 experimental sessions over a 3-year period. The number of unique people among the subjects came to only 30,000. Rather than a pool of half a million subjects always on tap, Stewart estimated that the number of Turkers willing to take part in an experiment at any one time is only about 7,300.

“What seemed like a virtually infinite subject pool was in fact more like a very large state university psychology pool,” Gureckis says. Stewart’s data show that the population churns rapidly: Half the Turker population that participates in research is replaced by fresh people every 7 months.

Those Turkers are also far less diverse than was thought. Though Amazon has long noted the global nature of the community, surveys of those completing experimental tasks reveal that the vast majority are based in the United States. And compared with the average American, Litman says, Turkers “skew young, they are more liberal, more urban, and more likely to be single.” Knowing such traits, he notes, is crucial for researchers as they try to interpret their data.

Turkers are also poorly paid, although their hourly rate is difficult to calculate, in part because Amazon takes a cut of between 20% and 40%, and because Turkers often undertake multiple tasks at once, with breaks in between. For the 6 years of accumulated effort in May, researchers doled out $164,882. That would seem to translate to an average pay rate of about $3 per hour, but “the true hourly rate is somewhere between $4 and $8 per hour,” Litman says.
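The $3-per-hour figure follows directly from the totals quoted earlier—3.3 million minutes of logged work and $164,882 in payouts—as a quick back-of-the-envelope check shows:

```python
total_minutes = 3_300_000           # task time logged on MTurk in May
total_payout_usd = 164_882          # amount researchers paid out
total_hours = total_minutes / 60    # 55,000 hours of work
rate = total_payout_usd / total_hours
print(round(rate, 2))  # → 3.0 dollars per hour
```

The naive division undercounts the true rate because, as noted above, logged minutes include breaks and overlapping tasks.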

Many Turkers complain that this is too low, because social science experiments often take more effort—and time—than other tasks. Gureckis agrees. At NYU, “we pay subjects $8 to $10 per hour, and there’s often a bonus at the end,” he says. “MTurk subjects should be paid the same as they would in the lab. That’s what we try to do.”

Nor do they yet enjoy the same ethical protections. When subjects drop out of an experiment, “you’re supposed to pay them proportional to the time they put in,” Gureckis says. But MTurk has no mechanism for partial pay: A Turker must complete an entire task or get no pay at all.

And although researchers are supposed to protect the identity of subjects, Gureckis says, “MTurk is not really anonymous.” In 2013, a research team showed that it is possible to match a Turker’s worker ID to their account on Amazon’s retail website. Depending on how much information is associated with that user profile, it can reveal a Turker’s buying habits, video tastes, and even their full name and location. Researchers have suggested ways to remedy this privacy issue, but the company so far seems to have taken no action. (Amazon declined a request for an interview with Science.)

Some researchers wonder whether the giant company will keep MTurk going—or whether social scientists need to develop their own customized alternative. “At this point, MTurk has become so important for social science that the National Science Foundation should be negotiating directly with Amazon,” Gureckis says. “We’re subsidizing this service with millions of dollars in federal grant money.”