Fraud-Detection Tool Could Shake Up Psychology

AMSTERDAM—The most startling thing about the latest scandal to hit social psychology isn’t the alleged violation of scientific ethics itself, scientists say, or the fact that it happened in the Netherlands, the home of fallen research star and serial fraudster Diederik Stapel, whose case shook the field to its core less than a year ago. Instead, what fascinates them most is how the new case, which led to the resignation of psychologist Dirk Smeesters of Erasmus University Rotterdam and the requested retraction of two of his papers by his school, came to light: through an unpublished statistical method to detect data fraud.

The technique was developed by Uri Simonsohn, a social psychologist at the Wharton School of the University of Pennsylvania, who tells Science that he has also notified a U.S. university of a psychology paper his method flagged.

That paper’s main author, too, has been investigated and has resigned, he says. As Science went to press, Simonsohn said he planned to reveal details about his method, and both cases, as early as this week.

If it proves valid, Simonsohn’s technique might find other possible cases of misconduct lurking in the vast body of scientific literature. “There’s a lot of interest in this,” says Brian Nosek of the University of Virginia in Charlottesville, who recently launched an examination of replicability in social psychology findings (Science, 30 March, p. 1558).

The method may help the field of psychological science clean up its act and restore its credibility, he adds—but it may also turn colleagues into adversaries and destroy careers. The field will need ample debate on how to use it, Nosek says, much the way physicists had to grapple with the advent of nuclear physics. “This is psychology’s atomic bomb,” he says.

Simonsohn already created a stir last year with a paper in Psychological Science showing that it’s “unacceptably easy” to prove almost anything using common ways to massage data and suggesting that a large proportion of papers in the field may be false positives. He first contacted Smeesters on 29 August 2011 about a paper on the psychological effects of color, published earlier that year. The two corresponded for months, and Smeesters sent Simonsohn the underlying data file on 30 November. Smeesters also informed a university official about the exchange. Simonsohn says he was then contacted by the university.

An investigative commission set up in January by Erasmus reviewed Simonsohn’s unpublished method with the help of two outside statisticians and concluded it was valid. When the panel subjected the rest of Smeesters’s work to the same method, it found two more papers, one of them unpublished, that failed the test. Digging further, the commission reports, it discovered other problems. For instance, Smeesters said he no longer had the raw data for the three papers because of a computer crash at his home in September (which he had mentioned on Facebook), while versions of the data recorded on paper were lost when he moved offices. In a report released on 25 June, the commission said it couldn’t prove that Smeesters had committed fraud, but it doubted the credibility of his explanations for the loss of data, and it had “no confidence in the scientific integrity” of the three papers.

Smeesters, who insists he did not make up data, tells Science that he stepped down in January for medical reasons, including feeling burned out and having a bad knee. (The university officially accepted his resignation on 21 June, 4 days before it released the investigation commission’s report.) Smeesters says he can’t judge Simonsohn’s method because he’s not a statistician. But he says that the odd data patterns found by Simonsohn emerged because of what he calls “questionable research practices” that aren’t uncommon in his field, such as doing multiple analyses and picking the most convincing one, or leaving out certain subjects. Simonsohn says that explanation doesn’t add up: “The raw data he sent me is not consistent with dropping observations selectively, but they are consistent with data tampering.”

News of the affair had psychologists and statisticians yearning for details of Simonsohn’s method. Initially, his name and details of his work were redacted from the released university report because his study remains unpublished. Three days later, with Simonsohn’s permission, the university released an unredacted version, which offers more clues. According to the report, Simonsohn’s method rests on the assumption that if a study measures the same variable in groups that supposedly are independent samples from the same population, sampling error alone will produce a certain amount of variation between the means of those groups; if a paper reports means that vary too little, that may be a sign that the data have been manipulated. A simulation can estimate how likely it is that means so similar would arise by chance in a given study.
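The logic described in the report can be illustrated with a short simulation. The Python sketch below is not Simonsohn’s procedure, which remained unpublished at the time; it is a minimal illustration that assumes normally distributed data, equal group sizes, and a single shared population mean. The function name, its parameters, and the example numbers are all invented for illustration.

```python
import random
import statistics


def prob_means_this_similar(means, sds, n_per_group, n_sims=5000, seed=0):
    """Estimate how often chance alone yields group means at least as
    similar as the reported ones.

    Assumptions (hypothetical, not taken from the university report):
    every group is an independent normal sample from one population whose
    mean is the average of the reported means, and each group's standard
    deviation equals the reported one. Needs at least two groups.
    """
    rng = random.Random(seed)
    observed_spread = statistics.stdev(means)  # spread of the reported means
    common_mean = statistics.mean(means)
    hits = 0
    for _ in range(n_sims):
        sim_means = []
        for sd in sds:
            # Draw one simulated group and record its mean.
            sample = [rng.gauss(common_mean, sd) for _ in range(n_per_group)]
            sim_means.append(statistics.mean(sample))
        # Count simulations whose means are as close together as reported.
        if statistics.stdev(sim_means) <= observed_spread:
            hits += 1
    return hits / n_sims


# Invented numbers: three groups of 15 subjects, each with SD 2.0.
suspicious = prob_means_this_similar([5.01, 5.02, 5.00], [2.0, 2.0, 2.0], 15)
ordinary = prob_means_this_similar([4.0, 5.0, 6.0], [2.0, 2.0, 2.0], 15)
print("nearly identical means:", suspicious)
print("clearly different means:", ordinary)
```

A very small value for the first call means that group means this close together would rarely arise from sampling error under the stated assumptions. By itself that is a reason to look more closely, not proof of tampering.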

Simonsohn says it would have been unethical for him to release his paper—and accuse Smeesters and the U.S. scientist—before the universities had finished investigating. The full paper and supplemental materials will clarify the method, he says.

“Uri will be vindicated when more details are made public,” Phil Fernbach, a marketing researcher at the University of Colorado, Boulder, predicted in a comment on Science’s website last week. “He is a careful scientist who would never make such serious accusations if the evidence weren’t overwhelming.”

Based on details in the university report, Richard Gill, a statistician at Leiden University in the Netherlands, carried out some data simulations following what he believes is Simonsohn’s procedure. On his own blog, Gill concluded that “Simonsohn has probably found a useful investigative tool,” but he criticized details of the way the university commission had applied it to Smeesters’s work; he said it was akin to a “medieval instrument of torture: the accused is forced to confess by being subjected to an onslaught of vicious p-values which he does not understand.”

Simonsohn says he has not read the analysis but objects to Gill’s language. “If it wasn’t targeted towards people trying to reduce fraud in science, the sophomoric tone would be amusing,” he says. Geurt Jongbloed, a statistician at Delft University of Technology in the Netherlands, shares some of Gill’s concerns but says that without additional information, it’s impossible to judge the validity of the new method.

At the moment, scientists don’t know the proportion of papers Simonsohn’s tool might be applicable to or how sensitive it is. Even if it is very good and wrongly indicts only one in 10,000 papers, it would still misfire on some honest papers when applied widely, says psychologist Eric-Jan Wagenmakers of the University of Amsterdam. And some worry it could lead to witch hunts or score settling between rivals, Nosek says. Harvard University psychologist Daniel Gilbert says AAAS (the publisher of Science) should have top statisticians study Simonsohn’s method and recommend how to use it. “If we really do have a new tool to uncover fraud, then we should be grateful,” Gilbert says. “But the only difference between a tool and a weapon is in how judiciously they are wielded, and we need to be sure that this tool is used properly, fairly, and wisely.”