Fraud Detection Method Called Credible But Used Like a 'Medieval Instrument of Torture'

Has Uri Simonsohn, a social psychologist at the Wharton School of the University of Pennsylvania, discovered a new way to detect scientific fraud simply by subjecting the data in published papers to a novel type of statistical analysis? Social psychologists and statisticians are asking that question after an investigative commission at Erasmus University Rotterdam in the Netherlands used his unpublished technique to probe the work of marketing researcher Dirk Smeesters, an inquiry that led to Smeesters's resignation and to a request by the university to retract two of his papers. Smeesters continues to deny that he committed scientific fraud, however, and at least one statistician who has examined Simonsohn's method says the technique appears to have merit but was used in the wrong way.

"My overall opinion is that Simonsohn has probably found a useful investigative tool," mathematical statistician Richard Gill of Leiden University in the Netherlands wrote on his blog today after ScienceInsider asked him to evaluate the Erasmus commission's report, which contains the only public description of Simonsohn's method. "In this case it has been used like a medieval instrument of torture: the accused is forced to confess by being subjected to an onslaught of vicious p-values which he does not understand."

Simonsohn first contacted Smeesters on 29 August 2011 to ask him about suspicious patterns among the data in a paper on the effects of color published in the Journal of Experimental Social Psychology, according to the investigative commission's report, which was released earlier this week by the university. Later, Simonsohn also informed Stijn van Osselaer, chair of the university's Rotterdam School of Management (RSM) marketing department, about the problem. The report says Simonsohn plans to publish the case in a paper, which is currently in preparation, called "Finding Fake Data: Four True Stories, Some Stats and a Call for Journals to Post All Data."

After consulting two Dutch statisticians about Simonsohn's method, the commission—which included Erasmus statistician Patrick Groenen—concluded that it was valid. The university investigators also randomly picked four other articles from the journals Smeesters published in "to see if the patterns found by Simonsohn surface with other, randomly selected authors," and they did not find similar irregularities. The commission then subjected those studies for which Smeesters had sole control over the data to similar tests.

Because Simonsohn's method has not been published, Gill and other scientists have so far had to rely on the commission's report in their attempts to evaluate it. Details of the method are not easy for non-statisticians to follow, but Gill summarizes it in his blog this way:

Simonsohn's idea [according to the report by the investigation commission] is that if extreme data has been removed in an attempt to decrease variance and hence boost significance, the variance of sample averages will decrease. Now researchers in social psychology typically report averages, sample variances, and sample sizes of subgroups of their respondents, where the groups are defined partly by an intervention (treatment/control) and partly by covariates (age, sex, education ...). So if some of the covariates can be assumed to have no effect at all, we effectively have replications: i.e., we see group averages, sample variances, and sample sizes, of a number of groups whose true means can be assumed to be equal. Simonsohn's test statistic for testing the null-hypothesis of honesty versus the alternative of dishonesty is the sample variance of the reported averages of groups whose mean can be assumed to be equal. The null distribution of this statistic is estimated by a simulation experiment, by which I suppose is meant a parametric bootstrap.
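The test as Gill describes it can be sketched in a few lines of Python. This is a minimal illustration reconstructed from his summary, not Simonsohn's unpublished code: the function name variance_of_means_test and the example numbers are hypothetical, and the sketch assumes normally distributed responses, groups whose true means are equal, and reported standard deviations that can stand in for the true ones. Because removing data is supposed to make the averages too similar, the p-value counts how often simulated honest data produce a variance of group means at least as small as the reported one.

```python
import random
import statistics


def variance_of_means_test(means, sds, ns, n_sim=10_000, seed=0):
    """Left-tail parametric-bootstrap test for group averages that agree too well.

    Hypothetical helper reconstructed from Gill's summary. Assumes normal data,
    equal true means across groups, and reported SDs used as the true SDs.
    Returns (observed variance of the means, bootstrap p-value).
    """
    rng = random.Random(seed)
    # Under the null every group shares one true mean; estimate it by the
    # sample-size-weighted average of the reported means.
    grand_mean = sum(m * n for m, n in zip(means, ns)) / sum(ns)
    observed = statistics.variance(means)

    as_small = 0
    for _ in range(n_sim):
        # The mean of n normal draws is Normal(grand_mean, sd / sqrt(n)),
        # so simulate each group's average directly.
        sim = [rng.gauss(grand_mean, sd / n ** 0.5) for sd, n in zip(sds, ns)]
        if statistics.variance(sim) <= observed:
            as_small += 1
    # A small p-value means honest sampling rarely yields averages this similar.
    return observed, as_small / n_sim


# Made-up example: four averages that differ by only a few hundredths.
observed, p = variance_of_means_test(
    means=[5.00, 5.02, 4.99, 5.01], sds=[1.2, 1.1, 1.3, 1.2], ns=[15, 15, 15, 15]
)
print(f"variance of means = {observed:.5f}, p = {p:.4f}")
```

With these made-up averages, which differ by a few hundredths of a point although each has a standard error of roughly 0.3, almost none of the simulated honest data sets come out as tightly clustered, so the p-value is close to zero.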

Gill went on to run a quick simulation experiment in which he applied Simonsohn's method to "honest" and "dishonest" data, and concluded that the result "supports the principle which Simonsohn has discovered."
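Gill's simulation setup is not described here, so the sketch below is only a toy version of the same kind of experiment. Everything in it is an assumption: normally distributed responses with equal true means in every group; a hypothetical "massaging" rule that repeatedly discards the most extreme observation on whichever side a group's average has drifted to; and, in place of a full bootstrap test, a simple summary (the observed variance of the group means divided by the variance that the reported standard deviations and sample sizes predict). The names simulate_groups and dispersion_ratio are illustrative.

```python
import random
import statistics


def simulate_groups(k, n, rng, massage=False, drop=3, target=0.0):
    """Simulate k groups of n Normal(target, 1) responses; return (means, sds, ns).

    massage=True applies a hypothetical rule: `drop` times per group, discard
    the most extreme point on whichever side the group mean has drifted to.
    """
    means, sds, ns = [], [], []
    for _ in range(k):
        xs = [rng.gauss(target, 1.0) for _ in range(n)]
        if massage:
            for _ in range(drop):
                # Removing the max lowers the mean; removing the min raises it.
                xs.remove(max(xs) if statistics.fmean(xs) > target else min(xs))
        means.append(statistics.fmean(xs))
        sds.append(statistics.stdev(xs))
        ns.append(len(xs))
    return means, sds, ns


def dispersion_ratio(means, sds, ns):
    """Observed variance of the group means over the variance honest sampling predicts."""
    expected = statistics.fmean(sd ** 2 / n for sd, n in zip(sds, ns))
    return statistics.variance(means) / expected


rng = random.Random(1)
honest = dispersion_ratio(*simulate_groups(200, 20, rng))
massaged = dispersion_ratio(*simulate_groups(200, 20, rng, massage=True))
print(f"honest: {honest:.2f}  massaged: {massaged:.2f}")
```

For honest data the ratio should hover around 1; under this massaging rule it falls well below 1, which is the kind of signature Simonsohn's test looks for. A rule that trimmed both tails regardless of where the mean had landed would not necessarily produce the same signature, because the reported standard deviations would shrink along with the spread of the averages.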

However, Gill raises several criticisms. It's not clear whether Simonsohn was on a cherry-picking expedition, analyzing hundreds of papers by many scientists and following up only on the most suspicious-looking ones; if so, Smeesters's work might have been flagged because of a statistical fluke. Gill also objects to the way the university commission corrected for having tested the same data in multiple ways, using a procedure based on the so-called false discovery rate (FDR). He writes:

In my opinion, adjustment of p-values by pFDR methodology is absolutely inappropriate in this case. It includes a guess or an estimate of the a priori "proportion of null hypotheses to be tested which are actually false". Thus it includes a "presumption of guilt"!
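Gill's objection can be illustrated with two textbook adjustments. The commission's exact pFDR computation is not given here, so this is a sketch under that caveat: fdr_adjust is a hypothetical helper that returns Benjamini-Hochberg adjusted p-values when pi0 = 1, and Storey-style q-values when pi0, the assumed share of tested null hypotheses that are true, is set lower. In Storey's method pi0 is estimated from the data; here it is passed in by hand to isolate its effect, and the p-values are made up.

```python
def fdr_adjust(pvalues, pi0=1.0):
    """Step-up FDR adjustment scaled by an assumed share pi0 of true null hypotheses.

    pi0=1.0 gives Benjamini-Hochberg adjusted p-values; pi0 < 1 gives
    Storey-style q-values, which are smaller by exactly that factor
    (before capping at 1).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that a smaller raw p-value
    # never receives a larger adjusted value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.02, 0.03, 0.04, 0.20]
print([round(q, 3) for q in fdr_adjust(p)])           # [0.05, 0.05, 0.05, 0.05, 0.2]
print([round(q, 3) for q in fdr_adjust(p, pi0=0.5)])  # [0.025, 0.025, 0.025, 0.025, 0.1]
```

Halving pi0 halves every adjusted p-value: the more strongly one presumes that some of the tested hypotheses of honesty are false, the more damning the same raw p-values look. That built-in prior is the "presumption of guilt" Gill objects to.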

Gill suggests that Smeesters had little defense to offer against the commission's statistical analysis because he had other problems; for instance, he could not provide raw data for any of the controversial studies. But applied indiscriminately, Simonsohn's method could ensnare scientists acting in good faith, Gill worries.

Simonsohn did not respond to multiple requests for comment, and a press officer at Erasmus University said that Rolf Zwaan, chair of the commission that investigated Smeesters, "will not respond to blog posts."

Smeesters himself could not be reached for comment by ScienceInsider, but he defended himself in an interview with Dutch newspaper Algemeen Dagblad published on Tuesday. "I'm no Stapel," he was quoted as saying, in a reference to fellow social psychologist Diederik Stapel, who has admitted to fabricating results. "I didn't make up data. I committed a scientific error. It hurts to be portrayed like this."

Smeesters repeated in the interview what he had told the university: that he had engaged only in so-called data massaging, a "large grey area" in his field, and that the raw data for some of his experiments were lost when his home computer crashed. Paper records of the studies, he added, also disappeared when he moved his office. "It doesn't help in the public image," Smeesters conceded in the interview. "But it really happened." (The commission said it had "doubts about the credibility of these reasons.")

The affair is the third high-profile misconduct case in the Netherlands in less than a year, and the second in Rotterdam. In November of 2011, 2 months after the Stapel affair broke, Erasmus MC, a hospital affiliated with the university, fired cardiologist Don Poldermans, the author of more than 600 publications, after an investigation revealed that he had made up data, did not have written consent from some patients included in studies, and submitted two reports to meetings based on data he knew to be unreliable.