Can the vocabulary of deceit reveal fraudulent studies?

Do scientific fraudsters have a distinct literary style? They do, suggests a new study from researchers who pay close attention to the vocabulary of deceit. But such language-analysis methods aren't yet a reliable fraud-busting tool, the researchers and others caution.

In their study, published in the Journal of Language and Social Psychology, David Markowitz and Jeffrey Hancock of Stanford University in Stanford, California, examined some 2 million words in 253 biomedical research papers that had been retracted between 1973 and 2013 because they contained fraudulent data. They compared that sample with 253 unretracted papers published in the same journals in roughly the same years, as well as with 62 papers that had been retracted for reasons other than fraud, such as authorship disputes.

Drawing on studies of the vocabulary of deception in the corporate finance world, the authors then computed an "obfuscation index" for each paper. Using automated word-analysis tools, they rated how abstract or concrete each text was and measured the frequency of certain word categories, including causal terms, positive emotion words, and technical jargon.
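The study doesn't publish its exact formula, but the general approach is dictionary-based scoring: count how often words from each category appear, normalize by document length, and combine the rates into a composite score. A minimal sketch of that idea follows; the word lists, the equal weighting, and the sign pattern are illustrative assumptions, not the authors' actual index.

```python
import re

# Hypothetical mini-dictionaries, standing in for the validated
# LIWC-style category lists a study like this would use.
CAUSAL = {"because", "cause", "effect", "hence", "therefore", "thus"}
POSITIVE_EMOTION = {"remarkable", "striking", "novel", "excellent", "robust"}
QUANTIFIERS = {"all", "many", "few", "several", "most", "some"}
JARGON = {"apoptosis", "immunoprecipitation", "phosphorylation"}

def category_rates(text: str) -> dict:
    """Return each category's share of the total word count."""
    words = re.findall(r"[a-z']+", text.lower())
    n = max(len(words), 1)
    rate = lambda vocab: sum(w in vocab for w in words) / n
    return {
        "causal": rate(CAUSAL),
        "positive_emotion": rate(POSITIVE_EMOTION),
        "quantifiers": rate(QUANTIFIERS),
        "jargon": rate(JARGON),
    }

def obfuscation_index(text: str) -> float:
    """Toy composite: higher when jargon-heavy and vague, lower when
    well quantified. The signs follow the trends the study reports;
    the equal weights are an assumption for illustration only."""
    r = category_rates(text)
    return r["jargon"] + r["causal"] + r["positive_emotion"] - r["quantifiers"]
```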

The deceitful papers tended to be vaguer, more difficult to read, and lighter on quantifying words, the authors found. They also used slightly more jargon: fraudulent publications contained 1.5% more technical terms. Another trend, Markowitz told ScienceInsider, is that spurious papers generally had more references. Such characteristics tend to make fraudulent papers more "convoluted" and "costly" in terms of readers' time, he says.

But linguistic sleuthing is still far from a perfect method for revealing fraud, Markowitz cautions. In particular, he notes that the team's method correctly classified a paper as fraudulent or genuine only 57.2% of the time.

That means "that almost half the legitimate articles would be improperly flagged" as fraudulent, notes Paul Ginsparg, a physicist at Cornell University and founder of the preprint server arXiv. "This is barely better than a coin flip," he says, and it makes the approach "unusable," at least in its current form, for publishers seeking to ferret out real-world fraud.
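To see why near-chance accuracy is unusable in practice, consider a back-of-the-envelope calculation. Because genuine fraud is rare, even a modest error rate swamps the true positives; the 2% base rate and the symmetric error assumption below are illustrative figures, not numbers from the study.

```python
# What 57.2% accuracy implies when fraud is rare.
# Assumptions (not from the study): errors fall equally on both
# classes, and 2% of papers are actually fraudulent.
accuracy = 0.572
error = 1 - accuracy            # ~42.8% of papers misclassified
fraud_rate = 0.02               # assumed base rate of fraud

flagged_true = fraud_rate * accuracy        # frauds correctly flagged
flagged_false = (1 - fraud_rate) * error    # genuine papers mis-flagged

precision = flagged_true / (flagged_true + flagged_false)
print(f"Share of flagged papers actually fraudulent: {precision:.1%}")
# -> roughly 3%: nearly every flag would be a false alarm.
```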

Ginsparg also notes that the authors suggest "that fraudulent authors intentionally use obfuscatory language." But an alternative explanation could be that fraudsters are "intrinsically as poor writers as they are scientists," he says.

“More needs to be done to improve this technique if it is to become a reliable method,” adds James Parry, chief executive of the United Kingdom’s Research Integrity Office in London. In the long run, he says, the scientific community should probably focus more effort on preventing research misconduct in the first place, although post-publication “detection will always remain important.”