In 2010, Ivan Oransky and Adam Marcus, two longtime health journalists, began to keep a list of retracted scholarly articles as a source of story ideas for Retraction Watch, the scrappy New York City–based blog they launched that year.
They found many more retraction notices than they had time to mention on their blog, and more than had been reported elsewhere. Soon, academics and journals were asking for Oransky’s and Marcus’s data, and they decided to create a more comprehensive database.
The beta version of that database debuted in 2016 with financial support from the John D. and Catherine T. MacArthur Foundation in Chicago, Illinois, and the Laura and John Arnold Foundation in Houston, Texas. This week, Retraction Watch released a comprehensive version of the database online for public use; it has also made the entire data set available to researchers who wish to study it.
Although other organizations maintain databases that track retractions, Retraction Watch’s database is the largest, with more than 18,000 entries, and it covers more journals. (The National Library of Medicine’s PubMed database, in contrast, contains only about one-third as many retractions and includes only biomedical journals.) All the retracted papers appeared in peer-reviewed publications.
Retraction Watch staff members actively search for retraction notices instead of relying only on notices provided by journal publishers. The database may be missing some retracted papers because publishers do not always mark them well enough to be found. Retraction Watch staffers use a taxonomy they developed to record the reason for each retraction, although publishers do not always clearly or accurately report reasons for retractions.
Science performed all data analyses in consultation with Retraction Watch, using data drawn from its database on 30 August.
In analyzing the data, we made several choices:
- We excluded conference abstracts. The database records retractions of nearly 7500 conference paper abstracts, which account for about 40% of all the retractions it contains. Those abstracts and the associated papers are not representative of conventional journal articles, we believe, in part because conference papers in some fields typically receive less rigorous review than traditional journal articles do. (Authors in China wrote most of the retracted conference abstracts. Had we included those retractions in our analyses, China’s retraction rate would have been among the world’s highest.)
- We counted each mention of an author or country as a separate retraction. Almost all papers have multiple authors; a minority of papers have authors in multiple countries. In such cases, the paper was counted once for each author or country in our analyses of retractions by author and by country.
- When ranking the authors with the most retractions, we omitted co-authors who published most of their papers with first or last authors who had more retractions overall.
- We analyzed retractions according to when the original paper was published, not the year the retraction notice appeared. Using the paper’s publication year may be more meaningful because a journal can take an arbitrarily long time to publish a retraction—sometimes many years.
- We used the total number of scientific publications reported by the National Science Foundation (NSF), which are based on data in the Scopus database, as the denominator in our calculations of retraction rates. The NSF’s time series is not considered reliable for comparisons involving years before 2003. The data include only publications with English-language titles and abstracts.
- We grouped the reasons for retraction by broad categories so that we could report overall numbers and trends by year. Most retractions listed multiple reasons, but we assigned each retraction to a single, mutually exclusive category:
  - “Fraud”: retractions attributed to scientific fraud as the U.S. government defines it (fabrication, falsification, or plagiarism).
  - “Other misconduct”: other kinds of deliberate, unethical behavior that did not match the U.S. definition.
  - “Possible misconduct”: fraud or misconduct was implied, but not specifically characterized.
  - “Reliability”: problems with reproducibility or reliability, excluding fraud and other misconduct.
  - “Error”: errors, excluding the categories above.
  - “Miscellaneous”: retractions that had other specific reasons, or none, excluding the categories above.
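
For readers who want to reproduce this kind of tally, the choices listed above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the record fields (`countries`, `is_conference_abstract`, `reasons`), the function names, and the toy records are all hypothetical, and the precedence order among categories is an assumption inferred from the "excluding the categories above" wording of the definitions.

```python
from collections import Counter

# Assumed precedence: a retraction with several reasons is assigned to the
# first category in this list that applies. The order is inferred from the
# category definitions, not confirmed by the source.
CATEGORY_ORDER = [
    "Fraud",
    "Other misconduct",
    "Possible misconduct",
    "Reliability",
    "Error",
]


def assign_category(reasons):
    """Return one mutually exclusive category for a set of reason labels."""
    for category in CATEGORY_ORDER:
        if category in reasons:
            return category
    # Other specific reasons, or no stated reason at all.
    return "Miscellaneous"


def count_by_country(records):
    """Count retractions per country, skipping conference abstracts.

    A paper with authors in several countries is counted once per country.
    """
    counts = Counter()
    for record in records:
        if record["is_conference_abstract"]:
            continue
        for country in set(record["countries"]):
            counts[country] += 1
    return counts


def retraction_rate(retractions, total_papers, per=10_000):
    """Retractions per `per` published papers (the denominator is total output)."""
    return retractions * per / total_papers


# Made-up records for illustration only; they are not drawn from the database.
TOY_RECORDS = [
    {"countries": ["US", "Japan"], "is_conference_abstract": False,
     "reasons": {"Error", "Fraud"}},
    {"countries": ["US"], "is_conference_abstract": False,
     "reasons": {"Reliability"}},
    {"countries": ["China"], "is_conference_abstract": True,
     "reasons": set()},
]

if __name__ == "__main__":
    print(count_by_country(TOY_RECORDS))
    print([assign_category(r["reasons"]) for r in TOY_RECORDS])
    print(retraction_rate(4, 10_000))
```

Counting a multicountry paper once per country means the per-country totals can sum to more than the number of retracted papers, which is the intended effect of the second choice in the list.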