
Now free: citation data from 14 million papers, and more might come

Consider this: A scientist publishes a study citing other papers. Those cited papers, in turn, cite studies that came before them. But much of that citation information—which is often of great interest to scientists tracking research trends and hot topics—has not been available freely.

Enter the Initiative for Open Citations (I4OC), a project aiming to make citation data free to all, formally announced today by six organizations, including the Wikimedia Foundation, publisher Public Library of Science, and the open-access journal eLife. So far, the initiative has partnered with 29 journal publishers to enable anyone to access citation data from about 14 million papers indexed by Crossref, a nonprofit collaboration that promotes the sharing of scholarly information. And more publishers are likely to sign on, says Mark Patterson, executive director of eLife, in Cambridge, U.K.

Conversations about opening up citation data initially took place this past September at the eighth Conference on Open Access Scholarly Publishing, in response to a report that found that just 3% of the almost 1000 publishers depositing data with Crossref were making their citation data open. In practice, that meant citation data were available for just 1% of the roughly 35 million papers on Crossref, says Dario Taraborelli, head of research at the Wikimedia Foundation in San Francisco, California.

Now, that share has risen to more than 40% of Crossref papers as a result of I4OC’s efforts. Even some publishers that traditionally charge subscriptions to read their journals, including Taylor & Francis and Wiley-Blackwell, have jumped on board.

Citation data are already available for a fee from other providers, including Clarivate Analytics’s Web of Science and publishing giant Elsevier’s Scopus. And Google Scholar allows users to see citation data but not reuse them. In contrast, I4OC will allow users to freely access and reuse citation data under CC0, a public-domain dedication that waives copyright restrictions and is the most permissive of the Creative Commons options.

Freeing up citation data could have a number of benefits, I4OC’s founders say. One of the most basic is helping scientists keep abreast of what their peers are doing and reading. “Citation networks form the fabric that connects scientific knowledge” and “are essential to attribute credit to those scientists who first described a finding,” says Bernd Pulverer, head of scientific publishing at EMBO Press in Heidelberg, Germany, which is participating in I4OC.

Citation data can also open a window into how ideas and research fields evolve, Patterson says. And they “will help all funders better evaluate the research they fund whilst also providing the opportunity for others to build new tools and services to fully explore this rich graph of knowledge,” says Robert Kiley, head of open research at the Wellcome Trust in London, a major research funder backing I4OC.

The Wellcome Trust and the Bill & Melinda Gates Foundation are two of 33 nonpublisher stakeholders involved in the I4OC project. The six main founders also include the data repositories DataCite and OpenCitations, and the Centre for Culture and Technology at Curtin University in Perth, Australia.