OpenCitations Meta: Conclusion, Acknowledgements, and References

3 Jun 2024


(1) Arcangelo Massari, Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy;

(2) Fabio Mariani, Institute of Philosophy and Sciences of Art, Leuphana University, Lüneburg, Germany;

(3) Ivan Heibi, Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy and Digital Humanities Advanced Research Centre (/DH.arc), Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy;

(4) Silvio Peroni, Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy and Digital Humanities Advanced Research Centre (/DH.arc), Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy;

(5) David Shotton, Oxford e-Research Centre, University of Oxford, Oxford, United Kingdom.

6. Conclusion

This article detailed the methodology used to develop OpenCitations Meta, a database that stores and delivers bibliographic metadata for all publications involved in the OpenCitations Indexes. The process involves two main phases: (1) automatic curation, which deduplicates entities, corrects errors and enriches information, and (2) conversion of the data to RDF, with changes and provenance also tracked in RDF.

Information about new publications is continuously being added to Crossref, DataCite, and PubMed, and we will develop procedures to ingest these new metadata into OpenCitations Meta in a regular and timely manner. Furthermore, work is already underway to ingest bibliographic metadata from the Japan Link Center and the OpenAIRE Research Graph, and other sources will be included as our human and computational resources permit. OpenCitations Meta will thus continue to grow.

OpenCitations Meta has three major benefits. First, the use of OMIDs (OpenCitations Meta Identifiers) for all stored entities enables OpenCitations Meta to act as a mapping hub for publications that may have more than one external PID (for example, a journal article described in Crossref with a DOI (Digital Object Identifier) and the same publication described in PubMed with a PMID (PubMed Identifier)), while also making it possible to characterise citations involving resources that lack any external PID. Second, and as a consequence, OpenCitations Meta allows citations in the OpenCitations Indexes to be described as OMID-to-OMID, disambiguating citations between documents identified under different schemes, e.g. those represented as DOI-to-DOI in Crossref and as PMID-to-PMID in PubMed. Third, OpenCitations Meta speeds up search operations that retrieve metadata on the publications involved in the citations stored in the OpenCitations Indexes, since these metadata are now kept in-house rather than being retrieved by on-the-fly API calls to external resources.
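The mapping-hub idea can be illustrated with a small sketch. This is not the OpenCitations Meta implementation: the `OmidRegistry` class, the `to_omid_citation` helper, the `omid:br/N` strings and all DOIs and PMIDs below are hypothetical placeholders, and the registry simply raises an error where a real system would have to merge conflicting entities.

```python
from itertools import count


class OmidRegistry:
    """Toy in-memory hub mapping external PIDs to one internal identifier.

    Hypothetical illustration only; the identifier format is made up.
    """

    def __init__(self):
        self._by_pid = {}          # external PID -> internal identifier
        self._counter = count(1)   # source of fresh internal identifiers

    def resolve(self, *pids):
        """Return the identifier shared by all given PIDs, minting one if none is known.

        Calling it with no PIDs mints an identifier for a resource that has
        no external PID at all.
        """
        known = {self._by_pid[p] for p in pids if p in self._by_pid}
        if len(known) > 1:
            # A real system would merge the entities; this sketch just refuses.
            raise ValueError(f"PIDs point to different entities: {sorted(known)}")
        omid = known.pop() if known else f"omid:br/{next(self._counter)}"
        for p in pids:
            self._by_pid[p] = omid
        return omid


def to_omid_citation(registry, citing_pid, cited_pid):
    """Rewrite a PID-to-PID citation as a pair of internal identifiers."""
    return (registry.resolve(citing_pid), registry.resolve(cited_pid))


registry = OmidRegistry()
# Two articles, each known under both a DOI and a PMID (made-up values).
registry.resolve("doi:10.1000/a", "pmid:111")
registry.resolve("doi:10.1000/b", "pmid:222")
# A resource with no external PID still receives an internal identifier.
orphan = registry.resolve()

# The same citation as seen through DOIs and through PMIDs.
doi_view = to_omid_citation(registry, "doi:10.1000/a", "doi:10.1000/b")
pmid_view = to_omid_citation(registry, "pmid:111", "pmid:222")
same_citation = doi_view == pmid_view
```

Under these assumptions the DOI-to-DOI and PMID-to-PMID forms of the citation collapse to one internal pair, which is the sense in which an OMID-to-OMID description disambiguates citations across identifier schemes.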

Future challenges are to develop a disambiguation system for people lacking an ORCID identifier, to improve the quality of the existing metadata, to enhance search operations and storage efficiency, and to add metadata fields for abstracts, funder IDs, funding information, and institutional identifiers, populating them wherever these metadata are available from our sources.

Finally, an interface will be implemented and made available to trusted domain experts to permit direct real-time manual curation of metadata held by OpenCitations Meta. Such a system will track changes and provenance, will preserve the delta between different versions of each entity, and will retain information such as the agent responsible for the change, the primary source, and the date. In this way, we will strive to make OpenCitations Meta not only comprehensive but also an accurate and fully open and reusable source of bibliographic metadata to which members of the scholarly community can directly contribute.
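As a rough illustration of what preserving a delta together with its provenance could look like, the sketch below records one change to an entity as a snapshot. It is an assumption-laden toy, not the planned interface: the `Snapshot` fields, the `record_change` and `revert` helpers, and every value used are hypothetical, and metadata are modelled as plain dictionaries rather than RDF.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

_MISSING = object()  # sentinel so that a stored None still counts as a value


@dataclass(frozen=True)
class Snapshot:
    """One recorded change to an entity (hypothetical structure)."""
    entity_id: str
    agent: str       # who made the change
    source: str      # primary source of the new data
    timestamp: str   # ISO 8601 date-time of the change
    removed: dict    # field -> value present before the change only
    added: dict      # field -> value present after the change only


def record_change(entity_id, old, new, agent, source, when=None):
    """Compute the delta between two metadata dicts and wrap it with provenance."""
    removed = {k: v for k, v in old.items() if new.get(k, _MISSING) != v}
    added = {k: v for k, v in new.items() if old.get(k, _MISSING) != v}
    when = when or datetime.now(timezone.utc)
    return Snapshot(entity_id, agent, source, when.isoformat(), removed, added)


def revert(current, snapshot):
    """Rebuild the previous version by applying the delta in reverse."""
    previous = {k: v for k, v in current.items() if k not in snapshot.added}
    previous.update(snapshot.removed)
    return previous


old = {"title": "A", "year": "2020"}
new = {"title": "B", "year": "2020", "venue": "J"}
snap = record_change(
    "omid:br/1", old, new,
    agent="curator:example",    # made-up agent identifier
    source="manual-curation",   # made-up source label
    when=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
restored = revert(new, snap)
```

Storing only the removed and added values, rather than a full copy of each version, keeps the history compact while still allowing any earlier state to be rebuilt by walking the snapshots backwards.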

7. Acknowledgements

This work has been partially funded by the European Union’s Horizon 2020 Research and Innovation Program under grant agreement No 101017452 (OpenAIRE-Nexus Project).


This paper is available on arXiv under a CC 4.0 DEED license.