Introducing the Open Science Chain: Protecting Integrity and Provenance of Research Data

Publication Type: Conference Paper
Year of Publication: 2019
Authors: Sivagnanam S, Nandigam V, Lin K
Conference Name: Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (Learning)
Publisher: Association for Computing Machinery
Conference Location: New York, NY, USA
ISBN Number: 9781450372275
Keywords: Blockchain, Cryptography, Data Integrity, Data Provenance, Data Reproducibility, Distributed Ledger Technology

Data sharing is an integral component of research and academic publications, allowing for independent verification of results. Researchers can extend and build upon prior research when they can efficiently access, validate, and verify the data referenced in publications. Despite the well-known benefits of making research data more open, data withholding rates have remained constant. Disincentives to sharing research data include lack of credit and fear that data will be misrepresented in the absence of context and provenance. While several research data sharing repositories focus on making research data available, there are no cyberinfrastructure platforms that enable researchers to efficiently validate the authenticity of datasets, track their provenance, view their lineage, and verify ownership information. In this paper, we introduce and provide an overview of the NSF-funded Open Science Chain, a cyberinfrastructure platform built using blockchain technologies that securely stores metadata and verification information about research data and tracks changes to that data in an auditable manner, in order to address issues related to reproducibility and accountability in scientific research.
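
The abstract does not describe how the Open Science Chain is implemented. As a rough illustration of the general idea it names (storing verification information about a dataset and tracking changes in an auditable way), the Python sketch below keeps an append-only, hash-linked log of dataset fingerprints. The class `ProvenanceLog`, its methods, and the field names are hypothetical and chosen only for this example; they are not the platform's API, and a single in-memory list stands in for a distributed ledger.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


class ProvenanceLog:
    """Append-only, hash-linked log of dataset records (illustrative only)."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _hash_body(body: dict) -> str:
        # Serialize with sorted keys so the same body always hashes the same way.
        return sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))

    def append(self, owner: str, dataset: bytes, note: str) -> dict:
        """Record the fingerprint of a dataset version, linked to the prior entry."""
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else self.GENESIS
        body = {
            "owner": owner,
            "data_hash": sha256_hex(dataset),  # only the hash is stored, not the data
            "note": note,
            "prev_hash": prev_hash,
        }
        entry = dict(body, entry_hash=self._hash_body(body))
        self.entries.append(entry)
        return entry

    def is_consistent(self) -> bool:
        """Check that no entry was altered and that the links are unbroken."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if body["prev_hash"] != prev_hash:
                return False
            if self._hash_body(body) != entry["entry_hash"]:
                return False
            prev_hash = entry["entry_hash"]
        return True

    def matches_latest(self, dataset: bytes) -> bool:
        """Check whether a copy of the data matches the most recent record."""
        return bool(self.entries) and self.entries[-1]["data_hash"] == sha256_hex(dataset)
```

In this sketch, a researcher would call `append` once per dataset version, and anyone holding a copy of the data could call `matches_latest` to check it against the recorded fingerprint. Editing an earlier entry changes its hash and causes `is_consistent` to fail, which is the property that makes the history auditable.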