Experimental Data Curation at Large Instrument Facilities with Open Source Software
The National Synchrotron Light Source II operating at Brookhaven National Laboratory since 2014 for the US Department of Energy is one of the newest and brightest storage-ring synchrotron facility in the world. NSLS-II, like other facilities, provides pre-processing of the raw data and some analysis capabilities to its users. We describe the research collaborations and open source infrastructure developed at large instrument facilities such as NSLS-II for the purpose of curating high value scientific data along the early stages of the data lifecycle. Data acquisition and curation tasks include storing experiment configuration, detector metadata, raw data acquisition with infrastructure that converts proprietary instrument formats to industry standards. In addition, we describe a specific effort for discovering sample information at NSLS-II and tracing the provenance of analysis performed on acquired images. We show that curation tasks must be embedded into software along the data life cycle for effectiveness and ease of use, and that loosely defined collaborations evolve around shared open source tools. Finally we discuss best practices for experimental metadata capture in such facilities, data access and the new challenges of scale and complexity posed by AI-based discovery for the synthesis of new materials.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright for papers and articles published in this journal is retained by the authors, with first publication rights granted to the University of Edinburgh. It is a condition of publication that authors license their paper or article under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence.
U.S. Department of Energy
Grant numbers DESC0012704