International Journal of Digital Curation <p>The IJDC publishes pre-prints, research papers, general articles and editorials on digital curation, research data management and related issues. It complements the International Conference on Digital Curation (IDCC) and includes selected proceedings as Conference Papers.</p> University of Edinburgh en-US International Journal of Digital Curation 1746-8256 <p>Copyright for papers and articles published in this journal is retained by the authors, with first publication rights granted to the University of Edinburgh. It is a condition of publication that authors license their paper or article under a <a href="">Creative Commons Attribution 4.0 International (CC BY 4.0)</a> licence.<br><br><a href="" rel="license"><img style="border-width: 0;" src="" alt="Creative Commons License"></a></p> Identifying Opportunities for Collective Curation During Archaeological Excavations <p>Archaeological excavations comprise interdisciplinary teams that create, manage, and share data as they unearth and analyse material culture. These team-based settings are ripe for collective curation during these data lifecycle stages. However, findings from four excavation sites show that the data interdisciplinary teams create are not well integrated. Knowing this, we recommend opportunities for collective curation to improve use and reuse of the data within and outside of the team.</p> Ixchel Faniel Anne Austin Sarah Whitcher Kansa Eric Kansa Jennifer Jacobs Phoebe France ##submission.copyrightStatement## 2021-04-18 2021-04-18 16 1 17 17 10.2218/ijdc.v16i1.742 Cross-tier Web Programming for Curated Databases: a Case Study <p>Curated databases have become important sources of information across several scientific disciplines, and as the result of the manual work of experts, often become important reference works.
Features such as provenance tracking, archiving, and data citation are widely regarded as important for curated databases, but implementing such features is challenging, and small database projects often lack the resources to do so.</p> <p>A scientific database application is not just the relational database itself, but also an ecosystem of web applications to display the data, and applications which allow data curation. Supporting advanced curation features requires changing all of these components, and there is currently no way to provide such capabilities in a reusable way.</p> <p>Cross-tier programming languages allow developers to write a web application in a single, uniform language. Consequently, database queries and updates can be written in the same language as the rest of the program, and it should be possible to provide curation features via program transformations. As a step towards this goal, it is important to establish that realistic curated databases can be implemented in a cross-tier programming language.</p> <p>In this article, we describe such a case study: reimplementing the web frontend of a real-world scientific database, the IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb), in the Links cross-tier programming language. We show how programming language features such as language-integrated query simplify the development process and rule out common errors. Through an automated functional correctness evaluation, we show that the Links implementation correctly implements the functionality of the official version. Through a comparative performance evaluation, we show that the Links implementation performs fewer database queries, while the time needed to handle the queries is comparable to the official Java version.
Furthermore, while there is some overhead to using Links because of its immaturity relative to Java, the Links version is usable as a proof-of-concept case study of cross-tier programming for curated databases.</p> Simon Fowler Simon Harding Joanna Sharman James Cheney ##submission.copyrightStatement## 2021-04-19 2021-04-19 16 1 21 21 10.2218/ijdc.v16i1.735 Understanding the Data Management Plan as a Boundary Object through a Multi-stakeholder perspective <div class="WordSection1"> <p class="Abstract">A three-phase Delphi study was used to investigate an emerging community for research data management in Norway and its understanding and application of data management plans (DMPs). The findings reveal visions of what the DMP should be as well as different practice approaches, yet the stakeholders present common goals. This paper discusses the different perspectives on the DMP by applying Star and Griesemer’s theory of boundary objects (Star &amp; Griesemer, 1989). The debate on what the DMP is and the findings presented are relevant to all research communities currently implementing DMP procedures and requirements. The current discussions about DMPs tend to be distant from the active researchers and limited to the needs of funders and institutions rather than to the usefulness for researchers. By analysing the DMP as a boundary object, plastic and adaptable yet with a robust identity (Star &amp; Griesemer, 1989), and by translating between worlds where collaboration on data sharing can take place, we expand the perspectives and include all stakeholders.
An understanding of the DMP as a boundary object can shift the focus from shaping a DMP which fulfils funders’ requirements to enabling collaboration on data management and sharing across domains using standardised forms.</p> </div> Live Kvale Nils Pharo ##submission.copyrightStatement## 2021-07-04 2021-07-04 16 1 16 16 10.2218/ijdc.v16i1.746 Doctoral Students' Educational Needs in Research Data Management: Perceived Importance and Current Competencies <div class="WordSection1"> <p class="Abstract">Sound research data management (RDM) competencies are elementary tools used by researchers to ensure integrated, reliable, and re-usable data, and to produce high quality research results. In this study, 35 doctoral students and faculty members were asked to self-rate or rate doctoral students’ current RDM competencies and rate the importance of these competencies. Structured interviews were conducted, using close-ended and open-ended questions, covering research data lifecycle phases such as collection, storing, organization, documentation, processing, analysis, preservation, and data sharing. The quantitative analysis of the respondents’ answers indicated a wide gap between doctoral students’ rated/self-rated current competencies and the rated importance of these competencies. In conclusion, two major educational needs were identified in the qualitative analysis of the interviews: to improve and standardize data management planning, including awareness of the intellectual property and agreement issues affecting data processing and sharing; and to improve and standardize data documenting and describing, not only for the researchers themselves but especially for data preservation, sharing, and re-using.
Hence, the study informs the development of RDM education for doctoral students.</p> </div> Jukka Rantasaari ##submission.copyrightStatement## 2021-08-09 2021-08-09 16 1 36 36 10.2218/ijdc.v16i1.684 Futureproofing Visual Effects <p class="Abstract">Digital visual effects (VFX), including computer animation, have become a commonplace feature of contemporary episodic and film production projects. Using various commercial applications and bespoke tools, VFX artists craft digital objects (known as “assets”) to create visual elements such as characters and environments, which are composited together and output as shots.</p> <p class="Abstract">While the shots that make up the finished film or television (TV) episode are maintained and preserved within purpose-built digital asset management systems and repositories by the studios commissioning the projects, the wider VFX network currently has no consistent guidelines or requirements around the digital curation of VFX digital assets and records. This includes a lack of guidance about how to effectively futureproof digital VFX and preserve it for the long term.</p> <p class="Abstract">In this paper, I provide a case study – a single shot from a 3D animation short film – to illustrate the complexities of digital VFX assets and records and the pipeline environments in which they are generated. I also draw from data collected from interviews with over 20 professional VFX practitioners from award-winning VFX companies, and I undertake socio-technical analysis of VFX using actor-network theory.
I explain how high volumes of digital information, rapid technology progression, and software dependencies pose significant preservation challenges.</p> <p>In addition, I outline how, by conducting holistic appraisal, selection, and disposal activities across its entire digital collections and by continuing to develop and adopt open formats, the VFX industry can improve its capability to preserve first-hand evidence of its work in years to come.</p> Evanthia Samaras ##submission.copyrightStatement## 2021-08-15 2021-08-15 16 1 15 15 10.2218/ijdc.v16i1.689 Assessment, Usability, and Sociocultural Impacts of DataONE <p class="Abstract">DataONE, funded from 2009-2019 by the U.S. National Science Foundation, is an early example of a large-scale project that built both a cyberinfrastructure and culture of data discovery, sharing, and reuse. DataONE used a Working Group model, where a diverse group of participants collaborated on targeted research and development activities to achieve broader project goals. This article summarizes the work carried out by two of DataONE’s working groups: Usability &amp; Assessment (2009-2019) and Sociocultural Issues (2009-2014). The activities of these working groups provide a unique longitudinal look at how scientists, librarians, and other key stakeholders engaged in convergence research to identify and analyze practices around research data management through the development of boundary objects, an iterative assessment program, and reflection. Members of the working groups disseminated their findings widely in papers, presentations, and datasets, reaching international audiences through publications in 25 different journals and presentations to over 5,000 people at interdisciplinary venues. The working groups helped inform the DataONE cyberinfrastructure and influenced the evolving data management landscape.
By studying working groups over time, the paper also presents lessons learned about the working group model for global large-scale projects that bring together participants from multiple disciplines and communities in convergence research.</p> Robert J. Sandusky Suzie Allard Lynn Baird Leah Cannon Kevin Crowston Amy Forrester Bruce Grant Rachael Hu Robert Olendorf Danielle Pollock Alison Specht Carol Tenopir Rachel Volentine ##submission.copyrightStatement## 2021-04-18 2021-04-18 16 1 48 48 10.2218/ijdc.v16i1.678 metajelo: A metadata package for journals to support external linked objects <p class="Abstract">We propose a metadata package that is intended to provide academic journals with a lightweight means of registering, at the time of publication, the existence and disposition of supplementary materials. Information about the supplementary materials is, in most cases, critical for the reproducibility and replicability of scholarly results. In many instances, these materials are curated by a third party, which may or may not follow developing standards for the identification and description of those materials. As such, the vocabulary described here complements existing initiatives that specify vocabularies to describe the supplementary materials or the repositories and archives in which they have been deposited. Where possible, it reuses elements of other relevant vocabularies, facilitating coexistence with them. Furthermore, it provides an “at publication” record of reproducibility characteristics of a particular article that has been selected for publication. The proposed metadata package documents the key characteristics that journals care about in the case of supplementary materials that are held by third parties: existence, accessibility, and permanence. It does so in a robust, time-invariant fashion at the time of publication, when the editorial decisions are made.
It also allows for better documentation of less accessible (non-public) data, by treating it symmetrically from the point of view of the journal, thereby increasing the transparency of what until now has been very opaque.</p> Lars Vilhuber Carl Lagoze ##submission.copyrightStatement## 2021-10-26 2021-10-26 16 1 22 22 10.2218/ijdc.v16i1.600 Where There's a Will, There's a Way: In-House Digitization of an Oral History Collection in a Lone-Arranger Situation <p>Analog audio materials present unique preservation and access challenges for even the largest libraries. These challenges are magnified for smaller institutions where budgets, staffing, and equipment limit what can be achieved. Because in-house migration of analog audio to digital is often out of reach for smaller institutions, the choice is between finding room in the budget to outsource a project or sitting by and watching important materials decay. Cost is the most significant barrier to audio migration. Audio preservation labs can charge hundreds or even thousands of dollars to migrate analog to digital. Top-tier audio preservation equipment is equally expensive. When faced with the decomposition of an oral history collection recorded on cassette tape, one library decided that where there was a will, there was a way. The College of Education One-Room Schoolhouse Oral History Collection consisted of 247 audio cassettes containing interviews with one-room school house teachers from 68 counties in Kansas. The cassette tapes in this collection were between 20 and 40 years old and generally inaccessible for research due to fear that the tapes could be damaged during playback. This case study looks at how a single Digital Curation Librarian with no audio digitization experience migrated nearly 200 hours of audio to digital using a $40 audio converter from Amazon and a campus subscription to Adobe Audition.
This case study covers the decision to digitize the collection, the digitization process, including audio clean-up, metadata collection and creation, presentation of the collection in CONTENTdm, and final preservation of audio files. The project took 20 months to complete and resulted in significant lessons learned that have informed decisions regarding future audio conversion projects.</p> Mary Elizabeth Downing-Turner ##submission.copyrightStatement## 2021-09-28 2021-09-28 16 1 8 8 10.2218/ijdc.v16i1.744 Improving the Usability of Organizational Data Systems <p>For research data repositories, web interfaces are usually the primary, if not the only, method that data users have to interact with repository systems. Data users often search, discover, understand, access, and sometimes use data directly through repository web interfaces. Given that sub-par user interfaces can reduce the ability of users to locate, obtain, and use data, it is important to consider how repositories’ web interfaces can be evaluated and improved in order to ensure useful and successful user interactions. This paper discusses how usability assessment techniques are being applied to improve the functioning of data repository interfaces at the National Center for Atmospheric Research (NCAR). At NCAR, a new suite of data system tools is being developed, collectively called the NCAR Digital Asset Services Hub (DASH). Usability evaluation techniques have been used throughout the NCAR DASH design and implementation cycles in order to ensure that the systems work well together for the intended user base. By applying user studies, paper prototyping, competitive analysis, journey mapping, and heuristic evaluation, the NCAR DASH Search and Repository experiences provide examples of how data systems can benefit from usability principles and techniques.
Integrating usability principles and techniques into repository system design and implementation workflows helps to optimize the systems’ overall user experience.</p> Chung-Yi Hou Matthew S. Mayernik ##submission.copyrightStatement## 2021-05-18 2021-05-18 16 1 21 21 10.2218/ijdc.v16i1.592 Leveraging Existing Technology: Developing a Trusted Digital Repository for the U.S. Geological Survey <div class="WordSection1"> <p class="Abstract">As Federal Government agencies in the United States pivot to increase access to scientific data (Sheehan, 2016), the U.S. Geological Survey (USGS) has made substantial progress (Kriesberg et al., 2017). USGS authors are required to make federally funded data publicly available in an approved data repository (USGS, 2016b). This type of public data product, known as a USGS data release, serves as a method for publishing reviewed and approved data. In this paper, we present major milestones in the approach the USGS took to transition an existing technology platform to a Trusted Digital Repository. We describe both the technical and the non-technical actions that contributed to a successful outcome. We highlight how initial workflows revealed patterns that were later automated, and the ways in which assessments and user feedback influenced design and implementation. The paper concludes with lessons learned, such as the importance of a community of practice, application programming interface (API)-driven technologies, iterative development, and user-centered design. This paper is intended to offer a potential roadmap for organizations pursuing similar goals.</p> </div> Vivian B. Hutchison Tamar Norkin Madison L. Langseth Drew A. Ignizio Lisa S.
Zolly Ricardo McClees-Funinan Amanda Liford ##submission.copyrightStatement## 2021-07-11 2021-07-11 16 1 23 23 10.2218/ijdc.v16i1.741 Data Curation, Fisheries, and Ecosystem-based Management: the Case Study of the Pecheker Database <div class="WordSection1"> <p class="Abstract">The scientific monitoring of the Southern Ocean French fishing industry is based on the use of the Pecheker database. Pecheker is dedicated to the digital curation of the data collected in the field by scientific observers, whose analysis allows the scientists of the Muséum national d’Histoire naturelle to provide guidelines and advice for the regulation of fishing activity, the protection of fish stocks and the protection of marine ecosystems. The template of Pecheker has been developed to adapt the database to the ecosystem-based management concept. Considering the global context of biodiversity erosion, this modern approach to management aims to take account of the environmental background of the fisheries to ensure their sustainable development. Completeness and high quality of the raw data are key elements for an ecosystem-based management database such as Pecheker. Here, we present the development of this database as a case study of fisheries data curation to be shared with the readers. Full code to deploy a database based on the Pecheker template is provided in supplementary materials.
Considering the success factors we identified, we discuss how the community could build a global fisheries information system based on a network of small databases with shared interoperability standards.</p> </div> Alexis Martin Charlotte Chazeau Nicolas Gasco Guy Duhamel Patrice Pruvost ##submission.copyrightStatement## 2021-06-07 2021-06-07 16 1 31 31 10.2218/ijdc.v16i1.674 Scaling by Optimising: Modularisation of Data Curation Services in Growing Organisations <p class="Abstract">After a century of theorising and applying management practices, we are entering a new stage in management science: digital management. The management of digital data submerges in traditional functions of management and, at the same time, continues to recreate viable solutions and conceptualisations in its established fields, e.g. research data management. Yet, one can observe bilateral synergies and mutual enrichment of traditional and data management practices in all fields. The paper at hand addresses a case in point, in which new and old management practices amalgamate to meet a demand for data curation services in academic institutions that is increasing steadily, and in part by leaps and bounds. The idea of modularisation, as known from software engineering, is applied to data curation workflows so that economies of scale and scope can be used. While scaling refers to both management science and data science, optimising is understood in the traditional managerial sense, that is, with respect to the cost function. By means of a situation analysis describing how data curation services were extended from one department to the entire institution and an analysis of the factors of influence, a method of modularisation is outlined that converges to an optimal state of curation workflows.</p> Hagen Peukert ##submission.copyrightStatement## 2021-04-26 2021-04-26 16 1 20 20 10.2218/ijdc.v16i1.650 FAIR Forever?
Accountabilities and Responsibilities in the Preservation of Research Data <p class="Abstract" style="margin: 0cm -.05pt 5.0pt 0cm;">Digital preservation is a fast-moving and growing community of practice of ubiquitous relevance, but in which capability is unevenly distributed. Within the open science and research data communities, digital preservation has a close alignment with the FAIR principles and is delivered through a complex specialist infrastructure comprising technology, staff and policy. However, capacity erodes quickly, establishing a need for ongoing examination and review to ensure that skills, technology, and policy remain fit for changing purpose. To address this challenge, the Digital Preservation Coalition (DPC) conducted the FAIR Forever study, commissioned by the European Open Science Cloud (EOSC) Sustainability Working Group and funded by the EOSC Secretariat Project in 2020, to assess the current strengths, weaknesses, opportunities and threats to the preservation of research data across EOSC, and the feasibility of establishing shared approaches, workflows and services that would benefit EOSC stakeholders.</p> <p class="Abstract" style="margin: 0cm -.05pt 5.0pt 0cm;">This paper draws from the FAIR Forever study to document and explore its key findings on the identified strengths, weaknesses, opportunities, and threats to the preservation of FAIR data in EOSC, and to the preservation of research data more broadly. It begins with the background of the study and an overview of the methodology employed, which involved a desk-based assessment of the emerging EOSC vision, interviews with representatives of EOSC stakeholders, and focus groups with digital preservation specialists and data managers in research organizations. It summarizes key findings on the need for clarity on digital preservation in the EOSC vision and for elucidation of roles, responsibilities, and accountabilities to mitigate risks to data, reputation, and sustainability.
It then outlines the recommendations provided in the final report presented to the EOSC Sustainability Working Group.</p> <p class="Abstract" style="margin: 0cm -.05pt 5.0pt 0cm;">To better ensure that research data can be FAIRer for longer, the recommendations of the study are presented with discussion on how they can be extended and applied to various research data stakeholders in and outside of EOSC, suggesting ways to bring together research data curation, management, and preservation communities to better ensure FAIRness now and in the long term.</p> Amy Currie William Kilbride ##submission.copyrightStatement## 2021-09-30 2021-09-30 16 1 16 16 10.2218/ijdc.v16i1.768 How Long Can We Build It? Ensuring Usability of a Scientific Code Base <p>Software, and in particular source code, has become an important component of scientific publications and is now a subject of research data management. Maintaining source code such that it remains a usable and valuable scientific contribution is a huge task. Not all code contributions can be actively maintained forever. Eventually, there will be a significant backlog of legacy source code. In this article, we analyse the requirements for applying the concept of long-term reusability to source code. We use a simple case study to identify gaps and provide a technical infrastructure based on emulation to support automated builds of historic software in the form of source code.</p> Klaus Rechert Jurek Oberhauser Rafael Gieschke ##submission.copyrightStatement## 2021-05-17 2021-05-17 16 1 11 11 10.2218/ijdc.v16i1.770