http://www.ijdc.net/issue/feed International Journal of Digital Curation 2022-11-02T17:10:39+00:00 IJDC Editorial Team ijdc@mlist.is.ed.ac.uk Open Journal Systems <p>The IJDC publishes research papers, general articles and brief reports on digital curation, research data management and related issues. It complements the International Conference on Digital Curation (IDCC) and includes selected proceedings as Conference Papers.</p> http://www.ijdc.net/article/view/799 Data Management Planning for an Eight-Institution, Multi-Year Research Project 2022-09-20T16:59:19+01:00 Kristin A. Briney briney@caltech.edu Abigail Goben agoben@uic.edu Kyle M.L. Jones kmlj@iupui.edu <p><span style="font-weight: 400;">While data management planning for grant applications has become commonplace alongside articles providing guidance for such plans, examples of data plans as they have been created, implemented, and used for specific projects are only beginning to appear in the scholarly record. This article describes data management planning for an eight-institution, multi-year research project. The project leveraged four data management plans (DMPs) in total: one for the funding application and one for each of the three distinct project phases. By understanding researcher roles, development and content of each DMP, the team's internal and external challenges, and the overall benefits of creating and using the plans, these DMPs provide a demonstration of the utility of this project management tool. 
</span></p> 2022-09-07T16:15:08+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/820 Reusable, FAIR Humanities Data 2022-09-20T16:59:19+01:00 Rebecca Grant rebecca.grant@f1000.com <p class="Abstract">While stakeholders including funding agencies and academic publishers implement more stringent data sharing policies, challenges remain for researchers in the humanities who are increasingly prompted to share their research data.&nbsp;This paper outlines some key challenges of research data sharing in the humanities, and identifies existing work which has been undertaken to explore these challenges. It describes the current landscape regarding publishers’ research data sharing policies, and the impact which strong data policies can have, regardless of discipline.</p> <p class="Abstract">Using Routledge Open Research as a case study, the development of a set of humanities-inclusive Open Data publisher data guidelines is then described. These include practical guidance in relation to data sharing for humanities authors, and a close alignment with the FAIR Data Principles.</p> 2022-09-09T16:25:43+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/782 OpenStack Swift: An Ideal Bit-Level Object Storage System for Digital Preservation 2022-10-07T17:03:08+01:00 Guanwen Zhang guanwen@ualberta.ca Kenton Good kenton.good@ualberta.ca Weiwei Shi weiwei.shi@ualberta.ca <p>A bit-level object storage system is a foundational building block of long-term digital preservation (LTDP). To achieve the purposes of LTDP, the system must be able to: preserve the authenticity and integrity of the original digital objects; scale up with dramatically increasing demands for preservation storage; mitigate the impact of hardware obsolescence and software ephemerality; replicate digital objects among distributed data centers at different geographical locations; and constantly audit and automatically recover from compromised states. 
A realistic and daunting challenge to satisfy these requirements is not only to overcome technological difficulties but also to maintain economic sustainability by implementing and continuously operating such systems in a cost-effective way. In this paper, we present OpenStack Swift, an open-source, mature and widely accepted cloud platform, as a practical and proven solution with a case study at the University of Alberta Library. We emphasize the implementation, application, cost analysis and maintenance of the system, with the purpose of contributing to the community with an exceedingly robust, highly scalable, self-healing and comparatively cost-effective bit-level object storage system for long-term digital preservation.&nbsp;</p> 2022-10-07T11:08:47+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/837 Curating for Accessibility 2022-09-20T16:59:20+01:00 Theresa Anderson tander08@syr.edu Randy D. Colón rcolon4@uic.edu Abigail Goben agoben@uic.edu Sebastian Karcher skarcher@syr.edu <p class="AfterHeading12"><span lang="EN-GB">Accessibility of research data to disabled users has received scant attention in literature and practice. In this paper we briefly survey the current state of accessibility for research data and suggest some first steps that repositories should take to make their holdings more accessible. We then describe in depth how those steps were implemented at the Qualitative Data Repository (QDR), a domain repository for qualitative social-science data. The paper discusses accessibility testing and improvements on the repository and its underlying software, changes to the curation process to improve accessibility, as well as efforts to retroactively improve the accessibility of existing collections. 
We conclude by describing key lessons learned during this process as well as next steps.</span></p> 2022-08-03T15:18:06+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/819 An Approach for Curating Collections of Historical Documents with the Use of Topic Detection Technologies 2022-09-20T16:59:19+01:00 Medina Andresel medina.andresel@ait.ac.at Sergiu Gordea sergiu.gordea@ait.ac.at Srdjan Stevanetic srdjan.stevanetic@ait.ac.at Mina Schütz mina.schuetz@ait.ac.at <p class="Abstract" style="margin: 0in -.05pt 5.0pt 0in;"><span lang="EN-GB">Digital curation of materials available in large online repositories is required to enable the reuse of Cultural Heritage resources in specific activities like education or scientific research. </span><span lang="EN-GB">The digitization of such valuable objects is an important task for making them accessible through digital platforms such as Europeana; ensuring the success of transcription campaigns via the Transcribathon platform is therefore highly important for this goal. </span><span lang="EN-GB">Based on impact assessment results, people are more engaged in the transcription process if the content is oriented to specific themes, such as the First World War. Currently, efforts to group related documents into thematic collections are in general hand-crafted and, due to the large ingestion of new material, difficult to maintain and update. The current solutions based on text retrieval are not able to support the discovery of related content, since the existing collections are multi-lingual and contain heterogeneous items like postcards, letters, journals, photographs etc. Technological advances in natural language understanding and in data management have led to the automation of document categorization via automatic topic detection. 
To use existing topic detection technologies on Europeana collections, several challenges must be addressed: (1) ensuring representative, high-quality training data, (2) ensuring the quality of the learned topics, and (3) providing efficient and scalable solutions for searching related content based on the automatically detected topics, and for suggesting the most relevant topics on new items. This paper describes each of these challenges and the proposed solutions in more detail, thus offering a novel perspective on how digital curation practices can be enhanced with the help of machine learning technologies.</span></p> 2022-09-20T10:46:20+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/847 Synchronic Curation for Assessing Reuse and Integration Fitness of Multiple Data Collections 2022-10-11T17:03:58+01:00 Maria Esteva maria@tacc.utexas.edu Weijia Xu weijiax@gmail.com Nevan Simone nevan.simone@gmail.com Kartik Nagpal kartiknagpal@utexas.edu Amit Gupta agupta@tacc.utexas.edu Moriba Jah moriba@utexas.edu <p>Data-driven applications often require using data integrated from different, large, and continuously updated collections. These collections may present gaps, contain overlapping or conflicting information, or complement each other. Thus, a curation need is to continuously assess whether data from multiple collections are fit for integration and reuse. To assess different large data collections at the same time, we present the Synchronic Curation (SC) framework. SC involves processing steps to map the different collections to a unifying data model that represents research problems in a scientific area. The data model, which includes the collections' provenance and a data dictionary, is implemented in a graph database where collections are continuously ingested and can be queried. SC has a collection analysis and comparison module to track updates, and to identify gaps, changes, and irregularities within and across collections. 
Assessment results can be accessed through a web-based interactive graph. In this paper we introduce SC as an interdisciplinary enterprise, and illustrate its capabilities through its implementation in ASTRIAGraph, a space sustainability knowledge system.</p> 2022-10-11T15:06:47+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/841 Building LABDRIVE, a Petabyte Scale, OAIS/ISO 16363 Conformant, Environmentally Sustainable Archive, Tested by Large Scientific Organisations to Preserve their Raw and Processed Data, Software and Documents 2022-09-21T16:59:32+01:00 David Leslie Giaretta david@giaretta.org Teo Redondo teo.redondo@libnova.com <p>Vast amounts of scientific, cultural, social, business, government, and other information are being created every day. There are billions of objects, with a multitude of formats, semantics and associated software. Much, perhaps the majority, of this information is transitory, but there is still an immense amount which should be preserved for the medium and long term – perhaps even indefinitely.</p> <p>Preservation requires that the information continues to be usable, not simply to be printed or displayed. 
Of course, the digital objects (the bits) must be preserved, as must the “metadata” which enables the bits to be understood, which includes the software.</p> <p>Before LABDRIVE no system could adequately preserve such information, especially in such gigantic volume and variety.&nbsp;</p> <p>In this paper we describe the development of LABDRIVE and its ability to preserve tens or hundreds of petabytes in a way which is conformant to the OAIS Reference Model and capable of being ISO 16363 certified.</p> 2022-09-21T14:19:01+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/827 From Siloed to Reusable 2022-09-29T21:55:59+01:00 Kathryn Gucer kgucer@umd.edu Michelle Janowiecki michelle.janowiecki@jhu.edu <p>In the past twenty-five years, cross-institutional communities have come together in the creation and use of open source software and open data standards to build digital collections (Madden, 2012). These librarians, developers, archivists, artists, and researchers recognize that the custom-built architectures and bespoke data structures of earlier digital collections development are unsustainable. Their collaborations have produced now-standard technologies such as Samvera, Fedora, GeoBlacklight, and Islandora 8, as well as RDF and JSON-LD, among other open schemas. A core principle animating these efforts is reusability: data, schemas, and technologies in the open era must be coherent and flexible enough to be reused across multiple digital contexts. The authors of this paper show how reuse guided the migration of the Hopkins Digital Library from an outdated isolated system to a sustainable interconnected environment in GeoBlacklight and Islandora, with metadata based in Linked Open Data. 
Three areas of reuse focus this paper: the creation of robust interoperable metadata; the expansion of IIIF functionality to integrate the needs of the Hopkins Geoportal’s users; the development of a broadly re/usable data migration module focused on expanding a diverse community of invested users. In focusing on reusability as an organising principle of digital collections development, this case study shows how one digital curation team produced a platform that meets the changing and specific needs of an individual institution, on the one hand, and participated in and furthered the creative coherence of the open communities supporting the team’s work, on the other.</p> 2022-09-21T16:00:15+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/849 Fostering the Adoption of DMP in Small Research Projects through a Collaborative Approach 2022-09-20T16:59:20+01:00 André Maciel andremaciel561@gmail.com João Aguiar Castro joao.a.castro@inesctec.pt Cristina Ribeiro mcr@fe.up.pt Marta Almada martasalmada@gmail.com Luís Midão luismidao@gmail.com <p>In order to promote sound management of research data the European Commission, under the Horizon 2020 framework program, is promoting the adoption of a Data Management Plan (DMP) in research projects. Despite the value of a DMP to make data findable, accessible, interoperable and reusable (FAIR) through time, the development and implementation of DMPs is not yet a common practice in health research. Raising the awareness of researchers in small projects to the benefits of early adoption of a DMP is, therefore, a motivator for others to follow suit. In this paper we describe an approach to engage researchers in the writing of a DMP, in an ongoing project, FrailSurvey, in which researchers are collecting data through a mobile application for self-assessment of fragility. The case study is supported by interviews, a metadata creation session, as well as the validation of recommendations by researchers. 
In outlining our process we also describe the tools and services that supported the development of the DMP in this small project, particularly since there were no institutional services available to researchers.</p> 2022-09-07T14:53:55+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/839 Who Writes Scholarly Code? 2022-11-02T17:10:39+00:00 Sarah Nguyễn snguye@uw.edu Vicky Rampin vs77@nyu.edu <p>This paper presents original research about the behaviours, histories, demographics, and motivations of scholars who code, specifically how they interact with version control systems locally and on the Web. By understanding patrons through multiple lenses – daily productivity habits, motivations, and scholarly needs – librarians and archivists can tailor services for software management, curation, and long-term reuse, raising the possibility of long-term reproducibility for a wide range of scholarship.</p> 2022-11-01T18:23:37+00:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/836 Automation is Documentation: Functional Documentation of Human-Machine Interaction for Future Software Reuse 2022-09-20T16:59:20+01:00 Jurek Oberhauser jurek@openslx.com Rafael Gieschke rafael.gieschke@rz.uni-freiburg.de Klaus Rechert rechert@hs-kehl.de <p class="Abstract">Preserving software and providing access to obsolete software is necessary and will become even more important for work with any kind of born-digital artifact. While the usability and availability of emulation in digital curation and preservation workflows have improved significantly, productive (re)use of preserved obsolete software is a growing concern, due to a lack of (future) operational knowledge. 
In this article we describe solutions to automate and document software usage in such a way that the result is not only instructive but also productive.</p> 2022-09-06T17:23:05+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/825 DBRepo: a Semantic Digital Repository for Relational Databases 2022-09-20T16:59:20+01:00 Martin Weise martin.weise@tuwien.ac.at Moritz Staudinger moritz.staudinger@tuwien.ac.at Cornelia Michlits cornelia.michlits@tuwien.ac.at Eva Gergely eva.gergely@univie.ac.at Kirill Stytsenko kirill.stytsenko@univie.ac.at Raman Ganguly raman.ganguly@univie.ac.at Andreas Rauber andreas.rauber@tuwien.ac.at <p>Data curation is a complex, multi-faceted task. While dedicated data stewards are starting to take care of these activities in close collaboration with researchers for many types of (usually file-based) data in many institutions, this is still rarely the case for data held in relational databases. Beyond large-scale infrastructures hosting e.g. climate or genome data, researchers usually have to create, build and maintain their own database, take care of security patches, and feed data into it in order to use it in their research. 
Data curation, if at all, usually happens after a project is finished, when data may be exported for digital preservation into file repository systems.</p> <p>We present DBRepo, a semantic digital repository for relational databases in a private cloud setting designed to (1) host research data stored in relational databases right from the beginning of a research project, (2) provide separation of concerns, allowing the researchers to focus on the domain aspects of the data and their work while bringing in experts to handle classic data management tasks, (3) improve findability, accessibility and reusability by offering semantic mapping of metadata attributes, and (4) focus on reproducibility in dynamically evolving data by supporting versioning and precise identification/cite-ability for arbitrary subsets of data.<span class="Apple-converted-space">&nbsp;</span></p> 2022-09-07T12:40:06+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/818 OpenCitations: an Open e-Infrastructure to Foster Maximum Reuse of Citation Data 2022-09-20T16:59:21+01:00 Chiara Di Giambattista chiar.digiambattista@studio.unibo.it Ivan Heibi ivan.heibi2@unibo.it Silvio Peroni silvio.peroni@unibo.it David Shotton david.shotton@opencitations.net <p>OpenCitations is an independent not-for-profit infrastructure organization for open scholarship dedicated to the publication of open bibliographic and citation data by the use of Semantic Web (Linked Data) technologies. OpenCitations collaborates with projects that are part of the Open Science ecosystem and complies with the UNESCO founding principles of Open Science, the I4OC recommendations, and the FAIR data principles that data should be Findable, Accessible, Interoperable and Reusable. 
Since its data satisfies all the Reuse guidelines provided by FAIR in terms of richness, provenance, usage licenses and domain-relevant community standards, OpenCitations provides an example of a successful open e-infrastructure in which the reusability of data is integral to its mission.</p> 2022-08-03T11:34:56+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/828 On the Reusability of Data Cleaning Workflows 2022-09-28T17:00:26+01:00 Lan Li lanl2@illinois.edu Bertram Ludäscher ludaesch@illinois.edu <p>The goal of data cleaning is to make data fit for purpose, i.e., to improve data quality, through&nbsp;updates and data transformations, such that downstream analyses can be conducted and&nbsp;lead to trustworthy results. A transparent and reusable data cleaning workflow can save time&nbsp;and effort through automation, and make subsequent data cleaning on new data less error-prone.&nbsp;However, reusability of data cleaning workflows has received little to no attention in&nbsp;the research community. We identify some challenges and opportunities for reusing data&nbsp;cleaning workflows. We present a high-level conceptual model to clarify what we mean by&nbsp;reusability and propose ways to improve reusability along different dimensions. 
We use&nbsp;the opportunity of presenting at IDCC to invite the community to share their use cases,&nbsp;experiences, and desiderata for the reuse of data cleaning workflows and recipes in order&nbsp;to foster new collaborations and guide future work.</p> 2022-09-27T21:46:25+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/852 Increasing the Reuse of Data through FAIR-enabling the Certification of Trustworthy Digital Repositories 2022-09-25T21:40:48+01:00 Benjamin Jacob Mathers bm21346@essex.ac.uk Hervé L’Hours herve@essex.ac.uk <p class="Abstract">The long-term preservation of digital objects, and the means by which they can be reused, are addressed by both the FAIR Data Principles (Findable, Accessible, Interoperable, Reusable) and a number of standards bodies providing Trustworthy Digital Repository (TDR) certification, such as the CoreTrustSeal. Though many of the requirements listed in the <em>Core Trustworthy Data Repositories Requirements 2020–2022 Extended Guidance</em> address the FAIR Data Principles indirectly, there is currently no formal ‘FAIR Certification’ offered by the CoreTrustSeal or other TDR standards bodies. To address this gap the FAIRsFAIR project developed a number of tools and resources that facilitate the assessment of FAIR-enabling practices at the repository level as well as the FAIRness of datasets within them. These include the <em>CoreTrustSeal+FAIRenabling Capability Maturity model</em> (CTS+FAIR CapMat), a FAIR-Enabling <em>Trustworthy Digital Repositories-Capability Maturity Self-Assessment</em> template, and F-UJI, a web-based tool designed to assess the FAIRness of research data objects. The success of such tools and resources ultimately depends upon community uptake. 
This requires a community-wide commitment to develop best practices to increase the reuse of data and to reach consensus on what these practices are.&nbsp; One possible way of achieving community consensus would be through the creation of a network of FAIR-enabling TDRs, as proposed by FAIRsFAIR.</p> 2022-12-01T07:28:42+00:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/848 Towards Environmentally Sustainable Long-term Digital Preservation 2022-10-31T17:10:13+00:00 Ignacio Peluaga ignacio.peluaga.lozada@cern.ch João Fernandes joao.fernandes@cern.ch Shreyasvi Natraj shreyasvi.natraj@unige.ch <p class="Abstract">ARCHIVER and Pre-Commercial Procurement funding has enabled small to medium enterprises (SMEs) to innovate and deliver new services for EOSC. Within the framework of the <a href="https://www.archiver-project.eu/"><span style="color: windowtext; text-decoration: none; text-underline: none;">ARCHIVER </span></a>pre-commercial procurement tender, between December 2020 and August 2021, three commercial consortia competed to deliver innovative, prototype solutions for long-term data preservation. Two of them were selected to continue with the pilot phase and deliver research-ready solutions for long-term data preservation of research data, therefore filling a gap in the current European Open Science panorama.</p> <p class="Abstract">Digital preservation relies on technological infrastructure (information and communication technology, ICT) that can have environmental impacts. While altering technology usage can reduce the impact of digital preservation practices, this alone is not a strategy for sustainable practice. Moving toward environmentally sustainable digital preservation requires critically examining the motivations and assumptions that shape current practice. 
The use of scalable cloud infrastructures can reduce the environmental impacts of long-term data preservation solutions.</p> 2022-10-31T12:01:16+00:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/840 Uncommon Commons? Creative Commons Licencing in Horizon 2020 Data Management Plans 2022-09-20T16:59:18+01:00 Daniel Spichtinger daniel@spichtinger.net <p class="Abstract" style="margin: 0cm -.05pt 5.0pt 0cm;"><span lang="EN-GB">As policies, good practices and mandates on research data management evolve, more emphasis has been put on the licencing of data, which allows potential re-users to quickly identify what they can do with the data in question. In this paper I analyse a pre-existing collection of 840 Horizon 2020 public data management plans (DMPs) to determine which ones mention Creative Commons licences and, among those that do, which licences are being used. </span></p> <p class="Abstract" style="margin: 0cm -.05pt 5.0pt 0cm;"><span lang="EN-GB">I find that 36% of DMPs mention Creative Commons and that among those a number of different approaches towards licencing exist (overall policy per project, licencing decisions per dataset, licencing decisions per partner, licencing decisions per data format, licencing decisions per perceived stakeholder interest), often clad in rather vague language with CC licences being “recommended” or “suggested”. Some DMPs also “kick the can further down the road” by mentioning that “a” CC licence will be used, but not which one. However, among those DMPs that do mention specific CC licences, a clear favourite emerges: the CC-BY licence, which accounts for half of all mentions of a specific licence. </span></p> <p class="Abstract" style="margin: 0cm -.05pt 5.0pt 0cm;"><span lang="EN-GB">The fact that 64% of DMPs did not mention Creative Commons at all is an indication of the need for further training and awareness raising on data management in general and licencing in particular in Horizon Europe. 
For those DMPs that do mention specific licences, 60% would be compliant with Horizon Europe requirements (CC-BY or CC0). However, it should be carefully monitored whether content similar to the 40% that is currently licenced with non-Horizon Europe-compliant licences will in the future move to CC-BY or CC0 or whether such content will simply be kept fully closed by projects (by invoking the “as open as possible, as closed as necessary” principle), which would be an unintended and potentially damaging consequence of the policy. </span></p> 2022-09-20T11:41:22+01:00 ##submission.copyrightStatement##