International Journal of Digital Curation <p>The IJDC publishes research papers, general articles and brief reports on digital curation, research data management and related issues. &nbsp;It complements the International Conference on Digital Curation (IDCC) and includes selected proceedings as Conference Papers.</p> University of Edinburgh en-US International Journal of Digital Curation 1746-8256 <p>Copyright for papers and articles published in this journal is retained by the authors, with first publication rights granted to the University of Edinburgh. It is a condition of publication that authors license their paper or article under a <a href="">Creative Commons Attribution 4.0 International (CC BY 4.0)</a> licence.<br><br><a href="" rel="license"><img style="border-width: 0;" src="" alt="Creative Commons License"></a></p> If Data is Used in the Forest and No-one is Around to Hear it, Did it Happen? a Citation Count Investigation <p>In this article I describe the process and results of tracking a citation from a data repository through the article publication process and trying to add a citation event to one of our DOIs. I also discuss some other confusing aspects related to citation counts as indicated in various systems, including reference managers, the publisher’s perspective, aggregators, and DOI minters. I discovered numerous problems with citations. Addressing these problems is important as citations can be key to determining both the original use and reuse of a dataset, especially for repositories that do not track usage by requiring people to login or provide an email to download a dataset. 
The lack of transparency in some data citation systems and processes obscures how and where data is being used.<span class="Apple-converted-space">&nbsp;</span></p> Susan Borda ##submission.copyrightStatement## 2023-08-22 2023-08-22 17 1 14 14 10.2218/ijdc.v17i1.830 Data Management Planning for an Eight-Institution, Multi-Year Research Project <p><span style="font-weight: 400;">While data management planning for grant applications has become commonplace alongside articles providing guidance for such plans, examples of data plans as they have been created, implemented, and used for specific projects are only beginning to appear in the scholarly record. This article describes data management planning for an eight-institution, multi-year research project. The project leveraged four data management plans (DMP) in total, one for the funding application and one for each of the three distinct project phases. By understanding researcher roles, development and content of each DMP, team internal and external challenges, and the overall benefits of creating and using the plans, these DMPs provide a demonstration of the utility of this project management tool. </span></p> Kristin A. Briney Abigail Goben Kyle M.L. Jones ##submission.copyrightStatement## 2022-09-07 2022-09-07 17 1 9 9 10.2218/ijdc.v17i1.799 Reusable, FAIR Humanities Data <p class="Abstract">While stakeholders including funding agencies and academic publishers implement more stringent data sharing policies, challenges remain for researchers in the humanities who are increasingly prompted to share their research data.&nbsp;This paper outlines some key challenges of research data sharing in the humanities, and identifies existing work which has been undertaken to explore these challenges. 
It describes the current landscape regarding publishers’ research data sharing policies, and the impact which strong data policies can have, regardless of discipline.</p> <p class="Abstract">Using Routledge Open Research as a case study, the development of a set of humanities-inclusive Open Data publisher data guidelines is then described. These include practical guidance in relation to data sharing for humanities authors, and a close alignment with the FAIR Data Principles.</p> Rebecca Grant ##submission.copyrightStatement## 2022-09-09 2022-09-09 17 1 15 15 10.2218/ijdc.v17i1.820 Putting the R into PlatfoRms <div class="WordSection1"> <p class="Abstract">This paper looks at the question of how and why to bring about greater reusability of Research Platforms (variously called Virtual Laboratories, Virtual Research Environments, or Science Gateways). It begins with some context for the Australian Research Data Commons, where the authors are based. It then examines the infrastructure concerns that are driving the need for platforms to be created and remain sustainable, and the connection from this to reusability. The paper then proceeds to discuss the ways in which FAIR is being extended to a range of research objects and infrastructure elements, before reviewing the work of the FAIR4VREs WG. The core of the paper is an examination, with examples or case studies, of four different paradigms for platform reusability: accessing, adopting, adapting, and abstracting. The paper concludes by examining actions undertaken by the ARDC to increase the likelihood of reusability.</p> </div> <p>&nbsp;</p> Kerry Levett Jonathan Smillie Andrew Treloar ##submission.copyrightStatement## 2022-12-12 2022-12-12 17 1 12 12 10.2218/ijdc.v17i1.843 OpenStack Swift: An Ideal Bit-Level Object Storage System for Digital Preservation <p>A bit-level object storage system is a foundational building block of long-term digital preservation (LTDP). 
To achieve the purposes of LTDP, the system must be able to: preserve the authenticity and integrity of the original digital objects; scale up with dramatically increasing demands for preservation storage; mitigate the impact of hardware obsolescence and software ephemerality; replicate digital objects among distributed data centers at different geographical locations; and to constantly audit and automatically recover from compromised states. A realistic and daunting challenge to satisfy these requirements is not only to overcome technological difficulties but also to maintain economic sustainability by implementing and continuously operating such systems in a cost-effective way. In this paper, we present OpenStack Swift, an open-source, mature and widely accepted cloud platform, as a practical and proven solution with a case study at the University of Alberta Library. We emphasize the implementation, application, cost analysis and maintenance of the system, with the purpose of contributing to the community with an exceedingly robust, highly scalable, self-healing and comparatively cost-effective bit-level object storage system for long-term digital preservation.&nbsp;</p> Guanwen Zhang Kenton Good Weiwei Shi ##submission.copyrightStatement## 2022-10-07 2022-10-07 17 1 19 19 10.2218/ijdc.v17i1.782 Curating for Accessibility <p class="AfterHeading12"><span lang="EN-GB">Accessibility of research data to disabled users has received scant attention in literature and practice. In this paper we briefly survey the current state of accessibility for research data and suggest some first steps that repositories should take to make their holdings more accessible. We then describe in depth how those steps were implemented at the Qualitative Data Repository (QDR), a domain repository for qualitative social-science data. 
The paper discusses accessibility testing and improvements on the repository and its underlying software, changes to the curation process to improve accessibility, as well as efforts to retroactively improve the accessibility of existing collections. We conclude by describing key lessons learned during this process as well as next steps.</span></p> Theresa Anderson Randy D. Colón Abigail Goben Sebastian Karcher ##submission.copyrightStatement## 2022-08-03 2022-08-03 17 1 10 10 10.2218/ijdc.v17i1.837 An Approach for Curating Collections of Historical Documents with the Use of Topic Detection Technologies <p class="Abstract" style="margin: 0in -.05pt 5.0pt 0in;"><span lang="EN-GB">Digital curation of materials available in large online repositories is required to enable the reuse of Cultural Heritage resources in specific activities like education or scientific research. </span><span lang="EN-GB">The digitization of such valuable objects is an important task for making them accessible through digital platforms such as Europeana; ensuring the success of transcription campaigns via the Transcribathon platform is therefore highly important for this goal. </span><span lang="EN-GB">Based on impact assessment results, people are more engaged in the transcription process if the content is more oriented to specific themes, such as the First World War. Currently, efforts to group related documents into thematic collections are generally hand-crafted and, because of the large volume of newly ingested material, difficult to maintain and update. The current solutions based on text retrieval are not able to support the discovery of related content since the existing collections are multi-lingual and contain heterogeneous items like postcards, letters, journals, photographs, etc. Technological advances in natural language understanding and in data management have led to the automation of document categorization via automatic topic detection. 
To use existing topic detection technologies on Europeana collections there are several challenges to be addressed: (1) ensure representative and qualitative training data, (2) ensure the quality of the learned topics, and (3) provide efficient and scalable solutions for searching for related content based on the automatically detected topics, and for suggesting the most relevant topics for new items. This paper describes each of these challenges and the proposed solutions in more detail, thus offering a novel perspective on how digital curation practices can be enhanced with the help of machine learning technologies.</span></p> Medina Andresel Sergiu Gordea Srdjan Stevanetic Mina Schütz ##submission.copyrightStatement## 2022-09-20 2022-09-20 17 1 12 12 10.2218/ijdc.v17i1.819 Synchronic Curation for Assessing Reuse and Integration Fitness of Multiple Data Collections <p>Data-driven applications often require using data integrated from different, large, and continuously updated collections. Each of these collections may present gaps or overlapping data, contain conflicting information, or complement one another. Thus, a curation need is to continuously assess whether data from multiple collections are fit for integration and reuse. To assess different large data collections at the same time, we present the Synchronic Curation (SC) framework. SC involves processing steps to map the different collections to a unifying data model that represents research problems in a scientific area. The data model, which includes the collections' provenance and a data dictionary, is implemented in a graph database where collections are continuously ingested and can be queried. SC has a collection analysis and comparison module to track updates, and to identify gaps, changes, and irregularities within and across collections. Assessment results can be explored through a web-based interactive graph. 
In this paper we introduce SC as an interdisciplinary enterprise, and illustrate its capabilities through its implementation in ASTRIAGraph, a space sustainability knowledge system.</p> Maria Esteva Weijia Xu Nevan Simone Kartik Nagpal Amit Gupta Moriba Jah ##submission.copyrightStatement## 2022-10-11 2022-10-11 17 1 11 11 10.2218/ijdc.v17i1.847 Building LABDRIVE, a Petabyte Scale, OAIS/ISO 16363 Conformant, Environmentally Sustainable Archive, Tested by Large Scientific Organisations to Preserve their Raw and Processed Data, Software and Documents <p>Vast amounts of scientific, cultural, social, business, government and other information are being created every day. There are billions of objects, in a multitude of formats, semantics and associated software. Much, perhaps the majority, of this information is transitory but there is still an immense amount which should be preserved for the medium and long term – perhaps even indefinitely.</p> <p>Preservation requires that the information continues to be usable, not simply to be printed or displayed. Of course, the digital objects (the bits) must be preserved, as must the “metadata” which enables the bits to be understood, which includes the software.</p> <p>Before LABDRIVE no system could adequately preserve such information, especially in such gigantic volume and variety.</p> <p>In this paper we describe the development of LABDRIVE and its ability to preserve tens or hundreds of petabytes in a way which is conformant to the OAIS Reference Model and capable of being ISO 16363 certified.</p> David Leslie Giaretta Teo Redondo ##submission.copyrightStatement## 2022-09-21 2022-09-21 17 1 15 15 10.2218/ijdc.v17i1.841 From Siloed to Reusable <p>In the past twenty-five years, cross-institutional communities have come together in the creation and use of open source software and open data standards to build digital collections (Madden, 2012). 
These librarians, developers, archivists, artists, and researchers recognize that the custom-built architectures and bespoke data structures of earlier digital collections development are unsustainable. Their collaborations have produced now-standard technologies such as Samvera, Fedora, GeoBlacklight, Islandora 8, as well as RDF, and JSON-LD among other open schemas. A core principle animating these efforts is reusability: data, schemas, and technologies in the open era must be coherent and flexible enough to be reused across multiple digital contexts. The authors of this paper show how reuse guided the migration of the Hopkins Digital Library from an outdated isolated system to a sustainable interconnected environment in GeoBlacklight, Islandora, with metadata based in Linked Open Data. Three areas of reuse focus this paper: the creation of robust interoperable metadata; the expansion of IIIF functionality to integrate the needs of the Hopkins Geoportal’s users; the development of a broadly re/usable data migration module focused on expanding a diverse community of invested users. In focusing on reusability as an organising principle of digital collections development, this case study shows how one digital curation team produced a platform that meets the changing and specific needs of an individual institution, on the one hand, and participated in and furthered the creative coherence of the open communities supporting the team’s work, on the other.</p> Kathryn Gucer Michelle Janowiecki ##submission.copyrightStatement## 2022-09-21 2022-09-21 17 1 10 10 10.2218/ijdc.v17i1.827 Fostering the Adoption of DMP in Small Research Projects through a Collaborative Approach <p>In order to promote sound management of research data the European Commission, under the Horizon 2020 framework program, is promoting the adoption of a Data Management Plan (DMP) in research projects. 
Despite the value of a DMP in making data findable, accessible, interoperable and reusable (FAIR) through time, the development and implementation of DMPs are not yet common practice in health research. Raising the awareness of researchers in small projects to the benefits of early adoption of a DMP is, therefore, a motivator for others to follow suit. In this paper we describe an approach to engage researchers in the writing of a DMP, in an ongoing project, FrailSurvey, in which researchers are collecting data through a mobile application for self-assessment of fragility. The case study is supported by interviews, a metadata creation session, and the validation of recommendations by researchers. Alongside the outline of our process, we describe the tools and services that supported the development of the DMP in this small project, particularly since there were no institutional services available to researchers.</p> André Maciel João Aguiar Castro Cristina Ribeiro Marta Almada Luís Midão ##submission.copyrightStatement## 2022-09-07 2022-09-07 17 1 14 14 10.2218/ijdc.v17i1.849 Who Writes Scholarly Code? <p>This paper presents original research about the behaviours, histories, demographics, and motivations of scholars who code, specifically how they interact with version control systems locally and on the Web. By understanding patrons through multiple lenses – daily productivity habits, motivations, and scholarly needs – librarians and archivists can tailor services for software management, curation, and long-term reuse, raising the possibility of long-term reproducibility of a multitude of scholarship. 
</p> Sarah Nguyễn Vicky Rampin ##submission.copyrightStatement## 2022-11-01 2022-11-01 17 1 18 18 10.2218/ijdc.v17i1.839 Automation is Documentation: Functional Documentation of Human-Machine Interaction for Future Software Reuse <p class="Abstract">Preserving software and providing access to obsolete software are necessary and will become even more important for work with any kind of born-digital artifact. While usability and availability of emulation in digital curation and preservation workflows have improved significantly, productive (re)use of preserved obsolete software is a growing concern, due to a lack of (future) operational knowledge. In this article we describe solutions to automate and document software usage in such a way that the result is not only instructive but also productive.</p> Jurek Oberhauser Rafael Gieschke Klaus Rechert ##submission.copyrightStatement## 2022-09-06 2022-09-06 17 1 11 11 10.2218/ijdc.v17i1.836 Cluster Analysis of Open Research Data: A Case for Replication Metadata <p class="Abstract"><span lang="EN-GB">Research data are often released upon journal publication to enable result verification and reproducibility. For that reason, research dissemination infrastructures typically support diverse datasets coming from numerous disciplines, from tabular data and program code to audio-visual files. Metadata, or <em>data about data</em>, is critical to making research outputs adequately documented and FAIR. Aiming to contribute to the discussions on the development of metadata for research outputs, I conducted an exploratory analysis to determine how research datasets cluster based on what researchers organically deposit together. I use the content of over 40,000 datasets from the Harvard Dataverse research data repository as my sample for the cluster analysis. I find that the majority of the clusters are formed by single-type datasets, while in the rest of the sample, no meaningful clusters can be identified. 
To interpret the results, I use the metadata standard employed by DataCite, a leading organization for documenting the scholarly record, and map existing <em>resource types</em> to my results. About 65% of the sample can be described with a single metadata type (such as <em>Dataset</em>, <em>Software</em> or <em>Report</em>), while the rest would require aggregate metadata types. Though DataCite supports an aggregate type such as a <em>Collection</em>, I argue that a significant number of datasets, in particular those containing both data and code files (about 20% of the sample), would be more accurately described as a <em>Replication resource</em> metadata type. Such a resource type would be particularly useful in facilitating research reproducibility.</span></p> Ana Trisovic ##submission.copyrightStatement## 2023-02-02 2023-02-02 17 1 13 13 10.2218/ijdc.v17i1.833 DBRepo: a Semantic Digital Repository for Relational Databases <p>Data curation is a complex, multi-faceted task. While dedicated data stewards are starting to take care of these activities in close collaboration with researchers for many types of (usually file-based) data in many institutions, this is rarely yet the case for data held in relational databases. Beyond large-scale infrastructures hosting, e.g., climate or genome data, researchers usually have to create, build and maintain their database, take care of security patches, and feed data into it in order to use it in their research. 
Data curation, if at all, usually happens after a project is finished, when data may be exported for digital preservation into file repository systems.</p> <p>We present DBRepo, a semantic digital repository for relational databases in a private cloud setting designed to (1) host research data stored in relational databases right from the beginning of a research project, (2) provide separation of concerns, allowing the researchers to focus on the domain aspects of the data and their work while bringing in experts to handle classic data management tasks, (3) improve findability, accessibility and reusability by offering semantic mapping of metadata attributes, and (4) focus on reproducibility in dynamically evolving data by supporting versioning and precise identification/cite-ability for arbitrary subsets of data.<span class="Apple-converted-space">&nbsp;</span></p> Martin Weise Moritz Staudinger Cornelia Michlits Eva Gergely Kirill Stytsenko Raman Ganguly Andreas Rauber ##submission.copyrightStatement## 2022-09-07 2022-09-07 17 1 11 11 10.2218/ijdc.v17i1.825 Long-Term Preservation and Reusability of Open Access Scholar-Led Press Monographs <p>This brief report outlines some initial findings and challenges identified by the Community-Led Open Publication Infrastructures for Monographs (COPIM) project when looking to archive and preserve open access books produced by small, scholar-led presses. 
This paper is based on the research conducted by Work Package 7 in COPIM, which has a focus on the preservation and archiving of open access monographs in all their complexity, along with any accompanying materials.</p> Miranda Barnes Ross Higman Gareth J Cole Rupert Gatti Jenny Fry ##submission.copyrightStatement## 2023-02-05 2023-02-05 17 1 5 5 10.2218/ijdc.v17i1.826 OpenCitations: an Open e-Infrastructure to Foster Maximum Reuse of Citation Data <p>OpenCitations is an independent not-for-profit infrastructure organization for open scholarship dedicated to the publication of open bibliographic and citation data by the use of Semantic Web (Linked Data) technologies. OpenCitations collaborates with projects that are part of the Open Science ecosystem and complies with the UNESCO founding principles of Open Science, the I4OC recommendations, and the FAIR data principles that data should be Findable, Accessible, Interoperable and Reusable. Since its data satisfies all the Reuse guidelines provided by FAIR in terms of richness, provenance, usage licenses and domain-relevant community standards, OpenCitations provides an example of a successful open e-infrastructure in which the reusability of data is integral to its mission.</p> Chiara Di Giambattista Ivan Heibi Silvio Peroni David Shotton ##submission.copyrightStatement## 2022-08-03 2022-08-03 17 1 5 5 10.2218/ijdc.v17i1.818 Proposal for a Maturity Continuum Model for Open Research Data <p>As a contribution to the general effort in research to generalize and improve the practices of Open Research Data (ORD), we developed a model conceptualizing the degrees of maturity of a research community in terms of ORD. 
This model may be used to assess the ORD capacity or maturity level of a specific research community, to strengthen the use of standards with respect to ORD within this community, and to increase its ORD maturity level.</p> <p>We present the background and our motivations for developing such an instrument as well as the reasoning leading to its design. We present its elements in detail and discuss possible applications.</p> Marielle Guirlet Gaia Bongi Elise Point Grégoire Urvoy René Schneider ##submission.copyrightStatement## 2023-01-27 2023-01-27 17 1 6 6 10.2218/ijdc.v17i1.821 On the Reusability of Data Cleaning Workflows <p>The goal of data cleaning is to make data fit for purpose, i.e., to improve data quality, through updates and data transformations, such that downstream analyses can be conducted and lead to trustworthy results. A transparent and reusable data cleaning workflow can save time and effort through automation, and make subsequent data cleaning on new data less error-prone. However, reusability of data cleaning workflows has received little to no attention in the research community. We identify some challenges and opportunities for reusing data cleaning workflows. We present a high-level conceptual model to clarify what we mean by reusability and propose ways to improve reusability along different dimensions. 
We use the opportunity of presenting at IDCC to invite the community to share their use cases, experiences, and desiderata for the reuse of data cleaning workflows and recipes in order to foster new collaborations and guide future work.</p> Lan Li Bertram Ludäscher ##submission.copyrightStatement## 2022-09-27 2022-09-27 17 1 6 6 10.2218/ijdc.v17i1.828 Data Curation Strategies to Support Responsible Big Social Research and Big Social Data Reuse <p class="Abstract">Big social research repurposes existing data from online sources such as social media, blogs, or online forums, with a goal of advancing knowledge of human behavior and social phenomena. Big social research also presents an array of challenges that can prevent data sharing and reuse.</p> <p class="Abstract">This brief report presents an overview of a larger study that aims to understand the data curation implications of big social research to support use and reuse of big social data. The study, which is based in the United States, identifies six key issues relating to big social research and big social data curation through a review of the literature. It then further investigates perceptions and practices relating to these six key issues through semi-structured interviews with big social researchers and data curators.</p> <p class="Abstract">This report concludes with implications for data curation practice: metadata and documentation, connecting with researchers throughout the research process, data repository services, and advocating for community standards. 
Supporting responsible practices for using big social data can help scale up social science research, thus enhancing our understanding of human behavior and social phenomena.</p> Sara Mannheimer ##submission.copyrightStatement## 2022-12-06 2022-12-06 17 1 8 8 10.2218/ijdc.v17i1.823 Increasing the Reuse of Data through FAIR-enabling the Certification of Trustworthy Digital Repositories <p class="Abstract">The long-term preservation of digital objects, and the means by which they can be reused, are addressed by both the FAIR Data Principles (Findable, Accessible, Interoperable, Reusable) and a number of standards bodies providing Trustworthy Digital Repository (TDR) certification, such as the CoreTrustSeal. Though many of the requirements listed in the <em>Core Trustworthy Data Repositories Requirements 2020–2022 Extended Guidance</em> address the FAIR Data Principles indirectly, there is currently no formal ‘FAIR Certification’ offered by the CoreTrustSeal or other TDR standards bodies. To address this gap the FAIRsFAIR project developed a number of tools and resources that facilitate the assessment of FAIR-enabling practices at the repository level as well as the FAIRness of datasets within them. These include the <em>CoreTrustSeal+FAIRenabling Capability Maturity model</em> (CTS+FAIR CapMat), a FAIR-Enabling <em>Trustworthy Digital Repositories-Capability Maturity Self-Assessment</em> template, and F-UJI, a web-based tool designed to assess the FAIRness of research data objects. The success of such tools and resources ultimately depends upon community uptake. 
This requires a community-wide commitment to develop best practices to increase the reuse of data and to reach consensus on what these practices are.&nbsp; One possible way of achieving community consensus would be through the creation of a network of FAIR-enabling TDRs, as proposed by FAIRsFAIR.</p> Benjamin Jacob Mathers Hervé L’Hours ##submission.copyrightStatement## 2023-10-03 2023-10-03 17 1 5 5 10.2218/ijdc.v17i1.852 Towards Environmentally Sustainable Long-term Digital Preservation <p class="Abstract">ARCHIVER and Pre-Commercial Procurement funding has enabled small to medium enterprises (SMEs) to innovate and deliver new services for EOSC. Within the framework of the <a href=""><span style="color: windowtext; text-decoration: none; text-underline: none;">ARCHIVER </span></a>pre-commercial procurement tender, between December 2020 and August 2021, three commercial consortia competed to deliver innovative, prototype solutions for long-term data preservation. Two of them were selected to continue with the pilot phase and deliver research-ready solutions for long-term data preservation of research data, therefore filling a gap in the current European Open Science panorama.</p> <p class="Abstract">Digital preservation relies on technological infrastructure (information and communication technology, ICT) that can have environmental impacts. While altering technology usage can reduce the impact of digital preservation practices, this alone is not a strategy for sustainable practice. Moving toward environmentally sustainable digital preservation requires critically examining the motivations and assumptions that shape current practice. The use of scalable cloud infrastructures can reduce the environmental impacts of long-term data preservation solutions.</p> Ignacio Peluaga João Fernandes Shreyasvi Natraj ##submission.copyrightStatement## 2022-10-31 2022-10-31 17 1 6 6 10.2218/ijdc.v17i1.848 Uncommon Commons? 
Creative Commons Licencing in Horizon 2020 Data Management Plans <p class="Abstract" style="margin: 0cm -.05pt 5.0pt 0cm;"><span lang="EN-GB">As policies, good practices and mandates on research data management evolve, more emphasis has been put on the licencing of data, which allows potential re-users to quickly identify what they can do with the data in question. In this paper I analyse a pre-existing collection of 840 Horizon 2020 public data management plans (DMPs) to determine which ones mention creative commons licences and among those who do, which licences are being used. </span></p> <p class="Abstract" style="margin: 0cm -.05pt 5.0pt 0cm;"><span lang="EN-GB">I find that 36% of DMPs mention creative commons and among those a number of different approaches towards licencing exist (overall policy per project, licencing decisions per dataset, licencing decisions per partner, licensing decision per data format, licensing decision per perceived stakeholder interest), often clad in rather vague language with CC licences being “recommended” or “suggested”. Some DMPs also “kick the can further down the road” by mentioning that “a” CC licence will be used, but not which one. However, among those DMPs that do mention specific CC licences, a clear favourite emerges: the CC-BY licence, which accounts for half of the total mentioning of a specific licence. </span></p> <p class="Abstract" style="margin: 0cm -.05pt 5.0pt 0cm;"><span lang="EN-GB">The fact that 64% of DMPs did not mention creative commons at all is an indication for the need for further training and awareness raising on data management in general and licencing in particular in Horizon Europe. For those DMPs that do mention specific licences, 60% would be compliant with Horizon Europe requirements (CC-BY or CC0). 
However, it should be carefully monitored whether content similar to the 40% that is currently licenced with non-Horizon Europe-compliant licences will in the future move to CC-BY or CC0 or whether such content will simply be kept fully closed by projects (by invoking the “as open as possible, as closed as necessary” principle), which would be an unintended and potentially damaging consequence of the policy. </span></p> Daniel Spichtinger ##submission.copyrightStatement## 2022-09-20 2022-09-20 17 1 9 9 10.2218/ijdc.v17i1.840 Data Showcases: the Data Journal in a Multimodal World <p>As an experiment, the Research Data Journal for the Humanities and Social Sciences (RDJ) has temporarily extended the usual format of the online journal with so-called ‘showcases’, separate web pages containing a quick introduction to a dataset, embedded multimedia, interactive components, and facilities to directly preview and explore the dataset described. The aim was to create a coherent hyper document with content communicated via different media (multimodality) and provide space for new forms of scientific publication such as executable papers (e.g. Jupyter notebooks). This paper discusses the objectives, technical implementations, and the need for innovation in data publishing considering the advanced possibilities of today's digital modes of communication. The data showcases experiment proved to be a useful starting point for an exploration of related developments within and outside the humanities and social sciences. It turns out that small-scale experiments are relatively easy to perform thanks to the easy availability of digital technology. However, real innovation in publishing affects organization and infrastructure and requires the joint effort of publishers, editors, data repositories, and authors. It implies a thorough update of the concept of publication and adaptation of the production process. 
This paper also pays attention to these obstacles to taking new paths.</p> Leen Breure Peter Doorn Hans Voorbij ##submission.copyrightStatement## 2022-12-06 2022-12-06 17 1 24 24 10.2218/ijdc.v17i1.789 Analysis of U.S. Federal Funding Agency Data Sharing Policies <p class="Abstract">Federal funding agencies in the United States (U.S.) continue to work towards implementing their plans to increase public access to funded research and comply with the 2013 Office of Science and Technology Policy memo <em>Increasing Access to the Results of Federally Funded Scientific Research</em>. In this article we report on an analysis of research data sharing policy documents from 17 U.S. federal funding agencies as of February 2021. Our analysis is guided by two questions: (1) What do the findings suggest about the current state of and trends in U.S. federal funding agency data sharing requirements? (2) In what ways are universities, institutions, associations, and researchers affected by and responding to these policies? Over the past five years, policy updates were common among these agencies and several themes have been thoroughly developed in that time; however, uncertainty remains around how funded researchers are expected to satisfy these policy requirements.</p> Reid I. Boehm Hannah Calkins Patricia B. Condon Jonathan Petters Rachel Woodbrook ##submission.copyrightStatement## 2023-02-08 2023-02-08 17 1 18 18 10.2218/ijdc.v17i1.791 Data Curation in Interdisciplinary and Highly Collaborative Research <p class="Abstract"><span lang="EN-US">This paper provides a systematic analysis of publications that discuss data curation in interdisciplinary and highly collaborative research (IHCR). Using content analysis methodology, it examined 159 publications and identified patterns in definitions of interdisciplinarity, projects’ participants and methodologies, and approaches to data curation. The findings suggest that data is a prominent component in interdisciplinarity. 
In addition to crossing disciplinary and other boundaries, IHCR is defined as curating and integrating heterogeneous data and creating new forms of knowledge from it. Using personal experiences and descriptive approaches, the publications discussed challenges that data curation in IHCR faces, including an increased overhead in coordination and management, lack of consistent metadata practices, and custom infrastructure that makes interoperability across projects, domains, and repositories difficult. The paper concludes with suggestions for future research.</span></p> Inna Kouper ##submission.copyrightStatement## 2023-10-01 2023-10-01 17 1 20 20 10.2218/ijdc.v17i1.835