http://www.ijdc.net/issue/feed International Journal of Digital Curation 2018-12-10T00:36:40+00:00 IJDC Editorial Team ijdc@mlist.is.ed.ac.uk Open Journal Systems <p>The IJDC publishes peer-reviewed papers, articles and editorials on digital curation, research data management and related issues.</p> http://www.ijdc.net/article/view/429 Modelling the Research Data Lifecycle 2018-12-10T00:32:15+00:00 Stacy T Kowalczyk skowalczyk@dom.edu <p>This paper develops and tests a lifecycle model for the preservation of research data by investigating the research practices of scientists.  This research is based on a mixed-method approach.  An initial study was conducted using case study analytical techniques; insights from these case studies were combined with grounded theory in order to develop a novel model of the Digital Research Data Lifecycle.  A broad-based quantitative survey was then constructed to test and extend the components of the model.  The major contribution of these research initiatives is the creation of the Digital Research Data Lifecycle, a data lifecycle that provides a generalized model of the research process to better describe and explain both the antecedents and barriers to preservation.  The antecedents and barriers to preservation are data management, contextual metadata, file formats, and preservation technologies.  The availability of data management support and preservation technologies, the ability to create and manage contextual metadata, and the choices of file formats all significantly affect the preservability of research data.</p> 2018-06-01T23:13:59+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/500 A Data-Driven Approach to Appraisal and Selection at a Domain Data Repository 2018-12-10T00:32:14+00:00 Amy M Pienta apienta@umich.edu Dharma Akmon apienta@umich.edu Justin Noble apienta@umich.edu Lynette Hoelter apienta@umich.edu Susan Jekielek apienta@umich.edu <p class="abstract-western" lang="en-US">Social scientists are producing an ever-expanding volume of data, leading to questions about appraisal and selection of content given finite resources to process data for reuse. We analyze users’ search activity in an established social science data repository to better understand demand for data and more effectively guide collection development. By applying a data-driven approach, we aim to ensure curation resources are applied to make the most valuable data findable, understandable, accessible, and usable. We analyze data from a domain repository for the social sciences that includes over 500,000 annual searches in 2014 and 2015 to better understand trends in user search behavior. Using a newly created search-to-study ratio technique, we identified gaps in the domain data repository’s holdings and leveraged this analysis to inform our collection and curation practices and policies. The evaluative technique we propose in this paper will serve as a baseline for future studies looking at trends in user demand over time at the domain data repository being studied, with broader implications for other data repositories.</p> 2018-06-01T23:34:53+01:00 ##submission.copyrightStatement##
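The search-to-study ratio mentioned in the abstract above is not defined in detail in this feed entry. As a rough, hypothetical illustration of the idea (comparing how often users search for a topic with how many studies the repository actually holds on it), a sketch along these lines could flag candidate collection gaps; the topic labels, counts and threshold below are invented placeholders, not the authors' data:

```python
# Illustrative sketch only: the published search-to-study ratio technique is
# not specified here. Topics, counts, and the threshold are hypothetical.

def search_to_study_ratio(searches: int, studies_held: int) -> float:
    """Ratio of user searches on a topic to the number of studies held on it."""
    return searches / studies_held if studies_held else float("inf")

# Hypothetical per-topic tallies from search logs and the repository catalogue.
topic_counts = {
    "opioid use": (5400, 12),
    "voting behavior": (3100, 240),
    "food insecurity": (2800, 9),
}

GAP_THRESHOLD = 100  # assumed cut-off for flagging a possible collection gap
for topic, (searches, studies) in topic_counts.items():
    ratio = search_to_study_ratio(searches, studies)
    if ratio > GAP_THRESHOLD:
        print(f"Possible acquisition gap: {topic} (ratio {ratio:.0f})")
```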
http://www.ijdc.net/article/view/512 Tuuli project: accelerating data management planning in Finnish research organisations 2018-12-10T00:32:23+00:00 Minna Ahokas mari.elisa.kuusniemi@helsinki.fi Mari Elisa Kuusniemi mari.elisa.kuusniemi@helsinki.fi Jari Friman mari.elisa.kuusniemi@helsinki.fi <p class="BodyText2">Many research funders have requirements for data sharing and data management plans (DMP). DMP tools are services built to help researchers to create data management plans fitting their needs and based on funder and/or organisation guidelines. Project Tuuli (2015–2017) has provided DMPTuuli, a data management planning tool for Finnish researchers and research organisations, offering DMP templates and guidance. In this paper we describe how the project has helped both Finnish researchers and research organisations adopt research data management best practices. As a result of the project we have also created a national Tuuli network. With the growing competence and collaboration of the network, the project has reached most of its goals. The project has also actively promoted DMP support and training in Finnish research organisations.</p> 2018-02-11T23:02:30+00:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/552 Building Tools to Support Active Curation: Lessons Learned from SEAD 2018-12-10T00:32:24+00:00 Dharma Akmon dharmrae@umich.edu Margaret Hedstrom hedstrom@umich.edu James D. Myers myersjd@umich.edu Anna Ovchinnikova dharmrae@umich.edu Inna Kouper inkouper@indiana.edu <p class="abstract-western">SEAD – a project funded by the US National Science Foundation’s DataNet program – has spent the last five years designing, building, and deploying an integrated set of services to better connect scientists’ research workflows to data publication and preservation activities. Throughout the project, SEAD has promoted the concept and practice of “active curation,” which consists of capturing data and metadata early and refining it throughout the data life cycle. In promoting active curation, our team saw an opportunity to develop tools that would help scientists better manage data for their own use, improve team coordination around data, implement practices that would serve the data better over time, and seamlessly connect with data repositories to ease the burden of sharing and publishing.</p> <p class="abstract-western">SEAD has worked with 30 projects, dozens of researchers, and hundreds of thousands of files, providing us with ample opportunities to learn about data and metadata, about integrating with researchers’ workflows, and about building tools and services for data.
In this paper, we discuss the lessons we have learned and suggest how this might guide future data infrastructure development efforts.</p> 2018-01-02T22:15:53+00:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/516 Reuse for Research: Curating Astrophysical Datasets for Future Researchers 2018-12-10T00:32:25+00:00 Anders Sparre Conrad asc@kb.dk Rasmus Handberg asc@kb.dk Michael Svendsen asc@kb.dk <p class="abstract-western"><span style="color: #000000;">“Our data are going to be valuable for science for the next 50 years, so please make sure you preserve them and keep them accessible for active research for at least that period.”</span></p> <p class="abstract-western">These were approximately the words used by the principal investigator of the Kepler Asteroseismic Science Consortium (KASC) when he presented our task to us. The data in question consists of data products produced by KASC researchers and working groups as part of their research, as well as underlying data imported from the NASA archives.</p> <p class="abstract-western">The overall requirements for 50 years of preservation while, at the same time, enabling reuse of the data for active research presented a number of specific challenges, closely intertwining data handling and data infrastructure with scientific issues. This paper reports our work to deliver the best possible solution, performed in close cooperation between the research team and library personnel.</p> 2017-12-30T21:57:51+00:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/570 Integration of an Active Research Data System with a Data Repository to Streamline the Research Data Lifecyle: Pure-NOMAD Case Study 2018-12-10T00:32:18+00:00 Simone Ivan Conte ff23@st-andrews.ac.uk Federica Fina ff23@st-andrews.ac.uk Michalis Psalios sic2@st-andrews.ac.uk Shyam Ryal sic2@st-andrews.ac.uk Tomas Lebl sic2@st-andrews.ac.uk Anna Clements ff23@st-andrews.ac.uk <p class="abstract-western"><span style="color: #000000;">Research funders have introduced requirements that expect researchers to properly manage and publicly share their research data, and expect institutions to put in place services to support researchers in meeting these requirements. So far the general focus of these services and systems has been on addressing the final stages of the research data lifecycle (archive, share and re-use), rather than stages related to the active phase of the cycle (collect/create and analyse). As a result, full integration of active data management systems with data repositories is not yet the norm, making the streamlined transition of data from an active to a published and archived status an important challenge. In this paper we present the integration between an active data management system developed in-house (NOMAD) and Elsevier’s Pure data repository used at our institution, with the aim of offering a simple workflow to facilitate and promote the data deposit process. 
The integration results in a new data management and publication workflow that helps researchers to save time, minimize human errors related to manually handling files, and further promote data deposit together with collaboration across the institution</span><span style="color: #000000;">.</span></p> 2018-04-19T14:47:51+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/576 Library Carpentry: Software Skills Training for Library Professionals 2018-12-10T00:32:17+00:00 Jez Cope j.s.cope@sheffield.ac.uk James Baker j.s.cope@sheffield.ac.uk <p class="abstract-western">Much time and energy is now being devoted to developing the skills of researchers in the related areas of data analysis and data management. However, less attention is currently paid to developing the data skills of librarians themselves: these skills are often brought in by recruitment in niche areas rather than considered as a wider development need for the library workforce, and are not widely recognised as important to the professional career development of librarians. We believe that building computational and data science capacity within academic libraries will have direct benefits for both librarians and the users we serve.</p> <p class="abstract-western">Library Carpentry is a global effort to provide training to librarians in technical areas that have traditionally been seen as the preserve of researchers, IT support and systems librarians. Established non-profit volunteer organisations, such as Software Carpentry and Data Carpentry, offer introductory research software skills training with a focus on the needs and requirements of research scientists. Library Carpentry is a comparable introductory software skills training programme with a focus on the needs and requirements of library and information professionals. This paper describes how the material was developed and delivered, and reports on challenges faced, lessons learned and future plans.</p> 2018-05-11T18:25:02+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/551 When Scientists Become Social Scientists: How Citizen Science Projects Learn About Volunteers 2018-12-10T00:36:40+00:00 Peter Darch ptdarch@illinois.edu <p class="abstract-western">Online citizen science projects involve recruitment of volunteers to assist researchers with the creation, curation, and analysis of large datasets. Enhancing the quality of these data products is a fundamental concern for teams running citizen science projects. Decisions about a project’s design and operations have a critical effect both on whether the project recruits and retains enough volunteers, and on the quality of volunteers’ work. The processes by which the team running a project learn about their volunteers play a critical role in these decisions. Improving these processes will enhance decision-making, resulting in better quality datasets, and more successful outcomes for citizen science projects. This paper presents a qualitative case study, involving interviews and long-term observation, of how the team running Galaxy Zoo, a major citizen science project in astronomy, came to know their volunteers and how this knowledge shaped their decision-making processes. This paper presents three instances that played significant roles in shaping Galaxy Zoo team members’ understandings of volunteers. Team members integrated heterogeneous sources of information to derive new insights into the volunteers. 
Project metrics and formal studies of volunteers were combined with tacit understandings gained through on- and offline interactions with volunteers. This paper presents a number of recommendations for practice. These recommendations include strategies for improving how citizen science project team members learn about volunteers, and how teams can more effectively circulate among themselves what they learn.</p> 2018-12-10T02:28:40+00:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/583 The Changing Influence of Journal Data Sharing Policies on Local RDM Practices 2018-12-10T00:32:14+00:00 Dylanne Dearborn dylanne.dearborn@utoronto.ca Steve Marks dylanne.dearborn@utoronto.ca Leanne Trimble dylanne.dearborn@utoronto.ca <p class="abstract-western">The purpose of this study was to examine changes in research data deposit policies of highly ranked journals in the physical and applied sciences between 2014 and 2016, as well as to develop an approach to examining the institutional impact of deposit requirements. Policies from the top ten journals (ranked by impact factor from the Journal Citation Reports) were examined in 2014 and again in 2016 in order to determine if data deposits were required or recommended, and which methods of deposit were listed as options. For all 2016 journals with a required data deposit policy, publication information (2009-2015) for the University of Toronto was pulled from Scopus and departmental affiliation was determined for each article. The results showed that the number of high-impact journals in the physical and applied sciences requiring data deposit is growing. In 2014, 71.2% of journals had no policy, 14.7% had a recommended policy, and 13.9% had a required policy (n=836). In contrast, in 2016, there were 58.5% with no policy, 19.4% with a recommended policy, and 22.0% with a required policy (n=880). It was also evident that U of T chemistry researchers are by far the most heavily affected by these journal data deposit requirements, having published 543 publications, representing 32.7% of all publications in the titles requiring data deposit in 2016. The Python scripts used to retrieve institutional publications based on a list of ISSNs have been released on GitHub so that other institutions can conduct similar research.</p> 2018-06-04T15:23:06+01:00 ##submission.copyrightStatement##
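The Dearborn et al. abstract above refers to released Python scripts that retrieve an institution's publications from Scopus for a list of ISSNs; those scripts are on GitHub and are not reproduced here. The following is only a minimal sketch of that kind of retrieval using the public Scopus Search API; the query fields, affiliation ID and API key are assumptions or placeholders to be checked against Elsevier's current documentation, not the authors' code:

```python
# Hypothetical sketch, not the released scripts. Verify the Scopus Search API
# parameters against Elsevier's documentation before use; the affiliation ID
# and API key below are placeholders.
import requests

SCOPUS_SEARCH = "https://api.elsevier.com/content/search/scopus"
API_KEY = "YOUR-ELSEVIER-API-KEY"   # placeholder
AFFILIATION_ID = "60000000"         # placeholder Scopus affiliation ID

def publications_for_issn(issn, date_range="2009-2015"):
    """Return Scopus records for one journal ISSN and one institution."""
    params = {
        "query": f"ISSN({issn}) AND AF-ID({AFFILIATION_ID})",
        "date": date_range,
        "apiKey": API_KEY,
    }
    response = requests.get(SCOPUS_SEARCH, params=params, timeout=30)
    response.raise_for_status()
    return response.json()["search-results"].get("entry", [])

issns = ["0000-0000", "1111-1111"]  # placeholder list of journal ISSNs
for issn in issns:
    print(issn, len(publications_for_issn(issn)), "records")
```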
http://www.ijdc.net/article/view/567 Are the FAIR Data Principles fair? 2018-12-10T00:36:39+00:00 Alastair Dunning A.C.Dunning@tudelft.nl Madeleine de Smaele A.C.Dunning@tudelft.nl Jasmin Böhmer J.K.Boehmer@tudelft.nl <p class="abstract-western">This practice paper describes an ongoing research project to test the effectiveness and relevance of the FAIR Data Principles. Simultaneously, it will analyse how easy it is for data archives to adhere to the principles. The research took place from November 2016 to January 2017, and will be underpinned with feedback from the repositories.</p> <p class="abstract-western"><span style="color: #000000;">The FAIR Data Principles feature 15 facets corresponding to the four letters of FAIR - Findable, Accessible, Interoperable, Reusable. These principles have already gained traction within the research world. The European Commission has recently expanded its demand for research to produce open data. The relevant guidelines</span><sup><span style="color: #000000;"><a class="sdfootnoteanc" name="sdfootnote1anc"></a>1</span></sup><span style="color: #000000;"> are explicitly written in the context of the FAIR Data Principles. Given that an increasing number of researchers will have exposure to the guidelines, understanding their viability and suggesting where there may be room for modification and adjustment is of vital importance.</span></p> <p class="abstract-western"><span style="color: #000000;">This practice paper is connected to a dataset </span><span style="color: #000000;">(Dunning et al., </span><span style="color: #006b6b;"><span lang="zxx"><a class="western">2017</a></span></span><span style="color: #000000;">) containing the original overview of the sample group statistics and graphs, in an Excel spreadsheet. Over the course of two months, the web-interfaces, help-pages and metadata-records of over 40 data repositories have been examined, to score each data repository against the FAIR principles and facets. The traffic-light rating system enables colour-coding according to compliance and vagueness. The statistical analysis provides overall results as well as results broken down by category, by principle and by facet.</span></p> <p class="abstract-western">The analysis includes the statistical and descriptive evaluation, followed by elaborations on elements of the FAIR Data Principles, on subject-specific or repository-specific differences, and subsequently on what repositories can do to improve their information architecture.</p> <div id="sdfootnote1"> <p class="western"><a class="sdfootnotesym-western" name="sdfootnote1sym"></a>(1) H2020 Guidelines on FAIR Data Management: <span style="color: #006b6b;"><span lang="zxx"><a class="western" href="http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf">http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf</a></span></span></p> </div> 2018-12-10T02:28:40+00:00 ##submission.copyrightStatement##
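The Dunning et al. abstract above describes scoring more than 40 repositories against the 15 FAIR facets with a traffic-light rating and then summarising the results by principle and by facet. As a purely illustrative sketch of how such a tally might be computed (the facet labels and scores below are invented placeholders, not the study's data):

```python
# Illustrative tally only: the facet scores below are invented placeholders,
# not data from the Dunning et al. study.
from collections import defaultdict

# Traffic-light scores for one repository: 2 = green, 1 = amber, 0 = red.
scores = {
    "F1 persistent identifier": 2,
    "F4 indexed in a searchable resource": 1,
    "A1 retrievable by identifier": 2,
    "I1 formal knowledge representation": 0,
    "R1.1 clear usage licence": 1,
}

by_principle = defaultdict(list)
for facet, score in scores.items():
    by_principle[facet[0]].append(score)  # group facets by their F/A/I/R letter

for principle in "FAIR":
    facet_scores = by_principle.get(principle, [])
    if facet_scores:
        share = sum(facet_scores) / (2 * len(facet_scores))
        print(f"{principle}: {share:.0%} of the maximum across {len(facet_scores)} facet(s)")
```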
http://www.ijdc.net/article/view/509 A Framework for the Preservation of a Docker Container 2018-12-10T00:32:21+00:00 Iain Emsley iain.emsley@oerc.ox.ac.uk David De Roure iain.emsley@oerc.ox.ac.uk <p>Reliably building and maintaining systems across environments is a continuing problem. A project or experiment may run for years. Software and hardware may change, as can the operating system. Containerisation is a technology that is used in a variety of companies, such as Google, Amazon and IBM, and scientific projects to rapidly deploy a set of services repeatably. Using Dockerfiles to ensure that a container is built repeatably, allowing conformance checking and easy updating when changes take place, is becoming common within projects. It is seen as part of sustainable software development. Containerisation technology occupies a dual space: it is both a repository of software and software itself. In considering Docker in this fashion, we should verify that the Dockerfile can be reproduced. Using a subset of the Dockerfile specification, a domain specific language is created to ensure that Docker files can be reused at a later stage to recreate the original environment. We provide a simple framework to address the question of the preservation of containers and their environment. We present experiments on an existing Dockerfile and conclude with a discussion of future work. Building on this work, a pipeline was implemented to check that a defined Dockerfile conforms to our desired model and to extract the Docker and operating system details. This will help the reproducibility of results by creating the machine environment and package versions. It also helps development and testing through ensuring that the system is repeatably built and that any changes in the software environment can be equally shared in the Dockerfile. This work supports not only the citation process but also the open scientific one by providing environmental details of the work. As a part of the pipeline to create the container, we capture the processes used and put them into the W3C PROV ontology. This provides the potential for providing it with a persistent identifier and traceability of the processes used to preserve the metadata. Our future work will look at the question of linking this output to a workflow ontology to preserve the complete workflow with the commands and parameters to be given to the containers. We see this provenance within the build process as useful for providing a complete overview of the workflow.</p> 2018-04-02T11:50:32+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/577 Frictionless Data: Making Research Data Quality Visible 2018-12-10T00:32:16+00:00 Dan Fowler jo.barratt@okfn.org Jo Barratt jo.barratt@okfn.org Paul Walsh jo.barratt@okfn.org <p class="abstract-western"><span style="color: #000000;">There is significant friction in the acquisition, sharing, and reuse of research data. It is estimated that eighty percent of data analysis is invested in the cleaning and mapping of data (Dasu and Johnson, </span><span style="color: #006b6b;"><span lang="zxx"><a class="western">2003</a></span></span><span style="color: #000000;">). This friction hampers researchers not well versed in data preparation techniques from reusing an ever-increasing amount of data available within research data repositories. Frictionless Data is an ongoing project at Open Knowledge International focused on removing this friction. We are doing this by developing a set of tools, specifications, and best practices for describing, publishing, and validating data. The heart of this project is the “Data Package”, a containerization format for data based on existing practices for publishing open source software. This paper will report on current progress toward that goal.</span></p> 2018-05-13T10:53:47+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/511 Developing a Digital Archive for Symbolic Resources in Urban Environments - the Latina Project 2018-12-10T00:32:21+00:00 Robert Harland g.j.cole@lboro.ac.uk Antonia Liguori g.j.cole@lboro.ac.uk Gareth Cole g.j.cole@lboro.ac.uk <p class="abstract-western"><span style="color: #000000;">The project described in this paper was funded to establish the foundation for a digital archival resource for researchers interested in the way people interact with urban environments through graphic communications. The research was internally funded by Loughborough University as part of its Research Challenge Programme and involved two members of academic staff and two library staff.[1] </span><sup><span style="color: #000000;"><a class="sdfootnoteanc" name="sdfootnote1anc"></a></span></sup><span style="color: #000000;">Two PhD students also participated.</span></p> <p class="abstract-western">The archive consists of a small number of images and will act as a proof of concept, not only for this project but also for current and future funding applications. It is hoped that an extended archive will be useful not only to visual communication researchers, but also historians, architects, town planners and others.
This paper will describe the data collection process, the challenges facing the project team in data curation and data documentation, and the creation of the pilot archive.</p> <p class="abstract-western">The creation of the archive posed challenges for both the researchers and Library staff. For the researchers:</p> <ul> <li> <p class="abstract-western">Choosing a small number of images as a discrete collection but which also demonstrated the utility of the project to other disciplinary areas;</p> </li> <li> <p class="abstract-western">Acquiring the necessary knowledge and skills to enable good curation and usability of the digital objects, e.g. file formats, metadata creation;</p> </li> <li> <p class="abstract-western">Understanding what the technical solution enabled and where compromises would have to be made.</p> </li> </ul> <p class="abstract-western">For library staff:</p> <ul> <li> <p class="abstract-western">Demonstrating the utility of the Data Repository;</p> </li> <li> <p class="abstract-western">Understanding the intellectual background to the project and the purpose&nbsp;of the Data Archive within the project;</p> </li> <li> <p class="abstract-western">Clearly explaining the purpose of metadata and documentation.</p> </li> </ul> <p class="abstract-western">The Latina Project has demonstrated the value of a true partnership between the academic community and the professional services. All parties involved have learnt from the creation of the pilot archive and their practices have evolved. For example, it has made the researchers think more carefully about data curation questions and the professional services staff identify more closely with the research purposes for data creation. By working together so closely and sharing ideas from our different perspectives we have also identified potential technical developments which could be explored in future projects. All members of the group hope that the relationships built during this project will continue through other projects. [1] Academic staff: Drs Harland and Liguori. Library staff: Gareth Cole and Barbara Whetnall.</p> 2018-04-02T14:13:53+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/562 Creating a Community of Data Champions 2018-12-10T00:32:23+00:00 Rosie Higman rosie.higman@MANCHESTER.AC.UK Marta Teperek M.Teperek@tudelft.nl Danny Kingsley dak45@CAM.AC.UK <p class="abstract-western">Research Data Management (RDM) presents an unusual challenge for service providers in Higher Education. There is increased awareness of the need for training in this area but the nature of the discipline-specific practices involved make it difficult to provide training across a multi-disciplinary organisation. Whilst most UK universities now have a research data team of some description, they are often small and rarely have the resources necessary to provide targeted training to the different disciplines and research career stages that they are increasingly expected to support.</p> <p class="abstract-western">This practice paper describes the approach taken at the University of Cambridge to address this problem by creating a community of Data Champions. This collaborative initiative, working with researchers to provide training and advocacy for good RDM practice, allows for more discipline-specific training to be given, researchers to be credited for their expertise and creates an opportunity for those interested in RDM to exchange knowledge with others. 
The ‘community of practice’ model has been used in many sectors, including Higher Education, to facilitate collaboration across organisational units, and this initiative will adopt some of the same principles to improve communication across a decentralised institution. The Data Champions initiative at Cambridge was launched in September 2016 and this paper reports on the early months, plans for building the community in the future and the possible risks associated with this approach to providing RDM services.</p> 2018-02-11T17:09:08+00:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/506 Introducing Safe Access to Sensitive Data at the University of Bristol 2018-12-10T00:32:26+00:00 Debra Hiom d.hiom@bristol.ac.uk Stephen Gray stephen.gray@bristol.ac.uk Damian Steer d.steer@bristol.ac.uk Kirsty Merrett d.hiom@bristol.ac.uk Kellie Snow d.hiom@bristol.ac.uk Zosia Beckles d.hiom@bristol.ac.uk <p class="abstract-western"><span style="color: #000000;">The economic and societal benefits of making research data available for reuse and verification are now widely understood and accepted. However, there are some research studies, particularly those involving human participants, which face particular challenges in making their data openly available due to the sensitivities of the data. Despite its potential value to society, this material is invariably kept locked away due to concerns over its inappropriate disclosure. The University of Bristol’s Research Data Service has developed the institutional infrastructure, including policies and procedures, required to safely grant access to sensitive research data in a way that is transparent, secure, sustainable and, crucially, replicable by other institutions.</span></p> <p class="abstract-western">This paper looks at the background and challenges faced by the institution in dealing with sensitive data, outlines the approach taken and some of the outstanding issues to be tackled.</p> 2017-12-30T20:16:23+00:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/508 Evaluating the Effectiveness of Data Management Training: DataONE’s Survey Instrument 2018-12-10T00:32:25+00:00 Chung-Yi Hou hou@ucar.edu Heather Soyka hou@ucar.edu Vivian Hutchison hou@ucar.edu Isis Sema hou@ucar.edu Chris Allen hou@ncar.edu Amber Budden hou@ucar.edu <div class="WordSection1"> <p class="abstract-western">Effective management is a key component for preparing data to be retained for future long term access, use, and reuse by a broader community. Developing the skills to plan and perform data management tasks is important for individuals and institutions. Teaching data literacy skills may also help to mitigate the impact of data deluge and other effects of being overexposed to and overwhelmed by data.</p> <p class="abstract-western">The process of learning how to manage data effectively for the entire research data lifecycle can be complex. There are often multiple stages involved within a lifecycle for managing data, and each stage may require specific knowledge, expertise, and resources.
Additionally, although a range of organizations offers data management education and training resources, it can often be difficult to assess how effective the resources are for educating users to meet their data management requirements.</p> <p class="abstract-western">In the case of Data Observation Network for Earth (DataONE), DataONE’s extensive collaboration with individuals and organizations has informed the development of multiple educational resources. Through these interactions, DataONE understands that the process of creating and maintaining educational materials that remain responsive to community needs is reliant on careful evaluations. Therefore, the impetus for a comprehensive, customizable Education EVAluation instrument (EEVA) is grounded in the need for tools to assess and improve current and future training and educational resources for research data management.</p> <p class="abstract-western">In this paper, the authors outline and provide context for the background and motivations that led to creating EEVA for evaluating the effectiveness of data management educational resources. The paper details the process and results of the current version of EEVA. Finally, the paper highlights the key features, potential uses, and the next steps in order to improve future extensions and revisions of EEVA.</p> </div> 2017-12-31T14:30:45+00:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/574 Developing Data Curation Protocols for Digital Projects at Vanderbilt: Une Micro-Histoire 2018-12-10T00:32:17+00:00 Veronica A Ikeshoji-Orlati v.ikeshoji-orlati@vanderbilt.edu Clifford B Anderson v.ikeshoji-orlati@vanderbilt.edu <p class="abstract-western"><span style="color: #000000;">This paper examines the intersection of legacy digital humanities projects and the ongoing development of research data management services at Vanderbilt University’s Jean and Alexander Heard Library. Future directions for data management and curation protocols are explored through the lens of a case study: the (re)curation of data from an early 2000s e-edition of Raymond Poggenburg’s </span><span style="color: #000000;"><em>Charles Baudelaire: Une Micro-histoire</em></span><span style="color: #000000;">. The vagaries of applying the Library of Congress Metadata Object Description Schema (MODS) to the data and metadata of the</span><span style="color: #000000;"><em>Micro-histoire</em></span><span style="color: #000000;">will be addressed. In addition, the balance between curating data and metadata for preservation vs. curating it for (re)use by future researchers is considered in order to suggest future avenues for holistic research data management services at Vanderbilt.</span></p> 2018-05-08T11:40:10+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/578 How Valid is your Validation? A Closer Look Behind the Curtain of JHOVE 2018-12-10T00:32:16+00:00 Michelle Lindlar michelle.lindlar@tib.eu Yvonne Tunnat y.tunnat@zbw.eu <p class="abstract-western">Validation is a key task of any preservation workflow and often JHOVE is the first tool of choice for characterizing and validating common file formats. Due to the tool’s maturity and high adoption, decisions if a file is indeed fit for long-term availability are often made based on JHOVE output. But can we trust a tool simply based on its wide adoption and maturity by age? How does JHOVE determine the validity and well-formedness of a file? Does a module really support all versions of a file format family? 
How much of the file formats’ standards do we need to know and understand in order to interpret the output correctly? Are there options to verify JHOVE-based decisions within preservation workflows? While the software has been a long-standing favourite within the digital curation domain for many years, a recent look at JHOVE as a vital decision-supporting tool is currently missing. This paper presents a practice report which aims to close this gap.</p> 2018-05-13T11:56:21+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/518 Sharing Selves: Developing an Ethical Framework for Curating Social Media Data 2018-12-10T00:32:19+00:00 Sara Mannheimer a.whyte@ed.ac.uk Elizabeth A. Hull ehull@datadryad.org <p class="abstract-western">Open sharing of social media data raises new ethical questions that researchers, repositories and data curators must confront, with little existing guidance available. In this paper, the authors draw upon their experiences in their multiple roles as data curators, academic librarians, and researchers to propose the STEP framework for curating and sharing social media data. The framework is intended to be used by data curators facilitating open publication of social media data. Two case studies from the Dryad Digital Repository serve to demonstrate implementation of the STEP framework. The STEP framework can serve as one important ‘step’ along the path to achieving safe, ethical, and reproducible social media research practice.</p> 2018-04-18T21:53:32+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/507 Researcher Training in Spreadsheet Curation 2018-12-10T00:32:22+00:00 Gene Lyddon Melzack diana.sisu@ed.ac.uk <p>Spreadsheets are commonly used across most academic disciplines; however, their use has been associated with a number of issues that affect the accuracy and integrity of research data. In 2016, new training on spreadsheet curation was introduced at the University of Sydney to address a gap between practical software skills training and generalised research data management training. The approach to spreadsheet curation behind the training was defined and the training's distinction from other spreadsheet curation training offerings described.<br>The uptake of and feedback on the training were evaluated. Training attendance was analysed by discipline and by role. Quantitative and qualitative feedback were analysed and discussed. Feedback revealed that many attendees had been expecting and desired practical spreadsheet software skills training. Issues relating to whether or not practical skills training should and can be integrated with curation training were discussed. While attendees were found to be predominantly from science disciplines, qualitative feedback suggests that humanities attendees have specific needs in relation to managing data with spreadsheets that are currently not being met. Feedback also suggested that some attendees would prefer the curation training to be delivered as a longer, more in-depth, hands-on workshop.<br>The impact of the training was measured using data collected from the University's Research Data Management Planning (RDMP) tool and the Sydney eScholarship Repository. RDMP descriptions of spreadsheet data and records of tabular datasets published in the repository were analysed and assessed for quality and for accompanying data documentation.
No significant improvements in data documentation or quality were found; however, it is likely too soon after the launch of the training program to have seen much in the way of impact.<br>Identified next steps include clarifying the marketing material promoting the training to better communicate the curation focus, investigating the needs of humanities researchers working with qualitative data in spreadsheets, and incorporating new material into the training in order to address those needs. Integrating curation training with practical skills training and modifying the training to be more hands-on are changes that may be considered in future, but will not be implemented at this stage.</p> 2018-04-01T22:46:48+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/579 Mobilising a Nation: RDM Training and Education in South Africa 2018-12-10T00:32:15+00:00 Refiloe Matlatse heila.pienaar@up.ac.za Heila Pienaar heila.pienaar@up.ac.za Martie van Deventer heila.pienaar@up.ac.za <p class="abstract-western"><span style="color: #000000;">The South African Network of Data and Information Curation Communities (NeDICC) was formed to promote the development and use of standards and best practices among South African data stewards and data librarians (NeDICC, </span><span style="color: #006b6b;"><span lang="zxx"><a class="western">2015</a></span></span><span style="color: #000000;">). The steering committee has members from various South African HEIs and research councils. As part of their service offerings, NeDICC arranges seminars, workshops and conferences to promote awareness regarding digital curation. NeDICC has contributed to the increase in awareness, and growth of knowledge, on the subject of digital and data curation in South Africa (Kahn et al., </span><span style="color: #006b6b;"><span lang="zxx"><a class="western">2014</a></span></span><span style="color: #000000;">).</span><span style="color: #000000;"><span lang="en-ZA"> NeDICC members are involved in the UP M.IT and Continued Professional Development training, and serve as external examiners for the UCT M.Phil in Digital Curation degree. NeDICC is responsible for the Research Data Management track at the annual e-Research conference in SA</span></span><sup><span style="color: #000000;"><span lang="en-ZA"><a class="sdfootnoteanc" name="sdfootnote1anc"></a>1</span></span></sup><span style="color: #000000;"><span lang="en-ZA"> and develops an annual training-focussed programme to provide workshop opportunities with both SA and foreign trainers.
</span></span><span style="color: #000000;">This paper specifically addresses the efforts by this community to mobilise and upskill South African librarians so that they would be willing and able to provide the necessary RDM services that would strengthen the national data effort.</span></p> <div id="sdfootnote1"> <p class="western"><a class="sdfootnotesym-western" name="sdfootnote1sym"></a>1e<span lang="en-ZA">Research conference: </span><span style="color: #006b6b;"><span lang="zxx"><a class="western" href="http://www.eresearch.ac.za/"><span lang="en-ZA">http://www.eresearch.ac.za/</span></a></span></span></p> </div> 2018-05-18T22:08:07+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/581 Navigating Unmountable Media with the Digital Forensics XML File System 2018-12-10T00:36:36+00:00 Alexander Nelson alexander.nelson@nist.gov Alexandra Chassanoff alexander.nelson@nist.gov Alexandra Holloway alexander.nelson@nist.gov <p class="abstract-western" lang="en-US">Some computer storage is non-navigable by current general-purpose computers. This could be because of obsolete interface software, or a more specialized storage system lacking widespread support. These storage systems may contain artifacts of great cultural, historical, or technical significance, but implementing compatible interfaces that are fully navigable may be beyond available resources.</p> <p class="abstract-western" lang="en-US">We developed the DFXML File System (DFXMLFS) to enable navigation of arbitrary storage systems that fulfill a minimum feature set of the POSIX file system standard. Our approach advocates for a two-step workflow that separates parsing the storage’s file system structures from navigating the storage like a contemporary file system, including file contents. The parse extracts essential file system metadata, serializing to Digital Forensics XML for later consumption as a read-only file system.</p> 2018-12-10T02:28:40+00:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/571 Curating Humanities Research Data: Managing Workflows for Adjusting a Repository Framework 2018-12-10T00:36:38+00:00 Hagen Peukert hagen.peukert@uni-hamburg.de <p class="abstract-western">Handling heterogeneous data, subject to minimal costs, can be perceived as a classic management problem. The approach at hand applies established managerial theorizing to the field of data curation. It is argued, however, that data curation cannot merely be treated as a standard case of applying management theory in a traditional sense. Rather, the practice of curating humanities research data, the specifications and adjustments of the model suggested here reveal an intertwined process, in which knowledge of both strategic management and solid information technology have to be considered. 
Thus, suggestions on the strategic positioning of research data, which can be used as an analytical tool to understand the proposed workflow mechanisms, and the definition of workflow modules, which can be flexibly used in designing new standard workflows to configure research data repositories, are put forward.</p> 2018-12-10T02:28:40+00:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/575 Implementing a Research Data Policy at Leiden University 2018-12-10T00:36:37+00:00 Fieke Schoots f.schoots@library.leidenuniv.nl Laurents Sesink f.schoots@library.leidenuniv.nl Peter Verhaar f.schoots@library.leidenuniv.nl Floor Frederiks f.schoots@library.leidenuniv.nl <p class="abstract-western">In this paper, we discuss the various stages of the institution-wide project that led to the adoption of the data management policy at Leiden University in 2016. We illustrate this process by highlighting how we have involved all stakeholders. Each organisational unit was represented in the project teams. Results were discussed in a sounding board with both academic and support staff. Senior researchers acted as pioneers and raised awareness and commitment among their peers. By way of example, we present pilot projects from two faculties. We then describe the comprehensive implementation programme that will create the facilities and services needed to implement the policy, as well as to monitor and evaluate it. Finally, we will present lessons learnt and steps ahead. The engagement of all stakeholders, as well as explicit commitment from the Executive Board, has been a key factor in the success of the project and will continue to be an important condition for the steps ahead.</p> 2018-12-10T02:28:40+00:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/515 Setting up a National Research Data Curation Service for Qatar: Challenges and Opportunities 2018-12-10T00:32:20+00:00 Arif Shaon ashaon@qnl.qa Armin Straube ashaon@qnl.qa Krishna Roy Chowdhury ashaon@qnl.qa <p class="abstract-western" lang="en-GB"><span style="color: #00000a;">Over the past decade, Qatar has been making considerable progress towards developing a sustainable research culture for the nation. The main driver behind Qatar’s progress in research and innovation is Qatar Foundation for Education, Science, and Community Development (QF), a private, non-profit organization that aims to utilise research as a catalyst for expanding, diversifying and improving the country’s economy, health and environment. While this has resulted in a significant growth in the number of research publications produced by Qatari researchers in recent years, a nationally co-ordinated approach is needed to address some of the emerging but increasingly important aspects of research data curation, such as management and publication of research data as important outputs, and their long-term digital preservation. Qatar National Library (QNL), launched in November 2012 under the umbrella of QF, aims to establish itself as a centre of excellence in Qatar for research data management, curation and publishing to address the research data-related needs of Qatari researchers and academics. This paper describes QNL’s approach towards establishing a national research data curation service for Qatar, highlighting the associated opportunities and key challenges.</span></p> 2018-04-02T15:30:34+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/561 Is Democracy the Right System? 
Collaborative Approaches to Building an Engaged RDM Community 2018-12-10T00:32:24+00:00 Marta Teperek M.Teperek@tudelft.nl Rosie Higman rosie.higman@MANCHESTER.AC.UK Danny Kingsley dak45@CAM.AC.UK <p class="abstract-western">When developing new products, tools or services, one always needs to think about the end users to ensure widespread adoption. While this applies equally to services developed at higher education institutions, sometimes these services are driven by policies and not by the needs of end users. This policy-driven approach can prove challenging for building effective community engagement. The initial development of Research Data Management support services at the University of Cambridge was policy-driven and subsequently failed in the first instance to engage the community of researchers for whom these services were created.</p> <p class="abstract-western">In this practice paper, we describe the initial approach undertaken at Cambridge when developing RDM services, the results of this approach and lessons learnt. We then provide an overview of alternative, democratic strategies employed and their positive effects on community engagement. We summarise by performing a cost-benefit analysis of the two approaches. This paper might be a useful case study for any institutions aiming to develop central support services for researchers, with conclusions applicable to the wider sector, and extending beyond Research Data Management services.</p> 2018-02-11T16:42:34+00:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/514 Encouraging and Facilitating Laboratory Scientists to Curate at Source 2018-12-10T00:32:26+00:00 Cerys Willoughby cerys.willoughby@soton.ac.uk Jeremy Frey J.G.Frey@soton.ac.uk <p class="abstract-western">Computers and computation have become essential to scientific activity and significant amounts of data are now captured digitally or even “born digital”. Consequently, there is more and more incentive to capture the full experiment records using digital tools, such as Electronic Laboratory Notebooks (ELNs), to enable the effective linking and publication of experiment design and methods with the digital data that is generated as a result. Inclusion of metadata for experiment records helps with providing access, effective curation, improving search, and providing context, and further enables effective sharing, collaboration, and reuse.</p> <p class="abstract-western">Regrettably, just providing researchers with the facility to add metadata to their experiment records does not mean that they will make use of it, or if they do, that the metadata they add will be relevant and useful. Our research has clearly indicated that researchers need support and tools to encourage them to create effective metadata. Tools, such as ELNs, provide an opportunity to encourage researchers to curate their records during their creation, but can also add extra value by making use of the metadata that is generated to provide capabilities for research management and Open Science that extend far beyond what is possible with paper notebooks.</p> <p class="abstract-western">The Southampton Chemical Information group has, for over fifteen years, investigated the use of the Web and other tools for the collection, curation, dissemination, reuse, and exploitation of scientific data and information. 
As part of this activity we have developed a number of ELNs, but a primary concern has been how best to ensure that the future development of such tools is both usable and useful to researchers and their communities, with a focus on curation at source. In this paper, we describe a number of user research and user studies to help answer questions about how our community makes use of tools and how we can better facilitate the capture and curation of experiment records and the related resources.</p> 2017-12-30T19:38:33+00:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/484 Archiving Large-Scale Legacy Multimedia Research Data: A Case Study 2018-12-10T00:32:19+00:00 Claudia Yogeswaran c.yogeswaran@ucl.ac.uk Kearsy Cormier k.cormier@ucl.ac.uk <p class="Abstract">In this paper we provide a case study of the creation of the DCAL Research Data Archive at University College London. In doing so, we assess the various challenges associated with archiving large-scale legacy multimedia research data, given the lack of literature on archiving such datasets. We address issues such as the anonymisation of video research data, the ethical challenges of managing legacy data and historic consent, ownership considerations, the handling of large-size multimedia data, as well as the complexity of multi-project data from a number of researchers and legacy data from eleven years of research.</p> 2018-04-02T22:32:18+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/510 Data Curation for Community Science Project: CHIME Pilot Study 2018-12-10T00:32:18+00:00 Ayoung Yoon ayyoon@iupui.edu Lydia Spotts ayyoon@iupui.edu Andrea Copeland ayyoon@iupui.edu <p class="abstract-western"><span style="color: #000000;">T</span><span style="color: #000000;"><span lang="en-US">his paper introduces a community science project, Citizen Data Harvest in Motion Everywhere (CHIME), and the findings from our pilot study, which investigated potential concerns regarding data curation. The CHIME project aims to build a cyclist community–driven data archive that citizens, community scientists, and governments can use and reuse. While citizens’ involvement in the project enables data collection on a massive, unprecedented scale, the citizen-generated data (cyclists’ video data recorded with wearable cameras in the CHIME context) also presents several concerns regarding curation due to the grassroots nature of the data. Learning from our examination of cyclists’ video data and interviews with them, we will discuss the curation concerns and challenges we identified in our pilot study and introduce our approach to addressing these issues. Our study will provide insights into data curation concerns, to which other citizen science projects can refer. 
As a next step, we are in the process of developing a data curation model that will consider other factors related to this community science project and can be implemented in future community science projects</span></span><span style="color: #000000;">.</span></p> 2018-04-25T19:24:25+01:00 ##submission.copyrightStatement## http://www.ijdc.net/article/view/585 Revealing the Detailed Lineage of Script Outputs Using Hybrid Provenance 2018-12-10T00:32:13+00:00 Qian Zhang zhangqian06@gmail.com Yang Cao zhangqian06@gmail.com Qiwen Wang zhangqian06@gmail.com Duc Vu zhangqian06@gmail.com Priyaa Thavasimani zhangqian06@gmail.com Timothy McPhillips zhangqian06@gmail.com Paolo Missier zhangqian06@gmail.com Peter Slaughter zhangqian06@gmail.com Christopher Jones zhangqian06@gmail.com Matthew B. Jones zhangqian06@gmail.com Bertram Ludäscher zhangqian06@gmail.com <p class="abstract-western"><span style="color: #000000;">We illustrate how combining retrospective and prospective </span><span style="color: #000000;">provenance can yield scientifically meaningful </span><span style="color: #000000;"><em>hybrid provenance</em></span><span style="color: #000000;"> representations of the computational histories of data produced during a script run. We use scripts from multiple disciplines (astrophysics, climate science, biodiversity data curation, and social network analysis), implemented in Python, R, and MATLAB, to highlight the usefulness of diverse forms of </span><span style="color: #000000;"><em>retrospective</em></span><span style="color: #000000;"> provenance when coupled with </span><span style="color: #000000;"><em>prospective</em></span><span style="color: #000000;"> provenance. Users provide prospective provenance, i.e., the conceptual workflows latent in scripts, via simple YesWorkflow annotations, embedded as script comments. Runtime observables can be linked to prospective provenance via relational views and queries. These observables could be found hidden in filenames or folder structures, be recorded in log files, or they can be automatically captured using tools such as noWorkflow or the DataONE RunManagers. The YesWorkflow toolkit, example scripts, and demonstration code are available via an open source repository.</span></p> 2018-08-13T17:46:15+01:00 ##submission.copyrightStatement##
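The Zhang et al. abstract above notes that prospective provenance is declared through YesWorkflow annotations embedded as ordinary script comments. The following is a small, hypothetical illustration of what such annotations can look like in a Python script; the workflow, filenames and data are invented for this sketch and are not examples from the paper:

```python
# Hypothetical example of YesWorkflow annotations; the script and filenames are
# invented, not taken from the paper. The @BEGIN/@IN/@OUT/@END comment tags are
# the prospective provenance that the YesWorkflow toolkit extracts; the Python
# code itself runs as usual.

# @BEGIN clean_temperature_data
# @IN raw_csv @URI file:raw_readings.csv
# @OUT clean_csv @URI file:clean_readings.csv
import csv

def clean_temperature_data(raw_csv="raw_readings.csv", clean_csv="clean_readings.csv"):
    """Copy rows with a usable temperature value into a cleaned CSV file."""
    with open(raw_csv, newline="") as src, open(clean_csv, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row.get("temperature") not in ("", "NA", None):  # drop missing readings
                writer.writerow(row)

if __name__ == "__main__":
    clean_temperature_data()
# @END clean_temperature_data
```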