Data Mining Research with In-copyright and Use-limited Text Datasets: Preliminary Findings from a Systematic Literature Review and Stakeholder Interviews
Text data mining and analysis has emerged as a viable research method for scholars, following the growth of mass digitization, digital publishing, and scholarly interest in data re-use. Yet the texts that comprise datasets for analysis are frequently protected by copyright or other intellectual property rights that limit their access and use. This article discusses the role of libraries at the intersection of data mining and intellectual property, asserting that academic libraries are vital partners in enabling scholars to effectively incorporate text data mining into their research. We report on activities leading up to an IMLS-funded National Forum of stakeholders and discuss preliminary findings from a systematic literature review, as well as initial results of interviews with forum stakeholders. Emerging themes suggest the need for a multi-pronged distributed approach that includes a public campaign for building awareness and advocacy, development of best practice guides for library support services and training, and international efforts toward data standardization and copyright harmonization.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright for papers and articles published in this journal is retained by the authors, with first publication rights granted to the University of Edinburgh. It is a condition of publication that authors license their paper or article under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence.
Institute of Museum and Library Services
Grant numbers LG-73-17-0070-17