Use of Ontologies for Data Integration and Curation
AbstractData curation includes the goal of facilitating the re-use and combination of datasets, which is often impeded by incompatible data schema. Can we use ontologies to help with data integration? We suggest a semi-automatic process that involves the use of automatic text searching to help identify overlaps in metadata that accompany data schemas, plus human validation of suggested data matches.
Problems include different text used to describe the same concept, different forms of data recording and different organizations of data. Ontologies can help by focussing attention on important words, providing synonyms to assist matching, and indicating in what context words are used. Beyond ontologies, data on the statistical behavior of data can be used to decide which data elements appear to be compatible with which other data elements. When curating data which may have hundreds or even thousands of data labels, semi-automatic assistance with data fusion should be of great help.
Copyright for papers and articles published in this journal is retained by the authors, with first publication rights granted to the University of Edinburgh. It is a condition of publication that authors license their paper or article under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence.