The Use of PDF/A in Digital Archives: A Case Study from Archaeology
In recent years the Portable Document Format (PDF) has become a ubiquitous format in the exchange of documents; in 2005 the PDF/A profile was defined in order to meet long term accessibility needs, and has accordingly come to be regarded as a long-term archiving strategy for PDF files. In the field of archaeology, a growing number of PDF files – containing the detailed results of fieldwork and research – are beginning to be deposited with digital archives such as the Archaeology Data Service (ADS). In the ADS’ experience, the use of PDF/A has had benefits as well as drawbacks: the majority of PDF reports are now in a standard format better suited to longer-term access, however migrating to PDF/A and managing and ensuring reuse of these files is intensive, and fraught with potential pitfalls. Of these, perhaps the most serious has been an unreliability in PDF/A conformance by the wide range of tools and software now available. There are also practical and more theoretical implications for reuse which, as our discipline of archaeology alongside so many others rapidly becomes digitized, presents us with a large corpus of ‘data’ that is human readable, but may not be amenable to machine-based technologies such as NLP. It may be argued that these factors effectively undermine some of the perceived cost benefit of moving from paper to digital, as well as the longer-term sustainability of PDF/A within digital archives.
Copyright for papers and articles published in this journal is retained by the authors, with first publication rights granted to the University of Edinburgh. It is a condition of publication that authors license their paper or article under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence.