Abstract
Regardless of current or future technologies, accessing digitally preserved information resources will always pose challenges. There is a plethora of models, standards and best practices addressing the different facets for the preservation of digital objects. The management of digital objects requires well-defined policies and data management plans that include all processes within their specific lifecycle. To achieve high levels of data sharing and long-term re-use of data, APARSEN recommends developing an Interoperable Framework for Persistent Identifiers, paving the way for a ‘Ring of Trusted Persistent Identifiers for Linked Open Data’. To enable semantic interoperability of such a Ring, this article proposes to map LODE-BD metadata with the Framework’s ontology. The Ring can be further enriched with LOD2 Technology Stack to tackle the problem of trustworthiness of linked data lifecycle while addressing the issue of Big Data. To be trusted, digital libraries need to be audited and certified in compliance with the European Framework for Audit and Certification.
Keywords
Curators at important institutions had been making heroic efforts against the loss of shared cultural heritage (Miller and Ogbuji, 2015: 23).
Digital preservation: Context
In a special issue of the journal ISQ Information Standards Quarterly (2010: 3) dedicated to digital preservation (DP), it was stressed that our rapidly changing digital world suffers from an over-abundance of unstructured digital information, rapid obsolescence of hardware and software, and increasingly restrictive intellectual property regimes. To ensure continued, sustainable and authentic long-term access to digital information, a vibrant international community of digital information specialists is continuously developing and implementing standards and best practices in the areas of digital curation and DP, taking into account that technological means for storage of digital information will change over time. This means that choices made early in the life of a digital project will certainly have an impact on digital posterity (Holdsworth, 2007: 7).
Although DP issues will remain pressing in the digital universe, and although DP policies differ greatly across countries, the fundamental challenges regarding the availability of information resources over time are universal (Henneken, 2015; UNESCO, 2012). These challenges concern the whole curation lifecycle of digital resources and are largely addressed by the central methodological problems of research in science and technology at the intersection of digital libraries. The aim of a digital library that may host a number of digital repositories is to facilitate communication between libraries, museums and archives at a cross-cultural level, in order for these institutions to work together to a greater extent, making their digital collections and objects available on the Web for a large audience through one central (multifocal) access point (IFLA, 2014).
Pursuing preservation research to forward the National Preservation Research Agenda, the Library of Congress, in consultation with leading scientific laboratories, has developed a matrix of preservation science projects undertaken by libraries, archives and museums worldwide, illustrating the wide spectrum of preservation research from scientific and forensic studies to the development of preservation treatment (Library of Congress. National Preservation Research Agenda).
The qualitative shift from good research to good practice requires cutting-edge strategies, as in the implementation of methods underpinning the storage of digital memories comprising both short-term and long-term preservation of digital objects (DOs) (COAR, 2015; Cornell University). Preservation of DOs in the long term is not limited to storage and backup; rather it involves multifaceted strategies aimed at providing a Trusted environment (covering authenticity, integrity, long-term access, security issues) where DOs can evolve along with the changes in technology, hardware and software (InterPARES Trust; W3C, 2011). Long-term DP, together with the principle of open access to research data (and their metadata), offers broad opportunities for the scientific community. In particular, more and more universities and research centres are starting to build research data repositories allowing permanent and open access to data sets in a trustworthy environment (Swan et al., 2015; Zenodo, 2014: 3). In this context it should be underlined that only a few recognize the importance of preserving so-called negative results, or the inconclusive results deriving from the processing of raw data; usually only positive results are preserved and remain accessible over the long term.
The Digital Curation Centre (DCC) places DP at the centre of digital curation activities (maintaining, preserving and adding value to digital data throughout its lifecycle). These activities are vital for achieving high-quality access management and content re-usability by means of well-established digital curation workflow models (e.g. Taverna) and tools (Weidner and Alemneh, 2013) that support the complex set of actions needed to ensure the authenticity, reliability, usability and integrity of DOs in a long-term perspective, measured in terms of content, fixity, reference, provenance and context.
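Fixity, one of the properties listed above, is commonly checked by recording a cryptographic digest when an object is ingested and recomputing it later. The following is a minimal sketch under that assumption; the choice of SHA-256 and the function names are illustrative and are not taken from any DCC tool.

```python
import hashlib


def compute_fixity(data: bytes) -> str:
    """Return the SHA-256 hex digest of a bitstream."""
    return hashlib.sha256(data).hexdigest()


def verify_fixity(data: bytes, recorded_digest: str) -> bool:
    """True if the bitstream still matches the digest recorded at ingest."""
    return compute_fixity(data) == recorded_digest


original = b"digital object payload"          # stand-in for a real file's bytes
digest_at_ingest = compute_fixity(original)

print(verify_fixity(original, digest_at_ingest))            # unchanged copy
print(verify_fixity(original + b"\x00", digest_at_ingest))  # altered copy
```

A digest only detects change; it says nothing about provenance or context, which need their own metadata.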
The DO is at the heart of DP management. The PREMIS (PREservation Metadata Implementation Strategies) data dictionary defines a DO as ‘a discrete unit of information in digital form. A DO can be a representation, file, bitstream, or filestream’ (Library of Congress, 2012: 13). The annual International Conference iPRES, dedicated to different aspects of DP, endorses DOs under the aegis of articles, datasets, images, stream of data (iPRES, 2013: 72). The California Digital Library Glossary identifies a DO as an entity with one or more content files united (physically and/or logically, through the use of a digital wrapper) to their corresponding metadata, while the Glossary of Archival and Records Terminology refers to a DO as an information resource ‘that has been digitally encoded and integrated with metadata to support discovery, use, and storage of those objects’ (Society of American Archivists, 2005). Metadata are essential elements for managing, accessing, reusing, retrieving and preserving huge amounts of information resources (LIBER, 2014). Together with metadata, certain significant properties (InSPECT Project) of DOs need to be preserved so that the objects can be deemed authentic over time.
Digital preservation: Between quality, sustainability and planning
According to the widely accepted ISO 9000 definition, quality is ‘the totality of features and characteristics of a product or service that bear on its ability to satisfy a given need’ (ISO 9000). An important prerequisite for every sustainable DP management system is to continuously assure compliance with the specific quality requirements (technical and non-technical) adopted by its outsourcing and/or hosting organization (American Society for Quality (ASQ)). The ISO 25010 (2011) system and software quality model – often adopted by organizations to set up DP plans based on classificatory decision criteria for technical requirements (Hamm and Becker, 2011) – defines a hierarchy of quality attributes by combining characteristics related to the outcome of interaction of the software product (quality in use) and those related to static properties of software and dynamic properties of the computer system (product quality).
Quality and sustainability of DP management systems are terms very much ‘in vogue’ (Doorn, 2013; Hey, 2012). The recent document COAR Roadmap for Future Directions for Repository Interoperability includes the concept ‘Sustainability’ among six topics regarding interoperability and groups it with goals such as: Improving Platform Stability; Supporting long-term preservation and archiving; Exposing Persistent Identifiers; Integrating different Persistent Identifiers (COAR, 2015: 12).
To provide sustainability of activities and workflows in DP services (APARSEN, 2013a), a structured, systematic process – based on well-defined strategies (e.g. Digital Preservation Strategy of the British Library), interoperable policies (Innocenti et al., 2011) (e.g. Digital Preservation Policy of the National Library of Australia; PORTICO Trust Archive Preservation Policies) and comprehensive data and process management plans (DMPs) – is essential (Budroni et al., 2013; ICPSR; RDA).
A core set of controlled vocabulary elements can be ‘instantiated to connect preservation planning, preservation watch, and experimentation with preservation policies’ (Kulovits et al., 2013; PLATO).
DMP should be an integral part of every project implicating data management, and it should formalize in detail all technical and non-technical elements – including processes (e.g. workflows performing complex operations involving identification, migration, conversion tools, as well as the comprehension of visualization issues) and context – accompanying a DO’s lifecycle in conjunction with a repository environment.
The DMP section devoted to ‘Preservation’ should comprise and relate all necessary features and requirements clarifying issues on: technical registries, i.e. information about file formats: TIFF, PDF/A, ALTO, TEI, BWF, AIFF, MXF, AVI, etc. (California Digital Library, 2011; PREFORMA) and their conversion (Holdsworth, 2007); software products to access the information; migration paths and platforms; persistent identifiers (PIs) unambiguously locating and accessing DOs; Digital Rights and Access Management (DRM) within the context of long-term DP, as well as the related risks and challenges arising in connection with long-term DP, the ongoing accessibility of DRM-protected objects, and the safeguarding of associated rights (APARSEN, 2014b); standards concerning preservation and workflows for collecting actionable representation and administrative metadata, which can overlap with technical and preservation metadata (Digital Preservation Coalition, 2013); costs in time and effort.
The overall structure of DMP, considering also processes for data curation – as stressed by Rauber (2014) – should: demonstrate that resources and systems will enable the data to be curated effectively beyond the lifetime (DCC, 2014); describe all contingent processes, their implementation and data used and produced by processes; provide preservation history (long-term storage and funding); highlight conditions for sharing, reuse, verification, legal aspects; demonstrate monitoring and external dependencies; be machine-readable and machine-actionable to automate (most of) the activity in creating and maintaining that DMP.
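The requirement that a DMP be machine-readable and machine-actionable can be illustrated with a small sketch: the plan is held as structured data, serialized to JSON, and checked automatically for empty sections. All field names are hypothetical and chosen only to echo the requirements listed above; they do not follow any published DMP schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DataManagementPlan:
    # Hypothetical fields, for illustration only.
    title: str
    file_formats: list = field(default_factory=list)
    persistent_identifier: str = ""
    sharing_conditions: str = ""
    storage_and_funding: str = ""

    def missing_sections(self) -> list:
        """Names of the sections that are still empty."""
        return [name for name, value in asdict(self).items() if not value]


plan = DataManagementPlan(title="Example project", file_formats=["TIFF", "PDF/A"])

print(json.dumps(asdict(plan), indent=2))   # machine-readable form
print(plan.missing_sections())              # machine-actionable check
```

Because the check runs on the data rather than on prose, it can be repeated whenever the plan is updated.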
The main building blocks for process management plans (PMPs), which comprise metadata frameworks, preservation plans, process context models, preservation actions, approaches for validation documentation, and policies, should be carefully analyzed, and the elements establishing the context of interoperable process activities should be mined and described in a context model, combining features with ground truth into a specific file format. PMPs extending DMPs should automatically enable the following processes: capture processes, workflows and their dependencies; verify correctness of re-execution and re-use of data and workflows; identify subsets of data in large and dynamic databases; assign PIs to time-stamped queries; capture all elements of a research process; cite data, etc. Needless to say, developing a DMP requires a certain degree of cooperation between a number of agents responsible for a wide range of Digital (Data) Curation phases (DCC, 2014; Ganguly, 2015; IFLA, 2012; Tammaro and Casarosa, 2014; UC Curation Center).
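One of the processes named above, assigning a PI to a time-stamped query, can be sketched as follows. This is only an illustration under stated assumptions: the identifier is derived by hashing the whitespace-normalized query text together with its execution time, and the `example-pi:` prefix and function names are invented for the example.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def normalise(query: str) -> str:
    """Collapse runs of whitespace so layout differences do not matter."""
    return " ".join(query.split())


def query_identifier(query: str, executed_at: datetime) -> str:
    """Build an identifier from the normalised query and its UTC timestamp."""
    stamp = executed_at.astimezone(timezone.utc).isoformat()
    payload = f"{normalise(query)}|{stamp}".encode("utf-8")
    return "example-pi:" + hashlib.sha256(payload).hexdigest()[:16]


ran_at = datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc)
print(query_identifier("SELECT * FROM observations  WHERE year = 2014", ran_at))
print(query_identifier("SELECT * FROM observations WHERE year = 2014",
                       ran_at + timedelta(days=1)))
```

The same query run at a later time receives a different identifier, so a citation can point to the exact subset of a changing database that was used.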
Framing digital objects’ preservation: Models and initiatives
Preservation metadata have been identified as essential for the long-term management of DOs. The core of DP metadata is PREMIS, which specifies the semantic units/classes (Intellectual Entities, Objects, Rights, Events, Agents) designed to support the long-term accessibility of a DO by providing information about its content, technical attributes, dependencies, management, designated communities and change history. PREMIS interoperable units convey detailed and complex information about digital content through administrative metadata, technical metadata and specification of structural relationships relevant for preservation functions (ISQ Information Standards Quarterly, 2010).
For institutions getting started with DP, the metadata standards able to support quality and sustainability of DO in a long-term period can be overwhelming. Making smart choices about what constitutes ‘good enough’ can enable repository managers to move forward more quickly. Michael Day published in 1997 a short paper in Ariadne on the implications of metadata for DP from the point of view of responsibilities. The author addressed five important issues which to this day still represent challenges, which are the following: Who will define what preservation metadata are needed? Who will decide what needs to be preserved? Who will curate the preserved information? Who will create the metadata? Who will pay for it? (White, 2015)
DP clearly depends on the dependability of both machines and humans, and there should be a common framework that regulates responsibilities and interactions between humans and systems and accepts the responsibility to preserve information and make it available for a designated community. Such a common framework is presented by the widely endorsed Open Archival Information System (OAIS) model (Lavoie, 2014), published as Standard ISO 14721 (2012), which provides a first-rate overview of the role of preservation metadata in the management over time of digital resources and contains a set of Preservation Policies.
When it comes to the long-term perspective of the Digital Library project (IFLA, 2014), a strategy for long-term digital preservation (LTDP) is required and OAIS provides a well-suited reference model for this context. OAIS vision has been specifically tailored for the purposes of lifecycle management in Rome at the Sapienza Digital Library, particularly for building a consistent set of data, covering all information needs, required by the different OAIS functional scenarios: Ingestion (Submission IP), Archiving (Archival IP) and Access (Dissemination IP). The Digital Library and DP services should be based on data conveyed by the aforementioned IP and enriched by a number of components, supporting the management of the information infrastructure (Catarci et al., 2014).
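The three OAIS information packages named above can be pictured with a deliberately simplified sketch. It is not the Sapienza Digital Library implementation: the class and function names are assumptions, and the preservation information added at ingest is reduced to a single fixity value.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmissionIP:          # what the producer hands over
    content: bytes
    descriptive_metadata: dict


@dataclass(frozen=True)
class ArchivalIP:            # what the archive stores
    content: bytes
    descriptive_metadata: dict
    fixity_sha256: str


@dataclass(frozen=True)
class DisseminationIP:       # what the consumer receives
    content: bytes
    descriptive_metadata: dict


def ingest(sip: SubmissionIP) -> ArchivalIP:
    """Ingest: keep the content and add preservation information (here only fixity)."""
    digest = hashlib.sha256(sip.content).hexdigest()
    return ArchivalIP(sip.content, dict(sip.descriptive_metadata), digest)


def access(aip: ArchivalIP) -> DisseminationIP:
    """Access: verify fixity, then deliver content and descriptive metadata."""
    if hashlib.sha256(aip.content).hexdigest() != aip.fixity_sha256:
        raise ValueError("fixity check failed")
    return DisseminationIP(aip.content, dict(aip.descriptive_metadata))


dip = access(ingest(SubmissionIP(b"scanned page", {"title": "Example"})))
print(dip.descriptive_metadata)
```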
Over the last decade, great practical progress has been achieved in support of DOs’ expressivity and long-term sustainability. In particular, a series of methodologies, models and implementation guidelines have been developed by a number of projects (e.g. APARSEN, PREFORMA, SCAPE, SCIDIP-ES, TIMBUS, WF4Ever, KEEP, DP4lib, PrestoPRIME, PersID, CHRONOPOLIS, PARSEinsight, Preserv, SHAMAN, SPAR, PLANETS, CASPAR), every one of which has come up with a number of personalized (strongly community-driven and ‘by design’) frameworks, tools and systems to solve distinct problems in the DP domain, accelerating long-wave preservation trends with cross-disciplinary strategies. Moreover, ‘an essential step in the data preservation process is to convince people to invest time and effort in depositing their data in repositories specifically designated for data preservation’ (Henneken, 2015: 41, 42), like PHAIDRA (Phaidra, The Ten Commandments for Policy), DANS, Dataverse Network, Zenodo, etc.
Several important preservation issues, addressing the maintenance and long-term preservation of cultural heritage (CH) resources as well as their persistent accessibility to the global community, have been the focus of the EUROPEANA Digital Library. EUROPEANA is at the core of the European Commission Recommendation on the digitization and online accessibility of cultural material and digital preservation (European Commission, 2011), which has challenged Member States to develop solid plans and build partnerships to place all public domain masterpieces in EUROPEANA by 2015 and, by 2025, all of Europe’s cultural heritage. The Recommendation also invites all interested stakeholders to adapt national legislation and strategies to ensure the long-term DP of more in-copyright and out-of-commerce, i.e. Open Data, CH material online, conveyed in non-proprietary (open) formats, since proprietary formats make preservation risky.
To support collaborative creative endeavours in sharing, re-use and enrichment of CH data by adding new value, EUROPEANA Cloud (EUROPEANA Professional) will change the way that data (content and metadata) are sent to and stored in EUROPEANA, and will give researchers new tools to support their engagement in a Trusted, efficient cloud-based infrastructure forging connections with new communities exploiting potential synergies.
The ongoing European project PERICLES (Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics) – besides addressing a number of challenges ensuring that DOs remain accessible in a digital environment encompassing continuous technological change – stresses that changes in semantics (i.e. semantic obsolescence), academic or professional practice, or society itself can also influence the attitudes and interests of the various stakeholders that interact with digital content (PERICLES). Among a range of conceptual and formal models, tools, policies, architectural approaches developed to support a range of preservation requirements to be used independently in different environments, it is worth citing the PET modular toolkit for extracting Significant Environment Information (SEI) and Linked Resource Model (PERICLES) based on linked data (LD) principles for representing dynamic preservation ecosystems. It is already well known that: Linked Data provides a global environment for describing the objects and their significant properties. This environment reduces duplication of effort when describing resources and their attributes, and fosters the creation of a global information graph encompassing all the information needed to perform complex queries and actions. (W3C, 2011)
To tackle the key issues affecting the preservation and long-term accessibility of digital CH, in 2012 UNESCO organized an international conference entitled ‘The Memory of the World in the Digital age: Digitization and Preservation’ and published the Vancouver Declaration, including a number of the main recommendations on Trusted DP frameworks and practices for collaborative management and preservation (DCH-RP Project; UNESCO/UBC, 2013). To support DP activities on a regular basis among different stakeholders, a two-year Coordination Action ‘Digital Cultural Heritage Roadmap for Preservation’ (DCH-RP) was launched by the European Commission (2012). This initiative presented an action framework for advancing outstanding case studies, practices and efforts in facilitating, promoting, advocating, raising awareness of and disseminating harmonized data storage and preservation policies developed by different communities (CH organizations and e-Infrastructure providers) aiming to improve access to information and CH resources. The main outcome of this action is the DCH-RP Roadmap, which supports the implementation of federated, interoperable and collaborative DP e-infrastructures, backed by common standards, practical tools, approaches and business models for decision makers. The DCH-RP Roadmap makes it possible for each CH entity to define its own practical action plan with a realistic timeframe for the implementation of its stages. The DCH-RP Roadmap also provides practical steps to design a Trust model appropriate for use in collaborative e-Infrastructures, including recommendations for user authentication and access control system(s). Above all, collaboration with a diverse set of stakeholders means that libraries can stake their place in the common vision for DP, thus ensuring that the issues surrounding the preservation of digital CH are represented in this vision (Reilly, 2013).
A common thread among all projects and initiatives focusing their efforts on DP activities is that they highlight the need to contribute qualitatively to the lifecycle of interoperable DOs in a Trusted (complying with specific requirements of quality and sustainability) digital environment. So what are the facets of such an environment and is there any common practical framework to assess its quality and sustainability?
The next sections will be devoted to presenting some issues of interoperability and trust that can be replicated in any DO management environment. Initiatives to be presented below that address these issues are: the already cited APARSEN, LODE-BD, LOD2 and ISO 16363 (2012) tackling a range of topics focused on persistent interoperability and trust of DO management systems.
Interoperability framework for persistent identifiers systems enhanced by LODE-BD and LOD2
One of the main goals of the European APARSEN project was to combine and integrate European DP efforts into a shared enterprise and thus to build a long-lived Virtual Centre of Excellence (VCoE) to share a common vision (APARSEN Roadmap underpinned by the revised OAIS Reference Model) of expertise, tools and resources for DP clustered in two hierarchical groups (Research Silos and Integrated Topics) with common agreement on terminology, evidence standards, DP services, access and re-use of data holdings over the whole life-cycle. The common APARSEN topics of ‘Access’, ‘Usability’, ‘Sustainability’ and ‘Trust’ are permeated by issues such as interoperability in connection with PIs.
The concept of ‘interoperability’ promoted by APARSEN (2013b) is conceived in terms of a common way to access data in the same format even if these data belong to heterogeneous PI domains. Considering that different identification schemes will never speak with each other (e.g. DOI does not speak with NBN), APARSEN provides Persistent Identifiers (PIs) Interoperability Framework (IF), commonly known as ‘IF for PI systems’ (APARSEN, 2012a) underpinning interoperability, persistent access, reuse and exchange of information through the use of existing PIs and associated objects across different systems, locations and services. The basic idea of IF for PI systems is that a common conceptual representation is the main condition to design added-value interoperability services, which can exploit the value of a scheme of representation agreed and shared across Trusted systems in order to facilitate exchange, re-use and integration of DOs identified in these systems by different PIs.
Different repositories, for example PHAIDRA (Permanent Hosting, Archiving and Indexing of Digital Resources and Assets) and the repositories working within the frame of the PHAIDRA.org network, provide their own identifier system, which is applied to the objects generated through the repository. Moreover, PHAIDRA objects in the future will be assigned more than one PI, namely Handle and URN, according to the needs of the owners of the objects.
With the increasing exchange of metadata, different identifier systems will clash in a repository environment. Any type of additional PI (e.g. PMID/PMCID, ISSN, DOI) is useful to fetch more contextual information (COAR, 2015: 18).
In compliance with the Linked Content Coalition (2012) Framework, any unique PI should be resolvable to a single object such as web page or file, or to both object and metadata or to multiple objects, such as different formats of the same objects, or different content types, through the same PI (multiple resolution). The resolution is the key mechanism enabling a system to locate and access the identified object or information related to it on the Web.
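Multiple resolution, as described above, means that one PI can lead to several targets. The sketch below models this with an in-memory registry; the identifier, the `example.org` locations and the `resolve` function are all hypothetical and do not represent the API of any real resolution service.

```python
from typing import Dict, List, Optional

# Hypothetical registry: one PI mapped to several representations.
REGISTRY: Dict[str, Dict[str, str]] = {
    "example-pi:0001": {
        "landing_page": "https://example.org/object/0001",
        "application/pdf": "https://example.org/object/0001.pdf",
        "metadata": "https://example.org/object/0001.xml",
    }
}


def resolve(pi: str, wanted: Optional[str] = None) -> List[str]:
    """Return every location for a PI, or only the one of the wanted type."""
    targets = REGISTRY.get(pi)
    if targets is None:
        raise KeyError(f"unknown identifier: {pi}")
    if wanted is None:
        return list(targets.values())          # multiple resolution
    return [targets[wanted]] if wanted in targets else []


print(resolve("example-pi:0001"))
print(resolve("example-pi:0001", "metadata"))
```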
No digital system can be functional and interoperable without metadata and the explicit linkages between metadata and resources identified by PIs (e.g. the relation existing between a resource and the collection of which it is part). A common conceptual representation of metadata in different services represents an added value that can speed up the implementation of their interoperability. In this respect, the IF maps incoming Dublin Core and MARC information onto the APARSEN entities through the flexible FRBRoo ontology (Bekiari et al., 2015), bridging entities representing library and museum CH resources.
The metadata normalization could be accomplished on top of nine metadata groups of common properties recommended by LODE-BD (Linked Open Data enabled Bibliographical Data) (Subirats and Zeng, 2013), which are: Title Information; Responsible Body; Physical Characteristics; Location; Subject; Description of Content; Intellectual Property; Usage; Relation. These nine clusters are consistent in both type of entities and relationships between entities in the treatment of Work, Expression, and Manifestation concepts used in FRBR (Functional Requirements for Bibliographic Records) (IFLA, 1998). Being mapped to DC (simple and qualified) and to other metadata and schemes, also designed to support bibliographical data on the Web, LODE-BD metadata can be seen as one-size-fits-all approach for encoding meaningful LOD-ready bibliographical data concentrated on the data, not on the scheme.
As a reference tool, LODE-BD provides assistance on how to make decisions on metadata modelling (in both depth and detail), encoding and implementation (with better response to specific needs via Design-time/Run-time strategies) (Subirats and Zeng, 2013), by providing all necessary paths on how to create meaningful and comprehensive (both to humans and web engines) bibliographic data and to share them (Subirats et al., 2011) among different systems and with the LOD universe (an unbounded, global data space containing more than 31 billion triples) (ALOE; GETTY).
Content/data providers aiming to communicate and to discover knowledge via a common ‘IF for PI systems’ can directly create RDF triples using LODE-BD metadata properties encoded with non-literal (URI) data values of LOD-ready schemes. In this way, content providers will be aligned on the backbone of a common conceptual representation of data. In the ideal draft scenario, these data should be aggregated by a central ‘IF for PI systems’ (service provider) – exploiting powerful crosswalks and ontology including LODE-BD metadata – with no delays, failures, errors or omissions or loss of transmitted information. ‘Since publishing as LOD in any case means interlinking the data with external sources by means of typed relations, it would foster the topic of data interoperability’ (COAR, 2015: 37).
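The idea of creating RDF triples whose values are URIs rather than literals can be sketched by writing N-Triples lines by hand. The Dublin Core Terms properties used here are real, but their choice is an assumption made for illustration and should not be read as the LODE-BD recommendation for any particular element; the subject and vocabulary URIs are placeholders.

```python
DCTERMS = "http://purl.org/dc/terms/"


def uri(value: str) -> str:
    """Wrap an absolute URI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping the characters N-Triples requires."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Join three already-formatted terms into one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."


record = uri("https://example.org/object/0001")   # placeholder resource
lines = [
    # Literal value: readable, but not linkable.
    triple(record, uri(DCTERMS + "title"), literal('A "sample" title')),
    # Non-literal (URI) value: can be followed and interlinked.
    triple(record, uri(DCTERMS + "subject"),
           uri("https://example.org/vocab/digital-preservation")),
]
print("\n".join(lines))
```

In practice an RDF library would handle serialization; the hand-written form is used only to make the literal/URI distinction visible.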
After the metadata normalization in the APARSEN IF comes the stage of co-reference generation among resources through an owl:sameAs relation, indicating that two URIs refer to the same entity (i.e. digital objects/authors/institutions have the same ‘identity’). The programming of a technical infrastructure based on the APARSEN IF should foresee all standardized relationships between the identified entities, their PIs, the corresponding resolution services and related information (metadata). Finally, a common interoperability layer – where meaningful information from independent systems is integrated, re-used and exploited to enable added-value interoperability services (APARSEN, 2014) – can be created.
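Since owl:sameAs is symmetric and transitive, a set of pairwise co-reference statements groups identifiers into clusters that each denote one entity. The sketch below computes those clusters with a union-find structure; it illustrates the principle only and is not the APARSEN demonstrator's method, and the identifiers are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def coreference_clusters(same_as: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group identifiers connected by sameAs pairs into clusters."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in same_as:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a         # merge the two clusters

    clusters: Dict[str, Set[str]] = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())


pairs = [
    ("doi:10.9999/a", "urn:nbn:example-a"),   # hypothetical identifiers
    ("urn:nbn:example-a", "hdl:0000/a"),
    ("doi:10.9999/b", "urn:nbn:example-b"),
]
for cluster in coreference_clusters(pairs):
    print(sorted(cluster))
```

Each cluster lists the alternative identifiers through which the same resource can be reached.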
APARSEN IF for PI systems stresses the importance of registering alternative identifiers for the same entity, because it guarantees multiple ways to access the resource and related information, making the resolution process really persistent.
The first prototype of the APARSEN IF for PI systems demonstrator was presented in 2012 at the workshop ‘Interoperability of Persistent Identifiers Systems – Learning how to bring them together’ (APARSEN, 2012b). This demonstrator aggregated some metadata provided by several APARSEN partners on a single machine implementing the IF (FRBRoo) ontology in a RDF triple store mechanism and exposing these metadata through a SPARQL end-point. The prototype exposes co-references among related entities in the knowledge base using information provided by content providers. ‘If the IF is widely implemented it can become a reference model for any future development for PI systems and it could create a “Ring of Trusted PI for Linked Open Data (LOD)”’ (APARSEN, 2013b: 4).
Extending the challenges of the PRESERVING LINKED DATA project, the first DIACHRON Workshop (hosted by ESWC2015), entitled ‘Managing the Evolution and Preservation of the Data Web’ (DIAHRON Workshop, 2015), stressed that it is of particular relevance for different stakeholders to raise awareness of how openly available LD sets could be used to achieve their full potential. A traditional view of digitally preserving LD sets by pickling them and locking them away for future use, like groceries, would conflict with their evolution. To provide some solutions to this problem, the European LOD2 Project proposed the LOD2 approach (i.e. LOD2 Stack) to plan and manage a full life-cycle of LD.
In particular, the LOD2 Project was launched to deal with the following issues: How to improve coherence and quality of data published on the Web? How to close the performance gap between relational and RDF data management? How to establish Trust on the LD Web and generally lower the entrance barrier for data publishers and users?
These questions have been answered by providing: tools and methodologies for exposing and managing very large amounts of structured information (Big Data) on the Data Web (H2020 project; OAI9 Workshop, 2015; OR2015, 2015); a testbed and bootstrap network of high-quality multi-domain, multi-lingual ontologies from sources such as Wikipedia and OpenStreetMap; algorithms for automatically interlinking and fusing data from the Web; standards and methods for consistently tracking trust and trustworthiness of information as well as for assessing its quality (Gladney, 2009; Hartig, 2009; Semantic Web Company, 2013); adaptive tools for searching, browsing, and authoring of LD.
The LOD2 Stack provides a series of mechanisms to manage a full life-cycle of LD, by tackling: synchronisation problem (i.e. how to monitor changes); curation problem (i.e. to repair data imperfections); appraisal problem (i.e. to assess the quality of a dataset); citation problem (i.e. how to cite a particular version of a linked dataset); archiving problem (i.e. to retrieve the most recent or a particular version of a dataset); sustainability problem (i.e. to spread preservation ensuring long-term access).
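Three of the problems listed above (synchronisation, citation and archiving) all rely on being able to address a dataset as it stood at a given time. The following sketch keeps dated snapshots of a set of triples and answers those questions; it is an illustration under that assumption and does not reproduce any LOD2 Stack component. The class name and sample triples are invented.

```python
from bisect import bisect_right
from datetime import date
from typing import FrozenSet, Iterable, List, Tuple

Triple = Tuple[str, str, str]


class DatasetArchive:
    def __init__(self) -> None:
        self._dates: List[date] = []
        self._versions: List[FrozenSet[Triple]] = []

    def add_version(self, when: date, triples: Iterable[Triple]) -> None:
        """Store a snapshot; snapshots must arrive in chronological order."""
        if self._dates and when <= self._dates[-1]:
            raise ValueError("versions must be added in increasing date order")
        self._dates.append(when)
        self._versions.append(frozenset(triples))

    def as_of(self, when: date) -> FrozenSet[Triple]:
        """Archiving/citation: the snapshot that was current on a given date."""
        index = bisect_right(self._dates, when) - 1
        if index < 0:
            raise LookupError("no version existed on that date")
        return self._versions[index]

    def changes(self, older: date, newer: date):
        """Synchronisation: triples added and removed between two dates."""
        old, new = self.as_of(older), self.as_of(newer)
        return new - old, old - new


archive = DatasetArchive()
archive.add_version(date(2014, 1, 1), [("ex:a", "ex:p", "ex:b")])
archive.add_version(date(2015, 1, 1), [("ex:a", "ex:p", "ex:c")])
print(archive.as_of(date(2014, 6, 30)))
print(archive.changes(date(2014, 6, 30), date(2015, 6, 30)))
```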
The LOD2 Stack is a valuable tool to support creators and publishers of LD and is a likely candidate to be integrated in the ‘Ring of Trusted PI for LOD’. In particular, engaging LODE-BD and the LOD2 Stack in the IF for PI systems will strengthen its interoperability, create the necessary conditions for ‘Creating Knowledge out of Interlinked Data’ (Auer et al., 2014) and enhance ‘Proof and Trust’ (Jaques et al., 2012).
The next section will introduce the reader to the concept of Trust, a concept on which APARSEN and Trusted Digital Repository framework have focused their main endeavours.
Vision of trust
So how can CH organizations collaborate to address unique practices and challenges worldwide related to DP and to management of Trusted systems, aiming at ensuring persistent access to digital resources worldwide?
One of the ways is to be engaged with the DP community as a whole. The previously mentioned APARSEN network, by extending its Virtual Centres of Excellence (Centro di Eccellenza Italiano sulla Conservazione Digitale), invites different stakeholders to take part in its network contributing to and sharing a common DP vision. Collaboration with a diverse set of practitioners (public and private), exchanging their experience and expertise, means that the CH sector can gain its place in the common cross-referenced vision for DP, ensuring that the issues surrounding the preservation and management of digital CH are represented in this common vision too.
In recent years, there have been multiple efforts to assess repositories with the objective of making their practices and procedures transparent, while assuring that their valuable digital assets are protected.
A few years ago, APARSEN presented a unified European vision of Trust in DP (APARSEN, 2012c), in particular when it comes to unfamiliar digitally encoded information, especially when it has passed through several hands over a long period of time. The report collected, evaluated and provided key answers to the following issues: Has the digitally encoded information been preserved properly? Is it of high quality? Has it been changed in some way? Does the pointer or link take the user to the right object?
The unified vision of trust refers to three levels for the evaluation of Trusted Digital Repositories (TDR). These levels constitute the TDR framework and are recognized as the ‘European Framework for Audit and Certification on Digital Repositories’ underpinned by a Memorandum of Understanding (MoU) (TrustedDigitalRepositories.eu, 2010). The relevance of the TDR framework is also stressed by DCC in the context of lifecycle planning for successful DC.
The integrated multilevel framework for the evaluation of a TDR assembles: the Data Seal of Approval (DSA) assessment initiative; the standard DIN 31644 (2012) – Information and Documentation. Criteria for Trusted Digital Repositories; and the standard for TDR, ISO 16363 (2012) – Space Data and Information Transfer Systems – Audit and Certification of Trustworthy Digital Repositories.
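The question ‘Has it been changed in some way?’ is commonly approached in practice with a fixity check: a digest is recorded when an object is ingested and recomputed later for comparison. The sketch below is an illustration under that assumption and is not a method described in the APARSEN report; the function names `sha256_digest` and `is_unchanged` are hypothetical, and only Python's standard `hashlib` module is used.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at ingest time."""
    return sha256_digest(data) == recorded_digest


# Illustrative usage with made-up content.
original = b"digitally encoded information"
recorded = sha256_digest(original)            # stored alongside the object at ingest
intact = is_unchanged(original, recorded)     # same bytes as at ingest
altered = is_unchanged(original + b"!", recorded)  # one byte appended
```

A matching digest shows only that the bytes are identical to those present when the digest was recorded. It says nothing about whether the information was of high quality to begin with, or whether a link resolves to the right object, which are the other questions raised above.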
By implementing this framework, the digital world may become more reliable. Moreover, the ‘audit and certification of digital repositories are fundamental in guaranteeing the trustworthiness of research infrastructures as a whole’ (Dillo, 2012: 1).
The first (Basic Certification) level, which is an entry point for the self-assessment of repository quality and sustainability, requires a few days’ effort from the repository. The other two (Extended and Formal Certification) levels rely on auditing standards for TDR and require several person-months of effort to collect much more detailed information than the DSA and to take part in the audits that assess the trustworthiness of digital repositories. Certification is also ‘not a one-time accomplishment that you achieve and then forget’ (Dillo, 2012: 4).
Basically, the definition of a TDR starts with a mission to provide reliable, long-term access to managed digital resources to designated community/ies via an articulated framework of attributes (administrative responsibility, organizational viability, financial sustainability, and procedural accountability) and responsibilities for Trusted, reliable, sustainable digital infrastructures capable of handling the plethora of materials held by large and small CH and research institutions. The NESTOR working group defines a Trusted, long-term digital repository as a complex and interrelated system. In determining Trustworthiness, one should look at the quality of the entire digital infrastructure ‘in which the digital information is managed, including the organization running the repository’ (TRAC, 2007: 7, 9, 15).
The DSA sets forth 16 guidelines related to Trustworthy data management and stewardship (Data Seal of Approval, 2010). Some of the digital repositories awarded the DSA include: ICPSR (United States); the Archaeology Data Service (United Kingdom); the DANS Electronic Archiving System (Netherlands); the Platform for Archiving CINES (France); the Language Archive of the Max Planck Institute for Psycholinguistics (Netherlands); and the UK Data Archive (ICPSR. Trusted Digital Repositories).
The standard DIN 31644 consists of 34 requirements structured in three parts: (1) organization; (2) management of intellectual entities and their representations; (3) infrastructure and security. It includes Appendices with examples of digital repositories and best practices for each requirement.
ISO 16363 can be used as a basis for formal certification and assessment of digital repositories. It is based on the Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC), and the story of all Digital Repository Standards can be traced through relations such as ‘led to’, ‘developed into’, ‘adopted as’, ‘informed’ and ‘referenced by’ (Wikipedia). TRAC describes the metrics of an OAIS-compliant digital repository and was developed from work done by the OCLC/RLG Programs and National Archives and Records Administration (NARA) task force initiative (Giaretta, 2011). The Center for Research Libraries Certification Advisory Panel (Center for Research Libraries) ensures that the certification process addresses the interests of different stakeholders, including managers in collection development, preservation and library information technology.
Both (1) the TDR framework (organizational infrastructure; DO management and infrastructure; security risk management, etc.) and (2) the Virtual Centres of Excellence constituted by APARSEN support the following high-quality outcomes: repository policies compliant with TDR criteria can be defined (e.g. Comparison of TRAC Checklist and PLEDGE Policy List); preservation prototypes can be developed, along with a portfolio of models, services and tools for innovative support of lifecycle management, for monitoring risks and opportunities connected with DP components, and for quality measures; preservation ecosystems (shifting from a collaborative approach towards distributed DP to Open Scalable Preservation Ecosystems) can be achieved (Kulovits et al., 2013; Skinner and Halbert, 2009); and a broader take-up of the DP projects’ results can be encouraged by providing guidance that others can use in their own preservation efforts to determine their own institutional DP needs, including interactive ‘on-the-spot’ research on current DP trends.
Final thoughts and outlook
The push for the long-term DP of valuable information resources is both a challenge (ensuring that it is carried out cost-effectively and efficiently, in both methodology and implementation) and an opportunity for different stakeholders, including CH organizations. The accurate selection and application of models and technologies promoted by a wide range of initiatives and projects, as well as replication of core elements of best practices – underpinning a plethora of facets of DP – will positively support persistent access to content and its interoperability in the long-term perspective, paving a stable way for re-use of data for research and innovation.
The APARSEN Network of Excellence in DP has launched the long-life collaborative Virtual Centres of Excellence, where different stakeholders can interact, sharing their models and practices and developing a common vision for DP.
By means of the APARSEN Interoperability Framework for Persistent Identifier systems, empowered by LODE-BD and the LOD2 Stack, semantically enhanced content can be pushed in an interoperable, Trustworthy manner out of its DC ecosystem into the LOD universe, facilitating communities’ participation through data and knowledge re-use, re-distribution and sharing on the frontline of Linked Data. Trust and Trustworthiness of DP notably affect the quality and sustainability of DC, whose main efforts focus on the creation of long-life value-added services, where users can undertake innovative exploration and analysis of digital contents over a long span of time (APARSEN, 2014a).
Digital repositories whose organization, policies and procedures are focused on preservation goals, and which have been assessed according to the European Framework for Audit and Certification on Digital Repositories, are Trusted and Trustworthy, and can thus sustain opportunities for long-term data sharing.
To empower collaborative endeavours of Trusted DP communities, a set of interrelated technical and non-technical requirements, objectives and components for preservation quality should be encoded in scalable PMPs that are readable by both humans and machines. Such PMPs would dynamically connect cross-referenced elements (on request and whenever they are updated) and return answers to queries, helping to monitor and assess different preservation contexts with the goal of developing shared solutions for the optimization of DP services.
To enhance community-driven DP activities supported by Virtual Centres of Excellence, DP services should collaboratively focus their efforts on extending already existing ‘friendly human-machine’ controlled vocabulary elements for preservation quality, enabling interoperability among the building blocks of the preservation ecosystem (Kulovits et al., 2013). The semantics of such vocabularies should be optimized for RDF-aware environments, aligned and automatically updatable on the frontline of Linked Open Data (Haag, 2011) and Big Data, thus contributing notably to the interoperability features defined in the recent COAR (2015) Roadmap for Future Directions for Repository Interoperability. In an ideal scenario, such a common controlled vocabulary supporting DP should connect DP systems around the globe, merging the concepts of policy-aware operations, planning, technical and monitoring components of (complex) digital objects. The ultimate goal of such an endeavour is to collaboratively ensure that all necessary exchangeable information is leveraged to develop a global, scalable, Trusted preservation ecosystem.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
