Abstract
Semantic interoperability issues in international e-Government data exchanges remain unsolved. In the case of social security institutions, data exchange operations have particularities that make the unambiguous definition of the core concepts used by the institutions a key factor in the success and quality of system interconnections. In this article, we present the results of research to implement a new metadata specification based on Dublin Core elements for international social security exchanges, named Exchange Social Security Information Metadata (ESSIM). The proposal follows a semantic approach using Linked Data for interoperability, with technologies such as RDF(S), SPARQL, Microdata and JSON-LD, in order to ensure interoperability between social security institutions from different countries. This will help to strengthen the protection of the social security rights of mobile workers by automating the application of international agreements on social security, and to improve cross-border communication between social security institutions of different countries. In the near future, the goal is to include this specification in the information and communication technology guidelines under development by the International Social Security Association, with the participation of the authors of this article. This will facilitate the future adoption of the specification as an international standard.
1. Introduction
Interoperability can be considered as the ability of diverse and heterogeneous organisations to interact towards mutually beneficial and agreed common goals, involving the sharing of information and knowledge between the organisations, via the business processes they support, by means of the exchange of data between their respective information and communication technology (ICT) systems [1]. Interoperability techniques play an increasingly important role in the implementation of e-government in general, and of social security in particular, especially where social programmes or services are provided by different institutions. The subject has been studied in depth by the International Social Security Association (ISSA), which has published a set of basic interoperability guidelines [2]. Existing experiences with international information exchange range from complex systems implementing large multilateral agreements to simpler ones implementing bilateral agreements. Delgado et al. [3] review systems implementing international social security exchanges in different parts of the world in order to show relevant implementation approaches that can serve as a basis for proposing improvements. One of the conclusions of that research was that social security data exchanges have particularities that make the unambiguous definition of the core concepts used by the institutions a key factor in the success and quality of system interconnections, as well as in the shared usage of common information systems. These semantic issues of social security data exchanges remain unsolved. Indeed, in the context of social security systems, semantic interoperability plays a fundamental role in developing common definitions and interpretations of the data to be processed by the various bodies.
Social security operations involve a wide range of concepts which, in spite of having the same name, may be interpreted differently (e.g. family group, members of the same household, unemployed person, old age pension, health benefits, social security contributions, etc.). The fact that information generated by one computer system must be processed by another system, which must understand its meaning correctly, leads to additional complications that affect both the source of the information and its recipient; this is the central theme of semantic interoperability. Achieving this interoperability requires agreeing, for example, on the way in which information is represented and on its context. This would enable automatic tools to share and process the information, even when it has been registered separately. Reaching an agreement on a metadata set for social security institutions would enable the creation of interoperable public services. The subsequent step would consist of identifying and standardising the metadata used by the different institutions, as well as documenting it so that it can be shared. This would enable the implementation of a Metadata Management System to be used jointly by all the institutions. This article summarises the research carried out on the semantic dimension of data exchange between social security institutions and describes a proposed metadata specification, Exchange Social Security Information Metadata (ESSIM), and its implementation as an RDF vocabulary. The vocabulary is used in semantic annotations linked to the exchanged data in different ways, depending on the format of the official documents (OD) containing the data: as embedded markup with Microdata, JSON-LD or RDF technologies in the case of HTML documents, or as attached JSON-LD packages or files in the case of non-web documents.
In addition, a web-based services architecture has been designed to support the annotated data exchanges in an international social security interoperability context. Section 2 presents the background of semantic problems and the use of vocabularies as a possible solution in the context of e-Government data exchange. Section 3 describes the new ESSIM vocabulary, which is based on Dublin Core elements and focuses on social security information systems. In section 4, the vocabulary is implemented using RDF. Section 5 describes the actual use of the vocabulary through Microdata and JSON-LD scripts as semantic annotations of the fields of official social security documents. Conclusions and future work are presented in sections 6 and 7.
2. Background
One of the main methods to handle semantic conflicts is the use of controlled vocabularies [4], which, in order of increasing complexity, consist of glossaries, thesauruses, taxonomies and ontologies [5]. The greater the complexity and expressiveness of a controlled vocabulary, the higher the number of types of semantic conflicts that it can cover. Glossaries and dictionaries are the simplest form of controlled vocabulary, where no predetermined structure is applied to the term set. Although the use of ontologies could make a very relevant contribution towards guaranteeing the quality of electronic exchanges, the definition and maintenance of ontologies involve a very high level of complexity, which is often off-putting. This is perhaps why the European Interoperability Strategy aims merely to reach an agreement on the definition of metadata [1]. A joint conceptual model (metadata schema) should include a description of the information to be exchanged in terms of concepts, their properties and the relations between the concepts. Metadata can be defined just once and then shared and reused on multiple occasions. Standardisation is a strong form of establishing an agreement by means of consensus building, and an intuitive, technically effective and economically well-recognised way to achieve interoperability [6]. Furthermore, as standards offer special support and have been widely accepted as a means to achieve interoperability as well as quality development, they should lead to technological independence [7]. The choice of a metadata schema is largely influenced by practices at peer institutions and by compatibility with a content management system [8]. The literature reports experiences in achieving interoperability at the semantic level [9,10] and successful experiences in consuming and producing linked open data [11].
Previous studies [12–15] have examined the use of metadata to describe e-Government services in web pages, using annotation technologies such as microformats, Microdata [16], RDFa [17] or JSON-LD [18]; in those studies, however, the metadata is associated with pages published on websites, not with data exchanges. In the case of social security information systems, metadata should describe the data involved in information exchanges between social security institutions. The primary factor in selecting a metadata standard is its suitability for describing the most common types of resources handled in the exchanges between the participating information systems [19]. The second and third most common criteria, ‘target users/audience’ and ‘subject matters of resources’, also seem to reflect how domain-specific standards are applied. Nowadays, social security is widespread all over the world. However, the existing technological infrastructure and resources of the different institutions, as well as implementation constraints, also determine the available options. Therefore, the defined standard should be based on a worldwide specification. Studies on metadata development for reference resource systems have mostly proposed Dublin Core metadata [20], which has also been widely used in existing reference systems [21]. Regarding e-Government, AGLS is an Australian metadata standard used to describe online government resources, aligned with Dublin Core [22]. Another well-known standard is IEEE-LOM (Learning Object Metadata), a ‘conceptual data schema’ for the description of learning objects [23]. LOM was developed and formalised through the IEEE and its Learning Technology Standards Committee. In turn, ISO/IEC 19788 Metadata for Learning Resources (MLR) [24,25] is still under development. The second part of this standard defines metadata elements that match up with Dublin Core elements [26].
According to the experiences analysed, the fifteen Dublin Core elements could be an excellent starting point for the definition of a social security standard. Extending Dublin Core also means developing a philosophy of extension and methods to extend the vocabulary without breaking any uses of the original set of data elements. In addition, the Dublin Core format can also be used to exchange data between systems [27].
3. ESSIM
A first version of a new vocabulary named ESSIM, based on Dublin Core elements and focusing on social security information systems, was proposed by Delgado et al. [28]. The objective of this vocabulary is to avoid any possible ambiguity in the meaning of the data exchanged between social security institutions. Consequently, the concept of metadata used is ‘data describing a given data’. Therein lies the main novelty of the proposal, as opposed to the tendency to use metadata to describe full documents or artefacts in a general way, without controlling the content of each particular data item. In ESSIM, metadata is used to describe every single data item (or field) included in the documents (or forms). This initial proposal was later discussed with representatives of social security organisations from different countries and refined until the minimum set of elements, or metadata, necessary to describe any data field of any official document related to social security was obtained. Given the different linguistic, cultural, legal, technical and administrative environments in the social security domain, there are significant challenges in ensuring that the precise meaning and format of the information exchanged between social security institutions are understood and preserved. An analysis of the content model of the official forms currently in use shows that similar or identical concepts are used by all sectors of social security around the world. The analysis also shows that there are differences in the naming conventions, in the content structure and in the principal content used for those concepts in an applied business context. The consequence of this diversity is that a huge effort is needed to process and evaluate information based on these entities, especially if the business architecture relies on electronic data exchange. In addition, this diversity introduces a risk of misinterpretation and erroneous handling of information.
The novelty of the ESSIM approach with respect to other metadata specifications, such as Dublin Core, is that it describes every single component of an information resource involved in an electronic exchange between social security institutions. The goal is to avoid any misunderstanding due to the semantic ambiguity of the terms used. After discussing the proposal at the international level, we arrived at the list of 14 elements, or metadata, shown in Table 1.
Description of the elements (metadata) proposed in ESSIM.
ESSIM: Exchange Social Security Information Metadata.
The metadata list is based on 11 of the 15 Dublin Core elements (Table 2). These elements have been refined in order to fit the intended use of ESSIM more closely. In the case of the ‘date’ and ‘relation’ elements, the refinement has given rise to several ESSIM elements based on them, following a procedure similar to that applied by the Dublin Core Metadata Initiative (DCMI) when defining more specific DC terms based on the original DC elements [20]. In ESSIM, the elements derived from ‘date’ are ‘dateCreated’ and ‘dateModified’, and those based on ‘relation’ are ‘relation’, ‘mandate’ and ‘references’. In the latter case, DCMI has also created a DC term called ‘references’ as a refinement of DC’s ‘relation’ element, so the element proposed by ESSIM could in turn be considered a refinement of the DC term ‘references’.
Characteristics of the ESSIM elements.
ESSIM: Exchange Social Security Information Metadata; DC: Dublin Core; ISO: International Organisation for Standardisation.
Table 2 shows the set of 14 elements, or metadata, proposed under the name ESSIM to describe the data included in the documents exchanged between social security institutions. The ‘ESSIM’ column gives the name of each proposed element. The ‘Obligation’ column states whether the element is mandatory (Mandatory), optional (Optional) or conditional (Conditional), the latter meaning that it should only exist when certain conditions are met. The ‘Multiplicity’ column indicates whether an element can have several values, using the notation min..max, which gives the minimum and the maximum number of values it can have. For example, 0..1 means that the element may have no associated value or at most one value. The ‘*’ symbol indicates that there is no limit on the number of values that can be associated with the element. The ‘Base on DC’ column indicates the Dublin Core element on which each ESSIM element is based, as a refinement of that element. The ‘Value range’ column contains the recognised standard type or vocabulary to which the element’s values should belong. Except for elements that support ‘Literal’-type values, such as dates or texts, the values that can be assigned to the elements must come from controlled vocabularies, with the purpose, as indicated by Craig and Schriar [29], of standardising the content of the metadata. Three of the vocabularies mentioned are standard, which is why it is proposed to use them directly. They are the following: (a) Core Vocabularies: A Core Vocabulary is a simplified, reusable and extensible data model that captures the fundamental characteristics of an entity in a context-neutral way [30]. The Interoperability Solutions for European Public Administrations (ISA) Programme of the European Union has developed Core Vocabularies for businesses, persons, locations and public services [31,32].
These vocabularies have been considered of public interest and published by the World Wide Web Consortium (W3C); for example, the ‘Person Core Vocabulary’ is available in Library of Congress [33]. This vocabulary provides a minimum set of classes and properties for describing a natural person, that is, the individual as opposed to any role they may play in society or the relationships they have to other people, organisations and property. This approach contributes significantly to the broader concept of identity. (b) ISO 639-1: It is the first part of the ISO 639 family of international standards for language codes. ISO 639-1 provides two-character lowercase alphabetic strings that serve as identifiers of languages. Several public institutions, such as the US Government [33], have published this standard as an open data resource. (c) Literal: The value space of Literal is the set of finite-length sequences of characters. The value space of Date consists of top-open intervals of exactly one day in length. There are different standards that formally define this kind of data. The one most frequently used for web development is XML Schema [34], established by the W3C. However, no vocabularies have been published so far for the remaining values of the ESSIM vocabulary. To fill this gap, the authors have also created the following vocabularies, which describe some tools developed by the European Union whose content had not been published as a vocabulary for use in the field of Linked Data: (a) Glossary of Terms (GOT). This glossary defines common concepts related to social security. It consists of a generic definition, followed by a definition specific to each member state, with a view to enabling an institution to best process the data received from abroad.
It was defined in the context of the TESS/SOSENET programme (Telematic for Social Security) of the European Union, the precedent of the current Electronic Exchange of Social Security Information (EESSI) project, which helps social security bodies across the European Union to exchange information more rapidly and securely [35]. (b) Institution Repository (IR). The IR is an institutional database developed by the EESSI project, which allows any institution to contact any other institution thanks to the routing facilities of Directory Services and to mechanisms established at both international and local levels [3]. The current IR only contains information about European Union social security institutions, but in the future it could be extended worldwide [36]. (c) EUR-Lex: It is an open data service that provides free access to European Union laws and other public documents [37]. The database is updated daily and contains more than 3 million items, with some texts dating back to 1951. Each document is registered with analytical metadata, such as publication reference, dates or keywords. (d) OD: An objective of the European Union is the exchange of social security data using structured electronic documents [38]. For this purpose, different categories of forms have been created, such as the P-series, with 140 forms, each to be used for exchanging a specific type of social security data related to pensions [39]. These last four vocabularies, GOT, IR, EUR-Lex and OD, along with the ESSIM vocabulary itself, have been published by the authors of this article on GitHub, and they are also available in RDF format to facilitate their use. Figure 1 shows part of the published vocabulary document.

ESSIM vocabulary published in GitHub (https://socialsec.github.io/voc/essim/).
To illustrate the proposal, Figure 2 shows the use of the ESSIM specification in a current official EU form (‘E106: Certificate of entitlement to sickness and maternity insurance benefits in kind for persons residing in a country other than the competent country’), usually exchanged between social security institutions [37]. This form includes different data fields, for example, ‘Forename’, ‘Surname’ and ‘Date of birth’. According to the proposal, the form should be sent together with an ESSIM metadata package (with 14 elements) for every data field of the form, in order to describe or annotate that field. The left-hand side of Figure 2 shows the values assigned to the metadata elements describing the semantic meaning of the ‘Surname’ data field, which appears in the form with the content ‘Martínez’.

Semantic annotation with metadata of an official form data field.
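The min..max multiplicity notation used in Table 2 lends itself to a mechanical check of each metadata package before it is sent. The following Python fragment is a minimal sketch of such a check, not the authors' implementation. The element names are those mentioned in the text, but the multiplicities assigned to them here, the example IRI and the date are hypothetical, since the actual values are defined in Table 2.

```python
import re


def parse_multiplicity(spec):
    """Parse a 'min..max' string ('*' means unbounded) into (min, max or None)."""
    match = re.fullmatch(r"(\d+)\.\.(\d+|\*)", spec)
    if match is None:
        raise ValueError(f"not a min..max multiplicity: {spec!r}")
    lower = int(match.group(1))
    upper = None if match.group(2) == "*" else int(match.group(2))
    if upper is not None and upper < lower:
        raise ValueError(f"maximum is below minimum in {spec!r}")
    return lower, upper


def check_package(package, constraints):
    """Return a list of messages for every multiplicity violation found."""
    problems = []
    for element, spec in constraints.items():
        lower, upper = parse_multiplicity(spec)
        count = len(package.get(element, []))
        if count < lower:
            problems.append(f"{element}: expected at least {lower}, found {count}")
        elif upper is not None and count > upper:
            problems.append(f"{element}: expected at most {upper}, found {count}")
    for element in package:
        if element not in constraints:
            problems.append(f"{element}: unknown element")
    return problems


# Hypothetical multiplicities, for illustration only (the real ones are in Table 2).
CONSTRAINTS = {
    "creatorInstitution": "1..1",
    "dateCreated": "1..1",
    "dateModified": "0..1",
    "references": "0..*",
}

# Hypothetical package for the 'Surname' field of form E106.
surname_package = {
    "creatorInstitution": ["https://example.org/ir/inss-madrid"],
    "dateCreated": ["2017-05-10"],
}

print(check_package(surname_package, CONSTRAINTS))
```

A package that repeats a single-valued element, or omits a mandatory one, would produce one message per violation, which a sending institution could inspect before the exchange takes place.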
4. Implementation of ESSIM with RDF vocabularies
The general idea underlying the implementation of the ESSIM proposal presented in this section is that RDF technology allows interoperability between the different existing resources based on the concepts represented in the vocabularies presented in the previous section, as shown in Figure 3. It involves using metadata to describe the data, or fields, included in the forms or OD exchanged between social security agencies. The values that can be assigned to some of these metadata refer to resources that may be accessed using different technologies. For example, the GOT is available in PDF format (technology T1 in Figure 3), while the information about the institutions is available in a database on a server of the European Commission, which must be accessed through a specific user interface (technology T2 in Figure 3). Using RDF technology, the contents of these repositories can be described and exposed to their users in a uniform way in a single format. First, however, it is necessary to create the ESSIM vocabulary, as well as the other vocabularies on which it is based, as described in Figure 3 and in the previous section.

General scheme of data description using ESSIM metadata.
Figure 4 shows a conceptual diagram in UML that represents these vocabularies. The diagram was made using the UML profile known as Ontology Definition Metamodel (ODM), created by OMG to represent RDF models [40] using stereotypes. To simplify the diagram, the following prefixes have been used to reference each vocabulary:
In the diagram, two elements used from other existing vocabularies appear shaded, with the following prefixes:

Simplified conceptual model of the vocabularies created.
The descriptions of the proposed vocabularies have been published in the URLs indicated by the prefixes, where they are also available in RDF format. For reasons of space, only the content of the ESSIM vocabulary represented with RDF-Turtle serialisation [41] is shown in Figures 8 and 9 in Appendix 1.
In addition, these vocabularies, as well as instances thereof, have been installed in a SPARQL endpoint available at the University of Alcalá, located at http://tifyc-pmi.cc.uah.es/fuseki/. They have been published as a dataset with the name ‘socialsecurity’. In this way, anyone can use the available editor to execute SPARQL queries [42] on the vocabularies, and the endpoint can be integrated with other SPARQL endpoints and data sets in the context of the linked data of the Web. For example, a simple query about the name in English of the institution whose acronym is ‘INSS Madrid’ would be the following:
Figure 5 shows the result of this query.

Example of SPARQL query.
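A query of this kind can also be sent to the endpoint programmatically. The Python fragment below is a minimal sketch that only builds the query text and the request URL following the SPARQL 1.1 Protocol, without contacting the server. Several details are assumptions: the path `socialsecurity/query` appended to the endpoint address is a common Fuseki convention but is not confirmed by the article, and the use of `skos:altLabel` for the acronym and `skos:prefLabel` for the name is hypothetical, as the properties actually used in the IR vocabulary may differ.

```python
from urllib.parse import urlencode

# Assumption: the dataset is exposed under the usual Fuseki '<dataset>/query' path.
ENDPOINT = "http://tifyc-pmi.cc.uah.es/fuseki/socialsecurity/query"


def escape_literal(text):
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(acronym, language="en"):
    """Build a SELECT query for the name of the institution with the given acronym.

    The SKOS properties are hypothetical stand-ins for those of the IR vocabulary.
    """
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?name WHERE {\n"
        "  ?institution skos:altLabel ?acronym ;\n"
        "               skos:prefLabel ?name .\n"
        f'  FILTER (str(?acronym) = "{escape_literal(acronym)}"'
        f' && lang(?name) = "{escape_literal(language)}")\n'
        "}"
    )


def build_request_url(endpoint, query):
    """Encode the query as the 'query' parameter of an HTTP GET request."""
    return endpoint + "?" + urlencode({"query": query})


query = build_query("INSS Madrid")
print(query)
print(build_request_url(ENDPOINT, query))
```

The resulting URL could then be requested with any HTTP client, asking for the `application/sparql-results+json` media type to obtain the result as JSON.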
5. Using microdata and JSON-LD scripts as semantic annotations
When a document is to be sent from one social security organisation to another, it must be accompanied by the descriptions of each of its fields in the form of ESSIM metadata packages. If the document is sent by email, the official document and the file or files containing the metadata can be attached to the message. Depending on the format of the official document, the metadata could even be included in the document itself. This is possible, for example, with PDF files, which allow other files containing the metadata to be attached to them. In these cases, the metadata are usually represented as RDF text files, using one of the serialisations available for RDF, such as RDF-XML [43], Turtle [41] or JSON-LD [18]. If the document sent is in HTML format, the metadata can be embedded in the document itself, using one of the existing alternatives for structured data in web pages, such as RDFa, Microdata or JSON-LD [44]. Figure 10 shows an extract of the HTML code of the official document E106 from the example in Figure 2, with the metadata describing the ‘Surname’ field integrated using the JSON-LD technology [18]. In this case, the description between
Another option is to embed the metadata using Microdata technology [16], using attributes such as ‘itemid’, ‘itemtype’ and ‘itemscope’ (Figure 11 in Appendix 1). In this case, a
In addition to the aforementioned options, there is also the option to use RDFa technology [17]. In this case, the metadata is introduced using attributes such as ‘property’, ‘typeof’, ‘content’, ‘resource’ and ‘prefix’ (Figure 12 in Appendix 1).
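Of the three embedding options, the JSON-LD one can be sketched in a few lines of Python. The fragment below is an illustration under stated assumptions, not a reproduction of Figure 10: it builds a partial description of the ‘Surname’ field with three of the ESSIM elements named in the text and wraps it in a `script` element of type `application/ld+json`, which is the standard way of embedding JSON-LD in HTML. The namespace IRI is taken from the address where the vocabulary is published, and the field IRI, institution IRI and date are hypothetical.

```python
import json

# Assumption: ESSIM terms are formed by appending the element name to this IRI.
ESSIM = "https://socialsec.github.io/voc/essim/"


def build_field_annotation(field_iri, creator_iri, date_created, references):
    """Build a partial JSON-LD description of one form field."""
    return {
        "@context": {"essim": ESSIM},
        "@id": field_iri,
        "essim:creatorInstitution": {"@id": creator_iri},
        "essim:dateCreated": date_created,
        "essim:references": [{"@id": iri} for iri in references],
    }


def to_script_element(annotation):
    """Serialise the annotation and wrap it for embedding in an HTML page."""
    body = json.dumps(annotation, indent=2, ensure_ascii=False)
    # '</' inside a script element could close it early; '<\/' is equivalent JSON.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical identifiers and date, for illustration only.
annotation = build_field_annotation(
    field_iri="https://example.org/forms/E106#surname",
    creator_iri="https://example.org/ir/inss-madrid",
    date_created="2017-05-10",
    references=["https://example.org/got/surname"],
)
print(to_script_element(annotation))
```

The same dictionary, serialised without the `script` wrapper, would serve as the attached JSON-LD file for documents that are not in HTML format.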
The three options described are valid. Their pros and cons are discussed in Google Developers [44]. For the actual exchange of social security data, the authors have chosen the JSON-LD option for the real examples, given its flexibility: the same format is used whether the metadata are included in the document sent or attached to it as separate information. However, if a social security institution sends a document using Microdata or RDFa, a software tool can be used to extract the embedded metadata and convert them automatically into a JSON-LD structure, using the application programming interface (API) developed by Stolz [45]. To describe a field of any official social security document using ESSIM metadata, such as the one used in the examples, it is sufficient to have a text editor and to know the RDF notation, as well as the content of the vocabularies related to ESSIM. However, in order to allow any clerk of a social security institution to produce this description without knowing RDF or the details of the vocabularies, a prototype has been created in the form of a web application developed in Java, which allows the user to enter the values of the metadata and generates a verified RDF file with the structure established by the ESSIM proposal. The Jena framework [46] has been used to process the RDF vocabularies. It is a Java library that reads RDF graphs contained in files and stores them as a model with an object structure, which facilitates their processing and allows them to be queried with the SPARQL language. It can also read files with rules and make inferences on the knowledge model in memory in order to expand it with new knowledge. Figure 6 shows the appearance of the application, developed as a JSP page.

Web application for creating ESSIM metadata values.
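The conversion of embedded Microdata into a JSON-LD structure, mentioned above as a way of handling documents received in that format, can be illustrated with a small Python sketch. It is not the API by Stolz [45] and it covers only a simplified case: a single item whose properties are written as full IRIs, with no nested items. The HTML fragment, identifiers and dates are hypothetical.

```python
from html.parser import HTMLParser


class FlatMicrodataParser(HTMLParser):
    """Collect the properties of the first Microdata item (nested items ignored)."""

    def __init__(self):
        super().__init__()
        self.item = None
        self._open_property = None
        self._text_parts = []

    def _add(self, name, value):
        self.item.setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if self.item is None:
            # Wait for the element that opens the item.
            if "itemscope" in attributes:
                self.item = {}
                if attributes.get("itemid"):
                    self.item["@id"] = attributes["itemid"]
                if attributes.get("itemtype"):
                    self.item["@type"] = attributes["itemtype"]
            return
        name = attributes.get("itemprop")
        if name is None:
            return
        if tag == "meta":
            self._add(name, attributes.get("content") or "")
        elif tag in ("a", "link") and attributes.get("href"):
            self._add(name, {"@id": attributes["href"]})
        else:
            # The value is the text content, collected until the end tag.
            self._open_property = name
            self._text_parts = []

    def handle_data(self, data):
        if self._open_property is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if self._open_property is not None:
            self._add(self._open_property, "".join(self._text_parts).strip())
            self._open_property = None


def microdata_to_jsonld(html):
    """Return a JSON-LD-style dictionary for the first item found in the HTML."""
    parser = FlatMicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.item or {}


ESSIM = "https://socialsec.github.io/voc/essim/"  # assumed namespace IRI

# Hypothetical annotated fragment of a form.
SAMPLE = f"""
<div itemscope itemid="https://example.org/forms/E106#surname">
  <link itemprop="{ESSIM}creatorInstitution" href="https://example.org/ir/inss-madrid">
  <meta itemprop="{ESSIM}dateCreated" content="2017-05-10">
  <span itemprop="{ESSIM}dateModified">2018-01-15</span>
</div>
"""

print(microdata_to_jsonld(SAMPLE))
```

A production converter would also need to resolve property names against the `itemtype` vocabulary and to handle nested items, which this sketch deliberately leaves out.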
For metadata whose values belong to a controlled vocabulary, the user is offered a drop-down list of possible values in a readable format, without having to know their internal representation. For example, in the case of the first of the metadata, ‘Creator Institution’, if the user is looking for an institution whose name includes the term ‘Madrid’, a list of the institutions in the IR whose names contain that term appears, and the user can choose one of them, such as the ‘National Institute of Social Security. Provincial Head Office of Madrid’ (Figure 7). However, internally in RDF format, the value that would be assigned to the property ‘creatorInstitution’ would be the IRI of the resource corresponding to that institution in the IR vocabulary, which would be

Selection of an institution for the metadata ‘Creator Institution’.
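The behaviour of the drop-down list, where the user sees readable names while the application stores IRIs, can be sketched as a simple lookup. The Python fragment below is only an illustration of that idea and not the code of the prototype, which is a Java web application. The IRIs and the second institution are hypothetical.

```python
# Hypothetical extract of the IR vocabulary: IRI -> readable name.
INSTITUTIONS = {
    "https://example.org/ir/inss-madrid":
        "National Institute of Social Security. Provincial Head Office of Madrid",
    "https://example.org/ir/inss-other":
        "National Institute of Social Security. Another Provincial Head Office",
}


def search_institutions(term, institutions):
    """Return (name, IRI) pairs whose name contains the term, ignoring case."""
    needle = term.casefold()
    return sorted(
        (name, iri)
        for iri, name in institutions.items()
        if needle in name.casefold()
    )


# The user types 'Madrid'; the list shows names, the selection yields an IRI.
matches = search_institutions("Madrid", INSTITUTIONS)
chosen_name, chosen_iri = matches[0]
print(chosen_name)
print(chosen_iri)
```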
6. Discussion and conclusion
Effective information exchange between social security systems, especially across borders, requires semantic interoperability features. In particular, semantic interoperability is fundamental for developing consistent definitions and interpretations of the data to be processed by different organisations. The most widely accepted approaches to addressing these interoperability issues consist of using controlled vocabularies and metadata, but such solutions have to be standardised and must also be applicable in social security institutions with different degrees of development. It is worth pointing out that, while social security institutions worldwide have been increasingly adopting Web Services and XML technologies to implement data exchange and cross-organisational collaboration, semantic interoperability requirements remain largely unsolved. This article proposes a metadata element set specification called ESSIM, which is based on Dublin Core metadata and other standard vocabularies. The applicability of this specification has been tested using RDF(S) and related technologies, such as JSON-LD, Microdata, RDFa and SPARQL. The proposal is aligned with the current implementation approaches of the European EESSI project [36], particularly as regards developing a common data model to reduce mismatches in the structure and concepts of the exchanged information. This is important, and the different stakeholders will need to make a considerable effort to reach agreements on using a semantic ecosystem that allows the creation of interoperable public services from heterogeneous and distributed organisations. Moreover, the first author of this article is an active member of the European Union working group developing these specifications. Furthermore, the proposal presented in this article is coordinated with the ISSA, which supports institutions implementing social security schemes worldwide through guidelines [2] and additional specifications on interoperable social security systems.
In particular, an ongoing project addressing the implementation of international social security agreements, which involves intensive data exchange, is applying the approach presented in this work and has produced preliminary results. The different stakeholders should reach agreements at the political level in order to have a complete ecosystem with several actors; in the meantime, the foundations for solving technical interoperability are promising. In summary, the coordination of the work presented here with other international initiatives opens up promising perspectives for the practical application of the mechanisms developed in this article.
7. Future work
As part of the future work, the intention is to propose ESSIM as an ISO standard with the support of the ISSA. Once the standard is established, an Interchange Field Format will be defined to facilitate the exchange of the metadata set. Although the element values are based on existing controlled vocabularies, the scope of these vocabularies is unfortunately limited to the European Union. Extending them to other regions is also part of the future work. The wider use of ESSIM will provide the opportunity to validate, in different contexts, the quality of the elements incorporated into the ESSIM vocabulary. The authors are already working on the implementation of a distributed software architecture based on web services that supports the international exchange of documents based on the proposed metadata, drawing on their own experience and that of other authors in the application of a semantic approach using linked data for interoperability [47]. The architecture covers the main components and technical aspects involved in semantic-based data processing for cross-border social security coordination. By using mainstream standards and technologies, this proposal also aims to facilitate the practical adoption of such advanced services by social security institutions. In addition, it is planned to make use of the Data Catalogue vocabulary (DCAT) [48] to describe the defined structured datasets [49], with the purpose of improving the data exchange between community-specific metadata vocabularies while providing semantic interoperability with other applications. Finally, formalising the business rules established in the international social security agreements and using the semantic-based platform to execute them, as logic-based specifications, will constitute another step towards automating the operation of these agreements.
