Abstract
Ontologies are computational artifacts that model consensual aspects of reality. In distributed contexts, applications often need to utilize information from several distinct ontologies. In order to integrate multiple ontologies, entities modeled in each ontology must be matched through an ontology alignment. However, imperfect alignments may introduce inconsistencies. One kind of inconsistency, which is often introduced, is the violation of the conservativity principle, that states that the alignment should not introduce new subsumption relations between entities from the same source ontology. We propose a two-step quadratic-time algorithm for automatically correcting such violations, and evaluate it against datasets from the Ontology Alignment Evaluation Initiative 2019, comparing the results to a state-of-the-art approach. The proposed algorithm was significantly faster and less aggressive; that is, it performed fewer modifications over the original alignment when compared to the state-of-the-art algorithm.
Introduction
Studer et al. (1998) defined ontology as a “formal and explicit specification of a shared conceptualization”. In this definition, “formal” means that it should be machine-readable, “explicit” reflects that the concepts and constraints on their use must be defined explicitly, “shared” means that the knowledge contained in an ontology is consensual among a group of people and it is not private to some individual and, finally, “conceptualization” refers to an abstract model of a portion of reality. Thus, ontologies specify well-defined meaning for the information used by some community. It is often the case that several communities build distinct ontologies for domains that are related or even the same. Frequently, it is useful to integrate information from these separate ontologies, especially in distributed contexts such as the Semantic Web. In order to integrate two ontologies, it is necessary to find mappings between concepts from both ontologies. The process of finding such mappings is called ontology matching, and the result of this process is an ontology alignment.
Although there are several distinct approaches to match ontologies, none is free of errors and ontology alignments frequently lead to unintended consequences. Among the most common consequences of flawed alignments is the violation of the conservativity principle. The conservativity principle states that merging two ontologies through an alignment should not introduce new relations of subsumption or of equivalence between concepts originating from the same source ontology. A subsumption relation holds between two concepts c and d if and only if every instance of d is also an instance of c – thus, we say that c subsumes d. An equivalence relation holds between two concepts if they share all their instances in every possible state of affairs, i.e., c is equivalent to d if and only if every instance of c is also an instance of d and every instance of d is an instance of c. Thus, equivalence implies mutual subsumption and is implied by it. Therefore, c is equivalent to d if and only if both c subsumes d and d subsumes c.
If an alignment violates the conservativity principle, the merged ontology may not preserve the original meanings of the concepts specified by the source ontologies. For example, if an alignment matches the concepts person, client and company from two ontologies A and B, where ontology A specifies that person subsumes client (that is, every client is a person) and ontology B states that client subsumes company (i.e., every company is a client), a query for person in the merged ontology will return every instance of company in the knowledge base, which does not reflect the results of the same query neither in ontology A nor in ontology B separately. That is, the merged ontology does not agree with the intended meaning of either source ontology, i.e., it is incompatible with their conceptualizations.
This disagreement indicates that the alignment is matching concepts with divergent meanings in each ontology. Such matches occur due to terminological or structural similarities between the concepts, which lead the domain expert or the matching algorithm to believe their meaning is the same even when it is not. Figure 1 depicts an abstraction of this problem. An alignment may cover any portion of the intersection between ontologies A and B. However, a large part of the intersection of the ontologies’ terminologies do not account for their conceptualizations, i.e., the intended meaning of the terms. The goal of this work is to improve ontology alignments by removing incorrect matches that originate from this false agreement. The violations of the conservativity principle are evidence of such incorrect matches.

False agreement between ontologies A and B.
In the previous example, the concept client in ontology A refers specifically to personal clients, i.e., persons that are clients of some company, while in ontology B the concept client refers to a broader range of entities, including corporate clients, that is, companies which are clients of other companies. Further, in ontology B the concept company actually refers only to corporate clients, excluding suppliers, for example, while the concept company in ontology A does not have this restriction. Therefore, concepts client and company do not have the same meaning in both ontologies. The introduction of new subsumption relations by the alignment, which violates the conservativity principle, evidences this fact.
Another frequent kind of consequence from flawed alignments is the inconsistency of the merged ontology, that is, the presence of concepts that cannot be satisfied. While inconsistency reflects logical flaws in the alignment, conservativity violations reflect mismatching of meaning. That is, inconsistency rises from contradictory axioms in the merged ontology, regardless of the meaning of the concepts being the same. Conservativity violations, on the other hand, rise from conflicting sets of subsumption and equivalence relations. This conflict reveals disagreements between the meaning of concepts as expressed in the ontologies, regardless of the effect these sets have on the satisfiability of the resulting axioms.
As such, conservativity violations are harder for the user to perceive, and much of their danger lies in the fact that the resulting ontology upholds a divergent view of the world, a fact that is not immediately clear for the users. For the same reasons, non-conservative alignments are even more common than inconsistent alignments. The present work deals only with the problem of detecting and correcting conservativity violations. Consistency issues fall outside the scope of this work.
It should be noted that there are distinct perspectives on conservativity violations, regarding whether such violations indicate flaws in the alignment or in one or both source ontologies (Solimando et al., 2017). The latter interpretation holds that the introduction of new subsumption relations in the merged ontology is evidence of missing subsumption relations in the source ontologies. This interpretation is adopted by Lambrix and Liu (2013) and Ivanova and Lambrix (2013). We adopt the perspective that violations of conservativity denote a mismatching of the aligned entities, since separate ontologies are usually designed with distinct outlooks on the world, which will culminate in conflicting sets of subsumption relations – a fact which is disregarded if we consider conflicts as flaws in the ontology itself. On the contrary, we emphasize this consideration by understanding the conflicting subsumption relations as signs of a defective alignment.
Another important aspect of conservativity violations is that there are two types of violations, violations of subsumption conservativity and violations of equivalence conservativity. Violations of subsumption conservativity occur when the alignment introduces new subsumption relations between concepts originating from the same source ontology. Violations of equivalence conservativity occur when the alignment introduces new equivalence relations between concepts originating from the same source ontology. Alignments may set forth new equivalence relations due to either introducing circular chains of subsumption relations or to mapping two or more entities from one of the source ontologies to a single entity in the other.
The algorithm presented in this paper is an evolution of the one proposed by Antunes et al. (2019). Our algorithm corrects both kinds of conservativity violations, i.e., violations of equivalence and of subsumption conservativity. Since equivalence implies (and is implied by) mutual subsumption, the algorithms do not explicitly differentiate violations of subsumption conservativity from violations of equivalence conservativity, even though other approaches deal differently with violations of these two distinct kinds, such as in the work of Solimando et al. (2014b, 2017).
The remainder of this paper is organized as follows. Section 2 presents and discusses related work on the correction of conservativity violations in ontology alignments. Section 3 describes the proposed algorithm. We present a theoretical time-complexity analysis in Section 4 and sketch an analysis of correctness in Section 5. Section 6 discusses the evaluation of the algorithm against reference datasets from the Ontology Alignment Evaluation Initiative (OAEI) and a comparison with a state-of-the-art algorithm. Finally, Section 7 contains a brief conclusion.
The conservativity of ontology mappings appears in the work of Meilicke (2006) under the name of “stability”, along with an algorithm that checks stability in mappings from individual ontologies into the merged ontology. The check is conducted by selecting every concept in the local ontology and verifying if the set of superclasses is the same in the local context (i.e., inside the source ontology) and in the distributed context (in the merged ontology). If the sets are different for at least one concept, the mapping is not stable. However, the algorithm does not correct the detected violations.
In the ASMOV algorithm (ASMOV stands for Automated Semantic Merging of Ontologies with Verification), proposed by Jean-Mary et al. (2009), the semantic verification of ontology alignments is conducted by iteratively selecting two pairs of matched entities and verifying several kinds of inference. The algorithm verifies the occurrence of multiple-entity correspondences (introduction of equivalence relations due to the matching of a single entity to several), crisscross correspondences (introduction of equivalence due to the matching of pairs of entities with inverse subsumption relations in each ontology), disjointness-subsumption contradiction (introduction of subsumption between disjoint entities), subsumption and equivalence incompleteness (introduction of subsumption and equivalence in general), and domain and range incompleteness (mismatch of the domains of matched properties). Excluding domain and range incompleteness, the checks conducted refer to violations of the conservativity principle.
The technique used by Jiménez-Ruiz and Grau (2011) checks only for violations of equivalence conservativity by verifying if the alignment maps directly two different concepts from one source ontology to a single concept in the other. To the extent of our knowledge, this is the only existing approach for correcting conservativity that does not require the merging of the aligned ontologies. However, it also does not account for violations of subsumption conservativity, and ignores the cases when violations of equivalence conservativity arise indirectly from the inclusion of circular subsumption chains.
In the work of Lambrix and Liu (2013), a network of aligned ontologies is debugged by computing a merged ontology over the network and checking if every subsumption relation in the merged ontology between a pair of concepts from the same source ontology also exists in the source ontology. Their approach assumes the opposite perspective from the one adopted in the present paper, that is, it assumes that the violations detected indicate missing subsumption relations in the source ontologies. Therefore, their algorithm corrects the violations through the inclusion of new subsumption relations in the original ontologies. The method conducted this correction by selecting, for each missing subsumption relation “a subsumes b”, a concept c that subsumes a and does not subsume b and a concept d subsumed by b that is not subsumed by a and creating a new subsumption relation “c subsumes d”. The algorithm presents all possible corrections to the user, who then selects the correction of his choice. Ivanova and Lambrix (2013) modified this approach, in order to acknowledge that at least some conservativity violations are indications of flaws in the alignments, by submitting the detected violations to a domain expert for validation.
Solimando et al. (2014a) first extract locality-based modules from both the aligned ontologies and encode the modules as Horn propositional theories and extend such theories with disjointness axioms. By following the assumption of disjointness (i.e., that all concepts that do not share subsumees are disjoint), the problem of detecting subsumption conservativity violations is reduced to the detection of unsatisfiable concepts in the Horn clauses. However, this assumption is not reasonable, since many ontologies accept the instantiation of multiple concepts without a common subsumee, mainly due to the combinatorial explosion of concepts whose explicitation would be otherwise necessary. Take as example a small ontology with concepts man, woman, teenager, adult, elderly, student, artist, retail worker, farmer. In order to make explicit every possible combination of non-disjoint concepts, even if we assume that the sets {man, woman}, {teenager, adult, elderly} and {student, artist, retail worker, farmer} are each pairwise disjoint, would require the inclusion of at least 61 new concepts, including concepts such as teenager woman, elderly farmer and adult male artist. This means that the small ontology would grow more than five times. If we do not assume that the concepts in each set are disjoint, this number grows even bigger. Another important aspect of the mentioned work is that locality-based modules are extracted from the aligned ontologies and codified as Horn propositional theories prior to its extension with additional disjointness axioms and the detection of unsatisfiable concepts.
Solimando et al. (2014b, 2017) extended this approach to cover both type of conservativity violations by using it in conjunction with an algorithm that detects violations of equivalence conservativity through searching for loops in the ontologies represented as directed graphs, where nodes are concepts and edges are subsumption relations. The algorithm proposed was implemented as an extension of the LogMap ontology matching and mapping repair system (Jiménez-Ruiz and Grau, 2011).
Table 1 summarizes the reviewed work. As we have seen, most approaches to the validation of conservativity in ontology alignments rely on merging the aligned ontologies. The work of Solimando et al. (2014a,b, 2017) innovates from anterior work by not iterating over all entities in the ontologies, but still requires heavy reasoning tasks after merging the ontology modules. The two last rows list our previous work (Antunes et al., 2019), which will be detailed in next section, and the current paper. Our techniques do not require merging any portion of the ontologies, and both iterate only over the entities present in the alignment. However, the previous approach was limited to detecting violations without correcting them. The next section presents our current approach in detail.
Comparison of alignment conservativity validation approaches
Comparison of alignment conservativity validation approaches
Only considers violation of equivalence conservativity.
The user defines whether each violation indicates a flaw in the alignment or in the ontologies.
Only considers violation of subsumption conservativity.
Follows the assumption of disjointness.
Additionally, while some authors regard conservativity violations as flaws in the aligned ontologies, the largest trend is to acknowledge them as flaws in the alignment itself. We follow this trend and take a “better safe than sorry” approach, aiming to remove every violation from the alignment.
In a previous paper (Antunes et al., 2019), we analyzed conservativity violations under category theory, a branch of mathematics that studies the structure in systems of composable relations. That work defined a category of ontologies where such relations are subsumption-preserving total mappings between ontological structures, and an alignment between ontologies A and B is a structure with two mappings

Alignment between ontologies A and B.
In the remainder of this section, we present our current approach for correcting conservativity violations in ontology alignments. The category-theoretic formalization guides the development of our approach, understanding alignments as pairs of mappings and conservativity as the existence of subsumption-preserving mappings between the range ontologies. Essentially, the algorithms search the category of ontologies for an object with an injective mapping into the original alignment, i.e., a “subset” of the alignment, such that the composition of this injective mapping with the original mappings (from the alignment and into the ontologies) yields a new alignment for which the subsumption-preserving mappings between the range ontologies do exist.
In the scope of the proposed algorithm, we formalize ontologies as pairs (E,S) of a set of entities E (which contains both concepts and relations among them) and a reflexive, transitive and antisymmetric relation
The implementation of the algorithms deals with OWL 2 ontologies. In this context, the set E in the formalization is actually the union of the set of classes and properties in the OWL ontology, and S is the union of the sets of subclass and subproperty relations, both declared and inferred. We utilize the function “ancestors”, that returns the set of superclasses or superproperties for classes and properties respectively. In this manner, the algorithm is able to give a single treatment for both classes and properties, although in reality they constitute disjoint sets. Thus, we present both in a single set for simplification.
Following our previous work, first it is necessary to build the taxonomy of the concepts for each of the range ontologies A’ and B’ that include all aligned entities (and no other entity) and every subsumption relation between them in A and B respectively. These range ontologies are rather “copies” of the alignment, extended to include the subsumption relations in the respective ontology. Therefore, their sets of entities are isomorphic to the set of entities in V. For simplification, we will assume
Even when the alignment maps two entities c and d into a single entity e in A (or in B), the two entities will appear in the respective range ontology – with additional subsumption relations c subsumes d and d subsumes c, given that subsumption is a reflexive relation and therefore e subsumes e. Similarly, if the alignment maps c and d respectively to e and f, such that an equivalence relation holds between e and f, the range ontology will include relations c subsumes d and d subsumes c, since equivalence implies mutual subsumption (i.e., e is equivalent to f if and only if e subsumes f and f subsumes e).
Once we have the taxonomies built, still according to Antunes et al. (2019), the existence of a subsumption preserving mapping between the range ontologies determines the conservativity of the alignment. Since the sets of entities in the range ontologies are isomorphic, such mapping relies on compatible subsumption relations in the range ontologies. Thus, the method detects conservativity violations by finding discrepancies between the subsumption relations from the range ontologies, i.e., pairs
Therefore, in order to correct the detected conservativity violations it is necessary to remove from the alignment V a set of entities comprehensive enough so that no subsumption relation between a pair of aligned entities occur in a single aligned ontology. We propose to model the offending subsumption relations as edges in a graph whose nodes are the aligned entities. Thus, all vertex covers for the graph are suitable sets of entities to be removed from the alignment for it to become conservative.
The nodes of G are the entities in the alignment, and
For any two nodes a and b, (a,b) is an edge if and only if
Find a vertex cover C for G, i.e., a set of nodes such that for every edge in G at least one of its endpoints is in the cover, defining
It is preferable that the chosen set were as small as possible, in order to minimize the difference between the original alignment and the corrected alignment and preserve the largest possible amount of information. However, finding the minimum vertex cover of a graph is an NP-complete problem, as proved by Karp (1972), and no efficient algorithm exists to fulfill such a task.
Nevertheless, the minimum vertex cover can be approximated by a factor of 2 by a simple greedy algorithm described by Papadimitriou and Steiglitz (1998). Papadimitriou’s algorithm finds a cover that is at most twice the size of the minimum vertex cover by choosing edges from the graph, removing both endpoints from the graph and inserting them into the cover until no edges are left. While slightly better approximations exist, such as the one proposed by Karakostas (2005), our approach takes advantage of the greedy algorithm by exploring the fact that the graph is not given as input, but must be constructed from the alignment. Thus, we compute the cover at the same time as building the graph. Whenever the algorithm, during the graph construction process, would introduce an edge in the graph, it instead removes both endpoints from the graph and includes them in the cover.
Figure 3 shows a diagram with our approach’s main steps. First, the algorithm builds the graph from the alignment (step 1), selecting edges and removing both endpoints from the graph (step 2) while including those nodes in the vertex cover until no edge remains (step 3). We perform an additional improvement procedure to reduce the repair size by verifying if reintroducing the removed nodes leads to conservativity violations (step 4) and reinserting the “safe” nodes (step 5). Finally, the computed vertex cover is the set of mappings to be removed from the alignment for it to become conservative.

Depiction of the repair computation for a sample alignment.
We present our greedy algorithm detail in Algorithm 1. This algorithm performs steps 1 through 3 from Fig. 3. In the remainder of this work, we will refer to Algorithm 1 as GREEDY. It receives as input the alignment (that is, V and the two mappings
It is noteworthy that this approach is enough to cover violations of equivalence conservativity as well as violations of subsumption conservativity. Since subsumption is reflexive (i.e., every entity subsumes itself), if the alignment maps two entities e and d in V to a single entity in A (i.e.,

GREEDY. Greedy algorithm for computing alignment repair
Another important aspect is that, given that GREEDY is interested only in those subsumption relations that hold between entities in the alignment and that it iterates over the entire alignment, it is not necessary to look into both ancestors and descendants of each entity. Instead, it suffices to look into only one of these groups, since, if the algorithm checks only ancestors (respectively, descendants) for each entity, the subsumption relations between the entity and its descendants (ancestors) will be accounted for when the iteration reaches these descendants (ancestors).
As previously discussed, this algorithm yields an approximation of the minimum repair necessary for removing the conservativity violations from the alignment. Since GREEDY is built based on the algorithm analyzed by Papadimitriou and Steiglitz (1998), the approximation may be up to twice the size of the minimum repair, which frequently leads to extremely aggressive repair strategies.
In order to consider this and reduce the impact of the repair, we include a posterior step of repair improvement. Algorithm 2 depicts this step, corresponding to steps 4 and 5 of Fig. 3. We will refer to Algorithm 2 as IMPROVE. The IMPROVE algorithm iterates over the entities in the original repair (

IMPROVE. Repair improvement algorithm
Since IMPROVE does not iterate over all the entities in the alignment, it is necessary to check both ancestors (lines 4 and 13) and descendants (lines 9 and 20) of each entity in order not to reintroduce any conservativity violation. IMPROVE checks every entity in the original repair to verify if reinserting it in the alignment indeed leads to conservativity violations and inserts those that do in the new repair. Thus, while GREEDY includes two entities at a time in the repair, IMPROVE includes only one. Apart from these two key differences, the algorithms are very similar.
It is essential to note that when IMPROVE takes an entity from the previous repair and reinserts it into the alignment, it also includes all subsumption relations in which that entity is involved in the taxonomies
It is also worth pointing out that for every edge in the graph that share no endpoint with other edges, IMPROVE puts only one of the connected nodes in the repair, while GREEDY puts both. In the case where no edges share nodes, which is the worst-case scenario for GREEDY where the computed vertex cover size is two times the number of edges in the graph, the improvement provided by IMPROVE reduces the repair size to its half, i.e., the minimum possible repair. In the empirical evaluation of the algorithms (Section 6), this was the case for all alignments in the Conference dataset. While for many other cases the reduction will not be so significant, it is enough to state that the resulting repair is always smaller than twice the size of the minimum.
The for loop in line 6 in Algorithm 1 (GREEDY) iterates over each ancestor of an entity in one of the input ontologies and contains a nested for loop (line 7) that runs over the entities in the alignment that are mapped to said ancestor. Since each entity in the alignment is mapped to exactly one entity in each ontology, the total number of iterations considering the two nested loops may not be greater than the number of entities in the alignment. This property is expressed in Eq. (3), where f is the function that maps the entities in the alignment to the entities in the aligned ontology and
Two other loops follow this previous one with similar structure (line 10 and line 17), and the three loops are inside of the main loop (beginning on line 4). The main loop iterates over the entities in the alignment – that is, it runs n times. Thus, the algorithm time complexity is asymptotically bound by
IMPROVE has a similar structure, but with the duplication of the internal loops in order to account for the entity’s descendants as well as the ancestors. Since the new loops are included sequentially and present the same complexity limits as the old ones, they do not alter the run-time complexity.
Correctness analysis
We have asserted that, in order for an alignment to violate the conservativity principle, there must exist, for each violation, a pair of aligned entities
In the context of Algorithms 1 and 2, the subsumption relations between the aligned entities in each ontology are stored in the taxonomies
At the end of execution, there is no pair of entities
At the end of each iteration, there is no pair of entities
Algorithm 1 (GREEDY) iterates over every entity in the alignment and every ancestor of each entity, reaching every subsumption relation holding between aligned entities. Since the algorithm includes the corresponding subsumption relation in the taxonomy, there can be no relevant subsumption relation in the ontologies that is not in the taxonomies. The only cases when the algorithm does not store the subsumption relation is when the corresponding relation does not exist in the other taxonomy (that is, in lines 13 and 20), and in those cases the algorithm immediately includes the involved entities in the repair, ensuring Property 1. For these same reasons, there can also be no pair of entities involved in such a relation that was not included in the repair, guaranteeing Property 2.
Similarly, Algorithm 2 (IMPROVE) iterates over every entity in the repair computed by GREEDY and every ancestor and descendant of those entities. When validating each entity, the algorithm compares all its subsumption relations in each ontology, store the relations in the respective taxonomies (securing Property 1) and include the entity in the repair in all the cases when there is a disagreement between the taxonomies (lines 16, 23, 30 and 35, securing Property 2). As we have also pointed out, this strategy assures that the algorithm will take into account the interactions between all entities from the input repair before reintroducing them.
For the empirical evaluation of the algorithm, we used reference alignments from three tracks from the Ontology Alignment Evaluation Initiative 2019.1
The Conference track provides 21 manually created alignments between 7 ontologies from the OntoFarm collection (Zamazal and Svátek, 2017), covering the domain of academic conferences. This dataset contains small ontologies (the biggest has 141 classes) and small alignments (up to 26 mappings). Out of the 21 alignments in the dataset, only 10 present conservativity violations. The following analysis focuses on these 10 alignments and leaves out the remaining alignments. The Anatomy track comprises the Adult Mouse Anatomy (AMA) ontology and a part of the National Cancer Institute (NCI) Thesaurus describing human anatomy (Zhang et al., 2004), along with a reference alignment containing 1516 mappings. Finally, the LargeBio track involves three ontologies with tens of thousands of classes each, the Foundational Model of Anatomy (FMA), the Systematized Nomenclature of Medicine (SNOMED) – Clinical Terms and the NCI Thesaurus. The track also contains three reference alignments based on the Unified Medical Language System (UMLS) Metathesaurus (Bodenreider, 2004) and built by a combination of automatic techniques, expert assessment, and auditing protocols. The reference alignments from OAEI 2019 contain only equivalence mappings, and therefore the limitation of the proposed alignment in dealing with other types of mappings was of no matter to the evaluation performed.The algorithm was implemented and executed with the reference alignments as input2
The algorithm was implemented in Python 3 with the libraries RDFLib and Owlready2, the latter of which includes a modified version of the HermiT reasoner. All evaluations were performed on an Ubuntu 18.10 desktop computer with an Intel Core i7-7700 CPU @ 3.60 GHz × 8 and 16 GB of RAM. The source code and corrected alignments are available at http://github.com/BDI-UFRGS/conservativity/.
Results of the experimental evaluation over alignments from OEAI 2019
It is noteworthy that the improvement step (i.e., the IMPROVE algorithm) reduced the repair resulting from the GREEDY algorithm to half its size for all alignments from the Conference dataset. As previously discussed, this means that the algorithm has reached the minimum possible repair that removes the conservativity violations from the alignment. The reduction rate provided by IMPROVE ranged from 33.5% (again for the SNOMED-NCI alignment) to the maximum possible reduction of 50% in the Conference dataset.
The execution time of both steps of the algorithm was small – even for the largest alignment, with over 18 thousand mappings, the combined total was of 10.16 seconds. The most significant bottleneck in the entire correction process was actually the execution of the OWL 2 reasoner, which is outside the scope of the current paper but is a prerequisite for the algorithm to remove all possible conservativity violations. The use of module extraction techniques to select only the significant portion of the input ontologies prior to running the reasoner may go a long way towards reducing said bottleneck, but the modularization approach must be chosen carefully, since modules of poor quality will reflect directly on conservativity violations persisting in the resulting alignment.
We compare those results to the performance of the LogMap3
LogMapC was compiled from the source code available at http://github.com/ernestojimenezruiz/logmap-conservativity/.
While the authors of LogMapC present a softer version of the conservativity principle, both the referenced paper and the algorithm implementation address the usual definition of conservativity, as taken here. All tests conducted take the stronger notion of conservativity into account.
Results of the evaluation of LogMapC over alignments from OAEI 2019
In the case of LogMapC, the percentage of mappings altered (i.e., either removed or replaced with subsumption mappings) ranged from 8% to 86.6%. The later is verified for the FMA-SNOMED alignment in the LargeBio dataset, with 7802 removed or replaced mappings out of the original 9008. The execution time, which again does not include the time required for computing modules from the source ontologies and running the reasoner, reached over 36 minutes for the SNOMED-NCI alignment.
We summarize the results from both algorithms in Tables 4 and 5. Table 4 presents a size comparison of the repairs computed by both algorithms. The columns under “Removals × 1” present the respective sizes when counting both removing an equivalence mapping or replacing it with a subsumption mapping as a single modification. The columns under “Removals × 2” present the respective sizes when counting removing an equivalence mapping as removing two subsumption mappings, that is, as two modifications. Table 5 presents a comparison of the execution time of both algorithms in seconds.
Repair size comparison between GREEDY+IMPROVE and LogMapC (the smaller the better)
Our algorithm compares very favorably to LogMapC. Even when counting the removal of equivalence mappings as two modifications – which is favorable to LogMapC, since it is the only algorithm that performs other types of modifications – there where only two cases where LogMapC’s repair was smaller: for the alignments cmt-sigkdd and conference-iasted in the Conference dataset, both cases where the proposed algorithm’s repair contain a single removal and the repair computed by LogMapC contains the replacement of a single equivalence mapping for a subsumption mapping. In other two cases, both algorithms recommended the removal of the same number of mappings, for the alignments cmt-ekaw and confof-ekaw. Finally, for the alignment edas-ekaw, LogMapC recommends the substitution of two equivalence mappings for subsumption mappings, while our algorithm recommends the removal of a single equivalence mapping – since removing an equivalence mapping counts as two modifications, both repairs are deemed as equally aggressive. These three cases are also in the Conference dataset. For every other alignment, the repair computed by the proposed algorithm is considerably less aggressive, ranging from 16.6% to 75% smaller, when counting the removal of an equivalence mapping as two modifications.
Execution time comparison between GREEDY+IMPROVE and LogMapC
When it comes to the execution time, GREEDY+IMPROV was over 17 times faster than LogMapC for every input case. In the case of the SNOMED-NCI alignment, which was the most time-intensive for both algorithms, our algorithm was 213.3 times faster than LogMapC.
Apart from the difference in execution time and repair size between the algorithms, it is adequate to compare the computed repairs qualitatively. We will briefly present a textual description of the similarities and distinctions between the computed repairs for the alignments in the Conference dataset and for the remaining alignments (from the Anatomy and the LargeBio datasets) we shall lay out comparison tables.
For the alignments in the Conference dataset, in one case the two repairs were identical (confof-ekaw), in another, our algorithm removes a subset of the mappings removed by LogMapC (cmt-conference). In two cases, there was an overlap between the sets of mappings to be removed (cmt-confof and edas-iasted), and, in one case, the repairs were disjoint (cmt-ekaw). In the remaining five alignments, our algorithm removed either exactly the same mappings (cmt-sigkdd and conference-iasted) or a subset of the mappings (conference-edas, conference-ekaw and edas-ekaw) which LogMapC replaced with subsumption mappings.
The cases where our algorithm removed mappings that were replaced by LogMapC (cmt-sigkdd, conference-iasted, conference-edas, conference-ekaw and edas-ekaw) reflect that there is a limit after which the algorithms cannot reduce the repair further by reinserting equivalence mappings in the alignment without reintroducing conservativity violations. Instead, LogMapC further reduces the repair’s impact by replacing those equivalence mappings with subsumption mappings. The inclusion of an additional step in our algorithm to check if the removed mappings may be reintroduced as subsumption mappings could result in even less aggressive repairs.
When it comes to the three alignments from the Conference dataset where there were mappings removed by our algorithm that were left unchanged by LogMapC (cmt-ekaw, cmt-confof and edas-iasted), it is essential to notice that, when it comes to an isolated violation, removing either end of the offending relation from the alignment suffices to repair the violation. This behavior was precisely the case for the cmt-ekaw alignment, which introduces a new subsumption relation between concepts paper author and conference participant from the ekaw ontology from the mappings between ekaw’s paper author and cmt’s author and between ekaw’s conference participant and cmt’s conference member. Our algorithm removed the latter mapping, while LogMapC removed the first. Likewise for cmt-confof, which introduces a subsumption relation between author and member.
Finally, a similar situation occurs for edas-iasted, however with the introduction of three new subsumption relations in edas centered in the concept attendee – while our algorithm removed only the mapping between edas’ attendee and iasted’s delegate, LogMapC removed the mappings for the other three concepts (i.e., author, reviewer and session chair). Our algorithm guarantees the significantly smaller repair by removing both endpoints of one of the three violating relations in GREEDY (say, author and attendee), which simultaneously removes the other two relations – since one of the endpoints is not there anymore – and the posterior attempt to reinsert them in IMPROVE – reinserting attendee would reintroduce violating subsumption relations with reviewer and session chair while reinserting author would not.
Tables 6 through 9 lay out the comparison between repairs computed for the alignments AMA-NCI, from the Anatomy dataset, FMA-NCI, FMA-SNOMED and SNOMED-NCI, from the LargeBio dataset. The rows present the actions our algorithm performed on the mappings, while the columns present the actions taken by LogMapC. Each cell contains the number of times that the algorithms chose those actions for a mapping in the alignment. The tables refer to the equivalence mappings exchanged for subsumption mappings by LogMapC as “replaced” mappings.
The first thing to notice in these tables is that both algorithms agree to a large extent on the non-altered mappings. Our algorithm maintained between 82,59% (FMA-SNOMED) and 96,21% (ANA-NCI) of the mappings that LogMapC kept unchanged. Another point of some agreement is the removed mappings. Close to half of the mappings removed by our algorithm were also removed by LogMapC for the LargeBio alignments (59,25% for FMA-SNOMED, 49% for FMA-NCI and 45,73% for SNOMED-NCI). The percentage is lower for the AMA-NCI alignment, where it is 19,89%.
Comparison of repairs for the AMA-NCI alignment
Comparison of repairs for the FMA-NCI alignment
Comparison of repairs for the FMA-SNOMED alignment
Comparison of repairs for the SNOMED-NCI alignment
A final relevant aspect is that LogMapC does not guarantee that the resulting alignment is completely free of conservativity violations. In fact, for the alignments AMA-NCI from the Anatomy track and FMA-SNOMED and SNOMED-NCI from the LargeBio track, the alignments still contained 4, 56 and 528 violations, respectively, after LogMapC’s repair, according to the algorithm’s output.
We submitted the alignments resulting from our repair process to LogMapC for validation. LogMapC found no remaining violation for all alignments but one. For the SNOMED-NCI alignment, however, LogMapC detected 1227 remaining violations – although it has proposed no repair for those violations. In order to diagnose such violations, we computed the merge of SNOMED and NCI through the alignment and submitted the merged ontology to the HermiT reasoner for classification. Each violation of the conservativity principle should appear in the resulting hierarchy as a subsumption or equivalence relation. Nevertheless, we have found none of the supposed violations in the set of relations that actually occur in the resulting ontology. For example, LogMapC indicated violating subsumption relations between Carcinoma in situ of lip and Malignant tumor of lip in SNOMED and between Recurrent Neuroblastoma and Relapse in NCI, but neither relation exists in the final merged ontology. The same is true for the remaining reported violations. Therefore, we propound that the violations detected by LogMapC are rather false positives. These false positives may originate from the assumption of disjointness – it is possible that LogMapC accuses violations when merging the ontologies introduces a common subsumee between concepts which previously had none.
In the context of the Semantic Web, targeted at computer programs rather than human users, the validation and correction of ontology alignments play a prominent role in the actual realization of semantic interoperability. Further, efficiency in evaluating alignments obtained from diverse sources on the web is critical in enabling the automatic processing of decentralized information.
While ontology alignments are essential for the interoperability of ontology-based systems, alignments frequently lead to conservativity violations. This is the case even for the reference alignments from the Ontology Alignment Evaluation Initiative. This work proposed a quadratic-time algorithm for correcting such violations and evaluated it over 14 alignments from three different tracks from OAEI 2019 edition. The algorithm’s performance compared favorably to a state-of-the-art algorithm for correcting conservativity violations. For all alignments, the proposed algorithm was significantly faster, ranging from 17 times to 213 times faster. For most of the alignments, the computed repair was also considerably less aggressive, preserving more information from the input alignment. Finally, the resulting repaired alignments were verified to contain no remaining conservativity violations.
However, further work may improve some aspects of the proposed algorithm. The algorithm is currently limited to alignments containing only equivalence mappings. A development allowing the algorithm to run over more complex alignments would vastly increase its utility. Further, the inclusion of an additional step (or the modification of the repair improvement step) to check if the removed equivalence mappings may be reintroduced in the alignment as subsumption mappings without introducing new violations may lead to even less aggressive repairs. Also, the inclusion of a modularization step prior to submitting the source ontologies to a reasoner might reduce the most significant bottleneck for the actual execution of the algorithm. Finally, the algorithm did not consider that a mapping between relations (properties, in OWL terminology) necessarily implies a mapping between the domain and range of such relations. A possible easy fix would be for the algorithm to include the derived domain and range mappings explicitly in the alignment, treating the removal of any such mapping as a removal of the original mapping between the relations. We note, however, that this limitation did not affect the comparison with LogMapC presented in Section 6, as evidenced by the fact that LogMapC found no remaining violations in the alignments repaired by our algorithm. This may be due to the absence of violations of this kind in the reference alignments or to redundancies in said alignments which enable the detection of such violations from other mappings.
Footnotes
Acknowledgements
This work was funded by Brazilian agencies CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico).
