Ontologies are the prime way of organizing data in the Semantic Web. Often, it is necessary to combine several, independently developed ontologies to obtain a complete representation of a domain of interest. The complementarity of existing ontologies can be leveraged by merging them. Existing approaches for ontology merging mostly implement a binary merge. However, with the growing number and size of relevant ontologies across domains, scalability becomes a central challenge. A multi-ontology merging technique offers a potential solution to this problem. We present CoMerger, a scalable method for merging multiple ontologies. It takes as input a set of source ontologies and existing mappings across them and generates a merged ontology. For efficient processing, rather than successively merging complete ontologies pairwise, we group related concepts across ontologies into partitions and merge first within and then across those partitions. In both steps, user-specified subsets of generic merge requirements (GMRs) are taken into account and used to optimize outputs. The experimental results on well-known datasets confirm the feasibility of our approach and demonstrate its superiority over binary strategies. A prototypical implementation is freely accessible through a live web portal.
Ontologies represent the semantic model of data on the web. For many use cases, individual ontologies cover just a part of the domain of interest or different ontologies exist that model the domain from different viewpoints. In both cases, by merging them, their complementarity can be leveraged and unified knowledge of a given domain can be acquired. The merge process plays an important role in multiple different aspects of the Semantic Web, such as ontology reuse (Caldarola and Rinaldi, 2016), knowledge discovery (Finke et al., 2019), and query processing (Livingston et al., 2015) to reduce development efforts, cost, and time. Moreover, merging ontologies is becoming increasingly important for applications from a wide variety of domains ranging from biomedicine (Finke et al., 2019) and food production (Dooley et al., 2018) to social networks (Priya and Cherukuri, 2019) and cultural heritage (Zalamea Patino et al., 2018), to name just a few. In order to combine two (or more) ontologies, first, concepts in these ontologies need to be mapped to each other (matching). Once these mappings have been determined, a merged ontology respecting these mappings needs to be created. In this paper, we assume that the mapping has happened already and focus on the merging phase.
Most existing ontology merging approaches (Guzmán-Arenas and Cuevas, 2010; Ju et al., 2011; Raunich and Rahm, 2014; Zhang et al., 2017; Priya and Kumar, 2019; Priya and Cherukuri, 2019) are limited to merging two ontologies at a time, due to using a binary merge strategy. A series of binary merges can be applied incrementally to merge more than two ontologies, but this approach is not sufficiently scalable and viable for a large number of ontologies (Rahm, 2016). Merging n ontologies (n > 2) in a single step, employing what is called an n-ary strategy, has only recently gained attention (Osman et al., 2021). An efficient n-ary technique that scales to merging multiple ontologies is therefore needed. Thus, we aim to investigate the extent to which the n-ary strategy can solve the scalability problem.
Let us demonstrate our motivation for performing the n-ary merge with an example and highlight the difference between the binary and n-ary approaches. Figure 1 shows five source ontologies with their correspondences depicted by dashed lines. To estimate the merge effort, we measure three operations during merging: combining the correspondence entities into an integrated entity |combine|; reconstructing the relationship for the integrated entities |reconst|; and output generation |output|. For instance, and from Fig. 1 are combined into a new integrated entity, called (see in Fig. 2) (one combine operation). Then, in the reconstruction phase, the linked relationships to and will be linked to this new integrated entity, i.e., to (here, 5 reconstruct operations). In the binary-ladder strategy (Batini et al., 1986), in the first step, and are combined into an intermediate merged ontology . Then, is merged with and so on. All intermediate ontologies and the final merged ontology are shown in Fig. 2 and the required operations are presented in Table 1. The n-ary merged ontology has the same structure as the final merged ontology of the binary-ladder merge for the given source ontologies. Note that the binary and n-ary approaches combine different numbers of entities: in the binary method, newly combined entities are created for the intermediate ontologies (see, for example, in ). These newly created entities might be combined again during the creation of the next intermediate ontology (if they have corresponding entities in the given correspondence list) (see, for example, in ). The total number of reconstructions (in the binary part) is , while the n-ary approach needs 25 operations, a 24.2% improvement. The binary merge approach needs 8 combinations, while the n-ary method needs 6, a 25% improvement in our example. The n-ary method generates output once, while the binary approach does so 4 times.
While these numbers are specific to our example, the general pattern will be the same for other examples. The achieved improvements are significant compared to binary approaches, especially when dealing with a large number of ontologies and processing large-scale ontologies.
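The percentages quoted for the running example can be checked with a few lines of arithmetic. The following is a minimal sketch; the helper name `improvement` is hypothetical, and the binary reconstruction total of 33 is not stated in the surviving text but is inferred here from the reported 25 n-ary operations and the 24.2% improvement.

```python
def improvement(binary_ops: int, nary_ops: int) -> float:
    """Relative saving of the n-ary merge over the binary merge, in percent."""
    return 100.0 * (binary_ops - nary_ops) / binary_ops


# (binary, n-ary) operation counts from the five-ontology example.
# ASSUMPTION: 33 binary reconstructions is inferred, not quoted from the text.
counts = {
    "combine": (8, 6),
    "reconst": (33, 25),
    "output": (4, 1),
}

savings = {op: round(improvement(b, n), 1) for op, (b, n) in counts.items()}
for op, saving in savings.items():
    print(f"|{op}|: {saving}% fewer operations with the n-ary merge")
```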
Figure 1: Five sample source ontologies with their correspondences depicted by dashed lines.
Figure 2: The merged ontologies in each step of the binary merge for the source ontologies from Fig. 1.
Table 1: Number of operations for merging the five sample ontologies.
To handle a large number of source ontologies, we aimed to reduce the time and operational complexity while achieving at least the same quality of the final result or even improving upon it. To apply the n-ary method efficiently to merging multiple ontologies, we use a partitioning-based method inspired by partitioning-based ontology matching systems (Aumueller et al., 2005; Hu et al., 2008; Jiménez-Ruiz et al., 2018). These systems first partition the source schemas or ontologies and then perform a partition-wise matching. The idea is to perform partitioning in such a way that every partition of one schema has to be matched with only a subset of the partitions (ideally, only with one partition) of the other schema. This results in a significant reduction of the search space and thus improved efficiency. Furthermore, matching smaller partitions reduces the memory requirements compared to matching the full schemas. Following that, in our n-ary method, CoMerger, we develop an efficient merging technique that scales to many ontologies. We show that by using a partitioning-based method, we can reduce the complexity of the search space. In our context, the search space is the set of entities and their relations that have to be evaluated for a specific merge step.
Our method takes as input a set of source ontologies and their mappings and generates a merged ontology. At first, we create blocks populated by partitions of the source ontologies. After that, the blocks are individually merged and refined. Finally, they are combined to produce the merged ontology followed by a global refinement.
All refinement steps leverage our earlier work on Generic Merge Requirements (GMRs) (Babalou and König-Ries, 2019a). These are a set of possible requirements towards the merge result that we identified in the literature. Examples include structural preservation, acyclicity, redundancy prohibition, or constraint and entailments satisfaction (see Section 2.5). In our approach, users are able to specify which subset of the GMRs should be met. The refinement step aims to meet these user-specified requirements. The refinement process is a set of operations applied to the merged result to assure the quality of the final merged ontology. We consider local and global refinements on the generated blocks and the last merged result, respectively. We provide experimental tests for merging a variety of ontologies showing the effectiveness of our approach over binary approaches.
Separation of match and merge. We explicitly separate the ontology matching and merging problems. There has been remarkable work on ontology matching as an independent problem (see (Shvaiko and Euzenat, 2011; Otero-Cerdeira et al., 2015) for surveys). The successful development of ontology matching systems (cf. the results of the Ontology Alignment Evaluation Initiative (OAEI), http://oaei.ontologymatching.org/ (Algergawy et al., 2019)) makes it more likely that not only this research but also future studies on ontology merging will use the results of these matching algorithms as input. Thus, ontology merging and matching will become more, rather than less, distinct over time. In this view, merging the ontologies is a process of creating a unified merged ontology from a set of source ontologies with a set of correspondence pairs extracted from a given mapping (Pottinger and Bernstein, 2003). In the literature, some existing ontology merging systems, such as (Ju et al., 2011; Zhang et al., 2017), generate the mapping between the source ontologies themselves. Others, such as (Raunich and Rahm, 2014), assume the mapping is given. We follow the second group. Since the accuracy of the existing ontology matching tools is relatively high (see the results of OAEI (Algergawy et al., 2019)), we build on the achievements of those systems. In CoMerger, we assume that the corresponding sets between two ontologies are known. They can be obtained from curated mappings or ontology matching tools. In the experimental section, we look at perfect and non-perfect mappings and investigate the differences, but we have not yet investigated how inconsistent mappings can be resolved beyond the repairs provided with the GMRs.
The rest of the paper is organized as follows: our proposed method is described in Section 2 followed by the experimental results in Section 3. A survey on related work is presented in Section 4 and the paper is concluded in Section 5.
Proposed method
Figure 3 provides an overview of CoMerger. The input is a set of source ontologies with their correspondence sets, extracted from given mappings, and the merged ontology is the output. In the Initialization phase, the source ontologies and the corresponding sets are processed to construct an initial merge ontology (see Section 2.2). In the Partitioning phase, the initial merge ontology, constructed upon the n source ontologies, is divided into k blocks based on structural similarity (see Section 2.3). Then, in the Combining phase, the created blocks are individually refined and finally combined into the merged ontology (see Section 2.4). In the following, we describe preliminaries and each phase in detail. We then present the algorithm of our method along with a running example.
Figure 3: The CoMerger workflow.
Preliminaries
Many of the terms related to ontology merging are used differently by different authors. In this section, we therefore provide our definition of central terms. An ontology is a formal, explicit description of a given domain (Gruber et al., 1993). It contains a set of entities including classes C, properties P, and instances I. Properties can belong to either taxonomic or non-taxonomic relations.
The ontology matching process or ontology alignment (Euzenat et al., 2013) takes a pair of source ontologies as input and produces a set of correspondences (matches) between the elements of the source ontologies. The mapping between ontologies thus includes a set of corresponding entities. Formally, we present the mapping between ontologies, adapted from (Rahm and Bernstein, 2001) as follows:
A Mapping of two ontologies and consists of a set of tuples , where and , describing the relationship between and , and c is a confidence value, usually, a real number within the interval (0, 1].
The mapping relationship can be one of equality, similarity, or subset (is-a) type. In CoMerger, we consider the similarity type with at least a given similarity value. We discard subset and other more complicated mappings.
Since existing mappings are usually generated for pairs of ontologies, we maintain a model (see Definition 2), combining the information across a group of correspondences over multiple ontologies.
A model of mappings over multiple ontologies consists of a set of correspondence sets . Each of the correspondence sets holds a set of corresponding entities between the source ontologies.
We use to denote correspondence sets. These sets contain corresponding entities. These can be either classes (in which case we write ) or properties (in which case we write ). For now, we only consider the TBox of ontologies and leave ABox assertions as future work. We use ≡ to represent that two entities are corresponding to each other. Suppose the underlying mappings show and . We combine this information into one correspondence set containing all three entities. We also denote the cardinality of each correspondence set by ; in the previous example that is . In this context, we define ontology merging (Pottinger and Bernstein, 2003) as follows:
Ontology merging is the process of creating a new merged ontology from a set of source ontologies given a set of correspondence sets extracted from their pairwise mappings with .
We assume that users will have quality requirements towards the merged ontology as specified in (Babalou and König-Ries, 2019a) and that the merging process strives to meet them (see Section 2.5).
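The way pairwise correspondences are combined into correspondence sets over multiple ontologies (Definition 2 and the three-entity example above) can be illustrated with a union-find structure. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `build_correspondence_sets`, the `min_confidence` parameter, and the entity names are hypothetical.

```python
from collections import defaultdict


def build_correspondence_sets(mappings, min_confidence=0.0):
    """Group pairwise correspondences (e1, e2, confidence) into joint sets."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for e1, e2, confidence in mappings:
        # Keep only similarity mappings that reach the given threshold.
        if confidence >= min_confidence:
            root1, root2 = find(e1), find(e2)
            if root1 != root2:
                parent[root2] = root1

    groups = defaultdict(set)
    for entity in list(parent):
        groups[find(entity)].add(entity)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical pairwise mappings between three ontologies O1, O2, O3.
mappings = [
    ("O1:Person", "O2:Human", 0.90),
    ("O2:Human", "O3:People", 0.80),
    ("O1:Paper", "O3:Article", 0.95),
    ("O1:Chair", "O2:Seat", 0.30),  # below the threshold, discarded
]
corr_sets = build_correspondence_sets(mappings, min_confidence=0.5)
cardinalities = [len(s) for s in corr_sets]
print(corr_sets, cardinalities)
```

Two pairs that share an entity end up in one set of cardinality 3, which mirrors the joint entry described in the text.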
Initialization phase
This phase takes as input a set of source ontologies with their correspondence sets and provides an initial merge ontology .
An initial merge ontology consists of the intermediate result of the merge process. It shows the merged result over the processed axioms of the source ontologies and their correspondences. This initial merge ontology gradually changes in each step toward achieving the final merged ontology.
Our initialization phase is partially similar to the preliminary process in (Raunich and Rahm, 2014) with an extension for multiple ontologies. This phase includes the following tasks:
Initial merge ontology builder: We build an initial empty merge ontology and parse the source ontologies to load sets of corresponding and non-corresponding entities into .
Correspondences processor: In this step, the correspondence sets from the given mappings are processed to build the model of mappings over multiple ontologies. If several entities from multiple source ontologies correspond to each other, one joined entry for all is created in this model.
Entity integrator: For each entry of , a new integrated entity in is created. This means the corresponding entities are combined into a new integrated entity, replacing the original entities in . If the original entities within a single set of correspondences have different labels, the newly generated integrated entity will have multiple labels or can hold all labels as a concatenation.
Translator: To construct the initial relations between the entities in , we add axioms to . To do so, we take the axioms of the source ontologies and translate the entities used there to the newly created integrated entities, if applicable (i.e. if the original entity is part of a correspondence set). If an axiom’s entities have a correspondence entity in , the axiom is translated with the generated integrated entity, i.e., the original entity will be replaced with the integrated entity in each axiom.
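The entity integrator and translator tasks listed above can be sketched as follows. The sketch assumes axioms are simple (subject, relation, object) triples and that an integrated entity is named by concatenating the labels of its members; both choices, as well as the function name `integrate_and_translate`, are illustrative assumptions rather than the tool's actual data model.

```python
def integrate_and_translate(axioms, correspondence_sets):
    """Replace corresponding entities in axioms by their integrated entity."""
    replacement = {}
    for members in correspondence_sets:
        # ASSUMPTION: the integrated entity holds all labels as a concatenation.
        integrated = "|".join(sorted(members))
        for entity in members:
            replacement[entity] = integrated

    translated, n_translated = [], 0
    for axiom in axioms:
        # Entities outside every correspondence set are left untouched.
        new_axiom = tuple(replacement.get(term, term) for term in axiom)
        if new_axiom != axiom:
            n_translated += 1
        translated.append(new_axiom)
    return translated, n_translated


# Hypothetical axioms from three source ontologies.
axioms = [
    ("O1:Author", "subClassOf", "O1:Person"),
    ("O2:Writer", "subClassOf", "O2:Human"),
    ("O3:Venue", "subClassOf", "O3:Place"),
]
translated, n_translated = integrate_and_translate(
    axioms, [["O1:Person", "O2:Human"]]
)
print(translated, n_translated)
```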
Unlike the approach in (Noy and Musen, 2003) for merging the individuals, we directly place the individuals in the final merged ontology. Because we assume that individuals are not included in the correspondences, this step includes processing the class assertion axioms in order to place (migrate) the individuals to their respective (integrated) classes. Note that this process also includes the translation of the constraint axioms. However, if the source ontologies’ constraint axioms conflict, no conflict resolution takes place in this step; it is carried out in the refinement step (see Section 2.5). Besides, for two corresponding properties, iPrompt (Noy and Musen, 2003) suggests merging their respective classes, inferring these classes as a new correspondence set. We differ from this approach in that the classes of corresponding properties are not merged if they are not included in the given corresponding sets. We restrict ourselves to the given mappings and do not infer new corresponding sets.
The initial merge ontology can be used to derive a merge result in a straightforward manner. We do not stop here, though. Rather, CoMerger differs from using alone, by its focus on applying a set of local and global refinements, including, e.g., structural preservation, acyclicity, or constraint and entailment satisfaction to achieve a quality-assured merged ontology.
Partitioning phase
Our method is based on ontology partitioning: we aim to partition n source ontologies into k blocks. This differs from module extraction, where, for a given ontology and a set of vocabularies (terms) from the ontology, a mechanism returns a module that is supposed to be the relevant part of the original ontology covering the given vocabularies (see (d’Aquin et al., 2007) for a comparison).
To partition the source ontologies, we use a set of pivot classes . This is inspired by the work in (Deelers and Auwatanamongkol, 2007), where a set of predetermined points has been successfully used in the partitioning method. The partitioning process generates ontologies’ blocks, with the following definition:
A block is a non-empty subset (or whole) of one (or more) source ontologies whose axioms belong to a subset of axioms from source ontologies, and every entity from is at most in one of these subsets.
The number of blocks is denoted by k and is the set of all blocks. In this regard, the ontology merging task is decomposed into a block merging task , where .
Each partitioning method has an objective function, or objective goals and criteria, on which it acts. In software engineering, the notions of cohesion and coupling have been associated with different aspects of software quality related to good modularisation of software (see (Paixao et al., 2017) for a survey). There, cohesion represents the number of dependencies of elements within the same module, while coupling is determined by counting the number of dependencies of elements within a module on elements outside of that module. The aim is to reach high cohesion and low coupling values. We have adapted this notion for our context by analyzing similarity values provided by the mappings: The more similar elements within a block are, the higher its cohesion. The less similar elements within a block are to elements outside of it, the lower its coupling. Similar to software module metrics, ontology module metrics are designed to quantify ontology modules’ properties. Thus, the objective of the partitioning phase in CoMerger is to maximize intra-block similarity (cohesion) and minimize inter-block similarity (coupling). This indicates that entities within one block are close to each other in terms of structure, while the entities of different blocks are distant from each other. For each block, a sub-ontology will be created in the intra-combination phase in Section 2.4. Then, intra- and inter-similarity can be measured on the sub-ontologies’ level. We design our partitioning objective function according to this general goal.
In the following, we discuss our approach to finding pivot classes and the partitioning method.
Finding pivot classes
Our method is based on measuring a value for the classes to find . Classes with high values (where is a correspondence set) show high overlap within . Putting them in one block can increase intra-block similarity and achieve our partitioning objective. However, contemplating only this metric tends to choose isolated classes as the pivot classes. To overcome this drawback, the number of connections of each class is taken into account, too. Thus, the largest sets of correspondence classes in corresponding sets that also have a high number of connections are very promising to be considered as . We calculate a quality degree of each class based on the connectivity degree and the cardinality of correspondence classes as given in Equation (1). Thus, is achieved by a sorted list of corresponding sets based on their quality degrees.
The connectivity degree of a class is indicated by the number of associated taxonomic (subClassOf) and non-taxonomic relations for the class, and , respectively. For instance, Fig. 4 shows the taxonomic and non-taxonomic relations for the Reviewer class of the cmt ontology from the Conference track. The Reviewer class has 3 taxonomic and 13 non-taxonomic relations. This example demonstrates how a class can be augmented with several relationship types. One can assign different weights for taxonomic or non-taxonomic relations based on the source ontologies’ structure or the user preference. For ontologies with a large number of non-taxonomic relations, when the same weight is assigned to the taxonomic and non-taxonomic relations, the classes with fewer taxonomic relations would be selected as the pivot classes. As a result, blocks with smaller sizes will be generated. We calculate the connectivity degree of a class with user pre-determined weights ( and ), as given by Equation (2) (see Section 3.2.1 for a discussion on how to set the weights).
Figure 4: Taxonomic and non-taxonomic relations for the Reviewer class of cmt ontology.
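The effect of the two weights can be illustrated with the Reviewer class. The sketch below assumes that the connectivity degree is a weighted sum of the two relation counts; the exact form of Equation (2) is not reproduced here, and the function and parameter names are hypothetical.

```python
def connectivity_degree(n_taxonomic, n_non_taxonomic, w_tax=1.0, w_non_tax=1.0):
    """Weighted count of a class's relations.

    ASSUMPTION: a plain weighted sum; the paper's Equation (2) may differ.
    """
    return w_tax * n_taxonomic + w_non_tax * n_non_taxonomic


# Reviewer class of the cmt ontology: 3 taxonomic, 13 non-taxonomic relations.
equal_weights = connectivity_degree(3, 13)
favor_taxonomy = connectivity_degree(3, 13, w_tax=1.0, w_non_tax=0.5)
print(equal_weights, favor_taxonomy)
```

With equal weights the 13 non-taxonomic relations dominate the score; lowering their weight shifts the ranking toward classes that are well connected in the hierarchy.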
Partitioner: A structure driven strategy
This step divides all classes from into a set of blocks . For this purpose, we follow a structural-based similarity, in which classes close in the hierarchy are strongly related and should be placed in the same block. Thus, once a class is assigned to a block, the transitive closure of its adjacent classes (on the hierarchy levels of the respective source ontology) consequently will be added. In this regard, the first block is created by the element of which has the highest quality degree. For each correspondence class , where , all classes of , i.e., until with all their adjacent classes on their respective ontologies are added to the block. Then, the next element of is selected to create a new block, if at least one of its classes has not been assigned to the previous blocks. This process is continued until all elements of are processed. Following this process, the overall number of blocks is automatically determined based on the ratio of the number of ’s elements and amount of overlap between ’s elements.
The partitioning process is restricted by two assumptions: (1) if the taxonomy relation of ’s element is null, no block will be created for it. This prevents the creation of very small blocks; (2) if classes are left that do not belong to any block, they will not be added to any, since they do not require any refinement in the block. Therefore, these unconnected classes will be added directly to . Overall, our proposed strategy has two advantages: First, it has low computational complexity since it does not need to run a similarity membership function, and it scales well to a large number of ontologies with many classes. The partitioning process is accelerated by using the pivot classes. Second, it uses the structural similarity between classes by considering the adjacency relationship between classes. Thus, it increases the intra-block similarity (in terms of hierarchical structure) and significantly reduces the inter-block similarity.
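The structure-driven partitioner and its two restrictions can be sketched as a breadth-first expansion from the ranked pivot sets. This is an illustration under assumptions: the hierarchy is given as a symmetric adjacency dictionary, the pivot sets are already sorted by descending quality degree, and all names (`partition`, `adjacency`, `leftovers`) are hypothetical.

```python
from collections import deque


def partition(pivot_sets, adjacency, classes):
    """Create disjoint blocks by expanding each pivot set along the hierarchy."""
    assigned, blocks = set(), []
    for pivot in pivot_sets:
        # Restriction (1): a pivot without taxonomic neighbours creates no block.
        if not any(adjacency.get(cls) for cls in pivot):
            continue
        # A new block is opened only if some pivot class is still unassigned.
        seeds = [cls for cls in sorted(pivot) if cls not in assigned]
        if not seeds:
            continue
        block, queue = set(), deque(seeds)
        while queue:  # transitive closure over adjacent classes
            cls = queue.popleft()
            if cls in assigned:
                continue
            assigned.add(cls)
            block.add(cls)
            queue.extend(sorted(adjacency.get(cls, ())))
        blocks.append(block)
    # Restriction (2): unconnected classes stay outside of all blocks.
    leftovers = set(classes) - assigned
    return blocks, leftovers


adjacency = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B"},
    "X": {"Y"}, "Y": {"X"},
    "Z": set(),
}
pivot_sets = [{"B"}, {"C", "X"}, {"Z"}]
blocks, leftovers = partition(pivot_sets, adjacency, classes="ABCXYZW")
print(blocks, leftovers)
```

The number of blocks is not a parameter; it follows from the pivot list and from how much the pivot sets overlap, as described above.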
Combining phase
In this phase, the created blocks are combined to generate the final merged ontology. To achieve that, we split the combining process into two steps:
Intra-combination: Independent merge
In this step, all blocks are processed to be merged and refined. Merging the smaller number of blocks reduces the memory allocations compared to merging all source ontologies. This results in a significant reduction of the search space and thus improves efficiency. To further improve performance, the block merging may be performed in parallel. Intra-combination parallelization enables the parallel execution of independently executable merges to use multiple processors for faster processing. Thus, the entities inside the blocks are combined to create local sub-ontologies. This step is required to assign the properties for each created block.
In the previous step, all the classes have been divided into disjoint blocks. However, these blocks cannot be directly used because the property axioms, which connect these classes, are missing. Thus, we need to add the properties of the classes to their respective blocks and construct the relationships between classes. We retrieve all axioms from , which already contains the translated correspondence properties. Thus, each class is augmented by the original or translated properties axioms, including all taxonomy and non-taxonomy relations. So, the taxonomy relations between classes as well as non-taxonomic relations are built for each block. A very simple and yet effective approach is to assign each axiom to a block in which all its entities are contained. To keep the blocks disjoint, each axiom belongs to at most one block. If the classes of an axiom are distributed across multiple blocks, they are not added to any block and are marked as distributed axioms () (see Section 2.7 for an example). Their inclusion is delayed until the next step.
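The axiom assignment rule can be sketched in a few lines. The sketch assumes each axiom is given as an identifier together with the set of classes it uses; axioms that mention a class outside every block are set aside, which is an assumption because the text does not cover this case. All names are hypothetical.

```python
def assign_axioms(blocks, axioms):
    """Assign each axiom to the single block containing all of its classes."""
    block_of = {cls: i for i, block in enumerate(blocks) for cls in block}
    per_block = [[] for _ in blocks]
    distributed, unassigned = {}, []
    for axiom_id, classes in axioms:
        owners = {block_of[c] for c in classes if c in block_of}
        all_covered = all(c in block_of for c in classes)
        if len(owners) == 1 and all_covered:
            per_block[next(iter(owners))].append(axiom_id)
        elif len(owners) >= 2:
            # Classes spread over several blocks: delay until inter-combination.
            distributed[axiom_id] = frozenset(owners)
        else:
            unassigned.append(axiom_id)  # ASSUMPTION: handled outside the blocks
    return per_block, distributed, unassigned


blocks = [{"A", "B"}, {"X", "Y"}]
axioms = [
    ("ax1", {"A", "B"}),
    ("ax2", {"X"}),
    ("ax3", {"B", "X"}),
    ("ax4", {"W"}),
]
per_block, distributed, unassigned = assign_axioms(blocks, axioms)
print(per_block, distributed, unassigned)
```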
After that, a local refinement process takes place for each sub-ontology. Through our tool, in addition to the final merged ontology, users access the k created local sub-ontologies separately. The usage of sub-ontologies rather than source ontologies has the advantage that the created sub-ontologies concisely contain knowledge about a sub-domain (w.r.t. the knowledge provided by the source ontologies) as they include all similar entities. An additional advantage is that maintaining the source ontologies while keeping the existing mapping between them requires much more effort than keeping the k local sub-ontologies (for which the existing mappings are gathered in one place with limited numbers of mappings between them). So, when local refinements are applied to them, their quality is higher.
Inter-combination: Dependent merge
In the inter-combination step, described below, the global merged ontology is constructed based on the k created local sub-ontologies. For this to be achieved, we follow a sequential merge processing in this step based on the inter-block relatedness, which represents how much two blocks differ from each other. Thus, we calculate the number of shared distributed axioms between two blocks and of to indicate the inter-block relatedness, as shown in Equation (3), where show the distributed axioms of block .
At first, the two blocks and with the highest inter-block relatedness value are merged into . This includes adding all distributed axioms to them. Then, the next block, which has the highest inter-block relatedness value with the recently merged block , will be merged. After merging blocks, the number of distributed axioms between the recently merged block and the remaining blocks is updated. This ordering ensures that the most similar blocks are processed first and less similar blocks at later steps. Here, we follow the approach taken by COMA (Aumueller et al., 2005), which also matches only similar blocks and delays the processing of dissimilar ones. The sequential execution of merging continues until all blocks are processed. While two blocks are by nature disjoint and share no classes, their inter-block relatedness is not necessarily zero, since this metric is based on the number of distributed axioms between them. If the inter-block relatedness between two blocks is zero, they are entirely disjoint and do not need any merge process. Thus, they are imported directly into the .
A set of global refinements is applied once all blocks have been combined. After that, in the last step, the merged ontology is built. The reason for merging the most similar blocks first is that they share many more distributed axioms. Combining these blocks in the earlier steps, while the blocks are still small, is more efficient. Note that, in each sequential merge, the intermediate merged block gets larger, so it pays off to leave less processing for the later steps, when the blocks have grown.
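The greedy ordering of the inter-combination step can be sketched as follows. The relatedness of two blocks is taken to be the number of distributed axioms touching both, in line with the description of Equation (3); the data layout (axiom identifier mapped to the set of block indices it touches) and the function name `merge_order` are assumptions made for illustration.

```python
from itertools import combinations


def merge_order(n_blocks, distributed):
    """Return (sequential merge order, blocks that can be imported directly)."""

    def shared(group, block):
        # Distributed axioms linking `block` with any block already in `group`.
        return sum(
            1 for owners in distributed.values()
            if block in owners and owners & group
        )

    if n_blocks < 2:
        return list(range(n_blocks)), []
    first, second = max(
        combinations(range(n_blocks), 2),
        key=lambda pair: shared({pair[0]}, pair[1]),
    )
    if shared({first}, second) == 0:
        return [], list(range(n_blocks))  # no block is related to any other
    merged, order = {first, second}, [first, second]
    remaining = set(range(n_blocks)) - merged
    while remaining:
        best = max(sorted(remaining), key=lambda b: shared(merged, b))
        if shared(merged, best) == 0:
            break  # the rest is unrelated and needs no merge process
        order.append(best)
        merged.add(best)
        remaining.remove(best)
    return order, sorted(remaining)


distributed = {
    "ax1": frozenset({0, 1}),
    "ax2": frozenset({0, 1}),
    "ax3": frozenset({1, 2}),
    "ax4": frozenset({0, 2}),
}
order, imported_directly = merge_order(4, distributed)
print(order, imported_directly)
```

Blocks 0 and 1 share two distributed axioms and are merged first; block 2 follows, and block 3, which shares none, is imported as is.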
Refinements via Generic Merge Requirements (GMRs)
Generic Merge Requirements (GMRs) were first introduced in the Vanilla system (Pottinger and Bernstein, 2003). They are a set of requirements that the merged ontology is expected to fulfill. Later, other merge approaches implicitly or explicitly took them into consideration (cf. (Noy and Musen, 2003; Saleem et al., 2008; Jiménez-Ruiz et al., 2009; Ju et al., 2011; Raunich and Rahm, 2014; Mahfoudh et al., 2014; Priya and Kumar, 2019; Priya and Cherukuri, 2019)). To provide customizable GMRs through CoMerger, we surveyed the literature to compile a list of GMRs. See (Babalou and König-Ries, 2019a; Babalou et al., 2020b) for a more detailed explanation of the process and the identified GMRs. Details necessary to understand the evaluation of our approach are provided in Section 3.3.1. We extracted GMRs by studying three different research fields: (1) ontology and model merging methods, (2) ontology merging benchmarks, and (3) the ontology engineering domain. This investigation led to twenty GMRs, summarized in Table 2. For instance, R1 – class preservation means that all classes in source ontologies should be preserved in the merged ontology; R8 – class redundancy prohibition emphasizes that no redundant classes should exist in the merged ontology; and R16 – acyclicity in the class hierarchy means that the merge process should not produce a cycle in the class hierarchy. In (Babalou and König-Ries, 2019a), we detail how the different GMRs can be achieved, how we implement them in CoMerger, and our approach to dealing with conflicting requirements. For instance, to achieve R1, we add the missing classes to the merged ontology.
To apply local and global refinement within CoMerger, we use GMRs. In CoMerger, users can select a subset of GMRs that the merged ontology should fulfill according to their requirements (for details of applying GMRs, see http://comerger.uni-jena.de/requirement.jsp). With this, CoMerger provides a flexible parameterizable merge method. In the local and global refinement phases, changes are applied to the (intermediate) merge results to ensure that the chosen requirements are met. Moreover, users can easily adjust this framework so that different refinements are performed in the intra- and inter-combination steps.
Table 2: Generic Merge Requirements (GMRs) (Babalou and König-Ries, 2019a; Babalou et al., 2020b)
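How a user-selected subset of GMRs might be checked on a merge result can be sketched for two of the requirements named above, R1 (class preservation) and R16 (acyclicity in the class hierarchy). This is a simplified illustration, not the tool's implementation: the hierarchy is a dictionary from a class to its parents, and the names `check_gmrs`, `expected_classes`, and `sub_class_of` are hypothetical.

```python
def has_cycle(sub_class_of):
    """Detect a cycle in the class hierarchy with a depth-first search."""
    in_progress, done = set(), set()

    def visit(node):
        in_progress.add(node)
        for parent in sub_class_of.get(node, ()):
            if parent in in_progress:
                return True  # back edge: the hierarchy loops
            if parent not in done and visit(parent):
                return True
        in_progress.discard(node)
        done.add(node)
        return False

    return any(n not in done and visit(n) for n in list(sub_class_of))


def check_gmrs(selected, expected_classes, merged_classes, sub_class_of):
    """Report, per selected GMR, whether the merged ontology satisfies it."""
    checks = {
        "R1": lambda: expected_classes <= merged_classes,  # class preservation
        "R16": lambda: not has_cycle(sub_class_of),        # acyclic hierarchy
    }
    return {gmr: checks[gmr]() for gmr in selected if gmr in checks}


expected = {"Person", "Author", "Paper"}
merged = {"Person", "Author"}
hierarchy = {"Author": {"Person"}, "Person": {"Author"}}  # faulty: a cycle

report = check_gmrs(["R1", "R16"], expected, merged, hierarchy)
repaired_classes = merged | (expected - merged)  # R1 repair: add missing classes
print(report, repaired_classes)
```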
N-ary merge algorithm
Algorithm 1 describes the proposed n-ary merge method. The algorithm accepts a set of source ontologies with the respective corresponding sets and generates a merged ontology .
First, an empty initial merged model is built (line 1). The source ontologies are parsed into and the map model is built (lines 2–3). The corresponding entities from are integrated in and the axioms are translated (lines 4–5). In the partitioning step, first, the pivot sets are detected (line 6). Then, the initial merge ontology is divided based on to create a set of blocks (line 7). In the combining phase, first, the properties are assigned to blocks (line 8) and the intra-combination process is applied to create k sub-ontologies from the k blocks. This process is followed by applying local refinements (lines 9–10). After that, the created sub-ontologies are combined in the inter-combination step, followed by applying the global refinements to create the merged ontology (lines 11–12). Finally, the merged ontology is returned to the user (line 13).
Example
Figure 5 shows three sample ontologies. The first ontology has 22 axioms, 8 classes, and 3 properties. The second ontology has 27 axioms, 8 classes, and 5 properties. The third ontology has 47 axioms, 13 classes, and 8 properties. The correspondences between the source ontologies (for classes and properties) are shown in Table 3. In this section, we detail the process of merging the given source ontologies with their respective corresponding sets. Each step below corresponds to the line of Algorithm 1 with the same number.
Step 1: An empty initial merged model is built.
The n-ary merge algorithm for multiple ontologies in oerger
Three sample source ontologies.
Corresponding pairs between the source ontologies given in Fig. 5
The initial merge ontology for the given source ontologies.
Step 2: The axioms of all three source ontologies are imported into the initial merged model. At this stage, it contains all 22 axioms of the first ontology, 25 axioms of the second, and 46 axioms of the third. Taken together, the source ontologies have 19 taxonomic and 32 non-taxonomic relations.
Step 3: We have 9 pairs of corresponding entities (Table 3). The map model is built with 8 elements, since two of the pairs are considered as one joint entry in the map model. The elements of the map model are listed in Table 4.
Step 4: The corresponding entities are integrated. Thus, the initial merged model includes 8 new integrated entities, one per element of the map model. The original entities are deleted from the initial merged model. The maximum cardinality is 3 here.
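The grouping of correspondence pairs into map-model elements (Steps 3 and 4) can be sketched with a union-find structure. This is a minimal sketch under the assumption that two pairs are joined whenever they share an entity; the function `build_map_model` and all entity names are hypothetical and are not taken from oerger.

```python
from collections import defaultdict


def build_map_model(pairs):
    """Group correspondence pairs into joint entries (connected components)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for entity in parent:
        groups[find(entity)].add(entity)
    return list(groups.values())


# Hypothetical pairs; entities are written as "ontology:name".
pairs = [
    ("O1:Paper", "O2:Article"),
    ("O2:Article", "O3:Contribution"),  # shares O2:Article with the pair above
    ("O1:Author", "O3:Writer"),
]
map_model = build_map_model(pairs)
max_cardinality = max(len(element) for element in map_model)
print(len(map_model), max_cardinality)
```

In this toy input, three pairs collapse into two elements, and the joint entry has cardinality 3, which mirrors how 9 pairs yield 8 elements in the running example.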
Step 5: All axioms from the source ontologies (existing in the initial merged model) are translated if their entities exist in the map model. For instance, an axiom that mentions a source entity is rewritten to mention the corresponding integrated entity. Figure 6 shows the translated axioms in the initial merged model (22 classes, 15 properties). In total, 36 of the 93 axioms in the initial merged model (38.71%) are translated. This share is substantial because the overlap between the source ontologies (the ratio of corresponding entities to total entities) is relatively high (24.14%). Note that nothing has to be done for those entities (and their axioms) that do not occur in the map model. This is true for all entities that occur in one ontology only and have no counterparts in any of the others.
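The translation in Step 5 can be illustrated as follows. This is a minimal sketch, assuming that axioms are represented as plain tuples and that a dictionary maps each source entity to its integrated entity; the representation, the function `translate_axioms`, and the entity names are hypothetical. The last lines recompute the share of translated axioms reported for the running example.

```python
def translate_axioms(axioms, entity_to_integrated):
    """Rewrite each axiom so that mapped entities point to their integrated entity."""
    translated, count = [], 0
    for axiom in axioms:
        new_axiom = tuple(entity_to_integrated.get(term, term) for term in axiom)
        if new_axiom != axiom:
            count += 1  # at least one entity of this axiom was replaced
        translated.append(new_axiom)
    return translated, count


# Hypothetical mapping and axioms.
entity_to_integrated = {"O1:Paper": "M:Paper", "O2:Article": "M:Paper"}
axioms = [
    ("O1:Paper", "subClassOf", "O1:Document"),
    ("O2:Article", "subClassOf", "O2:Submission"),
    ("O1:Review", "subClassOf", "O1:Document"),  # untouched: no mapped entity
]
translated, count = translate_axioms(axioms, entity_to_integrated)
print(translated[0], count)

# Share of translated axioms in the running example: 36 of 93 axioms.
share = round(100 * 36 / 93, 2)
print(share)
```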
A map model and its elements with their quality degree
Step 6: To find the pivot classes, we measure the quality degree of each element of the map model. The cardinality Card, the connectivity Conn, and the quality degree of each corresponding set are shown in Table 4. Based on the quality degree, the pivot class sets are ordered from the highest to the lowest quality degree.
Step 7: The axioms of the initial merged model should be divided into different blocks. To this end, the first element of the ordered pivot list, which has the highest quality degree, is selected and placed in the first block. All is-a connected entities of this element are added to the same block (see the first block in Fig. 7). For the next element in the pivot list, the entities have not yet been added to the previous block, so a new block is generated for it. Thus, this element and its connected entities form the second block (see the second block in Fig. 7). The remaining elements of the pivot list have already been assigned to a block, so no new block is generated. In total, two blocks with 10 and 8 classes, respectively, are created. Note that one class is unconnected at the class-hierarchy level and is therefore not added to any block. Moreover, although Acceptance and Rejection have an is-a connection to Decision, Decision is not connected to any other class, so these three classes cannot be added to any block.
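The block construction in Step 7 can be sketched as a traversal of the is-a graph. This is a minimal sketch, assuming that "is-a connected" means reachable over is-a edges in either direction; the function `partition` and the class names other than Acceptance, Rejection, and Decision are hypothetical.

```python
from collections import defaultdict, deque


def partition(ordered_pivots, is_a_edges):
    """Create one block per pivot that is not already covered by an earlier block."""
    neighbours = defaultdict(set)
    for child, parent in is_a_edges:
        neighbours[child].add(parent)
        neighbours[parent].add(child)

    assigned, blocks = set(), []
    for pivot in ordered_pivots:  # highest quality degree first
        if pivot in assigned:
            continue  # already part of an earlier block: no new block
        block, queue = {pivot}, deque([pivot])
        while queue:
            current = queue.popleft()
            for other in neighbours[current] - assigned - block:
                block.add(other)
                queue.append(other)
        assigned |= block
        blocks.append(block)
    return blocks


is_a_edges = [
    ("B", "A"), ("C", "A"),  # hierarchy around pivot A
    ("E", "D"),              # hierarchy around pivot D
    ("Acceptance", "Decision"), ("Rejection", "Decision"),  # no pivot here
]
blocks = partition(["A", "D", "C"], is_a_edges)
print(blocks)
```

Pivot C creates no block because it was already absorbed by the block of A, and the Decision hierarchy stays outside all blocks because it contains no pivot, as in the running example.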
Step 8: Each block is augmented with the properties. Figure 8 shows the blocks with their properties. Some properties (see Fig. 8) are marked as distributed properties and are not added to any block. There are no is-a distributed axioms between the two blocks. The two is-a axioms of Decision are marked as unconnected distributed axioms. There are only 3 non-taxonomic distributed axioms.
Step 9: Two sub-ontologies are built, one for each of the two blocks. The output is generated in the user-selected format.
Generating two blocks for the given source ontologies.
Augmenting the blocks with non-taxonomy relations.
Step 10: Both sub-ontologies are checked for the refinements based on the user-selected GMRs. For instance, let us suppose the user selects R1, R15, and R16. In all blocks, these three GMRs are satisfied, so no more refinement takes place.
Step 11: All refined sub-ontologies are merged to create the merged ontology. The inter-relatedness degree between the two blocks is 3. In this step, the distributed axioms are added to the merged ontology. The classes that could not be added to any block in Step 7 are now added to the merged ontology as well.
Step 12: The merged ontology is checked for the global refinements. R1 and R16 are fulfilled in the merged ontology. However, R15 is not satisfied: one property has multiple domains, so the oneness refinement R15 is applied. As one possible solution (see (Babalou et al., 2020b) for more detail), we create a new class as the union of all domains of this property. Among the existing solutions, we select this one (for more detail, see (Babalou and König-Ries, 2019a)); for conflict management when applying different GMRs, see (Babalou et al., 2020b). We then add this new single class as the domain of the property. In total, one global refinement action and no local refinement action has been performed here.
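The oneness refinement of Step 12 can be sketched as follows. This is a minimal sketch, assuming that property domains are kept in a dictionary of sets; the function `apply_oneness`, the `UnionOf_` naming scheme, and the property and class names are hypothetical and do not reflect how oerger represents union classes.

```python
def apply_oneness(property_domains):
    """Replace multiple domains of a property by one union class (R15)."""
    refined, union_classes, actions = {}, {}, 0
    for prop, domains in property_domains.items():
        if len(domains) > 1:
            members = sorted(domains)
            union_name = "UnionOf_" + "_".join(members)
            union_classes[union_name] = members  # new class = union of domains
            refined[prop] = {union_name}
            actions += 1  # one refinement action per repaired property
        else:
            refined[prop] = set(domains)
    return refined, union_classes, actions


property_domains = {
    "writes": {"Author", "Reviewer"},  # violates R15: two domains
    "hasTitle": {"Paper"},             # already satisfies R15
}
refined, union_classes, actions = apply_oneness(property_domains)
print(refined["writes"], actions)
```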
Step 13: The final merged ontology is saved in the user-selected output format and returned to the user. It has 23 classes, 11 properties, and 93 axioms.
Experimental evaluation
To validate the applicability of the proposed approach, we conducted a series of experiments on different sets of ontologies to analyze quality, runtime, and complexity (see Section 3.3). The proposed approach has been implemented in oerger (Babalou et al., 2020a), a tool that is publicly available at http://comerger.uni-jena.de/ and distributed under an open-source license along with the merged ontologies. oerger allows the user to load the source ontologies in OWL format. The mapping between the source ontologies can be determined automatically by an embedded matcher, or it can be provided by the user in RDF format. The merged ontology created by oerger can be in RDF/XML or OWL/XML format.
In the following sections, we present the datasets used, describe experimental environments, and report on the results.
Datasets
To evaluate the general applicability of our approach, we aimed to use a wide variety of datasets in our experiments, both in terms of subject and in terms of size. We selected sets of ontologies (see https://bioportal.bioontology.org/; accessed on 01.10.2019) in the domains of biomedicine and health, the union of both, as well as a combination of several subdomains (see Table 5). Our datasets include ontologies with different numbers of axioms and different numbers of source ontologies n. We conducted our tests with two different types of correspondences: (i) a perfect mapping from the OAEI benchmark and BioPortal’s mappings, and (ii) an imperfect mapping produced by an ontology matching system, SeeCOnt (Algergawy et al., 2015). While the first shows the general potential of the approach, the latter shows its applicability in a realistic setting where typically no perfect mapping is available. The perfect mappings have confidence value 1, while the imperfect mappings include correspondences whose confidence value is larger than a given threshold. For our test, we used the confidence value from the original publication. Note that these mappings only map classes to classes and properties to properties.
Dataset statistics. Number n of source ontologies with their axiom counts
Test setting
All the experiments were carried out on an Intel Core i7 with 12 GB internal memory on Windows 10 with Java compiler 1.8. In this section, we present: (1) which values have been set for the parameters, (2) how we built the binary merged ontologies, (3) how we created different versions of the merged ontologies, and (4) which refinements we used in creating the merged ontologies.
Adjusting parameters
In our experiments, the taxonomic and non-taxonomic weights were empirically set to 0.75 and 0.25, respectively, but we make no claim that these values are optimal. The reason for choosing these values is that the tested ontologies contain a large number of non-taxonomic relations (see Table 6). If non-taxonomic relations were weighted as heavily as taxonomic relations, classes with fewer taxonomic relations would be selected as pivot classes. Note that assigning classes to blocks is carried out based on the taxonomic relations. Hence, when a class with few taxonomic relations is the pivot of a block, only a few classes are added to that block. As a result, the blocks become smaller, and the number of created blocks grows and may even exceed the number of source ontologies. Thus, we assign a higher weight to taxonomic relations.
We also tested all values of the taxonomic and non-taxonomic weights in the range [0,1] with a step of 0.25. The effect of the different weights on the number of blocks is shown in Fig. 9. The datasets that are not shown achieved the same number of blocks for all weight settings. The number of blocks depends on the characteristics of the ontologies. Thus, in Table 6, we show the number of taxonomic and non-taxonomic relations of the merged ontologies in each dataset. Given these characteristics, the number of blocks is acceptable.
Effect of taxonomic weight and non-taxonomic weight on the number of blocks k.
Number of taxonomic and non-taxonomic relations for the datasets
Adjusting binary methods
Binary strategies allow the merging of two ontologies at a time (Batini et al., 1986). They are called ladder strategies when a new ontology is integrated with an existing intermediate result at each step. A binary strategy is balanced when the ontologies are divided into pairs at the start and are integrated in a symmetric fashion. Let us consider an example with four source ontologies.
In the balanced-binary merge, the first and second ontologies are merged first, which results in an intermediate merged ontology. Then the third and fourth ontologies are merged, which creates a second intermediate merged ontology. After that, the two intermediate results are merged to produce the final result.
In the ladder-binary merge, the first and second ontologies are merged first. Then, the intermediate result is merged with the third ontology, which creates a new intermediate result. After that, this result is merged with the fourth ontology to create the final result.
For our evaluation we implemented these binary approaches, following the mentioned procedures.
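The two binary strategies differ only in the order of the pairwise merges, which the following sketch makes explicit. It is a minimal illustration that records the merge schedule as strings instead of merging real ontologies; the function names and the labels O1 to O4 are hypothetical.

```python
def ladder_schedule(ontologies):
    """Merge the running result with one further ontology at each step."""
    steps, result = [], ontologies[0]
    for nxt in ontologies[1:]:
        merged = f"({result}+{nxt})"
        steps.append((result, nxt, merged))
        result = merged
    return steps


def balanced_schedule(ontologies):
    """Merge neighbouring pairs level by level until one result remains."""
    steps, level = [], list(ontologies)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            merged = f"({level[i]}+{level[i + 1]})"
            steps.append((level[i], level[i + 1], merged))
            next_level.append(merged)
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # odd one out moves up unmerged
        level = next_level
    return steps


sources = ["O1", "O2", "O3", "O4"]
ladder = ladder_schedule(sources)
balanced = balanced_schedule(sources)
print(ladder[-1][2])
print(balanced[-1][2])
print(len(ladder), len(balanced))
```

Both schedules need n−1 binary merge runs for n source ontologies, since every binary merge reduces the number of remaining ontologies by one.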
Building different versions of merged ontologies
We evaluated our approach under different conditions (see Table 7):
Using the perfect mapping versus an imperfect mapping
Applying (✓) the local refinement process or not (×)
Applying (✓) global refinements or not (×)
We follow these conditions for the n-ary, balanced, and ladder merge strategies. Thus, we generated eight versions of the merged ontology using the n-ary method, four versions using the binary balanced method, and four versions using the binary ladder method. For the binary strategies, we consider only the imperfect mapping, since at each step of the merge process the mapping between the created intermediate merged result and one of the source ontologies is generated on the fly with the ontology matching tool.
Adjusting refinements
To apply local and global refinements, we select a subset of GMRs (R1–R3, R7, R15, R16, R19) from (Babalou and König-Ries, 2019a). R1, R2, R3, and R7 are related to class, property, instance, and structure preservation, respectively. R15 requires that properties have no multiple domains or ranges, the so-called oneness characteristic. R16 concerns class acyclicity, and R19 expresses the degree of connectivity in the merged ontology. We use these criteria to observe how well the merged ontologies are structured. The remaining GMRs have no notable effect on our datasets, so we do not present them here.
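Checks for R16 and R19 can be sketched on a simple class hierarchy. This is a minimal sketch, assuming that the hierarchy is given as a dictionary from each class to the set of its direct superclasses; the functions `has_cycle` and `unconnected_classes` and the class names are hypothetical, and the sketch only detects violations without repairing them.

```python
def has_cycle(subclass_of):
    """Return True if the class hierarchy (child -> parents) contains a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(node):
        colour[node] = GREY  # node is on the current path
        for parent in subclass_of.get(node, ()):
            state = colour.get(parent, WHITE)
            if state == GREY:
                return True  # back edge: the path returns to itself
            if state == WHITE and visit(parent):
                return True
        colour[node] = BLACK  # fully explored, no cycle through this node
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in subclass_of)


def unconnected_classes(classes, subclass_of):
    """Return the classes that take part in no is-a relation at all."""
    connected = set()
    for child, parents in subclass_of.items():
        if parents:
            connected.add(child)
            connected.update(parents)
    return set(classes) - connected


cyclic = {"A": {"B"}, "B": {"C"}, "C": {"A"}}
acyclic = {"A": {"B"}, "B": {"C"}}
print(has_cycle(cyclic), has_cycle(acyclic))
print(unconnected_classes({"A", "B", "C", "D"}, acyclic))
```

The depth-first search uses recursion, which is adequate for small hierarchies such as local sub-ontologies; very deep hierarchies would need an iterative variant.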
The settings for generating twelve variants of the merged ontologies
Experimental results
In the first test, we observe the characteristics of the n-ary merged ontologies. In the second test, we analyze the constructed logic of the merged ontology by answering a group of Competency Questions (CQs). The third test compares the binary and n-ary merge methods. In the last test, we compare our approach with an existing n-ary one. For an inconsistency test (related to the model’s entailment), we refer to our previous work (Babalou and König-Ries, 2019b). To improve readability in Fig. 10, Fig. 11, and Table 9, we show the results for some of the datasets only. Our discussion, however, takes the full set of experiments into account. The full results are available in our repository.
Class C, property P, and instance I coverage with the number of unpreserved structure of eight versions of n-ary merge for 3 sample datasets.
Number of local and global refinements, oneness , unconnected classes and cycle of eight versions of n-ary merge.
Characteristics of the N-ary merged result
To evaluate the characteristics of the created merged ontology, we use three evaluation criteria categories:
Integrity: in (Duchateau and Bellahsene, 2010), the integrity of a merged ontology is defined as its compactness, completeness, and redundancy. Compactness represents the size of the merge result (it is not presented in this paper, but is available in our repository and in the appendix). Coverage (also called completeness) is the percentage of entities present in the source ontologies that are included in the merged ontology. This includes the coverage of classes C, properties P, and instances I, as well as the structural coverage. The latter refers to preserving the structure of the source ontologies in the merged ontology. These metrics are related to the evaluation of R1–R3 and R7 from (Babalou and König-Ries, 2019a). Note that during the merge process, classes, properties, and instances may be deleted or added to meet GMRs. Redundancy checks whether redundant entities appear in the merged ontology. Since we found no redundant entities in any of the created versions of the merged ontologies, we do not include this metric in the results.
There is a difference between duplicated and redundant entities. Duplicated entities are entities with the same name and characteristics that occur more than once in an ontology. For example, class Paper appears twice in the ontology. In the implementation of oerger, the classes of an OWL ontology are stored in a set, which by nature contains no repeated elements. Thus, there are no duplicated entities in any of the tested ontologies. Redundant, on the other hand, means that two entities (possibly with different names) refer to the same real-world entity. For example, the classes Abstract and PaperAbstract refer to one real-world entity, so they are the same entity under different names. If an ontology contains both classes, it has redundancy. One way to detect entities that refer to the same real-world entity is to use the mappings given by experts or generated by an ontology alignment system. Given the alignment between the source ontologies, the mapped entities should not appear separately in the merged ontology. If they do, we call them redundant entities. Since our approach merges each group of corresponding entities into one integrated entity, there are no redundant entities in any of the tested ontologies either.
Evaluation of applying the GMRs: we evaluate to which extent the refinements play a role. For R15 (oneness), we count the number of properties that have multiple domains or ranges. For R19 (connectivity), we consider only those unconnected classes in the merged ontology that were connected in the source ontologies. For R16 (acyclicity), we count how many cycles exist in the class hierarchy of the merged ontology.
Merge process characteristics: we characterize the merge process by measuring the number of created blocks k, the percentages of distributed is-a axioms and of translated axioms relative to the total axioms, and the number of local and global refinement actions in the intra- and inter-combination steps.
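The coverage criterion from the integrity category can be computed as in the following sketch. It is a minimal illustration, assuming that a source entity counts as preserved when either the entity itself or its integrated counterpart occurs in the merged ontology; the function `coverage` and all entity names are hypothetical.

```python
def coverage(source_entities, merged_entities, entity_to_integrated=None):
    """Percentage of source entities that are preserved in the merged ontology."""
    if not source_entities:
        return 100.0  # nothing to preserve
    mapping = entity_to_integrated or {}
    preserved = sum(
        1 for entity in source_entities
        if mapping.get(entity, entity) in merged_entities
    )
    return 100.0 * preserved / len(source_entities)


# Hypothetical classes of two source ontologies and of the merged ontology.
source_classes = {"O1:Paper", "O2:Article", "O1:Review", "O2:Chair"}
merged_classes = {"M:Paper", "O1:Review"}
entity_to_integrated = {"O1:Paper": "M:Paper", "O2:Article": "M:Paper"}

class_coverage = coverage(source_classes, merged_classes, entity_to_integrated)
print(class_coverage)
```

Here O2:Chair is missing from the merged ontology, so three of four source classes are preserved. The same function applies to properties and instances by passing the respective entity sets.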
In Fig. 10, we show the degree of information preservation on the six versions of our n-ary merge method. The percentage of class coverage is shown in the chart, while the percentages of property and instance coverage and the absolute number of unpreserved structures are given in the table below each chart. In Fig. 11, we show the number of local and global refinement actions. In this figure, we also present the statistics on the evaluation of the selected GMRs: oneness, unconnected classes, and cycles.
To analyze the results in these two figures, we compare the different versions. Comparing the versions without refinements to those with refinements applied either locally or globally, we can conclude that applying refinements leads to better class coverage, better preserved structure, more oneness, and fewer cycles in some datasets (see Figs. 10 and 11). For example, if no refinement is applied, there are datasets in which 5 classes become unconnected, 21 properties with multiple domains and ranges are generated, 13 cycles exist, or 71 unpreserved structures occur. Moreover, comparing the versions with only local refinements to those with only global refinements shows that applying only global refinements yields better class coverage, better preserved structure, better oneness, fewer unconnected classes, and fewer cycles in some datasets. No superiority of applying only local refinements over applying only global refinements can be observed with respect to these criteria. However, refinement actions at the local and global levels have different computational complexity, since the respective search spaces differ substantially. For instance, finding or repairing a cycle in a small set of classes (a local sub-ontology) is far less expensive than doing so among all classes of the merged ontology.
Comparing perfect and imperfect mappings, based on our tested matching tool, we observe that, out of 12 datasets, a perfect mapping leads to 5 fewer cases of unconnected entities, 3 fewer cases of cycles, 7 fewer cases concerning property oneness, 4 cases of better preserved structure, 3 fewer cases of local refinements, and 9 fewer cases of global refinements. On the other hand, an imperfect mapping leads to 1 fewer case of unconnected entities, 6 fewer cases of cycles, 4 fewer cases of local refinements, and 2 fewer cases of global refinements.
Comparing translated axioms with correspondence entities in perfect and imperfect mappings.
Number of blocks k vs. class overlap, max cardinality Card, and distributed axioms.
Figure 12 shows the percentage of translated axioms for all datasets along with the number of correspondence entities in perfect or imperfect mapping. Overall, using perfect or imperfect mappings with different numbers of correspondence entities has a direct effect on the number of translated axioms.
Figure 13 shows the number of created blocks k for all datasets using perfect or imperfect mappings. The value of k affects the number of distributed axioms. Thus, we also report the percentage of distributed taxonomic (is-a) axioms relative to the total axioms. These axioms mostly relate to axioms with ObjectUnionOf, where the union classes are distributed over the blocks. Determining k in our method mainly depends on the amount of overlap between the classes of the source ontologies and on the cardinality Card of the correspondence classes. The overlap is calculated as the ratio of the number of correspondence classes to the total number of classes. For each dataset, we show the maximum cardinality among the correspondence entities. Considering the overlap and Card, the values of k in our datasets are reasonable, which shows the feasibility of our approach. Moreover, for large n (see Table 5), k tends to be very small.
Answering CQs on the different versions of merged ontologies
Answering competency questions
Competency Questions (CQs) are a list of questions, used in the ontology development life cycle, that an ontology should be able to answer. With the CQ test, we aim to observe which of the created merged ontologies provides superior answers to the CQs. To this end, we used a set of 30 CQs in the conference domain, a combination of yes/no questions and WH-questions. Each CQ was manually converted to a SPARQL query and run against the source ontologies and the different versions of the merged ontologies (our datasets in the conference domain). For each dataset, we compare the CQ results on the merged ontology with all possible answers from the corresponding source ontologies. A complete answer indicates a full answer. If, among all answers of the source ontologies, the number of answers found in the merged ontology is higher (lower) than the number not found, we mark it as a semi-complete (partial) answer. An answer is marked as wrong if the CQ on the merged ontology does not return the same answer as on the source ontologies, e.g., false instead of true, or shows the wrong hierarchy. If the CQ’s entities exist in the ontology but no further knowledge exists about them, we mark the answer as null. If the ontology does not have any knowledge about the CQ, we mark the answer as unknown. The results are presented in Table 8, where the values are shown as the percentage of the total number of CQs. Values in boldface show the best result (highest values for complete, semi-complete, partial, and total correct answers and lowest values for wrong, unknown, and null answers) in each dataset. The last column shows the total correct answers, i.e., the sum of the complete, semi-complete, and partial answers.
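The labeling scheme for CQ answers can be sketched as a small classifier. This is a minimal sketch under several assumptions that the text does not fix: answers are compared as sets, a tie between found and missing answers is labeled partial, and the wrong, null, and unknown cases are signaled by hypothetical flags supplied by the evaluator.

```python
def classify_answer(expected, found, contradicts_sources=False, entities_known=True):
    """Label a CQ result on a merged ontology against the source-ontology answers."""
    if contradicts_sources:
        return "wrong"  # e.g. false instead of true, or a wrong hierarchy
    hits = len(expected & found)
    if hits == 0:
        # Assumption: no answer at all is null if the CQ entities exist,
        # and unknown if the ontology knows nothing about the CQ.
        return "null" if entities_known else "unknown"
    if hits == len(expected):
        return "complete"
    missed = len(expected) - hits
    return "semi-complete" if hits > missed else "partial"  # tie -> partial (assumed)


expected = {"a", "b", "c"}
labels = [
    classify_answer(expected, {"a", "b", "c"}),
    classify_answer(expected, {"a", "b"}),
    classify_answer(expected, {"a"}),
    classify_answer(expected, set()),
    classify_answer(expected, set(), entities_known=False),
    classify_answer(expected, {"a"}, contradicts_sources=True),
]
print(labels)

# Total correct answers = complete + semi-complete + partial.
total_correct = sum(label in {"complete", "semi-complete", "partial"} for label in labels)
print(total_correct)
```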
Overall, the results in Table 8 indicate that applying local or global refinements can in some cases provide more complete answers, more semi-complete answers, and fewer null answers. Using perfect mappings leads to more complete answers, fewer wrong answers, and fewer unknown answers. On the other hand, imperfect mappings yield more complete and partial answers in some cases. In most cases, the ladder and balanced binary approaches produce the same final merged ontologies w.r.t. answering CQs. Comparing the n-ary and binary strategies reveals that the n-ary merge can achieve the same result quality as the binary methods, and in some datasets even more complete answers. In sum, the test shows that the merged ontologies can provide comprehensive answers w.r.t. the source ontologies. The merged ontologies built from all source ontologies of the conference domain achieve 100% total correctness.
Comparing n-ary (N), balanced (B), and ladder (L) merge strategies with the number of correspondence entities, translated axioms, global refinements, and merge processes
Binary vs. N-ary
While the CQ test shows that the result quality of the n-ary approach can compete with the binary approaches, the third test compares performance metrics. We run a series of binary merges on the eight datasets that have more than two source ontologies and examine the runtime performance and the required operations. Table 9 shows the difference in operation complexity between the three strategies (n-ary, balanced, and ladder); for legibility, only some datasets are shown. However, we base our interpretation on all eight datasets and all versions. The total number of correspondence entities during the whole merge process differs considerably. In each binary merge, only the correspondences between two entities can be integrated. In the n-ary merge, however, the correspondence entities from multiple source ontologies can be integrated simultaneously into the new entity. For this reason, the number of correspondence entities in the binary merges is much higher than in the n-ary approach in 7 out of 8 datasets. Consequently, the time required to combine them into new entities and translate their axioms is high for the binary merges in all tested datasets. In one dataset, for example, the number of translated axioms is 1310 in the n-ary method but 3462 in the binary-ladder strategy. This reduction in work contributes to the speed-up of the n-ary approach.
We compare the number of required refinement actions in Table 9. In 7 out of 8 datasets, the n-ary approach requires fewer refinement actions than the binary merges. For instance, in one dataset, the n-ary method runs 977 actions, while the binary-balanced method runs 1533 actions. The same conclusion can be derived from comparing the required local refinements. The table also presents the number of merge processes. While the n-ary approach uses only one iteration in all tests, the ladder and balanced methods require n−1 merge processes; in one dataset, for example, the whole merge process has to be run 54 times.
We demonstrate the method’s scalability with the results of the performance test. Here, the runtime performance is evaluated based on the number of ontologies versus the time required for the merge process in the n-ary, balanced, and ladder strategies. Figure 14 shows the total runtime in seconds for the merge processes. If n = 2, there is no difference between n-ary and binary, so we show the results for n > 2. We ran each test 10 times and report the average values. The processing time of the binary merge does not include the time for creating the respective mappings. In this test, we present the number of source ontologies and their axioms to show how the merge processing time depends on the number and size of the source ontologies. Overall, when the number and size of the source ontologies increase, the merge processing time also increases. These results show that the n-ary merge is on average 4 times faster than the balanced binary merge and 9 times faster than the ladder binary merge. We conclude that using n-ary rather than binary methods becomes more valuable and effective as the number of ontologies grows. For example, in a dataset with 3 source ontologies, n-ary is 3 times faster than the binary strategies; in a dataset with more source ontologies, it is 4 times faster than both binary approaches; and in a still larger dataset, it is 31 times faster than the binary ladder.
Runtime performance: numbers of source ontologies and axioms vs. time for the merge process in seconds. Binary merges are run on datasets with n > 2.
Overall our results show that the n-ary strategy achieves comparable results in terms of quality but outperforms binary approaches in terms of runtime and complexity.
N-ary vs. N-ary
Osman et al. (Osman et al., 2021) recently introduced an n-ary merge approach. They implemented two different versions: AROM replaces corresponding entities with new ones in the merged ontologies; OIAR follows a simple merge without replacing corresponding entities. We have compared the AROM version with Version 1 of our dataset in the CQ tests. The result shows that overall, both methods produce comparable results. oerger has better results w.r.t the complete answers. Table 10 compares the CQ test of the merged ontologies by AROM (Osman et al., 2021) and V1 of our datasets.
Comparing oerger (V1) with AROM merged ontologies (Osman et al., 2021) in the CQ test
State of the art
Merging strategies have been divided into two main categories (Batini et al., 1986): “binary” and “n-ary”. The binary approach allows the merging of two ontologies at a time, while the n-ary strategy merges n ontologies (n > 2) in a single step. When merging more than two ontologies, the binary strategy has a quadratic complexity of the merging process in terms of the involved ontologies. In the n-ary strategy, in contrast, the number of merging steps is minimized. Most methodologies in the literature, such as (Noy and Musen, 2003; Jiménez-Ruiz et al., 2009; Guzmán-Arenas and Cuevas, 2010; Ju et al., 2011; Raunich and Rahm, 2014; Mahfoudh et al., 2014; Zhang et al., 2017; Fahad, 2017; Makwana and Ganatra, 2018; Priya and Cherukuri, 2019; Priya and Kumar, 2019), adopt a binary strategy due to the simplicity of the search space. Applying a series of binary merges to more than two ontologies is not sufficiently scalable and viable for a large number of ontologies (Rahm, 2016). The existing n-ary approaches (Saleem et al., 2008; Maiz et al., 2010) merge multiple ontologies in a single step; however, each of these systems suffers from certain drawbacks. Porsche (Saleem et al., 2008) does not target ontologies but XML schemas, and its final merge result depends on the order in which the source tree-structured XML schemas are matched and merged. In (Maiz et al., 2010), the experimental tests were carried out on a few small source ontologies only. Despite the efforts of these research studies, an efficient, scalable n-ary method has not been applied in practice, and developing one remains a crucial challenge. Osman et al. (Osman et al., 2021) recently introduced an n-ary merge approach. We have compared our method with their proposed one. The result shows that both methods can compete with binary approaches.
Ontology merging, whether binary or n-ary, can be categorized into two different strategies: “one-level merge” and “two-level merge”. In the latter, an intermediate merge result is produced at the first level. Then, in the second phase, the intermediate result is refined to generate the final merge result. In contrast, the one-level merge approach (Noy and Musen, 2003; Ju et al., 2011; Priya and Cherukuri, 2019; Osman et al., 2021) creates the merge result in one incremental processing step by considering the effect of the previously combined entities. In the two-level merge approaches (Saleem et al., 2008; Jiménez-Ruiz et al., 2009; Maiz et al., 2010; Guzmán-Arenas and Cuevas, 2010; Raunich and Rahm, 2014; Mahfoudh et al., 2014; Zhang et al., 2017; Fahad, 2017; Makwana and Ganatra, 2018; Priya and Kumar, 2019), a set of refinements is carried out on the intermediate result. Examples are applying a set of GMRs in ATOM (Raunich and Rahm, 2014), utilizing granular processing in GCBOM (Priya and Kumar, 2019), and considering the source ontologies’ restrictions in OM (Guzmán-Arenas and Cuevas, 2010). The output of the second level is called the merged ontology in the literature. The outcome of the first level, in contrast, goes under different names, such as integrated concept graph in ATOM (Raunich and Rahm, 2014), network-based knowledge model in OIM-SM (Zhang et al., 2017), common ontology graph in GROM (Mahfoudh et al., 2014), or intermediate schema in PORSCHE (Saleem et al., 2008).
Here, we also review briefly the evaluation approaches of the ontology merging systems. To this end, we use the classification of ontology evaluation from (Brank et al., 2005; Hlomani and Stacey, 2014; Raad and Cruz, 2015). In each, we present how the existing ontology merging systems evaluated their merged result.
Gold-standard comparison: The tool-created merged ontology has been compared against a human-created one in a couple of ontology merging systems such as (Jiménez-Ruiz et al., 2009; Guzmán-Arenas and Cuevas, 2010; Zhang et al., 2017) and in the ontology merging benchmarks (Raunich and Rahm, 2012; Mahfoudh et al., 2016). The comparison has been performed in terms of the size of the merged result in (Guzmán-Arenas and Cuevas, 2010; Zhang et al., 2017), the time of processing in (Zhang et al., 2017), and the entailment satisfaction in (Jiménez-Ruiz et al., 2009). The ontologies in these tests were small in size, and only a few pairs of ontologies have been evaluated. Two benchmarks for the ontology merging domain are introduced in (Raunich and Rahm, 2012; Mahfoudh et al., 2016). Benchmark (Raunich and Rahm, 2012) includes simple taxonomies, and only the number of paths and concepts of the tool-created merged ontology with the human-created ones are evaluated. Other criteria on the properties are not considered. Thus, it could not be extended for non-taxonomy ontologies. To our knowledge, this benchmark is not publicly available, so others are unable to use it. Benchmark (Mahfoudh et al., 2016) included a few small ontologies. The authors presented criteria, achieved by their tool (Mahfoudh et al., 2014) without any comparison with human results.
Application or task-based evaluation: To the best of our knowledge, the evaluation of the merged ontology in the context of an application or a use-case scenario has not been covered in the literature.
User-based evaluation: In (Noy and Musen, 2003), the authors provided a platform for user-based evaluation in the ontology merging scenario. They analyzed the extent to which the users agree with the tool’s suggestions. Thus, the evaluation mainly concerns the merge method, not the merge result. Moreover, the ontologies in this evaluation are small, and only a few pairs of ontologies are evaluated by the users. User-based evaluation is generally labor-intensive for humans and insufficient, especially for large-scale ontologies or a large number of source ontologies.
Data-driven evaluation: This approach compares an ontology with existing data about the respective domain to evaluate how well the ontology covers that domain. To the best of our knowledge, this approach is also barely used. Some semi-related attempts, such as (Livingston et al., 2015), have been proposed, where a set of corpora and ontologies are merged to build a merged ontology. Their evaluation focuses only on query processing analysis. However, data-driven evaluation requires analyzing how well the created merged ontology covers a topic of the domain corpus or how well it fits the domain knowledge by comparing ontology concepts. These issues are not covered in (Livingston et al., 2015).
Criteria-based evaluation: The authors in (Raunich and Rahm, 2014; Priya and Kumar, 2019; Priya and Cherukuri, 2019) evaluated the merged ontology’s size or compactness (a metric introduced in (Raunich and Rahm, 2012)). In addition, in (Raunich and Rahm, 2014), the number of leaf paths is examined. In other studies (Saleem et al., 2008; Makwana and Ganatra, 2018; Osman et al., 2021), two more criteria, namely coverage and redundancy (introduced in (Duchateau and Bellahsene, 2010; Raunich and Rahm, 2012)), have been considered. Moreover, in CreaDO (Ju et al., 2011), the authors reported basic statistics about a few common pitfalls taken from (Poveda Villalon et al., 2010) related to the general design of the ontology.
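To make criteria such as coverage and compactness concrete, the following minimal sketch computes two simplified measures over concept sets. The definitions are assumptions made for illustration and are not the exact formulas of the cited works: `coverage` is the share of source concepts (after applying the mapping) that are present in the merged ontology, and `size_ratio` relates the merged size to the summed source sizes as a rough stand-in for compactness.

```python
from typing import Dict, List, Set


def coverage(sources: List[Set[str]], merged: Set[str], mapping: Dict[str, str]) -> float:
    """Share of distinct source concepts (after renaming mapped ones) kept in the merged ontology."""
    source_concepts = {mapping.get(c, c) for onto in sources for c in onto}
    if not source_concepts:
        return 1.0  # assumption: an empty input is trivially covered
    return len(source_concepts & merged) / len(source_concepts)


def size_ratio(sources: List[Set[str]], merged: Set[str]) -> float:
    """Merged size relative to the summed source sizes; lower values mean a more compact result."""
    total = sum(len(onto) for onto in sources)
    return len(merged) / total if total else 0.0


# Hypothetical toy input: the merged ontology lost the concept "Paper".
sources = [{"Author", "Person"}, {"Writer", "Human", "Paper"}]
mapping = {"Writer": "Author", "Human": "Person"}
merged_concepts = {"Author", "Person"}

cov = coverage(sources, merged_concepts, mapping)  # 2 of 3 distinct source concepts are kept
ratio = size_ratio(sources, merged_concepts)       # 2 merged concepts vs. 5 source concepts
```

The example shows why the two criteria are usually reported together: dropping `Paper` makes the result more compact but lowers its coverage.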
Overview of existing ontology merging systems: merge strategy (binary or n-ary); merge type (one-level or two-levels); fulfilled GMRs (full list of GMRs available in (Babalou and König-Ries, 2019a; Babalou et al., 2020b)); evaluation technique of the merged result
Table 11 summarizes existing ontology merge systems along the four main features introduced in the previous sections. The merge strategy (binary or n-ary) and the type of merge (one-level or two-level) of each approach are specified in columns 3–4. Column 5 indicates the GMRs that are fulfilled implicitly or explicitly by each ontology merge approach. The last column lists the evaluation technique used by each system to assess the quality of the merged result. The × sign in this column indicates that the ontology merge system provides no evaluation. The last row of Table 11 reports the characteristics of oerger, the method proposed in this paper. oerger can apply all GMRs according to the user’s wishes; in the experimental tests of this paper, however, we have used R1–R3, R7, R15, R16, and R19 as examples.
Conclusion
Ontology merging is frequently needed. Existing approaches scale rather poorly to the merging of multiple ontologies. To overcome this issue, we proposed oerger, an n-ary approach for merging multiple ontologies. Efficiency is achieved by breaking the processing of n ontologies into the merging of k blocks, with a minor overhead in the dividing process (see Section 2.3 for details). The tool that we built for this purpose (described in Babalou et al. (2020a)) provides a parameterizable merge platform that allows users to influence the merge result interactively. Our evaluation results point to a future research agenda, including evaluating strategies for such user interactions, adapting our approach to merging data on the schema level in Linked Open Data (LOD) scenarios, and taking advantage of the parallelization potential of the approach. With the rapidly growing importance of (ABox-heavy) knowledge graphs, extending the approach to consider instance mappings is also a promising avenue of future research.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
S. Babalou has been supported by a scholarship from German Academic Exchange Service (DAAD).
Appendix
Table 12 shows the dataset statistics. The number of source ontologies n in each dataset is shown in column 2. The source ontologies’ names and their numbers of axioms are shown in columns 3–4. The numbers of corresponding classes and properties for imperfect and perfect mappings are presented in columns 5–8, respectively.
Table 13 and Table 14 show the characteristics of the merged ontologies on the OAEI and BioPortal datasets, respectively. Parts of these two tables have already been shown in Fig. 10 and Fig. 11 in the main manuscript.
Table 15, Table 16, Table 17, and Table 18 compare the n-ary, balanced, and ladder merge strategies for different versions of the merged ontology based on the test setting of Section 3.2 in the main manuscript. Table 15 shows the results when both local and global refinements are applied. Table 16 shows the results when only local refinements are applied. Table 17 shows the results when only global refinements are applied; part of it has already been presented in the main manuscript. Table 18 shows the results when no refinement is applied. In all tables, we show the results only for those datasets that have more than two source ontologies.
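The difference between the two binary strategies compared in these tables can be sketched as merge plans. The sketch assumes the usual reading of the terms from the schema-integration literature: a ladder strategy merges one further ontology into the running result at each step, while a balanced strategy merges pairs in a tree-like fashion. The function names and the nested-tuple plan representation are illustrative assumptions. The n-ary strategy is not a sequence of binary merges and is therefore not represented here.

```python
from typing import Any, List

Plan = Any  # an ontology name (str) or a pair (left_plan, right_plan) denoting one binary merge


def ladder(names: List[str]) -> Plan:
    """Merge the running result with the next ontology, one at a time (needs at least one name)."""
    plan: Plan = names[0]
    for name in names[1:]:
        plan = (plan, name)
    return plan


def balanced(names: List[str]) -> Plan:
    """Merge neighbouring pairs level by level until one result remains (needs at least one name)."""
    level: List[Plan] = list(names)
    while len(level) > 1:
        paired: List[Plan] = [(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])  # an odd element is carried over to the next level
        level = paired
    return level[0]


def merges(plan: Plan) -> int:
    """Number of binary merge operations in a plan."""
    return 0 if isinstance(plan, str) else 1 + merges(plan[0]) + merges(plan[1])


def depth(plan: Plan) -> int:
    """Length of the longest chain of dependent merges in a plan."""
    return 0 if isinstance(plan, str) else 1 + max(depth(plan[0]), depth(plan[1]))


names = ["O1", "O2", "O3", "O4"]
ladder_plan = ladder(names)      # ((("O1", "O2"), "O3"), "O4")
balanced_plan = balanced(names)  # (("O1", "O2"), ("O3", "O4"))
```

Both plans need n − 1 binary merges for n ontologies. They differ in the number of dependent steps: for four ontologies, the ladder plan has a depth of three and the balanced plan a depth of two.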
References
1.
Algergawy, A., Babalou, S., Kargar, M.J. & Davarpanah, S.H. (2015). SeeCOnt: A new seeding-based clustering approach for ontology matching. In Advances in Databases and Information Systems. Chapter 17. doi:10.1007/978-3-319-23135-8.
2.
Algergawy, A., Faria, D., Ferrara, A., Fundulaki, I., Harrow, I., Hertling, S., Jimenez-Ruiz, E., Karam, N., Khiat, A., Lambrix, P., Li, H., Montanelli, S., Paulheim, H., Pesquita, C., Saveta, T., Shvaiko, P., Splendiani, A., Thiéblin, E., Trojahn, C., Vataščinová, J., Zamazal, O. & Zhou, L. (2019). Results of the ontology alignment evaluation initiative 2019. In CEUR Workshop Proceedings (Vol. 2536, pp. 46–85).
3.
Aumueller, D., Do, H.-H., Massmann, S. & Rahm, E. (2005). Schema and ontology matching with COMA++. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data – SIGMOD’05 (pp. 906–908). doi:10.1145/1066157.1066283.
4.
Babalou, S., Grygorova, E. & König-Ries, B. (2020a). CoMerger: A customizable online tool for building a consistent quality-assured merged ontology. In 17th Extended Semantic Web Conference (ESWC’20), Poster and Demo Track.
5.
Babalou, S., Grygorova, E. & König-Ries, B. (2020b). What to do when the users of an ontology merging system want the impossible? Towards determining compatibility of generic merge requirements. In International Conference on Knowledge Engineering and Knowledge Management (pp. 20–36). Springer. doi:10.1007/978-3-030-61244-3_2.
6.
Babalou, S. & König-Ries, B. (2019a). GMRs: Reconciliation of generic merge requirements in ontology integration. In SEMANTICS Poster and Demo Track.
7.
Babalou, S. & König-Ries, B. (2019b). A subjective logic based approach to handling inconsistencies in ontology merging. In OTM Confederated International Conferences “On the Move to Meaningful Internet Systems” (pp. 588–606). Springer. doi:10.1007/978-3-030-33246-4_37.
8.
Batini, C., Lenzerini, M. & Navathe, S.B. (1986). A comparative analysis of methodologies for database schema integration. ACM Computing Surveys (CSUR), 18(4), 323–364. doi:10.1145/27633.27634.
9.
Brank, J., Grobelnik, M. & Mladenic, D. (2005). A survey of ontology evaluation techniques. In Proceedings of the Conference on Data Mining and Data Warehouses (SiKDD 2005) (pp. 166–170). Ljubljana, Slovenia: Citeseer.
10.
Caldarola, E.G. & Rinaldi, A.M. (2016). An approach to ontology integration for ontology reuse. In IEEE 17th International Conference on Information Reuse and Integration (IRI) (pp. 384–393). doi:10.1109/IRI.2016.58.
11.
d’Aquin, M., Schlicht, A., Stuckenschmidt, H. & Sabou, M. (2007). Ontology modularization for knowledge selection: Experiments and evaluations. In International Conference on Database and Expert Systems Applications (pp. 874–883). Springer. Chapter 85. doi:10.1007/978-3-540-74469-6_85.
12.
Deelers, S. & Auwatanamongkol, S. (2007). Enhancing K-means algorithm with initial cluster centers derived from data partitioning along the data axis with the highest variance. International Journal of Computer Science, 2(4), 247–252.
13.
Dooley, D.M., et al. (2018). FoodOn: A harmonized food ontology to increase global food traceability, quality control and data integration. npj Science of Food, 2. doi:10.1038/s41538-018-0032-6.
14.
Duchateau, F. & Bellahsene, Z. (2010). Measuring the quality of an integrated schema. In Conceptual Modeling – ER (pp. 261–273). Chapter 19. doi:10.1007/978-3-642-16373-9_19.
Fahad, M. (2017). Merging of axiomatic definitions of concepts in the complex OWL ontologies. Artificial Intelligence Review, 47(2), 181–215. doi:10.1007/s10462-016-9479-5.
17.
Finke, M.T., Filice, R.W. & Kahn, C.E.Jr. (2019). Integrating ontologies of human diseases, phenotypes, and radiological diagnosis. Journal of the American Medical Informatics Association, 26(2), 149–154. doi:10.1093/jamia/ocy161.
18.
Gruber, T.R., et al. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2), 199–220. doi:10.1006/knac.1993.1008.
19.
Guzmán-Arenas, A. & Cuevas, A.-D. (2010). Knowledge accumulation through automatic merging of ontologies. Expert Systems with Applications, 37(3), 1991–2005. doi:10.1016/j.eswa.2009.06.078.
20.
Hlomani, H. & Stacey, D. (2014). Approaches, methods, metrics, measures, and subjectivity in ontology evaluation: A survey. Semantic Web Journal, 1(5), 1–11.
21.
Hu, W., Qu, Y. & Cheng, G. (2008). Matching large ontologies: A divide-and-conquer approach. Data & Knowledge Engineering, 67(1), 140–160. doi:10.1016/j.datak.2008.06.003.
22.
Jiménez-Ruiz, E., Agibetov, A., Samwald, M. & Cross, V. (2018). We divide, you conquer: From large-scale ontology alignment to manageable subtasks with a lexical index and neural embeddings. In CEUR Workshop Proceedings, 2288, 13–24.
23.
Jiménez-Ruiz, E., Grau, B.C., Horrocks, I. & Berlanga, R. (2009). Ontology integration using mappings: Towards getting the right logical consequences. In ESWC (pp. 173–187). Springer. Chapter 16. doi:10.1007/978-3-642-02121-3_16.
24.
Ju, S.P., Esquivel, H.E., Rebollar, A.M., Su, M.C., et al. (2011). CreaDO – A methodology to create domain ontologies using parameter-based ontology merging techniques. In 10th Mexican International Conference on Artificial Intelligence (pp. 23–28). IEEE.
25.
Livingston, K.M., Bada, M., Baumgartner, W.A. & Hunter, L.E. (2015). KaBOB: Ontology-based semantic integration of biomedical databases. BMC Bioinformatics, 16(1), 1. doi:10.1186/s12859-015-0559-3.
26.
Mahfoudh, M., Forestier, G. & Hassenforder, M. (2016). A benchmark for ontologies merging assessment. In Knowledge Science, Engineering and Management (pp. 555–566). Chapter 44. doi:10.1007/978-3-319-47650-6_44.
27.
Mahfoudh, M., Thiry, L., Forestier, G. & Hassenforder, M. (2014). Algebraic graph transformations for merging ontologies. In Model and Data Engineering (pp. 154–168). Springer. Chapter 16. doi:10.1007/978-3-319-11587-0_16.
28.
Maiz, N., Fahad, M., Boussaid, O. & Bentayeb, F. (2010). Automatic ontology merging by hierarchical clustering and inference mechanisms. In Proceedings of I-KNOW (pp. 1–3).
29.
Makwana, A. & Ganatra, A. (2018). A known in advance, what ontologies to integrate? For effective ontology merging using K-means clustering. International Journal of Intelligent Engineering and Systems, 11, 72. doi:10.22266/ijies2018.0831.08.
30.
Noy, N.F. & Musen, M.A. (2003). The PROMPT suite: Interactive tools for ontology merging and mapping. International Journal of Human-Computer Studies, 59(6), 983–1024. doi:10.1016/j.ijhcs.2003.08.002.
31.
Osman, I., Pileggi, S.F., Yahia, S.B. & Diallo, G. (2021). An alignment-based implementation of a holistic ontology integration method. MethodsX, 8, 101460. doi:10.1016/j.mex.2021.101460.
32.
Otero-Cerdeira, L., Rodríguez-Martínez, F.J. & Gómez-Rodríguez, A. (2015). Ontology matching: A literature review. Expert Systems with Applications, 42(2), 949–971. doi:10.1016/j.eswa.2014.08.032.
33.
Paixao, M., Harman, M., Zhang, Y. & Yu, Y. (2017). An empirical study of cohesion and coupling: Balancing optimization and disruption. IEEE Transactions on Evolutionary Computation, 22(3), 394–414. doi:10.1109/TEVC.2017.2691281.
34.
Pottinger, R.A. & Bernstein, P.A. (2003). Merging models based on given correspondences. In Proceedings 2003 VLDB Conference (pp. 862–873). Elsevier. doi:10.1016/B978-012722442-8/50081-1.
35.
Poveda Villalon, M., Suárez-Figueroa, M.C. & Gómez-Pérez, A. (2010). A double classification of common pitfalls in ontologies. In Workshop on Ontology Quality (OntoQual 2010). Co-located with EKAW. Informatica.
36.
Priya, M. & Cherukuri, A.K. (2019). A novel method for merging academic social network ontologies using formal concept analysis and hybrid semantic similarity measure. Library Hi Tech, 38, 399. doi:10.1108/LHT-02-2019-0035.
37.
Priya, M. & Kumar, C.A. (2019). An approach to merge domain ontologies using granular computing. Granular Computing, 6, 1–26. doi:10.1007/s41066-019-00193-3.
38.
Raad, J. & Cruz, C. (2015). A survey on ontology evaluation methods. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management. doi:10.5220/0005591001790186.
39.
Rahm, E. (2016). The case for holistic data integration. In Advances in Databases and Information Systems (pp. 11–27). Chapter 2. doi:10.1007/978-3-319-44039-2_2.
40.
Rahm, E. & Bernstein, P.A. (2001). A survey of approaches to automatic schema matching. The VLDB Journal, 10(4), 334–350. doi:10.1007/s007780100057.
41.
Raunich, S. & Rahm, E. (2012). Towards a benchmark for ontology merging. In OTM Confederated International Conferences “On the Move to Meaningful Internet Systems” (Vol. 7567, pp. 124–133). Chapter 20. doi:10.1007/978-3-642-33618-8_20.
42.
Raunich, S. & Rahm, E. (2014). Target-driven merging of taxonomies with ATOM. Information Systems, 42, 1–14. doi:10.1016/j.is.2013.11.001.
43.
Saleem, K., Bellahsene, Z. & Hunt, E. (2008). Porsche: Performance oriented schema mediation. Information Systems, 33(7), 637–657. doi:10.1016/j.is.2008.01.010.
44.
Shvaiko, P. & Euzenat, J. (2011). Ontology matching: State of the art and future challenges. IEEE Transactions on Knowledge and Data Engineering, 25(1), 158–176. doi:10.1109/TKDE.2011.253.
45.
Zalamea Patino, O.P., Van Orshoven, J. & Steenberghen, T. (2018). Merging and expanding existing ontologies to cover the Built Cultural Heritage domain. Journal of Cultural Heritage Management and Sustainable Development, 8(2), 162–178. doi:10.1108/JCHMSD-05-2017-0028.
46.
Zhang, L.-Y., Ren, J.-D. & Li, X.-W. (2017). OIM-SM: A method for ontology integration based on semantic mapping. Journal of Intelligent & Fuzzy Systems, 32(3), 1983–1995. doi:10.3233/JIFS-161553.