Abstract
Ontology, which is regularly used in computer information retrieval and other computer applications, plays an essential role in effectively retrieving the concepts that have highly semantic similarity with the original query concept. Meanwhile, ontology also returns the results to the user. Ontology mapping is used to connect the relationship between different ontologies, and similarity computation is the essence of such applications. In this article, we present a ranking based ontology optimizing algorithm to get the ontology score function which map each ontology vertex to a real number, and the similarity between ontology vertices is determined according to the difference of the scores. The ontology framework is designed relied on the eigenpair computation, and the solution is obtained by means of operator calculation. The result data of simulation experiment implies that our new ontology method has high efficiency and accuracy in ontology similarity measure and ontology mapping in multiple disciplines.
Introduction
The term “ontology” has its origin in the field of philosophy. It has been used to describe the natural connection of things and the inherently hidden connections of their components. In information and computer science, ontology is a model for knowledge storing and representation. This term has been widely applied in knowledge management, machine learning, information systems, image retrieval, information retrieval search extension, collaboration and intelligent information integration. The past decade has witnessed how ontology, as an effective concept semantic model and a powerful analysis tool has been widely applied in pharmacology science, biology science, medical science, geographic information system and social sciences (for instance, see [1–5]).
A simple graph is usually used to represent the structure of ontology. Each concept, object or element in ontology is corresponding to a vertex. Each (directed or undirected) edge on an ontology graph represents a relationship (or potential link) between two concepts (objects or elements). Let O be an ontology. Let G be a simple graph corresponding to O. The nature of ontology engineer application falls onto getting the similarity calculating function. Such a function is to compute the similarities between ontology vertices. The intrinsic link between vertices in ontology graph can be represented by these similarities. Ontology mapping aims to get the ontology similarity measuring function through measuring the similarity between vertices from different ontologies. A mapping like this is a bridge between different ontologies. It gets a potential association between the objects or elements from different ontologies. Specifically speaking, the ontology similarity function is a semi-positive score function and it maps each pair of vertices to a non-negative real number.
In recent years, ontology technologies can be found in a variety of applications. A graph derivation representation based technology for stable semantic measurement was presented by Ma et al. [6]. An ontology representation method for online shopping customers knowledge in enterprise information was raised by Li et al. [7]. An innovative ontology matching system that finds complex correspondences by processing expert knowledge from external domain ontologies and in terms of using novel matching tricks was proposed by Santodomingo et al. [8]. Description of the main features of the food ontology and some examples of application for traceability purposes was given by Pizzuti et al. [9]. That ontologies can be used in designing an architecture for monitoring patients at home was aroused in argument by Lasierra et al. [10].
Using ontology learning algorithm which gets an ontology function is an advanced idea in dealing with the ontology similarity computation. The ontology graph is mapped into a line which consists of real numbers through ontology function. Comparing the difference between their corresponding real numbers is used to measure the similarity between two concepts. Dimensionality reduction is the essence of this idea. To associate the ontology function with ontology application, a vector is used to express all its information for vertex v. We slightly confuse the notations and v is used to denote both the ontology vertex and its corresponding vector because we want to facilitate the representation. We map the vector to a real number by ontology function. The ontology function is a dimensionality reduction operator and it maps multi-dimensional vectors into one dimensional vectors.
Several effective methods for getting efficient ontology similarity measure or ontology mapping algorithm in terms of ontology function can be used. The ontology similarity calculation in terms of ranking learning technology was considered by Wang et al. [11]. The fast ontology algorithm in order to cut the time complexity for ontology application was raised by Huang et al. [12]. An ontology optimizing model such that the ontology function is determined by virtue of NDCG measure was presented by Gao and Liang [13]. Meanwhile, it is successfully applied in physics education. Gao and Xu [14] presented ontology similarity measuring and ontology mapping algorithm via MEE (minimum error entropy) criterion. An ontology algorithm by virtue of diffusion and harmonic analysis on hyper-graph framework was inferred by Gao et al. [15]. By using special loss functions, Gao et al. [16] proposed new multi-dividing ontology learning algorithms for similarity measure and ontology mapping construction.
Several articles make contributions to the theoretical analysis of ontology algorithms. Ontology algorithm from a theoretical perspective was studied by Gao et al. [17] based on the regression. Considering the strong and weak stability of k-partite ranking, ontology algorithm was studied by Gao et al. [18]. Gao et al. [19] continued to discuss the ontology algorithm in multi-dividing setting. They pointed out that conditional linear statistical can be used to represent the empirical multi-dividing ontology framework. An approximation result is obtained from the projection method. A gradient learning model for ontology similarity measuring and ontology mapping in multi-dividing setting was raised by Gao and Zhu [20] based on their cooperation. Furthermore, the hypothesis space and the ontology dividing operator determines the sample error in this setting. Based on the assumptions of low noise, the upper bound and lower bound minimax learning rate was obtained by Gao et al. [21]. Other theoretical results can be found in Gao et al. [22].
In recent years, ranking learning algorithm and its applications have raised high attention among scholars. A large number of ranking tricks are introduced for a variety of applications (see [23–28]). In this paper, we present a new ontology optimization model based on ranking technologies. The solution is calculated in terms of eigenpair computation. At last, the proposed ontology algorithm will be applied in plant science, physical education, biology science and humanoid robotics. The effectiveness of ontology algorithm in these fields proved to be high according to experiment results.
Setting
Let V be a compact metric input space (or ontology instance space) and Y = [0, M] be an output space, where M is a fixed positive number. Suppose that ρ is a Borel probability measure on Z = V × Y. Set ρ V as the marginal distribution on V and its conditional distribution on Y for fixed ontology vertex v is denoted by ρ (· |v). The aim of ranking based ontology algorithm is to obtain a ontology function in terms of given ontology data (sample) set which is drawn independently and identically according to unknown ρ. This ontology function can be regarded as a score function which values future vertices with larger scores higher than those with smaller labels, that is to say, vertex v is to be assigned with a larger value than v′ if f z (v) > f z (v′), and a smaller value than v′ if f z (v) < f z (v′). For convenience, we say that f z (v) = f z (v′) implies there is no difference in ranking order between vertices v and v′. Ontology learning algorithms were used aiming to obtain an optimal ontology function f. In this way, the similarity between two vertices v i and v j is determined by the value of |f (v i ) - f (v j ) |.
The punishment of such ranking based ontology function f on a pair of ontology vertex pair (v, v′) with corresponding labels y and y′ can be expressed as the least squares ontology loss
Let be the set of target ontology functions which are defined to be the real score functions minimizing the error R (f) over . It is not hard to check that any ontology function with the form f + c has the same ontology expected error as that of ontology function f, where is a fixed constant. From this point of view, different from the target ontology function in regression setting, the ranking based target ontology function is not unique. It can be verified that the regression ontology function of ρ denoted as
A Mercer kernel (see [30–35] for more details) on V is defined as a symmetrically continuous function satisfying that the n × n matrix K with its (i, j)-element K (v
i
, v
j
) is a positive semi-definite matrix for any ontology data set . Set span{K
v
: v ∈ V} as the space spanned by the collection {K
v
= K (· , v) : v ∈ V}. Therefore, the inner product in the space s span {K
v
: v ∈ V} can be expressed by
The reproducing kernel Hilbert space (RKHS) H K associated with kernel function K was introduced as the completion of span{K v : v ∈ V} with the norm ∥ · ∥ K induced by the inner product 〈 · , · 〉 K . The reproducing property of RKHS H K can be formulated by f (v) = 〈 f, K v 〉 K for any v ∈ V and any f ∈ H K . By virtue of reproducing property and set , we infer that ∥f ∥ ∞ ≤ κ ∥ f ∥ K for all f ∈ H K .
A kernel associated ontology learning algorithm in ranking setting with the least squares ontology loss function can be stated as follows:
Framework of ontology computation
Let K be a Mercer kernel on ontology input space V. For any ontology function f ∈ H
K
, L
K
is the operator associated with K, which is defined by
It is easy to deduce that L
K
has at most countable eigenvalues. All of these eigenvalues are nonnegative since they are self-adjoint, compact, and positive operators. The eigenvalues {λ
l
} (with multiplicities) of L
K
can be ranked as a non-increasing sequence with lower bound 0 and associated with the sequence of eigenfunctions {φ
l
} which forms the orthonormal basis of reproducing kernel Hilbert space H
K
. The unlabeled part of the ontology samples is expressed as . For any f ∈ H
K
, the operator defined on H
K
is denoted by
The relationship between and L K can be formulated as , which implies that can be regarded as the empirical version of the operator L K with respect to v. Clearly, the operator is a self-adjoint, positive operator with . The eigensystem of can be called as an empirical eigensystem, which is denoted by . It is emphasized here that the eigenvalues are ranked in the non-increasing order, and all the empirical eigenfunctions exactly form an orthonormal basis of reproducing kernel Hilbert space H K . Furthermore, at most n eigenvalues are nonzero. We can formulate as for all l > n.
In this paper, our ontology framework is a modified version of (1) based on the computation of the first n empirical eigenfunctions in ranking setting which can be stated as
According to the representation theory in learning theory, the output ontology function of computation model (2) is denoted by .
We should determine the suitable target ontology function approximated by the ontology function that forms the set of infinitely number of target ontology functions. Set as the rth power of L K . Our selection strategy is described as follows: suppose that we can search an ontology function to meet that for certain r > 0 and , .
If there exists another ontology function satisfying for certain . Then, we get for some constant in view of . We rewrote as by means of the fact that can be formulated by using {λ l } and {φ l } as . Set . Then, we deduce and . Obviously, c l = 0 for all l ∈ Λ by means of L K (c) =0. It implies that and . This reveals the uniqueness of .
It is verified above that there exists only one ontology score function in H satisfying the condition above. Thus, it is reasonable to take as the target ontology function for our ontology framework (2).
Now, we present the detailed technologies to calculate the empirical eigenpairs and then obtain the solution of our ontology problem (2).
Let I be the n-th identity matrix and . Set , and we can verify that P is an orthogonal projection matrix such that
Then,
Set
Furthermore, for 1 ≤ l ≤ n, we set
Then, the solution to ontology problem (2) is determined by
Description of ontology algorithms
The new ontology learning algorithm can be used in ontology concepts similarity measurement and ontology mapping. The basic idea is: via the computation of ontology framework (2), the ontology graph is mapped into a line consisting of real numbers. The similarity between two concepts then can be measured by comparing the difference between their corresponding real numbers.
Step 1. Mathematizing ontology information.
Step 2. By obtaining the solution of ontology algorithm (2), we map the ontology graph to the real line and map the vertices of ontology graph to real numbers.
Step 3. For each v ∈ V (G), we obtain the similar vertices and return the outcomes to the users.
Let G1, G2, …, G m be ontology graphs corresponding to ontologies O1, O2, …, O m .
Step 1. Mathematizing ontology information.
Step 2. By obtaining the solution of ontology algorithm (2), we map the ontology graph to the real line and map the vertices of ontology graph to real numbers.
Step 3. For v ∈ V (G i ), where 1 ≤ i ≤ m, we obtain the mapping vertices and return the outcome to the users.
Experiments
In this section, four simulation experiments relevance ontology similarity measure and ontology mapping below are designed.
Experiment on biology data
We use “Go” ontology O1 which was constructed in http://www.geneontology.org. (Figure 1 shows the basic structure of O1) in our experiment. P@N (Precision Ratio, see [41] for more detail) is used to measure the equality of the experiment. First of all, we give the closest N concepts for every vertex on the ontology graph by expert, and then we obtain the first N concepts for every vertex on ontology graph by the algorithm and compute the precision ratio. Huang et al. [12], Gao and Liang [13] and Gao and Gao [42] employed ontology algorithms to “Go” ontology. We compare the precision ratio which we got from four methods. Several experiment results can are shown in Table 1.
When N = 3, 5, 10 or 20, the precision ratio by virtue of our algorithm 1 is higher than the precision ratio determined by algorithms as [12, 42] proposed in their research. Particularly, when N increases, such precision ratios will apparently increase. Conclusion can be drawn from the fact that the algorithm described in our paper is superior to the method proposed by [12, 42].
Experiment on physical education data
We use physical education ontologies O2 and O3 (the structures of O2 and O3 are presented in Figs. 2 and 3 respectively) for our second experiment. This experiment aims at determining the ontology mapping between O2 and O3 via Algorithm 2. P@N criterion is applied to measure the equality of the experiment. We gave the closest N concepts for each vertex on the ontology graph with the help of experts. Then, we obtained the first N concepts for every vertex on ontology graph by the algorithm and compute the precision ratio. Ontology algorithms to “physical education” ontology was also employed in [12, 13, 15]. We made a comparison among the precision ratios which we get from four methods. Several experiment results can be found in Table 2.
Our Algorithm 2 is more efficiently than algorithms raised in [12, 15]. It is especially the case when N is sufficiently large. This is revealed by the experiment results in Table 2.
Experiment on plant data
In this subsection, “PO” ontology O4 which was constructed in http://www.plantontology.org. (Figure 4 shows the basic structure of O4) is used to test the efficiency of our new algorithm for ontology similarity measuring. The P@N standard is used again for this experiment. Furthermore, we apply ontology method in [11–13] to the “PO” ontology in our experiment. We calculate the accuracy by these three algorithms, and also compare the result to algorithm rose in our paper. Part of the data can be referred to Table 3.
When N = 3, 5, or 10, the precision ratio determined by algorithms that [11–13] proposed in their research is lower than the precision ratio in terms of our algorithm. Such precision ratio apparently increase as N increases. That the algorithm described in our paper is superior to the method that [11–13] proposed in their research can be drawn from the paper as a conclusion.
Experiment on humanoid robotics data
We used humanoid robotics ontologies O5 and O6 (constructed by [42], and the structures of O5 and O6 are presented in Figs. 5 and 6 respectively) for our last experiment. This experiment aims at determining ontology mapping between O5 and O6 via Algorithm 2. P@N criterion we applied to measure the equality of the experiment. Ontology algorithms in [12, 43] were employed to humanoid robotics ontologies. We compare the precision ratio that we get from four methods. Several experiment results are shown in Table 4.
That our algorithm 2 works with more efficiency than algorithms [12, 43] raised in their research especially when N is sufficiently large is revealed by the experiment results in Table 4.
Conclusions
In our article, a ranking based computation technology is manifested for ontology similarity measure and ontology mapping application. The detailed optimization scheming relies on the eigenpair computation, and the solution has uniqueness in view of our method of selection. Simulation data in the last part show that new ontology trick has high efficiency in biology, physics education, plant science and humanoid robotics. The ranking based ontology algorithm raised in this paper illustrates promising application prospects in multiple disciplines.
Footnotes
Acknowledgments
We thank all the reviewers for their constructive suggestions on how to improve the quality of this paper. We wish to acknowledge the National Natural Science Foundation of China (11401519).
