Intelligent recognition of semantic relationships based on antonymy

Abstract

Since computing semantic similarity aims to simulate the human thinking process, semantic dissimilarity should also play a part in it. In this paper, we present a new approach to semantic similarity measurement that takes dissimilarity into account during computation. Specifically, the proposed measures explore the potential antonymy in the hierarchical structure of WordNet to represent the dissimilarity between concepts and then combine the dissimilarity with the results of existing methods to obtain semantic similarity results. The relation between parameters and the correlation value is discussed in detail. The proposed model is then applied to different text granularity levels to validate its correctness on similarity measurement. Experimental results show that the proposed approach not only achieves a high correlation value against human ratings but also effectively improves existing path-distance based methods at the word similarity level, and that it corrects an existing sentence similarity method in some cases on the Microsoft Research Paraphrase Corpus and the SemEval-2014 data set.

Keywords

1. Introduction

As one of the most important components of big data, text grows in scale along with people’s daily lives. This growth brings more challenges for text processing, such as automatic text understanding and word spelling error detection and correction [1]. To address these challenges, semantic similarity measuring is used to determine the semantic association of words and measure the hidden similarity of lexical meanings under the surface form similarity of concepts [2]. The methods have been improved through the continuous effort of researchers. In the beginning, there were only methods [3, 4, 5, 6, 7] that measured semantic similarity through the “is-a” relation between words. Now more approaches based on multiple semantic relations [8, 9, 10, 11, 12, 13] have been proposed by considering additional relations such as “part-of”.

Compared with corpus-based measures, which depend heavily on the corpus and have high computational complexity, knowledge-based measures are more suitable for similarity computation. Most state-of-the-art approaches for measuring concept semantic similarity [4, 7, 11, 14, 16] are based on the lexical knowledge database WordNet [15]. WordNet is a large English semantic dictionary. Nouns, verbs, adjectives, and adverbs in WordNet are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interconnected through conceptual-semantic and lexical relations. At first, researchers used only the “is-a” relation to compute semantic similarity and applied this relation to generate three major attributes of concepts: path distance, information content (IC), and features, with each attribute representing a calculation strategy. Path-distance based approaches [3, 4, 17, 18, 19] treat every edge between two synsets as having length 1 and count the fewest edges connecting the compared concepts. The smaller the semantic distance between two concepts, the more similar they are. Thus, the semantic distance between identical concepts is 0. Approaches based on IC [20, 21, 22, 23, 24] assume that each synset is unique because of its structure, which includes the scale of its ancestor nodes, hypernyms, hyponyms, and so on. Therefore, IC-based approaches use the IC value to represent a concept’s structure and abstract quantity. A concept’s IC value will rise with its abstract quantity. For two given concepts, the similarity between them depends not only on the information each carries on its own but also on what they have in common. In the feature-based approaches [25, 26, 27, 28, 29, 30], a concept is represented as a feature vector in which the features are derived from WordNet, such as gloss, hypernyms, and hyponyms. The more features two concepts have in common, the more similar they are.

According to [31] and our own observation, most WordNet-based measures do not use the structure of WordNet to its full potential. For example, methods that neglect antonymy may produce inflated similarity results. Therefore, the main motivation for this work is to find a structure that can make full use of antonymy and correct such over-similar cases. Our main hypothesis is that antonymy between words can exert a negative effect on the semantic similarity values of concepts. For instance, the similarity between “boy” and “girl” should be taken into account when computing the similarity between “boy” and “nun”.

The aim of this paper is to answer two research questions:

RQ1: What kind of structure can make full use of antonymy?

RQ2: How effective is antonymy during similarity computation?

Our main contributions are:

1. An implementation and comparison of existing representative approaches with their calculation factors.
2. A model that introduces antonymy as dissimilarity between concepts into the process of similarity computation.
3. An experiment showing that the proposed model performs well in terms of correlation value on WordNet-based models.

The rest of this paper is organized as follows. Section 2 introduces the main WordNet-based similarity measures. Section 3 presents the new factor for semantic similarity measuring. Applications on different text granularity levels and the experimental results of the proposed model are shown in Section 4. Section 5 discusses the results obtained by different approaches. Section 6 presents conclusions and future research.

2. Literature review

In this section, we take an overview of the WordNet-based semantic similarity methods. According to the number of relations being used, methods can be roughly divided into “is-a” relation-based methods, multiple semantic relation-based methods, and feature-based methods.

2.1 “Is-a” relation-based methods

The “is-a” relation accounts for nearly 80% of all link types in WordNet and expresses the hypotaxis between synsets. It is therefore commonly the first choice, or even the only choice, in similarity computation. According to the different ways of using this relation, the “is-a” relation-based methods can be divided into path-based measures and IC-based measures.

2.1.1 Path-distance based

The main idea of path-based measures lies in the fact that the similarity between two concepts is a function of the path distance linking the concepts and their positions in the taxonomy.

Rada et al. [3] proposed that an ontology can be treated as a directed graph composed of concepts linked by multiple semantic relationships, such as taxonomic (“is-a”) links. Therefore, for two given concepts $c_{i}$ and $c_{j}$, their semantic similarity is mainly affected by the semantic distance, which is counted as the sum of all the edges on the shortest path between them and is equal to the number of “is-a” links from $c_{i}$ to $c_{j}$.
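The edge-counting idea can be illustrated with a minimal sketch. The taxonomy below is a hypothetical toy fragment rather than real WordNet data, and the helper names (`IS_A`, `neighbors`, `path_distance`) are chosen only for this illustration; the sketch simply counts the edges on the shortest “is-a” path with a breadth-first search.

```python
from collections import deque

# Hypothetical toy "is-a" fragment (child -> parent); not real WordNet data.
IS_A = {
    "boy": "male",
    "girl": "female",
    "male": "person",
    "female": "person",
    "person": "entity",
}

def neighbors(node):
    """Nodes one "is-a" edge away from node, in either direction."""
    out = []
    if node in IS_A:
        out.append(IS_A[node])
    out.extend(child for child, parent in IS_A.items() if parent == node)
    return out

def path_distance(a, b):
    """Number of edges on the shortest "is-a" path between a and b."""
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two concepts are not connected

print(path_distance("boy", "girl"))
```

In this toy fragment “boy” and “girl” are four edges apart, and a concept is at distance 0 from itself, in line with the description above.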

However, Rada’s method has the flaw that all concept pairs with equal path distance receive the same similarity. In some cases such pairs are not similar at all under human judgment, which is the limitation of path-only methods.

Wu [17] improved Rada’s work by introducing the depth of concepts and the Least Common Ancestor (LCA) of concepts. Based on the depth of the LCA and the relative location of $c_{i}$ and $c_{j}$, the method can locate the position of the whole LCA structure in WordNet and thus arrive at more precise similarities. The depth can be seen as another form of path distance, which equals the number of “is-a” links from a concept to the root node.
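As a rough illustration of how depth and the LCA enter the computation, the sketch below uses the commonly cited Wu-Palmer form, 2 × depth(LCA) / (depth($c_{i}$) + depth($c_{j}$)). The formula itself is not spelled out in this section, so it is stated here as an assumption; the toy taxonomy and function names are hypothetical, and depth is counted in edges from the root as described above (implementations differ on whether depth counts nodes or edges).

```python
# Hypothetical toy "is-a" fragment (child -> parent); not real WordNet data.
PARENT = {
    "boy": "male", "girl": "female",
    "male": "person", "female": "person",
    "person": "entity",
}

def chain(concept):
    """The concept followed by its ancestors up to the root."""
    out = [concept]
    while out[-1] in PARENT:
        out.append(PARENT[out[-1]])
    return out

def depth(concept):
    """Number of "is-a" links from the concept to the root."""
    return len(chain(concept)) - 1

def lca(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_b = set(chain(b))
    return next(n for n in chain(a) if n in ancestors_b)

def sim_wu(a, b):
    """Assumed Wu-Palmer form: 2 * depth(LCA) / (depth(a) + depth(b))."""
    total = depth(a) + depth(b)
    return 2.0 * depth(lca(a, b)) / total if total else 1.0

print(lca("boy", "girl"), sim_wu("boy", "girl"))
```

Two pairs with the same path distance can now receive different scores when their LCAs sit at different depths, which is the refinement over pure edge counting.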

Building on path distance, Li [4] exploited two more structural factors of concepts to calculate semantic similarity: local density and the depth of the LCA. The authors combined these factors in a variety of ways to form multiple formulas. After tuning parameters on the training dataset, they arrived at a nonlinear formula containing path distance and the depth of the LCA. Its results on the training set are close to human judgment. Moreover, Hao [32] proposed a new model that combines path distance and the depth of the LCA by imitating the human thought process.

Although the path-based measures have developed two more factors, depth and density, their defect is the neglect of most of the structure of WordNet. So they may produce unreliable results.

2.1.2 Information content-based

The initial IC-based measure was proposed by Resnik [20], in which the IC of the LCA represents the information shared by two concepts. However, there are two flaws in Resnik’s method. One is that all pairs of concepts with the same LCA receive the same similarity. The other is that the similarity between a concept and itself is not equal to one.

Jiang [21] improved Resnik’s method by combining path distance and IC. He assumed that IC can be employed to qualify the path distance linking every two concepts. The IC of the LCA is subtracted from the sum of the IC of the individual compared concept. Therefore, their method presents the dissimilarity rather than the similarity.

Lin [22] has also improved Resnik’s approach by using the ratio of the commonalities between concepts $c_{i}$ and $c_{j}$ together with their full information needed as the similarity score between concepts. Based on Lin’s approach, Meng [7] proposed a nonlinear similarity model considering the result of Lin’s measure as the exponent to conduct the similarity computation.
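The three IC-based measures described above can be compared side by side in a small sketch. The closed forms used here are the commonly cited ones (Resnik: IC of the LCA; Jiang: IC($c_{i}$) + IC($c_{j}$) − 2 × IC(LCA) as a distance; Lin: 2 × IC(LCA) / (IC($c_{i}$) + IC($c_{j}$))) and are stated as assumptions, since this section describes them only in words. The function names and the IC values passed in are hypothetical.

```python
def sim_resnik(ic_lca):
    """Resnik: the shared information is the IC of the LCA."""
    return ic_lca

def dist_jiang(ic_i, ic_j, ic_lca):
    """Jiang: a distance, so smaller values mean more similar concepts."""
    return ic_i + ic_j - 2.0 * ic_lca

def sim_lin(ic_i, ic_j, ic_lca):
    """Lin: shared information relative to the total information."""
    total = ic_i + ic_j
    # Returning 0.0 when both ICs are zero is a convention chosen for this sketch.
    return 2.0 * ic_lca / total if total > 0 else 0.0

# Comparing a concept with itself: its LCA is the concept, so all ICs coincide.
ic = 4.0  # hypothetical IC value
print(sim_resnik(ic), dist_jiang(ic, ic, ic), sim_lin(ic, ic, ic))
```

With these forms, the self-comparison yields the raw IC value for Resnik (not one), a distance of zero for Jiang, and exactly one for Lin, which matches the second flaw noted for Resnik’s method.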

Resnik [20] was also the first to propose an IC computation method, assuming that the IC value of a concept is mainly determined by its frequency of appearance in a corpus. In Resnik’s model, the IC of a concept decreases as its frequency of appearance in the corpus grows, which indicates that the more abstract a concept is, the less IC it contains.

Nevertheless, corpora-based IC computation depends on corpora availability and the size of the corpus. The IC of each concept tends to be the same constant when the size of a corpus is large enough, and it requires manual judgment for word sense disambiguation.

Seco [34] put forward an intrinsic IC method that uses the intrinsic structure of the ontology to measure the IC of concepts. In their approach, the number of hyponyms in WordNet is used in place of the occurrences of the given concept. Meng [7] introduced the depth of concepts into the IC-based similarity measure and proposed an IC computation formula combined with the hyponyms of the concept. Experiments showed that the correlation value between similarities measured by the proposed IC computation formula and human judgments was improved.
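A minimal sketch of a hyponym-based IC follows. It assumes the commonly cited form of Seco’s formula, IC(c) = 1 − log(hypo(c) + 1) / log(max_nodes), which this section does not state explicitly; the function name is hypothetical, and the default taxonomy size is the node count reported for the WordNet 3.0 noun taxonomy in Section 4.2.

```python
import math

MAX_NODES = 82115  # noun-taxonomy size reported in Section 4.2

def seco_ic(num_hyponyms, max_nodes=MAX_NODES):
    """Assumed Seco form: fewer hyponyms means a more specific concept and a higher IC."""
    return 1.0 - math.log(num_hyponyms + 1) / math.log(max_nodes)

print(seco_ic(0))              # a leaf concept
print(seco_ic(MAX_NODES - 1))  # a root that subsumes every other node
```

Under this assumption a leaf receives the maximum IC and the root receives an IC of approximately zero, with no external corpus required.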

It can be concluded from Section 2.1.2 that the differences among IC-based measures are the ways of IC computation and the combinations of IC concepts. The IC-based measures are highly dependent on the IC computation because the results and computing time in IC-based measures may be quite different when different IC computation ways are used. Another defect of the corpora-based IC computation is that it needs an additional large text corpus to compute word frequency. Besides, IC-based measures ignore part or all of the structure of the taxonomy, so it normally generates a coarse result for comparison of concepts [33].

2.1.3 Hybrid “is-a” relation measures

Based on the “is-a” relation, hybrid measures combine the structural characteristics in WordNet, such as path length, depth, and local density, and some of the approaches presented above.

Zhou [16] proposed a measure that takes IC measures and path-based measures as parameters. They used a tuning factor to control the contribution of each part (in their experiment $k=$ 0.5). Instead of the conventional path distance between concepts, Zhu [14] proposed a new model that defines the path length by considering edge and density.

2.2 Feature-based

Feature-based approaches make use of the properties of the ontology to obtain similarity between concepts. A concept can be described by a set of words indicating its properties or features, such as their “glosses” and related terms. When two concepts have more common characteristics and less non-common characteristics, they are more similar.

Tversky [25] holds that the similarity between concepts is asymmetric. Features between a subclass and its superclass have a larger contribution to the similarity evaluation than those in the inverse direction. In the approach of Rodríguez and Egenhofer [26], the similarity is computed based on synsets, neighbor concepts (those linked via semantic relations), and features, which are linearly combined by weighting parameters.

Feature-based measures rely on a complete attribute or WordNet gloss set. However, most of them ignore the role of “is-a” relation, which may cause inaccurate similarity results.

2.3 Multiple relation-based measures

Different from all the methods above, multiple semantic relation-based measures tend to search similarity factors from non-taxonomic relationships, such as meronymy, holonymy in WordNet.

2.3.1 Graph-based measures

Graph-based measures aim to use the graph structure to organize multiple relations. One of the recent approaches is a new method for building WordNet graphs using multiple relations, such as synonymy, hypernym and hyponym (“is-a” link), and holonym and meronym (“part-of” link). Stanchev [11] considered concepts’ definitions and examples in WordNet as evidence of the relationships between concepts. Nodes in the WordNet graph are connected with asymmetric directed edges whose weight is affected by their link types. For instance, the weight of the hyponym-type edge is equal to the ratio that 0.9 multiplied by the number of all the hyponym senses of the initial sense. The weights of hypernym-type edges are all equal to 0.3. The weight of a meronym-type edge is set to 0.6/n, where $n$ is the number of meronyms of the sense. The weights of edges for the holonym relationship are set to a constant equal to 0.15. Each weight represents the probability that someone interested in a sense will also be interested in the hypo-hypernym/hol-meronym of this sense.

Cai [35] proposed separate semantic distance computing models for the hyper-hyponym and hol-meronym relations. The final semantic distance between concepts is the minimum over the two link types. In the hyper-hyponym model, for every pair of concepts that lie on the shortest path between $c_{i}$ and $c_{j}$ and are connected by an “is-a” link, the edge weight is affected by the ratio of the ancestor’s scale to the descendant’s scale. In the hol-meronym model, if $c_{i}$ is one of the meronyms of $c_{j}$, the semantic distance between them is equal to the ratio of the number of $c_{j}$’s meronyms to that number plus one; otherwise, the semantic distance between $c_{i}$ and $c_{j}$ is affected by the number of meronyms they have in common.

On the other hand, Quintero [12] introduced a semantic distance defined as the length of the shortest path between concepts in an asymmetrically weighted graph whose weights are automatically refined through a relaxation process.

2.3.2 Multiple relation path-based methods

Hirst and St-Onge [36] determine the relatedness using the path distance between concepts via semantic relations such as hypernymy, hyponymy, and antonymy. Sussna [37] used the sum of weighting shortest path length in WordNet to calculate the relatedness of two concepts, where the edges come from different semantic relations including hypernymy, hyponymy, holonymy, meronymy, and antonymy.

Multiple relation-based measures are a promising direction for computing semantic similarity, because they conform to the view that the computation should simulate the human thinking process and consider a variety of relations between concepts. However, few multiple relation-based measures take the effect of antonymy into account, and those that do simply apply the direct antonyms of concepts. Direct antonymy links between concepts are far fewer than “is-a” links, which leads to less accurate results. In this paper we aim to explore the potential use of antonymy during similarity computation. An approach to mine the potential antonymy is proposed and then applied to existing similarity measures to improve their computation accuracy.

3. Proposed model

In most cases, when the inherent structure of WordNet is used to compute the similarity between concepts, only what the concepts have in common is considered, such as their domain, category, and species, while the dissimilarity between them is ignored. In this paper, we propose an antisense similarity model for measuring the similarity of concepts that explores the effectiveness of the antonym relation under the surface. Based on the existing path-distance based similarity approaches [4, 17, 32, 38, 39], we introduce a new calculation factor called the Antisense Coefficient (AC) into similarity computation by taking account of the antonymy between concepts. Unlike other multiple semantic relation-based measures [36, 37] that simply apply the direct antonyms of concepts, our method proposes a structure named Node to Least Common Ancestor Antisense Path (NLAP), which takes into consideration the negative effect of the antonyms of the concepts’ ancestors and the positive effect of the concepts’ ancestors when computing the concepts’ similarity value in WordNet. To quantify the introduced factor AC of the compared concepts, the model computes the similarity between the concepts and the nodes on their NLAPs. Based on AC, we can obtain a more accurate result that is closer to human judgment.

3.1 An antonymy-based similarity model

Because the WordNet interface of the Natural Language Toolkit (NLTK) only provides antonyms for the lemmas of a concept, the antonyms of a concept can only be reached through the concepts of its lemmas’ antonyms. We therefore propose the following definition:

Definition 1: Let $c$ be a concept in the WordNet, the antonym set of $c$ can be defined as:

$\displaystyle\textit{anti}(c)=\{\textit{synsets}(\textit{antonyms}(\textit{lemma}))\mid\textit{lemma}\in c\}$ (1)

where lemma refers to the lemmas of concept $c$ , $\textit{antonyms}(\textit{lemma})$ is the set of antonyms of a lemma, and $\textit{synsets}()$ can track a lemma’s concept in WordNet.
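Definition 1 can be sketched with plain dictionaries. All three lookup tables below are hypothetical toy data arranged to mirror the “boy” example used later in this section; they are not the actual WordNet assignments. With NLTK, the same information would come from `Synset.lemmas()`, `Lemma.antonyms()`, and `Lemma.synset()`.

```python
# Hypothetical toy lexicon; names and contents are for illustration only.
LEMMAS = {"boy": ["boy", "male_child"], "lad": ["lad"]}      # concept -> its lemmas
ANTONYM_LEMMAS = {"male_child": ["female_child"],              # lemma -> antonym lemmas
                  "boy": ["daughter"]}
CONCEPTS_OF = {"female_child": ["female_child"],              # lemma -> concepts it names
               "daughter": ["daughter"]}

def anti(concept):
    """Eq. (1): concepts reached through the antonyms of the concept's lemmas."""
    result = set()
    for lemma in LEMMAS.get(concept, []):
        for antonym in ANTONYM_LEMMAS.get(lemma, []):
            result.update(CONCEPTS_OF.get(antonym, []))
    return result

print(sorted(anti("boy")), sorted(anti("lad")))
```

A concept whose lemmas have no antonyms simply gets an empty set, which is the case that limits strategy 1 later on.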

Antonyms are the basis of our model, and the proposed factor AC is derived from antonymy between the compared concepts and is represented by the similarity between the concepts and their antonyms. AC plays a corrective role in the process of similarity measuring when it is used to represent the dissimilarity between concepts, which means AC can correct some excessive similar situations by using dissimilarity. In this study, AC is linearly combined with the results of existing path-distance based methods to express the similarity between concepts.

According to the different ways of introducing AC, four strategies are proposed to calculate the similarity between concepts.

Strategy 1: We use the antonyms of compared concepts to calculate AC in order to explore the impact of antonyms. In this case, AC is combined with existing similarity results as a negative factor. The formula is as:

$\displaystyle\textit{Sim}_{\text{strategy1}}(c_{i},c_{j})=\textit{sim}_{\textit{exist}}(c_{i},c_{j})-\alpha\times\textit{antisim}(c_{i},c_{j})$ (2)

where $\textit{sim}_{\textit{exist}}(c_{i},c_{j})$ represents the existing path-distance based methods used to calculate the similarity between $c_{i}$ and $c_{j}$ , and $\alpha$ is a parameter scaling the contribution of AC. $\textit{antisim}(c_{i},c_{j})$ is the function representing the AC of $c_{i}$ and $c_{j}$ . The detail is shown as:

$\displaystyle\textit{antisim}(c_{i},c_{j})=\textit{sim}_{\textit{exist}}(c_{i},\textit{anti}(c_{j}))+\textit{sim}_{\textit{exist}}(\textit{anti}(c_{i}),c_{j})$ (3)

where $\textit{anti}(c_{i})$ and $\textit{anti}(c_{j})$ refer to the antonym sets of $c_{i}$ and $c_{j}$, respectively. It is important to note that some concepts have several antonyms in WordNet. For instance, the concept “boy” in WordNet has two antonyms, “female_child” and “daughter”. Since there may then be more than one similarity value between the antonyms and the compared concept, we take the maximum value; the same applies to the following strategies. For example, using Wu [17]’s method, $\textit{antisim}(\textit{boy},\textit{lad})$ is calculated as follows:

$\displaystyle\textit{antisim}(\textit{boy},\textit{lad})=\textit{sim}_{wu}(\textit{boy},\textit{anti}(\textit{lad}))+\textit{sim}_{wu}(\textit{anti}(\textit{boy}),\textit{lad})=0+\max\{\textit{sim}_{wu}(\textit{female\_child},\textit{lad}),\textit{sim}_{wu}(\textit{daughter},\textit{lad})\}=0+\max\{0.74,0.78\}=0.78$ (4)

However, only two concepts in datasets such as RG65 and MC30 have antonyms, and they are the same in both datasets: “boy” and “brother”. Because few concepts in WordNet have antonyms, AC in strategy 1 has no effect whenever the compared concepts have no antonyms.

The word pairs in datasets RG65 and MC30 that involve antonymy are listed in Table 1. Column AC (Wu) gives the AC computed by Wu’s method, column simWu shows the similarity results of Wu’s method, column antisimWu gives the results of the model combining strategy 1 with Wu’s method, and columns MC30 and RG65 give the human similarity judgments for the compared concepts. It can be observed from Table 1 that the similarity between “brother” and “lad” is 0.78 (on a scale from 0 to 1) in Wu’s method, which is higher than the human judgment values of 1.66 in MC30 and 2.41 in RG65 (both on a scale from 0 to 4). After applying strategy 1, the similarity decreases from 0.78 to 0.71, which is closer to human judgment.

Table 1

The direct antonyms in MC30 and RG65

Word pairs		Antonyms and the other concept	AC (Wu)	simWu	antisimWu ( $\alpha=$ 0.1)	AC (Li)	simLi	antisimLi ( $\beta=$ 0.1)	MC30	RG65
Boy	Rooster	Female_child-rooster	0.52	0.52	0.468	0.11	0.11	0.099	N/A	0.44
		Daughter-rooster
Boy	Sage	Female_child-sage	0.74	0.74	0.666	0.37	0.37	0.333	N/A	0.96
		Daughter-sage
Brother	Lad	Sister-lad	0.7	0.78	0.71	0.3	0.45	0.42	1.66	2.41
Brother	Monk	Sister-monk	0.67	0.95	0.883	0.25	0.82	0.795	2.82	2.74
Boy	Lad	Female_child-lad	0.78	0.95	0.872	0.45	0.82	0.775	3.76	3.82
		Daughter-lad
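The “boy”/“lad” row of Table 1 can be reproduced with a short sketch of Eqs. (2)–(4). The similarity values are copied from Eq. (4) and Table 1; the lookup tables and function names are hypothetical, and missing pairs default to 0 purely for this illustration.

```python
# Wu similarities taken from Eq. (4) and Table 1; the table layout is illustrative.
SIM_WU = {
    frozenset(("boy", "lad")): 0.95,
    frozenset(("female_child", "lad")): 0.74,
    frozenset(("daughter", "lad")): 0.78,
}
ANTI = {"boy": ["female_child", "daughter"], "lad": []}

def sim(a, b):
    """Symmetric lookup; unknown pairs default to 0.0 in this sketch."""
    return SIM_WU.get(frozenset((a, b)), 0.0)

def best(concept, antonyms):
    """Maximum similarity between a concept and a set of antonyms (0 if none)."""
    return max((sim(concept, a) for a in antonyms), default=0.0)

def antisim(ci, cj):
    """Eq. (3), taking the maximum over multiple antonyms as in Eq. (4)."""
    return best(ci, ANTI.get(cj, [])) + best(cj, ANTI.get(ci, []))

def strategy1(ci, cj, alpha):
    """Eq. (2): existing similarity minus the weighted AC."""
    return sim(ci, cj) - alpha * antisim(ci, cj)

print(antisim("boy", "lad"), round(strategy1("boy", "lad", 0.1), 3))
```

With $\alpha=$ 0.1 this gives an AC of 0.78 and a corrected similarity of 0.872, matching the corresponding entries in Table 1.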

Strategy 2: As discussed in strategy 1, few concepts in WordNet have direct antonyms. To explore the effectiveness of antonymy on similarity measuring, we propose a structure called NLAP to represent the hidden antonyms of two concepts.

The key idea of Resnik’s method [20] is that one criterion of similarity between two concepts is “the extent to which they share information in common”, which can be determined by inspecting the relative position of the most-specific concept that subsumes them both in an IS-A taxonomy. Based on Resnik’s method, the Least Common Ancestor (LCA) is used to represent the most-specific subsumer and to explore hidden antonymy between concepts. For example, in Fig. 1 the LCA of school boy#1#1 and monk#1#2 is person#1#3.

Definition 2 (Node to Least Common Ancestor Antisense Path). Let $\textit{path}(c_{i},\textit{LCA})$ be the longest among all possible paths between concept $c_{i}$ and its Least Common Ancestor. We define the path whose nodes are the antonyms of the nodes on $\textit{path}(c_{i},\textit{LCA})$ as:

$\displaystyle\textit{NLAP}(c_{i})=\{\textit{anti}(n)\mid n\in\textit{path}(c_{i},\textit{LCA})\}$ (5)

Concepts are related to their ancestors in the lexical hierarchy of WordNet because they inherit part of their characteristics from their ancestors via the “is-a” relation. As one of the ancestors, the LCA is used to bound the range of ancestors considered. That is, the ancestors of a compared concept that lie deeper in the hierarchy than the LCA can be seen as an extension of that concept and used to find its antonymy. Based on their affiliation, the chosen ancestors are grouped as $\textit{path}(c_{i},\textit{LCA})$ and $\textit{path}(c_{j},\textit{LCA})$, respectively. Thus, the antonyms of the nodes in $\textit{path}(c_{i},\textit{LCA})$ and $\textit{path}(c_{j},\textit{LCA})$ are used to represent the hidden antonyms of $c_{i}$ and $c_{j}$, which are represented as the sets $\textit{NLAP}(c_{i})$ and $\textit{NLAP}(c_{j})$. For example, in Fig. 1, when computing the similarity between school boy#1#1 and monk#1#2, the nodes on $\textit{NLAP}(\textit{school boy{\#}1{\#}1})$ are female#2#5 and female child#1#1.

As NLAP is used to compute AC and linearly combined with the similarity results of existing path-distance based methods, the similarity measure with NLAP is formalized as follows:

$\displaystyle\textit{Sim}_{\text{strategy2}}(c_{i},c_{j})=\textit{sim}_{\textit{exist}}(c_{i},c_{j})-\alpha\times\textit{antisim}(c_{i},c_{j})$ (6)

where $\alpha$ represents the weighting parameter of AC, and $\textit{antisim}(c_{i},c_{j})$ is the function representing the AC of $c_{i}$ and $c_{j}$, defined as:

$\displaystyle\textit{antisim}(c_{i},c_{j})=\left.\left(\sum\limits_{e\in\textit{NLAP}(c_{i})}\textit{sim}_{\textit{exist}}(c_{j},e)+\sum\limits_{f\in\textit{NLAP}(c_{j})}\textit{sim}_{\textit{exist}}(c_{i},f)\right)\right/N$ (7)

where $N$ represents the number of nodes on the two NLAPs, and $e$ and $f$ represent the nodes on $\textit{NLAP}(c_{i})$ and $\textit{NLAP}(c_{j})$, respectively. For a given pair of compared concepts, the AC between them is calculated as the mean value of the similarity results between one concept and the antonyms of all the nodes on the NLAP of the other concept.
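The following sketch walks through Eqs. (5)–(7) on a hypothetical single-parent taxonomy loosely modeled on the Fig. 1 example; it is not real WordNet data. Several choices are assumptions made only for this illustration: the path includes the concept itself and excludes the LCA, each node has one parent (so the “longest path” choice does not arise), the AC is taken as 0 when both NLAPs are empty, and the existing measure is the assumed Wu-Palmer form with depth counted in edges.

```python
# Hypothetical toy taxonomy (child -> parent) and antonym links.
PARENT = {
    "school_boy": "male_child", "male_child": "male", "male": "person",
    "female_child": "female", "female": "person",
    "monk": "religious", "religious": "person", "person": "entity",
}
ANTONYM = {"male_child": ["female_child"], "male": ["female"],
           "female_child": ["male_child"], "female": ["male"]}

def chain(c):
    """The concept followed by its ancestors up to the root."""
    out = [c]
    while out[-1] in PARENT:
        out.append(PARENT[out[-1]])
    return out

def depth(c):
    return len(chain(c)) - 1

def lca(a, b):
    ancestors_b = set(chain(b))
    return next(n for n in chain(a) if n in ancestors_b)

def sim_wu(a, b):
    """Assumed Wu-Palmer form used as the existing measure."""
    total = depth(a) + depth(b)
    return 2.0 * depth(lca(a, b)) / total if total else 1.0

def nlap(c, other):
    """Eq. (5): antonyms of the nodes from c up to, but excluding, the LCA."""
    path = chain(c)
    nodes = path[: path.index(lca(c, other))]
    return [a for n in nodes for a in ANTONYM.get(n, [])]

def antisim(ci, cj, sim=sim_wu):
    """Eq. (7): mean similarity between each concept and the other's NLAP nodes."""
    e_nodes, f_nodes = nlap(ci, cj), nlap(cj, ci)
    n = len(e_nodes) + len(f_nodes)
    if n == 0:
        return 0.0
    return (sum(sim(cj, e) for e in e_nodes) + sum(sim(ci, f) for f in f_nodes)) / n

def strategy2(ci, cj, alpha, sim=sim_wu):
    """Eq. (6): existing similarity minus the weighted AC."""
    return sim(ci, cj) - alpha * antisim(ci, cj, sim)

print(nlap("school_boy", "monk"), strategy2("school_boy", "monk", 0.1))
```

In this toy fragment the NLAP of “school_boy” contains “female_child” and “female”, mirroring the Fig. 1 example, while the NLAP of “monk” is empty, so only the first sum in Eq. (7) contributes.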

Table 2

Comparison of using different methods to compute AC in strategy 2

Method	MC30	Method	MC30	Model (AC calculating methods)	MC30	Model	MC30
Wu	0.74133	Hao	0.82779	strategy2-Wu (Wu)	0.78	strategy2-Wu (Hao)	0.8051
Lch	0.77916	Hao	0.82779	strategy2-Lch (lch)	0.8045	strategy2-Lch (Hao)	0.846
Liu-1	0.79639	Hao	0.82779	strategy2-Liu-1 (Liu-1)	0.8401	strategy2-Liu-1 (Hao)	0.8519
Li	0.79226	Hao	0.82779	strategy2-Li (Li)	0.8386	strategy2-Li (Hao)	0.8556

Figure 1.

“is-a” relation taxonomy fragment in WordNet.

Experiment results showed that AC computed by more precise existing models behaved better on correlation values. For instance, in Table 2 $\textit{strategy2-Wu}(\textit{Wu})$ represents the strategy2 model that uses Wu’s model [17] to compute AC which is then combined with Wu’s similarity results [17]. The correlation value between strategy 2 and human judgment in MC30 is 0.78 which is lower than the value 0.8051 in the model $\textit{strategy2-Wu}(\textit{Hao})$ that combines the same similarity results with the AC calculated by Hao’s model [32]. That is because Hao’s model itself behaves better on the correlation value than Wu’s model.

Strategy 3: Two conclusions can be drawn from strategy 2. The first is that NLAP is an effective structure for representing the hidden antonyms of compared concepts. The second is that using a more precise model to calculate AC may yield a better correlation value. Based on the above, we propose strategy 3, in which strategy 2 is treated as the more precise model and is used to calculate AC in the same way as in Eq. (7); this AC is then combined with the similarity result of strategy 2. In other words, the proposed strategy can be seen as an iteration of strategy 2. The formula of strategy 3 is as follows:

$\displaystyle\textit{sim}_{\text{strategy3}}(c_{i},c_{j})=\textit{sim}_{\text{strategy2}}(c_{i},c_{j})-\beta\ast\textit{antisim}_{\text{strategy2}}(c_{i},c_{j})$ (8)

where $\beta$ represents the weighting parameter of AC, $\textit{sim}_{\text{strategy2}}(c_{i},c_{j})$ represents the similarity results of strategy 2, and $\textit{antisim}_{\text{strategy2}}(c_{i},c_{j})$ refers to the AC calculating model using strategy 2 to replace the existing methods in Eq. (7). As a result of this formulation, strategy 2 is improved as an iteration of itself.
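The iteration in Eq. (8) can be sketched with higher-order functions: strategy 2 is built from an existing measure, and strategy 3 reuses strategy 2 as the measure inside Eq. (7). The similarity and NLAP lookup tables are hypothetical toy values, the concept names `x`, `y`, and `ax` are placeholders, and treating an empty pair of NLAPs as AC = 0 is an assumption of this sketch.

```python
def make_antisim(sim, nlap):
    """Eq. (7) with a pluggable similarity function."""
    def antisim(ci, cj):
        e_nodes, f_nodes = nlap(ci, cj), nlap(cj, ci)
        n = len(e_nodes) + len(f_nodes)
        if n == 0:
            return 0.0
        return (sum(sim(cj, e) for e in e_nodes) + sum(sim(ci, f) for f in f_nodes)) / n
    return antisim

def make_strategy2(sim_exist, nlap, alpha):
    """Eq. (6): the existing measure corrected by its own AC."""
    antisim = make_antisim(sim_exist, nlap)
    return lambda ci, cj: sim_exist(ci, cj) - alpha * antisim(ci, cj)

def make_strategy3(sim_exist, nlap, alpha, beta):
    """Eq. (8): strategy 2 corrected by an AC that is itself computed with strategy 2."""
    s2 = make_strategy2(sim_exist, nlap, alpha)
    antisim_s2 = make_antisim(s2, nlap)
    return lambda ci, cj: s2(ci, cj) - beta * antisim_s2(ci, cj)

# Hypothetical toy data: "ax" stands for a hidden antonym found on the NLAP of "x".
SIM = {frozenset(("x", "y")): 0.8, frozenset(("y", "ax")): 0.5}
NLAP = {("x", "y"): ["ax"]}

def sim(a, b):
    return 1.0 if a == b else SIM.get(frozenset((a, b)), 0.0)

def nlap(c, other):
    return NLAP.get((c, other), [])

s2 = make_strategy2(sim, nlap, alpha=0.2)
s3 = make_strategy3(sim, nlap, alpha=0.2, beta=0.1)
print(s2("x", "y"), s3("x", "y"))
```

Setting $\beta=$ 0 reduces strategy 3 to strategy 2, which is a convenient sanity check when tuning the two parameters.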

Strategy 4: In strategy 2, we have considered the negative effect of concepts’ ancestors on the similarity results. But we ignored the impact of similarity between ancestor sets of concepts. So we propose the following expression:

Definition 3 (Node to Least Common Ancestor Antisense Path Similarity). Let $\textit{NLAP}(c_{i})$ and $\textit{NLAP}(c_{j})$ be the NLAPs of concepts $c_{i}$ and $c_{j}$ , respectively, the similarity between concepts $c_{i}$ and $c_{j}$ ’s NLAPs is defined as:

$\displaystyle\textit{NLAPSim}(c_{i},c_{j})=\sum\limits_{e\in\textit{NLAP}(c_{i}),f\in\textit{NLAP}(c_{j})}\textit{sim}(e,f)/\textit{Number}$ (9)

where Number represents the number of similarity computations between nodes performed when calculating NLAPSim, and $e$ and $f$ refer to the nodes on the NLAPs of $c_{i}$ and $c_{j}$. For the compared concepts $c_{i}$ and $c_{j}$, $\textit{NLAPSim}(c_{i},c_{j})$ is calculated as the mean value of the similarities between the nodes on the NLAP of one concept and the nodes on the NLAP of the other.

As NLAPSim is introduced in the process of computing AC, the effect of ancestors’ antonyms can be considered in similarity computation. Thus, AC in strategy 4 is calculated by combining NLAPSim with NLAP, and strategy 2 is improved as:

$\displaystyle\textit{Sim}_{\text{strategy4}}(c_{i},c_{j})=\textit{sim}_{\textit{exist}}(c_{i},c_{j})-\alpha\times(\textit{antisim}(c_{i},c_{j})-\textit{NLAPSim}(c_{i},c_{j}))$ (10)

where $\alpha$ represents the weighting parameter of AC which is composed of $\textit{NLAPSim}(c_{i},c_{j})$ and $\textit{antisim}(c_{i},c_{j})$ , where $\textit{antisim}(c_{i},c_{j})$ has been defined in Eq. (7). The experimental results of strategy 3 and 4 and the comparison are shown in Section 4. There is also the application of our model on sentence similarity and short text similarity in Section 4.
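Eqs. (9) and (10) can be sketched in the same style. The lookup tables are hypothetical toy values with placeholder concept names, Number is taken to be the count of (e, f) pairs, and both means default to 0 when there is nothing to average; these are assumptions of the sketch rather than details given in the text.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0

def antisim(sim, nlap, ci, cj):
    """Eq. (7): mean similarity between each concept and the other's NLAP nodes."""
    scores = [sim(cj, e) for e in nlap(ci, cj)] + [sim(ci, f) for f in nlap(cj, ci)]
    return mean(scores)

def nlap_sim(sim, nlap, ci, cj):
    """Eq. (9): mean similarity over all pairs of nodes from the two NLAPs."""
    return mean(sim(e, f) for e in nlap(ci, cj) for f in nlap(cj, ci))

def strategy4(sim, nlap, ci, cj, alpha):
    """Eq. (10): the AC is reduced by the similarity between the two NLAPs."""
    ac = antisim(sim, nlap, ci, cj) - nlap_sim(sim, nlap, ci, cj)
    return sim(ci, cj) - alpha * ac

# Hypothetical toy data: "ax" and "ay" are hidden antonyms of "x" and "y".
SIM = {
    frozenset(("x", "y")): 0.8,
    frozenset(("y", "ax")): 0.5,
    frozenset(("x", "ay")): 0.3,
    frozenset(("ax", "ay")): 0.6,
}
NLAP = {("x", "y"): ["ax"], ("y", "x"): ["ay"]}

def sim(a, b):
    return 1.0 if a == b else SIM.get(frozenset((a, b)), 0.0)

def nlap(c, other):
    return NLAP.get((c, other), [])

print(strategy4(sim, nlap, "x", "y", alpha=0.5))
```

When the two NLAPs are more similar to each other than the concepts are to each other’s antonyms, the correction term becomes negative and the final similarity increases, which is how the positive effect of the ancestors enters the model.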

4. Evaluation

4.1 Datasets

In this study, words from datasets widely used in existing research were chosen for the experiments, and the results were compared against human ratings obtained in the same settings. At present, many related approaches [4, 16, 17, 22, 32, 38, 39] use the Miller and Charles (MC30) [40] and Rubenstein and Goodenough (RG65) [41] benchmarks as their test datasets. RG65 includes 65 word pairs rated on a scale from 0 (semantically unrelated) to 4 (highly synonymous) by 51 human subjects in 1965. The other dataset, MC30, contains 30 word pairs derived from RG65. MC30 has an obvious hierarchy of similarity (10 high-level word pairs, 10 intermediate-level word pairs, and 10 low-level word pairs), with scores from 0 (no similarity) to 4 (perfect similarity). The MC30 dataset derives from a replication of the original experiment, with re-rating carried out 25 years later than RG65.

Figure 2.

The process of using strategy 4 to compute similarity.

4.2 Experimental process

In this paper, the noun taxonomy of WordNet 3.0 was used as the taxonomic ontology for the MC30 and RG65 datasets. The Natural Language Toolkit (NLTK) interface to WordNet was used to access the data in WordNet 3.0. The maximum depth and the maximum number of nodes of the WordNet structure are 20 and 82115, respectively.
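The evaluation loop can be sketched as a correlation against human ratings combined with a search over $\alpha$. Two points are assumptions of this sketch and are not stated in the text: the correlation is taken to be Pearson’s coefficient, and $\alpha$ is chosen by a simple grid search. The five data rows are the Wu columns and RG65 ratings from Table 1 and serve only as sample input, so the value printed here is not comparable to the full-dataset results.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def tune_alpha(sims, acs, human, grid):
    """Pick the alpha on the grid that maximizes correlation with human ratings."""
    def score(alpha):
        corrected = [s - alpha * ac for s, ac in zip(sims, acs)]
        return pearson(corrected, human)
    best = max(grid, key=score)
    return best, score(best)

# Sample input only: simWu, AC (Wu), and RG65 columns of Table 1.
SIM_WU = [0.52, 0.74, 0.78, 0.95, 0.95]
AC_WU = [0.52, 0.74, 0.70, 0.67, 0.78]
RG65 = [0.44, 0.96, 2.41, 2.74, 3.82]
GRID = [i / 100 for i in range(0, 101)]  # hypothetical search range 0.00 to 1.00

best_alpha, best_r = tune_alpha(SIM_WU, AC_WU, RG65, GRID)
print(best_alpha, best_r)
```

Because $\alpha=$ 0 is on the grid, the tuned correlation can never fall below that of the uncorrected measure on the same data, which is worth keeping in mind when comparing tuned and untuned rows.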

Table 3
Results compared with existing path-based methods

Algorithm	Calculating factors	MC30	RG65
Path computed as edge counting in Rada’s method
Wu	Path length, depth	0.74133	0.78602
Leacock	Path length	0.77916	0.83872
Liu1	Path length, depth	0.79639	0.84238
Liu2	Path length, depth	0.746549	0.82157
Li	Path length, depth	0.79226	0.8528
Hao	Path length, depth	0.82779	0.85604
Path computed as edge counting in Zhu’s method
Wu_PD	Path length, depth, density	0.8725	0.8576
Leacock_PD	Path length, depth, density	0.8335	0.8448
Li_PD	Path length, depth, density	0.8398	0.8491
Methods combined with strategy 4
Wu	Path length, depth	0.78 ( $\alpha=$ 0.32)	0.7923 ( $\alpha=$ 0.07)
Leacock	Path length	0.8045 ( $\alpha=$ 0.25)	0.8394 ( $\alpha=$ 0.03)
Liu1	Path length, depth	0.8401 ( $\alpha=$ 0.43)	0.8453 ( $\alpha=$ 0.06)
Liu2	Path length, depth	0.8319 ( $\alpha=$ 0.83)	0.829 ( $\alpha=$ 0.1)
Li	Path length, depth	0.8386 ( $\alpha=$ 0.48)	0.8538 ( $\alpha=$ 0.07)
Hao	Path length, depth	0.8772 ( $\alpha=$ 0.57)	0.8566 ( $\alpha=$ 0.03)
Wu_PD	Path length, depth, density	0.8872 ( $\alpha=$ 0.32)	0.8576 ( $\alpha=$ 0.03)
Leacock_PD	Path length, depth, density	0.8555 ( $\alpha=$ 0.37)	0.8451 ( $\alpha=$ 0.02)
Li_PD	Path length, depth, density	0.8703 ( $\alpha=$ 0.48)	0.8494 ( $\alpha=$ 0.03)
Methods combined with strategy 3
Wu	Path length, depth	0.80933 ( $\beta=$ 1.8, $\alpha=$ 1.8)	0.81052 ( $\beta=$ 0, $\alpha=$ 0.2)
Leacock	Path length	0.78475 ( $\beta=$ 1.7, $\alpha=$ 1.8 or exchange)	0.84644 ( $\beta=$ 0.1, $\alpha=$ 0 or exchange)
Liu1	Path length, depth	0.8451 ( $\beta=$ 1.6, $\alpha=$ 1.6)	0.85509 ( $\beta=$ 0.1, $\alpha=$ 0.1)
Liu2	Path length, depth
Li	Path length, depth	0.84166 ( $\beta=$ 1.4, $\alpha=$ 1.3 or exchange)	0.8644 ( $\beta=$ 0, $\alpha=$ 0.2)
Hao	Path length, depth	0.8831 ( $\beta=$ 1, $\alpha=$ 1.3 or exchange)	0.86004 ( $\beta=$ 0.1, $\alpha=$ 0.1)

Table 4

Results compared with various existing methods

Algorithm (strategy)	Type		MC30 (derived from)	RG65 (derived from)
Rodriguez and Egenhofer	Feature		0.71 (Petrakis et al.)	N/A
Tversky			0.73 (Petrakis et al.)	N/A
Petrakis et al.			0.74 (Petrakis et al.)	N/A
David Sánchez et al.			0.83 (Sánchez et al.)	0.857 (Sánchez et al.)
Wu	Edge-counting		0.74133	0.78602
Leacock			0.77916	0.83872
Liu1			0.79639	0.84238
Liu2			0.746549	0.82157
Li			0.79226	0.8528
Hao			0.82779	0.85604
Wu_PD			0.8725	0.8576
Res-Seco	IC	IC calculated in Seco’s method	0.74164	0.79403
Lin-Seco			0.84167	0.84416
Meng-Seco			0.83857	0.84851
Res-Meng		IC calculated in Meng’s method	0.83881	0.8314
Lin-Meng			0.84947	0.86259
Meng-Meng			0.85821	0.87677
Hadj Taieb et al.	Hybrid		0.85 (Hadj Taieb et al.)	0.88 (Hadj Taieb et al.)
Zhou et al.			0.86 (Zhou et al.)	0.87 (Zhou et al.)
Hao (improved by strategy 4)	Edge-counting		0.8772	0.8566
Wu_PD (improved by strategy 4)			0.8872	0.8576
Hao (improved by strategy 3)			0.8831	0.86004

The detailed experimental and measuring process is as follows:

Each word in the word pairs of the MC30 and RG65 datasets is used as an index word to query WordNet 3.0, which returns one or more meanings (concepts or synsets) for it. Because an index word may have more than one meaning, the following formula is used to calculate the similarity of words:

$\displaystyle\textit{sim}(w_{i},w_{j})=\mathop{\max}\limits_{c_{i}\in\textit{sense}(w_{i}),\,c_{j}\in\textit{sense}(w_{j})}|\textit{sim}(c_{i},c_{j})|$ (11)

where $(w_{i},w_{j})$ refers to a word pair to be compared on MC30 or RG65 dataset, $\textit{sense}(w_{i})$ and $\textit{sense}(w_{j})$ represent the sense sets queried from WordNet 3.0 with index words $w_{i}$ and $w_{j}$ respectively. The maximum value of sense similarity is chosen to be on behalf of the words’ similarity.
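
As a minimal sketch of Eq. (11), the following Python snippet takes the maximum absolute concept similarity over all sense pairs of two words. The sense inventory `SENSES` and the lookup table `CONCEPT_SIM` are hypothetical toy values rather than WordNet data, and `concept_sim` stands in for whichever concept-level measure is being evaluated.

```python
from itertools import product

# Hypothetical sense inventory: word -> list of sense identifiers.
SENSES = {
    "coast": ["coast#1", "coast#2"],
    "shore": ["shore#1", "shore#2"],
}

# Hypothetical concept-level similarities (illustrative values only).
CONCEPT_SIM = {
    ("coast#1", "shore#1"): 0.81,
    ("coast#1", "shore#2"): 0.12,
    ("coast#2", "shore#1"): -0.05,
    ("coast#2", "shore#2"): 0.30,
}


def concept_sim(c_i, c_j):
    """Stand-in for any concept-level measure sim(c_i, c_j)."""
    return CONCEPT_SIM.get((c_i, c_j), 0.0)


def word_sim(w_i, w_j):
    """Eq. (11): maximum |sim(c_i, c_j)| over all sense pairs."""
    pairs = product(SENSES.get(w_i, []), SENSES.get(w_j, []))
    return max((abs(concept_sim(a, b)) for a, b in pairs), default=0.0)


score = word_sim("coast", "shore")
```

Words without any sense in the inventory yield a similarity of 0.0 in this sketch; that fallback is a choice made here for illustration.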

When AC is used to calculate the similarity between words, a word may have more than one antonym because it has several senses. In this case, the sense and antonym that yield the maximum similarity are used in the experiment.

The Pearson correlation coefficient is used to compare the experimental results with the human judgments provided in the datasets. Pearson’s $r$ is calculated as follows:

$\displaystyle r=\frac{\sum\nolimits_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\sum\nolimits_{i=1}^{n}(x_{i}-\bar{x})^{2}\times\sum\nolimits_{i=1}^{n}(y_{i}-\bar{y})^{2}}}$ (12)

where $x$ refers to the set of human judgments and $y$ refers to the sets of measurement results, $x_{i}$ and $y_{i}$ refer to the $i$ th element in $x$ and $y$ , respectively. $\bar{x}$ and $\bar{y}$ represent the average values of $x$ and $y$ , respectively, and $n$ is the number of word pairs in the dataset.
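
Eq. (12) can be written directly in a few lines of Python. The sketch below is one straightforward way to compute it; the function name `pearson_r` and the sample ratings are illustrative and do not come from MC30 or RG65.

```python
import math


def pearson_r(x, y):
    """Pearson's r between human judgments x and model scores y (Eq. 12)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square root of the product of the squared deviations.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(ss_x * ss_y)
    if denom == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / denom


# Hypothetical ratings: human scores on a 0-4 scale, model scores on a 0-1 scale.
human = [0.4, 0.9, 1.7, 3.0, 3.8]
model = [0.10, 0.35, 0.40, 0.80, 1.00]
r = pearson_r(human, model)
```

Because $r$ is invariant to linear rescaling, the 0 to 4 human scale and the 0 to 1 model scale can be compared without normalization.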

4.3 Results and comparisons

A series of comparisons were made with various methods to validate the effectiveness of the proposed model.

First, we compared the existing approaches using path length, depth, and density factors with the same approaches combined with our AC model on the measurement of the same datasets. This comparison would reveal the ability of our model to enhance the measurement accuracy of the “is-a” based similarity approaches. Then comparisons between the results of strategy 3 and strategy 4 with the popular similarity algorithms, including IC-based approaches, feature-based approaches, and the hybrid approaches, were made to evaluate whether the edge-based similarity approaches combined with our path model reach a level of excellence.

Table 3 shows the correlations of different similarity methods on the MC30 and RG65 datasets. Firstly, we reproduced some classic approaches [4, 17, 32, 38, 39], such as path distance, depth, and density, and a different path-weighting approach [14]. Then we combined the approaches above with strategy 3 and strategy 4. Table 4 provides a comparison of strategy 4 and strategy 3 with other current mainstream similarity algorithms on the MC30 and RG65 datasets.

4.4 Analysis of the parameters in the metrics

Since the parameter settings of $\alpha$ and $\beta$ affect the final experimental results, analysis of the parameters is necessary.

Since AC plays a corrective role in our model and has a significant influence on the result, the main function of the parameters in the formulas is to weight the corrective function of AC. The following is a detailed analysis of the parameters in Eqs (8) and (10).

To understand the influence of the parameter $\alpha$ in Eq. (10) on the experimental results, we varied $\alpha$ from 0 to 1 and observed how the correlation values change under strategy 4. To observe the corrective function of AC for each model, we combined five methods [4, 17, 32, 38, 39] with strategy 4 over the same range and plotted the results in Fig. 3. From Eq. (10), AC equals 0 when $\alpha$ equals 0, so the correlation value at that point equals that of the original method. In Fig. 3, there are many settings where the correlation value is higher than at $\alpha=0$. In these settings the weight of AC is suitable and improves the original method, and the corrective function of AC is most suitable where the correlation value reaches its maximum. Conversely, a correlation value lower than at $\alpha=0$ indicates overcorrection, which harms the similarity judgments. In Fig. 3, each color represents a model in which the original method is improved by strategy 4. The solid lines show the correlation values of the models as $\alpha$ changes from 0 to 1, and the dashed lines show the correlation values of the original methods. Fig. 3 shows that AC improves the correlation value of three models [4, 32, 39] when 0 $<\alpha<$ 0.99, although the correlation value starts to decrease after reaching its maximum. For the other two models [17, 38], the correlation values fall below those of the original methods when $\alpha>$ 0.6 and $\alpha>$ 0.85, respectively. This demonstrates that AC does not always improve the original methods: when the weight of AC is too large, the correlation value drops below that of the original method, and AC is no longer useful.
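
The parameter sweep described above can be sketched as follows. Note the assumptions: `corrected_scores` uses a simple additive form, base score plus $\alpha$ times a signed correction term, purely as a stand-in for Eq. (10); it only shares the property that $\alpha=0$ reproduces the original method. The data are hypothetical toy values, not MC30 or RG65 results.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient, as in Eq. (12)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def corrected_scores(base, correction, alpha):
    # ASSUMPTION: additive stand-in for Eq. (10), not the paper's formula.
    return [b + alpha * c for b, c in zip(base, correction)]


def sweep_alpha(human, base, correction, steps=100):
    """Return (alpha, r) pairs for alpha = 0, 1/steps, ..., 1."""
    curve = []
    for k in range(steps + 1):
        alpha = k / steps
        scores = corrected_scores(base, correction, alpha)
        curve.append((alpha, pearson_r(human, scores)))
    return curve


# Hypothetical toy data: the correction is nonzero for only two pairs.
human = [0.4, 0.9, 1.7, 3.0, 3.8]
base = [0.45, 0.44, 0.45, 0.82, 1.00]
correction = [-0.5, -0.3, 0.0, 0.0, 0.0]

curve = sweep_alpha(human, base, correction)
baseline_r = curve[0][1]                       # alpha = 0: original method
best_alpha, best_r = max(curve, key=lambda t: t[1])
improving = [a for a, r in curve if r > baseline_r]
```

The list `improving` corresponds to the range of $\alpha$ where a solid line in Fig. 3 lies above its dashed line, and `best_alpha` to the peak of that solid line.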

Figure 3.

Comparison of five existing path-distance based methods improved by strategy 4.

Eq. (8) shows that strategy 3 measures the similarity of the compared concepts twice, and each measurement has its own parameter, $\alpha$ and $\beta$ , respectively. Each measurement proceeds as in Eq. (10), except that NLAPSim is considered when computing AC. Thus, $\alpha$ and $\beta$ in Eq. (8) play the same role as $\alpha$ in Eq. (10): they weight AC. As shown in Fig. 4, we combined Li’s method [4] with strategy 3 and varied $\alpha$ and $\beta$ from 0 to 5.

Figure 4.

Results of combined Li’s method with strategy 3.

Comparing the results with the human judgments in MC30 gives the correlation values. When $\alpha$ and $\beta$ both equal 0, the correlation value is the original correlation value of Li’s method [4]. Lighter colors indicate higher correlation values. Therefore, regions that are lighter than the point where $\alpha$ and $\beta$ equal 0 have correlation values higher than the original correlation value of Li’s method [4].

Fig. 4c shows that when $\alpha$ and $\beta$ fall in the yellow area, the correlation value is higher than that of Li’s method. Moreover, the yellow areas in Fig. 4c are symmetric about the line $y=x$ , which indicates that $\alpha$ and $\beta$ can be exchanged wherever the correlation value is higher than the original one. After exchanging $\alpha$ and $\beta$ , the correlation value remains higher than the original, although the size of the improvement may differ.

From the above analysis, it can be observed that the proposed method can improve other path-based methods and achieve higher correlation values with standard benchmarks. For example, strategy 4 obtains the highest correlation of 0.887 on the MC30 dataset and 0.857 on the RG65 dataset; for strategy 3, the corresponding values are 0.883 and 0.864. With these characteristics, the proposed method is suitable for engineering applications, especially in the fields of knowledge reuse and information management.

4.5 Application of the proposed model on sentence similarity

To further observe the performance of the proposed model on text processing tasks, we conducted comparative tests on the Microsoft Research Paraphrase Corpus (MSRP) [42] for sentence similarity, in which 1725 test pairs were used for the paraphrase recognition task. For each sentence pair, human subjects judged whether either sentence contains a paraphrase of the other. A pair is marked as 1 if it does and as 0 otherwise. We use the method of Rada [43] to calculate the semantic similarity between sentences:

$\displaystyle\textit{sim}(S_{i},S_{j})=\frac{\sum\nolimits_{w\in S_{i}}\textit{maxSim}(w,S_{j})+\sum\nolimits_{w\in S_{j}}\textit{maxSim}(w,S_{i})}{|S_{i}|+|S_{j}|}$ (13)

where $S_{i}$ and $S_{j}$ are the sentences, and $\textit{maxSim}(w,S_{j})$ is the highest word similarity between a word $w$ in $S_{i}$ and all the words in $S_{j}$ . The word similarity is computed either with one of several existing WordNet-based methods or with our model. Note that only the nouns in a sentence are considered when calculating the semantic similarity between sentences.

The threshold is set to 0.5 following Rada [43]. When the similarity between two sentences is higher than or equal to 0.5, the pair is judged to contain a paraphrase. By combining different WordNet-based word similarity methods [4, 17, 39], we obtained the comparison results shown in Table 5.
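
Eq. (13) and the 0.5 threshold can be illustrated with a short Python sketch. The word similarity is passed in as a function, so any of the WordNet-based measures could be plugged in; `toy_word_sim` and the example noun lists below are hypothetical and serve only to make the snippet runnable.

```python
def max_sim(word, sentence, word_sim):
    """maxSim(w, S): best similarity between `word` and any word in `sentence`."""
    return max((word_sim(word, other) for other in sentence), default=0.0)


def sentence_sim(s_i, s_j, word_sim):
    """Eq. (13): symmetric average of best-match word similarities."""
    if not s_i and not s_j:
        return 0.0
    total = sum(max_sim(w, s_j, word_sim) for w in s_i)
    total += sum(max_sim(w, s_i, word_sim) for w in s_j)
    return total / (len(s_i) + len(s_j))


def is_paraphrase(s_i, s_j, word_sim, threshold=0.5):
    """Label a pair as 1 (paraphrase) when the similarity reaches the threshold."""
    return sentence_sim(s_i, s_j, word_sim) >= threshold


# Hypothetical word similarity: 1.0 for identical nouns, table lookup otherwise.
TOY_TABLE = {frozenset(("lawyer", "attorney")): 0.9}


def toy_word_sim(a, b):
    if a == b:
        return 1.0
    return TOY_TABLE.get(frozenset((a, b)), 0.0)


# Each sentence is represented by its nouns only, as in the experiments.
s1 = ["lawyer", "statement"]
s2 = ["attorney", "statement", "court"]
score = sentence_sim(s1, s2, toy_word_sim)   # (0.9 + 1.0 + 0.9 + 1.0 + 0.0) / 5
label = is_paraphrase(s1, s2, toy_word_sim)
```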

Table 5

Performance of various combined word similarity methods on the Microsoft Research Paraphrase Corpus (MSRP)

Combined methods	Accuracy	Recall	F1-measure
Leacock	66.49	100	79.87
Hao	66.49	100	79.87
Wu	66.49	100	79.87
Proposed strategy4 model (Wu) ( $\alpha=$ 0.6)	66.55	100	79.91
Proposed strategy3 model (Wu) ( $\alpha=$ 1, $\beta=$ 1)	66.6	100	79.95
Li	66.43	99.73	79.74
Proposed strategy4 model (Li) ( $\alpha=$ 0.6)	66.67	99.91	79.97
Proposed strategy3 model (Li) ( $\alpha=$ 1.2, $\beta=$ 1.2)	66.73	99.83	79.99
Liu	66.43	99.91	79.80
Proposed strategy4 model (Liu) ( $\alpha=$ 1)	66.61	1	79.96
Proposed strategy3 model (Liu) ( $\alpha=$ 0.9, $\beta=$ 0.9)	66.61	99.83	79.91
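
Table 5 reports accuracy, recall, and F1-measure. As a minimal sketch, assuming the standard definitions of these metrics for binary paraphrase labels (1 for paraphrase, 0 otherwise), they could be computed as follows. The function name and the toy labels are illustrative and are not taken from MSRP.

```python
def paraphrase_metrics(gold, predicted):
    """Accuracy, precision, recall, and F1 for binary labels (1 = paraphrase)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels: the classifier marks almost every pair as a paraphrase.
gold = [1, 1, 0, 1, 0, 0]
predicted = [1, 1, 1, 1, 0, 1]
metrics = paraphrase_metrics(gold, predicted)
```

In this toy case recall is 1.0 while accuracy stays near the share of positive pairs, which is the kind of pattern that a recall of 100 combined with an accuracy around 66 would be consistent with.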

Take the 1248th item in MSRP as an example. The compared sentences are:

sentence1: ‘His lawyer, Pamela MacKey, said Bryant expects to be completely exonerated.’ sentence2: ‘“Mr. Bryant is innocent and expects to be completely exonerated,” Mackey said in a statement.’

These sentences are marked as 0 in MSRP. However, extracting the nouns and using them as index words to query WordNet yields two sets of noun synsets, shown as follows:

Sentence1:

{‘lawyer’: Synset(‘lawyer.n.01’), ‘be’: Synset(‘beryllium.n.01’)}

Sentence2:

{‘Mr.’:Synset(‘mister.n.01’), ‘innocent’:Synset(‘innocent.n.01’),

‘be’:Synset(‘beryllium.n.01’), ‘in’: Synset(‘inch.n.01’), Synset(‘indium.n.01’),

Synset(‘indiana.n.01’), ‘a’: Synset(‘angstrom.n.01’), Synset(‘vitamin_a.n.01’),

Synset(‘deoxyadenosine_monophosphate.n.01’), Synset(‘adenine.n.01’),

Synset(‘ampere.n.02’), Synset(‘a.n.06’), Synset(‘a.n.07’),

‘statement’: Synset(‘statement.n.01’), Synset(‘argument.n.01’), Synset(‘statement.n.03’),

Synset(‘statement.n.04’), Synset(‘affirmation.n.02’), Synset(‘instruction.n.04’),

Synset(‘statement.n.07’)}

Based on these word pairs, the following is the process of sentence similarity computation when using Eq. (13) and Li’s method [4].

$\displaystyle\textit{sim}_{Li}(\textit{sentence}1,\textit{sentence}2)=\frac{(0.45+1)+(0.16+0.45+1+0.67+0.2+0.14)}{2+6}=0.51$ (14)

A similar computation is conducted to these word pairs when using Eq. (13) and our strategy 4 ( $\alpha=$ 0.6).

$\displaystyle\textit{sim}_{\textit{strategy}4(Wu)}(\textit{sentence}1,\textit{sentence}2)=\frac{(0.05+1)+(0.16+0.16+1+0.67+0.2+0.14)}{2+6}=0.42$ (15)

The results show that the similarity between the sentences decreased from 0.51 to 0.42 when using strategy 4 to improve Li’s word similarity method, which is lower than the threshold 0.5 and closer to the human judgment value 0.

The same situation occurs for the 1295th item in MSRP. The compared sentences are:

sentence1: ‘Chante Jawan Mallard, 27, went on trial Monday, charged with first-degree murder.’ sentence2: ‘Chante Jawaon Mallard, 27, is charged with murder and tampering with evidence.’.

These sentences are also marked as 0 in MSRP. The synsets obtained after preprocessing are shown as follows:

Sentence1:

{‘Jawan’: Synset(‘jawan.n.01’), ‘Mallard’: Synset(‘mallard.n.01’),

‘trial’: Synset(‘test.n.05’), Synset(‘trial.n.02’), Synset(‘test.n.04’), Synset(‘trial.n.04’),

Synset(‘trial.n.05’), Synset(‘trial.n.06’), ‘Monday’: Synset(‘monday.n.01’),

‘first’: Synset(‘first.n.01’), Synset(‘first.n.02’), Synset(‘beginning.n.02’),

Synset(‘first_base.n.02’), Synset(‘first.n.05’), Synset(‘first_gear.n.01’),

‘degree’: Synset(‘degree.n.01’), Synset(‘degree.n.02’), Synset(‘academic_degree.n.01’),

Synset(‘degree.n.04’), Synset(‘degree.n.05’), Synset(‘degree.n.06’), Synset(‘degree.n.07’), ‘murder’: Synset(‘murder.n.01’)}

Sentence2:

{‘Mallard’: Synset(‘mallard.n.01’), ‘murder’: Synset(‘murder.n.01’),

‘tampering’: Synset(‘meddling.n.01’), ‘evidence’: Synset(‘evidence.n.01’),

Synset(‘evidence.n.02’), Synset(‘evidence.n.03’)}

The sentence similarity computation using Eq. (13) and Li’s method [4] is:

$\displaystyle\textit{sim}_{Li}(\textit{sentence}1,\textit{sentence}2)=\frac{(0.06+1+0.25+0.11+0.25+0.29+1)+(1+1+0.37+0.29)}{7+4}=0.51$ (16)

The similar computation using Eq. (13) and our strategy 4 is shown as:

$\displaystyle\textit{sim}_{\textit{strategy}4(Wu)}(\textit{sentence}1,\textit{sentence}2)=\frac{(0.005+1+0.23+0.11+0.19+0.28+1)+(1+1+0.36+0.28)}{7+4}=0.495$ (17)

Although these examples show that strategy 4 behaves better, the results in Table 5 show that the accuracy, recall, and F1-measure of existing path-based methods improve only slightly when switching to strategy 4. This means that few items in MSRP changed their label. The reason is that the preprocessing of sentences is imperfect. Words such as ‘a’ and personal names have noun senses in WordNet; these do not help and instead introduce redundant senses when computing word similarities and searching for possible antonyms. This matters all the more because we only use the nouns in each sentence when computing sentence similarity. In future work, we plan to segment sentences properly by part of speech and combine word similarity with string similarity to compute the similarity between corresponding parts of sentences. After segmentation, different weights can be given to word similarity and string similarity according to part of speech. For example, for the parts of sentences that include names, the weight of string similarity would be much higher than that of word similarity.

4.6 Application of proposed model on short text similarity

SemEval (Semantic Evaluation) is an ongoing series of evaluations of computational semantic analysis systems, organized under the umbrella of SIGLEX, the Special Interest Group on the Lexicon of the Association for Computational Linguistics. SemEval-2014 was the 8th workshop on semantic evaluation. Cross-Level Semantic Similarity is one of the SemEval-2014 tasks, and it includes computing the semantic similarity between a paragraph and a sentence, the so-called ‘short texts’. After using Eq. (13) to compute the similarity between sentences, the following formula is used to combine the sentence similarities into the short text similarity.

$\displaystyle\textit{sim}(T_{i},T_{j})=\frac{\sum\limits_{1}^{n}{\max(\textit{sim}(S_{i},S_{j}))}+\sum\limits_{1}^{m}{\max(\textit{sim}(S_{i},S_{j}))}}{m+n}$ (18)

where $T_{i}$ and $T_{j}$ are two short text sets to be compared, the elements in two sets are sentences and are represented as $S_{i}$ and $S_{j}$ , respectively. $m$ and $n$ are the numbers of sentences in the two short text sets. The similarity between short texts is calculated as the mean value of the max similarity between a sentence from one short text set and all the sentences in another short text set.
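
A minimal Python sketch of Eq. (18) follows. It reuses the sentence-level values that appear in Eqs (19) and (20) for a three-sentence text compared with a one-sentence text. The sentence identifiers and the assignment of each value to a particular sentence are assumptions made for illustration, since only the values themselves are reported.

```python
def text_sim(t_i, t_j, sentence_sim):
    """Eq. (18): mean of best-match sentence similarities in both directions."""
    if not t_i or not t_j:
        return 0.0
    best_for_i = sum(max(sentence_sim(a, b) for b in t_j) for a in t_i)
    best_for_j = sum(max(sentence_sim(a, b) for a in t_i) for b in t_j)
    return (best_for_i + best_for_j) / (len(t_i) + len(t_j))


def table_sim(table):
    """Wrap a dict of pairwise sentence similarities as a symmetric function."""
    def lookup(a, b):
        return table.get((a, b), table.get((b, a), 0.0))
    return lookup


# Hypothetical sentence identifiers: Text1 has three sentences, Text2 has one.
TEXT1 = ["t1_s1", "t1_s2", "t1_s3"]
TEXT2 = ["t2_s1"]

# Sentence-level values taken from Eqs (19) and (20).
LI = {("t1_s1", "t2_s1"): 0.52, ("t1_s2", "t2_s1"): 0.36, ("t1_s3", "t2_s1"): 0.43}
S4 = {("t1_s1", "t2_s1"): 0.45, ("t1_s2", "t2_s1"): 0.39, ("t1_s3", "t2_s1"): 0.39}

li_score = text_sim(TEXT1, TEXT2, table_sim(LI))   # 1.83 / 4
s4_score = text_sim(TEXT1, TEXT2, table_sim(S4))   # 1.68 / 4
```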

Experimental results show that the proposed model also works in some short text similarity cases in SemEval-2014. Take the 343rd item in the training set as an example. The compared texts are:

Text1: ‘At what are obviously busy times on the net. I use Firefox via Orange. Anybody else had a problem please?’ Text2: ‘My internet is slowing to a crawl.’

The synsets obtained for the concepts in these texts, processed sentence by sentence, are shown in Table 6.

Table 6

Synsets of concepts in Text1 and Text2

Text	Concept	Synsets
Text1	At	Synset(‘astatine.n.01’), Synset(‘at.n.02’)
	Are	Synset(‘are.n.01’)
	Times	Synset(‘times.n.01’), Synset(‘multiplication.n.03’), Synset(‘time.n.01’), Synset(‘time.n.02’), Synset(‘time.n.03’), Synset(‘time.n.04’), Synset(‘time.n.05’), Synset(‘time.n.06’), Synset(‘clock_time.n.01’), Synset(‘fourth_dimension.n.01’), Synset(‘meter.n.04’), Synset(‘prison_term.n.01’)
	Net	Synset(‘internet.n.01’), Synset(‘net.n.02’), Synset(‘net_income.n.01’), Synset(‘net.n.04’), Synset(‘net.n.05’), Synset(‘net.n.06’)
	I	Synset(‘iodine.n.01’), Synset(‘one.n.01’), Synset(‘i.n.03’)
	Use	Synset(‘use.n.01’), Synset(‘function.n.02’), Synset(‘use.n.03’), Synset(‘consumption.n.03’), Synset(‘habit.n.02’), Synset(‘manipulation.n.01’), Synset(‘use.n.07’)
	Orange	Synset(‘orange.n.01’), Synset(‘orange.n.02’), Synset(‘orange.n.03’), Synset(‘orange.n.04’), Synset(‘orange.n.05’)
	A	Synset(‘angstrom.n.01’), Synset(‘vitamin_a.n.01’), Synset(‘deoxyadenosine_monophosphate.n.01’), Synset(‘adenine.n.01’), Synset(‘ampere.n.02’), Synset(‘a.n.06’), Synset(‘a.n.07’)
	Problem	Synset(‘problem.n.01’), Synset(‘problem.n.02’), Synset(‘trouble.n.01’)
Text2	internet	Synset(‘internet.n.01’)
	Slowing	Synset(‘deceleration.n.01’)
	A	Synset(‘angstrom.n.01’), Synset(‘vitamin_a.n.01’), Synset(‘deoxyadenosine_monophosphate.n.01’), Synset(‘adenine.n.01’), Synset(‘ampere.n.02’), Synset(‘a.n.06’), Synset(‘a.n.07’)
	Crawl	Synset(‘crawl.n.01’), Synset(‘crawl.n.02’), Synset(‘crawl.n.03’)

Table 7

Antonyms in the NLAP of compared concepts

Compared concepts		Antonyms in concepts’ NLAPs
At	Slowing	Synset(‘acceleration.n.01’)
Are	Slowing	Synset(‘acceleration.n.01’)
Times	Internet	Synset(‘inactivity.n.03’), Synset(‘natural_object.n.01’)
Times	A	Synset(‘inactivity.n.03’)
Net	Internet	Synset(‘loss.n.06’) Synset(‘outgo.n.01’) Synset(‘natural_object.n.01’)
Use	Crawl	Synset(‘inutility.n.01’) Synset(‘supply.n.02’) Synset(‘discontinuance.n.01’) Synset(‘inactivity.n.03’) Synset(‘misconception.n.01’)

Table 7 lists part of the set of antonyms that appear in the NLAPs of the compared concepts during computation with our model.

The similarity between the two short texts is rated 0.75 (the range is from 0 to 4) by human judges. The computation using Li’s method [4] is:

$\displaystyle\textit{sim}(\textit{Text}1,\textit{Text}2)=\frac{(0.52+0.36+0.43)+0.52}{3+1}=0.46\ (\text{the range is from 0 to 1})$ (19)

The similarity between the texts decreases to 0.42 (the range is from 0 to 1) when using our strategy 4, which is closer to the human judgment. The computation is:

$\displaystyle\textit{sim}(\textit{Text}1,\textit{Text}2)=\frac{(0.45+0.39+0.39)+0.45}{3+1}=0.42\ (\text{the range is from 0 to 1})$ (20)

Another example is the 347th item in the training set. The compared texts are:

Text3: ‘What do you all the think the impact on the British internet providers has been since the release of online media streaming, not so much youtube but more BBCi player, they allow streamed content of HD and it is hugely popular …surely this has put a lot of strain on the providers servers. Have they had to spend a fortune to upgrade?’ Text4: ‘HD video accounts for over 50% of the bandwidth consumed in Britain on an average weeknight.’

The words in the above texts that have antonyms are shown in Table 7.

The similarity between the above two short texts is rated 0 (the range is from 0 to 4) by human judges. However, the similarity result is 0.31 (the range is from 0 to 1) when using Li’s method [4] and Eq. (13). The computation is:

$\displaystyle\textit{sim}(\textit{Text}3,\textit{Text}4)=\frac{(0.33+0.28)+0.33}{2+1}=0.31\ (\text{the range is from 0 to 1})$ (21)

The similarity between the texts decreases to 0.3 (the range is from 0 to 1) when using our strategy 4 and Eq. (13). The computation is:

$\displaystyle\textit{sim}(\textit{Text}3,\textit{Text}4)=\frac{(0.32+0.26)+0.32}{2+1}=0.3\ (\text{the range is from 0 to 1})$ (22)

4.7 Application of the proposed model on topic analysis

Sentence similarity can also be applied to text classification by computing the similarity between texts, since texts that share a topic tend to receive higher similarity scores. Quora [44] is an American question-and-answer website where questions are asked, answered, followed, and edited by Internet users, either factually or in the form of opinions. Each question has at least one topic, and the answers to a question should have the same topics. To validate the effectiveness of the proposed model on text classification, we chose two questions with different topics and extracted short texts from the answers to these questions. We then used Eqs (10) and (13) to compute the similarities between these short texts. The model is useful for this task if the similarities of same-topic short texts differ from those of different-topic short texts.

The similarity of short texts has been shown in Table 9. The ‘A’, ‘B’, ‘C’, ‘D’, ‘E’ represent part of the answers of the question ‘What do you think about the cryptocurrency Ripple?’, and the ‘F’, ‘G’, ‘H’, ‘I’ are the short texts cut out from the answers of the question ‘What do you think about cooking?’. The details of the short texts have been given in Table 8.

Table 8
The short texts from answers in Quora

Labels of answers	Respondents	Answers
Question: What do you think about the cryptocurrency Ripple?
A	Eric Olson	I think it’s extremely over valued compared to other crypto. No I don’t hate on it because they work with banks. The problem with Ripple is that it’s centrally controlled by one company that holds most of the currency and has basically complete authority over the network.
B	Rene C. Toyer	I have invested in them, most people however find investing in them unethical. Personally the way I feel about the cryptocurrency space is that it is revolutionary however as much as the bank system blatantly rubs us of our hard earn money for services that aren’t worth the fees.
C	Mandar Pande	XRP has good future: It’s Secure, Fast & Scalable digital currency rather an asset. It uses a revolutionary technology. Good team with decent investors and most importantly a great vision.
D	Joe Miller	For myself it’s not something I would support. The vast majority of the tokens are held by RIPPLE. This, and the fact their code allows them to freeze your tokens at their whim allows them to control the value of the currency.
E	Fabio Augusto	I don’t like the idea of a centralized cryptocurrency. They do have innovative projects as a company and as a platform for banks, but the token itself is not getting much from it. I am more advocate of decentralized cryptocurrencies, such as Ethereum and Bitcoin.
Question: What do you think about cooking?
F	Adrienne Boswell	Personally, I love to cook and bake. I’ve been cooking since I was nine years old. My mother liked gourmet food so we always were trying something new and different. My roommate for 7 years was an executive chef, taught me a lot, and always said I missed my calling by not being a professional baker.
G	Larry Canepa	Cooking is one the three greatest passions of my life, so I’d be a little lost without it. Cooking is one of those arts that challenges you continually. You can make something perfect, or nearly perfect, many times in a row only to have an instance where it wasn’t up to your standard.
H	Steve Lynch	I personally enjoy cooking and I am responsible for the bulk of the cooking in our household. I also really enjoy the products of cooking as you can tell by looking at my waistline.
I	Pritam Naldurgakar	As I am chef, cooking is sweetest thing I experience everyday, when you make anything on your own, when yourself get satisfied what you made that’s most important thing and second thing is when others also like it then you have some magic in your hand, while cooking I never get bored.

Table 9

The similarities between answers

Topics of questions	Virtual currencies					Cooking
Answers	A	B	C	D	E	F	G	H	I
A	1.03086	0.71108	0.71755	0.7532	0.72458	0.54956	0.64272	0.56321	0.6752
B	0.711088	1.02294	0.70888	0.66521	0.76938	0.58856	0.72565	0.64915	0.70511
C	0.71755	0.70888	1.05653	0.65745	0.6854	0.53486	0.65718	0.55067	0.66219
D	0.7532	0.66521	0.65745	1.03107	0.65608	0.52658	0.6394	0.62058	0.66229
E	0.72458	0.76938	0.6854	0.65608	0.99985	0.66865	0.72342	0.67879	0.75312
F	0.54956	0.58856	0.53486	0.52658	0.66865	1.02646	0.71004	0.62428	0.66691
G	0.64272	0.72565	0.65718	0.6394	0.72342	0.71004	0.9997	0.68843	0.77092
H	0.56321	0.64915	0.55067	0.62058	0.67879	0.62428	0.68843	1.01154	0.73522
I	0.6752	0.70511	0.66219	0.66229	0.75312	0.66691	0.77092	0.73522	1.013

Table 9 shows that there are boundaries between short texts with different topics. Values in the upper left and lower right blocks are the similarities of short texts with the same topic. The diagonal values, which compare a short text with itself, are around 1, and most of the remaining same-topic values are at or above 0.7. Values in the upper right block are the similarities of short texts with different topics, and only a few of them are higher than 0.7. In this case, the value of 0.7 separates the two groups and can be treated as the threshold for deciding whether two short texts have the same topic. The experimental results indicate that the proposed model is feasible for text classification.
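
The effect of the 0.7 threshold can be checked directly against Table 9. The sketch below copies the off-diagonal upper triangle of Table 9 and counts how many same-topic and different-topic pairs reach the threshold. The data layout, the topic labels, and the function names are choices made here for illustration.

```python
LABELS = "ABCDEFGHI"
# A-E answer the Ripple question, F-I answer the cooking question.
TOPIC = {label: ("currency" if label in "ABCDE" else "cooking")
         for label in LABELS}

# Upper triangle of Table 9: for each row label, the similarities to all
# later labels, in column order (diagonal entries omitted).
UPPER = {
    "A": [0.71108, 0.71755, 0.7532, 0.72458, 0.54956, 0.64272, 0.56321, 0.6752],
    "B": [0.70888, 0.66521, 0.76938, 0.58856, 0.72565, 0.64915, 0.70511],
    "C": [0.65745, 0.6854, 0.53486, 0.65718, 0.55067, 0.66219],
    "D": [0.65608, 0.52658, 0.6394, 0.62058, 0.66229],
    "E": [0.66865, 0.72342, 0.67879, 0.75312],
    "F": [0.71004, 0.62428, 0.66691],
    "G": [0.68843, 0.77092],
    "H": [0.73522],
}


def pair_scores():
    """Yield (label_a, label_b, similarity) for every distinct answer pair."""
    for i, a in enumerate(LABELS[:-1]):
        for offset, score in enumerate(UPPER[a], start=1):
            yield a, LABELS[i + offset], score


def count_above(threshold=0.7):
    """Count pairs at or above the threshold, split by same/different topic."""
    counts = {"same": [0, 0], "different": [0, 0]}   # [above, total]
    for a, b, score in pair_scores():
        key = "same" if TOPIC[a] == TOPIC[b] else "different"
        counts[key][1] += 1
        if score >= threshold:
            counts[key][0] += 1
    return counts


counts = count_above()
```

With the values above, 9 of the 16 same-topic pairs and 4 of the 20 different-topic pairs reach the threshold.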

5. Discussion

The correlation values between different methods and human ratings are listed in Table 4. It can be observed that methods [4, 39] considering more related factors achieve higher correlation values than the single-factor measure [38]. The correlation values of some methods [32, 35] show that the combination of the factors plays an important role in the process of calculating the similarity of concepts. The experimental results confirm the following statements:

1.
Considering more factors in the process of similarity measuring (like human processing) can improve the performance of the measurement.
2.
An appropriate combination of the factors increases the correlation value.

From the above observations, we believe that effective methods combining more related factors contribute to the performance of similarity measures.

Directly employing a single factor to compute the similarity between concepts gives a low correlation with human ratings. Therefore, other factors can be introduced into the process of computing similarity. For instance, the dissimilarity between concepts, which is computed as AC in the proposed model, can be considered while computing the similarity. Unlike other algorithms [36, 37], which only consider direct antonymy, we explored the effect of antonyms of the concepts’ ancestors and further considered the effect of antonyms in the NLAP structure.

The experimental results show that when (0 $<\alpha<$ 0.6), strategy 4 improves the accuracy of semantic similarity assessment. When the value of $\alpha$ and $\beta$ appear in the yellow area in Fig. 4c, strategy 3 achieves a higher correlation value. The proposed method has a high correlation with standard benchmarks. It obtains the highest correlation of 0.887 on MC30, and 0.864 on RG65.

From Table 3, it can be concluded that our model can significantly improve the correlation values of various edge-counting similarity approaches on WordNet, including path-only algorithms [3], path-and-depth-based algorithms [4, 17], and algorithms that combine path length, depth, and density to re-weight the path length [16]. After being combined with strategy 4, the edge-counting similarity approaches mentioned above improved their correlation values by about 5 percent on MC30 and about 0.5 percent on RG65. The size of the improvement decreases as the correlation value of the original method increases. The best correlations reach 0.89 and 0.86 on MC30 and RG65 for WordNet, respectively.
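
The improvement figure for MC30 can be reproduced from Table 3. The sketch below assumes that "percent" refers to the relative gain over the original correlation, which is an interpretation rather than something the text states explicitly. The values are the MC30 correlations of each method before and after being combined with strategy 4.

```python
# MC30 correlations from Table 3: (original method, combined with strategy 4).
MC30 = {
    "Wu": (0.74133, 0.78),
    "Leacock": (0.77916, 0.8045),
    "Liu1": (0.79639, 0.8401),
    "Liu2": (0.746549, 0.8319),
    "Li": (0.79226, 0.8386),
    "Hao": (0.82779, 0.8772),
    "Wu_PD": (0.8725, 0.8872),
    "Leacock_PD": (0.8335, 0.8555),
    "Li_PD": (0.8398, 0.8703),
}


def relative_gain(before, after):
    """Relative improvement in percent (assumed reading of 'percent')."""
    return (after - before) / before * 100.0


gains = {name: relative_gain(before, after)
         for name, (before, after) in MC30.items()}
mean_gain = sum(gains.values()) / len(gains)
largest = max(gains, key=gains.get)
smallest = min(gains, key=gains.get)
```

Under this reading the mean gain is roughly 5 percent, and the smallest gain belongs to Wu_PD, the method with the highest original correlation.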

When combined with strategy 3, the edge-counting similarity approaches improved their correlation values by about 7 percent on MC30 and about 1.5 percent on RG65. Again, the size of the improvement decreases as the correlation values of the original methods increase. The best correlations reach 0.89 and 0.865 on MC30 and RG65 for WordNet, respectively. Our correlations on the MC30 dataset are repeatable for computer-based similarity measures and are quite close to the average correlation (0.9015) between individual human subjects reported in Resnik’s replication [20] of the Miller and Charles experiment.

Table 10
The details of similarity measuring in strategy 4 on MC30

Word pairs	Human judgement	Sim of Li	Antisim of Li ( $\alpha=$ 0.48)	Changed or not
Cord Smile	0.13	0.1128	0.1128	N
Rooster Voyage	0.08	0.0054	0.0054	N
Noon String	0.08	0.0924	0.0924	N
Glass Magician	0.11	0.2335	0.2335	N
Monk Slave	0.55	0.4491	0.4491	N
Coast Forest	0.42	0.3483	0.3483	N
Monk Oracle	1.1	0.2465	0.1728	Y
Lad Wizard	0.42	0.4491	0.1858	Y
Forest Graveyard	0.84	0.1912	0.1912	N
Food Rooster	0.89	0.0415	0.0415	N
Coast Hill	0.87	0.442	0.1827	Y
Car Journey	1.16	0.0179	$-$ 0.0012	Y
Crane Implement	1.68	0.4487	0.4487	N
Brother Lad	1.66	0.4491	0.4028	Y
Bird Crane	2.97	0.5488	0.5488	N
Bird Cock	3.05	0.8187	0.8187	N
Food Fruit	3.08	0.1565	0.0196	Y
Brother Monk	2.82	0.8187	0.8187	N
Asylum Madhouse	3.61	0.8187	0.8187	N
Furnace Stove	3.11	0.1645	0.5129	Y
Magician Wizard	3.5	1	1	N
Journey Voyage	3.84	0.8187	0.8187	N
Coast Shore	3.7	0.8147	0.8147	N
Implement Tool	2.95	0.8184	0.8184	N
Boy Lad	3.76	0.8187	1.1026	Y
Automobile Car	3.92	1	1	N
Midday Noon	3.42	1	1	N
Gem Jewel	3.84	1	1	N
Cemetery Woodland	0.95	0.1912	0.1912	N
Shore Woodland	0.63	0.4254	0.4254	N

Table 10 shows that the similarities of only 8 concept pairs in MC30 change when strategy 4 is used to improve Li’s method [4], which means that the AC of only these 8 concept pairs is nonzero. This demonstrates that antonyms do not appear for every pair of concepts, and it explains why our improvement is limited in some cases. The results in Table 10 also indicate that the correction based on antonymy is effective. Take the concept pair “lad” and “wizard” as an example. The human judgment of the similarity between “lad” and “wizard” is 0.42 (on a scale from 0 to 4). However, Li’s method [4] gives a similarity of 0.4491 (on a scale from 0 to 1), which is much higher than the human judgment. After the improvement by strategy 4, the similarity between “lad” and “wizard” decreases to 0.1858 (on a scale from 0 to 1), which is closer to the human judgment. Similarly, the similarity between “coast” and “hill” decreases from Li’s original value of 0.442 to 0.1827 (both on a scale from 0 to 1), which is closer to the human judgment of 0.87 (on a scale from 0 to 4).

However, there are some overcorrection cases when strategy 4 is combined with Li’s method. For example, the similarity between “boy” and “lad” is greater than 1. This is due to a limitation of our formula: the NLAPSim between “boy” and “lad” is larger than expected and outweighs the corrective function of AC in Eq. (10), which leads to an excessive positive correction. There are also cases of excessive negative correction caused by an oversized AC between concepts. For instance, the similarity between “car” and “journey” is $-0.0012$ (on a scale from 0 to 1).
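
The comparison with human judgment described above can be checked mechanically for the eight changed pairs in Table 10. The following is a minimal sketch, not part of the proposed model: it assumes that the human ratings (0 to 4) can be mapped linearly onto the 0 to 1 range by dividing by 4, and the names `CHANGED`, `HUMAN_MAX`, and `classify` are illustrative.

```python
# Rows of Table 10 whose similarity changed under strategy 4:
# (pair, human rating on 0-4, Li's similarity, corrected similarity)
CHANGED = [
    ("monk-oracle", 1.10, 0.2465, 0.1728),
    ("lad-wizard", 0.42, 0.4491, 0.1858),
    ("coast-hill", 0.87, 0.4420, 0.1827),
    ("car-journey", 1.16, 0.0179, -0.0012),
    ("brother-lad", 1.66, 0.4491, 0.4028),
    ("food-fruit", 3.08, 0.1565, 0.0196),
    ("furnace-stove", 3.11, 0.1645, 0.5129),
    ("boy-lad", 3.76, 0.8187, 1.1026),
]

HUMAN_MAX = 4.0  # assumption: linear rescaling of the 0-4 ratings to 0-1


def classify(rows):
    """Return the pairs that moved closer to the rescaled human rating and
    the pairs whose corrected similarity falls outside [0, 1]."""
    closer, out_of_range = [], []
    for pair, human, before, after in rows:
        target = human / HUMAN_MAX
        if abs(after - target) < abs(before - target):
            closer.append(pair)
        if not 0.0 <= after <= 1.0:
            out_of_range.append(pair)
    return closer, out_of_range


closer, out_of_range = classify(CHANGED)
print("moved closer to human rating:", closer)
print("outside [0, 1]:", out_of_range)
```

Under this assumed rescaling, the sketch flags “car”–“journey” and “boy”–“lad” as falling outside the 0 to 1 range, which matches the overcorrection cases discussed above.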

Table 11
The details of similarity measuring in strategy 3 on MC30

Word Pairs	Human Judgment	Sim of Li	The First Computing ($\alpha=1.4$)	The Second Computing ($\beta=1.3$)	Similarity Changed Times
Cord Smile	0.13	0.1128	0.1128	0.1128
Rooster Voyage	0.08	0.0054	0.0054	0.0054
Noon String	0.08	0.0924	0.0924	0.0924
Glass Magician	0.11	0.2335	0.2335	0.2335
Monk Slave	0.55	0.4491	0.3011	0.3011	Once
Coast Forest	0.42	0.3483	0.3483	0.3483
Monk Oracle	1.1	0.2465	0.1652	0.1652	Once
Lad Wizard	0.42	0.4491	$-0.3189$	$-0.026$	Twice
Forest Graveyard	0.84	0.1912	0.1912	0.1912
Food Rooster	0.89	0.0415	0.0415	0.0415
Coast Hill	0.87	0.442	0.1281	0.1281	Once
Car Journey	1.16	0.0179	$-0.0379$	$-0.0172$	Twice
Crane Implement	1.68	0.4487	0.4487	0.4487
Brother Lad	1.66	0.4491	$-0.1792$	0.1386	Twice
Bird Crane	2.97	0.5488	0.5488	0.5488
Bird Cock	3.05	0.8187	0.8187	0.8187
Food Fruit	3.08	0.1565	$-0.0756$	$-0.0294$	Twice
Brother Monk	2.82	0.8187	0.8187	0.8187
Asylum Madhouse	3.61	0.8187	0.8187	0.8187
Furnace Stove	3.11	0.1645	$-0.1934$	0.727	Twice
Magician Wizard	3.5	1	1	1
Journey Voyage	3.84	0.8187	0.8187	0.8187
Coast Shore	3.7	0.8147	0.8147	0.8147
Implement Tool	2.95	0.8184	0.8184	0.8184
Boy Lad	3.76	0.8187	0.2469	1.2959	Twice
Automobile Car	3.92	1	1	1
Midday Noon	3.42	1	1	1
Gem Jewel	3.84	1	1	1
Cemetery Woodland	0.95	0.1912	0.1912	0.1912
Shore Woodland	0.63	0.4254	0.4254	0.4254

There are two sets of similarity results for strategy 3 in Table 11. This is because strategy 3 is an iteration of strategy 2, so the similarity of the compared concepts may be computed twice in strategy 3, which may produce two ACs. While the similarities of some concepts are corrected as expected in the first computation, the similarities of others are overcorrected, as in strategy 4. The second computation therefore applies another correction to the overcorrected concepts, such as the concept pair “furnace” and “stove” in Table 11.
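
The “Similarity Changed Times” column of Table 11 can be read as the number of computations that actually altered the value. The sketch below reproduces that reading for a few rows of Table 11; interpreting the column this way is an assumption, and the helper name `changed_times` is illustrative.

```python
def changed_times(original, first, second, tol=1e-9):
    """Count how many of the two correction passes altered the similarity."""
    count = 0
    if abs(first - original) > tol:
        count += 1
    if abs(second - first) > tol:
        count += 1
    return count


# (Sim of Li, first computing, second computing) for selected rows of Table 11.
rows = {
    "cord-smile": (0.1128, 0.1128, 0.1128),
    "monk-slave": (0.4491, 0.3011, 0.3011),
    "furnace-stove": (0.1645, -0.1934, 0.727),
    "boy-lad": (0.8187, 0.2469, 1.2959),
}
for pair, values in rows.items():
    print(pair, changed_times(*values))
```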

Table 12
Computer configuration used in the experiment

Software	Computer type	CPU type	CPU frequency	Memory
Pycharm	Desktop PC	i5-4210H	2.9 GHz	12 GB

However, since our strategies are proposed to improve the results of other existing similarity methods [4, 17, 32, 38, 39], our proposed models take longer than the original algorithms when calculating the similarity between concepts. To compare the computing time of each model, we recorded the computing time of some models on MC30 and their computation factors in Table 13. The computer configuration used in our experiment is shown in Table 12. Table 13 shows that the computing time of IC-based methods is much longer than that of other methods and is mainly affected by the choice of IC computation. With different IC computations, the same IC-based method has very different computing times. For example, the computing time of Meng’s method [7] on MC30 is 27.8491 s when Meng’s IC computation [7] is used, which is much longer than the 5.7281 s obtained with Seco’s IC computation [34]. This is because the formulas for IC computation also involve different similarity factors. The difference between Meng’s IC computation and Seco’s IC computation shows that Meng’s method is more complex and needs more similarity factors, such as the descendants of concepts and depth. This indicates that the computing time is influenced by a method’s similarity factors.

Table 13
Details of similarity measuring in strategy 3 on MC30

Methods References Semantic relations’ number Similarity factors Considered concepts’ number Similarity or relatedness Computing time on MC30 (in s)
Path distance Depth Density Features LCA IC Similarity Relatedness
Rada Rada et al. (1989) 2 $\times$ 2 $\times$ N/A
Wu Wu and Palmer (1994) 2 $\times$ $\times$ $\times$ 3 $\times$ 0.1038
Leacock Leacock and Chodorow (1998) 0.0226
Liu-1 Liu et al. (2007) 0.046
Li Li et al. (2003) 2 $\times$ $\times$ $\times$ 3 $\times$ 0.0751
Hao Hao et al. (2011) 2 $\times$ $\times$ $\times$ 3 $\times$ 0.036
Resnik IC computing in Seco’s method Resnik (1995) 2 $\times$ $\times$ 1 $\times$ 2.2469
IC computing in Meng’s method 26.553
Jiang Jiang and Conrath (1997) 2 $\times$ $\times$ $\times$ 3 $\times$ N/A
Lin IC computing in Seco’s method Lin (1998) 2 $\times$ $\times$ 3 $\times$ 5.6808
IC computing in Meng’s method 37.3487
Meng IC computing in Seco’s method Meng et al. (2012) 2 $\times$ $\times$ 3 $\times$ 5.7281
IC computing in Meng’s method 27.8491
Zhou Zhou et al. (2008) 2 $\times$ $\times$ $\times$ $\times$ 3 $\times$ N/A
Zhu Combined with Wu Zhu et al. (2017) 2 $\times$ $\times$ $\times$ $\times$ 3 $\times$ 2.4274
Combined with Lch 2.4289
Combined with Li 2.3439
Tversky Tversky (1977) $\times$ 2 $\times$ N/A
R&E Rodríguez and Egenhofer (2003) $\times$ 2 $\times$ N/A
Petrakis Petrakis et al. (2006) $\times$ 2 $\times$ N/A
Hirst&St-Onge Hirst and St-Onge (1998) 3 $\times$ N/A
Sussna Sussna (1993) 5 $\times$ N/A
Strategy 3 3 $\times$ $\times$ $\times$ $3+n$ $\times$ 2.1484
Strategy 4 3 $\times$ $\times$ $\times$ $3+n$ $\times$ 1.9745

In a method, all similarity factors are based on properties of a concept in WordNet, such as depth and LCA. Thus, the computing time is also influenced by the concepts, especially their number. The number of similarity factors increases when more concepts need to be queried from WordNet. Moreover, querying different relations between concepts in WordNet takes different amounts of time. Hypernymy, hyponymy, holonymy, and meronymy take the same time because they connect concepts directly. However, a concept’s antonyms cannot be queried directly with NLTK. We can only get the antonyms of a concept’s lemmas and then track which concepts these antonyms belong to in order to obtain the antonyms of the concept. Therefore, querying antonymy takes longer than querying other relations in NLTK. Considering more relations in the similarity measurement also takes more time because these relations must be handled separately. Hence, the number of semantic relations also affects the computing time.
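
The lemma-based lookup described above can be sketched as follows. The function relies only on three calls that NLTK's WordNet interface provides (`Synset.lemmas()`, `Lemma.antonyms()`, and `Lemma.synset()`); the function name `concept_antonyms` and the `FakeLemma` and `FakeSynset` stand-ins are hypothetical and exist only so that the sketch runs without the WordNet corpus. This is an illustration of the lookup, not the implementation used in the experiments.

```python
def concept_antonyms(synset):
    """Collect the synsets that own the antonyms of any lemma of `synset`.

    In NLTK's WordNet interface antonymy is defined on lemmas, so the
    concept-level antonyms are recovered through each lemma's owning synset.
    """
    found = []
    for lemma in synset.lemmas():
        for antonym in lemma.antonyms():
            owner = antonym.synset()
            if owner not in found:  # keep first-seen order, no duplicates
                found.append(owner)
    return found


# Tiny stand-ins (hypothetical) so the sketch runs without WordNet data.
class FakeLemma:
    def __init__(self, name):
        self.name, self._antonyms, self._synset = name, [], None

    def antonyms(self):
        return self._antonyms

    def synset(self):
        return self._synset


class FakeSynset:
    def __init__(self, name, lemmas):
        self.name, self._lemmas = name, lemmas
        for lemma in lemmas:
            lemma._synset = self

    def lemmas(self):
        return self._lemmas


hot, cold = FakeLemma("hot"), FakeLemma("cold")
hot._antonyms, cold._antonyms = [cold], [hot]
hot_syn, cold_syn = FakeSynset("hot.a.01", [hot]), FakeSynset("cold.a.01", [cold])
print([s.name for s in concept_antonyms(hot_syn)])
```

With NLTK installed and the WordNet corpus downloaded, the same function can be applied to a real synset, for example `concept_antonyms(wn.synset("good.a.01"))` after `from nltk.corpus import wordnet as wn`. The two nested loops make the extra cost of the antonymy query visible: every lemma of the concept is visited before any concept-level antonym is known.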

In summary, we can conclude that the computing time of similarity measuring increases with the number of similarity factors, the number of considered semantic relations, the number of considered concepts, and the complexity of the formula. In addition, querying the antonymy of concepts with NLTK takes longer than querying other relations.

Table 13 shows that the average computing times of strategy 3 and strategy 4 on the MC30 dataset are 2.1484 s and 1.9745 s, respectively. Because our strategies are attached to other existing methods [4, 17, 32, 38, 39], we obtained five computing times for each strategy, and the computing time in Table 13 is their mean. For example, the computing time of strategy 3, 2.1484 s, is the mean of 2.4129 s (strategy 3 combined with Wu’s method [17]), 1.9924 s (combined with Leacock’s method [38]), 2.1012 s (combined with Li’s method [4]), 2.1041 s (combined with Liu’s method [39]), and 2.1317 s (combined with Hao’s method [32]). The computing time of our strategies is close to that of Zhu’s model [14] and longer than that of the methods we improved. This is because both Zhu’s approach [14] and our approaches can be seen as additional processes attached to existing path-based approaches. Compared with the methods we improved, our strategies consider more similarity factors (AC), more semantic relations, and a larger number of concepts. Therefore, the fact that our methods take more computing time is consistent with the above conclusions. Furthermore, strategy 3 normally takes more time than strategy 4. This is because strategy 3 is an iteration of strategy 2, whereas strategy 4 only adds a similarity factor called NLAPSim to strategy 2; strategy 3 is therefore more complex than strategy 4 and needs an extra computation on the compared concepts.
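
The averaging described above can be reproduced from the five reported times. The sketch below also includes a small timing helper; `time_method` and the toy similarity function passed to it are hypothetical and only illustrate how such a measurement could be taken, not how the reported times were obtained.

```python
from statistics import mean
from time import perf_counter

# Per-combination computing times of strategy 3 on MC30, as reported above (seconds).
STRATEGY3_TIMES = {
    "Wu": 2.4129,
    "Leacock": 1.9924,
    "Li": 2.1012,
    "Liu": 2.1041,
    "Hao": 2.1317,
}


def time_method(similarity, pairs):
    """Wall-clock seconds needed to score every pair with `similarity`."""
    start = perf_counter()
    for first, second in pairs:
        similarity(first, second)
    return perf_counter() - start


average = mean(list(STRATEGY3_TIMES.values()))
print(f"mean strategy 3 time: {average:.4f} s")

# Hypothetical stand-in for a real similarity measure.
elapsed = time_method(lambda a, b: float(a == b), [("car", "automobile"), ("boy", "lad")])
print(f"toy run took {elapsed:.6f} s")
```

The computed mean agrees with the reported 2.1484 s up to rounding in the last digit.
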
6. Conclusion and future work

In this paper, we proposed a new approach to computing the similarity between concepts based on the antonymy in WordNet. The approach is suitable for improving other path-distance based methods that calculate the semantic similarity between concepts. Compared with the existing “is-a” relation-based methods, our proposed model introduces one more structure in WordNet into the similarity computation to make the similarity results more precise. Compared with the existing multiple relation-based measures, our proposed model makes full use of the potential antonymy and introduces it into the similarity computation. In the introduction, we put forward the research questions of this paper. We conclude the paper by answering each of them.

RQ1 What kind of structure can make full use of the antonymy?

Considering how rarely antonymy appears and the positive effect of the least common ancestor, we proposed a model named Node to Least Common Ancestor Antisense Path (NLAP).

The ancestors of a concept can be seen as an extension of the concept because concepts inherit some attributes from their ancestors. The farther a node is from an ancestor, the fewer attributes it can inherit from that ancestor. To some extent, the antonyms of a concept’s ancestors can be seen as antonyms of the concept. Those antonyms are called the NLAP of the concept, and the dissimilarity between two concepts equals the similarity between one concept and the nodes on the NLAP of the other. In addition, the contribution of the ancestors’ antonyms to the dissimilarity computation is determined by the path distance between the concept and each ancestor.
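
The idea of collecting ancestors’ antonyms together with their path distance can be sketched on a toy taxonomy. Everything in the sketch is an assumption made for illustration: the `PARENT` and `ANTONYMS` maps, the function names, and the choice to include both the node itself and the LCA on the path. The weighting of each antonym by its distance is left out because it depends on the formulas of the proposed model.

```python
# Hypothetical toy taxonomy: child -> parent ("is-a"), plus concept-level antonyms.
PARENT = {"boy": "male", "male": "person", "girl": "female", "female": "person"}
ANTONYMS = {"male": ["female"], "female": ["male"]}


def path_to_root(node):
    """Return the node followed by its ancestors up to the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def least_common_ancestor(a, b):
    """First node on a's path to the root that is also an ancestor of b."""
    ancestors_of_b = set(path_to_root(b))
    for node in path_to_root(a):
        if node in ancestors_of_b:
            return node
    return None


def nlap(node, other):
    """Antonyms found on the path from `node` up to its LCA with `other`,
    paired with the number of edges between `node` and the ancestor owning them."""
    lca = least_common_ancestor(node, other)
    collected = []
    for distance, ancestor in enumerate(path_to_root(node)):
        for antonym in ANTONYMS.get(ancestor, []):
            collected.append((antonym, distance))
        if ancestor == lca:
            break
    return collected


print(nlap("boy", "girl"))
```

In this toy example, “boy” has no antonym of its own, but its ancestor “male”, one edge away, contributes the antonym “female”, which is itself an ancestor of “girl”.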

RQ2 How effective is antonymy during similarity computation?

We introduced antonymy into the similarity computation in the proposed model. Experimental results showed that our model obtains its highest correlation of 0.887 on the word dataset MC30 and of 0.864 on the word dataset RG65. The proposed model can correct the over-similar results of existing path distance-based models by introducing the NLAP of concepts.

Furthermore, in order to validate the correctness of the similarity measurement, the proposed model was applied to different granularity levels of text, including words, sentences, and short texts. The results showed that the proposed model also works at these levels.

The Antisense Coefficient (AC), a new calculation factor introduced in this paper, represents the impact of antisense during similarity computation and plays an important role in the proposed model. Depending on how NLAP is used, our model is divided into two strategies. Strategy 4 uses the antisense similarity obtained from the NLAP structure together with the NLAPSim to compute AC. Unlike strategy 4, strategy 3 regards only the antisense similarity obtained from the NLAP structure as AC. However, the similarity of the compared concepts is computed twice in strategy 3.

The ACs in both strategies can have an excessive corrective effect on the similarity measurement. In future work, we will look for a more efficient way of combining AC with other similarity results or other factors. The lexical structure of WordNet is so complex that LCA may not be the best structure to represent hidden antonymy, so finding a more efficient and more effective structure is another interesting direction for future work. Besides, we will explore the impact of other relations in WordNet, such as meronymy and holonymy, on semantic similarity and evaluate the graph and the path weighting used to place each relation. In addition, we will look for a sentence similarity computation method that properly combines concept similarity and string similarity according to the part of speech of words.

Acknowledgments

This work was supported by Natural Science Foundation of Liaoning Province (201602583).

Authors’ Bios

Hui Guan received her B.Sc. and M.Sc. degrees in computer science from the Shenyang Institute of Chemical Technology, China, in 2000 and 2006, respectively, and her PhD in Software Engineering in 2014 from De Montfort University, England. She is currently an associate professor at the School of Computer Science and Technology, Shenyang University of Chemical Technology. She has published 20 refereed journal and conference papers. Her research interests cover software security, model driven development, and software evolution. Prof. Guan received research financial support from the Natural Science Foundation of Liaoning Province in 2016, and from the Research Project of the Education Department of Liaoning Province (China) in 2010 and 2013. E-mail: h.guan@syuct.edu.cn.

ChengZhen Jia received the B.Sc. degree in computer science from Shenyang University of Chemical Technology in 2018 and is now a graduate student at Shenyang University of Chemical Technology. His current research interests include semantic web technologies, natural language processing, text mining, text categorization, feature selection, machine learning, and data mining.

Hongji Yang obtained his B.Sc. and M.Phil. in Computer Science in 1982 and 1985 respectively from Jilin University, China, and his PhD in Computer Science in 1994 from Durham University, England. He is currently a Professor at the Department of Informatics, University of Leicester, England. His general research interests include Software Engineering and Pervasive Computing. He served as a Program Co-Chair at IEEE International Conference on Software Maintenance 1999 (ICSM’99) and the Program Chair at IEEE Computer Software and Application Conference 2002 (COMPSAC’02). Prof. Yang is IEEE Computer Society Golden Core Member, 2010.

References

1. Budanitsky and Hirst, Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures, in: Workshop on WordNet and Other Lexical Resources, 2, 2001, pp. 2–2.
2. Piskorski, Wieloch and Sydow, On knowledge-poor methods for person name matching and lemmatization for highly inflectional languages, Information Retrieval 12(3) (2009), 275–299.
3. Rada et al., Development and application of a metric on semantic nets, IEEE Transactions on Systems, Man, and Cybernetics 19(1) (1989), 17–30.
4. Li, Bandar Z.A. and McLean, An approach for measuring semantic similarity between words using multiple information sources, IEEE Transactions on Knowledge and Data Engineering 15(4) (2003), 871–882.
5. Kim J.W. and Candan K.S., Cp/cv: concept similarity mining without frequency information from domain describing taxonomies, in: Proceedings of the 15th ACM International Conference on Information and Knowledge Management, Association for Computing Machinery, Virginia, 2006, pp. 483–492.
6. Cai and Lu, An improved semantic similarity measure for word pairs, in: 2010 International Conference on e-Education, e-Business, e-Management and e-Learning, IEEE, Sanya, 2010, pp. 212–216.
7. Meng and Zhou, A new model of information content based on concept’s topology for measuring semantic similarity in WordNet, International Journal of Grid and Distributed Computing 5(3) (2012), 81–94.
8. Alvarez M.A. and Lim, A Graph Modeling of Semantic Similarity between Words, in: International Conference on Semantic Computing (ICSC 2007), IEEE, Irvine, CA, 2007, pp. 355–362.
9. Bin et al., Ontology-Based Measure of Semantic Similarity between Concepts, in: 2009 WRI World Congress on Software Engineering, IEEE, Xiamen, 2009, pp. 109–112.
10. Zhao et al., Measuring Semantic Similarity Based on WordNet, in: 2009 Sixth Web Information Systems and Applications Conference, IEEE, Xuzhou, 2009, pp. 89–92.
11. Stanchev, Creating a similarity graph from WordNet, in: Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14), Association for Computing Machinery, Thessaloniki, 2014, pp. 1–11.
12. Quintero et al., Dis-c: Conceptual distance in ontologies, a graph-based approach, Knowledge and Information Systems 59(1) (2019), 33–65.
13. Zhu et al., Measuring similarity and relatedness using multiple semantic relations in WordNet, Knowledge and Information Systems, 2019, 1–31.
14. Zhu et al., An efficient path computing model for measuring semantic similarity using edge and density, Knowledge and Information Systems 55(1) (2018), 79–111.
15. Miller G.A., WordNet: A lexical database for English, Communications of the ACM 38(11) (1995), 39–41.
16. Zhou, Wang and Gu, New model of semantic similarity measuring in wordnet, in: 2008 3rd International Conference on Intelligent System and Knowledge Engineering, IEEE, Xiamen, 2008, pp. 256–261.
17. Wu and Palmer, Verb semantics and lexical selection, arXiv preprint cmp-lg/9406033, 1994.
18. Lee J.H., Kim M.H. and Lee Y.J., Information retrieval based on conceptual distance in IS-A hierarchies, Journal of Documentation, 1993.
19. Al-Mubaid and Nguyen H.A., Measuring semantic similarity between biomedical concepts within multiple ontologies, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 39(4) (2009), 389–398.
20. Resnik, Using information content to evaluate semantic similarity in a taxonomy, arXiv preprint cmp-lg/9511007, 1995.
21. Jiang J.J. and Conrath D.W., Semantic similarity based on corpus statistics and lexical taxonomy, arXiv preprint cmp-lg/9709008, 1997.
22. Lin, An Information-Theoretic Definition of Similarity, in: Proceedings of the Fifteenth International Conference on Machine Learning, Morgan Kaufmann Publishers Inc., 1998, pp. 296–304.
23. Yuan and Wang, A New Model of Information Content for Measuring the Semantic Similarity between Concepts, in: 2013 International Conference on Cloud Computing and Big Data, IEEE, Fuzhou, 2013, pp. 141–146.
24. Harispe et al., The semantic measures library: assessing semantic similarity from knowledge representation analysis, in: International Conference on Applications of Natural Language to Data Bases/Information Systems, Springer, Cham, 2014, pp. 254–257.
25. Tversky, Features of similarity, Psychological Review 84(4) (1977), 327.
26. Rodríguez M.A. and Egenhofer M.J., Determining semantic similarity among entity classes from different ontologies, IEEE Transactions on Knowledge and Data Engineering 15(2) (2003), 442–456.
27. Petrakis E.G. et al., X-similarity: Computing semantic similarity between concepts from different ontologies, Journal of Digital Information Management 4(4) (2006).
28. Pirró, A semantic similarity metric combining features and intrinsic information content, Data & Knowledge Engineering 68(11) (2009), 1289–1308.
29. Sánchez et al., Ontology-based semantic similarity: A new feature-based approach, Expert Systems with Applications 39(9) (2012), 7718–7728.
30. Solé-Ribalta et al., Towards the estimation of feature-based semantic similarity using multiple ontologies, Knowledge-Based Systems 55 (2014), 101–113.
31. Alvarez M.A. and Lim, A Graph Modeling of Semantic Similarity between Words, in: International Conference on Semantic Computing (ICSC 2007), IEEE, Irvine, CA, 2007, pp. 355–362.
32. Hao et al., An Approach for Calculating Semantic Similarity between Words Using WordNet, in: 2011 Second International Conference on Digital Manufacturing & Automation, IEEE, Zhangjiajie, 2011, pp. 177–180.
33. Liu, Bao and Xu, Concept vector for semantic similarity and relatedness based on WordNet structure, Journal of Systems and Software 85(2) (2012), 370–381.
34. Seco, Veale and Hayes, An intrinsic information content metric for semantic similarity in WordNet, in: Proceedings of the 16th European Conference on Artificial Intelligence, IOS Press, 2004, pp. 1089–1090.
35. Cai et al., Measuring distance-based semantic similarity using meronymy and hyponymy relations, Neural Computing and Applications 32(8) (2020), 3521–3534.
36. Hirst and St-Onge, Lexical chains as representations of context for the detection and correction of malapropisms, WordNet: An Electronic Lexical Database 305 (1998), 305–332.
37. Sussna, Word sense disambiguation for free-text indexing using a massive semantic network, in: Proceedings of the Second International Conference on Information and Knowledge Management, Association for Computing Machinery, New York, 1993, pp. 67–74.
38. Leacock and Chodorow, Combining local context and WordNet similarity for word sense identification, WordNet: An Electronic Lexical Database 49(2) (1998), 265–283.
39. Liu, Zhou and Zheng, Measuring Semantic Similarity in Wordnet, in: 2007 International Conference on Machine Learning and Cybernetics, IEEE, Hong Kong, 2007, pp. 3431–3435.
40. Miller G.A. and Charles W.G., Contextual correlates of semantic similarity, Language and Cognitive Processes 6(1) (1991), 1–28.
41. Rubenstein and Goodenough J.B., Contextual correlates of synonymy, Communications of the ACM 8(10) (1965), 627–633.
42. Dolan et al., Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources, 2004.
43. Mihalcea, Corley and Strapparava, Corpus-based and knowledge-based measures of text semantic similarity, in: AAAI, AAAI Press, Boston, 2006(6), 2006, pp. 775–780.
44. Quora, https://www.quora.com/#.
