A novel model for semantic similarity measurement based on WordNet and word embedding

Abstract

To measure semantic similarity between words, this paper presents a novel model, DFRVec, that encodes multiple kinds of semantic information about a word in WordNet into a vector space. First, three sub-models are proposed: 1) DefVec, which encodes the definitions of a word in WordNet; 2) FormVec, which encodes the part of speech (POS) and forms of a word in WordNet; 3) RelVec, which encodes the relations of a word in WordNet. Then, by combining the three sub-models with an existing word embedding, a new model for generating word vectors is built. Finally, based on DFRVec and the path information in WordNet, a new method, DFRVec+Path, is presented for measuring semantic similarity between words. Experiments on ten benchmark datasets show that DFRVec+Path outperforms many existing methods on semantic similarity measurement.

Keywords

1 Introduction

Measuring semantic similarity between words is an important problem in natural language processing, with applications in text retrieval [1], identification of Gene Ontology terms [2], measurement of medical language [3], identification of entities in the biological field [4], word sense disambiguation [5], sentiment classification [6, 7], and topic sentiment analysis [8].

Computing semantic similarity from semantic resources such as ontologies and dictionaries is a common approach. A prominent example is WordNet [9], a mature, manually organized database of semantic information. WordNet records several kinds of semantic information about a word, including its definitions, POS, forms, and relations. Most existing WordNet-based methods for semantic similarity measurement rely on the semantic taxonomy [10], which is derived from hypernym and hyponym relations and provides information such as the path, depth, and density of synsets. For example, the method of Wu [11] is based on the LCS (Least Common Subsumer) of two synsets and their depth; the method of Zhu [12] introduces edge and density information from WordNet into the path computation model; and the method of Lin [13] is similar to that of Wu, except that it uses IC (information content, i.e., the amount of information contained in a word) instead of depth. The main advantage of WordNet-based methods is that they can represent the different senses of a word separately while capturing multiple kinds of semantic information at once. Their disadvantage is that they can handle only nouns and verbs, not adjectives and adverbs. Limited by the size and structure of WordNet, these methods find it difficult to achieve a breakthrough in performance.

Another common approach measures semantic similarity as the cosine score between word vectors in a word embedding model. The most commonly used word embedding models in recent research are word2vec [14] and GloVe [15]. Note that GloVe in this article refers only to the Common Crawl GloVe vectors released by Pennington et al., which have 300 dimensions and were trained on a corpus of 42 billion tokens. Without external semantic information, however, it is difficult for word2vec and GloVe to distinguish the senses, POS, and forms of the same word. Many improved models build on word2vec and GloVe, such as GloVe-d [16], GW597 [16], Paragram-sl [17], and Counter-fitting [18]. Paragram-sl and Counter-fitting both introduce external semantic information (similarity relations for Paragram-sl; synonyms and antonyms for Counter-fitting) and encode it into a vector space, which improves their performance. However, each of them introduces only a single kind of semantic information.

The research above shows that combining word embedding with semantic information about words can improve the accuracy of word similarity computation. To improve this accuracy further, this paper focuses on designing a new model that encodes the various kinds of semantic information about words contained in WordNet into a vector space through word embedding, and on using that model to measure semantic similarity between words. The main contributions of this paper are as follows. First, we propose a novel model, DFRVec, which encodes the definitions, POS, forms, and relations of a word contained in WordNet into a vector space based on word embedding. Second, based on DFRVec, we propose a new method, DFRVec+Path, to measure semantic similarity between words. Third, comparison experiments with related methods on ten benchmark datasets show that the proposed method improves the performance of semantic similarity measurement.

The rest of this paper is organized as follows: Section 2 introduces related work; Section 3 describes the proposed model in detail; Section 4 experimentally compares its performance with existing methods; and Section 5 summarizes our work.

2 Related work

Existing methods for semantic similarity measurement can be roughly divided into three categories: knowledge-based methods, corpus-based methods, and combined methods, which are described in the following subsections.

2.1 Knowledge-based methods

Here, “knowledge” refers to semantic resources such as ontologies, encyclopedias, or dictionaries, which contain different kinds of semantic information, such as definitions, POS, and relations.

Ontology-based methods are the most popular knowledge-based methods and, over the past thirty years, have been the main approach to semantic similarity measurement. WordNet is the most widely used ontology in this field. The basic unit of WordNet is the synset, which consists of a unique definition and a set of synonyms. For example, the synset “smile.v.02” denotes the second verb sense of “smile”; its definition is “express with a smile”, and “smile” is the only word in this synset.

Most WordNet-based methods rely on the semantic taxonomy. There are three main kinds of methods for calculating semantic similarity based on WordNet:

(1) Path-based methods: The semantic similarity between two words is represented by the length of the path between their synsets in WordNet. For example, the Path model of Rada [19] computes the semantic similarity between two synsets from path information as follows:

${sim}_{Path}({Syn}_{i}, {Syn}_{j}) = \frac{1}{1 + dist({Syn}_{i}, {Syn}_{j})}$ (1)

where dist(Syn_i, Syn_j) returns the length of the shortest path between the synsets Syn_i and Syn_j, and the similarity score is 0 if there is no path between them. However, path length does not always reflect semantic similarity [20]: in many cases, two pairs of synsets separated by the same path length differ in how similar they actually are.

(2) Feature-based methods: The main idea is to represent a synset as a set of features that reflect its attributes or aspects [10]. Feature-based methods measure semantic similarity between words according to structural features of the semantic taxonomy, including nodes and edges: the more common features and the fewer non-common features two synsets have, the more similar they are. Taieb [21] presented a method to measure semantic similarity between concepts and words based on the “is-a” semantic taxonomy; the method combines hyponym and depth parameters.

(3) IC-based methods: Information content (IC) refers to the amount of information carried by a synset in WordNet. Early work estimated information content from word frequencies, so that less frequent words carry more information. As research progressed, IC models became considerably more complex and incorporated more comprehensive semantic information, and IC-based similarity calculation improved substantially, as in the methods proposed by Meng [22], Adhikari [23], Cai [24], and Zhang [25]. These methods first calculate the information content of each synset and then compute the similarity between words from the results.
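To make formula (1) and the IC idea concrete, the following Python sketch computes the Path similarity and, as one representative IC-based measure, Lin's similarity [13] on a tiny hand-made taxonomy. The taxonomy, the counts, and all function names are hypothetical, and IC is computed in the common form IC(c) = -log p(c), where p(c) is the relative frequency of c together with every concept it subsumes. This is an illustration under those assumptions, not the implementation used by any of the cited methods.

```python
import math

# Hypothetical toy "is-a" taxonomy (child -> parent) with made-up corpus
# counts per concept; neither is taken from WordNet.
PARENT = {"animal": "entity", "dog": "animal", "cat": "animal",
          "artifact": "entity", "car": "artifact"}
COUNTS = {"entity": 0, "animal": 10, "dog": 40, "cat": 30,
          "artifact": 5, "car": 15}


def ancestors(c):
    """The concept itself followed by its hypernyms up to the root."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain


def lcs(a, b):
    """Least common subsumer: the first ancestor of a that also subsumes b."""
    anc_b = set(ancestors(b))
    return next(x for x in ancestors(a) if x in anc_b)


def dist(a, b):
    """Number of edges on the path a -> lcs -> b (the shortest path in a tree)."""
    top = lcs(a, b)
    return ancestors(a).index(top) + ancestors(b).index(top)


def sim_path(a, b):
    """Formula (1)."""
    return 1.0 / (1.0 + dist(a, b))


def subsumed_count(c):
    """Frequency of c including the counts of every concept it subsumes."""
    return sum(n for concept, n in COUNTS.items() if c in ancestors(concept))


TOTAL = subsumed_count("entity")  # the root subsumes every concept


def ic(c):
    """Information content IC(c) = -log p(c)."""
    return -math.log(subsumed_count(c) / TOTAL)


def sim_lin(a, b):
    """Lin's measure: 2 * IC(lcs) / (IC(a) + IC(b))."""
    denom = ic(a) + ic(b)
    return 2 * ic(lcs(a, b)) / denom if denom > 0 else 1.0


print(sim_path("dog", "cat"), sim_path("dog", "car"))
print(sim_lin("dog", "cat"), sim_lin("dog", "car"))
```

Because the toy taxonomy is a single tree, every pair is connected; in WordNet some synset pairs have no connecting path, in which case formula (1) assigns 0. NLTK's WordNet interface provides comparable measures (`path_similarity`, and `lin_similarity`, which requires an information-content dictionary).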

2.2 Corpus-based methods

Corpus-based methods, such as the well-known GloVe and word2vec word embeddings, are learned from very large corpora and rest on the distributional hypothesis introduced by Harris: words that appear in similar contexts tend to be semantically similar. Compared with knowledge-based methods, corpus-based methods offer wider vocabulary coverage and are not restricted by manually constructed semantic resources; for example, most WordNet-based methods cannot measure semantic similarity between adjectives or adverbs. Recently, many researchers have proposed word embedding models with good results. For instance, FRAGE [26] is an improved word embedding model relevant to semantic similarity measurement: it addresses the problem that the embeddings of a rare word and a popular word can be far apart even when the two words are semantically similar, making the learned embeddings more effective. In general, a word embedding maps each word in a vocabulary to a dense vector, and its linear characteristics depend on its training algorithm [16].
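Since the embedding-based similarity scores in this paper all reduce to the cosine between word vectors, a minimal, dependency-free Python version is sketched below. The toy vectors and the convention of returning 0 for an all-zero vector are assumptions for illustration; in practice one would load pretrained word2vec or GloVe vectors (for example with a library such as gensim) instead of hand-written lists.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector is all zeros (a convention chosen here)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional vectors; the models used in this paper have
# a few hundred dimensions and vocabularies of hundreds of thousands of words.
EMB = {
    "smile": [0.9, 0.1, 0.3],
    "grin": [0.8, 0.2, 0.35],
    "car": [-0.1, 0.9, 0.0],
}

print(round(cosine(EMB["smile"], EMB["grin"]), 3))
print(round(cosine(EMB["smile"], EMB["car"]), 3))
```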

In semantic similarity research, corpus-based methods generally perform better than knowledge-based methods. Word embedding nevertheless has weaknesses. For example, it cannot handle polysemy well, because a single vector per word ignores the senses, POS, forms, and relations of the word.

2.3 Combined methods

Combined methods, which unite the advantages of knowledge-based and corpus-based methods, have become a popular research area in recent years. In particular, methods that combine WordNet with word embedding have achieved better results.

For example, Lee [16] proposes several simple and practical ideas for optimizing word embedding and for combining word embedding with WordNet, and presents three methods: GloVe-d, GW597, and GW597 + Path. GloVe-d, which removes three abnormal dimensions from GloVe, achieves very good performance. GW597, a 597-dimensional model that combines GloVe-d and word2vec, is a state-of-the-art method for semantic similarity measurement. GW597 + Path, which combines GW597 with Path, outperforms both GloVe-d and GW597.

In GW597 + Path, the semantic similarity between words w_i and w_j is defined as follows:

${sim}_{GW597 + Path}(w_{i}, w_{j}) = \max_{m, n} \left[ \omega \cos(v_{w_{i}}, v_{w_{j}}) + (1 - \omega) \frac{1}{1 + dist({Syn}_{w_{i}, m}, {Syn}_{w_{j}, n})} \right]$ (2)

where dist(Syn_{w_i,m}, Syn_{w_j,n}) is the distance between the synsets Syn_{w_i,m} and Syn_{w_j,n}, which belong to the words w_i and w_j respectively; ω is a weighting factor; and cos(v_{w_i}, v_{w_j}) returns the cosine of the GW597 vectors v_{w_i} and v_{w_j}. The Path method mentioned above computes the semantic similarity between the words w_i and w_j as follows:

${sim}_{Path}(w_{i}, w_{j}) = \max_{m, n} \frac{1}{1 + dist({Syn}_{w_{i}, m}, {Syn}_{w_{j}, n})}$ (3)

The Counter-fitting model proposed by Mrkšić [18] uses antonyms and synonyms from PPDB (the Paraphrase Database) and WordNet to tune word vectors. Paragram-sl, proposed by Wieting [17], is tuned with the SL999 dataset [27] by rewarding only similarity relations in the word vector model. Scheepers [28] proposes combining the semantic information contained in the definitions of a word's synsets in WordNet with word embedding, and trains and tests a new model on a dataset consisting of these definition sentences. All of these models tune word vectors with external semantic information and have achieved very good results. However, their vocabularies are relatively small, and both the vocabulary size and the scale of external semantic information need to be expanded further. In addition, more types of semantic information should be considered to improve the ability of the vectors to represent different types of words.

3 Proposed model

In the proposed model DFRVec, various kinds of semantic information that are manually and carefully described in WordNet, such as the definitions, POS, forms, and relations of the synsets of a word, are used to generate a new vector representation of the word. DFRVec encodes this semantic information into a vector space (as a set of vectors) by using an existing word embedding model. DFRVec consists of three new sub-models, DefVec (encoding the definitions), FormVec (encoding the POS and forms), and RelVec (encoding the relations), and also incorporates the original word embedding to retain the semantic information learned from a corpus.

The process of the DFRVec model is shown in Fig. 1. Suppose that word w has r synsets in total, Synset_w,1, ..., Synset_w,r. DFRVec encodes w as a set of vectors DFRVec_w,1, ..., DFRVec_w,r, one per synset, each of which combines the vectors from DefVec, FormVec, and RelVec with the vector v_w from the word embedding model.

Fig. 1

The process by which the DFRVec model encodes various kinds of semantic information about a word in WordNet into a vector space.

The sub-models are described in detail in the following subsections.

3.1 DefVec

Each synset of a word in WordNet has a unique definition that describes important semantic information about the synset. Duong [2] represents each Gene Ontology term by encoding the term's definition from the Gene Ontology database with an existing sentence encoder, and then estimates the similarity of two terms from their vectors. Mapping the definition of a word to a vector is thus a good way to represent the word. Unlike Duong's method [2], however, the DefVec sub-model designed in this paper encodes the semantic information in each definition into a vector representation through word embedding rather than a sentence encoder.

Let w be the target word. To calculate the vector DefVec_w,i for the i'th synset of w, DefVec first represents w and the words in the synset's definition as vectors through word embedding; second, it calculates the cosine similarity between each definition word and w; third, it removes the definition words whose scores are below a given threshold; and finally, it averages the vectors of the remaining words.

The detailed computation process of DefVec_w,i is described as follows:

Step 1. The word set Defs_w,i of length D is obtained from the synset's definition by tokenizing it and removing numbers, punctuation, and the stopwords in NLTK's stopword list:

${Defs}_{w, i} = \{w_{1}, w_{2}, \ldots, w_{D}\}$

Step 2. The corresponding vector set V_w,i of Defs_w,i is:

$V_{w, i} = \{ v_{d} \mid d \in {Defs}_{w, i} \text{ and } \cos(v_{w}, v_{d}) \geq \alpha \}$

where v_d and v_w are the vector representations of the words d and w in the word embedding; α is a threshold for filtering out words with low similarity to the target word, set to 0.05 based on our experiments; and cos(v_w, v_d) returns the cosine of v_w and v_d.

Step 3. Averaging the vectors in V_w,i gives:

${DefVec}_{w, i} = \begin{cases} \frac{1}{|V_{w, i}|} \sum_{m = 1}^{|V_{w, i}|} V_{w, i}[m], & V_{w, i} \neq \emptyset \\ v_{w}, & V_{w, i} = \emptyset \end{cases}$ (4)

Here and hereafter, Σ denotes dimension-wise summation of vectors.
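The three DefVec steps can be sketched in Python as follows, assuming the word embedding is held in a plain dict. The stopword set is a small hypothetical stand-in for NLTK's list, the regex tokenizer simplifies NLTK tokenization, and skipping definition words that are missing from the embedding is our assumption, since the text does not say how such words are handled.

```python
import math
import re


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def mean_vector(vectors):
    """Dimension-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def def_vec(word, definition, emb, stopwords, alpha=0.05):
    """Sketch of DefVec_{w,i} (formula (4)) for one synset definition."""
    v_w = emb[word]
    # Step 1: lowercase, keep alphabetic tokens only (this drops numbers and
    # punctuation), remove stopwords; dict.fromkeys de-duplicates in order.
    tokens = re.findall(r"[a-z]+", definition.lower())
    defs = list(dict.fromkeys(t for t in tokens if t not in stopwords))
    # Step 2: keep in-vocabulary words whose cosine with w is at least alpha.
    kept = [emb[d] for d in defs if d in emb and cosine(v_w, emb[d]) >= alpha]
    # Step 3: average the survivors, or fall back to v_w if none survive.
    return mean_vector(kept) if kept else list(v_w)


# Hypothetical 2-d vectors and a tiny stopword list (the paper uses NLTK's).
EMB = {"smile": [1.0, 0.0], "express": [0.6, 0.8], "tax": [-1.0, 0.1]}
STOPWORDS = {"a", "an", "the", "with", "on", "of"}

print(def_vec("smile", "express with a smile", EMB, STOPWORDS))
```

In this example the target word itself occurs in the definition (“express with a smile”) and passes the filter with cosine 1; the text does not say whether such occurrences are removed, so the sketch keeps them.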

3.2 FormVec

POS is a basic semantic feature available for every synset of a word in WordNet, yet many word embedding models ignore semantic features such as POS. In a word embedding, on the one hand, a word with more than one POS may have a single vector representation; on the other hand, one POS of a word may have several forms, each with its own vector representation. For example, the word “smile” has two POS, noun and verb. Its noun forms are “smile” and “smiles”, while its verb forms are “smile”, “smiles”, “smiled”, and “smiling”, and these forms may have different vector representations.

Therefore, the proposed FormVec sub-model encodes the semantic information of the POS and forms of each synset into a vector space through word embedding. It can distinguish the different POS of a word while bringing the synsets of the word that share a POS closer to each other.

Given a target word w, the FormVec_w,i vector of the i’th synset of w is computed by the following three steps:

Step 1. Find the set of all distinct forms of w for the POS of the i'th synset, excluding the target word w itself:

${Forms}_{w, i} = forms({POS}_{w, i}) \setminus \{w\}$

where POS_w,i is the POS of the i'th synset of w, and forms(POS_w,i) is a function from the pattern.en module that returns all possible forms of w with that POS.

Step 2. The vector set V_w,i of size S (S is the number of words in Forms_w,i) is obtained from the word embedding model:

$V_{w, i} = \{ v_{f} \mid f \in {Forms}_{w, i} \}$

Step 3. The vector that encodes the POS of the i'th synset of w is:

${FormVec}_{w, i} = \begin{cases} maxCosVec(v_{w}, V_{w, i}), & V_{w, i} \neq \emptyset \\ v_{w}, & V_{w, i} = \emptyset \end{cases}$ (5)

where maxCosVec(v_w, V_w,i) returns the vector in V_w,i that has the largest cosine similarity with v_w.
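A minimal sketch of the three FormVec steps follows. The pattern.en call is replaced by a hypothetical lookup table FORMS so that the example is self-contained; POS tags follow WordNet's single-letter convention ('n', 'v'), and dropping forms that are absent from the embedding vocabulary is an assumption.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


# Hypothetical stand-in for forms(POS_{w,i}) from pattern.en: the
# inflected forms of each (word, WordNet POS tag) pair.
FORMS = {
    ("smile", "n"): {"smile", "smiles"},
    ("smile", "v"): {"smile", "smiles", "smiled", "smiling"},
}


def max_cos_vec(v_w, vectors):
    """Vector in `vectors` with the largest cosine similarity to v_w."""
    return max(vectors, key=lambda v: cosine(v_w, v))


def form_vec(word, pos, emb, forms_table=FORMS):
    """Sketch of FormVec_{w,i} (formula (5)) for a synset with POS `pos`."""
    v_w = emb[word]
    # Step 1: all forms for this POS, minus the target word itself.
    forms = forms_table.get((word, pos), set()) - {word}
    # Step 2: their vectors; forms missing from the embedding are skipped.
    vectors = [emb[f] for f in sorted(forms) if f in emb]
    # Step 3: the closest form vector, or v_w if there is none.
    return list(max_cos_vec(v_w, vectors)) if vectors else list(v_w)


EMB = {"smile": [1.0, 0.0], "smiles": [0.9, 0.3],
       "smiled": [0.5, 0.5], "smiling": [0.95, 0.1]}

print(form_vec("smile", "n", EMB))  # [0.9, 0.3]: only "smiles" is available
print(form_vec("smile", "v", EMB))  # [0.95, 0.1]: "smiling" is closest
```

The noun and verb synsets of “smile” thus receive different FormVec vectors, which is how the sub-model separates the POS of a word.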

3.3 RelVec

In WordNet, synsets may be connected by manually defined relations, so a word may be linked to other words through these relations. Jimenez [29] proposes a method called word2set that represents a word by its related neighboring words, including words from the definitions and lemmas of related synsets, and measures the similarity of two words by the intersection of their sets of related neighboring words. The method is effective but may be too simple to achieve high performance on semantic similarity measurement. We expect that encoding these words into a vector space yields a better word representation for semantic similarity measurement. Therefore, the RelVec sub-model is designed to encode the related words of a word into a vector space through word embedding. Unlike word2set, RelVec collects related neighboring words only from the related synsets, not from definitions.

The following relations are considered in this paper: Hypernyms, Instance Hypernyms, Hyponyms, Instance Hyponyms, Member Holonyms, Member Meronyms, Substance Holonyms, Part Holonyms, Substance Meronyms, Part Meronyms, Attributes, Entailments, Causes, Also See, Verb Groups, and Similar To. For example, as shown in Fig. 2, “smile.n.01” is a synset of the word “smile” with three related synsets, “smirk.n.01”, “simper.n.01”, and “facial_expression.n.01”, connected by the relations Hyponyms, Hyponyms, and Hypernyms respectively; it has no other relations. Antonyms are not considered, because the vectors of a word and its antonyms in a word embedding usually have a high cosine similarity despite their opposite meanings.

Fig. 2

Elements of WordNet involved in the RelVec method.

Given a target word w, the RelVec_w vector is computed by the following four steps:

Step 1. Get the set of all related synsets of w, RelatedSynsets_w:

${RelatedSynsets}_{w} = \left( \bigcup_{syn \in synsets(w)} relatedSynsets(syn) \right) \cup synsets(w)$

where synsets(w) returns all synsets of the target word w, and relatedSynsets(syn) returns all synsets related to syn.

Step 2. Get the set of related words by merging the lemmas of all synsets in RelatedSynsets_w, excluding w itself:

${Rels}_{w} = \left( \bigcup_{syn \in {RelatedSynsets}_{w}} lemmas(syn) \right) \setminus \{w\}$

where lemmas(syn) returns all lemmas of syn.

Step 3. From Rels_w, find the corresponding set of related word vectors V_w in the word embedding:

$V_{w} = \{ v_{r} \mid r \in {Rels}_{w} \text{ and } \cos(v_{w}, v_{r}) \geq \beta \}$

where β is a threshold, set to 0 because our experiments showed that values of β near 0 perform better.

Step 4. Averaging the vectors in V_w gives RelVec:

${RelVec}_{w} = \begin{cases} \frac{1}{|V_{w}|} \sum_{n = 1}^{|V_{w}|} V_{w}[n], & V_{w} \neq \emptyset \\ v_{w}, & V_{w} = \emptyset \end{cases}$ (6)
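The four RelVec steps can be sketched as follows. The small synset, lemma, and relation tables are hypothetical stand-ins for WordNet loosely modelled on Fig. 2 (with NLTK one would typically read lemmas via `Synset.lemma_names()` and relations via methods such as `hypernyms()` and `hyponyms()`). Relation types are not distinguished, because the text treats all listed relations alike.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


# Hypothetical miniature WordNet; the lemma lists are illustrative only.
SYNSETS = {"smile": ["smile.n.01", "smile.v.01"]}
LEMMAS = {
    "smile.n.01": ["smile", "grin"],
    "smile.v.01": ["smile"],
    "smirk.n.01": ["smirk"],
    "simper.n.01": ["simper"],
    "facial_expression.n.01": ["facial_expression"],
}
RELATED = {"smile.n.01": ["smirk.n.01", "simper.n.01", "facial_expression.n.01"]}


def rel_vec(word, emb, beta=0.0):
    """Sketch of RelVec_w (formula (6)): one vector per word, not per synset."""
    v_w = emb[word]
    own = set(SYNSETS.get(word, []))
    # Step 1: the related synsets of every synset of w, plus w's own synsets.
    related = set(own)
    for syn in own:
        related.update(RELATED.get(syn, []))
    # Step 2: the lemmas of all those synsets, excluding w itself.
    rels = set()
    for syn in related:
        rels.update(LEMMAS.get(syn, []))
    rels.discard(word)
    # Step 3: keep in-vocabulary related words with cosine >= beta.
    kept = [emb[r] for r in sorted(rels) if r in emb and cosine(v_w, emb[r]) >= beta]
    # Step 4: dimension-wise average, or v_w when nothing is left.
    if not kept:
        return list(v_w)
    return [sum(column) / len(kept) for column in zip(*kept)]


EMB = {"smile": [1.0, 0.0], "grin": [0.8, 0.6], "smirk": [0.6, 0.8],
       "simper": [0.0, 1.0], "facial_expression": [-0.6, 0.8]}
print(rel_vec("smile", EMB))
```

With β = 0, “simper” (cosine exactly 0) is kept while “facial_expression” (negative cosine) is dropped, so the result averages the vectors of “grin”, “simper”, and “smirk”.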

3.4 DFRVec

As shown in Fig. 1, DFRVec is composed of the three sub-models above (DefVec, FormVec, and RelVec) and the original word embedding. The composition process is shown in Fig. 3. Let N be the dimension of v_w; the vector representation DFRVec_w,i of the i'th synset of a target word w is defined as:

${DFRVec}_{w, i} = \lambda_{1} {DefVec}_{w, i} + \lambda_{2} {FormVec}_{w, i} + \lambda_{3} {RelVec}_{w} + \lambda_{4} v_{w}, \quad \lambda_{1} + \lambda_{2} + \lambda_{3} + \lambda_{4} = 1$ (7)

where “+” denotes dimension-wise vector addition, v_w is the vector of w in the word embedding, and λ₁, λ₂, λ₃, and λ₄ are the weighting factors of DefVec, FormVec, RelVec, and the word embedding respectively.
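Formula (7) is a plain weighted sum. The sketch below (function names are ours, not the paper's) makes explicit that each synset gets its own DFRVec vector, because DefVec and FormVec are computed per synset, while RelVec_w and v_w are computed once per word and shared by all of its synsets.

```python
def dfr_vec(def_v, form_v, rel_v, word_v, lambdas=(0.25, 0.25, 0.25, 0.25)):
    """Formula (7): dimension-wise weighted sum of the four component vectors."""
    if len(lambdas) != 4 or abs(sum(lambdas) - 1.0) > 1e-9:
        raise ValueError("expected four weights that sum to 1")
    l1, l2, l3, l4 = lambdas
    return [l1 * a + l2 * b + l3 * c + l4 * d
            for a, b, c, d in zip(def_v, form_v, rel_v, word_v)]


def dfr_vecs_for_word(def_vs, form_vs, rel_v, word_v,
                      lambdas=(0.25, 0.25, 0.25, 0.25)):
    """One DFRVec per synset: DefVec and FormVec vary by synset,
    while RelVec_w and v_w are shared by all synsets of the word."""
    return [dfr_vec(d, f, rel_v, word_v, lambdas) for d, f in zip(def_vs, form_vs)]


print(dfr_vec([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]))  # [0.5, 0.5]
```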

Fig. 3

Composition of the DFRVec vector from the DefVec, FormVec, and RelVec vectors and the original word embedding.

3.5 New semantic similarity

Once DFRVec has encoded the various kinds of semantic information about a word into a vector space, a new semantic similarity model can be derived from it. For a pair of words, DFRVec yields two sets of vector representations, one vector per synset (i.e., per sense) of each word, and we define the similarity of the pair as the maximum cosine value between a vector of one word and a vector of the other.

Formally, given two words w_i and w_j, the semantic similarity between them is computed as:

${sim}_{DFRVec}(w_{i}, w_{j}) = \max_{m, n} \cos({DFRVec}_{w_{i}, m}, {DFRVec}_{w_{j}, n})$ (8)

According to Lee [16] and Lastra-Díaz [20], a linear combination of two similarity scores, one from a word-embedding-based method and one from a WordNet-based method, can achieve good results. Accordingly, by combining DFRVec with the Path method, the semantic similarity between two words w_i and w_j is finally computed as:

${sim}_{DFRVec + Path}(w_{i}, w_{j}) = \max_{m, n} \left[ \theta \cos({DFRVec}_{w_{i}, m}, {DFRVec}_{w_{j}, n}) + (1 - \theta) \frac{1}{1 + dist({Syn}_{w_{i}, m}, {Syn}_{w_{j}, n})} \right]$ (9)

where dist(Syn_{w_i,m}, Syn_{w_j,n}) is the distance in WordNet between the synsets Syn_{w_i,m} and Syn_{w_j,n}, which belong to the target words w_i and w_j respectively, and θ is a weighting factor in the range 0 to 1. θ is not fixed, because no single value performs best across multiple datasets; in the following experiments, θ is obtained for each dataset by grid search. In our experiments, the value of θ on the ten datasets lies in the range 0 to 0.5.
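Formulas (8) and (9) can be sketched together, since (9) with θ = 1 reduces to (8). The sketch assumes that each word's DFRVec vectors and the matching synset identifiers are given as aligned lists, and that a caller-supplied distance function returns None for unconnected synsets, which then contribute a path term of 0 as in formula (1). The function names and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def sim_dfrvec_path(vecs_i, syns_i, vecs_j, syns_j, dist, theta):
    """Formula (9): maximum over synset pairs (m, n) of
    theta * cos(DFRVec_{wi,m}, DFRVec_{wj,n})
      + (1 - theta) / (1 + dist(Syn_{wi,m}, Syn_{wj,n})).
    With theta = 1 the path term vanishes and formula (8) is recovered."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    best = -math.inf
    for v_m, s_m in zip(vecs_i, syns_i):
        for v_n, s_n in zip(vecs_j, syns_j):
            d = dist(s_m, s_n)
            path = 0.0 if d is None else 1.0 / (1.0 + d)  # 0 when no path
            best = max(best, theta * cosine(v_m, v_n) + (1.0 - theta) * path)
    return best


# Hypothetical example: word A has synsets a1, a2; word B has synset b1.
DIST = {("a1", "b1"): 3, ("a2", "b1"): 1}


def toy_dist(x, y):
    return DIST.get((x, y))


VECS_A, SYNS_A = [[1.0, 0.0], [0.0, 1.0]], ["a1", "a2"]
VECS_B, SYNS_B = [[1.0, 0.0]], ["b1"]

for theta in (1.0, 0.5, 0.0):
    print(theta, sim_dfrvec_path(VECS_A, SYNS_A, VECS_B, SYNS_B, toy_dist, theta))
```

In the example, the winning pair is (a1, b1), which has the higher cosine, for θ = 1 and θ = 0.5, but (a2, b1), which has the shorter path, for θ = 0; taking the maximum over the combined score lets θ decide which kind of evidence dominates.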

Because the effect of choosing a different WordNet-based method in formula (9) is not the focus of this article, and to allow comparison with GW597 + Path, the simpler Path method, which considers only path information, is combined with DFRVec.

3.6 Training DFRVec model

The vocabulary of the DFRVec model is based on the vocabulary of the word embedding model selected for encoding, with all words that have no synset in WordNet removed.

The training process can be described as the following four steps:

Step 1. The vocabulary W of size N shared by the word embedding and WordNet (the intersection of their vocabularies) is:

$W = \{w_{1}, w_{2}, \ldots, w_{N}\}$

Step 2. Following Sections 3.1, 3.2, and 3.3, we obtain from W the collection of definition words Defs, the collection of POS and form words Forms, and the collection of relation words Rels.

Step 3. The vector collection V of size L is the union of the four vector collections computed by DefVec, FormVec, RelVec, and the word embedding from Defs, Forms, Rels, and W respectively, as described in Sections 3.1, 3.2, and 3.3:

$V = \{v_{1}, v_{2}, \ldots, v_{L}\}$

Step 4. Following the DFRVec model in Section 3.4, we generate the target vector collection V′ from V:

$V' = \{ (v_{w_{1}, 1}, \ldots, v_{w_{1}, s_{1}}), (v_{w_{2}, 1}, \ldots, v_{w_{2}, s_{2}}), \ldots, (v_{w_{N}, 1}, \ldots, v_{w_{N}, s_{N}}) \}$ (10)

where, for i = 1, 2, ..., N, w_i is the i'th word in W, s_i is the number of synsets of w_i, and v_{w_i,r}, computed by formula (7), is the vector representation DFRVec_{w_i,r} of the r'th synset (r = 1, 2, ..., s_i) of w_i.

4 Experiments

This section evaluates semantic similarity measurement with the DFRVec model and with different combinations of its three sub-models. It also evaluates the performance of different existing word embedding models used with the proposed model.

4.1 Experiment setup

All experiments in this section were performed on a machine with an Intel Core i5 2.9 GHz processor, 16 GB of RAM, a 500 GB hard disk, and Windows 10. All algorithms were implemented in Python.

4.2 Evaluation metrics

This paper uses the Pearson correlation (r), the Spearman rank correlation (ρ), and the harmonic mean (h) of the two. The Pearson and Spearman correlations are computed between the human judgments X and the semantic similarity values Y:

$r = \frac{\sum_{i = 1}^{n} (X_{i} - \bar{X})(Y_{i} - \bar{Y})}{\sqrt{\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}} \sqrt{\sum_{i = 1}^{n} (Y_{i} - \bar{Y})^{2}}}$ (11)

$\rho = 1 - \frac{6 \sum_{i = 1}^{n} d_{i}^{2}}{n (n^{2} - 1)}, \quad d_{i} = x_{i} - y_{i}$ (12)

$h = \frac{2 r \rho}{r + \rho}$ (13)

where n is the number of word pairs, and x_i and y_i in formula (12) are the ranks of X_i and Y_i.

The Pearson correlation is invariant to translation and positive scaling of either variable, while the Spearman rank correlation is rank invariant, i.e., it has the same value under any strictly increasing monotone transformation of the data. To compare the performance of the methods evaluated in our experiments, we report the average Pearson correlation, Spearman correlation, and harmonic mean score over all datasets.
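The three metrics can be computed as in the dependency-free sketch below; the human and model scores are made-up numbers. Spearman's ρ is computed as the Pearson correlation of average ranks, which coincides with formula (12) when there are no ties and is the usual tie-corrected value otherwise. Library implementations are available as `scipy.stats.pearsonr` and `scipy.stats.spearmanr`.

```python
import math


def pearson(x, y):
    """Formula (11). Undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Pearson correlation of the ranks; equals formula (12) without ties."""
    return pearson(ranks(x), ranks(y))


def harmonic(r, rho):
    """Formula (13)."""
    return 2 * r * rho / (r + rho)


human = [1.0, 2.0, 3.0, 4.0, 5.0]
model = [0.1, 0.2, 0.3, 0.4, 1.0]
r, rho = pearson(human, model), spearman(human, model)
print(round(r, 3), round(rho, 3), round(harmonic(r, rho), 3))
```

Here the model ranks the pairs exactly as the humans do, so ρ = 1, while the outlying last score lowers r; h lies between the two, closer to the smaller value.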

4.3 Dataset

In order to evaluate the general performance of the proposed model, the experiments have been performed on ten benchmark datasets. The detailed information of the datasets is introduced in Table 1.

Table 1
Detailed information of the 10 datasets

Dataset	Content	Number of pairs	Reference
MC28	n	28	Miller [30]
RG65	n	65	Rubenstein [31]
PS	n	65	Pirró
Agirre201	n	201	Agirre [33]
SL665	n	665	Hill [27]
SL999	n,v,adj	999	Hill [27]
YP130	v	130	Yang [34]
WS353	n,v,adj	353	Finkelstein [35]
RW	n,v,adj	2034	Luong [36]
MEN	n,v,adj	3000	Bruni [37]

Notes: The column Content describes the POS of each dataset.

4.4 Word vectors and vocabulary

Two word embedding models, word2vec [14] and GloVe-d [16] (GloVe with three dimensions removed), are used in our experiments to encode semantic information and to verify the performance of the proposed model on semantic similarity measurement.

When a word embedding model is used to build the DFRVec model, the vocabulary is the intersection of the vocabularies of the word embedding model and WordNet. The vocabulary size is 207,119 with word2vec and 116,823 with GloVe-d.

4.5 Evaluation results

4.5.1 Performance of the three proposed sub-models

To evaluate the performance of DefVec, FormVec, and RelVec proposed in Section 3, we consider three settings: 1) a single sub-model combined with the original word vector model; 2) all sub-models but one combined with the original word vector model; 3) all three sub-models combined with the original word vector model. In all three settings, the weighting factors (λ₁, λ₂, λ₃, λ₄) in formula (7) are distributed evenly; for example, in the third setting they are set to (0.25, 0.25, 0.25, 0.25).
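The seven configurations in Tables 2 and 3 can be enumerated as below. Only the weights of the third setting are stated in the text; the weights for the first two settings follow our assumed reading of “evenly distributed” (equal shares for v_w and each sub-model in use, 0 for omitted sub-models), so they should be treated as an interpretation rather than the paper's stated values.

```python
from itertools import combinations

SUB_MODELS = ("DefVec", "FormVec", "RelVec")


def ablation_weights():
    """Map each setting name to (λ1, λ2, λ3, λ4) of formula (7).

    Assumed reading of "evenly distributed": the word vector v_w and every
    sub-model in use share the weight equally; omitted sub-models get 0."""
    settings = {}
    for k in (1, 2, 3):
        for used in combinations(SUB_MODELS, k):
            share = 1.0 / (k + 1)  # +1 for the original word vector v_w
            lambdas = tuple(share if m in used else 0.0 for m in SUB_MODELS) + (share,)
            if k == 1:
                name = "only " + used[0]
            elif k == 2:
                name = "except " + next(m for m in SUB_MODELS if m not in used)
            else:
                name = "all"
            settings[name] = lambdas
    return settings


for name, lambdas in ablation_weights().items():
    print(f"{name:<16} {tuple(round(x, 3) for x in lambdas)}")
```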

The results are shown in Tables 2 and 3. In both tables, the semantic similarity between words w_i and w_j is computed for every method as the cosine value between their corresponding vectors.

Table 2
Pearson (r) values for DFRVec models with different sub-models and for existing word embedding models

Method	MC28	RG65	PS	Agirre201	SL665	SL999	YP130	WS353	RW	MEN	Avg(r)
word2vec	0.796	0.772	0.786	0.757	0.461	0.454	0.589	0.653	0.508	0.756	0.653
DFRVec_word2vec(only DefVec)	0.908	0.890	0.890	0.833	0.557	0.524	0.790	0.718	0.499	0.792	0.740
DFRVec_word2vec(only FormVec)	0.847	0.826	0.837	0.779	0.505	0.481	0.649	0.669	0.508	0.777	0.688
DFRVec_word2vec(only RelVec)	0.866	0.869	0.890	0.808	0.557	0.548	0.686	0.679	0.512	0.779	0.719
DFRVec_word2vec(except DefVec)	0.885	0.873	0.890	0.811	0.553	0.536	0.696	0.690	0.531	0.792	0.726
DFRVec_word2vec(except FormVec)	0.917	0.914	0.918	0.840	0.595	0.565	0.799	0.702	0.527	0.789	0.757
DFRVec_word2vec(except RelVec)	0.909	0.890	0.893	0.831	0.563	0.536	0.763	0.720	0.521	0.806	0.743
DFRVec_word2vec(all)	0.921	0.910	0.916	0.843	0.592	0.569	0.779	0.720	0.545	0.809	0.760
GloVe-d	0.870	0.822	0.818	0.786	0.476	0.473	0.612	0.732	0.481	0.799	0.687
DFRVec_GloVe-d(only DefVec)	0.948	0.909	0.899	0.836	0.571	0.551	0.779	0.767	0.436	0.818	0.751
DFRVec_GloVe-d(only FormVec)	0.860	0.829	0.817	0.814	0.502	0.490	0.667	0.752	0.482	0.819	0.703
DFRVec_GloVe-d(only RelVec)	0.900	0.873	0.889	0.830	0.578	0.578	0.679	0.746	0.518	0.822	0.741
DFRVec_GloVe-d(except DefVec)	0.896	0.871	0.872	0.838	0.563	0.555	0.703	0.760	0.529	0.835	0.742
DFRVec_GloVe-d(except FormVec)	0.943	0.915	0.913	0.854	0.616	0.598	0.790	0.761	0.511	0.824	0.773
DFRVec_GloVe-d(except RelVec)	0.923	0.891	0.876	0.843	0.563	0.552	0.773	0.778	0.477	0.838	0.751
DFRVec_GloVe-d(all)	0.933	0.906	0.899	0.856	0.602	0.592	0.788	0.776	0.529	0.844	0.773

Table 3

Spearman (ρ) values of DFRVec models with different sub-models and of existing word embedding models

Method	MC28	RG65	PS	Agirre201	SL665	SL999	YP130	WS353	RW	MEN	Avg(ρ)
word2vec	0.781	0.760	0.767	0.765	0.454	0.442	0.570	0.700	0.534	0.771	0.654
DFRVec_word2vec(only DefVec)	0.921	0.886	0.889	0.810	0.523	0.503	0.754	0.745	0.506	0.800	0.734
DFRVec_word2vec(only FormVec)	0.866	0.832	0.843	0.776	0.489	0.470	0.620	0.708	0.531	0.787	0.692
DFRVec_word2vec(only RelVec)	0.856	0.866	0.879	0.807	0.523	0.520	0.658	0.712	0.501	0.789	0.711
DFRVec_word2vec(except DefVec)	0.878	0.874	0.885	0.803	0.525	0.517	0.668	0.723	0.531	0.799	0.720
DFRVec_word2vec(except FormVec)	0.933	0.904	0.910	0.821	0.553	0.538	0.759	0.722	0.509	0.795	0.744
DFRVec_word2vec(except RelVec)	0.927	0.886	0.894	0.821	0.532	0.518	0.737	0.751	0.532	0.813	0.741
DFRVec_word2vec(all)	0.937	0.910	0.921	0.827	0.553	0.546	0.750	0.744	0.536	0.814	0.754
GloVe-d	0.893	0.810	0.803	0.790	0.450	0.460	0.588	0.769	0.502	0.808	0.687
DFRVec_GloVe-d(only DefVec)	0.952	0.893	0.881	0.822	0.525	0.527	0.742	0.791	0.449	0.823	0.740
DFRVec_GloVe-d(only FormVec)	0.867	0.825	0.825	0.812	0.472	0.479	0.648	0.777	0.503	0.823	0.703
DFRVec_GloVe-d(only RelVec)	0.912	0.866	0.873	0.831	0.538	0.554	0.656	0.776	0.515	0.828	0.735
DFRVec_GloVe-d(except DefVec)	0.903	0.861	0.872	0.834	0.524	0.536	0.683	0.786	0.533	0.838	0.737
DFRVec_GloVe-d(except FormVec)	0.943	0.902	0.901	0.838	0.570	0.573	0.762	0.779	0.509	0.827	0.761
DFRVec_GloVe-d(except RelVec)	0.933	0.879	0.880	0.834	0.518	0.531	0.750	0.799	0.489	0.840	0.745
DFRVec_GloVe-d(all)	0.933	0.886	0.897	0.844	0.555	0.569	0.766	0.794	0.530	0.845	0.762

Notes: In Tables 2 and 3, DFRVec_word2vec denotes the DFRVec model that uses word2vec to encode semantic information, and DFRVec_GloVe-d denotes the same model using GloVe-d. Each value in the Avg(.) column is the mean of the ten dataset values in the same row.

Tables 2 and 3 show that, for semantic similarity measurement, each of the DefVec, FormVec and RelVec sub-models raises the average scores of both word2vec and GloVe-d. Furthermore, judging by the average values, DefVec performs best among the three sub-models and FormVec worst. Hence, when the three sub-models are combined in the DFRVec model according to formula (7), DefVec should probably receive a larger weight and FormVec a smaller one.
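To make the ranking concrete, the short sketch below recomputes the Avg(r) column for the three single-sub-model rows of DFRVec_word2vec in Table 2 and sorts them. The per-dataset values are copied from the table; the names `ROWS` and `avg` are our own.

```python
# Pearson (r) values copied from Table 2 for DFRVec_word2vec with a single
# sub-model, in dataset order: MC28, RG65, PS, Agirre201, SL665, SL999,
# YP130, WS353, RW, MEN.
ROWS = {
    "DefVec":  [0.908, 0.890, 0.890, 0.833, 0.557, 0.524, 0.790, 0.718, 0.499, 0.792],
    "FormVec": [0.847, 0.826, 0.837, 0.779, 0.505, 0.481, 0.649, 0.669, 0.508, 0.777],
    "RelVec":  [0.866, 0.869, 0.890, 0.808, 0.557, 0.548, 0.686, 0.679, 0.512, 0.779],
}


def avg(values):
    """Mean over the ten datasets, as in the Avg(r) column."""
    return sum(values) / len(values)


averages = {name: round(avg(vals), 3) for name, vals in ROWS.items()}
# Sort sub-models from best to worst average correlation.
ranking = sorted(averages, key=averages.get, reverse=True)
print(averages)  # {'DefVec': 0.74, 'FormVec': 0.688, 'RelVec': 0.719}
print(ranking)   # ['DefVec', 'RelVec', 'FormVec']
```

The recomputed averages match the Avg(r) column, and the ordering DefVec > RelVec > FormVec is the one the weighting argument relies on.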

On the RW dataset, the sub-models help much less: the gains are small, and some configurations even score below the base embedding (for example, DefVec alone lowers the Pearson score from 0.508 to 0.499 with word2vec and from 0.481 to 0.436 with GloVe-d). The reason may be that RW consists mostly of rare words, whose semantic information in WordNet may be insufficient.

4.5.2 Performance of DFRVec model

The second experiment compares our DFRVec model, which encodes the semantic information of definitions, POS, forms and relations described in WordNet on top of different word embedding models, with the eight existing methods described in Section 2 on semantic similarity measurement. These comprise one knowledge-based method, Path [19]; four corpus-based methods, word2vec [14], GloVe [15], GloVe-d [16] and GW597 [16]; and three combined methods, GW597 + Path [16], Paragram-sl [17] and Counter-fitting [18]. In this experiment, the choice of the weighting factors λ₁, λ₂, λ₃ and λ₄ is very important. To find a set of weighting factors that works relatively well on all ten datasets, we used grid search; based on experience, the candidate values were {0.3, 0.35, 0.4} for λ₁, {0.1, 0.15, 0.2} for λ₂ and {0.2, 0.25, 0.3} for λ₃.
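The grid search can be sketched as below. Two details are assumptions rather than statements from the text: λ₄ is derived as 1 − λ₁ − λ₂ − λ₃ (every weight set reported in this section sums to 1), and `evaluate` is a hypothetical callback that would build DFRVec vectors with the given weights via formula (7) and return a correlation score on one dataset.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

# Candidate values for λ1, λ2, λ3 as listed in the text.
LAMBDA1 = (0.3, 0.35, 0.4)
LAMBDA2 = (0.1, 0.15, 0.2)
LAMBDA3 = (0.2, 0.25, 0.3)

Weights = Tuple[float, float, float, float]


def weight_grid() -> List[Weights]:
    """All (λ1, λ2, λ3, λ4) candidates.

    Assumption: λ4 = 1 - λ1 - λ2 - λ3, consistent with the reported
    weight sets summing to 1.
    """
    grid = []
    for l1, l2, l3 in product(LAMBDA1, LAMBDA2, LAMBDA3):
        l4 = round(1.0 - l1 - l2 - l3, 10)  # strip floating-point noise
        grid.append((l1, l2, l3, l4))
    return grid


def grid_search(
    datasets: Sequence[str],
    evaluate: Callable[[Weights, str], float],
) -> Tuple[Optional[Weights], float]:
    """Return the weights with the highest mean score over all datasets.

    `evaluate(weights, dataset)` is a hypothetical callback that should
    return, for example, the Pearson r obtained with those weights.
    """
    best: Optional[Weights] = None
    best_score = float("-inf")
    for w in weight_grid():
        # Average over datasets, mirroring the Avg column of the tables.
        score = sum(evaluate(w, d) for d in datasets) / len(datasets)
        if score > best_score:
            best, best_score = w, score
    return best, best_score
```

Selecting on the mean over all datasets, rather than tuning per dataset, matches the stated goal of one weight set that works reasonably well on all ten benchmarks.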

When the DFRVec model is applied to word2vec (DFRVec_word2vec) and to GloVe-d (DFRVec_GloVe-d), the best weighting factors in both cases are λ₁ = 0.3, λ₂ = 0.15, λ₃ = 0.3 and λ₄ = 0.25.

DFRVec597, with 597 dimensions, combines DFRVec_word2vec and DFRVec_GloVe-d by concatenating the vectors of corresponding words and synsets, and DFRVec597 + Path is a linear combination of DFRVec597 and Path. In these two methods, however, the weighting factors are 0.3, 0.2, 0.25 and 0.25 for word2vec, and 0.3, 0.15, 0.25 and 0.3 for GloVe-d.
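The two constructions can be sketched as follows. The concatenation follows the description; the mixing weight `alpha` and the exact form of the linear combination are assumptions, since formula (9) is not reproduced here, and `path_sim` stands for whatever value the Path measure produces for the word pair.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def concat(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """DFRVec597-style vector: the DFRVec_word2vec part, then the DFRVec_GloVe-d part."""
    return list(a) + list(b)


def dfrvec597_path_similarity(
    w2v_i: Sequence[float], glove_i: Sequence[float],
    w2v_j: Sequence[float], glove_j: Sequence[float],
    path_sim: float, alpha: float = 0.5,
) -> float:
    """Hypothetical linear combination of the DFRVec597 cosine and a Path score.

    alpha = 0.5 is a placeholder, not a value reported in the paper.
    """
    cos_sim = cosine(concat(w2v_i, glove_i), concat(w2v_j, glove_j))
    return alpha * cos_sim + (1.0 - alpha) * path_sim
```

One design point worth noting: the cosine of a concatenated vector is dominated by whichever part has the larger norm, so one might L2-normalize each part before concatenating; the text does not say whether this is done.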

DFRVec597 + Path resembles GW597 + Path in form: both concatenate the vectors of two word embedding models, and both linearly combine an embedding-based similarity with a WordNet-based one. The two nevertheless differ fundamentally. GW597 directly concatenates the word2vec and GloVe-d vectors, whereas DFRVec597 first uses our DFRVec model to encode multiple types of semantic information from WordNet into the vectors DFRVec_word2vec and DFRVec_GloVe-d, through word2vec and GloVe-d respectively, and only then concatenates them.

The results of the second experiment are described in Table 4.

Table 4
Pearson(r), Spearman (ρ) and Harmonic(h) values of our models and the other compared methods

Method	MC28	RG65	PS	Agirre201	SL665	SL999	YP130	WS353	RW	MEN	Avg
Pearson(r)
Path	0.768	0.777	0.837	0.582	0.539	0.453	0.732	0.374	0.020	0.354	0.544
Counter-fitting	0.806	0.806	0.866	0.712	0.697	0.728	0.751	0.624	0.555	0.680	0.722
Paragram-sl	0.781	0.798	0.854	0.751	0.682	0.689	0.736	0.648	0.550	0.712	0.720
word2vec	0.796	0.772	0.786	0.757	0.461	0.454	0.589	0.654	0.508	0.756	0.653
GloVe	0.832	0.800	0.775	0.698	0.433	0.401	0.512	0.652	0.371	0.728	0.620
GloVe-d	0.870	0.822	0.818	0.786	0.476	0.473	0.612	0.732	0.481	0.799	0.687
GW597	0.865	0.825	0.826	0.796	0.483	0.480	0.616	0.736	0.517	0.805	0.695
GW597 + Path	0.884	0.869	0.900	0.816	0.627	0.591	0.783	0.739	0.517	0.809	0.754
DFRVec_word2vec	0.927	0.919	0.922	0.847	0.604	0.577	0.804	0.718	0.539	0.806	0.766
DFRVec_GloVe-d	0.944	0.915	0.909	0.860	0.604	0.620	0.799	0.772	0.522	0.840	0.779
DFRVec597	0.945	0.921	0.917	0.863	0.624	0.607	0.805	0.771	0.553	0.840	0.785
DFRVec597 + Path	0.944	0.920	0.935	0.864	0.668	0.637	0.822	0.778	0.554	0.843	0.797
Spearman(ρ)
Path	0.810	0.771	0.757	0.584	0.579	0.437	0.630	0.310	-0.004	0.324	0.520
Counter-fitting	0.857	0.808	0.831	0.724	0.698	0.735	0.710	0.693	0.612	0.741	0.741
Paragram-sl	0.761	0.775	0.794	0.791	0.679	0.685	0.693	0.730	0.597	0.771	0.727
word2vec	0.781	0.760	0.767	0.765	0.454	0.442	0.570	0.701	0.534	0.771	0.654
GloVe	0.856	0.817	0.789	0.688	0.404	0.374	0.537	0.646	0.384	0.731	0.623
GloVe-d	0.893	0.810	0.803	0.790	0.450	0.460	0.588	0.769	0.502	0.808	0.687
GW597	0.876	0.814	0.809	0.803	0.460	0.466	0.589	0.776	0.538	0.814	0.694
GW597 + Path	0.931	0.894	0.890	0.841	0.625	0.571	0.741	0.795	0.538	0.822	0.765
DFRVec_word2vec	0.942	0.913	0.924	0.829	0.563	0.553	0.771	0.737	0.522	0.810	0.756
DFRVec_GloVe-d	0.945	0.895	0.904	0.846	0.581	0.573	0.768	0.790	0.520	0.842	0.766
DFRVec597	0.939	0.902	0.912	0.847	0.577	0.583	0.778	0.790	0.542	0.842	0.771
DFRVec597 + Path	0.941	0.909	0.918	0.853	0.646	0.620	0.785	0.800	0.546	0.845	0.787
Harmonic(h)
Path	0.788	0.774	0.795	0.583	0.559	0.445	0.677	0.339	-0.011	0.338	0.529
Counter-fitting	0.831	0.807	0.848	0.718	0.697	0.732	0.730	0.657	0.582	0.709	0.731
Paragram-sl	0.771	0.786	0.823	0.770	0.681	0.687	0.714	0.686	0.573	0.740	0.723
word2vec	0.788	0.766	0.777	0.761	0.457	0.448	0.579	0.677	0.521	0.763	0.654
GloVe	0.844	0.809	0.782	0.693	0.418	0.387	0.525	0.649	0.377	0.729	0.621
GloVe-d	0.881	0.816	0.811	0.788	0.463	0.466	0.600	0.750	0.491	0.804	0.687
GW597	0.870	0.819	0.817	0.799	0.471	0.473	0.602	0.755	0.527	0.809	0.694
GW597 + Path	0.907	0.881	0.895	0.828	0.626	0.581	0.761	0.766	0.527	0.815	0.759
DFRVec_word2vec	0.934	0.916	0.923	0.838	0.583	0.565	0.787	0.727	0.530	0.808	0.761
DFRVec_GloVe-d	0.945	0.905	0.907	0.853	0.593	0.595	0.783	0.781	0.521	0.841	0.772
DFRVec597	0.942	0.911	0.914	0.855	0.600	0.595	0.791	0.780	0.548	0.841	0.778
DFRVec597 + Path	0.942	0.915	0.927	0.859	0.657	0.629	0.803	0.789	0.550	0.844	0.791

In Table 4, the semantic similarity of words w_i and w_j is computed by formula (3) for Path and by formula (2) for GW597 + Path; for Counter-fitting, Paragram-sl, word2vec, GloVe, GloVe-d, GW597 and DFRVec597 it is the cosine similarity cos(v_{w_i}, v_{w_j}); and for DFRVec597 + Path it is a linear combination of the DFRVec597 and Path similarities, computed by formula (9) in the same way as for GW597 + Path in formula (2). The best value for each dataset is shown in bold.
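The Path measure of Rada et al. [19] scores two concepts by the length of the shortest path between them in the taxonomy. A common normalization, used for instance by NLTK's `path_similarity`, is 1 / (1 + shortest path length); formula (3) may differ in detail, so the sketch below is only an illustration on a tiny hypothetical hypernym graph (not real WordNet data). Taking the word-level score as the maximum over all sense pairs is likewise an assumption.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set

# Tiny hypothetical is-a taxonomy (child -> parents); not real WordNet data.
HYPERNYMS: Dict[str, List[str]] = {
    "car.n.01": ["motor_vehicle.n.01"],
    "motor_vehicle.n.01": ["vehicle.n.01"],
    "bicycle.n.01": ["vehicle.n.01"],
    "vehicle.n.01": ["entity.n.01"],
    "car.n.02": ["entity.n.01"],  # a second, unrelated sense of "car"
}


def build_adjacency(graph: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency: path length ignores edge direction."""
    adj: Dict[str, Set[str]] = {}
    for child, parents in graph.items():
        for p in parents:
            adj.setdefault(child, set()).add(p)
            adj.setdefault(p, set()).add(child)
    return adj


def shortest_path(a: str, b: str, adj: Dict[str, Set[str]]) -> Optional[int]:
    """Breadth-first search for the number of edges between two synsets."""
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no connecting path


def path_similarity(a: str, b: str, adj: Dict[str, Set[str]]) -> float:
    """1 / (1 + path length), or 0.0 when the synsets are not connected."""
    d = shortest_path(a, b, adj)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def word_path_similarity(senses_i: Iterable[str], senses_j: Iterable[str],
                         adj: Dict[str, Set[str]]) -> float:
    """Word-level score: best score over all sense pairs (an assumption)."""
    senses_j = list(senses_j)
    return max((path_similarity(s, t, adj) for s in senses_i for t in senses_j),
               default=0.0)
```

With `adj = build_adjacency(HYPERNYMS)`, the two senses of "car" against `motor_vehicle.n.01` score 0.5 and 0.25, so the word-level score is 0.5.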

Table 4 shows that: 1) under each of the three evaluation measures (Pearson, Spearman and harmonic score), the proposed DFRVec597 + Path obtains the best overall result among the compared methods, both in the number of datasets on which it scores highest and in the average over the ten datasets; 2) DFRVec597 outperforms both DFRVec_word2vec and DFRVec_GloVe-d, which indicates that concatenating their corresponding vectors is effective; 3) DFRVec597 + Path improves further on DFRVec597 (the average r rises from 0.785 to 0.797 and the average ρ from 0.771 to 0.787), which shows that the linear combination of DFRVec597 and Path is effective; 4) Counter-fitting and Paragram-sl outperform our methods on SL665 and SL999, and on RW Counter-fitting scores higher than our methods on all three measures while Paragram-sl does so on the Spearman and harmonic scores. This is because Counter-fitting and Paragram-sl were trained on SL999, of which SL665 is a subset. In addition, on RW, Counter-fitting and Paragram-sl can represent only 873 and 1015 words respectively, so more than half of the words are not represented and their RW scores are computed on only the words they cover.
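The three scores in Table 4 can be computed with the dependency-free sketch below. Pearson r and Spearman ρ are standard (ρ is taken as Pearson r on average ranks, which handles ties). Reading h as the harmonic mean of r and ρ is our interpretation, but it reproduces the table, e.g. for word2vec on MC28: 2 × 0.796 × 0.781 / (0.796 + 0.781) ≈ 0.788.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation r (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho: Pearson r computed on the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def harmonic(r: float, rho: float) -> float:
    """Harmonic mean of r and rho."""
    return 2 * r * rho / (r + rho)
```

In an evaluation, `x` would hold the human similarity ratings of a dataset's word pairs and `y` the model's similarity scores for the same pairs.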

We also compared DFRVec597 + Path with FRAGE [26]. For semantic similarity measurement, [26] reports only Spearman (ρ) results on the RG65, WS353 (called WS in [26]) and RW datasets, so Table 5 covers only these three datasets, and the FRAGE results are copied directly from Gong et al. [26]. As Table 5 shows, our method outperforms FRAGE on RG65 and WS353 but not on RW.

Table 5

Spearman(ρ) values of our DFRVec597 + Path and FRAGE

Method	RG65	WS353	RW	Avg(ρ)
FRAGE	0.787	0.693	0.581	0.687
DFRVec597 + Path	0.909	0.800	0.546	0.751

5 Conclusion

This study addresses how to measure semantic similarity between words more accurately: it designs a new model that encodes the various kinds of semantic information recorded in WordNet for a word into a vector space through word embedding, and uses the resulting vectors to measure semantic similarity between words.

First, this paper presents three sub-models, DefVec, FormVec and RelVec, which use an existing word embedding model to encode, respectively, the definitions, POS and relations of a word described in WordNet into a vector space. Then, by combining the three sub-models with the word embedding, it builds a novel model, DFRVec. Finally, based on DFRVec and the path information in WordNet, it proposes a new method, DFRVec+Path, for measuring semantic similarity between words, which outperforms many existing methods of semantic similarity measurement.

The significance of the proposed model is that it encodes various kinds of semantic information in WordNet into a word embedding vector, which can be used to measure the semantic similarity of words more precisely. It addresses two problems: methods based on word embedding cannot distinguish the senses and POS of words, and methods based on WordNet perform relatively poorly. However, the model also has some limitations. For example, its method of encoding semantic information is relatively simple, and its sources of semantic information are not diverse. Future research may involve the following aspects: (1) applying more effective methods, such as neural networks, to encode semantic information; (2) introducing more external semantic resources, such as PPDB.

Footnotes

A natural language processing toolkit based on Python.

Acknowledgments

This work was supported in part by the Humanities and Social Sciences Research Project of the Ministry of Education of China (No. 18YJA740015), and by Chongqing Key Laboratory of Software Theory and Technology, Chongqing, China.

References

1. …, Ritter and Yen P.-Y., Using ontology-based semantic similarity to facilitate the article screening process for systematic reviews, 2017.

2. Duong, Ahmad W.U., Eskin, Chang K.-W. and Li J.J., Word and sentence embedding tools to measure semantic similarity of Gene Ontology terms by their definitions, 2019.

3. Abdeddaïm, Vimard and Soualmia L.F., The MeSH-gram Neural Network Model: Extending Word Embedding Vectors with MeSH Concepts for UMLS Semantic Similarity and Relatedness in the Biomedical Domain, 2018.

4. Aouicha M.B. and Taieb M.A.H., Computing semantic similarity between biomedical concepts using new information content approach, 2016.

5. Orkphol and Yang, Word Sense Disambiguation Using Cosine Similarity Collaborates with Word2vec and WordNet, 2019.

6. Araque, Zhu and Iglesias C.A., A semantic similarity-based perspective of affect lexicons for sentiment analysis, 2019.

7. Madani, Erritali and Bengourram, Sentiment analysis using semantic similarity and Hadoop MapReduce, 2019.

8. Poria, Chaturvedi, Cambria and Bisio, Sentic LDA: Improving on LDA with semantic similarity for aspect-based sentiment analysis, in: 2016 international joint conference on neural networks (IJCNN), IEEE, 2016, pp. 4465–4473.

9. Miller G.A., WordNet: a lexical database for English, 1995.

10. Saif, Zainodin U.Z., Omar and Ghareb A.S., Weighting-based semantic similarity measure based on topological parameters in semantic taxonomy, 2018.

11. Wu and Palmer, Verbs semantics and lexical selection, in: Proceedings of the 32nd annual meeting on Association for Computational Linguistics, Association for Computational Linguistics, 1994, pp. 133–138.

12. Zhu, Li, Chen and Peng, An efficient path computing model for measuring semantic similarity using edge and density, 2018.

13. Lin, An Information-Theoretic Definition of Similarity, 1998.

14. Mikolov, Chen, Corrado and Dean, Efficient estimation of word representations in vector space, 2013.

15. Pennington, Socher and Manning, GloVe: Global vectors for word representation, in: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014, pp. 1532–1543.

16. Lee Y.Y., Ke, Yen T.Y., Huang H.H. and Chen H.H., Combining and learning word embedding with WordNet for semantic relatedness and similarity measurement, 2019.

17. Wieting, Bansal, Gimpel and Livescu, From paraphrase database to compositional paraphrase model and back, 2015.

18. Mrkšić, Séaghdha D.O., Thomson, Gašić, Rojas-Barahona, Su P.-H., Vandyke, Wen T.-H. and Young, Counter-fitting word vectors to linguistic constraints, 2016.

19. Rada, Mili, Bicknell and Blettner, Development and application of a metric on semantic nets, 1989.

20. Lastra-Díaz J.J., Goikoetxea, Taieb M.A.H., García-Serrano, Aouicha M.B. and Agirre, A reproducible survey on word embeddings and ontology-based methods for word similarity: linear combinations outperform the state of the art, 2019.

21. Taieb M.A.H., Aouicha M.B. and Hamadou A.B., Ontology-based approach for measuring semantic similarity, 2014.

22. Meng and Gu, A new model for measuring word sense similarity in WordNet, in: Proceedings of the 4th international conference on advanced communication and networking. SERSC, Jeju, Korea, 2012, pp. 18–23.

23. Adhikari, Dutta, Mondal and Singh, An intrinsic information content-based semantic similarity measure considering the disjoint common subsumers of concepts of an ontology, 2018.

24. Cai, Zhang, Lu and Che, A hybrid approach for measuring semantic similarity based on IC-weighted path distance in WordNet, 2018.

25. Zhang, Sun and Zhang, An information Content-Based Approach for Measuring Concept Semantic Similarity in WordNet, 2018.

26. Gong, He, Tan, Qin, Wang and Liu T.-Y., FRAGE: frequency-agnostic word representation, in: Proceedings of the 32nd International Conference on Neural Information Processing Systems, Curran Associates Inc., Montréal, Canada, 2018, pp. 1341–1352.

27. Hill, Reichart and Korhonen, SimLex-999: Evaluating semantic models with (genuine) similarity estimation, 2015.

28. Scheepers, Kanoulas and Gavves, Improving Word Embedding Compositionality using Lexicographic Definitions, International World Wide Web Conferences Steering Committee, Lyon, France, 2018.

29. Jimenez, Gonzalez F.A., Gelbukh and Duenas, word2set: WordNet-Based Word Representation Rivaling Neural Word Embedding for Lexical Similarity and Sentiment Analysis, 2019.

30. Miller G.A. and Charles W.G., Contextual correlates of semantic similarity, 1991.

31. Rubenstein and Goodenough J.B., Contextual correlates of synonymy, 1965.

32. Pirró, A semantic similarity metric combining features and intrinsic information content, 2009.

33. Agirre, Alfonseca, Hall, Kravalova, Paşca and Soroa, A study on similarity and relatedness using distributional and wordnet-based approaches, in: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, 2009, pp. 19–27.

34. Yang and Powers D.M., Verb similarity on the taxonomy of WordNet, Masaryk University, 2006.

35. Finkelstein, Gabrilovich, Matias, Rivlin, Solan, Wolfman and Ruppin, Placing search in context: The concept revisited, 2002.

36. Luong, Socher and Manning, Better word representations with recursive neural networks for morphology, in: Proceedings of the Seventeenth Conference on Computational Natural Language Learning, 2013, pp. 104–113.

37. Bruni, Tran N.-K. and Baroni, Multimodal distributional semantics, 2014.