Abstract
In natural language processing, the problem of finding the intended meaning or “sense” of a word which is activated by the use of that word in a particular context is generally known as word sense disambiguation (WSD) problem. The solution to this problem impacts many other fields of natural language processing including sentiment analysis and machine translation. Here, WSD problem is modelled as a combinatorial optimization problem where the goal is to find a sequence of meanings or senses that maximizes the semantic meaning among the targeted words. In this work, an algorithm is proposed that uses a combinatorial version of particle swarm optimization algorithm for solving WSD problem. The test results show that the algorithm performs better than existing methods.
Keywords
Introduction
Human language is ambiguous that many words can be interpreted in different and multiple ways depending on the context in which they are used. For a human being, when an ambiguous word is used in a sentence, he can understand the correct meaning of the word by correlating to the context in which it appears without considering alternative senses. This is not the case with a computer that processes natural language applications, thus creating erroneous results. The words having a different meaning or sense in a different context are called polysemous words. For example, the word “bat” in a sentence can be translated as an implement used in sports to hit balls or as a flying mammal? The process of automatically determining the intended sense of a polysemous word in a context is known as word sense disambiguation. It is a fundamental task in natural language processing applications. WSD heavily relies on knowledge sources. Knowledge sources can be annotated or un-annotated corpora of text, machine-readable dictionaries, semantic networks, etc. [1].
The process of creating the knowledge resources is an expensive and time consuming one. The knowledge source creation must be repeated when the disambiguation scenario changes. That is, for different language and domains, we need to do this process again. This is the problem that mostly affects the field of WSD and is called the knowledge acquisition bottleneck [2]. Over the decades, there are a lot of studies are carried out to find different methods for solving WSD. These methods include supervised methods as well as unsupervised methods. With supervised methods, any of the machine learning algorithms are applied to train the classifiers on large manually-annotated corpora of text. These classifiers, when supplied with new words, they assign them the correct sense using their prior experience. The disadvantage is that the creation of manually annotated corpora for different languages is a big challenge. With unsupervised methods, they employ unannotated corpora of text to distinguish among senses. It works on the assumption that the probability is high for words that appear in similar contexts to have similar senses [3].
Recently, the WSD problem is presented as an optimization problem. T. Pedersen et al. [4] have proposed a definition to solve WSD as a combinatorial optimization problem, which has motivated us to work on this problem. Approximation methods such as swarm intelligence techniques can be used to solve the problem. Particle Swarm Optimization (PSO) algorithm is one such technique which is well known for its fast convergence for solving such problems. PSO algorithm work in real number space. In order to apply PSO algorithm for combinatorial problems, the combinatorial version of PSO proposed by Jarboui et al. [5] is used in this work to solve the WSD problem. The approach is novel as, to the best of our knowledge and belief, this is the first work that uses the combinatorial version of PSO algorithm to solve the problem.
Related works
Generally, for WSD problem, the supervised methods give better results than the unsupervised ones [6]. But the creation of annotated corpora of text needs great effort. Also, it is to be made available separately for each and every language under consideration and each domain including their senses. Also, languages evolves over time, requiring to add new words and new senses. For example, the word ‘rock’ means ‘a stone’ and or ‘a music genre’ nowadays [7]. Unsupervised knowledge-based methods use dictionaries and lexical resources such as WordNet. One of the main advantages of knowledge-based approaches is that these apply to any text. This is because the knowledge resources are easily and increasingly available and they are becoming more informative [8].
The Lesk method [9] is a popular knowledge-based method to solve the WSD problem. Lesk Method is a local algorithm and is very sensitive to the exact wording of the sense definitions and hence perform poorly. An improvement in this approach was suggested by T Pedersen et al. [10] named Extended Lesk (E-Lesk) algorithm, which is a global algorithm. They extended the gloss of the sense by the glosses of its related synsets. The semantic relations in the WordNet [11] like hypernymy, hyponymy, meronymy etc. are used for expanding the gloss. Here all the words in a context window are disambiguated by considering all the possible combinations of senses. Each of these combinations is assigned a score based on the overlap among senses’ definitions and their semantic relatedness. Pedersen et al. [4] presented a method of word sense disambiguation that assigns a target word the sense that is most related to the senses of its neighbouring words. This method is a brute force method leading to combinatorial explosion for simple sentences even. Butnaru et al. [12] uses a DNA sequencing technique to solve WSD problem. D S Chaplot et al. [13] model the WSD problem as a maximum a posteriori (MAP) inference query on Markov random field (MRF). It is a graph based unsupervised algorithm that tries to maximize the total joint probability of all the senses in the context.
Among the approximation approaches, J. Cowie et al. [14] proposed a method using a machine-readable dictionary and the technique used for optimization is simulated annealing (SA). C. Zhang et al. [15] uses genetic algorithm (GA) for solving word sense disambiguation, but, the algorithm takes too much time to converge. W Alsaeedan et al. [16] uses self-adaptive GA (SAGA) for WSD problem that automatically tune its cross over and mutation probalities. Another proposal by W Alsaeedan et al. [17] uses hybrid of SAGA and ant colony optimization algorithm to solve WSD. D. Schwab et al. [18, 19] propose ant colony algorithm for solving WSD and shows that the ant colony based algorithm outperforms both GA and SA. The disadvantage of this method is that the time to converge is uncertain. S. Abdualhaja et al. [8] make use of artificial bee colony optimization algorithm to solve the WSD problem. W AL-Saiagh et al. [20] has developed a hybrid PSO algorithm with simulated annealing to solve the problem. They use E-Lesk algorithm and JCN method for finding the similarity score.
WSD as an optimization problem
The definition provided by Pedersen et al. [4] to represent WSD problem as a combinatorial optimization problem is used in this work. Let C = {w0, w1, . . . . . wn-1} be a set of n words in a window and w0 be the target word to be disambiguated. Suppose each word w i , 0 ≤ i ≤ n - 1, has m i possible senses si1, si2, . . . . . s im i . Then, the objective function is
Here rel is the semantic relatedness score between any two senses. Using Equation (1), each sense of the target word is assigned a score. This score depends on the maximum relatedness of the target word with senses of other words [8]. We aim to find a sequence of senses which maximizes the overall relatedness score among the words in a given sentence. It assigns a score to a sequence of senses of all the words in the context window than assigning a score to each sense of the target word. This sequence is a solution configuration that we are trying to find. On multiple iterations of the applied optimization algorithm, the sequence is modified to find the one having maximum score.
PSO algorithm introduced by Kennedy and Eberhart [21] is one of the mainstream metaheuristics applied for optimization problems. Each swarm consists of a set of particles and the individual and group behaviour of these particles guide them to the desired goal. These particles are the potential solutions to the optimization problem we are dealing with. Combinatorial version of the original PSO algorithm proposed by B. Jarboui et al. [5] is used here for solving the WSD problem. A particle position (solution to the optimization problem) is denoted by
Let
The update of the solution is computed within
The proposed method is an unsupervised knowledge-based method which uses WordNet for solving the word sense disambiguation problem. We solve WSD as a combinatorial optimization problem. Here, Combinatorial PSO is used for solving the WSD, which is a global method. It uses a knowledge-based local algorithm, the E-Lesk algorithm. Given a set of target words as input, the proposed algorithm finds a corresponding sequence of senses representing the target words. The general description of the algorithm is presented as Procedure D_CPSO () (Algorithm 1).
Procedure D_CPSO ()
Read a sentence
Call Create_Seq ()
Call WSD_CPSO ()
Find the Sum of the score sequence
Find the Sequence having maximum sum
Print the senses from that sequence
Let C = {w0, w1, . . . . . wn-1} be a set of n words and w0 ∈ C is the target word. Let seq1 = {s0,1, . . . . sn-1,1} is a sequence of senses corresponding to the first sense of each word in the context window. Let S = {seq1, seq2, . . . . seq m } be the set of all sequences that covers all the senses’ combinations of all the words in the context window. Now we can extend the Equation (1) for this sequence as
Procedure Create_Seq ()
Select the first polysemous word w0 in the sentence
Find the synsets of w0
create a sequence of senses
create an empty score sequence, x
create an empty velocity sequence, v
create an empty personal best sequence, p
Add a random synset to the sequence
Find the Relatedness score and add to score sequence, x
Add score to personal best sequence, p
Set velocity as 0,add to velocity sequence, v
Find the global best sequence, G
We used E-Lesk algorithm [10] as the local algorithm. The definitions of the senses as well as the semantic relatedness given by WordNet [11], are used as such. We take a passage as input and process sentence by sentence. That is, our process is applied to a sentence in a passage, and then continue to the next sentence in the passage. Before applying the combinatorial PSO, we need to provide candidate sequences. Procedure Create_Seq () (Algorithm 2) is used for the same. The process starts with the first poly-semous word in the sentence. The number of senses it has will be the no. of candidate sequences. That is sequences are formed in such a way that first sense will be one of the sense of the first word. So if first polysemous word w0 have k senses, then there will be k sequences and first sense of each sequence will be one of w0’s sense. Senses are obtained by the synset from WordNet [11]. The next step will be completing the sequence. That is, for each sequence, we select the second entry by randomly selecting one of the senses of the next polysemous word. Similarly, the sequence grows by selecting the sense of successive polysemous words in that sentence. Finally, the length of the sequence will be the number of polysemous words in that sentence.
Procedure WSD_CPSO ()
y ij = 1 or -1 randomly
y ij = 1
y ij = -1
y ij = 0
d1 = -1 - y ij
d2 = 1 - y ij
v ij = w . v ij + r1 . c1 . d1 + r2 . c2 . d2
λ ij = y ij + v ij
y ij = 1
y ij = -1
y ij = 0
x ij = G j
x ij = p ij
Update Personal best sequence, p
Update Global best sequence, G
Applying combinatorial PSO
For every sequence, it contains senses of words in a sentence. Now we want to find out a sequence which will have the suitable sense of the words in that sentence. In each sequence, for every two consecutive senses, there will be a score. We need to maximize this score. If there are p polysemous words then there will be p-1 scores. Let x i = {xi1, xi2, . . . . xip-1} be scores in sequence i. So we need to optimize the score. For that we can use the combinatorial PSO. For that we need to define a velocity sequence v i = {vi1, vi2, . . . . vip-1}. Initially all velocity values will be zero. Similarly we also need sequence for personal best sequence p i = {p i 1 , p i 2 , . . . . p i p-1 } and global best sequence G = {G1, G2, . . . . Gp-1}. Procedure WSD_CPSO () (Algorithm 3) finds the same. The value of x i andv i will be updated using the equations (2), (3), (4), (5) and (3). After executing max_iteration times, the sequence having maximum score is selected as the solution sequence. This process is repeated for next sentence and so on. The final output will be the possible sense for the words in the given passage (excluding monosemous words, preposition, etc.). Procedure D_CPSO () (Algorithm 1) algorithm invokes Algorithm 2 and Algorithm 3 to find the scores of each sequence. It then finds the sequence having the maximum score and prints the senese from that sequence.
Experimental evaluation
The proposed algorithm is implemented using Python on a laptop machine with Intel Core i5-4200 processor @1.6GHz x 2 with 3.18GiB RAM run on Linux Kernel 3.19.0-32-generics with 809.9 GB hard drive. We use the natural language toolkit (nltk) package for language processing. nltk provides a WordNet interface, from which we can obtain the senses of a word as synsets and relatedness score. We use path similarity for the relatedness score. The propositions, pronouns, connectives, etc. present in the given input is eliminated before processing. The parameters of the combinatorial PSO algorithm are found on trial and error basis and optimal values are set based on the result obtained for the input passage from “Oliver Twist”. The acceleration constants c1 and c2 are fixed at 1 and parameter α also set to 1. The max_iteration is set to 50. These values are fixed after several independent trials of different combinations.
We evaluate our system by using passages from a set of text documents. The documents contains lines from the famous novel ‘Oliver Twist’, ‘Immortals of Meluha’ and ‘Knights of Art’. These documents are given as input and our system is used for disambiguating all words in these passages. The system takes sentence by sentence in the passage and process one sentence at a time. A result file is also generated with each sentence and the meaning or sense given by the system for each word in that passage. We did crosschecking of the identified senses by the system manually for its correctness.
The passage from ‘Oliver Twist’ is given as the input test document, whose screen shot is shown in Fig. 1. The screen shot of the output produced by the proposed algorithm is shown in Fig. 2. The system gives a sense for 62 words in the test document. Out of 62 words, 55 identified senses are correct, giving a precision of 88.71%.

Snapshot of passage from “Oliver Twist”.

Snapshot of result obtained for D_CPSO algorithm for passage from “Oliver Twist”.
For comparing the proposed algorithm with some of the existing algorithms, we implemented E-Lesk [10] and bee colony solution for WSD (D_bees) [8]. Both the systems are fed with the same input passages and the result obtained are analyzed. It is observed that the proposed system attained maximum performance whereas E-Lesk performed the minimum. Bee colony implementation produces better output than E-Lesk. The proposed D_CPSO implementation gives more stabilized output over several runs. It also gives more correct senses as compared to the other algorithms. For test document shown in Fig. 1, bee colony solution for WSD [8] produces 14 wrong senses with a precision of 77.41% and E-Lesk [10] produces 20 wrong senses with a precision of 67.74%.
Similar evaluation is done for a passage from the novel ’Immortals of Meluha’ consisting of 176 words. The D_CPSO algorithm identified the correct senses for 146 words out of 176 words with a precision of 84.88%. The snapshot of the input is presented as Fig. 3 and result as Fig. 4. The same input is fed to the D_bees algorithm which identified the correct senses for 133 words giving a precision of 77.32%. The E-Lesk algorithm identified 120 words correctly with a precision of 69.76% only.

Snapshot of passage from “Immortals of Meluha”.

Snapshot of result obtained for D_CPSO algorithm for passage from “Immortals of Meluha”
Another passage with more number of words from the novel ‘Knights of Art’ with 182 words is fed to the proposed algorithm. The screenshot of the passage is shown in Fig. 5 and the result obtained is shown in Fig. 6. With this input, the proposed algorithm correctly identified the senses for 150 words whereas the D_bees algorithm identified 139 word senses correctly and the E-Lesk algorithm could identify only 125 senses correctly.

Snapshot of passage from “Knights of Art”

Snapshot of result obtained for D_CPSO algorithm for passage from “Knights of Art”
The precision of all these algorithms for 62 words from ‘Oliver Twist’, for 176 words from the ‘Immortals of Meluha’ and for 182 words from ‘Knights of Art’ are presented in Table 1.
Performance Analysis
From Table 1, it is clear that, the proposed method performs better than the other state-of-art techniques. The percentage of precision obtained is above 83% with all the test cases for the proposed algorithm, which is promising. On analyzing the running time required by all the three algorithms, the proposed algorithm takes slightly more time than the other two algorithms on all the test inputs.
Word Sense Disambiguation is defined as identifying computationally the intended sense of a word that is activated in a certain context. In this work, a knowledge-based unsupervised method for solving word sense disambiguation problem is proposed. We have modelled the problem as a combinatorial optimization problem. It uses a combinatorial version of PSO algorithm for solving the problem. When given a text document as input, the proposed algorithm tries to disambiguate all words in the document. On comparing the proposed method to some of the existing methods, the proposed algorithm performs better for tested inputs. As a future work, the algorithm performance is to be analyzed with benchmark data sets.
