Solving word sense disambiguation problem using combinatorial PSO

Abstract

In natural language processing, the problem of finding the intended meaning or “sense” of a word that is activated by its use in a particular context is generally known as the word sense disambiguation (WSD) problem. The solution to this problem impacts many other fields of natural language processing, including sentiment analysis and machine translation. Here, the WSD problem is modelled as a combinatorial optimization problem where the goal is to find a sequence of meanings or senses that maximizes the semantic relatedness among the targeted words. In this work, an algorithm is proposed that uses a combinatorial version of the particle swarm optimization algorithm for solving the WSD problem. The test results show that the algorithm attains higher precision than the two knowledge-based methods it is compared with, E-Lesk and D-Bees, on the tested passages.

Keywords

Word sense disambiguation, particle swarm optimization, knowledge-based approach, combinatorial PSO

1 Introduction

Human language is ambiguous: many words can be interpreted in multiple ways depending on the context in which they are used. When an ambiguous word is used in a sentence, a human reader can understand its correct meaning from the context in which it appears, without considering alternative senses. This is not the case for a computer running natural language applications, which can therefore produce erroneous results. Words that have different meanings or senses in different contexts are called polysemous words. For example, the word “bat” in a sentence can be interpreted as an implement used in sports to hit balls or as a flying mammal. The process of automatically determining the intended sense of a polysemous word in a context is known as word sense disambiguation. It is a fundamental task in natural language processing applications. WSD relies heavily on knowledge sources. Knowledge sources can be annotated or unannotated corpora of text, machine-readable dictionaries, semantic networks, etc. [1].

Creating knowledge resources is expensive and time-consuming, and it must be repeated whenever the disambiguation scenario changes, that is, for each new language and domain. This problem, which affects most of the field of WSD, is called the knowledge acquisition bottleneck [2]. Over the decades, many studies have been carried out to find methods for solving WSD. These include supervised as well as unsupervised methods. In supervised methods, machine learning algorithms are applied to train classifiers on large manually annotated corpora of text. When supplied with new words, these classifiers assign them a sense based on what they learned during training. The disadvantage is that creating manually annotated corpora for different languages is a big challenge. Unsupervised methods employ unannotated corpora of text to distinguish among senses. They work on the assumption that words appearing in similar contexts are likely to have similar senses [3].

Recently, the WSD problem has been presented as an optimization problem. T. Pedersen et al. [4] proposed a formulation of WSD as a combinatorial optimization problem, which motivated us to work on this problem. Approximation methods such as swarm intelligence techniques can be used to solve it. The particle swarm optimization (PSO) algorithm is one such technique, well known for its fast convergence. The PSO algorithm works in a real-valued space. To apply it to combinatorial problems, the combinatorial version of PSO proposed by Jarboui et al. [5] is used in this work to solve the WSD problem. To the best of our knowledge, this is the first work that uses the combinatorial version of the PSO algorithm to solve the problem.

2 Related works

Generally, for the WSD problem, supervised methods give better results than unsupervised ones [6]. However, creating annotated corpora of text requires great effort, and such corpora must be made available separately for every language and every domain under consideration, including their senses. Languages also evolve over time, which requires adding new words and new senses. For example, the word ‘rock’ nowadays means both ‘a stone’ and ‘a music genre’ [7]. Unsupervised knowledge-based methods use dictionaries and lexical resources such as WordNet. One of the main advantages of knowledge-based approaches is that they apply to any text, because the knowledge resources are increasingly available and are becoming more informative [8].

The Lesk method [9] is a popular knowledge-based method to solve the WSD problem. The Lesk method is a local algorithm that is very sensitive to the exact wording of the sense definitions and hence performs poorly. An improvement on this approach, named the Extended Lesk (E-Lesk) algorithm, was suggested by T. Pedersen et al. [10]; it is a global algorithm. They extended the gloss of a sense with the glosses of its related synsets. The semantic relations in WordNet [11], such as hypernymy, hyponymy and meronymy, are used for expanding the gloss. Here all the words in a context window are disambiguated by considering all the possible combinations of senses. Each of these combinations is assigned a score based on the overlap among the senses’ definitions and their semantic relatedness. Pedersen et al. [4] presented a method of word sense disambiguation that assigns a target word the sense that is most related to the senses of its neighbouring words. This is a brute-force method that leads to combinatorial explosion even for simple sentences. Butnaru et al. [12] use a DNA sequencing technique to solve the WSD problem. D. S. Chaplot et al. [13] model the WSD problem as a maximum a posteriori (MAP) inference query on a Markov random field (MRF). It is a graph-based unsupervised algorithm that tries to maximize the total joint probability of all the senses in the context.
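To illustrate the gloss-overlap idea behind Lesk-style scoring, the following is a minimal Python sketch. It counts shared content words between two glosses, which is a simplification: it does not expand glosses with related synsets as E-Lesk does. The glosses, the stopword list and the function name `gloss_overlap` are hypothetical and chosen only for illustration.

```python
# Simplified Lesk-style overlap: number of distinct content words shared by two glosses.
# The stopword list and the example glosses are illustrative assumptions, not WordNet data.
STOPWORDS = frozenset({"a", "an", "the", "of", "to", "in", "or", "and", "used", "with", "for"})


def gloss_overlap(gloss_a, gloss_b, stopwords=STOPWORDS):
    """Return the number of distinct non-stopword tokens the two glosses share."""
    tokens_a = {t for t in gloss_a.lower().split() if t not in stopwords}
    tokens_b = {t for t in gloss_b.lower().split() if t not in stopwords}
    return len(tokens_a & tokens_b)


bat_animal = "nocturnal flying mammal with membranous wings"
bat_club = "club used to hit the ball in sports"
cave = "underground hollow where a flying mammal may roost"

print(gloss_overlap(bat_animal, cave))  # 2 ("flying" and "mammal")
print(gloss_overlap(bat_club, cave))    # 0
```

Because the score depends on exact word matches, rewording a gloss can change the result, which is the sensitivity noted above.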

Among the approximation approaches, J. Cowie et al. [14] proposed a method that uses a machine-readable dictionary with simulated annealing (SA) as the optimization technique. C. Zhang et al. [15] use a genetic algorithm (GA) for solving word sense disambiguation, but the algorithm takes too much time to converge. W. Alsaeedan et al. [16] use a self-adaptive GA (SAGA) for the WSD problem that automatically tunes its crossover and mutation probabilities. Another proposal by W. Alsaeedan et al. [17] uses a hybrid of SAGA and the ant colony optimization algorithm to solve WSD. D. Schwab et al. [18, 19] propose an ant colony algorithm for solving WSD and show that it outperforms both GA and SA. The disadvantage of this method is that the time to converge is uncertain. S. Abdualhaja et al. [8] make use of the artificial bee colony optimization algorithm to solve the WSD problem. W. AL-Saiagh et al. [20] have developed a hybrid of the PSO algorithm and simulated annealing to solve the problem. They use the E-Lesk algorithm and the JCN method for finding the similarity score.

3 WSD as an optimization problem

The definition provided by Pedersen et al. [4] to represent the WSD problem as a combinatorial optimization problem is used in this work. Let $C = \{w_{0}, w_{1}, \ldots, w_{n-1}\}$ be a set of n words in a window and $w_{0}$ be the target word to be disambiguated. Suppose each word $w_{i}$, $0 \le i \le n - 1$, has $m_{i}$ possible senses $s_{i1}, s_{i2}, \ldots, s_{i m_{i}}$. Then, the objective function is

$\underset{i = 1}{\overset{m_{0}}{\mathrm{argmax}}} \sum_{j = 0}^{n - 1} \max \{ rel (s_{0 i}, s_{j 1}), \ldots, rel (s_{0 i}, s_{j m_{j}}) \}$ (1)

Here rel is the semantic relatedness score between any two senses. Using Equation (1), each sense of the target word is assigned a score. This score depends on the maximum relatedness of that sense to the senses of the other words [8]. In this work, we instead aim to find a sequence of senses that maximizes the overall relatedness score among the words in a given sentence. That is, a score is assigned to a sequence of senses of all the words in the context window rather than to each sense of the target word. This sequence is the solution configuration that we are trying to find. Over multiple iterations of the optimization algorithm, the sequence is modified to find the one with the maximum score.
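The target-word formulation of Equation (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the sense labels and the relatedness table `TOY_REL` are invented, and the sum runs over the context words only, leaving out the target word's own term.

```python
# Sketch of Equation (1): score each target sense by summing, over the context
# words, its maximum relatedness to any sense of that word, then take the argmax.
# Sense labels and relatedness values below are hypothetical.
TOY_REL = {
    ("bat#animal", "cave#hollow"): 0.5,
    ("bat#animal", "fly#move"): 0.4,
    ("bat#animal", "fly#insect"): 0.3,
    ("bat#club", "cave#hollow"): 0.1,
    ("bat#club", "fly#move"): 0.2,
    ("bat#club", "fly#insect"): 0.1,
}


def toy_rel(a, b):
    """Symmetric lookup in the toy table; unknown pairs score 0."""
    return TOY_REL.get((a, b), TOY_REL.get((b, a), 0.0))


def best_target_sense(target_senses, context_senses, rel):
    """Return the target sense with the largest summed best-match relatedness."""
    def score(s0):
        return sum(max(rel(s0, s) for s in senses) for senses in context_senses)
    return max(target_senses, key=score)


chosen = best_target_sense(
    ["bat#animal", "bat#club"],
    [["cave#hollow"], ["fly#move", "fly#insect"]],
    toy_rel,
)
print(chosen)  # bat#animal (0.5 + 0.4 against 0.1 + 0.2)
```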

4 Solving WSD using combinatorial PSO

The PSO algorithm introduced by Kennedy and Eberhart [21] is one of the mainstream metaheuristics applied to optimization problems. A swarm consists of a set of particles, and the individual and group behaviour of these particles guides them to the desired goal. These particles are the potential solutions to the optimization problem we are dealing with. The combinatorial version of the original PSO algorithm proposed by B. Jarboui et al. [5] is used here for solving the WSD problem. A particle position (a solution to the optimization problem) is denoted by $X_{i}^{t} = \{x_{i 1}^{t}, x_{i 2}^{t}, \ldots, x_{in}^{t}\}$. $Y_{i}^{t} = \{y_{i 1}^{t}, y_{i 2}^{t}, \ldots, y_{in}^{t}\}$ is a vector associated with the solution $X_{i}^{t}$, in which each $y_{ij}^{t}$ takes a value from {-1, 0, 1} according to the state of the i^th particle at iteration t. It is a dummy variable used to move from the combinatorial state to the continuous state and vice versa:

$y_{ij}^{t} = \begin{cases} 1, & \text{if } x_{ij}^{t} = G_{j}^{t}, \\ -1, & \text{if } x_{ij}^{t} = p_{ij}^{t}, \\ -1 \text{ or } 1 \text{ randomly}, & \text{if } x_{ij}^{t} = G_{j}^{t} = p_{ij}^{t}, \\ 0, & \text{otherwise} \end{cases}$ (2)

Let $d_{1} = - 1 - y_{ij}^{t - 1}$ be the distance between the current particle position $x_{ij}^{t - 1}$ and the best position achieved by the i^th particle ($p_{i}$), and let $d_{2} = 1 - y_{ij}^{t - 1}$ be the distance between the current particle position $x_{ij}^{t - 1}$ and the best position among all particles in the swarm ($G$). Then the velocity update in combinatorial PSO is given by Equation (3):

$v_{ij}^{t} = v_{ij}^{t - 1} + r_{1} c_{1} (- 1 - y_{ij}^{t - 1}) + r_{2} c_{2} (1 - y_{ij}^{t - 1})$ (3)

where r₁ and r₂ are random numbers in [0, 1] and c₁ and c₂ are the acceleration constants. With this function, the change of the velocity $v_{ij}^{t}$ depends on $y_{ij}^{t - 1}$. If $x_{ij}^{t - 1} = G_{j}^{t - 1}$, then $y_{ij}^{t - 1} = 1$, so d₂ = 0 and d₁ = -2. If $x_{ij}^{t - 1} = p_{ij}^{t - 1}$, then $y_{ij}^{t - 1} = - 1$, so d₂ = 2 and d₁ = 0. For $x_{ij}^{t - 1} \neq G_{j}^{t - 1}$ and $x_{ij}^{t - 1} \neq p_{ij}^{t - 1}$, $y_{ij}^{t - 1} = 0$, so d₂ = 1 and d₁ = -1. In the case where $x_{ij}^{t - 1} = G_{j}^{t - 1}$ and $x_{ij}^{t - 1} = p_{ij}^{t - 1}$, $y_{ij}^{t - 1}$ takes a value in {-1, 1}. The parameters r₁, r₂, c₁ and c₂ then decide the value of the velocity.
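As a concrete reading of Equations (2) and (3), the following minimal Python sketch encodes one component of a particle and updates its velocity. The function names `encode_state` and `update_velocity` are hypothetical, and r₁ and r₂ are assumed to be drawn uniformly from [0, 1].

```python
import random


def encode_state(x, g, p, rng=random):
    """Equation (2): map a discrete component x to y in {-1, 0, 1}.

    g is the global best component and p the particle's personal best component.
    """
    if x == g and x == p:
        return rng.choice((-1, 1))
    if x == g:
        return 1
    if x == p:
        return -1
    return 0


def update_velocity(v, y, c1=1.0, c2=1.0, rng=random):
    """Equation (3): add the random pulls toward -1 (personal best) and +1 (global best)."""
    r1, r2 = rng.random(), rng.random()  # assumed uniform in [0, 1)
    d1 = -1 - y
    d2 = 1 - y
    return v + r1 * c1 * d1 + r2 * c2 * d2


rng = random.Random(0)
y = encode_state("sense_a", "sense_a", "sense_b", rng)  # x equals the global best only
print(y)                                  # 1
print(update_velocity(0.0, y, rng=rng))   # a value in [-2, 0], since d2 = 0 and d1 = -2
```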

4.1 Solution construction

The update of the solution is computed through $y_{ij}^{t}$. First, an intermediate value is computed:

$\lambda_{ij}^{t} = y_{ij}^{t - 1} + v_{ij}^{t}$ (4)

The value of $y_{ij}^{t}$ is then adjusted according to Equation (5):

$y_{ij}^{t} = \begin{cases} 1, & \text{if } \lambda_{ij}^{t} > \alpha, \\ -1, & \text{if } \lambda_{ij}^{t} < - \alpha, \\ 0, & \text{otherwise} \end{cases}$ (5)

where α is a parameter. The new solution is given by Equation (6):

$x_{ij}^{t} = \begin{cases} G_{j}^{t - 1}, & \text{if } y_{ij}^{t} = 1, \\ p_{ij}^{t - 1}, & \text{if } y_{ij}^{t} = - 1, \\ \text{a random number}, & \text{otherwise} \end{cases}$ (6)
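The construction step of Equations (4) to (6) can be sketched as follows. This is a minimal illustration, assuming that the "random number" in Equation (6) means a value drawn uniformly from the admissible values of that component; the function name `construct_component` and the `candidates` argument are hypothetical.

```python
import random


def construct_component(y_prev, v, g, p, candidates, alpha=1.0, rng=random):
    """Return the new component x and its state y for one position of a particle."""
    lam = y_prev + v                  # Equation (4)
    if lam > alpha:                   # Equation (5)
        y = 1
    elif lam < -alpha:
        y = -1
    else:
        y = 0
    if y == 1:                        # Equation (6)
        x = g                         # copy the global best component
    elif y == -1:
        x = p                         # copy the personal best component
    else:
        x = rng.choice(candidates)    # assumed: uniform random admissible value
    return x, y


print(construct_component(0, 1.5, "g_sense", "p_sense", ["a", "b"]))   # ('g_sense', 1)
print(construct_component(0, -1.5, "g_sense", "p_sense", ["a", "b"]))  # ('p_sense', -1)
```

With α = 1 and a velocity of small magnitude, the component falls in the middle band and is resampled, which keeps some exploration in the swarm.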

4.2 Proposed algorithm

The proposed method is an unsupervised knowledge-based method which uses WordNet for solving the word sense disambiguation problem. We solve WSD as a combinatorial optimization problem. Combinatorial PSO serves as the global method, and the E-Lesk algorithm serves as the knowledge-based local algorithm. Given a set of target words as input, the proposed algorithm finds a corresponding sequence of senses representing the target words. The general description of the algorithm is presented as Procedure D_CPSO () (Algorithm 1).

Algorithm 1 Solving WSD using Combinatorial PSO

    Procedure D_CPSO ()
        Read a sentence
        Call Create_Seq ()
        Call WSD_CPSO ()
        Find the sum of the score sequence
        Find the sequence having the maximum sum
        Print the senses from that sequence

Let $C = \{w_{0}, w_{1}, \ldots, w_{n-1}\}$ be a set of n words, with $w_{0} \in C$ the target word. Let $seq_{1} = \{s_{0,1}, \ldots, s_{n-1,1}\}$ be the sequence of senses corresponding to the first sense of each word in the context window. Let $S = \{seq_{1}, seq_{2}, \ldots, seq_{m}\}$ be the set of all sequences, covering all sense combinations of all the words in the context window. Equation (1) can now be extended to sequences as

$\mathrm{argmax}_{seq \in S} \; score (seq)$ (7)

where the score is the value assigned to a sequence of senses based on their semantic relatedness with each other.
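For a small context window, Equation (7) can be evaluated exhaustively, which also shows why an approximation method is needed: the number of sequences is the product of the sense counts. The sketch below is an illustration only. It assumes the sequence score is the sum of relatedness values of consecutive senses, as described in Section 4.2.2, and uses invented sense labels and relatedness values.

```python
from itertools import product

# Hypothetical relatedness values between consecutive senses.
PAIR_REL = {
    ("bat#animal", "cave#hollow"): 0.5,
    ("bat#club", "cave#hollow"): 0.1,
    ("cave#hollow", "fly#move"): 0.3,
    ("cave#hollow", "fly#insect"): 0.2,
}


def pair_rel(a, b):
    """Toy relatedness lookup; unknown pairs score 0."""
    return PAIR_REL.get((a, b), 0.0)


def sequence_score(seq, rel):
    """Sum of relatedness over consecutive senses (p - 1 terms for p words)."""
    return sum(rel(a, b) for a, b in zip(seq, seq[1:]))


def best_sequence(senses_per_word, rel):
    """Equation (7) by brute force over every combination of senses."""
    return max(product(*senses_per_word), key=lambda seq: sequence_score(seq, rel))


senses_per_word = [["bat#animal", "bat#club"], ["cave#hollow"], ["fly#move", "fly#insect"]]
best = best_sequence(senses_per_word, pair_rel)
print(best)  # ('bat#animal', 'cave#hollow', 'fly#move')
```

Here only 2 × 1 × 2 = 4 sequences exist; a sentence with ten words of five senses each would already have 5¹⁰ candidates.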

Algorithm 2 Sequences creation

    Procedure Create_Seq ()
        Select the first polysemous word w₀ in the sentence
        Find the synsets of w₀
        for (every synset of w₀) do
            create a sequence of senses
            create an empty score sequence, x
            create an empty velocity sequence, v
            create an empty personal best sequence, p
        end for
        for (every word in the sentence) do
            if (the word has a synset) then
                for (every sequence) do
                    Add a random synset to the sequence
                    Find the relatedness score and add it to the score sequence, x
                    Add the score to the personal best sequence, p
                    Set the velocity to 0 and add it to the velocity sequence, v
                end for
            end if
        end for
        Find the global best sequence, G

4.2.1 Sequence creation

We used the E-Lesk algorithm [10] as the local algorithm. The definitions of the senses, as well as the semantic relatedness given by WordNet [11], are used as such. We take a passage as input and process it sentence by sentence. That is, our process is applied to one sentence in the passage and then continues with the next sentence. Before applying combinatorial PSO, we need to provide candidate sequences. Procedure Create_Seq () (Algorithm 2) is used for this. The process starts with the first polysemous word in the sentence. The number of senses it has is the number of candidate sequences. That is, the sequences are formed in such a way that the first sense of each sequence is one of the senses of the first word. So if the first polysemous word w₀ has k senses, then there are k sequences, and the first sense of each sequence is one of the senses of w₀. Senses are obtained from the synsets in WordNet [11]. The next step is to complete the sequences. For each sequence, we select the second entry by randomly selecting one of the senses of the next polysemous word. Similarly, the sequence grows by selecting a sense for each successive polysemous word in the sentence. Finally, the length of each sequence equals the number of polysemous words in the sentence.
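The sequence creation step described above can be sketched in Python as follows. This is a minimal illustration, assuming the senses of each polysemous word are already available as lists; the function name `create_sequences` and the sense labels are hypothetical, and the score, velocity and personal best bookkeeping of Algorithm 2 is omitted.

```python
import random


def create_sequences(senses_per_word, rng=random):
    """Build one candidate sequence per sense of the first polysemous word.

    Each remaining position is filled with a randomly chosen sense of the
    corresponding later word, so every sequence has one entry per word.
    """
    first, rest = senses_per_word[0], senses_per_word[1:]
    return [[s0] + [rng.choice(options) for options in rest] for s0 in first]


toy_senses = [["a1", "a2", "a3"], ["b1", "b2"], ["c1"]]
sequences = create_sequences(toy_senses, rng=random.Random(1))
print(len(sequences))               # 3, one per sense of the first word
print([s[0] for s in sequences])    # ['a1', 'a2', 'a3']
```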

Algorithm 3 Combinatorial PSO

    Procedure WSD_CPSO ()
        for (max_iteration) do
            for each sequence x_i do
                for each x_ij do
                    if x_ij = G_j then
                        if x_ij = p_ij then
                            y_ij = 1 or -1 randomly
                        else
                            y_ij = 1
                        end if
                    else if x_ij = p_ij then
                        y_ij = -1
                    else
                        y_ij = 0
                    end if
                    d₁ = -1 - y_ij
                    d₂ = 1 - y_ij
                    v_ij = w . v_ij + r₁ . c₁ . d₁ + r₂ . c₂ . d₂
                    λ_ij = y_ij + v_ij
                    if λ_ij > α then
                        y_ij = 1
                    else if λ_ij < -α then
                        y_ij = -1
                    else
                        y_ij = 0
                    end if
                    if y_ij = 1 then
                        x_ij = G_j
                    else if y_ij = -1 then
                        x_ij = p_ij
                    end if
                end for
                Update personal best sequence, p
                Update global best sequence, G
            end for
        end for

4.2.2 Applying combinatorial PSO

Every sequence contains one sense for each polysemous word in a sentence. We want to find the sequence that holds the most suitable senses of the words in that sentence. In each sequence, every pair of consecutive senses has a relatedness score, and we need to maximize these scores. If there are p polysemous words, then there are p-1 scores. Let $x_{i} = \{x_{i1}, x_{i2}, \ldots, x_{i,p-1}\}$ be the scores in sequence i. To optimize them with combinatorial PSO, we define a velocity sequence $v_{i} = \{v_{i1}, v_{i2}, \ldots, v_{i,p-1}\}$, whose values are all initially zero. Similarly, we need a personal best sequence $p_{i} = \{p_{i1}, p_{i2}, \ldots, p_{i,p-1}\}$ and a global best sequence $G = \{G_{1}, G_{2}, \ldots, G_{p-1}\}$. Procedure WSD_CPSO () (Algorithm 3) computes them. The values of $x_{i}$ and $v_{i}$ are updated using Equations (2), (3), (4), (5) and (6). After max_iteration iterations, the sequence with the maximum score is selected as the solution sequence. This process is repeated for the next sentence, and so on. The final output is the selected sense for each word in the given passage (excluding monosemous words, prepositions, etc.). Procedure D_CPSO () (Algorithm 1) invokes Algorithm 2 and Algorithm 3 to find the scores of each sequence. It then finds the sequence having the maximum score and prints the senses from that sequence.
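Putting the pieces together, the loop of Algorithm 3 can be sketched end to end in Python. This is an illustrative reading, not the implementation used in the experiments. It assumes that each particle stores the chosen senses directly, that the fitness is the sum of relatedness between consecutive senses, that a random sense is drawn when y = 0 (following Equation (6)), and that the inertia weight w of Algorithm 3 defaults to 1, which reduces to Equation (3). The relatedness table and all names are hypothetical.

```python
import random

PAIRS = {("bat#animal", "cave#hollow"): 0.5, ("bat#club", "cave#hollow"): 0.1,
         ("cave#hollow", "fly#move"): 0.3, ("cave#hollow", "fly#insect"): 0.2}


def toy_pair_rel(a, b):
    """Hypothetical relatedness between two consecutive senses."""
    return PAIRS.get((a, b), 0.0)


def fitness(seq, rel):
    """Sum of relatedness over consecutive senses in the sequence."""
    return sum(rel(a, b) for a, b in zip(seq, seq[1:]))


def wsd_cpso(senses_per_word, rel, max_iteration=50, c1=1.0, c2=1.0,
             alpha=1.0, w=1.0, seed=0):
    rng = random.Random(seed)
    first, rest = senses_per_word[0], senses_per_word[1:]
    # Create_Seq: one particle per sense of the first word, random senses elsewhere.
    swarm = [[s0] + [rng.choice(opts) for opts in rest] for s0 in first]
    velocity = [[0.0] * len(senses_per_word) for _ in swarm]
    pbest = [list(x) for x in swarm]
    gbest = list(max(swarm, key=lambda x: fitness(x, rel)))
    for _ in range(max_iteration):
        for i, x in enumerate(swarm):
            for j, options in enumerate(senses_per_word):
                g, p = gbest[j], pbest[i][j]
                # Equation (2): encode the component state.
                if x[j] == g and x[j] == p:
                    y = rng.choice((-1, 1))
                elif x[j] == g:
                    y = 1
                elif x[j] == p:
                    y = -1
                else:
                    y = 0
                # Equation (3), with the inertia weight w of Algorithm 3.
                velocity[i][j] = (w * velocity[i][j]
                                  + rng.random() * c1 * (-1 - y)
                                  + rng.random() * c2 * (1 - y))
                lam = y + velocity[i][j]            # Equation (4)
                if lam > alpha:                     # Equations (5) and (6)
                    x[j] = g
                elif lam < -alpha:
                    x[j] = p
                else:
                    x[j] = rng.choice(options)
            if fitness(x, rel) > fitness(pbest[i], rel):
                pbest[i] = list(x)
            if fitness(x, rel) > fitness(gbest, rel):
                gbest = list(x)
    return gbest


toy_words = [["bat#animal", "bat#club"], ["cave#hollow"], ["fly#move", "fly#insect"]]
result = wsd_cpso(toy_words, toy_pair_rel, seed=1)
print(result, fitness(result, toy_pair_rel))
```

The sketch fixes the random seed so that repeated runs give the same sequence; the velocity is not clamped, which is a further simplification.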

5 Experimental evaluation

The proposed algorithm is implemented in Python on a laptop with an Intel Core i5-4200 processor @1.6GHz x 2, 3.18 GiB of RAM and an 809.9 GB hard drive, running Linux kernel 3.19.0-32-generic. We use the Natural Language Toolkit (nltk) package for language processing. nltk provides a WordNet interface, from which we can obtain the senses of a word as synsets, along with relatedness scores. We use path similarity for the relatedness score. The prepositions, pronouns, connectives, etc. present in the given input are eliminated before processing. The parameters of the combinatorial PSO algorithm are found on a trial-and-error basis, and the values are set based on the results obtained for the input passage from “Oliver Twist”. The acceleration constants c₁ and c₂ are fixed at 1 and the parameter α is also set to 1. The max_iteration is set to 50. These values are fixed after several independent trials of different combinations.
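The path-similarity relatedness described above might be wrapped as in the following sketch. `wordnet.synsets` and `Synset.path_similarity` are part of the NLTK WordNet interface; the helper names `relatedness` and `senses` are hypothetical, and treating a missing path (a `None` result) as zero relatedness is an assumption rather than something stated in this work.

```python
def relatedness(syn_a, syn_b):
    """Path similarity between two synsets; a missing path is treated as 0 (assumption)."""
    sim = syn_a.path_similarity(syn_b)
    return 0.0 if sim is None else sim


def senses(word):
    """Return the WordNet synsets of a word via NLTK.

    Requires the WordNet corpus to be installed, e.g. via nltk.download('wordnet').
    The import is done lazily so that relatedness() can be used without NLTK.
    """
    from nltk.corpus import wordnet as wn
    return wn.synsets(word)
```

With the corpus installed, a call such as `relatedness(senses("bat")[0], senses("ball")[0])` returns a score between 0 and 1.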

We evaluate our system using passages from a set of text documents. The documents contain lines from the novels ‘Oliver Twist’, ‘Immortals of Meluha’ and ‘Knights of Art’. These documents are given as input and our system is used to disambiguate all words in these passages. The system takes the passage sentence by sentence and processes one sentence at a time. A result file is also generated, containing each sentence and the meaning or sense given by the system for each word in that passage. We manually cross-checked the senses identified by the system for correctness.

The passage from ‘Oliver Twist’ is given as the input test document, a screenshot of which is shown in Fig. 1. A screenshot of the output produced by the proposed algorithm is shown in Fig. 2. The system gives a sense for 62 words in the test document. Of these 62 words, 55 identified senses are correct, giving a precision of 88.71%.
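The precision figure above follows from dividing the number of correct senses by the number of words that received a sense. A small Python check of this arithmetic, with a hypothetical helper name:

```python
def precision_percent(correct, attempted):
    """Precision as a percentage: correct senses over words that were assigned a sense."""
    return 100.0 * correct / attempted


print(f"{precision_percent(55, 62):.2f}")  # 88.71
```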

Fig. 1

Snapshot of passage from “Oliver Twist”.

Fig. 2

Snapshot of result obtained for D_CPSO algorithm for passage from “Oliver Twist”.

To compare the proposed algorithm with some of the existing algorithms, we implemented E-Lesk [10] and the bee colony solution for WSD (D_bees) [8]. Both systems are fed the same input passages and the results obtained are analyzed. It is observed that the proposed system attains the highest precision and E-Lesk the lowest. The bee colony implementation produces better output than E-Lesk. The proposed D_CPSO implementation gives more stable output over several runs. It also gives more correct senses than the other algorithms. For the test document shown in Fig. 1, the bee colony solution for WSD [8] produces 14 wrong senses, with a precision of 77.42%, and E-Lesk [10] produces 20 wrong senses, with a precision of 67.74%.

A similar evaluation is done for a passage from the novel ‘Immortals of Meluha’ consisting of 176 words. The D_CPSO algorithm identified the correct senses for 146 words out of 176 words with a precision of 84.88%. A snapshot of the input is presented as Fig. 3 and the result as Fig. 4. The same input is fed to the D_bees algorithm, which identified the correct senses for 133 words, giving a precision of 77.32%. The E-Lesk algorithm identified 120 words correctly, with a precision of only 69.76%.

Fig. 3

Snapshot of passage from “Immortals of Meluha”.

Fig. 4

Snapshot of result obtained for D_CPSO algorithm for passage from “Immortals of Meluha”

Another, longer passage of 182 words from the novel ‘Knights of Art’ is fed to the proposed algorithm. A screenshot of the passage is shown in Fig. 5 and the result obtained is shown in Fig. 6. With this input, the proposed algorithm correctly identified the senses for 150 words, whereas the D_bees algorithm identified 139 word senses correctly and the E-Lesk algorithm could identify only 125 senses correctly.

Fig. 5

Snapshot of passage from “Knights of Art”

Fig. 6

Snapshot of result obtained for D_CPSO algorithm for passage from “Knights of Art”

The precision of all these algorithms for 62 words from ‘Oliver Twist’, for 176 words from ‘Immortals of Meluha’ and for 182 words from ‘Knights of Art’ is presented in Table 1.

Table 1

Performance Analysis

Input (number of words)	D_CPSO precision (%)	D_bees precision (%)	E-Lesk precision (%)
Oliver Twist (62)	88.71	77.42	67.74
Immortals of Meluha (176)	84.88	77.32	69.76
Knights of Art (182)	83.51	76.37	68.68

From Table 1, it is clear that the proposed method attains higher precision than the two other methods, D_bees and E-Lesk, on all three test passages. The precision obtained by the proposed algorithm is above 83% in all the test cases, which is promising. An analysis of the running time required by the three algorithms shows that the proposed algorithm takes slightly more time than the other two on all the test inputs.

6 Conclusion and future work

Word sense disambiguation is defined as computationally identifying the intended sense of a word that is activated in a certain context. In this work, a knowledge-based unsupervised method for solving the word sense disambiguation problem is proposed. We have modelled the problem as a combinatorial optimization problem and used a combinatorial version of the PSO algorithm to solve it. Given a text document as input, the proposed algorithm tries to disambiguate all words in the document. Compared with some of the existing methods, the proposed algorithm performs better on the tested inputs. As future work, the performance of the algorithm is to be analyzed on benchmark data sets.

References

1. Navigli, Word sense disambiguation: A survey, ACM Computing Surveys (CSUR) 41(2) (2009), 10.

2. Gale W.A., Church K.W. and Yarowsky, A method for disambiguating word senses in a large corpus, Computers and the Humanities 26 (1992), 415–439.

3. Agirre and Edmonds P.G., Word sense disambiguation: Algorithms and applications, Springer Science & Business Media 33 (2007), 1.

4. Pedersen, Banerjee and Patwardhan, Maximizing semantic relatedness to perform word sense disambiguation, Research Report UMSI, University of Minnesota Supercomputing Institute 25 (2005), 1.

5. Jarboui, Cheikh, Siarry and Rebai, Combinatorial particle swarm optimization (CPSO) for partitional clustering problem, Applied Mathematics and Computation 192(2) (2007), 337–345.

6. Navigli, Litkowski K.C. and Hargraves, SemEval-2007 task 07: Coarse-grained English all-words task. Proceedings of the 4th International Workshop on Semantic Evaluations, Association for Computational Linguistics, pages 30–35, 2007.

7. Tahmasebi, Risse and Dietze, Towards automatic language evolution tracking, a study on word sense tracking. Proceedings of Joint Workshop on Knowledge Evolution and Ontology Dynamics, 2011.

8. Abdualhaja and Zimmerman K.H., D-bees: A novel method inspired by bee colony optimization for solving word sense disambiguation, Swarm and Evolutionary Computation 27 (2016), 188–195.

9. Lesk, Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. Proceedings of the 5th Annual International Conference on Systems Documentation, ACM, pages 24–26, 1986.

10. Banerjee and Pedersen, An adapted Lesk algorithm for word sense disambiguation using WordNet. Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing, pages 136–145, 2002.

11. Miller G.A., WordNet: a lexical database for English, Communications of the ACM 38(11) (1995), 39–41.

12. Butnaru A.M., Ionescu R.T. and Hristea, ShotgunWSD: An unsupervised algorithm for global word sense disambiguation inspired by DNA sequencing. arXiv preprint arXiv:1707.08084, 2017.

13. Chaplot D.S., Bhattacharyya and Paranjape, Unsupervised word sense disambiguation using Markov random field and dependency parser. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pages 2217–2223, 2015.

14. Cowie, Guthrie and Guthrie, Lexical disambiguation using simulated annealing, Proceedings of the 14th Conference on Computational Linguistics, Association for Computational Linguistics 1(11) (1992), 359–365.

15. Zhang, Zhou and Martin, Genetic word sense disambiguation algorithm, Proceedings of the Second International Symposium on Intelligent Information Technology Application 1 (2008), 123–127.

16. Alsaeedan and Menai M.E.B., A self-adaptive genetic algorithm for the word sense disambiguation problem. Proceedings of the International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, Springer, pages 581–590, 2015.

17. Alsaeedan, Menai M.E.B. and Al-Ahmadi, A hybrid genetic-ant colony optimization algorithm for the word sense disambiguation problem, Information Sciences, Elsevier 417 (2017), 20–38.

18. Schwab and Guillaume, A global ant colony algorithm for word sense disambiguation based on semantic relatedness. Highlights in Practical Applications of Agents and Multiagent Systems, pages 257–264, 2011.

19. Schwab, Goulian and Tchechmedjiev, Worst-case complexity and empirical evaluation of artificial intelligence methods for unsupervised word sense disambiguation, International Journal of Web Engineering and Technology 8(2) (2013), 124–153.

20. AL-Saiagh, Tiun, AL-Saffar, Awang and Alkhaleefa A.S., Word sense disambiguation using hybrid swarm intelligence approach, PLoS ONE 13(12) (2018), 1.

21. Kennedy and Eberhart, Particle swarm optimization. Proceedings of IEEE International Conference on Neural Networks, pages 1942–1948, 1995.