Word-embedding-based pseudo-relevance feedback for Arabic information retrieval

Abstract

Pseudo-relevance feedback (PRF) is a very effective query expansion approach, which reformulates queries by selecting expansion terms from top k pseudo-relevant documents. Although standard PRF models have been proven effective to deal with vocabulary mismatch between users’ queries and relevant documents, expansion terms are selected without considering their similarity to the original query terms. In this article, we propose a method to incorporate word embedding (WE) similarity into PRF models for Arabic information retrieval (IR). The main idea is to select expansion terms using their distribution in the set of top pseudo-relevant documents along with their similarity to the original query terms. Experiments are conducted on the standard Arabic TREC 2001/2002 collection using three neural WE models. The obtained results show that our PRF extensions significantly outperform their baseline PRF models. Moreover, they enhanced the baseline IR model by 22% and 68% for the mean average precision (MAP) and the robustness index (RI), respectively.

Keywords

Arabic information retrieval; pseudo-relevance feedback; query expansion; word embedding

1. Introduction

User queries are usually too short to describe the information needs accurately, often leading to vocabulary mismatch between queries and documents and poor retrieval performance of relevant documents. In order to deal with these problems, query expansion techniques have gained interest in the last decades [1 –5]. Pseudo-relevance feedback (PRF) has been proven to be an effective query expansion approach to deal with vocabulary mismatch between users’ queries and documents. This approach expands users’ queries by selecting relevant terms from the top retrieved documents, called top pseudo-relevant documents. Although PRF techniques can yield good performance [1,4,6,7], they primarily depend on the distribution of expansion terms in the set of pseudo-relevant documents; the similarity between the expansion terms and the original query terms is usually not explicitly taken into account. Ideally, however, expansion terms should be selected based on their similarity to query terms as well as their distribution in the set of pseudo-relevant documents. Some studies have indeed proposed to do so with mutual information [8 –10]. We propose here to use word embedding for this task and focus on the Arabic language. This last choice is motivated by two aspects: first, this language has been less studied for information retrieval (IR) purposes than most European (and to a certain extent Asian) languages, and second, words in Arabic have a higher degree of ambiguity justifying the use of disambiguation techniques based on similarities on top of standard pseudo-relevance methods.

Recent advances in neural language models have introduced effective methods for learning word embedding. These methods represent words by vectors in low-dimensional semantic vector space relying on contextual information (representing a word by means of its neighbours) or/and word co-occurrence [11,12]. An evaluation of neural word embedding against traditional word count-based approaches, including the positive pointwise mutual information (PMI), the singular value decomposition (SVD) and the non-negative matrix factorisation (NMF) methods [13], demonstrated the success of the former on a variety of natural language processing (NLP) tasks, such as semantic relatedness, synonym detection and concept categorisation. The main advantage of the latter approaches lies in easy adaptation to any domain where a sufficiently large corpus is available. Furthermore, they showed promising results on several similarity tasks compared with knowledge-based methods using the WordNet ontology [14].

In the last few years, there has been a growing interest in using word embedding as a representational basis for NLP applications and particularly for IR. Previous research has demonstrated that incorporating word embedding similarities into existing IR models improves retrieval performance [15–18]. Moreover, several researchers have shown the effectiveness of using word embedding for query expansion [5,19,20].

Despite the recent advances in exploiting word embedding in IR, the use of such word representations for Arabic IR remains under-explored. In fact, the rich and complex morphology of the Arabic language is the most studied area in Arabic IR [21–27]. Although the field of Arabic IR has made tangible progress, most stemming algorithms produce a noisy representation of documents and queries. On the one hand, root-based stemmers conflate words with different meanings to the same root. On the other hand, most light stemming algorithms, which have proven effective in dealing with Arabic morphology in the context of IR, do not handle broken plurals and fail to discriminate conjunctions and prepositions from the core words [28,29]. Hence, light stemmers may map words with the same meaning to different stems. Thus, dealing with term mismatch between documents and queries is of particular interest for Arabic IR.

The hypothesis of this article is that word embedding can be exploited in a PRF framework for Arabic IR to deal with term mismatch, since similar words, as well as words that should be grouped under the same stem, will be close to each other in the vector space. To illustrate this hypothesis, Figure 1 presents the two-dimensional (2D) projection, obtained by principal component analysis (PCA), of the Arabic word meaning ‘lessons’ and its top 100 related words, where the word embedding is trained using the continuous bag-of-words (CBOW) model on Arabic text corpora stemmed with the Farasa stemmer [26]. Performing stemming before learning word embedding for Arabic IR is motivated by the fact that Arabic has a rich and complex morphology and previous studies showed that stemming is a key preprocessing step to deal with its morphology for IR. The figure shows that not only do similar words appear close to each other in the vector space, but so do words that should be grouped into the same stem (broken plurals, stemming errors, etc.).

Figure 1.

2D PCA projection of the Arabic word meaning ‘lessons’ and its top 100 related words, using the CBOW word embedding model.

In this article, we propose a method to incorporate word embedding similarities into existing PRF models for Arabic content retrieval. The main goal is to boost weights of semantically related terms to the original query terms. The present work investigates three neural word embedding models, including the Skip-gram, the CBOW and the Glove models, that represent each word by a single vector in low-dimensional vector space. Moreover, the word embedding similarities are incorporated into four PRF models, including the Kullback–Leibler divergence (KLD) [1], the Bo2 of the family of divergence from randomness (DFR) models [7] and the log-logistic (LL), as well as the smoothed power law (SP) of the information-based family of PRF models [4]. Our goal in this work is to study how word embeddings may be exploited in PRF techniques for Arabic IR. Specifically, we are looking for answers to the following main questions:

Does incorporating word embedding similarity into existing PRF models improve the performance of Arabic IR?

Which word embedding model performs better for incorporating term similarity into PRF models for Arabic IR?

The rest of the article is organised as follows. We discuss the related work in Section 2 and briefly review the neural word embedding models used in this work in Section 3. Then, we present our proposed method to incorporate word embedding semantic similarities into existing PRF models in Section 4, and we discuss the experiment results in Section 5. Finally, we conclude in Section 6.

2. Related work

For several years, great effort has been devoted to the study of Arabic query expansion. Shaalan et al. [30] introduced a method to incorporate semantic similarity into Arabic query expansion using the expectation–maximisation (EM) algorithm. The EM algorithm is used to select relevant expansion terms from the top retrieved documents. Experiments performed on the INFILE test collection of CLEF 2009 showed that their method improves recall. In another work, Mahgoub et al. [31] proposed a method for semantic query expansion using a domain-independent ontology built from Wikipedia. Experiments performed on Arabic TREC 2002 showed improvements over the baseline keyword matching method. In a different study, Belalem et al. [32] introduced a technique for interactive and automatic query expansion using the Arabic WordNet (AWN) to enhance Arabic IR. The main contribution consists of using a word’s part-of-speech to select the appropriate synonyms. More recently, Atwan et al. [9] presented an automatic corpus-based expansion technique combining AWN and corpus-based semantic similarity to select expansion terms. The results showed that the automatic expansion technique enhances the accuracy of Arabic IR on the TREC 2001 data set.

One of the most significant current discussions in IR is word embedding. Vulić and Moens [16] introduced a unified framework for Bilingual Word Embedding Skip-Gram (BWESG) for monolingual information retrieval (MoIR) and cross-lingual information retrieval (CLIR) from comparable data. This framework relies on estimating document vectors from single-word embeddings through a compositional approach based on word occurrence in the target document and the vocabulary size of the collection. Significant improvements are obtained by linearly combining the proposed document embeddings and the baseline language model for both MoIR and CLIR tasks. The authors have also reported a significant improvement over latent Dirichlet allocation (LDA)-based IR models. In another work, Ganguly et al. [15] have proposed a word-embedding-based generalised language model (GLM). The GLM estimates transformation probabilities (events) for a given term and its semantically related terms based on three sampling events: direct term sampling (i.e. the language model (LM) baseline), transformation via document sampling and transformation via collection sampling. The three sampling transformations are linearly combined in the scoring function. The obtained results on TREC data sets show a significant improvement over the baseline LM and the LDA-based IR models. Zuccon et al. [17] have proposed a neural translation language model (NTLM) for exploiting word embedding in IR. The embeddings are used to estimate translation probabilities between words. The results show that the NTLM significantly improves the baseline language model and achieves a better performance than the state-of-the-art translation language models on most test sets. In the context of the Arabic language, El Mahdaouy et al. [18] have introduced a modified term frequency scheme to incorporate word embedding similarities for Arabic IR. The main idea consists of computing the within-document term frequency based on the number of occurrences of a given query term and its similar terms. The results showed that incorporating the enhanced term frequency into standard probabilistic IR models significantly improves their baseline bag-of-words models on the standard Arabic collection TREC 2001/2002. Moreover, El Mahdaouy et al. [33] have proposed a method to incorporate word embedding similarities into existing probabilistic IR models to deal with term mismatch for Arabic document retrieval. The main idea consists of selecting the most related terms for each query term, following the approach defined by Li and Gaussier [34] in the context of CLIR, either from the collection vocabulary or from each document. The obtained results on the standard Arabic TREC 2001/2002 collection, using three neural word embedding models, showed that their proposed IR extensions significantly outperform baseline bag-of-words models, three state-of-the-art word-embedding-based language models [15–17] and the AWN-based semantic indexing method for IR [35].

For PRF using word embedding, Zamani and Croft [36] have proposed an embedding-based relevance model, which extends the relevance model approach. The obtained results on TREC test collections show a significant improvement over the baseline relevance model. Moreover, Kuzi et al. [19] have presented a suite of query expansion methods using the CBOW model. The obtained results show that the proposed query expansion methods improve the baseline IR models and the baseline relevance model. According to Zahran et al. [37], exploiting neural word embedding models for query expansion, including the two word2vec models and the Glove model, performs slightly better than the semantic query expansion introduced by Mahgoub et al. [31].

In this article, we incorporate word embedding similarity into existing PRF models for Arabic IR. The main goal is to boost expansion weights of semantically related terms to the original query. We integrate word embedding similarities into four PRF models, including the KLD [1], the Bo2 of the family of DFR models [7] and the LL, as well as the SP of the information-based family of PRF models [4]. Moreover, we evaluate three neural word embedding models, including the CBOW, the Skip-gram and Glove models.

3. Neural word embedding

Most NLP applications involve a word representation step and could benefit from word representations that reflect similarities and dissimilarities between words rather than treating individual words as independent symbols. Hence, there has been a lot of work proposing to represent words as dense vectors in a low-dimensional vector space obtained using various training methods inspired by neural-network language modelling. The estimation of these vectors is based on the idea that words in similar contexts have similar meanings.

3.1. CBOW

In the CBOW model [11], the context is represented by surrounding words for a given target word. The word representation is constructed by maximising the log probability to predict the target word given its context. The CBOW model uses a simple neural architecture where the nonlinear hidden layer is removed and the projection layer is shared for all words. For a given target word $w_{t}$ and its context $\{w_{t - c}, \dots, w_{t - 1}, w_{t + 1}, \dots, w_{t + c}\}$, the model maximises the following average log probability (equation (1))

\frac{1}{| C |} \sum_{t = 1}^{| C |} \log [P (w_{t} | w_{t - c}, \dots, w_{t - 1}, w_{t + 1}, \dots, w_{t + c})]

(1)

where $| C |$ is the number of words in the corpus and c is the size of the dynamic context of $w_{t}$ .

3.2. Skip-gram model

Instead of predicting the current word using its surrounding words (context), the Skip-gram model uses a similar architecture by reversing the input and the output of the neural network [11]. Each word vector is trained to maximise the log probability of neighbouring words in a corpus. Given a sequence of training words $\{w_{t - c}, \dots, w_{t + c}\}$, the model maximises the following average log probability to predict the context of the current target word (equation (2))

\frac{1}{| C |} \sum_{t = 1}^{| C |} \sum_{j = t - c, j \neq t}^{t + c} \log [P (w_{j} | w_{t})]

(2)

where $| C |$ is the number of words in the corpus and c is the size of the dynamic context of $w_{t}$ .
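To make the difference between the two objectives concrete, the following minimal Python sketch enumerates the training examples that each model would see for a toy token sequence. The function names (`context_window`, `cbow_examples`, `skipgram_examples`) and the fixed, non-dynamic window are illustrative assumptions; this is not the word2vec implementation used in the experiments.

```python
from typing import Iterator, List, Tuple


def context_window(tokens: List[str], t: int, c: int) -> List[str]:
    """Return the up-to-2c neighbours of position t (the target itself is excluded)."""
    left = tokens[max(0, t - c):t]
    right = tokens[t + 1:t + 1 + c]
    return left + right


def cbow_examples(tokens: List[str], c: int) -> Iterator[Tuple[List[str], str]]:
    """CBOW (equation (1)): one example per position, predict w_t from its context."""
    for t, target in enumerate(tokens):
        yield context_window(tokens, t, c), target


def skipgram_examples(tokens: List[str], c: int) -> Iterator[Tuple[str, str]]:
    """Skip-gram (equation (2)): one example per (target, neighbour) pair."""
    for t, target in enumerate(tokens):
        for neighbour in context_window(tokens, t, c):
            yield target, neighbour


# Hypothetical toy sequence with a window of one word on each side.
tokens = ["a", "b", "c", "d"]
cbow = list(cbow_examples(tokens, c=1))
skip = list(skipgram_examples(tokens, c=1))
```

With the same window, Skip-gram produces one prediction per neighbour whereas CBOW produces one prediction per position, which is consistent with the longer Skip-gram training time usually observed.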

3.3. Glove model

The Glove model is a global log-bilinear regression model that combines the advantages of global matrix factorisation and local context window methods. The underlying model is trained on the non-zero entries of a global word-word co-occurrence matrix [12]. The model constructs a word-word co-occurrence matrix X, whose element $X_{ij}$ represents the number of times word j occurs in the context of word i. For each word pair, Glove defines a soft constraint: $w_{i}^{T} w_{j} + b_{i} + b_{j} = \log (X_{ij})$, where $w_{i}$ and $w_{j}$ are the vectors for the main word and the context word, respectively, and the biases $b_{i}$ for $w_{i}$ and $b_{j}$ for $w_{j}$ are added to restore the symmetry. The cost function is given by equation (3)

J = \sum_{i = 1}^{| C |} \sum_{j = 1}^{| C |} f (X_{ij}) {(w_{i}^{T} w_{j} + b_{i} + b_{j} - \log X_{ij})}^{2}

(3)

where f is a weighting function to avoid weighting all co-occurrences equally

f (X_{ij}) = \begin{cases} {(\frac{X_{ij}}{x_{\max}})}^{α} & \text{if } X_{ij} < x_{\max} \\ 1 & \text{otherwise} \end{cases}

(4)

where $x_{\max}$ and $α$ are fixed experimentally to 100 and 3/4 in order to deal with rare word pairs.
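As an illustration of equations (3) and (4), the sketch below evaluates the weighting function and the contribution of a single word pair to the cost. The function names `glove_weight` and `glove_pair_loss` are hypothetical, the natural logarithm is assumed, and the defaults simply mirror the values of $x_{\max}$ and $α$ stated above; this is not the reference Glove implementation.

```python
import math


def glove_weight(x_ij: float, x_max: float = 100.0, alpha: float = 0.75) -> float:
    """Weighting function f of equation (4)."""
    if x_ij < x_max:
        return (x_ij / x_max) ** alpha
    return 1.0


def glove_pair_loss(dot: float, b_i: float, b_j: float, x_ij: float) -> float:
    """One term of the sum in equation (3).

    `dot` stands for the inner product w_i^T w_j; the term is only defined
    for non-zero co-occurrence counts (natural logarithm assumed).
    """
    if x_ij <= 0:
        raise ValueError("the model is trained on non-zero co-occurrence counts only")
    residual = dot + b_i + b_j - math.log(x_ij)
    return glove_weight(x_ij) * residual ** 2
```

Rare pairs receive a weight below 1 (for example, a pair seen 10 times is weighted by $0.1^{3/4}$), while every pair seen at least $x_{\max}$ times is weighted by exactly 1.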

4. Proposed method

To enhance the performance of Arabic IR, we propose a word-embedding-based PRF method, which incorporates word embedding similarity into existing PRF models. The main idea of our method consists of combining the distribution of expansion terms in the set of pseudo-relevant documents and their similarity to the original query in a unified PRF framework. Our method is composed of seven main steps:

Step 1. Select a set $F = \{d_{1}, \dots, d_{k}\}$ of top k pseudo-relevant documents (top retrieved documents). Terms that occur in the documents of F are called candidate expansion terms;

Step 2. For each term in F, compute its word embedding similarity to the original query and transform it into a probability;

Step 3. For each term in F, compute its distribution (i.e. weight) in the set F using a standard PRF model;

Step 4. For each term in F, compute its modified weight by multiplying the weights computed in steps 2 and 3;

Step 5. Select the best n terms (expansion terms) from F according to their resulting weights (computed in step 4);

Step 6. Weight the selected expansion terms and add them to the query;

Step 7. Retrieve documents using the new query.
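Steps 4 and 5 can be sketched as follows, assuming that the per-term PRF weights (step 3) and the similarity-based probabilities (step 2) have already been computed and stored in dictionaries. The function name `select_expansion_terms` and the toy values are hypothetical and serve only to illustrate the combination.

```python
from typing import Dict, List, Tuple


def select_expansion_terms(prf_weights: Dict[str, float],
                           sim_probs: Dict[str, float],
                           n: int) -> List[Tuple[str, float]]:
    """Step 4: multiply the two weights of each candidate; step 5: keep the n best."""
    combined = {w: prf_weights[w] * sim_probs.get(w, 0.0) for w in prf_weights}
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Toy values: t2 has the largest PRF weight, t1 is the most similar to the query.
prf = {"t1": 0.30, "t2": 0.40, "t3": 0.10}
probs = {"t1": 0.50, "t2": 0.20, "t3": 0.30}
top = select_expansion_terms(prf, probs, n=2)
```

In this toy case the term with the highest distribution-based weight (`t2`) is overtaken by a term that is closer to the query (`t1`), which is the intended effect of the combination.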

After selecting the set of top k pseudo-relevant documents $F = \{d_{1}, \dots, d_{k}\}$, our method relies on one of two techniques to compute the similarity between a candidate expansion term and the original query. The first technique consists of estimating a query embedding (query vector) using the query terms’ vectors. It is based on the idea that adding the vectors of terms yields a semantic composition of the corresponding terms [16]. The query vector is obtained using the following equation

\vec{q} = \sum_{w_{q} \in q} \frac{x_{w_{q}}}{l_{q}} \cdot {\vec{w}}_{q}

(5)

where $w_{q}$ is a query term, $x_{w_{q}}$ is the corresponding weight of $w_{q}$ in the query q and $l_{q}$ is the number of terms in q. Then, the similarity between any candidate expansion term w and the query q is obtained using the cosine similarity between their corresponding vectors, given by

Sim_{comp} (\vec{w}, \vec{q}) = \cos (\vec{w}, \vec{q})

(6)

The second technique relies on computing the average similarity between a candidate expansion term’s vector and the query terms’ vectors. The average similarity is computed using the following formula

Sim_{avg} (\vec{w}, \vec{q}) = \frac{1}{l_{q}} \sum_{w_{q} \in q} \cos (\vec{w}, {\vec{w}}_{q})

(7)
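The two similarity functions of equations (5) to (7) can be sketched in plain Python as follows. The helper names (`cosine`, `query_vector`, `sim_comp`, `sim_avg`) and the two-dimensional toy vectors are assumptions made for illustration; in practice the vectors would come from one of the trained embedding models.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def query_vector(term_vecs: Sequence[Vector], weights: Sequence[float]) -> List[float]:
    """Equation (5): weighted sum of the query term vectors divided by the query length."""
    l_q = len(term_vecs)
    dim = len(term_vecs[0])
    return [sum(x * vec[i] for x, vec in zip(weights, term_vecs)) / l_q
            for i in range(dim)]


def sim_comp(w: Vector, term_vecs: Sequence[Vector], weights: Sequence[float]) -> float:
    """Equation (6): cosine between the candidate and the composed query vector."""
    return cosine(w, query_vector(term_vecs, weights))


def sim_avg(w: Vector, term_vecs: Sequence[Vector]) -> float:
    """Equation (7): average cosine between the candidate and each query term."""
    return sum(cosine(w, w_q) for w_q in term_vecs) / len(term_vecs)


# Toy two-term query with unit weights and orthogonal term vectors.
q_vecs = [[1.0, 0.0], [0.0, 1.0]]
q_weights = [1.0, 1.0]
candidate = [1.0, 0.0]
comp_score = sim_comp(candidate, q_vecs, q_weights)
avg_score = sim_avg(candidate, q_vecs)
```

On this toy query the two functions disagree (about 0.71 for the composed vector against 0.5 for the average), which shows that they are not interchangeable even when all query weights are equal.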

Then, we use the softmax function to transform these similarities into probabilities to facilitate their incorporation into existing PRF models. The probability of a candidate expansion term w given the original query q and the set F is given by

P (w | q, F) = \frac{\exp (Sim (w, q))}{\sum_{w' \in F} \exp (Sim (w', q))}

(8)

where $Sim (w, q)$ is the similarity between the candidate expansion term w and the query q, computed using equation (6) or (7).
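A minimal sketch of the softmax transformation of equation (8) is given below. The function name `similarity_to_probability` and the toy similarity values are hypothetical; subtracting the maximum before exponentiation is a standard numerical precaution and is not mentioned in the method itself.

```python
import math
from typing import Dict


def similarity_to_probability(sims: Dict[str, float]) -> Dict[str, float]:
    """Equation (8): softmax of Sim(w, q) over all candidate terms of F."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    m = max(sims.values())
    exps = {w: math.exp(s - m) for w, s in sims.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


# Toy similarities for three candidate expansion terms.
probs = similarity_to_probability({"t1": 0.9, "t2": 0.5, "t3": -0.1})
```

Because cosine similarities lie in $[-1, 1]$, the ratio between the largest and the smallest probability produced this way cannot exceed $e^{2}$, so the resulting distribution re-ranks candidates smoothly rather than filtering them out.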

The combination of candidate expansion term similarity to the original query and its distribution in the set F (step 4) is obtained simply by multiplying the probability $P (w | q, F)$ by its distribution in F, which is obtained using standard PRF models. Hence, the modified weight of a given candidate expansion term w is obtained by the following equation

F_{s} (w) = F (w) \cdot P (w | q, F)

(9)

where $F (w)$ is the expansion weight of a given term w, obtained using standard PRF models:

For the KLD model [1]

F (w) = P (w | F) \cdot \log (\frac{P (w | F)}{P (w | C)})

(10)

where $P (w | F) = \frac{TF (w)}{\sum_{d \in F} l_{d}}$ is the distribution of w in the set of top pseudo-relevant documents F, $TF (w)$ is the number of occurrences of w in F and $l_{d}$ is the length of document d. $P (w | C)$ is the distribution of w in the collection C.

For the Bo2 model [7]

F (w) = \log_{2} (1 + g_{w}) + TF (w) \cdot \log_{2} (\frac{1 + g_{w}}{g_{w}})

(11)

where $g_{w} = P (w | C) \cdot \sum_{d \in F} l_{d}$ .
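The KLD and Bo2 weights of equations (10) and (11) can be computed as in the following sketch. The function names and argument names are hypothetical, `feedback_len` stands for $\sum_{d \in F} l_{d}$, and the natural logarithm is assumed for the KLD model since the base is not specified in equation (10).

```python
import math


def kld_weight(tf_w: int, feedback_len: int, p_w_c: float) -> float:
    """Equation (10): P(w|F) * log(P(w|F) / P(w|C)) (natural log assumed)."""
    p_w_f = tf_w / feedback_len
    if p_w_f == 0.0:
        return 0.0  # a term absent from F gets no expansion weight
    return p_w_f * math.log(p_w_f / p_w_c)


def bo2_weight(tf_w: int, feedback_len: int, p_w_c: float) -> float:
    """Equation (11), with g_w = P(w|C) * total length of the feedback set."""
    g_w = p_w_c * feedback_len
    return math.log2(1.0 + g_w) + tf_w * math.log2((1.0 + g_w) / g_w)
```

For instance, a term that is exactly as frequent in F as in the collection receives a KLD weight of zero, and a term with $TF (w) = 3$ and $g_{w} = 1$ receives a Bo2 weight of $\log_{2} 2 + 3 \cdot \log_{2} 2 = 4$.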

For the information-based family of PRF models [4]:

The LL model

F (w) = \frac{1}{| F |} \sum_{d \in F} - \log (\frac{g_{w}}{g_{w} + t_{w}^{d}})

(12)

The SP model

F (w) = \frac{1}{| F |} \sum_{d \in F} - \log (\frac{g_{w}^{\frac{t_{w}^{d}}{g_{w} + t_{w}^{d}}} - g_{w}}{1 - g_{w}})

(13)

where $t_{w}^{d} = x_{w_{d}} \cdot \log (1 + c \cdot \frac{l_{d}}{l_{avg}})$ is the normalised frequency of the term w in the document d ($x_{w_{d}}$ being its raw frequency) and $g_{w} = \frac{N_{w}}{N}$ is a collection-dependent parameter of the term w, with $N_{w}$ the number of documents containing w and N the number of documents in the collection. The parameter c controls the term frequency normalisation and $l_{avg}$ is the average document length.
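The two information-based weights can be sketched as follows. The code transcribes equations (12) and (13) and the definition of $t_{w}^{d}$ exactly as they are printed above; the function names are hypothetical, the natural logarithm is assumed, and $g_{w}$ is assumed to lie strictly between 0 and 1.

```python
import math
from typing import Sequence


def normalised_tf(x_wd: float, l_d: float, l_avg: float, c: float) -> float:
    """t_w^d as defined in the text: x_wd * log(1 + c * l_d / l_avg)."""
    return x_wd * math.log(1.0 + c * l_d / l_avg)


def ll_weight(t_values: Sequence[float], g_w: float) -> float:
    """Equation (12): mean over the documents of F of -log(g_w / (g_w + t))."""
    return sum(-math.log(g_w / (g_w + t)) for t in t_values) / len(t_values)


def sp_weight(t_values: Sequence[float], g_w: float) -> float:
    """Equation (13), transcribed as printed; requires 0 < g_w < 1."""
    total = 0.0
    for t in t_values:
        powered = g_w ** (t / (g_w + t))
        total += -math.log((powered - g_w) / (1.0 - g_w))
    return total / len(t_values)
```

In both models a document in which the term does not occur ($t_{w}^{d} = 0$) contributes nothing to the sum, and the contribution grows with the normalised frequency of the term.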

After selecting the best n expansion terms according to their modified expansion weights (equation (9)), the original query is then modified to take into account both original query terms and expansion terms. The final modified weights of the query terms and expansion terms are computed using the following equation

x'_{w_{q}} = \frac{x_{w_{q}}}{\max_{w} x_{w_{q}}} + β \cdot \frac{F_{s} (w)}{\max_{w} F_{s} (w)}

(14)

where $x'_{w_{q}}$ is the updated weight of the query term w, $β$ is the parameter that controls the weight of expansion terms and $F_{s} (w)$ is our expansion weight of terms obtained by equation (9).
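The re-weighting of equation (14) can be illustrated with the sketch below. It assumes, as is usual in Rocchio-style expansion but not stated explicitly above, that a term missing from the original query has an original weight of 0 and that a term not selected for expansion has an expansion weight of 0; the function name `reweight_query` and the toy weights are hypothetical.

```python
from typing import Dict


def reweight_query(query_weights: Dict[str, float],
                   expansion_weights: Dict[str, float],
                   beta: float) -> Dict[str, float]:
    """Equation (14): max-normalised query weight plus beta times max-normalised F_s."""
    max_q = max(query_weights.values())
    max_f = max(expansion_weights.values())  # assumed to be strictly positive
    new_weights = {}
    for term in set(query_weights) | set(expansion_weights):
        original = query_weights.get(term, 0.0) / max_q
        expansion = expansion_weights.get(term, 0.0) / max_f
        new_weights[term] = original + beta * expansion
    return new_weights


# Toy query {q1, q2}; q1 is also selected as an expansion term, e1 is a new term.
expanded = reweight_query({"q1": 2.0, "q2": 1.0}, {"q1": 0.5, "e1": 1.0}, beta=0.4)
```

Here `q1` ends up with weight 1.2, `q2` keeps its normalised weight 0.5 and the new term `e1` enters the query with weight 0.4, so $β$ directly bounds the weight that a purely new term can receive.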

The last step of our method consists of retrieving documents for the new query using the baseline IR model.

5. Experimental evaluation

5.1. Experimental settings

All experiments are conducted using the Terrier 3.5¹ IR platform on the standard Arabic TREC 2001/2002 data set. We used the title and description topic fields and the relevance judgements on the Arabic Newswire corpus (LDC catalogue number LDC2001T55).² The data set contains 75 topics. The corpus consists of 383,872 documents from the Agence France-Presse (AFP; France Press Agency) Arabic Newswire, containing 76 million tokens for 666,094 unique words. These documents are newspaper articles covering the period from May 1994 until December 2000. Our extensions are tested and evaluated mainly using the mean average precision (MAP), the precision at 10 documents (P10), and the robustness index (RI) [2]. The RI is defined as $\frac{Q_{+} - Q_{-}}{| Q |} \in [- 1, 1]$, where $| Q |$ is the total number of queries, and $Q_{+}$ and $Q_{-}$ denote the number of queries whose performance is increased and decreased relative to the baseline, respectively. Robust models have a higher value of RI. We vary the number of top pseudo-relevant documents (k) and the number of expansion terms (n) over {10, 20, 30, 40, 50, 60, 70, 80, 90, 100} and select their optimal values according to the best MAP value. These values of k and n are commonly used in the state-of-the-art PRF models to expand queries [4,5,38]. For the baseline IR models, we evaluate the Smoothed Power-Law (SPL) and the Log-Logistic Distribution (LGD) instances of the information-based models [39], the language model [40,41], and the BM25 model [42]. The parameters of each IR model are optimised using five-fold cross-validation. Table 1 summarises the IR models, the set of parameters and the values that are used for cross-validation. Moreover, we perform a paired t-test for significance and attach a letter or a number to the MAP value in the tables when the test passes at the 90% confidence level.

Table 1.

Cross-validation parameter values.

Model	Parameter	Values
LGD, SPL	c	$0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 20.0$
LM	$μ$	$10, 25, 50, 75, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 4000, 5000$
BM25	b	$0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9$ $1.0, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5, 2.75, 3.0$
PRF models	$β$	$0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, \dots, 2$

LM: language model; PRF: pseudo-relevance feedback.
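The robustness index used in the evaluation can be computed as in the following sketch, which assumes that a query counts as increased or decreased according to its average precision relative to the unexpanded baseline run. The function name `robustness_index` and the per-query scores are hypothetical.

```python
from typing import Sequence


def robustness_index(baseline_ap: Sequence[float], expanded_ap: Sequence[float]) -> float:
    """RI = (Q+ - Q-) / |Q|, comparing per-query average precision of two runs."""
    if len(baseline_ap) != len(expanded_ap) or not baseline_ap:
        raise ValueError("need one score per query for both runs")
    improved = sum(1 for b, e in zip(baseline_ap, expanded_ap) if e > b)
    degraded = sum(1 for b, e in zip(baseline_ap, expanded_ap) if e < b)
    return (improved - degraded) / len(baseline_ap)


# Four toy queries: two improved, one degraded, one unchanged.
ri = robustness_index([0.20, 0.35, 0.10, 0.50], [0.30, 0.30, 0.15, 0.50])
```

In this toy case the index is $(2 - 1) / 4 = 0.25$; unchanged queries count in the denominator only, so a method that helps a few queries and leaves the rest untouched obtains a small positive RI.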

To train the word embedding models, we collected 2.03 gigabytes of raw Arabic text, containing about 216 million tokens, including the Arabic BBC, CNN and OSAC corpora,³ the Arabic Newswire corpus (LDC catalogue number LDC2001T55) and other sentence corpora collected from WORTSHATZ.⁴ Since stemming is a key component in any Arabic IR system and plays a key role in reducing morphological variants, we applied the Farasa stemmer [26] to the collected corpora before training the neural embedding models. This stemmer is selected since our comparison results show that Farasa significantly improves on the light and the root-based stemming approaches [43,44]. For training all word embedding models (CBOW, Skip-gram⁵ and Glove models⁶), we fix the context size and the word vector dimension to 10 and 300, respectively. The training time of each word embedding model, using an Intel Xeon E5-2407 machine with 4 CPUs and 48 GB of RAM, is as follows:

155 min for the Skip-gram model;

52 min for the CBOW model;

93 min for the Glove model.

Even though the CBOW model is about three times faster to train than the Skip-gram model (the slowest here), all three models can be used, since their training times remain reasonable on the collection considered.

5.2. Experimental results

5.2.1. Comparison of Arabic text preprocessing approaches

First, we evaluate the impact of several text preprocessing methods on the performance of Arabic document retrieval. To do so, our experiments are performed using the following approaches:

Light stemming:

Farasa stemmer (FS); [26]

Light stemmer (LS); [43]

Lemmatisation:

MADAMIRA lemmatiser [45]: MADAMIRA unvocalised lemma (M unvocL) and MADAMIRA vocalised lemma (M vocL);

Farasa lemmatiser (FL); [46]

Heavy stemming:

Khoja root-based stemmer (RS); [44]

Text normalisation (Norm).

In fact, the Arabic language is morphologically rich and there is no real consensus, in past experiments, on which stemmer or lemmatiser to use for IR [23,27,47 –49]. We tackle this problem by conducting an extensive comparison of different stemming and lemmatisation approaches, coupled with four different IR models, from different families. In order to perform Arabic text lemmatisation, we used SAFAR API sentence splitter [50] to segment the collection’s documents and feed the obtained sentences to Farasa and MADAMIRA lemmatisation modules [46,45]. Since MADAMIRA lemmatiser produces vocalised lemmas, we used the SafeBW transliteration scheme of the underlying system in order to preserve diacritics representation for vocalised lemma-based indexing method (M vocL).

Table 2 presents the indexing time and the vocabulary size of the Arabic TREC 2001/2002 collection for the different text preprocessing approaches using the same machine.

Table 2.

Comparison of indexing time and vocabulary size for the different Arabic text preprocessing approaches.

Preprocessing	FS	FL	M unvocL	M vocL	LS	RS	Norm
Indexing time (min)	14.16	20.5	1025.4	1025	5.75	34.2	3.65
Vocabulary size	219,963	220,555	231,435	233,752	267,844	172,720	507,088

FS: Farasa stemmer; FL: Farasa lemmatiser; M unvocL: MADAMIRA unvocalised lemma; M vocL: MADAMIRA vocalised lemma; LS: light stemmer; RS: root-based stemmer.

Not surprisingly, text normalisation alone is the fastest, while the MADAMIRA lemmatiser, which conducts a deeper analysis based on word context, has by far the slowest indexing time. The Farasa lemmatiser (FL) indexes much faster than the MADAMIRA lemmatiser. This is explained by the fact that FL relies on a dictionary of words and their possible diacritisations, ordered by the number of occurrences of each diacritised form, and selects the lemma that corresponds to the most frequent diacritisation [46]. That said, the indexing times above do not prevent the use of any of these approaches; the choice should be based on the overall performance on the targeted task. Moreover, using roots, stems and lemmas significantly reduces the storage space in comparison with text normalisation. As expected, the root-based (RS) indexing method has the smallest vocabulary size. The Farasa stemmer (FS) has the smallest vocabulary size among the light stemming and MADAMIRA lemmatisation approaches. Indexing vocalised lemmas (M vocL) increases the vocabulary size in comparison with unvocalised lemmas (M unvocL). Furthermore, the Farasa lemmatiser (FL) shows a small reduction in the storage size in comparison with the MADAMIRA lemmatiser and the light stemmer (LS). This can be explained by the fact that the Farasa lemmatiser showed a better lemmatisation accuracy than MADAMIRA [46]. In addition, FL uses the Farasa stemmer/segmenter to deal with out-of-dictionary words by removing prefixes and some suffixes to get either their lemmas (if their segmented forms are found in the lemmatiser’s dictionary) or their stems [46].

Table 3 summarises the results obtained without query expansion for the three stemmers, the three lemma-based indexing methods and text normalisation. The results show that all stemming and lemmatisation approaches significantly improve over text normalisation. The Farasa stemming approach significantly outperforms the classical stemming approaches and both MADAMIRA lemma-based indexing methods (M unvocL and M vocL) for Arabic IR, which is explained by the high accuracy of word segmentation with Farasa [29]. Moreover, the Farasa stemmer yields a better performance than the Farasa lemmatiser (FL). The latter shows a significant improvement over the MADAMIRA vocalised lemma-based indexing method (M vocL) and a better performance than the MADAMIRA unvocalised lemma-based indexing method (M unvocL). This is explained by the fact that the Farasa lemmatiser achieves a better lemmatisation accuracy than the MADAMIRA lemmatiser [46]. Furthermore, the unvocalised lemma-based indexing method (M unvocL) achieves a better performance than the vocalised lemma-based indexing method, which increases the vocabulary size (see Table 2). Thus, vocalised lemmas lead to another form of term mismatch. In addition, M unvocL achieves the best P10 performance in comparison with the other lemma-based indexing methods (FL and M vocL). In line with previous studies [43,51], the light stemming approach significantly outperforms the heavy stemming approach. The low performance of the heavy stemming approach is explained by the fact that the root-based stemmer conflates words with different meanings into the same root. Furthermore, the overall comparison results show that the SPL and BM25 models achieve better performance than the LGD and the LM models. Thus, we select the Farasa stemmer for Arabic text preprocessing and the SPL model as the baseline IR model for the rest of our experiments.

Table 3.

Summary of the results for bag-of-words IR models using Farasa, light and heavy stemming approaches, and text normalisation without query expansion.

Prep.	FS		FL		M unvocL		M vocL		LS		RS		Norm
Model	MAP	P10	MAP	P10	MAP	P10	MAP	P10	MAP	P10	MAP	P10	MAP	P10
LGD	32.42^{l, h, n, u, v}	47.33	31.08^{l, h, n, v}	45.33	30.41^{l, h, n}	47.07	29.22^{h, n}	44.13	28.94^{h, n}	44.20	24.97^{n}	41.07	22.32	37.60
SPL	33.51^{l, h, n, u, v}	50.67	32.06^{l, h, n, v}	48.27	31.14^{l, h, n}	49.07	30.06^{h, n}	47.33	28.72^{h, n}	44.80	25.28^{n}	45.73	22.82	40.27
BM25	33.42^{l, h, n, u, v}	49.60	32.58^{l, h, n, v}	47.87	31.52^{l, h, n}	49.47	30.48^{h, n}	47.33	28.93^{h, n}	42.93	25.17^{n}	44.40	22.91	40.93
LM	31.15^{l, h, n, u, v}	46.39	30.24^{l, h, n, v}	45.73	29.16^{l, h, n}	46.20	28.47^{h, n}	45.20	27.85^{h, n}	43.07	25.22^{n}	43.87	22.24	38.53

For statistical significance, f = better than Farasa stemming (FS), t = better than Farasa lemmatiser (FL), u = better than MADAMIRA unvocalized lemma (M unvocL), v = better than MADAMIRA vocalised lemma (M vocL), l = better than light stemming (LS), h = better than heavy stemming (RS) and n = better than text normalisation (Norm).

5.2.2. Evaluation of word-embedding-based PRF models

Second, we conduct several experiments to compare our proposed extensions, which incorporate word embedding similarities into existing PRF models, against their baseline PRF models and the baseline IR model (SPL). The main goal of these experiments is to answer the following question: does incorporating word embedding similarity into existing PRF models improve the performance of Arabic IR? The PRF extensions are evaluated using both similarity functions ($Sim_{comp}$ and $Sim_{avg}$) that compute the similarity between expansion terms and the original query. For word embedding, we selected the Glove model. We report here only the results obtained with $k = 10$ (number of feedback documents) and $n = 50$ (number of expansion terms). The best value for k depends on the collection considered and usually lies between 5 and 20. The impact of n on the retrieval performance is analysed in Section 5.2.3.
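To make the two similarity functions concrete, the following is a minimal sketch, assuming cosine similarity between embedding vectors and a query vector composed by summing the query-term vectors (cosine is scale-invariant, so summing and averaging give the same result). The function names, the `emb` dictionary and the toy vectors are illustrative assumptions, not the exact formulation used in our experiments.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / norm) if norm > 0 else 0.0

def sim_comp(term, query_terms, emb):
    """Similarity between a term and one query vector composed from all query terms."""
    q_vec = np.sum([emb[q] for q in query_terms], axis=0)
    return cosine(emb[term], q_vec)

def sim_avg(term, query_terms, emb):
    """Average of the similarities between a term and each query term."""
    return float(np.mean([cosine(emb[term], emb[q]) for q in query_terms]))

# Toy 2-dimensional embeddings (made-up values, for illustration only).
emb = {
    "q1": np.array([1.0, 0.0]),
    "q2": np.array([0.0, 1.0]),
    "t": np.array([1.0, 1.0]),
}

comp_score = sim_comp("t", ["q1", "q2"], emb)
avg_score = sim_avg("t", ["q1", "q2"], emb)
```

In this toy case the candidate term points in the same direction as the composed query vector, so the composed similarity is higher than the average of the per-term similarities; on real embeddings the two scores are typically much closer.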

Table 4 shows the results obtained for our PRF extensions, their baseline PRF models and the baseline IR model (SPL), using the Farasa stemming approach. For the KLD and Bo2 PRF models, the proposed extensions significantly outperform their baselines with both similarity functions. Although our LL and SP extensions improve on their baseline PRF models, the difference is not statistically significant. The best results are obtained by incorporating word embedding similarity into the Bo2, LL and SP PRF models. Our Bo2, LL and SP extensions enhance the baseline IR model by 22% and 68% for the MAP and the RI, respectively. Furthermore, computing the average similarity between expansion terms and the original query terms yields slightly better performance than using a single embedding vector for the query.

Table 4.

Evaluation of the proposed PRF models against their baseline PRF models and the baseline IR model (SPL).

Model / measure	MAP	P10	RI
Baseline	33.51	50.67	−−
KLD	38.76^{1}	52.27	0.57
KLD_$Sim_{comp}$	40.45^{1,2}	54.03	0.65
KLD_$Sim_{avg}$	40.62^{1,2}	54.03	0.65
Bo2	39.7^{1,2}	52.8	0.59
Bo2_$Sim_{comp}$	41.03^{1,2}	54.8	0.68
Bo2_$Sim_{avg}$	41.11^{1,2}	55.07	0.68
LL	40.71^{1}	53.07	0.62
LL_$Sim_{comp}$	41.01^{1}	54.00	0.68
LL_$Sim_{avg}$	41.04^{1}	54.40	0.68
SP	40.17^{1}	52.80	0.60
SP_$Sim_{comp}$	41.02^{1}	54.33	0.68
SP_$Sim_{avg}$	41.04^{1}	54.20	0.68

PRF: pseudo-relevance feedback; IR: information retrieval; MAP: mean average precision; P10: precision at 10 documents; RI: robustness index; KLD: Kullback–Leibler divergence; LL: log-logistic; SP: smoothed power law.

Superscripts 1 and 2 denote a significant improvement over the SPL baseline IR model and the baseline PRF model, respectively. The best performances are shown in bold.
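As a rough check on how the summary figures relate to Table 4, the sketch below computes the relative MAP gain and a robustness index. It assumes the commonly used definition RI = (n+ − n−) / |Q|, where n+ and n− are the numbers of queries improved and degraded by expansion; the function names and the per-query values are hypothetical and serve only as an illustration.

```python
def relative_improvement(baseline, system):
    """Relative gain of `system` over `baseline`, in percent."""
    return 100.0 * (system - baseline) / baseline

def robustness_index(baseline_ap, system_ap):
    """RI = (n_plus - n_minus) / |Q| over per-query average precision."""
    n_plus = sum(s > b for b, s in zip(baseline_ap, system_ap))
    n_minus = sum(s < b for b, s in zip(baseline_ap, system_ap))
    return (n_plus - n_minus) / len(baseline_ap)

# MAP values from Table 4: SPL baseline vs. Bo2 with Sim_avg.
map_gain = relative_improvement(33.51, 41.11)

# Made-up per-query AP values for four queries (illustration only):
# two queries improve, one degrades and one is unchanged.
ri = robustness_index([0.20, 0.50, 0.30, 0.40], [0.30, 0.45, 0.35, 0.40])
```

With the Table 4 values, the MAP gain comes out slightly above 22%, which matches the figure quoted in the text. Under the assumed definition, an RI of 0.68 means that the queries helped by expansion outnumber the queries hurt by 68% of the query set.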

5.2.3. Impact of the number of expansion terms

Third, we evaluate the impact of the number of expansion terms on the proposed PRF extensions and their baseline models. The aim is to study the sensitivity of the proposed extensions to the number of expansion terms. To do so, we fix the number of pseudo-relevant documents to 10 and vary the number of expansion terms between 10 and 100.

Figure 2 illustrates the sensitivity of the proposed extensions to the number of expansion terms. According to this figure, the performance of the PRF extensions and their baselines generally increases with the number of expansion terms over the tested range (up to 100 terms). Moreover, for every number of expansion terms tested, the proposed extensions improve on their PRF baselines. The $Sim_{comp}$ and $Sim_{avg}$ similarity functions achieve nearly the same performance across the tested numbers of expansion terms, especially when more than 10 expansion terms are selected. For most PRF extensions and their baseline PRF models, the performance remains generally stable beyond 50–60 expansion terms. Small improvements are nevertheless obtained by the LL and SP extensions for a large number of expansion terms. The latter finding is explained by the fact that the LL and SP models satisfy most of the constraints that a desirable PRF model should satisfy to be empirically effective [4]. All in all, the best performance is obtained for 50 ≤ n ≤ 70, with no clear difference within this range. The need for 10 or more expansion terms can be explained by the fact that term mismatch significantly affects the performance of Arabic IR.

Figure 2.

Effect of the number of expansion terms on the MAP of the proposed PRF extensions and their baselines: (a) KLD PRF model, (b) Bo2 PRF model, (c) LL PRF model and (d) SP PRF model.

The processing time of document retrieval, again measured on an Intel Xeon E5-2407 machine with 4 CPUs and 48 GB of RAM, for the SPL baseline IR model, the LL baseline PRF model (the most complex PRF model we consider) and our extensions over the whole set of 75 queries is as follows (k is set to 10):

10.46 s for the baseline SPL model (without query expansion);

26.10 and 65.44 s for the baseline LL PRF model using $n = 10$ and $n = 100$, respectively;

28.94 and 67.21 s for the LL_$Sim_{comp}$ model using $n = 10$ and $n = 100$, respectively;

31.59 and 70.18 s for the LL_$Sim_{avg}$ model using $n = 10$ and $n = 100$, respectively.

As one can note, although the processing time increases with query expansion, it remains below 0.5 s/query for $n = 10$ and below 1 s/query for $n = 100$. In both cases, the performance improves over the baseline (see Figure 2). Moreover, compared with the standard PRF model (LL), our PRF extensions add only a small overhead, at most about 5.5 s over the whole set of queries.
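The per-query figures quoted above follow directly from dividing the total times by the 75 queries. A minimal sketch of this arithmetic is given below; the dictionary layout and variable names are illustrative, while the numbers are the ones reported in the list above.

```python
N_QUERIES = 75  # size of the query set reported above

# Total processing times in seconds, copied from the list above: (n=10, n=100).
total_seconds = {
    "LL": (26.10, 65.44),
    "LL_Sim_comp": (28.94, 67.21),
    "LL_Sim_avg": (31.59, 70.18),
}

# Average time per query for each model and each value of n.
per_query = {
    model: tuple(t / N_QUERIES for t in times)
    for model, times in total_seconds.items()
}

# Extra total time of the slowest extension over the LL baseline, for each n.
overhead = tuple(
    ext - base
    for ext, base in zip(total_seconds["LL_Sim_avg"], total_seconds["LL"])
)

for model, (t10, t100) in per_query.items():
    print(f"{model}: {t10:.3f} s/query (n=10), {t100:.3f} s/query (n=100)")
```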

5.2.4. Comparison of word embedding models for PRF

Finally, we compare the three word embedding models (CBOW, Skip-gram and Glove) for incorporating term similarity into PRF models (KLD, Bo2, LL and SP). The main goal of this comparison is to answer the question: Which word embedding model performs better for incorporating term similarity into PRF techniques for Arabic IR?

Table 5 presents the comparison results for the proposed word-embedding-based PRF extensions using the Glove, CBOW and Skip-gram models. The optimal values of the number of top pseudo-relevant documents and the number of expansion terms are selected according to the best MAP value for each PRF model. The overall comparison shows that the difference in MAP between the three word embedding models is not statistically significant for any PRF extension. Although the best MAP value is obtained by incorporating Skip-gram word similarity into the Bo2 PRF extension (Bo2_$Sim_{avg}$), and the best P10 value by the same extension with the Glove and CBOW models, the PRF extensions achieve quite similar performances for the three word embedding models. In line with the previous results, the difference in MAP between the $Sim_{avg}$ and $Sim_{comp}$ similarity functions is not statistically significant for the three word embedding models.

Table 5.

Summary of the comparison results for the proposed word-embedding-based PRF extensions using Glove, CBOW and Skip-gram models.

Word embedding	Glove		Skip-gram		CBOW
Model / measure	MAP	P10	MAP	P10	MAP	P10
KLD_$Sim_{comp}$	40.45	54.03	40.41	53.47	40.16	53.73
KLD_$Sim_{avg}$	40.62	54.03	40.71	53.73	40.67	53.73
Bo2_$Sim_{comp}$	41.03	54.8	41.18	54.40	41.09	54.67
Bo2_$Sim_{avg}$	41.11	55.07	41.26	54.67	41.21	55.07
LL_$Sim_{comp}$	41.01	54.00	40.95	53.73	41.05	54.33
LL_$Sim_{avg}$	41.04	54.40	41.04	54.93	41.12	54.80
SP_$Sim_{comp}$	41.02	54.33	41.11	54.07	41.02	54.33
SP_$Sim_{avg}$	41.04	54.20	41.15	54.42	41.06	54.20

PRF: pseudo-relevance feedback; CBOW: continuous bag-of-words model; KLD: Kullback–Leibler divergence; LL: log-logistic; SP: smoothed power law; MAP: mean average precision; P10: precision at 10 documents.

The best MAP and P10 values are shown in bold and bold-italic, respectively.

6. Conclusion

In this article, we proposed a method to incorporate word embedding similarity into existing PRF models (KLD, Bo2, LL and SP) for Arabic IR. The main idea of our method is to combine, in a unified PRF framework, the distribution of expansion terms in the set of pseudo-relevant documents with their similarity to the original query terms. To do so, we used two word similarity functions that compute the similarity between a candidate expansion term and the original query. Evaluations were performed on the standard Arabic TREC 2001/2002 test collection using three neural word embedding models, namely the Glove, CBOW and Skip-gram models. Our method improved the baseline IR model by 22% and 68% for the MAP and the RI, respectively. The analysis of the obtained results led us to conclude that:

Incorporating word embedding similarity into existing PRF models significantly improves on the performance of their baselines (KLD and Bo2) and of the baseline bag-of-words IR model (SPL);

The difference in performance between the three word embedding models (Glove, CBOW and Skip-gram) is not statistically significant;

Computing word similarity using either a single query embedding built from the query term vectors ($Sim_{comp}$) or the average similarity between a candidate expansion term and the original query terms ($Sim_{avg}$) leads to nearly the same performance for all word embedding models and PRF extensions;

The Farasa stemmer is in general preferable to the classical light and root-based stemmers, to the evaluated state-of-the-art lemmatisers (MADAMIRA and Farasa) and to text normalisation.

Since the difference in performance between the three word embedding models (Glove, CBOW and Skip-gram) is not statistically significant, a straightforward path for future research is to study the impact of the parameters used to learn word embeddings, such as the context size and the dimension of word vectors, and to rely on other word-embedding-based IR models such as the one proposed in El Mahdaouy et al. [33]. We also plan to compare our PRF approach with other query expansion methods such as those in [5,19,20].

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

References

1. Carpineto, de Mori, Romano. An information-theoretic approach to automatic query expansion. ACM T Inform Syst 2001; 19(1): 1–27.

2. Collins-Thompson. Reducing the risk of query expansion via robust constrained optimization. In: Proceedings of the 18th ACM conference on information and knowledge management, Hong Kong, China, 2–6 November 2009, pp. 837–846. New York: ACM.

3. Carpineto, Romano. A survey of automatic query expansion in information retrieval. ACM Comput Surv 2012; 44(1): 11–150.

4. Clinchant, Gaussier. A theoretical analysis of pseudo-relevance feedback models. In: Proceedings of the 2013 conference on the theory of information retrieval, Copenhagen, 29 September–2 October 2013, pp. 6–13. New York: ACM.

5. Almasri, Berrut, Chevallet. A comparison of deep learning based query expansion with pseudo-relevance feedback and mutual information. In: Proceedings of the advances in information retrieval – 38th European conference on IR research, ECIR 2016, Padua, 20–23 March 2016, pp. 709–715, https://hal.archives-ouvertes.fr/hal-01576603/document

6. Lavrenko, Croft WB. Relevance-based language models. In: Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval, New Orleans, LA, 9–12 September 2001, pp. 120–127. New York: ACM.

7. Amati, Van Rijsbergen CJ. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM T Inform Syst 2002; 20(4): 357–389.

8. Fang, Zhai. Semantic term matching in axiomatic approaches to information retrieval. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, Seattle, WA, 6–11 August 2006, pp. 115–122. New York: ACM.

9. Atwan, Mohd, Rashaideh. Semantically enhanced pseudo relevance feedback for Arabic information retrieval. J Inform Sci 2016; 42(2): 246–260.

10. Montazeralghaem, Zamani, Shakery. Axiomatic analysis for improving the log-logistic feedback model. In: Proceedings of the 39th international ACM SIGIR conference on research and development in information retrieval, Pisa, 17–21 July 2016, pp. 765–768. New York: ACM.

11. Mikolov, Chen, Corrado. Efficient estimation of word representations in vector space. Ithaca, NY: Cornell University Library, 2013.

12. Pennington, Socher, Manning. Glove: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing, pp. 1532–1543. Doha, Qatar: Association for Computational Linguistics, https://www.aclweb.org/anthology/D14-1162

13. Baroni, Dinu, Kruszewski. Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In: Proceedings of the 52nd annual meeting of the association for computational linguistics, Baltimore, MD, 23–25 June 2014, pp. 238–247. Baltimore, Maryland: ACL, http://www.aclweb.org/anthology/P14-1023

14. Lofi. Measuring semantic similarity and relatedness with distributional and knowledge-based approaches. Inform Media Technol 2015; 10(3): 493–501.

15. Ganguly, Roy, Mitra. Word embedding based generalized language model for information retrieval. In: Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval, Santiago, 9–13 August 2015, pp. 795–798. New York: ACM.

16. Vulić, Moens MF. Monolingual and cross-lingual information retrieval models based on (bilingual) word embeddings. In: Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval, Santiago, 9–13 August 2015, pp. 363–372. New York: ACM.

17. Zuccon, Koopman, Bruza. Integrating and evaluating neural word embeddings in information retrieval. In: Proceedings of the 20th Australasian document computing symposium, Parramatta, 8–9 December 2015, pp. 1–12. New York: ACM.

18. El Mahdaouy, Alaoui SOE, Gaussier Éric. Semantically enhanced term frequency based on word embeddings for Arabic information retrieval. In: Proceedings of the 2016 4th IEEE international colloquium on information science and technology (CiSt), Tangier, Morocco, 24–26 October 2016, pp. 385–389. New York: IEEE.

19. Kuzi, Shtok, Kurland. Query expansion using word embeddings. In: Proceedings of the 25th ACM international on conference on information and knowledge management, Indianapolis, IN, 24–28 October 2016, pp. 1929–1932. New York: ACM.

20. Diaz, Mitra, Craswell. Query expansion with locally-trained word embeddings. In: Proceedings of the 54th annual meeting of the association for computational linguistics, Berlin, Germany, 7–12 August 2016. Association for Computational Linguistics, http://www.aclweb.org/anthology/P16-1035

21. Larkey, Ballesteros, Connell ME. Improving stemming for Arabic information retrieval: light stemming and co-occurrence analysis. In: Proceedings of the 25th annual international ACM SIGIR conference on research and development in information retrieval, Tampere, 11–15 August 2002, pp. 275–282. New York: ACM.

22. Kadri, Nie JY. Effective stemming for Arabic information retrieval. In: Proceedings of the international conference at the British computer society challenge of Arabic for NLP/MT, London, 2006, pp. 68–74.

23. Abu El-Khair. Arabic information retrieval. Annu Rev Inform Sci 2007; 41(1): 505–533.

24. Algarni, Martin, Bell. Simple Arabic stemmer. In: Proceedings of the 23rd ACM international conference on information and knowledge management, Shanghai, 3–7 November 2014, pp. 1803–1806. New York: ACM.

25. Azmi, Aljafari EA. Modern information retrieval in Arabic – catering to standard and colloquial Arabic users. J Inform Sci 2015; 41(4): 506–517.

26. Abdelali, Darwish, Durrani. Farasa: a fast and furious segmenter for Arabic. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics demonstrations session, 12–17 June 2016, pp. 11–16. San Diego CA: Human Language Technologies, http://www.aclweb.org/anthology/N16-3003

27. Guirat, Bounhas, Slimani. Combining indexing units for Arabic information retrieval. Int J Softw Innov 2016; 4(4): 14.

28. Nwesri, Tahaghoghi, Scholer. Stemming Arabic conjunctions and prepositions. In: Consens, Navarro (eds) String processing and information retrieval, lecture notes in computer science, vol. 3772. Berlin; Heidelberg: Springer, 2005, pp. 206–217.

29. Darwish, Mubarak. Farasa: a new fast and accurate Arabic word segmenter. In: Proceedings of the tenth international conference on language resources and evaluation LREC 2016, Portorož, 23–28 May 2016, pp. 1070–1074. Paris: European Language Resources Association (ELRA).

30. Shaalan, Al-Sheikh, Oroumchian. Query expansion based-on similarity of terms for improving Arabic information retrieval. In: Proceedings of the international conference on intelligent information processing, 2012, pp. 167–176. Berlin: Springer.

31. Mahgoub, Rashwan, Raafat. Semantic query expansion for Arabic information retrieval. In: Proceedings of the Arabic natural language processing workshop, 2014, pp. 87–92.

32. Belalem, Abbache, Barigou. The use of Arabic WordNet in Arabic information retrieval. Int J Inform Retr 2014; 4(3): 54–65.

33. El Mahdaouy, El Alaoui, Gaussier. Improving Arabic information retrieval using word embedding similarities. Int J Speech Technol 2018; 21(1): 121–136.

34. Gaussier É. An information-based cross-language information retrieval model. In: Proceedings of the 34th European conference on IR research, ECIR 2012, lecture notes in computer science (LNCS), vol. 7224, Barcelona, 1–5 April 2012, pp. 281–292. Berlin: Springer.

35. Abderrahim, Dib, Abderrahim MEA. Semantic indexing of Arabic texts for information retrieval system. Int J Speech Technol 2016; 19(2): 229–236.

36. Zamani, Croft WB. Embedding-based query language models. In: Proceedings of the 2016 ACM international conference on the theory of information retrieval, Newark, DE, 12–16 September 2016, pp. 147–156. New York: ACM.

37. Zahran, Magooda, Mahgoub. Word representations in vector space and their applications for Arabic. In: Gelbukh (ed.) Computational linguistics and intelligent text processing, Proceedings of the 16th international conference part I, lecture notes in computer science, CICLing 2015, Cairo, 14–20 April 2015, vol. 9041. Berlin: Springer.

38. Zamani, Dadashkarimi, Shakery. Pseudo-relevance feedback based on matrix factorization. In: Proceedings of the 25th ACM international on conference on information and knowledge management, Indianapolis, IN, 24–28 October 2016, pp. 1483–1492. New York: ACM.

39. Clinchant, Gaussier. Information-based models for ad hoc IR. In: Proceedings of the 33rd international ACM SIGIR conference on research and development in information retrieval, Geneva, 19–23 July 2010, pp. 234–241. New York: ACM.

40. Ponte, Croft WB. A language modeling approach to information retrieval. In: Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval, Melbourne, 24–28 August 1998, pp. 275–281. New York: ACM.

41. Zhai, Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. In: Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval, New Orleans, LA, 9–13 September 2001, pp. 334–342. New York: ACM.

42. Robertson, Walker, Jones. Okapi at TREC-3. In: Proceedings of the TREC’94, pp. 109–126.

43. Larkey, Ballesteros, Connell. Light stemming for Arabic information retrieval. In: Soudi, Bosch, Neumann (eds) Arabic computational morphology, text, speech and language technology, vol. 38. Berlin: Springer, 2007, pp. 221–243.

44. Khoja, Garside. Stemming Arabic text. Lancashire: Computing Department, Lancaster University.

45. Pasha, Al-Badrashiny, Diab. Madamira: a fast, comprehensive tool for morphological analysis and disambiguation of Arabic. In: Chair NCC, Choukri, Declerck (eds) Proceedings of the ninth international conference on language resources and evaluation (LREC’14). Reykjavik: European Language Resources Association (ELRA), 2014, p. L14-1479.

46. Mubarak. Build fast and accurate lemmatization for Arabic. Ithaca, NY: Cornell University Library, 2017.

47. Hmeidi, Al-Ayyoub, Abdulla. Automatic Arabic text categorization: a comprehensive comparative study. J Inform Sci 2015; 41(1): 114–124.

48. Al-Badarneh, Al-Shawakfa, Bani-Ismail. The impact of indexing approaches on Arabic text classification. J Inform Sci 2017; 43(2): 159–173.

49. Zeroual, Lakhouaja. Arabic information retrieval: stemming or lemmatization? In: Proceedings of the 2017 intelligent systems and computer vision (ISCV), Fez, Morocco, 17–19 April 2017, pp. 1–6. New York: IEEE.

50. Jaafar, Bouzoubaa. Arabic natural language processing from software engineering to complex pipeline. In: Proceedings of the 2015 first international conference on Arabic computational linguistics (ACLing), Cairo, Egypt, 17–20 April 2015, pp. 29–36. New York: IEEE.

51. El Mahdaouy, Gaussier, Alaoui SOE. Exploring term proximity statistic for Arabic information retrieval. In: Proceedings of the 2014 third IEEE international colloquium in information science and technology (CIST), Tetouan, Morocco, 20–22 October 2014, pp. 272–277. New York: IEEE.