Learning search popularity for personalized query completion in information retrieval

Abstract

Query completion helps searchers formulate queries with few keystrokes when they use an information retrieval system, which lets them avoid spelling mistakes and produce clear query formulations. Previous query completion algorithms return a ranked list of query candidates based mostly on the overall search popularity observed in the whole query log. However, search popularity changes over time, i.e., it is time-sensitive. Ranking approaches based only on overall popularity therefore may not work well, and users may fail to find an acceptable query in the returned list, which limits search satisfaction. This paper proposes a Learning-based Personalized Query Ranking approach (LQR) that exploits features of both observed and predicted search popularity, in the whole log and in a recent period. Using a pair-wise learning scenario, we first generate a ranked list of query candidates and then rerank the candidates by their similarity to the current search context. Experimental results show that the proposed approach outperforms the baseline in terms of Mean Reciprocal Rank (MRR), with an average MRR improvement of about 7%.

Keywords

Information retrieval, query completion, query suggestion, query formulation

1 Introduction

Query completion is one of the most significant features provided by common information retrieval systems. It can also be embedded into other search systems, such as image retrieval systems [1, 2]. As shown in Fig. 1, after a user types the prefix “Nepal” in the search box, the search system automatically provides several query candidates to help finish formulating the query. The goal of a query completion service is to accurately predict the user’s intended query from only a few keystrokes (i.e., a typed prefix or string). In doing so, it helps the user formulate a satisfactory query with less effort and avoid misspellings. It has been shown in Web search applications that query completion, when offered, is heavily used by searchers and highly influential on the accuracy of search results [3].

Fig.1

Google query completions of typed prefix “Nepal”. The snapshot was taken on Saturday, May 2, 2015.

A common and straightforward approach in previous work on query completion ranking is to extract the queries that match the prefix from the query logs and rank them by their popularity [4–6], which assumes that the current or future popularity of queries remains the same as previously observed. Although this approach gives good ranking performance on average, it ignores strong temporal clues, i.e., the search popularity trend, even though such information often influences which queries are most likely to be typed. As shown in Fig. 2, the two queries “Nepal” and “Nepal earthquake” were relatively unpopular before April 2015. After April 22, 2015, however, their popularity jumped sharply from a low level to a very high one because of an earthquake in Nepal. Hence, a query ranking approach based solely on overall search popularity cannot work well in such cases.

Fig.2

Relative query popularity for different queries over time. Query “Nepal” in blue and query “Nepal earthquake” in red. Both queries present a sharp increase from a relatively low level after April 22, 2015. The snapshot was taken on Saturday, May 2, 2015.

In addition, queries issued in the same session often express similar search intents, which can affect the ranking of query candidates in a query completion task. To address both issues, this paper proposes a Learning-based Personalized Query Ranking approach, namely LQR, which considers the observed and predicted search popularity in a recent period as well as the overall popularity in the whole query log. Moreover, the search intent expressed by previous queries in the session is taken into account by considering the similarity between queries. Our proposal beats the baseline in terms of Mean Reciprocal Rank (MRR) with an improvement of around 7%.

The goal of this paper is to provide an alternative approach to query completion that helps search engine designers improve the satisfaction of web search users. Our contributions can be summarized as follows:

1. We propose a Learning-based Personalized Query Ranking approach (LQR), incorporating popularity-related features for learning and the user’s search history for personalization, which is verified to improve the accuracy of query completion.

2. We analyze the effectiveness of our LQR model and find that it significantly outperforms the state-of-the-art baselines for query completion in terms of Mean Reciprocal Rank (MRR), reporting an MRR improvement of around 7% against a time-sensitive query completion approach.

Next, we briefly summarize related work on query completion in Section 2, followed by Section 3, which introduces our learning-based personalized query ranking approach. Section 4 details the experimental setup, and Section 5 discusses the results of our proposal and compares them with those of the baselines. We conclude and point out possible directions for future research in Section 6.

2 Related work

Query completion [5, 7–10], a well-known feature of today’s search systems, relies on query logs to generate query candidates and is among the first services that users interact with as they formulate their queries [11]. In a common query ranking scenario, the simplest approach is to rank query candidates by Maximum Likelihood Estimation (MLE) based on the collected popularity of queries, i.e., their frequency [4]. This type of query ranking is referred to as the Most Popular Completion (MPC) model, given in Equation (1): $MPC(p) = \underset{q \in C(p)}{\arg\max}\; w(q),$ (1) where $w(q) = \frac{f(q)}{\sum_{i \in Q} f(i)}$, f(q) denotes the number of occurrences of query q in the search log, and C(p) is the set of query completion candidates that start with prefix p.
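The MPC ranking of Equation (1) can be illustrated with a minimal sketch. The function name `mpc`, the toy log and the alphabetical tie-breaking are assumptions made for illustration only; they are not taken from the cited work.

```python
from collections import Counter


def mpc(prefix, query_log, top_n=10):
    """Rank completions of `prefix` by relative frequency w(q) in `query_log`."""
    counts = Counter(query_log)          # f(q) for every query in the log
    total = sum(counts.values())         # denominator of w(q)
    candidates = [
        (q, c / total) for q, c in counts.items() if q.startswith(prefix)
    ]
    # Sort by descending w(q); ties are broken alphabetically (an assumption,
    # made only so that the output is deterministic).
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:top_n]


# Hypothetical toy log, not real data.
log = ["nepal", "nepal earthquake", "nepal", "netflix", "nepal map", "nepal"]
ranked = mpc("nep", log)
```

Because the denominator of w(q) is the same for every candidate, ranking by raw counts f(q) would give the same order; the normalization only matters when the score is combined with other scores.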

In essence, the MPC model assumes that the current query popularity distribution stays the same as previously observed, and hence ranks query completions by their past popularity in order to maximize effectiveness for all users on average. However, query popularity may change over time, so the ranking of query completions should be time-dependent (see Fig. 2). These points have recently been studied in information retrieval. Next, we summarize recent query completion approaches from two aspects: search popularity based query completion and search context based query completion.

2.1 Search popularity based query completion

The basic query ranking approach based on overall search popularity works well on average. However, the original ranking model does not take temporal information such as recency into account, although query popularity changes over time. Rather than ranking query candidates by their overall past popularity, researchers have proposed a time-series modeling approach that forecasts query frequency by applying a fixed moving time window [5]. Queries that recur in specific temporal intervals, such as day/night, workday/weekend or summer/winter, are modeled differently to predict their future popularity for ranking at different times. The forecasts obtained by such models are substantially more reliable and lead to good query rankings. However, a detailed analysis of how the choice of time window affects performance is lacking.

Moreover, temporal information such as recency [12–14] and seasonality [5, 15] has been considered for ranking query candidates; the latter approaches employ time-series analysis techniques to forecast a query’s future popularity [5, 15]. A popularity-based query ranking approach can also be combined with other criteria: a hybrid model assigns a final score to each candidate by linearly combining the popularity score with another score, e.g., a context similarity score. All of these models use search popularity in a straightforward way and ignore the latent contribution of recent search popularity or its trend. Our proposal investigates the impact of search popularity on query ranking performance in more depth.

2.2 Search context based query completion

In most work mentioned so far, query candidates are computed globally, and for a given prefix, all users are presented with the same list of candidates. However, different users have different search intents. Hence, exploiting a user’s personal search context has been shown to increase query ranking effectiveness [4, 16–18].

Typically, the user’s recent queries are used as context to find users with shared search activity [4, 19], and the similarity of query candidates to this context is considered for ranking. The resulting hybrid model computes a final score for each candidate by linearly combining the MPC score and a context-similarity score. Similarly, the click graph has been used [11, 20] to measure the similarity between search context and query candidates, from which a query ranking can be generated. User interactions can also play a prominent role in models for ranking queries [3, 21]. In addition, the context has been exploited to learn user reformulation behavior [21], in a supervised approach for ranking query completions based on term-, query- and session-level features. These features capture how users change their queries during search sessions, and the findings provide valuable insights into users’ engagement with search systems.

In this paper, we argue that injecting recent search popularity features, derived from both observations and predictions, into a learning-to-rank framework for query completion boosts ranking performance. In addition, our personalized query ranking based on semantic query similarity helps generate a better ranking of queries.

3 Approach

In this section, we first describe the search popularity features, then introduce the learning algorithm for query ranking, and finally generate a personalized query ranking by incorporating the query similarity to search context.

3.1 Search popularity based features

We propose a learning-based query ranking method, LQR, that ranks query candidates using both the observed and the predicted query popularity (i.e., frequency), the latter based on the recent trend. LQR not only inherits the merits of time-series analysis on short-term observations of query popularity, but also considers the overall query counts. Specifically, we collect the observed query popularity in the whole query log as well as in the most recent m day(s), and denote the corresponding features as f_o+w and f_o+m, respectively.

In addition, we predict a query q’s future popularity based on its recent preceding observations. The search popularity prediction $\hat{y} (t_{0}, q, i)_{trend}$ for query q at future day t₀ generated from the observation of preceding day i (i = 1, 2, ⋯ , n) is derived from the first order derivative of q’s daily count C (q, i) as Equation (2): $\hat{y} (t_{0}, q, i)_{trend} = C (q, i) + \int_{i}^{t_{0}} \frac{\partial C (q, t)}{\partial t} d t$ (2)

We indicate the features based on the predicted search popularity with f_p+n, where n denotes the day from which the prediction is generated.
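A trend-based prediction in the spirit of Equation (2) can be sketched as follows. The paper does not state how the derivative is estimated, so this sketch makes two explicit assumptions: the derivative is treated as constant and approximated by the one-day finite difference C(q, i) − C(q, i−1), and negative forecasts are clipped to zero. The function name `predict_trend` and the toy counts are hypothetical.

```python
def predict_trend(daily_counts, i, t0):
    """Predict the count at future day t0 from the observation at day i.

    ASSUMPTION: dC/dt in Eq. (2) is constant and approximated by the
    one-day finite difference C(i) - C(i-1); this is not stated in the paper.
    """
    if i < 1 or i >= len(daily_counts):
        raise ValueError("need observations for days i-1 and i")
    slope = daily_counts[i] - daily_counts[i - 1]
    # With a constant slope, the integral from i to t0 is slope * (t0 - i).
    prediction = daily_counts[i] + slope * (t0 - i)
    # ASSUMPTION: a query count cannot be negative, so clip at zero.
    return max(prediction, 0.0)


# Hypothetical daily counts of one query for days 0..3.
counts = [10, 12, 15, 30]
forecast = predict_trend(counts, i=3, t0=4)
```

Predictions made from different preceding days give different values for the same future day, which is why each day yields its own feature.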

3.2 Pair-wise learning for query ranking

The problem of learning-based query completion ranking can be formalized as follows. For a prefix p typed by a user and a query candidate set Q consisting of N queries to be ranked, i.e., q₁, q₂, ⋯ , q_N, the optimal ranking approach should return a ranked query list L_r that orders the queries in Q according to a score (or relevance), where the queries can be represented as vectors, e.g., q_i ← {f_i1, f_i2, ⋯ , f_iu}, where u is the number of features. Hence, the query ranking approach tries to assign a score s (p, q) to each query candidate q in the set Q using a ranking function as Equation (3): $s (p, q) \leftarrow \vec{ω} Φ (p, q)$ (3) where $\vec{ω}$ is a weight vector that is adjusted by learning. Φ (p, q) is a vector of mapping features that describe the match between query q and prefix p.

Typically, a ranking system cannot achieve an optimal ranked list $L_{r}^{*}$ over all query candidates that match the typed prefix. Instead, we focus only on the top positions in the list, e.g., the top N candidates. Formally, if a query q₁ is ranked higher than another query q₂ in a list L_r, we write $q_{1} \succ_{L_{r}} q_{2}$. If the query q₁ is issued by the user and the other queries in the candidate list are not, we assign a binary score of 1 to q₁ and 0 to the others. By doing so, we obtain a query-pair comparison score com(q₁, q₂) = 1 for the pair of q₁ and q₂, which can be used for learning.

We are now in a position to define the problem of learning a query ranking function. For a fixed but unknown distribution $P(p, L_{r}^{*})$ of prefixes p and optimal query rankings $L_{r}^{*}$ over a query candidate set Q with N queries, the goal is to learn a ranking function f for which the expected score in Equation (4), $ExpectedS(f) = \int s(p, L_{r}^{*}) \, dP(p, L_{r}^{*}),$ (4) is maximal. Instead of maximizing (4) directly, it is equivalent to find a weight vector $\vec{ω}$ such that the maximum number of the following inequalities in Equation (5) is fulfilled: $\begin{matrix} \forall (q_{1i}, q_{1j}) \in L_{r1}^{*} : \vec{ω} Φ(p_{1}, q_{1i}) > \vec{ω} Φ(p_{1}, q_{1j}) \\ \forall (q_{2i}, q_{2j}) \in L_{r2}^{*} : \vec{ω} Φ(p_{2}, q_{2i}) > \vec{ω} Φ(p_{2}, q_{2j}) \\ \dots \\ \forall (q_{ni}, q_{nj}) \in L_{rn}^{*} : \vec{ω} Φ(p_{n}, q_{ni}) > \vec{ω} Φ(p_{n}, q_{nj}) \end{matrix}$ (5)

However, this problem has been shown to be NP-hard [22]. Instead, the solution can be approximated by introducing non-negative slack variables ξ_{i,j,k} and minimizing the upper bound ∑ξ_{i,j,k}. This leads to the optimization objective in Equation (6): $\min \sum_{i,j,k} ξ_{i,j,k}$ (6) subject to the conditions in Equation (7): $\begin{matrix} \forall (q_{1i}, q_{1j}) \in L_{r1}^{*} : \vec{ω} Φ(p_{1}, q_{1i}) - \vec{ω} Φ(p_{1}, q_{1j}) > 1 - ξ_{i,j,1} \\ \forall (q_{2i}, q_{2j}) \in L_{r2}^{*} : \vec{ω} Φ(p_{2}, q_{2i}) - \vec{ω} Φ(p_{2}, q_{2j}) > 1 - ξ_{i,j,2} \\ \dots \\ \forall (q_{ni}, q_{nj}) \in L_{rn}^{*} : \vec{ω} Φ(p_{n}, q_{ni}) - \vec{ω} Φ(p_{n}, q_{nj}) > 1 - ξ_{i,j,n} \end{matrix}$ (7) where these constraints can be derived directly from the query-pair comparisons collected for each prefix in the training phase. With training samples formulated in this way, the query ranking function can be produced by applying a learning algorithm, e.g., SVM_light [23] or LambdaMART [24], which results in a ranking model $M$.
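To make the pairwise formulation of Equations (6) and (7) concrete, the following minimal sketch learns a linear weight vector by perceptron-style subgradient steps on the pairwise hinge loss. It is a deliberately simplified stand-in, without regularization, and is neither SVM_light nor LambdaMART. The two features, the toy pairs and the names `train_pairwise` and `score` are hypothetical.

```python
import random


def score(w, phi):
    """Linear ranking score s(p, q) = w · Phi(p, q), as in Eq. (3)."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def train_pairwise(pairs, n_features, epochs=200, lr=0.1, seed=0):
    """Learn w so that w·phi_pos - w·phi_neg > 1 holds for as many pairs as possible.

    `pairs` holds (phi_pos, phi_neg) feature vectors: phi_pos belongs to the
    submitted query, phi_neg to a candidate that was not submitted.
    """
    rng = random.Random(seed)
    w = [0.0] * n_features
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for pos, neg in pairs:
            diff = [a - b for a, b in zip(pos, neg)]
            margin = score(w, diff)
            if margin < 1.0:
                # The slack 1 - margin is positive, so the constraint of
                # Eq. (7) is violated: take a subgradient step on the hinge loss.
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


# Hypothetical features: [overall popularity, recent popularity].
pairs = [
    ([0.2, 0.9], [0.8, 0.1]),
    ([0.1, 0.7], [0.6, 0.2]),
    ([0.3, 0.8], [0.9, 0.3]),
]
w = train_pairwise(pairs, n_features=2)
```

In this toy data the submitted query always has the higher recent popularity, so the learned weight on the second feature comes out positive and every training pair is ordered correctly.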

3.3 Session-based personalized query completion

As queries in the same session often express similar search intents [25], we personalize query completion according to the similarity of each query candidate to the current search context, i.e., the previous queries in the session.

We introduce a hybrid model that combines the similarity score with the ranking score produced by the learning process in §3.2. First, the learning process produces a ranking of the query candidates S(p) of prefix p, where each candidate q_c ∈ S(p) is associated with a ranking score LeaScore(q_c). Then we calculate the similarity score SimScore(q_c) of q_c to the previous queries Q_S in the session by Equation (8): $SimScore(q_{c}) = \frac{1}{|Q_{S}|} \sum_{q_{i} \in Q_{S}} \cos(q_{c}, q_{i}),$ (8) where |Q_S| is the number of previous queries in the session and the cosine function measures the similarity between queries. To represent queries, we use a word-embedding approach [26] based on word representations. Specifically, we use Equation (9) to calculate the cosine similarity between queries q₁ and q₂: $\cos(q_{1}, q_{2}) = \frac{1}{|q_{1}| |q_{2}|} \sum_{w_{i} \in q_{1}} \sum_{w_{j} \in q_{2}} \cos(w_{i}, w_{j}),$ (9) where |q| is the length of query q in words and each word w is represented as a vector returned by word2vec [26].
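Equations (8) and (9) can be sketched in a few lines. The two-dimensional toy embeddings below are hypothetical stand-ins for word2vec vectors, and skipping out-of-vocabulary words (so that |q| counts only known words) is an assumption of this sketch that the paper does not specify.

```python
import math


def cos_vec(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cos_query(q1, q2, emb):
    """Eq. (9): average pairwise word cosine between two queries."""
    # ASSUMPTION: words without an embedding are ignored.
    words1 = [w for w in q1.split() if w in emb]
    words2 = [w for w in q2.split() if w in emb]
    if not words1 or not words2:
        return 0.0
    total = sum(cos_vec(emb[a], emb[b]) for a in words1 for b in words2)
    return total / (len(words1) * len(words2))


def sim_score(candidate, session_queries, emb):
    """Eq. (8): mean similarity of a candidate to the previous session queries."""
    if not session_queries:
        return 0.0
    sims = [cos_query(candidate, q, emb) for q in session_queries]
    return sum(sims) / len(sims)


# Hypothetical toy embeddings, not real word2vec output.
emb = {"nepal": [1.0, 0.0], "earthquake": [0.0, 1.0]}
s = sim_score("nepal earthquake", ["nepal"], emb)
```

In the toy example the candidate shares one of its two words with the session query and the other word is orthogonal to it, so the averaged similarity is 0.5.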

Following [4], we then define our hybrid model as a combination of the two scores in Equation (10):

$HybScore(q_{c}) = γ \cdot LeaScore(q_{c}) + (1 - γ) \cdot SimScore(q_{c}).$ (10)

As LeaScore (q_c) and SimScore (q_c) use different units and scales, they need to be standardized before being combined. We standardize LeaScore (q_c) as follows in Equation (11): $LeaScore (q_{c}) \leftarrow \frac{LeaScore (q_{c}) - μ_{T}}{σ_{T}},$ (11) where μ_T and σ_T are the mean and standard deviation of scores of queries in S (p). Similarly, we standardize SimScore (q_c) in Equation (12): $SimScore (q_{c}) \leftarrow \frac{SimScore (q_{c}) - μ_{P}}{σ_{P}},$ (12) where μ_P and σ_P are the mean and standard deviation of similarity scores of queries in S (p). Algorithm 1 details the major steps of our LQR model.

Algorithm 1 Learning-based Personalized Query Ranking (LQR).

Require: A prefix p; A ranking model $M$ learnt by popularity features; A list L of top N query candidates returned by MPC; Previous queries in session Q_S.

Ensure: A reranked list L′ of N query candidates;

1: for each query q ∈ L do

2: Extract the popularity features q_i ← {f_i1, f_i2, ⋯ , f_iu};

3: Compute LeaScore (q_c) using $M$ ;

4: Compute SimScore (q_c) according to Eq. (8);

5: Compute HybScore (q_c) according to Eq. (10);

6: end for

7: Rerank query q ∈ L based on the hybrid scores to generate L′;

8: return L′;
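The reranking steps of Algorithm 1, together with the standardization in Equations (11) and (12), can be sketched as follows. The sketch assumes that LeaScore and SimScore have already been computed for each candidate; the use of the population standard deviation, the handling of zero variance and the toy scores are assumptions for illustration.

```python
import statistics


def standardize(scores):
    """Z-score standardization as in Eqs. (11) and (12)."""
    mu = statistics.mean(scores)
    # ASSUMPTION: population standard deviation; the paper does not say which.
    sigma = statistics.pstdev(scores)
    if sigma == 0:
        # All scores are equal, so they carry no ranking information.
        return [0.0] * len(scores)
    return [(s - mu) / sigma for s in scores]


def hybrid_rerank(candidates, lea_scores, sim_scores, gamma=0.5):
    """Rerank candidates by HybScore = gamma * LeaScore + (1 - gamma) * SimScore."""
    z_lea = standardize(lea_scores)
    z_sim = standardize(sim_scores)
    hyb = [gamma * l + (1 - gamma) * s for l, s in zip(z_lea, z_sim)]
    order = sorted(range(len(candidates)), key=lambda k: -hyb[k])
    return [candidates[k] for k in order]


# Hypothetical candidates and scores.
candidates = ["nepal map", "nepal earthquake", "nepal news"]
lea = [3.0, 2.0, 1.0]      # scores from the learned ranking model
sim = [0.1, 0.9, 0.2]      # similarity to the session context
reranked = hybrid_rerank(candidates, lea, sim, gamma=0.5)
```

With γ = 0.5 the candidate that is most similar to the session context moves to the top, whereas γ = 1 reproduces the order of the learned model alone.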

4 Experimental setup

Below, §4.1 describes the datasets and lists some interesting observations; §4.2 provides details about our evaluation metric and baseline; we detail our settings and parameters in §4.3.

4.1 Dataset

We use a query log collected from a commercial search engine, sampled between March 1, 2006 and May 31, 2006. In total, it contains 16,946,938 queries submitted by 657,426 unique users. We partitioned the data by time into a training set covering 75% of the query log and a test set covering the remaining 25%. Traditional k-fold cross-validation is not applicable to such a streaming sequence because it would disorder the temporal data [27]. Moreover, we filtered out a large volume of navigational queries containing URL substrings (.com, .net, .org, http, .edu, www.) and removed queries starting with special characters such as &, $ and # from both partitions.

Additionally, only queries appearing in both partitions were kept in order to extract the popularity features. Table 1 details the statistics of the processed dataset. One interesting observation is that many users repeat queries, a phenomenon visible in both the training and the test partition.
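The preprocessing described above can be sketched as follows. The record layout (timestamp, user id, query), the order in which filtering and splitting are applied, and the choice of the cutoff as 75% of the covered time span are assumptions of this sketch; the paper only states that the split is made in terms of time period.

```python
from datetime import datetime

URL_MARKERS = (".com", ".net", ".org", "http", ".edu", "www.")
SPECIAL_PREFIXES = ("&", "$", "#")


def keep_query(query):
    """Drop navigational queries and queries starting with special characters."""
    if query.startswith(SPECIAL_PREFIXES):
        return False
    return not any(marker in query for marker in URL_MARKERS)


def temporal_split(records, train_fraction=0.75):
    """Split (timestamp, user_id, query) records by time, never at random."""
    kept = sorted((r for r in records if keep_query(r[2])), key=lambda r: r[0])
    if not kept:
        return [], []
    start, end = kept[0][0], kept[-1][0]
    # ASSUMPTION: the cutoff is placed at 75% of the covered time span.
    cutoff = start + (end - start) * train_fraction
    train = [r for r in kept if r[0] <= cutoff]
    test = [r for r in kept if r[0] > cutoff]
    # Keep only queries that appear in both partitions.
    shared = {r[2] for r in train} & {r[2] for r in test}
    return ([r for r in train if r[2] in shared],
            [r for r in test if r[2] in shared])


# Hypothetical records, not taken from the real log.
records = [
    (datetime(2006, 3, 1), "u1", "nepal"),
    (datetime(2006, 3, 15), "u2", "www.example.com"),
    (datetime(2006, 4, 1), "u1", "nepal map"),
    (datetime(2006, 5, 20), "u3", "nepal"),
    (datetime(2006, 5, 30), "u2", "weather"),
    (datetime(2006, 5, 31), "u2", "#hashtag"),
]
train, test = temporal_split(records)
```

Splitting on the timestamp preserves the temporal order of the queries, which is the reason the text gives for not using k-fold cross-validation.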

Table 1
Statistics of the processed dataset. Queries: Qs, Users: Us

Variable	Training	Testing
# Qs	6,904,655	3,609,617
# Unique Qs	456,010	456,010
# Unique Us	466,241	314,153
# Qs/Us	14.81	11.49

4.2 Baseline and metric

One baseline used for comparison with our method is the Most Popular Completion method proposed in 2011 [4], which sorts queries by their popularity in the whole log and is referred to as MPC in this paper. The second baseline is MPC-R [12], which ranks query candidates by their popularity in the most recent R days of the query log; following [12], we set R = 7 in our experiments.

To evaluate the quality of query rankings, we use Mean Reciprocal Rank (MRR) as a standard measure. Given a prefix p associated with a list of query candidates S(p) and the user’s finally submitted query q′, the Reciprocal Rank (RR) is computed as in Equation (13): $RR = \begin{cases} \frac{1}{\text{rank of } q' \text{ in } S(p)}, & \text{if } q' \in S(p) \\ 0, & \text{otherwise.} \end{cases}$ (13)

Then MRR is computed as the mean of RR for all prefixes.
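As a minimal sketch, RR and MRR as defined above can be computed as follows; the function names and the toy cases are hypothetical.

```python
def reciprocal_rank(candidates, submitted):
    """Eq. (13): 1 / rank of the submitted query, or 0 if it is not listed."""
    try:
        return 1.0 / (candidates.index(submitted) + 1)
    except ValueError:
        return 0.0


def mean_reciprocal_rank(cases):
    """Mean of RR over (candidate list, submitted query) pairs, one per prefix."""
    cases = list(cases)
    if not cases:
        return 0.0
    return sum(reciprocal_rank(c, s) for c, s in cases) / len(cases)


# Hypothetical cases: ranks 1 and 2, and one query missing from the list.
cases = [
    (["nepal", "nepal map"], "nepal"),
    (["nepal", "nepal map"], "nepal map"),
    (["nepal"], "netflix"),
]
mrr = mean_reciprocal_rank(cases)
```

The three cases contribute 1, 0.5 and 0, so the MRR of this toy example is 0.5.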

4.3 Settings

In our experiments, we are first given a list of the top N query completion candidates returned by MPC; we set N = 10, as is common in the literature [4, 11–13] and in many web search systems. In addition, cases where the finally submitted query is not included in the original list of N candidates returned by MPC are removed from the dataset, to guarantee that the ground truth is among the candidates.

We use the LambdaMART [24] learning algorithm for ranking query completion candidates in all experiments. For the observed query popularity, we set m ∈ {1, 2, 4, 7, 14, 28}, values that were used in [12] for the same task with good results. The same options are used for predicting future popularity. In total, we generate 15 popularity features for each query, i.e., 7 features from observations in the recent m day(s), 7 features from predictions based on the recent m day(s), and 1 feature from observations in the whole query log. We set the tradeoff γ = 0.5 in Equation (10) to balance the contributions of learning-based ranking and query similarity. In addition, we compare the models at prefix lengths ranging from 1 to 5 characters.

5 Results and discussions

In this section, we examine the effectiveness of our proposed model compared to the baselines. Additionally, we zoom in on feature importance to rank the features used in our model.

5.1 Query completion ranking performance

In this section, we compare the query rankings generated by our model against those of the baselines, i.e., MPC and MPC-R. Table 2 reports the performance of all three models in terms of MRR. Clearly, our LQR model beats the baseline MPC at all prefix lengths, with MRR improvements of around 8.10% and 5.31% at prefix lengths 1 and 5, respectively. Additionally, all MRR scores are larger than 0.5, suggesting that in most cases the finally submitted query is returned within the top two positions of the query list. One particularly interesting finding from Table 2 is that, as the prefix length increases, the MRR scores increase while the improvement margins of LQR over MPC decrease. This can be explained by the fact that a long prefix sharply narrows the space of query candidates, resulting in relatively high MRR scores and small MRR gaps.

Table 2
Performance in terms of MRR of our proposal LQR model and the baselines, tested at various prefix lengths (# p) ranging from 1 to 5 in character. The best performer in each row is boldfaced

# p	MPC	MPC-R	LQR
1	0.5472	0.5411	0.5915
2	0.5613	0.5618	0.6031
3	0.5794	0.5802	0.6189
4	0.5916	0.5933	0.6263
5	0.6137	0.6158	0.6462

MPC-R performs better than MPC at long prefixes, e.g., 4 or 5, but loses to MPC at short prefixes, e.g., 1. In contrast, our LQR model beats MPC-R at all prefix lengths. From the results in Table 2, we conclude that our LQR model benefits not only from the overall popularity of candidates but also from their recent popularity. In addition, the personalization mechanism in our model helps further boost the ranking performance.

5.2 Prefix-level bake-off

Next, we compare the performance of LQR and baselines at the prefix level in terms of reciprocal rank. We report the ratios of prefixes at various lengths for which LQR wins, ties and loses the comparisons against baselines in Table 3.
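A per-prefix comparison of this kind can be computed from two aligned lists of reciprocal ranks, as in the following minimal sketch. The function name `bakeoff`, the tolerance for ties and the toy values are assumptions for illustration.

```python
def bakeoff(rr_model, rr_baseline, tol=1e-12):
    """Percentages of prefixes for which the model loses, ties or wins."""
    if len(rr_model) != len(rr_baseline) or not rr_model:
        raise ValueError("need two non-empty lists of equal length")
    loses = ties = wins = 0
    for model_rr, base_rr in zip(rr_model, rr_baseline):
        if abs(model_rr - base_rr) <= tol:
            ties += 1
        elif model_rr > base_rr:
            wins += 1
        else:
            loses += 1
    n = len(rr_model)
    return {
        "loses": 100.0 * loses / n,
        "ties": 100.0 * ties / n,
        "wins": 100.0 * wins / n,
    }


# Hypothetical reciprocal ranks for four test prefixes.
rr_lqr = [1.0, 0.5, 1.0, 0.25]
rr_mpc = [1.0, 1.0, 0.5, 0.2]
result = bakeoff(rr_lqr, rr_mpc)
```

The three percentages always sum to 100, which is a useful sanity check when reading such a table.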

Table 3
Per-prefix bake-off in terms of reciprocal rank: LQR vs. baselines. The values are the ratios (%) of test prefixes at various lengths (# p) for which LQR loses against, ties with, or wins against the model listed in the column header

	MPC			MPC-R		
# p	Loses	Ties	Wins	Loses	Ties	Wins
1	5.38	51.47	43.15	5.10	48.64	46.26
2	7.04	52.83	40.13	7.78	51.69	40.53
3	8.14	54.02	37.84	8.82	53.73	37.45
4	9.85	53.81	36.34	10.11	53.98	35.91
5	9.91	53.78	36.31	10.25	54.61	35.14

As shown in Table 3, compared to MPC, LQR shows a majority of ties (more than 50%). These ties often occur when both LQR and MPC return the ground truth (the query finally submitted by the user) early in the query list, e.g., at position 1 or 2. This also helps explain why both models achieve the high MRR scores reported in Table 2. Another finding from Table 3 is that MPC wins against LQR more often as more keystrokes are typed, i.e., as the prefix becomes longer. This means that our model works better for short prefixes than for long ones.

In particular, LQR wins or ties against MPC in most cases at all prefix lengths; in other words, LQR outperforms MPC far more often than it loses. This suggests that the features developed for learning to rank queries in LQR do help improve performance, and that the search popularity in a recent period does affect the ranking of query candidates. To take a closer look at these features, we analyze their importance in our learning algorithm in Section 5.3.

Regarding the comparison between LQR and MPC-R, similar findings can be observed. At prefix length 1, MPC-R wins against LQR for only 5.10% of prefixes, less than the 5.38% for MPC. For longer prefixes, however, MPC-R wins against LQR slightly more often than MPC does, as indicated by the corresponding ratios. These results are consistent with the findings in Table 2, where MPC-R works better than MPC at longer prefixes, e.g., 4 or 5.

5.3 Feature importance analysis

Following previous work [28], we rank the features used in our learning approach by their relative importance according to a χ² test. Table 4 reports the ten most important features. We assign the most important feature a score of 1.0000 and express the importance of every other feature relative to it, so that all scores lie in (0, 1].

Table 4
Ten most important features used in our LQR model, as returned by the χ² test. In the subscript of a feature f in column 2, the character o or p denotes observed or predicted popularity, respectively; the number denotes the observation period in days or the preceding day used for prediction, and the character w denotes the whole query log

Rank	Feature	Relative importance
1	f_p+2	1.0000
2	f_p+4	0.8154
3	f_o+4	0.7043
4	f_o+2	0.6571
5	f_o+w	0.6048
6	f_p+1	0.5573
7	f_o+7	0.4667
8	f_p+7	0.4069
9	f_o+1	0.3646
10	f_o+14	0.3591

As shown in Table 4, the predicted search popularity based on the observation from 2 days before is the most important feature in our LQR model. Besides that, the observed search popularity in the recent 4 days and 2 days is also important, ranked at positions 3 and 4, respectively. Clearly, the search popularity in a recent period, whether observed or predicted, plays an important role in ranking query candidates. In contrast, the popularity in the recent 28 days is less important than the features in Table 4. However, the overall search popularity in the whole query log, i.e., feature f_o+w at rank 5 in Table 4, is relatively important. This is probably because the popularity of some queries is not time-sensitive, yet they are frequently issued by searchers.

Finally, we zoom in on the performance of our LQR model when it uses only the top ten features reported in Table 4. We denote this model as LQR_top and plot the comparison in Fig. 3. Clearly, both the LQR model and the LQR_top model beat the baseline MPC, as all MRR margins, i.e., the MRR differences LQR − MPC and LQR_top − MPC, are positive. In addition, the MRR gap between LQR and LQR_top is largest at prefix length 1 and shrinks as the prefix length increases. This can be explained by the fact that long prefixes make it difficult to distinguish the results of the two models, because both achieve good MRR scores. Moreover, Fig. 3 shows that LQR always outperforms LQR_top, indicating that our learning-based query ranking approach learns not only from the important features but also from the less important ones.

Fig.3

Overall performance of two models, i.e., LQR and LQR_top, tested on all prefixes at various lengths in terms of MRR margin, i.e., LQR-MPC and LQR_top-MPC.

5.4 Relative contribution analysis

To examine the relative contribution of the learning output and of personalization, we vary the value of γ in Equation (10) in the LQR and LQR_top models from 0.1 to 0.9 in steps of 0.1. We plot the results in Fig. 4.

Fig.4

Performance of LQR in terms of MRR at various γ.

As shown in Fig. 4, our LQR and LQR_top models produce better results with a relatively small γ < 0.5 than with a large γ > 0.5. This means that, in our task, the personalized information of search context similarity contributes more than the learning output. This is further confirmed by the finding that both models achieve their best performance at γ = 0.3. In addition, LQR is more sensitive to γ than LQR_top, as it fluctuates more visibly when γ varies.

6 Conclusion

Previous work on query ranking is mostly based on overall search popularity. However, the popularity of some queries is time-dependent, so this basic query ranking model cannot work well for them. To deal with this, we propose a learning-based query ranking approach that uses search popularity features based on recent observations and predictions. In addition, we personalize query completion by reranking the candidates according to their similarity to the preceding queries in the session. Our initial experimental results show that our LQR model outperforms the baselines. We also find that predicted popularity is important for ranking query candidates in a search system.

As to future work, we intend to take a closer look at the performance when more query candidates are initially returned by the popularity-based candidate ranking method: how much can we gain from good candidates that were previously ranked at lower positions and are later pushed up by our model? In addition, we would like to apply our model to artificial data to gain deeper insight into the formulated problem (e.g., user behavior). Moreover, we aim to transfer our approach to other datasets with long-term query logs. Finally, a further possible step is to extract the features in a parallelized way.

Acknowledgments

This work is supported by the National Advanced Research Project of China under No. 6141B08010101.

References

1. Jain and Merchant S.N., Wavelet-based multiresolution histogram for fast image retrieval, International Journal of Wavelets, Multiresolution and Information Processing 2(1) (2004), 59–73.

2. Chang C.-C., Chuang J.-C. and Hu Y.-S., Similar image retrieval based on wavelet transformation, International Journal of Wavelets, Multiresolution and Information Processing 2(2) (2004), 111–120.

3. Mitra, Shokouhi, Radlinski and Hofmann, On user interactions with query auto-completion, in Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014, pp. 1055–1058.

4. Bar-Yossef and Kraus, Context-sensitive query autocompletion, in Proceedings of the 20th International World Wide Web Conference, 2011, pp. 107–116.

5. Shokouhi and Radinsky, Time-sensitive query autocompletion, in Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2012, pp. 601–610.

6. Strizhevskaya, Baytin, Galinskaya and Serdyukov, Actualization of query suggestions using query logs, in Proceedings of the 21st International World Wide Web Conference, 2012, pp. 611–612.

7. Cai and de Rijke, A survey of query auto completion in information retrieval, Foundations and Trends in Information Retrieval 10(4) (2016), 273–363.

8. Cai, Liang and de Rijke, Prefix-adaptive and time-sensitive personalized query auto completion, IEEE Transactions on Knowledge and Data Engineering 28(9) (2016), 2452–2466.

9. Cai, Reinanda and de Rijke, Diversifying query auto-completion, ACM Transactions on Information Systems 34(4) (2016), Article 25.

10. Cai and de Rijke, Learning from homologous queries and semantically related terms for query auto completion, Information Processing and Management 52(4) (2016), 628–643.

11. Shokouhi, Learning to personalize query autocompletion, in Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2013, pp. 103–112.

12. Whiting and Jose J.M., Recent and robust query autocompletion, in Proceedings of the 23rd International World Wide Web Conference, 2014, pp. 971–982.

13. Golbandi N.G., Katzir L.K., Koren Y.K. and Lempel R.L., Expediting search trend detection via prediction of query counts, in Proceedings of the 6th ACM International Conference on Web Search and Data Mining, 2013, pp. 295–304.

14. Miyanishi and Sakai, Time-aware structured query suggestion, in Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2013, pp. 809–812.

15. Shokouhi, Detecting seasonal queries by time-series analysis, in Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2011, pp. 1171–1172.

16. Santos R.L.T., Macdonald and Ounis, Learning to rank query suggestions for adhoc and diversity search, Information Retrieval 16 (2013), 429–451.

17. Liao, Jiang, Chen, Pei, Cao and Li, Mining concept sequences from large-scale search logs for context-aware query suggestion, ACM Transactions on Intelligent Systems and Technology 3(1) (2011), Article 17.

18. Yera and Martinez, Fuzzy tools in recommender systems: A survey, International Journal of Computational Intelligence Systems 10(1) (2017), 776–803.

19. Cai, Wang and de Rijke, Behavior-based personalization in web search, Journal of the Association for Information Science and Technology 68(4) (2017), 855–868.

20. Cao, Jiang, Pei, He, Liao, Chen and Li, Context-aware query suggestion by mining click-through and session data, in Proceedings of the 14th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2008, pp. 875–883.

21. Jiang J.-Y., Ke Y.-Y., Chien P.-Y. and Cheng P.-J., Learning user reformulation behavior for query auto-completion, in Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014, pp. 445–454.

22. Höffgen K.-U. and Simon H.U., Robust trainability of single neurons, in Proceedings of the Fifth Annual Workshop on Computational Learning Theory, 1992, pp. 428–439.

23. Joachims, Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms. Norwell, MA, USA: Kluwer Academic Publishers, 2002.

24. Burges C.J., Svore K.M., Bennett P.N., Pastusiak and Wu, Learning to rank using an ensemble of lambda-gradient models, Journal of Machine Learning Research 14 (2011), 25–35.

25. Rahurkar M.A. and Cucerzan, Using the current browsing context to improve search relevance, in Proceedings of the 17th ACM Conference on Information and Knowledge Management, 2008, pp. 1493–1494.

26. Mikolov, Sutskever, Chen, Corrado G.S. and Dean, Distributed representations of words and phrases and their compositionality, in Advances in Neural Information Processing Systems 26, 2013, pp. 3111–3119.

27. Gama, Žliobaitė, Bifet, Pechenizkiy and Bouchachia, A survey on concept drift adaptation, ACM Computing Surveys 46(4) (2014), 44:1–44:37.

28. […], Li, Chen and Zhou, Local energy-based framework for feature ranking, Journal of Intelligent and Fuzzy Systems 28(4) (2015), 1565–1575.

	MPC			MPC-R
# p	Loses	Ties	Wins	Loses	Ties	Wins
1	5.38	251.47	343.15	5.10	248.64	346.26
2	7.04	252.83	340.13	7.78	251.69	340.53
3	8.14	254.02	337.84	8.82	253.73	337.45
4	9.85	253.81	336.34	10.11	253.98	335.91
5	9.91	253.78	336.31	10.25	254.61	335.14

Table 1
Statistics of the processed dataset. Queries: Qs, Users: Us

Variable	Training	Testing
# Qs	6,904,655	3,609,617
# Unique Qs	456,010	456,010
# Unique Us	466,241	314,153
# Qs/Us	14.81	11.49