Dynamic user modeling for expert recommendation in community question answering

Abstract

Community Question Answering (CQA) websites has become an important channel for people to acquire knowledge. In CQA, one key issue is to recommend users with high expertise and willingness to answer the given questions, i.e., expert recommendation. However, a lot of existing methods consider the expert recommendation problem in a static context, ignoring that the real-world CQA websites are dynamic, with users’ interest and expertise changing over time. Although some methods that utilize time information have been proposed, their performance improvement can be limited due to fact that they fail they fail to consider the dynamic change of both user interests and expertise. To solve these problems, we propose a deep learning based framework for expert recommendation to exploit user interest and expertise in a dynamic environment. For user interest, we leverage Long Short-Term Memory (LSTM) to model user’s short-term interest so as to capture the dynamic change of users’ interests. For user expertise, we design user expertise network, which leverages feedback on users’ historical behavior to estimate their expertise on new question. We propose two methods in user expertise network according to whether the dynamic property of expertise is considered. The experimental results on a large-scale dataset from a real-world CQA site demonstrate the superior performance of our method.

Keywords

Expert recommendation user modeling neural network community question answering

1 Introduction

In the past years, many Community Question Answering (CQA) [20] websites, such as Stack Exchange 1 and Zhihu 2 , have gained popularity among users. On CQA websites, users can raise their questions, and answer the questions posted by others. However, a CQA website may have tens of thousands of questions every day, which are difficult to be solved in a short time. Under this situation, expert recommendation [22], which aims to provide an answerer who are most likely to post satisfying answers for the questions, has become a promising technique to tackle the challenge.

Existing research methods on expert recommendation include feature based methods [13, 18] and matrix factorization(MF) based methods [7, 12]. The former methods focus on estimating user expertise level relying on crafted features. The latter methods rely on feedback information(e.g., votes in Stack Exchange). Recently, some related works [6 , 24] proposed to use network embedding to solve the expert recommendation and question retrieval problem due to its capability to handle sparsity data. However, the vast majority of these methods consider the expert recommendation problem in a static context. In particular, the expert recommendation is based on a snapshot of users’ previously asked or answered questions, ignoring the temporal information. However, the context of expert recommendation is dynamic due to the following two reasons. First, similar to general recommendation systems, users’ interests change over time. A user who answered a question related a certain topic may no longer to answer the questions in this topic anymore. Moreover, a lot of previous works [1] have proved that when buying goods or reading news, users’ next behavior is closely related to his recent behavior instead of remote historical behavior. This phenomenon also exists in CQA. For example, if a user just answered a question about English grammar, there’s a good chance that he will answer the questions in this field again. Second, users’ expertise may not be consistent as well, as users’ skills may improve or decline over time. Especially in some question fields where the updating of knowledge is very quickly, such as programming and artificial intelligence, this phenomenon is more obvious. The knowledge level of experts should be continuously evolving, otherwise they may lose their expertise. Thus, it is necessary to consider the dynamic change in CQA to to provide precise recommendation. There are some previous works attempting to make the expert recommendation methods adaptive in a dynamic context. Some of them used a decaying factor to suppress questions answered in remote history and focused on users’ recent behavior. Some methods designed time related features and fed them into classifier. However, these methods are relatively simple and most of them only consider one aspect of dynamic interest and dynamic expertise.

In order to overcome the above limitations, we propose Dynamic User Modeling method (DUM) for expert recommendation. Similar to previous works [15, 24], we cast the expert recommendation problem as a pair-wise ranking problem and we aim to learn accurate user embedding to make recommendation. But different from those methods which learn one embedding for each user that preserve both interest information and expertise information, we propose to learn user interest embedding and user expertise embedding separately. Because when assuming that the environment is static, the users’ interest is consistent with users’ expertise as users are interested in a question domain if they are specialized in it. But in the dynamic reality, a user may not answer the question because his current interest is not in this question domain even if he once has high expertise on it. Under the circumstances, user interest and user expertise are inconsistent, making it hard to capture both of them by one embedding.

Specifically, for user interest, we model users’ recent answering behavior by LSTM to capture users’ dynamic short-term interest and incorporate it with long-term interest to get users’ interest representation. For user expertise, we leverage the designed user expertise network, which exploits users’ historical answering behavior and feedback information to capture user expertise on different question domains. It has two network structures. The first one is treating historical answers disorderly, which applies to the scene with little expertise change. The second one is using LSTM to model the historical answering sequence so that the dynamic change of users’ expertise can be captured. We use the similar ranking strategy with the work [15] in which answerers who get more votes should get higher score. As illustrated in the previous works [19, 22], the experts should have both high interest and high expertise. Thus, the score predicted by our model is a combination of interest score and expertise score.

The main contributions of this paper are as follows:

We propose a novel framework DUM to predict the best answerers for new questions in CQA sites. Based on our knowledge, our approach is the first work that consider the dynamic change of both user interest and user expertise in expert recommendation problem.

We design a user expertise network, which can estimate users’ expertise on new questions based on their historical answer performance. By appling attention-based LSTM in the user expertise network, the dynamic change of user expertise can be captured precisely.

We demonstrate the effectiveness of the proposed framework on the real-world Stack Exchange datasets.

2 Related work

Expert recommendation, also known as expert finding [3 , 26], aims at recommending the best answerer to a new question. The majority of previous works fall into feature based methods, matrix factorization based methods and network embedding based methods. According to whether temporal information is used, these methods can be divided into static methods and dynamic methods. The former uses a snapshot of users’ previously asked or answered questions and they consider that user interest and expertise are static. Pal et al. [18] use three classes of features to train a binary classifier, including question features, user features and user feedback on answers. However, heavily relying on well-designed feature and feature selection limits its robustness. Cho et al. [7] combine generative probabilistic model and matrix factorization to solve the problem. Huang et al. [10] utilize a tree to guide the decomposition of 4th rank tensor data consisting of questions, topics, voting and expertise information. However, they model question content by simple language model or topic model, which are unable to capture the complex semantics of question content. Fu et al. [8] propose a recurrent memory reasoning network to explore the implicit relevance between a requester’s question and a candidate expert’s historical records. It performs much better than the above methods because it applies deep understanding and reasoning of semantic information. Recently, network embedding received a lot of attention in expert recommendation. Since it can capture the implicit association between users and questions by analyzing the network structure. Zhao et al. [25] adopt a random-walk based learning method with recurrent neural networks to learn ranking metric embedding from the CQA network. Li et al. [15] treat question raisers and answerers as different entities to make the recommendation more personalized. They learn representations of entities by metapath based random walk and compute ranking scores for answerers by a designed CNN scoring function. Although these methods obtain good performance, they ignores the time information, which is the focus of this paper.

There are a few methods belonging to the dynamic methods. They tend to consider different dynamic aspect separately, such as user availability, user interest, and user expertise. Currently, user availability is the most commonly considered dynamic aspect for the expert recommendation problem. Several studies used temporal features to estimate the availability of users for a given day or for a specific time of the day. For example, Sung et al. [21] use all replies of users to train a sigmoid function and Chang et al. [4] build binary classifiers using all answers of users within a fixed time frame. For dynamic user interest and expertise, a straightforward solution is to use a decaying factor to suppress questions answered in remote history and focus on users’ recent interest. Yeniterzi et al. [23] incorporate temporal information to model dynamic user expertise and apply two models, namely exponential and hyperbolic discounting models, to discount the effect of older records in calculating z-scores. This method is still rather straightforward being equivalent to using a decaying factor. Neshati et al. [17] consider four feature groups: topic similarity, emerging topics, user behavior and topic transition features. They found that all of these feature groups can be beneficial to predict the probability of becoming an expert in the dynamic environment. However, most existing dynamic methods rely on feature engineering [22], which is time-consuming and problem specific [5]. Thus, we design the neural network to exploit the dynamicity in an automatic way in this paper.

3 Problem statement

Let U ={ u₁, u₂, . . . , u_l } denote a set of users and Q ={ (q₁, t_{q
₁}) , (q₂, t_{q
₂}) , . . . , (q_c, t_{q
_c}) } denote a set of questions, where (q₁, t_{q
₁}) means the question q₁ is raised at time t_{q
₁}. A user’s answering history is represented by an ordered list: B (u) = ${(q_{1}^{u}, t_{1}^{u}, f_{1}^{u}), (q_{2}^{u}, t_{2}^{u}, f_{2}^{u}), . . ., (q_{| B (u) |}^{u}, t_{| B (u) |}^{u}, f_{| B (u) |}^{u})}$ , where $(q_{1}^{u}, t_{1}^{u}, f_{1}^{u})$ means the user u answered the question q₁ at time t₁, t_i < t_j for i < j, and received feedback $f_{1}^{u}$ . |B (u) | denotes the number of questions user u have answered.

Using above notations, we define the expert recommendation problem as follows: Given a set of questions Q_<T raised before time T, accompanying with answerers and feedback information (e.g. votes), compute the ranking scores of all potential users u ∈ U for each new raised question q ∈ Q_≥T so that the user with the highest score is the expert.

4 Framework

The architecture of the proposed framework is shown in Fig. 1. The first component of the framework is user interest modeling, shown in Fig. 1 (b). Considering that user’s interest is changing over time and user’s recent behavior can better reflect their current interest, we propose to learn the short-term interest representations from their recent behavior by LSTM. Then we combine it with the long-term interest, which is the average of users’ all history answered question embedding. The second component is user expertise modeling, shown in Fig. 1 (c). Feedback information from answers reflects users’ expertise. By using these feedback information, we design user expertise network to capture users’ expertise in specific question domain. In the ranking component shown in Fig. 1 (d), user interest embedding and user expertise embedding are fed into score function to compute the score of contributing the “accepted answers” for each user. The flow chart of the proposed framework is shown in Fig. 2.

Fig.1

Overview of the proposed framework DUM. (a) An example of CQA network. The highlighted part in bold means the question q₂ is answered by u₂, u₃, u₄. (b) Using LSTM to model users’ short-term interest and combine long-term interest to get user interest embedding. (c) Designing user expertise network to generate user expertise embedding. The detailed network structure is shown in Fig. 3. (d) Rank the user given user interest embedding, user expertise embedding and target question embedding. The user with the highest score is the expert.

Fig.2

Flowchart of the framework.

4.1 Question representation

In CQA websites, the question is described by the question title, content and tags. Generally, the question title and tags can clearly express the meaning of the question so we use them to represent questions. For question title, there are multiple ways to get its representation, such as TF-IDF and Doc2vec [14]. We choose to pretrain a LSTM [27] and extract the last hidden state as the title representation, denoted as $q^{ti} \in ℝ^{d_{q_{1}}}$ . For question tags, we randomly initialize their embeddings, which are trainable parameters, denoted as $q^{ta} \in ℝ^{d_{q_{2}}}$ . Then we concatenate them and apply a linear transformation to get the question embedding q: $q = W_{q} \cdot [q^{ti} \oplus q^{ta}]$ (1) where $W_{q} \in ℝ^{(d_{q_{1}} + d_{q_{2}}) \times d}$ is transformation matrix.

4.2 User interest modeling

User interest modeling is to learn embedding of user interest, denoted as $p^{u} \in ℝ^{d}$ for answerer u. It contains long-term interest and short-term interest, denoted as $p_{long}^{u}$ and $p_{short}^{u}$ . Users’ long-term interest are inherent and supposed to be static or changed slowly and it can be extracted from users’ long-term behavior. Thus, we use the average of users’ answering questions as long-term interest: $p_{long}^{u} = \frac{1}{| B (u) |} \sum_{i = 1}^{| B (u) |} q_{i}$ (2)

Users may have dynamic short-term interests, which to a great extent determines whether they are willing to answer the given questions. Consequently, we use LSTM to capture the sequential patterns due to its remarkable ability in processing sequential data. Specificly, we define the questions answered in the last n days as users’ recent behavior, which can be denoted as $B (u)_{\geq t_{c} - n}^{< t_{c}} = {q_{1}^{rec}, q_{2}^{rec}, . . ., q_{b}^{rec}}$ , where t_c is the current time and b is the number of recent behavior. The embedding of these questions, denoted as ${q_{1}^{rec}, q_{2}^{rec}, . . ., q_{b}^{rec}}$ , are the input of LSTM. The short-term user interest embedding is computed as follows: $\begin{matrix} f_{t} = σ (q_{t}^{rec} W_{f} + h_{t - 1} U_{f} + b_{f}), \\ i_{t} = σ (q_{t}^{rec} W_{i} + h_{t - 1} U_{i} + b_{i}), \\ c_{t} = f_{t} ⊙ c_{t - 1} + i_{t} ⊙ \tanh (q_{t}^{rec} W_{c} + h_{t - 1} U_{c} + b_{c}), \\ o_{t} = σ (q_{t}^{rec} W_{o} + h_{t - 1} U_{o} + b_{o}), \\ h_{t} = o_{t} ⊙ \tanh (c_{t}) \end{matrix}$ (3) where f_t, i_t and o_t represent the forget, input and output gate respectively, and c_t, h_t denote the cell state and hidden output at time t. The short-term interest is captured by the last hidden state h_b. In order to combine the long-term interest and short-term interest more naturally, we use the $p_{long}^{u}$ to initialize the first hidden state, that is, $h_{0} = p_{long}^{u}$ . So that the last hidden state h_b is a combination of short-term and long-term interest and we use it as user interest embedding, i.e., p^u = h_b.

Fig.‘3

The architecture of the user expertise network.

4.3 User expertise modeling

User expertise modeling is to learn embedding of user expertise, denoted as e^u $\in ℝ^{d}$ for question user u. In CQA website, the feedback information, including votes and accepted flag are the key to measure users’ expertise. Voting up an answer signals to the rest of the community that the post is well researched and useful. The more that people vote on an answer, the more certain future visitors can be confident of the quality of information contained within the answer. Hence voting indicates a CQA community’s long-term review for a given user’s expertise level under a specific topic. Users with high expertise tend to receive high votes. In addition, askers can select an answer as the accepted answer for their asked questions. The user who provided the accepted answer is represented as the best answerer. Considering that the absolute votes of answers are dependent on the question popularity in CQA sites, it is biased to use them to estimate users’ expertise directly. In order to eliminate this bias, we construct several statistics, such as the ratio and the ranking of the answer, which are listed in Table 1. These statistics form a dense vector, denoted as f_i for question q_i. We feed it into an activation network to calculate the expertise weight, which characterize the users’ expertise level of the question. For each question in B (u), we use expertise weight to activate it: $x_{i} = act (f_{i}) \cdot q_{i}$ (4) where act (·) is activation network, a feed-forward network with output as the expertise weight, as shown in Fig. 3. Questions that users perform well will be activated, which captures the domain-specific expertise.

Considering that different users answer different numbers of questions, We introduce two methods to transform the list of embedding vectors to a fixed-length embedding. The first method is pooling, as shown in Fig. 3 (a). We denote this method as DUM-pool. A straightforward way is to use mean pooling to maintain the most information. Nevertheless, it is biased to treat all historical answers equally, because for different target question, the contribution of each historical answers is significant different. For example, when the target question is about English Grammar, we should pay more attention to historical answers on this topic. Thus, we use attention mechanisms [2] to compute the weight. Note that for target question q_t raised at t_{q
_t}, we use B (u) _{<t_{q
_t}}to generate expertise embedding. For each question in B (u) _{<t_{q
_t}}, its attention weight is computed as follows:

Table 1

Feedback Statistics

Statistics	Description
Votes	The votes of the answer after min-max normalization
Ratio	Votes of answers divide votes of all answers of the question
Ranking	The ranking of the answer in all answers of the question
Accepted	If the answer is the accepted answer

$β_{i} = \frac{\exp (x_{i} W_{e} q_{t})}{\sum_{j = 1}^{| B (u)_{< t_{q_{t}}} |} \exp (x_{j} W_{e} q_{t})}$ (5)

where q_t is the target question embedding and W_e is a trainable parameter. The expertise embedding is the weighted sum of x_i: $e^{u} = \sum_{i = 1}^{| B (u)_{< t_{q_{t}}} |} β_{i} x_{i}$ (6)

The pooling method ignore the temporal information, treating the answering behavior in a disordered way. Typically, the user’s performance sequence can reflect his expertise’s evolution. For example, If a user’s answering performance is constantly improving, the more likely he is to become an expert, and vice versa. Thus, we propose to use LSTM to capture the the trend of the user’s expertise level changing with time. We denote this method as DUM-lstm, as shown in Fig. 3 (b). Specificly, the activated answering sequence, which contains the expertise information of the question, serve as the input of the LSTM. Similar with the DUM-pool, we also use attention mechanisms to focus on the questions which are similar to the target question and reduce the noise brought by unrelated questions. We denote it as A-LSTM. The attention weight is calculated in the same way with Equation 5. Then we utilize these scores to adjust the cell and hidden states of A-LSTM: $\begin{matrix} \tilde{c_{t}} = β_{t} * c_{t} + (1 - β_{t}) * c_{t - 1} \\ \tilde{h_{t}} = β_{t} * h_{t} + (1 - β_{t}) * h_{t - 1} \end{matrix}$ (7) In this way, the question related to the target question will send more message to the next time step. Finally, we use the last hidden state as users’ expertise embedding e^u.

4.4 Ranking

Obtaining the interest embedding p^u, expertise embedding e^u and target question embedding q_t, we use the inner product to compute the score because they’re in the same embedding space: $S = q_{t} \cdot (p^{u} + e^{u})$ (8)

Following previous works [15, 25], we introduce the relative rank to model the score. Given the votes of the answers, we design the following rules: (1) Users with higher vote score higher than those with lower vote. (2) Users who answered q_t score higher than those who did not. Let the tuple (i, j, q_t) mean that “user i’s answer obtains more votes than user j’s answer on the question q_t", (i, j, q_t) ∈ D₁. Let the tuple (y, z, q_t) mean that “user y answered the question q_t while user z didn’t”, (y, z, q_t) ∈ D₂. The inequality holds: $\begin{matrix} S (p_{i}^{u}, e_{i}^{u}, q_{t}) \geq S (p_{j}^{u}, e_{j}^{u}, q_{t}), i, j \in A_{q_{t}} and v_{i} \geq v_{j} \\ S (p_{y}^{u}, e_{y}^{u}, q_{t}) \geq S (p_{z}^{u}, e_{z}^{u}, q_{t}), y \in A_{q_{t}} and z \notin A_{q_{t}} \end{matrix}$ (9) where U_{q
_t} denotes the set of users who answered q_t, v denotes the vote. Based on the rules, for each question, we design its loss function as follows: $\begin{matrix} l (q_{t}) = \sum_{(i, j, q_{t}) \in D_{1}} S (p_{j}^{u}, e_{j}^{u}, q_{t}) - S (p_{i}^{u}, e_{i}^{u}, q_{t}) + \\ \sum_{(y, z, q_{t}) \in D_{2}} S (p_{z}^{u}, e_{z}^{u}, q_{t}) - S (p_{y}^{u}, e_{y}^{u}, q_{t}) \end{matrix}$ (10)

4.5 Model training

The parameters in our model include question tag embedding q^ta, transformation matrix W_q, W_e, parameters of LSTM and activation network. We denote all the parameters of the model by Θ. Therefore, the objective function in our learning process is given by: $min_{Θ} L (Θ) = \sum_{q_{t} \in Q} l (q_{t}) + λ {∥ Θ ∥}_{2}$ (11) where λ is the trade-off factor between the training loss and regularization. The parameters are updated by gradient-based algorithms and we adopt the Adam as the optimizer in our implementation. The main training process is summarized in Algorithm 1.

Algorithm 1 The proposed DUM model
Input: Questions in Q_<T, question title embedding q^ti
Output: Question embedding q, the user’s interest embedding
p^u and expertise embedding e^u
1: while not converged and not reaching max iterations do
2: for n ← 1 to N do
3: Sample a batch of q in Q_<T and generate D₁, D₂
4: Compute L (Θ) according to Equation (11)
5: Take a gradient step for L (Θ);
6: end while

5 Experiments

5.1 Experimental settings

5.1.1 Datasets

We experimented our method based on Stack Exchange Dataset, which contains many sub-dataset with specific topics. Two of them are used to evaluate our method: English and StackOverflow (SO) 3 . Each dataset contains questions posted over eight years (from July 2011 to March 2019) and covers a lot of tags of questions. The datasets differ in sizes and they have different propertiet so that DUM can be comprehensively tested. Considering that the StackOverflow dataset is extremely large, we random sample a set of questions of several important tags (e.g. Java, Python, C#, PHP). The statistics of the dataset can be seen in Table 2, where u, q, t, u/q, v/a represent users, questions, tags, average number of users per question and average number of votes per answer respectively.

Table 2
Statistics of the datasets

Dataset # of u # of q # of t # of u/q of v/a

English 23123 42972 944 7.6 5.6

SO 86714 239095 296 6.1 4.3

Dataset	# of u	# of q	# of t	# of u/q	of v/a
English	23123	42972	944	7.6	5.6
SO	86714	239095	296	6.1	4.3

5.1.2 Evaluation metrics

Since the expert recommendation problem is regarded as a ranking problem and usually only top returned results are of interest, we adopt three commonly used ranking metrics: Mean Reciprocal Rank (MRR), Hit at 5 (Hit@5), and Precision at 1 (Prec@1).

MRR takes into account of the rank of the correct answerer among recommended answerers for query question q: $MRR = \frac{1}{| Q |} \sum_{i = 1}^{| Q |} \frac{1}{{rank}_{i}}$ (12) where rank_i is the rank of correct answerer among recommended answerers for query i.

Hit@k considers if the correct answerer appears in the top-k, defined as: $Hit @ k = \frac{1}{| Q |} \sum_{i = 1}^{| Q |} I ({rank}_{i} \leq k)$ (13) For all evaluations, we set k = 5.

Prec@1 considers if the correct answerer appears in the first place, defined as: $Prec @ 1 = \frac{1}{| Q |} \sum_{i = 1}^{| Q |} I ({rank}_{i} = = 1)$ (14)

Table 3

The comparisons of MRR, Hit@5, and Prec@1 between DUM-pool, DUM-lstm and baseline models

Method	T=2019-01-01						T=2018-10-01
	English			StackOverflow			English			StackOverflow
	MRR	Hit@5	Prec@1	MRR	Hit@5	Prec@1	MRR	Hit@5	Prec@1	MRR	Hit@5	Prec@1
Similarity	0.225	0.406	0.097	0.231	0.417	0.122	0.218	0.398	0.099	0.223	0.404	0.110
T-ZScore	0.237	0.399	0.117	0.246	0.402	0.138	0.224	0.407	0.101	0.236	0.396	0.115
GBDT	0.372	0.602	0.188	0.381	0.605	0.203	0.362	0.600	0.192	0.369	0.588	0.197
ConvNCF	0.417	0.668	0.245	0.430	0.671	0.264	0.422	0.669	0.246	0.424	0.661	0.257
RMNL	0.491	0.717	0.336	0.503	0.732	0.340	0.492	0.708	0.331	0.488	0.709	0.335
LSTUR	0.514	0.730	0.333	0.527	0.744	0.339	0.502	0.711	0.327	0.511	0.728	0.328
NeRank	0.528	0.775	0.343	0.548	0.782	0.356	0.515	0.766	0.341	0.531	0.764	0.339
DUM-pool	0.575	0.803	0.398	0.591	0.811	0.397	0.565	0.790	0.393	0.568	0.801	0.390
DUM-lstm	0.578	0.808	0.394	0.598	0.819	0.410	0.564	0.793	0.397	0.580	0.805	0.404

5.1.3 Baselines

We evaluate the performance of our approach by comparing it with several baseline methods, In these methods, Temporal-ZScore, GBDT and LSTUR are dynamic methods because they take time into consideration and others are static methods.

Similarity. Find the closest question to the target question based on TF-IDF and recommend the answerer who has the highest votes.

Temporal-ZScore [23]. ZScore is calculated for each time interval and then it is exponentially discounted with respect to its temporal distance from question’s interval. We refer it as T-ZScore in the following.

Gradient boosting decision tree(GBDT). It’s a feature based method and we use LightGBM [11] as the implementation for its fast speed. Extracted features are reported in Table 4.

ConvNCF [9]. It’s a multilayer neural network architecture based matrix factorization method. It uses an outer product to find out the pairwise correlations between the user and item embedding.

RMNL [25]. A network embedding based methods which exploits both users’ relative quality rank to given questions and their social relations. Note that the Stack Exchange Dataset doesn’t contain social relations, so we replace it by user badges relation [16], which is defined as there is an edge between two users if there are more than 5 common badges between them.

NeRank [15]. This model is the state-of-the-art method based on network embedding. It learns representations of entities by the HIN embedding method and compute ranking scores by an explicitly defined CNN scoring function.

LSTUR [1]. This is a neural news recommendation approach which considers both long-term and short-term user representations. It uses LSTM to capture user’s short-term interest and it proposes two methods to combine long and short-term interest.

Table 4
Selected features of GBDT

Class Feature description

1 user id and question category id

2 16-dim tf - idf embedding of question

3 the reputation of users

4 number of answers provided by users

5 number of best answers provided by users

6 ratio of users providing the best answer

7 statistics of users’ historical upvotes/downvotes

8 time interval between the answer and the question

Class	Feature description
1	user id and question category id
2	16-dim tf - idf embedding of question
3	the reputation of users
4	number of answers provided by users
5	number of best answers provided by users
6	ratio of users providing the best answer
7	statistics of users’ historical upvotes/downvotes
8	time interval between the answer and the question

5.1.4 Parameter settings

As illustrated in Section 3, questions raised before the timestamp T are treated as training data and questions raised in or after T are left for evaluation. We design two different train/test splits by setting T = “2019-01-01” and “2018-10-01”. To avoid cold start, we remove those users whose historical records are less than 5 in test set. In addition, we set n = 30 when capturing users’ short-term interest. We set d_{q
₁} = 256 and d_{q
₂} = 64 and the size of user embedding d is set as 256.

5.2 Performance comparison

The performances of all methods are reported in Table 3, where the best results are highlighted in bold. We have the following main findings:

First, dynamic methods (e.g. T-ZScore, GBDT and LSTUR) perform quite different. T-ZScore makes little improvement compared to Similarity because it is rule-based, which is too simple. GBDT performs better for using a lot of handcrafted features. But it falls behind other deep-learning based methods probably because features are usually not optimal. What’s more, limited by simple language model, it fails to exploit the semantic information of question. LSTUR performs well thanks to the powerful ability of LSTM to process time series, which helps to capture users’ short-term interest. In addition, network embedding based methods are superior to other baseline methods because they captures not only the interactions but also the questions’ content information. NeRank outperforms RMNL and LSTUR because it considers the question raiser’s information and the CNN score funtion captures more complex relationships compared with inner product.

Second, our model significantly outperforms all the baseline methods, including the network embedding based methods, such as RMNL and NeRank on both datasets and train/test settings. The reason is that compared with the baseline methods which only learn one single embedding for each user, our model considers the inconsistency between user interest and user expertise and learn the embedding for each of them. The result turns out that it can enhance the expression ability of the model. By considering both users’ long-term and short-term interest, users’ dynamic behaviors can be predicted more precisely. What’s more, compared with LSTUR which only captures users’ interest, the designed user expertise network in our model explicitly exploit the feedback information to capture user expertise on the specific question domain. In addition, attention mechanism helps learn more critical information, boosting the performance.

Third, our proposed two methods to learn user expertise, i.e., DUM-pool and DUM-lstm, can achieve comparable performance and both outperform baseline methods, which validate the effectiveness of these methods. By comparison, we can find that DUM-lstm performs much better in the StackOverflow dataset than in the English dataset. This is because the properties of the datasets are different. In English, the knowledge is relatively fixed and the update is slow. However, in StackOverflow, the programming languages and technologies continue to develop so that the experts have to keep improve their knowledge level over time. The time information plays a more important role in StackOverflow when estimating users’ expertise.

Fig.4

The effectiveness of capturing both long-term and short-term user interest.

Fig.5

The effectiveness of the activation network.

Fig.6

The effectiveness of the attention mechanism.

Fig.7

Effect of recent behavior n and the dimension of user embedding d on the performance of model.

5.3 Ablation study

The DUM provides model components to (1) capture users’ short-term interest by LSTM and combine it with long-term interest and (2) capture user expertise by user expertise network. In order to validate the effectiveness of each component, we will provide further investigations to better understand the contributions of model components to the proposed framework. The analytical experiments are based on T = “2019-01-01” due to page limit.

First, we conduct several experiments to explore the effectiveness of capturing both long-term and short-term user interest. The results are summarized in Fig. 4. We compare the performance of our method DUM-pool with the model variants DUM-long and DUM-short. The former removes the users’ long-term interest $p_{long}^{u}$ and the latter removes the users’ short-term interest $p_{short}^{u}$ . We can find that both long-term and short-term interest are useful for expert recommendation. In comparison, the short-term interest benefit more to the model. This is probably because the short-term interest is closely related to users’ next behavior, which apply to that the dataset is divided by time. In addition, we replace the LSTM of DUM-pool with GRU, denoted as DUM-gru. The performance in StackOverflow dataset drops a lot. This may be because LSTM contains more parameters so it performs better in larger datasets.

Second, we design several experiments to validate the effectiveness of the user expertise network. The activation network and the attention mechanism are the essential elements of the user expertise network. So we will explore their effectiveness separately.

In our approach, the activation network takes the statistics of the feedback as input and output the expertise weight. We denote it as Statistics. We compare it with the designed variants, denoted as None and Vote. The former removes the activation network and the latter takes the vote as the expertise weight directly. The results on two datasets are similar so we just show the result on English dataset in Fig. 5. Both the performance of DUM-pool and DUM-lstm are severely decreased. Although Vote contains the vote information, it dose not make much improvement compared with None. The reason is that the absolute votes of answers are dependent on the question popularity, so using it directly will bring noise to the model. The result demonstrates that the vote information is critical for capturing users’ expertise. Moreover, the activation network can eliminate the bias, estimating the expertise precisely.

In order to evaluate the effectiveness of attention mechanism, we compare our model with several variants. In DUM-pool, we replace the attention mechanism with the average of all the question embedding in recent behavior, denoted as DUM-patt. In DUM-lstm, we remove the attention mechanism in LSTM, denoted as DUM-latt. The result is shown in Fig. 6. It can be seen that both variants have worse performance. It demonstrates that by concentrating on questions which are more related to the target question, the user expertise is captured more accurately.

To summarize, the user expertise network can precisely capture user expertise on specific question domain with the help of attention network, and modeling users’ short-term interest is useful to predicte users’ answering willingness.

5.4 Parameters sensitivity

We conduct experiments to analyze the impacts of two key parameters, i.e., the parameter n which defines users’ recent behavior and the size of user embedding d. The performances (in terms of MRR and Prec@1) on various settings of n and d are shown in Fig. 7.

It is observed that with the increment of n, the performance increases before n = 30 since more recent behaviors are involved in, contributing to a better short-term interest embedding. Then it stops improving because remote behaviors bring noise to the short-term interest modeling. Similar to n, increasing the embedding size from 32 to 256 can improve the performance significantly. However, with the embedding size of 300, the performance degrades. It demonstrates that using a large number of the embedding size has powerful representation. But too large size will lead to over-fitting.

Table 5
Question information

ID Title&Tag Time Feedback

16 Making a list of lists with variable size.&python 2018.03.18

55 Breaking out of a recursion in java.&java 2018.03.01

89 Pandas dataframe select columns in multiindex.&pandas 2018.02.19 35,2,0.51,0

174 Why does my recursive function return None? &python 2018.02.13 8,3,0.22,0

675 Installing Java JDK 7 For Mac OS&java 2017.12.02 9,4,0.11,0

725 How to unnest a column in a pandas DataFrame?&pandas 2017.11.28 2,6,0.06,0

281 How do I copy an object in Java?&java 2017.09.26 31,5,0.19,0

91 How to make a button bigger when pressed?&android 2017.08.14 9,1,0.51,1

332 What is a NullPointerException, and how do I fix it?&java 2017.08.02 20,2,0.48,0

ID	Title&Tag	Time	Feedback
16	Making a list of lists with variable size.&python	2018.03.18
55	Breaking out of a recursion in java.&java	2018.03.01
89	Pandas dataframe select columns in multiindex.&pandas	2018.02.19	35,2,0.51,0
174	Why does my recursive function return None? &python	2018.02.13	8,3,0.22,0
675	Installing Java JDK 7 For Mac OS&java	2017.12.02	9,4,0.11,0
725	How to unnest a column in a pandas DataFrame?&pandas	2017.11.28	2,6,0.06,0
281	How do I copy an object in Java?&java	2017.09.26	31,5,0.19,0
91	How to make a button bigger when pressed?&android	2017.08.14	9,1,0.51,1
332	What is a NullPointerException, and how do I fix it?&java	2017.08.02	20,2,0.48,0

5.5 Case studies

We conduct case study to reveal the inner structure of DUM on English dataset. Table 5 lists the information of the target question q₁₆ and q₅₅ and historical questions answered by user u₁₁, where the feedback is as illustrated in Table 1. We first visually explore the effectiveness of user expertise network. Fig. 8 shows the expertise weight and attention weight of these questions with respect to the target question q₅₅. We can find that questions which are answered well get higher expertise weight, e.g.,q₈₉, q₂₈₁, q₉₁, q₃₃₂. In addition, questions related to the target question such as q₆₇₅, q₂₈₁, q₃₃₂ get higher attention weight.

Then we compare the ranked lists of our model DUM-pool, DUM-lstm and variant model DUM-short for target question q₁₆ and q₅₅, respectively. As shown in the upper half part in Table 6, there are 6 candidate users for q₁₆ and the last two users u₁₁ and u₉₄ are sampled from answerers who didn’t answer q₁₆. The ground truth is highlighted in red. we can find that DUM-short ranks u₁₁ first probably because u₁₁ has a good performance in answering questions on the java tag. But it neglects that he hasn’t answered questions of java tag for a long time. DUM-pool manages to avoid this problem by considering users’ short-term interest.

For another target question q₅₅, it has 6 candidate users and u₁₁ is the ground truth. We can find that DUM-lstm makes the right recommendation but DUM-pool fails. It’s probably because the answering performance of u₁₁ on python tag is improving. However, DUM-pool can’t capture this trend because it ignores the time information when estimating user expertise. The DUM-lstm can make up this shortage by using the A-LSTM.

Table 6
Ranked list of q₁₆ and q₅₅

Candidates u₆₃, u₄, u₁₇₂, u₇₄₁, u₁₁, u₉₄ MRR

q ₁₆ DUM u₆₃, u₄, u₁₁, u₁₇₂, u₇₄₁, u₉₄ 1

DUM-short u₁₁, u₆₃, u₄, u₇₄₁, u₁₇₂, u₉₄ 1/2

Candidates u₁₁, u₂₉, u₅₆₃, u₁₀₂₉, u₃₄₈, u₂ MRR

q ₅₅ DUM-lstm u₁₁, u₅₆₃, u₂₉, u₁₀₂₉, u₂, u₃₄₈ 1

DUM-pool u₅₆₃, u₂₉, u₁₁, u₁₀₂₉, u₃₄₈, u₂ 1/3

	Candidates	u₆₃, u₄, u₁₇₂, u₇₄₁, u₁₁, u₉₄	MRR
	Candidates	u₁₁, u₂₉, u₅₆₃, u₁₀₂₉, u₃₄₈, u₂	MRR
q ₅₅	DUM-lstm	u₁₁, u₅₆₃, u₂₉, u₁₀₂₉, u₂, u₃₄₈	1
	DUM-pool	u₅₆₃, u₂₉, u₁₁, u₁₀₂₉, u₃₄₈, u₂	1/3

Fig.8

Illustration of the user expertise network.

6 Conclusion

In this paper, we propose a new framework to dynamically model user interest and user expertise for expert recommendation. For user interest, we learn both long-term and short-term interest to precisely predict the user’s next behavior. For user expertise, we propose to explicitly model the feedback information on history answers to estimate users’ expertise on new questions. In addition, in order to adapt to the characteristics of different question domains, we use two methods to learn user expertise, i.e., using mean pooling of history sequence to generate expertise embedding or using A-LSTM to capture the trends of users’ expertise. Experimental results on two real-world datasets show that the performance is boosted by capturing users’ short-term interest and accurate user expertise modeling. Some potential future work includes: (1) the questions representation can be improved by using attention mechanism; (2) the profile of user such as user personal description should be considered to solve the cold-start problem.

Footnotes

References

, Wu

, Zhang

, Liu

, Xie

, Neural news recommendation with long-and short-term user representations, In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (2019), 336–345.

Bahdanau

, Cho

, Bengio

, Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473 (2014).

Castro

, Toledo

R.Y.

, Alzahrani

A.A.

, Sánchez

P.J.

, Barranco

M.J.

, Martínez

, A big data semantic driven context aware recommendation method for question-answer items, IEEE Access 7 (2019), 182664–182678.

Chang

, Pal

, Routing questions for collaborative answering in community question answering, In 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013) (2013), , IEEE, pp. 494–501.

Chen

, Sun

, Task-guided and path-augmented heterogeneous network embedding for author identification, In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (2017), pp. 295–304.

Chen

, Zhang

, Zhao

, Yao

, Cai

, Question retrieval for community-based question answering via heterogeneous social influential network, Neurocomputing 285 (2018), 117–124.

Cho

J.H.

, Li

, Girju

, Zhai

, Recommending forum posts to designated experts, In 2015 IEEE International Conference on Big Data (Big Data) (2015), IEEE pp. 659–666.

, Li

, Zhang

, Wu

, Jiang

Y.-G.

, Recurrent memory reasoning network for expert finding in community question answering, In Wsdm 20: the Thirteenth Acm International Conference on Web Search Data Mining (2020).

, Du

, Wang

, Tian

, Tang

, Chua

T.-S.

, Outer product-based neural collaborative filtering, arXiv preprint arXiv:1808.03912 (2018).

10.

Huang

, Yao

, Wang

, Benatallah

, Zhang

, Dong

, Expert recommendation via tensor factorization with regularizing hierarchical topical relationships, In International Conference on Service-Oriented Computing (2018), Springer, pp. 373–387.

11.

, Meng

, Finley

, Wang

, Chen

, Ma

, Ye

, Liu

T.-Y.

, Lightgbm: A highly efficient gradient boosting decision tree, In Advances in Neural Information Processing Systems (2017), 3146–3154.

12.

Koren

, Bell

, Volinsky

, Matrix factorization techniques for recommender systems, Computer 8 (2009), 30–37.

13.

L.T.

, Shah

, Retrieving rising stars in focused community question-answering, In Asian Conference on Intelligent Information and Database Systems (2016), Springer, pp. 25–36.

14.

, Mikolov

, Distributed representations of sentences and documents, In International Conference on Machine Learning (2014), pp. 1188–1196.

15.

, Jiang

J.-Y.

, Sun

, Wang

, Personalized question routing via heterogeneous network embedding.

16.

Liu

, Li

, Bai

, Yan

, Zhang

, A dual attentive neural network framework with community metadata for answer selection, In National CCF Conference on Natural Language Processing and Chinese Computing (2017), Springer, pp. 88–100.

17.

Neshati

, Fallahnejad

, Beigy

, On dynamicity of expert finding in community question answering, Information Processing & Management 53(5) (2017), 1026–1042.

18.

Pal

, Farzan

, Konstan

J.A.

, Kraut

R.E.

, Early detection of potential experts in question answering communities, In International Conference on User Modeling, Adaptation and Personalization (2011), Springer, pp. 231–242.

19.

Sha

, Yu

, Jie

, Cabotà

J.B.

, Expert finding in community question answering: A review, Artificial Intelligence Review 23 (2018).

20.

Srba

, Bielikova

, A comprehensive survey and classification of approaches for community question answering, ACM Transactions on the Web (TWEB) 10(3) (2016), 18.

21.

Sung

, Lee

J.-G.

, Lee

, Booming up the long tails: Discovering potentially contributive users in communitybased question answering services, In Seventh International AAAI Conference on Weblogs and Social Media (2013).

22.

Wang

, Huang

, Yao

, Benatallah

, Dong

, A survey on expert recommendation in community question answering, Journal of Computer Science and Technology 33(4) (2018), 625–653.

23.

Yeniterzi

, Callan

, Moving from static to dynamic modeling of expertise for question routing in cqa sites, In Ninth International AAAI Conference on Web and Social Media (2015).

24.

Zhao

, Lu

, Zheng

V.W.

, Cai

, He

, Zhuang

, Community-based question answering via asymmetric multi-faceted ranking network learning, In Thirty-First AAAI Conference on Artificial Intelligence (2017).

25.

Zhao

, Yang

, Cai

, He

, Zhuang

, Expert finding for community-based question answering via ranking metric network learning, In Ijcai 16 (2016), 3000–3006.

26.

Zhao

, Zhang

, He

, Ng

, Expert finding for question answering via graph regularized matrix completion, IEEE Transactions on Knowledge and Data Engineering 27(4) (2014), 993–1004.

27.

Zhou

, Sun

, Liu

, Lau

, A clstm neural network for text classification, arXiv preprint arXiv:1511.08630 (2015).

	Candidates	u₆₃, u₄, u₁₇₂, u₇₄₁, u₁₁, u₉₄	MRR
q ₁₆	DUM	u₆₃, u₄, u₁₁, u₁₇₂, u₇₄₁, u₉₄	1
	DUM-short	u₁₁, u₆₃, u₄, u₇₄₁, u₁₇₂, u₉₄	1/2
	Candidates	u₁₁, u₂₉, u₅₆₃, u₁₀₂₉, u₃₄₈, u₂	MRR
q ₅₅	DUM-lstm	u₁₁, u₅₆₃, u₂₉, u₁₀₂₉, u₂, u₃₄₈	1
	DUM-pool	u₅₆₃, u₂₉, u₁₁, u₁₀₂₉, u₃₄₈, u₂	1/3

Dynamic user modeling for expert recommendation in community question answering

Abstract

Keywords

1 Introduction

2 Related work

3 Problem statement

4 Framework

5.1 Experimental settings

5.1.1 Datasets

Table 2 Statistics of the datasets Dataset # of u # of q # of t # of u/q of v/a English 23123 42972 944 7.6 5.6 SO 86714 239095 296 6.1 4.3

5.2 Performance comparison

5.4 Parameters sensitivity

Footnotes

References

Table 2
Statistics of the datasets

Dataset # of u # of q # of t # of u/q of v/a

English 23123 42972 944 7.6 5.6

SO 86714 239095 296 6.1 4.3