Algorithm for recommending answer providers in community-based question answering

Abstract

Obtaining answers from community-based question answering (CQA) services is typically a lengthy process. In this light, the authors propose an algorithm that recommends answer providers. A two-step framework is developed. First, a query likelihood language model is constructed to determine the interests of answer providers, and this model is used to identify answer providers who are interested in answering questions on the identified topics. Second, a maximum entropy model is designed to estimate answer quality, and an answer-quality-based algorithm is developed to model the expertise of answer providers in order to differentiate answer providers of various capacities. The proposed scheme leverages both answer provider interest and expertise, allowing for more effective differentiation. Experiments on real-world data from Baidu Knows, a renowned Chinese CQA service similar to Yahoo! Answers, reveal significant improvements over the baseline methods and demonstrate the effectiveness of the proposed approach.

Keywords

community-based question answering; maximum entropy model; query likelihood language model; recommending answer providers

1. Introduction

Recent years have seen the rapid development of Q&A systems because of the disadvantages of conventional search engines and rising user demand for fast information access. In contrast to existing document retrieval systems, a Q&A system can answer users’ questions in accurate, simple, and natural language. Early Q&A systems based on the auto-answering framework rely primarily on natural language processing, knowledge representation and reasoning, and machine learning. Thus far, however, no ideal open-domain question answering system has been developed.

With the development of Web 2.0 and social networking services, community-based question answering (CQA) services have become popular, and an increasing number of users rely on them to resolve everyday problems. Baidu Knows, a renowned Chinese CQA service similar to Yahoo! Answers, has accumulated more than 200 million questions, with more than 100,000 new questions added daily. However, as CQA services grow and the volume of system data increases, problems arise. Given the rising number of questions submitted, users spend a considerable amount of time waiting for other users to answer their inquiries, and many questions remain unanswered. Meanwhile, users who are willing to answer questions have difficulty finding topics that interest them. To illustrate this point, the authors randomly selected 1000 newly raised questions from Baidu Knows and checked them again two days later. Although 68.2% of the questions had received answers, only 13.2% had been resolved. Research on recommending answer providers is therefore essential to CQA service development. An effective system immediately forwards newly raised questions to potential answer providers, allowing questions to be resolved efficiently, reducing waiting time, and enhancing the user experience.

In this article, the authors propose an algorithm that recommends answer providers by jointly considering their interests and expertise, based on an extensive analysis of research on answer provider recommendation. First, a query likelihood language model is constructed to analyse answer provider interest and identify groups of suitable answer providers. Because the quality of answers varies, individuals who answer a large number of questions are not necessarily suitable answer providers; for example, a user who provides many answers, few of which are good, is not a reliable answer provider. A maximum entropy model is therefore designed to determine answer quality. On the basis of the quality of the answers provided, an answer provider expertise model is then established to distinguish amongst the capacities of different answer providers and reveal their individual characteristics.

2. Related work

Extensive research has been devoted to CQA services in an effort to reduce user waiting time and improve the efficiency of question-answering systems. Before inquiries are submitted, searching for similar questions can provide high-quality answers that may already be available in the system. This feature not only reduces waiting time, but also conserves system resources. In view of this mechanism, Jeon et al. [1] presented an algorithm intended to search for semantically similar questions in a question-answering system. This algorithm calculates question similarities through a language translation model, and considers answers for the corresponding question. Wang et al. [2] proposed a syntax tree structure-based, rather than training-based, query framework to search for similar questions.

The sources of knowledge or information in CQA systems are the question–answer pairs. The number of question–answer pairs generated within a system is relatively limited, whereas much more knowledge or information is available in user-generated internet forums. Question–answer pairs can therefore be extracted from internet forums to supplement Q&A systems. Cong et al. [3] studied the extraction of question–answer pairs from forums and proposed a classification-based sequential pattern algorithm for detecting question posts. A graph-based propagation method was then used to detect answers for the candidate questions. Wang et al. [4] examined answer ranking by modelling the relationship between questions and answers in Q&A systems. Assuming that several different types of direct associations exist between questions and answers, they used a statistical regression model to calculate these associations and proposed an analogy-based approach to identify the best-matching answers.

Some studies focus on recommending answer providers for newly raised questions in the community. Li et al. [5] used the source code released by users to build a ‘concept network’ by analysing class inheritance and class call relationships. A user network is set up by examining the interdependence of posts in the forums. ‘Expert’ recommendations are then made according to the conceptual similarity between question posts and answer providers. Zhou et al. [6] used a statistical language model to represent answer provider information and recommended answer providers according to the dependence amongst posts as well as post topics. However, building a ‘reply relationship’ between users and answer providers is difficult in most answering systems, which makes this method impractical. In the literature [7–9], the probabilistic latent semantic analysis (PLSA) model [8] was used to recommend answer providers. PLSA was applied to the questions addressed by answer providers to build a model of their interests; the similarity between answer provider interests and the question to be recommended was then calculated, and a ranked list of answer providers was produced. Because answer quality varies, recommending answer providers solely on the basis of an interest model is not sufficiently accurate. Moreover, question descriptions in question-answering systems are usually shorter than ordinary documents, and with such limited data it is difficult to construct a precise answer provider interest model using latent semantic analysis techniques. In the present work, therefore, a different method is used to model answer provider interest, and answer quality is considered as well. To further improve the accuracy of the recommendations, an expertise model is proposed.

3. Recommendation algorithm

3.1. Formal definition

First, a formal definition and description of the problem are presented.

Definition 1: Answer provider recommendation

The answer provider recommendation problem can be defined as follows. Given a set of users $U = \{u_{1}, u_{2}, \dots, u_{n}\}$ and a newly raised question $q_{r}$, the system recommends appropriate answer providers $u_{r1}, u_{r2}, \dots, u_{rk}$ who satisfy certain specific conditions for answering $q_{r}$. Recommended answer providers must satisfy

u_{i} \in \mathrm{TOP}N \{ UR(u_{i}, q_{r}, C) \}

(1)

where $\mathrm{TOP}N \{ UR(u_{i}, q_{r}, C) \}$ denotes the N answer providers with the highest scores, and $UR(u_{i}, q_{r}, C)$ measures the degree to which answer provider $u_{i}$ satisfies condition C for answering question $q_{r}$. The larger $UR(u_{i}, q_{r}, C)$ is, the more capable the answer provider is of addressing question $q_{r}$.

The conditions that the recommended answer provider should satisfy can be analysed from two perspectives: (a) the recommended answer provider should show considerable interest in the question (topic). Constructing an interest model for evaluating the degree of answer provider interest in the question is therefore necessary; (b) the recommended answer provider should have the capacity (expertise) to answer the questions. Thus, constructing an expertise model is also necessary to determine whether the expertise matches the corresponding topic.

3.2. Modelling answer provider interest

Answering a question usually indicates that the answer provider is interested in its topic. Therefore, to find suitable answer providers for a newly raised question, the authors build an answer provider interest model and then recommend answer providers on the basis of the degree of match between each provider's interests and the question.

To evaluate interest, other questions previously addressed by the answer provider are explored. The information from these answered questions may reflect interest to a certain extent.

In the Q&A community, the interests of answer providers remain stable over long periods, and an answer provider is usually interested only in questions on specific topics. Meanwhile, the vocabularies of different topics are easy to distinguish; that is, topics may differ significantly in the terms they use. All questions in Baidu Knows are divided into categories such as sports, business/financial, and computer/network, and the asker is asked to choose a category before submitting a new question. By analysing the words in questions and answers on Baidu Knows, the authors find that questions in the sports category contain a few keywords with high probability, such as ‘football’, ‘basketball’, ‘NBA’, and other specific terms, whereas questions in the business/financial category are more likely to contain keywords such as ‘stock’, ‘transaction’, and ‘shop’. Frequently occurring words in the set of questions addressed by an answer provider (the questions raised by users and the answers given by the answer provider) therefore characterize that provider's interests, as shown in Table 1.

Table 1.

Distribution of answer provider vocabulary

Answer provider 1 vocabulary	Distribution probability, %	Answer provider 2 vocabulary	Distribution probability, %
Patch	6.41	Enterprise	1.64
Update	2.45	Audit	1.58
Polo shirt	1.98	Borrow	0.90
Team	1.93	Loan	0.87
EPL	1.52	Finance	0.81
Bundesliga	1.32	Expense	0.72
Modification	1.24	Risk	0.72
Football	1.23	Management	0.69
Real Madrid	1.21	Accountancy	0.63
mb	1.14	Property	0.62
…	…	…	…
		Job	0.51
		Company	0.49
Company	0.001	…	…
Employment	0.001	Modification	0.03
…	…	Update	0.01
…	…	…	…

If the keywords of a newly raised question are likely to be generated from the set of questions previously addressed by an answer provider, it may be inferred that the answer provider has considerable interest in the question. The query likelihood language model [9] is therefore used to measure the degree of interest, by calculating the probability that the newly raised question is generated from the provider's previously answered questions.

Definition 2: Degree of answer provider interest in the question

From the perspective of the language model, the degree of interest of answer provider $u_{i}$ in question $q_{r}$ is defined as the likelihood of generating question $q_{r}$ from a set of questions already addressed by $u_{i}$ . It is expressed as follows:

I(u_{i}, q_{r}) = P(q_{r} \mid \theta_{Q_{r}(u_{i})})

(2)

where $I(u_{i}, q_{r})$ is the degree of answer provider $u_{i}$'s interest in the newly raised question $q_{r}$, and $P(q_{r} \mid \theta_{Q_{r}(u_{i})})$ is the query likelihood of $q_{r}$ under the language model $\theta_{Q_{r}(u_{i})}$ of the set of questions $Q_{r}(u_{i})$ addressed by $u_{i}$. This likelihood measures how well $Q_{r}(u_{i})$ matches $q_{r}$ and is taken as the degree of interest of $u_{i}$ in $q_{r}$.

Figure 1 illustrates how the degree of answer provider interest is measured with the query likelihood model. For each answer provider's set of answered questions (Figure 1), the corresponding language model $\theta_{Q_{r}(u_{i})}$ is estimated first, and the probability $P(q_{r} \mid \theta_{Q_{r}(u_{i})})$ of generating question $q_{r}$ under this model is then calculated. All answer providers are ranked by these probabilities, and the top-ranked ones are recommended. Under a given language model, different questions generally have different generation probabilities [e.g. $P(q_{r1} \mid \theta_{Q_{r}(u_{i})}) \neq P(q_{r2} \mid \theta_{Q_{r}(u_{i})})$]. A question containing more of the model's frequently occurring keywords should have a higher probability, because frequent words indicate the topics of the language model.

Figure 1.

Diagram of the query likelihood language model.

With this method, determining the degree of interest becomes a matter of computing the probability (or query likelihood) that a question is generated under a given language model. To calculate the query likelihood of question $q_{r}$, a language model $\theta_{Q_{r}(u_{i})}$ is defined for question set $Q_{r}(u_{i})$ and estimated from that set. Two basic forms of language models exist: the multiple-Bernoulli model and the multinomial model (also known as the unigram model). Compared with the multiple-Bernoulli model, the unigram model is relatively simple and can substantially reduce system complexity and computational overhead. Moreover, according to the existing literature [10], the unigram model is more effective than more complex language models (such as bigram and trigram models), which do not significantly improve performance. The multinomial model is therefore adopted for the set of questions addressed by each answer provider; that is, a query is treated as the outcome sequence of repeated multinomial trials, each of which draws one term $w$ from the model's vocabulary. The query likelihood is then a multinomial probability; that is, for $q_{r} = w_{1} w_{2} \dots w_{m}$,

P(q_{r} \mid \theta_{Q_{r}(u_{i})}) = \prod_{j = 1}^{m} P(w_{j} \mid \theta_{Q_{r}(u_{i})}) = \prod_{w \in q_{r}} P(w \mid \theta_{Q_{r}(u_{i})})^{c(w, q_{r})}

(3)

where $c(w, q_{r})$ is the number of occurrences of $w$ in question $q_{r}$. The parameters of the language model $\theta_{Q_{r}(u_{i})}$, that is, the probabilities $P(w \mid \theta_{Q_{r}(u_{i})})$, are estimated by maximum likelihood:

P_{ml}(w \mid \hat{\theta}_{Q_{r}(u_{i})}) = \frac{c(w, Q_{r}(u_{i}))}{|Q_{r}(u_{i})|}

(4)

where $c(w, Q_{r}(u_{i}))$ is the number of occurrences of $w$ in question set $Q_{r}(u_{i})$, and $|Q_{r}(u_{i})|$ is the length of the question set (the total number of keywords it contains). However, maximum likelihood alone is problematic: any term absent from the set of questions addressed by the answer provider receives probability $P(w \mid \theta_{Q_{r}(u_{i})}) = 0$, which is clearly unreasonable and makes the likelihood of any question containing that term zero. This zero-probability problem, caused by data sparsity, is addressed with data smoothing. The current work uses Jelinek–Mercer smoothing [11], which linearly interpolates the question-set language model with a language model of the whole collection. The language model is thus estimated as:

P_{\lambda}(w \mid \hat{\theta}_{Q_{r}(u_{i})}) = (1 - \lambda) P_{ml}(w \mid \hat{\theta}_{Q_{r}(u_{i})}) + \lambda P(w \mid Col)

(5)

where $\lambda \in [0, 1]$ is a smoothing parameter; $\lambda = 0$ gives pure maximum likelihood estimation, whereas $\lambda = 1$ replaces every answer provider's language model with the same collection model; and $P(w \mid Col)$ is the language model estimated from the whole data set.

Thus, the formula for calculating answer provider interest is:

P(q_{r} \mid \theta_{Q_{r}(u_{i})}) = \prod_{w \in q_{r}} P(w \mid \theta_{Q_{r}(u_{i})})^{c(w, q_{r})} = \prod_{w \in q_{r}} P_{\lambda}(w \mid \hat{\theta}_{Q_{r}(u_{i})})^{c(w, q_{r})} = \prod_{w \in q_{r}} \left[ (1 - \lambda) \frac{c(w, Q_{r}(u_{i}))}{|Q_{r}(u_{i})|} + \lambda \frac{c(w, Col)}{|Col|} \right]^{c(w, q_{r})}

(6)
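A minimal Python sketch of Equation (6) follows. It illustrates the computation under stated assumptions and is not the authors' implementation: each answer provider's history $Q_{r}(u_{i})$ is assumed to be already segmented into keyword tokens, the collection model $P(w \mid Col)$ is assumed to be estimated from the pooled histories of all providers, and the function names and toy data are hypothetical. Scores are computed in log space to avoid floating-point underflow on long questions; because the logarithm is monotonic, the ranking is the same as ranking by Equation (6) directly.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def log_query_likelihood(question: List[str], history: List[str],
                         col_counts: Counter, col_len: int,
                         lam: float = 0.2) -> float:
    """log P(q_r | theta_{Q_r(u_i)}) with Jelinek-Mercer smoothing (Eqs. 3-6)."""
    hist_counts = Counter(history)
    hist_len = len(history)
    score = 0.0
    for w, c_wq in Counter(question).items():                 # c(w, q_r)
        p_ml = hist_counts[w] / hist_len if hist_len else 0.0  # Eq. (4)
        p_col = col_counts[w] / col_len if col_len else 0.0    # P(w | Col)
        p = (1 - lam) * p_ml + lam * p_col                     # Eq. (5)
        if p == 0.0:
            return float("-inf")  # zero probability: question cannot be generated
        score += c_wq * math.log(p)
    return score


def rank_by_interest(question: List[str], histories: Dict[str, List[str]],
                     lam: float = 0.2,
                     top_n: int = 10) -> Tuple[List[str], Dict[str, float]]:
    """Rank providers by interest I(u_i, q_r); assumes Col = pooled histories."""
    pooled = [w for tokens in histories.values() for w in tokens]
    col_counts = Counter(pooled)
    scores = {u: log_query_likelihood(question, tokens, col_counts, len(pooled), lam)
              for u, tokens in histories.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n], scores


# Hypothetical toy histories (already segmented into keywords).
histories = {
    "u1": ["football", "nba", "patch", "football"],
    "u2": ["stock", "loan", "finance", "company"],
}
top, scores = rank_by_interest(["football", "nba"], histories, lam=0.2, top_n=1)
print(top)  # ['u1']
```

With $\lambda = 0$ (pure maximum likelihood), u2 in this example would score $-\infty$ because ‘football’ never appears in its history, which is exactly the zero-probability problem that smoothing removes.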

3.3. Answer provider expertise model

Determining recommended answer providers by analysing interest only on the basis of historical information in the questions addressed by answer providers is an insufficiently accurate approach because of the prevalence of spam and the varying quality of answers given by answer providers. The answer providers to be recommended should not only be interested in the question, but also be an expert in the topic; these characteristics guarantee high-quality answers. Thus, to derive a more effective recommendation algorithm, answer quality is also taken into consideration by constructing an answer provider expertise model, which is based on the answer provider interest model.

If an answer provider's responses on a given topic have a high probability of being high-quality answers (for example, answers on Baidu Knows with high adoption rates), it can be inferred that the answer provider has sufficient expertise in that field. The quality of the answers already given by an answer provider is therefore used to determine his/her expertise, and this expertise is matched against the topic of the question to decide whether the provider should be recommended.

Because a potential answer provider has not yet answered the newly raised question, the quality of his/her answer must be predicted. It is estimated as the similarity-weighted average of the quality of the answers the provider has already given:

Q (u_{i}, q_{r}) = \frac{\sum_{q \in Q_{r} (u_{i})} Q (u_{i}, q) \times sim (q, q_{r})}{\sum_{q \in Q_{r} (u_{i})} sim (q, q_{r})}

(7)

where $q$ is a question in the set $Q_{r}(u_{i})$ already addressed by answer provider $u_{i}$, $Q(u_{i}, q)$ is the quality of $u_{i}$'s answer to $q$, and $sim(q, q_{r})$ is the cosine similarity between the two questions. Each question is represented by its term vector, with term weights calculated using TF-IDF.
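Equation (7) can be sketched in Python as follows. The sketch assumes pre-segmented token lists and computes IDF as $\log(N / df)$ over a hypothetical reference corpus of questions, which is one common TF-IDF variant (the paper does not say which one it uses). Returning 0.0 when no past question is similar to the new one is an assumption of this sketch, not a rule from the paper, and all names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def idf_table(corpus: Sequence[List[str]]) -> Dict[str, float]:
    """IDF(w) = log(N / df(w)) over a reference corpus of segmented questions."""
    n = len(corpus)
    df = Counter(w for doc in corpus for w in set(doc))
    return {w: math.log(n / d) for w, d in df.items()}


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector; words unseen in the corpus get weight 0."""
    return {w: c * idf.get(w, 0.0) for w, c in Counter(tokens).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_quality(new_question: List[str],
                    history: List[Tuple[List[str], float]],
                    idf: Dict[str, float]) -> float:
    """Eq. (7): similarity-weighted average of past answer quality Q(u_i, q)."""
    target = tfidf(new_question, idf)
    num = den = 0.0
    for past_question, quality in history:
        s = cosine(tfidf(past_question, idf), target)  # sim(q, q_r)
        num += quality * s
        den += s
    return num / den if den > 0 else 0.0  # fallback when nothing is similar


# Hypothetical corpus and provider history of (question tokens, Q(u_i, q)) pairs.
corpus = [["football", "nba"], ["stock", "loan"], ["football", "team"]]
idf = idf_table(corpus)
history = [(["football", "nba"], 0.9), (["stock", "loan"], 0.2)]
print(predict_quality(["football", "team"], history, idf))
```

Here the new question shares only ‘football’ with the first past question and nothing with the second, so the prediction is driven entirely by the first answer's quality (0.9).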

In the answering community, the asker can select the answer he or she is most satisfied with as the best answer. Apart from this, the community offers no other standard for answer quality. Several correct answers may exist amongst those given by different answer providers, but the asker can choose only one best answer. The quality of answers must therefore be estimated with a dedicated method.

Answer quality can be measured along different dimensions, such as accuracy, completeness, timeliness, reliability, and verifiability [12]. Zhu et al. [13] proposed an evaluation framework for estimating answer quality; however, most of their indicators are descriptive and cannot be obtained or calculated automatically. The current study uses this framework to label the training data. A maximum entropy model is then trained on question–answer features that can be extracted and calculated directly, yielding a method for calculating answer quality automatically. The general framework of the maximum entropy model is shown in Figure 2 [14].

Figure 2.

Frame diagram of the maximum entropy model.

Answer quality is classified as either good or bad. Estimating answer quality therefore means determining the probability of event $y$ (the answer is good or bad) given context $x$ (the features of the question and answer). Characteristic (feature) functions represent the contextual information on which the model depends. The probability estimated by the maximum entropy model is:

p^{*}(y \mid x) = \frac{1}{Z(x)} \exp\left(\sum_{i} \lambda_{i} f_{i}(x, y)\right)

(8)

where $f_{i}(x, y)$ is a characteristic function, $Z(x) = \sum_{y} \exp\left(\sum_{i} \lambda_{i} f_{i}(x, y)\right)$ is the normalization factor, and $\lambda_{i}$ is the weight of characteristic $f_{i}$.
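To make Equation (8) concrete, the following Python sketch computes $p^{*}(y \mid x)$ for the two labels ‘good’ and ‘bad’. The two binary feature functions and their weights are hypothetical examples chosen for illustration, not the features or weights learned in this study. Subtracting the maximum score before exponentiating is a standard numerical-stability step; it cancels in the ratio and leaves the probabilities unchanged.

```python
import math
from typing import Callable, Dict, Sequence

LABELS = ("good", "bad")
FeatureFn = Callable[[dict, str], float]


def maxent_prob(x: dict, weights: Sequence[float], features: Sequence[FeatureFn],
                labels: Sequence[str] = LABELS) -> Dict[str, float]:
    """Eq. (8): p*(y | x) = exp(sum_i lambda_i f_i(x, y)) / Z(x)."""
    scores = {y: sum(lam * f(x, y) for lam, f in zip(weights, features)) for y in labels}
    top = max(scores.values())                 # stability shift
    unnorm = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(unnorm.values())                   # Z(x), up to the same shift
    return {y: v / z for y, v in unnorm.items()}


# Hypothetical binary feature functions f_i(x, y).
def high_adoption_good(x: dict, y: str) -> float:
    return 1.0 if y == "good" and x["adoption_rate"] > 0.5 else 0.0


def long_answer_good(x: dict, y: str) -> float:
    return 1.0 if y == "good" and x["answer_length"] > 50 else 0.0


features = [high_adoption_good, long_answer_good]
weights = [1.2, 0.5]  # hypothetical lambda_i values
p = maxent_prob({"adoption_rate": 0.6, "answer_length": 120}, weights, features)
print(p)
```

Both features fire for ‘good’, so its score is 1.7 against 0 for ‘bad’, and $p^{*}(\text{good} \mid x) = 1 / (1 + e^{-1.7})$.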

Given this parametric form, the maximum entropy model is fully specified once the parameter values $\Lambda^{*} = \{\lambda_{1}, \lambda_{2}, \dots, \lambda_{n}\}$ have been estimated. Generalized iterative scaling (GIS) or improved iterative scaling (IIS) is typically used to estimate these parameters. In this article, the authors use the statistical language model toolkit¹ provided by Carnegie Mellon University to estimate the model parameters.
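The paper estimates the parameters with a CMU toolkit; the following from-scratch Python sketch of Generalized Iterative Scaling only illustrates what GIS does and does not reproduce that toolkit. It assumes non-negative feature values and adds the usual slack feature so that the features sum to the same constant C for every (x, y) pair, as GIS requires. Features never observed in the training data are simply left at weight 0, which is a simplification of this sketch. The toy samples and feature function are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

FeatureVec = Callable[[dict, str], List[float]]


def _prob(x, lam, full, labels):
    """p*(y | x) of Eq. (8) for the current weights."""
    s = {y: sum(l * v for l, v in zip(lam, full(x, y))) for y in labels}
    top = max(s.values())
    e = {y: math.exp(v - top) for y, v in s.items()}
    z = sum(e.values())
    return {y: v / z for y, v in e.items()}


def train_gis(samples: Sequence[Tuple[dict, str]], labels: Sequence[str],
              feat: FeatureVec, n_iter: int = 200):
    """Generalized Iterative Scaling; feature values must be non-negative."""
    # GIS needs sum_i f_i(x, y) == C for every (x, y); a slack feature fills the gap.
    C = max(sum(feat(x, y)) for x, _ in samples for y in labels)

    def full(x, y):
        f = feat(x, y)
        return f + [C - sum(f)]

    k = len(full(samples[0][0], labels[0]))
    n = len(samples)
    empirical = [0.0] * k  # E_emp[f_i] over the labelled samples
    for x, y in samples:
        for i, v in enumerate(full(x, y)):
            empirical[i] += v / n

    lam = [0.0] * k
    for _ in range(n_iter):
        expected = [0.0] * k  # E_model[f_i] under the current weights
        for x, _ in samples:
            p = _prob(x, lam, full, labels)
            for y in labels:
                for i, v in enumerate(full(x, y)):
                    expected[i] += p[y] * v / n
        for i in range(k):
            if empirical[i] > 0 and expected[i] > 0:
                lam[i] += math.log(empirical[i] / expected[i]) / C
    return lam, (lambda x: _prob(x, lam, full, labels))


LABELS = ["good", "bad"]


def feat(x: dict, y: str) -> List[float]:
    # Hypothetical indicator features on the answer provider's adoption rate.
    high = x["adoption_rate"] > 0.5
    return [1.0 if (y == "good" and high) else 0.0,
            1.0 if (y == "bad" and not high) else 0.0]


samples = [({"adoption_rate": 0.8}, "good"), ({"adoption_rate": 0.7}, "good"),
           ({"adoption_rate": 0.9}, "bad"), ({"adoption_rate": 0.2}, "bad"),
           ({"adoption_rate": 0.3}, "bad")]
weights, predict = train_gis(samples, LABELS, feat)
print(predict({"adoption_rate": 0.9}))
```

In the toy data, two of the three high-adoption answers are labelled good, so the trained model assigns a probability close to 2/3 to ‘good’ for a high-adoption answer, matching the empirical feature expectation.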

As the discussion above shows, feature (characteristic) selection is a key factor in the maximum entropy model. The features of questions and answers used in this article fall into text and non-text features. Text features are derived from the answer content and include answer length, ratio of answer lengths, number of distinct words in an answer, and number of words appearing in both the question and the answer. Non-text features mainly describe the answer provider; these include the answer provider's credit (experience value), the answer provider's authority (adoption rate of answers and number of praises received), number of answers, ratio of questions to answers, and number of answers to the question.

The values of text features are calculated from the corresponding question and answer content. Before answer length and word counts are calculated, the text is pre-processed by removing stop words such as ‘the’. Because written Chinese does not mark word boundaries, the text is first segmented into words. Non-text features are extracted directly by analysing the structure of the web pages. In addition, in the maximum entropy model the event probability increases monotonically with each feature value; that is, the larger the feature value, the higher the probability of the event. Adoption rate, defined as the rate at which users accept the provider's answers, fits this assumption: a high adoption rate indicates a high probability that the answer provider can answer the question well. However, a longer answer is not always a better answer, so such non-monotonic feature values must be transformed. The method given in [15] is adopted for this transformation.
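A minimal sketch of the text-feature extraction described above, assuming the question and answer have already been segmented into words (for example by a Chinese word segmenter) and using a small hypothetical stop-word list. The ‘ratio of answer lengths’ feature is omitted because the text does not define which lengths are compared, and the non-monotonic transformation of [15] is not reproduced.

```python
from typing import Dict, Iterable, List

# Hypothetical stop-word list; a real system would use a full (Chinese) stop-word list.
STOP_WORDS = frozenset({"the", "a", "of", "is"})


def text_features(question: List[str], answer: List[str],
                  stop_words: Iterable[str] = STOP_WORDS) -> Dict[str, int]:
    """Text features of an answer, computed on pre-segmented, stop-word-filtered tokens."""
    stop = set(stop_words)
    q = [w for w in question if w not in stop]
    a = [w for w in answer if w not in stop]
    return {
        "answer_length": len(a),               # answer length in words
        "distinct_answer_words": len(set(a)),  # number of different words in the answer
        "shared_words": len(set(q) & set(a)),  # words in both question and answer
    }


print(text_features(["what", "is", "the", "offside", "rule"],
                    ["the", "offside", "rule", "means", "the", "attacker", "is", "ahead"]))
# {'answer_length': 5, 'distinct_answer_words': 5, 'shared_words': 2}
```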

The model that predicts the quality of a recommended answer provider's answer to a newly raised question (that is, the answer provider expertise model) is built as follows. First, the maximum entropy model is trained: the features of the answers already given by answer providers are extracted and calculated, the quality of these answers is labelled manually, and the training samples are generated. The parameters of the maximum entropy model are then estimated using the GIS algorithm, which yields the weights of all features in the model and completes the construction of the answer-quality model. Next, the quality $Q(u_{i}, q)$ of each answer already given by the answer provider is calculated with this model. Finally, the quality $Q(u_{i}, q_{r})$ of the answer provider's response to a newly raised question is predicted with Equation (7), using the quality of the provider's previous answers and the similarity between the newly raised and already-answered questions.

3.4. Recommendation algorithm

The recommendation algorithm developed in this article jointly considers the degree of answer provider interest in a question and the degree of match between answer provider expertise and the topic:

UR (u_{i}, q_{r}) = (1 - α) I (u_{i}, q_{r}) + α Q (u_{i}, q_{r})

(9)

where $α \in [0, 1]$ is the weight coefficient. As the value of $α$ increases, the weight of the answer provider’s expertise in recommendation grading increases; and as the value of $α$ decreases, the weight of the answer provider’s interest increases.
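Equation (9) combines a query likelihood, which is typically a very small number, with a quality estimate. The paper does not state whether the two scores are normalized before they are combined; the Python sketch below assumes min-max normalization of each score across the candidates, which is an assumption of this illustration. It takes finite log-likelihood interest scores (that is, $\lambda > 0$) and predicted qualities for each candidate, and returns the top-N answer providers. The names and numbers are hypothetical.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; all-equal scores map to 0 (assumption)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {u: 0.0 for u in scores}
    return {u: (s - lo) / (hi - lo) for u, s in scores.items()}


def recommend(interest: Dict[str, float], quality: Dict[str, float],
              alpha: float = 0.4, top_n: int = 10) -> List[str]:
    """Eq. (9): UR = (1 - alpha) * I + alpha * Q, on min-max normalized scores."""
    i_norm, q_norm = min_max(interest), min_max(quality)
    ur = {u: (1 - alpha) * i_norm[u] + alpha * q_norm[u] for u in interest}
    return sorted(ur, key=ur.get, reverse=True)[:top_n]


# Hypothetical candidates: log-likelihood interest scores and predicted answer quality.
interest = {"u1": -3.0, "u2": -5.0, "u3": -4.0}
quality = {"u1": 0.2, "u2": 0.9, "u3": 0.8}
print(recommend(interest, quality, alpha=0.0))  # ['u1', 'u3', 'u2']
print(recommend(interest, quality, alpha=0.4))  # ['u3', 'u1', 'u2']
```

With $\alpha = 0$ the ranking is driven by interest alone and u1 comes first; with $\alpha = 0.4$, u3, whose interest is moderate but whose predicted quality is high, moves to the top.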

4. Experimental results and analysis

4.1. Experimental data

The experimental data were automatically extracted from Baidu Knows by our programs. The Q&A data were crawled over a half-month period (1–15 January 2010), yielding a total of 1,017,461 valid data items (only questions with answers). Statistical information on the data is shown in Table 2 and Figures 3 and 4.

Figure 3.

Distribution of the number of answers per answer provider.

Figure 4.

Comparison of the number of answers per answer provider and the total number of answers in special region.

Table 2.

Statistical information on the crawled data set

Total number of users who asked or answered questions	Number of questions	Number of answers
1,160,723 (number of users who asked questions is 704,595, number of users who answered questions is 604,615)	1,017,461	2,992,870

Figures 3 and 4 show that most users in a Q&A community rarely answer questions. In the data, more than 270,000 users (more than 45% of the users who answered questions) answered only once, and their answers account for only 9.10% of all answers. By contrast, users who answered more than 20 times make up only 3.66% of the users who answered questions, yet they produced 40.62% of all answers.

Because most answer providers in the data set addressed very few questions (more than 45% answered only once), their histories are too sparse to learn their interests or construct their expertise models. Moreover, the answer providers recommended by the system should be active users with reasonable answering records. The data set is therefore restricted to answer providers who have answered more than 20 questions, and the experimental data are drawn from the Q&A records of these answer providers. Information on the data set after screening is shown in Table 3.

Table 3.

Statistical information on the experimental Q&A data set

Number of answer providers	Number of questions	Number of answers
21,140	525,627	1,215,831

The experimental data from 1–10 January 2010 were used as training data to construct the answer provider models, and the data from the last five days (11–15 January 2010) were used as test data to evaluate the proposed recommendation algorithm.
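The screening and date-based split can be sketched in Python as follows. The record layout (a dict with ‘provider’ and ‘date’ keys) is hypothetical, and counting answers over the full 15-day crawl before splitting is an assumption, since the text does not say over which period the ‘more than 20 questions’ threshold was applied.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Tuple

Record = Dict[str, object]  # hypothetical layout: {"provider": str, "date": date, ...}


def screen_and_split(records: List[Record], min_answers: int = 20,
                     last_train_day: date = date(2010, 1, 10)
                     ) -> Tuple[List[Record], List[Record]]:
    """Keep providers with more than `min_answers` answers; split by answer date."""
    counts = Counter(r["provider"] for r in records)
    active = {u for u, c in counts.items() if c > min_answers}
    kept = [r for r in records if r["provider"] in active]
    train = [r for r in kept if r["date"] <= last_train_day]  # 1-10 January 2010
    test = [r for r in kept if r["date"] > last_train_day]    # 11-15 January 2010
    return train, test


# Hypothetical records: provider "a" answers 21 times, provider "b" only 5 times.
records = [{"provider": "a", "date": date(2010, 1, 1 + i % 15)} for i in range(21)]
records += [{"provider": "b", "date": date(2010, 1, 3)} for _ in range(5)]
train, test = screen_and_split(records)
print(len(train), len(test))  # 16 5
```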

4.2. Evaluation method

Two basic evaluation measures, precision and recall, were used to evaluate the proposed algorithm. Precision is the proportion of highly relevant answer providers amongst all recommended answer providers:

precision = \frac{| X \cap Y |}{| X |}

(10)

where $X$ is the recommendation result and $Y$ is the set of highly relevant answer providers.

Recall is the proportion of highly relevant answer providers that are recommended:

recall = \frac{| X \cap Y |}{| Y |}

(11)

Recall computed over the first N results, that is, the proportion of highly relevant answer providers contained in the top N recommendations, is denoted R–P@N.
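Equations (10) and (11) and R–P@N can be sketched in Python as follows, treating the recommendation list X and the set Y of highly relevant answer providers as collections of provider IDs. The function names and the convention of returning 0.0 for empty sets are assumptions of this sketch.

```python
from typing import Iterable, List


def precision(recommended: Iterable[str], relevant: Iterable[str]) -> float:
    """Eq. (10): |X ∩ Y| / |X|."""
    x, y = set(recommended), set(relevant)
    return len(x & y) / len(x) if x else 0.0


def recall(recommended: Iterable[str], relevant: Iterable[str]) -> float:
    """Eq. (11): |X ∩ Y| / |Y|."""
    x, y = set(recommended), set(relevant)
    return len(x & y) / len(y) if y else 0.0


def recall_at_n(ranked: List[str], relevant: Iterable[str], n: int) -> float:
    """R-P@N: recall computed over the first N recommendations."""
    return recall(ranked[:n], relevant)


# Hypothetical ranked recommendations and ground truth for one test question.
ranked = ["u3", "u7", "u1", "u9", "u2"]
relevant = {"u1", "u2", "u5", "u8"}
print(precision(ranked, relevant), recall(ranked, relevant), recall_at_n(ranked, relevant, 3))
# 0.4 0.5 0.25
```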

However, because precision does not consider result rank, it cannot satisfy our goal of evaluating the overall recommended results. Thus, average precision is calculated instead:

avgprec = \frac{1}{K} \sum_{i = 1}^{K} \frac{i}{r_{i}}

(12)

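where $K$ is the number of highly relevant answer providers, and $r_{i}$ is the rank at which the $i$-th highly relevant answer provider appears in the recommendation list. If it does not appear, $r_{i}$ is taken to be infinite, so its term contributes zero.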

Because the experiment involves multiple test questions, the mean average precision (MAP) over all test questions is used as the precision measure for the recommendation algorithm. In addition, the mean reciprocal rank (MRR) of the first ‘correct result’ in the recommendation list is used. This measure reflects how many answer providers must be examined before the first highly relevant answer provider is reached.
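The following Python sketch computes average precision, MAP, and MRR. It reads Equation (12) in the usual way, with $r_{i}$ the rank of the $i$-th highly relevant answer provider in the recommendation list, $K$ the number of highly relevant providers, and a provider missing from the list contributing 0 (since $i/\infty = 0$). This interpretation, the function names, and the toy data are assumptions of the sketch.

```python
from typing import Iterable, List, Sequence, Tuple


def average_precision(ranked: List[str], relevant: Iterable[str]) -> float:
    """Eq. (12) with r_i = rank of the i-th relevant provider; missing ones add 0."""
    rel = set(relevant)
    if not rel:
        return 0.0
    hits, total = 0, 0.0
    for rank, u in enumerate(ranked, start=1):
        if u in rel:
            hits += 1
            total += hits / rank  # i / r_i
    return total / len(rel)       # K = number of highly relevant providers


def reciprocal_rank(ranked: List[str], relevant: Iterable[str]) -> float:
    """1 / rank of the first highly relevant provider (0 if none is recommended)."""
    rel = set(relevant)
    for rank, u in enumerate(ranked, start=1):
        if u in rel:
            return 1.0 / rank
    return 0.0


def map_and_mrr(runs: Sequence[Tuple[List[str], Iterable[str]]]) -> Tuple[float, float]:
    """MAP and MRR over (ranked list, relevant set) pairs, one per test question."""
    aps = [average_precision(r, y) for r, y in runs]
    rrs = [reciprocal_rank(r, y) for r, y in runs]
    return sum(aps) / len(aps), sum(rrs) / len(rrs)


# Hypothetical results for two test questions.
runs = [(["u3", "u1", "u7", "u2"], {"u1", "u2", "u5"}),
        (["u4", "u6"], {"u6"})]
print(map_and_mrr(runs))
```

For the first question, the relevant providers appear at ranks 2 and 4 and one is missing, giving $(1/2 + 2/4)/3 = 1/3$; the second gives $1/2$, so MAP is $5/12$ and MRR is $1/2$.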

4.3. Experimental results

In the experiment, the historical answers already given by answer providers were first segmented using ICTCLAS1.0² and Lucene³. The answer provider interest and expertise models were then constructed, and the following methods were tested and compared:

the baseline model (BM), which is the recommendation algorithm for constructing the answer provider interest model based on PLSA;

the interest-based model (IM), which is the recommendation algorithm that constructs the answer provider interest model with the query likelihood language model; and

IM-expertise, which is the recommendation algorithm that additionally considers answer provider expertise.

For IM, Equation (6) was used to score answer providers and select the recommended ones. The results for IM are shown in Figure 5, with $λ$ varied from 0 in steps of 0.1.

Figure 5.

Recommendation results of the IM algorithm.

Figure 5 shows that incorporating data smoothing ($λ > 0$) improves the model: when $λ$ increases from 0 to 0.1, the recommendation effect visibly improves. However, beyond a certain threshold, increasing the weight of the background data no longer improves the recommendations; instead, the model effect decreases. When $λ = 1$, that is, when answer provider individuality is ignored, the recommendation effect is poor. These results suggest that smoothing with background data alleviates the problems caused by data scarcity and improves the model. However, as the background weight increases, the information on answer provider individuality is gradually overshadowed, degrading the system. This result is consistent with practical experience and expectation, because the most important feature of a recommendation system is personalization: when the weight of background data increases, the differences between answer providers are eliminated, and recommendations based on these differences become poor. Figure 5 shows that the IM algorithm performs best when $λ = 0.2$.

Table 4 compares IM with BM, the recommendation algorithm that uses PLSA to construct the answer provider interest model. IM matches BM in MAP, improves MRR and R–P@5, and is slightly lower in R–P@10. The weakness of BM arises primarily because it relies on the proportion of the word-frequency sum of answer provider responses rather than of the overall responses; the scarcity of answer data and the prevalence of spam make the learned answer provider interests inaccurate, which lowers the validity of the recommendation results.

Table 4. Recommendation results of the BM and IM algorithms

Model	MAP	MRR	R–P@5	R–P@10
BM	1.55%	2.89%	1.61%	2.33%
IM ($λ = 0.2$)	1.55%	3.04%	1.68%	2.25%
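For reference, the metrics reported in Tables 4 and 5 can be computed as in the following sketch. MAP and MRR follow their standard definitions, with the users who actually answered a question treated as the relevant set. The section does not define R–P@k; because the text refers to these columns as recall, the sketch assumes that R–P@k is recall at cutoff k (the fraction of actual answer providers that appear in the top k). That reading, like the function names, is an assumption.

```python
def average_precision(ranked, relevant):
    """Average precision for one question.

    ranked:   recommended user IDs, best first.
    relevant: set of users who actually answered the question.
    """
    hits, precision_sum = 0, 0.0
    for rank, user in enumerate(ranked, start=1):
        if user in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first actual answerer, or 0 if none is recommended."""
    for rank, user in enumerate(ranked, start=1):
        if user in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked, relevant, k):
    """ASSUMED reading of R-P@k: share of actual answerers found in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def evaluate(runs, k_values=(5, 10)):
    """Average the metrics over runs = [(ranked_list, relevant_set), ...]."""
    n = len(runs)
    scores = {
        "MAP": sum(average_precision(r, rel) for r, rel in runs) / n,
        "MRR": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }
    for k in k_values:
        scores[f"R-P@{k}"] = sum(recall_at_k(r, rel, k) for r, rel in runs) / n
    return scores
```

For example, `evaluate([(["a", "b", "c", "d"], {"b", "d"}), (["x", "y"], {"x"})], k_values=(1, 5))` averages per-question scores, so a question whose first actual answerer appears at rank 2 contributes 0.5 to MRR.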

To examine how answer provider expertise affects the recommendation results, expertise was considered together with answer provider interest in the question, yielding the IM-expertise recommendation algorithm [Equation (9)]. The results are shown in Figure 6.

Figure 6. Recommendation results of IM-expertise.

Incorporating the answer provider expertise factor further improves the accuracy of the recommendations. The algorithm performs best at $α = 0.4$, where its MAP and MRR are 20.1% and 29.7% higher, respectively, than those of IM. Recall, however, is somewhat lower. A plausible explanation is that, once expertise is incorporated, the algorithm favours answer providers with expertise in the topic of the question. Answer providers who give high-quality answers therefore obtain higher ranks (in the experiment, for example, answer providers who actually answered the question and have high answer adoption rates move up the ranking), which increases MAP and MRR. However, a question typically receives many actual answers, and it is unlikely that all of the corresponding answer providers have high adoption rates. Some actual answer providers who were recommended under the interest model alone are therefore no longer recommended once the quality factor is considered, which explains the decline in recall. Overall, this analysis shows that the expertise factor improves the effectiveness of the algorithm. When recommending answer providers, answer quality should be taken into account and providers with expertise in the relevant topics should be preferred, which improves question-solving efficiency and reduces the number of invalid answers in the system.
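The trade-off described above can be illustrated with a hypothetical form of Equation (9). The sketch assumes a linear interpolation, $\text{score}(u) = (1 - α) \cdot \text{interest}(u) + α \cdot \text{expertise}(u)$, with both factors min-max normalised over the candidate answer providers; the actual Equation (9) may differ. Interest could be the smoothed log-likelihood from IM and expertise a per-provider answer-quality estimate such as an adoption rate; the user names and scores below are invented for illustration.

```python
def min_max(scores):
    """Rescale a dict of scores to [0, 1]; constant scores map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {u: 0.0 for u in scores}
    return {u: (s - lo) / (hi - lo) for u, s in scores.items()}


def combined_ranking(interest, expertise, alpha):
    """HYPOTHETICAL form of Equation (9):
    score(u) = (1 - alpha) * interest(u) + alpha * expertise(u),
    with both factors min-max normalised over the candidates."""
    i_norm, e_norm = min_max(interest), min_max(expertise)
    score = {u: (1 - alpha) * i_norm[u] + alpha * e_norm[u] for u in interest}
    return sorted(score, key=score.get, reverse=True)


# Invented example: interest as IM log-likelihoods, expertise as adoption rates.
interest = {"dave": -2.0, "erin": -2.5, "frank": -4.0}
expertise = {"dave": 0.1, "erin": 0.8, "frank": 0.6}

for alpha in (0.0, 0.4, 1.0):
    print(alpha, combined_ranking(interest, expertise, alpha))
```

With $α = 0$ the most interested provider, dave, ranks first; at $α = 0.4$ erin, whose answers are of higher quality, overtakes him; and at $α = 1$ dave falls to last. This illustrates how weighting expertise can move an interested but low-quality answerer down the list.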

Overall (Table 5), the IM-expertise algorithm improves MAP and MRR by 20.1% and 29.7%, respectively, over the IM model, at the cost of lower recall (R–P@5 and R–P@10 decrease by 6% on average). This indicates that a recommendation algorithm that combines answer provider interest and expertise achieves better overall results in recommending answer providers.

Table 5. Recommendation results of IM and IM-expertise

Model	MAP	MRR	R–P@5	R–P@10
IM ($λ = 0.2$)	1.55%	3.04%	1.68%	2.25%
IM-expertise ($α = 0.4$)	1.86%	3.95%	1.63%	2.04%
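The percentages quoted in the text can be checked against Table 5 with a few lines of arithmetic. Recomputed from the rounded table entries, the MAP and MRR gains come to about 20.0% and 29.9%; the small differences from the reported 20.1% and 29.7% presumably arise from unrounded scores, which is an assumption. The two R–P@k columns fall by about 3.0% and 9.3%, an average decline of about 6.2%, in line with the roughly 6% stated above.

```python
# MAP, MRR and R-P@k values from Table 5, in percent.
im = {"MAP": 1.55, "MRR": 3.04, "R-P@5": 1.68, "R-P@10": 2.25}
im_expertise = {"MAP": 1.86, "MRR": 3.95, "R-P@5": 1.63, "R-P@10": 2.04}


def relative_change(new, old):
    """Relative change of new with respect to old, in percent."""
    return (new - old) / old * 100


changes = {m: relative_change(im_expertise[m], im[m]) for m in im}
for metric, delta in changes.items():
    print(f"{metric}: {delta:+.1f}%")

# Mean relative decline of the two R-P@k columns.
avg_rp_decline = -(changes["R-P@5"] + changes["R-P@10"]) / 2
print(f"average R-P decline: {avg_rp_decline:.1f}%")
```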

5. Conclusion

This article presented an algorithm for recommending answer providers that takes into account several factors, namely answer provider interest and expertise. The algorithm aims to reduce waiting time and to help answer providers in the Q&A community find and address questions that they are interested in. Experimental results on the Baidu Knows Q&A community show that taking the answer provider expertise factor into consideration significantly improves the accuracy of the recommendation algorithm.

Future work will incorporate answer provider availability and load balancing into the current algorithm, because users may be highly active for a certain period and then gradually participate less in asking and answering questions. By analysing answer provider availability, the probability that answer providers will address recommended questions within a given period can be estimated, which prevents the system from recommending inactive answer providers. In addition, recommendations can be balanced to reduce the workload of the most active answer providers, which may further improve the efficiency with which questions are addressed.

Acknowledgements

This work is supported by the National Key Technology R&D Program (no. 2008BAH24B03), the National Natural Science Foundation of China (no. 61003254), the Zhejiang Provincial Natural Science Fund of China (no. Y1080130) and the Fundamental Research Funds for the Central Universities.

References

1. Jeon, Croft, Lee. Finding similar questions in large question and answer archives. In: Proceedings of the 14th ACM international conference on information and knowledge management. Bremen, Germany: ACM, 2005, pp. 84–90.

2. Wang, Ming, Chua. A syntactic tree matching approach to finding similar questions in community-based QA services. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. Boston, MA: ACM, 2009, pp. 187–194.

3. Cong, Wang, Lin, Song, Sun. Finding question–answer pairs from online forums. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval. Singapore: ACM, 2008, pp. 467–474.

4. Wang, Feng, Zhang. Ranking community answers by modeling question-answer relationships via analogical reasoning. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. Boston, MA: ACM, 2009, pp. 179–186.

5. Zhang. G-Finder: routing programming questions closer to the experts. In: Proceedings of the ACM international conference on object oriented programming systems languages and applications. Reno/Tahoe, NV: ACM, 2010, pp. 62–73.

6. Zhou, Cong, Cui, Jensen, Yao. Routing questions to the right users in online communities. In: Proceedings of the 2009 IEEE international conference on data engineering. IEEE Computer Society, 2009, pp. 700–711.

7. Guo, Bao. Tapping on the potential of Q&A community by recommending answer providers. In: Proceedings of the 17th ACM conference on information and knowledge management. Napa Valley, CA: ACM, 2008, pp. 921–930.

8. Hofmann. Probabilistic latent semantic indexing. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval. Berkeley, CA: ACM, 1999, pp. 50–57.

9. Ponte, Croft. A language modeling approach to information retrieval. In: Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval. Melbourne, Australia: ACM, 1998, pp. 275–281.

10. Chen, Goodman. An empirical study of smoothing techniques for language modeling. In: Proceedings of the 34th annual meeting of Association for Computational Linguistics. Santa Cruz, CA: Association for Computational Linguistics, 1996, pp. 310–318.

11. Jelinek, Mercer. Interpolated estimation of Markov source parameters from sparse data. In: Proceedings of the workshop on pattern recognition in practice. Amsterdam, the Netherlands, 1980, pp. 381–397.

12. John, Chua, Goh. What makes a high-quality user-generated answer? IEEE Internet Computing 2011; 15(1): 66–71.

13. Zhu, Bernhard, Gurevych. A multi-dimensional model for assessing the quality of answers in social Q&A sites. Technical report TUD-CS-2009-0158, UKP Lab, Technische Universität Darmstadt, 2009.

14. Yaqian. Maximum entropy method and its applications in natural language processing [PhD thesis]. Shanghai: Fudan University, 2004.

15. Jeon. A framework to predict the quality of answers with non-textual features. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval. Seattle, WA: ACM, 2006, pp. 228–235.