Abstract
The proliferation of knowledge-sharing communities has generated large amounts of data. Prominent examples of how user-generated content can be harnessed include IBM’s Watson question answering system and Apple’s Siri, the question answering application on iPhones. Given such massive data, user authority ranking is important to the development of question answering and other e-commerce services. In this study, we propose three probabilistic models to rank user authority for each question. Compared with existing approaches, which focus primarily on the relationships between users, our method is more effective because it considers the link structure and the topical similarities between users and questions simultaneously. We conduct experiments on a real-world dataset from Zhihu, a popular community question answering website in China. Experimental results show that our models outperform the baseline methods in ranking user authority.
Introduction
Question Answering (QA) systems involve different techniques for question analysis, answer extraction, answer presentation, and so forth. The successes of IBM’s Watson and Apple’s Siri have highlighted QA research and commercialization opportunities [2]. For example, Community Question Answering (CQA) websites, including Quora (www.quora.com), Stackoverflow (stackoverflow.com), Yahoo! Answers (answers.yahoo.com) and Zhihu (zhihu.com), have become extremely popular in recent years. On CQA websites, users can raise questions, read the answers to these questions, and answer questions posted by others. Interestingly, many CQA websites have strong social characteristics, which distinguish them from traditional ones such as Baidu Knows (zhidao.baidu.com). Taking Zhihu as an example, the content that user u reads on his/her home page depends on the people that u has followed (i.e., the “friends” of u). Therefore, a user is able to see the questions from all his/her friends, their answers and their “liked” answers. By allowing interaction between users, CQA websites have accumulated millions of questions and their answers over time [3].
Although CQA websites such as Yahoo! Answers have attracted many users to contribute knowledge, they bring several new challenges to providing high-quality services [4]: (1)
Traditional methods of ranking user authority are based primarily on the relationship between users [5, 6], which is insufficient to obtain a personalized ranking list for each question. For instance, a movie star may have many followers and achieve a high ranking if we rank user authority based only on the relationship between users. However, he/she may not be an appropriate person to answer questions about computers. We here propose a new framework based on the link structure and the topical similarities between users and questions, which yields a personalized user authority ranking for each question. In our framework, all answers posted by a user are considered as a document, and the description of a question together with all its answers is also treated as a document; we then apply latent Dirichlet allocation (LDA) [7] to extract topics from both users and questions. After generating these topics, we can measure the topical similarities between questions and answers, and rank users by the relationship between users and the topical similarity. The main contributions of this work are summarized as follows. First, unlike user authority ranking using the topic-sensitive transition matrix [1], we propose two methods of ranking user authority with a topic-independent matrix. Second, we further compare the above models in terms of both performance and efficiency.
The rest of the paper is organized as follows. In Section 2, we introduce the related work on user authority ranking, similar question recommendation, topic-sensitive expert recommendation and question answering systems. In Section 3, the proposed framework is described in detail. In Section 4, we describe the dataset, evaluation metrics and experimental results. We conclude the paper and outline future work in Section 5.
Related work
In this section, we firstly focus on studies on user authority ranking by relationships, similar question recommendation, and topic-sensitive expert recommendation, which will shed light on topic-sensitive user authority ranking for CQA websites. To obtain a holistic view of our research question in QA systems domain, we also give a general review of the existing techniques and models for QA systems.
User authority ranking by relationships
Existing ranking algorithms compute user authority on CQA websites primarily from the relationship between users. Bouguessa et al. [8] proposed a method to find experts based on the best answers that a user has posted. Jurczyk and Agichtein [9] applied the HITS algorithm to rank user authority by the followed-following relationship. Zhang et al. [10] proposed an algorithm based on the users’ specialty. Although algorithms that analyze the relationship between users have achieved the desired results, some challenging problems are difficult to solve with them, e.g., a discovered expert may not give a satisfactory answer in a field in which he/she is not skilled. Thus, some other algorithms based on topics were proposed. Guo et al. [11] proposed an algorithm to explore the similarity between users and askers by the tags of users. Liu et al. [12] proposed a probabilistic language model to predict the best answerer of questions.
Similar question recommendation
For similar question recommendation, Jeon et al. [13] proposed a statistical approach that explored the semantic features to measure the question similarity. Li et al. [14] designed a strategy via machine translation rather than the simple cosine similarity approaches. Hao et al. [15] developed a pattern-based algorithm by seed patterns and a semi-supervised approach. Wu et al. [16, 17] proposed an approach that exploited both the user interest and feedback, in which the historical data was utilized to derive the user interest. These studies showed the benefits of using topic model over the traditional methods. Our method is similar to some of these works in that we explore the semantics of questions using topic models. However, our method also considers the link structure of users, which was omitted in most similar question recommendation methods.
Topic-sensitive expert recommendation
Recently, some algorithms of expert recommendation have been proposed to consider both the relationship between users and the similarity of topics. TwitterRank [18] is a typical algorithm developed to estimate the influence of Twitter users based on the followed-following relationship and topical similarities. Zhou et al. [19] proposed a TSPR algorithm for expert recommendation on CQA websites. The algorithm first employed LDA to extract the topics of users, and then recommended experts by the number of answers a user posted and the similarity between the user and the asker [20]. Zhao et al. [21] designed a method to model experts and topics jointly by incorporating each user’s contribution dynamically. Chen et al. [22] developed a rating system of user reputation based on user comments. Pal et al. [23] proposed an algorithm based on Gaussian mixture models to identify topical authorities in microblogs. Liu et al. [24] proposed a new topic model for expert recommendation on CQA websites.
Different from these algorithms, our method considers both the link structure of users and the topical similarity between users and questions. To the best of our knowledge, this is the first work to consider the topical similarity between users and questions for user authority ranking on CQA websites.
Question answering system
QA systems have been extensively studied in recent years. In 2007, IBM Research undertook a challenge to build a computer system named Watson that could compete at the human champion level in real time on the American TV quiz show Jeopardy! [25]. Watson performed at human expert levels in terms of precision, confidence, and speed on the show, which strongly suggested that DeepQA is an effective and extensible architecture in the field of QA [25]. To alleviate the problem that the typical triple is not a faithful representation of the semantic structure of a natural language question, Unger et al. [26] presented a novel approach that relies on a parse of the question to produce a SPARQL template that directly mirrors the internal structure of the question.
More recently, with the rapid growth of open, distributed and structured semantic data on the Internet, how to address the question answering task over RDF data has been a popular research topic [27]. Marx et al. [28] presented a modular and extensible open-source question answering framework to tackle the problems such as imposing minimal hurdles to their users, retrieving the desired data, and developing and evaluating question answering systems. Furthermore, Zou et al. [29] proposed a systematic framework to answer natural language questions over RDF repository from a graph data-driven perspective, which adopted a semantic query graph to model the query intention in the natural language question in a structural way. The significant contributions of this research were (i) the RDF question answering has been reduced to the general subgraph matching problem; and (ii) the ambiguity of natural language questions has been resolved at the time when matches of query are found [29].
Another popular research direction is to generalize the conventional text-based question answering to multimedia objects (named as visual/image question answering). Specifically, given an image and a natural language question about the image, the task is to provide an accurate natural language answer [30]. Ren et al. [31] proposed to use neural networks and visual semantic embeddings, without intermediate stages such as object detection and image segmentation, to predict answers to simple questions about images. Furthermore, Malinowski et al. [32] attempted to address this problem by combining the Convolutional Neural Networks (CNN) for visual recognition and Recurrent Neural Networks (RNN) for sequence modeling.
Proposed models
The proposed Topical User Authority Ranking (TUAR) models rank user authority for each question by exploiting the relationship between users and the topical similarities between users and questions. Conventionally, the more “likes” a user receives, the higher authority he/she will achieve, which is similar to the ranking approach for websites. It is important to note that the weight of each “like” given to an answer differs from that of the others. Assume that both user B with higher authority and user C with lower authority give “likes” to user A; the “like” from user B often improves the authority of user A more than the one from user C. Therefore, we compute the influence of users based on the iterative method of PageRank [5]. In addition, as each question may involve various topics and users are familiar with different topics, we take the similarities among users and topics into consideration.
Topic extraction of users
We apply LDA [7] to extract the topics of users from the combined text of their answers. LDA is an unsupervised topic extraction model based on the bag-of-words assumption. It treats each text as a vector in which the value of each dimension is the number of times a word appears in the text. Each text can be expressed as a probability distribution over a series of topics, and each topic can be expressed as a probability distribution over a series of words. LDA is a natural model for topic extraction from long text, in which the probability distribution of topics for each text and of words for each topic can be estimated by the Gibbs sampling algorithm [33]. The detailed process of user topic extraction in our TUAR models is as follows. Firstly, we consider all answers posted by each user as a text, so that a one-to-one mapping between a user and its text is established. Meanwhile, all answers to a question are treated as another text, so that there is also a one-to-one mapping between a question and its text. Secondly, we use LDA to train on the texts of all users and questions, in which the probability distribution of the topics corresponding to each text, θ, and the probability distribution of the words corresponding to each topic, φ, can be estimated. Thirdly, we keep θ unchanged and carry out Gibbs sampling with only the user texts as input. Finally, we get a new φ_u and let UZ = φ_u, which represents the topic distribution of a text. The definition of UZ is given below.
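The topic extraction step can be illustrated with a small collapsed Gibbs sampler. This is a minimal sketch under simplifying assumptions, not the implementation used in the experiments: the function name `lda_gibbs`, the toy documents and the short iteration count are hypothetical, and the second sampling pass restricted to user texts is omitted, so the per-document topic distributions are read off directly after one training run.

```python
import random

def lda_gibbs(docs, num_topics, alpha, beta, iters, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dz = [[0] * num_topics for _ in docs]        # topic counts per document
    n_zw = [[0] * V for _ in range(num_topics)]    # word counts per topic
    n_z = [0] * num_topics                         # total word count per topic
    assign = []
    # Random initial topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            n_dz[d][z] += 1
            n_zw[z][wid[w]] += 1
            n_z[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z, v = assign[d][i], wid[w]
                # Remove the current assignment, then resample it.
                n_dz[d][z] -= 1
                n_zw[z][v] -= 1
                n_z[z] -= 1
                weights = [(n_dz[d][k] + alpha) * (n_zw[k][v] + beta) / (n_z[k] + V * beta)
                           for k in range(num_topics)]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                n_dz[d][z] += 1
                n_zw[z][v] += 1
                n_z[z] += 1
    return [[(n_dz[d][k] + alpha) / (len(doc) + num_topics * alpha)
             for k in range(num_topics)]
            for d, doc in enumerate(docs)]

# Hypothetical toy data: one token list per user (all answers) and per question.
user_docs = {"u1": ["python", "code", "bug", "python"], "u2": ["movie", "actor", "film"]}
question_docs = {"q1": ["python", "bug", "code"]}
docs = list(user_docs.values()) + list(question_docs.values())
Z = 2
dist = lda_gibbs(docs, num_topics=Z, alpha=50 / Z, beta=0.05, iters=50)
UZ = dict(zip(user_docs, dist[:len(user_docs)]))   # user -> topic distribution
```

Each row of `UZ` sums to one and plays the role of a user's topic distribution. With so few tokens and the prior α = 50/Z, the toy rows stay close to uniform; the sketch only shows the mechanics.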
User authority transition matrix
In existing studies on social networks, the followed-following relationship between users [18] and the number of answers [19, 24] can be employed to generate the user authority transition matrix via the iterative approach [34]. Different from traditional social networks, CQA websites such as Zhihu introduce an approval mechanism. Thus, we can generate the user authority transition matrix based on the following principles. First, the more “likes” a user receives, the higher authority he/she will achieve. Second, the weights of “likes” given by different users vary, i.e., the approval of an expert in a certain field improves the authority, in this field, of the user whom he/she gives “likes” to more strongly. Third, a “like” given by a user who seldom approves receives a larger weight than one given by a user who often delivers “likes”. Therefore, we consider the user’s authority ranking as a Markov Chain [34], and the transition matrix is shown as follows:
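A minimal sketch of how such a matrix could be built from raw “like” counts is given here. It assumes that each row is normalized by the total number of “likes” the giver has delivered, which reflects the third principle; the similarity weighting between users is not included, and the function name and the toy counts are hypothetical.

```python
def like_transition_matrix(likes, users):
    """likes[(i, j)] is the number of "likes" user i gave to answers of user j.

    Returns a matrix P where P[i][j] is the share of user i's "likes" that went
    to user j (rows of users who gave no "likes" stay all zero).
    """
    index = {u: k for k, u in enumerate(users)}
    n = len(users)
    P = [[0.0] * n for _ in range(n)]
    for (giver, receiver), count in likes.items():
        P[index[giver]][index[receiver]] += count
    for row in P:
        total = sum(row)
        if total > 0:
            for k in range(n):
                row[k] /= total
    return P

users = ["A", "B", "C"]
likes = {("B", "A"): 1, ("C", "A"): 3, ("C", "B"): 1}
P = like_transition_matrix(likes, users)
```

In this toy example user B gives only one “like”, so all of B's weight goes to A, whereas each “like” from the more generous user C carries a quarter of C's weight.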
Another way to measure the similarity between user i and user j is to employ the normalized Euclidean distance between UZ_i and UZ_j, as follows:
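Both similarity measures between two users' topic distributions can be sketched in a few lines. The cosine form is standard; the normalization of the Euclidean distance is an assumption made for illustration (division by √2, the largest possible distance between two probability vectors, followed by subtraction from one), and both function names are hypothetical.

```python
import math

def cosine_similarity(p, q):
    """Cosine of the angle between two topic distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm > 0 else 0.0

def euclidean_similarity(p, q):
    """Assumed normalization: 1 - d / sqrt(2), so identical vectors score 1."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return 1.0 - d / math.sqrt(2.0)

UZ_i = [0.7, 0.2, 0.1]
UZ_j = [0.6, 0.3, 0.1]
sim_cos = cosine_similarity(UZ_i, UZ_j)
sim_euc = euclidean_similarity(UZ_i, UZ_j)
```

Under this assumption both measures lie in [0, 1] for probability vectors, which keeps the two topic-independent variants comparable.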
We also use a topic-sensitive transition matrix to rank the user’s authority [1], as follows:
The similarity function in the topic-sensitive transition matrix is:
In Section 3.2, we get the user authority transition matrix iteratively. Next, TUAR models take the approval relationship between users and the topical similarity into account to compute the authority ranking of users in topic z:
After convergence, we get the final result of the user authority ranking for each topic.
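The iteration can be sketched as a PageRank-style power method with a topic-specific teleportation vector. The update rule is an assumption in the spirit of topic-sensitive PageRank, not a transcription of the model's equation: authority flows along the transition matrix with damping factor λ, and the remaining mass is redistributed according to the users' weights on topic z. The function name and the toy inputs are hypothetical.

```python
def topical_rank(P, teleport, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration; P[i][j] is the share of authority user i passes to user j."""
    n = len(P)
    total = sum(teleport)
    e = [t / total for t in teleport]      # teleportation distribution for the topic
    rank = e[:]
    for _ in range(max_iter):
        new = [(1 - damping) * e[j] + damping * sum(rank[i] * P[i][j] for i in range(n))
               for j in range(n)]
        # Users with all-zero rows pass nothing on; return that mass via e.
        lost = 1.0 - sum(new)
        new = [x + lost * e[j] for j, x in enumerate(new)]
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank

# Toy transition matrix for users A, B, C (A gave no "likes").
P = [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.75, 0.25, 0.0]]
UZ = [[0.5, 0.5], [0.3, 0.7], [0.2, 0.8]]  # rows: users, columns: topics
UR = [topical_rank(P, [row[z] for row in UZ]) for z in range(2)]  # UR[z][u]
```

Each `UR[z]` is a probability distribution over users for topic z. In the toy data user A receives nearly all approvals and therefore ranks first.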
We consider all answers to a question as a document. Then we apply the LDA model trained in Section 3.1, i.e., we keep θ unchanged and carry out Gibbs sampling again with the question documents as input. Finally, we get φ_q and let QZ = φ_q. The definition of QZ is as follows:
User authority ranking for each question
Since we get matrix QZ (the topic distribution of every question) and UR (the user authority ranking of every topic), the user authority ranking of each question can be estimated by the following Bayes’s rule:
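One way to read this combination is as a marginalization over topics: the score of user u for question q is the sum over topics z of QZ[q][z] · UR[z][u]. The sketch below assumes that form; the function name and the toy values are hypothetical.

```python
def rank_users_for_question(qz, ur, users):
    """qz[z]: topic distribution of one question; ur[z][u]: authority of user u on topic z."""
    scores = [sum(qz[z] * ur[z][u] for z in range(len(qz))) for u in range(len(users))]
    order = sorted(range(len(users)), key=lambda u: scores[u], reverse=True)
    return [(users[u], scores[u]) for u in order]

users = ["A", "B"]
QZ_q = [0.9, 0.1]                 # the question is mostly about topic 0
UR = [[0.7, 0.3],                 # topic 0: A has more authority
      [0.2, 0.8]]                 # topic 1: B has more authority
ranking = rank_users_for_question(QZ_q, UR, users)
```

Because the question is dominated by topic 0, user A is ranked first even though B is the stronger authority on topic 1, which is the personalization effect the models aim for.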
The previous section introduced the details of our models. In this section, we present the dataset and the experimental results.
Dataset
A real-world dataset from Zhihu was employed for the experiment. Zhihu is one of the most popular question answering communities in China. Different from StackOverflow and Yahoo! Answers, we can get all users who “like” an answer from Zhihu, thus the user authority transition matrix can be generated by users’ “like” relationships. According to Equation 1, the user’s authority ranking is modeled as a Markov Chain based on the numbers of “likes”. For instance, the more “likes” user j gives to user i, the higher the influence of user i to user j.
We collected 576 questions raised by 9,043 Zhihu users. The total number of answers is 209,309, which includes all answers to the 576 questions and all answers posted by the 9,043 users. The latter were collected to extract the topics of users, as described in Section 3.1. The dataset was prepared as follows. For each question, its description, the contents of all its answers and the real ranking of all answers were crawled. For each user, the numbers of friends, followers, answers and “likes” received, as well as the contents of all answers he/she posted, were crawled.
We used the Jieba Chinese Text Segmentation (http://github.com/fxsjy/jieba) to perform Chinese word segmentation, and filtered stop words to reduce the vocabulary size.
Parameters
We have several parameters in our models, i.e., the Dirichlet hyper-parameters α and β, the number of topics Z, the damping factor parameter λ used in PageRank. In this paper, we set the Dirichlet priors α = 50/Z, and β = 0.05 as in [35]. We run LDA with 1000 iterations of Gibbs sampling. After trying several different numbers of topics, we empirically set Z = 50. We choose these parameter settings because they give coherent and meaningful topics for our dataset. Table 1 shows top five words of 20 topics generated by LDA.
When computing the user authority ranking of each topic, the damping factor λ is set to 0.85. This value was obtained by optimizing the parameter on the training set (80% of the entire dataset). That is, we further divide the training data into a tuning set (64%) and a validation set (16%), and obtain the optimized setting of λ through 5-fold cross-validation.
Evaluation metrics
To measure the performance of different methods, two evaluation metrics commonly used in information retrieval and ranking were chosen. Mean Reciprocal Rank (MRR): This index is the multiplicative inverse of the rank of the first retrieved expert for each topic. nDCG@K: This index measures the quality of a recommendation system based on the graded relevance of the recommended entities. It varies from 0.0 to 1.0, with 1.0 representing the ideal ranking of all entities. In the definition of nDCG@K, Q is the set of questions, M_{q,j} is the j-th expert generated by method M for question q, score(M_{q,j}) = 2^{v(M_{q,j})} − 1, where v(M_{q,j}) is the ground truth score for the expert M_{q,j}, and IdealScore(K, q) is the ideal ranking score of the top K experts for question q.
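Both metrics can be computed as in the following sketch. It assumes the standard logarithmic position discount log2(j + 1) for nDCG@K and takes, for MRR, the rank at which the relevant expert first appears for each question; the function names and the toy values are hypothetical.

```python
import math

def mean_reciprocal_rank(first_relevant_ranks):
    """Average of 1 / rank over questions; ranks are 1-based."""
    return sum(1.0 / r for r in first_relevant_ranks) / len(first_relevant_ranks)

def ndcg_at_k(ranked_truth_scores, k):
    """ranked_truth_scores[j] is the ground truth score v of the expert ranked at position j + 1."""
    def dcg(values):
        # Gain 2^v - 1, discounted by log2(position + 1) with 1-based positions.
        return sum((2 ** v - 1) / math.log2(j + 2) for j, v in enumerate(values[:k]))
    ideal = dcg(sorted(ranked_truth_scores, reverse=True))   # IdealScore(K, q)
    return dcg(ranked_truth_scores) / ideal if ideal > 0 else 0.0

def mean_ndcg_at_k(per_question_scores, k):
    """Average nDCG@K over the question set Q."""
    return sum(ndcg_at_k(s, k) for s in per_question_scores) / len(per_question_scores)

mrr = mean_reciprocal_rank([1, 2, 4])
perfect = ndcg_at_k([3, 2, 1], k=3)
reversed_order = ndcg_at_k([1, 2, 3], k=3)
```

A ranking that already lists experts in decreasing ground truth order reaches the maximum value, while any other order scores lower.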
Comparison with baselines
As stated in Section 3.2, we have proposed two models of user authority ranking using the topic-independent transition matrix. Of these two methods, the one that estimates the similarity between user i and user j by the normalized cosine similarity is denoted as TUAR1, and the one that uses the normalized Euclidean distance is denoted as TUAR2. The model that uses the topic-sensitive transition matrix is denoted as TUAR3. We compare them with the following baselines: In-degree by number of followers (IDF), In-degree by number of “likes” (IDV), PageRank by number of followers (PRF), PageRank by number of “likes” (PRV), and Topic-Sensitive PageRank (TSPR).
The MRR and nDCG@K of different methods are presented in Figs. 1 and 2, respectively, from which we can observe that the proposed TUAR models outperformed other methods for both metrics. Compared to the baseline methods of IDF, IDV, PRF, PRV and TSPR, the performance of TUAR1 improved by 7.8%, 9.4%, 3.4%, 2.4%, 28.0% in terms of MRR and by 1.9%, 2.8%, 1.0%, 0.8%, 13.1% in terms of nDCG. With respect to the same baselines, the performance of TUAR2 improved by 8.3%, 9.9%, 3.9%, 2.9%, 28.6% in terms of MRR and by 2.0%, 2.8%, 1.0%, 0.8%, 13.1% in terms of nDCG. Compared to the above baseline methods, the performance of TUAR3 improved by 11.1%, 12.8%, 6.6%, 5.6%, 32.0% in terms of MRR and by 2.8%, 3.7%, 1.9%, 1.7%, 14.1% in terms of nDCG. The results indicate that it is effective to consider the topical similarity between questions and users when ranking user authority. Because TUAR1, TUAR2 and TUAR3 performed nearly the same in terms of nDCG, we plot only the values of TUAR3 in Fig. 2 for clarity.
Comparison with topic-sensitive models
The performance and efficiency of our TUAR models are shown in Table 2, in which U is the number of users, Z is the number of topics, and C is the number of iterations. For example, the computational cost of TUAR1 consists of two parts. The first is to extract topics and the second is to generate the user authority transition matrix.
Although TUAR1 and TUAR2 perform worse than TUAR3 in terms of MRR, the topic-independent models (TUAR1 and TUAR2) are much faster than TUAR3. The reason is that the topic-independent models only need to compute the transition matrix once for each pair of users, whereas the topic-sensitive model needs to estimate the transition matrix for each topic. With respect to nDCG, the three models perform nearly the same.
Conclusion
Many promising application areas of question answering systems, including education, health, and defense, have been identified [2]. This paper proposed an effective framework to estimate user authority ranking on CQA social networks, which can be integrated with various multimedia primitive features [36–38] for social media organization [39, 40], sentiment analysis [41–44], emotion detection [45–47] and personalization [48–50] in the big data era. We evaluated the framework on a real-world dataset from Zhihu, and the experimental results verified the effectiveness of our models when compared to other existing methods.
In the future, we plan to design a new probabilistic model and try other recent learning models [51–53] for short text topic extraction when there are few answers to a question. The sparsity of content in short documents brings new challenges to topic modeling and user authority ranking. On the one hand, the frequency of words, which is quite important for modeling lengthy text, plays a limited discriminative role in short documents. On the other hand, inferring topics and ranking user authority from large-scale short documents is becoming a critical task in many areas. It follows that user authority ranking in short text deserves further research.
Acknowledgments
The research has been supported by the National Natural Science Foundation of China (61502545, 61472453, U1401256, U1501252), the Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (the second phase), the Fundamental Research Funds for the Central Universities (46000-31610009), a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (UGC/FDS11/E06/14), and the Internal Research Grant (RG 30/2014-2015) of the Hong Kong Institute of Education. This work has also been supported, in part, by a Strategic Research Grant (Project no. 7004218), and an Applied Research Grant (Project no. 9667095), both of City University of Hong Kong.
