Abstract
Recent years have witnessed a surge of research on the identification of key users in online communities. However, little research has focused on their knowledge diffusion capabilities. The purpose of this study is to propose a new measure for finding key users who perform well in knowledge diffusion, in order to promote users’ active participation in Q&A communities. In particular, this article develops an improved measure that consolidates users’ structural hole positions and knowledge diffusion capability, and evaluates its performance through a field study involving 230,000 users and more than 132 million network relations among them. Our results show that the proposed measure can be used to detect key users who hold structural hole advantages in social networks. In addition, key users detected by the proposed measure generally perform well on nearly all dimensions of knowledge diffusion capability compared with those detected by other measures. Our study entails important theoretical and practical contributions.
1. Introduction
Nowadays, social media have become important platforms for people to share and diffuse knowledge. The strong chain of relationships in social media helps people share knowledge widely and provides an efficient channel for knowledge diffusion. In particular, Question and Answering (Q&A) communities, such as Stack Overflow, Quora and Zhihu, have received wide attention from knowledge management researchers and practitioners. In Q&A communities, users are no longer passive knowledge recipients, but active knowledge disseminators. They are free to publish and disseminate knowledge, and their opinions are affected by their friends in the network. Although these Q&A communities attract a large number of users, only a small fraction of them actively provide a large amount of high-quality knowledge. Finding these key users (defined by their influence [1]) and evaluating their knowledge diffusion ability are critical for the sustainable development of Q&A communities.
Identifying key users is not a trivial task, because the criteria for key users are diverse. It is not possible to find a universal index that best quantifies users’ importance in every situation [2,3]. At present, the methods for detecting key users can roughly be divided into system science methods, which use network connectivity to reflect the functional integrity of the system [4,5]; social network methods, in which a node’s importance is equated with its connections to other nodes [6–8]; and information diffusion methods, which are based on information diffusion theory and epidemic models [9–14]. However, few studies have focused on measuring the knowledge diffusion capability of users in Q&A communities in order to detect key users.
Therefore, this article aims to propose and evaluate a new measure to detect key users based on Burt’s structural holes theory and users’ knowledge diffusion capabilities. First, this article reviews different methods for detecting key users. Key users are defined by their information advantage in Q&A communities according to structural holes theory. Second, this article attempts to discover key users who perform well in knowledge diffusion. Users’ knowledge diffusion capability is related not only to their network relations, but also to their behaviour and feature information in Q&A communities. Accordingly, this article proposes an improved measure consolidating both users’ structural holes and knowledge diffusion capabilities and evaluates its performance in the context of the Zhihu Q&A community.
2. Related work
There has been a surge of research on Q&A communities due to their importance for knowledge diffusion and open-ended complex problem solving [15–17]. Popular topics include the identification of authorities [18,19] and topical experts [20] based on classical algorithms such as PageRank and Hyperlink-Induced Topic Search (HITS), which are widely used for finding key users in social networks. We review these methods in the following subsections.
2.1. Methods for detecting key users
Detecting key nodes is a central focus of social network research. For the question of ‘which nodes are the core or the most important?’, the concept of centrality has received extensive attention [21]. Generally speaking, users with high centrality are better placed to gain new knowledge and information [22]. The methods for detecting key users based on centrality mainly include degree centrality (DC) [23], eigenvector centrality [24], PageRank [25], HITS [26] and betweenness centrality [27].
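As an illustration of two of these centrality measures, the following minimal sketch computes degree centrality and PageRank on a small follower graph. The graph, the node names and the damping factor of 0.85 are assumptions made for the example, not values taken from this study.

```python
from collections import defaultdict

def degree_centrality(adj):
    """Degree divided by (n - 1) for an undirected graph {node: set(neighbours)}."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed graph {node: set(successors)}."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank

# Hypothetical follower graph: an edge u -> v means "u follows v".
follows = {
    "a": {"c"},
    "b": {"c"},
    "c": {"d"},
    "d": {"c"},
    "e": {"c", "d"},
}

# Undirected view of the same ties, used for degree centrality.
undirected = defaultdict(set)
for u, vs in follows.items():
    undirected[u]  # make sure every node appears, even without ties
    for v in vs:
        undirected[u].add(v)
        undirected[v].add(u)

dc = degree_centrality(undirected)
pr = pagerank(follows)
```

In this toy graph node `c` is followed by every other user, so it obtains both the highest degree centrality and the highest PageRank score.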
Besides centrality, influence maximisation is also used to discover key users [28]. Influence modelling aims to shape and control propagation behaviour in social networks, and it is commonly used to control the spread of infectious diseases. For the relation network formed by users’ friendships in Q&A communities, key users are the smallest set of nodes that can maximise such influence.
Domingos and Richardson [29,30] first described the problem of influence maximisation. Carnes et al. [1] studied maximising influence under multi-information propagation. In addition, Goyal et al. [31] focused on how to find the smallest set of propagation sources and compute individual influences. Typical influence maximisation models include the independent cascade model (ICM), the linear threshold model (LTM) and the epidemic model. Morone and Makse [32] proposed a collective influence (CI) algorithm that brings the problems of minimising the fraction of inactive nodes and maximising the spreading over the network into the same framework of optimal percolation, which can find a much smaller set of optimal influencers. Unlike DC and PageRank, it also identifies some weakly connected nodes as influencers.
In addition to the above two categories of methods, users who occupy structural holes are also a kind of key user [33]. A structural hole describes a non-redundant relationship between contacts. Access to structural holes can bring control benefits and information benefits, so those users have more chances to obtain valuable information. Burt has proposed several methods to measure structural holes [34], including Constraint (CS), Hierarchy (HA), Effective Size (ES) and Efficiency (EC), in the context of employees in enterprises. More recently, a new measure, N-Burt, which optimises Burt’s Constraint, was proposed to find the key nodes in online social networks [35]. This new measure considers the impact of the topology of the neighbour nodes and the other nodes connected to them. In summary, methods for detecting key users can be divided into three categories: centrality-based methods, influence maximisation–based methods and structural holes–based methods. Different categories of methods may generate different results, and accordingly we need to explore and evaluate their knowledge diffusion performance in Q&A communities.
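To make the CI idea concrete, the sketch below computes the collective influence score in its usual form, CI_l(i) = (k_i - 1) multiplied by the sum of (k_j - 1) over all nodes j at distance exactly l from i, where k denotes degree. The toy network and its node names are hypothetical and serve only to show why a low-degree bridge node can outrank a hub.

```python
from collections import deque

def collective_influence(adj, node, radius=2):
    """CI_l(i) = (k_i - 1) * sum of (k_j - 1) over nodes j at distance exactly l from i."""
    dist = {node: 0}
    queue = deque([node])
    frontier = []
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            frontier.append(u)
            continue  # do not expand beyond the boundary of the ball
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(adj[node]) - 1) * sum(len(adj[j]) - 1 for j in frontier)

# Hypothetical network: two hubs (h1, h2) with three leaves each, joined by a bridge b.
adj = {
    "h1": {"x1", "x2", "x3", "b"},
    "h2": {"y1", "y2", "y3", "b"},
    "b": {"h1", "h2"},
    "x1": {"h1"}, "x2": {"h1"}, "x3": {"h1"},
    "y1": {"h2"}, "y2": {"h2"}, "y3": {"h2"},
}

ci_bridge = collective_influence(adj, "b", radius=1)
ci_hub = collective_influence(adj, "h1", radius=1)
```

With radius 1, the bridge `b` has degree 2 but sits next to both hubs, so its score exceeds that of hub `h1`, whose neighbours are mostly leaves.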
2.2. Knowledge diffusion in Q&A communities
Recently, there has been tremendous interest in knowledge diffusion in online communities, especially in Q&A communities, the primary mission of which is sharing knowledge. Knowledge diffusion studies in Q&A communities mainly employ complex network analysis and behavioural research methods.
Complex networks generated in the process of knowledge diffusion in Q&A communities share some common characteristics. Two of them are the scale-free [36] and small-world [37] properties. Studies on knowledge diffusion based on complex networks focus on analysing the effects of network structure and nodes on knowledge diffusion. According to the theory of transmission dynamics, knowledge diffusion can be seen as a dynamic process. Cowan and Jonard [6] studied the relation between network structure and knowledge diffusion and how different network structures affect knowledge diffusion. Subsequently, more research on knowledge diffusion has been conducted based on Cowan’s model. For example, Lin and Li [38] studied knowledge diffusion in scale-free networks. However, these studies generally emphasise information flow or the overall network structure, while ignoring users’ behaviour information in Q&A communities.
The typical knowledge behaviours in Q&A communities take the form of asking and answering questions. Chen and Kao [39] distinguished easy questions from hard ones and studied how this difference affects the service of Q&A communities. Lou et al. [40] proposed that group membership could directly enhance individual users’ knowledge contribution in Q&A communities. Lou et al. [16] studied the quantity and quality of knowledge in Q&A communities using a structural equation model and self-determination theory. Quantity and quality have been recognised as two vital aspects of knowledge contribution in Q&A communities. Low-quality, misleading answers can lead users to distrust and even withdraw from Q&A communities [41]. The viability and success of Q&A communities depend on users who voluntarily contribute not only a large amount of knowledge, but also knowledge of high quality [16]. However, other aspects of knowledge diffusion, such as span and timeliness, are less studied.
In addition, behavioural research has generally studied the factors influencing knowledge diffusion in Q&A communities by relying on subjective surveys and questionnaires [32,35,42–44], without fully considering objective measures of users’ knowledge behaviour. Therefore, in this article, we propose an improved objective measure, V-Constraint (VC), based on social network analysis, and evaluate key users’ knowledge diffusion capability in terms of quality, quantity, span and timeliness for different key user detection methods.
3. The proposed measure
In this section, we propose and evaluate our improved measure of key users in Q&A communities, VC, in order to mitigate the limitations of Burt’s Constraint in large-scale social network sites and to take users’ knowledge diffusion capability into account.
3.1. VC
The most important measure based on Burt’s structural holes is Constraint, which helps judge the criticality of nodes. However, Burt typically examines structural holes in relationship networks within enterprises and assumes that a player spends time and energy only on the players with whom he or she has a direct connection. In Q&A communities, by contrast, friendship means a strong tie between two nodes. Thanks to the prevalence of the Internet and social software, it is easy for a user to keep in contact, through his or her friends, with other users to whom he or she is not directly connected, and then to spend time and energy on those users. Taking Facebook or Twitter as an example, two users may not be friends or follow each other, yet one of them can easily browse the other’s information through content shared or retweeted by his or her friends. Such investments of time and energy should also be counted among the investments that users make in their social networks.
As Figure 1 shows, the dashed area is the configuration that Burt used to illustrate how to calculate Constraint in equation (1), and nodes E and F have the same neighbours, node A and the purple node i
where

Figure 1. Computing the constraint.
According to equation (1)
that is to say, node i should spend the same time and energy on nodes E and F.
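For reference, Burt’s Constraint is usually written in the following standard form. The symbols $a_{ij}$ (strength of the tie from $i$ to $j$) and $q$ (a third contact) follow the conventional notation and are assumed here, so they may differ from the symbols used in equation (1).

```latex
C_i = \sum_{j \neq i} \Bigl( p_{ij} + \sum_{q \neq i,\, j} p_{iq}\, p_{qj} \Bigr)^{2},
\qquad
p_{ij} = \frac{a_{ij} + a_{ji}}{\sum_{k \neq i} \left( a_{ik} + a_{ki} \right)}
```

Under this form, when all ties of node $i$ have equal strength, $p_{ij} = 1/k_i$ for every neighbour $j$, where $k_i$ is the degree of $i$. This is consistent with the observation that node $i$ is assumed to invest equally in nodes E and F.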
However, this may not be the case in the scenario illustrated in Figure 1. Nodes E and F connect to different numbers of nodes, and node i can access the adjacent nodes G, H, L and M through nodes E and F. Specifically, in Figure 1, node F has more adjacent nodes than node E, those adjacent nodes have no relation with node i, and node F is therefore more likely to share new things and information with node i. In other words, node F has better knowledge diffusion capability than node E, so node i will prefer to spend more time and energy on maintaining its relationship with node F. Burt’s Constraint cannot capture this situation.
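The limitation described above can be checked numerically with a minimal sketch of the standard (unweighted) Burt’s Constraint. The edge list below is an assumed reading of the scenario in Figure 1 (node E reaching G, node F reaching H, L and M); the exact topology of the figure is not reproduced here.

```python
def proportions(adj, i):
    """p_ij: share of i's ties invested in contact j (unweighted, undirected ties)."""
    k = len(adj[i])
    return {j: 1.0 / k for j in adj[i]}

def constraint(adj, i):
    """Standard Burt's Constraint of node i: sum_j (p_ij + sum_q p_iq * p_qj)^2."""
    p_i = proportions(adj, i)
    total = 0.0
    for j in adj[i]:
        # Only contacts q of i contribute, since p_iq = 0 for every other node.
        indirect = sum(
            p_i[q] * proportions(adj, q).get(j, 0.0)
            for q in adj[i]
            if q != j
        )
        total += (p_i[j] + indirect) ** 2
    return total

# Assumed topology: i, A, E and F as in the text; E reaches G, F reaches H, L and M.
adj = {
    "i": {"A", "E", "F"},
    "A": {"i", "E", "F"},
    "E": {"i", "A", "G"},
    "F": {"i", "A", "H", "L", "M"},
    "G": {"E"},
    "H": {"F"},
    "L": {"F"},
    "M": {"F"},
}

p_i = proportions(adj, "i")
c_i = constraint(adj, "i")
```

Here `p_i["E"]` and `p_i["F"]` are identical even though F has more neighbours outside the reach of node i, which is exactly the information that the standard measure leaves out.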
Therefore, we introduce
where
where
where
where
We can get the values of
where
3.2. Preliminary evaluation
In this section, we conducted a preliminary benchmark experiment to evaluate the proposed measure and to demonstrate its performance to detect key users compared with classical Burt’s measures.
We chose a social network dataset, Netscience, to carry out the experiments. The dataset was compiled by Newman [45] in May 2006 and contains a co-authorship network of scientists working on network theory and experiment. The descriptive statistics of the network are shown in Table 1.
Table 1. Descriptive statistics of the Netscience network
We identified the top 10 key users’ nodes occupying structural holes in the Netscience dataset using the proposed measure VC, as well as two classical measures, Burt and N-Burt. Table 2 lists the results.
Table 2. The top 10 key users in Netscience detected by different measures
Table 2 shows that the ranking produced by N-Burt is slightly different from the ranking obtained by Burt, whereas the ranking produced by the measure VC proposed in this article is precisely the same as Burt’s. Meanwhile, VC also takes into account the topology that connects the neighbour nodes with other nodes. To display the key users identified by VC in Table 2, Figure 2 depicts the network topology of the high-degree nodes of Netscience. Furthermore, the scientists corresponding to the top 10 nodes are indeed prominent scholars in the fields of complex networks, systems engineering and physics. Therefore, we are confident that the proposed measure can be used to detect key users who hold structural hole advantages in social networks.

Figure 2. The network topology of high-degree nodes of Netscience.
4. Knowledge diffusion capability
In this study, we are also interested in evaluating the knowledge diffusion capabilities of key users detected by our proposed measure VC. Therefore, in this section, we build an integrated evaluation model of knowledge diffusion capability along four dimensions: quantity, quality, span and timeliness. Figure 3 depicts the dimensions and metrics involved in the model.

Figure 3. The evaluation model of knowledge diffusion capability.
4.1. The evaluation model of knowledge diffusion quantity
In this article, we define an article/answer issued by a user and then pushed to another user as a unit of knowledge. If more users browse a particular user’s articles/answers, then the user diffuses more knowledge.
When user 1 issues an article/answer that is commented on, shared or up-voted by user 2, we simplify this to a relation between user 1 and user 2, as shown in Figure 4. The direction of the arrow reflects the direction of knowledge diffusion.

Figure 4. The simplified network relation of users and their behaviours in Q&A.
After user 2 comments on, shares or up-votes the article/answer issued by user 1, this behaviour is shown on the homepage of user 2’s followers, so that they can easily browse the article/answer of user 1. In other words, the knowledge carried by the articles/answers of user 1 can be diffused to user 2 and to the followers of both user 1 and user 2, rather than only through the relation network of users. In that way, the knowledge diffusion quantity of an article/answer of one user can be described in equation (7)
where
Therefore, the total knowledge diffusion quantity of user i over a given period is obtained by summing the per-article quantity over all articles/answers that user i issued in that period, as described in equation (8), where m is the number of those articles/answers
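The diffusion path described in this subsection can be sketched as follows. This is only an illustration under a simplifying assumption: the reach of one article/answer is counted as the number of distinct users among the author’s followers, the users who interacted with it and the followers of those users. The function names and the toy data are hypothetical, and equations (7) and (8) may weight these groups differently.

```python
def answer_reach(author, interactors, followers):
    """Distinct users who can see one answer: the author's followers, the users who
    interacted with it, and the followers of those users (the author is excluded)."""
    audience = set(followers.get(author, ()))
    for user in interactors:
        audience.add(user)
        audience.update(followers.get(user, ()))
    audience.discard(author)
    return len(audience)

def total_reach(author, answers, followers):
    """Sum of the per-answer reach over the author's m answers in the period."""
    return sum(answer_reach(author, interactors, followers) for interactors in answers)

# Hypothetical data: followers[u] is the set of users who follow u.
followers = {"u1": {"u2", "u3"}, "u2": {"u4", "u5"}, "u3": {"u5"}}
# Each answer of u1 is represented by the set of users who commented/shared/up-voted it.
answers_u1 = [{"u2"}, {"u3", "u6"}]

quantity_u1 = total_reach("u1", answers_u1, followers)
```

The first answer reaches u2, u3, u4 and u5, and the second reaches u2, u3, u5 and u6, so the total over both answers is 8.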
4.2. The evaluation model of knowledge diffusion quality
In this article, we adopt a general weighted-average framework to measure users’ knowledge diffusion quality by means of the degree of approval received by the articles/answers they issue. In Q&A communities, users can express their approval of another user’s knowledge through features such as up-voting and thanks. It is therefore feasible to measure users’ knowledge diffusion quality from these data, as described in equations (9) and (10)
where
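A weighted-average quality score of this kind might be sketched as follows. The signal names, the weights (0.6 and 0.4) and the averaging over answers are assumptions chosen for illustration; they are not the coefficients of equations (9) and (10).

```python
def answer_quality(counts, weights):
    """Weighted average of approval signals (e.g. up-votes, thanks) for one answer."""
    total_weight = sum(weights.values())
    return sum(weights[k] * counts.get(k, 0) for k in weights) / total_weight

def user_quality(answers, weights):
    """Mean per-answer quality over all of a user's answers (0.0 if there are none)."""
    if not answers:
        return 0.0
    return sum(answer_quality(a, weights) for a in answers) / len(answers)

weights = {"upvotes": 0.6, "thanks": 0.4}  # hypothetical weights
answers = [
    {"upvotes": 100, "thanks": 10},
    {"upvotes": 20, "thanks": 5},
]

quality = user_quality(answers, weights)
```

With these numbers the two answers score 64 and 14, giving a user-level quality of 39.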
4.3. The evaluation model of knowledge diffusion span
The span of users’ knowledge diffusion can be evaluated by the variety of their knowledge topics. For example, the knowledge topics of articles/answers can be described by question tags, which are chosen by users from a list and are displayed at the top of question pages. These tags help organise questions into topic communities and make it easier for users interested in a topic to browse relevant articles. We obtain these tags when crawling users’ data. After that, we build a tag co-occurrence matrix for all users
where
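Building a tag co-occurrence matrix from crawled tags can be sketched as below. The matrix is stored sparsely as pair counts, and the example tags are hypothetical. How the matrix is then turned into a span score is defined by equation (11) and is not reproduced here.

```python
from collections import Counter
from itertools import combinations

def tag_cooccurrence(tagged_items):
    """Count how often each unordered pair of tags appears on the same question."""
    pairs = Counter()
    for tags in tagged_items:
        # Sorting gives every pair a canonical order; set() drops duplicate tags.
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical tag lists, one per question answered by a user.
questions = [
    ["python", "pandas"],
    ["python", "statistics", "pandas"],
    ["economics", "statistics"],
]

cooc = tag_cooccurrence(questions)
```

In this example the pair (`pandas`, `python`) co-occurs twice, while every other pair occurs once.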
4.4. The evaluation model of knowledge diffusion timeliness
For knowledge issued by users in Q&A communities, whether an article or an answer, we assume that the longer ago it was published, the lower its timeliness. Generally speaking, the longer an item has been published, the greater its chance of being spread. However, some articles or answers attract great attention within a short time, which reflects good timeliness of the user’s knowledge. Analogously, we tend to read recent papers that were widely cited shortly after they were published. In this study, we evaluate the knowledge diffusion timeliness of users by taking the publication time of knowledge as a weight.
To facilitate the analysis, we set a time origin for Q&A communities and divide out a time interval according to the month or quarter. We set
The knowledge diffusion timeliness of user i in Q&A communities can be expressed as in equation (12)
where
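The idea of using publication time as a weight can be illustrated with the following sketch. The linear weight t / T (period index divided by the number of periods since the time origin) and the averaging over items are assumptions made for this example only; equation (12) may use a different weighting.

```python
def timeliness(items, n_periods):
    """Recency-weighted average of per-item attention.

    items: list of (period_index, attention) pairs, with period_index in 1..n_periods
    counted from the time origin, so that later periods receive larger weights.
    """
    if not items:
        return 0.0
    weighted = sum((t / n_periods) * attention for t, attention in items)
    return weighted / len(items)

# Hypothetical users with the same total attention but different publication periods.
early_and_recent = [(1, 100), (4, 100)]
only_recent = [(4, 100), (4, 100)]

score_mixed = timeliness(early_and_recent, n_periods=4)
score_recent = timeliness(only_recent, n_periods=4)
```

The user whose attention was earned entirely in the most recent period scores higher (100.0) than the user who earned half of it in the first period (62.5).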
5. A field study
In this section, we conduct a field study to detect key users and then evaluate their knowledge diffusion capability in the Zhihu community, a well-known Q&A community in China. The Zhihu community has grown rapidly since its debut in 2013 and had more than 160 million users in 2018. It has gathered a large number of domain experts and has had a far-reaching impact on Q&A communities in China. More and more people have become accustomed to looking for answers in the Zhihu community when they encounter difficult problems in any aspect of their lives. The experimental procedure is outlined in Figure 5.

Figure 5. The experimental procedure.
5.1. Data
From April to November 2016, we collected user data from zhihu.com (https://www.zhihu.com/) using crawler software deployed on a server. By the end of October 2016, we had crawled information about more than 230,000 users and more than 132 million network relations of users. The total size of the dataset is 1.13 GB. The descriptive statistics of the crawled network are shown in Table 3. In this study, we focused on three kinds of data, that is, the network relations of users, personal information of users and answer information:
Network relation of users. We choose Jixin Huang (https://www.zhihu.com/people/jixin/), the originator of zhihu.com, as the seed user, because he has been very active since the establishment of the site and follows many influential and active users from various knowledge fields. We crawl his followers and, one by one, the users followed by those followers. We do not crawl their followers further, because they include many long-tail users who are beyond the scope of this study.
Personal information of users. It includes basic personal information (e.g. location and vocation) and personal achievements (e.g. number of followers and number of up-votes, thanks and sharing).
Answers information. To thoroughly evaluate the knowledge diffusion capability of users, we choose, for every user, the 100 questions with the largest numbers of up-votes and 100 questions with mid-range numbers of up-votes. Because some users have answered thousands of questions, crawling all of their answers is impractical under the technical and time constraints. For users who have answered fewer than 200 questions, we crawl all of their answers. The main data items include the question title, the answer content, the numbers of comments, up-votes and thanks of the answer, and the list of users who commented on and up-voted the answer.
Table 3. The descriptive statistics of the crawled network from Zhihu.com
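The answer-sampling rule described above can be sketched as follows. The interpretation of ‘mid-range’ as a band centred on the median of the remaining answers is an assumption, as are the function name and the record layout.

```python
def sample_answers(answers, per_group=100):
    """Pick the most up-voted answers plus a band around the median of the rest;
    keep everything when the user has fewer than 2 * per_group answers."""
    if len(answers) < 2 * per_group:
        return list(answers)
    ranked = sorted(answers, key=lambda a: a["upvotes"], reverse=True)
    top = ranked[:per_group]
    rest = ranked[per_group:]
    # Assumed reading of "mid-range": a window centred on the median of the rest.
    start = max(0, len(rest) // 2 - per_group // 2)
    middle = rest[start:start + per_group]
    return top + middle

# Hypothetical user with ten answers; per_group is reduced to 2 to keep the demo small.
demo = [{"id": n, "upvotes": n} for n in range(10)]
picked = sample_answers(demo, per_group=2)
picked_upvotes = [a["upvotes"] for a in picked]
```

With ten answers and `per_group=2`, the rule keeps the two most up-voted answers (9 and 8) and two answers from the middle of the remainder (4 and 3).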
5.2. Results and discussion for detecting key users
In view of the above three types of key users’ detection measures, we choose four representative measures, that is, DC, PageRank, CI and VC (of which
Table 4. Results of key users detected by different measures
CI: collective influence; VC: V-Constraint.
We also calculate the top 1000 key users under each measure. A Venn diagram is used to describe the relations among the four groups of key users. As shown in Figure 6, different measures generate different key users because the measures have different focuses. The intersection of the four groups of key users consists of 187 users, accounting for 7.7% of the total detected key users.
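The overlap summarised by the Venn diagram can be computed with plain set operations, as in the sketch below. The user identifiers are hypothetical, and the share is taken relative to the union of the four lists, which is an assumed reading of ‘total detected key users’ (under that reading, 187 users at 7.7% would imply roughly 2,400 distinct users).

```python
def overlap_report(top_sets):
    """Size of the common core, size of the union, and the core's share of the union."""
    groups = list(top_sets.values())
    common = set.intersection(*groups)
    union = set.union(*groups)
    return len(common), len(union), len(common) / len(union)

# Hypothetical top-k lists for the four measures (user ids are placeholders).
top_sets = {
    "DC": {1, 2, 3, 4},
    "PageRank": {2, 3, 4, 5},
    "CI": {3, 4, 6, 7},
    "VC": {3, 4, 5, 8},
}

n_common, n_union, share = overlap_report(top_sets)
```

In this toy case two users appear in all four lists out of eight distinct users, a share of 25%.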

Figure 6. The Venn diagram of the four groups of the detected top 1000 key users.
Many key users with high degree centrality also appear among the key users found by the PageRank method. Most of the key users found by these two methods work in the Internet industry, and males far outnumber females among them. However, because the design ideas of CI and PageRank differ, the key users discovered by CI are different from those found by the other two methods, which is in line with the characteristics of the CI method. Although their numbers of followers are not so prominent, they are weakly connected nodes in the network, and the connectivity of the Q&A community would be greatly reduced without them. The key users found using the VC method proposed in this article are users who play a bridging role in the interconnection and operation of the network, as well as users who are highly active in the network.
5.3. Results and discussion for evaluating the knowledge diffusion capability
The basic information of the detected key users shows that they are either founders of the Zhihu community or social celebrities. In this section, we further evaluate their knowledge diffusion capability based on the model proposed in section 4. In addition, we use the Spearman rank correlation coefficient [47] to analyse the correlation between the ranking of key users and their knowledge diffusion capability. The result of the correlation analysis of the top 1000 key users detected by each measure and their knowledge diffusion capability is shown in Table 5. Figure 7 illustrates their Spearman correlations at top k users for the four dimensions of knowledge diffusion capability.
Table 5. Rank correlations of the top 1000 key users across dimensions of knowledge diffusion capability
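For readers who want to reproduce this kind of analysis, the Spearman coefficient can be computed as the Pearson correlation of the two rank vectors, as in the minimal sketch below. The positions and capability scores are hypothetical, and the function does not handle the degenerate case in which one variable is constant.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

positions = [1, 2, 3, 4, 5]        # rank position under some detection measure
capability = [90, 70, 75, 40, 10]  # hypothetical capability scores
rho = spearman(positions, capability)
```

Because rank position 1 denotes the top user, a measure that places capable users first yields a negative coefficient (here -0.9).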
CI: collective influence; VC: V-Constraint.

Figure 7. Rank correlations at top k users across dimensions of knowledge diffusion capability: (a) quantity, (b) quality, (c) span and (d) timeliness.
Figure 7(a) shows that the ranking of key users detected by DC correlates more strongly with knowledge diffusion quantity than the rankings of the other three methods, followed by VC. A strongly negative correlation means that the smaller a key user’s rank number is (i.e. the higher the user is ranked), the greater the user’s knowledge diffusion quantity. Surprisingly, however, CI’s results are positively correlated: users ranked higher by CI do not necessarily have greater knowledge diffusion quantity. It can be speculated that, although these users ensure the connectivity of the network topology, they are less effective in terms of knowledge diffusion quantity.
Figure 7(b) shows that the ranking of users found by the VC method is strongly negatively correlated with knowledge diffusion quality, followed by CI. By contrast, the key users found through DC perform better in terms of knowledge diffusion quantity but are less prominent in knowledge diffusion quality. There is a long-tail phenomenon in users’ knowledge diffusion quality.
Figure 7(c) shows that users with high knowledge diffusion span may have a high degree of centrality, but the correlation between them is not prominent, whereas the key users found by the VC method have a significant correlation with knowledge diffusion span.
Figure 7(d) shows that the rank correlations with knowledge diffusion timeliness are all small, which may be attributed to the fact that these key users accumulated their influence when the community was first established. Compared with that early phase, they are no longer as active, so their performance on knowledge diffusion timeliness is not as good. Managers of Q&A communities should pay more attention to this phenomenon. Incentive mechanisms can be used to promote users’ active participation in Q&A communities.
In summary, key users detected by VC generally perform well on nearly all dimensions of knowledge diffusion capability except timeliness. The performance of the top k key users detected by PageRank and CI fluctuates with k, which suggests that PageRank and CI are suitable for discovering authority nodes but underperform in finding users who are good at knowledge diffusion. In other words, key users detected only by their network relations may not be outstanding in knowledge diffusion. This finding entails important theoretical and practical implications. Theoretically speaking, it is inappropriate for studies on knowledge diffusion in Q&A communities to rely entirely on network relations; users’ behaviour information should be seriously considered. Practically speaking, managers of Q&A communities can use the proposed measure VC to detect key users, adopt the evaluation model of knowledge diffusion capability for expert recommendation, and promote key users’ active participation in Q&A communities.
6. Conclusion and future work
In this article, we propose a new measure to detect key users and evaluate the knowledge diffusion capabilities of the detected users in Q&A communities. In particular, for key users who occupy structural holes, we propose an improved measure, VC, which is suitable for Q&A communities. We employ a benchmark experiment to evaluate the performance of VC in detecting key users. In addition, we conduct a field study involving 230,000 users and more than 132 million network relations of users to evaluate the knowledge diffusion capabilities of the detected key users. Results show that key users detected by VC do well in nearly all dimensions of knowledge diffusion.
This research has several limitations. In this study, we did not consider key users in dynamic networks. In future, we plan to study how the influence of these key users varies over time. In addition, we intend to include more datasets from Q&A communities to enhance the generalisability of this study and to examine whether the findings reported in this article hold in other communities.
Notwithstanding these limitations, this study entails important theoretical and practical contributions. Our results indicate that network relation can describe knowledge diffusion to a certain extent, but users’ behavioural information and features should also be considered to study knowledge diffusion in Q&A communities. The proposed measure of VC constitutes an important theoretical extension of the current key user detection methods.
Q&A communities can utilise the proposed measure to improve their user recommendation services. In particular, the practical contributions of this study are threefold. Users of Q&A communities can follow key users with strong knowledge diffusion ability to find the best answers and improve their questioning and answering experience. Operators of Q&A communities can find key users who excel at knowledge diffusion and promote their active participation to boost the development of the whole community. Regulators of Q&A communities can pay close attention to key users with strong knowledge diffusion capability in order to manage public opinion effectively. In short, the study is of great significance for sustainable Q&A communities.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The work described in this paper was partially supported by grants from the National Natural Science Foundation of China under Grant Nos. 71871005, 71471011 and 71531001.
