Abstract
Public interest in cryptocurrencies has risen consistently over the past decade. Owing to this rapid growth, cryptocurrency-related information is increasingly shared online. Because a considerable portion of this information in online communities is noise, extracting meaningful information is important. It is therefore critical to judge whose opinions should carry more weight, that is, who the opinion leaders in online communities are. This study analyzed which user groups discuss topics that contain meaningful information by investigating the correlation between topic weights and the Bitcoin price change. The proposed analysis method involves (1) effective classification of the user groups using the hypertext-induced topic selection (HITS) algorithm, (2) textual information analysis through topic modeling, and (3) the identification of user groups that have a high interest in the Bitcoin price by measuring the correlation between the price and the topics, together with the measurement of topic similarities between each user group and all users to determine the user group that can effectively represent the entire community. By analyzing the information shared by users, we observed that most users are interested in the price information, whereas users having social influence are interested not only in the price but also in other information.
In the past decade, new currencies have begun to emerge with the development of the Internet. With the introduction of cryptocurrencies, the economic activities among users on the Internet underwent enormous changes. Mining cryptocurrencies using a unique blockchain technology results in an increase in the total quantity of currencies. Bitcoin was first introduced in 2008 (Nakamoto, 2008), and various types of cryptocurrencies that adopted a similar blockchain technology appeared at the onset of the 2010s (Barber, Boyen, Shi, & Uzun, 2012; Bohme et al., 2015; Grinberg, 2012; Wood, 2014). Consumers are now increasingly using cryptocurrencies such as Bitcoin for all types of payments and are trading them in cryptocurrency markets (Barber et al., 2012; Grinberg, 2012; Y. B. Kim et al., 2016; M. Kim, Kang, Park, Choo, & Elmqvist, 2017; Marella, Lindman, Rossi, & Tuunainen, 2017; Reid & Harrigan, 2013).
The price history of Bitcoin has shown considerable fluctuations: the price can rise steeply within a short period and then plunge at an alarming rate. Specifically, Bitcoin started 2017 at approximately 997 USD per Bitcoin and increased to 19,000 USD in December 2017, potentially creating a profit of approximately 1,800% in less than a year. In June 2018, the price fell drastically to 6,000 USD, which corresponds to a loss of approximately 70% since December 2017. This trend cannot be sufficiently explained by standard macroeconomic theories (Aalborg, Molnár, & de Vries, 2018; Gerlach, Demos, & Sornette, 2018; Kristoufek, 2013; Li & Wang, 2017).
Originally, the main function of Bitcoin was to act as a substitute for fiat money; however, there exist differences between fiat money and cryptocurrencies. Fiat money is controlled by a central system, but cryptocurrencies are not. Therefore, although a blockchain is used in various fields such as remittance and Darknet, Bitcoin markets are now often recognized as sites for speculators. These markets, unlike the real money market, are currently and considerably influenced by short-term investors, trend chasers, speculators (Kristoufek, 2013), and large numbers of fraud activities (Moeller, Munksgaard, & Demant, 2017). Therefore, the analysis of these traders, activities, and cryptomarkets becomes important to invigorate cryptocurrencies and the blockchain (Demant, Munksgaard, Décary-Hétu, & Aldridge, 2018; Moeller et al., 2017). In addition, the majority of Bitcoin investors determine their investment strategy on the basis of the scant information they can obtain because there is no perception of fair prices in this market (Aalborg et al., 2018; Kristoufek, 2013) and there is little information on cryptocurrency markets compared to real markets. Therefore, in this study, we focused on the characteristics of online Bitcoin discussion forums, which are a potentially rich source of information.
Most Bitcoin trading activities take place online, and users share information related to Bitcoin at almost no cost in specific online discussion forums. They effectively decide to buy or sell Bitcoins on the basis of this information (Y. B. Kim et al., 2016; Kim et al., 2017; Phillips & Gorse, 2018). Although responses from many users can be found in real time in these forums, public online discourse contains a large amount of noise, misinformation, and spam; thus, identifying meaningful information and the users who provide high-quality information among the vast amount of user opinion data tends to be difficult (Waldherr, Maier, Miltner, & Günther, 2017). Therefore, analyzing the relationship between user opinions and price changes is important to verify the meaningfulness of these opinions and to extract verified users and documents from communities.
This study focuses on user opinions and identifies the opinion leaders, also known as influencers, using community networks by analyzing which user group’s opinion is highly related to the price change and therefore meaningful. In this article, we propose the analysis method shown in Figure 1. We first construct community networks using the data collected from the Bitcoin forum and effectively classify user groups using the hypertext-induced topic selection (HITS) algorithm based on the information exchange in the Bitcoin online community. After conducting topic modeling for each user group, we analyze which user group demonstrates a strong correlation between the topics from the user group and the Bitcoin price. Finally, we measure the topic similarities between topics from each user group and the topics from all users to determine which user group can effectively represent the entire community.

Figure 1. Overview of the proposed analysis approach. After forming community networks, we (1) divide users into an opinion leader group and a majority user group, (2) conduct topic modeling using documents from each user group, (3) calculate the Pearson correlation between topic weights and the Bitcoin price, and (4) measure the similarities between topics from each group and topics for all users.
The findings of this study based on an analysis of documents written by users in the Bitcoin forum and responses over the past 3 years demonstrate that most users are only interested in the price information, whereas the top users, defined as the opinion leaders in this article, who shared information frequently, show significant interest in diverse information and actively share it with each other.
Related Work
Various events in the Bitcoin market cannot be sufficiently explained by standard macroeconomic theories, and this market is characteristically online. Since 2013, there has been considerable research on connecting Bitcoin with various sources of online information to analyze and utilize Bitcoin and its market. Numerous studies have covered a wide range of topics, from the trends and various social events related to the appearance of Bitcoin to the characteristics of the Bitcoin price. These studies analyzed user sentiments related to Bitcoin on social media such as Twitter or quantified web search queries using services such as Google Trends and analyzed the fluctuations in the price and transaction volume to determine any relation (Cheah & Fry, 2015; Ifrim, Shi, & Brigadir, 2014; Karalevicius, Degrande, & De Weerdt, 2018; Kaminski, 2014; Kristoufek, 2013; Mai, Shan, Bai, Wang, & Chiang, 2018; Phillips & Gorse, 2017; Yelowitz & Wilson, 2015; Zhang, Wang, Li, & Shen, 2018). These studies tend to use various online sources as material for understanding and analyzing Bitcoin. Despite this, there have been almost no studies seeking to identify users who share information that could be useful. Analyzing an entire community on average inevitably includes noise and is likely to degrade the quality of the analysis. Therefore, finding users and groups that share meaningful information is necessary in order to address this problem.
People with similar interests such as making money and investment gather together, write articles on these topics, and exchange ideas in social communities (Bickart & Schindler, 2001; Bernstein et al., 2011; Kim et al., 2017; Panzarasa, Opsahl, & Carley, 2009). Previous studies on user opinions in online discussion forums and price fluctuation performed sentiment and quantitative analyses or focused on certain phases where the Bitcoin price soared. Several studies attempted to investigate how the Bitcoin price is related to users’ feelings or opinions by analyzing Bitcoin forums (Y. B. Kim et al., 2016; Kim et al., 2017; Y. B. Kim et al., 2015; Phillips & Gorse, 2018). Although Kim’s works form the motivation for our research, we mainly focused on finding opinion leaders using network analysis and analyzing the different behaviors and interests of opinion leaders and other majority users, and not on predicting prices or volumes using causality analysis between the price and the word frequencies. Furthermore, studies on the prediction of the Bitcoin value or trading volume were performed (Alessandretti, ElBahrawy, Aiello, & Baronchelli, 2018; Guo & Antulov-Fantulin, 2018; Indera, Yassin, Zabidi, & Rizman, 2017; Y. B. Kim et al., 2016; Kim et al., 2017; Li, Chamrajnagar, Fong, Rizik, & Fu, 2018; Matta, Lunesu, & Marchesi, 2015; McNally, Roche, & Caton, 2018; Sin & Wang, 2017). These studies focused more on the prediction of the Bitcoin price fluctuation itself rather than on finding information that could be useful, such as opinions or opinion leaders within the forum. Many studies also sought to analyze community networks and users’ opinions to identify opinion leaders or to find the main theme and maximize network marketing effects (Alvarez-Galvez, 2016; Choi, 2015; Ho et al., 2016; Junbo, Min, Fan, & Xufa, 2005; Jiang et al., 2014; Maier, Waldherr, Miltner, Jähnichen, & Pfetsch, 2018; Wu et al., 2015; Wang, Du, & Tang, 2016; Zhao, Kou, Peng, & Chen, 2018).
Users who actively participate in the community and have a high reputation are called opinion leaders, and they play an important role in leading the consensus in the social community (Chan & Misra, 1990; Johnson, Safadi, & Faraj, 2015; Ming Yu, 2002). Rogers (2010) showed that opinions shared by opinion leaders propagate much faster in social communities than those of public users. Several studies have created a simulation environment to understand the flow of opinions and identify opinion leaders (Zhao et al., 2018) or have attempted to identify the opinion leaders in a group (Ho et al., 2016; Jiang et al., 2014; Wu et al., 2015; Wang et al., 2016). Several authors focused on developing improved algorithms based on PageRank to identify opinion leaders in social communities (Jiang et al., 2014; Wang et al., 2016).
Method
People actively communicate with each other by writing posts and posting comments on other users’ posts in online communities such as the Bitcoin forum. In this section, we introduce the process of data collection and preprocessing and analyze the difference between user groups by constructing community networks and performing topic modeling.
Data Collection and Preprocessing
First, we crawled posts and comments including metadata such as the post time and username from a Bitcoin forum (https://bitcointalk.org/index.php?board=1.0) using Python, Requests, and the Beautiful Soup library. Users in this forum discuss Bitcoin, the blockchain system of Bitcoin, prices, and investments, among other topics. As Bitcoin developers and core members have discussed the development of Bitcoin since its initial stages in this forum, this community can be considered as a representative community among various Bitcoin communities. We collected a total of 36,898 posts and 353,679 comments written by 32,304 users from November 22, 2009, when the forum was created to February 2, 2018, when we finished crawling. We crawled data in a legitimate manner in compliance with terms at https://bitcointalk.org/robots.txt. In addition, the crawled data do not include any personal information.
After collecting the data, we applied data preprocessing to the raw text by (1) parsing each document into word tokens, (2) lemmatizing each word using the WordNet lemmatizer in the Python NLTK (Natural Language Toolkit) library (Loper & Bird, 2002), (3) converting each word into lowercase, and (4) removing non-English characters, special characters, stop words, words that have fewer than two characters, and words that appeared fewer than 10 times. We also discarded posts and comments that contain five words or fewer. After data preprocessing, we retained 390,572 posts and comments with a vocabulary of 12,021 terms.
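The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the stop-word list is a tiny placeholder, the `lemmatize` argument is a hypothetical hook (identity by default) into which NLTK's `WordNetLemmatizer().lemmatize` could be passed, lowercasing is applied during tokenization, and the minimum document length is checked after rare words are removed.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a full English list
# such as the one shipped with NLTK could be substituted.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "it", "i"}


def tokenize(text):
    """Lowercase the text and keep only runs of English letters."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(documents, lemmatize=lambda w: w,
               min_chars=2, min_count=10, min_words=6):
    """Return cleaned token lists for the documents that survive filtering.

    `lemmatize` is a pluggable callable (identity by default, an assumption).
    `min_words=6` keeps only documents with more than five words.
    """
    tokenized = []
    for doc in documents:
        tokens = [lemmatize(t) for t in tokenize(doc)]
        # Drop very short words and stop words.
        tokens = [t for t in tokens
                  if len(t) >= min_chars and t not in STOP_WORDS]
        tokenized.append(tokens)

    # Drop words that are rare across the whole corpus.
    counts = Counter(t for tokens in tokenized for t in tokens)
    cleaned = [[t for t in tokens if counts[t] >= min_count]
               for tokens in tokenized]

    # Discard documents that are too short after cleaning.
    return [tokens for tokens in cleaned if len(tokens) >= min_words]
```

The thresholds are exposed as parameters so that the corpus-level settings reported above (two characters, 10 occurrences, more than five words) can be relaxed when experimenting on a small sample.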
Subsequently, we constructed community networks and analyzed the network topology using the Python NetworkX library (Hagberg, Swart, & Chult, 2008). There exist network or link analysis algorithms such as PageRank and HITS (Kleinberg, 1999; Page, Brin, Motwani, & Winograd, 1999). In this study, the HITS algorithm was employed to measure the degree of significance of each user. We then extracted both the highest and lowest ranked users and performed topic modeling.
Network Analysis
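We assume that, given the set of all users, there exists a directed edge from user $u_i$ to user $u_j$ if $u_i$ posted a comment on an article by another user $u_j$. Therefore, the community is modeled as a directed graph in which the nodes are the users and the edges are these comment relations.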
HITS algorithm.
Using the Python NetworkX library, we calculated the authority score for each user node using the HITS algorithm to assess how much attention a user gets from other users. The attention a user receives from others is a better indicator of influence than the user's own activity because highly active users may be spam users or malicious users. Hence, we used the authority score and not the hub score to measure the influential power of each user.
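To make the authority score concrete, the following is a minimal pure-Python sketch of the HITS power iteration on a comment graph. It is illustrative only: the study itself used NetworkX, where `networkx.hits(G)` returns a `(hubs, authorities)` pair of dictionaries. The function name, the sum-to-one normalization, and the convergence tolerance are assumptions made for this sketch.

```python
def hits_scores(edges, iterations=100, tol=1e-10):
    """Compute HITS hub and authority scores by power iteration.

    `edges` is an iterable of (commenter, author) pairs: an edge
    u_i -> u_j means that u_i commented on an article by u_j.
    Repeated pairs act as edge weights.
    """
    edges = list(edges)
    nodes = sorted({user for edge in edges for user in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority: sum of the hub scores of users who commented on this user.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]

        # Hub: sum of the authority scores of users this user commented on.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]

        # Normalize so that each score vector sums to 1.
        auth_total = sum(new_auth.values()) or 1.0
        hub_total = sum(new_hub.values()) or 1.0
        new_auth = {n: s / auth_total for n, s in new_auth.items()}
        new_hub = {n: s / hub_total for n, s in new_hub.items()}

        change = sum(abs(new_auth[n] - auth[n]) for n in nodes)
        hub, auth = new_hub, new_auth
        if change < tol:
            break

    return hub, auth
```

In this sketch a user who only writes comments and never receives any ends up with an authority score of zero, which matches the motivation for preferring authority over hub scores when spam accounts are a concern.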
As shown in the Network Analysis subsection in the Experimental Results section, the distribution of authority scores follows Pareto’s law (Koch, 2011). In many studies (Lipovetsky, 2009; Koch, 2018; Parmenter, 2007), the 80/20 rule is often adopted to split a population into two polar groups. Therefore, we defined the top-ranked users who together account for 80% of the total authority score as opinion leaders and the remaining users as majority users.
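One way to implement such a split is to rank users by authority score and accumulate scores until a chosen share of the total is reached; the Experimental Results section reports an 80% share. The sketch below is an assumption about the procedure, not the authors' code, and the function and parameter names are hypothetical.

```python
def split_by_authority_share(authority, share=0.8):
    """Split users into opinion leaders and majority users.

    Users are ranked by authority score; the top-ranked users whose scores
    together reach `share` of the total are returned as opinion leaders.
    """
    total = sum(authority.values())
    ranked = sorted(authority, key=authority.get, reverse=True)

    leaders, cumulative = [], 0.0
    for user in ranked:
        if cumulative >= share * total:
            break
        leaders.append(user)
        cumulative += authority[user]

    majority = ranked[len(leaders):]
    return leaders, majority
```

Because the scores are heavily skewed, the leader group produced this way is typically a small fraction of all users even though it covers most of the total score.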
Topic Modeling
Topic modeling is a machine learning technique that extracts coarse-grained and abstract topics that prominently appear in a large-sized document corpus. Generally, topics can be calculated using unsupervised learning algorithms. For example, topic modeling includes probabilistic techniques such as probabilistic latent semantic indexing (p-LSI; Hofmann, 1999), latent Dirichlet allocation (Blei, Ng, & Jordan, 2003), and matrix factorization techniques such as latent semantic analysis (Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990) and nonnegative matrix factorization (NMF; Lee & Seung, 1999, 2001). By conducting topic modeling, topics and key words that prominently appear per topic can be extracted from the document corpus. Using these topics and key words, we can efficiently analyze large documents by examining only the summarized information or by interactive visualization using topic information. For example, several studies (Choo, Lee, Reddy, & Park, 2013; Chaney & Blei, 2012; Kim et al., 2017; Kuang, Choo, & Park, 2015) provide visualization platforms that help experts or analysts to carry out efficient analyses using interactive tools such as hovering and data linking.
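In this study, we conducted topic modeling using NMF. Before conducting topic modeling, a term-document matrix is constructed from the preprocessed documents, where each row corresponds to a term in the vocabulary and each column corresponds to a document.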
In addition to conducting topic modeling using the term-document matrix of all users, we performed topic modeling by establishing the term-document matrix of each user group separately.
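As an illustration of how NMF yields topics, the sketch below factorizes a small nonnegative term-document matrix X into W (terms by topics) and H (topics by documents) so that X is approximately WH, using the multiplicative update rules of Lee and Seung (2001). The solver, the initialization, and the number of iterations are assumptions for this sketch; the text does not state which implementation was used, and a real corpus would call for a numerical library rather than nested lists.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(X, W, H):
    """Frobenius norm of X - WH."""
    WH = matmul(W, H)
    return sum((x - y) ** 2
               for row_x, row_y in zip(X, WH)
               for x, y in zip(row_x, row_y)) ** 0.5


def nmf(X, k, iterations=200, seed=0, eps=1e-9):
    """Factorize a nonnegative matrix X (terms x documents) into W and H."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]

    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H), element-wise.
        Wt = transpose(W)
        num = matmul(Wt, X)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]

        # W <- W * (X H^T) / (W H H^T), element-wise.
        Ht = transpose(H)
        num = matmul(X, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]

    return W, H
```

Each column of W can be read as a topic (the terms with the largest entries are its key words), and each column of H gives the topic weights of one document; aggregating those weights by date gives the kind of topic-weight time series that can be compared with the price.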
Experimental Results
After constructing the community networks described in the Network Analysis subsection given in the Method section, we analyzed the community networks to differentiate opinion leaders and majority users in the community, as described in the Network Analysis subsection in the Experimental Results section. Subsequently, we performed topic modeling to compare the opinion leaders to other majority users, as explained in the Topic Modeling subsection given in the Experimental Results section.
Network Analysis
As described in the Network Analysis subsection given in the Method section, the HITS algorithm was applied to the community networks, and the authority score was calculated for each user. Figure 2 illustrates the total number of posts, the number of comments written by each user, and the number of comments tagged to each user’s posts according to the authority scores.

Figure 2. Scatterplots of the numbers of posts and comments according to the authority scores. (A) Number of posts. (B) Number of received comments. (C) Number of written comments.
Figures 2A and 2B show that both the number of posts and the number of comments received are highly correlated with the authority score, whereas Figure 2C shows a low correlation between the number of comments written and the authority score. In particular, the correlation between the number of comments received and the authority score is positive and much higher than the other correlations. This shows that authority scores tend to be high if a user receives many comments, that is, much attention from other users in the community.
In addition, the distribution of authority scores according to the user rank is shown in Figure 3A. This figure shows that a small number of top-ranked users account for most authority scores. We plot Figure 3A on a log–log scale (as shown in Figure 3B) and observe that the shape of the distribution almost follows a linear form and the coefficient of determination is approximately

Figure 3. Distribution of authority scores. (A) Authority score versus the user rank. (B) Log–log scaled graph of authority scores.
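The linearity check on the log–log scale can be reproduced with an ordinary least-squares fit of log(score) against log(rank). The following sketch is a generic implementation offered as an assumption about how such a fit might be computed; it returns the intercept, the slope, and the coefficient of determination.

```python
import math


def loglog_fit(scores):
    """Fit log(score) = a + b * log(rank) and return (a, b, r_squared)."""
    ranked = sorted((s for s in scores if s > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(s) for s in ranked]

    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Coefficient of determination of the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1.0 - ss_res / ss_tot
```

Scores that are exactly zero are skipped because their logarithm is undefined; how such users were handled in the original figure is not stated.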
According to Pareto’s principle (Koch, 2011), we selected users who account for the top 80% of authority scores (3,637 users, approximately 11.26%) and defined them as the opinion leader group; the remaining users were defined as the majority user group.
First, we extracted words that are used relatively more frequently by opinion leaders than by majority users and words that are used relatively more frequently by majority users. Table 1 indicates that opinion leaders use more terminology, such as technical words, than majority users. They are likely to be Bitcoin developers, Bitcoin miners, or businessmen producing new cryptocurrencies or initial coin offerings. In contrast, words that are used more frequently by majority users than by opinion leaders are related to the price, trading, and investment. This indicates that majority users are more interested in the price, transaction, and investment of Bitcoin.
Table 1. Top 25 Key Words That Are Relatively More Frequently Used by Each User Group Compared to the Other Group.
Note. The list of words is sorted by the relative ratio.
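A ranking of this kind can be obtained by comparing each word's relative frequency in one group with its relative frequency in the other. The exact definition of the relative ratio is not given in the text, so the sketch below is an assumption: it uses the ratio of smoothed relative frequencies, with add-one smoothing to avoid division by zero for words that one group never uses.

```python
from collections import Counter


def relative_ratio_keywords(group_a_tokens, group_b_tokens,
                            top_n=25, smoothing=1.0):
    """Rank words by how much more frequent they are in group A than in B."""
    counts_a, counts_b = Counter(group_a_tokens), Counter(group_b_tokens)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)
    size = len(vocab)

    def relative_frequency(counts, total, word):
        # Smoothed share of all tokens in the group taken up by this word.
        return (counts[word] + smoothing) / (total + smoothing * size)

    ratios = {
        word: relative_frequency(counts_a, total_a, word)
        / relative_frequency(counts_b, total_b, word)
        for word in vocab
    }
    # Sort by descending ratio; ties are broken alphabetically.
    return sorted(ratios, key=lambda w: (-ratios[w], w))[:top_n]
```

Calling the function twice with the arguments swapped gives the two word lists, one per user group.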
We further analyzed whether opinion leaders are interested in the price, transaction, or investment and, if so, whether they can represent the entire community by covering a variety of topics through topic modeling.
Topic Modeling
We performed topic modeling considering the period from June 1, 2015, to February 1, 2018, for the entire set of users, opinion leaders, and majority users. We conducted topic modeling with 5, 10, 15, and 20 topics. When the number of topics is more than 10, very similar topics occurred; therefore, 10 topics were observed to have the best fit. The hyperparameters
Table 2 implies that the topics generated by majority users are highly correlated with the price and are positively correlated with the quantities. In other words, the majority users are highly interested in topics related to the price. In contrast, the entire set of users and the opinion leaders show lower correlations than majority users, and a few of their topics are negatively correlated with the price. These trends are also shown in Figure 4B, which depicts the weights for the top three topics from majority users that show the highest correlation with the fluctuation in the Bitcoin price. These top three topics are mainly related to the price, transaction, and investment of Bitcoin. In Figure 4B, we can clearly observe that the weight of each of these topics fluctuates in a manner similar to the Bitcoin price. A related pattern can be seen in Figure 4A. After June 2017, when the Bitcoin price had drastically increased, the numbers of new members, posts, and visits also increased. These new members were not originally interested in Bitcoin, but it is likely that they became interested because the price soared. These results indicate that majority users are more interested in topics related to the price, whereas opinion leaders are interested in topics related to the price as well as other diverse topics such as the blockchain and mining.
Table 2. Pearson Correlation Between Topic Weights and the Bitcoin Prices.
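For reference, the Pearson correlation between a topic-weight time series and the price series can be computed as follows. This is a generic sketch; the dictionary layout (topic name mapped to a list of weights aligned by date with the prices) is an assumption about how the data might be organized.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlate_topics_with_price(topic_weights, prices):
    """Map each topic to the correlation of its weights with the prices."""
    return {topic: pearson(weights, prices)
            for topic, weights in topic_weights.items()}
```

A value near 1 means that the topic is discussed more when the price is high, a value near -1 means the opposite, and a value near 0 means no linear relation.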

Figure 4. Time-series graphs for community statistics and the weights of majority users’ topics highly correlated to Bitcoin prices. (A) Number of new members, posts, and visits on each date. Each value is normalized to the range from 0 to 1. (B) Weights for each topic extracted from majority users and real prices.
Therefore, we conducted further experiments to determine whether opinion leaders can effectively represent the entire community. The degree to which a user group represents the entire community can be calculated using the topic similarity between each topic of the group and its one-to-one matching topic from the entire community. However, finding the one-to-one matching pairs is a nontrivial problem. To solve this problem, studies such as Greene, O’Callaghan, and Cunningham (2014) used the Hungarian algorithm, also called the Kuhn–Munkres algorithm or Munkres assignment algorithm. The Hungarian algorithm is an optimization algorithm that solves the assignment problem; that is, it finds the one-to-one matching between two sets of topics that maximizes the total similarity over all matched pairs.
Greene et al. measured the similarity between two topics using the Jaccard index of their top-ranked key words and applied the Hungarian algorithm to these similarities. Following this approach, we found one-to-one matching topic pairs between the opinion leader group and the entire set of users and between the majority user group and the entire set of users, thereby obtaining matched topic pairs for each group.
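The matching step can be illustrated with a small sketch that scores every pair of topics by the Jaccard index of their top-ranked key words and then selects the one-to-one assignment with the largest total similarity. For brevity, the sketch finds the optimum by exhaustive search over permutations, which is a stand-in for the Hungarian algorithm and returns the same optimal total but scales factorially with the number of topics; a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would be preferable beyond a handful of topics. The function names are hypothetical.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard index of two key-word collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_topics(group_topics, community_topics):
    """Match each group topic to a distinct community topic.

    Each topic is given as a list of its top-ranked key words. Returns the
    matched (group_index, community_index, similarity) triples and the
    average similarity over the matched pairs.
    """
    k = len(group_topics)
    if k == 0:
        return [], 0.0
    sim = [[jaccard(g, c) for c in community_topics] for g in group_topics]

    best_perm, best_total = None, -1.0
    # Exhaustive search over one-to-one assignments (small k only).
    for perm in permutations(range(len(community_topics)), k):
        total = sum(sim[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_perm, best_total = perm, total

    pairs = [(i, j, sim[i][j]) for i, j in enumerate(best_perm)]
    return pairs, best_total / k
```

The average similarity returned here plays the role of the per-group average score: the higher it is, the more closely the group's topics mirror those of the entire community.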
Table 3 summarizes the topic similarities and average scores for the opinion leader group and majority user group. Most topics selected by opinion leaders and majority users are similar to topics by entire users. However, two topic similarities from majority users, as highlighted in Table 3, were far lower. These topics are mainly related to mining network updates and questions about the identity of Satoshi Nakamoto (2008) who invented the original version of Bitcoin and the blockchain. In spite of the use of a smaller number of articles (192,292 articles from opinion leaders and 197,344 articles from majority users), the average topic similarity from opinion leaders was higher than that from majority users. Thus, this result suggests that opinion leaders have a greater ability to effectively represent the entire community than majority users.
Table 3. Topic Similarities and the Average Scores for the Opinion Leader Group and Majority User Group.
Discussion
As presented, we revealed several interesting findings regarding opinion leaders and majority users in the Bitcoin forum. In the Network Analysis subsection in the Experimental Results section, we determined that the authority scores calculated from community networks follow the Pareto distribution. In addition, we investigated users who have the highest authority scores. As described earlier, a high authority score indicates that they are likely to get more attention or have a high reputation and communicate with other users actively. In practice, most of these users are Staff, Legendary, or Hero members in the Bitcoin forum (in the Bitcoin forum, there are positions such as staff, legendary, hero member, Sr. member, full member, member, Jr. member, newbie, and brand new for each user based on the number of activities and posts they have written) and active users who simultaneously work within the Bitcoin forum and other cryptocurrency communities. We can presume that they have abundant information about the trends of Bitcoin and other cryptocurrencies and they can transmit this information to this community. According to the results in Table 1, opinion leaders use professional and technical words relatively more frequently than others, whereas majority users use words related to the Bitcoin prices or transactions. It implies that, in reality, opinion leaders are very likely to be experienced and have a deeper understanding of Bitcoin and cryptocurrencies.
In the Topic Modeling subsection given in the Experimental Results section, we compared the topics extracted from opinion leaders and those from majority users in the Bitcoin forum. The results of the Pearson correlation demonstrate that majority users tend to focus on the price, transaction, and investment of Bitcoin. On the basis of the topic similarity, we found that opinion leaders can represent the entire community more effectively than majority users. Then, can we represent the entire community using a small number of opinion leaders? In other words, would it be more reasonable than selecting users randomly? In addition, we hypothesized that users who account for the top 80% of authority scores are opinion leaders, whereas Zhao et al. (2018) assumed that 5% of the members of a social e-commerce community are opinion leaders when conducting simulation experiments. Then, how many users need to be defined as opinion leaders to represent the entire community?
To answer these questions, in a similar manner to the Topic Modeling subsection given in the Experimental Results section, we calculated the average topic similarities from opinion leaders and presented them using a line graph according to the percentage of users. We also calculated the average topic similarities from randomly selected users as a baseline model. However, when users are selected randomly, the topic similarity is expected to vary considerably from one sample to another. Therefore, a bootstrapping-style resampling procedure was applied to reduce this variance: we repeated the random selection 10 times and used the average results.
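The random baseline can be sketched as follows. The `score_fn` argument is a hypothetical callable standing for the whole pipeline applied to one sample of users (topic modeling on their documents followed by the topic-similarity computation); the function names, the fixed seed, and sampling without replacement are assumptions made for illustration.

```python
import random
from statistics import mean


def random_baseline(users, fraction, score_fn, repeats=10, seed=0):
    """Average a score over repeated random selections of users.

    `score_fn` is a hypothetical callable that maps a list of selected
    users to a topic-similarity score.
    """
    rng = random.Random(seed)
    size = max(1, round(fraction * len(users)))
    scores = [score_fn(rng.sample(users, size)) for _ in range(repeats)]
    return mean(scores)
```

Evaluating this function over a grid of fractions gives a baseline curve that can be compared with the curve obtained by taking the same fraction of users in descending order of authority score.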
Figure 5 shows that the topic similarity of opinion leaders converges to almost 1.0 after using only 7% of users. However, the topic similarity of randomly selected users increases much more slowly than that of opinion leaders and converges to a lower value than that of opinion leaders. Therefore, we can represent the entire community with only a small percentage of opinion leaders. However, at some points, such as ratios of 0.45 and 0.65, the topic similarity slightly decreases. We note that this is because there are several users having different interests even in the same opinion leader group. We leave this problem of various groups existing among opinion leaders for future work.

Figure 5. Average topic similarity according to the proportion of users.
This study focused on online users in a Bitcoin forum. Most Bitcoin transactions are conducted online, and many people seek and share information on a forum. We confirmed that environments characterized by a large number of users and a large amount of shared information may have opinion leaders, the existence of which facilitates the analysis. Such analyses utilizing opinion leaders are expected to be helpful in various ways in processing information related to Bitcoin. For example, we can analyze the huge amount of information more efficiently. Because only a small percentage of users, the opinion leaders, must be analyzed to characterize the entire community, we use only the posts written by these opinion leaders, which amount to less than half of all posts in the community. This reduction in the number of documents targeted for analysis is very important because topic modeling consumes considerable memory and time resources.
The findings confirmed in various Bitcoin forums with unique characteristics are expected to contribute to finding the opinion leaders in communities with various types of environments, identifying their roles, and extracting meaningful information. For example, the identification of opinion leaders in a marketing-oriented forum may maximize the marketing effect by targeting these opinion leaders, as they have more socially influential power than majority users.
Conclusion
This study identified opinion leaders in a Bitcoin forum and compared them to other majority users by analyzing the topics extracted from posts written over the past 3 years. The results of this study showed that the information delivered by a small number of opinion leaders may represent the opinion of the entire community in large social communities such as the Bitcoin forum where information is exchanged vigorously.
In a future study, as mentioned in the Discussion section, we intend to find the opinion leaders using a more efficient method and divide them into several topic groups to analyze the opinion leaders for each topic group. The analysis of how online communities influence the value of cryptocurrencies and the blockchain is an interesting research problem (Munksgaard & Demant, 2016). Furthermore, we will investigate methods for finding opinion leaders more efficiently not only in the Bitcoin forum but also across other online social communities with different characteristics.
Footnotes
Authors’ Note
The data set generated and analyzed during the current study is available in the repository at https://tinyurl.com/y6vgycst. The source code generated and analyzed during the current study is available in the author’s Github repository at
.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2016R1C1B2015924) and the Chung-Ang University Research Grants in 2018.
