Abstract
Weibo, the most widely used social media platform in China, has a profound public impact, and researchers increasingly gather moods from it for social computing and analysis tasks such as financial prediction. Most existing literature concentrates on text semantics or sentiment mining techniques but neglects the process of mood dissemination and its factors. This paper proposes an integrated framework for social media mood mining that focuses on information transmission and the analysis of propagating factors in order to predict stock prices more accurately. For the propagating factors, several essential factors in the dissemination process are distinguished, such as emotional absorption of forwarding, influence of content and poster, user categories, and release time, to optimize the fitting effect of the original model. The count of forwarding also matters in predicting stock prices. By searching a given finance-related keyword, we collected over 500,000 micro-blogs and their user information from Weibo. We then apply both the proposed integrated framework and a simple neural network method to predict stock price fluctuation. Experiments demonstrate that the former outperforms the latter. The results also show that user categories and the count of forwarding differ in the lag phase of their influence. In addition, this paper studies the fitting effect of the prediction models for different periods of the stock curve. The results indicate that the model works best in rising periods of the stock price curve, relatively well in declining periods, and worst in randomly fluctuating periods.
Introduction
Weibo, which means “micro-blog” in Chinese, has grown into a leading social media platform. It provides a simple way for Chinese people and organizations to express themselves publicly in real time and to interact with others on a massive global scale. Weibo has thus become a cultural phenomenon in China with a profound social impact. Existing studies have found that most users and organizations regard Weibo mainly as a means of real-time public self-expression. As a consequence, Weibo has become a good medium for mood aggregation and distribution; for example, its influence on the stock market has become noticeable and has attracted the attention of scholars. Many scholars have tried to gather sentiment on Weibo for prediction, since sentiment might influence trends in stock markets.
Text sentiment analysis refers to detecting, analyzing, and mining users’ viewpoints, preferences, and emotions in text. Weibo has its own characteristics, such as a tremendous volume of feeds, real-time information, and special text styles. These give scholars more room for Weibo mining but also pose greater challenges for mining technologies. Most works focus on text mining techniques, such as semantic analysis or sentiment mining. Various semantic webs, ontologies, and sentiment dictionaries have been established, and related papers are abundant. The efficacy, accuracy, and stability of the proposed prediction methods, however, differ from case to case. Existing works concentrate on text mining techniques but neglect the process of mood dissemination. Worse, they take “all users” as the subject of investigation, ignoring that user categories, as propagating factors in information transmission, seriously affect the processes of mood aggregation and distribution. At present, studies in this area are relatively weak. We integrate propagating factors into Weibo mining for stock price prediction. Moreover, we take the amount of forwarding into account and check its effect on stock price fluctuation. We believe that considering user motivation and behavior is a meaningful effort, and that it enriches the methodologies of Weibo semantic analysis and sentiment mining. Experiments show that our proposed integrated framework outperforms the simple neural network method. We observe that user category and the count of forwarding differ in the lag phase of their influence. In addition, we find that the model fits best in rising periods of the stock price curve, second best in declining periods, and worst in fluctuating periods.
The remainder of the paper is organized as follows. Section 2 briefly describes the problems addressed and reviews the literature. Section 3 introduces the fundamental procedures of Weibo mining for prediction: crawling, storing, and preprocessing data, and calculating the text sentiment of Weibo feeds. In Section 4, we apply an improved approach that introduces the effect of propagating factors into the proposed prediction models to predict stock prices more accurately. These propagating factors consist of the emotional absorption index, content influence, poster influence, poster activity, and the release time of Weibo feeds. Section 5 introduces the effect of user categories on the models. We analyze the propagating effect of different users (verified or unverified) and of different forwarding behaviors (forwarding 1 through 5 times and more than 5 times). Based on this analysis, we improve the proposed prediction models. The experiments and their results are presented in Section 6. Finally, some concluding remarks are given in Section 7.
Literature review
Considering that a Weibo feed is limited to 140 Chinese characters and has special text styles shaped by the current network culture, we focus only on word/phrase-level sentiment analysis. There is abundant literature, especially on Twitter analysis, which is usually divided into two categories, target-dependent and target-independent. (a) Target-dependent sentiment analysis comprises two kinds of approaches, rule-based and feature-based. Davidov et al. [3] utilized 50 Twitter tags and 15 smileys as sentiment labels, trained a rule-based classifier similar to a KNN (k-Nearest Neighbor) system, and identified and classified diverse sentiment types in short texts. Saif et al. [10] alleviated the data sparseness problem of Twitter using two different feature sets: a semantic feature set incorporated into classifier training through interpolation, and a sentiment-topic feature set with which the original feature space was augmented. Jiang et al. [6] proposed an improved Twitter sentiment classification: they first incorporated target-dependent features and then took related tweets, that is, the context, into consideration. (b) In target-independent sentiment analysis, scholars do not care who or what the given text talks about, but only want to know the sentiment polarity, good or bad. Most works published so far are subject-irrelevant. In this area, mainstream methods include dictionary-based techniques and unsupervised, supervised, or semi-supervised machine learning techniques. Go et al. [4] presented a novel approach, using distant supervision, for automatically classifying the sentiment of Twitter feeds. They adopted various machine learning algorithms, including naive Bayes, maximum entropy, and SVM. Kontopoulosa et al. [8] deployed original ontology-based techniques for a more efficient sentiment analysis: Twitter feeds are not simply characterized by a sentiment score but instead receive a sentiment grade for each distinct notion in the post. Thelwall et al. [12] reported a study of Twitter posts that assessed whether popular events are typically associated with increases in sentiment strength, as seems intuitively likely.
Another important part of the literature studies information transmission and the propagating effect of Weibo feeds. Here the propagating effect refers to the total impact on users and society that comes from information transmission via a social media platform. Jansen et al. [5] analyzed a large number of micro-blog postings containing branding comments, sentiments, and opinions, and found that 80 percent were information seeking or sharing; only 20 percent contained some expression of branding sentiment. Of these, more than 50 percent were positive and 33 percent were critical of the company or product. This result shows the importance of micro-blogs for overall marketing strategy and branding campaigns. Boyd et al. [2] examined retweeting behavior, investigating who retweets, why, and how. They revealed the messiness of retweeting by highlighting how issues of authorship, attribution, and communicative fidelity are negotiated in diverse ways. Suh et al. [11] examined a number of features that might affect the retweetability of tweets and found that some factors were significantly associated with the retweet rate. Xia [14] investigated the structure and mechanism of Weibo interaction. He argued that Weibo is a cultural, personal, and emotional medium, and that the rights of the “receiver” would go beyond those of the “transmitter”. Li [9] regarded followers’ forwarding behavior as the key means of information transmission, and therefore took the scale and propagation time of forwarding behavior as two key indicators of the propagation effect. From the perspective of individual forwarding motivation, Li integrated five features into a prediction model: poster influence, follower activity, content importance, similarity between followers’ interests and content, and intimacy between poster and follower. Zhang et al. [16] investigated the retweet mechanism on Twitter, analyzed different features, and presented a new classifier with weighted features to predict retweet behavior. The experiment showed that the proposed model outperformed previous works, with an accuracy of 85.9%.
The relationship between online mood and the stock market. In western countries, public sentiment on the web, especially on micro-blogs such as Twitter, has been quantified and fed into stock market prediction models. Zhang et al. [15] analyzed the correlation between sentiment collected on Twitter and stock market indicators. They found that the sentiment indices correlated significantly negatively with the Dow Jones, NASDAQ, and S&P 500, but significantly positively with the VIX. Bollen et al. [1] used OpinionFinder to measure positive vs. negative mood, and Google-Profile of Mood States (GPOMS) to measure mood along 6 dimensions (Calm, Alert, Sure, Vital, Kind, and Happy), in the text content of daily Twitter feeds. They reported that the accuracy of Dow Jones predictions could be significantly improved by the inclusion of specific public mood dimensions. Domestic research in China has started to focus on the correlation between online sentiment on Weibo and the Chinese stock market. Most scholars prefer to apply SVM in stock prediction models, since it is better suited to the nonlinear and volatile characteristics of stock prices. Compared with ANN (Artificial Neural Network), SVM helps with problems of few samples or high dimensionality. Jin et al. [7] presented a multi-variable stock market time series prediction model with an SVM-based core algorithm to improve prediction accuracy. Experimental results showed that its generalization ability was better than that of single-variable regression prediction. Wang et al. [13] added news sentiment as an extra indicator to financial prediction models for stock price volatility. The result confirmed the correlation between information sentiment and stock price volatility.
In summary, existing work focuses on the process of text mining rather than mood dissemination, and studies in this area are relatively weak. We therefore propose to integrate propagating factors into Weibo mining for stock price prediction and to take the amount of forwarding into account, which we believe may improve the prediction results.
Weibo data acquisition, preprocessing and preliminary analysis
Data acquisition
Over 2.8 billion feeds have been shared on Weibo. Most of these feeds are freely available to researchers through public APIs, namely the “Sina Weibo Open Access Platform”. Although we can use these APIs to access Weibo data, strict rate limits restrict the volume of data we can collect, so we also have to resort to automated programs such as web crawlers, scrapers, spiders, or robots. Hence, in this study, we use a combination of public APIs and web crawlers. We store three main categories of data: (a) Weibo feeds, for sentiment analysis; (b) poster properties, for user category and property analysis; (c) original content of forwarding, for motivation analysis of forwarding behaviors.
Data preprocessing
Limited to 140 Chinese characters, a Weibo feed tends to be concise and marked by simplicity, clarity, and candor. Its style differs from traditional writing in three respects: (a) the semantic structure is usually incomplete, with subjects or objects omitted; (b) context is very important on Weibo, since a review or comment made when forwarding replies directly to the original feed, just like a conversation; (c) feeds usually contain emoticons, labels, and hyperlinks, which are essential to the meaning the feeds express. All these characteristics make sentiment analysis more complicated, so some preparation is needed before calculating the sentiment score of a text. First, we split feeds layer by layer. We then perform sentiment analysis on each text separately, stack the scores, and sum them to obtain a final result.
After preprocessing the Weibo data, we calculate the sentiment of Weibo feeds, transforming text into a sentiment score. This process automatically associates a piece of text, such as a word or phrase, with a positive or negative emotional score. We therefore need to find the special cases in the text and treat them specially. As mentioned above, emoticons also carry sentiment. Usernames, in contrast, should be removed and not taken into account in sentiment analysis, even if they carry some emotional meaning.
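The layer splitting and username removal described above can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the function name `split_layers`, the assumption that each forwarded layer starts with `username:` (half-width or full-width colon), and the mention pattern are all hypothetical choices made for the example.

```python
import re

FORWARD_MARK = "//@"                 # separator between forwarding layers
MENTION = re.compile(r"@[\w\-]+")    # assumed shape of an inline @username


def split_layers(feed):
    """Split a feed into its forwarding layers and strip usernames.

    The first element is the newest comment; later elements are the
    texts that were forwarded, in the order they appear in the feed.
    """
    parts = feed.split(FORWARD_MARK)
    layers = [parts[0]]
    for part in parts[1:]:
        # Assumption: a forwarded layer looks like "username:text".
        pieces = re.split(r"[:：]", part, maxsplit=1)
        layers.append(pieces[1] if len(pieces) == 2 else "")
    # Drop remaining @mentions so usernames do not affect the sentiment.
    return [MENTION.sub("", layer).strip() for layer in layers]


layers = split_layers("good news @bob//@alice:I agree//@carol:rates are rising")
print(layers)
```

Each returned layer can then be scored separately by whatever sentiment scorer is in use; emoticons are deliberately left in the text because they carry sentiment.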
Usually, a Weibo feed contains the original message as context, followed by reviews or comments that continue the mood of the original. We should therefore not calculate the sentiment of each of them in isolation, but combine the layers using a vector of weights. The big challenge is how to create such a weight vector. Since there is so far no research result on this problem, we plan to obtain the weight vector experimentally. For ease of processing, we first store the sentiment scores in a string format such as “1st score + 2nd one + 3rd one + …”, where the “1st score” is the sentiment score of the first layer. Later, during sentiment analysis, we split the strings and calculate the weight vector.
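As a rough illustration of the storage format, the sketch below parses a stored score string and combines the layer scores with a weight vector. The function names and the example weights are hypothetical; the text obtains the actual weights experimentally and does not list them here.

```python
def parse_scores(encoded):
    """Parse a string such as "0.8 + -0.2 + 0.5" into per-layer scores."""
    return [float(token) for token in encoded.split("+") if token.strip()]


def weighted_sentiment(encoded, weights):
    """Combine layer scores with a weight vector (one weight per layer)."""
    scores = parse_scores(encoded)
    if len(scores) > len(weights):
        raise ValueError("need at least one weight per layer")
    return sum(w * s for w, s in zip(weights, scores))


# Illustrative weights only: deeper layers count less in this example.
example_weights = [1.0, 0.5, 0.25]
print(weighted_sentiment("1 + 2 + 3", example_weights))
```

Keeping the raw per-layer scores in the string, rather than a pre-combined value, makes it possible to try different weight vectors later without rescoring the text.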
Preliminary analysis
To investigate how well the Weibo sentiment curve fits a related stock price curve, we select the keyword “YuE Bao” and a related stock (JZGF, stock code 600446) on the SSE (Shanghai Stock Exchange). We first search for this keyword on Weibo and collect the related feeds, and then calculate the sentiment curve of the feeds on a daily basis.
Directly comparing the curves of sentiment values and stock prices. For an intuitive grasp of the situation, we plot them in a time-series graph, as shown in Fig. 1 (where ‘M6’ means June, ‘M7’ means July, and so on; ‘M5’–‘M12’ in other figures have the same meaning).

A sample of forwarding Weibo feed with the similar words.
Obviously, the two curves differ significantly. The stock price peaks in November 2013 and remains relatively stable in the other months. The sentiment value peaks in June and November and is relatively stable in the other months. Looking at the small changes in the two curves, we find an indistinct lag effect between them, which needs further study.
Subjective judgment from the graphs does not reveal the inner connections within such a large amount of data. We therefore perform the ADF (Augmented Dickey–Fuller) test, in its simplest form as a unit root test, using the software Eviews 6. The original curve is not stationary, whereas the curve obtained by first-order differencing of the original one passes the unit root test. That is, the stock price series is integrated of order one, JZGF ∼ I(1). The software Eviews 6 gives us the output results:
This indicates that the sentiment curve is not a random walk sequence. However, like the first-order difference of the stock price curve, it is a curve of first-order integration.
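For reference, the ADF test used above is commonly stated through the regression below. The symbols are generic textbook notation, not taken from the Eviews output; which deterministic terms (constant, trend) and which lag length p were selected is not stated in the text, so the form shown, with a constant only, is an assumption.

```latex
\Delta y_t = \alpha + \gamma\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad H_0:\ \gamma = 0 \ \text{(unit root)}, \quad H_1:\ \gamma < 0 .

y_t \sim I(1) \iff y_t \ \text{is non-stationary and} \ \Delta y_t = y_t - y_{t-1} \ \text{is stationary}.
```

Rejecting the null hypothesis for the differenced series but not for the original series is what the notation JZGF ∼ I(1) expresses.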
As the previous section illustrated, though the model and each coefficient all pass the significance tests, F test and t test, respectively,
Here we determine that the time lag between stock prices and Weibo sentiment is 4, which serves as the benchmark time lag for the subsequent improvements.
In the following sub-sections, we use the control variable method to separate the influencing variables and obtain their weights. The method reduces the confounding effect of irrelevant variables, so that we can check one by one the factors that affect Weibo mood, including emotional absorption of forwarding, influence of content and poster, and release time of feeds. For each new factor, we compare and analyze the regression results.
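A simple way to explore candidate time lags is to correlate the sentiment series with the price series shifted by each lag. The sketch below is a stand-in for illustration only: the text selects the lag through regression significance tests in Eviews, whereas this example uses the plain Pearson correlation, and all function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def lagged_correlation(sentiment, price, lag):
    """Correlate sentiment at step t - lag with price at step t."""
    if lag == 0:
        return pearson(sentiment, price)
    return pearson(sentiment[:-lag], price[lag:])


def best_lag(sentiment, price, max_lag):
    """Return the lag in 0..max_lag with the largest absolute correlation."""
    return max(range(max_lag + 1),
               key=lambda k: abs(lagged_correlation(sentiment, price, k)))
```

With daily series of equal length, `best_lag(sentiment, price, 6)` returns the shift at which the two series line up most closely; a regression-based check would still be needed before adopting that lag in a model.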
Emotional absorption of forwarding
The first factor is the emotional absorption index of forwarding behaviors, an indicator of the emotional overlay that occurs during forwarding. Usually there are three scenarios: (1) Commonly, users support the feeds they forward. Their comments are very short or absent altogether, consisting only of the word “forwarding”. In this case we consider that the present feed inherits the mood of the original one. The emotional absorption index is k if the present feed has the same mood as the previous one. We simplify forwarding as a nested structure “the first layer feed//@the second layer//@…”. That has a formula
After incorporating forwarding behaviors’ emotional absorption index k into the predictive model, the new model’s
Compared with the latest equation (3), the regression coefficients of both variables have improved to a certain degree. This indicates that the sentiment curve has a more significant impact on the stock price curve, so it makes sense to add the emotional absorption index of forwarding behaviours to the predictive model.
Feeds influence
The second factor is feed influence, which depends on the quality, length, and interest of a feed’s content and on the amount of reviews it receives. Due to the nature of social media such as Twitter and Weibo, it is very easy to express an opinion. To support a feed, you only need to forward it. To comment on a feed, you only need to post a review under it. Even more simply, you can click the “like” or “dislike” button to express your feeling. Hence the amounts of comments, forwarding, and likes/dislikes a feed receives are enough to reflect its popularity. We find it hard to say which of these three factors is more important than the others. Therefore, we simply add them up.
The statistics show that 276,290 of the 389,473 collected feeds have no comment, forwarding, or like at all. This confirms the common phenomenon that the majority of Weibo feeds do not attract public discussion. Among the remaining 113,183 feeds, the highest number of responses (the sum of comments, forwarding, and likes/dislikes) is 23,432, which reflects huge popularity in the social network. But 90% of them have fewer than 100 responses. So it is necessary to set different weights for feeds according to their popularity. Because the span of the number of responses is too large, we adopt the algorithm below to standardize the data with a threshold of 100.
After getting the weight
Posters influence
The properties of feed authors also matter, including their charisma, degree of activity, etc. However, these are difficult to quantify from pure text, so we prefer to use numerical indicators and emphasize the amounts of reviews, forwarding, and “like”, as well as their charisma (weight
Regarding charisma, we must admit that social media is an amazing place that encourages people to convey their thoughts, ideas, or appeals freely to the public. This leads to a new cultural phenomenon: many ordinary people are becoming “somebodies” on social media. Some users are very famous on Weibo, such as @Zuoyeben, @Guomingyi, @Yuying, @Zhangjing, @KejiaSister, @Huaijiu, @zhishu, @Nongyange, @Suzizi, @Wangyujing. What they have in common is a huge number of followers, called “fans” on Weibo. For example, @Zuoyeben has more than 8,759,307 fans and @Guomingyi 21,770,617 fans. These striking numbers reveal their enormous public appeal. Hence we adopt the number of fans as the simplest and most direct quantitative indicator of a poster’s charisma.
Among all 309,186 collected users, the user with the most fans has 20,061,864, while 268,583 users, about 86%, have fewer than 1,000 fans. The span is too large, so we need to standardize the data to calculate the user charisma weight
Similarly, after getting the weight
Comparing equation (8) to equation (4), obviously incorporating
Another important indicator of a user’s public appeal in social media is the degree of activity. Even somebody with numerous fans will lose leverage if inactive for a while, because Weibo evolves so fast. The activity degree of a user is directly reflected in the number of feeds he/she posts. There is also an indirect effect: posting many feeds in a period of time increases visibility on fans’ homepages. The more feeds a user posts, the more familiar and credible he/she becomes. However, excessive behaviours, e.g. posting a number of valueless feeds in a short time, antagonize fans and can even cause them to “Unfollow”. Such behaviours may damage the user’s charisma. This is too complicated to handle here, so we do not consider this kind of problem or its weight in public appeal. We address only the positive correlation between activity degree and charisma.
The statistics show that, of all 309,168 collected users, 201,639 users, about 65%, have fewer than 2,000 feeds. The largest number of feeds posted by one user is 6,833,567. This reflects the distribution of feeds on Weibo, so it makes sense to add the author activity degree weight
Similarly, after getting the weight
Comparing four equations (4), (6), (8), and (10), it is clear that incorporating
Clearly, the three independent weights, of content influence, fan quantity, and author activity degree, differ in their impact on the regression results. Among them, the second works best. This confirms the idea that more fans lead to more power in social media: a feed posted by a user with numerous fans is more easily seen and forwarded to others. This is the root reason and economic incentive for the widespread and uncurbed zombie fans industry. To further improve the regression, we simultaneously incorporate the three factors of feed influence into the preliminary model. The integrated weight is the product of the three weights, that is
The regression analysis results are summarized in Table 1.
Explorations of 3 weights of feeds influence
Obviously, incorporating the three weights simultaneously gives the best result, outperforming any single independent weight, with
The release time of a feed has a big impact on its visibility on Weibo, which follows from Weibo’s display style. For any logged-in user, the Weibo system automatically displays all newly posted feeds in reverse order of release time, so fans see the latest feeds first. In common usage, very few users browse more than 5 pages, which means the range of content seen is limited. Hence it is better to set different weights for feeds with different release times.
The number of Weibo feeds differs significantly between time periods. There are fewer feeds in the early hours of the morning, which is consistent with people’s habits. Inspecting specific feeds shows that the majority of feeds released in the early hours of the morning are released by automatic release tools, such as Time Machine, and may be driven by network marketing. Since network marketing also has an impact on users’ mood, we do not single out advertising-type feeds.
Considering that people work 8 hours every day, we divide each day into three parts. The first is 0 to 8 o’clock, the early morning, with the fewest released feeds; the second is 8 to 17 o’clock, the work hours, with a generally higher number of feeds; the third is 17 to 0 o’clock, the after-work time, with a basically stable number. The regression model is analysed and compared across the time periods, and the time factor is introduced as another weight. In case there is no corresponding feed in a certain category, we add 0.001 so that the logarithm can be calculated.
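The three-way split of the day and the 0.001 guard for the logarithm can be sketched as follows. The period labels, the function names, and the choice to apply the offset only when a period has no feeds are assumptions made for this example; the text does not say exactly which quantity is passed to the logarithm.

```python
from math import log


def period_of(hour):
    """Map an hour of the day (0-23) to one of the three release periods."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if hour < 8:
        return "early_morning"   # 0 to 8 o'clock
    if hour < 17:
        return "work_hours"      # 8 to 17 o'clock
    return "after_work"          # 17 to 0 o'clock


def period_totals(feeds):
    """Sum sentiment scores per period from (hour, score) pairs."""
    totals = {"early_morning": 0.0, "work_hours": 0.0, "after_work": 0.0}
    for hour, score in feeds:
        totals[period_of(hour)] += score
    return totals


def safe_log(total, offset=0.001):
    """Logarithm that substitutes the small offset for an empty period."""
    if total == 0:
        total = offset
    return log(total)   # a negative total still raises ValueError


totals = period_totals([(9, 2.0), (10, 1.0), (20, 0.5)])
print({name: safe_log(value) for name, value in totals.items()})
```

In this sample no feed falls in the early morning, so that period is replaced by the offset before taking the logarithm.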
Regression analysis in the early hours of the morning
Based on the benchmark time lag in equation (4), we use the Weibo data from the early hours of the morning. The results are shown in Table 2, which clearly indicates a 3-order lag influence. The equation is given as equation (12). Compared to equation (11), the new regression coefficient is better and
Explorations of feeds in the early morning
Similarly, we use the Weibo data from work hours. The lag influence is again 3-order, and we get a new equation (13). The results shown in Table 3 indicate that, compared to equation (11), the new regression coefficient is also better, as well as
Explorations of feeds in working hours
Similarly, using the Weibo data from after-work time, we get a 4-order lag influence and another equation (14). Unfortunately, compared with equation (11), this time the results in Table 4 are not as good as expected, since its regression coefficient and
Explorations of feeds in after-work time
After introducing the time factors separately, we found that only the model with the early morning factor is better than the original model, while the other two time periods perform worse in model fitting. To solve this problem, the three factors are introduced simultaneously with the benchmark 4-order lag. The three factors are weighted and added, with the regression coefficient of each regression as the weight. The final model is shown as equation (15). Exploratory experiments are run with the three phases respectively. As Table 5 shows, introducing the three time weights simultaneously has the most obvious improving impact of
Explorations of three time factors
Using the control variable method, in this section we analyze the multiple parameters influencing the emotion of Weibo feeds, including the inheritance of forwarding emotion, feed content, poster influence, and release time. For the different parameters, we compare the regression effect through classification and time lag adjustment, obtain the optimal weights and the optimal fitting, and finally arrive at the optimal model with 4-order time lag, where the
User categories and their effect
Because Weibo is a dominant social media platform in China, there is a big overlap between its users and stock investors. Many feeds reflect posters’ emotions, which in turn affect investors’ emotions. Institutional investors have more information than individual investors, and their investment behaviour is more rational. Individual investors are noise traders, likely to follow others. This paper therefore relies on the verification system of Weibo to distinguish different investors. In this experiment, a total of 40,041 feeds were released by verified users and 349,432 by unverified ones.
Explorations of verified user factor
Weibo provides a variety of verification types, including government, enterprise, institution, media, personal user, and website certification, covering entertainment, sports, finance, IT, communications, and other industries. The verification platform has relatively strict application requirements and processes for all kinds of certification, ensuring that the information is basically credible.
First, we analyse verified users alone. The time lag and the model’s exploratory experiments are shown in Table 6, and the model expression is given as equation (16).
Second, we analyse unverified users only. The time lag and the model’s exploratory experiments are shown in Table 7 below. The lag influence is 4-order, following the overall emotional trend, and the model expression is given as equation (17).
Explorations with ordinary user factor
We then introduce the weights of the two factors, verified and ordinary users, and the final equation is equation (18).
We conduct exploratory experiments with equation (15); the results are shown in Table 8.
Explorations of user verification factors
It can be seen that introducing a user verification factor separately cannot improve the original model. But with two user verification factors,
Here unverified users are divided into two categories: users forwarding from verified users and users forwarding from unverified users. However, no reasonable regression results were obtained. Because the users introduced by “//@” in the forwarding content are not classified and user information is lacking for the middle of the forwarding chain, the division into two categories is not accurate enough, and the effect of user verification in this setting could not be proved.
Now consider the effect of user categories on the count of user forwarding. Owing to the characteristics of verified users, any node of the dissemination network can act as a starting point, so a verified user is not influenced by the count of forwarding behaviors. Unverified users are different: the accumulated effect of forwarding behavior is fully reflected in them, so it is necessary to classify and discuss the emotion of feeds with different numbers of forwarding. Since the forwarding information of a feed cannot be accessed directly, we determine the number of forwarding from the number of “//@” in the feed text, which includes the feed itself and the original one that is forwarded.
The distribution table of the count of feed forwarding shows that feeds forwarded more than 5 times account for only 10%, so they are treated as one class. User forwarding behaviours are thus divided into several situations: zero forwarding, forwarding once, twice, 3 times, 4 times, 5 times, and more than 5 times, so these six influence factors are to be studied.
First, we determine the best time lag of the model for each factor and conduct exploratory tests on the six factors respectively. The results are shown in Table 9.
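Counting the “//@” markers and grouping feeds by forwarding count can be sketched as below. The function names and the string labels for the classes are hypothetical; the grouping follows the situations listed above, with everything above 5 merged into one class.

```python
FORWARD_MARK = "//@"


def forwarding_count(feed_text):
    """Number of forwarding steps, taken as the number of "//@" markers."""
    return feed_text.count(FORWARD_MARK)


def forwarding_class(feed_text):
    """Label a feed "0" to "5" by forwarding count, or ">5" beyond that."""
    count = forwarding_count(feed_text)
    return str(count) if count <= 5 else ">5"


print(forwarding_class("nice//@alice:agree//@bob:original post"))
```

Because the count is read from the text alone, it is only as reliable as the “//@” convention; a forwarding chain truncated by the 140-character limit would be undercounted.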
Time lag of 6 factors on forwarding
The emotional time lag is shortened to 3 when forwarding once, which indicates that the impact of forwarded feeds on emotion and behavior is more direct and suggests that the accumulation of forwarding increases the reliability of the information. This contagion begins to weaken after forwarding 3 times. According to the table and the significance test of the regression coefficient, we take 3-order lag for forwarding twice when
Without changing the time lag of each factor, we improve the model by changing the number of factors. The six factors are introduced into the original model gradually to determine the final total model. The time lag is set to the benchmark value of 4, and the summary is shown in Table 10.
Exploratory analysis of gradually introducing the count-of-forwarding factors
After the factor for forwarding 5 times and above is introduced, the validity of the model weakens, so only the 4 factors up to forwarding 4 times are introduced. It can be inferred that, given the 140-character limit of a Weibo feed, a high frequency of “//@” markers indicates that the forwarding is done mostly for fun and carries less emotion.
Based on the time lag of each factor and the number of introduced variables determined in Sections 5.1 and 5.2, the regression model is given as equation (19) below.
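The gradual introduction of factors described above can be sketched as a sequence of nested regressions. The source does not state which statistic was used to judge the validity of the model, so adjusted R² is used here purely as an assumed stand-in. The function names and the factor ordering are likewise hypothetical.

```python
import numpy as np


def adjusted_r2(X: np.ndarray, y: np.ndarray) -> float:
    """Adjusted R^2 of an OLS fit of y on the columns of X plus an intercept."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def introduce_gradually(factors: dict, y) -> list:
    """Add factors in the given order and score each nested model.

    Returns a list of (factor names so far, adjusted R^2) tuples.
    """
    y = np.asarray(y, dtype=float)
    names, cols, steps = [], [], []
    for name, values in factors.items():
        names.append(name)
        cols.append(np.asarray(values, dtype=float))
        steps.append((tuple(names), adjusted_r2(np.column_stack(cols), y)))
    return steps
```

A step at which the score stops improving would then mark the point where adding a further factor no longer helps, mirroring the decision to stop at forwarding 4 times.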
After model adjustment,
Finally, the models for unverified and verified users are combined with the forwarding-count factor, and the weights of all variables are re-estimated starting from the earlier ones. The final regression, which is the final equation of our proposed model, is shown as equation (21) below.
Concluding remarks
This section categorizes Weibo users into verified and unverified ones and studies the time lag and the fitting effect between emotional value and stock price. Unverified users are then further categorized by forwarding count: the time lag is shorter when a feed is forwarded once or twice, which indicates the impact of the forwarding count, whereas forwarding more than 5 times is more likely to be done just for fun. The final regression model includes the impacts of user categories and the count of forwarding,
Experiments and analysis
In this section, we collect data from the stock market and from Weibo, both covering June through December 2013. We use these data to check the fitting effect of the final model, equation (20).
Model applied in different period
The first experiment analyzes the applicability of the final model in different periods of the stock curve. Because a stock curve that follows a first-order-differenced random walk is difficult to forecast in the short term, the
First, according to the fluctuation of the stock curve, we choose 5 time periods to compare and analyse, as shown in Fig. 2. The 5 selected periods fall into three categories: the rising period of the curve (June 5th to July 10th, September 27th to October 14th), the stable period (August 1st to 21st), and the declining period (October 14th to November 1st, December 3rd to 23rd), as shown in Table 11.

Stock curve and 5 time periods taken.
Model applicability in different periods
From the table, we can draw several insights: the model performs very well in the rising period.
In short, the primary conclusion is that the final model is suited to short-term analysis and prediction, especially in rising or declining periods, and not in periods with relatively stable fluctuations.
Using EViews 6, we apply the final model to predict the data and compare the predictions with the real data over the whole time period. The experimental results are shown in Fig. 3.
EViews 6 provides several indicators for evaluating predictive performance, shown in Table 12: root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), the Theil inequality coefficient, and the deviation ratio, variance ratio and covariance ratio. The criteria are: (1) the smaller the RMSE and MAE, the better the prediction; (2) the Theil inequality coefficient lies between 0 and 1, where 0 represents complete agreement with the real values; (3) the deviation ratio indicates how far the mean of the predictions deviates from the mean of the actual sequence, the variance ratio indicates how far the variance of the predictions deviates from the variance of the actual sequence, and the covariance ratio measures the non-systematic error. A good prediction therefore shows small deviation and variance ratios and a relatively large covariance ratio. For the final model, RMSE and MAE are less than 0.01, the Theil inequality coefficient is 0.017, and the covariance ratio is 77.28%, so the prediction performance of the model is good enough to accept.
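For readers without access to EViews, the indicators listed above can be computed directly. The sketch below uses the standard textbook definitions, which are assumed (not confirmed by the source) to match what EViews reports. The deviation, variance and covariance ratios correspond to what is usually called the bias, variance and covariance proportions of the mean squared error. The function name and dictionary keys are hypothetical.

```python
import numpy as np


def forecast_metrics(actual, forecast) -> dict:
    """Standard forecast-evaluation statistics for two equal-length series."""
    y = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    err = f - y
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    s_y, s_f = y.std(), f.std()       # population standard deviations (ddof=0)
    r = np.corrcoef(y, f)[0, 1]       # correlation between actual and forecast
    return {
        "rmse": rmse,
        "mae": np.mean(np.abs(err)),
        "mape": 100.0 * np.mean(np.abs(err / y)),  # needs non-zero actual values
        "theil_u": rmse / (np.sqrt(np.mean(f ** 2)) + np.sqrt(np.mean(y ** 2))),
        "bias_prop": (f.mean() - y.mean()) ** 2 / mse,
        "variance_prop": (s_f - s_y) ** 2 / mse,
        "covariance_prop": 2.0 * (1.0 - r) * s_f * s_y / mse,
    }
```

With these definitions the three proportions sum to 1, which is why a large covariance proportion implies small bias and variance proportions.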

Model’s fitting effect of
Evaluation indicators of the predicting performance
This paper aims to capture Weibo feeds related to a given financial keyword and fit them to the price curve of the corresponding concept stock. The key points are to introduce the influence factors of the mood diffusion model and to improve the curve fitting by modifying weights and time lags and by analyzing user categories and their behaviours.
In the actual data analysis, we set up a regression model of the emotion value and the stock price and find some interesting results, which we hope will help emotion analysis and prediction in the future:
Factors influencing feeds diffusion effect. This paper focuses on emotional absorption index k, content influence
Analysis of user categories. This paper quantifies the impact of different types of users and forwarding behaviours on the Weibo diffusion effect, with users classified into unverified and verified users. The time lag of verified users is found to be shorter than that of unverified users, so their emotional influence on investment is more direct. Among unverified users, feeds forwarded once or twice have the shortest time lag, which indicates that a higher forwarding count improves the credibility of the information and promotes behaviour. But when the forwarding count is greater than or equal to 5, the effect weakens: the forwarding is mostly for fun and has no direct impact.
Comparison of different periods. This paper analyzes the fitting effect of the model on the stock curve in different periods. The fitting effect is best in the rising period, next best in the declining period, and worst in the period with stable fluctuations.
In this paper, we reach some important conclusions about fitting the emotional curve to the stock curve, but there are also some limitations:
Not all related Weibo feeds may be obtained. The keyword is limited to a certain stock, and many related feeds that do not contain the keyword are discarded, so the collected feeds may not include all related ones.
The acquisition of forwarding text may not be complete. Because of the 140-character limit in microblogging, users may delete the original feed when forwarding if the content is long and too few characters remain for their own comments.
The acquisition of user information may not be comprehensive. User information in this paper is derived from the poster and from the poster of the original feed that is forwarded. In real text analysis, however, the notation “//@” carries a large amount of user information, and the lack of this information means that Section 5.2 cannot analyze the situations in which unverified users forward feeds from verified users and from unverified users.
Use of a single model. In this paper, we use the dynamic regression ARIMAX model of time series to analyze the data, and judge the improvement of the model by
Acknowledgements
This paper is supported by the Fundamental Research Funds for the Central Universities, and the Research Funds of Renmin University of China (2020030099).
