Analysis of relationships between tweets and stock market trends

Abstract

In this paper we measure the relationship between social media messages and stock market prices. First, we measure the correlation and association between the number of stock-related tweets and different financial indicators such as prices, returns and transaction volume. Then, we analyze the content of the messages and test whether the tweets generated during different trends of price change (up, down or steady) can be distinguished by automatic classifiers. Our corpus consists of messages related to nine IT companies, together with their daily prices and volume during trading hours, over a period of three months. Two textual representations were used: bag of words and word embeddings. The tweets were automatically tagged using two thresholds to bin the changes in price. We found a correlation between the daily number of messages and the volume of financial transactions. We also found negative associations (more specifically, what we define as local trend associations) between tweet volume and financial indicators that were not revealed by correlation analysis alone. Our main contribution is to show that the messages generated during positive, negative and neutral trends can be distinguished by state-of-the-art classifiers.

Keywords

Stock market, Twitter, machine learning, bag of words, word embeddings

1 Introduction

With the advances in natural language processing and machine learning techniques, the task of forecasting the stock market using textual information has grown in popularity as more data becomes available. The first works [14, 43] used news as a source of information, with promising results. In recent years, works such as [4, 45] have used social media, given its growth in popularity. Social media embeds not only news but also stock market events, opinions and insights from investors.

We collected Twitter data to create a corpus of messages related to nine IT companies. We chose Twitter because it is a popular platform and because the messages, or tweets, are tagged with the company’s stock symbol by the users themselves.

In this study we analyze the tweets generated during different trends. Our hypothesis is that tweets can be identified according to the positive, negative or neutral market trend in which they appeared. We used two different text representations, Bag of Words (BOW) and Word Embeddings (WE); for the latter we used word2vec, a model trained on news. Previous works compare textual representations in the task of sentiment analysis using a manually tagged set, whereas we tag each tweet automatically by the trend in which it appeared. Our results are contrasted against a baseline that considers the distribution of the corpus. Few related works consider how the distribution of the data may influence the results [22].

We also study whether tweet volume is correlated with transaction volume or returns indicators and whether it can be helpful in the task of forecasting. We also consider negative relationships using a measure of local trend association [2]. Previous studies do not consider the trading hours, which are important because after the market closes new topics are generated that may be related not to the previous price but to the next day’s price.

The rest of the paper is structured as follows. In Section 2, we present the related works. In Section 3, we describe the corpus. In Sections 3.1 and 3.2, we present experiments with tweet volume and in Section 3.3, experiments with the content of the tweets. Section 4 includes conclusions and future work.

2 Related work

There are two main paradigms in forecasting the stock market: technical [18] and fundamental [1] analysis. The first approach analyzes the historical prices of the time series themselves using methods such as auto-regression, ARIMA, state-space models, neural networks and others. The second approach relies on financial indicators such as the overall economy, unemployment, welfare indicators, etc.

There have been numerous attempts to automatically integrate information other than that used in fundamental or technical analysis into stock market prediction.

One of the first research works to investigate such strategies is by Sankaran [36]. In that work, traders are asked to assign weights to different attributes called “unmeasurable factors”, including government policies or political news that may influence the exchange rates. In this approach the knowledge of experts is taken from the news they read and other internal information they possess, but the news is not directly processed.

Leung [17] creates a system that requires the user to manually download news before the market opens in order to predict the Hang Seng Index, generating decision rules from the news articles. Later, in [43], Leung’s approach is extended by adding other global indexes such as DJIA and FTSE and by improving the processing techniques. This last approach achieved its best forecasting performance on the DJIA, which was attributed to the fact that most of the news sources used were from the U.S.

Lavrenko et al. [14] collect news automatically and treat prediction as a classification problem. The text of the news is aligned with the future trend of the financial time series. They analyzed different companies; one of their interesting findings is that the same word may have different effects for different companies. The formulation of prediction as a classification problem has been widely used in the literature [11, 47].

Lee et al. [16] create a corpus of financial reports that ranges from 2002 to 2012. They tested several machine learning methods; the method reported with the best results was Random Forest. In that article financial and text features, both extracted from the reports, are combined. Twenty-one financial features are used, the most helpful being the earnings surprise (the difference between expected and reported earnings for a given company). It was found that adding text features to financial features improved the accuracy from 50.1 to 55.5.

In Zhang’s dissertation [47] the corpus from [16] was used. He implemented three different neural network architectures. Although he did not consider the financial features but only language features, the accuracy he reported was better than that of the Random Forest and Support Vector Machine classifiers. He selected only 15 of the 1500 companies in the corpus.

In recent years it has become common to use information from social media, considering that it is a communication channel that is changing the way users interact with the news [15]. Social media includes information from different sources. In [13] it was found that 85% of the topics in social media are related to news.

Twitter is a micro-blogging social media platform launched in 2006. Its mission is “To give everyone the power to create and share ideas and information instantly, without barriers”. It has 313 million active users1 and any topic can be discussed on this platform, including topics related to the stock market. When a company’s stock is being discussed, it is clearly referenced by preceding its stock symbol with the dollar sign, e.g. $MSFT stands for Microsoft’s stock symbol.

Some of the advantages of Twitter are the inclusion of different sources of information and the summarization of this information, since only messages shorter than 140 characters are permitted. One of the disadvantages is that we can find noisy tweets in the form of ads or fake news. Wolfram [42] points out that his results could have been improved by handling the noise better. Zhang et al. [46] only consider re-tweets, with the rationale that a tweet that is re-tweeted is more relevant to users.

Social media can be explored in different ways. De Choudhury et al. [8] correlate the magnitude of the price change of individual stocks with features of user interaction such as the number of comments on a post, the number of replies to them, the elapsed time between comments and several other features. Another interesting approach, by Ruiz et al. [35], is the representation of social media as a graph and the analysis of correlations between the features of the graph and the price and volume of company stocks. An interesting challenge in processing social media data is tackling multimodal information, which is the subject of the research field of multimodal sentiment analysis [6, 44].

A popular approach to forecasting the stock market is the use of sentiment analysis techniques. This is usually done for predicting large indicators such as DJIA or S&P [4, 45], but it is also used for the stocks of individual companies [7, 38].

Tetlock [40] uses a psychosocial dictionary, the General Inquirer’s Harvard IV-4, to measure the sentiment of a Wall Street column. It was found that pessimism causes downward trends and that either high or low pessimism can predict high trading volume. Bollen et al. [4] predicted the daily up and down changes in the closing values of the DJIA using different mood indicators such as happy, alert or calm.

The results of forecasting using sentiment analysis at company level are different from those at market level. Oliveira et al. [23] did not find predictive power for the returns even after trying five different sentiment lexicons, nor did they find a significant difference among the lexicons. De Fortuny et al. [9], also at company level, compare BOW against sentiment analysis; they found that the latter performed worse than the former and even worse than random. They attribute this behavior to the fact that sentiment lexicons are usually extracted from different contexts, such as book or movie reviews. Moreover, in finance the nouns and verbs are sometimes more informative for decision making than the adjectives. Lee et al. [16], predicting individual company stocks, used generic and specialized sentiment lexicons, and neither approach improved the accuracy.

From the evidence in the literature it seems that sentiment analysis is more useful at the global market level than at the company stocks level.

Besides returns, the transaction volume is another variable that has been studied. It can be defined as the number of transactions performed in a given period of time. Transaction volume is useful to study because in technical analysis it determines the importance of the changes in price and the risk involved in a transaction.

Ruiz et al. [35] tested correlation at different lags between volume and graph features generated from Twitter interaction. The greatest correlation was found at lag zero. Mao et al. [20] found weak correlation between price and tweets mentioning the stock symbol AAPL, and moderate correlation to the daily volume traded. Sprenger et al. [39] and Oliveira et al. [23] found that the message volume is useful to predict the next-day trading volume using regression analysis.

3 Experimental setup

The tweets were collected from March 23, 2017 to July 3, 2017 through the Twitter API using tweepy [34]. We only considered tweets with a single company in the message; this operation reduced the corpus up to 25% of its original size. The rationale for this filter was the observation that tweets mentioning more than one stock symbol were either advertisements or contained a relationship between such entities that is outside of our scope. Filtering out such tweets also avoids using the same tweet for different forecasts. A total of 141,007 tweets were used in the analysis. The stock symbols, chosen based on their popularity, are amzn (Amazon.com, Inc.), aapl (Apple Inc.), fb (Facebook Inc.), goog (Alphabet Inc.), msft (Microsoft Corporation), snap (Snap Inc.), twtr (Twitter Inc.), yhoo (Yahoo Inc.)2 and znga (Zynga Inc.). Other stock symbols were considered but not enough tweets were collected (fewer than 100). The distribution of the tweets3 in the corpus is shown in Fig. 1. The average number of daily tweets is shown in Table 1. Below we show a sample of the tweets.

Fig.1

Distribution of tweets through the corpus.

Table 1

Average of daily tweets per company

Symbol	Trading hours	Outside trading hours	Weekends and holidays
amzn	151.72	125.55	87.27
aapl	163.52	167.00	103.17
fb	161.31	220.97	238.48
goog	56.97	72.39	60.36
msft	110.70	130.77	118.90
snap	110.49	89.21	32.05
twtr	143.47	134.79	104.28
yhoo	15.79	14.05	11.47
znga	12.12	10.65	9.65

“$AMZN Amazon launches store-pick grocery service in Seattle <url> ”

“Signal is Positive upward for Apple! $AAPL #AAPL #stocks #DayTrade #AI”

“#Facebook Messenger Reaches 1.2B Monthly Active Users Milestone. Read more: <url> $FB”

“Insider Trading Activity Alphabet Inc (NASDAQ:GOOG) Director Sold 24 shares of Stock <url> $GOOG”

The daily stock prices at open and close market times and the volume were collected from Google Finance.4

BOW is a traditional approach that has been widely used in the literature. It is a subclass of the n-gram representation in which each word corresponds to one dimension of a document vector, and its value is the number of times the word appears in the document.

Word embeddings represent words as vectors, a methodology proposed by Mikolov [21]. We used the Word2Vec vectors5 obtained from a distributed representation trained on news with three million words; each word is represented by a 300-dimensional array. The semantics captured by this model allow arithmetic operations on words; a popular example is: “king” - “man” + “woman” = “queen”.

3.1 Correlating tweets with volume and returns

We tested lagged correlations between the daily number of tweets per company and the company’s open price, close price, the difference between close and open prices, the absolute value of that difference, transaction volume, returns $Ret_{t} = \frac{p_{t} - p_{t-1}}{p_{t-1}},$ (1) and logarithmic returns $LogRet_{t} = \log \left( \frac{p_{t}}{p_{t-1}} \right),$ (2) where $p_{t}$ is the price at close market time t and $p_{t-1}$ is the price at open market time t - 1.
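As an illustration of Equations (1) and (2), the following minimal sketch computes both kinds of returns from a pair of prices. The function names and the sample prices are hypothetical, and the natural logarithm is assumed for Equation (2), since the base is not stated.

```python
import math

def simple_return(p_prev: float, p_curr: float) -> float:
    """Equation (1): relative change from the earlier price to the later one."""
    return (p_curr - p_prev) / p_prev

def log_return(p_prev: float, p_curr: float) -> float:
    """Equation (2): logarithm of the price ratio (natural log assumed)."""
    return math.log(p_curr / p_prev)

# Hypothetical open and close prices for three trading days.
opens = [100.0, 102.0, 101.0]
closes = [102.0, 100.98, 101.0]

returns = [simple_return(o, c) for o, c in zip(opens, closes)]
log_returns = [log_return(o, c) for o, c in zip(opens, closes)]
```

For small price changes the two measures are numerically close, which is consistent with Returns and Log Returns behaving almost identically in the reported results.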

The time series of the tweets is built from the days for which at least one tweet was collected. Days with zero tweets are not considered as points in the correlation even if they are working days, because we did not start downloading tweets for all companies at the same time.

In Table 2 we show the results for |ρ| > 0.5, where ρ is Pearson’s correlation coefficient. Although we compared against all the indicators mentioned above, volume was the indicator that appeared most often. The column Days indicates the total number of points in the time series; for each unit of lag a point is lost. High correlation at lag 1 indicates that the number of tweets may cause the price or transaction volume fluctuation of the next day. High correlation at lag -1 indicates that the price or transaction volume may cause the next day’s tweet volume.
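The lag convention described above can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the function names and the toy series are hypothetical, and both series are assumed to be aligned day by day.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lagged_pearson(tweets, indicator, lag):
    """Correlate tweets on day t with the indicator on day t + lag.

    A positive lag pairs tweets with later indicator values, a negative
    lag pairs them with earlier ones; |lag| points are lost either way.
    """
    if lag > 0:
        tweets, indicator = tweets[:-lag], indicator[lag:]
    elif lag < 0:
        tweets, indicator = tweets[-lag:], indicator[:lag]
    return pearson(tweets, indicator)

# Toy data: the indicator repeats the tweet counts one day later.
tweet_counts = [10, 20, 15, 30, 25, 40]
volume = [5, 10, 20, 15, 30, 25]

rho_lag1 = lagged_pearson(tweet_counts, volume, 1)
rho_lag0 = lagged_pearson(tweet_counts, volume, 0)
```

In this toy case the correlation at lag 1 is perfect while the correlation at lag 0 is weaker, which is the pattern that would suggest tweet volume anticipating the indicator.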

Table 2

Highest results for lagged correlation between tweet volume and financial indicators using a lag from -3 to 3

Symbol	Indicator	Correlation	Lag	Days	Symbol	Indicator	Correlation	Lag	Days
Before trading hours					During trading hours
amzn	Volume	0.6526	0	45	snap	Volume	0.9567	0	52
fb	Open	0.5170	-1	44	snap	Volume	0.6765	1	51
fb	Open	0.5509	0	45	twtr	Volume	0.5147	0	52
fb	Close	0.5140	-1	44	yhoo	Volume	0.5955	0	50
fb	Close	0.5210	0	45	yhoo	Volume	0.5603	1	49
snap	Volume	0.9141	0	40	yhoo	Volume	0.7739	3	47
snap	Volume	0.5546	1	39	After trading hours
During trading hours					snap	Volume	0.6846	0	52
amzn	Volume	0.6375	0	70	snap	Volume	0.9175	1	51
aapl	Volume	0.6056	0	67	snap	Volume	0.5748	2	50
snap	Volume	0.5408	-1	51	znga	Volume	0.5229	0	54

The time series shown in Figs. 2 and 3 are z-normalized using $z (x_{i}) = \frac{x_{i} - μ (X)}{σ (X)},$ (3) where $x_{i} ∈ X$ are the points in the time series X, and μ (X) and σ (X) are the mean and standard deviation of X, respectively.
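Equation (3) can be sketched in a few lines. The function name is hypothetical, and the population standard deviation is an assumption, since the text does not say whether the population or the sample estimate was used.

```python
import statistics

def z_normalize(series):
    """Equation (3): subtract the mean and divide by the standard deviation.

    Assumes the population standard deviation and a non-constant series
    (a constant series would lead to a division by zero).
    """
    mu = statistics.fmean(series)
    sigma = statistics.pstdev(series)
    return [(x - mu) / sigma for x in series]

z = z_normalize([2.0, 4.0, 6.0])
```

After normalization both series have zero mean and unit variance, so tweet counts and trading volume can be plotted on the same axis.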

Fig.2

(a) Correlation between Apple’s tweet volume and transaction volume lag = 0, ρ = 0.61; (b) Correlation between Amazon’s tweet volume and transaction volume lag = 0, ρ = 0.64; (c) Correlation between Snapchat’s tweet volume and transaction volume lag = 0, ρ = 0.96; (d) Correlation between Twitter’s tweet volume and transaction volume lag = 0, ρ = 0.51; (e) Correlation between Yahoo’s tweet volume and transaction volume lag = 0, ρ = 0.60; (f) Correlation between Yahoo’s tweet volume and transaction volume lag = 3, ρ = 0.77. For Figures (a)–(f), the tweet volume is measured during trading hours.

Fig.3

Associations between (a) Snapchat’s tweet volume and open price lag = 0, ρ = -0.17, LTAM = -0.74; (b) Yahoo’s tweet volume and open price lag = -2, ρ = -0.22, LTAM = -0.58; (c) Snapchat’s tweet volume and close price lag = 0, ρ = -0.19, LTAM = -0.61; (d) Yahoo’s tweet volume and open price lag = -1, ρ = 0.14, LTAM = -0.57; (e) Amazon’s tweet volume and returns lag = 1, ρ = -0.34, LTAM = -0.55; (f) Snapchat’s tweet volume and close price lag = 1, ρ = -0.19, LTAM = -0.75. For Figures (a)–(f), the window of the MAP transform is k = 2.

We can observe from Table 2 that tweet volume and trading volume are most often correlated at lag 0. This is consistent with [35], where the greatest average correlation, ρ = 0.4728, was found at lag 0 between tweets and trading volume. However, they use the total number of tweets during the day, the daily change in close prices and the daily traded volume. We also tried the total number of tweets during the day, achieving similar results. Another interesting observation is the case of Snapchat: the results indicate that the volume of tweets during the night and the morning before the market opens is highly correlated with the transaction volume during the following trading hours.

3.2 MAP transform

The Moving Approximation (MAP) transform [2, 3] converts a time series of size n into a series of n - k + 1 local slopes. To obtain the slopes, a linear least squares regression is fitted to the time series within a sliding window of size k. The Local Trend Association Measure (LTAM) is the cosine similarity between the series of slopes of two time series. The MAP transform and the LTAM therefore compare the trends of the time series instead of comparing them point by point as in correlation analysis. The highest LTAM results are shown in Table 3. Figure 3 depicts some of the associations found.
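The definitions above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation from [2, 3]: the function names are hypothetical, the observations are assumed to be equally spaced in time, and both series are assumed to have the same length.

```python
import math

def map_transform(series, k):
    """Return the n - k + 1 least-squares slopes over sliding windows of size k."""
    xs = range(k)
    x_mean = (k - 1) / 2
    denom = sum((x - x_mean) ** 2 for x in xs)
    slopes = []
    for start in range(len(series) - k + 1):
        window = series[start:start + k]
        y_mean = sum(window) / k
        num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, window))
        slopes.append(num / denom)
    return slopes

def ltam(series_a, series_b, k):
    """Cosine similarity between the slope series of two time series."""
    a = map_transform(series_a, k)
    b = map_transform(series_b, k)
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm

# Toy series that always move in opposite directions.
toy_tweets = [1.0, 3.0, 2.0, 5.0, 4.0]
toy_price = [10.0, 6.0, 8.0, 2.0, 4.0]

slopes = map_transform(toy_tweets, 2)
association = ltam(toy_tweets, toy_price, 2)
```

With k = 2 each slope is simply the difference between consecutive points, so the measure compares the direction and size of day-to-day movements; a value near -1 indicates that the two series tend to move in opposite directions.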

Table 3
Highest results for lagged Local Trend Association measure between tweet volume and financial indicators using a lag from -3 to 3

Symbol	Indicator	LTA	Lag	Days	Symbol	Indicator	LTA	Lag	Days
Before trading hours	During trading hours
amzn	Volume	0.6856	0	45	snap	Volume	0.9402	0	52
snap	Open	0.5056	-1	39	yhoo	Open	-0.5669	-1	49
snap	Open	-0.7361	0	40	yhoo	Volume	0.7210	3	47
snap	Close	-0.6812	0	40	After trading hours
snap	Volume	0.8669	0	40	amzn	Close-Open	-0.5344	1	67
yhoo	Open	-0.5773	-2	32	amzn	Returns	-0.5468	1	67
During trading hours	amzn	Log Returns	-0.5465	1	67
amzn	Volume	0.6741	0	70	snap	Open	-0.7599	1	51
snap	Open	-0.6019	0	52	snap	Close	-0.7487	1	51
snap	Close	-0.6086	0	52	snap	Volume	0.7715	1	51

Using the LTAM we found negative associations not obtained by the correlation analysis.

3.3 Tweet classification

The task of tweet classification is done by automatically tagging the tweets according to the price trend in which they were generated, upward, downward or neutral. Then, a machine learning classifier is trained on some portion of the tweets randomly selected to predict the tag of the remaining tweets.

We used two word representations, BOW and WE, and different machine learning classifiers included in the scikit-learn API [5].

Tweet and stock time series are labeled with different time zones. It was necessary to adjust the UTC time-stamps of the tweets to match the ET time zone of the financial time series. The NYSE and NASDAQ markets open at 9:30 ET and close at 16:00 ET.

We used two different thresholds, 0.3 (Leung’s tagging) and 0.5 (Wütrich’s tagging), in an automated binning procedure to tag the tweets as follows. If the price change is greater than or equal to the threshold, the tweet is considered positive; if the price change falls below the negative of the threshold, the tweet is considered negative; otherwise the tweet is considered neutral. This type of price tagging was used in [17] and [43]. The distribution of the tags under the two schemes is shown in Table 4.
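The time zone adjustment can be sketched with the Python standard library. The function names are hypothetical, the `America/New_York` zone is assumed to represent ET (including daylight saving time), and weekends and holidays are deliberately not handled.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")  # assumed to represent ET
MARKET_OPEN, MARKET_CLOSE = time(9, 30), time(16, 0)

def to_eastern(utc_timestamp: datetime) -> datetime:
    """Convert a timezone-aware UTC timestamp to Eastern Time."""
    return utc_timestamp.astimezone(EASTERN)

def session(utc_timestamp: datetime) -> str:
    """Label a tweet as 'before', 'during' or 'after' trading hours.

    Weekends and market holidays are not considered in this sketch, and a
    tweet posted exactly at 16:00 ET is treated as 'after' by assumption.
    """
    local = to_eastern(utc_timestamp).time()
    if local < MARKET_OPEN:
        return "before"
    if local < MARKET_CLOSE:
        return "during"
    return "after"

# A tweet posted at 14:00 UTC on March 23, 2017 falls at 10:00 ET.
example = datetime(2017, 3, 23, 14, 0, tzinfo=timezone.utc)
example_session = session(example)
```

Using a named zone rather than a fixed offset matters here because the collection period spans daylight saving time, when ET is four hours behind UTC instead of five.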

Table 4
Tag balance using different offsets

		Wütrich’s threshold = 0.5			Leung’s threshold = 0.3
Company	Tweets	Positive (%)	Neutral (%)	Negative (%)	Positive (%)	Neutral (%)	Negative (%)
amzn	21606	30.38	46.24	23.36	46.58	27.29	26.11
aapl	25179	27.78	49.17	23.04	33.44	37.53	29.01
fb	32627	20.82	61.31	17.85	32.16	45.52	22.30
goog	10513	23.36	57.41	19.22	33.97	41.08	24.94
msft	19523	19.24	63.61	17.14	32.14	43.41	24.44
snap	11193	48.88	13.71	37.39	50.46	10.13	39.40
twtr	16938	34.10	43.48	22.41	45.74	27.44	26.80
yhoo	1777	35.05	49.57	15.36	46.82	29.31	23.86
znga	1651	50.33	32.76	16.89	52.39	28.64	18.95

We measure price change with Equation 1. For tweets generated within trading hours p_t is the close price and p_t-1 is the open price of the tweet date. For tweets generated outside trading hours p_t is the next open price and p_t-1 is the previous close price.
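The tagging procedure can be sketched as below. The function names are hypothetical. Two assumptions are made that the text does not state explicitly: the thresholds 0.3 and 0.5 are read as percentages, and the negative class requires the change to be strictly below the negative threshold.

```python
def price_change_percent(p_prev: float, p_curr: float) -> float:
    """Equation (1) expressed in percent (assumed unit of the thresholds)."""
    return 100.0 * (p_curr - p_prev) / p_prev

def tag_tweet(p_prev: float, p_curr: float, threshold: float) -> str:
    """Bin the price change into 'positive', 'negative' or 'neutral'.

    For tweets inside trading hours p_prev is the open and p_curr the close
    of the tweet date; outside trading hours p_prev is the previous close
    and p_curr the next open.
    """
    change = price_change_percent(p_prev, p_curr)
    if change >= threshold:
        return "positive"
    if change < -threshold:  # strict inequality is an assumption
        return "negative"
    return "neutral"

# The same 0.4% rise is neutral at threshold 0.5 and positive at 0.3.
tag_at_05 = tag_tweet(100.0, 100.4, 0.5)
tag_at_03 = tag_tweet(100.0, 100.4, 0.3)
```

The example shows why the smaller threshold yields fewer neutral tweets, as can be seen in Table 4.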

The preprocessing steps are as follows: (a) Lowercase the tweet. (b) Replace links by the word url. (c) Replace usernames by the word username. (d) Remove the stock symbol. (e) Replace some non-ambiguous English contractions, e.g. “’m” with “am”. (f) Tokenization. (g) Remove stopwords. (h) Remove tokens that are numbers. (i) Remove tokens that do not contain letters. (j) Remove continuous repeated letters in each token if there are more than three, e.g. “haaaaaappy” is replaced by “happy”.

In the BOW approach we tried different vocabulary sizes; a vocabulary of the 1000 most common words gave the best results.

In the WE approach each word was transformed to its Word2Vec representation, i.e. a 300-dimensional vector. Tweets in which none of the words were found in the Word2Vec model were discarded. Then all retrieved vectors were averaged to obtain a single 300-dimensional vector that represents the tweet. The process of averaging all words returns a vector that is semantically similar to the tweet vectors [41].

Tweets are split randomly, using 70% for training and the rest for testing. We tested different classifiers; those with the highest accuracy are shown in Tables 5 and 6.
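Steps (a) to (j) can be sketched with regular expressions. This is an illustrative sketch, not the pipeline used in the experiments: the stopword list, the contraction table, the regular expressions and the tokenizer are all hypothetical stand-ins, and any cashtag is treated as the stock symbol.

```python
import re

# Illustrative subsets only; the actual lists are not given in the text.
STOPWORDS = {"a", "an", "the", "is", "in", "for", "of", "to", "and", "i"}
CONTRACTIONS = {"'m": " am", "'re": " are", "'ll": " will"}

def preprocess(tweet: str) -> list[str]:
    text = tweet.lower()                                     # (a) lowercase
    text = re.sub(r"https?://\S+", " url ", text)            # (b) links
    text = re.sub(r"@\w+", " username ", text)               # (c) usernames
    text = re.sub(r"\$[a-z]+", " ", text)                    # (d) stock symbol
    text = text.replace("’", "'")                            # normalize apostrophes
    for short, full in CONTRACTIONS.items():                 # (e) contractions
        text = text.replace(short, full)
    tokens = re.findall(r"[a-z0-9]+", text)                  # (f) tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]       # (g) stopwords
    tokens = [t for t in tokens if not t.isdigit()]          # (h) numbers
    tokens = [t for t in tokens if re.search(r"[a-z]", t)]   # (i) no letters
    # (j) collapse runs of more than three identical letters to one letter
    return [re.sub(r"([a-z])\1{3,}", r"\1", t) for t in tokens]

cleaned = preprocess("I'm soooooo haaaaaappy for $AAPL @bob http://t.co/x 2017")
```

With the simple alphanumeric tokenizer assumed here, step (i) is already implied by step (h); it is kept so that the sketch mirrors the listed steps one to one.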
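A bag-of-words representation restricted to the most common words can be sketched as follows. The function names and the toy corpus are hypothetical; in practice a library vectorizer would typically be used instead.

```python
from collections import Counter

def build_vocabulary(tokenized_tweets, size):
    """Keep the `size` most frequent words of the training tweets."""
    counts = Counter(tok for tweet in tokenized_tweets for tok in tweet)
    return [word for word, _ in counts.most_common(size)]

def bow_vector(tokens, vocabulary):
    """Count how often each vocabulary word occurs in one tweet."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

# Toy corpus of already preprocessed tweets.
corpus = [["stock", "up", "up"], ["stock", "down"], ["stock", "up"]]
vocabulary = build_vocabulary(corpus, 2)
vector = bow_vector(["up", "up", "down"], vocabulary)
```

Words outside the vocabulary, such as "down" in this toy case, are simply ignored, which is how limiting the vocabulary to 1000 words bounds the dimensionality of the document vectors.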
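The averaging step can be sketched with a toy embedding table. The three-dimensional vectors below are made up for illustration and stand in for the 300-dimensional Word2Vec vectors; the function name is hypothetical.

```python
def tweet_vector(tokens, embeddings):
    """Average the vectors of the tokens found in the embedding table.

    Returns None when no token is known, in which case the tweet is discarded.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Made-up 3-dimensional embeddings standing in for Word2Vec.
toy_embeddings = {"stock": [1.0, 0.0, 2.0], "up": [3.0, 2.0, 0.0]}

averaged = tweet_vector(["stock", "up", "zzz"], toy_embeddings)
discarded = tweet_vector(["zzz"], toy_embeddings)
```

Every tweet is mapped to a vector of the same length regardless of how many words it contains, so the result can be fed directly to a standard classifier.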

Table 5

Results for Bag of Words, the baseline is majority vote (MV), the voting classifier (VC) considers the classifiers in the three previous columns

Wütrich’s threshold = 0.5						Leung’s threshold = 0.3
	MV	SVC	DT	RF	VC	MV	SVC	DT	RF	VC
amzn	46.3%	46.3%	45.9% **	45.9% **	46.3%	46.2%	46.5% *	45.9%	46.0%	46.2%
aapl	37.9%	38.3% **	37.8%	36.6% **	38.2%	49.6%	49.7%	48.9% **	49.4% *	49.6%
fb	45.9%	45.9%	46.3% **	45.6% *	45.8%	60.8%	60.8%	60.6%	60.8%	60.8%
goog	40.4%	41.0% **	33.8% **	39.1%	40.7%	56.9%	57.1%	55.3% **	56.5%	56.9%
msft	43.4%	43.5%	43.3%	43.4%	43.3%	63.9%	63.9%	61.6% **	63.8%	63.9%
snap	50.5%	50.5%	50.1%	50.0%	50.6%	48.9%	48.9%	48.5%	48.6%	48.8%
twtr	45.5%	45.4%	45.3%	44.8% **	44.8% **	44.6%	45.0% **	33.2% **	40.5% *	43.2% **
yhoo	44.8%	44.8%	44.8%	43.9%	44.8%	52.2%	52.2%	50.5%	50.5%	50.8% *
znga	54.3%	54.3%	52.5%	52.5%	54.5%	52.3%	52.3%	52.3%	52.1%	52.3%
Average	45.4%	45.5%	44.4%	44.6%	45.4%	52.8%	52.9%	50.7%	52.0%	52.5%

Table 6

Results for Word Embeddings, the baseline is majority vote (MV), the voting classifier (VC) considers the classifiers in the three previous columns

Wütrich’s threshold = 0.5						Leung’s threshold = 0.3
	MV	SVC	SGD	RF	VC	MV	SVC	SGD	RF	VC
amzn	46.3%	59.8% **	46.7%	52.8% **	56.9% **	46.2%	59.7% **	29.9% **	53.9% **	54.6% **
aapl	37.9%	55.9% **	41.3% **	50.8% **	53.4% **	49.6%	60.1% **	49.5% **	56.3% **	56.3% **
fb	45.9%	62.9% **	49.3% **	56.9% **	60.3% **	60.8%	69.6% **	22.6% **	65.4% **	66.9% **
goog	40.4%	60.3% **	42.9% *	57.2% **	59.1% **	56.9%	69.2% **	57.4% **	65.4% **	67.3% **
msft	43.4%	56.6% **	44.3%	51.5% **	54.2% **	63.9%	67.4% **	63.3% *	65.2% **	64.9% **
snap	50.5%	60.2% **	52.9% **	55.8% **	59.5% **	48.9%	59.6% **	48.9% **	55.7% **	56.3% **
twtr	45.5%	56.4% **	45.6%	52.4% **	55.6% **	44.6%	58.2% **	44.6% **	53.0% **	55.8% **
yhoo	44.8%	61.2% **	47.8%	57.2% **	59.1% **	52.2%	64.2% **	52.0%	63.2% **	65.1% **
znga	54.3%	65.5% **	58.8%	64.2% **	64.4% **	52.3%	65.3% **	40.4% **	64.6% **	61.4% **
Average	45.4%	59.8%	47.7%	55.4%	58.0%	52.8%	63.7%	45.4%	60.3%	60.9%

We used majority vote (MV) as the baseline, which consists of always predicting the class with the most examples in the training set. We also implemented a voting classifier (VC), which votes among the classifier predictions.

Dickinson [10] and Pagolu [24] compare both word representations on the sentiment classification task for stock tweets; they manually tagged tweets from the time series, 1000 and 3216 tweets respectively. Dickinson obtained an accuracy of 68.5% for the n-gram model and 63.4% for the WE model, while Pagolu obtained 70.5% and 70.2%. In contrast with these works, our task is to classify each tweet according to the trend in which it was generated. In our case the W2V approach gave better results than BOW.

We tried different classifiers, such as Naive Bayes, k-nearest neighbors, logistic regression and multilayer perceptron; the top three classifiers were selected based on their accuracy. Once the best classifiers were selected, their parameters were optimized. The best parameters of the BOW classifiers were: for the support vector classifier (SVC), RBF kernel, γ = 10 and C = 10; for the decision tree (DT), maximum depth of tree = 10; for the random forest (RF), trees = 10 and maximum depth of tree = 10. The parameters of the W2V classifiers were: for the support vector classifier (SVC), RBF kernel, γ = 2 and C = 1; for the stochastic gradient descent classifier (SGD), default parameters; for the random forest (RF), trees = 10 and maximum depth of tree = 5.

Analyzing the results in Table 6, we see that the threshold of 0.5 gave better results, which may be because this threshold generates a less balanced set. However, using the threshold of 0.3, two out of the three classifiers are also able to learn and surpass the baseline. The amount of data did not play a significant role: the companies with the fewest tweets, YHOO and ZNGA, performed better than AMZN or AAPL. Also, in the WE approach with Wütrich’s tagging, the difference between the baseline and SVC is similar for FB (32627 tweets) and YHOO (1777 tweets).

In Table 5 the accuracy results using the BOW representation mostly fall below the baseline. However, in Table 6, using the WE representation, the classifiers are able to surpass the baseline and therefore to identify the tweets generated during different trends: positive, negative and neutral. Both tables report results for the two tagging schemes.
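The baseline and the voting step can be sketched as follows. The function names and the toy labels are hypothetical, and hard (plurality) voting is an assumption, since the voting rule is not specified in the text.

```python
from collections import Counter

def majority_class(train_labels):
    """Majority-vote baseline: the most frequent class in the training set."""
    return Counter(train_labels).most_common(1)[0][0]

def accuracy(predicted, actual):
    """Fraction of examples whose predicted label matches the actual one."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def hard_vote(*predictions):
    """Per example, return the label predicted by most classifiers.

    Ties are resolved in favor of the label seen first (an arbitrary choice).
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Toy labels and toy predictions of three classifiers.
train_labels = ["neutral", "neutral", "positive", "negative"]
test_labels = ["neutral", "positive", "neutral", "negative"]

baseline = [majority_class(train_labels)] * len(test_labels)
voted = hard_vote(
    ["positive", "neutral", "neutral", "negative"],
    ["positive", "positive", "neutral", "neutral"],
    ["neutral", "positive", "neutral", "negative"],
)
baseline_accuracy = accuracy(baseline, test_labels)
voted_accuracy = accuracy(voted, test_labels)
```

Because the baseline accuracy equals the share of the majority class in the test set, it rises as the tag distribution becomes less balanced, which is why the classifier results are always read against it.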

To test the statistical significance of the accuracy results with respect to the baseline we applied McNemar’s test with $χ^{2} = \frac{(| e_{01} - e_{10} | - λ)^{2}}{e_{01} + e_{10}},$ (4) where $λ = \begin{cases} 1, & e_{01} + e_{10} < 25, \\ 0, & \text{otherwise}. \end{cases}$ (5)

Here, e₀₁ is the number of examples that the baseline classified correctly and our classifier incorrectly and, correspondingly, e₁₀ is the number of examples that our classifier classified correctly and the baseline incorrectly.
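Equations (4) and (5) can be sketched as below. The function name and the toy data are hypothetical. Two additions are assumptions not stated in the text: the p-value is taken from the chi-square distribution with one degree of freedom, and the numerator is clamped at zero so that the correction cannot make it negative before squaring.

```python
import math

def mcnemar(baseline_correct, classifier_correct):
    """Return (chi2, p_value) following Equations (4) and (5).

    Both arguments are sequences of booleans, one entry per test example.
    """
    pairs = list(zip(baseline_correct, classifier_correct))
    e01 = sum(b and not c for b, c in pairs)  # baseline right, classifier wrong
    e10 = sum(c and not b for b, c in pairs)  # classifier right, baseline wrong
    total = e01 + e10
    if total == 0:
        return 0.0, 1.0  # the classifiers never disagree
    lam = 1 if total < 25 else 0
    # Clamping at zero is an added guard, not part of Equation (4).
    chi2 = max(abs(e01 - e10) - lam, 0) ** 2 / total
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Toy case: 5 examples favor the baseline, 35 favor the classifier, 10 agree.
base_ok = [True] * 5 + [False] * 35 + [True] * 10
clf_ok = [False] * 5 + [True] * 35 + [True] * 10
chi2, p_value = mcnemar(base_ok, clf_ok)
```

Only the examples on which the two classifiers disagree enter the statistic, so two classifiers with identical accuracy can still differ significantly if they are correct on different examples.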

McNemar’s test measures the significance of the difference between two classifiers, considering the examples in the test set on which the classifiers disagree. In Table 6, considering Leung’s tagging for the stock TWTR, we observe that SGD and the baseline have the same accuracy but the difference is statistically significant. This can be explained because, although the same number of samples were classified correctly, the samples were not the same ones. As expected, the companies with more tweets showed the highest statistical significance. In Tables 5 and 6 the p-values smaller than 0.01 are marked with ** and p-values smaller than 0.05 are marked with *. The accuracy results greater than the baseline are in bold.

4 Conclusion

We show that the tweet messages generated during positive, negative and neutral trends of stock prices can be distinguished by state-of-the-art classifiers. This means that the topics on social media depend on the price change of a company’s stock. We measure our results considering how the balance of the dataset may influence the results.

The results obtained using Word Embeddings representation of tweets significantly outperform the results obtained by Bag of Words. This may be caused by the significant reduction in the dimensionality.

For more than half of the considered companies, we found moderate correlation between tweet volume and trading volume, which is consistent with the findings in [35] and with the Efficient Market Hypothesis [19]. We showed that for two of the companies tweet volume can be used to predict transaction volume with a lag of one or three days. Regarding the trading hours, we found for one company that the volume of tweets generated after the market closes is correlated with the transaction volume during trading hours on the next day.

We also used the MAP transform and the LTAM to find associations between the tweet volume and different indicators. We found inverse relationships that were not discovered by correlation analysis. For this reason, LTAM can be used as a complementary measure to the correlation coefficient.

As future work, we will increase the corpus size, try different techniques on the textual representations (e.g. add dimensionality reduction for the BOW model) and add semantic analysis of the tweets. We would also like to combine our text approaches with time series specific forecasting techniques such as regression or ARIMA; this combination has shown good results in the literature [23, 37]. We also aim to use deep learning-based approaches with hierarchical architectures to enhance scalability and increase the accuracy of our method [25–29].

Footnotes

On June 19, 2017, Yahoo changed its stock symbol to AABA and although some tweets started to adopt the new symbol, YHOO was still used in Twitter.

There are fewer tweets at the beginning of the figure because at first we only considered the trading hours. This fact does not affect the experiments since we only consider the days where tweets are available per company.

Acknowledgments

This work was partially funded by CONACYT under the Thematic Networks program (Language Technologies Thematic Network project 281795), as well as by CONACYT Project 283778 and by Instituto Politécnico Nacional grants SIP 20171344, SIP 20172008, and SIP 20172044.

References

Abarbanell

J.S.

and Bushee

B.J.

, Fundamental analysis, future earnings, and stock prices, Journal of Accounting Research35(1) (1997), 1–24.

2. Batyrshin, Herrera-Avelar, Sheremetov and Panova, Moving approximation transform and local trend associations in time series data bases, In Perception-based Data Mining and Decision Making in Economics and Finance, Springer, (2007), pp. 55–83.

3. Batyrshin, Solovyev and Ivanov, Time series shape association measures and local trend association patterns, Neurocomputing 175 (2016), 924–934.

4. Bollen, Mao and Zeng, Twitter mood predicts the stock market, Journal of Computational Science 2(1) (2011), 1–8.

5. Buitinck, Louppe, Blondel, Pedregosa, Mueller, Grisel, Niculae, Prettenhofer, Gramfort, Grobler, Layton, VanderPlas, Joly, Holt and Varoquaux, API design for machine learning software: Experiences from the scikit-learn project, In ECML PKDD Workshop: Languages for Data Mining and Machine Learning (2013), pp. 108–122.

6. Cambria, Hazarika, Poria, Hussain and Subramaanyam, Benchmarking Multimodal Sentiment Analysis, arXiv preprint arXiv:1707.09538 (2017).

7. Chen and Lazer, Sentiment analysis of twitter feeds for the prediction of stock market movement, Stanford.edu, Retrieved January 25 (2013), 2013.

8. De Choudhury, Sundaram, John and Seligmann D.D., Can blog communication dynamics be correlated with stock market activity? In Proceedings of the Nineteenth ACM Conference on Hypertext and Hypermedia, ACM, (2008), pp. 55–60.

9. De Fortuny E.J., De Smedt, Martens and Daelemans, Evaluating and understanding text-based stock price prediction models, Information Processing & Management 50(2) (2014), 426–441.

10. Dickinson and Hu, Sentiment analysis of investor opinions on twitter, Social Networking 4(03) (2015), 62.

11. Ding, Zhang, Liu and Duan, Using Structured Events to Predict Stock Price Movement: An Empirical Investigation, In EMNLP (2014), pp. 1415–1425.

12. Gilbert and Karahalios, Widespread Worry and the Stock Market, In ICWSM (2010), pp. 59–65.

13. Kwak, Lee, Park and Moon, What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web, ACM, (2010), pp. 591–600.

14. Lavrenko, Schmill, Lawrie, Ogilvie, Jensen and Allan, Mining of concurrent text and time series, In KDD-2000 Workshop on Text Mining (2000), pp. 37–44.

15. Lee C.S. and Ma, News sharing in social media: The effect of gratifications and prior experience, Computers in Human Behavior 28(2) (2012), 331–339.

16. Lee, Surdeanu, MacCartney and Jurafsky, On the Importance of Text Analysis for Stock Price Prediction, In LREC (2014), pp. 1170–1175.

17. Leung S.K.F., Automatic stock market: Predictions from World Wide Web data, Master’s thesis, The Hong Kong University of Science and Technology, 1997.

18. Lo A.W., Mamaysky and Wang, Foundations of technical analysis: Computational algorithms, statistical inference, and empirical implementation, The Journal of Finance 55(4) (2000), 1705–1765.

19. Malkiel B.G., Reflections on the efficient market hypothesis: 30 years later, Financial Review 40(1) (2005), 1–9.

20. Mao, Wei, Wang and Liu, Correlating S&P 500 stocks with Twitter data, In Proceedings of the First ACM International Workshop on Hot Topics on Interdisciplinary Social Networks Research, ACM, (2012), pp. 69–72.

21. Mikolov, Chen, Corrado and Dean, Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.3781 (2013).

22. Nassirtoussi A.K., Aghabozorgi, Wah T.Y. and Ngo D.C.L., Text mining for market prediction: A systematic review, Expert Systems with Applications 41(16) (2014), 7653–7670.

23. Oliveira, Cortez and Areal, Some experiments on modeling stock market behavior using investor sentiment analysis and posting volume from Twitter, In Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics, ACM, (2013), p. 31.

24. Pagolu V.S., N., Challa, Panda and Majhi, Sentiment Analysis of Twitter Data for Predicting Stock Market Movements, arXiv preprint arXiv:1610.09225 (2016).

25. Poria, Gelbukh, Das and Bandyopadhyay, Fuzzy clustering for semi-supervised learning-case study: Construction of an emotion lexicon, In Mexican International Conference on Artificial Intelligence, MICAI 2012, number 7629 in Lecture Notes in Computer Science, Springer, (2012), pp. 73–86.

26. Poria, Cambria and Gelbukh, Deep convolutional neural network textual features and multiple kernel learning for utterance-level multimodal sentiment analysis, In 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 (2015), pp. 2539–2544.

27. Poria, Cambria and Gelbukh, Aspect extraction for opinion mining with a deep convolutional neural network, Knowledge-Based Systems 108 (2016), 42–49.

28. Poria, Cambria, Hazarika and Vij, A deeper look into sarcastic tweets using deep convolutional neural networks, arXiv preprint arXiv:1610.08815 (2016).

29. Poria, Chaturvedi, Cambria and Bisio, Sentic LDA: Improving on LDA with semantic similarity for aspect-based sentiment analysis, In 2016 International Joint Conference on Neural Networks (IJCNN), IEEE, (2016), pp. 4465–4473.

30. Poria, Chaturvedi, Cambria and Hussain, Convolutional MKL based multimodal emotion recognition and sentiment analysis, In 2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, (2016), pp. 439–448.

31. Poria, Cambria, Bajpai and Hussain, A review of affective computing: From unimodal analysis to multimodal fusion, Information Fusion 37 (2017), 98–125.

32. Poria, Cambria, Hazarika, Majumder, Zadeh and Morency, Context-dependent sentiment analysis in user-generated videos, In 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2017), pp. 873–883.

33. Poria, Cambria, Hazarika, Mazumder, Zadeh and Morency, Multi-level multiple attentions for context-aware multimodal sentiment analysis, In ICDM 2017, New Orleans, 2017.

34. Roesslein, Tweepy Documentation, 2016.

35. Ruiz E.J., Hristidis, Castillo, Gionis and Jaimes, Correlating financial time series with micro-blogging activity, In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, ACM, (2012), pp. 513–522.

36. Sankaran K.P., A system to forecast currency exchange rates, PhD thesis, Hong Kong University of Science and Technology, 1996.

37. Si, Mukherjee, Liu, Li and Deng, Exploiting topic based twitter sentiment for stock prediction, ACL (2) (2013), 24–29.

38. Smailović, Grčar, Lavrač and Žnidaršič, Predictive sentiment analysis of tweets: A stock market application, In Human-Computer Interaction and Knowledge Discovery in Complex, Unstructured, Big Data, Springer, (2013), pp. 77–88.

39. Sprenger T.O., Tumasjan, Sandner P.G. and Welpe I.M., Tweets and trades: The information content of stock microblogs, European Financial Management 20(5) (2014), 926–957.

40. Tetlock P.C., Giving content to investor sentiment: The role of media in the stock market, The Journal of Finance 62(3) (2007), 1139–1168.

41. Tomar, Godin, Vandersmissen, De Neve and Van de Walle, Towards Twitter hashtag recommendation using distributed word representations and a deep feed forward neural network, In 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI 2014), IEEE, (2014), pp. 362–368.

42. Wolfram M.S.A., Modelling the stock market using Twitter, Master’s thesis, University of Edinburgh, 2010.

43. Wüthrich, Permunetilleke, Leung, Lam, Cho and Zhang, Daily prediction of major stock indices from textual www data, HKIE Transactions 5(3) (1998), 151–156.

44. Zadeh, Chen, Poria, Cambria and Morency, Tensor fusion network for multimodal sentiment analysis, arXiv preprint arXiv:1707.07250 (2017).

45. Zhang, Fuehres and Gloor P.A., Predicting stock market indicators through twitter “I hope it is not as bad as I fear”, Procedia-Social and Behavioral Sciences 26 (2011), 55–62.

46. Zhang, Fuehres and Gloor P.A., Predicting asset value through twitter buzz, In Advances in Collective Intelligence 2011, Springer, (2012), pp. 23–34.

47. Zhang, Using Financial Reports to Predict Stock Market Trends with Machine Learning Techniques, PhD thesis, University of Oxford, 2015.

		Wüthrich’s threshold = 0.5			Leung’s threshold = 0.3
Company	Tweets	Positive (%)	Neutral (%)	Negative (%)	Positive (%)	Neutral (%)	Negative (%)
amzn	21606	30.38	46.24	23.36	46.58	27.29	26.11
aapl	25179	27.78	49.17	23.04	33.44	37.53	29.01
fb	32627	20.82	61.31	17.85	32.16	45.52	22.30
goog	10513	23.36	57.41	19.22	33.97	41.08	24.94
msft	19523	19.24	63.61	17.14	32.14	43.41	24.44
snap	11193	48.88	13.71	37.39	50.46	10.13	39.40
twtr	16938	34.10	43.48	22.41	45.74	27.44	26.80
yhoo	1777	35.05	49.57	15.36	46.82	29.31	23.86
znga	1651	50.33	32.76	16.89	52.39	28.64	18.95
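
The class proportions in the table above depend on how each price change is binned into a positive, neutral, or negative trend. A minimal sketch of such a labeling rule follows. It assumes that the price change is expressed in percent and that a symmetric threshold (0.5 or 0.3) is applied with strict inequalities; these details, the names `label_trend` and `class_shares`, and the sample values are illustrative assumptions rather than the exact procedure used to tag the corpus.

```python
def label_trend(change_pct, threshold):
    """Bin a price change (in percent) into positive, neutral, or negative."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    if change_pct > threshold:
        return "positive"
    if change_pct < -threshold:
        return "negative"
    return "neutral"


def class_shares(changes_pct, threshold):
    """Percentage of items that fall into each class for a given threshold."""
    if not changes_pct:
        raise ValueError("at least one price change is required")
    counts = {"positive": 0, "neutral": 0, "negative": 0}
    for change in changes_pct:
        counts[label_trend(change, threshold)] += 1
    total = len(changes_pct)
    return {label: 100 * count / total for label, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical price changes in percent (illustrative values only).
    changes = [0.6, 0.4, -0.35, 0.1, -0.7]
    for threshold in (0.5, 0.3):
        print(threshold, class_shares(changes, threshold))
```

Under this rule a smaller threshold narrows the neutral band, which matches the pattern in the table: for every company the neutral share is lower with the 0.3 threshold than with the 0.5 threshold.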