Abstract
Analysis of social media data such as tweet feeds can reveal market sentiment, so researchers are trying to forecast stock market behaviour through social media analytics. However, the extant research has broadly focused on longer time horizons and has mostly attempted to forecast stock-market-level indicators. In contrast, we employ social media analytics to forecast the stock market’s spontaneous reaction to a macroeconomic event, namely the Indian Budget announcement on 28 February 2015. We capture stock market reactions through company-level Cumulative Abnormal Returns (CAR). We collected around 0.37 million budget-related tweets during our three-day event window. Our empirical evidence, from 190 firms in 8 different industries, confirms that industry tweet volume and sentiment can be an indicator of company-level share price movements. This article contributes to the extant literature in information science as well as behavioural finance by demonstrating the applicability of social media analytics to event study methodology.
Introduction
Forecasting stock market behaviour has attracted researchers from different disciplines. The extant literature in economics and finance has used rigorous econometric modelling, whereas information science researchers (such as Chen, Leung & Daouk, 2003) use machine learning techniques to predict stock market movements, with limited accuracy. The random walk hypothesis proposes that the stock market follows a random walk, so past price movements or trends cannot predict future prices. Information science research nevertheless employs artificial neural networks or other machine learning techniques to predict future prices from past share-price trends, and the predictive power of a probabilistic neural network has been reported to be better than that of a random walk model (Chen et al., 2003). The efficient capital market hypothesis proposes that the share price of a company reflects all available information as well as expectations about the future growth potential of the company (Fama, 1970). Thus, share price movement is mostly driven by new information, and the efficacy of forecasting depends on efficiently incorporating this new information into econometric modelling. New information can be firm-specific, or it can be at the macroeconomic level, such as exchange rate fluctuations or inflation.
However, these streams of research fail to capture individual investors’ sentiment/emotion and its cumulative effect on share price movements. A social media platform, such as Twitter, can capture public sentiment. Analyzing tweet feeds can capture individual investors’ sentiment and might enhance the predictive power of forecasting models. A handful of studies (like Bollen, Mao & Zeng, 2011; Mao et al., 2012; Oliveira, Cortez & Areal, 2013) use tweet data for stock market prediction and, to the best of our knowledge, none of them is set in a developing economy like India. Stock markets in developing economies are not as efficient as those of the Anglo-Saxon world. These studies mostly consider a longer horizon to explore the relationship between stock market behaviour and public mood through time-series analysis. Prior studies mostly focus either on the overall stock market level (like Bollen et al., 2011) or on the company level. We explore whether social media mining can be used for event study methodology. Moreover, in addition to the overall stock market indicator, our study also explores share price movements at the company level.
Event study methodology can assess the impact of an event on the share price movements of a company. The event can be company-specific or can be at the industry or macroeconomic level. This article explores the impact of a macroeconomic event, namely the Indian budget speech, on the overall stock market indicator as well as on individual companies. Budget speeches indicate the government’s proposed income and expenditure as well as its economic policies for the next financial year. These policy-level decisions, which are macroeconomic in nature, might have a differential impact on different sectors. So, we expect that share prices of companies from certain industries would appreciate while those of others might not.
We test our proposition by exploring tweet data related to the Indian Budget 2015. We collected around 0.37 million budget-related tweets during the period 28 February 2015 (Budget announcement day) to 2 March 2015. Emotion analysis strongly associates this budget with ‘joy’, and this was also reflected in the overall upward movement of the stock market indicator, in contrast to the last three years (see Table 1). Sentiment analysis further allowed us to dissect which issues, captured through hashtags/trends, were appreciated or critiqued in social media. Next, we use a community detection algorithm to identify sub-themes within our data. Finally, our empirical evidence, from 190 firms in 8 different industries, confirms that industry-related tweet volume and sentiment can be an efficient predictor of stock market behaviour at the company level. Our robustness tests confirm the consistency of our findings. Our study demonstrates that social media analytics can predict the impact of a macroeconomic event (such as a budget speech) on share price movements.
Macroeconomic Events, Social Media and Abnormal Returns
Behaviour of the stock market as a response to publicly available information is a well-researched area. The extant research explored ‘whether the amount of information that is publicly reported affects the trading activity and the price movements in securities markets’ (Mitchell & Mulherin, 1994). They observe that publicly available information impacts the trading volume and market activity. However, empirical evidence is inconclusive in nature. ‘Animal spirits’ of investors, as said by Keynes, are hard to predict and they might be irrational at times. Prior studies could not capture the sentiment or animal spirits of individual investors.
A seminal work in information science research addresses this limitation through an innovative use of large-scale Twitter feeds (Bollen et al., 2011). The authors analyze the text content of around 10 million tweets over a period of nine months to predict stock market behaviour. They use mood tracking tools to capture emotions (namely calm, alert, sure, vital, kind and happy) and use these to predict the stock market movement, that is, closing values of the Dow Jones Industrial Average (DJIA). A few other studies follow similar methodologies to predict stock market indicators (Sul, Dennis & Yuan, 2014; Zhang, Fuehres & Gloor, 2011). For example, Oliveira et al. (2013) predicted various stock market indicators of nine large US technological companies instead of overall market movements. Mao et al. (2012) explored the relationship between the volume of tweets and different capital market indicators at three different levels, namely the overall stock market, the industry level and individual company stocks, and noticed that the number of tweets can be an important predictor of stock market movements.
However, the extant literature mostly considered a longer horizon and explored co-movements of tweet volumes/sentiments and share prices during the study period. Hardly any study has used social media analytics to predict share price movements as a consequence of a specific event. Event study methodology is widely used in the finance and economics disciplines. For example, Amoako-Adu (1983) explored how ‘differential taxation of dividends and capital gains’ during the Canadian Tax Reform of the 1970s impacted high-yield and low-yield stocks differently. Event study methodology, with a shorter event horizon, allows share market reactions to a specific event to be captured precisely.
Extant literature considers the abnormal return of share prices (that is, the difference between the stock return related to the specific event and the firm’s historical return, which serves as the normal return) to assess the impact of an event. Conceptually, the normal return is the return that would have been observed had the event not occurred. So, abnormal returns (ARs) are defined as follows,
We define our mean adjusted CAR (Cumulative Abnormal Returns) as follows,
where Rit is the daily stock return, N is the average number of trading days in the estimation period and n is (event period – 1 day)/2. The preceding period (or non-event horizon) for estimating the normal return is (–N to –n). The event horizon (or event window) is (–n to +n). Thus, CAR is calculated over the event window/horizon from –n to +n. Prior studies have considered various event windows. For example, an event window of 11 days runs from 5 days before day ‘0’ (that is, the event day) to 5 days after it.
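For reference, a minimal sketch of the standard mean-adjusted formulation that is consistent with the definitions above is given below. The symbol T for the number of days in the estimation period, and the way the estimation-period sum is written, are assumptions made for illustration rather than a reproduction of the original equations.

```latex
\begin{align}
\bar{R}_i &= \frac{1}{T} \sum_{t \,\in\, \text{estimation period}} R_{it}, \\
AR_{it} &= R_{it} - \bar{R}_i, \\
CAR_i(-n, +n) &= \sum_{t=-n}^{+n} AR_{it}.
\end{align}
```

Here \bar{R}_i plays the role of the normal return of firm i, and the CAR simply accumulates the daily abnormal returns over the event window.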
Indian Union Budget 2015
The Union Budget is the annual financial statement of the Republic of India. The finance minister of India presents this annual budget in the Parliament on the last working day of February. The political regime in India over the past decade and a half has been broadly dominated by two major political alliances, namely the Congress-led United Progressive Alliance (UPA) and the BJP-led National Democratic Alliance (NDA) (see Table 1). However, during election years, the ruling (or outgoing) government presents an interim budget in February and the new government presents the full budget a few months after the election results (mostly in July).
Union Budgets & Stock Market Reactions
The Union Budget speech, by the finance minister, presents a detailed account of government receipts and expenditure. The Union Budget is divided into two parts, namely the revenue budget and the capital budget. The revenue budget comprises the government’s revenue receipts through tax revenue (such as income tax and excise duty) as well as non-tax revenue (such as interest receipts and profits), and the expenditures met from such revenues. The capital budget consists of capital receipts through market borrowing, recovery of loans, etc. as well as capital expenditure on developing health infrastructure, education, etc. (which leads to the creation of assets). If the government’s total expenditure exceeds its total revenue, the result is a fiscal deficit. The budget speech is an important indicator of the government’s economic policy for the next financial year. For example, a socialist government might increase its capital expenditure for socio-economic development, whereas a pro-industry government can reduce direct taxes such as corporate tax, or indirect taxes such as service tax, customs duty or excise duty. If the budget sentiment is favourable to the corporate sector, the stock market reacts positively. Table 1 reports the same for the last decade and a half.
The Minister of Railways presents the Rail Budget (which is the Annual Financial Statement of Indian Railways) a few days before the Union Budget. This year it was presented on 26 February 2015. Before the budget, the Finance Ministry of India also presents the Economic Survey, which reviews the performance of major development programmes and discusses the government’s policy initiatives and the growth prospects of the Indian economy. This can be considered an integral part of the Union Budget. Immediately after the Union Budget, on 4 March 2015, the Reserve Bank of India (RBI), which is India’s central banking institution, announced a cut of 25 basis points in the repo rate (the rate at which the RBI lends money to commercial banks) to control inflation. This announcement was in contrast to market expectations, so the stock market did not react positively to it. Thus, this article considers a shorter event horizon, or event window, of just three days. This allows us to assess the impact of the budget announcement alone and not of any related event such as the RBI announcement. Our preceding estimation period is the previous six months (instead of the full year) because the previous budget (that is, the budget for the financial year 2014–2015) was announced in the month of July (see Table 1). We preferred to exclude the previous budget announcement date from our preceding estimation period. Assuming roughly 252 trading days in a year, we consider 126 trading days (i.e., N) for a six-month period. Considering an event window of three days, n would be 1, i.e., [(event period – 1 day)/2]. Thus, the number of trading days in the non-event horizon is 124 days, i.e., (N – n – 1).
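The window arithmetic above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `mean_adjusted_car`, its arguments and the toy return series are hypothetical, and the normal return is taken as the plain average over the 124 non-event days immediately preceding the three-day window.

```python
from statistics import mean

def mean_adjusted_car(returns, event_index, n=1, estimation_days=124):
    """Return the mean-adjusted CAR over the window (-n, +n).

    returns: list of daily returns in chronological order (hypothetical input).
    event_index: position of the event day (day 0) in `returns`.
    """
    window_start = event_index - n
    window_end = event_index + n
    estimation_start = window_start - estimation_days
    if estimation_start < 0 or window_end >= len(returns):
        raise ValueError("not enough observations around the event day")
    # Normal return: average daily return over the preceding non-event days.
    normal_return = mean(returns[estimation_start:window_start])
    # Abnormal return on each event-window day, then summed into the CAR.
    abnormal = [r - normal_return for r in returns[window_start:window_end + 1]]
    return sum(abnormal)

# Toy series: 124 flat estimation days at 0.1% followed by a 3-day window.
toy_returns = [0.001] * 124 + [0.004, 0.011, 0.006]
car = mean_adjusted_car(toy_returns, event_index=125)
print(round(car, 6))
```

With n = 1 the event window covers the day before, the day of and the day after the announcement, which matches the three-day window used in this article.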
The government can have different policies for different industries/sectors. For example, this year Union Finance Minister Arun Jaitley increased excise duty on cigarettes and other tobacco products to promote public health, while reducing duties on commonly used day-to-day items to reduce the burden on the common man. So, the share price trend was downward for tobacco companies. A tweet effectively captures this sentiment, as follows, … smokers hit hard as excise duty on cigarettes and cigars hiked smoking will become costlier.
Sometimes an opinion leader can immensely influence the public mood. For example, Chanda Kochhar, Managing Director and CEO of a leading Indian private sector bank was highly optimistic about the budget which gets mentioned in several tweets and retweets, for example, union budget for fiscal 2016 is the finance minister’s gift to nation says icici bank md & ceo chanda kochhar
Market participants interpret this (a positive view from an industry leader) as an indication of an optimistic scenario for the economy as well as for the financial sector. If the discussion of Union Budget 2015 on social media platforms like Twitter effectively captures the overall market sentiment, then it would also influence the behaviour of investors, which in turn would influence share price movements. Hence, we hypothesize,
Data Collection
Our data comprise Union Budget related tweets posted by common people, key political personalities and media houses. We manually identified around 60 trending topics and hashtags related to Union Budget 2015. Most of these hashtags were propagated by leading Indian media houses. We collected roughly 0.64 million budget-related tweets from 25 February 2015 to 4 March 2015 (see Figure 1). It is important to note that the timestamp of tweet data extracted through the API (application program interface) follows GMT (Greenwich Mean Time). A correct timestamp in the Indian context is crucial for our event study methodology. So, we converted the timestamps in our data to IST (Indian Standard Time) by adjusting for the time difference. Figure 1 indicates that the budget-related tweet volume suddenly peaked on the day of the budget and tapered off during the next few days.
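The timestamp adjustment described above can be sketched as follows. The input string format is an assumption about how the API reports the creation time, and the helper name `to_ist` is hypothetical; the only fact relied on is that IST is five and a half hours ahead of GMT.

```python
from datetime import datetime, timedelta, timezone

# Indian Standard Time is UTC+05:30.
IST = timezone(timedelta(hours=5, minutes=30), name="IST")

def to_ist(created_at):
    """Parse a UTC timestamp string and return it as an IST datetime.

    The input format below is an assumption about how the API reports time.
    """
    utc_time = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    return utc_time.astimezone(IST)

# A tweet posted late on 27 February (GMT) belongs to 28 February in India.
example = to_ist("Fri Feb 27 19:00:00 +0000 2015")
print(example.date().isoformat())
```

The example shows why the conversion matters for an event study: without it, tweets posted in the early hours of the budget day in India would be assigned to the previous calendar day.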
Indian stock market is usually closed on Saturday and Sunday. However, this time, a lobby group of stockbrokers 1 argued that
“[i]n the absence of an operational market on Saturday, 28 February 2015, price discovery process will be hampered and market participants, investors will not get the requisite opportunity to take considered decision on their portfolio … This is expected to increase volatility when market reopens on Monday, 2 March 2015, which may hurt market participants and small retail participants in particular.”

So, regulators kept the share market open on the budget announcement day. Thus, for our final analysis, we consider roughly 0.37 million tweets from 28 February 2015 to 2 March 2015 (highlighted in Figure 1). This period is also in sync with our three-day event window for the CAR calculation. This allows us to explore the relationship between share price movements, that is, CAR (–1, 1), and spontaneous discussion in social media.
Next, we generated a word frequency list from our final data of 0.37 million tweets. We manually identified industry-related keywords for industry sentiment analysis. We then grouped industry-specific keywords as well as company names (for that particular industry) into eight industry sectors (see Table 3). However, a tweet that mentions two industries jointly can be problematic, especially for sentiment analysis, and all the more so when the tweet is positive about one industry and negative about the other. Thus, we did not consider tweets with a joint mention of two industries in our share price related analysis. We combined Hu & Liu’s opinion lexicon (Hu & Liu, 2004) and the ‘AFINN-111’ list of English words by Nielsen (2011) for our sentiment analysis. Our sample for industry-level analysis was roughly 45,000 tweets.
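A minimal sketch of lexicon-based scoring with the joint-mention filter described above is given below. The tiny `LEXICON`, the `INDUSTRY_KEYWORDS` mapping and all function names are hypothetical stand-ins; how the two published word lists were actually merged and weighted is not stated in the text, so a simple sum of word scores is assumed here.

```python
import re

# Hypothetical miniature lexicon; the study combines two published word lists.
LEXICON = {"boost": 2, "gift": 2, "good": 3, "hit": -1, "costlier": -2, "bad": -3}

# Hypothetical industry keywords, for illustration only.
INDUSTRY_KEYWORDS = {
    "tobacco": {"cigarettes", "cigars", "tobacco"},
    "banking": {"bank", "banks", "banking"},
}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def sentiment_score(text):
    """Sum the lexicon scores of the words in a tweet (unknown words score 0)."""
    return sum(LEXICON.get(word, 0) for word in tokenize(text))

def industries_mentioned(text):
    words = set(tokenize(text))
    return {name for name, keys in INDUSTRY_KEYWORDS.items() if words & keys}

def industry_sentiment(tweets):
    """Aggregate (volume, sentiment) per industry, skipping joint mentions."""
    totals = {}
    for tweet in tweets:
        mentioned = industries_mentioned(tweet)
        if len(mentioned) != 1:
            continue  # no industry, or more than one industry in the tweet
        industry = next(iter(mentioned))
        volume, score = totals.get(industry, (0, 0))
        totals[industry] = (volume + 1, score + sentiment_score(tweet))
    return totals

tweets = [
    "smokers hit hard as excise duty on cigarettes and cigars hiked smoking will become costlier",
    "good budget for banks",
    "bad for tobacco but a boost for banking",
]
totals = industry_sentiment(tweets)
print(totals)
```

The third toy tweet is dropped because it mentions two industries with opposite sentiment, which is exactly the case the exclusion rule is meant to avoid.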
The secondary share price data required for this study were extracted from the Centre for Monitoring Indian Economy (CMIE) database. We considered companies listed on the Bombay Stock Exchange (BSE) of India. We considered the top 500 companies, that is, the BSE 500, which represents around 90 per cent of the total market capitalization on the BSE. Our sample consists of the 190 companies from the BSE 500 that belong to one of our eight industry sectors (see Table 3 for details). Tweet feeds about industries other than these eight were scarce, so we could not incorporate them in our analysis. We considered the Sensex as the market proxy for the Indian equity market (similar to the DJIA in the USA). The Sensex is a free-float market-weighted stock market index of 30 financially sound companies from diversified industrial sectors listed on the BSE.
Data Analysis and Findings
Table 2 reports a summary snapshot of key trending topics in our data. We report the top six trending topics in terms of tweet volume, as well as the trends associated with positive and negative sentiments. The first two columns report the number of tweets and the cumulative sentiment score of all these tweets. Note that if a single tweet mentions two hashtags/trending topics, we counted that tweet under both hashtags in Table 2. Interestingly, many of these hashtags used regional languages. For example, sabkabudget, which was one of the top trending topics, means ‘budget for all’. Thus, a nuanced understanding of the context is essential for effective social media analytics.
Next, we calculated the average sentiment score of these trending topics by dividing the cumulative sentiment score by the volume of tweets. We explored which hashtags were associated with positive sentiment (as well as negative sentiment) irrespective of their volume. We observed that roughly 85 percent of hashtags are associated with positive sentiments, which was further confirmed by our emotion distribution analysis. Roughly 58 percent of budget tweets are associated with ‘joy’ (see Figure 2). However, a careful inspection of hashtags associated with negative sentiments reveals areas of concern for policymakers. Interpretation of hashtags with strong negative sentiments requires a fine-grained understanding of the research context.
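The per-hashtag bookkeeping described above (volume, cumulative score, and their ratio) can be sketched as follows. The function name `hashtag_summary`, the input layout and the toy scores are hypothetical; the sketch only mirrors the two rules stated in the text, namely that a tweet with several hashtags counts towards each of them and that the average is the cumulative score divided by the volume.

```python
from collections import defaultdict

def hashtag_summary(scored_tweets):
    """Return {hashtag: (volume, cumulative_score, average_score)}.

    scored_tweets: iterable of (hashtags, sentiment_score) pairs (hypothetical).
    A tweet carrying several hashtags is counted once for each of them.
    """
    volume = defaultdict(int)
    cumulative = defaultdict(int)
    for hashtags, score in scored_tweets:
        for tag in set(hashtags):
            volume[tag] += 1
            cumulative[tag] += score
    return {tag: (volume[tag], cumulative[tag], cumulative[tag] / volume[tag])
            for tag in volume}

toy = [
    (["sabkabudget", "middleclass"], 2),
    (["sabkabudget"], 1),
    (["corporateespionage"], -3),
]
summary = hashtag_summary(toy)
print(summary["sabkabudget"])
```

Ranking hashtags by the average rather than the cumulative score is what allows low-volume but strongly negative topics to surface.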
Just prior to the budget there was an instance of corporate espionage. Some budget documents and official letters related to the upcoming budget speech of the finance minister were leaked, and senior executives and consultants of a few leading Indian companies were arrested. This incident was reflected in the Twitter trend corporateespionage. Similarly, the opposition parties raised their concern against the government’s Land Acquisition Bill, which was also reflected in our trend analysis. The same is the case with another social scheme, namely MNREGA (that is, the Mahatma Gandhi National Rural Employment Guarantee Act). This scheme was initiated by the earlier government, and the present Prime Minister was somewhat critical of its effectiveness. The finance minister gave a marginal hike (in terms of resource allocation) to this programme, but a section of society felt that it was not sufficient.
Table 2. Trending Topics of Union Budget 2015

A section of our sample considered the government as pro-industry and they labelled this budget as corporatebudget, that is, a pro-corporate budget. However, the trending topic middleclass was volume-wise significant as well as strongly associated with positive mood which negates this pro-industry argument. Thus, social media analytics can effectively capture the discontent of the society. This analysis gives a clue to policymakers as to where they need to focus and how to make a good economic policy even better.
Social media analysis also allows us to explore a user’s influence on others. We compare two different measures of influence, namely Reply influence and Mention influence. According to Twitter 2 ‘a @ reply is any update posted by clicking the Reply button on a Tweet’, whereas ‘a mention is any Twitter update that contains ‘@username’ anywhere in the body of the Tweet.’ We identified around 70,000 unique users in our data. However, only 134 users (just 0.2 percent of all users) received ‘replies’ as well as ‘mentions’ from fellow users. Most of these influential users are either political personalities or their official Twitter handles (such as arunjaitley, FinMinIndia, MIB_India, narendramodi, PMOIndia etc.) or media houses (such as EconomicTimes, ndtv, TimesNow, timesofindia etc.). We sorted these 134 users according to our influence measures, that is, the total number of replies/mentions received by a user. Rank 1 indicates the most influential user (in terms of the total number of replies/mentions) and a higher rank number indicates a less influential user. We allotted the same rank to users with the same influence value. In this way we ranked all 134 users for Reply influence and Mention influence. Next, we use Spearman’s ρ (Spearman’s rank correlation coefficient) to test the relationship between our two influence measures. Spearman’s ρ is a non-parametric measure which captures the statistical dependence between two rank variables. It is defined as follows,
where xi and yi are the ranks of users based on the two different influence measures in a dataset of N users. We observe a statistically significant correlation of 0.53 between these two influence measures. Thus, we can conclude that users who receive more replies also tend to get more mentions.
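The rank correlation used above can be sketched in plain Python. Spearman's ρ is computed here as the Pearson correlation of the two rank vectors, which remains valid when ranks are tied; when there are no ties it coincides with the familiar 1 − 6Σ(xi − yi)² / (N(N² − 1)) form. The helper names, the tie convention (tied users share the average of the positions they occupy) and the toy counts are assumptions for illustration.

```python
from math import sqrt

def average_ranks(values):
    """Rank values in descending order (rank 1 = largest); ties share a rank.

    Tied values receive the average of the positions they occupy, which is
    one common convention and an assumption here.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of 1-based positions pos..end
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def spearman_rho(x_values, y_values):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    x, y = average_ranks(x_values), average_ranks(y_values)
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Toy reply and mention counts for five hypothetical users.
replies = [120, 45, 45, 10, 3]
mentions = [300, 80, 150, 5, 20]
rho = spearman_rho(replies, mentions)
print(round(rho, 3))
```

A value close to 1 indicates that the two influence rankings order users in nearly the same way, while a value near 0 indicates no monotonic relationship.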
Next, we explored which subthemes emerged from our data. Co-occurrence of key terms in individual tweets can identify various subthemes through the formation of an information network. We used the fast greedy community detection algorithm to detect communities in our dataset. We introduced a threshold word frequency to visualize the primary structure of our network. The co-occurrence rate of keywords (within a community) also provides crucial information about their closeness to each other from the agglomerative point of view. Nodes in this network are coloured according to their coexistence within the network. We observe five distinct communities/subthemes in our data (see Figure 3). The biggest subtheme (in terms of the number of nodes) is related to the overall economic development of the country (coloured red), which is captured through social media discussions related to GDP (Gross Domestic Product), fiscal deficit, investment boost, infrastructure development etc. An interesting node in this community is gold, which captures the ‘gold monetization scheme’ 3 announced by the finance minister. The most obvious subtheme (coloured blue) for any budget would be the discussion of whether the budget is pro-poor, pro-rich or for the middle class. Social sectors heavily dependent on government subsidies, like those provided in public health or education, emerged as one subtheme (coloured lemon green). Another subtheme is related to the corporate sector. The government announced a corporate tax cut to attract investments and create more job opportunities, which is captured through corporate and service tax related discussion (coloured purple). Finally, the smallest subtheme (in terms of the number of nodes) that emerged from our data is related to the stock market, which is captured through the co-occurrence of words such as sensex and stock market. This subtheme justifies our research question: how does the stock market react to the budget announcement?
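The network construction step that precedes community detection can be sketched as follows. This illustrates only how a keyword co-occurrence network with a word-frequency threshold might be built; the community detection itself is not implemented here. The function name, the threshold value and the toy tweets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(tweets_tokens, keywords, min_word_freq=2):
    """Build weighted co-occurrence edges between frequent keywords.

    tweets_tokens: list of token lists, one per tweet (hypothetical input).
    keywords: set of terms of interest.
    min_word_freq: threshold on the number of tweets containing a word.
    """
    # Count in how many tweets each keyword appears.
    word_freq = Counter()
    for tokens in tweets_tokens:
        word_freq.update(set(tokens) & keywords)
    frequent = {w for w, c in word_freq.items() if c >= min_word_freq}

    # Each pair of frequent keywords in the same tweet adds weight to an edge.
    edges = Counter()
    for tokens in tweets_tokens:
        present = sorted(set(tokens) & frequent)
        edges.update(combinations(present, 2))
    return edges

toy_tweets = [
    ["gdp", "fiscal", "deficit", "target"],
    ["fiscal", "deficit", "infrastructure"],
    ["sensex", "stock", "market", "up"],
    ["sensex", "stock", "market", "gdp"],
]
toy_keywords = {"gdp", "fiscal", "deficit", "infrastructure", "sensex", "stock", "market"}
edges = cooccurrence_edges(toy_tweets, toy_keywords)
print(edges[("deficit", "fiscal")], edges[("sensex", "stock")])
```

The resulting weighted edge list is the kind of input that a modularity-based community detection routine would then partition into subthemes.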
We attempt to answer this question by analyzing the market sentiment captured through tweet data. We used Ordinary Least Squares (OLS) regression to test our hypotheses regarding the relationship between share price movements and Twitter volume/sentiment. Our dependent variable is CAR (–1, 1). Our main explanatory variables are Twitter Volume (Twt_Vol) and Twitter Sentiment (Twt_Sent). However, there is significant variation across industries in terms of tweet volume (see Table 3). Thus, we also considered the logarithm of these two variables to control for heteroskedasticity: Log_Twt_Vol and Log_Twt_Sent. We also controlled for a few firm-specific variables in our robustness tests, namely the firm’s age, the firm’s size and firm performance. We considered the market-to-book ratio (market capitalization divided by total assets) as a proxy for the firm’s performance. However, the market-to-book ratio varies significantly across industries. Following the finance literature, we subtracted the industry median market-to-book ratio from the firm’s market-to-book ratio. Our firm performance variable is therefore the industry-adjusted market-to-book ratio (MB_Ratio). We extracted data for these firm-specific variables as on 31 March 2014 (which was the latest available financial year-end data in the Indian context).
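The construction of the industry-adjusted market-to-book ratio and the log transform can be sketched as follows. The record layout, function name and toy values are hypothetical, and the use of the natural logarithm is an assumption since the base is not stated in the text.

```python
from math import log
from statistics import median

def industry_adjusted_mb(firms):
    """Subtract the industry median market-to-book ratio from each firm's ratio.

    firms: list of dicts with 'name', 'industry' and 'mb' keys (hypothetical).
    """
    by_industry = {}
    for firm in firms:
        by_industry.setdefault(firm["industry"], []).append(firm["mb"])
    medians = {ind: median(vals) for ind, vals in by_industry.items()}
    return {f["name"]: f["mb"] - medians[f["industry"]] for f in firms}

toy_firms = [
    {"name": "A", "industry": "banking", "mb": 1.0},
    {"name": "B", "industry": "banking", "mb": 2.0},
    {"name": "C", "industry": "banking", "mb": 4.0},
    {"name": "D", "industry": "tobacco", "mb": 6.0},
]
adjusted = industry_adjusted_mb(toy_firms)

# Log transform of hypothetical industry tweet volumes (natural log assumed).
toy_volumes = {"banking": 1200, "tobacco": 300}
log_twt_vol = {ind: log(v) for ind, v in toy_volumes.items()}
print(adjusted)
```

After the adjustment, a value of zero means the firm sits at its industry median, so firms from industries with structurally high or low ratios become comparable.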
Table 4 reports the OLS regression analysis. Coefficients of Twitter volume (model 1) and sentiment (model 2) are positive and statistically significant at the 0.1 percent level of significance (p < 0.001). The results remain consistent (at p < 0.01) for the log of tweet volume and the log of tweet sentiment (models 3 and 4). Our results are robust even after controlling for firm-specific variables (models 5 to 8). This strongly supports our Hypotheses 1 and 2. So, higher Twitter volume or positive sentiment about an industry can be an indicator of share price movements. Probably the only concern regarding our statistical analysis is the low coefficient values (models 1 and 2) and R-square values. It is important to note that we are not estimating a simple linear relationship between two variables. We are estimating the relationship between abnormal gains (with respect to historical gains during the preceding period) and market sentiment at the industry level, which explains the low coefficient values. R-square in prior studies was also in the range of 0.001 to 0.0002 (Bollen et al., 2011). Moreover, with respect to the finance literature, these R-squares ‘may seem small, they are comparable to effects seen … on the impacts of information on future stock returns’ (Sul et al., 2014).
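To make the regression setup concrete, a minimal single-regressor OLS sketch is given below. It is not the specification behind Table 4 (which also reports standard errors and includes control variables); the function name and the toy data are hypothetical, and the example only illustrates how firms in the same industry share one industry-level tweet variable.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, r_squared) for a single explanatory variable.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Toy data: firms inherit the (log) tweet volume of their industry.
log_twt_vol = [5.0, 5.0, 7.0, 7.0, 9.0, 9.0]
car = [0.00, 0.02, 0.01, 0.03, 0.02, 0.04]
intercept, slope, r2 = simple_ols(log_twt_vol, car)
print(round(slope, 4), round(r2, 3))
```

Because the explanatory variable varies only across industries while the CAR varies across firms, a large share of the firm-level variation is left unexplained by construction, which is one reason modest R-square values are to be expected in this kind of design.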
Table 3. Our Sample Description
Table 4. The Effects of Twitter Volume and Sentiment on Share Price Movements
This table reports Ordinary Least Squares (OLS) regressions examining the incremental effect of industry-level Twitter volume and sentiment on company-level Cumulative Abnormal Returns (CAR). The dependent variable is CAR (–1, 1). The sample consists of 190 publicly traded companies on the Bombay Stock Exchange (BSE). These companies belong to the eight industries mentioned in Table 3 and are part of the BSE 500. Data (excluding the Twitter data) are obtained from the CMIE database. Standard errors are reported in parentheses. Twitter-related explanatory variables are at the industry level. Other explanatory variables are a set of firm-specific variables, namely Log of Firm’s Age, Log of Firm’s Size (measured through Log of Firm’s Market Capitalization) and Firm’s Performance (measured through the industry-adjusted Market-to-Book ratio).

Conclusions
Twitter has become immensely popular among information science researchers for exploring various socio-economic phenomena. Predicting stock market movement is one of the emerging research domains within this field. However, prior studies mostly explored the applicability of Twitter data for time-series analysis in the context of advanced economies. Developing economies are challenging on multiple fronts: their stock markets are not as advanced or efficient as those of developed economies, and access to the Internet is limited to roughly 20 percent of the total population. Moreover, English is not the natural language in many of these countries. Thus, developing economies offer a challenging research context in which to test the applicability of social media analytics for predicting stock market behaviour. Our study explores social media analytics for event study methodology, and does so in the context of company-level price movements. We set our study in India and explored the effects of the 2015 Union Budget announcement on share price movements.
We were able to extract 0.27 million tweets on the day of the budget alone. If we assume that this is just 1.0 percent of the total data (according to Twitter API policy), then this huge volume of social media discussion is itself a justification of our research. We considered an event window of three days for our analysis. Our analysis of trending topics and emotion distribution suggests that social media was highly optimistic about the 2015 Union Budget. This was reflected in the overall upward movement of the stock market indicator. We also observed that the most influential users are either political personalities or media houses. A higher number of mentions of or replies to users like arunjaitley or FinMinIndia on the day of the budget is intuitive, but it is interesting to note that the media also plays the role of opinion leader in social media discussion. This leads to a puzzling question: what is the role of electronic media in forming opinion among the masses? Future research needs to address this question.
Next, we used a community detection algorithm to dissect the social media discussion further. Communities emerging from our analysis effectively captured different subthemes of the budget discussion. Co-occurrences of keywords within a tweet allowed us to visualize the primary structure of our network. Finally, we tested our hypotheses for 190 firms from 8 different industries. To the best of our knowledge, this is one of the largest samples to date exploring the relationship between Twitter and share prices. Our empirical evidence strongly supports our hypotheses, that is, social media analytics can predict stock market movements. Our results remain consistent across various robustness tests. Our study establishes the applicability of social media analytics for event study methodology. We demonstrated how industry-level tweet data can predict company-level share price movements. Prior studies used company-level Twitter data for predicting company-level share price movements. We could not test the same because company-specific tweets were scarce. This is one potential limitation of our study. Abnormal returns can be calculated either on market-adjusted returns, that is, on the basis of the capital asset pricing model (CAPM), or on mean-adjusted returns, that is, on the basis of historical returns. We have considered mean-adjusted returns to explore how the stock market reacted to budget announcements in the past (see Table 1) as well as company-level share price movements as a consequence of the 2015 Union Budget (see Table 4). However, future studies can also consider market-adjusted returns for company-level analysis.
