Abstract
The media and election campaign managers conduct numerous polls in the days leading up to presidential elections. The predictive capacity of these preelection polls varies, even though a Big Data approach offers alternative sources that indicate voting intention. In this article, we propose a free method, based on this approach, to anticipate the winner of a presidential election. To demonstrate its predictive capacity, we conducted the study for two countries: the United States of America and Canada. To this end, we analysed which candidate had the most Google searches in the months leading up to polling day. We took into account the past four elections in the United States and the past five in Canada, since Google first published its search statistics in 2004. The results show that this method predicted the real winner in all the elections held since 2004, and they highlight the need to monitor the next elections for the presidency of the United States in November 2020 to obtain more accurate information on the future results.
Introduction
The internet has become a great source of data. More than half of the world’s population uses it (Internet World Stats, 2020) and there are more than 4.2 billion Google searches every day (World Bank, 2016). Thus, data mining has become a tool of great assistance to the academic and professional world, opening the possibility of working with a high volume of data with a single click. Given this high activity, search engines have become predictors of consumers’ purchasing intention (Vosen & Schmidt, 2011), macroeconomic variables (Ettredge et al., 2005), social indicators (Hagerty & Veenhoven, 2003) or infectious diseases (Carneiro & Mylonakis, 2009). In addition, and as would be expected, search engines have become predictors of future election results.
New digital channels have attracted politicians’ attention as a means of communication, but also as a means of collecting electoral information, and they have become a key metric in the design and monitoring of election campaigns. The use of the internet by citizens has changed the design of election campaigns, participation in politics and how messages are conveyed. Dimitrova et al. (2014) point out that the effect of social networks is stronger on participation than on political knowledge. Zhang et al. (2010) explain the effect of new channels on political engagement. Westwood (2010) questions to what extent the information available on these channels shapes public opinion, asking whether we are dealing with Googleocracy or Googlearchy. Kreiss and McGregor (2018) point out that, following the experience of the 2016 presidential elections in the United States, technology companies such as Facebook, Twitter, Microsoft, and Google have become almost digital consultants for election campaigns, defining the strategy, contents, and execution.
Ma-Kellams et al. (2018) state that Big Data in the form of Google searches has emerged as the most powerful predictor of political behaviour compared with other aggregate measures. In this way, data generated on digital channels reflect citizens’ attitudes and intentions, and therefore have an explanatory and predictive nature.
Polls are becoming less credible in forecasting election results (Barnett et al., 2018). Therefore, this work proposes using Google Trends as a prediction tool for election results as an alternative, as it is a simple and effective method of measuring election results (Gómez-Martínez & Prado-Román, 2014). Google is the predominant search engine in most countries (World Bank, 2016). In addition, the registered and monitored data and statistics are available to users for consultation and/or use, facilitating their use.
In Canada, the last presidential elections took place in 2019, so the data are recent. By collecting data since 2004, we can follow the evolution of candidate searches almost to the present date, which allows us to validate the tool. The next presidential elections in the United States will take place in November 2020. Given the political instability and the unstable global environment resulting from the COVID-19 pandemic, a fast and credible tool that provides more reliable information on election results is a great opportunity.
The structure of this research is as follows. First, a theoretical review is carried out, which sets the background on the use of data for predicting different election campaigns. Second, the sample and methodology are presented, followed by the results. Finally, the main implications and conclusions of this study are discussed.
Theoretical Framework
The use of polls and data analysis in politics and in election campaigns is nothing new. Since the origin of elections, obtaining information to design and monitor campaigns has been a key tool for designing the electoral strategy. The key role of data in campaigns has not changed; what has changed is the way of obtaining this information and the greater certainty provided by obtaining data through different information sources, which allows candidates to adapt their strategies and respond to their electorates’ reactions (Trevisan et al., 2018).
As data, and in particular the means of obtaining them, have become more sophisticated, their predictive capacity has improved. Whereas in the past data seemed to be of little use, political parties now have teams of professionals trained in data analysis and intelligence systems. Early studies on the impact of information on campaigns concluded that the effect on voting intention was minimal (Berelson et al., 1954; Campbell et al., 1960). Currently, this opinion is shared by academics who focus on the context, claiming that an unstable environment (altered by an electoral process) and the social nature of voting make it difficult to obtain data with sufficient certainty; therefore, carrying out a campaign to alter the results is possible, but complex (Stevenson & Vavreck, 2000; Wlezien & Erikson, 2002).
Subsequently, various studies in the field of journalism (Chaffee, 1981) and political science (Graber, 1980; Rose & Mossawir, 1967) demonstrated the effect of the mass media on election results. The most remarkable conclusion of these investigations was the quantification of the effect of these campaigns on voting intention. Specifically, in the United States, the impact indicated that between 7% and 11% of voters changed their voting intention from one party to another. In the case of voters without a deep-rooted party identification, this percentage could be as high as 28%. However, in several studies, the strong influence of personal contacts on changing the voting intention was pointed out, evidence that has also been corroborated by other studies such as those by Johann et al. (2018).
Since the emergence of digital media, academic interest has expanded to the effect of the internet and social networks on election campaigns, both for their ability to identify what is happening at the present moment (Ginsberg et al., 2009) and for their ability to predict the future (Asur & Huberman, 2010). What all these studies have in common is the assumption that the volume of keywords searched on the internet or contained in chats on social networks such as Twitter reveals the current and future thinking of a significant part of the population (Romero et al., 2011).
The Google search engine domain, according to different sources, accounts for 80% of worldwide searches, which has enabled its searches to enrich its databases and become representative samples of the population for many studies (Trevisan, 2014). Google has designed the Google Trends tool, which processes the searches carried out for the words entered relative to the total number of Google searches, that is, it measures relative search popularity (Google, n.d.). Thus, Google search statistics have become a tool used in the academic field (Scharkow & Vogelsgang, 2011; Scheitle, 2011).
Several investigations have shown that it is an effective tool for making data predictions, as revealed by numerous studies in the social (Chai & Sasaki, 2011; Gunn & Lester, 2013), economic (Ettredge et al., 2005; Kaeserbauer et al., 2012; Trevisan et al., 2018), or health field (Hagerty & Veenhoven, 2003; Nuti et al., 2014).
In the political sphere, the results are mixed. Balz (2011) analysed the 2008 presidential campaigns of B. Obama and H. Clinton, concluding that Clinton, despite being searched less, got more engagement on the network, but Obama won. However, Arendt and Fawzi (2019) point out that the conflicting communication techniques used by Trump in the 2016 elections led to more internet searches but generated a negative bias, which led to him receiving fewer votes. Even so, Trump won.
Following this trend, in a study carried out in the United States and the United Kingdom, Trevisan et al. (2018) confirm that the information available on Google Trends is a useful indicator that candidates can use to adapt their programme and campaign to attract undecided voters. Lui et al. (2011) point out that Google Trends was not a good predictor of the 2008 and 2010 congressional elections compared with The New York Times polls, other polls, and even chance. However, they point out that with an important and well-defined subset of data, they find that Google Trends is a better predictor. Yasseri and Bright (2013) also identify that it is necessary to review the predictive capacity of this tool. Graefe and Armstrong (2010) analysed all the American presidential elections from 1972 to 2008 with Google Insights for Search and concluded that it has a 97% predictive power probability.
Mavragani and Tsagarakis (2016) and Askitas (2015) point out that Google Trends has predictive capacity in short periods of time and even under a high-volatility scenario such as the Greek referendum in 2015 on the approval or rejection of conditional financial aid from the European Union. The predictive nature of Google Trends is confirmed in the analysis on the national elections in Greece and Spain by Polykalas et al. (2013a). Polykalas et al. (2013b) tested again with the German elections of 2005, 2009, and 2013 and demonstrated a strong correlation between potential voters’ searches and election results.
In addition, Gómez-Martínez et al. (2019) highlight the predictive capacity that was shown in the 2011 Spanish general elections, with at least the same precision as the preelection polls. The averages obtained from the search indexes anticipated with some precision the real vote percentages observed later. Gómez-Martínez and Prado-Román (2014) also demonstrate, through an econometric panel data model, that Google search statistics regarding that same election date were statistically significant and had predictive power. The results indicated that a greater (lesser) interest in a political party, observed through more (fewer) Google searches on it, implied a greater (lesser) voting intention and a higher (lower) percentage of electoral roll votes.
However, it is important to note that predicting winners with this type of tool is only possible in democratic elections and in countries with freedom of online navigation (Trevisan et al., 2018). Furthermore, there are also studies that point to the negative effect and/or biases of using these indicators, since, as mentioned by Epstein and Robertson (2015), a “search engine manipulation effect” may occur and directly affect election results; minority or alternative political parties would be excluded (Hindman, 2009) due to the loop effect of the most relevant searches (Halavais, 2009); or submission to commercial agents such as Google (Mager, 2012).
Despite these criticisms, analysis of Google Trends searches makes it possible to identify the temporal evolution of the electorate with a large amount of data and to analyse trends (Trevisan et al., 2018). This tool can provide useful information for predicting the winning candidates and for identifying the factors or actions that cause searches to increase or decrease. In this way, the effectiveness of policies during the campaign could be identified and the penetration of actions quantified.
Based on this background information, the aim of this article is to demonstrate the ability of Google Trends as a predictor of the winner of the presidential elections in the United States and Canada since 2004, and propose its use as an electoral predictor at future polls. It is obvious that if the consumer behaves differently due to digital changes, the voter will do the same, and as managers, we must obtain this information so as to be more prepared.
Sample and Methodology
Sample
Taking into account the theoretical framework discussed and the research objective, the following methodology is proposed. First, we downloaded Google Trends data for the United States and Canada and compared the searches for each candidate in the last month, 2 months, and 3 months before Election Day. Although interest in the elections is global, only United States or Canadian citizens vote, so the search is limited to the geographical area of the United States of America or Canada.
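The windowed comparison described above can be sketched as follows. This is a minimal illustration and not the authors' procedure: the helper names (`months_before`, `window_average`), the candidate series and its values are hypothetical, and the window is assumed to run from the same day of the month n calendar months earlier up to, but not including, Election Day. How the series is exported from Google Trends is outside the scope of the sketch.

```python
import calendar
from datetime import date
from statistics import mean


def months_before(day: date, n: int) -> date:
    """Return the date n calendar months before `day`, clamping the day of month."""
    total = day.year * 12 + (day.month - 1) - n
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def window_average(series: dict[date, float], election_day: date, months: int) -> float:
    """Mean search index over the `months` calendar months before election_day."""
    start = months_before(election_day, months)
    values = [v for d, v in series.items() if start <= d < election_day]
    return mean(values)


# Hypothetical 0-100 search index values for one candidate (not real data).
election_day = date(2004, 11, 2)
series = {
    date(2004, 8, 8): 30,
    date(2004, 9, 5): 40,
    date(2004, 10, 10): 50,
    date(2004, 10, 24): 70,
}

for months in (1, 2, 3):
    print(months, window_average(series, election_day, months))
```

Running the same function on each candidate's series gives the 1-month, 2-month, and 3-month averages that are then compared.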
Study 1: The United States
Figure 1 shows the information displayed on the screen when the search is performed: Google searches in the three months (from August 1 to November 1) prior to the 2004 elections, in which George W. Bush was reelected. The figure shows that Bush obtained an average search index of 37 out of 100 in those three months, while Kerry only reached 28. This would be a sign of victory for Bush, who aroused more interest than his opponent, which is what eventually happened in the elections. The figure also shows the 2008 elections to the presidency of the United States of America for the months of August, September, and October of that year. Barack Obama won by a wide margin, as the differences in Google searches had already anticipated.

Figure 1. Presidential elections USA 2004-2016.
In the 2012 presidential elections, Barack Obama defeated Mitt Romney, but with a smaller difference in votes. In 2016, Donald Trump defeated Hillary Clinton even though the polls forecast Clinton as the winner; Google had anticipated the change to a Republican president.
Study 2: Canada
Figure 2 shows the search results for presidential candidates in the elections since 2004. As in the case of the United States, Google searches indicated the winners of the elections.

Figure 2. Presidential elections Canada 2004-2019.
Methodology
Taking into account that Google Trends has been publishing statistics since January 2004, we analyse whether this pattern is repeated for the presidential elections available and we demonstrate the predictive capacity that will be very useful for the next presidential elections in the United States in 2020, and in Canada in 2024.
In addition, we repeated the analysis for different time intervals, 1 month, 2 months, and 3 months, looking for pattern differences as the elections got closer. To this end, a model of analysis of differences and correlations is applied to measure the search impact and the votes obtained, as well as the relationship between Google searches and election results.
The following variables are used for the study:
Google Trends searches: The data are provided on a 0 to 100 scale, that is, they are standardised by Google on a scale that allows for search comparisons. For this research, we use the average search value based on the time period.
Votes: The data are the election results quantified by seats won by each candidate.
Difference in electoral votes resulting from the elections (blue candidate party votes minus red candidate votes).
Xt: Difference in the Google search average for each of the candidates (G-Trends average of the blue candidate minus G-Trends average of the red candidate).
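The two difference variables above can be written compactly as follows. This is a sketch of the notation only: the text names only Xt, so the symbol Y_t for the vote difference, the bar notation for the window averages, and the index t for an election are assumptions introduced here. The worked line uses the 2004 three-month averages reported for Figure 1 (Kerry, blue: 28; Bush, red: 37).

```latex
X_t = \bar{G}^{\text{blue}}_t - \bar{G}^{\text{red}}_t,
\qquad
Y_t = V^{\text{blue}}_t - V^{\text{red}}_t

% Worked example, United States 2004, three-month window:
X_{2004} = 28 - 37 = -9 < 0
```

Under this sign convention, a negative search difference points to the red candidate, which is consistent with Bush's reelection in 2004.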
Results
First, we show the results for the United States (Table 1). If we predict that the winner of the United States presidential elections will be the candidate with the most Google searches, the tool would have correctly predicted the winner of every election since 2004. We also observe that the prediction does not vary whether we take 1, 2, or 3 months of observation. Therefore, we can assume that the candidate who will occupy the White House from 2020 will be the one that arouses the greatest interest among the population of the United States of America and generates the most Google searches. This is the case as long as these searches are related to positive news and the environment remains stable.
Table 1. United States Results: Period 2004-2016.
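The prediction rule described above (the candidate with the highest average search index is the predicted winner) can be sketched in a few lines. The function names are hypothetical, and the multi-window figures for candidates "A" and "B" are invented for illustration; only the 2004 three-month averages (Bush 37, Kerry 28) come from the text.

```python
def predict_winner(avg_search: dict[str, float]) -> str:
    """Return the candidate with the highest average Google Trends index.

    In the case of a tie, the first candidate listed is returned.
    """
    return max(avg_search, key=avg_search.get)


def windows_agree(by_window: dict[int, dict[str, float]]) -> bool:
    """True when every observation window predicts the same candidate."""
    return len({predict_winner(avgs) for avgs in by_window.values()}) == 1


# Three-month averages for 2004 as reported in the text.
us_2004_three_months = {"Bush": 37, "Kerry": 28}
print(predict_winner(us_2004_three_months))  # Bush

# Hypothetical averages for 1, 2 and 3 months before an election (not real data).
hypothetical_windows = {
    1: {"A": 40, "B": 35},
    2: {"A": 38, "B": 33},
    3: {"A": 37, "B": 28},
}
print(windows_agree(hypothetical_windows))  # True
```

The second helper mirrors the observation that the prediction does not change across the three observation windows.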
In Table 2, we confirm that the results obtained in the United States are repeated in Canada. Thus, we validate the results of Google Trends as an electoral predictor and we include the year 2019, which would show the evolution of the use of digital devices and tools.
Table 2. Canada Results: Period 2004-2019.
Second, we performed an analysis of the search and vote variation, and a correlation analysis to validate the predictive capacity of Google Trends (Table 3 and Table 4). As verified in the tables, the extra votes due to Google Trends searches increased considerably in the nine presidential elections analysed. However, this value does not increase as it approaches the present and digital media are more widely used. For example, in the case of the 2012 U.S. presidential elections, those votes based on searches increased by 18, and in Canada a year earlier, in 2011, they increased by 10.50.
Table 3. Variations Analysis Results: The United States: Period 2004-2016.
Table 4. Variations Analysis Results: Canada: Period 2004-2019.
Third, Figure 3 shows the correlations between the difference in searches and the difference in votes for each time period. The figure shows a linear trend, indicating that there is a relationship between both variables in the three time intervals considered: 1 month, 2 months, and 3 months before the elections.

Figure 3. Correlations between Google Trends searches and votes.
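A correlation of this kind can be computed as in the sketch below. The text does not state which correlation coefficient was used, so the Pearson coefficient is an assumption, and the paired values are hypothetical placeholders rather than the study's data.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical pairs (not real data): search difference and vote difference
# per election, both computed as blue candidate minus red candidate.
search_diff = [-10.0, -5.0, 5.0, 10.0]
vote_diff = [-40.0, -10.0, 30.0, 60.0]

print(round(pearson(search_diff, vote_diff), 3))
```

Repeating the calculation with the 1-month, 2-month, and 3-month search differences gives one coefficient per time interval, which is the comparison the figure summarises.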
Implications
Political marketing and the use of the media in election campaigns arise at the same time as candidates must convince the electorate to vote for them. The presence of digital media and the use of internet search engines as a source of information for the electorate have impacted the traditional way of campaigning. Election managers and candidates must use these means as communication tools, but also as information tools that allow them to make optimal decisions to win the elections. Furthermore, this is particularly important in the case of presidential elections in which candidates’ personalism is greater and they are constantly judged and analysed by individuals who now also use the internet to access information about them.
There is awareness of this search impact in the communication and marketing area, since candidates who are not searched online are not there; they do not exist, and it will be more difficult for them to attract voters. In addition, these areas are aware of the useful information that is generated on the network and that must be used. Obtaining this information can be a click away through search metrics such as Google Trends.
This research confirms that the information available on the web and that shows citizens’ concerns and interests is a reliable source for predicting voting intention (Arendt & Fawzi, 2019; Balz, 2011; Graefe & Armstrong, 2010; Trevisan et al., 2018). If we focus attention on the different sources of information available on the web, we should mention Google, not only due to its number of users worldwide and in the countries analysed but also for making the information extracted from searches available to users via Google Trends (Askitas, 2015; Gómez-Martínez et al., 2019; Mavragani & Tsagarakis, 2016; Polykalas et al., 2013b).
By extracting data from Google Trends, we analyse the election results since 2004 in the United States and Canada and confirm that, based on the analysis of Google Trends data, we can anticipate who will be the winning candidate in the presidential elections. According to the results of the empirical study, Google searches generate an increase in the number of votes, which has led in all the analysed elections to predict who the winner would be. Therefore, monitoring Google Trends data would provide extremely useful information to know who will win the elections, for example, the next elections to preside over the White House.
From the point of view of electoral managers and political analysts, these results pose new challenges to be faced and a new way of campaigning or monitoring information. It is common to conduct and disseminate numerous polls on the days before the election date, but could they be replaced by more profitable tools such as the one discussed in this study? Would this investment be necessary if a free and accurate tool to predict the outcome of the elections were available?
Despite the convincing nature of the results of this research, the conclusions should be taken with caution, since it is necessary to validate these results in other countries and elections, explore in more detail the use of the internet and expand the study sample. However, we recommend that when monitoring the next elections for the United States presidency, to be held in November 2020, in addition to following traditional polls, voters, the media and political parties should take into account the data collected by this indicator because it may still be the most accurate predictor.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
