Abstract

In a recent article, Beall, Hofer, and Schaller (2016) used observational time-series data to test the hypothesis that the 2014 Ebola outbreak influenced the 2014 U.S. federal elections. This represents one example of a recurring psychological interest in using observational data (a) to assess long-term temporal predictions of psychological theories in naturalistic settings (Jebb, Tay, Wang, & Huang, 2015) and (b) to examine how psychological theories can predict cross-population variation in attitudes and behavior (Eppig, Fincher, & Thornhill, 2010; Fincher & Thornhill, 2012; Gelfand et al., 2011; Murray, Schaller, & Suedfeld, 2013; Schaller & Murray, 2008). While such nonexperimental designs hold considerable promise, they also introduce analytic challenges that can lead to spurious inferences if left unaddressed (Hackman & Hruschka, 2013; Hruschka & Hackman, 2014; Hruschka & Henrich, 2013; Jebb et al., 2015; Pollet, Tybur, Frankenhuis, & Rickard, 2014). Here, we use Beall et al.’s analyses to illustrate how using observational data without attention to one long-recognized threat to inference in time-series data, temporal autocorrelation, can lead to spurious inferences (Yule, 1926).
Beall et al. used the coincidence of the 2014 Ebola epidemic and the 2014 U.S. federal elections (as well as ancillary analyses of Canadian elections) to assess two hypotheses derived from theories of the behavioral immune system (Schaller & Murray, 2008). First, they hypothesized that perceived threat of disease should increase political conservatism. Second, they hypothesized that disease threats may increase conformism and lead to a bandwagon effect, “the phenomenon in which voters show an increased inclination to support whichever political candidate is leading in recent polls” (p. 596). Beall et al. assessed these hypotheses by correlating 2-month time series of (a) online searches for the term “Ebola” and (b) daily polling data for U.S. congressional elections, a month before and a month after the Centers for Disease Control and Prevention’s announcement of the first Ebola case in the United States (September 30, 2014). Beall et al. found strong correlations between daily Ebola search volumes during the months of September and October and support for conservative candidates at national and state levels over that same time period. They interpreted this correlation between time series as support for their first hypothesis. Beall et al. also found that correlations between Ebola searches and Republican support were stronger in states that started off with greater support for Republican candidates and with longstanding Republican voting norms, and they interpreted this result as support for the bandwagon effect.
These analyses relied on correlations between two time-series variables—Ebola search volume and daily polling—taken over 2 months. When two variables evolve over time, they can frequently look highly correlated, even without any underlying causal relationship between them (Yule, 1926; see Koplenig & Müller-Spitzer, 2016, for an illustrative example). This results from temporal autocorrelation—greater similarity in data points that are closer to each other in time—and the common existence of long-run trends in time-series data that can create many non-independent data points (Jebb et al., 2015). One simple method for dealing with such threats is to detrend (i.e., remove the long-term trend from) the time series by analyzing the changes between time points rather than their absolute values. This removes first-order autocorrelation and is often the first step in time-series analysis (Jebb et al., 2015). Calculating changes between absolute values leads to the “loss” of the first observation in the time series. However, in time series in which observations are highly autocorrelated, this does not necessarily represent the real loss of an independent data point, because data points are highly nonindependent.
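The contrast between correlating absolute values and correlating changes can be illustrated with a minimal simulation sketch. It does not use the Ebola or polling data: the series are hypothetical independent random walks, and the series length (60), the number of simulations (500), the |r| > .5 threshold, and the helper names are all illustrative assumptions.

```python
import numpy as np


def first_difference(series):
    """Return the changes between adjacent time points (one observation shorter)."""
    return np.diff(np.asarray(series, dtype=float))


def share_large_correlations(n_points=60, n_sims=500, threshold=0.5, seed=0):
    """Simulate pairs of independent random walks (hypothetical data) and return
    the share of pairs with |r| above the threshold, first for the absolute
    values and then for the first-differenced series."""
    rng = np.random.default_rng(seed)
    large_levels = 0
    large_changes = 0
    for _ in range(n_sims):
        # Each walk is a running sum of independent noise, so the two series
        # share no causal link by construction.
        a = np.cumsum(rng.normal(size=n_points))
        b = np.cumsum(rng.normal(size=n_points))
        r_levels = np.corrcoef(a, b)[0, 1]
        r_changes = np.corrcoef(first_difference(a), first_difference(b))[0, 1]
        large_levels += int(abs(r_levels) > threshold)
        large_changes += int(abs(r_changes) > threshold)
    return large_levels / n_sims, large_changes / n_sims


share_levels, share_changes = share_large_correlations()
print(f"share with |r| > .5, absolute values: {share_levels:.2f}")
print(f"share with |r| > .5, changes:         {share_changes:.2f}")
```

Because the walks are generated independently, any large correlation between their absolute values is spurious. Differencing recovers the independent increments, so large correlations between the changes should be rare.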
Here, we applied this simple detrending procedure to the Beall et al. time series and reanalyzed the data (see the Supplemental Material available online for further details). First, we found exceedingly high levels of temporal autocorrelation in the time-series variables (rs > .90). In other words, each observation was nearly perfectly correlated with the observation that came directly before it in the time series. This indicated that detrending was a necessary first step in analyzing the time series (see Table S1 in the Supplemental Material). By detrending the data, we were then able to compare changes between adjacent observations rather than simply compare the absolute values of those observations.
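The first-order autocorrelation described here is the correlation between a series and a copy of itself shifted by one time point. A minimal sketch follows, assuming a plain Pearson correlation of adjacent observations; the function name and the simulated series are hypothetical and are not taken from the Supplemental Material.

```python
import numpy as np


def lag1_autocorrelation(series):
    """Pearson correlation between each observation and the one directly before it."""
    x = np.asarray(series, dtype=float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]


# Hypothetical trending series (a random walk), not the original data.
rng = np.random.default_rng(1)
trending = np.cumsum(rng.normal(size=60))
changes = np.diff(trending)  # first differences of the same series

print(f"lag-1 autocorrelation, absolute values: {lag1_autocorrelation(trending):.2f}")
print(f"lag-1 autocorrelation, changes:         {lag1_autocorrelation(changes):.2f}")
```

Running the same check on the differenced series is one simple way to see whether differencing has removed most of the dependence between adjacent observations.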
After detrending the data, we found no empirical support for either of the original two hypotheses (Table 1). At both national and state levels, there were no longer strong or significant associations between Ebola search volume and preference for conservative candidates in the U.S. federal elections. The strong correlation in the Canadian elections (based on only nine data points) was still strong but no longer significant and had exceedingly wide confidence intervals. Moreover, there was no support for a moderating bandwagon effect: States leaning Republican in either current or past elections did not show correlations greater than zero or correlations greater than those observed in Democratic states (Table 2). These results were robust to the composition of the sample (including or excluding outliers and excluding or including six states with insufficient data on daily changes; see Table S1 in the Supplemental Material).
Table 1. Comparison of Correlations Between “Ebola” Search Volume and Measures of Voter Intentions
Note: Beall et al. examined U.S. national elections in Study 1 and Canadian national elections in Study 3. All other correlations refer to the state-level analyses of Study 2.
*p < .05. ***p < .001.
Table 2. Comparison of Differences (Cohen’s d) Between Correlations of “Ebola” Search Volume and Measures of Voter Intentions
*p < .05. **p < .01.
Beall et al.’s findings were not robust to basic time-series controls and were based on particularly small samples. This strongly suggests that either (a) the initial findings were spurious or (b) the study design used by Beall et al. was insufficiently powered to detect any potential associations or to test the proposed hypotheses. The latter is a clear possibility. For example, the statistical power to detect a statistically significant correlation between fully detrended time series would have been less than 0.5 in both the U.S. study (n = 23, observed r = .3, α = .05) and the Canadian study (n = 8, observed r = .6, α = .05), whereas data from both studies still exhibit substantial second-order correlation (see Table S2 in the Supplemental Material). Many sources of randomness, such as measurement error in either the dependent or independent variables, would further increase the likelihood of null findings. These are all potential limitations of the data used in the original Beall et al. study and reanalyzed here.
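The power figures above can be approximated with a short sketch. It assumes a two-sided test and the Fisher z approximation, which is rough for samples as small as n = 8. The method actually used for the reported values is not stated in this text, and the function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05):
    """Approximate two-sided power to detect a true correlation r from n pairs,
    using the Fisher z transformation (an assumed method, not necessarily the
    one behind the values reported in the text)."""
    std_normal = NormalDist()
    z_effect = atanh(r) * sqrt(n - 3)            # expected test statistic under r
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)


power_us = correlation_power(r=0.3, n=23)
power_canada = correlation_power(r=0.6, n=8)
print(f"U.S. study (n = 23, r = .3):    power = {power_us:.2f}")
print(f"Canadian study (n = 8, r = .6): power = {power_canada:.2f}")
```

Under this approximation, both values fall below 0.5, in line with the statement above.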
We have described one of the simpler tools—detrending to remove first-order correlation—to deal with inferential threats that arise in observational data analysis. Autocorrelation of observed time series is by no means the only threat to inference when working with observational data. For example, using smoothed data, as in Beall et al.’s article (an issue we describe in more detail in the Supplemental Material), can also lead to spurious correlations. Many other useful analytic techniques exist for observational data analysis and are necessary for avoiding common pitfalls. For time-series data, one can also model and remove higher-order trends and seasonality, as well as other factors that introduce temporal autocorrelation (Jebb et al., 2015; in the Supplemental Material, we describe additional simulation approaches for checking inferences that can be used if researchers choose not to detrend their data). For cross-population comparisons that may be subject to pseudoreplication of units (e.g., Mississippi and Alabama may not really be independent observations in analyses across the 50 U.S. states), one can introduce controls for macroregional variation (Hruschka & Henrich, 2013), conduct spatially autocorrelated regressions (Anselin & Bera, 1998), or remove cultural autocorrelation by looking at changes over cultural phylogenies (Mace & Holden, 2005). To deal with potentially unmeasured confounding variables that are particularly pernicious in observational data, there are fixed-effects models for panel data (Allison, 2009) and instrumental-variable analyses (Angrist, Imbens, & Rubin, 1996). There is a rich literature addressing each of these that includes checks on the assumptions and appropriate implementation of these techniques to best avoid inferential threats introduced by these myriad issues.
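The point about smoothed data can be illustrated with a minimal sketch. It assumes a simple moving average over a 7-point window, which is an illustrative choice and not necessarily the smoothing applied to the data in Beall et al.’s article. The noise series is hypothetical, and the function names are not from the Supplemental Material.

```python
import numpy as np


def moving_average(series, window):
    """Unweighted moving average; the output is window - 1 points shorter."""
    x = np.asarray(series, dtype=float)
    return np.convolve(x, np.ones(window) / window, mode="valid")


def lag1_autocorrelation(series):
    """Pearson correlation between each observation and the one directly before it."""
    x = np.asarray(series, dtype=float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]


# Hypothetical independent noise: adjacent points are unrelated by construction.
rng = np.random.default_rng(2)
noise = rng.normal(size=2000)
smoothed = moving_average(noise, window=7)

print(f"lag-1 autocorrelation, raw noise:      {lag1_autocorrelation(noise):.2f}")
print(f"lag-1 autocorrelation, smoothed noise: {lag1_autocorrelation(smoothed):.2f}")
```

Adjacent smoothed values share most of the raw observations in their windows, so smoothing builds autocorrelation into a series that had none. That autocorrelation can in turn inflate correlations between otherwise unrelated series.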
Footnotes
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
Action Editor
D. Stephen Lindsay served as action editor for this article.
Open Practices
All data and materials have been made publicly available via the Open Science Framework and can be accessed at https://osf.io/d9jfz/. The complete Open Practices Disclosure for this article can be found at https://journals-sagepub-com.web.bisu.edu.cn/doi/suppl/10.1177/0956797616680396. This article has received the badges for Open Data and Open Materials.
