Abstract
Enns and Koch (hereafter E&K) use multilevel regression and poststratification (MRP) along with survey aggregation to measure state policy mood. Because E&K rely on direct information about public opinion, their measure, if valid, would be preferable to Berry et al.’s widely used indirect measure, which relies on data about the issue positions and vote shares of members of Congress. Assessing the validity of E&K’s measure takes on special importance because the measure proves to be nearly uncorrelated with Berry et al.’s measure, implying that at least one is invalid. Because the “true” policy mood of states is unknown, it is impossible to definitively assess the validity of E&K’s measure. Instead, we raise some concerns about E&K’s measurement methodology and present evidence pertaining to the indicator’s face validity, convergent validity, and construct validity. Our analyses leave us doubtful that the E&K measure is valid because its characterization of state moods departs significantly from conventional wisdom and current scholarship.
Many scholars have relied on Berry et al.’s (hereafter BRFH; 1998, 2007) measure of state policy mood (e.g., Krause and Melusky 2012; Reingold and Smith 2012). The BRFH indicator—observed annually for each state beginning in 1960—is an indirect measure constructed from information about congressional candidates’ issue positions and their vote shares. Undoubtedly, a measure relying on direct information about the public’s issue preferences, if demonstrated to be valid and reliable, would be preferable to the BRFH indicator. Recently, Enns and Koch (hereafter E&K; 2013) seemed to take a step in this direction by using multilevel regression and poststratification (MRP) along with survey aggregation to measure state policy mood for each year between 1956 and 2010. 1
E&K report—and our own analysis confirms—that their measure of state policy mood and the BRFH measure are very weakly related. Indeed, across all state-years in which both measures are available (observations for each year during the period 1960–2010), the correlation between the two variables is just .10. Obviously, it is not possible for both measures to be valid; if one is, the other clearly is not. This makes assessing the validity of E&K’s measure very important. If it is demonstrated to be valid, it should be used widely, and—given the strikingly low correlation between BRFH’s and E&K’s measures—findings from previous research relying on the BRFH measure should be viewed as suspect.
In this comment, we seek to evaluate the validity of E&K’s policy mood measure. Unfortunately, a definitive assessment of its validity—or that of any other indicator of policy mood—is not possible. To establish with certainty whether any variable, X, is a valid indicator of the concept, state policy mood, one would need to know the true score of policy mood in each state in each year, so that these true scores could be compared with scores on X. But if we had access to the true scores necessary for such validation, we could measure the concept directly and would not need the indicator X. Thus, we must rely on less conclusive validity tests.
We begin by noting some features of E&K’s measurement methodology that raise concerns about the validity of the resulting indicator. Then, we conduct various assessments of the measure’s face validity. They reveal empirical characteristics of E&K’s measure that are markedly at odds with the conventional wisdom of political scientists about policy mood in the states. Next, we subject E&K’s measure to tests of convergent validity by comparing its scores to other measures of policy mood from the literature. We conclude our battery of validity tests by assessing the construct validity of E&K’s measure and discover that some variables we expect to be strongly correlated with policy mood are either uncorrelated with it or correlated with the wrong sign. 2
Given our results, we are skeptical of E&K’s claim that their measure is valid. Moreover, as the only evidence that E&K present to justify their claim that the BRFH measure of policy mood is invalid is its weak correlation with E&K’s measure, we think their claim is unjustified. We believe that the BRFH measure is valid and recommend its continued use as researchers seek to construct a more direct valid measure using MRP or other strategies. 3
Concerns about E&K’s Methodology
The Number and Types of Survey Items Used by E&K
E&K construct a state-level analog to Stimson’s (1999) measure of the national policy mood in the United States, which was originally based on 77 questions asked at least twice in surveys taken at different times. Ellis, Ura, and Robinson (2006) show that Stimson’s mood measure can be accurately reproduced using as few as 11 items from the General Social Survey (GSS). Yet E&K’s state mood estimates are based on fewer than 11 items in 23 of the 55 years they consider, according to supplementary information from Enns’ website. For the 1956–71 period—before the GSS was launched—the median number of items in a year was 6. The median was 22 from 1972 through 1994. After 1994—when the GSS shifted from annual to biennial surveys in even-numbered years—the median was 31 in even years, but only 4 in odd years.
The problem is not simply the low number of questions in some years, however; it is also a matter of which questions were used in those years. Stimson (1999) has argued that questions about the government’s role in combating racial discrimination are inextricably linked to the New Deal and Great Society programs that political parties debate and which define the policy mood of the country at any moment in time. A review of E&K’s Supplementary Appendix suggests that items about race-based policies dominate the small number of items used to estimate state mood in most years between 1956 and 1971. The problem seems even more severe in odd-numbered years after 1994, when the small number of items used to estimate state policy mood consists disproportionately of questions concerning abortion, gun control, and same-sex marriage. These are not the New Deal social welfare policies at the heart of Stimson’s conception of policy mood. Together, the uneven number of items used to estimate state mood and the changing composition of those items lead us to question whether the quality of E&K’s measure is consistent over time.
Moreover, even if Stimson’s dynamic factor analysis yields a valid measure of the nation’s mood over time, we cannot assume that method will produce comparable results at lower levels of aggregation. Gallup polling data from the 1960s clearly indicate that Southern whites who supported New Deal and Great Society programs nevertheless resented both the Civil Rights Act of 1964 and the Voting Rights Act of 1965. Consequently, voters in Deep South states abandoned the Democratic Party in the presidential elections of 1964 and 1968, confounding any claim that antidiscrimination and social welfare policies go together at the state level. 4
E&K’s State-Level Covariates
Buttice and Highton (2013) evaluate the performance of MRP using both empirical and Monte Carlo analysis. They find that MRP’s performance is dependent upon the extent to which the state-level covariates predict the attitude being measured: “of critical importance when using MRP with conventional national survey samples is the inclusion of geographic-level covariates that account for a substantial amount of the geographic variation in true opinion” (Buttice and Highton 2013, 463). Using a strategy similar to Lax and Phillips’s (2009), E&K utilize state two-party vote share in the most recent presidential election as their state-level covariate to predict survey responses about partisanship, (symbolic) ideology, and a variety of issue positions (to measure state policy mood).
Warshaw and Rodden (2012, 218) advise those using MRP to “optimiz[e] an MRP model for a particular research question” by tailoring the choice of geographic covariates to the specific attitude being measured. Buttice and Highton (2013, 464) suggest a number of specific state-level covariates that “may be more suitable [than presidential vote share] for public opinion on economic issues . . . [:] unemployment, median household income, and poverty rate.” E&K might consider modifying their regression models to incorporate these and other variables in addition to, or instead of, state vote share in the most recent presidential election. At the very least, they ought to explain why presidential vote share is the optimal covariate for measuring state mood.
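To make concrete why the state-level covariate matters, the following is a minimal sketch of the poststratification step, not E&K’s actual model. It assumes a hypothetical logit model with one demographic effect and one state-level covariate; all coefficients, cell counts, and names (`demo_effects`, `cell_counts`, `beta_state`) are invented for illustration. In a real application the coefficients would come from a fitted multilevel model and the cell counts from census data.

```python
import math

def inv_logit(x):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def cell_prediction(intercept, demo_effect, beta_state, state_covariate):
    """Predicted share giving the liberal response in one demographic cell.

    In full MRP these coefficients are estimated by a multilevel model;
    here they are supplied by hand (hypothetical values).
    """
    return inv_logit(intercept + demo_effect + beta_state * state_covariate)

def poststratify(cell_predictions, cell_counts):
    """Population-weighted average of cell predictions for one state."""
    total = sum(cell_counts.values())
    return sum(cell_predictions[c] * n for c, n in cell_counts.items()) / total

# Hypothetical inputs, not taken from E&K or any real survey.
demo_effects = {"young": 0.4, "old": -0.4}   # demographic effects on the logit scale
cell_counts = {"young": 300, "old": 700}     # population counts per cell in one state
state_covariate = 0.10                       # e.g., centered presidential vote share

def state_estimate(beta_state):
    """State-level opinion estimate for a given covariate coefficient."""
    preds = {
        cell: cell_prediction(0.0, effect, beta_state, state_covariate)
        for cell, effect in demo_effects.items()
    }
    return poststratify(preds, cell_counts)

weak = state_estimate(beta_state=0.0)    # covariate carries no information
strong = state_estimate(beta_state=5.0)  # covariate strongly predicts opinion
print(round(weak, 3), round(strong, 3))
```

When `beta_state` is zero, the state estimate depends only on the state’s demographic composition, so two states with the same demographic mix would receive the same score. This is one way to see why a covariate that poorly predicts the attitude in question leaves little room for the model to capture genuine geographic variation.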
The Face Validity of E&K’s Measure
The “Liberal” South?
Conventional wisdom suggests that state policy mood in the Deep South is uniformly conservative. One indication of a problem with E&K’s measure of policy mood is that its scores for the 11 “Deep South” states of the Confederacy are predominantly liberal. In particular, for each year between 1960 and 2010, we identify the 20 most conservative states in the United States based on the E&K measure and the 20 most liberal states. This yields a set of 1,020 (= 20 × 51) “conservative” state-years and an equal number of “liberal” state-years. Nearly half (44%) of Deep South state-years fall in the liberal set, and a smaller share of Deep South state-years (only 38%) fall in the conservative set. 5 Thus, accepting E&K’s state policy mood measure as valid would require a radical revision of political scientists’ understanding of southern politics.
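The classification just described can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not our replication code: the mood scores are randomly generated, the state labels are placeholders, and `group` stands in for the 11 southern states. The function names are our own.

```python
import random

def classify_state_years(mood, k=20):
    """Return the k most liberal and k most conservative states in each year.

    mood: dict mapping year -> dict mapping state -> score (higher = more liberal).
    Each returned set contains (state, year) pairs.
    """
    liberal, conservative = set(), set()
    for year, scores in mood.items():
        ranked = sorted(scores, key=scores.get)  # ascending: most conservative first
        conservative.update((s, year) for s in ranked[:k])
        liberal.update((s, year) for s in ranked[-k:])
    return liberal, conservative

def share_of_group(group, state_years, years):
    """Fraction of the group's state-years that fall in the given set."""
    hits = sum((s, y) in state_years for s in group for y in years)
    return hits / (len(group) * len(years))

# Synthetic data for illustration only (not E&K's scores).
rng = random.Random(1)
years = list(range(1960, 2011))               # 51 years
states = [f"S{i:02d}" for i in range(50)]     # placeholder state labels
mood = {y: {s: rng.gauss(50, 10) for s in states} for y in years}
group = states[:11]                           # stand-in for the 11 southern states

liberal, conservative = classify_state_years(mood)
lib_share = share_of_group(group, liberal, years)
con_share = share_of_group(group, conservative, years)
print(len(liberal), len(conservative))        # 1020 1020
```

With purely random scores, each share should be near 20/50 = .40; the comparison of interest is how far the shares for a region depart from that benchmark and in which direction.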
Regional Variation
We group the states into the Census Bureau’s nine divisions, compute the average E&K policy mood score over the period 1960–2010 in each division, and conduct a Tukey–Kramer Honestly Significant Difference (HSD) test for differences in means across divisions. 6 Figure S-1 in our unpublished supplement presents detailed results, which we summarize here. Conventional wisdom is that policy mood is most liberal in the Northeast (Caughey and Warshaw forthcoming), and consistent with this perception, the average E&K policy mood score is higher (by an amount statistically significant at the .05 level) in each of New England and the Middle Atlantic states than in each of the other seven divisions. However, among these seven other divisions, there is not a single pair for which the E&K measure displays a statistically significant difference in the mean policy mood score. For example, according to E&K’s measure, there is no significant difference in policy mood between states in the East North Central (IL, IN, MI, OH, WI) and states in the West South Central (AR, LA, OK, TX), or between states in the Pacific rim (AK, CA, HI, OR, WA) and those in the South Atlantic (DE, FL, GA, MD, NC, SC, VA, WV). This absence of regional variation in E&K’s measure contrasts sharply with both conventional wisdom and Caughey and Warshaw’s (forthcoming) evidence of distinct and relatively persistent interregional differences in mood over the period between 1976 and 2006. 7
Longitudinal Variation
We now turn to longitudinal properties of E&K’s measure. Based on a visual examination of plots of policy mood scores in four illustrative states (see E&K’s Figure 5), E&K note their measure of policy mood exhibits “similar over time trajectories across states.” E&K claim this pattern is consistent with Page and Shapiro’s (1992, 317) assertion that “aggregate opinion change . . . can largely be understood in terms of homogeneous movements across the whole population,” making it so that “policy mood . . . appear[s] to rise and fall together across different states” (the quotations are from E&K 2013, 363).
We investigate more systematically the similarity across states of time trends in E&K mood scores over the period 1956–2010. For each pair of the 48 continental U.S. states, we compute the time-series correlation between E&K’s policy mood measure in one state and their mood measure in another state. 8 Figure S-2(a) in our unpublished supplement presents a histogram for the 1,128 resulting correlations. To summarize the results, the mean longitudinal correlation between E&K’s policy mood score in one state and its score in another state is .84. The median correlation is .87, and the 75th percentile is .91. In one example, the correlation between the policy mood score in Alabama and the score in Massachusetts is .86. These correlations are much higher than we would expect and strike us as implausible. We believe that national forces are insufficiently powerful to counteract local forces and make interstate correlations as strong as E&K’s measure indicates. In fact, if the nature of change in policy mood were as similar across the 50 states as E&K’s policy mood scores indicate, for the purposes of empirical analyses of the effect of state policy mood on a state policy choice, it would probably suffice to use Stimson’s national mood measure for each state. 9
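The pairwise computation can be sketched as below. The series are synthetic, built from a hypothetical common national component plus state-specific noise, and the variance settings are arbitrary assumptions chosen only to show how a dominant national component produces uniformly high interstate correlations. With 48 states there are 48 × 47 / 2 = 1,128 distinct pairs.

```python
import math
import random
from itertools import combinations
from statistics import mean, median

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_correlations(series):
    """Time-series correlation for every unordered pair of states.

    series: dict mapping state -> list of annual scores (same years, same order).
    """
    return {
        (a, b): pearson(series[a], series[b])
        for a, b in combinations(sorted(series), 2)
    }

# Synthetic data for illustration only (not E&K's scores).
rng = random.Random(0)
years = range(1956, 2011)                          # 55 annual observations
national = [rng.gauss(0, 1) for _ in years]        # hypothetical common component
series = {
    f"S{i:02d}": [n + rng.gauss(0, 0.5) for n in national]  # plus state-specific noise
    for i in range(48)
}

corrs = pairwise_correlations(series)
values = list(corrs.values())
print(len(corrs))                                  # 1128
print(round(mean(values), 2), round(median(values), 2))
```

Raising the standard deviation of the state-specific noise in this sketch lowers the typical pairwise correlation, which is the pattern one would expect if local forces carried substantial weight.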
Convergent Validity
In this section, we compare scores on each of E&K’s and BRFH’s state policy mood measures with those of other indicators of state policy mood. As E&K’s and BRFH’s measures are the only two currently available in each year over a long period, we confine our comparisons to measures that are observed cross-sectionally. 10 Carsey and Harden’s (C&H; 2010) measure of state policy mood, and Caughey and Warshaw’s (C&W; forthcoming) measure of domestic policy liberalism are each available for all states in 2004, 2006, and 2008. 11 Accordingly, for each of these three years, and for each pair among the set of four state policy mood/liberalism measures—E&K’s, BRFH’s, C&H’s, and C&W’s—we compute the cross-sectional correlation between the two measures. The results are in Table S-1 in our unpublished supplement.
E&K’s measure consistently has a lower correlation with the C&H and C&W measures than does the BRFH measure. Across the three years of analysis, the E&K measure has an average correlation of .62 with the C&H indicator, whereas the BRFH measure’s average correlation with the C&H measure is .71. Similarly, E&K’s indicator has a mean correlation of .67 with the C&W measure, whereas the BRFH measure’s average correlation with C&W’s indicator is .81. 12
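The averaging of cross-sectional correlations across years can be sketched as follows. The four measures here are hypothetical (`m1` through `m4`), generated as a shared latent score plus noise of differing magnitude; none of the numbers correspond to the E&K, BRFH, C&H, or C&W data.

```python
import math
import random
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_cross_sectional_correlations(scores, years):
    """Average, over years, the cross-state correlation for each pair of measures.

    scores: dict mapping measure -> year -> state -> score.
    Only states observed on both measures in a given year are used.
    """
    result = {}
    for m1, m2 in combinations(sorted(scores), 2):
        per_year = []
        for year in years:
            a, b = scores[m1][year], scores[m2][year]
            common = sorted(set(a) & set(b))
            per_year.append(pearson([a[s] for s in common], [b[s] for s in common]))
        result[(m1, m2)] = mean(per_year)
    return result

# Synthetic data for illustration only.
rng = random.Random(2)
years = [2004, 2006, 2008]
states = [f"S{i:02d}" for i in range(50)]
noise_sd = {"m1": 0.2, "m2": 0.2, "m3": 1.0, "m4": 2.0}  # hypothetical measurement noise
scores = {m: {} for m in noise_sd}
for year in years:
    latent = {s: rng.gauss(0, 1) for s in states}        # unobserved "true" mood
    for m, sd in noise_sd.items():
        scores[m][year] = {s: latent[s] + rng.gauss(0, sd) for s in states}

avg_corr = mean_cross_sectional_correlations(scores, years)
for pair, r in sorted(avg_corr.items()):
    print(pair, round(r, 2))
```

In this setup, measures with less noise correlate more strongly with one another than with noisier measures. That is the logic behind reading a lower convergent correlation as a sign of weaker validity, although such a comparison presumes the benchmark measures themselves track the underlying concept.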
The Construct Validity of E&K’s Measure
In this section, we examine the relationship between E&K’s measure of state policy mood and each of several indicators of state policy that are widely believed to covary with policy mood. 13 These state policy indicators include measures of state tax effort (Besley and Case 2003; Krause and Melusky 2012), state Aid to Families with Dependent Children/Temporary Assistance to Needy Families benefits and caseloads (Berry, Fording, and Hanson 2003; Fording 1997), state Medicaid benefits and caseloads (Grogan 1994; Hanson 1984), and state imprisonment rates (Smith 2004; Yates and Fording 2005). The results are in Table 1—which for comparison purposes also presents correlations of the state policy indicators with BRFH’s measure of policy mood, as well as Erikson, Wright, and McIver’s (EWM; 1993) measure of public opinion (i.e., symbolic ideology).
Table 1. The Correlation between Alternative Measures of State Policy Mood or Symbolic Ideology and Various Measures of State Policy.
Note. When calculating correlations, state policy mood measures are observed in the same year as the state policy indicator. Each measure of policy mood or symbolic ideology is coded so that greater scores indicate greater liberalism. Data for the AFDC/TANF program, Medicaid beneficiary data, and state poverty and population data are from University of Kentucky Center for Poverty Research (www.ukcpr.org). Medicaid spending data are from Centers for Medicare and Medicaid Services (Medicare and Medicaid Statistical Supplement for 1980, 1990, 2000 and 2008). State tax collections are from the U.S. Census Bureau (Annual Survey of State Government Tax Collections for 1980, 1990, 2000 and 2008). State imprisonment data were obtained from the Bureau of Justice Statistics (Prisoners in 19XX/200X for 1980, 1990, 2000 and 2008). E&K = Enns and Koch; BRFH = Berry, Ringquist, Fording, and Hanson; EWM = Erikson, Wright, and McIver.
For most policy variables, all three measures of policy mood or symbolic ideology yield correlations that are consistent with conventional wisdom. Moreover, no policy variable has a relationship with the BRFH measure of policy mood or the EWM measure that seems obviously suspect. But several variables have correlations with E&K’s measure of policy mood—listed in bold font—with a sign contrary to what we would expect. These include weak negative correlations of E&K’s measure of policy mood with EWM’s measure of state policy liberalism in 1980, AFDC (welfare) benefit in 1980, per capita state tax collections in 1980, and per recipient Medicaid expenditures in 1980; and weak positive correlations with imprisonment rate in three different years.
Concluding Recommendations
The appropriateness of E&K’s use of MRP to measure state policy mood hinges to a great degree on using state-level covariates that are strong predictors of true policy mood (Buttice and Highton 2013). As the choice of the most suitable covariates is debatable, we encourage E&K to remeasure policy mood using other plausible covariates (and sets of covariates) and assess the sensitivity of mood scores to the choice of covariates (Lax and Phillips 2013). We also recommend other robustness tests. E&K’s estimates of state policy mood are based on 73 different opinion questions asked on surveys in each of the 55 years between 1956 and 2010. It would be useful to experiment with conducting MRP using (1) different subsets of the 73 questions and (2) responses to these questions from surveys in different subsets of the 55-year period of analysis. Indeed, a test for the sensitivity of E&K’s state policy mood scores to minor variations in any assumption of E&K’s MRP/survey aggregation methodology that is largely arbitrary—in the sense that it cannot be justified based on empirical evidence or strong theory—would help assess the validity of E&K’s state policy mood measure.
We believe that satisfying such robustness tests constitutes a necessary but, unfortunately, insufficient condition for establishing the validity of E&K’s measure. This is because even if the E&K MRP methodology proves robust, one would still need to confront that it yields mood scores that deviate sharply from conventional wisdom about the nature of policy mood in the states. We concede that inconsistency of E&K’s mood scores with conventional wisdom is not definitive evidence that E&K’s indicator is invalid as it is possible that it is conventional wisdom—rather than E&K’s measure—that is flawed. Yet, given what we know now, we are very skeptical of the claim that E&K’s measure of state policy mood is valid and continue to believe that the BRFH measure is the best available indicator of state policy mood for researchers doing pooled cross-sectional time-series analysis.
Footnotes
Acknowledgements
We would like to thank Peter Enns and Julianna Koch, and Devin Caughey and Christopher Warshaw for providing us prepublication access to their data, which facilitated our ability to undertake the empirical analyses we present. We are also grateful to Caughey, Jeff Harden, and Ben Highton for helpful comments on earlier versions of our article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
