Abstract
The potential for spatial dependence in models of voter turnout, although plausible from a theoretical perspective, has not been adequately addressed in the literature. Using recent advances in Bayesian computation, the authors formulate and estimate the previously unutilized spatial Durbin error model and apply this model to the question of whether spillovers and unobserved spatial dependence in voter turnout matter from an empirical perspective. Formal Bayesian model comparison techniques are employed to compare the normal linear model, the spatially lagged X model (SLX), the spatial Durbin model, and the spatial Durbin error model. The results overwhelmingly support the spatial Durbin error model as the appropriate empirical model.
Introduction
Elections have proven to be fruitful areas of research for empirical economists studying public choice issues. It is interesting to observe that the amount of research devoted to uncovering the factors influencing how a person votes is dwarfed by the volume devoted to uncovering the factors influencing whether a person votes at all. The sizable amount of literature attempting to explain voter turnout is perhaps due to the fact that most studies do not yield consistent results (Tollison and Willett 1973; Matsusaka 1995; Matsusaka and Palda 1999; Geys 2006). Attempts have been made to capture the effects of formerly omitted variables in an effort to drive up the predictive power of models, but most have been underwhelming.
This article examines whether there is a spatial component to the as-yet unobserved influences on voter turnout. Factors that influence the costs and benefits of voting that are unobservable or difficult to measure may yet retain identifiable spatial effects. As a simple example, good weather has been known to spur higher turnout, yet quantifiable definitions of “good weather” may be hard to determine. 1 Two inches of rain during commuting times may affect turnout differently than two inches at 5 a.m. or 10 p.m. However, while it is likely that good weather in county A will affect turnout positively, it is equally likely that this same good weather effect will spill over to adjacent county B. Though it is difficult to observe the effect of good weather itself, we can estimate the aggregate effect of all unobservable factors that have a geographic or spatial component. For a given geographic area, these may include a common set of values or political beliefs, local partisan competition, access to only a few sources of news media, a predominant employer or industry in the area, or other similar factors. Shachar and Nalebuff (1999), Feddersen (2004), and others describe the importance of groups motivated by a leader, and it certainly stands to reason that group membership and influence are highly spatially related.
Similarly, political scientists and geographers have noted the importance of space for political issues and have demonstrated the influence of local context and social interaction through spatial econometric techniques (e.g., O’Loughlin, Flint, and Anselin 1994) or more traditional nonspatial estimation methods (Pattie and Johnston 2000; Gerber, Green, and Larimer 2008, which analyzes voter turnout). Key (1984) presents evidence (prior to its original publication in 1949) from Alabama on localism or the “friends and neighbors” effect, where candidates win significant support from their home counties even though statewide support is minimal. He attributes this to several factors, including the absence of strongly established, well-organized political parties; lack of a traditional recruitment and advancement system for candidates; the relatively high importance that voters place on local versus state issues; and lack of party loyalty. Similar patterns were seen in Georgia, Florida, South Carolina, and other southern states. The reasons for the strength of localism differed among the states, but its empirical observability has generated academic research explicitly accounting for spatial effects.
Reliance on traditional nonspatial estimation techniques for data with known spatial effects is problematic. The presence of spatial error dependence can lead to biased standard errors much like other forms of autocorrelation. On the other hand, spatial dependence that is exhibited in the dependent variable (as a spatial lag) can lead to biased and inconsistent estimates of the parameters of interest. The benefit of utilizing spatial econometric techniques is that they can take into account these econometric issues that may unknowingly bias standard normal linear model results.
Model and Data
There are a few seminal economics articles in the voter turnout literature (Downs 1957; Riker and Ordeshook 1968; Barzel and Silberberg 1973; Ashenfelter and Kelley 1975; Filer and Kenney 1980) whose theories motivate our models, but these are generally well known and need no review here. (A good review and meta-analysis of past studies is found in Geys 2006.) Voters make rational decisions on the costs and benefits of voting and behave accordingly. Typically, the benefits and costs are so small that slight disturbances in one or the other can tip turnout in unpredictable ways (see Aldrich 1993). Despite this, there are characteristics of voters (or groups of voters) that most researchers generally agree will usually affect turnout. The job of empirical economists has been to quantify these characteristics and measure their effects while trying to shore up the unpredictability of turnout through adding more control variables. Typically overlooked in the existing literature, though, is an explicit accounting for geographically correlated factors that affect turnout. 2
Shachar and Nalebuff (1999) provide two important contributions to the study of voter turnout. First, they critique the typical use of an ex post measure of election closeness as a control variable to describe the ex ante decision whether to vote. The potential value an individual vote has will depend on how close the election outcome is, since in a slim margin an individual vote has more weight. While most past research uses the actual outcome of the election as the measure of closeness, Shachar and Nalebuff recognize that this information is not available to voters heading to the polls. 3 A second contribution, mentioned earlier, is the recognition of the influence that political leaders have when they expend effort to boost turnout. Their estimations include controls for US regions, but it is this concept of an influential political leader that will be more relevant for our discussion of spatial dependence in county voter turnout.
The political science literature has considered the effect of geography on turnout and other forms of political participation. A recent paper by Cho and Rudolph (2008) seeks to uncover, via a spatial lag model, whether the observed spatial dependence in political participation is attributable to individual-level or macro-level characteristics (e.g., race, income), to diffusion through formal or informal social networks, or to subtle environmental cues taken through casual observation (e.g., yard signs, well-kept gardens, bumper stickers). After controlling for a typical set of individual, aggregate, and social network variables, the model’s spatial lag parameter is positive and significant, and a Lagrange Multiplier test for residual autocorrelation is insignificant. The authors maintain that this provides evidence that the spatial effect of participation is a result of casual observation. Concern remains, however, about their definition of their dependent variable. The data come from the Social Capital Benchmark Survey where respondents were asked if, in the last year, they (1) signed a petition, (2) attended a political meeting/rally, (3) worked on a community project, 4 or (4) went to a protest/march/and so on. The dependent variable was thus an additive index ranging from 0 to 4. Quantifying political participation in this way seems somewhat arbitrary: is a “2” respondent 100 percent more participatory than a “1” respondent? What if the “1” respondent attended twenty political meetings of different parties while the “2” respondent signed a single petition and worked on a single community project? The construction of the dependent variable may call into question the results, although the premise is interesting.
A second paper in political science that measures spatial dependence is Cutts and Webber (2010). In explaining political party vote shares, they find evidence that campaigning exhibits spatial dependency; that is, local campaign spending by a political party not only positively affects that party’s vote share in the constituency in which the spending is done but also positively affects that party’s vote share in neighboring constituencies. They also estimate spatial error models (SEM). A possible flaw is the limited set of control variables (percent with higher educational degrees; percent of the population employed in manufacturing, agriculture, or education; percent who are pensioners, students, and Muslims; and the percent living in owner-occupied housing). There are also controls for campaign spending by the political parties, but other variables identified as important in the extant literature are not included in their analysis. One shortcoming of the study is that the authors do not calculate the proper marginal effects. The coefficients from a standard spatial autoregressive (SAR) model do not represent the true marginal effect of a change in an explanatory variable on the dependent variable. LeSage and Pace (2009) provide details regarding the calculation and interpretation of the direct, indirect, and total effects.
Kim, Elliott, and Wang (2003) use a spatial lag model in a Bayesian framework to uncover spatial dependence in vote shares for the Democrats and Republicans in Presidential elections from 1988 to 2000, and to test empirically whether voters decide current votes on past candidate behavior or on their expectation of the success of a particular party at fixing current economic problems. They construct two spatial weight matrices, first-order contiguity and a commuting matrix to capture the number of commuters from one county to another (“geographic” neighbors and “economic” neighbors). Their model of vote share is estimated using per capita income and the unemployment rate as controls. Even with this small set of explanatory variables (which, in some models, appear insignificant), they find significant spatial dependence. One shortcoming of this study is that the authors use maximum-likelihood-based Lagrange Multiplier test statistics to determine the most appropriate model and then subsequently use Bayesian techniques to estimate the model. In the Bayesian paradigm, posterior model probabilities should be calculated for each model and then compared to determine the most appropriate model. 5
Darmofal (2006) 6 says “citizens’ costs and benefits of voting and, by extension, their turnout, are recognized as depending not just on who citizens are [e.g., capturing their demographic or economic characteristics], but also on where they live” (p. 127). He examines county-level turnout for each presidential election from 1828 through 2000 and computes a global Moran’s I to detect spatial autocorrelation. After a series of diagnostic tests, Darmofal then estimates several SEM to correct for the resulting spatial error dependence. Interestingly, Darmofal (2006, 145) finds “no clear evidence of contagion, or spatial lag dependence.” Another paper in the political science literature that indirectly addresses the geographic component of voter turnout is Nagel and McNulty (1996). The model is somewhat reversed from ours as they attempt to explain the vote share for Democrats as a function of turnout, but they examine elections from 1928 through 1994 and separate Southern and non-Southern states, and then look at elections in each state. Their primary aim is to assess the prevailing notion that turnout helps Democrats, so they are not addressing our topic directly but at least recognize geographic differences in turnout (albeit at a more aggregated level than our county-level data). They do not quantify, however, any spatial spillover effects.
Nickerson (2008) examines the spillover effect on turnout after one member of a two-person (two-voter) household has a direct face-to-face contact with a volunteer get-out-the-vote (GOTV) representative. In two cities, three groups of households were selected: one group was contacted with GOTV, a second with a message to recycle, and a third was not contacted at all. The spillover effect was determined by comparing the turnout rates of noncontacted persons (secondary message recipients). Directly contacted GOTV voters were about 10 percent more likely to vote, and indirect voters were 6 percent more likely to vote.
Some issues with Nickerson’s results help to motivate our study. First, the estimated effect on secondary voters is not significant at 5 percent in a one-tailed test (though it is for direct voters), calling into question the spillover effect of voter turnout. Second, the effect on the secondary voter decays fairly quickly (and could be irrelevant if the effect was insignificant) and the implication is that the effect on tertiary voters (voting-age children, friends, coworkers) would be minimal, implying a spillover effect that would be quite small or negligible at the household (rather than individual voter) level, which again would indicate a lack of spatial autocorrelation at the county level in voter turnout. Third, the estimated spillover effect of the GOTV effort is determined by the percentage of the effect on the direct voter that induces the secondary voter to turn out. The author argues this effect is quite large (the direct voter passed on about 60 percent of the propensity to vote), but the individual effects on the direct and secondary voters were quite small (10 percent and 6 percent, respectively). It is not clear that a sizable spillover effect has been discovered when its impact is only a four-percentage-point increase in turnout.
The existing economics and political science literature provides the foundation for our estimated models. The decision to vote should be influenced by demographic, economic, and political factors that can be distinguished from those likely associated with nonvoters. To the extent that these individual factors are observable at the county level, hypotheses about county turnout can be made. 7 The control variables we include are standard controls seen in the voter turnout literature. The contribution of this article is in formulating and estimating several spatial econometric models designed to measure the extent to which factors influencing turnout at the county level are spatially correlated and to draw inferences from the model that best fits the data. As we will see in the model comparison exercise, the heretofore unused spatial Durbin error model “wins” our model comparison race 8 and we draw inferences from this model.
We chose to analyze the 2004 Presidential election turnout at the county level, given that the popular “red/blue” county map appears to indicate significant spatial clustering. Red counties are likely to be surrounded by red counties and vice versa. This nonrandom grouping of red/blue counties indicates that there is possible spatial error correlation in that unobserved factors influencing turnout that vary over space need to be accounted for in a systematic manner. 9 There may be local customs or historical reasons that certain areas of the country vote in higher numbers that are ignored in the existing literature. This issue has normally been dealt with by using dummy variables for those regions, but this approach does not properly model the interaction among the geographic entities. The spatial econometric techniques that we use better model these interactions. The sample includes 3,061 counties and excludes Alaska, Hawaii, Washington, DC, and all cities in Virginia. 10
The dependent variables that we use represent the turnout of the county voting-age population (VAP) and the turnout of the citizen voting-age population (CVAP). Turnout is defined as the number of total votes cast for Bush and Kerry (from POLIDATA Demographic & Political Guides 11 ) in the county divided by the relevant population. VAP is the 2004 resident population estimate of those eighteen and older from the Census Bureau’s Annual County Population Estimates, 12 and CVAP multiplies this by the Census’ Current Population Survey report of the percentage of the state eighteen and older population that are citizens. 13
McDonald (2002) uses a similar voting-eligible population (VEP) measure, where he excludes not only noncitizens as we do but also ineligible felons, as an alternative to the traditional VAP. Typically, felons include prisoners, parolees, and half of probationers, but state laws vary as to which of these groups are allowed to vote (e.g., in McDonald’s data, Virginia denied the vote to all of these groups, whereas Vermont denied it to none). McDonald calculated the number of “ineligible felons” in each state (according to the state laws existing at the time) to arrive at his VEP.
For comparison purposes, we constructed a VEP turnout rate using 2004 data on parolees and probationers, 14 and prisoners. 15 The VEP is calculated similarly to the CVAP where the county population was multiplied by the state citizen percentage, but the VEP further adjusts this for the state “ineligible felon” percentage. 16 The ineligible felon population is relatively small; the means of the VAP, CVAP, and VEP turnout rates, respectively, are 58.1 percent, 61.6 percent, and 62.6 percent. We estimated models using the VEP, but the results (available upon request) were not noticeably different from the VAP; 17 further, our model comparison exercise still demonstrates preference of the VAP model over the VEP.
Our use of two measures of turnout rather than one has support in the literature, especially as it applies to aggregate-level studies. Geys (2006) notes that of the eighty-three studies reviewed, about 60 percent of them used turnout measures similar to ours. We also transformed the dependent variables using the logit transformation, given that our dependent variable is measured as a proportion. 18
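The construction of the two turnout rates and the logit transformation can be illustrated with a minimal sketch. The function names and the county figures are hypothetical, and the sketch assumes the standard log-odds form of the logit, ln(p / (1 − p)); the exact transformation used in the article may differ in detail.

```python
import math


def turnout_rates(votes, vap, citizen_share):
    """Return (VAP turnout, CVAP turnout) as proportions.

    votes         -- total votes cast in the county
    vap           -- county voting-age population
    citizen_share -- state share of the 18+ population that are citizens (0 to 1)
    """
    cvap = vap * citizen_share  # citizen voting-age population
    return votes / vap, votes / cvap


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1 (assumed form)."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


# Hypothetical county: 29,000 votes, 50,000 voting-age residents,
# and a 95 percent state citizen share.
vap_rate, cvap_rate = turnout_rates(29_000, 50_000, 0.95)
y_vap = logit(vap_rate)    # dependent variable on the logit scale
y_cvap = logit(cvap_rate)
```

Because CVAP is never larger than VAP, the CVAP turnout rate is at least as high as the VAP rate, which is consistent with the ordering of the sample means reported above.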
Demographic Variables
Several papers argue that religious sentiment is associated with a stronger sense of civic duty, 19 and it was also popularly believed that religious voters played an important role in the 2004 election, so we collected two variables measuring religiosity in counties. The religion variables come from the Glenmary Research Center of the Glenmary Home Missioners. 20 The Research Center reports data from a study by the Association of Statisticians of American Religious Bodies, who surveyed 149 religious groups on different aspects of their membership counts. % religious adherents is the percentage of the total population in the county who are considered adherents of one of the religious groups. The churches variable is the number of churches per 10,000 people in the total population in 2000. A higher percentage of adherents or number of churches at the county level is expected to positively affect turnout.
White, black, and Hispanic indicate the percentage of a county’s population of the respective racial category. These data came from the Census 2000 Summary File (SF) 1 database. The typical belief in most studies of voter turnout associates white voters with higher turnout and black and Hispanic voters with lower turnout, so our hypotheses are similar. 21 A different opinion is offered by Oberholzer-Gee and Waldfogel (2001), who examine the phenomenon of how black turnout is increased when the proportion of blacks in a jurisdiction rises. This finding, while throwing some doubt on the expectation of lower turnout for blacks generally, also supports the group theory of voting of Shachar and Nalebuff (1999) and others.
Most articles include a measure of the age of the electorate, 22 since older voters are expected to turn out in higher numbers than younger voters. Thus, we include the median age of the resident population for each county from the Census Bureau’s Annual County Population Estimates and expect it to be positively associated with turnout.
A higher cost of voting should be associated with lower turnout, and some articles include a measure of the number of single-parent families. 23 Such a family would presumably have less time available for voting, and so we include the percentage of family households that were single parent in each county from the 2000 Census. 24 The percentage of single-parent family households is expected to be negatively associated with county turnout.
We also hypothesize that voters with stronger ties to their communities will be more likely to vote in elections. Stigler (1975) suggests the importance of length of residence in a community, and Geys (2006) finds strong support among existing turnout studies of a positive relationship between “population stability” (including homeownership) and voter turnout. To proxy this, we calculated the percentage of the county population who resided in the same house between the years 1995 and 2000. 25 The higher the proportion of long-term county residents, the higher we expect county turnout.
We constructed two variables on likely voters in each county, the percentage male and the percentage female holding bachelor’s degrees, to test for a gender difference in turnout. These variables are defined as the percentage of a county’s total male twenty-five-and-over population who hold a bachelor’s degree, and a similar definition for females. These data come from the 2000 Census SF3 database. Lacombe and Shaughnessy (2007) found that women’s preferences for Kerry were stronger than men’s preferences for Bush though in some papers (e.g., Ashenfelter and Kelley 1975), men have a higher turnout; we therefore expect a difference in turnout between the sexes with perhaps a slight edge toward males.
Also included in our regression model is the percentage of the county population living in urban areas, obtained from the 2000 Census SF1 database. Hackey (1992) finds a positive association between turnout of black voters in the 1976, 1980, and 1984 presidential elections and whether the voter lived in an urban area, but one might posit that voters living in higher-density areas perceive more competition, and thus a lower weight attached to their vote, than do rural voters. This negative relationship between urbanization and turnout is weakly supported in Geys’ (2006) meta-analysis of previous voter turnout studies, where urbanization and turnout are usually negatively but insignificantly related.
Economic Variables
A persistent finding in the literature on turnout is that economic conditions of voters influence turnout. First, higher-income voters tend to vote more often than lower-income voters, 26 so we include a measure of the natural log of per capita personal income in 2004 for the county. The data come from the Bureau of Economic Analysis’ Regional Economic Accounts.
A second economic variable is the county’s 2004 unemployment rate. These data come from the Bureau of Labor Statistics Local Area Unemployment database. Given that Bush was the incumbent in 2004 and that voters typically credit or blame the president for business conditions, a bad job market will likely lead voters to voice their displeasure. 27 Thus, we expect that a higher county unemployment rate is associated with higher turnout.
Similar to unemployment, voters would be more inclined to be satisfied with the incumbent if output growth was good. Shachar and Nalebuff (1999) include a measure of gross national product (GNP) growth that is positively associated with turnout and significant. We thus include the growth rate of real state gross domestic product (GDP) and expect that a higher GDP growth rate is associated with higher turnout.
Given the prominence that labor unions place on political efforts and voter turnout drives, 28 the amount of unionization is expected to influence turnout as well. We collected 2004 data from the Bureau of Labor Statistics’ Labor Force Statistics from the Current Population Survey on union affiliation of employed wage and salary workers by state. 29 Higher unionization is expected to be positively associated with turnout.
Political Variables
An important issue in the 2004 election was controversy over Bush’s handling of the War in Iraq. A proxy that we use to express strong sentiment about the Iraq War is the percentage of the civilian population age 18 and over who are civilian veterans. These data come from Census 2000 information via the Department of Veterans Affairs, which collected data on the veteran population in the United States and Puerto Rico sorted by county and by period of service. 30 Veterans on average would have strong opinions about the continuation of the Iraq War, and thus we hypothesize that a higher veteran percentage in the county is associated with higher county turnout.
Another important political issue in the 2004 election was the presence of constitutional amendments to define marriage as “one man–one woman” on state ballots; voters (particularly religious voters) were believed to have turned out at much higher rates to voice their opinion. We use a dummy variable that equals one if, in the 2004 election, the state in which the county is located had a popular vote on a state constitutional amendment to ban gay marriage. The data came from CNN’s 2004 Election website on state ballot measures. 31 Even though popular opinion was that the presence of a gay marriage amendment would boost turnout, Lacombe and Shaughnessy (2007) found that the presence of these amendments did not significantly affect the percentage of Bush votes, so we expect at best a weak positive association between these amendments and turnout.
An important hypothesis in the voter turnout literature is that the value of a vote drives turnout; if voters perceive that their vote contributes more to the outcome of the election, then they will vote in higher numbers. The typical way this is measured is by including a variable on population, but since we already utilized population in constructing our turnout rate variable we rejected that approach. 32 We proxied “vote value” in two ways. First, it is reasonable to assume that voters assign a weight to their Presidential vote in proportion to the number of electoral votes assigned to their state since electoral votes will mirror population; thus, to measure the county’s share of the state’s electoral vote, we multiplied the number of state electoral votes 33 by the percentage of the state population that resides in the particular county. We expect a negative association between this county electoral vote share and county turnout.
“Vote value” is also likely affected by the growth rate of the surrounding population. Faster growth of the population of competing voters will more quickly dilute the importance of a particular vote. 34 To measure this, we collected data on the county age 18 and over population in 2003 and 2004 from the Census’ Annual Estimates of County Population and calculated the growth rate between them. 35 As the VAP grows in the county, we expect that county turnout should fall.
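The two “vote value” proxies described above reduce to simple arithmetic, sketched below. The function names and the example figures are hypothetical; the sketch treats the county's population share as a fraction rather than a percentage and computes growth as a simple proportional change, both of which are assumptions about details not spelled out in the text.

```python
def county_electoral_share(state_electoral_votes, county_pop, state_pop):
    """State electoral votes scaled by the county's share of state population."""
    return state_electoral_votes * county_pop / state_pop


def vap_growth_rate(vap_2003, vap_2004):
    """Proportional change in the 18-and-over population from 2003 to 2004."""
    return (vap_2004 - vap_2003) / vap_2003


# Hypothetical county holding 2.5 percent of the population of a state
# with 20 electoral votes, whose voting-age population grew from
# 40,000 to 40,800 between 2003 and 2004.
share = county_electoral_share(20, 250_000, 10_000_000)
growth = vap_growth_rate(40_000, 40_800)
```

Under the hypotheses stated above, larger values of either quantity are expected to be associated with lower county turnout.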
A debate in the turnout literature exists over how to control for the perceived closeness of the election. Obviously, the closer the election is perceived to be, the greater value an individual will place on his vote, and thus the higher the expected turnout. Many times the measure for expected closeness is the actual election outcome itself, 36 but as detailed in Shachar and Nalebuff (1999) and Geys (2006) this raises the problem of trying to measure an ex ante expectation with an ex post outcome. In support of ex ante measures, Geys’ (2006) meta-analysis finds that ex ante measures of closeness have a 74 percent “success rate” (achieving an expected result) versus a 51 percent “success rate” for ex post measures. One proxy is to include preelection opinion polls or professional analysis or opinion. Carter (1984) tests both ex post and ex ante measures, where the ex ante are measured by Time and Newsweek magazines considering the state to favor one candidate strongly, to simply favor one candidate, or to be too close to call. The ex post measures of actual closeness (percent of vote going to winning candidate) were unexpectedly positively associated with turnout, while the ex ante results showed that stronger leaning for a candidate reduced turnout as expected. We adopt the same approach here, and use the October 29, 2004, Cook Electoral Rating from the Cook Political Report. 37 A state that was “solid” for Bush or Kerry was coded as 0; “likely” Bush or Kerry was coded as 1; “leans” Bush or Kerry as 2; and a “toss up” as 3. Given our coding, we expect a positive association between the Cook variable and county turnout.
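The coding of the Cook ratings described above can be sketched as a small lookup. The rating strings (for example, "Solid Bush") are hypothetical stand-ins for however the ratings appear in the source data; only the numeric codes 0 through 3 come from the text.

```python
# Numeric codes taken from the text: higher values mean a closer expected race.
COOK_CODES = {"solid": 0, "likely": 1, "leans": 2, "toss up": 3}


def cook_code(rating):
    """Map a rating label such as 'Leans Kerry' to its closeness code.

    The candidate name is ignored, since the coding is symmetric
    between Bush and Kerry.
    """
    label = rating.strip().lower()
    for category, code in COOK_CODES.items():
        if label.startswith(category):
            return code
    raise ValueError(f"unrecognized Cook rating: {rating!r}")


# Hypothetical state ratings, purely for illustration.
example_ratings = ["Solid Bush", "Likely Kerry", "Leans Bush", "Toss Up"]
example_codes = [cook_code(r) for r in example_ratings]
```

Coding the variable symmetrically in this way captures expected closeness only, not the direction of the expected result.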
We also include in our estimating equations dummy variables for counties located in the states of Texas and Massachusetts to capture any “home state” effects. We expect that voters in the candidates’ home states will be more likely than others to turn out, ceteris paribus, since they presumably have more, and more reliable, information about the home candidate than voters in “foreign” states. Descriptive statistics of the data used and their sources are in Table 1.
Table 1. Summary Statistics (N = 3,061).
Econometric Models
While we included what we consider a relatively rich set of controls, it is possible that there are unobservable factors such as political sentiment, sense of civic duty, or group identities that influence voter turnout that retain a strong geographical or spatial component. A strong sense of civic pride and duty to vote that prevails in one county likely will affect turnout in neighboring counties. Thus, these omitted unobservable factors that vary systematically over geographic space may result in residual spatial autocorrelation. There is also the possibility that our dependent variable could exhibit spatial autocorrelation if we believe that voter turnout in one county affects voter turnout in neighboring counties. In either case, the inferences drawn from standard econometric techniques (such as the normal linear model) may be misleading.
In order to overcome these deficiencies, we estimate voter turnout using four separate models: the normal linear model, the spatially lagged X model (SLX), the spatial Durbin model, and the spatial Durbin error model. We then employ Bayesian model comparison techniques to choose the most appropriate model.
The first model that we estimate, and the most common one found in the literature, is the normal linear model. The estimating equation for this model is as follows:
where
Our first spatially explicit model, the SLX model, is identical to the normal linear model specification above with the exception that spatially weighted independent variables are added to the specification. The spatially weighted independent variables take the general form of
The SLX estimating equations take the following form:
where
The third model that we estimate and test is the spatial Durbin model. The spatial Durbin model allows for spatial spillovers in the dependent variable (through a spatially lagged dependent variable) as well as through spatially lagged independent variables. LeSage and Pace (2009) note that the spatial Durbin model is appropriate if two separate conditions hold. First, there must be omitted variables from the model that are spatially correlated. Second, these spatially correlated omitted variables must be correlated with an included explanatory variable in the model. In our particular case, it may be that a variable that measures some membership in a civic group is omitted from our model and that this variable is spatially correlated, simply because memberships in these groups tend to be geographically based. Additionally, this spatially correlated omitted variable that measures group membership could be correlated with another included explanatory variable, such as per capita income. If our model were to exhibit such characteristics, then the spatial Durbin model would be most appropriate.
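To make the idea of a spatially lagged variable concrete, the following minimal sketch builds a spatial weight matrix for three hypothetical counties arranged in a line and computes the spatial lag of one regressor. The contiguity structure, the row standardization of the weights, and the income figures are all assumptions made for illustration; they are not taken from the article's data.

```python
def row_standardize(adjacency):
    """Scale each row of a 0/1 adjacency matrix so that it sums to one."""
    weights = []
    for row in adjacency:
        total = sum(row)
        weights.append([v / total if total else 0.0 for v in row])
    return weights


def spatial_lag(weights, values):
    """Weighted average of neighboring values for each spatial unit."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


# Three hypothetical counties in a line: 0 borders 1, and 1 borders 2.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
W = row_standardize(adjacency)

income = [10.0, 20.0, 60.0]          # hypothetical regressor
lagged_income = spatial_lag(W, income)
```

With row-standardized weights, each element of the lagged variable is the average of the regressor over that county's neighbors, so the middle county's lagged income is the mean of its two neighbors' values. Applying the same operation to the dependent variable gives the spatially lagged dependent variable that enters the spatial Durbin model.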
The spatial Durbin model takes the following form:
where
As mentioned before, although we have included a number of independent variables that are believed to influence voter behavior, there is the possibility that there are unobserved factors that may vary systematically over space, resulting in residual spatial error correlation. Since neither the normal linear model nor the SLX model formally takes this residual spatial correlation into account, we estimate what LeSage and Pace (2009, 41–42) refer to as the spatial Durbin error model. The spatial Durbin error model combines the SLX model with an error process that accounts for residual spatial autocorrelation:
where the dependent and independent variables are as before,
Even though there may be theoretical or empirical reason to doubt the validity of including a spatially lagged dependent variable in our specification, we estimate and test the spatial Durbin model as part of the modeling exercise. Given that we have four possible models that can describe the data generating environment, we now turn to the statistical model and a description of the Bayesian techniques that are utilized to make inferences.
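For reference, the four candidate specifications can be summarized side by side in the notation commonly used in the spatial econometrics literature. This is a sketch under assumed notation rather than a reproduction of the article's own equations: y is the n × 1 vector of county turnout, X the matrix of explanatory variables, W an n × n spatial weight matrix, ι_n a vector of ones, and the symbols α, θ, and ρ are assumptions, while β, σ², and λ follow the usage in the text.

```latex
\begin{align*}
\text{Normal linear:} \quad & y = \alpha\iota_n + X\beta + \varepsilon \\
\text{SLX:} \quad & y = \alpha\iota_n + X\beta + WX\theta + \varepsilon \\
\text{Spatial Durbin:} \quad & y = \rho W y + \alpha\iota_n + X\beta + WX\theta + \varepsilon \\
\text{Spatial Durbin error:} \quad & y = \alpha\iota_n + X\beta + WX\theta + u, \qquad u = \lambda W u + \varepsilon \\
& \varepsilon \sim N(0, \sigma^2 I_n)
\end{align*}
```

Written this way, the nesting is easy to see: setting θ = 0 in the SLX model returns the normal linear model, setting ρ = 0 in the spatial Durbin model returns the SLX model, and setting λ = 0 in the spatial Durbin error model also returns the SLX model.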
Statistical Model
By way of notation, let θ denote a vector of parameters of interest,
where
thus resulting in the familiar Bayesian phrase, “the posterior is proportional to the likelihood times the prior.” Ideally, we would like to draw inferences regarding the parameters of the model by analytically integrating the joint posterior distribution for each of the model’s parameters, resulting in a marginal distribution for each parameter. However, the analytical solution to this integration problem is available only in a few select cases. These complications force us to derive the marginal distributions using iterative procedures referred to, generically, as Markov chain Monte Carlo (MCMC) methods. Specifically, we make use of the Gibbs sampler and the Metropolis-Hastings algorithm to provide robust inferences regarding the model parameters.
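In symbols, and assuming y denotes the observed data (a notational choice made here for illustration), the proportionality statement and the marginalization step can be written as follows, where θ_j is a single element of θ and θ_{-j} collects the remaining elements:

```latex
\begin{align*}
p(\theta \mid y) &= \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \;\propto\; p(y \mid \theta)\, p(\theta), \\
p(\theta_j \mid y) &= \int p(\theta \mid y)\, d\theta_{-j}.
\end{align*}
```

The second line is the integral that rarely has a closed form, which is what motivates the simulation-based approach.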
The Gibbs sampler is an algorithm to generate a sequence of samples from the joint posterior distribution of the parameters when an analytical solution is unavailable. Gibbs sampling is applicable when the joint posterior is intractable but the full conditional distribution of each parameter is known. In the case of the normal linear model as well as the SLX model, the full conditional distributions follow standard forms for the β and σ² terms, namely the multivariate normal distribution for β and an inverted Gamma distribution for the σ² term. 39 In the case of the spatial Durbin model and spatial Durbin error model, the full conditional distributions for the β and σ² terms retain the same distributional form as in the normal linear model case; 40 however, there is added complication regarding the full conditional distribution for the λ term (or
The formula for Bayes’ Rule explicitly allows for prior information to be included in the statistical analysis. In each of our models, we use proper prior distributions, but with relatively non-informative values. Specifically, we set the prior for the βs to come from a multivariate normal distribution with mean
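For the normal linear model, the two Gibbs steps described in this section (a multivariate normal draw for β and an inverted Gamma draw for σ²) can be sketched as below. This is an illustrative implementation, not the authors' code: the function name `gibbs_normal_linear`, the prior hyperparameters `b0`, `B0`, `a0`, and `d0`, and the default draw counts are all assumptions, and the inverted Gamma prior on σ² is assumed because the text names only the form of the conditional distribution. The spatial models would add a Metropolis-Hastings step for the spatial dependence parameter, which is omitted here.

```python
import numpy as np

def gibbs_normal_linear(y, X, b0, B0, a0, d0, n_draws=2000, burn_in=1000, seed=0):
    """Gibbs sampler for y = X beta + e, e ~ N(0, sigma2 I).

    Assumed priors: beta ~ N(b0, B0), sigma2 ~ inverse-gamma(shape a0, scale d0).
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    B0_inv = np.linalg.inv(B0)
    XtX, Xty = X.T @ X, X.T @ y
    sigma2 = 1.0  # arbitrary starting value
    betas = np.empty((n_draws, k))
    sigma2s = np.empty(n_draws)
    for it in range(burn_in + n_draws):
        # beta | sigma2, y is multivariate normal.
        B1 = np.linalg.inv(B0_inv + XtX / sigma2)
        B1 = 0.5 * (B1 + B1.T)  # remove numerical asymmetry
        b1 = B1 @ (B0_inv @ b0 + Xty / sigma2)
        beta = rng.multivariate_normal(b1, B1)
        # sigma2 | beta, y is inverse gamma; draw a gamma variate and invert it.
        resid = y - X @ beta
        shape = a0 + 0.5 * n
        scale = d0 + 0.5 * (resid @ resid)
        sigma2 = 1.0 / rng.gamma(shape, 1.0 / scale)
        if it >= burn_in:  # discard the burn-in draws
            betas[it - burn_in] = beta
            sigma2s[it - burn_in] = sigma2
    return betas, sigma2s
```

Posterior means and credible intervals are then simple summaries of the retained rows of `betas` and the entries of `sigma2s`.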
An attractive feature of the model comparison statistic that we employ is its foundation on basic probability calculations. In Bayesian investigations, we commence with a derivation of a joint distribution for the observed quantities, “
Results
We begin our discussion of the results by noting that across all four model specifications (i.e., the normal linear model, the SLX model, the spatial Durbin model, and the spatial Durbin error model), the model exhibiting the largest value of the log-marginal likelihood was associated with the dependent variable that measured county turnout as a proportion of the CVAP. 41
Table 2 contains the result of our model choice exercise. Looking at the second column, we see that the model with the highest value of the log-marginal likelihood is the spatial Durbin error model. Alternatively, we may calculate posterior model probabilities according to the following formula:
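A common way to turn log-marginal likelihoods into posterior model probabilities is to exponentiate and normalize them, weighting by the prior model probabilities. The sketch below assumes equal prior probabilities unless others are supplied, which may or may not match the authors' choice, and the function name and input values are hypothetical rather than taken from Table 2. Subtracting the maximum before exponentiating avoids numerical underflow, since log-marginal likelihoods are typically large negative numbers.

```python
import numpy as np

def posterior_model_probs(log_ml, prior=None):
    """Posterior model probabilities from log-marginal likelihoods."""
    log_ml = np.asarray(log_ml, dtype=float)
    if prior is None:
        # Assumption: every model is equally likely a priori.
        prior = np.full(log_ml.shape, 1.0 / log_ml.size)
    log_post = log_ml + np.log(prior)
    log_post -= log_post.max()  # stabilize before exponentiating
    weights = np.exp(log_post)
    return weights / weights.sum()

# Hypothetical log-marginal likelihoods for four competing models.
example_log_ml = [-1000.0, -1010.0, -1005.0, -990.0]
example_probs = posterior_model_probs(example_log_ml)
```

Because the probabilities depend on differences in log-marginal likelihoods, even a gap of ten log points leaves the lower-ranked model with a posterior probability close to zero.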
Table 2. Log-Marginal Likelihood Values and Posterior Model Probabilities.
where
Spatial econometric models that contain a spatially lagged
Spatial regression models such as the spatial error model (SEM) that do not involve spatial lags of the dependent variable produce coefficient estimates that are interpreted in the standard fashion; that is, the effect of a change in an explanatory variable (regardless of location) on the dependent variable is simply equal to the coefficient of the explanatory variable. However, the standard SEM does not allow for indirect impacts to arise from changes in explanatory variables; that is, a change in an explanatory variable at location j cannot affect the dependent variable at location i, which could be considered a shortcoming of the standard SEM.
The spatial Durbin error model that we use to draw inferences, as explained by LeSage and Pace (2009, 42), does allow for spatially lagged independent variables, in the form of our
In sum, our spatial Durbin error model allows for a richer interpretation compared to a standard SEM and has the advantage that calculation of the direct, indirect, and total effects estimates is straightforward.
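To illustrate why the calculation is straightforward: in the spatial Durbin error model the direct effect of a variable is its own coefficient, the indirect effect is the coefficient on its spatial lag, and the total effect is their sum (LeSage and Pace 2009). Given retained MCMC draws, each effect can therefore be summarized draw by draw. The sketch below assumes the draws are stored as arrays with one row per draw and one column per variable; the names `sdem_effects`, `beta_draws`, and `theta_draws` are hypothetical.

```python
import numpy as np

def sdem_effects(beta_draws, theta_draws, level=0.95):
    """Summarize direct, indirect, and total effects from posterior draws.

    beta_draws:  draws of the own-county coefficients (direct effects).
    theta_draws: draws of the coefficients on spatially lagged variables
                 (indirect effects).
    """
    beta_draws = np.asarray(beta_draws, dtype=float)
    theta_draws = np.asarray(theta_draws, dtype=float)
    tail = 100.0 * (1.0 - level) / 2.0
    effects = {
        "direct": beta_draws,
        "indirect": theta_draws,
        "total": beta_draws + theta_draws,  # summed draw by draw
    }
    summary = {}
    for name, draws in effects.items():
        lower, upper = np.percentile(draws, [tail, 100.0 - tail], axis=0)
        summary[name] = {
            "mean": draws.mean(axis=0),
            "lower": lower,
            "upper": upper,
            # True where the credible interval does not contain zero.
            "excludes_zero": (lower > 0) | (upper < 0),
        }
    return summary
```

Summing the draws before taking percentiles matters: the credible interval for the total effect reflects the joint posterior of the two coefficients and cannot be obtained by adding the separate interval bounds.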
Table 3 contains information regarding the direct effects in our spatial Durbin error model. Recall that the direct effect is the effect of a change in an explanatory variable at location
Table 3. Direct Effects from the Spatial Durbin Error Model.
Note: Parameter inferences are based on 10,000 sampled values with 10,000 sampled values used as burn-in. Lower 95% and Upper 95% represent the lower and upper 95% credible interval bounds, respectively. Entries in boldface represent credible intervals that do not contain zero.
For our demographic variables, we find that the signs of the coefficient estimates are in accordance with our a priori expectations for those variables associated with turnout. Religious sentiment may have played a role in the 2004 Presidential turnout as Bush self-identified as a Christian, and to capture the possible relationship we utilized two different measures of county religiosity. The results indicate that the number of churches per 10,000 population is positively associated with voter turnout at the county level, while the number of religious adherents is not. The adherents variable captures anyone who self-identifies with one of over 149 religious groups, which obviously includes some non-Christian denominations or others who might be wary of voting for someone who self-identifies as Christian. The churches variable is more likely to capture voter sentiment regarding religious preferences, and this may be why we obtain these results.
As expected, our results indicate that white voters are more likely to have higher turnout at the county level and that Hispanic voters are less likely to turn out to vote at the county level, while the black variable is not associated with explaining variation in the dependent variable. As mentioned earlier, Geys (2006) shows mixed evidence on black voting as well: aggregate-level studies indicate lower turnout in areas with a higher proportion of blacks, but this may be due to the fact that as the black population increases, black turnout actually rises while white turnout (and thus overall turnout in the population) falls.
In terms of the aggregate level of education and how this affects voter turnout at the county level, we note that variables measuring educational attainment at the county level for both males and females have positive coefficient estimates. The point estimate for males is slightly higher than the estimate for females, but the credible interval for the difference between the male and female bachelor’s degree holder variables contains zero, so that we cannot say that these coefficients are statistically different from each other. 43
Of the remaining demographic variables, we note that the median age of residents, single-parent family, same house, and the urban variables are all associated with explaining variation in the dependent variable and all have the expected coefficient sign. Counties with older populations are more likely to turn out to vote, counties with more single-parent families have lower turnout, counties where the population has lived in the same house for five years are also more likely to turn out to vote, and the population as a whole in urban counties is less likely to turn out to vote.
Our economic variables are designed to capture any sentiment at the county level related to economic events, either good or bad. Of the four economic variables included in the estimating equation, the per capita income at the county level and the real GDP growth rate from 2003 to 2004 are associated with explaining variation in the dependent variable. The unemployment rate at the county level and the state union membership, though they have the expected signs, do not appear to have any effect on voter turnout at the county level. These results seem to indicate that overall economic conditions are relevant for turnout at the county level whereas unemployment and union membership appear to play no role.
Our final category of independent variables, the political variables, has some surprising results. Of the variables that are included in the regression model, only the veterans, electoral votes, and the Cook political rating variables are associated with explaining variation in the dependent variable. However, each of these variables has a sign that is consistent with our a priori expectations. The veterans variable has a positive coefficient estimate, indicating that as the percentage of veterans in a county increases, voter turnout in the county increases. Considering that each of the two candidates running in this particular election had some military experience, this result makes intuitive sense since veterans are more likely to turn out to vote if the candidates share a common characteristic with the voter. The electoral vote share variable, which is designed to capture “vote value,” has a negative coefficient indicating that if the aggregate value of the vote is diluted, then it can suppress voter turnout. Finally, the Cook political rating, which measures how close the election is, has a positive coefficient estimate, which shows that as the race becomes more of a “toss up” between the candidates, county turnout is higher. Interestingly, we find no evidence that the various gay marriage ballot initiatives had an effect on voter turnout at the county level. Also, there appears to be no “home state” advantage for either candidate, as the Texas and Massachusetts state dummy variables were not associated with explaining variation in the dependent variable. Overall, these results accord with our ex ante expectations.
Table 4 contains the results for the indirect effects for the spatial Durbin error model. The indirect effects measure how a change in an explanatory variable at location
Table 4. Indirect Effects from the Spatial Durbin Error Model.
Note: Parameter inferences are based on 10,000 sampled values with 10,000 sampled values used as burn-in. Lower 95% and Upper 95% represent the lower and upper 95% credible interval bounds, respectively. Entries in boldface represent credible intervals that do not contain zero.
The percentage of county population who are religious adherents has a positive coefficient estimate, indicating that as the population that self-identifies as a religious adherent in surrounding counties increases, an increase in voter turnout is seen in the home county. Interestingly, the direct effect for churches was a factor that affected voter turnout while the direct effect for adherents did not. We have the exact opposite result in terms of the indirect effects: the adherents variable is a factor that affects voter turnout while churches is not. This may reflect the fact that although churches are located in a single county, the population that self-identifies as religious adherents is not necessarily confined to a single county. People are mobile and may actually travel to another county to attend services, broadly defined. If they then associate with other like-minded congregants who live in neighboring counties, the county spillover effect on turnout may be more pronounced.
The percentage of the county population that is Hispanic has a positive spillover effect on county-level voter turnout. Recall that the direct effect of this variable was negative, indicating a negative relationship between county-level population of this demographic group and voter turnout, which was expected. Our indirect results indicate that as the Hispanic county-level population increases in surrounding counties, the percentage of the own county population that turns out to vote increases. At first, this observation may seem contradictory, but it may simply reflect demographic changes at the county level. If surrounding counties’ Hispanic populations are increasing, then it may be the case that the own county’s population is becoming less Hispanic, and this demographic change may increase voter turnout if other demographic groups are “replacing” the Hispanic population in the own county.
Of the two variables related to education, only the percentage of the county population who are female with a bachelor’s degree is associated with explaining variation in the dependent variable. If the share of the population who are female with a bachelor’s degree increases in surrounding counties, then the own county’s voter turnout increases. One possible explanation for this result is that better-educated females may be more politically active and that this activity spills over across county borders, due to media exposure or other publicity (akin to Cho and Rudolph’s [2008] “casual observation” hypothesis), and this publicity may inspire or otherwise motivate individuals to head to the polls.
The positive coefficient estimate for veterans indicates that as the population of veterans in surrounding counties increases, we see an increase in county-level voter turnout in the own county. One possible explanation for this result is that veterans may be more politically active than other demographic groups and that this political activity may be more readily noticed in surrounding counties and hence may boost turnout. Veterans, relative to other identifiable characteristics, may be more likely to know of and associate with other nearby veterans so the county spillover effect on turnout is more pronounced, similar to the religious adherents variable above.
Our last two indirect effects, the percentage of the county population living in urban areas and the median age of the resident county population, both have negative spillover effects. The spillover effect for the urban variable reinforces the direct effect due to the same sign; however, the median age variable has a negative indirect effect, which dilutes the direct effect somewhat.
The final set of results for the total effects is contained in Table 5. The total effects measure the sum of both the direct effects and the indirect effects. 44 The total effects measure how a change in an explanatory variable affects voter turnout inclusive of the own county and surrounding county spillover effects. The results for the total effects estimates can be broken down by examining the different groups of variables that we use in our analysis in a manner similar to our previous results. The total effects estimates as a whole have signs on the coefficient estimates that are in accordance with our a priori expectations for those variables that have credible intervals that do not contain zero.
Table 5. Total Effects from the Spatial Durbin Error Model.
Note: Parameter inferences are based on 10,000 sampled values with 10,000 sampled values used as burn-in. Lower 95% and Upper 95% represent the lower and upper 95% credible interval bounds, respectively. Entries in boldface represent credible intervals that do not contain zero.
Of the two variables that measure religious sentiment, only the percentage of the county population that are self-identified religious adherents matters in terms of county turnout. This result seems to reinforce the idea that religious sentiment is more important in explaining turnout than the actual number of churches per 10,000 county residents. Adherents to one of many religious groups as defined in the Glenmary survey may have a more intense interest in public affairs or have a more nuanced sense of civic duty than nonreligious types and hence appear to have higher turnout, all else equal.
As was the case for the direct estimates, the white and Hispanic variables are associated with explaining variation in the dependent variable, while the variable that measures the county population that is black is not. Additionally, the signs are in accordance with our ex ante expectations, that is, an increase in the white population increases voter turnout while an increase in the Hispanic population decreases turnout.
The total effects also indicate that when the percentage of the county population that has a bachelor’s degree increases (regardless of gender), there is an increase in voter turnout. It appears that education plays a major role in explaining voter turnout at the county level.
Of the remaining demographic variables, the county urban population and the percentage of the county population living in the same house are associated with voter turnout, but in opposite directions. The urban population has a negative coefficient estimate which may indicate that counties with large urban areas experience a dilution in the weight attached to their vote. Even though the value of an individual vote is quite low, nonurban voters on election day may not have many social or recreational activities in which to participate, so the opportunity cost of voting is quite low. Urban voters, however, consider the opportunity cost of voting as quite high in terms of forgone opportunities for entertainment, shopping, dining out, and so on, relative to their vote value. This result is in contrast to the result found by Geys (2006), which established that urbanization and voter turnout were weakly associated.
We turn our attention now to the economic variables, and the total effects results indicate that the only economic variable that is associated with the dependent variable is the state unionization variable, which has a positive coefficient estimate. The per capita income, unemployment, and real GDP growth rate appear to have no effect on voter turnout at the county level in our model. Given the prominence that labor unions place on political efforts and voter turnout drives, 45 this result should not be too surprising. However, it is surprising that none of the other economic variables in terms of total effects are associated with explaining voter turnout at the county level.
Our final group of variables that we examine are the political variables. Of the seven political variables that we utilize in our model specification, only the veterans, electoral votes in a county, and Cook closeness rating are associated with our dependent variable.
The percentage of the county population that are veterans has a positive association with voter turnout. In the 2004 election, a group known as the Swift Boat Veterans for Truth ran several television advertisements that were highly critical of the candidacy of John Kerry, who made his military service a major part of his election campaign. Regardless of the veracity of the claims made by the group, this kind of grass-roots organizing may indicate the power that veterans groups (broadly defined) have in terms of stimulating voter turnout, much like the influence that labor unions have in organizing voter turnout.
As we mentioned in the section explaining our choice of political variables, we proxied “vote value” in two ways, one of which was that voters assign a weight to their Presidential vote in proportion to the number of electoral votes assigned to their state since electoral votes will mirror population. Our results indicate that our a priori expectation of a negative relationship between this county electoral vote share and county turnout is confirmed.
Finally, we attempted to proxy for the closeness of the election using the Cook Electoral Rating from the Cook Political Report. Recall that a state that was “solid” for Bush or Kerry was coded as 0; “likely” Bush or Kerry was coded as 1; “leans” Bush or Kerry as 2; and a “toss up” as 3. Given our coding, we expected a positive association between this explanatory variable and county voter turnout. The coefficient estimate for this variable indicates that there is indeed a positive relationship between the Cook report rating and voter turnout at the county level. It appears that states that were deemed “toss up” states had higher turnout at the county level showing that voter turnout can be a function of how close the race is perceived ex ante.
Conclusion
The amount of literature on voter turnout is extensive mostly because the presumed benefits and costs of voting are so small that slight disturbances can alter turnout in seemingly unpredictable ways. This article adds to the existing literature by comparing four different possible models of voter turnout at the county level: the normal linear model, the SLX model, the spatial Durbin model, and the heretofore unused spatial Durbin error model. Using modern Bayesian econometric estimation and model comparison techniques, our results overwhelmingly provide support for the spatial Durbin error model, which provides inferences regarding direct, indirect, and total effects of the set of explanatory variables on the dependent variable. Quantifying these various effects may be important for policy reasons as well. For example, the direct effects show how a change in an explanatory variable in one county affects voter turnout in that same county, which may be of interest to certain stakeholders, such as voter turnout advocates. However, our preferred model also shows that there may be indirect effects associated with changes in explanatory variables that affect voter turnout. For example, the percentage of the county population who are veterans had a positive indirect effect, which shows that as the county-level veteran population increases, we see an increase in voter turnout in neighboring counties. This result indicates that there may be additional benefits to voter turnout drives that extend beyond a county’s border.
The results from the spatial Durbin error model also show that many independent variables commonly thought to influence voter turnout actually are not associated with voter turnout at the county level at all, including some rather surprising ones such as per capita income and the county unemployment rate. Additionally, some commonly held opinions, such as the belief that a gay marriage ballot initiative spurs voter turnout, prove not to have any empirical support.
Spatial econometric techniques are seeing increasing use in various fields such as political science and economics, to name but two. These models in general, and the spatial Durbin error model we use in this article in particular, can provide much more detailed empirical findings for researchers who are interested in quantifying spillover effects and how these effects can guide policy for various stakeholders.
Note to Table 2: Log-marginal likelihood values are calculated via the method of Chib and Jeliazkov (2001). Numbers in parentheses represent numerical standard errors for the log-marginal likelihood calculation. Calculations are based on 10,000 sampled values.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
