Abstract

Sociologists have recently paid growing attention to the analysis of spatially referenced data, with spatial patterns framed as anything from a substantive process that must be modeled to a statistical problem that must be resolved. Spatial regression techniques were first introduced to sociologists by Patrick Doreian in the 1980s and were significantly improved by Land and Deane in 1992. Despite the contribution of these seminal works, spatial regression techniques, and spatial data analysis more generally, were not widely adopted by sociologists until recently. This neglect did not stem from a lack of interest in spatial units or macro-level processes but from technical limitations. Advances in software, however, have allowed sociologists to take a spatially informed perspective in recent years. The timing of Ward and Gleditsch’s book, Spatial Regression Models, could not be better: the authors, both political scientists, make spatial regression analysis accessible to social scientists at a time when statistical software can easily be used to analyze widely available spatial data.
Spatial Regression Models contains four chapters and begins by orienting the reader to the spatial dimension of democracy around the world, the substantive theme used throughout the book. In the first chapter, the authors gently introduce the concept of spatial dependence and the accompanying need for special analytical techniques. They state the statistical need in clear terms: “. . . ignoring spatial dependence will tend to underestimate the real variance in the data” (p. 10). The authors also attend to substantive motivations at various points throughout the book. This chapter also effectively introduces the how-tos of visualizing spatial data, the basics of diagnosing spatial autocorrelation, and ways of measuring proximity, or the spatial connectedness of places. Understanding each of these elements is essential before pursuing the more advanced regression analysis that is the focus of the remaining chapters.
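Moran’s I is one common statistic for diagnosing spatial autocorrelation; it combines a measure of proximity (a spatial weights matrix) with each place’s deviation from the overall mean. The book’s own examples are written in R, so the following is not drawn from it. It is a minimal Python sketch that assumes binary contiguity (regions either share a border or do not), row-standardized so that each region’s weights sum to 1. The function names and the four-region layout are hypothetical, for illustration only.

```python
from typing import Dict, List, Sequence


def row_standardize(neighbors: Dict[int, List[int]]) -> Dict[int, Dict[int, float]]:
    """Turn binary contiguity lists into row-standardized weights.

    Each region with k neighbors gives weight 1/k to each of them, so every
    non-empty row sums to 1. Regions with no neighbors get an empty row.
    """
    weights: Dict[int, Dict[int, float]] = {}
    for i, nbrs in neighbors.items():
        weights[i] = {j: 1.0 / len(nbrs) for j in nbrs} if nbrs else {}
    return weights


def morans_i(values: Sequence[float], neighbors: Dict[int, List[int]]) -> float:
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2.

    Region indices in `neighbors` are assumed to be 0..n-1, matching the
    order of `values`. Positive values mean similar values cluster in space;
    negative values mean neighbors tend to be dissimilar.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    w = row_standardize(neighbors)
    s0 = sum(sum(row.values()) for row in w.values())  # sum of all weights
    num = sum(w_ij * z[i] * z[j] for i, row in w.items() for j, w_ij in row.items())
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


# Hypothetical example: four regions in a row (0-1-2-3), each bordering its neighbors.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(morans_i([1, 2, 3, 4], line))  # 0.4: similar values sit next to each other
print(morans_i([1, 4, 1, 4], line))  # -1.0: every neighbor pair is dissimilar
```

Row-standardization is a common default for areal data such as counties or nations, and with it S0 equals the number of regions that have at least one neighbor. Other proximity measures, such as distance bands or shared-border lengths, would change only the weights.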
The second chapter covers spatially lagged dependent variables, commonly known as the spatial lag regression model. Readers short on time should focus on the section comparing spatially lagged dependent variables with ordinary least squares (OLS) with dummy variables (section 2.7). In this section the authors discuss most directly the conceptual difference between spatial dependence and spatial heterogeneity, two different types of spatial processes. The authors note that researchers “recognize that there is considerable heterogeneity between different regions of the world” and that a “common way to address spatial heterogeneity is to include dummy variables for different geographical regions” (p. 61). Although common, this is neither the only nor the most informative way to model spatial heterogeneity. Fortunately, the authors return to spatial heterogeneity in the final chapter, although that discussion is rather limited. Still, Ward and Gleditsch provide a sound discussion of the differences between the spatially lagged y approach and the OLS with dummy variables approach, as well as guidance on when and how one might combine the two. Other elements addressed in the second chapter include the nuts and bolts of the spatial lag regression equation, interpretation of the spatial lag parameter (ρ), model fit, and the motivation for working within the maximum likelihood estimation framework. The authors draw on the time series context as a point of comparison.
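For reference, the spatial lag model is conventionally written as below, where W is an n × n spatial weights matrix (often row-standardized), Wy is the spatially lagged dependent variable, and ρ is the spatial lag parameter. This is the standard notation in the spatial econometrics literature and may differ cosmetically from the book’s.

```latex
y = \rho W y + X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n)

y = (I_n - \rho W)^{-1} X\beta + (I_n - \rho W)^{-1} \varepsilon
```

Because Wy is correlated with ε, OLS estimates of this model are biased and inconsistent, which is one motivation for maximum likelihood estimation. The second (reduced) form also shows why ρ is not read like an ordinary regression coefficient: a change in X in one place propagates to other places through the spatial multiplier (I_n − ρW)^{-1}.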
The third chapter is devoted to the spatial error model, a less commonly used alternative to the spatial lag model that makes no theoretical claim about the process generating spatial autocorrelation. One section of the chapter (section 3.4) focuses on the differences between the two models; again, readers pressed for time would benefit from concentrating on it. As in the chapter on spatial lag regression, the authors efficiently describe the model equation and its interpretation, including the spatial error parameter (λ).
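In the same conventional notation (again, possibly differing cosmetically from the book’s), the spatial error model places the spatial process in the disturbance term rather than in the outcome, with λ as the spatial error parameter:

```latex
y = X\beta + u, \qquad u = \lambda W u + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n)

u = (I_n - \lambda W)^{-1} \varepsilon
```

Under this model, OLS estimates of β remain unbiased when X is exogenous, but they are inefficient and the usual OLS standard errors are incorrect.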
In the final chapter, the authors discuss extensions of the two basic spatial dependence regression models addressed in the preceding chapters. Here, specific attention is given to spatial heterogeneity in the section on inference and model evaluation (section 4.2.2). Spatial heterogeneity is another type of spatial process; it contrasts with spatial dependence but does not necessarily exclude it. Spatial analysts should be well versed in this type of process. Unfortunately, the authors limit the discussion to less than half a page. Other topics briefly addressed in this chapter include connectivities (akin to distance-based networks), discrete and latent variables (most widely available spatial regression techniques are limited to continuous outcome variables), point and geostatistical data (the focus of much of the work conducted by geographers), hierarchical models (the commonly used tool of analysts interested in “neighborhood effects,” although the focus here is on preserving the spatially referenced nature of the data), and time series data. The book concludes with an appendix describing the various software packages available at the time of publication. The descriptions include helpful references and URLs that aid the reader in locating the software.
A major contribution of the book is the accessibility of the authors’ writing and examples for a sociological audience. The text addresses an important need, since other reference books on spatial data analysis are oriented toward geographers (generally with greater attention to geographic information systems [GIS] than to spatial regression), economists (generally with greater attention to theoretical proofs without corresponding applications), or epidemiologists (generally with greater attention to point data, which require a different set of analytical tools). This book outlines the basics of analyzing spatially referenced areal units, the most common type of spatial data analyzed by sociologists. Areal units are places measured as census blocks, census tracts, counties, states, or even nations. The authors’ focus on areal units is essential, given the discipline’s historical interest in, and recent growing attention to, “neighborhoods” and “communities,” as well as school and voting districts and labor market areas.
By including R scripts for all examples illustrated in the text, the book meets the objective of the Quantitative Applications in the Social Sciences series: to equip a wide audience with the tools to conduct informed analyses with little start-up cost. R is increasingly the software of choice among spatial data analysts in the social sciences, given the rapid development that its free and open-source model makes possible.
Despite these valuable contributions, the book is not without shortcomings. One significant drawback is the limited attention to spatial heterogeneity. The general trend in the sociological literature has been either to ignore the spatial dimensions of the data (whether as substance or as statistical nuisance) or to jump to a spatial dependence framework (most commonly the spatial lag regression model). The authors discuss the importance of theory in selecting an appropriate regression model, since the data will not reveal which model to choose; because the two models are not nested, the choice cannot be made on purely statistical grounds (p. 69). Yet they could more thoroughly address spatial heterogeneity, whether as substance or as statistical nuisance, each of which requires a different modeling strategy. Importantly, in many instances in sociological research, and in social science research more broadly, spatially referenced data reflect a process of spatial heterogeneity. Alternative modeling approaches include spatial regime analysis (Anselin and Cho 2002), geographically weighted regression (Fotheringham, Brunsdon, and Charlton 2002), and trend surface analysis (Unwin 1975). Although space limitations may have dictated the exclusive focus on spatial dependence regression, a separate chapter, or a separate book in the series, is needed for a comprehensive understanding of the available spatial regression models.
A second drawback is the simplicity of the examples. In the chapter on the spatial lag regression model, the authors use an example of voter turnout in Italy in 1997. Yet the fitted regression model analyzes only the relationship between turnout and gross domestic product (GDP). Similarly, a bivariate model is used to analyze the ratio of Bush to Kerry votes in the 2004 U.S. presidential election. Although researchers aim for parsimony, a bivariate model is rarely adequate. A multivariate example would be more realistic and more helpful, because covariates can relate to the dependent variable differently across space and can also be related to one another in ways that complicate modeling.
Ward and Gleditsch provide a guide to spatial regression analysis that is accessible to any sociologist or social scientist with a grasp of multivariate regression analysis. I can offer a personal endorsement of the book’s utility and accessibility: since publication, I have successfully used the book in semester- and week-long courses with social science and health science audiences composed of graduate students, applied researchers, and early- to late-career academics. The book is an essential addition to the shelves of spatially oriented social scientists.
