Abstract
The geographically weighted regression (GWR) is a well-known statistical approach to explore spatial non-stationarity of the regression relationship in spatial data analysis. In this paper, we discuss a Bayesian recourse of GWR. Bayesian variable selection based on spike-and-slab prior, bandwidth selection based on range prior, and model assessment using a modified deviance information criterion and a modified logarithm of pseudo-marginal likelihood are fully discussed in this paper. Usage of the graph distance in modeling areal data is also introduced. Extensive simulation studies are carried out to examine the empirical performance of the proposed methods with both small and large number of location scenarios, and comparison with the classical frequentist GWR is made. The performance of variable selection and estimation of the proposed methodology under different circumstances are satisfactory. We further apply the proposed methodology in analysis of a province-level macroeconomic data of thirty selected provinces in China. The estimation and variable selection results reveal insights about China’s economy that are convincing and agree with previous studies and facts.
Introduction
For geographically sparse data with inherent spatial variability, estimating coefficients of a regression model for a particular location based on only observations from this location is not feasible due to the small number of observations. The geographically weighted regression (GWR; Brunsdon, Fotheringham, and Charlton 1996) is an important tool to explore spatial non-stationarity of the regression relationship in spatial data analysis. It has been applied to a variety of fields, including geology, environmental science, epidemiology, and econometrics. Fotheringham, Brunsdon, and Charlton (2002) has summarized the basic theory, statistical inference, and bandwidth selection for GWR, and proposed natural extensions of GWR under the generalized linear model framework. The basic idea of GWR is to make use of information from nearby locations. The weighting idea is a natural strategy to use in light of Tobler’s first law that “near things are more related than distant things” (Tobler 1970). In estimating the parameter for one specific location, subjects in the data are weighed according to their distance from this location, with greater weight for closer subjects. Páez, Uchida, and Miyamoto (2002a, 2002b) proposed estimation and inference for GWR under a maximum-likelihood-based framework. Mei, Wang, and Zhang (2006) proposed a mixed geographically weighted regression model which included both spatially-varying and non-spatially-varying coefficients, and gave a testing procedure of important explanatory variables. N. Wang, Mei, and Yan (2008) proposed a local linear-based GWR for the spatially varying coefficient models which can significantly improve GWR. More recently, da Silva and Fotheringham (2016) discussed the multiple testing issue, and proposed a solution which outperforms other solutions such as the Bonferroni procedure under the GWR framework. The aforementioned works discussed the GWR in the frequentist fashion. From the Bayesian perspective, LeSage (2004) proposed a Bayesian GWR, which gives a prior distribution on the parameter vector depending on historical knowledge. The proposed model, however, used cross-validation for bandwidth selection, which relies on a user-specified grid of bandwidth, and is computationally intensive like other cross-validation based methods.
In this paper, we propose Bayesian techniques for the GWR using the likelihood-based approach in Páez, Uchida, and Miyamoto (2002a, 2002b). In addition to Bayesian estimation and inference, a spike and slab (Ishwaran and Rao 2005) prior is applied for variable selection for Bayesian GWR. Furthermore, bandwidth selection and weighting scheme selection are discussed based on prior selection and Bayesian model selection criteria. An introduction to the implementation of GWR based on nimble (de Valpine et al. 2017), a relatively new and powerful R package for Bayesian inference, is presented as an open source repository on GitHub. Our simulation studies showed the promising empirical performance of the proposed methods in both non-spatially varying and spatially varying cases. In addition, our proposed Bayesian approach reveals interesting features of the province-level macroeconomic data in China.
The rest of this paper is organized as follows. In Section “Geographically Weighted Regression,” the GWR and its weighting schemes are discussed. Section “Bayesian Recourse for GWR” gives a detailed discussion of Bayesian inference, variable selection, bandwidth selection, and model assessment for the GWR. Extensive simulation studies are conducted in Section “Simulation Studies” to investigate the empirical performance of the proposed methods. In Section “Real Data Analysis,” we implement our model using province-level macroeconomic data in China from year 2012 to year 2016. Finally, Section “Discussion” contains a brief summary of this paper.
Geographically Weighted Regression
Geographically Weighted Regression
From Brunsdon, Fotheringham, and Charlton (1998), the GWR model can be written as:
where
where X is the
Spatial Weighting Functions and Distances
We first introduce several spatial weighting functions that can be used in GWR. Notice that the weighting scheme of ordinary least squares can be defined in the following form:
where j represents the location of the observations, and k represents the location for which parameters are estimated. In a global model where observations from all locations are used to estimate one vector of coefficients, each observation is assigned a weight of unity.
A first step to consider locality is to include observations that are only within a certain distance d from the target location, i.e.
where
where b is the bandwidth that can be chosen appropriately to control the decay with respect to distance. The Gaussian weighting scheme can be written as:
Both (5) and (6) are decreasing functions of djk, which, intuitively, indicates that an observation very far away from the location of interest contributes little in the estimation of parameters at this location. In order to provide a continuous, near-Gaussian weighting function up to distance b from the estimation point, and then zero weights for any data point beyond b, Brunsdon, Fotheringham, and Charlton (1996, 1998); Fotheringham, Charlton, and Brunsdon (1998) proposed the bi-square function:
In this function, b serves two purposes simultaneously. It controls the speed of weight decay as the distance
We then briefly discuss different choices of distance function. The Euclidean distance, defined as:
is one of the most popular choices when the precise (latitude, longitude) location of each observation is available. However, in some public health and epidemiology studies, or some socioeconomics studies, data are collected and summarized on a higher level than single observations, such as wards (Brunsdon, Fotheringham, and Charlton 1996) or counties (Xue, Schifano, and Hu 2019), which produces areal data instead of point-reference data. All observations within the same area are assigned the same (latitude, longitude). For example, Hu and Huffer (2020) attributed each county’s observations to its centroid. Note that the Euclidean distance is easily affected by the areas of the administrative divisions, which additionally complicates the process of parameter tuning as there is no golden benchmark measure of distance.
An alternative distance measure when we have areal data is the graph distance (Müller et al. 1987; Bhattacharyya and Bickel 2014). The administrative devisions are regarded as vertices of a graph, denoted as
where
While it remains a subjective problem in choosing appropriate bandwidths and thresholds for the geographical distance-based methods, i.e. one has to decide “how close is close enough,” a natural definition of closeness would derive from the graph distance. Counties sharing a common boundary, i.e. having graph distance 1, are close, while having graph distance greater than 1 indicates “not close,” and observations in these far neighboring counties need to be weighed down. A graph distance-based weighting function would be
where d denotes the graph distance, and f is a certain weighting function with bandwidth b. In this work, we choose
Note that while the bi-square weighting scheme (7) also allows for bandwidth tuning, it excludes observations beyond the value of the bandwidth, or, threshold. For areal data in economics such as the Georgia housing data to be presented later, this could mean the exclusion of a fair amount of observations. The parameter estimates produced based on a small number of observations would nevertheless be highly unstable, which should be avoided. The weighting scheme in (10), by restricting the immediate neighbors to be assigned the same weight and all others positive weight, compared with (7), adds more ensurance of parameter estimation stability.
Bayesian Recourse for GWR
In this section, we propose the posterior estimation, variable selection, and bandwidth selection for the Bayesian GWR model. The proposed methods are implemented with the powerful R package
Bayesian Estimation for GWR
According to Boscardin and Gelman (1993) and Páez, Uchida, and Miyamoto (2002a, 2002b), we can get the estimation of GWR using Bayesian computation. The likelihood function of this model can be written as:
where MVN indicates the multivariate normal distribution. In order to have a conjugate posterior distribution, we can set the priors of
where
where
According to (14), we can use Markov chain Monte Carlo (MCMC, Gelman et al. 2013) to estimate
Bayesian Variable Selection
We first consider the regression problem for one location. Following the procedure of George and McCulloch (1993), the spike and slab prior for
where
When
Bayesian Bandwidth Selection
In GWR, it is important to choose a proper bandwidth for the weighting functions. In the Bayesian approach, a prior can be given to the bandwidth b so that the optimal bandwidth can be simultaneously obtained together with the estimation of other parameters. The prior also depends on which measure of distance is used. A more detailed discussion of distance measures is given in Section “Spatial Weighting Function and Distances.” Using similar ideas as in Boehm Vock et al. (2015), a prior for bandwidth can be set as:
where D is the upper limit for the support of the distribution of b. Without any prior knowledge, D can be chosen large enough so that we start from a noninformative prior for the bandwidth, i.e. we start from an approximate global model where observations are always weighed equally. There are also some other choices of prior distributions for the bandwidth, such as the gamma distribution or discrete uniform distribution. If prior information is available about the bandwidth, parameters for the prior distributions can be set to incorporate such information. Our proposed model can be summarized as follows:
where “IGamma” denotes the inverse-Gamma distribution, and f is the weighting function introduced in (9),
Bayesian Model Assessment
In Section “Spatial Weighting Function and Distances,” we introduced several spatial weighting functions that can be used in the GWR. In order to select the weighting scheme that fits the data best, we apply the most commonly used tools, the Deviance Information Criterion (DIC; Spiegelhalter et al. 2002) and the Logarithm of the Pseudo-Marginal Likelihood (LPML; Ibrahim, Chen, and Sinha 2013), for model selection. The DIC is defined as:
where 0 and 0 represent the parameter of interest and the corresponding posterior mean. The term
where n is the total number of the observations. Therefore, the DIC for the GWR model can be given as:
where
The LPML is constructed based on the Conditional Predictive Ordinate (CPO) values, which are estimates of the probability for observing Yi given that all other responses have been observed. Let
where
where T is the total number of Monte Carlo iterations. Then an estimate of the LPML is given by:
A model with a larger LPML value indicates that it is preferred.
Simulation Studies
In this section, we present the performance of the proposed estimation and variable selection techniques under scenarios where the covariate effects do not vary spatially, and where the covariate effects do vary spatially. We use the spatial structure of thirty selected provinces in China in our simulations. A map of these provinces with their names is presented in Figure 1(a). Specifically, Hainan province is an island, and is therefore not connected with any others, which makes its graph distance with any other province infinity. It is, however, very close to Guangdong province, and they bear a lot of resemblance in both culture and economic development. Therefore, we modified the adjacency matrix so that Hainan and Guangdong are adjacent. The graph distance matrix is calculated based on the modified adjacency matrix. A visualization of the graph distance matrix is presented in Figure 1(b).

(a) A map of the thirty selected provinces of China, with their names annotated. (b) Visualization of graph distance matrix for thirty selected provinces in China. Darker color indicates larger graph distance.
Denote the average parameter estimates as
where
where
The variable selection approach is evaluated using both the accuracy rate for a single variable ACC and for the entire model (Model ACC), defined as:
and
To compare with frequentist approach, the same datasets are also fitted using the classical frequentist GWR approach, where the bandwidth selection is made based on minimizing the summation of SSE across thirty provinces over a grid of candidate bandwidths. The details of frequentist bandwidth selection, as well as the parameter estimates obtained using the optimal bandwidth, are presented in Section 1 of the supplemental material. To demonstrate that the graph distance produces credible parameter estimates, and that at the same time it circumvents the additional effort of threshold selection in weighting kernels such as (4) and (10), simulation study is done for the same designs to be presented, with the great circle distance used. The results are reported in Section 2 of the supplemental material. In addition, considering that a total of 150 observations with thirty locations still make a small sample, an additional simulation study with more than 300 locations using the spatial structure of census tracts in Hartford, Litchfield, and Middletown counties in Connecticut has been conducted, and included in Section 3 of the supplemental material.
For both the following simulation studies and the supplemental simulation studies, the effective number of parameters for the frequentist GWR is also calculated as in Brunsdon, Fotheringham, and Charlton (2000). Note that the frequentist GWR is based on one bandwidth only, and only the full model that includes all covariates is fitted, therefore the frequentist pD should be used as a reference, instead of a criterion for direct comparison.
Simulation Without Spatially Varying Coefficients
Under the scenario where there are no spatially varying coefficients, we generate data using the same set of parameters for all provinces. The independent continuous covariates are generated i.i.d. from the standard normal distribution
Average Parameter Estimates, Performance of Parameter Estimates, and Variable Selection Results When There is No Spatial Variation in the Underlying True Parameters. “True” Denotes Whether a Covariate is in the True Model or Not, and ACC Stands for Variable Selection Accuracy Rate.
It is rather clear that when there is no spatial variation, the bandwidth is selected to be large, which induces a weighting scheme that assigns close to uniform weight to both nearby provinces and distant provinces. Under all three settings, the variable selection accuracies are all 100% for all five covariates, and the three model selection accuracies are 100% as well. The average effective number of parameters under the frequentist GWR are 11.08, 10.27 and 12.65 under the three settings, while under the Bayesian GWR, the values are 178.00, 177.89 and 177.57, respectively.
Simulation with Spatially Varying Coefficients
For estimation and variable selection in the presence of spatially varying coefficients, we use similar simulation schemes as in Xue, Schifano, and Hu (2019). The graph distance matrix visualized in Figure 1 is transformed using multidimensional scaling (MDS; Cox and Cox 2000) and mapped into a Cartesian space. Denoting the transformed coordinates corresponding to province as
The variation pattern is visualized in Figure 2. The estimation and variable selection results for the setting in (31) are presented in Table 2. It can be seen that in all three settings, the MAB, MSD and MMSE are all larger than when there are no spatially varying covariate effects. There have been decrease in the MCR for parameters corresponding to covariates that are in the true model. The variable selection procedure, however, remains quite robust, and in all replicates of simulation are the correct models selected. The average bandwidths selected for the three settings are around sixty seven. This indicates that in the presence of spatial variation, for some replicates of simulation, the bandwidths tend to be large as the gain in stabilizing the parameter estimates for each location dominates the incurred bias. This has also been observed for the classical frequentist GWR, as presented in Supplemental Table 2. The frequentist GWR has an average effective number of parameters of value 19.32, 20.49 and 20.19 under the three settings, and the Bayesian GWR has 179.38, 179.58 and 179.32 instead.

Visualization of parameter surfaces when the parameters vary according to (31).
Performance of Parameter Estimates, and Variable Selection Results When a Simple Linear Variation Pattern is Present. “True” Denotes Whether a Covariate is in the True Model or Not, and ACC Stands for Variable Selection Accuracy Rate.
To study the estimation and variable selection performance under a scenario where regional variation exists, we choose to use the four major economic regions of China proposed during the Eleventh Five-year plan: the west (0), northeast (1), central (2), and east (3) regions. A visualization of these four regions is given in Figure 3. Provinces within each economic region are assigned the same parameter value. The four

Map of selected provinces in China colored by their economic regions proposed during the Eleventh Five-year Plan.
The True Underlying Parameters Used for the Four Regions in Each Setting.
The estimation and variable selection results are presented in Table 4. Again, compared to results in Supplemental Table 3, both approaches yield similar MAB, MSD, MMSE and MCR/MCP. Similar to previously observed, the variable selection in the Bayesian approach effectively reduces the MAB, MSD and MMSE parameter estimates for variables that are not in the true underlying model.
Performance of Parameter Estimates and Variable Selection Results When Regional Variation is Present. “True” Denotes Whether a Covariate is in the True Model or Not, and ACC Stands for Variable Selection Accuracy Rate.
The bandwidths selected average to around 70. This could be due to the fact that a province now has a few neighbors with exactly the same true underlying coefficients, and therefore the weighting function tries to weigh observations in the neighboring provinces equally as the local one. The frequentist GWR performed similarly, and results are included Supplemental Table 3. The average effective number of parameters under the frequentist GWR are 17.27, 16.73 and 15.28 under the three settings, while under the Bayesian GWR, the values are 178.35, 178.25 and 178.38, respectively.
Real Data Analysis
The proposed Bayesian GWR model is used to analyze province-level macroeconomic data in 30 selected provinces of China from year 2012 to year 2016, i.e. we have 150 observations in total. The Gross Domestic Product (GDP, in billions of CNY) is used as the spatial response variable (Y). Five covariates, including the resident population in millions (X1), the urban population in millions (X2), the fixed asset investment in the whole society in billions of CNY (X3), total export value in billions of USD (X4), and total import value in billions of USD (X5), are incorporated in the model. The five-year means of the variables for each province are shown in Figure 4. Following the common practice in economics to account for long-tails (see, e.g. Wooldridge 2015), we take the logarithm of GDP before model fitting. All five covariates are continuous, and are therefore standardized before model fitting.

Five-year means for variables in selected provinces of China.
The proposed Bayesian GWR model (17)–(22) is fitted on this dataset. Priors
DIC and LPML Values for Different Weighting Schemes.
Under the GWR model with a Gaussian weighting scheme, the posterior modes of the indicators

Plot of parameter estimates for the three covariates in the final model in each selected province.
For comparison, classical frequentist GWR in (2) is also fitted on the dataset. We report the parameter estimates, together with plots, in the supplemental material. Particularly, the two covariates dropped by our Bayesian variable selection approach have the smallest absolute parameter values among all five covariates, which indicates that our proposed approach is indeed capable of picking out the most influential factors. The parameter estimates are also plotted on maps as in Figure 5. There are slight differences in the values of parameter estimates, which is partly due to the fact that we only have 150 observations (thirty provinces, five years). It is, however, worth noticing that the trend of variation is consistent between the frequentist and Bayesian approaches.
Discussion
We developed a likelihood-based Bayesian approach to estimate regression coefficients in conjunction with spike and slab variable selection for geographically sparse data. The selection of bandwidth is discussed for a wide choice of weighting schemes using popular Bayesian model selection criteria such as the DIC and the LPML under the GWR context. The proposed methods are implemented in
Compared to the great circle distance, with a natural threshold of 1 to define “close enough,” the graph distance yields weighting schemes that produce models with robust parameter estimation and variable selection performance. Based on comparisons with classical frequentist GWR from the simulation studies, it is interesting for us to notice that the bandwidth selection results of the Bayesian approach are different from that of the frequentist approach. A partial reason for this pattern is that the bandwidth selection of frequentist approach is based on minimizing the summation of the sum of square errors (SSE) for each location. Our Bayesian approach, however, tries to maximize the whole data likelihood. Therefore, the frequentist approach tends to select smaller bandwidth than Bayesian approach.
A few issues beyond the scope of this paper are worth further investigation. In this work, we are only concerned with estimation of parameter for linear regression. Extension of similar ideas to generalized linear models, and semi-parametric models such as the Cox model, are worth developing. In the second alternative simulation scenario with regional variation patterns, both the frequentist and Bayesian GWR try to weigh neighbors as high as possible, leading to large bandwidths. Under the frequentist framework, clustering of covariate effects has been done using hierarchical clustering on the parameter estimation, which is ad hoc. Another approach is the penalized methods in Li and Sang (2019). In the Bayesian paradigm, however, hierarchical modeling provides an integrated framework that incorporates the latent cluster configuration layer. Development of such a framework is worth investigating. Also, we are assuming that a covariate is either in the true model for all locations, or not in the true model for all locations. There are cases where a covariate is important for some locations, but is minimally impacting for other locations. Identifying such locations is devoted to future research. Detecting a relationship between two areas that do not share a boundary (Gao and Bradley 2019) other than using graph distance is also an interesting future work.
Supplemental Material
Supplemental Material, gwrnimble_tobeongithub - Geographically Weighted Regression Analysis for Spatial Economics Data: A Bayesian Recourse
Supplemental Material, gwrnimble_tobeongithub for Geographically Weighted Regression Analysis for Spatial Economics Data: A Bayesian Recourse by Zhihua Ma, Yishu Xue and Guanyu Hu in International Regional Science Review
Supplemental Material
Supplemental Material, gwr_supp - Geographically Weighted Regression Analysis for Spatial Economics Data: A Bayesian Recourse
Supplemental Material, gwr_supp for Geographically Weighted Regression Analysis for Spatial Economics Data: A Bayesian Recourse by Zhihua Ma, Yishu Xue and Guanyu Hu in International Regional Science Review
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Dr. Ma’s research was supported by Project of Educational Commission of Guangdong Province of China #2019WQNCX104.
Supplemental Material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
