Abstract
Modelling spatial heterogeneity (SH) is a controversial subject in real estate economics. Single-family-home prices in Austria are explored to investigate the capability of global and locally weighted hedonic models. Even if regional indicators are not fully capable of modelling SH and technical amendments are required to account for unmodelled SH, the results emphasise their importance in achieving a well-specified model. Due to SH beyond the level of regional indicators, locally weighted regressions are proposed. Mixed geographically weighted regression (MGWR) overcomes the limitations of fixed effects by exploring spatially stationary and non-stationary price effects. Besides showing that MGWR reduces prediction errors, the analysis concludes that global model misspecifications arise from improperly selected fixed effects. Reported findings provide evidence that the SH of implicit prices is more complex than can be modelled by regional indicators or purely local models. The existence of both stationary and non-stationary effects implies that the Austrian housing market is economically connected.
1. Introduction
In real estate research, it is well established that hedonic prices may vary across space such as stratifications of metropolitan areas, regions and counties (for example, Bourassa et al., 1999; Goodman and Thibodeau, 2003; Bischoff and Maennig, 2011; Helbich et al., 2013a). However, this parametric modelling approach has some restrictions. Spatial units have to be defined exogenously; SH is modelled in a discrete fashion where continuous changes across space can be expected and usually the same definition of spatial units is used for all spatially varying effects (for example, Redfearn, 2009; McMillen and Redfearn, 2010).
By avoiding these limitations, non-parametric locally weighted regression procedures (LWR; Cleveland and Devlin, 1988) offer significant advantages (McMillen and Redfearn, 2010; McMillen, 2010). In this context, the slowly growing body of literature on LWR primarily uses the geographically weighted regression (GWR; Fotheringham et al., 2002), which inherently assumes SH in the hedonic price function and all involved predictors (for example, Bitter et al., 2007; Yu et al., 2007; Hanink et al., 2010). However, in situations where only some of the variables vary spatially, GWR results in inefficient estimations and possibly incorrect conclusions (Wei and Qi, 2012). This is particularly true for relatively small countries like Austria where real estate markets are economically connected through common federal policies such as governmental subsidies. Economic ties might also be relevant for structural housing features where Austria-wide equilibrium conditions between supply and demand are a rational outcome. On the other hand, spatially varying implicit prices are expected where local legislation and regulation (for example, through spatial planning policies) are effective and/or where local supply leads to scarcity (for example, plot area). Consequently, some price determining effects are expected to vary across space, while others are spatially homogeneous. Both aspects are modelled simultaneously in the so-called MGWR (Fotheringham et al., 2002). Although the MGWR model seems to be rational, it has not yet been considered in real estate studies.
Therefore, the overall objective of this research is to explore SH in housing price functions. Using Austrian housing data, the following contributions to the literature are established. First, the efficiency of fixed effects is evaluated in the context of global models, considering additionally iterative technical corrections for spatial autocorrelation (SAC) and SH. Secondly, switching to fully local non-parametric modelling, it is demonstrated that SH is only inadequately captured by imposing spatially fixed effects and that systematic parameter variation is evident which deviates substantially from globally estimated implicit prices. Finally, due to an absence of empirical consensus as to whether a predictor enters the model globally or locally, the data-driven MGWR procedure is proposed. Thus, the following research questions are addressed:
— Are standard hedonic regressions equipped with spatial indicators suitable to model SH? Are technical corrections required to achieve a well-specified model?
— Is SH beyond the regional fixed effects present? If this is the case, does a semi-local model outperform its global and fully local counterparts?
The rest of the paper is structured as follows. Section 2 reviews the theoretical foundations of hedonic pricing theory and discusses methodologies to account for SH. Following this, section 3 introduces the empirical models. Section 4 presents the study area and the dataset. The results are summarised in section 5, before section 6 highlights major conclusions and implications.
2. Background
2.1 Hedonic Pricing Theory
The theoretical basis of hedonic price modelling (Rosen, 1974) is derived from Lancaster’s (1966) consumer behaviour theory, which argues that it is not the good itself that creates utility, but its individual characteristics. As housing characteristics are non-separable and traded in bundles, real estate is usually treated as a heterogeneous good. Houses are valued for their utility-bearing characteristics with implicit prices, which can be considered as the component’s specific prices (McDonald, 1997). Thus, a household implicitly chooses a set of different goods and services by selecting a specific object (Sheppard, 1997; Malpezzi, 2003). In the course of their purchase decisions, households aim to maximise their utility depending on their own social and economic characteristics. Households’ utilities are also increasingly influenced by the absolute location of the house. Urban economic theory, in particular the monocentric Alonso–Muth–Mills model, provides a uniform framework to explain the spatial organisation of housing (Anas et al., 1998). Briefly stated, the model claims that distance to the core city is the exclusive determinant causing spatial variation in housing prices and that prices, among other things, tend to decline smoothly with distance. Due to this reductionist use of commuting costs, the model has attracted some criticism. However, in a recent empirical study of real estate commodities, Ahlfeldt (2011) concludes that the model still performs satisfactorily.
Methodologically, a hedonic price function f describes the functional relationship between the real estate price P and associated physical characteristics
2.2 Spatial Effects in Hedonic Models
SAC and SH are two challenges in hedonic modelling. Since Dubin (1992, 1998) and a wealth of subsequent research (for example, Can and Megbolugbe, 1997; Pace et al., 1998; LeSage and Pace, 2009; McMillen, 2010), it is now accepted that such spatial effects should be taken into account when estimating hedonic price functions. SAC describes the coincidence of locational and attribute similarity (Anselin, 1988) caused by analogous neighbourhood characteristics, similar socioeconomic characteristics of their residents and the quality of services (Dubin, 1992). As a consequence, the traditional ordinary least squares (OLS) estimator is inefficient and statistical inference is invalid (Dubin, 1998).
Although the price function assumes spatial equilibrium between supply and demand for dwelling characteristics, as real estate is fixed in space, and because of its durability and consistent properties, supply becomes largely inelastic (Malpezzi, 2003). Furthermore, different socioeconomic and demographic conditions of households cause spatial variation in dwelling demand (Sheppard, 1997). Functional disequilibria, which manifest themselves in the emergence of heterogeneous market structures, might result as a consequence (McMillen and Redfearn, 2010). Therefore, SH should be considered in hedonic models. As LeSage and Pace (2009) pointed out, non-modelled heterogeneity can lead to biased results and falsely induced SAC.
2.3 Modelling Spatial Heterogeneity
SH can either be considered in a discrete way or in a continuous manner. Both approaches are discussed in this section.
Discrete approaches
Discrete attempts model SH based on predefined spatial units (for example, federal states) which are usually considered within the regression as fixed or random effects.
Fixed effects model
Within the regression framework, fixed effects or spatial indicators for regions can be integrated, which let the intercepts vary over space. Slope heterogeneity can be controlled through spatial interaction effects of the spatial indicators with explanatory covariates (for example, Kestens et al., 2004). However, they implicitly assume prior knowledge about the actual spatial process. Modelling heterogeneity using a large number of fixed effects can result in insufficient observations within regions for parameter estimations which, due to the loss of degrees of freedom, decrease the prediction accuracy. Therefore, it is common practice to interact in an ad hoc fashion where only ‘one variable at a time’ is considered (McMillen and Redfearn, 2010, p. 713). Thus, a trade-off between both data fidelity and reduction of prediction accuracy is required which is partially solved by the random effects model (Orford, 2000; Goldstein, 2011).
Random effects model
The simplest case of a random effects model is the one-way intercept model, where intercepts vary spatially. Random effects can be approximated as a weighted average of the mean of the observations in the spatial units (corresponding to a dummy specification) and the overall mean of the whole country (Gelman and Hill, 2007). The weights are determined by the amount of information in a given region. For small sub-sample sizes with little information, estimates tend to equal the global mean, while for large sub-samples estimates tend towards the ‘unpooled’ dummy estimate. One advantage of this modelling strategy is that the closer the estimates are to the ‘pooled’ model, the fewer effective degrees of freedom are used. However, if random effects are correlated with the predictors, the regression yields biased and inconsistent results. Similar to fixed effects, random effects models can be generalised by two-way structures (for example, random time effects) and by letting predictors interact with the random effects, allowing different intercepts and/or slopes within each region (Goldstein, 2011).
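The weighted-average approximation described above can be illustrated with a minimal sketch. It assumes the usual precision-weighted form, in which the regional mean is weighted by the regional sample size divided by the within-region variance and the overall mean by the inverse of the between-region variance. The function name, argument names and numbers below are hypothetical and are not taken from the study.

```python
def partial_pooling_estimate(region_mean, n_region, overall_mean,
                             within_var, between_var):
    """Approximate random intercept as a precision-weighted average.

    region_mean  : mean of the observations in the spatial unit
    n_region     : number of observations in the spatial unit
    overall_mean : mean over the whole country
    within_var   : residual variance within spatial units (assumed known)
    between_var  : variance of the intercepts across units (assumed known)
    """
    w_region = n_region / within_var   # information contained in the region
    w_overall = 1.0 / between_var      # information from the pooled level
    return (w_region * region_mean + w_overall * overall_mean) / (w_region + w_overall)


# Hypothetical log-price means: a sparsely sampled and a densely sampled region.
sparse = partial_pooling_estimate(12.4, 3, 12.0, within_var=0.25, between_var=0.04)
dense = partial_pooling_estimate(12.4, 3000, 12.0, within_var=0.25, between_var=0.04)
```

With few observations the estimate stays near the overall mean, whereas with many observations it approaches the regional mean, which is the shrinkage behaviour described in the text.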
Multilevel regression
As stated in Orford (2000), considering real estate as nested within several levels of spatial units turns the hedonic pricing model into a multilevel regression problem (for example, Goldstein, 2011). Multilevel models generalise random effects models to more than one hierarchical level, in which coefficients are determined by a probability model (Gelman and Hill, 2007). This second-level model has parameters of its own (hyperparameters), which are also estimated from the data of the first-level model. However, in fixed and random effects, as well as in multilevel models, spatial units do not necessarily coincide with the true spatial (possibly non-stationary) data generating process (Dubin, 1998). Thus, one faces the modifiable areal unit problem (MAUP; see Fischer and Wang, 2011). In order to incorporate more homogeneous region delineations, statistical methods, such as cluster analysis (for example, Bourassa et al., 1999) or regionalisation (for example, Helbich et al., 2013a), have been utilised. While Orford (2000) emphasises the usefulness of multilevel modelling, Fotheringham et al. (2002) sharply criticise the discrete nature in which space is implemented, which further implies that the price function is discrete and homogeneous within a spatial unit. Moreover, abrupt and artificial discontinuities arise along the boundaries of the spatial units, which suggests that SH should be modelled continuously.
Continuous approaches
These techniques do not rely on exogenous assumptions concerning spatial units and therefore provide a data-driven modelling of parameter instability.
Polynomial regression and spatial expansion model
Dubin (1992) discusses the use of polynomial regression to investigate continuous large-scale variation. The expansion model developed by Casetti (1972) relates to polynomial regression. Parameters vary as a function of the co-ordinates and are allowed to drift spatially. Comparing non-stationarity modelling approaches, Bitter et al. (2007) conclude that the expansion model is suitable to analyse previously known broad trends but fails to investigate complex unknown local spatial relationships. Similarly, Farber and Yeates (2006) state that complex patterns are poorly depicted by polynomial functions. Specifically, following Dubin (1992), this approach does not seem useful for hedonic modelling because polynomials tend to get distorted at the edges of the study area; higher-order polynomial terms are highly multicollinear; results are uncoupled from the spatial distribution of the entities; and the fitted surfaces are too smooth for modelling local variation in prices.
Locally weighted regression
Compared with the mentioned models, non-parametric LWR (Cleveland and Devlin, 1988) offers substantial advantages, being highly adaptable for spatial housing data (McMillen and Redfearn, 2010). Embedded in the LWR framework, the conditional parametric model, also termed GWR (Fotheringham et al., 2002), is an explicitly local model and circumvents problems discussed in the context of discrete modelling of heterogeneity and polynomial regression. GWR implicitly assumes continuously changing price functions and models SAC and SH (McMillen and Redfearn, 2010). The suitability of GWR has been extensively discussed. Wheeler and Tiefelsdorf (2005) note coefficient sign reversals. Jetz (2005; cited in Páez et al., 2011) hypothesises that these might be induced by artificially localising the model, causing a local omitted variable bias. Moreover, Wheeler and Tiefelsdorf (2005) have found correlations among estimated GWR parameters which may affect interpretations. Wheeler (2007) argues that local collinearity in the covariates is the reason, which inflates variances and in turn yields reversed parameter signs. Based on Monte Carlo simulations, Farber and Páez (2007) examine the GWR bandwidth calibration. They conclude that globally fixed bandwidths do not appropriately reflect spatial processes, resulting in volatile regression coefficients. Extending Farber and Yeates (2006), who detected extreme coefficients which might emerge from local distributional irregularities, Cho et al. (2009) provide evidence that fixed bandwidths and less spatially dense data are potentially more vulnerable to extreme coefficients. McMillen and Redfearn (2010) also favour a varying bandwidth. Recently, Páez et al. (2011) have allayed doubts about using GWR for inference. Criticising previous studies (for example, Wheeler and Tiefelsdorf, 2005) due to their limited sample sizes of fewer than 400 observations and their restricted research designs, they provide simulation-based evidence that spurious correlations are noticeably reduced by samples of more than 1000 observations. A strong advantage of GWR is its flexibility (Farber and Yeates, 2006; McMillen, 2010) and that the price function needs no prior assumption concerning the price determination process and its spatial variation (for example, Yu et al., 2007). This corresponds to the suggestion of Ugarte et al. (2004), promoting more data-driven techniques. However, for some of the covariates the impact on the price can be global, while for others it may vary locally. Such constraints on the parameters can be imposed by politics or economic theory (Wei and Qi, 2012). This fact is usually neglected in GWR specifications (for example, Bitter et al., 2007; Yu et al., 2007; Hanink et al., 2010), resulting in inefficient estimates (Wei and Qi, 2012). Therefore, based on Robinson (1988), Fotheringham et al. (2002) propose an extension of the basic GWR, resulting in the statistically more parsimonious MGWR, which is the focus of this research.
3. Methods
This section introduces global econometric models and local as well as semi-local models, relevant for the subsequent empirical analysis.
3.1 Global Models
Non-spatial reference model
Formally, the OLS model is defined as
where,
SH is modelled by using fixed effects, which reduces the extent of the prediction error, removes most of the systematic error and leads to more reliable estimates (Can and Megbolugbe, 1997; Pace et al., 1998). However, as McMillen (2003) points out, unexplained similarities between neighbouring houses can result in SAC, which is captured using spatial autoregressive models.
Spatial autoregressive model
The family of spatial autoregressive models, popularised by Anselin (1988), comprises two special cases—namely, the lag and error models. The former extends the standard regression model by including a spatially lagged dependent variable (further on called the SAR model), while the latter addresses SAC by defining a spatial autoregressive error process (Anselin, 1988). If only residual SAC is present, OLS estimates are inefficient, although unbiased and consistent. However, if a spatial lag process exists, OLS produces biased and inconsistent estimates. The Lagrange multiplier statistic (LM) on the estimated OLS residuals is applied to decide between these two alternatives.
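For orientation, the two special cases can be written in a common matrix notation. The symbols are chosen here for illustration and may differ from the notation used elsewhere in the paper: y is the vector of (logged) prices, X the covariate matrix, W a spatial weight matrix, ρ and λ the spatial autoregressive parameters and ε an independent error term.

```latex
% Spatial lag (SAR) model: spatially lagged dependent variable
y = \rho W y + X\beta + \varepsilon

% Spatial error model: spatial autoregressive error process
y = X\beta + u, \qquad u = \lambda W u + \varepsilon
```

Setting ρ = 0 or λ = 0 reduces either form to the standard regression model, which is why the LM statistics computed on the OLS residuals can be used to choose between them.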
The SAR, which is employed in this study, is written as
where,
Equation (2) shows that this model adjusts for SAC in the dependent variable, interpretable as spillover effects. Bivand et al. (2008) discuss several options for the specification of
Spatial two stage least squares with spatial heteroscedasticity and autocorrelation consistent standard errors
Homoscedasticity, as stipulated in both the OLS and the SAR model, is a strong and restrictive assumption (Kelejian and Prucha, 2007). Recently, however, Piras (2010) provided functions for robust inference in the presence of spatial heteroscedasticity and autocorrelation (SHAC) across spatial units estimated by a spatial two stage least squares procedure (S2SLS)
where,
Clearly, global models have the drawback that they assume homogeneous behaviour of the parameters over space, which may be a misspecification of reality and consequently may be locally biased. Therefore, locally weighted regressions provide significant advantages (McMillen, 2010).
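The point estimation behind a spatial two stage least squares procedure can be sketched as follows. This is a minimal illustration and not the implementation used in the study: it assumes the commonly used instrument set consisting of the covariates and their first and second spatial lags, a dense row-standardised k-nearest-neighbour weight matrix, and an intercept in the first column of X. The SHAC standard errors are not computed, and all function names are hypothetical.

```python
import numpy as np


def knn_row_standardised(coords, k):
    """Dense k-nearest-neighbour weight matrix whose rows sum to one."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a house is not its own neighbour
    nearest = np.argsort(d, axis=1)[:, :k]      # indices of the k closest houses
    W = np.zeros_like(d)
    W[np.arange(len(coords))[:, None], nearest] = 1.0 / k
    return W


def s2sls(y, X, W):
    """Spatial 2SLS point estimates for y = rho * W y + X beta + eps.

    X is assumed to hold the intercept in its first column; the constant is
    left out of the spatial lags because W @ 1 = 1 for row-standardised W.
    """
    Xs = X[:, 1:]
    H = np.column_stack([X, W @ Xs, W @ (W @ Xs)])       # instruments
    Z = np.column_stack([W @ y, X])                      # lag term plus covariates
    Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]     # first stage projection
    delta = np.linalg.solve(Z_hat.T @ Z, Z_hat.T @ y)    # second stage
    return delta[0], delta[1:]                           # rho, beta
```

In noise-free simulated data the procedure recovers ρ and β exactly, which is a convenient sanity check before moving on to robust inference.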
3.2 Local and Semi-local Models
Geographically weighted regression
GWR performs a series of weighted least squares regressions on subsets of the data, where the influence of an observation i decreases with the Euclidean distance to a regression point j. These distance-dependent weights are determined by a kernel function, such as the bi-square, Gaussian or tricube function. More important for model estimation than the kernel type is the bandwidth, that is, the range within which data points are considered for each local regression (Fotheringham et al., 2002). As an alternative to a predefined and fixed bandwidth, an adaptive bandwidth has proven highly suitable in practice (McMillen and Redfearn, 2010), reflecting a relation between the density of the regression points and the bandwidth (i.e. denser points yield smaller bandwidths and vice versa). This reduces large estimation variances in sparsely sampled areas. Often, the adaptive bandwidth is determined by cross-validation (CV; Farber and Páez, 2007). To obtain parameter surfaces of the coefficients at unobserved locations, geostatistical algorithms (Pebesma, 2004) can be employed (for example, Yu et al., 2007).
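The local estimation described above can be sketched in a few lines. The sketch assumes a Gaussian kernel whose bandwidth at each regression point equals the distance to the k-th nearest observation (counting the regression point itself), which is one of several possible adaptive schemes. The function names are hypothetical, and the bandwidth k is taken as given rather than selected by CV.

```python
import numpy as np


def adaptive_gaussian_weights(coords, point, k):
    """Gaussian kernel weights with the bandwidth set to the k-th smallest distance."""
    d = np.linalg.norm(coords - point, axis=1)   # Euclidean distances to the point
    bandwidth = np.sort(d)[k - 1]                # requires k >= 2 at observed points
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def gwr_coefficients(coords, X, y, k):
    """Weighted least squares at every observation; returns an (n, p) array."""
    n, p = X.shape
    betas = np.empty((n, p))
    for j in range(n):
        w = adaptive_gaussian_weights(coords, coords[j], k)
        XtW = X.T * w                            # equals X' diag(w)
        betas[j] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas
```

In densely sampled areas the k-th neighbour is close, so the bandwidth shrinks, while in sparse areas it widens. If the true coefficients are spatially constant and the data are noise-free, every local fit reproduces the global coefficients.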
Mixed geographically weighted regression
MGWR (Fotheringham et al., 2002) is a statistically more parsimonious version of the GWR, where some coefficients (usually those with non-significant variation over space) are kept constant, saving degrees of freedom and improving the efficiency of the coefficient estimators (Wei and Qi, 2012). Such a model is written as follows
where, yi is the logarithmically transformed sales price of observation i;
The MGWR calibration procedure can be achieved by a multistep algorithm (Fotheringham et al., 2002), a two-step procedure (Mei et al., 2006) and a constrained type (Wei and Qi, 2012). Due to easier implementation, the multistep algorithm is applied
Each global covariate
The residuals of the response are regressed on the residuals of the global covariates using OLS:
The estimated global effects are then subtracted from the original response variable, resulting in a partial residual of the response:
Finally, these adapted prices are regressed on the spatially varying covariates by using the basic GWR method, resulting in the varying GWR coefficients:
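The multistep calibration outlined above can be sketched as follows. This is an illustrative reading of the steps and not the code used in the study: it assumes a Gaussian kernel with an adaptive bandwidth given by the distance to the k-th nearest observation, the same k in every step, and an intercept placed among the spatially varying covariates. All names are hypothetical.

```python
import numpy as np


def local_solvers(coords, X, k):
    """C[j] = (X' W_j X)^{-1} X' W_j for each regression point j; shape (n, p, n)."""
    n, p = X.shape
    C = np.empty((n, p, n))
    for j in range(n):
        d = np.linalg.norm(coords - coords[j], axis=1)
        w = np.exp(-0.5 * (d / np.sort(d)[k - 1]) ** 2)   # adaptive Gaussian kernel
        XtW = X.T * w
        C[j] = np.linalg.solve(XtW @ X, XtW)
    return C


def mgwr_multistep(coords, X_global, X_local, y, k):
    """Return (constant coefficients, (n, p) array of spatially varying coefficients)."""
    C = local_solvers(coords, X_local, k)
    S = np.einsum("jp,jpn->jn", X_local, C)   # GWR hat matrix of the local part
    R = np.eye(len(y)) - S                    # maps a variable to its GWR residuals
    Xg_res = R @ X_global                     # GWR residuals of each global covariate
    y_res = R @ y                             # GWR residuals of the response
    a_hat = np.linalg.lstsq(Xg_res, y_res, rcond=None)[0]  # OLS on the residuals
    y_partial = y - X_global @ a_hat          # response net of the global effects
    b_hat = C @ y_partial                     # basic GWR on the partial residual
    return a_hat, b_hat
```

A call such as `mgwr_multistep(coords, X_global, X_local, y, k=177)` returns one constant coefficient per global covariate and one coefficient per observation for each local covariate. The bandwidth value here merely echoes the neighbour count reported later for the MGWR and would normally be chosen by CV.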
4. Study Area and Data
4.1 Housing Data
Housing data for Austria are provided by the UniCredit Bank Austria AG. The dataset consists of 3887 geocoded single-family homes. Figure 1 shows their spatial distribution. Clearly, observations are non-uniformly distributed and show a concentration in the eastern federal states. Individual transaction prices of house purchases recorded in Euros have been collected from 1998 to 2009, together with the physical properties of the houses, in order to estimate the value of the collateral for mortgages.

Figure 1. Spatial distribution of houses depicted as points and as normalised kernel density estimation. The larger black point symbols represent the nine Austrian federal state capitals within each federal state (black boundary lines).
Official census data published by Statistics Austria and Michael Bauer Research GmbH for 2001 and 2009 respectively, describe the socioeconomic characteristics of each municipality and enumeration district (thereafter called neighbourhood covariates). Table 1 provides an overview of the data.
Table 1. Description of the variables.
4.2 Covariates and Their Expected Effects
Physical covariates
Nine physical covariates, measuring the size of the house and describing the quality of the house, are included. A positive effect of the floor area and the size of the plot the house is built upon is expected. In line with Malpezzi (2003), both covariates are logarithmically transformed to account for the multiplicative structure. Furthermore, the market value of a house is dependent on its structural condition and architecture; that is, the efficiency of heating, the presence of a garage and basement as a positive asset, as opposed to a traditional top-floor attic which might reduce the price ceteris paribus due to the limitation of the amount of useable area.
Temporal covariates
The age of the building at a given time of sale reflects property depreciation over time and should decrease house prices, notwithstanding a vintage effect, having an opposite effect (Can, 1998). Stevenson (2004) proposes an additional quadratic age term to capture non-linearities (see also Helbich et al., 2013b). The year of the house purchase can be considered as the remaining unexplained temporal heterogeneity and is a measure for the quality-adjusted development of prices over time.
Neighbourhood covariates
Both the purchase power index and the proportion of academics reflect disposable income and economic status. These covariates should therefore affect house prices positively. In contrast, the population age index reflects excess of age and serves as a proxy for structural weakness. This index should have a negative effect on house prices. Also, the unemployment rate reflects economic weakness and should have a negative impact. Furthermore, urban economic theory (for example, McDonald, 1997) states that shorter commuting distances to centres of economic activity should raise property prices, which is why a high commuter index should tend to affect prices positively. Although close proximity to these centres can provide certain amenities (for example, functional services), disamenities such as environmental pollution might also occur. Therefore, the effect of commuting distances is unclear. Logged population density measures urbanity. As land becomes more valuable in densely populated areas, a positive effect on house prices is expected. Finally, the quality of living within a municipality is quantified by a weighted average of the dwelling category according to the Austrian rental law. It is assumed that prices are higher in areas with largely high-quality dwellings.
Compositional effects
Neighbourhood effects can be used at the level of the enumeration district or the municipality. In the latter case, compositional effects (Goldstein, 2011) allow neighbourhood variables to be integrated at both hierarchical levels in which they have been measured. The difference between a hierarchy level (for example, enumeration district) and the average on the higher level (for example, municipality level) is considered. The resulting effect is then interpretable as a level-specific differential effect of the higher level.
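The construction of such a differential term can be sketched as follows. The sketch assumes, purely for illustration, that the municipality value is the unweighted mean of its enumeration districts. The source does not state this, and in practice the municipality figure may be measured directly. The function and variable names are hypothetical.

```python
import numpy as np


def compositional_terms(district_values, municipality_ids):
    """Split a neighbourhood covariate into a municipality level and a deviation.

    Returns the municipality mean repeated for every district and the
    difference between each district value and that mean.
    """
    district_values = np.asarray(district_values, dtype=float)
    _, inverse = np.unique(np.asarray(municipality_ids), return_inverse=True)
    sums = np.bincount(inverse, weights=district_values)   # sum per municipality
    counts = np.bincount(inverse)                          # districts per municipality
    municipality_mean = (sums / counts)[inverse]
    return municipality_mean, district_values - municipality_mean


# Hypothetical proportions of academics (per cent) in five districts of two municipalities.
level, differential = compositional_terms([10, 14, 20, 30, 40], ["A", "A", "B", "B", "B"])
```

Both columns can then enter the regression, so that the coefficient of the deviation captures the additional within-municipality effect.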
Regional indicators
Each house is nested within one of the nine Austrian federal states. However, due to a sparse sample in the western federal states of Vorarlberg and Tyrol, both regions are considered as one spatial unit in the global models. Such ad hoc stratifications are commonly used to model small-area spatial heterogeneity (for example, Bischoff and Maennig, 2011). This kind of unexplained spatial heterogeneity is accounted for by using a regional indicator specification.
5. Results
5.1 Descriptive and Exploratory Analysis
The mean transaction price of a house is approximately 167,200 Euro, with a minimum of 30,000 Euro (in Haugschlag, next to the border with the Czech Republic) and a maximum of 550,000 Euro in Mödling, to the south of Vienna. Initially, for exploratory spatial analysis, the weight matrix, reflecting the houses’ spatial configuration, is defined. Distance-based weights are rejected, as a distance of 21.6 km would have been necessary to avoid houses without any neighbours, which seems unrealistic. Instead, a k-nearest neighbour definition with row standardisation is used, accounting for heterogeneously distributed dwellings (Bivand et al., 2008). To be consistent with the global models, a commonly used value of k = 5 is chosen (for example, Wilhelmsson, 2002). The impression of spatially autocorrelated dwelling prices is supported by a significant Moran’s I statistic (I = 0.286, p < 0.001). Dubin (1998) claims that, because residents share similar socioeconomic characteristics and quality of services, such patterns are common in real estate analysis and may affect subsequent regression analysis.
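The exploratory step can be illustrated with a short sketch that builds a row-standardised k-nearest-neighbour weight matrix and computes Moran's I. It uses dense matrices, so it is only practical for moderate sample sizes, and it omits the significance test. The function names are hypothetical and do not come from the software used in the study.

```python
import numpy as np


def knn_weights(coords, k=5):
    """Row-standardised k-nearest-neighbour weight matrix (dense)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # exclude each house from its own list
    nearest = np.argsort(d, axis=1)[:, :k]
    W = np.zeros_like(d)
    W[np.arange(len(coords))[:, None], nearest] = 1.0 / k   # each row sums to one
    return W


def morans_i(values, W):
    """Moran's I: (n / sum of weights) * (z' W z) / (z' z), with z the centred values."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)
```

With row standardisation the sum of all weights equals n, so the leading factor is one and the statistic reduces to the slope of the spatially lagged values on the values themselves. Values near zero indicate no spatial pattern, and positive values indicate that similar prices cluster.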
5.2 Global Models
This section reports the estimation results of the OLS, SAR and S2SLS models. To realise a more parsimonious OLS model, a stepwise variable selection procedure minimising the Akaike Information Criterion (AIC) is applied. With a highest value of around 2, the generalised variance inflation factors give no indication of multicollinearity. According to the adjusted R2, the OLS model (Table 2) explains around 47 per cent of the variation in logged prices. Excluding the regional dummies lowers the adjusted R2 and increases the AIC by about 204. Further model inspection shows inconsistency with the Gauss–Markov assumptions. The Breusch–Pagan test indicates heteroscedasticity (BP = 166.519, p < 0.001). This is a recognised problem in econometrics and, hence, White’s heteroscedasticity consistent coefficient covariance matrix is calculated (Kleiber and Zeileis, 2008). However, the adjusted standard errors show only marginal differences. Furthermore, these results are based on the incorrect assumption of spatially uncorrelated model residuals (I = 0.080, p < 0.001), leading to inefficient parameter estimations, with the resulting confidence intervals being incorrect. This misspecification is a result of existing spatial effects and must be taken into account.
Autoregressive models adjust for SAC. On the basis of the robust LM test (LM = 28.506, p < 0.001) it is concluded that a respecification in terms of a SAR model is a proper alternative, which is in line with Yu et al. (2007). Thus, the OLS model results are not only inefficient, but also inconsistent, and an additional spatial lag based on k = 5 nearest neighbours is considered. Tests show that the parameter estimations are robust against changes in the weight matrix (for example, different k-values, distance decay functions), which lead to only marginal changes in the coefficients. Again, variables are selected based on the full model. Predictors not significant at the 0.05 level are omitted stepwise. If the main effect shows no significance, the corresponding compositional effect is also dropped. Overall, the SAR model (Table 2) performs significantly better than the OLS model, confirmed by the AIC reduction. A comparison of the model predictive quality, applying the root mean square error (RMSE), also supports the SAR model. Dropping the federal state dummies increases both the RMSE and AIC; ρ is highly significant. Thus, the model captures SAC effectively and points to relevant spillover effects. Consequently, the error term is not plagued by SAC. However, the BP test rejects the homoscedasticity assumption (BP = 171.308, p < 0.001). Hence, heteroscedasticity and autocorrelation consistent standard errors, applying the S2SLS model, are required. As before, non-significant predictors are removed in a stepwise manner, starting with the full model.
Table 2. Estimation results of the global models.
Notes: significance: *** 0.001; ** 0.01; * 0.05.
Both the SAR and S2SLS coefficients of the neighbourhood covariates are smaller compared with the OLS model, showing the bias induced by the neglect of SAC in the OLS model. A comparison between SAR and S2SLS with a triangular kernel and k = 5 nearest neighbours indicates that the coefficients remain virtually unchanged. The S2SLS results in slightly smaller standard errors compared with the SAR model. The final S2SLS consists of 17 significant covariates. Both covariates quantifying the size of a house are positively related to the house price. A 1 per cent increase in total floor area results in a 0.4 per cent increase in the price, and the plot space shows a positive elasticity of 0.1. A poor overall house condition leads to a loss in value of approximately 4 per cent, whereas a poor heating system reduces house prices by more than 10 per cent compared with a high quality heating system. Furthermore, a poor quality bath leads to a price reduction of 7 per cent and a traditional attic reduces dwelling prices by up to 2 per cent. A poor quality garage reduces prices by 8 per cent. Other properties, such as the existence of a basement (+13 per cent) and terrace (+7 per cent), have the expected positive effects on housing prices. Both temporal effects are statistically significant. The house’s age has a marginally negative impact on the price. Contradicting original expectations, the quadratic age term is insignificant. The year of purchase, modelled as a linear effect, suggests that there has been a price increase over time.
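Because the dependent variable is the logged price, coefficients translate into percentage effects differently for logged covariates and for dummy variables. The sketch below shows the standard conversions. It is not stated in the source whether the percentages reported above are raw coefficients or converted values, and the dummy coefficient of 0.12 used here is a hypothetical number chosen for illustration.

```python
import math


def elasticity_effect_percent(beta, pct_change):
    """Percentage price change for a pct_change per cent rise in a logged covariate."""
    return 100.0 * ((1.0 + pct_change / 100.0) ** beta - 1.0)


def dummy_effect_percent(beta):
    """Percentage price change when a dummy variable switches from 0 to 1."""
    return 100.0 * (math.exp(beta) - 1.0)


floor_area = elasticity_effect_percent(0.4, 1.0)   # elasticity of 0.4 from the text
basement = dummy_effect_percent(0.12)              # hypothetical coefficient
```

For small coefficients the raw value is a close approximation, but the gap widens as the coefficient grows, which matters when comparing larger dummy effects.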
As stated above, compositional effects are only used if both levels are significant. In the S2SLS model, the main and the differential effects of the proportion of academics show significance. A one percentage point increase in the proportion of academics in a municipality results in a 0.8 per cent increase in house prices. The corresponding differential effect shows that, within a municipality, a one percentage point increase of this proportion results in a further 0.5 per cent increase in house price value. An increase in the purchase power index also has a positive effect (+0.5 per cent). Furthermore, an increase in the average population age of one year, reflecting structural weaknesses, reduces expected house prices by some 2 per cent. Densely populated areas show a marginally positive elasticity. Although the compositional effect for the rate of unemployment is significant, it is removed because its main corresponding effect is not significant. Finally, with the exception of Upper Austria, the federal state dummies are highly significant and indicate SH. For instance, properties located in the structurally weaker state of Burgenland have a significantly lower price compared with houses situated in Lower Austria, serving as reference category.
5.3 Local and Semi-local Model
Next, GWR and MGWR are estimated. A data-driven approach to explore functional instability replaces the federal state dummies. For both models, the analysis is restricted exclusively to the main effects, because including compositional effects would make the local effects complex and difficult to interpret. Because of a significant compositional effect of the unemployment rate and an insignificant corresponding main effect in the S2SLS model, the unemployment rate is additionally considered at the enumeration district level. To achieve a trade-off between computational burden and estimation accuracy, a random sample of 50 per cent of the overall dataset is used and only those variables are chosen that are significant in the global S2SLS model (Table 1).
First, it is necessary to distinguish between global and local effects using non-parametric GWR with a Gaussian kernel. CV determines an optimal adaptive bandwidth that includes 227 neighbouring houses. The test statistic of Leung et al. (2000a) does not reject the H0 of stationary parameters for six out of 16 variables. In an MGWR specification, those six variables are thus held spatially constant, while the remaining 10 variables are allowed to vary. Again applying a Gaussian kernel function, CV results in an optimal bandwidth of 177 neighbouring houses for the MGWR. Compared with the GWR, the AIC score is noticeably reduced from 1396 (GWR) to 1381 (MGWR), while the RMSE of both models is virtually identical (0.338). Statistically, the MGWR is thus favoured. A sensitivity analysis with an alternative kernel (i.e. a bi-square kernel) shows no significant differences.
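The local estimation step underlying GWR can be sketched as follows. The example is a simplified illustration, not the estimation routine used in this study: it uses a single covariate, toy coordinates and hypothetical function names, and it assumes that the adaptive bandwidth at a location equals the distance to its k-th nearest neighbour.

```python
import math

def adaptive_gaussian_weights(coords, focal_index, k):
    """Gaussian kernel weights around one observation.

    Assumption: the bandwidth is the distance from the focal observation
    to its k-th nearest neighbour (1 <= k < n), so it adapts to the local
    data density. The bandwidth must be positive.
    """
    focal = coords[focal_index]
    dists = [math.dist(focal, c) for c in coords]
    # Position 0 of the sorted distances is the focal point itself.
    bandwidth = sorted(dists)[k]
    return [math.exp(-0.5 * (d / bandwidth) ** 2) for d in dists]

def local_fit(x, y, w):
    """Weighted least squares for y = a + b * x; returns (a, b)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Toy data (hypothetical): six houses along a line, one covariate x,
# and a response generated from y = 2 + 3 * x without noise.
coords = [(float(i), 0.0) for i in range(6)]
x = [1.0, 2.0, 4.0, 3.0, 5.0, 7.0]
y = [2.0 + 3.0 * xi for xi in x]

# Local regression at the third house using its 3 nearest neighbours
# to set the bandwidth.
weights = adaptive_gaussian_weights(coords, focal_index=2, k=3)
intercept, slope = local_fit(x, y, weights)
print(intercept, slope)
```

Repeating the fit for every observation yields one coefficient set per location. In an MGWR, the covariates for which stationarity is not rejected would instead receive a single coefficient for the whole study area.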
The spatial variation of the model fit ranges from 0.24 to 0.49. The lowest values are located in Vienna and its surroundings and in the south of Austria. The best fit is achieved in the northern parts of Austria. The reason for the low fit in Vienna might be that flats dominate the market in the capital. Figure 2 visualises the surfaces of the local R² and the predicted dwelling prices in Euro, obtained through ordinary kriging (Pebesma, 2004). The F(3)-test (Leung et al., 2000a) confirms that all spatially varying covariates are now statistically significant. Moran’s I (Leung et al., 2000b) detects no significant residual dependence at the 0.01 level.
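For illustration, the Moran’s I statistic on which the residual test builds can be computed as in the sketch below. It assumes a binary contiguity weight matrix and toy residuals, and it returns only the statistic itself; the inference procedure of Leung et al. (2000b) for GWR residuals is not reproduced.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weight matrix
    given as a list of lists (weights[i][j] links observation i to j)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)

# Toy example (hypothetical): four locations along a line, where each
# location is a neighbour of the adjacent ones.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [1.0, 1.0, -1.0, -1.0]    # similar residuals lie side by side
alternating = [1.0, -1.0, 1.0, -1.0]  # neighbouring residuals differ in sign

i_clustered = morans_i(clustered, w)
i_alternating = morans_i(alternating, w)
print(i_clustered, i_alternating)
```

Positive values indicate that similar residuals cluster in space, negative values indicate a checkerboard-like pattern, and values near the expectation under no dependence suggest that the model has absorbed the spatial structure.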

Figure 2. Local model performance and the predicted house prices.
Table 3 lists the MGWR results and their global counterparts. Additionally, Figures 3–5 visualise the parameter surfaces. Based on pseudo t-values, only coefficients that are significant at α = 0.1 are shown in grey tones.
Table 3. Estimation results of the semi-local model.
Notes: significance: *** 0.001; ** 0.01; * 0.05.

Figure 3. Physical covariates and temporal effect.

Figure 4. Temporal effect and neighbourhood covariates.

Figure 5. Neighbourhood covariates.
The negative MGWR intercept is not, by itself, meaningful. The spatially fixed coefficients consist solely of physical covariates, meaning that buyers value these housing characteristics equally across space. Furthermore, these stationary coefficients illustrate that real estate markets are economically connected. The MGWR coefficients do not deviate considerably from the S2SLS estimates, which is an indication of model suitability.
The total floor area is statistically significant across the entire study site and shows strong spatial variation (Figure 3). Its positive effect on house prices is less pronounced closer to Vienna and is largest in parts of the federal states of Salzburg and Tyrol. These areas include well-established winter sports sites (for example, Kitzbühel) where tourism and vacation homes, secondary residences, etc. increase house prices. In particular, the scarcity of space leads to higher prices for additional floor area in these regions. In contrast, a different pattern can be found for logged plot space. The marginal values are more distinct in the north-east of Vienna’s suburbs (Figure 3). A 10 per cent increase in plot space results in a price increase of 1.5 per cent, and prices decrease with increasing distance from Vienna. Additional plot area has little effect on house prices in southern and central Austria, meaning that in these regions the proportion of land value in the total value of the house is relatively low. This leads to the conclusion that plot space is sufficiently available, which lowers demand. Inferior heating systems have a negative effect (Figure 3), yet this effect is not significant in central Austria and has its most negative impact in the surroundings of Vienna (up to −21 per cent). Hence, in these suburban areas the appreciation of a high technical standard is relatively high. It turns out that in these regions there is a large share of new construction, so the spatial variation of this effect is likely to be due to an unobserved interaction effect between the age of the house and the technical standard of the heating system. Both temporal effects possess a significant non-stationary pattern (Figures 3 and 4). Most notably, the age effect is rather weak in Vienna (−0.2 per cent for an additional year of age), while in Linz and its surroundings the marginal effect is close to −1 per cent. The lowest age effects are observed in central Upper Austria. In the Vienna region, there seems to be a tendency towards renovation of older buildings. This can also be interpreted with respect to the effect of an inferior heating system, as this effect is very pronounced where the effect of the age is weak. The year of purchase is most significant in the western federal states, whereas the northern surroundings of Vienna show the highest increase in house prices over time (2 per cent per year). This is a consequence of the on-going suburbanisation, provoking increasingly high demand.
The unemployment rate (Figure 4) is not significant for the northern regions of Austria but, as expected, is negatively related to house prices in the southern federal states. Unemployment yields the strongest negative effect close to the cities of Salzburg and Graz. The purchasing power index (Figure 4) shows both negative and positive effects, but only the latter are statistically significant. The highest positive effects are achieved in the federal state of Salzburg (+1 per cent). The proportion of academics, in Figure 4, shows a slight north-west to south-east trend, where maximum values can be found in the northern parts of Salzburg. In this area, an increase in the proportion of academics of one percentage point results in a 3 per cent increase in dwelling prices, holding all other effects constant. This compares with just 0.5 per cent in the south-western parts of Austria (for example, Innsbruck). Besides this, Figure 5 illustrates a limitation of MGWR. For example, the area around the city of Salzburg shows a high impact of the proportion of academics on price. Contrary to our expectation that high purchase power affects house prices positively in this area, the coefficient is partly negative, but not significant at all. Local sign reversals are also reported in Wheeler and Tiefelsdorf (2005) and Wheeler (2007). Possibly, this is caused by local correlations between these covariates. The age index in Figure 5 shows a slight east–west divide. In the western federal states of Salzburg, Tyrol and Vorarlberg, an increase in average population age of one year results in a price reduction of approximately 5 per cent. Outside the cities of Linz and Vienna, this covariate shows no significance at all. Finally, the logged covariate population density (Figure 5) has the highest elasticity in central Burgenland, while in the federal state of Salzburg and most parts of Upper Austria this covariate has no explanatory power.
By visualising the predicted house prices in Figure 2, a distinct pattern of house prices becomes evident. Prices decline smoothly and relatively symmetrically with distance to the urban cores, and peripheral areas unsurprisingly yield lower prices. These results agree with monocentric urban models. Comparing the price gradients between federal state capitals, substantial differences in slope are noticeable. Significantly higher prices are achieved especially in Vienna, Salzburg and Innsbruck and their surroundings. Additionally, localised peaks with high house prices, primarily in scenic and tourist-oriented areas, are noticeable. In contrast, significantly lower prices are predicted within the structurally and economically weaker Burgenland as well as the northern parts of the Weinviertel next to the Czech Republic and Slovakia.
6. Conclusions
Hedonic modelling is frequently used to assess the price of real estate. Although SH in the price function is obvious, it is a priori unclear how to model it, which constitutes a major challenge. Analysing data for single-family homes throughout Austria from 1998 to 2009, the prime intention of this research is an empirical comparison and evaluation of global and locally weighted state-of-the-art hedonic models in order to model SH.
In the global specifications (comprising OLS, SAR and S2SLS models), SH is considered by means of federal state dummies, representing homogeneous regions. The empirical model comparison reveals that, independent of the model, ignoring SH always leads to a lower model fit and lower prediction accuracy. The OLS estimates confirm findings reported in Yu et al. (2007) that the non-spatial model systematically tends to overestimate the importance of covariates. Besides correcting for SAC, the SAR and S2SLS models also fit substantially better than OLS. Moreover, the S2SLS coefficients show only minor changes compared with the SAR model, which is similar to Bivand (2010). The results clearly confirm that heterogeneity must always be taken into account, even if dummy variables are obviously less suitable to explain micro-geographical relationships. Exogenously defined regional indicators are only partially capable of accounting for SH, predominantly when SH beyond the selected dummies is present. Due to remaining and unmodelled SH, this requires a technical amendment in terms of an S2SLS considering SHAC to achieve a well-specified hedonic model.
Having provided evidence that global specifications are not fully capable of modelling SH, functional instability is further explored by LWRs (namely, GWR and MGWR), which avoid the observed weaknesses of fixed effects. Both GWR and MGWR are data-driven approaches that model smoothly varying marginal prices without relying on regional dummies to capture SH, and they are consequently less prone to MAUP effects. In contrast to previous studies by Bitter et al. (2007), Yu et al. (2007), Páez et al. (2008) and Hanink et al. (2010), which assume without data support that all covariates have non-stationary effects on house prices, this research proposes MGWR, which deals with both stationary and non-stationary effects simultaneously. Compared with the global models, MGWR is evidently more flexible, while being more parsimonious than GWR, which improves model efficiency (Wei and Qi, 2012). Relative to the global models, both LWRs reduce the prediction error. It is assumed that the spatial heteroscedasticity in the SAR model originates from an inappropriate selection of fixed effects, which is also resolved by LWRs. Moreover, the MGWR shows that significant spatial variation is present in some of the estimated parameters, while the global effects provide evidence for policy-based linkages and an economically connected housing market across Austria, which would be neglected by the traditional GWR.
There are, however, some limitations to this research. The variable selection is based on the global S2SLS model. Therefore, it is possible that opposing effects in different parts of the study area cancel each other out, so that a covariate incorrectly appears insignificant in the global model and is hence omitted from the LWRs. This can cause, among other problems, omitted variable bias, although the non-significant residual SAC argues against this concern. Furthermore, computational burdens limit the use of MGWR for large datasets. Hence, more computationally efficient methods are promising, such as tensor product smoothing (Wood, 2006; Helbich et al., 2013b).
Finally, this study results in some general implications for policy-makers, mortgage lenders, appraisers, etc. LWRs produce meaningful and reliable estimates which are highly suitable to inform about local housing conditions. The spatial diversity of the coefficients is of utmost importance for locally acting decision-makers, requiring explicit knowledge of the local or regional housing market. This serves to refine policies and gain deeper understanding about local house price anomalies. Thus, it is recommended that LWR should occupy a more central role in practitioners’ toolboxes and methodologies.
Funding
Marco Helbich gratefully acknowledges the Alexander von Humboldt Foundation for supporting this research.
