Abstract
Asking price has a fundamental influence on market behaviour and on understanding how attributes affect the perceived price of housing. This paper employs a large database of houses from the Spanish housing market to estimate the role of attributes in asking price formation. Spatio-temporal autoregressive (STAR) methodology and a general linear model (GLM) with random effects are used to extract the price of attributes and their spatial and temporal patterns. Results show that the pricing of attributes varies by geographical region and over time, with property size and economic and demographic attributes being the key variables explaining asking price formation.
1. Introduction
Research into house prices and heterogeneity in the housing market from both theoretical and empirical perspectives has centred on the application of hedonic techniques which consider that price is obtained from a combination of attributes reflecting household preferences. Hedonic methods have largely been used to estimate the price of housing attributes, calculate price indexes adjusted by quality and determine the shadow price of housing components. For example, single-family homes in the same neighbourhood are likely to share similar structural characteristics (Basu and Thibodeau, 1998), while the weight of each attribute, relative to the others, reflects purchaser preference or taste (Tse, 2002). It is argued that weights are invariant inside a homogeneous sub-market (Sun et al., 2005), but that weights change among sub-markets reflecting differences in demand preferences. In this context, space–state techniques are used in the estimation of the price of housing attributes to avoid spatial and time autocorrelation that is usually present in information obtained at the house (transaction) level and which may bias estimated parameters.
The aim of this paper is to estimate the price of attributes for houses in the Spanish market, to test differences among regions and assess how variation across urban areas affects house prices. A hedonic method is used to define the implicit prices of characteristics using asking prices, the supply side of the market. Asking prices encapsulate a set of housing characteristics that refer to both supplier features and buyer features with the residuals considered to capture information about sellers’ preferences. The use of asking or list price is not unusual in the housing literature; for instance, time on the market studies are often based on list price (Knight et al., 1998; Arnold, 1999; Anglin et al., 2003). There is also a body of literature which has analysed the search process and its impacts on transaction prices, comparing asking prices with selling or transaction prices (Genesove and Mayer, 1997; Harding et al., 2003). Pryce and Mason (2006) observe that indices based on sale prices carry transaction biases in that properties which trade form the respective samples with no cognisance paid to those which do not trade. Furthermore, McGreal et al. (2010) show that asking prices tend to be close to transaction prices and only deviate to any appreciable extent in rapidly rising or falling markets with asking prices lagging sale price in the up-cycle and the opposite effect in the down-cycle.
In assessing how implicit prices vary over time and space, the paper seeks to separate the relative importance of these two effects. The former captures the dynamics of change whereas the latter embraces different market and economic structures, household variation and sentiment. The paper extends the existing literature base by examining whether parameters change and tests the hypothesis that time and space effects do not modify the value of hedonic parameters. Controlling for time and space to reduce autocorrelation problems in hedonic studies has been previously discussed by authors such as Tse (2002); in this paper, the particular research question tested is whether parameter weights are stable or change over time at an urban level and how dynamics across urban areas may modify the price assigned to attributes.
The paper is organised as follows. The second section reviews the literature on hedonic house price modelling and state–space models. Section 3 provides details of the database and variables used in the analysis and model implementation. Sections 4 and 5 present the results of the analytical component of the study; the former considers spatio-temporal autoregressive (STAR) models and the latter the general linear model (GLM). Conclusions stemming from the paper are discussed in section 6.
2. Literature Review: Hedonic Price Models
The literature on hedonic models is well established and used mostly to estimate quality adjusted house price indices (Rosen, 1974; Linneman, 1980; Haurin et al., 1991; Peek and Wilcox, 1991; Geltner, 1993; Adair et al., 1996; Clapp, 2004) or to test the impact of different characteristics on prices (Goodman and Thibodeau, 1995; Clapp and Giaccotto, 2002; Bourassa et al., 2005). The majority of these papers are based on the seminal work of Rosen (1974) who argued that the estimated hedonic price characteristics identify neither demand nor supply but are described by a joint envelope function containing both housing attribute groups. However, certain authors have criticised the adherence to hedonic models maintaining that they are characterised by econometric problems and thus provide limited accuracy in the estimation of house prices (Goodman and Thibodeau, 1995). This has raised questions concerning the ability to capture the full behaviour of house prices, in light of criticisms that hedonic models focus on internalising the dynamic evolution of the market (Case and Wachter, 2005).
The potential for bias in hedonic models, arising from spatial and temporal autocorrelation effects, results in inefficient estimated parameters with large errors (Anselin, 1999; Tse, 2002) while failure to correct for autoregressive processes could lead to strongly biased price indexes (Hwang and Quigley, 2010). Two further statistical problems identified in the literature can affect the estimation. First, attributes change among locations and market segments due to sub-market effects (Maclennan and Tu, 1996; Adair et al., 1996; Tu et al., 2007); and, secondly, highly correlated attributes produce multicollinearity which impedes robustness in regression estimated parameters. Furthermore, a lack of sufficient attributes creates an omitted variable problem in the model with residual correlation (Pace et al., 1998; Basu and Thibodeau, 1998; Des Rosiers et al., 2000). Thus, it is important to reduce the effect of autocorrelation by time and space.
The conventional linear hedonic model follows a functional form as in equation (1)
where, Ph = price of the house
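For reference, a linear hedonic specification of this kind is commonly written in the generic form below. The notation ($\alpha$, $\beta_k$, $X_{k,h}$, $v_h$) is assumed here for illustration and is not necessarily the exact form of equation (1): $X_{k,h}$ denotes the $k$-th attribute of house $h$, $\beta_k$ its implicit (shadow) price and $v_h$ the residual.

```latex
P_{h} = \alpha + \sum_{k=1}^{K} \beta_{k} X_{k,h} + v_{h}
```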
With observations belonging to different groups (i) across years (t), the model produces efficient parameters and robust results when X and v are assumed to be independent with
Three different forms of dependence create non-zero correlations adding unknown components to the residuals (Kuethe et al., 2008; Petersen, 2009; Tse, 2002). First, parameters that vary across housing attributes within a known and observed sample are the result of similarities among houses in a neighbourhood and differences between neighbourhoods,
Different econometric techniques have been developed to avoid the bias generated by autocorrelation and control for spatial dependence. In particular, STAR models have been utilised to avoid parameter mis-estimation. The STAR model (see Anselin, 1999; Anselin et al., 1992 and Pace et al., 1998) considers that both geographical and time dependences jointly affect housing attributes and could be parametrised in a matrix
The matrix
The STAR model is obtained by multiplying both sides of the equation by
Then
As
Anselin (1999) defined four types of STAR models. The full STAR time–space dynamic model, which includes all forms of dependence, follows equation (5)
where,
The econometric method to estimate STAR models depends on the specification and in this respect there is general agreement in the literature that OLS is an imperfect tool. However, Anselin (1999) considered that OLS is a good estimator when a pure-space recursive model is estimated as errors are defined as independent and satisfy the asymptotic assumption of the classical regression model. Other options to estimate the model include two-stage spatial least squares (2SSLS), which achieves the consistency and asymptotic normality properties of the standard 2SLS, the generalised method of moments (GMM) and non-parametric models.
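The two-stage idea can be illustrated with a minimal sketch on synthetic data. This is a generic spatial-lag setup, not the specification estimated in this paper: the ring-shaped weight matrix, the single regressor `x`, the choice of `W x` and `W² x` as instruments, and all function names are assumptions made for illustration.

```python
import numpy as np


def ring_weights(n):
    """Row-standardised weights: each unit's neighbours are the two adjacent units on a ring."""
    w = np.zeros((n, n))
    idx = np.arange(n)
    w[idx, (idx - 1) % n] = 0.5
    w[idx, (idx + 1) % n] = 0.5
    return w


def spatial_2sls(y, x, w):
    """Estimate y = rho * W y + a + b * x + e by two-stage least squares.

    The spatial lag W y is endogenous; W x and W^2 x serve as instruments.
    Returns the array [rho, a, b].
    """
    const = np.ones(len(y))
    wy = w @ y
    exog = np.column_stack([const, x])
    instruments = np.column_stack([exog, w @ x, w @ (w @ x)])
    # Stage 1: project the endogenous spatial lag onto the instrument set.
    gamma, *_ = np.linalg.lstsq(instruments, wy, rcond=None)
    wy_hat = instruments @ gamma
    # Stage 2: regress y on the fitted lag and the exogenous regressors.
    design = np.column_stack([wy_hat, exog])
    theta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return theta


def simulate(n=200, rho=0.4, a=1.0, b=1.0, noise_sd=0.05, seed=0):
    """Draw synthetic data from the spatial-lag model (illustrative values only)."""
    rng = np.random.default_rng(seed)
    w = ring_weights(n)
    x = rng.normal(size=n)
    e = rng.normal(scale=noise_sd, size=n)
    # Reduced form: y = (I - rho W)^{-1} (a + b x + e).
    y = np.linalg.solve(np.eye(n) - rho * w, a + b * x + e)
    return y, x, w


y, x, w = simulate()
rho_hat, a_hat, b_hat = spatial_2sls(y, x, w)
print(rho_hat, a_hat, b_hat)
```

With low noise the estimates should sit close to the simulated values (`rho = 0.4`, `b = 1.0`); a plain OLS regression of `y` on `W y` would instead be exposed to the simultaneity between `y` and its own spatial lag.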
Literature concerned with estimation of spatial-temporal effects in housing markets has embraced studies adopting varying econometric tools. For example, Bourassa et al. (1999) employed a sequential method, clustering the information using principal components analysis and estimating the hedonic price equation using a priori sub-markets. Clustering attributes in order to minimise collinearity in the hedonic estimation has been applied by Leishman (2009) using a random intercept model in the hedonic OLS specification of housing prices. Pace et al. (1998) employed maximum likelihood and Kuethe et al. (2008) used the flexible least squares (FLSE) methodology. Case et al. (2003) compared OLS, OLS with spatial trend and Dubin’s maximum likelihood while Caliman and di Bella (2011) used OLS in a spatio-temporal recursive model specification.
3. Data and Model Development
This paper is concerned with the variation of house prices in seven Spanish provinces over the period from 1995 to 2010, a period which incorporates the global financial crisis (GFC), with Spain amongst the countries most affected. The valuation database includes evidence from the whole of Spain but there is a strong regional presence in the provinces of Alicante, Valencia, Murcia, Castellón and the Balearic Islands, as well as significant activity in the two major provinces of Spain, Madrid and Barcelona. In total, the database includes 2 362 800 observations over the period considered.
The variables in the database are sub-divided into property variables (capturing structural attributes/building characteristics), neighbourhood and accessibility variables and socioeconomic characteristics. The dependent variable is the asking price of each property. In addition, the data include variables reflecting time and spatial characteristics. Table 1 provides a summary of the variables, their scaling and other characteristics.
Table 1. Summary of variables
Asking price (the median price) over the period 1995–2010 for the seven provinces is illustrated in Figure 1. Differences are apparent between the seven provinces. For example, in Madrid the asking price distribution is distinctly different from most other provinces, although Barcelona shows a similar pattern from 2002. Differences are also apparent concerning when asking prices started to increase rapidly. In the case of Murcia, Valencia, Castellón, the Balearic Islands and Alicante, strong growth commenced in 1999, whereas in Madrid and Barcelona this is observed later, in 2001.

Figure 1. Average asking prices by year, in euros.
The valuation database, as it contains observations at different points of time, is essentially a pool of data but also has the characteristics associated with panel data. A large number of different characteristics of the property are included, several of which are qualitative variables scaled from best to worst in an effort to capture the specific differences in these characteristics as part of the valuation, but raising the potential that the database contains a degree of endogeneity in similar attributes and non-independence.
Observations in the database are at a property-specific level with both spatial (location) and time dimensions in an appropriate format to utilise STAR models. However, the database is not geo-referenced, meaning that a conventional spatio-temporal matrix cannot be utilised. A further limitation is the lack of an exact date, with year being the only time reference. The database provides information about the location of each property and whether the comparable is located in a dependent municipality (social and economic) of a larger city, an autonomous city, a county capital or the province capital. These four levels describe an urban structure which is taken as the space reference for this paper and used to build the spatial matrix.
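A minimal sketch of how year and urban-level membership could be encoded as indicator matrices is given below. It assumes, for illustration only, that the four urban levels are coded 1 to 4 and that each observation carries a single year and level; the names `YEARS`, `URBAN_LEVELS` and `indicator_matrix` and the three observations are hypothetical and do not come from the valuation database.

```python
import numpy as np

YEARS = np.arange(1995, 2011)    # 16 annual periods, 1995-2010
URBAN_LEVELS = np.arange(1, 5)   # hypothetical codes 1-4 for the four urban levels


def indicator_matrix(values, categories):
    """Return an n x k 0/1 matrix with one column per category."""
    values = np.asarray(values)
    unknown = ~np.isin(values, categories)
    if unknown.any():
        raise ValueError(f"values outside the category list: {values[unknown]}")
    return (values[:, None] == np.asarray(categories)[None, :]).astype(float)


# Three made-up observations: year of valuation and urban level.
obs_year = [1995, 2003, 2010]
obs_level = [4, 1, 2]

time_matrix = indicator_matrix(obs_year, YEARS)           # shape (3, 16)
space_matrix = indicator_matrix(obs_level, URBAN_LEVELS)  # shape (3, 4)
```

Each row contains exactly one 1, so the matrices identify group membership rather than distance, which is the kind of information available when observations are not geo-referenced.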
For the analysis, the time–space recursive functional form proposed by Anselin (1999) to estimate a pseudo spatio-temporal hedonic model is adopted. The model is stated in equation (6).
where, yit is the element of the vector
It is assumed, following Dubé and Legros (2011), that time and spatial effects do not occur simultaneously allowing exploration of the separate effects of time and space on both asking prices and attributes with
Then
where,
The
where,
Such dimensions reflect the spatial dependence between urban areas—how they depend on each other administratively or for particular services (schools, health system, public services). Matrix
The time matrix is represented by the pooled set of dummy variables by year with n x 16 dimensions. The
4. Empirical Analysis and Modelling Space and Time–STM
The analysis seeks to identify the role of attributes in explaining the asking prices for properties over the period 1995–2010. Specific attention is paid to the role of demographic variables due to the population shock that occurred in Spain during the time-period covered by the analysis. The size of the database, the extent of the geographical coverage (seven provinces) and the length of the time-series (16 years) add complexity. In this respect, the analysis isolates the space and time independent effects to allow for the different provinces and to capture variation, controlled by quality.
A time–space recursive model using 2SLS with spatial parameters, following Anselin (1999), is estimated using the pseudo time–space matrix (
A number of variables are transformed into dummies such as type of building which generates a dummy variable for property quality. Age is also included with its squared function (in logs, lage and lage2 ) to control for the non-linearity associated with this variable. Resarea captures whether the property is located in a prime residential area (resarea = 1), in a mixed neighbourhood (resarea = 2) or in a secondary area (resarea = 3). The variable is transformed into dummies to avoid collinearity; its parameter captures the effect of primary homes relative to a secondary location. All continuous variables are transformed into natural logs.
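These transformations can be sketched for a single property as follows. The function and key names (`transform_attributes`, `larea`, `resarea_prime`, `resarea_mixed`) and the floor-area input are hypothetical, the choice of the secondary area as the omitted baseline is an assumption, and `lage2` is computed as the natural log of age squared, which is one possible reading of the variable's description.

```python
import math


def transform_attributes(floor_area_m2, age_years, resarea):
    """Return log-transformed and dummy-coded regressors for one property."""
    if floor_area_m2 <= 0 or age_years <= 0:
        raise ValueError("logs require strictly positive inputs")
    if resarea not in (1, 2, 3):
        raise ValueError("resarea must be 1, 2 or 3")
    return {
        # Continuous variables enter in natural logs.
        "larea": math.log(floor_area_m2),
        # Assumed reading of lage2: the natural log of age squared.
        "lage2": math.log(age_years ** 2),
        # Secondary area (resarea = 3) is the omitted baseline category (assumption).
        "resarea_prime": 1 if resarea == 1 else 0,
        "resarea_mixed": 1 if resarea == 2 else 0,
    }


example = transform_attributes(floor_area_m2=100.0, age_years=10.0, resarea=1)
```

Dropping one category keeps the dummies from summing to the constant term, which is the collinearity the dummy coding is meant to avoid.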
As the STM model includes the dependent variable in a log format, beta parameters represent pseudo elasticities in the log–log relationships. Thus each parameter is in effect the attribute’s shadow price representing how the owner values such attributes in the total asking price. The
The results of the model are shown in Table 2. Two variables were discarded from the calculation: urbanenv (as more than 93 per cent of the observations were classified as being in an urban environment) and log age, as the variable log age squared (lage2 ) was more efficient in capturing the effect of this parameter. Residuals of the model are small and bell-shaped suggesting that they are randomly distributed as white noise.
Table 2. 2SLS regression model in recursive space–time framework: STM model results
a Non-standardised parameters.
Notes: *** p <0.01; ** p <0.05.
The model gives results within acceptable margins with small standard deviations and errors for all provinces (with the exception of the Balearic Islands). Parameters are significant at an aggregate level for the seven provinces, an observation supported by the provinces’ parameter values. The model also includes time dummy variables representing the price index.
4.1 Socioeconomic Attributes
For the spatio-temporal model, all four attributes are statistically significant at the overall level and also for most provinces. Income is significant and positive in all models with a high power parameter in the Balearics and Castellón, between 0.3 and 0.5, but has lower impact (
Population attributes show the effect of migration and concentration arising from demand pressures with a positive effect on house prices. Two variables, density and changes in population, are included. Alicante (
The results indicate that the reaction of prices to an increase in population density is not consistent. Although the overall model returns a positive and statistically significant parameter (locations in dense areas are priced at a premium), at the urban level, half of the provinces have a negative and statistically significant parameter between house prices and population density. As the model controls for neighbourhood quality, the willingness to live in a high-quality dense urban area should yield a positive parameter reflecting the ability to pay for that location, while the desire for a less dense area should give a negative parameter. For example, density is strongly negative in Castellón (
4.2 Neighbourhood and Accessibility Attributes
These embrace a bundle of quality measures ranging from the extent of developed land to the quality of facilities. The analysis suggests that shopping has a strong impact in Madrid (
School quality has mainly a positive effect on house prices especially in Alicante, Murcia and Valencia, but is negative in Barcelona and has no clear effect in Madrid, the Balearics and Castellón. Sport facilities are statistically significant with a positive effect on prices in the Balearics, Barcelona and Castellón, but not in Madrid, Murcia or Valencia, while leisure facilities have a negative effect in the overall model. Health facilities have surprising results, with a negative effect on price in the overall model as well as in the models for the Balearics and Valencia, suggesting that such activities are perceived as diminishing price. However, in Barcelona and Madrid, the opposite effect is apparent with a positive and statistically significant impact on house price (
The overall model and those at a province level notably for the Balearics, Barcelona, Castellón and Murcia indicate that properties located in a primary home area have a lower rate of price increase than those located in areas with a strong second-home market. Significantly, these provinces are located on the Mediterranean coast with a high incidence of second homes.
Accessibility measures the existence of bus, train and underground stations near the property. The overall model captures a strong significant and positive influence of the three transport modes on house prices (
4.3 Property/Structural Attributes
This group of attributes measures the quality of house characteristics. View and orientation are shown to have a positive effect on housing prices. View is especially relevant in Alicante and Castellón (
The number of dwellings in the building (measuring density of construction) is negative and statistically significant in the overall model. A similar interpretation is apparent in Alicante and the Balearics (
In all models, age is captured by the log of the squared term which is negative and statistically significant in Balearics (
4.4 Time and Space Effect
Time and time–space effects on prices are also apparent: the lambda coefficients show a strong effect and are statistically significant in the overall model (
5. General Linear Model
This section considers space–time effects using the general linear model (GLM) taking space (Urb_) and time (year) as random variables and estimating the interaction between these variables and house price attributes (as covariates) using maximum likelihood. The general linear model is used to estimate the variability in asking prices (expressed in logs) in accounting for various housing attributes or covariates and the random effects of time and space. Random effects which are included as fixed effects (assume that all observations are independent of each other) are not appropriate for analysis in panel data with the potential for correlated data structures that occur for house characteristics. Random effects are used at time and space levels to account for correlations between interest rates and house prices. The combined effects between random factors and the covariates are included as random interactions to capture the linear relationship between a covariate and dependent variable change for different levels of random factors. In order to compare with time–space results,
where,
The estimated effects are calculated against the last observed year (2010) and the higher urban level (level 4 referenced to capital cities) and interactions are defined using 25 out of 27 attributes.
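The role of a random factor can be illustrated with a deliberately simplified sketch: a one-way random-intercept decomposition that splits the variance of log asking prices into a between-group part (for example, between years) and a within-group part. The paper estimates its model by maximum likelihood with covariates and interactions; the code below instead uses the simpler balanced-design ANOVA (method-of-moments) estimator, and the function name and the toy numbers are made up for illustration.

```python
def variance_components(groups):
    """One-way ANOVA (method-of-moments) estimates for a balanced design.

    groups: list of equally sized lists of observations, one list per level
            of the random factor (e.g. one list of log prices per year).
    Returns (sigma2_between, sigma2_within, intraclass_correlation).
    """
    j = len(groups)
    n = len(groups[0])
    if j < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need two or more equally sized groups of two or more observations")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / j
    # Mean squares between and within the levels of the random factor.
    ms_between = n * sum((m - grand) ** 2 for m in means) / (j - 1)
    ms_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (j * (n - 1))
    # The between-level variance is truncated at zero if the estimate is negative.
    sigma2_between = max((ms_between - ms_within) / n, 0.0)
    total = sigma2_between + ms_within
    icc = sigma2_between / total if total > 0 else 0.0
    return sigma2_between, ms_within, icc


# Toy example: three "years" with two observations each (made-up numbers).
s2_between, s2_within, icc = variance_components([[-1.0, 1.0], [1.0, 3.0], [3.0, 5.0]])
print(s2_between, s2_within, icc)  # 3.0 2.0 0.6
```

A high intraclass correlation indicates that observations sharing a year (or urban level) are far from independent, which is the situation a random-effects specification is designed to handle.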
Two models: GLM with random effects by time and space and pseudo STAR 2SLS model (STM)
The results are broadly similar to the pseudo spatio-temporal model calculated before (STM) with most estimated parameters in GLM having the same sign. In general, the STM estimated parameters are at the lower bound of those in the GLM random model. The main differences in sign are found in neighbourhood attributes for which co-linearity is strong. The lack of significance of bus stops at the province level is reproduced in the GLM; similarly, the relevance of underground stations in the main capital cities (Madrid, Barcelona) is also captured by the random model. View, orientation, construction quality, number of dwellings in buildings and lifts, all have similar parameter values. The age effect is better captured by the GLM random effects model while the floor area tends to be overweighted in STM. The parameter accounting for the effect on house price of extra housing areas (patios, garden) is similar in both models. The recursive parameter (
6. Conclusions
This paper evaluates the role of housing and related attributes in explaining asking prices in the Spanish market over a long period from 1995 to 2010. The paper utilises hedonic models to fit the pricing process and observes how the parameters change with time and space. In this respect, the paper makes an important contribution to the literature through its application to the Spanish market, of which there has been little previous analysis in the international literature, but more significantly adds to the knowledge base on how parameters can change over the property market cycle and spatially at a macro level by province. The former reflects the dynamics of the market and the latter captures the effect of different perceptions of the value of housing and housing-related parameters by region arising from different economic, social, cultural and household structure factors.
The modelling process uses two methodologies to estimate hedonic models obtaining shadow prices of housing attributes and controlling by space and time. The first method is based on STAR models; the second uses GLM to estimate the hedonic model including time and space as random factors and calculating interaction effects. By observing changes over time and space, the paper shows how perceptions of value influence different pricing models. In particular, the results illustrate that the structure of attribute values is stable among regions reflecting specific characteristics of their housing market. Weightings change with time suggesting that the perception of attribute values in price formation varies depending on the position in the housing cycle.
The results support the relevance of the income, population, accessibility and structural features in explaining house price and differences at a spatial level. The analysis also discovers how different regions shape their particular characteristics and the different role played by primary and second-home urban areas depending on location. It is shown that economic activity and diversity increase house prices in the regions and that population dynamism and density are important influences. The paper complements the literature regarding how a large database could incorporate bias in hedonic price indices if not controlled for time and space.
Acknowledgements
The authors wish to acknowledge TABIMED for access to the company’s valuation database.
Funding
This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
