Abstract
Accurately evaluating the value of urban real estate is a major issue in the reform of the real estate tax system, so establishing an accurate and efficient housing batch evaluation model is crucial. In this paper, second-hand housing transaction data for Zhengzhou City from 2010 to 2019 were used to model the relationship between housing prices and explanatory variables with Ordinary Least Squares (OLS), the Spatial Error Model (SEM), Geographically Weighted Regression (GWR), Geographically and Temporally Weighted Regression (GTWR), and Multiscale Geographically Weighted Regression (MGWR). A Barrier Line and Access Point (BLAAP) correction method was constructed and compared with three previously studied correction methods: Buffer Area (BA), Euclidean Distance (ED), and the cost-distance measures Non-Euclidean Distance and Travel Distance (ND, TT). The results showed that the fitting degree of GWR, MGWR and GTWR corrected by BLAAP was 0.03–0.07 higher than when corrected by ND. MGWR corrected by BLAAP had both the highest fitting degree (0.883) and the smallest Akaike Information Criterion (AIC), meaning that the model explained 88.3% of the variation in second-hand housing prices.
Introduction
Accurate and efficient evaluation of real estate value is a major problem facing real estate tax reform. Shanghai and Chongqing were selected as pilot cities for China's current real estate tax reform, which has made progress but still faces problems. Using the housing price on the contract signing date as the tax base is clearly unreasonable: the value of a house changes constantly with location and time, so the contract price cannot accurately reflect its current value. The common international practice is to use the current assessed value of the housing as the tax base. Real estate evaluation is divided into batch evaluation and case evaluation; batch evaluation is better suited to tax base assessment because of its low cost, high efficiency and fairness. Model selection and optimization are essential in establishing an accurate and efficient housing batch evaluation model: a model is first selected and then optimized in terms of the timeliness of data acquisition and the quantification of the data.
Existing literature has studied housing prices from macro and micro perspectives, focusing respectively on regional policies and regulations, the balance of supply and demand, and economic level [1–4], and on the location and characteristics of residential buildings [5–8]. Because macro influencing factors are uncertain, most scholars have studied housing prices from the micro perspective. A review of the literature on housing batch evaluation models shows that the formation mechanism of housing prices has been studied from two aspects: the selection of the spatial econometric model and the measurement of characteristic factors.
(1) Selection of spatial econometric model
Models relating housing prices to characteristic variables fall into four categories. The first is the global regression model: OLS [9–13] and the Hedonic price model [14–18]. The second is the improved global regression model: the Spatial Lag Model (SLM) [19, 20], which accounts for the spatial lag of observed values, and SEM, which accounts for spatial dependence in the error term [21, 22]. The third is the local regression model: GWR, which accounts for spatial non-stationarity [23–33]. The fourth is the improved local regression model: GTWR [34–37], which accounts for temporal heterogeneity, and MGWR, which accounts for heterogeneity in the scale at which each variable acts [38–45].
Global regression model: Beekmans et al. [13] used OLS to explore the driving forces of industrial land in the Netherlands in order to estimate its value. Rosen [14] first used the Hedonic price model to value the characteristic attributes of commodities, laying the theoretical foundation of the model. Both OLS and the Hedonic price model treat houses as homogeneous goods and ignore the spatial non-stationarity and spatial autocorrelation caused by the fixed spatial location of houses and the internal differences in their characteristics. In fact, Hedonic price mixes vary with location: houses in different areas of the same city have different mixes, and building characteristics such as heating systems differ markedly between cities, for example between northern and southern China. Relevant studies have confirmed this view. Goodman [15] and Fletcher [16] found that, in OLS models of characteristic factors, house age, floor space, primary schools and other external regional factors exhibited heteroscedasticity across geographical locations. Goodman [17] and Schnare [18] found that Hedonic price models of sub-markets, obtained by dividing the total housing market, fitted better than a model of the total market.
These studies confirm that the fixed spatial location of houses and the internal differences in their characteristics are the essential sources of spatial non-stationarity, so the relationship between housing prices and characteristic factors is spatially non-stationary rather than simply linear. Dividing the urban housing market into sub-markets can capture this non-stationarity, but the division is highly subjective and cannot reveal how the influence of characteristic factors on house value varies with spatial location.
Improved global regression model: Chiang et al. [19] used quantile regression to verify the non-linear effect of grocery store accessibility across different housing price ranges. Given the strong externality of housing prices and the lack of independence among observations, a spatial autoregressive model (Anselin) can be used for analysis. Cui et al. [20] analyzed the factors influencing residential land prices in Beijing using OLS and the spatial regression models SLM and SEM, and found that model fit ranked SLM > SEM > OLS, demonstrating substantial spatial dependence in Beijing's residential land prices. Clark and Pennington-Cross [21] used SEM to analyze the relationship between industrial land rent in Mexico and its characteristic factors, and depicted the spatial non-stationarity between them. Lim and Park [22] applied a spatial regression model to describe the spatial autocorrelation of warehouse rents in the Seoul metropolitan area.
Although these studies addressed spatial effects by incorporating the spatial lag of observed values and spatial error terms, and obtained average (global) estimates of the coefficients of characteristic variables, they could not model the local, location-specific relationships between housing prices and characteristic variables.
Local regression model: Brunsdon et al. [23] proposed GWR, which explores spatial variation in parameters through local modeling. The technique was subsequently widely used in urban development [24, 25], ecological environment [26–28], sociology [29, 30] and other fields. Lu et al. [31] used GWR to study the spatial variation of housing prices in London. Oh et al. [32] found that regional population density, economy, buying and selling indexes and other characteristic factors affected housing in Busan, South Korea differently across regions, demonstrating the effectiveness of GWR. Wu et al. [33] studied housing price data for Shenzhen City, China from 2004 to 2008 and found that the fitting degree of GWR was 0.119% higher than that of OLS.
These studies accounted for the spatial non-stationarity of housing but ignored the effects of time and of the scale at which each factor acts, effects that later research on the influence mechanism of housing prices brought to light.
Improved local regression model: Because GWR does not consider temporal non-stationarity, Huang et al. [34] and Fotheringham et al. [35] added a time dimension to GWR and constructed GTWR based on a space-time weighting function. Huang et al. [34], Liu et al. [36] and Gong et al. [37] used GTWR to study the spatio-temporal differentiation of housing prices and found that its fitting degree was 0.0385, 0.33 and 0.06 higher, respectively, than that of GWR. Many studies have shown that GTWR not only describes the temporal and spatial differentiation of housing prices but also improves the explanatory power of the model, confirming the advantage of models that consider both spatial and temporal non-stationarity. When we try to explain a social phenomenon such as “Why do houses with similar building characteristics in different urban areas have obviously different prices?”, spatial scale effects keep emerging. MGWR [38] was proposed to address such scale effects and was later applied to studies of the ecological environment [39, 40], soil [41] and disease [42, 43]. Lim and Park [22] applied MGWR to explain the relationship between warehouse rent and explanatory variables, and offered recommendations for improving urban logistics infrastructure. Zhang [44] applied MGWR to housing rents in Nanjing City, China, and concluded that commercial service centers, primary and secondary schools, subways and highways were the most significant factors affecting rent; the fitting degree of MGWR was 0.724, 0.09 higher than that of GWR. Lan et al. [45] compared MGWR with GWR and found that the fitting degree of MGWR was 0.04 higher.
All these studies show that the improved local regression models (GTWR and MGWR) better explain the influence mechanism of housing prices, but the two have not been compared to determine the conditions under which each applies. This paper therefore identifies the applicable conditions of GTWR and MGWR through comparative analysis, further refining the understanding of the influence mechanism of housing prices.
(2) Measurement of characteristic factors
As research on the influence mechanism of housing prices deepens, spatial econometric models of housing prices continue to improve, but the effect of how basic data are measured on model accuracy has been overlooked. Data measurement, which supplies the effective inputs for model building, is crucial to the analysis results. In urban development, the effective inputs for housing price modeling are shaped by objective factors (roads, rivers, topography, administrative regions, etc.), subjective factors (personal preferences), and the complexity of geographical space, so straight-line distance cannot accurately quantify the proximity between housing and characteristic variables. Existing measures used as inputs for housing price modeling include Buffer Area (BA), Euclidean Distance (ED), Non-Euclidean Distance (ND) and Travel Distance (TT). Lan et al. [45] corrected MGWR using the number of each type of infrastructure within a 1 km buffer when studying the effect of public facility accessibility on housing prices in Xi'an City, China. Liu et al. [36] used ED to correct GTWR when studying the influence of infrastructure and other factors on housing prices in Wuhan City, China; taking 619 communities in the main urban area of Wuhan as the research subject, they found that the fitting degree of GTWR was 0.33 higher than that of GWR. Liu et al. [46] took housing price data for Beijing from 1980 to 2016 as the research object, used ED and TT to correct GTWR respectively, and concluded that the fitting degree of the corrected GTWR was higher. Lu et al. [47] used ND and ED to correct GWR when analyzing the spatial variation of housing prices and building area in London, and concluded that ND gave the better correction. Lu et al. [31] used ED, ND and TT to correct GWR when exploring the spatial variation of housing prices in London, and concluded that ND performed best because it better describes the essential relationship between housing prices and characteristic variables.
These studies on the measurement of characteristic factors show that measurements accounting for road networks and traffic lights correct the model better than straight-line distance. However, with the rapid development of big data, the transformation from the traditional urban model to the smart city model, the spread of smart transportation and smart planning, well-developed urban infrastructure and strong transport accessibility, people are less constrained by the road network when traveling and instead pursue environmental and educational quality around where they live. The measurement of basic data should therefore focus on how differences in the scale of the original data affect housing prices, and give less weight to the road network and similar factors.
In summary, previous studies have not compared the applicable conditions of GTWR and MGWR when selecting a spatial econometric model. In terms of data measurement, they have paid too much attention to road-network-based cost distance measurement of explanatory variables while ignoring how differences in the scale of basic data affect housing prices. To address these problems, this paper takes second-hand housing transaction data within the Fourth Ring Road of Zhengzhou from 2010 to 2019 as the research object, constructs BLAAP to correct the housing batch evaluation model, compares it with existing distance measurement methods, and describes the spatial differentiation of characteristic factors. It is also the first to discuss the applicable conditions of GTWR and MGWR, improving research on the influence mechanism of housing prices, providing a theoretical basis and ideas for building housing batch evaluation models, offering technical support for real estate tax reform, and serving as a reference for the government in formulating reasonable housing policies. The study addresses the following problems: (1) constructing BLAAP to correct the housing batch evaluation model and comparing it with other measurement methods; (2) building a high-precision and efficient second-hand housing batch evaluation model; and (3) discussing the applicable conditions of MGWR and GTWR. The paper is structured as follows: Section 1 reviews recent literature and presents our own insights; Section 2 describes the study area, data sources and research methods; Section 3 details the construction of BLAAP; Section 4 analyzes the results; and Section 5 presents the conclusion and discussion.
Data and study methods
Study area
Zhengzhou City, the provincial capital of Henan Province in central China, is a mega-city and the core city of the Central Plains Urban Agglomeration. It is also a key national integrated transport hub and a historical and cultural city. Zhengzhou administers 6 municipal districts, 1 county and 5 county-level cities, with a total area of 7,446 km² and a construction land area of 830.97 km², and lies between E112°42′–114°14′ and N34°16′–34°58′. The study area comprised 5 major built-up districts, namely Jinshui District, Guancheng District, Zhongyuan District, Erqi District and Huiji District (Fig. 1), covering the area within the Fourth Ring Road of Zhengzhou. This area was chosen for two main reasons: (1) its urban infrastructure is well developed and its road network is dense, with commercial service centers, primary and secondary schools, greenbelts and hospitals; (2) almost all of Zhengzhou's second-hand housing transaction data came from this area. We therefore selected second-hand housing transaction data within the Fourth Ring Road of Zhengzhou from 2010 to 2019 as the research object and analyzed the spatial differentiation of the factors influencing housing prices in order to construct an accurate and efficient housing batch evaluation model.

Study area.
As the real estate market continues to develop, the volume of second-hand housing transactions is increasing. Compared with new commercial housing, second-hand housing prices are less affected by developers' sales strategies, and the samples are distributed more comprehensively. The second-hand housing transaction price (average price, yuan/m²) was therefore selected as the dependent variable. In March 2019, second-hand housing transaction data were collected from Anjuke (Anjuke.com) and Fangtianxia (Fang.com) using a web crawler. The sample data included the spatial location, building area, floor, decoration and orientation of each house, and the green rate, plot ratio and property management fee of its residential area. After abnormal and incomplete records were removed, 822 second-hand housing transaction records within the Fourth Ring Road of Zhengzhou from 2010 to 2019 were obtained. Points of interest (POI) such as the Zhengzhou road network, schools, shopping malls, subway stations, bus stops, hospitals, railway stations and bus stations were collected from amap.com. Housing locations on Anjuke and Fangtianxia use the Baidu map coordinate system, while POI locations use the Amap coordinate system; for consistency, all vector data were transformed into WGS_1984_World_Mercator for spatial analysis. A geographic database of the second-hand housing samples and their spatial attributes was built in ArcGIS 10.2 with OpenStreetMap as the base map.
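The final projection step can be sketched in Python. The sketch assumes that ArcGIS's WGS_1984_World_Mercator corresponds to the ellipsoidal Mercator projection on the WGS 84 ellipsoid (EPSG:3395) with the central meridian at 0°, and that the Baidu and Amap coordinates have already been converted to true WGS 84 longitude and latitude; those datum offsets are not handled here. The function name `world_mercator` is hypothetical.

```python
import math

# WGS 84 ellipsoid constants (assumed datum of WGS_1984_World_Mercator / EPSG:3395)
A = 6378137.0                  # semi-major axis in metres
F = 1 / 298.257223563          # flattening
E = math.sqrt(F * (2 - F))     # first eccentricity


def world_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Project WGS 84 lon/lat (degrees) to ellipsoidal Mercator x/y (metres).

    Assumes the input is already true WGS 84; Baidu (BD-09) and Amap (GCJ-02)
    coordinates must be converted to WGS 84 first, which this sketch does not do.
    """
    if not -90.0 < lat_deg < 90.0:
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = A * lam                                   # central meridian at 0 degrees
    esin = E * math.sin(phi)
    # Ellipsoidal correction factor shrinks y slightly relative to a sphere
    y = A * math.log(
        math.tan(math.pi / 4 + phi / 2) * ((1 - esin) / (1 + esin)) ** (E / 2)
    )
    return x, y


if __name__ == "__main__":
    # An approximate point in central Zhengzhou, for illustration only
    print(world_mercator(113.65, 34.75))
```

Projected metres are what make later buffer radii (e.g. 1,000 m) and distances meaningful, which is why the conversion precedes all spatial measures.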
Nineteen influencing factors in three categories (building characteristics, neighborhood characteristics and location characteristics) were selected to explore their impact on second-hand housing prices in Zhengzhou. Before the housing batch evaluation model was built, a multicollinearity test was conducted, requiring a variance inflation factor (VIF) below 10 and correlation coefficients below 0.75, to prevent multicollinearity from making the regression model inaccurate. After screening, 11 influencing factors remained when the model was corrected by BA and ND, and 10 when it was corrected by ED and BLAAP. The detailed data processing is shown in Table 1.
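A minimal sketch of the two screening rules (VIF below 10, pairwise correlation below 0.75) in Python with NumPy. The helper names `variance_inflation_factors` and `screen` are hypothetical, and the sketch only flags offending variables; the order in which variables were actually dropped is not specified in the text.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """Return the VIF of each column of X (n samples x k predictors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on
    all other columns plus an intercept.
    """
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vifs[j] = np.inf if r2 >= 1 else 1 / (1 - r2)
    return vifs


def screen(X: np.ndarray, names: list[str], vif_max=10.0, r_max=0.75):
    """Flag variables breaking the VIF rule and variable pairs breaking the correlation rule."""
    vifs = variance_inflation_factors(X)
    corr = np.corrcoef(X, rowvar=False)
    high_vif = [names[j] for j in range(len(names)) if vifs[j] >= vif_max]
    high_corr = [
        (names[a], names[b])
        for a in range(len(names))
        for b in range(a + 1, len(names))
        if abs(corr[a, b]) >= r_max
    ]
    return vifs, high_vif, high_corr
```

In use, one would drop (or merge) a flagged variable and re-run the screen until both lists are empty.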
Description of variables in different metrics
SEM
Housing prices have strong externality: the observed value of a sample is affected not only by its own variables but also by the observed values of neighboring samples, so traditional linear regression produces large errors. This paper therefore used SEM, which combines a standard regression model with a spatial autoregressive model of the error term, to handle spatial dependence in the error term and represent the spatial spillover effect of housing prices. The most common expression of the first-order spatial autoregressive error model is:
$$y = X\beta + \varepsilon, \qquad \varepsilon_i = \lambda \sum_{j=1}^{n} W_{ij}\,\varepsilon_j + u_i$$
where $y$ is the explained variable; $X$ is the matrix of explanatory variables; $W_{ij}$ is the $(i, j)$th element of the $n \times n$ spatial weight matrix $W$; $\varepsilon$ is the random error vector; $\beta$ measures the effect of the explanatory variables $X$ on $y$; $\lambda$ is the autoregressive parameter measuring spatial dependence in the error term; and $u_i$ is the random error term.
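To make SEM estimation concrete, here is a hedged sketch of maximum likelihood estimation by grid search over λ using the concentrated log-likelihood, a standard approach for this model; it is not necessarily the estimator used by the software in this study. For a fixed λ, filtering y and X by (I − λW) reduces the model to OLS. The names `fit_sem` and `ring_weights` and the toy ring-shaped neighborhood are assumptions for illustration.

```python
import numpy as np


def fit_sem(y, X, W, grid=np.linspace(-0.95, 0.95, 191)):
    """Grid-search ML estimate of y = X b + e, e = lam W e + u.

    For fixed lam, (I - lam W) filtering turns the model into OLS, so beta and
    sigma^2 have closed forms; lam maximises the concentrated log-likelihood
    -n/2 * log(sigma^2) + log|I - lam W| (constants dropped).
    """
    n = len(y)
    I = np.eye(n)
    best = None
    for lam in grid:
        A = I - lam * W
        ys, Xs = A @ y, A @ X                  # spatially filtered variables
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        resid = ys - Xs @ beta
        sigma2 = resid @ resid / n
        sign, logdet = np.linalg.slogdet(A)
        if sign <= 0:
            continue                           # outside the admissible range
        ll = -0.5 * n * np.log(sigma2) + logdet
        if best is None or ll > best[0]:
            best = (ll, lam, beta, sigma2)
    _, lam, beta, sigma2 = best
    return lam, beta, sigma2


def ring_weights(n):
    """Row-standardised toy weights: each unit's neighbours are the units on either side."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
    return W
```

A fine grid keeps the sketch transparent; production code would typically use a bounded one-dimensional optimizer and eigenvalue shortcuts for the log-determinant.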
Although SEM incorporates the influence of neighboring observations through a spatial error term, the model is still a single global linear regression over the whole study area and yields only average estimates of the parameters. GWR, a local linear regression model for modeling spatially varying relationships, introduces the geographic coordinates of each observation into the model so that every sample has its own regression equation, making the model dynamic across space. Compared with conventional linear regression, GWR allows local parameter estimation and explains the spatially non-stationary features of natural and social phenomena more reasonably.
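(1) The general expression of GWR is:
$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i$$
where $(u_i, v_i)$ are the coordinates of sample $i$, $\beta_k(u_i, v_i)$ is the local coefficient of the $k$th explanatory variable at that location, and $\varepsilon_i$ is the random error.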
(2) Determination of weight and bandwidth
The choice of spatial weights is the most important step in applying GWR. In practice, the usual spatial weight functions are the Gaussian function and Gaussian-like functions, of which the Gaussian function is the more commonly used. Its expression is:
where $b$ is the bandwidth, $d_{ij}$ is the distance between data point $j$ and sample $i$, and $W_{ij}$ is the regression weight of data point $j$ for sample $i$. Bandwidth is usually selected by cross-validation (CV) or the AIC, of which AIC is the more common. The Akaike information criterion, proposed in 1973 by the Japanese statistician Akaike, measures how well a statistical model fits the data while penalizing model complexity, and has been widely developed and applied in practical research.
Traditional OLS produces only global estimates of the regression coefficients, so it cannot reflect how each coefficient varies across space or effectively identify local relationships between the explanatory and explained variables. GWR, in turn, considers spatial but not temporal non-stationarity. Building on GWR, this paper therefore used the Geographically and Temporally Weighted Regression (GTWR) model to address both spatial and temporal heterogeneity. Its expression is:
$$y_i = \beta_0(u_i, v_i, t_i) + \sum_{k} \beta_k(u_i, v_i, t_i)\, X_{ik} + \varepsilon_i$$
where $(y_i; X_{ik})$ are the observed values of the explained variable and the explanatory variables at $(u_i, v_i, t_i)$, the space-time coordinates of sample $i$; $\varepsilon_i$ is an independent random error; and $\beta_k(u_i, v_i, t_i)$ is the regression coefficient of the $k$th influencing factor on $y_i$ at sample $i$.
In spatiotemporal data analysis, the closer a data point is to a given point $(u_0, v_0, t_0)$ in the study area, the greater its effect on the parameter estimates at that point. The spatiotemporal distance between any point in the study area and a data point $(u_i, v_i, t_i)$ is denoted $d$; the larger the distance, the smaller the weight assigned. Spatiotemporal distance can be expressed as a linear combination of the spatial distance $d_s$ and the temporal distance $d_t$.
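One widely used way to write this combination, following the formulation of Huang et al. [34], combines squared distances with two scale factors. The symbols λ and μ below are assumed notation (λ here is unrelated to the SEM coefficient), since the original equation is not reproduced in this section:

$$d^{ST}_{ij} = \sqrt{\lambda\left[(u_i-u_j)^2 + (v_i-v_j)^2\right] + \mu\,(t_i-t_j)^2}, \qquad W_{ij} = \exp\!\left(-\frac{\left(d^{ST}_{ij}\right)^2}{b^2}\right)$$

The ratio μ/λ sets how much spatial separation is treated as equivalent to one unit of time, and it is typically tuned together with the bandwidth b.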
A Gaussian function was used to link spatiotemporal distance to spatiotemporal weight in GTWR, combining temporal and spatial information. The spatiotemporal weight function is:
where $W(u, v)$ is the spatial weight and $W_t$ is the temporal weight. The basic idea is to set each weight according to the distance $d_{ij}$ between data point $j$ and regression point $i$, with the relationship between weight and distance expressed by a function:
where $b$ is the bandwidth of the spatiotemporal weight function. The bandwidth directly affects the fitted results of GTWR; the optimal bandwidth is the one that minimizes the model's AIC, which is expressed as:
where $n$ is the sample size.
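A minimal Python sketch of one GTWR local fit, assuming the squared spatiotemporal distance is a weighted sum of squared spatial and temporal distances and a Gaussian kernel exp(−(d/b)²). The scale factors `lam` and `mu`, the bandwidth value and the function names are hypothetical; in practice b (and the ratio of `mu` to `lam`) would be chosen by minimizing AIC as described above.

```python
import numpy as np


def spatiotemporal_distance(coords, times, i, lam=1.0, mu=1.0):
    """Distance from sample i to every sample; lam and mu are hypothetical scale factors."""
    d_s2 = np.sum((coords - coords[i]) ** 2, axis=1)   # squared spatial distance
    d_t2 = (times - times[i]) ** 2                     # squared temporal distance
    return np.sqrt(lam * d_s2 + mu * d_t2)


def gtwr_local_fit(y, X, coords, times, i, b, lam=1.0, mu=1.0):
    """Local WLS estimate of beta(u_i, v_i, t_i) with Gaussian spatiotemporal weights."""
    d = spatiotemporal_distance(coords, times, i, lam, mu)
    w = np.exp(-((d / b) ** 2))          # weight falls as spatiotemporal distance grows
    XtW = X.T * w                        # X' W_i
    return np.linalg.solve(XtW @ X, XtW @ y), w
```

Repeating `gtwr_local_fit` for every sample gives the full space-time coefficient surface; transactions from distant years then contribute little even when they are spatially close.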
Both GWR and GTWR require the local relationships between the explained variable and all explanatory variables to vary at the same spatial scale, whereas MGWR allows the relationship between the explained variable and each explanatory variable to vary at its own spatial scale. Each MGWR regression coefficient $\beta_{bwj}$ is estimated by local regression with its own optimal bandwidth, which better suits the analysis of relationships between urban housing prices and their explanatory variables. The expression is:
$$y_i = \sum_{j=0}^{k} \beta_{bwj}(u_i, v_i)\, x_{ij} + \varepsilon_i$$
where $bwj$ in $\beta_{bwj}$ is the bandwidth of the $j$th explanatory variable, $(u_i, v_i)$ is the location of data point $i$, and $\beta_{bwj}(u_i, v_i)$ is the coefficient of the $j$th explanatory variable at data point $(u_i, v_i)$.
An adaptive bi-square kernel was selected as the spatial weight function, and AICc was used to select bandwidths. Classical GWR is estimated by weighted least squares, whereas MGWR is estimated as a generalized additive model (GAM).
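For a generalized additive model, the backfitting algorithm is used to fit each smooth term. All smooth terms must first be initialized, that is, each MGWR coefficient needs a preliminary estimate; GWR estimates were used as the initial values in this paper. Once the initial values are set, the initial residual is computed as $\hat{\varepsilon} = y - \sum_{j} \hat{f}_j$, where $\hat{f}_j = \hat{\beta}_{bwj} x_j$. Each term is then updated in turn by a GWR regression of the partial residual $\hat{\varepsilon} + \hat{f}_j$ on $x_j$ with its own optimal bandwidth, and the cycle is repeated until the score of change in the residual sum of squares converges:
$$SOC_{RSS} = \frac{\left| RSS_{new} - RSS_{old} \right|}{RSS_{new}}$$
where $RSS_{old}$ is the residual sum of squares of the previous step and $RSS_{new}$ is that of the current step.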
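The backfitting loop can be sketched as follows, under simplifying assumptions: the covariate-specific bandwidths are supplied rather than searched for with AICc at each step, the adaptive bi-square kernel counts the observation itself among its nearest neighbors, and the convergence threshold of 1e-5 is an illustrative value. Function names are hypothetical.

```python
import numpy as np


def adaptive_bisquare(dist, bw):
    """Adaptive bi-square weights; bw is the number of nearest neighbours kept (self included)."""
    d_max = np.sort(dist, axis=1)[:, bw - 1][:, None]   # distance to the bw-th neighbour
    ratio = dist / d_max
    return np.where(ratio < 1.0, (1.0 - ratio**2) ** 2, 0.0)


def mgwr_backfit(y, X, coords, bws, init_bw, tol=1e-5, max_iter=200):
    """Backfitting for MGWR with fixed covariate-specific bandwidths (no bandwidth search).

    bws[j] is the neighbour count for column j of X; init_bw is used for the
    initial GWR estimate.
    """
    n, k = X.shape
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # 1. Initialise every smooth term with an ordinary GWR fit.
    W0 = adaptive_bisquare(dist, init_bw)
    betas = np.empty((n, k))
    for i in range(n):
        XtW = X.T * W0[i]
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    resid = y - np.sum(X * betas, axis=1)
    rss_old = resid @ resid
    weights = [adaptive_bisquare(dist, bw) for bw in bws]
    # 2. Cycle through the terms until the SOC_RSS criterion is met.
    for it in range(1, max_iter + 1):
        for j in range(k):
            xj = X[:, j]
            partial = resid + xj * betas[:, j]          # partial residual for term j
            # Single-variable GWR of the partial residual on x_j at every location.
            betas[:, j] = (weights[j] @ (xj * partial)) / (weights[j] @ (xj * xj))
            resid = partial - xj * betas[:, j]
        rss_new = resid @ resid
        soc = abs(rss_new - rss_old) / rss_new if rss_new > 0 else 0.0
        if soc < tol:
            return betas, it, True
        rss_old = rss_new
    return betas, max_iter, False
```

Because each term is smoothed with its own weight matrix, a small bandwidth lets one coefficient vary locally while a bandwidth near n keeps another close to global.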
When building a housing batch evaluation model, data measurement, which supplies the effective inputs, is crucial to the analysis results. The characteristic variables were divided into three categories: building characteristics, regional characteristics and neighborhood characteristics. In terms of scale differences, building characteristics such as building age, decoration, floor and building area vary little, whereas neighborhood characteristics such as primary and secondary schools and hospitals, and regional characteristics such as commercial service centers, subway stations and bus stations, vary greatly. Measures of the characteristic variables affecting housing prices include BA, ED, ND and TT. BA and ED were introduced in early studies of the factors influencing housing prices. BA counts characteristic factors: it builds a buffer around each housing sample and counts the characteristic factor samples inside it to quantify their impact on housing price. ED uses the nearest distance between each housing sample and the characteristic factor samples to quantify that impact. Neither BA nor ED considers the effects of scale differences among characteristic variables, road networks or other factors on housing prices. To better simulate the actual impact of characteristic factors, scholars established ND, which fully considers road networks and traffic lights and improves model performance more than BA and ED [45–47]. When computing these cost distance measurements (ND, TT), the existing literature divides the study area into primary roads, secondary roads, general areas, viaducts, railways, rivers, etc., and uses traffic-light-based travel distance (Table 2, Fig. 2(a)).
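For comparison with the cost-distance measures, the two early measures are simple to compute. The sketch below assumes projected coordinates in metres (as after the World Mercator transformation) and hypothetical function names; the same buffer count underlies statements such as the number of housing samples within 1,000 m of a subway entrance.

```python
import numpy as np


def buffer_counts(houses, pois, radius):
    """BA measure: number of POIs within `radius` of each house (same units as coordinates)."""
    d = np.linalg.norm(houses[:, None, :] - pois[None, :, :], axis=2)
    return np.sum(d <= radius, axis=1)


def nearest_distance(houses, pois):
    """ED measure: straight-line distance from each house to its nearest POI."""
    d = np.linalg.norm(houses[:, None, :] - pois[None, :, :], axis=2)
    return d.min(axis=1)
```

Both measures ignore barriers and facility size, which is exactly the gap that ND, TT and BLAAP try to close in different ways.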
Today, with the rapid development of big data, the traditional urban model is evolving into a smart urban model with widespread intelligent transportation and intelligent planning, so the measurement of indicator data should focus on how scale differences in the original data affect housing prices. For example, schools affect housing prices differently depending on their teaching quality, size and so on. This paper therefore constructed BLAAP to reduce the emphasis on road networks and traffic lights and to account more fully for scale differences in the raw data. The process of constructing BLAAP is described below.
ND

(a) ND (b) BLAAP.
Zhengzhou lies on a plain with no large rivers or other natural barriers passing through it. Therefore, from the basic railway and viaduct data of Zhengzhou, this paper extracted the three main railway lines according to the city's network structure (two north-south barrier lines through Zhengzhou Railway Station and Zhengzhou High-speed Railway Station, and one east-west barrier line connecting them) together with 44 traffic points (viaducts) to construct the basic BLAAP (Fig. 2(b)). The POI data were obtained from Amap; although they are timely and complete, the facilities they represent differ in scale. We therefore classified the basic data by consulting relevant documents and online sources and obtained a scale index for each characteristic variable. Based on the basic BLAAP, the scale index was then linearly attenuated with distance, and finally the score of each characteristic variable for each housing price sample was extracted according to its spatial position. The expression was:
where $N$ is the score of each characteristic variable, $I$ is the scale index of each characteristic variable, $d_i$ is the distance measured on the basic BLAAP, $d$ is the influence radius of each characteristic variable, $S$ is the study area, and $n$ is the number of POIs of each characteristic variable.
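Because the BLAAP expression itself is not reproduced here, the following Python sketch is only one plausible reading of the description, not the authors' formula: the distance from a house to a POI is the straight line when no barrier line is crossed and otherwise a detour through the access point giving the shortest path, and each POI contributes its scale index attenuated linearly to zero at the influence radius. All names and the toy geometry are hypothetical.

```python
import math

Point = tuple[float, float]
Segment = tuple[Point, Point]


def _orient(p: Point, q: Point, r: Point) -> float:
    """Twice the signed area of triangle pqr (> 0 if counter-clockwise)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def crosses(p: Point, q: Point, seg: Segment) -> bool:
    """True if segment pq properly crosses the barrier segment (touching is ignored)."""
    a, b = seg
    return (_orient(p, q, a) * _orient(p, q, b) < 0
            and _orient(a, b, p) * _orient(a, b, q) < 0)


def barrier_distance(house: Point, poi: Point, barriers: list[Segment],
                     access_points: list[Point]) -> float:
    """Straight line if no barrier is crossed, otherwise a detour via the best access point.

    Simplification: the two legs of the detour are not re-checked against other barriers.
    """
    if not any(crosses(house, poi, s) for s in barriers):
        return math.dist(house, poi)
    return min(math.dist(house, a) + math.dist(a, poi) for a in access_points)


def blaap_score(house: Point, pois: list[tuple[Point, float]], radius: float,
                barriers: list[Segment], access_points: list[Point]) -> float:
    """Sum of scale indices, each attenuated linearly to zero at the influence radius."""
    score = 0.0
    for poi, scale_index in pois:
        d = barrier_distance(house, poi, barriers, access_points)
        score += scale_index * max(0.0, 1.0 - d / radius)
    return score
```

In this reading a large school just across a railway line can score lower than a small one on the same side, which is the kind of scale-and-barrier effect the method is meant to capture.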
Spatial distribution of samples
To compare the spatial distribution of the housing price samples, all samples were adjusted to their 2019 values. The spatial distribution of housing prices in Zhengzhou was then mapped in Origin 2018 using Kriging interpolation (Fig. 3). The results showed that:

Spatial distribution of housing price samples.
(1) The spatial pattern of housing prices in Zhengzhou was “one center, multiple cores”
Housing prices followed a “one center, four cores” pattern, decreasing gradually outward. Prices were highest in Longhu Subdistrict, averaging 46,876 yuan/m². The four cores were Yingbinlu, Wenhualu, Linshanzhai and Minggonglu Subdistricts, where housing prices reached about 20,000 yuan/m².
(2) There was strong synergy between high housing price areas and urban commercial service centers
Areas with high housing prices were mainly located in the commercial service centers of the 5 municipal districts, which have convenient transportation and well-developed infrastructure. Longhu Subdistrict is the CBD of Zhengdong New District, a ring-shaped building complex integrating business, residence, leisure and entertainment, and scientific research.
(3) Few second-hand housing samples were located near subway stations
Figure 4 shows that the 822 second-hand housing samples were generally evenly distributed, with few samples near subway stations. When the samples were collected in March 2019, only Metro Lines 1 and 2 were in operation, with only 44 stations open to the public. Within a 1,000 m buffer around subway entrances there were only 123 second-hand housing samples, about 1/7 of the total.

Spatial distribution of second-hand housing samples.
Exploratory Spatial Data Analysis (ESDA) visualizes the spatial mechanisms of natural and social phenomena to reveal their degree of spatial autocorrelation. It includes global and local spatial autocorrelation. Global spatial autocorrelation reveals the overall spatial agglomeration or dispersion of the study area, determining the degree of correlation between observed values and their surrounding observed values across the whole range. However, it describes agglomeration and dispersion only for the study area as a whole and cannot reveal spatial non-stationarity or spatial dependence among local samples. Local spatial autocorrelation reveals the degree of correlation between samples at the local scale, as well as local differences.
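This paper used the common Global Moran's I to test for spatial agglomeration of the second-hand housing samples in Zhengzhou. Global Moran's I is defined as:
$$I = \frac{n}{S_0} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}$$
where $n$ is the number of samples, $x_i$ and $x_j$ are the housing prices of samples $i$ and $j$, $\bar{x}$ is the mean housing price, and $w_{ij}$ is the spatial weight between samples $i$ and $j$.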
The global autocorrelation of second-hand housing prices in Zhengzhou was analyzed with the spatial autocorrelation tool in ArcGIS 10.2; the results are shown in Fig. 5. Global Moran's I was 0.49 with a Z score of 54.60, far above the critical value of 2.58 for significance at P = 0.01, indicating that the probability that this clustering arose randomly is less than 1%. The spatial agglomeration of second-hand housing prices in Zhengzhou was therefore significant.

Global spatial autocorrelation analysis.
To further study the local differences among second-hand housing samples in Zhengzhou, local spatial autocorrelation analysis was needed. Local spatial autocorrelation is usually expressed by LISA (Local Indicators of Spatial Association) significance, based on the local Moran's I:
$$I_i=Z_i\sum_{j\in J_i}w_{ij}Z_j$$
where $Z_i$ and $Z_j$ are the standardized values of spatial unit $i$ and its adjacent unit $j$, $J_i$ is the neighborhood set of spatial unit $i$, and $w_{ij}$ is the spatial weight between $i$ and $j$.
This paper used the univariate Local Moran's I in GeoDa 1.12 to further analyze the local autocorrelation of second-hand housing prices in Zhengzhou. Fig. 6 shows that 763, 735 and 696 housing samples exhibited significant spatial agglomeration at significance levels of P = 0.05, P = 0.01 and P = 0.001, respectively.
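The LISA computation can be sketched as below: each unit's local Moran's I is its standardized value times the average standardized value of its neighbours, and significance comes from conditional permutation (hold $Z_i$ fixed, shuffle the other values among its neighbours). This is a minimal sketch under stated assumptions: row-standardized binary weights, a two-sided pseudo p-value, and hypothetical function and data names. GeoDa's own weights and permutation scheme may differ, so its counts at each significance level would not be reproduced exactly.

```python
import random
import statistics


def local_morans_i(values, neighbors, permutations=999, seed=1):
    """Local Moran's I for each unit, with conditional-permutation pseudo p-values.

    values:    n observations
    neighbors: neighbors[i] lists the indices in J_i (must be non-empty);
               weights are row-standardised, i.e. w_ij = 1 / |J_i|
    Returns (list of I_i, list of pseudo p-values).
    """
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    z = [(v - mean) / sd for v in values]          # standardised values Z_i
    rng = random.Random(seed)
    local_i, p_values = [], []
    for i in range(n):
        k = len(neighbors[i])
        observed = z[i] * sum(z[j] for j in neighbors[i]) / k
        others = z[:i] + z[i + 1:]                 # every value except Z_i itself
        as_extreme = 0
        for _ in range(permutations):
            # Keep Z_i fixed and draw k random "neighbours" from the other units.
            simulated = z[i] * sum(rng.sample(others, k)) / k
            if abs(simulated) >= abs(observed):
                as_extreme += 1
        local_i.append(observed)
        p_values.append((as_extreme + 1) / (permutations + 1))
    return local_i, p_values


# Toy example: four units in a row.
prices = [1.0, 2.0, 3.0, 4.0]
nbrs = [[1], [0, 2], [1, 3], [2]]
lisa, pvals = local_morans_i(prices, nbrs, permutations=99)
print([round(v, 3) for v in lisa])  # [0.6, 0.2, 0.2, 0.6]
```

Counting how many units have a pseudo p-value below 0.05, 0.01 and 0.001 gives the kind of nested totals reported above, where each stricter level retains fewer samples.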

LISA significance.
To improve the accuracy of the second-hand housing batch evaluation model, this chapter applied four correction methods to five regression models to improve their fitting degree, and conducted a multicollinearity test. Comparing OLS with SEM, the fitting degree increased from 0.5231 (OLS) to 0.6282 (SEM): SEM explained 62.82% of the variation in housing prices, and its log-likelihood was greater than that of OLS, indicating that housing prices exhibit spatial dependence. Detailed results are shown in Table 3 (IN denotes the constant term).
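The diagnostics used in these comparisons (fitting degree, adjusted R², log-likelihood, AIC) can be sketched for an ordinary Gaussian linear model as follows. This is an illustrative sketch, not the software used in the paper: it reads "fitting degree" as R², counts the intercept, slopes and error variance as AIC parameters (packages differ on this convention), and uses hypothetical data. SEM's log-likelihood includes an additional spatial term, and GWR-type models compute AIC/AICc from an effective number of parameters, so neither is computed this way.

```python
import math


def fit_statistics(observed, fitted, n_predictors):
    """R^2, adjusted R^2, Gaussian log-likelihood and AIC for a linear model.

    n_predictors: number of explanatory variables, excluding the intercept.
    AIC parameter count: intercept + slopes + error variance (a convention).
    """
    n = len(observed)
    mean = sum(observed) / n
    rss = sum((y - f) ** 2 for y, f in zip(observed, fitted))   # residual sum of squares
    tss = sum((y - mean) ** 2 for y in observed)                 # total sum of squares
    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    # Maximised log-likelihood with sigma^2 estimated as rss / n
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    aic = -2 * log_lik + 2 * (n_predictors + 2)
    return r2, adj_r2, log_lik, aic


# Hypothetical observed prices and fitted values from a one-variable model.
r2, adj_r2, log_lik, aic = fit_statistics([1, 2, 3, 5], [1.1, 1.9, 3.2, 4.8], n_predictors=1)
print(round(r2, 4), round(adj_r2, 4))
```

A higher log-likelihood and lower AIC indicate a better fit for the same data, which is the logic behind preferring the model with the smallest AIC.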
Comparison results of OLS and SEM
To describe more accurately how the factors influencing second-hand housing prices in Zhengzhou vary across space, the batch evaluation model was studied from two angles, model selection and improved processing of the data indices, to assess how the four correction methods (BA, ED, ND and BLAAP) improve the batch evaluation model. Figure 7 shows that the fitting degree of GWR, GTWR and MGWR improved much more than that of OLS, which does not consider the spatial and temporal non-stationarity of housing prices or differences in scale, and that all four correction methods improved the fitting degree. For all five models, the fitting degree with ND was 0.04–0.13 higher than with ED. For GWR, GTWR and MGWR, the fitting degree with BLAAP was 0.03–0.07 higher than with ND. For OLS and SEM, which do not fully consider the spatial non-stationarity and autocorrelation of housing prices, the fitting degree with ND was higher than with BLAAP. Although ND increased the effective input of the subway characteristic variables more than BLAAP did, the influence coefficients of the subway variables were far smaller than their standard errors, so their influence on second-hand housing prices in Zhengzhou was weak. Table 4 shows that the Adj.R2 of MGWR with BLAAP was 0.871 and its AIC was 1,100.91, lower than with any other correction method, indicating that MGWR performed best.
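The difference between Euclidean distance (ED) and a non-Euclidean, network-based distance (ND) can be illustrated with a shortest-path sketch. This is a generic Dijkstra implementation over a hypothetical road graph, not the paper's ND or BLAAP procedure, whose barrier lines and access points are not specified in this section; node names and coordinates are invented for illustration.

```python
import heapq
import math


def euclidean(p, q):
    """Straight-line (ED) distance between two (x, y) points in metres."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def network_distance(coords, roads, source, target):
    """Shortest road-network (ND) distance via Dijkstra's algorithm.

    coords: dict node -> (x, y) in metres
    roads:  list of (u, v) undirected road segments, weighted by their length
    Returns math.inf if the target cannot be reached.
    """
    graph = {node: [] for node in coords}
    for u, v in roads:
        length = euclidean(coords[u], coords[v])
        graph[u].append((v, length))
        graph[v].append((u, length))
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue                      # stale queue entry
        for nxt, length in graph[node]:
            candidate = dist + length
            if candidate < best.get(nxt, math.inf):
                best[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return math.inf


# Hypothetical layout: a barrier (e.g. a rail line) separates home H from station S,
# and the only crossing runs through nodes A and B.
coords = {"H": (0, 0), "A": (300, 0), "B": (300, 400), "S": (0, 400)}
roads = [("H", "A"), ("A", "B"), ("B", "S")]
print(euclidean(coords["H"], coords["S"]))        # 400.0
print(network_distance(coords, roads, "H", "S"))  # 1000.0
```

The home looks 400 m from the station by ED but is 1,000 m away by road, which is the kind of discrepancy that a network-based distance is meant to capture when quantifying accessibility variables.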
Comparison of results of four different correction methods
The results showed that the fitting degree of GWR, which accounts for the spatial non-stationarity of housing prices, was significantly higher than that of OLS and SEM, and that the fitting degrees of GTWR, which accounts for spatiotemporal non-stationarity, and MGWR, which accounts for heterogeneity in the scale of action, were significantly higher than that of GWR. No previous study has examined the explanatory power and applicable conditions of GTWR versus MGWR. Taking second-hand housing in Zhengzhou as an example, this paper compared the results of MGWR and GTWR and discussed the conditions under which each applies.
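To make concrete what GWR's local coefficients are (the quantities later mapped in Fig. 8 and summarized by their standard deviations in Table 5), here is a minimal pure-Python sketch: at each location, a weighted least-squares regression is fitted with a Gaussian distance-decay kernel. It assumes a fixed, hand-picked bandwidth and hypothetical data; dedicated GWR/MGWR software usually selects the bandwidth by minimizing AICc or by cross-validation and often uses adaptive kernels. GTWR would replace the spatial distance with a combined space-time distance, and MGWR would give each variable its own bandwidth via back-fitting; neither extension is shown, and this is not the authors' code.

```python
import math
import statistics


def solve(a, b):
    """Solve the small dense system a x = b (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def gwr_coefficients(coords, X, y, bandwidth):
    """Local GWR coefficients at every observation (fixed Gaussian kernel).

    coords: list of (u, v) locations; X: rows of explanatory variables
    (an intercept is added here); y: responses; bandwidth: in coordinate units.
    Returns one list [intercept, slope_1, ...] per location.
    """
    rows = [[1.0] + list(r) for r in X]
    n, p = len(rows), len(rows[0])
    betas = []
    for ui, vi in coords:
        # Gaussian kernel: nearby observations get weights close to 1
        w = [math.exp(-0.5 * (math.hypot(ui - uj, vi - vj) / bandwidth) ** 2)
             for uj, vj in coords]
        # Local weighted least squares: (X^T W X) beta = X^T W y
        xtwx = [[sum(w[k] * rows[k][a] * rows[k][c] for k in range(n))
                 for c in range(p)] for a in range(p)]
        xtwy = [sum(w[k] * rows[k][a] * y[k] for k in range(n)) for a in range(p)]
        betas.append(solve(xtwx, xtwy))
    return betas


# Hypothetical data: the price-area slope drifts from west (u=0) to east (u=4).
coords = [(float(u), 0.0) for u in range(5) for _ in range(2)]
X = [[1.0], [2.0]] * 5
y = [2.0 + (1.0 + 0.5 * u) * x[0] for (u, _), x in zip(coords, X)]
betas = gwr_coefficients(coords, X, y, bandwidth=1.0)
slopes = [b[1] for b in betas]          # local slope coefficients, west to east
print([round(s, 2) for s in slopes[::2]], round(statistics.pstdev(slopes), 2))
```

Each local slope is pulled toward the slopes of nearby locations, so the estimates vary smoothly in space; a smaller bandwidth would let them vary more sharply, which is the trade-off that bandwidth (scale of action) selection controls.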
Based on the comparison of the results of MGWR and GTWR (Table 5), their overall effectiveness can be compared from two aspects. First, consider the model diagnostics. Compared with GTWR (BLAAP), MGWR (BLAAP) had an Adj.R2 higher by 0.018, a log-likelihood higher by 103.837 and a sigma estimate lower by 0.049, all indicating that MGWR (BLAAP) better reflects the second-hand housing market in Zhengzhou. Second, compare the results for each characteristic variable. For all nine characteristic variables, the standard deviation of the coefficients under MGWR (BLAAP) was smaller than under GTWR (BLAAP). The largest difference between the two models was for the commercial service center variable, whose standard deviations were 0.003 and 0.571, respectively. This marked difference may be closely related to the rapid development of Zhengdong New District from 2010 to 2019, which was transformed from farmland villages into a central business district with complete functional areas for science and technology, education, finance and medical care. The results show that when major functional areas of a city are planned and adjusted, heterogeneity in the scale of action of each characteristic variable has a more significant impact on housing price than temporal non-stationarity, and the scales of action of the characteristic variables in MGWR differ clearly. Therefore, under such conditions, MGWR has greater explanatory power than GTWR when analyzing the influence of variables on housing price.
Comparison of the analysis results of MGWR and GTWR
The overall differences in the coefficients of each variable were discussed preliminarily using the MGWR results. To further explore model performance and spatial non-stationarity, the spatial variation of the local coefficients of each characteristic variable is mapped in Fig. 8.

Model comparison.

Spatial differentiation of characteristic variables.
(1) Analysis of location characteristic factor
As can be seen from Fig. 8(f), the bus station variable was negatively correlated with housing price overall, with an average influence coefficient of –0.027 and a standard error of 0.004: the closer a home was to a bus station, the lower its price. Housing prices were lowest in Erqi District, where the coefficient reached 0.033. This is because Erqi District is an old urban area of Zhengzhou with one railway station and three bus stations, so it has heavy pedestrian flows, severe traffic congestion and serious noise. Built long ago, its surrounding infrastructure has deteriorated badly. The government should therefore invest more in renovating Erqi District, for example by improving its infrastructure and living environment. By contrast, Zhengzhoudong Railway Station is a newly built high-speed railway station with well-developed infrastructure and a high green rate, so the area around it is less affected by bus stations.
As can be seen from Fig. 8(i), the commercial service center variable was positively correlated with housing price, with an average influence coefficient of up to 0.355 and a standard error of 0.003, showing that its influence on housing price was stable at this scale of action (Table 5). Since construction began in 2001, Zhengdong New District has developed into a fully functional central business district. With the construction of Ruyi Lake, Fashion Culture Square and many theme parks, the southern Longhu area has become a high-grade livable area. Figure 9 shows that the coefficient for Xingdalu streets, Longhu streets and Ruyihu streets was as high as 0.363, suggesting that these areas, located some distance from the core commercial agglomeration area and with an attractive environment, were the most livable.

Distribution of variable coefficients and variances of Zhengzhou streets.
(2) Analysis of neighborhood characteristic factor
As can be seen from Fig. 8(h), the average influence coefficient of kindergartens on housing price was –0.063, while the standard error was as high as 0.251, far larger than the coefficient itself, indicating strong spatial non-stationarity at a scale of action of 44. Figure 9 shows that kindergartens were negatively correlated with housing prices in Boxuelu streets and Ruyihu streets in Zhengdong New District. This is mainly because these are largely high-grade residential areas, where residents seek a clean and elegant living environment, a high green rate and a high level of property management. Elsewhere, kindergartens were positively correlated with housing prices overall. As can be seen from Fig. 8(j), the influence coefficient of the property management fee decreased from northeast to southwest, with an average of 0.139. The property management fee reflects the level of service management in a residential area; it had a strong impact on prices in high-grade residential areas, up to 0.201, but a weak impact in ordinary residential areas. Fig. 8(a) shows that the spatial pattern of the green rate coefficient was similar to that of the property management fee: both were positively correlated with housing price, although the influence of the green rate was smaller. Figure 8 shows that housing prices in high-grade residential agglomeration areas were very sensitive to plot ratio, property management fee and green rate, which were important factors affecting prices there. Plot ratio was positively correlated with housing prices in Dehua streets, Changjianglu streets and Nanguan streets, perhaps because the railway station and bus stations in these areas bring heavy pedestrian flows, traffic congestion and noise, which would explain why people there prefer to live on high floors.
(3) Analysis of architecture characteristic factor
As can be seen from Fig. 8(e), the average influence coefficient of building area on housing price was 0.086, with coefficients ranging from –0.405 to 0.407, a standard error of 0.152 and a scale of action of 48. This shows that the spatial non-stationarity of building area was strong at a scale of action close to the size of a street. Spatially, areas where building area was positively correlated with housing price, such as Longhu streets, are mostly high-grade residential areas, where buyers are willing to pay high prices for additional services. Areas where building area was negatively correlated with housing price are mostly ordinary residential areas, where working residents with low incomes care more about a home's total price and prefer units with a low total value; this leaves a surplus of large units and pushes their prices down. Fig. 8(d) shows the spatial variation of the influence coefficient of residual years on housing price. Areas where residual years were positively correlated with housing price are mostly new residential areas, and areas where they were negatively correlated are mostly old residential areas. The basic data show that the dividing point of residual years was 50 years, which may be related to the renovation policy for old residential areas. Fig. 8(i) shows the spatial variation of the local fitting degree: areas with a high fitting degree were concentrated in the commercial service centers of Jinshui District, Erqi District, Guancheng District and Huiji District, all of which have well-developed infrastructure and convenient transportation. The government should therefore optimize the layout of regional infrastructure when formulating housing policies.
Conclusion
Taking second-hand housing transaction data from 2010 to 2019 within the Fourth Ring Road of Zhengzhou as the research object and combining big data technology, spatial econometric models and geographic information systems, this paper covered basic data acquisition, construction of model correction methods, spatial econometric computation, result analysis and visualization, and construction of a batch evaluation model, providing a theoretical basis and ideas for building housing batch evaluation models. The conclusions are as follows.
We compared the performance of MGWR under different correction methods using diagnostic measures such as Adj.R2 and AIC. With BLAAP, MGWR had an Adj.R2 of 0.871 and an AIC of 1,100.91, lower than under the other correction methods, and explained 87.1% of the variation in housing prices, indicating that MGWR performed best. BLAAP was thus the most effective correction method for constructing the Zhengzhou second-hand housing batch evaluation model. It offers a deeper understanding of how the spatial non-stationarity of housing value arises, technical support for real estate tax reform, and a reference for the government in optimizing urban functions and formulating reasonable housing policies.
The performance of OLS, SEM, GWR, GTWR and MGWR with ND and ED was compared. ND greatly improved model performance, increasing the fitting degree by 0.04–0.13, consistent with previous studies [31, 46–49]. BLAAP improved the performance of GWR, GTWR and MGWR, increasing the fitting degree by 0.03–0.07 relative to ND, but did not improve OLS and SEM, which do not fully consider the spatial and temporal heterogeneity of housing prices or differences in the scale of action. Constructing a more complex ND can compensate for model error caused by the spatial non-stationarity of the parameters of housing price variables.
This also reflects the shift, driven by the rapid development of big data and the Internet, from the traditional city model to the smart city model, and the growing popularity of intelligent transportation. When considering the spatial and temporal heterogeneity and scale differences of housing prices, megacities with developed road networks, convenient transportation and well-developed infrastructure should also consider how the influence of characteristic variables on housing price differs with the scale of action.
This paper is the first to compare MGWR with GTWR, and the empirical results show that MGWR better explains the spatial distribution of second-hand housing prices in Zhengzhou. The difference may be closely related to the rapid development of Zhengdong New District from 2010 to 2019, which was transformed from farmland villages into a central business district with complete functional areas for science and technology, education, finance and medical care. When major functional areas of a city are planned and adjusted, heterogeneity in the scale of action of each characteristic variable has a more significant impact on housing price than temporal non-stationarity, and MGWR explains housing price better than GTWR.
The high-price housing agglomeration areas showed clear synergy with the city's commercial service centers: the influence coefficient of the commercial service center on housing price was up to 0.355, and its estimated standard error was the smallest of all characteristic variables, showing that this effect changed smoothly across space. The estimated standard error of the kindergarten variable was larger than those of the other variables, indicating that the influence of kindergartens on housing price varied markedly with spatial position.
Urban housing prices are highly sensitive to the spatial distribution of kindergartens, so the government should optimize their spatial layout. The results also show that the residential environment has an increasing influence on housing prices, as people tend to choose residential areas with a high green rate and low population density.
Discussion
Deficiency
The study faced some objective limitations. The data were collected in March 2019, when only Metro Lines 1 and 2 were operating in Zhengzhou, so second-hand housing samples within the 1,000 m buffer around subway entrances accounted for only about 1/7 of all second-hand housing samples. These limited data may have affected the analysis.
Research prospect
By combining big data and new information technology, spatial econometric models and geographic information systems, this paper provided a theoretical basis and ideas for building a housing batch evaluation model. The research is still at the theoretical level and has not yet been applied in practice. However, the simulation experiments in this paper and recent progress in the literature suggest that batch evaluation can evaluate housing prices efficiently and accurately, which is highly relevant to improving the real estate tax system and guiding market transactions. Batch evaluation has great potential for future development and will be a focus of future research.
Author Contributions
Data curation, Chaojie Liu, Wenjing Fu and Zhuoyi Zhou; Methodology, Chaojie Liu; Writing – original draft, Chaojie Liu; Writing – review & editing, Jie Lu.
