Abstract
To study the semi-parametric spatiotemporal variable coefficient regression model, parametric and non-parametric models are first introduced. Building on the spatiotemporal variable coefficient model and its estimation method, a semi-parametric spatiotemporal variable coefficient model is established. Using a two-step estimation procedure, three groups of numerical simulation experiments are carried out on this semi-parametric spatiotemporal variable coefficient model. The results show that the semi-parametric regression model combines the advantages of parametric and non-parametric models, and that the estimated values are very close to the true values. Combining the semi-parametric and time-varying coefficient regression models improves the model fit and noticeably increases estimation accuracy.
Introduction
Statistical inference for spatiotemporal models has long been an important and widely used branch of statistical and econometric research. However, most mainstream spatiotemporal models are parametric. Standard parametric models, such as the spatiotemporal lag regression model and the spatiotemporal error regression model, are very useful for testing problems. Their functional forms, however, depend strictly on subjective assumptions and are easily misspecified [1]. Over the past 20 years, computing technology has developed rapidly. Over the same period, spatiotemporal statistics and spatiotemporal econometrics have proposed a variety of semi-parametric modeling methods to explore the complex relationships between dependent and independent variables. These include the geographically additive model, the geographically weighted regression model and the spatiotemporal non-parametric regression model [2]. The semi-parametric regression model was introduced by Engle et al. in a study of the relationship between electricity demand and climate. Since then, it has received wide attention and has been applied in both theoretical research and practical data analysis. A semi-parametric model combines a parametric model with a non-parametric model. On the one hand, the parametric part still captures the main information and retains strong explanatory power. On the other hand, the model alleviates the curse of dimensionality and improves accuracy [3].
With the continuous development of science and technology, regression analysis has become a foundation of mathematical statistics in the era of big data. It has been widely used in many fields, such as geology, meteorology, hydrology, medicine, industry, agriculture and economics. It is one of the most important tools of statistical analysis, an effective method for solving practical problems, and an active research direction in statistics [4]. The arrival of the data age drives further exploration in science and technology and changes how we live and work. It creates opportunities for the application and development of regression theory, while also posing more significant challenges [5, 6]. The linear regression model is the basic model of regression analysis and has been widely used in data research because its theory is complete. However, linear regression models generally impose a fixed linear form on the dependence and require the random errors to follow a common normal distribution, which limits their applicability. Non-linear extensions have therefore been developed and applied. The most representative of these is the generalized linear model, whose response distribution belongs to the exponential family [7]. It can handle regression with discrete dependent variables, such as binomially or Poisson-distributed responses.
Research on regression models is an important part of mathematical statistics. Because its form is fixed in advance, a parametric regression model cannot accommodate the flexibility that data in practical problems often require. A non-parametric regression model does not need the model form to be chosen beforehand and can reflect the information in the data more accurately, but it suffers from the curse of dimensionality. A semi-parametric model contains both parametric and non-parametric components and so combines the advantages of the two. It can reduce the risk of misspecification and overcome the “curse of dimensionality”. It has therefore become a hot research topic in recent years and is widely applied in many fields.
Method
Parametric and non-parametric models
In real life, many variables have interdependent relationships, which are composed of some variables
The relationship between
In the above formula,
Model (1) is called a linear regression model; otherwise, it is called a non-linear regression model.
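The sketch below makes the parametric case concrete by fitting a linear regression model with ordinary least squares in NumPy. It assumes a design matrix whose first column is the intercept. The function name `ols_fit` and the synthetic data are illustrative only.

```python
import numpy as np

def ols_fit(X, y):
    """Least squares estimate of beta in y = X @ beta + eps.

    X is an (n, p) design matrix (first column assumed to be ones for the
    intercept); y is a length-n response vector.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Usage: recover known coefficients from noise-free synthetic data.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=50)
X = np.column_stack([np.ones_like(x), x])
y = 1.5 + 2.0 * x
beta_hat = ols_fit(X, y)
```

Because the functional form is fixed in advance, only the finite coefficient vector is estimated. This is what makes the model easy to interpret, and also what exposes it to misspecification.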
If the specific form of regression function
Non-parametric statistics relies mainly on large-sample methods: when the distribution of a test statistic is too complex, its limiting distribution is generally used instead. For example, when estimating the density function
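The non-parametric density estimation mentioned above can be illustrated with a minimal kernel density estimator. The Gaussian kernel and Silverman's rule-of-thumb bandwidth are assumptions made for this sketch, since the text specifies neither. `kernel_density` is a hypothetical helper name.

```python
import numpy as np

def kernel_density(sample, grid, h=None):
    """Gaussian kernel density estimate evaluated on `grid`:
    f_hat(x) = (1 / (n h)) * sum_i K((x - X_i) / h), K = standard normal pdf.
    If h is None, Silverman's rule of thumb h = 1.06 * s * n^(-1/5) is used."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = sample.size
    if h is None:
        h = 1.06 * sample.std(ddof=1) * n ** (-0.2)
    u = (grid[:, None] - sample[None, :]) / h   # scaled distances, shape (len(grid), n)
    k = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return k.sum(axis=1) / (n * h)

# Usage on synthetic standard normal data (illustrative only).
rng = np.random.default_rng(1)
data = rng.normal(size=500)
grid = np.linspace(-6.0, 6.0, 1201)
f_hat = kernel_density(data, grid)
```

No functional form is imposed on the density. The price is that the estimator's accuracy depends on the bandwidth and its behaviour is usually described through large-sample (asymptotic) results.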
In many practical applications, as data grow more complex, a single constant regression coefficient is no longer sufficient. Take the price of urban commercial housing as an example: the price of a house is related not only to its floor area but also to its location (
The spatiotemporal variable coefficient model can accurately analyze how house prices change with space-time location. It introduces spatiotemporal information into the model by using functions of the space-time locations of the observation points as regression coefficients. The spatiotemporal characteristics of the regression relationship are then analyzed through the estimated values of these coefficients. Assuming that the coordinates of independent variable
(
Set (
The Gaussian kernel function used here is a single-valued function of distance.
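The following is a minimal sketch of Gaussian kernel weights relative to a space-time target point (u0, v0, t0). The text does not reproduce the distance or the kernel's exact scaling. The squared space-time distance with a hypothetical time-scaling factor `lam`, and the form exp(-(d/h)^2), are therefore assumptions here.

```python
import numpy as np

def st_gaussian_weights(coords, target, h, lam=1.0):
    """Gaussian kernel weights w_i = exp(-d_i^2 / h^2) for observations at
    space-time locations coords[i] = (u_i, v_i, t_i) relative to target.

    d_i^2 = (u_i - u0)^2 + (v_i - v0)^2 + lam * (t_i - t0)^2, where `lam`
    is a hypothetical factor putting time on the same scale as space.
    """
    diff = np.asarray(coords, dtype=float) - np.asarray(target, dtype=float)
    d2 = diff[:, 0] ** 2 + diff[:, 1] ** 2 + lam * diff[:, 2] ** 2
    return np.exp(-d2 / h**2)

# Usage: the target itself gets weight 1; weights decay with distance.
coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
w = st_gaussian_weights(coords, target=(0.0, 0.0, 0.0), h=1.0, lam=0.5)
```

Some GWR references write the Gaussian kernel as exp(-d^2/(2h^2)). This only rescales the bandwidth h by a factor of sqrt(2).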
Construct the weighted least squares function:
To minimize it, the estimate of parameter
Taking the partial derivative of
Then the solution is:
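The weighted least squares problem above has the familiar GWR closed form beta_hat(u0, v0, t0) = (X^T W X)^{-1} X^T W Y, where W is the diagonal matrix of kernel weights. The sketch below assumes this is the solution the missing display shows. `local_wls` is an illustrative name, and the usage example assumes a Gaussian kernel with bandwidth 0.3.

```python
import numpy as np

def local_wls(X, y, w):
    """Weighted least squares estimate at one target location:
    beta_hat = (X^T W X)^{-1} X^T W y, with W = diag(w)."""
    XtW = X.T * w                       # scales column i of X^T by w_i, i.e. X^T @ diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)

# Usage: noise-free data with constant coefficients (0.5, -1.0).
rng = np.random.default_rng(2)
n = 200
coords = rng.uniform(0.0, 1.0, size=(n, 3))          # columns (u, v, t)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, -1.0])
d2 = ((coords - np.array([0.5, 0.5, 0.5])) ** 2).sum(axis=1)
w = np.exp(-d2 / 0.3**2)                              # assumed Gaussian kernel, h = 0.3
beta0 = local_wls(X, y, w)
```

With noise-free data generated from constant coefficients, any set of positive weights returns the true coefficients. This makes the example a simple correctness check rather than a demonstration of smoothing.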
The fitted value of
Let (
Then there is:
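Fitted values of the spatiotemporal variable coefficient model at the observation points can be written as Y_hat = L Y. Row i of the n x n matrix L is x_i^T (X^T W_i X)^{-1} X^T W_i, where W_i holds the kernel weights centred at observation i. This sketch assumes a Gaussian kernel exp(-d^2/h^2) with space and time on a common scale. `st_hat_matrix` is a hypothetical name.

```python
import numpy as np

def st_hat_matrix(X, coords, h):
    """Smoother matrix L with rows L[i] = x_i^T (X^T W_i X)^{-1} X^T W_i,
    so that y_hat = L @ y. W_i uses Gaussian weights exp(-d^2 / h^2)
    centred at the space-time location of observation i."""
    n = X.shape[0]
    L = np.empty((n, n))
    for i in range(n):
        d2 = ((coords - coords[i]) ** 2).sum(axis=1)
        XtW = X.T * np.exp(-d2 / h**2)
        L[i] = X[i] @ np.linalg.solve(XtW @ X, XtW)
    return L

# Usage on random locations with an intercept and one covariate.
rng = np.random.default_rng(7)
n = 60
coords = rng.uniform(0.0, 1.0, (n, 3))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
L = st_hat_matrix(X, coords, h=0.3)
```

Each row reproduces the design exactly (L X = X), so rows sum to one when X contains an intercept. The trace of L is often used as the effective number of parameters.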
The estimate of the error variance is usually written as:
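The formula for the error-variance estimate is missing from the text. One common choice in the GWR literature, assumed here, divides the residual sum of squares by tr((I - L)^T (I - L)) = n - 2 tr(L) + tr(L^T L). Here L is the n x n smoother matrix with Y_hat = L Y, and `error_variance` is a hypothetical helper.

```python
import numpy as np

def error_variance(L, y):
    """sigma2_hat = RSS / (n - 2 tr(L) + tr(L^T L)), with RSS = ||y - L y||^2."""
    n = y.size
    resid = y - L @ y
    df = n - 2.0 * np.trace(L) + np.trace(L.T @ L)
    return resid @ resid / df

# Usage: with an ordinary least squares projection matrix the estimate
# reduces to the classical RSS / (n - p).
rng = np.random.default_rng(8)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
H = X @ np.linalg.solve(X.T @ X, X.T)
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=n)
s2 = error_variance(H, y)
```

For a projection matrix with p columns, tr(L) = tr(L^T L) = p, so the denominator becomes n - p.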
For the spatiotemporal variable coefficient model, the boundary region of a two-dimensional geographic area is much larger than that of a one-dimensional interval. The boundary effect of the GWR method is therefore more serious and makes the estimates of the coefficient functions in boundary areas inaccurate. Local polynomial fitting has many advantages, such as automatically correcting the boundary effect. The local linear fitting method is therefore applied to the spatiotemporal variable coefficient model, and the improvement achieved by the local linear GWR method is presented.
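The following is a minimal sketch of the local linear idea applied to the spatiotemporal variable coefficient model. Near the target point, each coefficient function is approximated by a linear function of (u - u0, v - v0, t - t0), and only the level terms are kept as the estimate. The Gaussian kernel, the common space-time scale, and the name `local_linear_st` are assumptions for illustration, not details taken from the text.

```python
import numpy as np

def local_linear_st(X, y, coords, target, h):
    """Local linear estimate of the coefficient functions beta(u0, v0, t0).

    Near the target, beta_j(u, v, t) ~ b_j + a_j (u-u0) + c_j (v-v0) + e_j (t-t0),
    so the local design is [X, X*(u-u0), X*(v-v0), X*(t-t0)]; only the
    first p entries (the levels b_j) are returned.
    """
    p = X.shape[1]
    delta = coords - np.asarray(target, dtype=float)             # shape (n, 3)
    Z = np.hstack([X] + [X * delta[:, [k]] for k in range(3)])   # shape (n, 4p)
    w = np.exp(-(delta**2).sum(axis=1) / h**2)                   # assumed Gaussian kernel
    ZtW = Z.T * w
    theta = np.linalg.solve(ZtW @ Z, ZtW @ y)
    return theta[:p]

# Usage: coefficients linear in location, evaluated at a corner (boundary) point.
rng = np.random.default_rng(6)
n = 300
coords = rng.uniform(0.0, 1.0, (n, 3))
u, v, t = coords.T
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (1 + 2 * u) + (-1 + v - 0.5 * t) * X[:, 1]
beta_corner = local_linear_st(X, y, coords, target=(0.0, 0.0, 0.0), h=0.5)
```

The usage example evaluates the estimator at a corner of the unit cube, where boundary bias is worst for a local constant fit. The true coefficients there are linear in location and the data are noise-free, so the local linear fit recovers them exactly up to rounding.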
At present, research on semi-parametric models focuses mainly on partially varying coefficient models. In some practical problems, however, the following situation arises. The effects of some independent variables on the dependent variable vary over space and time. The effects of other variables do not vary over space and time, but they are non-linear. The linear part of the combined model is therefore extended to a non-parametric function. This section explores an estimation method for a semi-parametric model composed of a univariate non-parametric component and a spatiotemporal varying coefficient component. Assume the following semi-parametric spatiotemporal variable coefficient model:
If (
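The text does not spell out the two-step estimator. The sketch below therefore substitutes a generic backfitting scheme for y = g(z) + sum_j beta_j(u, v, t) x_j + eps. It alternates between a local-constant GWR fit of the coefficient functions on y - g(z) and a Nadaraya-Watson smooth of the partial residual on z. The model form follows the description above, but the alternation, the kernels, the bandwidths and all function names are assumptions; this is not the authors' method. X is taken without an intercept column, so g absorbs the constant and the decomposition stays identifiable.

```python
import numpy as np

def nw_smooth(z, r, h):
    """Nadaraya-Watson estimate of E[r | z] at each observed z_i (Gaussian kernel)."""
    K = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)
    return K @ r / K.sum(axis=1)

def st_coefficients(X, r, coords, h):
    """Local-constant GWR estimate of beta(u_i, v_i, t_i) at every observation."""
    B = np.empty_like(X)
    for i in range(X.shape[0]):
        w = np.exp(-((coords - coords[i]) ** 2).sum(axis=1) / h**2)
        XtW = X.T * w
        B[i] = np.linalg.solve(XtW @ X, XtW @ r)
    return B

def backfit(y, z, X, coords, hz, hs, sweeps=10):
    """Alternate the two component fits for a fixed number of sweeps
    (a generic backfitting scheme, assumed here)."""
    g = np.zeros_like(y)
    for _ in range(sweeps):
        B = st_coefficients(X, y - g, coords, hs)       # coefficient functions given g
        g = nw_smooth(z, y - (B * X).sum(axis=1), hz)   # g(z) given the coefficients
    return g, B

# Usage on hypothetical data: y = sin(2*pi*z) + (1+u)*x1 + (t-v)*x2 + noise.
rng = np.random.default_rng(5)
n = 150
z = rng.uniform(0.0, 1.0, n)
coords = rng.uniform(0.0, 1.0, (n, 3))
u, v, t = coords.T
X = rng.normal(size=(n, 2))
y = np.sin(2 * np.pi * z) + (1 + u) * X[:, 0] + (t - v) * X[:, 1] + 0.1 * rng.normal(size=n)
g_hat, B_hat = backfit(y, z, X, coords, hz=0.05, hs=0.3)
```

A fixed number of sweeps keeps the sketch short. A practical version would stop when successive estimates of g change by less than a tolerance, and would choose both bandwidths by cross-validation.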
Next, a series of simulation experiments is carried out to investigate the accuracy and stability of the estimates of the non-parametric function and the spatiotemporal varying coefficients in the non-linear semi-parametric variable coefficient model proposed in the second part.
In the simulation experiment, the side length of the spatiotemporal region is
Fitting results for the three groups of model experiments (A: model (i), variance 0.2; B: model (i), variance 0.6; C: model (i), variance 1; D: model (ii), variance 0.2; E: model (ii), variance 0.6; F: model (ii), variance 1; G: model (iii), variance 0.2; H: model (iii), variance 0.6; I: model (iii), variance 1).
Three sets of semi-parametric spatiotemporal variable coefficient models are numerically tested.
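The simulation design can be made concrete with a hypothetical data generator for the three error variances (0.2, 0.6 and 1) that appear in the figure panels. The details are truncated in the text, so the grid size, the true coefficient functions, the non-parametric function and the covariate distribution are all invented for illustration. `simulate` is a hypothetical name.

```python
import numpy as np

def simulate(n_side, sigma2, rng):
    """One synthetic data set on an n_side^3 space-time grid over [0, 1]^3 for
    y = g(z) + beta1(u, v, t) * x1 + beta2(u, v, t) * x2 + eps, eps ~ N(0, sigma2).
    All true functions below are hypothetical."""
    axis = np.linspace(0.0, 1.0, n_side)
    u, v, t = (a.ravel() for a in np.meshgrid(axis, axis, axis, indexing="ij"))
    n = u.size
    z = rng.uniform(0.0, 1.0, n)
    X = rng.normal(size=(n, 2))
    beta = np.column_stack([1.0 + u + v, np.sin(np.pi * t)])   # hypothetical coefficients
    g = np.cos(2 * np.pi * z)                                   # hypothetical g(z)
    eps = rng.normal(0.0, np.sqrt(sigma2), n)
    y = g + (beta * X).sum(axis=1) + eps
    return dict(y=y, z=z, X=X, coords=np.column_stack([u, v, t]),
                beta=beta, g=g, eps=eps)

# Usage: one data set per error variance used in the figure panels.
rng = np.random.default_rng(4)
datasets = {s2: simulate(10, s2, rng) for s2 in (0.2, 0.6, 1.0)}
```

Keeping the true `beta` and `g` alongside each data set lets any estimator's output be compared directly against the truth under these invented settings.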
The error term
The experiments were programmed in MATLAB, and running them gives the following results.
The numerical results show that under different
This work builds on the definitions of, and differences between, parametric and non-parametric models. It also builds on the expression and estimation method of the variable coefficient regression model and the generalized spatiotemporal variable coefficient regression model. On this basis, a two-step estimation method is used to estimate the semi-parametric spatiotemporal variable coefficient model that combines a univariate non-parametric component with a spatiotemporal variable coefficient component, and the expression of the estimator is given. The simulation experiments show that the semi-parametric regression model has the advantages of both parametric and non-parametric models: the estimated values are very close to the true values and the fit is good. As the error variance decreases, the disturbance to the model decreases and the estimation accuracy increases significantly. In summary, the proposed estimation method, which combines the non-parametric and time-varying coefficient models, yields estimates that agree well with the true values and achieves high accuracy.
