Abstract
Uncertain least squares estimation is an important method for handling imprecise data: it takes full account of the influence of the given data on the regression equation and minimizes the absolute error. However, many scientific studies and observational data are evaluated in terms of relative error, which to some extent allows the error of a forecast value to vary with the size of the observed value. Based on least squares estimation and uncertainty theory, this paper proposes an uncertain relative error least squares estimation model for linear regression. The model minimizes the relative error, so it can fit a regression equation to imprecise observation data while accounting for the variation of the error with the given data, which makes the regression equation more reasonable and reliable. Two numerical examples verify the feasibility of the method and compare it with existing methods. The data analysis shows that the uncertain relative error least squares estimation fits the data well.
Keywords
Introduction
Regression analysis is a widely used statistical method for determining the interdependence between variables. According to the number of independent variables, it can be divided into unary regression analysis and multiple regression analysis. If a regression analysis contains only two variables and the relationship between them can be approximately expressed by a straight line, it is called unary linear regression analysis. A regression analysis is called multiple linear regression analysis if it contains more than one independent variable and the relationship between the variables is linear. The least squares method is one of the most commonly used methods for solving the unknown parameters of a linear regression equation. It minimizes the sum of squared deviations between the observed points and the fitted line, that is, it minimizes the absolute error, so as to obtain a better fitting line. Zhang and Fu [1] studied several problems of the least squares method in regression analysis. Wang [2] applied the least squares method to establish a regression model of the air temperature curve. Many scholars have studied the least squares method in depth and applied it to many fields, which will not be listed one by one here.
The absolute error reflects the deviation of the forecast result from the true value, but cannot reflect the relative degree of the error. Sometimes the absolute error is relatively large, but the deviation from the actual data may be small. Sometimes the absolute error is relatively small, but the deviation from the actual data may be large. Relative error is the ratio of the absolute error to the true value, so it is usually expressed as a percentage. Generally speaking, the relative error better reflects the credibility of a forecast. For example, suppose a surveyor uses the same ruler to measure objects with lengths of 1 cm and 10 cm. If the absolute errors of the two measured values are the same, the relative error of the former is an order of magnitude larger than that of the latter, which indicates that the measured value of the latter is more reliable. So it is more meaningful to study the relative error of data. In fact, some scientific studies and observations are often evaluated on the basis of relative error. Therefore, many experts and scholars have studied and applied the relative error. Li [3] proposed a least squares method based on relative error. Qi [4] put forward a relative error analysis of least squares estimation for unary linear regression. Jin, Huang and Shan [5] proposed a new linear regression model from electric load forecasting of Shandong Province. Yun and Cao [6] proposed data fitting in the sense of relative error based on optimization. Gai and Zhang [7] proposed single-index relative error regression models. The relative error least squares method minimizes the sum of squared relative errors, and the regression equation obtained is more effective and practical.
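The ruler example can be checked numerically. The following is a minimal sketch; the common absolute error of 0.1 cm is an assumed value chosen only for illustration, and `relative_error` is a hypothetical helper name.

```python
def relative_error(measured, true_value):
    """Ratio of the absolute error to the true value."""
    return abs(measured - true_value) / abs(true_value)


# Assumed for illustration: both measurements are off by the same 0.1 cm.
absolute_error = 0.1

for true_length in (1.0, 10.0):
    measured = true_length + absolute_error
    rel = relative_error(measured, true_length)
    print(f"true length {true_length} cm: relative error {rel:.2%}")
```

With equal absolute errors, the relative error of the 1 cm measurement is ten times that of the 10 cm measurement, which is the order-of-magnitude gap described in the text.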
In traditional mathematical statistics, the relative error least squares method can solve the regression equation well when the given data are precise. In practice, however, the observation data we obtain are often imprecise, or there is no historical data at all, and the relative error least squares method is then powerless. Uncertain regression analysis, which Liu [10] proposed on the basis of uncertainty theory, can help to solve this problem. In order to describe the relationship between imprecise data, Liu [8] founded uncertainty theory in 2007 and gradually improved it [9–13]. Uncertainty theory deals well with the uncertainty encountered in reality, and many experts and scholars have conducted in-depth research on it. Uncertainty theory has been widely used [14–17]. In 2010, Liu [10] started his research on uncertain statistics. In order to estimate the unknown parameters in an uncertainty distribution, Liu put forward the principle of least squares. Since then, many experts and scholars have systematically studied uncertain statistics [18–20]. In 2018, Yao and Liu [21] proposed the least squares estimation, and Song and Fu [22] presented the least squares method for uncertain multiple linear regression. Wang, Li and Guo [23] put forward a new uncertain regression model in 2020. Wang et al. [24, 25] proposed two new uncertain linear regression models in 2021. Shi et al. [26] proposed a total least squares estimation model based on uncertainty theory in 2022. Ye and Liu [27] proposed an uncertain hypothesis test with application to uncertain regression analysis in 2022.
The uncertain linear regression model is an important part of uncertain regression analysis and an effective tool for dealing with imprecise data. At present, least squares estimation is one of the best methods for solving the unknown parameters of an uncertain linear regression equation. It minimizes the sum of squares of residuals, and can also forecast data effectively and give a confidence interval. However, although the quantity minimized by least squares estimation is not the absolute error in the traditional sense, it still measures the degree of deviation of the estimated value from the true value. Least squares estimation can only give the magnitude of the error, and cannot reflect the credibility of the estimated value. Other methods for solving the unknown parameters of uncertain linear regression equations likewise do not consider the relative degree of the errors. In fact, some scientific studies need to evaluate the relative error of the data. Under the condition that the sum of squares of relative fitting errors is minimal, References [3–7] put forward the relative error least squares method and discuss some related problems. The significance of the relative error least squares method is that the larger the observed quantity is, the larger the permitted measurement error is, so it has wider use and practical significance. The relative error least squares method can only deal with precise data, and cannot solve the regression equation of imprecise data. On the basis of the previous study [3], this paper proposes an uncertain relative error least squares estimation model. The model minimizes the sum of squares of relative errors and obtains a more realistic regression equation.
Compared with the absolute error least squares estimation model, the uncertain relative error least squares estimation model proposed in this paper minimizes the sum of the squares of the relative errors, which makes the regression equation more realistic and, in the numerical examples of this paper, gives a better fit than the absolute error model. The model can be used to solve the unary linear regression equation and can be extended to multiple regression, and it can handle both imprecise data and precise data.
The rest of this paper is organized as follows. In Section 2, we introduce the uncertain least squares estimation, which can effectively solve the regression equation. In Section 3, we propose the uncertain relative error least squares estimation model: we first derive the formulas for the unknown parameters of the unary linear regression equation and then extend them to the multiple linear regression equation. In Section 4, we verify the feasibility of the model with two numerical examples and compare the calculated results with the existing methods. The results show that the relative error least squares estimation model has a good fitting effect. Finally, we compare the absolute error with the relative error, analyze the advantages of the uncertain relative error least squares estimation model, and summarize the whole paper.
Uncertain regression model
In order to describe the relationship between imprecise data, Liu [8] founded uncertainty theory in 2007. Readers interested in uncertainty theory can read Reference [13]. This section mainly introduces the uncertain regression equation.
Let (x1, x2, ⋯, xn) be a vector of explanatory variables, and let y be a response variable. If the functional relationship between (x1, x2, ⋯, xn) and y can be expressed by a regression model
where
In particular, Liu [13] called
Now suppose that we have a set of imprecise data
Yao-Liu [21] proposed the least squares estimate of unknown parameter
If the minimization solution is
We call
If the disturbance term ɛ is an uncertain variable, its expected value and variance can be estimated as
Lio-Liu [28] suggested that the forecast uncertain variable of response variable y with respect to
The relative error least squares estimation model studies the relative error of observation data; it minimizes the sum of squared relative errors and reflects the relative accuracy of the estimates. For imprecise data, the model first calculates the expected value of each relative error, and then minimizes the sum of squares of the expected values.
Unary relative error least squares estimation
We always assume that there is a linear relationship between
Step 1. Construct the relative error least squares estimation model.
According to the imprecise data
Step 2. Solve the regression parameters β0 and β1 of the model.
The objective in equation (11) is a function of the two variables β0 and β1. To solve for β0 and β1, we take the first partial derivatives of equation (11) with respect to β0 and β1 and set them equal to 0. The following equations are obtained
By rearranging equation (12), we obtain the following equation
In equation (13),
Denote
By solving equation (14), we obtain estimates of β0 and β1
According to the expected value calculation formula in Reference [13], the specific calculation formulas for equations (17) are
Equations (19) are the explicit formulas for the unknown parameters. Substituting the known data gives the estimated values of the unknown parameters and hence the fitted equation.
Step 3. Data analysis and forecasting.
After we get the estimated values
The relative error least squares estimation model thus solves the problem of minimizing the relative error by setting the partial derivatives to zero. Because it weighs each error against the size of the observed value, the method is of practical significance, and the overall fitting effect is good.
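For precise (crisp) data, the idea of the three steps above can be sketched in a few lines. The sketch assumes the objective Σ((y_i − β0 − β1·x_i)/y_i)², which is the classical relative error criterion and equals weighted least squares with weights 1/y_i². It is not the paper's uncertain formulas, which replace these sums with expected values of uncertain variables. The function name `fit_relative_error_line` is hypothetical, and all y_i are assumed to be nonzero.

```python
def fit_relative_error_line(xs, ys):
    """Fit y = b0 + b1*x by minimizing sum(((y - b0 - b1*x) / y) ** 2).

    Assumes crisp data with every y nonzero. Setting both partial
    derivatives to zero gives a 2x2 weighted normal system with
    weights w = 1 / y**2, which is solved in closed form below.
    """
    sw = swx = swxx = swy = swxy = 0.0
    for x, y in zip(xs, ys):
        w = 1.0 / (y * y)
        sw += w
        swx += w * x
        swxx += w * x * x
        swy += w * y
        swxy += w * x * y
    det = sw * swxx - swx * swx
    if abs(det) < 1e-12:
        raise ValueError("x values do not determine a unique line")
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swy - b1 * swx) / sw
    return b0, b1


# Illustrative data lying exactly on y = 2 + 3x.
b0, b1 = fit_relative_error_line([1, 2, 3, 4], [5, 8, 11, 14])
print(b0, b1)
```

Because each residual is divided by its observed value, points with large y contribute less per unit of absolute error than points with small y, which is the behavior the relative error criterion is designed to give.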
The basic idea and derivation of the relative error least squares estimation model for multiple linear regression are similar to those of the unary relative error least squares estimation model. However, because of the larger number of variables, the calculation is more complicated. In order to simplify it, we derive the model by means of a matrix equation.
We assume that there is a linear functional relationship between uncertain variables vector
The main steps of the relative error least squares estimation model are as follows.
Step 1. Construct the relative error least squares estimation model.
Similar to the unary relative error least squares estimation model, the multivariate relative error least squares estimation model calculates, for each observation, the ratio of the difference between the observed value and the estimated value to the observed value, and then minimizes the sum of the squared relative errors according to the uncertain expectation. The multivariate relative error least squares estimation model is
Step 2. Solve the regression parameters of the model.
Equation (21) can be regarded as a function of β0, β1, β2, ⋯, βn. According to the extreme value theory of functions, all first partial derivatives of a differentiable function vanish at an extreme point. We therefore take the partial derivatives of equation (21) with respect to β0, β1, β2, ⋯, βn, set them equal to 0, and obtain the following equations.
After rearranging equation (22), the following equation is obtained.
In Equation (23),
We introduce the following matrices
Therefore, the matrix equation form of Equation (24) is
Here,
If
Step 3. Data analysis and forecasting.
After obtaining the estimate of
The derivation of the multivariate relative error least squares estimation is similar to that of the unary case, but because there are more variables, solving the model is more complicated. The solution uses the inverse matrix and elementary matrix transformations, so readers need some background in matrix algebra.
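The matrix route can be sketched for precise (crisp) data as follows. The sketch assumes the objective Σ((y_i − β0 − β1·x_i1 − ⋯ − βn·x_in)/y_i)², builds the weighted normal equations (XᵀWX)β = XᵀWy with W = diag(1/y_i²), and solves them by Gaussian elimination. It stands in for the uncertain version, in which the matrix entries would be expected values. The names `solve_linear_system` and `fit_relative_error_multiple` are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_relative_error_multiple(rows, ys):
    """Return [b0, b1, ..., bn] minimizing the sum of squared relative errors.

    Assumes crisp data with every y nonzero.
    """
    design = [[1.0] + list(row) for row in rows]  # prepend intercept column
    p = len(design[0])
    normal = [[0.0] * p for _ in range(p)]
    rhs = [0.0] * p
    for z, y in zip(design, ys):
        w = 1.0 / (y * y)  # relative-error weight
        for j in range(p):
            rhs[j] += w * z[j] * y
            for k in range(p):
                normal[j][k] += w * z[j] * z[k]
    return solve_linear_system(normal, rhs)


# Illustrative data lying exactly on y = 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [9, 8, 19, 18, 26]
print(fit_relative_error_multiple(rows, ys))
```

Solving the normal equations by elimination avoids forming an explicit inverse matrix, which is usually the more stable choice in floating-point arithmetic.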
In the previous section, we derived the uncertain relative error least squares estimation model in detail. In this section, we verify the feasibility of the relative error model through two numerical examples and compare the results with the least squares estimation and the uncertain linear regression model based on the slope mean. The results of the numerical examples show that the uncertain relative error least squares estimation model gives the best fit of the three.
Example of imprecise data
In this part, we verify the feasibility of the uncertain relative error least squares estimation model through an example of imprecise data.
It is assumed that (
Imprecise data (Linear uncertainty distribution)
The linear regression equations obtained by the relative error least squares estimation, least squares estimation and uncertain slope mean model are shown in Table 2. From the numerical point of view, there is little difference between the three equations, indicating that the fitting effect of the three methods should be similar.
The regression equation
The expected values and variances of the disturbance terms of the three models are shown in Tables 3 and 4. The results show that the expected value and variance of the disturbance term of the relative error least squares estimation model are the smallest, so its fitting effect is the best.
Expected value of the disturbance term
Variance of the disturbance term
The relative error least squares estimation can give the forecast value for imprecise data and the corresponding confidence interval. We assume that
If we take the confidence level α = 95%, and the disturbance term follows an uncertain normal distribution
In this numerical example, the uncertain relative error least squares estimation model considers the relative error of the given observation data, which makes the regression equation more practical. Compared with least squares estimation and the uncertain linear regression model based on the slope mean, the uncertain relative error least squares method has a better fitting effect. However, the numerical example is a unary linear equation, so its solution and analysis are not very complicated. The calculation for the multivariate relative error least squares estimation model is considerably more involved because of the larger number of variables.
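The confidence interval used in this example can be sketched as follows, assuming the forecast uncertain variable is uncertain normal N(μ, σ), where μ is the fitted value plus the estimated expected disturbance and σ is the estimated standard deviation of the disturbance. Under that assumption, the interval at confidence level α is μ ± (σ√3/π)·ln((1+α)/(1−α)), which follows from the inverse uncertain normal distribution. The function name and the inputs below are illustrative and are not values from the tables.

```python
import math


def uncertain_normal_confidence_interval(mu, sigma, alpha=0.95):
    """Interval [mu - b, mu + b] holding with belief degree alpha.

    Assumes an uncertain normal forecast N(mu, sigma), whose inverse
    distribution is mu + (sigma * sqrt(3) / pi) * ln(a / (1 - a)).
    Evaluating it at a = (1 + alpha) / 2 gives the half-width b.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    half_width = sigma * math.sqrt(3) / math.pi * math.log((1 + alpha) / (1 - alpha))
    return mu - half_width, mu + half_width


# Hypothetical forecast: mu = 10.0, sigma = 1.0.
low, high = uncertain_normal_confidence_interval(10.0, 1.0, alpha=0.95)
print(low, high)
```

The half-width grows with both σ and α, so a larger estimated disturbance variance or a stricter confidence level widens the interval.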
In this part, we verify the feasibility of the relative error least squares estimation model with an example of precise data. In terms of absolute error and residual sum of squares, the model has achieved good results.
Assume that (x_i, y_i), i = 1, 2, ⋯, 10 are the precise data given in Table 5.
Precise data
The linear regression equations obtained by relative error least squares estimation, least squares estimation and the uncertain linear regression model based on the slope mean are shown in Table 6. Judging from the coefficients of the equations, relative error least squares estimation and least squares estimation fit similarly, and the uncertain linear regression model based on the slope mean fits worst.
The linear regression equations
Tables 7 and 8 report the sum of absolute errors and the sum of squared residuals of the three models, respectively. The results show that the relative error least squares estimation model has the best fitting effect.
The absolute error
The residual sum of squares
In general, this numerical example shows that the relative error least squares estimation model is also feasible for precise data. When the fitting results are analyzed and compared with the existing methods, its fitting effect is the best.
Absolute error is the difference between the estimated value and the true value, which can reflect the magnitude of the deviation of the estimated value from the true value. Relative error is the ratio of absolute error to the estimated value or the average value of multiple estimates. It is a dimensionless value, which can compare the reliability of different estimation results and reflect the impact of error on data more truly. The relative error can better reflect the reliability of estimated value, so it is of great practical significance to study the relative error of linear regression equation.
In the numerical analysis of practical problems, we often need to solve the relative error regression equation of the data in order to study the impact of error on the data. In practice, however, the data we obtain may not be precise enough, or there may be no historical data at all. The traditional relative error least squares method cannot handle these situations. The relative error least squares estimation model proposed in this paper handles them well, and can fit equations to both precise data and imprecise data. The relative error least squares estimation depends not only on the magnitude of the estimated value, but also on how the error compares with the actual value, which makes it more reliable. In this paper, we first introduce the least squares estimation of uncertain regression models, which deals well with the regression fitting problem for given data. Then, by combining it with the relative error, we propose the uncertain relative error least squares estimation. On the basis of the unary relative error least squares estimation model, we extend the uncertain relative error least squares estimation to the multiple linear regression model. When there are few variables, the relative error least squares estimation model is not complicated to solve. When there are many variables or a large amount of data, the calculations should be carried out by computer programs or software, which is also a direction of our future work.
Acknowledgments
This research was supported by the Natural Science Foundation of Shandong Province (No. ZR2019BG015).
