Abstract
Regression analysis is a powerful tool for exploring the relationship between variables and is widely used in many areas. Classical statistics assumes that the residuals of a regression model satisfy the Gauss-Markov assumptions. However, in many cases, particularly with real-life data, the data do not satisfy these assumptions. Therefore, this paper explores the Von Bertalanffy regression model under the framework of uncertainty theory and employs uncertain maximum likelihood estimation (MLE) to estimate the unknown parameters. Furthermore, the uncertain hypothesis test and a data modification algorithm, which aims to find outliers and modify the data, are studied, and the forecast value and confidence interval are then formulated. Finally, a real-life numerical example applying the above theory is given. The example shows that the uncertain MLE performs better than the uncertain least squares and least absolute deviations methods. Consequently, the uncertain MLE is a better way to deal with such real-life data.
Keywords
Introduction
Regression analysis, one of the most widely used tools in statistics for studying the relationship between variables, has a long history of development. Galton [1] introduced the notion of regression through a linear regression model describing a natural phenomenon. Yule [2] expanded the notion of regression so that it could be applied in more general statistical contexts. The main problem of regression is to estimate the unknown parameters. Legendre [3] and Gauss [4] put forward the least squares estimator (LSE), which is a popular way to compute the regression parameters. Edgeworth [5, 6] proposed least absolute deviations (LAD) to solve this problem. Wilks [7] proposed maximum likelihood estimation (MLE), which is also widely used by academics. Meanwhile, a growing number of regression models have emerged.
Since the Von Bertalanffy growth model was proposed [8] and Beverton and Holt [9] applied it to the yield per recruit problem, the model has been widely used in areas such as fisheries biology [10, 11], oil production [12] and adult body size prediction [13]. However, many cases have shown that if the data come from real life or cannot be precisely observed, regression analysis under the framework of probability theory may produce counterintuitive results [14]. Zadeh [15, 16] used fuzzy set theory to address this problem, and Tanaka et al. [17] introduced fuzzy linear regression. As time went by, researchers came to demand more efficient data analysis. Inspired by this, uncertainty theory was established and perfected by Liu [14, 18]. Uncertainty theory, based on the normality, duality, subadditivity, and product axioms, is a good way to deal with the unreliability of human systems. Nowadays, uncertain statistics based on uncertainty theory has been successfully used in many ways; the theory has seen considerable development and many researchers have contributed to it. For instance, Sheng and Kar [19] employed the inverse uncertainty distribution to study moments of uncertain variables. Some scholars [20, 21] achieved results on uncertain time series. Notably, in uncertain regression analysis, Yao and Liu [22] proposed uncertain least squares (LS) and employed it to estimate the unknown parameters. Lio and Liu [23] gave the residual analysis and provided an interval estimation for the predicted value. Further, uncertain maximum likelihood estimation was proposed by Lio and Liu [24] and applied to uncertain regression analysis. Hu et al. [25] proposed the uncertain Gompertz regression model and employed uncertain least squares estimation for it. Liu et al. [26] introduced rescaled least squares estimation for the uncertain Box-Cox regression model. Zou et al. [27] proposed the uncertain Weibull regression model with imprecise observations.
Nonetheless, the Von Bertalanffy regression model under uncertainty theory has not been studied. To process real-life data, the uncertain MLE is employed. The rest of the paper is organized as follows. Section 2 proposes the uncertain Von Bertalanffy regression model, and Section 3 introduces the uncertain MLE for estimating its unknown parameters. Following that, in Section 4, the uncertain hypothesis test is introduced and an algorithm for data modification is developed. The forecast value and confidence interval are given in Section 5. In Section 6, a real-life numerical example applies the above theory to formulate the uncertain Von Bertalanffy regression model and obtain the forecast value, and the uncertain MLE is then compared with the uncertain LSE and LAD. Finally, some conclusions are drawn in Section 7.
Uncertain Von Bertalanffy regression model
The Von Bertalanffy regression model [8] is a growth model proposed for problems in fishery biology [28]. Since Beverton and Holt [9] first applied the model, many scholars have used it in several areas [10, 29].
The model can be written as
Although the Von Bertalanffy regression model under the framework of probability theory has a long history of development, many cases hardly satisfy the assumption of probability theory that the estimated distribution is close enough to the true frequency. This shows up as excessive outliers [10, 30] and predicted values far from actual values [11, 12]. For these reasons, it is hard or impossible to process such real-life data, and the classical Von Bertalanffy model does not work well. Therefore, we present the uncertain Von Bertalanffy model to address the violation of assumptions. In the following sections, several methods and a numerical example are given to show that the uncertain Von Bertalanffy model can handle these problems well.
Uncertain maximum likelihood estimation
Lio and Liu [24] proposed uncertain maximum likelihood estimation to estimate the unknown parameters of a regression model. The idea of uncertain maximum likelihood estimation is applied here to the uncertain Von Bertalanffy model.
Otherwise, L(a, b, k, e, σ | z1, z2, ⋯, zn) is strictly decreasing on
The proof is completed.
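As a rough illustration of the estimation idea, the sketch below evaluates an uncertain likelihood in Python. It assumes (this is not stated in the surviving text) that the disturbance term follows a normal uncertainty distribution N(e, σ) and that the likelihood is the minimum of the derivative of that distribution over the residuals z1, ⋯, zn, in the spirit of Lio and Liu [24]. The function names are hypothetical and chosen only for this example.

```python
import math


def normal_uncertainty_density(z, e, sigma):
    """Derivative of the normal uncertainty distribution N(e, sigma) at z.

    Assumed form: Phi(x) = 1 / (1 + exp(pi * (e - x) / (sqrt(3) * sigma))).
    The derivative is symmetric about e, so abs() is used to avoid overflow.
    """
    c = math.pi / (math.sqrt(3.0) * sigma)
    u = math.exp(-c * abs(z - e))
    return c * u / (1.0 + u) ** 2


def uncertain_likelihood(residuals, e, sigma):
    """Assumed uncertain likelihood: the smallest density over all residuals."""
    return min(normal_uncertainty_density(z, e, sigma) for z in residuals)


if __name__ == "__main__":
    z = [-1.0, 0.0, 2.0]  # illustrative residuals, not data from the paper
    print(uncertain_likelihood(z, e=0.0, sigma=1.0))
    print(uncertain_likelihood(z, e=0.5, sigma=1.0))
```

Under these assumptions the likelihood is governed by the residual farthest from e, so moving e toward the midrange of the residuals raises it.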
Uncertain hypothesis test
How to judge whether the simple regression model and disturbance term are appropriate is a classical problem in regression analysis; meanwhile, outliers are also a troublesome problem. In the framework of probability theory, the t-test addresses the first problem, and the t-test for the studentized residual (SRE) [31] is used to find outliers. In uncertainty theory, Ye and Liu [32] proposed the uncertain hypothesis test, which is a good way to judge whether the estimated parameters
The two-sided hypothesis for testing whether
Following the uncertain rejection region (17) that
Real-life data often contain outliers. Outliers may pull the regression curve away from the right path, which may leave the forecast values far from the actual values. The following algorithm is proposed to smooth the curve and reduce the effect of outliers, making the predicted values more precise. The numerical example in Section 6 verifies this.
Step 1. (Parameter estimation) With the observed data (ti, yi), i = 1, 2, ⋯, n, calculate the maximum likelihood estimates (b*, k*, e*, σ*) of (b, k, e, σ) in the uncertain Von Bertalanffy model.
Step 2. (Uncertain hypothesis test) Given the significance level α, construct the uncertain rejection region W.
Step 3. (Data modification) If
and the number of outliers is more than αn, then set
Step 4. (Forecast) For new data, calculate the forecast values and formulate the confidence interval.
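The outlier check in Steps 2 and 3 can be sketched as follows. This is only an illustration under stated assumptions: the disturbance term is taken to follow a normal uncertainty distribution N(e, σ), a residual is flagged when it falls outside the interval between the α/2 and 1 − α/2 inverse-distribution values, and the model is rejected when more than αn residuals are flagged (following the wording of Step 3). The data modification rule itself is not reproduced, and all function names are hypothetical.

```python
import math


def inverse_normal_uncertainty(alpha, e, sigma):
    """Inverse of the normal uncertainty distribution N(e, sigma) at level alpha."""
    return e + sigma * math.sqrt(3.0) / math.pi * math.log(alpha / (1.0 - alpha))


def flag_outliers(residuals, e, sigma, alpha=0.05):
    """Return indices of residuals outside the assumed two-sided acceptance interval."""
    lower = inverse_normal_uncertainty(alpha / 2.0, e, sigma)
    upper = inverse_normal_uncertainty(1.0 - alpha / 2.0, e, sigma)
    return [i for i, z in enumerate(residuals) if z < lower or z > upper]


def model_rejected(residuals, e, sigma, alpha=0.05):
    """Assumed rule: reject when more than alpha * n residuals are flagged."""
    flagged = flag_outliers(residuals, e, sigma, alpha)
    return len(flagged) > alpha * len(residuals)


if __name__ == "__main__":
    z = [0.1, -0.3, 2.5, 0.0]  # illustrative residuals, not data from the paper
    print(flag_outliers(z, e=0.0, sigma=1.0))
    print(model_rejected(z, e=0.0, sigma=1.0))
```

With n = 19 and α = 0.05, αn = 0.95, so under this rule a single flagged residual is enough to reject the model.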
Forecast value and confidence interval
Suppose that (ti, yi), i = 1, 2, ⋯, n is a set of modified data with no outliers, and (t0, y0) is the item to be forecast. The fitted model on this data is
Stochastic regression model for the water cut in Shengtuo oilfield
Chen et al. [30] used the Von Bertalanffy model to predict the water cut in a water flood field. They set p = 3 and a = 1 and estimated the unknown parameters by ordinary least squares (OLS). The model and the data for 20 years (Table 1) are as follows:
The actual value of water cut in Shengtuo oilfield
However, some predicted values are strongly biased because the oilfield corporation adjusted its technology, which may give rise to heteroscedasticity. Moreover, regression analysis based on probability theory requires the residuals to satisfy the Gauss-Markov assumptions: the disturbance term follows a normal distribution with zero mean and constant variance, and the residuals show no outliers and no serial correlation. However, the residual plot for model (12), shown in Fig. 1, does not seem to satisfy the Gauss-Markov assumptions. The t-test for SRE, whose null hypothesis is that a data point is a normal value, found the first data point in the rejection region constructed from the Bonferroni critical value, so it is an outlier. Here the significance level is set to 0.05.

Fig. 1. Residual plot for model (12).
Using Ljung and Box [33] test which
Finding outliers and lowering their influence is another approach that could solve these problems. After the 3rd iteration, the model is

Fig. 2. Residual plot for model (13).
However, the p-value of the Kolmogorov-Smirnov test is 6.47077 × 10⁻⁵ and the p-value of the Ljung-Box test is 0.02899, both of which are less than 0.05. As a result, model (13) also violates the Gauss-Markov assumptions.
In these circumstances, the distribution function assumed by stochastic regression analysis is far from the actual frequency of ɛ. Treating the disturbance term as a random variable therefore leads to a violation of the assumptions. Consequently, in this case, regression analysis in probability theory is invalid, which is why we attempt to use uncertain regression analysis to handle the Shengtuo oilfield data.
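For readers who want to reproduce this kind of serial-correlation check, the following Python sketch computes the standard Ljung-Box Q statistic, Q = n(n + 2) Σ ρ_k² / (n − k) summed over lags k = 1, ⋯, h, from a list of residuals. The function name and the sample values are hypothetical. The sketch stops at the statistic: converting Q to a p-value requires a chi-square reference distribution, which is not implemented here, and the number of lags used in the paper is not stated.

```python
def ljung_box_q(residuals, lags):
    """Ljung-Box Q statistic for the first `lags` sample autocorrelations."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)  # n times the sample variance
    total = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation at lag k
        rho_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        total += rho_k * rho_k / (n - k)
    return n * (n + 2) * total


if __name__ == "__main__":
    # Illustrative residuals only, not data from the paper
    sample = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
    print(ljung_box_q(sample, lags=1))
```

A large Q relative to the chi-square critical value indicates that the residuals are serially correlated.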
The uncertain Von Bertalanffy model is
For the first 19 data in Table 1 and model (14), suppose α = 0.05, so that αn = 0.95, which means that no ɛ may fall into the rejection region if the model is to pass the hypothesis test in subsection 4.1. The following uncertain rejection region can be calculated

Fig. 3. Residual plot for the uncertain Von Bertalanffy model (16).
After one iteration,

Fig. 4. Residual plot for the uncertain Von Bertalanffy model (17).
Suppose x = 20 is a new observation. According to Equation (9) and model (17), the following point estimate at x = 20 is obtained
Comparing the variance of the disturbance term of model (17) with that of model (12), the variance of the disturbance term of model (12), i.e.,
The uncertain LS [22] is computationally simpler than the MLE. Nevertheless, outliers can affect LS enormously, and the Shengtuo oilfield water cut dataset is a good example.
If we use uncertain LS to estimate the parameters of model (14), the algorithm in Section 4 cannot converge, and the variance of the disturbance term does not decrease noticeably. Calculate the LS

Fig. 5. Residual plot for the 7th uncertain Von Bertalanffy model.
LS is less robust than MLE, and outliers are widespread in the real world. For this reason, data such as those from the Shengtuo oilfield should be handled with uncertain MLE rather than uncertain LS.
The comparison of three methods
Liu and Yang [35] proposed LAD for uncertain regression, which can handle data with outliers. Using the LAD method, after the 4th iteration, the estimate of the unknown parameters is
the residual plot is shown in Fig. 6 and has no obvious trend. However, the variance for

Fig. 6. Residual plot for the uncertain Von Bertalanffy model (19).
To sum up, the Von Bertalanffy regression model in probability theory can hardly handle the Shengtuo oilfield data, which contain outliers, and the residuals of the stochastic model violate the Gauss-Markov assumptions. Moreover, the uncertain MLE needs the fewest iterations and yields the smallest variance of the disturbance term compared with uncertain LS and LAD. As a result, the model estimated by the uncertain MLE has a more precise and stable prediction interval.
Conclusion
In this paper, we considered the Von Bertalanffy model under the framework of uncertainty theory for real-life data, because regression analysis in probability theory is powerless in some real-life cases. The uncertain Von Bertalanffy model was proposed for the first time and used to fit the water cut data of the Shengtuo oilfield. In addition, regarding the choice of parameter estimation method, the uncertain MLE performed better than uncertain LS and LAD in a real case in both the number of iterations and the residual characteristics. These features offer several advantages: fewer iterations save time and reduce the modification of the raw data, and a smaller variance means a narrower confidence interval and a more stable forecast value. Therefore, the uncertain MLE was employed to estimate the unknown parameters. Furthermore, the uncertain MLE is more robust than the uncertain LS and LAD methods when the data have outliers and heteroscedasticity. This means that the uncertain MLE should be used if the data come from the real world and span multiple stages (data in each stage may have new features that could lead to heteroscedasticity). Technology has been updated rapidly in recent decades, so datasets with heteroscedasticity will become increasingly common in the future.
In future studies, machine learning methods such as neural networks could be applied to traditional regression models under the framework of uncertainty theory.
Acknowledgments
This work was funded by the National Natural Science Foundation of China (Grant Nos. 12061072 and 62162059) and the Xinjiang Key Laboratory of Applied Mathematics (Grant No. XJDX1401).
Declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This paper does not contain any studies with human participants or animals performed by any of the authors.
Author contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Yuhong Sheng and Hao Zhang. The first draft of the manuscript was written by Hao Zhang and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
