Abstract
Traditional support vector regression seeks a regression function through a tube that contains as many precise observations as possible. However, data sometimes cannot be precisely observed, in which case traditional support vector regression is not applicable. Motivated by this, we employ uncertain variables to describe imprecise observations and build an optimization model, namely the uncertain support vector regression model. We further derive the crisp equivalent form of the model when the inverse uncertainty distributions are known. Finally, we illustrate the application of the model with numerical examples.
Introduction
Support vector regression was first introduced by Vapnik [23] to explore the relationship between explanatory variables and response variables through a tube, which is uniquely determined by a function and the radius of the tube. Unlike other regression methods, support vector regression selects a regression function by minimizing the errors of only some of the observations, namely those that fall outside the tube. Support vector regression has achieved excellent performance [2, 19] in real-world applications, such as exchange rate forecasting [8, 21] and stock price forecasting [6]. More details about support vector regression can be found in the overviews [20, 25].
The tacit assumption in traditional support vector regression is that the observations are always crisp. However, in some cases the data are difficult or impossible to observe precisely, so traditional support vector regression cannot be applied. As an alternative for handling imprecise observations, Liu [11, 13] introduced uncertain variables to describe imprecise data. Regression problems with imprecisely observed data [28] were then studied on this basis, and studies of imprecise observations have since emerged in different fields. For time series analysis with imprecise observations, several models have been proposed [26, 27]. The problems of testing the estimation results [15, 29] and of forecasting [9] were also investigated. For classification problems, the distance from an uncertain vector to a hyperplane was defined [7], and a hard margin uncertain support vector machine was given for separating linearly α-separable data sets.
Among these studies, regression with imprecise observations is an important topic that has been researched extensively. The first attempt was made by Yao and Liu [28], who proposed a least squares method for such regression problems. Since then, on the one hand, different methods, such as least absolute deviations [17], the lasso method [18] and the ridge method [1], have been proposed to improve the robustness of the least squares method. On the other hand, different regression models have been explored in the framework of uncertainty theory. For example, Fang and Hong [4] derived the crisp equivalent forms of different models under logarithmic, square root and reciprocal transformations. Song and Fu [22] derived analytical expressions for the uncertain multivariable linear regression model by the generalized least squares estimate. Hu and Gao [5] explored the properties of the Gompertz regression model. However, the above methods account for the losses from every observation, whereas the support vector regression function is determined only by the observations with heavy losses.
This paper extends the support vector regression model to explore the relationship between the response variable and the explanatory variables revealed by imprecise data. First, we employ uncertain variables to model imprecise data. Then we build an uncertain support vector regression model based on the maximum distance from observations to a hyperplane. It is distinct from random support vector models, which assume that the distribution is sufficiently close to its true frequency. In addition to presenting the model, we give numerical examples to illustrate its application.
The remainder of the paper is organized as follows. Section 2 lists some necessary definitions and theorems in uncertainty theory. Section 3 introduces the uncertain support vector regression model. Section 4 presents examples to illustrate the application of the model, and Section 5 concludes.
Preliminaries
This section reviews some necessary definitions and theorems used in the rest of the paper.
Let ℒ be a σ-algebra on a nonempty set Γ. Liu [10, 11] defined a set function M : ℒ → [0, 1] to be an uncertain measure if it satisfies the following axioms:
(1) M{Γ} = 1 for the universal set Γ;
(2) M{Λ} + M{Λ^c} = 1 for any Λ ∈ ℒ;
(3) for every countable sequence Λ1, Λ2, ⋯ ∈ ℒ, M{Λ1 ∪ Λ2 ∪ ⋯} ≤ M{Λ1} + M{Λ2} + ⋯.
The triple (Γ, ℒ, M) is called an uncertainty space. Then Liu [10] defined an uncertain variable τ as a measurable function from an uncertainty space (Γ, ℒ, M) to the set of real numbers, i.e., the set {τ ∈ B} = {γ ∈ Γ ∣ τ(γ) ∈ B} belongs to ℒ for any Borel set B. The function
Uncertain support vector regression
Suppose that
As stated above,
It follows from Definition 1 that the distance from uncertain vector
We choose
When the distance is minimized, each
Let ε > 0 be a real number. In this work, we assume that the function The tube determined by a hyperplane
which is the second constraint of Model (4). It follows immediately from Theorem 1 that the inverse uncertainty distribution of
The theorem is proved.
Numerical examples
In this section, we illustrate the application of uncertain support vector regression with numerical examples.
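We suppose that all the imprecise observations are characterized by linear uncertain variables. The uncertainty distribution and the inverse uncertainty distribution of a linear uncertain variable ℒ(a, b) are
Φ(x) = 0 if x ≤ a, Φ(x) = (x − a)/(b − a) if a < x ≤ b, Φ(x) = 1 if x > b,
and
Φ^{-1}(α) = (1 − α)a + αb,
respectively.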
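The two functions of a linear uncertain variable ℒ(a, b) can be evaluated numerically as in the following minimal Python sketch. The function names and the sample values a = 2, b = 6 are hypothetical and chosen only for illustration; the formulas in the code are the standard ones for a linear uncertain variable.

```python
def linear_distribution(a, b, x):
    """Uncertainty distribution of a linear uncertain variable L(a, b), a < b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def linear_inverse(a, b, alpha):
    """Inverse uncertainty distribution of L(a, b) at confidence level alpha."""
    return (1.0 - alpha) * a + alpha * b


# Hypothetical observation "between 2 and 6".
level = linear_distribution(2.0, 6.0, 3.0)   # belief degree that the value is <= 3
value = linear_inverse(2.0, 6.0, level)      # maps the belief degree back to 3
print(level, value)
```

The inverse distribution is the quantity the crisp equivalent form works with: at level α it returns the point that the uncertain variable does not exceed with belief degree α.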
Table 1. Imprecise observations in Example 1
Fig. 1. Imprecise observations in Example 1
In order to obtain the optimal value (w, b) of function y = wx + b, we formulate Model (6) according to the uncertain support vector regression model.
We want to know how the optimal solutions to Model (6) change with different confidence levels and different accuracy parameters. Let the parameters α_i = β_i ∈ {0.90, 0.95, 0.99} for i = 1, 2, …, 15, and let ε ∈ {10, 9, 8, 7, 6}. We then employ the function ‘fmincon’ in Matlab to solve Model (6); the results are presented in Table 2.
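How a confidence level and an accuracy parameter interact can be illustrated with a small Python sketch. It assumes, hypothetically, that each observation contributes two one-sided chance constraints, M{y − (wx + b) ≤ ε} ≥ α and M{(wx + b) − y ≤ ε} ≥ β, with x and y independent linear uncertain variables. This constraint form is an assumption made for illustration and may differ from the constraints of Model (6). Under that assumption, the operational law for inverse uncertainty distributions turns each chance constraint into a crisp inequality. All function names and sample numbers are made up for the example.

```python
def linear_inverse(a, b, alpha):
    """Inverse uncertainty distribution of a linear uncertain variable L(a, b)."""
    return (1.0 - alpha) * a + alpha * b


def tube_constraints(x_ab, y_ab, w, b, eps, alpha, beta):
    """Check two ASSUMED one-sided chance constraints for one observation.

    Assumed forms (illustrative, not taken from Model (6)):
        M{ y - (w*x + b) <= eps } >= alpha
        M{ (w*x + b) - y <= eps } >= beta
    where x ~ L(x_ab[0], x_ab[1]) and y ~ L(y_ab[0], y_ab[1]) are independent.
    Returns (feasible, upper_lhs, lower_lhs).
    """
    xa, xb = x_ab
    ya, yb = y_ab
    # y - w*x - b increases in y; it decreases in x when w >= 0 and
    # increases in x when w < 0, which selects level 1 - alpha or alpha for x.
    x_level = 1.0 - alpha if w >= 0 else alpha
    upper = linear_inverse(ya, yb, alpha) - w * linear_inverse(xa, xb, x_level) - b
    # w*x + b - y decreases in y; its monotonicity in x again follows the sign of w.
    x_level = beta if w >= 0 else 1.0 - beta
    lower = w * linear_inverse(xa, xb, x_level) + b - linear_inverse(ya, yb, 1.0 - beta)
    return (upper <= eps and lower <= eps), upper, lower


# Hypothetical observation x ~ L(1, 3), y ~ L(9, 11) and candidate line y = 2x + 5.
for eps in (6.0, 3.0):
    feasible, upper, lower = tube_constraints((1, 3), (9, 11), 2.0, 5.0, eps, 0.9, 0.9)
    print(eps, feasible, round(upper, 4), round(lower, 4))
```

In this sketch, raising the confidence level pushes both crisp left-hand sides up, so a given line needs a larger ε to remain feasible.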
Table 2. Optimal solutions (w, b) to Model (6) under different ε and α
When the confidence level α = 0.95 and the accuracy parameter ε = 6, the optimal function is y = 1.9500x + 6.4250. The function and the tube with radius 6 are plotted in Fig. 2.

Fig. 2. The result generated from Model (6) when α = 0.95, ε = 6.
The optimal function, y = 1.9500x + 6.4250, can be employed to predict a new observation. The forecast value of a new observation is also called the estimated value.
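A minimal Python sketch of this prediction step follows. It assumes a crisp (precisely known) new input, which is a simplification, and the input x = 10 is hypothetical; the coefficients and the tube radius are those reported above for α = 0.95 and accuracy parameter 6.

```python
W, B = 1.95, 6.425   # optimal (w, b) reported for alpha = 0.95 and radius 6
EPS = 6.0            # tube radius (accuracy parameter)


def forecast(x):
    """Point forecast from the fitted function y = W*x + B."""
    return W * x + B


def tube(x):
    """Lower and upper edges of the tube around the fitted function at x."""
    y_hat = forecast(x)
    return y_hat - EPS, y_hat + EPS


x_new = 10.0  # hypothetical crisp input, for illustration only
print(forecast(x_new), tube(x_new))
```

For x = 10 the point forecast is about 25.925, and the tube spans roughly 19.925 to 31.925.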
Suppose that the uncertain variable
Table 3. Imprecise observations in Example 2
We reformulate Model (7) to seek the optimal value of (w1, w2, b). Let the parameters α_i = β_i ∈ {0.90, 0.95, 0.99} for i = 1, 2, …, 20, and let ε ∈ {20, 15, 10, 8, 6}. The optimal solutions (w1, w2, b) can then be obtained by using the ‘fmincon’ function in Matlab. The results are reported in Table 4.
Table 4. Optimal solutions (w1, w2, b) to Model (7) under different ε and α
Conclusion
This paper proposed an uncertain support vector regression model to explore the relationship between explanatory variables and the response variable with imprecise observations. An optimization model was presented, and its crisp equivalent form was derived. Numerical examples were then given to illustrate the application of the uncertain support vector regression model. In future work, it would be worthwhile to generalize the model to nonlinear regression and to identify outliers in high-dimensional regression.
Acknowledgment
This work was supported by the National Natural Science Foundation of China (Nos. 72071008 and 71771011).
