Uncertain least square support vector regression with imprecise observations

Abstract

In this study, an innovative approach is proposed that combines least square support vector regression (LSSVR) with uncertainty theory to improve its performance on the low-quality or imprecise data found in real-world applications. The resulting model, called uncertain least square support vector regression (ULSSVR), incorporates chance constraints and simplified parameter selection, which are critical for handling imprecise observations. A numerical algorithm, the conjugate residual (CR) method, is introduced to reduce the computational complexity of solving the model. Experimental results on both small and medium-sized datasets show that ULSSVR outperforms other models, namely uncertain support vector regression (USVR), the uncertain linear model, the uncertain polynomial model, and uncertain growth models, in prediction accuracy and generalization ability. ULSSVR not only improves prediction accuracy by at least 28.49% but also computes faster. Overall, ULSSVR is a promising solution for data science and internet applications, where imprecise and low-quality data are a common challenge.

Keywords

Least square support vector regression, uncertainty theory, conjugate residual method, chance constraint

1 Introduction

Least square support vector machine (LSSVM) has been extensively applied in academia and industry due to its impressive generalization performance. Devised by Suykens and his collaborators in 1999 [1], LSSVM is based on the support vector machine (SVM) proposed by Vapnik et al. [2, 3]. Like SVM, LSSVM maps the input vector into a high-dimensional feature space through a nonlinear mapping. According to its purpose, LSSVM can be divided into least square support vector classification (LSSVC) and LSSVR [4], the latter being the main focus of this work. In contrast with traditional SVM, LSSVM uses equality constraints instead of inequality constraints. As a result, LSSVM solves a least square (LS) problem instead of the quadratic programming (QP) problem of SVM, and therefore runs significantly faster [4]. LSSVM has since attracted many researchers. Suykens et al. [5] used the conjugate gradient (CG) method [6] to solve the LS problem in LSSVM and presented a large-scale algorithm; the CG method effectively reduces the space complexity of LSSVM. Suykens and Vandewalle [7] constructed a recurrent network based on LSSVM that used early stopping as a form of regularization, reducing its computational complexity. In terms of practical applications, LSSVR has been applied successfully in various fields such as stock markets [8], river flow forecasting [9] and wind power generation [10].

The input data in the aforementioned studies are precise. In many cases, however, data or observations are imprecise or of low quality, especially those collected from humans. Traditional statistics is inefficient when dealing with such imprecise data, which has prompted researchers to search for alternatives. Zadeh [11, 12] proposed fuzzy set theory, and based on this theory, Lin and Wang [13] constructed the fuzzy SVM. Xie [14] introduced fuzzy parameters into LSSVM and thus proposed fuzzy LSSVM. Sun and Pan [15] established a data domain description fuzzy LSSVR to solve the isolated points problem. Over time, researchers have needed more effective and better-performing tools for processing imprecise observations. Liu [16] put forward uncertainty theory and later refined it [17]. This theory, which is based on the normality, duality, subadditivity, and product axioms, provides a useful method for handling imprecise observations. In uncertainty theory, imprecise observations can be regarded as uncertain variables, and the statistical tools of uncertainty theory can handle such variables efficiently. Uncertainty theory has now evolved into an active field of research. On the moments of uncertain variables, Sheng and Kar [18] proposed several methods via the inverse uncertainty distribution. In uncertain regression, Yao and Liu [19] proposed the uncertain least squares (LS) method, which is now a widely used estimation method. Lio and Liu [20] applied residual analysis to uncertain regression analysis and proposed the interval estimation of the predicted value. Li et al. [21] proposed uncertain SVR with chance constraints and a hard margin. In this work, uncertain vectors are treated as input vectors and the outputs are uncertain variables; ULSSVR handles these imprecise data and calculates the expectation of the prediction. ULSSVR is a novel option with strong performance for regression analysis.

The rest of the paper is organized as follows. Preliminary knowledge is introduced in Section 2. LSSVR with precise observations is reviewed in Section 3. In Section 4, some definitions and theorems are given, then ULSSVR is proposed and its properties are discussed. An algorithm for ULSSVR with low space complexity is presented in Section 5. The methods and theory of Sections 4 and 5 are applied to two numerical examples in Section 6, where ULSSVR shows better generalization performance than other uncertain regression models. Finally, some conclusions are drawn in Section 7.

2 Preliminaries

In this section, the axioms and a theorem of uncertainty theory are introduced as foundational knowledge.

Definition 2.1. (Liu [16]) Let $L$ be a σ-algebra on a nonempty set Γ. Then $(Γ, L)$ is a measurable space, and each element Λ in $L$ is called an event. A set function $M : L \to [0, 1]$ is called an uncertain measure if it satisfies the following axioms:

Axiom 1. (Normality Axiom) The universal set Γ has measure $M \{Γ\} = 1$ .

Axiom 2. (Duality Axiom) For any event Λ, $M \{Λ\} + M \{Λ^{c}\} = 1$ .

Axiom 3. (Subadditivity Axiom) For every countable sequence of events Λ₁, Λ₂, ⋯ , we have $M \{⋃_{i = 1}^{\infty} Λ_{i}\} \leq \sum_{i = 1}^{\infty} M \{Λ_{i}\} .$

For the uncertainty space $(Γ, L, M)$ , the following axiom about the product measure was given by Liu [22].

Axiom 4. (Product Axiom) Let $(Γ_{k}, L_{k}, M_{k})$ be uncertainty spaces for k = 1, 2, ⋯ . The product uncertain measure $M$ is an uncertain measure satisfying $M \{\prod_{k = 1}^{\infty} Λ_{k}\} = ⋀_{k = 1}^{\infty} M_{k} \{Λ_{k}\}$ , where Λ_k are arbitrarily chosen events from $L_{k}$ for k = 1, 2, ⋯ , respectively.

Theorem 2.1. (Liu [17]) A function $Φ^{- 1} : (0, 1) \to ℝ$ is the inverse uncertainty distribution of an uncertain variable ξ if and only if it is continuous and $M \{ξ \leq Φ^{- 1} (α)\} = α$ for all α ∈ (0, 1) .
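As a concrete instance of Theorem 2.1, consider the linear uncertain variable L(a, b), the type used for the toy dataset in Table 1. The sketch below assumes the standard linear uncertainty distribution Φ(x) = (x − a)/(b − a) on [a, b], whose inverse is Φ⁻¹(α) = (1 − α)a + αb. The function names are illustrative and do not come from the paper.

```python
def linear_cdf(a: float, b: float, x: float) -> float:
    """Uncertainty distribution of the linear uncertain variable L(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def linear_inv(a: float, b: float, alpha: float) -> float:
    """Inverse uncertainty distribution of L(a, b) for alpha in (0, 1)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return (1.0 - alpha) * a + alpha * b


# Theorem 2.1 for L(0.5, 2.5): the event {xi <= inv(alpha)} has measure alpha,
# so applying the distribution to the inverse value recovers alpha.
x95 = linear_inv(0.5, 2.5, 0.95)
recovered_alpha = linear_cdf(0.5, 2.5, x95)
```

Here `recovered_alpha` equals 0.95 up to floating-point rounding, which is the condition stated in the theorem.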

3 Least square support vector regression with precise observations

LSSVR was devised by Suykens and his collaborators [4] in 2002. It is based on SVR, which was proposed by Vapnik et al. [2]: the inequality constraints in SVR are replaced by equality constraints in LSSVR, so that the QP problem is converted to an LS problem via the Karush-Kuhn-Tucker (KKT) conditions.

Similar to SVR, the goal of LSSVR is to find a linear equation $f (x_{i}) = (ω^{T} x_{i}) + b, ω \in ℝ^{k}, b \in ℝ$ (1) that achieves the best prediction. In Equation (1), x_i , i = 1, 2, ⋯ , n, is the input vector, y_i, i = 1, 2, ⋯ , n, is the actual value, that is, the corresponding scalar output, ω is the parameter vector, and b is a constant parameter. Here and below, the dataset is denoted as $G = \{(x_{i}, y_{i})\}_{i = 1}^{n}$ , and G is divided into a test set D and a training set D^c, where D can be regarded as data from the ‘future’ that need to be predicted, and regression Equation (1) is estimated from D^c.

The regression Equation (1) is obtained by solving the following optimization problem $\begin{matrix} \min_{ω, b, e_{i}} & \frac{1}{2} ‖ ω ‖^{2} + \frac{C}{2} \sum_{i = 1}^{n} e_{i}^{2} \\ s . t . & y_{i} = (ω^{T} x_{i}) + b + e_{i}, i = 1, 2, \dots, n, \end{matrix}$ (2) where e_i = y_i - f ( x_i ) and C is the regularization constant.

To solve the optimization problem (2), the real-valued Lagrange multipliers β_i, i = 1, 2, ⋯ , n, are introduced to construct the Lagrangian

$L (ω, b, e, β) = \frac{1}{2} ‖ ω ‖^{2} + \frac{C}{2} \sum_{i = 1}^{n} e_{i}^{2} - \sum_{i = 1}^{n} β_{i} ((ω^{T} x_{i}) + b + e_{i} - y_{i}),$ (3) where β = [β₁, β₂, ⋯ , β_n] ^T. Then the optimization problem (2) can be transformed into the following linear system $Y = (I / C + H) β,$ (4) where Y = [y₁, y₂, ⋯ , y_n] ^T and H_ij = x_i x_j ^T is the element in the ith row and jth column of the matrix H.

Solving the linear system (4) for β yields the equation $f (x_{j}) = \sum_{i = 1}^{n} β_{i} x_{j} {x_{i}}^{T} + b,$ (5) which is the LSSVR-based equation.
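A minimal sketch of fitting LSSVR with a linear kernel is given below. It makes one loud assumption: system (4) as printed does not determine b, so the sketch solves the standard bordered LSSVR system of Suykens et al., which has the same block form as system (21) in Section 4. The helper names (`solve`, `fit_lssvr`, `predict`) and the tiny dataset are illustrative only.

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_lssvr(xs, ys, C):
    """Return (b, beta) from [[0, 1^T], [1, H + I/C]] [b; beta] = [0; Y]."""
    n = len(xs)
    M = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [dot(xs[i], xs[j]) + (1.0 / C if i == j else 0.0) for j in range(n)]
        M.append([1.0] + row)
    z = solve(M, [0.0] + list(ys))
    return z[0], z[1:]


def predict(x_new, xs, beta, b):
    """Equation (5): f(x) = sum_i beta_i <x, x_i> + b."""
    return sum(bi * dot(x_new, xi) for bi, xi in zip(beta, xs)) + b


# Illustrative data lying exactly on y = 2x + 1.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [1.0, 3.0, 5.0, 7.0]
b, beta = fit_lssvr(xs, ys, C=10.0)
y_mid = predict([1.5], xs, beta, b)
```

The first row of the bordered system forces the multipliers to sum to zero, so the fitted line passes through the mean point of the data; the slope is shrunk slightly by the regularization term.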

4 Uncertain least square support vector regression

Now, LSSVR will be introduced into uncertainty theory. Before that, some settings and methods are presented as follows.

Suppose ${\tilde{x}}_{i},$ i = 1, 2, ⋯ , n, is a set of imprecisely observed k-dimensional vectors, ${\tilde{y}}_{i},$ i = 1, 2, ⋯ , n, is a set of imprecisely observed data, and g is a monotonic mapping function with $g ({\tilde{x}}_{i}) = u_{i} = [u_{i 1}, u_{i 2}, \dots, u_{ik}]^{T}$ , where $u_{i 1}, u_{i 2}, \dots, u_{ik}, {\tilde{y}}_{i}$ are independent uncertain variables with regular uncertainty distributions Φ_i1, Φ_i2, ⋯ , Φ_ik, Ψ_i, i = 1, 2, ⋯ , n. These settings are used in the rest of this paper. The goal of ULSSVR is to seek ω and b to construct the following regression equation $f (u_{i}) = (ω^{T} u_{i}) + b, ω \in ℝ^{k}, b \in ℝ .$ (6) The constraint in ULSSVR is a chance constraint; before it is presented, the residual variable is defined.

Definition 4.1. Let ${\tilde{y}}_{i}, u_{i 1}, u_{i 2}, \dots, u_{ik}$ , i = 1, 2, ⋯ , n be uncertain variables with regular uncertainty distributions Ψ_i, Φ_i1, Φ_i2, ⋯ , Φ_ik, i = 1, 2, ⋯ , n, then $e_{i} = {\tilde{y}}_{i} - (ω^{T} u_{i}) - b .$ (7) The uncertain variable e_i is called residual variable, and vector $[e_{1}, e_{2}, \dots, e_{n}]^{T}$ (8) is called residual vector.

Following Definition 4.1, a theorem about the inverse uncertainty distribution of the residual variable can be proved.

Theorem 4.1. For the belief degree α_i, the residual variable e_i has the inverse uncertainty distribution $T^{- 1} (α_{i}) = Ψ_{i}^{- 1} (α_{i}) - \sum_{j = 1}^{k} ω_{j} ϒ_{ij}^{- 1} (α_{i}) - b,$ (9) with $M \{e_{i} \leq T^{- 1} (α_{i})\} = α_{i},$ (10) where $ϒ_{ij}^{- 1} (α_{i}) = \begin{cases} Φ_{ij}^{- 1} (1 - α_{i}), & \text{if } ω_{j} \geq 0, \\ Φ_{ij}^{- 1} (α_{i}), & \text{otherwise}. \end{cases}$ (11)

Proof. Since b is a constant, the event $\{e_{i} \leq T^{- 1} (α_{i})\}$ coincides with the event obtained by removing b from both sides, that is,

$\{e_{i} \leq T^{- 1} (α_{i})\} \equiv \{{\tilde{y}}_{i} - (ω^{T} u_{i}) \leq Ψ_{i}^{- 1} (α_{i}) - \sum_{j = 1}^{k} ω_{j} ϒ_{i j}^{- 1} (α_{i})\} .$

For the sake of simplicity, every component of the parameter vector ω is temporarily assumed to be negative, which means $ϒ_{ij}^{- 1} (α_{i}) = Φ_{ij}^{- 1} (α_{i})$ . Then the following two inclusions are obtained $\begin{aligned} \{e_{i} \leq T^{- 1} (α_{i})\} \supset {} & \{{\tilde{y}}_{i} \leq Ψ_{i}^{- 1} (α_{i})\} \cap \{u_{i 1} \leq ϒ_{i 1}^{- 1} (α_{i})\} \cap \\ & \{u_{i 2} \leq ϒ_{i 2}^{- 1} (α_{i})\} \cap \dots \cap \{u_{ik} \leq ϒ_{ik}^{- 1} (α_{i})\}, \\ \{e_{i} \leq T^{- 1} (α_{i})\} \subset {} & \{{\tilde{y}}_{i} \leq Ψ_{i}^{- 1} (α_{i})\} \cup \{u_{i 1} \leq ϒ_{i 1}^{- 1} (α_{i})\} \cup \\ & \{u_{i 2} \leq ϒ_{i 2}^{- 1} (α_{i})\} \cup \dots \cup \{u_{ik} \leq ϒ_{ik}^{- 1} (α_{i})\} . \end{aligned}$ By the independence of the variables, two inequalities for the measure follow: $\begin{aligned} M_{e_{i}} & \geq M \{({\tilde{y}}_{i} \leq Ψ_{i}^{- 1} (α_{i})) \cap (u_{i 1} \leq ϒ_{i 1}^{- 1} (α_{i})) \cap (u_{i 2} \leq ϒ_{i 2}^{- 1} (α_{i})) \cap \dots \cap (u_{ik} \leq ϒ_{ik}^{- 1} (α_{i}))\} \\ & = M \{{\tilde{y}}_{i} \leq Ψ_{i}^{- 1} (α_{i})\} \land M \{u_{i 1} \leq ϒ_{i 1}^{- 1} (α_{i})\} \land M \{u_{i 2} \leq ϒ_{i 2}^{- 1} (α_{i})\} \land \dots \land M \{u_{ik} \leq ϒ_{ik}^{- 1} (α_{i})\} \\ & = α_{i} \land α_{i} \land \dots \land α_{i} = α_{i}, \end{aligned}$ $\begin{aligned} M_{e_{i}} & \leq M \{({\tilde{y}}_{i} \leq Ψ_{i}^{- 1} (α_{i})) \cup (u_{i 1} \leq ϒ_{i 1}^{- 1} (α_{i})) \cup (u_{i 2} \leq ϒ_{i 2}^{- 1} (α_{i})) \cup \dots \cup (u_{ik} \leq ϒ_{ik}^{- 1} (α_{i}))\} \\ & = M \{{\tilde{y}}_{i} \leq Ψ_{i}^{- 1} (α_{i})\} \lor M \{u_{i 1} \leq ϒ_{i 1}^{- 1} (α_{i})\} \lor M \{u_{i 2} \leq ϒ_{i 2}^{- 1} (α_{i})\} \lor \dots \lor M \{u_{ik} \leq ϒ_{ik}^{- 1} (α_{i})\} \\ & = α_{i} \lor α_{i} \lor \dots \lor α_{i} = α_{i}, \end{aligned}$ where $M_{e_{i}} = M \{e_{i} \leq T^{- 1} (α_{i})\}$ . Hence $M \{e_{i} \leq T^{- 1} (α_{i})\} = α_{i}$ .

In the more general situation, if ω_j ≥ 0 and $ϒ_{ij}^{- 1} (α_{i}) = Φ_{ij}^{- 1} (1 - α_{i})$ , the measure of the event $\{u_{ij} \geq Φ_{ij}^{- 1} (1 - α_{i})\}$ is $M \{u_{ij} \geq Φ_{ij}^{- 1} (1 - α_{i})\} = α_{i} .$ Hence the residual variable e_i has the inverse uncertainty distribution $T^{- 1}$ . The theorem is proved.□
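Equations (9) and (11) can be evaluated directly once the inverse distributions are known. The sketch below assumes that every variable is a linear uncertain variable L(a, b) with inverse distribution (1 − α)a + αb, as in the toy dataset of Table 1. The function names and the values of ω and b are hypothetical and chosen only for illustration.

```python
def linear_inv(a, b, alpha):
    """Inverse uncertainty distribution of the linear uncertain variable L(a, b)."""
    return (1.0 - alpha) * a + alpha * b


def residual_inv(alpha, y_ab, u_abs, omega, b):
    """T^{-1}(alpha) from Equation (9) for linear uncertain variables.

    y_ab  -- (a, b) of the output variable
    u_abs -- list of (a, b) pairs, one per input dimension
    omega -- parameter vector, b -- constant parameter
    """
    total = linear_inv(y_ab[0], y_ab[1], alpha)
    for w, (lo, hi) in zip(omega, u_abs):
        # Equation (11): use level 1 - alpha when w >= 0, alpha otherwise.
        level = 1.0 - alpha if w >= 0 else alpha
        total -= w * linear_inv(lo, hi, level)
    return total - b


# First row of Table 1: x1 ~ L(0.50, 2.50), x2 ~ L(1.00, 3.00), y ~ L(4.50, 6.50).
y_ab = (4.5, 6.5)
u_abs = [(0.5, 2.5), (1.0, 3.0)]
omega = [1.0, 1.0]  # hypothetical parameters
t_half = residual_inv(0.5, y_ab, u_abs, omega, b=0.0)
t_high = residual_inv(0.95, y_ab, u_abs, omega, b=0.0)
```

With these inputs `t_high` exceeds `t_half`, as expected of an inverse uncertainty distribution, which must be increasing in α.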

The parameters of Equation (6) can be obtained from the following constrained optimization problem $\begin{matrix} \min_{ω, b, e_{i}} & \frac{1}{2} ‖ ω ‖^{2} + \frac{C}{2} \sum_{i = 1}^{n} {(T^{- 1} (α_{i}))}^{2} \\ s . t . & M \{e_{i} \leq T^{- 1} (α_{i})\} = α_{i}, i = 1, 2, \dots, n, \end{matrix}$ (12) where C is the regularization constant, ω and b are the parameters to be estimated, and α_i, i = 1, 2, ⋯ , n, is the belief degree, chosen a priori.

In many cases, however, researchers already know the inverse uncertainty distribution of each variable; in this situation, the equivalent form in the following theorem keeps things simple.

Theorem 4.2. The model (12) has an equivalent form $\begin{matrix} \min_{ω, b, e_{i}} & \frac{1}{2} ‖ ω ‖^{2} + \frac{C}{2} \sum_{i = 1}^{n} {(T^{- 1} (α_{i}))}^{2} \\ s . t . & T^{- 1} (α_{i}) = Ψ_{i}^{- 1} (α_{i}) - \sum_{j = 1}^{k} ω_{j} ϒ_{ij}^{- 1} (α_{i}) - b, \\ i = 1, 2, \dots, n . \end{matrix}$ (13)

Proof. On the one hand, it follows from Theorem 4.1 that the residual variable e_i has the inverse uncertainty distribution $T^{- 1} (α_{i}) = Ψ_{i}^{- 1} (α_{i}) - \sum_{j = 1}^{k} ω_{j} ϒ_{ij}^{- 1} (α_{i}) - b .$ (14) On the other hand, using Theorem 2.1, $M \{e_{i} \leq T^{- 1} (α_{i})\} = α_{i}$ holds if and only if Equation (14) holds. The theorem is proved.□

The optimization problem (13) can be rewritten as $\begin{matrix} \min_{ω, b, A} & \frac{1}{2} ω^{T} ω + \frac{C}{2} A^{T} A \\ s . t . & A = Y - X ω, \end{matrix}$ (15) where $A = [T_{1}^{- 1} (α_{1}), T_{2}^{- 1} (α_{2}), \dots, T_{n}^{- 1} (α_{n})]^{T}$ , $Y = [Ψ_{1}^{- 1} (α_{1}), Ψ_{2}^{- 1} (α_{2}), \dots, Ψ_{n}^{- 1} (α_{n})]^{T}$ and $X = \begin{bmatrix} 1 & ϒ_{11}^{- 1} (α_{1}) & ϒ_{12}^{- 1} (α_{1}) & \dots & ϒ_{1 k}^{- 1} (α_{1}) \\ 1 & ϒ_{21}^{- 1} (α_{2}) & ϒ_{22}^{- 1} (α_{2}) & \dots & ϒ_{2 k}^{- 1} (α_{2}) \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ 1 & ϒ_{n 1}^{- 1} (α_{n}) & ϒ_{n 2}^{- 1} (α_{n}) & \dots & ϒ_{nk}^{- 1} (α_{n}) \end{bmatrix} .$ (16) Introducing the real-valued Lagrange multipliers β_i, i = 1, 2, ⋯ , n, gives the following Lagrangian

$\begin{matrix} L (ω, A, β) & = \frac{1}{2} ω^{T} ω + \frac{C}{2} A^{T} A \\ - β^{T} (A + X ω - Y), \end{matrix}$ (17) where β = [β₁, β₂, ⋯ , β_n] ^T.

Setting the partial derivatives of the Lagrangian with respect to ω and β to 0 gives the following two equations

$\begin{matrix} ω = X^{T} β, \\ A + X ω - Y = 0 . \end{matrix}$ (18) Substituting Equation (18) into the Lagrangian (17) gives

$\begin{matrix} L (ω, A, β) & = \frac{1}{2} β^{T} X X^{T} β \\ & + \frac{C}{2} (β^{T} X X^{T} X X^{T} β \\ & - 2 β^{T} X X^{T} Y + Y^{T} Y) . \end{matrix}$ (19) Differentiating function (19) with respect to β and setting the derivative to 0 gives

$\begin{matrix} \frac{\partial L (ω, A, β)}{\partial β} & = X X^{T} β + C (X X^{T} X X^{T} β - X X^{T} Y) = 0 \\ & \Rightarrow C Y = (I + C X X^{T}) β, \end{matrix}$ (20) from which the following linear system is obtained $\begin{bmatrix} 0 & 1_{n}^{T} \\ 1_{n} & H + I / C \end{bmatrix} \begin{bmatrix} b \\ β \end{bmatrix} = \begin{bmatrix} 0 \\ Y \end{bmatrix},$ (21) where the element in the ith row and jth column of matrix H is $H_{ij} = ϒ_{i}^{- 1} (α_{i}) {(ϒ_{j}^{- 1} (α_{j}))}^{T}$ and 1_n = [1, 1, ⋯ , 1] ^T. Solving the linear system (21) for b and β yields the equation $f (u_{j}) = \sum_{i = 1}^{n} β_{i} u_{j} {(ϒ_{i}^{- 1} (α_{i}))}^{T} + b,$ (22) which is the ULSSVR-based equation.
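Equation (20) can also be cross-checked through the stationarity condition in A, which the derivation above does not write out. The following is a sketch under the assumption that the Lagrangian (17) is differentiated with respect to all of ω, A and β; it reaches the same relation without forming the quartic term in (19).

```latex
\[
\begin{aligned}
\frac{\partial L}{\partial ω} = 0 &\;\Rightarrow\; ω = X^{T} β, \\
\frac{\partial L}{\partial A} = 0 &\;\Rightarrow\; C A - β = 0 \;\Rightarrow\; A = β / C, \\
\frac{\partial L}{\partial β} = 0 &\;\Rightarrow\; A + X ω - Y = 0 .
\end{aligned}
\]
% Substitute the first two conditions into the third.
\[
\frac{β}{C} + X X^{T} β = Y
\;\Longleftrightarrow\;
C Y = (I + C X X^{T}) β .
\]
```

Written as (I / C + X X^T) β = Y, this is the coefficient matrix that Section 5 passes to the conjugate residual method.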

This ULSSVR has some properties:

Parameters

Model (13) has two parameters that are chosen a priori: the regularization constant C and the belief degree α. This is fewer than in USVR [23]. In USVR [23], seeking the optimal width of the margin may take a lot of time and computing power because the width ranges over [0, + ∞). ULSSVR does not have this parameter, so parameter selection may be simpler.

Lack of sparseness

According to Equation (20), normally no β_i is exactly equal to zero, so every input vector is a support vector. It follows that a drawback of ULSSVR is its lack of sparseness.

The lack of sparseness may make the linear system (21) harder to solve. For example, the space complexity of solving system (21) is O (n²), so memory may run out if n is large. Therefore, an algorithm with low space complexity is presented in the next section to solve linear system (21).

5 Numerical algorithm for ULSSVR

The conjugate residual (CR) method [24] is a method with low memory requirements for solving the linear system $A x = B$ [25], where $A$ is a symmetric matrix. First, a function of the residual $r = B - A x$ is defined as $R (x) = ‖ r ‖^{2} = x^{T} A^{T} A x - 2 x^{T} A^{T} B + B^{T} B,$ (23) whose gradient is $▽ R (x) = 2 A^{T} A x - 2 A^{T} B = - 2 A^{T} r .$ (24)

Algorithm 1 Conjugate residual
Input:
k = 0; x ⁽⁰⁾ is an initial guess; $r^{(0)} = B - A x^{(0)}$ ;
set p _0 = 0, λ_0 = 0
1: while the convergence condition is not reached do
2: k = k + 1
3: p _k = r ^(k-1) + λ_k-1 p _k-1
4: $μ_{k} = \frac{r^{(k - 1) T} A^{T} r^{(k - 1)}}{p_{k}^{T} A^{T} A p_{k}}$
5: x ^(k) = x ^(k-1) + μ_k p _k
6: $r^{(k)} = r^{(k - 1)} - μ_{k} A p_{k}$
7: $λ_{k} = \frac{r^{(k) T} A^{T} r^{(k)}}{r^{(k - 1) T} A^{T} r^{(k - 1)}}$
8: end while
9: x = x ^(k).

Along a search direction p _k, the minimum of function (23) is sought by the one-dimensional update $x^{(k)} = x^{(k - 1)} + μ_{k} p_{k}$ (25); the initial value of the direction is always set to zero [25]. Substituting Equation (25) into function (23) gives $R (x^{(k)}) = μ_{k}^{2} p_{k}^{T} A^{T} A p_{k} - 2 μ_{k} p_{k}^{T} A^{T} r^{(k - 1)} + R (x^{(k - 1)}) .$ (26) Since R ( x ^(k)) is a quadratic function of μ_k, the best μ_k is obtained by setting the partial derivative with respect to μ_k to zero, which gives $μ_{k} = \frac{p_{k}^{T} A^{T} r^{(k - 1)}}{p_{k}^{T} A^{T} A p_{k}} .$ (27) For the linear system (21), let $A = (I / C + X X^{T})$ , x = β and $B = Y$ ; then Algorithm 1 can solve this linear system.
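The conjugate residual iteration can be sketched in a few lines of plain Python. This is the textbook form of the method for a symmetric matrix (here assumed positive definite, as I / C + X X^T is for C > 0), not necessarily the authors' implementation. The function name, the tolerance and the 2 × 2 test system are assumptions made for illustration.

```python
def matvec(A, v):
    """Dense matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_residual(A, B, x0=None, tol=1e-10, max_iter=1000):
    """Solve A x = B for symmetric positive definite A."""
    n = len(B)
    x = list(x0) if x0 is not None else [0.0] * n
    r = [b - ax for b, ax in zip(B, matvec(A, x))]  # residual r = B - A x
    p = list(r)                                     # first direction is r
    Ar = matvec(A, r)
    Ap = list(Ar)
    rAr = dot(r, Ar)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                  # convergence condition
            break
        mu = rAr / dot(Ap, Ap)                      # step length
        x = [xi + mu * pi for xi, pi in zip(x, p)]
        r = [ri - mu * api for ri, api in zip(r, Ap)]
        Ar = matvec(A, r)
        rAr_new = dot(r, Ar)
        lam = rAr_new / rAr                         # direction update weight
        p = [ri + lam * pi for ri, pi in zip(r, p)]
        # A p is updated by recurrence, so each pass needs one product with A.
        Ap = [ari + lam * api for ari, api in zip(Ar, Ap)]
        rAr = rAr_new
    return x


# Small symmetric positive definite example.
A = [[4.0, 1.0], [1.0, 3.0]]
B = [1.0, 2.0]
x = conjugate_residual(A, B)
```

Only a handful of length-n vectors are kept between iterations. The sketch stores A densely for brevity; a memory-conscious variant would replace `matvec` with a routine that applies I / C + X X^T without forming the n × n matrix.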

6 Numerical example

Before ULSSVR is employed in the examples, the forecast value must be defined. A dataset is divided into a test set D and a training set D^c, where D needs to be predicted and the regression function is estimated from D^c. The forecast value is defined as follows.

Definition 6.1. Suppose that $({\tilde{x}}_{0}, {\tilde{y}}_{0})$ is the item to be forecast, and the ULSSVR-based equation fitted on the training set is $f (u_{0}) = (ω^{* T} u_{0}) + b^{*},$ where ω ^*, b^* are the estimated parameters and $u_{0} = g ({\tilde{x}}_{0})$ is the mapped data. Then the forecast value of ${\tilde{y}}_{0}$ is defined as the expected value ${\hat{\tilde{y}}}_{0} = E [(ω^{* T} u_{0}) + b^{*}] .$ (28)
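For linear uncertain variables the forecast value has a closed form. The sketch below rests on three assumptions that go beyond the text: the forecast value is the expected value of the fitted equation at u₀ (the introduction states that ULSSVR calculates the expectation of the prediction), the components of u₀ are independent linear uncertain variables L(a, b) with expected value (a + b)/2, and the expected value operator is linear for independent uncertain variables. The parameter values are hypothetical.

```python
def expected_linear(a: float, b: float) -> float:
    """Expected value of the linear uncertain variable L(a, b)."""
    return (a + b) / 2.0


def forecast_value(omega, bias, u0):
    """E[omega^T u0 + bias] for independent linear uncertain components.

    u0 is a list of (a, b) pairs, one per input dimension.
    """
    return sum(w * expected_linear(a, b) for w, (a, b) in zip(omega, u0)) + bias


# Hypothetical fitted parameters and one uncertain input vector.
omega_star = [1.0, 2.0]
b_star = 0.5
u0 = [(0.5, 2.5), (1.0, 3.0)]
y_hat = forecast_value(omega_star, b_star, u0)
```

For other distribution families the expected value can instead be obtained by integrating the inverse uncertainty distribution over (0, 1).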

In this section, a toy dataset is first employed to verify the feasibility of ULSSVR and reveal its properties. Then the generalization performance of ULSSVR is shown on a medium-sized dataset. The pipeline for using ULSSVR to process imprecise observations is shown in Fig. 1.

Fig. 1

Pipeline.

6.1 Toy dataset

The dataset shown in Table 1 is used to train ULSSVR and show some of its properties. The first 8 data are taken as the training set D^c and the last 2 data as the test set D. Using Algorithm 1 to train on D^c with C = 10, α = 0.95 and a linear kernel function, the ULSSVR-based equation is as follows $f (u_{j}) = \sum_{i = 1}^{n} β_{i}^{*} u_{j} {(ϒ_{i}^{- 1} (0.95))}^{T} + b^{*},$ (29)

Table 1

Toy dataset

x1	x2	y	x1	x2	y
$L (0.50, 2.50)$	$L (1.00, 3.00)$	$L (4.50, 6.50)$	$L (1.60, 3.60)$	$L (5.70, 7.70)$	$L (15.00, 17.00)$
$L (2.00, 4.00)$	$L (2.10, 4.10)$	$L (8.20, 10.20)$	$L (5.70, 7.70)$	$L (3.90, 5.90)$	$L (15.50, 17.50)$
$L (1.30, 3.30)$	$L (0.20, 2.20)$	$L (3.70, 5.70)$	$L (3.20, 5.20)$	$L (- 0.20, 1.80)$	$L (4.80, 6.80)$
$L (3.20, 5.20)$	$L (3.20, 5.20)$	$L (11.60, 13.60)$	$L (1.10, 3.10)$	$L (2.20, 4.20)$	$L (7.50, 9.50)$
$L (2.10, 4.10)$	$L (4.10, 6.10)$	$L (12.30, 14.30)$	$L (3.50, 5.50)$	$L (0.40, 2.40)$	$L (6.30, 8.30)$

where b^* = -1.76 and β^* = [-0.16, -0.04, -0.18, 0.07, 0.08, 0.16, 0.20, -0.13] ^T. The forecast values of D with two belief degrees are shown in Table 2.

According to model (29), every element of β is nonzero. USVR [23], however, obtains a different result. Training USVR with the same parameters and dataset gives the comparison of sorted | β| shown in Fig. 2. In Fig. 2, USVR has two zero β_i, which means that ULSSVR lacks sparseness compared with USVR.

Fig. 2

Comparison of sorted | β|.

Table 2

Forecast value

USVR	α = 0.90	α = 0.95
8.53	8.31	8.51
7.36	7.11	7.33

6.2 The uncertain abalone dataset

The uncertain abalone dataset has 4177 instances; each instance has a 5-dimensional uncertain vector as the input vector and an uncertain variable as the output. The output is the weight of the abalone, and the dimensions of the input vector represent:

sex of abalone (Sex),

longest shell measurement (Length),

diameter perpendicular to length (Diameter),

height with meat in shell (Height),

rings which +1.5 gives the age in years (Rings).

In this subsection, every model is trained 10 times, and each time the test set D, which accounts for 10 percent of the dataset, is randomly selected. In each run, after D and D^c are selected, the optimal parameters are chosen. Several methods for parameter search have been proposed. Runarsson and Sigurdsson [26] introduced an asynchronous parallel evolution strategy for this task. Li et al. [27] proposed the quantum butterfly optimization algorithm (QBOA) to find the hyper-parameters of a hybrid forecasting model; this algorithm uses quantum computing to expand the ergodicity of the search and improve the original butterfly optimization algorithm. However, no universal method or criterion is currently available for selecting the parameters of LSSVR. Smets et al. [28] found that v-fold cross-validation is an accurate and autonomous approach with low test error for parameter selection in LSSVR. Therefore, the cross-validation (CV) method [23] is applied to find the optimal parameters, with the average test error (ATE) [23] as the metric for CV. The model is then trained on D^c and the forecast values are computed; the root mean squared error (RMSE) [23] is used to measure the generalization performance of the models.
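The evaluation protocol can be sketched as follows. The sketch uses the usual definition of RMSE on crisp forecast and target values; the exact definitions of RMSE and ATE in [23] may differ. The function names and the seed are illustrative.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def random_split(n, test_fraction=0.1, seed=None):
    """Return (train_indices, test_indices) for a random split of range(n)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


# One of the 10 repetitions on a dataset with 4177 instances.
train_idx, test_idx = random_split(4177, test_fraction=0.1, seed=0)
example_error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Repeating `random_split` with ten different seeds and averaging the resulting RMSE values mirrors the repeated random hold-out described above.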

For parameter selection in ULSSVR, the average of the 10 ATE(4) values is shown in Fig. 3. Accordingly, the optimal parameters for this dataset are a polynomial kernel function with degree 3 and C = 1.¹

Fig. 3

Parameter selection for ULSSVR¹.

Some other uncertain regression models are also employed to process this dataset. These models use the same D, D^c and procedure as ULSSVR, and the RMSE of each model is computed. The results and optimal parameters are shown in Table 3. From Table 3, two results can be drawn:

Generalization Performance

RMSE has the same dimension as the data and is sensitive to large errors, so it is commonly used as a standard measure of the generalization performance of machine learning models. Comparing the RMSE of each model on the uncertain abalone dataset, ULSSVR shows superior prediction accuracy compared with USVR, the uncertain linear model, the uncertain polynomial model and the two uncertain growth models. ULSSVR attains the lowest RMSE, 0.1097 (shown in Table 3), which is 28.49% lower than that of USVR, the next-best model; such a reduction is valuable in many real-world applications. This advantage may become even more significant when dealing with complex and noisy datasets.
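The 28.49% figure can be reproduced from the RMSE values reported in Table 3. The short check below computes the relative RMSE reduction of ULSSVR against every other model; the helper name is illustrative.

```python
# RMSE values copied from Table 3.
rmse_by_model = {
    "Johnson-Schumacher": 0.4613,
    "Logistic": 0.4382,
    "Linear": 0.1825,
    "Polynomial": 0.1572,
    "USVR": 0.1534,
    "ULSSVR": 0.1097,
}


def relative_reduction(baseline: float, new: float) -> float:
    """Percentage by which `new` is lower than `baseline`."""
    return 100.0 * (baseline - new) / baseline


reductions = {
    model: relative_reduction(value, rmse_by_model["ULSSVR"])
    for model, value in rmse_by_model.items()
    if model != "ULSSVR"
}
closest_model = min(reductions, key=reductions.get)
```

The smallest reduction is the one against USVR, which is why the improvement is stated as "at least" 28.49%; the reductions against the other models are larger.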

Parameters

ULSSVR and USVR have the same kernel function but different optimal C. The two models have analogous structures and ideas, but differ in penalty function and sparseness. ULSSVR has fewer parameters than USVR, so less time is spent on parameter selection, but more time is needed for feature engineering.

Table 3

RMSE of each model

Model	Optimal parameters	RMSE
Johnson-Schumacher		0.4613
Logistic		0.4382
Linear		0.1825
Polynomial	Degree = 3	0.1572
USVR	Kernel = polynomial, ɛ = 0.05, C = 20	0.1534
ULSSVR	Kernel = polynomial, C = 1	0.1097

For computational performance, the memory footprint and the elapsed time of the two models are shown in Table 4. Based on the data in Table 4, the following conclusions can be drawn:

Elapsed Time Comparison

ULSSVR is faster than USVR in terms of elapsed time (3.39 s versus 5.77 s in Table 4). This means that ULSSVR takes less time for model training and prediction and can complete tasks more quickly. This advantage is especially important in applications that deal with large datasets or require real-time responses. By reducing runtime, ULSSVR can increase the efficiency of data processing, leading to faster feedback and more efficient decision support for real-world applications.

Memory Footprint Comparison

ULSSVR has a significantly higher memory footprint than USVR, which generally means higher resource consumption and a potentially negative impact on computer performance. Therefore, when choosing to use ULSSVR, it is important to consider the computer hardware configuration and available memory capacity to ensure that the system can withstand the memory demands of model operation.

Applicable Scenario Analysis

Considering the advantage of ULSSVR in running speed and its shortcoming in memory consumption, the following applicable scenarios can be derived: 1. ULSSVR may be more suitable for applications that require fast, real-time responses, such as financial market forecasting or real-time control systems. Its short runtime reduces latency and supports faster decision making. 2. USVR may be more advantageous for applications that have limited memory or need to operate in resource-constrained environments, such as embedded systems or mobile devices. Its lower memory footprint reduces the burden on the system. 3. For large-scale data processing and analysis, ULSSVR may be more suitable when sufficient memory is available: despite its higher memory footprint, it ran faster than USVR in the experiment reported in Table 4.

Tradeoffs and Choices

The choice between ULSSVR and USVR needs to be weighed against the needs and limitations of the specific application. ULSSVR can be chosen if model speed is in high demand and sufficient memory is available; USVR can be chosen if memory is limited or the model must run in a resource-constrained environment. Choosing according to the application scenario and actual needs makes better use of the advantages of each model.

Table 4

Memory footprint and elapsed time

Model	Memory footprint (MB)	Elapsed time (s)
USVR	38.61	5.77
ULSSVR	755.53	3.39

7 Conclusion

ULSSVR with imprecise observations was proposed in this paper. It has the same regularization term ‖ ω ‖² as USVR but a different loss function and different constraints. Owing to these differences, the QP problem is converted to an LS problem and the computational difficulty is reduced. However, the reduced computational difficulty comes with increased space complexity. To address this problem, the CR method, which has low memory requirements, was introduced into ULSSVR. After the theoretical part of ULSSVR was completed, a small dataset was employed to show how to use ULSSVR for forecasting and to verify its properties: it is less sparse than USVR, and only three parameters need to be chosen a priori. The generalization performance of ULSSVR was demonstrated on the uncertain abalone dataset. In this numerical experiment, ULSSVR obtained the smallest RMSE, indicating that it had the best generalization performance among the uncertain regression models on the uncertain abalone dataset. In addition, the memory footprint of USVR was lower than that of ULSSVR, but ULSSVR was faster than USVR, which makes ULSSVR better suited to data science and internet applications.

In future work, ULSSVR may be applied to multi-layer networks, and kernel functions in uncertainty theory may be investigated.

Footnotes

¹ In figure (a), poly-n means polynomial with degree n.

Acknowledgments

This research was supported by the Natural Science Foundation of China (Nos. 12061072 and 62162059) and the autonomous region’s key R&D plan project (No. 2022B01006).

References

1. Suykens and Vandewalle, Least squares support vector machine classifiers, Neural Processing Letters 9 (1999), 293–300.

2. Vapnik, Kotz, The Nature of Statistical Learning Theory, Springer Verlag, 1995.

3. Vapnik, Golowich and Smola, Support vector method for function approximation, regression estimation, and signal processing, Advances in Neural Information Processing Systems 9 (1997), 281–287.

4. Suykens, Van Gestel, De Brabanter, De Moor, Vandewalle, Least Squares Support Vector Machines, World Scientific, 2002.

5. Suykens, Lukas, Van, De, Vandewalle, Least squares support vector machine classifiers: a large scale algorithm, in: Proceedings of the European Conference on Circuit Theory and Design (ECCTD’99), 2000, pp. 839–842.

6. Hestenes and Stiefel, Methods of conjugate gradients for solving linear systems, Journal of Research of the National Bureau of Standards 49 (1952), 409–436.

7. Suykens and Vandewalle, Recurrent least squares support vector machines, IEEE Transactions on Circuits and Systems Part I: Fundamental Theory and Applications 47(7) (2000), 1109–1114.

8. […], Chen, Wang and Lai, Evolving least squares support vector machines for stock market trend mining, IEEE Transactions on Evolutionary Computation 13(1) (2009), 87–102.

9. Samsudin, Saad and Shabri, River flow time series using least squares support vector machines, Hydrology and Earth System Sciences 15 (2011), 1835–1852.

10. Zhang, Li, Wang and Du, A local semi-supervised ensemble learning strategy for the data-driven soft sensor of the power prediction in wind power generation, Fuel 2(333) (2023), 35–47.

11. Zadeh, Fuzzy sets, Information & Control 8(3) (1965), 338–353.

12. Zadeh, Probability measures of fuzzy events, Journal of Mathematical Analysis and Applications 23(2) (1968), 421–427.

13. Lin and Wang, Fuzzy support vector machines, IEEE Transactions on Neural Networks 13(2) (2002), 464–471.

14. Xie, Fuzzy least square support vector machine applied to detect damage for fiber smart structures, in: International Symposium on Intelligent Information Technology Application Workshops, 2008, pp. 383–386.

15. Sun and Pan, Research and application of fuzzy least square support vector machine regression, Computer Systems & Applications 23(8) (2014), 105–108.

16. Liu, Uncertainty Theory, 2nd edn, Springer Berlin Heidelberg, 2007.

17. Liu, Uncertainty Theory: A Branch of Mathematics for Modeling Human Uncertainty, Springer Berlin Heidelberg, 2010.

18. Sheng and Kar, Some results of moments of uncertain variable through inverse uncertainty distribution, Fuzzy Optimization & Decision Making 14(1) (2015), 57–76.

19. Yao and Liu, Uncertain regression analysis: an approach for imprecise observations, Soft Computing 22(17) (2018), 5579–5582.

20. Lio and Liu, Residual and confidence interval for uncertain regression model with imprecise observations, Journal of Intelligent and Fuzzy Systems 35 (2018), 2573–2583.

21. Li, Qin and Liu, Uncertain support vector regression with imprecise observations, Journal of Intelligent & Fuzzy Systems 43(2022) (2022), 3403–3409.

22. Liu, Some research problems in uncertainty theory, Journal of Uncertain Systems 3(1) (2009), 3–10.

23. Zhang, Sheng, Support vector regression with imprecise observations, Technical report, Xinjiang University, 2022.

24. Golub, Loan, Matrix Computations, Johns Hopkins University Press, 1996.

25. Sogabe, Sugihara and Zhang, An extension of the conjugate residual method to nonsymmetric linear systems, Journal of Computational and Applied Mathematics 226 (2009), 103–113.

26. Runarsson, Sigurdsson, Model selection for support vector machines using an asynchronous parallel evolution strategy, in: Neural Networks and Signal Processing, 2003. Proceedings of the 2003 International Conference on, 2004, pp. 495–498.

27. Li, Xu, Geng and Hong, A ship motion forecasting approach based on empirical mode decomposition method hybrid deep learning network and quantum butterfly optimization algorithm, Nonlinear Dynamics 107 (2022), 2447–2467.

28. Smets, Verdonk, Jordaan, Evaluation of performance measures for SVR hyperparameter selection, in: International Joint Conference on Neural Networks, 2007, pp. 495–498.