Most works on Support Vector Regression (SVR) focus on kernel or loss functions, with the corresponding support vectors obtained using a fixed-radius ε-tube, affording good predictive performance on many datasets. However, the fixed radius prevents the adaptive selection of support vectors according to the data distribution characteristics, compromising the performance of SVR-based methods. Therefore, this study proposes an "Alterable ε-Support Vector Regression" (Aε-SVR) model by applying a novel tube radius, named "Alterable ε," to the SVR model. Based on the sparsity of the data points at each location, the model solves for a different εᵢ at the corresponding position, and thus narrows or widens the ε-tube by changing its radius. Such a variable ε-tube strategy diminishes the influence of noise and outliers in the dataset, enhancing the prediction performance of the Aε-SVR model. Accordingly, we suggest a novel non-deterministic algorithm to iteratively solve the complex problem of optimizing εᵢ at every location. Extensive experimental results demonstrate that our approach improves accuracy and stability on simulated and real data compared with the baseline methods.
The Support Vector Machine (SVM) is a powerful machine learning technique based on statistical learning theory and a promising tool for function approximation problems based on the Structural Risk Minimization (SRM) principle. As an advanced supervised learning algorithm, SVM performs data classification by constructing a separating hyperplane. After decades of development, it has reached a high level of maturity in terms of theoretical support1–5 and has been widely employed in various regression estimation tasks, such as cancer prevention and treatment,6–8 renewable energy prediction,9–11 and financial time series analysis.12–14 This regression scheme is termed Support Vector Regression (SVR) and often performs better than competing machine learning algorithms. SVR solves practical problems by introducing the prediction equation f(x) = wᵀφ(x) + b, where the predictor f predicts new observations in the feature space and φ is a transformation function in the covariate space. By introducing kernel functions, SVR attains an excellent nonlinear regression capability and has therefore been successfully applied to nonlinear prediction systems.
Over the past few decades, various studies have focused on the kernel function and the loss function. The former aims to avoid directly computing high-dimensional inner products and to improve the algorithm's generalization performance.15–17 The literature presents several kernel functions, such as the random radial basis function (RRBF) kernel,18 the composite wavelet kernel,19 the Hermite orthogonal polynomial kernel,20 and robust low-rank multiple kernels.21 Through these kernel functions, the SVR model can implicitly transform the original data into a nonlinear kernel space while avoiding the "curse of dimensionality" in computation, improving classification accuracy and prediction.
Considering the conventional loss functions, these include the Huber loss function-based SVR,3 the quadratic loss function-based LS-SVR,22 the maximum likelihood optimal and robust support vector regression model,23 and the 1-norm SVR.24
Motivated by ε-SVR, Anand et al.25 introduced a model utilizing an ε-penalty loss function (ε-PSVR) that applies different penalty rates to the data points depending on whether they lie inside or outside the ε-tube. Moreover, Gupta and Gupta26 proposed the asymmetric ν-twin support vector regression (Asy-ν-TSVR), which finds two non-parallel hyperplanes to construct the corresponding decision functions given two different ε-insensitive loss functions serving as lower and upper bound functions. Besides, Balasundaram and Meena27 presented an ε-insensitive asymmetric Huber function-based (ε-AHSVR) model, where an ε-insensitive loss function is integrated with the Huber loss function to compensate for the high complexity of the latter. Cheng and Lu28 developed an ε-insensitive square loss function-based Bayesian SVR model that adopts the structural risk minimization principle through the ε-insensitive square loss function while providing point-by-point probability predictions and allowing the optimal hyperparameters to be determined by maximizing the Bayesian model evidence.
Most innovations on loss functions rely on the ε-insensitive loss function because, among all points in the training set, it ignores the points lying inside the ε-tube. Therefore, the prediction model is essentially determined by the support vectors located on the edges of or outside the tube, enhancing the generalization ability of ε-SVR. Here, the ε-insensitive loss function1 is given by:

$$L_\varepsilon\big(y, f(x)\big) = \begin{cases} 0, & |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon, & \text{otherwise,} \end{cases}$$

where ε ≥ 0. Figure 1 shows some important loss functions. Employing L_ε in ε-SVR enhances robustness to errors, with the error tolerance at each position being ε. Such an approach allows the solution vector of the ε-SVR model to become sparse.
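For concreteness, the piecewise loss above can be written as a small NumPy helper (the function name is ours):

```python
import numpy as np

def eps_insensitive_loss(y, f, eps):
    """Standard epsilon-insensitive loss: zero inside the eps-tube,
    and the linear penalty |y - f| - eps outside it."""
    residual = np.abs(np.asarray(y, dtype=float) - np.asarray(f, dtype=float))
    return np.maximum(residual - eps, 0.0)
```

Points whose residual stays inside the tube contribute no loss, which is exactly why only points on or outside the tube become support vectors.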
Loss functions corresponding to several different models.
However, these new loss functions only focus on the ε-tube radius value or on optimizing the hyperparameter ε to minimize the empirical risk. When complex datasets are involved, a single value of ε cannot capture the support vectors at the different locations of high-dimensional data. Indeed, an ε-tube with a fixed radius limits the choice of support vectors at each position, preventing the model from utilizing the full information of the training set.
To address the above challenges, this work proposes the "Alterable ε-Support Vector Regression" (Aε-SVR) based on an alterable ε-insensitive loss function. Similar to the standard ε-SVR, Aε-SVR measures the empirical risk through the loss function computed with the alterable εᵢ, together with a regularization term. This loss function is given by:

$$L_{\varepsilon_i}\big(y_i, f(x_i)\big) = \begin{cases} 0, & |y_i - f(x_i)| \le \varepsilon_i \\ |y_i - f(x_i)| - \varepsilon_i, & \text{otherwise,} \end{cases}$$

where εᵢ ≥ 0, i = 1, 2, …, n. The value of each alterable εᵢ depends on the distribution sparsity around the data point's location, allowing Aε-SVR to capture support vectors for each position adaptively. Figure 2 illustrates the proposed εᵢ loss function with different values of ε.
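The only change from the standard loss is that the tube radius becomes a per-point vector; a minimal sketch (our naming):

```python
import numpy as np

def alterable_eps_loss(y, f, eps_i):
    """Alterable epsilon-insensitive loss: each sample i carries its own
    tube radius eps_i[i], so the tolerance varies across locations."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    eps_i = np.asarray(eps_i, dtype=float)
    return np.maximum(np.abs(y - f) - eps_i, 0.0)
```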
The proposed alterable loss function with different values of ε.
The ε-tube constraints of the standard ε-SVR and Aε-SVR are |yᵢ − f(xᵢ)| ≤ ε and |yᵢ − f(xᵢ)| ≤ εᵢ, respectively. Hence, this paper introduces the concept of an adaptive ε, which allows Aε-SVR to exploit the information provided by the training set, adapt better to the data distribution characteristics, and improve the prediction effect. Although the kernel function can map the data to a high-dimensional Hilbert space and effectively deal with high-dimensional data, the hyper-plane is not flexible enough to fit every point in the space when making predictions in the high-dimensional feature space. Therefore, a hyper-surface is obtained after training the Aε-SVR model on the training set, which affords better prediction performance than the hyper-plane.
However, the fitted surface's overall shape varies with the value of εᵢ at each location, posing a large computational burden when choosing and solving for εᵢ. Hence, to overcome these challenges, this paper suggests a novel non-deterministic algorithm for determining εᵢ.
Specifically, the proposed SVR model improves upon the previously discussed methods, retaining the loss function (the standard ε-SVR expression) and the kernel function (the RBF kernel) while introducing many penalty values εᵢ through an iterative process, thereby providing a variable ε-tube. Unlike the standard ε-tube, the variable ε-tube-based Aε-SVR model allows a different penalty depending on the data distribution, which enhances our method's flexibility in removing outliers, mitigates noise interference on the prediction curve, and finally achieves better prediction results.
The remainder of this paper is organized as follows. Section 2 compares the proposed Aε-SVR model against the standard ε-SVR model in both linear and nonlinear scenarios. Section 3 derives the solution of Aε-SVR, and Section 4 introduces a novel iterative fluctuation self-selection algorithm. Sections 5 and 6 present and discuss the key comparisons based on real-world benchmarks, and finally, Section 7 concludes this work and provides some future research directions.
Theory of alterable ε-support vector regression
This section briefly reviews the standard ε-support vector regression model and compares it with the Alterable ε-SVR in both linear and nonlinear scenarios. Given a training set T = {(xᵢ, yᵢ)}ᵢ₌₁ⁿ, C is a pre-specified penalty value, b is the offset term, and ξᵢ and ξᵢ* are slack variables that represent the upper and lower constraints on the system outputs.
Linear support vector regression model
To estimate the linear function f(x) = wᵀx + b, where w is the weight vector and b is the offset term, the standard ε-Support Vector Regression minimizes:

$$\min_{w,\,b,\,\xi,\,\xi^*} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\big(\xi_i + \xi_i^*\big)$$

subject to

$$y_i - w^\top x_i - b \le \varepsilon + \xi_i, \qquad w^\top x_i + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, n.$$
The Aε-SVR model minimizes

$$\min_{w,\,b,\,\xi,\,\xi^*} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\big(\xi_i + \xi_i^*\big)$$

subject to

$$y_i - w^\top x_i - b \le \varepsilon_i + \xi_i, \qquad w^\top x_i + b - y_i \le \varepsilon_i + \xi_i^*, \qquad \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, n.$$
Figure 3 illustrates an intuitive geometric interpretation of the linear SVR. Normally, the standard ε-SVR model has a tube with a fixed radius which, although it avoids over-fitting, cannot capture all support vectors. Indeed, the fixed-radius ε-tube provides the same tolerance to all data points, whether closer to or further from the tube wall, leaving the ε-SVR model insensitive to the local data distribution. To overcome this problem, we propose the Alterable ε-SVR model which, based on the training set, captures the appropriate support vectors for each position by iteratively obtaining a distinct εᵢ per position. At a location with a dense data point distribution, the Aε-SVR model increases the number of support vectors by shortening the ε-tube radius; on the contrary, at a location with a sparse data point distribution, our method increases the ε-tube radius, decreasing the number of support vectors. This way, the orientation and displacement of the fitted function can change to fit the data points better and exhibit better predictive performance.
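The dense-versus-sparse intuition can be illustrated with a toy density proxy. This is not the paper's iterative algorithm; it simply scales a base radius by the mean distance to the k nearest neighbours of each 1-D point, so sparse locations get a wider tube (the function and its names are ours):

```python
import numpy as np

def density_scaled_eps(x, base_eps=0.2, k=5):
    """Illustrative sparsity proxy for 1-D inputs: the mean distance to
    the k nearest neighbours is large where points are sparse, so those
    positions receive a larger tube radius (and dense ones a smaller one)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    d = np.abs(x - x.T)                  # pairwise distance matrix
    d.sort(axis=1)                       # row-wise ascending distances
    knn = d[:, 1:k + 1].mean(axis=1)     # skip the zero self-distance
    return base_eps * knn / knn.mean()   # rescale around base_eps
```

On a sample where four points cluster together and one lies far away, the isolated point receives a larger radius than the clustered ones.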
Comparison of linear support vector regression models: (a) ε-SVR (kernel = Linear) and (b) Aε-SVR (kernel = Linear).
Non-linear support vector regression model
To estimate the nonlinear function f(x) = wᵀφ(x) + b, where φ is a non-linear mapping from the input space to a higher-dimensional Hilbert space, the standard ε-Support Vector Regression minimizes

$$\min_{w,\,b,\,\xi,\,\xi^*} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\big(\xi_i + \xi_i^*\big)$$

subject to

$$y_i - w^\top\varphi(x_i) - b \le \varepsilon + \xi_i, \qquad w^\top\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, n.$$
The Aε-SVR model minimizes

$$\min_{w,\,b,\,\xi,\,\xi^*} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\big(\xi_i + \xi_i^*\big)$$

subject to

$$y_i - w^\top\varphi(x_i) - b \le \varepsilon_i + \xi_i, \qquad w^\top\varphi(x_i) + b - y_i \le \varepsilon_i + \xi_i^*, \qquad \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, n.$$
The geometric interpretation of the nonlinear SVR is presented in Figure 4(a), highlighting that ε-SVR performs well on some data points but poorly on others. This is probably because all data points are treated equally and share the same tolerance due to the limitation of the fixed-radius ε-tube. Nevertheless, real datasets tend to be unevenly distributed, with both sparsely and densely distributed data points. The fixed radius prevents adaptively selecting the support vectors according to the distribution characteristics of the data, compromising the predictive performance of ε-SVR.
Comparison of non-linear support vector regression models: (a) ε-SVR (kernel = RBF) and (b) Aε-SVR (kernel = RBF).
Considering Aε-SVR, shown in Figure 4(b), our model distinguishes locations with sparse and dense data point distributions and calculates distinct tolerances εᵢ at the different positions, so that the ε-tube radius changes with location. Moreover, changing the ε-tube radius makes the prediction curve more flexible, thus producing better prediction results. It should be noted that this section presents only the schematic diagram of Aε-SVR; its performance in fitting real datasets is evaluated in Section 5.
Solution of alterable ε-support vector regression
Given a training set T = {(xᵢ, yᵢ)}ᵢ₌₁ⁿ, w is the weight vector, b is a real constant, ξᵢ and ξᵢ* are slack variables that represent the upper and lower bound constraints of the system output, and C > 0 is a pre-specified value. To solve equation (4), we formulate the Lagrangian as follows:

$$\mathcal{L} = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\big(\xi_i + \xi_i^*\big) - \sum_{i=1}^{n}\alpha_i\big(\varepsilon_i + \xi_i - y_i + w^\top x_i + b\big) - \sum_{i=1}^{n}\alpha_i^*\big(\varepsilon_i + \xi_i^* + y_i - w^\top x_i - b\big) - \sum_{i=1}^{n}\big(\eta_i\xi_i + \eta_i^*\xi_i^*\big),$$

where αᵢ, αᵢ* ≥ 0 and ηᵢ, ηᵢ* ≥ 0 (i = 1, 2, …, n) are the Lagrange multipliers.
Setting the partial derivatives of the Lagrangian with respect to w, b, ξᵢ, and ξᵢ* to zero yields the KKT stationarity conditions

$$w = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\,x_i, \qquad \sum_{i=1}^{n}(\alpha_i - \alpha_i^*) = 0, \qquad C - \alpha_i - \eta_i = 0, \qquad C - \alpha_i^* - \eta_i^* = 0.$$

According to the above KKT conditions, the Wolfe dual of the primal problem equation (4) can be obtained as follows:

$$\max_{\alpha,\,\alpha^*} \ -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,x_i^\top x_j - \sum_{i=1}^{n}\varepsilon_i(\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i(\alpha_i - \alpha_i^*)$$

subject to

$$\sum_{i=1}^{n}(\alpha_i - \alpha_i^*) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^* \le C.$$
If there is a nonlinear relationship between y and x, we can linearize it by mapping x to a higher-dimensional feature space29 through the mapping φ. The estimated regressor is given by:

$$f(x) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\,\varphi(x_i)^\top\varphi(x) + b.$$
According to the Mercer condition, the inner product φ(xᵢ)ᵀφ(xⱼ) can be replaced by a positive definite kernel

$$K(x_i, x_j) = \varphi(x_i)^\top\varphi(x_j).$$
Given the excellent fitting properties of the radial basis function (RBF), we employ it to define the kernel function K(·,·). An RBF kernel function can be expressed as:

$$K(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right).$$
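In code, the RBF kernel is a one-liner; we use the 2σ² parameterization matching the kernel parameter σ used in this paper (some implementations use γ = 1/(2σ²) instead):

```python
import numpy as np

def rbf_kernel(x1, x2, sigma=1.0):
    """RBF kernel K(x1, x2) = exp(-||x1 - x2||^2 / (2 * sigma^2))."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))
```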
This transformation reduces the computational complexity of the optimization problem, which can therefore be reformulated from equation (16) as follows:

$$\max_{\alpha,\,\alpha^*} \ -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,K(x_i, x_j) - \sum_{i=1}^{n}\varepsilon_i(\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i(\alpha_i - \alpha_i^*).$$
In the matrix form used for computer calculation, the above formula becomes:

$$\min_{\alpha,\,\alpha^*} \ \frac{1}{2}(\alpha - \alpha^*)^\top K (\alpha - \alpha^*) + \varepsilon^\top(\alpha + \alpha^*) - y^\top(\alpha - \alpha^*)$$

subject to

$$\mathbf{1}^\top(\alpha - \alpha^*) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^* \le C, \quad i = 1, \dots, n,$$

where K is the kernel matrix with entries Kᵢⱼ = K(xᵢ, xⱼ), ε = (ε₁, …, εₙ)ᵀ, and y = (y₁, …, yₙ)ᵀ.
Equation (21) is a Quadratic Programming Problem (QPP), where αᵢ and αᵢ* denote the Lagrange multipliers estimated when the εᵢ values and the constant C are given. Here, we use the quadratic programming package in Python to solve for α and α*. Using equation (14), the Aε-SVR function can be represented as:

$$f(x) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\,K(x_i, x) + b.$$
It is evident that the influence of εᵢ on the support vectors has been transformed into an influence on the values of αᵢ and αᵢ*. Concerning the training set, the samples that satisfy the condition |yᵢ − f(xᵢ)| ≥ εᵢ are used as support vectors for the SVR and must be located on the edges of or outside the tube. Here, we alter εᵢ at each position in the training set and thereby change the number of support vectors at the corresponding location. In this case, given the training set, the adaptive εᵢ is computed through the iterative algorithm (Algorithm 1) to determine the best support vectors at every position. Based on the KKT conditions, b can be computed from each support vector as:

$$b_i = \begin{cases} y_i - \sum_{j=1}^{n}(\alpha_j - \alpha_j^*)\,K(x_j, x_i) - \varepsilon_i, & 0 < \alpha_i < C \\ y_i - \sum_{j=1}^{n}(\alpha_j - \alpha_j^*)\,K(x_j, x_i) + \varepsilon_i, & 0 < \alpha_i^* < C. \end{cases}$$
Finally, we take the average of the values of b obtained from all the support vectors:

$$b = \frac{1}{|S|}\sum_{i \in S} b_i,$$

where S denotes the index set of support vectors.
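Once the dual coefficients and the offset are available, prediction is a kernel-weighted sum over the support vectors. A generic sketch, not tied to any particular QP solver (names are ours):

```python
import numpy as np

def svr_predict(x, sv_x, beta, b, sigma=1.0):
    """Kernel-SVR prediction f(x) = sum_i beta_i * K(sv_x[i], x) + b,
    where beta_i = alpha_i - alpha_i^* are the dual coefficients of the
    support vectors and K is the RBF kernel with parameter sigma."""
    sv_x = np.asarray(sv_x, dtype=float)
    x = np.asarray(x, dtype=float)
    sq_dist = np.sum((sv_x - x) ** 2, axis=1)      # ||x_i - x||^2 per SV
    k = np.exp(-sq_dist / (2.0 * sigma ** 2))      # kernel values
    return float(np.dot(beta, k) + b)
```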
Algorithm 1: The iterative fluctuation self-selection algorithm
Input: training set, proportion P, initial ε used in the first iteration, repeat times M, constant C, kernel parameter σ, fluctuation F, criterion Q
Output: alterable εᵢ
1:  According to the proportion P, randomly divide the data into a training subset and a test subset
2:  for m = 1 to M do
3:      for i = 1 to N do
4:          if |yᵢ − f(xᵢ)| satisfies the screening criterion Q then
5:              keep the current εᵢ
6:          else
7:              update εᵢ from the current residual
8:          end if
9:      end for
10:     Calculate α, α*, and b according to equation (21)
11:     Calculate the training MAE
12:     if the MAE obtained with εᵢ − F is the smallest then
13:         εᵢ ← εᵢ − F
14:     else if the MAE obtained with εᵢ + F is the smallest then
15:         εᵢ ← εᵢ + F
16:     else
17:         keep εᵢ
18:     end if
19: end for
Iterative fluctuation self-selection algorithm
The developed framework comprises three steps. First, the data is divided into K subsets; we randomly select K − 1 subsets as training samples and keep the rest as test samples, while the kernel parameter σ, the initial ε, and the constant C are given. The second step determines the starting position of the iteration by introducing the standard ε-SVR, after which the residual |yᵢ − f(xᵢ)| is calculated; data points outside the tube satisfy |yᵢ − f(xᵢ)| > εᵢ. Here, F is the fluctuation coefficient, which provides a fluctuation range for each ε value and increases its randomness to prevent overfitting. Moreover, we set a criterion Q (Q = 0.1): when a candidate value meets this criterion, it is taken as the corresponding εᵢ for Aε-SVR. After this screening, the εᵢ can be obtained. The last step acquires the optimal εᵢ through the iterative fluctuation self-selection algorithm, which is summarized in Algorithm 1.
In Algorithm 1, the MAE is used as the reliability index, and the fluctuation F is exploited to improve the model's generalization ability. By appropriately setting F, the candidates ε − F and ε + F are generated, and we then find the best value among ε − F, ε, and ε + F. If F is set too large, we may miss a satisfactory ε; if it is set too small, the iteration process lasts longer.
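Our reading of this fluctuation step can be sketched as a pick-best-of-three helper. This is an illustrative reconstruction, not the authors' exact code; `mae_of` is a hypothetical caller-supplied callback that trains the model with a given radius and returns its training MAE:

```python
import numpy as np

def select_eps(eps, mae_of, F=0.05):
    """Choose the best tube radius among (eps - F, eps, eps + F) by the
    MAE reliability index, keeping the radius non-negative."""
    candidates = [max(eps - F, 0.0), eps, eps + F]
    scores = [mae_of(e) for e in candidates]
    return candidates[int(np.argmin(scores))]
```

A larger F explores more aggressively at the risk of skipping over a good radius; a smaller F converges more slowly, matching the trade-off described above.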
Figure 5 illustrates the relationship between the reliability index and the number of algorithm calls, revealing that as the number of iterations increases, the prediction accuracy of the Aε-SVR model improves.
Summary of the steps performed by the iterative fluctuation self-selection algorithm.
Applications on simulation and real data
This section evaluates the proposed approach on seven simulated and seven real-world benchmark datasets. For all subsequent experiments, we employ the RBF kernel, where σ is the kernel parameter in all regression models; its value, together with the parameters C and ε of all SVR models, is obtained by grid search over predefined sets. Moreover, the following three common criteria are used to verify the effectiveness of the Aε-SVR model: SSE/SST, RMSE, and MAE.
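The three criteria can be computed directly from the predictions; SSE/SST is the ratio of the residual sum of squares to the total sum of squares:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """SSE/SST, RMSE, and MAE, the three evaluation criteria used in
    the experiments."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    sse = np.sum(residual ** 2)                      # residual sum of squares
    sst = np.sum((y_true - y_true.mean()) ** 2)      # total sum of squares
    return {"SSE/SST": sse / sst,
            "RMSE": np.sqrt(np.mean(residual ** 2)),
            "MAE": np.mean(np.abs(residual))}
```

Lower values are better for all three; SSE/SST additionally normalizes the error by the intrinsic variance of the targets, so it is comparable across datasets.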
Simulation data application
We generate seven simulated datasets randomly using Python (http://www.python.org/) to evaluate the proposed method's performance and behavior. Accordingly, we employ seven datasets, that is, 1–2, 3–5, 6, and 7, following Anand et al.,25 Scornet,30 van der Laan et al.,31 and Meier et al.,32 respectively. Regarding the regression frameworks, we consider multivariate covariates: in datasets 1–2 each predictor is drawn from one distribution, while in datasets 3–7 each predictor is drawn from another and transformed through a fixed function. All datasets are described below:
All datasets contain 400 training and 100 test samples. Table 1 reports the performance of the ε-SVR, LS-SVR, and Aε-SVR models under several evaluation metrics: SSE/SST, RMSE, and MAE were calculated on the seven simulation datasets with different data distribution characteristics, and the optimal parameter settings per dataset are listed in Table 1. To compare performance objectively, the values of C and σ of the competitor models are aligned with those of Aε-SVR. Table 1 highlights that, without changing these parameters, the proposed Aε-SVR model presents a significant improvement in the SSE/SST, RMSE, and MAE metrics compared with the ε-SVR and LS-SVR models. Specifically, Aε-SVR obtains the lowest RMSE value (0.103) on artificial Dataset 1, and it consistently outperforms the ε-SVR and LS-SVR models, demonstrating an appealing generalization ability.
Performance comparison of the ε-SVR, LS-SVR, and Aε-SVR models on the simulation datasets.
Datasets | Regressor | SSE/SST | RMSE | MAE | (σ, C, ε)
Dataset 1 | ε-SVR | 0.080 | 0.107 | 0.086 | (0.1, 1, 0.2)
| LS-SVR | 0.081 | 0.108 | 0.085 | (0.1, 1, –)
| Aε-SVR | 0.074 | 0.103 | 0.081 | (0.1, 1, –)
Dataset 2 | ε-SVR | 0.321 | 2.210 | 1.760 | (0.1, 10, 1)
| LS-SVR | 0.320 | 2.207 | 1.735 | (0.1, 10, –)
| Aε-SVR | 0.317 | 2.194 | 1.729 | (0.1, 10, –)
Dataset 3 | ε-SVR | 0.597 | 0.483 | 0.376 | (0.7, 10, 0.5)
| LS-SVR | 0.582 | 0.477 | 0.370 | (0.7, 10, –)
| Aε-SVR | 0.576 | 0.474 | 0.366 | (0.7, 10, –)
Dataset 4 | ε-SVR | 0.377 | 0.532 | 0.419 | (0.1, 10, 0.6)
| LS-SVR | 0.379 | 0.534 | 0.427 | (0.1, 10, –)
| Aε-SVR | 0.374 | 0.529 | 0.415 | (0.1, 10, –)
Dataset 5 | ε-SVR | 0.289 | 0.489 | 0.383 | (0.1, 10, 0.3)
| LS-SVR | 0.292 | 0.492 | 0.388 | (0.1, 10, –)
| Aε-SVR | 0.283 | 0.482 | 0.377 | (0.1, 10, –)
Dataset 6 | ε-SVR | 0.281 | 0.565 | 0.461 | (0.1, 10, 0.01)
| LS-SVR | 0.285 | 0.569 | 0.464 | (0.1, 10, –)
| Aε-SVR | 0.273 | 0.558 | 0.453 | (0.1, 10, –)
Dataset 7 | ε-SVR | 0.199 | 0.514 | 0.422 | (0.1, 100, 0.6)
| LS-SVR | 0.193 | 0.505 | 0.412 | (0.1, 100, –)
| Aε-SVR | 0.188 | 0.499 | 0.408 | (0.1, 100, –)
Next, we calculated the reduction in SSE/SST, RMSE, and MAE for Aε-SVR relative to the ε-SVR and LS-SVR models. The corresponding results are illustrated in Figure 6, highlighting that the proposed model has an enhanced predictive performance compared to the competitor models, with the most obvious improvement being in the SSE/SST index. In Dataset 1, the SSE/SST value of Aε-SVR is 0.074, an improvement of 7.5% and 8.6% relative to ε-SVR and LS-SVR, respectively. These results reflect the importance of the "Alterable ε": the experiments imply that assigning a variable εᵢ depending on the data distribution characteristics is superior to treating each data point in the training set equally.
Percentage reduction of SSE/SST, RMSE, and MAE.
Real data application
We also apply our proposed method to real benchmark datasets, namely Triazines, Servo, NO2, Chwirut, Auto Mpg, Nelson, and Boston Housing. These datasets were downloaded from the UCI repository33 (https://archive.ics.uci.edu/address) and NLSRD (https://www.itl.nist.gov/div898/strd/nls/nls_main.shtml/address) and are commonly used to evaluate regression methods. Further information on the seven datasets is reported in Table 2. In this trial, we challenged the proposed model against existing SVR models, namely the ε-SVR model, the Huber SVR model, and ε-PSVR. The feature vectors were normalized for all datasets, and a 10-fold cross-validation scheme was used (Figure 7). The performance reported for ε-PSVR is the result obtained with the same datasets and processing.25 In Figure 8, we present the iterative results on three real benchmark datasets, highlighting that Algorithm 1 converges after about 30 iterations.
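A plain 10-fold split of the kind used here can be written without any library support (a sketch; in practice scikit-learn's `KFold` would be the usual choice):

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Shuffle the n sample indices once, split them into k folds, and
    yield (train_idx, test_idx) pairs with one fold held out at a time."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```

Each sample appears in exactly one test fold, so averaging the metric over the k folds uses every observation for both training and testing.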
Description of seven real-world benchmark datasets.
Datasets | No. of instances | Total features | Source
Auto Mpg | 398 | 9 | UCI
Boston Housing | 506 | 14 | UCI
Triazines | 186 | 61 | UCI
NO2 | 500 | 8 | UCI
Servo | 167 | 5 | UCI
Nelson | 128 | 3 | NLSRD
Chwirut | 214 | 2 | NLSRD
Convergence curves of Algorithm 1 on real-world datasets versus the number of iterations.
Comparison of model fits on the Auto Mpg dataset.
To comprehensively verify the effectiveness of the proposed Aε-SVR model, this section evaluates our method using the SSE/SST, RMSE, and MAE metrics and compares it against current advanced methods, as presented in Tables 3–6. The experiments are conducted on seven real datasets, where 90% of the data are used for training and 10% for testing. During the trials, we adopted the original parameter setup suggested in each paper. Table 3 reports the prediction performance of all competitor models, highlighting that Aε-SVR yields the lowest RMSE (2.505), followed by ε-PSVR (2.553), on the Auto Mpg dataset while employing the same parameters C and σ; thus the prediction performance of Aε-SVR is superior to ε-PSVR (2.553), Huber SVR (2.566), and ε-SVR (2.567). Besides, the SSE/SST metric of Aε-SVR shows a noticeable improvement on the Boston Housing dataset (0.123), which is much lower than ε-SVR (0.229) and ε-PSVR (0.228). After analyzing all three evaluation metrics, we conclude that Aε-SVR attains better performance on the seven real datasets than SVR variants whose ε-tube has a single fixed value or two different values for inside and outside. Furthermore, although the Aε-SVR model performs best on all datasets, the performance gain over the competitor methods varies. Therefore, Tables 4–6 report our model's prediction improvement for each evaluation index compared against the other models.
Performance comparison of the ε-SVR, Huber SVR, ε-PSVR, and Aε-SVR models on real-world benchmark datasets.
Datasets | (t2, t1) | Regressor | SSE/SST | RMSE | MAE | (σ, C, ε)
Auto Mpg | (1, 0.1) | ε-SVR | 0.115 | 2.567 | 1.844 | (0.5, 64, 1)
| | Huber SVR | 0.115 | 2.566 | 1.832 | (0.5, 64, 1)
| | ε-PSVR | 0.112 | 2.553 | 1.849 | (0.5, 64, 1)
| | Aε-SVR | 0.101 | 2.505 | 1.818 | (0.5, 64, –)
Boston Housing | (1, 0.1) | ε-SVR | 0.229 | 3.363 | 2.402 | (2, 128, 2)
| | Huber SVR | 0.235 | 3.395 | 2.396 | (2, 128, 2)
| | ε-PSVR | 0.228 | 3.351 | 2.398 | (2, 128, 2)
| | Aε-SVR | 0.123 | 3.106 | 2.031 | (2, 128, –)
Chwirut | (1.5, 0.7) | ε-SVR | 0.023 | 3.231 | 2.249 | (0.15, 64, 0.3)
| | Huber SVR | 0.022 | 3.219 | 2.242 | (0.15, 64, 0.3)
| | ε-PSVR | 0.022 | 3.225 | 2.287 | (0.15, 64, 0.3)
| | Aε-SVR | 0.019 | 3.110 | 2.234 | (0.15, 64, –)
Triazines | (1, 0.1) | ε-SVR | 0.893 | 0.132 | 0.099 | (1, 4, 0.1)
| | Huber SVR | 0.914 | 0.136 | 0.101 | (1, 4, 0.1)
| | ε-PSVR | 0.874 | 0.132 | 0.099 | (1, 4, 0.1)
| | Aε-SVR | 0.807 | 0.131 | 0.095 | (1, 4, –)
NO2 | (1, 0.4) | ε-SVR | 0.484 | 0.502 | 0.398 | (0.5, 8, 0.3)
| | Huber SVR | 0.495 | 0.507 | 0.401 | (0.5, 8, 0.3)
| | ε-PSVR | 0.482 | 0.502 | 0.400 | (0.5, 8, 0.3)
| | Aε-SVR | 0.479 | 0.501 | 0.392 | (0.5, 8, –)
Nelson | (1, 0.2) | ε-SVR | 0.153 | 1.311 | 0.978 | (10, 102, 0.2)
| | Huber SVR | 0.152 | 1.300 | 0.965 | (10, 102, 0.2)
| | ε-PSVR | 0.149 | 1.293 | 0.963 | (10, 102, 0.2)
| | Aε-SVR | 0.147 | 1.271 | 0.950 | (10, 102, –)
Servo | (1.2, 0.1) | ε-SVR | 0.165 | 0.538 | 0.305 | (2.8, 4, 0.1)
| | Huber SVR | 0.173 | 0.529 | 0.292 | (2.8, 4, 0.1)
| | ε-PSVR | 0.162 | 0.530 | 0.304 | (2.8, 4, 0.1)
| | Aε-SVR | 0.136 | 0.527 | 0.261 | (2.8, 4, –)
Percentage reduction of Aε-SVR compared to other models on SSE/SST.

Regressor | Auto Mpg (%) | Boston Housing (%) | Chwirut (%) | Triazines (%) | NO2 (%) | Nelson (%) | Servo (%)
ε-SVR | 12.17 | 46.29 | 13.04 | 9.63 | 1.03 | 3.92 | 17.58
Huber SVR | 12.17 | 47.66 | 9.09 | 11.71 | 3.23 | 3.29 | 21.39
ε-PSVR | 9.82 | 46.05 | 9.09 | 7.67 | 1.34 | 1.34 | 16.05
Percentage reduction of Aε-SVR compared to other models on RMSE.

Regressor | Auto Mpg (%) | Boston Housing (%) | Chwirut (%) | Triazines (%) | NO2 (%) | Nelson (%) | Servo (%)
ε-SVR | 2.42 | 7.64 | 3.74 | 0.76 | 0.20 | 3.05 | 2.04
Huber SVR | 2.38 | 8.51 | 3.39 | 3.68 | 1.18 | 2.23 | 0.38
ε-PSVR | 1.88 | 7.31 | 3.57 | 0.76 | 1.70 | 1.70 | 0.57
Percentage reduction of Aε-SVR compared to other models on MAE.

Regressor | Auto Mpg (%) | Boston Housing (%) | Chwirut (%) | Triazines (%) | NO2 (%) | Nelson (%) | Servo (%)
ε-SVR | 1.41 | 15.45 | 0.67 | 4.04 | 1.51 | 2.86 | 14.43
Huber SVR | 0.76 | 15.23 | 0.36 | 5.94 | 2.24 | 1.55 | 10.62
ε-PSVR | 1.68 | 15.30 | 2.32 | 4.04 | 1.35 | 1.35 | 14.14
Tables 4–6 highlight that, across the three evaluation metrics, the developed model attains its largest gains on the Boston Housing dataset, affording improvements of 46.29%, 7.64%, and 15.45% over ε-SVR on the SSE/SST, RMSE, and MAE metrics, respectively. This is because our model focuses on handling inconsistent data distributions within a dataset, which is the case for Boston Housing. It should be noted that Aε-SVR also performs well on the other datasets, with the improvement typically around 1% but in some cases exceeding 10% compared to existing methods.
Next, we randomly select one independent variable from the multiple independent variables as the x-axis and the dependent variable as the y-axis to demonstrate the model’s fit to the data sets on a two-dimensional plane.
Figures 9–11 reveal that the Aε-SVR model outperforms ε-SVR on all datasets, with the Boston Housing dataset presenting the most pronounced prediction gap. Particularly, many data points are located in the lower half and are densely distributed when the Indus value is around 0.2, whereas around 0.5 there are more data points in the upper half and the distribution is sparse. ε-SVR cannot adapt to these distribution characteristics because sparse and dense locations are treated equally; consequently, its fit on Boston Housing around positions 0.2 and 0.5 is poor. However, since Aε-SVR can vary the ε-tube radius depending on the sparsity level, it exhibits a more flexible predictive performance, fitting data points better at both sparse and dense locations.
Comparison of models fitting in the Boston Housing dataset.
Comparison of models fitting in the NO2 dataset.
Comparison of models fitting in the Chwirut dataset.
Furthermore, the Chwirut dataset is a two-dimensional dataset with low dimensionality and a relatively uniform data point distribution across locations. On this dataset, both ε-SVR and Aε-SVR achieve a relatively satisfactory fit, with Figure 12 highlighting only a minor difference between the two fitted curves due to the small ε-tube radius.
The ten-fold cross validation.
The experimental results demonstrate that Aε-SVR outperforms the other SVR models because it assigns a different tube radius εᵢ to each data point in the dataset, which can be changed during the iteration process based on the sample information to achieve better prediction.
Parameter sensitivity study
This experiment uses a 10-fold cross-validation sampling strategy to evaluate the prediction performance of Aε-SVR under different parameter setups. Six independent experiments were conducted to investigate the sensitivity of the ε-SVR and Aε-SVR models while altering the C and σ values on the Auto Mpg, Boston Housing, and NO2 datasets. The ranges of the C and σ values are chosen adjacent to the optimal values, and only one parameter is changed at a time, with the other taking its optimal value. The corresponding results are illustrated in Figure 13.
Performance evaluation of ε-SVR and Aε-SVR on three datasets. First row (a–c): the testing accuracy of the two models under different parameter values. Second row (d–f): the training time of the two models under different parameter values.
For this experiment, we adopt a 10-fold cross-validation strategy; in each fold, the average value over the test set is taken as the final evaluation index. As observed, the prediction performance of the Aε-SVR model is better than that of the ε-SVR model under different C and σ values. Compared with the standard ε-SVR model, Aε-SVR is less sensitive to the C value, as it maintains a good prediction effect over a large range of C; its sensitivity to σ is the same as for the standard SVR model. The time complexity of the standard SVR model is O(n³), while that of the proposed Aε-SVR model is O(Mn³), where M is the number of iterations set in Algorithm 1 and n is the number of training samples; when the sample size is large, this increases the computation time significantly. Therefore, when selecting the Aε-SVR model, we can set a large search step for the C value and a standard search step for σ. Through this setting, we offset some of the time spent iteratively solving for εᵢ, thus balancing training time and prediction performance.
Conclusion and future works
This paper proposes a novel "Alterable ε" model (Aε-SVR), which adapts to the distribution characteristics of the dataset by determining εᵢ at each position. Experimental results on several simulated and real datasets demonstrate that Alterable ε achieves an obvious performance improvement over state-of-the-art models due to its non-fixed ε-tube radius. Specifically, the proposed model has a significant advantage over current models on certain datasets.
Overall, the proposed Aε-SVR model outperforms the standard SVR model for the following reasons. First, the developed iterative fluctuation self-selection algorithm can find an appropriate penalty parameter εᵢ for each location. Second, the variable ε-tube formed by the free εᵢ parameters contributes to the removal of noise and outliers in the dataset. Third, the model successfully extracts important data information through these free parameters.
Moreover, flexible bounds can also lead to better results in classification learning tasks. This property enables the Alterable ε proposed here for regression to distinguish data points with inconsistent distribution sparsity, which is coherent with the requirements of data classification. Therefore, in the future, we aim to extend Alterable ε to classification problems, such as image classification.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Science Foundation of China (No. 11201356) and the Hubei Province Key Laboratory of Systems Science in Metallurgical Process (Wuhan University of Science and Technology) (No. Y202201).
ORCID iD
Xiaoxia He
References
1. Drucker H, Burges CJ, Kaufman L, et al. Support vector regression machines. In: Advances in neural information processing systems. Cambridge, MA: MIT Press, 1997, pp. 155–161.
2. Smola AJ, Schölkopf B. A tutorial on support vector regression. Stat Comput 2004; 14(3): 199–222.
3. Gunn SR. Support vector machines for classification and regression. ISIS Tech Rep 1998; 14(1): 5–16.
4. Vapnik VN. An overview of statistical learning theory. IEEE Trans Neural Netw 1999; 10(5): 988–999.
5. Khemchandani R, Chandra S. Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 2007; 29(5): 905–910.
6. D'Ambrosio R, Degasperi E, Anolli MP, et al. Incidence of liver- and non-liver-related outcomes in patients with HCV-cirrhosis after SVR. J Hepatol 2022; 76(2): 302–310.
7. Badr E, Almotairi S, Salam MA, et al. New sequential and parallel support vector machine with grey wolf optimizer for breast cancer diagnosis. Alex Eng J 2022; 61(3): 2520–2534.
8. Ioannou GN. HCC surveillance after SVR in patients with F3/F4 fibrosis. J Hepatol 2021; 74(2): 458–465.
9. Ghimire S, Bhandari B, Casillas-Pérez D, et al. Hybrid deep CNN-SVR algorithm for solar radiation prediction problems in Queensland, Australia. Eng Appl Artif Intell 2022; 112: 104860.
10. Li Z, Luo X, Liu M, et al. Wind power prediction based on EEMD-tent-SSA-LS-SVM. Energy Rep 2022; 8: 3234–3243.
11. Han A, Chen X, Li Z, et al. Advanced learning-based energy policy and management of dispatchable units in smart grids considering uncertainty effects. Int J Electr Power Energy Syst 2021; 132: 107188.
12. Karasu S, Altan A, Bekiros S, et al. A new forecasting model with wrapper-based feature selection approach using multi-objective optimization technique for chaotic crude oil time series. Energy 2020; 212: 118750.
13. Zhang H-C, Wu Q, Li F-Y. Application of online multitask learning based on least squares support vector regression in the financial market. Appl Soft Comput 2022; 121: 108754.
14. Chao L, Zhipeng J, Yuanjie Z. A novel reconstructed training-set SVM with roulette cooperative coevolution for financial time series classification. Expert Syst Appl 2019; 123: 283–298.
15. Muller K, Mika S, Ratsch G, et al. An introduction to kernel-based learning algorithms. IEEE Trans Neural Netw 2001; 12(2): 181–201.
16. Wang F, Zuo W, Zhang L, et al. A kernel classification framework for metric learning. IEEE Trans Neural Netw Learn Syst 2015; 26(9): 1950–1962.
18. Ding X, Liu J, Yang F, et al. Random radial basis function kernel-based support vector machine. J Franklin Inst 2021; 358(18): 10121–10140.
19. Cai Y, Wang H, Ye X, et al. A multiple-kernel LSSVR method for separable nonlinear system identification. J Control Theory Appl 2013; 11(4): 651–655.
20. Hooshmand Moghaddam V, Hamidzadeh J. New Hermite orthogonal polynomial kernel and combined kernels in support vector machine classifier. Pattern Recognit 2016; 60: 921–935.
21. Jiang H, Tao C, Dong Y, et al. Robust low-rank multiple kernel learning with compound regularization. Eur J Oper Res 2021; 295(2): 634–647.
22. Suykens JAK, Vandewalle J. Chaos control using least-squares support vector machines. Int J Circuit Theory Appl 1999; 27(6): 605–615.
23. Karal O. Maximum likelihood optimal and robust support vector regression with lncosh loss function. Neural Netw 2017; 94: 1–12.
24. Tanveer M, Mangal M, Ahmad I, et al. One norm linear programming support vector regression. Neurocomputing 2016; 173: 1508–1518.
25. Anand P, Rastogi R, Chandra S. A class of new support vector regression models. Appl Soft Comput 2020; 94: 106446.
26. Gupta U, Gupta D. An improved regularization based Lagrangian asymmetric ν-twin support vector regression using pinball loss function. Appl Intell 2019; 49(10): 3606–3627.
27. Balasundaram S, Meena Y. Robust support vector regression in primal with asymmetric Huber loss. Neural Process Lett 2019; 49(3): 1399–1431.
28. Cheng K, Lu Z. Active learning Bayesian support vector regression model for global approximation. Inf Sci 2021; 544: 549–563.
29. Schölkopf B, Smola AJ, Bach F, et al. Learning with kernels: support vector machines, regularization, optimization, and beyond. Cambridge, MA: MIT Press, 2002.
30. Scornet E. Random forests and kernel methods. IEEE Trans Inf Theory 2016; 62(3): 1485–1500.
31. van der Laan MJ, Polley EC, Hubbard AE. Super learner. Stat Appl Genet Mol Biol 2007; 6(1): Article 25.
32. Meier L, van de Geer S, Bühlmann P. High-dimensional additive modeling. Ann Stat 2009; 37(6B): 3779–3821.
33. Newman D, Hettich S, Blake C, et al. UCI repository of machine learning databases. Irvine, CA: University of California, Dept. of Information and Computer Science.