Most works on Support Vector Regression (SVR) focus on kernel or loss functions, with the corresponding support vectors obtained using a fixed-radius ε-tube, affording good predictive performance on many datasets. However, the fixed radius prevents the adaptive selection of support vectors according to the data distribution characteristics, compromising the performance of SVR-based methods. Therefore, this study proposes an "Alterable ε-Support Vector Regression" (Aε-SVR) model by applying a novel tube radius, named "Alterable ε," to the SVR model. Based on the sparsity of the data points at each location, the model solves for a different εᵢ at the corresponding position, and thus narrows or widens the ε-tube by changing its radius. Such a variable ε-tube strategy diminishes the influence of noise and outliers in the dataset, enhancing the prediction performance of the Aε-SVR model. Accordingly, we suggest a novel non-deterministic algorithm to iteratively solve the complex problem of optimizing εᵢ at every location. Extensive experimental results demonstrate that our approach improves accuracy and stability on simulated and real data compared with the baseline methods.
The Support Vector Machine (SVM) is a powerful machine learning technique based on statistical learning theory and a promising tool for function approximation problems based on the Structural Risk Minimization (SRM) principle. As an advanced supervised learning algorithm, SVM performs data classification by constructing a separating hyperplane. After decades of development, it has reached a high level of maturity in terms of theoretical support1–5 and has been widely employed in various regression estimation tasks, such as cancer prevention and treatment,6–8 renewable energy prediction,9–11 and financial time series analysis.12–14 This regression scheme is termed Support Vector Regression (SVR) and often performs better than competing machine learning algorithms. SVR solves practical problems by introducing the prediction equation f(x) = wᵀφ(x) + b, where the predictor f predicts new observations in the feature space and φ is a transformation function in the covariate space. By introducing kernel functions, SVR attains an excellent nonlinear regression capability and has therefore been successfully applied to nonlinear prediction systems.
Over the past few decades, various studies have focused on the kernel function and the loss function. The former aims to avoid directly computing high-dimensional inner products and to improve the algorithm's generalization performance.15–17 The literature presents several kernel functions, such as the random radial basis function (RRBF) kernel,18 the composite wavelet kernel,19 the Hermite orthogonal polynomial kernel,20 and robust low-rank multiple kernels.21 Through these kernel functions, the SVR model can implicitly transform the original data into a nonlinear kernel space while avoiding the "curse of dimensionality" in computation, improving classification accuracy and prediction.
Considering the conventional loss functions, these include the Huber loss function-based SVR,3 the quadratic loss function-based LS-SVR,22 the maximum likelihood optimal and robust support vector regression model,23 and the 1-norm SVR.24
Motivated by ε-SVR, Anand et al.25 introduced a model utilizing an ε-penalty loss function (ε-PSVR) that applies different penalty rates to the data points depending on whether they lie inside or outside the ε-tube. Moreover, Gupta and Gupta26 proposed the asymmetric ν-twin support vector regression (Asy-ν-TSVR), which finds two non-parallel hyperplanes to construct the corresponding decision functions given two different ε-insensitive loss functions serving as lower and upper bound functions. Besides, Balasundaram and Meena27 presented an ε-insensitive asymmetric Huber function-based (ε-AHSVR) model, where an ε-insensitive loss function is integrated with the Huber loss function to compensate for the high complexity of the latter. Cheng and Lu28 developed an ε-insensitive square loss function-based Bayesian SVR model that adopts the structural risk minimization principle through the ε-insensitive square loss function while providing point-by-point probability predictions and allowing the optimal hyperparameters to be determined by maximizing the Bayesian model evidence.
Most innovations on loss functions rely on the ε-insensitive loss function because, among all points in the training set, it ignores the points lying inside the ε-tube. Therefore, the prediction model is essentially determined by the support vectors located on the edges of or outside the tube, enhancing the generalization ability of ε-SVR. Here, the ε-insensitive loss function1 is given by:

$$L_\varepsilon\big(y, f(x)\big) = \begin{cases} 0, & |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon, & \text{otherwise,} \end{cases}$$

where ε ≥ 0. Figure 1 shows some important loss functions. Employing L_ε in ε-SVR enhances robustness to errors, with the error tolerance at each position being ε. Such an approach allows the solution vector of the ε-SVR model to become sparse.
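For concreteness, the piecewise loss above can be written as a small NumPy helper (the function name is ours):

```python
import numpy as np

def eps_insensitive_loss(y, f, eps):
    """Standard epsilon-insensitive loss: zero inside the eps-tube,
    and the linear penalty |y - f| - eps outside it."""
    residual = np.abs(np.asarray(y, dtype=float) - np.asarray(f, dtype=float))
    return np.maximum(residual - eps, 0.0)
```

Points whose residual stays inside the tube contribute no loss, which is exactly why only points on or outside the tube become support vectors.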
Loss functions corresponding to several different models.
However, these new loss functions only focus on the ε-tube radius value or on optimizing the hyperparameter ε to minimize the empirical risk. When complex datasets are involved, a single value of ε cannot capture the support vectors at the different locations of high-dimensional data. Indeed, an ε-tube with a fixed radius limits the choice of support vectors at each position, preventing the model from utilizing the full information of the training set.
To address the above challenges, this work proposes the "Alterable ε-Support Vector Regression" (Aε-SVR) based on an alterable ε-insensitive loss function. Similar to the standard ε-SVR, Aε-SVR measures the empirical risk through the loss function computed with the alterable εᵢ, together with a regularization term. This loss function is given by:

$$L_{\varepsilon_i}\big(y_i, f(x_i)\big) = \begin{cases} 0, & |y_i - f(x_i)| \le \varepsilon_i \\ |y_i - f(x_i)| - \varepsilon_i, & \text{otherwise,} \end{cases}$$

where εᵢ ≥ 0, i = 1, 2, …, n. The value of each alterable εᵢ depends on the distribution sparsity around the data point's location, allowing Aε-SVR to capture support vectors for each position adaptively. Figure 2 illustrates the proposed εᵢ loss function with different values of ε.
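The only change from the standard loss is that the tube radius becomes a per-point vector; a minimal sketch (our naming):

```python
import numpy as np

def alterable_eps_loss(y, f, eps_i):
    """Alterable epsilon-insensitive loss: each sample i carries its own
    tube radius eps_i[i], so the tolerance varies across locations."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    eps_i = np.asarray(eps_i, dtype=float)
    return np.maximum(np.abs(y - f) - eps_i, 0.0)
```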
The proposed alterable loss function with different values of ε.
The ε-tube constraints of the standard ε-SVR and Aε-SVR are |yᵢ − f(xᵢ)| ≤ ε and |yᵢ − f(xᵢ)| ≤ εᵢ, respectively. Hence, this paper introduces the concept of an adaptive ε, which allows Aε-SVR to exploit the information provided by the training set, adapt better to the data distribution characteristics, and improve the prediction effect. Although the kernel function can map the data to a high-dimensional Hilbert space and effectively deal with high-dimensional data, the hyper-plane is not flexible enough to fit every point in the space when making predictions in the high-dimensional feature space. Therefore, a hyper-surface is obtained after training the Aε-SVR model on the training set, which affords better prediction performance than the hyper-plane.
However, the fitted surface's overall shape varies with the value of εᵢ at each location, posing a large computational burden when choosing and solving for εᵢ. Hence, to overcome these challenges, this paper suggests a novel non-deterministic algorithm for determining εᵢ.
Specifically, the proposed SVR model improves upon the previously discussed methods, retaining the loss function (the standard ε-SVR expression) and the kernel function (the RBF kernel) while introducing many penalty values εᵢ through an iterative process, thereby providing a variable ε-tube. Unlike the standard ε-tube, the variable ε-tube-based Aε-SVR model allows a different penalty depending on the data distribution, which enhances our method's flexibility in removing outliers, mitigates noise interference on the prediction curve, and finally achieves better prediction results.
The remainder of this paper is organized as follows. Section 2 compares the proposed Aε-SVR model against the standard ε-SVR model in both linear and nonlinear scenarios. Section 3 derives the solution of Aε-SVR, and Section 4 introduces a novel iterative fluctuation self-selection algorithm. Sections 5 and 6 present and discuss the key comparisons based on real-world benchmarks, and finally, Section 7 concludes this work and provides some future research directions.
Theory of alterable ε-support vector regression
This section briefly reviews the standard ε-support vector regression model and compares it with the Alterable ε-SVR in both linear and nonlinear scenarios. Given a training set T = {(xᵢ, yᵢ)}ᵢ₌₁ⁿ, C is a pre-specified penalty value, b is the offset term, and ξᵢ and ξᵢ* are slack variables that represent the upper and lower constraints on the system outputs.
Linear support vector regression model
To estimate the linear function f(x) = wᵀx + b, where w is the weight vector and b is the offset term, the standard ε-Support Vector Regression minimizes:

$$\min_{w,\,b,\,\xi,\,\xi^*} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\big(\xi_i + \xi_i^*\big)$$

subject to

$$y_i - w^\top x_i - b \le \varepsilon + \xi_i, \qquad w^\top x_i + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, n.$$
The Aε-SVR model minimizes

$$\min_{w,\,b,\,\xi,\,\xi^*} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\big(\xi_i + \xi_i^*\big)$$

subject to

$$y_i - w^\top x_i - b \le \varepsilon_i + \xi_i, \qquad w^\top x_i + b - y_i \le \varepsilon_i + \xi_i^*, \qquad \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, n.$$
Figure 3 illustrates an intuitive geometric interpretation of the linear SVR. Normally, the standard ε-SVR model has a tube with a fixed radius which, although it avoids over-fitting, cannot capture all support vectors. Indeed, the fixed-radius ε-tube provides the same tolerance to all data points, whether closer to or further from the tube wall, leaving the ε-SVR model insensitive to the local data distribution. To overcome this problem, we propose the Alterable ε-SVR model which, based on the training set, captures the appropriate support vectors for each position by iteratively obtaining a distinct εᵢ per position. At a location with a dense data point distribution, the Aε-SVR model increases the number of support vectors by shortening the ε-tube radius; on the contrary, at a location with a sparse data point distribution, our method increases the ε-tube radius, decreasing the number of support vectors. This way, the orientation and displacement of the fitted function can change to fit the data points better and exhibit better predictive performance.
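The dense-versus-sparse intuition can be illustrated with a toy density proxy. This is not the paper's iterative algorithm; it simply scales a base radius by the mean distance to the k nearest neighbours of each 1-D point, so sparse locations get a wider tube (the function and its names are ours):

```python
import numpy as np

def density_scaled_eps(x, base_eps=0.2, k=5):
    """Illustrative sparsity proxy for 1-D inputs: the mean distance to
    the k nearest neighbours is large where points are sparse, so those
    positions receive a larger tube radius (and dense ones a smaller one)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    d = np.abs(x - x.T)                  # pairwise distance matrix
    d.sort(axis=1)                       # row-wise ascending distances
    knn = d[:, 1:k + 1].mean(axis=1)     # skip the zero self-distance
    return base_eps * knn / knn.mean()   # rescale around base_eps
```

On a sample where four points cluster together and one lies far away, the isolated point receives a larger radius than the clustered ones.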
Comparison of linear support vector regression models: (a) ε-SVR (kernel = Linear) and (b) Aε-SVR (kernel = Linear).
Non-linear support vector regression model
To estimate the nonlinear function f(x) = wᵀφ(x) + b, where φ is a non-linear mapping from the input space to a higher-dimensional Hilbert space, the standard ε-Support Vector Regression minimizes

$$\min_{w,\,b,\,\xi,\,\xi^*} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\big(\xi_i + \xi_i^*\big)$$

subject to

$$y_i - w^\top\varphi(x_i) - b \le \varepsilon + \xi_i, \qquad w^\top\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, n.$$
The Aε-SVR model minimizes

$$\min_{w,\,b,\,\xi,\,\xi^*} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\big(\xi_i + \xi_i^*\big)$$

subject to

$$y_i - w^\top\varphi(x_i) - b \le \varepsilon_i + \xi_i, \qquad w^\top\varphi(x_i) + b - y_i \le \varepsilon_i + \xi_i^*, \qquad \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, n.$$
The geometric interpretation of the nonlinear SVR is presented in Figure 4(a), highlighting that ε-SVR performs well on some data points but poorly on others. This is probably because all data points are treated equally and share the same tolerance due to the limitation of the fixed-radius ε-tube. Nevertheless, real datasets tend to be unevenly distributed, with both sparsely and densely distributed data points. The fixed radius prevents adaptively selecting the support vectors according to the distribution characteristics of the data, compromising the predictive performance of ε-SVR.
Comparison of non-linear support vector regression models: (a) ε-SVR (kernel = RBF) and (b) Aε-SVR (kernel = RBF).
Considering Aε-SVR, shown in Figure 4(b), our model distinguishes locations with sparse and dense data point distributions and calculates distinct tolerances εᵢ at the different positions, so that the ε-tube radius changes with location. Moreover, changing the ε-tube radius makes the prediction curve more flexible, thus producing better prediction results. It should be noted that this section presents only the schematic diagram of Aε-SVR; its performance in fitting real datasets is evaluated in Section 5.
Solution of alterable ε-support vector regression
Given a training set T = {(xᵢ, yᵢ)}ᵢ₌₁ⁿ, w is the weight vector, b is a real constant, ξᵢ and ξᵢ* are slack variables that represent the upper and lower bound constraints of the system output, and C > 0 is a pre-specified value. To solve equation (4), we formulate the Lagrangian as follows:

$$\mathcal{L} = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\big(\xi_i + \xi_i^*\big) - \sum_{i=1}^{n}\alpha_i\big(\varepsilon_i + \xi_i - y_i + w^\top x_i + b\big) - \sum_{i=1}^{n}\alpha_i^*\big(\varepsilon_i + \xi_i^* + y_i - w^\top x_i - b\big) - \sum_{i=1}^{n}\big(\eta_i\xi_i + \eta_i^*\xi_i^*\big),$$

where αᵢ, αᵢ* ≥ 0 and ηᵢ, ηᵢ* ≥ 0 (i = 1, 2, …, n) are the Lagrange multipliers.
Setting the partial derivatives of the Lagrangian with respect to w, b, ξᵢ, and ξᵢ* to zero yields the KKT stationarity conditions

$$w = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\,x_i, \qquad \sum_{i=1}^{n}(\alpha_i - \alpha_i^*) = 0, \qquad C - \alpha_i - \eta_i = 0, \qquad C - \alpha_i^* - \eta_i^* = 0.$$

According to the above KKT conditions, the Wolfe dual of the primal problem equation (4) can be obtained as follows:

$$\max_{\alpha,\,\alpha^*} \ -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,x_i^\top x_j - \sum_{i=1}^{n}\varepsilon_i(\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i(\alpha_i - \alpha_i^*)$$

subject to

$$\sum_{i=1}^{n}(\alpha_i - \alpha_i^*) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^* \le C.$$
If there is a nonlinear relationship between y and x, we can linearize it by mapping x to a higher-dimensional feature space29 through the mapping φ. The estimated regressor is given by:

$$f(x) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\,\varphi(x_i)^\top\varphi(x) + b.$$
According to the Mercer condition, the inner product φ(xᵢ)ᵀφ(xⱼ) can be replaced by a positive definite kernel

$$K(x_i, x_j) = \varphi(x_i)^\top\varphi(x_j).$$
Given the excellent fitting properties of the radial basis function (RBF), we employ it to define the kernel function K(·,·). An RBF kernel function can be expressed as:

$$K(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right).$$
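In code, the RBF kernel is a one-liner; we use the 2σ² parameterization matching the kernel parameter σ used in this paper (some implementations use γ = 1/(2σ²) instead):

```python
import numpy as np

def rbf_kernel(x1, x2, sigma=1.0):
    """RBF kernel K(x1, x2) = exp(-||x1 - x2||^2 / (2 * sigma^2))."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))
```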
This transformation reduces the computational complexity of the optimization problem, which can therefore be reformulated from equation (16) as follows:

$$\max_{\alpha,\,\alpha^*} \ -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,K(x_i, x_j) - \sum_{i=1}^{n}\varepsilon_i(\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i(\alpha_i - \alpha_i^*).$$
In the matrix form used for computer calculation, the above formula becomes:

$$\min_{\alpha,\,\alpha^*} \ \frac{1}{2}(\alpha - \alpha^*)^\top K (\alpha - \alpha^*) + \varepsilon^\top(\alpha + \alpha^*) - y^\top(\alpha - \alpha^*)$$

subject to

$$\mathbf{1}^\top(\alpha - \alpha^*) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^* \le C, \quad i = 1, \dots, n,$$

where K is the kernel matrix with entries Kᵢⱼ = K(xᵢ, xⱼ), ε = (ε₁, …, εₙ)ᵀ, and y = (y₁, …, yₙ)ᵀ.
Equation (21) is a Quadratic Programming Problem (QPP), where αᵢ and αᵢ* denote the Lagrange multipliers estimated when the εᵢ values and the constant C are given. Here, we use the quadratic programming package in Python to solve for α and α*. Using equation (14), the Aε-SVR function can be represented as:

$$f(x) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\,K(x_i, x) + b.$$
It is evident that the influence of εᵢ on the support vectors has been transformed into an influence on the values of αᵢ and αᵢ*. Concerning the training set, the samples that satisfy the condition |yᵢ − f(xᵢ)| ≥ εᵢ are used as support vectors for the SVR and must be located on the edges of or outside the tube. Here, we alter εᵢ at each position in the training set and thereby change the number of support vectors at the corresponding location. In this case, given the training set, the adaptive εᵢ is computed through the iterative algorithm (Algorithm 1) to determine the best support vectors at every position. Based on the KKT conditions, b can be computed from each support vector as:

$$b_i = \begin{cases} y_i - \sum_{j=1}^{n}(\alpha_j - \alpha_j^*)\,K(x_j, x_i) - \varepsilon_i, & 0 < \alpha_i < C \\ y_i - \sum_{j=1}^{n}(\alpha_j - \alpha_j^*)\,K(x_j, x_i) + \varepsilon_i, & 0 < \alpha_i^* < C. \end{cases}$$
Finally, we take the average of the values of b obtained from all the support vectors:

$$b = \frac{1}{|S|}\sum_{i \in S} b_i,$$

where S denotes the index set of support vectors.
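Once the dual coefficients and the offset are available, prediction is a kernel-weighted sum over the support vectors. A generic sketch, not tied to any particular QP solver (names are ours):

```python
import numpy as np

def svr_predict(x, sv_x, beta, b, sigma=1.0):
    """Kernel-SVR prediction f(x) = sum_i beta_i * K(sv_x[i], x) + b,
    where beta_i = alpha_i - alpha_i^* are the dual coefficients of the
    support vectors and K is the RBF kernel with parameter sigma."""
    sv_x = np.asarray(sv_x, dtype=float)
    x = np.asarray(x, dtype=float)
    sq_dist = np.sum((sv_x - x) ** 2, axis=1)      # ||x_i - x||^2 per SV
    k = np.exp(-sq_dist / (2.0 * sigma ** 2))      # kernel values
    return float(np.dot(beta, k) + b)
```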
Algorithm 1: The iterative fluctuation self-selection algorithm
Input: training set, proportion P, initial ε used in the first iteration, repeat times M, constant C, kernel parameter σ, fluctuation F, criterion Q
Output: alterable εᵢ
1:  According to the proportion P, randomly divide the data into a training subset and a test subset
2:  for m = 1 to M do
3:      for i = 1 to N do
4:          if |yᵢ − f(xᵢ)| satisfies the screening criterion Q then
5:              keep the current εᵢ
6:          else
7:              update εᵢ from the current residual
8:          end if
9:      end for
10:     Calculate α, α*, and b according to equation (21)
11:     Calculate the training MAE
12:     if the MAE obtained with εᵢ − F is the smallest then
13:         εᵢ ← εᵢ − F
14:     else if the MAE obtained with εᵢ + F is the smallest then
15:         εᵢ ← εᵢ + F
16:     else
17:         keep εᵢ
18:     end if
19: end for
Iterative fluctuation self-selection algorithm
The developed framework comprises three steps. First, the data is divided into K subsets; we randomly select K − 1 subsets as training samples and keep the rest as test samples, while the kernel parameter σ, the initial ε, and the constant C are given. The second step determines the starting position of the iteration by introducing the standard ε-SVR, after which the residual |yᵢ − f(xᵢ)| is calculated; data points outside the tube satisfy |yᵢ − f(xᵢ)| > εᵢ. Here, F is the fluctuation coefficient, which provides a fluctuation range for each ε value and increases its randomness to prevent overfitting. Moreover, we set a criterion Q (Q = 0.1): when a candidate value meets this criterion, it is taken as the corresponding εᵢ for Aε-SVR. After this screening, the εᵢ can be obtained. The last step acquires the optimal εᵢ through the iterative fluctuation self-selection algorithm, which is summarized in Algorithm 1.
In Algorithm 1, the MAE is used as the reliability index, and the fluctuation F is exploited to improve the model's generalization ability. By appropriately setting F, the candidates ε − F and ε + F are generated, and we then find the best value among ε − F, ε, and ε + F. If F is set too large, we may miss a satisfactory ε; if it is set too small, the iteration process lasts longer.
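Our reading of this fluctuation step can be sketched as a pick-best-of-three helper. This is an illustrative reconstruction, not the authors' exact code; `mae_of` is a hypothetical caller-supplied callback that trains the model with a given radius and returns its training MAE:

```python
import numpy as np

def select_eps(eps, mae_of, F=0.05):
    """Choose the best tube radius among (eps - F, eps, eps + F) by the
    MAE reliability index, keeping the radius non-negative."""
    candidates = [max(eps - F, 0.0), eps, eps + F]
    scores = [mae_of(e) for e in candidates]
    return candidates[int(np.argmin(scores))]
```

A larger F explores more aggressively at the risk of skipping over a good radius; a smaller F converges more slowly, matching the trade-off described above.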
Figure 5 illustrates the relationship between the reliability index and the number of algorithm calls, revealing that as the number of iterations increases, the prediction accuracy of the Aε-SVR model improves.
Summary of the steps performed by the iterative fluctuation self-selection algorithm.
Applications on simulation and real data
This section evaluates the proposed approach on seven simulated and seven real-world benchmark datasets. For all subsequent experiments, we employ the RBF kernel, where σ is the kernel parameter in all regression models; its value, together with the parameters C and ε of all SVR models, is obtained by grid search over predefined sets. Moreover, the following three common criteria are used to verify the effectiveness of the Aε-SVR model: SSE/SST, RMSE, and MAE.
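The three criteria can be computed directly from the predictions; SSE/SST is the ratio of the residual sum of squares to the total sum of squares:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """SSE/SST, RMSE, and MAE, the three evaluation criteria used in
    the experiments."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    sse = np.sum(residual ** 2)                      # residual sum of squares
    sst = np.sum((y_true - y_true.mean()) ** 2)      # total sum of squares
    return {"SSE/SST": sse / sst,
            "RMSE": np.sqrt(np.mean(residual ** 2)),
            "MAE": np.mean(np.abs(residual))}
```

Lower values are better for all three; SSE/SST additionally normalizes the error by the intrinsic variance of the targets, so it is comparable across datasets.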
Simulation data application
We generate seven simulated datasets randomly using Python (http://www.python.org/) to evaluate the proposed method's performance and behavior. Accordingly, we employ seven datasets, that is, 1–2, 3–5, 6, and 7, following Anand et al.,25 Scornet,30 van der Laan et al.,31 and Meier et al.,32 respectively. Regarding the regression frameworks, we consider multivariate covariates: in datasets 1–2 each predictor is drawn from one distribution, while in datasets 3–7 each predictor is drawn from another and transformed through a fixed function. All datasets are described below:
All datasets contain 400 training and 100 test samples. Table 1 reports the performance of the ε-SVR, LS-SVR, and Aε-SVR models under several evaluation metrics: SSE/SST, RMSE, and MAE were calculated on the seven simulation datasets with different data distribution characteristics, and the optimal parameter settings per dataset are listed in Table 1. To compare performance objectively, the values of C and σ of the competitor models are aligned with those of Aε-SVR. Table 1 highlights that, without changing these parameters, the proposed Aε-SVR model presents a significant improvement in the SSE/SST, RMSE, and MAE metrics compared with the ε-SVR and LS-SVR models. Specifically, Aε-SVR obtains the lowest RMSE value (0.103) on artificial Dataset 1, and it consistently outperforms the ε-SVR and LS-SVR models, demonstrating an appealing generalization ability.
Performance comparison of the ε-SVR, LS-SVR, and Aε-SVR models on the simulation datasets.
Datasets | Regressor | SSE/SST | RMSE | MAE | (σ, C, ε)
Dataset 1 | ε-SVR | 0.080 | 0.107 | 0.086 | (0.1, 1, 0.2)
| LS-SVR | 0.081 | 0.108 | 0.085 | (0.1, 1, –)
| Aε-SVR | 0.074 | 0.103 | 0.081 | (0.1, 1, –)
Dataset 2 | ε-SVR | 0.321 | 2.210 | 1.760 | (0.1, 10, 1)
| LS-SVR | 0.320 | 2.207 | 1.735 | (0.1, 10, –)
| Aε-SVR | 0.317 | 2.194 | 1.729 | (0.1, 10, –)
Dataset 3 | ε-SVR | 0.597 | 0.483 | 0.376 | (0.7, 10, 0.5)
| LS-SVR | 0.582 | 0.477 | 0.370 | (0.7, 10, –)
| Aε-SVR | 0.576 | 0.474 | 0.366 | (0.7, 10, –)
Dataset 4 | ε-SVR | 0.377 | 0.532 | 0.419 | (0.1, 10, 0.6)
| LS-SVR | 0.379 | 0.534 | 0.427 | (0.1, 10, –)
| Aε-SVR | 0.374 | 0.529 | 0.415 | (0.1, 10, –)
Dataset 5 | ε-SVR | 0.289 | 0.489 | 0.383 | (0.1, 10, 0.3)
| LS-SVR | 0.292 | 0.492 | 0.388 | (0.1, 10, –)
| Aε-SVR | 0.283 | 0.482 | 0.377 | (0.1, 10, –)
Dataset 6 | ε-SVR | 0.281 | 0.565 | 0.461 | (0.1, 10, 0.01)
| LS-SVR | 0.285 | 0.569 | 0.464 | (0.1, 10, –)
| Aε-SVR | 0.273 | 0.558 | 0.453 | (0.1, 10, –)
Dataset 7 | ε-SVR | 0.199 | 0.514 | 0.422 | (0.1, 100, 0.6)
| LS-SVR | 0.193 | 0.505 | 0.412 | (0.1, 100, –)
| Aε-SVR | 0.188 | 0.499 | 0.408 | (0.1, 100, –)
Next, we calculated the reduction in SSE/SST, RMSE, and MAE for Aε-SVR relative to the ε-SVR and LS-SVR models. The corresponding results are illustrated in Figure 6, highlighting that the proposed model has an enhanced predictive performance compared to the competitor models, with the most obvious improvement being in the SSE/SST index. In Dataset 1, the SSE/SST value of Aε-SVR is 0.074, an improvement of 7.5% and 8.6% relative to ε-SVR and LS-SVR, respectively. These results reflect the importance of the "Alterable ε": the experiments imply that assigning a variable εᵢ depending on the data distribution characteristics is superior to treating each data point in the training set equally.
Percentage reduction of SSE/SST, RMSE, and MAE.
Real data application
We also apply our proposed method to real benchmark datasets, namely Triazines, Servo, NO2, Chwirut, Auto Mpg, Nelson, and Boston Housing. These datasets were downloaded from the UCI repository33 (https://archive.ics.uci.edu/address) and NLSRD (https://www.itl.nist.gov/div898/strd/nls/nls_main.shtml/address) and are commonly used to evaluate regression methods. Further information on the seven datasets is reported in Table 2. In this trial, we challenged the proposed model against existing SVR models, namely the ε-SVR model, the Huber SVR model, and ε-PSVR. The feature vectors were normalized for all datasets, and a 10-fold cross-validation scheme was used (Figure 7). The performance reported for ε-PSVR is the result obtained with the same datasets and processing.25 In Figure 8, we present the iterative results on three real benchmark datasets, highlighting that Algorithm 1 converges after about 30 iterations.
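A plain 10-fold split of the kind used here can be written without any library support (a sketch; in practice scikit-learn's `KFold` would be the usual choice):

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Shuffle the n sample indices once, split them into k folds, and
    yield (train_idx, test_idx) pairs with one fold held out at a time."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```

Each sample appears in exactly one test fold, so averaging the metric over the k folds uses every observation for both training and testing.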
Description of seven real-world benchmark datasets.
Datasets | No. of instances | Total features | Source
Auto Mpg | 398 | 9 | UCI
Boston Housing | 506 | 14 | UCI
Triazines | 186 | 61 | UCI
NO2 | 500 | 8 | UCI
Servo | 167 | 5 | UCI
Nelson | 128 | 3 | NLSRD
Chwirut | 214 | 2 | NLSRD
Convergence curves of Algorithm 1 on real-world datasets versus the number of iterations.
Comparison of model fits on the Auto Mpg dataset.
To comprehensively verify the effectiveness of the proposed Aε-SVR model, this section evaluates our method using the SSE/SST, RMSE, and MAE metrics and compares it against current advanced methods, as presented in Tables 3–6. The experiments are conducted on seven real datasets, where 90% of the data are used for training and 10% for testing. During the trials, we adopted the original parameter setup suggested in each paper. Table 3 reports the prediction performance of all competitor models, highlighting that Aε-SVR yields the lowest RMSE (2.505), followed by ε-PSVR (2.553), on the Auto Mpg dataset while employing the same parameters C and σ; thus the prediction performance of Aε-SVR is superior to ε-PSVR (2.553), Huber SVR (2.566), and ε-SVR (2.567). Besides, the SSE/SST metric of Aε-SVR shows a noticeable improvement on the Boston Housing dataset (0.123), which is much lower than ε-SVR (0.229) and ε-PSVR (0.228). After analyzing all three evaluation metrics, we conclude that Aε-SVR attains better performance on the seven real datasets than SVR variants whose ε-tube has a single fixed value or two different values for inside and outside. Furthermore, although the Aε-SVR model performs best on all datasets, the performance gain over the competitor methods varies. Therefore, Tables 4–6 report our model's prediction improvement for each evaluation index compared against the other models.
Performance comparison of the ε-SVR, Huber SVR, ε-PSVR, and Aε-SVR models on real-world benchmark datasets.
Datasets | (t2, t1) | Regressor | SSE/SST | RMSE | MAE | (σ, C, ε)
Auto Mpg | (1, 0.1) | ε-SVR | 0.115 | 2.567 | 1.844 | (0.5, 64, 1)
| | Huber SVR | 0.115 | 2.566 | 1.832 | (0.5, 64, 1)
| | ε-PSVR | 0.112 | 2.553 | 1.849 | (0.5, 64, 1)
| | Aε-SVR | 0.101 | 2.505 | 1.818 | (0.5, 64, –)
Boston Housing | (1, 0.1) | ε-SVR | 0.229 | 3.363 | 2.402 | (2, 128, 2)
| | Huber SVR | 0.235 | 3.395 | 2.396 | (2, 128, 2)
| | ε-PSVR | 0.228 | 3.351 | 2.398 | (2, 128, 2)
| | Aε-SVR | 0.123 | 3.106 | 2.031 | (2, 128, –)
Chwirut | (1.5, 0.7) | ε-SVR | 0.023 | 3.231 | 2.249 | (0.15, 64, 0.3)
| | Huber SVR | 0.022 | 3.219 | 2.242 | (0.15, 64, 0.3)
| | ε-PSVR | 0.022 | 3.225 | 2.287 | (0.15, 64, 0.3)
| | Aε-SVR | 0.019 | 3.110 | 2.234 | (0.15, 64, –)
Triazines | (1, 0.1) | ε-SVR | 0.893 | 0.132 | 0.099 | (1, 4, 0.1)
| | Huber SVR | 0.914 | 0.136 | 0.101 | (1, 4, 0.1)
| | ε-PSVR | 0.874 | 0.132 | 0.099 | (1, 4, 0.1)
| | Aε-SVR | 0.807 | 0.131 | 0.095 | (1, 4, –)
NO2 | (1, 0.4) | ε-SVR | 0.484 | 0.502 | 0.398 | (0.5, 8, 0.3)
| | Huber SVR | 0.495 | 0.507 | 0.401 | (0.5, 8, 0.3)
| | ε-PSVR | 0.482 | 0.502 | 0.400 | (0.5, 8, 0.3)
| | Aε-SVR | 0.479 | 0.501 | 0.392 | (0.5, 8, –)
Nelson | (1, 0.2) | ε-SVR | 0.153 | 1.311 | 0.978 | (10, 102, 0.2)
| | Huber SVR | 0.152 | 1.300 | 0.965 | (10, 102, 0.2)
| | ε-PSVR | 0.149 | 1.293 | 0.963 | (10, 102, 0.2)
| | Aε-SVR | 0.147 | 1.271 | 0.950 | (10, 102, –)
Servo | (1.2, 0.1) | ε-SVR | 0.165 | 0.538 | 0.305 | (2.8, 4, 0.1)
| | Huber SVR | 0.173 | 0.529 | 0.292 | (2.8, 4, 0.1)
| | ε-PSVR | 0.162 | 0.530 | 0.304 | (2.8, 4, 0.1)
| | Aε-SVR | 0.136 | 0.527 | 0.261 | (2.8, 4, –)
Percentage reduction of Aε-SVR compared to other models on SSE/SST.

Regressor | Auto Mpg (%) | Boston Housing (%) | Chwirut (%) | Triazines (%) | NO2 (%) | Nelson (%) | Servo (%)
ε-SVR | 12.17 | 46.29 | 13.04 | 9.63 | 1.03 | 3.92 | 17.58
Huber SVR | 12.17 | 47.66 | 9.09 | 11.71 | 3.23 | 3.29 | 21.39
ε-PSVR | 9.82 | 46.05 | 9.09 | 7.67 | 1.34 | 1.34 | 16.05
Percentage reduction of Aε-SVR compared to other models on RMSE.

Regressor | Auto Mpg (%) | Boston Housing (%) | Chwirut (%) | Triazines (%) | NO2 (%) | Nelson (%) | Servo (%)
ε-SVR | 2.42 | 7.64 | 3.74 | 0.76 | 0.20 | 3.05 | 2.04
Huber SVR | 2.38 | 8.51 | 3.39 | 3.68 | 1.18 | 2.23 | 0.38
ε-PSVR | 1.88 | 7.31 | 3.57 | 0.76 | 1.70 | 1.70 | 0.57
Percentage reduction of Aε-SVR compared to other models on MAE.

Regressor | Auto Mpg (%) | Boston Housing (%) | Chwirut (%) | Triazines (%) | NO2 (%) | Nelson (%) | Servo (%)
ε-SVR | 1.41 | 15.45 | 0.67 | 4.04 | 1.51 | 2.86 | 14.43
Huber SVR | 0.76 | 15.23 | 0.36 | 5.94 | 2.24 | 1.55 | 10.62
ε-PSVR | 1.68 | 15.30 | 2.32 | 4.04 | 1.35 | 1.35 | 14.14
Tables 4–6 highlight that, across the three evaluation metrics, the developed model attains its largest gains on the Boston Housing dataset, affording improvements of 46.29%, 7.64%, and 15.45% over ε-SVR on the SSE/SST, RMSE, and MAE metrics, respectively. This is because our model focuses on handling inconsistent data distributions within a dataset, which is the case for Boston Housing. It should be noted that Aε-SVR also performs well on the other datasets, with the improvement typically around 1% but in some cases exceeding 10% compared to existing methods.
Next, we randomly select one independent variable from the multiple independent variables as the x-axis and the dependent variable as the y-axis to demonstrate the model’s fit to the data sets on a two-dimensional plane.
Figures 9–11 reveal that the Aε-SVR model outperforms ε-SVR on all datasets, with the Boston Housing dataset presenting the most pronounced prediction gap. Particularly, many data points are located in the lower half and are densely distributed when the Indus value is around 0.2, whereas around 0.5 there are more data points in the upper half and the distribution is sparse. ε-SVR cannot adapt to these distribution characteristics because sparse and dense locations are treated equally; consequently, its fit on Boston Housing around positions 0.2 and 0.5 is poor. However, since Aε-SVR can vary the ε-tube radius depending on the sparsity level, it exhibits a more flexible predictive performance, fitting data points better at both sparse and dense locations.
Comparison of models fitting in the Boston Housing dataset.
Comparison of models fitting in the NO2 dataset.
Comparison of models fitting in the Chwirut dataset.
Furthermore, the Chwirut dataset is a two-dimensional dataset with low dimensionality and a relatively uniform data point distribution across locations. On this dataset, both ε-SVR and Aε-SVR achieve a relatively satisfactory fit, with Figure 12 highlighting only a minor difference between the two fitted curves due to the small ε-tube radius.
The ten-fold cross validation.
The experimental results demonstrate that Aε-SVR outperforms the other SVR models because it assigns a different tube radius εᵢ to each data point in the dataset, which can be changed during the iteration process based on the sample information to achieve better prediction.
Parameter sensitivity study
This experiment uses a 10-fold cross-validation sampling strategy to evaluate the prediction performance of Aε-SVR under different parameter setups. Six independent experiments were conducted to investigate the sensitivity of the ε-SVR and Aε-SVR models while altering the C and σ values on the Auto Mpg, Boston Housing, and NO2 datasets. The ranges of the C and σ values are chosen adjacent to the optimal values, and only one parameter is changed at a time, with the other taking its optimal value. The corresponding results are illustrated in Figure 13.
Performance evaluation of ε-SVR and Aε-SVR on three datasets. First row (a–c): the testing accuracy of the two models under different parameter values. Second row (d–f): the training time of the two models under different parameter values.
For this experiment, we adopt a 10-fold cross-validation strategy; in each fold, the average value over the test set is taken as the final evaluation index. As observed, the prediction performance of the Aε-SVR model is better than that of the ε-SVR model under different C and σ values. Compared with the standard ε-SVR model, Aε-SVR is less sensitive to the C value, as it maintains a good prediction effect over a large range of C; its sensitivity to σ is the same as for the standard SVR model. The time complexity of the standard SVR model is O(n³), while that of the proposed Aε-SVR model is O(Mn³), where M is the number of iterations set in Algorithm 1 and n is the number of training samples; when the sample size is large, this increases the computation time significantly. Therefore, when selecting the Aε-SVR model, we can set a large search step for the C value and a standard search step for σ. Through this setting, we offset some of the time spent iteratively solving for εᵢ, thus balancing training time and prediction performance.
Conclusion and future works
This paper proposes a novel "Alterable ε" model (Aε-SVR), which adapts to the distribution characteristics of the dataset by determining εᵢ at each position. Experimental results on several simulated and real datasets demonstrate that Alterable ε achieves an obvious performance improvement over state-of-the-art models due to its non-fixed ε-tube radius. Specifically, the proposed model has a significant advantage over current models on certain datasets.
Overall, the proposed Aε-SVR model outperforms the standard SVR model for the following reasons. First, the developed iterative fluctuation self-selection algorithm can find an appropriate penalty parameter εᵢ for each location. Second, the variable ε-tube formed by the free εᵢ parameters contributes to the removal of noise and outliers in the dataset. Third, the model successfully extracts important data information through these free parameters.
Moreover, flexible bounds can also lead to better results in classification learning tasks. This property enables the Alterable ε proposed here for regression to distinguish data points with inconsistent distribution sparsity, which is coherent with the requirements of data classification. Therefore, in the future, we aim to extend Alterable ε to classification problems, such as image classification.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Science Foundation of China (No. 11201356) and the Hubei Province Key Laboratory of Systems Science in Metallurgical Process (Wuhan University of Science and Technology) (No. Y202201).
ORCID iD
Xiaoxia He
References
1. Drucker H, Burges CJ, Kaufman L, et al. Support vector regression machines. In: Advances in neural information processing systems. Cambridge, MA: MIT Press, 1997, pp. 155–161.
2. Smola AJ, Schölkopf B. A tutorial on support vector regression. Stat Comput 2004; 14(3): 199–222.
3. Gunn SR. Support vector machines for classification and regression. ISIS Tech Rep 1998; 14(1): 5–16.
4. Vapnik VN. An overview of statistical learning theory. IEEE Trans Neural Netw 1999; 10(5): 988–999.
5. Khemchandani R, Chandra S. Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 2007; 29(5): 905–910.
6. D'Ambrosio R, Degasperi E, Anolli MP, et al. Incidence of liver- and non-liver-related outcomes in patients with HCV-cirrhosis after SVR. J Hepatol 2022; 76(2): 302–310.
7. Badr E, Almotairi S, Salam MA, et al. New sequential and parallel support vector machine with grey wolf optimizer for breast cancer diagnosis. Alex Eng J 2022; 61(3): 2520–2534.
8. Ioannou GN. HCC surveillance after SVR in patients with F3/F4 fibrosis. J Hepatol 2021; 74(2): 458–465.
9. Ghimire S, Bhandari B, Casillas-Pérez D, et al. Hybrid deep CNN-SVR algorithm for solar radiation prediction problems in Queensland, Australia. Eng Appl Artif Intell 2022; 112: 104860.
10. Li Z, Luo X, Liu M, et al. Wind power prediction based on EEMD-tent-SSA-LS-SVM. Energy Rep 2022; 8: 3234–3243.
11. Han A, Chen X, Li Z, et al. Advanced learning-based energy policy and management of dispatchable units in smart grids considering uncertainty effects. Int J Electr Power Energy Syst 2021; 132: 107188.
12. Karasu S, Altan A, Bekiros S, et al. A new forecasting model with wrapper-based feature selection approach using multi-objective optimization technique for chaotic crude oil time series. Energy 2020; 212: 118750.
13. Zhang H-C, Wu Q, Li F-Y. Application of online multitask learning based on least squares support vector regression in the financial market. Appl Soft Comput 2022; 121: 108754.
14. Chao L, Zhipeng J, Yuanjie Z. A novel reconstructed training-set SVM with roulette cooperative coevolution for financial time series classification. Expert Syst Appl 2019; 123: 283–298.
15. Muller K, Mika S, Ratsch G, et al. An introduction to kernel-based learning algorithms. IEEE Trans Neural Netw 2001; 12(2): 181–201.
16. Wang F, Zuo W, Zhang L, et al. A kernel classification framework for metric learning. IEEE Trans Neural Netw Learn Syst 2015; 26(9): 1950–1962.
18. Ding X, Liu J, Yang F, et al. Random radial basis function kernel-based support vector machine. J Franklin Inst 2021; 358(18): 10121–10140.
19. Cai Y, Wang H, Ye X, et al. A multiple-kernel LSSVR method for separable nonlinear system identification. J Control Theory Appl 2013; 11(4): 651–655.
20. Hooshmand Moghaddam V, Hamidzadeh J. New Hermite orthogonal polynomial kernel and combined kernels in support vector machine classifier. Pattern Recognit 2016; 60: 921–935.
21. Jiang H, Tao C, Dong Y, et al. Robust low-rank multiple kernel learning with compound regularization. Eur J Oper Res 2021; 295(2): 634–647.
22. Suykens JAK, Vandewalle J. Chaos control using least-squares support vector machines. Int J Circuit Theory Appl 1999; 27(6): 605–615.
23. Karal O. Maximum likelihood optimal and robust support vector regression with lncosh loss function. Neural Netw 2017; 94: 1–12.
24. Tanveer M, Mangal M, Ahmad I, et al. One norm linear programming support vector regression. Neurocomputing 2016; 173: 1508–1518.
25. Anand P, Rastogi R, Chandra S. A class of new support vector regression models. Appl Soft Comput 2020; 94: 106446.
26. Gupta U, Gupta D. An improved regularization based Lagrangian asymmetric ν-twin support vector regression using pinball loss function. Appl Intell 2019; 49(10): 3606–3627.
27. Balasundaram S, Meena Y. Robust support vector regression in primal with asymmetric Huber loss. Neural Process Lett 2019; 49(3): 1399–1431.
28. Cheng K, Lu Z. Active learning Bayesian support vector regression model for global approximation. Inf Sci 2021; 544: 549–563.
29. Schölkopf B, Smola AJ, Bach F, et al. Learning with kernels: support vector machines, regularization, optimization, and beyond. Cambridge, MA: MIT Press, 2002.
30. Scornet E. Random forests and kernel methods. IEEE Trans Inf Theory 2016; 62(3): 1485–1500.
31. van der Laan MJ, Polley EC, Hubbard AE. Super learner. Stat Appl Genet Mol Biol 2007; 6(1): Article 25.
32. Meier L, van de Geer S, Bühlmann P. High-dimensional additive modeling. Ann Stat 2009; 37(6B): 3779–3821.
33. Newman D, Hettich S, Blake C, et al. UCI repository of machine learning databases. Irvine, CA: University of California, Dept. of Information and Computer Science.