Introduction
With the continuous development of edge computing, fog computing, and artificial intelligence, the Internet of Things (IoT) is being applied ever more widely in both military and civil fields. Because IoT systems face resource constraints, strict real-time requirements, and intermittent network connectivity, accurately predicting system load changes and scheduling resources ahead of time plays a vital role.1–3 In IoT systems, the data acquired by monitoring subsystems usually carry a temporal dimension: values of statistical indicators arranged in order of time, known as time series. Time series trend prediction analyzes such a series and, by analogy or extrapolation from the development process, direction, and trend it reflects, estimates the likely level in the coming period.4–6
At present, many research institutions and enterprises attempt to anticipate the future behavior of a system or business through trend prediction. Commonly used approaches apply statistical algorithms such as Holt–Winters, autoregressive integrated moving average (ARIMA), and 3-sigma to historical data.7–9 In Zhi-Yu et al.,10 a hybrid EMD/EEMD-ARIMA prediction model is proposed: empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD) are used to decompose the original hydrological time series into components with different time-scale features, for long-term runoff prediction in the upper reaches of the Yellow River. In Ruixue and Yuan,11 a network traffic prediction method is proposed in which the Global Artificial Fish Swarm Algorithm (GAFSA) optimizes the parameters of a Support Vector Regression (SVR) prediction model. Compared with SVR tuned by other intelligent optimization algorithms, this addresses the problem that the traffic prediction model produces widely varying results across runs, stabilizing the predictions and also improving prediction accuracy. In Huimin et al.,12 an integrated back-propagation (BP) prediction model is proposed to counter the BP neural network's tendency to fall into local minima and converge slowly. The model integrates multiple BP networks with different initial weights and training sets, combines them by a weighted average, and is applied to forecasting the traffic volume at an intersection.
Related research
In recent years, with the continuous development of deep learning, deep models have gradually been applied to the study of time series data. A deep learning model is a deep neural network with multiple nonlinear mapping levels; it can abstract the input signal layer by layer and extract features to uncover deeper latent patterns. Among deep learning models, the recurrent neural network (RNN) introduces the concept of time into the design of the network structure, making it well suited to time series analysis. Among the many RNN variants, the long short-term memory (LSTM) model compensates for the vanishing and exploding gradients of the plain RNN and for its lack of long-term memory, so that long-range temporal information can be used effectively.13,14 Shi et al.15 proposed a convolutional LSTM (ConvLSTM) network for precipitation nowcasting, extending fully connected LSTM (FC-LSTM) with convolutional structures in both the input-to-state and state-to-state transitions; stacking multiple ConvLSTM layers in an encoding-forecasting structure yields an end-to-end trainable model. Jaeger and Haas16 proposed the echo state network (ESN), a method for learning nonlinear systems with artificial recurrent neural networks. On a benchmark task of predicting chaotic time series, it improved accuracy by a factor of 2400 over previous techniques, an improvement in signal error of about two orders of magnitude. Guo et al.17 proposed a recurrent neural network with adaptive gradient learning to predict streaming time series in the presence of anomalies and change points: local features of the series are exploited to weight the loss gradient of newly available observations in real time according to the distribution of the data, with extensive analysis on both synthetic and real data sets.
In this article, a load trend prediction method based on Isolation Forest, empirical mode decomposition, and long short-term memory (IF-EMD-LSTM) is proposed for information system monitoring time series. Considering the noise and abnormal points in the raw data, Wiener filtering is first used to denoise the input, and the isolation forest algorithm is then used to eliminate abnormal points. To further improve prediction accuracy, the EMD algorithm decomposes the input into intrinsic mode function (IMF) components of different frequencies; a separate LSTM network is trained for each IMF component and for the residual, and the predictions of the individual LSTM models are reconstructed into the final forecast. Finally, experiments were conducted on Amazon's public data sets and compared with the ARIMA and Prophet models. The experimental results show the superior performance of the proposed IF-EMD-LSTM model for information system load trend prediction.
Trend prediction hybrid algorithm based on IF-EMD-LSTM
Outliers based on isolated forests
Isolation forest is an ensemble-based rapid anomaly detection method with linear time complexity and high precision, meeting the requirements of big data processing. It is suitable for anomaly detection on continuous data, where an anomaly is defined as "an isolated point that is easily separated, that is, a point that lies in a sparsely populated region, far from any dense cluster."18–20
An isolation forest obtains a converged estimate by an ensemble (Monte Carlo) approach: the data are repeatedly partitioned from scratch and the results of the cuts are averaged. The forest is composed of isolation trees (iTrees), each built as follows:
1. Randomly select an attribute A.
2. Randomly select a split value within the range of A.
3. Partition the records on A: records with A less than the split value go to the left subtree, and records with A greater than or equal to the split value go to the right subtree.
4. Recursively construct the left and right subtrees until either (1) the incoming data set contains only one record, or several identical records, or (2) the tree reaches the height threshold.
Figure 1 shows an example of four test samples traversing an iTree.

Figure 1. Example of four test samples traversing an iTree.
After the iTrees are constructed, the data can be scored. Prediction starts each test record at the root of an iTree and follows the splits down to a leaf node; anomalous points typically show a very short path from the root to the leaf. The anomaly index is computed with a normalized formula
s(x, n) = 2^(−E(h(x)) / c(n))
where h(x) is the path length of record x in a single iTree, E(h(x)) is its average over all iTrees, and c(n) = 2H(n − 1) − 2(n − 1)/n is the average path length of an unsuccessful search in a binary search tree built from n samples, with H(i) ≈ ln(i) + 0.5772 the harmonic number. Scores close to 1 indicate anomalies, while scores well below 0.5 indicate normal points.
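As an illustration, the normalization constant c(n) and the anomaly score can be computed directly; the following sketch uses the standard isolation forest score, with function names of our own choosing:

```python
import numpy as np

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant, used in H(i) ~ ln(i) + gamma

def c(n):
    """Average path length of an unsuccessful BST search over n samples,
    used to normalize path lengths in the isolation forest score."""
    if n <= 1:
        return 0.0
    return 2.0 * (np.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def anomaly_score(avg_path_length, n):
    """Normalized anomaly score s(x, n) = 2^(-E(h(x)) / c(n))."""
    return 2.0 ** (-avg_path_length / c(n))
```

A record whose average path length equals c(n) scores exactly 0.5; shorter paths push the score toward 1 (anomalous), longer paths toward 0 (normal).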
A single random tree is unstable; combining multiple iTrees into an iForest improves stability. Each iTree is constructed from a random subsample of the data set to ensure diversity among the trees.
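In practice, this ensemble is available off the shelf; the sketch below uses scikit-learn's IsolationForest on a synthetic CPU-utilization-like series (a stand-in for the paper's data set, with our own injected spikes and a contamination value chosen as a tuning assumption):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic series resembling CPU utilization, with three injected spikes.
rng = np.random.default_rng(0)
values = 40 + 5 * np.sin(np.linspace(0, 20, 500)) + rng.normal(0, 1, 500)
values[[50, 200, 350]] = 95  # short high spikes, as in the paper's Figure 4

df = pd.DataFrame({"value": values})

# contamination is the expected outlier fraction -- a tuning assumption.
clf = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
labels = clf.fit_predict(df[["value"]])  # -1 = outlier, 1 = inlier

cleaned = df[labels == 1].reset_index(drop=True)
```

Records labeled −1 are the most easily isolated points and are culled before the series is passed to the decomposition stage.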
Multi-scale empirical modal data decomposition
EMD can adaptively decompose a non-stationary signal into a series of IMF components and a residual. An IMF must satisfy two conditions: first, the number of extrema and the number of zero crossings must be equal or differ by at most one; second, at any point, the mean of the envelope defined by the local maxima and the envelope defined by the local minima is zero.21–23 For a given signal x(t), the EMD decomposition proceeds as follows:
1. Find all local maxima and minima of x(t); interpolate the maxima with a cubic spline to obtain the upper envelope and the minima to obtain the lower envelope, and compute their mean m(t).
2. Extract the detail h(t) = x(t) − m(t).
3. Determine whether h(t) satisfies the two IMF conditions; if not, treat h(t) as the new input signal and repeat steps 1 and 2,
where the iteration terminates when the standard deviation criterion
SD = Σ_t [h_{k−1}(t) − h_k(t)]² / Σ_t h_{k−1}(t)²
falls below a threshold, generally chosen in the range 0.2–0.3.
4. Separate the resulting IMF c(t) from the signal to obtain the residual r(t) = x(t) − c(t).
5. Decompose the residual r(t) in the same way, repeating until the residual is monotonic or has too few extrema; the original signal is then x(t) = Σ_i c_i(t) + r_n(t).
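The sifting procedure above can be sketched in a few lines; this is a minimal illustration (simple extrema detection, cubic-spline envelopes, the SD stopping rule), not a production EMD implementation, which would also need careful boundary handling:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def mean_envelope(x, t):
    """Mean of the cubic-spline envelopes through local maxima and minima."""
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None  # (near-)monotonic signal: treat as the residual
    upper = CubicSpline(t[maxima], x[maxima])(t)
    lower = CubicSpline(t[minima], x[minima])(t)
    return (upper + lower) / 2.0

def extract_imf(x, t, sd_threshold=0.3, max_iter=50):
    """Sift until the SD criterion (typically 0.2-0.3) is met."""
    h = x.copy()
    for _ in range(max_iter):
        m = mean_envelope(h, t)
        if m is None:
            break
        h_new = h - m
        sd = np.sum((h - h_new) ** 2) / np.sum(h ** 2)
        h = h_new
        if sd < sd_threshold:
            break
    return h

# Two-tone test signal: a fast 5 Hz component plus a slow 0.5 Hz component.
t = np.linspace(0, 4, 1000)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 0.5 * t)
imf1 = extract_imf(x, t)   # first (highest-frequency) component
residual = x - imf1        # would be sifted again for the next IMF
```

Repeating `extract_imf` on the residual yields the remaining IMFs, reproducing step 5 above.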
LSTM time series prediction
Most traditional time series prediction algorithms, such as ARIMA and Holt–Winters, do not perform satisfactorily on real data, for two main reasons: first, the stationarity of the time series data has an important influence on whether the prediction result is good or bad; second, the moving-average operation on the original data easily introduces undulations and sawtooth artifacts, reducing prediction accuracy.
The LSTM network is the best-known member of the family of gated RNNs. A memory cell is used to judge whether information is useful, which makes LSTM an effective and highly usable technique for the long-range dependency problem. Compared with traditional prediction algorithms, LSTM fully accounts for temporal memory and matches the characteristics of time series data. Figure 2 shows a schematic diagram of the LSTM principle.

Figure 2. Schematic diagram of LSTM.
The LSTM contains four critical elements: the input gate, the output gate, the forget gate, and the memory cell. Each part is computed as follows.
Input gate:
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
where W_i and b_i are the weight matrix and bias of the input gate, h_{t−1} is the previous hidden state, and x_t is the current input.
Output gate:
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
where W_o and b_o are the weight matrix and bias of the output gate.
Forget gate:
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
where W_f and b_f are the weight matrix and bias of the forget gate.
Memory cell:
C_t = f_t ⊙ C_{t−1} + i_t ⊙ tanh(W_C · [h_{t−1}, x_t] + b_C)
where C_{t−1} is the previous cell state, and W_C and b_C parameterize the candidate cell state.
The final output of the LSTM is determined by the output gate and the cell state:
h_t = o_t ⊙ tanh(C_t)
where σ denotes the logistic sigmoid function and ⊙ denotes element-wise multiplication.
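The gate equations can be exercised directly with a single-step numpy implementation; this is an illustrative sketch (random toy weights, names of our own choosing), not the Keras layer used in the experiments:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. Each W[k] maps [h_prev; x_t] to a gate
    pre-activation: i = input, f = forget, c = candidate, o = output."""
    z = np.concatenate([h_prev, x_t])
    i = sigmoid(W["i"] @ z + b["i"])        # input gate
    f = sigmoid(W["f"] @ z + b["f"])        # forget gate
    g = np.tanh(W["c"] @ z + b["c"])        # candidate cell state
    o = sigmoid(W["o"] @ z + b["o"])        # output gate
    c_t = f * c_prev + i * g                # memory cell update
    h_t = o * np.tanh(c_t)                  # hidden state / output
    return h_t, c_t

rng = np.random.default_rng(0)
hidden, inputs = 4, 3
W = {k: rng.normal(0, 0.1, (hidden, hidden + inputs)) for k in "ifco"}
b = {k: np.zeros(hidden) for k in "ifco"}
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.normal(size=inputs), h, c, W, b)
```

Iterating `lstm_step` over a sequence, and training W and b by backpropagation through time, is exactly what the Keras LSTM layers in the experiments do internally.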
A two-layer LSTM with 256 neurons per layer is constructed, and the historical sequence length, batch_size, and number of training rounds are varied to find the most suitable values for these parameters. The mean square error (MSE) and the mean absolute percentage error (MAPE) are used as the evaluation indicators of the model.
Set the number of training rounds to 50, the batch_size to 20, and the historical sequence length to 5, 10, 15, 20, 25, and 30. The prediction results are shown in Table 1; the error is smallest when the historical sequence length is 25.
Set the historical sequence length to 10, the number of training rounds to 50, and the batch_size to 10, 20, 32, 64, and 128. The prediction results are shown in Table 2; the error is smallest when the batch_size is 10.
Set the historical sequence length to 10, the batch_size to 20, and the number of training rounds to 30, 50, 80, 100, 300, and 1000. The prediction results are shown in Table 3; the error is smallest when the number of training rounds is 80.
Table 1. Prediction results for different historical sequence lengths. MSE: mean square error; MAPE: mean absolute percentage error.
Table 2. Prediction results for different batch_size values. MSE: mean square error; MAPE: mean absolute percentage error.
Table 3. Prediction results for different numbers of training rounds. MSE: mean square error; MAPE: mean absolute percentage error.
These experiments show that the MSE and MAPE are smallest with a historical sequence length of 25, a batch_size of 10, and 80 training rounds.
Overall flow of the hybrid algorithm
The overall flow of the system load trend prediction method based on IF-EMD-LSTM is shown in Figure 3:
1. Data preprocessing: the Wiener filtering method is used to denoise the input data, and then the isolation forest algorithm is used to eliminate the abnormal points in the data.
2. Data decomposition: the EMD algorithm is used to decompose the input data into IMF components of different frequencies.
3. Neural network training: an LSTM network is trained for each group of IMF components; each IMF and the residual is predicted by a separate LSTM neural network, and the predicted values of the LSTM models are reconstructed into the final forecast.
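The overall flow can be sketched as a small pipeline. The sketch below is illustrative only: it uses scipy's Wiener filter and scikit-learn's IsolationForest for step 1, but substitutes a moving-average split for the EMD of step 2 and a trivial persistence forecast for the per-component LSTMs of step 3 (both stand-ins are clearly marked):

```python
import numpy as np
from scipy.signal import wiener
from sklearn.ensemble import IsolationForest

def preprocess(raw):
    """Step 1: Wiener denoising, then isolation forest outlier culling."""
    denoised = wiener(raw, mysize=5)
    labels = IsolationForest(contamination=0.01, random_state=0) \
        .fit_predict(denoised.reshape(-1, 1))
    return denoised[labels == 1]

def decompose(x):
    """Step 2 stand-in for EMD: peel off moving-average 'scales'.
    A real implementation would return the EMD IMFs plus the residual."""
    components, residual = [], x.astype(float)
    for width in (5, 25):
        smooth = np.convolve(residual, np.ones(width) / width, mode="same")
        components.append(residual - smooth)
        residual = smooth
    components.append(residual)
    return components

def predict_component(c, horizon=10):
    """Step 3 stand-in for a per-component LSTM: persistence forecast."""
    return np.full(horizon, c[-1])

rng = np.random.default_rng(2)
raw = 50 + 10 * np.sin(np.linspace(0, 12, 600)) + rng.normal(0, 2, 600)
raw[[100, 400]] = 120  # injected anomalies

cleaned = preprocess(raw)
forecast = sum(predict_component(c) for c in decompose(cleaned))
```

The final line mirrors the reconstruction step of the method: the forecasts of the individual components are summed to give the system load prediction.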

Figure 3. Flowchart of the system load trend prediction method based on IF-EMD-LSTM.
Experimental verification
Evaluation index
The mean absolute percentage error (MAPE) and the root-mean-square error (RMSE) are used as the model evaluation indicators:
MAPE = (100%/n) Σ_{i=1}^{n} |(ŷ_i − y_i) / y_i|
RMSE = sqrt((1/n) Σ_{i=1}^{n} (ŷ_i − y_i)²)
where y_i is the observed value, ŷ_i is the predicted value, and n is the number of samples.
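Both indicators are straightforward to implement; a minimal sketch:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_pred - y_true) / y_true))

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))
```

Note that MAPE is undefined when an observed value is zero, which is why it suits strictly positive series such as CPU utilization.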
Comparison algorithm
In addition to the IF-EMD-LSTM–based system load trend prediction method proposed in section “Trend prediction hybrid algorithm based on IF-EMD-LSTM,” this article applies the commonly used ARIMA and Prophet models to the same data set and compares them with the IF-EMD-LSTM method.
ARIMA model
The full name of the ARIMA model is autoregressive integrated moving average. ARIMA is not one specific model but a general term for a class of models. ARIMA(p, d, q) consists of three parts: AR(p) is the autoregressive part, which depends on the p most recent historical values; I(d) means the model differences the time series, with d the differencing order; and MA(q) is the moving average part, which depends on the q most recent historical prediction errors. The basic idea of the ARIMA method is that a sequence of values that evolves over time with internal correlation can be approximated by a corresponding model. By analyzing and studying the fitted model, the inherent structure of these dynamic data can be understood more fundamentally, achieving prediction that is optimal in the minimum-variance sense.
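The AR(p) part can be illustrated with a plain least-squares fit; this sketch deliberately omits the differencing (I) and moving-average (MA) parts, which a full implementation (e.g. the statsmodels ARIMA used in practice) would include:

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares fit of an AR(p) model y_t = sum_j phi_j * y_{t-j} + e_t."""
    # Column k holds the lag-k values aligned with the targets y[p:].
    X = np.column_stack([y[p - k : len(y) - k] for k in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return phi

def forecast_ar(y, phi, horizon):
    """Iterated one-step-ahead AR forecasts."""
    hist, out = list(y), []
    for _ in range(horizon):
        nxt = sum(c * hist[-(j + 1)] for j, c in enumerate(phi))
        hist.append(nxt)
        out.append(nxt)
    return np.array(out)

# Demo: recover the coefficient of a known AR(1) process y_t = 0.8 y_{t-1} + e_t.
rng = np.random.default_rng(0)
y = np.zeros(3000)
for t in range(1, 3000):
    y[t] = 0.8 * y[t - 1] + rng.normal(0, 0.1)

phi = fit_ar(y, p=1)
preds = forecast_ar(y, phi, horizon=12)
```

With |phi| < 1 the iterated forecasts decay toward the series mean, which is the characteristic long-horizon behavior of a stationary AR model.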
Prophet model
Prophet is an open-source library developed by Facebook based on a decomposable model. It integrates business background knowledge with statistical knowledge, performs accurate time series prediction with simple, intuitive parameters, and supports custom seasonality and holiday effects. It avoids the difficulty that traditional time series prediction has in reconciling model accuracy with user interaction: predictions can be produced in a simpler, more flexible way, with quality comparable to that of an experienced analyst. As a result it is highly flexible, requires no imputation of missing values, and fits quickly. Prophet handles the common features of time series, and users can tune the model efficiently by adjusting a few intuitive parameters without understanding its internal details. The algorithm uses a decomposable time series model with three parts: a trend term, a periodic (seasonal) term, and a holiday term.
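The decomposable additive idea can be shown in miniature with a least-squares fit of a linear trend plus Fourier seasonal terms. This toy is not Prophet's actual implementation (Prophet fits a piecewise trend with changepoints and holiday effects), but it captures the trend-plus-season structure:

```python
import numpy as np

def fit_additive(t, y, period, n_harmonics=2):
    """Least-squares fit of y(t) ~ a + b*t + Fourier seasonal terms,
    a toy version of a decomposable trend + season model."""
    cols = [np.ones_like(t), t]
    for k in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * k * t / period),
                 np.cos(2 * np.pi * k * t / period)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta, beta

# Demo: hourly-style series with a linear trend and a 24-step season.
t = np.arange(200, dtype=float)
y = 3.0 + 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 24)
fitted, beta = fit_additive(t, y, period=24)
```

Because the fitted components are explicit columns of the design matrix, the trend and seasonal contributions can be inspected separately, which is the interpretability that makes decomposable models attractive to analysts.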
Analysis of results
The experiment uses the Keras deep learning framework with its LSTM layers, the sklearn machine learning library, and the pandas, numpy, and matplotlib libraries for scientific computation and plotting, and performs prediction on the Amazon data set “ec2_cpu_utilization_53ea38.csv.” Each data point in the file is a {time, value} pair, where time is the timestamp and value is the CPU utilization at that time. The data set was predicted using the IF-EMD-LSTM, ARIMA, and Prophet models.
Figure 4 shows the result of culling anomalous points from the original data with the isolation forest algorithm; as can be seen, some short, high spikes of anomalous data have been eliminated. Figure 5 shows the result of decomposing the data with the EMD algorithm into IMF components of different frequencies plus a residual.

Figure 4. Data curve after outlier removal by the isolation forest algorithm.

Figure 5. Data curve after EMD decomposition.
Figure 6 shows the results of predicting the data using the IF-EMD-LSTM prediction algorithm. Figure 7 shows the prediction curve results using the ARIMA model, and Figure 8 shows the prediction curve results using the Prophet model.

Figure 6. Prediction curve of the IF-EMD-LSTM model.

Figure 7. Prediction curve of the ARIMA model.

Figure 8. Prediction curve of the Prophet model.
The three prediction algorithms are compared using the MAPE and the RMSE; the comparison of the prediction results is shown in Table 4. As the table shows, the IF-EMD-LSTM–based system load trend prediction method proposed in this article achieves a lower MAPE and RMSE than the ARIMA and Prophet prediction models.
Table 4. Comparison of evaluation indicators of the three prediction algorithms. ARIMA: autoregressive integrated moving average; RMSE: root-mean-square error; MAPE: mean absolute percentage error.
Conclusion
This article proposes a system load trend prediction method based on IF-EMD-LSTM, covering outlier culling of the raw data, EMD decomposition of the historical data, and tuning of the LSTM model parameters. The experimental results show that the proposed method outperforms the widely used ARIMA and Prophet prediction models, with higher prediction accuracy and better adaptability to the data. Building on this work, further research could expand the number of hidden layers, test the effect of multi-hidden-layer LSTM network structures, and broaden the application of deep learning technology in the forecasting field.
Future work
In the future, the authors will study the application of the gated recurrent unit (GRU) model to load forecasting and consider deep learning methods for predicting faults in resources such as disks and networks, so that maintenance and replacement can be carried out in advance.
