Introduction
With the continuous development of edge computing, fog computing, and artificial intelligence, the Internet of Things (IoT) is being applied ever more widely in both military and civil fields. Because IoT systems face resource constraints, strict real-time requirements, and intermittent network connectivity, accurately predicting system load changes and scheduling resources ahead of time plays a vital role.1–3 In IoT systems, the data acquired by monitoring subsystems usually carry a temporal dimension: values of statistical indicators arranged in order of time, known as time series. Time series trend prediction analyzes such a series and, by analogy or extrapolation from the development process, direction, and trend it reflects, estimates the likely level in the coming period.4–6
At present, many research institutions and enterprises attempt to anticipate the future behavior of a system or business through trend prediction. Commonly used approaches apply statistical algorithms such as Holt–Winters, autoregressive integrated moving average (ARIMA), and 3-sigma to historical data.7–9 In Zhi-Yu et al.,10 a hybrid EMD/EEMD-ARIMA prediction model is proposed: empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD) are used to decompose the original hydrological time series into components with different time-scale features, for long-term runoff prediction in the upper reaches of the Yellow River. In Ruixue and Yuan,11 a network traffic prediction method is proposed in which the Global Artificial Fish Swarm Algorithm (GAFSA) optimizes the parameters of a Support Vector Regression (SVR) prediction model. Compared with SVR tuned by other intelligent optimization algorithms, this addresses the problem that the traffic prediction model produces widely varying results across runs, stabilizing the predictions and also improving prediction accuracy. In Huimin et al.,12 an integrated back-propagation (BP) prediction model is proposed to counter the BP neural network's tendency to fall into local minima and converge slowly. The model integrates multiple BP networks with different initial weights and training sets, combines them by a weighted average, and is applied to forecasting the traffic volume at an intersection.
Related research
In recent years, with the continuous development of deep learning, deep models have gradually been applied to the study of time series data. A deep learning model is a deep neural network with multiple nonlinear mapping levels; it can abstract the input signal layer by layer and extract features to uncover deeper latent patterns. Among deep learning models, the recurrent neural network (RNN) introduces the concept of time into the design of the network structure, making it well suited to time series analysis. Among the many RNN variants, the long short-term memory (LSTM) model compensates for the vanishing and exploding gradients of the plain RNN and for its lack of long-term memory, so that long-range temporal information can be used effectively.13,14 Shi et al.15 proposed a convolutional LSTM (ConvLSTM) network for precipitation nowcasting, extending fully connected LSTM (FC-LSTM) with convolutional structures in both the input-to-state and state-to-state transitions; stacking multiple ConvLSTM layers in an encoding-forecasting structure yields an end-to-end trainable model. Jaeger and Haas16 proposed the echo state network (ESN), a method for learning nonlinear systems with artificial recurrent neural networks. On a benchmark task of predicting chaotic time series, it improved accuracy by a factor of 2400 over previous techniques, an improvement in signal error of about two orders of magnitude. Guo et al.17 proposed a recurrent neural network with adaptive gradient learning to predict streaming time series in the presence of anomalies and change points: local features of the series are exploited to weight the loss gradient of newly available observations in real time according to the distribution of the data, with extensive analysis on both synthetic and real data sets.
In this article, a load trend prediction method based on Isolation Forest, empirical mode decomposition, and long short-term memory (IF-EMD-LSTM) is proposed for information system monitoring time series. Considering the noise and abnormal points in the raw data, Wiener filtering is first used to denoise the input, and the isolation forest algorithm is then used to eliminate abnormal points. To further improve prediction accuracy, the EMD algorithm decomposes the input into intrinsic mode function (IMF) components of different frequencies; a separate LSTM network is trained for each IMF component and for the residual, and the predictions of the individual LSTM models are reconstructed into the final forecast. Finally, experiments were conducted on Amazon's public data sets and compared with the ARIMA and Prophet models. The experimental results show the superior performance of the proposed IF-EMD-LSTM model for information system load trend prediction.
Trend prediction hybrid algorithm based on IF-EMD-LSTM
Outliers based on isolated forests
Isolation forest is an ensemble-based rapid anomaly detection method with linear time complexity and high precision, meeting the requirements of big data processing. It is suitable for anomaly detection on continuous data, where an anomaly is defined as "an isolated point that is easily separated, that is, a point that lies in a sparsely populated region, far from any dense cluster."18–20
An isolation forest obtains a converged estimate by an ensemble (Monte Carlo) approach: the data are repeatedly partitioned from scratch and the results of the cuts are averaged. The forest is composed of isolation trees (iTrees), each built as follows:
1. Randomly select an attribute A.
2. Randomly select a split value within the range of A.
3. Partition the records on A: records with A less than the split value go to the left subtree, and records with A greater than or equal to the split value go to the right subtree.
4. Recursively construct the left and right subtrees until either (1) the incoming data set contains only one record, or several identical records, or (2) the tree reaches the height threshold.
Figure 1 shows an example of four test samples traversing an iTree.

Figure 1. Example of four test samples traversing an iTree.
After the iTrees are constructed, the data can be scored. Prediction starts each test record at the root of an iTree and follows the splits down to a leaf node; anomalous points typically show a very short path from the root to the leaf. The anomaly index is computed with a normalized formula
s(x, n) = 2^(−E(h(x)) / c(n))
where h(x) is the path length of record x in a single iTree, E(h(x)) is its average over all iTrees, and c(n) = 2H(n − 1) − 2(n − 1)/n is the average path length of an unsuccessful search in a binary search tree built from n samples, with H(i) ≈ ln(i) + 0.5772 the harmonic number. Scores close to 1 indicate anomalies, while scores well below 0.5 indicate normal points.
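As an illustration, the normalization constant c(n) and the anomaly score can be computed directly; the following sketch uses the standard isolation forest score, with function names of our own choosing:

```python
import numpy as np

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant, used in H(i) ~ ln(i) + gamma

def c(n):
    """Average path length of an unsuccessful BST search over n samples,
    used to normalize path lengths in the isolation forest score."""
    if n <= 1:
        return 0.0
    return 2.0 * (np.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def anomaly_score(avg_path_length, n):
    """Normalized anomaly score s(x, n) = 2^(-E(h(x)) / c(n))."""
    return 2.0 ** (-avg_path_length / c(n))
```

A record whose average path length equals c(n) scores exactly 0.5; shorter paths push the score toward 1 (anomalous), longer paths toward 0 (normal).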
A single random tree is unstable; combining multiple iTrees into an iForest improves stability. Each iTree is constructed from a random subsample of the data set to ensure diversity among the trees.
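In practice, this ensemble is available off the shelf; the sketch below uses scikit-learn's IsolationForest on a synthetic CPU-utilization-like series (a stand-in for the paper's data set, with our own injected spikes and a contamination value chosen as a tuning assumption):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic series resembling CPU utilization, with three injected spikes.
rng = np.random.default_rng(0)
values = 40 + 5 * np.sin(np.linspace(0, 20, 500)) + rng.normal(0, 1, 500)
values[[50, 200, 350]] = 95  # short high spikes, as in the paper's Figure 4

df = pd.DataFrame({"value": values})

# contamination is the expected outlier fraction -- a tuning assumption.
clf = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
labels = clf.fit_predict(df[["value"]])  # -1 = outlier, 1 = inlier

cleaned = df[labels == 1].reset_index(drop=True)
```

Records labeled −1 are the most easily isolated points and are culled before the series is passed to the decomposition stage.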
Multi-scale empirical modal data decomposition
EMD can adaptively decompose a non-stationary signal into a series of IMF components and a residual. An IMF must satisfy two conditions: first, the number of extrema and the number of zero crossings must be equal or differ by at most one; second, at any point, the mean of the envelope defined by the local maxima and the envelope defined by the local minima is zero.21–23 For a given signal x(t), the EMD decomposition proceeds as follows:
1. Find all local maxima and minima of x(t); interpolate the maxima with a cubic spline to obtain the upper envelope and the minima to obtain the lower envelope, and compute their mean m(t).
2. Extract the detail h(t) = x(t) − m(t).
3. Determine whether h(t) satisfies the two IMF conditions; if not, treat h(t) as the new input signal and repeat steps 1 and 2,
where the iteration terminates when the standard deviation criterion
SD = Σ_t [h_{k−1}(t) − h_k(t)]² / Σ_t h_{k−1}(t)²
falls below a threshold, generally chosen in the range 0.2–0.3.
4. Separate the resulting IMF c(t) from the signal to obtain the residual r(t) = x(t) − c(t).
5. Decompose the residual r(t) in the same way, repeating until the residual is monotonic or has too few extrema; the original signal is then x(t) = Σ_i c_i(t) + r_n(t).
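The sifting procedure above can be sketched in a few lines; this is a minimal illustration (simple extrema detection, cubic-spline envelopes, the SD stopping rule), not a production EMD implementation, which would also need careful boundary handling:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def mean_envelope(x, t):
    """Mean of the cubic-spline envelopes through local maxima and minima."""
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None  # (near-)monotonic signal: treat as the residual
    upper = CubicSpline(t[maxima], x[maxima])(t)
    lower = CubicSpline(t[minima], x[minima])(t)
    return (upper + lower) / 2.0

def extract_imf(x, t, sd_threshold=0.3, max_iter=50):
    """Sift until the SD criterion (typically 0.2-0.3) is met."""
    h = x.copy()
    for _ in range(max_iter):
        m = mean_envelope(h, t)
        if m is None:
            break
        h_new = h - m
        sd = np.sum((h - h_new) ** 2) / np.sum(h ** 2)
        h = h_new
        if sd < sd_threshold:
            break
    return h

# Two-tone test signal: a fast 5 Hz component plus a slow 0.5 Hz component.
t = np.linspace(0, 4, 1000)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 0.5 * t)
imf1 = extract_imf(x, t)   # first (highest-frequency) component
residual = x - imf1        # would be sifted again for the next IMF
```

Repeating `extract_imf` on the residual yields the remaining IMFs, reproducing step 5 above.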
LSTM time series prediction
Most traditional time series prediction algorithms, such as ARIMA and Holt–Winters, do not perform satisfactorily on real data, for two main reasons: first, the stationarity of the time series data has an important influence on whether the prediction result is good or bad; second, the moving-average operation on the original data easily introduces undulations and sawtooth artifacts, reducing prediction accuracy.
The LSTM network is the best-known member of the family of gated RNNs. A memory cell is used to judge whether information is useful, which makes LSTM an effective and highly usable technique for the long-range dependency problem. Compared with traditional prediction algorithms, LSTM fully accounts for temporal memory and matches the characteristics of time series data. Figure 2 shows a schematic diagram of the LSTM principle.

Figure 2. Schematic diagram of LSTM.
The LSTM contains four critical elements: the input gate, the output gate, the forget gate, and the memory cell. Each part is computed as follows.
Input gate:
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
where W_i and b_i are the weight matrix and bias of the input gate, h_{t−1} is the previous hidden state, and x_t is the current input.
Output gate:
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
where W_o and b_o are the weight matrix and bias of the output gate.
Forget gate:
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
where W_f and b_f are the weight matrix and bias of the forget gate.
Memory cell:
C_t = f_t ⊙ C_{t−1} + i_t ⊙ tanh(W_C · [h_{t−1}, x_t] + b_C)
where C_{t−1} is the previous cell state, and W_C and b_C parameterize the candidate cell state.
The final output of the LSTM is determined by the output gate and the cell state:
h_t = o_t ⊙ tanh(C_t)
where σ denotes the logistic sigmoid function and ⊙ denotes element-wise multiplication.
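The gate equations can be exercised directly with a single-step numpy implementation; this is an illustrative sketch (random toy weights, names of our own choosing), not the Keras layer used in the experiments:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. Each W[k] maps [h_prev; x_t] to a gate
    pre-activation: i = input, f = forget, c = candidate, o = output."""
    z = np.concatenate([h_prev, x_t])
    i = sigmoid(W["i"] @ z + b["i"])        # input gate
    f = sigmoid(W["f"] @ z + b["f"])        # forget gate
    g = np.tanh(W["c"] @ z + b["c"])        # candidate cell state
    o = sigmoid(W["o"] @ z + b["o"])        # output gate
    c_t = f * c_prev + i * g                # memory cell update
    h_t = o * np.tanh(c_t)                  # hidden state / output
    return h_t, c_t

rng = np.random.default_rng(0)
hidden, inputs = 4, 3
W = {k: rng.normal(0, 0.1, (hidden, hidden + inputs)) for k in "ifco"}
b = {k: np.zeros(hidden) for k in "ifco"}
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.normal(size=inputs), h, c, W, b)
```

Iterating `lstm_step` over a sequence, and training W and b by backpropagation through time, is exactly what the Keras LSTM layers in the experiments do internally.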
A two-layer LSTM with 256 neurons per layer is constructed, and the historical sequence length, batch_size, and number of training rounds are varied to find the most suitable values for these parameters. The mean square error (MSE) and the mean absolute percentage error (MAPE) are used as the evaluation indicators of the model.
Set the number of training rounds to 50, the batch_size to 20, and the historical sequence length to 5, 10, 15, 20, 25, and 30. The prediction results are shown in Table 1; the error is smallest when the historical sequence length is 25.
Set the historical sequence length to 10, the number of training rounds to 50, and the batch_size to 10, 20, 32, 64, and 128. The prediction results are shown in Table 2; the error is smallest when the batch_size is 10.
Set the historical sequence length to 10, the batch_size to 20, and the number of training rounds to 30, 50, 80, 100, 300, and 1000. The prediction results are shown in Table 3; the error is smallest when the number of training rounds is 80.
Table 1. Prediction results for different historical sequence lengths. MSE: mean square error; MAPE: mean absolute percentage error.
Table 2. Prediction results for different batch_size values. MSE: mean square error; MAPE: mean absolute percentage error.
Table 3. Prediction results for different numbers of training rounds. MSE: mean square error; MAPE: mean absolute percentage error.
These experiments show that the MSE and MAPE are smallest with a historical sequence length of 25, a batch_size of 10, and 80 training rounds.
Overall flow of the hybrid algorithm
The overall flow of the system load trend prediction method based on IF-EMD-LSTM is shown in Figure 3:
1. Data preprocessing: the Wiener filtering method is used to denoise the input data, and then the isolation forest algorithm is used to eliminate the abnormal points in the data.
2. Data decomposition: the EMD algorithm is used to decompose the input data into IMF components of different frequencies.
3. Neural network training: an LSTM network is trained for each group of IMF components; each IMF and the residual is predicted by a separate LSTM neural network, and the predicted values of the LSTM models are reconstructed into the final forecast.
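The overall flow can be sketched as a small pipeline. The sketch below is illustrative only: it uses scipy's Wiener filter and scikit-learn's IsolationForest for step 1, but substitutes a moving-average split for the EMD of step 2 and a trivial persistence forecast for the per-component LSTMs of step 3 (both stand-ins are clearly marked):

```python
import numpy as np
from scipy.signal import wiener
from sklearn.ensemble import IsolationForest

def preprocess(raw):
    """Step 1: Wiener denoising, then isolation forest outlier culling."""
    denoised = wiener(raw, mysize=5)
    labels = IsolationForest(contamination=0.01, random_state=0) \
        .fit_predict(denoised.reshape(-1, 1))
    return denoised[labels == 1]

def decompose(x):
    """Step 2 stand-in for EMD: peel off moving-average 'scales'.
    A real implementation would return the EMD IMFs plus the residual."""
    components, residual = [], x.astype(float)
    for width in (5, 25):
        smooth = np.convolve(residual, np.ones(width) / width, mode="same")
        components.append(residual - smooth)
        residual = smooth
    components.append(residual)
    return components

def predict_component(c, horizon=10):
    """Step 3 stand-in for a per-component LSTM: persistence forecast."""
    return np.full(horizon, c[-1])

rng = np.random.default_rng(2)
raw = 50 + 10 * np.sin(np.linspace(0, 12, 600)) + rng.normal(0, 2, 600)
raw[[100, 400]] = 120  # injected anomalies

cleaned = preprocess(raw)
forecast = sum(predict_component(c) for c in decompose(cleaned))
```

The final line mirrors the reconstruction step of the method: the forecasts of the individual components are summed to give the system load prediction.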

Figure 3. Flowchart of the system load trend prediction method based on IF-EMD-LSTM.
Experimental verification
Evaluation index
The mean absolute percentage error (MAPE) and the root-mean-square error (RMSE) are used as the model evaluation indicators:
MAPE = (100%/n) Σ_{i=1}^{n} |(ŷ_i − y_i) / y_i|
RMSE = sqrt((1/n) Σ_{i=1}^{n} (ŷ_i − y_i)²)
where y_i is the observed value, ŷ_i is the predicted value, and n is the number of samples.
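Both indicators are straightforward to implement; a minimal sketch:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_pred - y_true) / y_true))

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))
```

Note that MAPE is undefined when an observed value is zero, which is why it suits strictly positive series such as CPU utilization.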
Comparison algorithm
In addition to the IF-EMD-LSTM–based system load trend prediction method proposed in section “Trend prediction hybrid algorithm based on IF-EMD-LSTM,” this article applies the commonly used ARIMA and Prophet models to the same data set and compares them with the IF-EMD-LSTM method.
ARIMA model
The full name of the ARIMA model is autoregressive integrated moving average. ARIMA is not one specific model but a general term for a class of models. ARIMA(p, d, q) consists of three parts: AR(p) is the autoregressive part, which depends on the p most recent historical values; I(d) means the model differences the time series, with d the differencing order; and MA(q) is the moving average part, which depends on the q most recent historical prediction errors. The basic idea of the ARIMA method is that a sequence of values that evolves over time with internal correlation can be approximated by a corresponding model. By analyzing and studying the fitted model, the inherent structure of these dynamic data can be understood more fundamentally, achieving prediction that is optimal in the minimum-variance sense.
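The AR(p) part can be illustrated with a plain least-squares fit; this sketch deliberately omits the differencing (I) and moving-average (MA) parts, which a full implementation (e.g. the statsmodels ARIMA used in practice) would include:

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares fit of an AR(p) model y_t = sum_j phi_j * y_{t-j} + e_t."""
    # Column k holds the lag-k values aligned with the targets y[p:].
    X = np.column_stack([y[p - k : len(y) - k] for k in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return phi

def forecast_ar(y, phi, horizon):
    """Iterated one-step-ahead AR forecasts."""
    hist, out = list(y), []
    for _ in range(horizon):
        nxt = sum(c * hist[-(j + 1)] for j, c in enumerate(phi))
        hist.append(nxt)
        out.append(nxt)
    return np.array(out)

# Demo: recover the coefficient of a known AR(1) process y_t = 0.8 y_{t-1} + e_t.
rng = np.random.default_rng(0)
y = np.zeros(3000)
for t in range(1, 3000):
    y[t] = 0.8 * y[t - 1] + rng.normal(0, 0.1)

phi = fit_ar(y, p=1)
preds = forecast_ar(y, phi, horizon=12)
```

With |phi| < 1 the iterated forecasts decay toward the series mean, which is the characteristic long-horizon behavior of a stationary AR model.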
Prophet model
Prophet is an open-source library developed by Facebook based on a decomposable model. It integrates business background knowledge with statistical knowledge, performs accurate time series prediction with simple, intuitive parameters, and supports custom seasonality and holiday effects. It avoids the difficulty that traditional time series prediction has in reconciling model accuracy with user interaction: predictions can be produced in a simpler, more flexible way, with quality comparable to that of an experienced analyst. As a result it is highly flexible, requires no imputation of missing values, and fits quickly. Prophet handles the common features of time series, and users can tune the model efficiently by adjusting a few intuitive parameters without understanding its internal details. The algorithm uses a decomposable time series model with three parts: a trend term, a periodic (seasonal) term, and a holiday term.
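The decomposable additive idea can be shown in miniature with a least-squares fit of a linear trend plus Fourier seasonal terms. This toy is not Prophet's actual implementation (Prophet fits a piecewise trend with changepoints and holiday effects), but it captures the trend-plus-season structure:

```python
import numpy as np

def fit_additive(t, y, period, n_harmonics=2):
    """Least-squares fit of y(t) ~ a + b*t + Fourier seasonal terms,
    a toy version of a decomposable trend + season model."""
    cols = [np.ones_like(t), t]
    for k in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * k * t / period),
                 np.cos(2 * np.pi * k * t / period)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta, beta

# Demo: hourly-style series with a linear trend and a 24-step season.
t = np.arange(200, dtype=float)
y = 3.0 + 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 24)
fitted, beta = fit_additive(t, y, period=24)
```

Because the fitted components are explicit columns of the design matrix, the trend and seasonal contributions can be inspected separately, which is the interpretability that makes decomposable models attractive to analysts.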
Analysis of results
The experiment uses the Keras deep learning framework with its LSTM layers, the sklearn machine learning library, and the pandas, numpy, and matplotlib libraries for scientific computation and plotting, and performs prediction on the Amazon data set “ec2_cpu_utilization_53ea38.csv.” Each data point in the file is a {time, value} pair, where time is the timestamp and value is the CPU utilization at that time. The data set was predicted using the IF-EMD-LSTM, ARIMA, and Prophet models.
Figure 4 shows the result of culling anomalous points from the original data with the isolation forest algorithm; as can be seen, some short, high spikes of anomalous data have been eliminated. Figure 5 shows the result of decomposing the data with the EMD algorithm into IMF components of different frequencies plus a residual.

Figure 4. Data curve after outlier removal by the isolation forest algorithm.

Figure 5. Data curve after EMD decomposition.
Figure 6 shows the results of predicting the data using the IF-EMD-LSTM prediction algorithm. Figure 7 shows the prediction curve results using the ARIMA model, and Figure 8 shows the prediction curve results using the Prophet model.

Figure 6. Prediction curve of the IF-EMD-LSTM model.

Figure 7. Prediction curve of the ARIMA model.

Figure 8. Prediction curve of the Prophet model.
The three prediction algorithms are compared using the MAPE and the RMSE; the comparison of the prediction results is shown in Table 4. As the table shows, the IF-EMD-LSTM–based system load trend prediction method proposed in this article achieves a lower MAPE and RMSE than the ARIMA and Prophet prediction models.
Table 4. Comparison of evaluation indicators of the three prediction algorithms. ARIMA: autoregressive integrated moving average; RMSE: root-mean-square error; MAPE: mean absolute percentage error.
Conclusion
This article proposes a system load trend prediction method based on IF-EMD-LSTM, covering outlier culling of the raw data, EMD decomposition of the historical data, and tuning of the LSTM model parameters. The experimental results show that the proposed method outperforms the widely used ARIMA and Prophet prediction models, with higher prediction accuracy and better adaptability to the data. Building on this work, further research could expand the number of hidden layers, test the effect of multi-hidden-layer LSTM network structures, and broaden the application of deep learning technology in the forecasting field.
Future work
In the future, the authors will study the application of the gated recurrent unit (GRU) model to load forecasting and consider deep learning methods for predicting faults in resources such as disks and networks, so that maintenance and replacement can be carried out in advance.
