Abstract
Introduction
With the rapid industrialization and economic development of recent years, air pollution has affected developed regions of China such as Beijing-Tianjin-Hebei, the Pearl River Delta and the Yangtze River Delta. 1 PM2.5 is one of the particulate pollutants responsible. The sources of PM2.5 are mainly divided into primary emissions and secondary transformation, and secondary transformation is the main source.2,3 Air pollution harms human health, vegetation, agriculture and building materials, as it did in the Los Angeles smog in the 1930s, so people must use protective equipment to avoid being affected.4–7 To help people avoid air pollution in advance, China's environment department has also formulated a large number of measures and policies. 8
The prediction of PM2.5 concentration is currently a leading research topic, and many experts and scholars have proposed prediction methods. For example, Zhou et al.9,10 propose a generalized regression neural network combined with ensemble empirical mode decomposition for data processing and analysis, which improves the prediction accuracy of PM2.5 concentration. Elangasinghe et al. 11 study PM2.5 concentration in a region of New Zealand with an artificial neural network (ANN). Feng et al. 12 use data from 13 pollution monitoring stations to build an ANN-based hybrid model that improves the prediction accuracy of the daily average PM2.5 concentration. The influencing factors between stations are used as the inputs of the hybrid model, and the data is decomposed by wavelet transform and applied to the corresponding models. This method builds a powerful nonlinear network and improves prediction accuracy. Some scholars use time series methods to study PM2.5 concentration. Erdem et al. 13 predict the hourly average wind speed with the autoregressive moving average (ARMA) method, which improves the prediction accuracy for non-stationary wind speed time series; this kind of method is also used to predict a single PM2.5 concentration sequence. Lee and Tong 14 use a hybrid forecasting method based on an improved autoregressive integrated moving average (ARIMA) model to solve nonlinear time series prediction. The support vector machine and other statistical methods are also widely applied. Kavousi-Fard et al. 15 propose a hybrid algorithm for power load forecasting in which support vector regression is combined with an improved firefly algorithm to obtain optimal parameters and improve prediction accuracy. Lin et al. 16 give an improved combination algorithm based on support vector regression (SVR) to forecast seasonal revenues.
However, the advanced methods above need a large amount of sample data, their algorithms are complex and their computational cost is high. In complex environments, the required variables are often difficult to obtain. For example, when SO2 emissions from coal-fired power plants are monitored, the flue gas environment is harsh and the cost of installing monitoring equipment is high, so the SO2 emissions are difficult to obtain. These methods are therefore difficult to apply online in the atmospheric environment. By contrast, the Kalman filter (KF) is computationally simple and can estimate missing information from limited measurements, and it is widely applied.17,18 The traditional KF can only be used in linear systems, which limits its scope of application. For nonlinear systems, the extended Kalman filter (EKF) and the unscented Kalman filter (UKF)19,20 are used for prediction. Nasseri et al. 19 combine the EKF with genetic programming (GP) to predict water demand in Tehran. Zhao et al. 21 propose a robust iterated extended Kalman filter. The EKF must linearize the nonlinear function and compute the Jacobian matrix through a Taylor expansion, so its prediction accuracy is not ideal. Qi et al. 22 propose an improved UKF method (UKF-GPS) that improves numerical stability for power systems and overcomes the shortcomings of the EKF. 23
However, most Kalman filter applications assume that the noise statistics, such as the variances of the process noise and the measurement noise, are accurate or known. Once the noise statistics are not accurate, PM2.5 concentration prediction based on the Kalman method becomes inaccurate or even unstable. To address this problem, Dai et al. 24 use an adaptive Kalman filter to estimate battery temperature. Liu and Partovibakhsh 25 study an adaptive unscented Kalman filter to estimate the state of a lithium battery. This method uses covariance matching to adjust the noise covariance adaptively, which yields highly accurate state of charge (SOC) estimation. Xiong et al. 26 use the adaptive extended Kalman filter to achieve accurate SOC estimation for a battery pack and to ensure the accuracy of the estimate for each battery.
In addition, in PM2.5 concentration prediction the noise is usually assumed to be zero-mean Gaussian white noise. In practice, this assumption is difficult to satisfy. Because the actual environment is complex and changeable, the measurements are easily affected by external interference, which shifts the noise distribution and degrades the prediction accuracy. To address this problem, an adaptive unscented Kalman filter based on support vector regression is proposed and applied to PM2.5 concentration prediction with online noise estimation. This paper improves the prediction accuracy of atmospheric PM2.5 concentration in two ways. Firstly, a state equation framework based on support vector regression (SVR) is constructed. The framework exploits the ability of SVR to perform regression with small samples, and the kernel function handles the nonlinearity. SVR also avoids the problem of model overfitting. Secondly, to deal with the randomness and uncertainty of PM2.5 concentration, an adaptive unscented Kalman filter (AUKF) is used to update the state continuously, so that the atmospheric PM2.5 concentration can be predicted when the noise statistics are incorrect or not exactly Gaussian. The proposed method is then compared with SVR-UKF. The simulation results show that the noise is estimated adaptively when the noise statistics of the atmospheric PM2.5 model are incorrect or not exactly Gaussian, which demonstrates the accuracy and stability of the proposed method. Finally, the proposed method is compared with the SVR-UKF, AR-Kalman, AR and BP methods. The simulation results show that the proposed method predicts PM2.5 concentration more accurately.
The remainder of this paper is organized as follows. The second section describes the development of the PM2.5 monitoring system. The third section presents the PM2.5 concentration prediction method. The fourth section gives the simulation analysis, and the last section draws the conclusions.
The development of PM2.5 monitoring system
This section introduces the monitoring system. Beijing is a city with distinct seasons. Historical PM2.5 concentration data for 2018 was collected at a campus monitoring station, and the data from May to August is selected as the research object. About 70% of the monitoring data is used as the training set, and the remaining 30% is used as the test set. The air monitoring system is shown in Figure 1. The data is measured by an outdoor monitoring sensor probe. The platform equipment consists of a solar panel power supply system and a power distribution box, so the whole device is powered by solar energy. The monitoring data is transferred to the network cloud, where the host computer can directly access the current concentration and the historical trend. The trend of the air pollution data can be followed in a trend graph, and the operating status and geographical location of the equipment in different monitoring areas can be seen on the monitoring area map. Users can check the operation of equipment in different areas from different computers. The monitoring device can also monitor air quality at different times and display the mean PM2.5 concentration. The PM2.5 concentration monitored by this system is used to verify the proposed method.
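The 70%/30% division described above can be sketched as a chronological split, in which the earliest readings form the training set and the latest readings form the test set. This is a minimal sketch, not the authors' code: the function name `chronological_split` and the sample readings are hypothetical and serve only as illustration.

```python
def chronological_split(series, train_fraction=0.7):
    """Split a time-ordered series into a leading training part and a trailing test part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]


# Hypothetical hourly PM2.5 readings (made-up values, for illustration only).
readings = [35.0, 41.2, 38.7, 52.1, 60.3, 57.8, 49.9, 44.0, 39.5, 36.1]
train, test = chronological_split(readings)
```

Keeping the time order intact, rather than shuffling, means the test set contains only readings that come after every training reading, which matches the one-step-ahead prediction setting.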

Figure 1. The air monitoring data system.
Through this monitoring platform, the method proposed in this paper is applied to predict the atmospheric PM2.5 concentration in the next hour. The proposed method is introduced in the next section.
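One common way to set up next-hour prediction is to pair a short window of past readings with the reading that follows it. The sketch below assumes such a lagged-window formulation; the paper does not state the window length, so the parameter `lags` and the function name `make_one_step_pairs` are hypothetical.

```python
def make_one_step_pairs(series, lags=3):
    """Build (input window, next value) pairs for one-step-ahead prediction."""
    if lags < 1 or lags >= len(series):
        raise ValueError("lags must be at least 1 and shorter than the series")
    count = len(series) - lags
    inputs = [series[i:i + lags] for i in range(count)]
    targets = [series[i + lags] for i in range(count)]
    return inputs, targets


# Each input holds `lags` consecutive hourly values; the target is the next hour.
inputs, targets = make_one_step_pairs([1.0, 2.0, 3.0, 4.0, 5.0], lags=2)
```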
Method
Support vector regression
The support vector machine (SVM) is commonly used for classification and regression.27 SVR is the regression form of the SVM and is widely used in the field of regression. In this paper, SVR is used to predict PM2.5 concentration. The SVR method constructs a linear decision function in a high-dimensional feature space to realize model prediction. The linear decision function is usually found by solving a convex quadratic programming problem. This solution process embodies the principle of structural risk minimization, which avoids the overfitting of traditional methods and improves the prediction ability. The SVR gives an
where
where
where
By substituting formula (5) to (7) into formula (4), the optimization problem is given as follows:
where
where
The radial basis function (RBF) is selected as the kernel function in this paper, which is described as follows:
where
The optimal choice of
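As an illustration of the kernel-based prediction discussed in this subsection, the following sketch evaluates a Gaussian RBF kernel and the standard kernel expansion of an SVR decision function. It is a sketch under stated assumptions: the width parameter `sigma`, the parameterization exp(-||x - y||^2 / (2 sigma^2)), and all names and numeric values below are hypothetical, and the paper's own notation may differ.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel exp(-||x - y||^2 / (2 * sigma^2)) for equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def svr_predict(x, support_vectors, dual_coefs, bias, sigma=1.0):
    """Kernel expansion f(x) = sum_i coef_i * K(sv_i, x) + bias."""
    return bias + sum(
        coef * rbf_kernel(sv, x, sigma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


# Hypothetical support vectors and coefficients, for illustration only.
value = svr_predict([0.0], [[0.0], [1.0]], [2.0, -1.0], bias=0.5)
```

A smaller `sigma` makes the kernel more local, so each support vector influences only nearby inputs; a larger `sigma` gives a smoother regression function.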
The adaptive unscented Kalman filter method
In this method, sampling points (sigma points) are selected from the original state distribution according to a fixed rule, so that the mean and covariance of the original state distribution are represented by the sampling points. The state-space framework of SVR is given as follows:
where
where
where
where
where
The one-step prediction of the sigma point set is calculated as follows:
The one-step prediction of the system state is obtained as the weighted average:
The one-step prediction of the covariance matrix is calculated as follows:
New sigma points are generated by the unscented transform (UT) from the predicted values as follows:
The predicted observations are obtained as follows:
The predicted mean of the system is obtained by a weighted sum as follows:
The update covariance and measurement covariance are obtained as follows:
The Kalman gain is calculated by formula (25):
The state update and covariance update of the system are given as follows:
where
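The recursion described in this subsection can be sketched for a scalar state as follows. This is a minimal illustration, not the authors' implementation: the function names `sigma_points` and `ukf_step`, the scaling parameters `alpha`, `beta` and `kappa` with their default values, and the callables `f` and `h` (standing in for the trained SVR state function and the observation function) are all assumptions introduced here.

```python
import math


def sigma_points(mean, var, alpha=1.0, beta=0.0, kappa=2.0):
    """Scaled unscented transform for a scalar state (n = 1); parameters are assumed."""
    n = 1
    lam = alpha ** 2 * (n + kappa) - n
    spread = math.sqrt((n + lam) * var)
    points = [mean, mean + spread, mean - spread]
    w_side = 1.0 / (2.0 * (n + lam))
    w_mean = [lam / (n + lam), w_side, w_side]
    w_cov = [lam / (n + lam) + (1.0 - alpha ** 2 + beta), w_side, w_side]
    return points, w_mean, w_cov


def ukf_step(x, P, z, f, h, Q, R, q=0.0, r=0.0):
    """One predict/update cycle; q and r are the process and measurement noise means."""
    # Time update: propagate sigma points through the state function.
    pts, wm, wc = sigma_points(x, P)
    propagated = [f(p) + q for p in pts]
    x_pred = sum(w * p for w, p in zip(wm, propagated))
    P_pred = sum(w * (p - x_pred) ** 2 for w, p in zip(wc, propagated)) + Q
    # Redraw sigma points around the prediction and map them to observations.
    pts, wm, wc = sigma_points(x_pred, P_pred)
    observed = [h(p) + r for p in pts]
    z_pred = sum(w * o for w, o in zip(wm, observed))
    P_zz = sum(w * (o - z_pred) ** 2 for w, o in zip(wc, observed)) + R
    P_xz = sum(w * (p - x_pred) * (o - z_pred) for w, p, o in zip(wc, pts, observed))
    # Measurement update with the Kalman gain.
    gain = P_xz / P_zz
    x_new = x_pred + gain * (z - z_pred)
    P_new = P_pred - gain * P_zz * gain
    return x_new, P_new


# With identity f and h the step reduces to the linear Kalman filter update.
x_new, P_new = ukf_step(0.0, 1.0, 2.0, lambda v: v, lambda v: v, Q=0.0, R=1.0)
```

For identity state and observation functions the result coincides with the linear Kalman filter, which gives a convenient sanity check before a nonlinear SVR model is substituted for `f`.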
When the noise statistics are assumed to be known, the SVR-UKF method can predict PM2.5 concentration accurately. However, the noise is usually unknown or inaccurate. For example, assuming that the noise is zero mean Gaussian noise,
The mean value of process noise is given as follows:
where
where
where
The mean value of the measurement noise is estimated as follows:
The measurement noise covariance is estimated as follows:
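One widely used way to estimate noise statistics recursively is a Sage-Husa type estimator with a forgetting factor. The sketch below applies it to the measurement noise of a scalar filter. It is an assumption that this form matches the estimator intended here; the forgetting factor `b`, the variance floor and all function and argument names are hypothetical.

```python
def fading_weight(k, b=0.95):
    """Sage-Husa weight d_k = (1 - b) / (1 - b**(k + 1)) with forgetting factor b."""
    return (1.0 - b) / (1.0 - b ** (k + 1))


def update_measurement_noise(r_hat, R_hat, residual, sigma_spread, k, b=0.95, floor=1e-8):
    """Recursive update of the measurement noise mean r_hat and variance R_hat.

    residual     : measurement minus the weighted sigma-point observation mean
                   (computed before adding the noise mean)
    sigma_spread : weighted spread of the sigma-point observations (without R)
    """
    d = fading_weight(k, b)
    innovation = residual - r_hat
    r_new = (1.0 - d) * r_hat + d * residual
    R_new = (1.0 - d) * R_hat + d * (innovation ** 2 - sigma_spread)
    # Safeguard (an addition in this sketch): keep the variance estimate positive.
    return r_new, max(R_new, floor)


r_new, R_new = update_measurement_noise(0.0, 1.0, residual=2.0, sigma_spread=1.0, k=0)
```

The weight starts at 1 and decays toward `1 - b`, so recent innovations keep a fixed share of influence and the estimate can follow slowly changing noise. The process noise mean and covariance can be updated in the same pattern from the state prediction residuals.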
In this way, the noise is estimated adaptively while the accuracy of PM2.5 concentration prediction is improved. The state-space framework of SVR is established, and the AUKF method is combined with this framework to estimate the noise online. The flowchart of the SVR-AUKF method is shown in Figure 2.

Figure 2. The flowchart of the SVR-AUKF method.
The flowchart shows the operation of the SVR-AUKF method. Firstly, the RBF is selected as the kernel function, and the state-space framework of SVR is established from the monitored PM2.5 concentration data. Secondly, an appropriate set of sigma points is selected from a new set of PM2.5 concentration data. Then, the recursive calculation of the UKF method is carried out with the established SVR framework. Finally, the SVR-AUKF method is implemented to predict PM2.5 concentration with the noise (
Simulation analysis
In this section, the advantages and effectiveness of the proposed method are verified by simulation analysis. The PM2.5 concentration is sampled every hour by the monitoring data system. The first 700 sets of sample data are used as training data, and the other 300 sets are used as test data. Three performance indexes are used to evaluate the prediction precision: the mean absolute error (MAE), the mean absolute percentage error (MAPE) and the root mean squared error (RMSE), which are given respectively as follows:
where
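The three indexes can be computed as in the sketch below, which uses their textbook definitions. The exact normalization used in the paper, in particular for MAPE, is not recoverable from the text and may differ; the function names and the sample values are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; every true value must be nonzero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up measured and predicted concentrations, for illustration only.
measured = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 44.0]
scores = (mae(measured, predicted), mape(measured, predicted), rmse(measured, predicted))
```

RMSE penalizes large individual errors more strongly than MAE, so a method with occasional large misses shows a wider gap between the two.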
PM2.5 concentration prediction with inaccurate process noises
Different noise statistics are used for simulation and comparison in this section. When the process noise is incorrect or unknown, the SVR-AUKF method is compared with the SVR-UKF method, and the comparison verifies the prediction accuracy and validity of SVR-AUKF. Firstly, the measurement noise (
Table 1. The given measurement statistics and the inaccurate process noise.
Table 2. The given measurement statistics and the inaccurate process noise.
The prediction result of the SVR-AUKF method is compared with that of the SVR-UKF method. Figure 3 shows the prediction results of SVR-AUKF and SVR-UKF with the inaccurate process noise. The comparison results are given as follows:

Figure 3. PM2.5 concentration prediction based on SVR-AUKF and SVR-UKF with the inaccurate process noise.
Figure 3 shows the prediction results of the SVR-AUKF and SVR-UKF methods with incorrect noise statistics. When inaccurate noise statistics are given, both SVR-AUKF and SVR-UKF can predict PM2.5 concentration. However, SVR-UKF has no noise estimator, so once the noise changes, its prediction accuracy is affected. To examine this further, Figure 4 gives a boxplot of the prediction errors of the two methods.

Figure 4. Boxplot of the prediction residuals of SVR-AUKF and SVR-UKF with the inaccurate process noise.
Figure 4 shows that the SVR-AUKF method predicts PM2.5 concentration more accurately: its error distribution is more uniform, and the upper and lower limits of its error are smaller. SVR-UKF, which has no noise estimator, is more susceptible to the incorrect noise than SVR-AUKF, and its prediction accuracy is affected. As a further example, Table 2 gives another set of incorrect noises.
Figure 5 shows the prediction results of SVR-AUKF and SVR-UKF with this second set of inaccurate process noise. The comparison results are given as follows:

Figure 5. PM2.5 concentration prediction based on SVR-AUKF and SVR-UKF with the inaccurate process noise.
Figures 5 and 6 show that incorrect noise statistics have a great impact on SVR-UKF: its prediction error changes readily, and its prediction accuracy is affected when the noise changes. The SVR-AUKF method is not greatly affected, because the noise is estimated adaptively. When the noise of the SVR-AUKF method is set to 0.3, 0.05, 0.3 and 0.08, the MAE of SVR-AUKF is 2.7104, the MAPE is 0.9035 and the RMSE is 4.2086, whereas the MAE of SVR-UKF is 5.1892, the MAPE is 1.7297 and the RMSE is 8.0205. Therefore, the SVR-AUKF method can adaptively estimate the noise statistics and improve PM2.5 concentration prediction accuracy when the process noise is not correct, and it has better robustness.

Figure 6. Boxplot of the prediction residuals of SVR-AUKF and SVR-UKF with the inaccurate process noise.
To further show that the SVR-AUKF method has higher accuracy, five different methods are compared. When the noise (

Figure 7. The comparison results of PM2.5 concentration prediction based on five methods.

Figure 8. Boxplot of the prediction residuals based on five methods.
To further assess the prediction accuracy of the proposed method for PM2.5 concentration, the proposed method is compared with the other methods. The statistical errors of the different methods are given as follows:
Table 3 shows that the errors of the SVR-AUKF method are the smallest, so the SVR-AUKF method is superior to the other methods. The proposed SVR-AUKF method has not only higher PM2.5 concentration prediction accuracy but also higher robustness.
Table 3. The statistical error of different methods.
PM2.5 concentration prediction with inaccurate measurement noise
In this section, the initial value of process noise (
Table 4. The given process statistics and the inaccurate measurement noise.
Table 5. The given process statistics and the inaccurate measurement noise.
Figure 9 shows the prediction results of SVR-AUKF and SVR-UKF with the inaccurate measurement noise. The comparison results are given as follows:

Figure 9. PM2.5 concentration prediction based on SVR-AUKF and SVR-UKF with the inaccurate measurement noise.
Figures 9 and 10 show the prediction accuracy of the SVR-AUKF method with the inaccurate measurement noise. The prediction accuracy of the SVR-AUKF method is higher than that of the SVR-UKF method, for which the inaccurate noise readily produces large errors. When inaccurate measurement noise is given, the SVR-AUKF method still has higher prediction accuracy and is not affected by the noise, which is estimated adaptively. The proposed method is superior to SVR-UKF and has good robustness, which also demonstrates the effectiveness of the SVR-AUKF method. To show that the SVR-UKF method is easily affected because it has no noise estimator, another set of incorrect measurement noise is given as follows:

Figure 10. Boxplot of the prediction residuals of SVR-AUKF and SVR-UKF with the inaccurate measurement noise.
Figure 11 shows the prediction results of SVR-AUKF and SVR-UKF with the inaccurate measurement noise. The comparison results are given as follows:

Figure 11. The comparison results of SVR-AUKF and SVR-UKF with the inaccurate measurement noise.
Figures 11 and 12 show that the SVR-AUKF method is still more accurate. Figure 12 shows the prediction residuals of SVR-AUKF and SVR-UKF: the upper error limit of SVR-AUKF is smaller and its error distribution is more uniform. When another inaccurate measurement noise is given, the SVR-AUKF method still has higher prediction accuracy, because it is not affected by the noise, which is estimated adaptively. In order to better prove the prediction accuracy of the proposed method, when the noise (

Figure 12. Boxplot of the prediction residuals of SVR-AUKF and SVR-UKF with the inaccurate measurement noise.

Figure 13. The comparison results of PM2.5 concentration prediction based on five methods.
Figures 13 and 14 show that the SVR-AUKF method with its noise estimator is more accurate and robust in predicting PM2.5 concentration. The statistical error analysis is given in Table 6. The MAE of SVR-AUKF is 3.0418, the MAPE is 1.0139 and the RMSE is 5.0469. The MAE of the AR-Kalman method is 5.7451, the MAPE is 1.9150 and the RMSE is 8.2389. The MAE of the SVR-UKF method is 4.8615, the MAPE is 1.6205 and the RMSE is 7.6176. The error of the SVR-AUKF method is the smallest of the five methods. Because the SVR-UKF method lacks a noise estimator, it is easily affected by noise. The proposed SVR-AUKF method has higher prediction accuracy than the other methods, with smaller statistical error and better robustness. The statistical error is shown as follows:

Figure 14. Boxplot of the prediction residuals based on five methods.
Table 6. The statistical error of different methods.
Conclusions
In this paper, a hybrid modeling method based on a support vector regression state space and an adaptive unscented Kalman filter is proposed. Firstly, the support vector regression model is established and the state-space framework of the PM2.5 concentration series is described. Then, the adaptive unscented Kalman filter is used to update the state dynamically when the process and measurement noise are unknown or incorrect, and the noise is estimated in the iterative process, which improves the prediction accuracy of PM2.5 concentration. Finally, the proposed method is compared with the SVR-UKF method, which lacks the ability to estimate noise. The results show that the proposed method not only predicts the PM2.5 concentration accurately but also has higher robustness when the process and measurement noises are incorrect. The proposed method is also compared with the SVR-UKF, AR-Kalman, AR and BP methods, and the simulation results show that it has higher accuracy in predicting PM2.5 concentration.
However, the PM2.5 concentration always changes with the complex atmospheric environment, and a single model has difficulty following such a changing environment. Thus, PM2.5 concentration prediction based on multiple models may be considered and studied in the future.
