Abstract
Introduction
Novelty detection, or outlier/anomaly detection, techniques have been applied in many practical domains, such as fraud detection for credit cards, intrusion detection for cyber-security, and fault detection for industrial systems, to name but a few.1–3 However, developing dedicated novelty detectors for industrial control systems has rarely been considered, let alone for an electric arc furnace (EAF) control system. In recent years, many industrial systems (including EAF) have introduced data-driven techniques to facilitate modeling and control, as more and more process data can be collected and stored. Inspired by this, novelty detection is drawing increasing attention in industrial systems, because anomalous observations adversely affect both the modeling and the control processes of any data-driven technique. In the control system of an EAF, outliers refer to observations that do not reflect the normal system states. 4
Recently, several advanced control strategies such as adaptive control and model predictive control have been proposed for EAF systems in order to improve control performance and save energy.5–8 In these control methods, machine learning algorithms such as neural networks have been used to establish the process model of the EAF. It is well known that such data-driven models are often very sensitive to outliers in the training set, and the resultant control performance will deteriorate when outliers are included. (This may be the main reason why these advanced data-driven control strategies have not been extensively used in practical EAF control systems.) In this situation, implementing an efficient novelty detector in the EAF control system may be beneficial. It is noteworthy that such a novelty detector is subtly different from a process monitoring model: here, we are only interested in the controlled variables or the variables used by data-driven control strategies, whereas process monitoring focuses more on the state variables of the system.
Although few dedicated detectors have been proposed for EAF in the literature, many existing techniques from machine learning and data mining can be used here. According to the availability of supervision, these novelty detectors are often categorized into three groups: supervised detectors, unsupervised detectors, and semi-supervised detectors. Supervised detectors use labeled training data to learn conventional classifiers, such as support vector machines (SVM) and decision trees, that can separate normal observations from anomalous ones. A crucial drawback of this type of detector lies in the need for labeled training data, which is highly challenging for most practical applications including EAF, because labeling process data is a time- and labor-intensive task. In contrast, unsupervised detectors use similarity criteria such as distance and density to mine potential outliers in databases. These detectors are therefore usually used in an off-line manner, and the most representative methods are the distance-based detector and the local outlier factor (LOF).9,10 Semi-supervised detectors are also referred to as one-class (OC) classifiers or data description techniques. The basic idea of this type of detector is that a normal pattern can be learnt because all training samples are assumed to come from the target class. In this paper, we will use the term OC classifiers to denote semi-supervised detectors. The most representative OC classifier is support vector data description (SVDD), which aims to enclose all training data with a hypersphere whose volume is as small as possible. 11
Based on the mechanisms of these detectors, semi-supervised ones are more appropriate for novelty detection in an online manner. To improve on the performance of single detectors, several ensemble models have been proposed.12–14 By combining diverse base detectors, the final detection performance can be enhanced. A notable limitation of these ensemble models is that the base detectors must be accurate and diverse simultaneously, an assumption that can hardly be satisfied when no data labels are available. In this paper, we propose a dynamic selective model that uses SVDD as the base detector. Our detector can also be deemed an ensemble model, as several base learners are necessary. In contrast to the above ensemble detectors, where a fixed model structure is used for all test points, our detector selects the most competent base detector for each test point dynamically. To facilitate the selective procedure, we use a trick to generate artificial outliers. These outliers are also used to obtain optimal parameters for the SVDD algorithm in our detector; we have noted that this problem is often ignored by studies that use SVDD. In addition, a clustering technique is used in the selective process to determine the validation set for each test point.
Here, we conclude the contributions as follows:
A dynamic novelty detection model is proposed for EAF control system.
Artificial outlier examples are generated to determine validation sets and optimize parameters of SVDD.
Datasets from real-world EAF system are used to verify the effectiveness of our detection model.
The rest of the paper is organized as follows. Related works and preliminaries are presented in section “Related works and preliminaries.” The proposed method is introduced in section “Methodology,” followed by the experiments in section “Experiments and analysis.” Finally, conclusions are drawn in section “Conclusion.”
Related works and preliminaries
Several related works regarding novelty detection in EAF systems are introduced despite their scarcity. Some necessary preliminaries are then briefly presented.
Related work
In Liu et al., 4 a model-based novelty detection method is proposed for the process control system of EAF. In this method, an improved radial basis function (RBF) neural network is first used to establish the process model of the EAF. A hidden Markov model (HMM) is then used to analyze the residuals between the true measurements and the outputs of this process model. From our point of view, the main drawback of this method is that it can only be used for univariate data; for multivariate datasets, several such models may be necessary. It also depends heavily on the predictive model, and the detection performance deteriorates considerably when the predictions are biased. In Wang and Mao, 12 a clustering-based ensemble detector is proposed for the EAF control system. In this method, a clustering algorithm is first used to separate the training set into several subsets, in each of which a single detector is established. A test point is then labeled as an outlier if it is rejected by all base detectors. In Wang and Mao, 13 the Random Subspace (RS) technique is used to develop an ensemble detector. RS is first used to divide the feature space into several subspaces; all training points are then projected onto these subspaces to generate several training subsets, on which the corresponding base detectors are trained. A combination rule is then used to derive the ultimate result for each test point. As mentioned previously, a main drawback of these ensemble detectors is that generating accurate and diverse base detectors may be difficult in some situations.
In the fields of machine learning and data mining, novelty detection has long been an active topic, since outliers often indicate interesting data patterns. In general, existing novelty detection methods can be categorized into probabilistic, distance-based, reconstruction-based, and domain-based detection models. 15 Among probabilistic detection models, the Gaussian mixture model (GMM) is one of the most popular parametric methods, with HMM and the Kalman filter being two other commonly used parametric ones; kernel density estimation is the most popular non-parametric method. Among distance-based detection models, methods based on nearest neighbors and clustering techniques are often used in applications. Neural network (NN)-based models and principal component analysis (PCA) are two commonly used reconstruction-based detection models. Finally, SVDD and OC-SVM are two representative domain-based detectors.
Preliminaries
Basic concepts concerning SVDD and ensemble learning are briefly introduced.
SVDD
Algorithm SVDD defines a model by using a hypersphere to give a closed boundary around all observations in the training set. This hypersphere can be characterized by center $a$ and radius $R > 0$. Minimizing its volume while allowing for some slack leads to the primal problem

$$\min_{R,\,a,\,\xi} \; R^2 + C\sum_i \xi_i \quad \text{s.t.} \quad \|x_i - a\|^2 \le R^2 + \xi_i, \;\; \xi_i \ge 0,$$

where the parameter $C$ controls the trade-off between the volume of the hypersphere and the number of training errors, and the $\xi_i$ are slack variables.

By setting the partial derivatives of the Lagrangian to zero, the dual optimization problem becomes

$$\max_{\alpha} \; \sum_i \alpha_i (x_i \cdot x_i) - \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i = 1.$$

The solution of this dual optimization problem is a set of values of $\alpha_i$, from which the center can be expressed as

$$a = \sum_i \alpha_i x_i,$$

where only the training points with $\alpha_i > 0$ (the support vectors) contribute.

For any test point $z$, its distance to the center is computed as

$$\|z - a\|^2 = (z \cdot z) - 2\sum_i \alpha_i (z \cdot x_i) + \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j).$$

If $\|z - a\|^2 \le R^2$, the test point is accepted as a target (normal) observation; otherwise it is rejected as an outlier. In practice, the inner products are replaced by a kernel function $K(x_i, x_j)$, typically the Gaussian kernel, to obtain more flexible boundaries.
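As a minimal illustration of this decision rule, the sketch below trains a kernelized one-class boundary and classifies two test points. scikit-learn does not ship SVDD directly, so we use its `OneClassSVM`, which with a Gaussian kernel is known to yield a decision boundary equivalent to SVDD's; the data and the `gamma`/`nu` values are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, size=(200, 6))  # 6 features, like the EAF current/voltage data

# nu plays a role analogous to the SVDD trade-off parameter C
# (an upper bound on the fraction of rejected training points)
det = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(X_train)

z_normal = np.zeros((1, 6))        # at the center of the target data
z_outlier = np.full((1, 6), 8.0)   # far outside the hypersphere
print(det.predict(z_normal))   # +1: accepted as target
print(det.predict(z_outlier))  # -1: rejected as outlier
```

`decision_function` gives a continuous score (signed distance to the boundary) that later serves as the raw outlier score to be normalized.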
Ensemble of OC classifiers
Designing parallel ensembles of OC classifiers is easier than designing sequential ones, because techniques used in conventional classification problems can be applied directly. The rationale of a parallel ensemble lies in reducing variance by inducing diverse base detectors (OC classifiers) and aggregating them. Several strategies can be used to enhance this diversity. The most commonly used technique is Bagging, which uses bootstrap sampling; by fusing individuals learnt on different training subsets, Bagging is expected to obtain more robust results. Another well-known strategy for enhancing diversity is to use different feature subsets, with RS and feature bagging (FB) being the most commonly used techniques of this type. Note that subspace-based outlier ensembles are particularly effective on high-dimensional datasets, since outliers can easily be masked there yet may be exposed in certain subspaces. Apart from these two strategies, using different model parameters (or initializations) and even different algorithms has also been proposed to enhance the diversity of outlier ensembles.
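A minimal sketch of the Bagging strategy just described: each base OC classifier is trained on a bootstrap sample, and the ensemble votes on each test point. `OneClassSVM` stands in for a generic OC classifier; the ensemble size, `nu`, and the majority-vote fusion rule are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def bagging_oc_ensemble(X, n_estimators=10, seed=0):
    """Train OC classifiers on bootstrap samples of X (a parallel ensemble)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample with replacement
        models.append(OneClassSVM(kernel="rbf", nu=0.05).fit(X[idx]))
    return models

def majority_vote(models, Z):
    """Flag a point as an outlier (-1) if most base detectors reject it."""
    votes = np.mean([m.predict(Z) for m in models], axis=0)
    return np.where(votes < 0, -1, 1)

X = np.random.default_rng(1).normal(size=(300, 6))
models = bagging_oc_ensemble(X)
print(majority_vote(models, np.zeros((1, 6))))  # [1]: the center point is accepted
```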
EAF
The EAF is a highly energy-intensive process used to convert scrap metal into molten steel. EAFs range in capacity from a few tons to as many as 400 tons. Figure 1 gives a simple description of the EAF operation. The graphite electrodes connected to the electrical supply convert electrical energy into thermal energy via the electric arcs between the electrodes and the scrap surface. In addition, natural gas and oxygen are injected into the furnace so that the released chemical energy is also converted into thermal energy. The scrap keeps melting by absorbing this thermal energy. As parts of the scrap melt and are removed, the scrap surface becomes irregular, changing its contour and causing disturbances in the arc length. The electrode regulation system responds to these disturbances by adjusting the distance from the electrodes to the scrap surface so that the optimal arc length is maintained. When sufficient space becomes available inside the furnace, another scrap charge is added. The melting process then proceeds until a flat bath of molten steel is formed at the end of the batch.

Electric arc furnace operations.
Methodology
Base detectors
As discussed previously, our detection model can be deemed an ensemble model, so the generation of base detectors is indispensable. In order to train diverse and accurate SVDD models in our ensemble, a subspace-based ensemble technique called FB 17 is used. It has been observed that subspace-based ensemble techniques are often more effective than subsampling techniques in unsupervised learning, even when the dimension of the given data is not very high. 18 The basic steps of FB can be described as follows:
Sample an integer $r$ uniformly from the range between $\lfloor d/2 \rfloor$ and $d-1$, where $d$ is the dimension of the data;
Select $r$ features at random, without replacement, to form a subspace;
Use the base detector on the projected representation of the training set;
Repeat the above steps until $m$ base detectors have been obtained.
The rationale is to sample an integer number of features at random in each round, so that the base detectors are trained on different subspaces: this enhances their diversity while each subspace still retains at least half of the original features.
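The steps above can be sketched as follows. `OneClassSVM` again stands in for SVDD, and the subspace-size range mirrors the common FB choice of between $\lfloor d/2 \rfloor$ and $d-1$ features; the ensemble size and `nu` are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def feature_bagging_svdd(X, n_estimators=10, seed=0):
    """Train one OC classifier per random feature subspace (the FB steps)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    ensemble = []
    for _ in range(n_estimators):
        r = rng.integers(d // 2, d)                   # subspace size in [d/2, d-1]
        feats = rng.choice(d, size=r, replace=False)  # features without replacement
        model = OneClassSVM(kernel="rbf", nu=0.05).fit(X[:, feats])
        ensemble.append((feats, model))               # remember the projection
    return ensemble

X = np.random.default_rng(1).normal(size=(300, 6))
ensemble = feature_bagging_svdd(X)
# Scoring a test point requires projecting it onto each stored subspace first
scores = [m.decision_function(np.zeros((1, len(f))))[0] for f, m in ensemble]
print(len(ensemble))  # 10 base detectors
```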
Artificial outliers
Before introducing the generation of artificial outliers, we first give a brief description of dynamic ensembles, so that the purpose of the artificial outliers can be explained. The general training and test process of dynamic ensemble learning is illustrated in Figure 2.

General description of dynamic ensemble learning.
Once the base detectors have been trained, a selective procedure is implemented for each test point. As Figure 2 shows, a validation set is necessary to complete the selection. The objective of this validation set is to provide reference examples with which the competence of each base detector can be assessed with respect to the test point. The most competent base detector(s) are then selected according to the result of these competence calculations.
From the above description of dynamic ensembles, we can see that the role of the validation set is critical, as it is the premise of the competence calculation. However, in novelty detection no labeled training examples are available to constitute this validation set, so we have to generate artificial ones instead. A simple strategy for generating artificial outliers is to sample examples from a bounded uniform distribution. Another strategy is to assume that outlier examples are located in sparse regions of the target domain, that is, regions where the target data are either absent or isolated from the rest of the data. 19 Fan et al. 20 propose to generate outliers close to the target data by constraining the learning algorithm to form an accurate boundary between known classes and anomalies: the value of one feature of a target point is changed randomly while leaving the other features unchanged. A major drawback of these methods is the impossibility of generating a sufficient number of outlier examples in high-dimensional settings due to the curse of dimensionality. To this end, Tax and Duin 21 propose to generate a uniform hyper-spherical outlier distribution that fits more tightly around the target class than a hyper-box distribution. However, this approach is mainly used to optimize the parameters of OC classifiers, and the resultant outlier class heavily overlaps the target class; consequently, it is not appropriate for providing reference validation sets in dynamic ensembles.
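As a minimal sketch of Tax and Duin's hyper-spherical strategy, the following samples artificial outliers uniformly from a hypersphere enclosing the target data. The `margin` factor and the synthetic data are hypothetical choices; the $u^{1/d}$ radius scaling is the standard trick for uniform sampling inside a ball.

```python
import numpy as np

def sample_hypersphere_outliers(X, n_samples, margin=1.1, seed=0):
    """Draw artificial outliers uniformly from a hypersphere enclosing X."""
    rng = np.random.default_rng(seed)
    center = X.mean(axis=0)
    radius = margin * np.max(np.linalg.norm(X - center, axis=1))
    d = X.shape[1]
    # Uniform direction: normalized Gaussian vectors
    dirs = rng.normal(size=(n_samples, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Uniform radius inside a d-ball needs u**(1/d) scaling
    radii = radius * rng.random(n_samples) ** (1.0 / d)
    return center + dirs * radii[:, None]

X = np.random.default_rng(1).normal(size=(200, 6))
Z = sample_hypersphere_outliers(X, 50)
print(Z.shape)  # (50, 6)
```

Note that, as the text points out, such samples cover the target region as well, which is why they suit parameter optimization better than validation-set construction.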
In Désir et al., 22 randomization principles from ensemble learning are used to subsample the number of features and the number of target training instances, generating artificial outliers efficiently from a computational point of view. The random subspace method (RSM) and Bagging are used to subsample the features and the training set, respectively, so that the number of required outliers is much smaller than in the original version. Furthermore, sparsity information extracted from the original training set is used to make the artificial outlier class complementary to the target class. Experimental results on several benchmark datasets have shown the superiority of this method. Inspired by this, we also use such a strategy to generate outliers in this paper, with some adjustments: RSM and Bagging are not used to sample the features and the training set; only FB is used instead, since it is also used to train the base detectors.
Score normalization
Before the selective procedure, a normalization procedure for all base detectors is necessary in order to achieve an unbiased selection. It has been shown that even when the same base detection method and identical parameterization are used, outlier scores obtained from different subspaces can vary considerably if the subspaces have largely different scales. 23
For techniques concerning outlier score normalization, those converting the outlier scores of different base detectors into probability estimates are the most widely acknowledged. As claimed by Gao and Tan, 24 there are many advantages to transforming outlier scores into well-calibrated probability estimates; a dominant one is that probability estimates are more appropriate for developing an ensemble outlier detection framework. Sigmoid functions and mixture modeling are accordingly used to fit outlier scores into probability values in that study. Kriegel et al. 25 provide a more general framework of outlier score normalization. Its fundamental motivation is to establish sufficient contrast between outlier scores and inlier scores so that outliers can easily be separated from inliers. This seems more practical than merely interpreting outlier scores, because in some applications we actually need to pick out the outliers. However, a problem arises if we use these normalization methods directly. The methods in Gao and Tan 24 and Kriegel et al. 25 are designed for mining outliers in a given database: the normalization or scaling procedures are computed with samples only from that database. When applying them to unseen test samples, the probability of an observation being an outlier may fall out of the range of
To address this problem, we first make some adjustments to the outputs of the base detectors before converting them to probabilistic estimates. Generally, this means normalizing them to the range [0, 1] in a way that also holds for unseen test samples.
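A minimal sketch of this adjustment, assuming simple min–max scaling: the scaling bounds are fitted on the training scores, and unseen test scores are clipped so the resulting value always stays in [0, 1]. The scores here are made-up numbers.

```python
import numpy as np

def fit_minmax(scores):
    """Store min/max of the training scores so test scores can be clipped later."""
    return scores.min(), scores.max()

def to_probability(scores, lo, hi):
    """Map raw outlier scores to [0, 1]; clipping keeps unseen scores in range."""
    scaled = (scores - lo) / (hi - lo)
    return np.clip(scaled, 0.0, 1.0)

train_scores = np.array([0.2, 0.5, 0.9, 1.4])
lo, hi = fit_minmax(train_scores)
# A test score of 2.0 lies outside the training range but is clipped to 1.0
print(to_probability(np.array([0.5, 2.0]), lo, hi))  # [0.25 1.  ]
```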
Dynamic selection
Validation set
In general, the K-nearest neighbor (KNN) algorithm is used to determine the validation set of a test point in dynamic ensembles. However, finding the K nearest neighbors requires calculating the distances to all data points in the training set, and this computational cost can be too expensive for online applications. With a clustering-based method, by contrast, only the cluster that the test point belongs to must be determined, provided that all data points have been divided into clusters at the training phase. We therefore prefer to use a clustering algorithm to determine the validation set for each test point.
Here, we choose three representative clustering algorithms as candidates and make some quantitative comparisons: the classical K-means clustering, GMM, and a density-based algorithm named DBSCAN (density-based spatial clustering of applications with noise). K-means is probably the most popular clustering algorithm due to its simple theory and implementation, 29 but its drawbacks lie in its sensitivity to initial values, noise, and outliers; correspondingly, several improved versions have been proposed. GMMs are among the most statistically mature methods for clustering: each cluster is represented by a Gaussian distribution, and the clustering process turns into estimating the parameters of the Gaussian mixture, usually by the Expectation-Maximization algorithm. 30 Its probabilistic output is an advantage, allowing GMM clustering to be combined with other statistical learning models more smoothly and naturally; its drawbacks lie in its distributional assumption, which also places higher demands on sample size and representativeness. The largest advantage of DBSCAN is its ability to discover clusters of arbitrary shape. 31 It also requires fewer input parameters and does not need the number of clusters in advance, but it becomes unstable when detecting border objects of adjacent clusters.
The quantitative criterion we use is the Calinski–Harabasz (CH) index. 32 For a clustering of $n$ points into $k$ clusters, this quantity is defined as

$$\mathrm{CH} = \frac{\operatorname{tr}(B_k)/(k-1)}{\operatorname{tr}(W_k)/(n-k)},$$

where $B_k$ and $W_k$ are the between-cluster and within-cluster dispersion matrices,

$$B_k = \sum_{q=1}^{k} n_q (c_q - c)(c_q - c)^{T}, \qquad W_k = \sum_{q=1}^{k} \sum_{x \in C_q} (x - c_q)(x - c_q)^{T},$$

where $C_q$ is the set of points in cluster $q$, $c_q$ its centroid, $n_q$ its size, and $c$ the centroid of all the data. A larger CH value indicates more compact and better-separated clusters.
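The CH index is available in scikit-learn as `calinski_harabasz_score`. The toy comparison below, on synthetic blobs rather than EAF data, selects the cluster count that maximizes it; K-means is used as an example clusterer.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(0)
# Three well-separated blobs in 6-D
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 6)) for c in (0.0, 3.0, 6.0)])

scores = {}
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = calinski_harabasz_score(X, labels)  # higher is better

best_k = max(scores, key=scores.get)
print(best_k)  # 3, the true number of blobs
```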
When classifying a test point, we should decide its belonging cluster first. Then all data points in that cluster constitute the validation set with respect to this test point.
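A minimal sketch of this clustering-based lookup, using K-means as an illustrative choice among the candidates above (the data and cluster count are synthetic assumptions): at test time only one cluster assignment is computed, and that cluster's members form the validation set.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 6))

# Clustering is done once, at the training phase
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)

def validation_set(z, km, X):
    """Return the training points in the cluster that test point z falls into."""
    cluster = km.predict(z.reshape(1, -1))[0]  # one assignment, not N distances
    return X[km.labels_ == cluster]

V = validation_set(np.zeros(6), km, X_train)
print(V.shape[1])  # 6 features; the size of V depends on the clustering
```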
Competence calculation
Ideally, we would select the most competent base detector by computing the competences through the validation sets. With the artificial outliers in the training set, we can employ selection mechanisms proposed for traditional classification problems. Rather than estimating classifier accuracy simply as the percentage of correctly classified samples, here we use a probabilistic measure to select the most competent classifier.
Let
where
where
where
As outputs of base classifiers have been transformed into posterior probability estimates, we can exploit this information to measure classifier competence
where
The term
The term
Then a weight is also assigned to each neighbor pattern to reduce the uncertainty triggered by the neighbor size. Finally, competence of classifier
We summarize this procedure in Algorithm 1.
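The exact probabilistic, neighbor-weighted competence measure follows the derivation above; as a simplified stand-in, the sketch below scores each detector by the mean probability it assigns to the correct class on the validation set (target points labeled +1, artificial outliers −1) and selects the argmax. All names and numbers are illustrative assumptions, not the paper's formulas.

```python
import numpy as np

def competence(model_probs, labels):
    """Mean probability assigned to the correct class over the validation set.

    model_probs: array (n_val,) with P(target) for each validation point;
    labels: +1 for target points, -1 for artificial outliers.
    """
    p_correct = np.where(labels == 1, model_probs, 1.0 - model_probs)
    return p_correct.mean()

def select_detector(all_probs, labels):
    """Pick the index of the base detector with the highest competence."""
    comps = [competence(p, labels) for p in all_probs]
    return int(np.argmax(comps))

labels = np.array([1, 1, -1, -1])
probs_a = np.array([0.9, 0.8, 0.2, 0.1])   # confident, mostly correct detector
probs_b = np.array([0.6, 0.5, 0.5, 0.6])   # near-chance detector
print(select_detector([probs_a, probs_b], labels))  # 0
```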
Optimization of SVDD
In algorithm SVDD, two parameters need to be optimized: the trade-off parameter $C$ and the width of the Gaussian kernel.
where
where
where
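A hedged sketch of how artificial outliers can drive this parameter selection: a grid search that balances target acceptance against artificial-outlier rejection. `OneClassSVM` (with `nu` and `gamma` as the two tunable parameters) stands in for SVDD, and the grids and the uniform outlier box are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def tune_ocsvm(X_target, X_artificial, nus, gammas):
    """Grid-search OC-SVM parameters using artificial outliers as negatives."""
    best, best_acc = None, -1.0
    for nu in nus:
        for gamma in gammas:
            model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_target)
            acc_t = np.mean(model.predict(X_target) == 1)       # target acceptance
            acc_o = np.mean(model.predict(X_artificial) == -1)  # outlier rejection
            acc = 0.5 * (acc_t + acc_o)                         # balanced accuracy
            if acc > best_acc:
                best, best_acc = (nu, gamma), acc
    return best, best_acc

rng = np.random.default_rng(0)
X_target = rng.normal(0, 1, size=(200, 6))
X_art = rng.uniform(-6, 6, size=(200, 6))  # crude box-sampled artificial outliers
params, acc = tune_ocsvm(X_target, X_art, [0.01, 0.05, 0.1], [0.1, 0.5, 1.0])
print(params, acc)
```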
Experiments and analysis
Datasets
In EAF control systems, the three secondary currents and three secondary voltages are often used by data-driven control strategies; for example, in Li and Mao, 6 these six variables are used to identify the process model of the EAF. In this paper, we also use these variables to constitute the training and test sets. In total, we use six datasets: three synthetic ones generated by a simulation model and three real-world ones. A simple description of these datasets is given in Table 1.
Description of datasets.
The first three datasets are generated using the simulation model, with different faults simulated in different datasets. The last three datasets are collected from real-world EAF control systems. In each dataset, 70% of the normal data are randomly selected to constitute the training set, and all remaining samples constitute the test set. This process is repeated 10 times, and the average values are used as the final results.
Competitors and metrics
In order to put the experimental results of our method into context, we compare it with several competitors:
Random subspace SVDD (RS-SVDD) proposed in Wang and Mao. 13 In this detection model, technique RSM is used to develop a parallel ensemble model.
Bagging SVDD (BA-SVDD) proposed in Ge and Song. 33 In this detection model, technique Bagging is used to develop a parallel ensemble model.
Clustering-based SVDD (C-SVDD) proposed in Wang and Mao. 12 In this detection model, clustering technique is used to develop a parallel ensemble model.
Here, we refer to our method as Dynamic selection SVDD (DS-SVDD) as it is a dynamic selective SVDD model.
In this paper, we use three metrics, that is,
Confusion matrix of two-class classification problem.
Then we can formulate
This metric evaluates the degree of inductive bias in terms of a ratio of positive accuracy and negative accuracy.
where
The ROC curve describes the trade-off between the true-positive rate and the false-positive rate. (Note that normal data are regarded as positive in this paper, so the true-positive rate indicates the rate of correctly detected normal data.) It thus evaluates general performance rather than performance at only one operating point. In practice, the area under the ROC curve (AUC) is used, since directly comparing the ROC curves of different detectors is difficult. For a novelty detection task, the AUC of a perfect algorithm equals 1, implying that all outliers are identified while no normal data are misclassified. Algorithms with AUC values smaller than 0.5 are deemed invalid, since random guessing obtains an AUC of 0.5. Here, we employ the method in Huang and Ling 34 to calculate the AUC.
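For reference, AUC can also be computed directly from labels and scores with scikit-learn. The tiny example below uses made-up scores in which every outlier is ranked below every normal point, so the AUC is exactly 1, matching the perfect-detector case described above.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# 1 = normal (positive in this paper), 0 = outlier; scores = estimated P(normal)
y_true = np.array([1, 1, 1, 0, 0])
scores = np.array([0.9, 0.8, 0.6, 0.4, 0.2])
print(roc_auc_score(y_true, scores))  # 1.0: all outliers rank below all normals
```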
Result and analysis
Results on all six datasets with respect to the three metrics are shown in Tables 3–5, respectively. Apart from the per-dataset values, we also provide the averages over all datasets to give insight into the general performance. The comparison of these average values is visualized in Figure 3, from which we can see that our method (DS-SVDD) achieves the best overall result on all three metrics. We then compare DS-SVDD with its competitors one by one:
Comparative result with respect to
SVDD: support vector data description.
The best result is in bold.
Comparative result with respect to
SVDD: support vector data description.
The best result is in bold.
Comparative result with respect to
AUC: area under the receiver operating characteristic curve; SVDD: support vector data description.
The best result is in bold.

Comparison result of the averaging values on three metrics.
Conclusion
To facilitate the development of advanced data-driven control strategies in EAF systems, this paper proposes a dedicated novelty detection model based on dynamic ensemble learning. In this detection model, SVDD plays the role of base detector. Artificial outliers are generated with two objectives: to complete the dynamic selection and to optimize the two parameters of SVDD. A clustering technique is then used to determine the validation set for each test point, and a probabilistic method is used to compute the competence of the base detectors. To validate the proposed detection model, we compare it with several competitors on three synthetic and three real-world datasets; the comparative results show the superiority of our method.
However, several issues regarding our method remain open. For example, the procedure for generating artificial outliers may not be appropriate in some situations, and when the training set contains unknown outliers, the robustness of our method may be poor. These problems have not been considered in this paper and will be our future research directions.
