Abstract

The presence of outliers is the main reason for the ineffectiveness of advanced data-driven control methods in electric arc furnace systems. This paper proposes a hybrid method dedicated to detecting outliers in electric arc furnace systems, where process data are unlabeled, imbalanced, non-stationary and noisy. First, the raw data are divided into a certain number of clusters. Then, a one-class classifier is trained on each cluster. With these trained sub-models, new test points can be examined, and points that are rejected by all sub-models are labeled as outliers. By combining one-class classification with clustering, the intricate data of an electric arc furnace can be processed effectively. In addition, the detector is updated with a specific strategy to enhance its adaptiveness. A series of experiments is carried out, and comparative results show the effectiveness of our method.

Keywords

Clustering, electric arc furnace, hybrid method, one-class classification, outlier detection

I. Introduction

The electric arc furnace (EAF) is widely used in many countries for refining quality steel for industry. In steelmaking companies, the number of EAFs is increasing rapidly, since they are suitable devices for melting scrap and direct reduced iron for steel production. An EAF is generally among the highest consumers of electrical energy in the power grid. The rising cost of energy has put pressure on the steel industry to improve its process control systems so as to conserve energy without sacrificing quality or equipment. This pressure is accentuated by the adverse effects of EAFs on the power quality of the power system that feeds them. Since an EAF is a non-stationary electric load, it can cause voltage fluctuation or flicker. It also produces current harmonics due to its highly nonlinear behavior. The unbalance in the meltdown phase is another adverse effect of such loads in a power system.

The graphite electrodes, which are connected to the electrical supply, convert electrical energy into thermal energy through the electric arcs between the electrodes and the metal scrap. To achieve the best thermal efficiency, a constant arc current is needed. However, the arc current depends on the arc length, and the arc length changes as the material melts and the scrap surface varies. Therefore, the arc length should be held at a particular value. To this end, the displacements of the graphite electrodes are controlled via an electrode regulator system (ERS). In the literature on control strategies for EAF systems, adaptive control and predictive control are the most prominent. In addition, different sets of state variables have been considered by these control strategies in order to reach higher control performance. Billings et al.¹ proposed a temperature-weighting adaptive controller, where the ambient arc temperature was used as an additional condition parameter to weight the feedback error. Mao and Li² proposed to use the energy applied to the furnace to approximate the arc changes. This design sidesteps the problem of continuous temperature measurement, which is hard to achieve in practice. An adaptive feedback controller was then designed based on this idea. Using Lyapunov design, Parsapoor et al.³ also proposed an adaptive control method for the ERS. Srdic and Nedeljkovic⁴ proposed a fast and robust predictive controller to reduce the flicker caused mainly by reactive-power variations during electrode short-circuits. In their method, the arc current could be controlled without relying on an accurate arc model. Bekker et al.⁵ exploited the off-gas system of an EAF to provide valuable manipulated variables for feedback control. A model predictive controller was then developed based on manipulated variables such as the forced-draught fan power and the air-entrainment slip-gap width of the off-gas system to control the relative pressure inside the furnace, and the temperature and composition of the gas that exits the cooling duct. Rashid et al.⁶ proposed a two-tiered economic model predictive control algorithm and implemented it for EAF process control. The key idea of their algorithm is to use a tiered economic model predictive control framework to achieve an acceptable end-point while optimizing an economic cost function. Khoshkhoo et al.⁷ proposed to control the displacement of the graphite electrodes based on an estimate of the instantaneous value of the flexible cable inductance. Li and Mao⁸ proposed an adaptive neural network controller for the ERS. In their method, the weights of the neural networks could be directly updated online based on the input–output measurements.

Summarizing these control methods, we find that, in order to achieve higher control performance, they all construct a data-driven model via specific intelligent algorithms such as neural networks to simulate the complex input–output relationship of EAFs. This has become the trend in the control of EAFs, since more and more measurements of state variables are becoming available thanks to advances in sensor technology. By combining the outputs of this data-driven model with other methodologies, the optimal control law can be calculated. Based on empirical results reported in the literature, these model-based control methods usually outperform traditional ones such as proportional–integral–derivative (PID) control. However, from the perspective of engineering practice, the quality of the measurements of state variables has always been the main obstacle to these model-based control strategies when they are implemented in real EAFs. In other words, if the measurements used for constructing the data-driven models are problematic, then the outputs of the model will deviate from the real outputs of the system. Consequently, the calculated control law can hardly achieve the desired performance.

Inspired by this practical problem, we developed a new direction, namely outlier detection dedicated to data-based control systems for EAFs. This outlier detection phase should be applied to the measurements before they enter the data-driven model. Note that we concentrate only on methods for detecting real-time outliers in the process data of EAFs. The design of control methods for EAFs is beyond the scope of this paper. We expect our outlier detection method to enhance the engineering value of such model-based control methods. To our knowledge, outlier detection methods dedicated to control methods for EAFs are extremely limited, and the bulk of outlier detection methods for control systems in the literature are tailored to the fault detection phase of a fault detection and identification (FDI) system. A highly relevant study is the one by Liu et al.,⁹ where the authors proposed an improved radial basis function network to construct the model of the controlled plant and an Auto-Regression Hidden Markov Model to detect outliers according to the generated deviations. According to the reported experimental results, their method achieved prominent detection performance. Nevertheless, the time series datasets used in their experiments are all one-dimensional, while most practical measurements of state variables used for control in EAFs are high-dimensional. As for methods dedicated to FDI systems, many techniques have been developed for various industrial systems. Jia et al.¹⁰ proposed an improved principal component analysis that was applied to the output of the extended system to develop a statistical model, which enables process monitoring and fault diagnosis. Based on the sudden reduction of current harmonics under fault conditions, Dehghan Marvasti et al.¹¹ used the primary side data of the EAF transformer to detect faults on the secondary side. To overcome the failure of the second harmonic filter in the static VAR compensator, Park et al.¹² first identified the causes via various measurements. They then suggested three solutions to suppress the transformer inrush current and another three solutions to avoid the parallel resonance. Several fault diagnosis methods for blast furnaces may also be applicable to EAFs. However, the requirements and conditions for outlier detection in control systems are essentially distinct from those in FDI systems. Therefore, a new detection method must be designed expressly.

In this paper, we propose a hybrid method dedicated to detecting outliers in EAF systems. The notions of one-class classification (OCC) and clustering are combined to cope with the intricate data. Specifically, we exploit the intrinsic data structure with a clustering algorithm and construct several sub-models defined by one-class classifiers. Compared with the traditional OCC model, our method is more robust to noise and outliers in the training set and to the nonstationarity of the data stream. The experimental results verify the effectiveness of the proposed method in comparison with its competitors.

The rest of this paper is structured as follows. The main challenges of detecting outliers in EAF systems are summarized in Section II. Section III expands upon the proposed method. A series of experiments is carried out in Section IV. Finally, some conclusions are drawn in Section V.

II. Main Challenges

In this section, we analyze the main challenges of detecting outliers for the control systems of EAFs. Based on the characteristics of the measurements of state variables and the practical requirements of EAFs, we summarize the following points.

The first and foremost challenge is that the measurements in the training set are all unlabeled. In the adaptive and predictive control algorithms designed for EAF systems, the measurements used for constructing data-driven models are all collected in real time, and no prior knowledge can be applied to these online data. Thus, we can only use unlabeled measurements to train the detection model. This is the greatest obstacle for traditional detection methods.

In addition to the issue of unlabeled data, data imbalance is another prominent feature of data in EAF systems. Measurements of state variables under normal working conditions are cheap and easy to obtain, whereas outlier measurements are expensive and generally very rare. This phenomenon is usually referred to as data imbalance in the data mining domain. A classifier affected by the class imbalance problem on a specific dataset would show strong overall accuracy but very poor performance on the minority class.

Nonstationarity and noise are features of most industrial datasets. Since most contemporary industrial systems are extremely complicated, noise can easily enter the measurements. Furthermore, a change of operating point usually destroys the stationarity of the previous data stream. Consequently, detection methods should be robust to noise and should also capture the dynamic trend of the online data.

In addition to those features of measurements of state variables, the requirement of real-time detection indicates that the detection phase of the method should be fast enough.

III. The Proposed Method

Based on the above analysis and the insufficiency of existing methods, we propose a novel outlier detection method specifically devised for control methods of EAF systems. The whole process is illustrated in Figure 1, which shows three phases, namely online training, online detection, and model updating. In the following subsections, we expand upon the methodologies of these three phases.

Figure 1.

The whole process of the proposed outlier detection method.

A. Online training

This phase is extremely significant for the whole detection method since its performance determines the ultimate detection result directly. Our proposed training algorithm is totally based on the characteristics of measurements of state variables discussed in Section II.

First, our method can be categorized into the classification-based techniques. As we have analyzed, online detection requires the detection phase of whole method to be fast enough. A detection method with high computational complexity in testing phase will still be useless no matter how high its detection performance. Chandola et al.¹³ have concluded that a big advantage of classification-based techniques is that the testing phase is fast since each test instance only needs to be compared against the precomputed model. Meanwhile, they also concluded that multi-class classification-based techniques rely on the availability of accurate labels for various normal classes, which is often not possible. Therefore, we propose to construct a one-class classifier to describe the measurements of state variables in EAF. Thus, both the problem of unlabeled data and imbalanced data can be addressed simultaneously.

OCC is among the most difficult, but also most promising, areas of contemporary machine learning. It works under the assumption that, during the training phase, only objects originating from a single class are at our disposal. This may be caused by cost constraints, by difficulties or ethical implications of collecting some samples, or simply by a complete inability to access or generate such objects. Among the one-class classifiers in the literature, boundary-based approaches such as support vector data description (SVDD)¹⁴ have been shown to have better generalization ability. In this paper, we propose to use an improved SVDD called weighted SVDD (w-SVDD), which is inspired by the method proposed in Bicego and Figueiredo.¹⁵ Note that w-SVDD can be more robust to noise or outliers in the training set than the original SVDD. The boundary of w-SVDD should be tighter than that of SVDD and can describe the target class more exactly. Such a feature is significant for EAF systems, where noise or outliers in training sets are extremely common.

Assume $X = \{x_{1}, \dots, x_{n}\}$ are n d-dimensional training points. In the original SVDD formulation, the aim is to find the smallest sphere containing as many training points as possible, with some relaxation given by slack variables. This aim is formulated as a constrained convex optimization problem

$\begin{array}{l} \min F = R^{2} + C \sum_{i = 1}^{n} ξ_{i} \\ \text{s.t. } {‖ Φ (x_{i}) - o ‖}^{2} \leq R^{2} + ξ_{i}, \quad i = 1, …, n \\ ξ_{i} \geq 0, \quad i = 1, …, n \end{array}$ (1)

where R and o are the radius and center of the sphere, $Φ (x_{i})$ denotes a mapping that converts the original input space into a high-dimensional feature space, $ξ_{i}$ are the slack variables, and C gives the tradeoff between the volume of the description and the errors.

The constraints are incorporated into the objective function by using Lagrange multipliers

$\begin{matrix} L (R, o, α_{i}, γ_{i}, ξ_{i}) = R^{2} + C \sum_{i = 1}^{n} ξ_{i} \\ - \sum_{i = 1}^{n} α_{i} (R^{2} + ξ_{i} - {‖ Φ (x_{i}) - o ‖}^{2}) \\ - \sum_{i = 1}^{n} γ_{i} ξ_{i} \end{matrix}$ (2)

with the Lagrange multipliers $α_{i} \geq 0$ and $γ_{i} \geq 0$. L should be minimized with respect to R, o, and $ξ_{i}$ and maximized with respect to $α_{i}$ and $γ_{i}$. Setting the partial derivatives to zero gives the constraints

$\begin{array}{l} \frac{\partial L}{\partial R} = 0 \to \sum_{i = 1}^{n} α_{i} = 1 \\ \frac{\partial L}{\partial o} = 0 \to o = \sum_{i = 1}^{n} α_{i} Φ (x_{i}) \\ \frac{\partial L}{\partial ξ_{i}} = 0 \to C - α_{i} - γ_{i} = 0 \end{array}$ (3)

Then the Wolfe dual of this problem is

$\begin{array}{l} \min \sum_{i = 1}^{n} \sum_{j = 1}^{n} α_{i} α_{j} K (x_{i}, x_{j}) - \sum_{i = 1}^{n} α_{i} K (x_{i}, x_{i}) \\ \text{s.t. } 0 \leq α_{i} \leq C, \quad i = 1, …, n \\ \sum_{i = 1}^{n} α_{i} = 1 \end{array}$ (4)

where $K (x_{i}, x_{j}) = \langle Φ (x_{i}), Φ (x_{j}) \rangle$ is the kernel function. The Gaussian kernel is the usual choice for SVDD, and we also use it here: $K (x_{i}, x_{j}) = \exp (- \frac{{‖ x_{i} - x_{j} ‖}^{2}}{s^{2}})$. $\{α_{1}, \dots, α_{n}\}$ are the Lagrange multipliers corresponding to the n training points.

Provided that the solution of this problem is $[α_{1}^{*}, \dots, α_{n}^{*}]$, then according to Equation (3) we can calculate the center as

$o = \sum_{i = 1}^{n} α_{i}^{*} Φ (x_{i})$ (5)

Note that in Equation (5), only samples with $α_{i}^{*} > 0$ contribute to the center. These samples are called support vectors (SVs). Furthermore, samples with $0 < α_{i}^{*} < C$ lie on the boundary of the hypersphere, and those with $α_{i}^{*} = C$ fall outside the description. Therefore, the SVs lying on the boundary can be used to calculate the radius of the hypersphere

$R^{2} = \frac{1}{| S V_{< C} |} \sum_{x_{i} \in S V_{< C}} {‖ Φ (x_{i}) - o ‖}^{2}$ (6)

where $S V_{< C}$ is the set of SVs with $α_{i}^{*} < C$.
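As a worked step that the derivation leaves implicit, the squared distance in the feature space can be evaluated from kernel values alone by substituting Equation (5) into the norm. The expansion below is the standard one for kernel methods; $z$ is a symbol introduced here for an arbitrary point, and the expression is left unnumbered because it is not part of the original derivation.

$\begin{array}{l} {‖ Φ (z) - o ‖}^{2} = K (z, z) - 2 \sum_{i = 1}^{n} α_{i}^{*} K (x_{i}, z) \\ \quad + \sum_{i = 1}^{n} \sum_{j = 1}^{n} α_{i}^{*} α_{j}^{*} K (x_{i}, x_{j}) \end{array}$

For the Gaussian kernel, $K (z, z) = 1$ and the double sum does not depend on $z$, so only the middle sum has to be evaluated for each point, and only the SVs (the samples with nonzero multipliers) contribute to it.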

In w-SVDD, a set of weights $\{ω_{1}, \dots, ω_{n}\}$, $ω_{i} \in [0, 1]$, indicating the importance assigned to each point of the training set, is taken into account. In our method, the weight of a point is the reciprocal of its distance to its cluster center. If $x_{i}$ is far from its cluster center, its weight is small and the corresponding slack variable $ξ_{i}$ is penalized only lightly. The slack variable can therefore take a large value, which allows $x_{i}$ to lie far from the center of the hypersphere while having only a weak influence on the final boundary. The optimization problem thus becomes

$\begin{array}{l} \min F = R^{2} + C \sum_{i = 1}^{n} ω_{i} ξ_{i} \\ \text{s.t. } {‖ Φ (x_{i}) - o ‖}^{2} \leq R^{2} + ξ_{i}, \quad i = 1, …, n \\ ξ_{i} \geq 0, \quad i = 1, …, n \end{array}$ (7)

The dual problem is

$\begin{array}{l} \min \sum_{i = 1}^{n} \sum_{j = 1}^{n} α_{i} α_{j} K (x_{i}, x_{j}) - \sum_{i = 1}^{n} α_{i} K (x_{i}, x_{i}) \\ \text{s.t. } 0 \leq α_{i} \leq ω_{i} C, \quad i = 1, …, n \\ \sum_{i = 1}^{n} α_{i} = 1 \end{array}$ (8)
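The weighting can be illustrated with a minimal Python sketch. The function names, the `eps` guard against a zero distance, and the division by the largest reciprocal (so that every weight falls in (0, 1]) are assumptions of this sketch; the text only states that the weight is the reciprocal of the distance to the cluster center.

```python
import math

def reciprocal_distance_weights(points, center, eps=1e-12):
    """Weight each point by the reciprocal of its distance to the cluster center.

    The reciprocals are divided by their maximum so that every weight lies
    in (0, 1]; this rescaling and the eps guard are assumptions of the sketch.
    """
    raw = [1.0 / (math.dist(p, center) + eps) for p in points]
    top = max(raw)
    return [r / top for r in raw]

def dual_upper_bounds(weights, C):
    """Per-sample box constraints 0 <= alpha_i <= w_i * C, as in the weighted dual."""
    return [w * C for w in weights]

# Hypothetical cluster: three points at distances 1, 2 and 4 from the center.
cluster = [(1.0, 0.0), (0.0, 2.0), (4.0, 0.0)]
center = (0.0, 0.0)
w = reciprocal_distance_weights(cluster, center)
bounds = dual_upper_bounds(w, C=0.5)
```

The farthest point receives the smallest upper bound on its multiplier, so it can contribute least to the center of the hypersphere.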

Furthermore, when more measurements at varying operating points are collected, the data may form several homogeneous groups, resulting in a nonstationary data stream. In this situation, boundary approaches are prone to enclosing a large empty area, which increases the chance of accepting outliers. To mitigate this problem, we exploit the data structure information and incorporate it into our one-class classifier. Specifically, the proposed architecture comprises three main steps:

Step 1. The K-means clustering algorithm is used to partition the raw training data into K clusters;

Step 2. One w-SVDD model is constructed on each cluster;

Step 3. Decisions are made with the K trained w-SVDD models.
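Steps 1 and 2 can be sketched in Python as follows. This is only a structural illustration under stated assumptions: the K-means routine is a plain Lloyd iteration seeded with the first K points, and `fit_sphere` is a deliberately simple stand-in (centroid plus largest member distance, in the input space) for the w-SVDD model, not the w-SVDD optimization itself. All function names are hypothetical.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means; the first k points seed the centers
    (an assumption made only to keep the sketch deterministic)."""
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members)
                                         for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the seed of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

def fit_sphere(members):
    """Stand-in for w-SVDD: centroid and the largest member distance."""
    center = tuple(sum(c) / len(members) for c in zip(*members))
    radius = max(math.dist(p, center) for p in members)
    return center, radius

def train_sub_models(points, k):
    """Step 1: partition the data. Step 2: fit one model per non-empty cluster."""
    _, labels = kmeans(points, k)
    models = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            models.append(fit_sphere(members))
    return models

# Hypothetical data drawn from two well-separated groups.
data = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 11.0), (11.0, 10.0)]
models = train_sub_models(data, k=2)
```

Each entry of `models` is a `(center, radius)` pair. Replacing `fit_sphere` with a solver for the weighted dual problem would give the sub-models described in the text.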

The general structure of our method is shown in Figure 2.

Figure 2.

General structure of “Cw-SVDD”.

Compared with single model-based methods, our method has the following advantages:

SVDD has better generalization ability but is vulnerable to atypical data distributions. The combination with clustering may effectively address this problem.

As each classifier is trained only on a partition of the raw data, its complexity is lower than in the case of a single-model approach. This is significant for online detection. Moreover, it can also reduce the probability of over-training.

Using chunks of data as the classifier input reduces the problem known as the empty sphere, that is, the area covered by the boundary in which no objects from the training set are located.

A boundary one-class classifier trained on a more compact data partition usually has fewer support vectors.

Our method can thus be regarded as a hybrid approach that uses both classification-based and cluster-based techniques, combining the advantages of each while reducing their drawbacks. Note that choosing the optimal clustering algorithm is not the aim of this paper. Here, we employ a simple but efficient clustering algorithm, namely K-means. Details about K-means are not repeated in this paper. At this point, all challenges of outlier detection for EAF systems discussed in Section II have been taken into account and corresponding solutions have been proposed. Because it combines w-SVDD and clustering, we call our method “Cw-SVDD.”

B. Online detection

Assume that $\{M_{1}, \dots, M_{K}\}$ are the K sub-models obtained in the training phase, and that each one is expressed by the center and radius of its hypersphere, $M_{i} = (o_{i}, R_{i})$. The target concept of the normal condition is then represented by the union of the regions described by these K sub-models. Therefore, when an unseen test point $x_{t}$ arrives, we use the following function to predict its label

$x_{t} \text{ is } \begin{cases} \text{a normal sample}, & \text{if } \exists i \in \{1, …, K\}, {‖ Φ (x_{t}) - o_{i} ‖}^{2} \leq {R_{i}}^{2} \\ \text{an outlier sample}, & \text{otherwise} \end{cases}$ (9)
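The decision rule of Equation (9) amounts to a union test over the sub-models, as the following Python sketch shows. The `(center, radius)` representation and the function name are assumptions of this sketch, and distances are taken in the input space for brevity rather than through a kernel.

```python
import math

def is_outlier(x, models):
    """Return True only if every sub-model rejects x.

    models is a list of (center, radius) pairs; x is accepted as normal as
    soon as it falls inside at least one hypersphere.
    """
    return not any(math.dist(x, center) <= radius for center, radius in models)

# Two hypothetical sub-models.
models = [((0.0, 0.0), 1.0), ((5.0, 5.0), 2.0)]
inside_second = is_outlier((5.0, 6.5), models)    # within 2.0 of (5, 5)
rejected_by_all = is_outlier((2.5, 2.5), models)  # outside both spheres
```

Because `any` stops at the first accepting sub-model, a normal point typically costs fewer than K distance evaluations, which suits the real-time requirement.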

C. Model updating

The requirement of online detection can hardly avoid sacrificing some accuracy compared with offline detection methods. Furthermore, the nonstationarity of the data stream requires the detection method to be adaptive. Therefore, updating the detection model is indispensable in our application. According to the “no free lunch” theorem, there is no single adaptation strategy that is best for all situations. Thus, we should take into account both the characteristics of the data at our disposal and the corresponding detection method, so that the proposed updating strategy brings more benefit.

As we have analyzed, the bulk of the measurements are sampled under normal working conditions, and we propose a method that belongs to the category of one-class classifiers. Furthermore, an improved SVDD is employed in order to be more robust to noise and outliers in the training set. Thus, a small amount of noise and a few outliers will not influence the detection performance significantly. In addition, as stated by Žliobaitė et al.,¹⁶ excessive adaptation may be a waste of resources and provide only insignificant incremental benefits to the model performance from a practical perspective.

As a result, in this paper, we propose a batch incremental (BI) updating rule for our detection method. We summarize three common adaptation strategies in Table 1, where $x_{t}$ is the training sample at the current time, $M_{t - 1}$ is the latest detection model, and l is the batch size. In the fully incremental (FI) strategy, the model is updated from the latest model and the current sample at each sampling time. If the model update is regarded as an investment decision, applying FI to our detection model yields a very low return on investment. In the non-incremental (NI) strategy, the detection model is rebuilt from scratch on a batch of past observations every time adaptation is required. Since the old model is discarded entirely, more samples are required to make up for the lost information in order to achieve satisfactory performance. By contrast, the BI strategy can keep the adaptive ability of the model at a high level with limited resources. The specific process of model updating is shown in Figure 3.

Table 1.

Descriptions of three common adaptation strategies.

Fully incremental (FI)	$M_{t} = f (M_{t - 1}, x_{t})$
Batch incremental (BI)	$M_{t} = f (M_{t - 1}, x_{t - l + 1}, …, x_{t})$
Non-incremental (NI)	$M_{t} = f (x_{t - l + 1}, …, x_{t})$

Figure 3.

The whole process of model update.

In our proposed updating strategy, we set two indicators that trigger an update of the detection model. The first is motivated by changes in the intrinsic structure of the data stream. Specifically, if outliers are detected in groups over a period of time, the updating procedure starts. The rationale is that the intrinsic structure of normal data may evolve with time, so that an OCC classifier built from old observations may be biased. However, if outliers still occur in groups after the detection model has been updated, a system fault should be signaled. The aim of the second indicator is to take full advantage of limited resources. Specifically, when the upper limit of the buffers storing the measurements has been reached, the updating procedure also starts. Otherwise, some valuable information would be wasted. When either indicator calls for an update of the detection model, we use the measurements in the most recent window to construct an OCC classifier. Then, we calculate the distance from each cluster center in the old model to the center of this new cluster and abandon the farthest cluster. Thus, the detection model is updated with newer measurements that accurately represent the trend of the data stream.
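The two indicators and the replacement of the farthest cluster can be sketched in Python as below. The text does not specify how a "group" of outliers is measured, so reading it as a run of `run_length` consecutive detections is an assumption of this sketch, as are the function names, the parameters, and the `(center, radius)` model representation.

```python
import math

def should_update(recent_flags, buffer_size, run_length, buffer_limit):
    """Return True if either indicator fires.

    Indicator 1: the last run_length detections were all outliers
    (one possible reading of outliers occurring in groups).
    Indicator 2: the measurement buffer has reached its upper limit.
    """
    grouped = (len(recent_flags) >= run_length
               and all(recent_flags[-run_length:]))
    return grouped or buffer_size >= buffer_limit

def replace_farthest(models, new_model):
    """Drop the old sub-model whose center is farthest from the new center,
    then append the new sub-model. Models are (center, radius) pairs."""
    new_center = new_model[0]
    farthest = max(range(len(models)),
                   key=lambda i: math.dist(models[i][0], new_center))
    return [m for i, m in enumerate(models) if i != farthest] + [new_model]

# Hypothetical state: three old sub-models and one built from the latest window.
old = [((0.0, 0.0), 1.0), ((5.0, 5.0), 1.0), ((9.0, 9.0), 1.0)]
updated = replace_farthest(old, ((8.0, 8.0), 1.5))
fire = should_update([False, True, True, True], buffer_size=10,
                     run_length=3, buffer_limit=100)
```

The number of sub-models stays at K after the update, so the cost of the detection phase does not grow over time.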

IV. Experiments and Analysis

A. Descriptions of datasets

As summarized by Li and Mao,⁸ the generalized plant can be described in the discrete system as

$i (k + 1) = f_{0} [ω (k), u (k)]$ (10)

where $i (k + 1) = [i_{a} (k + 1), i_{b} (k + 1), i_{c} (k + 1)]$ are the secondary currents, $u (k) = [u_{a} (k), u_{b} (k), u_{c} (k)]$ are the three control signals, $ω (k) = [i_{a} (k), i_{a} (k - 1), i_{a} (k - 2), i_{b} (k), i_{b} (k - 1), i_{b} (k - 2), i_{c} (k), i_{c} (k - 1), i_{c} (k - 2)]$, $f_{0} (\cdot) = [f_{a} (\cdot), f_{b} (\cdot), f_{c} (\cdot)]$, and each $f_{j} (\cdot), j = a, b, c$, is a nonlinear mapping.
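To make the structure of the regressor $ω (k)$ concrete, the following Python sketch assembles it from three current histories. The function name and the list-based storage indexed by time step are assumptions of this sketch, and the numeric values are hypothetical.

```python
def build_regressor(i_a, i_b, i_c, k):
    """Assemble the nine-element vector of current and lagged secondary currents.

    Each argument is a list indexed by time step; k must be at least 2 so
    that the two lagged samples of every phase exist.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    omega = []
    for phase in (i_a, i_b, i_c):
        # Current sample followed by its two predecessors.
        omega.extend([phase[k], phase[k - 1], phase[k - 2]])
    return omega

# Hypothetical current histories for the three phases.
i_a = [1.0, 1.1, 1.2, 1.3]
i_b = [2.0, 2.1, 2.2, 2.3]
i_c = [3.0, 3.1, 3.2, 3.3]
omega = build_regressor(i_a, i_b, i_c, k=3)
```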

There are six state variables, namely three secondary currents and three secondary voltages, used for identifying the data-driven model of the control method. Thus, in our experiments, we monitor these six variables at each time step. In total, three datasets are used in our experiments. Several synthetic outliers are injected into the datasets at random positions. Details about the datasets are shown in Table 2.

Table 2.

Descriptions of datasets used in experiments (“#” represents the number of the corresponding item).

No.	# Examples	Ratio of noise (and outliers)	# Operating points
Experiment 1
1	10,000	0	0
2	10,000	0	0
3	10,000	0	0
Experiment 2
1	10,000	1%	0
2	10,000	1.5%	0
3	10,000	2%	0
Experiment 3
1	10,000	0	3
2	10,000	0	4
3	10,000	0	5

B. Baselines and metrics

Since its introduction, SVDD has been applied in various areas with many satisfactory results. The same holds for w-SVDD. Thus, in our experiments, we compare our proposed method with these two techniques on all datasets. The contributions of our method can be seen clearly through this series of comparisons.

Many metrics have been developed as evaluation criteria for classification methods, such as overall accuracy, precision, recall, geometric mean of accuracies (G-mean), F-measure, the area under the receiver operating characteristic (ROC) curve (AUC), and so on. As the research community continues to develop a greater number of intricate and promising imbalanced learning algorithms, it becomes paramount to have standardized evaluation metrics to properly assess the effectiveness of such algorithms. Therefore, in this paper, we choose three metrics that are more suitable for the imbalanced data in EAF systems, namely G-mean, F-measure, and AUC.

Before these metrics are introduced, classification performance is represented by a confusion matrix, as illustrated in Table 3.

Table 3.

Confusion matrix of two-class classification problem.

		Actual label
		Target class	Negative class
Predicted label	Target class	True positive (TP)	False positive (FP)
	Negative class	False negative (FN)	True negative (TN)

It is worth noting that outlier samples belong to the positive class and normal samples belong to the negative class. Then, we can formulate G-mean as

$\text{G-mean} = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}}$ (11)

This metric evaluates the degree of inductive bias in terms of the geometric mean of the positive accuracy and the negative accuracy.

F-measure can be formulated as

$\text{F-measure} = \frac{(1 + β^{2}) \cdot \text{Recall} \cdot \text{Precision}}{β^{2} \cdot \text{Recall} + \text{Precision}}$ (12)

where $β$ is a coefficient that adjusts the relative importance of precision versus recall (usually, $β = 1$), $\text{Recall} = TP / (TP + FN)$, and $\text{Precision} = TP / (TP + FP)$. By combining recall and precision, the F-measure provides more insight into the functionality of a classifier than the accuracy metric.
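As a small worked example, G-mean and F-measure can be computed directly from the four entries of the confusion matrix. The counts below are hypothetical and chosen only for illustration; the function names are likewise assumptions of this sketch, and no guard against empty classes is included.

```python
import math

def g_mean(tp, fp, fn, tn):
    """Geometric mean of the positive-class and negative-class accuracies."""
    return math.sqrt(tp / (tp + fn) * tn / (tn + fp))

def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from recall and precision; beta = 1 weights them equally."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return ((1 + beta**2) * recall * precision) / (beta**2 * recall + precision)

# Hypothetical counts: 10 true outliers (positive class) among 100 test samples.
tp, fp, fn, tn = 8, 4, 2, 86
g = g_mean(tp, fp, fn, tn)
f1 = f_measure(tp, fp, fn)
```

With these counts the overall accuracy would be 0.94, while the F-measure is 8/11, which illustrates why accuracy alone is a weak indicator on imbalanced data.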

In the ROC curve, the true positive rate is plotted against the false positive rate, giving a visual representation of the trade-off between the two as the decision threshold varies. AUC summarizes this curve as the area under it and thus evaluates the average performance of a classifier.
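One way to compute AUC without tracing the curve is the rank-based formulation: the area equals the fraction of (outlier, normal) pairs in which the outlier receives the higher score, with ties counted as one half. The Python sketch below assumes that each test point has a real-valued outlier score; the scores and labels shown are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for outlier (positive class), 0 for normal.
    scores: larger values mean "more likely an outlier".
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical scores for two outliers and three normal samples.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
value = auc(scores, labels)
```

The double loop is quadratic in the number of samples, which is acceptable for a sketch; a sort-based ranking would be preferable for large test sets.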

C. Designs and results

As analyzed in previous sections, several restrictions are placed on outlier detection for EAF systems. Since the performance with respect to unlabeled and imbalanced data cannot be verified due to the natural limitations of the datasets, we mainly investigate the performance of our method in three aspects, namely general performance, robustness to noise and outliers in the training set, and robustness to the nonstationarity of the data stream. All experiments are repeated five times, and the average values of G-mean, F-measure, and AUC are listed.

Experiment 1

First, we investigate the general performance of the three competing methods on all datasets. The datasets used in this part are sampled from only one operating point, and no additional noise or outliers are injected. Here, the number of clusters is 3. Results for all datasets in terms of the three metrics are shown in Table 4. For clear visual comparison, we also show the results with respect to the G-mean metric in Figure 4. Note that the results in terms of F-measure and AUC are similar to those for G-mean.

Table 4.

Comparison of Cw-SVDD, w-SVDD and SVDD in Experiment 1.

Dataset	SVDD	w-SVDD	Cw-SVDD
G-mean values
1	0.855	0.871	0.912
2	0.847	0.856	0.899
3	0.811	0.819	0.867
F-measure values
1	0.891	0.912	0.957
2	0.877	0.903	0.948
3	0.851	0.891	0.941
AUC values
1	0.909	0.918	0.965
2	0.900	0.903	0.957
3	0.875	0.891	0.932

AUC: area under curve; SVDD: support vector data description.

Figure 4.

Comparative results with respect to G-mean metric for Experiment 1.

As can be seen, for the G-mean metric, Cw-SVDD outperforms the other two methods on all three datasets and w-SVDD outperforms SVDD for two of three datasets. Since the training set contains little noise and few outliers, the difference in performance between w-SVDD and SVDD is not significant. The advantage of Cw-SVDD is also not prominent. The reason may be that the datasets used in this part are too simple. For the F-measure metric, the differences between Cw-SVDD and the other two methods are greater than for G-mean. This is mainly because the F-measure strongly emphasizes the accuracy on the positive class (outliers). The AUC metric evaluates average performance, and the results with respect to this metric are very similar to those for G-mean.

Experiment 2

In the second part, we investigate the robustness of the three competing methods to noise and outliers in the training set (the test samples are left unchanged). The process for generating noise and outliers is illustrated by Algorithm 2. Specifically, a noisy example is generated by adding one standard deviation to the original data value, and an outlier is generated by adding three standard deviations. The examples that act as noise or outliers are selected randomly, and the state variables that behave abnormally (one or more) are also selected randomly. The datasets used in this experiment are again sampled from only one operating point, and the number of clusters is also set to 3. Results for all datasets with respect to the three metrics are shown in Table 5. For clear visual comparison, we also show the results in Figure 5.
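The injection procedure can be sketched in Python for a single state variable as follows. This is an illustration under stated assumptions and not the paper's Algorithm 2: the function name, the separate noise and outlier ratios, the use of the population standard deviation, and the fixed random seed are all choices made for this sketch, and the random selection of state variables is omitted.

```python
import random
import statistics

def inject(series, noise_ratio, outlier_ratio, seed=0):
    """Return a corrupted copy of a 1-D series and the chosen indices."""
    rng = random.Random(seed)
    sigma = statistics.pstdev(series)
    n_noise = round(noise_ratio * len(series))
    n_out = round(outlier_ratio * len(series))
    # Sample without replacement so that no index is corrupted twice.
    chosen = rng.sample(range(len(series)), n_noise + n_out)
    noise_idx, outlier_idx = chosen[:n_noise], chosen[n_noise:]
    corrupted = list(series)
    for i in noise_idx:
        corrupted[i] += sigma        # noisy example: one standard deviation
    for i in outlier_idx:
        corrupted[i] += 3 * sigma    # outlier: three standard deviations
    return corrupted, noise_idx, outlier_idx

# Hypothetical series of 200 samples with 1% noise and 1% outliers.
series = [float(i % 10) for i in range(200)]
corrupted, noise_idx, outlier_idx = inject(series, 0.01, 0.01)
```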

Table 5.

Comparison of Cw-SVDD, w-SVDD and SVDD in Experiment 2.

Dataset	SVDD	w-SVDD	Cw-SVDD
G-mean values
1	0.794	0.851	0.902
2	0.781	0.849	0.891
3	0.741	0.832	0.862
F-measure values
1	0.801	0.901	0.943
2	0.790	0.893	0.928
3	0.751	0.881	0.911
AUC values
1	0.809	0.908	0.956
2	0.801	0.893	0.945
3	0.775	0.861	0.921

AUC: area under curve; SVDD: support vector data description.

Figure 5.

Comparative results with respect to G-mean metric for Experiment 2.

Compared with Experiment 1, performance in terms of all metrics deteriorates on all datasets. The deterioration is worst for SVDD, since noise and outliers can heavily influence the calculation of the center and radius. For w-SVDD, the deterioration is alleviated, since the weights on the slack variables in the objective function reduce the negative influence of noise and outliers. Cw-SVDD, in addition to using w-SVDD, exploits the intrinsic structure of the dataset, which removes some further impact of noise and outliers in the training set.
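For reference, a common way of writing a weighted SVDD objective is given below. The notation is an assumption made for this illustration (center $a$, radius $R$, slack variables $\xi_i$, per-sample weights $w_i$, trade-off parameter $C$, feature map $\phi$) and may differ from the symbols used in Section III.

```latex
\min_{R,\,a,\,\xi}\; R^{2} + C \sum_{i=1}^{N} w_i\,\xi_i
\quad \text{s.t.} \quad
\lVert \phi(x_i) - a \rVert^{2} \le R^{2} + \xi_i,
\qquad \xi_i \ge 0,\quad i = 1,\dots,N
```

Setting $w_i = 1$ for every sample recovers standard SVDD. When a suspected noisy sample receives a small $w_i$, leaving it outside the sphere costs little, so it pulls the center and radius less than it would in the unweighted problem.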

Experiment 3

Finally, we investigate robustness to the nonstationarity of the data stream. This property is vital for detection methods used in EAF systems, since weak robustness to nonstationarity leads to many outliers being misclassified, in which case the detection is meaningless even if all normal samples are classified correctly. Thus, the datasets used in this experiment are sampled from several operating points. Unlike in the former two experiments, the number of clusters is set to 5 here. Neither additional noise nor outliers are added to the original training samples. Results for all datasets with respect to the three metrics are shown in Table 6. For clear visual comparison, we also show the results in Figure 6.

Table 6.

Comparison of Cw-SVDD, w-SVDD and SVDD in Experiment 3.

Dataset	SVDD	w-SVDD	Cw-SVDD
G-mean values
1	0.651	0.674	0.901
2	0.642	0.656	0.893
3	0.610	0.619	0.853
F-measure values
1	0.591	0.612	0.937
2	0.577	0.603	0.924
3	0.551	0.591	0.901
AUC values
1	0.609	0.613	0.955
2	0.600	0.603	0.940
3	0.571	0.594	0.919

AUC: area under curve; SVDD: support vector data description.

Figure 6.

Comparative results with respect to G-mean metric for Experiment 3.

As can be seen from the results for all three metrics, the difference between Cw-SVDD and the other two methods (SVDD and w-SVDD) becomes very obvious in this experiment. Samples are distributed in many small groups owing to the presence of several operating points. The boundary defined by SVDD or w-SVDD would usually enclose more outliers than that defined by Cw-SVDD. This drawback of SVDD and w-SVDD is shown more clearly by the F-measure and AUC metrics (Table 6).
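The effect of several operating points on a single boundary can be illustrated with a small Python sketch. It is a deliberate simplification: each one-class model is a plain hypersphere (centroid plus largest training distance) rather than SVDD or w-SVDD, and the cluster membership is taken as given instead of being produced by a clustering step. All names and data are hypothetical.

```python
import math
import random

def fit_sphere(points):
    """Crude one-class boundary: centroid plus the largest training distance."""
    dim = len(points[0])
    center = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius

def accepts(sphere, x):
    """True if x lies inside the sphere."""
    center, radius = sphere
    return math.dist(x, center) <= radius

def is_outlier(spheres, x):
    """A point is labeled an outlier only if every sub-model rejects it."""
    return not any(accepts(s, x) for s in spheres)

rng = random.Random(0)
# Two hypothetical operating points, around (0, 0) and (10, 10).
group_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
group_b = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(100)]

single = [fit_sphere(group_a + group_b)]                   # one boundary for all data
per_cluster = [fit_sphere(group_a), fit_sphere(group_b)]   # one boundary per cluster

probe = (5.0, 5.0)  # lies between the two operating points, far from both
print(is_outlier(single, probe))
print(is_outlier(per_cluster, probe))
```

The single boundary has to span both groups and therefore accepts the probe, whereas both per-cluster boundaries reject it, so only the cluster-wise detector labels it an outlier.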

V. Conclusion

This paper proposes a novel and practical direction for outlier detection dedicated to EAF control systems, following a survey of research on control methods for EAF. The aim is to provide high-quality examples for the data-driven models used in EAF control methods. We analyze the characteristics of measurements in EAF systems and conclude that an outlier detection method must work under the condition of unlabeled, imbalanced, non-stationary and noisy data. Accordingly, we propose a hybrid method that combines OCC and clustering according to the characteristics of the data. In addition, a BI updating strategy is proposed to enhance the adaptivity of the detection model. Through a series of experiments, the robustness of our method to noise and outliers in the training set and to nonstationarity of the data stream has been verified. Results show that our method (Cw-SVDD) outperforms SVDD and w-SVDD in terms of three metrics for all datasets. The performance of the detection method can still be improved. By employing both positive and negative samples collected online, a binary classifier could be constructed. Since more data information would be utilized, a binary classifier could achieve better results. This is our future research direction.

Footnotes

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.


Detecting Outliers in Electric Arc Furnace under the Condition of Unlabeled, Imbalanced, Non-stationary and Noisy Data
