Abstract
Introduction
In numerical control (NC) milling operations, the manufacturing quality of workpieces, such as surface roughness and dimensional accuracy, is largely influenced by the tool wear state. Moreover, most milling machine tool failures originate in the tool system. Therefore, the development of online tool condition monitoring (TCM) is of great significance. TCM seeks to provide the essential information required for the predictive maintenance of milling cutters via signal processing methods and the extraction of the signal features most indicative of tool condition (i.e. feature selection) from the time domain, frequency domain, and time-frequency domain. This is an active research focus worldwide. Feature selection, in particular, has raised considerable interest among researchers. Yum et al. 1 adopted a new two-step combination feature selection method, which improved the performance of all classifiers. The maximum classification rate was 98.3%, which was an improvement of 4.2% compared with the best single-step feature selection method. Yu 2 exploited the adaptive Gaussian mixture model (AGMM) for the effective assessment of tool wear. Good feature extraction results were achieved using the Daubechies wavelet of order 5. Wang and Sun 3 proposed a feature selection method based on ant colony optimization (ACO). This method, which modeled the feature selection process according to the behavior of ants searching for food, achieved good results. Lin et al. 4 developed an efficient feature selection method by regression analysis. However, the experimental results represented an imprecise assessment because regression analysis cannot precisely describe non-linear relationships and decision variants for tool wear state monitoring. Zhao et al. 5 used a clustering evaluation model to select significant features, although the model largely relied on professional knowledge, which makes it unsuitable for a dynamic milling process.
Goldberg 6 employed genetic algorithms (GAs) in searching, optimization, and machine learning, which resulted in low efficiency owing to the absence of a common rule for parameter selection. To predict flank wear in drilling, Garg et al. 7 employed particle swarm optimization (PSO) with a trained artificial neural network (ANN). Their experiments provided good prediction results in conjunction with rapid computation. In general, correlation analysis (i.e. the application of a correlation coefficient and fuzzy classification) is a good feature selection method with acceptable results. 8 Moreover, intelligent optimization algorithms have attracted the attention of numerous scholars in recent years. These algorithms, such as the fruit fly optimization algorithm (FOA), ACO, and PSO, which are among the most effective, have become the most widely employed methods in optimization problems.
In TCM, the signal acquisition methods employed to predict tool wear can be classified as direct and indirect methods. In the direct method, the actual tool wear is measured directly, which requires a temporary halt in manufacturing. Thus, most researchers have investigated indirect methods employing monitoring signals that can be obtained online, including the milling force, vibration, sound, acoustic emission (AE), temperature, spindle power, and surface roughness. In actual practice, a worn tool requires more force than a sharp tool to remove an equivalent amount of material, and the milling force is therefore considered one of the most effective parameters for monitoring tool wear based on previous experimental work. 9
It has been observed that indirect measurements are subject to several experimental limits. For example, the computational effort involved in correlating process parameters with flank wear is high. Thus, significant effort has been devoted to improving computational models. Ren et al. 10 utilized milling force measurements in a Takagi–Sugeno–Kang (TSK) fuzzy approach for TCM. However, it was observed that such models had difficulty estimating approximation errors and therefore required development to capture uncertainties during the turning process. Al-Habaibeh et al. 11 and Fang et al. 12 used multiscale methods, both of which were based on a milling force signal. Their research demonstrated the usefulness of the milling force, and the designed systems were able to predict tool wear successfully.
However, the milling force is also sensitive to other parameters and can vary with cutting speed, depth of cut, and workpiece hardness, making correlation with wear more complicated. In this article, the time domain features of milling force signals are extracted by sensors. The features in the time-frequency domain are then extracted by wavelet analysis. Feature selection is conducted by an improved fruit fly optimization algorithm (IFOA), and the selected features are then input to a back propagation (BP) neural network to monitor the tool wear state. To verify the advantages of the proposed IFOA, experiments comparing four feature selection methods (the proposed IFOA, ACO, a correlation analysis method, and PSO) were conducted. The results verify that the proposed IFOA exhibits good adaptation and good optimization effectiveness. Therefore, this method is suitable for feature extraction in TCM.
Experimental milling cutter feature extraction scheme
Experimental signal acquisition of milling force
Milling force signals directly reflect the state of tool wear. Moreover, milling force signals respond quickly to changes in the state of tool wear, and are easily extracted, resulting in easily achieved TCM. Therefore, the milling force signal was selected as the feature extraction object and formed the basis for the judgment of the tool wear state. The overall work flow of the proposed TCM system is illustrated in Figure 1.

Flow chart illustrating the tasks performed by a TCM system.
The experiments were conducted on a Makino computer numerical control (CNC) machine equipped with an EGD 4440R milling cutter, A30N cutting material, and an ASSAB 718 HH machining workpiece with dimensions 206 mm × 43 mm × 106 mm; the milling mode was face down milling. It has been shown that the milling force signal has no simple linear relationship to the tool wear state.13,14 Therefore, the milling force signal extracted by a single sensor cannot reflect the tool wear state accurately. A Kistler 9257B three-component dynamometer and a Kistler 5019 multi-channel charge amplifier were employed to measure the milling force, and an NI-DAQ PCI1200 data acquisition board, an Olympus microscope, and a Panasonic digital camera were employed to record the cutting process. Feature selection experiments were conducted in MATLAB R2013 on a computer running 64-bit Windows 10 with a 2.7 GHz processor and 8 GB of memory. The cutting conditions tested involved spindle speeds of 600, 800, 1000, and 1200 r/min; feed rates of 100, 150, 200, and 300 mm/min; and a depth of cut of 1 mm. Figure 2 illustrates the experimental setup.

Schematic diagram of the experimental setup.
Figure 3 presents images of cutting tool wear at four stages: initial wear, normal wear, severe wear, and tool failure. Because tool wear appears mainly as progressive flank wear (VB), the tool failure limit is taken as a VB of 0.5 mm, as defined by the ISO 8688-1 standard.

Images of the four different states representative of progressive tool wear.
Figure 4 presents the experimental tool wear curves of seven tools. Initial tool wear represents VB values of 0.00–0.10 mm, and the extent of VB increases rapidly in this stage. Normal tool wear represents VB values of 0.10–0.40 mm. In this stage, the wear resistance of the cutter increases, and the wear rate is not as rapid as that of the initial tool wear stage. Severe tool wear represents VB values of 0.40–0.50 mm. During this phase, the vibration resistance of the cutter is reduced, and the extent of VB increases rapidly. Finally, tool failure occurs when the extent of VB exceeds 0.50 mm.

Experimental tool wear process curves for a variety of tools.
Extraction of time domain milling force features
An end mill is employed as the research subject in the experiment, and six time-domain milling force features related to the Z direction were extracted through the dynamometer, including the maximum value (X1), peak amplitude (X2), given as the difference between the maximum and minimum values, mean value (X3), root-mean-square value (X4), standard variance (X5), and peak value (X6). Collection of the milling force signal (denoted as
Maximum value and peak amplitude of the milling force
The two features, respectively, represent the steady state and transient state of the milling force.
Mean value (
Root-mean-square value
Standard variance
The value of
Peak value
Here,
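The time-domain features listed above can be illustrated with a minimal Python sketch. The defining equations did not survive in this section, so the formulas below are the conventional ones and should be read as assumptions: X5 is taken here as the population standard deviation, and X6 (peak value) is omitted because its definition is not recoverable from the text. The function name `time_domain_features` and the sample data are hypothetical.

```python
import math

def time_domain_features(force):
    """Return five time-domain features of a sampled force signal.

    Assumed (conventional) definitions; the source equations are not available.
    """
    n = len(force)
    if n == 0:
        raise ValueError("force signal must not be empty")
    maximum = max(force)                         # X1: maximum value
    peak_amplitude = maximum - min(force)        # X2: maximum minus minimum
    mean = sum(force) / n                        # X3: mean value
    rms = math.sqrt(sum(f * f for f in force) / n)   # X4: root-mean-square value
    # X5: spread about the mean (population form, dividing by n, is an assumption)
    std = math.sqrt(sum((f - mean) ** 2 for f in force) / n)
    return {"X1": maximum, "X2": peak_amplitude, "X3": mean, "X4": rms, "X5": std}

# Hypothetical Z-direction force samples, for illustration only.
sample = [1.0, 3.0, 2.0, 4.0]
features = time_domain_features(sample)
```

In practice the same function would be applied to each windowed segment of the dynamometer signal, giving one feature vector per segment.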
Extraction of time-frequency domain milling force features
Wavelet analysis can decompose signals into independent frequency regions orthogonally without gaps or overlaps. These signals in the frequency domain are useful information for TCM. This study employed Daubechies wavelet of order 5 to decompose the milling force into four layers. Each node obtains a time-frequency feature
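A four-layer decomposition into 16 nodes can be sketched as a wavelet packet tree. The sketch below is illustrative only and makes two labeled assumptions: it uses the Haar filter pair in place of the Daubechies order-5 filters used in the study (to keep the example dependency-free), and it takes the energy of each terminal node as the time-frequency feature, since the exact feature definition is not reproduced in this section. The names `haar_split` and `packet_energies` are hypothetical.

```python
import math

def haar_split(x):
    """One orthonormal Haar analysis step: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def packet_energies(signal, levels=4):
    """Energy of each of the 2**levels terminal wavelet packet nodes."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            # Split every node, not only the approximation branch.
            approx, detail = haar_split(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return [sum(c * c for c in node) for node in nodes]

# Synthetic two-tone signal standing in for a milling force record.
signal = [math.sin(0.3 * k) + 0.5 * math.sin(2.1 * k) for k in range(64)]
energies = packet_energies(signal)
```

Because the transform is orthonormal, the 16 node energies sum to the energy of the original signal, which is a convenient sanity check. The nodes are returned in natural (filter-bank) order rather than in order of increasing frequency.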
Theoretical background
Basic FOA
The FOA is a global optimization searching algorithm proposed by Pan 15 that models the food searching behavior of fruit flies. A schematic of the FOA is given in Figure 5, representative of the extraction of features

Illustration of the fruit fly optimization algorithm (FOA).
Currently, the FOA has been applied to adjust financial warning models, locate mathematical extrema, and optimize the parameters of general regression neural networks and vector machines. 16 Wu and Li 17 compared the performance of FOA with five other evolutionary algorithms (GA, ACO, PSO, fish school algorithm, and immune algorithm) using the Schaffer formula. FOA was found to be superior to the other algorithms in terms of its reduced computational burden. Moreover, FOA can optimize non-negative parameters easily. However, the disadvantage of FOA is equally obvious: low optimization precision owing to a tendency to converge to local optima. The basic steps involved in FOA include the following:
Randomly establish initial position of the fruit fly cluster:
Randomly establish food searching position (
Because the location of the food source is unknown, the distance to the origin (Dist
Substitute
Determine the maximal
where
Record the value of
Conduct an iterative optimization by repeating steps (2)–(5). If the new value of
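The basic FOA steps listed above can be sketched in Python as follows. The step equations are not reproduced in this section, so the update rules below follow the commonly published form of the algorithm and are assumptions: the smell concentration judgment value is taken as S = 1/Dist, the search step is uniform in [-1, 1], and the swarm moves to a new position only when the best smell improves. The function name `foa_maximize`, its parameters, and the test objective are hypothetical.

```python
import math
import random

def foa_maximize(smell_function, population=20, iterations=100, seed=0):
    """Basic fruit fly optimization of a one-parameter smell function."""
    rng = random.Random(seed)
    # Step 1: random initial position of the fruit fly cluster.
    x_axis, y_axis = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
    best_smell, best_s = -math.inf, None
    history = []
    for _ in range(iterations):
        candidates = []
        for _ in range(population):
            # Step 2: random direction and distance for each fly.
            x = x_axis + rng.uniform(-1.0, 1.0)
            y = y_axis + rng.uniform(-1.0, 1.0)
            # Step 3: distance to the origin and judgment value S = 1 / Dist.
            dist = max(math.hypot(x, y), 1e-12)
            s = 1.0 / dist
            # Step 4: evaluate the smell concentration.
            candidates.append((smell_function(s), s, x, y))
        # Step 5: fly with the maximal smell in this generation.
        smell, s, x, y = max(candidates, key=lambda c: c[0])
        # Step 6: keep the best value and move the swarm there if it improved.
        if smell > best_smell:
            best_smell, best_s = smell, s
            x_axis, y_axis = x, y
        history.append(best_smell)  # Step 7: repeat until the iteration limit.
    return best_s, best_smell, history

# Hypothetical objective with its maximum at S = 2.
objective = lambda s: -(s - 2.0) ** 2
best_s, best_smell, history = foa_maximize(objective)
```

Since S = 1/Dist is always positive, this form searches only non-negative parameter values, which matches the remark above that FOA handles non-negative parameters easily.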
Preliminary testing using FOA demonstrated an unacceptable level of instability because tool wear is a random process, and premature convergence owing to its low optimization precision, making it unsuitable for the milling feature selection process of TCM. As such, the optimization ability of FOA requires improvement for application to the feature selection process. Numerous successful efforts have been made to improve the optimization performance of FOA. Marko et al. 18 proposed chaotic FOA (CFOA) based on an investigation of FOA and another 10 different chaotic systems, and the method demonstrated a superior global optimal reliability and a high success rate. Wu et al. 19 improved the FOA global searching ability by adjusting the entropy parameter to amplify the searching radius in a cloud model–based FOA. Wang 20 optimized a wavelet neural network based on an IFOA to predict the melt index of industrial polypropylene. In the experiment, inertial weight parameters were employed to balance global and local searching abilities, resulting in an improved global searching ability. Wang 21 added randomized mutation and group cooperation in FOA to optimize complicated functions and solve joint replenishment problems (JRPs), resulting in an efficient FOA with an improved global searching ability. Experiments demonstrated the good performance of the IFOA.
Basic theory of Fisher screening
Fisher linear discriminant analysis is one of the most effective methods of feature extraction. The Fisher discriminant achieves high discriminant efficiency because it maximizes the separation between classes while minimizing the scatter of samples within each class. The Fisher criterion is given by the following expression 22
Here,
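As an illustration of how a Fisher criterion ranks a single feature, the sketch below uses the common two-class form, the squared difference of class means divided by the sum of class variances. This form is an assumption, since the expression used in the article is not reproduced in this section; the function name `fisher_score` and the sample values are hypothetical.

```python
def fisher_score(class_a, class_b):
    """Two-class Fisher score of one feature (assumed conventional form)."""
    def mean(values):
        return sum(values) / len(values)

    def variance(values, m):
        # Population variance about the class mean.
        return sum((v - m) ** 2 for v in values) / len(values)

    mean_a, mean_b = mean(class_a), mean(class_b)
    within = variance(class_a, mean_a) + variance(class_b, mean_b)
    if within == 0.0:
        raise ValueError("within-class variance is zero; score is undefined")
    return (mean_a - mean_b) ** 2 / within

# Hypothetical feature values for two wear states.
initial_wear = [1.0, 2.0, 3.0]
severe_wear = [7.0, 8.0, 9.0]
well_separated = fisher_score(initial_wear, severe_wear)

# A feature whose class distributions overlap heavily scores much lower.
overlapping = fisher_score([1.0, 5.0, 9.0], [2.0, 6.0, 10.0])
```

A feature with a high score separates the two wear states clearly relative to its scatter, so low-scoring features are natural candidates for removal.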
Improved fruit fly algorithm
Because tool wear is a random process, a self-adapting FOA was adopted, where Fisher screening is added as a second decision after step (7), resulting in a reduced dimensionality of features. Assuming the initial coefficient of each feature is 1, after FOA optimization, features with small coefficients are screened, and nearby fruit flies are clustered at the selected features. Then, the Fisher discrimination criterion is employed as a second optimization standard. Here, if the Fisher discriminant value of optimized features satisfies the output standard, the process is terminated; otherwise, we return to step (2) until the standard condition is fulfilled. A flow chart of the proposed IFOA is shown in Figure 6.

Flow chart of the proposed IFOA.
Experimental data analysis
Feature selection based on IFOA
A total of 22 features were selected in the experiments, consisting of the 6 time domain features and 16 time-frequency domain features. The fixed factors are a spindle speed of 800 r/min, a feed rate of 150 mm/min, a cutting depth of 1 mm, and a sampling frequency of 2 kHz. The initial coefficients of all 22 features are set to 1 and are then optimized by the proposed IFOA. The population of the fruit fly cluster is 22, and the maximum number of optimization iterations is 100. According to FOA theory, individuals with the largest Smell values fly in the same direction, which optimizes the vector coefficient. Features with small Fisher criterion values are removed, which completes the feature selection process for those features. In this article, training samples are divided into
The selected features should clearly distinguish between the initial wear state and the severe wear state. Therefore, the evaluation index is defined by the Fisher criterion as follows
Here,
Diagnose tool wear state by BP neural network
ANNs have demonstrated a strong capacity for fault identification. 24 This study established a BP neural network with three layers to evaluate the effectiveness of feature selection. First, the data were normalized, and the selected cutting feature set was input into the BP neural network, where the tool wear condition was obtained as the output of neurons. According to Kolmogorov’s theorem, the number of hidden neurons is 2
Experimental results
Because IFOA is a random optimization searching algorithm for feature selection, the number of selected features varies after each optimization, but the good performance of the BP neural network is retained each time, and the network is therefore suitable for monitoring the tool wear state after training. Table 1 shows the relationship among the various feature subsets, the training time of the BP neural network, and the simulation errors (mean square error (MSE)), where the feature subsets have been listed according to decreasing MSE values. Clearly, the feature selection method based on IFOA demonstrated good performance. Compared to the results with all 22 features selected, the proposed method reduced the training time significantly and demonstrated low prediction error and fast training speed when the dimension was less than 10. Figure 7 illustrates the structure of the BP neural network.
The results of feature selection by the proposed IFOA.
IFOA: improved fruit fly optimization algorithm; MSE: mean square error.

The structure of BP neural network.

BP neural network performance of the selected feature set (X3, X10, X14, X17).
Comparison experiments
Comparison experiments were conducted to compare the results of feature selection methods employing the proposed IFOA, ACO, a correlation analysis method, and PSO on equivalent sets of the experimental milling force data.
Feature selection results
The number of selected features directly affects the progress of TCM. As shown in Figure 9, the number of selected features obtained from equivalent data varied for each method. IFOA achieved the best feature selection result under the condition of equivalent selection effectiveness. PSO selected a similar number of features to IFOA (i.e. 6), which is not unexpected because both are evolutionary algorithms derived from natural foraging behavior that obtain high optimization efficiency and optimal positioning on the basis of evolutionary particle position changes. As for ACO and the correlation analysis method, the ability to distinguish between features is not as definite as in the case of IFOA and PSO, and a greater number of features are therefore selected, possibly resulting in redundancy.

Comparison of the number of selected features obtained from various feature selection methods.
Fisher discriminant values
The Fisher discriminant values of each selected feature were arranged from low to high for each of the selection methods considered, as is shown in Figure 10. In this experiment, the Fisher discriminant value represents the relationship between a feature and the tool wear condition, where the higher the value, the more closely the feature reflects the tool wear state. The feature sets selected by ACO and the correlation method both include X21, which has a Fisher discriminant value less than 1. While PSO and IFOA selected features with similar Fisher discriminant values, the average Fisher discriminant value obtained by PSO is 3.98, which is lower than that of IFOA with 4.55.

Fisher discriminant values of the selected feature sets obtained from various feature selection methods.
Training time
Figure 11 presents the comparative results of computing time. It can be observed from the figure that ACO requires the longest computing time, which is greater than 140 s, while the other three feature selection methods have similar times of around 6 s and do not differ by more than 0.2 s.

Comparison of the training times required for various feature selection methods.
MSE
MSE is the most important index for the diagnostic effectiveness of the BP neural network. The MSE values obtained for the four feature selection methods are compared in Figure 12.

Comparison of MSE values obtained for various feature selection methods.
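For reference, the MSE of a network's predictions is conventionally computed as shown below. The symbols are notation introduced here for illustration and are not taken from the article: N is the number of test samples, y_i is the target wear state of sample i, and \hat{y}_i is the corresponding network output.

```latex
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2
```

A lower value indicates that the network outputs lie closer to the target wear states on average.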
The following conclusions can be drawn from the comparison analysis. ACO selected the largest number of features, required the longest training time, and provided the highest MSE, which corresponds with the lowest optimization efficiency of all methods considered. The correlation method provided both a low training time and MSE, but it selected a relatively large number of potentially redundant features. While PSO performed well, the number of features selected, training time, and MSE were all a little greater than those obtained for IFOA. Therefore, we can conclude that IFOA provided the best optimization performance compared with the other three algorithms.
Conclusion
An IFOA employing the Fisher criterion was developed and applied to feature selection for tool wear state monitoring of a CNC milling machine based on milling force data. The optimized feature set was shown to enable effective monitoring of the tool wear state. The following conclusions can be drawn from the experimental results:
The proposed IFOA realizes easy implementation, precise optimization, and rapid training by selecting a small number of significant features, resulting in good BP neural network performance.
The proposed IFOA demonstrates good effectiveness and is suitable for use in TCM.
