Abstract
Keywords
Introduction
As an important part of engineering machinery, the fault diagnosis system monitors the condition of the machine so that it can be operated safely and reliably. Because the vibration collected by the monitoring system is non-stationary and non-linear, conventional fault diagnosis methods based on statistical indexes cannot extract the fault information sufficiently. Therefore, images have gradually been applied to machine monitoring and diagnosis in recent years.
For example, images of the temporal waveform, power or amplitude spectrum, time–frequency distribution, time-scale distribution, and non-linear spectrum have been applied to the fault diagnosis of machines to different degrees. Compared with a statistical index, an image presents the feature in a much higher dimension. For example, a memory of dimension 1024 × 512 is needed to store the time–frequency distribution image if a temporal waveform of 1024 samples is transformed to the time–frequency domain with a window of 512 samples. Because it contains more fault information, the image presentation of the vibration signal is more favorable to the maintenance of the machine, although more memory, longer calculation time, and more knowledge are needed. Consequently, considerable attention has been paid to image-based fault diagnosis methods in the past decades.1–8 However, the high dimension severely hinders the use of image-based diagnostics in engineering applications because the dimension of the input vector increases sharply during processing. Reducing the dimension of the input signal is therefore the key step in applying image-based diagnosis methods, and many dimensionality reduction methods, shown in Table 1, have been investigated in different areas.
Dimensionality reduction methods and corresponding applications in recent years.
As one of the earliest works on dimensionality reduction, principal components analysis (PCA) was proposed by Hotelling32 in 1933 and applied to the statistical analysis of psychological data. Subsequently, a series of dimensionality reduction methods, including linear discriminant analysis (LDA), locally linear embedding (LLE), local tangent space alignment (LTSA), and so on, were developed on the basis of different optimization criteria. However, in engineering applications, non-linear features are embedded in high-dimensional spaces, and the traditional linear dimensionality reduction methods can hardly deal with problems involving non-linearity. Therefore, non-linear dimensionality reduction methods, including non-negative matrix factorization (NMF), manifold alignment, and kernel-based dimensionality reduction methods, have attracted much attention in the past decades. Assuming that the samples are located, or approximately located, on a non-linear manifold in a high-dimensional space, Zhang et al.33,34 introduced manifold learning to dimensionality reduction for face recognition. By mapping the original data to a higher-dimensional space, the kernel-based method transforms the non-linear problem into a linear one in the dimensionality reduction. For very high-dimensional data in engineering applications, kernel-based methods can effectively reduce the computation time and memory required by conventional methods.35 Compared with the manifold learning method, NMF is more stable and mature in dimensionality reduction applications. NMF and its variants, including sparse NMF (SNMF)36,37 and kernel NMF (KNMF),38,39 represent the original data by a coefficient matrix of lower dimension. However, a single kernel function cannot perfectly present all features embedded in the data; many useful features may be lost in the mapping, and the diagnosis accuracy may decrease.
To address this, a novel fault diagnosis method that combines the multi-kernel method and the NMF is proposed in this article and applied to bearing fault diagnosis and rotor condition identification.
Multi-KNMF
NMF
By imposing a non-negativity constraint on all elements of the matrices, the NMF was proposed by Lee and Seung25 in 1999. The mechanism of the NMF is as follows.
Suppose that
where
In order to obtain the base matrix
When the loss function is determined by the Kullback–Leibler divergence, the optimization function of the NMF is given by
It can be seen that the NMF obtains the matrices of
In engineering applications, many signals can be represented by a non-negative matrix, and the memory needed for the factor matrices is much less than that needed for the original matrix. Moreover, the NMF implicitly embodies the concept that the original matrix is synthesized from the base and coefficient matrices. Therefore, the NMF has attracted much attention in many areas.
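As an illustration of the factorization described in this section, the following is a minimal sketch of NMF with the multiplicative update rules of Lee and Seung for the generalized Kullback–Leibler divergence. The names `V`, `W`, `H`, `rank`, and `nmf_kl` are assumptions chosen for this sketch and are not necessarily the notation used in the article; the small constant `eps` is added only for numerical safety.

```python
import numpy as np


def kl_divergence(V, WH, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(V || WH)."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))


def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorize a non-negative matrix V (n x m) as W (n x rank) @ H (rank x m).

    Uses the multiplicative updates for the KL divergence loss. All names
    here are illustrative assumptions, not the article's notation.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps  # base matrix, kept non-negative
    H = rng.random((rank, m)) + eps  # coefficient matrix, kept non-negative
    history = [kl_divergence(V, W @ H)]
    for _ in range(n_iter):
        # Update the coefficient matrix with the base matrix fixed.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update the base matrix with the coefficient matrix fixed.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        history.append(kl_divergence(V, W @ H))
    return W, H, history


if __name__ == "__main__":
    V = np.random.default_rng(1).random((8, 6))
    W, H, history = nmf_kl(V, rank=3)
    print(W.shape, H.shape, history[0], history[-1])
```

Because every update multiplies by a non-negative ratio, `W` and `H` remain non-negative throughout, and the columns of `H` give the reduced representation of the columns of `V`.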
Kernel NMF
When the Euclidean distance is used to calculate the loss function of NMF and
Apparently, equation (4) is a quadratic programming problem and can be rewritten as
It can be seen that only products of
where
Multi-KNMF-based mechanical fault diagnosis
Multi-kernel design
In engineering applications, the collected data/signals are produced by many sources or hold heterogeneous features, and a single kernel function cannot sufficiently describe all the features embedded in the data/signals. A multi-kernel function can be used to compensate for the shortcomings of a single kernel function, and a new dimensionality reduction method, named multi-KNMF, is proposed here.
In the multi-KNMF, the kernel function is composed of at least two different kernel functions. Clearly, the construction of the multi-kernel function is the key to the multi-KNMF. The most convenient method is the convex combination of the different kernel functions
where the kernel function
where
Commonly used kernel functions include the linear kernel, the radial basis function kernel, the polynomial kernel, the sigmoid kernel, and so on, and these functions can be divided into two groups according to their feature description capabilities. The first group is the global kernel function, which describes large-scale features in the data, while the other is the local kernel function, which presents small-scale features. The Gaussian radial basis function kernel is the typical local kernel function
Comparatively, the polynomial kernel function is one of the global kernel functions
Combining the global and local kernel functions, the multi-kernel function can describe both the large- and small-scale features, which a single kernel function cannot. Figure 1 illustrates a Gaussian radial basis function kernel, a polynomial kernel, and the combination of both kernel functions. The center of the test is 0.05. It can be seen that the amplitude variation of the radial basis function kernel is concentrated in the area around zero when the kernel parameter σ varies. By comparison, the amplitude of the polynomial kernel varies sharply in the areas away from zero when the kernel parameter

Combination of the Gaussian and polynomial kernel function: (a) Gaussian kernel function, (b) polynomial kernel function, and (c) combination of both kernel functions.
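The combination of a local and a global kernel discussed above can be sketched as follows. This is an illustration under stated assumptions: the Gaussian kernel is written as exp(-||x - y||² / (2σ²)) and the polynomial kernel as (x·y + c)^d, which are common forms but may differ from the exact parameterization used in the article, and the names `sigma`, `degree`, `offset`, and `weight` are hypothetical.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma):
    """Local kernel: exp(-||x - y||^2 / (2 sigma^2)) for all row pairs of X and Y."""
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Y ** 2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def polynomial_kernel(X, Y, degree, offset=1.0):
    """Global kernel: (x . y + offset)^degree for all row pairs of X and Y."""
    return (X @ Y.T + offset) ** degree


def multi_kernel(X, Y, sigma, degree, weight, offset=1.0):
    """Convex combination of the polynomial (global) and Gaussian (local) kernels.

    weight in [0, 1] is the share given to the polynomial kernel (an assumed
    convention for this sketch).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1] for a convex combination")
    return (weight * polynomial_kernel(X, Y, degree, offset)
            + (1.0 - weight) * gaussian_kernel(X, Y, sigma))


if __name__ == "__main__":
    X = np.random.default_rng(0).random((5, 3))  # 5 samples, 3 features
    K = multi_kernel(X, X, sigma=0.5, degree=2, weight=0.3)
    print(K.shape)
```

Since a convex combination of valid kernels is itself a valid kernel, the combined matrix stays symmetric and positive semi-definite, so it can be passed to any method that only needs inner products in the kernel space.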
Based on the above analysis, a multi-kernel function is constructed here as the linear summation of the polynomial kernel and the Gaussian radial basis function kernel in order to capture the various features embedded in the data
where
It can be seen that the performance of multi-kernel function
Multi-KNMF
Based on the multi-kernel function constructed from the Gaussian kernel and the polynomial kernel, a novel fault diagnosis method that combines the multi-KNMF method and the multi-kernel support vector machine (SVM) is proposed in this section. The proposed method is outlined as follows:
For a given data set
Step 1: Initialize the kernel parameter
Step 2: Factorize the training set
Step 3: Input the coefficient matrix
Step 4: Optimize parameters of the multi-KNMF and the multi-kernel SVM classifier on the maximum of the classification accuracy of the training set. Generally, the ranges of
Step 5: The genetic algorithm is used to optimize all these parameters with the following settings: the initial generation is created randomly with a population of 20 individuals. The 60% of the generation with the highest classification accuracy is selected for the crossover operation, and the mutation probability is 0.1. The convergence threshold is 0.0001, and the maximum number of generations is 1000.
Step 6: Map the test data onto the base matrix obtained by the multi-KNMF on the training data with the optimal parameters. Then, input the corresponding coefficient matrix to the multi-kernel SVM classifier with the optimal parameters and classify the test data.
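The genetic search in Step 5 can be sketched as below, using the stated settings (population of 20, top 60% used for crossover, mutation probability 0.1, convergence threshold 0.0001, at most 1000 generations). Everything else is an assumption made for illustration: the blend crossover, the uniform-reset mutation, the elitism, the `patience` stopping rule, and the toy `fitness` function, which stands in for the training-set classification accuracy that the proposed method maximizes.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, select_frac=0.6,
                   mutation_prob=0.1, tol=1e-4, max_gen=1000,
                   patience=20, seed=0):
    """Maximize fitness(params) over box bounds with a simple genetic algorithm.

    Crossover, mutation, elitism, and the patience rule are assumptions of
    this sketch; only the numeric settings come from the text.
    """
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    best_fit = fitness(best)
    n_parents = max(2, int(round(select_frac * pop_size)))
    stall = 0
    for _ in range(max_gen):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:n_parents]      # fittest 60% take part in crossover
        children = [ranked[0][:]]         # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            alpha = rng.random()
            # Blend crossover: a convex combination stays inside the bounds.
            child = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_prob:
                    child[i] = rng.uniform(lo, hi)  # uniform-reset mutation
            children.append(child)
        population = children
        gen_best = max(population, key=fitness)
        gen_fit = fitness(gen_best)
        # Count generations whose improvement is below the threshold.
        stall = stall + 1 if gen_fit - best_fit < tol else 0
        if gen_fit > best_fit:
            best, best_fit = gen_best, gen_fit
        if stall >= patience:
            break
    return best, best_fit


def toy_fitness(params):
    """Placeholder objective with its maximum (0) at (0.3, 2.0)."""
    x, y = params
    return -((x - 0.3) ** 2 + (y - 2.0) ** 2)


if __name__ == "__main__":
    params, score = genetic_search(toy_fitness, bounds=[(0.0, 1.0), (0.0, 5.0)])
    print(params, score)
```

In the actual method, `fitness` would factorize the training set with the candidate kernel parameters, train the classifier on the resulting coefficients, and return the classification accuracy.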
Figure 2 illustrates the proposed diagnosis method, which combines the multi-KNMF and the multi-kernel SVM. It can be seen that the parameters of both the multi-KNMF and the multi-kernel SVM are optimized together by the genetic algorithm. Optimization of the multi-KNMF finds the optimal input vector for the classifier, while optimization of the multi-kernel SVM finds the optimal classifier for the input vector. Optimizing both parts together yields a classification system in which the input and the classifier are best matched to each other. Therefore, the proposed method tends to classify the data using the most comprehensive features. It is worth noting that the multi-kernel function used in the multi-KNMF can be the same as, or different from, the one used in the multi-kernel SVM. Ideally, each feature of the classifier input could be mapped to the kernel space by an independent kernel function. However, this would be time and memory consuming in training and is left for future investigation.

Fault diagnostic sketch based on the combination of the multi-kernel NMF and multi-kernel SVM.
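To make the factorization and mapping steps of the method concrete, the sketch below factorizes data in the kernel space using only a non-negative kernel matrix and then maps new samples onto the learned base. It is not the article's KNMF formulation, whose equations are not reproduced here; it swaps in the convex-NMF-style multiplicative updates of Ding, Li, and Jordan, in which the base is a non-negative combination of the mapped training samples. The names `kernel_convex_nmf`, `project`, `W`, and `G` are assumptions of this sketch.

```python
import numpy as np


def objective(K, W, G):
    """||phi(X) - phi(X) W G^T||^2 expressed through the kernel matrix K."""
    KW = K @ W
    return float(np.trace(K) - 2.0 * np.trace(G.T @ KW)
                 + np.trace((W.T @ KW) @ (G.T @ G)))


def kernel_convex_nmf(K, rank, n_iter=300, seed=0, eps=1e-12):
    """Factorize phi(X) ~ phi(X) W G^T with W, G >= 0, given K = phi(X)^T phi(X).

    Requires an element-wise non-negative K (for example, Gaussian or
    polynomial kernels on non-negative data). Rows of G are the
    reduced-dimension features of the training samples.
    """
    if np.any(K < 0):
        raise ValueError("this sketch assumes an element-wise non-negative kernel")
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    W = rng.random((n, rank)) + eps
    G = rng.random((n, rank)) + eps
    history = [objective(K, W, G)]
    for _ in range(n_iter):
        KW = K @ W
        G *= np.sqrt(KW / (G @ (W.T @ KW) + eps))
        KG = K @ G
        W *= np.sqrt(KG / (K @ W @ (G.T @ G) + eps))
        history.append(objective(K, W, G))
    return W, G, history


def project(K_new, K, W, n_iter=300, seed=0, eps=1e-12):
    """Map new samples onto the learned base with W fixed.

    K_new holds kernel values between new samples (rows) and training
    samples (columns).
    """
    rng = np.random.default_rng(seed)
    G_new = rng.random((K_new.shape[0], W.shape[1])) + eps
    numerator = K_new @ W
    WKW = W.T @ K @ W
    for _ in range(n_iter):
        G_new *= np.sqrt(numerator / (G_new @ WKW + eps))
    return G_new


if __name__ == "__main__":
    X = np.random.default_rng(2).random((12, 4))
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = 0.5 * np.exp(-sq / 2.0) + 0.5 * (X @ X.T + 1.0) ** 2
    W, G, history = kernel_convex_nmf(K, rank=3)
    print(G.shape, history[0], history[-1])
```

The rows of `G` (and of `project(...)` for test data) would serve as the low-dimensional inputs to the classifier; the factorization never needs the explicit mapping, only kernel values.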
Case study
Case 1: bearing fault diagnosis
In this section, vibration signals from the Bearing Data Center of Case Western Reserve University were used to validate the efficacy of the proposed method. The test rig was composed of an electric motor, a torque transducer, a dynamometer, and the control part. The drive end was supported by a deep groove ball bearing of type 6205-2RS JEM SKF, whose geometry is listed in Table 2. In order to simulate bearing faults, grooves with a depth of 11 mils (279.4 μm) were seeded on the surface of the inner race, outer race, or rolling element. The diameter of the grooves was set to 7 mils (177.8 μm), 14 mils (355.6 μm), or 21 mils (533.4 μm) to represent different degrees of fault.
Geometry size of the drive end bearing (mm).
An acceleration transducer was mounted on the bearing housing at the drive end. The vertical bearing vibration, produced by various combinations of loads (0, 1, 2, and 3 hp), fault types, and fault degrees, was collected by a 16-channel DAT data acquisition recorder at a sampling frequency of 48 kHz while the shaft rotated at a frequency of 30 Hz. In all, 40 groups of vibrations were recorded, each with a length of 10 s. Ten groups collected at the load of 3 hp were randomly selected and divided into nine sets of samples with a length of 2048 points, based on the different combinations of fault type and degree. Detailed descriptions of the nine sets are listed in Table 3. The numbers in the name of each set denote the diameter of the groove on the rolling element, the inner race, and the outer race. For example, D070707 denotes a data set produced by grooves with a diameter of 7 mils on the rolling element, inner race, and outer race, respectively. DBALL, DINN, and DOUT denote data sets produced by different degrees of ball, inner race, and outer race faults, respectively.
Data set composed of various combinations of bearing faults.
N: normal condition; B: ball fault; I: inner race fault; O: outer race fault.
Figure 3 illustrates typical waveforms produced by different bearing faults. It can be seen that the vibration collected at the normal condition is very similar to that produced by the rolling element fault. In the validation, all samples were decomposed by the DB10 wavelet transform. As shown in Figure 4, the wavelet coefficients of four layers were concatenated into a waveform with a length of 8192 points, which was used as the input image of the proposed method. The results are listed in Table 4. It can be seen that all faults are accurately classified even when the size of the input vector is decreased from 8192 to less than 16. Only six fault features are used for the accurate classification of the ball faults of different degrees. Compared with the original size of the wavelet coefficient waveform, the dimensionality of the input vector is sharply reduced by the multi-KNMF. This shows that the proposed method can be efficiently applied to the fault diagnosis of rolling element bearings.

Waveforms collected at different bearing conditions: (a) normal, (b) rolling element fault, (c) inner race fault, and (d) outer race fault.

Wavelet coefficients of different bearing conditions: (a) normal, (b) rolling element fault, (c) inner race fault, and (d) outer race fault.
Optimal kernel parameters and corresponding average accuracy of the application of the proposed method for bearing fault diagnosis.
Case 2: rotor condition identification
The shaft orbit contains both the amplitude and phase information of the rotor and is more convenient for condition monitoring than a single amplitude curve or amplitude–frequency curve. However, the collected shaft orbit may not be completely closed and may be heavily disturbed by noise. To extract the orbit and to identify the conditions of the rotor automatically, the proposed method was applied in this section to rotor condition identification based on shaft orbit images.
A test rig developed by Xi’an Jiaotong University, composed of the drive system, control system, data acquisition system, and lubrication system, is shown in Figure 5. In order to simulate the eccentricity of the rotor, a screw of 10.5 g was mounted 0.08 m from the center of the disk in the radial direction. A plastic ring was mounted around the disk, so that rubbing between the disk and the plastic ring may occur when the distance between these two objects changes.

Rotor test rig.
Eddy current displacement sensors were placed in the vertical and horizontal directions to collect the displacement of the rotor. A sketch of the sensor placement is shown in Figure 6. On average, 210 orbit images of the rotor were created from the displacements in both directions at each condition. Containing

Schematic diagram of the rotor test rig.

Shaft orbit images of different rotor conditions: (a) normal condition, (b) rubbing, (c) eccentricity, (d) combination of rubbing and misalignment, and (e) combination of parallel and misalignment.
All orbit images were first normalized by transforming each orbit image into a vector with a dimension of 235,200 × 1 and scaling it so that the sum of all its elements equals one. The genetic algorithm was used to optimize the averaged accuracy of the condition identification. For comparison, an SVM classifier based on the SNMF was also applied to the condition identification of the rotor. Corresponding settings are listed as follows: the key dimension
Optimal kernel parameters and corresponding average accuracy of the application of the proposed method for rotor condition identification.
SNMF: sparse non-negative matrix factorization.
It can be seen that both methods achieve very high accuracy in the condition identification of the rotor. Compared with the fault diagnosis method based on the SNMF, the proposed method is generally more accurate even though the dimension of the input vector is reduced more sharply. When the shaft rotates at 1200 r/min, only 29 features are used by the proposed method and accurate identification is achieved. When the shaft rotates at 600 r/min, the accuracy of the proposed method is slightly lower than that of the method based on the SNMF. However, when the shaft rotates at 300 r/min, the proposed method shows an obvious improvement. It is worth noting that no feature selection method is involved in the proposed method. In summary, the proposed method is more powerful and concise than the method based on the SNMF.
Conclusion
In order to preserve as much fault information as possible during dimensionality reduction, a multi-KNMF was proposed on the basis of the combination of the radial basis function kernel and the polynomial kernel. By inputting the resulting feature vectors into a multi-kernel SVM classifier, a novel fault diagnosis method combining the multi-KNMF and the multi-kernel SVM was further proposed. The genetic algorithm was used to optimize the parameters involved in the dimensionality reduction and fault classification. Two experiments were used to validate the efficacy of the proposed method. The results show that the multi-KNMF can efficiently preserve the fault information in fault diagnosis. The proposed method shows very high accuracy in bearing fault diagnosis. Compared with the classifier based on the SNMF, the proposed method is more powerful in the condition identification of the rotor, even without a feature selection step.
