Introduction
With the increasing demand for applications and the continuous development of modern technology, industrial equipment is becoming increasingly complex and plays an important role in industrial production.1,2 However, industrial equipment often operates in a complicated working environment, which may degrade its performance and even cause equipment failure.
At present, researchers have proposed many methods for equipment fault diagnosis, including qualitative analysis–based methods,3,4 model-based methods,5,6 data-driven methods,7–10 and so on. The qualitative analysis–based methods, which analyze the causality and development law of equipment failure, rely on qualitative analysis tools combined with expert knowledge and system knowledge to carry out knowledge reasoning; they include graph theory methods, 11 expert systems,12–14 qualitative simulation, 15 and so on. These methods have some disadvantages. For instance, they cannot produce quantitative analysis results for equipment failure, the complexity of the graph model increases sharply as the device becomes more complicated, and knowledge acquisition may be difficult. The key to the model-based approach, which includes the state estimation method, 16 parameter estimation method, 17 parity space method, 18 and so on, is to establish a mathematical model that is consistent with the running process of the equipment and to judge the running state and fault type of the equipment from the residual signal between the mathematical model and the observable measurements. This approach can diagnose equipment failure well. However, an accurate model may be difficult to establish when the equipment is complicated. The data-driven approach does not require a physical model of the device; it uses the data monitored during the operation of the equipment to diagnose the fault type. Examples include machine learning,19–22 multivariate statistical analysis, 23 signal processing, 24 rough sets,25,26 fuzzy sets, 27 and multi-sensor or multi-source information fusion methods.28–31
The multi-sensor information fusion–based method fuses the data of multiple sensors (or sources) and thus reflects the diversity, redundancy, and complementarity of multiple pieces of information. Therefore, it can obtain more reliable diagnostic results than a method relying on a single information source. A great deal of research based on this approach has been carried out. Wu et al. 32 proposed a framework of fault diagnosis based on Bayesian network (BN) for nuclear power plants. Within the framework, the data from multiple sensors were improved by fuzzy theory and data fusion. Zhang et al. 33 analyzed tunnel-induced pipeline damage based on fuzzy BN, and the proposed method was demonstrated on the construction of the Wuhan Yangtze River Tunnel. Huo et al. 34 proposed a new bearing fault diagnosis method based on weighted Dempster–Shafer (D-S) evidence theory combined with Genetic Algorithm (GA). Banerjee and Das 35 presented a new hybrid method based on information fusion for fault diagnosis, which combined the support vector machine (SVM) and short-time Fourier transform (STFT) techniques. Owing to the complexity of the working environment and the limitations of the sensors mounted on the equipment, the observation data often carry a certain ambiguity and uncertainty. Moreover, the relationship between the fault characteristics and the fault type may be complex. Hence, the fault diagnosis method should handle this uncertainty well. Compared with other methods, D-S evidence theory provides the basic probability assignment (BPA), which can effectively represent uncertainty and complex relationships.36–39 In addition, it provides the Dempster combination rule to fuse multiple pieces of information.
Many studies based on D-S evidence theory for fault diagnosis have been carried out. Wang and Xiao 40 and Xiao 41 proposed an improved multi-sensor data fusion method that combines Euclidean distance with belief entropy to fuse sensor data for fault diagnosis. Jiang et al. 42 proposed a novel fuzzy evidential method to analyze failure modes and effects. Gong et al. 43 built a triangular fuzzy function according to historical data of the symptoms and utilized BPA functions based on D-S evidence theory to diagnose faults of a nuclear power plant. Dong et al. 44 proposed a fault diagnosis method that generates weights from the diversity degree of sensor reports to combine multi-sensor information. Xiao 45 translated the credibility degree of evidence, calculated from the support degree of all evidence, into a weight, which was further adjusted by the information volume of the evidence. Chen et al. 46 proposed a weighted fault diagnosis method, which generated weights using evidence distance and uncertainty.
As far as we know, the above studies treated historical data equally and did not consider the impact of data acquisition time on the diagnosis results. In fact, the farther the acquisition time of the monitored data is from the current diagnosis time, the smaller its impact on the diagnosed fault type. Based on this, a fault diagnosis method based on time domain weighted data aggregation and information fusion is proposed in this article. First, the data from multiple sensors over a period of time are aggregated by a set of linear decaying weights, which ensure that the farther the data are from the current time, the smaller the weight. Next, a Gaussian fault model is constructed from the aggregated data. Compared with the Gaussian fault model constructed from the original data, the new fault model distinguishes better among fault features. Then, the intersections between the aggregated testing data and the Gaussian fault model are transformed into BPAs. Finally, these BPAs are discounted and fused based on D-S evidence theory, and the final BPA is transformed into a pignistic probability to decide the fault type. Because the proposed method aggregates data over time, it takes the influence of historical data into account and can therefore reflect the uncertainty of the fault data, and the fusion results also reflect the time effect. In addition, the original fault data are aggregated into a series of data over different lengths of time, which reduces the effect of individual extreme points on the diagnosis results and increases the robustness of the proposed method.
The rest of this article is organized as follows: in the “Preliminaries” section, the preliminaries about data aggregation and D-S evidence theory are introduced. In the “The proposed method” section, a fault diagnosis method based on time domain weighted data aggregation and information fusion is proposed. In the “Illustrative example and discussion” section, the proposed method is verified on motor rotor fault data and the results are analyzed. Conclusions are drawn in the “Conclusion” section.
Preliminaries
Data aggregation based on linear decaying weights
The data aggregation method based on linear decaying weights, proposed by Yager, 47 is an effective tool for data aggregation that reflects the impact of data acquisition time. The procedure for generating the linear decaying weights is shown in Figure 1.

Figure 1. Generation of linear decaying weights.
Suppose that the data of a sensor of device during a period of time are
where
where
Using the above weight generated method, the aggregated data are
When the length of time
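The aggregation idea can be illustrated with a minimal sketch. It assumes one simple form of linear decaying weights, in which the weight of an observation grows linearly with its recency and all weights sum to 1; this is an illustrative assumption rather than a reproduction of the weight formula in Yager's method. The function names `linear_decaying_weights` and `aggregate` and the sample readings are hypothetical.

```python
def linear_decaying_weights(n):
    """Weights for n observations ordered oldest -> newest.

    Assumed form (illustrative only): w_j = 2*j / (n*(n+1)) for j = 1..n,
    so the newest observation (j = n) gets the largest weight and the
    weights sum to 1.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    total = n * (n + 1) / 2
    return [j / total for j in range(1, n + 1)]


def aggregate(values):
    """Weighted average of values (oldest first) using the weights above."""
    weights = linear_decaying_weights(len(values))
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical sensor readings, oldest first; the last one is the current reading.
readings = [0.10, 0.12, 0.11, 0.30]
print(linear_decaying_weights(4))  # [0.1, 0.2, 0.3, 0.4]
print(aggregate(readings))
```

With such weights the aggregated value is pulled toward the most recent reading while still damping a single extreme observation.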
The Gaussian model
The monitored fault data of the equipment carry some uncertainty, and an error may occur when the state of the equipment is judged from a single observation value. Therefore, it is necessary to use a simple and accurate mathematical model for the fault data, which can effectively extract the essential characteristics of the data and then identify the fault type of the engine. Practice has shown that many of the random variables produced in everyday production and scientific experiments can be approximated by a Gaussian distribution. Hence, the Gaussian model is utilized to model the fault data. The detailed construction process is described in the following.
Suppose there are
When calculating
It should be noted that the extreme values of the generated Gaussian model may be different due to differences in fault data dimensions. In order to better establish the Gaussian fault model of the device, the generated Gaussian model is normalized in this article, described in the section “The proposed method.”
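A minimal sketch of such a model is given below. It assumes that the mean and the sample standard deviation (with n − 1 in the denominator) are estimated from the data and that normalization means rescaling the Gaussian curve so that its peak equals 1; both choices are assumptions made for illustration. The names `fit_gaussian` and `normalized_gaussian` and the sample values are hypothetical.

```python
import math
import statistics


def fit_gaussian(samples):
    """Return (mean, sample standard deviation) of the fault data."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)  # uses n - 1 in the denominator
    return mu, sigma


def normalized_gaussian(x, mu, sigma):
    """Gaussian curve rescaled so that its peak value is 1 (assumed normalization)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


# Hypothetical aggregated observations of one fault feature under one fault type.
samples = [0.16, 0.18, 0.17, 0.19, 0.15]
mu, sigma = fit_gaussian(samples)
print(normalized_gaussian(mu, mu, sigma))  # 1.0 at the mean
print(normalized_gaussian(0.20, mu, sigma))
```

Rescaling the peak to 1 puts the models of different fault features on a common scale, so that their ordinates can be compared directly.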
D-S evidence theory
D-S evidence theory is a mathematical theory of multi-source information proposed by Dempster 48 and extended by Shafer. 49 It generalizes probability theory and can effectively represent both uncertainty due to inaccuracy and uncertainty due to a complete lack of knowledge. This theory is widely used in fault diagnosis, 50 multiple criteria decision making,51–53 game theory,54,55 complex networks, 56 and so on. In this part, a few concepts of D-S evidence theory are given.
Assume the device has
And the power set of
When evidence theory is used for fault diagnosis, every proposition is a subset of the frame of discernment (FOD), that is, an element of the power set. The reliability of a proposition is determined by the BPA function, which is defined as follows:
Definition 1
Set
Then the mass function
In addition, D-S evidence theory provides the Dempster combination rule to fuse multi-source information. Let the two basic probability assignment functions be
In this equation,
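The combination rule can be sketched in a few lines. The sketch uses the classical form of Dempster's rule, in which the products of masses whose focal sets intersect in A are summed and then divided by 1 − K, where K is the total mass of the conflicting (empty-intersection) pairs. The representation of a BPA as a dictionary from `frozenset` to mass, the function name `dempster_combine`, the fault labels, and the numerical values are illustrative assumptions.

```python
def dempster_combine(m1, m2):
    """Combine two BPAs (dict: frozenset of fault types -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0  # K: total mass assigned to pairs with empty intersection
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b
    if 1.0 - conflict < 1e-12:
        raise ValueError("total conflict: the two BPAs cannot be combined")
    # Normalize by 1 - K so that the fused masses sum to 1.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


# Hypothetical BPAs from two fault features over the frame {F1, F2}.
F1, F2 = frozenset({"F1"}), frozenset({"F2"})
THETA = F1 | F2
m1 = {F1: 0.6, F2: 0.1, THETA: 0.3}
m2 = {F1: 0.5, F2: 0.2, THETA: 0.3}
fused = dempster_combine(m1, m2)
for subset, mass in fused.items():
    print(sorted(subset), round(mass, 4))
```

Because the rule is associative and commutative, more than two BPAs can be fused by applying the function repeatedly.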
The proposed method
In this article, a new method for engine fault diagnosis based on data aggregation, which considers the time of data acquisition, is proposed. The detailed description of the proposed method is shown in Figure 2, and the procedures are elaborated step by step in the following text.
Then, the data are aggregated with the aforementioned linear decaying weights over different lengths of time
where
For the fault feature
Based on the obtained mean and standard deviation, generate a Gaussian model of fault type

Figure 2. Flowchart of the proposed method.
Suppose the generated Gaussian model in step 2 of fault feature
When the test sample intersects a Gaussian model of a single fault type, the ordinate of the intersection point is the probability that the test sample belongs to the fault type.
If the test sample intersects the Gaussian models of multiple fault types, the higher intersection point represents support for a single fault type, and the lower intersection point represents support for multiple fault types. For example, there are two intersections in Figure 3. And the point
In addition, the masses of a BPA in D-S evidence theory must sum to 1. Therefore, if the sum of the generated reliability values is greater than 1, they are normalized; if it is less than 1, the remaining reliability is assigned to the complete set (the FOD).
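The generation rule described above can be sketched as follows. The sketch rests on one assumed reading of the rule: the intersection heights are ranked in descending order, the highest is assigned to the corresponding single fault type, and each lower height is assigned to the set of all fault types ranked so far. The function name `generate_bpa`, the fault labels, and the heights are hypothetical.

```python
def generate_bpa(heights, frame):
    """Turn intersection heights into a BPA (dict: frozenset -> mass).

    heights: dict mapping each fault type to the ordinate at which the test
             sample meets that type's (normalized) Gaussian model.
    frame:   iterable of all fault types (the FOD).
    """
    theta = frozenset(frame)
    # Rank the non-zero heights from highest to lowest.
    ranked = sorted(((h, f) for f, h in heights.items() if h > 0), reverse=True)
    bpa = {}
    focal = set()
    for h, f in ranked:
        focal.add(f)  # assumed reading: nested sets of the types ranked so far
        key = frozenset(focal)
        bpa[key] = bpa.get(key, 0.0) + h
    total = sum(bpa.values())
    if total > 1.0:
        # Sum exceeds 1: normalize.
        bpa = {s: v / total for s, v in bpa.items()}
    elif total < 1.0:
        # Sum below 1: assign the remainder to the complete set.
        bpa[theta] = bpa.get(theta, 0.0) + (1.0 - total)
    return bpa


frame = ["F1", "F2", "F3"]
bpa = generate_bpa({"F1": 0.7, "F2": 0.2, "F3": 0.0}, frame)
for subset, mass in bpa.items():
    print(sorted(subset), round(mass, 4))
```

In this example the test sample supports F1 with 0.7 and the pair {F1, F2} with 0.2, and the remaining 0.1 goes to the complete set.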

Figure 3. The generated Gaussian model under fault feature
Assume there are two fault types. If the aggregated test sample is located in the intersection area of the Gaussian models of the two fault types under a fault feature, it is impossible to distinguish which fault type the aggregated test sample belongs to. Thus, the indistinguishability of this fault feature for the two fault types can be expressed as
Therefore, the weight of the fault feature is
Similarly, we can get the weights of other fault features. After obtaining the weight of each fault feature, the BPAs under different fault characteristics generated in step 2 are multiplied by the corresponding weight to correct the reliability for fault type. Then, the Dempster combination rule, mentioned in the “Preliminaries” section, is used to fuse these corrected BPAs.
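The weighting step can be sketched as a discounting operation. The sketch assumes the classical Shafer discounting, in which every mass is multiplied by the weight and the mass removed in this way is moved to the complete set so that the result is still a valid BPA. The function name `discount`, the fault labels, and the numbers are hypothetical.

```python
def discount(bpa, weight, frame):
    """Discount a BPA (dict: frozenset -> mass) by a feature weight in [0, 1]."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    theta = frozenset(frame)
    # Scale every focal set except the complete set by the weight.
    discounted = {s: weight * v for s, v in bpa.items() if s != theta}
    # Assumption: the mass removed by discounting is moved to the complete set.
    discounted[theta] = 1.0 - sum(discounted.values())
    return discounted


frame = ["F1", "F2", "F3"]
bpa = {
    frozenset({"F1"}): 0.7,
    frozenset({"F1", "F2"}): 0.2,
    frozenset(frame): 0.1,
}
corrected = discount(bpa, 0.8, frame)
for subset, mass in corrected.items():
    print(sorted(subset), round(mass, 4))
```

A weight of 1 leaves the BPA unchanged, whereas a weight of 0 turns it into total ignorance, so a poorly discriminating fault feature contributes less to the fused result.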
Finally, the fault type of the test sample is determined. After the evidence is combined, the resulting information should be as reasonable and reliable as possible, and its uncertainty should be as low as possible. However, the BPA after fusion may still contain a certain degree of uncertainty, which is not conducive to decision making. Therefore, in this article, the Pignistic probability conversion method is utilized to convert the BPA into a probability distribution, from which the fault type of the test sample is judged. The Pignistic probability conversion is defined as follows.
Assuming that
where
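The conversion can be sketched briefly. The sketch uses the standard pignistic transformation, which splits the mass of every focal set equally among its elements, and assumes that no mass is assigned to the empty set. The function name `pignistic`, the fault labels, and the numbers are hypothetical.

```python
def pignistic(bpa):
    """Convert a BPA (dict: frozenset -> mass) into pignistic probabilities."""
    betp = {}
    for subset, mass in bpa.items():
        if not subset:
            raise ValueError("mass on the empty set is not supported in this sketch")
        share = mass / len(subset)  # split the mass equally among the elements
        for fault in subset:
            betp[fault] = betp.get(fault, 0.0) + share
    return betp


# Hypothetical fused BPA over the frame {F1, F2, F3}.
fused = {
    frozenset({"F1"}): 0.6,
    frozenset({"F1", "F2"}): 0.3,
    frozenset({"F1", "F2", "F3"}): 0.1,
}
betp = pignistic(fused)
decision = max(betp, key=betp.get)
print({f: round(p, 4) for f, p in sorted(betp.items())})
print("diagnosed fault type:", decision)
```

The fault type with the largest pignistic probability is taken as the diagnosis result.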
Illustrative example and discussion
To verify the effectiveness of the proposed method, motor rotor fault diagnosis is used as an example in this section. The equipment is a multi-functional flexible rotor test bed, and the fault data come from rotor vibration signals collected and extracted by a displacement sensor and an acceleration sensor. For the equipment, three fault types are configured, namely rotor imbalance, rotor misalignment, and support base loosening, represented by
The vibration energy of the three faults is mostly concentrated on the basic frequency
The fault data utilized in this article are from Wen and Xu. 2 The detailed fault diagnosis procedure is described and verified in the following text.
Step 1: aggregate the fault data
For each fault feature under each fault type, 40 observations were continuously collected in
Then, aggregate the above fault data using the linear decaying weights. When the length of the historical data is taken, the aggregated data are
Comparing the fault data before and after the aggregation, it can be seen that the aggregated data are closer to the value at the diagnosis point and are therefore more informative for fault diagnosis at the current time. It can be inferred that diagnosing the fault type from the aggregated fault data may yield a higher accuracy, which is verified below.
Step 2: generate Gaussian model for aggregated fault data
Following the procedure in the “The proposed method” section, the mean values and standard deviation values for all fault features are calculated and shown in Table 1. The Gaussian models generated from these mean values and standard deviation values are shown in Figure 4. For comparison, the Gaussian models generated from the original data are shown in Figure 5.
Table 1. Mean values and standard deviation values of the fault features.

Figure 4. The generated Gaussian model for aggregated data under (a) fault feature

Figure 5. The generated Gaussian model for original data under (a) fault feature
Taking the Gaussian model of fault feature
In addition, the discriminability of different fault features is verified by statistical methods 59
where
where
The smaller the value of
Based on the above, we can obtain the value of
The
Step 3: generate the BPAs for aggregated testing data
First, the testing data are aggregated using a set of linear decaying weights. Then, aggregated testing data are matched with the generated Gaussian model to generate BPAs using the method mentioned in the “The proposed method” section.
For instance, one aggregated testing sample of fault type
The BPAs of the aggregated testing sample for fault features.
BPA: basic probability assignment.
Step 4: fuse the generated BPAs and make decision
In this step, first, calculate the weights (discrimination) of all fault features using the method mentioned in the “The proposed method” section. The result is shown in Table 4.
Table 4. The weights of the fault features.
Then, the BPA under each fault feature, generated in the “Step 3: generate the BPAs for aggregated testing data” section, is multiplied by the corresponding weight to discount it, and the discounted BPAs are fused using the Dempster combination rule. The final BPA is
Finally, the final BPA is converted into a probability distribution using the Pignistic probability conversion method to make a decision
Based on the above, the probability of fault type
To further illustrate the significance of the proposed method, the diagnosis results are compared with those of Jiang et al., 28 as shown in Table 5. In Table 5, all fault types are correctly recognized with high reliability by both the proposed method and the method of Jiang et al. 28
Table 5. Comparison of the fusion results of the proposed method and the other method.
BPA: basic probability assignment.
Finally, following the above steps, the fault identification results of all test samples are obtained and shown in Table 6. From Table 6, it can be seen that the recognition rate of fault type
Table 6. The recognition rate for different fault types.
The above results show that the proposed method can be effectively utilized for fault diagnosis of the engine. In practical applications, the sensors mounted on the equipment monitor it over a long period, and the monitored data are transmitted to the processing software. The proposed method is then applied on the computer to analyze the monitoring data and obtain the current state of the engine. In addition, to improve the accuracy of fault identification, the monitored data are analyzed multiple times, and a threshold is set according to domain knowledge. When the results of multiple analyses exceed the threshold, the fault type can be confirmed.
Conclusion
In this article, a fault diagnosis method based on time domain weighted data aggregation and information fusion has been proposed. First, the fault data are aggregated over different lengths of time, which is more effective for fault diagnosis than single-point diagnosis. Second, Gaussian models based on the aggregated fault data are constructed for the different fault features. Third, BPAs are generated from the intersections between the aggregated testing sample and the constructed Gaussian models. Finally, the BPAs are fused with different weights for different features based on D-S evidence theory. In addition, the method has been verified and analyzed on motor rotor fault data. The proposed method was compared with an existing method, and the results show that it identifies faults correctly with higher reliability. The diagnosis results for the fault testing data before and after aggregation were also compared. The results of this example indicate that the Gaussian fault model generated by the proposed method has better distinguishing ability and that the total recognition rate is improved by 1.9%. This approach has potential in areas such as risk assessment. In future research, more factors influencing fault diagnosis will be considered.
