Abstract
Keywords
Introduction
The ball screw pair is a transmission mechanism that converts rotary motion into linear motion. It is characterized by high efficiency, high precision, and long service life, and it is widely used in CNC machine tools, servo drives, aerospace, precision instruments, medical equipment, and robots. However, under long-term high-speed, heavy-load, and other severe working conditions, the raceway contact surfaces of the ball screw pair develop wear and pitting. These faults seriously threaten the processing quality and operating safety of mechanical equipment, particularly in aviation. In high-precision machinery such as CNC machine tools, a failure can cause severe equipment damage or even personal injury, as well as economic losses. Therefore, attention must be paid to the reliability and safety of the ball screw pair, and fault diagnosis technology must be investigated to achieve condition-based maintenance and reduce the problems caused by ball screw pair faults. Ball screw pair fault diagnosis typically includes two steps: fault feature extraction and fault pattern recognition.
During the fault feature extraction stage, the vibration signal of the ball screw pair is analyzed in the time and frequency domains. For complex vibration signals, simple time-domain or frequency-domain analysis is not sufficient for noise reduction and fault feature extraction. Bin et al. 1 proposed a method that combines wavelet packet decomposition and empirical mode decomposition to extract fault features for rotating machinery fault diagnosis. Wen-Yi et al. 2 proposed a hybrid time-frequency technique based on the improved Morlet wavelet and auto terms window for fault feature extraction. Wang and Shao 3 combined the distance evaluation technique (DET) with Pearson correlation analysis to propose an improved hybrid feature selection technique (IHFST) for fault feature extraction. This method yields a subset of sensitive features from which correlated features have been removed. 3 These studies indicate that wavelet-based time-frequency methods, such as the continuous wavelet transform, can denoise the vibration signal of a ball screw and be used for fault feature extraction.
During the fault pattern recognition stage, data-driven methods are often used to identify fault feature vectors. The support vector machine (SVM) is a binary classification model. Its purpose is to find a hyperplane that separates the samples, a task that is converted into a convex quadratic programming problem that can be solved. The SVM thus not only achieves binary classification of signals but also turns the classification task into a tractable optimization problem. Deng et al. 4 employed advanced fuzzy entropy after empirical wavelet transform (EWT) as the feature vector. The authors input the feature vector into the SVM to complete the fault diagnosis of a bearing. 4 Ben Ali et al. 5 used empirical mode decomposition (EMD) to reduce the noise and extract energy entropy. The authors fed the extracted entropy into an artificial neural network (ANN) to classify bearing defects. 5 Benkedjouh et al. 6 combined the Isometric Feature Mapping reduction technique (ISOMAP) with support vector regression (SVR) to simulate degradation and predict the remaining useful life (RUL) of bearings. Li et al. 7 used support vector machines and Gaussian process regression (GP) for the fault diagnosis, health assessment, and remaining life prediction of ball screw pairs. Building on the wavelet packet transform (WPT), Hu et al. 8 proposed the improved WPT (IWPT) method. They extracted the best fault features from the signal using the IWPT and distance evaluation techniques. Then, the authors input the best fault features into an SVM combined with the AdaBoost algorithm to identify bearing faults. The neural network is a complex network system formed by a large number of interconnected neurons. SVMs and neural networks are widely used for mechanical fault diagnosis. However, the accuracy of pattern classification methods such as SVMs and neural networks is greatly affected by the quality of the fault features.
Because the early fault signal characteristics of ball screw pairs are weak, neural networks can easily become trapped in local optima, and their convergence is slow. Therefore, the diagnostic accuracy of these methods is generally low.
In recent years, neural networks have been widely employed in the field of fault diagnosis. Saravanan and Ramachandran 9 used the discrete wavelet transform (DWT) and an ANN to diagnose the early faults of gearboxes. Zhang et al. 10 proposed a ball screw state monitoring method based on a deep belief network and multi-sensor information fusion. The authors concluded that the proposed method has high accuracy and stability. Zhang et al. 11 used a dynamic cuckoo search algorithm to optimize the structure of a back-propagation neural network (BPNN). The authors effectively addressed two shortcomings of the BPNN: slow convergence and a tendency to become trapped in local optima.
More recently, the convolutional neural network (CNN), which integrates feature extraction and pattern recognition, has been widely developed. The CNN does not require a separate feature extraction step and has high recognition accuracy. A CNN comprises a feature extraction module and a classification module. Multiple convolutional and pooling layers extract the features of the input data, and failure mode classification is then achieved by a fully connected layer and a classifier. Islam and Kim 12 defined an evaluation index, the defect rate (DDR), which they used as the input of an adaptive deep convolutional neural network (ADCNN) to train the network and perform fault diagnosis. Abdeljaber et al. 13 used a one-dimensional convolutional neural network (1DCNN) to identify and locate bearing damage. The authors experimentally demonstrated that the method has high accuracy in damage detection, positioning, and quantification, and reported that it is robust and reaches high accuracy with little training. Guo et al. 14 proposed a new deep convolutional transfer learning network (DCTLN), which comprises two modules: domain adaptation and state recognition. For feature learning and health status recognition, the domain adaptation module helps the 1DCNN state recognition module learn domain-invariant features by maximizing the domain recognition errors and minimizing the distance between the probability distributions. Chen et al. 15 proposed a planetary gearbox failure mode classification method based on a CNN and the discrete wavelet transform (DWT). The method uses convolution to learn and identify features from the DWT coefficients. Then, a softmax regression model is used to distinguish between different health conditions.
In this paper, the vibration signal of the ball screw pair is taken as the research object, and a fault diagnosis method and the corresponding test method are proposed. The continuous wavelet transform (CWT) and a two-dimensional convolutional neural network (2DCNN) are combined to achieve fault diagnosis. First, the CWT is used to reduce the noise of the vibration signal and obtain the time-frequency graph that characterizes the fault information of the signal. Then, features of the time-frequency graphs are extracted by alternately connected convolutional and pooling layers. Finally, a fully connected layer and a softmax classifier complete the fault pattern classification of the ball screw pair.
Data acquisition experiment
Experimental setup
The ball screw pair fault diagnosis test device is shown in Figure 1. It primarily comprises a servo motor, control system, ball screw pair, screw fixed seat, screw support seat, magnetic powder brake, torque-speed sensor, and grating ruler. The grating ruler is located at the bottom of the cast iron platform. Three acceleration sensors are installed on the test stand. Sensor 1 is the vibration sensor near the motor end, installed on the exterior of the screw fixed seat. Sensor 2 is the wire-side vibration sensor, installed on the exterior of the nut. Sensor 3 is the remote motor end vibration sensor, installed on the interior of the screw support seat. The specific installation is shown in Figure 2.

The ball screw pair fault diagnosis test device: (a) front view of the test bench and (b) enlarged view of the test bench.

Sensor installation location: (a) vibration sensor near the motor end, (b) wire side vibration sensor, and (c) remote motor end vibration sensor.
Data description
According to the analysis of common ball screw pair faults, screw pitting fault, screw wear fault, screw fixing seat fault, and screw support seat fault were selected as the preset faults for fault diagnosis testing. The preset fault types are shown in Figure 3.

Preset fault types of the ball screw: (a) screw pitting fault, (b) screw wear fault, (c) screw fixing seat fault, and (d) screw support seat fault.
The input torque of the ball screw pair fault test was 1 N·m, the input speed was 1200 r/min, and the data sampling frequency was 20 kHz. Vibration signals of the ball screw pair under the normal condition, screw pitting failure, screw wear failure, screw fixing seat failure, and screw support seat failure were obtained. The time-domain waveform of vibration signals of each state is shown in Figure 4. In Table 1, data sample distribution and category label definition of the ball screw pair are presented. Here, 20 samples were obtained for each state, amounting to a total of 100 samples with 20,000 data points in each sample.

Time-domain waveform of ball screw pair vibration signal: (a) normal, (b) screw pitting fault, (c) screw wear fault, and (d) screw fixing seat fault.
Data sample distribution and category label.
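The sampling figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the text does not say how the recorded signal was cut into samples, so the non-overlapping segmentation in `split_into_samples` is an assumption, and all names are hypothetical.

```python
# Acquisition parameters quoted in the text.
FS_HZ = 20_000              # sampling frequency
SPEED_RPM = 1200            # input speed of the screw
POINTS_PER_SAMPLE = 20_000  # data points in one sample
SAMPLES_PER_STATE = 20
N_STATES = 5                # normal state plus four preset faults


def split_into_samples(record, points_per_sample):
    """Cut a long record into consecutive, non-overlapping samples (assumed scheme)."""
    n = len(record) // points_per_sample
    return [record[i * points_per_sample:(i + 1) * points_per_sample]
            for i in range(n)]


sample_duration_s = POINTS_PER_SAMPLE / FS_HZ            # seconds covered by one sample
revs_per_sample = SPEED_RPM / 60.0 * sample_duration_s   # screw revolutions per sample
total_samples = SAMPLES_PER_STATE * N_STATES
```

Under these numbers, each sample spans one second of vibration, which corresponds to 20 revolutions of the screw.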
Fault diagnosis based on continuous wavelet transform and two-dimensional convolution neural network
Due to strong background noise and weak fault characteristics of the ball screw pair’s vibration signal, it is difficult to capture the internal rule of the fault state by only depending on the time domain or frequency domain signal information. Therefore, CWT and 2DCNN are employed in this paper to perform the fault diagnosis.
The flow of the proposed fault diagnosis method based on the CWT-2DCNN is shown in Figure 5. First, the CWT is used to analyze the vibration signal in the time-frequency domain, and the resulting time-frequency diagram is drawn and saved to a designated folder. Next, the 2DCNN model is constructed and trained with the time-frequency diagrams as the input. The CNN extracts the fault features of the time-frequency diagram of the ball screw pair through multiple convolutional and pooling layers and completes the fault classification with a softmax layer. Finally, the test signal is input into the trained 2DCNN model to diagnose the fault of the ball screw pair.

Fault diagnosis method based on CWT-2DCNN.
Input of 2DCNN
The CWT method is used to analyze the vibration signals of the ball screw pair in five states: normal, pitting failure of the screw, wear failure of the screw, loosening of the screw fixed seat bolt, and loosening of the screw support seat bolt. Time-frequency representations of the normal and fault states of the ball screw pair are shown in Figure 6. The distribution of signal amplitude over time and frequency differs between the health conditions of the ball screw pair, so the time-frequency representation characterizes the fault information more comprehensively. To reduce CNN computation, each time-frequency graph is saved as a JPG file with a size of 64 × 64 and named according to the category label of the ball screw pair.

Time-frequency diagram of ball screw pair: (a) normal, (b) screw pitting fault, (c) screw wear fault, (d) screw fixing seat fault, and (e) screw support seat fault.
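The time-frequency analysis step can be illustrated with a direct, unoptimized CWT. This is a minimal sketch under stated assumptions: the text does not name the mother wavelet, so a complex Morlet wavelet with `w0 = 6` is assumed here, and the function names are hypothetical. Rendering the magnitudes as a 64 × 64 JPG image is omitted.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet wavelet; the small admissibility correction term is neglected."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_magnitude(signal, fs, freqs, w0=6.0):
    """Return |CWT| with one row per analysis frequency and one column per sample."""
    dt = 1.0 / fs
    n = len(signal)
    rows = []
    for f in freqs:
        scale = w0 / (2.0 * math.pi * f)      # scale whose centre frequency is f
        half = int(math.ceil(4.0 * scale / dt))  # truncate the wavelet at +/- 4 scales
        kernel = [morlet(k * dt / scale, w0).conjugate() * dt / math.sqrt(scale)
                  for k in range(-half, half + 1)]
        row = []
        for b in range(n):
            acc = 0j
            for k in range(-half, half + 1):
                idx = b + k
                if 0 <= idx < n:              # samples outside the record count as zero
                    acc += signal[idx] * kernel[k + half]
            row.append(abs(acc))
        rows.append(row)
    return rows


# Synthetic check: a 50 Hz sine sampled at 1 kHz (not the test-bench data).
fs = 1000
sine = [math.sin(2.0 * math.pi * 50.0 * i / fs) for i in range(200)]
freqs = [25.0, 50.0, 100.0]
scalogram = cwt_magnitude(sine, fs, freqs)
mid_column = [row[100] for row in scalogram]
dominant_freq = freqs[mid_column.index(max(mid_column))]
```

For real 20,000-point samples, an FFT-based implementation from a signal processing library would be far faster than this double loop.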
Structure of 2DCNN
The structure of the 2DCNN developed in this study is shown in Figure 7, and its details are presented in Table 2. CW represents the width of the convolution kernel (filter), CH represents the height of the convolution kernel, CN represents the number of convolution kernels, which equals the number of output feature maps, channel represents the depth of the input feature map of the current layer, and S represents the width of the pooling window of the pooling layer. Max pooling was used in this study, and strides represents the step by which the window moves.

Structure of 2DCNN model.
Structural parameters of 2DCNN model.
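The way kernel size and stride determine the size of each feature map can be sketched as follows. Only the 64 × 64 input size comes from the text; the layer list below is a hypothetical example and is not the configuration of Table 2.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Side length of the output of a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1


# HYPOTHETICAL layer stack: (layer type, window width, stride).
layers = [("conv", 3, 1), ("pool", 2, 2), ("conv", 3, 1), ("pool", 2, 2)]

size = 64                 # side length of the time-frequency image
sizes = [size]
for _, width, stride in layers:
    size = out_size(size, width, stride)
    sizes.append(size)
```

With these assumed layers, the side length shrinks from 64 to 62, 31, 29, and finally 14, which shows how pooling with a stride of 2 roughly halves the feature map at each stage.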
Convolution layers
Convolutional neural networks apply several convolution kernels to the input data to extract its features. The convolutional layer can therefore be regarded as the feature extraction layer, and it is an important part of the CNN. Because the same convolution kernel is applied at every position of the input within a layer, the layer shares its weights, which effectively reduces the number of training parameters. In the convolution operation, the kernel slides over the input data from left to right and from top to bottom with a certain step size (stride), and the results form the output feature map. Typically, multiple convolution kernels are used to extract multi-dimensional characteristics of the signal.
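The convolution calculation is expressed as follows (standard form):
$$x_j^{l} = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\right)$$
where $x_j^{l}$ is the $j$-th output feature map of layer $l$, $M_j$ is the set of input feature maps, $x_i^{l-1}$ is the $i$-th input feature map, $k_{ij}^{l}$ is the convolution kernel, $b_j^{l}$ is the bias, $*$ denotes the convolution operation, and $f(\cdot)$ is the activation function.
The ReLU function is defined as:
$$f(x) = \max(0, x)$$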
For positive inputs, the ReLU function is linear with a slope of 1, so its gradient does not shrink as it is propagated backward. Therefore, it not only converges quickly but also effectively mitigates the vanishing gradient problem.
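The convolution and ReLU steps described above can be sketched for a single input channel and a single kernel. This is a minimal illustration rather than the network used in the paper: it uses the cross-correlation form that deep learning frameworks call convolution, applies no padding, and the kernel values are arbitrary.

```python
def relu(x):
    """Rectified linear unit: pass positive values, clamp the rest to zero."""
    return x if x > 0 else 0.0


def conv2d_valid(image, kernel, bias=0.0, stride=1):
    """Slide the kernel left to right and top to bottom, then apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = bias
            for i in range(kh):
                for j in range(kw):
                    acc += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(relu(acc))
        out.append(row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[-1, 0],
          [0, 1]]          # arbitrary example weights
feature_map = conv2d_valid(image, kernel)
clipped_map = conv2d_valid(image, kernel, bias=-5.0)
```

The same four kernel weights are reused at every position, which is the weight sharing mentioned above. With the negative bias, every pre-activation value becomes negative and ReLU sets the whole map to zero.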
Pooling layers
Pooling is the down-sampling of the feature map of the convolutional layer. It reduces the size of the feature map and the complexity of the network, compresses the features, and retains the main ones, which speeds up learning in the subsequent layers. As shown in Figure 8, two pooling methods are common: (1) max pooling, which outputs the maximum value of the data in the pooling window; and (2) average pooling, which outputs the average value of the data in the pooling window.

Schematic diagram of pooling calculation.
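The pooling operation is expressed as follows (max pooling, standard form):
$$p_j^{l}(m, n) = \max_{(u, v) \in R_{m,n}} x_j^{l}(u, v)$$
where $x_j^{l}$ is the $j$-th feature map of layer $l$, $R_{m,n}$ is the pooling window that corresponds to the output position $(m, n)$, and $p_j^{l}$ is the pooled feature map. For average pooling, the maximum is replaced by the mean over the window.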
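The two pooling methods described above can be compared on a small feature map. This is a minimal sketch with arbitrary values; the function name and arguments are hypothetical.

```python
def pool2d(feature_map, width, stride, mode="max"):
    """Down-sample a 2D feature map with a square pooling window."""
    out_h = (len(feature_map) - width) // stride + 1
    out_w = (len(feature_map[0]) - width) // stride + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            window = [feature_map[r * stride + i][c * stride + j]
                      for i in range(width) for j in range(width)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        out.append(row)
    return out


fmap = [[1, 3, 2, 4],
        [5, 6, 7, 8],
        [9, 2, 1, 0],
        [3, 4, 5, 6]]
max_pooled = pool2d(fmap, width=2, stride=2, mode="max")
avg_pooled = pool2d(fmap, width=2, stride=2, mode="avg")
```

Both variants reduce the 4 × 4 map to 2 × 2. Max pooling keeps the strongest response in each window, while average pooling smooths it.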
Fully-connected layer
The fully connected layer is equivalent to the multi-layer perceptron in a traditional neural network: each neuron is connected to all neurons in the previous layer. The input of the fully connected layer is a one-dimensional vector, so the output of the last pooling layer must be flattened into a one-dimensional vector. The fully connected layer then integrates and classifies the feature information, and it can be understood as a simple multi-class neural network.
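The fully-connected layer is expressed as follows (standard form):
$$y = f(Wx + b)$$
where $x$ is the flattened input vector, $W$ is the weight matrix, $b$ is the bias vector, $f(\cdot)$ is the activation function, and $y$ is the output vector.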
The CNN places a classifier after the fully-connected layer to carry out classification. The traditional logistic regression classifier mainly solves binary classification problems, while the softmax classifier generalizes logistic regression to multiclass problems. Therefore, a combination of the fully-connected layer and the softmax classifier is selected in this paper to perform classification.
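The flattening, fully-connected, and softmax steps described above can be sketched as follows. The weights are arbitrary example values, and three output classes are used only to keep the example short; the paper distinguishes five states.

```python
import math


def flatten(feature_maps):
    """Convert a list of 2D feature maps into one 1D vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def dense(x, weights, biases):
    """Fully-connected layer without activation: one weight row per output neuron."""
    return [sum(w * xi for w, xi in zip(w_row, x)) + b
            for w_row, b in zip(weights, biases)]


def softmax(z):
    """Turn scores into probabilities; subtracting the max avoids overflow."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


pooled = [[[1.0, 2.0], [3.0, 4.0]]]           # one 2 x 2 pooled feature map
x = flatten(pooled)
weights = [[0.1, 0.0, 0.0, 0.0],              # HYPOTHETICAL weights, 3 classes
           [0.0, 0.0, 0.0, 0.5],
           [0.0, 0.2, 0.0, 0.0]]
biases = [0.0, 0.0, 0.0]
scores = dense(x, weights, biases)
probs = softmax(scores)
predicted_class = probs.index(max(probs))
```

The softmax output sums to one, and the predicted class is simply the index of the largest probability.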
Model optimization based on BN algorithm
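The 2DCNN model constructed in this paper includes a batch normalization (BN) layer. This layer effectively suppresses the internal covariate shift of the CNN, improves the convergence speed of the model, and enhances its generalization ability. The BN algorithm is similar to data standardization: it reduces the shift of covariates within the network and speeds up training. However, standardization alone limits the input to a narrow range, which reduces the expressive ability of the network. Therefore, a scaling parameter $\gamma$ and a shift parameter $\beta$, both learned during training, are introduced to restore it.
The mean $\mu_B$ and variance $\sigma_B^2$ of a mini-batch $B = \{x_1, \dots, x_m\}$ are calculated as follows:
$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2$$
The input signal $x_i$ is normalized as follows:
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}}$$
The final batch normalized output is as follows:
$$y_i = \gamma \hat{x}_i + \beta$$
where $m$ is the mini-batch size and $\varepsilon$ is a small constant that prevents division by zero.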
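The batch normalization steps described above can be sketched for one feature over one mini-batch. This is a minimal training-mode illustration in the standard formulation; the parameter names and the default `eps` are assumptions, and in a real network `gamma` and `beta` would be learned.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize a mini-batch, then apply the scale (gamma) and shift (beta)."""
    m = len(batch)
    mean = sum(batch) / m
    var = sum((x - mean) ** 2 for x in batch) / m    # biased (population) variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


batch = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(batch)                        # zero mean, near-unit variance
rescaled = batch_norm(batch, gamma=2.0, beta=3.0)     # mean moved to beta

norm_mean = sum(normalized) / len(normalized)
norm_var = sum((v - norm_mean) ** 2 for v in normalized) / len(normalized)
rescaled_mean = sum(rescaled) / len(rescaled)
```

Setting `gamma` and `beta` to values other than 1 and 0 shows how the two parameters let the layer move the standardized values out of the narrow range mentioned above.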
Adam parameter optimization algorithm
In the backward propagation process of the CNN, the derivative of the loss function is calculated with respect to each weight. Then, an optimization algorithm is used to update the weights. In this way, the value of the loss function is continuously reduced to approach the optimal solution of the network. For a deep CNN with a complex structure and many parameters and hyperparameters, stochastic gradient descent (SGD) can easily become trapped in a local optimum, which prevents the model from obtaining optimal classification results. Therefore, the Adam algorithm is selected in this paper. This algorithm is more robust to hyperparameter selection, which reduces the difficulty of CNN parameter adjustment.
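The Adam algorithm is defined as follows (standard formulation):
For network weights $\theta$, the gradient of the loss function at step $t$ is
$$g_t = \nabla_\theta J(\theta_{t-1})$$
where $J$ is the loss function and $\theta_{t-1}$ denotes the weights after step $t-1$.
The first moment and the second moment of step $t$ are updated as follows:
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$$
where $\beta_1$ and $\beta_2$ are the exponential decay rates of the first and second moment estimates, and $m_0 = v_0 = 0$.
First moment deviation and second moment deviation are corrected as follows:
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$
Finally, network weights are updated as follows:
$$\theta_t = \theta_{t-1} - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \varepsilon}$$
where $\alpha$ is the learning rate and $\varepsilon$ is a small constant that prevents division by zero.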
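The Adam update can be illustrated on a one-parameter toy problem. This is a minimal sketch of the standard Adam formulation, not the training code of the paper; the quadratic loss, the learning rate of 0.1, and the step count are arbitrary choices for the example.

```python
import math


def adam_minimize(grad, theta0, lr=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Minimize a scalar function given its gradient, using Adam."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g          # first moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction of the first moment
        v_hat = v / (1 - beta2 ** t)             # bias correction of the second moment
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy loss J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta_star = adam_minimize(lambda th: 2.0 * (th - 3.0),
                           theta0=0.0, lr=0.1, steps=500)
```

Because the step is scaled by the ratio of the two moment estimates, the first update has a magnitude close to the learning rate regardless of the gradient scale, which is one reason Adam is less sensitive to hyperparameter choices than plain SGD.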
Experimental results
Fault diagnosis results
Figure 9 shows the fault diagnosis results for the ball screw pair based on the CWT-2DCNN. According to Figure 9, the recognition accuracy of the network model increased as training progressed. When the network had been trained 18 times (six iterations), the 2DCNN model began to converge. At the same time, the loss values of the training set and the test set decreased rapidly, finally reaching 0.0015 and 0.0255, respectively. The training and testing accuracy both reached 100%, which indicates that the proposed CWT-2DCNN method can accurately identify all the states of the ball screw pair.

Fault diagnosis results for ball screw pair based on CWT-2DCNN: (a) training and test recognition accuracy and (b) training and test loss.
To verify the reliability of the CWT-2DCNN method for ball screw pair fault pattern recognition, 10 groups of tests were carried out using the ball screw pair fault test signal. The network recognition accuracy is shown in Figure 10. One of the 10 groups achieved a recognition rate of 96.67%, while the remaining nine groups achieved a recognition rate of 100%. The average recognition rate of the 2DCNN model was 99.667%. It can be concluded that the proposed CWT-2DCNN fault diagnosis method has high recognition accuracy. Moreover, the network converged after six iterations, so training of the model was relatively fast.

The recognition rate of 2DCNN model.
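The reported average follows directly from the ten group results, as the short check below shows. The size of each test group is not stated in the text; the value of 30 used here is an assumption, chosen only because one error in 30 samples reproduces the reported 96.67%.

```python
# Group recognition rates reported in the text (percent).
rates = [100.0] * 9 + [96.67]
average_rate = sum(rates) / len(rates)

# ASSUMPTION: 30 test samples per group (not stated in the text).
assumed_group_size = 30
rate_with_one_error = round(100.0 * (assumed_group_size - 1) / assumed_group_size, 2)
```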
t-SNE analysis
Feature vectors in the 2DCNN are high-dimensional, so the feature distribution of each layer cannot be displayed directly. Therefore, t-SNE was used to reduce the extracted high-dimensional features to three dimensions for visual analysis. Five different colors indicate the five states of the ball screw pair. The t-SNE feature visualization results of the input layer and the fully-connected layer are shown in Figure 11. According to Figure 11, the features at the input layer of the CNN are poorly separated: the features of the ball screw pair in different states overlap and are mixed, so they cannot be directly distinguished. However, after feature extraction by multiple convolutional and pooling layers, the feature spacing within the same state decreases, while the feature spacing between different states increases. Features of the same state gradually cluster, while those of different states gradually separate. This is useful for distinguishing different ball screw fault types. The CWT-2DCNN model constructed in this paper can therefore effectively extract discriminative feature information from the input datasets and recognize the fault patterns of ball screws.

Characteristic distribution of 2DCNN model: (a) input layer and (b) fully-connected layer.
Performance comparison
The 2DCNN, 1DCNN, and BPNN are compared and analyzed in this section. The fault features for the 2DCNN and 1DCNN are time-frequency domain graphs of the noise-reduced signal, whereas the fault features for the BP neural network are the full feature vectors in the time-frequency domain. A specific comparison between the three network models is presented in Table 3. The 2DCNN converges fastest: after 20 training cycles, the network became stable, and its accuracy was approximately 100%. The 1DCNN converges more slowly and is less stable, and its accuracy is only 60%. The BPNN also converges more slowly, and its recognition accuracy is 96.67%. The convolutional and pooling layers in the CNN extract the features of the input data directly. Therefore, the CNN avoids the uncertainty introduced by the manual extraction of fault features. In contrast, a traditional BPNN requires fault features to be extracted manually and assembled into a fault feature vector, which serves as the input for training and testing the network. The accuracy of BPNN recognition is greatly affected by the fault features, and extracting them manually requires experience. In summary, the two-dimensional CNN model proposed in this paper has advantages over the BPNN model: it does not require independent feature extraction, its convergence is faster, and its classification accuracy is higher, which means that fault diagnosis can be performed accurately for the ball screw pair.
Performance comparison of several networks.
According to Figure 12, the training/test accuracy of the 1DCNN increases slowly as training progresses. After 20 training cycles, the training/test accuracy of the 1DCNN fluctuates around 60%; the fluctuation range is large and the recognition rate is low. This is because the vibration signal characteristics of a ball screw pair are not obvious and are strongly affected by noise. In contrast, the training/test accuracy of the 2DCNN increases rapidly as training progresses and reaches nearly 100% after 20 training cycles. Therefore, the 2DCNN classifies the states of the ball screw pair more accurately, which verifies the validity of the CWT-2DCNN model.

Comparison of classification results between 2DCNN and 1DCNN.
Conclusions
The manual extraction of fault features required by traditional fault diagnosis methods leads to uncertainty in fault classification. To overcome this problem, a ball screw fault diagnosis method based on the CWT and 2DCNN is proposed in this paper. After noise reduction by the CWT, the time-frequency representation expresses the fault features of the vibration signal more accurately and comprehensively. The convolutional and pooling layers of the CNN were used to directly extract the time-frequency domain features of the vibration signal. Thus, the uncertainty caused by the manual extraction of fault features was avoided and accurate fault diagnosis was achieved. The proposed CWT-2DCNN fault diagnosis method has an average recognition rate of 99.67%. Compared with the 1DCNN and the traditional BPNN, the proposed method converges faster and has higher recognition accuracy.
