A complementary approach for fault diagnosis of rolling bearing using canonical variate analysis based short-time energy feature

Abstract

Signal decomposition is an effective methodology that is widely used for fault diagnosis. Because decomposition often produces more modes than are needed (over-decomposition), mode/feature selection is an unavoidable step in fault diagnosis of rolling bearing. In practice, selecting the sensitive modes is challenging, and many valuable works have addressed it. However, the published works lack an effective approach that yields a few meaningful modes without complicated mode selection procedures prior to feature extraction. Moreover, selecting only the modes of interest fails to take the residual part into account, which makes the diagnosis result sensitive to the number of modes/features retained. This paper proposes a complementary approach that extracts fault features while avoiding the selection of a single mode of interest. It employs canonical variate analysis to convert the original variables into two complementary spaces: the canonical variate space and the residual space. The complementary statistical indicators, the Hotelling T² statistic and the Q statistic, are then used to provide information about the condition of the rolling bearing. Subsequently, a new feature index, the complementary short-time energy, is extracted from the two statistics and used as the fault feature, which is given as input to a classifier such as a support vector machine. Two data sets collected from different test rigs are used to demonstrate the proposed work. The experimental results show that the troublesome feature/mode selection issue is avoided and that the diagnosis result is not sensitive to the number of canonical variates retained. In addition, the proposed approach accurately identifies the various working conditions of the rolling bearing, and it is simple and effective compared with the existing methods.

Keywords

Canonical variate analysis, feature extraction, complementary short-time energy, bearing fault diagnosis, feature/mode selection

1. Introduction

Time–frequency analysis is a promising method for decomposing an original signal into its embedded modes. The outcome of signal decomposition is of great importance for fault diagnosis of rolling bearing. Generally speaking, fault diagnosis of rolling bearing includes three phases, namely, feature extraction, mode/feature selection and pattern recognition (Van and Kang, 2015). In addition to the components of interest, the collected vibration signal also comprises environmental noise. In order to acquire useful information from the collected signal to separate fault patterns, the original signal is first decomposed into a number of modes, from which those carrying useful information about the various operating conditions are then selected. Although the commonly employed signal decomposition methods, like empirical mode decomposition (EMD) (Van and Kang, 2015), local mean decomposition (Sun et al., 2016), wavelet packet decomposition (Tao et al., 2013), and singular value decomposition (Muruganatham et al., 2013), are suitable for fault diagnosis of rolling bearing, mode/feature selection is a challenging task in many practical applications. To remove redundancy and noise so as to obtain a minimal number of relevant modes, an automatic method based on combined mode functions (Grasso et al., 2016), each of which denotes a sum of adjacent intrinsic mode functions (IMFs), was proposed to enhance the features of the original signal. In some cases, intelligent algorithms are adopted to select the optimal subset from the decomposed components, such as particle swarm optimization (Van and Kang, 2015), attribute clustering (Cerrada et al., 2016a), support vector machine recursive feature elimination (Yan and Zhang, 2015) and genetic algorithm (Cerrada et al., 2016b). Other criteria presented for selecting the most appropriate modes include the Kullback–Leibler divergence based adaptive selection method (Sun et al., 2015) and the kurtosis based method (Georgoulas et al., 2013).
Indeed, these techniques are effective for bearing fault diagnosis and the results obtained are satisfactory. However, they share a common manner of mode selection: a feature index of the rolling bearing is extracted from the modes of interest according to a mode selection criterion, while the residual components are ignored. Because only the modes of interest are considered for feature extraction and the rest are excluded from further estimation, these techniques are sensitive to the number of components retained. Moreover, this manner fails to estimate the original data comprehensively and therefore readily causes information loss. It is worth pointing out that important information regarding the bearing fault may exist in the removed components. As mentioned above, mode selection is a challenging task, and introducing extra approaches (e.g., optimization algorithms) for mode selection makes fault diagnosis of rolling bearing computationally intensive. Last but not least, mode/feature selection approaches usually rely on an expert’s knowledge and are difficult to implement in an optimal manner, so a way to simplify or avoid the selection of modes of interest is desirable. Thus, further study towards an automatic approach that avoids mode selection is required.

In fact, the aim of mode selection is to choose the important components from a series of modes for feature extraction as well as dimension reduction. An alternative to mode selection is feature selection: many meaningful works based on feature selection approaches have been presented for fault diagnosis of rolling bearing, such as principal component analysis (PCA) (Xu and Chen, 2013), kernel principal component analysis (Liu et al., 2016) and the distance estimation (DE) technique (Liu et al., 2013). Though the effective features can be retained depending on selection criteria such as cumulative percent variation and eigenvalue, the diagnosis results are sensitive to the number of features retained. Canonical variate analysis (CVA) is a state space-based dynamic tool, which was first introduced by Hotelling, and then developed by Akaike, Larimore, and Odiowei (Odiowei and Cao, 2010). Moreover, CVA based on correlation statistics has been used for fault detection in many applications such as chemical and polymer processes (Gajjar and Palazoglu, 2016; Yin et al., 2015). It requires a group of data sets recorded under normal condition to train a reference model so that the resulting model captures the healthy condition. Furthermore, statistical indicators such as the Hotelling T² and Q statistic (squared prediction error) are adopted to provide useful information about the operating conditions; they are designed to measure the variability in the retained space and the residual space, respectively. Here, an interesting phenomenon is that the two metrics, Hotelling T² and Q statistic, are complementary (Odiowei and Cao, 2010). The components ignored in the state space are accounted for by the residual space, and vice versa. As a result, the detection result is insensitive to the number of components retained if the two spaces are taken into account at the same time. This complementary property has the potential to avoid the mode/feature selection issue.
To solve the classification problem with a simple and effective method, a new feature extraction method for fault diagnosis of rolling bearing based on statistical indicators is proposed to avoid the selection of appropriate modes.

The remainder of this paper is organized as follows: Section 2 briefly reviews the theoretical background of CVA and describes the proposed feature extraction method. Section 3 demonstrates the effectiveness of the proposed approach in practical application and compares it with the published works. Finally, the conclusion is drawn in Section 4.

2. Canonical variate analysis and the scheme for avoiding feature selection

Canonical variate analysis is a state-space based dimension reduction tool, which attempts to provide a linear combination which maximizes the correlation statistic between pairs of variables. Suppose the nonlinear dynamic system is expressed as follows (Odiowei and Cao, 2010):

x_{k + 1} = s (x_{k}) + w_{k}

(1)

y_{k} = t (x_{k}) + v_{k}

(2)

Where $x_{k}$ and $y_{k}$ are the state and measured vectors, respectively, s and t are unknown nonlinear functions, $w_{k}$ and $v_{k}$ are noise terms. However, the unknown nonlinear dynamic system is difficult to implement for detection purposes. At a stable instant, the linear stochastic state-space model is considered to approximate the unknown nonlinear system:

x_{k + 1} = {Ax}_{k} + ε_{k}

(3)

y_{k} = {Cx}_{k} + η_{k}

(4)

Where $A$ and $C$ are system matrices, $ε_{k}$ and $η_{k}$ are error vectors, respectively associated with the state and measured vectors.

In order to consider time correlations, the measured vector $y_{k}$ is first expressed at each time point k by setting p past and f future measurements (suppose each one contains m variables). This operation gives the past and future observations vectors $y_{p, k}$ and $y_{f, k}$ , respectively (Odiowei and Cao, 2010; Ruiz-Cárcel et al., 2015):

y_{p, k} = \begin{bmatrix} y_{k - 1} \\ y_{k - 2} \\ \vdots \\ y_{k - p} \end{bmatrix} \in ℝ^{mp}

(5)

y_{f, k} = \begin{bmatrix} y_{k} \\ y_{k + 1} \\ \vdots \\ y_{k + f - 1} \end{bmatrix} \in ℝ^{mf}

(6)

Then those variables are scaled to zero mean and unit variance to generate normalized vectors:

{\hat{y}}_{p, k} = y_{p, k} - {\bar{y}}_{p, k}

(7)

{\hat{y}}_{f, k} = y_{f, k} - {\bar{y}}_{f, k}

(8)

Where ${\bar{y}}_{p, k}$ and ${\bar{y}}_{f, k}$ are the sample means of $y_{p, k}$ and $y_{f, k}$ , respectively. Then, the resulting vectors are collected at different sampling instants k, which are then arranged into columns of past and future observation matrices $Y_{p}$ and $Y_{f}$ :

Y_{p} = [{\hat{y}}_{p, k + 1}, {\hat{y}}_{p, k + 2}, \dots, {\hat{y}}_{p, k + M}] \in ℝ^{mp \times M}

(9)

Y_{f} = [{\hat{y}}_{f, k + 1}, {\hat{y}}_{f, k + 2}, \dots, {\hat{y}}_{f, k + M}] \in ℝ^{mf \times M}

(10)

Where M = N − p − f + 1 is the number of columns of $Y_{p}$ and $Y_{f}$, and N is the total number of observations.
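As a concrete illustration of Equations (5) and (6) and of the relation M = N - p - f + 1, the following minimal Python sketch builds the past and future vectors for a scalar signal (m = 1). The function name `past_future_vectors` and the zero-based indexing are assumptions made for this illustration only, and the mean removal of Equations (7) and (8) is left out.

```python
def past_future_vectors(y, p, f):
    """Return the past and future vectors of a scalar sequence y.

    For each admissible time index k (zero-based here), the past vector is
    [y[k-1], ..., y[k-p]] and the future vector is [y[k], ..., y[k+f-1]].
    """
    n_obs = len(y)
    n_cols = n_obs - p - f + 1  # M = N - p - f + 1
    if n_cols < 1:
        raise ValueError("signal too short for the chosen p and f")
    past, future = [], []
    for k in range(p, p + n_cols):
        past.append([y[k - j] for j in range(1, p + 1)])
        future.append([y[k + j] for j in range(f)])
    return past, future


# Toy signal with N = 10 observations, p = 3 past and f = 2 future lags.
past, future = past_future_vectors(list(range(10)), p=3, f=2)
print(len(past))            # number of columns M
print(past[0], future[0])   # first past/future pair
```

With N = 10, p = 3 and f = 2 the sketch yields M = 6 column pairs, the first being the past vector [2, 1, 0] and the future vector [3, 4].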

Hence, covariance and cross-covariance matrices of $Y_{p}$ and $Y_{f}$ are respectively defined as:

Σ_{p} = \frac{1}{M - 1} Y_{p} Y_{p}^{T} \in ℝ^{mp \times mp}

(11)

Σ_{f} = \frac{1}{M - 1} Y_{f} Y_{f}^{T} \in ℝ^{mf \times mf}

(12)

Σ_{fp} = \frac{1}{M - 1} Y_{f} Y_{p}^{T} \in ℝ^{mf \times mp}

(13)

In order to extract an underlying relationship between variables, a singular value decomposition is done on the Hankel matrix H (Jiang et al., 2015; Ruiz-Cárcel et al., 2015):

H = Σ_{f}^{- 1 / 2} Σ_{fp} Σ_{p}^{- 1 / 2} = U Σ V^{T} \in ℝ^{mf \times mp}

(14)

Where $Σ = diag (λ_{1}, λ_{2}, \dots, λ_{r}, \dots, λ_{m})$ is a diagonal matrix containing the singular values in descending order, and $U$ and $V$ are orthogonal matrices. The dimensionality reduction matrix $V_{r} \in ℝ^{mp \times r}$ is obtained by retaining the first r columns of $V$, which are associated with the largest singular values in Σ, while the last (m-r) ones represent residues including noise and redundancy.

The projection matrices $J$ and $L$ can be calculated as:

J = V_{r}^{T} Σ_{p}^{- 1 / 2} \in ℝ^{r \times mp}

(15)

L = (I - V_{r} V_{r}^{T}) Σ_{p}^{- 1 / 2} \in ℝ^{mp \times mp}

(16)

Subsequently, projection matrices $J$ and $L$ transform the data matrix $Y_{p}$ (the past matrix) into new matrices $Z$ and $E$ of uncorrelated variables termed canonical variates and residuals, respectively:

Z = {JY}_{p} \in ℝ^{r \times M}

(17)

E = {LY}_{p} \in ℝ^{mp \times M}

(18)

Hence the states of the system are determined. The system matrices $A$ and $C$ can be estimated by recursive algorithms, for example, linear least squares regression. Since these two matrices are not used in this paper, their determination is omitted. By applying CVA, the new variables are linear combinations of the variables in $Y_{p}$ that maximize the correlation with the future observations. The Hotelling T² and Q statistics, defined in the canonical variate space and the residual space respectively, are the most commonly used criteria for capturing the information about the states of variables:

T_{k}^{2} = \sum_{i = 1}^{r} z_{i, k}^{2}

(19)

Q_{k} = \sum_{i = 1}^{m} e_{i, k}^{2}

(20)

Where $z_{i, k}$ and $e_{i, k}$ are the elements of the k-th column vectors of matrices $Z$ and $E$ , respectively.
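The chain from Equations (5) to (20) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`past_future`, `inv_sqrt`, `fit_cva`, `cva_statistics`) are hypothetical, the data are centered only (as in Equations (7) and (8)), the inverse square roots are computed by eigendecomposition with a small floor `eps` for numerical safety, and the Q statistic is summed over all rows of $E$.

```python
import numpy as np


def past_future(y, p, f):
    """Stack past/future windows of an (N, m) signal as columns (Eqs. 5-6)."""
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, None]
    n_cols = y.shape[0] - p - f + 1  # M = N - p - f + 1
    ks = range(p, p + n_cols)
    Yp = np.column_stack([y[k - p:k][::-1].ravel() for k in ks])
    Yf = np.column_stack([y[k:k + f].ravel() for k in ks])
    return Yp, Yf


def inv_sqrt(S, eps=1e-10):
    """Symmetric inverse square root of a covariance matrix (eps is an assumption)."""
    w, Q = np.linalg.eigh(S)
    w = np.maximum(w, eps)
    return (Q / np.sqrt(w)) @ Q.T


def fit_cva(y_ref, p, f, r):
    """Train the reference model on healthy data and return J, L (Eqs. 7-16)."""
    Yp, Yf = past_future(y_ref, p, f)
    mu_p = Yp.mean(axis=1, keepdims=True)
    Yp = Yp - mu_p
    Yf = Yf - Yf.mean(axis=1, keepdims=True)
    M = Yp.shape[1]
    Sp = Yp @ Yp.T / (M - 1)
    Sf = Yf @ Yf.T / (M - 1)
    Sfp = Yf @ Yp.T / (M - 1)
    Sp_is = inv_sqrt(Sp)
    H = inv_sqrt(Sf) @ Sfp @ Sp_is
    _, _, Vt = np.linalg.svd(H)          # H = U diag(s) V^T
    Vr = Vt[:r].T                        # first r columns of V
    J = Vr.T @ Sp_is
    L = (np.eye(Vr.shape[0]) - Vr @ Vr.T) @ Sp_is
    return {"p": p, "f": f, "mu_p": mu_p, "J": J, "L": L}


def cva_statistics(model, y):
    """Return the T^2 and Q sequences of a signal relative to the model (Eqs. 17-20)."""
    Yp, _ = past_future(y, model["p"], model["f"])
    Yp = Yp - model["mu_p"]
    Z = model["J"] @ Yp
    E = model["L"] @ Yp
    return (Z ** 2).sum(axis=0), (E ** 2).sum(axis=0)
```

A typical use would be to call `fit_cva` once on a healthy recording and then `cva_statistics` on each segment under test; the two returned sequences are the inputs from which the features of the next step are computed.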

Consequently, CVA transforms the variable space into the canonical variate space and the residual space, and the statistics T² and Q measure the fault information in these spaces, respectively. So, a bearing fault can be detected by either one of the statistics or by both the Q statistic and $T^{2}$ simultaneously (Harmouche et al., 2014, 2015). It should be noted that the statistics $T^{2}$ and Q are complementary, that is, an increment in one space gives rise to a corresponding variation in the other. We are motivated to employ this complementary property to express the patterns captured by the two statistics. However, it is a challenging task to devise a feature extraction method that can recognize the patterns among various operating conditions.

The short-time energy (STE) feature is an effective feature index that is widely used in classification schemes (Kang and Kim, 2013). This paper proposes a normalized STE that combines the two complementary subspaces into a complementary short-time energy (CSTE) feature for overcoming the mode/feature selection issue. Two features are extracted, one from each of the complementary spaces (the retained space and the residual space), so as to avoid the procedures of mode/feature selection. Let y (n) be a statistical variable, n = 1, 2, $\dots$, N. Equation (21) is defined as a CSTE feature to distinguish various operating conditions:

F = \frac{\frac{1}{N} \times \sum_{i = 1}^{N} y (i)^{2}}{var (y)}

(21)

Where $var (y)$ is the variance of variable y .
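Equation (21) is simple enough to sketch directly. The snippet below is a minimal illustration, assuming the population variance (division by N), which the text does not specify; the function names `short_time_energy_feature` and `cste` are hypothetical. Under that assumption the feature equals 1 + mean(y)² / var(y), so it grows when the statistic is steady relative to its spread.

```python
def short_time_energy_feature(y):
    """Normalized short-time energy of Eq. (21): mean of y^2 divided by var(y)."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(y) / n
    energy = sum(v * v for v in y) / n
    # Population variance (divide by N): an assumption, not stated in the paper.
    variance = sum((v - mean) ** 2 for v in y) / n
    if variance == 0:
        raise ValueError("variance is zero; feature undefined")
    return energy / variance


def cste(t2, q):
    """Return the feature pair (F1, F2) from the T^2 and Q sequences."""
    return short_time_energy_feature(t2), short_time_energy_feature(q)


# Toy check: y = [1, 2, 3, 4] has mean square 7.5 and variance 1.25.
print(short_time_energy_feature([1.0, 2.0, 3.0, 4.0]))  # 6.0
```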

Different from the traditional approaches used for feature extraction, the proposed work requires a group of vibration data collected from healthy working condition to construct a reference model. The state information of an unknown variable relative to the reference model can be estimated through Equation (19) and Equation (20). According to Equation (21), the proposed feature indexes are extracted from the two complementary spaces by replacing y with the two metrics, respectively. That is, y = $T_{k}^{2}$ for the retained space and y = $Q_{k}$ for the residual space.

From the viewpoint of complementarity, an increment in the state space caused by r (the number of canonical variates retained) leads to a decrement in the residual space, and vice versa. Hence, we exploit this property to make the diagnosis results insensitive to r.
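The complementary relation can be made explicit with a short derivation, given here as a sketch under two assumptions: $V_{r}$ has orthonormal columns (as it does when taken from the SVD in Equation (14)), and the Q statistic sums over all rows of $E$. Writing $u_{k} = Σ_{p}^{- 1 / 2} {\hat{y}}_{p, k}$ and $I$ for the identity matrix, Equations (15) to (20) give:

T_{k}^{2} = u_{k}^{T} V_{r} V_{r}^{T} u_{k}

Q_{k} = u_{k}^{T} (I - V_{r} V_{r}^{T})^{T} (I - V_{r} V_{r}^{T}) u_{k} = u_{k}^{T} (I - V_{r} V_{r}^{T}) u_{k}

T_{k}^{2} + Q_{k} = u_{k}^{T} u_{k} = {\hat{y}}_{p, k}^{T} Σ_{p}^{- 1} {\hat{y}}_{p, k}

The second equality for $Q_{k}$ uses the fact that $I - V_{r} V_{r}^{T}$ is a symmetric projection. The sum does not depend on r, so whatever is added to $T_{k}^{2}$ by retaining more canonical variates is removed from $Q_{k}$.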

The scheme of the proposed work is summarized in Figure 1.

Figure 1.

The flowchart of the proposed complementary scheme.

3. Experiment and result

As mentioned earlier, this paper proposes a CSTE feature for bearing fault diagnosis to avoid mode/feature selection. In this section, experiments on validation of the proposed work are implemented.

3.1. Case 1

3.1.1. Experimental setup

The proposed method is evaluated using a data set acquired from the Case Western Reserve University bearing test rig (Yang et al., 2014). Figure 2 shows the bearing test rig. Three bearing faults, namely inner race fault, outer race fault and ball fault, are introduced to the rolling bearings. For each state, the vibration signals are collected under various loads and speeds with a sampling rate of 12 kHz. The corresponding data sets are listed in Table 1. Further details concerning the bearing test rig can be found in Smith and Randall (2015).

Figure 2.

Case Western Reserve University bearing test rig.

Table 1.

Dataset under different operating conditions.

Conditions	0 hp (1797 rpm)	1 hp (1772 rpm)	2 hp (1750 rpm)	3 hp (1730 rpm)
Normal	97.mat	98.mat	99.mat	100.mat
Ball fault	118.mat	119.mat	120.mat	121.mat
Inner race fault	105.mat	106.mat	107.mat	108.mat
Outer race fault	130.mat	131.mat	132.mat	133.mat

As for the aforementioned strategy, two data sets 97.mat and 98.mat obtained under healthy conditions are selected to train the CVA model as a reference. Then the CVA model is employed to detect various operating conditions such as normal, ball fault, inner race fault and outer race fault. Figure 3 uses the CVA based approach to show the resulting statistics corresponding to the vibration signals under healthy and faulty conditions. As shown in Figure 3, the data sampled under each condition are transformed into two metrics for the purpose of feature extraction.

Figure 3.

Output of canonical variate analysis based approach: (a) metrics for state space of normal condition; (b) metrics for residual space of normal condition; (c) metrics for state space of ball fault; and (d) metrics for residual space of ball fault.

3.1.2. Result and discussion

We get two features and 320 samples from the feature extraction procedure. In this study, each state comprises 80 samples and each sample is extracted from 6000 sample points. The samples are divided into a training set and a testing set: 50 samples from each bearing state are used for training and the remaining 30 samples of each state are used for testing. Then, the training set (a total of 200 samples) and testing set (a total of 120 samples) are given as input to a classifier such as the support vector machine (SVM). The SVM tool in libsvm-3.18 (Chang and Lin, 2011) is used to demonstrate the proposed work, and the radial basis function is selected as the kernel function due to its good performance in nonlinear mapping. However, two parameters, c (penalty factor) and g (width parameter), have a significant impact on the classification capability of the SVM. Thus, we use grid search with 10-fold cross-validation to determine the optimal values of c and g. The CSTE features for the four bearing conditions are listed in Table 2. Here, the number of retained canonical variates r is first set to 4. The numbers of past and future observations p and f can be determined by calculating the autocorrelation function of the summed squares of variables (Ruiz-Cárcel et al., 2015). In addition, in order to investigate the influence of each r on the CSTE features, p and f are set to 15. Figure 4 presents the prediction accuracy of the SVM using the proposed work.

Table 2.

Features obtained from testing set (for r=4).

Normal F1, F2	Inner race fault F1, F2	Ball fault F1, F2	Outer race fault F1, F2
10.2726, 5.4196	1.3250, 1.5050	1.8525, 2.7194	1.1970, 1.3413
10.4768, 5.7045	1.3192, 1.5193	1.8982, 2.8692	1.1974, 1.3399
10.7223, 6.2126	1.3398, 1.5264	1.9199, 2.8515	1.1934, 1.3345
10.1819, 5.8479	1.3333, 1.4999	1.8328, 2.4264	1.1961, 1.3436
10.4580, 5.7517	1.3232, 1.5036	1.8690, 2.9135	1.1940, 1.3332
10.8945, 5.3970	1.3205, 1.4824	1.8629, 2.7078	1.1948, 1.3407
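To make the separability of the rows in Table 2 tangible, the following sketch classifies each listed feature pair with a leave-one-out nearest-centroid rule. This is a deliberately simpler stand-in for the SVM used in the paper, not the authors' classifier, and it only covers the six rows per condition printed in the table; the names `TABLE_2`, `centroid`, `nearest_centroid` and `leave_one_out_accuracy` are hypothetical.

```python
import math

# (F1, F2) pairs copied from Table 2 (r = 4).
TABLE_2 = {
    "normal": [(10.2726, 5.4196), (10.4768, 5.7045), (10.7223, 6.2126),
               (10.1819, 5.8479), (10.4580, 5.7517), (10.8945, 5.3970)],
    "inner": [(1.3250, 1.5050), (1.3192, 1.5193), (1.3398, 1.5264),
              (1.3333, 1.4999), (1.3232, 1.5036), (1.3205, 1.4824)],
    "ball": [(1.8525, 2.7194), (1.8982, 2.8692), (1.9199, 2.8515),
             (1.8328, 2.4264), (1.8690, 2.9135), (1.8629, 2.7078)],
    "outer": [(1.1970, 1.3413), (1.1974, 1.3399), (1.1934, 1.3345),
              (1.1961, 1.3436), (1.1940, 1.3332), (1.1948, 1.3407)],
}


def centroid(points):
    """Mean (F1, F2) of a list of feature pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def nearest_centroid(sample, centroids):
    """Label of the centroid closest to the sample in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))


def leave_one_out_accuracy(table):
    """Hold out each sample, rebuild the centroids, and classify it."""
    correct = total = 0
    for label, points in table.items():
        for i, sample in enumerate(points):
            centroids = {
                other: centroid(pts if other != label else pts[:i] + pts[i + 1:])
                for other, pts in table.items()
            }
            correct += nearest_centroid(sample, centroids) == label
            total += 1
    return correct / total


print(leave_one_out_accuracy(TABLE_2))
```

On these 24 listed pairs every held-out sample falls closest to the centroid of its own condition, which agrees with the clear separation reported for the SVM.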

Figure 4.

Prediction result for r=4, optimal parameter c=1.0718, and g=8.0556.

As shown in Figure 4, the CSTE features corresponding to the four bearing conditions are recognized accurately by an SVM classifier. Nevertheless, the number of canonical variates retained r significantly affects the capability of fault detection in process monitoring. Moreover, many published works depending on feature/mode selection are sensitive to the number of components retained. Therefore, the influence of the number of canonical variates retained r (1 < r < 15) on the proposed feature extraction method is investigated further. Due to space limitation, Tables 3–5 only give the first five features for each of several values of r.

Table 3.

Feature set obtained using present work (for r=2).

Normal F1, F2	Inner race fault F1, F2	Ball fault F1, F2	Outer race fault F1, F2
3.2002, 8.5832	1.2633, 1.4251	1.6282, 2.6854	1.1537, 1.3306
3.2404, 7.2198	1.2567, 1.4500	1.6864, 2.8915	1.1527, 1.3307
3.2489, 7.6484	1.2712, 1.4603	1.6482, 2.8486	1.1550, 1.3327
3.2294, 7.8474	1.2706, 1.4673	1.6451, 2.6724	1.1567, 1.3395
3.2617, 7.7008	1.2726, 1.4569	1.5744, 2.4308	1.1554, 1.3327

Table 4.

Feature set obtained using present work (for r=7).

Normal F1, F2	Inner race fault F1, F2	Ball fault F1, F2	Outer race fault F1, F2
9.5354, 4.3480	1.3995, 1.4936	2.1784, 2.6967	1.2698, 1.3366
8.8392, 4.4253	1.3883, 1.4782	2.3965, 2.8302	1.2734, 1.3337
9.4580, 4.6744	1.3632, 1.4413	2.4019, 2.8139	1.2666, 1.3289
9.2908, 4.6244	1.4110, 1.5089	2.1448, 2.4017	1.2771, 1.3374
9.2850, 4.4374	1.3914, 1.4788	2.3670, 2.8742	1.2745, 1.3264

Table 5.

Feature set obtained using present work (for r=10).

Normal F1, F2	Inner race fault F1, F2	Ball fault F1, F2	Outer race fault F1, F2
8.8109, 3.1201	1.2961, 1.4597	1.7544, 2.4154	1.2083, 1.3118
9.4017, 3.1500	1.2854, 1.4501	1.8667, 2.4436	1.2095, 1.3097
9.4530, 3.2487	1.2623, 1.4171	1.8608, 2.4437	1.2059, 1.3050
9.8132, 3.0436	1.3045, 1.4780	1.7109, 2.1551	1.2116, 1.3125
9.6173, 3.1230	1.2892, 1.1123	1.8758, 2.4763	1.2053, 1.3043

As displayed in Tables 3–5, for each working condition, the CSTE features extracted from the retained space tend to increase as r increases, while the corresponding features from the residual space decrease gradually. Moreover, for each r, the features of the four conditions clearly indicate a separable pattern. Like the two complementary metrics themselves, the features extracted from the two metrics show an obvious complementary property, which makes the features obtained insensitive to r. The values r=2, 7, 10 are then employed to investigate the influence of r on the diagnosis result. For each r, the total samples and the samples for SVM training and testing are the same as in the case of r=4. The identification rates for the four bearing conditions are depicted in Figure 5. For every r considered in this study, the identification accuracies for the four bearing conditions reach 100%. This illustrates that the proposed complementary feature also makes the diagnosis results insensitive to r.

Figure 5.

Experimental results: (a) identification result for r = 2, optimal parameters c = 1.0718, g = 1.0070; (b) identification result for r = 7, optimal parameters c = 1.0718, g = 4.0278; and (c) identification result for r = 10, optimal parameters c = 4.2871, g = 16.1113.

Then the proposed work is compared with published methods. In this study, two representative approaches for feature/mode selection are employed for comparison. To make a fair comparison, the techniques compared are tested under the same conditions. The first approach, listed in Table 6, is derived from Xu and Chen (2013). To collect energy features, the authors first decomposed the vibration signal into a number of IMFs using EMD. Then the first six modes were selected using PCA to generate energy feature indexes for bearing fault diagnosis. Following their procedure, the energy feature indexes are extracted from the data used above as input to the SVM. PCA is applied to the features obtained to retain the most effective components. In this study, a cumulative contribution rate of 85% is chosen to retain the first 6 modes. Finally, we get 6 features and 320 samples. The other approach is based on feature selection criteria, for example, the DE technique. Liu et al. (2013) used 27 feature parameters (14 time-domain features and 13 frequency-domain features) to characterize bearing faults. Then the DE technique was applied to remove redundant features and retain the most important features for bearing fault diagnosis. In their study, the first 3 modes were selected from the IMFs adaptively decomposed by EMD. As a result, a total of 3×27=81 features were obtained. Subsequently, the important features were chosen from the feature set using a defined threshold. We repeat the feature extraction steps to obtain the 81 feature parameters. It is well known that redundant features may have a significant impact on the performance of a classifier. Therefore, 4 important features and 320 samples are selected from the feature set with the help of the DE technique.
For the techniques compared, the samples obtained are separated into two sets, training set (200 samples) and testing set (120 samples), which are then fed to SVM optimized by grid search using 10-fold cross-validation for further analysis. The details on comparison are presented in Table 6. Here, the accuracy is an average of five identification rates. As an example, Figure 6 depicts the results obtained for the two techniques.

Figure 6.

Experimental results for the techniques compared: (a) identification result for energy feature, optimal parameters c = 2.1435, g = 8.0556; (b) distance estimation result for threshold = 0.7; (c) identification result for 81 features parameters, optimal parameters c = 2.1435, g = 1.0070; and (d) identification result for 4 features parameters, optimal parameters c = 1.0718, g = 1.0070.

As displayed in Figure 6 (a), the energy features corresponding to the four bearing conditions are largely recognized, with an identification rate of 99.2%. Figure 6 (b) gives the result of DE for the 81 features, and the features with scores exceeding a threshold are selected as the most important features for SVM training and testing. For comparison, the 81 feature parameters and the reduced features are each presented to the SVM. As shown in Figure 6 (c), the recognition rate for the 81 features is 98.3%, while for the reduced features (Figure 6 (d)) it reaches 100%. Therefore, the feature/mode selection procedure is efficient for bearing fault diagnosis: it can improve the performance of the classifier and reduce computational complexity. However, the procedures of feature/mode selection inevitably rely on an expert’s knowledge, which may hinder the implementation of these techniques in practical application. Moreover, we can observe that the diagnosis results are sensitive to the features/modes retained, which has an important influence on the robustness of the existing approaches.

As shown in Tables 2–5, the complementary short-time energy features for different r are clearly distinguishable. Besides, no matter what value of r is selected from the predefined range, the dimensionality of the features is invariable. In other words, the proposed technique transforms the impact of different r into changes in the values of the features while the resulting features still show a separable pattern. This is why the diagnosis results are not sensitive to r. However, for the published contributions, the common way of feature/mode selection inevitably influences the length of the feature vector, so that the diagnosis results are sensitive to the features. For example, the length of the feature vector in the technique of Liu et al. (2013) depends on the selection of a threshold. In the current paper, the proposed features are derived from the two complementary statistics rather than from the modes of interest. Thus, the feature/mode selection procedures can be avoided. To further investigate the superiority and effectiveness of the proposed work, case 2 is carried out.

3.2. Case 2

3.2.1. Experimental setup

In this study, inner and outer race faults are introduced to NICE® bearings to collect vibration data for further validation of the proposed work. Figure 7 shows the schematic diagrams of the bearing component faults. The bearings are tested at various loads. The data considered are recorded at a load of 150 lbs, with a sampling rate of 48828 sps. The speed of the input shaft is 1500 rpm. The baseline data are regarded as data collected from healthy conditions. In total, three working conditions, namely normal state, inner race fault and outer race fault, are considered for validation.

Figure 7.

Bearing faults: (a) inner race fault; and (b) outer race fault.

According to the scheme presented in Figure 1, the vibration signals are first transformed into two subspaces. The numbers of past and future observations can be determined by estimating the autocorrelation function of the summed squares of variables. Besides, to investigate the influence of each r (1<r<10) on the features obtained and the diagnosis results, p and f are set to 10. For each r, a total of 120 samples are extracted from the three working conditions. Each condition contains 40 samples, each of which is extracted from 2500 sample points; half of the total samples (60 samples) are randomly selected as training data while the rest (60 samples) are used for testing. Due to space limitation, for the values of r considered in this study, only the first five features are presented, as listed in Tables 7–11.

Table 6.

Comparison of the proposed method with the existing works.

Techniques	Decomposition methods	Feature/mode selection	Feature extraction	Number of samples	Accuracy (%)
Xu and Chen (2013)	Empirical mode decomposition (EMD)	Principal component analysis	Energy	Train (200) Test (120)	99.2
Liu et al. (2013)	Empirical mode decomposition (EMD)	Distance estimation	27 × 3 features	Train (200) Test (120)	98.3–100
Current study	Singular value decomposition (SVD)	Not applicable	Complementary short-time energy	Train (200) Test (120)	100

Table 7.

Features obtained for r=3.

Normal F1, F2, Sum	Inner race fault F1, F2, Sum	Outer race fault F1, F2, Sum
2.3759, 4.5990, 6.9749	1.0599, 1.0807, 2.1406	1.6872, 2.6174, 4.3046
2.5249, 5.1586, 7.6835	1.0713, 1.1021, 2.1734	1.8490, 3.1508, 4.9998
2.2964, 4.5361, 6.8325	1.0690, 1.1533, 2.2223	1.7448, 2.5237, 4.2685
2.4603, 4.2312, 6.6915	1.0585, 1.0808, 2.1393	1.9667, 2.7969, 4.7636
2.5023, 4.4886, 6.9909	1.0981, 1.1033, 2.2014	1.6947, 2.6814, 4.3761

3.2.2. Result and discussion

As listed in Tables 7–11, for the normal condition, the features (F1) extracted from the state space tend to grow as r is changed from 3 to 8. In contrast, the features (F2) obtained from the residual space tend to decrease. Moreover, the sums of the two features are relatively stable. The root cause of this phenomenon is that the subspaces are complementary. For the inner race fault, the variation of the CSTE features with increasing r is not obvious, which shows that the features are not sensitive to r, as demonstrated by case 1. For the outer race fault, the features (F1) collected from the retained space tend to increase gradually as r grows, while F2 shows the opposite trend. These phenomena agree well with the conclusion that an increment triggered by r in the retained space causes a decrement in the residual space. Evidently, the proposed CSTE feature exhibits a complementary property. In addition, for different r, Tables 7–11 show that the features collected from the three working conditions are separable. Then the feature vectors are used as input to a classifier for the purpose of validation. Here, SVM is taken as an example to illustrate the effectiveness of the proposed work, and the penalty weight c and kernel width g are optimized by grid search using 10-fold cross-validation.

Figure 8 indicates that the features corresponding to the three operating conditions are identified perfectly by the SVM (optimal parameters c=1.0718, g=1.0070). The experimental results show that the diagnosis results are insensitive to the number of components retained r (1<r<10). The diagnosis result for r=7 is not presented in Figure 8 due to space limitation. It is worth noting that this paper proposes an approach for fault diagnosis of rolling bearing that avoids the mode/feature selection arising from signal decomposition techniques, rather than only presenting a feature extraction method.

Figure 8.

Identification results for the features corresponding to different r: (a) for r = 3; (b) for r = 4; (c) for r = 6; and (d) for r = 8.

To further validate the effectiveness of the proposed work, two representative techniques for fault diagnosis of rolling bearings are implemented for comparison. To make the comparison fair and valid, the techniques are tested under the same conditions, such as the number of samples used for SVM training and testing. The energy feature proposed in Xu and Chen (2013) is extracted from the data used in this study according to the original procedure (the details have been presented in case 1).

A cumulative percent variance of 95% is chosen, which retains the first l=7 modes for further analysis. Moreover, l=5 and l=6 are also examined for comparison. For each l, the obtained feature set comprises 120 samples covering the three working conditions. These samples are then divided into 60 training and 60 testing samples for SVM training and testing, respectively. In addition, the 27 feature parameters employed in Liu et al. (2013), consisting of 14 time-domain features and 13 frequency-domain features, are extracted from the first three modes derived from EMD. Thus, a feature set comprising 120 samples and 3×27=81 features is obtained. The distance evaluation (DE) technique is used to select the most important features for dimensionality reduction. Finally, the 6 selected features and 120 samples (60 training samples and 60 testing samples) are presented to an SVM classifier optimized by grid search using 10-fold cross-validation. Considering that the identification result may be sensitive to the initial conditions, the SVM classifier is run five times and the average accuracy is reported. Figures 9 and 10 show an example of the experimental results for the three bearing conditions. The details of the comparison are listed in Table 12.
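The cumulative percent variance rule used to fix l can be illustrated with a short sketch. It assumes the usual definition, namely the smallest number of leading components whose share of the total variance reaches the threshold. The function name `modes_for_cpv` and the eigenvalues are hypothetical and are not taken from the experiments.

```python
def modes_for_cpv(eigenvalues, threshold=0.95):
    """Return the smallest number of leading components whose cumulative
    share of the total variance reaches `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return count
    # Only reached if rounding keeps the final ratio just below the threshold.
    return len(ordered)


# Hypothetical eigenvalues of a mode covariance matrix (invented values).
eigenvalues = [4.0, 2.5, 1.5, 0.8, 0.5, 0.3, 0.2, 0.1, 0.06, 0.04]
n_modes = modes_for_cpv(eigenvalues, threshold=0.95)
print(n_modes)
```

Because the count returned depends on the chosen threshold, a rule of this kind is one source of the sensitivity to l that the comparison examines.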

Figure 9.

Experimental results for different conditions using the technique in Xu and Chen (2013): (a) for l=5, c=1.0718, g=1.0070; (b) for l=6, c=1.0718, g=1.0070; and (c) for l=7, c=1.0718, g=1.0070.

Figure 10.

Experimental results for different conditions using the technique in Liu et al. (2013): (a) distance evaluation for all features (threshold=0.5); (b) identification result for the 81 features, optimal parameters c=1.0718, g=1.0070; and (c) identification result for the selected 6 features, optimal parameters c=1.0718, g=1.0070.

Table 8.

Features obtained for r=4.

Normal F1, F2, Sum	Inner race fault F1, F2, Sum	Outer race fault F1, F2, Sum
2.8420, 4.0664, 6.9074	1.0476, 1.1286, 2.1762	1.7489, 2.7859, 4.5348
3.0544, 4.5768, 7.6312	1.0893, 1.0892, 2.1785	1.8643, 3.2283, 5.0926
2.6950, 4.2233, 6.9183	1.0727, 1.1937, 2.2664	1.7180, 2.6264, 4.3444
2.7701, 3.8805, 6.6506	1.0571, 1.0970, 2.1541	1.9406, 2.9102, 4.8508
2.8305, 3.9430, 6.7735	1.1123, 1.0924, 2.2047	1.7008, 2.7166, 4.4174

Table 9.

Features obtained for r=6.

Normal F1, F2, Sum	Inner race fault F1, F2, Sum	Outer race fault F1, F2, Sum
3.7781, 3.1333, 6.9114	1.0629, 1.1947, 2.2576	2.2140, 2.2207, 4.4347
4.0963, 3.5006, 7.5969	1.0928, 1.0819, 2.1747	2.4446, 2.5273, 4.9719
3.7013, 3.2130, 6.9143	1.0895, 1.2773, 2.3668	2.1707, 2.1429, 4.3136
3.7783, 2.9030, 6.6813	1.0680, 1.1232, 2.1912	2.4739, 2.3259, 4.7998
3.7877, 2.8998, 6.6875	1.1131, 1.0833, 2.1964	2.1275, 2.2128, 4.3403

Table 10.

Features obtained for r=7.

Normal F1, F2, Sum	Inner race fault F1, F2, Sum	Outer race fault F1, F2, Sum
4.2775, 2.5900, 6.8675	1.0701, 1.1775, 2.2476	2.4025, 1.9920, 4.3972
4.7749, 2.8068, 7.5817	1.0950, 1.0736, 2.1686	2.7075, 2.1909, 4.8984
4.2359, 2.5695, 6.8054	1.1045, 1.2427, 2.3472	2.3281, 1.9739, 4.3020
4.2680, 2.4325, 6.7005	1.0725, 1.1173, 2.1898	2.6664, 2.0274, 4.7106
4.2871, 2.4238, 6.7109	1.1095, 1.0763, 2.1858	2.3176, 2.0274, 4.3450

Table 11.

Features obtained for r=8.

Normal F1, F2, Sum	Inner race fault F1, F2, Sum	Outer race fault F1, F2, Sum
4.7510, 2.0458, 6.7968	1.0777, 1.1831, 2.2608	2.5427, 1.7669, 4.3096
5.4244, 2.1892, 7.6136	1.0973, 1.1033, 2.2006	2.9323, 1.9224, 4.8547
4.8059, 2.0055, 6.8114	1.1207, 1.3512, 2.4719	2.4237, 1.8103, 4.2340
4.7219, 1.9574, 6.6793	1.0775, 1.1538, 2.2313	2.7974, 1.8883, 4.6857
4.7480, 1.9754, 6.7234	1.1106, 1.0666, 2.1772	2.4408, 1.8773, 4.3181
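As a quick numerical check of the complementary behaviour discussed in this section, the sketch below takes the first normal-condition sample from Tables 8 to 11 and recomputes F1 + F2 for each r. The variable names are illustrative only. The sums are recomputed from the tabulated F1 and F2 values rather than copied from the Sum columns.

```python
# First normal-condition sample in Tables 8 to 11, stored as r -> (F1, F2).
normal_first_sample = {
    4: (2.8420, 4.0664),
    6: (3.7781, 3.1333),
    7: (4.2775, 2.5900),
    8: (4.7510, 2.0458),
}

r_values = sorted(normal_first_sample)
f1_values = [normal_first_sample[r][0] for r in r_values]
f2_values = [normal_first_sample[r][1] for r in r_values]
sums = [f1 + f2 for f1, f2 in zip(f1_values, f2_values)]

# F1 (retained space) should rise with r while F2 (residual space) falls.
f1_increasing = all(a < b for a, b in zip(f1_values, f1_values[1:]))
f2_decreasing = all(a > b for a, b in zip(f2_values, f2_values[1:]))

# Spread of the sum across r, compared with the change in F1 alone.
sum_spread = max(sums) - min(sums)
f1_change = f1_values[-1] - f1_values[0]

for r, total in zip(r_values, sums):
    print(f"r={r}: F1+F2={total:.4f}")
print(f1_increasing, f2_decreasing, round(sum_spread, 4), round(f1_change, 4))
```

For this sample the sum varies far less across r than F1 does on its own, which is the pattern the complementary-subspace argument predicts.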

Table 12.

Comparison of the proposed work with the reported techniques.

Techniques	Decomposition methods	Feature/mode selection	Feature extraction	Number of data sets	Accuracy (%)
Xu and Chen (2013)	Empirical mode decomposition (EMD)	Principal component analysis	Energy	Train (60) Test (60)	96.7–98.3
Liu et al. (2013)	Empirical mode decomposition (EMD)	Distance estimation	27 features	Train (60) Test (60)	98.3–100
Current study	Singular value decomposition (SVD)	Not applicable	Complementary short-time energy	Train (60) Test (60)	100

As shown in Figure 9, for l=7 the classification accuracy for the three bearing conditions is 96.7%. For l=6, the classification accuracy is 98.3%, and for l=5 an identification rate of 98.3% is also achieved. Evidently, the diagnosis result is sensitive to the number of retained components l. For the energy features, too large an l adds redundancy to the modes of interest, while too small an l readily causes information loss. Hence, the need to select useful features may weaken the reliability of rolling bearing fault diagnosis. As displayed in Figure 10 (b), the original 81 features are first employed as the input of the SVM classifier, and the resulting classification rate is 98.3%. However, redundant features may significantly degrade the performance of a classifier such as SVM. Figure 10 (a) suggests that the features with scores exceeding the predefined threshold should be selected as the input of SVM. The diagnosis result obtained for the reduced feature set is shown in Figure 10 (c). After feature selection, the identification accuracy increases to 100%. Indeed, feature selection is effective in improving the recognition rate of bearing faults. Nevertheless, the diagnosis results are sensitive to the number of selected components. Besides, feature selection usually relies on an expert’s knowledge, which hinders the implementation of automatic fault diagnosis. Unlike the existing approaches, the proposed work avoids feature selection by using two complementary metrics to transform the original signal into two parts for the extraction of fault features. This approach has three merits. First, the length of the feature vector remains constant as r changes. Second, the features collected from the complementary metrics are themselves complementary: for the two subspaces, the impact of r is transformed into an increment/decrement in the values of the features. Third, the diagnosis results are not sensitive to the number of canonical variates retained, r. Table 12 indicates that the proposed technique is simple and robust for bearing fault diagnosis, and that the result obtained is satisfactory.

4. Conclusions

This paper proposes an effective approach that avoids the mode/feature selection issue. The advantages of the proposed work over published works are discussed. The experimental results indicate that the novel feature extraction method is suitable for bearing fault diagnosis. The diagnosis result is insensitive to the number of retained components, and the mode/feature selection procedure can be removed by applying the proposed work. The identification accuracy of the proposed method is satisfactory. To sum up, the following conclusions can be drawn:

This paper proposes a simple and effective scheme that avoids the complex feature/mode selection procedure. The two complementary subspaces are considered simultaneously, so no modes of interest need to be selected.

A CSTE feature is proposed to distinguish the various working conditions of the bearing. For different r, the features obtained remain separable.

Unlike the reported works, the length of the proposed feature vector is constant. It always comprises two features: one extracted from the retained space and the other from the residual space. It is worth pointing out that the objective of feature/mode selection is to retain the most useful components for feature extraction. Obtaining a small number of features is therefore essential for improving the accuracy of bearing fault diagnosis.

The diagnosis results are not sensitive to the number of canonical variates retained, and the results obtained are satisfactory.

Two representative techniques are evaluated for comparison. The experimental results show that the proposed work is simple and effective for fault diagnosis of rolling bearings.

In this paper, CVA is used as a data-driven technique, which requires vibration data collected under healthy operating conditions to construct a reference model. For detecting a single operating condition, a single healthy vibration data set is sufficient for model training. For detecting varying operating conditions, the training set must cover the different normal working conditions in order to obtain a comprehensive and robust reference model. To achieve this, in case 1, the normal data sets 97.mat and 98.mat, collected under different working conditions, are used to train a CVA model: the past and future matrices of each data set are calculated according to Equation (9) and Equation (10), and the resulting matrices are then combined to produce a new matrix covering the different normal conditions. Similarly, in case 2, the baseline conditions are joined for model construction to avoid loss of information and to enhance the robustness of the reference model. More attention should be paid to the effects of averaging on reference model construction under normal conditions.
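The combination of past and future matrices from several healthy records can be sketched as below. This is a minimal illustration under stated assumptions: Equations (9) and (10) are not reproduced in this section, so the sketch assumes the commonly used CVA layout in which the past vector at time k stacks the p preceding samples and the future vector stacks the f samples from k onward. The function names, the window lengths and the toy records are hypothetical, and normalisation of the vectors is omitted.

```python
def past_future_columns(signal, p, f):
    """Return (past, future) lists of column vectors for one record.

    ASSUMED layout: past[k] = [y[k-1], ..., y[k-p]] and
    future[k] = [y[k], ..., y[k+f-1]].
    """
    past, future = [], []
    # k covers every index with p earlier samples and f samples from k onward.
    for k in range(p, len(signal) - f + 1):
        past.append([signal[k - i] for i in range(1, p + 1)])
        future.append([signal[k + i] for i in range(f)])
    return past, future


def combined_past_future(records, p, f):
    """Build the matrices record by record, then join them column-wise."""
    all_past, all_future = [], []
    for signal in records:
        past, future = past_future_columns(signal, p, f)
        all_past.extend(past)
        all_future.extend(future)
    return all_past, all_future


# Toy stand-ins for two healthy records (NOT the contents of 97.mat or 98.mat).
record_a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
record_b = [10.0, 11.0, 12.0, 13.0, 14.0]
past, future = combined_past_future([record_a, record_b], p=2, f=2)
print(len(past), past[0], future[0], past[3], future[4])
```

Building the windows per record before joining them ensures that no past or future vector straddles the boundary between two records measured under different working conditions.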

Acknowledgements

This research is supported by the National Natural Science Foundation of China (Grants 51675491 and 51175480). The authors would like to express their most sincere appreciation to the Society for Machinery Failure Prevention Technology for providing the experimental data.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Cerrada, Sánchez, Pacheco, et al. (2016a) Hierarchical feature selection based on relative dependency for gear fault diagnosis. Applied Intelligence 44(3): 687–703.

Cerrada, Zurita, Cabrera, et al. (2016b) Fault diagnosis in spur gears based on genetic algorithm and random forest. Mechanical Systems and Signal Processing 70(2016): 87–103.

Chang and Lin (2011) LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2(3): 1–27.

Gajjar and Palazoglu (2016) A data-driven multidimensional visualization technique for process fault detection and diagnosis. Chemometrics and Intelligent Laboratory Systems 154(2016): 122–136.

Grasso, Chatterton, Pennacchi, et al. (2016) A data-driven method to enhance vibration signal decomposition for rolling bearing fault analysis. Mechanical Systems and Signal Processing 81(1): 126–147.

Georgoulas, Loutas, Stylios, et al. (2013) Bearing fault detection based on hybrid ensemble detector and empirical mode decomposition. Mechanical Systems and Signal Processing 41(1): 510–525.

Harmouche, Delpha and Diallo (2014) Incipient fault detection and diagnosis based on Kullback–Leibler divergence using principal component analysis: Part I. Signal Processing 94(1): 278–287.

Harmouche, Delpha and Diallo (2015) Incipient fault detection and diagnosis based on Kullback–Leibler divergence using principal component analysis: Part II. Signal Processing 109(3): 334–344.

Jiang, Huang, Zhu, et al. (2015) Canonical variate analysis-based contributions for fault identification. Journal of Process Control 26(10): 17–25.

Kang and Kim (2013) Singular value decomposition based feature extraction approaches for classifying faults of induction motors. Mechanical Systems and Signal Processing 41(1): 348–356.

Liu, Cao, Chen, et al. (2013) Multi-fault classification based on wavelet SVM with PSO algorithm to analyze vibration signals from rolling element bearings. Neurocomputing 99(2013): 399–410.

Liu, Guo, et al. (2017) A hybrid intelligent multi-fault detection method for rotating machinery based on RSGWPT, KPCA and Twin SVM. ISA Transactions 66(2017): 249–261.

Muruganatham, Sanjith, Krishnakumar, et al. (2013) Roller element bearing fault diagnosis using singular spectrum analysis. Mechanical Systems and Signal Processing 35(1): 150–166.

Odiowei PEP and Cao (2010) Nonlinear dynamic process monitoring using canonical variate analysis and kernel density estimations. IEEE Transactions on Industrial Informatics 6(1): 36–45.

Ruiz-Cárcel, Cao, Mba, et al. (2015) Statistical process monitoring of a multiphase flow facility. Control Engineering Practice 42(1): 74–88.

Smith and Randall (2015) Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study. Mechanical Systems and Signal Processing 64(2015): 100–131.

Sun, Xiao, Wen, et al. (2015) Natural gas leak location with K–L divergence-based adaptive selection of Ensemble Local Mean Decomposition components and high-order ambiguity function. Journal of Sound and Vibration 347(2015): 232–245.

Sun, Xiao, Wen, et al. (2016) Natural gas pipeline leak aperture identification and location based on local mean decomposition analysis. Measurement 79(2016): 147–157.

Tao, et al. (2013) An approach to performance assessment and fault diagnosis for rotating machinery equipment. EURASIP Journal on Advances in Signal Processing 1(1): 1–16.

Van and Kang (2015) Bearing-fault diagnosis using non-local means algorithm and empirical mode decomposition-based feature extraction and two-stage feature selection. IET Science, Measurement & Technology 9(6): 671–680.

Xu and Chen (2013) An intelligent fault identification method of rolling bearings based on LSSVM optimized by improved PSO. Mechanical Systems and Signal Processing 35(1): 167–175.

Yan and Zhang (2015) Feature selection and analysis on correlated gas sensor data with recursive feature elimination. Sensors and Actuators B: Chemical 212(2015): 353–363.

Yang, Pan, et al. (2014) A fault diagnosis approach for roller bearing based on improved intrinsic timescale decomposition de-noising and kriging-variable predictive model-based class discriminate. Journal of Vibration and Control 22(5): 1431–1446.

Yin, Zhu and Kaynak (2015) Improved PLS focused on key-performance-indicator-related fault diagnosis. IEEE Transactions on Industrial Electronics 62(3): 1651–1658.