Class-information–incorporated kernel entropy component analysis with application to bearing fault diagnosis

Abstract

Rolling bearings are ubiquitous in industrial equipment, and timely fault diagnosis is crucial for avoiding catastrophic failures. Extracting effective fault features is a challenging task in vibration-based bearing fault detection. In this article, a novel method called class-information–incorporated kernel entropy component analysis is proposed for bearing fault diagnosis. The method is developed from the Hebbian learning theory of neural networks and from kernel entropy component analysis, which attempts to preserve as much of the Renyi quadratic entropy of the input dataset as possible after dimension reduction and performs well for nonlinear feature extraction. Class-information–incorporated kernel entropy component analysis can take advantage of the label information of training samples to guide dimension reduction while still following the same simple mathematical formulation as kernel entropy component analysis. A high-dimensional feature dataset including time-domain, frequency-domain, and time–frequency domain characteristic parameters is first derived from the vibration signals. Then, the intrinsic geometric features are extracted by class-information–incorporated kernel entropy component analysis, and a classification strategy based on fusion information is applied to recognize different operating conditions of bearings. The experimental results demonstrate the feasibility and effectiveness of the proposed method.

Keywords

Bearing fault diagnosis, feature extraction, Renyi entropy, class-information–incorporated kernel entropy component analysis

1. Introduction

Rolling element bearings are prevalent, important, and fragile parts of equipment, and they account for almost 45–55% of rotating machinery failures (Rai and Upadhyay, 2016). The safety and stability of rolling bearings are key to ensuring the normal operation of machines; unexpected failures may cause massive economic losses and even lead to casualties (Ahmed et al., 2018; Cerrada et al., 2018; Rai and Upadhyay, 2016). An accurate condition monitoring technique allows faults to be detected early and their type to be recognized correctly. Vibration-based bearing condition monitoring has been studied extensively and has become a well-accepted technique in many planned maintenance programs (Ahmed, 2019; Alamelu Manghai and Jegadeeshwaran, 2019). The major challenge for bearing condition monitoring lies in how to extract reliable and sensitive features from the original signals. The vibration signals of defective bearings in rotating machines are usually complicated and contain both useful information and noise. To obtain more fault information that reflects the condition of the bearings, a high-dimensional feature set including time-domain, frequency-domain, and time–frequency domain characteristic parameters is often used for diagnosis (Gan et al., 2015; Lei et al., 2007; Shao et al., 2018). However, the features obtained by traditional feature extraction methods may not contain all the diagnostic information, and the sensitivity of diagnostic features from the vibration signals may vary with load conditions because of nonlinear effects and nonstationary noise (Chen et al., 2017). Moreover, computing with a high-dimensional feature set constructed from all extracted features is a cumbersome process (Bull et al., 2019; Chen et al., 2017; Su et al., 2019). Thus, it is essential to develop effective dimension reduction and feature extraction techniques.

Various feature extraction methods associated with dimension reduction have been developed, such as principal component analysis (PCA) (Jolliffe, 2011), linear discriminant analysis (Belhumeur et al., 1997), locally linear embedding (Roweis and Saul, 2000), local tangent space alignment (Zhang and Zha, 2004), to name a few. Among these robust algorithms, PCA based on the data correlation matrix is the most widely used method for its simplicity and effectiveness (Huang et al., 2019). It is a linear and unsupervised transformation method, which aims to ensure that the original feature set is uncorrelated and maximizes the second-order statistics of original features (Guo et al., 2019). However, the performance of linear methods may degenerate for nonlinear data that have the nonlinear manifolds (such as nonlinear curves and surfaces) in the underlying low-dimensional structure (Li and Zhang, 2011). Kernel methods are a promising approach because they can constitute a framework to formulate nonlinear versions from linear algorithms (Gómez-Chova et al., 2011). A very influential method is kernel principal component analysis (KPCA) (Schölkopf et al., 1998), which is an extension of traditional linear PCA by using kernel trick. The key of KPCA is to implicitly find the nonlinear relationship between input dataset and the kernel feature space with kernel matrix that is derived from a positive semidefinite kernel function of computing inner products (Ji and Huang, 2018). KPCA overcomes the limitation of PCA in dealing with nonlinear problems and has been widely used in fault diagnosis (Cheng et al., 2016; Liu et al., 2018). However, these methods perform feature extraction and dimension reduction by choosing top eigenvalues and corresponding eigenvectors of kernel matrix without revealing the nature of original feature sets. Therefore, the extraction may select uninformative eigenvectors from the standpoint of information theory (Zhou et al., 2017a).

Kernel entropy component analysis (KECA) is an information theory–based nonlinear feature extraction method associated with dimension reduction, which was proposed and applied to pattern recognition by Jenssen (2009b). Unlike methods that depend on the second-order statistics of the input dataset, KECA is developed from information theory and preserves as much of the Renyi quadratic entropy of the input dataset as possible instead of depending on the top eigenvalues (Gao et al., 2014; Jenssen, 2009b; Zhou et al., 2017b). KECA has been applied to face recognition, process monitoring, data clustering, and so on, where it has shown clear advantages in feature extraction (Jenssen, 2013; Qi et al., 2019; Yang et al., 2015; Yuan et al., 2017; Zhang et al., 2015). KECA can retain most of the raw data information in a few main kernel entropy scores with clear physical meanings, and it is an effective tool for extracting the fatigue evolution information of rolling bearings (Wang et al., 2018).

However, KECA is an unsupervised method that does not consider the scatter information of classes, so it may discard discriminant information that is useful for recognition. To make full use of the label information and improve the performance of fault identification, this article presents a supervised method called class-information–incorporated kernel entropy component analysis (CIKECA) based on the Hebbian learning theory of neural networks. In bearing fault diagnosis, CIKECA can take advantage of the label information of training samples to guide dimension reduction and feature extraction of the high-dimensional feature dataset. CIKECA is designed with two objectives: one is to make full use of the label information of the given data and the other is to still follow the same simple mathematical formulation as KECA. The experimental results indicate that CIKECA performs better in bearing fault diagnosis than three other manifold learning methods, especially when the vibration signals are contaminated by strong background noise.

The rest of the article is organized as follows. In Section 2, the principle of KECA is introduced, and an improved KECA method based on the Hebbian learning theory of neural network is proposed. In Section 3, the fault diagnosis method based on the CIKECA algorithm is proposed. In Section 4, experiments with the proposed method were implemented, and the experimental results show the effectiveness and advantage of CIKECA compared with the PCA, KPCA, and KECA. Finally, the conclusions are drawn in Section 5.

2. Principle of the proposed method

2.1. Brief review of KECA

Assuming p(x) is the probability density function of a given dataset X = [x₁, x₂, …, x_N], its Renyi entropy of order α is expressed as $H_{α} (X) = (1 / (1 - α)) \log (\int p^{α} (x) d x)$, where α ≥ 0 and α ≠ 1 (Rényi et al., 1961). In KECA, the Renyi quadratic entropy (α = 2) is used because its value can be elegantly estimated by the Parzen window density estimator (Jenssen, 2009a; Zhou et al., 2017a). Thus, the Renyi quadratic entropy is $H (p) = - \log \int p^{2} (x) d x$. Because the logarithm is monotonic, only the integral V(p) = ∫p²(x)dx = E{p(x)} needs to be considered. To estimate V(p), and hence H(p), a Parzen window density estimate $\bar{p} (x) = (1 / N) \sum_{i = 1}^{N} K_{σ} (x, x_{i})$ is applied, where K_σ(x, x_i) is the so-called Parzen window or kernel function (Parzen, 1962), centered at x_i with a width governed by the parameter σ.

Using the sample mean approximation of the expectation operator

\begin{array}{rl} \bar{V} (p) & = \frac{1}{N} \sum_{i = 1}^{N} \bar{p} (x_{i}) \\ & = \frac{1}{N^{2}} \sum_{i = 1}^{N} \sum_{j = 1}^{N} K_{σ} (x_{i}, x_{j}) \\ & = \frac{1}{N^{2}} 1^{T} K 1 \end{array}

(1)

where K is an N × N kernel matrix whose element (i, j) is K_σ(x_i, x_j), and 1 is the N × 1 vector whose elements are all one.
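As a minimal numerical sketch of equation (1), the snippet below builds a kernel matrix and evaluates the estimate. It assumes an unnormalized Gaussian kernel exp(−‖x_i − x_j‖²/(2σ²)) and stores samples as rows of `X`; the function names are hypothetical and are not taken from the original work.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """N x N matrix with entries exp(-||x_i - x_j||^2 / (2 sigma^2)).

    Assumption: X holds one sample per row; the kernel is left unnormalized.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def renyi_quadratic_estimate(K):
    """Return V = (1/N^2) 1^T K 1 and the entropy estimate H = -log V."""
    N = K.shape[0]
    V = K.sum() / N ** 2  # 1^T K 1 is simply the sum of all entries of K
    return V, -np.log(V)
```

With identical samples every kernel entry equals one, so V = 1 and H = 0; widely separated samples drive V toward 1/N.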

The Renyi entropy could be expressed by the eigenvalues and eigenvectors of the kernel matrix, which could be eigen-decomposed as K = ED _λ E ^T, where D _λ = diag(λ₁, λ₂, …, λ_N) is a diagonal matrix storing eigenvalues λ₁, …, λ_N and E = [ e ₁, e ₂, …, e _N] is a matrix with corresponding eigenvectors e ₁, …, e _N as columns. Thus, equation (1) can be expressed as follows

\bar{V} (p) = \frac{1}{N^{2}} \sum_{i = 1}^{N} {(\sqrt{λ_{i}} e_{i}^{T} 1)}^{2}

(2)

Each term in this sum contributes to the entropy estimate. Some eigenvalues and their corresponding eigenvectors therefore contribute more to the entropy estimate than others, and these are not necessarily the ones associated with the d largest eigenvalues, on which methods such as KPCA depend. In KECA, the eigenvalues and eigenvectors are selected according to the d largest contributions to the entropy estimate, with d chosen so that the cumulative contribution of the selected terms reaches 95% of the total Renyi entropy estimate.
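The selection rule can be sketched as follows, assuming a symmetric positive semidefinite kernel matrix `K` is already available. The function name `keca_select` and the `rate` argument are illustrative choices, not part of the original description.

```python
import numpy as np

def keca_select(K, rate=0.95):
    """Rank eigenpairs of K by entropy contribution and keep enough to reach `rate`."""
    N = K.shape[0]
    lam, E = np.linalg.eigh(K)             # symmetric K; eigenvalues in ascending order
    lam = np.clip(lam, 0.0, None)          # guard against small negative round-off
    # Individual terms of equation (2): (sqrt(lambda_i) * e_i^T 1)^2 / N^2
    contrib = (np.sqrt(lam) * (E.T @ np.ones(N))) ** 2 / N ** 2
    order = np.argsort(contrib)[::-1]      # largest contribution first
    cum = np.cumsum(contrib[order]) / contrib.sum()
    d = min(int(np.searchsorted(cum, rate) + 1), N)
    idx = order[:d]
    return lam[idx], E[:, idx], contrib
```

The contributions sum to (1/N²)·1ᵀK1, so an eigenvector that is nearly orthogonal to the all-ones vector is ranked low regardless of the size of its eigenvalue.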

2.2. Introduction of CIKECA

Given a set of C-class training samples $x_{j}^{i} \in R^{n} (i = 1,2, \dots, C; j = 1,2, \dots, n_{i}; \sum_{i = 1}^{C} n_{i} = N)$, the corresponding class labels y_i ∈ {0,1}^C are constructed, where $x_{j}^{i}$ is the jth training sample of the ith class and y_i is encoded in a 1-of-C scheme, that is, only the ith component is set to 1 and all others are 0.

The training samples $x_{j}^{i}$ are envisaged to be mapped through a nonlinear function $Φ (x_{j}^{i})$ into a feature space. The mapped data and its label y_i are then concatenated into an augmented column vector $z_{j}^{i} = {[Φ (x, x_{j}^{i}), y_{i}^{T}]}^{T}$ that contains the label information, where $z_{j}^{i} \in R^{N + C}$. The stated dimension implies that $Φ (x, x_{j}^{i})$ is the N-dimensional vector of kernel evaluations between $x_{j}^{i}$ and the N training samples.

The kernel matrix K_t is constructed from $z_{j}^{i}$ as follows

K_{t} = \frac{1}{N} \sum_{i = 1}^{C} \sum_{j = 1}^{n_{i}} (z_{j}^{i} - μ) {(z_{j}^{i} - μ)}^{T}

(3)

Here, $μ = (1 / N) \sum_{i = 1}^{C} \sum_{j = 1}^{n_{i}} z_{j}^{i}$ . Eigen-decompose $K_{t} = E_{t} D_{t} E_{t}^{T}$ and estimate V(p) according to equation (2)

\begin{array}{rl} \hat{V} (p) & = \frac{1}{N^{2}} 1^{T} K_{t} 1 \\ & = \frac{1}{N^{2}} 1^{T} E_{t} D_{t} E_{t}^{T} 1 \\ & = \frac{1}{N^{2}} \sum_{i = 1}^{N} {(\sqrt{λ_{t i}} α_{t i}^{T} 1)}^{2} \end{array}

(4)

where D_t = diag(λ_t1, λ_t2, …, λ_tN) is a diagonal matrix storing the eigenvalues λ_ti and E_t = [α_t1, α_t2, …, α_tN] is a feature matrix with the corresponding eigenvectors α_ti as columns. Then, the optimal projection matrix U = {u_i | i = 1, 2, …, d} ∈ R^{(N+C)×d} can be obtained according to $\sqrt{λ_{t i}} α_{t i}^{T}$. Here, U is formed by the d eigenvectors corresponding to the d largest entropy estimates, in decreasing order. The number, d, of projection vectors to be kept is determined in terms of $\sum_{j = 1}^{d} λ_{j} / \sum_{j = 1}^{N + C} λ_{j} \geq α$ (α is set to 0.95 here for both CIKECA and KECA).
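A rough sketch of this training step is given below. It is one possible reading of the description, not a definitive implementation: it assumes that the augmented vector of training sample j is column j of the training kernel matrix stacked on its 1-of-C label, that eigenvectors are ranked by the entropy terms of equation (4), and that d is the smallest number of ranked eigenvectors whose eigenvalue share reaches the threshold. The name `cikeca_fit` is hypothetical, and the subsequent orthogonalization of U_x and U_y is not included.

```python
import numpy as np

def cikeca_fit(K, labels, n_classes, rate=0.95):
    """K: N x N training kernel matrix; labels: integer classes in [0, n_classes)."""
    N = K.shape[0]
    Y = np.eye(n_classes)[labels].T                # C x N, one 1-of-C column per sample
    Z = np.vstack([K, Y])                          # (N + C) x N augmented vectors z
    Zc = Z - Z.mean(axis=1, keepdims=True)         # subtract the mean vector mu
    Kt = Zc @ Zc.T / N                             # equation (3)
    lam, A = np.linalg.eigh(Kt)
    lam = np.clip(lam, 0.0, None)                  # remove negative round-off
    ones = np.ones(N + n_classes)
    contrib = (np.sqrt(lam) * (A.T @ ones)) ** 2   # entropy terms, up to the 1/N^2 factor
    order = np.argsort(contrib)[::-1]              # largest entropy term first
    ratio = np.cumsum(lam[order]) / lam.sum()      # assumed reading of the 0.95 criterion
    d = min(int(np.searchsorted(ratio, rate) + 1), N + n_classes)
    U = A[:, order[:d]]
    return U[:N, :], U[N:, :]                      # U_x (N x d) and U_y (C x d)
```

Splitting U row-wise mirrors the layout of the augmented vector: the first N rows act on the kernel part and the last C rows on the label part.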

For a pattern to be classified, U cannot be used directly to extract the required features of testing samples because the label is unknown. To perform an effective extraction task, the projection matrix U is decomposed into two parts U_x and U_y as follows

U = (u_{1}, u_{2}, \dots, u_{d}) = (\begin{matrix} U_{x} \\ U_{y} \end{matrix}), u_{i} = (\begin{matrix} u_{x i} \\ u_{y i} \end{matrix})

(5)

where U_x = {u_xi} ∈ R^{N×d}, U_y = {u_yi} ∈ R^{C×d}, i = 1, 2, …, d. To avoid redundancy among the features and improve the recognition accuracy after projection, the matrices U_x and U_y should be orthogonalized by the Gram-Schmidt procedure.

For the testing sample x_new, assuming its unknown class label is y, the corresponding augmented vector $z = {[Φ (x, x_{new}), y^{T}]}^{T}$ can be approximately represented by the set of principal components that has been found. It is then possible not only to obtain the required features directly from x_new but also to estimate its class label indirectly from the extracted features according to the following formulation

Z = (\begin{matrix} X_{Φ} \\ Y \end{matrix}) \approx \sum_{i = 1}^{d} a_{z i} u_{i} = (\begin{matrix} \sum_{i = 1}^{d} a_{z i} u_{x i} \\ \sum_{i = 1}^{d} a_{z i} u_{y i} \end{matrix}) = (\begin{matrix} U_{x} a_{z} \\ U_{y} a_{z} \end{matrix})

(6)

where X_Φ = Φ(x, x_new) is the kernel testing sample and $a_{z} = {[a_{z 1}, a_{z 2}, \dots, a_{z d}]}^{T}$ is the vector with the data projected in the new vector basis.

The relations $X_{Φ} \approx \sum_{i = 1}^{d} a_{z i} u_{x i} = U_{x} a_{z}$ and $Y \approx \sum_{i = 1}^{d} a_{z i} u_{y i} = U_{y} a_{z}$ follow from equation (6), and the least-squares approximate solution for a_z can then be acquired from X_Φ, that is, $a_{z} = {(U_{x})}^{+} X_{Φ}$. Substituting a_z into Y ≈ U_y a_z, the approximate class label for the testing sample can be estimated as

\hat{Y} = U_{y} {(U_{x})}^{+} X_{Φ}

(7)

Here, ${(U_{x})}^{+}$ is the pseudo-inverse matrix of U _x.
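Equation (7) amounts to a pseudo-inverse projection followed by a matrix product. The following sketch assumes U_x and U_y are available as NumPy arrays and that `k_new` is the length-N kernel vector Φ(x, x_new); the function name is hypothetical.

```python
import numpy as np

def estimate_features_and_label(Ux, Uy, k_new):
    """Return the features a_z and the label estimate Y_hat of one test sample."""
    a_z = np.linalg.pinv(Ux) @ k_new   # least-squares solution of Ux a = k_new
    y_hat = Uy @ a_z                   # equation (7)
    return a_z, y_hat
```

The estimate `y_hat` is generally not an exact 1-of-C vector, which is why it is treated as a soft label in the classification step rather than as a final decision.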

As seen, CIKECA provides a class estimate for each testing sample, which distinguishes it from KECA, where no such estimate is available.

2.3. Classification strategy based on fusion information

Although the estimated information Ŷ is obtained, a classifier should still be adopted to determine the true class to which the unknown pattern belongs. A classification strategy is introduced that combines the fusion information (the low-dimensional features a_z and the estimated label information Ŷ) with the k-nearest neighbor (KNN) classifier, which is widely used for its simplicity and effectiveness (Biswas et al., 2018). According to equations (6) and (7), the low-dimensional features of the testing sample x_t and of the ith training sample x_i are a_z(x_t) and a_z(x_i), and the corresponding label information is Ŷ and Y_i, respectively. The distance between a_z(x_t) and a_z(x_i) is defined as

D (a_{z} (x_{t}), a_{z} (x_{i})) = ‖ a_{z} (x_{t}) - a_{z} (x_{i}) ‖

(8)

where ‖·‖ is the 2-norm of feature vectors. A fusion strategy of feature information can then be defined as

d_{i} = θ ‖ a_{z} (x_{t}) - a_{z} (x_{i}) ‖ + (1 - θ) ‖ \hat{Y} - Y_{i} ‖

(9)

Here, θ (0 ≤ θ ≤ 1) is the fused coefficient, which determines the proportion of low-dimensional feature information and category label information in classification.

Because the two kinds of feature information differ in dimension, their magnitudes may differ greatly, which would affect the final state recognition. To achieve a better fusion effect, this numerical imbalance is eliminated by normalizing each term as follows

d_{i} = θ \frac{‖ a_{z} (x_{t}) - a_{z} (x_{i}) ‖}{\sum_{k = 1}^{N} ‖ a_{z} (x_{t}) - a_{z} (x_{k}) ‖} + (1 - θ) \frac{‖ \hat{Y} - Y_{i} ‖}{\sum_{k = 1}^{N} ‖ \hat{Y} - Y_{k} ‖}

(10)

According to the principle of the nearest neighbor classifier, the classification rule can be defined as follows: if $d_{k} = \min_{i} d_{i}, i = 1,2, \dots, N$, then the testing sample x_t is considered to belong to the same class as its nearest neighbor x_k in the training set. To analyze and reveal the inner nature of the CIKECA algorithm’s classification performance, the nearest neighbor classifier based on information fusion of the two features is denoted CIKECA (θ). From equation (10), if θ = 1, only the extracted feature is used for classification (CIKECA (1) for short), whereas only the estimated label is used for θ = 0 (CIKECA (0) for short), and the fusion information is exploited for 0 < θ < 1 (CIKECA (θ) for short).
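The normalized fusion distance of equation (10) and the nearest neighbor rule can be sketched as below. The sketch assumes that training features are stored as rows of `a_train`, that `Y_train` holds the 1-of-C label vectors Y_i as rows, and that a small constant `eps` (an added safeguard, not part of the original formulation) protects against division by zero. All names are illustrative.

```python
import numpy as np

def fused_nearest_neighbor(a_train, Y_train, train_labels, a_test, y_hat_test, theta=0.6):
    """Return the label of the training sample that minimizes d_i in equation (10)."""
    feat = np.linalg.norm(a_train - a_test, axis=1)      # ||a_z(x_t) - a_z(x_i)||
    lab = np.linalg.norm(Y_train - y_hat_test, axis=1)   # ||Y_hat - Y_i||
    eps = 1e-12                                          # avoids 0/0; added assumption
    d = theta * feat / (feat.sum() + eps) + (1.0 - theta) * lab / (lab.sum() + eps)
    return train_labels[int(np.argmin(d))]
```

Setting `theta=1.0` reduces the rule to a nearest neighbor search on the extracted features alone, and `theta=0.0` uses only the estimated label, matching the CIKECA (1) and CIKECA (0) variants.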

3. Fault diagnosis based on CIKECA

CIKECA introduces the label information of samples into the KECA algorithm, inspired by the Hebbian learning method of neural networks. The information extraction process for rolling bearings based on the improved KECA is as follows:

1. The high-dimensional feature set, which should represent the operating condition of the machine well, is first extracted from the raw vibration signals. The feature set is constructed from 11 time-domain statistical characteristics, 13 frequency-domain statistical characteristics, and the first six relative empirical mode decomposition (EMD) energy entropies, following Lei et al. (2008).

2. CIKECA is used to extract the low-dimensional manifold features a_z and the estimated category label information from the high-dimensional feature dataset, according to equations (5)–(7).

3. The nearest neighbor classifier based on fusion information is used to recognize the running state of the rolling bearing.
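To make the first step more concrete, the sketch below computes a handful of common time-domain statistics from one vibration record. The specific 11 time-domain characteristics used in this work are defined in Lei et al. (2008) and are not reproduced here; the six quantities chosen below are only an illustrative assumption, as is the function name.

```python
import numpy as np

def time_domain_features(signal):
    """Compute a few common time-domain statistics of a vibration signal."""
    x = np.asarray(signal, dtype=float)
    mean = x.mean()
    std = x.std()                                    # population standard deviation
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    kurtosis = np.mean((x - mean) ** 4) / std ** 4   # non-excess kurtosis
    crest_factor = peak / rms
    return {
        "mean": mean,
        "std": std,
        "rms": rms,
        "peak": peak,
        "kurtosis": kurtosis,
        "crest_factor": crest_factor,
    }
```

Impulsive fault signatures tend to raise kurtosis and crest factor, which is one reason such statistics are commonly included in bearing feature sets.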

The flowchart of the proposed method is shown in Figure 1.

Figure 1.

Fault diagnosis based on class-information–incorporated kernel entropy component analysis method.

4. Experiments and results

To verify the effectiveness of the CIKECA algorithm, two experimental studies on fault diagnosis of rolling bearings were carried out; details of the setups are described below. Experiment 1 is mainly aimed at recognizing different types of rolling bearing states, and Experiment 2 mainly studies the influence of different noise levels (i.e. the distance from the fault vibration source) on the state identification rate of rolling bearings.

4.1. Experiment 1

The vibration measuring system is mainly composed of hardware and software parts, and the schematic of the load devices of the machinery system and the actual experimental system are shown in Figure 2(a) and (b), respectively. Deep groove ball bearings (6328-2RZ) are used as the test bearings. The geometric parameters of the bearings are as follows: inner ring diameter (Di) = 28 mm, outer ring diameter (Do) = 28 mm, ball diameter (Db) = 11.509 mm, pitch diameter (Pd) = 48.5 mm, number of balls (Z) = 8, and contact angle (α) = 0.274 rad. The inner ring rotates uniformly with the shaft driven by the driving device, whereas the outer ring remains static under the central axial load. The single-point radial vibration signal at 1/2 of the width of the outer ring is collected by an acceleration sensor (YD-1) and amplified by a charge amplifier (DHF-2). The charge sensitivity and frequency response of the sensor are 6–10 pC/ms⁻² and 1–10 kHz ± 1 dB, respectively, and the frequency range of the amplifier is 0.3 Hz–100 kHz. Then, the signals are converted to voltage signals by an A/D converter (PCI-9114) and sent to a computer for further processing. The sampling frequency is 25 kHz, and the rotational speed of the driving motor is 1500 r/min. A radial load of 0 kN and an axial load of 1.0 kN were applied to the shaft and bearings by the cylinder.

Figure 2.

Test rig. (a) Schematic of load devices of machinery system and (b) actual experimental system.

Four different operating conditions were introduced: inner race fault, outer race fault, ball fault, and normal condition. The scratch defects of the bearings were introduced with an electric engraving pen; the widths of the scratch defects of the inner race, outer race, and ball were 65 ± 22 μm, 70 ± 20 μm, and 70 ± 20 μm, respectively, and the depths of the scratch defects were 0.2 ± 0.05 mm. According to the kinematic parameters and the rotational speed, the characteristic bearing defect frequencies of the inner race, outer race, and ball are 121.75 Hz, 78.25 Hz, and 55 Hz, respectively. The four vibration signal waveforms in the time domain, together with their amplitude spectra, are shown in Figure 3. It is difficult to distinguish the different faults from Figure 3 alone because of the effects of noise. The vibration signals under those four conditions are selected as samples, and 100 bearings were tested for each state. Thus, 400 samples were obtained, each containing 25,000 sample points. Half of the samples in the original dataset are used for training.

Figure 3.

Time domain and frequency domain figures of vibration signals for the four bearing conditions: (a) normal condition, (b) inner race fault, (c) outer race fault, and (d) ball fault.

After the high-dimensional feature set is constructed, it is input into CIKECA to eliminate redundancy and extract low-dimensional features; the first six relative EMD energy entropies of the different operating conditions are shown in Figure 4. To examine whether introducing class labels into KECA is helpful for recognition, similar methods including PCA, KPCA, and KECA are applied for comparison. The target dimension for each method is set so that the cumulative variance contribution rate is more than 95%. The Gaussian kernel function is used for all kernel methods, and five-fold cross-validation is applied to choose the parameter σ, which is 1.2, 1.5, and 1.6 for KPCA, KECA, and CIKECA, respectively. For visualization, plots of the first three principal components of the projection results are shown in Figures 5–9, where Figures 5(a), 6(a), 7(a), 8(a), and 9(a) represent the training results and Figures 5(b), 6(b), 7(b), 8(b), and 9(b) represent the testing results. The hollow dots represent the training samples and the solid dots represent the test samples.

Figure 4.

First six relative empirical mode decomposition energy entropies of different operating conditions.

Figure 5.

Clustering results with principal component analysis: (a) training results and (b) testing results.

Figure 6.

Clustering results with kernel principal component analysis: (a) training results and (b) testing results.

Figure 7.

Clustering results with kernel entropy component analysis: (a) training results and (b) testing results.

Figure 8.

Clustering results with class-information–incorporated kernel entropy component analysis (1): (a) training results and (b) testing results.

Figure 9.

Clustering results with class-information–incorporated kernel entropy component analysis (0): (a) training results and (b) testing results.

It is evident that the low-dimensional features of PCA mostly overlap, which means that PCA has certain limitations in extracting low-dimensional sensitive features of rolling bearings. In Figure 6, most of the low-dimensional features of KPCA overlap, and only part of the inner race fault and outer race fault samples can be distinguished. As seen in Figure 7, the ball fault and inner race fault can be recognized in the low-dimensional features of KECA, whereas the normal condition and outer race fault samples overlap. Figure 8 shows the clustering results of CIKECA (1), where only the extracted feature is used for classification. As seen, the classes in the low-dimensional features of CIKECA (1) are separated from each other in three-dimensional space, and only a few data points overlap. Figure 9 shows the clustering results of CIKECA (0), where only the estimated label is used for recognition. Most of the classes in the low-dimensional features of CIKECA (0) are separated from each other in three-dimensional space, especially the normal state and the outer race fault. The experimental results indicate that CIKECA has better clustering and dimensionality reduction performance than the other methods. The low-dimensional sensitive features and the estimated label information extracted by CIKECA can both represent the running state of rolling bearings, and fusing this information may improve rolling bearing fault identification.

In fault diagnosis based on pattern recognition, feature extraction techniques find a low-dimensional representation of the samples, and classifiers are then needed to identify the different bearing faults. The nearest neighbor classifier based on fusion information, that is, the extracted feature a_z and the estimated label information Ŷ, is applied to decide the category in CIKECA, whereas in the other methods the extracted features are fed directly into a KNN classifier for the final evaluation, with the number of nearest neighbors k set to 5. Fifty samples per class are selected randomly for training, and the remaining samples are used for testing. The quantitative evaluation procedure is repeated 10 times, and the average classification results are shown in Table 1. The classification accuracies of the original data, PCA, KPCA, KECA, CIKECA (1), CIKECA (0), and CIKECA (θ = 0.6) are 75.5%, 82%, 85%, 93%, 94%, 93%, and 95.5%, respectively. All accuracies obtained with low-dimensional features are higher than that of the original features, which means that effective dimension reduction methods can remove redundant information and improve classification accuracy, and CIKECA has better classification performance than KECA and KPCA. When θ = 1, only the extracted feature is used for classification, and the accuracy of CIKECA (1) (94%) is higher than that of KECA (93%). When θ = 0, only the estimated label is used for classification, and the accuracy of CIKECA (0) (93%) is equal to that of KECA (93%), whereas for the outer race fault the accuracy of CIKECA (0) (94%) is higher than that of KECA (86%) and CIKECA (1) (88%). When θ = 0.6, the fusion information is exploited, and the best classification performance is obtained. CIKECA extracts low-dimensional sensitive feature parameters and estimates the category label information of the high-dimensional feature space by combining training samples and category label features, which can improve the fault recognition accuracy.

Table 1.

Classification results (%) of each method with KNN classifier.

Method	Normal	Inner race fault	Outer race fault	Ball fault	Average accuracy
Origin feature	76	80	76	70	75.5
PCA	86	82	88	72	82
KPCA	90	88	84	78	85
KECA	88	100	86	98	93
CIKECA (1)	92	100	88	96	94
CIKECA (0)	92	96	94	90	93
CIKECA (θ), θ = 0.6	94	100	92	96	95.5

PCA: principal component analysis; KPCA: kernel principal component analysis; KECA: kernel entropy component analysis; CIKECA: class-information–incorporated kernel entropy component analysis; KNN: k-nearest neighbour.
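Because each class contributes the same number of test samples, the last column of Table 1 is the plain mean of the four per-class accuracies. The short check below reproduces that arithmetic from the tabulated values; the dictionary layout is only an illustrative choice.

```python
# Per-class accuracies (%) from Table 1: normal, inner race, outer race, ball fault.
per_class = {
    "Origin feature": [76, 80, 76, 70],
    "PCA": [86, 82, 88, 72],
    "KPCA": [90, 88, 84, 78],
    "KECA": [88, 100, 86, 98],
    "CIKECA (1)": [92, 100, 88, 96],
    "CIKECA (0)": [92, 96, 94, 90],
    "CIKECA (theta = 0.6)": [94, 100, 92, 96],
}

# Unweighted mean over the four classes gives the overall accuracy column.
averages = {method: sum(acc) / len(acc) for method, acc in per_class.items()}

for method, avg in averages.items():
    print(f"{method}: {avg}")
```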

θ (0 ≤ θ ≤ 1) is the fused coefficient, which determines the proportion of low-dimensional feature information and category label information in classification. Therefore, it is necessary to analyze the relationship between the parameter θ and the final identification accuracy. For different values of the fused coefficient (increased from 0 to 1), the nearest neighbor classifier based on fusion information is used for classification. The quantitative evaluation procedure is repeated 10 times; the relationship between the average accuracy and the fused coefficient θ is shown in Figure 10. As seen, the accuracy varies with the fused coefficient, but the classification accuracy of CIKECA (θ) features remains higher than that of the other algorithms for most θ values.

Figure 10.

Relationship between accuracy and fused coefficient θ.

4.2. Experiment 2

The platform includes an induction motor, a shaft coupled to the motor, several bearings supporting the shaft, and an oil-loaded device as a radial loader. The schematic of the platform and the actual experimental system are shown in Figure 11(a) and (b), respectively, and a detailed description can be found in Pan et al. (2016). The tested bearing is of type HRB 6010-2RZ, and its geometric parameters are as follows: Di = 50 mm, Do = 80 mm, Db = 9 mm, Pd = 65 mm, and Z = 13. Spalls were introduced by wire-cut electro-discharge machining on the inner race and outer race; the width and depth of the defects were 0.3 mm and 1 mm, respectively. The signals were collected by a B&K3560 system, the sampling frequency was 65,536 Hz, and the rotational speed was kept constant at 1500 r/min. A radial load of 1.0 kN and an axial load of 2.0 kN were applied to the shaft and bearings by the hydraulic cylinder. The vibration signals under three conditions (normal condition, outer race fault, and inner race fault) are selected as samples, and each state contains 80 samples, each of which contains 65,536 sample points. Half of the samples in the original dataset are used for training.

Figure 11.

Test rig. (a) Schematic of the testing machinery system and (b) testing machinery setup.

Because of various noise sources and interfering vibrations, the distance between the fault vibration source and the acceleration sensor can influence the recognition accuracy. As shown in Figure 12, six accelerometers were mounted along the shaft and on the fixed platform at Point 1 to Point 6 to acquire vibration signals, where Point 1–Point 4 were located along the shaft and Point 5 and Point 6 were located on the fixed platform. The vibration signal waveforms in the time domain of the normal condition, outer race fault, and inner race fault at Point 1–Point 6 are illustrated in Figure 12(a)–(c), respectively. The periodic impulse is obvious at Point 1, as shown in Figure 12, and the periodic impulse at Point 2 is partially submerged by noise, whereas the periodic impulse is completely overwhelmed by noise at the remaining points. As the sensor is located further from the source, the vibration impulse is gradually submerged by noise. Although measuring Point 5, located on the fixed platform of the experimental setup, is close to the tested bearing, it is affected by the joint impact of the connecting parts, so the characteristic impulse is also drowned by noise.

Figure 12.

Vibration signal waveforms of Point 1–Point 6 in the time domain: (a) normal condition, (b) outer race fault, and (c) inner race fault.

Experiment 2 was performed to verify the robustness of the proposed method. High-dimensional features of the vibration signals at Point 1–Point 6 are first constructed; PCA, KPCA, KECA, and CIKECA are then applied to extract the sensitive features, and the extracted features are fed directly into a KNN classifier for the final evaluation, with the number of nearest neighbors k set to 5. Forty samples per class are selected randomly for training, and the remaining samples are used for testing. The quantitative evaluation procedure is repeated 10 times, and the average classification results are shown in Figure 13(a). The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are also used to evaluate the performance of the proposed method (Nayana and Geethanjali, 2017). The ROC curves and AUC values are obtained for each class, and the average AUC values over all datasets are computed to help identify the better-performing feature extraction method. The average AUC is presented in Figure 13(b).
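A per-class AUC followed by averaging can be sketched as below. How class scores were derived from the KNN classifier is not stated here, so the sketch simply assumes that a score matrix is available (for example, the fraction of the k neighbors belonging to each class) and that labels are integers 0, …, C − 1 indexing its columns. The function names are hypothetical.

```python
import numpy as np

def binary_auc(scores, is_positive):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = scores[is_positive]
    neg = scores[~is_positive]
    diff = pos[:, None] - neg[None, :]          # all positive-negative score pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def macro_ovr_auc(score_matrix, labels):
    """Average of one-vs-rest AUC values over all classes present in `labels`."""
    classes = np.unique(labels)
    aucs = [binary_auc(score_matrix[:, c], labels == c) for c in classes]
    return float(np.mean(aucs))
```

The pairwise formulation is quadratic in the number of samples, which is acceptable for datasets of the size used here (a few hundred samples).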

Figure 13.

Experimental results at different measuring points. (a) Average accuracy of different methods at different measuring points and (b) average AUC of datasets by KNN for origin features, PCA, KPCA, KECA, and CIKECA. KNN: k-nearest neighbour; PCA: principal component analysis; KPCA: kernel principal component analysis; KECA: kernel entropy component analysis; and CIKECA: class-information–incorporated kernel entropy component analysis.

As seen, the classification accuracy of all the extracted features decreases gradually as the distance increases along the shaft (Point 1–Point 4). Of the two measuring points mounted on the fixed platform, Point 5 is near the tested bearing, yet its accuracy is only close to that at Point 3 because it is affected by the joint impact of the connecting parts. The accuracy at Point 6 is lower than at the other points. At all the measuring points, the recognition results of the original features are the lowest, which means that effective dimension reduction methods can remove redundant information and improve recognition accuracy. The accuracy of the CIKECA (θ) features is clearly higher than that of the other methods; notably, at Point 6 its accuracy is as high as 97.5%. Overall, compared with the other algorithms, the proposed method is more robust and can effectively extract low-dimensional sensitive characteristic parameters from highly noisy data, which improves the state recognition performance for rolling bearings.

5. Conclusions

In this study, a feature extraction method called CIKECA is proposed for fault diagnosis of rolling bearings. CIKECA is designed with two objectives: to make full use of the label information of the given data and to retain the same simple mathematical formulation as KECA. First, multidomain features covering the time domain, frequency domain, and time–frequency domain are constructed from the original vibration signals. Then, CIKECA is used for dimension reduction, extracting the intrinsic geometric features and removing superfluous information. Finally, the extracted features are fed into the KNN classifier to recognize different operating conditions of bearings. The results demonstrated the feasibility and effectiveness of the proposed method for fault diagnosis of rolling bearings.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by major project for technological innovation of Hubei Province of China (No. 2018AAA026), the Scientific Research Foundation for Doctoral Program of Hubei University of Technology (BSQD2017003), and the Natural Science Foundation of Hubei Province (No. 2019CFB326).

ORCID iDs

Hongdi Zhou

Jian Duan

References

Ahmed (2019) Intelligent methods for condition monitoring of rolling bearings using vibration data. PhD Thesis, Brunel University London, London, England.

Ahmed HOA, Wong MLD, Nandi (2018) Intelligent condition monitoring method for bearing faults from highly compressed measurements using sparse over-complete features. Mechanical Systems and Signal Processing 99: 459–477.

Alamelu Manghai, Jegadeeshwaran (2019) Vibration based brake health monitoring using wavelet features: a machine learning approach. Journal of Vibration and Control 25(18): 2534–2550.

Belhumeur, Hespanha, Kriegman (1997) Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7): 711–720.

Biswas, Chakraborty, Mullick, et al. (2018) A parameter independent fuzzy weighted k-nearest neighbour classifier. Pattern Recognition Letters 101: 80–87.

Bull, Worden, Fuentes, et al. (2019) Outlier ensembles: a robust method for damage detection and unsupervised feature extraction from high-dimensional data. Journal of Sound and Vibration 453: 126–150.

Cerrada, Sánchez R-V, et al. (2018) A review on data-driven fault severity assessment in rolling bearings. Mechanical Systems and Signal Processing 99: 169–196.

Chen, Liu, Huang (2017) Sparse discriminant manifold projections for bearing fault diagnosis. Journal of Sound and Vibration 399: 330–344.

Cheng, Chen, et al. (2016) Study on planetary gear fault diagnosis based on entropy feature fusion of ensemble empirical mode decomposition. Measurement 91: 140–154.

Gan, Wang, Zhu (2015) Multiple-domain manifold for feature extraction in machinery fault diagnosis. Measurement 75: 76–91.

Gao, Chen, et al. (2014) A Fisher discriminant framework based on kernel entropy component analysis for feature extraction and emotion recognition. In: 2014 IEEE international conference on multimedia and expo workshops (ICMEW), Chengdu, China, 14–18 July 2014, pp. 1–6. Piscataway, NJ: IEEE.

Gómez-Chova, Jenssen, Camps-Valls (2011) Kernel entropy component analysis for remote sensing image clustering. IEEE Geoscience and Remote Sensing Letters 9(2): 312–316.

Guo, Gao, et al. (2019) Sparse kernel principal component analysis via sequential approach for nonlinear process monitoring. IEEE Access 7: 47550–47563.

Huang, Liu (2019) Drivability evaluation model using principal component analysis and optimized extreme learning machine. Journal of Vibration and Control 25(16): 2274–2281.

Jenssen (2009a) Information theoretic learning and kernel methods. In: Information Theory and Statistical Learning. Boston, MA: Springer, 209–230.

Jenssen (2009b) Kernel entropy component analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(5): 847–860.

Jenssen (2013) Entropy-relevant dimensions in the kernel feature space: cluster-capturing dimensionality reduction. IEEE Signal Processing Magazine 30(4): 30–39.

Huang (2018) Kernel entropy component analysis with nongreedy L1-norm maximization. Computational Intelligence and Neuroscience 2018: 1–9.

Jolliffe (2011) Principal Component Analysis. Berlin, Heidelberg: Springer.

Lei (2008) A new approach to intelligent fault diagnosis of rotating machinery. Expert Systems with Applications 35(4): 1593–1600.

Lei, et al. (2007) Fault diagnosis of rotating machinery based on multiple ANFIS combination with GAs. Mechanical Systems and Signal Processing 21(5): 2280–2294.

Zhang (2011) Supervised locally linear embedding projection (SLLEP) for machinery fault diagnosis. Mechanical Systems and Signal Processing 25(8): 3125–3134.

Liu, Yang, Zio, et al. (2018) Artificial intelligence for fault diagnosis of rotating machinery: a review. Mechanical Systems and Signal Processing 108: 33–47.

Nayana, Geethanjali (2017) Analysis of statistical time-domain features effectiveness in identification of bearing faults from vibration signal. IEEE Sensors Journal 17(17): 5618–5625.

Pan, Zhang (2016) Research on fault test rig for rolling bearings in high-speed complex structure equipment (in Chinese). Bearing (10): 45–48.

Parzen (1962) On estimation of a probability density function and mode. The Annals of Mathematical Statistics 33(3): 1065–1076.

Meng, et al. (2019) KECA similarity-based monitoring and diagnosis of faults in multi-phase batch processes. Entropy 21(2): 121.

Rai, Upadhyay (2016) A review on signal processing techniques utilized in the fault diagnosis of rolling element bearings. Tribology International 96: 289–306.

Rényi, et al. (1961) On measures of entropy and information. In: Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, volume 1: contributions to the theory of statistics, Berkeley, California, pp. 547–561. California, USA: The Regents of the University of California.

Roweis, Saul (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500): 2323–2326.

Schölkopf, Smola, Müller K-R (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 10(5): 1299–1319.

Shao, Jiang, et al. (2018) Rolling bearing fault detection using continuous deep belief network with locally linear embedding. Computers in Industry 96: 27–39.

Wang, et al. (2019) Automated X-ray recognition of solder bump defects based on ensemble-ELM. Science China Technological Sciences 62(9): 1512–1519.

Wang, Liu, et al. (2018) Research on feature extraction algorithm of rolling bearing fatigue evolution stage based on acoustic emission. Mechanical Systems and Signal Processing 113: 271–284.

Yang, Liu, et al. (2015) Wavelet kernel entropy component analysis with application to industrial process monitoring. Neurocomputing 147: 395–402.

Yuan, et al. (2017) A diagnostics method for analog circuits based on improved kernel entropy component analysis. Journal of Electronic Testing 33(6): 697–707.

Zhang, Zuo, et al. (2015) A novel approach to diagnosis of analog circuit incipient faults based on KECA and OAO LSSVM. Metrology and Measurement Systems 22(2): 251–262.

Zhang, Zha (2004) Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM Journal on Scientific Computing 26(1): 313–338.

Zhou, Shi, Liao, et al. (2017a) Weighted kernel entropy component analysis for fault diagnosis of rolling bearings. Sensors 17(3): 625.

Zhou, Shi, Liao, et al. (2017b) Using supervised kernel entropy component analysis for fault diagnosis of rolling bearings. Journal of Vibration and Control 23(13): 2167–2178.