Incipient Fault Detection of Helical Gearbox Based on Variational Mode Decomposition and Time Synchronous Averaging

Abstract

The earlier a fault detection method diagnoses defects, the lower the subsequent costs. Feature extraction from the vibration signal is the foremost step in incipient fault detection of gearboxes. However, the conventional statistical features in the time and frequency domains cannot diagnose early or low-intensity faults. To overcome this problem, this research, for the first time, extracts additional features by combining variational mode decomposition and time synchronous averaging (VMD-TSA). The two methods are combined in two ways. First, the Intrinsic Mode Functions (IMFs) of the TSA signal are calculated by VMD, and the Amplitude Energy (AE) and Permutation Entropy (PE) of the first four IMFs are computed. Second, the IMFs of the vibration signals are calculated, and the TSA features are extracted from the most informative IMF. Moreover, 16 time-domain features, 13 frequency-domain features, and 9 TSA features are extracted from the vibration signals. These features are extracted for the healthy condition and four faulty conditions (crack, spalling, chipping, and wear), each at three severities. After feature extraction, the Relief-F algorithm selects the informative features, which are then used for fault detection by a Feed-Forward Neural Network (FNN) classifier. The VMD-TSA method is compared with Empirical Mode Decomposition-TSA (EMD-TSA) and Ensemble Empirical Mode Decomposition-TSA (EEMD-TSA), and the proposed method proves more effective in early fault detection. In addition, the classification accuracy obtained with Relief-F is compared with that of other feature selection methods, namely Laplacian Score (LS), Principal Component Analysis (PCA), and Minimum Redundancy-Maximum Relevance (MRMR), and the performance of the FNN classifier is compared with that of the Support Vector Machine (SVM).
As shown in this study, the VMD-TSA features improve early fault detection. For instance, the classification accuracy for crack fault detection using all features except the VMD-TSA features is 93.98%, whereas adding the VMD-TSA features raises it to 99.48% under the same conditions.

Keywords

Gear fault; early fault detection; variational mode decomposition; time synchronous average; Relief-F algorithm

Highlights

• A new method, VMD-TSA, is proposed to detect incipient faults in helical gearboxes.

• Experimental vibration signals are used to evaluate the VMD-TSA method for incipient fault detection.

• A wide range of features in the time, frequency, and time-frequency domains is extracted from the experimental signals, using common features and the VMD-TSA method.

• The performance of several feature selection methods is compared, and the most informative of the 63 features in different domains are selected using the Relief-F method.

• The proposed VMD-TSA method increases classifier accuracy compared with not using it and outperforms EMD-TSA and EEMD-TSA in early fault detection.

Introduction

Gearboxes are one of the main parts of rotating equipment, used to transmit power at a specific torque and rotational speed. They may experience gear faults due to improper lubrication, overloading, and manufacturing defects, and such faults may lead to irreparable damage. Therefore, faults need to be detected as early as possible to reduce maintenance costs, improve gearbox operation, and prevent subsequent damage. The common gear faults are crack, spalling, chipping, and wear, which are studied in this research.

Many methods have been used for gearbox fault detection, and vibration signal processing is one of the most practical approaches. For example, Schmidt et al.¹ proposed a method based on the instantaneous power spectrum of vibration signals for gearbox fault detection under time-varying operating conditions and validated it experimentally. Yu et al.² also used vibration signals for planetary gearbox fault detection: they modeled the vibration signals as amplitude- and frequency-modulated (AM-FM) signals in the resonance region to detect faults, and validated their theoretical analysis with numerical simulations and experimental tests. In addition to these methods, time synchronous averaging (TSA) is a classical method for gearbox condition monitoring.³ Although TSA can reduce noise in experimental vibration signals, it requires stationary vibration signals, which are often unavailable in practice.⁴ N. Ahmed et al.⁵ developed a modified TSA that is robust for gearbox fault detection under nonstationary torque and speed conditions. They proposed a Multiple-pulse Individually Rescaled-Time Synchronous Averaging (MIR-TSA) method for tooth root crack detection in spur gears and showed that it improves gearbox fault detection under nonstationary conditions. Later, Jong M. Ha et al.⁴ introduced an Autocorrelation-based TSA (ATSA) for condition monitoring of planetary gearboxes in wind turbines. The ATSA method defines the optimal shape and range of the TSA window function, which improves data processing efficiency and suppresses signal distortion during averaging. In another method, Sharma and Parey⁶ used an RMS-based probability density function and entropy measures under fluctuating speed conditions; they proposed the RMS-based method to improve the efficiency of entropy measures under varying speed.

In addition to TSA, mode decomposition methods such as Empirical Mode Decomposition (EMD),⁷ Ensemble Empirical Mode Decomposition (EEMD),⁸ and Variational Mode Decomposition (VMD)⁹ have been increasingly used for fault detection. EMD is a well-known time-frequency signal decomposition method that is useful for fault detection. It decomposes a signal into several IMFs, each containing a specific frequency band. However, in complex signals such as experimental vibration signals, which contain considerable background noise, the IMFs may interfere with each other. This phenomenon is called mode mixing.¹⁰ Several approaches have therefore been proposed to alleviate mode mixing. For example, Wu and Huang⁸ proposed EEMD, whose main idea is to create an ensemble of observations by adding white Gaussian noise to the measured signal; ensemble averaging of the extracted IMFs then reduces mode interference. The noise amplitude and the ensemble size are two critical parameters that affect the efficiency of EEMD. In addition, residual noise in the reconstructed signals can reduce computational efficiency.¹⁰ To overcome these drawbacks, other methods such as Complementary EEMD (CEEMD)¹¹ and EMD Manifold (EMDM)¹² have been proposed. Meanwhile, VMD is a noise-robust method proposed by Dragomiretskiy and Zosso in 2014,⁹ which resolves some limitations of EMD, such as mode mixing, sampling limits, and the lack of a mathematical foundation.

Several successful applications of VMD to fault detection have been reported. Yan and Jia¹³ employed VMD for rolling bearing fault diagnosis. First, they extracted features in three domains (time, frequency, and time-frequency), computing the time-frequency features by VMD. Then, the LS algorithm was employed to select the more fault-informative features. Finally, a Particle Swarm Optimization-based SVM (PSO-SVM) was used to diagnose multiple fault conditions. Y. Li et al.¹⁴ used VMD to remove noise from the vibration signals of a planetary gearbox and highlight the faults. They then used Generalized Composite Multi-scale Symbol Dynamic Entropy (GCMSDE) to extract features from the denoised signals, refined the features with the LS algorithm, and identified the healthy and faulty conditions by Softmax regression. In another study, J. Li et al.¹⁵ presented a method for impact fault detection of gearboxes that combines VMD with Coupled Underdamped Stochastic Resonance (CUSR) to extract impulse fault features. VMD decomposed the vibration signals into several IMFs, the informative IMF was selected according to the Correlation Kurtosis (CK) of each IMF, and the impulse features were extracted from the output signal of the CUSR system. Vanraj et al.¹⁶ used a hybrid data fusion approach combining EMD and the Teager-Kaiser energy operator for gearbox fault detection. They extracted statistical features from acoustic and vibration signals, ranked them by relevance using a floating forward selection method, and performed fault diagnosis with a k-nearest neighbor classifier. In 2019, R. Gu et al.¹⁷ proposed a method for incipient fault diagnosis of rolling bearings that combines Adaptive VMD with the Teager Energy Operator (AVMD-TEO). First, the AVMD parameters were optimized by the grey wolf optimization algorithm. Then, the mode best suited to signal reconstruction was selected by an efficient weighted kurtosis index, and finally, the reconstructed signal was processed by TEO to identify the fault frequency. In another example, Deng et al.¹⁸ used VMD and PCA-SVM for incipient fault detection of rolling bearings. The energy and kurtosis of each IMF were calculated, processed by PCA, and used as inputs to an SVM, and the performance of the method was validated with experimental data.

With the aim of early fault detection, Zhipeng Ma et al.¹⁹ used an encoder-based method to detect incipient faults in rotating machinery. They proposed an improved Gaussian process regression for early fault diagnosis via encoder signals, in which the Gaussian process regression was modified by a spectral density complex kernel. In another study on incipient faults, Sharma and Parey²⁰ used VMD for early gearbox fault detection under variable speed conditions and compared its performance with the Empirical Wavelet Transform (EWT) and the Flexible Analytic Wavelet Transform (FAWT). Continuing the use of VMD, Qing Ni et al.²¹ introduced a Fault Information-guided VMD (FIVMD) for rolling element bearing diagnosis. FIVMD was proposed to extract the repetitive transients of incipient bearing faults and to identify the optimal bandwidth control parameter that maximizes fault information. Comparisons with VMD, EMD, and Local Mean Decomposition (LMD) showed that FIVMD is preferable for diagnosing early repetitive bearing transients. Fine-tuned VMD is another VMD-based method, used by Dibaj et al.²² for fault diagnosis of rotating machinery. In that research, the number of extracted modes $(K)$ and the mode frequency bandwidth control parameter (α) were optimized by minimizing the mean bandwidth of the extracted modes.

In addition to the aforementioned methods, other methods such as Wavelet Packet Transform (WPT),²³ wavelet energy,²⁴ and adaptive parameter-induced stochastic resonance²⁵ have been employed for gearbox fault detection at different fault severities. Other signal processing methods and health indicators are reviewed in Ref. [26]. However, the fault severities studied in these works were not low enough to represent incipient faults. Therefore, faults at earlier stages are introduced in this study to demonstrate the proposed method's ability in early fault detection.

Increasing the number of features in various domains can benefit gearbox fault detection and allows faults to be assessed from different aspects. However, some of these features can be redundant or misleading. Therefore, feature selection methods are used to choose the features that are most informative about defects. For example, Karabacak et al.²⁷ proposed an intelligent feature selection and classification method for worm gear fault diagnosis, considering wear, pitting, and tooth breakage as gear faults. They extracted features from vibration, sound, and thermal image data and then classified the faulty and healthy conditions using SVM and ANN. Another feature selection method is MRMR, which finds a feature subset whose features are highly relevant to the classes and minimally redundant with each other.²⁸ The two main criteria of this algorithm, relevance and redundancy, are defined with mutual information, and their combination is called the Mutual Information Quotient (MIQ): the larger the MIQ, the more valuable the feature for fault detection. The LS algorithm, developed by X. He et al.,²⁹ is an effective solution for informative feature selection, in which a weight based on feature variance is assigned to each feature. Because most feature selection methods are variance-based, their main limitation appears in multi-class problems. The Relief-F algorithm, a modified version of Relief, is suitable for two or more classes.³⁰ Like the previous methods, it assigns a weight to each feature, here based on the $k$ nearest neighbors of each observation. Considering more neighbors, taken from every class, makes the method suitable for problems with more than two classes. Therefore, the Relief-F method is used for feature selection in this study. The selected features are then fed into the FNN classifier to identify all fault severities.
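The neighbor-based weighting idea behind Relief-F can be illustrated with a short sketch. It is a minimal, hypothetical implementation (the function name `relieff` and its parameters are illustrative, not the implementation used in this study), assuming min-max normalized features, Manhattan distance, $k$ nearest hits, and $k$ nearest misses from every other class weighted by the class prior, in the spirit of Kononenko's formulation.

```python
import numpy as np


def relieff(X, y, m=None, k=5, seed=0):
    """Hypothetical Relief-F sketch: one weight per feature (larger = more informative)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    # Min-max normalize so that each per-feature difference lies in [0, 1].
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xn = (X - X.min(axis=0)) / span
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    m = n if m is None else m
    rng = np.random.default_rng(seed)
    w = np.zeros(p)
    for i in rng.choice(n, size=m, replace=m > n):   # randomly selected instances R_i
        diff = np.abs(Xn - Xn[i])                    # per-feature differences to every instance
        dist = diff.sum(axis=1)                      # Manhattan distance to R_i
        for c in classes:
            members = np.flatnonzero((y == c) & (np.arange(n) != i))
            near = members[np.argsort(dist[members])[:k]]   # k nearest neighbors in class c
            if near.size == 0:
                continue
            contrib = diff[near].mean(axis=0) / m
            if c == y[i]:
                w -= contrib                         # nearest hits: penalize within-class spread
            else:
                w += prior[c] / (1.0 - prior[y[i]]) * contrib  # nearest misses, prior-weighted
    return w
```

For example, with one feature that follows the class label and one pure-noise feature, the first weight should come out clearly larger than the second. The distance metric and normalization are design choices of this sketch; other Relief-F variants weight neighbors by rank or use Euclidean distance.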

To sum up, fault diagnosis based on experimental vibration signals is not as easy as with theoretical vibration signals. Measured signals are influenced by friction, lubrication between gears, the loading mechanism, and sensor mounting. Moreover, they are often contaminated by noise, which may overwhelm incipient faults and make them harder to detect. Therefore, classical methods such as the Fast Fourier Transform (FFT), EMD, or ordinary time and frequency features may not detect early faults as well as severe ones. One of the main differences between incipient faults and noise is that the faults occur at specific frequencies, whereas noise spans a wide frequency range in vibration signals. This difference is the primary basis of the method proposed in this study, called VMD-TSA.

The remainder of this paper is organized as follows. First, the theoretical background of the introduced method, including VMD and TSA, is reviewed. Then, the feature extraction procedures are explained: the equations for the time- and frequency-domain features, PE, AE, and the TSA features are given, and the combination of VMD and TSA is described. After the feature extraction step, the Relief-F algorithm used for feature selection is reviewed. Next, the experimental setup and measurement conditions are described, and the classification accuracies of fault detection are reported. Finally, the results of different signal processing and feature selection methods are compared, and conclusions are drawn.

Theoretical background of the proposed method

In this section, the theoretical background of the proposed method is introduced. As mentioned in the previous section, the proposed method is based on VMD and TSA. Therefore, the main ideas and formulations of these methods are explained in this section.

Variational mode decomposition method

The VMD method, developed by Dragomiretskiy and Zosso,⁹ is one of the newest mode decomposition methods. In contrast to EMD and EEMD, which are empirical, VMD is grounded in signal processing theory, including the Wiener filter, the Hilbert transform, and frequency shifting (heterodyne demodulation). The primary purpose of VMD is to decompose a signal into modes, each of which is compact around a specific frequency, called its central frequency. Mode mixing, which is observed in EMD and EEMD, rarely occurs in VMD. As a result, fault frequencies can be identified much more reliably. VMD is formulated as the constrained variational problem described in equation (1)

\min_{u_{k}, ω_{k}} {\sum_{k} {‖ \partial_{t} [(δ (t) + \frac{j}{π t}) * u_{k} (t)] e^{- j ω_{k} t} ‖}_{2}^{2}}; \sum_{k} u_{k} = f

(1)

where $u_{k}$ and $ω_{k}$ denote the $k^{th}$ IMF and its central frequency, respectively. Equation (1) reflects the signal processing concepts mentioned above. First, the analytic signal of each mode is computed through the Hilbert transform. Then, the analytic signal is shifted to baseband by mixing it with the exponential term $e^{- j ω_{k} t}$. Finally, the bandwidth is estimated by the squared L2-norm of the time gradient of the demodulated signal. To solve the constrained variational problem in equation (1), the augmented Lagrangian $L$ is introduced in equation (2)

L ({u_{k}}, {ω_{k}}, η) = ξ \sum_{k} {‖ \partial_{t} [(δ (t) + \frac{j}{π t}) * u_{k} (t)] e^{- j ω_{k} t} ‖}_{2}^{2} + {‖ f (t) - \sum_{k} u_{k} (t) ‖}_{2}^{2} + 〈 η (t), f (t) - \sum_{k} u_{k} (t) 〉

(2)

where $ξ$ and $η$ are the penalty parameter and the Lagrangian multiplier, respectively. Equation (2) can be solved iteratively by the Alternating Direction Method of Multipliers (ADMM), in which the IMFs and their central frequencies are updated by equation (3)

{\begin{matrix} {\hat{u}}_{k}^{n + 1} (ω) = \frac{\hat{f} (ω) - \sum_{i \neq k} {\hat{u}}_{i} (ω) + \frac{\hat{η} (ω)}{2}}{1 + 2 ξ {(ω - ω_{k})}^{2}} \\ ω_{k}^{n + 1} = \frac{\int_{0}^{\infty} ω {| {\hat{u}}_{k}^{n + 1} (ω) |}^{2} d ω}{\int_{0}^{\infty} {| {\hat{u}}_{k}^{n + 1} (ω) |}^{2} d ω} \end{matrix}

(3)

In equation (3), ${\hat{u}}_{k}^{n + 1}$, ${\hat{u}}_{k}^{n}$, $\hat{f} (ω)$, and $\hat{η} (ω)$ represent the Fourier transforms of $u_{k}^{n + 1}$, $u_{k}^{n}$, $f (t)$, and $η (t)$, respectively, and $n$ is the iteration index. In addition to the IMFs and their central frequencies, the Lagrangian multiplier is updated by equation (4)

{\hat{η}}^{n + 1} (ω) = {\hat{η}}^{n} (ω) + τ (\hat{f} (ω) - \sum_{k} {\hat{u}}_{k}^{n + 1} (ω))

(4)

The update process is continued until the convergence criterion in equation (5) is satisfied.

\sum_{k} \frac{{‖ {\hat{u}}_{k}^{n + 1} - {\hat{u}}_{k}^{n} ‖}_{2}^{2}}{{‖ {\hat{u}}_{k}^{n} ‖}_{2}^{2}} < ε

(5)

where $ε$ is a small tolerance, usually set to $10^{- 6}$. More details about the VMD method are given in Ref. [9].
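To make the update loop of equations (3)–(5) concrete, a simplified NumPy sketch follows. It is a hypothetical implementation, not the reference code of Ref. [9]: the name `vmd` and its parameters (`alpha` for the penalty parameter $ξ$, `tau` for the multiplier step, `tol` for $ε$) are our own choices; it works on the one-sided (analytic) spectrum with normalized frequency in cycles per sample, initializes the central frequencies uniformly, and omits the boundary mirroring used in the reference implementation.

```python
import numpy as np


def vmd(f, K=4, alpha=2000.0, tau=0.0, tol=1e-6, max_iter=500):
    """Simplified VMD following equations (3)-(5); returns (modes, central frequencies).

    alpha plays the role of the penalty parameter xi; frequencies are normalized
    (cycles per sample, 0 to 0.5). tau = 0 switches off the multiplier update.
    """
    f = np.asarray(f, dtype=float)
    T = f.size
    # Analytic-signal weights: keep DC (and Nyquist), double positive bins, drop negative bins.
    h = np.zeros(T)
    h[0] = 1.0
    if T % 2 == 0:
        h[T // 2] = 1.0
        h[1:T // 2] = 2.0
    else:
        h[1:(T + 1) // 2] = 2.0
    keep = h > 0
    F = (np.fft.fft(f) * h)[keep]             # one-sided spectrum f_hat(omega)
    w = np.arange(T)[keep] / T                # normalized frequency of each kept bin
    U = np.zeros((K, F.size), dtype=complex)  # mode spectra u_hat_k(omega)
    omega = 0.5 * np.arange(K) / K            # uniform initial central frequencies
    eta = np.zeros(F.size, dtype=complex)     # Lagrangian multiplier eta_hat(omega)
    for _ in range(max_iter):
        U_old = U.copy()
        for k in range(K):
            others = U.sum(axis=0) - U[k]     # latest estimates of the other modes
            U[k] = (F - others + eta / 2) / (1 + 2 * alpha * (w - omega[k]) ** 2)  # eq. (3), top
            power = np.abs(U[k]) ** 2
            if power.sum() > 0:
                omega[k] = np.sum(w * power) / power.sum()                        # eq. (3), bottom
        eta = eta + tau * (F - U.sum(axis=0))                                     # eq. (4)
        num = np.sum(np.abs(U - U_old) ** 2, axis=1)
        den = np.sum(np.abs(U_old) ** 2, axis=1) + np.finfo(float).eps
        if np.sum(num / den) < tol:                                               # eq. (5)
            break
    full = np.zeros((K, T), dtype=complex)
    full[:, keep] = U
    modes = np.real(np.fft.ifft(full, axis=1))  # back to the time domain
    return modes, omega
```

As a quick check, decomposing the sum of two tones at 0.05 and 0.2 cycles per sample with `K=2` should return central frequencies near those two values, and the modes should sum back to the input. Setting `tau=0` relaxes the reconstruction constraint, which is a common choice for noisy measurements.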

Time synchronous averaging method

The original TSA method is a standard technique for gearbox condition monitoring. TSA divides the experimental signal into $N$ segments based on a rotational signal measured by a tachometer. Although each segment covers the same number of revolutions, the segments contain different numbers of samples because the rotational speed varies. Therefore, each segment is resampled by interpolation so that every revolution has the same number of samples. The synchronized segments are then summed and averaged to produce the TSA signal. Three further signals, called the difference, regular, and residual signals, are derived from the TSA signal to extract more fault-sensitive features. The difference signal is obtained by filtering out the gear mesh frequencies, their first-order sidebands, and their harmonics from the TSA signal. The regular signal contains the shaft frequency and its harmonics and the gear mesh frequencies and their harmonics in the TSA signal. Finally, the residual signal is obtained by removing the shaft frequency, the gear mesh frequencies, and their harmonics from the TSA signal. Figure 1(a) shows the experimental vibration signal measured on the gearbox test setup in the medium wear condition at a speed of 35 Hz and a load of 30 N·m, and Figure 1(b)–(e) show the TSA, difference, regular, and residual signals, respectively.

Figure 1.

The calculated signals from the time-synchronous average method in medium wear condition. (a) Experimental vibration signal (speed: 35 Hz, load: 30N.m), (b) TSA signal, (c) difference signal, (d) regular signal, and (e) residual signal.

The features extracted from the TSA signals are explained in section 3.3. More details about the TSA method can be found in Ref. [3].
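The resampling-and-averaging step and the derivation of the regular, residual, and difference signals can be sketched as follows. This is an illustrative sketch rather than the implementation used in this study. It assumes that the tachometer provides the sample indices of once-per-revolution pulses (`pulse_idx`), that each revolution is linearly resampled to `n_per_rev` points, and that the gear of interest is on the tachometer shaft with `n_teeth` teeth, so FFT bin $q$ of the averaged revolution is shaft order $q$. All names and the numbers of retained harmonics (`n_shaft_harm`, `n_mesh_harm`) are hypothetical. Following a common convention, the difference signal is derived from the residual signal by also removing the first-order mesh sidebands.

```python
import numpy as np


def tsa(x, pulse_idx, n_per_rev=1024):
    """Average revolutions delimited by tachometer pulses after angular resampling."""
    x = np.asarray(x, dtype=float)
    grid = np.arange(n_per_rev) / n_per_rev        # common angular grid for one revolution
    revs = []
    for start, stop in zip(pulse_idx[:-1], pulse_idx[1:]):
        seg = x[start:stop + 1]                    # include the next pulse sample as endpoint
        angle = np.linspace(0.0, 1.0, seg.size)    # assume constant speed within a revolution
        revs.append(np.interp(grid, angle, seg))   # same number of samples per revolution
    return np.mean(revs, axis=0)


def tsa_components(sig, n_teeth, n_shaft_harm=1, n_mesh_harm=3):
    """Split a one-revolution TSA signal into regular, residual and difference signals."""
    sig = np.asarray(sig, dtype=float)
    spec = np.fft.rfft(sig)
    order = np.arange(spec.size)                   # bin q corresponds to shaft order q
    mesh = n_teeth * np.arange(1, n_mesh_harm + 1)
    regular_mask = ((order >= 1) & (order <= n_shaft_harm)) | np.isin(order, mesh)
    sideband_mask = np.isin(order, np.concatenate([mesh - 1, mesh + 1])) & ~regular_mask
    regular = np.fft.irfft(spec * regular_mask, n=sig.size)
    residual = sig - regular                       # shaft and mesh components removed
    difference = residual - np.fft.irfft(spec * sideband_mask, n=sig.size)
    return regular, residual, difference
```

With a synthetic revolution containing shaft order 1, mesh order 20, and an extra order 7 component, the regular signal should retain orders 1 and 20, while the residual and difference signals keep only order 7. Real signals with speed fluctuation inside a revolution would need a finer angle model than the constant-speed assumption used here.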

Feature extraction

In this section, all features extracted from the experimental signals are defined. They are divided into four groups: common statistical features in the time and frequency domains (section 3.1), permutation entropy (PE) and amplitude energy (AE) of the IMFs (section 3.2), TSA features (section 3.3), and VMD-TSA features (section 3.4). The first three groups have previously been used for bearing and gearbox fault detection.⁵,¹³ However, they are not sufficiently accurate for incipient (early) faults. In this study, new features produced by combining VMD and TSA are introduced. The main idea of this combination is that both VMD and TSA are noise-reduction methods, which help separate early faults from noise. In addition, VMD is a time-frequency method that can separate repetitive incipient fault signatures from random noise.

Time and frequency domain features

The amplitude and distribution of signals in the time domain may differ between faulty and healthy conditions. In addition, faults can introduce new frequency components and/or change the amplitudes of existing ones. Therefore, time- and frequency-domain features can be helpful in fault diagnosis. In this study, 16 time-domain and 13 frequency-domain features are extracted from the vibration signals.

Time-domain features are common statistical features, listed in Table 1. $F_{T 1}$–$F_{T 10}$ are the mean value, standard deviation, Root Mean Square (RMS), absolute mean value, skewness, kurtosis, variance, maximum value, minimum value, and peak-to-peak value, respectively. The remaining features are dimensionless statistical features: $F_{T 11}$–$F_{T 16}$ are the waveform index, peak index, pulse index, margin index, skewness index, and kurtosis index, respectively.¹³ In Table 1, $x (i)$ and $n$ are the vibration signal and its number of samples, respectively.

Table 1.

Time-domain features.

$F_{T 1} = \frac{1}{n} \sum_{i = 1}^{n} x (i)$	$F_{T 2} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {[x (i) - F_{T 1}]}^{2}}$	$F_{T 3} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} x {(i)}^{2}}$	$F_{T 4} = \frac{1}{n} \sum_{i = 1}^{n} \| x (i) \|$
$F_{T 5} = \frac{\sum_{i = 1}^{n} {(x (i) - F_{T 1})}^{3}}{(n - 1) F_{T 2}^{3}}$	$F_{T 6} = \frac{n \sum_{i = 1}^{n} {(x (i) - F_{T 1})}^{4}}{{(\sum_{i = 1}^{n} {(x (i) - F_{T 1})}^{2})}^{2}}$	$F_{T 7} = \frac{\sum_{i = 1}^{n} {(x (i) - F_{T 1})}^{2}}{n - 1}$	$F_{T 8} = \max \| x (n) \|$
$F_{T 9} = \min \| x (n) \|$	$F_{T 10} = F_{T 8} - F_{T 9}$	$F_{T 11} = \frac{F_{T 2}}{F_{T 4}}$	$F_{T 12} = \frac{F_{T 8}}{F_{T 2}}$
$F_{T 13} = \frac{F_{T 8}}{F_{T 4}}$	$F_{T 14} = \frac{F_{T 8}}{F_{T 3}}$	$F_{T 15} = \frac{F_{T 5}}{{(\sqrt{F_{T 7}})}^{3}}$	$F_{T 16} = \frac{F_{T 6}}{{(F_{T 7})}^{2}}$
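The sixteen time-domain features can be computed directly from Table 1. The following NumPy sketch (the function name `time_features` is hypothetical) follows the table literally, including the normalizations used in $F_{T 5}$–$F_{T 7}$, and reads $\max \| x (n) \|$ and $\min \| x (n) \|$ as the maximum and minimum absolute values over all samples.

```python
import numpy as np


def time_features(x):
    """Return F_T1..F_T16 of Table 1 as a length-16 array (index 0 -> F_T1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    c = x - x.mean()
    ft1 = x.mean()                                  # mean value
    ft2 = np.sqrt(np.mean(c ** 2))                  # standard deviation
    ft3 = np.sqrt(np.mean(x ** 2))                  # RMS
    ft4 = np.mean(np.abs(x))                        # absolute mean value
    ft5 = np.sum(c ** 3) / ((n - 1) * ft2 ** 3)     # skewness
    ft6 = n * np.sum(c ** 4) / np.sum(c ** 2) ** 2  # kurtosis
    ft7 = np.sum(c ** 2) / (n - 1)                  # variance
    ft8 = np.max(np.abs(x))                         # maximum value
    ft9 = np.min(np.abs(x))                         # minimum value
    ft10 = ft8 - ft9                                # peak-to-peak as defined in Table 1
    return np.array([
        ft1, ft2, ft3, ft4, ft5, ft6, ft7, ft8, ft9, ft10,
        ft2 / ft4,                # F_T11 waveform index
        ft8 / ft2,                # F_T12 peak index
        ft8 / ft4,                # F_T13 pulse index
        ft8 / ft3,                # F_T14 margin index
        ft5 / np.sqrt(ft7) ** 3,  # F_T15 skewness index
        ft6 / ft7 ** 2,           # F_T16 kurtosis index
    ])
```

For the short square wave `[1, -1, 1, -1]`, for instance, the mean and skewness are zero, the kurtosis $F_{T 6}$ is one, and the variance $F_{T 7}$ is $4/3$.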

Frequency-domain features are based on the FFT. As listed in Table 2, $F_{F 1}$ indicates the signal energy in the frequency domain, $F_{F 5}$ and $F_{F 7}$–$F_{F 9}$ describe the position change of the main frequency band, and the other features describe the concentration and dispersion of the frequency spectrum.¹³ Here, $y (k)$ is the FFT spectrum of the time signal $x (t)$, $k = 1, \dots, K$ indexes the $K$ spectrum lines, and $f_{k}$ is the frequency of the $k^{th}$ spectrum line.

Table 2.

Frequency-domain features.

$F_{F 1} = \frac{\sum_{k = 1}^{K} y (k)}{K}$	$F_{F 2} = \frac{\sum_{k = 1}^{K} {[y (k) - F_{F 1}]}^{2}}{K - 1}$	$F_{F 3} = \frac{\sum_{k = 1}^{K} {[y (k) - F_{F 1}]}^{3}}{K {(\sqrt{F_{F 2}})}^{3}}$	$F_{F 4} = \frac{\sum_{k = 1}^{K} {[y (k) - F_{F 1}]}^{4}}{K {(F_{F 2})}^{2}}$	$F_{F 5} = \frac{\sum_{k = 1}^{K} (f_{k} y (k))}{\sum_{k = 1}^{K} y (k)}$
$F_{F 6} = \sqrt{\frac{\sum_{k = 1}^{K} [{(f_{k} - F_{F 5})}^{2} y (k)]}{K}}$	$F_{F 7} = \sqrt{\frac{\sum_{k = 1}^{K} (f_{k}^{2} y (k))}{\sum_{k = 1}^{K} y (k)}}$	$F_{F 8} = \sqrt{\frac{\sum_{k = 1}^{K} (f_{k}^{4} y (k))}{\sum_{k = 1}^{K} (f_{k}^{2} y (k))}}$	$F_{F 9} = \frac{\sum_{k = 1}^{K} (f_{k}^{2} y (k))}{\sqrt{[\sum_{k = 1}^{K} (f_{k}^{4} y (k))] [\sum_{k = 1}^{K} y (k)]}}$	$F_{F 10} = \frac{F_{F 6}}{F_{F 5}}$
$F_{F 11} = \frac{\sum_{k = 1}^{K} [{(f_{k} - F_{F 5})}^{3} y (k)]}{K {(F_{F 6})}^{3}}$	$F_{F 12} = \frac{\sum_{k = 1}^{K} [{(f_{k} - F_{F 5})}^{4} y (k)]}{K {(F_{F 6})}^{4}}$	$F_{F 13} = \frac{\sum_{k = 1}^{K} [\sqrt{\| f_{k} - F_{F 5} \|} y (k)]}{K \sqrt{F_{F 6}}}$
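Table 2 can be implemented in the same way. In the sketch below, the function name `freq_features` and the choice of a one-sided magnitude spectrum for $y (k)$ are assumptions, since the table does not specify the spectrum normalization; `fs` is the sampling frequency in hertz.

```python
import numpy as np


def freq_features(x, fs):
    """Return F_F1..F_F13 of Table 2, using the one-sided magnitude spectrum as y(k)."""
    y = np.abs(np.fft.rfft(np.asarray(x, dtype=float)))   # y(k)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)               # f_k in Hz
    K = y.size
    F1 = y.mean()
    F2 = np.sum((y - F1) ** 2) / (K - 1)
    F3 = np.sum((y - F1) ** 3) / (K * np.sqrt(F2) ** 3)
    F4 = np.sum((y - F1) ** 4) / (K * F2 ** 2)
    F5 = np.sum(f * y) / np.sum(y)                        # spectral centroid
    F6 = np.sqrt(np.sum((f - F5) ** 2 * y) / K)
    F7 = np.sqrt(np.sum(f ** 2 * y) / np.sum(y))
    F8 = np.sqrt(np.sum(f ** 4 * y) / np.sum(f ** 2 * y))
    F9 = np.sum(f ** 2 * y) / np.sqrt(np.sum(f ** 4 * y) * np.sum(y))
    F10 = F6 / F5
    F11 = np.sum((f - F5) ** 3 * y) / (K * F6 ** 3)
    F12 = np.sum((f - F5) ** 4 * y) / (K * F6 ** 4)
    F13 = np.sum(np.sqrt(np.abs(f - F5)) * y) / (K * np.sqrt(F6))
    return np.array([F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12, F13])
```

For a pure 100 Hz tone that falls exactly on an FFT bin, $F_{F 5}$, $F_{F 7}$, and $F_{F 8}$ all equal the tone frequency and $F_{F 9}$ equals one, which is a convenient sanity check.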

Permutation entropy and amplitude energy

Amplitude Energy (AE) is based on the FFT amplitude. Faults may produce peaks in the frequency spectrum that distinguish faulty from healthy conditions. As shown in equation (6), AE is computed by summing the squared FFT amplitudes over all spectral points¹³

A E = \sum_{i = 1}^{n} {[F F T (x (i))]}^{2}

(6)

where $x (i)$ and $n$ are the time signal and its number of samples, respectively.
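Reading $[F F T (x (i))]^{2}$ in equation (6) as the squared magnitude of the $i^{th}$ FFT bin (our interpretation), AE can be computed as below; the function name is hypothetical. By Parseval's theorem, with NumPy's unnormalized FFT this sum equals $n \sum_{i} x (i)^{2}$, which gives a quick consistency check.

```python
import numpy as np


def amplitude_energy(x):
    """AE of equation (6): sum of squared FFT magnitudes over all n bins."""
    spectrum = np.fft.fft(np.asarray(x, dtype=float))
    return float(np.sum(np.abs(spectrum) ** 2))


x = np.array([0.5, -1.0, 2.0, 0.25])
# Parseval check: sum |X_k|^2 equals n * sum x_i^2 for the unnormalized DFT.
print(amplitude_energy(x), x.size * np.sum(x ** 2))
```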

Permutation Entropy (PE), introduced by Bandt and Pompe,³¹ evaluates the regularity of a signal and is useful for rotating machinery condition monitoring.³² For a time signal with $N$ samples, the phase space can be reconstructed as

[\begin{matrix} X (1) = x (1), x (1 + τ), \dots, x (1 + (m - 1) τ) \\ \begin{matrix} ⋮ \\ X (i) = x (i), x (i + τ), \dots, x (i + (m - 1) τ) \end{matrix} \\ ⋮ \\ X (N - (m - 1) τ) = x (N - (m - 1) τ), x (N - (m - 2) τ), \dots, x (N) \end{matrix}]; i = 1, \dots, (N - (m - 1) τ)

(7)

where $m$ is the embedding dimension and $τ$ is the time delay, which are set to 3 and 1 in this study, respectively, based on trial and error. As described in the literature,³² the $m$ samples of each reconstructed vector $X (i)$ can be arranged in increasing order as

x (i + (j_{1} - 1) τ) \leq x (i + (j_{2} - 1) τ) \leq \dots \leq x (i + (j_{m} - 1) τ)

(8)

If two or more samples in the reconstructed vector $X (i)$ are equal, they are ordered by their original positions. For example, if $x (i + (j_{2} - 1) τ) = x (i + (j_{3} - 1) τ)$ and $j_{2} < j_{3}$, then $x (i + (j_{2} - 1) τ)$ is placed before $x (i + (j_{3} - 1) τ)$. Each vector $X (i)$ is thus mapped to a symbol sequence $(j_{1}, j_{2}, \dots, j_{m})$ according to the arrangement in equation (8). Since $m$ samples can be arranged in $m!$ ways, each symbol sequence is one of $m!$ possible permutations. Denoting the probabilities of the $k$ distinct symbol sequences that occur by $P_{1}, P_{2}, \dots, P_{k}$ (with $k \leq m!$ and probabilities summing to one), the PE of order $m$ for the time signal $x (i)$ is computed by equation (9)

H_{P} (m) = - \sum_{l = 1}^{k} P_{l} \ln P_{l}

(9)

When all $m!$ symbol sequences have the same probability $1 / m!$, PE reaches its maximum value, $\ln (m!)$. PE is therefore normalized as in equation (10).

H_{P} = \frac{H_{P} (m)}{\ln (m!)}

(10)

Therefore, $H_{P}$ ranges from zero to one. The minimum value means that the time signal is entirely regular, and the maximum value indicates that all symbol sequences are equally probable, as in white Gaussian noise. PE can therefore help distinguish between faulty and healthy conditions.
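Equations (7)–(10) can be sketched as follows, with this study's settings $m = 3$ and $τ = 1$ as defaults. The function name is hypothetical; a stable sort is used so that tied samples keep their original order, as described above.

```python
import math

import numpy as np


def permutation_entropy(x, m=3, tau=1):
    """Normalized permutation entropy H_P of equations (7)-(10)."""
    x = np.asarray(x, dtype=float)
    n_vec = x.size - (m - 1) * tau
    # Rows are the reconstructed vectors X(i) of equation (7).
    X = np.array([x[i:i + (m - 1) * tau + 1:tau] for i in range(n_vec)])
    # Stable argsort orders tied samples by original position, as in equation (8).
    patterns = np.argsort(X, axis=1, kind="stable")
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()                           # P_1, ..., P_k
    h = -np.sum(p * np.log(p))                          # equation (9)
    return float(h / math.log(math.factorial(m)))       # equation (10)
```

A monotonic ramp produces a single ordinal pattern and hence $H_{P} = 0$, whereas a long white-noise record gives a value close to one.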

In this study, the AE and PE features are extracted from the first four IMFs of the time signals obtained by VMD. They are also used in the VMD-TSA combinations.

Time synchronous averaging features

In this study, nine features are extracted from the TSA, difference, regular, and residual signals, each of which is effective in diagnosing certain kinds of faults. For example, the RMS measures the energy of the signal. Localized faults increase the peakiness of vibration signals, so the statistical distribution of the signal departs from a Gaussian distribution; as a result, kurtosis, a measure of non-Gaussianity, changes. The crest factor is useful for detecting severe defects on a limited number of teeth, because it is sensitive to a few high-amplitude peaks in the signal. The RMS and kurtosis are defined in Table 1, and the crest factor is computed by equation (11)

C r e s t F a c t o r (x) = \frac{P (x)}{R M S (x)}

(11)

where $x$ is the TSA signal, $P (x)$ is its peak (maximum absolute) amplitude, and $R M S (x)$ is its root mean square value. The three other features, the fourth-order Figure of Merit (FM4), M6A, and M8A, are computed from the difference signal. FM4 has been used to detect faults on a limited number of teeth; it is essentially the kurtosis of the difference signal. In addition, the sixth normalized moment (M6A) and the eighth normalized moment (M8A) are used to detect defects on the tooth surfaces. These features are calculated by equations (12)–(14)

F M 4 = \frac{n \sum_{i = 1}^{n} {(d_{i} - \bar{d})}^{4}}{{(\sum_{i = 1}^{n} {(d_{i} - \bar{d})}^{2})}^{2}}

(12)

M 6 A (d) = \frac{n^{2} \sum_{i = 1}^{n} {(d_{i} - \bar{d})}^{6}}{{(\sum_{i = 1}^{n} {(d_{i} - \bar{d})}^{2})}^{3}}

(13)

M 8 A (d) = \frac{n^{3} \sum_{i = 1}^{n} {(d_{i} - \bar{d})}^{8}}{{(\sum_{i = 1}^{n} {(d_{i} - \bar{d})}^{2})}^{4}}

(14)

Parameter $d$ denotes the difference signal in the above equations. In addition, the zero-order Figure of Merit (FM0), the energy ratio, and NA4 are extracted. FM0 is suited to detecting major faults. It is computed as the ratio of the maximum peak-to-peak amplitude of the TSA signal to the sum of the amplitudes of the gear mesh frequencies and their harmonics, as shown in equation (15)

F M 0 = \frac{P . P . (x)}{\sum_{i = 1}^{n} A_{i} (R)}; A = \frac{f f t (R)}{n}

(15)

where $P . P . (x)$ is the maximum peak-to-peak amplitude of the TSA signal $x$, and $A$ is the amplitude spectrum of the regular signal $R$, evaluated at the gear mesh frequencies and their harmonics. The energy ratio is defined as the ratio of the standard deviation of the difference signal to that of the regular signal. It helps detect continuous (distributed) faults such as wear. The energy ratio is computed by equation (16)

E n e r g y R a t i o = \frac{σ (d)}{σ (R)}

(16)

The last feature, NA4, is computed from the residual signal. It is a modified FM4 that indicates the onset of damage and tracks its progression. NA4 is calculated by equation (17)

N A 4 = \frac{\frac{1}{n} \sum_{i = 1}^{n} {(r_{i k} - {\bar{r}}_{k})}^{4}}{{[\frac{1}{k} \sum_{j = 1}^{k} \frac{1}{n} \sum_{i = 1}^{n} {(r_{i j} - {\bar{r}}_{j})}^{2}]}^{2}}

(17)

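where $r$ is the residual signal, ${\bar{r}}_{j}$ is the mean of the $j^{th}$ residual record, $k$ is the index of the current record (time instant), and $n$ is the number of samples in each record.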
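Equations (11)–(17) translate into a few short NumPy functions. The sketch below is illustrative: the function names are hypothetical, $P (x)$ is taken as the maximum absolute value, FM4, M6A, and M8A are computed from the general normalized-moment form $n^{p/2 - 1} \sum (d_{i} - \bar{d})^{p} / (\sum (d_{i} - \bar{d})^{2})^{p/2}$ (which reduces to equations (12) and (13) and gives the factor $n^{3}$ for M8A), and for FM0 the caller supplies the FFT bin indices of the gear mesh frequency and its harmonics (`mesh_bins`, a hypothetical parameter), since locating them depends on the test rig.

```python
import numpy as np


def _centered(v):
    v = np.asarray(v, dtype=float)
    return v - v.mean()


def crest_factor(x):                      # equation (11)
    x = np.asarray(x, dtype=float)
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))


def normalized_moment(d, order):
    """FM4, M6A, M8A for order 4, 6, 8: n^(p/2-1) * sum(c^p) / (sum(c^2))^(p/2)."""
    c = _centered(d)
    n = c.size
    return n ** (order // 2 - 1) * np.sum(c ** order) / np.sum(c ** 2) ** (order // 2)


def fm0(x, regular, mesh_bins):           # equation (15)
    x = np.asarray(x, dtype=float)
    amp = np.abs(np.fft.fft(regular)) / len(regular)
    return (x.max() - x.min()) / np.sum(amp[mesh_bins])


def energy_ratio(d, regular):             # equation (16)
    return np.std(d) / np.std(regular)


def na4(residual_records):                # equation (17); the last record is the current one
    recs = [_centered(r) for r in residual_records]
    current = recs[-1]
    mean_var = np.mean([np.mean(r ** 2) for r in recs])   # variance averaged over records
    return np.mean(current ** 4) / mean_var ** 2
```

For a pure sinusoid sampled over whole periods, FM4, M6A, and M8A take the values 1.5, 2.5, and 4.375, and NA4 with a single record reduces to FM4; departures from these values indicate impulsive content.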

The combination of time synchronous averaging and variational mode decomposition

The main challenge in early fault diagnosis is that fault signatures are buried in background noise and are hard to recover. In most cases, the noise and the incipient fault have comparable amplitudes in both the time series and the frequency spectrum. Therefore, noise-reduction methods such as TSA and VMD are suitable for early fault diagnosis. This paper proposes a new method that combines VMD and TSA in two ways. First, VMD decomposes the TSA signal into IMFs, and AE and PE are extracted from the first four IMFs. Second, the TSA features of section 3.3 are extracted from the most informative IMF of the time signal. Several techniques have been proposed to choose the most informative IMF. For example, Yan and Gao³³ used energy measure and correlation measure criteria to find the most informative mode, and Wang et al.³⁴ selected the IMF with the highest kurtosis value. However, these methods consider only the time domain of the IMFs and ignore the frequency domain, which also carries valuable information. In this paper, a hybrid method introduced by J. Wang et al.¹² is used, which is based on the kurtosis of each IMF in both the time domain and the envelope spectrum. The purpose of this combination is to reduce the effect of outliers. This method, named Time and Envelope Spectrum Kurtosis (TESK), is expressed by equation (18)

tk = k_{c} \cdot k_{es}

(18)

where

k_{c} = \frac{E\left[(c(t) - \mu_{c})^{4}\right]}{\sigma_{c}^{4}}, \quad k_{es} = \frac{E\left[(es(f) - \mu_{es})^{4}\right]}{\sigma_{es}^{4}}

(19)

Here, $c(t)$, $\mu_{c}$, $\sigma_{c}$, and $E(\cdot)$ are the considered IMF, the mean of $c(t)$, the standard deviation of $c(t)$, and the expectation operator, respectively. Also, $es(f)$ is the power spectrum of the envelope of $c(t)$, and $\mu_{es}$ and $\sigma_{es}$ are the mean and the standard deviation of $es(f)$, respectively.
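The TESK criterion of equations (18) and (19) can be sketched as follows to pick the most informative IMF. This minimal NumPy sketch rests on assumptions the text does not spell out: the envelope is the magnitude of the analytic signal (built here with an FFT-based Hilbert transform), the envelope mean is removed before its power spectrum is taken so that the DC bin does not dominate $es(f)$, and all function names are hypothetical.

```python
import numpy as np


def kurtosis(x):
    """Population kurtosis E[(x - mu)^4] / sigma^4, as in equation (19)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d ** 4) / np.mean(d ** 2) ** 2)


def analytic_signal(x):
    """Analytic signal via the FFT (one-sided spectrum doubling, a discrete Hilbert transform)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def tesk(imf):
    """Time and Envelope Spectrum Kurtosis, tk = k_c * k_es (equation (18))."""
    k_c = kurtosis(imf)                        # kurtosis of the IMF in time
    envelope = np.abs(analytic_signal(imf))    # envelope of c(t)
    # ASSUMPTION: remove the envelope mean so the DC bin does not dominate es(f).
    es = np.abs(np.fft.rfft(envelope - envelope.mean())) ** 2   # envelope power spectrum
    k_es = kurtosis(es)                        # kurtosis across frequency bins
    return k_c * k_es


def most_informative_imf(imfs):
    """Return the index of the IMF with the largest TESK value, plus all scores."""
    scores = [tesk(c) for c in imfs]
    return int(np.argmax(scores)), scores
```

An impulsive IMF scores high on both factors (peaky in time, with energy concentrated at a few envelope-spectrum lines), so in this sketch it would be the one passed on to the TSA feature extraction of the second combination.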

In total, 63 features are extracted from the vibration signals, grouped into three main categories: time-domain, frequency-domain, and time-frequency-domain features. They comprise 16 time-domain features, 13 frequency-domain features, 8 VMD features, 9 TSA features, 8 features from the first combination of TSA and VMD, and 9 features from the second combination. In this way, the experimental data are evaluated in different representations, which reduces the loss of valuable fault information. However, if all features were fed to the classifier simultaneously, the high dimensionality of the input space would result in poor classification accuracy (the so-called “curse of dimensionality”). Therefore, the most informative features are selected by feature selection methods and used for fault detection. The flowchart of the proposed method is shown in Figure 2.

Figure 2.

Algorithm flowchart of the introduced method.
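The first VMD-TSA combination yields 8 of the features counted above: AE and PE for each of the first four IMFs of the TSA signal. The sketch below assumes the IMFs have already been obtained by VMD. Permutation entropy follows the standard Bandt–Pompe ordinal-pattern definition, normalized by log(order!); order 3 and delay 1 are assumed defaults, because their values are not given in this section. `amplitude_energy` is a hypothetical placeholder (sum of squared samples) to be replaced by the AE definition used in the paper.

```python
import math

import numpy as np


def permutation_entropy(x, order=3, delay=1, normalize=True):
    """Bandt-Pompe permutation entropy of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    n_vectors = x.size - (order - 1) * delay
    if n_vectors <= 0:
        raise ValueError("signal too short for the chosen order and delay")
    counts = {}
    for i in range(n_vectors):
        window = x[i:i + order * delay:delay]                # embedded vector of length `order`
        pattern = tuple(np.argsort(window, kind="stable"))   # its ordinal pattern
        counts[pattern] = counts.get(pattern, 0) + 1
    p = np.array(list(counts.values()), dtype=float) / n_vectors
    h = -np.sum(p * np.log(p))                               # Shannon entropy of the patterns
    return float(h / math.log(math.factorial(order))) if normalize else float(h)


def amplitude_energy(c):
    # ASSUMPTION: placeholder AE = sum of squared samples; use the paper's definition instead.
    return float(np.sum(np.asarray(c, dtype=float) ** 2))


def first_combination_features(imfs, n_modes=4):
    """AE and PE of the first n_modes IMFs of the TSA signal (2 * n_modes features)."""
    feats = []
    for c in imfs[:n_modes]:
        feats.append(amplitude_energy(c))
        feats.append(permutation_entropy(c))
    return np.array(feats)


# Feature budget stated in the text: 16 + 13 + 8 + 9 + 8 + 9 = 63.
FEATURE_COUNTS = {"time": 16, "frequency": 13, "VMD": 8, "TSA": 9,
                  "VMD-TSA (first)": 8, "VMD-TSA (second)": 9}
assert sum(FEATURE_COUNTS.values()) == 63
```

A normalized PE near 1 indicates an irregular, noise-like mode, while lower values indicate more regular ordinal structure.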

Relief-F algorithm

Many methods, such as LS, PCA, MRMR, and Relief-F, are used for feature selection. In this paper, the performance of these methods in detecting spalling fault severity has been evaluated (see Appendix 1), and Relief-F, as the most effective method, has been selected for further investigation. Relief-F is a modified version of the Relief method, presented by Kononenko.³⁰ In Relief, a weight is assigned to each feature; the larger the weight, the better the feature separates the classes. To compute the weights, Relief finds two nearest neighbors of a randomly selected instance ($R_{i}$): one from the same class, called the nearest hit (H), and one from a different class, called the nearest miss (M). The weight of feature F is then updated by equation (20), and this process is repeated $m$ times, where $m$ is a user-defined parameter

W[A] = W[A] - \frac{\mathrm{diff}(F, R_{i}, H)}{m} + \frac{\mathrm{diff}(F, R_{i}, M)}{m}

(20)

The initial value of $W[A]$ is zero, and the weight is updated at each iteration. The diff operator is defined by equation (21)

\mathrm{diff}(A, I_{1}, I_{2}) = \frac{\left|\mathrm{value}(A, I_{1}) - \mathrm{value}(A, I_{2})\right|}{\max(A) - \min(A)}

(21)

As equations (20) and (21) show, the weight of feature $F$ increases as its difference from the nearest miss grows and its difference from the nearest hit shrinks. In other words, a feature with a higher weight separates the classes more accurately than features with lower weights.
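Equations (20) and (21) translate into a short loop. The following is a minimal two-class Relief sketch under stated assumptions: neighbors are found with the Manhattan distance on range-scaled features (the sum of the diff values of equation (21)), the sampled instance is excluded from its own neighbor search, and the function name `relief` is hypothetical.

```python
import numpy as np


def relief(X, y, m, rng=None):
    """Basic Relief weights, equations (20)-(21); one weight per feature (column)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                 # guard against constant features
    Xs = X / span                         # |difference| of scaled values = diff(A, I1, I2)
    W = np.zeros(X.shape[1])              # all weights start at zero
    for _ in range(m):
        i = rng.integers(len(X))          # random instance R_i
        dist = np.abs(Xs - Xs[i]).sum(axis=1)
        dist[i] = np.inf                  # R_i is not its own neighbor
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest hit H
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest miss M
        W -= np.abs(Xs[i] - Xs[hit]) / m
        W += np.abs(Xs[i] - Xs[miss]) / m
    return W
```

A feature that differs little from the nearest hit but strongly from the nearest miss accumulates a large positive weight, which mirrors the reasoning in the paragraph above.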

Although Relief can separate two classes, its accuracy decreases as the number of classes increases. Relief-F was therefore proposed to select features that can separate multiple classes and to increase the reliability of the weights, particularly under noisy conditions.³⁵ Relief and Relief-F share the same idea; the main difference is that Relief-F uses $k$ nearest hits $(H_{j})$ and $k$ nearest misses $(M_{j})$ instead of a single neighbor. As in other studies, $k$ is set to 10 in this study.³⁵ Equation (20) therefore becomes equation (22)

W[A] = W[A] - \sum_{j=1}^{k}\frac{\mathrm{diff}(F, R_{i}, H_{j})}{m \cdot k} + \sum_{C \neq \mathrm{class}(R_{i})}\frac{\frac{P(C)}{1 - P(\mathrm{class}(R_{i}))}\sum_{j=1}^{k}\mathrm{diff}(F, R_{i}, M_{j}(C))}{m \cdot k}

(22)

Each miss class $C$ is weighted by its prior probability $P(C)$. In the third term of equation (22), these priors are divided by $1 - P(\mathrm{class}(R_{i}))$ because the class of $R_{i}$ (the hit class) is excluded from the sum over miss classes. More details about Relief-F can be found in the literature.³⁵
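Equation (22) can be sketched as follows: $k$ nearest hits and $k$ nearest misses from each other class, with each miss class weighted by $P(C)/(1 - P(\mathrm{class}(R_i)))$. This is a minimal NumPy sketch, not the implementation used in the paper; the Manhattan distance on range-scaled features and the name `relief_f` are assumptions. Following equation (22), the sums are divided by $m \cdot k$ even if a class has fewer than $k$ other instances.

```python
import numpy as np


def relief_f(X, y, m=100, k=10, rng=None):
    """Relief-F feature weights following equation (22)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = X / span                                   # range scaling of equation (21)
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / len(y)))     # P(C) estimated from class frequencies
    W = np.zeros(X.shape[1])
    for _ in range(m):
        i = rng.integers(len(X))
        diffs = np.abs(Xs - Xs[i])                  # per-feature diff(., R_i, I) for every I
        dist = diffs.sum(axis=1)
        ci = y[i]
        for c in classes:
            idx = np.flatnonzero(y == c)
            idx = idx[idx != i]                     # exclude R_i itself
            nearest = idx[np.argsort(dist[idx])[:k]]
            contrib = diffs[nearest].sum(axis=0) / (m * k)
            if c == ci:
                W -= contrib                        # k nearest hits H_j
            else:
                W += prior[c] / (1.0 - prior[ci]) * contrib   # k nearest misses M_j(C)
    return W
```

With $k = 10$ as in this study, the ten highest-weighted features could then be taken as `np.argsort(W)[::-1][:10]`.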

Experimental evaluation

Test setup description

To evaluate the proposed method for gearbox fault detection from vibration signals, the experimental test setup shown in Figure 3 is used. The setup was designed and constructed in the Acoustics Research Laboratory at Amirkabir University of Technology.²⁴ An AC motor (3-phase, 7.5 kW) drives the gearbox through a pulley-belt mechanism. An inverter controls the rotation speed and estimates the internal torque of the electric motor. In addition, a tachometer measures the rotation speed of the input shaft to account for belt slip. A cable brake applies the load torque, which is controlled by changing the cable tension.

Figure 3.

Gearbox test setup and its components.

In addition, two uniaxial accelerometers are mounted on the gearbox and one on the test setup. The signals are collected by an A/D converter (Advantech PCI-1712, 12 bit, 1 MS/s) and then loaded into MATLAB (Figure 4).

Figure 4.

Three uniaxial accelerometers.

Test gears

The investigated gearbox contains five helical gear pairs, two of which are used in this study. The spalling and chipping faults are created in pair 1, and the wear and crack faults in pair 2. The chipping is induced on the pinion of pair 1 and the spalling on its gear; likewise, the wear is generated on the pinion of pair 2 and the crack on its gear. The parameters of each pair are listed in Table 3.

Table 3.

Parameters of each pair.

Parameter	Pair 1 (pinion/gear)	Pair 2 (pinion/gear)
Number of teeth	31/40	15/28
Pressure angle (°)	20	20
Helix angle (°)	31.5	31.5
Normal module (mm)	2	2
Face width (mm)	15	15

As shown in Figures 5–8, each fault is created at three severities. A pneumatic rod grinder with a polycrystalline diamond bit was used to produce the spalling and chipping faults. For the incipient condition, spalling was created on one tooth with a depth of 0.9 mm, a length of 9 mm, and a width of 1.5 mm. For the medium condition, spalling was produced on three consecutive teeth with the same length and width as the incipient fault and a depth of 0.2 mm. Finally, the severe spalling was induced on three teeth, each identical to the incipient fault. The chipping severities are defined by the volume removed from a tooth: the early, medium, and severe conditions correspond to 5%, 10%, and 15% volume reduction, respectively.

Figure 5.

Spalling fault on the gear of pair 1. (a) Severe fault, (b) medium fault, and (c) incipient fault.

Figure 6.

Chipping fault on the pinion of pair 1. (a) Severe fault, (b) medium fault, and (c) incipient fault.

Figure 7.

Crack fault on the gear of pair 2. (a) Severe fault, (b) medium fault, and (c) incipient fault.

Figure 8.

Wear fault on the pinion of pair 2. (a) Severe fault, (b) medium fault, (c) incipient fault, and (d) healthy condition.

As shown in Figure 7, the crack is modeled as a groove at the tooth root and is created by wire cutting. The wear is produced by electrolytic polishing: the shaft carrying the gear is rotated uniformly while the teeth are in contact with the electrolyte, so the wear is distributed uniformly over all teeth (Figure 8).

Test measurement conditions

As mentioned in section 5.1, three uniaxial accelerometers measure the vibration signals, which are collected by an A/D converter. For each test condition, signals are acquired at a sampling rate of 40 kS/s, at motor rotation speeds of 30, 35, and 40 Hz, and with loads of 35 and 45 Nm on the motor output. Each test lasts about 10 s and is repeated three times to reduce the effect of uncontrolled factors in the experiments.
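With these settings, a 10 s record at 40 kS/s holds 400,000 samples. At an input-shaft speed of about 37.6 Hz (the value reported in the results for a 40 Hz motor speed), this covers roughly 376 shaft revolutions of about 1063.8 samples each (40,000/37.6). Because that number is not an integer, each revolution must be resampled before averaging. The following minimal TSA sketch assumes a constant, known shaft frequency and uses linear interpolation for the angular resampling; a practical implementation would typically use the tachometer pulses instead. The function name and the `points_per_rev` value are hypothetical.

```python
import numpy as np


def tsa_constant_speed(x, fs, f_shaft, points_per_rev=1024):
    """Time synchronous average assuming a constant, known shaft frequency.

    The signal is linearly interpolated at uniform shaft-angle increments so
    that every revolution has exactly points_per_rev samples; the complete
    revolutions are then averaged point by point.
    Returns (averaged revolution, number of revolutions averaged).
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size) / fs
    n_rev = int(np.floor(t[-1] * f_shaft))             # complete revolutions in the record
    t_angle = np.arange(n_rev * points_per_rev) / (points_per_rev * f_shaft)
    x_angle = np.interp(t_angle, t, x)                  # angular resampling
    return x_angle.reshape(n_rev, points_per_rev).mean(axis=0), n_rev
```

Averaging N revolutions attenuates components that are not synchronous with the shaft, such as random noise, by roughly a factor of √N in amplitude, which is why TSA suits incipient faults buried in noise.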

Experimental results and analysis

Based on the conditions explained in section 5.3, the experimental vibration signals are collected and loaded into MATLAB. Figures 9 and 10 show that the difference between the healthy and faulty conditions is not recognizable from the time-domain signals. Because of environmental noise and the complexity of the test setup (cable brake, pulley-belt mechanism, and gearbox operation), fault detection may be difficult or impossible under some conditions. Both signals in Figures 9 and 10 are collected at a motor speed of 40 Hz and an output torque of 45 Nm. The marked points indicate the Gear Mesh Frequency (GMF), calculated by multiplying the number of pinion teeth by the rotation speed. As explained earlier, because of the pulley-belt mechanism, the input-shaft speed differs from the motor speed: the measured rotation speed is 37.6 Hz, and, based on Table 3, the GMF of gear pair 2 is 564.08 Hz. Comparing Figures 9 and 10 shows that the wear and healthy signals differ only slightly. For this reason, the VMD-TSA method is introduced to separate the healthy and early fault conditions.

Figure 9.

Experimental vibration signal of gear pair 2, in time and frequency domain for the healthy condition (H). (a) Accelerometer 1, (b) accelerometer 2, and (c) accelerometer 3.

Figure 10.

Experimental vibration signal of gear pair 2, in time and frequency domain for incipient wear condition (W1). (a) Accelerometer 1, (b) accelerometer 2, and (c) accelerometer 3.
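The GMF calculation in the text is simply tooth count times shaft frequency. A small Python sketch using the pair 2 pinion (15 teeth, Table 3) and the rounded input-shaft speed of 37.6 Hz gives 564.0 Hz; the reported 564.08 Hz corresponds to a shaft speed of about 37.605 Hz, which is consistent with 37.6 Hz after rounding. The sketch also checks that the 28-tooth gear on its slower shaft gives the same mesh frequency. As in the text, the pair 2 pinion is assumed to turn at the input-shaft speed; the function name is hypothetical.

```python
def gear_mesh_frequency(n_teeth: int, f_shaft_hz: float) -> float:
    """GMF (Hz) = number of teeth on a gear x rotation frequency of its shaft."""
    return n_teeth * f_shaft_hz


z_pinion, z_gear = 15, 28                 # gear pair 2, Table 3
f_pinion = 37.6                           # input-shaft speed in Hz (rounded value from the text)
f_gear = f_pinion * z_pinion / z_gear     # the gear shaft turns slower by the tooth ratio

gmf = gear_mesh_frequency(z_pinion, f_pinion)
harmonics = [h * gmf for h in (1, 2, 3)]  # GMF harmonics commonly inspected in spectra
print(round(gmf, 2), round(gear_mesh_frequency(z_gear, f_gear), 2))   # prints 564.0 564.0
```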

The features described in section 3 are extracted from the vibration signal of each accelerometer, and their ability to detect different fault severities is evaluated. For each fault, the results of the best-performing accelerometer are reported in Tables 4–6. As shown in these tables, the features are divided into three groups: (1) time- and frequency-domain features, (2) TSA features, and (3) VMD and VMD-TSA features. In each group, the 10 most informative features are selected by the Relief-F method. Finally, the classification accuracy of each feature group is calculated for all faults with the FNN classifier and presented in Tables 4–6. An FNN consists of an input layer, one or more hidden layers, and an output layer.³⁶ Such models are called feed-forward because there are no feedback connections between layers. In this study, the classifier is trained with the scaled conjugate gradient backpropagation algorithm; 70% of the input data are used for training, 15% for validation, and the remaining 15% for testing. The FNN used here has a single hidden layer whose size is varied from 5 to 30 neurons. Further details of the classifier settings are given in Table 7. In some cases, the hidden layer is set to 15 neurons because this size performs better than smaller and larger ones (see Tables 8 and 9).

Table 4.

The classification accuracy of faults’ severities, using selected time and frequency domain features and feed-forward neural network classifier with 15 neurons in hidden layer (four-class problem).

Faults	Accelerometer number	Classification accuracy (%)
Spalling	1	99.20
Chipping	2	98.20
Crack	1	96.90
Wear	3	100.00

Table 5.

The classification accuracy of faults’ severities, using TSA features and feed-forward neural network classifier with 15 neurons in hidden layer (four-class problem).

Faults	Accelerometer number	Classification accuracy (%)
Spalling	1	88.10
Chipping	2	99.50
Crack	1	94.60
Wear	1	100.00

Table 6.

The classification accuracy of faults’ severities using selected VMD and VMD-TSA features and feed-forward neural network classifier with 15 neurons in hidden layer (four-class problem).

Faults	Accelerometer number	Classification accuracy (%)
Spalling	1	96.40
Chipping	2	99.60
Crack	1	96.20
Wear	2	99.70

Table 7.

The feed-forward neural network classifier’s performance conditions.

Training algorithm	Scaled conjugate gradient backpropagation
Loss function	Cross entropy
Number of hidden layers	1
Number of hidden neurons	5 to 30
Activation function	Sigmoid
Training data	70% of input data
Validation data	15% of input data
Test data	15% of input data
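Table 7 describes a one-hidden-layer network with sigmoid hidden units, a cross-entropy loss, and a 70/15/15 data split. The following NumPy sketch shows those pieces under clearly stated assumptions: a softmax output layer is assumed (the table does not name the output activation), class labels are assumed to be integers 0 to C−1, and, for brevity, training uses plain full-batch gradient descent rather than the scaled conjugate gradient algorithm listed in the table. The validation indices are returned but not used for early stopping here. All function names are hypothetical.

```python
import numpy as np


def split_70_15_15(n, rng=None):
    """Random train/validation/test index split in the 70/15/15 ratio of Table 7."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(n)
    n_train = int(round(0.70 * n))
    n_val = int(round(0.15 * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_fnn(X, y, n_hidden=15, lr=0.5, epochs=2000, rng=None):
    """One sigmoid hidden layer, softmax output, mean cross-entropy loss.

    Plain full-batch gradient descent is a stand-in for scaled conjugate gradient.
    """
    rng = np.random.default_rng(rng)
    n_classes = int(y.max()) + 1
    Y = np.eye(n_classes)[y]                          # one-hot targets
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                      # hidden activations
        P = softmax(H @ W2 + b2)                      # class probabilities
        G2 = (P - Y) / len(X)                         # loss gradient w.r.t. output logits
        G1 = (G2 @ W2.T) * H * (1.0 - H)              # back-propagated through the sigmoid
        W2 -= lr * (H.T @ G2)
        b2 -= lr * G2.sum(axis=0)
        W1 -= lr * (X.T @ G1)
        b1 -= lr * G1.sum(axis=0)
    return W1, b1, W2, b2


def predict(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(sigmoid(X @ W1 + b1) @ W2 + b2, axis=1)
```

Varying `n_hidden` from 5 to 30 in such a sketch corresponds to the hidden-layer sweeps reported in Tables 8 to 11.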

Table 8.

The classification accuracy of faults’ severities by using ten selected features without VMD-TSA and different numbers of neurons in the hidden layer (four-class problem).

Fault	5 neurons	10 neurons	15 neurons	20 neurons	25 neurons	30 neurons
Spalling	91.76	91.99	92.02	92.16	92.29	94.60
Chipping	97.22	98.60	97.72	98.70	99.58	98.02
Crack	89.63	90.82	93.98	87.78	91.05	91.96
Wear	98.46	98.93	97.07	99.09	98.74	97.18

Table 9.

The classification accuracy of faults’ severities by using ten selected features with VMD-TSA and different numbers of neurons in the hidden layer (four-class problem).

Fault	5 neurons	10 neurons	15 neurons	20 neurons	25 neurons	30 neurons
Spalling	96.47	99.14	99.46	99.46	98.91	99.55
Chipping	99.81	99.91	99.15	99.71	100.00	99.83
Crack	99.52	99.56	99.48	99.25	99.58	98.68
Wear	99.39	99.09	99.93	99.36	100.00	98.73

One of the classification problems evaluated in this study is a four-class problem comprising the three severities of a given fault and the healthy condition. As shown in Table 4, the time- and frequency-domain features classify the fault severities with high accuracy. The TSA features separate the classes with acceptable accuracy, as shown in Table 5. In addition, the proposed VMD-TSA combinations achieve high performance in fault severity classification, as shown in Table 6. The differences in classifier performance across faults are related to the intrinsic nature of each fault. More specifically, the main reason for the difference is the influence of Time-Varying Mesh Stiffness (TVMS) and Static Transmission Error (STE), acting as parametric and displacement excitations, respectively, on the dynamic behavior of the gears.³⁷

Therefore, the most suitable accelerometer is designated for each feature group and each fault. For example, for the wear fault, the VMD and VMD-TSA features extracted from accelerometer 2 give better classification accuracy than those from the other accelerometers, as shown in Table 6. Accordingly, all 63 features are extracted from the vibration signals selected in Tables 4–6. The results also show that no single feature group separates the severities of all faults. For instance, as reported in Table 5, the accuracy of the TSA features for the spalling fault is lower than that of the other feature groups. This indicates that each feature group views the classification problem from its own perspective, so it is necessary to pool all features and select the subset most informative about the presence of faults. To this end, 10 features are selected by the Relief-F method, and the FNN solves the same four-class problem with different numbers of hidden neurons. To show the benefit of the proposed VMD-TSA features, the four-class problem is solved with features selected from all features excluding and including the VMD-TSA features; the results are reported in Tables 8 and 9, respectively. As Tables 8 and 9 show, the classification accuracy with VMD-TSA features is higher than without them for all faults and all hidden-layer sizes. For example, with the features selected without VMD-TSA and an FNN with 15 hidden neurons, the spalling severities are classified with 92.02% accuracy, whereas with VMD-TSA features the accuracy is 99.46% under the same conditions. This shows that the VMD-TSA method can increase detection accuracy.

Besides diagnosing the severities of each fault, the selected features can classify all severities together. For this purpose, 14 classes are defined: the three severities of each of the four faults (crack, spalling, chipping, and wear) and the healthy condition of each of the two gear pairs. The classification accuracy is calculated for 10 to 25 features and 5 to 30 hidden neurons. As Table 10 shows, the accuracy is close to 100% in all cases, and 100% accuracy is first reached with ten features and 15 hidden neurons.

Table 10.

The classification accuracy of all severities by using different numbers of selected features and neurons in the hidden layer (14-class problem including 12 faulty conditions and 2 healthy conditions for the 2 gear pairs).

Number of features	5 neurons	10 neurons	15 neurons	20 neurons	25 neurons	30 neurons
10	99.12	99.24	100.00	99.99	100.00	100.00
15	97.16	99.60	99.81	99.84	99.99	100.00
20	94.94	99.64	99.77	99.91	98.99	100.00
25	97.88	99.94	99.95	99.98	99.96	99.99

In addition to detecting all fault severities, the VMD-TSA method is particularly useful for early faults. To show this, a six-class classification problem is defined, including the incipient condition of each of the four faults (crack, chipping, spalling, and wear) and the healthy condition of each gear pair. The classification is performed with and without the VMD-TSA combinations. As shown in Table 11, the features extracted by the VMD-TSA combinations increase the accuracy for every hidden-layer size. For example, with 30 hidden neurons, the accuracy of the early fault (six-class) problem is 96.06% without and 98.73% with the combinations. Therefore, the proposed VMD-TSA is effective for early fault detection.

Table 11.

The classification accuracy of incipient faults by using selected features with and without VMD-TSA and different numbers of neurons in the hidden layer (six-class problem, including the incipient severity of each of the four faults and two healthy conditions).

Features	5 neurons	10 neurons	15 neurons	20 neurons	25 neurons	30 neurons
Without combinations	97.83	97.25	97.88	98.17	99.15	96.06
With combinations	98.63	98.61	98.56	98.51	99.25	98.73

To sum up, the features extracted by the VMD-TSA combinations noticeably improve classification accuracy, especially for incipient faults. As a result, the proposed features can separate all fault severities and, in particular, classify the early severity of every fault. It is worth mentioning that benign vibrations do not cause the early faults to be misclassified, because the faulty conditions are compared with the healthy condition, in which the same benign vibrations are present. Moreover, to reduce misclassification, multi-domain features are extracted from the signal and the most informative features are selected for classification. In addition, as shown in Tables 4–6, all accelerometers are useful for fault detection, and each performs differently across the feature domains. However, the signal collected by accelerometer 1 carries the most useful information for fault detection.

Comparison among different methods

As mentioned in section 2.1, unlike EMD and EEMD, VMD is a mode decomposition method grounded in signal processing theory. In this section, the performance of VMD is compared with that of EMD and EEMD. For this purpose, EMD-TSA and EEMD-TSA are constructed, in which the IMFs are calculated by EMD and EEMD, respectively. In addition, the support vector machine (SVM) is used for classification, and its results are compared with those of the FNN. The performance of the different feature selection methods is reported in Appendix 1.

As expected, owing to its mathematical foundation, VMD is more effective than EMD and EEMD. Figure 11 shows the FFT of the first three IMFs of the experimental signals for the early chipping and healthy conditions. As the figure shows, the difference in the central frequency between the faulty and healthy conditions is more recognizable for VMD than for the other methods. Therefore, VMD is more accurate than EMD and EEMD in early fault classification. The main reason is that VMD decomposes the signal into band-limited modes, each concentrated around a central frequency. As explained earlier, EMD is sensitive to noise, and since the incipient fault components are buried in noise, EMD cannot separate incipient faults as well as VMD. In EEMD, the noise added to the experimental signals makes classification even more complicated than with EMD; for the weak signatures of early faults, this added noise can cause misclassification and reduce classification performance. Hence, the frequency separation between noise and early fault components plays a prominent role in fault detection, and this is the primary advantage of VMD over the other methods. For further detail, the classification accuracy of EMD-TSA, EEMD-TSA, and VMD-TSA in the four-class chipping problem is reported in Appendix 2.

Figure 11.

The FFT of the first three modes for the early chipping (CH1) and healthy (H) conditions. (a) VMD, (b) EMD, and (c) EEMD.
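One way to quantify the central-frequency difference seen in Figure 11 is the power-weighted mean frequency of each mode's spectrum over non-negative frequencies, which is the same center-of-gravity expression that VMD uses to update its mode center frequencies. The sketch below assumes the modes are available as equal-length arrays sampled at `fs`; the function names are hypothetical, and the paper itself compares the spectra visually.

```python
import numpy as np


def center_frequency(mode, fs):
    """Power-weighted mean frequency (Hz) of a mode over non-negative frequencies."""
    mode = np.asarray(mode, dtype=float)
    power = np.abs(np.fft.rfft(mode)) ** 2           # one-sided power spectrum
    freqs = np.fft.rfftfreq(mode.size, d=1.0 / fs)   # matching frequency axis in Hz
    return float(np.sum(freqs * power) / np.sum(power))


def center_frequency_shift(modes_a, modes_b, fs):
    """Mode-by-mode center-frequency difference between two decompositions (a minus b)."""
    return [center_frequency(a, fs) - center_frequency(b, fs) for a, b in zip(modes_a, modes_b)]
```

Applied to the first three modes of a faulty and a healthy record, a larger shift indicates that the decomposition separates the two conditions more clearly.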

The performance of the other feature selection methods is also evaluated for EMD-TSA and EEMD-TSA. For this purpose, the 10 most informative features selected by Relief-F, LS, MRMR, and PCA are used for classification. The classification accuracy for each selection is then calculated with SVM and FNN, and the maximum accuracy is presented in Tables 12 and 13. As shown, the features selected by Relief-F give the highest accuracy in most cases, including all VMD-TSA cases. For instance, for the chipping fault, Relief-F is the best feature selection method for EMD-TSA, EEMD-TSA, and VMD-TSA alike. In Relief-F, k nearest neighbors are taken from each class, so every class contributes to the feature weights. The other methods, however, are variance-based, which can cause misclassification in problems with more than two classes and makes them more sensitive to noise. Moreover, for each fault, the classification accuracy of VMD-TSA is higher than that of the other combinations. For example, for the crack fault, the accuracy of VMD-TSA with the FNN classifier is 96.20%, whereas that of EMD-TSA and EEMD-TSA is 91.50% and 79.00%, respectively.

Table 12.

The classification accuracy of faults’ severities by using selected features with EMD, EEMD, and VMD-TSA and feed-forward neural network classifier (four-class problem).

Fault	Mode decomposition	Accelerometer number	Best feature selection	FNN accuracy (%)
Spalling	EMD-TSA	1	MRMR	88.58
Spalling	EEMD-TSA	1	LS	79.82
Spalling	VMD-TSA	1	Relief-F	99.45
Chipping	EMD-TSA	2	Relief-F	96.55
Chipping	EEMD-TSA	1	Relief-F	80.98
Chipping	VMD-TSA	2	Relief-F	99.65
Crack	EMD-TSA	2	Relief-F	91.50
Crack	EEMD-TSA	3	Relief-F	79.00
Crack	VMD-TSA	1	Relief-F	96.20
Wear	EMD-TSA	3	LS	98.32
Wear	EEMD-TSA	2	LS	89.65
Wear	VMD-TSA	2	Relief-F	99.70

Table 13.

The classification accuracy of faults’ severities by using selected features with EMD, EEMD, and VMD-TSA and SVM classifier (four-class problem).

Fault	Mode decomposition	Accelerometer number	Best feature selection	SVM accuracy (%)
Spalling	EMD-TSA	1	MRMR	80.00
Spalling	EEMD-TSA	1	Relief-F	64.60
Spalling	VMD-TSA	1	Relief-F	94.40
Chipping	EMD-TSA	2	Relief-F	87.50
Chipping	EEMD-TSA	1	Relief-F	68.50
Chipping	VMD-TSA	2	Relief-F	99.00
Crack	EMD-TSA	2	Relief-F	76.70
Crack	EEMD-TSA	2	Relief-F	65.20
Crack	VMD-TSA	1	Relief-F	90.60
Wear	EMD-TSA	2	LS	93.30
Wear	EEMD-TSA	3	Relief-F	85.80
Wear	VMD-TSA	2	Relief-F	98.50

Finally, the classification methods are compared. As shown in Tables 12 and 13, the accuracy of the FNN classifier is higher than that of SVM. SVM is natively a binary classifier; multiclass problems are handled by decomposing them into multiple binary classification problems, typically with the One-vs-One or One-vs-Rest approach.³⁸ The FNN classifier, in contrast, can be designed directly for multiple classes. The two classifiers also differ in their training approach, which can influence their performance.
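To illustrate the multiclass decomposition, the sketch below counts the binary SVMs each strategy needs and combines One-vs-One decisions by majority vote. It is a minimal sketch: `pairwise_winner` is a hypothetical callable standing in for trained binary SVMs, and ties are broken by class order. For the 14-class problem of this study, One-vs-One would require 91 binary classifiers, against 14 for One-vs-Rest.

```python
from collections import Counter
from itertools import combinations


def n_binary_svms(n_classes, strategy):
    """Number of binary SVMs needed to cover a multiclass problem."""
    if strategy == "ovo":
        return n_classes * (n_classes - 1) // 2   # one SVM per pair of classes
    if strategy == "ovr":
        return n_classes                          # one SVM per class against all others
    raise ValueError("strategy must be 'ovo' or 'ovr'")


def ovo_vote(pairwise_winner, classes):
    """Combine One-vs-One decisions by majority vote (ties go to the earlier class)."""
    votes = Counter(pairwise_winner(a, b) for a, b in combinations(classes, 2))
    return max(classes, key=lambda c: votes[c])


for k in (4, 6, 14):   # class counts of the problems in this study
    print(k, n_binary_svms(k, "ovo"), n_binary_svms(k, "ovr"))
```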

Some faults, such as crack and chipping, change the TVMS of the mating gears dramatically. Wear and shallow spalls, on the other hand, cause only minor changes in TVMS but noticeably alter the STE between the mating gears. Therefore, the signal pattern of each fault type differs from the others.^39,40 A proper combination of signal processing, feature extraction, and feature selection techniques can reveal the incipient fault signatures among the other signal components, and classifiers such as FNN can then differentiate these signatures and assign them to the correct categories.

Conclusion

This study evaluates the performance of VMD and the VMD-TSA combinations in gearbox fault diagnosis, specifically for early faults. Experimental gearbox vibration signals for different severities of four faults, under three speeds and two torques, are acquired and analyzed with signal processing methods. In addition to the VMD and VMD-TSA features, time- and frequency-domain features are extracted from the vibration signals to enrich the feature set. Overall, 63 features are extracted: 9 TSA features, 16 time-domain features, 13 frequency-domain features, 8 VMD features, and 17 VMD-TSA features. The 10 most fault-informative features are then selected by the Relief-F algorithm, and fault severity is finally detected by the FNN classifier. As discussed, the proposed method performs strongly in fault severity classification, particularly for early faults. For example, the classification accuracy for spalling fault severity is 92.02% using ten features selected without the VMD-TSA features and an FNN classifier with 15 hidden neurons, whereas adding the VMD-TSA features increases the accuracy to 99.46%. For incipient faults in particular, the accuracy of the six-class problem (early faults and healthy conditions) with 15 hidden neurons is 97.88% without and 98.56% with the VMD-TSA features. In addition, to show the superiority of VMD over EMD and EEMD, all features extracted by VMD and VMD-TSA are also computed with EMD-TSA and EEMD-TSA, and the classification performance of each feature set is presented. It is concluded that VMD and VMD-TSA are more accurate than the other methods. For the chipping fault, the classification accuracy of VMD-TSA with the FNN classifier is 99.65%, whereas that of EMD-TSA and EEMD-TSA is 96.55% and 80.98%, respectively. Moreover, the effectiveness of the Relief-F algorithm is compared with other feature selection methods, including PCA, LS, and MRMR.
It is also observed that the features selected by Relief-F are more fault-informative than those selected by the other methods. The classification accuracy of the FNN is also higher than that of SVM for the EMD, EEMD, and VMD feature sets. For the crack fault, the accuracy of the four-class problem is 96.20% using VMD-TSA features and the FNN classifier, compared with 90.60% for SVM under the same conditions. It is worth mentioning that with VMD-TSA features the FNN and SVM perform almost equally in some cases, such as the chipping and wear faults. This indicates that the proposed VMD-TSA method is powerful enough to detect fault severities largely independently of the classifier type.

Footnotes

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Abdolreza Ohadi

Appendix 1

As mentioned in section 4, the performance of each feature selection method is reported in Table 14. The four-class classification problem is solved for the spalling fault severities using all features, including the VMD and VMD-TSA features. As shown in Table 14, the features selected by Relief-F yield the highest classification accuracy among the compared methods.

Table 14.

Detailed classification accuracy of spalling fault’s severities using all features with VMD and VMD-TSA, feed-forward neural network classifier, and different feature selection methods (four-class problem).

VMD-TSA method
Feature selection method	Accelerometer number	5 neurons	10 neurons	15 neurons	20 neurons	25 neurons	30 neurons
LS	1	91.30	91.77	90.06	90.48	92.18	90.25
	2	79.67	79.84	80.13	78.98	80.28	77.68
	3	68.48	69.84	69.48	67.17	68.38	68.14
PCA	1	86.72	88.43	91.18	90.17	90.62	88.52
	2	74.96	73.88	75.41	75.40	84.26	76.40
	3	70.90	65.29	68.80	67.77	69.95	66.47
Relief-F	1	96.47	99.14	99.45	99.46	98.91	99.55
	2	90.64	92.75	94.42	91.29	91.81	93.69
	3	73.72	72.36	71.10	81.74	76.36	75.21
MRMR	1	87.76	95.67	95.96	96.14	91.49	96.33
	2	94.02	93.67	96.13	93.30	93.77	95.56
	3	70.73	81.61	71.32	78.87	81.83	80.69

Regarding computation time, extracting all 63 features takes about 5 min, whereas classification takes well under a minute. For example, Table 15 reports the classification time for the four-class problem of the spalling fault (three fault severities and the healthy condition), using an FNN classifier with 15 hidden neurons and different numbers of features selected by Relief-F from all 63.

Table 15.

Accuracies and time consumption of feed-forward neural network classifier with 15 neurons in a hidden layer in a four-class problem of spalling fault.

Number of features	Accuracy (%)	Time consumption (sec)
10	99.45	19.29
25	98.58	23.30
40	97.92	19.45
55	98.29	19.51
63	97.69	19.63

As shown in Table 15, the classifier's computation time does not differ remarkably with the number of features. However, the accuracy generally decreases as the number of features increases, which may be a result of the “curse of dimensionality”. The highest accuracy is therefore achieved with the ten most sensitive features.

Appendix 2

As explained in section 6, Tables 12 and 13 report the maximum accuracy of the EMD, EEMD, and VMD methods for all faults. This section presents the detailed classification accuracy of each mode decomposition method using the FNN classifier and ten features selected by each feature selection method, for the chipping fault (Tables 16–18).

Table 16.

Detailed classification accuracy of chipping fault’s severities using all features with EMD and EMD-TSA and feed-forward neural network classifier (four-class problem).

EMD-TSA method
Feature selection method	Accelerometer number	5 neurons	10 neurons	15 neurons	20 neurons	25 neurons	30 neurons
LS	1	74.03	74.75	76.96	80.21	75.55	74.98
	2	83.80	85.39	85.81	83.43	84.65	87.59
	3	73.58	76.68	75.91	77.90	81.26	76.01
PCA	1	73.19	74.29	73.95	77.13	76.56	74.90
	2	85.49	86.73	86.35	88.43	87.23	84.79
	3	72.51	77.28	76.50	75.09	76.91	77.24
Relief-F	1	83.42	84.24	83.82	83.76	83.90	84.23
	2	88.40	92.54	92.19	91.65	94.27	96.55
	3	85.13	86.47	89.38	86.39	86.28	86.68
MRMR	1	85.76	86.27	86.52	83.72	88.04	87.20
	2	89.82	86.59	92.79	92.38	89.09	92.84
	3	86.06	87.66	87.28	85.92	88.36	86.10

Table 17.

Detailed classification accuracy of chipping fault’s severities using all features with EEMD and EEMD-TSA and feed-forward neural network classifier (four-class problem).

EEMD-TSA method
Feature selection method	Accelerometer number	5 neurons	10 neurons	15 neurons	20 neurons	25 neurons	30 neurons
LS	1	70.29	68.01	69.26	70.66	71.43	70.21
	2	66.19	66.16	66.65	69.24	67.23	68.94
	3	68.32	70.95	71.65	72.81	70.45	70.16
PCA	1	65.59	65.50	70.73	66.00	65.87	64.35
	2	70.32	70.50	69.56	71.04	71.71	71.06
	3	67.50	66.36	68.14	67.89	66.18	66.44
Relief-F	1	79.49	78.37	79.86	80.98	80.41	80.94
	2	73.26	72.78	73.14	72.81	74.02	73.41
	3	74.06	74.40	75.26	74.39	73.20	73.58
MRMR	1	79.52	80.36	80.46	80.32	79.62	79.19
	2	72.09	71.99	73.25	72.87	73.23	75.79
	3	74.10	69.72	74.23	73.87	73.84	74.57

Table 18.

Detailed classification accuracy for the severities of the chipping fault using all features with VMD and VMD-TSA and the feed-forward neural network classifier (four-class problem).

VMD-TSA method
Feature selection method	Accelerometer number	Number of neurons in the hidden layer
		5	10	15	20	25	30
LS	1	65.66	72.30	73.00	73.20	73.89	67.47
	2	73.13	77.80	69.52	74.72	77.71	78.43
	3	66.86	66.66	67.24	68.82	69.44	73.00
PCA	1	80.19	79.58	80.76	79.88	80.84	81.60
	2	96.04	95.53	96.00	96.42	97.53	97.39
	3	69.94	67.06	68.00	68.17	77.09	69.89
Relief-F	1	83.51	91.47	89.84	85.19	86.13	84.92
	2	98.30	99.52	99.65	99.51	98.56	99.54
	3	69.53	79.77	72.63	74.23	76.17	79.21
MRMR	1	81.28	87.51	91.67	85.74	88.86	84.85
	2	98.24	98.49	99.03	98.61	99.12	98.96
	3	70.70	78.05	76.36	70.20	76.51	73.36
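As a reader-side check on Tables 16–18, the Python sketch below stores the transcribed accuracies and locates the highest cell in each table, as well as the best cell for each feature selection method. The dictionary layout and the function names (`cells`, `best_configuration`, `best_per_method`) are hypothetical conveniences; this is not the authors' code, and it does not claim to reproduce how the maxima in Tables 12 and 13 were selected.

```python
"""Locate the best cells of Tables 16-18 (chipping fault, FNN classifier)."""

# Hidden-layer sizes, in the column order used by Tables 16-18.
NEURONS = (5, 10, 15, 20, 25, 30)

# Accuracies (%) transcribed from the tables, keyed by (feature selection method, accelerometer).
TABLES = {
    "EMD-TSA": {  # Table 16
        ("LS", 1): (74.03, 74.75, 76.96, 80.21, 75.55, 74.98),
        ("LS", 2): (83.80, 85.39, 85.81, 83.43, 84.65, 87.59),
        ("LS", 3): (73.58, 76.68, 75.91, 77.90, 81.26, 76.01),
        ("PCA", 1): (73.19, 74.29, 73.95, 77.13, 76.56, 74.90),
        ("PCA", 2): (85.49, 86.73, 86.35, 88.43, 87.23, 84.79),
        ("PCA", 3): (72.51, 77.28, 76.50, 75.09, 76.91, 77.24),
        ("Relief-F", 1): (83.42, 84.24, 83.82, 83.76, 83.90, 84.23),
        ("Relief-F", 2): (88.40, 92.54, 92.19, 91.65, 94.27, 96.55),
        ("Relief-F", 3): (85.13, 86.47, 89.38, 86.39, 86.28, 86.68),
        ("MRMR", 1): (85.76, 86.27, 86.52, 83.72, 88.04, 87.20),
        ("MRMR", 2): (89.82, 86.59, 92.79, 92.38, 89.09, 92.84),
        ("MRMR", 3): (86.06, 87.66, 87.28, 85.92, 88.36, 86.10),
    },
    "EEMD-TSA": {  # Table 17
        ("LS", 1): (70.29, 68.01, 69.26, 70.66, 71.43, 70.21),
        ("LS", 2): (66.19, 66.16, 66.65, 69.24, 67.23, 68.94),
        ("LS", 3): (68.32, 70.95, 71.65, 72.81, 70.45, 70.16),
        ("PCA", 1): (65.59, 65.50, 70.73, 66.00, 65.87, 64.35),
        ("PCA", 2): (70.32, 70.50, 69.56, 71.04, 71.71, 71.06),
        ("PCA", 3): (67.50, 66.36, 68.14, 67.89, 66.18, 66.44),
        ("Relief-F", 1): (79.49, 78.37, 79.86, 80.98, 80.41, 80.94),
        ("Relief-F", 2): (73.26, 72.78, 73.14, 72.81, 74.02, 73.41),
        ("Relief-F", 3): (74.06, 74.40, 75.26, 74.39, 73.20, 73.58),
        ("MRMR", 1): (79.52, 80.36, 80.46, 80.32, 79.62, 79.19),
        ("MRMR", 2): (72.09, 71.99, 73.25, 72.87, 73.23, 75.79),
        ("MRMR", 3): (74.10, 69.72, 74.23, 73.87, 73.84, 74.57),
    },
    "VMD-TSA": {  # Table 18
        ("LS", 1): (65.66, 72.30, 73.00, 73.20, 73.89, 67.47),
        ("LS", 2): (73.13, 77.80, 69.52, 74.72, 77.71, 78.43),
        ("LS", 3): (66.86, 66.66, 67.24, 68.82, 69.44, 73.00),
        ("PCA", 1): (80.19, 79.58, 80.76, 79.88, 80.84, 81.60),
        ("PCA", 2): (96.04, 95.53, 96.00, 96.42, 97.53, 97.39),
        ("PCA", 3): (69.94, 67.06, 68.00, 68.17, 77.09, 69.89),
        ("Relief-F", 1): (83.51, 91.47, 89.84, 85.19, 86.13, 84.92),
        ("Relief-F", 2): (98.30, 99.52, 99.65, 99.51, 98.56, 99.54),
        ("Relief-F", 3): (69.53, 79.77, 72.63, 74.23, 76.17, 79.21),
        ("MRMR", 1): (81.28, 87.51, 91.67, 85.74, 88.86, 84.85),
        ("MRMR", 2): (98.24, 98.49, 99.03, 98.61, 99.12, 98.96),
        ("MRMR", 3): (70.70, 78.05, 76.36, 70.20, 76.51, 73.36),
    },
}


def cells(table):
    """Yield (accuracy, method, accelerometer, neurons) for every cell of a table."""
    for (method, accel), row in table.items():
        for neurons, acc in zip(NEURONS, row):
            yield acc, method, accel, neurons


def best_configuration(table):
    """Highest-accuracy cell as (accuracy, method, accelerometer, neurons)."""
    return max(cells(table))


def best_per_method(table):
    """Map each feature selection method to its best (accuracy, accelerometer, neurons)."""
    best = {}
    for acc, method, accel, neurons in cells(table):
        if method not in best or acc > best[method][0]:
            best[method] = (acc, accel, neurons)
    return best


if __name__ == "__main__":
    for name, table in TABLES.items():
        acc, method, accel, neurons = best_configuration(table)
        print(f"{name}: {acc:.2f}% with {method}, accelerometer {accel}, {neurons} neurons")
```

Run on the transcribed values, the peak cell in all three tables comes from Relief-F: 96.55% (accelerometer 2, 30 neurons) for EMD-TSA, 80.98% (accelerometer 1, 20 neurons) for EEMD-TSA, and 99.65% (accelerometer 2, 15 neurons) for VMD-TSA.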

References

1. Schmidt, Zimroz and Heyns. Enhancing gearbox vibration signals under time-varying operating conditions by combining a whitening procedure and a synchronous processing method. Mech Syst Signal Process 2021; 156: 107668.

2. Feng and Liang. Analytical vibration signal model and signature analysis in resonance region for planetary gearbox fault diagnosis. J Sound Vibration 2021; 498: 115962.

3. Bechhoefer and Kingsley. A review of time synchronous average algorithms. Annu Conf PHM Society 2009; 1(1).

4. Youn, et al. Autocorrelation-based time synchronous averaging for condition monitoring of planetary gearboxes in wind turbines. Mech Syst Signal Process 2016; 70–71: 161–175.

5. Ahamed, Pandya and Parey. Spur gear tooth root crack detection using time synchronous averaging under fluctuating speed. Measurement 2014; 52: 1–11.

6. Sharma and Parey. Gearbox fault diagnosis using RMS based probability density function and entropy measures for fluctuating speed conditions. Struct Health Monit 2017; 16(6): 682–695.

7. Huang, Shen, Long, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc R Soc Lond Ser A: Mathematical, Physical and Engineering Sciences 1998; 454: 903–995.

8. Huang. Ensemble empirical mode decomposition: a noise-assisted data analysis method. Adv Adaptive Data Analysis 2009; 01: 1–41.

9. Dragomiretskiy and Zosso. Variational mode decomposition. IEEE Transactions Signal Processing 2014; 62(3): 531–544.

10. Lei, Lin, et al. A review on empirical mode decomposition in fault diagnosis of rotating machinery. Mech Syst Signal Process 2013; 35: 108–126.

11. Yeh J-R, Shieh J-S and Huang. Complementary ensemble empirical mode decomposition: a novel noise enhanced data analysis method. Adv Adaptive Data Analysis 2010; 02(02): 135–156.

12. Wang, Zhu, et al. Fault diagnosis of rotating machines based on the EMD manifold. Mech Syst Signal Process 2020; 135: 106443.

13. Yan and Jia. A novel optimized SVM classification algorithm with multi-domain feature and its application to fault diagnosis of rolling bearing. Neurocomputing 2018; 313: 47–64.

14. Wei, et al. Health condition identification of planetary gearboxes based on variational mode decomposition and generalized composite multi-scale symbolic dynamic entropy. ISA Trans 2018; 81: 329–341.

15. Wang, Zhang, et al. Impact fault detection of gearbox based on variational mode decomposition and coupled underdamped stochastic resonance. ISA Trans 2019; 95: 320–329.

16. Vanraj, Dhami and Pabla. Hybrid data fusion approach for fault diagnosis of fixed-axis gearbox. Struct Health Monit 2018; 17(4): 936–945.

17. Chen, Hong, et al. Incipient fault diagnosis of rolling bearings based on adaptive variational mode decomposition and Teager energy operator. Measurement 2020; 149: 106941.

18. Deng, Zhang and Zhao. Intelligent identification of incipient rolling bearing faults based on VMD and PCA-SVM. Adv Mech Eng 2022; 14(1): 16878140211072990.

19. Zhao, Chen, et al. Encoder-based weak fault detection for rotating machinery using improved Gaussian process regression. Struct Health Monit 2021; 20(1): 255–272.

20. Sharma and Parey. Extraction of weak fault transients using variational mode decomposition for fault diagnosis of gearbox under varying speed. Eng Fail Anal 2020; 107: 104204.

21. Feng, et al. A fault information-guided variational mode decomposition (FIVMD) method for rolling element bearings diagnosis. Mech Syst Signal Process 2022; 164: 108216.

22. Dibaj, Ettefagh, Hassannejad, et al. Fine-tuned variational mode decomposition for fault diagnosis of rotary machinery. Struct Health Monit 2020; 19(5): 1453–1470.

23. Wang, Liu, Cao, et al. Subband averaging kurtogram with dual-tree complex wavelet packet transform for rotating machinery fault diagnosis. Mech Syst Signal Process 2020; 142: 106755.

24. Heidari Bafroui and Ohadi. Application of wavelet energy and Shannon entropy for feature extraction in gearbox fault detection under varying speed conditions. Neurocomputing 2014; 133: 437–445.

25. Shi, Zhong, et al. Incipient fault diagnosis of planetary gearboxes based on an adaptive parameter-induced stochastic resonance method. Appl Acoust 2022; 188: 108587.

26. Kundu, Darpe and Kulkarni. A review on diagnostic and prognostic approaches for gears. Struct Health Monit 2020: 1475921720972926. In press.

27. Karabacak, Gürsel Özmen and Gümüşel. Intelligent worm gearbox fault diagnosis under various working conditions using vibration, sound and thermal features. Appl Acoust 2022; 186: 108463.

28. Ding and Peng. Minimum redundancy feature selection from microarray gene expression data. J Bioinformatics Computational Biology 2005; 03(02): 185–205.

29. Cai and Niyogi. Laplacian score for feature selection. Adv Neural Information Processing Systems 2005; 18: 507–514.

30. Robnik-Sikonja and Kononenko. Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning 2003; 53(1): 23–69.

31. Bandt and Pompe. Permutation entropy: a natural complexity measure for time series. Phys Review Letters 2002; 88(17): 174102.

32. Yan, Liu and Gao. Permutation entropy: a nonlinear statistical measure for status characterization of rotary machines. Mech Syst Signal Process 2012; 29: 474–484.

33. Ruqiang and Gao. Rotary machine health diagnosis based on empirical mode decomposition. J Vibration Acoust 2008; 130(2).

34. Wang, Chen and Dong. Feature extraction of rolling bearing’s early weak fault based on EEMD and tunable Q-factor wavelet transform. Mech Syst Signal Process 2014; 48(1–2): 103–119.

35. Urbanowicz, Meeker, La Cava, et al. Relief-based feature selection: introduction and review. J Biomedical Informatics 2018; 85: 189–203.

36. Quiza and Davim. Computational methods and optimization. In: Machining of Hard Materials. London: Springer, 2011, pp. 177–208.

37. Dadon, Koren, Klein, et al. A realistic dynamic model for gear fault diagnosis. Eng Fail Anal 2018; 84: 77–100.

38. Gunn. Support vector machines for classification and regression. ISIS Technical Report 1998; 14(1): 5–16.

39. Shao and Mechefske. The effects of spur gear tooth spatial crack propagation on gear mesh stiffness. Eng Fail Anal 2015; 54: 103–119.

40. Mechefske and Timusk. A new dynamic model of a cylindrical gear pair with localized spalling defects. Nonlinear Dyn 2018; 91(4): 2077–2095.