A cross-domain fault diagnosis method for planetary gearboxes based on multi-channel information fusion

Abstract

Cross-domain fault diagnosis of planetary gearboxes remains a significant challenge due to complex operating conditions and pronounced domain-specific distribution discrepancies. To address this issue, this study proposes a novel cross-domain fault diagnosis method based on multi-channel information fusion and domain-invariant representation learning. First, the synchrosqueezing S-transform (SSST) is employed to fuse and transform raw multi-channel vibration signals collected under varying working conditions into discriminative three-channel time–frequency representations, effectively enhancing fault-related feature expression. To mitigate domain shift, a global–local domain discrepancy metric strategy is introduced, which simultaneously measures and minimizes global distribution differences and local subdomain discrepancies, thereby promoting more effective domain confusion. Subsequently, a unified diagnostic framework is constructed based on the ResNet-50 architecture, enabling joint feature extraction and domain adaptation in an end-to-end manner. Experiments conducted on two planetary gearbox datasets demonstrate that the proposed method outperforms existing methods in terms of cross-domain diagnostic accuracy and robustness.

Keywords

planetary gearboxes; cross-domain fault diagnosis; multi-channel information fusion; global-local domain discrepancy; synchrosqueezing S-transform

1. Introduction

Planetary gearboxes, as critical components in rotating machinery, are extensively employed in industries such as wind power, aerospace, and rail transportation due to their high transmission efficiency, compact structure, and strong load-bearing capacity (Liu et al., 2023). However, their complex internal structure and frequent operation under high-speed, heavy-load, and harsh environmental conditions result in vibration signals that are highly non-stationary and nonlinear, posing significant challenges for accurate fault diagnosis (Liu et al., 2023; Peng et al., 2025). Once a fault occurs, it may lead to severe equipment failure, safety hazards, and substantial economic losses (Alabsi et al., 2024; Zhang et al., 2024). Therefore, developing effective and robust fault diagnosis methods for planetary gearboxes is of great importance to ensure the safe and reliable operation of mechanical systems, especially under variable working conditions and cross-domain scenarios (Han et al., 2024; Liu et al., 2024).

Feature extraction and pattern recognition are two critical steps in fault diagnosis (Z. Zhu et al., 2023). The complexity of gearbox vibration signals makes joint time-frequency analysis more effective than single-domain approaches in either the time or the frequency domain (Liu et al., 2023). The S-transform (ST) is a widely used time-frequency analysis technique that is well suited to analyzing the impact features of non-stationary signals. However, its further development is limited by suboptimal time-frequency resolution (Wang et al., 2023). To address this issue, the synchrosqueezing S-transform (SSST), which integrates the ST with the synchrosqueezing transform (SST), has been proposed to enhance the time-frequency resolution of traditional ST methods (X. Zheng et al., 2020). Meng et al. (2020) employed the SSST to process strongly time-varying signals and combined it with a convolutional neural network (CNN) to extract image features from the time-frequency representations, thereby achieving accurate fault diagnosis of gearboxes. S. Li et al. (2023) proposed a fault state identification method for diesel engines by integrating the SSST with a vision transformer. This approach leverages the advantages of the SSST in handling nonlinear and non-stationary signals, as well as the image classification capability of the vision transformer. Chen and Zheng (2023) employed the SSST to convert the acquired sensor signals into time-frequency images. Frequency-domain features were extracted using a CNN, fused via an attention mechanism, and further processed by a gated recurrent unit to capture time-frequency features. Finally, classification was achieved using a softmax layer, enabling effective fault diagnosis of wind turbines. The aforementioned studies have demonstrated the effectiveness of the SSST in fault diagnosis. However, these methods primarily focus on the transformation and application of single-channel information, while the synergistic contribution of multi-channel data has been overlooked.

Multi-channel signals can comprehensively reflect the operating condition of mechanical equipment, thereby enabling thorough mining of fault-related information. They have been widely applied in scenarios such as early fault diagnosis, where fault features are weak. By fusing multi-channel vibration signals into high-dimensional images and feeding them into convolutional neural networks (CNNs), the superior capabilities of CNNs in image feature extraction can be effectively leveraged for accurate fault classification (Guo et al., 2023). Azamfar et al. (2020) proposed a fault diagnosis method based on motor current signature analysis. This approach utilizes a two-dimensional convolutional neural network (2D-CNN) architecture to fuse data obtained from multiple current sensors and performs classification directly without the need for manual feature extraction. T. Li et al. (2021) proposed a multi-channel information extraction and fusion framework within non-Euclidean spaces, developing a graph convolutional neural network model with multiple receptive fields that significantly enhances the expressiveness of the learned representations. Liu et al. (2023) proposed a multi-source time-frequency feature fusion method that employs image-to-matrix transformation, matrix concatenation, and matrix-to-image reconstruction. In addition, Peng et al. (2020) proposed a novel multi-branch and multi-scale convolutional neural network capable of automatically learning and fusing rich and complementary fault information from multiple signal components and time scales of vibration signals.

Nevertheless, these methods rely heavily on the assumption that the distribution of test data is consistent with that of the training data. In practice, the operating conditions are frequently non-stationary and diverse, while labeled training samples are scarce. Such distribution shifts significantly impair the generalization ability of trained models, resulting in reduced diagnostic accuracy and hindering their deployment in real-world industrial settings (Liu et al., 2025; Misbah et al., 2024). To address the aforementioned challenges, domain adaptation methods developed in the field of image recognition have emerged as a promising solution (Sun and Saenko, 2016; Y. Zhu et al., 2021). These methods aim to reduce the distribution discrepancy between the source and target domains, thereby enabling the knowledge learned from labeled source domain data to be effectively transferred to the target domain, even when labeled data in the target domain is scarce or unavailable (Qian et al., 2023; Wang et al., 2025).

Among the various domain adaptation strategies, discrepancy-based methods and adversarial-based methods are two mainstream paradigms. In discrepancy-based methods, a domain discrepancy term is incorporated into the loss function to explicitly measure and minimize the distribution discrepancy (Peng et al., 2026). For example, J. Li et al. (2024) employed a multi-kernel maximum mean discrepancy approach to measure and minimize domain discrepancy, and Liang et al. (2023) combined local maximum mean discrepancy (LMMD) with a residual network to address fault diagnosis under variable speed conditions. Cao et al. (2022) introduced the Cauchy kernel-induced maximum mean discrepancy and applied it to gearbox cross-domain diagnosis. In adversarial-based methods, a domain discriminator is trained jointly with the feature extractor through minimax optimization to learn domain-invariant representations (Alabsi et al., 2024; Han et al., 2024). More recently, graph neural network-based approaches have also been explored to model topological relationships among multi-sensor signals for cross-domain diagnosis (Li et al., 2021; Liu et al., 2025), and Transformer-based methods have shown strong capabilities in capturing long-range dependencies in vibration signals (Chen and Zheng, 2023). However, the aforementioned methods primarily focus on either global domain discrepancy (e.g., maximum mean discrepancy, MMD) or local domain alignment (e.g., LMMD), while neglecting the synergistic contribution of both global and local information in cross-domain feature discrepancy evaluation. In fact, global metrics such as MMD align the marginal distributions of the two domains but are insensitive to class-conditional differences, while local metrics such as LMMD achieve class-wise alignment within subdomains but overlook the overall distribution consistency. For planetary gearboxes under variable speeds, where an overall distribution shift and fault-specific local discrepancies coexist, jointly exploiting global and local alignment is expected to provide a more comprehensive and balanced measure of domain divergence.

Based on that, a novel cross-domain fault diagnosis method based on multi-channel information fusion is proposed in this paper. First, the SSST is employed to fuse and convert raw multi-channel vibration signals under varying working conditions into three-channel time-frequency representations. To address domain discrepancies, a global-local domain discrepancy metric strategy is introduced to measure and minimize both global and local distribution differences simultaneously, thus promoting domain confusion. A diagnostic framework is then established based on the ResNet-50 architecture, which integrates feature extraction and domain-invariant representation learning. Finally, extensive experiments are conducted to validate the feasibility and superiority of the proposed method. The main contributions of this paper include the following:

(1) A novel signal fusion strategy is proposed, where SSST is employed to convert multi-channel vibration signals under varying working conditions into unified three-channel time-frequency representations, effectively enhancing the feature richness for cross-domain diagnosis of planetary gearboxes.

(2) A global-local domain discrepancy measurement scheme is developed, which jointly measures and minimizes the global (MMD) and local (LMMD) distribution differences, improving domain adaptability in fault representation learning.

(3) A ResNet50-based cross-domain fault diagnosis framework is constructed, integrating the fused multi-channel time-frequency inputs and domain discrepancy metrics, and experimentally validated on planetary gearbox datasets to demonstrate its superior performance.

The rest of this paper is organized as follows: Section 2 presents the fundamental principles and advantages of the SSST. Section 3 details the framework of the proposed method. Section 4 describes the data sources and preprocessing techniques used in this study. Section 5 validates the effectiveness of the proposed approach. Section 6 concludes the paper.

2. Synchrosqueezing S-Transform

The SSST integrates the S-transform with synchrosqueezing techniques, enabling a more precise characterization of non-stationary signals in the time-frequency domain. By applying the SST to the time-frequency matrix obtained via the ST, the original time-frequency region is compressed, thereby enhancing the time-frequency resolution. This method is particularly well-suited for the time-frequency analysis of non-stationary vibration signals, as it allows for more accurate localization and tracking of transient components within the signal. Specifically, let x(t) represent the signal to be analyzed; its ST is given by (Stockwell et al., 1996):

S (τ, f) = \int_{- \infty}^{+ \infty} x (t) w (t - τ, f) e^{- j 2 π f t} d t

(1)

where τ represents the time-shift parameter along the time axis and

w (t - τ, f) = \frac{| f |}{\sqrt{2 π}} \exp (- \frac{{(t - τ)}^{2} f^{2}}{2})

denotes a frequency-dependent Gaussian window function. The standard deviation of this Gaussian window is inversely proportional to the frequency f, which enables the ST to provide higher time resolution at high frequencies and higher frequency resolution at low frequencies.

To perform synchrosqueezing of the ST result along the frequency axis, the instantaneous frequency of the signal x(t) must be calculated. Based on the ST output, the instantaneous frequency of x(t) can be expressed as:

f_{x} (τ, f) = f - \frac{j \partial_{τ} S (τ, f)}{2 π S (τ, f)}

(2)
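As a brief sanity check of equation (2), consider a pure harmonic x(t) = A e^{j 2π f_0 t}. This is an illustrative test signal, not one used in this study, and the imaginary unit is written as j, as in equation (1). Because the Gaussian window integrates to one, equation (1) can be evaluated in closed form for f ≠ 0. The estimator then returns f_0 for every analysis frequency f, which is what allows the smeared ST energy to be reassigned to the true frequency:

```latex
\begin{aligned}
S(\tau, f) &= A\, e^{-j 2\pi (f - f_0)\tau}
              \exp\!\left(-\frac{2\pi^{2}(f - f_0)^{2}}{f^{2}}\right), \\
\partial_{\tau} S(\tau, f) &= -j 2\pi (f - f_0)\, S(\tau, f), \\
f_x(\tau, f) &= f - \frac{j\,\partial_{\tau} S(\tau, f)}{2\pi S(\tau, f)}
              = f - (f - f_0) = f_0 .
\end{aligned}
```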

The SSST of the signal can thus be obtained as:

\mathrm{SSST}_{x} (τ, f_{l}) = L_{f}^{- 1} \sum_{f_{k}} | S (τ, f_{k}) | f_{k} (f_{k} - f_{k - 1})

(3)

where $f_{k}$ denotes the discretized frequency sample points in the time-frequency representation obtained by the ST that satisfy the condition $| f_{x} (τ, f_{k}) - f_{l} | \leq L_{f} / 2$. That is, the time-frequency energy is rearranged and aggregated to the target frequency $f_{l}$ only when the instantaneous frequency $f_{x} (τ, f_{k})$ calculated by equation (2) falls into the frequency interval $[f_{l} - L_{f} / 2, f_{l} + L_{f} / 2]$, which is centered at $f_{l}$ and has a width of $L_{f}$.
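The reassignment in equation (3) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this study. The function name `synchrosqueeze`, the (frequency × time) array layout, and the spacing assigned to the first frequency bin are assumptions. The instantaneous frequency `f_inst` is assumed to have been computed beforehand from equation (2).

```python
import numpy as np

def synchrosqueeze(S, f_inst, freqs, L_f):
    """Reassign |S| along the frequency axis following equation (3).

    S      : complex ST coefficients, shape (n_freq, n_time)   (assumed layout)
    f_inst : instantaneous frequency from equation (2), same shape as S
    freqs  : increasing frequency grid f_k, shape (n_freq,)
    L_f    : width of the frequency interval centered at each target f_l
    """
    S = np.asarray(S)
    f_inst = np.asarray(f_inst, dtype=float)
    freqs = np.asarray(freqs, dtype=float)

    # f_k - f_{k-1}; the first bin reuses the spacing f_1 - f_0 (assumption)
    df = np.diff(freqs, prepend=2.0 * freqs[0] - freqs[1])
    weight = np.abs(S) * (freqs * df)[:, None]   # |S(tau, f_k)| f_k (f_k - f_{k-1})

    out = np.zeros(S.shape, dtype=float)
    for l, f_l in enumerate(freqs):
        # keep only the bins whose instantaneous frequency falls in
        # [f_l - L_f/2, f_l + L_f/2]
        in_band = np.abs(f_inst - f_l) <= L_f / 2.0
        out[l] = np.sum(weight * in_band, axis=0) / L_f
    return out
```

The explicit loop over target frequencies keeps the correspondence with equation (3) visible. A production implementation would typically bin the instantaneous frequencies once instead of scanning every target frequency.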

To verify the advantage of the SSST in signal feature extraction, the following simulated signal is constructed:

\begin{cases} x (t) = x_{1} (t) + x_{2} (t) + x_{3} (t) \\ x_{1} (t) = (2 + 0.2 \cos 10 π t) \cos [100 π t + 1.2 π \cos 6 π t] \\ x_{2} (t) = (2 + 0.6 \cos 60 π t) e^{- 0.1 t} \cos [300 π t + 4 π t^{1.8} + 6 π \sin 20 t] \\ x_{3} (t) = \sin [700 π t + 120 π \arctan ({(t - 5)}^{2})] \end{cases}

(4)

The signal is sampled at a frequency of 1000 Hz over a duration of 10 seconds. Both the ST and the SSST are applied for analysis. The resulting time-frequency representations are shown in Figure 1. After adding white noise to the signal, the corresponding results are presented in Figure 2.
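For readers who wish to reproduce the test signal, equation (4) can be generated as follows. The sampling rate and duration follow the text. The noise level is not stated in the paper, so the signal-to-noise ratio `snr_db` below is a placeholder assumption, and the function names are illustrative.

```python
import numpy as np

def simulated_signal(fs=1000, duration=10):
    """Three-component signal of equation (4), sampled at fs Hz for `duration` s."""
    t = np.arange(int(fs * duration)) / fs
    x1 = (2 + 0.2 * np.cos(10 * np.pi * t)) * np.cos(
        100 * np.pi * t + 1.2 * np.pi * np.cos(6 * np.pi * t))
    x2 = (2 + 0.6 * np.cos(60 * np.pi * t)) * np.exp(-0.1 * t) * np.cos(
        300 * np.pi * t + 4 * np.pi * t ** 1.8 + 6 * np.pi * np.sin(20 * t))
    x3 = np.sin(700 * np.pi * t + 120 * np.pi * np.arctan((t - 5) ** 2))
    return t, x1 + x2 + x3

def add_white_noise(x, snr_db=5.0, seed=0):
    """Add Gaussian white noise at a given SNR (snr_db is an assumed value)."""
    rng = np.random.default_rng(seed)
    noise_power = np.mean(x ** 2) / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

t, x = simulated_signal()
x_noisy = add_white_noise(x)
```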

Figure 1.

Time-frequency diagram of signal x(t): (a) ST; (b) SSST.

Figure 2.

Time-frequency diagram of signal x(t) after adding noise: (a) ST; (b) SSST.

It can be observed that both the ST and the SSST accurately capture the instantaneous frequency variations of the original signal prior to noise addition. However, the high-frequency component x₃(t) appears more blurred in the ST time-frequency representation compared to SSST. After the addition of white noise, the overall signal clarity in the time-frequency domain is noticeably reduced, with the degradation being particularly evident for the x₃(t) component. While both ST and SSST can still accurately identify low-frequency components, the high-frequency content becomes completely indistinguishable in the ST. In contrast, SSST effectively reassigns the dispersed energy back to its central frequency, maintaining high time-frequency resolution even in the presence of noise.

3. Proposed method

3.1. Methodological framework

This section presents a detailed description of the proposed method. As illustrated in Figure 3, the proposed method consists of three main components: a multi-channel information fusion module, a feature extractor, and a domain adaptation module. The multi-channel information fusion module constructs the model input based on the SSST approach. The feature extractor, built upon the ResNet50 architecture (He et al., 2016), extracts fault-related information and domain-invariant features from the fused input. The domain adaptation module facilitates the learning of transferable features by constructing global and local mean discrepancy measures. Finally, a classifier is applied to the learned transferable features to enable cross-domain fault diagnosis of gearboxes. The detailed parameter settings of the proposed method are summarized in Table 1.

Figure 3.

The framework of the proposed method.

Table 1.

Structure and parameters of ResNet50.

Layer name	Input size	Output size	Parameters
Layer 1	192 × 192	96 × 96	7 × 7, 64, stride 2
Layer 2	96 × 96	48 × 48	3 × 3 max pool, stride 2
Layer 2	96 × 96	48 × 48	$[\begin{array}{l} 1 \times 1, 32; f c [2, 32] \\ 3 \times 3, 32; f c [2, 32] \\ 1 \times 1, 128; f c [8, 128] \end{array}] \times 2 \times 3$
Layer 3	48 × 48	24 × 24	$[\begin{array}{l} 1 \times 1, 64; f c [4, 64] \\ 3 \times 3, 64; f c [4, 64] \\ 1 \times 1, 256; f c [16, 256] \end{array}] \times 2 \times 4$
Layer 4	24 × 24	12 × 12	$[\begin{array}{l} 1 \times 1, 128; f c [8, 128] \\ 3 \times 3, 128; f c [8, 128] \\ 1 \times 1, 512; f c [32, 512] \end{array}] \times 2 \times 6$
Layer 5	12 × 12	6 × 6	$[\begin{array}{l} 1 \times 1, 256; f c [16, 256] \\ 3 \times 3, 256; f c [16, 256] \\ 1 \times 1, 1024; f c [64, 1024] \end{array}] \times 2 \times 3$
Bottleneck	6 × 6	1 × 1	Average pool, 256-d fc
Class	1 × 1	1 × 1	3-d fc, softmax

3.2. Multi-channel information fusion

To fully exploit the time-frequency information of vibration signals, a multi-channel information fusion approach inspired by the color channels of images is proposed, as illustrated in Figure 4. Given the superior time-frequency representation capability of the SSST, it is employed to preprocess the acquired vibration signals and extract time-frequency feature matrices. In typical equipment health monitoring setups, sensors are installed in the horizontal, vertical, and axial directions, and vibration signals from each direction can be individually transformed into time-frequency coefficient matrices. Based on this, an independent feature channel is constructed for each directional monitoring signal, forming a multi-channel fused feature map. The specific steps are as follows:

(1) The vibration signals acquired from the three orthogonal directions (x, y, and z) of the monitored object are processed using the SSST to obtain the corresponding time-frequency coefficient matrices S₁, S₂, and S₃, which are expressed as:

S_{1} = [\begin{array}{l} s_{11}^{1} & s_{12}^{1} & \dots & s_{1 n}^{1} \\ s_{21}^{1} & s_{22}^{1} & \dots & s_{2 n}^{1} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ s_{m 1}^{1} & s_{m 2}^{1} & \dots & s_{m n}^{1} \end{array}], S_{2} = [\begin{array}{l} s_{11}^{2} & s_{12}^{2} & \dots & s_{1 n}^{2} \\ s_{21}^{2} & s_{22}^{2} & \dots & s_{2 n}^{2} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ s_{m 1}^{2} & s_{m 2}^{2} & \dots & s_{m n}^{2} \end{array}], S_{3} = [\begin{array}{l} s_{11}^{3} & s_{12}^{3} & \dots & s_{1 n}^{3} \\ s_{21}^{3} & s_{22}^{3} & \dots & s_{2 n}^{3} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ s_{m 1}^{3} & s_{m 2}^{3} & \dots & s_{m n}^{3} \end{array}]

(5)

where m and n represent the number of time samples and frequency bins of the vibration signal, respectively.

(2) Based on equation (6), the time-frequency coefficient matrices S₁, S₂, and S₃ are normalized to the grayscale image pixel value range of [0, 255].

s_{m n}^{i *} = 255 \times (s_{m n}^{i} - s_{\min}^{i}) / (s_{\max}^{i} - s_{\min}^{i})

(6)

where sⁱ_max and sⁱ_min denote the maximum and minimum values in the time-frequency coefficient matrix of the i-th vibration signal, respectively; i = 1, 2, 3 corresponds to the i-th sensor.

(3) The normalized matrix s^i*_mn is rounded and converted into an 8-bit unsigned integer type, which is commonly used for image storage. The resulting pixel matrices are denoted as R, G, and B.

R = [\begin{array}{l} s_{11}^{1 *} & s_{12}^{1 *} & \dots & s_{1 n}^{1 *} \\ s_{21}^{1 *} & s_{22}^{1 *} & \dots & s_{2 n}^{1 *} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ s_{m 1}^{1 *} & s_{m 2}^{1 *} & \dots & s_{m n}^{1 *} \end{array}], G = [\begin{array}{l} s_{11}^{2 *} & s_{12}^{2 *} & \dots & s_{1 n}^{2 *} \\ s_{21}^{2 *} & s_{22}^{2 *} & \dots & s_{2 n}^{2 *} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ s_{m 1}^{2 *} & s_{m 2}^{2 *} & \dots & s_{m n}^{2 *} \end{array}], B = [\begin{array}{l} s_{11}^{3 *} & s_{12}^{3 *} & \dots & s_{1 n}^{3 *} \\ s_{21}^{3 *} & s_{22}^{3 *} & \dots & s_{2 n}^{3 *} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ s_{m 1}^{3 *} & s_{m 2}^{3 *} & \dots & s_{m n}^{3 *} \end{array}]

(7)

(4) The pixel matrices R, G, and B are concatenated along the channel dimension to form the fused information matrix, denoted as F = cat(R, G, B). This fused matrix F is then mapped into a fused feature map, thereby enabling multi-channel information fusion.
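Steps (1) to (4) above can be sketched as follows. This is a minimal illustration that assumes the three SSST coefficient matrices are already available as real-valued arrays of equal shape. The function names are hypothetical, and the handling of a constant matrix is an added safeguard not discussed in the text.

```python
import numpy as np

def to_uint8(S):
    """Min-max normalize one coefficient matrix to [0, 255] (equation (6))
    and round it to 8-bit unsigned integers (step 3)."""
    S = np.asarray(S, dtype=float)
    s_min, s_max = S.min(), S.max()
    if s_max == s_min:
        # constant matrix: avoid division by zero (assumed behavior)
        return np.zeros(S.shape, dtype=np.uint8)
    scaled = 255.0 * (S - s_min) / (s_max - s_min)
    return np.rint(scaled).astype(np.uint8)

def fuse_channels(S1, S2, S3):
    """Stack the three normalized matrices as R, G, B planes: F = cat(R, G, B)."""
    return np.stack([to_uint8(S1), to_uint8(S2), to_uint8(S3)], axis=-1)
```

Each channel is normalized with its own minimum and maximum, as in equation (6), so the relative amplitude between sensors is not preserved in the fused image.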

Figure 4.

Multi-channel information fusion method.

3.3. Domain adaptation module

Another advantage of applying deep learning to fault diagnosis of mechanical equipment lies in the model’s inherent complexity, which endows it with feature transfer learning capabilities. This allows the model to effectively adapt to discrepancies between source and target domain data, thereby addressing diagnostic challenges under varying operating conditions, different fault severities, different but related equipment or simulation models, and incomplete information sources (H. Zheng et al., 2019). Variations in factors such as operating conditions often lead to distribution differences between the training dataset (source domain) and the testing dataset (target domain). Such discrepancies significantly degrade the generalization performance of classifiers trained solely on the source domain when applied to the target domain. The concept of transfer learning aims to address this issue by leveraging deep neural networks to map both source and target domains into a shared feature space, in which the distribution discrepancy is minimized, thereby reducing domain divergence.

The cross-domain diagnostic model builds upon conventional deep neural networks by introducing an adaptation layer between the feature extraction module and the classifier to quantify the discrepancy between source and target domain data (Mao et al., 2022). The introduction of the adaptation layer shifts the network’s optimization objective from minimizing classification error alone to jointly minimizing both the classification error and the domain discrepancy loss (Shao and Kim, 2024):

ζ = ζ_{C} + λ ζ_{D} (x^{s}, x^{t})

(8)

For the classification error, this paper employs the cross-entropy loss function $ζ_{C}$ to measure the discrepancy between the predicted labels ${\hat{ƛ}}_{c}$ and the true labels $ƛ_{c}$. The loss function $ζ_{C}$ can be expressed as:

ζ_{C} = - \sum_{c = 1}^{\mathrm{Num}_{class}} ƛ_{c} \log {\hat{ƛ}}_{c}

(9)

where $\mathrm{Num}_{class}$ denotes the number of fault categories, and the predicted labels ${\hat{ƛ}}_{c}$ are obtained by applying the softmax function to the extracted features.

For the domain discrepancy loss, this study adopts a strategy that combines both global and local distribution differences to achieve a comprehensive measure of domain divergence. Specifically, the global distribution discrepancy is measured using MMD, while the local discrepancy is quantified using LMMD. It can be represented as:

\begin{cases} ζ_{D} (x^{s}, x^{t}) = \mathrm{MMD} (x^{s}, x^{t}) + \mathrm{LMMD} (x^{s}, x^{t}) \\ \mathrm{MMD} (x^{s}, x^{t}) = {‖ \frac{1}{n} \sum_{x_{i}^{s} \in D_{s}} Φ (x_{i}^{s}) - \frac{1}{m} \sum_{x_{j}^{t} \in D_{t}} Φ (x_{j}^{t}) ‖}_{H_{k}}^{2} \\ \mathrm{LMMD} (x^{s}, x^{t}) = \frac{1}{C} \sum_{c = 1}^{C} {‖ \sum_{x_{i}^{s} \in D_{s}} w_{i}^{s c} Φ (x_{i}^{s}) - \sum_{x_{j}^{t} \in D_{t}} w_{j}^{t c} Φ (x_{j}^{t}) ‖}_{H_{k}}^{2} \end{cases}

(10)

where $x_{i}^{s}$ and $x_{j}^{t}$ represent the samples from the source domain $D_{s}$ and the target domain $D_{t}$, respectively; $w_{i}^{s c}$ and $w_{j}^{t c}$ are the weights of $x_{i}^{s}$ and $x_{j}^{t}$ belonging to category c, respectively; $Φ (\cdot)$ is the mapping from the data to the reproducing kernel Hilbert space $H_{k}$; C is the number of fault categories; and n and m denote the number of samples in the source domain and target domain, respectively.
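Equation (10) can be evaluated through the kernel trick without forming Φ explicitly. The NumPy sketch below illustrates the computation under several assumptions that the text does not specify: a single Gaussian kernel with bandwidth `sigma`, class weights obtained by normalizing one-hot source labels and softmax target predictions per class (the usual LMMD convention), and zero contribution from classes absent in a batch. In training, the same operations would be written with differentiable tensors, for example in PyTorch.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Pairwise Gaussian kernel between the rows of a and the rows of b."""
    sq = (np.sum(a ** 2, axis=1)[:, None]
          + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def weighted_mmd2(fs, ft, ws, wt, sigma=1.0):
    """|| sum_i ws_i Phi(fs_i) - sum_j wt_j Phi(ft_j) ||^2 via kernel expansion."""
    return (ws @ gaussian_kernel(fs, fs, sigma) @ ws
            + wt @ gaussian_kernel(ft, ft, sigma) @ wt
            - 2.0 * ws @ gaussian_kernel(fs, ft, sigma) @ wt)

def mmd2(fs, ft, sigma=1.0):
    """Global term: uniform weights 1/n and 1/m."""
    n, m = len(fs), len(ft)
    return weighted_mmd2(fs, ft, np.full(n, 1.0 / n), np.full(m, 1.0 / m), sigma)

def lmmd2(fs, ft, ys_onehot, pt, sigma=1.0):
    """Local term: average of class-weighted discrepancies over C classes.

    ys_onehot : one-hot source labels, shape (n, C)
    pt        : target class probabilities (e.g. softmax outputs), shape (m, C)
    """
    C = ys_onehot.shape[1]
    total = 0.0
    for c in range(C):
        s_mass, t_mass = ys_onehot[:, c].sum(), pt[:, c].sum()
        if s_mass == 0 or t_mass == 0:
            continue  # class missing from the batch (assumed to contribute zero)
        total += weighted_mmd2(fs, ft, ys_onehot[:, c] / s_mass,
                               pt[:, c] / t_mass, sigma)
    return total / C

def domain_discrepancy(fs, ft, ys_onehot, pt, sigma=1.0):
    """zeta_D = MMD + LMMD, as in the first line of equation (10)."""
    return mmd2(fs, ft, sigma) + lmmd2(fs, ft, ys_onehot, pt, sigma)
```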

In summary, this study proposes a method that integrates multi-channel information fusion with global-local discrepancy measurement to achieve cross-domain fault diagnosis of planetary gearboxes under complex operating conditions. The main steps are as follows:

(1) Multi-channel vibration signals of the gearbox are collected under various operating conditions and fault states. These signals are converted into equal-length time series samples in MATLAB.

(2) The time series samples are downsampled to half of their original sampling frequency. The SSST is then applied to transform the single-channel signals into time-frequency coefficient matrices, where the width represents temporal information and the height represents frequency information.

(3) The coefficient matrices obtained from each single channel are resized to a uniform dimension and normalized. The processed grayscale images from individual channels are stacked to form three-channel color images.

(4) A transfer learning-based neural network is constructed under the PyTorch framework, using ResNet50 as the backbone. Cross-domain fault diagnosis is performed on samples from different image domains.

4. Data description

4.1. The planetary gearbox test rig

The feasibility of the proposed method is validated using a planetary gearbox vibration test rig with preset faults. The test rig consists of a motor, a planetary gearbox, a load motor, and couplings, as shown in Figure 5. By controlling the motor and load motor, different rotational speeds and torques are applied to the gearbox. To simulate the effects of load variation and fault-induced excitation under real operating conditions, typical faults such as pitting, broken tooth, and cracks are artificially introduced into the planetary gears, as shown in Figure 6.

Figure 5.

Planetary gearbox test rig.

Figure 6.

Four health states of tested gears: (a) Normal, (b) Pitting, (c) Broken tooth, (d) Crack.

4.2. Data acquisition and preprocessing

Acceleration sensors are used to collect vibration signals from the gearbox under various combinations of rotational speeds and torques, with different fault types introduced. The signals are measured in three directions: radial-horizontal, radial-vertical, and axial. The sensor placement locations are illustrated in Figure 5. The sampling frequency is set to 8192 Hz, and the duration of each sampling period is 20 seconds.

Vibration data are collected under various operating conditions and fault states. A sliding window with random sampling is employed to convert the raw data into datasets consisting of 200 equal-length time series. Each sample contains 16,384 data points (corresponding to 2 seconds), with an overlap of 8192 points between adjacent windows. To enhance the time-frequency resolution of the spectrograms without losing critical frequency components, the signals are downsampled to half of the original sampling frequency (4096 Hz), which corresponds to an analysis bandwidth of 2048 Hz. The basic information of each dataset is summarized in Table 2.
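The windowing described above can be illustrated with a short sketch. It covers only deterministic sliding-window segmentation of one single-channel record. The text does not detail how windows are randomly sampled and combined to reach 200 samples per dataset, so that step is not modeled here, and the function name is hypothetical.

```python
import numpy as np

def sliding_windows(signal, window=16384, overlap=8192):
    """Cut a 1-D record into equal-length, overlapping segments."""
    signal = np.asarray(signal)
    step = window - overlap
    if step <= 0 or len(signal) < window:
        raise ValueError("window must exceed overlap and fit inside the record")
    n_windows = (len(signal) - window) // step + 1
    return np.stack([signal[k * step:k * step + window] for k in range(n_windows)])

# One 20 s record sampled at 8192 Hz contains 163,840 points.
record = np.zeros(20 * 8192)
segments = sliding_windows(record)
print(segments.shape)  # (19, 16384)
```

With these settings, a single 20 s record yields 19 windows of 2 s each.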

Table 2.

Details of experimental datasets.

Dataset number	Rotation speed (r/min)	Load (N·m)	Fault category	Processing method	Channel
1	150	100	Normal, Broken tooth, Pitting, Crack	SSST	A + B + C
2	200	100		SSST	A + B + C
3	300	100		SSST	A + B + C
4	Sine variation	100		SSST	A + B + C
5	Linear variation	100		SSST	A + B + C
6	150	100		ST	A + B + C
7	Linear variation	100		ST	A + B + C
8	150	100		/	A + B + C
9	Linear variation	100		/	A + B + C
10	150	100		SSST	A/B/C
11	Linear variation	100		SSST	A/B/C

Furthermore, SSST is applied to transform each dataset into a set of time-frequency images with a fixed size of 192 × 192, which are used as inputs to the neural network for fault diagnosis of planetary gearboxes under variable operating conditions. Channels A, B, and C represent the three signal acquisition directions. The fault categories considered include broken tooth, normal, pitting, and crack. Taking the normal signal from Dataset 1 as an example, the preprocessing procedure is illustrated in Figure 7.

Figure 7.

Data preprocessing process.

5. Experimental verification of cross-domain fault diagnosis

In this section, the effectiveness of the proposed method is validated under both constant and variable speed conditions. Additionally, the effects of the number of channels and different feature extraction methods on the diagnostic performance are analyzed. It is worth noting that each method is executed ten times per experiment to mitigate the influence of randomness, and the mean and standard deviation are reported. Meanwhile, the key hyperparameters of the training procedure are summarized in Table 3.

Table 3.

Training hyperparameters.

Hyperparameter	Value
Optimizer	SGD with momentum 0.9
Initial learning rate	0.001
Learning rate schedule	StepLR, decay factor 0.1 every 300 iterations
Batch size	64
Total iterations	1000
Stopping criterion	Fixed at 1000 iterations
Trade-off coefficient λ in equation (8)	Progressive strategy: λ = 2/(1+exp(−10p))−1, p∈[0,1]
Kernel function	Gaussian kernel
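The two schedules in Table 3 can be written out explicitly. In the sketch below, p is taken to be the ratio of the current iteration to the total number of iterations. This is an assumption, since the table only states p ∈ [0, 1]. The function names are illustrative. The learning-rate rule mirrors a step decay of factor 0.1 every 300 iterations.

```python
import math

def tradeoff_lambda(iteration, total_iterations=1000):
    """Progressive trade-off coefficient: lambda = 2 / (1 + exp(-10 p)) - 1."""
    p = min(max(iteration / total_iterations, 0.0), 1.0)  # assumed definition of p
    return 2.0 / (1.0 + math.exp(-10.0 * p)) - 1.0

def step_lr(iteration, base_lr=0.001, gamma=0.1, step_size=300):
    """Step decay: multiply the learning rate by gamma every step_size iterations."""
    return base_lr * gamma ** (iteration // step_size)

for it in (0, 100, 300, 600, 1000):
    print(it, round(tradeoff_lambda(it), 4), step_lr(it))
```

Under this schedule λ starts at 0, so early training is driven by the classification loss alone, and it approaches 1 as training proceeds, which gradually increases the weight of the domain discrepancy term in equation (8).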

5.1. Cross-domain fault diagnosis at constant rotational speed

Dataset 1 is used as the source domain, while Datasets 2 and 3 serve as target domains, to investigate the impact of rotational speed variation on fault recognition performance under constant-speed conditions. The source and target domain data are input into the proposed transfer learning network and the original ResNet50 network, respectively, for comparative testing. Figure 8 illustrates the variation of diagnostic accuracy with the number of training iterations in the first trial.

Figure 8.

Diagnostic accuracy convergence curve under constant rotational speed.

As observed in the figure, the ResNet50 network without transfer learning converges faster, but due to its limited transferability, its final stable accuracy remains below 70%. With identical network structures, the recognition accuracy when transferring from Dataset 1 to Dataset 2 is higher than that from Dataset 1 to Dataset 3. This is mainly attributed to the greater operational discrepancy between Datasets 1 and 3 compared to Datasets 1 and 2, which hinders effective feature space alignment and domain discrepancy minimization within the network. Fine-tuning network parameters and hyperparameters can moderately improve fault classification performance.

To further explore recognition accuracy under different transfer directions, Datasets 2 and 3 are also used as source domains. The results are presented in Figure 9. In the plot, “2-1” denotes transfer from Dataset 2 to Dataset 1, and the rest follow the same convention. It can be clearly observed that the proposed model consistently achieves high accuracy across various transfer tasks, demonstrating strong generalization capability.

Figure 9.

Comparison of cross-domain diagnostic accuracy under constant rotational speed.

5.2. Cross-domain fault diagnosis at variable speeds

Gearboxes often operate under varying conditions when faults occur. To simulate the fluctuating and speed ramp-up conditions encountered in real-world operation, Dataset 1 is selected as the source domain, while Datasets 4 and 5 are used as target domains to investigate the fault diagnosis performance when transferring from constant speed to sinusoidally and linearly varying speeds, respectively. Ten repeated experiments are conducted, and the results are shown in Figure 10.

Figure 10.

Comparison of recognition accuracy under different rotational speed changing trends.

As illustrated in the figure, across the repeated experiments the network exhibits relatively small fluctuations in recognition accuracy for Dataset 5, achieving an average accuracy of 88.57% ± 1.33%, which is comparable to the accuracy obtained under constant-speed target domain conditions. This indicates that the proposed method offers good classification stability under linearly varying speed conditions. In contrast, when the target domain features sinusoidal speed variation (Dataset 4), the recognition accuracy fluctuates more significantly, with an average accuracy of only 65.53% ± 5.54%. The relatively lower accuracy under sinusoidal speed variation is primarily attributed to two factors. First, the source domain is collected at a constant speed of 150 r/min, whereas the target domain exhibits a considerably broader speed range, which significantly amplifies the marginal distribution shift. Second, the relatively low frequency of sinusoidal speed variation causes samples of the same fault category to fall within different segments of the speed profile, thereby increasing intra-class variability.

5.3. The impact of channel number on diagnosis performance

To investigate the extent to which multi-channel samples improve fault recognition accuracy compared to single-channel samples, new transfer tasks are established as summarized in Table 4. The fused three-channel samples and the three types of single-channel samples are respectively fed into the transfer learning network. Figure 11 illustrates the variation in diagnostic accuracy with the number of iterations under four different transfer tasks.

Table 4.

Transfer tasks under different channel numbers.

Transfer task	Source domain dataset	Target domain dataset	Number of channels
IV	Dataset 1	Dataset 2	3(A + B + C)
V	Dataset 1	Dataset 2	1(A)
VI	Dataset 10	Dataset 11	1(B)
VII	Dataset 10	Dataset 11	1(C)

Figure 11.

Comparison of recognition accuracy under different numbers of channels.

As shown in the figure, the diagnostic accuracy achieved with the fused three-channel samples is consistently higher than that obtained with each of the three single-channel sample types. Since the network parameters tend to stabilize after 800 iterations, the average diagnostic accuracy over the 800th to 1000th iterations is calculated to quantitatively compare the performance differences across the four transfer tasks.

To further evaluate the contribution of multi-channel fusion, the accuracy improvement rate of the three-channel samples over single-channel samples is defined as:

\alpha = \frac{MA_{4} - \frac{MA_{5} + MA_{6} + MA_{7}}{3}}{\frac{MA_{5} + MA_{6} + MA_{7}}{3}}

(11)

Specifically, the average accuracies for Transfer Tasks IV, V, VI, and VII (denoted as MA₄, MA₅, MA₆, and MA₇, respectively) are calculated to be 91.16%, 78.61%, 85.44%, and 79.47%. Based on these results, the accuracy improvement rate of the fused three-channel samples over the three types of single-channel samples is quantified as 12.3%.
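The improvement rate can be reproduced with a few lines of arithmetic. The following minimal sketch assumes that the baseline is the mean of the three single-channel accuracies, as in equation (11); the function name `improvement_rate` is illustrative and is not taken from the original implementation.

```python
# Average accuracies (%) for Transfer Tasks IV-VII, as reported in the text.
ma4, ma5, ma6, ma7 = 91.16, 78.61, 85.44, 79.47


def improvement_rate(fused, singles):
    """Relative gain of the fused accuracy over the mean single-channel accuracy."""
    baseline = sum(singles) / len(singles)
    return (fused - baseline) / baseline


single_mean = (ma5 + ma6 + ma7) / 3
alpha = improvement_rate(ma4, [ma5, ma6, ma7])

print(f"mean single-channel accuracy: {single_mean:.2f}%")  # 81.17%
print(f"improvement rate alpha: {alpha:.1%}")               # 12.3%
```

Note that this is a relative improvement with respect to the single-channel mean; the corresponding absolute gap is about 9.99 percentage points.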

5.4. Comparative study and ablation analysis

To comprehensively evaluate the proposed method, three groups of comparative and ablation experiments are designed in this section: (1) comparison with mainstream domain adaptation methods, (2) ablation study on the domain discrepancy metric strategy, and (3) comparison of different feature extraction methods. All experiments use the same ResNet-50 backbone, and results are reported as mean and standard deviation over ten independent runs with different random seeds.

5.4.1. Comparison with mainstream domain adaptation methods

To validate the effectiveness of the proposed framework, five representative domain adaptation methods are reproduced under the same backbone and input settings for fair comparison: DDC (Tzeng et al., 2014), DAN (Long et al., 2015), Deep CORAL (Sun and Saenko, 2016), DANN (Ganin et al., 2016), and DSAN (Zhu et al., 2021). All methods use the same SSST-based three-channel fused images as input. Three transfer tasks with increasing difficulty are selected: Dataset 1→Dataset 2, Dataset 1→Dataset 3, and Dataset 1→Dataset 5. The results are summarized in Table 5.

Table 5.

Comparison with different domain adaptation methods.

Method	1-2	1-3	1-5	Average
ResNet-50 (No adaptation)	69.89 ± 0.52	50.32 ± 0.36	56.74 ± 1.45	58.98
DDC	76.25 ± 0.43	58.64 ± 0.52	63.82 ± 1.48	66.24
DAN	81.37 ± 0.35	65.48 ± 0.44	71.25 ± 2.39	72.70
Deep CORAL	79.82 ± 0.41	63.15 ± 0.49	69.43 ± 1.42	70.80
DANN	83.46 ± 0.38	68.73 ± 0.56	74.52 ± 2.51	75.57
DSAN	86.53 ± 0.28	74.26 ± 0.37	80.18 ± 2.33	80.32
Proposed	92.10 ± 0.12	82.50 ± 0.11	88.57 ± 1.33	87.72

Bold values denote the best results among all compared methods

As shown in Table 5, the no-adaptation baseline yields the lowest accuracy across all tasks, with an average of only 58.98%, confirming the significant impact of domain shift on diagnostic performance. DDC achieves limited improvement due to its single-layer single-kernel design. DAN and Deep CORAL show moderate gains through MMD alignment and second-order statistics matching, respectively. DANN achieves competitive results through adversarial training, but exhibits the largest standard deviation among all methods. DSAN performs well by incorporating subdomain-level alignment, reaching an average accuracy of 80.32%. The proposed method consistently achieves the highest accuracy across all transfer tasks, with an average of 87.72%, outperforming the best baseline DSAN by 7.40 percentage points. This superiority can be attributed to the complementary effects of simultaneous global and local alignment, which will be further analyzed in the following ablation study.
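The averages and the reported margin can be checked directly from the per-task means in Table 5. The sketch below is only an arithmetic check on the tabulated values; the dictionary and variable names are illustrative.

```python
# Mean accuracies (%) on tasks 1-2, 1-3 and 1-5, copied from Table 5.
table5 = {
    "ResNet-50 (No adaptation)": (69.89, 50.32, 56.74),
    "DDC": (76.25, 58.64, 63.82),
    "DAN": (81.37, 65.48, 71.25),
    "Deep CORAL": (79.82, 63.15, 69.43),
    "DANN": (83.46, 68.73, 74.52),
    "DSAN": (86.53, 74.26, 80.18),
    "Proposed": (92.10, 82.50, 88.57),
}

# Average over the three transfer tasks, rounded to two decimals as in the table.
averages = {name: round(sum(acc) / len(acc), 2) for name, acc in table5.items()}

# Strongest baseline and its gap to the proposed method, in percentage points.
baselines = {name: avg for name, avg in averages.items() if name != "Proposed"}
best_baseline = max(baselines, key=baselines.get)
gap = round(averages["Proposed"] - baselines[best_baseline], 2)

for name, avg in averages.items():
    print(f"{name:28s}{avg:6.2f}")
print(f"best baseline: {best_baseline}, gap: {gap:.2f} percentage points")
```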

5.4.2. Ablation study on domain discrepancy metrics

To quantitatively evaluate the contribution of each discrepancy term in the proposed method, four configurations are compared under the same backbone and input settings: no adaptation, MMD-only, LMMD-only, and the proposed MMD + LMMD combination. The results are presented in Table 6.

Table 6.

Ablation study on domain discrepancy metrics.

Method	1-2	1-3	1-5	Average
No adaptation	69.89 ± 0.52	50.32 ± 0.36	56.74 ± 1.45	58.98
MMD-only	82.64 ± 0.31	66.73 ± 0.40	73.42 ± 0.36	74.26
LMMD-only	87.15 ± 0.24	75.48 ± 0.32	81.36 ± 0.29	81.33
MMD + LMMD (Proposed)	92.10 ± 0.12	82.50 ± 0.11	88.57 ± 1.33	87.72

Bold values denote the best results among all compared methods

Several observations can be drawn from Table 6. First, both MMD-only and LMMD-only substantially outperform the no-adaptation baseline, confirming the effectiveness of domain alignment. Second, LMMD-only achieves higher accuracy than MMD-only across all three tasks, with an average improvement of 7.07 percentage points, indicating that class-conditional alignment plays a more critical role than marginal distribution alignment in the cross-domain fault diagnosis of planetary gearboxes. Third, the proposed MMD + LMMD combination further improves the average accuracy to 87.72%, surpassing MMD-only and LMMD-only by 13.46 and 6.39 percentage points, respectively. This demonstrates that global alignment and local alignment address different aspects of domain shift: MMD reduces the overall marginal distribution discrepancy, while LMMD refines the alignment at the subdomain level. Their integration leads to more comprehensive domain confusion and thus yields the best diagnostic performance.
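To make the distinction between the two terms concrete, the following is a minimal NumPy sketch of a Gaussian-kernel MMD and a class-weighted LMMD in the style of DSAN (Zhu et al., 2021). It is an illustration under stated assumptions and not the implementation used in this study: the single fixed kernel bandwidth `sigma`, the equal weighting of the two terms, the toy features, and all function names are assumptions.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=2.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def mmd(src, tgt, sigma=2.0):
    """Biased estimate of the squared MMD (global, marginal alignment term)."""
    return (
        gaussian_kernel(src, src, sigma).mean()
        + gaussian_kernel(tgt, tgt, sigma).mean()
        - 2.0 * gaussian_kernel(src, tgt, sigma).mean()
    )


def class_weights(probs):
    """Normalize each class column so that the per-class sample weights sum to 1."""
    col_sum = probs.sum(axis=0, keepdims=True)
    return probs / np.where(col_sum > 0, col_sum, 1.0)


def lmmd(src, src_onehot, tgt, tgt_probs, sigma=2.0):
    """Class-weighted squared MMD averaged over classes (local, subdomain term)."""
    ws = class_weights(src_onehot)  # shape (n_s, C), from source labels
    wt = class_weights(tgt_probs)   # shape (n_t, C), from target class probabilities
    k_ss = gaussian_kernel(src, src, sigma)
    k_tt = gaussian_kernel(tgt, tgt, sigma)
    k_st = gaussian_kernel(src, tgt, sigma)
    per_class = (
        np.einsum("ic,ij,jc->c", ws, k_ss, ws)
        + np.einsum("ic,ij,jc->c", wt, k_tt, wt)
        - 2.0 * np.einsum("ic,ij,jc->c", ws, k_st, wt)
    )
    return per_class.mean()


# Toy features: the target is a shifted copy of the source (assumed data).
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(40, 8))
tgt = src + 0.5
onehot = np.eye(4)[np.arange(40) % 4]  # four balanced classes

global_term = mmd(src, tgt)
# In practice the target weights would come from softmax pseudo-labels;
# one-hot labels are reused here only to keep the example small.
local_term = lmmd(src, onehot, tgt, onehot)
total = global_term + local_term  # equal weighting is an assumption
print(global_term, local_term, total)
```

The global term compares the two domains as a whole, whereas the local term compares them class by class and then averages, which is why the two terms can respond to different aspects of the shift.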

5.4.3. Impact of feature extraction methods

To further investigate the impact of different feature extraction methods on planetary gearbox fault identification performance, a new set of transfer learning tasks is established using Dataset 1 as the source domain and Dataset 5 as the target domain. Multi-channel time-frequency images generated by SSST, images generated by ST, and the raw data are each employed as network input. The fault identification accuracy for each fault category under the different feature processing methods is illustrated in Figure 12. As can be observed from Figure 12, the proposed method maintains high and stable diagnostic accuracy across all fault categories, achieving an average accuracy of 88.57% with a standard deviation of 1.33%. In contrast, the method employing ST as the feature extraction approach yields a lower diagnostic accuracy of 75.43% ± 3.31%, while the method using raw data as input achieves only 61.38% ± 3.46%.

Figure 12.

Comparison of recognition accuracy under different feature extraction methods.

To intuitively demonstrate the superiority of the proposed method in fault classification, t-distributed stochastic neighbor embedding (t-SNE) is employed to project the output features of the network’s fully connected layer into a two-dimensional space for each transfer task. As shown in Figure 13, when no feature processing or domain adaptation is applied, the similarity between the source and target domains is low, and the inter-class separability within the target domain is poor.

Figure 13.

Comparison of t-SNE visualization under different feature extraction methods.

By incorporating ST with transfer learning, both the inter-domain similarity and inter-class separability are notably improved. Furthermore, the combination of SSST and the transfer learning network achieves even better performance, clearly distinguishing different fault types across both domains. It also minimizes the domain shift for the same fault class between the source and target domains, resulting in the largest inter-class distance and the smallest intra-class distance.

5.5. Validation on a public dataset

To further verify the generalizability of the proposed method, additional experiments are conducted on the publicly available SEU gearbox dataset. This dataset was collected from a drivetrain dynamic simulator under two rotational speeds (20 Hz and 30 Hz) and two load levels (0 V and 2 V), covering five gear health states. Following the same preprocessing pipeline described in Section 4.2, the raw vibration signals from three measurement directions are transformed into SSST-based three-channel fused images and used as network input. Two cross-domain transfer tasks with different speed and load combinations are constructed as summarized in Table 7.

Table 7.

Transfer tasks on the SEU gearbox dataset.

Transfer task	Source domain	Target domain
A	20 Hz, 0 V	30 Hz, 2 V
B	30 Hz, 2 V	20 Hz, 0 V

The proposed method is compared with three representative domain adaptation baselines under the same ResNet-50 backbone and input settings. The results are presented in Table 8. As shown in the table, DAN and DANN yield the lowest accuracy, with average values of 80.85% and 82.89%, respectively. DSAN improves the average accuracy to 90.60% by introducing subdomain-level alignment. The proposed method achieves the highest average accuracy of 97.20%, outperforming DSAN by 6.60 percentage points. This improvement can be attributed to the complementary advantages of multi-channel SSST fusion, which provides richer fault-related time-frequency features, and the global-local discrepancy metric, which achieves more comprehensive domain alignment.

Table 8.

Results on the SEU gearbox dataset.

Method	A	B	Average
DAN	82.40 ± 0.65	79.30 ± 0.84	80.85
DANN	84.15 ± 0.38	81.62 ± 0.76	82.89
DSAN	91.80 ± 0.48	89.40 ± 0.37	90.60
Proposed	97.80 ± 0.22	96.60 ± 0.16	97.20

Bold values denote the best results among all compared methods

The results on the SEU dataset are consistent with the findings on the self-built test rig presented in Sections 5.1-5.4, confirming that the proposed method generalizes well across different experimental platforms, fault types, and operating conditions.

6. Conclusions

This paper proposes a cross-domain fault diagnosis method for gearboxes based on multi-channel information fusion. The method transforms raw multi-channel vibration signals under different operating conditions into three-channel time-frequency representations using the SSST, and employs a global-local domain discrepancy measurement strategy to achieve cross-domain feature alignment and domain confusion. Finally, experimental validation is conducted on a planetary gearbox vibration test rig. The main conclusions are as follows:

(1) Multi-channel information fusion combined with a global-local domain discrepancy measurement strategy enables cross-domain fault diagnosis under various operating conditions, including constant and variable rotational speeds. The smaller the speed difference between the source and target domains, the higher the diagnostic accuracy.

(2) Compared with single-channel samples, the fused three-channel samples provide significantly higher diagnostic accuracy when used as input to the network, with an improvement rate of 12.3% over the mean single-channel accuracy.

(3) The use of SSST for processing time-varying vibration signals yields more stable and accurate diagnostic results compared to other methods such as the ST. When integrated with a deep transfer learning network, this approach can effectively reduce the adverse impact of working condition variations on diagnosis performance.

It should be noted that this study adopts a single network model for transfer learning-based fault diagnosis. The effects of different network architectures and transfer tasks under larger rotational speed differences have not been thoroughly investigated. Future work will further explore fault diagnosis under more complex operating conditions and varying fault severity levels.

Footnotes

Acknowledgment

The authors would like to thank the editor and referees for their valuable comments.

ORCID iD

Jiayang Liu

Author contributions

Xiaoping Ding: Methodology, formal analysis, and writing—original draft.

Jiawei Yuan: Software, formal analysis, and writing—original draft.

Long Zhang: Conceptualization, supervision, and writing—review and editing.

Xuetong Li: Formal analysis and writing—original draft.

Bing Zhou: Formal analysis.

Jiayang Liu: Supervision, methodology, formal analysis, and writing—review and editing.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Natural Sciences Foundation of Jiangxi Province (Grant No. 20252BAC20008), Open Topic of the Hunan Engineering Research Center of Precision Manufacturing Technology for Rotating Components of Railway Vehicles (Grant No. KFJJ2025101), National Natural Science Foundation of China (Grant No. 52565011), Early-Career Young Scientists and Technologists Project of Jiangxi Province (Grant No. 20244BCE52159), and Research Project of State Key Laboratory of Mechanical System and Vibration (Grant No. MSV202508).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

Alabsi, Pearlstein, Franco-Garcia (2024) Cross domain fault diagnosis based on generative adversarial networks. Journal of Vibration and Control 30(13–14): 3184–3194. https://doi.org/10.1177/10775463231191679

Azamfar, Singh, Bravo-Imaz, et al. (2020) Multisensor data fusion for gearbox fault diagnosis using 2-D convolutional neural network and motor current signature analysis. Mechanical Systems and Signal Processing 144: 106861. https://doi.org/10.1016/j.ymssp.2020.106861

Cao, Shao, Zhong, et al. (2022) Unsupervised domain-share CNN for machine fault transfer diagnosis from steady speeds to time-varying speeds. Journal of Manufacturing Systems 62: 186–198. https://doi.org/10.1016/j.jmsy.2021.11.016

Chen, Zheng (2023) Fault diagnosis of wind turbine based on multi-signal CNN-GRU model. Proceedings of the Institution of Mechanical Engineers, Part A: Journal of Power and Energy 237(5): 1113–1124. https://doi.org/10.1177/09576509231151482

Ganin, Ustinova, Ajakan, et al. (2016) Domain-adversarial training of neural networks. Journal of Machine Learning Research 17(59): 1–35.

Guo, Zhen, et al. (2023) Multi-sensor data fusion for rotating machinery fault detection using improved cyclic spectral covariance matrix and motor current signal analysis. Reliability Engineering & System Safety 230: 108969. https://doi.org/10.1016/j.ress.2022.108969

Han, Feng, Zhang, et al. (2024) Intelligent fault diagnosis of planetary gearbox across conditions based on subdomain distribution adversarial adaptation. Sensors 24(21): 21. https://doi.org/10.3390/s24217017

Zhang, Ren, et al. (2016) Deep Residual Learning for Image Recognition, 770–778. https://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html

Zhao, Sun, et al. (2021) Multireceptive field graph convolutional networks for machine fault diagnosis. IEEE Transactions on Industrial Electronics 68(12): 12739–12749. https://doi.org/10.1109/tie.2020.3040669

Liu, Yan, et al. (2023) Research on diesel engine fault status identification method based on synchro squeezing S-Transform and vision transformer. Sensors 23(14): 14. https://doi.org/10.3390/s23146447

Gao, et al. (2024) Fault transfer diagnosis of rolling bearings across different devices via multi-domain information fusion and multi-kernel maximum mean discrepancy. Applied Soft Computing 159: 111620. https://doi.org/10.1016/j.asoc.2024.111620

Liang, Wang, Jiang, et al. (2023) Unsupervised fault diagnosis of wind turbine bearing via a deep residual deformable convolution network based on subdomain adaptation under time-varying speeds. Engineering Applications of Artificial Intelligence 118: 105656. https://doi.org/10.1016/j.engappai.2022.105656

Liu, Xie, Zhang, et al. (2023) A multisensory time-frequency features fusion method for rotating machinery fault diagnosis under nonstationary case. Journal of Intelligent Manufacturing.

Liu, Zhang, Xie, et al. (2023) Incipient fault detection of planetary gearbox under steady and varying condition. Expert Systems with Applications 233: 121003. https://doi.org/10.1016/j.eswa.2023.121003

Liu, Wan, Xie, et al. (2024) Cross-machine deep subdomain adaptation network for wind turbines fault diagnosis. Mechanical Systems and Signal Processing 210: 111151. https://doi.org/10.1016/j.ymssp.2024.111151

Liu, Wang, et al. (2025) A novel multiscale adaptive graph adversarial network for mechanical fault diagnosis. Knowledge-Based Systems 309: 112787. https://doi.org/10.1016/j.knosys.2024.112787

Long, Cao, Wang, et al. (2015) Learning transferable features with deep adaptation networks. In: Proceedings of the 32nd International Conference on Machine Learning (ICML), pp. 97–105.

Mao, Jia, et al. (2022) Interactive dual adversarial neural network framework: an open-set domain adaptation intelligent fault diagnosis method of rotating machinery. Measurement 195: 111125. https://doi.org/10.1016/j.measurement.2022.111125

Meng, Kang, Chi, et al. (2020) Intelligent fault diagnosis of gearbox based on multiple synchrosqueezing S-Transform and convolutional neural networks. International Journal of Performability Engineering 16(4): 528. https://doi.org/10.23940/ijpe.20.04.p4.528536

Misbah, Lee CKM, Keung (2024) Fault diagnosis in rotating machines based on transfer learning: literature review. Knowledge-Based Systems 283: 111158. https://doi.org/10.1016/j.knosys.2023.111158

Peng, Wang, Liu, et al. (2020) Multibranch and Multiscale CNN for fault diagnosis of wheelset bearings under strong noise and variable load condition. IEEE Transactions on Industrial Informatics 16(7): 4949–4960. https://doi.org/10.1109/tii.2020.2967557

Peng, Shao, Yan, et al. (2025) A systematic review on interpretability research of intelligent fault diagnosis models. Measurement Science and Technology 36: 012009. https://doi.org/10.1088/1361-6501/ad99f4

Peng, Shao, Xiao, et al. (2026) Dual-stage interpretable domain generalization fault diagnosis: integrating prior knowledge and gradient-weighted class activation mapping. Engineering Applications of Artificial Intelligence 166: 113655. https://doi.org/10.1016/j.engappai.2025.113655

Qian, Qin, Luo, et al. (2023) Deep discriminative transfer learning network for cross-machine fault diagnosis. Mechanical Systems and Signal Processing 186: 109884. https://doi.org/10.1016/j.ymssp.2022.109884

Shao, Kim C-S (2024) Adaptive multi-scale attention convolution neural network for cross-domain fault diagnosis. Expert Systems with Applications 236: 121216. https://doi.org/10.1016/j.eswa.2023.121216

Stockwell, Mansinha, Lowe (1996) Localization of the complex spectrum: the S transform. IEEE Transactions on Signal Processing 44(4): 998–1001. https://doi.org/10.1109/78.492555

Sun, Saenko (2016) Deep CORAL: correlation alignment for deep domain adaptation. In: Hua, Jégou (eds) Computer Vision – ECCV 2016 Workshops. Springer International Publishing, pp. 443–450.

Tzeng, Hoffman, Zhang, et al. (2014) Deep domain confusion: maximizing for domain invariance. arXiv preprint arXiv:1412.3474.

Wang, Fang, Wang, et al. (2023) A novel time-frequency analysis method for fault diagnosis based on generalized S-transform and synchroextracting transform. Measurement Science and Technology 35(3): 036101. https://doi.org/10.1088/1361-6501/ad0e59

Wang, Jiang, et al. (2025) Fault diagnosis based on convolutional autoencoders combined with multivariate information fusion. Journal of Vibration and Control 1–15. Available at: https://doi.org/10.1177/10775463251353553

Zhang, Ding, et al. (2024) Multi-modal data cross-domain fusion network for gearbox fault diagnosis under variable operating conditions. Engineering Applications of Artificial Intelligence 133: 108236. https://doi.org/10.1016/j.engappai.2024.108236

Zheng, Wang, Yang, et al. (2019) Cross-Domain fault diagnosis using knowledge transfer strategy: a review. IEEE Access 7: 129260–129290. https://doi.org/10.1109/access.2019.2939876

Zheng, Wei, Liu, et al. (2020) Multi-synchrosqueezing S-transform for fault diagnosis in rolling bearings. Measurement Science and Technology 32(2): 025013. https://doi.org/10.1088/1361-6501/abb620

Zhu, Zhuang, Wang, et al. (2021) Deep subdomain adaptation network for image classification. IEEE Transactions on Neural Networks and Learning Systems 32(4): 1713–1722. https://doi.org/10.1109/TNNLS.2020.2988928

Zhu, Lei, et al. (2023) A review of the application of deep learning in intelligent fault diagnosis of rotating machinery. Measurement 206: 112346. https://doi.org/10.1016/j.measurement.2022.112346