Dynamics-assisted unsupervised domain adaptation method for rolling bearing fault diagnosis under noisy conditions

Abstract

Real-time and accurate fault diagnosis of rolling bearings is crucial for the safe operation of rotating machinery. Deep learning technology has gained widespread applications in fault diagnosis due to its capability in vast data analysis and complex nonlinear modeling. However, such data-driven approaches are highly dependent on both data quantity and quality. In practical applications, they suffer from significant performance degradation due to scarce labeled data and noisy measurements. To address these issues, this paper proposes a dynamics-assisted unsupervised domain adaptation method for rolling bearing fault diagnosis. First, a dynamic model of rolling bearings is established and a parameter identification method is developed to determine its critical physical parameters, which enable the generation of high-fidelity labeled fault simulation data. Then, an adversarial unsupervised domain adaptation framework is constructed to mitigate the distribution discrepancy between simulated and measured data. Meanwhile, a deep learning model incorporating a multi-scale mode denoising network and an inverse-embedding cosine similarity attention mechanism is proposed to extract domain-invariant fault features by capturing multi-scale modal characteristics, enhancing fault-related features, and suppressing noise. The effectiveness of the proposed method is validated on two public datasets under various target-domain data availability conditions, achieving average accuracies of 81.09% and 85.00%, respectively, and outperforming the best-performing method among other advanced methods by 2.27% and 3.83%. Under −5 dB noise, the improvements further increase to 13.12% and 11.65%, respectively.

Keywords

Rolling bearing; fault diagnosis; domain adaptation; dynamic model; noise robustness

1. Introduction

Rolling bearings are among the most critical components in mechanical systems, as their health directly affects the safe operation of the entire system. However, operating in complex and dynamic environments with high noise levels, extreme temperature variations, and significant pressure fluctuations (Zhao and Guo, 2024), they are prone to failure. This highlights the critical importance of accurate and reliable fault diagnosis for maintaining safety and efficiency (Hoang and Kang, 2019).

Vibration signals reflect bearing fault characteristics, but their nonlinearity and non-stationarity complicate the direct extraction of fault information. Time-domain or frequency-domain methods are limited in capturing both temporal and spectral characteristics. Variational Mode Decomposition (VMD), as a time-frequency signal processing method, provides a robust approach by adaptively decomposing signals into intrinsic modes (Cui et al., 2021; Ni et al., 2022; Song et al., 2023).

In recent years, deep learning methods have been widely applied to fault classification following signal processing (Chen et al., 2023). However, many models require sufficient labeled samples, which are often scarce in engineering due to cost and safety constraints. Deep forest methods leverage multi-grained scanning and feature selection to improve feature representation and robustness, which enables relatively good performance with small labeled datasets (Shao et al., 2025). Data augmentation techniques provide a more effective solution by generating additional fault samples to supplement the limited labeled data (Hei et al., 2025). Numerical-based data augmentation (Chawla et al., 2002; Wang et al., 2023a) processes existing fault data through interpolation, transformation, and reconstruction to expand datasets but may fail to preserve the underlying distribution, leading to overfitting on minority samples (Zhang et al., 2020). Deep learning-based data augmentation (Akhenia et al., 2022; Shah et al., 2026; Zhang et al., 2024) creates diverse labeled synthetic fault samples using models such as Generative Adversarial Networks and Variational Autoencoders but often suffers from mode collapse and lacks high-frequency details (Miao et al., 2022). Physics-based data augmentation methods (Qin et al., 2024; Shi et al., 2024; Wang et al., 2023b) simulate bearing faults via dynamic models, generating high-fidelity labeled vibration signals and reducing reliance on measured data. Among them, the two-degree-of-freedom dynamic model effectively represents bearing dynamics and fault features while allowing efficient parameter tuning (Qin et al., 2022).

The variation in operating conditions and the intrinsic complexity of bearings cause a distribution mismatch between simulated and measured data, thereby degrading diagnostic performance (Xiao et al., 2022). Unsupervised domain adaptation methods offer an effective means to mitigate this issue. Maximum Mean Discrepancy (MMD) minimizes feature distance across domains (Li et al., 2019), while correlation alignment loss further improves alignment (Li et al., 2020b). Domain-Adversarial Neural Networks (DANN) incorporate a domain discriminator to handle complex, nonlinear shifts. Attention mechanisms have been integrated with DANN to enhance feature learning quality (Wu et al., 2022), while MMD has been applied to adjust feature distributions across domains and improve diagnostic performance (Wan et al., 2022). Discriminative Adversarial Category Discrepancy (DACD) builds upon the DANN by incorporating category information, with the aim of further reducing category-level discrepancy between domains. Huo et al. (2023) introduced a class-level transfer learning network incorporating class information to align source and target distributions of identical fault categories. These domain adaptation methods aim to enable diagnostic models to learn domain-invariant feature representations. However, they face limitations in complex and variable noise environments, hindering accurate fault feature extraction.

To address the aforementioned issues, this paper proposes a dynamics-assisted unsupervised domain adaptation method for rolling bearing fault diagnosis under noisy conditions. The innovations and contributions of the proposed method are summarized as follows:

(1) A dynamic modeling method with parameter identification is developed to enable physics-constrained data generation for rolling bearings. This produces high-fidelity simulated data covering various fault types and effectively mitigates the scarcity of labeled training data in bearing fault diagnosis.

(2) A dynamics-assisted domain adaptation framework is proposed to achieve unsupervised bearing fault diagnosis. The framework integrates a domain-adversarial strategy and leverages high-fidelity simulated data to bridge the simulation-to-reality gap, preserving fault-relevant features and effectively handling nonlinear distribution discrepancies.

(3) A deep learning model incorporating a multi-scale mode denoising network and an inverse-embedding cosine similarity attention mechanism is designed. The proposed model combines adaptive noise suppression with fault-related feature enhancement, improving feature extraction robustness and diagnostic performance under strong noise.

The rest of this article is structured as follows: Section 2 establishes the dynamic model of rolling bearings and develops a parameter identification method. An unsupervised domain-adaptive fault diagnosis method is presented in Section 3. Section 4 discusses the experimental verification conducted on two public datasets. Several concluding remarks are drawn in Section 5.

2. Dynamics-assisted data generation

In this section, a dynamic model of rolling bearings is established to generate simulated signals. A parameter identification method is developed to identify the key physical parameters of the model.

2.1. Construction of dynamic model

To characterize the dynamic response of the bearing system, a two-degree-of-freedom dynamic model is developed, as shown in Figure 1. The corresponding differential equations are described as (Qin et al., 2020)

\begin{cases} m \ddot{x} + c \dot{x} + k \sum_{i = 1}^{N_{r}} \mu_{i} \delta_{i}^{n} \cos \theta_{i} = F_{x} \\ m \ddot{y} + c \dot{y} + k \sum_{i = 1}^{N_{r}} \mu_{i} \delta_{i}^{n} \sin \theta_{i} = F_{y} \end{cases}

(1)

where m represents the mass of the rolling bearing, c is the damping coefficient, k is the contact stiffness, $N_{r}$ is the number of rollers, $\mu_{i}$ is the contact judgment coefficient of the ith roller, $\delta_{i}$ is the contact deformation of the ith roller, n is the load-deformation coefficient, $\theta_{i}$ is the angle of the ith roller, and $F_{x}$ and $F_{y}$ are the bearing load components.

Figure 1.

Dynamic model and defect schematic of rolling bearing.

When the contact deformation is greater than zero, an elastic force is generated at the contact point. The contact deformation of the ith roller can be expressed as

\delta_{i} = x \cos \theta_{i} + y \sin \theta_{i} - \gamma - H_{d}

(2)

where $\gamma$ is the radial clearance and $H_{d}$ is the displacement excitation. The displacement excitation of the fault can be defined as

H_{d} = \begin{cases} H_{\max} \cos \left( \frac{\pi (\theta_{j} - \theta_{id})}{2 (\theta_{od} - \theta_{id})} \right) & \frac{\theta_{id}}{2} \leq \theta_{j} \leq \frac{\theta_{od}}{2} \\ H_{\max} & \theta_{j} \leq \frac{\theta_{id}}{2} \\ 0 & \text{otherwise} \end{cases}

(3)

where $\theta_{j}$ represents the relative position angle of the fault point (when $\theta_{j}$ is zero, the defect is directly aligned with the contact surface), $H_{\max}$ is the maximum fault displacement excitation, $\theta_{od}$ is the angle at which the roller enters and exits the fault, and $\theta_{id}$ is the angle at which the roller contacts and leaves the bottom of the fault. For roller faults, $\theta_{id}$ is zero. The maximum fault displacement excitation can be calculated as

H_{\max} = \min \left( D_{d}, \frac{D_{r}}{2} - \sqrt{\left( \frac{D_{r}}{2} \right)^{2} - \left( \frac{L_{d}}{2} \right)^{2}} \right)

(4)

where $D_{r}$ represents the roller diameter, $L_{d}$ is the fault defect size, and $D_{d}$ is the defect depth. The relative position angle of the fault point can be defined as

\theta_{j} = \begin{cases} \left| \operatorname{mod} (\theta_{i} - \theta_{p} + \pi, 2 \pi) - \pi \right| & \text{if outer race fault} \\ \left| \operatorname{mod} (\theta_{i} - \theta_{r} - \theta_{p} + \pi, 2 \pi) - \pi \right| & \text{if inner race fault} \\ \left| \operatorname{mod} (\theta_{e} - \theta_{p} - \theta_{i} + \frac{\pi}{2}, \pi) - \frac{\pi}{2} \right| & \text{if roller fault} \end{cases}

(5)

where $\theta_{p}$ is the initial defect position angle, $\theta_{r}$ is the bearing rotation angle, and $\theta_{e}$ is the roller defect position angle. During the solution process of the dynamic equations, the outer race defect position angle remains constant at the initial defect position angle, while the inner race defect position angle is consistent with the bearing rotation angle. The rolling element defect position angle varies dynamically with time and can be expressed as

\theta_{e} = \frac{D \omega_{r} t}{2 d} \left( 1 - \left( \frac{d \cos \theta_{t}}{D} \right)^{2} \right)

(6)

where $\theta_{t}$ is the contact angle, and $\omega_{r}$ is the rotational speed of the bearing.
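As an illustration of equations (1), (2), and (4), the following minimal Python sketch evaluates the maximum fault displacement excitation and the summed contact forces for a given bearing displacement. It is a sketch under stated assumptions, not the authors' implementation: the function names, the defect width `L_D`, the load-deformation exponent `N_EXP = 1.5`, and the evenly spaced roller angles are all hypothetical choices made for the example; the remaining values are taken from Table 1 and Section 2.3.

```python
import numpy as np

def h_max(d_r, l_d, d_d):
    """Maximum fault displacement excitation, equation (4). Lengths in metres."""
    sag = d_r / 2 - np.sqrt((d_r / 2) ** 2 - (l_d / 2) ** 2)
    return min(d_d, sag)

def contact_deformation(x, y, theta, gamma, h_d=0.0):
    """Contact deformation of every roller, equation (2)."""
    return x * np.cos(theta) + y * np.sin(theta) - gamma - h_d

def restoring_force(x, y, theta, k, n, gamma, h_d=0.0):
    """Summed contact force components appearing in equation (1)."""
    delta = contact_deformation(x, y, theta, gamma, h_d)
    # mu_i is 1 only when delta_i > 0; clipping at zero has the same effect
    force = k * np.maximum(delta, 0.0) ** n
    return np.sum(force * np.cos(theta)), np.sum(force * np.sin(theta))

D_R = 7.94e-3     # roller diameter (Table 1)
GAMMA = 3e-6      # radial clearance (Table 1)
N_R = 9           # number of rollers (Table 1)
D_D = 0.05e-3     # defect depth (Section 2.3)
K = 1.27e9        # identified stiffness (Section 2.3)
L_D = 0.5334e-3   # defect width: assumed for illustration only
N_EXP = 1.5       # load-deformation exponent: assumed (Hertzian point contact)

# Assumed roller layout: evenly spaced around the race at t = 0
theta = 2 * np.pi * np.arange(N_R) / N_R

excitation = h_max(D_R, L_D, D_D)
fx, fy = restoring_force(0.0, 20e-6, theta, K, N_EXP, GAMMA)
```

With these assumed values the geometric term in equation (4) is smaller than the defect depth, so it determines $H_{\max}$. The two force sums are the terms that a Runge–Kutta solver would evaluate at every time step.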

2.2. Identification of dynamic model parameter

Stiffness k and damping coefficient c are critical parameters that directly influence the quality of signal generation. These parameters are typically set empirically, which may result in simulated signals that cannot fully reflect the dynamic characteristics of the bearing. Such simulation-stage errors can propagate to downstream adaptation, affect domain alignment, and increase the risk of negative transfer. To mitigate this issue, a parameter identification method is developed to estimate these parameters more accurately and reduce simulation errors before adaptation.

In this study, the parameters are updated using measured data, transforming the parameter identification task into the following optimization problem

\begin{array}{l} \min_{k, c} E_{freq} = \sqrt{\frac{2}{N_{f}} \sum_{i = 1}^{N_{f} / 2} \left( \left[ \mathrm{Re} (S_{i} (k, c) - M_{i}) \right]^{2} + \left[ \mathrm{Im} (S_{i} (k, c) - M_{i}) \right]^{2} \right)} \\ \text{s.t.} \quad 10^{8} < k < 10^{10}, \quad 10^{2} < c < 10^{3} \end{array}

(7)

where $N_{f}$ is the number of frequency points, and $S_{i}$ and $M_{i}$ are the frequency spectra of the simulated and measured signals at the ith frequency point, respectively.

Since the frequency spectrum of the vibration signal reflects the dynamic characteristics of the system, the objective function $E_{freq}$ is defined as the Root Mean Square Error (RMSE) between the simulated and measured spectra. The objective function is minimized to identify the critical parameters. Because the value ranges of k and c differ by several orders of magnitude, both parameters are normalized to enhance optimization stability.

The parameters exhibit a highly nonlinear influence on the objective function, which may lead to multiple local optima during optimization. Differences in structural characteristics and material properties among rolling bearing types result in notable variations in the critical parameters. The Zebra Optimization Algorithm (ZOA) (Trojovska et al., 2022), with an escape mechanism, is effective in avoiding entrapment in local optima and enhancing global search capability. Therefore, ZOA is employed to solve the optimization problem and achieve accurate parameter identification. The parameter identification process is shown in Algorithm 1.
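The objective in equation (7) can be sketched in Python as follows. This is only an illustration of the spectral error and of one possible normalization; it does not reproduce ZOA or Algorithm 1. The linear min-max mapping in `denormalize`, the function names, and the single-mode `toy_simulate` stand-in for the bearing model are assumptions introduced for the example.

```python
import numpy as np

K_RANGE = (1e8, 1e10)   # stiffness bounds from equation (7)
C_RANGE = (1e2, 1e3)    # damping bounds from equation (7)

def denormalize(u, bounds):
    """Map a normalized value u in [0, 1] back to its physical range (assumed linear)."""
    lo, hi = bounds
    return lo + u * (hi - lo)

def e_freq(simulated, measured):
    """Spectral RMSE over the first N_f / 2 frequency points, equation (7)."""
    n_f = len(measured)
    diff = np.fft.fft(simulated)[: n_f // 2] - np.fft.fft(measured)[: n_f // 2]
    return np.sqrt(2.0 / n_f * np.sum(diff.real ** 2 + diff.imag ** 2))

def objective(u, simulate, measured):
    """Objective seen by the optimizer: u holds normalized (k, c)."""
    k = denormalize(u[0], K_RANGE)
    c = denormalize(u[1], C_RANGE)
    return e_freq(simulate(k, c), measured)

def toy_simulate(k, c, m=0.13, n=2048, dt=1e-5):
    """Hypothetical stand-in for the dynamic model: one damped oscillation."""
    t = np.arange(n) * dt
    return np.exp(-c / (2 * m) * t) * np.cos(np.sqrt(k / m) * t)

# Pretend the "measured" signal came from known normalized parameters
measured = toy_simulate(denormalize(0.2, K_RANGE), denormalize(0.5, C_RANGE))
error_at_truth = objective(np.array([0.2, 0.5]), toy_simulate, measured)
error_elsewhere = objective(np.array([0.8, 0.5]), toy_simulate, measured)
```

Any bounded global optimizer can then search the unit square for the minimizer of `objective`. Working in normalized coordinates keeps the two search directions on a comparable scale.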

2.3. Validation of dynamic model

The effectiveness of the dynamic model is validated using the Case Western Reserve University (CWRU) bearing dataset (Case Western Reserve University, 2011), where the critical parameters are identified through the proposed parameter identification method. The bearing employed in the validation is the 6205-2RS JEM SKF model, and its structural parameters are listed in Table 1.

Table 1.

Parameters of 6205-2RS JEM SKF.

Parameter	Value	Parameter	Value
Bearing mass (kg)	0.13	Roller diameter (mm)	7.94
Inner ring diameter (mm)	25	Number of rollers	9
Outer ring diameter (mm)	52	Contact angle (°)	0
Pitch diameter (mm)	39.04	Clearance (mm)	3 × 10⁻³

Additionally, other settings for the dynamic model are as follows: the bearing load is 0 N in the x-axis direction and 12000 N in the y-axis direction, the bearing rotational speed is 1797 r/min, and the defect depth is 0.05 mm. Parameter identification yields k = 1.27 × 10⁹ N/m and c = 139. The nonlinear dynamic equation is solved using the Runge–Kutta method to obtain the vibration response. The initial state is set as $x (0) = 1\ \mu m$, $\dot{x} (0) = 0$, $y (0) = 1\ \mu m$, and $\dot{y} (0) = 0$, and the time step size is set as $\Delta t = 1 \times 10^{- 5}\ s$.

The visual comparisons of time-domain waveforms and envelope spectra between the measured and simulated signals are provided in Figure 2. The time-domain waveform comparisons for three fault types show that the measured and simulated signals exhibit consistent periodic characteristics. The corresponding envelope spectra obtained through Hilbert Transform demodulation clearly reveal the fault characteristic frequencies. At a rotational speed of 1797 r/min (29.95 Hz), the theoretical fault frequencies (Zhang et al., 2023) for the outer race, inner race, and roller of the simulated signal are 107.36 Hz, 162.19 Hz, and 141.17 Hz, respectively. The prominent frequency components of the simulated signal are consistent with both the measured signal and these theoretical values, validating the fidelity of the simulated signal.
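The theoretical values quoted above can be checked with the standard kinematic formulas for bearing fault characteristic frequencies. The sketch below assumes these textbook formulas are the ones behind the cited values, and that the roller fault frequency is taken as twice the ball spin frequency because the defect strikes both races in each roller revolution; the function name is hypothetical. The geometry comes from Table 1.

```python
import math

def fault_frequencies(n_rollers, d_roller, d_pitch, rpm, contact_angle_deg=0.0):
    """Fault characteristic frequencies in Hz from bearing geometry and shaft speed."""
    f_r = rpm / 60.0  # shaft rotation frequency
    ratio = d_roller / d_pitch * math.cos(math.radians(contact_angle_deg))
    bpfo = n_rollers / 2 * f_r * (1 - ratio)               # outer race
    bpfi = n_rollers / 2 * f_r * (1 + ratio)               # inner race
    bsf = d_pitch / (2 * d_roller) * f_r * (1 - ratio ** 2)  # ball spin
    return {"outer": bpfo, "inner": bpfi, "roller": 2 * bsf}

# 6205-2RS JEM SKF geometry from Table 1 (diameters in mm), 1797 r/min
freqs = fault_frequencies(n_rollers=9, d_roller=7.94, d_pitch=39.04, rpm=1797)
for name, value in freqs.items():
    print(f"{name}: {value:.2f} Hz")
```

Under these assumptions the computed values agree with the 107.36 Hz, 162.19 Hz, and 141.17 Hz reported in the text to two decimal places.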

Figure 2.

Waveforms and envelope spectra of measured and simulated signals for different fault types.

To further illustrate the effectiveness of the proposed parameter identification method, simulation signals are generated with the identified parameters and the empirical parameters (k = 1.3 × 10¹⁰ N/m, c = 300) (Li et al., 2020a), followed by a comparison between the two signals. The discrepancy between the signals is typically assessed using the Percentage Root Mean Square Difference (PRD), RMSE, and MMD. These metrics quantify the difference in amplitude, overall fit, and distribution between the simulated and measured signals, respectively. The three metrics are defined as

PRD = \sqrt{\frac{\sum_{i = 1}^{N_{d}} (x_{i} - \hat{x}_{i})^{2}}{\sum_{i = 1}^{N_{d}} x_{i}^{2}}} \times 100\%

(8)

where $N_{d}$ is the data length, and $x_{i}$ and $\hat{x}_{i}$ are the ith points of the measured data and simulated data, respectively.

RMSE = \sqrt{\frac{\sum_{i = 1}^{N_{d}} (x_{i} - \hat{x}_{i})^{2}}{N_{d}}}

(9)

MMD [F, P_{D}, Q_{D}] = \sup_{f \in F} \left( E_{x_{D} \sim P_{D}} [f (x_{D})] - E_{\hat{x}_{D} \sim Q_{D}} [f (\hat{x}_{D})] \right)

(10)

where F is the reproducing kernel Hilbert space generated by a positive definite kernel function, f is a function in F, $P_{D}$ and $Q_{D}$ are the source and target distributions, and $x_{D}$ and $\hat{x}_{D}$ are the samples extracted from $P_{D}$ and $Q_{D}$, respectively.
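For reference, the three metrics in equations (8) to (10) can be computed with the short Python sketch below. PRD and RMSE follow the definitions directly. For MMD, the paper does not state the kernel or the estimator, so the Gaussian kernel, the bandwidth `sigma`, and the biased empirical estimator used here are assumptions, as are the function names.

```python
import numpy as np

def prd(x, x_hat):
    """Percentage root mean square difference, equation (8), in percent."""
    return 100.0 * np.sqrt(np.sum((x - x_hat) ** 2) / np.sum(x ** 2))

def rmse(x, x_hat):
    """Root mean square error, equation (9)."""
    return np.sqrt(np.mean((x - x_hat) ** 2))

def mmd_rbf(x, y, sigma=1.0):
    """Biased empirical MMD with a Gaussian kernel (kernel choice assumed)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    def gram(a, b):
        # Pairwise squared distances, clipped to guard against round-off
        d2 = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2 * a @ b.T
        return np.exp(-np.maximum(d2, 0.0) / (2 * sigma ** 2))

    mmd2 = gram(x, x).mean() + gram(y, y).mean() - 2 * gram(x, y).mean()
    return np.sqrt(max(mmd2, 0.0))

# Small illustrative signals (not data from the paper)
measured = np.array([1.0, 2.0, 3.0])
simulated = np.array([1.0, 2.0, 5.0])
print(prd(measured, simulated), rmse(measured, simulated), mmd_rbf(measured, simulated))
```

PRD is normalized by the energy of the measured signal, whereas RMSE keeps the units of the signal, which is why the two columns in a comparison table can differ by orders of magnitude.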

The simulated signals generated using the identified and empirical parameters are evaluated in both the time and frequency domains to quantify the simulation-to-reality gap, as presented in Table 2. The results show that the signals generated with identified parameters achieve lower PRD, RMSE, and MMD values than those generated with empirical parameters, indicating better agreement with the measured signals in terms of amplitude, overall fitting, and distribution. In addition, a parameter perturbation check indicates that parameter variations can affect the simulated signal characteristics, further supporting the necessity of parameter identification. This demonstrates that the proposed parameter identification method improves simulation fidelity. Based on this method, labeled simulated signals can be generated for different fault types to support deep learning. Since the measured healthy data is readily available, it is used to supplement the healthy type in the simulated data.

Table 2.

Comparison of generated signals with different parameters.

Domain	Parameter type	PRD	RMSE	MMD
Time	Identified parameters	14.8712	0.5731	0.0014
Time	Empirical parameters	18.0585	0.7983	0.0024
Frequency	Identified parameters	9.4658	0.0114	0.1324
Frequency	Empirical parameters	13.2937	0.0157	0.2658

3. Unsupervised domain-adaptive fault diagnosis

The bearing dynamic model, obtained via parameter identification to reduce simulation errors, provides simulated signals with rich label information for deep learning. However, variations in operating conditions and the inherent complexity of bearings lead to a simulation-to-reality gap, which reduces the accuracy when simulated signals are directly applied to fault diagnosis. To address this issue, this section proposes an unsupervised domain adaptation method, as illustrated in Figure 3. The proposed method addresses the domain discrepancy between simulated and measured data using a deep learning model with a dual-discriminator adversarial architecture. The model is trained jointly under a domain-adversarial strategy and employs a domain discriminator to align feature vectors between domains. A feature extraction network is designed using a multi-scale mode denoising network and an inverse-embedding cosine similarity attention mechanism to denoise and enhance domain-invariant features. This approach implements an unsupervised domain adaptation method for rolling bearing fault diagnosis under noisy conditions.

Figure 3.

The overview of the proposed method.

3.1. Decomposition of vibration signal modes

Bearing vibration signals, characterized by periodic impacts and strong noise, can be decomposed by VMD into components with distinct frequency characteristics (Kumar et al., 2022). This approach helps enhance impact-related features while suppressing noise components, effectively mitigating mode mixing and endpoint effects. It improves adaptability and noise robustness in fault feature extraction, while also reducing the risk of overfitting.

The VMD method adaptively decomposes the signal into a set of Intrinsic Mode Functions (IMFs), each representing a multi-scale mode with a distinct bandwidth and center frequency. These IMFs reveal the local features of fault signals at different frequency scales. This facilitates fault feature extraction and noise mode separation, provides deep learning models with features of clearer physical interpretability, and improves the separability and transferability of learned representations.

3.2. Denoising of multi-scale modes

During the fault diagnosis process, noise interference can blur the key fault features in vibration signals, potentially causing negative transfer and degrading diagnostic accuracy. Classic domain adaptation methods, however, are less effective in suppressing noise interference. To address this, a multi-scale mode denoising network is proposed to capture the local features of multi-scale modes and enhance the model’s noise robustness. The network consists of a compact multi-branch convolutional layer, a cross-branch fusion unit, and a channel-wise dynamic denoising gating mechanism, as shown in Figure 4.

Figure 4.

Process of multi-scale mode denoising.

The compact multi-branch convolutional layer utilizes small-sized convolution kernels to filter out high-frequency noise while capturing the distribution characteristics and structural relationships of vibration signals. The cross-branch fusion unit enhances this capability by using depthwise convolutions and group convolutions to explore the local features of each mode under different receptive fields. This improves the model’s ability to extract spatial structural features, as shown in the following equation

\begin{cases} X_{i} = \mathrm{ReLU} (\mathrm{BN} (\mathrm{Conv1D}_{1 \times d_{i}} (\mathrm{IMFs}))), & \mathrm{IMFs} \in \mathbb{R}^{L_{m} \times K} \\ \mathrm{SFMs} = \mathrm{GDWConv1D}_{1 \times d_{f}} (\mathrm{Dropout} ([X_{1}, X_{2}, \dots, X_{N_{b}}])) \end{cases}

(11)

where SFMs denotes the collection of multi-scale mode spatial mappings, $X_{i}$ represents the collection of spatial modes corresponding to the ith branch, $N_{b}$ is the number of branches, $L_{m}$ is the multi-scale mode length, K is the number of multi-scale modes, $d_{i}$ is the size of the convolution kernel of the ith branch, and $d_{f}$ is the size of the cross-branch fusion convolution kernel. The compact multi-branch convolutional layers and cross-branch fusion unit can extract subtle spatial features and short-term dependencies across multi-scale modes, enhancing the feature representation ability.

A channel-wise dynamic denoising gating mechanism is designed, introducing a learnable scaling factor to adjust the frequency amplitude weights of each channel in the multi-scale mode spatial mappings. This helps suppress noise and enhances the feature channels strongly correlated with fault modes. The weight adjustment range of the adaptive scaling factor is constrained through weight pruning to avoid overfitting. The constrained scaling factor is then element-wise multiplied with the multi-scale mode spatial mappings to achieve optimized channel selection. The denoising process can be formulated as

\begin{cases} \beta (w) = w_{\min} + (w_{\max} - w_{\min}) \, \mathrm{Sigmoid} (w) \\ \mathrm{SFMs}' = \mathrm{IFFT} (\beta (w) \cdot \mathrm{FFT} (\mathrm{SFMs})), & \mathrm{SFMs} \in \mathbb{R}^{L_{m} \times N_{m}} \end{cases}

(12)

where $\beta (w)$ is the scaling factor adjusted by the weight parameter w, $w_{\min}$ and $w_{\max}$ are the lower and upper bounds of the weight range, $\mathrm{SFMs}'$ denotes the collection of multi-scale mode spatial mappings after channel weight adjustment, and $N_{m}$ is the number of spatial mappings. The channel-wise dynamic denoising gating mechanism adaptively adjusts the channel weights of the spatial mappings. This enables optimized selection and effective noise suppression of the multi-scale mode spatial mappings.

The compact multi-branch convolutional layer captures multi-scale mode distributions, followed by cross-branch fusion to integrate features and enhance sensitivity to subtle structural patterns. A channel-wise dynamic denoising gate is then applied to refine and suppress noise in fault-related frequency-domain features, thereby implicitly enabling background noise reduction and preliminary extraction of multi-scale mode features. This process can also help mitigate negative transfer to some extent.
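The gating step in equation (12) can be illustrated with a small NumPy sketch. It only shows the forward computation; in the proposed network w would be a trainable parameter updated by backpropagation. The function names, the bounds `w_min = 0` and `w_max = 1`, and the choice of one weight per channel are assumptions made for the example.

```python
import numpy as np

def sigmoid(w):
    return 1.0 / (1.0 + np.exp(-w))

def denoising_gate(sfms, w, w_min=0.0, w_max=1.0):
    """Apply the bounded scaling factor of equation (12) in the frequency domain.

    sfms: array of shape (L_m, N_m), one spatial mapping per column.
    w:    weights of shape (N_m,) for one factor per channel; an array of
          shape (L_m // 2 + 1, N_m) also broadcasts, giving one factor per
          frequency bin and channel.
    """
    beta = w_min + (w_max - w_min) * sigmoid(w)    # always inside (w_min, w_max)
    spectrum = np.fft.rfft(sfms, axis=0)           # FFT along the time axis
    return np.fft.irfft(beta * spectrum, n=sfms.shape[0], axis=0)

rng = np.random.default_rng(0)
sfms = rng.standard_normal((64, 4))   # toy mappings: L_m = 64, N_m = 4
w = np.zeros(4)                       # sigmoid(0) = 0.5 for every channel
gated = denoising_gate(sfms, w)
```

With one weight per channel the gate reduces to rescaling each channel, since a constant factor commutes with the Fourier transform. Frequency-selective suppression requires weights that vary across frequency bins.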

3.3. Correlation fusion of mode spatial mappings

To enhance fault feature extraction and achieve the fusion of correlated mode spatial mappings, a novel inverse-embedding cosine similarity attention mechanism is proposed. This mechanism emphasizes spatial mappings relevant to fault types while suppressing irrelevant features. It enhances the relationship modeling between mappings and adaptively adjusts category weights to alleviate class-wise attention imbalance. This approach helps avoid feature pattern overlap among fault types and further enhances noise suppression. The feature fusion process depicted in Figure 5 integrates an inverse embedding layer, a cosine similarity-based attention mechanism, and a convolutional feed-forward extractor, which can be represented as

\begin{cases} h_{i}^{0} = \mathrm{iEmbedding} (\mathrm{sfm}'_{:, i}), & H = \{h_{1}, h_{2}, \dots, h_{N_{m}}\}^{T} \\ H^{j + 1} = \mathrm{SimAM} (H^{j}), & j = 0, 1, \dots, N_{a} - 1 \\ z = \mathrm{ConvFFE} (H^{N_{a}}) \end{cases}

(13)

where $\mathrm{sfm}'_{:, i} \in \mathrm{SFMs}'$ is the sequence of the ith multi-scale mode spatial mapping, $h_{i}^{0}$ is the corresponding spatial mapping embedding vector, $N_{a}$ is the number of similarity-based attention mechanism layers, and z is the domain-invariant feature vector.

Figure 5.

Process of spatial mapping correlation fusion.

The unified embedding for each multi-scale mode spatial mapping fails to preserve the distinct scale information (Liu et al., 2023). Therefore, the inverse embedding layer is employed to independently process the time series associated with each spatial mapping. It then employs a multi-layer perceptron to capture the global information of each scale mode. The inverse embedding process is given by

H = \mathrm{MLP} \left( {\mathrm{SFMs}'}^{T} \right), \quad \mathrm{SFMs}' \in \mathbb{R}^{L_{m} \times N_{m}}

(14)

where H is the embedding matrix. The inverse embedding layer expands the model’s local receptive field, which is beneficial for capturing the correlations among multiple spatial mappings.

The cosine similarity-based self-attention mechanism calculates attention weights by evaluating the cosine similarity between mappings, enabling more effective handling of embeddings that preserve distinct scale information. Focusing on the directional similarity between independently processed embeddings enhances cross-scale fusion and improves robustness to scale variations. Compared with traditional attention mechanisms, this approach strengthens the capability of relationship modeling among mappings, facilitating efficient feature fusion and noise suppression. By assigning higher weights to fault-related features, it increases sensitivity to fault characteristics. The query (Q), key (K), and value (V) can be derived by

\begin{cases} Q = H^{j} W^{Q} \\ K = H^{j} W^{K} \\ V = H^{j} W^{V} \end{cases}, \quad H^{j} \in \mathbb{R}^{N_{m} \times L_{e}}

(15)

where $W^{Q}$, $W^{K}$, and $W^{V}$ are learnable weight matrices, and $L_{e}$ is the embedding dimension. The cosine similarity-based self-attention mechanism captures the correlations among embedded spatial mappings, and the normalized attention score matrix A is calculated as

A = \mathrm{Softmax} \left( \frac{Q K^{T}}{\| Q \| \| K \|} \right) V

(16)

The multi-scale mode spatial mapping matrix is multiplied by the normalized attention score matrix for global feature fusion as

H^{j + 1} = A H^{j}

(17)

(17)

The cosine similarity-based self-attention mechanism improves the model’s sensitivity to fault features by dynamically adjusting the attention weights. It assigns greater importance to key fault-related features based on their similarity, further achieving noise suppression.
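A minimal NumPy sketch of cosine similarity-based attention is given below. It follows one reading of equations (15) and (16), in which each query and key row is scaled to unit length before the dot product, so every score is a cosine similarity in [-1, 1]. The row-wise normalization, the random weight matrices, and the function names are assumptions for illustration and are not taken from the authors' implementation.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def cosine_attention(h, w_q, w_k, w_v, eps=1e-12):
    """Attention whose scores are cosine similarities between queries and keys.

    h: embeddings of shape (N_m, L_e), one row per spatial mapping.
    Returns the fused features (N_m, L_e) and the attention weights (N_m, N_m).
    """
    q, k, v = h @ w_q, h @ w_k, h @ w_v                      # equation (15)
    q_unit = q / (np.linalg.norm(q, axis=1, keepdims=True) + eps)
    k_unit = k / (np.linalg.norm(k, axis=1, keepdims=True) + eps)
    weights = softmax(q_unit @ k_unit.T, axis=1)             # rows sum to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
n_m, l_e = 6, 16                                  # toy sizes
h = rng.standard_normal((n_m, l_e))
w_q, w_k, w_v = (rng.standard_normal((l_e, l_e)) for _ in range(3))
fused, weights = cosine_attention(h, w_q, w_k, w_v)
```

Because the scores are bounded, the largest and smallest weights in any row differ by at most a factor of e². The weights therefore depend on the direction of the embeddings rather than on their magnitude.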

To capture the nonlinear features of attention-modulated multi-scale mode spatial mapping time series, a convolutional feed-forward extractor is proposed. It consists of convolutional layers, activation functions, normalization layers, and pooling layers. Extending the time dimension of the spatial mappings facilitates local connectivity and parameter sharing, enhancing the model’s representational capacity and reducing the risk of overfitting. The computation process can be formulated as

\begin{cases} h_{i}' = \mathrm{Dropout} (\mathrm{SeLU} (\mathrm{Conv1D}_{1 \times d_{1}} (h_{i}^{N_{a}}))) \\ h_{i}'' = \mathrm{Dropout} (\mathrm{Conv1D}_{1 \times d_{2}} (h_{i}')) \\ z = \mathrm{Pool} (\mathrm{Norm} (H'')), & H'' = \{h_{1}'', h_{2}'', \dots, h_{N_{m}}''\}^{T} \end{cases}

(18)

where $d_{1}$ and $d_{2}$ are the corresponding convolution kernel sizes. Multi-scale mode spatial mappings are first projected to a high-dimensional space to extract fine-grained feature representations. Subsequently, dimensionality reduction is applied to compress the high-dimensional features and filter key information, thereby enhancing the model’s expressive power and generalization ability. The latent vector $H''$, obtained after nonlinear feature extraction, is transformed through a pooling layer to derive the feature representation z.

The inverse embedding and cosine similarity-based attention mechanism guide the model to focus on inter-mapping relationships and emphasize fault-related features. This enables the model to concentrate on reliable mode spatial mappings, thereby indirectly mitigating the influence of simulation stage errors. Meanwhile, the convolutional feed-forward extractor processes each channel’s time series in parallel, preserving temporal structural characteristics. This enables effective learning of local features and intrinsic spatial properties, adaptive adjustment of category weights to mitigate attention imbalance, and improved suppression of noise interference.

3.4. Implementation of the diagnostic method

In this paper, a dynamics-assisted unsupervised domain adaptation method is established for bearing fault diagnosis. The workflow is shown in Figure 6, which is divided into the offline stage and the online stage. Steps 1 to 3 correspond to the offline stage, while Step 4 corresponds to the online stage.

Step 1: The dynamic model of the bearing in equation (1) is established by determining its key physical parameters through the parameter identification method in Algorithm 1, which is then used to generate simulated labeled fault data.

Step 2: The simulation data and measured data are set as the source domain and target domain, respectively. Data preprocessing is performed and the deep learning model parameters are randomly initialized.

Step 3: The preprocessed data are fed into the proposed model for forward propagation to obtain predicted class and domain labels. The performance of the model is evaluated by the loss function in equation (19) and optimized through backpropagation with the adaptive moment estimation (Adam) algorithm (Singarimbun et al., 2019)

\begin{cases} L_{c} = - \frac{1}{N_{s}} \sum_{i = 1}^{N_{s}} \sum_{j = 1}^{N_{c}} y_{i, j} \log (\hat{y}_{i, j}) \\ L_{d} = - \frac{1}{N_{s + t}} \sum_{i = 1}^{N_{s + t}} \left[ y_{i}^{d} \log (\hat{y}_{i}^{d}) + (1 - y_{i}^{d}) \log (1 - \hat{y}_{i}^{d}) \right] \\ L = L_{c} + L_{d} \end{cases}

(19)

where $L_{c}$ and $L_{d}$ represent the cross-entropy losses for classification and domain alignment, $N_{c}$ is the number of classes, $N_{s}$ is the number of source domain samples, $y_{i, j}$ is the one-hot encoded true label of the ith sample for class j, and $\hat{y}_{i, j}$ is the predicted probability that the ith sample belongs to class j. $N_{s + t}$ is the total number of samples, $y_{i}^{d}$ is the true domain label, and $\hat{y}_{i}^{d}$ is the predicted domain probability.
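The loss in equation (19) can be evaluated numerically as in the NumPy sketch below. It only computes the loss values from given probabilities and is not the authors' training code. In domain-adversarial training the feature extractor is commonly updated through a gradient reversal layer, which this sketch does not model. The function names and the small constant `eps` that guards the logarithm are assumptions.

```python
import numpy as np

def classification_loss(y_true, y_prob, eps=1e-12):
    """L_c: categorical cross-entropy over labeled source samples."""
    return -np.mean(np.sum(y_true * np.log(y_prob + eps), axis=1))

def domain_loss(d_true, d_prob, eps=1e-12):
    """L_d: binary cross-entropy over all source and target samples."""
    return -np.mean(d_true * np.log(d_prob + eps)
                    + (1 - d_true) * np.log(1 - d_prob + eps))

def total_loss(y_true, y_prob, d_true, d_prob):
    """L = L_c + L_d, equation (19)."""
    return classification_loss(y_true, y_prob) + domain_loss(d_true, d_prob)

# Toy batch: 2 source samples with 4 classes, then 2 target samples
y_true = np.array([[1, 0, 0, 0], [0, 0, 1, 0]], dtype=float)
y_prob = np.full((2, 4), 0.25)           # an uninformative classifier
d_true = np.array([0.0, 0.0, 1.0, 1.0])  # 0 = source, 1 = target
d_prob = np.full(4, 0.5)                 # an uninformative discriminator

loss = total_loss(y_true, y_prob, d_true, d_prob)
```

For this uninformative case the two terms equal ln 4 and ln 2, which gives a convenient sanity check when wiring the loss into a training loop.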

Step 4: The real-time monitoring data is preprocessed similarly to the offline stage and then fed into the well-trained model to obtain the real-time diagnosis results.

Figure 6.

The workflow of the proposed method.

4. Experimental results and analysis

4.1. Dataset description

The CWRU dataset is one of the most widely used public datasets for rolling bearing fault diagnosis. As shown in Figure 7(a), the experimental setup includes a torque sensor, a power meter, and a three-phase induction motor. The test bearing is an SKF 6205. The dataset comprises three fault types: inner race fault, roller fault, and outer race fault. In the experiments, the data collected at the drive end are used, with a motor speed of 1772 r/min and a sampling frequency of 12 kHz.

Figure 7.

Rolling bearing test rigs for the CWRU dataset (a) and the XJTU-SY dataset (b).

The XJTU-SY bearing dataset (Lei et al., 2019), released by Xi’an Jiaotong University, is collected on the test rig shown in Figure 7(b). It records the complete vibration signals from normal operation through early-stage damage to severe faults. The test bearing is an LDK UER204. The fault types include inner race fault, outer race fault, and mixed damage (hybrid fault). The signals are sampled at 25.6 kHz.

4.2. Experimental setting

To evaluate the diagnostic performance of the proposed method when the measured data carry no label information, verification experiments are conducted as listed in Table 3. The simulated labeled data generated by the method in Section 2 are used as the source domain, and the measured unlabeled data are used as the target domain. In Case 1 and Case 2, the target domain is drawn from the CWRU and XJTU-SY datasets, respectively. Moreover, in practical engineering scenarios, fault types occur with different probabilities, which may lead to class imbalance in the measured data. Therefore, three scenarios are considered for each case: one including all fault types, one excluding a single fault type, and one excluding two fault types. The fault types include normal (N), outer race fault (OF), inner race fault (IF), roller fault (RF), and hybrid fault (HF). Note that the target domain fault types in Table 3 only indicate which classes are present in the measured data; their labels remain unknown. The hyperparameter settings are listed in Table 4.

Table 3.

Experimental setting for fault diagnosis tasks.

Dataset	Task	Source domain	Target domain
CWRU (Case 1)	A₁	N, RF, IF, OF	N, RF, IF, OF
	A₂	N, RF, IF, OF	N, RF, IF
	A₃	N, RF, IF, OF	N, RF
XJTU-SY (Case 2)	B₁	N, HF, IF, OF	N, HF, IF, OF
	B₂	N, HF, IF, OF	N, HF, IF
	B₃	N, HF, IF, OF	N, HF

Table 4.

Hyperparameter settings.

Hyperparameter	Value
Sample length	2048
Batch size	128
Epochs	100
Learning rate	0.0002
Dropout	0.2
Number of attention heads	2
Number of encoder layers	2
Embedding dimension	256
Hidden layer dimension	1024
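Table 4 fixes the sample length at 2048 points, which implies that each long vibration record is cut into fixed-length samples before training. The paper does not detail this preprocessing, so the sketch below is only one plausible way to do it: the function name `segment_signal`, the `stride` parameter, and the non-overlapping default are assumptions.

```python
import numpy as np

def segment_signal(signal, sample_length=2048, stride=None):
    """Cut a 1-D vibration record into windows of `sample_length` points.

    `stride` is the hop between window starts; by default the windows do not
    overlap (stride == sample_length). Trailing points that do not fill a
    complete window are discarded.
    """
    signal = np.asarray(signal)
    stride = sample_length if stride is None else stride
    n_windows = (len(signal) - sample_length) // stride + 1
    if n_windows <= 0:
        return np.empty((0, sample_length), dtype=signal.dtype)
    starts = np.arange(n_windows) * stride
    return np.stack([signal[s:s + sample_length] for s in starts])

# One second of a placeholder record at the CWRU sampling frequency of 12 kHz.
record = np.arange(12000, dtype=float)
samples = segment_signal(record)                       # non-overlapping windows
overlapped = segment_signal(record, stride=1024)       # 50 % overlap
print(samples.shape, overlapped.shape)
```

For the CWRU settings reported above (12 kHz, 1772 r/min), one shaft revolution spans about 406 points, so a 2048-point sample covers roughly five revolutions.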

In the experiment, the proposed method is compared with three groups of approaches. The first group consists of methods without domain adaptation: VMD + SVM, which combines variational mode decomposition with a support vector machine; and Non-DA, which is the proposed method without the domain adversarial strategy. The second group consists of advanced domain adaptation methods: MMD, which incorporates the discrepancy between domain distributions into the loss function; DANN, which employs a domain adversarial strategy by introducing a domain discriminator into the shared domain feature layer; and DACD, which further integrates label information into the shared domain feature layer based on the domain adversarial strategy. The third group consists of ablation methods: Base-Attn, which replaces the inverse-embedding cosine similarity attention mechanism with a traditional attention mechanism; Non-DN, which removes the multi-scale mode denoising network; and Base-NDN, which removes the multi-scale mode denoising network and retains only the traditional attention mechanism.
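To make the MMD baseline concrete, the discrepancy term it adds to the loss can be sketched as a kernel two-sample statistic. The snippet below is a generic biased estimate of the squared maximum mean discrepancy with a Gaussian kernel; the kernel choice, the bandwidth `sigma`, and the random feature vectors are assumptions for illustration and are not taken from the compared method.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Pairwise Gaussian kernel matrix between the rows of `a` and `b`."""
    sq_dist = (np.sum(a ** 2, axis=1)[:, None]
               + np.sum(b ** 2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st

# Hypothetical 8-dimensional features: one target set close to the source
# distribution and one shifted away from it.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 8))
target_near = rng.normal(0.0, 1.0, size=(200, 8))
target_far = rng.normal(2.0, 1.0, size=(200, 8))

print(f"near: {mmd2(source, target_near, sigma=3.0):.4f}")
print(f"far:  {mmd2(source, target_far, sigma=3.0):.4f}")
```

A larger value indicates a larger gap between the two feature distributions, which is the quantity an MMD-based method tries to drive down during training.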

4.3. Analysis of the comparison results

Each task in the two cases is repeated 10 times independently. Table 5 compares the unsupervised cross-domain diagnostic accuracy of the different methods, reported as mean ± standard deviation.

Table 5.

Accuracy comparison of different methods.

Method	A₁ (%)	A₂ (%)	A₃ (%)	B₁ (%)	B₂ (%)	B₃ (%)
VMD + SVM	45.62 ± 0.00	-	-	36.75 ± 0.00	-	-
Non-DA	49.23 ± 4.63	-	-	52.45 ± 5.52	-	-
MMD	73.83 ± 3.79	71.61 ± 4.51	67.45 ± 4.65	78.55 ± 4.11	76.91 ± 4.83	74.07 ± 5.22
DANN	79.42 ± 4.21	75.59 ± 5.13	67.97 ± 5.34	82.72 ± 5.04	79.22 ± 5.65	73.18 ± 5.67
DACD	81.26 ± 3.25	79.38 ± 3.55	75.82 ± 4.22	84.15 ± 4.25	81.26 ± 4.84	78.11 ± 5.31
Base-NDN	82.26 ± 3.15	79.50 ± 4.53	74.46 ± 4.67	85.38 ± 4.34	81.66 ± 5.01	77.85 ± 5.27
Base-Attn	82.24 ± 3.19	79.52 ± 4.61	74.96 ± 4.88	86.57 ± 4.59	82.62 ± 4.92	78.10 ± 5.35
Non-DN	82.93 ± 2.85	81.06 ± 3.48	77.96 ± 3.71	87.24 ± 3.26	85.14 ± 4.11	82.04 ± 4.78
Proposed	83.02 ± 2.98	81.67 ± 3.31	78.57 ± 3.73	87.83 ± 3.41	85.12 ± 4.07	82.06 ± 4.73

As shown in Table 5, the proposed method achieves the highest average accuracy in both cases (81.09% for Case 1 and 85.00% for Case 2) and the highest accuracy on every task except B₂, where Non-DN is marginally higher (85.14% versus 85.12%), demonstrating better generalization than the other methods. The results of VMD + SVM and Non-DA are reported once for each dataset because they are identical across different tasks without domain adaptation. Their relatively low accuracies indicate the influence of the source-target discrepancy. The standard deviation of VMD + SVM is 0.00 because it does not involve random initialization or stochastic optimization. MMD reduces distribution differences in the reproducing kernel Hilbert space, but its mean-alignment strategy limits nonlinear feature modeling, leading to modest improvement. DANN aligns feature distributions through domain adversarial training but is affected by target-domain category imbalance. DACD incorporates source domain label information to mitigate this issue, but the lack of an attention mechanism limits its ability to capture local category differences. Base-NDN and Base-Attn apply traditional attention, with limited performance gains. In contrast, Non-DN and the proposed method utilize inverse-embedding to preserve multi-scale information and capture inter-scale relationships, achieving higher diagnostic accuracy and reliability than the other approaches.
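The per-case averages quoted in the abstract can be recovered directly from Table 5. The short script below copies the mean accuracies of the proposed method and of the three advanced domain adaptation baselines from the table and averages them over the three tasks of each case; the dictionary layout and helper name are illustrative choices only.

```python
# Mean accuracies (%) copied from Table 5; "A" = CWRU tasks, "B" = XJTU-SY tasks.
table5 = {
    "MMD":      {"A": [73.83, 71.61, 67.45], "B": [78.55, 76.91, 74.07]},
    "DANN":     {"A": [79.42, 75.59, 67.97], "B": [82.72, 79.22, 73.18]},
    "DACD":     {"A": [81.26, 79.38, 75.82], "B": [84.15, 81.26, 78.11]},
    "Proposed": {"A": [83.02, 81.67, 78.57], "B": [87.83, 85.12, 82.06]},
}

def case_mean(method, case):
    """Average accuracy of one method over the three tasks of one case."""
    values = table5[method][case]
    return sum(values) / len(values)

summary = {}
for case in ("A", "B"):
    baselines = [m for m in table5 if m != "Proposed"]
    best = max(baselines, key=lambda m: case_mean(m, case))
    proposed = case_mean("Proposed", case)
    gain = proposed - case_mean(best, case)
    summary[case] = (best, f"{proposed:.2f}", f"{gain:.2f}")
    print(f"Case {case}: proposed {proposed:.2f}%, best baseline {best}, gain {gain:.2f}")
```

Under this averaging, DACD is the strongest of the three baselines in both cases, and the gains of the proposed method over it are 2.27 and 3.83 percentage points, matching the figures reported in the abstract.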

To further evaluate the diagnostic performance, we take Task A₁ as an example and present the confusion matrices and t-distributed stochastic neighbor embedding (t-SNE) plots in Figures 8 and 9, respectively.

Figure 8.

Confusion matrices of different fault diagnosis methods.

Figure 9.

t-SNE visualization of different fault diagnosis methods.

The confusion matrices in Figure 8 illustrate the diagnostic accuracy of different methods across all fault types. Each cell displays the sample count on top, with the normalized percentage at the bottom. Light gray cells represent precision (calculated by column) and recall (calculated by row), while the dark gray cell in the bottom right indicates overall classification accuracy. The numbers 0 to 3 correspond to N, OF, IF, and RF, respectively.
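The relationship between the cells described above can be written out in a few lines. The sketch below assumes that rows hold the true classes and columns the predicted classes, which is the layout consistent with precision being computed by column and recall by row; the counts are hypothetical and are not taken from Figure 8.

```python
import numpy as np

def precision_recall_accuracy(cm):
    """Per-class precision and recall plus overall accuracy from a confusion matrix.

    Assumes cm[i, j] counts samples of true class i predicted as class j, and
    that every class is predicted at least once (no empty column).
    """
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)
    precision = correct / cm.sum(axis=0)   # column-wise: correct / all predicted as class
    recall = correct / cm.sum(axis=1)      # row-wise: correct / all true members of class
    accuracy = correct.sum() / cm.sum()
    return precision, recall, accuracy

# Hypothetical counts for classes 0 to 3 (N, OF, IF, RF), 50 samples per class.
cm = [[48, 1, 1, 0],
      [2, 40, 6, 2],
      [1, 7, 41, 1],
      [0, 1, 2, 47]]

precision, recall, accuracy = precision_recall_accuracy(cm)
print("precision:", np.round(precision, 3))
print("recall:   ", np.round(recall, 3))
print("accuracy: ", round(accuracy, 3))
```

In this made-up example most of the errors fall between classes 1 and 2, which would show up as lower recall in those two rows and lower precision in those two columns.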

The confusion matrices show severe misclassification without domain adaptation, indicating significant source-target distribution discrepancies. Conventional domain adaptation methods perform well for rolling element faults but still struggle to distinguish inner and outer race faults, reflecting category adaptation imbalance. DACD alleviates the adaptation disparity by incorporating label information. However, its fixed fusion mechanism limits local feature adaptation. The inverse-embedding cosine similarity attention mechanism adaptively adjusts category alignment and improves diagnostic performance. Compared with traditional attention, it more effectively alleviates class-wise imbalance and reduces confusion in certain categories.

The t-SNE plot visualizes boundaries and differences across fault types. To assess the domain adaptation and classification performance of various methods, output feature vectors are projected into a two-dimensional space using t-SNE. Since VMD + SVM does not learn deep output feature vectors, it is not included in the t-SNE visualization. Figure 9 presents the t-SNE plots for the different methods. S and T denote the source domain and the target domain, respectively.

The t-SNE plots further demonstrate the superiority of the proposed method in domain adaptation. Compared to other methods, the proposed approach reduces domain distribution discrepancies, yielding more compact intra-class distributions across domains with clearer boundaries. This demonstrates that the method achieves better overall performance, effectively mitigating class and feature distribution differences in fault data, enabling accurate extraction of complex fault features, and improving diagnostic accuracy and stability.

4.4. Result analysis under noisy conditions

To evaluate the noise robustness of different methods, experiments are conducted under various signal-to-noise ratio (SNR) conditions. Tables 6 and 7 show the accuracy of different methods under two conditions: low noise (20 dB) and high noise (−5 dB). Figure 10 presents line plots comparing their performance across different noise levels.
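A common way to build such test conditions is to scale additive noise so that the ratio of signal power to noise power matches the target SNR. The sketch below assumes additive white Gaussian noise, which the text does not state explicitly; the function names and the synthetic sine signal are likewise illustrative.

```python
import numpy as np

def add_noise(signal, snr_db, rng=None):
    """Return `signal` plus white Gaussian noise scaled to the requested SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def measured_snr_db(clean, noisy):
    """Empirical SNR in dB of `noisy` relative to the known clean signal."""
    noise = np.asarray(noisy) - np.asarray(clean)
    return 10.0 * np.log10(np.mean(np.asarray(clean) ** 2) / np.mean(noise ** 2))

# Synthetic 100 Hz tone sampled at 12 kHz, standing in for a vibration signal.
rng = np.random.default_rng(0)
t = np.arange(20480) / 12000.0
clean = np.sin(2.0 * np.pi * 100.0 * t)

noisy_low = add_noise(clean, 20.0, rng)    # low-noise condition
noisy_high = add_noise(clean, -5.0, rng)   # high-noise condition
print(f"{measured_snr_db(clean, noisy_low):.2f} dB, {measured_snr_db(clean, noisy_high):.2f} dB")
```

At −5 dB the noise power is about 3.16 times the signal power, so the fault-related components are buried well below the noise floor in the time domain.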

Table 6.

Accuracy comparison of different methods under low-noise condition.

Method	A₁ (%)	A₂ (%)	A₃ (%)	B₁ (%)	B₂ (%)	B₃ (%)
VMD + SVM	37.12 ± 0.00	-	-	25.08 ± 0.00	-	-
Non-DA	47.56 ± 4.84	-	-	50.40 ± 5.67	-	-
MMD	68.62 ± 5.42	64.33 ± 5.90	60.21 ± 6.55	71.62 ± 5.47	66.79 ± 6.11	63.02 ± 6.83
DANN	77.08 ± 4.67	73.32 ± 5.44	65.39 ± 5.85	80.59 ± 5.33	76.36 ± 6.20	71.26 ± 6.75
DACD	78.24 ± 3.92	76.48 ± 4.35	73.08 ± 5.23	81.62 ± 4.34	78.49 ± 5.22	75.31 ± 5.74
Base-NDN	79.98 ± 3.85	77.57 ± 4.83	73.46 ± 5.08	83.20 ± 4.67	80.42 ± 5.47	76.19 ± 5.85
Base-Attn	82.01 ± 3.20	79.36 ± 4.62	74.82 ± 4.95	85.51 ± 4.74	81.87 ± 5.08	77.47 ± 5.76
Non-DN	81.39 ± 3.21	79.64 ± 3.65	76.74 ± 4.09	85.55 ± 3.81	82.47 ± 4.52	79.76 ± 5.49
Proposed	82.87 ± 3.05	81.48 ± 3.36	78.12 ± 3.95	87.59 ± 3.62	84.83 ± 4.22	81.41 ± 5.03

Table 7.

Accuracy comparison of different methods under high-noise condition.

Method	A₁ (%)	A₂ (%)	A₃ (%)	B₁ (%)	B₂ (%)	B₃ (%)
VMD + SVM	25.00 ± 0.00	-	-	23.50 ± 0.00	-	-
Non-DA	26.43 ± 3.25	-	-	37.56 ± 4.66	-	-
MMD	51.63 ± 6.65	48.33 ± 7.28	46.45 ± 8.46	55.28 ± 7.18	51.25 ± 8.61	48.54 ± 9.63
DANN	60.90 ± 5.74	56.33 ± 6.88	50.51 ± 7.67	63.30 ± 6.74	59.63 ± 7.12	55.13 ± 7.73
DACD	59.28 ± 6.26	57.10 ± 6.96	52.03 ± 7.34	62.55 ± 6.43	59.56 ± 6.92	57.46 ± 7.40
Base-NDN	61.54 ± 5.90	56.99 ± 6.38	51.87 ± 7.16	65.22 ± 6.13	62.26 ± 6.65	58.38 ± 6.97
Base-Attn	68.76 ± 4.77	66.53 ± 5.84	60.54 ± 6.39	72.35 ± 5.17	67.50 ± 5.52	62.28 ± 6.74
Non-DN	64.37 ± 5.86	62.12 ± 6.27	59.86 ± 6.83	66.94 ± 5.88	64.62 ± 6.18	60.10 ± 6.91
Proposed	71.67 ± 4.63	69.67 ± 5.33	66.44 ± 5.85	74.32 ± 4.27	72.04 ± 5.28	68.15 ± 6.16

Figure 10.

Accuracy comparison of different methods under varying SNR conditions.

Experimental results show that under low noise conditions, all methods except MMD maintain relatively high diagnostic accuracy. MMD exhibits the earliest performance degradation due to its strong reliance on source domain features, leading to increased distribution discrepancies in noisy environments. As noise increases and the SNR approaches 5 dB, feature blurring causes class distributions to overlap, resulting in rapid accuracy declines for DANN, DACD, and Base-NDN. Among them, DACD relies on label information to learn domain-shared features, which may induce overfitting, further degrading performance.

The inverse-embedding cosine similarity attention mechanism reduces feature overlap among fault types, enhancing noise suppression. This improves the robustness of Non-DN, preventing notable accuracy degradation until the SNR approaches 0 dB. Benefiting from this mechanism, the proposed method achieves superior robustness compared with Base-Attn.

Comparing Non-DN with the proposed method shows that Non-DN exhibits varying accuracy degradation under different SNR conditions. Benefiting from the compact multi-branch convolutional network and channel-wise dynamic denoising gating mechanism, the proposed method more effectively suppresses noise interference and maintains high diagnostic accuracy under severe noise. These results confirm the effectiveness of the multi-scale mode denoising network in improving noise robustness.

To further analyze the impact of noise on domain adaptation performance and the noise robustness of different approaches, we present the t-SNE plots in Figure 11.

Figure 11.

t-SNE visualization of different fault diagnosis methods under high-noise condition.

As shown in Figure 11, MMD exhibits pronounced domain separation and considerable inter-class overlap, indicating weaker domain alignment and class separability under noise. DANN partially mitigates the domain discrepancy, but noise still blurs the boundaries between classes, leading to less distinct clustering. Although DACD incorporates label information for feature alignment, it tends to overfit target domain features in noisy environments, resulting in misalignment for certain classes. Base-Attn demonstrates better inter-class separability, but noise interference increases domain gaps within certain classes. Non-DN shows tight domain alignment within each class, but noise interference causes some class overlap. In contrast, the proposed method maintains effective domain alignment and clear class boundaries, demonstrating superior noise robustness.

5. Conclusion

In response to the limited availability of labeled measured data and noise interference in rolling bearing fault diagnosis, this paper proposes a dynamics-assisted unsupervised domain adaptation method for cross-domain diagnosis under noisy conditions. The simulated labeled data are generated through a dynamic model of rolling bearings, in which the critical physical parameters are determined by a parameter identification method. Compared with empirical parameters, the identified parameters reduce time-domain PRD, RMSE, and MMD by 17.65%, 28.21%, and 41.67%, respectively, indicating improved simulation fidelity. The distribution mismatch between the simulated and measured data is then mitigated by an unsupervised domain adaptation framework through an adversarial strategy. A deep learning model that integrates a multi-scale mode denoising network and an inverse-embedding cosine similarity attention mechanism is proposed to extract domain-invariant fault features. The multi-scale mode denoising network enhances noise suppression by capturing local characteristics of multi-scale modes and selectively suppressing noise in the frequency domain, improving average accuracy by 7.38% under −5 dB noise. Class-wise attention imbalance is alleviated by the inverse-embedding cosine similarity attention mechanism through enhancing relationship modeling between spatial mappings, thereby highlighting fault-related features and enabling the adaptive adjustment of class weights. The experimental results on two public datasets show that the proposed method achieves average accuracies of 81.09% and 85.00%, with average improvements of 3.05% under normal conditions and 12.39% under −5 dB noise over the best-performing advanced method.

Our future research will focus on enhancing the interpretability and transparency of the model, providing deeper insights into the decision-making process and making the fault diagnosis results more understandable for engineers.

Footnotes

ORCID iDs

Yuyu Zhao

Yi Tian

Yuxiao Wang

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Natural Science Foundation of China [Grant Numbers 62003352, 62003351], the Fundamental Research Funds for Central Universities (CAUC) [Grant Numbers 3122025041, 3122025047].

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

Akhenia, Bhavsar, Panchal, et al. (2022) Fault severity classification of ball bearing using SinGAN and deep convolutional neural network. Proceedings of the Institution of Mechanical Engineers - Part C: Journal of Mechanical Engineering Science 236(7): 3864–3877. https://doi.org/10.1177/09544062211043132

Case Western Reserve University (2011) Bearing data center: Apparatus & procedures. Available at: https://csegroups.case.edu/bearingdatacenter/pages/apparatus-procedures (accessed 17 September 2010).

Chawla, Bowyer, Hall, et al. (2002) SMOTE: Synthetic Minority Over-sampling Technique. ArXiv abs/1106.1813.

Chen, Yang, Xue, et al. (2023) Deep transfer learning for bearing fault diagnosis: a systematic review since 2016. IEEE Transactions on Instrumentation and Measurement 72: 1–21. https://doi.org/10.1109/tim.2023.3244237

Cui, Guan, Chen (2021) Rolling element fault diagnosis based on VMD and sensitivity MCKD. IEEE Access 9: 120297–120308. https://doi.org/10.1109/access.2021.3108972

Hei, Sun, Yang, et al. (2025) Novel domain-adaptive Wasserstein generative adversarial networks for early bearing fault diagnosis under various conditions. Reliability Engineering & System Safety 257: 110847. https://doi.org/10.1016/j.ress.2025.110847

Hoang, Kang (2019) A survey on deep learning based bearing fault diagnosis. Neurocomputing 335: 327–335. https://doi.org/10.1016/j.neucom.2018.06.078

Huo, Jiang, Shen, et al. (2023) A class-level matching unsupervised transfer learning network for rolling bearing fault diagnosis under various working conditions. Applied Soft Computing 146: 110739. https://doi.org/10.1016/j.asoc.2023.110739

Kumar, Gandhi, Vashishtha, et al. (2022) VMD based trigonometric entropy measure: a simple and effective tool for dynamic degradation monitoring of rolling element bearing. Measurement Science and Technology 33(1): 014005. https://doi.org/10.1088/1361-6501/ac2fe8

Lei, Han, Wang, et al. (2019) XJTU-SY rolling element bearing accelerated life test datasets: a tutorial. Journal of Mechanical Engineering 55: 1.

Zhang, Ding, et al. (2019) Multi-layer domain adaptation method for rolling bearing fault diagnosis. Signal Processing 157: 180–197. https://doi.org/10.1016/j.sigpro.2018.12.005

Qin, Wang, et al. (2020a) Vibration analysis of deep groove ball bearings with local defect using a new displacement excitation function. Journal of Tribology - Transactions of the ASME 142(12): 121202. https://doi.org/10.1115/1.4048163

Zhang, Ding, et al. (2020b) Diagnosing rotating machines with weakly supervised data using deep transfer learning. IEEE Transactions on Industrial Informatics 16(3): 1688–1697. https://doi.org/10.1109/tii.2019.2927590

Liu, Zhang, et al. (2023) iTransformer: Inverted Transformers are Effective for Time Series Forecasting. ArXiv abs/2310.06625.

Miao, Wang, Zhang, et al. (2022) Improved generative adversarial network for rotating component fault diagnosis in scenarios with extremely limited data. IEEE Transactions on Instrumentation and Measurement 71: 1–13. https://doi.org/10.1109/tim.2021.3127636

Feng, et al. (2022) A fault information-guided variational mode decomposition (FIVMD) method for rolling element bearings diagnosis. Mechanical Systems and Signal Processing 164: 108216. https://doi.org/10.1016/j.ymssp.2021.108216

Qin, Cao, et al. (2020) A fault dynamic model of high-speed angular contact ball bearings. Mechanism and Machine Theory 143: 103627. https://doi.org/10.1016/j.mechmachtheory.2019.103627

Qin, Luo (2022) Data-model combined driven digital twin of life-cycle rolling bearing. IEEE Transactions on Industrial Informatics 18(3): 1530–1540. https://doi.org/10.1109/tii.2021.3089340

Qin, Liu, Mao (2024) Faulty rolling bearing digital twin model and its application in fault diagnosis with imbalanced samples. Advanced Engineering Informatics 61: 102513. https://doi.org/10.1016/j.aei.2024.102513

Shah, Vakharia, Kumar, et al. (2026) SA-ConSinGAN and reservoir computing fusion for accurate bearing fault classification and severity identification using GAF-based techniques. Scientific Reports 16(1): 9027. https://doi.org/10.1038/s41598-026-39807-7

Shao, Ming, Liu, et al. (2025) Small sample gearbox fault diagnosis based on improved deep forest in noisy environments. Nondestructive Testing and Evaluation 40(8): 3935–3956. https://doi.org/10.1080/10589759.2024.2404489

Shi, Yang, et al. (2024) A model-data combination driven digital twin model for few samples fault diagnosis of rolling bearings. Measurement Science and Technology 35(9): 095103. https://doi.org/10.1088/1361-6501/ad50f3

Singarimbun, Nababan, Sitompul (2019) Adaptive moment estimation to minimize square error in backpropagation algorithm. In: 2019 International Conference of Computer Science and Information Technology (ICoSNIKOM), Paris, 25–27 September 2026, pp. 1–7.

Song, Jiang, et al. (2023) Smart multichannel mode extraction for enhanced bearing fault diagnosis. Mechanical Systems and Signal Processing 189: 110107. https://doi.org/10.1016/j.ymssp.2023.110107

Trojovska, Dehghani, Trojovsky (2022) Zebra optimization algorithm: a new bio-inspired optimization algorithm for solving optimization algorithm. IEEE Access 10: 49445–49473. https://doi.org/10.1109/access.2022.3172789

Wan, Chen, et al. (2022) A novel deep convolution multi-adversarial domain adaptation model for rolling bearing fault diagnosis. Measurement 191: 110752. https://doi.org/10.1016/j.measurement.2022.110752

Wang, Dong, Wang, et al. (2023a) Limited fault data augmentation with compressed sensing for bearing fault diagnosis. IEEE Sensors Journal 23(13): 14499–14511. https://doi.org/10.1109/jsen.2023.3277563

Wang, Zheng, Xiang (2023b) Online bearing fault diagnosis using numerical simulation models and machine learning classifications. Reliability Engineering & System Safety 234: 109142. https://doi.org/10.1016/j.ress.2023.109142

Zhang, et al. (2022) Intelligent fault diagnosis of rolling bearings under varying operating conditions based on domain-adversarial neural network and attention mechanism. ISA Transactions 130: 477–489. https://doi.org/10.1016/j.isatra.2022.04.026

Xiao, Shao, Han, et al. (2022) Novel joint transfer network for unsupervised bearing fault diagnosis from simulation domain to experimental domain. IEEE/ASME Transactions on Mechatronics 27(6): 5254–5263. https://doi.org/10.1109/tmech.2022.3177174

Zhang, Jia, et al. (2020) Machinery fault diagnosis with imbalanced data using deep generative adversarial networks. Measurement 152: 107377. https://doi.org/10.1016/j.measurement.2019.107377

Zhang, Ren, et al. (2023) Digital twin-driven partial domain adaptation network for intelligent fault diagnosis of rolling bearing. Reliability Engineering & System Safety 234: 109186. https://doi.org/10.1016/j.ress.2023.109186

Zhang, Xue, et al. (2024) A collaborative domain adversarial network for unlabeled bearing fault diagnosis. Applied Sciences-Basel 14(19): 9116. https://doi.org/10.3390/app14199116

Zhao, Guo (2024) Rolling bearing fault diagnosis model based on DSCB-NFAM. Measurement Science and Technology 35(1): 015029. https://doi.org/10.1088/1361-6501/ad031b