Abstract
Real-time and accurate fault diagnosis of rolling bearings is crucial for the safe operation of rotating machinery. Deep learning technology has gained widespread applications in fault diagnosis due to its capability in vast data analysis and complex nonlinear modeling. However, such data-driven approaches are highly dependent on both data quantity and quality. In practical applications, they suffer from significant performance degradation due to scarce labeled data and noisy measurements. To address these issues, this paper proposes a dynamics-assisted unsupervised domain adaptation method for rolling bearing fault diagnosis. First, a dynamic model of rolling bearings is established and a parameter identification method is developed to determine its critical physical parameters, which enable the generation of high-fidelity labeled fault simulation data. Then, an adversarial unsupervised domain adaptation framework is constructed to mitigate the distribution discrepancy between simulated and measured data. Meanwhile, a deep learning model incorporating a multi-scale mode denoising network and an inverse-embedding cosine similarity attention mechanism is proposed to extract domain-invariant fault features by capturing multi-scale modal characteristics, enhancing fault-related features, and suppressing noise. The effectiveness of the proposed method is validated on two public datasets under various target-domain data availability conditions, achieving average accuracies of 81.09% and 85.00%, respectively, and outperforming the best-performing method among other advanced methods by 2.27% and 3.83%. Under −5 dB noise, the improvements further increase to 13.12% and 11.65%, respectively.
1. Introduction
Rolling bearings are among the most critical components in mechanical systems, as their health directly affects the safe operation of the entire system. However, operating in complex and dynamic environments with high noise levels, extreme temperature variations, and significant pressure fluctuations (Zhao and Guo, 2024), they are prone to failure. This highlights the critical importance of accurate and reliable fault diagnosis for maintaining safety and efficiency (Hoang and Kang, 2019).
Vibration signals reflect bearing fault characteristics, but their nonlinearity and non-stationarity complicate the direct extraction of fault information. Time-domain or frequency-domain methods are limited in capturing both temporal and spectral characteristics. Variational Mode Decomposition (VMD), as a time-frequency signal processing method, provides a robust approach by adaptively decomposing signals into intrinsic modes (Cui et al., 2021; Ni et al., 2022; Song et al., 2023).
In recent years, deep learning methods have been widely applied to fault classification following signal processing (Chen et al., 2023). However, many models require sufficient labeled samples, which are often scarce in engineering due to cost and safety constraints. Deep forest methods leverage multi-grained scanning and feature selection to improve feature representation and robustness, which enables relatively good performance with small labeled datasets (Shao et al., 2025). Data augmentation techniques provide a more effective solution by generating additional fault samples to supplement the limited labeled data (Hei et al., 2025). Numerical-based data augmentation (Chawla et al., 2002; Wang et al., 2023a) processes existing fault data through interpolation, transformation, and reconstruction to expand datasets but may fail to preserve the underlying distribution, leading to overfitting on minority samples (Zhang et al., 2020). Deep learning-based data augmentation (Akhenia et al., 2022; Shah et al., 2026; Zhang et al., 2024) creates diverse labeled synthetic fault samples using models such as Generative Adversarial Networks and Variational Autoencoders but often suffers from mode collapse and lacks high-frequency details (Miao et al., 2022). Physics-based data augmentation methods (Qin et al., 2024; Shi et al., 2024; Wang et al., 2023b), by simulating bearing faults via dynamic models, generate high-fidelity labeled vibration signals and reduce reliance on measured data. Among them, the two-degree-of-freedom dynamic model effectively represents bearing dynamics and fault features while allowing efficient parameter tuning (Qin et al., 2022).
The variation in operating conditions and the intrinsic complexity of bearings cause a distribution mismatch between simulated and measured data, thereby degrading diagnostic performance (Xiao et al., 2022). Unsupervised domain adaptation methods offer an effective means to mitigate this issue. Maximum Mean Discrepancy (MMD) minimizes feature distance across domains (Li et al., 2019), while correlation alignment loss further improves alignment (Li et al., 2020b). Domain-Adversarial Neural Networks (DANN) incorporate a domain discriminator to handle complex, nonlinear shifts. Attention mechanisms have been integrated with DANN to enhance feature learning quality (Wu et al., 2022), while MMD has been applied to adjust feature distributions across domains and improve diagnostic performance (Wan et al., 2022). Discriminative Adversarial Category Discrepancy (DACD) builds upon the DANN by incorporating category information, with the aim of further reducing category-level discrepancy between domains. Huo et al. (2023) introduced a class-level transfer learning network incorporating class information to align source and target distributions of identical fault categories. These domain adaptation methods aim to enable diagnostic models to learn domain-invariant feature representations. However, they face limitations in complex and variable noise environments, hindering accurate fault feature extraction.
To address the aforementioned issues, this paper proposes a dynamics-assisted unsupervised domain adaptation method for rolling bearing fault diagnosis under noisy conditions. The innovations and contributions of the proposed method are summarized as follows: (1) A dynamic modeling method with parameter identification is developed to enable physics-constrained data generation for rolling bearings. This produces high-fidelity simulated data covering various fault types and effectively mitigates the scarcity of labeled training data in bearing fault diagnosis. (2) A dynamics-assisted domain adaptation framework is proposed to achieve unsupervised bearing fault diagnosis. The framework integrates a domain-adversarial strategy and leverages high-fidelity simulated data to bridge the simulation-to-reality gap, preserving fault-relevant features and effectively handling nonlinear distribution discrepancies. (3) A deep learning model incorporating a multi-scale mode denoising network and an inverse-embedding cosine similarity attention mechanism is designed. The proposed model combines adaptive noise suppression with fault-related feature enhancement, improving feature extraction robustness and diagnostic performance under strong noise.
The rest of this article is structured as follows: Section 2 establishes the dynamic model of rolling bearings and develops a parameter identification method. An unsupervised domain-adaptive fault diagnosis method is presented in Section 3. Section 4 discusses the experimental verification conducted on two public datasets. Several concluding remarks are drawn in Section 5.
2. Dynamics-assisted data generation
In this section, a dynamic model of rolling bearings is established to generate simulated signals. A parameter identification method is developed to identify the key physical parameters of the model.
2.1. Construction of dynamic model
To characterize the dynamic response of the bearing system, a two-degree-of-freedom dynamic model is developed, as shown in Figure 1. The corresponding differential equations are described as (Qin et al., 2020)
Dynamic model and defect schematic of rolling bearing.
When the contact deformation is greater than zero, an elastic force is generated at the contact point. The contact deformation of the ith roller can be expressed as
2.2. Identification of dynamic model parameter
Stiffness k and damping coefficient c are critical parameters that directly influence the quality of signal generation. These parameters are typically set empirically, which may result in simulated signals that cannot fully reflect the dynamic characteristics of the bearing. Such simulation-stage errors can propagate to downstream adaptation, affect domain alignment, and increase the risk of negative transfer. To mitigate this issue, a parameter identification method is developed to estimate these parameters more accurately and reduce simulation errors before adaptation.
In this study, the parameters are updated using measured data, transforming the parameter identification task into the following optimization problem
Since the frequency spectrum of the vibration signal reflects the dynamic characteristics of the system, the objective function is defined as the Root Mean Square Error (RMSE) between the simulated and measured spectra, represented as
The parameters exhibit a highly nonlinear influence on the objective function, which may lead to multiple local optima during optimization. Differences in structural characteristics and material properties among rolling bearing types result in notable variations in the critical parameters. The Zebra Optimization Algorithm (ZOA) (Trojovska et al., 2022), with an escape mechanism, is effective in avoiding entrapment in local optima and enhancing global search capability. Therefore, ZOA is employed to solve the optimization problem and achieve accurate parameter identification. The parameter identification process is shown in Algorithm 1.
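The spectral objective described in this section can be illustrated with a minimal sketch. It assumes the objective is the RMSE between one-sided amplitude spectra of equal-length signals; the function names, the toy sine signals, and the naive DFT (used only to keep the example dependency-free) are illustrative assumptions and not the implementation used in this study. The optimizer itself (ZOA) is not reproduced here.

```python
import cmath
import math


def amplitude_spectrum(signal):
    """One-sided amplitude spectrum via a naive DFT (O(n^2), fine for a short demo)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) / n)
    return spectrum


def spectrum_rmse(simulated, measured):
    """RMSE between the amplitude spectra of two equal-length signals."""
    if len(simulated) != len(measured):
        raise ValueError("signals must have the same length")
    s_sim = amplitude_spectrum(simulated)
    s_mea = amplitude_spectrum(measured)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s_sim, s_mea)) / len(s_sim))


# Hypothetical toy signals standing in for measured and simulated vibration data.
n = 64
measured = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
good_sim = [0.9 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]  # right frequency
bad_sim = [math.sin(2 * math.pi * 9 * t / n) for t in range(n)]         # wrong frequency

print(spectrum_rmse(good_sim, measured) < spectrum_rmse(bad_sim, measured))
```

In such a setup, an optimizer would call `spectrum_rmse` once per candidate pair (k, c), after simulating the model response with those parameters.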
2.3. Validation of dynamic model
Parameters of 6205-2RS JEM SKF.
Additionally, other settings for the dynamic model are as follows: the bearing load is 0 N in the x-axis direction and 12000 N in the y-axis direction, the bearing rotational speed is 1797 r/min, and the defect depth is 0.05 mm. The parameter identification yields a stiffness k of 1.27 × 10⁹ N/m and a damping coefficient c of 139. The nonlinear dynamic equation is solved using the Runge–Kutta method to obtain the vibration response. The initial state is set as
The visual comparisons of time-domain waveforms and envelope spectra between the measured and simulated signals are provided in Figure 2. The time-domain waveform comparisons for three fault types show that the measured and simulated signals exhibit consistent periodic characteristics. The corresponding envelope spectra obtained through Hilbert Transform demodulation clearly reveal the fault characteristic frequencies. At a rotational speed of 1797 r/min (29.95 Hz), the theoretical fault frequencies (Zhang et al., 2023) for the outer race, inner race, and roller of the simulated signal are 107.36 Hz, 162.19 Hz, and 141.17 Hz, respectively. The prominent frequency components of the simulated signal are consistent with both the measured signal and these theoretical values, validating the fidelity of the simulated signal.
Waveforms and envelope spectra of measured and simulated signals for different fault types.
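The theoretical fault frequencies quoted above can be checked with the standard kinematic formulas for a bearing with a stationary outer race. The sketch below is illustrative only: the geometry (9 rolling elements, 7.94 mm element diameter, 39.04 mm pitch diameter, zero contact angle) is assumed from values commonly quoted for the 6205-2RS bearing and is not taken from this text, and the rolling element value is taken as twice the spin frequency because a defect strikes both races once per revolution of the element.

```python
import math


def fault_frequencies(shaft_hz, n_elements, element_d, pitch_d, contact_angle_deg=0.0):
    """Return (outer race, inner race, rolling element) fault frequencies in Hz."""
    ratio = element_d / pitch_d * math.cos(math.radians(contact_angle_deg))
    outer = 0.5 * n_elements * shaft_hz * (1.0 - ratio)
    inner = 0.5 * n_elements * shaft_hz * (1.0 + ratio)
    # Twice the element spin frequency: the defect hits both races per spin.
    element = (pitch_d / element_d) * shaft_hz * (1.0 - ratio ** 2)
    return outer, inner, element


# Assumed geometry for the 6205-2RS bearing (not given in this text).
shaft_hz = 1797 / 60.0  # 29.95 Hz
outer, inner, element = fault_frequencies(shaft_hz, n_elements=9, element_d=7.94, pitch_d=39.04)
print(round(outer, 2), round(inner, 2), round(element, 2))  # 107.36 162.19 141.17
```

Under these assumed dimensions, the formulas reproduce the three values stated in the text.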
To further illustrate the effectiveness of the proposed parameter identification method, simulation signals are generated with the identified parameters and the empirical parameters (k = 1.3 × 10¹⁰ N/m, c = 300) (Li et al., 2020a), followed by a comparison between the two signals. The discrepancy between the signals is typically assessed using the Percentage Root Mean Square Difference (PRD), RMSE, and MMD. These metrics quantify the difference in amplitude, overall fit, and distribution between the simulated and measured signals, respectively. The three metrics are defined as
Comparison of generated signals with different parameters.
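For readers who want to reproduce a comparison of this kind, the three metrics can be sketched in their common textbook forms. This is a sketch under stated assumptions: the PRD normalization by the measured signal energy, the Gaussian kernel and its bandwidth `sigma`, and the biased MMD estimator are choices made here for illustration and may differ from the definitions used in this study.

```python
import math


def prd(measured, simulated):
    """Percentage root mean square difference, normalized by measured energy."""
    num = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    den = sum(m ** 2 for m in measured)
    return 100.0 * math.sqrt(num / den)


def rmse(measured, simulated):
    """Root mean square error between two equal-length signals."""
    sq = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    return math.sqrt(sq / len(measured))


def gaussian_mmd(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between two 1-D samples (Gaussian kernel)."""
    def kernel(a, b):
        return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

    def mean_kernel(p, q):
        return sum(kernel(a, b) for a in p for b in q) / (len(p) * len(q))

    return mean_kernel(x, x) + mean_kernel(y, y) - 2.0 * mean_kernel(x, y)


# Hypothetical samples: a close simulation and a poor one.
measured = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
close_sim = [0.1, 0.9, 0.0, -0.9, -0.1, 1.0, 0.1, -1.0]
poor_sim = [2.0, 2.5, 2.0, 1.5, 2.0, 2.5, 2.0, 1.5]
print(prd(measured, close_sim) < prd(measured, poor_sim))
```

Lower values of all three metrics indicate a simulated signal that is closer to the measurement.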
3. Unsupervised domain-adaptive fault diagnosis
The bearing dynamic model, obtained via parameter identification to reduce simulation errors, provides simulated signals with rich label information for deep learning. However, variations in operating conditions and the inherent complexity of bearings lead to a simulation-to-reality gap, which reduces the accuracy when simulated signals are directly applied to fault diagnosis. To address this issue, this section proposes an unsupervised domain adaptation method, as illustrated in Figure 3. The proposed method addresses the domain discrepancy between simulated and measured data using a deep learning model with a dual-discriminator adversarial architecture. The model is trained jointly under a domain-adversarial strategy and employs a domain discriminator to align feature vectors between domains. A feature extraction network is designed using a multi-scale mode denoising network and an inverse-embedding cosine similarity attention mechanism to denoise and enhance domain-invariant features. This approach implements an unsupervised domain adaptation method for rolling bearing fault diagnosis under noisy conditions.
The overview of the proposed method.
3.1. Decomposition of vibration signal modes
Bearing vibration signals, characterized by periodic impacts and strong noise, can be decomposed by VMD into components with distinct frequency characteristics (Kumar et al., 2022). This approach helps enhance impact-related features while suppressing noise components, effectively mitigating mode mixing and endpoint effects. It improves adaptability and noise robustness in fault feature extraction, while also reducing the risk of overfitting.
The VMD method adaptively decomposes the signal into a set of Intrinsic Mode Functions (IMFs), each representing a multi-scale mode with a distinct bandwidth and center frequency. The
3.2. Denoising of multi-scale modes
During the fault diagnosis process, noise interference can blur the key fault features in vibration signals, potentially causing negative transfer and leading to degraded diagnostic accuracy. Classic domain adaptation methods are less effective in suppressing such noise interference. To address this, a multi-scale mode denoising network is proposed to capture the local features of multi-scale modes and enhance the model’s noise robustness. The network consists of a compact multi-branch convolutional layer, a cross-branch fusion unit, and a channel-wise dynamic denoising gating mechanism, as shown in Figure 4.
Process of multi-scale mode denoising.
The compact multi-branch convolutional layer utilizes small-sized convolution kernels to filter out high-frequency noise while capturing the distribution characteristics and structural relationships of vibration signals. The cross-branch fusion unit enhances this capability by using depthwise convolutions and group convolutions to explore the local features of each mode under different receptive fields. This improves the model’s ability to extract spatial structural features, as shown in the following equation
A channel-wise dynamic denoising gating mechanism is designed, introducing a learnable scaling factor to adjust the frequency amplitude weights of each channel in the multi-scale mode spatial mappings. This helps suppress noise and enhances the feature channels strongly correlated with fault modes. The weight adjustment range of the adaptive scaling factor is constrained through weight pruning to avoid overfitting. The constrained scaling factor is then element-wise multiplied with the multi-scale mode spatial mappings to achieve optimized channel selection. The denoising process can be formulated as
The compact multi-branch convolutional layer captures multi-scale mode distributions, followed by cross-branch fusion to integrate features and enhance sensitivity to subtle structural patterns. A channel-wise dynamic denoising gate is then applied to refine and suppress noise in fault-related frequency-domain features, thereby implicitly enabling background noise reduction and preliminary extraction of multi-scale mode features. This process can also help mitigate negative transfer to some extent.
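The channel-wise gating step can be pictured with a small sketch. It assumes one scaling factor per channel, constrained to a fixed range before being multiplied element-wise with that channel's feature map; the range [0, 1], the function name, and the fixed example values (standing in for learned factors) are assumptions made here for illustration, not the configuration used in the proposed network.

```python
def channel_gate(feature_maps, scales, low=0.0, high=1.0):
    """Scale each channel by its factor after constraining the factor to [low, high].

    feature_maps: list of channels, each a list of floats.
    scales: one scaling factor per channel (fixed here; learned in a real network).
    """
    if len(feature_maps) != len(scales):
        raise ValueError("one scaling factor is required per channel")
    gated = []
    for channel, scale in zip(feature_maps, scales):
        constrained = min(max(scale, low), high)  # keep the factor inside the range
        gated.append([constrained * value for value in channel])
    return gated


# Hypothetical example: a fault-related channel is kept, a noisy one is attenuated.
maps = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
factors = [1.5, 0.25, -0.2]  # out-of-range values are constrained to 1.0 and 0.0
print(channel_gate(maps, factors))
```

Constraining the factor keeps a single channel from being amplified without bound, which is one simple way to limit overfitting of the gate.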
3.3. Correlation fusion of mode spatial mappings
To enhance fault feature extraction and achieve the fusion of correlated mode spatial mappings, a novel inverse-embedding cosine similarity attention mechanism is proposed. This mechanism emphasizes spatial mappings relevant to fault types while suppressing irrelevant features. It enhances the relationship modeling between mappings and adaptively adjusts category weights to alleviate class-wise attention imbalance. This approach helps avoid feature pattern overlap among fault types and further enhances noise suppression. The feature fusion process depicted in Figure 5 integrates an inverse embedding layer, a cosine similarity-based attention mechanism, and a convolutional feed-forward extractor, which can be represented as
Process of spatial mapping correlation fusion.

The unified embedding for each multi-scale mode spatial mapping fails to preserve the distinct scale information (Liu et al., 2023). Therefore, the inverse embedding layer is employed to independently process the time series associated with each spatial mapping. It then employs a multi-layer perceptron to capture the global information of each scale mode. The inverse embedding process is given by
The cosine similarity-based self-attention mechanism calculates attention weights by evaluating the cosine similarity between mappings, enabling more effective handling of embeddings that preserve distinct scale information. Focusing on the directional similarity between independently processed embeddings enhances cross-scale fusion and improves robustness to scale variations. Compared with traditional attention mechanisms, this approach strengthens the capability of relationship modeling among mappings, facilitating efficient feature fusion and noise suppression. By assigning higher weights to fault-related features, it increases sensitivity to fault characteristics. The query (
The multi-scale mode spatial mapping matrix is multiplied by the normalized attention score matrix for global feature fusion as
The cosine similarity-based self-attention mechanism improves the model’s sensitivity to fault features by dynamically adjusting the attention weights. It assigns greater importance to key fault-related features based on their similarity, further achieving noise suppression.
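The idea of weighting mappings by directional similarity can be sketched in a few lines. This is a minimal illustration, assuming attention scores are cosine similarities between query and key vectors, normalized with a softmax; the `temperature` parameter, the function name, and the toy vectors are assumptions introduced here and are not taken from the proposed model.

```python
import math


def cosine_attention(queries, keys, values, temperature=1.0):
    """Attention whose scores are cosine similarities, normalized by softmax."""
    def unit(vector):
        norm = math.sqrt(sum(x * x for x in vector)) or 1.0  # guard zero vectors
        return [x / norm for x in vector]

    q_hat = [unit(q) for q in queries]
    k_hat = [unit(k) for k in keys]
    outputs, weights = [], []
    for q in q_hat:
        scores = [sum(a * b for a, b in zip(q, k)) / temperature for k in k_hat]
        peak = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        dim = len(values[0])
        outputs.append([sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)])
    return outputs, weights


# Hypothetical mappings: the first two share a direction but differ in scale.
mappings = [[1.0, 0.0], [10.0, 0.0], [0.0, 1.0]]
fused, attn = cosine_attention(mappings, mappings, mappings)
print(attn[0])
```

Because the scores depend only on direction, the first mapping attends equally to itself and to its scaled copy, which is the scale-robustness property discussed above.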
To capture the nonlinear features of attention-modulated multi-scale mode spatial mapping time series, a convolutional feed-forward extractor is proposed. It consists of convolutional layers, activation functions, normalization layers, and pooling layers. Extending the time dimension of the spatial mappings facilitates local connectivity and parameter sharing, enhancing the model’s representational capacity and reducing the risk of overfitting. The computation process can be formulated as
The inverse embedding and cosine similarity-based attention mechanism guide the model to focus on inter-mapping relationships and emphasize fault-related features. This enables the model to concentrate on reliable mode spatial mappings, thereby indirectly mitigating the influence of simulation stage errors. Meanwhile, the convolutional feed-forward extractor processes each channel’s time series in parallel, preserving temporal structural characteristics. This enables effective learning of local features and intrinsic spatial properties, adaptive adjustment of category weights to mitigate attention imbalance, and improved suppression of noise interference.
3.4. Implementation of the diagnostic method
In this paper, a dynamics-assisted unsupervised domain adaptation method is established for bearing fault diagnosis. The workflow is shown in Figure 6, which is divided into the offline stage and the online stage. Steps 1 to 3 correspond to the offline stage, while Step 4 corresponds to the online stage.
The workflow of the proposed method.

4. Experimental results and analysis
4.1. Dataset description
The CWRU dataset is one of the most widely used publicly available datasets for rolling bearing fault diagnosis. As shown in Figure 7(a), the experimental setup includes a torque sensor, a power meter, and a three-phase induction motor. The bearing model used is SKF6205. The dataset comprises three fault types: inner race fault, roller fault, and outer race fault. In the experiments, data were collected from the drive end at a motor speed of 1772 r/min and a sampling frequency of 12 kHz.
Rolling bearing test-rig for CWRU dataset (a) and XJTU-SY dataset (b).
The XJTU-SY bearing dataset (Lei et al., 2019), released by Xi’an Jiaotong University, is collected using the test-rig shown in Figure 7(b). It records the complete vibration signal data from normal operation to early-stage damage and severe faults. The test bearing used is the LDK UER204 model. The fault types include inner race fault, outer race fault, and mixed damage. The signals are sampled at a frequency of 25.6 kHz.
4.2. Experimental setting
Experimental setting for fault diagnosis tasks.
Hyperparameters setting.
In the experiment, the proposed method is compared with the following approaches. Methods without domain adaptation: VMD + SVM, which combines variational mode decomposition with support vector machine; and Non-DA, which is the proposed method without the domain adversarial strategy. Several advanced domain adaptation methods: MMD, which incorporates the discrepancy between domain distributions into the loss function; DANN, which employs a domain adversarial strategy by introducing a domain discriminator into the shared domain feature layer; and DACD, which further integrates label information into the shared domain feature layer based on the domain adversarial strategy. Ablation methods: Base-Attn, which replaces the inverse-embedding cosine similarity attention mechanism with a traditional attention mechanism; Non-DN, which removes the multi-scale mode denoising network; and Base-NDN, which removes the multi-scale mode denoising network and retains only the traditional attention mechanism.
4.3. Analysis of the comparison results
Accuracy comparison of different methods.
As shown in Table 5, the proposed method achieves the highest diagnostic accuracy in both cases, demonstrating better generalization than the other methods. The results of VMD + SVM and Non-DA are reported once for each dataset because they are identical across different tasks without domain adaptation. Their relatively low accuracies indicate the influence of source-target discrepancy. The standard deviation of VMD + SVM is 0.00 because it does not involve random initialization or stochastic optimization. MMD reduces distribution differences in the reproducing kernel Hilbert space, but its mean-alignment strategy limits nonlinear feature modeling, leading to modest improvement. DANN aligns feature distributions through domain adversarial training but is affected by target-domain category imbalance. DACD incorporates source domain label information to mitigate this issue, but the lack of an attention mechanism limits its ability to capture local category differences. Base-NDN and Base-Attn apply traditional attention, with limited performance gains. In contrast, Non-DN and the proposed method utilize inverse-embedding to preserve multi-scale information and capture inter-scale relationships, achieving higher diagnostic accuracy and reliability than other approaches.
To further evaluate the diagnostic performance, we take Task A1 as an example and present the confusion matrices and t-distributed stochastic neighbor embedding (t-SNE) plots, as shown in Figures 8 and 9.
Confusion matrices of different fault diagnosis methods.
t-SNE visualization of different fault diagnosis methods.

The confusion matrices in Figure 8 illustrate the diagnostic accuracy of different methods across all fault types. Each cell displays the sample count on top, with the normalized percentage at the bottom. Light gray cells represent precision (calculated by column) and recall (calculated by row), while the dark gray cell in the bottom right indicates overall classification accuracy. The numbers 0 to 3 correspond to N, OF, IF, and RF, respectively.
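The summary cells described above can be computed directly from the counts. The sketch below follows the stated layout (precision per column, recall per row, so rows are taken as true labels and columns as predictions); the matrix values are hypothetical and do not correspond to any result reported in this paper.

```python
def confusion_summary(matrix):
    """Return (precision per class, recall per class, overall accuracy).

    Rows are true labels and columns are predicted labels.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precision, recall = [], []
    for i in range(n):
        row_sum = sum(matrix[i])                          # all samples of true class i
        col_sum = sum(matrix[r][i] for r in range(n))     # all samples predicted as i
        recall.append(matrix[i][i] / row_sum if row_sum else 0.0)
        precision.append(matrix[i][i] / col_sum if col_sum else 0.0)
    return precision, recall, correct / total


# Hypothetical counts for classes 0 to 3 (N, OF, IF, RF).
counts = [
    [50, 0, 0, 0],
    [0, 40, 10, 0],
    [0, 5, 45, 0],
    [0, 0, 0, 50],
]
precision, recall, accuracy = confusion_summary(counts)
print(precision, recall, accuracy)
```

In this toy matrix, the only confusion is between the outer race and inner race classes, which lowers precision and recall for those two classes while the other two remain at 1.0.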
The confusion matrices show severe misclassification without domain adaptation, indicating significant source-target distribution discrepancies. Conventional domain adaptation methods perform well for rolling element faults but still struggle to distinguish inner and outer race faults, reflecting category adaptation imbalance. DACD alleviates the adaptation disparity by incorporating label information. However, its fixed fusion mechanism limits local feature adaptation. The inverse-embedding cosine similarity attention mechanism adaptively adjusts category alignment and improves diagnostic performance. Compared with traditional attention, it more effectively alleviates class-wise imbalance and reduces confusion in certain categories.
The t-SNE plot visualizes boundaries and differences across fault types. To assess the domain adaptation and classification performance of various methods, output feature vectors are projected into a two-dimensional space using t-SNE. Since VMD + SVM does not learn deep output feature vectors, it is not included in the t-SNE visualization. Figure 9 presents the t-SNE plots for the different methods. S and T denote the source domain and the target domain, respectively.
The t-SNE plots further demonstrate the superiority of the proposed method in domain adaptation. Compared to other methods, the proposed approach reduces domain distribution discrepancies, yielding more compact intra-class distributions across domains with clearer boundaries. This demonstrates that the method achieves better overall performance, effectively mitigating class and feature distribution differences in fault data, enabling accurate extraction of complex fault features, and improving diagnostic accuracy and stability.
4.4. Result analysis under noisy conditions
Accuracy comparison of different methods under low-noise condition.
Accuracy comparison of different methods under high-noise condition.

Accuracy comparison of different methods under varying SNR conditions.
Experimental results show that under low noise conditions, all methods except MMD maintain relatively high diagnostic accuracy. MMD exhibits the earliest performance degradation due to its strong reliance on source domain features, leading to increased distribution discrepancies in noisy environments. As noise increases and the SNR approaches 5 dB, feature blurring causes class distributions to overlap, resulting in rapid accuracy declines for DANN, DACD, and Base-NDN. Among them, DACD relies on label information to learn domain-shared features, which may induce overfitting, further degrading performance.
The inverse-embedding cosine similarity attention mechanism reduces feature overlap among fault types, enhancing noise suppression. This improves the robustness of Non-DN, preventing notable accuracy degradation until the SNR approaches 0 dB. Benefiting from this mechanism, the proposed method achieves superior robustness compared with Base-Attn.
Comparing Non-DN with the proposed method shows that Non-DN exhibits varying accuracy degradation under different SNR conditions. Benefiting from the compact multi-branch convolutional network and channel-wise dynamic denoising gating mechanism, the proposed method more effectively suppresses noise interference and maintains high diagnostic accuracy under severe noise. These results confirm the effectiveness of the multi-scale mode denoising network in improving noise robustness.
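For reference, a noisy test signal at a prescribed SNR can be produced as sketched below. The sketch assumes additive white Gaussian noise scaled so that the ratio of signal power to noise power matches the target exactly; the noise model, the function names, and the toy signal are assumptions made here, since the noise injection procedure is not detailed in this section.

```python
import math
import random


def add_noise(signal, snr_db, seed=0):
    """Add white Gaussian noise scaled to reach the requested SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    target_noise_power = p_signal / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / p_noise)
    return [s + scale * v for s, v in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal relative to its clean reference."""
    p_signal = sum(s * s for s in clean) / len(clean)
    p_noise = sum((n - s) ** 2 for s, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(p_signal / p_noise)


# Hypothetical clean signal; at -5 dB the noise power is about 3.16 times the signal power.
clean = [math.sin(2 * math.pi * 10 * t / 512) for t in range(512)]
noisy = add_noise(clean, snr_db=-5.0)
print(round(measured_snr_db(clean, noisy), 3))
```

At −5 dB the noise carries roughly three times the power of the signal, which explains why feature boundaries blur so strongly in this regime.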
To further analyze the impact of noise on domain adaptation performance and the noise robustness of different approaches, we present the t-SNE plots in Figure 11.
t-SNE visualization of different fault diagnosis methods under high-noise condition.
As shown in Figure 11, MMD exhibits pronounced separation between domains and considerable inter-class overlap, indicating weaker domain alignment and class separability under noise. DANN partially mitigates the domain discrepancy, but noise still causes blurred boundaries between classes, leading to less distinct clustering. Although DACD incorporates label information for feature alignment, it tends to overfit target domain features in noisy environments, resulting in misalignment for certain classes. Base-Attn demonstrates better inter-class separability, but noise interference increases domain gaps within certain classes. Non-DN shows tight domain alignment within each class, but noise interference causes some class overlap. In contrast, the proposed method maintains effective domain alignment and clear class boundaries, demonstrating superior noise robustness.
5. Conclusion
In response to the limited availability of labeled measured data and noise interference in rolling bearing fault diagnosis, this paper proposes a dynamics-assisted unsupervised domain adaptation method for cross-domain diagnosis under noisy conditions. The simulated labeled data are generated through a dynamic model of rolling bearings, in which the critical physical parameters are determined by a parameter identification method. Compared with empirical parameters, the identified parameters reduce time-domain PRD, RMSE, and MMD by 17.65%, 28.21%, and 41.67%, respectively, indicating improved simulation fidelity. The distribution mismatch between the simulated and measured data is then mitigated by an unsupervised domain adaptation framework through an adversarial strategy. A deep learning model that integrates a multi-scale mode denoising network and an inverse-embedding cosine similarity attention mechanism is proposed to extract domain-invariant fault features. The multi-scale mode denoising network enhances noise suppression by capturing local characteristics of multi-scale modes and selectively suppressing noise in the frequency domain, improving average accuracy by 7.38% under −5 dB noise. Class-wise attention imbalance is alleviated by the inverse-embedding cosine similarity attention mechanism through enhancing relationship modeling between spatial mappings, thereby highlighting fault-related features and enabling the adaptive adjustment of class weights. The experimental results on two public datasets show that the proposed method achieves average accuracies of 81.09% and 85.00%, with average improvements of 3.05% under normal conditions and 12.39% under −5 dB noise over the best-performing advanced method.
Our future research will focus on enhancing the interpretability and transparency of the model, providing deeper insights into the decision-making process and making the fault diagnosis results more understandable for engineers.
Footnotes
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Natural Science Foundation of China [Grant Numbers 62003352, 62003351], the Fundamental Research Funds for Central Universities (CAUC) [Grant Numbers 3122025041, 3122025047].
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
