A cross-condition fault diagnosis method for rolling bearings based on multi-source domain adversarial feature fusion

Abstract

Currently, rolling bearing fault diagnosis faces dual challenges: significant variations in sample feature distributions caused by changes in operating conditions, and the scarcity of labeled fault samples. These factors severely undermine the generalization capability of traditional data-driven methods. To address these challenges, a multi-source domain feature fusion-domain adversarial neural network (MSDF-DANN) fault diagnosis model is proposed. Unlike conventional approaches that rely on unstable time-frequency representations, this study first employs variational mode decomposition and symmetric dot pattern to transform one-dimensional signals into two-dimensional snowflake maps, providing a geometrically consistent feature representation resilient to operating condition variations. Subsequently, a novel three-stage convolutional fusion module is designed to perform deep nonlinear integration of heterogeneous features from multiple source domains before adversarial adaptation. Furthermore, the mechanism of the adversarial weight is theoretically analyzed and experimentally verified to balance domain alignment and class discriminability. Finally, experiments on the Case Western Reserve University and Dynamic Diagnosis System datasets show that even with limited samples, MSDF-DANN has higher diagnostic accuracy and stability.

Keywords

Transfer learning, fault diagnosis, feature fusion, domain adaptation, rolling bearings

Introduction

As one of the key components for transmitting power and torque in industrial equipment, bearings are highly prone to various types and degrees of localized damage, such as spalling, pitting, and cracks on rolling surfaces.¹ These distinct fault types typically excite characteristic frequencies and modulate the vibration signals, resulting in differentiable temporal and spectral patterns.² Failure to promptly detect and address such damage may result in severe equipment malfunctions or even pose risks to personal safety.³ Therefore, accurate health monitoring of bearing transmission systems is essential for ensuring reliable equipment operation and preventing catastrophic accidents.⁴

In recent years, with the advancement of computer technology, data-driven intelligent fault diagnosis methods have become the mainstream research focus.⁵,⁶ Gao et al.⁷ transformed time-domain signals into the frequency domain using fast Fourier transform, constructed a K-nearest neighbors graph for node classification, and successfully achieved high-accuracy bearing fault classification with limited samples by integrating meta-learning with graph convolutional networks. Wu et al.⁸ introduced a convolutional feature-based recurrent neural network incorporating long short-term memory (LSTM) blocks to extract long-term dependencies in time series. Maurya et al.⁹ proposed a bearing fault diagnosis framework that combines a multi-scale convolutional neural network (CNN) with an LSTM network. This approach improves robustness under varying load conditions and in the presence of noise by leveraging CNNs for spatial feature extraction and LSTMs for temporal pattern recognition. However, in real-world applications, variations in equipment operating conditions and sensor constraints often lead to discrepancies in feature distributions across datasets collected from the same process, a problem commonly referred to as domain inconsistency.¹⁰

The rapid advancement of transfer learning technology provides novel approaches to addressing this challenge. At its core, transfer learning aims to utilize knowledge and model parameters acquired from a source task (source domain) to improve the learning performance and efficiency of a related but distinct target task (target domain).¹¹ Huo et al.¹² proposed a mechanism-driven domain adversarial model that utilizes autoencoders to extract physical features from vibration signals, enabling cross-operating-condition fault diagnosis through domain adversarial neural networks (DANNs). Zhong et al.¹³ aligned source and target domain features through domain-level and category-level matching adversarial training. They employed a maximum classifier deviation structure to optimize category feature alignment and guided feature extraction using target domain pseudo labels. Huo et al.¹⁴ proposed a lightweight model based on feature extraction transfer. They integrated a self-attention mechanism into SqueezeNet, transferring pre-trained ImageNet features and fine-tuning them to achieve efficient bearing fault classification on resource-constrained platforms. Deng et al.¹⁵ optimized source domain alignment and target domain feature extraction as a joint task by improving the one-dimensional (1D)-CNN architecture, combining linear convolutional stacking modules with pseudo-label transfer loss.

Although the aforementioned transfer learning methods demonstrate promising performance in fault diagnosis tasks, their effectiveness is often constrained by the feature distribution of source domain data when relying on a single-source domain for training.¹⁶ In real-world industrial scenarios, equipment operating environments are complex and variable, frequently yielding multiple similar source domains. Xia et al.¹⁷ proposed a multi-signal fusion method based on principal component analysis, converting multi-signal data into three-channel Red Green Blue (RGB) images and utilizing an improved residual CNN for feature extraction and fault diagnosis. Wang et al.¹⁸ collected vibration signals from three sensors at different locations, fusing multi-sensor signals through the convolutional operation of neural networks to achieve efficient signal processing and diagnosis. Xia et al.¹⁹ developed a novel multi-sensor information fusion method, organizing time-domain vibration signals from sensors at different locations into rectangular two-dimensional (2D) matrices and processing them via an improved 2D-CNN. Wang et al.²⁰ further proposed a CNN-based fault diagnosis method for rotating machinery, integrating raw data from multiple sensors while fully considering the spatio-temporal characteristics of the data. Huang et al.²¹ designed a method to convert multi-sensor vibration signals into RGB images, refining fault features through RGB color space transformation to enhance the discrimination capability between different fault signal types. Tang et al.²² employed an adaptive weighted fusion algorithm to fuse dual-channel vibration signals at the data layer, enhancing signal intensity. Simultaneously, they generated new samples through weighted fusion of data from different fault severity levels for servo motor fault diagnosis, further improving the model’s diagnostic performance.

Currently, research on fault diagnosis under cross-operating condition scenarios remains limited. While many studies emphasize deep learning approaches, they often overlook the physical significance of the signals, leading to poor model interpretability. Specifically, three critical issues remain to be addressed:

Lack of physically meaningful feature representation: Despite efforts to integrate multi-source information using multi-sensor fusion techniques, many approaches are still limited to basic data-level or feature-level fusion. These methods often fail to construct a feature space with explicit geometric structure and physical semantics, which is crucial for representing the intrinsic dynamics of bearing faults and mitigating the interference of background noise under varying conditions.

Insufficient utilization of multi-condition information: Methods relying on single-source domains cannot fully leverage the rich and complementary information scattered across multiple operating conditions. This limitation constrains the model’s capacity to learn robust, domain-invariant representations, thereby compromising generalization performance in complex and variable real-world industrial environments.

Poor transferability under data scarcity: Traditional methods experience notable performance degradation when labeled target domain samples are scarce. In such cases, models tend to overfit the sparsely labeled target domain features and fail to learn a decision boundary that generalizes, as they lack effective mechanisms to bridge the domain gap with minimal supervision.

To address the aforementioned challenges, this paper proposes a multi-source domain feature fusion-DANN (MSDF-DANN) fault diagnosis method for transfer learning in multi-source domain scenarios. The method consists of three core components: (1) construction of fault feature representations, (2) multi-source domain feature fusion (MSDF), and (3) a domain adversarial training mechanism. First, 1D time-domain signals are mapped into 2D feature maps with geometric semantics, explicitly geometrizing fault features. Subsequently, a three-layer convolutional architecture is designed to fuse multi-source domain features from various operating conditions, thereby enhancing the model’s cross-domain adaptability to bearing faults across diverse scenarios. Furthermore, a domain confusion loss weight adjustment strategy is introduced to balance the trade-off between domain alignment and classification discrimination. The main contributions of this study are summarized as follows:

Propose a novel variational mode decomposition (VMD)-symmetric dot pattern (SDP) feature transformation that provides a geometric and physically interpretable alternative to conventional time-frequency images, enhancing noise robustness and feature discriminability for bearing vibration signals.

Design a three-stage convolutional feature fusion module that departs from conventional multi-source domain adaptation (MSDA) approaches, which rely on simple feature concatenation or weighted fusion strategies. Instead, the module learns a unified feature representation directly on the VMD-SDP geometric map through a hierarchical convolutional architecture. This deep fusion mechanism enables more effective integration of heterogeneous features derived from multiple operating conditions, thereby improving the model’s cross-domain adaptability.

Introduce a strategy for tuning the domain confusion loss weight, which stabilizes the adversarial training process and systematically balances feature alignment with classification discrimination, leading to more reliable cross-domain performance.

The remainder of this paper is organized as follows: the second section presents a comprehensive description of the proposed methodology. The third section provides experimental results and a comparative analysis to evaluate the performance of the proposed approach. The fourth section summarizes the key findings and concludes the paper.

Methodology

Single-source domain methods, which rely solely on a single data source, are limited in feature information and often fail to comprehensively characterize fault features under complex operating conditions. This study proposes the MSDF-DANN model, which improves model generalization by integrating data features from multiple operating conditions through a multi-source feature extractor. Compared with traditional single-domain transfer approaches, MSDF-DANN effectively exploits rich information from diverse domain sources, making it particularly suitable for cross-operating-condition transfer learning. The fusion of multi-domain data enables more comprehensive feature representation, hence enhancing the model’s adaptability and robustness.

VMD-SDP feature map extraction method

The SDP is a technique for analyzing and describing symmetry features within data. By employing this method, hidden structures and characteristics within the data can be revealed.²³ The conversion principle is illustrated in Figure 1. The acquired 1D time-domain vibration signal, $X = \{x_{0}, x_{1}, x_{2}, x_{3}, \dots, x_{n}\}$, is normalized and then transformed into symmetrical coordinate points in polar coordinates via the SDP.

Polar radius:

R_{i} = \frac{x_{i} - x_{min}}{x_{max} - x_{min}}

(1)

Counterclockwise angle:

θ_{i +} = θ + \frac{x_{i + l} - x_{min}}{x_{max} - x_{min}} ξ

(2)

Clockwise angle:

θ_{i -} = θ - \frac{x_{i + l} - x_{min}}{x_{max} - x_{min}} ξ

(3)

where $x_{max}$ and $x_{min}$ denote the maximum and minimum amplitudes of the vibration signal, and $x_{i}$ denotes the $i$th sample point. $l$ denotes the time interval (lag) parameter; it is set to 1 in this paper, so each point is paired with its immediate successor, which retains the local temporal correlation of the vibration signals characterizing the bearing faults. $θ$ denotes the mirror alignment angle; it is set to $π / 6$ and controls the angular separation between the counterclockwise and clockwise mirrored points, determining the overall opening or spread of the resulting SDP pattern. Finally, $ξ$ denotes the angular amplification factor; it is set to 10, which amplifies the amplitude variations of the signal into more pronounced angular variations in the polar plot and enhances the sensitivity of the SDP image to amplitude fluctuations, which are key indicators of fault-induced impulses.
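As an illustration of Equations (1)–(3), the following minimal sketch computes the SDP polar coordinates of a short signal. The function name `sdp_points` and the toy signal are hypothetical, and the sketch assumes that $ξ$ is expressed in the same angular unit as $θ$ (radians), which the text does not state explicitly.

```python
import math

def sdp_points(x, lag=1, theta=math.pi / 6, xi=10.0):
    """Map a 1D signal to SDP polar points following Equations (1)-(3).

    Returns one (radius, ccw_angle, cw_angle) tuple per index i for which
    the lagged sample x[i + lag] exists. Assumption: xi is in radians.
    """
    x_min, x_max = min(x), max(x)
    span = x_max - x_min
    if span == 0:
        raise ValueError("signal is constant; min-max normalization is undefined")
    points = []
    for i in range(len(x) - lag):
        radius = (x[i] - x_min) / span             # Eq. (1): polar radius
        delta = (x[i + lag] - x_min) / span * xi   # normalized lagged sample, amplified by xi
        points.append((radius, theta + delta, theta - delta))  # Eqs. (2) and (3)
    return points

# Toy signal, for illustration only.
signal = [0.0, 1.0, 0.5, -1.0, 0.25]
pts = sdp_points(signal)
```

Each sample contributes one radius and a pair of angles mirrored about $θ$, so the resulting pattern is symmetric about the mirror line by construction.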

Figure 1.

Illustration of SDP method’s principle: (a) time series and (b) SDP image. SDP: symmetric dot pattern.

VMD is a signal processing technique that decomposes an original signal into multiple time-frequency localized components, each of which corresponds to a specific local frequency of the signal. In the VMD-based SDP method, a time-domain signal $x (t)$ is decomposed into $K$ intrinsic mode functions (IMFs) through VMD. The signal is represented as a linear combination of the modes, as shown in Equation (4), and each mode is obtained by minimizing the cost function in Equation (5):

x (t) = \sum_{k = 1}^{K} u_{k} (t)

(4)

J_{k} (u_{k}) = \frac{1}{2} \left\| x - \sum_{j \neq k} u_{j} \right\|_{2}^{2} + λ Ω_{k} (u_{k})

(5)

where $u_{k} (t)$ denotes the $k$th modal function, which is obtained by minimizing the cost function $J_{k}$, and $λ Ω_{k} (u_{k})$ is the regularization term on the $k$th mode, weighted by $λ$.

The performance of VMD is mainly determined by two key parameters: the number of modes $K$ and the penalty factor $α$. If $K$ is too small, mode mixing may occur; if $K$ is too large, spurious modes are introduced and the computational burden increases. The penalty factor $α$ affects the bandwidth of the extracted modes, balancing data fidelity and mode smoothness. In this study, following the method of Ganin and Lempitsky,²⁴ the parameters are selected in three steps. First, a variable bandwidth strategy is constructed based on the frequency distribution differences of each component. Second, the convergence characteristics of the signal are analyzed through a self-converging strategy based on the variable bandwidth control parameters. Finally, the optimal initial center frequency is determined to generate the best parameters.

Each modal component is a sequence of time-domain points indexed by $t$. Applying the symmetric dot pattern method to the time-domain points of each modal component yields Equations (6)–(8).

Polar radius:

R_{k} [t] = \frac{u_{k} (t) - \min_{τ} u_{k} (τ)}{\max_{τ} u_{k} (τ) - \min_{τ} u_{k} (τ)}

(6)

Counterclockwise angle:

θ_{1 k} [t] = θ + \frac{u_{k} (t + l) - \min_{τ} u_{k} (τ)}{\max_{τ} u_{k} (τ) - \min_{τ} u_{k} (τ)} ξ

(7)

Clockwise angle:

θ_{2 k} [t] = θ - \frac{u_{k} (t + l) - \min_{τ} u_{k} (τ)}{\max_{τ} u_{k} (τ) - \min_{τ} u_{k} (τ)} ξ

(8)

where $R_{k} [t]$ denotes the mapping radius of the $k$th modal component in polar coordinate space; $θ_{1 k} [t]$ and $θ_{2 k} [t]$ denote the counterclockwise and clockwise mirror angles, respectively, corresponding to the $t$th time-domain point of the $k$th modal component; and $ξ$ is the angle amplification factor. The flowchart is shown in Figure 2.

Figure 2.

VMD-SDP image plotting flowchart. VMD: variational mode decomposition; SDP: symmetric dot pattern.
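The plotting stage of the flowchart can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the modes are assumed to be already available from a VMD routine (two sinusoids stand in for them here), each mode is rasterized into a binary grid through Equations (6)–(8), and the grids are stacked as channels. The function names, the binary rasterization, and the polar-to-pixel mapping are assumptions made for the example.

```python
import math

def sdp_image(u, size=256, lag=1, theta=math.pi / 6, xi=10.0):
    """Rasterize the SDP points of one mode u into a size x size binary grid."""
    u_min, u_max = min(u), max(u)
    span = u_max - u_min
    if span == 0:
        raise ValueError("mode is constant; min-max normalization is undefined")
    image = [[0] * size for _ in range(size)]
    for t in range(len(u) - lag):
        radius = (u[t] - u_min) / span              # Eq. (6)
        delta = (u[t + lag] - u_min) / span * xi
        for angle in (theta + delta, theta - delta):  # Eqs. (7) and (8)
            # Polar -> Cartesian in [-1, 1], then scale to pixel indices.
            col = int((radius * math.cos(angle) + 1) / 2 * (size - 1))
            row = int((radius * math.sin(angle) + 1) / 2 * (size - 1))
            image[row][col] = 1
    return image

def vmd_sdp_channels(modes, size=256):
    """Stack one SDP image per mode into a list of channels."""
    return [sdp_image(u, size=size) for u in modes]

# Stand-in "modes": in practice these would come from a VMD decomposition.
modes = [[math.sin(2 * math.pi * f * n / 256) for n in range(256)] for f in (5, 20)]
channels = vmd_sdp_channels(modes)
```

The default grid size of 256 matches the input resolution listed for the feature fusion module; how many channels are stacked depends on the number of modes retained.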

Although VMD and SDP have been extensively studied as standalone techniques, the novelty of this work lies in employing VMD-SDP as a critical front-end preprocessing step to specifically address the challenge of feature representation in multi-source domain adversarial networks. Rather than using raw signals or time-frequency representations directly, the proposed method leverages the signal decomposition capability of VMD to isolate dominant fault modes, which are then transformed into snowflake graphs with stable geometric topological structures via SDP. This distinctive 2D representation offers geometrically consistent inputs for the subsequent convolutional fusion network, thereby enhancing robustness in preserving domain-invariant fault characteristics.

Domain adversarial neural networks

Due to differences in data acquisition conditions, environmental noise, and operational states between laboratory and industrial data, directly transferring models trained in the source domain to the target domain often leads to degraded model performance and reduced fault identification accuracy. To address this domain adaptation challenge, DANNs introduce an adversarial training mechanism to learn domain-invariant feature representations.²⁵ Under domain adversarial learning, the model adapts to domain differences between source and target domains under varying operating conditions, enhancing generalization capabilities on the target domain to achieve cross-condition fault diagnosis.

The DANN primarily consists of a feature extractor $G_{f}$ , a label predictor $G_{y}$ , and a domain discriminator $G_{d}$ , with its network architecture shown in Figure 3. DANN establishes an adversarial game between the feature extractor and domain discriminator, enabling the feature extractor to learn domain-invariant features independent of domain labels. This effectively mitigates distribution differences between source and target domains. Consequently, within the model architecture, its objective loss function comprises both classification loss and domain adversarial loss. The classification loss applies to the $G_{y}$ and the $G_{f}$ , ensuring classification accuracy for source domain samples. Its definition is shown in Equation (9):

L_{y} = \frac{1}{N_{s}} \sum_{i = 1}^{N_{s}} L_{CE} (G_{y} (G_{f} (x_{i}^{s})), y_{i}^{s})

(9)

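where $L_{CE}$ represents the cross-entropy loss; $G_{y} (G_{f} (x_{i}^{s}))$ denotes the predicted result for the source sample $x_{i}^{s}$ with true label $y_{i}^{s}$; $N_{s}$ is the number of labeled source domain samples; and $L_{y}$ indicates the classification loss.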

Figure 3.

Structure of domain adversarial neural network.

Domain adversarial loss $L_{d}$ is employed to optimize $G_{d}$ , enabling it to accurately distinguish features between the source domain and target domain. Its definition is shown in Equation (10):

L_{d} = - \frac{1}{N} \sum_{i = 1}^{N} [d_{i} \log G_{d} (G_{f} (x_{i})) + (1 - d_{i}) \log (1 - G_{d} (G_{f} (x_{i})))]

(10)

where $d_{i}$ denotes the domain label of sample $x_{i}$ (1 indicates source domain, 0 indicates target domain), and N is the total number of samples in the source and target domains.
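For concreteness, the two loss terms can be evaluated numerically as in the sketch below. It treats $L_{CE}$ as the multi-class cross-entropy over the class probabilities produced by $G_{y}$ and evaluates Equation (10) as a mean binary cross-entropy; the function names and the toy predictions are hypothetical.

```python
import math

def classification_loss(probs, labels):
    """Equation (9): mean cross-entropy over labeled source samples.

    probs[i] is the predicted class distribution for sample i,
    labels[i] is the index of its true class.
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def domain_loss(d_pred, d_true):
    """Equation (10): mean binary cross-entropy of the domain discriminator.

    d_pred[i] is the predicted probability that sample i is from the source
    domain; d_true[i] is 1 for source and 0 for target.
    """
    total = sum(d * math.log(p) + (1 - d) * math.log(1 - p)
                for p, d in zip(d_pred, d_true))
    return -total / len(d_true)

# Toy predictions, for illustration only.
probs = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
labels = [0, 2]
d_pred = [0.9, 0.2]
d_true = [1, 0]

L_y = classification_loss(probs, labels)
L_d = domain_loss(d_pred, d_true)
```

A confident, correct discriminator drives `L_d` toward zero, whereas features that it cannot separate push `L_d` upward.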

The ultimate optimization objective of the feature extractor is to minimize classification loss while simultaneously counteracting the domain discriminator’s optimization, thereby generating domain-indiscernible features. This is defined as shown in Equation (11):

L_{f} = \min_{G_{f}, G_{y}} \max_{G_{d}} \left( L_{y} - λ_{2} L_{d} \right)

(11)

where $λ_{2}$ is the hyperparameter balancing the classification loss and domain adversarial loss, while $L_{f}$ represents the total loss, which is also the feature extraction loss.

To achieve the ultimate optimization goal, DANN incorporates a gradient reversal layer (GRL). The function of GRL is to bridge the feature extractor and domain discriminator while reversing the gradient sign during backpropagation. During forward propagation, GRL leaves the input data unchanged. During backpropagation, GRL multiplies the gradient by a negative constant $- λ_{2}$ to enable adversarial updates.²⁶ Its definition is shown in Equation (12):

\frac{\partial GRL (x)}{\partial x} = - λ_{2}

(12)
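The behavior described by Equation (12) can be illustrated with a framework-free sketch in which the forward and backward passes are written by hand. The class below is hypothetical and is not the TensorFlow implementation used in this study; in TensorFlow, a comparable effect is typically obtained with a custom gradient (for example via `tf.custom_gradient`).

```python
class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam):
        self.lam = lam

    def forward(self, x):
        # The features pass through unchanged.
        return list(x)

    def backward(self, grad_output):
        # Gradient flowing back from the domain discriminator is sign-reversed
        # and scaled, so the feature extractor is updated to increase L_d.
        return [-self.lam * g for g in grad_output]

grl = GradientReversal(lam=2.0)
features = [1.0, -2.0]
out = grl.forward(features)
grad_to_extractor = grl.backward([0.5, -1.0])
```

Because the reversal is confined to the backward pass, the domain discriminator still receives ordinary features and is trained to minimize $L_{d}$, while the feature extractor receives the opposite update direction.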

Multi-source domain feature fusion-domain adversarial neural network

Traditional MSDF methods generally perform shallow fusion at the later stages of feature extraction, which limits their ability to capture the intrinsic relationships among features from different domains in the deep representation space. To address this issue, this study proposes a DANN based on MSDF. Building on the conventional DANN framework, the model integrates a feature fusion module, whose architecture is depicted in Figure 4. This module operates directly on the geometric feature maps generated by VMD-SDP. Through a hierarchical convolutional architecture, it performs the nonlinear mapping and integration of multi-source heterogeneous features in the deep feature space before the features are fed into the main extractor. This fuse-then-adapt framework is intended to extract a unified and robust domain-invariant fault representation.

Figure 4.

Architecture of the enhanced domain adversarial neural network.

To align multi-source domain features, the multi-source feature fusion module implicitly learns geometric transformation matrices through convolutional layers, mapping multi-condition feature maps into a unified feature space. The ultimate optimization objective of the feature extractor is to generate domain-indistinguishable features by minimizing the classification loss while competing against the domain discriminator.

The MSDF-DANN flowchart is illustrated in Figure 5, with the detailed procedure outlined as follows:

Step 1: Data preprocessing. The input data are normalized. Each signal is decomposed using VMD to extract its intrinsic mode function (IMF) components. The IMFs are then transformed into 2D polar coordinate snowflake diagrams via the SDP method. All resulting snowflake diagrams are concatenated to form a multi-channel feature map, which serves as the model input.

Step 2: Construction of the MSDF module. Input: The multi-channel feature map generated in step 1. Initial feature extraction is performed through a first convolutional layer. A second convolutional layer further abstracts high-level features, followed by a fusion layer that reduces the channel dimension from 128 to 3, yielding fused cross-condition fault features.

Step 3: Construct a DANN framework. The feature extractor takes the fused features as input and extracts high-level representations through convolutional and fully connected layers. The fault classifier receives the output from the feature extractor and predicts fault categories using a fully connected layer. The domain discriminator takes the extracted features as input and distinguishes whether they originate from the source or target domain.

Step 4: Perform joint training and adversarial learning. The model simultaneously leverages labeled data from multiple source domains and sparsely labeled data from the target domain. Through adversarial optimization, it jointly refines the feature fusion module, feature extractor, and fault classifier. This process enables accurate fault classification while encouraging domain-invariant feature representations, thereby minimizing the discrepancy between source and target domains.

Step 5: Target domain fault diagnosis. Input the data into the trained MSDF-DANN model; the fault classifier generates predicted probabilities for each fault type; subsequently, compute evaluation metrics—including accuracy and the confusion matrix—on the target domain test set to assess the model’s generalization performance across different operating conditions.

Figure 5.

Flow diagram of MSDF-DANN. MSDF: multi-source domain feature fusion; DANN: domain adversarial neural network.

Experiment verification

Data description

To validate the effectiveness of the method, this study employs the Case Western Reserve University (CWRU) Bearing Fault Dataset as the source domain dataset.²⁷ Although the CWRU dataset is a classic benchmark allowing for fair comparison with established literature, we intentionally introduced the Dynamic Diagnosis System (DDS) Testbed at China University of Mining and Technology as the target domain, enabling an evaluation of the model’s adaptability across different operating conditions.²⁸ Detailed information on the datasets is presented in Table 1. Three rotational speed conditions (A, B, and C) were selected from the CWRU dataset, covering four fault categories with three distinct fault diameters: 0.007, 0.014, and 0.021 inches. The target domain dataset includes two operational conditions (D and E), each containing 100 samples per fault type (Figure 6).

Table 1.

Experimental datasets A-E.

Dataset	Condition label	Motor load	Rotational speed (r/min)	Normal	Outer ring	Inner ring	Roller
CWRU	A	3HP	1730	1200	400	400	400
	B	3HP	1797	1200	400	400	400
	C	3HP	1750	1200	400	400	400
DDS	D	0HP	1750	300	100	100	100
	E	1HP	1650	300	100	100	100

CWRU: Case Western Reserve University; DDS: Dynamic Diagnosis System; HP: Horsepower.

Figure 6.

Dataset description.

The raw bearing vibration data are divided into training and validation sets. The vibration signals are then segmented into samples using a fixed sliding window of 4096 data points. Each time-domain sample is normalized before undergoing the VMD-SDP transformation, which generates a 2D feature map.
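The segmentation step can be sketched as follows. The window length of 4096 points is taken from the text; the step between consecutive windows is not stated, so the non-overlapping default used here (`step=4096`) is an assumption, as is the function name.

```python
def segment(signal, window=4096, step=4096):
    """Split a 1D signal into fixed-length windows.

    Trailing samples that do not fill a complete window are dropped.
    Assumption: step defaults to the window length (no overlap).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [signal[start:start + window]
            for start in range(0, len(signal) - window + 1, step)]

# A placeholder record of 10,000 points, for illustration only.
record = list(range(10000))
samples = segment(record)                      # non-overlapping windows
overlapping = segment(record, step=2048)       # 50% overlap
```

A smaller step yields more samples from the same record at the cost of stronger correlation between neighboring samples.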

To evaluate the effectiveness of domain adversarial neural networks with MSDF for cross-operating condition bearing fault diagnosis, three experimental setups were designed. These experiments conduct in-depth analyses across three aspects: diagnostic model parameter tuning, comparison of multi-source domain transfer diagnosis with other deep learning methods, and comparison between single-source and multi-source domain transfer diagnosis.

Implementation details

MSDF-DANN is implemented using the TensorFlow 2.0 deep learning framework. It primarily comprises a feature fusion network, a feature extractor, a classifier, a domain classifier, and a GRL. The model parameters are detailed in Table 2.

Table 2.

Model parameters of MSDF-DANN.

Network layer	Parameters	Activation function
Feature fusion module	Inputs (256, 256, 9)	—
Convolution layer 1	(64, 3, 3)	ReLU
Convolution layer 2	(128, 3, 3)	ReLU
Convolution fusion layer	(3, 1, 1)	ReLU
Feature extractor	Inputs (256, 256, 3)	—
Convolution layer 1	(32, 3, 3)	ReLU
Max-pooling layer	(2, 2)	—
Classifier	Inputs (128)	—
Fully connected layer	(4)	Softmax
Domain classifier	Inputs (128)	—

MSDF: multi-source domain feature fusion; DANN: domain adversarial neural network.

The feature fusion network consists of three convolutional layers. The first convolutional layer employs 64 convolution kernels of size 3 × 3 with Rectified Linear Unit (ReLU) activation and same padding, followed by a batch normalization layer. The second convolutional layer employs 128 kernels of size 3 × 3 with ReLU activation, followed by another batch normalization layer. The third layer, the convolutional fusion layer, uses 1 × 1 kernels to reduce the number of channels from 128 to 3, with ReLU as the activation function.
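As a rough check on the size of the fusion module, the number of trainable convolution parameters can be counted as in the sketch below. It reads each Table 2 entry as (number of kernels, kernel height, kernel width), assumes one bias term per kernel, and ignores the batch normalization parameters; these reading conventions and the helper name are assumptions.

```python
def conv2d_params(in_channels, out_channels, kernel_h, kernel_w, bias=True):
    """Number of trainable parameters in a standard 2D convolution layer."""
    weights = in_channels * out_channels * kernel_h * kernel_w
    return weights + (out_channels if bias else 0)

# (name, input channels, kernels, kernel height, kernel width)
fusion_layers = [
    ("convolution layer 1", 9, 64, 3, 3),
    ("convolution layer 2", 64, 128, 3, 3),
    ("convolution fusion layer", 128, 3, 1, 1),
]

counts = {name: conv2d_params(c_in, c_out, kh, kw)
          for name, c_in, c_out, kh, kw in fusion_layers}
total = sum(counts.values())
```

Under these assumptions the three layers hold 5,248, 73,856, and 387 parameters, respectively, for a total of 79,491, so the second convolution accounts for most of the module's capacity.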

The feature extractor performs feature extraction through a convolutional layer utilizing 32 convolutional kernels, employing the ReLU activation function. A max-pooling layer is applied to downsample the feature maps, reducing spatial dimensions while preserving essential information. The pooled feature maps are flattened into 1D vectors via a flattening layer, enabling input to subsequent fully connected layers for deeper feature learning and classification.

The classifier network processes features through a fully connected layer containing 512 neurons, followed by the LeakyReLU activation function. To reduce overfitting, dropout layers are applied after each hidden layer. The features are then passed through a second fully connected layer, also consisting of 512 neurons and using LeakyReLU activation, again accompanied by dropout regularization. Finally, a four-neuron output layer with softmax activation performs the multi-class classification, generating probability scores for each class to support the final decision.

The domain classifier processes input features through a fully connected layer with 512 neurons, followed by the LeakyReLU activation function. These features are then propagated through a sequence of progressively smaller fully connected layers—256, 128, and 64 neurons respectively—each also employing LeakyReLU to progressively extract more abstract and discriminative domain-specific representations. Finally, a single-neuron output layer with a sigmoid activation function produces the binary classification output, distinguishing between source and target domains. This architecture enables the model to effectively adapt feature representations to variations in operating conditions, thereby improving cross-condition learning performance and transferability.

Experiment on adversarial weight analysis and interpretable mechanism

This experimental section investigates the influence of the domain confusion loss weight on model transfer performance. As shown in Equation (11), the overall objective function of MSDF-DANN involves a trade-off between the classification loss $L_{y}$ and the domain adversarial loss $L_{d}$. The domain confusion loss weight $λ$ (written $λ_{2}$ in Equation (11)) governs this trade-off between classification accuracy and domain invariance. Mathematically, the optimization objective can be expressed as follows:

E (θ_{f}, θ_{y}, θ_{d}) = L_{y} (θ_{f}, θ_{y}) - λ L_{d} (θ_{f}, θ_{d})

(13)

where $θ_{f}$, $θ_{y}$, and $θ_{d}$ denote the parameters of $G_{f}$, $G_{y}$, and $G_{d}$, respectively. $G_{f}$ aims to minimize the classification error while maximizing domain confusion, whereas $G_{d}$ attempts to minimize domain confusion. The weight $λ$ serves as a regularization coefficient that determines the strength of domain alignment. This paper proposes an interpretable mechanism that explains its impact on model performance in three regimes:

(1). Under-adaptation regime ( $λ \to 0$ ): When $λ$ is excessively small, the total loss will be dominated by the source domain classification loss $L_{y}$ . The gradient contribution of the domain discriminator can be ignored. Therefore, $G_{f}$ only focuses on extracting features that are discriminative for the source domain and neglects the distribution shift. This leads to overfitting on the source domain and poor generalization ability on the target domain.

(2). Negative transfer regime ( $λ \to \infty$ ): When $λ$ is excessively large, the adversarial loss dominates the optimization. To maximize the deception of the discriminator, the feature extractor is forced to eliminate all domain-specific information to generate domain-invariant features. However, excessive elimination may remove crucial but subtle category-distinguishing information that is vital for fault diagnosis. This could lead to negative transfer, where the feature extractor learns to map all features to a single, domain-invariant but category-indistinguishable point. This would also result in a decline in classification performance.

(3). Equilibrium regime: There exists an optimal range where $λ$ balances the gradients from classification and discrimination. In this regime, $G_{f}$ learns a feature space that is maximally invariant across domains while retaining sufficient discriminability for fault categorization.

To empirically verify the proposed mechanism and determine the optimal configuration for the VMD-SDP feature inputs, a sensitivity analysis on $λ$ was conducted. The experimental settings (batch size 16, learning rate 0.0001) were consistent with previous sections. The weight $λ$ was varied systematically from 0.0001 to 2.5. Source domain data were drawn from datasets A, B, and C, while target domain data were selected from datasets D and E. During training, 400 samples from D and E were reserved for the validation set; the remaining data, combined with the source domain data, were used for model training. The resulting bearing fault recognition accuracies are detailed in Table 3. The visualization and confusion matrices for the two transfer tasks with the best training results are shown in Figures 7 and 8 (N: normal; I: inner ring; O: outer ring; R: rolling element).

Table 3.

Effect of different confusion loss weights on experimental results.

Weight	0.0001	0.001	0.01	0.1	1	1.5	2	2.5
ABC → D	0.58	0.675	0.69	0.76	0.795	0.85	0.95	0.92
ABC → E	0.485	0.52	0.645	0.72	0.74	0.835	0.915	0.905

Figure 7.

The t-distributed stochastic neighbor embedding (t-SNE) visualization results of (a) ABC-D and (b) ABC-E.

Figure 8.

Confusion matrices for the (a) ABC → D and (b) ABC → E transfer tasks.

The experimental results show that when the weight $λ$ is small, target domain accuracy is low: at $λ = 0.0001$ the accuracies for target domains D and E are merely 0.58 and 0.485, respectively, and at $λ = 0.001$ they reach only 0.675 and 0.52. This confirms that without sufficient adversarial penalty, the significant distribution discrepancy caused by operating condition changes is not effectively bridged.

As $λ$ increases toward the range of [1.5, 2.0], the model performance improves significantly, reaching a peak of 0.95 (ABC → D) and 0.915 (ABC → E). This indicates that the model has reached the optimal equilibrium point where domain-invariant features are successfully learned.

Crucially, when the weight $λ$ is further increased to 2.5, a decline in accuracy is observed, dropping to 0.92 and 0.905, respectively. This supports the “Negative Transfer” mechanism, wherein an excessive adversarial weight begins to undermine the class-discriminative structure of the features. Based on this analysis, we fixed $λ = 2$ for subsequent experiments.
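The selection of $λ$ can be reproduced directly from Table 3. The snippet below is a minimal sketch that copies the reported accuracies and picks the best weight per task; the variable and function names are illustrative and not part of the original implementation.

```python
# Accuracies copied from Table 3; names are illustrative assumptions.
weights = [0.0001, 0.001, 0.01, 0.1, 1, 1.5, 2, 2.5]
accuracy = {
    "ABC->D": [0.58, 0.675, 0.69, 0.76, 0.795, 0.85, 0.95, 0.92],
    "ABC->E": [0.485, 0.52, 0.645, 0.72, 0.74, 0.835, 0.915, 0.905],
}


def best_weight(task):
    """Return (weight, accuracy) for the highest accuracy of one transfer task."""
    scores = accuracy[task]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return weights[best_index], scores[best_index]


for task in accuracy:
    weight, score = best_weight(task)
    print(f"{task}: best weight = {weight}, accuracy = {score}")
```

Both tasks peak at the same weight in this grid, which is consistent with fixing a single value for the later experiments.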

To further evaluate the performance of the cross-domain fault identification model, this section analyzes its training dynamics on the ABC → D transfer task, as illustrated in Figure 9. The domain adversarial loss, implemented as a binary cross-entropy loss, converges to approximately 0.693. This value corresponds to the expected loss for a binary classification task with balanced classes under random guessing (i.e., 50% accuracy). Therefore, a loss near 0.693 indicates that the domain discriminator is unable to reliably differentiate whether the features extracted by the feature encoder originate from the source or target domain. This outcome suggests that the domain adversarial training has successfully enforced domain-invariant feature learning, achieving the desired equilibrium in the adversarial process.
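The value 0.693 is $\ln 2$, the binary cross-entropy obtained when the discriminator outputs a probability of 0.5 for every sample. The following sketch checks this numerically; the function name and the toy labels are illustrative assumptions, not the training code used in this study.

```python
import math


def binary_cross_entropy(labels, probabilities):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for label, p in zip(labels, probabilities):
        total += -(label * math.log(p) + (1 - label) * math.log(1 - p))
    return total / len(labels)


# Toy batch: two source samples (label 0) and two target samples (label 1).
domain_labels = [0, 0, 1, 1]
# A fully confused discriminator assigns probability 0.5 to every sample.
confused_output = [0.5, 0.5, 0.5, 0.5]

chance_loss = binary_cross_entropy(domain_labels, confused_output)
print(round(chance_loss, 3))  # 0.693
```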

Figure 9.

Loss curves for the ABC → D transfer diagnosis task.

During the early training phase, significant distributional discrepancies between the two operating condition datasets hindered the feature extractor from learning domain-invariant features that could effectively align the source and target domains. As training iterations increased, all loss components exhibited a consistent downward trend. The steady reduction in classification loss reflects continuous improvement in the classifier’s discriminative capability, indicating that the model gradually converged and entered a stable training regime. Once the losses stabilized within a narrow fluctuation range, it became evident that both feature extraction and classification functions had reached a steady state. The absence of severe overfitting, combined with the stabilization of domain adversarial loss, suggests that domain adaptation has achieved equilibrium, allowing the model to effectively capture the underlying data distribution of the target domain. These observations collectively demonstrate the model’s strong transferability in fault diagnosis tasks and its robust generalization capacity for accurate cross-operating condition fault identification.

Comparative experiments with other methods

Three baseline fault diagnosis models, CNN,²⁹ LSTM,³⁰ and support vector machine (SVM),³¹ were trained using target domain data (E) for comparative evaluation. The datasets for these baseline methods included only a limited amount of labeled data from the target domain. In contrast, MSDF-DANN leveraged additional source domain datasets (A, B, and C) representing diverse operating conditions, and a domain adaptation fault diagnosis model based on ResNet and Transformer (DAFDMRT)³² used the additional source domain dataset A. For evaluation, 200 samples from target domain E were reserved as validation data. All models were trained with the same total number of samples but under varying training set sizes, and each was subjected to 100 training iterations. The resulting recognition accuracies are summarized in Table 4, with corresponding bar charts provided in Figure 10.

Table 4.

Results validate the robustness and reliability of the model across varying sample distributions.

Models	Sample size
	40	120	360	480	840
CNN	0.64 ± 0.07	0.72 ± 0.05	0.83 ± 0.03	0.85 ± 0.03	0.89 ± 0.02
LSTM	0.58 ± 0.08	0.74 ± 0.06	0.84 ± 0.04	0.88 ± 0.03	0.91 ± 0.02
SVM	0.42 ± 0.11	0.56 ± 0.09	0.77 ± 0.05	0.83 ± 0.04	0.87 ± 0.03
DAFDMRT	0.85 ± 0.10	0.87 ± 0.08	0.88 ± 0.04	0.90 ± 0.04	0.93 ± 0.03
MSDF-DANN	0.93 ± 0.02	0.935 ± 0.02	0.95 ± 0.02	0.96 ± 0.01	0.96 ± 0.01

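CNN: convolutional neural network; LSTM: long short-term memory; SVM: support vector machine; DAFDMRT: domain adaptation fault diagnosis model based on ResNet and Transformer; MSDF: multi-source domain feature fusion; DANN: domain adversarial neural network.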

Figure 10.

Recognition accuracy of each model under different sample sizes.

Analysis of the experimental results in Table 4 and Figure 10 reveals that traditional methods such as SVM perform poorly under small-sample and cross-operating condition scenarios, with recognition accuracy heavily influenced by the characteristics of target domain data. The experimental data show that SVM’s mean accuracy varies widely with the number of target domain training samples, from 0.42 at 40 samples to 0.87 at 840 samples (Table 4). While deep learning methods like CNN and LSTM can automatically extract features, their performance is still sensitive to the distribution of target domain data, which limits their effectiveness when target domain samples are scarce.
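The sensitivity to sample size can be quantified from the mean accuracies in Table 4. The sketch below computes, for each model, the spread between its best and worst mean accuracy across the five training set sizes; the names are illustrative, and the ± terms in Table 4 are deliberately left out of this simple calculation.

```python
# Mean accuracies copied from Table 4 (standard deviations omitted).
sample_sizes = [40, 120, 360, 480, 840]
mean_accuracy = {
    "CNN": [0.64, 0.72, 0.83, 0.85, 0.89],
    "LSTM": [0.58, 0.74, 0.84, 0.88, 0.91],
    "SVM": [0.42, 0.56, 0.77, 0.83, 0.87],
    "DAFDMRT": [0.85, 0.87, 0.88, 0.90, 0.93],
    "MSDF-DANN": [0.93, 0.935, 0.95, 0.96, 0.96],
}


def accuracy_spread(model):
    """Difference between the best and worst mean accuracy of one model."""
    scores = mean_accuracy[model]
    return round(max(scores) - min(scores), 3)


for model in mean_accuracy:
    print(f"{model}: spread = {accuracy_spread(model)}")
```

A small spread means the model loses little accuracy when the number of labeled target samples shrinks from 840 to 40.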

In contrast, MSDF-DANN significantly enhances the stability and accuracy of cross-domain fault diagnosis through multi-source feature fusion and adversarial transfer learning. This improvement stems primarily from the multi-source feature fusion mechanism, which enables the model to extract effective features from multiple source domains, boosting its generalization capability; the feature alignment strategy reduces distribution differences between source and target domains, ensuring high recognition accuracy even with limited target domain data. Unlike CNNs and LSTMs, which require substantial target domain samples for good performance, MSDF-DANN achieves relatively accurate diagnosis even with minimal labeled target domain data through domain adversarial transfer learning. To further analyze the model’s classification performance, we visualized its classification results using a confusion matrix when the sample size was minimal, as shown in Figure 11.

Figure 11.

Confusion matrix: training sample count is 40: (a) CNN, (b) LSTM, (c) SVM, (d) DAFDMRT and (e) MSDF-DANN. CNN: convolutional neural network; LSTM: long short-term memory; MSDF: multi-source domain feature fusion; DANN: domain adversarial neural network.

Analysis of Figure 11 shows that the limited availability of labeled data in the target domain leads to low recognition accuracy when the baseline methods (SVM, CNN, and LSTM) are applied. The small sample size hinders sufficient model training, resulting in poor generalization ability in distinguishing bearing fault samples from normal ones. In contrast, the MSDF-DANN multi-source domain fault recognition method achieves superior diagnostic performance. This approach leverages a small amount of labeled target domain data to fine-tune the transfer diagnostic model, even under the constraint of limited target domain datasets (Figure 12).

Figure 12.

t-SNE visualization: training sample count is 40: (a) CNN, (b) LSTM, (c) SVM, (d) DAFDMRT and (e) MSDF-DANN. CNN: convolutional neural network; LSTM: long short-term memory; MSDF: multi-source domain feature fusion; DANN: domain adversarial neural network.

Comparative experiment for single-source domain and multi-source domain

This section evaluates how multi-domain feature fusion improves diagnostic performance in the target domain under varying conditions. When target domain data is limited or distribution shifts are significant, single-domain models may fail to capture cross-domain patterns. Integrating multi-domain data alleviates data scarcity and enhances generalization. We trained and evaluated both single- and multi-domain models on target domain data.

In these experiments, the multi-source cross-condition fault recognition method uses datasets A, B, and C as source domains with E as the target (ABC → E), while the single-source method uses only dataset A as the source (A → E). The target domain training set comprises 120 samples in total, with 30 samples per category, and the validation set contains 200 samples. The model was trained for 500 epochs, and the recognition accuracy on the validation dataset is shown in Figure 13.

Figure 13.

Comparison for single-source and multi-source domain adaptation.

As illustrated in Figure 13, the maximum accuracies achieved by the multi-source and single-source methods were 93.5% and 88%, respectively, demonstrating that the multi-source domain transfer diagnosis method outperforms the single-source counterpart by 5.5 percentage points. This result indicates that feature fusion across multiple operating conditions provides richer feature information, enhancing the model’s generalization capability and adaptability. By integrating source domain data from diverse operating conditions, the multi-domain approach reduces domain bias, improves recognition of target domain fault patterns, and strengthens the model’s robustness in complex scenarios. In contrast, single-domain methods rely solely on data from a single source domain and may therefore overfit to specific operating conditions. This hinders the effective extraction of domain-invariant features, making it difficult to adapt to changes in the target domain and reducing diagnostic accuracy and reliability. Therefore, compared to single-domain transfer diagnosis methods, multi-domain transfer diagnosis methods achieve higher fault recognition accuracy, stronger transfer capability, and more stable diagnosis under complex operating conditions.

To more intuitively evaluate the classification accuracy of the multi-source domain transfer diagnosis method based on MSDF-DANN and the single-source domain transfer diagnosis method based on DANN in bearing fault diagnosis, the highest diagnostic accuracy achieved on the target domain test set during experimental training was visualized using a confusion matrix. The matrix illustrates the classification performance of both methods across various fault categories, clearly revealing prediction accuracy, misclassification rates, and overall performance for each category, as presented in Figure 14.

Figure 14.

Confusion matrix of single-source and multi-source domain adaptation diagnosis: (a) DANN and (b) MSDF-DANN. MSDF: multi-source domain feature fusion; DANN: domain adversarial neural network.

Analysis of Figures 14 and 15 reveals that misclassified samples predominantly cluster within the fault data category. This likely stems from the broad scope of fault data, encompassing multiple types such as inner-ring faults, outer-ring faults, and rolling-element faults. The feature representations of these subtypes exhibit significant overlap, leading to classification ambiguity. In contrast, the normal operating condition is characterized by more distinct and consistent features, resulting in higher recognition accuracy.
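The per-category quantities read from a confusion matrix (recognition rate per class and overall accuracy) can be computed as in the sketch below. The counts are hypothetical and chosen only for illustration; they are not the values of Figure 14, and the four-class layout (N, I, O, R) with 50 validation samples per class is an assumption.

```python
labels = ["N", "I", "O", "R"]

# Hypothetical counts for illustration only.
# Rows: true class; columns: predicted class.
confusion = [
    [50, 0, 0, 0],
    [0, 46, 3, 1],
    [0, 2, 45, 3],
    [1, 2, 3, 44],
]


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


for label, recall in zip(labels, per_class_recall(confusion)):
    print(f"{label}: recall = {recall:.2f}")
print(f"overall accuracy = {overall_accuracy(confusion):.3f}")
```

In this made-up example the off-diagonal counts sit almost entirely among the three fault classes, which is the kind of pattern the paragraph above describes.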

Figure 15.

t-SNE of single-source and multi-source domain adaptation diagnosis: (a) DANN and (b) MSDF-DANN. MSDF: multi-source domain feature fusion; DANN: domain adversarial neural network.

Based on a comprehensive analysis of experiments involving model parameter optimization, transferability, and diagnostic accuracy, as well as comparative evaluations across different methodologies, the effectiveness of the MSDF-DANN method for multi-source domain fault diagnosis under varying operating conditions has been clearly demonstrated. This study thus provides a solid scientific foundation and technical support for intelligent fault diagnosis in complex operational environments.

Conclusion

This paper addresses the dual challenges of feature distribution shift and limited labeled samples in fault diagnosis of rolling bearings under cross-operating conditions. We propose the MSDF-DANN method that transforms 1D vibration signals into 2D geometric feature maps embedded with semantic information, thereby effectively isolating fault characteristics from background noise. A three-stage convolutional architecture is designed to fuse multi-source domain features, enhancing the model’s adaptability to complex and varying operating conditions. Through a theoretical and experimental analysis of the adversarial weight, we identified the equilibrium regime that balances domain invariance and classification discrimination, effectively preventing negative transfer.

Comparative experiments with advanced baselines and t-SNE visualizations confirm the superiority of MSDF-DANN. It achieves high diagnostic accuracy with minimal standard deviation (±0.02) even with limited target samples, demonstrating exceptional stability and interpretability. Future work will focus on developing an adaptive weighting strategy to dynamically optimize the trade-off between domain alignment and classification accuracy during the training process.

Footnotes

Acknowledgements

The source domain data is from the Case Western Reserve University Bearing Data Center Website:

Author contributions

Jin Xu: software, methodology, investigation, data curation, conceptualization. Jiang Yi: validation, software, resources, data curation. Chang Liu: validation, data curation, funding acquisition. Yusong Pang: validation, data curation. Gang Cheng: writing—review and editing, supervision, project administration, investigation, conceptualization.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Natural Science Foundation of China (no. 52304179) and the Xuzhou Science and Technology Planning Project (no. KC22030).

ORCID iD

Jin Xu

Data availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

References

1. Zhang, Liu, Shao, et al. CFENet: a contrastive frequency-sensitive learning method for gas-insulated switch-gear fault detection under varying operating conditions using acoustic signals. Eng Appl Artif Intell 2024; 135: 108835.

2. Liu, Shao. Overview of dynamic modelling and analysis of rolling element bearings with localized and distributed faults. Nonlinear Dyn 2018; 93: 1765–1798.

3. Guan, Xiong, et al. Study on dynamic characteristics of the gear-dual-rotor system with multi-position rubbing. Mech Mach Theory 2024; 191: 105501.

4. Cheng, et al. The interpretable health condition monitoring method of gear transmission system embedded in feature cloud-guided hypergraph structure under the extremely small-sample background. ISA Trans 2025; 162: 272–286.

5. Ren, Lin, Feng, et al. A systematic review on imbalanced learning methods in intelligent fault diagnosis. IEEE Trans Instrum Meas 2023; 72: 1–35.

6. Zhang, Wang, Zhao. Vertices packaging-based interval independent component analysis (VP-I2CA) for fault detection with process uncertainty. IEEE Trans Ind Inform 2024; 20: 919–930.

7. Liu, Peng. Few-shot bearing fault diagnosis by semi-supervised meta-learning with graph convolutional neural network under variable working conditions. Measurement 2025; 240: 115402.

8. Gao, Zhu, Ren, et al. A novel weak fault diagnosis method for rolling bearings based on LSTM considering quasi-periodicity. Knowl-Based Syst 2021; 231: 107413.

9. Zheng. Fault diagnosis method of rolling bearing based on MSCNN-LSTM. Comput Mater Contin 2024; 79: 4395–4411.

10. Maurya, Singh, Verma, et al. Condition-based monitoring in variable machine running conditions using low-level knowledge transfer with DNN. IEEE Trans Autom Sci Eng 2021; 18: 1983–1997.

11. Pan, Yang. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010; 22: 1345–1359.

12. Jiang, Kuang, et al. Towards enhanced interpretability: a mechanism-driven domain adaptation model for bearing fault diagnosis across operating conditions. Mech Syst Signal Process 2025; 225: 112244.

13. Huo, Jiang, Shen, et al. A class-level matching unsupervised transfer learning network for rolling bearing fault diagnosis under various working conditions. Appl Soft Comput 2023; 146: 110739.

14. Zhong, Yuan, et al. Bearing fault diagnosis using transfer learning and self-attention ensemble lightweight convolutional neural network. Neurocomputing 2022; 501: 765–777.

15. Huo, Jiang, Shen, et al. Enhanced transfer learning method for rolling bearing fault diagnosis based on linear superposition network. Eng Appl Artif Intell 2023; 121: 105970.

16. Deng, Huang, et al. A double-layer attention based adversarial network for partial transfer learning in machinery fault diagnosis. Comput Ind 2021; 127: 103399.

17. Xie, Huang, Choi S-K. Intelligent mechanical fault diagnosis using multisensor fusion and convolution neural network. IEEE Trans Ind Inform 2022; 18: 3213–3223.

18. Xia, Mao, Zhang, et al. A new compound fault diagnosis method for gearbox based on convolutional neural network. In: 2020 IEEE 9th data driven control and learning systems conference (DDCLS), Liuzhou, China, 20–22 November 2020, pp. 1077–1083. New York: IEEE.

19. Wang, et al. Fault diagnosis of bearings based on multi-sensor information fusion and 2D convolutional neural network. IEEE Access 2021; 9: 23717–23725.

20. Xia, et al. Fault diagnosis for rotating machinery using multiple sensors and convolutional neural networks. IEEE/ASME Trans Mechatron 2018; 23: 101–110.

21. Wang, Song, et al. An enhanced intelligent diagnosis method based on multi-sensor image fusion via improved deep learning network. IEEE Trans Instrum Meas 2020; 69: 2648–2657.

22. Huang, et al. Servo motor fault diagnosis based on data fusion. In: 2021 33rd Chinese control and decision conference (CCDC), Kunming, China, 22–24 May 2021, pp. 6737–6743. New York: IEEE.

23. Tang, Zhang, Qin, et al. Graph cardinality preserved attention network for fault diagnosis of induction motor under varying speed and load condition. IEEE Trans Ind Inform 2022; 18: 3702–3712.

24. Yuan, et al. Variable-bandwidth self-convergent variational mode decomposition and its application to fault diagnosis of rolling bearing. IEEE Trans Instrum Meas 2024; 73: 3511615.

25. Ganin, Ustinova, Ajakan, et al. Domain-adversarial training of neural networks. J Mach Learn Res 2016; 17: 1–35.

26. Ganin, Lempitsky. Unsupervised domain adaptation by backpropagation. In: Bach, Blei (eds) International conference on machine learning, Lille, France, vol. 37. 2015, pp. 1180–1189. San Diego: JMLR-Journal Machine Learning Research.

27. Smith, Randall RB. Rolling element bearing diagnostics using the Case Western Reserve University data: a benchmark study. Mech Syst Signal Process 2015; 64–65: 100–131.

28. Cheng, et al. Research on multi-granularity imbalanced knowledge condition monitoring for mechanical equipment based on hierarchical ELM in multi-entropy space. Expert Syst Appl 2024; 238: 121817.

29. Zhao, Yan, Chen, et al. Deep learning and its applications to machine health monitoring. Mech Syst Signal Process 2019; 115: 213–237.

30. Chen, Zhang, Gao. Bearing fault diagnosis base on multi-scale CNN and LSTM model. J Intell Manuf 2021; 32: 971–987.

31. Widodo, Yang B-S. Support vector machine in machine condition monitoring and fault diagnosis. Mech Syst Signal Process 2007; 21: 2560–2574.

32. Wang, et al. Intelligent fault diagnosis of rotating machinery under variable working conditions based on deep transfer learning with fusion of local and global time-frequency features. Struct Health Monit 2024; 23: 2238–2254.