DW-JDA: An unsupervised cross-domain dynamic weighted joint distribution adaptation network for high-speed train bearing fault diagnosis

Abstract

To address the significant feature distribution discrepancies across working conditions and the difficulty of distinguishing fine-grained faults in rolling bearings, a dynamic weighted joint distribution adaptation (DW-JDA) network is proposed. The method integrates a continuous wavelet transform (CWT) preprocessing module, a depthwise separable convolution (DSConv) based feature extractor, and a novel entropy-weighted discriminative joint distribution alignment (EW-DJDA) mechanism. First, one-dimensional vibration signals are converted into two-dimensional time–frequency images via CWT to capture the time–frequency characteristics of nonstationary signals. Then, a hybrid backbone network based on DSConv is constructed to efficiently extract spatial features. Simultaneously, the EW-DJDA module is introduced, which dynamically adjusts the alignment weights of the marginal and conditional distributions according to the prediction uncertainty in the target domain, thereby mitigating negative transfer. Furthermore, to tackle ambiguous decision boundaries, a Hard Negative Margin Softmax loss function is proposed, which explicitly increases the distance between the target class and the strongest interfering class, enhancing the model’s discriminability for hard-to-distinguish samples. Experiments on the Central South University, Case Western Reserve University, and Beijing Jiaotong University datasets demonstrate that DW-JDA achieves average diagnostic accuracies of 99.55%, 99.92%, and 99.91%, respectively, showing competitive or best average performance among the compared transfer learning methods. Under strong noise with a signal-to-noise ratio of −2 dB, the model maintains an average accuracy of 97.71%, indicating robustness and generalization capability.

Keywords

fault diagnosis, transfer learning, unsupervised domain adaptation, joint distribution adaptation, dynamic weighting

Introduction

Rolling bearings are the core components of rotating machinery systems, serving a pivotal role in critical infrastructure such as high-speed trains, wind turbines, and aerospace engines. Their primary function is to support mechanical loads and transmit power with minimal friction.^1–4 However, these components operate inevitably under harsh and complex industrial environments, characterized by heavy loads, continuous speed variations, and strong background noise. Such extreme conditions render bearings highly susceptible to surface degradation and structural fatigue. A minor defect, if not detected and rectified in its early stages, can rapidly propagate, compromising the reliability of the entire mechanical system and potentially precipitating catastrophic economic losses or severe safety incidents.⁵ Consequently, the development of intelligent and robust fault diagnosis methodologies is not merely a technical preference but an industrial imperative.

In the past decades, fault diagnosis has evolved from traditional signal processing to intelligent data-driven paradigms. Traditional methods largely relied on manual feature engineering, requiring domain experts to extract statistical indicators or spectral features based on prior physical knowledge. However, these manual processes are labor-intensive and often fail to capture complex, nonlinear fault patterns hidden in high-dimensional data. In recent years, with the exponential growth of industrial data and computing power, deep learning (DL) techniques, such as convolutional neural networks and recurrent neural networks, have achieved remarkable success.^6–9 By automatically learning hierarchical feature representations from raw vibration signals, DL models have significantly reduced the reliance on human intervention and improved diagnosis accuracy.^6,10

Despite these advancements, the practical deployment of standard DL models faces a fundamental obstacle: they predominantly operate under the assumption that the training and testing data are independent and identically distributed.¹¹ In real-world engineering scenarios, such as high-speed train operations, working conditions fluctuate continuously.¹² Changes in rotational speed result in shifts in characteristic fault frequencies, while variations in load alter the amplitude modulation of vibration signals. These physical changes lead to a significant distribution shift between the labeled data collected under controlled conditions and the unlabeled monitoring data acquired during actual operation.^13–16 Consequently, a model trained on a specific source domain often suffers from severe performance degradation when transferred to an unseen target domain. To mitigate this issue, unsupervised domain adaptation (UDA) methods^17–20 have been extensively adopted. The core objective of UDA is to minimize the distribution discrepancy between domains by mapping them into a shared feature subspace, thereby facilitating the transfer of diagnostic knowledge from the source domain to the unlabeled target domain.^21–23 Building on this concept, recent advancements have successfully applied domain adaptation frameworks to tackle specific environmental and operational variations. For instance, fine-tuning DL frameworks have been developed to effectively palliate data distribution shifts in rotary machine fault detection.²⁴ Furthermore, to address environmental variability, novel domain-adaptive networks have been proposed to ensure temperature-resilient structural health monitoring.²⁵

Although UDA-based approaches have demonstrated promising potential, existing research still faces two critical bottlenecks regarding distribution alignment strategies and decision boundary discriminability, which limit their performance in complex, variable-condition scenarios. First, most current UDA methods suffer from the dilemma of static alignment.^26–28 Joint distribution adaptation (JDA) aims to align both the global marginal distribution and the class-aware conditional distribution. However, existing methods typically aggregate these two alignments using equal or static weights, failing to account for the dynamic nature of the learning process.^29,30 The credibility of conditional distribution alignment is inherently dependent on the accuracy of pseudo-labels in the target domain. In the early training stages, the classifier’s predictions are often ambiguous or incorrect. Forcibly imposing strong conditional alignment based on these erroneous pseudo-labels causes the model to align samples to the wrong classes, exacerbating error propagation. This phenomenon, known as negative transfer,^31–33 severely hinders model convergence. A mechanism that can dynamically perceive the reliability of pseudo-labels and adjust the alignment focus is largely missing in the current literature.

Second, at the decision boundary level, traditional UDA diagnosis models typically employ the standard Softmax loss for classification.^34–36 While Softmax effectively maximizes the probability of the correct class, it does not explicitly impose constraints on the geometric structure of the feature space. In bearing fault diagnosis, signals from different fault types or different severity levels often exhibit highly similar spectral patterns and temporal characteristics. These hard negative samples reside dangerously close to the decision boundaries.^37,38 Softmax loss only ensures that the samples are correctly classified, but it does not guarantee a sufficient safety margin between classes. Without dedicated mechanisms to enforce sufficient interclass margin and separability, even minor domain shifts can cause these ambiguous samples to cross the decision boundary, resulting in misclassification.

To address these challenges, this article proposes a novel fault diagnosis framework termed DW-JDA. This framework integrates time–frequency analysis, uncertainty-aware domain alignment, and margin-based classification into a unified architecture. First, to fully capture transient fault features under nonstationary conditions, one-dimensional vibration signals are transformed into two-dimensional time–frequency images via continuous wavelet transform (CWT). This allows the model to leverage the powerful feature extraction capabilities of depthwise separable convolution (DSConv) while retaining time–frequency localization. Second, to resolve the static alignment dilemma, an entropy-weighted discriminative joint distribution alignment (EW-DJDA) strategy is introduced. This mechanism utilizes the information entropy of the target predictions as a proxy for uncertainty. It dynamically regulates the contribution of marginal and conditional alignments, prioritizing global alignment in the early stages and gradually shifting focus to fine-grained class alignment as pseudo-labels become reliable, thus effectively preventing negative transfer. Finally, to enhance boundary robustness, a Hard Negative Margin Softmax (HNM-Softmax) loss function is designed. Unlike standard Softmax, HNM-Softmax imposes a rigorous additive margin penalty specifically on the most confusable hard negative classes. This forces the model to learn more discriminative features, pushing hard samples away from the decision boundary and ensuring robustness against domain shifts.

The main contributions of this article are summarized as follows:

A fault diagnosis framework (DW-JDA) is proposed. By organically integrating CWT with deep feature extraction, the model effectively extracts domain-invariant representations across variable operating conditions.

An EW-DJDA strategy is developed. This method adaptively balances the importance of global and fine-grained alignment based on prediction uncertainty, ensuring a smooth and stable optimization process.

An HNM-Softmax loss function is designed to specifically target hard-to-distinguish samples. It enhances the model’s discriminative power by maximizing the decision margin between the target class and the most competitive negative classes.

Extensive experiments on three datasets (Central South University (CSU), Case Western Reserve University (CWRU), and Beijing Jiaotong University (BJTU)) demonstrate that the proposed method achieves competitive performance across varying-speed transfer tasks.

Methodology

To address the significant discrepancies in fault feature distributions and the challenge of distinguishing confusable samples in rolling bearings under cross-working-condition scenarios, a DW-JDA network based on DSConv and hard negative sample mining is proposed, as shown in Figure 1.

Figure 1.

Schematic diagram of rolling bearing fault diagnosis model.

The framework consists of three main components: a data preprocessing module based on CWT, a DSConv-based hybrid network for feature extraction, and a domain adaptation module that incorporates the HNM-Softmax classifier and EW-DJDA.

The framework is designed to learn discriminative features from labeled source domain data and generalize them to the unlabeled target domain through multilevel distribution alignment strategies.

DSConv-based hybrid network for feature extraction

Due to the pronounced nonstationary and nonlinear characteristics of vibration signals in rolling bearings, directly feeding raw one-dimensional signals into the network often fails to effectively capture transient impact components. Therefore, this article first employs CWT to convert one-dimensional time-domain vibration signals into two-dimensional time–frequency images. CWT provides excellent localization properties in both the time and frequency domains simultaneously, effectively revealing multiscale time–frequency texture features in fault signals and supplying rich informational input for subsequent deep feature extraction. The computational formula for the wavelet transform is given by:

WT (α, τ) = \frac{1}{\sqrt{α}} \int_{- \infty}^{\infty} f (t) ψ^{*} (\frac{t - τ}{α}) d t

(1)

where $α$ denotes the scale parameter of the wavelet transform, $τ$ denotes the translation parameter, and $ψ (t)$ represents the mother wavelet function. The scale $α$ controls the dilation and contraction of the wavelet function, while the translation $τ$ governs its temporal shift. The wavelet basis employed in this study is the complex Morlet wavelet cmorl3-3. This specific basis is selected because its waveform morphologically matches the transient, exponentially decaying impulse responses typical of bearing faults. By capturing both amplitude and phase, it yields a smoother, interference-free time–frequency representation compared to real-valued wavelets. Furthermore, its parameters (bandwidth $F_{b} = 3$ , center frequency $F_{c} = 3$ ) provide an optimal trade-off, balancing the time resolution needed to separate successive impacts with the frequency resolution required to localize resonance bands.
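As an illustration of Equation (1), the following minimal sketch evaluates one CWT coefficient by a Riemann sum. It assumes the commonly used complex Morlet definition $ψ (t) = (π F_{b})^{- 1 / 2} e^{2 π i F_{c} t} e^{- t^{2} / F_{b}}$ with $F_{b} = F_{c} = 3$; the function names (`cmor`, `cwt_coefficient`, `scale_for_frequency`), the test tone, and the sampling rate are hypothetical choices for the example and are not taken from the article. A practical pipeline would more likely rely on a wavelet library and then render the scalogram as an image.

```python
import cmath
import math


def cmor(t, fb=3.0, fc=3.0):
    """Complex Morlet mother wavelet (assumed standard definition)."""
    envelope = math.exp(-t * t / fb) / math.sqrt(math.pi * fb)
    return envelope * cmath.exp(2j * math.pi * fc * t)


def cwt_coefficient(signal, fs, scale, tau, fb=3.0, fc=3.0):
    """Riemann-sum approximation of Equation (1) at one (scale, tau) point.

    signal is a list of samples, fs the sampling rate in Hz;
    scale and tau are both expressed in seconds.
    """
    dt = 1.0 / fs
    total = 0j
    for n, x in enumerate(signal):
        total += x * cmor((n * dt - tau) / scale, fb, fc).conjugate()
    return total * dt / math.sqrt(scale)


def scale_for_frequency(freq_hz, fc=3.0):
    """Scale (in seconds) at which the wavelet oscillates at freq_hz."""
    return fc / freq_hz


# Illustrative input: a 50 Hz tone sampled at 1 kHz for one second.
fs = 1000.0
tone = [math.cos(2 * math.pi * 50.0 * n / fs) for n in range(1000)]

# Coefficient magnitude at the signal midpoint for three analysis frequencies.
magnitudes = {
    f: abs(cwt_coefficient(tone, fs, scale_for_frequency(f), tau=0.5))
    for f in (25.0, 50.0, 100.0)
}
```

In this toy case the magnitude peaks at the scale matching the tone frequency, which is the localization property the time–frequency images rely on.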

In the feature extraction stage, to ensure strong feature representation capability while reducing model parameters and computational complexity, this article adopts the backbone network structure based on DSConv proposed in our previous work MPAIT-Net.³⁹ Unlike standard convolution, DSConv decomposes the standard convolution operation into two sequential steps: Depthwise Convolution and Pointwise Convolution.

Specifically, let the input feature map be denoted as $X \in R^{H \times W \times C_{in}}$ , where $H$ , $W$ , and $C_{in}$ represent the height, width, and number of input channels, respectively. First, depthwise convolution applies a single convolutional filter $K_{dw} \in R^{k \times k \times C_{in}}$ to each input channel independently to capture spatial features. The intermediate feature map $M$ is calculated as follows:

M_{c, i, j} = X_{c} * K_{c} = \sum_{u, v} K_{c, u, v} \cdot X_{c, i + u, j + v}

(2)

where $K_{c}$ denotes the convolutional kernel corresponding to the $c$th channel, $(u, v)$ represents the spatial coordinates within the kernel, and $*$ denotes the convolution operation.

Subsequently, pointwise convolution employs $1 \times 1$ convolution kernels $K_{pw} \in R^{1 \times 1 \times C_{in} \times C_{out}}$ to project the intermediate features into a new channel space, enabling information interaction across channels. The final output feature map $Y$ is obtained by

Y_{n, i, j} = \sum_{c = 1}^{C_{in}} K_{pw, n, c} \cdot M_{c, i, j}

(3)

where $n$ indicates the index of the output channel $(n = 1, 2, \ldots, C_{out})$.
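The two steps in Equations (2) and (3) can be sketched on nested Python lists as follows. This is only an illustration under stated assumptions: stride 1 and no padding are assumed, the function names and the tiny input are hypothetical, and the actual backbone would use an optimized deep learning implementation.

```python
def depthwise_conv(x, k_dw):
    """Equation (2): one k x k filter per input channel.

    x: [C_in][H][W], k_dw: [C_in][k][k] -> M: [C_in][H-k+1][W-k+1]
    (stride 1 and no padding are assumptions of this sketch).
    """
    k = len(k_dw[0])
    h_out = len(x[0]) - k + 1
    w_out = len(x[0][0]) - k + 1
    return [
        [
            [
                sum(k_dw[c][u][v] * x[c][i + u][j + v]
                    for u in range(k) for v in range(k))
                for j in range(w_out)
            ]
            for i in range(h_out)
        ]
        for c in range(len(x))
    ]


def pointwise_conv(m, k_pw):
    """Equation (3): 1 x 1 convolution that mixes channels.

    m: [C_in][H][W], k_pw: [C_out][C_in] -> Y: [C_out][H][W]
    """
    c_in = len(m)
    return [
        [
            [sum(k_pw[n][c] * m[c][i][j] for c in range(c_in))
             for j in range(len(m[0][0]))]
            for i in range(len(m[0]))
        ]
        for n in range(len(k_pw))
    ]


# Tiny example: 2 input channels, 3 x 3 maps, 2 x 2 kernels, 1 output channel.
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[1, 0, 1], [0, 1, 0], [1, 0, 1]]]
k_dw = [[[1, 0], [0, 1]],   # channel 0: sum along the window diagonal
        [[1, 1], [1, 1]]]   # channel 1: sum over the whole window
k_pw = [[1, -1]]            # output channel = channel 0 minus channel 1

m = depthwise_conv(x, k_dw)
y = pointwise_conv(m, k_pw)
```

The depthwise step never mixes channels; all cross-channel interaction happens in the pointwise step.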

Through this decomposition, the computational cost is significantly reduced. Assuming an output channel number of $C_{out}$ and a kernel size of $k \times k$ , the ratio of the computational complexity of DSConv to that of standard convolution $(Ω_{std})$ can be expressed as follows:

\begin{matrix} \frac{Ω_{DSConv}}{Ω_{std}} = \frac{H \cdot W \cdot C_{in} \cdot k^{2} + H \cdot W \cdot C_{in} \cdot C_{out} \cdot 1^{2}}{H \cdot W \cdot C_{in} \cdot C_{out} \cdot k^{2}} \\ = \frac{1}{C_{out}} + \frac{1}{k^{2}} \end{matrix}

(4)

This formula indicates that as the number of channels and kernel size increase, DSConv significantly reduces computational load compared to standard convolution. After extracting local spatial features using these DSConv-based MPAConv blocks, the feature maps are processed through a patch convolution (Patch_conv) and rearrange operation to generate token sequences. These sequences are then fed into an improved vision transformer module. By combining the local inductive bias of DSConv with the global receptive field of the transformer, the complete backbone network (as detailed in Table 2) achieves highly robust and comprehensive feature representations. The extracted high-dimensional features are then mapped to a low-dimensional shared feature space via a fully connected bottleneck layer (Bottleneck), enabling subsequent alignment and classification by the classifier and domain adaptation module.
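The ratio in Equation (4) can be checked numerically by counting multiply–accumulate operations for both variants. In the sketch below, the spatial size and channel count follow the 24 × 56 × 56 feature map listed in Table 2, while the 3 × 3 kernel and the choice $C_{in} = C_{out}$ are assumptions made only for this example; the function names are hypothetical.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates of a standard k x k convolution."""
    return h * w * c_in * c_out * k * k


def dsconv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates of depthwise (k x k) plus pointwise (1 x 1)."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


def cost_ratio(h, w, c_in, c_out, k):
    """Left-hand side of Equation (4)."""
    return dsconv_macs(h, w, c_in, c_out, k) / standard_conv_macs(h, w, c_in, c_out, k)


# Assumed example: 56 x 56 map, 24 -> 24 channels, 3 x 3 kernel.
ratio = cost_ratio(56, 56, 24, 24, 3)
closed_form = 1 / 24 + 1 / 3 ** 2  # right-hand side of Equation (4)
```

For these assumed sizes the separable variant needs roughly 15% of the operations of a standard convolution.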

Entropy-weighted discriminative joint distribution alignment

To achieve discriminative feature learning with hard alignment for unlabeled target samples, an EW-DJDA module is designed in the shared feature space, as shown in Figure 2. This module is jointly optimized with the classification loss during the domain adaptation stage, guiding the feature extraction network to simultaneously reduce the marginal distribution discrepancies and class-conditional distribution discrepancies between the source and target domains, thereby obtaining more discriminative domain-invariant features.

Figure 2.

Schematic diagram of the EW-DJDA module. EW-DJDA: entropy-weighted discriminative joint distribution alignment.

Source domain and target domain samples are mapped to a shared feature space through a weight-shared feature extractor, yielding deep features $f^{s}$ and $f^{t}$. The EW-DJDA module aligns the distributions of the two domains by leveraging the ground-truth labels $y^{s}$ from the source domain and pseudo-labels ${\hat{y}}^{t}$ from the target domain. In particular, the module generates a dynamic factor $μ$ by computing the entropy of target domain predictions, thereby adaptively balancing the adaptation weights between marginal and conditional distributions.

The detailed procedure is as follows: Let the feature extraction network be denoted as $G (\cdot)$ and the classifier as $C (\cdot)$. For labeled source domain samples $\{(x_{i}^{s}, y_{i}^{s})\}_{i = 1}^{n_{s}}$ and unlabeled target domain samples $\{x_{j}^{t}\}_{j = 1}^{n_{t}}$, their representations in the shared feature space are, respectively, given by:

f_{i}^{s} = G (x_{i}^{s}), f_{j}^{t} = G (x_{j}^{t})

(5)

First, the multikernel maximum mean discrepancy (MK-MMD) is employed to measure the distance between the source and target domains at the global distribution level, thereby facilitating marginal distribution alignment. Let $ϕ (\cdot)$ denote the implicit feature mapping into the reproducing kernel Hilbert space $H$. The marginal distribution alignment loss is then formulated as follows:

L_{m} = ‖ \frac{1}{n_{s}} \sum_{i = 1}^{n_{s}} ϕ (f_{i}^{s}) - \frac{1}{n_{t}} \sum_{j = 1}^{n_{t}} ϕ (f_{j}^{t}) ‖_{H}^{2}

(6)

$L_{m}$ quantifies the discrepancy between the overall feature distributions of the source and target domains, serving as the foundation for stable alignment in the early stages of domain adaptation.
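Because $ϕ (\cdot)$ is implicit, the squared norm in Equation (6) is usually evaluated through kernel values. The sketch below computes a biased estimate with an average of Gaussian kernels; the bandwidths `(0.5, 1.0, 2.0)`, the function names, and the toy feature vectors are assumptions for illustration, since the kernel family and bandwidths are not specified here.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def multi_kernel(a, b, sigmas):
    """Average of Gaussian kernels with several bandwidths."""
    return sum(gaussian_kernel(a, b, s) for s in sigmas) / len(sigmas)


def mean_kernel(xs, ys, sigmas):
    """Mean kernel value over all pairs drawn from xs and ys."""
    total = sum(multi_kernel(x, y, sigmas) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mk_mmd2(source, target, sigmas=(0.5, 1.0, 2.0)):
    """Biased estimate of Equation (6), expanded with the kernel trick."""
    return (mean_kernel(source, source, sigmas)
            + mean_kernel(target, target, sigmas)
            - 2.0 * mean_kernel(source, target, sigmas))


# Toy features: the second set is the first one shifted by (3, 3).
source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shifted = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]

same = mk_mmd2(source, source)   # identical sets
far = mk_mmd2(source, shifted)   # clearly separated sets
```

Identical feature sets give a discrepancy of zero, while the shifted set yields a clearly positive value, which is the quantity the marginal alignment term drives down.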

Building upon marginal alignment, a class-conditional distribution alignment term is introduced to further enhance feature discriminability. Ground-truth labels $y_{i}^{s}$ are used for source domain samples, while class probabilities output by the classifier are employed for target domain samples:

p_{j}^{t} = C (f_{j}^{t})

(7)

Pseudo-labels are obtained as ${\hat{y}}_{j}^{t} = \arg\max_{k} p_{j}^{t} (k)$. Let $n_{s}^{k}$ and $n_{t}^{k}$ denote the number of samples in class $k$ for the source and target domains, respectively, with corresponding feature sets $\{f_{i}^{s, k}\}$ and $\{f_{j}^{t, k}\}$. The conditional distribution alignment loss is then:

L_{c} = \sum_{k = 1}^{K} ‖ \frac{1}{n_{s}^{k}} \sum_{i : y_{i}^{s} = k} ϕ (f_{i}^{s, k}) - \frac{1}{n_{t}^{k}} \sum_{j : {\hat{y}}_{j}^{t} = k} ϕ (f_{j}^{t, k}) ‖_{H}^{2}

(8)

where $K$ is the number of fault classes. $L_{c}$ reduces the distance between intraclass samples from the source and target domains at the class level, contributing to discriminative feature learning with hard alignment.

Since pseudo-label quality in the target domain is poor in early training and gradually improves, directly enforcing strong conditional alignment may lead to erroneous alignment. To address this, an adaptive weight $μ_{k}$ is constructed based on the entropy of target domain prediction distributions, automatically modulating the relative importance of marginal and conditional alignment during training. For each target sample, the prediction entropy is defined as follows:

H (p_{j}^{t}) = - \sum_{k = 1}^{K} p_{j}^{t} (k) \log p_{j}^{t} (k)

(9)

which is normalized by the number of classes:

H^{″} (p_{j}^{t}) = \frac{H (p_{j}^{t})}{\log K}

(10)

Lower $H^{″} (p_{j}^{t})$ indicates higher prediction confidence for the sample. However, reducing the target uncertainty to a single batch-level scalar fails to detect class-specific pseudo-label collapse. To address this, a classwise reliability weighting mechanism is proposed. For each specific fault category k (k = 1, 2, …, K), the classwise adaptive coefficient $μ_{k}$ is defined based on the average prediction entropy of the target samples assigned to pseudo-class k:

μ_{k} = 1 - \frac{1}{n_{t}^{k}} \sum_{f_{j}^{t} \in X_{t}^{(k)}} H^{″} (p_{j}^{t})

(11)

where $X_{t}^{(k)}$ represents the set of target samples assigned with pseudo-label k, and $n_{t}^{k}$ is the number of samples in this set. Thus, if the model is highly uncertain about a specific pseudo-class k (high entropy), $μ_{k}$ becomes small, independently suppressing the conditional alignment for this corrupted class to prevent negative transfer. Conversely, for classes with confident predictions, $μ_{k}$ automatically increases. In implementation, when no target samples are assigned to pseudo-class k in a mini-batch (i.e., $n_{t}^{k}$ = 0), the corresponding reliability coefficient is set to $μ_{k}$ = 0, and the class-conditional alignment term for that class is skipped in the current iteration. This treatment avoids numerical instability and prevents empty or unreliable pseudo-class estimates from affecting optimization.
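Equations (9)–(11), together with the empty-class rule described above, can be sketched as follows. The function names and the example probability vectors are hypothetical and chosen only to show a confident class, an uncertain class, and an empty class in one mini-batch.

```python
import math


def normalized_entropy(p):
    """Equations (9)-(10): Shannon entropy divided by log K."""
    h = -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    return h / math.log(len(p))


def classwise_reliability(probs, num_classes):
    """Equation (11): mu_k = 1 - mean normalized entropy of pseudo-class k."""
    entropy_sums = [0.0] * num_classes
    counts = [0] * num_classes
    for p in probs:
        k_hat = max(range(num_classes), key=p.__getitem__)  # hard pseudo-label
        entropy_sums[k_hat] += normalized_entropy(p)
        counts[k_hat] += 1
    # Pseudo-classes with no samples get mu_k = 0, so their term is skipped.
    return [1.0 - entropy_sums[k] / counts[k] if counts[k] else 0.0
            for k in range(num_classes)]


# Hypothetical target predictions for K = 3 classes.
probs = [
    [0.98, 0.01, 0.01],  # confident, pseudo-class 0
    [0.97, 0.02, 0.01],  # confident, pseudo-class 0
    [0.30, 0.40, 0.30],  # nearly uniform, pseudo-class 1
]                        # no sample is assigned to pseudo-class 2

mu = classwise_reliability(probs, num_classes=3)
```

Here the confident class receives a weight close to 1, the ambiguous class a weight close to 0, and the empty class exactly 0.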

It is worth noting that although Equation (8) utilizes hard pseudo-labels, the potential negative transfer caused by noisy predictions is effectively suppressed by the classwise reliability weighting mechanism $μ_{k}$ . Unlike traditional confidence-filtering methods that require manual thresholds, our EW-DJDA adaptively scales the alignment importance based on the continuous entropy of each class, ensuring that only clusters with high prediction certainty contribute to the conditional distribution alignment.

To maintain the balance between global marginal alignment and fine-grained conditional alignment, the loss for the entropy-weighted discriminative joint distribution alignment ( $L_{EW - DJDA}$ ) is reformulated as follows:

L_{EW - DJDA} = (1 - \bar{μ}) L_{m} + \sum_{k = 1}^{K} μ_{k} L_{c}^{(k)}

(12)

where $\bar{μ} = \frac{1}{K} \sum_{k = 1}^{K} μ_{k}$ is the mean reliability across all classes, and $L_{c}^{(k)}$ represents the conditional distribution discrepancy for the specific class k (i.e., the specific term inside the summation of Equation (8)). This fine-grained, classwise weighting strategy dynamically distinguishes reliable classes from confused ones within the same batch, ensuring a highly robust distribution alignment. Using the mean reliability $\bar{μ}$ for the marginal term helps keep its contribution at a comparable scale to the aggregated classwise conditional term $\sum_{k = 1}^{K} μ_{k} L_{c}^{(k)}$ , thereby improving optimization stability across minibatches with different pseudo-label compositions.
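A short numerical sketch of Equation (12) shows how the weighting shifts between the two terms. The loss values and reliability coefficients below are invented for illustration, and `ew_djda_loss` is a hypothetical helper name.

```python
def ew_djda_loss(l_m, l_c_per_class, mu):
    """Equation (12): (1 - mean(mu)) * L_m + sum_k mu_k * L_c^(k)."""
    mu_bar = sum(mu) / len(mu)
    conditional = sum(m_k * l_k for m_k, l_k in zip(mu, l_c_per_class))
    return (1.0 - mu_bar) * l_m + conditional


# Same raw discrepancies, different pseudo-label reliability (illustrative).
l_m = 0.8
l_c = [0.5, 0.5, 0.5]

early = ew_djda_loss(l_m, l_c, mu=[0.05, 0.05, 0.05])  # unreliable pseudo-labels
late = ew_djda_loss(l_m, l_c, mu=[0.90, 0.90, 0.90])   # reliable pseudo-labels
```

With low reliability the marginal term carries almost all of the loss (0.76 of 0.835 in this toy case), whereas with high reliability the classwise conditional terms dominate (1.35 of 1.43).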

Hard-Negative Margin Softmax

Conventional loss functions optimize global probability distributions by treating all negative classes equally, but in fault diagnosis, feature confusion mainly occurs between similar classes, and ignoring hard negatives near decision boundaries can lead to misclassifications under domain shifts. To improve discriminability for confusable faults, we propose HNM-Softmax, which applies relative margin constraints to explicitly increase the distance between the target class and the most confusable interfering class, as shown in Figure 3.

Figure 3.

Schematic diagram of decision boundaries in the feature space. (a) Standard Softmax leads to confusion of difficult-to-classify samples at the boundaries and (b) HNM-Softmax introduces relative margin constraints, significantly increasing the inter-class distance. HNM-Softmax: Hard Negative Margin Softmax.

For an input sample $x_{i}$ , let its ground-truth label be $y_{i}$ and the model’s output logit vector be $z$ . HNM-Softmax first performs hard negative mining, identifying the nontarget class with the highest prediction score as the hardest negative sample:

{\hat{y}}_{hard}^{(i)} = \max_{j \neq y_{i}} (z_{j}^{(i)})

(13)

Here, $z_{j}^{(i)}$ denotes the logit value of sample $i$ for class $j$ . To ensure sufficient classifier robustness, a relative margin constraint is introduced, requiring the target class score $z_{y_{i}}^{(i)}$ to exceed the hardest negative sample score ${\hat{y}}_{hard}^{(i)}$ by at least a predefined threshold $m_{hn}$ . Based on this, the hard negative margin loss term is formulated as:

L_{margin} = \frac{1}{N} \sum_{i = 1}^{N} \max (0, m_{hn} - (z_{y_{i}}^{(i)} - {\hat{y}}_{hard}^{(i)}))

(14)

where $N$ is the batch size and $m_{hn}$ is the relative margin hyperparameter. Gradients are produced only when the gap between the target class and the strongest confusable class is less than $m_{hn}$ . This mechanism forces the model to concentrate optimization efforts on hard-to-distinguish boundary samples rather than expending resources on already well-separated easy samples.

Ultimately, the HNM-Softmax loss $L_{HNM}$ is defined as the weighted combination of the standard cross-entropy loss $L_{CE}$ and the hard negative margin term $L_{margin}$ :

L_{HNM} = L_{CE} + β L_{margin}

(15)

where $β$ is the balancing coefficient. The cross-entropy term $L_{CE}$ secures the baseline prediction probability for the target class, while $L_{margin}$ adds the explicit separation from the hardest negative class.
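Equations (13)–(15) can be sketched in plain Python as follows. The values $m_{hn} = 1.0$ and $β = 0.5$, the function names, and the example logits are assumptions made for illustration only.

```python
import math


def cross_entropy(logits, label):
    """Numerically stable -log softmax(logits)[label]."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def hard_negative_margin(logits, label, m_hn):
    """Per-sample hinge term of Equation (14)."""
    # Equation (13): highest logit among the non-target classes.
    hardest = max(z for j, z in enumerate(logits) if j != label)
    return max(0.0, m_hn - (logits[label] - hardest))


def hnm_softmax_loss(batch_logits, labels, m_hn=1.0, beta=0.5):
    """Equation (15): batch-mean cross-entropy plus beta times the margin term."""
    n = len(labels)
    l_ce = sum(cross_entropy(z, y) for z, y in zip(batch_logits, labels)) / n
    l_margin = sum(hard_negative_margin(z, y, m_hn)
                   for z, y in zip(batch_logits, labels)) / n
    return l_ce + beta * l_margin, l_ce, l_margin


# Two samples of class 0 (illustrative logits).
logits = [[4.0, 0.5, 0.1],    # gap 3.5 exceeds m_hn: no margin penalty
          [1.2, 1.0, -0.3]]   # gap 0.2 is below m_hn: penalty 0.8
labels = [0, 0]

total, l_ce, l_margin = hnm_softmax_loss(logits, labels)
```

Only the second sample contributes to the margin term, which illustrates how the penalty is concentrated on samples whose target logit is not yet separated from the hardest negative by at least $m_{hn}$.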

The overall optimization objective of HNM-Softmax is to explicitly maximize the margin and separability between confusable classes, ensuring robust decision boundaries. Through this targeted hard negative suppression strategy, the feature extractor learns discriminative features with clearer decision boundaries, thereby effectively supporting the EW-DJDA module in achieving more precise alignment under complex operating conditions.

Optimization objective and model training

The overall optimization objective comprises two components: one is the hard negative margin classification loss $L_{HNM}$, designed to enforce explicit interclass margin constraints and enhance feature separability, and the other is the entropy-weighted discriminative joint distribution alignment loss $L_{EW - DJDA}$, aimed at reducing interdomain distribution discrepancies. Unlike traditional methods that employ the standard cross-entropy loss, this article utilizes the $L_{HNM}$ proposed in the “Hard-Negative Margin Softmax” section as the source domain supervised loss to enhance the model’s discriminability for confusable faults. Combining Equations (12) and (15), the total loss function of the model $L_{total}$ is defined as follows:

L_{total} = L_{HNM} + λ_{ew - djda} L_{EW - DJDA}

(16)

where $λ_{ew - djda}$ is the regularization coefficient that balances classification performance and domain adaptation effects. During training, $L_{HNM}$ forces the feature extractor to learn discriminative features with clear decision boundaries, particularly targeting hard-to-distinguish samples near the boundaries, while $L_{EW - DJDA}$ guides the model toward domain-invariant feature representations by dynamically adjusting the alignment weights between marginal and conditional distributions.

Considering the training stability of deep domain adaptation networks, enforcing alignment in the early stages of training, when the feature extractor has not yet extracted effective discriminative features, may induce negative transfer. To mitigate this, a staged training strategy is adopted in this paper. When the training epoch is less than the threshold $T_{da}$ , only the source domain supervised classification loss is used for pretraining, with $λ_{ew - djda} = 0$ . This stage leverages source domain label information to rapidly initialize the feature extractor parameters, establishing preliminary feature extraction and classification capabilities. Once the epoch exceeds $T_{da}$ , target domain data are incorporated and the domain alignment module is activated. In this subsequent stage, the model simultaneously minimizes hard negative sample classification errors and distribution discrepancies, with the hard sample mining mechanism of HNM-Softmax facilitating more precise class-conditional alignment by EW-DJDA.
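The staged schedule can be written as a gate on the alignment weight in Equation (16). The sketch below assumes zero-based epoch counting and activates alignment once the epoch index reaches $T_{da}$; this boundary convention, the value $λ_{ew - djda} = 1.0$, and the helper name `total_loss` are assumptions, and the loss values are illustrative.

```python
def total_loss(l_hnm, l_ew_djda, epoch, t_da, lam):
    """Equation (16) with staged training: lambda is 0 before epoch t_da."""
    weight = lam if epoch >= t_da else 0.0
    return l_hnm + weight * l_ew_djda


# Illustrative values: same raw losses before and after activation.
warmup = total_loss(l_hnm=1.0, l_ew_djda=0.5, epoch=10, t_da=30, lam=1.0)
adapt = total_loss(l_hnm=1.0, l_ew_djda=0.5, epoch=30, t_da=30, lam=1.0)
```

During the warm-up stage only the source classification loss contributes; afterwards the alignment loss is added with weight $λ_{ew - djda}$.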

Experimental analysis and validation

To comprehensively validate the effectiveness of the proposed method and its generalization capability across different operating conditions, three bearing datasets were selected for experimental evaluation. These include one self-constructed laboratory dataset (the CSU dataset) and two publicly available datasets: the CWRU dataset and the BJTU dataset. The datasets encompass a wide range of rotational speeds from low to high, varying sampling frequencies, and complex fault patterns, thereby providing diverse data support for assessing the diagnostic performance of the model. Detailed parameter information for the three datasets is presented in Table 1.

Table 1.

Detailed information of the datasets.

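Name	Dataset	Fault category	Fault diameter (inch)	Label	Speed (r/min)	Sampling frequency (kHz)
A	CWRU	NC	—	0	1772/1750/1730	12
A	CWRU	IF	0.007	1	1772/1750/1730	12
A	CWRU	BF	0.007	2	1772/1750/1730	12
A	CWRU	OF	0.007	3	1772/1750/1730	12
A	CWRU	IF	0.014	4	1772/1750/1730	12
A	CWRU	BF	0.014	5	1772/1750/1730	12
A	CWRU	OF	0.014	6	1772/1750/1730	12
A	CWRU	IF	0.021	7	1772/1750/1730	12
A	CWRU	BF	0.021	8	1772/1750/1730	12
A	CWRU	OF	0.021	9	1772/1750/1730	12
B	CSU	NC	—	0	1200/1800/2100	48
B	CSU	BF	/	1	1200/1800/2100	48
B	CSU	CF	/	2	1200/1800/2100	48
B	CSU	OF-P	/	3	1200/1800/2100	48
B	CSU	OF-C	/	4	1200/1800/2100	48
C	BJTU	NC	—	0	1200/2400/3600	64
C	BJTU	IF	/	1	1200/2400/3600	64
C	BJTU	OF	/	2	1200/2400/3600	64
C	BJTU	BF	/	3	1200/2400/3600	64
C	BJTU	CF	/	4	1200/2400/3600	64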

Note. CSU: Central South University; CWRU: Case Western Reserve University; BJTU: Beijing Jiaotong University.

CSU dataset: This dataset was acquired from a small bogie test rig at the Key Laboratory of Rail Traffic Safety, Ministry of Education, CSU, as shown in Figure 4. Vibration signals were recorded at a sampling frequency of 48 kHz. The data encompass three operating speeds: 1200, 1800, and 2100 r/min. This dataset emphasizes complex fault scenarios, including five distinct health states: normal condition (NC), ball fault (BF), cage fracture (CF), and two specific outer race faults (outer race pitting (OF-P) and outer race crack (OF-C)). In contrast to the simplified fault classifications in other datasets, it incorporates specific damage patterns such as pitting and cracking (OF-P and OF-C) as well as cage fracture (CF).

Figure 4.

Small bogie test bench.

CWRU dataset⁴⁰: This dataset originates from the CWRU Bearing Data Center. The experiments were conducted using a test rig comprising an electric motor, a torque transducer, and a dynamometer. Vibration signals were acquired from the drive-end bearings and fan-end bearings at a sampling frequency of 12 kHz. The dataset encompasses three distinct rotational speeds: 1772, 1750, and 1730 r/min. The bearing health conditions include one normal state (NC) and three different fault types: inner race fault (IF), ball fault (BF), and outer race fault (OF). For each fault type, single-point defects were introduced via electro-discharge machining, with fault severities categorized into three levels (fault diameters): 0.007, 0.014, and 0.021 inches. Consequently, for each rotational speed, the dataset comprises 10 distinct health states (one normal state + three fault types × three fault diameters).

BJTU dataset⁴¹: This dataset is sourced from the test rig at BJTU. It encompasses high-speed operating conditions, with motor rotational speeds set to 1200, 2400, and 3600 r/min, respectively. Vibration signals were recorded at a sampling frequency of 64 kHz. The dataset includes five bearing health states: normal condition (NC), inner race fault (IF), outer race fault (OF), ball fault (BF), and cage fracture (CF).

To further validate the robustness and effectiveness of the proposed method under variable operating conditions, cross-condition diagnostic experiments were designed within the aforementioned three datasets. These experiments simulate common rotational speed fluctuation scenarios in industrial settings by adopting a transfer learning paradigm: data from one specific speed serve as the source domain, while data from different speeds are treated as the target domain for cross-domain fault diagnosis. All experiments were conducted on a unified computing platform to ensure fair comparisons. The hardware environment features an Intel Core i5-12600KF CPU and an NVIDIA GeForce RTX 4070Ti SUPER GPU. The software development environment is based on the Windows operating system, utilizing Python 3.9 as the programming language, with the model implemented under the PyTorch 2.2.1 DL framework and GPU acceleration enabled via CUDA 12.1.

The initial learning rate was set to 0.001, and the Adam optimizer was employed for model parameter updates. To prevent overfitting and enhance generalization capability, a weight decay coefficient of 2 × 10⁻⁴ was applied. The total number of training epochs was set to 100. To ensure that the feature extractor acquires sufficiently discriminative features from the source domain prior to domain alignment, a staged training strategy was adopted: only source-domain classification training is performed during the first 30 epochs, and the domain adaptation loss is introduced from epoch 30 onward for cross-domain transfer optimization. Additionally, a step-wise learning rate schedule was used during training, with a decay factor of 0.1 applied at epochs 50 and 75. These milestones were determined through empirical observation of the training loss curves during preliminary experiments. Specifically, after the domain adaptation module is activated at epoch 30, the loss typically begins to plateau around epoch 50. Applying the first decay at this stage helps the optimizer damp local oscillations and settle into a better minimum. The second decay at epoch 75 facilitates fine-grained parameter tuning in the final stages, thereby promoting stable convergence.
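
The staged schedule described above can be sketched in a few lines of plain Python. This is only an illustration of the stated hyperparameters, not the authors' training script: the function names are hypothetical, and epochs are assumed to be 0-indexed, so that epochs 0 to 29 are source-only.

```python
# Hyperparameters quoted in the text; all helper names are hypothetical.
BASE_LR = 1e-3
TOTAL_EPOCHS = 100
DA_START_EPOCH = 30
LR_MILESTONES = (50, 75)
LR_DECAY = 0.1


def learning_rate(epoch: int) -> float:
    """Step-wise schedule: multiply by LR_DECAY once per milestone reached."""
    passed = sum(1 for milestone in LR_MILESTONES if epoch >= milestone)
    return BASE_LR * (LR_DECAY ** passed)


def adaptation_active(epoch: int) -> bool:
    """Assumes 0-indexed epochs: 0..29 source-only, 30..99 with adaptation loss."""
    return epoch >= DA_START_EPOCH


if __name__ == "__main__":
    for epoch in (0, 29, 30, 49, 50, 74, 75, 99):
        print(epoch, adaptation_active(epoch), f"{learning_rate(epoch):.0e}")
```

Under these assumptions, the adaptation loss is active for 70 of the 100 epochs, and the learning rate takes the values 1e-3, 1e-4, and 1e-5 in the three phases.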

The sliding window length is set to 1024 samples, with a sliding step size of 1024. Time-series segments are obtained via this sliding window approach, and each segment is subsequently subjected to CWT. The sample acquisition process is illustrated in Figure 5. The detailed architecture of the model is presented in Table 2.

Figure 5.

Sample acquisition process.

Table 2.

Detailed structure of the model.

Module	Layer type	Input size	Output size
Feature extractor	Downsample_1	3 × 224 × 224	12 × 112 × 112
	MPAConv _1	12 × 112 × 112	12 × 112 × 112
	Downsample_2	12 × 112 × 112	24 × 56 × 56
	MPAConv _2	24 × 56 × 56	24 × 56 × 56
	Downsample_3	24 × 56 × 56	48 × 28 × 28
	MPAConv _3	48 × 28 × 28	48 × 28 × 28
	Downsample_4	48 × 28 × 28	96 × 14 × 14
	MPAConv _4	96 × 14 × 14	96 × 14 × 14
	Downsample_5	96 × 14 × 14	192 × 7 × 7
	Patch_conv	192 × 7 × 7	192 × 7 × 7
	Rearrange	192 × 7 × 7	49 × 192
	Improved Vision Transformer	49 × 192	1 × 192
Bottleneck	FC1	1 × 192	1 × 128
	ReLU + dropout	1 × 128	1 × 128
	FC2	1 × 128	1 × 64
	ReLU + dropout	1 × 64	1 × 64
Classifier	Classification layer	1 × 64	1 × num_classes

Note. Patch_conv: Patch Convolution; ReLU: rectified linear unit.
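
The tensor sizes in Table 2 follow a simple pattern that can be checked with a short shape trace. The sketch below only reproduces the sizes listed in the table; it says nothing about the internals of each layer, the function names are hypothetical, and the channel rule (3 to 12 in the first stage, then doubling) is read off the table rather than taken from a stated design rule.

```python
def downsample(shape):
    """Rule read off Table 2: halve H and W; channels go 3 -> 12, then double."""
    channels, height, width = shape
    out_channels = 12 if channels == 3 else 2 * channels
    return (out_channels, height // 2, width // 2)


def trace_feature_extractor(input_shape=(3, 224, 224), stages=5):
    """Return the map size after each Downsample stage.

    The MPAConv and Patch_conv layers keep the size unchanged in Table 2,
    so only the Downsample stages alter the shape.
    """
    shapes = [input_shape]
    for _ in range(stages):
        shapes.append(downsample(shapes[-1]))
    return shapes


def rearrange_to_tokens(shape):
    """Flatten a C x H x W map into H*W tokens of dimension C."""
    channels, height, width = shape
    return (height * width, channels)


if __name__ == "__main__":
    shapes = trace_feature_extractor()
    print(shapes)
    print(rearrange_to_tokens(shapes[-1]))
```

The final map of 192 × 7 × 7 becomes 49 tokens of dimension 192, matching the Rearrange row of the table.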

To rigorously prevent any potential data leakage during model training and evaluation, the data preparation protocol follows a strict split order. Given the UDA paradigm, the training (source) and testing (target) domains are naturally partitioned by their distinct physical operating conditions (e.g., different rotational speeds). This physically guarantees that the train and test domains are split prior to any segmentation at the raw acquisition level. Therefore, all segments derived from a single raw recording are exclusively kept within a single domain partition, strictly eliminating cross-domain leakage.

Furthermore, the utilized sliding window length is 1024 points with a step size of 1024 points, ensuring that all time-series segments are strictly nonoverlapping. Taking the CSU dataset as a representative example, each raw acquisition record for a specific health state under a given speed contains exactly 245,760 raw data points. Applying the nonoverlapping sliding window yields exactly 240 derived segments per class for each domain. Consequently, a complete single domain comprising five health states contains a total of 1,228,800 raw data points, which translates to exactly 1200 independent segments. Within each domain, these independent segments are then randomly partitioned into 80% for model training/adaptation and 20% for validation, strictly maintaining class balance without segment overlap.
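
The segment counts quoted above can be reproduced with a minimal sketch of nonoverlapping windowing and a per-class 80/20 split. The helper names and the fixed random seed are assumptions made for illustration, and a list of integers stands in for a raw recording.

```python
import random

WINDOW = 1024  # window length in samples (from the text)
STEP = 1024    # step equal to the window length, so segments do not overlap


def segment(signal, window=WINDOW, step=STEP):
    """Cut a 1-D sequence into fixed-length windows, dropping any short tail."""
    return [signal[i:i + window] for i in range(0, len(signal) - window + 1, step)]


def split_indices(n_segments, train_fraction=0.8, seed=0):
    """Randomly split the segment indices of one class into train and validation."""
    indices = list(range(n_segments))
    random.Random(seed).shuffle(indices)
    cut = round(n_segments * train_fraction)
    return indices[:cut], indices[cut:]


record = list(range(245_760))  # stand-in for one raw CSU recording
segments = segment(record)
train_idx, val_idx = split_indices(len(segments))
print(len(segments), len(train_idx), len(val_idx), 5 * len(segments))
```

Each recording of 245,760 points yields 240 segments, which an 80/20 split divides into 192 and 48 segments per class, and five classes give 1200 segments per domain.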

To comprehensively evaluate the performance superiority of the proposed method, representative classical DL models and state-of-the-art domain adaptation methods were selected as baselines for comparison. The compared methods include conditional adversarial domain adaptation network (CADAN),⁴² deep discriminative domain adaptation network (DDDAN),⁴³ deep correlation alignment (DCORAL),⁴⁴ domain-adversarial neural network (DANN),⁴⁵ adaptive intermediate class-wise distribution alignment (AICDA),⁴⁶ and dynamic joint distribution adaptation (DJDA).⁴⁷ To mitigate the effects of randomness in network initialization and optimization, each transfer task was repeated independently five times under identical settings. The final result reported in the tables is the arithmetic mean of the five target-domain accuracies, and the corresponding standard deviation is also provided.
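
The reported "mean ± standard deviation" entries can be computed as sketched below. The accuracies used here are placeholder values, not results from this article, and the text does not state whether the sample or the population standard deviation was used, so both are shown.

```python
from statistics import mean, pstdev, stdev


def summarize(accuracies):
    """Return (mean, sample standard deviation, population standard deviation)."""
    return mean(accuracies), stdev(accuracies), pstdev(accuracies)


# Placeholder accuracies for five hypothetical runs (NOT results from the article).
example_runs = [99.0, 99.5, 100.0, 99.5, 99.0]
avg, sample_sd, population_sd = summarize(example_runs)
print(f"{avg:.2f} +/- {sample_sd:.2f} (sample) or +/- {population_sd:.2f} (population)")
```

With only five runs, the two conventions differ noticeably, so the choice affects how the error bars should be read.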

Experiments based on the CSU dataset

This section evaluates the effectiveness of DW-JDA under complex fault modes and varying speeds using the self-built CSU dataset. This dataset incorporates diverse failure modes, such as pitting, cracks, and cage breakage. To mimic the operational variations inherent in industrial environments, six cross-condition transfer tasks (TB1–TB6) were established by pairing data from three distinct speeds: 1200, 1800, and 2100 r/min. Detailed task settings and the resulting average diagnostic accuracies for all models are presented in Tables 3 and 4, respectively.

Table 3.

Cross-condition transfer tasks on the CSU dataset.

Task	Source domain category	Target domain category	Scenario
TB1	NC, BF, CF, OF-P, OF-C	NC, BF, CF, OF-P, OF-C	0 → 1
TB2			0 → 2
TB3			1 → 2
TB4			1 → 0
TB5			2 → 0
TB6			2 → 1

Note. In the table, 0: 1200, 1: 1800, and 2: 2100 r/min represent rotational speeds. CSU: Central South University.

Table 4.

Diagnostic accuracy (%) on the CSU dataset.

Task	CADAN	DDDAN	DCORAL	DANN	AICDA	DJDA	Proposed
TB1	99.16 ± 0.59	100 ± 0.00	89.62 ± 10.65	99.75 ± 0.23	98.91 ± 0.23	97.6 ± 0.23	99.66 ± 0.46
TB2	98.91 ± 1.24	97.32 ± 3.73	81.84 ± 8.40	98.58 ± 1.01	96.82 ± 3.46	96.5 ± 0.62	99.83 ± 0.38
TB3	97.15 ± 5.01	99.83 ± 0.23	89.54 ± 5.64	98.91 ± 0.81	98.99 ± 0.48	97.10 ± 0.78	99.75 ± 0.37
TB4	99.08 ± 1.08	99.66 ± 0.55	89.72 ± 7.72	97.24 ± 2.43	98.74 ± 0.51	95.85 ± 1.29	99.41 ± 0.48
TB5	93.97 ± 6.59	98.74 ± 1.26	78.49 ± 11.75	95.65 ± 2.85	98.41 ± 0.86	94.64 ± 2.03	99.16 ± 0.78
TB6	98.24 ± 1.04	99.41 ± 0.64	93.55 ± 8.59	98.91 ± 1.01	99.08 ± 0.35	96.94 ± 0.76	99.50 ± 0.69
Avg	97.75	99.16	87.13	98.17	98.49	96.44	99.55

Note. CSU: Central South University.

As shown in Table 4 and Figure 6, the proposed method achieves the highest or among the highest mean accuracies on every transfer task while maintaining low run-to-run variability, with an average accuracy of 99.55% across the six tasks, the best average among the compared transfer learning methods. The bar chart and corresponding error bars in Figure 6 further illustrate the stability of the proposed method across independent trials. Unlike DCORAL, which suffers significant performance degradation and a large run-to-run spread (a standard deviation of up to 11.75% in task TB5), the proposed method maintains mean accuracy consistently above 99%, with standard deviations below 0.8% (0.37–0.78%) across all tasks.
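
The summary figures quoted above can be rechecked directly from the "Proposed" and DCORAL columns of Table 4. The snippet is a simple arithmetic check, with variable names chosen only for illustration.

```python
from statistics import mean

# (mean accuracy %, standard deviation %) for the "Proposed" column of Table 4.
proposed = {
    "TB1": (99.66, 0.46), "TB2": (99.83, 0.38), "TB3": (99.75, 0.37),
    "TB4": (99.41, 0.48), "TB5": (99.16, 0.78), "TB6": (99.50, 0.69),
}
# DCORAL standard deviations (%) from the same table.
dcoral_sd = {"TB1": 10.65, "TB2": 8.40, "TB3": 5.64,
             "TB4": 7.72, "TB5": 11.75, "TB6": 8.59}

average_accuracy = round(mean(acc for acc, _ in proposed.values()), 2)
sd_values = [sd for _, sd in proposed.values()]
sd_range = (min(sd_values), max(sd_values))
worst_dcoral_task = max(dcoral_sd, key=dcoral_sd.get)

print(average_accuracy, sd_range, worst_dcoral_task)
```

The check reproduces the 99.55% average, the 0.37–0.78% range of standard deviations, and TB5 as the least stable DCORAL task.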

Figure 6.

Average accuracy of each method on the CSU dataset. CSU: Central South University.

Among the baselines, CADAN and DDDAN display localized fluctuations (standard deviations of 5.01 and 6.59% for CADAN in TB3 and TB5, respectively, and 3.73% for DDDAN in TB2); AICDA generally maintains small standard deviations but is noticeably less stable in TB2 (3.46%); and DANN and DJDA are moderately stable but still reach standard deviations of 2.85 and 2.03%, respectively, in TB5.

To further evaluate the discriminative ability of the model in handling complex fault modes, the TB5 transfer task was selected for confusion matrix visualization analysis, as shown in Figure 7(a). It can be seen that even under conditions with large speed variations, the proposed method achieves 100% classification accuracy across all fault categories in the visualized run. Notably, the CSU dataset contains two easily confused fine-grained faults: outer race pitting (OF-P) and outer race cracks (OF-C), as illustrated in Figure 8.

Figure 7.

Visual analysis of TB5 task: (a) confusion matrix and (b) t-SNE feature visualization. t-SNE: t-distributed stochastic neighbor embedding.

Figure 8.

Time–frequency plots of OF-P and OF-C signals after CWT processing. CWT: continuous wavelet transform.

Furthermore, to visually demonstrate the domain alignment effect in the feature space, Figure 7(b) presents the t-distributed stochastic neighbor embedding (t-SNE) visualization of the high-dimensional features for the TB5 task. Samples of different health states form clearly bounded clusters in the two-dimensional space. This implies that despite the speed discrepancy between domains, the EW-DJDA module effectively reduces the differences in marginal and conditional distributions via its dynamic weighting strategy. Consequently, robust domain-invariant features are extracted, ensuring the model’s generalization performance on the target domain.

Experiments based on the CWRU dataset

To validate the effectiveness of the model on a standard benchmark, this section employs the CWRU dataset for experimentation. As one of the most widely used public datasets in the field of bearing fault diagnosis, CWRU includes three subtle speed variations (1772, 1750, and 1730 r/min) induced by different loads. By using these three operating conditions interchangeably as source and target domains, six transfer tasks (TA1 to TA6) were designed. Detailed descriptions are provided in Table 5.

Table 5.

Cross-condition transfer tasks on the CWRU dataset.

Task	Source domain category		Target domain category		Scenario
Task	Fault diameter (inch)	Fault category	Fault diameter (inch)	Fault category	Scenario
TA1	0.007/0.014/0.021	NC, BF, OF, IF	0.007/0.014/0.021	NC, BF, OF, IF	0 → 1
TA2					0 → 2
TA3					1 → 2
TA4					1 → 0
TA5					2 → 0
TA6					2 → 1

Note. In the table, 0: 1772, 1: 1750, and 2: 1730 r/min represent the rotational speed. CWRU: Case Western Reserve University.

Based on the experimental results presented in Table 6, the proposed method achieves highly competitive performance across six cross-domain transfer tasks on the CWRU dataset, with an average accuracy of 99.92%. Notably, in the two more challenging tasks, TA2 and TA5, which involve the largest rotational-speed discrepancies, the proposed method maintains nearly 100% diagnostic accuracy. These results indicate that the proposed method is effective in extracting domain-invariant features and mitigating the domain shift caused by varying operating conditions.

Table 6.

Diagnostic accuracy (%) on the CWRU dataset.

Task	CADAN	DDDAN	DCORAL	DANN	AICDA	DJDA	Proposed
TA1	99.94 ± 0.14	99.87 ± 0.29	99.87 ± 0.18	99.74 ± 0.42	99.61 ± 0.42	99.29 ± 0.27	99.94 ± 0.14
TA2	92.36 ± 7.13	99.55 ± 0.84	84.92 ± 14.70	99.81 ± 0.29	98.51 ± 1.85	92.23 ± 5.39	100 ± 0.00
TA3	100 ± 0.00	99.94 ± 0.14	91.91 ± 7.20	99.94 ± 0.14	99.16 ± 1.53	95.27 ± 5.68	100 ± 0.00
TA4	97.73 ± 3.12	99.35 ± 1.28	95.85 ± 4.71	94.16 ± 11.99	99.03 ± 0.51	96.11 ± 2.49	99.87 ± 0.29
TA5	85.84 ± 5.32	96.37 ± 4.85	84.22 ± 7.41	88.96 ± 2.31	92.99 ± 2.64	87.14 ± 2.32	99.87 ± 0.18
TA6	94.68 ± 6.66	99.55 ± 1.02	95.46 ± 5.25	99.42 ± 0.58	99.35 ± 0.51	96.95 ± 3.64	99.81 ± 0.29
Avg	95.09	99.11	92.04	97.01	98.11	94.50	99.92

Note. CWRU: Case Western Reserve University.

Furthermore, the error bar analysis in Figure 9 shows that the proposed method exhibits low run-to-run variability across repeated experiments. While some comparison methods show larger performance fluctuations under speed changes, the proposed method maintains competitive accuracy together with stable generalization behavior.

Figure 9.

Average accuracy of each method on the CWRU dataset. CWRU: Case Western Reserve University.

Figure 10(a) and (b) present the normalized confusion matrices for the proposed method and DDDAN, respectively, for their worst-case runs on the TA5 task. DDDAN struggles to distinguish faults of varying severities, specifically different diameters of the same ball fault type. As shown in Figure 10(b), this causes significant mutual misclassification: 62.50% of label 2 (0.007 inch) samples are misclassified as label 8 (0.021 inch), and 86.96% of label 8 samples are misclassified as label 2.

Figure 10.

Confusion matrices for the TA5 transfer task: (a) proposed and (b) DDDAN.

The proposed method effectively rectifies this issue. Figure 10(a) shows the recognition accuracy for label 2 improving to 100%, while label 8 accuracy reaches 95.65%. This indicates that by mining hard negative samples near the decision boundary, HNM-Softmax enables the model to learn fine-grained features that differentiate fault severities, maintaining robust diagnostic capabilities even under conditions with subtle interclass differences.

Experiments based on the BJTU dataset

To evaluate the model’s generalization ability under significant speed discrepancies, the public dataset from BJTU was employed for testing. Unlike the previous two datasets, the BJTU dataset is characterized by a substantial speed span (ranging from 1200 to 3600 r/min), which imposes greater challenges on feature alignment. Based on three operating conditions (1200, 2400, and 3600 r/min), six cross-domain tasks (TC1–TC6) were constructed, as shown in Table 7. The detailed diagnostic results of each model on the BJTU dataset are presented in Table 8.

Table 7.

Cross-condition transfer tasks on the BJTU dataset.

Task	Source domain category	Target domain category	Scenario
TC1	NC, IF, BF, CF, OF	NC, IF, BF, CF, OF	0 → 1
TC2			0 → 2
TC3			1 → 2
TC4			1 → 0
TC5			2 → 0
TC6			2 → 1

Note. In the table, 0: 1200, 1: 2400, and 2: 3600 r/min. BJTU: Beijing Jiaotong University.

Table 8.

Diagnostic accuracy (%) on the BJTU dataset.

Task	CADAN	DDDAN	DCORAL	DANN	AICDA	DJDA	Proposed
TC1	99.84 ± 0.20	99.84 ± 0.28	98.21 ± 1.39	99.87 ± 0.18	99.68 ± 0.25	99.01 ± 0.57	99.97 ± 0.07
TC2	99.72 ± 0.25	100 ± 0.00	93.70 ± 10.81	99.90 ± 0.14	99.46 ± 0.47	98.72 ± 0.78	100 ± 0.00
TC3	99.71 ± 0.40	99.90 ± 0.21	96.45 ± 6.45	99.90 ± 0.14	99.62 ± 0.37	99.10 ± 0.56	99.94 ± 0.14
TC4	97.54 ± 2.87	99.81 ± 0.13	96.51 ± 2.53	98.05 ± 1.52	99.07 ± 0.68	97.41 ± 0.80	99.81 ± 0.13
TC5	98.08 ± 2.62	99.06 ± 2.02	92.00 ± 6.10	97.70 ± 2.53	97.78 ± 1.91	94.35 ± 1.34	99.87 ± 0.13
TC6	99.65 ± 0.31	99.66 ± 0.55	98.56 ± 2.28	99.52 ± 0.54	99.49 ± 0.31	98.85 ± 0.26	99.84 ± 0.28
Avg	99.09	99.71	95.91	99.16	99.18	97.91	99.91

Note. BJTU: Beijing Jiaotong University.

As illustrated in Table 8 and Figure 11, the proposed DW-JDA method achieves the highest average accuracy on this dataset and exhibits low run-to-run variability. Although DDDAN remains a strong competitor, the overall results indicate that the proposed method provides competitive accuracy together with stable diagnostic performance across diverse operating conditions. In contrast, some comparison methods show larger performance fluctuations on more challenging transfer tasks.

Figure 11.

Average accuracy of different methods on the BJTU dataset. BJTU: Beijing Jiaotong University.

To verify the feature learning capability under large-span operating conditions, Figure 12 displays the t-SNE feature visualization results for the TC5 task (3600 → 1200 r/min). As depicted, several baseline methods exhibit varying degrees of feature distribution degradation. Both DJDA (Figure 12(g)) and DCORAL (Figure 12(c)) display severe category aliasing, failing to separate several proximal health states distinctly and showing numerous outliers. While DANN (Figure 12(d)) and CADAN (Figure 12(e)) achieve coarse separation, their clusters are looser and their features significantly dispersed. AICDA (Figure 12(f)) shows improvement over these, yet its clusters remain less compact. DDDAN (Figure 12(b)) maintains performance comparable to that of the proposed method.

Figure 12.

t-SNE feature visualization of the TC5 task: (a) proposed, (b) DDDAN, (c) DCORAL, (d) DANN, (e) CADAN, (f) AICDA, and (g) DJDA. t-SNE: t-distributed stochastic neighbor embedding.

In contrast to the weaker baselines, the proposed method yields clearer interclass separation and better-organized feature distributions. These observations suggest that the hard negative margin constraint introduced by HNM-Softmax may improve interclass separation, which is consistent with the competitive discriminative performance observed under wide-span operating conditions.

Experiments under noisy conditions

In actual high-speed train operating environments, early fault signals from rolling bearings are weak and easily submerged by strong background noise such as wheel-rail friction, aerodynamic noise, and mechanical resonance. This noise not only degrades the signal-to-noise ratio (SNR) but also blurs the feature distribution boundaries between the source and target domains, thereby exacerbating the negative impact of domain shift.⁴⁸ To comprehensively evaluate the anti-interference ability and robustness of the DW-JDA model under extreme working conditions, Gaussian white noise was introduced into the original vibration signals to construct composite signals with varying SNRs. The formula for calculating SNR is defined as follows:

$$\mathrm{SNR} = 10 \log_{10} \left( \frac{P_{signal}}{P_{noise}} \right)$$

(17)

where $P_{signal}$ and $P_{noise}$ represent the power of the effective signal and the noise, respectively. The experiment established two typical noise environments: SNR = −2 dB (strong noise interference, where noise power exceeds signal power) and SNR = 2 dB (moderate noise interference). Under these two settings, transfer tasks (TB3, TC2, TA6) were executed on the CSU, BJTU, and CWRU datasets, respectively, to verify the model’s generalization limits.
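
A minimal sketch of how white Gaussian noise can be scaled to a target SNR under Equation (17) is given below. It assumes that signal power is estimated as the mean square of the samples; the function names, the sinusoidal test signal, and the fixed seed are illustrative choices, not details taken from the article.

```python
import math
import random


def signal_power(x):
    """Mean-square power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def add_white_noise(signal, snr_db, seed=0):
    """Return (noisy signal, noise) with noise scaled to the target SNR in dB."""
    rng = random.Random(seed)
    noise_power = signal_power(signal) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    noise = [rng.gauss(0.0, sigma) for _ in signal]
    noisy = [s + n for s, n in zip(signal, noise)]
    return noisy, noise


def measured_snr_db(signal, noise):
    """Evaluate Equation (17) on the realized signal and noise."""
    return 10 * math.log10(signal_power(signal) / signal_power(noise))


if __name__ == "__main__":
    # Illustrative 50 Hz sinusoid sampled at 12 kHz for one second.
    clean = [math.sin(2 * math.pi * 50 * t / 12_000) for t in range(12_000)]
    for target in (-2.0, 2.0):
        _, noise = add_white_noise(clean, target)
        print(target, round(measured_snr_db(clean, noise), 2))
```

At −2 dB, the noise power is about 1.58 times the signal power, which is why this setting is described as strong interference.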

When SNR = −2 dB, signal features are severely obscured by noise, serving as an extreme test for the feature extraction capability of fault diagnosis models. The diagnostic results of each model under strong noise conditions are shown in Table 9 and Figure 13.

Table 9.

Diagnostic accuracy (%) of cross-condition transfer tasks at SNR = −2 dB.

Task	SNR = −2
Task	CADAN	DDDAN	DCORAL	DANN	AICDA	DJDA	Proposed
TB3	84.86 ± 7.95	93.89 ± 3.47	56.23 ± 13.22	81.93 ± 11.86	91.13 ± 2.72	85.94 ± 3.83	95.56 ± 2.00
TC2	98.56 ± 0.67	99.49 ± 0.35	84.51 ± 3.85	98.46 ± 0.58	98.56 ± 0.44	98.02 ± 0.29	99.39 ± 0.26
TA6	94.61 ± 4.89	98.12 ± 1.27	89.22 ± 6.81	97.27 ± 0.85	97.60 ± 0.54	95.97 ± 1.80	98.18 ± 0.75
Avg	92.68	97.17	76.65	92.55	95.76	93.31	97.71

Note. SNR: signal-to-noise ratio.

Figure 13.

Average accuracy of each method under SNR = −2 dB conditions.

As illustrated in Table 9 and Figure 13, the proposed DW-JDA network achieves the highest average accuracy under severe noise (SNR = −2 dB), namely 97.71%. While baseline methods such as DCORAL and CADAN suffer severe performance degradation (averaging 76.65 and 92.68%, respectively), even competitive adaptation networks such as AICDA (95.76%) and DJDA (93.31%) fall short of the proposed model’s overall robustness; DDDAN (97.17%) is the closest competitor. This advantage is particularly evident on the challenging CSU dataset (task TB3: 1800 → 2100 r/min). Despite the presence of complex fault modes, DW-JDA attains 95.56% accuracy with the smallest standard deviation among the compared methods (±2.00%), exceeding DDDAN (93.89%) by 1.67 percentage points and outperforming AICDA (91.13%), DJDA (85.94%), and DANN (81.93%) by wider margins. This resilience is likely supported by the HNM-Softmax mechanism. Severe noise typically causes feature drift that easily breaches standard Softmax decision boundaries; however, HNM-Softmax enforces a safety margin against hard negative samples, which helps preserve classification boundaries even when feature representations are heavily blurred.

When the noise intensity decreases to SNR = 2 dB, the performance of all models recovers, with detailed results shown in Table 10 and Figure 14. The average accuracy of DW-JDA rises to 99.01%, maintaining highly competitive performance. Notably, in the TC2 task (1200 → 3600 r/min) on the BJTU dataset, DW-JDA achieves a classification accuracy of 99.90%.

Table 10.

Diagnostic accuracy (%) of cross-condition transfer tasks at SNR = 2 dB.

Task	SNR = 2
Task	CADAN	DDDAN	DCORAL	DANN	AICDA	DJDA	Proposed
TB3	89.71 ± 11.15	97.66 ± 2.00	63.43 ± 10.97	95.40 ± 3.00	96.32 ± 0.54	94.56 ± 1.07	97.32 ± 0.64
TC2	99.23 ± 0.94	99.74 ± 0.14	95.30 ± 4.85	99.14 ± 0.44	99.46 ± 0.27	98.59 ± 0.41	99.90 ± 0.09
TA6	95.72 ± 5.88	99.87 ± 0.18	93.90 ± 2.65	99.29 ± 0.70	99.09 ± 0.53	96.75 ± 1.70	99.81 ± 0.29
Avg	94.89	99.09	84.21	97.94	98.29	96.63	99.01

Note. SNR: signal-to-noise ratio.

Figure 14.

Average accuracy of each method under SNR = 2 dB conditions.

Comparing the results between SNR = −2 dB and SNR = 2 dB reveals that DW-JDA exhibits the smallest change in average accuracy among the compared methods (a difference of only 1.30 percentage points), demonstrating strong stability. In contrast, the average accuracy of DCORAL changes by as much as 7.56 percentage points. This stability may be partly attributed to the EW-DJDA module. The introduction of noise increases the uncertainty of predictions in the target domain. Through its dynamic weighting mechanism, EW-DJDA can automatically reduce the weight of conditional distribution alignment when entropy values are high. This behavior may help alleviate negative transfer caused by noise interference, thereby supporting effective feature alignment across different signal-to-noise ratios.
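
The change in average accuracy between the two noise levels can be recomputed for every method from the "Avg" rows of Tables 9 and 10. The short check below uses only those tabulated values; the variable names are illustrative.

```python
# "Avg" rows of Table 9 (SNR = -2 dB) and Table 10 (SNR = 2 dB), in percent.
avg_minus2 = {"CADAN": 92.68, "DDDAN": 97.17, "DCORAL": 76.65, "DANN": 92.55,
              "AICDA": 95.76, "DJDA": 93.31, "Proposed": 97.71}
avg_plus2 = {"CADAN": 94.89, "DDDAN": 99.09, "DCORAL": 84.21, "DANN": 97.94,
             "AICDA": 98.29, "DJDA": 96.63, "Proposed": 99.01}

# Gain in average accuracy (percentage points) when noise drops from -2 dB to 2 dB.
gap = {name: round(avg_plus2[name] - avg_minus2[name], 2) for name in avg_minus2}

for name, value in sorted(gap.items(), key=lambda item: item[1]):
    print(f"{name:9s} {value:5.2f}")
```

Sorting the gaps puts the proposed method first and DCORAL last, with DDDAN as the second most stable method.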

To intuitively evaluate the feature learning capability of the proposed method in cross-domain tasks, t-SNE is employed to visualize the high-dimensional features of task TB3 on the CSU dataset under the condition of SNR = −2 dB. Figure 15 shows the feature clustering results of the proposed method and the comparison methods under this noisy condition.

Figure 15.

t-SNE visualization in task TB3 on the CSU dataset when SNR = −2 dB: (a) proposed (98.33%), (b) DANN (94.56%), (c) CADAN (92.89%), (d) DCORAL (74.90%), (e) DDDAN (97.49%), (f) AICDA (94.14%), and (g) DJDA (88.70%). t-SNE: t-distributed stochastic neighbor embedding; CSU: Central South University; SNR: signal-to-noise ratio.

As shown in Figure 15(a), the proposed method exhibits relatively clear decision boundaries in the target domain for task TB3 under a severe −2 dB SNR condition. The visualization suggests that the proposed approach enlarges the interclass gaps and improves overall feature separability. In the mixed visualizations, circles represent the source domain and crosses represent the target domain; better adaptation is reflected by closer categorywise overlap between the two domains.

Compared with the baseline methods, the proposed method shows clearer inter-class separation and better cross-domain feature congruence. DCORAL suffers from severe feature overlap under harsh noise, while DJDA also exhibits noticeable domain shifts between source and target samples of the same category. CADAN and AICDA achieve moderate alignment, but visible discrepancies remain near the decision boundaries. DANN further improves cross-domain alignment, whereas DDDAN and the proposed method show comparatively better category-wise overlap. Overall, the visualizations suggest that the proposed method achieves more organized feature distributions and stronger robustness under extreme noise interference, which is consistent with its competitive diagnostic performance.

Ablation experiment

To rigorously isolate the specific contributions of key components and evaluate their synergistic effects on cross-domain diagnosis, an ablation study was conducted. The framework was deconstructed into modules to create three distinct variants, as outlined in Table 11.

Table 11.

Comparison of ablation experiment settings.

Network framework	M1 (proposed)	M2	M3
EW-DJDA	✓	✓
HNM-Softmax	✓		✓

Note. ✓ indicates participation in the training of the proposed model. EW-DJDA: entropy-weighted discriminative joint distribution alignment; HNM-Softmax: Hard Negative Margin Softmax.

M1 (proposed): The full model, incorporating both EW-DJDA and HNM-Softmax. M2: Preserves the EW-DJDA module but substitutes the classification loss with standard Cross-Entropy Loss. This setup is designed to assess the efficacy of HNM-Softmax in mining hard-to-classify samples and refining decision boundaries. M3: Retains the HNM-Softmax loss while excluding the domain adaptation alignment strategy. This setup serves to validate the capability of EW-DJDA in bridging distribution shifts and mitigating negative transfer.

To ensure a fair comparison and strictly isolate the performance gains from the adaptation mechanisms, all ablation variants (M1, M2, and M3) are implemented using the identical hybrid backbone architecture (comprising MPAConv blocks and the improved vision transformer, as detailed in “DSConv-based hybrid network for feature extraction” section and Table 2). By keeping the backbone architecture identical across M1, M2, and M3, any performance differences can be attributed to the presence or absence of EW-DJDA and to the use of HNM-Softmax versus the standard cross-entropy loss, rather than to architectural changes.

To verify the robustness of the model under extreme working conditions, experiments were conducted in a strong noise environment (SNR = −2 dB). Transfer tasks were selected from the three datasets: CSU (TB3), BJTU (TC2), and CWRU (TA6). Each task was repeated independently five times to obtain the average value, and the results are shown in Table 12 and Figure 16.

Table 12.

Diagnostic accuracy (%) on each task when SNR = −2 dB.

Task	M1	M2	M3
TB3	95.56 ± 2.00	82.84 ± 6.87	68.03 ± 7.41
TC2	99.39 ± 0.26	99.30 ± 0.37	89.28 ± 4.16
TA6	98.18 ± 0.75	98.05 ± 1.30	85.52 ± 4.72
Average	97.71	93.40	80.94

Note. SNR: signal-to-noise ratio.

Figure 16.

Ablation experiment accuracy.

Experimental results demonstrate that HNM-Softmax enhances the model’s discriminative ability, particularly for easily confused classes. In the CSU TB3 task, which contains highly similar fault features (such as pitting and cracks), M1 achieved an improvement of 12.72 percentage points over M2 (95.56 vs. 82.84%). This suggests that HNM-Softmax effectively sharpens blurred boundaries by enforcing larger interclass margins.

To visually corroborate this effect, Figure 17 illustrates the feature distributions of M2 and M1 for the TB3 task. As shown in Figure 17(a), without the hard negative margin constraint (M2), the features of highly similar fault classes remain loosely distributed and are prone to blurring near the decision boundaries. Conversely, Figure 17(b) shows that the complete model (M1) exhibits clear and distinct feature clusters by explicitly maximizing the angular margin between confusable classes, thereby providing visual evidence for the efficacy of HNM-Softmax in enhancing interclass separability.

Figure 17.

Feature space visualization of the ablation models for the TB3 task (SNR = −2 dB): (a) M2: without HNM-Softmax and (b) M1: proposed model. SNR: signal-to-noise ratio; HNM-Softmax: Hard Negative Margin Softmax.

In contrast, for the BJTU task (TC2), characterized by distinct feature differences, the model maintains comparably high performance without HNM-Softmax (99.30 vs. 99.39%). For the CWRU task (TA6), which exhibits a certain degree of feature similarity, removing HNM-Softmax results in a slight performance decline (98.05 vs. 98.18%), which lies within the run-to-run standard deviation. This indicates that while standard Softmax suffices for tasks with highly clear feature separations (like BJTU), HNM-Softmax may still provide modest refinements when moderate feature confusion exists (like CWRU), and becomes critical for highly complex tasks (like CSU). These results are consistent with HNM-Softmax targeting hard samples to optimize decision boundaries without degrading performance on easier samples.

Furthermore, removing the EW-DJDA alignment module (M3) left the model unable to overcome the feature distribution shifts caused by speed variations, leading to a decline in average accuracy of 16.77 percentage points (from 97.71 to 80.94%). EW-DJDA addresses feature domain invariance by dynamically aligning source and target domain distributions, while HNM-Softmax further improves the discriminability of confusing categories. The two modules operate synergistically through feature alignment and boundary constraints, supporting the model’s robustness under strong noise and complex variable conditions.
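
The ablation gaps discussed above follow directly from Table 12. The following arithmetic check uses only the tabulated mean accuracies; the variable names are illustrative.

```python
from statistics import mean

# Mean accuracies (%) from Table 12 at SNR = -2 dB.
m1 = {"TB3": 95.56, "TC2": 99.39, "TA6": 98.18}  # full model
m2 = {"TB3": 82.84, "TC2": 99.30, "TA6": 98.05}  # without HNM-Softmax
m3 = {"TB3": 68.03, "TC2": 89.28, "TA6": 85.52}  # without EW-DJDA

# Per-task gain attributable to HNM-Softmax (M1 - M2), in percentage points.
hnm_gain = {task: round(m1[task] - m2[task], 2) for task in m1}

# Average drop when EW-DJDA is removed (M1 - M3), in percentage points.
ew_djda_drop = round(mean(m1.values()) - mean(m3.values()), 2)

print(hnm_gain, ew_djda_drop)
```

The per-task gains show that the benefit of HNM-Softmax is concentrated in TB3, whereas removing EW-DJDA lowers accuracy on all three tasks.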

Sensitivity analysis

The parameter $m_{hn}$ is a key hyperparameter in the HNM-Softmax loss function. It determines the magnitude of the enforced margin between the target class and hard negative samples, which directly influences the discriminability of the model’s decision boundaries. The balancing coefficient $β$ in Equation (15) is fixed at 1.0 throughout all experiments to reduce the hyperparameter search space and isolate the effect of $m_{hn}$. Since the standard cross-entropy and the hard negative margin serve complementary roles, with the former ensuring global convergence and the latter focusing on local boundary refinement, they avoid severe optimization interference. To investigate the impact of $m_{hn}$ on model performance and determine its optimal value, a sensitivity analysis was conducted under a strong noise environment (SNR = −2 dB) using three transfer tasks representing varying degrees of difficulty (TB3, TC2, TA6). The value of $m_{hn}$ was searched within the set {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1}, and the experimental results are presented in Figure 18.

Figure 18.

Parameter sensitivity analysis.

From the figure, it can be observed that the value of $m_{hn}$ affects diagnostic tasks of varying difficulty differently. The TB3 task (black square curve) involves the distinction of highly similar faults (such as pitting and cracks) in the CSU dataset and is most sensitive to changes in $m_{hn}$. As $m_{hn}$ increases from 0 to 0.6, the model’s average accuracy shows an upward trend, reaching a peak at $m_{hn} = 0.6$. This indicates that a moderate margin constraint can effectively increase the distance of hard-to-classify samples in the feature space, reducing confusion. However, when $m_{hn}$ increases further (>0.6), the accuracy declines and the corresponding error band (gray shaded area) widens significantly. This suggests that an excessive margin constraint drastically increases the optimization difficulty, causing severe fluctuations in the model’s performance across different runs and substantially reducing training stability.

Regarding the other tasks, the model exhibits strong overall robustness, though with nuanced differences reflecting their inherent complexity. For the TC2 task (red circle curve), where fault feature differences are highly distinct, the accuracy remains remarkably stable at nearly 100% across the entire variation range of $m_{hn}$, with almost zero fluctuation. For the TA6 task (blue triangle curve), which contains a moderate degree of feature similarity, the accuracy consistently remains at a high level (approximately 98%) but exhibits slightly wider error bands and minor fluctuations compared to TC2. This nuanced observation suggests that HNM-Softmax has good compatibility: the introduction of a margin constraint does not appear to disrupt the feature distribution of highly separable samples (TC2), while still maintaining reliable performance when moderate feature confusion is present (TA6).

In summary, when $m_{hn}$ is small (such as 0 or 0.1), the model’s penalty on hard-to-classify samples is insufficient, limiting the accuracy in the highly complex TB3 task. Conversely, when $m_{hn}$ is excessively large (>0.8), the overly strong geometric constraint leads to feature space distortion and optimization difficulties, triggering performance degradation and instability. To balance interclass separability with training stability, $m_{hn} = 0.6$ was ultimately selected as the optimal parameter setting and kept constant in all other experiments.
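
To make the roles of $m_{hn}$ and $β$ concrete, the sketch below combines cross-entropy with a hinge-style penalty on the strongest non-target logit. Equation (15) is not reproduced in this section, so this is an assumed illustrative form rather than the article's HNM-Softmax definition: it applies the margin to raw logits, whereas the actual loss may use an angular margin, and all function names are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single sample."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def hard_negative_margin_penalty(logits, target, m_hn):
    """Hinge penalty: zero once the target beats the hardest negative by m_hn."""
    hardest_negative = max(z for i, z in enumerate(logits) if i != target)
    return max(0.0, m_hn + hardest_negative - logits[target])


def combined_loss(logits, target, m_hn=0.6, beta=1.0):
    """Assumed combination: cross-entropy plus beta times the margin penalty."""
    penalty = hard_negative_margin_penalty(logits, target, m_hn)
    return cross_entropy(logits, target) + beta * penalty


if __name__ == "__main__":
    confusable = [2.0, 1.8, -1.0]  # target 0 barely beats class 1
    separable = [3.0, 1.0, 0.0]    # target 0 wins by a wide gap
    for m in (0.0, 0.6, 1.1):
        print(m, round(combined_loss(confusable, 0, m), 4),
              round(combined_loss(separable, 0, m), 4))
```

In this simplified form, a well-separated sample incurs no extra penalty for any margin in the searched range, while a confusable sample is penalized more as $m_{hn}$ grows, which mirrors the qualitative behavior described for TC2 and TB3.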

Conclusion

This article proposes DW-JDA, a novel unsupervised domain adaptation (UDA) network designed to address the challenges of significant feature distribution shifts and hard-negative misclassification in rolling bearing fault diagnosis across working conditions. By integrating time–frequency analysis, uncertainty-aware alignment, and margin-based classification, the proposed framework establishes an end-to-end solution. The main findings and contributions of this study are summarized as follows:

By leveraging CWT and DSConv, the model effectively captures multiscale time–frequency characteristics from nonstationary vibration signals while reducing computational complexity.

The novel EW-DJDA mechanism effectively mitigates negative transfer by dynamically balancing the weights of marginal and conditional distribution alignments based on prediction uncertainty in the target domain.

The proposed HNM-Softmax loss function explicitly enforces a safety margin between target classes and competitive hard negatives. This targeted suppression strategy ensures high discriminability for fine-grained and similar fault modes.

Extensive experiments on the CSU, CWRU, and BJTU datasets demonstrate that DW-JDA achieves average accuracies of 99.55, 99.92, and 99.91%, respectively, showing competitive or best average performance among the compared methods. Notably, it maintains an average accuracy of 97.71% under strong noise conditions (SNR = −2 dB).

Visual analysis suggests that, even under large-span rotational-speed differences, the model can achieve clearer feature separation and reduced cross-domain distribution offsets, indicating good generalization ability.
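The uncertainty-based balancing of marginal and conditional alignment summarized above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the EW-DJDA formulation itself. It assumes that uncertainty is measured by the mean normalized Shannon entropy of the target-domain softmax outputs, and that higher uncertainty shifts weight toward marginal alignment because pseudo-labels are then less reliable. The function names and the example probabilities are hypothetical.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of one probability vector, scaled to [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def alignment_weights(target_probs):
    """Split a unit budget between marginal and conditional alignment.

    Assumption: the mean normalized entropy u of the target predictions is
    used directly as the marginal weight and 1 - u as the conditional weight.
    """
    u = sum(normalized_entropy(p) for p in target_probs) / len(target_probs)
    return {"marginal": u, "conditional": 1.0 - u}


# Hypothetical target-domain predictions: one confident, one uncertain.
batch = [[0.97, 0.01, 0.01, 0.01], [0.40, 0.30, 0.20, 0.10]]
weights = alignment_weights(batch)
```

With this choice, fully confident predictions place all weight on conditional alignment, while uniform predictions place all weight on marginal alignment; any other monotonic mapping from uncertainty to weight would serve the same illustrative purpose.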

Although the proposed method demonstrates competitive and stable performance on the experimental datasets, real industrial environments involve more complex multisource coupled faults and interference. Future research will therefore focus on moving the DW-JDA framework from laboratory rigs to actual high-speed train operating conditions to validate its generalization. This includes exploring open-domain adaptation to handle emerging and unknown fault types under drastically variable field conditions. Future work will also explore cross-machine transfer diagnostics to overcome structural differences between experimental platforms and actual train bogies, and will integrate few-shot learning strategies to maintain high diagnostic accuracy even when target-domain field samples are extremely scarce.

Supplemental Material

Supplemental material, sj-docx-1-shm-10.1177_14759217261452585 for DW-JDA: An unsupervised cross-domain dynamic weighted joint distribution adaptation network for high-speed train bearing fault diagnosis by Penghui Xie, Suchao Xie and Xin Guo in Structural Health Monitoring

Footnotes

ORCID iD

Suchao Xie

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was undertaken at Key Laboratory of Traffic Safety on Track (Central South University), Ministry of Education, China. The authors gratefully acknowledge the support from the National Key R&D Program of China (grant no. 2024YFB4303000). This paper was also supported by the science and technology innovation Program of Hunan Province (grant no. 2024RC1019), the Key Project of Scientific Research Project of Hunan Provincial Department of Education (grant no. 23A0017) and the Natural Science Foundation of Hunan Province (grant no. 2023JJ31015).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Supplemental material

Supplemental material for this article is available online.
