Deep capsule networks based on dual heterogeneous feature resonance fusion for mechanical fault diagnosis

Abstract

Feature extraction and fusion are important for the fault diagnosis and prediction of rotating machinery. While traditional deep learning networks can learn features with a single attribute, they still have difficulty capturing heterogeneous features with distinct attributes during the fusion process. To solve this problem, a novel capsule network (CapsNet) based on dual heterogeneous feature resonance fusion is presented for heterogeneous feature extraction and fault diagnosis. First, a dual-scale deformable convolution network is proposed to extract dual heterogeneous features. Then, an adaptive heterogeneous feature adjustment mechanism is presented to adjust the weights of heterogeneous features and identify discriminative features. Next, a resonance fusion mechanism is constructed to coordinate and select correlated heterogeneous features in both the structural and spatial dimensions, avoiding information conflicts in feature fusion. Finally, the resulting heterogeneous resonance gain features are fed into the CapsNet for fault diagnosis and classification. The advantage of the proposed network lies in its ability to integrate and coordinate global and local information, enhancing the correlation between heterogeneous features for improved performance. Comparative experiments against state-of-the-art methods on multiple datasets demonstrate that the proposed method excels at extracting and fusing dual heterogeneous features under complex operating conditions and noise interference.

Keywords

Rotating machinery; fault diagnosis; heterogeneous features; resonance fusion; information conflict; capsule network

Introduction

Rotating machinery is widely used in industrial manufacturing and transportation equipment.^1,2 However, rotating components are susceptible to wear, fatigue, and other failures due to harsh operating environments and variable operating conditions, which may result in internal damage and even severe safety incidents.³ As a result, accurate and prompt detection of mechanical faults is important for improving the reliability and productivity of industrial equipment.^4,5

Traditional signal-based fault diagnosis methods, such as the Fourier transform,⁶ wavelet decomposition,⁷ empirical mode decomposition,⁸ and variational mode decomposition,⁹ have been rapidly developed to extract features from the time and frequency domains and recognize faults. Despite these advances, traditional approaches still depend predominantly on manual feature extraction and empirical judgment, which poses challenges for fault diagnosis. Machine learning (ML) provides an efficient and easily designed approach to industrial equipment maintenance by analyzing and identifying the intrinsic relationships within data; examples include principal component analysis,¹⁰ hybrid kernel ridge regression,¹¹ support vector machines,¹² decision trees,¹³ and artificial neural networks.¹⁴ However, these traditional ML methods rely on shallow fault features extracted from the signals, which restricts the model’s ability to perform nonlinear pattern recognition and autonomous fault diagnosis in complex classification scenarios.

In recent years, deep learning (DL), a class of ML algorithms, has shown great potential in fault diagnosis owing to its automatic feature extraction capability and its end-to-end learning directly from raw vibration signals. Classical DL architectures, including convolutional neural networks (CNNs),¹⁵ graph convolutional networks,¹⁶ deep belief networks,¹⁷ and generative adversarial networks (GANs),¹⁸ have been widely adopted to analyze vibration signals from rotating components such as rolling bearings and gearboxes, enabling accurate fault diagnosis and remaining useful life prediction. In particular, CNNs have made extensive progress in fault diagnosis owing to their strong feature extraction ability.^19,20 However, although the pooling layers of a CNN give the model a prior of translational invariance, they discard specific pose and spatial location information.

The introduction of capsule networks (CapsNets) addresses the limitations of CNNs in modeling spatial relationships and recognizing pose during feature extraction.²¹ Unlike a neuron in a traditional CNN, a capsule outputs a vector instead of a scalar; this vector not only represents the probability that an entity exists in the image but also encodes multidimensional information such as the pose, angle, and position of that entity.²² Liu et al.²³ proposed an improved residual-based GAN combined with CapsNet to solve the problem of unbalanced fault data. Xu et al.²⁴ integrated a convolutional block attention module (CBAM) with CapsNet for fault diagnosis on data samples with different signal-to-noise ratios (SNRs). However, most existing DL models use fixed, single-scale convolution kernels to extract features, which makes them ineffective at capturing critical fault characteristics under nonstationary operating conditions.

To effectively exploit diverse fault features, multiscale convolution and selective kernel adaptation are combined to enable DL methods to dynamically extract features at different scales.^25,26 The integration of these techniques further enhances the model’s ability to fuse multiscale data.²⁷ Wang and Liu²⁸ proposed a multiscale meta-learning network that can be applied to cross-domain diagnosis with few samples. Wang et al.²⁹ proposed a multilevel attitude-aware denoising network for bearing fault diagnosis under noisy conditions. Xiong et al.³⁰ proposed a multiscale adaptive-routing capsule contrastive network for intelligent fault diagnosis in rotating machinery, improving accuracy and robustness under noisy environments and labels. Although these methods can effectively extract multiscale features, they do not fully consider the variability of features across multiple channels.

Owing to noise and variable operating conditions, different samples with the same label that are fed into a DL model can produce correspondingly different learned feature vectors. Heterogeneous features from different sources or scales have distinct attributes and spatial dimensions.^31,32 During fusion, large differences between these features can reduce model accuracy, while highly similar features may lead to redundancy and decrease the model’s learning efficiency.^33,34 These factors are collectively referred to as information conflict. Miao et al.³⁵ proposed a channel-wise CNN with feature augmentation for fault diagnosis of wheeled mobile robots, improving accuracy and robustness in multiheterogeneous sensor data. Han et al.³⁶ proposed a multisource heterogeneous information fusion network for intelligent fault diagnosis of rotating machinery, enhancing accuracy and robustness with limited datasets. Miao et al.³⁷ proposed a deep feature interactive network for machinery fault diagnosis that uses multisource heterogeneous data such as infrared thermal images and vibration signals for adaptive feature fusion. However, these methods do not deeply mine and adjust the weights of heterogeneous features during extraction, which can increase information redundancy. Furthermore, their fusion process neglects the correlation between heterogeneous features, potentially causing information conflicts and unreliable diagnostic results. This highlights the need for a new DL method that captures heterogeneous features under complex working conditions or noise interference and eliminates the conflicting information between features.

In summary, accurately capturing heterogeneous features under complex working conditions or noise interference, and eliminating the conflicting information between these features, are key problems in current research. For this purpose, we construct a CapsNet model based on dual heterogeneous feature resonance fusion (DHFRF-CN), which effectively learns heterogeneous features and ensures robust and reliable fault diagnosis performance across various scenarios.

The main contributions of this article are as follows:

A dual-scale deformable convolution network (DS-DCN) is developed to capture dual heterogeneous features from time-domain signals under complex working conditions and noisy environments.

An adaptive heterogeneous feature adjustment mechanism (AHFAM) is proposed to adjust the weights of heterogeneous features in the pooling layer, which can remove redundant signals and mine discriminative features for improved performance.

The resonance fusion mechanism (RFM) is proposed to coordinate the correlation of heterogeneous features across the structural and spatial dimensions, which avoids feature conflicts during the fusion process and enhances overall feature integration.

We conducted comparative experiments with other methods on different fault datasets. The effectiveness of the method in capturing heterogeneous features and eliminating the information conflict between the features under complex working conditions and noise environment is verified.

The rest of the article is organized as follows. The second section introduces the related theoretical background. The third section describes the DHFRF-CN methodology and the fault diagnosis framework in detail. The fourth section verifies the effectiveness of DHFRF-CN through experimental cases. Finally, the fifth section summarizes the work of this article.

Relevant theories

Deformable convolution network

Since the convolution kernel in a conventional CNN is fixed in shape and size, it cannot effectively capture complex feature information from objects with irregularities or deformations. Deformable convolution networks (DCNs) were proposed to capture the complex deformation features of objects while suppressing background interference.³⁸ As shown in Figure 1, the sampling positions of the DCN convolution kernel can be dynamically adjusted according to the content of the input feature map. As shown in Figure 2, DCN adds a learnable offset to each sampling position of an ordinary convolution, which enables the convolution operation to adjust its sampling positions according to changes in the features. The convolution operation of DCN is

y(p_{0}) = \sum_{p_{n} \in R} \omega(p_{n}) \cdot x(p_{0} + p_{n} + \Delta p_{n})

(1)

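where $p_{0}$ is the current output position, $R$ is the fixed (regular) sampling region of the convolution kernel, $p_{n}$ enumerates the sampling points in $R$, $\omega(p_{n})$ is the kernel weight at $p_{n}$, and $\Delta p_{n}$ is the learnable offset that adjusts the sampling position of $p_{n}$. Because $\Delta p_{n}$ is generally fractional, the input at the shifted position is obtained by bilinear interpolation.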
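To make Eq. (1) concrete, the following is a minimal sketch of deformable sampling for a single output position, simplified to a one-dimensional signal (an assumption for readability; the network itself works on 2D maps). The offsets are passed in by hand, whereas in a DCN they are predicted by an additional convolution layer, and fractional positions are resolved by linear interpolation, the 1D counterpart of bilinear interpolation. The function names are hypothetical.

```python
import numpy as np


def linear_sample(x, pos):
    """Read 1D signal x at a fractional position pos by linear interpolation.

    Positions outside the signal are treated as zeros (zero padding).
    """
    lo = int(np.floor(pos))
    frac = pos - lo

    def value_at(i):
        return float(x[i]) if 0 <= i < len(x) else 0.0

    return (1.0 - frac) * value_at(lo) + frac * value_at(lo + 1)


def deform_conv1d_at(x, weights, p0, offsets):
    """1D analogue of Eq. (1): y(p0) = sum_n w(p_n) * x(p0 + p_n + dp_n).

    weights: kernel weights w(p_n), odd length k.
    offsets: offsets dp_n, one per kernel tap (learned in a real DCN).
    """
    k = len(weights)
    grid = np.arange(k) - k // 2  # regular sampling region R, e.g. [-1, 0, 1] for k = 3
    return sum(
        weights[n] * linear_sample(x, p0 + grid[n] + offsets[n]) for n in range(k)
    )


x = np.arange(10, dtype=float)
w = np.array([1.0, 1.0, 1.0])
y_regular = deform_conv1d_at(x, w, p0=5, offsets=[0.0, 0.0, 0.0])  # reads x[4], x[5], x[6]
y_shifted = deform_conv1d_at(x, w, p0=5, offsets=[0.5, 0.0, 0.0])  # first tap moves to 4.5
```

With zero offsets the sketch reduces to an ordinary convolution tap sum (15.0 here); shifting the first tap by half a sample reads the signal at position 4.5 and gives 15.5.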

Figure 1.

Different sampling points of CNN and DCN. CNN: convolutional neural network; DCN: deformable convolution network.

Figure 2.

Architecture of the DCN. DCN: deformable convolution network.

Capsule network

The core idea of CapsNet is to capture the features of the target and its spatial transformations through capsules. As shown in Figure 3, the architecture of the CapsNet is made up of the convolution layer, primary capsule layer, and digit capsule layer.^39,40

Figure 3.

Architecture of the CapsNet. CapsNet: capsule network.

The primary capsule layer receives the local feature maps and reshapes them into 32 channels of 8-dimensional capsule vectors, giving a feature map of size [6 × 6 × 8 × 32]. The length and direction of a capsule vector indicate the probability that an entity exists and certain attributes of that entity, respectively. Dynamic routing is employed to transmit feature information between the PrimaryCaps and DigitCaps layers of CapsNet. This process establishes the connection between low-level and high-level features, facilitating efficient feature transmission. The prediction vector ${\hat{u}}_{j | i}$ from capsule $i$ in the lower layer to capsule $j$ in the next layer is calculated by

{\hat{u}}_{j | i} = W_{ij} u_{i}

(2)

where $u_{i}$ is the output of lower-layer capsule $i$, and $W_{ij}$ is a learnable weight matrix. The total input $s_{j}$ of capsule $j$ is obtained as the weighted sum of the prediction vectors ${\hat{u}}_{j | i}$:

s_{j} = \sum_{i} c_{ij} {\hat{u}}_{j | i}

(3)

where $c_{ij}$ is the coupling coefficient, which is determined iteratively by dynamic routing such that $\sum_{j} c_{ij} = 1$. The squash function performs nonlinear compression on $s_{j}$ and generates the output vector $v_{j}$ of high-level capsule $j$:

v_{j} = \frac{\| s_{j} \|^{2}}{1 + \| s_{j} \|^{2}} \cdot \frac{s_{j}}{\| s_{j} \|}

(4)

where the length of the output vector $v_{j}$ lies in the range [0, 1) and its direction is the same as that of $s_{j}$.
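Eqs (2) to (4) can be sketched in NumPy as follows. The coupling coefficients $c_{ij}$ are computed with the routing-by-agreement procedure of the original CapsNet (a softmax over routing logits that are increased by the agreement $\hat{u}_{j|i} \cdot v_{j}$); the three routing iterations and the array layout are assumptions of this sketch, not settings reported in this section.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Eq. (4): keep the direction of s and map its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def dynamic_routing(u_hat, iterations=3):
    """Route prediction vectors u_hat[i, j] = W_ij u_i (Eq. (2)) to upper capsules.

    u_hat: array of shape (num_lower, num_upper, dim).
    Returns upper-capsule outputs v (num_upper, dim) and coupling coefficients c.
    """
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))           # routing logits, uniform at start
    for _ in range(iterations):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)       # c_ij: softmax over upper capsules j
        s = np.einsum("ij,ijd->jd", c, u_hat)      # Eq. (3): s_j = sum_i c_ij u_hat_{j|i}
        v = squash(s)                              # Eq. (4)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # agreement u_hat_{j|i} . v_j
    return v, c


rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 3, 4))  # 6 lower capsules, 3 upper capsules, 4-D vectors
v, c = dynamic_routing(u_hat)
```

The softmax over $j$ enforces $\sum_{j} c_{ij} = 1$ for each lower capsule, and `squash` guarantees that every output length stays below 1, matching the interpretation of length as an existence probability.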

CapsNet is trained to push the length of the output capsule of the correct class as close to 1 as possible and the lengths of the capsules of the wrong classes close to 0. The margin loss is defined as the objective function of CapsNet:

L_{k} = T_{k} \max(0, m^{+} - \| v_{k} \|)^{2} + \lambda (1 - T_{k}) \max(0, \| v_{k} \| - m^{-})^{2}

(5)

where $L_{k}$ denotes the loss for class $k$, $T_{k}$ is the label indicator ($T_{k} = 1$ if the sample belongs to class $k$ and 0 otherwise), $m^{+}$ and $m^{-}$ are the upper and lower margins, $\lambda$ is a scale parameter that down-weights the loss of absent classes, and $\| v_{k} \|$ denotes the norm (length) of $v_{k}$.
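A minimal sketch of Eq. (5), summed over classes, is given below. The default values $m^{+} = 0.9$, $m^{-} = 0.1$, and $\lambda = 0.5$ are those of the original CapsNet paper and are assumed here, because this section does not report the values used.

```python
import numpy as np


def margin_loss(lengths, labels, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Eq. (5) summed over classes.

    lengths: capsule lengths ||v_k||, shape (..., K); labels: one-hot T_k, same shape.
    Default margins follow the original CapsNet paper (an assumption here).
    """
    present = labels * np.maximum(0.0, m_plus - lengths) ** 2
    absent = lam * (1.0 - labels) * np.maximum(0.0, lengths - m_minus) ** 2
    return np.sum(present + absent, axis=-1)


confident = margin_loss(np.array([0.95, 0.05, 0.05]), np.array([1.0, 0.0, 0.0]))
uncertain = margin_loss(np.array([0.5, 0.5]), np.array([1.0, 0.0]))
```

A confident, correct prediction incurs no loss, whereas two capsules of equal length 0.5 cost $0.4^{2} + 0.5 \cdot 0.4^{2} = 0.24$.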

Dual heterogeneous feature resonance fusion for capsule network

Motivated by the limitations of single-scale networks in feature extraction, we adopt the dual-scale network concept and propose DHFRF-CN, a DL model for fault diagnosis whose architecture is illustrated in Figure 4. In contrast to a conventional dual-scale network, which uses only convolutional layers, we use multistage convolutional layers and also incorporate the proposed AHFAM in each of the two subpaths. Furthermore, DHFRF-CN includes feature resonance fusion and enhancement stages to extract more diverse heterogeneous features.

Figure 4.

Architecture of the DHFRF-CN. DHFRF-CN: dual heterogeneous feature resonance fusion for capsule network.

The specific operation steps are as follows:

Step 1. The two-dimensional matrices obtained by sampling the vibration signal are used as the input of DS-DCN.

Step 2. The DS-DCN is constructed to capture dual heterogeneous features by adjusting the sampling points of the convolution kernels.

Step 3. AHFAM is used to adaptively calculate and adjust the weights of the heterogeneous features.

Step 4. The RFM is applied to coordinate the relevance of heterogeneous features and achieve feature fusion.

Step 5. The heterogeneous resonance gain features obtained through fusion are encoded into capsule units for fault diagnosis and classification.
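Step 1 does not specify how the vibration signal is turned into a two-dimensional matrix. A common choice, assumed here purely for illustration, is to cut the signal into non-overlapping segments of length $n^{2}$ and reshape each segment row by row into an $n \times n$ matrix; the helper name and the 32 × 32 size are hypothetical.

```python
import numpy as np


def signal_to_matrices(signal, side=32):
    """Split a 1D signal into non-overlapping (side x side) matrices, row-major.

    Trailing samples that do not fill a whole matrix are discarded.
    """
    signal = np.asarray(signal, dtype=float)
    seg_len = side * side
    n_segments = len(signal) // seg_len
    return signal[: n_segments * seg_len].reshape(n_segments, side, side)


raw = np.sin(np.linspace(0.0, 200.0, 5000))  # stand-in for a measured vibration signal
samples = signal_to_matrices(raw, side=32)   # 5000 // 1024 = 4 matrices of 32 x 32
```

Overlapping windows (a stride smaller than $n^{2}$) are another frequent option when more samples are needed from a single record.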

The DHFRF-CN addresses key problems in dual-heterogeneous feature extraction, weight optimization, and fusion conflict under complex operating conditions and noisy environments, achieving superior fault diagnosis performance.

Dual-scale deformable convolution network

The model first extracts heterogeneous features, which may differ in source or modality, from the time-domain signal. DS-DCN is designed to extract dual-scale heterogeneous features that adapt to the characteristics of different data and modalities. In DS-DCN, the two kernel scales emphasize different aspects of the dual-scale heterogeneous features: the small kernel captures fine-grained detail features, and the large kernel captures macroscopic features.

DS-DCN is shown in Figure 5. The multistage DS-DCN is constructed by multiple ordinary convolution layers and deformable convolution layers with varying kernel sizes, which are stacked to progressively learn the hierarchical structure of heterogeneous features. The connection of the ordinary convolution layer module with the deformable convolution layer at different stages enables the proposed network to extract effective cross-modal heterogeneous features. The DS-DCN enhances the perception of dual-scale heterogeneous features by adjusting the sampling points of each layer based on the offset and the signal distribution.

Figure 5.

Architecture of the DS-DCN. DS-DCN: dual-scale deformable convolution network.

Generally, the kernel size and dilation rate of a convolution layer determine its receptive field for extracting global and local features.^41,42 Table 1 shows the sensitivity analysis of kernel sizes and dilation rates. In DS-DCN, CNN layers with larger kernels capture information over a wider range, while DCN layers adjust their sampling points through offsets, offering high flexibility. The dilation rate expands the receptive field of a convolution without increasing the number of parameters. However, because DCN already enlarges the receptive field flexibly through its offsets, the dilation rate is usually kept at 1 or 2 to avoid excessive sparsity.
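The statement that dilation enlarges the receptive field at no parameter cost can be checked with two small helpers (hypothetical names): a $k \times k$ kernel with dilation rate $d$ spans $k + (k - 1)(d - 1)$ input positions along each axis, while its parameter count depends only on $k$ and the channel numbers. This nominal span of a single kernel is not the same quantity as the ERF column in Table 1, which depends on the whole network.

```python
def dilated_span(kernel_size, dilation):
    """Input positions covered along one axis by a kernel with the given dilation."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv2d_params(kernel_size, c_in, c_out, bias=True):
    """Weights (+ biases) of a square 2D convolution; the dilation rate does not appear."""
    return kernel_size * kernel_size * c_in * c_out + (c_out if bias else 0)


for k, d in [(3, 1), (3, 2), (5, 3)]:
    print(f"k={k}, d={d}: span={dilated_span(k, d)}, params(16->32)={conv2d_params(k, 16, 32)}")
```

A 3 × 3 kernel with dilation 2 therefore covers the same 5-position span as a 5 × 5 kernel while keeping the parameter count of a 3 × 3 kernel; a DCN layer additionally spends parameters on the convolution that predicts its offsets.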

Table 1.

Sensitivity analysis of kernel sizes and dilation rates.

Configuration	CNN kernel size	CNN dilation rate	DCN kernel size	DCN dilation rate	Parameters	ERF
Config1	3 × 3	1	3 × 3	1	19,370	7
Config2	3 × 3	1	5 × 5	1	35,754	9
Config3	5 × 5	1	3 × 3	1	19,882	9
Config4	5 × 5	1	7 × 7	1	60,842	13
Config5	7 × 7	1	5 × 5	1	37,034	13
Config6	3 × 3	1	3 × 3	2	19,370	7
Config7	3 × 3	2	5 × 5	1	35,754	9
Config8	5 × 5	3	7 × 7	1	60,842	13

CNN: convolutional neural network; DCN: deformable convolution network; ERF: effective receptive field.

A kernel that is too large may complicate offset learning and increase the number of training parameters, while a kernel that is too small yields an insufficient effective receptive field. Therefore, a balanced choice of parameters is particularly important in fault diagnosis, and kernels of moderate size are usually sufficient to capture features. It can be concluded that, in DS-DCN, a larger kernel should be used in the early stages to adapt to complex vibration signals, while the kernel is appropriately reduced in the later stages to focus on local features.

Adaptive heterogeneous feature adjustment mechanism

The AHFAM is proposed to adaptively calculate and adjust the weight of each feature according to the contribution of the heterogeneous features in the classification task, enhancing the feature discriminative ability of DS-DCN.

As shown in Figure 6, AHFAM employs two modules, global average pooling (GAP) and global maximum pooling (GMP), which aggregate global and local features to capture the dependencies between channels. The GAP module adjusts the overall weights of the heterogeneous features by aggregating the mean values in the region, reflecting the overall trend of the features, while the GMP module assigns weights according to the local maxima of the heterogeneous features, effectively capturing local detail changes. AHFAM additionally aggregates the outputs of both modules into a third, aggregated branch that captures the dependencies between them. The outputs of these three branches are fused into a discriminative heterogeneous feature, which enables the network to consider features across multiple dimensions and yields more accurate feature representations.

Figure 6.

Architecture of the AHFAM. AHFAM: adaptive heterogeneous feature adjustment mechanism.

As can be seen from Figure 6, GAP and GMP are structurally parallel and symmetric. For the input feature map $x \in R^{H \times W \times C}$ (written $X$ in Eqs (6) and (7)), the channel-wise average $G_{1}$ is

G_{1} = \frac{1}{H \times W} \sum_{i = 1}^{H} \sum_{j = 1}^{W} X_{i, j}

(6)

where $H$, $W$, and $C$ denote the height, width, and channel dimensions, respectively, and $X_{i, j}$ denotes the feature of $X$ at location $(i, j)$, traversing every point of $H$ and $W$. GMP searches for the maximum feature value $G_{2}$ of each channel:

G_{2} = \max_{i, j} X_{i, j}

(7)

where $\max_{i, j}(\cdot)$ takes the maximum over all spatial positions $(i, j)$ of the feature map, separately for each channel. To capture the dependencies between channels, the global and local feature components are aggregated into $G_{concat}$:

G_{concat} = \alpha \cdot G_{1} + (1 - \alpha) \cdot G_{2}

(8)

where $\alpha$ is an adaptive learning coefficient with a value in the range [0, 1] that controls the fusion ratio of GAP and GMP. The closer $\alpha$ is to 1, the higher the proportion of global features in the fusion; conversely, local features receive more weight as $\alpha$ tends to 0. The feature components $G_{1}$, $G_{2}$, and $G_{concat}$ are uniformly denoted as $g \in R^{1 \times 1 \times C}$. The channel weights are calculated as

\omega = \sigma(W_{k} g)

(9)

\sigma(x) = \frac{1}{1 + e^{-x}}

(10)

where $\sigma$ is the sigmoid activation function, and $W_{k}$ is the band matrix used for learning channel attention. If only the $k$ neighboring channels of each channel are considered, the weight of channel $i$ is expressed as

\omega_{i} = \sigma\left(\sum_{j = 1}^{k} w^{j} g_{i}^{j}\right), \quad g_{i}^{j} \in \Omega_{i}^{k}

(11)

where $\Omega_{i}^{k}$ denotes the set of $k$ channels adjacent to $g_{i}$, and the learnable parameters $w^{j}$ are shared by all channels. The total weight $V$ is obtained as a weighted sum of the channel weights of the three branches:

V = a \cdot \omega_{a} + b \cdot \omega_{b} + c \cdot \omega_{c}, \quad a + b + c = 1

(12)

where $\omega_{a}$, $\omega_{b}$, and $\omega_{c}$ are the channel weights of the GAP, GMP, and aggregated pooling branches, respectively, and $a$, $b$, and $c$ are dynamic parameters that are continuously optimized during network training. The new feature $\tilde{x}$ is obtained by multiplying the input feature $x$ elementwise by the weight $V$, which is broadcast over the spatial dimensions:

\tilde{x} = V \cdot x

(13)
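Eqs (6) to (13) can be combined into the following NumPy sketch of one AHFAM forward pass. It assumes that each branch has its own 1D kernel over the channel axis (the local cross-channel interaction of Eq. (11), implemented ECA-style with zero padding) and that $\alpha$ and $(a, b, c)$ are given as numbers; during training they would be learnable, for example by passing free parameters through a sigmoid and a softmax, respectively. All function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def local_channel_weights(g, kernel):
    """Eq. (11): weight of channel i from its k neighbours, kernel shared by all channels."""
    k = len(kernel)                        # odd k keeps one output per channel
    padded = np.pad(g, (k // 2, k // 2))   # zero padding at both ends of the channel axis
    return sigmoid(np.array([padded[i:i + k] @ kernel for i in range(len(g))]))


def ahfam(x, alpha, kernels, mix):
    """One AHFAM pass on an (H, W, C) feature map x.

    kernels: (kernel_a, kernel_b, kernel_c) for the GAP, GMP and aggregated branches.
    mix: (a, b, c) with a + b + c = 1.
    """
    g1 = x.mean(axis=(0, 1))                   # Eq. (6): GAP, shape (C,)
    g2 = x.max(axis=(0, 1))                    # Eq. (7): GMP, shape (C,)
    g_concat = alpha * g1 + (1 - alpha) * g2   # Eq. (8)
    w_a, w_b, w_c = (
        local_channel_weights(g, kern) for g, kern in zip((g1, g2, g_concat), kernels)
    )
    a, b, c = mix
    v = a * w_a + b * w_b + c * w_c            # Eq. (12): each entry lies in (0, 1)
    return x * v                               # Eq. (13): broadcast over H and W


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))
kernels = tuple(rng.normal(size=3) for _ in range(3))
x_tilde = ahfam(x, alpha=0.6, kernels=kernels, mix=(0.3, 0.3, 0.4))
```

Because every branch weight comes from a sigmoid and $a + b + c = 1$, each channel is rescaled by a factor in (0, 1), so AHFAM can only attenuate channels relative to one another, never amplify them.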

Resonance fusion mechanism

Traditional feature fusion methods often cause information loss or conflict because of excessive differences between features. To prevent conflicts arising from differences in information content and spatial size between heterogeneous features, the RFM is proposed to capture discriminative cross-modal information. It consists of a structural alignment layer, an interaction connection layer, an association filtering layer, and a feature enhancement layer. These layers work together to obtain heterogeneous resonance gain features, which effectively coordinate the diverse features across both the spatial and structural dimensions.

As shown in Figure 7, the RFM proceeds as follows. First, a structural alignment layer makes the dual-scale heterogeneous features dimensionally and structurally compatible by applying a convolution with "same" padding to each input feature map. For the heterogeneous features $F_{A} \in R^{H_{A} \times W_{A} \times C_{1}}$ and $F_{B} \in R^{H_{B} \times W_{B} \times C_{2}}$, two independent convolution kernels $W_{A}$ and $W_{B}$ are used to compute the aligned features $F_{A}^{'} \in R^{H_{A} \times W_{A} \times C}$ and $F_{B}^{'} \in R^{H_{B} \times W_{B} \times C}$ as

F_{A}^{'} = Conv (F_{A}, W_{A})

(14)

F_{B}^{'} = Conv (F_{B}, W_{B})

(15)

where $W_{A} \in R^{k \times k \times C_{1} \times C}$ and $W_{B} \in R^{k \times k \times C_{2} \times C}$ are convolution kernels, $k$ is the convolution kernel size. Different channel dimensions $C_{1}$ and $C_{2}$ are aligned structurally to form a unified dimension $C$ .

Figure 7.

Architecture of the RFM. RFM: resonance fusion mechanism.

Next, an interaction connection layer is constructed to enable feature interaction. At each spatial location $(x, y)$, the outer product of the aligned features $F_{A}^{'}$ and $F_{B}^{'}$ is projected by an interaction weight tensor to generate the interaction feature $F_{int}$:

F_{int} (x, y, d) = \sum_{i = 1}^{C} \sum_{j = 1}^{C} U (i, j, d) \cdot F_{A}^{'} (x, y, i) \cdot F_{B}^{'} (x, y, j)

(16)

where $U \in R^{C \times C \times D}$ is the interaction weight tensor and $d = 1, \ldots, D$ indexes the interaction features. To compress the spatial dimensions in the interaction connection layer, GAP followed by an activation function is used to generate the interaction feature weights:

A = \sigma(W \cdot \mathrm{GAP}(F_{int}))

(17)

where $\sigma$ is the sigmoid activation function, and $W$ is the weight matrix of a linear transformation applied to the pooled features.
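A minimal NumPy sketch of Eqs (16) and (17) follows. It assumes that the aligned features $F_{A}^{'}$ and $F_{B}^{'}$ have the same spatial size, which Eq. (16) needs in order to pair them at each location $(x, y)$; `np.einsum` expresses the per-location outer product followed by the projection with $U$. Shapes and function names are illustrative.

```python
import numpy as np


def interaction_features(fa, fb, U):
    """Eq. (16): F_int(x, y, d) = sum_ij U(i, j, d) * fa(x, y, i) * fb(x, y, j).

    fa, fb: aligned features of shape (H, W, C); U: interaction tensor (C, C, D).
    """
    return np.einsum("xyi,xyj,ijd->xyd", fa, fb, U)


def interaction_weights(f_int, W):
    """Eq. (17): A = sigmoid(W @ GAP(F_int)); W maps the D pooled values to weights."""
    pooled = f_int.mean(axis=(0, 1))  # GAP over the spatial dimensions
    return 1.0 / (1.0 + np.exp(-(W @ pooled)))


rng = np.random.default_rng(1)
height, width, C, D = 4, 5, 6, 3
fa = rng.normal(size=(height, width, C))
fb = rng.normal(size=(height, width, C))
U = rng.normal(size=(C, C, D))
f_int = interaction_features(fa, fb, U)
A = interaction_weights(f_int, rng.normal(size=(D, D)))
```

Each $F_{int}(x, y, d)$ is a bilinear form $F_{A}^{'}(x, y)^{\top} U_{d} F_{B}^{'}(x, y)$, so every interaction feature responds jointly to both branches at the same location.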

Then, the association filtering layer is applied to the interaction features to retain related features and remove conflicting ones. A masking operation with a threshold preserves features with higher correlation and suppresses or removes those with lower correlation. The threshold $\theta$ is determined by analyzing the distribution of the feature weights and selecting the value with the highest percentage as the measure; the mask is computed as

{\hat{A}}_{(x, y, d)} = \begin{cases} 1, & \text{if } A(x, y, d) \geq \theta \\ 0, & \text{otherwise} \end{cases}

(18)

where ${\hat{A}}_{(x, y, d)}$ is the result for the filtering of interactive feature weights.

Finally, a fully connected layer is used to obtain the fused heterogeneous resonance gain features $F_{fusion}$ as follows:

F_{fusion} = {\hat{A}}_{1} \cdot F_{A}^{'} + {\hat{A}}_{2} \cdot F_{B}^{'}

(19)

where ${\hat{A}}_{1}$ and ${\hat{A}}_{2}$ are the weight coefficients after the masking operation.
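Eqs (18) and (19) can be sketched as a binary mask followed by a masked sum. The section does not state the shapes of $\hat{A}_{1}$ and $\hat{A}_{2}$ or how they are obtained from $A$; this sketch assumes one weight vector per branch with one entry per channel (broadcast over the spatial dimensions), equal spatial sizes for $F_{A}^{'}$ and $F_{B}^{'}$, and a threshold $\theta$ supplied by the caller. The function names are hypothetical.

```python
import numpy as np


def association_mask(A, theta):
    """Eq. (18): 1 where the interaction weight reaches the threshold, 0 otherwise."""
    return (A >= theta).astype(float)


def resonance_fuse(fa, fb, A1, A2, theta):
    """Eq. (19): F_fusion = mask(A1) * F_A' + mask(A2) * F_B' (masks broadcast over H, W)."""
    return association_mask(A1, theta) * fa + association_mask(A2, theta) * fb


fa = np.ones((2, 2, 3))        # aligned branch-A features
fb = 2.0 * np.ones((2, 2, 3))  # aligned branch-B features
A1 = np.array([0.9, 0.2, 0.6])
A2 = np.array([0.1, 0.8, 0.7])
fused = resonance_fuse(fa, fb, A1, A2, theta=0.5)
```

Here channel 0 keeps only branch A, channel 1 only branch B, and channel 2 both, so every spatial position of `fused` holds [1, 2, 3]; channels whose weights fall below $\theta$ in one branch are excluded from that branch's contribution rather than averaged in.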

DHFRF-CN for fault diagnosis

The fault diagnosis process of DHFRF-CN is shown in Figure 8 as follows:

Divide the data samples proportionally into training, validation, and testing sets.

Feed the training set into the network to learn its parameters, and use the validation set to tune the hyperparameters so that the network generalizes well to unseen data.

Perform fault classification on the testing set to obtain the diagnostic results, and visualize them using a confusion matrix and t-distributed stochastic neighbor embedding (t-SNE).

Figure 8.

Fault diagnosis process of DHFRF-CN. DHFRF-CN: dual heterogeneous feature resonance fusion for capsule network.

Experimental verification

This section employs multiple datasets and designs a range of experimental scenarios to evaluate the effectiveness of the DHFRF-CN model in diagnostic tasks. The configuration of the experiments, including the selection of the experimental environment and the model parameters, is also described in detail to ensure that the experiments are conducted under suitable conditions to obtain reliable results.

Experimental configuration and parameter settings

The proposed network is implemented with the TensorFlow 2.9 (Google LLC, Mountain View, CA, USA) framework in Python 3.9 and runs on an Intel Xeon E5-2680 v4 CPU (Intel Corporation, Santa Clara, CA, USA) and an RTX 3060 GPU (NVIDIA Corporation, Santa Clara, CA, USA) under Windows 11. One thousand samples were collected for each class, and the dataset was divided into training, validation, and testing sets in the ratio 7:1:2.⁴³ Each experiment is repeated five times to obtain more reliable results. For the comparison experiments, seven methods were selected for comparison with DHFRF-CN: CNN, multiscale CNN (MSCNN), multiscale CapsNet (MSCN), deep CNN with wide first-layer kernels (WDCNN),⁴⁴ bidirectional long short-term memory and CapsNet with CNN (BLC-CNN),⁴⁵ dual convolutional CapsNet (DC-CN),⁴⁶ and convolutional block attention mechanism CapsNet (CBAM-CN).²⁴
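The 7:1:2 division can be sketched as a per-class (stratified) split. Stratification, the fixed random seed, and the function name are assumptions of this sketch, since the text specifies only the ratio. With 1000 samples per class it yields 700 training, 100 validation, and 200 testing samples per class.

```python
import numpy as np


def split_per_class(labels, ratios=(0.7, 0.1, 0.2), seed=0):
    """Return index arrays (train, val, test), splitting every class by `ratios`."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))  # shuffle within the class
        n_train = int(round(ratios[0] * len(idx)))
        n_val = int(round(ratios[1] * len(idx)))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])                     # remainder goes to testing
    return np.array(train), np.array(val), np.array(test)


labels = np.repeat(np.arange(10), 1000)  # 10 classes x 1000 samples, as in Case 1
train_idx, val_idx, test_idx = split_per_class(labels)
```

Splitting within each class keeps the class proportions identical across the three subsets, which matters when accuracy is compared per condition.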

Case 1: Rolling bearing fault diagnosis

The CWRU bearing dataset contains test data for bearings under normal and fault conditions.⁴⁷ The test platform for this dataset is illustrated in Figure 9. Vibration data of the rolling bearings supporting the motor shaft were collected with accelerometers at sampling frequencies of 12 and 48 kHz. By adjusting the load, data were collected for one healthy state and nine fault states under four different loads (0 to 3 HP). The fault types and data labels are shown in Table 2. Figure 10 shows the vibration waveforms of the 10 states collected at 0 HP.

Figure 9.

CWRU bearing test bench. CWRU: Case Western Reserve University.

Table 2.

Description of sample labels on the CWRU bearing dataset.

Label	0	1	2	3	4	5	6	7	8	9
Fault location	Ball	Ball	Ball	Inner race	Inner race	Inner race	Outer race	Outer race	Outer race	Normal
Fault diameter (10⁻³ inch)	7	14	21	7	14	21	7	14	21	0

Figure 10.

Vibration waveforms of data samples.

Evaluation under identical loading conditions

In this section, the samples used for model training and evaluation are collected under the same working condition. For example, dataset A-A indicates that all samples are from 0 HP. To thoroughly assess the effectiveness of DHFRF-CN, we conducted tests under each working condition separately. As shown in Table 3, the accuracy reaches 100% at 0, 2, and 3 HP, and 99.50% at 1 HP.

Table 3.

Diagnostic accuracy of DHFRF-CN under the same load.

Condition	Precision (%)	Recall (%)	F1 score (%)	Accuracy (%)
A-A	100	100	100	100
B-B	99.51	99.50	99.50	99.50
C-C	100	100	100	100
D-D	100	100	100	100

DHFRF-CN: dual heterogeneous feature resonance fusion for capsule network.

In addition, Figure 11 shows the confusion matrices of DHFRF-CN under the four working conditions to further analyze the classification performance of the model. The accuracy and misclassification rate are labeled below each matrix. The misclassification rate is 0.005 under working condition 1 (1 HP) and 0 under all other working conditions, which verifies the reliability of DHFRF-CN under different working conditions.
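The quantities in Table 3 and Figure 11 can all be derived from a confusion matrix. The sketch below assumes that rows are true classes and columns are predicted classes, and that precision, recall, and F1 are macro-averaged over classes (the averaging scheme is not stated in the text); the 2 × 2 example matrix is made up for illustration.

```python
import numpy as np


def summarize_confusion(cm):
    """Accuracy, misclassification rate and macro precision/recall/F1.

    cm[i, j] counts samples of true class i predicted as class j. A class that is
    never predicted makes its precision undefined (NumPy then returns nan).
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)  # column sums = number of predictions per class
    recall = tp / cm.sum(axis=1)     # row sums = number of true samples per class
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return {
        "accuracy": accuracy,
        "misclassification": 1.0 - accuracy,
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }


stats = summarize_confusion([[2, 0], [1, 1]])
```

With balanced classes, as in this dataset (the same number of test samples per class), macro recall equals accuracy, which is consistent with the identical recall and accuracy columns in Table 3.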

Figure 11.

Confusion matrix of DHFRF-CN under the same loading. DHFRF-CN: dual heterogeneous feature resonance fusion for capsule network.

Evaluation under different loading conditions

To verify the diagnostic ability of DHFRF-CN under variable loads, the samples collected at 0, 1, 2, and 3 HP are denoted as datasets A, B, C, and D, respectively. Dataset A-B indicates that the training and validation samples are from 0 HP and the testing samples are from 1 HP. The results are shown in Table 4. DHFRF-CN not only achieves the best performance under all four conditions but also has the lowest standard deviation.

Table 4.

Diagnostic accuracy of eight methods under different loads (%).

Model	A-B	B-C	C-D	D-A
CNN	81.82 ± 2.661	94.64 ± 2.102	91.16 ± 1.297	62.10 ± 5.028
MSCNN	82.32 ± 2.326	97.04 ± 3.451	91.82 ± 6.052	62.10 ± 3.838
MSCN	81.98 ± 1.161	97.64 ± 1.105	92.80 ± 5.546	65.50 ± 0.779
WDCNN	84.94 ± 2.390	99.32 ± 0.735	90.76 ± 2.855	68.92 ± 2.872
BLC-CNN	79.24 ± 5.298	98.12 ± 0.930	88.72 ± 5.258	68.86 ± 5.232
DC-CN	75.82 ± 3.262	84.48 ± 2.972	72.36 ± 3.470	55.34 ± 2.796
CBAM-CN	85.60 ± 2.811	97.74 ± 2.912	97.62 ± 1.791	72.74 ± 4.469
DHFRF-CN	96.34 ± 0.893	99.78 ± 0.203	99.00 ± 0.469	77.94 ± 0.531

CNN: convolutional neural network; MSCNN: multiscale convolutional neural network; MSCN: multiscale capsule network; WDCNN: deep convolutional neural network with wide first-layer kernel; BLC-CNN: bidirectional long short-term memory and capsule network with convolutional neural network; DC-CN: dual convolutional capsule network; CBAM-CN: convolutional block attention mechanism capsule network; DHFRF-CN: dual heterogeneous feature resonance fusion for capsule network.

Bold represents the optimal performance of DHFRF-CN under different conditions.

Figure 12 shows the diagnostic accuracy of the eight models under different loads. Overall, the accuracies on dataset A-B are lower than those on B-C and C-D, which indicates that the large difference in signal characteristics between datasets A and B reduces accuracy. Even with such large feature differences, DHFRF-CN still achieves a superior diagnostic result compared with the other models, which further validates its feature difference extraction ability. The signal feature differences between datasets B-C and C-D are small, and most of the eight models can extract useful and reliable features; DHFRF-CN achieves the highest accuracies of 99.78% and 99.00% on these two datasets, respectively, which shows that the method is robust in addition to its strong feature difference extraction ability. Dataset D-A has the lowest evaluation accuracy, which indicates a large signal feature difference between datasets D and A. On this dataset, the gains of DHFRF-CN over CNN, MSCNN, MSCN, WDCNN, BLC-CNN, DC-CN, and CBAM-CN are 15.84, 15.84, 12.44, 9.02, 9.08, 22.60, and 5.20 percentage points, respectively. DC-CN has the lowest accuracy, indicating that a single-scale CapsNet is unable to extract diverse features. CBAM-CN has the second highest accuracy after the present method and is the only compared model above 70%. This shows that the channel and spatial attention in the convolutional block attention mechanism can effectively extract key information but cannot mitigate information conflicts between scales. DHFRF-CN not only overcomes the limitation of single-scale networks that extract only a single type of feature, but also extracts the most critical and effective feature information. It also avoids conflicting information differences between channels at different scales and achieves the highest accuracy in all variable-load experiments.
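The improvement of DHFRF-CN on dataset D-A over each baseline follows directly from the mean accuracies in Table 4; the short calculation below reproduces it in percentage points.

```python
# Mean D-A accuracies (%) taken from Table 4.
d_to_a = {
    "CNN": 62.10, "MSCNN": 62.10, "MSCN": 65.50, "WDCNN": 68.92,
    "BLC-CNN": 68.86, "DC-CN": 55.34, "CBAM-CN": 72.74,
}
dhfrf_cn = 77.94

# Gain of DHFRF-CN over each baseline, rounded to two decimals.
gains = {name: round(dhfrf_cn - acc, 2) for name, acc in d_to_a.items()}
for name, gain in gains.items():
    print(f"{name:8s} +{gain:.2f} pp")
```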

Figure 12.

Diagnostic accuracy of eight methods under different loads.

Figure 13 shows the accuracy of the eight models in each individual trial under different working conditions. DHFRF-CN achieves the highest accuracy under all working conditions, and its results over the five trials are stable, with only small fluctuations. This indicates that DHFRF-CN performs consistently across repeated experiments and copes robustly with the variations between working conditions.

Figure 13.

Diagnostic accuracy of each trial for the eight methods at different loads.

In contrast, the diagnostic accuracies of the other seven methods fluctuate considerably, indicating that they are sensitive to changes in working conditions or to variations in the training data. For example, the low accuracy of DC-CN indicates that relying on only two convolutional layers to process the signal fails to extract sufficiently rich features for the CapsNet to classify effectively. The heterogeneous resonance gain features that DHFRF-CN obtains through its first three processing stages provide richer and more diverse inputs for the CapsNet. This enhances the diversity and accuracy of the feature representation and lays the foundation for accurate classification by the CapsNet. Therefore, the stable and superior performance of DHFRF-CN in this experiment indicates its potential for fault diagnosis tasks, especially under complex working conditions and data uncertainty.
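This section does not state how the CapsNet routes the heterogeneous resonance gain features to the class capsules. Assuming it uses the squash nonlinearity and routing-by-agreement of Sabour et al.,²¹ the routing step can be sketched in NumPy as below. The capsule counts and dimensions, the three routing iterations, and the function names are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Shrink vector length into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iters=3):
    """Routing-by-agreement between input and output capsules.

    u_hat : (n_in, n_out, d_out) prediction vectors of each input capsule
            for each output (class) capsule.
    Returns output capsules v of shape (n_out, d_out).
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                       # routing logits
    for _ in range(n_iters):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)             # softmax over output capsules
        s = np.einsum("ij,ijd->jd", c, u_hat)         # weighted sum per output capsule
        v = squash(s)                                 # (n_out, d_out)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)     # raise logits where predictions agree
    return v

rng = np.random.default_rng(0)
u_hat = rng.standard_normal((32, 5, 16))   # hypothetical: 32 inputs, 5 classes, 16-D
v = dynamic_routing(u_hat)
lengths = np.linalg.norm(v, axis=-1)       # capsule length acts as class score
print(lengths.argmax())
```

The predicted class is the output capsule with the longest vector; squashing keeps every length below 1 so it can be read as a class-presence score.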

Case2: Rolling bearing fault diagnosis

The test bench used to collect the dataset of the University of Paderborn (PU), Germany, is shown in Figure 14. Bearing data were collected under four operating conditions with different speeds and loads. The bearings in the PU dataset fall into three types: normal, inner ring fault, and outer ring fault.⁴⁸ Each damaged bearing is assigned damage level 1 or level 2 according to the severity of the fault. In this experiment, five bearing states were selected. As shown in Table 5, they comprise one healthy bearing (K001), two outer ring fault bearings with different damage levels (KA05, KA06), and two inner ring fault bearings with different damage levels (KI05, KI07).

Figure 14.

PU bearing test bench. PU: University of Paderborn.

Table 5.

Description of sample labels on the PU bearing dataset.

Dataset	Label	0	1	2	3	4
PU bearing dataset	Fault location	Normal	Outer race	Outer race	Inner race	Inner race
	Damage level	0	1	2	1	2
	Description	K001	KA05	KA06	KI05	KI07

PU: University of Paderborn.
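When building training targets, the label assignment of Table 5 can be expressed as a small lookup. This is a minimal sketch: the dictionary, the helper names, and the assumed file-name pattern (bearing code as the fourth underscore-separated field, e.g. `N15_M07_F10_KA05_1.mat`) are illustrative assumptions rather than part of the described pipeline.

```python
# Table 5: PU bearing code -> (class label, fault location, damage level)
PU_LABELS = {
    "K001": (0, "normal", 0),
    "KA05": (1, "outer race", 1),
    "KA06": (2, "outer race", 2),
    "KI05": (3, "inner race", 1),
    "KI07": (4, "inner race", 2),
}

def label_of(bearing_code: str) -> int:
    """Return the integer class label for a PU bearing code."""
    return PU_LABELS[bearing_code][0]

def label_from_filename(name: str) -> int:
    """Hypothetical helper; assumes <speed>_<torque>_<force>_<bearing code>_<index>.mat."""
    return label_of(name.split("_")[3])

print(label_from_filename("N15_M07_F10_KI07_3.mat"))  # 4
```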

Fault diagnostic results under benchmark conditions

The training accuracy and loss curves of DHFRF-CN under 0 HP are shown in Figure 15. The curves converge within the first 10 training epochs and remain stable thereafter, indicating that DHFRF-CN converges quickly and trains stably.
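Figure 15 does not state which loss is plotted. Assuming DHFRF-CN keeps the standard CapsNet margin loss of Sabour et al.²¹ (m⁺ = 0.9, m⁻ = 0.1, λ = 0.5), the quantity being minimized can be sketched from the class-capsule lengths as follows; the function name and argument layout are illustrative.

```python
import numpy as np

def margin_loss(lengths, labels, m_pos=0.9, m_neg=0.1, lam=0.5):
    """CapsNet margin loss averaged over the batch (assumed, per Sabour et al.).

    lengths : (batch, n_classes) lengths of the class capsules, in [0, 1)
    labels  : (batch,) integer class labels
    """
    t = np.eye(lengths.shape[1])[labels]                          # one-hot targets
    pos = t * np.maximum(0.0, m_pos - lengths) ** 2               # true class too short
    neg = lam * (1 - t) * np.maximum(0.0, lengths - m_neg) ** 2   # other classes too long
    return float(np.mean(np.sum(pos + neg, axis=1)))

# Confident, correct prediction: no penalty
print(margin_loss(np.array([[0.95, 0.05, 0.05, 0.05, 0.05]]), np.array([0])))
# True capsule too short and one wrong capsule too long: about 0.16 + 0.02 = 0.18
print(margin_loss(np.array([[0.5, 0.3, 0.05, 0.05, 0.05]]), np.array([0])))
```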

Figure 15.

Training loss curve.

Under 0 HP, the diagnostic results of the eight methods are shown in Table 6 and Figure 16. DHFRF-CN achieves an average precision, recall, F1 score, and accuracy of 97.72, 97.68, 97.65, and 97.68%, respectively, the best values among all methods. Its standard deviations (0.189–0.216%) are also the lowest among all networks, indicating greater stability. Figure 17 shows the confusion matrices of the eight methods at 0 HP. DHFRF-CN (Figure 17(h)) achieves the highest overall classification accuracy, with a misclassification rate of only 0.022.
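The section does not state how precision, recall, and F1 score are averaged over classes. Assuming macro averaging, all four metrics and the misclassification rate can be read off a confusion matrix as sketched below. With a balanced test set, macro-averaged recall equals accuracy, which is consistent with the identical Recall and Accuracy columns in Table 6. The confusion matrix here is a toy example, not data from the paper.

```python
import numpy as np

def macro_metrics(cm):
    """Macro precision, recall, F1, accuracy, and error rate from a confusion matrix.

    cm[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / np.maximum(cm.sum(axis=0), 1e-12)  # column sums = predicted counts
    recall = tp / np.maximum(cm.sum(axis=1), 1e-12)     # row sums = true counts
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    accuracy = tp.sum() / cm.sum()
    return {
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),            # mean of per-class F1 scores
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
    }

# Toy balanced example (10 samples per class), not from the paper
cm = np.array([[10, 0, 0],
               [1, 9, 0],
               [0, 2, 8]])
m = macro_metrics(cm)
print({k: round(float(val), 4) for k, val in m.items()})
```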

Table 6.

Diagnostic results under benchmark conditions.

Model	Max-acc (%)	Min-acc (%)	Precision (%)	Recall (%)	F1 score (%)	Accuracy (%)
CNN	85.40	83.20	84.29 ± 0.949	84.20 ± 0.940	84.17 ± 0.899	84.20 ± 0.940
MSCNN	93.20	92.60	93.22 ± 0.251	92.87 ± 0.251	92.82 ± 0.272	92.87 ± 0.251
MSCN	89.60	87.20	88.80 ± 1.141	88.53 ± 1.153	88.38 ± 1.232	88.53 ± 1.153
WDCNN	94.40	93.40	93.66 ± 0.521	93.73 ± 0.526	93.67 ± 0.521	93.73 ± 0.526
BLC-CNN	93.60	90.40	92.50 ± 1.437	92.47 ± 1.605	92.44 ± 1.503	92.47 ± 1.605
DC-CN	94.00	91.00	93.27 ± 0.998	92.73 ± 1.452	92.64 ± 1.605	92.73 ± 1.452
CBAM-CN	78.40	77.20	78.13 ± 1.110	77.60 ± 0.653	76.95 ± 0.320	77.60 ± 0.653
DHFRF-CN	97.85	97.40	97.72 ± 0.189	97.68 ± 0.216	97.65 ± 0.210	97.68 ± 0.216

CNN: convolutional neural network; MSCNN: multiscale convolutional neural network; MSCN: multiscale capsule network; WDCNN: deep convolutional neural network with wide first-layer kernel; BLC-CNN: bidirectional long short-term memory and capsule network with convolutional neural network; DC-CN: dual convolutional capsule network; CBAM-CN: convolutional block attention mechanism capsule network; DHFRF-CN: dual heterogeneous feature resonance fusion for capsule network.

Bold represents the optimal performance of DHFRF-CN under different conditions.

Figure 16.

Diagnostic results of the eight methods.

Figure 17.

Confusion matrix: (a) CNN, (b) MSCNN, (c) MSCN, (d) WDCNN, (e) BLC-CNN, (f) DC-CN, (g) CBAM-CN, and (h) DHFRF-CN. CNN: convolutional neural network; MSCNN: multiscale convolutional neural network; MSCN: multiscale capsule network; WDCNN: deep convolutional neural network with wide first-layer kernel; BLC-CNN: bidirectional long short-term memory and capsule network with convolutional neural network; DC-CN: dual convolutional capsule network; CBAM-CN: convolutional block attention mechanism capsule network; DHFRF-CN: dual heterogeneous feature resonance fusion for capsule network.

Fault diagnostic results in complex noise environments

To verify the reliability of DHFRF-CN in complex noise environments, Gaussian white noise at different SNRs was added to the original vibration signals recorded under benchmark conditions to construct noisy vibration signals.⁴⁹ The lower the SNR, the stronger the noise; conversely, the higher the SNR, the weaker the noise. The experimental results are listed in Table 7 and visualized in Figures 18 and 19. Interference in the signal clearly degrades the performance of all eight methods, yet DHFRF-CN outperforms the other seven methods at every noise level. Under strong noise with an SNR of −6 dB, DHFRF-CN improves accuracy by 16.60, 11.87, 10.60, 23.67, 19.10, 10.87, and 18.67 percentage points over CNN, MSCNN, MSCN, WDCNN, BLC-CNN, DC-CN, and CBAM-CN, respectively. This further demonstrates the strong robustness of DHFRF-CN.
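A common way to add Gaussian white noise at a prescribed SNR is to set the noise variance to σ² = P_s · 10^(−SNR/10), where P_s is the mean power of the clean segment. The section does not detail its procedure, so the per-segment power estimate, the function name `add_awgn`, and the synthetic sine used as a stand-in for a vibration segment are assumptions in this minimal sketch.

```python
import numpy as np

def add_awgn(x, snr_db, rng=None):
    """Add zero-mean Gaussian white noise so that 10*log10(Ps/Pn) is about snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    p_signal = np.mean(x ** 2)                      # mean power of the clean signal
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))  # target noise power
    noise = rng.normal(0.0, np.sqrt(p_noise), size=x.shape)
    return x + noise

rng = np.random.default_rng(42)
t = np.arange(20000) / 12000.0          # hypothetical 12 kHz sampling
x = np.sin(2 * np.pi * 50 * t)          # stand-in for a clean vibration segment
y = add_awgn(x, -6, rng)
measured = 10 * np.log10(np.mean(x ** 2) / np.mean((y - x) ** 2))
print(round(measured, 2))               # close to -6 dB
```

At −6 dB the noise power is 10^0.6 ≈ 3.98 times the signal power, which is why this setting is the hardest case in Table 7.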

Table 7.

Detailed diagnostic results of the eight methods under different SNRs (%).

Model	−6 dB	−4 dB	−2 dB	0 dB	2 dB	4 dB	6 dB
CNN	58.07	63.07	66.80	71.27	73.67	78.80	80.87
MSCNN	62.80	69.93	78.27	83.60	86.33	88.13	88.67
MSCN	64.07	70.07	76.33	81.87	83.07	84.87	85.53
WDCNN	51.00	62.73	73.00	81.73	85.60	91.07	91.47
BLC-CNN	55.57	62.60	74.07	75.27	80.30	87.00	88.00
DC-CN	63.80	70.87	78.27	84.27	86.40	88.93	90.67
CBAM-CN	56.00	57.60	60.20	65.20	70.93	71.93	74.53
DHFRF-CN	74.67	80.60	86.40	89.80	93.47	93.87	95.53

Bold represents the optimal performance of DHFRF-CN under different conditions.
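The −6 dB gains quoted for Table 7 are absolute differences between accuracy entries, that is, percentage points rather than relative improvements. A short sketch reproducing them from the table (the dictionary layout is only for illustration):

```python
# −6 dB column of Table 7 (PU bearing dataset), accuracy in %
acc_minus6 = {
    "CNN": 58.07, "MSCNN": 62.80, "MSCN": 64.07, "WDCNN": 51.00,
    "BLC-CNN": 55.57, "DC-CN": 63.80, "CBAM-CN": 56.00,
}
ours = 74.67  # DHFRF-CN at −6 dB

# Absolute gain in percentage points
gains = {name: round(ours - acc, 2) for name, acc in acc_minus6.items()}
# Relative gain in % of the baseline accuracy, for comparison
rel = {name: round(100 * (ours - acc) / acc, 1) for name, acc in acc_minus6.items()}
print(gains)
print(rel)
```

The same subtraction on the −6 dB column of Table 10 reproduces the SEU gains (for example, 84.00 − 68.33 = 15.67 for CNN). Relative gains are noticeably larger; for CNN on PU it is about 28.6%.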

Figure 18.

Diagnostic accuracy of the eight methods under different SNRs. SNR: signal-to-noise ratio.

Figure 19.

Diagnostic accuracy of the eight methods under different SNRs. SNR: signal-to-noise ratio.

Case3: Gearbox fault diagnosis

In this section, the Southeast University (SEU) gearbox dataset is used for fault diagnosis experiments.⁵⁰ As shown in Figure 20, the experiments were conducted on a driveline dynamics simulator. The dataset contains two sub-datasets, one for bearings and one for gearboxes, each comprising one healthy state and four fault states. The gearbox fault types are listed in Table 8.

Figure 20.

SEU gearbox test bench. SEU: Southeast University.

Table 8.

Description of sample labels on the SEU gearbox dataset.

Dataset	Label	0	1	2	3	4
SEU gearbox dataset	Fault location	Normal	Chipped	Miss	Root	Surface
	Description	Health state	Teeth crack	Missing teeth	Root crack	Surface wear

SEU: Southeast University.

Fault diagnostic results under benchmark conditions

The training loss curve of DHFRF-CN is shown in Figure 21. The model converges quickly: the curves drop rapidly and stabilize within the first five training epochs. In addition, the average training accuracy curve remains stable throughout training; together with the high test accuracy reported below, this suggests that DHFRF-CN generalizes well.

Figure 21.

Training loss curve.

As the detailed data in Table 9 show, the minimum accuracy of DHFRF-CN (99.20%) is higher than the maximum accuracy of every other method. As shown in Figure 22, DHFRF-CN achieved accuracies of 99.4, 99.2, 99.8, 99.2, and 99.6% in five independent experiments, the highest in each experiment, which verifies the superior performance of the method. The bar chart in Figure 22 also compares DHFRF-CN with the other seven methods. DHFRF-CN shows clear advantages in both average F1 score and average accuracy; in average accuracy, it exceeds CNN, MSCNN, MSCN, WDCNN, BLC-CNN, DC-CN, and CBAM-CN by 14.36, 2.28, 8.72, 1.80, 1.76, 1.08, and 10.68 percentage points, respectively. In addition, the standard deviations of DHFRF-CN are the lowest among all methods for every metric, indicating higher stability and robustness.
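The ± value reported for DHFRF-CN's accuracy in Table 9 (0.233) is reproduced by the population standard deviation (NumPy's default `ddof=0`) of the five trial accuracies listed above; the sample standard deviation (`ddof=1`) would be somewhat larger. A minimal check:

```python
import numpy as np

trials = np.array([99.4, 99.2, 99.8, 99.2, 99.6])  # five runs of DHFRF-CN (SEU)
mean = trials.mean()
std_pop = trials.std()            # ddof=0, population standard deviation
std_sample = trials.std(ddof=1)   # ddof=1, sample standard deviation
print(f"{mean:.2f} ± {std_pop:.3f}")     # 99.44 ± 0.233
print(f"sample std: {std_sample:.3f}")   # sample std: 0.261
```

Stating which estimator is used matters when only five runs are averaged, since the two differ by a factor of √(5/4) ≈ 1.12.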

Table 9.

Diagnostic results under benchmark conditions.

Model	Max-acc (%)	Min-acc (%)	Precision (%)	Recall (%)	F1 score (%)	Accuracy (%)
CNN	86.80	82.20	85.34 ± 1.438	85.08 ± 1.536	84.95 ± 1.517	85.08 ± 1.536
MSCNN	97.80	96.20	97.26 ± 0.546	97.16 ± 0.637	97.14 ± 0.662	97.16 ± 0.637
MSCN	92.80	88.40	91.45 ± 1.495	90.72 ± 1.800	90.73 ± 1.800	90.72 ± 1.800
WDCNN	98.20	96.20	97.78 ± 0.637	97.64 ± 0.752	97.65 ± 0.737	97.64 ± 0.752
BLC-CNN	98.20	96.80	97.73 ± 0.459	97.68 ± 0.466	97.68 ± 0.465	97.68 ± 0.466
DC-CN	98.80	98.00	98.40 ± 0.304	98.36 ± 0.320	98.36 ± 0.318	98.36 ± 0.320
CBAM-CN	90.40	87.00	89.55 ± 0.965	88.76 ± 1.162	88.61 ± 1.287	88.76 ± 1.162
DHFRF-CN	99.80	99.20	99.46 ± 0.227	99.44 ± 0.233	99.44 ± 0.232	99.44 ± 0.233

Bold represents the optimal performance of DHFRF-CN under different conditions.

Figure 22.

Diagnostic results of the eight methods.

To further compare the learned features, Figure 23 visualizes the features of the eight methods using the t-SNE algorithm, with different colors representing the five operating states of the gearbox. The features learned by DHFRF-CN (Figure 23(h)) form the most distinct clusters, gathering samples of the same gear state closely together. By comparison, the clusters of the other seven methods (Figure 23(a) to (g)) are less compact, and the separation between gear states is less clear than for DHFRF-CN. Therefore, DHFRF-CN not only achieves higher classification accuracy but also learns more discriminative feature representations.
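t-SNE plots are qualitative, and their layout depends on settings such as perplexity and the random seed. One complementary quantitative check, not used in this article, is the mean silhouette coefficient of the labeled features: values near 1 indicate compact, well-separated classes, and negative values indicate overlap. A minimal NumPy sketch, shown on toy 2-D points rather than the paper's features (at least two classes are required):

```python
import numpy as np

def silhouette_score(X, labels):
    """Mean silhouette coefficient with Euclidean distance (NumPy only)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    classes = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum() - 1                # exclude the point itself
        if n_same == 0:
            continue                           # singleton cluster: score stays 0
        a = d[i, same].sum() / n_same          # mean intra-class distance
        b = min(d[i, labels == c].mean() for c in classes if c != labels[i])
        scores[i] = (b - a) / max(a, b)        # nearest other class vs. own class
    return scores.mean()

X = np.array([[0, 0], [0, 1], [10, 0], [10, 1]])  # toy features
good = silhouette_score(X, [0, 0, 1, 1])  # labels match the geometry
bad = silhouette_score(X, [0, 1, 0, 1])   # labels cut across the clusters
print(round(good, 3), round(bad, 3))
```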

Figure 23.

Feature distribution visualization: (a) CNN, (b) MSCNN, (c) MSCN, (d) WDCNN, (e) BLC-CNN, (f) DC-CN, (g) CBAM-CN, and (h) DHFRF-CN. CNN: convolutional neural network; MSCNN: multiscale convolutional neural network; MSCN: multiscale capsule network; WDCNN: deep convolutional neural network with wide first-layer kernel; BLC-CNN: bidirectional long short-term memory and capsule network with convolutional neural network; DC-CN: dual convolutional capsule network; CBAM-CN: convolutional block attention mechanism capsule network; DHFRF-CN: dual heterogeneous feature resonance fusion for capsule network.

Fault diagnostic results in complex noise environments

To verify the reliability of DHFRF-CN in complex noise environments, Gaussian white noise at different SNRs was added to the original vibration signals recorded under benchmark conditions to construct noisy vibration signals. The lower the SNR, the stronger the noise; conversely, the higher the SNR, the weaker the noise. The experimental results are listed in Table 10 and visualized in Figures 24 and 25. Interference in the signal clearly degrades the performance of all eight methods, yet DHFRF-CN outperforms the other seven methods at every noise level. Under strong noise with an SNR of −6 dB, DHFRF-CN improves accuracy by 15.67, 9.60, 16.47, 12.13, 9.20, 9.20, and 9.93 percentage points over CNN, MSCNN, MSCN, WDCNN, BLC-CNN, DC-CN, and CBAM-CN, respectively. This further demonstrates the strong robustness of DHFRF-CN.

Table 10.

Detailed diagnostic results of the eight methods under different SNRs (%).

Model	−6 dB	−4 dB	−2 dB	0 dB	2 dB	4 dB	6 dB
CNN	68.33	70.27	73.60	74.47	77.27	78.77	79.60
MSCNN	74.40	79.27	84.93	86.73	89.20	93.07	93.47
MSCN	67.53	72.13	73.47	78.67	81.27	84.80	88.33
WDCNN	71.87	72.13	75.40	81.33	89.87	93.33	96.33
BLC-CNN	74.80	78.47	78.93	84.87	90.73	92.80	95.93
DC-CN	74.80	78.53	85.53	86.47	94.80	95.33	96.93
CBAM-CN	74.07	76.47	77.60	77.87	79.53	80.93	81.47
DHFRF-CN	84.00	95.13	96.20	97.40	97.53	97.67	98.87

SNR: signal-to-noise ratio; CNN: convolutional neural network; MSCNN: multiscale convolutional neural network; MSCN: multiscale capsule network; WDCNN: deep convolutional neural network with wide first-layer kernel; BLC-CNN: bidirectional long short-term memory and capsule network with convolutional neural network; DC-CN: dual convolutional capsule network; CBAM-CN: convolutional block attention mechanism capsule network; DHFRF-CN: dual heterogeneous feature resonance fusion for capsule network.

Bold represents the optimal performance of DHFRF-CN under different conditions.

Figure 24.

Diagnostic accuracy of the eight methods under different SNRs. SNR: signal-to-noise ratio.

Figure 25.

Diagnostic accuracy of the eight methods under different SNRs. SNR: signal-to-noise ratio.

Ablation study

In this section, we carry out ablation experiments on the PU bearing and SEU gearbox datasets to verify the contribution and effect of each module in the network.

Effectiveness of the AHFAM

In this section, we constructed DHFRF-CN-NAHFAM, a variant without AHFAM, to verify the effectiveness of AHFAM. Experiments were performed on both the PU bearing and SEU gearbox datasets; detailed results are listed in Table 11 and visualized in Figures 26 and 27. On the two datasets, the diagnostic accuracy of DHFRF-CN is 10.84 and 4.48 percentage points higher than that of DHFRF-CN-NAHFAM, respectively, with a smaller standard deviation. This indicates that AHFAM effectively extracts discriminative information from fault features during learning, thereby significantly enhancing the model's fault recognition capability.

Table 11.

Average diagnostic accuracy of the four methods.

Dataset	Method	Accuracy (%)
PU bearing dataset	DHFRF-CN-NAHFAM	86.84 ± 1.098
	DHFRF-CN-NRFM	92.68 ± 0.928
	DHFRF-CN-SS	89.16 ± 0.557
	DHFRF-CN	97.68 ± 0.216
SEU gearbox dataset	DHFRF-CN-NAHFAM	94.96 ± 2.136
	DHFRF-CN-NRFM	97.72 ± 0.530
	DHFRF-CN-SS	94.32 ± 3.120
	DHFRF-CN	99.44 ± 0.233

PU: University of Paderborn; SEU: Southeast University; DHFRF-CN: dual heterogeneous feature resonance fusion for capsule network; NAHFAM: No adaptive heterogeneous feature adjustment mechanism; NRFM: No resonance fusion mechanism; SS: Single-channel structure. Bold represents the optimal performance of DHFRF-CN under different conditions.

Figure 26.

Average accuracy for the four methods.

Figure 27.

Diagnostic accuracy of each trial for the four methods.

Effectiveness of the RFM

In this section, we constructed DHFRF-CN-NRFM, a variant without RFM, to verify the effectiveness of RFM. Experiments were performed on both the PU bearing and SEU gearbox datasets; detailed results are listed in Table 11 and visualized in Figures 26 and 27. On the two datasets, the diagnostic accuracy of DHFRF-CN is 5.00 and 1.72 percentage points higher than that of DHFRF-CN-NRFM, respectively, with a smaller standard deviation. This suggests that RFM adaptively integrates feature information from different channels and effectively avoids information conflicts.

Effectiveness of the dual-scale network

In this section, we constructed DHFRF-CN-SS, a single-channel variant, to verify the effectiveness of the dual-scale network. Experiments were performed on both the PU bearing and SEU gearbox datasets; detailed results are listed in Table 11 and visualized in Figures 26 and 27. On the two datasets, the diagnostic accuracy of DHFRF-CN is 8.52 and 5.12 percentage points higher than that of DHFRF-CN-SS, respectively, with a smaller standard deviation. This demonstrates the advantage of the dual-scale network in capturing diverse features. In contrast, a single-channel structure is limited in handling diverse features, making it difficult to fully uncover potential fault modes and leading to poorer diagnostic performance.
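The ablation gaps reported in this section are differences between rows of Table 11, in percentage points. A short sketch that recomputes them (the dictionary layout and the helper name `ablation_gaps` are illustrative):

```python
# Mean accuracies (%) from Table 11
table11 = {
    "PU": {"NAHFAM": 86.84, "NRFM": 92.68, "SS": 89.16, "full": 97.68},
    "SEU": {"NAHFAM": 94.96, "NRFM": 97.72, "SS": 94.32, "full": 99.44},
}

def ablation_gaps(table):
    """Accuracy drop (percentage points) of each ablated variant vs. the full model."""
    return {
        ds: {v: round(rows["full"] - acc, 2) for v, acc in rows.items() if v != "full"}
        for ds, rows in table.items()
    }

gaps = ablation_gaps(table11)
for ds, g in gaps.items():
    print(ds, g, "largest drop:", max(g, key=g.get))
```

The ranking differs between datasets: on PU, removing AHFAM costs the most (10.84 points), whereas on SEU, reverting to a single-channel structure costs the most (5.12 points).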

Conclusion

In this article, we construct a novel DHFRF-CN for mechanical fault diagnosis. The method accurately captures heterogeneous features and avoids information conflicts between features under complex working conditions or noise interference. Case studies on different fault datasets compare DHFRF-CN with state-of-the-art networks, and the results show that the proposed network has strong fault diagnosis and noise resistance capabilities. In addition, ablation experiments verify the effectiveness of each DHFRF-CN component in fault diagnosis.

The DHFRF-CN method proposed in this article focuses on fault diagnosis of rolling bearings and gearboxes. Future research will explore its application to a broader range of machinery, such as motors and pumps. By adapting the method to the fault characteristics of different equipment, we will further validate the general applicability and robustness of DHFRF-CN in mechanical fault diagnosis. Additionally, with the rapid development of machine learning, emerging techniques such as transfer learning and self-supervised learning have made significant progress. Future research will therefore focus on how to effectively integrate these techniques into the existing framework to address more complex operating conditions and diverse fault types.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (nos. 51875457 and 62271390), the Key Research and Development Program of Shaanxi Province of China (2025CY-YBXM-602), the Scientific Research Program Funded by Education Department of Shaanxi Provincial Government (program no. 24JC083), and the Graduate Student Innovation Fund of Xi'an University of Posts and Telecommunications (CXJJYL2024069).

ORCID iDs

Youming Wang

Gaige Chen

Xianzhi Wang

References

1. Yang, Song, et al. Twin broad learning system for fault diagnosis of rotating machinery. IEEE Trans Instrum Meas 2023; 72: 1–12.

2. Yang, Zhang, Jiang. Mechanical fault diagnosis based on deep transfer learning: a review. Meas Sci Technol 2023; 34(11): 112001.

3. Tang, Yan, et al. Deep transfer learning strategy in intelligent fault diagnosis of rotating machinery. Eng Appl Artif Intell 2024; 134: 108678.

4. Tama, Vania, Lee, et al. Recent advances in the application of deep learning for fault diagnosis of rotating machinery using vibration signals. Artif Intell Rev 2023; 56(5): 4667–4709.

5. Yan, Liu, et al. Bearing fault feature extraction method based on enhanced differential product weighted morphological filtering. Sensors 2022; 22(16): 6184.

6. Zhou, Xiao, Bartos, et al. Remaining useful life prediction and fault diagnosis of rolling bearings based on short-time Fourier transform and convolutional neural network. Shock Vib 2020; 2020(1): 8857307.

7. Liang, Wang, et al. A wavelet packet transform-based deep feature transfer learning method for bearing fault diagnosis under different working conditions. Measurement 2022; 201: 111597.

8. Kwon, Lee, Choi, et al. Empirical mode decomposition and Hilbert-Huang transform-based eccentricity fault detection and classification with demagnetization in 120 kW interior permanent magnet synchronous motors. Expert Syst Appl 2024; 241: 122515.

9. Liu, Han, et al. Attention on the key modes: machinery fault diagnosis transformers through variational mode decomposition. Knowl Based Syst 2024; 289: 111479.

10. Chaleshtori, Aghaie. A novel bearing fault diagnosis approach using the Gaussian mixture model and the weighted principal component analysis. Reliab Eng Syst Saf 2024; 242: 109720.

11. Zhang, Jin, et al. RTSMFFDE-HKRR: a fault diagnosis method for train bearing in noise environment. Measurement 2025; 239: 115417.

12. Shen, Xiao, Wang, et al. Rolling bearing fault diagnosis based on support vector machine optimized by improved grey wolf algorithm. Sensors 2023; 23(14): 6645.

13. Sridharan, Sugumaran. Visual fault detection in photovoltaic modules using decision tree algorithms with deep learning features. Energ Sources Part A Recovery Util Environ Eff 2025; 47(2): 2020379.

14. Tyagi, Panigrahi SK. An SVM-ANN hybrid classifier for diagnosis of gear fault. Appl Artif Intell 2017; 31(3): 209–231.

15. Chen, Huang, Zhao, et al. Multiscale convolutional neural network with feature alignment for bearing fault diagnosis. IEEE Trans Instrum Meas 2021; 70: 1–10.

16. Zhao, Jin, et al. Prediction of bearing remaining useful life based on a two-stage updated digital twin. Adv Eng Inform 2025; 65: 103123.

17. Jin, Wei. Intelligent fault diagnosis of train axle box bearing based on parameter optimization VMD and improved DBN. Eng Appl Artif Intell 2022; 110: 104713.

18. Zhang, Shen, et al. Intelligent fault diagnosis of multi-way directional valves in hydraulic systems using digital twin and deep learning approaches. Mech Syst Signal Process 2025; 230: 112579.

19. Ruan, Wang, Yan, et al. CNN parameter design based on fault signal analysis and its application in bearing fault diagnosis. Adv Eng Inform 2023; 55: 101877.

20. Jiang, et al. A time-frequency residual convolution neural network for the fault diagnosis of rolling bearings. Processes 2023; 12(1): 54.

21. Sabour, Frosst, Hinton GE. Dynamic routing between capsules. Adv Neural Inf Process Syst 2017; 30: 3856–3866.

22. Huang, Wang, et al. A robust weight-shared capsule network for intelligent machinery fault diagnosis. IEEE Trans Ind Inform 2020; 16(10): 6466–6475.

23. Liu, Zhang, Jiang. Imbalanced fault diagnosis of rolling bearing using improved MsR-GAN and feature enhancement-driven CapsNet. Mech Syst Signal Process 2022; 168: 108664.

24. Lee CKM, et al. Ensemble capsule network with an attention mechanism for the fault diagnosis of bearings from imbalanced data samples. Sensors 2022; 22(15): 5543.

25. Ren, et al. Compound fault diagnosis of planetary gearbox based on improved LTSS-bow model and capsule network. Sensors 2024; 24(3): 940.

26. Wang, Chen. A multi-scale spatial-temporal capsule network based on sequence encoding for bearing fault diagnosis. Complex Intell Syst 2024; 10: 6189–6212.

27. Huang, Xie, Luo, et al. Incremental learning with multi-fidelity information fusion for digital twin-driven bearing fault diagnosis. Eng Appl Artif Intell 2024; 133: 108212.

28. Wang, Liu. A multi scale meta-learning network for cross domain fault diagnosis with limited samples. J Intell Manuf 2024; 36: 2841–2861.

29. Wang, Kang, Huang. A multilevel attitude-aware denoising network for bearing fault diagnosis. IEEE Trans Ind Inform 2025; 21: 3686–3694.

30. Xiong, Liu, Tan, et al. Multi-scale adaptive-routing capsule contrastive network-based intelligent fault diagnosis method for rotating machinery under noisy environment and labels. Adv Eng Inform 2024; 62: 102712.

31. Wang, Wei, Huang, et al. IMWMOTE: a novel oversampling technique for fault diagnosis in heterogeneous imbalanced data. Expert Syst Appl 2024; 251: 123987.

32. Min, Shao, et al. Class-imbalanced machinery fault diagnosis using heterogeneous data fusion support tensor machine. J Dyn Monit Diagn 2025; 4(1): 11–21.

33. Zhang, Gao, Shi. Bearing fault diagnosis method based on multi-source heterogeneous information fusion. Meas Sci Technol 2022; 33(7): 075901.

34. Zhang, Tang, et al. Few-shot fault diagnosis based on heterogeneous information fusion and meta learning. IEEE Sens J 2023; 23(18): 21433–21442.

35. Miao, Zhou, Yuan, et al. Multi-heterogeneous sensor data fusion method via convolutional neural network for fault diagnosis of wheeled mobile robot. Appl Soft Comput 2022; 129: 109554.

36. Han, Zhang, et al. Multi-source heterogeneous information fusion fault diagnosis method based on deep neural networks under limited datasets. Appl Soft Comput 2024; 154: 111371.

37. Miao. Deep feature interactive network for machinery fault diagnosis using multi-source heterogeneous data. Reliab Eng Syst Saf 2024; 242: 109795.

38. Dai, Xiong, et al. Deformable convolutional networks. In: Proceedings of the IEEE international conference on computer vision, Venice, Italy, 22–29 2017, pp. 764–773. Piscataway, NJ: IEEE.

39. Long, Qin, Yang, et al. Discriminative feature learning using a multiscale convolutional capsule network from attitude data for fault diagnosis of industrial robots. Mech Syst Signal Process 2023; 182: 109569.

40. Chen, Han, Qiao, et al. EEG-based sleep staging via self-attention based capsule network with Bi-LSTM model. Biomed Signal Process Control 2023; 86: 105351.

41. Jiang, Jin, Hua, et al. Numerical investigation of the effective receptive field and its relationship with convolutional kernels and layers in convolutional neural network. Front Mar Sci 2024; 11: 1492572.

42. Sun, et al. Fourier Convolution Block with global receptive field for MRI reconstruction. Med Image Anal 2025; 99: 103349.

43. Wang, Cao. A multiscale convolution neural network for bearing fault diagnosis based on frequency division denoising under complex noise conditions. Complex Intell Syst 2023; 9(4): 4263–4285.

44. Zhang, Peng, et al. A new deep learning model for fault diagnosis with good anti-noise and domain adaptation ability on raw vibration signals. Sensors 2017; 17(2): 425.

45. Han, Zheng. Combination bidirectional long short-term memory and capsule network for rotating machinery fault diagnosis. Measurement 2021; 176: 109208.

46. Zhang, Kang, et al. Fault diagnosis of rotating machinery based on dual convolutional-capsule network (DC-CN). Measurement 2022; 187: 110258.

47. Hendriks, Dumond, Knox DA. Towards better benchmarking using the CWRU bearing fault dataset. Mech Syst Signal Process 2022; 169: 108732.

48. Guo. Discrete wavelet integrated convolutional residual network for bearing fault diagnosis under noise and variable operating conditions. Sci Rep 2025; 15(1): 1–26.

49. Yin, Zhang, et al. MC-ABDS: a system for low SNR fault diagnosis in industrial production with intense overlapping and interference. Appl Acoust 2025; 227: 110217.

50. Liang, Deng, Yuan, et al. A deep capsule neural network with data augmentation generative adversarial networks for single and simultaneous fault diagnosis of wind turbine gearbox. ISA Trans 2023; 135: 462–475.