An integrated approach for mechanical fault diagnosis using maximum mean square discrepancy representation and CNN-based mixed information fusion

Abstract

Bearings are essential components of modern industry, and cross-domain bearing fault diagnosis is of significant importance. In practical applications, however, insufficient training data and differences between equipment present challenges, and transfer learning has become an effective way to address them. This article proposes a fault diagnosis method that combines the maximum mean square discrepancy (MMSD) measure with a convolutional neural network with mixed information (MIXCNN). MIXCNN enhances spatial position discrimination through depthwise convolution and achieves cross-channel information interaction using traditional convolution. Residual connections reduce information loss, while the increased network depth lets the model focus on highly distinguishable features. MMSD constructs a metric that comprehensively reflects the mean and variance information of data samples in the reproducing kernel Hilbert space, thereby enhancing domain confusion. Experimental results show that the method achieves high diagnostic accuracy in various transfer tasks, with an average accuracy of 99.29% over the six transfer tasks on the Jiangnan University dataset, providing reliable support for bearing fault diagnosis.

Keywords

MIXCNN; MMSD; depthwise convolution; cross-channel information interaction; transfer learning

Introduction

In today’s industrial production, there is an increasing focus on the reliability and safety of rotating mechanical equipment. These devices play crucial roles in various applications such as wind turbines,¹ aircraft engines,² and mining machinery.³ However, because they operate at high speed in complex working environments, they face various potential failure risks. Therefore, timely fault diagnosis and monitoring are essential to ensure the reliable operation of equipment.^4–6 Vibration data provide important information about the operational status of rotating machinery; thus, increasing attention is being devoted to fault diagnosis based on vibration data. Deep learning techniques facilitate convenient and accurate fault diagnosis by extracting deep features. However, they typically require a large number of labeled samples, and these samples must follow the same distribution as the test samples.

Convolutional neural networks (CNNs)^7–10 have garnered significant attention in the field of rotating machinery fault diagnosis. Some studies transform vibration signals into two-dimensional time-feature data and then employ CNNs for fault diagnosis.¹¹ Although this approach can improve the accuracy of fault classification, it may lead to insufficient vibration signal features, as phase information could be lost. In contrast, one-dimensional convolutional neural networks (1DCNNs) can directly handle signals and extract features, simplifying the fault diagnosis process and revealing the inherent relationship between raw data and fault data. CNNs play a crucial role in one-dimensional signal analysis, similar to the impact of AlexNet¹² in the field of deep learning. In recent years, significant breakthroughs have been made in the field of one-dimensional signal analysis by adopting deeper network structures,¹³ more efficient architectures,¹⁴ and stronger multiscale analysis capabilities,¹⁵ constructing more powerful signal processing backbones.¹⁶ Yu et al.¹⁷ introduced a novel deep neural network Multichannel one-Dimensional Convolutional Neural Network (MC1-DCNN), which extracts effective features from high-dimensional process signals using wavelet transforms, demonstrating its excellent fault diagnosis performance and showing rich possibilities for applying this method to industrial processes. Liu et al.¹⁸ established a spall propagation evaluation algorithm based on spectral amplitude ratio and statistical features. Using Pearson’s correlation coefficient to select effective features, they successfully identified and assessed the location and severity of spall damage in ball bearings. The effectiveness of this method was validated using test data from previous studies. Li et al.¹⁹ proposed a modeling method for irregular-shaped defects and a dynamic model for double row cylindrical roller bearings. 
By validating the effectiveness of this model, they studied the impact of bearing load, rotating speed, and different defect sizes on bearing vibrations. The results indicate that the irregular-shaped defect model more accurately reflects the actual vibration characteristics of bearings compared to the simplified defect model. Liang et al.²⁰ proposed a novel method combining one-dimensional dilated convolutional networks and residual connections for diagnosing rolling bearing faults under different noise and load conditions. Experimental results showed that this method outperformed others in accuracy and had strong applicability and effectiveness. Ji et al.²¹ proposed a novel neural network compression method based on knowledge distillation and parameter quantization to address the challenge of deploying fault diagnosis networks on small embedded platforms, demonstrating the effectiveness of significantly compressing network size with minimal accuracy loss. Zhu et al.²² proposed an effective image tampering localization scheme based on Convolutional Next (ConvNeXt) encoder and multiscale feature fusion (ConvNeXtFF), using stacked ConvNeXt blocks as encoders to capture hierarchical multiscale features, and then fusing them in the decoder to accurately locate tampered pixels. The adoption of a combined loss function and effective data augmentation strategies further improved model performance. Xu et al.²³ proposed a dual-attention multiscale network model for fault diagnosis, where the dual-attention multiscale module guides the 1DCNN model to extract multiscale features, effectively capturing and enhancing discriminative features in vibration signals. Fang et al.²⁴ proposed an efficient and lightweight CNN-based feature extraction method for high-precision fault diagnosis, utilizing 1-D CNNs, dynamic convolution, separable convolution, and a spatial attention mechanism. 
Some fault diagnosis models, such as CNN-Transformer,²⁵ multiscale and multichannel CNN,²⁶ and multi-information fusion²⁷ methods, adopt strategies that involve increasing the number of convolutional layers and parameters. Although these networks exhibit high accuracy, their operational efficiency decreases as the number of hidden layers increases.^28,29 Simple network structures often struggle to extract features sufficiently. However, simply stacking more convolutional layers may lead to higher model error rates and trigger issues such as vanishing or exploding gradients. To strike a balance between model performance and computational complexity, it is necessary to address computational resource consumption and efficiency reduction. To address these problems and extract the fault-related time-frequency information from vibration signals, this article proposes a fault diagnosis method combining maximum mean square discrepancy (MMSD) and MIXCNN. MIXCNN is a lightweight, efficient CNN fault diagnosis method that integrates channel and spatial position information. In the field of fault diagnosis, MIXCNN is unique in that it does not enhance feature similarity or weaken irrelevant features to improve performance but focuses on extracting strongly discriminative information that can better represent different faults. MMSD constructs a metric that comprehensively reflects the mean and variance information of data samples in the reproducing kernel Hilbert space, thereby enhancing domain confusion. The main objective of this study is to integrate the mixed information CNN (MIXCNN) network architecture with MMSD to enhance the performance and accuracy of one-dimensional signal processing in fault diagnosis and monitoring of rotating mechanical equipment. The main contributions of this study are summarized as follows:

The proposed MIXCNN network is a CNN-based fault diagnosis method. By combining depthwise convolution, pointwise convolution, and residual connections, it mixes channel and spatial position information.

By comprehensively reflecting the mean and variance information of data samples, MMSD significantly improves domain confusion.

A notable advantage of MIXCNN is its ability to retain original information to a greater extent by not requiring downsampling, thereby enhancing the model’s sensitivity and diagnostic accuracy in signal processing. The model can comprehensively capture subtle changes and features in signals, aiding in more accurate identification and interpretation of fault patterns.

Method in this article

This section introduces the construction of the MIXCNN model and the calculation of the maximum mean square discrepancy.

The MIXCNN

Although the design of the loss function plays a crucial role in the feature extractor, helping capture domain-invariant and discriminative features in both the source and target domains, research indicates that the performance of the feature extractor itself is also crucial.^30,31 To discover more representative information in vibration signals, employing deeper networks directly is an intuitive approach. However, this may bring some adverse effects, such as overfitting and slow convergence speed. To overcome these issues, this article adopts the lightweight MIXCNN network, which can address these concerns by preserving the original information better while maintaining rapid convergence. Consequently, it enhances the model’s sensitivity to signal processing and diagnostic accuracy. Figure 1 illustrates the architecture of the MIXCNN network.

Figure 1.

Framework of the MIXCNN network.

According to Figure 1, the network combines depthwise convolution, pointwise convolution, and residual connections to blend channel and spatial position information while reducing the number of parameters. Specifically, pointwise convolution (with a kernel size of 1 × 1) forms a weighted combination of the features along the depth dimension, effectively leveraging information from different features at the same spatial position. This yields a more comprehensive and richer feature representation and enhances the performance of the feature extractor. In this architecture, the original vibration signals are first processed by filters with larger convolutional kernels to extract local features over a certain period of time. The signals then pass through depthwise convolution layers, followed by activation functions and batch normalization. Depthwise convolution convolves the feature map of each channel independently, which sharpens the differentiation of feature information across spatial positions. Unlike traditional convolution, where every kernel spans all input channels, in depthwise convolution one kernel handles exactly one channel, so the number of output feature maps equals the number of channels in the input layer. In this study, one-dimensional depthwise convolution is employed. Furthermore, following the residual principle, the output of these operations is added to the original convolution output, reducing information loss.
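The operations described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the placement of the residual addition, the order of activation and normalization, the per-channel normalization used as a stand-in for batch normalization, and all names and sizes (`mix_layer`, 8 channels, kernel length 9) are assumptions chosen for illustration.

```python
import numpy as np

def depthwise_conv1d(x, kernels):
    """Filter each channel with its own kernel ('same' padding, stride 1).

    x: (channels, length); kernels: (channels, k) with odd k.
    """
    channels, _ = x.shape
    pad = kernels.shape[1] // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.empty_like(x, dtype=float)
    for c in range(channels):
        # Cross-correlation, the operation deep learning libraries call convolution.
        out[c] = np.correlate(xp[c], kernels[c], mode="valid")
    return out

def pointwise_conv1d(x, weight):
    """Kernel-size-1 convolution: mixes channels at every position.

    weight: (out_channels, in_channels).
    """
    return weight @ x

def normalize(x, eps=1e-5):
    """Per-channel normalization (assumed stand-in for batch normalization)."""
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

def mix_layer(x, dw_kernels, pw_weight):
    """Assumed ordering: depthwise branch + residual, then pointwise mixing."""
    spatial = normalize(relu(depthwise_conv1d(x, dw_kernels))) + x  # residual add
    return normalize(relu(pointwise_conv1d(spatial, pw_weight)))

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 1024))           # 8 channels, 1024 points
y = mix_layer(x, rng.standard_normal((8, 9)), rng.standard_normal((8, 8)))
print(y.shape)
```

Because stride 1 and 'same' padding are used throughout, the output keeps the input length, which matches the statement that MIXCNN avoids downsampling inside the MIX layer.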

The aim of this article is to overcome the limitations in feature extraction from vibration signals and better capture domain-invariant features in both the source and target domains. This will provide a more powerful and robust feature representation for subsequent domain adaptation tasks, thereby enhancing overall performance and generalization capability. The schematic diagram of the mixed information (MIX) layer in the network model is illustrated in Figure 2.

Figure 2.

The schematic diagram of the MIX layer in MIXCNN.

The construction of MMSD

This article introduces a discrepancy measure, the maximum mean square discrepancy (MMSD), which addresses domain confusion by considering both the mean and the variance information of data samples. This provides a more complete picture of the distribution characteristics of the data and captures the differences between domains more accurately. As shown in Figure 3, the MIXCNN network serves as the backbone model, and MMSD measures the discrepancy between the extracted fault features for unsupervised transfer diagnosis. The backbone model consists of four “Conv1D” blocks, each comprising a convolutional layer, a batch normalization layer, and a max-pooling layer. The Rectified Linear Unit (ReLU) is used as the activation function.

Figure 3.

The schematic diagram of MMSD.

$X_{S} = \{x_{i}^{s}\}_{i = 1}^{n_{s}}$ and $X_{T} = \{x_{j}^{t}\}_{j = 1}^{n_{t}}$ denote the samples of the source domain dataset $D_{s}$ and the target domain dataset $D_{t}$, where $n_{s}$ and $n_{t}$ are the numbers of samples in the source and target domains, respectively. Because working conditions such as the environment, task settings, and data collection methods vary, the data distributions of $D_{s}$ and $D_{t}$ differ, which leads to a distribution shift between the source and target domains. The objective of unsupervised domain adaptation is to enable the feature extractor $G_{η}$ to extract domain-invariant features $F_{d} = (G_{η} (X_{S}) \cup G_{η} (X_{T}))$ to alleviate domain shift. Maximum mean discrepancy (MMD) is one of the most commonly used discrepancy measures in domain adaptation methods, quantifying the degree of difference between two data distributions.^32–34 During domain adaptation training, minimizing the MMD between the source and target domain distributions aims to capture domain-invariant features. Furthermore, the extracted features should also possess good classification performance. Therefore, the objective function consists of a classification cross-entropy loss term, a discrepancy measurement loss term, and an adversarial loss term, defined as follows:

$$
\min_{G_{η}, G_{μ}} \frac{1}{n_{s}} \sum_{i = 1}^{n_{s}} J (G_{μ} (G_{η} (x_{i}^{s})), x_{j}^{t}) + λ \, \mathrm{MMSD} (G_{η} (X_{S}), G_{η} (X_{T})) + L_{ad\_loss} \tag{1}
$$

$$
\begin{aligned}
\mathrm{MMSD}^{2} (H \otimes H, G_{η} (X_{S}), G_{η} (X_{T})) &= \left\| \frac{1}{n_{s}} \sum_{i = 1}^{n_{s}} ϕ (G_{η} (x_{i}^{s})) - \frac{1}{n_{t}} \sum_{j = 1}^{n_{t}} ϕ (G_{η} (x_{j}^{t})) \right\|_{H \otimes H}^{2} \\
&= \frac{1}{n_{s}^{2}} \sum_{i = 1}^{n_{s}} \sum_{j = 1}^{n_{s}} k (G_{η} (x_{i}^{s}), G_{η} (x_{j}^{s})) + \frac{1}{n_{t}^{2}} \sum_{i = 1}^{n_{t}} \sum_{j = 1}^{n_{t}} k (G_{η} (x_{i}^{t}), G_{η} (x_{j}^{t})) \\
&\quad - \frac{2}{n_{s} n_{t}} \sum_{i = 1}^{n_{s}} \sum_{j = 1}^{n_{t}} k (G_{η} (x_{i}^{s}), G_{η} (x_{j}^{t}))
\end{aligned} \tag{2}
$$

where $H \otimes H$ denotes the reproducing kernel Hilbert space formed as a tensor product (the tensor product of two Hilbert spaces, $H = H_{1} \otimes H_{2}$, is again a Hilbert space); $ϕ (\cdot)$ represents the mapping from the original samples to the reproducing kernel Hilbert space; $k (\cdot, \cdot)$ is the kernel function, typically the Gaussian kernel; $G_{μ}$ is the classifier, whose output is used to calculate the loss from the feature representation and the target label $x_{j}^{t}$; and $G_{η}$ is the feature extraction model that transforms the input $x_{i}^{s}$ into a feature representation.
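As an illustration of how the kernel expansion in Equation (2) is evaluated on a batch of features, the following sketch computes the three double sums with a Gaussian kernel. The text does not spell out the specific kernel induced on $H \otimes H$, so a plain Gaussian kernel is used here as an assumed stand-in; the function names and the bandwidth `sigma` are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    sq = np.maximum(sq, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq / (2.0 * sigma**2))

def kernel_discrepancy_sq(fs, ft, sigma=1.0):
    """Kernel expansion of Eq. (2): mean(K_ss) + mean(K_tt) - 2 * mean(K_st).

    fs: source features (n_s, d); ft: target features (n_t, d).
    """
    k_ss = gaussian_kernel(fs, fs, sigma).mean()  # (1/n_s^2) * double sum
    k_tt = gaussian_kernel(ft, ft, sigma).mean()  # (1/n_t^2) * double sum
    k_st = gaussian_kernel(fs, ft, sigma).mean()  # (1/(n_s n_t)) * double sum
    return k_ss + k_tt - 2.0 * k_st

rng = np.random.default_rng(0)
source = rng.standard_normal((64, 4))
target_same = source.copy()
target_shifted = source + 3.0

print(kernel_discrepancy_sq(source, target_same))     # identical sets
print(kernel_discrepancy_sq(source, target_shifted))  # shifted distribution
```

The value vanishes when the two feature sets coincide and grows as the distributions move apart, which is the behavior the discrepancy loss term relies on during training.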

$$
J (G_{μ} (G_{η} (x_{i}^{s})), x_{j}^{t}) = - \frac{1}{n} \sum_{i = 1}^{n} w_{i} \left( y_{i} \log (p_{i}) + (1 - y_{i}) \log (1 - p_{i}) \right) \tag{3}
$$

In Equation (3),^35,36 $J (G_{μ} (G_{η} (x_{i}^{s})), x_{j}^{t})$ represents the cross-entropy loss between the predicted result of the input $x_{i}^{s}$ processed by the models $G_{η}$ and $G_{μ}$ , and the target label $x_{j}^{t}$ .

$$
L_{ad\_loss} = \frac{\sum_{i = 1}^{n} w_{i} \, G_{μ} (G_{η} (x_{i}^{s}), x_{j}^{t})}{\sum_{i = 1}^{n} w_{i}} \tag{4}
$$

where $G_{μ}$ represents the classifier; $y_{i}$ represents the true label of the i-th sample; $p_{i}$ is the probability of predicting the correct label; $L_{ad\_loss}$ represents the adversarial loss; and $w_{i}$ is the weight of sample $i$, representing the importance of each sample in the loss calculation.
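Reading Equation (3) as a sample-weighted binary cross-entropy averaged over $n$ samples, a small numerical sketch looks as follows. The function name, the clipping constant `eps`, and the default of unit weights are assumptions added for illustration.

```python
import numpy as np

def weighted_cross_entropy(y, p, w=None, eps=1e-12):
    """Weighted binary cross-entropy averaged over n samples.

    y: true labels in {0, 1}; p: predicted probabilities; w: sample weights.
    """
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    per_sample = w * (y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return -per_sample.mean()

# An uninformative prediction of 0.5 costs log(2) per sample.
print(weighted_cross_entropy([1, 0], [0.5, 0.5]))
# Confident, correct predictions cost much less.
print(weighted_cross_entropy([1, 0], [0.99, 0.01]))
```

Raising a weight $w_{i}$ increases the contribution of that sample to the loss, which is how the per-sample importance described above enters the objective.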

Lightweight fault diagnosis framework

By integrating the MIXCNN architecture, a lightweight fault diagnosis framework is proposed in this article, as illustrated in Figure 4. It can be summarized into three steps:

Data collection and sample segmentation: System vibration data are collected and transmitted to digital devices. Subsequently, the data are segmented into samples for training, validation, and testing using sliding windows.

Lightweight design and model training: Lightweight modules such as MIX and one-dimensional convolution are embedded into the MIXCNN structure. The training process employs classification cross-entropy loss, discrepancy metric loss, and adversarial loss to compute the total loss of the training set. The Adam with Weight Decay (AdamW) optimizer is then utilized to update the model’s learning parameters. During iterations, the model with the highest accuracy on the validation set is selected as the well-trained model.

Fault detection, diagnosis, and result analysis: The test set is fed to the well-trained fault diagnosis model, and the diagnostic results are analyzed through visualization from multiple angles.
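The first step above can be sketched as follows. The window length of 1024 points and the 80%/20% split are taken from the experimental sections; the stride of 512, the random shuffling, and the function names are assumptions, since the text does not specify them.

```python
import numpy as np

def segment_signal(signal, window=1024, stride=512):
    """Cut a 1-D signal into fixed-length samples with a sliding window."""
    signal = np.asarray(signal)
    n = (len(signal) - window) // stride + 1
    if n <= 0:
        return np.empty((0, window), dtype=signal.dtype)
    # Row r holds signal[r * stride : r * stride + window].
    idx = np.arange(window)[None, :] + stride * np.arange(n)[:, None]
    return signal[idx]

def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples))
    cut = int(round(train_fraction * len(samples)))
    return samples[order[:cut]], samples[order[cut:]]

signal = np.arange(4096)                 # placeholder for a vibration record
samples = segment_signal(signal)
train, test = train_test_split(samples)
print(samples.shape, train.shape, test.shape)
```

With an overlapping stride, neighboring windows share data points, so a random split lets the test set overlap with training windows; a non-overlapping stride or a split by time avoids this if a stricter evaluation is wanted.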

Figure 4.

The overall framework schematic diagram.

Experimental validation

In this section, the authors present an unsupervised fault diagnosis case study involving rotating components, including gears and bearings. Simultaneously, the authors validate the proposed method to ascertain its effectiveness. Additionally, the authors conduct ablation experiments, testing different modules to evaluate their impact on system performance.

The bearing dataset from Jiangnan University

Jiangnan University³⁷ collected a dataset of bearing faults, recording vertical vibration signals using a PCB MA352A60 accelerometer. The data were sampled at a frequency of 50 kHz, with tests conducted at various speeds, including 600, 800, and 1000 rpm. To simulate bearing failures, small indentations were created on the outer race, inner race, and rolling elements of the rolling bearings using wire cutting technology. These indentations measured 0.3 mm in width and 0.05 mm in depth. The dataset includes specifications, fault sizes, and other necessary details of the tested bearings, as listed in Table 1. When selecting experimental samples, a data sample length of 1024 data points was chosen. For data partitioning, 80% of the data was used as the training set, while the remaining 20% constituted the test set. In this article, six transferable states, 0-1, 0-2, 1-2, 1-0, 2-0, and 2-1, are constructed in the Jiangnan University (JNU) bearing dataset for experimental verification. Table 2 provides the detailed parameters of each transfer state.

Table 1.

Bearing information for verification.

Contents	N205	NUP205
Bearing outer diameter	52 mm	52 mm
Bearing inner diameter	25 mm	25 mm
Bearing width	15 mm	15 mm
Bearing roller diameter	7 mm	7 mm
The number of the rollers	10	11
Contact angle	0 rad	0 rad
Outer-race defect (width × depth)	0.3 × 0.25 mm Early stage
Rolling element defect (width × depth)	0.3 × 0.15 mm Early stage
Inner-race defect (width × depth)		0.3 × 0.25 mm Early stage

Table 2.

The transfer path description of rolling bearing.

Path	Source domain (rpm)	Target domain (rpm)	Fault types
			Locations	Severities
0-1	600	800
0-2	600	1000	Normal
1-2	800	1000	TF	0.007in
1-0	800	600	IF	0.014in
2-0	1000	600	OF	0.021in
2-1	1000	800

The article designs six different transfer experiments under various speed conditions to delve deeper into the bearing fault diagnosis problem. These experiments aim to assess the variations in model performance under different operating conditions for a more comprehensive understanding of the bearing’s operational state. To enhance the stability of the experiments in transfer learning, the authors conduct five independent experiments for each transfer task and average the results of these five experiments to obtain the final evaluation result. This approach of conducting multiple experiments helps reduce the impact of parameter random initialization on the results and improves the reliability of the outcomes. The article utilizes accuracy and scatter plots to quantitatively evaluate the performance of various methods, while further validation of the authors’ method’s effectiveness is achieved through confusion matrices. Detailed experimental results and analysis can be found in the corresponding tables and figures.

In Tables 3 and 9, Multiple Kernel Maximum Mean Discrepancy (MK-MMD)³⁸ is commonly used as a regularization term for domain adversarial loss, further reducing the marginal distribution distance between domains. In deep learning domain adaptation, MMD³⁹ is considered a common metric. Local Maximum Mean Discrepancy (LMMD)⁴⁰ is a method for measuring the difference between two distributions by dividing the sample space into multiple local regions, calculating the MMD in each region, and taking the maximum value as the final metric result. In Tables 4 and 10, Multiscale Residual Network (MSResNet)⁴¹ is a Residual Network (ResNet)⁴² model that updates the residual layer connections, adds multiscale feature extraction, and improves the dimensionality reduction of feature maps. Deep Convolutional Neural Networks with Wide First-layer Kernels (WDCNN)⁴³ performs fault diagnosis by using raw vibration signals as input and combining data augmentation.

Table 3.

Diagnostic accuracy of various methods on Jiangnan University dataset.

Tasks	MMSD	MK-MMD	MMD	LMMD
0-1	0.9915	0.9658	0.9778	0.9915
0-2	1.0000	0.9829	0.9459	0.9829
1-0	0.9829	0.9316	0.9018	0.9487
2-0	0.9915	0.9060	0.8944	0.9060
1-2	0.9915	0.9695	0.9329	0.9829
2-1	1.0000	0.9745	0.9557	0.9658

MMSD: maximum mean square discrepancy; MMD: maximum mean discrepancy.

Table 4.

Diagnostic accuracy of various models on Jiangnan University dataset.

Tasks	MIXCNN	MSResNet	WDCNN	ResNet18
0-1	0.9915	0.9615	0.9316	0.8974
0-2	1.0000	0.9188	0.9316	0.9145
1-0	0.9829	0.8761	0.8718	0.8632
2-0	0.9915	0.8675	0.8205	0.8376
1-2	0.9915	0.9573	0.9573	0.9658
2-1	1.0000	0.9615	0.9487	0.9744

MIXCNN: mixed information convolutional neural network.

Based on the data in Tables 3 and 4 and Figures 5 and 6, the method proposed in this article achieves the highest accuracy on every transfer task; the only tie is task 0-1 in Table 3, where LMMD reaches the same 0.9915. This indicates that the proposed method performs well and stably across the six transfer tasks, which underscores its robustness and reliability.
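To summarize Table 3 in a single figure per method, the accuracies can be averaged over the six transfer tasks. The short sketch below copies the values from Table 3; only the variable names are introduced here.

```python
# Accuracies from Table 3, in task order 0-1, 0-2, 1-0, 2-0, 1-2, 2-1.
table3 = {
    "MMSD":   [0.9915, 1.0000, 0.9829, 0.9915, 0.9915, 1.0000],
    "MK-MMD": [0.9658, 0.9829, 0.9316, 0.9060, 0.9695, 0.9745],
    "MMD":    [0.9778, 0.9459, 0.9018, 0.8944, 0.9329, 0.9557],
    "LMMD":   [0.9915, 0.9829, 0.9487, 0.9060, 0.9829, 0.9658],
}

mean_accuracy = {name: sum(vals) / len(vals) for name, vals in table3.items()}
best_method = max(mean_accuracy, key=mean_accuracy.get)

for name, value in mean_accuracy.items():
    print(f"{name}: {value:.4f}")
print("best:", best_method)
```

The MMSD average is approximately 0.9929, which appears to be the 99.29% figure quoted in the abstract.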

Figure 5.

Confusion matrix of the proposed method on Jiangnan University dataset: (a) 0-1, (b) 0-2, (c) 1-0, (d) 1-2, (e) 2-0, and (f) 2-1.

Figure 6.

Feature visualization method based on t-SNE in Jiangnan University dataset: (a) 0-1, (b) 0-2, (c) 1-0, (d) 1-2, (e) 2-0, and (f) 2-1. t-SNE: t-distributed stochastic neighbor embedding.

Based on the results in Table 5, differences in the performance of the model variants can be observed. Replacing either module lowers accuracy on every transfer task, by up to about 17 percentage points (task 2-0, from 0.9915 to 0.8205). This confirms the effectiveness and robustness of the proposed model and methods. Furthermore, Table 6 shows that even with noise added to the vibration signals, the model’s accuracy remains above 86%, further demonstrating the model’s stability in the presence of noise.

Table 5.

Comparative results of ablation experiments on Jiangnan University dataset.

Tasks	MIXCNN–MMSD	MIXCNN	MMSD
0-1	0.9915	0.9658	0.9316
0-2	1.0000	0.9829	0.9316
1-0	0.9829	0.9316	0.8718
2-0	0.9915	0.9060	0.8205
1-2	0.9915	0.9695	0.9573
2-1	1.0000	0.9745	0.9487

MIXCNN: mixed information convolutional neural network; MMSD: maximum mean square discrepancy.

Table 6.

Experimental results after adding noise on the Jiangnan University dataset.

Tasks	SNR = 0	SNR = 5	SNR = 10
0-1	0.9915	0.9060	0.9829
0-2	1.0000	0.9231	0.9573
1-2	0.9829	0.8974	0.9231
1-0	0.9915	0.8632	0.9402
2-0	0.9915	0.8889	0.8889
2-1	1.0000	0.8889	0.9487

SNR: signal-to-noise ratio.
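A common way to produce noisy test signals at a prescribed SNR is sketched below. The text does not state the noise model or the unit of the SNR values, so additive white Gaussian noise and SNR in decibels are assumptions, as is the function name.

```python
import numpy as np

def add_gaussian_noise(signal, snr_db, rng=None):
    """Add white Gaussian noise so that the result has the requested SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal**2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

t = np.arange(50_000) / 50_000.0
clean = np.sin(2.0 * np.pi * 100.0 * t)      # placeholder for a vibration signal
noisy = add_gaussian_noise(clean, snr_db=5.0, rng=np.random.default_rng(0))

measured_snr = 10.0 * np.log10(np.mean(clean**2) / np.mean((noisy - clean) ** 2))
print(round(measured_snr, 2))
```

Under this convention, a lower SNR value means stronger noise relative to the signal, and 0 dB means the noise power equals the signal power.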

The laboratory bearing dataset

The experimental bench mainly consists of a three-phase asynchronous motor, a transmission shaft, bearings, and other components. The sampling frequency is set to 19.2 kHz (19,200 samples per second), and two channels (vertical and horizontal directions) are collected synchronously. The acceleration sensor is mounted vertically on the bearing seat of the experimental bench, and the acceleration signal is collected for 110 s for each fault type at each speed. The equipment used in the experiment is shown in Figure 7.

Figure 7.

Photograph of the experimental setup.

The experiment utilized NUP205 cylindrical roller bearings, with faults created by wire cutting, resulting in outer race faults, inner race faults, and normal bearings. The different health conditions of the bearings are depicted in Figure 8.

Figure 8.

Health conditions of NUP205 bearings: (a) outer ring fault 1, (b) outer ring fault 2, (c) outer ring fault 3, (d) outer ring fault 4, (e) inner ring fault 1, (f) inner ring fault 2, (g) inner ring fault 3, and (h) normal.

All data samples were 1024 data points in length, with data partitioned into 80% for training and 20% for testing. Six transfer tasks (0-1, 0-2, 1-2, 1-0, 2-0, and 2-1) were constructed in the laboratory bearing dataset for experimental verification, where 1350 rpm corresponds to state 0, 1500 rpm to state 1, and 1650 rpm to state 2; Table 8 lists each transfer path. The fault descriptions are given in Table 7. For example, an outer ring fault described as 1_0_0_2_1 indicates outer ring fault 1 with a defect distance to the nearest edge of 0 mm, a defect length of 0 mm, a width of 2 mm, and a depth of 1 mm. Other fault types are shown in Tables 7 and 8.
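The six transfer tasks follow directly from the three speed states: every ordered pair of distinct states defines one source-to-target task. A minimal sketch, with hypothetical names, is:

```python
from itertools import permutations

# Speed (rpm) of each working-condition state in the laboratory dataset.
SPEED_RPM = {0: 1350, 1: 1500, 2: 1650}

def build_transfer_tasks(speed_rpm):
    """Map each task name 'source-target' to its (source rpm, target rpm) pair."""
    return {
        f"{src}-{tgt}": (speed_rpm[src], speed_rpm[tgt])
        for src, tgt in permutations(sorted(speed_rpm), 2)
    }

tasks = build_transfer_tasks(SPEED_RPM)
for name, (source_rpm, target_rpm) in tasks.items():
    print(name, source_rpm, "->", target_rpm)
```

The same helper applies to the Jiangnan University dataset by passing the speeds 600, 800, and 1000 rpm for states 0, 1, and 2.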

Table 7.

NUP205 model experimental bearing parameters.

Health status	Label	Fault description
Outer ring fault 1_0_0_2_1	0	0_0_2_1
Outer ring fault 2_0_4.5_2_1	1	0_4.5_2_1
Outer ring fault 3_mid_4_4_1	2	mid_4_4_1
Outer ring fault 4_two rows of 3 circles	3	two rows of three circles
Inner ring fault 1_0_7_2_1	4	0_7_2_1
Inner ring fault 2_3_3_3_1	5	3_3_3_1
Inner ring fault 3_3 rows 3 circles	6	three rows of three circles
Normal	7	—

Table 8.

The transfer path description of rolling bearing.

Path	Source domain (rpm)	Target domain (rpm)	Fault types
0-1	1350	1500	Normal
0-2	1350	1650	Outer ring fault 1
1-2	1500	1650	Outer ring fault 2
1-0	1500	1350	Outer ring fault 3
2-0	1650	1350	Outer ring fault 4
2-1	1650	1500	Inner ring fault 1
			Inner ring fault 2
			Inner ring fault 3

Tables 9 and 10 show that the proposed method outperforms the comparison methods on every transfer task. The ablation results in Table 11 show that the complete model achieves the highest accuracy, which reflects the contribution of each module. Table 12 shows that, on noisy data, the lowest accuracy across the six transfer tasks is 67.47% at SNR = 5 and 75.47% at SNR = 10, so the model retains useful accuracy under noise. Figures 9 and 10 show that the proposed method achieves satisfactory diagnostic results for the various health conditions in each transfer task on the laboratory dataset. These observations support the effectiveness of the proposed method, especially the combined application of MIXCNN and MMSD, in improving model accuracy and performance.

Table 9.

Diagnostic accuracy of various comparison methods on the laboratory dataset.

Tasks	MMSD	MK-MMD	MMD	LMMD
0-1	1.0000	0.8720	0.8760	0.8320
0-2	1.0000	0.8560	0.8360	0.8880
1-0	0.9840	0.8960	0.9081	0.8480
2-0	0.9920	0.9280	0.9038	0.8800
1-2	1.0000	0.8880	0.8730	0.9840
2-1	0.9920	0.8960	0.9252	0.8880

MMSD: maximum mean square discrepancy; MMD: maximum mean discrepancy.

Table 10.

Diagnostic accuracy of various models on the laboratory dataset.

Tasks	MIXCNN	MSResNet	WDCNN	ResNet18
0-1	1.0000	0.9872	0.8000	0.9600
0-2	1.0000	0.9423	0.8320	0.9760
1-0	0.9840	0.9701	0.8400	0.9520
2-0	0.9920	0.9658	0.7840	0.9600
1-2	1.0000	0.9722	0.9280	0.9840
2-1	0.9920	0.9786	0.7280	0.9760

MIXCNN: mixed information convolutional neural network.

Table 11.

Comparison results of ablation experiments on the laboratory dataset.

Tasks	MIXCNN–MMSD	MIXCNN	MMSD
0-1	1.0000	0.8320	0.9600
0-2	1.0000	0.8880	0.9760
1-0	0.9840	0.8480	0.9520
2-0	0.9920	0.8800	0.9600
1-2	1.0000	0.9840	0.9840
2-1	0.9920	0.8880	0.9760

MIXCNN: mixed information convolutional neural network; MMSD: maximum mean square discrepancy.

Table 12.

Experimental results after adding noise to the laboratory dataset.

Tasks	SNR = 0	SNR = 5	SNR = 10
0-1	1.0000	0.7260	0.7979
0-2	1.0000	0.6952	0.8322
1-2	0.9840	0.7021	0.8151
1-0	0.9920	0.7146	0.8116
2-0	1.0000	0.6747	0.7547
2-1	0.9920	0.7055	0.8527

SNR: signal-to-noise ratio.

Figure 9.

Confusion matrix of the proposed method on the laboratory dataset: (a) 0-1, (b) 0-2, (c) 1-0, (d) 1-2, (e) 2-0, and (f) 2-1.

Figure 10.

t-SNE based feature visualization method on the laboratory dataset: (a) 0-1, (b) 0-2, (c) 1-0, (d) 1-2, (e) 2-0, and (f) 2-1.

Conclusion

This article proposes a novel bearing fault diagnosis method based on MMSD and MIXCNN. Its innovation lies in the use of depthwise convolution modules in MIXCNN to enhance the spatial position discrimination ability for bearing fault information. The method employs convolution modules to achieve cross-channel interaction and introduces residual connections to reduce information loss in the convolutional layers. Compared with existing methods, the superiority of the proposed method is attributed to MIXCNN not performing downsampling, which keeps the output consistent across convolutional layers and minimizes the loss of original information. Additionally, MMSD comprehensively reflects the mean and variance information of data samples, further improving the model’s fault diagnosis accuracy. Experiments have validated the excellent performance of the proposed method in bearing fault diagnosis. Future research will explore the spatial position discrimination ability of this method for bearing fault information in high-noise environments by integrating various types of sensor data.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported in part by the Fundamental Research Funds for the Central Universities (No. 2572022BF07), in part by the Key Laboratory of Vibration and Control of Aero-Propulsion System, Ministry of Education, Northeastern University (VCAME202209), and in part by the Harbin science and technology innovation talent project (2023HBRCCG004).

ORCID iD

Zhijie Xie

Data availability statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

1. Zhu, Zhao, Yang, et al. Condition monitoring of wind turbine based on deep learning networks and kernel principal component analysis. Comput Electrical Eng 2023; 105: 108538.

2. Abdulrahman, Eltoum, Ayyad, et al. Aero-engine blade defect detection: a systematic review of deep learning models. IEEE Access 2023; 11: 53048–53061.

3. Ashtiani, Raahemi. News-based intelligent prediction of financial markets using text mining and machine learning: a systematic literature review. Expert Syst Appl 2023; 217: 119509.

4. Tama, Vania, Lee, et al. Recent advances in the application of deep learning for fault diagnosis of rotating machinery using vibration signals. Artif Intell Rev 2023; 56(5): 4667–4709.

5. Shwetabh, Ambhaikar. Smart health monitoring system of agricultural machines: deep learning-based optimization with IoT and AI. BIO Web Conf 2024; 82: 05007.

6. Kulkarni, Raisi, Valente, et al. Deep learning augmented infrared thermography for unmanned aerial vehicles structural health monitoring of roadways. Autom Constr 2023; 148: 104784.

7. Budiman, Yaputera, Achmad, et al. Student attendance with face recognition (LBPH or CNN): systematic literature review. Procedia Comput Sci 2023; 216: 31–38.

8. Shah, Manan, Pandya, et al. A comprehensive study on skin cancer detection using artificial neural network (ANN) and convolutional neural network (CNN). Clin eHealth 2023; 6: 76–84.

9. Zhao, Jiao. A fault diagnosis method for rotating machinery based on CNN with mixed information. IEEE Trans Ind Inform 2023; 19: 9091–9101.

10. Wang, Huang, et al. Semi-supervised multi-sensor information fusion tailored graph embedded low-rank tensor learning machine under extremely low labeled rate. Inform Fusion 2024; 105: 102222.

11. Huang, Zhang, Safaei, et al. The flexible tensor singular value decomposition and its applications in multisensor signal fusion processing. Mech Syst Signal Process 2024; 220: 111662.

12.

Arias-Serrano

Velásquez-López

Avila-Briones

, et al. Artificial intelligence based glaucoma and diabetic retinopathy detection using MATLAB—retrained AlexNet convolutional neural network. F1000Res 2023; 12: 14.

13.

Sharma

Tripathi

Mittal

DLMC-Net: deeper lightweight multi-class classification model for plant leaf disease detection. Ecol Inform 2023; 75: 102025.

14.

Menghani

GJ.

Efficient deep learning: A survey on making deep learning models smaller, faster, and better. ACM Comput Surv 2023; 55(12): 1–37.

15.

Chen

Liu

, et al. Learning multi-scale features for speech emotion recognition with connection attention mechanism. Expert Syst Appl 2023; 214: 118943.

16.

Kaur

Singh

SJ.

A comprehensive review of object detection with deep learning. Digital Signal Process 2023; 132: 103812.

17.

Zhang

Wang

Multichannel one-dimensional convolutional neural network-based feature learning for fault diagnosis of industrial processes. Neural Comput Appl 2021; 33: 3085–3104.

18.

Liu

Zhou

, et al. A statistical feature investigation of the spalling propagation assessment for a ball bearing. Mech Machine Theory 2019; 131: 336–350.

19.

Liu

Ding

, et al. Dynamic modeling and vibration analysis of double row cylindrical roller bearings with irregular-shaped defects. Nonlinear Dyn 2024; 112(4): 2501–2521.

20.

Liang

Zhao

Rolling bearing fault diagnosis based on one-dimensional dilated convolution network with residual connection. IEEE Access 2021; 9: 31078–31091.

21.

Peng

, et al. A neural network compression method based on knowledge-distillation and parameter quantization for the bearing fault diagnosis. Appl Soft Comput 2022; 127: 109331.

22.

Zhu

Cao

Zhao

, et al. Effective image tampering localization with multi-scale convnext feature fusion. J Vis Commun Image Representation 2024; 98: 103981.

23.

Yan

Sun

, et al. Dually attentive multiscale networks for health state recognition of rotating machinery. Reliab Eng Syst Safety 2022; 225: 108626.

24.

Fang

Deng

Zhao

, et al. LEFE-Net: a lightweight efficient feature extraction network with strong robustness for bearing fault diagnosis. IEEE Trans Instrumen Meas 2021; 70: 1–11.

25.

Yuan

Zhu

, et al. CTIF-Net: a CNN-transformer iterative fusion network for salient object detection. IEEE Trans Circuits Syst Video Technol 2023; pp(99): 1–1.

26.

Behera

Bakshi

Nappi

, et al. Superpixel-based multiscale CNN approach toward multiclass object segmentation from UAV-captured aerial images. IEEE J Selected Topics Appl Earth Observations Remote Sens 2023; 16: 1771–1784.

27.

Wang

Gan

Mao

, et al. Forecasting power demand in China with a CNN-LSTM model including multimodal information. Energy 2023; 263: 126012.

28.

Yin

Han

, et al. Parameter-efficient is not sufficient: exploring parameter, memory, and time efficient adapter tuning for dense predictions. arXiv:2306.09729, 2023.

29.

Zhang

Cheng

Wang

, et al. Asymmetric cross-attention hierarchical network based on CNN and transformer for bitemporal remote sensing images change detection. IEEE Trans Geosci Remote Sens 2023; 61: 1–15.

30.

Xiao

Shao

Min

, et al. Multiscale dilated convolutional subdomain adaptation network with attention for unsupervised fault diagnosis of rotating machinery cross operating conditions. Measurement 2022; 204: 112146.

31.

Tang

Zhu

, et al. A multiscale framework with unsupervised learning for remote sensing image registration. IEEE Trans Geosci Remote Sens 2022; 60: 1–15.

32.

Dewi

Chen

APS

Christanto

. YOLOv7 for face mask identification based on deep learning. In: 2023 15th international conference on computer and automation engineering (ICCAE), Sydeny, Australia, 2023, pp. 193–197: IEEE.

33.

Wang

Feng

Zhang

, et al. An unsupervised domain adaptation deep learning method for spatial and temporal transferable crop type mapping using Sentinel-2 imagery. ISPRS J Photogramm Remote Sens 2023; 199: 102–117.

34.

Huang

Yin

Yan

, et al. A fault diagnosis method of bearings based on deep transfer learning. Simulation Modell Practice Theory 2023; 122: 102659.

35.

Wang

, et al. Fusing joint distribution and adversarial networks: a new transfer learning method for intelligent fault diagnosis. Appl Acoust 2024; 216: 109767.

36.

Zhu

Zhao

Yao

, et al. Adaptive multiscale convolution manifold embedding networks for intelligent fault diagnosis of servo motor-cylindrical rolling bearing under variable working conditions. IEEE/ASME Trans Mech 2023; PP(99): 1–11.

37.

Ping

Wang

, et al. Sequential fuzzy diagnosis method for motor roller bearing in variable operating conditions based on vibration analysis. Sensors 2013; 13(6): 8013–8041.

38.

Cheng

Liang

, et al. A deep adaptation network for speech enhancement: combining a relativistic discriminator with multi-kernel maximum mean discrepancy. IEEE/ACM Trans Audio Speech Language Process 2020; 29: 41–53.

39.

Qian

Wang

Zhang

, et al. Maximum mean square discrepancy: a new discrepancy representation metric for mechanical fault transfer diagnosis. Knowledge-Based Syst 2023; 276: 110748.

40.

Zhu

Zhuang

Wang

, et al. Deep subdomain adaptation network for image classification. IEEE Trans Neural Netw Learn Syst 2020; 32(4): 1713–1722.

41.

Pang

Jiang

, et al. A spatio-temporal multiscale neural network approach for wind turbine fault diagnosis with imbalanced SCADA data. IEEE Trans Ind Inform 2020; 17(10): 6875–6884.

42.

Zhang

Ren

, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, USA, 2016, pp. 770–778.

43.

Zhang

Peng

, et al. A new deep learning model for fault diagnosis with good anti-noise and domain adaptation ability on raw vibration signals. Sensors 2017; 17(2): 425.