Abstract
Bearings are essential components of modern industrial equipment, and cross-domain fault diagnosis of bearings is of considerable practical importance. In practice, however, insufficient training data and differences between machines pose significant challenges. Transfer learning has become an effective way to address these problems. This article proposes a fault diagnosis method that combines the maximum mean square discrepancy (MMSD) metric with a convolutional neural network with mixed information (MIXCNN). MIXCNN enhances spatial position discrimination through depthwise convolution and achieves cross-channel information interaction through standard (pointwise) convolution. Residual connections reduce information loss, while increasing the network depth lets the model focus on highly discriminative features. MMSD constructs a metric that reflects both the mean and the variance information of data samples in a reproducing kernel Hilbert space, thereby enhancing domain confusion. Experimental results show that the method achieves high diagnostic accuracy across various transfer tasks, with a maximum accuracy of 99.29%, providing reliable support for bearing fault diagnosis.
Introduction
In today’s industrial production, increasing attention is paid to the reliability and safety of rotating machinery. Such equipment plays a crucial role in applications such as wind turbines, 1 aircraft engines, 2 and mining machinery. 3 However, because it operates at high speed in complex working environments, it faces various potential failure risks. Timely fault diagnosis and condition monitoring are therefore essential to ensure reliable operation.4–6 Vibration data provide important information about the operating state of rotating machinery, so fault diagnosis based on vibration data is receiving growing attention. Deep learning techniques enable convenient and accurate fault diagnosis by extracting deep features. However, they typically require a large number of labeled samples, and these samples must be drawn from the same distribution as the test samples.
Convolutional neural networks (CNNs)7–10 have attracted significant attention in rotating machinery fault diagnosis. Some studies transform vibration signals into two-dimensional time-frequency representations and then apply CNNs for fault diagnosis. 11 Although this approach can improve fault classification accuracy, it may discard useful information in the vibration signal, since phase information can be lost. In contrast, one-dimensional convolutional neural networks (1DCNNs) process raw signals directly to extract features, which simplifies the diagnosis pipeline and reveals the inherent relationship between raw data and fault states. CNNs have influenced one-dimensional signal analysis much as AlexNet 12 influenced deep learning as a whole. In recent years, significant breakthroughs in one-dimensional signal analysis have come from deeper network structures, 13 more efficient architectures, 14 and stronger multiscale analysis capabilities, 15 which together yield more powerful signal processing backbones. 16 Yu et al. 17 introduced a deep neural network, the Multichannel one-Dimensional Convolutional Neural Network (MC1-DCNN), which extracts effective features from high-dimensional process signals using wavelet transforms; it showed excellent fault diagnosis performance and considerable potential for industrial processes. Liu et al. 18 established a spall propagation evaluation algorithm based on the spectral amplitude ratio and statistical features. Using Pearson’s correlation coefficient to select effective features, they identified and assessed the location and severity of spall damage in ball bearings, and validated the method on test data from previous studies. Li et al. 19 proposed a modeling method for irregular-shaped defects and a dynamic model for double-row cylindrical roller bearings. After validating the model, they studied the effects of bearing load, rotating speed, and defect size on bearing vibration; the results indicate that the irregular-shaped defect model reflects the actual vibration characteristics of bearings more accurately than the simplified defect model.
Liang et al. 20 proposed a method combining one-dimensional dilated convolutional networks with residual connections to diagnose rolling bearing faults under different noise and load conditions; in experiments it outperformed other methods in accuracy and showed strong applicability. Ji et al. 21 proposed a neural network compression method based on knowledge distillation and parameter quantization to address the challenge of deploying fault diagnosis networks on small embedded platforms, significantly reducing network size with minimal accuracy loss. Zhu et al. 22 proposed an image tampering localization scheme based on a Convolutional Next (ConvNeXt) encoder and multiscale feature fusion (ConvNeXtFF), which uses stacked ConvNeXt blocks as the encoder to capture hierarchical multiscale features and fuses them in the decoder to locate tampered pixels accurately; a combined loss function and effective data augmentation strategies further improved performance. Xu et al. 23 proposed a dual-attention multiscale network for fault diagnosis, in which the dual-attention multiscale module guides a 1DCNN to extract multiscale features, effectively capturing and enhancing discriminative features in vibration signals. Fang et al. 24 proposed an efficient, lightweight CNN-based feature extraction method for high-precision fault diagnosis that uses 1-D CNNs, dynamic convolution, separable convolution, and a spatial attention mechanism.
Some fault diagnosis models, such as CNN-Transformer, 25 multiscale and multichannel CNN, 26 and multi-information fusion 27 methods, improve accuracy by increasing the number of convolutional layers and parameters. Although these networks achieve high accuracy, their computational efficiency decreases as the number of hidden layers grows.28,29 Simple network structures, on the other hand, often fail to extract sufficient features, while simply stacking more convolutional layers may increase the model error rate and cause vanishing or exploding gradients. Balancing model performance against computational complexity therefore requires addressing resource consumption and efficiency. To address these problems and to extract fault-related time-frequency information from vibration signals, this article proposes a fault diagnosis method that combines the maximum mean square discrepancy (MMSD) with MIXCNN. MIXCNN is a lightweight and efficient CNN-based fault diagnosis network that integrates channel and spatial position information. Unlike approaches that improve performance by enhancing feature similarity or suppressing irrelevant features, MIXCNN focuses on extracting strongly discriminative information that better represents different faults. MMSD constructs a metric that reflects both the mean and the variance information of data samples in a reproducing kernel Hilbert space, thereby enhancing domain confusion. The main objective of this study is to integrate the mixed information CNN (MIXCNN) architecture with MMSD to improve the performance and accuracy of one-dimensional signal processing for fault diagnosis and monitoring of rotating machinery. The main contributions of this study are summarized as follows:
The proposed MIXCNN is a CNN-based fault diagnosis network. By combining depthwise convolution, pointwise convolution, and residual connections, it mixes channel and spatial position information.
By reflecting both the mean and the variance information of data samples, MMSD significantly improves domain confusion.
A notable advantage of MIXCNN is its ability to retain original information to a greater extent by not requiring downsampling, thereby enhancing the model’s sensitivity and diagnostic accuracy in signal processing. The model can comprehensively capture subtle changes and features in signals, aiding in more accurate identification and interpretation of fault patterns.
Proposed method
This section introduces the construction of the MIXCNN model and the calculation of the maximum mean square discrepancy.
The MIXCNN
Although the loss function design plays a crucial role in helping the feature extractor capture domain-invariant and discriminative features in both the source and target domains, research indicates that the performance of the feature extractor itself is also crucial.30,31 An intuitive way to discover more representative information in vibration signals is to use a deeper network directly. However, this can cause adverse effects such as overfitting and slow convergence. To overcome these issues, this article adopts the lightweight MIXCNN network, which preserves the original information better while maintaining rapid convergence. Consequently, it improves the model’s sensitivity to signal features and its diagnostic accuracy. Figure 1 illustrates the architecture of the MIXCNN network.

The framework of the MIXCNN network.
According to Figure 1, the network combines depthwise convolution, pointwise convolution, and residual connections to blend channel and spatial position information while reducing the number of parameters. Specifically, pointwise convolution (kernel size 1 × 1, i.e., size 1 in the one-dimensional case) weights the features along the channel dimension, combining information from different feature maps at the same spatial position. This yields a more comprehensive feature representation and enhances the performance of the feature extractor. In this architecture, the original vibration signals are first processed by convolutional filters with large kernels to extract local features over a certain time span. The signals then pass through depthwise convolution layers, each followed by an activation function and batch normalization. Depthwise convolution convolves each channel’s feature map independently, which enhances the differentiation of feature information across spatial positions. Unlike traditional convolution, where each kernel spans all input channels, depthwise convolution assigns one kernel to each channel, so the number of output feature maps equals the number of input channels. In this study, one-dimensional depthwise convolution is employed. Finally, following the residual principle, the output of these operations is added to the output of the preceding convolution (the input of the depthwise stage), reducing information loss.
The aim of this article is to overcome the limitations of feature extraction from vibration signals and to better capture domain-invariant features in both the source and target domains. This provides a more powerful and robust feature representation for subsequent domain adaptation, improving overall performance and generalization. Figure 2 illustrates the mixed information (MIX) layer of the network model.
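The depthwise, residual, and pointwise steps described above can be sketched in plain Python. This is a minimal illustration, not the authors’ implementation: it assumes a ConvMixer-style ordering (depthwise convolution, ReLU, residual addition, then pointwise convolution), “same” zero padding with odd kernel lengths, and it omits batch normalization and biases for brevity. All function names are hypothetical.

```python
from typing import List, Tuple

Signal = List[List[float]]  # feature maps indexed as [channel][time]


def depthwise_conv1d(x: Signal, kernels: List[List[float]]) -> Signal:
    """Filter each channel with its own kernel (odd length, 'same' zero padding)."""
    out = []
    for channel, k in zip(x, kernels):
        pad = len(k) // 2
        padded = [0.0] * pad + list(channel) + [0.0] * pad
        out.append([sum(k[j] * padded[t + j] for j in range(len(k)))
                    for t in range(len(channel))])
    return out


def pointwise_conv1d(x: Signal, weights: List[List[float]]) -> Signal:
    """Kernel-size-1 convolution: mix channels independently at every time step."""
    length = len(x[0])
    return [[sum(w[c] * x[c][t] for c in range(len(x))) for t in range(length)]
            for w in weights]


def mix_layer(x: Signal, dw_kernels: List[List[float]],
              pw_weights: List[List[float]]) -> Signal:
    """Hypothetical MIX layer: depthwise conv -> ReLU -> residual add -> pointwise conv."""
    y = depthwise_conv1d(x, dw_kernels)
    y = [[max(0.0, v) for v in row] for row in y]                  # ReLU
    y = [[a + b for a, b in zip(ry, rx)] for ry, rx in zip(y, x)]  # residual connection
    return pointwise_conv1d(y, pw_weights)                         # cross-channel mixing


def param_counts(c_in: int, c_out: int, k: int) -> Tuple[int, int]:
    """Weights (no biases) of a standard conv vs. a depthwise + pointwise pair."""
    return c_in * c_out * k, c_in * k + c_in * c_out
```

Because the padding keeps every channel at its input length, the layer needs no downsampling. As a parameter example, `param_counts(64, 64, 9)` returns `(36864, 4672)`: splitting spatial filtering from channel mixing cuts the weight count by roughly a factor of eight in this configuration.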

The schematic diagram of the MIX layer in MIXCNN.
The construction of MMSD
This article introduces the maximum mean square discrepancy (MMSD), which addresses domain confusion by considering both the mean and the variance information of data samples. Compared with a metric based on the mean alone, this characterizes the distribution of the data more completely and captures the differences between domains more accurately, offering an effective way to enhance domain confusion and further improve model performance. As shown in Figure 3, a MIXCNN network serves as the backbone for extracting fault features, MMSD measures the discrepancy between source and target features, and the combination is used for unsupervised transfer diagnosis. The backbone consists of four “Conv1D” blocks, each comprising a convolutional layer, a batch normalization layer, and a max-pooling layer, with the Rectified Linear Unit (ReLU) as the activation function.
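Why a mean-square statistic carries variance information follows from the identity E[x²] = Var(x) + (E[x])². The sketch below compares a first-moment (mean) gap with a mean-square gap on scalar features. It works directly in the raw feature space rather than in a reproducing kernel Hilbert space, so it illustrates only the intuition behind MMSD and is not the paper’s kernel formulation; the function names are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def mean_gap(source: Sequence[float], target: Sequence[float]) -> float:
    """First-moment discrepancy: blind to differences in spread."""
    return abs(fmean(source) - fmean(target))


def mean_square_gap(source: Sequence[float], target: Sequence[float]) -> float:
    """Second-moment discrepancy: E[x^2] = Var(x) + E[x]^2, so it reflects both."""
    return abs(fmean([x * x for x in source]) - fmean([y * y for y in target]))


# Two domains with identical means but different variances.
src = [-1.0, 1.0, -1.0, 1.0]
tgt = [-3.0, 3.0, -3.0, 3.0]
print(mean_gap(src, tgt))         # 0.0
print(mean_square_gap(src, tgt))  # 8.0
```

A purely mean-based discrepancy would report these two domains as aligned, while the mean-square gap exposes the difference in spread.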

The schematic diagram of MMSD.
where
In Equation (3),35,36
where
Lightweight fault diagnosis framework
By integrating the MIXCNN architecture, this article proposes a lightweight fault diagnosis framework, illustrated in Figure 4. It can be summarized in three steps:
Data collection and sample segmentation: System vibration data are collected and transmitted to digital devices. Subsequently, the data are segmented into samples for training, validation, and testing using sliding windows.
Lightweight design and model training: Lightweight modules such as MIX and one-dimensional convolution are embedded into the MIXCNN structure. The training process employs classification cross-entropy loss, discrepancy metric loss, and adversarial loss to compute the total loss of the training set. The Adam with Weight Decay (AdamW) optimizer is then utilized to update the model’s learning parameters. During iterations, the model with the highest accuracy on the validation set is selected as the well-trained model.
Fault detection, diagnosis, and result analysis: The test set is fed into the trained fault diagnosis model, and the diagnostic results are analyzed from multiple perspectives through visualization.
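The training objective and model selection in the steps above can be sketched as follows. The text states only that cross-entropy, discrepancy, and adversarial losses are combined into a total loss and that the checkpoint with the best validation accuracy is kept; the weighted-sum form, the default weights of 1.0, and all names here are assumptions for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LossWeights:
    """Hypothetical trade-off weights; values are not given in the text."""
    discrepancy: float = 1.0
    adversarial: float = 1.0


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Classification loss for one sample, using a numerically stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def total_loss(cls_loss: float, disc_loss: float, adv_loss: float,
               w: LossWeights = LossWeights()) -> float:
    """Assumed weighted sum of classification, discrepancy (e.g. MMSD), and adversarial terms."""
    return cls_loss + w.discrepancy * disc_loss + w.adversarial * adv_loss


def best_epoch(val_accuracies: Sequence[float]) -> int:
    """Epoch index with the highest validation accuracy (earliest wins on ties)."""
    return max(range(len(val_accuracies)), key=lambda i: val_accuracies[i])
```

Selecting the checkpoint by validation accuracy, rather than keeping the last epoch, guards against late-training degradation; in a real pipeline these scalars would come from the framework’s autograd tensors and the AdamW step would update the weights.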

The overall framework schematic diagram.
Experimental validation
In this section, the authors present unsupervised fault diagnosis case studies on rotating components, specifically rolling bearings, and validate the effectiveness of the proposed method. Ablation experiments are also conducted to evaluate the impact of each module on system performance.
The bearing dataset from Jiangnan University
Jiangnan University 37 collected a bearing fault dataset, recording vertical vibration signals with a PCB MA352A60 accelerometer. The data were sampled at 50 kHz, with tests conducted at various speeds, including 600, 800, and 1000 rpm. To simulate bearing failures, small notches were made on the outer race, inner race, and rolling elements of the rolling bearings by wire cutting; the notches measured 0.3 mm in width and 0.05 mm in depth. Table 1 lists the specifications, fault sizes, and other details of the tested bearings. Each sample contains 1024 data points, and 80% of the data were used for training while the remaining 20% formed the test set. Six transfer tasks, 0-1, 0-2, 1-2, 1-0, 2-0, and 2-1, are constructed on the Jiangnan University (JNU) bearing dataset for experimental verification, and Table 2 gives the detailed parameters of each task.
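The sample preparation described above (1024-point samples and an 80/20 train/test split) can be sketched as below. The window stride (non-overlapping by default), the shuffled split, and the fixed seed are assumptions, since the text does not state how windows overlap or how the split is drawn; the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def sliding_windows(signal: Sequence[float], length: int = 1024,
                    stride: int = 1024) -> List[List[float]]:
    """Cut a 1-D signal into fixed-length windows; stride < length gives overlap."""
    return [list(signal[start:start + length])
            for start in range(0, len(signal) - length + 1, stride)]


def split_samples(samples: Sequence[List[float]], train_frac: float = 0.8,
                  seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Shuffle with a fixed seed, then split into training and test subsets."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    cut = round(train_frac * len(samples))
    return ([samples[i] for i in order[:cut]],
            [samples[i] for i in order[cut:]])
```

For a 5000-point record, non-overlapping 1024-point windows give 4 samples, while a stride of 512 gives 8; overlapping windows are a common way to enlarge small datasets.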
Bearing information for verification.
Description of the rolling bearing transfer paths.
Six transfer experiments under different speed conditions are designed to study the bearing fault diagnosis problem in depth and to assess how model performance varies across operating conditions. To improve the stability of the transfer learning experiments, the authors conduct five independent runs for each transfer task and average their results to obtain the final evaluation. Repeating the experiments reduces the influence of random parameter initialization and improves the reliability of the results. Accuracy is used to evaluate the methods quantitatively, scatter plots of the learned features are used for qualitative comparison, and confusion matrices provide further validation of the proposed method. Detailed results and analysis are given in the corresponding tables and figures.
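The evaluation protocol above can be expressed in a few helpers: per-run accuracy, a confusion matrix, and the average over five independent runs. Reporting a sample standard deviation alongside the mean is an addition for illustration, and the function names are hypothetical.

```python
from statistics import fmean, stdev
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def summarize_runs(accuracies: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over independent runs (e.g. five seeds)."""
    return fmean(accuracies), stdev(accuracies)
```

Off-diagonal entries of the confusion matrix show which health conditions are confused with each other, which a single accuracy figure hides.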
In Tables 3 and 9, Multiple Kernel Maximum Mean Discrepancy (MK-MMD) 38 is commonly used as a regularization term together with the domain adversarial loss to further reduce the marginal distribution distance between domains. MMD 39 is a common discrepancy metric in deep domain adaptation. Local Maximum Mean Discrepancy (LMMD) 40 measures the difference between two distributions by dividing the samples into class-wise local subdomains, computing a weighted MMD for each subdomain, and averaging the results. In Tables 4 and 10, the Multiscale Residual Network (MSResNet) 41 is a Residual Network (ResNet) 42 variant that modifies the residual connections, adds multiscale feature extraction, and improves the dimensionality reduction of feature maps. Deep Convolutional Neural Networks with Wide First-layer Kernels (WDCNN) 43 performs fault diagnosis on raw vibration signals combined with data augmentation.
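For reference, the MMD baseline and an MK-MMD-style variant can be sketched with Gaussian kernels. This is a plain-Python illustration with assumed bandwidths: true MK-MMD learns the kernel weights, whereas this sketch averages a fixed set of bandwidths equally, and LMMD’s class-wise weighting is not shown.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, sigma: float) -> float:
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(xs: Sequence[Vector], ys: Sequence[Vector],
         sigmas: Sequence[float] = (1.0,)) -> float:
    """Biased estimate of squared MMD; several sigmas give an equally weighted multi-kernel."""
    def k(a: Vector, b: Vector) -> float:
        return sum(gaussian_kernel(a, b, s) for s in sigmas) / len(sigmas)

    k_xx = sum(k(a, b) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(k(a, b) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(k(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy
```

The estimate is the squared distance between the empirical kernel mean embeddings of the two samples, so it compares distributions through their means in the kernel feature space; it is zero for identical samples and grows as the target features drift away from the source features.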
Diagnostic accuracy of various methods on Jiangnan University dataset.
MMSD: maximum mean square discrepancy; MMD: maximum mean discrepancy.
Diagnostic accuracy of various models on Jiangnan University dataset.
MIXCNN: mixed information convolutional neural network.
Tables 3 and 4 and Figures 5 and 6 show that the proposed method achieves higher accuracy than the compared methods across all transfer tasks on this dataset. This indicates stable performance under different operating conditions and supports the robustness and reliability of the proposed method.

Confusion matrix of the proposed method on Jiangnan University dataset: (a) 0-1, (b) 0-2, (c) 1-0, (d) 1-2, (e) 2-0, and (f) 2-1.

Feature visualization method based on t-SNE in Jiangnan University dataset: (a) 0-1, (b) 0-2, (c) 1-0, (d) 1-2, (e) 2-0, and (f) 2-1. t-SNE: t-distributed stochastic neighbor embedding.
Table 5 shows the performance differences among the model variants. Replacing any module of the model makes the accuracy unstable across transfer tasks and, in most tasks, noticeably lowers it. This supports the effectiveness and robustness of the proposed model and method. Furthermore, Table 6 shows that the model’s accuracy remains above 86% even when noise is added to the vibration signals, demonstrating its stability in the presence of noise.
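The noise experiments are parameterized by signal-to-noise ratio (SNR). A common way to inject noise at a target SNR is sketched below; the use of additive white Gaussian noise scaled to the measured signal power is an assumption, since the text does not specify the noise model, and the function name is hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def add_noise_snr(signal: Sequence[float], snr_db: float,
                  seed: Optional[int] = None) -> List[float]:
    """Add zero-mean white Gaussian noise so that 10*log10(P_signal / P_noise) = snr_db."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    rng = random.Random(seed)
    sigma = math.sqrt(p_noise)
    return [x + rng.gauss(0.0, sigma) for x in signal]
```

At 0 dB the noise power equals the signal power, and each 10 dB decrease multiplies the noise power by ten, so lower SNR values correspond to harsher test conditions.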
Comparative results of ablation experiments on Jiangnan University dataset.
MIXCNN: mixed information convolutional neural network; MMSD: maximum mean square discrepancy.
Experimental results after adding noise on the Jiangnan University dataset.
SNR: signal-to-noise ratio.
The laboratory bearing dataset
The experimental bench consists mainly of a three-phase asynchronous motor, a transmission shaft, bearings, and other components. The sampling frequency is set to 19.2 kHz (19,200 samples per second), and two channels (vertical and horizontal) are acquired synchronously. The acceleration sensor is mounted vertically on the bearing seat of the bench, and the acceleration signal is recorded for 110 s for each fault type at each speed. Figure 7 shows the experimental equipment.
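As a quick check on data volume, 110 s at 19.2 kHz gives 2,112,000 points per channel for each fault type and speed. The sketch below counts how many 1024-point windows, the sample length used for this dataset, that record yields. Non-overlapping windows are an assumption, since the stride is not stated; overlapping windows would produce more samples. The function name is hypothetical.

```python
from typing import Tuple


def window_count(duration_s: float, fs_hz: float, length: int = 1024,
                 stride: int = 1024) -> Tuple[int, int]:
    """Return (points per channel, number of full windows) for one recording."""
    n_points = round(duration_s * fs_hz)
    return n_points, (n_points - length) // stride + 1


print(window_count(110, 19_200))  # (2112000, 2062)
```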

Photograph of the experimental setup.
The experiment used NUP205 cylindrical roller bearings. Outer race and inner race faults were created by wire cutting, and normal bearings served as the healthy condition. Figure 8 depicts the different health conditions of the bearings.

Health conditions of NUP205 bearings: (a) outer ring fault 1, (b) outer ring fault 2, (c) outer ring fault 3, (d) outer ring fault 4, (e) inner ring fault 1, (f) inner ring fault 2, (g) inner ring fault 3, and (h) normal.
All samples contain 1024 data points, and the data were split into 80% for training and 20% for testing. Six transfer tasks (0-1, 0-2, 1-2, 1-0, 2-0, and 2-1) were constructed on the laboratory bearing dataset for experimental verification. Table 7 provides the detailed parameters of each transfer task: state 0 corresponds to 1350 rpm, state 1 to 1500 rpm, and state 2 to 1650 rpm. For example, the outer ring fault code 1_0_0_2_1 denotes an outer ring fault whose defect is 0 mm from the nearest edge, with a length of 0 mm, a width of 2 mm, and a depth of 1 mm. The other fault types are listed in Tables 7 and 8.
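The defect codes can be decoded mechanically. The sketch follows the example 1_0_0_2_1 given above and reads the last four fields as distance to the nearest edge, length, width, and depth in millimetres. The meaning of the leading field is not stated in the text, so it is kept as an opaque identifier (it may be a fault index, which is an assumption); the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefectCode:
    prefix: str              # leading field; its meaning is assumed, not documented
    edge_distance_mm: float  # distance from the defect to the nearest edge
    length_mm: float
    width_mm: float
    depth_mm: float


def parse_defect_code(code: str) -> DefectCode:
    """Split a code such as '1_0_0_2_1' into its five underscore-separated fields."""
    fields = code.split("_")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {code!r}")
    prefix, *dims = fields
    return DefectCode(prefix, *(float(d) for d in dims))


example = parse_defect_code("1_0_0_2_1")
```

Keeping the prefix as a string avoids guessing a numeric meaning for it, while the dimensional fields become floats that can be compared or tabulated directly.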
NUP205 model experimental bearing parameters.
Description of the rolling bearing transfer paths.
Tables 9 and 10 show that the proposed method outperforms the compared methods. The ablation results in Table 11 show that the complete model achieves the highest accuracy, reflecting the contribution of each module. Table 12 shows that, across the six transfer tasks with noisy data, the model’s lowest accuracy stays at around 70%, so reasonable accuracy is maintained under noise. Figures 9 and 10 show that the proposed method achieves satisfactory diagnostic results for the various health conditions in each transfer task on the laboratory dataset. These observations support the effectiveness of the proposed method, in particular the combination of MIXCNN and MMSD, in improving model accuracy and performance.
Diagnostic accuracy of various comparison methods on the laboratory dataset.
MMSD: maximum mean square discrepancy; MMD: maximum mean discrepancy.
Diagnostic accuracy of various models on the laboratory dataset.
MIXCNN: mixed information convolutional neural network.
Comparison results of ablation experiments on the laboratory dataset.
MIXCNN: mixed information convolutional neural network; MMSD: maximum mean square discrepancy.
Experimental results after adding noise to the laboratory dataset.
SNR: signal-to-noise ratio.

Confusion matrix of the proposed method on the laboratory dataset: (a) 0-1, (b) 0-2, (c) 1-0, (d) 1-2, (e) 2-0, and (f) 2-1.

t-SNE based feature visualization method on the laboratory dataset: (a) 0-1, (b) 0-2, (c) 1-0, (d) 1-2, (e) 2-0, and (f) 2-1.
Conclusion
This article proposes a bearing fault diagnosis method based on MMSD and MIXCNN. Its main innovation is the use of depthwise convolution modules in MIXCNN to enhance the spatial position discrimination of bearing fault information. The method uses pointwise convolution modules to achieve cross-channel interaction and introduces residual connections to reduce information loss in the convolutional layers. Compared with existing methods, its advantage is attributed to MIXCNN not performing downsampling, which keeps the output size consistent across convolutional layers and minimizes the loss of original information. In addition, MMSD reflects both the mean and the variance information of data samples, further improving fault diagnosis accuracy. Experiments on the Jiangnan University dataset and a laboratory bearing dataset demonstrate the strong performance of the proposed method in bearing fault diagnosis. In future work, the authors will integrate multiple types of sensor data to explore the spatial position discrimination ability of this method for bearing fault information in high-noise environments.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported in part by the Fundamental Research Funds for the Central Universities (No. 2572022BF07), in part by the Key Laboratory of Vibration and Control of Aero-Propulsion System, Ministry of Education, Northeastern University (VCAME202209), and in part by the Harbin science and technology innovation talent project (2023HBRCCG004).
Data availability statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
