Transfer learning fault diagnosis of axial piston pumps by fusing knowledge from multisource subdomains and simulation-driven soft labels

Abstract

Axial piston pumps operate under dynamic conditions in practical applications, leading to cross-domain data distribution discrepancies that pose challenges for deep-learning-based fault diagnosis. While transfer learning has shown promise in mitigating such distribution discrepancies between domains, existing single-source transfer learning methods exhibit limited diagnostic accuracy when confronted with substantial domain shift. To address this challenge, this article proposes a novel transfer learning framework that fuses knowledge from multisource subdomains and simulation-driven soft labels. Each source subdomain has an independent domain-specific classifier, and its contribution to the final classification decision is weighted by the distribution discrepancy between source and target data. A computational fluid dynamics model is developed to simulate discharge pressure signals under target operating conditions. These simulated signals are used to compute soft labels through interclass similarity, which enables prior knowledge of the target domain to be incorporated during classifier training. Experiments on an axial piston pump at various rotational speeds demonstrate that the proposed model increases the diagnostic accuracy by 28.03% over conventional single-source methods. Furthermore, the integration of simulation-driven soft labels yields an additional 13.59% accuracy gain. As a result, the proposed model achieves an average accuracy of 99.02%.

Keywords

fault diagnosis, transfer learning, multisource domain, soft label, axial piston pump

Introduction

Hydraulic transmission systems have a wide range of applications in various industries, including aerospace, automotive, construction machinery, and machine tools. At the core of these systems, axial piston pumps function as the hydraulic “heart,” converting mechanical energy into hydraulic power for delivering pressurized fluid to actuators. However, axial piston pumps are particularly vulnerable to extreme operating conditions, such as high speed, high pressure, and wide temperature ranges.¹ Failures in these pumps can lead to machine breakdown, resulting in substantial economic losses and potential safety risks. Consequently, implementing intelligent operation and maintenance for axial piston pumps is vital to guarantee safe and reliable performance of hydraulic transmission systems.²

Fault diagnosis is a critical component of intelligent operation and maintenance for mechanical equipment.³ For axial piston pumps, two primary approaches are employed: physics-based models and data-driven models. The former relies on domain-specific knowledge to establish specific physical models under certain assumptions. However, these models are often constrained by noisy operating environments and complex dynamic systems. Additionally, they cannot be updated in real time using online observed data, which restricts their capability to provide timely maintenance recommendations. In contrast, data-driven models require no complex and sophisticated mathematical models, making them increasingly popular for diagnosing hydraulic pumps.⁴

Conventional data-driven fault diagnosis methods typically feed hand-crafted features extracted using signal processing techniques^5,6 into shallow machine learning models (such as extreme learning machine,⁷ support vector machine,⁸ K-nearest neighbor,⁹ and decision tree¹⁰) for pattern classification or regression. Driven by advances in artificial intelligence and the Industrial Internet of Things, deep learning, which automatically extracts deep feature representations directly from raw data, has become a transformative paradigm for data-driven fault diagnosis of rotary machinery.^11,12 These supervised methods have demonstrated their effectiveness in the fault diagnosis of axial piston pumps,² but they rely heavily on the ideal assumptions that labeled data are abundant and that the training and testing datasets are independent and identically distributed. Unfortunately, labeled samples are often scarce in practical applications. Furthermore, the distribution discrepancies arising from varying operating conditions or diverse mechanical devices inevitably lead to severe performance degradation and poor generalization in new scenarios, hindering their practical engineering deployment.¹³

To overcome these limitations, transfer learning has emerged as a promising solution to enhance model generalization by mitigating distribution discrepancies across different domains.^14,15 Two main transfer learning strategies, comprising discrepancy-based and adversarial-based frameworks, have been extensively explored in mechanical fault diagnosis research.¹⁶ The former minimizes the data distribution discrepancy between source and target domains using various metrics, such as maximum mean discrepancy (MMD)^17,18 and its variants, including multikernel MMD,¹⁹ joint MMD,²⁰ and local MMD (LMMD).²¹ The adversarial-based transfer learning strategy learns domain-invariant features through adversarial training between a domain discriminator and a feature generator.^22,23 The discriminator is responsible for distinguishing between source and target samples, while the generator learns to produce features that deceive the discriminator. Building upon the adversarial paradigm, advanced variant models such as cycle-consistent generative adversarial networks have been developed to address the critical scarcity of labeled fault samples in target scenarios.^24,25

Transfer learning has demonstrated significant advances in machinery fault diagnosis in recent years, particularly in areas such as bearings,²⁶ gearboxes,²⁷ and hydraulic pumps.^28,29 While effective for cross-domain tasks, most existing studies concentrate on transferring knowledge from a single source domain to the target domain. However, labeled datasets are often collected from diverse sources, and a single-source dataset often provides insufficient knowledge for many real-world applications. Consequently, conventional single-source transfer learning models may exhibit suboptimal diagnostic accuracy in the target domain.

By contrast, multisource domains can offer complementary knowledge from various operating conditions or machines. Mansour et al.³⁰ demonstrated that the target distribution in multisource domain adaptation can be represented as a weighted combination of source distributions. This implies that a target classifier can be constructed by combining diverse source-specific classifiers with appropriate weights, provided that source-target domain relationships are known,³¹ thus supporting multisource transfer learning theoretically. Duan et al.³² further introduced a multisource domain adaptation approach that integrates base classifiers pretrained on labeled source data to construct a robust target domain classifier. Pioneering work in multisource domain adaptation methods primarily aimed to learn shared domain-invariant features across all domains. However, Zhu et al.³³ found that large distribution shifts between labeled source datasets can impede learning a single domain-invariant representation. They tackled this by mapping each source-target pair into a separate feature space to learn distinct domain-invariant representations.

Recent progress in mechanical fault diagnosis increasingly leverages multisource transfer learning to optimize utilization of heterogeneous diagnostic data. Tian et al.³⁴ introduced a multisource subdomain adaptation (MSSA) framework for bearing and gearbox fault diagnosis, employing LMMD to align feature distributions between source-target domain pairs and integrating weighted source classifiers into a joint target classifier. Huang et al.³⁵ presented a multisource transfer learning approach for fault diagnosis of bearings across variable conditions. Their study emphasized the critical role of employing independent domain-specific feature extractors rather than a shared architecture, as negative transfer from low-quality source domains could degrade feature learning in other domains.

To further improve diagnostic performance in complex industrial scenarios, recent research has promoted multisource domain adaptation from diverse methodological perspectives, including advanced transfer architectures,^36,37 reliable source-weighting mechanisms,^38,39 and adaptability to complex nonasymmetric or open-set transfer tasks.^40,41 Despite these remarkable algorithmic advancements, current studies typically follow a purely data-reliant paradigm. Consequently, the frameworks in these studies remain highly vulnerable to statistical distribution shifts of signals, as they completely exclude the underlying physical mechanisms and prior knowledge from the domain alignment process. To date, fusing mechanism-based prior knowledge into multisource transfer learning remains an unaddressed gap in intelligent fault diagnosis.

The literature review reveals that relying solely on single-source domain data is inadequate for addressing data distribution discrepancies, particularly in axial piston pump fault diagnosis under varying operating conditions. Furthermore, even within existing multisource paradigms, the lack of domain-specific physical prior knowledge to guide domain adaptation severely constrains the generalization capability of purely data-driven transfer networks. To overcome these limitations, we propose a novel transfer learning method that integrates MSSA and simulation-driven soft labeling (SL). The main contributions are outlined below:

A multisource transfer learning framework is developed to effectively utilize data from multiple source subdomains, enhancing feature representation and domain adaptability under various operational conditions.

A reliable source-weighting mechanism based on LMMD is introduced to ensure that larger weights are assigned to source subdomains exhibiting smaller distribution discrepancy relative to the target domain.

Through high-fidelity computational fluid dynamics (CFD) simulation of the pump’s discharge pressures, the target-domain physical constraints are directly incorporated into classifier training.

The rest of this article is structured as follows: the second section elaborates on the proposed transfer learning method; the third section introduces experimental validation on a test pump; the fourth section evaluates diagnostic performance; and the final section concludes with future research directions.

Proposed method

Transfer learning framework

Figure 1 illustrates the transfer learning framework architecture, which leverages labeled data from multiple source subdomains and unlabeled data from the target domain while integrating target-domain prior knowledge via simulation-driven soft labels. The framework processes discharge pressure signals from multiple operating conditions (source subdomains) and unlabeled target-condition data. A shared feature extractor learns domain-invariant representations, while source-specific classifiers are dynamically weighted according to source-target domain discrepancies. This adaptive weighting strategy ensures that subdomains with higher similarity to the target domain exert greater influence on the final decision.

Figure 1.

Proposed method architecture.

To reduce the domain gap between source and target conditions, simulation-driven soft labels are constructed to incorporate physics-based prior knowledge. A high-fidelity CFD model of the pump is first built to simulate target-condition discharge pressure signals. By introducing typical faults into the CFD model and calculating the interclass similarity of the simulated signals, probabilistic soft labels are generated to represent the class probability distribution. These physics-informed soft labels replace the standard one-hot labels during network training and act as a regularization mechanism. This strategy prevents the model from becoming overconfident on source subdomain features and mitigates potential overfitting, thereby enhancing the robustness and diagnostic performance of cross-condition transfer learning.

CFD modeling and fault injection

Compared to motor current⁴² and vibration⁴³ signals, discharge pressure signals provide more informative insights into the health status of the axial piston pump and are often readily measurable in hydraulic systems. To generate these critical discharge pressure signals under both normal and faulty conditions, we employ high-fidelity CFD simulations, as illustrated in Figure 2. The fluid domain is derived from the pump’s three-dimensional (3D) model, with particular attention to four critical friction pairs (i.e., slipper/swash plate, piston/slipper, piston/cylinder block, and valve plate/cylinder block) where 10 μm oil films are set to match the simulated leakage rate with that of the test pump. The computational domain and grid generation have been described in our previous work.⁴⁴ To account for the significant variations in rotational speed and discharge pressure across different practical applications, various operating conditions are simulated by adjusting the parameters of outlet pressure and angular velocity in the CFD model. To guarantee the simulation accuracy, the developed CFD model has been validated against experimental data in our previous work.⁴⁵ In that work, the simulated and experimental pressure signals exhibit strong agreement in periodicity and waveform, with cosine similarity values consistently ranging between 0.85 and 0.92, which successfully verifies the model’s effectiveness.
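The cosine similarity used for this validation can be illustrated with a minimal sketch. The two waveforms below are hypothetical stand-ins, not data from the study, and the text does not state whether the signals were mean-removed or otherwise preprocessed before comparison, so no such step is applied here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length signals."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical one-revolution ripples (nine pulses per revolution is an
# arbitrary choice); the "measured" signal is a phase-shifted copy.
angles = [math.radians(d) for d in range(360)]
simulated = [math.sin(9 * t) for t in angles]
measured = [math.sin(9 * t + 0.3) for t in angles]
score = cosine_similarity(simulated, measured)
```

For these two pure sinusoids the score equals the cosine of the phase shift, which shows how sensitive the metric is to phase misalignment between simulation and measurement.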

Figure 2.

CFD modeling and fault injection for axial piston pumps.⁴⁵

Scratches on critical friction pairs are among the most common incipient failure modes in axial piston pumps. In this study, we validate the diagnostic framework by detecting scratches across various friction pairs. Specifically, three types of scratches are investigated: circumferential scratches (CS) and radial scratches (RS) in the cylinder block, as well as parallel scratches (SS) in the slipper. These fault conditions are accurately replicated in our CFD model through precise geometric modifications to the fluid domain, where rectangular grooves are introduced at the respective friction pairs to realistically simulate the characteristic leakage paths caused by scratch damage. The scratch-induced leakage causes distinct waveform variations in the discharge pressure signals, captured by a virtual pressure sensor at the pump outlet. Besides, the initial angular position of the shaft is kept identical across all simulation models to ensure inherent phase synchronization. This comprehensive simulation approach generates a robust dataset of pressure signals under various faulty conditions, establishing a reliable data foundation for fault diagnosis while overcoming the challenges associated with physical fault injection in experimental setups.

Soft label generation

The inherent variability in discharge pressure signals across different operating conditions poses a high risk of overfitting in cross-domain transfer learning for diagnosing axial piston pumps. Recent studies^46,47 have demonstrated that soft label training offers an effective solution to this problem, with a careful calibration of soft label values being the key to realizing superior generalization and accuracy. To address the challenge of overfitting, this study introduces a soft label generation strategy based on sample similarity derived from simulation data. By assigning soft labels tailored to the target domain, the proposed strategy effectively mitigates overfitting and improves model generalization. The detailed process for generating these soft labels is illustrated in Figure 3.

Figure 3.

Procedure of soft-label generation.

The sampling interval of the virtual pressure sensor in the CFD model is set to 1°, yielding 360 data points per revolution cycle. As a result, the simulated signals are divided into segments, each containing 360 points. Given that all CFD simulations start at the same phase, signal alignment before similarity evaluation is not required. To quantify the dissimilarity between different signal classes, the root mean square error (RMSE) is selected as the distance metric.

{RMSE}_{i, j} = \sqrt{\frac{\sum_{m = 1}^{360} {(x_{i, m} - x_{j, m})}^{2}}{360}}

(1)

where i and j represent two distinct classes, and x_i,m and x_j,m denote the mth data point in the representative signal series of classes i and j, respectively. The RMSE_i,j value reflects the geometric distance between two waveforms: higher RMSE values indicate greater waveform deviations, while an RMSE value of 0 signifies identical signals.
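As a minimal sketch of Equation (1), the function below computes the RMSE between two one-revolution waveforms. The waveforms themselves are hypothetical placeholders rather than CFD outputs.

```python
import math

def rmse(x_i, x_j):
    """Equation (1): root mean square error between two equal-length waveforms."""
    if len(x_i) != len(x_j):
        raise ValueError("waveforms must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)) / len(x_i))

# Hypothetical representative signals sampled every 1 degree (360 points).
normal = [math.sin(math.radians(9 * d)) for d in range(360)]
faulty = [v + 0.1 for v in normal]  # same shape with a constant offset of 0.1

d_same = rmse(normal, normal)   # identical signals give zero distance
d_shift = rmse(normal, faulty)  # a constant offset gives an RMSE equal to that offset
```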

To construct the soft labels, the RMSE is mapped to a similarity metric S_i,j within the range [0, 1] using an exponential decay function:

S_{i, j} = \exp (- k \cdot {RMSE}_{i, j})

(2)

where k is a scaling parameter that regulates the sensitivity of the similarity distribution. For this study, k is empirically set to 5. The SoftMax function is then applied to normalize these similarity scores, generating a probability distribution where the sum of all elements equals 1:

L_{i, j} = SoftMax (S_{i, j})

(3)
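Equations (2) and (3) can be sketched as follows. The RMSE matrix is hypothetical (class order N, CS, RS, SS), and applying SoftMax row by row, so that each row becomes the soft label of one true class, is an assumption about how Equation (3) is evaluated.

```python
import math

def similarity_matrix(rmse_matrix, k=5.0):
    """Equation (2): map each RMSE to a similarity in (0, 1]."""
    return [[math.exp(-k * d) for d in row] for row in rmse_matrix]

def softmax(row):
    """Numerically stable SoftMax over one row."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_matrix(rmse_matrix, k=5.0):
    """Equation (3), applied row-wise (assumed): one soft label per true class."""
    return [softmax(row) for row in similarity_matrix(rmse_matrix, k)]

# Hypothetical symmetric RMSE matrix; N and SS are deliberately close.
rmse_matrix = [
    [0.00, 0.60, 0.50, 0.10],
    [0.60, 0.00, 0.40, 0.55],
    [0.50, 0.40, 0.00, 0.45],
    [0.10, 0.55, 0.45, 0.00],
]
labels = soft_label_matrix(rmse_matrix)
```

Because every similarity lies in [0, 1], the true-class entry of a four-class row can never exceed e/(e + 3), roughly 0.475, under this reading. That bound is why a further correction of the diagonal is needed.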

As shown in the SoftMax matrix in Figure 3, the diagonal elements, which represent the confidence in the true class, may become insufficiently dominant due to high interclass similarity. This can lead to underfitting or ambiguity in classification. To mitigate this issue, a correction mechanism is introduced to ensure a minimum confidence level for the ground truth class:

y_{i} = \frac{e^{x_{i}} - 1}{\sum_{c = 0}^{3} (e^{x_{c}} - 1)}, while \max_{c} (y_{c}) < μ

(4)

where x_i and y_i denote the ith probability value in the original and the rectified soft label, respectively. The parameter μ serves as a threshold to enforce the lower bound of the true class probability, which is set to 0.7 in this study.
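One reading of Equation (4) is an iterative sharpening step that is repeated until the largest entry reaches μ. The sketch below follows that interpretation. The iteration cap `max_iter` is a hypothetical safeguard (for example, against exact ties between classes) and is not part of the original formulation.

```python
import math

def rectify(soft_label, mu=0.7, max_iter=1000):
    """Apply y_i = (e^{x_i} - 1) / sum_c (e^{x_c} - 1) until max(y) >= mu."""
    y = list(soft_label)
    for _ in range(max_iter):
        if max(y) >= mu:
            break
        sharpened = [math.exp(v) - 1.0 for v in y]
        total = sum(sharpened)
        y = [v / total for v in sharpened]
    return y

# Hypothetical SoftMax row whose true class (index 0) is not yet dominant.
row = [0.40, 0.18, 0.19, 0.23]
rectified = rectify(row)
```

The mapping x -> e^x - 1 is convex and passes through the origin, so each pass increases the ratio between larger and smaller entries while preserving their ranking.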

MSSA network

The proposed MSSA network is designed to achieve robust fault diagnosis under diverse operating conditions by extracting domain-invariant features. As illustrated in Figure 1, the architecture comprises a shared feature extractor F and multiple source-specific classifiers $\{C_{j}\}_{j = 1}^{K}$, where K corresponds to the number of distinct source domains (each representing a unique operating condition).

A core strategy in transfer learning involves reducing the distribution discrepancy between source and target domains to facilitate effective knowledge transfer. To quantify this distribution discrepancy, the MMD serves as a widely adopted nonparametric metric, measuring the distance $D_{H}$ between the global distributions P_s (source domain) and P_t (target domain) in a reproducing kernel Hilbert space $H$ :

D_{H} (P_{s}, P_{t}) = ‖ E_{x ~ P_{s}} [φ (x^{s})] - E_{x ~ P_{t}} [φ (x^{t})] ‖_{H}^{2}

(5)

where φ(·) represents the feature mapping function related to the characteristic kernel k , with x^s ∈ D_s and x^t ∈ D_t representing samples from the source and target datasets, respectively. The distance between feature representations is calculated using the inner product:

k (x^{s}, x^{t}) = 〈 φ (x^{s}), φ (x^{t}) 〉

(6)

Note that the standard MMD only aligns the global distributions, ignoring the local class-conditional structures. For the task of fault diagnosis, aligning relevant subdomains within the same class is critical. Therefore, the LMMD is selected in the MSSA framework to align the distributions of each category independently, as defined below:

D_{H} (P_{s}, P_{t}) = E_{n} ‖ E_{P_{s}^{(n)}} [φ (x^{s})] - E_{P_{t}^{(n)}} [φ (x^{t})] ‖_{H}^{2}

(7)

where n indicates the nth category, and P_s^(n) and P_t^(n) denote the distributions of class n in the source and target domains, respectively. The LMMD progressively aligns these distributions across domains during training. Its unbiased estimator is given by

{\hat{D}}_{H} (P_{s}, P_{t}) = \frac{1}{N} \sum_{n = 1}^{N} ‖ \sum_{x_{i}^{s} \in D_{s}} w_{i}^{s, n} φ (x_{i}^{s}) - \sum_{x_{j}^{t} \in D_{t}} w_{j}^{t, n} φ (x_{j}^{t}) ‖_{H}^{2}

(8)

where $w_{i}^{s, n}$ and $w_{j}^{t, n}$ denote the probabilities that the ith sample in D_s and the jth sample in D_t belong to class n, respectively, and N is the total number of classes.

The shared feature extractor F processes samples from both domains during training, with the extracted features F(x) serving as inputs for the LMMD-based domain adaptation loss:

\begin{matrix} {\hat{D}}_{H} (P_{s}, P_{t}) = \frac{1}{N} \sum_{n = 1}^{N} \Big[ \sum_{i = 1}^{N_{s}} \sum_{j = 1}^{N_{s}} w_{i}^{s, n} w_{j}^{s, n} k (F (x_{i}^{s}), F (x_{j}^{s})) \\ + \sum_{i = 1}^{N_{t}} \sum_{j = 1}^{N_{t}} w_{i}^{t, n} w_{j}^{t, n} k (F (x_{i}^{t}), F (x_{j}^{t})) \\ - 2 \sum_{i = 1}^{N_{s}} \sum_{j = 1}^{N_{t}} w_{i}^{s, n} w_{j}^{t, n} k (F (x_{i}^{s}), F (x_{j}^{t})) \Big] \end{matrix}

(9)
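The kernel form of the LMMD can be sketched in plain Python as below. Several choices are assumptions made only for illustration: a single Gaussian kernel with bandwidth `sigma`, class weights normalized so that they sum to one within each class, and the tiny two-dimensional feature vectors. In practice the target-side probabilities would come from classifier predictions, since target labels are unavailable.

```python
import math

def gaussian_kernel(a, b, sigma=1.0):
    """Assumed kernel k(a, b); the text does not specify the kernel family."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def class_weights(probs):
    """Normalize class memberships so each class column sums to 1 (assumed)."""
    n_classes = len(probs[0])
    totals = [sum(row[n] for row in probs) for n in range(n_classes)]
    return [[row[n] / totals[n] if totals[n] > 0 else 0.0
             for n in range(n_classes)] for row in probs]

def lmmd(feat_s, feat_t, prob_s, prob_t, sigma=1.0):
    """Class-averaged sum of source-source, target-target, and cross terms."""
    w_s, w_t = class_weights(prob_s), class_weights(prob_t)
    n_classes = len(prob_s[0])
    total = 0.0
    for n in range(n_classes):
        ss = sum(w_s[i][n] * w_s[j][n] * gaussian_kernel(feat_s[i], feat_s[j], sigma)
                 for i in range(len(feat_s)) for j in range(len(feat_s)))
        tt = sum(w_t[i][n] * w_t[j][n] * gaussian_kernel(feat_t[i], feat_t[j], sigma)
                 for i in range(len(feat_t)) for j in range(len(feat_t)))
        st = sum(w_s[i][n] * w_t[j][n] * gaussian_kernel(feat_s[i], feat_t[j], sigma)
                 for i in range(len(feat_s)) for j in range(len(feat_t)))
        total += ss + tt - 2.0 * st
    return total / n_classes

# Hypothetical features F(x) for two classes, with one-hot memberships.
feat_s = [[0.0, 0.0], [1.0, 1.0]]
feat_t = [[3.0, 3.0], [4.0, 4.0]]
memberships = [[1.0, 0.0], [0.0, 1.0]]
d_aligned = lmmd(feat_s, feat_s, memberships, memberships)
d_shifted = lmmd(feat_s, feat_t, memberships, memberships)
```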

To leverage knowledge from multiple operating conditions, the total domain adaptation loss l_LMMD is calculated as the sum of LMMD losses across all K source domains:

l_{LMMD} = \sum_{j = 1}^{K} {\hat{D}}_{H} (F (x_{j}^{s}), F (x^{t}))

(10)

The extracted features are passed to the domain-specific classifier C_j. To mitigate overfitting and incorporate the prior knowledge obtained from simulation data, the soft labels generated in the “Soft label generation” section are used as the ground truth. We define the classification loss l_cls as the cross-entropy between the output of classifier C_j and these soft labels:

l_{cls} = \sum_{j = 1}^{K} E_{x ~ X_{j}} J (C_{j} (F (x_{j}^{s})), y_{soft}^{t})

(11)

where J(·, ·) denotes the cross-entropy function:

J (\hat{y}, y) = - \sum_{i = 1}^{N} y_{i} \log ({\hat{y}}_{i})

(12)

where y_i and ${\hat{y}}_{i}$ are the elements of the soft label and the corresponding network prediction, respectively.
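The effect of replacing one-hot targets with soft labels can be seen in a small sketch of the cross-entropy. The first argument is the classifier output, as in Equation (11). The label and prediction vectors are hypothetical, and the clipping constant `eps` is added only to avoid log(0).

```python
import math

def cross_entropy(prediction, soft_label, eps=1e-12):
    """Cross-entropy between a predicted distribution and a (soft) label."""
    return -sum(y * math.log(max(p, eps)) for p, y in zip(prediction, soft_label))

one_hot = [1.0, 0.0, 0.0, 0.0]
soft = [0.7, 0.05, 0.05, 0.2]            # hypothetical soft label for class N
confident = [0.97, 0.01, 0.01, 0.01]     # near one-hot prediction
matched = [0.7, 0.05, 0.05, 0.2]         # prediction that mirrors the soft label

loss_confident_soft = cross_entropy(confident, soft)
loss_matched_soft = cross_entropy(matched, soft)
```

With the one-hot target the near one-hot prediction gives the lower loss, whereas with the soft target the loss is lowest when the prediction reproduces the soft label, so overconfident outputs are penalized.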

The total loss l_total of the MSSA network combines the classification loss l_cls and domain adaptation loss l_LMMD through weighted summation:

l_{total} = l_{cls} + λ l_{LMMD}

(13)

where λ is the trade-off coefficient balancing these two losses. To ensure stable convergence, a dynamic trade-off adaptation strategy²² is implemented as follows:

λ = \frac{2}{1 + \exp (- γ \cdot p)} - 1

(14)

where the scaling hyperparameter γ is fixed to 10, and p is the training progress, which increases linearly from 0 to 1. Under this configuration, the trade-off coefficient λ starts at zero to avoid divergence in the early stage of training and then rises rapidly toward 1 to enforce the extraction of domain-invariant features.
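A minimal sketch of Equation (14) is given below. How p is mapped to epochs or iterations is not stated in the text, so the function simply takes p as its input.

```python
import math

def trade_off(p, gamma=10.0):
    """Equation (14): lambda as a function of training progress p in [0, 1]."""
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0

# Evaluate the schedule at a few illustrative progress values.
progress = [0.0, 0.1, 0.25, 0.5, 1.0]
schedule = [trade_off(p) for p in progress]
```

The expression is algebraically equal to tanh(γp/2), so with γ = 10 the coefficient is already about 0.46 at p = 0.1 and about 0.99 at p = 0.5.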

During inference, relying solely on a single classifier or simple averaging is suboptimal due to the varying correlation between the target and different source domains. Therefore, we establish an ensemble task classifier by integrating the outputs of source-specific classifiers. Each classifier is dynamically assigned a weight based on the LMMD distance between its corresponding source and target domain features. Because a smaller LMMD distance indicates higher similarity, the classifier of a source domain that lies closer to the target domain receives a larger weight. The weight ω_j for the jth classifier is given by

ω_{j} = \frac{\exp (- {\hat{D}}_{H} (F (x_{j}^{s}), F (x^{t})))}{\sum_{j = 1}^{K} \exp (- {\hat{D}}_{H} (F (x_{j}^{s}), F (x^{t})))}

(15)

Finally, the output of the task classifier is determined as the weighted sum of the predictions from all specific classifiers, as shown in Equation (16). The final diagnostic result of the task classifier is determined by the class with the maximum probability in the aggregated output.

C_{t} (x^{t}) = \sum_{j = 1}^{K} ω_{j} \cdot C_{j} (F (x^{t}))

(16)
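Equations (15) and (16) can be sketched as a SoftMax over negative LMMD distances followed by a weighted sum of classifier outputs. The distances and classifier outputs below are hypothetical values for K = 2 source domains and four classes.

```python
import math

def classifier_weights(lmmd_distances):
    """Equation (15): smaller distance gives a larger normalized weight."""
    exps = [math.exp(-d) for d in lmmd_distances]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(weights, classifier_outputs):
    """Equation (16): weighted sum of class probabilities, then arg max."""
    n_classes = len(classifier_outputs[0])
    fused = [sum(w * out[c] for w, out in zip(weights, classifier_outputs))
             for c in range(n_classes)]
    label = max(range(n_classes), key=fused.__getitem__)
    return fused, label

distances = [0.2, 0.8]                 # hypothetical LMMD values per source
outputs = [[0.6, 0.1, 0.1, 0.2],       # hypothetical output of C_1
           [0.3, 0.1, 0.1, 0.5]]       # hypothetical output of C_2
weights = classifier_weights(distances)
fused, label = ensemble_predict(weights, outputs)
```

In this toy case the two classifiers disagree, and the one attached to the closer source domain decides the final label.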

Experimental setup

The experimental investigation was performed using a swash-plate axial piston pump featuring a 71 mL/rev maximum displacement, as illustrated in Figure 4. The experimental apparatus and its circuit diagram are shown in Figure 5. The test pump, powered by an electric motor, drew low-pressure fluid from the tank and delivered high-pressure fluid to the hydraulic system. The temperature of the supplied oil was maintained at approximately 50°C. To avoid cavitation effects, a charge pump was employed to elevate the suction pressure at the test pump’s inlet to 1.6 bar. The control unit was responsible for regulating the rotational speed and discharge pressure. A proportional relief valve was used for adjustable load applications, which allowed for variable pressure loads independent of the rotational speed.

Figure 4.

Configuration of the test pump.⁴⁸

Figure 5.

Experimental setup of axial piston pumps: (a) test bench and its (b) hydraulic circuit diagram^48; (c–e) RS, CS, and SS fault injection into internal components. CS: circumferential scratches; RS: radial scratches; SS: parallel scratches.

An array of sensors was deployed to monitor the test pump’s characteristic parameters. Pressure sensors were located at both the pump’s inlet and outlet lines to measure suction and discharge pressures, respectively. Additionally, flowmeters were positioned at the outlet and drain lines to quantify volumetric delivery flow rate and leakage flow rate. Vibration behavior was captured using a uniaxial accelerometer, magnetically mounted on the pump’s rear end cover. During steady-state operation, all signals were acquired simultaneously via a data acquisition unit. The discharge pressure and vibration data were acquired at 51.2 kHz, while the other parameters were logged at 1 kHz.

The test pump was first operated under normal conditions (N) at three rotational speeds of 500, 1000, and 1500 r/min, and four discharge pressures of 5, 10, 15, and 20 MPa. Once the experiments under N were finished, the test pump was disassembled to introduce specific fault modes, including CS, RS, and SS, by replacing the corresponding internal components. These scratches were fabricated via electrical discharge machining to form a rectangular cross-sectional profile, with their location and dimensional parameters detailed in Figure 5(c) to (e). After reassembly, the faulty pump was tested under identical operating conditions to collect sensor signals. Although artificially machined scratches exhibit a geometric discrepancy compared to real scratches, they induce highly consistent pressure ripple and leakage characteristics, and thus have been widely adopted in previous research.⁴⁹

Figure 6 presents a sample measurement of the test pump’s discharge pressure under normal and faulty conditions at 20 MPa and three rotational speeds (500, 1000, and 1500 r/min). Under N, the discharge pressure waveform exhibits a periodic pulsation over one cycle. In contrast, under faulty conditions, this periodic pulsation waveform becomes distorted due to increased internal leakage caused by injected faulty internal components. Additionally, the waveform of the discharge pressure signals varies with different operating conditions, even when the test pump has no faulty components or the same faulty component.

Figure 6.

Discharge pressure waveforms of the test pump with different faults: (a–c) N, (d–f) CS, (g–i) RS, and (j–l) SS. N: normal conditions; CS: circumferential scratches; RS: radial scratches; SS: parallel scratches.

The variability in operating conditions, particularly changes in rotational speed, induces substantial deviations in discharge pressure signals, significantly complicating the pump’s fault diagnosis. To rigorously validate the proposed diagnostic framework, eight cross-speed transfer learning tasks were systematically designed, as detailed in Table 1. These tasks cover four discharge pressures (5, 10, 15, and 20 MPa) to evaluate the model’s generalization capability. The study employs both interpolation and extrapolation scenarios: in tasks T1–T4, the target domain speed of 1000 r/min lies within the source domain speed range of 500–1500 r/min, while in tasks T5–T8, the target speed of 500 r/min extends beyond the source domain range of 1000–1500 r/min, providing a robust assessment across different operating conditions.

Table 1.

Transfer learning tasks.

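Task	Source I speed	Source I pressure	Source II speed	Source II pressure	Target speed	Target pressure
T1	500 r/min	5 MPa	1500 r/min	5 MPa	1000 r/min	5 MPa
T2	500 r/min	10 MPa	1500 r/min	10 MPa	1000 r/min	10 MPa
T3	500 r/min	15 MPa	1500 r/min	15 MPa	1000 r/min	15 MPa
T4	500 r/min	20 MPa	1500 r/min	20 MPa	1000 r/min	20 MPa
T5	1000 r/min	5 MPa	1500 r/min	5 MPa	500 r/min	5 MPa
T6	1000 r/min	10 MPa	1500 r/min	10 MPa	500 r/min	10 MPa
T7	1000 r/min	15 MPa	1500 r/min	15 MPa	500 r/min	15 MPa
T8	1000 r/min	20 MPa	1500 r/min	20 MPa	500 r/min	20 MPa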

To maintain data consistency across varying operating conditions, each raw discharge pressure signal is segmented into samples representing one complete pump revolution. Due to the variation in the number of sampling points per revolution with rotational speed, an instantaneous speed-guided linear interpolation method is applied to standardize the length of each sample to 720 points, thereby eliminating speed-induced domain discrepancies across different working conditions. This configuration yields a precise angular resolution of 0.5° per point to effectively preserve transient fault features. For each transfer learning task, the model is trained on 324 samples from each source and target domain, reserving 20% of these samples as a validation set to monitor the training progress. An additional 324 independent target domain samples serve as a test set to rigorously evaluate the diagnostic performance.
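The length standardization can be sketched as follows. At a sampling rate of 51.2 kHz, one revolution spans 6144, 3072, and 2048 raw points at 500, 1000, and 1500 r/min, respectively. The details of the instantaneous speed-guided method are not given in the text, so this sketch makes the simplifying assumption of constant speed within a revolution and uses plain linear interpolation on an evenly spaced angular grid.

```python
def samples_per_revolution(sampling_rate_hz, speed_rpm):
    """Number of raw samples covering one shaft revolution."""
    return sampling_rate_hz * 60.0 / speed_rpm

def resample_revolution(signal, n_out=720):
    """Linearly interpolate one revolution onto n_out evenly spaced angles."""
    n_in = len(signal)
    out = []
    for k in range(n_out):
        pos = k * n_in / n_out          # fractional index in the raw segment
        i0 = int(pos)
        i1 = min(i0 + 1, n_in - 1)      # clamp at the end of the segment
        frac = pos - i0
        out.append((1.0 - frac) * signal[i0] + frac * signal[i1])
    return out

# Hypothetical raw segment for one revolution at 1000 r/min (a simple ramp).
n_raw = int(samples_per_revolution(51200, 1000))
raw = [float(i) for i in range(n_raw)]
sample = resample_revolution(raw)
```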

The proposed diagnostic network is implemented in PyTorch and trained on a workstation equipped with an Intel® Core™ i9-14900HX CPU and an NVIDIA GeForce RTX 4060 GPU. Hyperparameters were tuned to ensure the model performance. Specifically, the input signals are normalized to the range [–1, 1] to accelerate convergence. The model is trained for 20 epochs with a batch size of 16, using the SGD optimizer (momentum = 0.9, weight decay = 0.0005). The initial learning rate of 0.002 is adjusted by a StepLR scheduler, which reduces the learning rate by a factor of 0.2 every 10 epochs.
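The resulting learning-rate schedule can be illustrated in plain Python. This sketch mirrors the step-decay rule of a StepLR-style scheduler and assumes that "a factor of 0.2" means the learning rate is multiplied by 0.2 at each step. The dictionary keys are hypothetical names chosen for illustration.

```python
def step_lr(initial_lr, epoch, step_size=10, gamma=0.2):
    """Step decay: multiply the learning rate by gamma every step_size epochs."""
    return initial_lr * gamma ** (epoch // step_size)

# Training settings reported in the text, collected under hypothetical key names.
config = {
    "epochs": 20,
    "batch_size": 16,
    "momentum": 0.9,
    "weight_decay": 0.0005,
    "initial_lr": 0.002,
}

lr_per_epoch = [step_lr(config["initial_lr"], e) for e in range(config["epochs"])]
```

Under this assumption the learning rate stays at 0.002 for the first 10 epochs and drops to 0.0004 for the remaining 10.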

Results and discussion

To enhance domain adaptation, the proposed method incorporates prior knowledge of faulty signals in the target domain through simulation-driven soft labels. For each transfer learning task, these soft labels are generated from simulated signals under the target domain’s specific operating conditions (as detailed in Section “Soft label generation”). The resulting soft label values for each task are presented in Figure 7, where higher values indicate a greater probability of the corresponding category.

Figure 7.

(a–d) Soft labels of each fault category; (e–l) soft label matrices under different working conditions.

As illustrated in Figure 7, the generated soft labels consistently yield peak values for their respective true categories, confirming their discriminative capability. For the N samples in Figure 7(a), the soft label values associated with the SS category remain substantially greater than zero, reflecting significant interclass similarity between the N and SS categories. A similar phenomenon is evident in Figure 7(d), where the true SS samples show relatively high soft label values for the N category. This observation aligns with the discharge pressure signal characteristics shown in Figure 6, where the waveform distortions caused by SS faults are relatively subtle and the waveform of SS samples closely resembles that of the N samples. By leveraging these soft labels, a fuzzy classification strategy is introduced between N and SS samples, which minimizes misclassification arising from sharp decision boundaries and better reflects their inherent physical similarities. In contrast, the soft labels for CS and RS samples remain close to a standard one-hot representation due to their distinct waveform characteristics. This distinction facilitates clearer fault identification and enhances the model’s generalization across diverse operating conditions.

To evaluate the diagnostic performance of the proposed method, we conducted an ablation study comparing five distinct model configurations across transfer learning tasks T1–T8. This systematic evaluation was designed to isolate and quantify the individual contributions of two key components: (1) multisource information fusion and (2) the simulation-driven SL strategy.

(a) Source I only/Source II only: These two baseline configurations represent conventional single-source domain adaptation approaches, where the model is trained exclusively on either source I or II along with the unlabeled target domain. The source-target domain discrepancy is minimized via LMMD to align their local feature distributions during training.

(b) DAN: This configuration combines data from both source I and source II into a single merged source domain, treating them as a homogeneous dataset. The model follows the classical domain adaptation network (DAN) but uses LMMD for distribution alignment to keep the discrepancy metric consistent across configurations. Because the sources are merged, it cannot assign different weights to the individual source domains.

(c) MSSA: This configuration adopts the MSSA framework described in Section “MSSA network”, but uses standard one-hot encoding for the classification loss.

(d) MSSA + SL: This is the complete proposed framework in this work, which integrates simulation-driven SL to effectively leverage prior knowledge from the target domain.

To ensure a fair and conclusive comparison, all models are trained for 20 epochs with a batch size of 16 using the same SGD optimizer and dynamic trade-off strategy as the proposed method. The learning rate is specified as 0.005 for the Source I only, Source II only, and DAN methods, and 0.002 for both MSSA and MSSA + SL to guarantee convergence. Furthermore, all comparison methods share an identical feature extractor structure and use the same neural network topology for their classifiers, with the number of classifiers dynamically matching that of source domains.

Figure 8 and Table 2 compare the diagnostic accuracies of the five models across the transfer learning tasks T1–T8. The proposed method achieves the highest average accuracy of 99.02% and matches or outperforms the competing models in every task except T8, where MSSA is marginally higher (100% versus 99.69%). In contrast, the transfer learning models relying solely on source I or II exhibit significantly limited diagnostic performance, with accuracies below 75% in most tasks. While the DAN achieves high diagnostic accuracy in specific tasks (e.g., 100% in T2) by leveraging data from diverse operating conditions, its generalization capability remains unstable compared to the MSSA and MSSA + SL methods. Notably, the diagnostic accuracy of the DAN in T6 drops below that of the “Source I only” setting (55.00% versus 98.75%), suggesting that incorporating additional source domains with large distribution discrepancies introduces interference rather than improvement.

Figure 8.

Comparison results of diagnostic accuracy between different methods.

Table 2.

Comparative diagnostic accuracy across transfer learning tasks.

Task	Source I only (%)	Source II only (%)	DAN (%)	MSSA (%)	MSSA + SL (%)
T1	81.56	70.31	73.75	79.06	99.06
T2	50.00	30.94	100	100	100
T3	47.81	35.31	78.43	81.25	91.88
T4	50.00	50.00	50.00	50.00	100
T5	60.31	70.31	61.88	73.44	99.69
T6	98.75	54.96	55.00	97.81	100
T7	82.81	39.69	97.50	99.69	99.69
T8	44.69	46.88	86.88	100	99.69
Average	57.13		75.43	85.16	98.75

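Note. Bold values indicate the highest diagnostic accuracy for each task. The single-source average (57.13%) is reported jointly for the Source I only and Source II only columns.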

The conventional MSSA demonstrates a notable improvement over the DAN, achieving nearly 100% diagnostic accuracy in T2 and T6–T8, owing to its dynamic weighting mechanism, which prioritizes relevant source domains while suppressing noise from dissimilar ones. Nonetheless, the proposed method still surpasses the conventional MSSA in challenging tasks such as T1 and T3–T6, where distinguishing between similar fault categories demands higher discriminative capability.

In general, transfer learning leveraging multisource data outperforms single-source approaches on average. Specifically, compared with the single-source baselines, DAN, MSSA, and MSSA + SL achieve average accuracy improvements of 18.30, 28.03, and 41.62 percentage points, respectively. Despite using the same source domain data, these methods exhibit significant performance differences. For instance, while the DAN model fails to distinguish between CS and RS samples in T8 (Figure 9(a)), the MSSA-based methods achieve accurate diagnosis (Figure 9(d) and (g)), which is likely due to their more refined domain alignment strategies.
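
The improvements quoted above follow directly from the Average row of Table 2. The short sketch below reproduces the arithmetic; the dictionary keys and the helper name `gain` are illustrative choices only.

```python
# Average accuracies (%) taken from the Average row of Table 2.
average = {"single-source": 57.13, "DAN": 75.43, "MSSA": 85.16, "MSSA + SL": 98.75}


def gain(method, baseline):
    """Difference between two average accuracies, in percentage points."""
    return round(average[method] - average[baseline], 2)


# Gains of each multisource configuration over the single-source baseline.
gains_over_single = {m: gain(m, "single-source") for m in ("DAN", "MSSA", "MSSA + SL")}

# Gain attributable to dynamic weighting (MSSA versus merged-source DAN).
weighting_gain = gain("MSSA", "DAN")

# Gain attributable to the simulation-driven soft labels.
soft_label_gain = gain("MSSA + SL", "MSSA")
```

The same differences also give the contribution of the weighting mechanism (MSSA over DAN) and of the soft labels (MSSA + SL over MSSA).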

Figure 9.

Effects of sub-domain weights on the diagnostic performance in T8: (a–c) confusion matrix, sub-domain weights, and t-SNE visualization of DAN; (d–f) confusion matrix, sub-domain weights, and t-SNE visualization of MSSA; (g–i) confusion matrix, sub-domain weights, and t-SNE visualization of MSSA + SL. DAN: domain adaptation network; MSSA: multisource subdomain adaptation; SL: soft labeling.

To further investigate the training dynamics, the variations of domain weights are evaluated. As shown in Figure 9(b), (e), and (h), the MSSA and MSSA + SL models assign higher weights to source II during training. According to the optimization objective of the MSSA framework, a smaller LMMD loss inherently triggers a larger domain weight. Therefore, this dynamic weight allocation indicates that source II shares a higher distribution similarity with the target domain in the latent space. By intensifying the contribution of the more relevant source sub-domain, the model focuses on domain-invariant features while suppressing potential negative transfer from irrelevant features. Compared to the feature distribution of the DAN model in Figure 9(c), the t-distributed stochastic neighbor embedding (t-SNE) visualizations in Figure 9(f) and (i) demonstrate that the target domain features generated by the MSSA-based methods are clustered closer to source II and farther from source I. This observation confirms that the weighting mechanism successfully promotes alignment between target domain features and the most relevant source domain features in the latent space, which leads to a more compact intraclass clustering and higher classification accuracy in the final diagnostic results.
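
The inverse relationship between LMMD loss and domain weight can be illustrated with a minimal sketch. The exact weighting rule of the MSSA framework is defined in Section “MSSA network” and is not reproduced here; the sketch assumes a softmax over negative discrepancies purely for illustration, and all function names and numerical values are hypothetical.

```python
import math


def domain_weights(discrepancies):
    """Map per-source discrepancies to weights that sum to 1 (smaller -> larger)."""
    scores = [math.exp(-d) for d in discrepancies]
    total = sum(scores)
    return [s / total for s in scores]


def fuse_predictions(class_probs, weights):
    """Weighted average of the class probabilities from each source classifier."""
    n_classes = len(class_probs[0])
    return [sum(w * p[k] for w, p in zip(weights, class_probs))
            for k in range(n_classes)]


# Hypothetical LMMD values for source I and source II against the target domain.
lmmd = [0.8, 0.2]
weights = domain_weights(lmmd)

# Hypothetical outputs of the two domain-specific classifiers for one target sample.
probs_source_1 = [0.6, 0.1, 0.1, 0.2]
probs_source_2 = [0.2, 0.1, 0.1, 0.6]
fused = fuse_predictions([probs_source_1, probs_source_2], weights)

# Index of the class with the highest fused probability.
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this toy case source II has the smaller discrepancy, receives the larger weight, and therefore dominates the fused decision, which mirrors the behavior observed in Figure 9(e) and (h).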

Distinguishing between N and SS samples remains a significant challenge in cross-condition diagnosis due to the similarity of their pressure waveforms. As shown in the confusion matrices for task T1 in Figure 10(a) and (c), both DAN and MSSA models demonstrate limited diagnostic accuracy, with frequent misclassification between these two categories. The t-SNE visualizations in Figure 10(b) and (d) reveal that while N and SS samples from the source domains (red and green markers) are well-separated, their counterparts in the target domain (yellow markers) exhibit substantial overlap. This indicates that these two models suffer from overfitting and fail to generalize the subtle interclass differences to the target domain. A likely explanation is that the rigid one-hot labels may encourage the model to learn trivial features that minimize classification loss on the source domains, rather than robust cross-domain representations.

Figure 10.

Diagnostic performance in T1: (a–b) confusion matrix and t-SNE visualization of DAN; (c–d) confusion matrix and t-SNE visualization of MSSA; (e–f) confusion matrix and t-SNE visualization of MSSA + SL. DAN: domain adaptation network; MSSA: multisource subdomain adaptation; SL: soft labeling.

To address this limitation, simulation-driven soft labels are introduced to mitigate overfitting. By assigning nonzero probabilities to the visually similar categories, the model learns a softer decision boundary that is more tolerant of ambiguity between them. The confusion matrix in Figure 10(e) reveals a substantial enhancement in diagnostic performance, particularly in the classification accuracy between the N and SS categories. Furthermore, the t-SNE visualization in Figure 10(f) shows that feature representations of several N-class samples (red circles) now overlap with SS-class clusters (red triangles), which is a result of the intentionally softened class boundaries during training. This strategy enhances feature robustness and generalization, ultimately enabling the MSSA + SL method to achieve clearer separation between N and SS samples in the target domain and a notable improvement in classification accuracy.

Conclusion

The fault diagnosis of axial piston pumps is challenged by operating condition variations that induce distortions in discharge pressure signals, which severely limit the generalization of conventional data-driven approaches. To address these challenges in cross-condition fault diagnosis, we propose a novel MSSA framework integrated with simulation-driven soft labels. The experimental results across eight cross-speed transfer learning tasks demonstrate that leveraging data from diverse working conditions is crucial for enhancing diagnostic performance. Compared to single-source baselines, the multisource approaches yield a substantial average accuracy improvement of at least 18.30 percentage points. Specifically, the dynamic weighting mechanism within the MSSA-based method effectively prioritizes knowledge from the most relevant source domains, increasing diagnostic accuracy by a further 9.73 percentage points over the merged-source DAN. Furthermore, the ablation study verifies the necessity of simulation-driven soft labels derived from a high-fidelity CFD model. By incorporating target-domain prior knowledge into classifier training, the MSSA + SL method can learn more robust features, achieving a superior average accuracy of 99.02%.

While the proposed MSSA + SL framework demonstrates excellent performance for cross-condition fault diagnosis in axial piston pumps, two limitations need to be addressed in future work. First, the high-fidelity CFD simulation used to generate soft labels is computationally expensive; soft-label generation could be accelerated by training surrogate models for data generation. Second, as a domain adaptation method, the MSSA framework still requires unlabeled target-domain data for training. Therefore, multisource domain generalization methods will be investigated to achieve reliable zero-shot fault diagnosis under unknown operating conditions.

Footnotes

ORCID iDs

Qun Chao

Zhongrui Wang

Wentao Wang

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (no. 52475064) and the Special Fund for Science and Technology Innovation Teams of Shanxi Province (no. 202304051001033).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
