State-adaptive liquid neural network for continuous RUL trajectory modelling of harmonic drive

Abstract

Accurate remaining useful life (RUL) prediction of harmonic drives is essential for ensuring the safety of space manipulators. However, existing data-driven methods tend to predict RUL at discrete time steps, which ignores the fact that mechanical equipment degrades continuously. To address this issue, this article takes neural ordinary differential equations as the core framework and proposes a state-adaptive liquid neural network (SALNN), which uses a liquid time-constant mechanism to characterize the RUL trajectory continuously. Specifically, prior to the prediction stage, a novel health indicator (HI) based on a spectral correlation framework is constructed, targeting the fault characteristics of harmonic drives. Subsequently, SALNN uses a state detection module to perceive the severity of device degradation from the HI and embeds this state information into the liquid time constant, which endows the network with state-dependent dynamics. This mechanism enables the network to apply different update rates to distinct degradation states, thereby capturing the physical evolution of the RUL more effectively. Experimental results on harmonic drive datasets and the XJTU-SY dataset demonstrate that the proposed method effectively suppresses noise and significantly outperforms other state-of-the-art methods in prediction accuracy.

Keywords

liquid neural network, neural ordinary differential equation, remaining useful life prediction, harmonic drive, cyclic spectral correlation

Introduction

Advanced mechanical equipment is widely utilized in automobiles, aerospace, energy and power, and other fields. However, in industrial settings, prolonged operation, changing operating conditions, or improper maintenance can cause equipment to suffer damage.¹ Minor damage may lead to shutdowns and production losses, while severe damage may cause major safety accidents resulting in casualties.² Therefore, accurately estimating remaining useful life (RUL) and planning equipment maintenance in advance have become essential prerequisites for reducing the risk of accidents and ensuring stable equipment operation.³

Under the framework of prognostic health management (PHM), RUL prediction methods for mechanical equipment are sorted into model-based and data-driven approaches.⁴ Model-based methods primarily use mathematical knowledge or physical principles to construct models of equipment operation, thereby describing performance degradation. In research on bearing prognostics, many classical models have been proven by previous scholars to be instructive, such as the crack growth model,⁵ the Archard wear model,⁶ and Paris’ law model.⁷ If the failure mechanism can be thoroughly analysed and the relevant parameters obtained, the model can accurately describe the degradation process, enabling RUL prediction. However, this approach has several limitations. The internal damage mechanisms of complex mechanical systems are difficult to analyse, making it challenging to construct appropriate physical models.⁸ Furthermore, model parameters are usually determined from extensive experimental data or finite element simulations, resulting in high acquisition costs.⁹ Consequently, in scenarios characterized by massive data volumes and complex operating conditions, model-based methods face difficulties in further development, while data-driven approaches are gradually becoming mainstream for RUL prediction.¹⁰

In general, data-driven methods for mechanical equipment prognostics are implemented in four steps: data collection and processing, health indicator (HI) construction, health stage segmentation, and RUL prediction. Of these, HI construction and RUL prediction are the crucial steps. HI construction links raw monitoring data to the health status of equipment, while RUL prediction is the ultimate goal of machinery health prognostics.¹¹ Recent review studies have provided a systematic summary of HI-dependent RUL prediction methodologies for rotating machinery, highlighting the central role of HI construction and its relationship to subsequent prognostic modelling.¹² This article focuses on improving data-driven methods, particularly in HI construction and network model design.

Constructing an HI that accurately characterizes the trend of equipment degradation is a prerequisite for estimating RUL, since the RUL prediction problem is essentially a regression task that models equipment degradation using the HI.¹³ Based on the construction method, HIs are categorized into physical health indicators (PHIs) and virtual health indicators (VHIs). PHIs are proposed based on the physical mechanisms of the degradation process. Common PHIs include the root mean square (RMS), the kurtosis, and the indicator of second-order cyclostationarity (ICS2), which describe the state of bearings from the perspectives of energy, impact characteristics, and cyclostationarity, respectively. Entropy-based indicators, such as sample entropy and fuzzy entropy, assess the degree of system disorder based on signal complexity. RMS effectively reflects signal energy and is the most commonly used HI for assessing bearing performance. For example, the RMS has been used as a time-domain feature to capture the bearing degradation trend,¹⁴ and Pan et al.¹⁵ proposed using a relative RMS value, optimized via linear rectification, as the HI for RUL prediction. In addition, mechanical equipment fault signals can manifest as periodic impulses. Thus, impulsiveness and cyclostationarity are regarded as two key attributes for characterizing degradation behaviours,¹⁶ and kurtosis and second-order cyclostationarity are typical indicators of these properties. For instance, kurtosis has been employed as an input feature for a support vector regression model to compare prediction performance.¹⁷ Dong and Chen¹⁸ and Feng et al.¹⁹ utilized the second-order cyclostationarity of vibration signals to construct an HI characterizing the degradation trends of bearings and gears. Entropy, in turn, characterizes equipment degradation by quantifying the system’s disorder. In contrast, VHIs utilize virtual representations to reveal the bearing degradation process. In one study,²⁰ a matrix factorization method was used to iteratively solve a slow-feature HI. Bai et al.²¹ proposed using genetic programming to adaptively fuse features for constructing an HI.

In actual operating environments, monitoring data are often subject to external noise, which causes random fluctuations in HI curves and impairs their ability to characterize degradation trends. Several studies have proposed methods to reduce the impact of noise and random fluctuations. Under the spectral correlation framework, Ni et al.²² introduced the Linear Rectification-Wasserstein Distance Spectral Correlation (LR-WDSC) indicator, which combines the Wasserstein distance with linear rectification to achieve high-precision bearing RUL prediction. In another study, Ni et al.²³ constructed a WDgram multi-scale feature map and used a multi-objective grasshopper optimization algorithm to generate the Composite Multi-scale Wasserstein Distance (CMSWD). Kuzio et al.²⁴ performed local damage detection on bearing vibration signals in the presence of non-Gaussian noise, then constructed an HI by comparing the distance between the informative and non-informative bands in the cyclic spectral coherence map. Xu et al.²⁵ introduced a bearing HI derived from the moving-average cross-correlation coefficient of the power spectral density. Cohen et al.²⁶ developed a multi-slice dynamic model for helical gears with tooth breakage and proposed a novel spectral-energy HI, refuting the assumption that fault severity correlates with damage size. Inspired by these studies, this article proposes a novel PHI that reflects the energy of vibration signals.

In data-driven equipment prognostics, the selection of neural networks also plays a vital role in prediction accuracy, generalization ability, and engineering applicability of RUL prediction. Commonly used neural networks include gated recurrent unit (GRU), long short-term memory (LSTM), graph neural networks (GNNs), and Transformers. In terms of feature extraction, GRUs and LSTMs are biased toward extracting temporal features. Li et al.²⁷ combined Convolutional Neural Network (CNN) feature fusion with GRU to propose an Integrated Deep Multi-scale Feature Fusion Network, which utilizes the Mish activation function to achieve high-precision lifetime prediction on the C-MAPSS dataset. Wang et al.²⁸ proposed a Gated Graph Convolutional Network that integrates multi-sensor fusion for RUL prediction. Xu et al.²⁹ proposed a multi-resolution LSTM for aero-engine RUL prediction and addressed the complex operating conditions of aero-engines. Shi et al.³⁰ proposed a lightweight model based on exponential smoothing and an attention-enhanced LSTM, addressing the issues of attention mechanisms losing temporal information and the complex model deployment. In contrast, GNNs and Transformers are suited to extracting spatial features. Wang et al.³¹ constructed a spatio-temporal graph via dynamic time warping and proposed a Graph Feature-based Graph Convolutional Attention Network to address the challenge of predicting aero-engine lifespan with partially missing multi-sensor data. Cai et al.³² introduced a Knowledge-Embedded Spatio-Temporal Graph Convolutional Network that utilizes prior knowledge to model sensor spatial interactions. Xu et al.³³ developed a spatiotemporal hybrid Transformer variant to address RUL prediction challenges involving multiple operating conditions and high-dimensional multi-sensor features.

Current data-driven methods for equipment RUL prediction primarily improve prediction accuracy by modifying network architectures or refining optimization strategies, with research focusing on spatio-temporal representations, transfer learning, and continual learning. Wang et al.³⁴ proposed a Dual-View Graph Transformer to address the challenges of feature fusion arising from complex spatiotemporal dependencies in RUL prediction. Wang et al.³⁵ combined multi-task learning, dual-level adversarial transfer learning, and multi-level attention to predict the RUL of CNC milling tools in parallel. Qian et al.³⁶ proposed a Dimension-Mismatched Adversarial Network for rolling bearing RUL transfer prediction under cross-domain scenarios with mismatched input dimensions. Zhou et al.³⁷ proposed a Knowledge Library Network for non-exemplar incremental RUL prediction to achieve accurate prediction across sequential tasks while alleviating catastrophic forgetting. However, such methods share a limitation: their predictive processes are disconnected from physical degradation laws and fail to account for the underlying physical dynamics of equipment from health to failure. For example, the degradation process of equipment such as bearings and engines has inherent “irreversible and progressive” properties. Yet existing methods mostly adopt a point-prediction paradigm, producing discrete outputs that often yield fluctuating prediction trajectories that violate physical laws.³⁸ Predicting RUL from a trajectory modelling perspective can effectively address this limitation. The equipment degradation process is essentially a continuous, smooth evolution of system states. Characterizing temporal dependencies by predicting state derivatives between time steps enables the establishment of RUL trajectories that conform to physical laws and preserve causal relationships. As abstract descriptions of dynamic systems, neural ordinary differential equations (NODEs) are suitable for such trajectory modelling. Their recursive property can simulate a degradation process in which the current state depends on historical evolution, and they can integrate physical priors through explicit dynamic equations.³⁹

As a result, NODEs have attracted significant attention for RUL prediction. On the one hand, these models can simulate continuous degradation trajectories using numerical solvers, avoiding the fluctuation issues inherent to point prediction. Zhou et al.³⁸ proposed a Dynamic Governing Network that employs a parameterized NODE as the governing equation for the RUL trajectory, thereby aligning the output trajectories closely with actual degradation by constraining the degradation rate. On the other hand, NODEs grant these models stronger generalization capabilities. Hu et al.⁴⁰ used the time-invariance property of NODEs to learn consistent features across different degradation stages, effectively mitigating distribution shifts in single-source domain generalization scenarios. Hasani et al.⁴¹ introduced the liquid time-constants (LTCs) model, which uses time-varying time constants in NODEs and can therefore simulate dynamic changes in sensitivity during equipment degradation. This model has demonstrated superior stability and accuracy compared to LSTM and traditional NODEs in temporal modelling tasks. However, because the LTCs network is excessively sensitive and lacks interpretability in PHM, this article proposes a novel LTC network and applies it to RUL prediction.

The framework is shown in Figure 1, and the innovations and contributions of this article are reflected in the following aspects:

(1) The Targeted Cyclic Energy Indicator (TCEI) is constructed based on second-order cyclostationary properties. Through spectral correlation analysis, it selectively extracts energy at specific cyclic frequencies and fault-related frequency bands. By using the mean fault-symptomatic peaks (MFP) to screen high-energy fault bands, it suppresses noise interference and prevents misjudgment of irrelevant frequency bands. Additionally, exponential smoothing is applied to reduce random fluctuations.

(2) This article proposes a state-adaptive liquid neural network (SALNN). To address the dynamic differences between the healthy and degraded stages of mechanical equipment, a dual-time-constant system is designed. Guided by the health probability output from a state discriminator, it adaptively controls the initial values and fluctuation ranges of the time constants. This allows the model to capture long-term trends during the healthy stage and respond to short-term fluctuations in the degradation stage, resolving the oversensitivity and fluctuation issues of traditional LTC networks.

(3) Extensive experiments are conducted on harmonic drive and bearing datasets. The results validate the performance advantages of the proposed indicator and model.

Figure 1.

Framework of the proposed method.

The remainder of this article is organized as follows. Section “Construction of the novel HI” presents the detailed methodology for constructing the HI from a signal processing perspective. Section “The proposed prognostic approach” elaborates on the technical details of the proposed prediction methodology. Section “Case 1: Harmonic drive dataset” and section “Case 2: XJTU-SY dataset” present the experimental validation and performance evaluation of the proposed method on two distinct datasets. Section “Conclusion” summarizes the research work presented in this article.

Construction of the novel HI

This section conducts a deep study of second-order cyclostationary indicators, targeting the fault characteristic frequency bands of vibration signals to construct an energy indicator that reflects potential degradation features of rotating machinery. The following describes the methodology for building a novel second-order cyclostationary energy indicator based on spectral correlation analysis and frequency band selection criteria.

Spectral correlation analysis

Vibration signals generated by rotating machinery faults, such as gear tooth surface wear and bearing pitting, are typically considered a chain of periodic impulses. However, when operating, the arrival intervals between two consecutive fault impacts exhibit random slips, which transform the signal from an ideal periodic signal into a periodically time-varying signal. This property has inspired extensive research into first- and second-order cyclostationary signals in industrial settings. The rotating machinery fault signals belong to the second-order cyclostationary signals, manifested as periodic fluctuations of their second-order autocorrelation functions in time⁴²:

R_{xx} (t, τ) = E \{x (t - \frac{τ}{2}) x^{*} (t + \frac{τ}{2})\}

(1)

R_{xx} (t, τ) = R_{xx} (t + T, τ)

(2)

where $R_{xx} (t, τ)$ is the time-varying autocorrelation function of $x (t)$, $E \{\cdot\}$ denotes the mathematical expectation operator, $t$ is time, $τ$ is the time lag, $T$ is the autocorrelation period, and $x^{*}$ denotes the complex conjugate of $x$.

Since the spectral correlation function can characterize the information contained in second-order cyclostationary signals, spectral correlation has become one of the most widely used methods among various second-order cyclostationary analysis techniques. The spectral correlation function below can be obtained by performing Fourier transforms on the instantaneous autocorrelation function in both the time and time-lag dimensions:

S_{x} (α, f) = \int \int R_{xx} (t, τ) e^{- j 2 π (α t + f τ)} d t d τ

(3)

where $α$ is the cyclic frequency, which indicates the modulation frequency and is obtained by transforming over time $t$, and $f$ is the spectral frequency, which indicates the carrier frequency and is obtained by transforming over the time lag $τ$.²²
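As a rough illustration of Equation (3), the spectral correlation of sampled data can be estimated with an averaged cyclic periodogram: two copies of the signal are shifted by $\pm α / 2$ in frequency, and their cross-spectrum is averaged over overlapping segments. The sketch below shows one common estimator and is not necessarily the one used in this article. The function name `spectral_correlation`, the Hann window, the 50% overlap, and the segment length are all assumptions, and the sign convention for $α$ differs between references.

```python
import numpy as np


def spectral_correlation(x, fs, alphas, nperseg=256):
    """Averaged cyclic periodogram estimate of S_x(alpha, f).

    Returns the FFT frequency grid (Hz) and a complex array of shape
    (len(alphas), nperseg), one row per requested cyclic frequency.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size) / fs  # absolute time keeps segment phases aligned
    win = np.hanning(nperseg)
    starts = range(0, x.size - nperseg + 1, nperseg // 2)  # 50% overlap
    norm = len(starts) * fs * np.sum(win ** 2)
    sc = np.zeros((len(alphas), nperseg), dtype=complex)
    for row, alpha in enumerate(alphas):
        # Shift the spectrum by +alpha/2 and -alpha/2, respectively.
        u = x * np.exp(-1j * np.pi * alpha * t)
        v = x * np.exp(+1j * np.pi * alpha * t)
        for s in starts:
            U = np.fft.fft(win * u[s:s + nperseg])
            V = np.fft.fft(win * v[s:s + nperseg])
            sc[row] += U * np.conj(V)  # X(f + a/2) * conj(X(f - a/2))
    return np.fft.fftfreq(nperseg, d=1.0 / fs), sc / norm
```

For amplitude-modulated noise, the row evaluated at the modulation frequency has a much larger magnitude than rows at unrelated cyclic frequencies, which is the property that the HI construction exploits.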

Fault information estimation

To screen for high-energy fault characteristic frequency bands in the spectral correlation map, this article proposes the MFP indicator. The original MFP-to-noise (MFPN) indicator uses a “peak-to-noise ratio” to evaluate fault information in frequency bands. Although it can effectively suppress noise interference, it tends to misjudge high-energy bands as invalid under intense background noise or when the overall resonance energy is high. For this engineering context, this article modifies MFPN by removing the average noise term from the original denominator and retaining only the numerator. Finally, for each cyclic frequency slice, the spectral correlation amplitudes are accumulated according to preset fault frequency orders, yielding a curve that reflects the accumulation of fault energy. The larger the amplitude of the curve, the stronger the resonance energy caused by fault impacts in the corresponding frequency band. High-energy fault frequency bands can thus be obtained without additional penalty terms. The formula for the fault feature peak indicator is as follows⁴³:

MFP (f_{n}) = \sum_{k = 1}^{K} \max \{\hat{S} ({\hat{α}}_{m}, f_{n}), {\hat{α}}_{m} \in [k \times α_{fault} - Δ α, k \times α_{fault} + Δ α]\}

(4)

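where $f_{n}$ denotes the spectral frequency, $K$ is the number of fault frequency orders ($K = 2$ in this article), $α_{fault}$ is the fault frequency, ${\hat{α}}_{m}$ is a discrete cyclic frequency inside the search window, and $Δ α$ is the half-width of that window ($Δ α = 0.01 α_{fault}$ in this article).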
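The accumulation in Equation (4) can be sketched as follows, assuming the spectral correlation magnitude is already available as a matrix sampled on a cyclic frequency grid. The function name `mfp_curve`, the argument names, and the array layout (rows indexed by cyclic frequency, columns by spectral frequency) are illustrative assumptions rather than the implementation used in this article.

```python
import numpy as np


def mfp_curve(sc_mag, alphas, alpha_fault, K=2, rel_tol=0.01):
    """MFP value for every spectral frequency bin.

    sc_mag : (n_alpha, n_freq) array of spectral correlation magnitudes
    alphas : (n_alpha,) cyclic frequency grid matching the rows of sc_mag
    """
    sc_mag = np.asarray(sc_mag, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    delta = rel_tol * alpha_fault  # half-width of the search window
    mfp = np.zeros(sc_mag.shape[1])
    for k in range(1, K + 1):
        window = np.abs(alphas - k * alpha_fault) <= delta
        if not window.any():
            raise ValueError(f"no cyclic frequency bin near harmonic {k}")
        # Largest peak inside the window, taken separately for each f_n.
        mfp += sc_mag[window].max(axis=0)
    return mfp
```

Spectral frequency bins with large values of the returned curve are the candidates for the target fault band.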

Construction of TCEI

To construct the novel HI, the acquired vibration signals are subjected to spectral correlation analysis. The corresponding fault characteristic frequencies are calculated according to the formulas. Subsequently, the MFP indicator is employed to screen the target frequency bands. Finally, the fault characteristic frequencies, the screened fault frequency bands, and the spectral correlation map are incorporated into the energy integration formula to calculate the TCEI for that specific time step. By computing this sequentially for each time step, the full-life cycle TCEI indicator is obtained.

TCEI (t) = \int_{f_{1}}^{f_{2}} \int_{α_{0} - Δ α}^{α_{0} + Δ α} S (α, f) d α d f

(5)

where $t$ is time step, $[f_{1}, f_{2}]$ denotes the selected frequency band, $α_{0}$ indicates the fault frequency ( $Δ α = 0.1 α_{0}$ is set in this article).
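On sampled data, the double integral in Equation (5) reduces to a sum over the selected rectangle of the spectral correlation map. The sketch below assumes uniformly spaced grids and a simple Riemann sum. The function name `tcei_value` and its arguments are hypothetical, and a different quadrature rule could equally be used.

```python
import numpy as np


def tcei_value(sc_mag, alphas, freqs, alpha0, band, rel_tol=0.1):
    """Riemann-sum approximation of the band-limited energy in Equation (5).

    sc_mag : (n_alpha, n_freq) spectral correlation magnitudes
    band   : (f1, f2) spectral frequency band selected beforehand
    """
    sc_mag = np.asarray(sc_mag, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    f1, f2 = band
    a_mask = np.abs(alphas - alpha0) <= rel_tol * alpha0
    f_mask = (freqs >= f1) & (freqs <= f2)
    d_alpha = alphas[1] - alphas[0]  # uniform grid spacing is assumed
    d_f = freqs[1] - freqs[0]
    return float(sc_mag[np.ix_(a_mask, f_mask)].sum() * d_alpha * d_f)
```

Evaluating this function once per measurement gives one TCEI sample per time step.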

From the perspective of predicting degradation trajectories, the stochastic fluctuations within the HI convey no meaningful information about equipment degradation. Instead, they can make the model unnecessarily complicated during training, as additional parameters are spent fitting the noise, which ultimately compromises prediction accuracy and robustness. Therefore, exponential smoothing is employed to mitigate fluctuations in the HI, facilitating better extraction of the underlying degradation trend. Algorithm 1 outlines the entire process, and the smoothing formula for TCEI is as follows³⁰:

TCEI_{smooth} (t) = \frac{\sum_{i = 0}^{t} w_{i} \cdot TCEI (t - i)}{\sum_{i = 0}^{t} w_{i}}

(6)

w_{i} = {(1 - γ)}^{i}

(7)

γ = \frac{2}{1 + s}

(8)

where the decay factor $γ$ adjusts the smoothness and $s$ is the smoothing coefficient.
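Equations (6) to (8) can be evaluated in a single pass, because the numerator and the denominator each obey a one-step recursion. The sketch below is a minimal illustration, and the function name `smooth_tcei` is hypothetical. With these weights the result coincides with a span-based, bias-adjusted exponentially weighted mean such as `Series.ewm(span=s, adjust=True).mean()` in pandas.

```python
import numpy as np


def smooth_tcei(tcei, s):
    """Exponentially weighted smoothing with weights w_i = (1 - gamma)**i."""
    gamma = 2.0 / (1.0 + s)
    tcei = np.asarray(tcei, dtype=float)
    out = np.empty_like(tcei)
    num = 0.0  # running sum of w_i * TCEI(t - i)
    den = 0.0  # running sum of w_i
    for t, value in enumerate(tcei):
        num = value + (1.0 - gamma) * num
        den = 1.0 + (1.0 - gamma) * den
        out[t] = num / den
    return out
```

A larger $s$ gives a smaller $γ$, so older samples keep more weight and the curve becomes smoother.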

Algorithm 1.

Construction of the TCEI.

Inputs: The mechanical equipment run-to-failure dataset $\{x_{i}\}_{i = 1}^{N}$ collected at $N$ moments

Outputs: $N$ TCEIs of the run-to-failure dataset

Initialize: Calculate the fault characteristic frequency $α$ of the mechanical equipment

Procedure:

1: Calculate the autocorrelation function and the spectral correlation (SC) of $x_{i}$

2: Slice the SC of $x_{i}$ along the $α$ axis to obtain a cyclic power spectrum, denoted as $S_{x_{i}} (f)$

3: Use MFP to select the target fault frequency band $[f_{a}, f_{b}]$ of $S_{x_{i}} (f)$ (the band $[f_{1}, f_{2}]$ in Equation (5))

4: Obtain $TCEI_{i}$ by evaluating Equation (5)

5: Repeat steps 1 to 4 for $i = 1$ to $i = N$ to calculate the $N$ TCEIs

6: Construct $TCEI_{smooth}$ by applying the exponentially weighted smoothing in Equation (6) to the $N$ TCEIs obtained in step 5

The proposed prognostic approach

This section introduces a deep learning model designed for RUL prediction. The model’s predictive workflow consists of three components: state detection, feature extraction, and RUL prediction. The state discriminator first analyses the HI sequence. Its detection results directly determine the time constants, which in turn govern the update rates of the subsequent hidden states. An improved NODE network is then employed to iteratively extract latent representations of the degradation features from the hidden states. Finally, the iterative results are fed into a fully connected layer to produce the final RUL prediction.

Neural ordinary differential equation

In previous studies, most researchers treated RUL prediction as a time-series modelling task and used traditional time-series models, such as GRUs and LSTMs. However, mechanical equipment degradation is a continuous, dynamically evolving process, while these time-series methods are discrete-time models that record historical information through jumps across multiple discrete time steps. They tend to suffer from gradient explosion in long sequences, and their temporal correlations are unstable, making it challenging to capture the essence of degradation dynamics. In recent years, a neural network framework based on ordinary differential equations (ODEs) has been proposed. It defines the hidden state of a neural network as the solution to an ODE, modelling the system’s evolution in the time domain. This network architecture is not only more suitable for time-series modelling tasks but also highly consistent with the physical meaning of mechanical equipment degradation. The continuous dynamics of the hidden units are parameterized by an ODE as follows³⁹:

\frac{d h (t)}{d t} = f (h (t), t, θ), \quad h (t_{0}) = h_{0}

(9)

where $h (t)$ denotes the hidden state, $t$ is time, and $θ$ denotes the network parameters. Given the dynamics function $f$ and the initial state $h_{0}$, the differential equation can be integrated numerically over the interval $[t_{0}, T]$, with the output $h (T)$ being the final value. The entire integration process is performed by an adaptive ODE solver that repeatedly evaluates the dynamics function.³⁹
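To make the integration in Equation (9) concrete, the sketch below advances a hidden state with a fixed-step fourth-order Runge-Kutta scheme. This is a deliberately simplified stand-in for the adaptive solver mentioned above. The names `odeint_rk4` and `mlp_dynamics`, the `tanh` dynamics, and the step count are illustrative assumptions.

```python
import numpy as np


def mlp_dynamics(h, t, theta):
    """Toy dynamics function f(h, t, theta): a single tanh layer."""
    W, b = theta
    return np.tanh(W @ h + b)


def odeint_rk4(f, h0, t0, t1, theta, n_steps=100):
    """Integrate dh/dt = f(h, t, theta) from t0 to t1 with fixed RK4 steps."""
    h = np.asarray(h0, dtype=float)
    t = float(t0)
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        k1 = f(h, t, theta)
        k2 = f(h + 0.5 * dt * k1, t + 0.5 * dt, theta)
        k3 = f(h + 0.5 * dt * k2, t + 0.5 * dt, theta)
        k4 = f(h + dt * k3, t + dt, theta)
        h = h + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += dt
    return h
```

For example, `odeint_rk4(mlp_dynamics, np.zeros(3), 0.0, 1.0, (W, b))` with a 3-by-3 matrix `W` and a length-3 vector `b` returns the hidden state $h (T)$ at $T = 1$.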

In light of this, several variants of NODEs have emerged in the field of artificial intelligence. Inspired by biological nervous systems, Hasani et al.⁴¹ proposed the LTC network that combines a presynaptic neuron current architecture with a neuron membrane potential equation:

\frac{d v}{d t} = - g_{l} v (t) + S (t)

(10)

S (t) = f (v (t), I (t)) (A - v (t))

(11)

where $v (t)$ is the membrane potential of the neuron; $g_{l}$ is the leakage conductance, which governs the natural decay of the membrane potential; $S (t)$ is the sum of all presynaptic input currents; $f (\cdot)$ is a sigmoid nonlinearity that depends on the presynaptic neurons and the external input $I (t)$ and simulates the probability of neurotransmitter release; and $(A - v (t))$ is the driving potential difference.

Hasani et al. proposed the dynamics of LTCs represented by the following equation⁴¹:

\frac{d h (t)}{d t} = - [\frac{1}{τ} + f (h (t), x (t), t, θ)] h (t) + f (h (t), x (t), t, θ) A

(12)

where $x (t)$ denotes the input, $τ$ represents time-constant, and $A$ is the anchor parameter. The neural network $f$ drives the variation of hidden state $h (t)$ and integrates into the system as time constant $τ_{sys} = \frac{1}{1 / τ + f (\cdot)}$ depending on input and state. This liquid property enables a time-varying constant to autonomously adjust its response speed and dynamic behaviour in real time. This inherent adaptability makes it suitable for diverse and complex time-series modelling tasks. Hasani et al.⁴¹ further showed that, for any bounded input, the hidden state and time constant of the LTCs model remain within finite ranges, which is essential for reliable time-series modelling. Experiments also show that the LTCs model is highly expressive within the family of continuous-time models. This means it can generate more complex and richer dynamic trajectories.

To improve computational efficiency, Hasani et al. developed a custom hybrid solver rather than using general-purpose solvers. This solver combines the advantages of the explicit and implicit Euler methods. The solver expression is as follows:

h (t + Δ t) = \frac{h (t) + Δ t \cdot f (h (t), x (t), t, θ) A}{1 + Δ t (1 / τ + f (h (t), x (t), t, θ))}

(13)

where $Δ t$ denotes integration step size.
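For intuition, Equation (13) can be recovered from Equation (12) with one semi-implicit Euler step, in which the nonlinearity $f$ is evaluated at the current state (explicit) while the linear occurrence of $h$ is evaluated at the next state (implicit). The short derivation below is a sketch along those lines, writing $f$ for $f (h (t), x (t), t, θ)$:

```latex
\frac{h (t + Δ t) - h (t)}{Δ t} = - \left[\frac{1}{τ} + f\right] h (t + Δ t) + f A

h (t + Δ t) \left[1 + Δ t \left(\frac{1}{τ} + f\right)\right] = h (t) + Δ t \cdot f A

h (t + Δ t) = \frac{h (t) + Δ t \cdot f A}{1 + Δ t (1 / τ + f)}
```

Because $τ > 0$ and the sigmoid output $f$ is positive, the denominator is at least 1, so the contribution of $h (t)$ to the next state is never amplified.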

State-adaptive liquid neural network

This article introduces LTC networks into RUL prediction and improves them to account for the significant difference in dynamic behaviour between the healthy and degraded states of mechanical equipment. A novel ODE network is designed that combines a dual-time-constant system with a state discriminator.

The network architecture and RUL prediction process are illustrated in Figure 1. The HI sequence is fed into both the state discriminator and the LTCs cell units simultaneously. The state discriminator analyses the sequence to output a health-state probability, which can then be used to regulate the time constants. The LTCs cell units utilize these time constants to establish ODEs. A hybrid-form solver performs numerical solution, and the output is passed through a LayerNorm layer to ensure stability during training. The entire network adopts a bidirectional extraction architecture. Finally, the output layer maps the ultimate hidden state to the predicted RUL. This network design balances physical interpretability with computational efficiency, preserving the advantages of trajectory modelling while enhancing the capability to capture degradation laws.

The dynamic behaviour of time constants directly determines the accuracy of ODE models in capturing degradation. To achieve precise state recognition, a deep neural network discriminator is constructed. This discriminator adopts a multi-layer perceptron architecture, enhances classification training stability through batch normalization layers, and outputs a continuous probability estimate of the equipment health state via a sigmoid activation function. The process of state classification is shown in Figure 2 and the following formula can represent the structure of the discriminator:

h_{1} = BatchNorm (ReLU ({Fc}_{1} (x_{t})))

(14)

h_{2} = BatchNorm (ReLU ({Fc}_{2} (h_{1})))

(15)

h_{3} = ReLU ({Fc}_{3} (h_{2}))

(16)

p_{t} = σ ({Fc}_{4} (h_{3}))

(17)

where $x_{t}$ is input signal, ${Fc}_{1}$ , ${Fc}_{2}$ , ${Fc}_{3}$ , and ${Fc}_{4}$ denote fully connected layers, $h_{1}$ , $h_{2}$ , and $h_{3}$ are latent features, $p_{t}$ indicates health state probability.
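The forward pass in Equations (14) to (17) can be sketched in NumPy as follows. The layer widths, the random initialization, and the simplified batch normalization (batch statistics only, without learnable scale and shift) are assumptions made for illustration and are not taken from this article. The helper names `init_discriminator` and `health_probability` are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def batch_norm(z, eps=1e-5):
    """Normalize each feature over the batch axis (no learnable affine)."""
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)


def init_discriminator(n_in, widths=(32, 16, 8), seed=0):
    """Random weights and zero biases for Fc_1 ... Fc_4 (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    sizes = (n_in, *widths, 1)
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def health_probability(x, layers):
    """Map a batch of HI windows x, shape (batch, n_in), to p_t in (0, 1)."""
    (w1, b1), (w2, b2), (w3, b3), (w4, b4) = layers
    h1 = batch_norm(relu(x @ w1 + b1))   # Equation (14)
    h2 = batch_norm(relu(h1 @ w2 + b2))  # Equation (15)
    h3 = relu(h2 @ w3 + b3)              # Equation (16)
    return sigmoid(h3 @ w4 + b4)         # Equation (17)
```

The returned probability is what the time-constant switch consumes: values of at least 0.5 are treated as healthy and smaller values as degraded.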

Figure 2.

Flowchart of the state classification.

From a mechanical degradation perspective, the time-constant-related term can be viewed as a damping term that regulates the rate of equipment degradation. Based on the state-aware mechanism, this article further enhances the model’s adaptability to multi-stage degradation processes by designing a differentiated time-constant system for two typical stages. The two typical stages are healthy and degraded, with specific differences reflected in two aspects: initial value setting and fluctuation range. In the healthy stage, mechanical equipment operates stably with a slow degradation rate and gentle changes. Therefore, a larger initial value of the time constant and a narrower fluctuation range are selected. This setup can reduce model sensitivity to short-term minor noise, focus on capturing long-term, stable evolutionary trends of the healthy state, and avoid misjudging the equipment degradation process due to local fluctuations. In the degraded stage, mechanical equipment performance declines rapidly, and fluctuations in operating conditions intensify. Thus, a smaller initial value of the time constant and a wider fluctuation range are adopted. This configuration enhances the model’s responsiveness to short-term degradation features, accurately captures key dynamic information, such as sudden changes in equipment performance and aggravated faults, and ensures real-time tracking of the degradation process.

The state-adaptive time-constant differential equation is as follows:

\frac{d h (t)}{d t} = - [\frac{1}{τ_{0}} + f_{τ} (p_{t}, t, τ_{h}, τ_{d})] h (t) + f (h (t), x (t), t, θ) A

(18)

f_{τ} (p_{t}, t, τ_{h}, τ_{d}) = \begin{cases} \frac{C}{τ_{h}}, & p_{t} \geq 0.5 \\ \frac{C}{τ_{d}}, & p_{t} < 0.5 \end{cases}

(19)

where $τ_{0}$ is the set initial value of the time constant, $τ_{h}$ and $τ_{d}$ are two groups of learnable time constants used to evaluate different states ( $τ_{h}$ denotes health state, $τ_{d}$ denotes degradation state), $C$ is a constant parameter.
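The switch in Equation (19) and its effect on the decay term of Equation (18) can be illustrated numerically. By analogy with the system time constant of the LTC model, the sketch below also reports an effective time constant $1 / (1 / τ_{0} + f_{τ})$. That quantity is an interpretation added here for illustration, and the function names and numerical values are hypothetical.

```python
import numpy as np


def f_tau(p_t, tau_h, tau_d, C=1.0):
    """State-dependent decay term: C / tau_h if healthy, else C / tau_d."""
    return np.where(np.asarray(p_t) >= 0.5, C / tau_h, C / tau_d)


def effective_time_constant(p_t, tau0, tau_h, tau_d, C=1.0):
    """Time constant implied by the decay coefficient 1 / tau0 + f_tau."""
    return 1.0 / (1.0 / tau0 + f_tau(p_t, tau_h, tau_d, C))


# Hypothetical values: a slow healthy regime and a fast degraded regime.
healthy = effective_time_constant(0.9, tau0=5.0, tau_h=10.0, tau_d=1.0)
degraded = effective_time_constant(0.1, tau0=5.0, tau_h=10.0, tau_d=1.0)
```

With these example values the healthy regime has the larger effective time constant, so the hidden state reacts more slowly there than in the degraded regime.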

Although the hybrid solver adopted in this article is based on a core logic of implicit solvers, it ultimately yields a closed-form analytical solution. This characteristic eliminates the need for complex, time-consuming, iterative operations at each numerical computation step, placing its computational load in the same order of magnitude as that of the Explicit Euler method. From a numerical stability perspective, the implicit formulation inherently yields a superior solution to the differential equation. In contrast to explicit methods, which are prone to numerical oscillations due to step-size sensitivity, this hybrid solver allows SALNN to adopt larger computational step sizes during simulation. This effectively avoids system divergence caused by inappropriate step-size settings and reduces solution risk during model training and prediction. Considering the solver’s mathematical structure, the denominator term functions similarly to an adaptive stabilizer. When the equipment degradation system undergoes sharp dynamic changes, such as sudden shifts in degradation rate or abrupt increases in fault characteristics, the denominator term responds promptly by increasing. In turn, this reduces the magnitude of hidden-state updates, preventing abnormal feature extraction caused by severe system fluctuations. This structural design also enhances the robustness of the model computational process and ensures stability in modelling complex degradation processes.

The differential solver follows the hybrid solver proposed for the LTC network. Algorithm 2 gives the iterative solution procedure of this ODE solver.⁴¹ The update formula is as follows:

h (t + Δ t) = \frac{h (t) + Δ t \cdot f (h (t), x (t), t, θ) A}{1 + Δ t (1 / τ + f_{τ} (p_{t}, t, τ_{h}, τ_{d}))}

(20)
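The closed form in equation (20) can be recovered from equation (18) with a single semi-implicit Euler step, in which the decay term is evaluated at the new state and the drive term at the current state. The derivation below is a sketch under that assumption; it writes $\tau$ for the base time constant, abbreviates $f = f(h(t), x(t), t, \theta)$, and abbreviates $f_{\tau} = f_{\tau}(p_{t}, t, \tau_{h}, \tau_{d})$.

```latex
\begin{aligned}
h(t+\Delta t) &= h(t) + \Delta t \left[ -\left( \frac{1}{\tau} + f_{\tau} \right) h(t+\Delta t) + f \, A \right] \\
h(t+\Delta t) \left[ 1 + \Delta t \left( \frac{1}{\tau} + f_{\tau} \right) \right] &= h(t) + \Delta t \, f \, A \\
h(t+\Delta t) &= \frac{h(t) + \Delta t \, f \, A}{1 + \Delta t \left( \frac{1}{\tau} + f_{\tau} \right)}
\end{aligned}
```

Because the unknown $h(t+\Delta t)$ enters only linearly, it can be isolated algebraically, which is why no iterative solve is needed.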

Algorithm 2.

SALNN update by fused ODE solver.

Parameters: θ = {τ^{(N \times 1)} = time-constant, L = number of unfolding steps, Δ t = step size}
Inputs: x(t) of length T, x(0)
Outputs: Next SALNN neural state h_{t + Δ t}
Function: FusedStep (h (t), x (t), Δ t, θ)
    h (t + Δ t) = [h (t) + Δ t \cdot f (h (t), x (t), t, θ) ⊙ A] / [1 + Δ t (1 / τ + f_{τ} (p_{t}, t, τ_{h}, τ_{d}))]
    (f (\cdot) and all divisions are applied element-wise; ⊙ is the Hadamard product)
end Function
h_{t + Δ t} = h (t)
For i = 1 … L do
    h_{t + Δ t} = FusedStep (h (t), x (t), Δ t, θ)
end for
return h_{t + Δ t}
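A minimal Python sketch of Algorithm 2 is given below for illustration. It assumes that the unfolding loop feeds each updated state back into the next fused step, that the state-dependent term $f_{τ}$ has already been evaluated as a scalar, and that `toy_f` is a hypothetical stand-in for the learned network $f$; none of these names come from the original implementation.

```python
import math


def fused_step(h, x, dt, tau, a, f_tau, f):
    """One fused update of Eq. (20), applied element-wise over the hidden units."""
    drive = f(h, x)
    return [
        (h_i + dt * d_i * a_i) / (1.0 + dt * (1.0 / tau_i + f_tau))
        for h_i, d_i, a_i, tau_i in zip(h, drive, a, tau)
    ]


def salnn_update(h, x, dt, tau, a, f_tau, f, n_unfold):
    """Apply the fused step n_unfold (L) times for one input sample."""
    for _ in range(n_unfold):
        h = fused_step(h, x, dt, tau, a, f_tau, f)
    return h


def toy_f(h, x):
    """Hypothetical stand-in for the learned nonlinearity f(h, x, t, theta)."""
    return [math.tanh(h_i + x) for h_i in h]


h_next = salnn_update(
    h=[0.0, 0.5], x=0.3, dt=0.1,
    tau=[1.0, 2.0], a=[1.0, -1.0],
    f_tau=0.25,  # illustrative value of C / tau_h for a sample judged healthy
    f=toy_f, n_unfold=6,
)
print(h_next)
```

Passing $f_{τ}$ in as an argument keeps the solver independent of the state discriminator, so the same routine serves both the healthy and the degraded branch.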

RUL prediction and loss function

To ensure stable output from the RUL prediction layer, this article introduces normalization into model training: the hidden-state output from each iteration of the ODE is fed into a LayerNorm layer. Through standardization, gradient fluctuations during training are suppressed, and the instability of model training caused by deviations in the feature distribution is avoided.

In addition, to fully explore the temporal correlation in equipment degradation and to prevent the omission of historical or future degradation information due to one-way feature extraction, this article embeds the entire ODE neural network framework within a bidirectional extraction architecture. This architecture consists of two parallel feature extraction branches, a forward branch and a backward branch. The forward branch iterates from the start to the end, capturing features during the “healthy-degraded” process; the backward branch iterates from the end to the start, mining degradation-traceability information implied by the “faulty-healthy” process, such as the retrospective capture of early weak fault features. The hidden states of the two bidirectional branches are finally fused through feature concatenation to form comprehensive degradation features that contain both forward and reverse temporal information. These features are then fed into a fully connected layer, which performs dimensionality reduction and a nonlinear mapping to produce an RUL prediction. The feature fusion process of the bidirectional architecture and the mathematical expression of RUL prediction are as follows:

h_{fusion} = Concat ({ODE}_{forward} (x_{1 : T}), {ODE}_{backward} (x_{T : 1}))

(21)

h_{norm} = LayerNorm (h_{fusion})

(22)

h_{drop} = Dropout (ReLU (W_{1} h_{norm} + b_{1}), p)

(23)

y_{t}^{*} = W_{2} h_{drop} + b_{2}

(24)

where $W_{1}$ and $W_{2}$ represent learnable weight matrices, $b_{1}$ and $b_{2}$ are bias terms, $p$ is the dropout rate of the dropout layer, and $y_{t}^{*}$ denotes the final prediction result.
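The mapping in equations (21) to (24) can be sketched in plain Python as follows. The sketch assumes inference mode, in which dropout acts as the identity, and a LayerNorm without learnable scale and shift; the weights and hidden states are toy values chosen only to make the example runnable.

```python
import math


def layer_norm(v, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def linear(weights, bias, v):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def predict_rul(h_forward, h_backward, w1, b1, w2, b2):
    h_fusion = h_forward + h_backward        # Eq. (21): concatenate both branches
    h_norm = layer_norm(h_fusion)            # Eq. (22)
    # Eq. (23): ReLU after the first dense layer; dropout is the identity at inference.
    h_drop = [max(0.0, z) for z in linear(w1, b1, h_norm)]
    return linear(w2, b2, h_drop)[0]         # Eq. (24): scalar RUL estimate


# Toy example: two hidden units per branch, two units in the first dense layer.
w1 = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]]
b1 = [0.1, 0.0]
w2 = [[0.7, -0.3]]
b2 = [0.2]
print(predict_rul([1.0, 3.0], [2.0, 4.0], w1, b1, w2, b2))
```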

The SALNN proposed in this article is essentially a multi-task collaborative learning framework that integrates two tasks: RUL prediction and state monitoring. The two tasks are strongly coupled. The recognition accuracy of the state monitoring task directly determines the adaptive regulation of the time constant, which in turn affects the model’s prediction accuracy. Because a single-task loss function is difficult to optimize for prediction accuracy and discriminator performance simultaneously, this article proposes a dual-supervised loss function. By jointly optimizing a single objective, this loss function enables collaborative training. On the one hand, it minimizes the difference between the predicted and actual RUL values; on the other hand, it improves the state discriminator’s ability to identify the critical transition point between the equipment’s healthy and degraded states. Ultimately, this ensures the effectiveness and stability of model training.

This dual-supervised loss function is composed of a regression loss term and a classification loss term weighted by λ, corresponding to the optimization objectives of the RUL prediction task and the state monitoring task, respectively. The backpropagation structure is shown in Figure 3. Its mathematical expression is as follows:

L_{total} (θ_{r}, θ_{s}) = L_{RUL} (θ_{r}) + λ L_{state} (θ_{s})

(25)

L_{RUL} (θ_{r}) = \frac{1}{T} \sum_{t = 1}^{T} {(y_{t}^{*} - y_{t})}^{2}

(26)

L_{state} (θ_{s}) = - \frac{1}{T} \sum_{t = 1}^{T} [s_{t} \cdot \log (p_{t}) + (1 - s_{t}) \cdot \log (1 - p_{t})]

(27)

where $L_{total}$ represents the total loss function, $L_{RUL}$ represents the loss function for RUL prediction, $L_{state}$ represents the loss function for state prediction, $λ$ is the weight of the state loss, $θ_{r}$ is the set of parameters for predicting RUL, $θ_{s}$ is the set of parameters for health state classification, $T$ denotes the total number of samples, $y_{t}$ denotes the actual RUL, and $s_{t}$ denotes the true state label.
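Equations (25) to (27) can be written as a short Python function. This is a sketch rather than the training code used in the article: the clipping constant `eps` is an added safeguard against `log(0)`, and the default weight `lam` is an arbitrary illustrative value.

```python
import math


def dual_supervised_loss(y_pred, y_true, p_state, s_true, lam=1.0, eps=1e-7):
    """Return (total, regression, classification) losses of Eqs. (25)-(27)."""
    t = len(y_true)
    # Eq. (26): mean squared error between predicted and actual RUL.
    l_rul = sum((yp - yt) ** 2 for yp, yt in zip(y_pred, y_true)) / t
    # Eq. (27): binary cross-entropy between health probability and true state.
    l_state = 0.0
    for p, s in zip(p_state, s_true):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep the logarithms finite
        l_state -= s * math.log(p) + (1 - s) * math.log(1.0 - p)
    l_state /= t
    # Eq. (25): weighted sum of the two terms.
    return l_rul + lam * l_state, l_rul, l_state


total, l_rul, l_state = dual_supervised_loss(
    y_pred=[0.9, 0.5], y_true=[1.0, 0.5],
    p_state=[0.5, 0.5], s_true=[1, 0], lam=0.5,
)
print(total, l_rul, l_state)
```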

Figure 3.

The backpropagation of SALNN. SALNN: state-adaptive liquid neural network.

Case 1: Harmonic drive dataset

Dataset description

The full-life cycle test for harmonic drives was conducted on an aerospace power system comprehensive performance test platform. As shown in Figure 4, the test bench primarily comprises three components: a drive motor, a load motor, and harmonic drive test units. Vibration signals along the x, y, and z axes are acquired simultaneously using two accelerometers. Data are collected at 10-min intervals, with each sampling session lasting 5 s at a sampling frequency of 12.8 kHz.

Figure 4.

Photo of test bench and harmonic drive fault parts.

The experimental campaign includes 11 independent tests, generating 11 distinct data subsets that encompass multiple operational condition combinations. These comprise three rotational speed levels (1000, 1250, and 1500 rpm) and three load levels (40, 60, and 80 Nm). Detailed operational parameters for each subset are provided in Table 1. For model training and validation, leave-one-out cross-validation is used. During each iteration, one of the 11 subsets is chosen as the test set, and the other ten are utilized for training. The training set is divided into training and validation subsets in a 4:1 ratio to aid in optimizing model parameters and avoiding overfitting.

Table 1.

Detailed operating conditions of the harmonic drives full-life cycle dataset.

Datasets	Speed (rpm)	Load (Nm)
Test1	1500	80
Test2	1500	80
Test3	1500	80
Test4	1500	80
Test5	1500	80
Test6	1000	80
Test7	1500	60
Test8	1250	80
Test9	1500	40
Test10	1500	40
Test11	1500	80
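The leave-one-out protocol described above can be sketched as follows. The sketch assumes that the 4:1 split is made at the subset level, so that eight of the ten remaining subsets are used for training and two for validation, and that the validation subsets are simply the last two in order; the article does not state how they are chosen.

```python
def leave_one_out_splits(subsets, n_val):
    """Yield (train, val, test) lists; every subset is the test set exactly once."""
    if not 1 <= n_val < len(subsets) - 1:
        raise ValueError("n_val must leave at least one training subset")
    for i, test in enumerate(subsets):
        rest = subsets[:i] + subsets[i + 1:]
        # Assumption: the last n_val remaining subsets form the validation set.
        yield rest[:-n_val], rest[-n_val:], [test]


subsets = [f"Test{k}" for k in range(1, 12)]
splits = list(leave_one_out_splits(subsets, n_val=2))

train, val, test = splits[0]
print(test, val, train)
```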

Selection of cycling frequency and fault frequency band

Before constructing the TCEI, it is essential to determine the key parameters required for its calculation: the cyclic frequency and the fault frequency band. The selection of these parameters directly dictates the indicator’s ability to characterize the equipment’s degradation trend. Insufficient parameter matching will result in an indicator that fails to reflect the equipment’s actual degradation state accurately.

Taking Test3 from the full-life cycle dataset as an analysis sample, a segment of its vibration signal is extracted for visual analysis, as shown in Figure 6(a). The time-domain waveform shows that the signal is contaminated by noise, making it impossible to directly identify the fault characteristics of the harmonic drive. Therefore, the fault type is defined based on the actual damage observed in the harmonic drive. The damage in Test3 primarily manifests as wear. The tooth surface of the test piece flexspline was observed under a microscope, and the result is shown in Figure 5. Degradation phenomena such as wear on the flexspline, circular spline, and the contact surface between the wave generator and flexspline lead to a reduction in meshing stiffness and an increase in clearance between components.⁴⁴ This change has two main effects: first, an offset of the instantaneous centre, and second, a bi-periodic fluctuation in meshing stiffness. In the demodulated spectrum, this modulation phenomenon is evidenced by an increase in the amplitude at twice the rotational frequency and its harmonics. Correspondingly, the spectral correlation map also shows high energy concentration at twice the rotational frequency. Therefore, twice the rotational frequency is identified as the cyclic frequency for the harmonic drive.

Figure 5.

Tooth surface wear: (a) Test2, (b) Test3, and (c) Test7.

To further screen for sensitive fault frequency bands, the MFP is employed to analyse the energy distribution across frequency bands. The result is shown in Figure 6(c). The statistical characteristics of the MFP curve across the entire frequency range are calculated, using the mean plus two standard deviations as the high-energy threshold for screening. Continuous frequency points exceeding this threshold are then selected as the fault frequency band. The screening formula is as follows:

μ = \frac{1}{N} \sum_{f = f_{min}}^{f_{max}} P (f)

(28)

σ = \sqrt{\frac{1}{N} \sum_{f = f_{min}}^{f_{max}} {[P (f) - μ]}^{2}}

(29)

f_{selected} = \{f \mid P (f) > μ + 2 σ\}

(30)

where $P (f)$ denotes the MFP value at frequency $f$ , $[f_{min}, f_{max}]$ is the frequency range, and $N$ is the total number of frequency points.
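The screening rule of equations (28) to (30) can be sketched as below. The function groups consecutive frequency points above the threshold into bands; the function name and the synthetic MFP curve are illustrative and do not reproduce the measured data.

```python
from statistics import fmean, pstdev


def select_fault_bands(freqs, mfp, k=2.0):
    """Return the threshold mu + k * sigma and the contiguous bands above it."""
    mu = fmean(mfp)                # Eq. (28)
    sigma = pstdev(mfp, mu)        # Eq. (29): population standard deviation (1/N)
    threshold = mu + k * sigma
    bands, start = [], None
    for i, value in enumerate(mfp):
        if value > threshold and start is None:
            start = i                                   # a band opens
        elif value <= threshold and start is not None:
            bands.append((freqs[start], freqs[i - 1]))  # the band closes
            start = None
    if start is not None:                               # band runs to the last point
        bands.append((freqs[start], freqs[-1]))
    return threshold, bands


# Synthetic curve: flat background with one high-energy region at 500-600 Hz.
freqs = [100 * i for i in range(20)]
mfp = [1.0] * 20
mfp[5] = mfp[6] = 10.0
threshold, bands = select_fault_bands(freqs, mfp)
print(threshold, bands)
```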

Figure 6.

Test3 vibration analysis: (a) time domain signal, (b) spectral correlation, and (c) frequency band selection.

Consequently, the range [4200, 4600] Hz is chosen as the integration interval for the fault frequency band, which will be used for the subsequent energy calculation of the TCEI. Based on the above analysis, cyclic frequency and fault frequency band screening were conducted for all 11 groups of harmonic drive datasets. Finally, the fault frequency bands required for TCEI calculation for each dataset were determined, with appropriate rounding applied. The screening results of fault frequency bands for all equipment are shown in Table 2.

Table 2.

Detailed values of the cyclic frequency and fault frequency band of the harmonic drives.

Datasets	Cyclic frequency (Hz)	Fault band (Hz)
Test1	50	[2500, 2900]
Test2	50	[2200, 2600]
Test3	50	[4200, 4600]
Test4	50	[2000, 2400]
Test5	50	[2400, 2800]
Test6	33	[2400, 2800]
Test7	50	[3300, 3700]
Test8	42	[2200, 2600]
Test9	50	[1800, 2200]
Test10	50	[1800, 2200]
Test11	50	[2300, 3000]
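The cyclic frequencies in Table 2 follow from the rule that the cyclic frequency is twice the rotational frequency. The short check below assumes that the rotational frequency is the listed speed in rpm divided by 60 and that the tabulated values are rounded to the nearest hertz; the function name is hypothetical.

```python
def cyclic_frequency_hz(speed_rpm: float) -> float:
    """Twice the rotational frequency, with the speed given in rpm."""
    return 2.0 * speed_rpm / 60.0


for rpm in (1000, 1250, 1500):
    print(rpm, round(cyclic_frequency_hz(rpm)))
```

Under these assumptions the three speed levels map to 33, 42, and 50 Hz, which agrees with the tabulated cyclic frequencies.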

TCEI

To capture the degradation trajectory of the harmonic drives and reflect their degradation characteristics, the selected optimal parameters were substituted into the TCEI calculation formula to extract features from each acquired vibration signal segment. Taking datasets Test3 and Test4 as examples, the extracted TCEI curves are shown in Figure 7. As illustrated, the TCEI curves for both datasets exhibit clear multi-stage degradation characteristics. During the first half of the service cycle, TCEI values remain low, indicating stable and healthy equipment operation. In the latter half, TCEI values show a continuous upward trend, reflecting gradual performance deterioration that ultimately leads to failure. These results validate the effectiveness of TCEI in characterizing the degradation process of the harmonic drive.

Figure 7.

TCEIs: (a) Test3 and (b) Test4.

This article conducts a quantitative analysis from three dimensions, monotonicity, correlation, and robustness, and compares it with traditional HIs to evaluate the performance of TCEI systematically.⁴⁵ The definitions of each evaluation indicator are as follows⁴⁶: Monotonicity measures the degree to which a mechanical equipment’s HI changes monotonically over time, reflecting the irreversibility of the degradation process. Robustness represents the ability of a HI to resist noise interference and random fluctuations during equipment degradation. A higher value indicates that the indicator is less affected by interference and has stronger stability. It is usually quantified using the smoothness or signal-to-noise ratio of the HI sequence. Correlation measures the degree of linear correlation between a HI and a time series, calculated using the Pearson correlation coefficient. The closer the absolute value is to 1, the more significant the correlation. The mathematical expressions of each indicator are as follows²³:

➢ Monotonicity:

O_{Mon} (HI) = \frac{1}{L - 1} | no . PD - no . ND |

(31)

where $L$ denotes the total length of the sequence, $no . PD$ represents the number of positive differences between consecutive HI values, and $no . ND$ represents the number of negative differences between consecutive HI values.

➢ Robustness:

O_{Rob} (HI) = \frac{1}{L} \sum_{i = 1}^{L} \exp (- \frac{| {HI}_{i} - {HI}_{i}^{'} |}{{HI}_{i}})

(32)

where ${HI}_{i}$ denotes the ith indicator value, ${HI}_{i}^{'}$ represents the ith smoothed HI (moving average smoothing).

➢ Correlation:

\begin{matrix} O_{Cor} (HI, T) = \\ \frac{L \sum_{i = 1}^{L} ({HI}_{i} \cdot t_{i}) - \sum_{i = 1}^{L} {HI}_{i} \cdot \sum_{i = 1}^{L} t_{i}}{\sqrt{[L \sum_{i = 1}^{L} {HI}_{i}^{2} - {(\sum_{i = 1}^{L} {HI}_{i})}^{2}] \cdot [L \sum_{i = 1}^{L} t_{i}^{2} - {(\sum_{i = 1}^{L} t_{i})}^{2}]}} \end{matrix}

(33)

where $t_{i}$ denotes the ith time point.

➢ Composite indicator:

CI = 0.3 O_{Mon} (HI) + 0.3 O_{Cor} (HI, T) + 0.4 O_{Rob} (HI)

(34)
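The four evaluation indicators of equations (31) to (34) can be computed with the Python sketch below. The moving-average window length and its truncated handling at the sequence ends are assumptions, since the article does not specify them, and the robustness term follows equation (32) literally, so it presumes strictly positive HI values.

```python
import math


def monotonicity(hi):
    """Eq. (31): |#positive - #negative differences| / (L - 1)."""
    diffs = [b - a for a, b in zip(hi, hi[1:])]
    n_pos = sum(d > 0 for d in diffs)
    n_neg = sum(d < 0 for d in diffs)
    return abs(n_pos - n_neg) / (len(hi) - 1)


def moving_average(hi, window=5):
    """Centred moving average, truncated at the sequence ends (assumed scheme)."""
    half = window // 2
    return [
        sum(hi[max(0, i - half): i + half + 1]) / len(hi[max(0, i - half): i + half + 1])
        for i in range(len(hi))
    ]


def robustness(hi, window=5):
    """Eq. (32): mean of exp(-|HI_i - smoothed HI_i| / HI_i)."""
    smooth = moving_average(hi, window)
    return sum(math.exp(-abs(h - s) / h) for h, s in zip(hi, smooth)) / len(hi)


def correlation(hi, t):
    """Eq. (33): Pearson correlation between the HI and the time index."""
    n = len(hi)
    s_h, s_t = sum(hi), sum(t)
    num = n * sum(h * x for h, x in zip(hi, t)) - s_h * s_t
    den = math.sqrt(
        (n * sum(h * h for h in hi) - s_h ** 2) * (n * sum(x * x for x in t) - s_t ** 2)
    )
    return num / den


def composite(hi, t, window=5):
    """Eq. (34): weighted sum with weights 0.3, 0.3, and 0.4."""
    return 0.3 * monotonicity(hi) + 0.3 * correlation(hi, t) + 0.4 * robustness(hi, window)


hi = [1.0, 1.1, 1.0, 1.3, 1.6, 2.2, 3.1]
t = [float(i) for i in range(len(hi))]
print(monotonicity(hi), correlation(hi, t), robustness(hi), composite(hi, t))
```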

To further verify the superiority of TCEI, four widely used HIs for RUL prediction were selected for comparative experiments. The physical significance of each benchmark indicator is as follows: RMS reflects the overall energy fluctuation intensity of vibration signals; kurtosis indicates the prominence of impact components in vibration signals and is sensitive to incipient faults; cyclic energy indicator is a traditional energy metric based on second-order cyclostationary properties; fuzzy entropy characterizes the complexity and disorder of vibration signals, reflecting the degree of system state irregularity. The composite evaluation metrics of the four benchmark indicators and the TCEI were calculated, with the results shown in Figure 8.

Figure 8.

Comprehensive indicators.

For an intuitive comparison of degradation trends, datasets Test3 and Test4 were selected to visually compare the health curves of the TCEI against the benchmark indicators, as illustrated in Figure 9. Integrating both quantitative and qualitative analyses, it is evident that the TCEI achieves a higher composite metric than the four benchmark indicators. Furthermore, its degradation curve clearly reveals the multi-stage characteristics of the equipment, demonstrating superior correlation and degradation discernibility compared to the benchmark indicators.

Figure 9.

Indicator comparison: (a) Test3 and (b) Test4.

RUL prediction and comparative experiment

TCEI and four types of mainstream HIs were used as input features, respectively, and fed into SALNN for RUL prediction comparison experiments to verify TCEI application performance in practical engineering scenarios. The prediction results are shown in Table 3.

Table 3.

Comparison of HI prediction results.

Tasks	TCEI		ICS2		RMS		Fuzzy entropy		Kurtosis
Tasks	MAE	RMSE	MAE	RMSE	MAE	RMSE	MAE	RMSE	MAE	RMSE
Test1	0.080	0.139	0.131	0.186	0.124	0.195	0.047	0.074	0.148	0.172
Test2	0.048	0.069	0.394	0.450	0.138	0.218	0.158	0.212	0.259	0.293
Test3	0.028	0.051	0.200	0.228	0.107	0.190	0.091	0.111	0.194	0.213
Test4	0.056	0.076	0.202	0.230	0.137	0.185	0.094	0.127	0.514	0.544
Test5	0.038	0.063	0.274	0.295	0.078	0.117	0.117	0.160	0.295	0.375
Test6	0.040	0.063	0.212	0.237	0.124	0.181	0.159	0.230	0.414	0.472
Test7	0.049	0.082	0.176	0.224	0.081	0.097	0.052	0.080	0.428	0.448
Test8	0.174	0.227	0.321	0.359	0.112	0.147	0.124	0.167	0.261	0.318
Test9	0.082	0.105	0.295	0.343	0.203	0.243	0.153	0.177	0.229	0.255
Test10	0.119	0.151	0.298	0.398	0.184	0.221	0.214	0.295	0.257	0.317
Test11	0.035	0.093	0.203	0.242	0.046	0.138	0.196	0.209	0.242	0.258
Average	0.068	0.102	0.246	0.290	0.122	0.175	0.129	0.158	0.295	0.333

In prediction tasks involving 11 groups of harmonic drives, the model with TCEI as input achieved the lowest average MAE and RMSE and the best individual scores on both metrics in 8 of the 11 groups. To visually demonstrate the predictive effect of TCEI, two units were selected, and their RUL prediction trajectories are shown in Figure 10. The results indicate that TCEI has strong practicality in engineering applications.

Figure 10.

RUL prediction results: (a) Test4 and (b) Test6. RUL: remaining useful life.

In this section, a variety of representative models are selected for comparative experiments to verify SALNN’s performance. The comparative models fall into two categories: first, classical baseline models such as Bidirectional-Gated Recurrent Unit (Bi-GRU); second, state-of-the-art models that have performed well in RUL prediction in recent years, including MSTformer,³³ Dual Attention-Long Short-Term Memory (DA-LSTM),³⁰ Closed-form Continuous-depth neural-based hybrid difference Features Re-representation Network (CFC-F2RN),⁴⁷ Multilevel Focal Self-Attention (MLFSA),³⁷ and Slow Feature Analysis-Bidirectional-Long Short-Term Memory (SFA-Bi-LSTM).²⁰

The experimental results show that, across the 11 groups of harmonic drives, SALNN achieves the best prediction performance in most units. The per-unit prediction results are shown in Table 4, and the average results of all comparative methods are shown in Table 5. The average error of the proposed method is lower than that of all comparative models.

Table 4.

Prediction results of comparative methods.

Tasks	SALNN		Bi-GRU		MSTformer		DA-LSTM		CFC-F2RN
Tasks	MAE	RMSE	MAE	RMSE	MAE	RMSE	MAE	RMSE	MAE	RMSE
Test1	0.080	0.139	0.089	0.144	0.090	0.164	0.079	0.134	0.106	0.164
Test2	0.048	0.069	0.051	0.082	0.039	0.081	0.054	0.093	0.055	0.081
Test3	0.028	0.051	0.065	0.115	0.069	0.127	0.063	0.124	0.067	0.123
Test4	0.056	0.076	0.057	0.083	0.076	0.102	0.059	0.087	0.050	0.078
Test5	0.038	0.063	0.041	0.064	0.045	0.070	0.045	0.067	0.036	0.047
Test6	0.040	0.063	0.057	0.090	0.052	0.079	0.061	0.082	0.061	0.086
Test7	0.049	0.082	0.063	0.120	0.060	0.121	0.064	0.111	0.059	0.103
Test8	0.174	0.227	0.181	0.246	0.181	0.247	0.173	0.234	0.186	0.246
Test9	0.082	0.105	0.099	0.143	0.101	0.147	0.116	0.159	0.108	0.169
Test10	0.120	0.151	0.125	0.164	0.119	0.164	0.124	0.167	0.121	0.162
Test11	0.035	0.093	0.046	0.096	0.034	0.101	0.038	0.101	0.034	0.095

SALNN: state-adaptive liquid neural network; Bi-GRU: bidirectional-gated recurrent unit; DA-LSTM: dual attention-long short-term memory; MAE: mean absolute error; RMSE: root mean square error; CFC-F2RN: closed-form continuous-depth neural-based hybrid difference features re-representation network.

Table 5.

Average prediction results of comparative experiments.

Methods	MAE	RMSE
Bi-GRU	0.080	0.122
MSTformer	0.079	0.127
DA-LSTM	0.079	0.124
CFC-F2RN	0.080	0.123
MLFSA	0.078	0.121
SFA-Bi-LSTM	0.079	0.126
SALNN	0.068	0.102

To intuitively demonstrate differences in prediction accuracy among models, four groups of units were selected, and three representative comparative models (MSTformer, DA-LSTM, and Bi-GRU) were selected. The RUL prediction trajectories under different network frameworks were visually compared with the actual degradation trajectories, and the results are shown in Figure 11. According to the results, the SALNN prediction trajectory closely matches the actual degradation curve of the harmonic drives.

Figure 11.

Comparative experiments: (a) Test4, (b) Test5, (c) Test6, and (d) Test7.

Ablation experiment

This section designs and conducts a series of ablation experiments to examine the impact of each module in the proposed method. In the model proposed above, a LayerNorm layer from the Transformer module was introduced at the output of the hidden units of each LTC cell to stabilize the cell; for this reason, a comparative model with this LayerNorm layer removed was first constructed and named M1. Regarding the adaptive time-constant module, two sets of comparative experiments were designed to explore its impact: the first set directly removed the time-constant module and simplified the entire network into an ODE network, named M2, while the second set used a traditional liquid time-constant network, named M3. The complete model proposed in this article was designated M4 and served as the performance benchmark.

The RUL prediction results of the four methods (M1–M4) are shown in Table 6. A comparative analysis reveals that M4 achieves better predictive performance than the others. M1 experienced multiple gradient explosions during training, leading to extremely unstable final predictions, fully verifying the key role of LayerNorm in maintaining model stability. As M2 removes the adaptive time-constant module, its structure is overly simplified, failing to capture the equipment’s nonlinear degradation process accurately. In contrast, M3 adopts a complex liquid time constant, leading to large fluctuations in prediction results. This indicates that the complexity of the time constant is not positively correlated with performance; overly complex designs may, in fact, make the model more susceptible to short-term signal fluctuations.

Table 6.

Comparison of ablation experiments.

Methods	MAE	RMSE
M1	0.087	0.121
M2	0.078	0.116
M3	0.077	0.114
M4	0.068	0.102

MAE: mean absolute error; RMSE: root mean square error.

Parameter discussion

This section designs controlled experiments to investigate the impact of model key parameters, focusing on three types: the hidden layer dimension of ODE cells, the batch size, and the sequence length. For each parameter, an average was calculated from 10 independent experiments to obtain the final result. For the remaining parameters, the liquid neural network is constructed with three layers and employs the Rectified Linear Unit (ReLU) function as the activation function.

The hidden layer dimension directly determines the model’s feature expression and memory capabilities, and its value is crucial to prediction accuracy. When the dimension is small, the model’s fitting ability is insufficient, making it challenging to extract deep features. When the dimension is large, the model complexity and computational cost increase, and overfitting may also occur. For this reason, four different hidden layer dimensions were selected for comparative experiments, and the prediction results are shown in Figure 12(a). Analysis shows that when the hidden layer dimension is set to 64, the model achieves the smallest prediction error. Batch size is the number of samples used to update parameters in each iteration, and it must balance computational efficiency with model stability. As shown in Figure 12(b), a batch size of 32 ensures stable parameter updates. The sequence length is the length of the input fed to the model in a single training step. It affects the ODE iteration process on the data, thereby influencing the extraction of degradation features. Four different sequence lengths were selected for comparative experiments, and the results are shown in Figure 12(c). The results show that a sequence length of three gives the best prediction performance.

Figure 12.

Parameter controlled experiments: (a) hidden layer dimension, (b) batch size, and (c) sequence length.

State detection and time-constant analysis

Accurate condition monitoring is the basis for adaptive time-constant allocation and therefore directly affects time-constant regulation. The network architecture parameters of the state discriminator and the RUL prediction layer are shown in Table 7.

Table 7.

Network architecture parameters.

Module	Configuration	Output dimension
State discriminator	FC(3,64) + ReLU + BN	64
	FC(64,128) + ReLU + BN	128
	FC(128,64) + ReLU	64
	FC(64,1) + Sigmoid	1
RUL prediction layer	FC(64,32) + ReLU + Dropout	32
	FC(32,1)	1

RUL: remaining useful life; FC: fully connected layer; ReLU: rectified linear unit; BN: batch normalization.

To verify the condition discriminator performance, the discriminator health probability output was compared with the actual degradation trajectory. The classification results for each dataset are shown in Table 8.

Table 8.

State classification accuracy.

Tasks	Test1	Test2	Test3	Test4	Test5	Test6	Test7	Test8	Test9	Test10	Test11
Accuracy (%)	88.64	91.10	100.00	93.18	88.68	97.30	96.75	85.17	96.61	94.99	94.12

Using the Test3 and Test6 datasets as cases, the results are shown in Figure 13: the red curve represents the equipment health-state probability predicted by the model, the blue curve represents the actual RUL degradation trajectory, and the background colour indicates the model’s state classification results via binarization. The result shows that the condition discriminator accurately identifies the health state transition points, and the output health probability is consistent with the actual degradation trend, ensuring accurate regulation of the time constant.
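The binarization and accuracy calculation can be sketched as follows. The sketch assumes the label convention implied by equation (19), namely that a label of 1 denotes the healthy state and that a health probability of at least 0.5 is classified as healthy; the sample values are invented for illustration.

```python
def classification_accuracy(p_health, s_true, threshold=0.5):
    """Percentage of samples whose binarized health probability matches the label."""
    predicted = [1 if p >= threshold else 0 for p in p_health]
    correct = sum(int(a == b) for a, b in zip(predicted, s_true))
    return 100.0 * correct / len(s_true)


# Invented example: the third sample is still judged healthy after the true transition.
p_health = [0.9, 0.8, 0.6, 0.4, 0.2]
s_true = [1, 1, 0, 0, 0]
print(classification_accuracy(p_health, s_true))
```

Misclassifications of this kind cluster around the transition point, which is where the choice between $τ_{h}$ and $τ_{d}$ changes.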

Figure 13.

State classifications: (a) Test3 and (b) Test6.

Case 2: XJTU-SY dataset

Dataset description

The second case study utilizes the rolling bearing dataset from Xi’an Jiaotong University.⁴⁸ This dataset was collected through accelerated degradation experiments on 15 bearing test units under three different operating conditions, comprehensively recording vibration signals from initial healthy operation to complete failure. For each operating condition, five parallel bearing test units were arranged to ensure data reliability. Detailed parameters for each operating condition and the corresponding bearings are presented in Table 9. The experimental platform structure, shown in Figure 14, consists of test bearings, a rotating shaft, an AC motor, a speed controller, support bearings, and a hydraulic system, simulating real-world bearing operation scenarios. The signal acquisition parameters were set as follows: a sampling frequency of 25.6 kHz, with vibration data collected for 1.28 s/min, ensuring complete capture of state characteristic changes throughout the bearing degradation process.

Table 9.

The operating condition information of the XJTU-SY dataset.

Datasets	Speed (rpm)	Load (kN)
Bearing1_1–Bearing1_5	2100	12
Bearing2_1–Bearing2_5	2250	11
Bearing3_1–Bearing3_5	2400	10

Figure 14.

Rolling bearing test platform.

For model training and validation, the leave-one-out method was adopted. Each time, 1 of the 14 subsets was selected as the test set, while the remaining 13 served as the training set. Within the training set, an 11:2 ratio was further used to partition the data into training and validation subsets, aimed at optimizing model parameters and mitigating overfitting.

Selection of cycling frequency and fault frequency band

For parameter selection on this dataset, the vibration signal from the Bearing1_1 dataset (Figure 15(a)) was selected for analysis as an example. By substituting the bearing’s structural parameters into the characteristic frequency formula, the outer race fault characteristic frequency was determined to be 107.91 Hz. In the spectral correlation diagram (Figure 15(b)), a significant energy concentration at the outer race fault characteristic frequency can be clearly observed. The MFP algorithm was then applied to the cyclic frequency slices to locate the target fault frequency band where energy is concentrated (Figure 15(c)). For bearings under other operating conditions, the relevant fault frequencies were calculated first, and then the MFP algorithm was used to locate the target fault frequency bands. Through this process, the relevant parameters can be determined.
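The quoted outer race fault characteristic frequency can be reproduced with the standard ball-pass frequency formula. The bearing geometry below (8 balls, 7.92 mm ball diameter, 34.55 mm pitch diameter, 0° contact angle) is not stated in this article; it is the geometry commonly reported for the XJTU-SY test bearings and is used here as an assumption, together with a shaft speed of 2100 rpm.

```python
import math


def bpfo_hz(speed_rpm, n_balls, ball_d_mm, pitch_d_mm, contact_angle_deg=0.0):
    """Ball-pass frequency of the outer race for a rolling bearing."""
    f_r = speed_rpm / 60.0  # shaft rotational frequency in Hz
    ratio = ball_d_mm / pitch_d_mm * math.cos(math.radians(contact_angle_deg))
    return 0.5 * n_balls * f_r * (1.0 - ratio)


# Assumed geometry (not given in the article) at the 2100 rpm operating condition.
f_outer = bpfo_hz(speed_rpm=2100, n_balls=8, ball_d_mm=7.92, pitch_d_mm=34.55)
print(round(f_outer, 2))
```

Under these assumptions the formula returns approximately 107.91 Hz, matching the value used for the cyclic frequency slice.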

Figure 15.

Bearing1_1 vibration analysis: (a) time domain signal, (b) spectral correlation, and (c) frequency band selection.

TCEI

To capture the degradation trend of the bearings and accurately reflect their degradation characteristics, the selected parameters are substituted into the TCEI formula to perform targeted energy feature extraction on each segment of the collected vibration signals. The extracted TCEI is an energy indicator that increases sharply as faults become more severe. For better visualization, the TCEI curves of the two bearing datasets (Bearing2_3 and Bearing3_4) are partially truncated, and the TCEI values for the final several time steps are not displayed. However, the full-life cycle data are used as input during RUL prediction, so model training is unaffected. As shown in Figure 16, the TCEI of both bearings demonstrates clear multi-stage characteristics: the energy remains low during the first half of the service life, indicating a healthy state, and shows an upward trend in the latter half, reflecting gradual degradation leading to failure. The TCEI thus effectively represents the degradation progression.

Figure 16.

TCEIs: (a) Bearing2_3 and (b) Bearing3_4.

RUL prediction and comparative experiment

To evaluate the effectiveness of the proposed indicator and model, SALNN was compared with multiple RUL prediction methods on the XJTU-SY dataset. The results are presented in Table 10. The proposed method achieves lower average Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) than the other methods. Furthermore, the prediction results for the Bearing2_3 and Bearing3_4 datasets are shown in Figure 17, and the comparison results are shown in Figure 18.

Table 10.

Prediction results of comparative methods.

Methods	MAE	RMSE
MSTformer	0.092	0.165
DA-LSTM	0.084	0.149
CFC-F2RN	0.085	0.179
Bi-GRU	0.079	0.146
SALNN	0.072	0.133

Bi-GRU: bidirectional-gated recurrent unit; DA-LSTM: dual attention-long short-term memory; MAE: mean absolute error; SALNN: state-adaptive liquid neural network; RMSE: root mean square error; CFC-F2RN: closed-form continuous-depth neural-based hybrid difference features re-representation network.

Figure 17.

RUL prediction results: (a) Bearing2_3 and (b) Bearing3_4. RUL: remaining useful life.

Figure 18.

Comparative experiments: (a) Bearing2_3 and (b) Bearing3_4.

Conclusion

This article proposes a novel HI and an ODE neural network that integrates the degradation mechanism of harmonic drives. The TCEI, constructed from second-order cyclostationary characteristics, exhibits clear multi-stage behaviour and anti-interference capability, allowing it to be combined efficiently with the ODE neural network. The segmented time constant in the SALNN model compensates for the shortcomings of the traditional LTC framework in RUL prediction, enabling it to accurately mine degradation trend features. A LayerNorm layer is introduced into the ODE cells to suppress gradient fluctuations, and the network is embedded in a bidirectional architecture to enhance feature extraction capability. Verification on the harmonic drive and XJTU-SY datasets confirmed the effectiveness and generalization ability of the indicator and the model. Compared with existing methods, the proposed TCEI indicator and SALNN model demonstrate significant advantages, providing support for the engineering deployment of online training and offline prediction, and laying theoretical and experimental foundations for the subsequent development of mobile detection technology.

However, the method proposed in this study is currently limited to relatively simple short-term forecasting scenarios and does not yet support transfer learning, continual learning, or forecasting for other, more complex tasks. Future work will incorporate these capabilities into the framework and pursue more in-depth investigations on this basis.

Footnotes

ORCID iDs

Mingzhe Du

Zongyang Liu

Jing Lin

Wenhao Li

Hao Li

Xinyu Lu

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by the National Natural Science Foundation of China under grant 52305089 and grant 52235002, and the Development Project of Beilun District under grant 2024BLG009, which is highly appreciated by the authors.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
