Remaining useful life prediction of slewing bearings using attention mechanism enabled multivariable gated recurrent unit network

Abstract

It is difficult to obtain the damage information on large slewing bearings only from vibration signals. In addition, deep learning models trained on old samples do not achieve high accuracy in new tasks. Therefore, this paper uses vibration, temperature, and torque signals of slewing bearings to build a model. Meanwhile, we add attention mechanism to capture internal correlation of them to consider the related factors of remaining useful life (RUL) from multiple angles. The multivariable gated recurrent unit (GRU) based on attention mechanism gated recurrent unit (attention-MGRU) model is adopted to improve the prediction performance. On this foundation, a fine-tuning strategy is introduced to improve the generalization ability of the model. A full-life accelerated test is carried out on the slewing bearing test bench. The model proposed in this paper is compared with GRU prediction model, which utilizes vibration signals and multivariable GRU prediction model. Mean absolute error (MAE) and root-mean-square error (RMSE) are used as measurement indicators. Among different methods, three indicators generated by attention-MGRU show significant superiority. Moreover, the fine-tuned model performs better in new tasks compared with the original model.

Keywords

Attention mechanism gated recurrent units remaining useful life prediction fine-tuning

Introduction

Slewing bearings are used to connect the rotators in a machine, similar to joints. Once they break down or fail, it will cause safety hazards and great economic losses. Therefore, it is very important to improve the reliability of slewing bearings to ensure the operation safe and stable. Active operation and maintenance are effective ways to improve the operational reliability of slewing bearings, but the premise is to accurately evaluate the life state of them. Accurate prediction of remaining useful life (RUL) and timely failure warning can avoid unnecessary losses.

Scholars have studied the life evaluation of slewing bearings to some extent. Data-driven methods do not depend on the damage mechanism of slewing bearings, so using reliable data to build model has become widely used. Data-driven methods mainly include two major steps: signal preprocessing and establishment of prediction model (Lei et al., 2018). Signal denoising is the first step of preprocessing. Common methods to reduce noise include wavelet domain denoising (Caesarendra et al., 2016), ensemble empirical mode decomposition (EEMD) (Žvokelj et al., 2016), local mean decomposition (LMD) (Du et al., 2017), and singular value decomposition (SVD) (Dong et al., 2021).

Feature extraction of signals can reduce the calculated amount and help to highlight the variation trend of signals. Researchers propose a deep autoencoder (DAE)-based joint feature compression method (Ren et al., 2018). This way can retain effective information without increasing the scale of neural network. Improved convolutional deep belief network (CDBN) with compressed sensing (Shao et al., 2018) and normalized conditional variational autoencoder (NCVAE) model (Zhao et al., 2022) are put forward to enhance the feature learning ability. Zhu et al. (2019) proposed a deep feature learning method based on time–frequency representation (TFR) and multi-scale convolutional neural network (MSCNN). This model keeps the synchronization of global and local information. A stacked wavelet autoencoder structure is applied to provide more reliable results with a more complete data set (Shao et al., 2021).

Establishing reasonable and effective health indicator (HI) of slewing bearings is the most valid way to supervise the operating state and reliability of slewing bearings. Main applied methods include support vector domain description (SVDD) (Jiao et al., 2020), hidden Markov model (HMM) (José et al., 2020), and Kolmogorov–Smirnov (K–S) test (Singleton et al., 2015). Dimensionality reduction is a commonly used way to fuse various features and characterize the operating state of equipment. Based on manifold learning, isometric mapping (ISOMAP) is selected to gain HI of signals to obtain a good evaluation effect (Bao et al., 2020). Nguyen and Medjaher (2021) put forward an evaluation criterion for HI, so as to create the most appropriate HI. Ahmad et al. (2018) proposed a linear revision technique (LRT) to deal with fake fluctuations in HI, which improved the accuracy of regression models.

To sum up, there are still problems needing further discussion about preprocessing. Feature extraction of signals should satisfy the requirement of comprehensiveness and avoid the influence made by feature redundancy. So, it is necessary to extract features in multiple domains and screen the features effectively considering actual degradation law.

RUL prediction results of slewing bearings are also depended on the accuracy of the prediction model. Subrata et al. (2019) proposed gene expression programming based on the population diversity (PD-GEP) algorithm, which improves the model’s prediction accuracy and convergence speed. Ding et al. (2019) constructed a new prediction model based on the hybrid genetic programming-model structure adaptive method (HYGP-MSAM), which realizes visual modeling of slewing bearings’ life prediction. Temporal cascade broad learning system is applied in bearings’ life prediction, solving the problem that network can only be updated by retraining when adding newly acquired data (Cao et al., 2022). Pan et al. (2020) suggested a two-stage prediction method based on extreme learning machine to predict RUL of bearings quickly and accurately. However, traditional machine learning methods show insufficient processing capacity because of the complex interrelationships in degradation process. In addition, the inflexibility of models’ structure limits their development. To settle these issues, recurrent neural network (RNN) is applied due to its advantages in time series problems (Guo et al., 2017). Zhang et al. (2018) introduced a life prediction model based on long and short-term memory (LSTM), an improved version of RNN. Using RNNs for modeling merely will result in a lack of accuracy and information capture capability, so scholars make modifications on this basis. She and Jia (2021) proposed bidirectional gated recurrent unit (BiGRU), which allows the predictions to take prior and subsequent information into account. Self-attention ConvLSTM (SA-ConvLSTM) neural network is put forward, and this model can better learn the importance of different time steps (Li et al., 2021).

Due to the lack of generalization ability, numerous models perform poorly in cross-domain prediction. Sun et al. (2019) proposed a deep transfer learning network based on sparse autoencoder (SAE). Analogously, Zou et al. (2021) predicted RUL of bearings through a transfer prediction model based on the dynamic benchmark. Also, an effective method named deep residual LSTM with domain invariance (DIDRLSTM) is applied (Fu et al., 2021). These methods really improve the prognostic performance of cross-domain RUL prediction, but the issues of domain adaptation are not fully considered.

Life prediction tasks are often accompanied by the problem of few shot. To address the lack of sample data, Zan et al. (2021) proposed a method based on the joint approximative diagonalization of eigen matrices (JADE) and particle swarm optimization support vector machine (PSO-SVM), which can offer high accuracy under small sample sizes. Li et al. (2022) proposed a self-data-driven method, which relies on its own state monitoring data for RUL prediction rather than fault data. A novel model named meta-attention RNN is provided by Ding et al. (2022), and the generalization of it is proved. These methods optimize the precision under few shot to some extent, but the ability to capture information is relatively weak and hyperparameter optimization in deep learning algorithm are still challenging (Thoppil et al., 2021).

As can be seen in these articles, three problems of prediction models need to be worked out:

In existing methods, it is difficult to obtain the damage information on slewing bearings comprehensively only from vibration signals.

Building effective HI can improve the accuracy of life assessment. Therefore, it is significant to develop suitable schemes of feature extraction, screening, and fusion.

Models are limited in accuracy or learning ability in cross-domain tasks. Domain adaptation is suffering from many challenges.

Improving any step of RUL prediction can let the results more satisfactory. This paper extracts information from multiple sensors and applied attention mechanism to develop the model. Main contributions of this paper are as follows:

To upgrade the quality of HI, a novel and reasonable method is created to screen compelling features. Also, we make improvements in the establishment of HIs based on ISOMAP and maximum likelihood estimation (MLE).

On the basis of the traditional gated recurrent unit (GRU) model, attention mechanism gated recurrent unit (attention-MGRU) prediction model is proposed to excavate the correlation between multi-physical signals and slewing bearings’ RUL. It helps to excavate damage information deeply and improves accuracy and robustness of the model.

To strengthen the generalization ability of the model, this paper adds a fine-tune strategy which can modify parameters and improve precision on new tasks. Attention mechanism ensures the ability to capture information after fine-tuning.

Preprocessing of signals

This section explores the preprocessing in detail. Signals’ preprocessing includes denoising original signals, extracting features, screening features, and establishing HI. The process of it is shown as follows:

Step 1. The original signals are represented as $x_{vib} (t)$ , $x_{tem} (t)$ , and $x_{tor} (t)$ . Robust local mean decomposition (RLMD) is applied to remove the noise of signals in the first step. By optimizing envelope estimation and screening stop criteria, the optimal step size and the optimal screening times can be determined adaptively on the basis of LMD (Liu et al., 2017).

Step 2. To comprehensively reflect the degradation trend of slewing bearings, this paper uses the features in Table 1 as the extraction objects. As shown in the table, 1 to 10 are the features of the time domain, 11 and 12 are the features of the frequency domain, and 13 is feature of time–frequency domain. $C_{i} (t)$ are intrinsic mode functions (IMF) from signal decomposition in step 1. $x_{i}$ is the $i^{th}$ point of each signal after denoising, $f_{i}$ is the power spectrum frequency at time i , p_i is the power spectrum amplitude, and $C_{i} (t)$ is the natural mode function of signal decomposition. Because the vibration signal is high frequency, extracting the time domain features merely will inevitably lose important information. It is sufficient to extract time domain features for low-frequency signals such as temperature and torque signals.

Step 3. This paper proposes a hybrid evaluation method based on the linear superposition of the monotonicity indicator (equation (1)) and trajectory difference indicator (equation (2)) to screen features of physical signals. The score is a comprehensive evaluation indicator, and the calculation formula is shown in equation (3). Before linear superposition, normalization (equation (4)) is performed uniformly. Among them, Mon and Div represent monotonicity indicator and trajectory difference indicator, respectively

Mon (X) = \frac{1}{n - 1} | α_{X} - β_{X} |

(1)

Div (X, Y) = {(\frac{\sum_{i = 1}^{n} (X_{i} - \bar{X}) (Y_{i} - \bar{Y})}{\sqrt{\sum_{i = 1}^{n} {(X_{i} - \bar{X})}^{2} \sum_{i = 1}^{n} {(X_{i} - \bar{X})}^{2}}})}^{- β}

(2)

Score = f_{nor} (Mon) + f_{nor} (Div)

(3)

f_{nor} (x) = \frac{x - min (x)}{max (x) - min (x)}

(4)

Table 1.

Multi-domain candidate features.

Characteristic	Expression	Characteristic	Expression
1 Maximum	$X max {{\| x_{n} \|}_{\max}$	2 Variance	$X_{v} = \frac{1}{n} \sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}$
3 Root mean square	$X_{rms} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} x_{i}^{2}}$	4 Kurtosis	$X_{k} = \frac{1}{n} \sum_{i = 1}^{n} x_{i}^{4}$
5 Absolute mean	$X_{am} = \frac{1}{n} \sum_{i = 1}^{n} \| x_{i} \|$	6 Root amplitude	$X_{ra} = {[\frac{1}{n} \sum_{i = 1}^{n} \sqrt{\| x_{i} \|}]}^{2}$
7 Shape factor	$X_{sf} = \frac{X_{rms}}{X_{am}}$	8 Impulse factor	$X_{if} = \frac{X_{\max}}{X_{am}}$
9 Crest factor	$X_{crf} = \frac{X_{\max}}{X_{rms}}$	10 Clearance factor	$X_{clf} = \frac{X_{max}}{X_{ra}}$
11 Mean square frequency	$Msf = \frac{\sum_{i = 1}^{n} f_{i}^{2} p_{i}}{\sum_{i = 1}^{n} p_{i}}$	12 Center frequency	$Fc = \frac{\sum_{i = 1}^{n} f_{i} p_{i}}{\sum_{i = 1}^{n} p_{i}}$
13 IMF energy	$X_{imf} = \sum_{i = 1}^{n} {\| C_{i} (t) \|}^{2}$

X and Y are two distinct feature vectors, and $\bar{X}$ and $\bar{Y}$ represent their averages. The numbers of positive and negative differentials in the interval are $α_{X}$ and $β_{X}$ . It should be noted that to comprehensively utilize the knowledge of time domain, frequency domain, and time–frequency domain, the screening results should include the features under each domain and the number of inferior features excluded from each domain should not exceed 1/2 of the total. Only feature 13 is in time–frequency domain, so filtering is not required in time–frequency domain.

Step 4. In this paper, ISOMAP algorithm is used to set up the HIs of vibration, temperature, and torque signals. This method embeds high-dimensional data to low-dimensional space through multidimension scaling (MDS). It is indispensable to estimate the intrinsic dimension before reducing dimension. MLE is used to complete this mission in this process. The calculation process is as follows: establishing likelihood equation, taking the logarithm of the likelihood function, and solving the equation. Finally, matrix A is obtained and utilized to reduce the dimension of feature matrix $X_{fea}$ to the intrinsic dimension calculated by MLE (equation (5)). After that, HIs, $H I_{vib}$ , $H I_{tem}$ , and $H I_{tor}$ , are obtained. The whole preprocessing process makes preparation for the subsequent life prediction

HI = A^{T} X_{fea}

(5)

Build life prediction model

The attention-MGRU model proposed in this paper is a RUL prediction method combining GRU and attention mechanism. Figure 1 demonstrates the forecasting process. In neural networks, RNN is well suited for tasks of time series prediction. GRU and LSTM are the evolution of RNN, and they work out the problem of gradient disappearance. Compared with LSTM, the structure of GRU is simpler and easier to realize. When multiple physical variables are utilized as inputs, it is improper that each variable possesses the same effect on the predicted results. Attention mechanism is capable of assigning weights to each variable based on the output of previous GRU. This will help to capture variables’ internal correlations and enhance the robustness of model. Attention layer’s inputs are HIs and the hidden layer’s latest output. Furthermore, the output is time series modified by weighting. The hidden layer is composed of GRU network structure, and the output layer generates final prediction results. After that, original model is fine-tuned through transforming the loss function.

Figure 1.

Prediction process of attention-MGRU.

Construction of GRU module in hidden layer

Figure 2 shows the network structure of GRU at the hidden layer. In the figure, $X_{t}$ and $h_{t - 1}$ are the inputs of GRU, where $h_{t - 1}$ is the output of the former GRU module. $X_{t}$ is the series of input, calculated by the attention mechanism. $z_{t}$ and $r_{t}$ are the update gate and reset gate of GRU, respectively. Calculation formulae are shown in equations (4) and (5). The initial values of h_t-1 are random

z_{t} = σ (W_{z} \cdot [h_{t - 1}, X_{t}])

(6)

r_{t} = σ (W_{r} \cdot [h_{t - 1}, X_{t}])

(7)

Figure 2.

Structure of attention mechanism and GRU network.

In these formulae, $W_{z}$ and $W_{r}$ are weight matrices, and σ is sigmoid function. In the process of forward training, the reset gate $r_{t}$ controls the combination of the input values $X_{t}$ and $h_{t - 1}$ . The update gate $z_{t}$ determines the current stage’s utilization of the previous stage’s information. After obtaining $z_{t}$ and $r_{t}$ , it is necessary to summarize the input and historical hidden layers, that is, to find the candidate hidden states $\tilde{h_{t}}$ . The calculation formula is as follows

\tilde{h_{t}} = \tanh (W_{h} \cdot [r_{t} * h_{t - 1}, X_{t}])

(8)

$h_{t}$ is the output of the current unit in the hidden layer and serves as the input in the next module. Moreover, the value of $h_{t}$ will also be returned to the attention layer and used as the input to obtain $X_{t}$ . The formula for $h_{t}$ is given as

h_{t} = (1 - z_{t}) * h_{t - 1} + z_{t} * \tilde{h_{t}}

(9)

Loss function $Los s_{RUL}$ is a non-negative function used to measure the difference between predicted values and actual values. In this paper, mean square error (MSE) is selected as the loss function and it is applied for reverse gradient optimization, so as to update the weights in equations (6)–(8). The calculation formula of the loss function is shown in equation (10)

Los s_{RUL} = \frac{1}{t} \sum_{i = 1}^{t} (Y_{i} - h_{i})^{2}

(10)

where Y is the actual life series. BPTT (back-propagation through time) algorithm is used in backward training, and $Los s_{RUL}$ is applied to optimize the weight in the network layer. Adam optimization function is exploited to optimize the gradient.

Addition of attention mechanism

Attention mechanism introduced in this paper is shown in Figure 2. Weights are calculated by multilayer perceptron. HIs at time t and previous output of GRU network $h_{t - 1}$ are taken as the input of the multilayer perceptron. Calculation formula of it is shown in equation (11). $V_{e}^{T}$ , $W_{e}$ , and $U_{e}$ are the neuron weights, and $b_{e}$ is the bias parameter

e_{t}^{n} = V_{e}^{T} \tanh (W_{e} h_{t - 1} + U_{e} {HI}_{t}^{n} + b_{e})

(11)

To quantify the influence degree of different factors on RUL at the current moment, normalization should be carried out. In this paper, SoftMax function is applied to normalize the weights and calculate the final output of the attention mechanism. The calculation formula is shown in equation (12)

a_{t}^{n} = \frac{\exp (e_{t}^{n})}{\sum_{i = 1}^{3} \exp (e_{t}^{i})}

(12)

The importance of each variable can be enhanced or weakened according to the weights. After that, the new series $X_{t}$ is obtained. The expression of $X_{t}$ is shown in equation (13)

X_{t} = (a_{t}^{1} {HI}_{t}^{1}, a_{t}^{2} {HI}_{t}^{2}, a_{t}^{3} {HI}_{t}^{3})

(13)

After training, parameters $W_{z}$ , $W_{r}$ , and $W_{h}$ in hidden layer and $V_{e}^{T}$ , $W_{e}$ , $U_{e}$ , and $b_{e}$ in attention mechanism are obtained. The predicted RUL and the input HIs satisfy the following functional relationship, where $F_{AMGRU}$ is the model after training

RUL = F_{AMGRU} (H I_{vib}, H I_{tem}, H I_{tor})

(14)

Fine-tuning in new tasks

In traditional artificial intelligence (AI)-based prediction approaches, models trained on old data tend to show poor performance on new tasks. For models which only utilize vibration signals to achieve the results of RUL, diverse feature distribution of data will lead to large deviation between the outputs and actual values. Combination of three categories of physical signals can attenuate the effect caused by excessive disparity of one certain variable’s feature distribution to some degree. To further improve the generalization ability of the proposed method and let it adapt to slewing bearings with various physical structures and degradation information, we come up with a fine-tuning strategy.

This strategy divides the model construction into two steps as demonstrated in Figure 1. The initial model trained with labeled data can be regarded as a pre-trained model. This measure can effectively reduce the training cost and obtain excellent prediction model in a short time. Modifying the loss function in training process is the critical part of fine-tuning strategy. Maximum mean discrepancy (MMD) is a novel kernel learning method in domain adaptation (Cao et al., 2021; Dong et al., 2023). The original data of the two domains are mapped to a regenerated Hilbert space, and the distance between them will be calculated after that. As a loss function, MMD is calculated by the following formula

Los s_{MMD} (χ_{train}, χ_{test}) = ‖ \frac{1}{n_{train}} \sum_{i = 1}^{n_{train}} ϕ (x_{train}^{i}) - \frac{1}{n_{test}} \sum_{i = 1}^{n_{test}} ϕ (x_{test}^{i}) ‖_{H}

(15)

where $χ_{train}$ and $χ_{test}$ represent outputs of training samples and test samples. ||▪ ${||}_{H}$ means regenerated kernel Hilbert space, and $ϕ$ (▪) is used to map the outputs into it. MMD loss function satisfies the requirement of improving the generalization ability of attention-MGRU. At the same time, it is necessary to restrict the accuracy between the training results and the labels to ensure no deviation of test results. Therefore, the final loss function combines $Los s_{MMD}$ and $Los s_{RUL}$ with a non-negative parameter λ to balance values of them. The specific calculation formula is present in equation (16)

Los s_{final} = Los s_{RUL} + λ Los s_{MMD}

(16)

It is worth explaining that the learning rate in this process requires to be reduced up to 0.1 times. The model is already convergent before fine-tuning, so a small learning rate does not take too much time. In addition, this action can bring the point closer to the local minimum when fine-tuning the model, which improves the model’s performance.

Experimental verification and discussion

In this section, raw data of three physical variables are provided by a run-to-fail experiment and HIs are constructed through the methods. Superiority and generalization of the proposed model will be deeply studied. The training environment is 12th Gen Intel (R) Core (TM) i7-12700 H @ 2.30 GHz, RAM 16.0 GB, NVIDIA RTX 3060, and pytorch 2.0.1.

Accelerated run-to-fail experiment of large slewing bearings

Actual life of slewing bearings can be more than 20 years, so traditional life experiment will consume too much time and cannot be completed in the laboratory. Accelerated experiment employs greater stress to speed up the process of failure and degradation without changing the failure mechanism, which can help to shorten the life cycle.

This accelerated run-to-fail experiment utilized a test bench independently developed by our group to examine the comprehensive performance of slewing bearings. The test bench mainly includes three parts: mechanical system, hydraulic system, and measurement and control system. The structure is shown in Figure 3. Hydraulic cylinders 1 and 2 provide axial force and overturning moment, while hydraulic cylinder 3 provides radial force.

Figure 3.

Information on the slewing bearing experiment.

The quality of signals collected by sensors is the premise to ensure the reliability of subsequent data analysis. Therefore, selection of sensors, number of measuring points, and the layout mode are significant. Figure 3 shows the installation of multiple types of sensors. To collect comprehensively, method of multi-measuring points is applied. a1–a4 are 4 capacitive vibration acceleration sensors with 90° of separation Angle. T1–T4 are four PT100 temperature sensors installed in the oil injection hole to monitor the temperature rise of lubricating oil. The sensors to measure vibration and temperature are mounted on a four-point contact slewing bearing. With the aggravation of the faults, the friction of the slewing bearings is increasing. So, one torque sensor is installed between the couplings to measure friction torque of the slewing bearing.

Test 1: Utilizing a group of experimental data for detailed analysis and discussion

The signals utilized in test 1 are shown in Figure 4(a)–(c). The horizontal coordinate in the figure represents time, and one period represents 1 day. The experiment provides data for the preprocessing of multiple physical signals. After feature screening, scores of each feature are shown in Table 2. Only vibration signals are analyzed in frequency and time–frequency domains. Superior features of vibration are as follows: $[X_{v}, X_{rms}, X_{k}, X_{am}, X_{ra}, X_{sf}, Msf, X_{imf}]$ , those of temperature are as follows: $[X_{rms}, X_{k}, X_{am}, X_{ra}, X_{sf}]$ , and those of torque are as follows: $[Xrm s_{k am ra {if}_{crf} {clf}_{\max}} []]$ . The results of MLE show that the intrinsic dimension of three physical signals is 1. After ISOMAP, $H I_{vib}$ , $H I_{tem}$ , and $H I_{tor}$ are obtained and each one is made up of 3000 points (1 point = 5 minutes) as presented in Figure 4(d)–(f).

Figure 4.

Raw data and HIs of slewing bearings: (a) Vibration signal, (b) Temperature signal, (c) Torque signal, (d) HI of vibration (e) HI of temperature and (f) HI of torque.

Table 2.

Scores of features.

Variable	1	2	3	4	5	6	7	8	9	10	11	12
Vibration	20.5	27.1	62.4	75.8	52.2	58.0	27.4	20.3	19.8	20.4	58.5	45.3
Temperature	11.9	11.9	14.7	11.9	16.5	18.0	27.2	12.2	12.2	12.2
Torque	12.1	13.1	16.2	13.5	16.5	16.6	14.7	18.7	20.1	18.0

Compared with the original vibration signals, $H I_{vib}$ reflects the actual degradation process of the slewing bearings more clearly. With the time going on, $H I_{vib}$ shows a rising trend and the curve can be divided into five stages: (1) initial run-in, (2) failure generating, (3) early recession, (4) intensification of recession, and (5) rapid failure. Although the curve fluctuates, it does not convert its stable decline trend with increase. On the contrary, the existence of fluctuation is more consistent with the high-frequency peculiarity of vibration signals in actual operation process. The interference of ambient factors is avoided because the approach to reveal temperature adopts absolute rise of temperature. At the same time, the temperature change belongs to the gradual process, which make it different from vibration and torque signals. So, the $H I_{tem}$ fluctuates little and does not have a significant mutation phenomenon. $H I_{tor}$ increases over time, and the curve reflects the change stages including initial run-in, failure initiation, rapid decline, and failure. When the recession worsens, the resistance moment of slewing bearings grows continuously, which is consistent with the mutation phenomenon at about 2500 points. Therefore, $H I_{vib}$ , $H I_{tem}$ , and $H I_{tor}$ are capable of effectively representing the essential process of slewing ring performance degradation.

To verify the superiority of the proposed model, this section compares the results of attention-MGRU with other methods, including GRU model based on vibration signal, GRU based on multiple physical signals (MGRU), bidirectional LSTM (Bi-LSTM), PSO-SVM, and causal convolutional neural network (CNN). Considering the comparison of degradation stages in Figure 4(d)–(f), it can be seen that points 2601–3000 show rapid degradation until failure. The values of HIs in this period are relatively high, and the growth rates have increased significantly. Therefore, the last 400 points of data are used as the test set to verify the predictive accuracy of the model. Table 3 gives the values of hyperparameters in this test. Reasonable hyperparameters can ensure the fairness of comparison. The sample size of training set is not large, so dropout ratio, window length, and batch size are equal to 0.1, 2, and 32, respectively, in each model. Learning rate is set to 0.01, but it should be reduced to half for attention-MGRU and Bi-LSTM. With more parameters, reducing the learning rate can avoid the drastic fluctuation of the loss function. After trials, models converge within 100 epochs. Therefore, training epoch for most models is 100. Causal CNN converges faster so its epoch is 50 to avoid overfitting. Figure 5 describes the methods’ results. Blue line shows the predicted results, and the red line shows the actual RUL.

Table 3.

Hyperparameters and errors in test 1.

Model	Learning rate	Training epoch	MAE	RMSE
Attention-MGRU	0.005	100	19.437	38.992
MGRU	0.01	100	38.019	43.669
GRU	0.01	100	87.351	101.935
Bi-LSTM	0.005	100	56.435	66.818
PSO-SVM	—	—	38.291	50.439
Causal CNN	0.01	50	47.798	60.312

Figure 5.

Results of test 1: (a) Results for Attention-MGRU, (b) Results for MGRU, (c) Results for GRU, (d) Results for Bi-LSTM, (e) Results for PSO-SVM and (f) Results for Causal CNN.

The prediction results based on attention-MGRU model demonstrate that this model owns obvious advantages over the other. RUL predicted by this approach shows small difference with the actual values. During the whole prediction period, the difference is diminutive and no obvious oscillation occurs. Due to the existence of attention mechanism, the internal relationship between HIs of multi-physical signals and RUL can be researched. This model considers the correlation factors of the remaining life from multiple angles, which effectively improves the accuracy and stability. MGRU prediction model takes $H I_{vib}$ , $H I_{tem}$ , and $H I_{tor}$ as inputs to make full use of damage information. It can be seen from the pictures that the deviation degree of RUL is smaller than that of GRU. Although large deviation also occurs from point 100 and point 300, the predictive errors are still lower. Taking information from multiple variables into account leads to this result. The prediction results for GRU and Bi-LSTM are similar. Bidirectional network structure is able to capture more information from data, so the deviation of Bi-LSTM is slightly smaller. In the middle of the forecast period, the coincidence between PSO-SVM’s curve and the curve of actual RUL is high, but large errors and fluctuations occur in the last 50 points. Causal CNN’s results are similar to PSO-SVM’s. These phenomena reflect that their predictive capacity is inadequate.

To compare the performance of each model more intuitively, mean absolute error (MAE) and root-mean-square error (RMSE) are used for comparison and measurement. Their formulae are as follows

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\hat{y}}_{i} |

(17)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

(18)

where n is the total number of points; $y_{i}$ is the actual RUL; and ${\hat{y}}_{i}$ is the RUL predicted by models. Among them, MAE is used to reflect the magnitude of results’ error and RMSE represents the dispersion of results. Table 3 presents models’ hyperparameters and indicators to evaluate results. Compared with other models, evaluation indicators of attention-MGRU are significantly reduced, which verifies the advantages of this model in predictive tasks of RUL. This method effectively solves the problem that the degradation information on slewing bearings cannot be obtained comprehensively and provides a new idea for the life forecasting of slewing bearings. Small fluctuation of the resulting curve also indicates that the proposed model has strong robustness.

Test 2: Validation and discussion through different data sets

The test performed above divides data into a training set and a test set, which is incapable to support that this method has good generalization. So, vibration and torque signals from other sensors are applied to validate this test. Training sets are the HI data obtained in test 1. New sets of data and HIs of them are shown in Figure 6(a)–(b). Then, parameters of initial model are modified according to the method proposed in “Build life prediction model” section. There is only one set of torque data, so the fine-tuning operation only brings in new vibration and temperature data. Figure 6(c)–(d) demonstrates the prediction results of two different models including values of RUL and the weights each variable accounting for. It can be perceived that the curve after fine-tuning is closer to actual line than the curve of original results from Figure 6(c), and the weights provided by attention layer transform slightly from Figure 6(d). Temperature and torque account for a greater proportion while vibration is the opposite.

Figure 6.

Raw signals, HIs and results of test 2: (a) Raw signals of test 2, (b) HIs of test 2, (c) Results for proposed method in test 2 and (d) Attention weights of test 2.

In addition, the proportion change of the temperature signal is minimal. These phenomena are consistent with the discrepancy of actual data, which validates the effectiveness of results. The values of error indicators of the initial model and modified model as well as other advanced methods are present in Table 4. RUL is normalized in test 2, so the evaluation indicators’ order of magnitude is different from test 1. Hyperparameters of models are the same as those of test 1. Models converge faster in fine-tuning process, so the additional epoch is set to 50. After fine-tuning, the prediction accuracy of attention-MGRU, GRU, and Bi-LSTM gets improved. This proves the effectiveness of fine-tuning. Moreover, compared with other models in the table, attention-MGRU (fine-tune) has the best precision.

Table 4.

Hyperparameters and errors in test 2.

Method	Learning rate	Training epoch	MAE	RMSE
Attention-MGRU	0.005	100	0.0690	0.0883
Attention-MGRU (fine-tune)	0.005	100 + 50	0.0348	0.0442
GRU	0.01	100	0.1863	0.2151
GRU (fine-tune)	0.01	100 + 50	0.0841	0.1109
Bi-LSTM	0.005	100	0.1655	0.1762
Bi-LSTM(fine-tune)	0.005	100 + 50	0.0742	0.0919
Causal CNN	0.01	50	0.2203	0.2598
PSO-SVM	—	—	0.1891	0.2067

Conclusion

Based on the traditional GRU model, an attention-MGRU prediction model is proposed in this paper. The internal correlation of multi-physical signals is captured by the attention mechanism. It can be seen that the approach proposed in this paper can effectively improve the predictive accuracy and stability according to the results. In addition, multiple physical variables ensure the rationality and robustness of the model. It performs well in cross-domain tasks with the aid of fine-tune, which demonstrates its ability of generalization. The method presented in this paper provides a new idea for the maintenance of rotary machinery in the future. In the next stage, we will try to apply this method in practice and do some research to improve model’s predictive ability under the condition of few shot.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (grant no. 51875273) and Suzhou Key Research and Development Industrialization Project (grant no. SGC2020064). Meanwhile, the authors would like to thank the editors and the anonymous reviewers for their helpful comments.

ORCID iDs

Yiyu Shao

Data availability statement

Data sharing is not applicable to this article as no data sets were generated or analyzed during this study.

References

Ahmad

Khan

Kim

(2018) A hybrid prognostics technique for rolling element bearings using adaptive predictive models. IEEE Transactions on Industrial Electronics 65(2): 1577–1584.

Bao

Wang

Chen

(2020) Life prediction of slewing bearing based on isometric mapping and fuzzy support vector regression. Transactions of the Institute of Measurement and Control 42(1): 94–103.

Caesarendra

Kosasih

Tieu

(2016) Acoustic emission-based condition monitoring methods: Review and application for low speed slew bearing. Mechanical Systems and Signal Processing 72(5): 134–159.

Cao

Jia

Ding

(2021) Transfer learning for remaining useful life prediction of multi-conditions bearings based on bidirectional-GRU network. Measurement 178(6): 1–14.

Cao

Jia

Ding

(2022) Incremental learning for remaining useful life prediction via temporal cascade broad learning system with newly acquired data. IEEE Transactions on Industrial Informatics 19(4): 6234–6245.

Ding

Jia

Ding

(2022) Intelligent machinery health prognostics under variable operation conditions with limited and variable-length data. Advanced Engineering Informatics 53(8): 101691.

Ding

Wang

Bao

(2019) HYGP-MSAM based model for slewing bearing residual useful life prediction. Measurement 141(1): 162–175.

Dong

Pei

(2021) Rolling bearing fault diagnosis method based on multilayer noise reduction technology and improved convolutional neural network. Journal of Engineering Mechanics 57(1): 148–156.

Dong

Xiao

(2023) Deep transfer learning based on Bi-LSTM and attention for remaining useful life prediction of rolling bearing. Reliability Engineering & System Safety 230(2): 1–12.

10.

Zhang

(2017) Fault diagnosis for roller bearing based on local mean decomposition and enhanced envelope spectrum. Journal of Vibration, Measurement & Diagnosis 37(2): 92–96.

11.

Zhang

Lin

(2021) Deep residual LSTM with domain-invariance for remaining useful life prediction across domains. Reliability Engineering & System Safety 216(12): 1–16.

12.

Guo

Jia

(2017) A recurrent neural network based health indicator for remaining useful life prediction of bearings. Neurocomputing 240(5): 98–109.

13.

Jiao

Peng

Dong

(2020) Fault monitoring and remaining useful life prediction framework for multiple fault modes in prognostics. Reliability Engineering and System Safety 203(11): 1–10.

14.

José

Jhon

Mauricio

(2020) Bearing health monitoring using relief-F-based feature relevance analysis and HMM. Applied Sciences 10(15): 1–17.

15.

Lei

Guo

(2018) Machinery health prognostics: A systematic review from data acquisition to RUL prediction. Mechanical Systems and Signal Processing 104(5): 799–834.

16.

Tang

Deng

(2021) Self-attention ConvLSTM and its application in RUL prediction of rolling bearings. IEEE Transactions on Instrumentation and Measurement 70(6): 3518811.

17.

Lei

(2022) A self-data-driven method for remaining useful life prediction of wind turbines considering continuously varying speeds. Mechanical Systems and Signal Processing 165(2): 1–13.

18.

Liu

Jin

Zuo

(2017) Time-frequency representation based on robust local mean decomposition for multicomponent AM-FM signal analysis. Mechanical Systems and Signal Processing 95(10): 468–487.

19.

Nguyen

Medjaher

(2021) An automated health indicator construction methodology for prognostics based on multi-criteria optimization. ISA Transactions 113(7): 81–96.

20.

Pan

Meng

Chen

(2020) A two-stage method based on extreme learning machine for predicting the remaining useful life of rolling-element bearings. Mechanical Systems and Signal Processing 144(10): 1–17.

21.

Ren

Sun

Cui

(2018) Bearing remaining useful life prediction based on deep autoencoder and deep neural networks. Journal of Manufacturing Systems 48(7): 71–77.

22.

Shao

Jiang

Zhang

(2018) Rolling bearing fault feature learning using improved convolutional deep belief network with compressed sensing. Mechanical Systems and Signal Processing 100(2): 743–765.

23.

Shao

Lin

Zhang

(2021) A novel approach of multisensory fusion to collaborative fault diagnosis in maintenance. Information Fusion 74(10): 65–76.

24.

She

Jia

(2021) A BiGRU method for remaining useful life prediction of machinery. Measurement 167(1): 108277.

25.

Singleton

Strangas

Aviyente

(2015) Extended kalman filtering for remaining-useful-life estimation of bearings. IEEE Transactions on Industrial Electronics 63(3): 1781–1790.

26.

Subrata

Abhishek

Rajsekhar

(2019) Artificial intelligence based gene expression programming (GEP) model prediction of Diesel engine performances and exhaust emissions under Diesosenol fuel strategies. Fuel 235(1): 317–325.

27.

Sun

Zhao

(2019) Deep transfer learning based on sparse autoencoder for remaining useful life prediction of tool in manufacturing. IEEE Transactions on Industrial Informatics 15(4): 2416–2425.

28.

Thoppil

Vasu

Rao

(2021) Deep learning algorithms for machinery health prognostics using time-series data: A review. Journal of Vibration Engineering & Technologies 9(3): 1123–1145.

29.

Zan

Liu

Wang

(2021) Prediction of performance deterioration of rolling bearing based on JADE and PSO-SVM. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 235(9): 1684–1697.

30.

Zhang

Peng

Yan

(2018) Long short-term memory for machine remaining life prediction. Journal of Manufacturing Systems 48(7): 78–86.

31.

Zhao

Yao

Deng

(2022) Normalized conditional variational auto-encoder with adaptive focal loss for imbalanced fault diagnosis of bearing-rotor system. Mechanical Systems and Signal Processing 170(5): 1–23.

32.

Zhu

Chen

Peng

(2019) Estimation of bearing remaining useful life based on multiscale convolutional neural network. IEEE Transactions on Industrial Electronics 66(4): 3208–3216.

33.

Zou

Zhao

Liu

(2021) The transfer prediction method of bearing remain use life based on dynamic benchmark. IEEE Transactions on Instrumentation and Measurement 70(10): 1–11.

34.

Žvokelj

Zupan

Prebil

(2016) EEMD-based multiscale ICA method for slewing bearing fault detection and diagnosis. Journal of Sound and Vibration 370(5): 394–423.