Multiobjective domain intelligent fault diagnosis of rotating machinery based on multitask soft-margin twin matrix machine

Abstract

The support matrix machine (SMM) has emerged as a powerful intelligent classification technique with broad applications in mechanical fault diagnosis. However, SMM and its derivative methods can only handle a single-objective recognition task at a time, which makes it difficult to balance multitask classification and tends to ignore the information shared between objectives. Therefore, a multitask failure pattern identification scheme based on the multitask soft-margin twin matrix machine (MTSTMM) is proposed. In MTSTMM, a soft margin is designed to improve flexibility in handling noisy samples and to raise the out-of-distribution generalization capacity of the model by increasing its tolerance for misclassification. Meanwhile, a regularized multitask learning weighting term is introduced to avoid matrix singularity during computation and model overfitting during training, and to make effective use of the implicit information shared between tasks. In addition, a kernel technique is used to fully exploit the structural information of matrix samples, which effectively addresses the slow convergence of traditional matrix classifiers. Two independent experiments demonstrate that the proposed MTSTMM exhibits stronger multitask classification ability in noisy environments.

Keywords

Fault diagnosis, multitask soft-margin twin matrix machine, support matrix machine, multitask learning, bearing

Introduction

Rolling bearings and gears, as two key components widely used in rotating machinery, play a crucial role in fields such as industrial production, transportation, and aerospace. However, because they operate for long periods in complex environments with high loads, high speeds, and variable working conditions, rolling bearings and gears are prone to failure, which can affect the normal operation of the entire system and even lead to serious economic losses and safety accidents. Therefore, conducting fault diagnosis research on typical components of rotating machinery is of great significance.^1–3

Traditional fault diagnosis methods mainly rely on manual experience, using senses such as hearing and touch to determine equipment status, which is not only inefficient but also makes accuracy and reliability difficult to guarantee. With the rapid advance of artificial intelligence technology, data-driven fault diagnosis approaches have increasingly become a focal point of research. Pattern recognition, as an important part of artificial intelligence, includes methods such as artificial neural networks,^4–6 random forests,^7,8 extreme learning machines,^9,10 and support vector machines (SVMs),^11–13 providing powerful tools for fault diagnosis. However, these methods require extracting a limited number of features from the original signal before classification, which may result in the loss of structural information in the original data. Therefore, the support matrix machine (SMM) and its improved algorithms based on matrix theory have been proposed.^14,15 SMM achieves a grouping effect through a spectral elastic net penalty, that is, an elastic net applied to the singular values of the weight matrix, thereby capitalizing on the inherent structural correlations within matrix-form data. To further enhance the performance of SMM, Zheng et al.¹⁶ proposed the sparse SMM (SSMM), which improves the sparsity of the model by defining new regularization terms. Wu et al.¹⁷ proposed the transfer least squares SMM, which addresses the shortage of labeled samples through transfer learning. Pan et al.¹⁸ put forward the symplectic hyperdisk matrix machine, which reduces the interference of noise in the original signal through the symplectic geometry similarity transform. Li et al.¹⁹ proposed the nonparallel least squares SMM (NPLSSMM), which reduces the computational complexity of the model by using a matrix-form squared loss term. However, SMM and its derivative methods model a single task and cannot utilize intertask correlation information when dealing with multitask classification.

Because single-objective learning methods have only limited feature data available, multitask learning (MTL) methods mine the information shared between tasks to obtain good learning outcomes for all tasks. Scholars have therefore combined SVM with MTL to further improve learning ability and achieve multitask classification. Zhang et al.²⁰ proposed a multitask support vector machine with pinball loss, which achieves insensitivity to noise and stability to resampling by maximizing the quantile distance of each task. Mei and Xu²¹ proposed the multitask least squares twin support vector machine (MTLS-TWSVM), which improves computational efficiency by using the least squares algorithm to avoid solving quadratic programming problems (QPPs). In addition, Liu and Xu²² proposed a multitask nonparallel support vector machine (MTNPSVM), which introduces a new loss term to avoid matrix inversion operations and reduce computational costs. Although these methods can effectively achieve multitask classification, they lack flexibility in handling noisy samples and do not sufficiently extract intertask correlation information.

Considering the importance of implicit and common information in multitask classification, as well as the interference of strong noise with hard classification boundaries, a novel fault diagnosis model based on the multitask soft-margin twin matrix machine (MTSTMM) is put forward. In MTSTMM, a soft margin is designed to enhance the flexibility of the model in handling noisy samples, which improves the generalization ability of the model by increasing its tolerance for misclassification. Meanwhile, matrix singularity during computation and model overfitting during training are avoided by introducing a regularized MTL weighting (RMTLW) term. In addition, by defining a kernel technique, the iterative algorithm of traditional matrix classifiers is replaced by the solution of QPPs, improving computational efficiency. This research makes three main contributions:

A soft-margin model is constructed to improve flexibility in handling noisy samples, which enhances the generalization ability of the model.

An RMTLW term is designed to avoid matrix singularity during computation and model overfitting during training.

MTSTMM is applied to fault datasets of rotating machinery, and the results show that MTSTMM has excellent multitask classification performance.

The remainder of this article is organized as follows. The second section provides background information. The third section introduces the proposed method. The fourth section verifies the classification performance of the proposed method through extensive experiments. The fifth section gives the conclusion.

Background

Support matrix machine

SMM is based on the principle of nuclear norm minimization and uses the sum of the singular values of the weight matrix as a convex surrogate for its rank. SMM takes matrix samples $\{Z_{i} | i = 1, 2, \dots, n\}$ as input and establishes the optimization problem shown in Equation (1):

\begin{matrix} \min_{W, b, η} \frac{1}{2} ‖ W ‖_{F}^{2} + τ ‖ W ‖_{*} + k \sum_{i = 1}^{n} η_{i} \\ \text{s.t.} \; y_{i} [\mathrm{tr} (W^{T} Z_{i}) + b] + η_{i} \geq 1, \; η_{i} \geq 0 \end{matrix}

(1)

where $W$ is the weight matrix, $b$ is the threshold, $η_{i}$ is the slack variable, $k$ is the loss coefficient, and $τ$ is the nuclear norm coefficient.

By solving the optimization function and obtaining the weight matrix and threshold, a decision function is constructed for unknown samples as shown in Equation (2):

\hat{y} = \mathrm{sign} [\mathrm{tr} (W^{T} \hat{Z}) + b]

(2)
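As an illustration of Equations (1) and (2), the following minimal sketch evaluates the SMM objective and the decision function with NumPy. It assumes the standard unit-margin hinge-loss form of SMM, in which the slack of each sample is $\max (0, 1 - y_{i} [tr (W^{T} Z_{i}) + b])$; the function names `smm_objective` and `smm_predict` are hypothetical and are not taken from the original work.

```python
import numpy as np

def smm_objective(W, b, Z, y, tau, k):
    """Evaluate 0.5*||W||_F^2 + tau*||W||_* + k*sum(slack) for samples Z of shape (n, p, q)."""
    scores = np.einsum("pq,ipq->i", W, Z) + b           # tr(W^T Z_i) + b for every sample
    slack = np.maximum(0.0, 1.0 - y * scores)           # assumed unit-margin hinge slack
    nuclear = np.linalg.svd(W, compute_uv=False).sum()  # nuclear norm = sum of singular values
    return 0.5 * np.sum(W * W) + tau * nuclear + k * slack.sum()

def smm_predict(W, b, Z_new):
    """Decision function of Equation (2): sign(tr(W^T Z) + b)."""
    return np.sign(np.einsum("pq,ipq->i", W, Z_new) + b)
```

The sketch only evaluates the objective; it does not solve the optimization problem, which the source attributes to an ADMM-based solver.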

Multitask nonparallel support vector machine

In the tth task, $a_{+ t}$ represents the positive class samples and $b_{- t}$ represents the negative class samples. All positive class samples are denoted by $a_{+}$ , all negative class samples are denoted by $b_{-}$ , and $e$ denotes an all-ones column vector of appropriate dimension. Then, define

\left\{ \begin{matrix} A_{t} = (a_{+ t}, e), & A = (a_{+}, e) \\ B_{t} = (b_{- t}, e), & B = (b_{-}, e) \end{matrix} \right.

(3)

MTNPSVM assumes that all tasks share the common information $u = (w_{+}; b_{+})$ and $v = (w_{-}; b_{-})$ , while $u_{t}$ and $v_{t}$ carry the individual information of each task. The objective functions are constructed as shown in Equations (4) and (5).

\begin{matrix} \min_{u, u_{t}, ζ_{t}, ζ_{t}^{*}, η_{t}} \frac{1}{2 T} \sum_{t = 1}^{T} ‖ u_{t} ‖^{2} + \frac{ρ_{1}}{2} ‖ u ‖^{2} + k_{1} \sum_{t = 1}^{T} e_{1 t}^{T} (ζ_{t} + ζ_{t}^{*}) + k_{2} \sum_{t = 1}^{T} e_{2 t}^{T} η_{t} \\ \text{s.t.} \; - ε e_{1 t} - ζ_{t}^{*} \leq A_{t} (u + u_{t}) \leq ε e_{1 t} + ζ_{t}, \\ - B_{t} (u + u_{t}) \geq e_{2 t} - η_{t}, \; η_{t}, ζ_{t}, ζ_{t}^{*} \geq 0, \\ t = 1, 2, \dots, T . \end{matrix}

(4)

\begin{matrix} \min_{v, v_{t}, η_{t}, η_{t}^{*}, ζ_{t}} \frac{1}{2 T} \sum_{t = 1}^{T} ‖ v_{t} ‖^{2} + \frac{ρ_{2}}{2} ‖ v ‖^{2} + k_{3} \sum_{t = 1}^{T} e_{2 t}^{T} (η_{t} + η_{t}^{*}) + k_{4} \sum_{t = 1}^{T} e_{1 t}^{T} ζ_{t} \\ \text{s.t.} \; - ε e_{2 t} - η_{t}^{*} \leq B_{t} (v + v_{t}) \leq ε e_{2 t} + η_{t}, \\ A_{t} (v + v_{t}) \geq e_{1 t} - ζ_{t}, \; ζ_{t}, η_{t}, η_{t}^{*} \geq 0, \\ t = 1, 2, \dots, T . \end{matrix}

(5)

where $ρ_{1}$ and $ρ_{2}$ are adjustment parameters, $k_{i} (i = 1, 2, 3, 4)$ are the loss coefficients, $η_{t}^{(*)}$ and $ζ_{t}^{(*)}$ are the slack variables, and $T$ is the number of tasks.

By solving the above two QPPs, the weight and threshold of each task can be determined. Finally, the class label of the unknown sample for the tth task is determined by the decision function Equation (6).

\mathrm{class} \; i = \arg \min_{r = +, -} | x^{T} w_{r} + b_{r} |

(6)

Although SMM capitalizes on the inherent structural correlations within matrix-form data, it is sensitive to noise, and solving it with the alternating direction method of multipliers (ADMM) results in long training times. MTNPSVM can achieve multitask classification, but it does not sufficiently extract the implicit information between tasks and lacks flexibility in handling noisy samples. Therefore, a new classification method is proposed that addresses the shortcomings of SMM and MTNPSVM while also offering superior classification performance.

The proposed method

Multitask soft-margin twin matrix machine

In this section, a multitask matrix classifier is proposed, and its classification principle is shown in Figure 1.

Figure 1.

The classification principle of MTSTMM.

Assume there are T distinct yet interrelated diagnostic tasks. The sample of each task is $X_{t} \in R^{m \times n}$ , and $y \in \{1, 2, \dots, z\}$ is the corresponding label. Suppose that all tasks share the common information $[w_{0}^{+}, b_{0}^{+}]$ and $[w_{0}^{-}, b_{0}^{-}]$ , and that the task-specific information of the tth task is $[w_{t}^{+}, b_{t}^{+}]$ and $[w_{t}^{-}, b_{t}^{-}]$ . First, a kernel technique is designed to process matrix samples, defined as follows:

Definition: Assume there are two matrix sample datasets, X and Y, and let $x_{i}, y_{j} \in R^{m \times n}, i, j \in \{1, 2, \dots, N\}$ be matrix samples from these two datasets, respectively. The kernel technique is defined in Equation (7), where k is the nonlinear offset and d is the exponent.

K_{ij} (x_{i}, y_{j}) = (x_{i}^{T} y_{j} + k)^{d}

(7)

Thus, the kernel matrices in Equation (8) are constructed, where $A_{t}$ represents the positive class samples of the tth task and $A$ represents the positive class samples of all tasks; $B_{t}$ and $B$ represent the negative class samples of the tth task and of all tasks, respectively; and $C = [A; B]$ .

\left\{ \begin{matrix} L = K (A, C), & L_{t} = K (A_{t}, C) \\ M = K (B, C), & M_{t} = K (B_{t}, C) \end{matrix} \right.

(8)
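The construction in Equations (7) and (8) can be sketched as follows. Because $x_{i}^{T} y_{j}$ is itself a matrix for matrix samples, this sketch assumes that the scalar inner product is the trace $tr (x_{i}^{T} y_{j})$ ; that choice, and the helper names `matrix_poly_kernel` and `kernel_blocks`, are assumptions made for illustration and are not stated in the source.

```python
import numpy as np

def matrix_poly_kernel(X, Y, k=1.0, d=4):
    """Gram matrix with entries (tr(x_i^T y_j) + k)^d for X of shape (p, m, n), Y of shape (q, m, n)."""
    inner = np.einsum("imn,jmn->ij", X, Y)  # trace inner product (assumed reading of x_i^T y_j)
    return (inner + k) ** d

def kernel_blocks(A_tasks, B_tasks, k=1.0, d=4):
    """Build L, M and the per-task blocks L_t, M_t of Equation (8) from per-task sample arrays."""
    A = np.concatenate(A_tasks)             # positive samples of all tasks
    B = np.concatenate(B_tasks)             # negative samples of all tasks
    C = np.concatenate([A, B])              # C = [A; B]
    L = matrix_poly_kernel(A, C, k, d)
    M = matrix_poly_kernel(B, C, k, d)
    L_tasks = [matrix_poly_kernel(A_t, C, k, d) for A_t in A_tasks]
    M_tasks = [matrix_poly_kernel(B_t, C, k, d) for B_t in B_tasks]
    return L, M, L_tasks, M_tasks
```

With this layout, the rows of `L` and `M` are the per-task blocks stacked in task order, so `L` equals the vertical concatenation of the entries of `L_tasks`.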

In addition, to make the model insensitive to noise and sparse, a sparse ball constraint term is introduced. Its definition is shown as Equation (9). Under these constraints, the model has more flexible classification performance when dealing with outliers or noise. Besides, MTSTMM elevates the model’s out-of-distribution generalization capacity by increasing the tolerance for misclassification, and the classification model of MTSTMM is shown in Figure 2.

L_{τ}^{ε} (h) = \left\{ \begin{matrix} h - ε, & h > ε, \\ 0, & - \frac{ε}{τ} \leq h \leq ε, \\ - τ (h + \frac{ε}{τ}), & h < - \frac{ε}{τ} . \end{matrix} \right.

(9)

where $τ$ is the parameter of the constraint term.
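The piecewise definition in Equation (9) translates directly into a vectorized function. The sketch below is only an illustration of the formula; the function name `constraint_loss` is hypothetical.

```python
import numpy as np

def constraint_loss(h, eps, tau):
    """Piecewise term of Equation (9): linear above eps, zero on [-eps/tau, eps], slope -tau below."""
    h = np.asarray(h, dtype=float)
    upper = h - eps                  # branch for h > eps
    lower = -tau * (h + eps / tau)   # branch for h < -eps/tau
    return np.where(h > eps, upper, np.where(h < -eps / tau, lower, 0.0))
```

Every value of h inside the flat zone contributes nothing, which is the source of the sparsity discussed in the text.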

Figure 2.

The classification model of MTSTMM.

Because the gradient of $L_{τ}^{ε} (h)$ is zero on the interval $[- \frac{ε}{τ}, ε]$ , the model exhibits a certain degree of sparsity. Finally, the objective functions of MTSTMM are constructed as follows:

\begin{matrix} \min_{w_{0, t}^{+}, b_{0, t}^{+}, η_{t}} \frac{1}{2} ‖ L w_{0}^{+} + e b_{0}^{+} ‖^{2} + \frac{1}{2 T} \sum_{t = 1}^{T} ‖ L_{t} w_{t}^{+} + e_{t} b_{t}^{+} ‖^{2} + \frac{c_{3}}{2} [‖ w_{0}^{+} ‖^{2} + {b_{0}^{+}}^{2} + \frac{1}{T} \sum_{t = 1}^{T} (‖ w_{t}^{+} ‖^{2} + {b_{t}^{+}}^{2})] + c_{1} \sum_{t = 1}^{T} e_{t}^{T} η_{t} \\ \text{s.t.} \; - [M_{t} (w_{0}^{+} + w_{t}^{+}) + e_{t} (b_{0}^{+} + b_{t}^{+})] + η_{t} + e_{t} ε \geq e_{t}, \\ - [M_{t} (w_{0}^{+} + w_{t}^{+}) + e_{t} (b_{0}^{+} + b_{t}^{+})] \leq e_{t} + \frac{η_{t}}{τ} + e_{t} \frac{ε}{τ}, \\ η_{t} \geq 0 \end{matrix}

(10)

and

\begin{matrix} \min_{w_{0, t}^{-}, b_{0, t}^{-}, η_{t}} \frac{1}{2} ‖ M w_{0}^{-} + e b_{0}^{-} ‖^{2} + \frac{1}{2 T} \sum_{t = 1}^{T} ‖ M_{t} w_{t}^{-} + e_{t} b_{t}^{-} ‖^{2} + \frac{c_{4}}{2} [‖ w_{0}^{-} ‖^{2} + {b_{0}^{-}}^{2} + \frac{1}{T} \sum_{t = 1}^{T} (‖ w_{t}^{-} ‖^{2} + {b_{t}^{-}}^{2})] + c_{2} \sum_{t = 1}^{T} e_{t}^{T} η_{t} \\ \text{s.t.} \; L_{t} (w_{0}^{-} + w_{t}^{-}) + e_{t} (b_{0}^{-} + b_{t}^{-}) + η_{t} + e_{t} ε \geq e_{t}, \\ L_{t} (w_{0}^{-} + w_{t}^{-}) + e_{t} (b_{0}^{-} + b_{t}^{-}) \leq e_{t} + \frac{η_{t}}{τ} + e_{t} \frac{ε}{τ}, \\ η_{t} \geq 0 \end{matrix}

(11)

where $η_{t}$ are the slack variables, $c_{1}$ and $c_{2}$ are loss coefficients, $\frac{c_{3}}{2} [‖ w_{0}^{+} ‖^{2} + {b_{0}^{+}}^{2} + \frac{1}{T} \sum_{t = 1}^{T} (‖ w_{t}^{+} ‖^{2} + {b_{t}^{+}}^{2})]$ and $\frac{c_{4}}{2} [‖ w_{0}^{-} ‖^{2} + {b_{0}^{-}}^{2} + \frac{1}{T} \sum_{t = 1}^{T} (‖ w_{t}^{-} ‖^{2} + {b_{t}^{-}}^{2})]$ are the RMTLW terms, and $c_{3}$ and $c_{4}$ are weighting factors.

Next, Equations (10) and (11) are solved separately. First, the Lagrange multiplier method is applied to Equation (10), and the Lagrange function is constructed as shown in Equation (12).

\begin{matrix} L (w_{0, t}^{+}, b_{0, t}^{+}, η_{t}, α_{1 t}, β_{1 t}, γ_{1 t}) = \frac{1}{2} ‖ L w_{0}^{+} + e b_{0}^{+} ‖^{2} + \frac{1}{2 T} \sum_{t = 1}^{T} ‖ L_{t} w_{t}^{+} + e_{t} b_{t}^{+} ‖^{2} + \frac{c_{3}}{2} [‖ w_{0}^{+} ‖^{2} + {b_{0}^{+}}^{2} \\ + \frac{1}{T} \sum_{t = 1}^{T} (‖ w_{t}^{+} ‖^{2} + {b_{t}^{+}}^{2})] + c_{1} \sum_{t = 1}^{T} e_{t}^{T} η_{t} - \sum_{t = 1}^{T} α_{1 t}^{T} [- [M_{t} (w_{0}^{+} + w_{t}^{+}) + e_{t} (b_{0}^{+} + b_{t}^{+})] + η_{t} + e_{t} (ε - 1)] \\ - \sum_{t = 1}^{T} β_{1 t}^{T} [M_{t} (w_{0}^{+} + w_{t}^{+}) + e_{t} (b_{0}^{+} + b_{t}^{+}) + \frac{η_{t}}{τ} + e_{t} (1 + \frac{ε}{τ})] - \sum_{t = 1}^{T} γ_{1 t}^{T} η_{t} \end{matrix}

(12)

where $α_{1} = [α_{11}^{T}, α_{12}^{T}, \dots, α_{1 T}^{T}]^{T}$ , $β_{1} = [β_{11}^{T}, β_{12}^{T}, \dots, β_{1 T}^{T}]^{T}$ , and $γ_{1} = [γ_{11}^{T}, γ_{12}^{T}, \dots, γ_{1 T}^{T}]^{T}$ are nonnegative Lagrange multipliers. Equations (13) and (14) can be obtained from the Karush–Kuhn–Tucker (KKT) conditions:

\left\{ \begin{matrix} \frac{\partial L}{\partial w_{0}^{+}} = L^{T} (L w_{0}^{+} + e b_{0}^{+}) + c_{3} w_{0}^{+} + M^{T} α_{1} - M^{T} β_{1} = 0 \\ \frac{\partial L}{\partial b_{0}^{+}} = e^{T} (L w_{0}^{+} + e b_{0}^{+}) + c_{3} b_{0}^{+} + e^{T} α_{1} - e^{T} β_{1} = 0 \\ \frac{\partial L}{\partial w_{t}^{+}} = \frac{1}{T} L_{t}^{T} (L_{t} w_{t}^{+} + e_{t} b_{t}^{+}) + \frac{c_{3}}{T} w_{t}^{+} + M_{t}^{T} α_{1 t} - M_{t}^{T} β_{1 t} = 0 \\ \frac{\partial L}{\partial b_{t}^{+}} = \frac{1}{T} e_{t}^{T} (L_{t} w_{t}^{+} + e_{t} b_{t}^{+}) + \frac{c_{3}}{T} b_{t}^{+} + e_{t}^{T} α_{1 t} - e_{t}^{T} β_{1 t} = 0 \end{matrix} \right.

(13)

\left\{ \begin{matrix} \frac{\partial L}{\partial η_{t}} = c_{1} e_{t} - α_{1 t} - \frac{β_{1 t}}{τ} - γ_{1 t} = 0 \\ α_{1 t}^{T} [- [M_{t} (w_{0}^{+} + w_{t}^{+}) + e_{t} (b_{0}^{+} + b_{t}^{+})] + η_{t} + e_{t} (ε - 1)] = 0 \\ β_{1 t}^{T} [M_{t} (w_{0}^{+} + w_{t}^{+}) + e_{t} (b_{0}^{+} + b_{t}^{+}) + \frac{η_{t}}{τ} + e_{t} (1 + \frac{ε}{τ})] = 0 \\ γ_{1 t}^{T} η_{t} = 0 \end{matrix} \right.

(14)

Then, Equation (15) can be obtained from Equation (13):

\left\{ \begin{matrix} ([L \; e]^{T} [L \; e] + c_{3} E) [w_{0}^{+}; b_{0}^{+}] + [M \; e]^{T} (α_{1} - β_{1}) = 0 \\ (\frac{1}{T} [L_{t} \; e_{t}]^{T} [L_{t} \; e_{t}] + \frac{c_{3}}{T} E_{t}) [w_{t}^{+}; b_{t}^{+}] + [M_{t} \; e_{t}]^{T} (α_{1 t} - β_{1 t}) = 0 \end{matrix} \right.

(15)

For convenience, the following substitutions are made

λ_{1} = α_{1} - β_{1}

(16)

\left\{ \begin{matrix} L^{*} = [L \; e], & L_{t}^{*} = [L_{t} \; e_{t}] \\ M^{*} = [M \; e], & M_{t}^{*} = [M_{t} \; e_{t}] \\ u = [w_{0}^{+}; b_{0}^{+}], & u_{t} = [w_{t}^{+}; b_{t}^{+}] \\ v = [w_{0}^{-}; b_{0}^{-}], & v_{t} = [w_{t}^{-}; b_{t}^{-}] \end{matrix} \right.

(17)

Equation (15) can be rewritten as Equation (18).

\left\{ \begin{matrix} ({L^{*}}^{T} L^{*} + c_{3} E) u + {M^{*}}^{T} λ_{1} = 0 \\ (\frac{1}{T} {L_{t}^{*}}^{T} L_{t}^{*} + \frac{c_{3}}{T} E_{t}) u_{t} + {M_{t}^{*}}^{T} λ_{1 t} = 0 \end{matrix} \right.

(18)

Because $γ_{1 t} \geq 0$ , the first condition in Equation (14) gives $α_{1 t} + \frac{β_{1 t}}{τ} \leq c_{1} e_{t}$ ; and $β_{1} = α_{1} - λ_{1}$ follows from Equation (16). By combining the KKT conditions and substituting Equation (18) into Equation (12), the dual problem of Equation (10) is obtained as follows:

\begin{matrix} \max_{λ_{1}, α_{1}} - \frac{1}{2} λ_{1}^{T} [M^{*} ({L^{*}}^{T} L^{*} + c_{3} E)^{- 1} {M^{*}}^{T} + G] λ_{1} + λ_{1}^{T} e (1 + \frac{ε}{τ}) - α_{1}^{T} e (\frac{ε}{τ} + ε) \\ \text{s.t.} \; α_{1} (1 + \frac{1}{τ}) - \frac{λ_{1}}{τ} \leq c_{1} e, \; α_{1} \geq 0, \; α_{1} - λ_{1} \geq 0 . \end{matrix}

(19)

where $G = blkdiag (G_{1}, G_{2}, \dots, G_{T})$ and $G_{t} = M_{t}^{*} (\frac{1}{T} {L_{t}^{*}}^{T} L_{t}^{*} + \frac{c_{3}}{T} E_{t})^{- 1} {M_{t}^{*}}^{T}$ . The transposes are placed so that each product is dimensionally consistent with Equation (18), in which $λ_{1}$ is multiplied by ${M^{*}}^{T}$ .
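As a worked check, the box constraint and the linear terms of Equation (19) can be traced back to the Lagrangian. The short derivation below assumes the constraint grouping of Equation (10), with the second constraint multiplied by $β_{1 t}$ in the form $M_{t} (\cdot) + e_{t} (\cdot) + \frac{η_{t}}{τ} + e_{t} (1 + \frac{ε}{τ}) \geq 0$ ; it is a sketch of the intermediate algebra, not a reproduction of the original derivation.

```latex
% Stationarity in \eta_t together with \gamma_{1t} \ge 0 gives the box constraint:
c_1 e_t - \alpha_{1t} - \frac{\beta_{1t}}{\tau} = \gamma_{1t} \ge 0
\;\Rightarrow\;
\alpha_{1t} + \frac{\beta_{1t}}{\tau} \le c_1 e_t .

% Substituting \beta_1 = \alpha_1 - \lambda_1 from Equation (16):
\alpha_1 \left( 1 + \frac{1}{\tau} \right) - \frac{\lambda_1}{\tau} \le c_1 e .

% Constant terms of the Lagrangian after the same substitution:
- \alpha_1^{T} e (\varepsilon - 1) - \beta_1^{T} e \left( 1 + \frac{\varepsilon}{\tau} \right)
= \lambda_1^{T} e \left( 1 + \frac{\varepsilon}{\tau} \right)
- \alpha_1^{T} e \left( \frac{\varepsilon}{\tau} + \varepsilon \right) .
```

The terms in $η_{t}$ cancel because their coefficient is exactly the stationarity condition above.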

Similarly, the dual problem of Equation (11) can be obtained in Equation (20).

\begin{matrix} \max_{λ_{2}, α_{2}} - \frac{1}{2} λ_{2}^{T} [L^{*} ({M^{*}}^{T} M^{*} + c_{4} E)^{- 1} {L^{*}}^{T} + H] λ_{2} + λ_{2}^{T} e (1 + \frac{ε}{τ}) - α_{2}^{T} e (\frac{ε}{τ} + ε) \\ \text{s.t.} \; α_{2} (1 + \frac{1}{τ}) - \frac{λ_{2}}{τ} \leq c_{2} e, \; α_{2} \geq 0, \; α_{2} - λ_{2} \geq 0 . \end{matrix}

(20)

where $H = blkdiag (H_{1}, H_{2}, \dots, H_{T})$ and $H_{t} = L_{t}^{*} (\frac{1}{T} {M_{t}^{*}}^{T} M_{t}^{*} + \frac{c_{4}}{T} E_{t})^{- 1} {L_{t}^{*}}^{T}$ .
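The quadratic term of the dual problems and the recovery of the primal variables through Equation (18) can be sketched as follows. The sketch assumes that the rows of $M^{*}$ are the per-task blocks $M_{t}^{*}$ stacked in task order, and it places the transposes so that the products are dimensionally consistent with Equation (18). The function names are hypothetical, and the QPP itself is left to a generic solver that is not shown.

```python
import numpy as np

def _task_system(L_t, c3, T):
    """Task-level matrix of Equation (18): (1/T) L_t*^T L_t* + (c3/T) E_t."""
    return (L_t.T @ L_t + c3 * np.eye(L_t.shape[1])) / T

def dual_hessian(L_star, M_star, L_tasks, M_tasks, c3):
    """Matrix M*(L*^T L* + c3 E)^-1 M*^T + blkdiag(G_1, ..., G_T) of the dual quadratic term."""
    T = len(L_tasks)
    Q = L_star.T @ L_star + c3 * np.eye(L_star.shape[1])
    H = M_star @ np.linalg.solve(Q, M_star.T)   # shared part
    start = 0
    for L_t, M_t in zip(L_tasks, M_tasks):      # add each G_t on its diagonal block
        stop = start + M_t.shape[0]
        H[start:stop, start:stop] += M_t @ np.linalg.solve(_task_system(L_t, c3, T), M_t.T)
        start = stop
    return H

def recover_primal(L_star, M_star, L_tasks, M_tasks, lam_tasks, c3):
    """Solve Equation (18) for u and every u_t given the per-task multipliers lambda_t."""
    T = len(L_tasks)
    Q = L_star.T @ L_star + c3 * np.eye(L_star.shape[1])
    u = -np.linalg.solve(Q, M_star.T @ np.concatenate(lam_tasks))
    u_tasks = [-np.linalg.solve(_task_system(L_t, c3, T), M_t.T @ lam_t)
               for L_t, M_t, lam_t in zip(L_tasks, M_tasks, lam_tasks)]
    return u, u_tasks
```

The same two functions cover Equation (20) by swapping the roles of the L and M blocks and passing $c_{4}$ in place of $c_{3}$ . Using `np.linalg.solve` instead of an explicit inverse is a numerical design choice of this sketch; the regularization by $c_{3} E$ is what keeps the systems nonsingular.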

Finally, the optimal solution of the optimization function is determined by solving two QPPs. For the tth task, the predicted label of the unknown sample is determined by Equation (21).

\mathrm{class} \; z = \arg \min_{r = 1, 2} | K (\hat{x}, C)^{T} w_{r} + b_{r} |

(21)
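The decision rule of Equation (21) assigns a sample to the class whose hyperplane is closer. The sketch below assumes that $w_{r}$ and $b_{r}$ already combine the shared and task-specific parts (for example $w_{0}^{+} + w_{t}^{+}$ and $b_{0}^{+} + b_{t}^{+}$ ) and that ties go to class 1; the function name `predict_task` and the tie rule are illustrative assumptions.

```python
import numpy as np

def predict_task(k_row, w_1, b_1, w_2, b_2):
    """Return 1 or 2 according to argmin_r |K(x_hat, C)^T w_r + b_r| for one kernel row."""
    d_1 = abs(float(np.dot(k_row, w_1)) + b_1)  # distance-like score to hyperplane 1
    d_2 = abs(float(np.dot(k_row, w_2)) + b_2)  # distance-like score to hyperplane 2
    return 1 if d_1 <= d_2 else 2
```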

Noise insensitivity analysis

To demonstrate the robustness of the proposed method to noise, the constraint term is analyzed further. The subgradient of Equation (9) is given in Equation (22):

{sgn}_{τ}^{ε} (h) = \left\{ \begin{matrix} 1, & h > ε, \\ [0, 1], & h = ε, \\ 0, & - \frac{ε}{τ} < h < ε, \\ [- τ, 0], & h = - \frac{ε}{τ}, \\ - τ, & h < - \frac{ε}{τ} . \end{matrix} \right.

(22)

Since Equation (10) has an optimal solution, the optimality condition for the tth task can be written as Equation (23):

\vec{0} \in L_{t}^{T} (L_{t} w^{+} + e_{t} b^{+}) + w^{+} + c_{1} \sum_{j = 1}^{n} {sgn}_{τ}^{ε} (1 + ({w^{+}}^{T} x_{j} + b^{+})) x_{j}

(23)

where $\vec{0}$ is a vector with all elements being zero, $x_{j}$ is the jth negative class sample in task t, $w^{+} = w_{0}^{+} + w_{t}^{+}$ , and $b^{+} = b_{0}^{+} + b_{t}^{+}$ .

Next, the index set can be divided into five sets as follows:

\left\{ \begin{matrix} H_{1}^{w^{+}, b^{+}} = \{j : 1 + ({w^{+}}^{T} x_{j} + b^{+}) > ε\}, \\ H_{2}^{w^{+}, b^{+}} = \{j : 1 + ({w^{+}}^{T} x_{j} + b^{+}) = ε\}, \\ H_{3}^{w^{+}, b^{+}} = \{j : - \frac{ε}{τ} < 1 + ({w^{+}}^{T} x_{j} + b^{+}) < ε\}, \\ H_{4}^{w^{+}, b^{+}} = \{j : 1 + ({w^{+}}^{T} x_{j} + b^{+}) = - \frac{ε}{τ}\}, \\ H_{5}^{w^{+}, b^{+}} = \{j : 1 + ({w^{+}}^{T} x_{j} + b^{+}) < - \frac{ε}{τ}\} . \end{matrix} \right.

(24)

From Equation (22), it can be seen that the subgradient on set $H_{3}^{w^{+}, b^{+}}$ is zero; therefore, $H_{3}^{w^{+}, b^{+}}$ directly affects the sparsity of the model, and $H_{3}^{w^{+}, b^{+}}$ depends on $ε$ . When $ε = 0$ , the model loses its sparsity property. Assuming the existence of $σ_{j} \in [0, 1]$ and $φ_{j} \in [- τ, 0]$ , Equation (23) can be rewritten as Equation (25):

\begin{matrix} \frac{1}{c_{1}} [L_{t}^{T} (L_{t} w^{+} + e_{t} b^{+}) + w^{+}] + \sum_{j \in H_{1}^{w^{+}, b^{+}}} x_{j} + \sum_{j \in H_{2}^{w^{+}, b^{+}}} σ_{j} x_{j} + \\ \sum_{j \in H_{4}^{w^{+}, b^{+}}} φ_{j} x_{j} - τ \sum_{j \in H_{5}^{w^{+}, b^{+}}} x_{j} = 0 \end{matrix}

(25)

If the value of $ε$ is fixed, then $τ$ determines the number of samples in the five sets. The sets $H_{1}^{w^{+}, b^{+}}$ , $H_{3}^{w^{+}, b^{+}}$ , and $H_{5}^{w^{+}, b^{+}}$ contain more samples than the boundary sets $H_{2}^{w^{+}, b^{+}}$ and $H_{4}^{w^{+}, b^{+}}$ . When $τ \to 0$ , a large number of samples are located in the set $H_{5}^{w^{+}, b^{+}}$ , which leaves fewer samples in the other sets and makes the results sensitive to noise. When $τ$ gets larger, all five sets contain many samples, making the results less sensitive to noise.
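The partition in Equation (24) can be reproduced numerically for a given vector of values $h_{j} = 1 + ({w^{+}}^{T} x_{j} + b^{+})$ . The sketch below is an illustration only; the function name `partition_index_sets` and the tolerance `tol` used to detect the two equality cases in floating point are assumptions.

```python
import numpy as np

def partition_index_sets(h, eps, tau, tol=1e-9):
    """Split sample indices into H1..H5 of Equation (24) from h_j = 1 + (w^T x_j + b)."""
    h = np.asarray(h, dtype=float)
    low = -eps / tau                                   # left end of the zero-subgradient zone
    return {
        "H1": np.flatnonzero(h > eps + tol),
        "H2": np.flatnonzero(np.abs(h - eps) <= tol),  # boundary case h = eps
        "H3": np.flatnonzero((h > low + tol) & (h < eps - tol)),
        "H4": np.flatnonzero(np.abs(h - low) <= tol),  # boundary case h = -eps/tau
        "H5": np.flatnonzero(h < low - tol),
    }
```

Counting the entries of each set for several values of $τ$ on a trained model gives a direct way to inspect how the samples are distributed across the five sets.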

Experimental analysis

To verify the effectiveness of the MTSTMM method, experiments are conducted using the rolling bearing and gear datasets from Shandong University of Science and Technology (SDUST) and the datasets from Anhui University of Technology (AHUT). In addition, several assessment criteria are introduced to evaluate the classification performance of MTSTMM, namely accuracy (Acc), kappa (Kap), sensitivity (Sens), specificity (Spec), area under the curve (AUC), and G-mean (G-m). Figure 3 shows the modeling flowchart of MTSTMM. To display its diagnostic performance more clearly, MTSTMM is compared with multitask classification methods, namely the multitask twin bounded support vector machine (MT-TBSVM),²³ MTNPSVM, and MTLS-TWSVM, as well as single-task classification methods, namely SSMM, NPLSSMM, and dual path convolution with attention mechanism-bidirectional gated recurrent unit (DCA-BiGRU).²⁴

Figure 3.

The modeling flowchart of MTSTMM.
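Most of the assessment criteria listed above can be computed from a multiclass confusion matrix. The sketch below assumes macro-averaged one-vs-rest sensitivity and specificity and a G-mean defined as the square root of their product; these definitions are assumptions, since the text names the criteria without defining them. AUC is omitted because it requires classifier scores rather than hard labels.

```python
import numpy as np

def classification_metrics(cm):
    """Acc, Cohen's kappa, macro Sens/Spec, and G-mean from a confusion matrix (rows = true labels)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # missed samples of each true class
    fp = cm.sum(axis=0) - tp          # samples wrongly assigned to each class
    tn = n - tp - fn - fp
    acc = tp.sum() / n
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2  # chance agreement
    kappa = (acc - expected) / (1.0 - expected)
    sens = np.mean(tp / (tp + fn))
    spec = np.mean(tn / (tn + fp))
    return {"Acc": acc, "Kap": kappa, "Sens": sens, "Spec": spec, "G-m": np.sqrt(sens * spec)}
```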

Because both the proposed method and the single-task classifiers (excluding DCA-BiGRU) are matrix classifiers, matrix datasets need to be constructed. Moreover, signal acquisition is easily affected by noise. To diminish the detrimental effect of noise on the classification outcomes, the symplectic geometry similarity transform (SGST)¹⁸ is introduced to preprocess the initial signal. The fault diagnosis flowchart based on MTSTMM is shown in Figure 4.

Figure 4.

Fault diagnosis flowchart based on MTSTMM.

Experiment 1

To verify the classification performance of MTSTMM, multitask datasets are constructed using the rolling bearing and gear dataset of SDUST, with specific parameters shown in Tables 1 and 2. The experimental components of this dataset are 6205 bearings and planetary gearboxes (including sun gears and planetary gears), with a sampling time of 40 s. The experimental platform is shown in Figure 5. Two hundred samples are constructed for each fault category, of which 100 samples form the training set and the remaining 100 form the testing set.

Table 1.

Gear dataset for task 1.

Speed	Load	Fault type
2000 rpm	0.5 A	Normal state (NS)
		Sun pitting (SP)
		Sun fracture (SF)
		Sun wear (SW)
		Planetary pitting (PP)
		Planetary fracture (PF)
		Planetary wear (PW)

Table 2.

Bearing dataset for task 2.

Speed	Load	Fault type
3000 rpm	40 N	Normal state (NS)
		Inner ring fault 0.2 (IF1)
		Inner ring fault 0.4 (IF2)
		Inner ring fault 0.6 (IF3)
		Outer ring fault 0.2 (OF1)
		Outer ring fault 0.4 (OF2)
		Outer ring fault 0.6 (OF3)

Figure 5.

SDUST test bench. SDUST: Shandong University of Science and Technology.

The parameters $c_{1}, c_{2}, c_{3}, c_{4}, ε$ and $τ$ directly affect the classification performance of the proposed method, so appropriate values need to be set before model training. For convenience, let $c_{1} = c_{2}$ and $c_{3} = c_{4}$ , with values chosen from the set $\{10^{i} : i = - 5, - 4, \dots, 3\}$ . Because $τ$ and $ε$ mainly affect the sensitivity of the model to noise, and the original signal has already been denoised using the SGST method, $τ = 0.01$ and $ε = 0.01$ are fixed here. $c_{3}$ is the weighting factor between structural risk and empirical risk; it avoids singular matrices in the computation and directly affects classification accuracy. $c_{1}$ is the loss coefficient, which directly affects the optimization problem of the model. Therefore, the impacts of $c_{1}$ and $c_{3}$ on the classification results are examined using the gear and bearing datasets, respectively, as shown in Figure 6. The larger the value of $c_{3}$ , the lower the classification accuracy. As the value of $c_{1}$ increases, classification accuracy first increases and then decreases; therefore, the optimal parameter values are determined as $c_{1} = 10^{- 2}, c_{3} = 10^{- 5}$ . In addition, with the other parameters fixed, the values of k and d in the kernel technique are varied separately, and the optimal values are selected according to the classification results. Here, $k = 1$ and $d = 4$ are used.

Figure 6.

Parameter optimization results. (a) The results of the gear dataset. (b) The results of the rolling bearing dataset.
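The parameter sweep over $c_{1}$ and $c_{3}$ described above can be organized as an exhaustive search over the candidate set $\{10^{i}\}$ . The sketch below assumes a user-supplied callable `evaluate(c1, c3)` that trains the model and returns a validation accuracy; that callable and the function name `grid_search` are hypothetical, and the source does not state how the sweep was implemented.

```python
import itertools
import math

def grid_search(evaluate, exponents=range(-5, 4)):
    """Return the (c1, c3) pair from {10^i} x {10^i} that maximizes evaluate(c1=..., c3=...)."""
    grid = [10.0 ** i for i in exponents]  # 10^-5, ..., 10^3
    return max(itertools.product(grid, grid),
               key=lambda pair: evaluate(c1=pair[0], c3=pair[1]))
```

Because $c_{1} = c_{2}$ and $c_{3} = c_{4}$ are tied, the search covers only 81 combinations, so an exhaustive sweep stays cheap as long as each evaluation trains quickly.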

Similarly, the optimal parameters for the comparison methods are set as shown in Table 3. Among them, DCA-BiGRU uses a hyperparameter grid search to determine its optimal parameters. Then, the datasets of tasks 1 and 2 are input into the various methods, and the confusion matrices of the classification results are shown in Figure 7. First, in the comparison with the multitask methods, the proposed method achieves an accuracy of over 98% in both tasks, indicating excellent classification performance. As Figure 7 shows, only seven samples are misclassified in task 1 and only eight in task 2. Although MT-TBSVM and MTNPSVM achieve an accuracy of over 97% in task 2, they produce a large number of misclassifications in task 1. MTLS-TWSVM performs well in task 1, but its result is significantly lower in task 2. The experimental results show that, for these multitask methods, the classification result of at least one task is always poor. The reason is that these methods are all vector classifiers, which destroy the structural information of the data when processing matrix samples. Therefore, the classification performance of the proposed method is superior to that of MT-TBSVM, MTNPSVM, and MTLS-TWSVM.

Table 3.

Parameter setting.

Method	Task	Parameters
MT-TBSVM	1 and 2	$c_{1} = c_{2} = 0.01, c_{3} = c_{4} = 0.01, ρ = 5$
MTNPSVM	1 and 2	$c_{1} = 1, c_{2} = 0.5, ρ = 5$
MTLS-TWSVM	1 and 2	$c_{1} = c_{2} = 0.01, ρ = 5$
SSMM	1	$τ = 0.01, γ = 0.001$
SSMM	2	$τ = 0.01, γ = 0.5$
NPLSSMM	1	$γ_{1} = γ_{2} = 12, τ = 0.1$
NPLSSMM	2	$γ_{1} = γ_{2} = 12, τ = 0.1$
DCA-BiGRU	1	Learning rate = $10^{- 3}$ , weight decay = $10^{- 4}$ , epochs = 100
DCA-BiGRU	2	Learning rate = $10^{- 3}$ , weight decay = $10^{- 4}$ , epochs = 100

MTLS-TWSVM: multitask least squares twin support vector machine; MTNPSVM: multitask nonparallel support vector machine; SSMM: sparse support matrix machine; NPLSSMM: nonparallel least squares support matrix machine.

Figure 7.

Classification confusion matrix of six methods.

Next, consider the single-task classification methods. SSMM and NPLSSMM are both matrix classifiers that can preserve the internal information of the raw data. The classification accuracies of SSMM on the two tasks are 95.43% and 94.86%, respectively, while those of NPLSSMM are 96.29% and 95.14%, respectively. The classification performance of NPLSSMM is thus superior to that of SSMM. The reason is that SSMM constructs a pair of parallel hyperplanes, which limits its classification performance, whereas NPLSSMM improves its classification efficiency by solving two smaller QPPs. However, because SSMM and NPLSSMM are both based on single-task learning, the information they utilize is limited. DCA-BiGRU, as a deep learning method, can use dual path convolution with attention mechanism (DCA) to extract fused features of vibration signals, but it can only fuse feature information from a single task.

In addition, Table 4 provides the total time taken by the different methods to classify the two tasks; the training time of MTSTMM is relatively short. Because MT-TBSVM, MTNPSVM, and MTLS-TWSVM process vector data with smaller dimensions, their classification time is shorter. NPLSSMM and SSMM both require iterative computation, resulting in longer training times. Although MTSTMM processes matrix samples, it avoids iterative algorithms through the kernel technique, thereby reducing training time. DCA-BiGRU requires multiple training epochs to achieve stable classification accuracy, so its training time is relatively long.

Table 4.

Training time of seven models.

Method	Time (s)	Method	Time (s)
MTSTMM	1.7850	SSMM	106.1883
MT-TBSVM	1.1875	NPLSSMM	9.7444
MTNPSVM	2.8119	DCA-BiGRU	752.0204
MTLS-TWSVM	1.2213	—	—

MTNPSVM: multitask nonparallel support vector machine; MTSTMM: multitask soft-margin twin matrix machine; NPLSSMM: nonparallel least squares support matrix machine; SSMM: sparse support matrix machine; MTLS-TWSVM: multitask least squares twin support vector machine.

MT-TBSVM is essentially an extension of TSVM, with a computational complexity of $O (l^{3} / 4)$ . The calculation process of MTNPSVM involves matrix dimension calculation, and its computational complexity is $O (r (2 p + q)^{3})$ , where p and q are the numbers of positive and negative samples, respectively. MTLS-TWSVM designs a least squares algorithm that avoids matrix inversion, resulting in shorter training time. SSMM and NPLSSMM require multiple iterations to converge because of the nuclear norm, resulting in longer training time. The kernel technique and soft-margin model of MTSTMM do not increase the computational complexity of the model, and its main computational cost lies in solving two QPPs. Therefore, the computational costs of MTSTMM and MT-TBSVM are basically the same, with a computational complexity of $O (l^{3} / 4)$ .

To further test the performance of the MTSTMM method, the datasets of tasks 1 and 2 are randomly shuffled and the training and testing subsets are redrawn. The experiment is repeated five times, and the results are shown in Tables 5 and 6. MTSTMM has the highest evaluation metrics in both tasks 1 and 2, which demonstrates its excellent classification performance. The reason is that the designed kernel technique can fully utilize the internal information of matrix samples, while the MTL formulation mines the correlation information between tasks. Although MT-TBSVM, MTNPSVM, and MTLS-TWSVM are also MTL algorithms, they all vectorize matrix samples, resulting in the loss of structural information in the samples. SSMM and NPLSSMM are both single-task learning algorithms, which cannot utilize the common information between tasks, thus limiting their classification performance. DCA-BiGRU requires more computing resources during the training process, which may to some extent limit its classification performance. In addition, the standard deviation of the proposed algorithm is almost always the smallest, indicating more stable classification performance.

Table 5.

The results for task 1.

Method	Acc	Kap	Sens	Spec	AUC	G-m
MTSTMM	0.9965 ± 0.0042	0.9957 ± 0.0050	0.9965 ± 0.0042	0.9992 ± 0.0009	0.9978 ± 0.0025	0.9978 ± 0.0025
MT-TBSVM	0.9332 ± 0.0082	0.9213 ± 0.0092	0.9332 ± 0.0082	0.9888 ± 0.0013	0.9607 ± 0.0046	0.9603 ± 0.0048
MTNPSVM	0.9529 ± 0.0067	0.9450 ± 0.0078	0.9589 ± 0.0127	0.9921 ± 0.0011	0.9840 ± 0.0026	0.9839 ± 0.0058
MTLS-TWSVM	0.9732 ± 0.0102	0.9680 ± 0.0116	0.9732 ± 0.0102	0.9954 ± 0.0017	0.9840 ± 0.058	0.9839 ± 0.0058
SSMM	0.9511 ± 0.0045	0.9430 ± 0.0052	0.9511 ± 0.0045	0.9919 ± 0.0007	0.9715 ± 0.0026	0.9713 ± 0.0026
NPLSSMM	0.9557 ± 0.0088	0.9483 ± 0.0102	0.9557 ± 0.0088	0.9926 ± 0.0015	0.9742 ± 0.0051	0.9740 ± 0.0052
DCA-BiGRU	0.9642 ± 0.0182	0.9576 ± 0.0199	0.9653 ± 0.0181	0.9943 ± 0.0042	0.9788 ± 0.0100	0.9789 ± 0.0140

MTSTMM: multitask soft-margin twin matrix machine; MTLS-TWSVM: multitask least squares twin support vector machine; MTNPSVM: multitask nonparallel support vector machine; SSMM: sparse support matrix machine; NPLSSMM: nonparallel least squares support matrix machine.

Table 6.

The results for task 2.

Method	Acc	Kap	Sens	Spec	AUC	G-m
MTSTMM	0.9967 ± 0.0050	0.9960 ± 0.0060	0.9967 ± 0.0050	0.9993 ± 0.0010	0.9980 ± 0.0030	0.9980 ± 0.0030
MT-TBSVM	0.9620 ± 0.0111	0.9561 ± 0.0129	0.9620 ± 0.0111	0.9937 ± 0.0019	0.9778 ± 0.0065	0.9777 ± 0.0066
MTNPSVM	0.9754 ± 0.0057	0.9713 ± 0.0066	0.9754 ± 0.0057	0.9959 ± 0.0009	0.9857 ± 0.0033	0.9856 ± 0.0033
MTLS-TWSVM	0.8837 ± 0.0260	0.8643 ± 0.0304	0.8837 ± 0.0260	0.9806 ± 0.0044	0.9302 ± 0.0153	0.9308 ± 0.0159
SSMM	0.9380 ± 0.0051	0.9277 ± 0.0060	0.9380 ± 0.0051	0.9897 ± 0.0009	0.9638 ± 0.0030	0.9635 ± 0.0031
NPLSSMM	0.9780 ± 0.0157	0.9743 ± 0.0183	0.9780 ± 0.0157	0.9963 ± 0.0026	0.9872 ± 0.0091	0.9871 ± 0.0092
DCA-BiGRU	0.9708 ± 0.0231	0.9667 ± 0.0264	0.9708 ± 0.0231	0.9953 ± 0.0037	0.9801 ± 0.0133	0.9810 ± 0.0144
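For readers who want to reproduce the columns of Tables 5 and 6, the following is a minimal sketch of how such metrics are commonly computed from a multiclass confusion matrix. It assumes macro-averaged sensitivity and specificity and a G-mean defined as the square root of their product, which is consistent with the tabulated values but is an assumption made here. The function name `multiclass_metrics` and the 3-class example matrix are hypothetical and are not taken from the experiments.

```python
import math


def multiclass_metrics(cm):
    """Compute Acc, Kap, Sens, Spec, and G-m from a confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    Sens and Spec are macro-averaged over classes (an assumption).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(cm[i]) for i in range(k)]
    col_sums = [sum(cm[i][j] for i in range(k)) for j in range(k)]

    # Accuracy: fraction of samples on the diagonal.
    acc = sum(cm[i][i] for i in range(k)) / total

    # Cohen's kappa: agreement corrected for chance agreement p_e.
    p_e = sum(row_sums[i] * col_sums[i] for i in range(k)) / total ** 2
    kappa = (acc - p_e) / (1.0 - p_e)

    # One-vs-rest sensitivity and specificity for each class.
    sens_per_class, spec_per_class = [], []
    for c in range(k):
        tp = cm[c][c]
        fn = row_sums[c] - tp
        fp = col_sums[c] - tp
        tn = total - tp - fn - fp
        sens_per_class.append(tp / (tp + fn))
        spec_per_class.append(tn / (tn + fp))
    sens = sum(sens_per_class) / k
    spec = sum(spec_per_class) / k

    return {
        "Acc": acc,
        "Kap": kappa,
        "Sens": sens,
        "Spec": spec,
        "G-m": math.sqrt(sens * spec),  # assumed G-mean definition
    }


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix, 10 test samples per class.
    example_cm = [[9, 1, 0],
                  [0, 10, 0],
                  [1, 0, 9]]
    for name, value in multiclass_metrics(example_cm).items():
        print(f"{name}: {value:.4f}")
```

With balanced classes, macro-averaged sensitivity equals accuracy, which matches the pattern in most rows of Tables 5 and 6.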

To systematically evaluate the noise robustness of the proposed methodology, 5 dB of noise is added to both datasets mentioned above. Then, the experiments are repeated with samples containing noise, and the results are shown in Figure 8.
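The noise injection step can be sketched as follows. This is a minimal illustration that assumes "5 dB of noise" means additive white Gaussian noise scaled so that the signal-to-noise ratio (SNR) is 5 dB. The noise type, the helper name `add_noise_at_snr`, and the synthetic sine signal are assumptions for illustration and are not specified in the text.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng):
    """Return signal plus white Gaussian noise at the requested SNR in dB.

    SNR is taken as 10 * log10(P_signal / P_noise), where P denotes
    mean power (an assumed definition).
    """
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR actually achieved, for a sanity check."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    # Hypothetical stand-in for a vibration signal: a 50 Hz sine at 12 kHz.
    clean = [math.sin(2.0 * math.pi * 50.0 * t / 12000.0) for t in range(24000)]
    noisy = add_noise_at_snr(clean, 5.0, rng)
    print(f"measured SNR: {measured_snr_db(clean, noisy):.2f} dB")
```

Scaling the noise from the measured signal power keeps the SNR comparable across fault states whose vibration amplitudes differ.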

Figure 8.

Results for faulty datasets containing noise. (a) Results for task 1. (b) Results for task 2.

Comparing the classification accuracy with and without noise, the recognition rates of MTSTMM, MT-TBSVM, MTNPSVM, MTLS-TWSVM, SSMM, NPLSSMM, and DCA-BiGRU are reduced by 1.43, 6.00, 5.57, 3.71, 2.43, 3.58, and 3.28% in task 1, and by 2.43, 1.57, 2.14, 1.29, 2.57, 1.00, and 3.55% in task 2, respectively. Although the classification accuracy of the comparative methods decreases only slightly in task 2, it decreases significantly in task 1. The proposed method demonstrates robust classification ability in both tasks 1 and 2, as the introduction of the sparse pinball constraint term improves the sparsity of the model and its robustness to noise. From Figure 7, the proposed method has the highest evaluation metrics in both datasets, indicating its superior performance. The reason is that MTSTMM gains flexibility in handling noisy samples by establishing a soft-margin model, which the comparison methods lack. MT-TBSVM, MTNPSVM, and MTLS-TWSVM each have poor classification results in one of the two tasks. The reason is that MT-TBSVM and MTLS-TWSVM introduce hinge loss, which is sensitive to noise, thus limiting their classification performance. MTNPSVM constructs two nonparallel hyperplanes, which does not strengthen the model’s resilience against noise. SSMM improves its sparsity by designing a low-rank loss, but lacks robustness. NPLSSMM is essentially a semihard margin model built on two hyperplanes, which results in an inability to handle noisy data flexibly. Although DCA-BiGRU has a certain degree of noise resistance, it assumes that all sample weights are equal when modeling and does not explicitly distinguish between noise and normal data.

Experiment 2

To further validate the multitask diagnostic capability of MTSTMM, experimental verification is carried out using data from the AHUT test bench, shown in Figure 9. In the experiment, the load is set to 2 kN, and the motor speed is set to 1200 rpm. The multitask datasets are constructed using vibration signals from four states of rolling bearings and gears. In task 1, the rolling bearing fault states include inner ring ball failure, inner and outer ring ball failure (OBF), inner and outer ring ball failure, and OBF. In task 2, the gear fault states include pitting, peeling, tooth breakage, and cracks. The signal of each fault state is divided into segments of 1024 sampling points to obtain 200 matrix samples, and the training and testing sets are partitioned in the same way as in experiment 1.
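The segmentation step can be illustrated with a short sketch. It assumes non-overlapping segments and a plain row-wise reshape of each 1024-point segment into a 32 × 32 matrix. The text does not state how the matrix samples are actually constructed, so the reshape, the function name `segment_to_matrices`, and its parameters are hypothetical.

```python
def segment_to_matrices(signal, seg_len=1024, n_rows=32, max_samples=200):
    """Split a 1-D signal into segments and reshape each into a matrix.

    Segments are non-overlapping (an assumption). Each segment of
    seg_len points becomes an n_rows x (seg_len // n_rows) matrix,
    filled row by row. At most max_samples matrices are returned.
    """
    if seg_len % n_rows != 0:
        raise ValueError("seg_len must be divisible by n_rows")
    n_cols = seg_len // n_rows
    n_segments = min(len(signal) // seg_len, max_samples)

    samples = []
    for s in range(n_segments):
        segment = signal[s * seg_len:(s + 1) * seg_len]
        matrix = [segment[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
        samples.append(matrix)
    return samples


if __name__ == "__main__":
    # Hypothetical signal: a counter, so the layout is easy to inspect.
    signal = list(range(1024 * 200 + 500))
    samples = segment_to_matrices(signal)
    print(len(samples), len(samples[0]), len(samples[0][0]))
```

Keeping each sample as a matrix, rather than flattening it to a vector, is what allows a matrix classifier to use the row and column structure discussed above.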

Figure 9.

AHUT test bench.

After adding 5 dB noise to the above two datasets, all methods are trained and tested, and the results of each method are shown in Figure 10. From Figure 10, the proposed method achieves classification accuracy of over 94% in both tasks under noisy conditions, indicating that MTSTMM has stable multitask diagnostic performance. The accuracy of MTLS-TWSVM reaches 95% in task 1, but is lower in task 2. The reason is that MTSTMM constructs a soft-margin model to reduce the model’s sensitivity to noise and preserve sparsity, while MT-TBSVM, MTLS-TWSVM, and MTNPSVM are all sensitive to noise points, resulting in inferior classification performance compared to the proposed method. In addition, the RMTLW term avoids overfitting during model training, thereby significantly improving the diagnostic efficiency of the proposed method. SSMM improves its classification performance to some extent by relying on its sparsity, but lacks robustness to noise. NPLSSMM improves its computational efficiency by designing a matrix squared loss, but this does not reduce its sensitivity to noise. Because the DCA-BiGRU model has high complexity and strong fitting ability, it easily mistakes noise in the training data for effective features, resulting in a slight decrease in classification performance.

Figure 10.

Experimental results of AHUT dataset under noise. (a) The result of task 1. (b) The result of task 2.

To reduce the influence of randomness, the above two datasets are validated five times, and the accuracy results of each experiment are shown in Figure 11 and Table 7. From Figure 11, in the first run the proposed method is slightly inferior to SSMM in task 1, but in the other four runs its classification results are the best. In task 2, the results of the proposed method in every run are superior to those of the other methods. From Table 7, the average accuracy of the proposed method is the highest in both tasks, indicating its effectiveness in multitask fault diagnosis. Meanwhile, its standard deviation of classification accuracy is the smallest in both datasets, indicating more stable diagnostic performance. The reason is that the soft margin set by MTSTMM gives the model greater flexibility in handling noisy data. Moreover, the RMTLW term is based on minimizing structural risk, which can significantly enhance the accuracy of the model, and MTSTMM can more fully explore the structural information of multitask samples through kernel techniques, achieving a certain degree of information enhancement.

Figure 11.

Five verification results. (a) Five verification results for task 1. (b) Five verification results for task 2.

Table 7.

Average accuracy and standard deviation.

Method	Task 1	Task 2
MTSTMM	95.15 ± 0.91%	93.35 ± 0.52%
MT-TBSVM	93.65 ± 0.96%	89.10 ± 1.01%
MTNPSVM	94.05 ± 1.05%	91.40 ± 1.28%
MTLS-TWSVM	89.65 ± 3.06%	79.50 ± 3.64%
SSMM	94.90 ± 0.98%	89.20 ± 0.76%
NPLSSMM	94.65 ± 1.05%	92.20 ± 0.65%
DCA-BiGRU	93.05 ± 0.93%	91.55 ± 2.08%

Statistical analysis

To further evaluate the differences between the MTSTMM method and the other methods, the Friedman test and the Nemenyi test^25,26 are used for analysis. The Friedman statistic and the critical difference (CD) of the Nemenyi test are given by Equations (26) and (27), respectively.

χ_{F}^{2} = \frac{12 N}{n (n + 1)} \left( \sum_{i = 1}^{n} r_{i}^{2} - \frac{n {(n + 1)}^{2}}{4} \right)

(26)

CD = q_{α} \sqrt{\frac{n (n + 1)}{6 N}}

(27)

where N is the number of experiments, n is the number of methods, $r_{i}$ is the average rank of the ith method, and $q_{α}$ is the critical value for Nemenyi test.

The five validation results from experiments 1 and 2 are ranked, and the average rank of each method is given in Table 8. With degrees of freedom $n - 1 = 6$ and $(n - 1) (N - 1) = 114$ , the value of $χ_{F}^{2}$ is greater than its critical value. Therefore, the null hypothesis of “no significant difference” is rejected. Next, the Nemenyi test is used for post hoc validation. Figure 12 shows the CD diagram of the Nemenyi test, in which methods connected by a red line show no significant difference from one another. There are significant differences between the proposed method and all other methods. Therefore, the classification performance of MTSTMM is significantly better than that of MT-TBSVM, MTNPSVM, MTLS-TWSVM, SSMM, NPLSSMM, and DCA-BiGRU.

Table 8.

The average rank of the seven methods.

Method	MTSTMM	MT-TBSVM	MTNPSVM	MTLS-TWSVM	SSMM	NPLSSMM	DCA-BiGRU
Average rank	1.1	5.325	4.125	5.825	4.4	3.325	3.9

Figure 12.

Critical difference (CD) diagram of the Nemenyi test.
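As a rough check of the statistical analysis, the sketch below evaluates the standard form of the Friedman statistic (with the factor $n (n + 1)$ in the denominator) and the Nemenyi CD using the average ranks in Table 8. Several values are assumptions and are not stated in the text: $N = 20$ is inferred from $(n - 1) (N - 1) = 114$ with $n = 7$, the significance level is taken as 0.05, $q_{α} = 2.949$ is the tabulated Nemenyi critical value for seven methods at that level, and 12.592 is the chi-square critical value for 6 degrees of freedom.

```python
import math

# Average ranks from Table 8.
AVG_RANKS = {
    "MTSTMM": 1.1,
    "MT-TBSVM": 5.325,
    "MTNPSVM": 4.125,
    "MTLS-TWSVM": 5.825,
    "SSMM": 4.4,
    "NPLSSMM": 3.325,
    "DCA-BiGRU": 3.9,
}

N_EXPERIMENTS = 20   # assumption: inferred from (n - 1)(N - 1) = 114, n = 7
Q_ALPHA = 2.949      # assumption: Nemenyi critical value, 7 methods, alpha = 0.05
CHI2_CRIT = 12.592   # assumption: chi-square critical value, 6 dof, alpha = 0.05


def friedman_chi2(avg_ranks, n_experiments):
    """Friedman statistic in its standard form."""
    n = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks)
    return 12.0 * n_experiments / (n * (n + 1)) * (sum_sq - n * (n + 1) ** 2 / 4.0)


def nemenyi_cd(q_alpha, n_methods, n_experiments):
    """Critical difference of the Nemenyi post hoc test, Equation (27)."""
    return q_alpha * math.sqrt(n_methods * (n_methods + 1) / (6.0 * n_experiments))


chi2 = friedman_chi2(list(AVG_RANKS.values()), N_EXPERIMENTS)
cd = nemenyi_cd(Q_ALPHA, len(AVG_RANKS), N_EXPERIMENTS)

# Methods whose average rank differs from that of MTSTMM by more than the CD.
significantly_worse = [
    name for name, rank in AVG_RANKS.items()
    if name != "MTSTMM" and rank - AVG_RANKS["MTSTMM"] > cd
]

if __name__ == "__main__":
    print(f"chi2_F = {chi2:.3f} (critical value {CHI2_CRIT})")  # chi2_F = 60.589
    print(f"CD = {cd:.3f}")                                     # CD = 2.015
    print("differs from MTSTMM:", significantly_worse)
```

Under these assumptions, the smallest rank gap to MTSTMM is 3.325 − 1.1 = 2.225 (NPLSSMM), which exceeds the CD of about 2.015, in line with the conclusion drawn from Figure 12.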

Conclusion

This article proposes a multitask mechanical fault diagnosis method based on MTSTMM and verifies it in multiple experiments. MTSTMM improves its flexibility in handling noisy samples by constructing a soft-margin model, resulting in superior classification performance compared to the comparison methods. In addition, by defining the RMTLW term, the problems of matrix singularity in computation and model overfitting during training are avoided, which effectively improves the classification accuracy of the algorithm. Moreover, MTSTMM effectively utilizes implicit information between multiple tasks, a capability that SSMM and NPLSSMM do not possess. The kernel technique also enables MTSTMM to solve only two QPPs instead of performing iterative calculations, which makes the model faster to train than SSMM and NPLSSMM. Thus, compared to MT-TBSVM, MTNPSVM, MTLS-TWSVM, SSMM, NPLSSMM, and DCA-BiGRU, MTSTMM has superior multitask diagnostic performance.

MTSTMM demonstrates superior diagnostic performance in both datasets, but it still faces challenges when deployed in real-world industrial environments, such as small sample sizes and multitask data with weakly correlated features. Therefore, further optimizing the model to improve the practicality of the proposed method is one direction for future work.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the Anhui Provincial Natural Science Foundation (no. 2408085ME113) and the Outstanding Youth Fund of Universities in Anhui Province (no. 2022AH020032).

ORCID iD

Haiyang Pan

References

1. Liu, Wang, Shi. Dynamic modelling of the defect extension and appearance in a cylindrical roller bearing. Mech Syst Signal Process 2022; 173: 109040.

2. Zhang, Zhou, Huang, et al. Fault diagnosis of rotating machinery based on recurrent neural networks. Measurement 2021; 171: 108774.

3. Zhu, Lei, et al. A review of the application of deep learning in intelligent fault diagnosis of rotating machinery. Measurement 2023; 206: 112346.

4. Yang, Wang XJ. Artificial neural networks for neuroscientists: a primer. Neuron 2020; 107(6): 1048–1070.

5. Linka, Kuhl. A new family of constitutive artificial neural networks towards automated model discovery. Comput Methods Appl Mech Eng 2023; 403: 115731.

6. Cheng, Lin, et al. Intelligent fault diagnosis of rotating machinery based on continuous wavelet transform-local binary convolutional neural network. Knowl Based Syst 2021; 216: 106796.

7. Ganaie, Tanveer, Suganthan, et al. Oblique and rotation double random forest. Neural Netw 2022; 153: 496–517.

8. Yeşilkanat CM. Spatio-temporal estimation of the daily cases of COVID-19 in worldwide using random forest machine learning algorithm. Chaos Solitons Fract 2020; 140: 110210.

9. Zhu, Liu, Zhou, et al. Optimizing weighted extreme learning machines for imbalanced classification and application to credit card fraud detection. Neurocomputing 2020; 407: 50–62.

10. Shariati, Mafipour, Ghahremani, et al. A novel hybrid extreme learning machine–grey wolf optimizer (ELM-GWO) model to predict compressive strength of concrete with partial replacements for cement. Eng Comput 2020; 38: 757–779.

11. Cervantes, Garcia-Lamont, Rodríguez-Mazahua, et al. A comprehensive survey on support vector machine classification: applications, challenges and trends. Neurocomputing 2020; 408: 189–215.

12. Dai, Zhao. A hybrid load forecasting model based on support vector machine with intelligent methods for feature selection and parameter optimization. Appl Energy 2020; 279: 115332.

13. Yan, Xiaodong, Qian Justin, Jiahui, et al. Reconfigurable mixed-kernel heterojunction transistors for personalized support vector machine classification. Nat Electron 2023; 6(11): 862–869.

14. Kumari, Akhtar, Shah, et al. Support matrix machine: a review. Neural Netw 2024; 181: 106767.

15. Pan, Zheng, et al. Multi-class fuzzy support matrix machine for classification in roller bearing fault diagnosis. Adv Eng Inform 2022; 51: 101445.

16. Zheng, Zhu, Qin, et al. Sparse support matrix machine. Pattern Recognit 2018; 76: 715–726.

17. Sheng, Pan, et al. Fault diagnosis method of rolling bearing based on transfer least squares support matrix machine. J Vib Shock 2022; 41(21): 53–59.

18. Pan, Zheng. An intelligent fault diagnosis method for roller bearing using symplectic hyperdisk matrix machine. Appl Soft Comput 2021; 105: 107284.

19. Yang, Pan, et al. Non-parallel least squares support matrix machine for rolling bearing fault diagnosis. Mech Mach Theory 2020; 145: 103676.

20. Zhang, Dong, et al. Multi-task support vector machine with pinball loss. Eng Appl Artif Intell 2021; 106: 104458.

21. Mei. Multi-task least squares twin support vector machine for classification. Neurocomputing 2019; 338: 26–33.

22. Liu. Multi-task nonparallel support vector machine for classification. Appl Soft Comput 2022; 124: 109051.

23. Liu. Multi-task twin bounded support vector machine and its safe screening rule. Appl Soft Comput 2023; 138: 110188.

24. Zhang, et al. Fault diagnosis for small samples based attention mechanism. Measurement 2022; 187: 110242.

25. Zhao, et al. Usage of correlation analysis and hypothesis test in optimizing the gated recurrent unit network for wind speed forecasting. Energy 2022; 242: 122960.

26. Pan, Zheng, et al. Non-parallel bounded support matrix machine and its application in roller bearing fault diagnosis. Inf Sci 2023; 624: 395–415.