Machinery fault diagnosis via an improved multi-linear subspace and locally linear embedding

Abstract

Traditional fault diagnosis methods mainly depend on the vector model to describe a signal, which will lead to information loss and the curse of dimensionality. In order to overcome these problems, in this paper an improved multi-linear subspace (MLS) method and locally linear embedding (LLE) are integrated (MLSLLE) to extract significant features. To obtain more information, first it is suggested that multiple sensors should be used to sample the vibration signal of a machine from different positions; then, these data are projected into different subspaces, where each sample is represented as a tensor form, respectively; finally, higher-order singular value decomposition and LLE are introduced to extract significant features. Thus a fault diagnosis method is proposed based on MLSLLE and support vector machines. The advantages of the proposed fault diagnosis method are validated by two real bearing data sets.

Keywords

Fault diagnosis, locally linear embedding, support vector machine, tensor learning, higher-order singular value decomposition

Introduction

In real-world applications, it is crucial for industrial production to monitor machines’ running states in real time (Li et al., 2016; Yin et al., 2015a). When a machine develops a fault, a fault diagnosis system can automatically notify the operators so that they can correct it, reducing the harm to industrial production. However, the large amount of redundant information in monitoring data makes fault diagnosis far more challenging. Finding a highly efficient method to extract significant features from large quantities of information has therefore become an urgent issue.

Traditionally, most feature extraction methods focus on space transformation techniques (An and Tang, 2016; Du et al., 2016; Gao et al., 2016b; Yin et al., 2013), such as short-time Fourier transformation (Seryasat et al., 2012), wavelet transformation (WT) (Gaied, 2015), S transformation (ST) (Hao et al., 2016), and so on. The significant features are extracted in these transformed spaces, and the class of a sample is recognized from them. However, these kinds of methods can only deal with simple data sets. In order to improve the capability to handle complex data sets, great effort has been devoted to manifold learning algorithms (Jialiang et al., 2016; Yin et al., 2012), which reduce the dimensionality by exploring the structure information of a data set. Manifold learning can be roughly divided into two categories: linear methods and nonlinear methods. Principal component analysis (PCA) (Zhou et al., 2014) is a classical linear algorithm. The advantage of PCA is that it constructs an explicit projection between the original high-dimensional space and the corresponding embedding space, by which a new sample can be easily mapped into the low-dimensional space. However, linear algorithms are not suitable for data sets whose distribution is nonlinear. Hence, some work has been done to improve the performance of linear algorithms. Gao et al. (2016c) construct a physics model based on the inductive thermography mechanism and utilize sparse greedy-based PCA to deal with this model; experimental tests verify the efficacy of that method. Ding et al. (2016) propose locality sensitive batch feature extraction (LSBFE), which explores both the local and the global discriminant structure of the data manifold and uses a new gradient optimization model to obtain the final result. LSBFE has a strong capability to extract significant features of a data set, but it is a supervised method: when the labels of the data set are unknown, LSBFE will fail. Usually, a complex data set can be better treated by nonlinear manifold learning algorithms (Alkaya and Grimble, 2015; Ding et al., 2015; He, 2013; Wang and Yin, 2014; Yin et al., 2015b). Nonlinear methods explore local structures in the original space and compute the embedding result from these local structures. Locally linear embedding (LLE) (Cheng et al., 2016; Roweis and Saul, 2000) is an unsupervised nonlinear manifold learning algorithm. LLE is fast to compute, and only two parameters need to be set, so it has been widely researched.

All the feature extraction methods mentioned above mainly depend on vector form. Vector form not only loses much significant information but also suffers from the small sample size problem. Therefore, dimension reduction algorithms based on tensor form have attracted considerable research attention within the past decade (Cichocki et al., 2015; Rong et al., 2012; Zhao and Wang, 2017). Tensor-based dimension reduction algorithms have been successfully applied in image processing (Vasilescu and Terzopoulos, 2003), target detection (Du and Zhang, 2014), and industrial process monitoring (Luo et al., 2014), among others (Liu et al., 2013). Many researchers have demonstrated that tensor-based dimension reduction algorithms can also be efficiently used in fault diagnosis (Liu et al., 2014). The multi-linear subspace (MLS) method is a classical type of dimension reduction method based on tensors (Luo et al., 2015; Zhang et al., 2015). As shown in Figure 1, the MLS method first describes a data set using tensor form; then, various decomposition technologies, such as Tucker (Louwerse and Smilde, 2000), higher-order singular value decomposition (HOSVD) (Afra and Gildin, 2016), multiple principal component analysis (MPCA) (Zhang et al., 2016) and parallel factor (PARAFAC) (Rasmus, 1997), are employed to project the data set to feature spaces. For instance, in order to obtain the most significant features, Nomikos and Macgregor (1995) allow MPCA to work directly on matrices (second-order tensors), and the experimental results illustrate that the performance of tensor-based methods is much better than that of vector-based methods. For inspecting a gear, Gao et al. (2016a) developed a physics-based multi-dimensional spatial transient stage tensor model to describe the thermo-optical flow pattern, and a canonical decomposition was introduced to deal with the tensor model. Tests of a helical gear with different cycles of contact fatigue were performed, and the results indicate that the method is effective.

Figure 1.

Multi-linear subspace process.

Usually, the expressions of a signal are different in various subspaces. If we can simultaneously observe this signal from all the subspaces, it will improve the recognition accuracy. MLS can project a signal into several subspaces. Most importantly, with all the factors considered in each subspace, we can better observe the signal. Hence, in this paper, we propose a new feature extraction method called MLSLLE that constructs a tensor for each sample and decomposes these tensors by HOSVD respectively, based on which LLE is employed to obtain the final features. The rest of this paper is organized as follows: in ‘Feature extraction’, the process of MLSLLE is illustrated in detail. Based on MLSLLE, a highly efficient fault diagnosis method is presented in ‘Fault diagnosis’. Experiments on two real bearing data sets are reported in ‘Experiments’. Finally, the conclusion is given in the last section.

Feature extraction

Basically, a tensor is a higher-order generalization of a matrix, which can hold most of the known information about a signal within a single frame. A tensor model therefore lets us observe a signal more fully than a vector model, as illustrated by Figure 2. In this paper, we perform WT and ST on the data sampled from different sensors and construct a tensor for each sample.

Figure 2.

Comparing tensor form with vector form.

WT

WT is one of the classical signal processing tools, and is frequently used in analysing non-stationary signals. WT can project a signal into time–frequency space, where some significant features can be easily found. Let $X = [X_{1}, X_{2}, \dots, X_{N}] \in R^{D \times N}$ represent a data set and $X_{i} = [x_{1}, x_{2}, \dots, x_{D}]^{'} \in R^{D \times 1}$ ($i = 1, 2, \dots, N$) denote the $i$th sample, whose dimensionality is $D$. WT projects a signal onto a series of wavelet bases derived from a mother wavelet, and the projections can be computed by the inner product of the signal $X_{i}$ and the wavelet base $ψ_{u, s} (t)$. The continuous WT can be defined as (Bayindir, 2015)

\begin{matrix} W_{x}^{ψ} (u, s) = \langle X_{i} (t), ψ_{u, s} (t) \rangle = \int_{- \infty}^{\infty} X_{i} (t) \bar{ψ}_{u, s} (t) d t \\ = \int_{- \infty}^{\infty} X_{i} (t) \frac{1}{\sqrt{s}} \bar{ψ} (\frac{t - u}{s}) d t \end{matrix}

(1)

where $s$ denotes a scale factor and $u$ is a shift factor. It is very expensive to compute equation (1) directly, so a fast WT (FWT) was proposed by Mallat (1989). FWT applies a low-pass filter and a high-pass filter to a signal, generating the approximation coefficients $c_{1}$ and the detail coefficients $d_{1}$, followed by $c_{2}$ and $d_{2}$, and so on. The calculation formula of FWT can be presented as follows:

\left\{ \begin{matrix} c_{j} (i) = \frac{1}{2} \sum_{m} h (m - 2 i) c_{j - 1} (m) \\ d_{j} (i) = \frac{1}{2} \sum_{m} g (m - 2 i) c_{j - 1} (m) \end{matrix} \right.

(2)

where $h$ denotes a low-pass filter, $g$ is a high-pass filter, $j$ represents the scale (decomposition level), and $m$ is the translation factor. It is noteworthy that $h$ and $g$ are both derived from a mother wavelet, so the properties of the selected mother wavelet are very important for the decomposition result. For a detailed introduction to FWT, refer to Daubechies (1990).
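As an illustration of the filter-and-downsample step in equation (2), the following minimal sketch computes a few FWT levels in plain Python. It assumes that both sums run over the previous approximation coefficients $c_{j-1}(m)$, as in the standard pyramid scheme, and it uses unnormalized Haar filters $h = (1, 1)$ and $g = (1, -1)$ purely for illustration (the experiments in this paper use sym3). The function names `fwt_level` and `fwt` are hypothetical.

```python
def fwt_level(c_prev, h=(1.0, 1.0), g=(1.0, -1.0)):
    """One decomposition level: filter with h and g, keep every second output.

    h and g default to unnormalized Haar filters (an assumption made only
    for this illustration); the factor 1/2 follows equation (2).
    """
    approx, detail = [], []
    for i in range(len(c_prev) // 2):
        a = d = 0.0
        for k in range(len(h)):          # k plays the role of m - 2i
            m = 2 * i + k
            if m < len(c_prev):          # ignore taps that fall off the end
                a += h[k] * c_prev[m]
                d += g[k] * c_prev[m]
        approx.append(0.5 * a)
        detail.append(0.5 * d)
    return approx, detail


def fwt(signal, levels):
    """Apply fwt_level repeatedly to the approximation coefficients."""
    c = list(signal)
    details = []
    for _ in range(levels):
        c, d = fwt_level(c)
        details.append(d)
    return c, details


c2, details = fwt([4.0, 6.0, 10.0, 12.0], levels=2)
```

With Haar filters each level simply averages and differences neighbouring pairs, which makes the recursion easy to check by hand before switching to a longer filter pair.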

We project each sample to a time–frequency space by WT, and the $i$ th sample in the time–frequency space can be expressed as follows:

F^{i} = \begin{bmatrix} f_{11}^{i} & f_{12}^{i} & \dots & f_{1 n_{1}}^{i} \\ f_{21}^{i} & f_{22}^{i} & \dots & f_{2 n_{1}}^{i} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ f_{D 1}^{i} & f_{D 2}^{i} & \dots & f_{D n_{1}}^{i} \end{bmatrix}

(3)

where $F^{i}$ represents the wavelet coefficient matrix, and $n_{1}$ is the maximum decomposition level that can be decided by the Jarque–Bera test. Each column of $F^{i}$ denotes a different frequency range. The length of each frequency range equals the dimensionality of $X_{i}$ .

ST

ST is an extension of WT and it can overcome some disadvantages of WT. Unlike WT, ST can simultaneously provide the amplitude and the phase information, and, furthermore, ST is not sensitive to noise. ST can be defined as (Lin and Meng, 2011)

S (τ, f) = \int_{- \infty}^{\infty} X_{i} (t) \frac{| f |}{\sqrt{2 π}} e^{- \frac{{(τ - t)}^{2} f^{2}}{2}} e^{- j 2 π f t} d t

(4)

where $τ$ and $f$ represent time and frequency, respectively. We also project each sample of the original data set to another time–frequency space by ST, and the $i$ th sample in the space can be represented as follows:

S^{i} = \begin{bmatrix} s_{11}^{i} & s_{12}^{i} & \dots & s_{1 n_{1}}^{i} \\ s_{21}^{i} & s_{22}^{i} & \dots & s_{2 n_{1}}^{i} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ s_{D 1}^{i} & s_{D 2}^{i} & \dots & s_{D n_{1}}^{i} \end{bmatrix}

(5)

where $S^{i}$ represents the projection coefficient matrix of the $i$ th sample, and $n_{1}$ is the number of selected frequency ranges. Each column of $S^{i}$ denotes a different frequency range. The length of each frequency range is equal to the dimensionality of $X_{i}$ .
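To make equation (4) concrete, the sketch below evaluates the ST at a single time–frequency point by a direct Riemann sum. It assumes the usual Gaussian window of the ST, that is, a negative exponent and the normalization $|f| / \sqrt{2 π}$, and a uniformly sampled signal with sampling frequency `fs`; the function name `s_transform_point` is hypothetical, and a practical implementation would normally use a faster frequency-domain formulation.

```python
import cmath
import math


def s_transform_point(x, fs, tau, f):
    """Approximate S(tau, f) for samples x taken at sampling frequency fs.

    Assumes the Gaussian window |f|/sqrt(2*pi) * exp(-(tau - t)^2 f^2 / 2);
    the integral over t is replaced by a sum over the sample instants.
    """
    dt = 1.0 / fs
    total = 0j
    for n, x_n in enumerate(x):
        t = n * dt
        window = abs(f) / math.sqrt(2.0 * math.pi) * math.exp(-((tau - t) ** 2) * f ** 2 / 2.0)
        total += x_n * window * cmath.exp(-2j * math.pi * f * t)
    return total * dt


# A 5 Hz cosine sampled at 100 Hz for one second.
fs = 100.0
x = [math.cos(2.0 * math.pi * 5.0 * n / fs) for n in range(100)]
at_signal_frequency = abs(s_transform_point(x, fs, tau=0.5, f=5.0))
away_from_signal_frequency = abs(s_transform_point(x, fs, tau=0.5, f=15.0))
```

For this test signal the magnitude is large at the signal frequency and close to zero at 15 Hz, which is the localization behaviour that the columns of $S^{i}$ rely on.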

Tensor construction and decomposition

Based on the two time–frequency spaces built above, we construct a tensor model for each sample. The number of subspaces is two, and the size of each subspace is $D \times n_{1}$. The third-order tensor form of $X_{i}$ can then be denoted as $X_{i} \in R^{2 \times D \times n_{1}}$, whose elements are represented as $x_{i_{1} i_{2} i_{3}}$ ($1 ⩽ i_{1} ⩽ 2, 1 ⩽ i_{2} ⩽ D, 1 ⩽ i_{3} ⩽ n_{1}$). The tensor $X_{i}$ is shown in Figure 3. Note that the sizes of the subspaces must be equal.

Figure 3.

Tensor representation.
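The construction of the sample tensor can be sketched in a few lines of NumPy. The sketch assumes that the WT and ST coefficient matrices $F^{i}$ and $S^{i}$ are available as real-valued arrays of equal size (for the ST this would mean, for example, coefficient magnitudes, which is an assumption and not stated in the text); the function name `build_sample_tensor` is hypothetical.

```python
import numpy as np


def build_sample_tensor(F_i, S_i):
    """Stack the two D x n1 coefficient matrices into a 2 x D x n1 tensor."""
    F_i = np.asarray(F_i, dtype=float)
    S_i = np.asarray(S_i, dtype=float)
    if F_i.shape != S_i.shape:
        # The two subspaces must have the same size before they can be stacked.
        raise ValueError("the two subspaces must have equal size")
    return np.stack([F_i, S_i], axis=0)


# Toy example with D = 4 and n1 = 3 (placeholder values, not real coefficients).
F_toy = np.arange(12, dtype=float).reshape(4, 3)
S_toy = F_toy + 100.0
X_toy = build_sample_tensor(F_toy, S_toy)
```

The first index of the result selects the subspace (0 for WT, 1 for ST), so `X_toy[1]` returns the ST matrix unchanged.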

In order to explore the intrinsic features of a tensor, HOSVD, an extension of singular value decomposition (SVD) to tensors, is employed to decompose the tensor. Compared with SVD, HOSVD is more efficient at extracting significant features. Before decomposition, HOSVD must flatten a tensor into a series of matrices along the various dimensions of the tensor (see Figure 4). SVD is then applied to these matrices; that is, HOSVD still works on matrices. However, each matrix flattened from the tensor contains all the information regarding the various factors. For example, $X_{i} \in R^{2 \times D \times n_{1}}$ is a third-order tensor; flattening it along its first dimension gives a matrix $X \in R^{2 \times D n_{1}}$, and it is clear that the other factors are also contained in $X$. Performing HOSVD on $X_{i}$, we obtain the following forms (Costantini et al., 2008):

\begin{matrix} X_{Frequency} = U_{Frequency} Λ_{Frequency} V_{Frequency} \\ X_{Time} = U_{Time} Λ_{Time} V_{Time} \\ X_{Subspace} = U_{Subspace} Λ_{Subspace} V_{Subspace} \end{matrix}

(6)

where $X_{Frequency}, X_{Time}, X_{Subspace}$ denote the matrices obtained by flattening $X_{i}$ along the frequency, time and subspace dimensions, respectively; $Λ_{Frequency}, Λ_{Time}, Λ_{Subspace}$ denote the diagonal singular value matrices; $U_{Frequency}$, $U_{Time}$, $U_{Subspace}$ span the column spaces of $X_{Frequency}, X_{Time}, X_{Subspace}$, respectively; and $V_{Frequency}, V_{Time}, V_{Subspace}$ span the corresponding row spaces. We can then use $U_{Frequency}$, $U_{Time}$ and $U_{Subspace}$ to reconstruct the tensor $X_{i}$ (Vasilescu and Terzopoulos, 2003):

\begin{matrix} X = \sum_{i_{1} = 1}^{2} \sum_{i_{2} = 1}^{D} \sum_{i_{3} = 1}^{n_{1}} σ_{i_{1} i_{2} i_{3}} (u_{i_{1}}^{Subspace} \circ u_{i_{2}}^{Frequency} \circ u_{i_{3}}^{Time}) \\ = Z \times_{Subspace} U_{Subspace} \times_{Frequency} U_{Frequency} \times_{Time} U_{Time} \end{matrix}

(7)

where ∘ denotes the outer product, $U_{Subspace} = [u_{1}^{Subspace}, u_{2}^{Subspace}]$, $U_{Frequency} = [u_{1}^{Frequency}, u_{2}^{Frequency}, \dots, u_{D}^{Frequency}]$, $U_{Time} = [u_{1}^{Time}, u_{2}^{Time}, \dots, u_{n_{1}}^{Time}]$, $[Z]_{i_{1} i_{2} i_{3}} = σ_{i_{1} i_{2} i_{3}}$, and $Z \times_{i} U_{i}$ ($i$ can be Subspace, Frequency or Time) denotes the mode-$i$ product of the tensor $Z \in R^{2 \times D \times n_{1}}$ and the matrix $U_{i}$, which is again a tensor $B$. For example, to compute $Z \times_{Frequency} U_{Frequency}$, first flatten $Z$ along the Frequency dimension into a matrix $Z_{Frequency} \in R^{D \times 2 n_{1}}$; then compute $B_{Frequency} = U_{Frequency} Z_{Frequency}$; finally, fold $B_{Frequency}$ back into a tensor $B$. The core tensor $Z$ can be computed using the following equation (Vasilescu and Terzopoulos, 2003):

Z = X \times_{Subspace} U'_{Subspace} \times_{Frequency} U'_{Frequency} \times_{Time} U'_{Time}

(8)

where $U'_{Frequency}$, $U'_{Time}$ and $U'_{Subspace}$ represent the transposes of $U_{Frequency}$, $U_{Time}$ and $U_{Subspace}$, respectively. $Z$ represents the relationships between different subspaces, while $Λ = (Λ_{Frequency}, Λ_{Time}, Λ_{Subspace})$, that is, the set of singular value matrices, expresses the inner information of a subspace. In this paper, both $Λ$ and $Z$ are selected as the initial features to represent a sample. Then $X_{i}$ can be represented by

T_{i} = [σ_{1 i}, σ_{2 i}, \dots, σ_{n_{2} i}, z_{1 i}, z_{2 i}, \dots, z_{n_{3} i}]^{'}

(9)

where the $σ$ are the non-zero elements of $Λ$ , the $z$ are the elements of $Z$ , $n_{2}$ is the number of elements selected from $Λ$ , and $n_{3}$ is the number of elements selected from $Z$ . So far, we can denote the original data set by the extracted features:

\begin{matrix} P = [T_{1}, T_{2}, \dots, T_{N}] \\ = \begin{bmatrix} σ_{11} & σ_{12} & \dots & σ_{1 N} \\ σ_{21} & σ_{22} & \dots & σ_{2 N} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ σ_{n_{2} 1} & σ_{n_{2} 2} & \dots & σ_{n_{2} N} \\ z_{11} & z_{12} & \dots & z_{1 N} \\ z_{21} & z_{22} & \dots & z_{2 N} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ z_{n_{3} 1} & z_{n_{3} 2} & \dots & z_{n_{3} N} \end{bmatrix} \in R^{(n_{2} + n_{3}) \times N} \end{matrix}

(10)

Figure 4.

Flattening a tensor along different dimensions.
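The flattening, the mode product and equations (6) to (9) can be sketched with NumPy as follows. This is only a minimal illustration on a random $2 \times 6 \times 4$ tensor: the helper names (`unfold`, `mode_product`, `hosvd`, `initial_features`) are hypothetical, the modes are addressed by axis number instead of by the names Subspace, Frequency and Time, and all singular values and core elements are kept, whereas the paper selects $n_{2}$ and $n_{3}$ of them.

```python
import numpy as np


def unfold(tensor, mode):
    """Flatten a tensor along `mode`; rows of the result index that mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Mode product: multiply `matrix` into axis `mode` of `tensor`."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)


def hosvd(tensor):
    """Return the core tensor, the factor matrices and the singular values."""
    factors, singular_values = [], []
    for mode in range(tensor.ndim):
        # SVD of each flattened matrix, as in equation (6).
        U, s, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(U)
        singular_values.append(s)
    core = tensor
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)      # equation (8)
    return core, factors, singular_values


def initial_features(tensor):
    """Concatenate singular values and core elements, as in equation (9)."""
    core, _, singular_values = hosvd(tensor)
    return np.concatenate(singular_values + [core.ravel()])


rng = np.random.default_rng(0)
X_i = rng.standard_normal((2, 6, 4))              # placeholder sample tensor
core, factors, svals = hosvd(X_i)

rebuilt = core
for mode, U in enumerate(factors):
    rebuilt = mode_product(rebuilt, U, mode)      # equation (7)

T_i = initial_features(X_i)
```

Reconstructing the tensor from the core and the factor matrices is a convenient sanity check that the flattening and folding conventions agree with each other.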

LLE

Although $P$ is the feature of $X$ , the dimensionality of $P$ is still high, which will be difficult for the following fault diagnosis. Hence, LLE is performed on $P$ to reduce its dimensions. The computational process of LLE can be roughly divided into the following three steps.

Step 1: Compute the neighbours of each point by the $k$-nearest neighbour ($k$-NN) method or the $ε$-neighbourhood method. Euclidean distance is commonly used to measure the similarity between two points, but it is only suitable for linear data sets. Other distance measures, such as Chebyshev, city block, Spearman and correlation distances, have also been proposed. In practical applications, the measure should be selected according to the distribution of the data set. In addition, the number $k$ of neighbours is a key factor for LLE. If $k$ is large, the embedding result mainly expresses the global characteristics, while a small $k$ may lose the relationship among submanifolds. A $k$ that is either too large or too small may decrease the performance of LLE.

Step 2: Explore the local structure information of the original data set. LLE treats the local reconstruction weight coefficients as the structure information. The weight coefficients can be computed as follows:

\begin{matrix} \arg \min_{W_{i}} ‖ T_{i} - \sum_{j = 1}^{k} w_{i j} T_{i}^{j} ‖_{2}^{2} \\ \text{s.t.} \; e' W_{i} = 1 \end{matrix}

(11)

where $T_{i}^{j}$ denotes the $j$th neighbour of $T_{i}$, $e = [1, 1, \dots, 1]^{'} \in R^{k \times 1}$, $W_{i} = [w_{i 1}, w_{i 2}, \dots, w_{i k}]^{'} \in R^{k \times 1}$, and $w_{i j}$ indicates the weight coefficient between the point $i$ and its $j$th neighbour. Let $G_{i} \in R^{k \times k}$ denote the local Gram matrix whose $(j, l)$th element is $(T_{i} - T_{i}^{j})^{'} (T_{i} - T_{i}^{l})$. Under the constraint, equation (11) has a unique solution provided that $G_{i}$ is non-singular. Equation (11) can be solved by the Lagrange multiplier method; then the optimal solution $W_{i}$ can be obtained by

W_{i} = \frac{G_{i}^{- 1} e}{e^{'} G_{i}^{- 1} e}

(12)

Step 3: Compute the embedding result $Y = [Y_{1}, Y_{2}, \dots, Y_{N}]$. $Y$ is obtained by preserving the local structure information of the original data set in a low-dimensional space. It can be computed by

\begin{matrix} \arg \min_{Y} \sum_{i = 1}^{N} ‖ Y_{i} - \sum_{j = 1}^{k} w_{i j} Y_{i}^{j} ‖_{2}^{2} = tr (Y^{'} M Y) \\ \text{s.t.} \; \frac{1}{N} Y^{'} Y = I \\ Y^{'} e = 0 \end{matrix}

(13)

where $Y_{i}^{j}$ represents the $j$th neighbour of $Y_{i}$, $M = (I_{N} - W)^{'} (I_{N} - W)$, and $W \in R^{N \times N}$ represents the weight coefficient matrix. The element in the $i$th row and $j$th column of $W$ denotes the weight between $x_{i}$ and $x_{j}$. If $x_{j}$ is a neighbour of $x_{i}$, $w_{i j}$ is computed using equation (12); otherwise, $w_{i j} = 0$. The constraints of equation (13) make the solution of the cost function invariant to translations and rescaling. The low-dimensional coordinates of $X$ can be obtained by calculating the bottom $d + 1$ eigenvectors of the matrix $M$, that is, those with the smallest eigenvalues. Because of the normalization constraint $\sum_{j = 1}^{k} w_{i j} = 1$, zero is a trivial eigenvalue of $M$, and the corresponding eigenvector should be excluded. The remaining $d$ eigenvectors give the final low-dimensional coordinates.
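The three steps above can be sketched in NumPy as follows. This is a minimal illustration and not the implementation used in the experiments: it assumes Euclidean $k$-NN, takes $G_{i}$ to be the local Gram matrix of the differences between a point and its neighbours, and adds a small regularization term `reg` to $G_{i}$ (an assumption, needed when $G_{i}$ is close to singular). The function name `lle` is hypothetical.

```python
import numpy as np


def lle(P, k, d, reg=1e-3):
    """Embed the columns of P (features x N, as in equation (10)) into d dimensions."""
    X = np.asarray(P, dtype=float).T                # one sample per row
    N = X.shape[0]

    # Step 1: k nearest neighbours under the Euclidean distance.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Step 2: reconstruction weights from equation (12).
    W = np.zeros((N, N))
    e = np.ones(k)
    for i in range(N):
        diff = X[i] - X[neighbours[i]]              # k x features
        G = diff @ diff.T                           # local Gram matrix G_i
        G = G + reg * np.trace(G) / k * np.eye(k)   # regularization (assumption)
        w = np.linalg.solve(G, e)                   # proportional to G_i^{-1} e
        W[i, neighbours[i]] = w / w.sum()           # enforce e' W_i = 1

    # Step 3: bottom d + 1 eigenvectors of M, discarding the trivial one.
    I_minus_W = np.eye(N) - W
    M = I_minus_W.T @ I_minus_W
    _, eigenvectors = np.linalg.eigh(M)             # eigenvalues in ascending order
    Y = eigenvectors[:, 1:d + 1] * np.sqrt(N)       # scale so that Y'Y / N = I
    return Y, W


rng = np.random.default_rng(0)
P_demo = rng.standard_normal((10, 40))              # placeholder features, N = 40
Y_demo, W_demo = lle(P_demo, k=5, d=2)
```

Solving the linear system $G_{i} w = e$ and normalizing the result avoids forming $G_{i}^{-1}$ explicitly while giving the same weights as equation (12).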

Fault diagnosis

Based on the MLSLLE algorithm, a novel machinery fault diagnosis method is developed. First, the features of the training and test data sets are simultaneously extracted by MLSLLE; then, the support vector machine (SVM) is trained by the features of training data set; finally, the classes of the test samples are recognized by the trained SVM. The complete fault diagnosis process is shown in Figure 5, and the detailed description of the process is as follows.

Step 1: Sample the data from a machine, then divide the data into training and test samples.

Step 2: Decompose each sample by WT and ST, and construct a tensor form of each sample from the results.

Step 3: Perform HOSVD on each constructed tensor to obtain the initial features $T_{i} (i = 1, 2, \dots, N)$. A new data set $P$ is constructed using these initial features.

Step 4: Reduce the dimensionality of $P$ by LLE.

Step 5: Train the SVM in the low-dimensional space obtained in step 4.

Step 6: Compute the features of the test samples by the procedure described in steps 2 to 4.

Step 7: Recognize the classes of the test samples using the trained SVM.

Figure 5.

Fault diagnosis based on MLSLLE.
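The flow of the procedure can be sketched as a small driver function. The sketch is only an outline under stated assumptions: the feature extractor is passed in as a callable that processes the training and test samples together (as described above for MLSLLE), and a nearest-centroid classifier is used as a simple stand-in so that the example stays self-contained; it is not an SVM. All function names are hypothetical.

```python
import numpy as np


def diagnose(train_X, train_y, test_X, extract_features, fit, predict):
    """Extract features jointly, train on the training part, label the test part."""
    n_train = len(train_X)
    features = extract_features(np.vstack([train_X, test_X]))  # steps 2-4 and 6
    train_F, test_F = features[:n_train], features[n_train:]
    model = fit(train_F, np.asarray(train_y))                  # step 5
    return predict(model, test_F)                              # step 7


def fit_centroids(F, y):
    """Stand-in classifier (not an SVM): store one centroid per class."""
    labels = np.unique(y)
    centroids = np.array([F[y == label].mean(axis=0) for label in labels])
    return labels, centroids


def predict_centroids(model, F):
    """Assign each sample to the class with the nearest centroid."""
    labels, centroids = model
    dist = np.linalg.norm(F[:, None, :] - centroids[None, :, :], axis=2)
    return labels[dist.argmin(axis=1)]


# Toy data: two well-separated classes, identity "feature extraction".
rng = np.random.default_rng(1)
train_X = np.vstack([rng.normal(0.0, 0.1, (5, 3)), rng.normal(10.0, 0.1, (5, 3))])
train_y = [0] * 5 + [1] * 5
test_X = np.vstack([rng.normal(0.0, 0.1, (2, 3)), rng.normal(10.0, 0.1, (2, 3))])
predicted = diagnose(train_X, train_y, test_X, lambda A: A, fit_centroids, predict_centroids)
```

Passing the extractor and the classifier as arguments keeps the driver independent of any particular SVM library; an actual SVM would replace `fit_centroids` and `predict_centroids`.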

Experiments

In this section, two real bearing data sets are utilized to validate the effectiveness of our proposed method.

Bearing data set 1: The bearing data set is obtained from the Case Western Reserve University Bearing Data Center website. This data set, sampled from a motor test platform, has become a standard data set used to verify the efficiency of a new fault diagnosis method. As shown in Figure 6, the test platform consists of a motor (left), a torque transducer/encoder (centre), a dynamometer (right), and control electronics (not shown). It can generate four types of data: normal data, inner race fault data, outer race fault data, and ball fault data. The dimensionality of each sample is 1024, and each class contains 100 samples. The one-dimensional signals in the time domain are shown in Figure 7.

Figure 6.

Bearing test platform 1.

Figure 7.

Time-domain signals of bearing data set 1.

In this experiment, we mainly analyse the performance of MLSLLE in terms of feature extraction. According to the characteristics of the signal, sym3 is selected as the mother wavelet, and each sample is decomposed into eight layers by WT. In addition, to ensure that the sizes of the two subspaces are equal, we select only eight important frequency ranges from ST. In order to compare their capability for feature extraction, PCA, tensor locality preserving projections (TENSORLPP) (He et al., 2005, 2006), conventional MLS and our improved MLS are all performed on the bearing data set, and the most significant feature and the top three significant features of each sample are shown in Figure 8.

Figure 8.

Features of bearing data 1 extracted by different algorithms. (The red ‘*’ denotes ball fault data; the green ‘▵’ represents inner race fault data; the blue ‘□’ denotes outer race fault data; the black ‘o’ indicates normal data.)

Since PCA is a linear dimension reduction algorithm, it can only deal with simple linear data sets. Hence, the features extracted by PCA are not suitable for classification. This is demonstrated by Figure 8(a) and 8(b), where most of the samples overlap. Although TENSORLPP and MLS describe a signal using tensor models, both algorithms construct one tensor model for the whole data set, from which the features of each sample are extracted. In other words, TENSORLPP and MLS are global algorithms that ignore the local information, so the features obtained by the two algorithms are still not suitable for recognition. As shown in Figure 8(g) and 8(h), although two kinds of data are close, all the samples can be recognized in three-dimensional space. This is because each sample is described in tensor form, which increases the amount of useful information. Furthermore, the features of each sample are extracted independently, which benefits feature extraction. We perform the k-NN method on the one-dimensional and the three-dimensional feature spaces obtained by improved MLS, and find that the normal and the inner race samples can be fully recognized (accuracy reaches 100%). Because the features of the ball and outer race samples are very similar in one-dimensional space, the k-NN method cannot distinguish between the two classes there. However, in three-dimensional space, the inter-class separability between the ball and outer race samples increases, so the recognition accuracy is 100%. Finally, we use LLE to reduce the dimensions of the features obtained by improved MLS ($k = 8$, $d = 3$). As shown in Figure 9, the samples with the same label are projected nearly to a point, while the samples with different labels are separated. So our proposed method is well suited to extracting the features of a complex data set.

Figure 9.

Features of bearing data set 1 extracted by MLSLLE. (The red ‘*’ denotes ball fault data; the green ‘▵’ represents inner race fault data; the blue ‘□’ denotes outer race fault data; the black ‘o’ indicates normal data.)

The computational cost of MLSLLE can be roughly divided into three parts including tensor construction, HOSVD decomposition and LLE. In order to test the computational cost of MLSLLE, we use MLSLLE to deal with the bearing data set, whose size is 400 and whose dimensionality is 1024. The result shows that MLSLLE only takes 15.6 s to deal with this data set, so MLSLLE can meet the requirements of industrial applications.

Bearing data set 2: This bearing data set is collected from a real test platform installed in our own laboratory. As shown in Figure 10(a), the test platform mainly consists of a motor, a gearbox and a bearing. Each kind of data is generated by replacing the bearing with one that has a different fault (see Figure 10(b)). The platform can generate four kinds of data: normal data, inner race fault data, outer race fault data and ball fault data. All the data are sampled from two accelerometers mounted on the test platform in the horizontal and vertical directions. The bearing is driven by the motor at a rotating speed of 1200 rpm, and the sampling frequency is set to 10 kHz. For convenience, the dimensionality of each sample is 1024, and each class contains 80 samples. The one-dimensional signals in the time domain are shown in Figure 11.

Figure 10.

Bearing test platform 2.

Figure 11.

Four typical vibration signals of bearing data set 2.

In this experiment, we address a classification problem using bearing data set 2. MLSLLE is applied to the data set to extract the features. Additionally, PCA, local tangent space alignment (LTSA), WSVLLE (Liu et al., 2016), DIFFUSION MAPS, and LLE are applied to the data sampled from the vertical direction, for comparison with MLSLLE. The optimal parameters of all these algorithms are selected, and the visualizations are shown in Figure 12. It is clear that the features extracted by the algorithms that depend on vector models overlap heavily in three-dimensional space, so the class of each sample cannot be directly recognized. Since they use tensor models, the features obtained by WSVLLE and MLSLLE have better intra-class compactness and inter-class separability. Finally, we apply SVM to the embedding results to quantitatively analyse the performance of our proposed method in machinery fault diagnosis. The highest accuracy of each algorithm and the corresponding dimensionality are listed in Table 1. Although the accuracy of both WSVLLE and MLSLLE is 100%, the minimum dimensionality of the features required is five for WSVLLE and two for MLSLLE. This implies that the features obtained by MLSLLE are more significant than those obtained by WSVLLE, so our proposed method performs very well in terms of fault diagnosis.

Figure 12.

Features of bearing data 2 extracted by different algorithms. (The red ‘*’ denotes ball fault data; the green ‘▵’ represents inner race fault data; the blue ‘□’ denotes outer race fault data; the black ‘o’ indicates normal data.)

Table 1.

Recognition accuracy of different algorithms for bearing data set 2 (%).

Algorithm	Ball	Inner race	Outer race	Normal
PCA ( $d = 12$ )	60	80	90	40
LTSA ( $d = 8$ , $k = 12$ )	70	80	80	70
DIFFUSION MAPS ( $d = 8$ )	60	70	90	50
LLE ( $d = 7$ , $k = 12$ )	30	70	90	100
WSVLLE ( $d = 5$ , $k = 12$ )	100	100	100	100
MLSLLE ( $d = 2$ , $k = 12$ )	100	100	100	100
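
The per-class accuracies in Table 1 can be summarized by their mean over the four classes. The short sketch below does this arithmetic; it assumes that every class contributes the same number of test samples, so that the mean of the per-class values equals the overall accuracy, and the variable names are hypothetical.

```python
# Per-class recognition accuracy (%) copied from Table 1,
# in the order: ball, inner race, outer race, normal.
table_1 = {
    "PCA": [60, 80, 90, 40],
    "LTSA": [70, 80, 80, 70],
    "DIFFUSION MAPS": [60, 70, 90, 50],
    "LLE": [30, 70, 90, 100],
    "WSVLLE": [100, 100, 100, 100],
    "MLSLLE": [100, 100, 100, 100],
}

# Mean over the four classes (equal class sizes assumed).
mean_accuracy = {name: sum(values) / len(values) for name, values in table_1.items()}

for name, value in mean_accuracy.items():
    print(f"{name}: {value:.1f}")
```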

Conclusion

This paper developed a tensor-based machinery fault diagnosis method called MLSLLE. MLSLLE uses the tensor model to describe each sample and employs HOSVD and LLE to explore the intrinsic features. We applied the proposed fault diagnosis method to two real bearing data sets, and the experimental results show that the proposed method can efficiently extract significant features and can obtain high recognition accuracy. In future work we will mainly consider how to use the tensor form directly with LLE. We will also investigate supervised and semi-supervised variants of MLSLLE.

Footnotes

Declaration of conflicting interests

The authors declare that there is no conflict of interest.

Funding

This work is supported by the National Natural Science Foundation of China (number 61673102).

References

Afra, Gildin (2016) Tensor based geology preserving reservoir parameterization with higher order singular value decomposition (HOSVD). Computers & Geosciences 94: 110–120.

Alkaya, Grimble (2015) Non-linear minimum variance estimation for fault detection systems. Transactions of the Institute of Measurement & Control 37(6): 805–812.

An, Tang (2016) Application of variational mode decomposition energy distribution to bearing fault diagnosis in a wind turbine. Transactions of the Institute of Measurement & Control 5(2): 753–772.

Bayindir (2015) Early detection of rogue waves by the wavelet transforms. Physics Letters A 38(1–2): 156–161.

Cheng, Jiang, et al. (2016) Incremental locally linear embedding-based fault detection for satellite attitude control systems. Journal of the Franklin Institute 353(1): 17–36.

Cichocki, Mandic, Lathauwer, et al. (2015) Tensor decompositions for signal processing applications: From two-way to multiway component analysis. IEEE Signal Processing Magazine 32(2): 145–163.

Costantini, Sbaiz, Susstrunk (2008) Higher order SVD analysis for dynamic texture synthesis. IEEE Transactions on Image Processing 17(1): 42–52.

Daubechies (1990) The wavelet transform, time-frequency localization and signal analysis. IEEE Transactions on Information Theory 3(65): 961–1005.

Ding, Wen, et al. (2015) Min-max discriminant analysis based on gradient method for feature extraction. In: International conference on control automation robotics & vision. IEEE, pp. 129–134.

Ding, Wen, et al. (2016) Locality sensitive batch feature extraction for high-dimensional data. Neurocomputing 171: 664–672.

Du, Zhang (2014) Target detection based on a dynamic subspace. Pattern Recognition 47(1): 344–358.

Du, Fan, Zhang (2016) Fault diagnosis of non-Gaussian process based on FKICA. Journal of the Franklin Institute 354(6): 2573–2590.

Gaied (2015) Wavelet-based prognosis for fault-tolerant control of induction motor with stator and speed sensor faults. Transactions of the Institute of Measurement & Control 37(1): 100–113.

Gao, Woo, et al. (2016a) Multidimensional tensor-based inductive thermography with multiple physical fields for offshore wind turbine gear inspection. IEEE Transactions on Industrial Electronics 63(10): 6305–6315.

Gao, Woo, Gui, et al. (2016b) Unsupervised diagnostic and monitoring of defects using waveguide imaging with adaptive sparse representation. IEEE Transactions on Industrial Informatics 12(1): 405–416.

Gao, Woo, et al. (2016c) Unsupervised sparse pattern diagnostic of defects with inductive thermography imaging system. IEEE Transactions on Industrial Informatics 12(1): 371–383.

Hao, Wen, Zhang, et al. (2016) Q estimation of seismic data using the generalized S-transform. Journal of Applied Geophysics 135: 122–134.

He (2013) Frequency manifold for nonlinear feature extraction in machinery fault diagnosis. Mechanical Systems & Signal Processing 35(1–2): 200–218.

He, Cai, Niyogi (2006) Tensor subspace analysis. In: Advances in neural information processing systems, pp. 499–506.

He, Cai, Liu, et al. (2005) Image clustering with tensor representation. In: ACM international conference on multimedia, Singapore, November, pp. 132–140.

Jialiang, Jianfu, Feng (2016) Fault diagnosis for multivariable non-linear systems based on non-linear spectrum feature. Transactions of the Institute of Measurement & Control 39(7): 122–134.

Lin, Meng (2011) An adaptive Generalized S-transform for instantaneous frequency estimation. Signal Processing 91(8): 1876–1886.

Liu, Moor (2013) Multiview partitioning via tensor methods. IEEE Transactions on Knowledge & Data Engineering 25(5): 1056–1069.

Liu, Zeng, et al. (2016) LLE for submersible plunger pump fault diagnosis via joint wavelet and SVD approach. Neurocomputing 185: 202–211.

Liu, Wang (2014) New approach to derivative calculation of multi-valued logical functions with application to fault detection of digital circuits. IET Control Theory & Applications 8(8): 554–560.

Li, Wei, et al. (2016) A new rolling bearing fault diagnosis method based on multiscale permutation entropy and improved support vector machine based binary tree. Measurement 77: 80–94.

Louwerse, Smilde (2000) Multivariate statistical process control of batch processes based on three-way models. Chemical Engineering Science 55(7): 1225–1235.

Luo, Bao, Gao, et al. (2014) Batch process monitoring with GTucker2 model. Industrial & Engineering Chemistry Research 53(39): 15101–15110.

Luo, Tao, Ramamohanarao, et al. (2015) Tensor canonical correlation analysis for multi-view dimension reduction. IEEE Transactions on Knowledge & Data Engineering 27(11): 3111–3124.

Mallat (1989) A theory for multi-resolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(7): 674–693.

Nomikos, Macgregor (1995) Multi-way partial least squares in monitoring batch processes. Chemometrics & Intelligent Laboratory Systems 30(1): 97–108.

Rasmus (1997) PARAFAC: Tutorial and applications. Chemometrics and Intelligent Laboratory Systems 38(2): 149–171.

Rong, Liu, Shao (2012) Dynamic fault diagnosis using extended matrix and tensor locality preserving discriminant analysis. Chemometrics & Intelligent Laboratory Systems 116(1): 41–46.

Roweis, Saul (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500): 2323–2326.

Seryasat, Shoorehdeli, Ghane, et al. (2012) Intelligent fault detection of ball bearing using FFT, STFT energy entropy and RMS. Life Science Journal 9(3): 1781–1786.

Vasilescu MAO, Terzopoulos (2003) Multilinear subspace analysis of image ensembles. In: IEEE Conference on Computer Vision and Pattern Recognition, Madison, WI, 2: 93–99. London: Springer-Verlag.

Wang, Yin (2014) Data-driven fault diagnosis for an automobile suspension system by using a clustering based method. Journal of the Franklin Institute 351(6): 3231–3244.

Yin, Gao, et al. (2015a) Data-based techniques focused on modern industry: An overview. IEEE Transactions on Industrial Electronics 62(1): 657–667.

Yin, Wang, Karimi (2013) Data-driven design of robust fault detection system for wind turbines. Mechatronics 24(4): 298–306.

Yin, Yang, Karimi (2012) Data-driven adaptive observer for fault diagnosis. Mathematical Problems in Engineering 2012(3, part b): 1094–1099.

Yin, Zhu, Kaynak (2015b) Improved PLS focused on key-performance-indicator-related fault diagnosis. IEEE Transactions on Industrial Electronics 62(3): 1651–1658.

Zhang, Wang, Sun (2016) A report on multilinear PCA plus GTDA to deal with face image. Cybernetics & Information Technologies 16(1): 146–157.

Zhang, Tao, Gao, et al. (2015) Learning multiple linear mappings for efficient single image super-resolution. IEEE Transactions on Image Processing 24(3): 846–861.

Zhao, Wang (2017) Tensor dynamic neighborhood preserving embedding algorithm for fault diagnosis of batch process. Chemometrics and Intelligent Laboratory Systems 162(15): 94–103.

Zhou, Wang, Zhang, et al. (2014) Face recognition based on PCA and logistic regression analysis. Optik: International Journal for Light and Electron Optics 125(20): 5916–5919.