Fault identification with limited labeled data

Abstract

Intelligent fault diagnosis (IFD) based on deep learning has shown excellent performance; however, deep learning methods require massive amounts of data, and the lack of sufficient labeled data limits their real-world application. In this paper, we propose a two-step technique that extracts fault-discriminative features for classification using unlabeled samples and a limited number of labeled samples. To this end, we first train an Autoencoder (AE) on unlabeled samples to extract a set of potentially useful features for classification; subsequently, a contrastive learning-based post-training step uses the limited labeled samples available to improve the discriminability of the feature set. Our experiments on the SEU bearing dataset show that unsupervised feature learning using AEs improves classification performance. In addition, we demonstrate the effectiveness of contrastive learning for the post-training step: this strategy outperforms Cross-Entropy-based post-training when labeled information is limited.

Keywords

Autoencoder, unsupervised feature learning, few-shot learning, contrastive learning, Siamese network, rolling element bearing

Introduction

It is almost impossible to prevent faults from developing in rotating machinery, primarily because of its long operating hours and complex operating environment Zhao et al. (2020). On the other hand, the vital role this equipment plays in modern industry makes it necessary to keep it operational as much as possible. Thus, in recent decades, fault diagnosis of rotating machinery has gained special attention Liu et al. (2018b). The promising performance of artificial intelligence methods in pattern recognition, together with the difficulty of obtaining a solid understanding of a machine's dynamical behavior and the extensive experience required to diagnose complex machines, gave rise to the term Intelligent Fault Diagnosis (IFD). Accordingly, IFD is defined as the use of intelligent methods to diagnose machine health states Lei et al. (2020).

Rolling element bearings are among the most important components of rotating machinery Cheng et al. (2021). Therefore, the application of intelligent methods to their diagnosis has been widely investigated Han et al. (2021), Xu et al. (2020) and Pei et al. (2021).

In the context of machinery fault diagnosis, Deep Learning methods are well known, mainly because, unlike classical machine learning approaches, they do not require human supervision for feature extraction and feature selection: they are able to learn features from the given inputs independently Lei et al. (2020).

The training of deep networks needs enormous amounts of information as training data since it involves the optimization of huge numbers of variables as model parameters. In contrast, in most cases of machinery fault diagnosis, sufficient amounts of information are not available as running industrial machines and pieces of equipment in faulty mode have severe consequences Wang et al. (2020). Therefore, insufficient labeled training data are a challenge for the application of deep learning methods in real-world applications.

Due to its simpler collection process and affordability, unlabeled information can be considered a potential candidate to decrease the amount of labeled information required. Autoencoders, as neural networks employing unsupervised learning strategies, are a great choice to mine unlabeled information.

To improve the feature set extracted by the AE for classification purposes, different supervised strategies can be employed. The simplest approach is to place a softmax layer at the end of the feature extractor and retrain the network. However, this method does not perform well when labeled information is extremely scarce.

When the available labeled information is extremely limited, Few-Shot Learning can potentially provide better classification performance than conventional methods Zhang et al. (2019). Few-Shot Learning (FSL) approaches are a set of methods and techniques with great potential to decrease the dependency of deep learning models on the availability of training data Jadon (2020). Although they are best known for their performance in computer vision, their application in other fields, including intelligent fault diagnosis, is growing rapidly.

In this paper, we introduce a hybrid classification approach that combines unsupervised feature learning with contrastive representation learning-based post-training. Specifically, Siamese networks are applied as a Few-Shot Learning strategy to optimize the feature space created by the encoder section of an Autoencoder network. To the best of our knowledge, this is the first work to use these two methods as complements of each other. The applicability of the proposed method to limited labeled data scenarios is demonstrated by comparing its performance with conventional baselines. In this case study, we achieved more than 90% classification accuracy using only 5 labeled observations from each class, with an advantage of up to almost 9% over the conventional classification approaches. Furthermore, we investigate whether reducing the number of trainable parameters can improve the classification accuracy achievable by the post-training step.

Theoretical background

Autoencoders

In its simplest formation, an Autoencoder (AE) is a neural network consisting of 3 layers, known as the input, latent space or bottleneck, and output, where input and output layers have an equal number of neurons, and the bottleneck has lower dimensions. During the training process, the AE learns how to reconstruct the given input at the output, with the restriction of extracting a lower dimension representation, at the bottleneck Xia et al. (2017). As illustrated in Figure 1, an AE is made of two main building blocks, known as encoder and decoder. While the encoder is supposed to derive the set of features available at latent space, the decoder is responsible for reconstructing the given input at the output layer Liu et al. (2018a). Similar to every other neural network, the training of an AE requires a loss function. An ideal AE is capable of reconstructing the given input exactly the same at its output layer; therefore, the loss function of an AE is usually a similarity metric. Due to its ease of implementation and reliable performance, Mean Squared Error is the most frequent choice as the loss function to train an AE.

Figure 1.

Visual demonstration of a simple 3-layered autoencoder network.
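The encoder, decoder, and reconstruction loss described above can be illustrated with a minimal NumPy sketch. It shows only a forward pass through a 3-layer AE and the MSE computation, not the training loop. The layer sizes, the initialisation, the tanh bottleneck, the linear output layer, and all function names are illustrative assumptions rather than the configuration used in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    # Small random weights and zero biases (illustrative initialisation).
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def encode(x, params):
    # Encoder: maps the input to the lower-dimensional bottleneck.
    W, b = params
    return np.tanh(x @ W + b)

def decode(z, params):
    # Decoder: maps the bottleneck back to the input dimension.
    W, b = params
    return z @ W + b

def mse(x, x_hat):
    # Mean Squared Error between the input and its reconstruction.
    return float(np.mean((x - x_hat) ** 2))

n_input, n_latent = 8, 3            # bottleneck narrower than the input
enc = init_layer(n_input, n_latent)
dec = init_layer(n_latent, n_input)

x = rng.random((4, n_input))        # batch of 4 toy observations
z = encode(x, enc)                  # features at the latent space
x_hat = decode(z, dec)              # reconstruction at the output layer
loss = mse(x, x_hat)                # quantity minimised during training
```

An ideal AE would drive `loss` to zero; in practice, training adjusts the encoder and decoder parameters by gradient descent to make it as small as possible.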

Similar to other neural networks, the deeper an AE is, the more abstract feature set it would extract at its bottleneck. Making use of a deep encoder and decoder is one strategy to implement a deep AE; however, it is not the only approach to take. Alternatively, a series of shallow AEs can be trained, where the embedding provided at the latent space of an AE is used as the input of another one. Having all the AEs trained, encoders existing at each AE are placed consecutively to produce a deep encoder, and similarly, a deep decoder is produced by putting decoders in a row. A deep AE implemented using this technique is known as Stacked Autoencoder (SAE).

Few-shot learning

Few-Shot Learning (FSL) can be considered a set of strategies and techniques for improving the performance of deep learning networks in limited data scenarios. The idea behind every FSL strategy is to train a neural network with few observations while retaining acceptable generalization to unseen observations Jadon (2020). Contrastive representation learning using Siamese networks is one FSL approach that is effective in low data scenarios. As the name suggests, a Siamese network is made of two identical models (sharing not only the same structure but also the same network parameters) that extract embeddings from an arbitrary pair of observations. Pairs passed to the Siamese network belong either to the same class (positive pairs) or to different classes (negative pairs). During training, the Siamese network learns to maximize the similarity between positive pairs and simultaneously minimize the similarity between negative pairs, which results in a contrastive space. The similarity between two arbitrary observations is usually evaluated using a distance metric. Euclidean Distance is the most widely used option Witten and Frank (2002), mainly due to its geometric intuition and applicability to high-dimensional spaces (it is the preferred choice for 1-dimensional and 2-dimensional signals such as vibration, images, and similar modalities of information). A Siamese network in its simplest form is illustrated in Figure 2.

Figure 2.

Illustration of a simple Siamese network.

Contrastive loss Jadon (2020) is an option used for contrastive learning. For a pair of inputs, the contrastive loss can be computed as follows

$$\mathrm{ContrastiveLoss} = (1 - c)\,\frac{1}{2} D_{w}^{2} + c\,\frac{1}{2} \left(\max(0, m - D_{w})\right)^{2}$$

(1)

where D_w is a distance metric, such as the Euclidean Distance, between the embeddings derived from the given pair of inputs; c is the contrastive label of the pair (c = 0 for a positive pair and c = 1 for a negative pair, so that each pair activates exactly one term); and m is the margin, that is, the desired distance between clusters of observations from different classes. According to the given equation, this function can be divided into two terms:

• The first term is responsible for minimizing the distance between the embeddings of observations, sharing the same class

• The second term is responsible for maximizing the dissimilarity between the embeddings of observations from different classes, up to the margin m Jadon (2020)
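The two terms of Equation (1) can be written out directly. The following NumPy sketch assumes that D_w is the Euclidean distance and that c = 0 marks a positive pair and c = 1 a negative pair, which is the convention under which the first term acts on same-class pairs. The function name, the default margin of 1.0, and the toy embeddings are illustrative assumptions.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, c, margin=1.0):
    """Per-pair contrastive loss following Equation (1).

    emb_a, emb_b: arrays of shape (n_pairs, n_features)
    c: array of shape (n_pairs,), 0 = same class, 1 = different classes
    """
    d = np.linalg.norm(emb_a - emb_b, axis=1)            # D_w (Euclidean)
    same_class_term = (1 - c) * 0.5 * d ** 2             # pulls positives together
    diff_class_term = c * 0.5 * np.maximum(0.0, margin - d) ** 2  # pushes negatives apart
    return same_class_term + diff_class_term

# Three toy pairs with distances of about 1.0, 0.5 and 5.0.
emb_a = np.zeros((3, 2))
emb_b = np.array([[0.6, 0.8], [0.3, 0.4], [3.0, 4.0]])
c = np.array([0, 1, 1])
losses = contrastive_loss(emb_a, emb_b, c, margin=1.0)
```

In this toy case the positive pair is penalised for being one unit apart, the close negative pair is penalised for sitting inside the margin, and the distant negative pair contributes nothing because it is already farther apart than m.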

Related works

Several studies have assessed the effectiveness of AEs for unsupervised feature learning in fault diagnosis. For instance, in Liu et al. (2018a) the Stacked Autoencoder technique is used to train a deep AE for gearbox fault diagnosis. Similarly, in Xia et al. (2017) an SAE is employed to diagnose the health state of motor bearings; in that study, the effectiveness of the proposed method is evaluated under varying availability of labeled data. Moreover, Denoising Autoencoders (DAE), a family of AEs capable of reconstructing a noise-free version of an arbitrary noisy observation, are widely used to counter noise interference in noisy environments. For example, Sun et al. (2016) used Sparse Denoising Autoencoders, which add dropout layers to conventional DAEs, to assess the efficiency of AE-powered fault diagnosis of induction motors in the presence of different noise levels and various levels of labeled data availability. In Li et al. (2017), an AE with a series of modifications is used to improve robustness to noise in rolling bearing fault diagnosis. The modifications mainly rely on dropout layers that set the activation of randomly chosen neurons in both the input and the latent space to zero, which makes the whole network more robust to random processes interfering with the original data. In Shao et al. (2017), a new loss function based on Cross-Entropy is designed to train AEs for feature learning, and the artificial fish swarm algorithm is used to optimize the key parameters of the deep AE. The efficiency of the proposed method is assessed on fault diagnosis of a gearbox test rig and an electrical locomotive roller bearing. In Sohaib and Kim (2018), the SAE technique is used to train deep AEs for roller bearing fault detection; in that study, sparsity is also employed to prevent overfitting during training.

Zhang et al. (2019) employed a Siamese network consisting of a pair of deep convolutional neural networks as feature extractor, to achieve performance superior to conventional approaches, in the diagnosis of rolling bearings in limited data scenarios. This study also considered the tolerance of the proposed training strategy to the presence of various levels of noise, in the form of random Gaussian noise. More recently, in Zhang et al. (2021), FSL is employed to diagnose bearings using model-agnostic meta-learning with an aim to train fault classifiers in scenarios involving low amounts of training data availability. Similarly, in Tao et al. (2022) a novel Few-Shot learning strategy based on model agnostic matching network is proposed. The performance of the introduced method is evaluated based on both identical machines and cross-equipment fault diagnosis scenarios. Their experiments confirm the effectiveness of the proposed approach, in comparison with conventional approaches.

Proposed Method

Any deep neural network (DNN) that solves a classification task can be considered to consist of two main parts: the feature extractor and the classifier. The feature extractor extracts feature sets with increasing levels of abstraction through the network, while the classifier maps the final set of extracted features to the labels. Let x be the input, $\tilde{x}$ the final set of features extracted by the feature extractor, and L_x the label corresponding to x. The feature extractor can be described as $F : x \to \tilde{x}$ and, similarly, the classifier as $C : \tilde{x} \to L_{x}$. The focus of our work is on designing the feature extractor; on top of the constructed feature space, any conventional classifier can be used.

Conventional approaches to training DNN-based classification networks depend solely on labeled information. As mentioned earlier, however, collecting large amounts of labeled information for machinery fault diagnosis is not only time-consuming and costly but also not always possible, because operating an industrial machine in a faulty mode severely affects the other machines connected to it. Therefore, a DNN-based fault diagnosis approach that can use both labeled and unlabeled information and performs robustly when labeled data is extremely scarce is quite beneficial. The method proposed in this paper is such an approach: by using an AE that learns features in a fully unsupervised manner, we gain an invaluable preliminary understanding of the data without using any labeled information. Once the AE has absorbed the basic characteristics of the data, as judged by its performance in reconstructing unseen data, a post-training process that optimizes the feature space provided by the encoder section of the AE is essential. This is because the AE is not aware of how the classes are arranged within the feature space, so the feature space is not directly useful for classification. Different post-training procedures can be used to make the feature space of the encoder useful for classification. To keep the amount of labeled information required during this procedure as low as possible, we take advantage of Siamese networks, a contrastive representation learning approach adaptable to extremely limited labeled data scenarios. During this post-training, the feature space produced by the encoder is modified so that observations belonging to the same health state are aggregated in the same region of the space, while regions representing different health states are placed sufficiently far from each other. Contrastive learning as an FSL approach is capable of achieving a more separable feature set than conventional approaches, and higher separability translates into higher classification performance. The proposed method is demonstrated in Figure 3.

Figure 3.

Visual demonstration of the proposed hybrid method.

Experiments

Dataset

Rolling bearings play a crucial role in rotating machinery, and their failure is one of the most frequent causes of machine downtime. In this study, the multi-fault bearing dataset provided by Southeast University¹ Shao et al. (2018) is used to evaluate the performance of the learning regimes introduced in the previous section. This dataset includes 5 classes: Normal, Ball, Inner Ring, Outer Ring, and simultaneous Inner and Outer Ring faults. In addition, two rotating speeds, 20 and 30 Hz, are included. The dataset provides several acceleration channels corresponding to different measurement locations. In this study, the time signal in the second channel is first windowed into a series of signals of length 1024, and the Fast Fourier Transform (FFT) is then applied as a preprocessing step. Using the FFT to transform the time-domain signal into the frequency domain is a widely accepted course of action Liu et al. (2018a), Li et al. (2017), Meng et al. (2018) in vibration analysis for rotating machinery fault detection, as faults are considerably easier to recognize in the frequency domain. Applying the FFT to the time-domain windows yields 512-point frequency-domain signals covering the 0–1000 Hz range. Given the signal length, the frequency resolution is finer than 2 Hz, which we find adequate. The dataset is split into training, validation, and test sets with ratios of 0.6, 0.2679, and 0.1321, respectively. Moreover, a MinMax scaler is used to scale the components of the frequency spectrum. The scaler is fitted on the training set and then applied to all three sets.
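The preprocessing chain described above (windowing, FFT, MinMax scaling fitted on the training data only) can be sketched as follows. This is a minimal illustration on a synthetic sine wave, not the code used in this study. The sampling rate of 2000 Hz is an assumption inferred from the stated 0–1000 Hz range, the use of non-overlapping windows and of the magnitude spectrum with the Nyquist bin dropped are assumptions, and all function names are hypothetical.

```python
import numpy as np

def window_signal(signal, length=1024):
    # Cut the signal into non-overlapping windows, dropping the remainder.
    n = len(signal) // length
    return signal[: n * length].reshape(n, length)

def magnitude_spectrum(windows):
    # rfft of a 1024-point window gives 513 bins; keep the first 512.
    spec = np.abs(np.fft.rfft(windows, axis=1))
    return spec[:, : windows.shape[1] // 2]

def fit_minmax(train):
    # Per-frequency-bin minimum and maximum, from training data only.
    return train.min(axis=0), train.max(axis=0)

def apply_minmax(x, lo, hi):
    # Guard against constant bins to avoid division by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (x - lo) / span

fs = 2000.0                                  # assumed sampling rate (Hz)
t = np.arange(1024 * 5 + 100) / fs
signal = np.sin(2 * np.pi * 100.0 * t)       # synthetic 100 Hz vibration

windows = window_signal(signal)              # shape (5, 1024)
spectra = magnitude_spectrum(windows)        # shape (5, 512)
train, test = spectra[:3], spectra[3:]
lo, hi = fit_minmax(train)                   # fitted on training windows only
train_scaled = apply_minmax(train, lo, hi)
test_scaled = apply_minmax(test, lo, hi)     # same scaler reused
resolution = fs / windows.shape[1]           # about 1.95 Hz per bin
```

Under the assumed sampling rate, the resolution of 2000 / 1024 ≈ 1.95 Hz agrees with the "finer than 2 Hz" figure quoted in the text. Reusing `lo` and `hi` on the test windows mirrors the requirement that the scaler be fitted on the training set only.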

Proposed method vs. comparable methods

To evaluate the effectiveness of the proposed method in fault diagnosis of rolling bearings, its performance is compared with the following methods.

Softmax feature extractor

This learning regime involves a fully supervised training process, where the classification network consists of a feature extractor followed by a simple softmax layer.

Few-shot feature extractor

A Few-Shot feature extractor is a feature extractor trained with the few-shot (Siamese contrastive) strategy in a fully supervised manner, that is, using labeled information only and without unsupervised pre-training.

Softmax optimization of encoder

This learning regime is a hybrid of supervised and unsupervised learning. First, an Autoencoder is trained on the dataset in an unsupervised manner, using only unlabeled information. Afterward, a softmax layer is added to the encoder and the whole network is post-trained using the labeled information to improve the classification capability of the encoder.

Procedure of experiments

The whole training dataset referenced in the previous section is used to train the AE. We employed a symmetrical structure with 512-256-128-256-512 neurons per layer to construct the AE used in this study. We trained the AE using the Adam optimizer with a learning rate of 0.001; 100 epochs provided a satisfactory training process. The hyperbolic tangent is used as the activation function for all layers of the AE, due to the highly discriminative feature set it provides. The AE used in this paper is a conventional deep one, as the stacking technique was not essential. Finally, Mean Squared Error (MSE) is used not only as the loss function during the training of the AE but also as its performance evaluation criterion; based on our previous experience, we considered an MSE lower than 0.0075 on unseen data sufficient. The AE trained with this architecture and set of parameters achieved an MSE of 0.0064 on the unseen test set.

As this study focuses on the robustness of the learning regimes in restricted labeled data scenarios, various amounts of labeled information are provided to the aforementioned methods as follows:

• 2 Samples of each machine health class (Total of 10 Labeled Samples)

• 3 Samples of each machine health class (Total of 15 Labeled Samples)

• 4 Samples of each machine health class (Total of 20 Labeled Samples)

• 5 Samples of each machine health class (Total of 25 Labeled Samples)

• 1% of Training Data (Total of 49 Labeled Samples)

• 5% of Training Data (Total of 245 Labeled Samples)

• 10% of Training Data (Total of 490 Labeled Samples)

In each of the above cases, labeled samples are randomly selected from the training dataset.

In cases where the post-training is implemented using FSL, the Adam optimizer with a learning rate of 0.00001 is used, and 1500 epochs ensured an appropriate training process. In scenarios where a softmax layer is added to the feature extractor and the Cross-Entropy loss function is used for post-training, the Adam optimizer with a learning rate of 0.001 and 100 epochs were used.

To keep the comparison between the methods fair, the efficiency of each feature extractor for classification purposes is evaluated by training a K-nearest neighbor (KNN) classifier on the embedding it provides. The classification performance of the KNN classifier is evaluated on the testing dataset. The number of neighbors used by the KNN classifier varies with the number of labeled samples provided. In cases where fewer than five samples of each health state are provided, the number of neighbors equals the number of samples of each health state. For cases with five or more samples of each health state, the number of neighbors is set to five.
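The neighbor-count rule and the KNN evaluation can be sketched in a few lines of NumPy. This is an illustrative brute-force implementation under the assumption of Euclidean distance and majority voting; the function names and the toy embeddings are hypothetical, and the original experiments may have relied on a library classifier instead.

```python
import numpy as np

def choose_k(samples_per_class):
    # k equals the number of samples per class, capped at five.
    return min(samples_per_class, 5)

def knn_predict(train_emb, train_labels, test_emb, k):
    # Pairwise Euclidean distances, shape (n_test, n_train).
    d = np.linalg.norm(test_emb[:, None, :] - train_emb[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]      # indices of the k nearest
    votes = train_labels[nearest]               # labels of those neighbors
    # Majority vote; ties resolve to the smallest label index.
    return np.array([np.bincount(v).argmax() for v in votes])

# Toy embeddings: two well-separated classes, three labeled samples each.
train_emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                      [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]])
train_labels = np.array([0, 0, 0, 1, 1, 1])
test_emb = np.array([[0.05, 0.05], [0.95, 1.05]])

k = choose_k(samples_per_class=3)
pred = knn_predict(train_emb, train_labels, test_emb, k)
accuracy = float(np.mean(pred == np.array([0, 1])))
```

Because the same classifier is applied to every embedding, differences in accuracy can be attributed to the feature extractors rather than to the classification head.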

In the implementation of learning regimes involving Few-Shot Learning, it is not always computationally affordable to use all the available pairs. Therefore, in the experiments where 5% and 10% of the whole training dataset is used as the available labeled information, respectively, 12 and 6 pairs are randomly sampled out of all available pairs. In these two cases, sampling out of pairs is done in a way that the number of positive and negative pairs is equal. Moreover, to exclude the effect of randomness in the training of the networks, experiments are done with ten repetitions.
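The balanced pair sampling mentioned above can be sketched as follows. The snippet enumerates all index pairs from a set of labeled samples, separates them into positive and negative pairs, and draws the same number of each. The function name, the seed handling, and the default of taking as many pairs as the smaller group allows are illustrative assumptions; the contrastive label uses 0 for positive and 1 for negative pairs, in line with Equation (1).

```python
import itertools
import numpy as np

def make_balanced_pairs(labels, n_pairs_per_kind=None, seed=0):
    rng = np.random.default_rng(seed)
    all_pairs = list(itertools.combinations(range(len(labels)), 2))
    pos = [p for p in all_pairs if labels[p[0]] == labels[p[1]]]
    neg = [p for p in all_pairs if labels[p[0]] != labels[p[1]]]
    # Never request more pairs than either group can provide.
    k = min(len(pos), len(neg))
    if n_pairs_per_kind is not None:
        k = min(k, n_pairs_per_kind)
    pos_sel = [pos[i] for i in rng.choice(len(pos), size=k, replace=False)]
    neg_sel = [neg[i] for i in rng.choice(len(neg), size=k, replace=False)]
    pairs = pos_sel + neg_sel
    c = np.array([0] * k + [1] * k)   # 0 = positive pair, 1 = negative pair
    return pairs, c

# Five classes with five labeled samples each (25 labeled samples).
labels = np.repeat(np.arange(5), 5)
pairs, c = make_balanced_pairs(labels)
```

With 25 labeled samples there are 300 possible pairs, of which 50 are positive and 250 negative, so balancing keeps all 50 positive pairs and a random 50 of the negative ones. The number of possible pairs grows quadratically with the number of labeled samples, which is why subsampling becomes necessary in the larger cases.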

Results and discussion

A wide range of criteria and metrics is available to evaluate the performance of a classifier; since a balanced dataset is used in this study, we believe classification accuracy is a reasonable choice. The mean accuracy achieved by each classification approach introduced in the previous section is summarized in Table 1.

Table 1.

Mean classification accuracy and standard deviation, by Methods and Amount of Labeled Information.

Amount of Labeled Information	Methods
Amount of Labeled Information	Few-Shot Optimization of Encoder (Proposed Method)	Softmax Feature Extractor	Few-Shot Feature Extractor	Softmax Optimization of Encoder
2 Samples per class	0.6763 (0.0856)	0.5966 (0.1242)	0.5476 (0.0404)	0.6427 (0.1196)
3 Samples per class	0.7837 (0.0837)	0.6751 (0.1012)	0.6479 (0.0602)	0.7714 (0.0691)
4 Samples per class	0.8366 (0.0817)	0.7001 (0.0752)	0.7121 (0.0537)	0.8090 (0.0785)
5 Samples per class	0.9243 (0.0467)	0.8376 (0.0684)	0.7578 (0.0459)	0.9048 (0.0348)
1% of Training Data	0.9653 (0.0305)	0.9164 (0.0605)	0.8772 (0.0453)	0.9462 (0.0348)
5% of Training Data	0.9960 (0.0019)	0.9982 (0.0012)	0.9995 (0.0005)	0.9946 (0.0055)
10% of Training Data	0.9995 (0.0004)	0.9996 (0.0004)	0.9999 (0.0003)	0.9994 (0.0007)

Learning regimes that utilize unlabeled information through unsupervised feature learning outperform supervised-only training strategies over a wide range of labeled information availability, namely in every case up to 1% of the training data; with 5% or more, all four methods exceed 99% accuracy. The benefit of unsupervised feature learning is significant enough that high classification accuracy becomes available with much smaller amounts of labeled information. For example, methods capable of unsupervised feature learning reach 90% or higher classification accuracy with just 25 samples, while Softmax Feature Extractor requires at least 49 samples (1% of the whole training data available) and Few-Shot Feature Extractor does not achieve such performance until it is provided with 245 samples (5% of the whole training data available).

Setting aside the Few-Shot Feature Extractor regime because of its poor classification accuracy, the generally lower SD of the methods with unsupervised feature learning indicates that they are also likely to be more robust in terms of classification accuracy.

The significant difference between the classification performance provided by Softmax Feature Extractor and Softmax Optimization of Encoder can be explained by the fact that Softmax Feature Extractor as a fully supervised learning regime, is not able to take advantage of unlabeled information. Moreover, the performance of this learning regime is highly dependent on the amount of labeled information.

Similar to Softmax Feature Extractor, Few-Shot Feature Extractor is also incapable of making use of unlabeled information.

In addition, FSL post-training of the encoder turned out to be more effective than Softmax Optimization in terms of average classification accuracy.

In Figure 4(a), the separability of health states in the raw data is visualized using t-distributed Stochastic Neighbor Embedding (t-SNE). The existence of two clusters for each health state results from the presence of two loading conditions in the dataset. Although these clusters look easily separable, severe over-fitting is inevitable in limited labeled information cases. In Figure 4(b), the separability of health states in the embedding provided by the encoder post-trained with FSL is shown. Five labeled observations per class (25 observations in total) are used to conduct the post-training process. Except for the clusters corresponding to healthy operation of the bearing and to the combined Inner-Outer ring fault state, which show a slight overlap and a few outliers, the health state clusters are properly separable. In addition, this training procedure succeeded in aggregating observations that belong to the same health state across different loading conditions.

Figure 4.

Demonstration of health state separability by dimensions derived using t-SNE. (a) 2D t-SNE visualization of raw data. (b) 2D t-SNE visualization of the embedding derived by Few-Shot Optimization of Encoder (utilizing 5 labeled observations per class). H: healthy, B: ball fault, I: inner race fault, O: outer race fault and I-O: combination of inner race and outer race faults.

Ablation study—improving post-training process by reduction of trainable parameters

It is generally accepted that a given amount of information has limited potential for optimizing model parameters in machine learning tasks. Accordingly, increasing the complexity of a model under severe data scarcity is not only unlikely to help but may also decrease the overall performance due to over-fitting. Conversely, decreasing the complexity of a model, which reduces the number of parameters to be optimized during training, can be considered a corrective action to increase its performance. The same reasoning can be applied to the encoder post-training process in the method proposed in this study. In every neural network, the deeper a layer is, the more abstract the features it provides. Therefore, especially for classification purposes, the later layers play a more vital role than the initial layers. Based on this reasoning, additional experiments are conducted to examine the effect of freezing the weights and biases of the first encoder layer before post-training in scenarios with extremely little labeled information. Freezing the weights and biases of the first encoder layer is equivalent to using all of the available labeled information to post-train the second encoder layer, which directly affects the classification performance. The same choice of optimizer and learning rate as in the previous set of experiments provided acceptable performance here too, and the pair-making process is identical to the one previously explained. The results of this experiment can be found in Table 2. As with the previously presented results, the means and standard deviations in Table 2 are computed over ten trials to exclude the effect of randomness as much as possible.
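To give a sense of how much freezing the first layer reduces the optimization problem, the following short calculation counts the trainable parameters of the 512-256-128 encoder. It assumes fully connected layers with one bias per neuron; the text mentions weights and biases but does not state the layer type, so the exact counts are an assumption.

```python
def dense_params(n_in, n_out):
    # Weights plus one bias per output neuron of a fully connected layer.
    return n_in * n_out + n_out

encoder_layers = [(512, 256), (256, 128)]        # encoder half of the AE
per_layer = [dense_params(i, o) for i, o in encoder_layers]

total = sum(per_layer)                           # all encoder parameters
trainable_frozen_first = per_layer[1]            # only the second layer
fraction = trainable_frozen_first / total        # share still trainable
```

Under this assumption the first layer holds 131,328 of the 164,224 encoder parameters, so freezing it leaves 32,896 trainable parameters, roughly 20% of the original count, to be fitted with the few labeled samples available.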

Table 2.

Effect of freezing the initial layer of encoder on the mean and standard deviation of classification accuracy.

Method	Initial Layer State	2 Samples per class	3 Samples per class	4 Samples per class	5 Samples per class
Few-Shot Optimization of Encoder	Frozen	0.7402 (0.0659)	0.7954 (0.0745)	0.8888 (0.0292)	0.9436 (0.0397)
Few-Shot Optimization of Encoder	Unfrozen	0.6763 (0.0856)	0.7837 (0.0837)	0.8366 (0.0817)	0.9243 (0.0467)
Softmax Optimization of Encoder	Frozen	0.6669 (0.0569)	0.7351 (0.0719)	0.8469 (0.0201)	0.8926 (0.0385)
Softmax Optimization of Encoder	Unfrozen	0.6427 (0.1196)	0.7714 (0.0691)	0.8090 (0.0785)	0.9048 (0.0348)

An alternative approach to decreasing the number of trainable parameters during post-training is to train a secondary neural network, with far fewer parameters and reduced complexity, on the embedding provided by the encoder. In this study, we used a network with a 128-64-32 architecture and the hyperbolic tangent activation function as the feature extractor of the secondary neural network trained on the encoder embedding. For this set of experiments, Adam optimizers with learning rates of 0.00001 and 0.001 were used for the FSL and Cross-Entropy-based approaches, respectively, and 1500 and 100 epochs, respectively, provided a proper training process. The performance of both the FSL and the softmax optimization approach as the training strategy of the secondary network is evaluated as in the previous experiment, and the results can be found in Table 3.

Table 3.

Mean and standard deviation of classification accuracy achieved by secondary feature extractors on encoder embedding.

	Amount of labeled information
Method	2 Samples per class	3 Samples per class	4 Samples per class	5 Samples per class
Few-shot FE on encoder embedding	0.5389 (0.0616)	0.6309 (0.0720)	0.6769 (0.0505)	0.8186 (0.0467)
Softmax FE on encoder embedding	0.6001 (0.0945)	0.7178 (0.0805)	0.7802 (0.0660)	0.8840 (0.0233)

Conclusion

In this study, a hybrid learning approach that combines supervised and unsupervised learning is presented to take advantage of both the labeled and the unlabeled information available. The proposed method has the potential to decrease the amount of labeled information required to achieve baseline accuracy, compared with the most frequently used conventional methods.

The performance of the proposed method is evaluated using the SEU Bearings Datasets, known to be amongst the best available rotating machinery benchmark datasets. This dataset includes time-domain acceleration signals, collected from bearings for various loads and different health states. In addition to the presented method, the performance of other methods is also evaluated to provide a performance comparison across classification approaches.

The experiments show the benefit of using AEs to exploit unlabeled information: this approach not only improves the mean accuracy but also yields a lower SD. In addition, FSL proves to be the better choice for post-training the embedding provided by the encoder when labeled information is limited.

The proposed method can be regarded as a solution for cases in which little labeled data is available. Its performance in diagnosing faults in other types of rotating machinery can be evaluated on similar datasets, such as the SEU gear dataset. Moreover, because the machinery data available in real-world fault diagnosis applications are highly imbalanced, the performance of the proposed method also needs to be assessed when the unlabeled data are imbalanced.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Abbas Rohani Bastami

References

Cheng, Kong, Chen, et al. (2021) Transferable convolutional neural network based remaining useful life prediction of bearing under multiple failure behaviors. Measurement 168: 108286.

Han, Zhang, Yin, et al. (2021) Rolling bearing fault diagnosis with combined convolutional neural networks and support vector machine. Measurement 177: 109022.

Jadon (2020) An overview of deep learning architectures in few-shot learning domain. arXiv preprint arXiv:2008.06365.

Lei, Yang, Jiang, et al. (2020) Applications of machine learning to machine fault diagnosis: a review and roadmap. Mechanical Systems and Signal Processing 138: 106587.

Zhang, Peng, et al. (2018) Bearing fault diagnosis using fully-connected winner-take-all autoencoder. IEEE Access 6: 6103–6115.

Liu, Bao, Han (2018a) A stacked autoencoder-based deep neural network for achieving gearbox fault diagnosis. Mathematical Problems in Engineering 2018.

Liu, Yang, Zio, et al. (2018b) Artificial intelligence for fault diagnosis of rotating machinery: a review. Mechanical Systems and Signal Processing 108: 33–47. DOI: 10.1016/j.ymssp.2018.02.016.

Meng, Zhan, et al. (2018) An enhancement denoising autoencoder for rolling bearing fault diagnosis. Measurement 130: 448–454.

Pei, Zheng (2021) Intelligent bearing fault diagnosis based on Teager energy operator demodulation and multiscale compressed sensing deep autoencoder. Measurement 179: 109452.

Shao, Jiang, Zhao, et al. (2017) A novel deep autoencoder feature learning method for rotating machinery fault diagnosis. Mechanical Systems and Signal Processing 95: 187–204.

Shao, McAleer, Yan, et al. (2019) Highly accurate machine fault diagnosis using deep transfer learning. IEEE Transactions on Industrial Informatics 15: 2446–2455.

Sohaib, Kim (2018) Reliable fault diagnosis of rotary machine bearings using a stacked sparse autoencoder-based deep neural network. Shock and Vibration 2018: 1–11.

Sun, Shao, Zhao, et al. (2016) A sparse auto-encoder-based deep neural network approach for induction motor faults classification. Measurement 89: 171–178.

Tao, Cheng, Qiu, et al. (2022) Few shot cross equipment fault diagnosis method based on parameter optimization and feature metric. Measurement Science and Technology 33: 115005.

Wang, Kong, et al. (2020) Few-shot rolling bearing fault diagnosis with metric-based meta learning. Sensors 20: 6437.

Witten, Frank (2002) Data mining: practical machine learning tools and techniques with Java implementations. ACM SIGMOD Record 31: 76–77.

Xia, Liu, et al. (2017) Intelligent fault diagnosis approach with unsupervised feature learning by stacked denoising autoencoder. IET Science, Measurement & Technology 11: 687–695.

Fang, Tang, et al. (2020) An unsupervised and enhanced deep belief network for bearing performance degradation assessment. Measurement 162: 107902.

Zhang, Cui, et al. (2019) Limited data rolling bearing fault diagnosis with few-shot learning. IEEE Access 7: 110895–110904.

Zhang, Wang, et al. (2021) Few-shot bearing fault diagnosis based on model-agnostic meta-learning. IEEE Transactions on Industry Applications 57: 4754–4764.

Zhao, et al. (2020) Deep learning algorithms for rotating machinery intelligent diagnosis: an open source benchmark study. ISA Transactions 107: 224–255. DOI: 10.1016/j.isatra.2020.08.010.