Abstract
Wind turbine gearbox failures significantly contribute to operational downtime and elevated maintenance expenses, necessitating advanced integrated condition monitoring techniques. In practical scenarios, sensor malfunction frequently leads to incomplete acoustic and vibration signals, impeding accurate fault diagnosis. To address this challenge, this study proposes a data imputation strategy utilizing generative adversarial imputation networks (GAINs) to reconstruct missing signal data. Acoustic and vibration data were collected from a scaled gearbox setup, replicating multicomponent fault conditions and varying sensor malfunction scenarios with missing data rates from 10 to 50%. Continuous wavelet transform was applied to convert imputed signals from one-dimensional time-series data into two-dimensional spectrograms, enhancing distinct fault feature visualization. Subsequently, fault classification was conducted using pretrained deep learning models, specifically SqueezeNet and DenseNet-201. The proposed GAIN showed a high coefficient of determination (0.98077–0.998253), with Pearson’s correlation coefficients exceeding 0.99, underscoring GAIN’s capability to preserve original signal variance and maintain the physical relevance of imputed signals. DenseNet-201 consistently outperformed SqueezeNet, achieving a maximum accuracy of 95.205% at 10% missing data and maintaining a high accuracy of 92.343% even at 50% data loss. By comparison, SqueezeNet’s accuracy ranged from 91.512% to 87.509% under the same conditions. The proposed methodology integrates a data imputation technique with transfer learning-based classification, demonstrating robustness, accuracy, and practical applicability in gearbox condition monitoring under real-world sensor malfunction conditions.
Introduction
Worldwide, rising concerns regarding climate change have motivated substantial financial investments and policy initiatives aimed at promoting renewable energy solutions. Among the diverse renewable energy resources, wind energy has emerged prominently, gaining attention and priority due to its sustainability and environmental benefits. Particularly, from 2015 to 2020, investment in wind power plants worldwide witnessed an impressive escalation, marking approximately a 29% growth. 1 Although wind power is advantageous as it eliminates conventional fuel costs during the production of electricity, consistent operational and maintenance expenses remain significant concerns in wind-based energy generation systems. Wind turbines (WTs) are complex engineering structures comprising numerous interconnected subsystems, each responsible for specific operational tasks. Among these subsystems, the gearbox stands out due to its critical functionality—it efficiently transfers mechanical power from the low-speed rotor shaft hub to the generator via a high-speed shaft. Despite meticulous design considerations, WT components frequently encounter failures primarily attributed to harsh operational environments, severe cyclic loading, and fluctuating weather conditions. Consequently, the gearbox emerges as a notably vulnerable subsystem within WT, exhibiting the highest incidence of mechanical failures and resulting in substantial downtime relative to other subsystems. 2 The gearbox in a WT consistently endures irregular and fluctuating mechanical loads as it conveys rotational energy between shafts operating at significantly different speeds. These intense and cyclic stresses generate two distinct categories of mechanical failures: localized and distributed defects. Localized defects predominantly arise from fatigue-induced stress concentrations, resulting in common issues such as gear tooth fractures, bearing spalls, and cracks within bearing races. 
Conversely, distributed defects typically originate from factors including improper installations, manufacturing inaccuracies, misalignments, and abrasive wear phenomena. 3 Both categories significantly impair the gearbox’s functionality, leading to pronounced maintenance challenges and considerable financial burdens due to prolonged downtimes and elevated repair costs. 4 Consequently, the operational and maintenance expenditures associated with WT form a substantial proportion of the total cost involved in wind energy generation projects, estimated to account for nearly 30% of the cumulative lifecycle cost of wind farms. 5 To mitigate these escalating expenses and reduce unplanned downtimes, the adoption of preventive maintenance strategies becomes crucial. Within preventive maintenance approaches, condition monitoring (CM), characterized as a predictive maintenance practice, has increasingly attracted attention. The primary objective of CM is the continuous assessment of machine health through systematic observation of sensor-acquired data from strategically mounted monitoring devices placed at vital subsystem points on the turbine.
CM leverages sophisticated signal processing algorithms combined with advanced decision-making methodologies to identify early fault indicators and precisely quantify the severity of potential defects. Over recent years, researchers have actively explored diverse CM methodologies employing different physical parameters and measurement techniques, encompassing vibration analysis, acoustic emissions, lubricant oil sampling, motor current monitoring, and thermal assessments.6–10 However, individual CM techniques frequently exhibit limitations, notably a limited ability to detect all defects comprehensively. 11 To overcome the limitations inherent to single-technique approaches, integrated CM (ICM) has emerged as a promising strategy.12,13 ICM synthesizes and analyzes data acquired simultaneously from multiple distinct sensor types, thereby achieving more comprehensive diagnostic accuracy. 14 Significant advancements have also occurred within signal processing algorithms for fault detection, specifically focusing on adaptive and expansion-based decomposition techniques. 15 These techniques have improved diagnostic accuracy in gearbox CM.16–20 The progressive integration of deep learning (DL) techniques within fault diagnosis methodologies has been instrumental in revolutionizing the predictive maintenance landscape, especially for WT gearboxes. DL approaches inherently possess advantages such as automated extraction of representative features directly from raw or minimally processed sensor data, thereby significantly minimizing human involvement in manual feature engineering. The adaptive and hierarchical representation capabilities intrinsic to DL networks allow precise recognition and classification of fault characteristics that traditional machine learning techniques might overlook. Moreover, contemporary research efforts have explored various specialized DL architectures tailored explicitly toward gearbox CM tasks. Zhang et al. 21 developed an end-to-end fault diagnosis method using empirical mode decomposition and one-dimensional (1D) convolutional neural networks (CNNs), achieving a fault recognition accuracy of 96.44%. Similarly, Yu et al. 22 proposed a graph CNN that achieves fast classification by reducing the number of CNN nodes. The proposed model’s performance was verified using multiple experimental datasets.
Transfer learning (TL) has increasingly become integral in fault detection methodologies due to its capacity to leverage existing knowledge from previously trained models or source domains, enhancing fault diagnosis accuracy particularly in cases of limited target domain data.23,24 The core strength of transfer learning lies in its ability to utilize DL models trained on extensive historical data to detect faults accurately in new or different operational conditions. 25 Li et al. 26 demonstrated this by employing a convolutional autoencoder (CAE) enhanced with transfer learning, termed CAE-TL, which achieved classification accuracies consistently above 90%, significantly outperforming the conventional CAE without transfer learning, which attained around 70%. Specifically, CAE-TL reached near-perfect classification accuracy for gearbox temperature anomalies and hydraulic pressure issues, underscoring the method’s efficiency in transferring learned diagnostic capabilities across different turbine scenarios. Further validating transfer learning efficacy, Jamil et al. 27 introduced a deep boosted transfer learning approach explicitly designed to minimize the adverse impacts of negative knowledge transfer. This method strategically reduces the influence of irrelevant information from source domain datasets, enhancing fault detection accuracy significantly when compared to traditional DL approaches. Zhu et al. 28 similarly emphasized the utility of domain adaptation strategies, in which transfer learning models diagnosed complex fault patterns across varying operational conditions by integrating diverse datasets, demonstrating that effective domain adaptation can significantly mitigate the discrepancies arising from operational differences among turbine gearboxes. Chen et al. 29 explored different transfer learning algorithms, including TrAdaBoost and Inception V3, for fault detection in WTs, finding TrAdaBoost highly effective in diagnosing gear cog belt fracture, with accuracy and comprehensive index reaching up to 99.8% and 99.7%, respectively.
ICM has become vital in diagnosing complex multicomponent faults in WT gearboxes, leveraging data fusion from diverse sensor types to overcome the limitations inherent in single-technique CM. However, practical ICM implementation faces challenges due to frequent sensor malfunctions, which lead to missing and incomplete acoustic and vibration signals. Such incomplete datasets severely impair diagnostic reliability by obscuring critical transient fault indicators. Existing research on WT gearbox fault detection predominantly addresses scenarios with complete sensor data and overlooks the missing data that sensor malfunctions frequently cause in real-world operations. Furthermore, traditional imputation techniques such as interpolation or regression often fail to recover the nonlinear, transient features that characterize complex gearbox faults, thereby compromising fault classification performance. To address these gaps, this study proposes an integrated methodology utilizing generative adversarial imputation networks (GAINs) to reconstruct missing acoustic and vibration signals caused by sensor malfunctions across various fault scenarios. The reconstructed signals are compared with the original signals to verify that no vital information is lost. Following imputation, the continuous wavelet transform (CWT) is applied to convert these signals into two-dimensional (2D) spectrograms, enabling clear visualization and robust identification of subtle fault signatures. Transfer learning approaches are subsequently employed to classify gearbox faults based on the generated spectrograms. The proposed framework addresses the challenges posed by sensor data loss and demonstrates improved diagnostic accuracy, making it suitable for practical WT gearbox CM applications. 
The rest of the article is organized as follows: the second section gives the details of the experimental setup followed by the proposed methodology in the third section. The results of the proposed method are given in the fourth section and finally, the conclusions of the work are detailed in the fifth section.
Experimental setup
The experimental arrangement utilized for conducting the gearbox ICM investigation comprises a scaled-down WT gearbox similar to the configuration previously described by Kandula and Narayana. 30 The gearbox assembly, specifically engineered to replicate the operational complexity inherent in WT drivetrains, consists of three distinct stages of rotational speed reduction. These stages are the low-speed stage (LSS), intermediate-speed stage (ISS), and high-speed stage (HSS), each incorporating pairs of parallel spur gears that facilitate mechanical power transmission between sequential stages. The gearbox configuration was carefully chosen to closely emulate actual WT operating scenarios, thereby allowing reliable analysis of fault diagnosis strategies applicable to real-world applications. The power input required for driving the gearbox during the experimental tests was provided by a three-phase, one-horsepower synchronous AC motor. This motor was selected instead of a conventional WT generator because the primary emphasis of this research is gearbox fault identification rather than electricity generation. The motor directly drives the HSS, and consequently, power propagates sequentially through the ISS and LSS gear stages. The rotational speed of the motor shaft is controlled by a variable frequency drive, which enables precise adjustment and stabilization of operational speeds throughout the testing process. To ensure mechanical stability and minimize undesired axial or radial shaft displacements, the gearbox shafts are rigidly supported using standard radial ball bearings. The detailed specifications and structural arrangements of these bearings, as well as the gearbox design, were extensively documented in the earlier study conducted by Kandula and Narayana. 30 Error propagation from mechanical and sensor-related sources, such as transmission and power losses and data acquisition errors, is not considered in this work.
Furthermore, vibration responses from these ball bearings were acquired using high-precision piezoelectric accelerometers manufactured by PCB (model: 320C33), having a broad operational frequency range from 1 Hz to 11 kHz and exhibiting sensitivity of approximately 10 mV/g. These accelerometers were strategically positioned directly on the bearing housings, capturing accurate vibration signals indicative of gearbox component health. Alongside vibration-based CM, acoustic emission measurements were also implemented in this investigation to achieve enhanced fault detection performance. Acoustic data were collected using specialized free-field microphones manufactured by GRAS (model: 40PH), specifically selected due to their operational frequency range extending from 10 Hz to 20 kHz and sensitivity rating of 50 mV/Pa. To optimize signal acquisition quality, these microphones were carefully affixed externally onto the gearbox casing, enabling the efficient capturing of acoustic signatures generated during gearbox operation. Both accelerometer and microphone signals were continuously monitored and recorded onto a computer system using an NI 9234 data acquisition device, facilitating synchronized storage and subsequent data processing. The overall configuration and physical arrangement of the gearbox test facility are depicted schematically in Figures 1 and 2, respectively. Within these diagrams, specific abbreviations denote essential gearbox components such as pinions (P), bearings (B), gears (G), and the precise mounting positions of sensors indicated by “L.”

Figure 1. Gearbox test rig.

Figure 2. Gearbox schematic. 31
Localized failures primarily originating from bearing races and pinions significantly contribute to WT gearbox malfunctions.32–34 The current experimental investigation explicitly focuses on diagnosing faults occurring within bearings and pinion components. Three distinct fault types were deliberately introduced to simulate realistic gearbox failures typically encountered in WTs: inner race (IR) defects and outer race (OR) defects in bearings, and pinion tooth root cracks. These localized defects were created using electrical discharge machining to replicate real-world fault conditions. The dimensions and specifications of each artificially generated fault, including crack depths and widths, are documented in Table 1 and depicted in Figure 3. This investigation specifically targeted the challenging scenario of simultaneously occurring multicomponent faults, recognized as particularly complex in prior literature, such as the work by Antoniadou et al. 35 Operating conditions selected for the experiments reflected stationary conditions at intermediate difficulty, with the gearbox consistently maintained at a rotational speed of 720 rpm, representing half the maximum achievable motor speed, to ensure repeatable and stable experimental conditions. Comprehensive testing involved various combinations of the induced bearing and pinion faults, detailed in Table 2. For each fault combination scenario, synchronized acoustic and vibration signals were captured at a sampling frequency of 12 kHz, with each data segment comprising exactly 16,384 samples. The resulting dataset, encompassing both vibration and acoustic signals, contains 100 samples per fault combination, each with 131,072 data points.
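The acquisition figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes one channel per sensor and uses the three microphones and five accelerometers listed in the methodology section, under which eight 16,384-sample segments account for the 131,072 data points per sample. The text does not state how the channels are combined, so that reading is an assumption.

```python
# Values stated in the text.
SAMPLING_RATE_HZ = 12_000
SEGMENT_LENGTH = 16_384
POINTS_PER_SAMPLE = 131_072
MOTOR_SPEED_RPM = 720

# Assumption: one channel per sensor (3 microphones + 5 accelerometers).
N_CHANNELS = 3 + 5

segment_duration_s = SEGMENT_LENGTH / SAMPLING_RATE_HZ
nyquist_hz = SAMPLING_RATE_HZ / 2
points_all_channels = N_CHANNELS * SEGMENT_LENGTH
motor_revs_per_segment = MOTOR_SPEED_RPM / 60 * segment_duration_s

print(f"segment duration: {segment_duration_s:.3f} s")
print(f"highest resolvable frequency: {nyquist_hz:.0f} Hz")
print(f"points across all channels: {points_all_channels}")
print(f"motor revolutions per segment: {motor_revs_per_segment:.1f}")
```

Under these assumptions each segment spans a little over one second and covers several motor revolutions, so every sample contains repeated fault impacts rather than a single event.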
Table 1. Fault dimensions.

Figure 3. Seeded faults: (a) CPT, (b) IR fault, and (c) OR fault. 36 CPT: cracked pinion tooth; IR: inner race; OR: outer race.
Table 2. Emulated fault types.
CPT: cracked pinion tooth; IR: inner race; OR: outer race; H: healthy; LSS: low-speed stage; ISS: intermediate-speed stage; HSS: high-speed stage.
Methodology
ICM of the gearbox is carried out by collecting acoustic and vibration signals for the fault cases listed in Table 2 using three microphones and five accelerometers mounted on the gearbox, as shown in Figure 2. First, multiple datasets are created with different missing data rates (the percentage of the signal that is missing relative to the original signal) to replicate sensor malfunctions in real-world scenarios. A missing data imputation strategy based on GAINs is then used to reconstruct the data lost to sensor malfunctions. Next, CWT is used to convert the imputed 1D signals into 2D spectrograms. Finally, fault classification is performed on these 2D spectrograms using pretrained transfer learning models. Figure 4 shows the proposed methodology. The working principles of GAINs and CWT are detailed in the following sections.

Figure 4. Proposed methodology.
Generative adversarial networks
Generative adversarial networks (GANs) have emerged as powerful DL frameworks extensively employed for data imputation and reconstruction, particularly effective in addressing missing data issues prevalent in practical engineering scenarios. In practical gearbox ICM applications, GAN-based data imputation (GAIN) methods significantly surpass traditional missing data reconstruction techniques such as interpolation or regression-based imputation, primarily due to their ability to capture complex nonlinear correlations existing within gearbox vibration and acoustic emission signals. The GAIN model was specifically chosen over conventional methods such as k-nearest neighbors (k-NNs) or autoencoders due to its distinctive architecture capable of effectively modeling complex, nonlinear patterns in gearbox sensor data. Unlike k-NN, which relies on local similarity and often struggles with high-dimensional, sparse sensor data, GAIN leverages adversarial training between generator and discriminator networks to capture deeper, global data dependencies. 36 Compared to autoencoders, GAIN uniquely incorporates explicit masking and adversarial feedback mechanisms, enabling it to better reconstruct intricate transient signals characteristic of gearbox faults. 36 Thus, GAIN ensures accurate data imputation by not merely minimizing reconstruction error, but by actively distinguishing actual signal patterns from artificially generated ones. This ability significantly improves reconstruction fidelity and reliability, directly enhancing subsequent fault classification accuracy.
Gearbox faults typically manifest themselves as nonlinear, transient signal disturbances exhibiting characteristic spectral patterns and temporal signatures. Traditional linear interpolation or statistical methods frequently fail to accurately reconstruct such nonlinear and transient features, leading to degraded fault diagnosis accuracy. 37 In contrast, GAINs inherently capture sophisticated signal dynamics and intricate spectral–temporal relationships due to their underlying deep neural network architectures, which effectively model nonlinear interactions and signal correlations. 38 GAIN accomplishes the accurate modeling of complex signal dynamics by employing two interconnected neural networks—a generator and a discriminator—that train adversarially. The generator aims to reconstruct missing data points, utilizing observed signal information and random noise to capture underlying nonlinear relationships effectively. Concurrently, the discriminator evaluates the authenticity of these reconstructions, encouraging the generator to produce increasingly realistic signals. This competitive interaction drives the model to learn sophisticated spectral–temporal features and correlations embedded within gearbox sensor data. Consequently, GAIN reliably reconstructs signals by implicitly learning the intrinsic signal characteristics. In current gearbox ICM, GAINs offer promising solutions for rectifying incomplete or missing sensor (for vibration and acoustic signals) data caused by sensor malfunctions, signal disruptions, or acquisition anomalies commonly encountered in WT operational environments. Such data irregularities pose significant challenges in accurate fault diagnosis due to gaps or inconsistencies within the collected vibration and acoustic emission signals. 
Within this context, GAINs demonstrate superior capabilities in recovering missing information by capturing intricate underlying patterns and distributions present within healthy and faulty gearbox operational signals. 39
Mathematically, a GAIN architecture comprises two critical components: a generator network (G) and a discriminator network (D), which are simultaneously trained in a competitive, adversarial manner. The primary objective of the generator is synthesizing realistic data closely resembling authentic signals, whereas the discriminator’s primary function is differentiating between real and artificially generated data. This dual-network competitive interplay effectively guides the generator toward producing data indistinguishable from genuine measurements, thereby enhancing its reconstruction capabilities. Formally, the GAIN optimization process represents a minimax game between the generator and discriminator, expressed mathematically in Equation (1):
$$\min_{G}\max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{1}$$
In this equation, V represents the objective function governing the adversarial optimization process. The expectation (E) notation indicates averaging operations over respective distributions, with pdata(x) denoting the true distribution of observed gearbox signals, and pz(z) symbolizing a random noise distribution input fed into the generator. Specifically, x refers to real observed vibration or acoustic signals, while z is an artificially generated random vector drawn from Gaussian distributions. The minimax objective used by GAIN is particularly suitable for time-series reconstruction because it inherently facilitates capturing the intricate distributional and temporal structures within gearbox signals. This objective function systematically pushes the generator to minimize divergence from real signal patterns, guided by discriminator feedback, while simultaneously compelling the discriminator to improve differentiation capabilities. 40 Such adversarial interaction effectively accommodates complex temporal structures and transient fault dynamics characteristic of gearbox sensor data. During GAIN training, the discriminator is initially optimized to maximize the likelihood of correctly classifying authentic gearbox measurements versus generator-produced data. Conversely, the generator optimization seeks to minimize this classification likelihood by producing realistic, synthetic data effectively deceiving the discriminator. Through iterative adversarial training, this dual-objective minimax optimization systematically improves the generator’s ability to create high-fidelity approximations of real sensor data, ultimately facilitating accurate reconstruction of missing gearbox signal segments.
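To make the adversarial objective concrete, the following minimal sketch estimates the value function from a batch of discriminator outputs. It assumes the standard GAN form, in which V is the mean of log D(x) over real samples plus the mean of log(1 − D(G(z))) over generated samples. The function name and the probability values are hypothetical and are not taken from the study.

```python
import math

def value_function(d_real, d_fake):
    """Batch estimate of V(D, G): mean log D(x) + mean log(1 - D(G(z)))."""
    eps = 1e-12  # guards against log(0) for saturated discriminator outputs
    real_term = sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical discriminator outputs (probability that a sample is real).
# Case 1: the discriminator separates real from generated signals well.
confident = value_function(d_real=[0.90, 0.95, 0.85], d_fake=[0.10, 0.05, 0.20])
# Case 2: the generator fools the discriminator, which outputs 0.5 everywhere.
fooled = value_function(d_real=[0.5, 0.5, 0.5], d_fake=[0.5, 0.5, 0.5])

print("V with a confident discriminator:", round(confident, 4))
print("V with a fooled discriminator:   ", round(fooled, 4))
```

The discriminator update raises V, while the generator update lowers it. When the discriminator can no longer tell the two sources apart, V settles at −2 log 2, which is the second case above.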
To specifically address missing data scenarios encountered in gearbox sensor measurements, modified GAIN architectures have been developed, prominently including conditional GAINs. This variant incorporates additional conditional labels into its networks, significantly enhancing the accuracy and relevance of generated signals under incomplete or partial sensor data scenarios. The conditional GAIN architecture explicitly conditions the generated data on observed data segments, thereby ensuring contextual consistency in reconstructed data and preserving critical fault-related features intrinsic to original gearbox measurements. In conditional GAIN formulations, the generator function is mathematically represented by Equation (2):
$$x' = G(z \mid y) \tag{2}$$
Here, y represents the conditional contextual information or observed partial segments from actual sensor signals, while x′ denotes the imputed gearbox signal produced by the GAIN generator. This conditioning mechanism allows GAINs to capture and integrate the temporal and spectral dependencies present within sensor data, ensuring accurate reconstruction of missing signal intervals. The GAIN-reconstructed datasets preserve critical fault-related information, ensuring accurate feature extraction and subsequent reliable gearbox fault classification using the CWT and transfer learning-based classifiers employed in the proposed methodology.
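The way an imputed signal is assembled from observed samples and generator output can be sketched as follows. This is a simplified illustration, not the network used in the study. It assumes the usual GAIN-style convention of a binary mask (1 = observed, 0 = missing), noise placed in the missing positions before the generator is applied, and a final merge that keeps every observed sample unchanged. `toy_generator` is a hypothetical stand-in for a trained generator that simply averages the nearest observed neighbours.

```python
import math
import random

def toy_generator(x_tilde, mask):
    """Hypothetical stand-in for a trained G: mean of the nearest observed neighbours."""
    n = len(x_tilde)
    out = []
    for k in range(n):
        left = next((x_tilde[i] for i in range(k - 1, -1, -1) if mask[i] == 1), None)
        right = next((x_tilde[i] for i in range(k + 1, n) if mask[i] == 1), None)
        neighbours = [v for v in (left, right) if v is not None]
        out.append(sum(neighbours) / len(neighbours))
    return out

def impute(observed, mask, rng):
    # Missing entries carry no information, so they hold noise z on input to G.
    # (This toy generator reads only observed positions; a trained G would use z.)
    x_tilde = [x if m == 1 else rng.gauss(0.0, 0.01) for x, m in zip(observed, mask)]
    generated = toy_generator(x_tilde, mask)
    # Keep real samples where available; use the generator output only in the gaps.
    return [m * x + (1 - m) * g for x, m, g in zip(observed, mask, generated)]

def rmse_on_missing(truth, imputed, mask):
    errors = [(t - i) ** 2 for t, i, m in zip(truth, imputed, mask) if m == 0]
    return math.sqrt(sum(errors) / len(errors))

truth = [math.sin(2 * math.pi * k / 16) for k in range(32)]
mask = [0 if k in (5, 6, 20) else 1 for k in range(32)]
observed = [t if m == 1 else 0.0 for t, m in zip(truth, mask)]

imputed = impute(observed, mask, random.Random(0))
print("RMSE on missing samples:", round(rmse_on_missing(truth, imputed, mask), 3))
```

Because of the final merge, reconstruction error can only arise at the missing positions, which is why the error is evaluated there rather than over the whole signal.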
Continuous wavelet transform
CWT is an advanced signal-processing technique extensively used in CM applications due to its powerful ability to analyze transient characteristics and nonstationary signals that typically occur in machinery fault diagnostics. 41 Furthermore, one of the major advantages inherent to CWT is its inherent robustness to signal noise, making it particularly effective in diagnosing machinery faults from signals that often contain substantial background disturbances. 42 Unlike conventional Fourier analysis techniques such as short-time Fourier transform, CWT inherently adapts its time–frequency resolution according to the analyzed frequency band. Specifically, CWT provides higher temporal resolution at higher frequencies (short-duration events), while maintaining improved frequency resolution at lower frequencies (long-duration events). 43 This adaptive resolution capability significantly enhances CWT’s performance in capturing both rapidly evolving transient phenomena and sustained low-frequency vibrations characteristic of gearbox fault signatures simultaneously. Consequently, employing CWT-based feature extraction ensures reliable detection and characterization of subtle fault signatures even under noisy experimental conditions, considerably enhancing fault identification performance. The effectiveness of CWT in gearbox ICM also lies in its capability to distinctly identify overlapping signals resulting from simultaneous multicomponent faults. Given the complexity inherent in real-world gearbox systems, accurately isolating distinct fault types can be challenging. However, the detailed spectrograms produced by CWT explicitly reveal subtle yet differentiating features enabling precise isolation and identification of multiple simultaneous faults within gearboxes. For instance, combined inner-race bearing faults and pinion cracks generate distinct yet overlapping frequency-domain and temporal signatures. 
Through meticulous scale adjustment and appropriate wavelet function selection, CWT effectively disentangles such complex interactions, accurately isolating individual fault contributions within the resultant spectrograms. This property is invaluable for robust and reliable diagnostic decision-making processes, especially when diagnosing intricate fault combinations encountered in realistic WT gearbox scenarios.
In the context of gearbox ICM, CWT serves as an essential component of the proposed methodology by transforming 1D vibration and acoustic emission signals collected from sensors into 2D spectrogram representations. These spectrograms vividly illustrate the localized variations within signals in both the frequency and time domains simultaneously, thereby facilitating the identification of specific fault signatures. This approach significantly improves the capability of diagnosing multicomponent faults within WT gearboxes. Specifically, the raw vibration and acoustic signals obtained from multiple accelerometers and microphones equipped on the experimental gearbox (as illustrated previously in Figure 2) are initially subjected to the CWT analysis. CWT aids in identifying subtle changes embedded within the signals arising due to fault occurrences by converting them into informative time–frequency representations. CWT indirectly supports imputation quality assessment by converting signals into spectrograms, enhancing visual interpretability beyond root mean square error (RMSE), coefficient of determination (R2), and correlation metrics. This transformation reveals subtle distortions not captured by numerical scores, allowing validation of transient and spectral feature preservation crucial for accurate fault classification in gearbox CM. Mathematically, the CWT of a given signal x(t) involves its decomposition onto scaled and shifted versions of a wavelet function ψ(t). The CWT is mathematically defined by the integral given in Equation (3):
$$W_x(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt \tag{3}$$
In the above equation, W_x(a, b) represents the wavelet coefficient, indicating the similarity between the analyzed signal and the wavelet function. Here, a symbolizes the scale parameter, which is inversely related to the frequency of the wavelet, whereas b denotes the translation parameter indicating the position of the wavelet along the time axis. The superscript “*” symbolizes the complex conjugation operation of the wavelet function. By systematically varying the scale and translation parameters, the wavelet function scans the entire signal to capture signal characteristics at diverse scales and distinct positions, producing a representation known as a scalogram. The scalogram visually encodes the energy distribution across frequencies and time, thereby allowing detection and identification of anomalies within signals originating from machinery faults such as bearing race cracks, pinion fractures, and gear misalignments. The choice of the wavelet function, often referred to as the “mother wavelet,” is critically important for obtaining accurate and reliable results from CWT-based analysis. Common wavelets utilized in mechanical fault diagnostics include Morlet, Mexican Hat, Gaussian, and Daubechies wavelets, each characterized by unique features and suitability for specific signal processing scenarios. In the current work, the Morlet wavelet is used due to its capability in capturing transient characteristics embedded within acoustic and vibration signals. The Morlet wavelet function is mathematically expressed as follows in Equation (4):
$$\psi(t) = \pi^{-1/4}\, e^{j\omega_0 t}\, e^{-t^{2}/2} \tag{4}$$
Here, ω0 represents the central frequency of the Morlet wavelet, and j denotes the imaginary unit. This complex-valued wavelet simultaneously encapsulates amplitude and phase information, providing comprehensive insights into frequency components and temporal localization within nonstationary gearbox signals. The selection of suitable wavelet parameters, especially the central frequency and scale range, significantly influences the resolution and clarity of generated spectrograms. Smaller scales correspond to high-frequency components typically associated with transient features such as impacts, whereas larger scales capture low-frequency phenomena related to structural resonances. Consequently, appropriate parameter selection ensures effective extraction of essential features required for successful fault detection. Upon successful application of CWT, the derived wavelet coefficients are converted into spectrograms which effectively illustrate how signal energy distribution evolves over time across various frequency bands. Such 2D representations considerably enhance the interpretability of fault-related features as compared to raw 1D time-domain signals. Spectrograms clearly exhibit distinctive patterns corresponding to localized faults such as bearing defects and pinion tooth fractures. For instance, bearing inner-race faults typically generate periodic transient impulses manifesting as characteristic repetitive spikes at specific frequencies, easily detectable through wavelet-based spectrogram analysis. Similarly, pinion tooth root cracks produce distinct alterations in gear meshing frequencies and introduce abnormal frequency harmonics, uniquely identifiable via CWT-derived representations. 
Hence, the resultant spectrograms obtained through CWT enable significant improvement in fault classification accuracy when utilized as inputs for pretrained transfer learning-based classification models, forming a robust diagnostic tool integrated within gearbox CM approaches.
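The scale-and-translate operation described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this study: the function names, the central frequency ω0 = 6, the kernel support of five scale lengths, and the synthetic 50 Hz test tone are all assumptions chosen for illustration.

```python
import numpy as np

def morlet(t, omega0=6.0):
    """Complex Morlet wavelet (assumed pi^(-1/4) normalization, no correction term)."""
    return np.pi ** -0.25 * np.exp(1j * omega0 * t - 0.5 * t ** 2)

def cwt_morlet(x, scales, dt, omega0=6.0):
    """Return complex coefficients W_x(a, b) with shape (len(scales), len(x))."""
    x = np.asarray(x, dtype=float)
    n = x.size
    coeffs = np.empty((len(scales), n), dtype=complex)
    for i, a in enumerate(scales):
        # Truncate the Gaussian envelope at about 5 scale lengths,
        # and never let the kernel grow longer than the signal.
        half = min(int(np.ceil(5.0 * a / dt)), (n - 1) // 2)
        m = np.arange(-half, half + 1)
        # Convolving with the reversed conjugate wavelet equals correlating
        # the signal with the conjugated, scaled wavelet.
        kernel = np.conj(morlet(m * dt / a, omega0))[::-1] / np.sqrt(a)
        coeffs[i] = np.convolve(x, kernel, mode="same") * dt
    return coeffs

# Hypothetical test signal: a 50 Hz tone sampled at 1 kHz.
fs = 1000.0
t = np.arange(1000) / fs
signal = np.sin(2 * np.pi * 50.0 * t)

# Map analysis frequencies to scales using f = omega0 / (2 * pi * a).
freqs = np.arange(10.0, 201.0, 5.0)
scales = 6.0 / (2 * np.pi * freqs)

scalogram = np.abs(cwt_morlet(signal, scales, dt=1 / fs))
peak_freq = freqs[np.argmax(scalogram.mean(axis=1))]
```

In this sketch, small scales correspond to the high-frequency rows of `scalogram` and large scales to the low-frequency rows, and the time-averaged magnitude peaks at the row associated with the tone frequency.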
Results and discussions
Data imputation and signal processing
GAINs can model gearbox fault signals by implicitly learning complex, non-Gaussian distributions through adversarial training. 44 Without relying on parametric assumptions, the generator iteratively approximates real signal distributions, guided by discriminator feedback. This enables accurate representation of nonlinear, transient features inherent in gearbox faults, supporting the methodology’s foundational assumption. 45 In addition, temporal dependencies are preserved in GAIN through its input structure, which combines observed signals with mask vectors, allowing the generator to capture local context. The discriminator enforces continuity by penalizing fragmented imputations. This interaction ensures sequential integrity, maintaining essential temporal patterns critical for accurate gearbox fault diagnosis. 40 The acoustic and vibration signals collected from the gearbox for the different fault scenarios described in the second section are used to create datasets with multiple missing rates. Five missing rates, ranging from 10 to 50% of the original signal length and listed in Table 3, mimic real-world sensor malfunctions that occur over a sensor’s lifetime. All the fault cases listed in Table 2 are considered in each missing-rate dataset. Figure 5 shows the missing segments for datasets A10 and A50 in the signals obtained from all the sensors for fault case c1. For clarity, only a sample length of 250 is shown in Figure 5. Although the current gearbox investigations are performed under stationary operating conditions, the nonstationary nature of gearbox signals introduces challenges in GAIN training, as abrupt changes from transient faults cause fluctuations in data distributions, often destabilizing the learning process. To address this, normalization is applied to the sensor data prior to training, standardizing the input range and reducing variance across samples. 
This ensures a consistent statistical structure, which helps stabilize adversarial training dynamics and improves convergence. 46 Additionally, the balanced use of adversarial and reconstruction loss allows the model to adaptively learn relevant features from transient signals without overfitting to anomalies, maintaining training robustness and preserving fault-relevant signal integrity. After creating the multiple datasets listed in Table 3, the entire dataset is normalized to values between 0 and 1 to prevent large amplitude differences and differing measurement units from dominating the learning process of the GAIN. The normalized original datasets and the datasets with variable missing rates are then used to train the GAIN and obtain the imputed signals.
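The dataset preparation described above, min-max normalization followed by removal of a fixed fraction of samples, can be sketched as follows. The missing-data pattern is an assumption here: samples are dropped completely at random and independently per channel, which may differ from the pattern used to build datasets A10 to A50. The random data, the array shape, and the function names are hypothetical.

```python
import numpy as np

def min_max_normalize(x):
    """Scale each channel (column) to the range [0, 1]."""
    lo = x.min(axis=0)
    hi = x.max(axis=0)
    return (x - lo) / (hi - lo), lo, hi

def make_mask(shape, missing_rate, rng):
    """Binary mask: 1 = observed, 0 = missing.

    Assumption: the same number of samples is removed from every channel,
    at positions chosen uniformly at random.
    """
    n, channels = shape
    mask = np.ones(shape)
    k = int(round(missing_rate * n))
    for c in range(channels):
        idx = rng.choice(n, size=k, replace=False)
        mask[idx, c] = 0.0
    return mask

rng = np.random.default_rng(0)
signals = rng.normal(size=(1000, 8))        # stand-in for 8 sensor channels
normalized, lo, hi = min_max_normalize(signals)

mask = make_mask(normalized.shape, missing_rate=0.10, rng=rng)
observed = np.where(mask == 1, normalized, np.nan)   # NaN marks a missing sample

# Denormalization simply inverts the scaling after imputation.
restored = normalized * (hi - lo) + lo
```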
Details of variable missing rates datasets in comparison to original signal length.

Missing signals (amplitude vs sampling point) for fault scenario c1 for dataset: (a) A10 and (b) A50 representing accelerometer and microphone channels.
GAIN is developed to impute the missing data. The normalized input signals and their corresponding masks indicating missing data positions constitute the primary inputs to the GAIN architecture. This approach ensures that the network explicitly differentiates between observed and missing data points. The generator network begins by accepting an input vector constructed through the concatenation of two essential components: the observed normalized data points and their associated mask vectors. This input vector possesses twice the dimensionality of the original signal, facilitating comprehensive learning of missing and available signal relationships. The first layer within the generator is a fully connected dense layer, initialized using Xavier initialization, promoting stable weight adjustments during training. This initial dense layer transforms the concatenated input into a hidden representation of equivalent dimensionality to the original data space. To incorporate nonlinear signal characteristics such as transient vibrations and subtle acoustic fluctuations typical in gearbox faults, a rectified linear unit (ReLU) activation function is utilized. This layer effectively encodes intricate, nonlinear interactions and signal dependencies, crucial for accurately predicting missing values. The second hidden layer further refines these encoded features through another fully connected structure, similarly initialized with Xavier initialization and activated using ReLU functions. This additional layer allows the network to capture deeper signal dependencies and enhances the quality of imputations. Subsequently, the output layer of the generator employs a sigmoid activation function, explicitly chosen to constrain outputs within the normalized signal range between zero and one, consistent with the preprocessing stage. This final output represents imputed values, filling in missing data points within the vibration and acoustic signals.
The discriminator network’s primary responsibility is evaluating the authenticity of generated imputations. Its input vector consists of a combination of generated data from the generator and the “hint” vectors, where the hints provide partial information regarding original missing positions. These hints significantly stabilize the discriminator’s performance, improving its accuracy in distinguishing real from synthetic signals. The discriminator’s first hidden layer integrates this input using a fully connected dense structure with Xavier initialization, followed by ReLU activation. This layer is responsible for extracting discriminative features, differentiating between authentic and generated signal segments. A second hidden layer, also fully connected and employing ReLU activation, further processes these features, enhancing the discriminator’s capability to identify subtle differences between real and generated signal characteristics. The discriminator concludes with an output layer employing a sigmoid activation function, producing probabilities that indicate the likelihood of the input being authentic or imputed. This outcome guides the generator network by providing feedback that assists in enhancing its generated imputations during subsequent training iterations. Figure 6 gives the network architecture of the GAIN model.
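The generator and discriminator described above can be sketched as two small fully connected networks. The following NumPy forward pass is illustrative only and involves no training. The layer widths follow the text (input of twice the signal dimensionality, hidden layers of the signal dimensionality), while the noise range, the hint rate of 0.9, the Glorot-uniform variant of Xavier initialization, and all names are assumptions.

```python
import numpy as np

def xavier(fan_in, fan_out, rng):
    """Glorot/Xavier uniform initialization."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def init_mlp(dim, rng):
    """Three dense layers: 2*dim -> dim -> dim -> dim."""
    sizes = [(2 * dim, dim), (dim, dim), (dim, dim)]
    return [(xavier(i, o, rng), np.zeros(o)) for i, o in sizes]

def forward(params, a, b):
    """Concatenate the two inputs, apply ReLU twice, then a sigmoid output."""
    h = np.concatenate([a, b], axis=1)
    for layer, (w, bias) in enumerate(params):
        h = h @ w + bias
        h = np.maximum(h, 0.0) if layer < 2 else 1.0 / (1.0 + np.exp(-h))
    return h

rng = np.random.default_rng(0)
dim, batch = 8, 32
generator = init_mlp(dim, rng)
discriminator = init_mlp(dim, rng)

x = rng.uniform(size=(batch, dim))                        # normalized signals
mask = (rng.uniform(size=(batch, dim)) > 0.3).astype(float)  # 1 = observed

# Generator input: observed values, with small noise in the missing slots.
noise = rng.uniform(0.0, 0.01, size=(batch, dim))
x_tilde = mask * x + (1 - mask) * noise
g_out = forward(generator, x_tilde, mask)

# Keep observed values and fill only the missing positions.
x_hat = mask * x + (1 - mask) * g_out

# Hint: reveals part of the mask to the discriminator (assumed hint rate 0.9).
hint = mask * (rng.uniform(size=(batch, dim)) < 0.9)
d_prob = forward(discriminator, x_hat, hint)   # per-entry probability "observed"
```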

GAIN network architecture. GAIN: generative adversarial imputation network.
The loss functions used in GAIN explicitly define the objectives of each network. The discriminator’s loss function directly measures the accuracy of identifying genuine versus artificially generated data. Specifically, the discriminator loss is constructed using binary cross-entropy, computed between the discriminator’s predictions and the actual observed masks, effectively encouraging the discriminator to improve its predictive accuracy. In contrast, the generator’s loss combines two separate terms: the adversarial loss and the mean squared error (MSE) loss. The adversarial component encourages the generator to create imputations indistinguishable from authentic data by penalizing outcomes where the discriminator correctly identifies synthetic data. Simultaneously, the MSE loss explicitly minimizes the difference between real data and generated imputations at positions where data are available, effectively guiding the generator toward producing realistic imputations closely resembling true signal values. The balance between adversarial and reconstruction losses is carefully regulated through a hyperparameter termed alpha (α), controlling the trade-off between realism and reconstruction fidelity. 39 Optimization of both generator and discriminator parameters is conducted through the Adam optimizer. This method adapts learning rates based on historical gradient information, efficiently converging toward optimal solutions. Iterative training using batches of gearbox signal data systematically refines both networks: the discriminator progressively improves its discriminative capacity, while the generator continuously enhances its ability to produce accurate, realistic imputations. Upon completion of training, the generator network outputs imputed datasets suitable for precise gearbox fault diagnosis tasks. Figure 7 shows the normalized original and imputed signal for dataset A10 and A50 of the signals obtained from all the sensors for fault case c1.
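The two objectives described above can be written compactly. The sketch below follows the commonly published GAIN formulation, in which the generator loss is the adversarial term plus α times the MSE on observed positions. Whether the study used exactly this weighting is an assumption, and the small arrays are made-up values for illustration.

```python
import numpy as np

EPS = 1e-8  # avoids log(0)

def discriminator_loss(mask, d_prob):
    """Binary cross-entropy between discriminator output and the true mask."""
    return -np.mean(mask * np.log(d_prob + EPS)
                    + (1 - mask) * np.log(1 - d_prob + EPS))

def generator_loss(x, mask, g_out, d_prob, alpha=0.5):
    """Adversarial term on missing entries plus alpha * MSE on observed entries."""
    adversarial = -np.mean((1 - mask) * np.log(d_prob + EPS))
    mse = np.sum(mask * (x - g_out) ** 2) / np.sum(mask)
    return adversarial + alpha * mse, adversarial, mse

# Toy batch: two samples, two features (hypothetical values).
x = np.array([[0.2, 0.8], [0.5, 0.1]])
mask = np.array([[1.0, 0.0], [1.0, 1.0]])
g_out = np.array([[0.2, 0.6], [0.4, 0.1]])
d_prob = np.array([[0.9, 0.5], [0.9, 0.9]])

total, adversarial, mse = generator_loss(x, mask, g_out, d_prob, alpha=0.5)
d_loss = discriminator_loss(mask, d_prob)
d_loss_uninformed = discriminator_loss(mask, np.full_like(mask, 0.5))
```

Under this formulation, a smaller α lets the adversarial term dominate and a larger α shifts the emphasis toward reconstruction of the observed samples.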

GAIN imputed signals (amplitude vs sampling point; normalized) for fault scenario c1 for dataset: (a) A10 and (b) A50. GAIN: generative adversarial imputation network.
Performance evaluation of the GAIN applied to gearbox signals involves key metrics: RMSE, R2, mean absolute error (MAE), and Pearson’s correlation coefficient. RMSE quantifies the average magnitude of the differences between imputed and actual signal values, emphasizing larger errors. The R2 value reflects how accurately imputed signals replicate original signal variance, thereby indicating the quality of reconstruction. MAE assesses the mean absolute difference, providing robust evaluation of imputation precision across all sensor data. Pearson’s correlation measures the linear relationship strength between original and reconstructed signals, signifying the consistency and fidelity of the imputation process. The hyperparameter α in the employed GAIN significantly influences the trade-off between adversarial loss and reconstruction accuracy, critically impacting the imputation quality for gearbox signals. In this study, α was empirically tuned and selected as 0.5 after evaluating a range of values (from 0.1 to 2.0). This choice provided a balance between generating realistic signal features through adversarial loss and ensuring precise reconstruction accuracy via MSE. Lower α values (e.g., 0.1) excessively emphasized realistic signal generation, compromising reconstruction fidelity, leading to higher RMSE (>0.08) and lower R2 values (<0.95). Conversely, higher α values (e.g., 2.0) overly prioritized reconstruction accuracy, neglecting realistic transient features essential for gearbox fault diagnostics. An α of 0.5 effectively balanced these aspects, yielding optimal signal reconstruction metrics—RMSE around 0.066 to 0.072, R2 consistently above 0.98, and Pearson’s correlation exceeding 0.99. This enabled effective fault feature retention, subsequently achieving superior classification accuracy. Thus, selecting α as 0.5 was specifically meaningful for ensuring accurate reconstruction and reliable fault classification in this gearbox CM application.
Table 4 gives the averaged performance metrics for the different datasets mentioned in Table 3. These values represent the averaged performance across different fault scenarios mentioned in Table 2 for each dataset given in Table 3. The performance metrics indicate consistent effectiveness in signal reconstruction. The RMSE values range narrowly from 0.064273 (A30) to 0.07238 (A20), demonstrating that increasing missing data rates from 10 to 50% do not significantly compromise reconstruction accuracy. Similarly, the MAE shows a stable trend, fluctuating slightly between 0.047018 (A30) and 0.053607 (A50), highlighting robust performance even at higher missing rates. The R2 values remain notably high, decreasing gradually from 0.998253 at 10% missing data (A10) to 0.98077 at 50% (A50), yet still signifying excellent preservation of the original signal variance. Pearson’s correlation coefficients, consistently above 0.99 (ranging from 0.999133 for A10 to 0.990339 for A50), confirm the strong linear relationship and high fidelity between original and imputed signals, validating GAIN’s efficacy across diverse sensor data conditions.
GAIN performance metrics.
GAIN: generative adversarial imputation network; R2: coefficient of determination; RMSE: root mean square error; MAE: mean absolute error.
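The four metrics reported above follow their standard definitions and can be computed as in the sketch below. Whether the study evaluated them over the full signal or only at the imputed positions is not stated, so evaluation over all samples is an assumption here, and the four-point example is hypothetical.

```python
import numpy as np

def imputation_metrics(original, imputed):
    """RMSE, MAE, R2, and Pearson correlation between two signals."""
    original = np.ravel(np.asarray(original, dtype=float))
    imputed = np.ravel(np.asarray(imputed, dtype=float))
    err = imputed - original
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((original - original.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    pearson = np.corrcoef(original, imputed)[0, 1]
    return {"rmse": rmse, "mae": mae, "r2": r2, "pearson": pearson}

# The imputed signal is a compressed copy of the original (0.8 * x + 0.1).
original = np.array([0.0, 0.5, 1.0, 0.5])
imputed = np.array([0.1, 0.5, 0.9, 0.5])
metrics = imputation_metrics(original, imputed)
```

In this example the Pearson correlation is 1 because the relationship is perfectly linear, while R2 is 0.96 because the amplitude is not fully recovered. This illustrates why the two measures are reported together.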
Further, the imputed signals are denormalized to the original scale and processed using the CWT, described in the “Continuous wavelet transform” section, to convert the 1D signals into 2D images. The CWT influences the imputation quality assessment indirectly by converting imputed signals into a spectrogram format suitable for visual inspection and robust classification evaluation. While the imputation quality itself is primarily quantified through metrics such as RMSE, R2, and Pearson’s correlation, CWT enhances interpretability and validation by visually representing signal details in a time–frequency domain. Consequently, subtle reconstruction inaccuracies or distortions that may not be evident in raw signal metrics become visually identifiable in the spectrograms. This added visual dimension enables a deeper assessment of how well critical transient and spectral features are preserved following imputation, reinforcing confidence in the model’s ability to capture fault-specific characteristics necessary for accurate fault classification. For each fault case, 999 images are generated per sensor. The dataset used for classification therefore has dimensions of 13 × 8 × 999 (13 fault cases × 8 sensors × 999 CWT images). These images are later fed as input to the pretrained models to classify the faults. Figure 8 shows sample CWT spectrograms for fault scenarios c1 and c2 for dataset A10.

CWT spectrograms (scale vs sampling point) for dataset A10 for sensor L1 for fault scenarios: (a) c1 and (b) c2. CWT: continuous wavelet transform.
Fault classification
Directly applying transfer learning to 1D imputed time-series signals is ineffective due to domain mismatch with pretrained CNNs designed for 2D inputs. CWT resolves this by converting signals into spectrograms, aligning data format with CNN requirements, thereby enhancing feature extraction and improving fault classification accuracy. In this work, transfer learning was strategically employed by leveraging pretrained CNNs, specifically SqueezeNet and DenseNet-201, initially trained on the large-scale ImageNet dataset. The source domain in this context comprises the general image classification tasks from ImageNet, characterized by diverse and extensive visual data. Conversely, the target domain includes spectrogram images derived from gearbox vibration and acoustic signals, representing distinct gearbox fault scenarios. Domain similarity was meticulously ensured by converting the gearbox sensor signals into spectrograms using the CWT, transforming the signals into visually interpretable images compatible with the CNN models’ training data structure. Thus, despite differences in the data’s initial form and source, the feature representation method employed effectively bridged domain discrepancies. Both models are well-suited for this application due to their inherent characteristics. SqueezeNet is particularly effective for gearbox fault detection owing to its compact yet highly efficient architecture, making it suitable for computationally limited environments and providing rapid predictions with lower memory requirements. 47 DenseNet-201, 48 conversely, is renowned for its superior feature extraction capabilities, resulting from densely connected convolutional layers, facilitating enhanced fault signature recognition by effectively capturing subtle differences and complex features present within gearbox fault spectrograms. 
Furthermore, the distinct architectures of these models complement one another, ensuring comprehensive and robust performance across varying fault conditions. Both networks were trained using the Adam optimizer, known for its adaptive learning rate capabilities, particularly advantageous when dealing with complex nonlinear fault signatures inherent in gearbox signals.
SqueezeNet’s specific architecture, shown in Figure 9, comprises a total of approximately 1.24 million parameters, organized primarily through innovative structures termed “fire modules.” Each fire module features two fundamental sublayers: a “squeeze” convolutional layer followed by an “expand” layer. The squeeze layer reduces computational complexity through fewer filters, typically employing 1 × 1 convolutions that limit input channel dimensionality, while the expand layer, combining both 1 × 1 and 3 × 3 convolutions, subsequently enhances feature extraction depth. This architecture was fine-tuned specifically for gearbox fault spectrograms with an initial learning rate of 0.0001, ensuring stable and incremental parameter updates conducive to accurately capturing intricate signal variations. Training spanned a maximum of 150 epochs, sufficient to achieve convergence while avoiding overfitting. Mini-batch sizes of 32 were selected to optimize computational efficiency and training stability, allowing the model to process sufficient data points per iteration for reliable gradient estimation. Additionally, model performance was periodically validated every 30 iterations, enabling continuous monitoring and early detection of overfitting, ensuring consistently reliable classification accuracy across varying gearbox faults.
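The parameter saving offered by a fire module can be checked with simple arithmetic. The sketch below counts weights and biases for one fire module and for a plain 3 × 3 convolution with the same input and output widths. The channel sizes (96 input, 16 squeeze, 64 + 64 expand) are those of the first fire module in the published SqueezeNet v1.0 design and are used here as an illustrative assumption; the study does not state which SqueezeNet variant was used.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return kernel * kernel * in_ch * out_ch + out_ch

def fire_params(in_ch, squeeze, expand1x1, expand3x3):
    """Squeeze (1x1) layer followed by parallel 1x1 and 3x3 expand layers."""
    return (conv_params(in_ch, squeeze, 1)
            + conv_params(squeeze, expand1x1, 1)
            + conv_params(squeeze, expand3x3, 3))

# Assumed channel sizes (first fire module of SqueezeNet v1.0).
fire = fire_params(in_ch=96, squeeze=16, expand1x1=64, expand3x3=64)

# Plain 3x3 convolution mapping 96 channels to the same 128 output channels.
plain = conv_params(in_ch=96, out_ch=128, kernel=3)

ratio = plain / fire
```

With these sizes the fire module needs 11,920 parameters against 110,720 for the plain convolution, roughly a ninefold reduction, which is the mechanism behind the compact model size noted above.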

SqueezeNet architecture diagram.
DenseNet-201, with approximately 20 million parameters, employs an advanced architecture consisting of multiple dense blocks, each densely connecting convolutional layers with shortcut connections (shown in Figure 10). Unlike conventional convolutional networks, DenseNet’s dense connectivity ensures each layer receives direct input from all preceding layers, enhancing information propagation, alleviating vanishing gradients, and reinforcing the retention of detailed fault signatures within the gearbox spectrogram data. DenseNet-201 incorporates a total of 201 layers structured into dense blocks interleaved by transition layers comprising batch normalization, convolutional layers (typically 1 × 1 convolutions), and pooling layers. This structured organization facilitates dimensionality reduction and computational efficiency. The model was trained with the same optimizer and initial learning rate as SqueezeNet, namely the Adam optimizer at an initial learning rate of 0.0001, whereas the number of epochs and the mini-batch size differed. Considering its greater parameter complexity, DenseNet-201 required fewer epochs (maximum of 80) to reach optimal convergence without sacrificing predictive accuracy. Mini-batch sizes of 50 were selected, balancing computational load and accuracy and ensuring robust gradient estimation without excessive computational overhead. Validation assessments occurred every 30 iterations, consistently verifying the model’s performance throughout training and facilitating the early detection of issues such as potential overfitting or insufficient learning. This strategy ensured that DenseNet-201 effectively distinguished fault conditions, reliably identifying and classifying faults under various gearbox scenarios with exceptional precision.
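The effect of dense connectivity on feature-map width, and the origin of the layer count of 201, can be traced with a short calculation. The block sizes (6, 12, 48, 32), growth rate of 32, 64 initial feature maps, and halving at each transition layer are the published DenseNet-201 configuration; the function name is hypothetical.

```python
def densenet_summary(block_layers=(6, 12, 48, 32), growth_rate=32,
                     init_features=64):
    """Channel count after each dense block and the nominal layer depth."""
    channels = init_features
    trace = []
    for i, n_layers in enumerate(block_layers):
        # Every dense layer appends `growth_rate` new feature maps
        # to the concatenation of all earlier outputs.
        channels += n_layers * growth_rate
        trace.append(channels)
        # A transition layer halves the channels between blocks.
        if i < len(block_layers) - 1:
            channels //= 2
    # Depth: stem conv + two convs per dense layer + transition convs + classifier.
    depth = 1 + 2 * sum(block_layers) + (len(block_layers) - 1) + 1
    return trace, depth

trace, depth = densenet_summary()
```

The trace shows the concatenated feature maps growing to 256, 512, 1792, and 1920 channels at the ends of the four blocks, with the 1 × 1 transition convolutions providing the dimensionality reduction mentioned above.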

DenseNet-201 architecture diagram.
The classification performance metrics obtained using SqueezeNet and DenseNet-201 models on the datasets A10–A50, characterized by different missing data rates, demonstrate robust capabilities for gearbox fault diagnosis and are given in Tables 5 and 6. The results highlight the effectiveness of both models in accurately classifying faults even when the input signals are reconstructed using GAINs. Specifically, SqueezeNet shows a progressive decline in accuracy from 91.512% at the lowest missing rate of 10% (A10) to 87.509% at the highest missing rate of 50% (A50). This is a relatively small reduction of approximately 4 percentage points across the full range of missing data rates, reflecting its resilience in maintaining classification accuracy despite higher data uncertainty. The precision and recall values of SqueezeNet closely follow the accuracy trends, ranging from 91.483 to 87.515% and 91.573 to 87.644%, respectively, confirming balanced performance between false positives and negatives across increasing missing data scenarios. Correspondingly, the F1 score, the harmonic mean of precision and recall, also demonstrates consistency, decreasing marginally from 91.498% (A10) to 87.504% (A50). Such metrics emphasize that SqueezeNet effectively sustains accurate fault classification across various levels of imputed signals, highlighting its suitability for gearbox CM tasks in practical situations where sensor malfunctions introduce significant data loss.
SqueezeNet classification performance metrics.
DenseNet-201 classification performance metrics.
DenseNet-201 exhibits even stronger classification performance metrics, consistently outperforming SqueezeNet across all missing rates. The accuracy for DenseNet-201 shows a subtle and gradual reduction, starting from 95.205% at 10% missing data (A10) and decreasing slightly to 92.343% at 50% missing data (A50). This is a decrease of only about 3 percentage points across the varying rates, demonstrating the network’s outstanding robustness and stability, particularly in handling the imputed signals. Precision values for DenseNet-201 similarly display remarkable consistency, varying minimally from 95.216% (A10) to 92.321% (A50). The recall metrics, which measure the model’s sensitivity in detecting true fault cases, exhibit slight variations from 95.163% at A10 to 92.291% at A50. These values indicate a well-maintained balance between accurately identified faults and missed detections. Furthermore, the F1 score for DenseNet-201 closely aligns with accuracy and precision, presenting values ranging narrowly from 95.218% (A10) to 92.314% (A50). Collectively, these performance indicators confirm DenseNet-201’s superior ability to manage complex, subtle differences among fault scenarios, highlighting its suitability for gearbox fault diagnosis even under significant sensor data deficiencies.
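The accuracy, precision, recall, and F1 values discussed above can be derived from a multiclass confusion matrix. The sketch below assumes macro averaging over classes, which the text does not state explicitly, and uses a made-up three-class matrix rather than results from this study. The last lines reproduce the accuracy drops quoted in the text.

```python
import numpy as np

def macro_metrics(conf):
    """Accuracy and macro-averaged precision, recall, F1.

    Rows of `conf` are true classes and columns are predicted classes.
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    precision = tp / conf.sum(axis=0)
    recall = tp / conf.sum(axis=1)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / conf.sum()
    return accuracy, precision.mean(), recall.mean(), f1.mean()

# Hypothetical 3-class confusion matrix (10 samples per true class).
conf = [[9, 1, 0],
        [2, 7, 1],
        [0, 1, 9]]
accuracy, precision, recall, f1 = macro_metrics(conf)

# Accuracy drops from A10 to A50, in percentage points, from the reported values.
squeezenet_drop = 91.512 - 87.509
densenet_drop = 95.205 - 92.343
```

With balanced classes, as in this toy matrix, macro recall equals accuracy, which is consistent with the close agreement between the reported accuracy and recall values.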
Comparatively, DenseNet-201 demonstrates enhanced accuracy, precision, recall, and F1 scores over SqueezeNet, underscoring its dense-layer connectivity advantage, which facilitates detailed feature extraction and representation from CWT-derived spectrogram images. Despite the slightly decreased performance in both models at higher missing rates, the degradation remains minimal, underscoring the efficiency and reliability of the GAIN-based imputation process. The high accuracy retained by both networks, particularly DenseNet-201, further validates the effectiveness of the employed transfer learning approach for identifying complex gearbox faults from spectrogram representations, notwithstanding signal imperfections caused by real-world sensor malfunctions. This consistent classification accuracy across varying data completeness highlights the capability of combining advanced DL architectures with imputation techniques, providing a robust framework for gearbox CM applications. Such an approach proves particularly valuable in maintaining high predictive accuracy, reliability, and diagnostic precision critical for real-world industrial gearbox fault detection scenarios. Table 7 compares the proposed work with existing techniques. Although the current method shows slightly lower accuracy than that reported in the literature, it was able to impute the missing data under sensor malfunction, resulting in a high diagnostic accuracy of 93.58%.
Classification performance comparison.
The current study does not incorporate an explicit framework for uncertainty quantification during fault diagnosis under missing data conditions. However, the methodology addresses uncertainty indirectly through rigorous evaluation of imputation quality using multiple performance metrics—RMSE (ranging from 0.064273 to 0.07238), R2 (0.98077 to 0.998253), and Pearson’s correlation coefficients (above 0.99)—across datasets with 10 to 50% missing data. These results demonstrate stable and accurate signal reconstruction even at higher missing rates, which indirectly reflects low uncertainty in imputed signal quality. Additionally, classification metrics such as DenseNet-201’s accuracy (95.205% for A10 and 92.343% for A50) further reinforce the model’s robustness despite imputation-induced variability.
Conclusions
In the current investigation, a robust and efficient ICM methodology has been developed to accurately diagnose multicomponent gearbox faults in WT. Acoustic and vibration signals collected from strategically placed accelerometers and microphones were initially subjected to controlled missing data scenarios that simulate realistic sensor malfunctions commonly encountered in industrial settings. To address the challenge posed by incomplete sensor data, GAINs were employed, exhibiting superior performance compared to traditional imputation techniques. CWT subsequently transformed the imputed 1D signals into 2D spectrograms, significantly enhancing fault identification capabilities through improved visualization of transient fault signatures. Fault classification was successfully achieved using advanced pretrained DL models, namely SqueezeNet and DenseNet-201, demonstrating consistently high accuracy across varying levels of missing data. The following conclusions are derived from the comprehensive analysis performed in this study:
The proposed GAIN approach demonstrated remarkable effectiveness in reconstructing gearbox signals, with the RMSE consistently ranging from 0.064273 to 0.07238, indicating minimal reconstruction errors even at higher missing data rates.
The R2 values remained notably high (0.98077–0.998253), underscoring GAIN’s exceptional capability to preserve original signal variance and maintain the physical relevance of imputed signals.
Pearson’s correlation coefficients exceeding 0.99 confirmed a strong linear correlation between original and reconstructed signals, ensuring fidelity and reliability of the imputation process for subsequent fault classification.
DenseNet-201 consistently outperformed SqueezeNet across all missing data conditions, achieving accuracies ranging from 95.205 to 92.343%, reflecting only a minimal reduction of approximately 3% despite increasing data loss.
SqueezeNet exhibited robust performance, maintaining an accuracy of 91.512% at 10% missing data and 87.509% at 50% missing data, validating its suitability for rapid and efficient fault classification in resource-constrained environments.
This study has demonstrated that the integration of advanced imputation methods such as GAIN with sophisticated DL-based transfer learning architectures significantly enhances gearbox fault detection accuracy under realistic sensor malfunction scenarios. The developed methodology is well-suited for practical industrial applications due to its ability to handle complex nonlinear signals and maintain consistent accuracy even when substantial sensor data loss occurs. While the developed GAIN-based data imputation model exhibited high robustness and accuracy, certain limitations remain. The current approach did not explicitly evaluate the sensitivity of the GAIN model performance to different types of sensor malfunctions. Future research could systematically explore these sensitivities to better quantify their effects on imputation accuracy. Moreover, the study primarily considered steady-state operating conditions at a fixed speed (720 rpm), potentially limiting the model’s generalizability under variable speed and load conditions typically encountered in practical WT operations. Additionally, the classifier performance, particularly for SqueezeNet, exhibited a moderate accuracy reduction from 91.512% (at 10% missing data) to 87.509% (at 50%), highlighting potential vulnerabilities in extreme data-loss scenarios. Evaluating and addressing these limitations could significantly improve the robustness and real-world applicability of the proposed diagnostic approach. Furthermore, the adaptability of the GAIN and CWT methodologies to broader classes of machinery fault scenarios warrants additional exploration, emphasizing robustness and real-time implementation potential for comprehensive CM systems.
Footnotes
Acknowledgements
The authors would like to acknowledge that this project is funded by the Department of Science and Technology, Government of India, under the CRG scheme CRG/2022/002439.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is funded by the Department of Science and Technology, Government of India, under the CRG scheme CRG/2022/002439.
