Abstract
Prognostics and health management has become a significant part of the component life cycle in modern industries. The prognostics and health management framework is implemented in industry to identify the fault type, assess fault severity, and predict the future state or remaining useful life in order to optimize maintenance activities. Three significant aspects of a prognostics and health management framework are diagnostics, prognostics, and decision making. This article presents a review of different types of diagnostic and prognostic approaches (i.e. physics-based, data-driven, and hybrid approaches) developed for gears. The flow of information between the diagnostics and prognostics parts of the framework is briefly discussed. Regarding the physics-based approaches, this article discusses physics-based diagnostic and prognostic models developed for different gear failure modes such as crack, pitting, and wear. For the data-driven approaches, the article summarizes the data processing techniques used for extracting fault-related information from the recorded raw vibration signal, health indicators developed for different kinds of gear failure modes, approaches for processing and selecting the best health indicators, fault classification, and fault prognostic models developed specifically for gears. The article discusses how a hybrid approach can be developed by integrating a data-driven diagnostics approach with a physics-based prognostics approach. Finally, uncertainty quantification of prognostic approaches, performance evaluation metrics, decision-making strategies, and future research and development perspectives are discussed. This article focuses on the diagnostic and prognostic approaches developed for gears, given that such approaches for other components such as bearings and batteries have been reviewed in the past.
Introduction
The power transmission (e.g. speed and torque conversion) from one shaft to another can be achieved using a gearbox, chain-sprocket mechanism, belt-pulley mechanism, and so on. Compared to the gearbox, the belt-pulley and chain-sprocket arrangements are cost-effective and easy to use. However, these mechanisms have disadvantages such as shorter service life, lower load capacity, and lower velocity ratio, and they are mostly preferred when power needs to be transmitted between shafts with a large center distance. By comparison, the gearbox is the most widely used means of power transmission owing to its large and constant velocity ratio, compact construction, higher transmission efficiency, large power transmission capacity, longer service life, and so on. Considering its wide use in industry, the uninterrupted and quiet operation of the gearbox is of paramount importance.
In a survey, 1 it was found that 74.7% of the time gears fail due to service-related (user) causes and 25.3% of the time they fail due to design errors, incorrect manufacturing processes leading to material defects, heat treatment errors, manufacturing defects, and so on. The reasons for service-related failure are continuous overloading, high levels of speed and torque fluctuation, improper assembly/alignment, impact loading, improper lubrication, ingress of foreign material in the tooth contact area, incorrect handling, operator error, and so on. The failure statistics in several key application areas point to the need for a systematic and concerted effort on diagnostic and prognostic approaches. In helicopter transmission systems, 19.1% of the failures are attributed to gears. 2 In some applications, even if gear failures are relatively infrequent, the downtime and associated costs are relatively high. For example, a survey on wind power systems found that although gears account for only 9.8% of the total failures, they contribute 19.4% of the total downtime of a wind turbine. 3 In addition, the unexpected or unplanned shutdown of the gearbox may lead to loss of human life in some applications. For example, in April 2009, 16 people lost their lives due to the catastrophic failure of the gearbox of a North Sea helicopter. 4 In summary, the failure/degradation of a geared system increases unplanned outages, reduces productivity, increases operating and maintenance costs, and so on. Hence, it is critical to have a good condition monitoring (CM) system in order to reduce gear failure. Prognostics and health management (PHM) is an emerging discipline for gear condition assessment. The implementation of the PHM framework allows a cost-effective maintenance practice, as it can give indications and warnings prior to collateral damage and help improve availability and reliability. 5
Diagnostics, prognostics, and decision making/health management are the three main parts of a PHM framework. The diagnostic process deals with fault detection, identification, classification, and so on and provides information on the current health stage of the gear (e.g. zero pitting, initial pitting, medium pitting, severe pitting on the gear tooth surface). Prognostics is the prediction of the remaining useful life (RUL) of a gear based on its current and historical health condition. More specifically, based on the current loading condition and expected future operational and environmental conditions, prognostic approaches predict the RUL by estimating the development of a fault on the gear tooth surface given the present level of degradation. 6 The prognostic approaches in a PHM framework can be implemented in three different ways: physics-based approaches, sensor data-driven approaches, and hybrid approaches (a combination of sensor data-driven and physics-based approaches). The decision making/health management process involves the actions implemented based on the estimate of the current health state and the predicted RUL. The decision-making step reduces life-cycle costs, ensures safe operation by reducing catastrophic failures, increases system availability by extending the maintenance cycles, reduces inspections, enables timely repair actions, and improves design and logistical support. 6 The same decision-making procedure can be implemented for any type of PHM framework, that is, physics-based, data-driven, or hybrid.
Some researchers7,8 believe that diagnostics and prognostics need to be considered as an integrated process, whereas others9–13 are of the view that diagnostics and prognostics can be achieved separately. However, the performance of prognostic approaches depends on the type and accuracy of the diagnostics output (i.e. type of health indicators (HIs), fault type/severity, and its progression rate), especially for components that have multiple failure behaviors/modes. In fact, diagnostics can be independent of prognostics, but not vice versa. Different failure modes have different dynamics of initiation and rates of propagation. Hence, a prognostic approach developed for one kind of failure mode may not work for other failure modes. If a component has a single failure mode, then prognostics might be performed independently without the need for diagnostics output. Hence, for prognostics, regardless of the type of approach (physics-based, data-driven, or hybrid), diagnostics is the first and critical step.
Some researchers have already reviewed some aspects of the PHM framework for gear. For example, Liang et al. 14 have reviewed dynamic modeling of the gear faults while a review on the physics-based prognostic approaches for many of the gear failure modes is presented by Cubillo et al. 15 The data-driven prognostic approaches implemented for rotating machinery are summarized in previous studies,11,12,16–19 wherein the review work is more aligned with the bearings instead of gears. Many of the time domain-based HIs developed for gear fault diagnostics are summarized in previous studies.20–25 The existing review articles on gear diagnostics and prognostics have a specific and limited scope and many aspects are not elaborately discussed. In light of the existing research articles mentioned above, the following significant contributions are made in this article:
This article presents a comprehensive review of diagnostic and prognostic approaches developed for different types of PHM framework for gears: from physics-based, data-driven, to hybrid approaches.
In the physics-based PHM framework (section “Physics-based approaches”), a description of different gear failure modes and an overview of the modulation-based and dynamic models used for gear fault diagnostics are given. In addition, the prognostic approaches developed for the most reported gear failure modes such as crack, pitting, and wear are reviewed. The model parameter updating algorithms developed to improve the accuracy of the physics-based prognostic approaches are also discussed.
Although this review paper is more aligned in the direction of approaches developed based on vibration signal, the pros and cons of different types of sensors used for gear CM are discussed in the data-driven PHM framework section (section “Data-driven approaches”). In addition, in this section, a variety of data processing methods for gear raw vibration signal, time domain, frequency domain, and time-frequency domain-based HIs developed based on the processed/raw vibration signal, techniques used for processing/selection of best HIs, machine learning and deep learning approaches developed for gear fault severity/type classification, and fault prognostics are briefly reviewed.
The hybrid prognostic approaches developed for different kinds of gear failure modes are reviewed (section “Hybrid approaches”). It has been noticed that not many hybrid prognostic approaches are developed for gears. The present review attempts to cover all the major reported hybrid approaches developed so far.
An overview of uncertainty quantification of prognostic approaches (section “Uncertainty quantification of prognostic approaches”), performance evaluation metrics (section “Performance evaluation metrics”) used for classification and prognostic approaches, and post prognostics decision process (section “Post prognostics decision making/health management process”) is also given. Future research possibilities and challenges in the direction of gear diagnostics and prognostics are discussed (section “Concluding remarks”).
The majority of the diagnostic and prognostic approaches for gears are developed based on the vibration signal. In general, it is observed that the vibration signal contains most of the information related to the gear dynamics and the changes in the dynamic response due to gear tooth degradation. Hence, this work focuses on the review of the vibration signal-based diagnostic and prognostic approaches developed so far.
Physics-based approaches
In the physics-based approaches, component health is assessed by solving a set of equations based on the physical laws and the knowledge of engineering and science. 11 The physics-based approaches are very important if accuracy is a critical factor and testing is restricted. 15 The physics-based approaches are failure mode-specific. A gear may fail in various modes and these failure modes in the order of decreasing frequency can be divided into four groups: fatigue, impact, wear, and stress rupture.1,26
Fatigue: The repeating cyclic stresses lower than ultimate tensile strength cause cracking of the surface, which results in fatigue failure. The fatigue failure can be divided mainly into two categories: tooth bending fatigue and surface contact fatigue.
Tooth bending fatigue: It results from a crack originating in the root section of the gear tooth. Non-metallic inclusions near the surface, imperfections in the root section of the gear tooth, and so on are reasons for bending fatigue. The subsequent progression of the crack that originated at the root section causes fracture of the gear tooth.
Surface contact fatigue: When two surfaces roll or roll and slide against one another with sufficient contact force, the maximum shear stress is developed slightly below the contacting surface. The surface contact fatigue is initiated by this maximum shear stress and causes the cracking of the surface. These cracks propagate at a shallow angle and form a pit when the contact stresses exceed the surface fatigue strength of the material. 27
Impact: Due to sudden shock load, the tooth gets fractured within a few cycles and results in impact failure. This kind of failure is random and hence not many physics-based diagnostic and prognostic studies are reported for this category of gear failures.
Wear: Metal-to-metal contact due to lack of oil film, ingress of abrasive particles in oil, and so on causes removal of material more or less uniformly from the active gear tooth surface and results in wear failure.
Stress rupture: Internal residual stresses that build to a magnitude beyond the strength of the material cause rupture of the gear tooth.
The failure modes discussed above have several subcategories.1,26 Among the gear failure modes, a majority of gear failures appear as tooth crack/fracture, pitting, and wear failure mode. Hence, most of the diagnostic and prognostic approaches are developed for these failure modes. In this article, physics-based diagnostic and prognostic methodologies developed for these failure modes are reviewed.
Physics-based diagnostic approaches
The diagnostics cannot be purely physics-based, as physical sensor data is always required to assess the gear condition. For example, a cracked or broken tooth might excite a gear-pair natural frequency every time it meshes, increasing the harmonics of shaft speed in the vicinity of the resonance. 28 The gear-pair natural frequency can be estimated theoretically using a physics-based model. However, to know whether the frequency is indeed excited and appears in the actual vibration signal of the geared system, vibration sensor data is required. Most of the physics-based diagnostic approaches build a virtual dynamic system that models and mimics the complete gearbox system based on the physical understanding of that system. For diagnostics, the output of the physics-based model is compared with the sensor data output and a fault is detected. The physics-based diagnostic models can be divided into two categories: modulation-based models and dynamic models. 14 In the modulation-based models,29,30 the amplitude, frequency, and phase modulation characteristics of the gear vibration signal are analyzed. The dynamic models are based on the lumped parameter model of the elements of the gearbox, the gear mesh, and associated elements. The development of both modulation-based and dynamic models is discussed hereunder.
Modulation-based models
The vibration signal obtained from a healthy geared rotor system comprises shaft rotational frequency, gear mesh frequency (GMF), and their harmonics due to deviations from the ideal tooth profile, residual rotor unbalance, transmission error, and so on.29,31,32 The amplitude modulation (AM) and frequency modulation (FM) of the gear carrier signal generate the sidebands around the GMF and its harmonics in a healthy geared rotor system. Figure 1 shows a typical healthy gear pair vibration signal obtained from a gearbox test rig. 32
The frequency spectra in Figure 1(c) and (d) show the GMF and its harmonics, and the sidebands around the GMF. In the healthy stage, the gear carrier signal is modulated due to reasons such as fluctuation in load/speed, riding of the gear on an eccentric gear or on a misaligned shaft, and so on. 33
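Hence, the vibration signal $x(t)$ of a gear pair can be represented as a sum of amplitude- and phase-modulated meshing harmonics:

$$x(t) = \sum_{m=0}^{M} X_m \left[1 + a_m(t)\right] \cos\left(2\pi m f_m t + \phi_m + b_m(t)\right) \quad (1)$$

where $M$ is the number of meshing harmonics considered, $X_m$ and $\phi_m$ are the amplitude and initial phase of the $m$-th harmonic, $f_m$ is the GMF, and $a_m(t)$ and $b_m(t)$ are the amplitude and phase (frequency) modulation functions, respectively.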
Figure 1. A typical gear vibration signal in (a) time domain, (b) time domain zoomed view, (c) frequency domain, and (d) frequency domain zoomed around the fundamental GMF.
The gear vibration of varying AM, FM, and phase angle can be simulated/generated using equation (1). In general, the signal simulated using a modulation-based model helps in the understanding of the gear vibration signal obtained in a healthy stage and particular fault severity stage. These models highlight the key characteristics of the gear vibration signal expected in the experimental measurements or field data. The gear mesh stiffness changes due to the presence of a defect on the gear tooth surface. Depending on the type of defect, these changes can be seen as changes in the amplitude of the GMF and its harmonics or change in the AM and FM, and phase effect on the gear mesh vibrations.
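The modulation effects described here can be illustrated with a short simulation. The following is only a minimal sketch: it assumes that each meshing harmonic is multiplied by (1 + a(t)) and phase-shifted by b(t), with both modulation functions taken as simple sinusoids at the shaft frequency. The function names, sampling rate, tooth count, harmonic amplitudes, and modulation depths are hypothetical values chosen for illustration and do not come from the reviewed studies.

```python
import math


def simulate_gear_signal(fs=5000, duration=1.0, shaft_hz=25.0, teeth=20,
                         harmonic_amps=(1.0, 0.5, 0.25),
                         am_depth=0.3, pm_depth=0.2):
    """Return (samples, gmf) for an amplitude- and phase-modulated gear signal.

    Assumption: both modulation functions are sinusoids at shaft frequency.
    """
    gmf = teeth * shaft_hz  # gear mesh frequency in Hz
    samples = []
    for i in range(int(fs * duration)):
        t = i / fs
        a = am_depth * math.cos(2 * math.pi * shaft_hz * t)  # AM function
        b = pm_depth * math.sin(2 * math.pi * shaft_hz * t)  # phase/FM function
        x = sum(amp * (1.0 + a) * math.cos(2 * math.pi * m * gmf * t + b)
                for m, amp in enumerate(harmonic_amps, start=1))
        samples.append(x)
    return samples, gmf


def amplitude_at(samples, fs, freq_hz):
    """Single-sided amplitude at freq_hz from a one-bin discrete Fourier sum."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / fs)
             for i, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / fs)
             for i, x in enumerate(samples))
    return 2.0 * math.hypot(re, im) / len(samples)


FS = 5000
healthy, gmf = simulate_gear_signal(fs=FS)
# Hypothetical "faulty" case: the fault is represented only by deeper modulation.
faulty, _ = simulate_gear_signal(fs=FS, am_depth=0.6, pm_depth=0.4)

upper_healthy = amplitude_at(healthy, FS, gmf + 25.0)
lower_healthy = amplitude_at(healthy, FS, gmf - 25.0)
upper_faulty = amplitude_at(faulty, FS, gmf + 25.0)

print(f"GMF = {gmf} Hz")
print(f"healthy sidebands: lower = {lower_healthy:.3f}, upper = {upper_healthy:.3f}")
print(f"faulty upper sideband = {upper_faulty:.3f}")
```

With these assumed phase relations, the AM and FM contributions add on the upper sideband and partly cancel on the lower one, so the sketch also reproduces an unsymmetrical sideband structure. Different phase relations between the two modulation functions would shift the reinforcement to the other side.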
For example, the presence of wear on either side of the pitch point tends to affect the tooth meshing frequency and its harmonics. The type of profile error generated because of the surface wear distorts the gear meshing stiffness cycle. The effect of tooth wear is more noticeable at the higher harmonics of GMF compared to the fundamental GMF itself. Thus, at least the first three harmonics of the GMF should be considered for wear detection in early stages. 29 Similarly, defects such as pitting cause a non-uniformity in tooth spacing leading to changes in the gear angular velocity as a function of the rotation. This effect may cause FM of the gear mesh carrier signal. In addition, non-uniform tooth spacing also increases the fluctuation in torque and hence increases the AM. The sidebands are thus produced as a combined effect of AM and/or FM, both resulting from the same fault. 29 The increase in the amplitude of the sidebands and their families is often used for gear pitting detection. 32 Due to phase relations on either side of the carrier frequency, sidebands may combine to give reinforcement on one side and cancelation on the other, depending on the initial phase relationships of the AM and FM. For this reason, the sideband structure in gear spectra is often unsymmetrical. 29 In addition, the presence of pitting reduces the GMF amplitude due to the increase in the FM of the gear carrier signal. However, the GMF amplitude is not affected by an increase in the AM. 32 McFadden 30 showed the importance of modulation for early detection of defects such as fatigue cracks. A very high phase lag in the signal due to variation in the mesh point is observed when the affected part of the gear comes under load. In summary, the studies reported by McFadden, 30 Randall, 29 and Kundu et al. 32 can be referred to for a better understanding of the modulation-based models developed for analyzing the gear vibration generated by failure modes such as crack, wear, and pitting, respectively.
Dynamic models
The dynamic models use physical laws such as equilibrium, conservation of energy, and Newton’s laws of motion to simulate the gearbox vibration response in different health conditions. 14 In the dynamic models, various health conditions of the gear can be mathematically simulated. The dynamic models for the geared system can be developed in many different ways; lumped parameter modeling (LPM) and finite element modeling (FEM) are the most popular among them. In LPM, components (e.g. shaft, bearing, gear) in a geared system are considered to be solid and are modeled as a combination of lumped mass, stiffness, and damping parameters. In FEM, a similar analogy is used in which mass, stiffness, and damping parameters are distributed over mesh elements, which are then assembled to form the complete mass, stiffness, and damping matrices for each component. Both kinds of modeling give similar results if the boundary conditions and degree of discretization are properly defined. 14 However, FEM is much more computationally expensive than LPM.
The time-varying mesh stiffness is one of the main sources of vibration in a gear transmission system. Based on the change in the gear mesh stiffness and its variation for a faulty gear, the gearbox vibration response changes. The dynamic model involves the time-dependent gear mesh stiffness, which can be modeled following a square waveform method, potential energy method, finite element method, experimental method, and so on. 14
Square waveform method for gear mesh stiffness evaluation: This approach is the easiest to model the gear mesh stiffness. The gear mesh stiffness is approximated using a square waveform periodic function. The time duration for one revolution divided by the number of teeth represents the period of the square waveform.
Potential energy method for gear mesh stiffness evaluation: The gear mesh stiffness is approximated by assuming the gear tooth as a cantilever beam. The gear mesh stiffness is estimated based on the solid mechanics of the beam. In this method, the contribution of bending, shear, axial compressive, and Hertzian contact is usually analyzed for stiffness evaluation.
Finite element method for gear mesh stiffness evaluation: This approach numerically estimates the gear mesh stiffness by discretization of the gear pair. The finite element method is time-consuming and computationally expensive. The accuracy of this method depends on the degree of discretization, mesh density, and choice of element type.
The experimental method for gear mesh stiffness evaluation: Methods such as photo elasticity, 35 dynamic speckle photography, 36 and strain gauge 37 are used for experimental estimation of gear mesh stiffness.
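Of the methods listed above, the square waveform approximation is simple enough to sketch in a few lines. The example below is a rough illustration only. It assumes a contact ratio between 1 and 2, so that each mesh period begins with two tooth pairs in contact and ends with one. The stiffness values, the optional `faulty_tooth` and `stiffness_loss` parameters, and the function name are hypothetical, and reducing the stiffness over one whole mesh cycle is a crude stand-in for a damaged tooth.

```python
def square_wave_mesh_stiffness(t, shaft_hz, teeth, contact_ratio,
                               k_single, k_double,
                               faulty_tooth=None, stiffness_loss=0.0):
    """Mesh stiffness (N/m) at time t (s) using a square-wave approximation.

    Assumes 1 < contact_ratio < 2: the first (contact_ratio - 1) fraction of
    every mesh period has two tooth pairs in contact, the rest has one.
    """
    mesh_period = 1.0 / (shaft_hz * teeth)   # revolution time / number of teeth
    cycle = int(t // mesh_period)            # index of the current mesh cycle
    phase = (t % mesh_period) / mesh_period  # position within the cycle, 0..1
    stiffness = k_double if phase < (contact_ratio - 1.0) else k_single
    if faulty_tooth is not None and cycle % teeth == faulty_tooth:
        # Hypothetical fault model: fixed fractional stiffness reduction.
        stiffness *= (1.0 - stiffness_loss)
    return stiffness


SHAFT_HZ, TEETH = 25.0, 20
PERIOD = 1.0 / (SHAFT_HZ * TEETH)

# Sample one healthy mesh cycle at 1000 points and average the stiffness.
one_cycle = [
    square_wave_mesh_stiffness((i + 0.5) * PERIOD / 1000, SHAFT_HZ, TEETH,
                               contact_ratio=1.6, k_single=2.0e8, k_double=3.5e8)
    for i in range(1000)
]
mean_stiffness = sum(one_cycle) / len(one_cycle)

# Stiffness while the (hypothetically) damaged tooth number 3 is in mesh.
damaged = square_wave_mesh_stiffness(3.25 * PERIOD, SHAFT_HZ, TEETH,
                                     contact_ratio=1.6, k_single=2.0e8,
                                     k_double=3.5e8, faulty_tooth=3,
                                     stiffness_loss=0.2)
print(mean_stiffness, damaged)
```

A function of this kind can supply the time-varying stiffness term of a lumped parameter model. The potential energy or finite element methods would replace the two constant levels with a stiffness curve computed from the tooth geometry.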
The dynamic models help in simulating the response of the geared system being monitored. The indicators extracted from the dynamic response can be used for gear fault diagnostics. If properly modeled, the model can simulate the component failure condition under any given speed and load profile. 18 Hence, the dynamic model reduces the time and expense associated with seeding physical damage/faults on the actual component (gear). However, the dynamic model has two major drawbacks. First, some factors such as misalignment, gear surface quality, oil quality, and clearance are difficult to model. Hence, the experimental and the simulated vibration responses seldom match exactly. Second, the dynamic models are expensive in terms of modeling effort and computational time, particularly for complex gearing systems. 16 Åkerblom 38 and Liang et al. 14 have given a detailed description of dynamic models for geared systems, and hence these models are not covered in detail in the present review.
Physics-based prognostic approaches
Based on the mathematical modeling of the degradation process for a particular failure mode, the physics-based prognostic approaches predict when the damage in gear crosses a predefined threshold of failure. The description of prognostic approaches used for different kinds of gear failure mode is given in the subsequent sections.
Crack
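The fatigue crack in a gear tooth is mainly caused by high cycle fatigue. The Paris power law equation describes the crack growth and can be used for estimating the RUL. It is given by 39

$$\frac{da}{dN} = C\,(\Delta K)^{m} \quad (2)$$

where $a$ is the crack length, $N$ is the number of load cycles, $\Delta K$ is the range of the stress intensity factor (SIF), and $C$ and $m$ are material-dependent constants.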
The Paris power law equation is the most widely used for estimation of crack propagation in gear tooth. For example, in the literature,45,47 the tooth crack growth was estimated using this equation. For a particular crack length, gear mesh stiffness was estimated using the potential energy method and the stiffness is input to the gear dynamic model for estimating the stress variation. Based on this variation, the SIF was estimated using equation (3) and then the fracture mechanics-based model (equation (4)) was used for the RUL prediction. Li and colleagues45,47 give an overview of a combined physics-based diagnostic and prognostic approach developed for gear subjected to crack failure mode. Endeshaw et al. 48 proposed an approach for crack propagation in gears considering uncertainties in loading and material properties. The uncertainties in loading and material properties are incorporated in the gear dynamic model. Based on the gear dynamic model, it was found that an increase in crack length reduces gear mesh stiffness and, as a result, increases the dynamic force on the gear tooth. The calculated dynamic force is input to the finite element model for calculation of the SIF and gear crack propagation life was predicted using the Paris power law equation. Many researchers used the FEM-based software for estimation of the SIF used in the Paris power law equation instead of developing a dynamic model. For example, Glodež et al. 49 used the FEM program in FRANC2D software for simulating the fatigue crack growth in the gear tooth root. The virtual crack extension method was used for simulating the crack propagation path. Chen et al. 50 analyzed the propagation path and failure behavior of cracked gears under different initial angles. A decrease in gear mesh stiffness was observed with an increase in the initial crack angle. The crack propagation path through the gear rim was found more serious as compared to the crack propagation path through the teeth. 
In addition, in this study, a relationship between the gear crack propagation process and degradation level is established. For a particular initial crack angle, crack propagation path and the crack length were predicted using FEM. Based on the predicted path and length, the gear vibration response was simulated using a gear dynamic model. The degradation parameters such as root mean square (RMS) and kurtosis were extracted using the simulated vibration response. Subsequently, the relation between the extracted parameters and crack propagation path and length was established.
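The degradation parameters mentioned above, RMS and kurtosis, are straightforward to compute from a simulated or measured vibration record. The sketch below is illustrative only: the test signals (a pure sinusoid and the same sinusoid with one added impulse per period) are hypothetical stand-ins for a healthy and a damaged response, and kurtosis is taken in its non-excess form (fourth central moment divided by the squared variance).

```python
import math


def rms(samples):
    """Root mean square of a signal."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def kurtosis(samples):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((v - mean) ** 2 for v in samples) / n
    m4 = sum((v - mean) ** 4 for v in samples) / n
    return m4 / (m2 * m2)


# Hypothetical "healthy" response: ten full periods of a unit sinusoid.
healthy = [math.sin(2 * math.pi * 10 * i / 1000) for i in range(1000)]

# Hypothetical "damaged" response: one impulse per period added on top.
damaged = [v + (5.0 if i % 100 == 0 else 0.0) for i, v in enumerate(healthy)]

print(f"healthy: RMS = {rms(healthy):.3f}, kurtosis = {kurtosis(healthy):.2f}")
print(f"damaged: RMS = {rms(damaged):.3f}, kurtosis = {kurtosis(damaged):.2f}")
```

Kurtosis reacts far more strongly than RMS to the added impulses, which is why impulsive faults such as a cracked tooth are often tracked with it.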
In most of the studies, crack is virtually initiated at a point and in the direction of maximum principal stress in the gear tooth root. Estimation of the number of cycles required for the fatigue crack initiation is important during the calculation of the total gear life. The strain life method based on FEM was used by Glodež et al. 49 to estimate the number of stress cycles required for the fatigue crack initiation. However, Lin et al. 51 used the power density method for estimation of the fatigue crack initiation life.
These works have been further extended for a more comprehensive analysis of gear fatigue crack propagation. For example, Lewicki et al. 52 considered the effect of moving gear tooth load for the prediction of crack propagation. Podrug et al. 43 considered the effect of gear rim thickness, crack closure, and change in forces during rotation while predicting the gear tooth crack propagation life. Agarwal et al. 53 studied the phenomenon of fatigue crack propagation in the presence of an inclusion in the gear tooth root. For hard inclusions near the original crack paths, the crack propagation was observed to be slower. Using a finite element model together with the Paris power law equation, the gear crack path and fatigue life were predicted. In order to reduce the computational requirement of the FEM-based gear crack life prediction methodology, Gueye et al. 54 proposed a pseudo evolutionary structural optimization approach. Čular et al. 55 proposed a strain-life approach, which differs from the above studies, for gear bending (crack) fatigue life prediction.
The Paris power law equation holds only when the crack growth rate is significant and stable; it is not valid when the crack growth is not significant or becomes unstable. Alternative models such as the Forman law,56–58 which are extended versions of the Paris power law equation, can also be used for gear crack growth prediction. Studies on the development of various fatigue crack growth models are detailed in Pugno et al. 59
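As an illustration of how a Paris-law-based life estimate is obtained, the sketch below integrates da/dN = C(ΔK)^m numerically from an initial to a critical crack length. It is a simplified example under stated assumptions, not the procedure of any cited study: the SIF range is taken in the textbook form ΔK = Y Δσ √(πa) with a constant geometry factor Y, whereas the reviewed works obtain the SIF from dynamic or finite element models. All parameter values and function names are hypothetical.

```python
import math


def stress_intensity_range(a, delta_sigma, geometry_factor):
    """Assumed SIF range (MPa*sqrt(m)) for crack length a (m): Y * dS * sqrt(pi * a)."""
    return geometry_factor * delta_sigma * math.sqrt(math.pi * a)


def cycles_to_critical(a0, a_crit, c, m, delta_sigma, geometry_factor,
                       steps=10000):
    """Cycles for a crack to grow from a0 to a_crit under the Paris law.

    Integrates dN = da / (C * dK**m) with the midpoint rule.
    """
    da = (a_crit - a0) / steps
    cycles = 0.0
    for i in range(steps):
        a_mid = a0 + (i + 0.5) * da
        dk = stress_intensity_range(a_mid, delta_sigma, geometry_factor)
        cycles += da / (c * dk ** m)
    return cycles


# Hypothetical inputs: crack grows from 0.5 mm to 5 mm under a 200 MPa stress range.
params = dict(c=1.0e-11, m=3.0, delta_sigma=200.0, geometry_factor=1.12)
rul_cycles = cycles_to_critical(a0=0.5e-3, a_crit=5.0e-3, **params)
rul_from_longer_crack = cycles_to_critical(a0=2.0e-3, a_crit=5.0e-3, **params)

print(f"cycles from 0.5 mm to 5 mm: {rul_cycles:.0f}")
print(f"cycles from 2.0 mm to 5 mm: {rul_from_longer_crack:.0f}")
```

Because the growth rate increases with crack length, most of the life is spent while the crack is short. This is one reason the estimate is sensitive to the assumed initial crack size and to uncertainty in C and m.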
Pitting
The subsurface crack propagation under cyclic loading allows the material to break from its surface and results in pit formation.60–64 Hence, the Paris power law equation (equation (2)) used for modeling the crack growth can also be utilized to estimate the propagation of the surface-breaking crack, which results in pit formation. Blake and Cheng 64 used such a model for predicting the initiation of pitting. As discussed earlier, the surface fatigue failure/pitting is initiated by the maximum shear stress acting below the gear tooth surface. 65 Hence, instead of using the normal stress (mode I crack propagation), the shear stress is used in equation (2) for predicting the subsurface breaking crack propagation, which is also termed mode II crack propagation. This study was further extended by Blake and Cheng, 66 in which a predictive pit growth model was proposed for estimating the failure probabilities and service life for gears. Aslantaş and Taşgetiren 67 considered both mode I and mode II crack propagation SIF for predicting the initiation of pitting. Based on the finite element model developed in FRANC2D simulation software, the linear elastic fracture mechanics approach is used for gear life prediction with pitting failure mode. Glodež et al. 68 proposed a new model in which a finite element-based virtual crack extension method was used for predicting the initiation of pitting on the gear tooth surface. The conventional pitting life prediction model is developed based on the assumptions that gear tooth surfaces are ideally smooth and without lubrication. Fajdiga et al. 62 proposed a numerical computational model in which the influence of lubricant pressure acting on the subsurface crack faces, which ultimately results in pitting, is studied. The effect of lubricant pressure within the crack is very important because it contributes to mode I crack opening. 
In addition, in this study, the effect of Hertzian contact pressure, friction between contact surfaces, elastohydrodynamic lubrication (EHL) condition, the fluid trapped in the crack, and residual stresses due to heat treatment of the material on the gear pitting life is investigated. Zhu et al. 69 proposed the pitting life prediction model based on a three-dimensional (3D) line contact mixed EHL analysis and subsurface Von Mises stress calculation. The fatigue life prediction proposed by Zaretsky 70 was used for gear pitting life estimation. Li and Kahraman 71 extended these studies and developed a complex physics-based model for predicting the gear pitting initiation. In this study, the effect of tooth force, rotational speed, lubricant properties, lubricant temperature, surface roughness, residual stress, and material fatigue strength on gear pitting initiation is evaluated. In addition, in this study, a micro-pitting severity index is proposed for defining the pitting severity level. Li et al. 72 also developed a similar model for gear pitting initiation prediction.
Correlation between gear dynamics and gear tribology is very important for gear life prediction. The close correlation between gear tribology and gear dynamics is also called gear tribo-dynamics. Li and Anisetti 73 proposed a tribo-dynamic contact fatigue model for gear pitting initiation prediction. The governing equations of motion developed based on the gear dynamic model and the mixed EHL model are coupled together. Based on the coupled equations, normal pressure and tangential shear stresses are estimated below the surface and incorporated into the fracture mechanics model for life estimation. A multiaxial fatigue criterion was used for predicting the contact fatigue crack initiation life (pitting initiation). Depending on the gear contact ratio, the load is shared between multiple gear teeth. Moallem et al. 27 introduced this load-sharing concept in gear pitting life estimation. Yin et al. 61 developed a 3D dynamic finite element analysis model for gear pitting life prediction in ANSYS Workbench. In this work, the plastic deformation of each element near the subsurface crack is used for subsurface crack propagation. The accumulated plastic strain replaces the SIF in the Paris power law equation for pitting initiation prediction for gears under heavily loaded lubricated contacts.
Although models for predicting the initiation of pitting are proposed, a model for predicting the growth of existing pit on the gear tooth surface is not yet available. The model developed for representing the bearing spall growth can also be adapted/extended to represent the pit growth rate on the gear tooth surface, both being surface fatigue phenomena. The spall growth in the rolling element bearing can be represented with a small modification in the Paris power law crack growth equation as74,75
The above equation indicates that the rate of growth of the defect is related to the instantaneous defect area D under constant operating condition. 74
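As a hedged illustration of this kind of growth law, the sketch below numerically integrates a power-law defect growth equation until a threshold defect area is reached. The form dD/dN = c * D**n, the function name, and all parameter values are assumptions made for illustration only; they are not taken from the cited studies.

```python
def cycles_to_threshold(d0, d_threshold, c, n, step=1.0):
    """Integrate dD/dN = c * D**n with forward Euler steps.

    d0          : initial defect area (assumed units)
    d_threshold : defect area taken as the failure threshold
    c, n        : assumed growth parameters (hypothetical values)
    Returns the number of cycles needed to reach the threshold.
    """
    if d0 <= 0 or d_threshold <= d0:
        raise ValueError("need 0 < d0 < d_threshold")
    if c <= 0 or step <= 0:
        raise ValueError("c and step must be positive")
    defect, cycles = d0, 0.0
    while defect < d_threshold:
        # Growth rate depends on the instantaneous defect area.
        defect += c * defect ** n * step
        cycles += step
    return cycles


# Hypothetical example: time for the defect area to double.
cycles_from_small = cycles_to_threshold(1.0, 2.0, c=1e-3, n=1.0)
cycles_from_large = cycles_to_threshold(1.5, 2.0, c=1e-3, n=1.0)
```

Because the growth rate depends on the current defect area, a component that already carries a larger defect reaches the same threshold in fewer cycles, which is the behavior the growth law describes.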
Wear
Apart from fatigue surface cracks, wear of the tooth contact surface (which occurs due to sliding contact) may also be modeled. A very thin lubrication film separates the contact surfaces of the gear teeth while in mesh. However, this film is insufficient to prevent direct asperity contact between the two mating gear surfaces, which results in wear of the mating surfaces. Due to continuous wear, the gear tooth becomes thinner and the vibration level increases. Zhu et al. 76 presented a summary of the empirical and theoretical wear laws developed for wear rate progression estimation. The wear progression on the gear tooth surface is usually modeled by Archard’s law. 77 According to this law, the accumulated wear at a particular point on the gear tooth surface under dry, mixed, or boundary lubricated rubbing contact can be expressed by integrating the following equation 78
where
where
The wear on the gear tooth surface is highly affected by the lubrication condition and in most of the case studies on gear wear, partial-EHL condition is considered. For exact determination of wear rate progression on the gear tooth surface, equation (6) can be modified. For example, Wu and Cheng 81 considered the thermal desorption and oxidative wear mechanism at low and elevated asperity contact temperature, respectively, during the determination of wear rate progression in a spur gear under partial-EHL lubrication condition. This study was further extended by Shifeng and Cheng 80 in which a gear tooth profile was analyzed by consideration of gear dynamics. The equivalent wear rate and tooth wear profile along the line of action were analyzed in this study. In both addendum and dedendum portion of the gear tooth surfaces, the material was removed. However, the highest wear was found at the beginning of gear engagements. Bajpai et al. 82 combined the finite element-based gear contact mechanics model and Archard’s wear model for predicting the wear evolution on the spur and helical gear tooth surfaces. This study is focused on developing the wear prediction model considering the manufacturing and assembly imperfections and intentional surface modifications. Liu et al. 83 developed a comprehensive contact fatigue wear model in which the effect of loading condition, lubrication condition (mixed-EHL modeling), initial surface roughness, residual stress, and hardness on gear wear life is investigated.
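The accumulation of wear depth can be sketched numerically as follows. The sketch assumes the commonly stated local form of Archard's law, dh/ds = k * p (wear depth h, sliding distance s, contact pressure p, wear coefficient k); the function name and the numerical values are hypothetical and are not taken from the cited studies.

```python
def accumulate_wear_depth(wear_coefficient, pressures, sliding_increments):
    """Approximate h = integral of k * p ds by a discrete sum.

    wear_coefficient   : assumed constant k (e.g. in m^2/N)
    pressures          : contact pressure at each contact step (Pa)
    sliding_increments : sliding distance covered in each step (m)
    """
    if len(pressures) != len(sliding_increments):
        raise ValueError("pressures and sliding_increments differ in length")
    return sum(
        wear_coefficient * p * ds
        for p, ds in zip(pressures, sliding_increments)
    )


# Hypothetical example: three mesh passes at 1 GPa with 0.1 mm sliding each.
wear_depth = accumulate_wear_depth(1e-16, [1e9, 1e9, 1e9], [1e-4, 1e-4, 1e-4])
```

In a full model the pressure and sliding distance would come from a contact analysis at each point on the tooth flank, and the wear coefficient would depend on the lubrication condition, as discussed in the studies above.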
In the above studies, the wear coefficient parameter
After a number of cycles, fatigue wear leads to the development of subsurface crack on the gear tooth surface. This subsurface crack propagation develops into pitting.15,83 Ghosh et al. 85 developed a correlation between wear and subsurface crack propagation
where N is the number of cycles to failure in pitting, Q is the contact shear stress, which depends on the coefficient of friction and wear rate, and a and b are constants obtained based on experimental data.
In summary, Table 1 lists the issues addressed by various researchers on the development of physics-based prognostic approaches for different kinds of failure modes in gear.
Table 1. Physics-based approaches implemented by various researchers for gear damage growth rate estimation.
FEM: finite element modeling; EHL: elastohydrodynamic lubrication.
In summary, Figure 2 represents a typical physics-based framework for fault diagnostics and prognostics. The observed damage level and location can be incorporated into the dynamic system model, which then gives the corresponding vibratory response. This supports the process of diagnostics based on a physics-based approach. Based on the fault diagnostics information, the physics-based prognostic approaches can provide good prediction results if properly modeled. However, the physics-based prognostic approaches have a few major drawbacks that need to be addressed. First, much effort is required to estimate the fatigue model parameters, for example,

Figure 2. Physics-based framework for fault diagnostics and prognostics.
The physics-based prognostics model calculates the time to reach a future predefined threshold value of the damage area based on constant fatigue parameters, for example,
Parameters updating of the physics-based prognostic models
Due to the complexity of the mechanical systems, inherent uncertainty due to underlying modeling assumptions, process noise, and measurement noise, the prediction made by the physics-based prognostic models is always error-prone. 16 The parameter updating approaches such as Kalman filter (KF), extended Kalman filter (EKF), linear Kalman filter (LKF), particle filter (PF) and so on help to overcome these uncertainties in the prognostic models. Table 2 lists some of the parameter updating approaches that can be used for updating the physics-based prognostics model parameter.
Table 2. Model parameters updating approaches.
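The idea of updating a model parameter from measurements can be illustrated with a minimal sketch. It uses a simple grid-based Bayesian update rather than any of the filters named above, and the damage model D(N) = D0 * exp(c * N), the prior, the noise level, and the synthetic measurements are all assumptions chosen for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Normal probability density evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def damage_model(c, cycles, d0=1.0):
    """Hypothetical damage model: closed-form solution of dD/dN = c * D."""
    return d0 * math.exp(c * cycles)


def bayes_update(grid, prior, cycles, measured, noise_std):
    """Posterior over the candidate values of c after one measurement."""
    unnormalized = [
        gaussian_pdf(measured, damage_model(c, cycles), noise_std) * p
        for c, p in zip(grid, prior)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def posterior_mean(grid, weights):
    return sum(c * w for c, w in zip(grid, weights))


# Candidate values of the growth parameter c with an assumed normal prior.
grid = [i * 1e-5 for i in range(1, 401)]
prior = [gaussian_pdf(c, 2e-3, 1e-3) for c in grid]
prior_total = sum(prior)
belief = [p / prior_total for p in prior]

# Synthetic, noise-free measurements generated with c = 1e-3 (assumed truth).
true_c = 1e-3
for cycles in (100, 200, 300, 400):
    measured = damage_model(true_c, cycles)
    belief = bayes_update(grid, belief, cycles, measured, noise_std=0.05)

estimate = posterior_mean(grid, belief)
```

Each measurement multiplies the current belief by its likelihood, so the estimate moves from the prior mean toward the value that best explains the observed damage. Kalman and particle filters follow the same prior-to-posterior logic with different representations of the distribution.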
Most of these parameter updating approaches are based on the Bayesian inference. Using the Bayes theorem, these approaches estimate and update the model parameters in the form of probability density function (PDF)
where Ø is the vector of the unknown physical model parameters,
where
For the implementation of the updating algorithms, it is assumed that the model parameters follow some distribution, that is, normal with their initially known mean and standard deviation given as
Initially, based on an a priori known distribution in the parameters,
Data-driven approaches
The data-driven framework for fault diagnostics and prognostics is built on historical run-to-failure CM data. The key elements of a data-driven framework are suitable sensor selection and data acquisition, data processing, HI construction for fault diagnostics, indicator processing/selection, fault classification, and fault prognostics. A typical data-driven framework for fault diagnostics and prognostics is outlined in Figure 3.

Figure 3. Data-driven framework for fault diagnostics and prognostics.
Sensor selection and data acquisition
Several sensors can be installed at various locations on the gearbox to monitor its condition. Typical sensors for monitoring the health of the gear include accelerometers for measuring vibration, acoustic emission (AE) sensors for measuring AE stress waves, wear debris sensors for measuring the amount of material removed from the gear tooth surface, thermocouples for measuring oil temperature, microphones for measuring noise, torque sensors for measuring torque fluctuation, and so on. The sensor selection depends on constraints such as accuracy, cost, location, size, frequency range, amplitude range, and working and environmental conditions. However, the most important criterion for the selection of a sensor is its ability to effectively capture a small change in the gear tooth health condition. From the literature, it is observed that vibration, AE, and wear debris sensors most effectively capture changes in the gear tooth condition and are the most commonly used for gear CM. Table 3 shows the pros and cons of these sensors. Compared to other sensors for gear CM, the vibration sensor signal contains most of the information related to the gear dynamics and is quite responsive to most types of gear tooth degradation. The vibration sensor is also cost-effective and convenient to use. Hence, this work mainly emphasizes the data-driven framework based on the vibration signal. The data from all the sensors are acquired using an appropriate data acquisition system, which converts the analog input signal from the sensors into a set of discrete digital values that can be further processed for extracting the HIs related to gear health. Using the changes in the gear pair dynamics as elaborated above, a number of HIs can be extracted for gear fault diagnostics; these are discussed in the next section.
Table 3. Pros and cons of sensors used for gear condition monitoring.
AE: acoustic emission.
Gear fault diagnostics
Gear fault diagnostics is a combination of sequential steps such as sensor data processing, HIs construction, HIs processing/selection, and gear fault type/severity classification. This stage is designed to generate a vector of HIs, which can be used to infer the current health status of a monitored system. The generation of an appropriate HI vector is typically application dependent and is one of the most important stages in a PHM framework. For a robust and more reliable correlation of the signal with the gear tooth condition, pre-processing of raw vibration data is usually necessary, which is discussed in the next section.
Data processing
The raw vibration signal generated in the gearbox has three main components: (1) periodic components due to interactions between the pair(s) of gear teeth during meshing, (2) transient components due to short-duration impacts caused by a tooth fault, and (3) broadband background noise. 104 In addition, the signal captured by the accelerometers on the gearbox housing is usually affected by interfering signals such as vibrations from shafts, bearings, and so on. The signal may also be contaminated by other disturbances such as electrical and electromagnetic interference.
Time synchronous averaging (TSA) is one way in which individual frequency components, or a family of frequency components, related to a gear can be isolated/extracted from the raw vibration signal. The TSA approach allows separation of rotation speed synchronous components related to the gear of interest from other non-synchronous components, including the random broadband noise present in the raw vibration signal. In TSA, the raw vibration signal in the time domain is averaged over a large integer number of cycles synchronous with the rotation of the particular gear shaft with the help of the tachometer sensor (reference) signal.105,106 The noise and the contribution of signals from other machine components are significantly decreased using the TSA. Hence, the TSA time domain waveform is cleaner/purer compared to the original signal. 107 Some signal processing techniques have been developed108–110 in which the tachometer signal information is extracted from the vibration signal itself, and hence TSA can also be performed without the need for a separate tachometer sensor.
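The averaging step can be sketched as follows. The sketch assumes the signal has already been resampled to a fixed integer number of samples per shaft revolution (in practice this is done with the tachometer reference); the function name and the synthetic test signal are hypothetical.

```python
import math


def time_synchronous_average(signal, samples_per_rev):
    """Average a signal over all complete shaft revolutions it contains.

    Assumes angular resampling has already been applied, so every
    revolution spans exactly samples_per_rev samples.
    """
    n_revs = len(signal) // samples_per_rev
    if n_revs == 0:
        raise ValueError("signal is shorter than one revolution")
    averaged = [0.0] * samples_per_rev
    for rev in range(n_revs):
        start = rev * samples_per_rev
        for i in range(samples_per_rev):
            averaged[i] += signal[start + i]
    return [value / n_revs for value in averaged]


# Synthetic example: a shaft-synchronous component at 8 orders plus a
# non-synchronous component at 3.5 orders, recorded over 100 revolutions.
samples_per_rev, n_revs = 64, 100
raw = [
    math.sin(2 * math.pi * 8 * i / samples_per_rev)
    + math.sin(2 * math.pi * 3.5 * i / samples_per_rev)
    for i in range(samples_per_rev * n_revs)
]
tsa = time_synchronous_average(raw, samples_per_rev)
```

In this example the 3.5-order component changes sign from one revolution to the next and averages out, while the 8-order component repeats identically every revolution and is retained.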
The TSA signal gives a direct visualization of the gear vibration signal, but only a localized tooth fault is easily discernible in it, and then only once the fault has grown larger in size. 29 The TSA signal is therefore often processed further to diagnose gear faults that are difficult to diagnose using the bare TSA signal. Figure 4 shows different ways of processing the raw time domain data for diagnosing different kinds of gear faults.

Figure 4. Data processing of raw vibration signal.
The residual TSA vibration signal is obtained by removing the shaft rotational speed components, the GMF, and the GMF harmonics from the original TSA signal. Thus, the residual TSA vibration signal contains only sideband information. As discussed earlier in section “Modulation-based models,” the increase in sideband amplitudes and families can be used for gear pitting detection. Hence, the residual TSA vibration signal, which contains only the sidebands, is helpful for diagnosing the pitting fault.32,105,111 For a tooth fault such as a crack, it was observed that the first- and/or second-order sidebands around each of the GMF harmonics produce large modulation effects that are not related to local faults. 112 Researchers113,114 have observed that gear tooth crack diagnostic information is more pronounced in the difference signal (the residual signal with the first-order sidebands also removed) than in the residual vibration signal. The band pass TSA signal is obtained by removing the rotational shaft frequency components and then band pass filtering the remaining signal around the GMF and GMF harmonics. During filtering around each of the GMF harmonics, up to fourth-order sidebands can be included in the filtered signal. If the sidebands are removed from the band pass TSA signal, the obtained signal is called a harmonics TSA signal. The harmonics TSA signal is helpful in diagnosing the presence of wear on the gear tooth surface. The processing of the raw vibration signal is reviewed in detail in previous studies.21,23,24 The HIs most commonly extracted from the processed/unprocessed vibration signal for gear fault diagnostics are discussed in the next section.
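A residual signal of this kind can be sketched by zeroing the corresponding spectral lines of the TSA signal. The sketch below uses a plain discrete Fourier transform so that it is self-contained; the function names, the number of shaft and mesh harmonics removed, and the synthetic signal are assumptions for illustration.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (adequate for short TSA records)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft_real(spectrum):
    """Inverse DFT, returning the real part of the reconstructed signal."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


def residual_signal(tsa, n_teeth, n_mesh_harmonics=3, n_shaft_harmonics=2):
    """Remove the mean, shaft orders, and gear mesh harmonics from a TSA signal.

    One TSA record is assumed to span exactly one shaft revolution, so
    spectral line k corresponds to shaft order k.
    """
    n = len(tsa)
    spectrum = dft(tsa)
    orders = set(range(0, n_shaft_harmonics + 1))
    orders.update(n_teeth * h for h in range(1, n_mesh_harmonics + 1))
    for k in orders:
        if k <= n // 2:
            spectrum[k] = 0.0
            spectrum[(n - k) % n] = 0.0  # matching negative-frequency line
    return inverse_dft_real(spectrum)


# Synthetic TSA record: 8-tooth mesh component, shaft order, one sideband.
n, teeth = 64, 8
tsa = [
    math.sin(2 * math.pi * teeth * i / n)
    + 0.5 * math.sin(2 * math.pi * i / n)
    + 0.2 * math.sin(2 * math.pi * (teeth + 1) * i / n)
    for i in range(n)
]
residual = residual_signal(tsa, teeth)
```

Only the sideband at order 9 survives in this example. Removing the first-order sidebands as well, or keeping only the mesh harmonics, follows the same pattern with a different set of spectral lines.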
Construction of the HIs
The gear vibration signal is, in general, too complex to interpret directly. Hence, an HI is usually constructed to map the degradation trend in the component health to the development of fault severity.115,116 Fault severity classification and prognostic approaches are implemented based on this HI, and their success depends heavily on the performance metrics of the HI, such as monotonicity, trendability, and robustness. The HI extraction process for gear fault diagnostics can be carried out using processed/unprocessed vibration signal data in three different domains: time domain (TD), frequency domain (FD), and time-frequency domain (TFD). Different failure modes have different dynamics of initiation and rates of propagation. Hence, different indicators may need to be used to track different kinds of failure modes. 32 Table 4 lists the most widely used HIs for the diagnosis of different kinds of failure modes in gears. A definition of these HIs is given in the subsequent sections.
Table 4. Summary of the HIs used for diagnostics of various types of gear failure modes.
HIs: health indicators; RMS: root mean square; TSA: time synchronous averaging; CCR: correlation coefficient of residual vibration signal; GMF: gear mesh frequency; ALR: average logarithmic ratio; CEEMD: complementary ensemble empirical mode decomposition; VMD: variational mode decomposition; OEMD: orthogonal empirical mode decomposition.
Time domain–based HIs
The TD-based HIs analyze the amplitude information or compare the vibration signal in a time series obtained in healthy and faulty stage for gear fault diagnostics. Based on the different types of signal pre-processing, various TD-based indicators can be extracted for gear fault diagnostics. The definition of the most widely used TD-based indicators is given in Table 5. The detailed description of many of these indicators can be seen in previous studies.20–25,117,158 In Table 5,
Table 5. Common TD-based indicators used for gear fault diagnostics.
TD: time domain; RMS: root mean square; CF: crest factor; TSA: time synchronous averaging; AM: amplitude modulation.
Like kurtosis, the indicators FM4, NA4, and NB4 are dimensionless. For healthy gear conditions, the value of these indicators is nearly three, and a value greater than three may indicate a faulty gear condition. A threshold may be defined for these indicators for the identification of different damage severity levels on the gear tooth surface.
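A few of the basic TD-based indicators can be sketched as below. The functions compute the RMS, crest factor, and normalized kurtosis of whatever signal is passed in; indicators such as FM4 apply the same normalized fourth moment to a pre-processed (difference) signal. The function names and the synthetic signals are hypothetical.

```python
import math


def rms(x):
    """Root mean square of the signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def crest_factor(x):
    """Peak absolute value divided by the RMS."""
    return max(abs(v) for v in x) / rms(x)


def kurtosis(x):
    """Normalized fourth central moment (about 3 for a Gaussian signal)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2)


# Synthetic example: a smooth periodic signal and the same signal with
# one short impact added (a crude stand-in for a local tooth fault).
healthy = [math.sin(2 * math.pi * 5 * i / 1000) for i in range(1000)]
faulty = list(healthy)
faulty[123] += 4.0

kurtosis_healthy = kurtosis(healthy)
kurtosis_faulty = kurtosis(faulty)
```

The single impact barely changes the RMS but raises the kurtosis noticeably, which is why fourth-moment indicators are favored for localized faults.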
In addition to the indicators listed in Table 5, some other TD-based indicators have also been extracted for gear fault diagnostics. For example, Mathew and Stecki 118 developed an indicator named matched filtered RMS for detecting wear progression. The matched filtered RMS is the logarithmic value of the average power ratio between the current health state vibration signal and a reference vibration signal. The sensitivity of this indicator is found to be better than that of traditional indicators such as RMS and peak. Wang 161 used the resonance modulation technique for gear crack fault diagnostics. It is found that a crack in the gear tooth generates impacts that excite structural resonances. The residual TSA signal is band pass filtered around the structural resonance. The kurtosis of the envelope of the band pass filtered signal is found to be better than the kurtosis value of the raw signal for crack diagnostics. Kundu et al. 32 proposed an indicator CCR for monitoring the natural pitting progression on the gear tooth surface. The indicator CCR compares the correlation of the residual TSA vibration signal in the healthy/reference stage with the signal obtained in the current/pitted stage. The signal correlation is found to decrease with an increase in the pitting severity level on the gear tooth surface. In this study, the performance of the CCR indicator is compared with other indicators such as RMS, peak, CF, FM4, NA4, M6A, ER, and ALR in different pitting severity stages of the gear. It is shown that the CCR value changes significantly between consecutive pitting stages of the gear compared to other indicators.
Frequency domain–based HIs
The indicators developed in this category are based on the changes in the frequency content of the vibration signals. In the FD, the mixture of different periodicities is easier to interpret compared to the TD. 29 Most of the gear fault diagnostic indicators developed in this domain involve filtering the sidebands, GMF, or GMF harmonics from the raw/TSA signal and then analyzing them in the time domain using the indicators discussed in the previous section. The relationship between the energy at different frequencies and gear damage is not well established. Hence, fewer studies are available for fault diagnostics based on the spectrum of the gear vibration signal. For example, Randall 29 observed that the presence of uniform tooth wear leads to an increase in the amplitude of the GMF harmonics, and hence a higher amplitude of the GMF harmonics can be used for detecting uniform wear at an early stage. For a defect such as pitting, an increase in the sideband families and sideband amplitudes may indicate the presence of pitting on the gear tooth surface. 29
For defects such as crack, McFadden 30 observed that the phase modulation of the TSA signal band pass filtered around the dominating GMF harmonics can be used for early-stage gear crack diagnosis. The cracked tooth was detected by studying the phase angle and amplitude of the signal. Based on the sidebands information in the spectrum of the current stage and reference stage vibration signal, an indicator ALR has been used for detecting the crack on the spur gear tooth.125–127
Combet and Gelman 124 proposed a sideband ratio (SBR) indicator for differentiating a gear with a local tooth fault, such as a pitted gear, from a healthy gear. The indicator SBR was calculated as the ratio of the sum of the amplitudes of the sideband components of the envelope spectrum of a signal filtered around the mesh harmonics to the measured power of the mesh harmonics. The first two harmonics were considered for pitting detection. Hu et al. 34 updated this indicator for monitoring wear progression. The updated indicator was defined as the average logarithmic ratio of the current state SBR value to the reference state SBR value, considering all GMF harmonics in the signal. The logarithmic ratio was taken to deemphasize any substantial changes in the SBRs of any particular meshing harmonics. 125 Combet and Gelman 129 used the spectral kurtosis (SK) technique for the diagnosis of early-stage pitting in gears. The SK technique was used to capture the small transients in the vibration signals. A similar technique was used by Barszcz and Randall 130 for crack fault identification in a wind turbine gear. Wang et al. 162 describe the usage of the SK technique for gear fault diagnostics.
In addition to spectrum analysis, some researchers29,119,128 have used cepstrum analysis, which detects periodic structure in the spectrum, for gear fault diagnostics. Cepstrum analysis is useful for separating overlapping sideband families. The cepstrum represents the spectrum of the logarithmic spectrum. The cepstrum
where
For surface wear diagnosis, Randall 29 observed that cepstrum analysis helps in distinguishing the family of harmonics from the family of equally spaced sidebands. Similarly, Ziaran and Darula 128 observed a larger change in the harmonics amplitude of the cepstrum compared to the harmonics of the spectrum for a pitted gear. Sometimes the cepstrum approach suppresses useful fault diagnostic information available in the gear spectrum. Hence, it is advised that the cepstrum be used to help in understanding the spectrum, instead of replacing it. 29
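How the cepstrum exposes periodic structure in a spectrum can be sketched with a small example. The sketch uses the real cepstrum (inverse transform of the log magnitude spectrum), which is one of several definitions in use and may differ from the one adopted in the cited studies; the function names and the test signal are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (adequate for short records)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(x, eps=1e-12):
    """Inverse DFT of the log magnitude spectrum of x.

    eps guards against taking the logarithm of an exactly zero line.
    """
    n = len(x)
    log_magnitude = [math.log(abs(v) + eps) for v in dft(x)]
    return [
        (sum(log_magnitude[k] * cmath.exp(2j * cmath.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


# An impulse followed by a weaker repeat 8 samples later produces a
# ripple in the spectrum that repeats along the frequency axis.
n, delay = 64, 8
signal = [0.0] * n
signal[0] = 1.0
signal[delay] = 0.5

cepstrum = real_cepstrum(signal)
peak_quefrency = max(range(1, n // 2), key=lambda q: cepstrum[q])
```

The periodic ripple in the spectrum collapses into a single dominant line in the cepstrum at the quefrency equal to the delay. Families of equally spaced sidebands behave in the same way, which is what makes the cepstrum convenient for separating them.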
Many times, a local tooth fault such as initial pitting or crack generates small-amplitude short-duration transients in the vibration signal. In the normal spectrum, these changes may be submerged in the dominating components of the signal as all sections of the TD signal contribute to the normal Fourier spectrum. Hence, the global basis function does not effectively capture localized transient features. The Fourier spectrum is therefore insensitive to small localized temporal changes. 156 Hence, the FD-based HIs are not useful for the non-stationary vibration signal and fail to describe the evolution of the frequency content of the signal with time. It is therefore worthwhile to analyze the signal in TFD 134 in situations where short-duration transients are excited.
Time-frequency domain–based HIs
Multiple types of TD- and FD-based indicators are well established and have been explored for their effectiveness in gear fault diagnostics. However, it is reported that these techniques are often unable to diagnose a fault in its early stages because they analyze the signal assuming stationarity of the gear vibration signals. The periodic components present in the gear vibration signal show up readily in the frequency spectrum, while the impulsive content is more appropriately described in the TD. To capture both kinds of information, TFD analysis can be a good alternative.21,133 Approaches such as the short-time Fourier transform (STFT), wavelet transform (WT), empirical mode decomposition (EMD), and Wigner-Ville distribution (WVD) are commonly used TFD analysis approaches for gear fault diagnostics. The STFT represents the signal energy distribution over the frequency spectrum as it changes with time. It reflects short-duration changes in the signal. Due to the application of a window function, local damage on a gear tooth can be easily detected. 156 However, the approach has the disadvantage of lacking simultaneous high resolution in both the TD and the FD. 163
The WT, which improves on the STFT technique, is an adaptive multi-resolution analysis technique that is ideally suited to efficiently detecting non-stationary, non-periodic, and transient features in the vibration signal. 134 The WT-based fault diagnostic methodologies may be categorized into the continuous wavelet transform (CWT), discrete wavelet transform (DWT), and wavelet packet transform (WPT). In CWT, the information in a series of wavelet coefficients at different scales is used for gear fault diagnosis. Polar wavelet maps were used by Meltzer and Dien 139 to improve the fault detection capability for a faulty gear operating under non-stationary rotating speeds. Various HIs were extracted using the wavelet coefficients of the CWT-based polar wavelet maps for the diagnostics. Similarly, Zhu et al. 140 mapped the wavelet coefficients into a polar diagram to enhance the periodic transients caused by gear faults such as pitting and crack. The Morlet wavelet was used by Vernekar et al. 134 for diagnosing a missing tooth fault in the gear of an engine. The GMF amplitude in the CWT was used to detect the presence of a gear fault. Rafiee and Tse 141 extracted an HI for diagnosing various gear faults such as slight-worn, medium-worn, and broken teeth by approximating the autocorrelation function of the wavelet coefficients as a simple sinusoidal function. Zuo et al. 142 used the WT to obtain multiple data series at different scales. These multiple data series were then used as an input to an independent component analysis (ICA) algorithm for the detection of an impulse generated due to a broken tooth fault. Öztürk et al. 143 used the mean frequency of a scalogram to detect the presence of pitting faults. Similarly, Wang et al. 144 proposed an HI from the amplitude of the wavelet coefficients of a CWT for a quantitative assessment of crack fault severity level under varying operating conditions.
The CWT techniques are time-consuming and not suitable for large data sets. They are therefore inconvenient for online fault diagnosis. The DWT is a fast-computation version of the WT. It is easy to implement and requires fewer computational resources in cost and time. In DWT, the signal is divided into approximation and detail coefficients depending on the level of decomposition. Saravanan and Ramachandran 145 used the DWT to represent all possible types of transients generated due to the presence of faults such as crack, wear, and broken tooth. Li et al. 146 used the DWT technique to denoise the raw vibration signal and diagnosed gear faults such as crack, wear, and broken tooth based on the autoregressive (AR) model and principal component analysis (PCA) approach. The WPT can be used as an alternative in applications where the DWT does not provide good fault diagnosis results. In WPT, the signal division takes place at each level for both the approximation and the detail signals. 135 Hong et al. 164 identified the best sub-frequency band for classifying conditions such as normal, cracked, and broken teeth for a bevel gear based on the WPT technique. A detailed review of the application of different types of WT for fault diagnostics of rotating machinery elements such as bearings and gears is given in previous studies.136–138
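The split into approximation and detail coefficients can be sketched with the Haar wavelet, the simplest choice. The cited studies may use other wavelets, so the wavelet, the function names, and the synthetic signal below are assumptions for illustration.

```python
import math


def haar_dwt(signal):
    """One DWT level with the Haar wavelet: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    approximation = [
        (signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)
    ]
    detail = [
        (signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)
    ]
    return approximation, detail


def haar_wavedec(signal, levels):
    """Repeatedly split the approximation; details are ordered fine to coarse."""
    approximation = list(signal)
    details = []
    for _ in range(levels):
        approximation, detail = haar_dwt(approximation)
        details.append(detail)
    return approximation, details


# A constant signal with one transient at sample 5.
signal = [1.0] * 16
signal[5] = 3.0
approximation, details = haar_wavedec(signal, levels=2)
transient_locations = [
    i for i, value in enumerate(details[0]) if abs(value) > 1e-12
]
```

The first-level detail coefficients are zero everywhere except at the pair of samples containing the transient, which shows how a local impact is isolated in the detail band while the smooth part of the signal stays in the approximation. In WPT the detail band would be split further in the same way.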
An alternative TFD technique that is now widely used for fault diagnosis is the EMD. The EMD decomposes the non-stationary vibration signal into intrinsic mode functions (IMFs) that are nearly orthogonal. The IMFs represent the natural oscillatory modes embedded in the signal. Parey and colleagues147,148 observed that the kurtosis value of the selected IMF was more sensitive to incipient crack fault propagation compared to the kurtosis value of the raw original signal. Similarly, Li et al. 149 diagnosed a tooth crack in a gear based on the marginal spectrum obtained using the EMD. Mode mixing is a major disadvantage of the EMD process, especially when the amplitude of the high-frequency component is smaller than that of the corresponding low-frequency component.115,165 To counter the mode mixing and enable separation of the high-frequency component, Zhang et al. 166 used a frequency-modulated EMD technique. Zhao et al. 155 proposed an orthogonal empirical mode decomposition (OEMD) approach to reduce the effect of mode mixing, the influence of false frequency, and noise in the EMD approach. Liu et al. 167 used the ensemble empirical mode decomposition (EEMD) technique to counter the mode mixing problem. Although the EEMD technique has efficiently solved the mode-mixing problem, it takes a lot of time for decomposition. Zhao et al. 151 proposed a complementary ensemble empirical mode decomposition (CEEMD) technique for gear fault diagnosis, which has less reconstruction error and is computationally more efficient compared to the EEMD technique. The EEMD techniques perform poorly when strong noise is present in the vibration signal. Recently, the variational mode decomposition (VMD) technique has been proposed for gear fault diagnosis due to its high signal-to-noise ratio and better adaptability compared to EEMD. 152 The selection of an optimum number of decomposition layers is one of the critical challenges during the implementation of the VMD technique. Algorithms such as particle swarm optimization (PSO), 168 grasshopper optimization, 169 ant colony optimization (ACO), 170 and artificial fish swarm optimization 171 are used for optimizing the number of decomposition layers in the VMD technique. Lei et al. 115 give a detailed review of EMD-based techniques for fault diagnostics of rotating machinery.
Alternate time-frequency techniques are also used for gear fault diagnosis. For example, Wong 157 used the WVD technique for early-stage crack diagnosis in gear. Polyshchuk et al. 132 proposed an indicator NP4 for gear fault detection based on WVD of the raw vibration signal. This indicator takes the kurtosis of the instantaneous power calculated using the WVD for gear fault diagnosis.131,132 Feng et al. 172 presented a review on the time-frequency domain-based HIs.
Processing/selection of the HIs
The accuracy of the classification and prognosis output of the models depends on the quality and sensitivity of the indicators used to evaluate the condition of the faults. The HI processing step is optional while implementing the data-driven framework since its implementation depends on the correlation of the HI space with the actual damage on the gear tooth surface. Figure 5 shows different ways of processing/selecting the HI space.

Figure 5. The HIs processing/selection for prognosis/classification model.
HIs smoothness
The HI value extracted from the vibration signal exhibits fluctuations with time. This variation is usually observed due to inherent complexity in the correlation between physical damage level and the corresponding HI value as well as due to the unpredictable balance between self-healing of the damage area and subsequent secondary degradation from primary damage sources. In some cases, there are other unexplained practical reasons. The actual measured data also has an inherent measurement noise. To avoid/reduce these fluctuations, the HI value is smoothened before it is used. For example, Tian and Zuo 173 smoothened the RMS-based HI using the Weibull hazard rate function for gear RUL prediction. A moving average method was used by Shao et al. 174 for reducing the fluctuations in the RMS-based HI value obtained from the vibration signal. For better fitting, some researchers have used the logarithm of HIs.175–177 The main advantage of the indicator smoothness step is that it helps in removing the outliers present in the HI values and hence helps in improving the accuracy of prediction.
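A moving average of the kind mentioned above can be sketched as follows. The sketch uses a trailing (causal) window so that each smoothed value depends only on past samples, which suits online monitoring; the window length and the HI values are hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average; early samples use the data available so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the sample that has just left the window.
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


# Hypothetical HI history with fluctuations around a rising trend.
hi_history = [1.0, 2.0, 3.0, 4.0, 5.0]
hi_smoothed = moving_average(hi_history, window=3)
```

A longer window suppresses more of the fluctuation but delays the response to a genuine change in the HI, so the window length is a trade-off that has to be tuned for the application.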
Best HIs selection
Most of the existing gear tooth fault diagnostic HIs are not equally sensitive at different damage levels. Some indicators are sensitive for early-stage fault diagnostics and some are more sensitive for advanced-stage gear fault diagnostics. The performance of an indicator depends on the fault characteristics, that is, the type and location of the damage and whether the damage is distributed or localized. 178 Hence, researchers usually extract multiple indicators that are sensitive either in the early stage or in the advanced stage of the gear fault. However, if all these indicators are used as inputs in the development of an RUL prediction or classification model, there is a strong probability that the model will describe the random error or noise in addition to the underlying relationship. This is called overfitting of the model, and in such cases the performance of the model may be good during training but is likely to be significantly worse during testing. In addition to the effect on performance, a large HI space increases the model learning time. Generally, two types of approaches are used to reduce the HI space dimensionality. The first approach involves the generation of a new HI with a lower dimension from the extracted HI space. This is done with the help of dimensionality reduction techniques that are discussed in section “HIs fusion/compression.” The second approach involves the elimination of the non-sensitive HIs from the HI space based on certain benchmarks that are discussed below.
In general, the indicators used for prognosis and classification should have the following three characteristics:8,179
Monotonicity: represents an overall positive or negative trend in time.
Robustness: reflects the tolerance of the HI to outliers.
Trendability/correlation: the correlation of the entire history of the evolution of the HI with the fault progression. 180
The mathematical formulation of the metrics quantifying the characteristics described above can be found in Kundu et al. 32 and Zhang et al. 179 In addition to these characteristics, other characteristics such as sensitivity and early detection capability of the HI can also be investigated. These characteristics were investigated by Shakya et al. 181 for comparing the performance of various HIs. Instead of relying on a separate HI selection methodology, some prognosis models such as general log-linear Weibull (GLLW)98,116 and random forest (RF)105,182,183 have an inbuilt best HI selection capability. In the GLLW model, a backward stepwise regression procedure is followed for eliminating the least significant HIs and re-estimating the model parameters. This model provides statistical information in the form of p-values; an HI with a higher p-value (usually more than 0.05) contributes less significantly to the model and is a candidate for elimination. The RF approach, which can be used for both classification and prognostics, provides this selection capability as part of the model itself.
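As an illustration of how the monotonicity and trendability characteristics listed above can be scored, the sketch below uses two commonly used formulations: the normalized difference between the numbers of positive and negative increments, and the absolute Pearson correlation between the HI and time. These formulations are assumptions for illustration and are not necessarily identical to those in the cited references; robustness is omitted because it additionally requires a choice of trend estimator.

```python
import math


def monotonicity(hi):
    """|#positive increments - #negative increments| / (n - 1), in [0, 1]."""
    diffs = [b - a for a, b in zip(hi, hi[1:])]
    positive = sum(d > 0 for d in diffs)
    negative = sum(d < 0 for d in diffs)
    return abs(positive - negative) / len(diffs)


def trendability(hi, times=None):
    """Absolute Pearson correlation between the HI and time, in [0, 1]."""
    if times is None:
        times = list(range(len(hi)))
    n = len(hi)
    t_mean = sum(times) / n
    h_mean = sum(hi) / n
    cov = sum((t - t_mean) * (h - h_mean) for t, h in zip(times, hi))
    var_t = sum((t - t_mean) ** 2 for t in times)
    var_h = sum((h - h_mean) ** 2 for h in hi)
    if var_t == 0 or var_h == 0:
        return 0.0  # a constant series carries no trend information
    return abs(cov) / math.sqrt(var_t * var_h)


# Hypothetical HI histories used only to exercise the two scores.
monotonic_hi = [1.0, 1.1, 1.3, 1.4, 1.6, 1.9]
noisy_hi = [1.0, 1.2, 1.1, 1.4, 1.3, 1.6]
scores = {
    "monotonic": (monotonicity(monotonic_hi), trendability(monotonic_hi)),
    "noisy": (monotonicity(noisy_hi), trendability(noisy_hi)),
}
```

An HI with both scores close to 1 is a stronger candidate for prognosis than one that fluctuates around its trend.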
Other methods for best HI selection include Taguchi’s method, used by Alkhadafe et al. 184 for classifying gear faults into slight, moderate, and severe categories. The distance evaluation criterion for best indicator selection has been used in previous studies,167,175,178,185 whereas Lei et al. 186 and Cerrada et al. 187 used the genetic algorithm (GA) for best HI space selection. Shakya et al. 188 explored the use of the Mahalanobis–Taguchi–Gram–Schmidt method and used gain values to remove/select appropriate indicators before the indicator fusion process. The HI processing/selection techniques discussed above can be implemented individually or together to improve the classification/prognosis accuracy, depending on the problem at hand.
HIs fusion/compression
Another way for HI space selection is to fuse all the indicators in such a way that it reduces the dimensionality of the extracted indicator space and at the same time retains the sensitivity or variability of all the indicators. Algorithms such as PCA,107,189–195 ICA, 191 and Mahalanobis distance (MD) 196 are widely used for indicator fusion/compression.
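The MD-based fusion mentioned above can be sketched for the simplest case of two indicators. The healthy baseline samples and the two test points are hypothetical values; the sketch fuses a pair of HIs into a single distance from the healthy state and is not the implementation of any cited study.

```python
import math


def fit_baseline(samples):
    """Mean and sample covariance of healthy-state (HI1, HI2) pairs."""
    n = len(samples)
    m1 = sum(s[0] for s in samples) / n
    m2 = sum(s[1] for s in samples) / n
    s11 = sum((s[0] - m1) ** 2 for s in samples) / (n - 1)
    s22 = sum((s[1] - m2) ** 2 for s in samples) / (n - 1)
    s12 = sum((s[0] - m1) * (s[1] - m2) for s in samples) / (n - 1)
    return (m1, m2), (s11, s12, s22)


def mahalanobis(sample, mean, cov):
    """Mahalanobis distance using the closed-form inverse of a 2x2 covariance."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    d1 = sample[0] - mean[0]
    d2 = sample[1] - mean[1]
    squared = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.sqrt(squared)


# Hypothetical healthy-state values of two HIs (for example, RMS and kurtosis).
healthy = [(1.0, 0.5), (1.1, 0.6), (0.9, 0.4), (1.0, 0.6), (1.0, 0.4)]
mean, cov = fit_baseline(healthy)
md_healthy = mahalanobis((1.05, 0.55), mean, cov)
md_faulty = mahalanobis((2.0, 1.5), mean, cov)
```

The fused value grows as the indicator pair moves away from the healthy cluster, so a single scalar can be tracked instead of the individual HIs.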
Fault classification
Based on the identified HI space, the fault classification approaches distinguish different classes of faults in gears such as wear, pitting, crack, and tooth chipping. In addition, for a particular fault type such as pitting, the classification approaches can be used to classify the severity or the state of the gear pitting condition such as initial pitting, medium pitting, and severe pitting. The classification approaches separate the different fault classes based on some statistical criteria or hyperplane construction. The output of this step is very important for developing a prognosis model: for each type of failure mode, the evolution/trend of a particular fault indicator may be different, so a different prognosis model may need to be developed for each kind of failure in the component. Table 6 summarizes various classification approaches used for gear fault classification. Some of the most widely used approaches are discussed in the following subsections.
Table 6. Fault classification models implemented by various researchers for gear fault severity classification.
HI: health indicator; TD: time domain; FD: frequency domain; TFD: time-frequency domain; EMD: empirical mode decomposition; SVM: support vector machine; PCA: principal component analysis; RMS: root mean square; IMF: intrinsic mode functions; CC: correlation coefficient; RVM: relevance vector machine; EEMD: ensemble empirical mode decomposition; ANN: artificial neural network; GA: genetic algorithm; KNN: K-nearest neighbor; LDA: linear discriminant analysis; MLPNN: multiple-layer perceptron neural network; RBFNN: radial basis function neural network; WPT: wavelet packet transform.
Neural network
The idea of an artificial neural network (ANN) is inspired by the biological nervous system, and the ANN is the most popular methodology used for classification problems due to its prediction capability. The ANN model learns from training in a way similar to how a biological neural network learns from experience. 225 Different structures and types of ANN models have been trained by various researchers for gear fault severity classification. For example, Saravanan and Ramachandran 145 used the back propagation neural network approach to identify different kinds of faults such as crack, wear, and tooth breakage in a bevel gear. Wavelet-based HIs were extracted from the wavelet coefficients obtained using the Daubechies wavelets “db1” to “db15.” The HIs generated from the wavelet with the highest potential to identify different types of gear faults were used as inputs for classifying the faults. The accuracy of the ANN model was tested with the number of neurons between 2 and 30. The number that resulted in the minimum classification error was selected for building the structure of the ANN for fault classification. Alkhadafe et al. 184 used multiple sensors such as vibration, acoustic, speed, and torque for classification of the pitting fault severity into initial, medium, and severe pitting in a helical gear. Various TD-, FD-, and TFD-based HIs were extracted from the sensor signals. Taguchi’s method was used to select the best indicators that are sensitive to different fault types in gear, and these were then used as inputs to the neural network. Two types of neural networks, namely, back propagation and radial basis, were used in this study for fault severity classification. Rafiee et al. 198 used the multi-layer perceptron neural network for classifying gear faults such as slight-worn, medium-worn, and broken-tooth. A self-organized map (SOM) neural network was presented by Cheng et al. 199 for identifying gear faults such as worn, broken, and missing teeth.
The neural network-based models have two critical disadvantages. First, it is difficult to determine the optimum network structure and number of nodes.226,227 Second, the neural network training algorithms may not give the same results for each training run, even with the same input, output, and model structure. 228 To overcome the first disadvantage, researchers have applied optimization algorithms to find the optimum number of hidden layers and neurons in the network. For example, Samanta 206 optimized the number of nodes in the hidden layer of an ANN model using GA.
Fuzzy rule
In the fuzzy rule-based approaches, the rule set for fault classification is usually learned from expert knowledge or prior system knowledge. Jolandan et al. 200 used a fuzzy-based approach for the classification of faults such as broken and worn teeth. For developing the rules, the decision tree approach was used. A major issue with the fuzzy approaches is the correctness of the rules developed from the expert knowledge, or the requirement of a separate algorithm for defining the rules. Hence, some researchers201–203 integrate the fuzzy approaches with the neural network for automatically defining the rules. The integration overcomes the interpretability issue of the ANN and the rule-defining issue of the fuzzy approach.
Support vector machine
The support vector machine (SVM) is a nonlinear approximation approach and can give better classification accuracy even with a small sample size. 229 It maps the input data x in a nonlinear fashion into a higher-dimensional HI space via kernel functions.178,191 Each instance in the input HI space is assigned a class label of +1 or −1.
A basic SVM classifier can only be used for two-class classification problems. For solving multi-class classification problems using SVM, many classifier architectures such as one-against-one, one-against-all, and directed acyclic graph have been developed.189,232,233 Qu et al. 204 used the multi-class SVM approach for the classification of different levels of pitting severity (initial, medium, and severe pitting) in a planetary gear. A total of 134 TD- and FD-based HIs were extracted from the raw vibration signals. Based on this HI space, an SVM classifier was built and different damage severity levels were classified. Zhang et al. 166 observed that the energy distribution of the IMFs extracted using the EMD process can be used to recognize the dynamic state and fault type of gear. With the multi-class SVM approach, gear health stages such as healthy, crack, and broken teeth were identified. For multi-class classification, Liu et al. 167 proposed a kernel-based approach similar to the SVM called relevance vector machine (RVM) and obtained better accuracy compared to that from the SVM.
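The one-against-one architecture named above can be sketched independently of the binary learner. In the sketch below, a nearest-class-mean rule stands in for the binary SVM purely to keep the example self-contained; the severity labels and the two-dimensional HI vectors are hypothetical.

```python
from collections import Counter
from itertools import combinations


def train_binary(xs_a, xs_b):
    """Stand-in binary learner (NOT an SVM): returns +1 for class a, -1 for class b."""
    dim = len(xs_a[0])
    mean_a = [sum(x[d] for x in xs_a) / len(xs_a) for d in range(dim)]
    mean_b = [sum(x[d] for x in xs_b) / len(xs_b) for d in range(dim)]

    def decide(x):
        dist_a = sum((xi - m) ** 2 for xi, m in zip(x, mean_a))
        dist_b = sum((xi - m) ** 2 for xi, m in zip(x, mean_b))
        return 1 if dist_a <= dist_b else -1

    return decide


def train_one_vs_one(data):
    """Train one binary classifier per pair of classes."""
    return {
        (a, b): train_binary(data[a], data[b])
        for a, b in combinations(sorted(data), 2)
    }


def predict_one_vs_one(models, x):
    """Each pairwise classifier casts one vote; the majority label wins."""
    votes = Counter()
    for (a, b), decide in models.items():
        votes[a if decide(x) == 1 else b] += 1
    return votes.most_common(1)[0][0]


# Hypothetical HI vectors for three pitting severity levels.
data = {
    "initial": [(1.0, 0.2), (1.2, 0.3)],
    "medium": [(2.0, 1.0), (2.2, 1.1)],
    "severe": [(3.5, 2.0), (3.7, 2.2)],
}
models = train_one_vs_one(data)
label = predict_one_vs_one(models, (2.1, 1.0))
```

For k classes this scheme trains k(k−1)/2 binary classifiers, whereas one-against-all trains k; replacing `train_binary` with a real binary SVM leaves the voting logic unchanged.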
As discussed earlier, the accuracy of the SVM-based approaches depends on the values of the kernel parameters, and there is no precise basis to determine suitable kernel parameters for a given problem. 234 Optimization algorithms are therefore required to find the optimum kernel parameters. Liu et al. 167 obtained the optimum kernel parameters using the ACO algorithm. Higher accuracy was observed with the ACO-RVM approach compared to that from the RVM approach. In this work, 14 TD-based HIs and 13 FD-based HIs were extracted from the first three IMFs obtained from the EEMD process. 235 Based on the distance evaluation criteria, dominant HIs were selected as inputs to the RVM model for improving the fault classification accuracy. Similarly, Samanta 206 used the GA for optimizing the SVM kernel parameters. The SVM classifier is less influenced by the HI space dimensionality and is less prone to the overfitting problems observed with neural networks. 236 Hence, some researchers206,207 have observed that the SVM outperforms the ANN-based models in terms of classification accuracy.
Random forest
Recently, the RF methodology has been used by various researchers182,183,187,212 for fault severity level classification. The RF methodology has fewer hyper-parameters and is easy to interpret. Compared to algorithms such as ANN and SVM, it provides a higher or at least comparable classification accuracy. The RF is an ensemble learning methodology that integrates multiple weak and diverse decision tree classifiers and reaches a final decision by a majority vote of the decision trees. 212 Han et al. 212 compared the performance of RF, ANN, and SVM models for classifying faults such as spalling, crack, and broken tooth. Compared to the SVM and ANN algorithms, the performance of the RF algorithm was found superior, especially when the number of training samples was limited and the noise in the signal was high. Cerrada et al. 187 used the RF algorithm with the integration of GA for classification of gear faults such as pitting, wear, broken tooth, and misaligned gear. In this work, many TD-, FD-, and TFD-based indicators were extracted from the raw vibration signal and the GA was applied on these indicators for selecting the best indicators. The best indicators were used as inputs to the RF model for gear fault classification. However, because the RF methodology has an inbuilt best HI selection capability, the separate GA used in the above work for selecting the best HIs is not required if RF is used as the fault classification model. The best HI selection capability inbuilt in the RF model is demonstrated in previous studies.182,183
K-nearest neighbor
The K-nearest neighbor (KNN) is a type of instance-based learning algorithm that works on the principle that the instances within a dataset lie in close proximity to other instances with similar properties. 237 The KNN is more stable and has a good classification performance compared to other algorithms such as ANN and SVM. For example, Wang 195 identified five crack severity levels on a gear tooth using the KNN algorithm based on HIs extracted from WPT. The highest classification accuracy was observed with the KNN algorithm compared to algorithms such as linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), classification and regression tree (CART), and Naïve Bayes classifier (NBC). Similarly, Liu et al. 208 observed that the KNN performed comparably to the SVM algorithm for the identification of crack and missing teeth in a planetary gear train.
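The proximity principle behind the KNN can be shown in a few lines. This is a minimal sketch with Euclidean distance and majority voting; the training vectors, the labels, and the value of k are hypothetical and do not reproduce the setup of the cited studies.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label a query HI vector by majority vote among its k nearest training samples."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional HI vectors with crack severity labels.
train = [
    ((0.20, 0.10), "level 1"),
    ((0.25, 0.15), "level 1"),
    ((0.30, 0.10), "level 1"),
    ((0.80, 0.60), "level 2"),
    ((0.85, 0.70), "level 2"),
    ((0.90, 0.65), "level 2"),
]
predicted = knn_predict(train, (0.82, 0.66), k=3)
```

Because distances are compared directly, HIs with very different numerical ranges are normally scaled to a common range before the KNN is applied.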
Deep learning
The performance of the conventional machine learning approaches depends on the sensitivity of the HI value to the fault progression. The extraction of an appropriate HI requires domain knowledge and an understanding of signal processing techniques. In recent years, deep learning-based neural network models have become popular for fault classification. They reduce the manual processing and analysis of the data. The deep learning models learn features from the raw data. Their deep architecture contains many layers of nonlinear data processing units. 238 Each layer learns a higher-level representation of the raw vibration data from the output of its preceding layer. Hence, deep learning models automatically extract multiple complex features/indicators from the raw vibration data without the use of human expertise, domain knowledge, and signal processing techniques. 239 Different types of deep learning architectures such as convolutional neural network (CNN),214–216,219,222 stacked auto encoder (SAE),218,220,224,240 deep belief network (DBN),213,223,241 and deep Boltzmann machine (DBM) 221 have been applied for gear fault classification. Some architectures were applied to the HIs extracted from the raw vibration signal,182,184–186,189,190 whereas some architectures were applied to the raw vibration data in the time domain,223,242–244 frequency domain,218,224 and time-frequency domain.214,219,222
For example, a local-global deep neural network (LGDNN) was proposed by Zhao and Jia 223 for diagnosing different kinds of bearing faults and different fault severity levels for gear using raw vibration data in the time domain. In the local layer of the LGDNN model, an improved convolutional deep belief network (CDBN) based on the Fisher parameter optimization criterion (called Fisher-CDBN) is used to efficiently extract local discriminant information from the raw vibration data. In the secondary global layer of the network, the kernel PCA is used to reduce the redundancy in the data. Jia et al. 218 used an auto encoder-based deep neural network (DNN) model for classifying different kinds of gear faults using frequency spectra of the machinery. Different types of faults such as pitting on sun gear, cracked sun gear, chipped planetary gear in the first stage, chipped sun gear, a missing tooth sun gear in the second stage, and an inner race bearing fault of planetary gear in the first stage were classified in this work. Wang et al. 214 used the CNN model for classification of gear fault severity levels such as slight crack, medium crack, and broken teeth. The time sequences in this work were converted to time-frequency images by wavelet analysis. The CNN model learns the underlying features in the signal from the time-frequency domain spectral images and performs fault classification.
The conventional classification algorithms are suitable for one operating condition. If the operating conditions of the system change, historical data for the changed operating condition are required and the model needs to be retrained to accommodate the additional operating condition. This issue is overcome by the deep transfer learning approaches developed recently. For example, Li et al. 243 proposed a deep transfer learning network (DTLN) to diagnose different pitting severity levels. The parameters of the DTLN model in this work were optimized using the PSO algorithm. Han et al. 244 used the CNN model for diagnosing multiple fault types in the gear, bearing, and shaft of a compound gearbox. In the source domain, the CNN model was trained on a large dataset to learn hierarchical features from the raw vibration data. The architecture and weights of this CNN model in the source domain were fine-tuned for classifying the fault severity levels/types in the target domain (for different operating conditions). Different types of architectures developed in the deep transfer learning category for gear fault diagnosis can be found in previous studies.242,245,246
Compared to conventional machine learning approaches, designing a deep learning architecture is still a challenge. A deep learning model has many hyper-parameters, and each hyper-parameter has a substantial effect on the model performance. More standard methods need to be developed for the selection of the optimal hyper-parameters for a particular fault diagnosis problem. 239 A detailed description of deep learning–based models for machinery fault diagnosis can be found in previous studies.239,247–249
Miscellaneous
In addition to the major approaches discussed above, other approaches have also been tried for gear fault classification. For example, Cheng et al. 250 proposed a gray relational analysis approach for the identification of different levels of pitting severity in gears. In this work, the best HIs were selected based on their correlation to the damage severity level. Based on the selected best HIs, the relational analysis of the gear in the healthy stage was carried out against the gear at different pitting severity levels. Zhao et al. 210 classified artificially created pitting surface damage using an ordinal ranking/classification methodology. A total of 63 TD- and FD-based HIs were extracted from the data of each vibration sensor. Vibration sensors at four different positions were used for health monitoring of the gear, and hence a total of 252 HIs were available for gear pitting severity level classification. The correlation coefficient of each HI vector with the pitting severity level was estimated, and indicators with a high correlation coefficient value were considered for pitting severity classification.
The detailed mathematical and graphical description of classifier models for machinery fault diagnosis is given in Liu et al. 237
Fault prognostics
In contrast to the wide variety of fault diagnostic approaches available for gears, fault prognostic approaches are limited in the open literature. This may be due to the non-availability of run-to-failure data for gears. The data-driven methodologies used for gear fault prognostics are tabulated in Table 7. A detailed description of each of these RUL prediction methodologies is not given here, as many reviews9,11–13,19,237,248,251–255 are already available on general data-driven prognostic methodologies. Instead, some of the common RUL prediction methodologies that are used specifically for gears are discussed in this section.
Table 7. Data-driven prognostic models implemented by various researchers for gear RUL prediction.
RUL: remaining useful life; HI: health indicator; ANN: artificial neural network; RMS: root mean square; CWT: continuous wavelet transform; ER: energy ratio; AM: amplitude modulation; ANFIS: adaptive neuro-fuzzy inference system; SVR: support vector regression; NARX: nonlinear autoregressive model with exogenous inputs; PF: particle filter; PSD: power spectral density; GMF: gear mesh frequency; CF: crest factor; HMM: hidden Markov model; TD: time domain; FD: frequency domain; TFD: time-frequency domain; ICA: independent component analysis; RFR: random forest regression; CCR: correlation coefficient of residual vibration signal; MD: Mahalanobis distance.
Neural network
The architecture of the ANN model discussed in section “Neural network” for gear fault classification is the same for fault prognosis, with the exception of the target variable. The output of the ANN classification model is the state of the gear fault in the form of 1, 0, −1, and so on, whereas the output variable in the prognostic model is the RUL. In all the reported work on gear RUL prediction based on the ANN model, a nonlinear mapping between a subset of the past HI values and the future HI values is considered. For RUL prediction, all these models extrapolate the HI value based on the past HI values. Zhang et al. 256 used a back propagation ANN model for estimating the degradation trend of the sideband-based HI developed for a gear subjected to the wear failure mode. The ANN model was combined with the change point detection method and the window adjusting method for enhancing the RUL prediction capability of the model. The recurrent neural network (RNN) model can better process and extrapolate time series compared to the feed-forward neural network. 271 Hence, Tian and Zuo 173 used an RNN for multi-step-ahead prediction of the RMS-based HI for gear RUL prediction. In this work, the RMS indicator was fitted with the Weibull hazard rate in order to reduce the fluctuation in the HI value and hence improve the accuracy of the RUL prediction.
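The extrapolate-to-threshold logic shared by these models can be illustrated with a much simpler stand-in for the trained network. The sketch below fits a straight line to the logarithm of the HI by least squares and extrapolates it to a failure threshold; the exponential trend, the threshold value, and the synthetic HI history are assumptions for illustration, and no ANN is involved.

```python
import math


def fit_log_linear(times, hi):
    """Least-squares fit of log(HI) = intercept + slope * t."""
    logs = [math.log(v) for v in hi]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs)) / sum(
        (t - t_mean) ** 2 for t in times
    )
    return y_mean - slope * t_mean, slope


def predict_rul(times, hi, threshold):
    """Time from the last observation until the fitted trend reaches the threshold."""
    intercept, slope = fit_log_linear(times, hi)
    if slope <= 0:
        return math.inf  # a non-increasing trend never reaches the threshold
    failure_time = (math.log(threshold) - intercept) / slope
    return max(0.0, failure_time - times[-1])


# Hypothetical noise-free HI history: HI(t) = 0.5 * exp(0.02 * t), sampled every 5 h.
times = [float(t) for t in range(0, 55, 5)]
hi_history = [0.5 * math.exp(0.02 * t) for t in times]
rul = predict_rul(times, hi_history, threshold=4.0)
```

A learned model replaces the fixed log-linear form with a nonlinear mapping from past HI values to future ones, but the RUL is still read off as the time at which the extrapolated HI crosses the threshold.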
For further improvement in the accuracy of the neural network model, the ANN model has been combined with other artificial intelligence (AI)-based models. For example, Wang et al. 257 used a neuro-fuzzy (NF) network to estimate the RUL one step ahead. Significantly better prediction accuracy was observed with the NF network compared to the RNN model when trained with a large amount of training data. Deutsch and He 258 used a deep belief feed-forward NN algorithm for RUL prediction of a bevel gear subjected to pitting. Six HIs, namely, residual RMS, energy operator RMS, FM0, narrowband kurtosis, AM kurtosis, and FM RMS, were extracted from the raw vibration signal and used as inputs to the neural network for gear RUL prediction. The advantage of the DNN model over the conventional ANN is that it reduces the manual processing and analysis of the data by human experts. These methods are therefore suitable for automatic data processing and HI extraction for big data.
Particle filter
The PF is a Monte-Carlo approximation method that can monitor dynamic systems. The PF-based approach first creates the analytical/mathematical model that describes the component degradation process, and later, this degradation model is used to predict the future evolution of the component state and hence predict the RUL. Based on the past and present values of the available degradation data, the PF approach is used to calibrate the model parameters by assuming a probabilistic measurement model that links the measurement data with the component degradation. 272 He et al. 261 and Jia et al. 262 used the PF algorithm for predicting the RUL for a gear subjected to pitting and wear, respectively.
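The predict, update, and resample cycle described above can be sketched as a bootstrap particle filter. The exponential degradation model, the uniform prior on the growth rate, the noise levels, the jitter on the rate, and the failure threshold are all assumptions made for this illustration; none of them is taken from the cited studies.

```python
import math
import random


def particle_filter_rul(measurements, threshold, n_particles=1000,
                        meas_sigma=0.2, rate_jitter=0.002, seed=0):
    """Bootstrap PF for x_k = x_{k-1} * exp(b) with unknown growth rate b."""
    rng = random.Random(seed)
    # Each particle is (state x, growth rate b); b gets a broad uniform prior.
    particles = [
        (max(1e-6, rng.gauss(measurements[0], meas_sigma)), rng.uniform(0.0, 0.15))
        for _ in range(n_particles)
    ]
    for z in measurements[1:]:
        # Predict: jitter the rate slightly, then advance the state one step.
        predicted = []
        for x, b in particles:
            b_new = b + rng.gauss(0.0, rate_jitter)
            predicted.append((x * math.exp(b_new), b_new))
        # Update: weight each particle by a Gaussian measurement likelihood.
        weights = [math.exp(-0.5 * ((z - x) / meas_sigma) ** 2) for x, _ in predicted]
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # guard against total weight underflow
        # Resample in proportion to the weights.
        particles = rng.choices(predicted, weights=weights, k=n_particles)

    x_est = sum(x for x, _ in particles) / n_particles
    b_est = sum(b for _, b in particles) / n_particles

    def steps_to_threshold(x, b):
        if x >= threshold:
            return 0.0
        if b <= 0.0:
            return math.inf
        return math.log(threshold / x) / b

    # The median over particles is used as the point estimate of the RUL.
    ruls = sorted(steps_to_threshold(x, b) for x, b in particles)
    return x_est, b_est, ruls[n_particles // 2]


# Synthetic degradation data: true rate 0.05 per step, additive measurement noise.
data_rng = random.Random(1)
measurements = [math.exp(0.05 * k) + data_rng.gauss(0.0, 0.1) for k in range(41)]
x_est, b_est, rul_est = particle_filter_rul(measurements, threshold=20.0)
```

The spread of the per-particle times to threshold, and not only their median, is what gives the PF its probabilistic RUL estimate.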
The PF approach cannot be directly implemented if the current state of the gear is not available. 272 Hence, some researchers integrated the PF model with AI-based approaches; the latter can be used for estimating the current state of the gear. For example, Cheng et al. 264 combined the adaptive neuro-fuzzy inference system (ANFIS) and the PF algorithm for RUL prediction of a wind turbine gear subjected to the wear failure mode. Based on the state transition function learned from ANFIS, the RUL was predicted using the PF algorithm.
If multiple observation indicators/sensors such as vibration, noise, and acoustics are used for the RUL prediction, then the conventional PF cannot be used as it deals with a single observation indicator. Hence, Ni et al. 265 used an adaptive state-space model for the RUL estimation of a planetary gear. Using different types of sensor data such as wear, vibration, and backlash, three RUL prediction models were developed: (a) considering the wear and backlash data, (b) considering the wear and vibration data, and (c) considering wear, backlash, and vibration data. The highest accuracy was observed with the third model, as this model was developed using the data from three different kinds of sources.
Working on similar lines to the PF algorithm, other nonlinear state-space models have also been proposed for gear RUL prediction in the past.266–270,273
Miscellaneous
Most of the approaches for gear prognosis in the literature are either state space-based or ANN-based. A few other prognostic models have been tried for gear RUL prediction. Roulias 191 compared the performance of three RUL prediction models: (1) proportional hazards model, (2) ε-support vector regression (ε-SVR), and (3) exponential extrapolation based on bootstrap sampling. The TD-, FD-, and wavelet packet (WP) energy-based HIs extracted from AE and vibration data were fused using ICA. The ε-SVR was highlighted as a highly efficient method for gear fault prognosis if sufficient training data was available, whereas the exponential extrapolation method was observed to work best for gear RUL prediction in the absence of sufficient training data. Samanta and Nataraj 259 compared the performance of two AI-based techniques, that is, SVR and ANFIS, for extrapolating the trend of different HIs developed for different kinds of gear faults, that is, worn, chipped, cracked, and pitted teeth. The performance of the SVR model was found to be better than that of the ANFIS model. Hussain and Gabbar 260 compared the performance of the ANFIS model with the nonlinear autoregressive model with exogenous inputs (NARX) for predicting the trend of the HI developed for planetary gearbox health monitoring. Comparable prediction performance was observed from both algorithms, with the NARX model performing slightly better than the ANFIS model.
The data-driven prognostic models such as ANN and PF have been extensively studied, while the random forest regression (RFR), one of the better-known algorithms for performing regression tasks, is relatively unexplored in the area of gear fault prognosis. The RFR methodology builds multiple weak and diverse decision trees and combines them in an ensemble to get a more accurate and stable RUL prediction. 274 Kundu et al. 105 presented the RFR approach for RUL prediction of a spur gear subjected to natural pitting progression. HIs such as CCR, RMS, peak, CF, M6A, FM4, NA4, and ER were used as inputs to the prognostic model. As mentioned earlier, the RFR approach has an inbuilt best HI selection capability, and using this model, the CCR was found to be the best HI for estimating the RUL. Based on the best HI (CCR), the RUL of the gear was predicted. In addition, the fusion, using the MD approach, of the CCR indicators extracted from vibration sensors at different locations was shown to be more effective for RUL prediction than a single sensor-based CCR indicator. Elforjani 275 proposed a combined diagnostic and prognostic model for health monitoring of a wind turbine gearbox. The RUL was predicted using four different models, namely, ANN, decision tree, SVM, and Gaussian process. The highest prediction accuracy was obtained with the Gaussian process model and the lowest accuracy with the ANN model.
Hybrid approaches
It is well known that data-driven prognostic models give better accuracy provided a large amount of training data is available. Such extensive data may not be readily available in practice. In addition, this approach is time-consuming, as one has to wait for the failure to happen while the data from the sensors are collected over a long period of time. Moreover, if one wishes to develop a data-driven model based on lab-scale test data instead of field failure data (which may take a longer time to collect), the cost of performing the experiments is high. Hence, for prognostics, a physics-based degradation model that gives a close and accurate relationship between the extent of the damage on the gear tooth and time can be a preferred alternative. However, the accuracy of the prediction by a physics-based prognostic model depends on how accurately the model captures the physics of the system and the exact role of the multiple variables that influence the damage progression. Even if the model is built with reasonable accuracy, the prediction is influenced by the current stage fault diagnosis information (exact information on the type of failure mode and its severity). It is difficult to assess the current physical health condition of the component without interrupting the machine operation. Researchers have developed the dynamic models discussed in section “Dynamic models” to simulate the response due to such failure modes in gear. However, these simulation models are not validated with actual field data. In addition, the physics-based diagnostic models may also give a high error in prediction if the parameters used in the model do not represent the physical system accurately. The data-based models discussed in section “Gear fault diagnostics” are very good in diagnosis compared to the physics-based diagnostic models. Hence, the physics-based prognostic models assume that the current health condition can be estimated from the sensor data. 16
The hybrid prognostic model exploits the benefits of both physics-based and data-driven models. Such models are expected to give a better prediction by combining the diagnosis accuracy of the data-driven approaches with the prognosis accuracy of the physics-based approaches. Figure 6 shows a possible architecture for a hybrid framework for fault diagnostics and prognostics. Very few works dealing with hybrid prognostic models for gear are reported, and they are mainly restricted to the tooth crack failure mode.

Figure 6. A hybrid framework for fault diagnostics and prognostics.
Zhao et al. 276 proposed a hybrid model for estimating the life of a gear subjected to the tooth crack failure mode. The hybrid model in this work is divided into several parts: a gear FE model for gear stress analysis, a gear dynamic model for dynamic load calculation, and a Paris power law equation for RUL estimation, as proposed by Li and Choi 45 and Li and Lee 47 and discussed in section “Crack.” Using the HI extracted from the CM data, the current crack length on the gear tooth was predicted. Based on the predicted crack length, the fatigue model parameters were updated using Bayesian inference and the RUL was estimated. The uncertainty and error in prediction were found to decrease as more crack length data became available and the Paris power law model parameters were updated accordingly using Bayesian inference. In this study, simulated degradation data were used to show the effectiveness of the proposed methodology. Later, the same work was extended by Zhao et al., 277 in which a polynomial chaos expansion approach was used to evaluate the uncertainty in RUL prediction and the likelihood function of Bayesian inference. The work was further extended by Zhao et al., 278 in which a hybrid prognostics model was developed for gear under time-varying operating conditions. In practical cases, a sudden shock on the gear tooth surface may suddenly increase the damage size and create a jump in the degradation path that reduces the gear life. In a further extension of the work proposed in Zhao et al., 276 Zhao et al. 279 addressed this problem by proposing a hybrid prognostics model that considers the uncertainty of the crack initiation time and random shocks.
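The physics-based half of such a hybrid model can be illustrated by integrating the Paris power law, da/dN = C(ΔK)^m, from the currently estimated crack length to a critical length. The sketch below assumes ΔK = YΔσ√(πa) with a constant geometry factor Y, whereas the cited work obtains the stress intensity from FE and dynamic models; all numerical values are hypothetical.

```python
import math


def delta_k(a, stress_range, geometry_factor):
    """Stress intensity factor range, assuming a constant geometry factor."""
    return geometry_factor * stress_range * math.sqrt(math.pi * a)


def cycles_to_critical(a0, a_crit, c, m, stress_range, geometry_factor, steps=10000):
    """Midpoint-rule integration of dN = da / (C * dK^m) from a0 to a_crit."""
    da = (a_crit - a0) / steps
    cycles = 0.0
    for i in range(steps):
        a_mid = a0 + (i + 0.5) * da
        growth_per_cycle = c * delta_k(a_mid, stress_range, geometry_factor) ** m
        cycles += da / growth_per_cycle
    return cycles


def cycles_closed_form(a0, a_crit, c, m, stress_range, geometry_factor):
    """Analytical result for constant Y, valid for m != 2."""
    k = c * (geometry_factor * stress_range * math.sqrt(math.pi)) ** m
    p = 1.0 - m / 2.0
    return (a_crit ** p - a0 ** p) / (k * p)


# Hypothetical values: crack lengths in m, stress range in MPa, C for dK in MPa*sqrt(m).
params = dict(c=1.0e-11, m=3.0, stress_range=100.0, geometry_factor=1.12)
rul_cycles = cycles_to_critical(a0=0.001, a_crit=0.005, **params)
rul_check = cycles_closed_form(a0=0.001, a_crit=0.005, **params)
```

In a hybrid scheme, the starting crack length comes from the data-driven diagnosis, and C and m are treated as uncertain parameters that are updated as new crack length estimates arrive.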
Similar to the above-mentioned studies, Kacprzynski et al. 280 proposed a hybrid prognostics model for a spiral bevel pinion gear of a helicopter gearbox under the gear tooth crack failure mode. Based on the fusion of some of the vibration-based indicators, the gear crack severity level was predicted, and then, using a physics-based model, the propagation of the crack size up to a threshold crack length was estimated. Choi and Li 281 also proposed a hybrid prognostics model for a spur gear subjected to the tooth crack failure mode. A feed-forward neural network model was used for estimating the crack size from HIs extracted from the raw vibration data. Based on the estimated crack length, the future evolution of the crack length was predicted for RUL prediction using the Paris power law equation. Zhao et al. 79 proposed a hybrid prognostic model for a spur gear subjected to the surface wear failure mode. Based on the metal particle data, the mass loss from the gear tooth surface was estimated. Based on this estimated mass loss in the current state of the gear, Archard’s wear model was used for predicting the future evolution of the wear on the gear tooth surface. The Bayesian inference approach was used to update the wear coefficient parameter in Archard’s wear model and to reduce the uncertainty and error in RUL prediction. The posterior distribution of the wear coefficient parameter was found to become narrower, and the uncertainty and error in the RUL prediction were found to reduce, as more mass loss data became available.
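The narrowing of the posterior of the wear coefficient with additional data can be reproduced with a small grid-based Bayesian update. The sketch assumes Archard's law in the form V = kFs/H, a uniform prior on k, Gaussian measurement noise on the wear volume (mass loss divided by density), and hypothetical values for the load, hardness, and sliding distances; it is not the formulation of the cited work.

```python
import math
import random


def archard_volume(k, load, sliding_distance, hardness):
    """Archard's wear law: worn volume V = k * F * s / H."""
    return k * load * sliding_distance / hardness


def posterior_on_grid(k_grid, distances, volumes, load, hardness, noise_sigma):
    """Posterior of k on a grid, with a uniform prior and Gaussian likelihood."""
    log_post = []
    for k in k_grid:
        log_like = 0.0
        for s, v in zip(distances, volumes):
            residual = v - archard_volume(k, load, s, hardness)
            log_like -= 0.5 * (residual / noise_sigma) ** 2
        log_post.append(log_like)
    peak = max(log_post)  # subtract the peak to avoid numerical underflow
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_std(k_grid, probs):
    mean = sum(k * p for k, p in zip(k_grid, probs))
    var = sum((k - mean) ** 2 * p for k, p in zip(k_grid, probs))
    return mean, math.sqrt(var)


# Hypothetical setup: load in N, hardness in N/mm^2, distances in mm, volumes in mm^3.
LOAD, HARDNESS, SIGMA, K_TRUE = 500.0, 2000.0, 0.05, 1.0e-6
rng = random.Random(42)
distances = [i * 1.0e6 for i in range(1, 11)]
volumes = [archard_volume(K_TRUE, LOAD, s, HARDNESS) + rng.gauss(0.0, SIGMA)
           for s in distances]
k_grid = [i * 1.0e-9 for i in range(3001)]

post_2 = posterior_on_grid(k_grid, distances[:2], volumes[:2], LOAD, HARDNESS, SIGMA)
post_10 = posterior_on_grid(k_grid, distances, volumes, LOAD, HARDNESS, SIGMA)
mean_2, std_2 = mean_and_std(k_grid, post_2)
mean_10, std_10 = mean_and_std(k_grid, post_10)
```

Propagating the posterior of k through the wear law up to an allowable wear volume turns this narrowing directly into a narrower RUL distribution.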
In essence, all the hybrid models for gears integrate material-level fatigue models, system-level data fusion algorithms, or data-based diagnostic and model parameter tuning techniques for RUL prediction.
In general, due to a lack of insight into complex real-life conditions, it is often easy and convenient to overlook certain practical considerations while developing models in a laboratory environment. 16 For the implementation of the PHM framework in an industrial environment, the PHM standards given by various organizations and researchers282–286 can be referred to. In addition, the 5S (streamline, smart processing, synchronize and see, standardize, and sustain) approach given by Lee et al. 9 for systematic PHM design and implementation may also be referred to.
Uncertainty quantification of prognostic approaches
Prognostic approaches should not only predict the RUL but also report the associated uncertainty or probability of the prediction. 287 A single-point estimate of the RUL is usually considered meaningless in industry. Decision making is therefore more convenient when the predicted RUL is reported together with its uncertainty. The importance of uncertainty quantification in prognostics modeling is highlighted in Engel et al. 288 and Saha and Goebel. 289 Zhao et al. 276 summarized the different sources of uncertainty and various uncertainty quantification activities. Shrestha and Solomatine 290 give an overview of uncertainty quantification for different approaches such as linear regression, nonlinear regression, decision tree, locally weighted regression, and fuzzy c-means clustering. In general, the uncertainty is expressed in the form of upper and lower quantiles of the underlying distribution of prediction errors. 290 Figure 7 shows an example of the representation of uncertainty in RUL prediction. The upper and lower bounds of the RUL prediction interval quantify the model uncertainty, and a narrower prediction interval implies a lower model uncertainty. In the initial stage of gear life, little information on fault propagation is available and the loading conditions need to be assumed for a longer period of time. Hence, in the initial stage of gear life, the RUL is predicted with a large amount of uncertainty. Near the end of the gear life, the loading conditions need to be assumed for a shorter period of time, and the uncertainty in prediction is therefore smaller. In general, the uncertainty in prediction decreases with time. 276

Figure 7. An example of uncertainty in RUL prediction.
The prognostic approaches estimate the RUL in the form of a PDF based on which model uncertainty is estimated. Using the central limit theorem, based on RUL prediction distribution mean and standard deviation parameters, the uncertainty in RUL prediction can be obtained 105
where
Different methods have been used for quantifying the uncertainty in RUL prediction. According to Shrestha and Solomatine, 290 the uncertainty of the prognostics model can be estimated in four ways:
Probabilistic-based methods: In the probabilistic-based methods, the uncertainty in the model parameters is described by probability distributions. Random samples of the parameters are drawn from these distributions, and from them the predictive distribution of the model output, that is, the RUL, is obtained.91,227,291 Among the probabilistic-based methods, the Bayesian approach is the most widely used for interpreting the uncertainty in RUL prediction, especially in physics-based prognostic approaches.
Model error-based methods: In these methods, the model uncertainty is estimated through the statistical properties of the model error in reproducing the observed data. 290 Many machine learning and statistical models use this method to estimate the prediction uncertainty. Previous studies292–294 can be referred to for a detailed treatment of these methods.
Resampling methods: The ensemble learning-based methods belong to this category. RFR is the most widely used method in this category for estimating the prediction uncertainty. In the resampling methods, “b” random samples are bootstrapped/resampled from the total available data set each time to train a model. This process is repeated “B” times, and hence “B” predictions are available for uncertainty quantification. The studies reported in Kundu et al. 105 and Kasiviswanathan et al. 227 can be referred to for resampling-based methods for uncertainty quantification.
Fuzzy-based methods: In the fuzzy-based methods, the model inputs and parameters are represented as fuzzy numbers to define their variability and to quantify the prediction uncertainty. 227 The studies reported in Shrestha and Solomatine 290 and Maskey et al. 295 can be referred to for fuzzy-based methods for uncertainty quantification.
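The resampling idea in the list above can be sketched in a few lines. The example below is a simplified illustration and not the RFR method of the cited studies: an ordinary least-squares line is swapped in for the regression trees, the HI is assumed to grow roughly linearly toward a hypothetical failure threshold, and the data are synthetic. Each of the B bootstrapped models yields one RUL prediction, and the spread of these predictions is summarized both by empirical quantiles and by a normal-approximation interval built from their mean and standard deviation.

```python
import random
import statistics


def fit_line(ts, ys):
    """Ordinary least-squares fit y = intercept + slope * t."""
    t_bar = statistics.fmean(ts)
    y_bar = statistics.fmean(ys)
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


def bootstrap_rul(ts, ys, threshold, n_models=500, seed=0):
    """One RUL prediction per bootstrapped model (B = n_models)."""
    rng = random.Random(seed)
    n = len(ts)
    t_now = max(ts)
    ruls = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        sample_t = [ts[i] for i in idx]
        if len(set(sample_t)) < 2:
            continue  # degenerate resample, no line can be fitted
        intercept, slope = fit_line(sample_t, [ys[i] for i in idx])
        if slope <= 0:
            continue  # no upward trend, threshold is never reached
        ruls.append((threshold - intercept) / slope - t_now)
    return ruls


# Synthetic HI history (hypothetical values): linear trend plus noise.
data_rng = random.Random(1)
times = list(range(40))
health = [0.1 + 0.02 * t + data_rng.gauss(0.0, 0.03) for t in times]

ruls = bootstrap_rul(times, health, threshold=1.5)
mean_rul = statistics.fmean(ruls)
std_rul = statistics.stdev(ruls)
cuts = statistics.quantiles(ruls, n=20)  # 19 cut points: 5%, 10%, ..., 95%
lower_q, upper_q = cuts[0], cuts[-1]
lower_n, upper_n = mean_rul - 1.96 * std_rul, mean_rul + 1.96 * std_rul

print(f"mean RUL: {mean_rul:.1f}")
print(f"5%-95% quantile interval: [{lower_q:.1f}, {upper_q:.1f}]")
print(f"normal-approximation 95% interval: [{lower_n:.1f}, {upper_n:.1f}]")
```

The quantile interval makes no distributional assumption, whereas the normal-approximation interval is only reasonable when the ensemble predictions are roughly symmetric.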
Some prognostic approaches have inbuilt uncertainty quantification capability, whereas some approaches are integrated with uncertainty methods discussed above for quantification of uncertainty in predicted RUL. For example, Kundu et al. 105 used an ensemble decision tree-based RFR methodology for RUL prediction with associated uncertainty. The RFR methodology has an inbuilt uncertainty quantification capability. As discussed earlier, the RFR approach constructs
Performance evaluation metrics
Performance metrics are used to quantify the uncertainty and error of the prognostic/classification algorithms. They allow the user to compare different models, understand the validity of a prognostic/classification estimate, and characterize model performance over different operating regimes.297,298 The metrics for fault classification and fault prognosis are completely different because the outputs of the two processes differ.
Metrics for evaluating the performance of the fault classification approaches
The confusion matrix is usually created for checking the correctness and accuracy of a classifier model. It is a square table whose size in both dimensions equals the number of classes. Table 8 shows what a confusion matrix looks like for a two-class problem. In Table 8, the true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN) are the numbers of instances in which the model predicts a healthy class as healthy, a healthy class as faulty, a faulty class as healthy, and a faulty class as faulty, respectively. FP are also termed Type-I errors and FN are termed Type-II errors. The confusion matrix is not itself a performance measure, but most of the performance metrics for a classifier model are computed from its values.
Table 8. A confusion matrix.
Some of the performance metrics commonly used are as follows:
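As an illustration, several metrics that are commonly derived from the four confusion matrix counts (accuracy, precision, recall, specificity, and F1 score) can be computed as in the sketch below. The selection of metrics, the function name, and the counts are assumptions made for this example. Following the convention of the text, the healthy class is treated as the positive class.

```python
def _ratio(numerator, denominator):
    """Safe division: returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, fn, fp, tn):
    """Standard metrics computed from two-class confusion matrix counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + fn + fp + tn),
        "precision": precision,             # predicted positives that are correct
        "recall": recall,                   # actual positives that are found
        "specificity": _ratio(tn, tn + fp), # actual negatives that are found
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


# Hypothetical counts for 100 test instances.
metrics = classification_metrics(tp=45, fn=5, fp=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy alone can be misleading when healthy samples greatly outnumber faulty ones, which is why it is usually reported together with the class-specific metrics.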
Metrics for evaluating the performance of the fault prognosis approaches
Performance metrics for prognostics should quantify factors like sensitivity to small changes, resistance to outliers, and usefulness for decision making. There is no single metric that can capture all these factors. Four major parameters, namely uncertainty with some confidence bound, accuracy, precision, and timeliness, are used for checking the performance of an RUL prediction model. The prognosis output should have two components: the RUL or an estimated time to failure (ETTF), and the associated uncertainty. As discussed earlier in section “Uncertainty quantification of prognostic approaches,” the decision-making activity is more adaptive and effective if the RUL value is provided with the associated uncertainty rather than as a single predicted value. 11 Without uncertainty estimation, the prediction loses its significance. 105 Accuracy measures the closeness of the predicted value to the actual value. Precision measures the narrowness of the interval in which the RUL falls, that is, how closely the predictions are clustered together. High accuracy and precision are required of every model, but the same level of accuracy and precision in RUL is difficult to obtain with all models. Hence, model selection is a key factor for accurate RUL prediction. Timeliness measures the relative position of the PDF of the prediction along the time axis.
The most widely used prognosis performance metrics in the literature are mean square error (MSE)180,228,299,300 and mean absolute percentage error (MAPE). 190 However, a metric such as MSE is scale-dependent. Other performance metrics that have been proposed include prognostics horizon, relative accuracy, convergence, α-λ performance, and so on. The development of performance metrics for a prognostic model is discussed in Saxena and colleagues.297,301
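A minimal sketch of how some of these metrics can be computed from a sequence of true and predicted RUL values is given below. The data are hypothetical, and the α-bound check is a simplified point-estimate version that ignores the prediction PDF used in the full α-λ metric. Relative accuracy is taken here as one minus the absolute error normalized by the true RUL at a given prediction instant.

```python
def mse(true_rul, pred_rul):
    """Mean square error over all prediction instants."""
    return sum((t - p) ** 2 for t, p in zip(true_rul, pred_rul)) / len(true_rul)


def mape(true_rul, pred_rul):
    """Mean absolute percentage error; requires nonzero true values."""
    return 100.0 * sum(abs((t - p) / t)
                       for t, p in zip(true_rul, pred_rul)) / len(true_rul)


def relative_accuracy(true_value, pred_value):
    """1 - |error| / true RUL at a single prediction instant."""
    return 1.0 - abs(true_value - pred_value) / true_value


def within_alpha_bounds(true_value, pred_value, alpha=0.2):
    """Point-estimate check: is the prediction within +/- alpha * true RUL?"""
    return abs(true_value - pred_value) <= alpha * true_value


# Hypothetical true and predicted RUL values at five prediction instants.
true_rul = [100.0, 80.0, 60.0, 40.0, 20.0]
pred_rul = [110.0, 84.0, 57.0, 42.0, 20.0]

mse_value = mse(true_rul, pred_rul)
mape_value = mape(true_rul, pred_rul)
ra_second = relative_accuracy(true_rul[1], pred_rul[1])
in_bounds = [within_alpha_bounds(t, p) for t, p in zip(true_rul, pred_rul)]

print(f"MSE: {mse_value:.2f}")
print(f"MAPE: {mape_value:.2f} %")
print(f"relative accuracy at second instant: {ra_second:.3f}")
print(f"within 20% bounds: {in_bounds}")
```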
Post prognostics decision making/health management process
Most gear failures take time to grow and reach a failure threshold. In that scenario, a decision-making process based on the RUL prediction results provides economic benefits by exploiting the gear service life and adjusting the gear maintenance cycles, improves safety by adjusting the operating conditions, reduces life cycle costs, and reduces catastrophic failures.6,302 In addition, the decision-making step may also involve the optimum allocation of resources/inventory, management of the logistic chain, extension of production, and so on. 303 The utilization of prognostic results in the decision-making process has not yet matured. Decision-making strategies specific to gear health management are not available in the literature. However, a few general strategies have been presented in the literature to optimize maintenance activities based on the predicted RUL. For example, based on the predicted RUL, Lei and Sandborn 304 proposed a real options analysis approach to determine the optimum predictive maintenance opportunity for a subsystem in a wind turbine. Skima et al. 302 addressed the post prognostics decision-making process for a micro-electro-mechanical system (MEMS). In that work, the decision-making problem is solved for distributed MEMS in which a set of autonomous equipment communicate and interact with each other to achieve a common mission. Camci et al. 305 integrated the PHM information into the maintenance and logistic infrastructure in order to reduce development, production, and ownership costs for the next-generation fighter aircraft. In the absence of precise RUL information, industries usually suffer from production loss and material wastage due to unplanned failure of a machine in the middle of operation. To overcome this, Asmai et al. 306 designed a framework for production scheduling based on the predicted RUL.
Based on multiple case studies conducted in different industries, Tiddens et al. 307 presented a generalized framework to implement prognostics technologies for maintenance decision making. Chebel-Morello et al. 303 presented many concepts of decision making based on expert knowledge of the system and the predicted RUL. That work also discusses an example of reducing predictive maintenance-related costs through appropriate planning based on the RUL prediction results, and gives a bibliographic overview of different decision-making approaches. Hence, it may be referred to for a detailed study of the post prognostics decision-making process. Based on the above discussion, for mechanical components such as gears, the decision-making process can be broadly divided into two categories: maintenance optimization and mission reconfiguration. For a detailed study, previous works308–311 may be referred to for the former category, whereas the work by Asmai et al. 306 may be referred to for the latter category.
Concluding remarks
The present work reviews different diagnostic and prognostic approaches, which encompass the major activities of PHM related to gears. A wide variety of sensors and their associated data processing approaches have been explored in developing diagnostic and prognostic approaches for geared systems. However, this review has focused in particular on vibration sensor-based approaches.
Substantial investigations have been undertaken into the major aspects of fault identification, fault modeling, and fault propagation models for different fault categories in gears. Based on the literature reviewed, challenges and future directions in different aspects of gear diagnostics and prognostics are listed as follows:
1. Fault diagnostics
• Hybrid faults
Researchers have investigated the dynamic response and developed physics-based and data-driven diagnostic and prognostic models for individual failure modes such as pitting, crack, and wear separately. These studies can be further extended to investigate hybrid faults on the gear tooth surface. Since a combination of failure modes may sometimes appear in a practical situation, the complexity of the model can be very challenging in such scenarios.
• HI development
Very few HIs are sensitive over the entire gear life cycle. Some indicators are sensitive in the early stage of the gear fault, while others are sensitive in the advanced stage. Moreover, since most of the studies are based on discrete sizes of seeded gear faults, it is still not clear how many of the proposed indicators would behave during gradual changes across the different health stages of the gear or over a complete life cycle for a particular failure mode. Some of the recent signal processing approaches obviate the need for a reference signal and the corresponding sensor (as required in many of the TSA-based techniques) during the development of these HIs.
• Definition of health stage division
Most of the gear failure modes are progressive in nature. As a particular fault grows, the trend of the HI value varies. The degradation process of the gear can be divided into various stages based on the change in the value of the physical damage or the HI. Different quantification criteria have been considered by various researchers to define the health stage division. However, no standard method is available in the literature to establish the division of these health stages for gears.
2. Fault prognostics
Mainly due to the non-availability of enough run-to-failure test data for gears, fewer studies exist on data-driven gear fault prognostics than on those for bearings, batteries, and so on. Although many fault diagnostic approaches have been developed for gears, extensive studies on prognostic approaches, particularly with different operating parameters, may be pursued to increase confidence in the proposed prognostic approaches. Future prognostic approaches may involve further work on the stochastic nature of the lubrication, surface quality, hardening/heat treatment, and so on.
• Change in operating condition
The data-driven prognostics and fault severity classification models are designed for particular operating conditions. Hence, if the operating conditions of a system change, historical data for the new operating conditions are required. This leads to developing new prognostic and classifier models, which is costly in terms of both time and experimentation. Recently, however, deep transfer learning-based classifier models have been developed. Such models use data obtained under one operating condition as the historical dataset for model training and predict the fault severity level/type for datasets from other operating conditions. Similarly, the transfer learning-based models developed for gear fault severity/type classification can be used as a basis for developing a transfer learning-based gear RUL prediction model.
• Prediction of the RUL in the field based on data obtained from the laboratory environment
Most of the data-driven prognostic models for gears have been developed based on data obtained from specific test rigs at research laboratories. In most field applications, run-to-failure data may not be available to develop a prognostics model for a field gearbox whose size/capacity differs from that of the laboratory test setup. In such scenarios, approaches to transfer the knowledge embodied in prognostics models from research laboratories to the field application should be developed and examined for their accuracy, repeatability, and practicality. In the literature, deep transfer learning models have been demonstrated to classify the fault severity level in the field based on a model trained using laboratory test data. The idea can be further explored to predict the RUL in the field based on the lab test data.
3. Definition of failure threshold
The failure threshold is typically defined based on the limiting value of the damage or HI. When based on the size of the physical damage, the failure threshold for most of the gear failure modes is defined subjectively, not quantitatively. Some ISO standards are available to determine the failure threshold based on the value of HIs such as RMS and peak. However, many of the studies reported in the literature found that these HIs may not exhibit good sensitivity in monitoring the damage progression in gears. Suitable guidelines for defining the failure threshold for other HIs, as well as on the basis of the damage area, are desirable.
4. Integration of different approaches for a composite prognostics framework
Future research may need to focus more on the integrated development of a prognostics framework that combines different types of approaches and harnesses the strengths of each. For example, the data-driven fault prognostic and fault severity classification models rely heavily on historical failure data. For a newly installed gearbox, the historical failure data may not be available. In such scenarios, integrating the dynamic models used in a physics-based framework with a data-driven fault prognostics/classification model can improve the overall performance of the PHM activities. The fault prognostics/classifier models trained on simulated vibration data (obtained from the physics-based framework) can be used to predict the RUL/fault severity level for newly installed gears. For further improvement in the prediction accuracy, the trained classifier and prognostic models can be updated with data available from the actual field.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
