Leveraging generative artificial intelligence to bridge domain gaps in wind turbine research

Abstract

A central challenge in wind turbine health monitoring is the scarcity of real-world data due to limited instrumentation, leading researchers to rely on simulation models that often suffer from reduced fidelity. However, even within simulation environments, discrepancies arise because of modeling assumptions and configuration fidelities, creating domain gaps that limit the transferability of learned representations. To investigate domain translation under controlled conditions, this project explores the use of generative artificial intelligence, specifically cycle-consistent generative adversarial networks (CGANs), to bridge the gap between OpenFAST simulation models representing 1.5 MW and 5 MW wind turbines. A physics-informed CGAN architecture is introduced, in which a simplified turbine tower dynamics model is incorporated into the training loss to ensure physically consistent outputs. Quantitative results showed moderate to high agreement in frequency-domain features. Incorporating the physics-informed loss function improved the R² values by 30%, reduced the RMSE from 1.39 to 1.1 m/s², and reduced training time by 82%. Furthermore, under increased turbulence intensity (IEC Category A), the RMSE remained stable at approximately 1.1 m/s². While the present study is entirely simulation-based, it establishes a pipeline for evaluating physics-informed generative domain translation, which may serve as a foundation for future simulation-to-reality validation studies.

Keywords

GAN, virtual sensing, wind turbines

1. Introduction

Wind turbine health monitoring and reliability analysis are often hindered by the scarcity of high-quality, real-world operational data. Limited instrumentation, high installation and maintenance costs, and restricted access to field measurements frequently force researchers and practitioners to rely on simulation models or subscale experiments (Wan et al., 2025; Yang et al., 2023). While these approaches are valuable, they often suffer from reduced fidelity and limited transferability, particularly when extrapolating across turbine ratings or from simulated to real systems. This data gap becomes even more pronounced when studying next-generation turbines, for which only reference models are available. Addressing this challenge requires data-driven approaches capable of bridging domain gaps while preserving the underlying physical behavior of wind turbine systems.

Commercial wind turbines are not always equipped with all instruments required for comprehensive condition monitoring. As a result, simulation data and benchtop subscale models are commonly extrapolated to full-scale applications. However, simulation models typically rely on simplifying assumptions and reduced degrees of freedom to enable computational efficiency, which can lead to a simulation–reality mismatch. Even within purely simulated environments, discrepancies arise across structural configurations and modeling fidelities, creating domain gaps that limit the direct transferability of learned representations between simulated platforms. A similar gap exists when transferring knowledge from benchtop subscale prototypes to full-scale systems. For example, Keller et al. (2017) demonstrated that cumulative frictional energy observed in a benchtop gearbox test is highly dependent on specific test configurations, materials, and lubrication conditions, limiting its direct applicability to full-scale turbines. Numerous model-driven techniques have been proposed to address domain shifts, including response surface methods (Ren and Chen, 2010), sensitivity-based approaches (Ferrari et al., 2019), and Bayesian updating techniques (Mao et al., 2020). Despite their effectiveness in certain contexts, these methods often require extensive computational resources and exhibit limited generalization to new operating conditions (Ge and Sadhu, 2024).

From a data-driven perspective, transfer learning has emerged as a promising alternative due to its flexibility and scalability. In structural mechanics, researchers have explored learning representations from simulated environments and transferring them to physical systems. For instance, Teng et al. (2023) trained a convolutional neural network using digital twin responses for damage classification and subsequently applied the model to experimental bridge data. Similarly, Bao et al. (2023) employed finite element simulations combined with transfer learning to enable deployment on real structures. While effective, these approaches often depend on extensive simulation campaigns spanning a wide range of operating scenarios, which can be computationally prohibitive for complex systems (Ge and Sadhu, 2024).

Generative Adversarial Networks (GANs) have emerged as effective tools for domain adaptation in scenarios where paired correspondence between source and target domains is unavailable (Goodfellow, 2014). In this work, we apply cycle-consistent GANs (CGANs) to translate wind turbine tower acceleration responses between 1.5 MW and 5 MW turbines using unpaired OpenFAST simulation data. The key contribution is integrating a simplified SDOF cantilever beam model directly into the generator loss function, constraining generated signals to satisfy the equation of motion for a damped oscillator under aerodynamic forcing, with normalized mass, damping, and stiffness treated as trainable variables. Unlike traditional physics-informed neural networks, which solve differential equations at discrete instances, or conditional GANs, which require paired data, our approach enforces physical constraints as a weighted penalty within the adversarial loss during unpaired distribution matching. Results demonstrate that incorporating the physics-informed constraint improves cross-domain translation fidelity by reducing time-domain error and accelerates convergence compared with standard CGAN training without physics regularization.

2. CGANs working principle combined with physics-informed loss function

2.1. Wasserstein GAN

GANs, originally proposed by Goodfellow (2014), are a class of neural networks that are composed of two sub-networks: a generator G and a discriminator D. Real data and latent noise are fed into D and G networks. The G network tries to generate synthetic data that look real in order to fool the D, which tries to discriminate between real and synthetic data. By doing so, the G learns the target domain data distribution over the source domain and polishes the target domain data such that it carries discriminant features from the source domain.

GAN training is usually challenging and suffers from mode collapse and convergence failure (Saad et al., 2024). In mode collapse, the generator fails to learn a rich feature representation and generates many duplicates, while in convergence failure, D and G do not reach a balance and one of them dominates the other (MathWorks, 2024). The literature indicates that the Wasserstein loss function with gradient penalty offers more stable training than the traditional log-likelihood objective (Arjovsky et al., 2017; Gulrajani et al., 2017). With the log-likelihood loss function (shown in equation (1)), the discriminator acts like a classifier that distinguishes between real and fake images. This typically leads to minimizing discontinuous functions with respect to the generator's parameters

\begin{align} L_{disc} (P_{x}, P_{y}) & = E_{x \sim P_{x}} [\log D (x)] \\ + E_{y \sim P_{y}} [\log (1 - D (F (y)))] \end{align}

(1)

where E is the expected value, and P_x and P_y are the probabilities that an instance belongs to domain X or Y. Optimizing the G parameters over discontinuous functions is argued to cause mode collapse and divergence issues in traditional GAN training. For this reason, the Wasserstein-1 loss or Earth-Mover distance (shown in equation (2)) was used, which captures the distance between the source and target domain probability distributions (P_x, P_y)

\begin{aligned} L_{critic} (P_{x}, P_{y}, D) = & - ‖ E_{x \sim P (x)} [D (G (x))] \\ - E_{y \sim P (y)} [D (y)] ‖_{1} \end{aligned}

(2)

By doing so, the discriminator is transformed into a critic that scores the realness or fakeness of an image, which leads to meaningful loss curves that monitor convergence while training. The new critic loss is given by the negative of the Wasserstein-1 distance between the real and fake scores while the generator loss is now just the score the critic gave it. A stopping criterion can be established by monitoring the loss curves as a loss of zero means that G has done a great job of creating fake images that look real. The G is responsible for matching the generated image distribution with the data distribution in the target domain, and its loss is defined based on the critic average score on fake images as shown in equation (3)

\begin{aligned} L_{gen} (P_{x}, D) = E_{x \sim P (x)} [D (G (x))] \end{aligned}

(3)

Transforming the discriminator into a critic in a Wasserstein GAN is accomplished by replacing the last sigmoid layer with a linear activation. A key challenge in training traditional GANs is the issue of vanishing or exploding gradients. To address this, Arjovsky et al. (2017) proposed weight clipping as a way to enforce the 1-Lipschitz constraint, which ensures that the gradient between any two points on the function does not exceed a value of 1. However, weight clipping often led to poor performance and failed to prevent gradient issues effectively. As an alternative, Gulrajani et al. (2017) introduced the gradient penalty method, which penalizes the norm of the critic’s gradient with respect to its input, rather than constraining weights directly. Extensive experiments demonstrated that combining the Wasserstein loss with gradient penalty results in significantly more stable training.
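The critic loss, generator loss, and gradient penalty described above can be sketched in a few lines. This is a minimal NumPy illustration, not the training code used in this study: the critic is a hypothetical linear function D(x) = x·w + b so that its input gradient is known analytically, the penalty weight `lambda_gp` is a placeholder value, and the sign conventions follow equations (2) and (3) as printed.

```python
import numpy as np

def critic_loss(scores_fake, scores_real):
    """Equation (2) as written: negative absolute gap between mean critic scores."""
    return -abs(np.mean(scores_fake) - np.mean(scores_real))

def generator_loss(scores_fake):
    """Equation (3) as written: mean critic score on generated samples."""
    return float(np.mean(scores_fake))

def gradient_penalty_linear(w, real, fake, rng):
    """Gradient penalty for a toy linear critic D(x) = x @ w + b.

    The penalty is evaluated at random interpolates between real and fake
    samples. For a linear critic the input gradient equals w everywhere, so
    the interpolates only show where a nonlinear critic would be probed.
    """
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake      # interpolated samples
    grads = np.broadcast_to(w, x_hat.shape)      # dD/dx at each interpolate
    norms = np.linalg.norm(grads, axis=1)
    return float(np.mean((norms - 1.0) ** 2))    # penalize deviation from norm 1

rng = np.random.default_rng(0)
w = np.array([0.6, 0.8])                         # unit-norm critic weights (toy)
real = rng.normal(loc=1.0, size=(32, 2))
fake = rng.normal(loc=-1.0, size=(32, 2))
lambda_gp = 10.0                                 # hypothetical penalty weight
penalized_critic_loss = (critic_loss(fake @ w, real @ w)
                         + lambda_gp * gradient_penalty_linear(w, real, fake, rng))
```

Because the toy weights have unit norm, the penalty term is essentially zero here; scaling `w` up or down makes it grow, which is the mechanism that keeps the critic close to 1-Lipschitz.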

2.2. Cycle consistency loss

A major shortcoming of traditional generative models is that they do not account for temporal dynamics. Wind turbine data are usually composed of time series representing temperature, vibration, etc. The sequential setting of regular GANs does not adequately consider temporal correlations embedded within the time series. In this paper, we use the CGAN architecture, shown in Figure 1, to handle temporal correlations in the time series. CGANs were developed to map between sets of unlabeled and unpaired images from two domains: source x_i ∈ X and target y_i ∈ Y (Zhu et al., 2017).

Figure 1.

CGAN architecture.

CGANs consist of two GANs: one is designated for forward mapping from X to Y, and the other is designated for inverse mapping from Y to X. With two critics and a closed-loop architecture, CGANs perform better in translating between the two spaces. The two mappings are accomplished by generators G: X → Y and F: Y → X. The two critics are responsible for distinguishing between real and generated time series: D_x distinguishes between time series x and F(y), and D_y distinguishes between time series y and G(x). To prevent the learned mappings from contradicting each other, a cycle consistency loss is added to each generator loss function. G and F should be able to bring each time series back to its original representation, that is, forward cycle consistency x → G(x) → F(G(x)) ≈ x and backward cycle consistency y → F(y) → G(F(y)) ≈ y. As advocated in previous literature (Teng et al., 2023), the L₁ norm was used to define the cyclic loss as it captures the reconstruction error

\begin{align} L_{cyc} (P_{x}, P_{y}) & = E_{x \sim P_{x}} [‖ F (G (x)) - x ‖_{1}] \\ + E_{y \sim P_{y}} [‖ G (F (y)) - y ‖_{1}] \end{align}

(4)
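Equation (4) can be illustrated with a short NumPy sketch. The generators below are hypothetical stand-ins (simple scalings), and the L₁ norm is taken as the sum of absolute errors over each time series before averaging over the batch, which is one reasonable reading of the equation.

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """Equation (4): batch mean of the L1 reconstruction errors.

    x, y: arrays of shape (batch, time). G maps X -> Y, F maps Y -> X.
    """
    forward = np.sum(np.abs(F(G(x)) - x), axis=1)    # ||F(G(x)) - x||_1
    backward = np.sum(np.abs(G(F(y)) - y), axis=1)   # ||G(F(y)) - y||_1
    return float(np.mean(forward) + np.mean(backward))

# Toy, hypothetical generators that are exact inverses of each other
G = lambda x: 2.0 * x
F = lambda y: 0.5 * y
x = np.linspace(-1.0, 1.0, 8).reshape(2, 4)
y = np.linspace(0.0, 3.0, 8).reshape(2, 4)
perfect = cycle_consistency_loss(x, y, G, F)         # mappings invert each other

# A biased inverse mapping no longer reconstructs the inputs
F_biased = lambda y: 0.5 * y + 0.1
biased = cycle_consistency_loss(x, y, G, F_biased)
```

The loss is zero only when the two mappings undo each other on both domains, so any offset introduced by one generator is penalized in both cycle directions.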

2.3. Physics-informed loss function

The integration of physics principles not only enhances training performance but also ensures that the generated data adheres to the relevant physics laws, thereby preventing divergence and reducing training time. A wind turbine can be modeled as a cantilever beam with a lumped mass representing the rotor and nacelle at the top. Under this approximation, the system’s structural dynamics are governed by the following equation of motion (Hernandez-Estrada et al., 2021)

M \ddot{r} + C \dot{r} + K r = \frac{1}{2} ρ A V^{2} C_{d}

(5)

Equation (5) describes the tower-top acceleration in response to the wind loading as a second-order non-homogeneous differential equation, where M represents the lumped mass of the rotor and nacelle, C is the damping coefficient, K is the structural stiffness, and $\ddot{r}, \dot{r}, r$ represent the tower-top acceleration, velocity, and displacement, respectively. The right-hand side describes the wind-induced force, where ρ is the air density, A is the effective area of the tower subject to the airflow, V is the wind velocity, and C_d is the drag coefficient of the tower. Since onboard sensors are typically limited to measuring only the tower-top acceleration $\ddot{r}$, numerical integration is employed to estimate the corresponding velocity $\dot{r}$ and displacement r. High-pass filtering is applied to remove the drift due to numerical integration. The values of M, C, K, ρ, A, and C_d are not required to be known a priori. Instead, the training process adopts a data-driven optimization approach, treating these physical coefficients as trainable parameters. Gradients are computed with respect to each parameter, and the model iteratively updates them to converge toward their optimal estimates. Equations (6) and (7) below define the physics loss

\begin{align} \begin{aligned} L_{phys_{x}} (P_{{\ddot{r}}_{x}}) = E_{{\ddot{r}}_{x} \sim P_{{\ddot{r}}_{x}}} [‖ M_{x} {\ddot{r}}_{x}^{g} + C_{x} {\dot{r}}_{x}^{g} + K_{x} r_{x}^{g} \\ - \frac{1}{2} ρ A_{x} V^{2} C_{d x} ‖_{2}] \end{aligned} \end{align}

(6)

\begin{align} \begin{aligned} L_{phys_{y}} (P_{{\ddot{r}}_{y}}) = E_{{\ddot{r}}_{y} \sim P_{{\ddot{r}}_{y}}} [‖ M_{y} {\ddot{r}}_{y}^{g} + C_{y} {\dot{r}}_{y}^{g} + K_{y} r_{y}^{g} \\ - \frac{1}{2} ρ A_{y} V^{2} C_{d y} ‖_{2}] \end{aligned} \end{align}

(7)

where ${\ddot{r}}_{x}^{g}$ and ${\ddot{r}}_{y}^{g}$ are the generated tower-top accelerations in the x and y domains. The simplified SDOF cantilever beam model neglects several important physical phenomena: higher-order tower bending modes, blade-tower aeroelastic coupling, and turbine-specific control strategies that differ substantially between the 1.5 MW and 5 MW turbines. The physics-informed loss therefore serves primarily as a regularizing constraint that enforces basic force-balance relationships rather than as a high-fidelity structural model. In attempting to minimize the L₂ norm in equations (6) and (7), the training algorithm initially converged to a trivial solution by setting all trainable parameters to zero. To prevent this behavior, the physics-informed loss functions were normalized by the wind loading parameters ρ, A, and C_d. This normalization eliminates the incentive for the algorithm to trivially minimize the loss by driving the parameters to zero. As a result, the coefficients associated with the $\ddot{r}$, $\dot{r}$, and r terms are no longer the mass M, damping coefficient C, and stiffness K but instead M/ρAC_d, C/ρAC_d, and K/ρAC_d. Additionally, numerical integration assumes zero initial conditions by default, which may not always reflect the true state. To account for this, an additional trainable parameter r₀ was introduced to represent the initial displacement. The resulting physics-based losses are incorporated into the total generator loss, alongside the critic loss and cycle consistency loss. During training, gradients are computed with respect to each parameter based on their associated loss contributions, and the model parameters are updated accordingly. Since the training data were normalized, the learned parameters do not represent absolute mass, damping, or stiffness values and should be interpreted as regularization terms enforcing fundamental force-balance relationships. Nevertheless, their relative scaling is physically consistent: the learned mass and stiffness for the 5 MW turbine are approximately double those of the 1.5 MW turbine, reflecting expected structural scaling between the two systems.
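The normalized physics residual described in this section can be sketched as follows for a single generated window. This is an illustrative NumPy version under several assumptions that are not specified in the text: cumulative trapezoidal integration, an FFT-based high-pass filter as a stand-in for the drift-removal filter, a placeholder cutoff frequency, and the function and argument names themselves. In the actual model, m, c, k, and r0 would be trainable variables updated by the optimizer.

```python
import numpy as np

def cumulative_trapezoid(signal, dt):
    """Integrate a uniformly sampled signal from a zero initial condition."""
    steps = 0.5 * (signal[1:] + signal[:-1]) * dt
    return np.concatenate(([0.0], np.cumsum(steps)))

def high_pass(signal, dt, cutoff_hz):
    """Remove drift by zeroing FFT bins below cutoff_hz (a simple stand-in filter)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=dt)
    spectrum[freqs < cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=signal.size)

def physics_loss(acc, wind, dt, m, c, k, r0, cutoff_hz=0.05):
    """L2 norm of the normalized force-balance residual for one window.

    m, c, k stand for M, C, K divided by rho*A*C_d, so the forcing term
    reduces to 0.5*V^2. r0 is the initial-displacement offset.
    """
    vel = high_pass(cumulative_trapezoid(acc, dt), dt, cutoff_hz)
    disp = high_pass(cumulative_trapezoid(vel, dt), dt, cutoff_hz) + r0
    residual = m * acc + c * vel + k * disp - 0.5 * wind ** 2
    return float(np.linalg.norm(residual))

# Static check: zero acceleration, constant wind V = 2, offset r0 = 2, k = 1
# gives k*r0 - 0.5*V^2 = 0, so the force balance is satisfied.
static_loss = physics_loss(np.zeros(64), np.full(64, 2.0), dt=0.01,
                           m=1.0, c=1.0, k=1.0, r0=2.0)
```

Dividing through by the wind loading parameters fixes the scale of the forcing term, so setting m, c, and k to zero leaves a residual of 0.5·V² instead of zero, which is why the trivial solution is no longer attractive.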

2.4. Total losses

The previously defined loss components are aggregated to form the total loss function for each of the four networks. Each network is then trained to minimize its corresponding composite loss. For the critic networks specifically, the objective is to minimize the loss toward increasingly negative values, which corresponds to maximizing the separation between the critic scores assigned to real and generated (fake) samples. The complete composite loss functions are defined as follows

\begin{align} \begin{aligned} L_{G_{total}} (P_{x}, P_{y}) = L_{gen} (P_{x}, D_{y}) & + λ_{cyc} L_{cyc} (P_{x}, P_{y}) \\ + λ_{phy} L_{phys_{y}} (P_{y}) \end{aligned} \end{align}

(8)

\begin{align} \begin{aligned} L_{F_{total}} (P_{x}, P_{y}) = L_{gen} (P_{y}, D_{x}) & + λ_{cyc} L_{cyc} (P_{x}, P_{y}) \\ + λ_{phy} L_{phys_{x}} (P_{x}) \end{aligned} \end{align}

(9)

\begin{align} L_{critic_{x}} (P_{x}, P_{y}) = L_{critic} (P_{y}, P_{x}, D_{x}) \end{align}

(10)

\begin{align} L_{critic_{y}} (P_{x}, P_{y}) = L_{critic} (P_{x}, P_{y}, D_{y}) \end{align}

(11)

where λ_cyc and λ_phy are the hyperparameters determining the relative weights of the cycle consistency loss and the physics-informed loss.
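The bookkeeping in equations (8) and (9) amounts to a weighted sum, sketched below. The loss values and the weights are hypothetical placeholders chosen only to show the arithmetic; the weights used in this study are not reproduced here.

```python
def total_generator_loss(l_gen, l_cyc, l_phys, lam_cyc, lam_phy):
    """Equations (8)-(9): adversarial term plus weighted cycle and physics terms."""
    return l_gen + lam_cyc * l_cyc + lam_phy * l_phys

# Hypothetical loss values and weights
loss_with_physics = total_generator_loss(l_gen=1.0, l_cyc=2.0, l_phys=3.0,
                                         lam_cyc=10.0, lam_phy=0.5)
# Setting lam_phy = 0 recovers a CGAN objective without the physics term
loss_baseline = total_generator_loss(l_gen=1.0, l_cyc=2.0, l_phys=3.0,
                                     lam_cyc=10.0, lam_phy=0.0)
```

With these placeholder numbers the two totals are 22.5 and 21.0, so the physics term contributes λ_phy times its residual and nothing else changes.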

3. Generators and critics network architecture

Table 1 shows the hyperparameters used for training the CGAN. Both the G and F networks use an encoder-decoder architecture. Extensive hyperparameter tuning was carried out before arriving at the values in the table. Each training epoch lasted more than 1 minute because four networks are trained simultaneously. We therefore used the TensorFlow distributed training strategy, which reduced the training time and was instrumental in identifying the set of hyperparameters that leads to fast convergence.

Table 1.

Convolutional neural network hyperparameters.

Hyperparameter	Generator	Critic
Number of conv layers (encoder)	3	4
Output maps for conv (encoder)	16, 32, 64	32, 64, 128, 128
Kernel widths (encoder)	19, 19, 19	17, 17, 17, 17
Number of conv layers (decoder)	4	—
Output maps for conv (decoder)	64, 32, 16, 1	—
Kernel widths (decoder)	9, 7, 4, 3	—
Learning optimizer	Adam	Adam
Learning rate without physics	1e-4	2e-4
Learning rate with physics	1.5e-4	3e-4
Input size	[(2048, 1), (2048, 1)]	(2048, 1)
Padding	Valid	Valid

During the preprocessing phase, we initially applied same padding in our convolution operations on the time series data. In CNNs, same padding maintains the original sequence length by symmetrically adding zeros to both the beginning and end of the input. This approach is standard in image processing because images are spatially invariant, meaning the spatial relationships in images are less sensitive to boundary padding. Unlike time series, images do not rely on strict directional causality, so padding artifacts typically do not affect their internal structure or interpretation significantly. However, in the case of time series, this symmetric zero-padding introduced misalignment and artificial time delays, particularly at the sequence boundaries. These delays disrupted the model’s ability to learn precise temporal dependencies, especially at the start and end of each input window where the zero-padding created discontinuities. As a result, we observed phase shifts between the predicted and actual signals, which reduced the temporal accuracy of the model. To address the issue, we transitioned to valid padding, which avoids introducing edge artifacts by omitting any padding. While this ensures that each output element is the result of a full convolution, it comes at the cost of reducing the output length and causing a mismatch between the input and output sizes. Additionally, valid padding inherently biases the model toward the middle of the sequence, as the beginning and end values are involved in fewer convolution operations compared to those in the center. To mitigate this imbalance, we developed a custom padding scheme, illustrated in Figure 2. The dataset was preprocessed using overlapping windows, where the end of each time series segment overlaps with the beginning of the next. This is a standard technique in time series processing to promote stable training and prevent abrupt gradient shifts between consecutive batches. 
Our input layer crops a portion of each sequence that overlaps with adjacent windows and concatenates it to the beginning and end of the original sequence. The network then receives this extended input, which allows the first convolution layer to convolve fully over the entire sequence, including the original boundaries. This method effectively preserves the original sequence length while ensuring that edge values are convolved multiple times, just like central values. As a result, this approach significantly reduced phase shifts and improved the alignment between predicted outputs and the original time series.

Figure 2.

Proposed custom padding scheme.
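The idea behind the custom padding can be illustrated for a single convolution layer. In this NumPy sketch, which is an assumption-laden simplification and not the network's input layer, each window is extended with real samples from the neighboring part of the record so that a valid convolution returns the original window length. The helper name, the toy record, and the moving-average kernel are all hypothetical.

```python
import numpy as np

def extended_window(series, start, length, kernel_width):
    """Return series[start:start+length] plus real neighbouring samples on each side.

    The extra (kernel_width - 1) samples let a 'valid' convolution return
    exactly `length` outputs. Assumes an odd kernel width and a window that
    is not at the very start or end of the record.
    """
    half = (kernel_width - 1) // 2
    return series[start - half : start + length + half]

series = np.arange(100, dtype=float)      # stand-in for an acceleration record
kernel = np.ones(5) / 5.0                 # toy moving-average kernel
ext = extended_window(series, start=20, length=16, kernel_width=kernel.size)
out = np.convolve(ext, kernel, mode="valid")
```

Here `out` has 16 samples, the same as the unextended window, and no zeros were introduced at the edges. For a stack of several convolution and pooling layers the required context grows accordingly, so the amount of borrowed overlap would need to match the full architecture.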

Two other key architectural parameters are the number of output feature maps and the convolutional kernel widths. In the encoder, each convolutional layer is followed by a MaxPooling layer, which reduces the spatial resolution by a factor of 2. To compensate for this down-sampling and preserve the representational capacity of the network, the number of output feature maps is doubled at each subsequent layer. In contrast, the decoder uses UpSampling layers between convolutional layers to progressively restore spatial resolution, and accordingly, the number of feature maps is halved at each stage. Relatively wide kernel sizes were used in the encoder (19 for the generator and 17 for the critic) to allow each layer to capture features across a broader receptive field. This becomes increasingly important in the deeper layers, as the receptive field of the kernel effectively doubles with each pooling operation. In the decoder, narrower kernels were employed to facilitate precise reconstruction of the signal from the learned feature maps.
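The growth of the receptive field through the encoder can be worked out with a short calculation. The sketch below assumes stride-1 convolutions without dilation and max pooling with size and stride 2 after every convolution; these details, and the assumption that the critic follows the same conv-plus-pooling pattern, are not all stated explicitly in the text.

```python
def receptive_field(kernel_widths, pool_size=2):
    """Receptive field (in input samples) after a stack of conv + max-pool pairs."""
    rf, jump = 1, 1                       # jump = input samples per feature step
    for k in kernel_widths:
        rf += (k - 1) * jump              # stride-1 convolution
        rf += (pool_size - 1) * jump      # max-pooling window
        jump *= pool_size                 # pooling stride widens later layers
    return rf

generator_rf = receptive_field([19, 19, 19])      # 134 input samples
critic_rf = receptive_field([17, 17, 17, 17])     # 256 input samples
```

Under these assumptions each encoder output of the generator sees 134 input samples, a small fraction of the 2048-sample window, and the contribution of each kernel doubles after every pooling stage.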

This full setup was used twice, to train with and without the physics loss added to the loss function. The only difference between these setups was a slightly higher learning rate for the one with the physics loss. This was because it was observed that without the physics loss the training at the higher rate was unstable and would not converge. However, with the physics loss the network would converge and remain stable.

4. Dataset preparation and preprocessing

For the simulation data generation, stochastic inflow turbulence input was generated using the TurbSim tool (National Laboratory of the Rockies, 2024). TurbSim was developed by the National Laboratory of the Rockies to provide a numerical simulation of a full-field flow. OpenFAST receives the input from TurbSim and simulates the response of the 1.5 MW and 5 MW turbines for 3000 s. Initial training with shorter durations was found to be insufficient, as the generators tended to overfit and memorize the dataset. To mitigate this, a longer simulation was adopted to provide a more diverse and representative training set. The SDOF approximation explained above is most appropriate for steady-state operation, where the first tower mode dominates, and may not adequately capture transient dynamics or coupled rotor-tower interactions. Therefore, the initial transient segment (first 40 s) was excluded, and the training dataset was restricted to the steady-state regime. While this simplification enables stable and efficient training for the present single-channel tower-top acceleration signal, extension to multichannel measurements or coupled subsystem dynamics would require higher-order or coupled physics formulations. All datasets were standardized using zero-mean, unit-variance normalization. The time series data were then segmented into overlapping windows to increase the effective dataset size while preserving temporal correlations. The degree of overlap depended on the generator architecture; in the final configuration, each 2048-point sample overlapped by 1816 data points with its neighboring sample. From the generated data, the first 1600 samples were allocated for training. This number was chosen to align with a batch size of 32, allowing efficient use of computational resources during each training step. An additional 850 samples were reserved for testing, with a 100-sample gap separating the training and testing sets to avoid data leakage. This dataset partitioning is illustrated by the wind input in Figure 3.
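The windowing and partitioning steps can be sketched in NumPy as follows. The sketch assumes that each "sample" is one 2048-point window, that consecutive windows are offset by 2048 − 1816 = 232 points, and that the record is a synthetic placeholder; the function names are hypothetical.

```python
import numpy as np

def standardize(x):
    """Zero-mean, unit-variance normalization."""
    return (x - x.mean()) / x.std()

def make_windows(x, length=2048, overlap=1816):
    """Slice a 1-D record into overlapping windows (stride = length - overlap)."""
    stride = length - overlap
    return np.lib.stride_tricks.sliding_window_view(x, length)[::stride]

def split(windows, n_train=1600, gap=100, n_test=850):
    """Training block, then a discarded gap, then the test block."""
    train = windows[:n_train]
    test = windows[n_train + gap : n_train + gap + n_test]
    return train, test

# Synthetic record just long enough for 1600 + 100 + 850 windows
n_samples = 2048 + 232 * (2550 - 1)
record = standardize(np.random.default_rng(0).normal(size=n_samples))
train, test = split(make_windows(record))
```

With a stride of 232 points, a gap of 100 windows spans 23,200 points, which is longer than one 2048-point window, so under these assumptions the last training window and the first test window share no data points.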

Figure 3.

Simulated wind speed input with labeled data section.

5. Results and discussion

The CGAN, with and without the physics-informed loss function, was trained until convergence of the generator and critic losses, which served as the stopping criterion. Convergence was defined as the point at which the average difference between the critic score for real and generated samples fell below a threshold of 10. The baseline model, trained without the physics-informed loss, required approximately 73,000 epochs to reach convergence. In contrast, the physics-informed model converged significantly faster, requiring only 13,000 epochs to achieve the same criterion, representing an 82% reduction in training time. The physical quantity examined throughout this section is the tower-top acceleration which is measured at the nacelle level for both the 1.5 MW and 5 MW turbines. To evaluate G and F performance in mapping between different domains, we first examine the power spectral density (PSD) of the generated signals, as shown in Figure 4. The PSDs of the original signals compared with the generated counterparts exhibit a very similar spectral distribution. In both cases, without and with physics incorporation, the dominant energy is concentrated at low frequencies (<1.5 Hz), with smaller harmonics appearing at higher frequencies. The quantitative metrics in Tables 2 and 3 further support this observation. For X and its generated counterpart F(Y), the mean and peak frequencies remain close (1.22 vs 1.34/1.26 Hz and 1.00 vs 0.91 Hz, respectively), and the frequency deviation is slightly lower for F(Y) (0.026 vs 0.021/0.019 Hz). Similarly, for Y and G(X), both the mean and peak frequencies are well preserved (0.78 vs 0.84/0.81 Hz and 0.34 vs 0.57 Hz), with G(X) again exhibiting a reduced frequency deviation.

Figure 4.

The tower-top acceleration PSD of the reference and generated signals.

Table 2.

Frequency-domain features of testing data without physics loss.

Feature	X	F(Y)	Y	G(X)
Mean frequency [Hz]	1.22	1.34	0.78	0.84
Peak frequency [Hz]	1.00	0.91	0.34	0.57
Frequency deviation [Hz]	0.026	0.021	0.019	0.014

Table 3.

Frequency-domain features of testing data with physics loss.

Feature	X	F(Y)	Y	G(X)
Mean frequency [Hz]	1.22	1.26	0.78	0.81
Peak frequency [Hz]	1.00	0.91	0.34	0.57
Frequency deviation [Hz]	0.026	0.019	0.019	0.013
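Frequency-domain features of the kind listed in Tables 2 and 3 can be computed from a power spectrum. The sketch below is an assumption about the procedure, not a description of it: it uses a plain one-sided periodogram, defines the mean frequency as the power-weighted average frequency, and uses a hypothetical sampling rate and test tone. The frequency deviation is omitted because its definition is not given in the text.

```python
import numpy as np

def spectral_features(signal, fs):
    """Mean (power-weighted) and peak frequency from a one-sided periodogram."""
    centered = signal - np.mean(signal)
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    mean_freq = np.sum(freqs * power) / np.sum(power)
    peak_freq = freqs[np.argmax(power)]
    return mean_freq, peak_freq

fs = 50.0                                  # hypothetical sampling rate [Hz]
t = np.arange(1000) / fs                   # 20 s record
tone = np.sin(2 * np.pi * 1.0 * t)         # 1 Hz test tone
mean_freq, peak_freq = spectral_features(tone, fs)
```

For the pure 1 Hz tone both features come out at approximately 1 Hz. For broadband signals the two features can differ noticeably, as they do in the tables, because the mean frequency is pulled toward higher-frequency content while the peak frequency tracks only the strongest component.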

Figures 5 and 6 show time-domain comparisons between the original signals and their generated counterparts, as well as the cycle consistency reconstructions. The generated signals (orange curves) closely follow the dynamics of the original signals (blue curves), capturing both the amplitude and phase variations over time. Although slight amplitude mismatches are visible, the overall temporal structure is preserved, and the incorporation of the physics-informed loss improved the coefficient of determination R² by approximately 30% while reducing the RMSE from 1.39 to 1.1 m/s². For the cycled signals, the results show near-perfect mapping, indicating that the bidirectional mappings not only approximate the transformations between domains but also maintain consistency when cycled back to the original domain. This highlights the CGAN's ability to preserve essential signal dynamics in both forward and inverse mappings. To further assess robustness and practical relevance, we conducted a preliminary sensitivity analysis by increasing the turbulence intensity from IEC Category B to Category A. At a reference wind speed of 12 m/s, this corresponds to an increase of approximately 3% in turbulence intensity. The trained physics-informed model was evaluated using the new simulation data without retraining. The prediction performance remained stable, as shown in Figure 7, with the RMSE remaining at approximately 1.1 m/s², indicating that the learned domain mapping is not highly sensitive to moderate changes in inflow turbulence characteristics.
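For reference, the two time-domain metrics quoted above can be computed as in the following sketch. The arrays are small hypothetical examples, not data from this study, and the standard definitions of RMSE and the coefficient of determination are assumed.

```python
import numpy as np

def rmse(reference, generated):
    """Root-mean-square error between a reference and a generated signal."""
    return float(np.sqrt(np.mean((reference - generated) ** 2)))

def r_squared(reference, generated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((reference - generated) ** 2)
    ss_tot = np.sum((reference - np.mean(reference)) ** 2)
    return float(1.0 - ss_res / ss_tot)

reference = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical reference signal
generated = np.array([1.0, 2.0, 3.0, 5.0])   # hypothetical generated signal
example_rmse = rmse(reference, generated)
example_r2 = r_squared(reference, generated)
```

In this toy case a single error of 1 over four points gives an RMSE of 0.5 and an R² of about 0.8. RMSE carries the units of the signal, whereas R² is dimensionless and relative to the variance of the reference.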

Figure 5.

Baseline model 10 s prediction (from 2662 s to 2672 s).

Figure 6.

Physics-informed model 10 s prediction (from 2662 s to 2672 s).

Figure 7.

Physics-informed model prediction under increased turbulence intensity (IEC Category A).

6. Conclusions

In this paper, a physics-informed CGAN was proposed to bridge the probability distribution gap between simulated 1.5 MW and 5 MW wind turbines. The generated signals preserved essential temporal dynamics, dominant spectral peaks, and low-frequency energy content in both time and frequency domains. Incorporating a physics-informed loss function improved the R² values by 30%, reduced the RMSE from 1.39 to approximately 1.1 m/s², and decreased training time by 82%. Furthermore, under increased wind turbulence intensity, the RMSE remained stable at approximately 1.1 m/s². The proposed methodology enables domain translation between simulated turbines of different ratings and is architecturally extensible to multichannel sensor data. Validation against real operational turbine data and extension to other subsystems remain important directions for future work.

Acknowledgments

This work was supported in part by the U.S. Department of Energy, Office of Science, Office of Workforce Development for Teachers and Scientists (WDTS) under the Visiting Faculty Program (VFP). This work was authored in part by the National Laboratory of the Rockies for the U.S. Department of Energy (DOE), operated under Contract No. DE-AC36-08GO28308. Funding provided by U.S. Department of Energy Critical Minerals and Energy Innovation Integrated Energy Systems Office. The views expressed in the article do not necessarily represent the views of the DOE or the U.S. Government. The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes.

ORCID iDs

Vincent Perlowin

Mohammed Alabsi

Shawn Sheng

Larry Pearlstein

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was also supported in part by the National Science Foundation Research Experiences for Undergraduates program (grant number 2244119) through Dr. Faisal Aqlan’s lab at the University of Louisville.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

Arjovsky, Chintala and Bottou (2017) Wasserstein generative adversarial networks. In: International conference on machine learning, August 6–11, 2017, Sydney, Australia. PMLR, pp. 214–223.

Bao, Zhang, Huang, et al. (2023) A deep transfer learning network for structural condition identification with limited real-world training data. Structural Control and Health Monitoring 8899806. Available at: https://doi.org/10.1155/2023/8899806

Ferrari, Froio, Rizzi, et al. (2019) Model updating of a historic concrete bridge by sensitivity- and global optimization-based Latin hypercube sampling. Engineering Structures 179: 139–160. https://doi.org/10.1016/j.engstruct.2018.08.004

Ge and Sadhu (2024) Domain adaptation for structural health monitoring via physics-informed and self-attention-enhanced generative adversarial learning. Mechanical Systems and Signal Processing 211: 111236. https://doi.org/10.1016/j.ymssp.2024.111236

Goodfellow (2014) On distinguishability criteria for estimating generative models. https://arxiv.org/abs/1412.6515.

Gulrajani, Ahmed, Arjovsky, et al. (2017) Improved training of Wasserstein GANs. Advances in Neural Information Processing Systems 30, Long Beach, CA, USA, December 4–9, 2017, pp. 5769–5779.

Hernandez-Estrada, Lastres-Danguillecourt, Robles-Ocampo, et al. (2021) Considerations for the structural analysis and design of wind turbine towers: a review. Renewable and Sustainable Energy Reviews 137: 110447. https://doi.org/10.1016/j.rser.2020.110447

Keller, Gould and Greco (2017) Investigation of bearing axial cracking: benchtop and full-scale test results. NREL/TP-5000-67523, National Renewable Energy Laboratory, Golden, CO, USA, August 2017 (revised January 2019).

Mao, Wang (2020) Bayesian finite element model updating of a long-span suspension bridge utilizing hybrid Monte Carlo simulation and kriging predictor. KSCE Journal of Civil Engineering 24(2): 569–579. https://doi.org/10.1007/s12205-020-0983-4

MathWorks (2024) Monitor GAN training progress and identify common failure modes. https://www.mathworks.com/help/deeplearning/ug/monitor-gan-training-progress.html. (Accessed 16 12 2024).

National Laboratory of the Rockies (2024) TurbSim. https://www.nrel.gov/wind/nwtc/turbsim.html. (Accessed 16 12 2024).

Ren and Chen (2010) Finite element model updating in structural dynamics by using the response surface method. Engineering Structures 32(8): 2455–2465. https://doi.org/10.1016/j.engstruct.2010.04.019

Saad, O’Reilly and Rehmani (2024) A survey on training challenges in generative adversarial networks for biomedical image analysis. Artificial Intelligence Review 57(2): 19. https://doi.org/10.1007/s10462-023-10624-y

Teng, Chen, et al. (2023) Structural damage detection based on transfer learning strategy using digital twins of bridges. Mechanical Systems and Signal Processing 191: 110160. https://doi.org/10.1016/j.ymssp.2023.110160

Wan, Peng, Khalil, et al. (2025) The early warning method for offshore wind turbine gearbox oil temperature based on fstae-att. Sustainable Computing: Informatics and Systems 47: 101180. https://doi.org/10.1016/j.suscom.2025.101180

Yang, Peng, Zhang, et al. (2023) Abnormal data identification and reconstruction based on wind speed characteristics. CSEE Journal of Power and Energy Systems 11(2): 612–622.

Zhu, Park, Isola, et al. (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision, Venice, Italy, 22–29 October 2017, pp. 2223–2232. IEEE.