Comparison study of advanced computer vision models for wind turbine blade defect detection

Abstract

Automated detection of surface defects in wind turbine blades is essential for cost-effective and reliable maintenance in modern energy infrastructure. This study presents a comprehensive evaluation of recent single-stage (YOLOv8, YOLOv9, YOLOv10, YOLOv11) and two-stage (Faster R-CNN) deep learning-based object detection models for wind turbine blade inspection. To address class imbalance, StyleGAN2-ADA augmentation was applied to a real-world dataset, and detection accuracy, class-wise performance, inference speed, and model size were assessed using stratified cross-validation and an independent holdout set. Results show that all YOLO models consistently outperform Faster R-CNN in mean Average Precision (mAP@0.5), with YOLOv11 achieving the highest overall score of 0.969 on the holdout test set. The integration of synthetic data led to substantial performance gains for the minority classes and reduced variance across folds. In addition to superior accuracy, YOLO models demonstrate faster inference (<11 ms per image) and compact model sizes (48–84 MB), highlighting their suitability for real-time and edge-based industrial deployments. These findings establish the technical and practical benefits of combining advanced YOLO models with data augmentation strategies for automated wind turbine blade defect detection.

Keywords

wind turbine blade; computer vision; defect detection; YOLO; Faster R-CNN

Introduction

Wind energy has emerged as a crucial component in the global effort to address climate change by reducing greenhouse gas emissions (Lv et al., 2022). As the need for renewable energy continues to grow, maintaining the reliability and operational efficiency of wind turbines is becoming increasingly critical (Yu et al., 2017). Wind turbines are frequently subject to harsh environmental conditions, leading to damage such as blade cracks, surface erosion, delamination, lightning-strike damage, and other structural defects (Yang et al., 2021). If not promptly detected and addressed, such damage can compromise the performance and lifespan of the turbines, resulting in significant energy losses and maintenance costs. Wind turbine blades are vital for electricity generation and account for roughly 15% to 20% of the overall investment in a wind turbine (Tchakoua et al., 2014). However, they are fragile and vulnerable to corrosion and to damage from strong wind gusts, lightning strikes, and other hazards. A survey of over 700 onshore wind turbines in Sweden attributed 13.4% of the failures observed between 1997 and 2005 to blades, and approximately 3,800 blades fail each year, corresponding to 0.54% of the 700,000 blades currently operating worldwide (Yang et al., 2017). Therefore, dependable and effective condition monitoring is necessary to prevent mechanical, structural, or environmental damage (Carnero et al., 2023). Current inspection techniques include visual inspection, acoustic emission technology (Han et al., 2014), vibration detection (Wang et al., 2014), infrared thermography (Rumsey and Musial, 2001), ultrasonic flaw detection (Li et al., 2021), and strain detection (Sierra-Pérez et al., 2016). Visual inspection requires scheduling a shutdown, resulting in significant energy generation losses due to prolonged maintenance time.
Additionally, it poses safety risks because of the height of wind turbine blades and the need for human intervention (Reddy et al., 2019). Sensor-based techniques face challenges in sensor installation, data storage, and transmission: environmental changes can easily disrupt the collected signals, and placing numerous sensors on the blades can reduce their energy capture efficiency. Therefore, these techniques are labor-intensive, costly, often inefficient, and susceptible to human error (Lv et al., 2022).

In recent years, there has been growing interest in unmanned aerial vehicles (UAVs) for wind turbine blade inspection, as they offer an efficient, cost-effective tool with strong potential for automating inspections. However, interpreting the data for damage assessment is complex because of the large volume of images captured by UAV cameras: a single flight can yield thousands of high-resolution images per turbine, making manual review labor-intensive and variable across inspectors. To mitigate these challenges, computer vision-based techniques are increasingly used to automate this task.

Research on machine learning-based approaches for detecting surface damage on wind turbine blades from drone images remains limited (Wang and Zhang, 2017). Chandrashekhar et al. (2025) presented a machine learning approach for detecting damage in wind turbine blades, using Gaussian regression to track changes in blade frequency. Similarly, Joshua et al. (2025) introduced a crack detection framework for wind turbine blades that models the data with multi-layer perceptrons, demonstrating strong potential for condition monitoring. However, these studies primarily focused on detecting a single type of distress (i.e., cracks) and relied on traditional machine learning (ML) techniques. In parallel with inspection research, ML has advanced wind-energy forecasting, improving short-term power prediction and grid planning (Cui et al., 2025; Ibrahim et al., 2023a, 2023b). Complementing this, control and hardware studies have explored robust multi-input control, TSR-MPPT backstepping, and emulator-based converter testing (Boutabba et al., 2025; Elzein et al., 2025; Ravikumar et al., 2025). The present study complements these threads by focusing on vision-based detection of blade-surface defects.

Recent advancements in artificial intelligence (AI) and computer vision, particularly deep learning (DL), provide a promising solution for automating damage detection, replacing multi-stage hand-crafted pipelines with learned features (Lecun et al., 2015). Deep neural network approaches are increasingly being used for defect detection in automated visual inspection systems. In the context of wind turbine blade inspection, several studies have demonstrated the potential of deep convolutional models for detecting complex surface anomalies under diverse environmental conditions (see Table 1). For instance, Zhang and Wen (2022) presented SOD-YOLO, a YOLOv5 variant for blade-surface fault detection that inserts a compact feature-fusion stage and a convolutional block attention module to retain fine-scale cues from small defects. Its robustness under adverse conditions, however, was not evaluated, leaving uncertainty about its real-world reliability. In parallel, Wang et al. (2025) introduced AFB-YOLO, derived from YOLOv5s with attention and feature balancing, to capture low-resolution, indistinct damage more effectively and improve sensitivity to small targets. However, the datasets used in these studies emphasize single damage categories and offer limited variation in illumination and weather conditions, which may constrain practical performance during field inspections of wind turbine blades. Shihavuddin et al. (2019) demonstrated reliable drone-based defect localization, and a recent survey outlines similar benefits of vision- and detector-based inspection pipelines for wind turbine blades (Dimitrova et al., 2022).

Table 1.

Comparative analysis between the proposed work and the existing studies.

Study	Model	Classes	Key limitations
Zhang et al. (2021)	Image-enhanced Mask R-CNN	4	Single model; limited deployment metrics
Gohar et al. (2023)	Slice-aided inference model	5	Focus on slicing strategy; limited deployment metrics
Tong et al. (2024)	WTBD-YOLOv8	4	Single architecture; limited external comparability
Zou et al. (2025)	AUD-YOLO	2	Single-model improvement; limited class diversity
Shihavuddin et al. (2019)	Faster R-CNN with an Inception-ResNet-v2 backbone	4	Traditional image augmentation (flipping and rotation), which increases dataset size but does not produce new, unique images
This proposed study	Performance comparison of YOLO models (v8/v9/v10/v11) and Faster R-CNN; class balancing through image generation (StyleGAN2-ADA)	5	Addresses the gaps: multi-model evaluation, five defect classes, variance reporting, latency and size, and image generation for class imbalance mitigation

Despite this progress, the evidence base remains constrained by small, task-specific datasets, earlier detector versions, single-defect focus, and limited treatment of class imbalance (Moreno et al., 2018; Yu et al., 2017; Zou and Cheng, 2022). In practice, publicly available datasets are modest in size, with Zou and Cheng (2022) reporting 306 images, Yu et al. (2017) reporting 36 images, and Moreno et al. (2018) reporting 78 images. These sizes reflect the challenges of image acquisition on operating turbines, confidentiality constraints on industrial data, and the cost of expert annotation; even curated resources remain limited relative to the need (Carnero et al., 2023; Nikolov and Madsen, 2020). Consequently, overfitting remains a persistent risk, compromising the quality of model training (Schmedemann et al., 2022). Therefore, studies that integrate modern detectors with transparent, imbalance-aware assessments on wind turbine blade defect datasets are needed to ensure accuracy and reproducibility.

Common approaches to address limited datasets include data augmentation, transfer learning, pre-trained models, and synthetic data. Among these, synthetic training data is particularly promising, as it allows large volumes of labeled images to be generated efficiently.

StyleGAN2-ADA (Karras et al., 2021), a generative model, creates entirely new images that do not exist in the original dataset, providing a richer and more varied set of training samples than traditional augmentation methods. A further advantage is that generative models can produce highly realistic images that closely mimic real-world scenarios, which can help a detector generalize better to actual defect conditions than traditional augmentation, whose transformations may not yield realistic variations, especially for complex patterns or defects. In addition, a trained generator can sample a virtually unlimited number of distinct images, which helps alleviate data scarcity. Traditional augmentation can also increase the dataset size, but it remains limited by the number of original images and the range of possible transformations.

The major contributions of this study are:

• An expertly curated and annotated dataset of wind turbine blade images with five defect types compiled from Carnero et al. (2023) and Nikolov and Madsen (2020), and formatted in both YOLO and COCO standards.

• A comprehensive evaluation of four recent single-stage detectors (YOLOv8, YOLOv9, YOLOv10, and YOLOv11) alongside a representative two-stage detector (Faster R-CNN) for automated defect detection in wind turbine blades.

• The use of StyleGAN2-ADA to generate high-quality synthetic images for underrepresented defect categories, mitigating severe class imbalance and thereby improving model generalization and achieving more equitable per-class performance.

Materials and methods

This section introduces the dataset used for training, describes the two families of detection architectures applied in this study (single-stage YOLO and two-stage Faster R-CNN), and outlines the evaluation metrics adopted for comparison and discussion.

Dataset acquisition and preparation

Existing dataset

The few publicly available datasets on wind turbine blade defects are limited in both size and variety of defect types, and most inspection datasets are either private or small. Among the available datasets, Nikolov and Madsen (2020) presented one of the more comprehensive collections, featuring 422 images that contain five common defect classes. Subsequently, Carnero et al. (2023) used this dataset and applied color transformations as synthetic augmentation to enhance visual diversity, effectively doubling the dataset size. Figure 1 shows the distribution of image classes in the Carnero et al. (2023) dataset. These images, initially captured by drones, represent various defects on wind turbine blades. This study uses the augmented dataset from Carnero et al. (2023) as the base dataset because of its coverage of common defect types and improved diversity. Table 2 lists publicly available wind turbine blade datasets, and Figures 2(a)–(e) show examples of blade defects in the Carnero et al. (2023) dataset.

Figure 1.

The distribution of image classes in the Carnero et al. (2023) dataset.

Table 2.

Publicly available image dataset on wind turbine distresses.

Dataset	Number of images	Classes
Yu et al. (2017)	36	2
Zou and Cheng (2022)	306	2
Moreno et al. (2018)	78	3
Nikolov and Madsen (2020)	422	5
Carnero et al. (2023)	844	5

Figure 2.

Images from Carnero et al. (2023) that capture common distresses on wind turbine blades.

Class distribution and imbalance quantification

To quantify class balance, three global indices and one per-class index were used: (1) the normalized Shannon entropy (J′), (2) the normalized Gini impurity (G*), (3) the coefficient of variation of the image counts (CV), and (4) the per-class ratio-to-uniform (R_i).

Let $n_{i}$ denote the number of images that contain at least one instance of class $i$ in a given split; $N$: total number of images in the split under consideration; $p_{i}$: class share used in the evenness indices; $K$: number of defect classes; $μ$: mean per-class image count (the count expected under a uniform distribution); $σ$: standard deviation of the per-class counts; $H$: raw Shannon entropy; $G$: Gini impurity; $R_{i}$: ratio-to-uniform.

N = \sum_{i = 1}^{K} n_{i}, p_{i} = \frac{n_{i}}{N}

(1) Normalized Shannon entropy, $J^{'}$ :

H = - \sum_{i = 1}^{K} p_{i} \ln p_{i}, J^{'} = \frac{H}{\ln K}

(1)

Higher J′ indicates a more even distribution (Pielou, 1966; Shannon, 1948).

(2) Normalized Gini impurity, $G^{*}$ :

G = 1 - \sum_{i = 1}^{K} {p_{i}}^{2}, G^{*} = \frac{G}{1 - 1 / K} \in [0, 1]

(2)

Higher G* indicates a more even distribution (Breiman et al., 1984).

(3) Coefficient of variation, $CV$ :

μ = \frac{1}{K} \sum_{i} n_{i}, σ = \sqrt{\frac{1}{K} \sum_{i} {(n_{i} - μ)}^{2}}, CV = \frac{σ}{μ}

(3)

Lower CV indicates a more even distribution (Sokal and Rohlf, 2012).

For the per-class index, the ratio to a uniform share was used, as

R_{i} = K p_{i}

(4)

where R_i ≈ 1 indicates balance, R_i < 1 indicates a minority class, and R_i > 1 indicates a majority class (Manly et al., 2002). In this study, a tight balanced band of 0.85 ≤ R_i ≤ 1.15 was used.
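To make Eqs. (1)–(4) concrete, the following minimal Python sketch computes the four indices from per-class image counts. The function name `evenness_indices` and the dictionaries `REAL` and `REAL_GEN` are illustrative, not part of the study's code; the counts are taken from Table 4, and CV uses the population standard deviation, matching the 1/K factor in Eq. (3).

```python
import math


def evenness_indices(counts):
    """Return (J', G*, CV, {class: R_i}) for per-class image counts n_i."""
    k = len(counts)
    n_total = sum(counts.values())  # N = sum of n_i
    shares = {c: n / n_total for c, n in counts.items()}  # p_i = n_i / N

    # Eq. (1): Shannon entropy normalized by ln K
    h = -sum(p * math.log(p) for p in shares.values() if p > 0)
    j_prime = h / math.log(k)

    # Eq. (2): Gini impurity normalized by its maximum, 1 - 1/K
    g = 1.0 - sum(p ** 2 for p in shares.values())
    g_star = g / (1.0 - 1.0 / k)

    # Eq. (3): coefficient of variation with the population SD (divide by K)
    mu = n_total / k
    sigma = math.sqrt(sum((n - mu) ** 2 for n in counts.values()) / k)
    cv = sigma / mu

    # Eq. (4): ratio to the uniform share, R_i = K * p_i
    ratios = {c: k * p for c, p in shares.items()}
    return j_prime, g_star, cv, ratios


# Per-class image counts from Table 4 (illustrative variable names)
REAL = {"Crack": 98, "Erosion": 76, "Mechanical damage": 26, "Paintoff": 374, "Scratch": 270}
REAL_GEN = {"Crack": 300, "Erosion": 300, "Mechanical damage": 300, "Paintoff": 374, "Scratch": 270}

if __name__ == "__main__":
    for name, counts in (("Real", REAL), ("Real + Gen", REAL_GEN)):
        j, g, cv, _ = evenness_indices(counts)
        print(f"{name}: J'={j:.3f} G*={g:.3f} CV={cv:.3f}")
```

Run as a script, it prints J′ = 0.807, G* = 0.848, and CV = 0.779 for the real data and J′ = 0.996, G* = 0.997, and CV = 0.112 for the real + generated data, consistent with Table 5.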

To limit over-reliance on generated data while improving class balance, the dataset-level synthetic share (SS) was capped at 50%. With $N_{Real}$ and $N_{Gen}$ denoting the numbers of real and generated images, the synthetic share is defined as:

SS = \frac{N_{Gen}}{N_{Real} + N_{Gen}}

(5)

This practice is consistent with synthetic-to-real studies, in which training on synthetic data alongside real data yields better accuracy than either source alone (Richter et al., 2016; Ros et al., 2016; Tremblay et al., 2018). These studies report that mixed (synthetic + real) training improves performance, but an excessive synthetic proportion can widen the domain gap. Recommended practices include domain randomization, realism refinement, and real-only validation to ensure real-world transferability. Accordingly, this work treats the 50% cap as a guardrail rather than an optimum: synthetic images are used primarily for minority-class balancing, and evaluation is performed exclusively on real-only validation and test splits.

This study pre-specified three criteria: $J^{'} \geq 0.98$, $R_{i} \in [0.85, 1.15]$ for all augmented classes, and $SS \leq 0.5$. The smallest of the candidate targets listed in Table 3 that met these criteria determined the number of generated images per minority class. See Tables 3–5 for the corresponding values and the selected target.
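The target-selection rule can be sketched as a feasibility check over candidate targets. This minimal sketch assumes, as in Table 6, that each augmented class (Crack, Erosion, Mechanical damage) is topped up to the target with generated images while Paintoff and Scratch keep their real counts; the helper names are hypothetical.

```python
import math

REAL = {"Crack": 98, "Erosion": 76, "Mechanical damage": 26, "Paintoff": 374, "Scratch": 270}
AUGMENTED = ("Crack", "Erosion", "Mechanical damage")


def augmented_counts(target):
    """Raise each augmented class to `target` images; other classes keep real counts."""
    return {c: max(n, target) if c in AUGMENTED else n for c, n in REAL.items()}


def synthetic_share(target):
    """Eq. (5): SS = N_Gen / (N_Real + N_Gen)."""
    n_real = sum(REAL.values())
    n_gen = sum(augmented_counts(target).values()) - n_real
    return n_gen / (n_real + n_gen)


def meets_criteria(target, j_min=0.98, band=(0.85, 1.15), ss_max=0.5):
    counts = augmented_counts(target)
    k, total = len(counts), sum(counts.values())
    h = -sum((n / total) * math.log(n / total) for n in counts.values())
    j_prime = h / math.log(k)
    # R_i = K * p_i, checked only for the augmented classes
    ratios_ok = all(band[0] <= k * counts[c] / total <= band[1] for c in AUGMENTED)
    return j_prime >= j_min and ratios_ok and synthetic_share(target) <= ss_max


def select_target(candidates=(200, 300, 400, 500)):
    """Smallest candidate target satisfying all pre-specified criteria, or None."""
    for target in sorted(candidates):
        if meets_criteria(target):
            return target
    return None
```

`select_target()` returns 300 for the candidate grid of Table 3, where 200 fails the ratio band (R_i ≈ 0.80 for each augmented class) and 300 gives SS ≈ 45.3%. Searching every integer instead, e.g. `select_target(range(98, 501))`, returns 224 under the same formulas, so the selected value of 300 follows from the candidate grid.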

Table 3.

Dataset synthetic share.

Target per augmented class	Synthetic share (%)
200	32.0
300	45.3
400	54.1
500	61.1

Table 4.

Per-class ratio to uniform share before and after augmentation.

Class	Real data			Real + Gen data
Class	Images	R_i	Status	Images	R_i	Status
Crack	98	0.581	Minority	300	0.972	Balanced
Erosion	76	0.450	Minority	300	0.972	Balanced
Mechanical damage	26	0.154	Minority	300	0.972	Balanced
Paintoff	374	2.216	Majority	374	1.211	Balanced
Scratch	270	1.600	Majority	270	0.874	Balanced

Table 5.

Global evenness indices before and after augmentation.

	Total	J′	G*	CV	Status
Real data	844	0.807	0.848	0.779	Imbalanced
Real + Gen data	1544	0.996	0.997	0.112	Balanced

Tables 3–5 report the dataset synthetic share, the per-class ratios to uniform, and the global evenness indices; based on these, 300 images per augmented class is the smallest candidate setting that meets all pre-specified criteria. Accordingly, the totals for Crack, Erosion, and Mechanical damage were each set to 300 images (real plus generated).

Synthetic data generation

Given the limited number of images for certain defect classes, synthetic data were generated with StyleGAN2-ADA (Karras et al., 2021) to enlarge the dataset and mitigate class imbalance. Synthetic data augmentation can play an important role in developing deep learning models that generalize across diverse real-world inspection conditions. StyleGAN2-ADA was configured and trained specifically for the classes with fewer images (Crack, Erosion, and Mechanical damage). Training iteratively optimized the network parameters so that the generated images resembled actual defects realistically and with diversity. Adaptive discriminator augmentation (ADA) was applied during training to prevent the discriminator from overfitting the small training sets, which stabilizes training and helps preserve output variability.

The generated images were first validated through manual inspection by domain experts to ensure visual authenticity and defect realism. To complement this qualitative assessment, the Fréchet Inception Distance (FID) (Heusel et al., 2017) was computed as a quantitative measure of how closely the generated images resemble real ones. Consistent with best practices in previous studies (Nunn et al., 2021; Zhao et al., 2021), only synthetic images achieving an FID score below a predefined threshold of 20 were considered sufficiently realistic and selected for augmentation. This dual-filtering strategy ensured that only high-quality synthetic data were incorporated into the dataset, reducing the risk of introducing artefacts that could adversely affect model performance. Table 6 shows the dataset before and after image generation, and Figures 3(a)–(c) show examples of generated images.
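For reference, FID compares Gaussian fits (mean and covariance) of two sets of feature vectors, which in Heusel et al. (2017) are Inception-v3 activations, so it is a statistic over image sets. The minimal NumPy sketch below assumes the features have already been extracted (not shown) and uses toy random features; the function names are hypothetical. Reported FID values are sensitive to the feature extractor and resizing details, so an established implementation would normally be used for the threshold of 20.

```python
import numpy as np


def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussian fits of two (n_samples, dim) feature arrays."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((cov_r cov_g)^{1/2}) equals Tr((S cov_g S)^{1/2}) with S = cov_r^{1/2};
    # the second form is symmetric PSD, so eigvalsh can be used safely.
    s = _sqrtm_psd(cov_r)
    middle = s @ cov_g @ s
    middle = (middle + middle.T) / 2.0
    trace_sqrt = np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))  # stand-in for Inception features of real images
    gen = real + 0.5                  # same covariance, mean shifted by 0.5 per dimension
    print(frechet_distance(real, real), frechet_distance(real, gen))  # ~0.0 and ~2.0
```

Identical feature sets give a distance of about zero, and a pure mean shift of 0.5 in each of 8 dimensions gives 8 × 0.5² = 2.0, since the covariance terms cancel.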

Table 6.

Dataset before and after image generation.

Class	Real data	Synthetic added	Real + Gen data
Crack	98	202	300
Erosion	76	224	300
Mechanical damage	26	274	300
Paintoff	374	0	374
Scratch	270	0	270
Total	844	700	1544

Figure 3.

Example of generated images of wind turbine blade with distress.

Dataset splitting

The stratified k-fold cross-validation approach was applied independently to two scenarios: (1) the original dataset of 844 real images and (2) the augmented dataset combining real images with synthetic images generated by StyleGAN2-ADA. This allowed direct comparison of model performance across datasets with and without synthetic images. A separate, fixed holdout test set comprising 20% of the original real images was created and used for all experiments to enable fair, direct comparisons. The remaining 80% of the real images formed the development set, which was partitioned using stratified five-fold cross-validation, preserving the class distribution in each fold. Within each CV iteration, models were trained on the four training folds (Real data or Real + Gen data) and validated on the remaining fold using real images only. The test set and validation folds were kept free of synthetic images to prevent leakage and optimistic bias (e.g., near-duplicates inflating accuracy) and to ensure that reported metrics reflect performance on real-world data (Nikolenko, 2021; Ros et al., 2016; Shorten and Khoshgoftaar, 2019; Tremblay et al., 2018). A schematic of the split and cross-validation procedure is provided in Figure S1.

Stratified k-fold cross-validation was adopted to provide robust, low-bias estimates of model performance, particularly in the presence of class imbalance. Unlike a single holdout validation split, stratified k-fold preserves the overall class distribution in each fold, avoiding the risk of underrepresenting minority classes in the validation set. Every development image, including those from rare classes such as Crack, Erosion, and Mechanical damage, serves as a validation instance exactly once, which makes the per-class performance metrics more reliable. Additionally, this method maximizes data utilization, reduces evaluation variance, and is widely recommended for limited and imbalanced datasets.
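The split protocol can be sketched in plain Python as below. The sketch assumes each real image carries a single class label (the per-class counts in Table 4 sum to the 844 images); images containing several defect types would instead call for a multi-label stratification scheme. Each class is shuffled with a fixed seed and dealt round-robin into folds. Names such as `stratified_holdout` and the `gen_` prefix for synthetic image IDs are hypothetical; the key property is that synthetic IDs only ever join the training side of a split.

```python
import random
from collections import defaultdict


def _group_by_class(indices, labels):
    groups = defaultdict(list)
    for idx in indices:
        groups[labels[idx]].append(idx)
    return groups


def stratified_holdout(labels, test_frac=0.2, seed=0):
    """Split real-image indices into (development, test), preserving class proportions."""
    rng = random.Random(seed)
    dev, test = [], []
    for _, idxs in sorted(_group_by_class(range(len(labels)), labels).items()):
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test += idxs[:n_test]
        dev += idxs[n_test:]
    return sorted(dev), sorted(test)


def stratified_folds(dev, labels, k=5, seed=0):
    """Deal each class's shuffled development indices round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for _, idxs in sorted(_group_by_class(dev, labels).items()):
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[j % k].append(idx)
    return folds


def cv_splits(folds, synthetic_ids=()):
    """Yield (train, val) pairs; synthetic images are added to training only."""
    for i, val in enumerate(folds):
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train + list(synthetic_ids), list(val)


# One label per real image, following the class counts in Table 4
LABELS = (["Crack"] * 98 + ["Erosion"] * 76 + ["Mechanical damage"] * 26
          + ["Paintoff"] * 374 + ["Scratch"] * 270)
```

With these counts, `stratified_holdout(LABELS)` yields a 169-image real-only test set (20%) and 675 development images, and `stratified_folds` spreads the 21 development images of Mechanical damage as 5/4/4/4/4 across the five folds.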

Cross-validation design and choice of folds

To prevent distributional leakage from synthetic data, only real images were used for validation (and for the fixed hold-out test set), as detailed previously. Of the 844 real images available, 20% were reserved as a real-only test set, leaving 80% for model development. This study adopted class-stratified K-fold cross-validation on this development set to preserve class proportions within each split. Under K = 5, the approximate number of real validation images per fold is: Crack ≈16, Erosion ≈12, Mechanical Damage ≈4, Paintoff ≈60, Scratch ≈43.

Increasing the granularity to K = 10 would reduce the minority-class sample size per fold (Mechanical Damage to ≈2), rendering class-wise average precision (AP) highly sensitive to a few instances and producing unstable precision–recall curves with wider fold-to-fold variance. Conversely, using fewer folds (e.g., K = 3) would raise per-fold counts (Mechanical Damage ≈7) but at the cost of (i) larger validation fractions per iteration and smaller training splits, potentially impairing model fitting, and (ii) fewer repeats, which increases the Monte-Carlo error of the cross-validated mean and inflates uncertainty in performance estimates.

Balancing these considerations, K = 5 is the largest fold count that preserves ∼4–5 real validation examples in the rarest class per fold while still averaging over multiple stratified splits. This choice jointly promotes (i) fold-wise metric stability for minority classes and (ii) variance reduction of summary estimates through repeated re-sampling. Throughout, mean ± standard deviation (SD) was reported across the five validation folds, which quantifies residual variability due to sampling across cross-validation partitions.
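The fold-size trade-off above is simple arithmetic. The sketch below (hypothetical helper names) computes the expected number of real validation images per class and fold after the 20% holdout, together with the share of development data used for training in each iteration.

```python
REAL = {"Crack": 98, "Erosion": 76, "Mechanical damage": 26, "Paintoff": 374, "Scratch": 270}


def expected_val_per_fold(counts, k, dev_frac=0.8):
    """Expected real validation images per class in one fold after the 20% holdout."""
    return {c: n * dev_frac / k for c, n in counts.items()}


def train_fraction(k):
    """Share of the development set used for training in each CV iteration."""
    return (k - 1) / k


if __name__ == "__main__":
    for k in (3, 5, 10):
        md = expected_val_per_fold(REAL, k)["Mechanical damage"]
        print(f"K={k}: ~{md:.1f} Mechanical damage images per fold, "
              f"train fraction {train_fraction(k):.2f}")
```

For Mechanical damage this gives about 6.9, 4.2, and 2.1 validation images per fold for K = 3, 5, and 10, with training fractions of 0.67, 0.80, and 0.90, in line with the approximate counts quoted above.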

Image annotation

Annotating images is essential for training object detection models, as it enables them to learn both the positions and categories of objects present within each image. All images, including those from the original dataset and the newly generated samples, were carefully annotated to ensure high-quality ground truth data. The annotation process was performed manually using the open-source software LabelImg (Tzutalin, 2015), which offers an effective interface for bounding box annotation. Manual annotation was chosen to promote accuracy and consistency throughout the entire dataset.
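Because the dataset is distributed in both YOLO and COCO standards, pixel-coordinate boxes (for example, as saved by LabelImg in its Pascal VOC mode) have to be converted between conventions. A minimal sketch of the two conversions follows; the helper names are hypothetical, and the class index order would follow the dataset's own class list.

```python
def voc_to_yolo(box, img_w, img_h):
    """(xmin, ymin, xmax, ymax) in pixels -> YOLO (x_center, y_center, width, height) in [0, 1]."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2 / img_w, (ymin + ymax) / 2 / img_h,
            (xmax - xmin) / img_w, (ymax - ymin) / img_h)


def voc_to_coco(box):
    """(xmin, ymin, xmax, ymax) in pixels -> COCO [x_min, y_min, width, height] in pixels."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def yolo_line(class_id, box, img_w, img_h):
    """One line of a YOLO label file: class index followed by four normalized values."""
    xc, yc, w, h = voc_to_yolo(box, img_w, img_h)
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

For a 640 × 640 image, the pixel box (100, 50, 300, 250) becomes the YOLO line `2 0.312500 0.234375 0.312500 0.312500` for class index 2 and the COCO box `[100, 50, 200, 200]`.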

Model architectures and selection

This study conducts a comprehensive comparison of modern object detection frameworks, focusing on the YOLO family (YOLOv8, YOLOv9, YOLOv10, YOLOv11) (Jocher et al., 2023, 2024; Wang et al., 2024; Wang and Zhang, 2017) and the standard two-stage Faster R-CNN (with a ResNet-50-FPN backbone) (Ren et al., 2017). The motivation for this dual selection is twofold: (1) to benchmark the most recent advancements in real-time single-stage detection against a widely recognized two-stage baseline and (2) to discuss their respective strengths and limitations for wind turbine blade defect detection, particularly in settings characterized by limited annotated data and significant class imbalance.

YOLO models

This study evaluates four recent versions of the You Only Look Once (YOLO) object detection family: YOLOv8, YOLOv9, YOLOv10, and YOLOv11. The YOLO series is recognized for its single-stage design, enabling real-time detection by predicting bounding boxes and class probabilities in a single network pass (Jocher et al., 2023). These characteristics make YOLO models particularly suitable for large-scale and edge-based inspection tasks, where speed and computational efficiency are crucial (Hussain, 2023).

YOLOv8, developed by Ultralytics, uses a fully convolutional architecture with an anchor-free, decoupled detection head, improving both detection accuracy and speed (Jocher et al., 2023). YOLOv9 incorporates programmable gradient information (PGI) and the generalized efficient layer aggregation network (GELAN), which mitigate information loss during feature extraction and improve computational efficiency; these improvements contribute to more reliable detection of small or difficult defects (Wang and Zhang, 2017). YOLOv10 builds on these advances with consistent dual assignments for NMS-free training and an efficiency- and accuracy-driven model design, which further improves detection accuracy and robustness, a property that is particularly beneficial for industrial defect detection (Wang et al., 2024a). YOLOv11, the newest version at the time of this study, features a redesigned backbone and neck, optimized attention mechanisms, and a streamlined training pipeline, providing improved feature extraction, faster inference, and higher accuracy on challenging detection problems (Jocher et al., 2024).

All YOLO models in this work were implemented using the official Ultralytics Python library. The “large” variants were chosen for their balance between model capacity and computational efficiency, and for their suitability for deployment on GPU-accelerated cloud platforms and, potentially, on resource-constrained edge devices. An overview of the YOLO architecture is shown in Figure 4.

Figure 4.

Overview of image-based distress detection using YOLO models architecture.

All models were initialized with weights pre-trained on the COCO dataset and trained for up to 100 epochs. The batch size, learning rate, momentum, optimizer, and other key training parameters were treated as tunable hyperparameters and optimized using Bayesian Optimization, as described in the Hyperparameter optimization subsection below.
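A minimal sketch of how a Table 8 configuration could be passed to the Ultralytics trainer is shown below, using the YOLOv11L columns as an example. It assumes the `ultralytics` package with its COCO-pretrained `yolo11l.pt` weights and a hypothetical per-fold dataset file such as `wtb_fold0.yaml`; mapping the Table 8 rows onto the trainer arguments `lr0`, `lrf`, `momentum`, `weight_decay`, `batch`, `dropout`, and `optimizer` is an assumption about how the table was applied. Ultralytics is imported only inside `train_yolo11l`, so the configuration helper runs without it.

```python
# Table 8 values for YOLOv11L; the dictionary layout and helper names are hypothetical.
TABLE8_YOLO11L = {
    "real": {"lr0": 0.01, "lrf": 0.01, "momentum": 0.9, "weight_decay": 0.0005,
             "batch": 16, "dropout": 0.2, "optimizer": "SGD"},
    "real+gen": {"lr0": 0.0001, "lrf": 0.01, "momentum": 0.7, "weight_decay": 0.0005,
                 "batch": 8, "dropout": 0.0, "optimizer": "SGD"},
}


def yolo_train_kwargs(scenario, data_yaml, epochs=100, imgsz=640):
    """Merge tuned hyperparameters with run settings for Ultralytics' model.train()."""
    kwargs = dict(TABLE8_YOLO11L[scenario])  # copy so the table itself is not mutated
    kwargs.update(data=data_yaml, epochs=epochs, imgsz=imgsz)
    return kwargs


def train_yolo11l(scenario, data_yaml):
    """Fine-tune COCO-pretrained YOLO11-L; requires `pip install ultralytics`."""
    from ultralytics import YOLO  # imported lazily so the helper above runs without it

    model = YOLO("yolo11l.pt")
    return model.train(**yolo_train_kwargs(scenario, data_yaml))
```

Note that the Ultralytics documentation describes `dropout` as a regularization setting for classification tasks, so its effect on detection training may be limited.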

Faster R-CNN with ResNet-50-FPN

Faster R-CNN, a widely adopted two-stage object detector, was included in this study as a benchmark against the single-stage YOLO architectures. This model is recognized for its high localization and classification accuracy, particularly where detection precision matters more than real-time inference speed (Ren et al., 2017). The Faster R-CNN framework comprises three key components: a deep convolutional neural network backbone for feature extraction, a region proposal network (RPN) for identifying potential object regions, and dedicated heads for classification and bounding box regression (Carranza-García et al., 2021). This study used a ResNet-50 backbone with a feature pyramid network (FPN), following the PyTorch torchvision implementation (Figure 5). The FPN enables the model to leverage features at multiple scales, improving its ability to identify defects of varying sizes and shapes on wind turbine blades. Training used the same data splits as the YOLO models, with input images standardized to 640 × 640 pixels for uniformity. Core hyperparameters, including the initial learning rate, batch size, weight decay, RPN anchor configuration, and learning rate scheduler, were tuned via Bayesian Optimization as outlined in Section 2.2.3.
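The torchvision setup described above might be assembled as in the following sketch. `fasterrcnn_resnet50_fpn`, `AnchorGenerator`, and `FastRCNNPredictor` are torchvision components, but how the tuned "RPN anchor base" of Table 9 maps onto anchor sizes is not specified in the text, so `fpn_anchor_sizes` assumes one anchor size per FPN level that doubles from the base while keeping torchvision's default three aspect ratios. `min_size` and `max_size` are set to 640 to mirror the 640 × 640 inputs, and torchvision is imported lazily inside `build_faster_rcnn`; all helper names are hypothetical.

```python
NUM_DEFECT_CLASSES = 5  # Crack, Erosion, Mechanical damage, Paintoff, Scratch


def fpn_anchor_sizes(base, num_levels=5):
    """Assumed mapping: one anchor size per FPN level, doubling from `base` pixels."""
    return tuple((base * 2 ** level,) for level in range(num_levels))


def build_faster_rcnn(anchor_base=16, rpn_nms_thresh=0.9, image_size=640):
    """Faster R-CNN ResNet-50-FPN with a five-defect-class head (requires torchvision)."""
    import torchvision
    from torchvision.models.detection.anchor_utils import AnchorGenerator
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
        weights="DEFAULT",              # COCO-pretrained weights
        rpn_nms_thresh=rpn_nms_thresh,
        min_size=image_size,            # keep the internal resize at 640 px
        max_size=image_size,
    )
    sizes = fpn_anchor_sizes(anchor_base)
    # Three aspect ratios per level keep the RPN head's anchor count per location (3).
    model.rpn.anchor_generator = AnchorGenerator(
        sizes=sizes, aspect_ratios=((0.5, 1.0, 2.0),) * len(sizes)
    )
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # torchvision reserves class 0 for background, hence the +1.
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_DEFECT_CLASSES + 1)
    return model
```

Under this assumption, `build_faster_rcnn(anchor_base=16)` and `build_faster_rcnn(anchor_base=32)` would correspond to the Real data and Real + Generated columns of Table 9, respectively; a base of 32 reproduces torchvision's default sizes of 32 to 512 pixels.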

Figure 5.

Overview of Faster RCNN model architecture.

For a fair comparative analysis, all models were trained and evaluated using the same datasets, image preprocessing pipelines, and cross-validation splits. Both the real and the augmented (real + generated) datasets were used to assess model robustness under varying data availability and class imbalance. Training and evaluation were also performed under equivalent computational settings, so that observed differences in performance can be attributed to the models themselves rather than to variations in the experimental setup. The overall methodology is illustrated in Figure 6.

Figure 6.

Illustration of the proposed methodology.

The training experiments were conducted on Google Colab Pro, with 83 GB of system RAM, 40 GB of GPU memory, and 113 GB of disk space. An NVIDIA A100-SXM4-40GB GPU (CUDA 12.4, driver version 550.54.15) was used in this environment, enabling experimental runs with various learning rates and epoch values across different models.

Hyperparameter optimization

The model hyperparameters were tuned with Bayesian Optimization (BO), which is recognized as an effective and efficient method for tuning deep learning hyperparameters, particularly when each evaluation requires a computationally intensive training run (Snoek et al., 2012). Unlike manual, grid, or random search, BO builds a surrogate model of the objective function and iteratively proposes hyperparameter configurations that are likely to yield improved results, thereby reaching well-performing settings with fewer trials.

Optimization was carried out using the Optuna framework, which provides sequential model-based samplers (by default, the tree-structured Parzen estimator) and can wrap PyTorch and Ultralytics training workflows (Akiba et al., 2019). For each model, BO was used to optimize key training hyperparameters, including the initial and final learning rates, momentum, weight decay, optimizer choice, dropout rate, and batch size; NMS thresholds and anchor settings were additionally included for Faster R-CNN. Each BO trial involved training the model on a fold from the stratified cross-validation and assessing performance using the primary metric, mean Average Precision at IoU 0.5 (mAP@0.5). The best hyperparameter configuration for each model and experimental setting (real data vs real + synthetic data) was selected based on the highest validation mAP@0.5 and subsequently applied for the final cross-validation and test set evaluation. Tables 7–9 summarize the hyperparameter search space and the optimal configurations identified for each model.
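The search space in Table 7 can be expressed as categorical choices for Optuna, as in the sketch below. `suggest_categorical` is Optuna's trial API; `train_and_validate` is a hypothetical callable that trains on one cross-validation fold (70 epochs, per Table 7) and returns validation mAP@0.5, and assigning the scheduler and RPN entries only to Faster R-CNN is an assumption guided by Tables 8 and 9. `FirstChoiceTrial` is a hypothetical stand-in for dry runs without Optuna installed.

```python
# Table 7 choices; the split into shared and Faster R-CNN-only entries is an assumption.
SHARED_SPACE = {
    "lr0": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
    "lrf": [1e-4, 1e-3, 1e-2, 1e-1],
    "momentum": [0.3, 0.4, 0.5, 0.7, 0.9],
    "weight_decay": [1e-4, 1e-3, 1e-2, 5e-4, 5e-3],
    "batch": [8, 16, 32],
    "dropout": [0.0, 0.1, 0.2, 0.3],
    "optimizer": ["SGD", "AdamW", "Adam"],
}
FASTER_RCNN_EXTRA = {
    "lr_scheduler": ["none", "step", "cosine"],
    "rpn_anchor_base": [8, 16, 32, 64],
    "rpn_nms_thresh": [0.4, 0.5, 0.7, 0.9],
}


def suggest_params(trial, faster_rcnn=False):
    """Sample one configuration through the trial's suggest_categorical method."""
    space = {**SHARED_SPACE, **(FASTER_RCNN_EXTRA if faster_rcnn else {})}
    return {name: trial.suggest_categorical(name, choices) for name, choices in space.items()}


def make_objective(train_and_validate, faster_rcnn=False):
    """Wrap a (hypothetical) fold-level train/validate routine that returns mAP@0.5."""
    def objective(trial):
        return train_and_validate(suggest_params(trial, faster_rcnn))
    return objective


class FirstChoiceTrial:
    """Stand-in for optuna.trial.Trial that always picks the first choice (dry runs)."""
    def suggest_categorical(self, name, choices):
        return choices[0]
```

With Optuna installed, `study = optuna.create_study(direction="maximize")` followed by `study.optimize(make_objective(train_and_validate), n_trials=50)` would run the search, and `study.best_params` would hold the selected configuration; the budget of 50 trials is illustrative only.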

Table 7.

Hyperparameter search space.

Component	Search space	Rationale
Initial learning rate (lr0)	1e-5, 1e-4, 1e-3, 1e-2, 1e-1	Covers the standard 10× ladder for detectors initialized from COCO, allowing BO to pick stable warm-starts for both large and small batches.
Final learning rate (lrf)	1e-4, 1e-3, 1e-2, 1e-1	Controls the terminal LR under cosine and step decay (in Ultralytics, the final LR is lr0 × lrf), preventing under- or over-training late in training.
Momentum	0.3, 0.4, 0.5, 0.7, 0.9	Spans low- to high-momentum regimes; higher values improve stability for batch-normalized backbones, while lower values can help when batches are small.
Weight decay	1e-4, 5e-4, 1e-3, 5e-3, 1e-2	Regularizes large models; the range covers common detector values plus slightly stronger penalties for when minority classes cause overfitting.
Batch size	8, 16, 32	Constrained by GPU memory at 640 × 640 inputs; BO tunes the LR jointly with the batch size to keep training stable.
Dropout	0.0, 0.1, 0.2, 0.3	Optional regularization for the heads; lets BO choose how much regularization is needed as sample diversity changes between the real and real + synthetic settings.
Optimizer	SGD, AdamW, Adam	Compares momentum SGD (often best for detectors) with Adam/AdamW (often faster to converge, which can help on class-imbalanced data).
LR scheduler	None, step, cosine	Step and cosine decay are standard for detectors; “none” lets BO choose a constant LR when decay is harmful.
RPN anchor base (Faster R-CNN)	8, 16, 32, 64	Matches anchor scale to defect size, from small scratches to large regions (e.g., paintoff); the FPN handles multi-scale features, while the base size tunes proposal density.
RPN NMS thresh. (Faster R-CNN)	0.4, 0.5, 0.7, 0.9	Trades duplicate suppression against recall on clustered defects, letting BO adapt to how densely defects crowd together.
Initial epoch	70	Number of epochs per trial during the hyperparameter search; provides a warm-up and convergence window while avoiding late-epoch overfitting, within our compute budget.
Final epoch	100	Number of epochs used to retrain each model with its best hyperparameters.
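The search in Table 7 can be expressed as an Optuna-style objective. The sketch below is a minimal, hypothetical illustration rather than the authors' code: it encodes the Table 7 candidates as categorical choices through Optuna's `trial.suggest_categorical`, assumes the LR-scheduler row applies only to Faster R-CNN (only Table 9 reports it), and delegates training to a placeholder `train_and_validate_fold` that is expected to return validation mAP@0.50 on one fold.

```python
# Minimal sketch of the Table 7 search space, written against Optuna's Trial
# API (trial.suggest_categorical). Assumptions: the scheduler row applies to
# Faster R-CNN only, and `train_and_validate_fold` is a hypothetical placeholder
# for one training run followed by validation on a cross-validation fold.
from typing import Any, Callable, Dict, List

# Candidates shared by all models (weight decay listed in ascending order).
COMMON_SPACE: Dict[str, List[Any]] = {
    "lr0": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
    "lrf": [1e-4, 1e-3, 1e-2, 1e-1],
    "momentum": [0.3, 0.4, 0.5, 0.7, 0.9],
    "weight_decay": [1e-4, 5e-4, 1e-3, 5e-3, 1e-2],
    "batch": [8, 16, 32],
    "dropout": [0.0, 0.1, 0.2, 0.3],
    "optimizer": ["SGD", "AdamW", "Adam"],
}

# Extra dimensions searched only for Faster R-CNN.
FRCNN_EXTRA: Dict[str, List[Any]] = {
    "lr_scheduler": ["none", "step", "cosine"],
    "rpn_anchor_base": [8, 16, 32, 64],
    "rpn_nms_thresh": [0.4, 0.5, 0.7, 0.9],
}

SEARCH_EPOCHS = 70  # "Initial epoch" in Table 7


def sample_hparams(trial: Any, faster_rcnn: bool = False) -> Dict[str, Any]:
    """Draw one configuration; every dimension is a categorical choice."""
    space = dict(COMMON_SPACE)
    if faster_rcnn:
        space.update(FRCNN_EXTRA)
    return {name: trial.suggest_categorical(name, choices)
            for name, choices in space.items()}


def make_objective(train_and_validate_fold: Callable[[Dict[str, Any]], float],
                   faster_rcnn: bool = False) -> Callable[[Any], float]:
    """Build an objective that returns validation mAP@0.50 (to be maximized)."""
    def objective(trial: Any) -> float:
        params = sample_hparams(trial, faster_rcnn)
        return train_and_validate_fold({**params, "epochs": SEARCH_EPOCHS})
    return objective
```

With Optuna installed, `study = optuna.create_study(direction="maximize")` followed by `study.optimize(objective, n_trials=...)` would run the search, and `study.best_params` would hold the winning configuration. Optuna's default sampler (TPE) is a sequential model-based, i.e. Bayesian, method; the trial budget is not stated in the text and is left open here.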

Table 8.

Best hyperparameters for the YOLO models.

Hyperparameter	YOLOv8L (Real)	YOLOv8L (Real + Gen)	YOLOv9L (Real)	YOLOv9L (Real + Gen)	YOLOv10L (Real)	YOLOv10L (Real + Gen)	YOLOv11L (Real)	YOLOv11L (Real + Gen)
Initial learning rate (lr0)	0.0001	0.01	0.0001	0.01	0.0001	0.0001	0.01	0.0001
Final learning rate (lrf)	0.1	0.1	0.1	0.01	0.01	0.01	0.01	0.01
Momentum	0.9	0.7	0.7	0.9	0.9	0.7	0.9	0.7
Weight decay	0.0005	0.0001	0.0005	0.0001	0.0005	0.0001	0.0005	0.0005
Batch size	16	8	16	8	16	8	16	8
Dropout	0.2	0.0	0.2	0.0	0.0	0.2	0.2	0.0
Optimizer	Adam	SGD	Adam	SGD	Adam	Adam	SGD	SGD
Initial epoch	70	70	70	70	70	70	70	70
Final epoch	100	100	100	100	100	100	100	100

Table 9.

Best hyperparameters for Faster R-CNN.

Hyperparameter	Real data	Real + Generated data
Learning rate	0.0001	0.0001
Weight decay	0.0005	0.0005
Batch size	8	4
LR scheduler	Step	Step
RPN anchor base	16	32
RPN NMS threshold	0.9	0.9
Initial epoch	70	70
Final epoch	100	100

Evaluation metrics

The performance of the models on the wind turbine blade dataset was evaluated using standard metrics: precision, recall, and mean Average Precision (mAP) at different intersection over union (IoU) thresholds. These metrics quantify how accurately and reliably each model’s predicted bounding boxes match the ground-truth annotations. Precision is the proportion of the model’s positive detections that are correct, and recall is the proportion of ground-truth objects that the model detects (Padilla et al., 2020). They are defined in equations (6) and (7), respectively.

\text{Precision} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP)} + \text{False Positive (FP)}}

(6)

\text{Recall} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP)} + \text{False Negative (FN)}}

(7)

A true positive (TP) occurs when a predicted box matches an annotated object of the same class with an IoU at or above the chosen threshold. A false positive (FP) occurs when the model predicts an object that has no matching ground-truth box (including duplicate detections of an already matched object), and a false negative (FN) occurs when the model fails to detect an object that is present in the ground-truth annotations (Padilla et al., 2020). A higher precision therefore means fewer false positives, while a higher recall means fewer missed objects, that is, greater sensitivity to the target defects.
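To make the TP/FP/FN definitions concrete, the sketch below counts them for one class in one image and then applies equations (6) and (7). It assumes corner-format boxes `(x1, y1, x2, y2)` and uses a simplified COCO-style greedy matching (predictions visited by descending confidence, no crowd or ignore regions); the function names are illustrative and are not the Ultralytics or pycocotools API.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corners in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_tp_fp_fn(preds: Sequence[Tuple[Box, float]], gts: Sequence[Box],
                   iou_thr: float = 0.5) -> Tuple[int, int, int]:
    """Greedy one-to-one matching for a single class in a single image.

    Each prediction (box, confidence), taken in descending confidence, claims the
    unmatched ground-truth box with the highest IoU if that IoU >= iou_thr (TP);
    otherwise it is a FP. Ground-truth boxes left unmatched are FNs.
    """
    matched = [False] * len(gts)
    tp = fp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            if not matched[j]:
                overlap = iou(box, gt)
                if overlap >= best_iou:
                    best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1
    return tp, fp, matched.count(False)


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0


# Toy image: two defects; one good prediction, one stray box, one duplicate.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((1, 1, 10, 10), 0.9), ((50, 50, 60, 60), 0.8), ((0, 0, 10, 10), 0.7)]
tp, fp, fn = count_tp_fp_fn(preds, gts)
print(f"TP={tp} FP={fp} FN={fn} P={precision(tp, fp):.3f} R={recall(tp, fn):.3f}")
# prints: TP=1 FP=2 FN=1 P=0.333 R=0.500
```

The duplicate box at confidence 0.7 overlaps an already matched defect, so it is counted as a false positive, which is why duplicates lower precision but not recall.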

Mean Average Precision (mAP) summarizes detection performance across multiple object classes. First, the average precision (AP) of each class is computed as the area under its precision-recall curve, which captures the trade-off between precision and recall as the confidence threshold varies. The mAP is then the mean of the per-class AP values. A higher mAP indicates better overall detection performance, that is, a better precision–recall trade-off on average across the target classes. For N classes, it is expressed as:

\text{mAP} = \frac{1}{N} \sum_{i=1}^{N} \text{AP}_i

(8)
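The AP and mAP definitions can be sketched for a single class, assuming its detections have already been matched to ground truth and sorted by confidence. The sketch follows the pycocotools convention of a monotone precision envelope sampled at 101 recall points; Ultralytics also samples 101 recall points, but implementation details (such as the integration rule) may differ slightly, so this is an illustration rather than either library's exact code.

```python
from typing import Dict, Sequence


def average_precision(tp_flags: Sequence[bool], n_gt: int) -> float:
    """AP for one class with COCO-style 101-point interpolation.

    tp_flags lists the class's detections sorted by descending confidence,
    True where the detection was matched to a ground-truth box (TP).
    n_gt is the number of ground-truth boxes of this class.
    """
    if n_gt == 0:
        return 0.0
    precisions, recalls = [], []
    tp = fp = 0
    for is_tp in tp_flags:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # Precision envelope: each rank takes the maximum precision over itself
    # and all lower-ranked detections, making precision non-increasing in recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    total = 0.0
    for i in range(101):
        r = i / 100
        # Precision at the first rank whose recall reaches r (0 if never reached).
        total += next((p for p, rc in zip(precisions, recalls) if rc >= r), 0.0)
    return total / 101


def mean_average_precision(ap_per_class: Dict[str, float]) -> float:
    """Equation (8): unweighted mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)
```

For example, a class with two ground-truth boxes whose ranked detections are `[True, False]` reaches recall 0.5 and never more, so only the 51 recall points from 0.00 to 0.50 receive precision 1 and AP = 51/101.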

mAP@0.50 is the mAP computed at a single IoU threshold of 0.50, a lenient localization criterion. mAP@[0.50:0.95] is the mean of the mAP values computed at ten IoU thresholds from 0.50 to 0.95 in steps of 0.05 (the COCO convention), so it also rewards tight localization.
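The difference between the two summary metrics can be illustrated with a toy case: mAP@[0.50:0.95] averages the per-threshold mAP over the ten COCO thresholds, so a prediction that overlaps its object with IoU 0.72 earns full credit at five thresholds and none at the other five. The per-threshold function in the sketch is a hypothetical stand-in for a real evaluation run.

```python
from typing import Callable, List

# COCO IoU thresholds 0.50, 0.55, ..., 0.95 (rounded to avoid float drift).
COCO_IOU_THRESHOLDS: List[float] = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def map_50_95(map_at: Callable[[float], float]) -> float:
    """Average a per-threshold mAP function over the ten COCO thresholds."""
    return sum(map_at(t) for t in COCO_IOU_THRESHOLDS) / len(COCO_IOU_THRESHOLDS)


# Toy case: one class, one object, one prediction with IoU 0.72 to that object.
# At thresholds <= 0.72 it is a TP (AP = 1); above, it becomes a FP and the
# object a FN, so AP = 0.
toy_iou = 0.72
print(map_50_95(lambda t: 1.0 if toy_iou >= t else 0.0))  # 0.5
```

The same prediction scores 1.0 under mAP@0.50 but only 0.5 under mAP@[0.50:0.95], which is why the stricter metric is the more informative one for localization quality.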

All evaluation metrics for the YOLO models were obtained with the built-in validation routines of the Ultralytics framework, which follow COCO-style definitions. For Faster R-CNN, metrics were calculated with the pycocotools library so that all models are evaluated under comparable definitions. Results are reported as mean ± standard deviation over the stratified 5-fold cross-validation and as single values on the independent holdout test set.
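The text does not detail how the stratified folds were built for multi-label detection images or which standard deviation is reported. The sketch below shows one plausible scheme, stated as an assumption: each image is keyed by its rarest class, images are dealt round-robin across five folds, and fold scores are summarized as mean ± sample SD (n − 1). The function names are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(image_labels: Dict[str, Sequence[str]],
                     class_freq: Dict[str, int], k: int = 5) -> List[List[str]]:
    """Assign images to k folds, stratifying on each image's rarest class.

    Hypothetical scheme: images are grouped by the least frequent class they
    contain, and each group is dealt round-robin across the folds so that rare
    classes are spread evenly. Every image must carry at least one label.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for image, labels in sorted(image_labels.items()):
        rarest = min(labels, key=lambda c: (class_freq[c], c))
        groups[rarest].append(image)
    folds: List[List[str]] = [[] for _ in range(k)]
    position = 0
    # Deal the rarest groups first; the running position keeps fold sizes balanced.
    for cls in sorted(groups, key=lambda c: (class_freq[c], c)):
        for image in groups[cls]:
            folds[position % k].append(image)
            position += 1
    return folds


def mean_sd(values: Sequence[float]) -> str:
    """Format fold scores as 'mean ± SD' using the sample SD (n - 1)."""
    return f"{statistics.mean(values):.3f} ± {statistics.stdev(values):.3f}"
```

Using the sample SD rather than the population SD (`statistics.pstdev`) gives slightly larger values with only five folds, so the convention matters when comparing the ± figures across studies.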

Results

This section compares the performance of the YOLO models (v8, v9, v10, v11) and Faster R-CNN on the wind turbine blade dataset. Results are reported both for the original dataset and for the augmented dataset that includes StyleGAN2-ADA-generated synthetic images. Models were evaluated with stratified 5-fold cross-validation, using mean Average Precision at IoU 0.5 (mAP@0.50) as the primary metric, alongside per-class results and standard deviations that reflect robustness across folds.

YOLO models and Faster R-CNN performance on the validation set

Table 10 reports per-class AP@0.50 and the macro mAP@0.50 (mean ± SD, five folds) on the real-only validation folds. The top panel displays models trained on real images, while the bottom panel shows models trained on real images combined with generated images.

Table 10.

Per-class AP@0.50 and mAP@0.50 (mean ± SD, 5-fold) on validation folds.

Model	Crack	Erosion	Mechanical damage	Paintoff	Scratch	All (macro mAP@0.50)
Real data
YOLOv8L	0.995 ± 0.000	0.936 ± 0.051	0.772 ± 0.105	0.962 ± 0.015	0.926 ± 0.051	0.918 ± 0.019
YOLOv9L	0.995 ± 0.000	0.938 ± 0.045	0.783 ± 0.072	0.963 ± 0.015	0.941 ± 0.029	0.924 ± 0.011
YOLOv10L	0.993 ± 0.002	0.948 ± 0.045	0.766 ± 0.144	0.942 ± 0.027	0.944 ± 0.029	0.919 ± 0.029
YOLOv11L	0.995 ± 0.002	0.954 ± 0.036	0.805 ± 0.108	0.960 ± 0.025	0.946 ± 0.027	0.932 ± 0.017
Faster R-CNN	0.990 ± 0.009	0.872 ± 0.049	0.569 ± 0.105	0.954 ± 0.016	0.620 ± 0.076	0.801 ± 0.020
Real + Gen data
YOLOv8L	0.967 ± 0.023	0.978 ± 0.025	0.961 ± 0.034	0.961 ± 0.016	0.940 ± 0.020	0.961 ± 0.007
YOLOv9L	0.966 ± 0.028	0.982 ± 0.017	0.968 ± 0.026	0.965 ± 0.020	0.947 ± 0.020	0.966 ± 0.011
YOLOv10L	0.967 ± 0.018	0.981 ± 0.018	0.960 ± 0.023	0.933 ± 0.035	0.947 ± 0.024	0.958 ± 0.011
YOLOv11L	0.971 ± 0.025	0.978 ± 0.025	0.969 ± 0.022	0.970 ± 0.013	0.945 ± 0.024	0.967 ± 0.007
Faster R-CNN	0.963 ± 0.024	0.958 ± 0.017	0.939 ± 0.021	0.970 ± 0.023	0.902 ± 0.046	0.946 ± 0.013

In the top panel, all YOLO variants substantially outperform Faster R-CNN, especially on the most under-represented class (mechanical damage), where Faster R-CNN attains AP@0.50 = 0.569 versus 0.766–0.805 for the YOLO models. YOLOv11L achieves the highest macro mAP@0.50 (0.932 ± 0.017), followed by YOLOv9L (0.924 ± 0.011), with YOLOv10L and YOLOv8L nearly tied (0.919 and 0.918). Performance remains lower on mechanical damage (e.g., YOLOv11L: 0.805 ± 0.108, with a large fold-to-fold spread) than on the other classes (≈0.96 on average), reflecting its scarcity and complexity.

In the bottom panel, adding synthetic images improves the macro mAP@0.50 of every architecture, with the largest gains on the minority class. Across YOLO variants, macro mAP@0.50 spans 0.958–0.967 (YOLOv8L: 0.961 ± 0.007; YOLOv9L: 0.966 ± 0.011; YOLOv10L: 0.958 ± 0.011; YOLOv11L: 0.967 ± 0.007), so YOLOv11L remains marginally best on average, essentially tied with YOLOv9L. Faster R-CNN also improves, from 0.801 ± 0.020 to 0.946 ± 0.013, largely through gains on mechanical damage and scratch, although it still trails the YOLO models. For the most challenging and under-represented class, YOLOv11L rises from 0.805 ± 0.108 (Real) to 0.969 ± 0.022 (Real + Gen), underscoring the benefit of synthetic augmentation for minority-class detection. The gains are not uniform, however: crack AP@0.50 falls slightly for all models (from 0.990–0.995 to 0.963–0.971).
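Because averaging over folds and averaging over classes are both arithmetic means, they commute: the mean over folds of the macro mAP@0.50 equals the mean over classes of the fold-averaged per-class AP (the SD column does not share this property). The “All” column of Table 10 should therefore equal the mean of the five per-class means up to rounding, which the short check below confirms using the table's own values; it assumes every class is present in every validation fold.

```python
# Per-class mean AP@0.50 from Table 10 (crack, erosion, mechanical damage,
# paintoff, scratch) and the reported "All" column.
TABLE10 = {
    ("Real", "YOLOv8L"): ([0.995, 0.936, 0.772, 0.962, 0.926], 0.918),
    ("Real", "YOLOv9L"): ([0.995, 0.938, 0.783, 0.963, 0.941], 0.924),
    ("Real", "YOLOv10L"): ([0.993, 0.948, 0.766, 0.942, 0.944], 0.919),
    ("Real", "YOLOv11L"): ([0.995, 0.954, 0.805, 0.960, 0.946], 0.932),
    ("Real", "Faster R-CNN"): ([0.990, 0.872, 0.569, 0.954, 0.620], 0.801),
    ("Real + Gen", "YOLOv8L"): ([0.967, 0.978, 0.961, 0.961, 0.940], 0.961),
    ("Real + Gen", "YOLOv9L"): ([0.966, 0.982, 0.968, 0.965, 0.947], 0.966),
    ("Real + Gen", "YOLOv10L"): ([0.967, 0.981, 0.960, 0.933, 0.947], 0.958),
    ("Real + Gen", "YOLOv11L"): ([0.971, 0.978, 0.969, 0.970, 0.945], 0.967),
    ("Real + Gen", "Faster R-CNN"): ([0.963, 0.958, 0.939, 0.970, 0.902], 0.946),
}


def macro_map(per_class_ap):
    """Equation (8) applied to the fold-averaged per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


for (setting, model), (aps, reported) in TABLE10.items():
    # Every entry is rounded to 3 decimals, so agreement within 0.001 is expected.
    print(f"{setting:<11} {model:<13} recomputed={macro_map(aps):.4f} reported={reported:.3f}")
```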

YOLO models and Faster R-CNN performance on the holdout test set

To further evaluate generalization and practical deployment potential, we assessed the trained models on an independent holdout test set of images not used during training or validation. Tables 11 and 12 report per-class AP@0.50 and aggregate metrics, respectively.

Table 11.

Per-class AP@0.50 of the models on the holdout test set.

Class	YOLOv8L (Real)	YOLOv8L (Real + Gen)	YOLOv9L (Real)	YOLOv9L (Real + Gen)	YOLOv10L (Real)	YOLOv10L (Real + Gen)	YOLOv11L (Real)	YOLOv11L (Real + Gen)	Faster R-CNN (Real)	Faster R-CNN (Real + Gen)
Crack	0.99	0.99	0.99	0.99	0.99	0.99	0.99	0.99	0.95	0.95
Erosion	0.99	0.96	0.99	0.95	0.99	0.97	0.99	0.96	0.98	0.94
Mechanical damage	0.86	0.97	0.77	0.97	0.69	0.95	0.83	0.98	0.60	0.71
Paintoff	0.97	0.96	0.96	0.94	0.92	0.93	0.94	0.97	0.98	0.94
Scratch	0.98	0.92	0.95	0.95	0.96	0.92	0.98	0.95	0.72	0.93

Table 12.

Performance of the models on the holdout test set.

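Model	Precision (Real)	Precision (Real + Gen)	Recall (Real)	Recall (Real + Gen)	mAP@0.50 (Real)	mAP@0.50 (Real + Gen)	mAP@[0.50:0.95] (Real)	mAP@[0.50:0.95] (Real + Gen)	Inference time, ms (Real)	Inference time, ms (Real + Gen)	Model size, MB (Real)	Model size, MB (Real + Gen)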
YOLOv8L	0.950	0.946	0.938	0.924	0.961	0.958	0.686	0.753	11.3	11.1	83.5	83.5
YOLOv9L	0.912	0.926	0.922	0.957	0.934	0.963	0.709	0.763	9.7	9.6	49.2	49.2
YOLOv10L	0.916	0.961	0.849	0.900	0.914	0.954	0.673	0.769	10.8	10.7	49.7	49.7
YOLOv11L	0.934	0.944	0.935	0.955	0.951	0.969	0.690	0.786	9.7	9.7	48.8	48.8
Faster R-CNN	0.436	0.782	0.870	0.912	0.846	0.896	0.540	0.654	27.08	25.81	158	158

Note. Model size (MB) is the on-disk checkpoint size. Because the architecture and numerical precision are unchanged, model size is identical for Real and Real + Gen training.

As shown in Table 11, adding generated data improves the minority class (mechanical damage) for every architecture: YOLOv8L from 0.86 to 0.97, YOLOv9L from 0.77 to 0.97, YOLOv10L from 0.69 to 0.95, YOLOv11L from 0.83 to 0.98, and Faster R-CNN from 0.60 to 0.71 AP@0.50. Small decreases appear for some majority classes, most consistently erosion (e.g., YOLOv9L from 0.99 to 0.95) and, for several YOLO models, scratch; such a trade-off is plausible when the training distribution is rebalanced toward rare patterns.

As seen in Table 12, mAP@0.50 increases for YOLOv9L (from 0.934 to 0.963), YOLOv10L (from 0.914 to 0.954), YOLOv11L (from 0.951 to 0.969), and Faster R-CNN (from 0.846 to 0.896), while YOLOv8L is essentially unchanged (from 0.961 to 0.958). Under the stricter mAP@[0.50:0.95], all models improve, for instance YOLOv8L from 0.686 to 0.753 and YOLOv11L from 0.690 to 0.786, indicating better localization after augmentation. Precision and recall mostly follow the same trend, although YOLOv8L declines slightly on both (precision 0.950 to 0.946, recall 0.938 to 0.924); the largest improvement is for Faster R-CNN, whose precision rises from 0.436 to 0.782.
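The claims in the paragraph above can be checked directly from Table 12. The sketch recomputes the Real + Gen minus Real differences for each metric; the values are copied from the table, and the helper name `deltas` is illustrative.

```python
# (Real, Real + Gen) values copied from Table 12.
TABLE12 = {
    "YOLOv8L": {"precision": (0.950, 0.946), "recall": (0.938, 0.924),
                "map50": (0.961, 0.958), "map50_95": (0.686, 0.753)},
    "YOLOv9L": {"precision": (0.912, 0.926), "recall": (0.922, 0.957),
                "map50": (0.934, 0.963), "map50_95": (0.709, 0.763)},
    "YOLOv10L": {"precision": (0.916, 0.961), "recall": (0.849, 0.900),
                 "map50": (0.914, 0.954), "map50_95": (0.673, 0.769)},
    "YOLOv11L": {"precision": (0.934, 0.944), "recall": (0.935, 0.955),
                 "map50": (0.951, 0.969), "map50_95": (0.690, 0.786)},
    "Faster R-CNN": {"precision": (0.436, 0.782), "recall": (0.870, 0.912),
                     "map50": (0.846, 0.896), "map50_95": (0.540, 0.654)},
}


def deltas(metric: str) -> dict:
    """Real + Gen minus Real for one metric, per model, rounded to 3 decimals."""
    return {model: round(vals[metric][1] - vals[metric][0], 3)
            for model, vals in TABLE12.items()}


for metric in ("precision", "recall", "map50", "map50_95"):
    print(metric, deltas(metric))
```

The output shows that YOLOv8L is the only model whose mAP@0.50 falls, that every model gains on mAP@[0.50:0.95], and that Faster R-CNN has the largest gains in both precision and mAP@[0.50:0.95].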

Inference times and model sizes are effectively unchanged by augmentation (YOLO: 9.6–11.3 ms per image and 48.8–83.5 MB; Faster R-CNN: ≈26–27 ms per image and 158 MB), so the accuracy gains come at no runtime or memory cost.

In summary, on the fixed holdout set the YOLO models outperform Faster R-CNN in overall accuracy (mAP@0.50 and mAP@[0.50:0.95]) and in efficiency. The best overall trade-off is YOLOv11L, with mAP@0.50 of 0.969 at 9.7 ms per image and a 48.8 MB checkpoint, which makes it a strong candidate for real-time deployment, including on edge devices. Minority-class gains are substantial: for YOLOv11L, mechanical-damage AP@0.50 improves from 0.83 to 0.98 with StyleGAN2-ADA rebalancing, while precision and recall remain high. These trends replicate across the other YOLO variants, indicating that class-balanced data, not just architecture choice, is pivotal.

Figure 7 shows the precision-recall curves of all evaluated models on the holdout test set when trained on real plus generated images. All YOLO models maintain high precision and recall across a broad range of confidence thresholds. In contrast, the Faster R-CNN curve lies below those of the YOLO models, especially for mechanical damage, consistent with the per-class results in Table 11.

Figure 7.

Precision-recall curve of the models for Real + Generated images.

Model inference speed and resource efficiency

All YOLO models achieved inference times of about 11 ms per image or less (9.6–11.3 ms) with compact model sizes (48.8–83.5 MB) on the holdout test (Table 12). Among them, YOLOv11L provided the best accuracy–latency trade-off (mAP@0.50 = 0.969, mAP@[0.50:0.95] = 0.786, 9.7 ms, 48.8 MB). These characteristics make YOLOv11L well suited to real-time deployment on resource-constrained platforms, including edge devices, UAVs, and web or mobile applications. In contrast, Faster R-CNN required substantially more compute (≈26–27 ms per image; 158 MB), which limits its practicality in time-sensitive or embedded settings.
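For the real-time claims, per-image latency converts to throughput as 1000 / latency (in ms). The sketch below assumes the Table 12 times are per-image latencies at batch size 1 with images processed sequentially; the measurement hardware is not stated in this section, and the times exclude image capture and transfer, so throughput on an edge device or UAV may be lower. The function names are illustrative.

```python
# Real + Gen inference times (ms per image) copied from Table 12.
REPORTED_MS = {"YOLOv8L": 11.1, "YOLOv9L": 9.6, "YOLOv10L": 10.7,
               "YOLOv11L": 9.7, "Faster R-CNN": 25.81}


def throughput_fps(ms_per_image: float) -> float:
    """Images per second if images are processed one after another."""
    return 1000.0 / ms_per_image


def meets_frame_budget(ms_per_image: float, target_fps: float) -> bool:
    """True if the per-image latency fits within one frame interval."""
    return ms_per_image <= 1000.0 / target_fps


for model, ms in REPORTED_MS.items():
    print(f"{model:<13} {throughput_fps(ms):6.1f} img/s  "
          f"30 FPS: {meets_frame_budget(ms, 30)}  60 FPS: {meets_frame_budget(ms, 60)}")
```

Under these assumptions every model keeps up with 30 FPS video (33.3 ms per frame), but only the YOLO models fit a 60 FPS budget (16.7 ms per frame).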

These findings emphasize the practical advantages of modern single-stage detectors for automated wind turbine blade defect detection. Deploying accurate and efficient models such as YOLOv11L on embedded or mobile platforms would enable real-time monitoring of blade condition and rapid maintenance interventions when defects are detected. Overall, the YOLO family, particularly YOLOv11L, offers a favorable balance between detection performance and computational efficiency, supporting scalable, industry-ready solutions for wind turbine blade health assessment. Figure 8 shows examples of YOLOv11L detections on holdout images, illustrating accurate localization of minority defects; additional examples appear in Figure S2.

Figure 8.

YOLOv11L detections on the real-only holdout test set: (a) Crack, (b) Erosion, (c) Mechanical Damage, (d) Paintoff, (e) Scratch.

Impact of input resolution on small-defect detection

After training all detectors at 640 × 640, we evaluated inference at input sizes of 320, 512, 640, 800, and 1024 px. YOLOv11 peaked at 640 px (mAP@0.50 = 0.9691); moving to 800 and 1024 px changed performance by only −0.18% and −0.38%, respectively, relative to the peak, indicating robustness above 640 px. Reducing the input to 320 px decreased mAP by 4.30%, consistent with the loss of small-defect detail under down-scaling. The other YOLO variants show similar saturation around their peaks (≤0.8% change above the peak), whereas Faster R-CNN shows a flatter curve but a lower overall mAP@0.50 (Figure 9). Overall, 640 px offers the best accuracy–latency trade-off, with only marginal changes at higher resolutions.
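A resolution sweep like the one in Figure 9 reduces to evaluating one trained model at several input sizes and expressing each score relative to the peak. In the sketch below, `evaluate` is a hypothetical callable; with Ultralytics it could wrap `YOLO(weights).val(data=..., imgsz=size).box.map50`. The text does not state whether the reported percentages are relative changes or percentage points, so both conventions are shown, and the toy scores are invented for illustration.

```python
from typing import Callable, Dict, Iterable


def resolution_sweep(evaluate: Callable[[int], float],
                     sizes: Iterable[int] = (320, 512, 640, 800, 1024)) -> Dict[int, float]:
    """Return each input size's mAP change relative to the best size, in percent."""
    scores = {size: evaluate(size) for size in sizes}
    peak = max(scores.values())
    return {size: 100.0 * (score - peak) / peak for size, score in scores.items()}


def percentage_point_change(score: float, peak: float) -> float:
    """Alternative convention: absolute change in mAP, expressed in points."""
    return 100.0 * (score - peak)


# Invented scores standing in for real evaluations at each input size.
toy_scores = {320: 0.90, 512: 0.97, 640: 1.00, 800: 0.99, 1024: 0.98}
for size, change in resolution_sweep(toy_scores.get).items():
    print(f"{size:>5} px: {change:+.2f}% vs peak")
```

The two conventions coincide only when the peak mAP is 1.0; for a peak near 0.97 a relative change is about 3% larger in magnitude than the corresponding percentage-point change.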

Figure 9.

Model mAP@0.5 across input sizes.

Discussion

This study provides a comprehensive comparison of advanced single-stage (YOLOv8, YOLOv9, YOLOv10, YOLOv11) and two-stage (Faster R-CNN) object detection models for automated wind turbine blade defect detection. The YOLO models consistently outperform Faster R-CNN on nearly all detection metrics on both the validation folds and the independent holdout test set (Tables 10–12), in line with earlier object detection studies (Jin et al., 2022; Sharma et al., 2024). The YOLO variants achieved higher overall mAP@0.50, with balanced and robust performance on the well-represented classes (paintoff and scratch). Among the original minority classes (crack, erosion, and mechanical damage), which typically pose significant challenges because of limited training data, mechanical damage benefited most markedly from targeted augmentation, and erosion also improved on the validation folds.

Synthetic data augmentation with StyleGAN2-ADA proved especially valuable for addressing class imbalance and improving detection accuracy. The generated images led to substantial gains in AP@0.50 for mechanical damage and reduced the fold-to-fold spread of macro mAP@0.50 for most models (e.g., YOLOv11L from ±0.017 to ±0.007; Tables 10 and 11). These results are consistent with earlier work showing that GAN-based synthetic augmentation can improve CNN performance when some classes are underrepresented, for example in liver lesion classification (Frid-Adar et al., 2018).

Although all detectors were trained at 640 × 640 for comparability, a test-time resolution sweep over several input sizes was conducted to assess adaptability (Figure 9). YOLOv11, our best overall model, peaks at 640 px (mAP@0.50 = 0.9691). Increasing the input size beyond 640 px changes mAP by only −0.18% (800 px) and −0.38% (1024 px) relative to the peak, indicating robustness but no benefit from larger inputs. Reducing the resolution to 320 px lowers mAP by 4.49%, consistent with the loss of small-defect detail under down-scaling. Other YOLO variants exhibit similar saturation near their respective peaks (≤0.8% change beyond the peak). Collectively, these results suggest 640 px as the accuracy–latency sweet spot for field deployment; 512 px is a viable fallback when compute is tight, while 800–1024 px offer no gain on this dataset.

Deployment considerations such as inference speed and model size further highlight the advantages of the YOLO family. All YOLO variants exhibited low inference times (about 11 ms per image or less) and compact model sizes (48–84 MB), as shown in Table 12, making them well suited to real-time or edge-based deployment, including web applications and drone-mounted inspection systems. In contrast, Faster R-CNN’s larger model size (158 MB) and slower inference (25.8 ms per image when trained on real + generated data) present challenges for practical, time-sensitive applications.

Nevertheless, while synthetic data augmentation helped balance the dataset and improve rare-class detection, the realism and diversity of the generated images are inherently limited by the GAN’s capabilities and by the diversity of the original data. Future research should explore more advanced generative approaches to further improve performance on rare and emerging defect types.

Overall, the results demonstrate that integrating state-of-the-art YOLO architectures with generated image data yields a robust and deployable solution for automated detection of wind turbine blade defects. These findings support the use of modern single-stage detectors and GAN-based augmentation in infrastructure inspection workflows, offering promising directions for future advances in automated infrastructure monitoring.

Conclusion

This work presents a comprehensive comparative evaluation of leading single-stage (YOLOv8, YOLOv9, YOLOv10, YOLOv11) and two-stage (Faster R-CNN) object detection models for automated wind turbine blade defect detection. Using both real and GAN-augmented (StyleGAN2-ADA) datasets, our results show that recent YOLO architectures consistently outperform Faster R-CNN in mAP@0.50 and in most per-class AP@0.50 values, particularly when class imbalance is addressed through targeted data augmentation. Incorporating synthetic data not only improves detection of the minority classes, most markedly mechanical damage, but also stabilizes performance across cross-validation folds. These results emphasize the role of generative models in addressing data scarcity and improving model robustness for real-world inspection applications.

On the independent holdout set, YOLOv11L achieved an mAP@0.50 of 0.969 at 9.7 ms per image with a 48.8 MB checkpoint, outperforming Faster R-CNN, which achieved an mAP@0.50 of 0.896 at ≈26 ms per image with a 158 MB checkpoint. StyleGAN2-ADA augmentation particularly improved the underrepresented mechanical damage class, from 0.83 to 0.98 AP@0.50 for YOLOv11L, while maintaining high performance on the majority classes. These results indicate that YOLOv11L offers the best accuracy–latency trade-off for real-time and edge deployments in wind turbine blade inspection. The same trends replicate across the other YOLO variants, indicating that class-balanced data, not just architecture choice, is pivotal.

While these findings substantiate the benefit of combining advanced single-stage detectors with GAN-based data augmentation, future work should explore more advanced generative approaches and deploy the models in web applications and UAV systems for real-time defect detection.

Collectively, this study advances the state of the art in automated wind turbine blade inspection and provides a robust methodological framework for deploying deep learning–based defect detection in the broader domain of structural health monitoring.

Supplemental material

Supplemental material - Comparison study of advanced computer vision models for wind turbine blade defect detection

Supplemental material for Comparison study of advanced computer vision models for wind turbine blade defect detection by Jamiu Lateef, Xiong (Bill) Yu in Wind Engineering

Footnotes

Acknowledgements

The authors would like to thank the US National Science Foundation for partial support of this research (Grant No. 2026612).

ORCID iDs

Jamiu Lateef

Xiong (Bill) Yu

Author contributions

Jamiu Lateef: conceptualization, methodology development, data analysis, and writing.

Xiong (Bill) Yu: conceptualization, supervision, and manuscript revision.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is partially funded by the US National Science Foundation (Grant No. 2026612).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

Supporting datasets used in this study are available from the corresponding author on request.


References

Akiba, Sano, Yanase, et al. (2019) Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, Anchorage, AK, USA, 4–8 August 2019, pp. 2623–2631. https://doi.org/10.1145/3292500.3330701

Boutabba, Benlaloui, Mechnane, et al. (2025) Design of a small wind turbine emulator for testing power converters using dSPACE 1104. International Journal of Robotics and Control Systems 5(2): 698–712. https://doi.org/10.31763/ijrcs.v5i2.1685

Breiman, Friedman, Olshen, et al. (1984) Classification and regression trees. Chapman and Hall/CRC. https://doi.org/10.1201/9781315139470

Carnero, Martín, Díaz (2023) Portable motorized telescope system for wind turbine blades damage detection. Engineering Reports 7: 1–24. https://doi.org/10.1002/eng2.12618

Carranza-García, Torres-Mateo, Lara-Benítez, et al. (2021) On the performance of one-stage and two-stage object detectors in autonomous vehicles using camera data. Remote Sensing 13(1): 1–23. https://doi.org/10.3390/rs13010089

Chandrashekhar, Satyanarayana, Gorrepati, et al. (2025) An efficient YOLOv12-based framework for detecting extremely small-scale objects. Scientific Reports.

Cui, Harrison, Idriss, et al. (2025) Enhancing short-term electricity forecasting with advanced machine learning techniques. Journal of Electrical Engineering and Technology 21(1): 147–187. https://doi.org/10.1007/s42835-025-02430-z

Dimitrova, Aminzadeh, Meiabadi, et al. (2022) A survey on non-destructive smart inspection of wind turbine blades based on industry 4.0 strategy. Applied Mechanics 3: 1299–1326.

Elzein, Maamar, Mahmoud, et al. (2025) The utilization of a TSR-MPPT-based backstepping controller and speed estimator across varying intensities of wind speed turbulence. International Journal of Robotics and Control Systems 5(2): 1315–1330. https://doi.org/10.31763/ijrcs.v5i2.1793

Frid-Adar, Diamant, Klang, et al. (2018) GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 321: 321–331. https://doi.org/10.1016/j.neucom.2018.09.013

Gohar, Halimi, See, et al. (2023) Slice-aided defect detection in ultra high-resolution wind turbine blade images. Machines 11: 1–14.

Han, Yoon, Huh, et al. (2014) Damage assessment of wind turbine blade under static loading test using acoustic emission. Journal of Intelligent Material Systems and Structures 25(5): 621–630. https://doi.org/10.1177/1045389X13508329

Heusel, Ramsauer, Unterthiner, et al. (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems 30: 6627–6638.

Hussain (2023) YOLO-v1 to YOLO-v8, the rise of YOLO and its complementary nature toward digital manufacturing and industrial defect detection. Machines 11(7): 677. https://doi.org/10.3390/machines11070677

Ibrahim, Alkuhayli, Beroual, et al. (2023a) Enhancing the functionality of a grid-connected photovoltaic system in a distant Egyptian region using an optimized dynamic voltage restorer: application of artificial rabbits optimization. Sensors 23: 7146. https://doi.org/10.3390/s23167146

Ibrahim, Mahmoud, Omar, et al. (2023b) Operation of grid-connected PV system with ANN-based MPPT and an optimized LCL filter using GRG algorithm for enhanced power quality. IEEE Access 11: 106859–106876. https://doi.org/10.1109/ACCESS.2023.3317980

Jin, Sun, Che, et al. (2022) A novel deep learning-based method for detection of weeds in vegetables. Pest Management Science 78(5): 1861–1869. https://doi.org/10.1002/ps.6804

Jocher, Chaurasia, Qiu, et al. (2023) YOLOv8. https://github.com/ultralytics/ultralytics

Jocher, Chaurasia, Qiu, et al. (2024) YOLO11. https://docs.ultralytics.com/models/yolo11/

Joshua, Palilingan, Lengkong, et al. (2025) Deep Learning-Driven Solar Fault Detection in Solar–Hydrogen AIoT Systems: Implementing CNN VGG16, ResNet-50, DenseNet121, and EfficientNetB0 in a University-Based Framework. Hydrogen 7(1): 1.

Karras, Laine, Aila (2021) A style-based generator architecture for generative adversarial networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(12): 4217–4228. https://doi.org/10.1109/TPAMI.2020.2970919

LeCun, Bengio, Hinton (2015) Deep learning. Nature 521(7553): 436–444. https://doi.org/10.1038/nature14539

Shen, Guo (2021) Simulation and experimental study on the ultrasonic micro-vibration de-icing method for wind turbine blades. Energies 14(24): 8246. https://doi.org/10.3390/en14248246

Yao, Wang, et al. (2022) Efficient and accurate damage detector for wind turbine blade images. IEEE Access 10: 123378–123386. https://doi.org/10.1109/ACCESS.2022.3224446

Manly, McDonald, Thomas, et al. (2002) Resource Selection by Animals. 2nd edition. Springer.

Moreno, Pena, Toledo, et al. (2018) A new vision-based method using deep learning for damage inspection in wind turbine blades. In: 2018 15th international conference on electrical engineering, computing science and automatic control (CCE), Mexico City, Mexico, 5–7 September 2018, pp. 1–5. https://doi.org/10.1109/ICEEE.2018.8533924

Nikolenko (2021) Synthetic data for deep learning. Springer Optimization and Its Applications 174: 1–54. https://doi.org/10.1007/978-3-030-75178-4_1

Nikolov, Madsen (2020) Rough or noisy? Metrics for noise estimation in SfM reconstructions. Sensors 20(19): 5725. https://doi.org/10.17632/fptxw8cynv.1

Nunn, Khadivi, Samavi (2021) Compound Fréchet inception distance for quality assessment of GAN created images. arXiv preprint arXiv:2106.08575. https://doi.org/10.48550/arXiv.2106.08575

Padilla, Netto, Da Silva EAB (2020) A survey on performance metrics for object-detection algorithms. In: International conference on systems, signals and image processing (IWSSIP), Niteroi, Brazil, 1–3 July 2020, pp. 237–242. https://doi.org/10.1109/IWSSIP48289.2020.9145130

Pielou (1966) The measurement of diversity in different types of biological collections. Journal of Theoretical Biology 13: 131–144. https://doi.org/10.1016/0022-5193(66)90013-0

Ravikumar NVA, Sasidhar, Manoj, et al. (2025) Design and real-time simulations of robust controllers for uncertain multi-input wind turbine. Energy Exploration & Exploitation 44: 276–292. https://doi.org/10.1177/01445987251373101

Reddy, Indragandhi, Ravi, et al. (2019) Detection of cracks and damage in wind turbine blades using artificial intelligence-based image analytics. Measurement: Journal of the International Measurement Confederation 147: 106823. https://doi.org/10.1016/j.measurement.2019.07.051

Ren, Girshick, et al. (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6): 1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031

Richter, Vineet, Roth, et al. (2016) Playing for data: ground truth from computer games. In: European conference on computer vision. Cham: Springer International Publishing, pp. 102–118. https://doi.org/10.1007/978-3-319-46475-6_7

Ros, Sellart, Materzynska, et al. (2016) The SYNTHIA dataset: a large collection of synthetic images for semantic segmentation of urban scenes. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, Las Vegas, NV, USA, 27–30 June 2016. IEEE, pp. 3234–3243. https://doi.org/10.1109/CVPR.2016.352

Rumsey, Musial (2001) Application of infrared thermography nondestructive testing during wind turbine blade tests. Journal of Solar Energy Engineering, Transactions of the ASME 123(4): 271. https://doi.org/10.1115/1.1409560

Schmedemann, Baaß, Schoepflin, et al. (2022) Procedural synthetic training data generation for AI-based defect detection in industrial surface inspection. Procedia CIRP 107: 1101–1106. https://doi.org/10.1016/j.procir.2022.05.115

Shannon (1948) A mathematical theory of communication. Bell System Technical Journal 27(3): 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x

Sharma, Kumar, Longchamps (2024) Smart agricultural technology faster R-CNN models for detection of multiple weed species. Smart Agricultural Technology 9: 100648. https://doi.org/10.1016/j.atech.2024.100648

Shihavuddin ASM, Chen, Fedorov, et al. (2019) Wind turbine surface damage detection by deep learning aided drone inspection analysis. Energies 12(4): 1–15. https://doi.org/10.3390/en12040676

Shorten, Khoshgoftaar (2019) A survey on image data augmentation for deep learning. Journal of Big Data 6(1): 60. https://doi.org/10.1186/s40537-019-0197-0

Sierra-Pérez, Torres-Arredondo, Güemes (2016) Damage and nonlinearities detection in wind turbine blades based on strain field pattern recognition. FBGs, OBR and strain gauges comparison. Composite Structures 135: 156–166. https://doi.org/10.1016/j.compstruct.2015.08.137

Snoek, Larochelle, Adams (2012) Practical Bayesian optimization of machine learning algorithms. Advances in Neural Information Processing Systems 4: 2951–2959.

Sokal, Rohlf (2012) Biometry. The principles and practice of statistics in biological research. Systematic Zoology 19(4): 391–393. https://doi.org/10.2307/2412280

Tchakoua, Wamkeue, Ouhrouche, et al. (2014) Wind turbine condition monitoring: state-of-the-art review, new trends, and future challenges. Energies 7(4): 2595–2630. https://doi.org/10.3390/en7042595

Tong, Fan, Peng, et al. (2024) WTBD-YOLOv8: an improved method for wind turbine generator defect detection. Sustainability 16: 4467.

Tremblay, Prakash, Acuna, et al. (2018) Training deep networks with synthetic data: bridging the reality gap by domain randomization. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018, pp. 1082–1090. https://doi.org/10.1109/CVPRW.2018.00143

Tzutalin (2015) LabelImg. GitHub repository. https://github.com/heartexlabs/labelImg

Wang, Zhang (2017) Automatic detection of wind turbine blade surface cracks based on UAV-taken images. IEEE Transactions on Industrial Electronics 64(9): 7293–7303. https://doi.org/10.1109/TIE.2017.2682037

Wang, Liang, Xiang (2014) Damage detection method for wind turbine blades based on dynamics analysis and mode shape difference curvature information. Mechanical Systems and Signal Processing 48(1–2): 351–367. https://doi.org/10.1016/j.ymssp.2014.03.006

52. Wang, Chen, Liu, et al. (2024a) YOLOv10: real-time end-to-end object detection. Advances in Neural Information Processing Systems 37: 1–21.

53. Wang, Yang, Guanghuan, Xianghua (2025) Light-YOLO: a lightweight framework for multi-scale contraband detection in X-ray security images via channel-decoupled feature learning. Journal of Real-Time Image Processing 22(4): 158.

55. Yang, Peng, Wei, et al. (2017) Structural health monitoring of composite wind turbine blades: challenges, issues and potential solutions. IET Renewable Power Generation 11(4): 411–416. https://doi.org/10.1049/iet-rpg.2016.0087

56. Yang, Zhang, et al. (2021) Image recognition of wind turbine blade damage based on a deep learning model with transfer learning and an ensemble learning classifier. Renewable Energy 163: 386–397. https://doi.org/10.1016/j.renene.2020.08.125

57. Yu, Cao, Liu, et al. (2017) Image-based damage recognition of wind turbine blades. In: 2017 2nd International Conference on Advanced Robotics and Mechatronics (ICARM), Hefei and Tai’an, China, 27–31 August 2017, pp. 161–166. https://doi.org/10.1109/ICARM.2017.8273153

58. Zhang, Cosma, Watkins (2021) Image enhanced Mask R-CNN: a deep learning pipeline with new evaluation measures for wind turbine blade defect detection and classification. Journal of Imaging 7: 46.

59. Zhang, Wen (2022) SOD-YOLO: a small target defect detection algorithm for wind turbine blades based on improved YOLOv5. Advanced Theory and Simulations 5(7): 2100631.

60. Zhao, Singh, Lee, et al. (2021) Improved consistency regularization for GANs. Proceedings of the AAAI Conference on Artificial Intelligence 35(12): 11033–11041. https://doi.org/10.1609/aaai.v35i12.17317

61. Zou, Cheng (2022) Research on wind turbine blade surface damage identification based on improved convolution neural network. Applied Sciences 12(18): 9338. https://doi.org/10.3390/app12189338

62. Zou, Chen, Yang, et al. (2025) An improved method of AUD-YOLO for surface damage detection of wind turbine blades. Scientific Reports 15: 1–16.

Supplementary Material
