A wind turbine blade damage detection method for low-quality damage images

By constructing a wind turbine blade damage detection model WLM-Net, which consists of a wavelet deformable module WDM, a cross-stage local dual-core module C3k2, a lightweight dynamic feature pyramid LD-FPN, and a hybrid residual detection head MRHead, the problem of difficulty in extracting damage feature information from low-quality images is solved, achieving efficient and accurate damage detection. It is suitable for embedded devices with limited computing resources.

CN122048944B · Active · Publication Date: 2026-06-19 · HUNAN UNIV OF SCI & TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: HUNAN UNIV OF SCI & TECH
Filing Date: 2026-04-17
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing wind turbine blade damage detection methods based on YOLO series detection networks suffer from a trade-off between model complexity and detection accuracy in low-quality images, making them difficult to deploy effectively on embedded devices with limited computing resources. Furthermore, damage feature information is difficult to extract effectively from low-quality images.

Method used

We constructed a wavelet deformable module (WDM), a cross-stage local dual-core module (C3k2), a lightweight dynamic feature pyramid (LD-FPN), and a hybrid residual detection head (MRHead). Combined with the YOLO framework, we designed a wind turbine blade damage detection model (WLM-Net). Through multimodal feature collaborative extraction, dynamic receptive field adjustment, and global-local feature fusion, we improved detection accuracy and computational efficiency.

Benefits of technology

It improves the accuracy and efficiency of damage detection in low-quality damage images, enhances the ability to identify and locate damage in complex backgrounds and at multiple scales, reduces the computational resource requirements, and is suitable for edge devices with limited computational resources.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122048944B_ABST

Abstract

This invention discloses a wind turbine blade damage detection method for low-quality damaged images. The method comprises the following steps: constructing a dataset; designing a wavelet deformable module, a cross-stage local dual-core module, a lightweight dynamic feature pyramid, and a hybrid residual detection head to construct a wind turbine blade damage detection model; training the model using a training set; and performing detection. In the backbone network of the model, this invention designs C3k2_WDM, which enhances the backbone network's ability to extract global features and complex-textured damage shapes. This addresses the difficulty of effectively extracting damage feature information from complex backgrounds in low-quality images.

Description

Technical Field

[0001] This invention relates to the field of wind power, and in particular to a method for detecting damage to wind turbine blades in low-quality damage images.

Background Art

[0002] To maximize the utilization of wind energy, wind turbines are typically installed in locations with abundant wind resources, such as high altitudes and offshore areas. As the core component of wind turbines, the wind turbine blades are therefore susceptible to environmental factors such as alternating wind speeds, diurnal temperature variations, moisture erosion, lightning strikes, and icing. Simultaneously, the increasing capacity of individual wind turbine units, the continuous growth in tower height, and the increasing length of blades have brought significant challenges to wind turbine maintenance. Most wind turbine failures are caused by damage to the wind turbine blades, with cracks and surface peeling being the most common forms of blade damage.

[0003] Surface damage detection of wind turbine blades is a key concern in the wind power industry. Numerous experts and scholars have conducted extensive research using various technologies, achieving excellent application results in specific scenarios. With the continuous development of technologies such as deep learning and object detection, methods combining these technologies for wind turbine blade surface damage detection have received widespread attention. Deep learning-based damage detection methods are a type of non-invasive blade detection method, mainly including single-stage and two-stage detection networks. YOLO, as a typical single-stage detection network, simplifies the wind turbine blade damage detection process by achieving target localization and classification through a single network.

[0004] The YOLO series of detection networks has achieved good results in wind turbine blade damage detection owing to its efficiency, real-time performance, and robustness. However, the acquired images of blade surface damage are often of low quality. Contributing factors include the drone's shooting angle and distance, the light intensity, and complex backgrounds such as farmland and forests. Low-quality blade images also contain many multi-scale damage types with complex shapes, and the computing resources of edge devices are limited. Damage detection technology therefore still faces a trade-off between model complexity and detection accuracy. Increasing the number of parameters can improve detection accuracy through richer feature representations. However, it also raises the demand for computing resources, which makes it difficult to meet the deployment requirements of embedded devices.

Summary of the Invention

[0005] To address the aforementioned technical problems, this invention provides a wind turbine blade damage detection method for low-quality damage images that features a simple algorithm and high detection accuracy.

[0006] The technical solution of this invention to solve the above-mentioned technical problems is: a method for detecting damage to wind turbine blades in low-quality damage images, comprising the following steps:

[0007] S1, Constructing the dataset: Collect images of wind turbine blade damage by drone, and divide the blade image data of different damage types into training set, validation set and test set after data augmentation;

[0008] S2, Model Construction: Design the wavelet deformable module WDM, the cross-stage local dual-core module C3k2, the lightweight dynamic feature pyramid LD-FPN, and the hybrid residual detection head MRHead, and use them to construct a wind turbine blade damage detection model;

[0009] S3, Model Training: The wind turbine blade damage detection model is trained using the training set;

[0010] S4, Perform detection: Input the wind turbine blade damage images from the test set into the trained wind turbine blade damage detection model. Based on the model's inference results on the test set, save the weight file with the best performance and determine the structure of the final wind turbine blade damage detection model.

[0011] In the above-mentioned wind turbine blade damage detection method for low-quality damage images, step S2 specifically comprises the following:

[0012] S21: In the backbone network, a wavelet deformable module WDM and a cross-stage local dual-core module C3k2 are designed. The combination of the wavelet deformable module WDM and C3k2 is denoted C3k2_WDM. C3k2_WDM is used to extract global features and the features of damage shapes with complex textures.

[0013] S22: In the Neck network, a lightweight dynamic feature pyramid LD-FPN is constructed to fuse local and global feature information for damage identification and localization;

[0014] S23: In the Head network, a hybrid residual detection head MRHead is proposed to capture fine-grained features;

[0015] S24: Construct WLM-Net, a wind turbine blade damage detection model based on the YOLO framework.

[0016] In the above-mentioned wind turbine blade damage detection method for low-quality damage images, the working process of C3k2_WDM in step S21 is as follows:

[0017] First, a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1 adjusts the number of channels of the input feature map, yielding the first feature map. The standard convolutional layer ConvBNSiLU consists of a standard convolutional layer, a batch normalization layer, and a SiLU activation function layer connected in series. Then, the channel splitting module Split divides the first feature map into two parts along the channel dimension, yielding the second feature map and the eighth feature map.

[0018] The second feature map enters the Feature Extraction module, which performs different feature extraction depending on the setting of the wavelet deformable module (WDM).

When WDM is False, the Bottleneck module extracts shallow features, yielding the third feature map. When WDM is True, a dual-branch structure containing the WDM extracts deep features, yielding the seventh feature map.

In this dual-branch structure:
- One branch adjusts the number of channels of the second feature map using a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1, yielding the fourth feature map.
- The other branch first adjusts the number of channels of the second feature map using a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1, yielding the fifth feature map. The WDM then extracts features from the fifth feature map to obtain the sixth feature map.

Finally, the fourth and sixth feature maps are concatenated, and a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1 adjusts the number of channels, yielding the seventh feature map.

[0019] The eighth feature map is concatenated with the third or seventh feature map, and the number of channels is adjusted again by a standard convolution ConvBNSiLU with a kernel size of 1×1 and a stride of 1. Finally, the integrated ninth feature map is output.

[0020] The above-mentioned wind turbine blade damage detection method for low-quality damage images uses a wavelet deformable module (WDM) with a four-branch processing architecture to achieve multimodal feature collaborative extraction, including a first branch, a second branch, a third branch, and a fourth branch.

[0021] The first branch uses depth wavelet separable convolution (DWTPConv) to extract feature information. It decomposes the input fifth feature map into a low-frequency approximation component LL and three high-frequency detail components LH, HL, and HH through discrete wavelet transform (DWT). It uses a dynamic feature weighting mechanism to enhance the feature information of LL, LH, HL, and HH, and uses pointwise convolution to achieve linear combination across channels, thereby extracting feature information of different frequencies and spatial scales to obtain the eighteenth feature map.

[0022] The second branch employs a dynamic receptive field adjustment technique, using convolutional kernels with learnable offset parameters to enable the network to adaptively capture the complex texture features of the damaged area, thus obtaining the nineteenth feature map.

[0023] The third branch captures local spatial detail features through a standard convolution ConvBNSiLU with a kernel size of 3×3 and a stride of 1, resulting in the twentieth feature map.

[0024] The fourth branch preserves global semantic information through identity mapping, resulting in the twenty-first feature map;

[0025] Finally, the eighteenth, nineteenth, twentieth, and twenty-first feature maps are concatenated along the channel dimension and then recombined and nonlinearly mapped using a dual pointwise Gaussian error linear unit (DPGU) to generate the sixth feature map.

[0026] The Dual Pointwise Gaussian Error Linear Unit (DPGU) is a feedforward structure with residual fusion. Specifically, the concatenated input features are divided into two paths: one is the residual branch, which directly performs identity mapping; the other is the main processing branch, which sequentially passes through two standard convolutional layers (ConvBNSiLU) with a kernel size of 1×1 and a stride of 1, a GELU activation function layer, and a dropout layer. The output of the main processing branch and the features of the residual branch are added element-wise to perform residual fusion, and the final sixth feature map is output.

[0027] In the above-mentioned wind turbine blade damage detection method for low-quality damage images, the first branch of the wavelet deformable module (WDM) uses the depth wavelet separable convolution (DWTPConv) to process the input fifth feature map through a two-step feature extraction process.

[0028] Step 1: First, a fifth feature map of size C1×H×W is input, where C1 is the number of channels, H is the height, and W is the width. It is processed by two parallel branches, a fifth branch and a sixth branch.

The fifth branch uses a 3×3 standard 2D convolutional layer (Conv2d) to extract features from the fifth feature map, yielding the sixteenth feature map.

The sixth branch proceeds as follows:
- The discrete wavelet transform (DWT) decomposes the fifth feature map into a low-frequency approximation component LL and three high-frequency detail components LH, HL, and HH. These four components form four feature maps, designated the tenth, eleventh, twelfth, and thirteenth feature maps, each of size C1×H/2×W/2.
- The ReShape module rearranges these four feature maps into a fourteenth feature map of size 4C1×H/2×W/2.
- A 3×3 standard 2D convolutional layer (Conv2d) and the inverse wavelet transform (IWT) process the fourteenth feature map to obtain the fifteenth feature map.

Finally, the fifteenth and sixteenth feature maps undergo residual fusion to output a seventeenth feature map of size C1×H×W.

[0029] Step 2: Pointwise convolution is applied to the seventeenth feature map output by Step 1. It forms linear combinations across channels, which adjusts the channel dimension and fuses the features. The output is the eighteenth feature map, of size C2×H×W (C2 channels, height H, width W).

[0030] In the aforementioned wind turbine blade damage detection method for low-quality damage images, in step S22, the lightweight dynamic feature pyramid LD-FPN combines two modules: a lightweight linear dilated convolution module LDConv and a cross-stage partial channel-spatial grouping convolution module VoV-GSCSPC. The feature extraction mechanism of LDConv is as follows:
- Given the number of parameters N, a coordinate generation algorithm constructs an initial, approximately square sampling topology.
- A 1×1 two-dimensional standard convolutional layer Conv2d predicts 2N-dimensional dynamic offsets. These offsets are added to the initial coordinates to form adaptive sampling positions.
- Bilinear interpolation resamples the features at these irregular positions, and a feature recombination strategy completes multi-scale feature aggregation.
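The LDConv sampling mechanism in [0030] can be illustrated with a small pure-Python sketch. The patent does not spell out its coordinate generation algorithm, so the rule used here is an assumption: rows of round(√N) points, with any remainder in an extra partial row. The offsets are a plain list of 2N numbers standing in for the output of the 1×1 Conv2d. The final weighted sum over the N samples is a simplified, hypothetical stand-in for the feature recombination strategy. All function names are illustrative.

```python
import math


def initial_sampling_grid(n):
    """Approximately square layout of n sampling points (assumed rule)."""
    base = round(math.sqrt(n))
    rows, rem = divmod(n, base)
    points = [(r, c) for r in range(rows) for c in range(base)]
    points += [(rows, c) for c in range(rem)]  # leftover points form a partial last row
    return points


def bilinear(img, y, x):
    """Sample a 2D list at fractional (y, x); positions outside the image count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * img[yy][xx]
    return value


def ldconv_point(img, anchor, weights, offsets):
    """Hypothetical single-output LDConv: N weights, 2N offsets (dy, dx per point).

    `anchor` is the top-left of the sampling pattern in this sketch (an assumption).
    """
    n = len(weights)
    assert len(offsets) == 2 * n
    ay, ax = anchor
    samples = [
        bilinear(img, ay + py + offsets[2 * k], ax + px + offsets[2 * k + 1])
        for k, (py, px) in enumerate(initial_sampling_grid(n))
    ]
    return sum(w * s for w, s in zip(weights, samples))  # parameter count is linear in N


img = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(initial_sampling_grid(5))   # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
print(bilinear(img, 0.5, 0.5))    # 2.0
print(ldconv_point(img, (0, 0), [1.0] * 5, [0.0] * 10))
```

The grid is what lets N be any integer (for example 5) rather than a perfect square such as 3×3 = 9. The parameter count therefore grows linearly with N.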

[0031] The aforementioned wind turbine blade damage detection method for low-quality damaged images utilizes the cross-stage partial channel-spatial grouped convolutional module VoV-GSCSPC, which processes the input features through two parallel seventh and eighth branches. The seventh branch adjusts the number of channels using a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1. The eighth branch first performs channel matching on the input features using a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1, and then inputs the processed features into the grouped shuffling bottleneck module GSBottleneckC for feature extraction. Finally, the features output from the seventh branch and the features output from the eighth branch after processing by the grouped shuffling bottleneck module GSBottleneckC are concatenated along the channel dimension, and the number of channels is adjusted again using a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1, outputting the final processed features of the cross-stage partial channel-spatial grouped convolutional module VoV-GSCSPC.

[0032] The group shuffling bottleneck module GSBottleneckC comprises two group spatial convolutional modules GSConv with different kernel sizes and a depthwise separable convolutional module DWConv. The features input to the group shuffling bottleneck module GSBottleneckC are processed in two parallel paths. One path of feature information is extracted by the depthwise separable convolutional module DWConv, while the other path of feature information is extracted sequentially by the two group spatial convolutional modules GSConv. The features output from the two parallel processing paths are then subjected to residual fusion to output the final processed features of the group shuffling bottleneck module GSBottleneckC.
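Depthwise separable convolution (DWConv) recurs in GSBottleneckC, GSConv, and MRHead. The sketch below assumes DWConv means a k×k depthwise convolution followed by a 1×1 pointwise convolution. Under that assumption, simple arithmetic shows the weight saving relative to a standard convolution. The patent gives no kernel sizes or channel widths, so the values below are illustrative, and bias and batch-normalization parameters are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias and BN ignored)."""
    return k * k * c_in * c_out


def dw_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel) plus a 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out


for c in (64, 128, 256):
    std = conv_params(c, c, 3)
    sep = dw_separable_params(c, c, 3)
    print(f"C={c}: standard={std}, separable={sep}, ratio={std / sep:.2f}")
```

For equal input and output widths c, the ratio is k²c / (k² + c). It approaches k² = 9 as the channel count grows, which is the main source of the "lightweight" claims for these modules.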

[0033] The aforementioned wind turbine blade damage detection method for low-quality damaged images uses a grouped spatial convolution module GSConv to process input features through two parallel ninth and tenth branches. The ninth branch first uses a standard convolution ConvBNSiLU with a kernel size of 1×1 and a stride of 1 to adjust the number of channels, and then uses a depthwise separable convolution DWConv to complete feature extraction. The tenth branch directly passes features through identity mapping. Finally, the features output by the ninth and tenth branches are concatenated along the channel dimension, and the features are recombined through a channel shuffle module to output the final processed features of the grouped spatial convolution module GSConv.
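The concatenate-then-shuffle step at the end of GSConv in [0033] can be sketched on channel labels alone. The sketch assumes the common ShuffleNet-style shuffle with two groups: reshape to groups × channels-per-group, transpose, flatten. The patent does not state the exact permutation. The labels `a*` and `b*` are hypothetical stand-ins for the ninth-branch and tenth-branch channels.

```python
def channel_shuffle(channels, groups=2):
    """ShuffleNet-style shuffle: view as [groups][per_group], transpose, flatten."""
    n = len(channels)
    if n % groups:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


def gsconv_output_order(ninth_branch, tenth_branch):
    """Concatenate the two GSConv branches along channels, then shuffle."""
    return channel_shuffle(ninth_branch + tenth_branch, groups=2)


print(gsconv_output_order(["a0", "a1", "a2"], ["b0", "b1", "b2"]))
# ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
```

After the shuffle, channels from the two branches are interleaved. Any subsequent grouped operation therefore mixes information from both branches, which is the usual motivation for the shuffle.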

[0034] In the above-mentioned wind turbine blade damage detection method for low-quality damage images, in step S23, the hybrid residual detection head MRHead adopts a dual-branch architecture, including a regression branch and a classification branch.

[0035] The regression branch first extracts features through two cascaded hybrid residual modules MRBlock and then completes feature reconstruction through a two-dimensional standard convolutional layer Conv2d. Each hybrid residual module MRBlock consists of a depthwise separable convolution DWConv and a standard convolution ConvBNSiLU with a kernel size of 1×1 and a stride of 1, and enhances feature learning through a residual fusion operation. The feature map output by the Conv2d layer is used for target localization, with the distribution focal loss (DFL) and the complete intersection over union (CIoU) loss.
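The CIoU term in [0035] has a standard closed form: CIoU = IoU - ρ²/c² - α·v. Here ρ is the distance between the box centres and c is the diagonal length of the smallest box enclosing both. The term v = (4/π²)·(arctan(w_gt/h_gt) - arctan(w/h))² measures aspect-ratio mismatch, and α = v / ((1 - IoU) + v). The pure-Python sketch below computes the loss 1 - CIoU for two boxes given as (x1, y1, x2, y2). The DFL part of the regression branch is not shown, and the function name is illustrative.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """Return 1 - CIoU for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Squared centre distance over the squared diagonal of the enclosing box
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4

    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - (iou - rho2 / c2 - alpha * v)


print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # close to 0 for identical boxes
print(ciou_loss((0, 0, 2, 2), (1, 0, 3, 2)))  # partial overlap
print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes: above 1
```

Unlike plain IoU loss, the centre-distance term still provides a gradient when the boxes do not overlap. This is one reason CIoU is commonly paired with DFL in YOLO-style regression heads.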

[0036] The classification branch includes two depthwise separable convolutional layers (DWConv), two standard convolutional layers (ConvBNSiLU) with 1×1 kernels and a stride of 1, and a two-dimensional standard convolutional layer (Conv2d). The fully adaptive threshold focal loss function (ATFL) strengthens the network's ability to learn from difficult samples. The ATFL is calculated as follows:

[0037] ;

[0038] where the symbols denote, respectively, the current average predicted probability value, the predicted value for the next batch, and an adjustable factor.

[0039] In the above-mentioned wind turbine blade damage detection method for low-quality damage images, in step S24, the wind turbine blade damage detection model WLM-Net includes a backbone network, a neck network, and a head network.

[0040] The backbone network consists, in sequence, of:
- a first standard convolution;
- a second standard convolution;
- a first C3k2_WDM with the wavelet deformable module WDM set to False;
- a third standard convolution;
- a second C3k2_WDM with WDM set to False;
- a fourth standard convolution;
- a third C3k2_WDM with WDM set to True;
- a fifth standard convolution; and
- a fourth C3k2_WDM with WDM set to True.

[0041] The neck network comprises, in sequence:
- a Fast Spatial Pyramid Pooling (SPPF) module;
- a cross-stage local pyramid slice attention module (C2PSA);
- a first upsampling module;
- a first concatenation module;
- a first cross-stage partial channel-spatial grouping convolution module;
- a second upsampling module;
- a second concatenation module;
- a second cross-stage partial channel-spatial grouping convolution module;
- a first lightweight linear dilated convolution module;
- a third concatenation module;
- a third cross-stage partial channel-spatial grouping convolution module;
- a second lightweight linear dilated convolution module;
- a fourth concatenation module; and
- a fourth cross-stage partial channel-spatial grouping convolution module.

In addition:
- the second C3k2_WDM is connected to the second concatenation module;
- the third C3k2_WDM is connected to the first concatenation module;
- the cross-stage local pyramid slice attention module (C2PSA) is connected to the fourth concatenation module; and
- the first cross-stage partial channel-spatial grouping convolution module is connected to the third concatenation module.

[0042] The input features of the Fast Spatial Pyramid Pooling (SPPF) module are first processed by a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1 to adjust the number of channels. They then pass through three cascaded 5×5 max pooling layers (MaxPool). The outputs of the three pooling layers are concatenated with the pre-pooling features along the channel dimension. Finally, a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1 completes channel fusion and dimension transformation, and the module outputs multi-scale fused features.
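The SPPF design in [0042] relies on the fact that cascading stride-1 5×5 max-pooling layers enlarges the effective pooling window. With "same" padding, two cascaded 5×5 pools equal one 9×9 pool, and three equal one 13×13 pool. The pure-Python sketch below demonstrates this on a single channel. It also returns the four maps SPPF concatenates along the channel dimension: the pre-pooling features plus the three pooled outputs. This is why the concatenation has four times the input channels. The 1×1 ConvBNSiLU layers are omitted, and the function names are illustrative.

```python
import random


def max_pool_same(x, k=5):
    """Stride-1 k x k max pooling; clipping the window is equivalent to -inf padding."""
    h, w, r = len(x), len(x[0]), k // 2
    return [
        [
            max(x[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1)))
            for j in range(w)
        ]
        for i in range(h)
    ]


def sppf_maps(x):
    """The four maps SPPF concatenates: the input plus three cascaded 5x5 pools."""
    p1 = max_pool_same(x)
    p2 = max_pool_same(p1)
    p3 = max_pool_same(p2)
    return [x, p1, p2, p3]  # concatenation along channels -> 4x the input channels


rng = random.Random(0)
grid = [[rng.randint(0, 99) for _ in range(12)] for _ in range(12)]
maps = sppf_maps(grid)
print(maps[2] == max_pool_same(grid, 9), maps[3] == max_pool_same(grid, 13))  # True True
```

Reusing one small pooling layer three times yields the same 5/9/13 receptive fields as three separate parallel pools, with less computation.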

[0043] The cross-stage local pyramid slicing attention module C2PSA adopts a cross-stage local dual-branch architecture: the input features are first processed by a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1, and then divided into two branches by the channel splitting module Split; one branch keeps the original features unchanged and directly passes them to the concatenation operation; the other branch passes through the position-sensitive attention module PSABlock to extract global attention information; the output features of the two branches are concatenated along the channel dimension, and then passed through a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1 to complete channel fusion and dimensional transformation, and finally output features that have fused global attention information.

[0044] The input features of the position-sensitive attention module PSABlock are first processed by the multi-head self-attention module Attention to extract attention weights, and then residually connected with the input features of the position-sensitive attention module PSABlock. Subsequently, the feature transformation and residual fusion are completed through the feedforward network module FFN, which sequentially passes through a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1, and a convolutional-batch normalization module ConvBN with a kernel size of 1×1 and a stride of 1, and outputs the enhanced attention features.
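The first half of PSABlock in [0044] is self-attention followed by a residual connection. It can be illustrated with single-head scaled dot-product attention, softmax(QKᵀ/√d)V, over the pixel vectors of a feature map. This is a simplified sketch: the patent specifies multi-head attention, and head splitting and any positional terms are omitted here. The FFN that follows is also omitted. The projection matrices and function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(matrix, vec):
    """Linear projection of one token (a 1x1 convolution at one position)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    q = [project(wq, t) for t in tokens]
    k = [project(wk, t) for t in tokens]
    v = [project(wv, t) for t in tokens]
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        out.append([sum(wt * vj[c] for wt, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def attention_with_residual(tokens, wq, wk, wv):
    """Attention output added back to the block input (the first residual in PSABlock)."""
    attended = self_attention(tokens, wq, wk, wv)
    return [[x + y for x, y in zip(t, a)] for t, a in zip(tokens, attended)]


# Tokens are the H*W pixel vectors of a feature map, flattened into a list.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention_with_residual(tokens, identity, identity, identity))
```

Every output position is a weighted mix of all positions. This global view is what C2PSA adds on top of the convolutional features.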

[0045] The head network includes a first hybrid residual detection head, a second hybrid residual detection head, and a third hybrid residual detection head. The first hybrid residual detection head is connected to a second cross-stage partial channel-space grouping convolution module, the second hybrid residual detection head is connected to a third cross-stage partial channel-space grouping convolution module, and the third hybrid residual detection head is connected to a fourth cross-stage partial channel-space grouping convolution module.
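The connections described in [0040] to [0045] can be summarized as an Ultralytics-style layer list. Each entry names a module and the indices of its inputs, with -1 meaning the preceding layer. This is a hypothetical reconstruction from the text only. The layer indices are assigned here for illustration, and strides, channel widths, and module arguments are omitted because the patent does not give them.

```python
# Hypothetical layer list for WLM-Net reconstructed from [0040]-[0045].
# Each entry is (index, module, inputs); -1 means "the previous layer".
WLM_NET = [
    # Backbone
    (0, "Conv", [-1]),
    (1, "Conv", [-1]),
    (2, "C3k2_WDM(wdm=False)", [-1]),
    (3, "Conv", [-1]),
    (4, "C3k2_WDM(wdm=False)", [-1]),
    (5, "Conv", [-1]),
    (6, "C3k2_WDM(wdm=True)", [-1]),
    (7, "Conv", [-1]),
    (8, "C3k2_WDM(wdm=True)", [-1]),
    # Neck
    (9, "SPPF", [-1]),
    (10, "C2PSA", [-1]),
    (11, "Upsample", [-1]),
    (12, "Concat", [-1, 6]),    # first concatenation <- third C3k2_WDM
    (13, "VoV-GSCSPC", [-1]),
    (14, "Upsample", [-1]),
    (15, "Concat", [-1, 4]),    # second concatenation <- second C3k2_WDM
    (16, "VoV-GSCSPC", [-1]),
    (17, "LDConv", [-1]),
    (18, "Concat", [-1, 13]),   # third concatenation <- first VoV-GSCSPC
    (19, "VoV-GSCSPC", [-1]),
    (20, "LDConv", [-1]),
    (21, "Concat", [-1, 10]),   # fourth concatenation <- C2PSA
    (22, "VoV-GSCSPC", [-1]),
    # Head: one MRHead per output scale
    (23, "MRHead", [16]),
    (24, "MRHead", [19]),
    (25, "MRHead", [22]),
]


def resolve_inputs(layers):
    """Replace -1 with the preceding index (layer 0's -1 stays -1: the network input)."""
    return {idx: [idx - 1 if src == -1 else src for src in srcs] for idx, _, srcs in layers}


def count(layers, prefix):
    """Number of layers whose module name starts with `prefix`."""
    return sum(name.startswith(prefix) for _, name, _ in layers)


edges = resolve_inputs(WLM_NET)
print(edges[12], edges[15], edges[18], edges[21])              # [11, 6] [14, 4] [17, 13] [20, 10]
print(count(WLM_NET, "VoV-GSCSPC"), count(WLM_NET, "LDConv"))  # 4 2
```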

[0046] The beneficial effects of this invention are as follows:

[0047] 1. In the backbone network of the wind turbine blade damage detection model, the present invention designs C3k2_WDM to enhance the backbone network’s ability to extract global features and complex texture damage shapes, and solves the problem of difficulty in effectively extracting damage feature information in complex backgrounds of low-quality images.

[0048] 2. In the neck network of the wind turbine blade damage detection model, the present invention constructs a lightweight dynamic feature pyramid LD-FPN. By fusing local and global feature information, the network's ability to identify and locate damage is enhanced, solving the problem of multi-scale feature fusion for damage types of different shapes and sizes.

[0049] 3. In the head network of the wind turbine blade damage detection model, this invention proposes a hybrid residual detection head MRHead. MRHead improves the network's ability to capture fine-grained features and enhances its ability to learn from difficult samples. It thereby addresses two problems in low-quality images: the difficulty of effectively extracting fine-grained features of complex textures, and noise interference.

Description of the Drawings

[0050] Figure 1 is the overall flowchart of the present invention.

[0051] Figure 2 is a structural diagram of the wavelet deformable module (WDM) in this invention.

[0052] Figure 3 is a structural diagram of the depth wavelet separable convolution DWTPConv in this invention.

[0053] Figure 4 is a structural diagram of C3k2_WDM in this invention.

[0054] Figure 5 is a structural diagram of the cross-stage partial channel-space grouped convolution module VoV-GSCSPC in this invention.

[0055] Figure 6 is a structural diagram of the hybrid residual detection head MRHead in this invention.

[0056] Figure 7 is a structural diagram of the wind turbine blade damage detection model WLM-Net in this invention.

Detailed Description of the Embodiments

[0057] The present invention will be further described below with reference to the accompanying drawings and embodiments.

[0058] As shown in Figure 1, a method for detecting damage to wind turbine blades in low-quality damage images includes the following steps:

[0059] S1, Constructing the dataset: Images of wind turbine blade damage were collected by drones, and data augmentation was performed on three types of damage: cracks, surface peeling, and sand holes. Finally, the dataset of 2000 images was divided into a training set, a validation set, and a test set in a ratio of 8:1:1.

[0060] The dataset used in this invention was captured by a drone at a wind farm and contains 2000 images of wind turbine blade damage, each 480×480 pixels. The dataset was divided in an 8:1:1 ratio, resulting in 1600 training images, 200 validation images, and 200 test images.
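The 8:1:1 split in [0059] and [0060] can be reproduced with a short script. This sketch assumes a simple random split with a fixed seed. The patent does not say whether the split is stratified by damage type or how the files are named, so the file names below are hypothetical.

```python
import random


def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items reproducibly and split them by integer ratios (train:val:test)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # any rounding remainder goes to the test set
    return train, val, test


# Hypothetical file names standing in for the 2000 drone images.
images = [f"blade_{i:04d}.jpg" for i in range(2000)]
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 1600 200 200
```

A fixed seed keeps the split identical across training runs, so results on the test set remain comparable.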

[0061] S2, Model Construction: Design the wavelet deformable module WDM, the lightweight dynamic feature pyramid LD-FPN, and the hybrid residual detection head MRHead to construct a wind turbine blade damage detection model for edge devices with limited computing resources.

[0062] The specific process of step S2 is as follows:

[0063] S21: In the backbone network, a wavelet deformable module WDM and a cross-stage local dual-core module C3k2 are designed. The combination of the wavelet deformable module WDM and C3k2 is denoted C3k2_WDM. C3k2_WDM is used to extract global features and the features of damage shapes with complex textures.

[0064] As shown in Figure 4, the working process of C3k2_WDM is as follows:

[0065] First, a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1 adjusts the number of channels of the input feature map, yielding the first feature map, denoted F0. The standard convolutional layer ConvBNSiLU consists of a standard convolutional layer, a batch normalization layer, and a SiLU activation function layer connected in series. Then, the channel splitting module Split divides the first feature map into two parts along the channel dimension, yielding the second feature map and the eighth feature map, denoted as F2.

[0066] The second feature map enters the Feature Extraction module, which performs different feature extraction depending on the setting of the wavelet deformable module (WDM).

When WDM is False, the bottleneck module (Bottleneck) extracts shallow features, yielding the third feature map, denoted F3. When WDM is True, a dual-branch structure containing the WDM extracts deep features, yielding the seventh feature map, denoted F7.

In the dual-branch structure containing the WDM:
- One branch adjusts the number of channels of the second feature map using a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1, yielding the fourth feature map.
- The other branch first adjusts the number of channels of the second feature map using a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1, yielding the fifth feature map, denoted F5. The wavelet deformable module (WDM) then extracts features to obtain the sixth feature map, denoted F6.

Finally, the fourth and sixth feature maps are concatenated (in Figure 4, C denotes the concatenation module). A standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1 then adjusts the number of channels to obtain the seventh feature map;

[0067] The eighth feature map is concatenated with the third or seventh feature map, and the number of channels is adjusted again by a standard convolution ConvBNSiLU with a kernel size of 1×1 and a stride of 1. The final output is the integrated ninth feature map, which is denoted as F9.
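The numbered feature maps in [0065] to [0067] are easier to follow as a shape trace. The sketch below records (channels, height, width) for each map. Several values are assumptions in the style of YOLO C3k2 blocks, not values stated in the patent:
- a hidden width c = c_out × e with e = 0.5;
- an equal-halves Split;
- c/2 channels in each WDM sub-branch;
- a shape-preserving WDM.

The dictionary keys are illustrative labels for the patent's first through ninth feature maps.

```python
def c3k2_wdm_shapes(c_in, c_out, h, w, use_wdm, e=0.5):
    """Trace (channels, H, W) of the numbered feature maps of C3k2_WDM.

    Assumptions (not stated in the patent): hidden width c = int(c_out * e),
    an equal-halves Split, c // 2 channels in each WDM sub-branch, and a
    shape-preserving WDM. Every layer is assumed to use stride 1, so H and W
    never change.
    """
    c = int(c_out * e)
    s = {"input": (c_in, h, w)}
    s["map1"] = (2 * c, h, w)                  # 1x1 ConvBNSiLU
    s["map2"] = s["map8"] = (c, h, w)          # Split along channels
    if use_wdm:
        s["map4"] = (c // 2, h, w)             # branch A: 1x1 ConvBNSiLU
        s["map5"] = (c // 2, h, w)             # branch B: 1x1 ConvBNSiLU
        s["map6"] = s["map5"]                  # branch B: WDM
        s["concat_4_6"] = (s["map4"][0] + s["map6"][0], h, w)
        s["map7"] = (c, h, w)                  # 1x1 ConvBNSiLU after the concat
        extracted = s["map7"]
    else:
        s["map3"] = (c, h, w)                  # Bottleneck keeps c channels
        extracted = s["map3"]
    s["concat_8"] = (s["map8"][0] + extracted[0], h, w)  # concat with the eighth map
    s["map9"] = (c_out, h, w)                  # final 1x1 ConvBNSiLU
    return s


for flag in (False, True):
    trace = c3k2_wdm_shapes(128, 256, 40, 40, use_wdm=flag)
    print(flag, trace["concat_8"], trace["map9"])
```

Under these assumptions, the WDM flag changes only the feature-extraction path. The block's external interface (c_in in, c_out out, same H and W) is identical in both settings, which is what lets the backbone toggle WDM per stage.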

[0068] As shown in Figure 2, the wavelet deformable module (WDM) adopts a four-branch processing architecture to achieve collaborative extraction of multimodal features, including the first branch, the second branch, the third branch, and the fourth branch.

[0069] The first branch uses depth wavelet separable convolution (DWTPConv) to extract feature information. It decomposes the input fifth feature map into a low-frequency approximation component LL and three high-frequency detail components LH, HL, and HH through discrete wavelet transform (DWT). It uses a dynamic feature weighting mechanism to enhance the feature information of LL, LH, HL, and HH, and uses pointwise convolution to achieve linear combination across channels, thereby extracting feature information of different frequencies and spatial scales to obtain the eighteenth feature map.

[0070] The second branch employs a dynamic receptive field adjustment technique, using convolutional kernels with learnable offset parameters to enable the network to adaptively capture the complex texture features of the damaged area, thus obtaining the nineteenth feature map.

[0071] The third branch captures local spatial detail features through a standard convolution ConvBNSiLU with a kernel size of 3×3 and a stride of 1, resulting in the twentieth feature map.

[0072] The fourth branch preserves global semantic information through identity mapping, resulting in the twenty-first feature map;

[0073] Finally, the eighteenth, nineteenth, twentieth, and twenty-first feature maps are concatenated along the channel dimension and then recombined and nonlinearly mapped using a dual pointwise Gaussian error linear unit (DPGU) to generate the sixth feature map.

[0074] The Dual Pointwise Gaussian Error Linear Unit (DPGU) is a feedforward structure with residual fusion. Specifically, the concatenated input features are divided into two paths: one is the residual branch, which directly performs identity mapping; the other is the main processing branch, which sequentially passes through two standard convolutional layers (ConvBNSiLU) with a kernel size of 1×1 and a stride of 1, a GELU activation function layer, and a dropout layer. The output of the main processing branch and the features of the residual branch are combined element-wise by a residual fusion operation (ResidualFusion) to output the final sixth feature map.
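The per-pixel computation of the DPGU in [0074] can be sketched in pure Python, because a 1×1 convolution applied at one pixel is a matrix-vector product over channels. This sketch follows the layer order as written: two 1×1 ConvBNSiLU layers, then GELU, then dropout. Batch normalization is treated as the identity and dropout is omitted (inference mode). The weight matrices are hypothetical inputs, and the residual addition requires the second convolution to restore the input channel count.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu(x):
    """SiLU activation used inside ConvBNSiLU."""
    return x / (1.0 + math.exp(-x))


def pointwise(weights, vec):
    """A 1x1 convolution at one pixel: matrix-vector product over channels."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def dpgu_pixel(vec, w1, w2):
    """DPGU at one pixel; BN treated as identity, dropout omitted (inference mode)."""
    hidden = [silu(v) for v in pointwise(w1, vec)]      # ConvBNSiLU #1
    hidden = [silu(v) for v in pointwise(w2, hidden)]   # ConvBNSiLU #2
    hidden = [gelu(v) for v in hidden]                  # GELU
    return [x + y for x, y in zip(vec, hidden)]         # residual fusion (element-wise add)


x = [0.5, -1.0, 2.0]
w1 = [[0.1, 0.2, 0.0], [0.0, -0.3, 0.4]]   # 3 -> 2 channels (hypothetical values)
w2 = [[0.5, 0.0], [0.0, 0.5], [0.2, 0.2]]  # 2 -> 3 channels (hypothetical values)
print(dpgu_pixel(x, w1, w2))
```

With all-zero weights, the main branch outputs zeros and the block reduces to the identity. This is the property that makes such residual feedforward units easy to stack.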

[0075] As shown in Figure 3, in the first branch of the wavelet deformable module WDM, the depth wavelet separable convolution (DWTPConv) processes the input fifth feature map in a two-step feature extraction process.

[0076] Step 1: The input is a fifth feature map of size C1×H×W, where C1 is the number of channels, H is the height, and W is the width. It is processed by two parallel branches, a fifth branch and a sixth branch. The fifth branch applies a 3×3 standard 2D convolutional layer (Conv2d) to the fifth feature map, producing the sixteenth feature map. The sixth branch applies a discrete wavelet transform (DWT) to decompose the fifth feature map into a low-frequency approximation component LL and three high-frequency detail components LH, HL, and HH. These four components form four feature maps, designated the tenth, eleventh, twelfth, and thirteenth feature maps, each of size C1×(H/2)×(W/2). The ReShape module rearranges these four feature maps into a fourteenth feature map of size 4C1×(H/2)×(W/2). The fourteenth feature map is then processed by a 3×3 standard 2D convolutional layer (Conv2d) and an inverse wavelet transform (IWT) to obtain the fifteenth feature map. Finally, the fifteenth and sixteenth feature maps undergo residual fusion to output a seventeenth feature map of size C1×H×W.

[0077] Step 2: Pointwise convolution is applied to the seventeenth feature map from Step 1 to linearly combine features across channels. This adjusts the channel dimension and fuses the features, yielding the eighteenth feature map with C2 channels, height H, and width W.
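The DWT, ReShape, and IWT path of Step 1 can be illustrated with a single-level 2D Haar transform, which is one common choice. The source does not name the wavelet, so the following are all assumptions:

- the Haar wavelet itself;
- the sub-band naming convention;
- the order in which LL/LH/HL/HH are stacked.

The sketch omits the learnable 3×3 Conv2d between the two transforms. It only shows the shape bookkeeping (C1×H×W to 4C1×(H/2)×(W/2)) and that the inverse transform restores the original map. Function names are hypothetical.

```python
def haar_dwt2(x):
    """Single-level 2D Haar DWT of an H x W list (H, W even) -> (LL, LH, HL, HH)."""
    h, w = len(x), len(x[0])
    bands = {k: [[0.0] * (w // 2) for _ in range(h // 2)] for k in ("LL", "LH", "HL", "HH")}
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            bands["LL"][i // 2][j // 2] = (a + b + c + d) / 2  # local average
            bands["LH"][i // 2][j // 2] = (a - b + c - d) / 2  # left-right difference
            bands["HL"][i // 2][j // 2] = (a + b - c - d) / 2  # top-bottom difference
            bands["HH"][i // 2][j // 2] = (a - b - c + d) / 2  # diagonal difference
    return bands["LL"], bands["LH"], bands["HL"], bands["HH"]

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the H x W map from the four sub-bands."""
    h2, w2 = len(ll), len(ll[0])
    x = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for i in range(h2):
        for j in range(w2):
            s, p, q, r = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s + p + q + r) / 2
            x[2 * i][2 * j + 1] = (s - p + q - r) / 2
            x[2 * i + 1][2 * j] = (s + p - q - r) / 2
            x[2 * i + 1][2 * j + 1] = (s - p - q + r) / 2
    return x

def dwt_stack(fmap):
    """Decompose each of C1 channels and stack the bands -> 4*C1 maps of H/2 x W/2."""
    out = []
    for channel in fmap:
        out.extend(haar_dwt2(channel))
    return out
```

A constant map puts all of its energy into LL and leaves LH, HL, and HH at zero. Edges such as crack boundaries therefore show up mainly in the three high-frequency sub-bands.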

[0078] S22: In the neck network, a lightweight dynamic feature pyramid (LD-FPN) is constructed to fuse local and global feature information. This fusion enhances the ability of the wind turbine blade damage detection model WLM-Net to identify and localize damage.

[0079] The lightweight dynamic feature pyramid LD-FPN is composed of a lightweight linear dilated convolution module LDConv and a cross-stage partial channel-spatial grouped convolution module VoV-GSCSPC. LDConv reduces network parameters and computational complexity and performs feature extraction with irregularly shaped convolution kernels. VoV-GSCSPC maintains accuracy while keeping the computational cost low.

[0080] The feature extraction mechanism of the lightweight linear dilated convolution module LDConv is as follows. First, given the number of parameters N, a coordinate generation algorithm constructs an initial, approximately square sampling grid. Next, a two-dimensional standard convolutional layer Conv2d with a 1×1 kernel predicts a 2N-dimensional dynamic offset, which is added to the initial coordinates to form adaptive sampling positions. Finally, features at these irregular positions are resampled by bilinear interpolation, and multi-scale feature aggregation is completed through a feature recombination strategy.
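The source does not spell out the coordinate generation algorithm. The following plain-Python sketch therefore uses one plausible rule as an assumption: lay the N sampling points out in rows of round(√N) columns, and put any remainder in a final partial row. It then adds a 2N offset vector to obtain the adaptive, generally fractional, sampling positions that bilinear interpolation would read from. The offset layout is also an assumption: the first N entries shift rows and the last N shift columns. Function names are hypothetical.

```python
import math

def initial_sampling_grid(n):
    """Hypothetical coordinate generator: n points laid out as close to a square as possible."""
    base = round(math.sqrt(n))       # columns per full row
    rows, extra = divmod(n, base)    # number of full rows, points left for a partial row
    coords = [(r, c) for r in range(rows) for c in range(base)]
    coords += [(rows, c) for c in range(extra)]
    return coords

def adaptive_positions(center, coords, offsets):
    """Add a 2N offset vector (first N: row shifts, last N: column shifts) to the grid."""
    n = len(coords)
    if len(offsets) != 2 * n:
        raise ValueError("expected 2N offsets")
    cy, cx = center
    return [(cy + r + offsets[k], cx + c + offsets[n + k]) for k, (r, c) in enumerate(coords)]
```

Under this rule, N = 5 gives two full rows of two points plus one extra point. A kernel built this way can therefore use any number of parameters, rather than only 3×3, 5×5, and so on.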

[0081] As shown in Figure 5, the cross-stage partial channel-spatial grouped convolutional module VoV-GSCSPC processes its input features, denoted T1, through two parallel branches, the seventh and the eighth. The seventh branch adjusts the number of channels using a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1. The eighth branch first matches the channel count of the input features using a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1. It then passes the result to the grouping shuffling bottleneck module GSBottleneckC for feature extraction. Finally, the output of the seventh branch and the output of GSBottleneckC in the eighth branch are concatenated along the channel dimension. The number of channels is then adjusted again by a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1, producing the final output of VoV-GSCSPC.

[0082] The group shuffling bottleneck module GSBottleneckC comprises two group spatial convolutional modules GSConv with different kernel sizes and a depthwise separable convolutional module DWConv. The features input to the group shuffling bottleneck module GSBottleneckC are processed in two parallel paths. One path of feature information is extracted by the depthwise separable convolutional module DWConv, while the other path of feature information is extracted sequentially by the two group spatial convolutional modules GSConv. The features output from the two parallel processing paths are then subjected to residual fusion to output the final processed features of the group shuffling bottleneck module GSBottleneckC.

[0083] The grouped spatial convolution module GSConv processes its input features through two parallel branches, the ninth and the tenth. The ninth branch first adjusts the number of channels with a standard convolution ConvBNSiLU with a kernel size of 1×1 and a stride of 1, then extracts features with a depthwise separable convolution DWConv. The tenth branch passes the features through unchanged by identity mapping. Finally, the outputs of the ninth and tenth branches are concatenated along the channel dimension and reorganized by the channel shuffle module, producing the final output of GSConv.
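The channel shuffle that closes GSConv can be sketched as a reshape, transpose, and flatten over channel indices. The sketch below is in plain Python, with string labels standing in for feature maps. Using two groups (one per concatenated branch) is an assumption, since the source does not state the group count. `gsconv_channel_order` is a hypothetical helper that only tracks where each channel ends up.

```python
def channel_shuffle(channels, groups=2):
    """View C channels as (groups, C // groups), transpose, and flatten."""
    if len(channels) % groups:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]

def gsconv_channel_order(in_channels, mid_channels):
    """Concatenate the ninth branch (1x1 conv -> DWConv) with the tenth (identity), then shuffle."""
    ninth = [f"dw{i}" for i in range(mid_channels)]
    tenth = [f"id{i}" for i in range(in_channels)]
    return channel_shuffle(ninth + tenth)
```

Interleaving places convolved and pass-through channels side by side. The next layer therefore mixes information from both branches without the cost of a dense 1×1 convolution.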

[0084] S23: In the head network, a hybrid residual detection head MRHead is proposed to capture fine-grained features. In addition, a fully adaptive threshold focal loss function ATFL is introduced for damage classification. ATFL dynamically adjusts the loss weights and uses a fixed-threshold segmentation mechanism.

[0085] As shown in Figure 6, the hybrid residual detection head MRHead adopts a dual-branch architecture consisting of a regression branch and a classification branch.

[0086] The features input to the hybrid residual detection head MRHead are denoted T2. The regression branch first extracts features through two cascaded hybrid residual modules MRBlock, then reconstructs the features with a two-dimensional standard convolutional layer Conv2d. Each hybrid residual module MRBlock consists of a depthwise separable convolution DWConv and a standard convolution ConvBNSiLU with a kernel size of 1×1 and a stride of 1, and strengthens feature learning through a residual fusion operation. The enhanced features output by the first MRBlock are denoted T3. The feature map output by the Conv2d layer is used for target localization with the distribution focal loss (DFL) and the complete intersection over union (CIoU) loss.
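The reason a DWConv plus 1×1 ConvBNSiLU pair is lighter than a dense convolution can be shown with weight counts. This sketch assumes a 3×3 depthwise kernel, since the source does not give the DWConv kernel size. It also ignores bias and batch-normalization parameters.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a dense k x k convolution: every output channel sees every input channel."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k (one filter per input channel) followed by a 1x1 pointwise conv."""
    return c_in * k * k + c_in * c_out

dense = standard_conv_params(64, 64, 3)
light = depthwise_separable_params(64, 64, 3)
print(dense, light, round(dense / light, 2))  # prints: 36864 4672 7.89
```

At 64 channels, the separable pair needs roughly one eighth of the weights of a dense 3×3 layer. The residual fusion in MRBlock then adds no weights at all.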

[0087] The classification branch includes two depthwise separable convolutional layers (DWConv), two standard convolutional layers (ConvBNSiLU) with 1×1 kernels and a stride of 1, and a two-dimensional standard convolutional layer (Conv2d). The fully adaptive threshold focal loss function (ATFL) strengthens the network's learning on difficult samples. ATFL is calculated as follows:

[0088] ;

[0089] Here, one symbol represents the current average predicted probability value, another represents the predicted value for the next batch, and a third is an adjustable factor used to increase attention to difficult samples.

[0090] S24: Construct WLM-Net, a wind turbine blade damage detection model based on the YOLO framework.

[0091] As shown in Figure 7, the wind turbine blade damage detection model WLM-Net includes a backbone network, a neck network, and a head network.

[0092] The backbone network consists of a first standard convolution, a second standard convolution, a first C3k2_WDM with the wavelet deformable module WDM set to False, a third standard convolution, a second C3k2_WDM with WDM set to False, a fourth standard convolution, a third C3k2_WDM with WDM set to True, a fifth standard convolution, and a fourth C3k2_WDM with WDM set to True.

[0093] The neck network comprises, in sequence:

- a Fast Spatial Pyramid Pooling (SPPF) module;
- a cross-stage local pyramid slice attention module (C2PSA);
- a first upsampling module, a first concatenation module, and a first cross-stage partial channel-spatial grouping convolution module;
- a second upsampling module, a second concatenation module, and a second cross-stage partial channel-spatial grouping convolution module;
- a first lightweight linear dilation convolution module, a third concatenation module, and a third cross-stage partial channel-spatial grouping convolution module;
- a second lightweight linear dilation convolution module, a fourth concatenation module, and a fourth cross-stage partial channel-spatial grouping convolution module.

In addition, the second C3k2_WDM is connected to the second concatenation module, and the third C3k2_WDM is connected to the first concatenation module. The cross-stage local pyramid slice attention module (C2PSA) is connected to the fourth concatenation module, and the first cross-stage partial channel-spatial grouping convolution module is connected to the third concatenation module.

[0094] The input features of the Fast Spatial Pyramid Pooling (SPPF) module are first processed by a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1 to adjust the number of channels. They then pass through three cascaded 5×5 max pooling layers MaxPool, each operating on the output of the previous one. The outputs of the three pooling layers are concatenated with the features before pooling along the channel dimension. Finally, a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1 completes channel fusion and dimension transformation and outputs the multi-scale fused features.
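The cascade can be checked with a plain-Python sketch on one channel. It assumes stride-1 pooling with padding k // 2; because padding never wins the max, clipping the window at the borders reproduces it. Two cascaded 5×5 pools cover a 9×9 window and three cover 13×13. SPPF therefore produces the same 5/9/13 multi-scale maxima as parallel SPP pooling, at lower cost. The function names are hypothetical.

```python
def max_pool(x, k):
    """k x k max pooling with stride 1; the window is clipped at the borders."""
    h, w, r = len(x), len(x[0]), k // 2
    return [
        [
            max(x[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1)))
            for j in range(w)
        ]
        for i in range(h)
    ]

def sppf_pyramid(x):
    """Apply three cascaded 5x5 pools; return [x, p1, p2, p3] for channel concatenation."""
    outs = [x]
    for _ in range(3):
        outs.append(max_pool(outs[-1], 5))
    return outs
```

In the full module each pooled level keeps the channel count. Concatenating the four levels therefore quadruples the channels before the final 1×1 ConvBNSiLU.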

[0095] The cross-stage local pyramid slicing attention module C2PSA adopts a cross-stage local dual-branch architecture: the input features are first processed by a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1, and then divided into two branches by the channel splitting module Split; one branch keeps the original features unchanged and directly passes them to the concatenation operation; the other branch passes through the position-sensitive attention module PSABlock to extract global attention information; the output features of the two branches are concatenated along the channel dimension, and then passed through a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1 to complete channel fusion and dimensional transformation, and finally output features that have fused global attention information.

[0096] The input feature T4 of the position-sensitive attention module PSABlock is first processed by the multi-head self-attention module Attention to extract attention weights, and then residually connected with the input feature of the position-sensitive attention module PSABlock. Subsequently, it is processed by the feedforward network module FFN, which sequentially passes through a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1, and a convolutional-batch normalization module ConvBN with a kernel size of 1×1 and a stride of 1, to complete feature transformation and residual fusion, and output the enhanced attention feature T5.
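The two residual steps of PSABlock can be sketched in plain Python on a list of tokens, one per spatial position. For readability, the sketch assumes a single attention head with identity query, key, and value projections. It also assumes batch normalization is folded into the FFN weights. The real block learns its projections and uses multiple heads, so this is a structural sketch only. All names are hypothetical.

```python
import math

def silu(v):
    """Overflow-safe x * sigmoid(x)."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)

def softmax(row):
    m = max(row)                                   # subtract the max for stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [v / total for v in exps]

def self_attention(tokens):
    """Single-head scaled dot-product attention with Q = K = V = tokens."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens])
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out

def matvec(m, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in m]

def ffn(vec, w1, w2):
    """1x1 ConvBNSiLU (SiLU applied) followed by 1x1 ConvBN (no activation)."""
    return matvec(w2, [silu(v) for v in matvec(w1, vec)])

def psa_block(tokens, w1, w2):
    attn = self_attention(tokens)
    x = [[a + b for a, b in zip(t, s)] for t, s in zip(tokens, attn)]  # residual 1: attention
    return [[a + b for a, b in zip(t, ffn(t, w1, w2))] for t in x]      # residual 2: FFN
```

Each attention output is a weighted average of all tokens. This is what lets every position draw on global context before the position-wise FFN refines it.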

[0097] The head network includes a first, a second, and a third hybrid residual detection head. The first hybrid residual detection head is connected to the second cross-stage partial channel-spatial grouping convolution module. The second is connected to the third cross-stage partial channel-spatial grouping convolution module, and the third to the fourth.
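The connections described in [0092], [0093], and [0097] can be summarized as a layer table. This plain-Python sketch borrows the common YOLO convention of listing each layer with its input indices, where -1 means the previous layer. The indices, module labels, and table layout are assumptions for illustration, not the authors' configuration file. `resolve_inputs` makes every edge explicit, so the skip connections into each concatenation (stitching) module can be read off directly.

```python
# Hypothetical layer table: (index, module, input indices); -1 means "previous layer".
LAYERS = [
    (0, "Conv", [-1]), (1, "Conv", [-1]), (2, "C3k2_WDM(False)", [-1]),
    (3, "Conv", [-1]), (4, "C3k2_WDM(False)", [-1]),
    (5, "Conv", [-1]), (6, "C3k2_WDM(True)", [-1]),
    (7, "Conv", [-1]), (8, "C3k2_WDM(True)", [-1]),
    (9, "SPPF", [-1]), (10, "C2PSA", [-1]),
    (11, "Upsample", [-1]), (12, "Concat", [-1, 6]), (13, "VoV-GSCSPC", [-1]),
    (14, "Upsample", [-1]), (15, "Concat", [-1, 4]), (16, "VoV-GSCSPC", [-1]),
    (17, "LDConv", [-1]), (18, "Concat", [-1, 13]), (19, "VoV-GSCSPC", [-1]),
    (20, "LDConv", [-1]), (21, "Concat", [-1, 10]), (22, "VoV-GSCSPC", [-1]),
]
HEADS = {"MRHead-1": 16, "MRHead-2": 19, "MRHead-3": 22}  # detection heads and their inputs

def resolve_inputs(layers):
    """Replace -1 with the previous index (for layer 0 this stays -1, i.e. the image)."""
    return {i: [i - 1 if src == -1 else src for src in srcs] for i, _, srcs in layers}

def concat_sources(layers):
    """Name the two modules feeding each concatenation layer."""
    names = {i: name for i, name, _ in layers}
    edges = resolve_inputs(layers)
    return {i: [names[s] for s in edges[i]] for i, name, _ in layers if name == "Concat"}
```

Reading the table top to bottom gives a top-down path followed by a bottom-up path:

- The two upsampling stages pull in the third and second C3k2_WDM outputs.
- The two LDConv stages pull in the first VoV-GSCSPC and the C2PSA outputs.
- MRHead attaches to layers 16, 19, and 22.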

[0098] S3, Model Training: The wind turbine blade damage detection model is trained using the training set.

[0099] S4, Perform detection: Input the wind turbine blade damage images from the test set into the trained wind turbine blade damage detection model, and save the weight file with the best performance based on the inference results of the wind turbine blade damage detection model on the test set.

[0100] Based on the evaluation metrics, the wind turbine blade damage detection model WLM-Net of the present invention was comprehensively compared with other YOLO series detection algorithms. The effectiveness of each core module of WLM-Net was verified by ablation experiments.

[0101] The evaluation metrics are:

- the average precision of a single damage class (AP50);
- the mean average precision over all damage classes (mAP50);
- the number of parameters (Params);
- the number of floating-point operations (FLOPs).

[0102] This embodiment compares current mainstream YOLO series detection networks with the wind turbine blade damage detection model WLM-Net proposed in this invention. The experimental results are shown in Table 1. In Table 1, mAP50 is the mean average precision over all damage categories at an intersection over union (IoU) threshold of 0.5, and AP50 is the average precision of a single damage category. Larger values of these two metrics indicate higher detection accuracy. Params and FLOPs are, respectively, the number of parameters and the number of floating-point operations of the detection network. Smaller values indicate lower network complexity.

[0103] Table 1

[0104]

[0105] As shown in Table 1, the wind turbine blade damage detection model WLM-Net proposed in this invention achieves a mean average precision (mAP50) of 85.1% on the wind turbine blade damage dataset, the best value among the compared detection methods. WLM-Net's mAP50 gains over several mainstream versions of the single-stage object detection algorithm YOLO are:

- 3.8 percentage points over the 8th-generation medium-scale version YOLOv8-M;
- 4.0 percentage points over the 9th-generation medium-scale version YOLOv9-M;
- 6.8 percentage points over the 10th-generation basic version YOLOv10-B;
- 5.7 percentage points over the 10th-generation large-scale version YOLOv10-L;
- 4.2 percentage points over the 11th-generation medium-scale version YOLOv11-M;
- 3.7 percentage points over the 11th-generation large-scale version YOLOv11-L;
- 2.8 percentage points over the 12th-generation medium-scale version YOLOv12-M.

The comparison further shows that WLM-Net has the lowest number of parameters (17.4M) and the lowest computational cost (54.4G) of the compared models. WLM-Net therefore reduces parameter count and computation while maintaining high detection precision.
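The percentage-point gains reported in [0105] imply each competitor's mAP50, which can be recovered by subtracting the gain from WLM-Net's 85.1%. The values below are derived from the text, not taken from Table 1, and may differ from the table by rounding.

```python
WLM_NET_MAP50 = 85.1  # %, from the text
GAINS_PP = {  # WLM-Net's mAP50 gain over each model, in percentage points (from the text)
    "YOLOv8-M": 3.8, "YOLOv9-M": 4.0, "YOLOv10-B": 6.8, "YOLOv10-L": 5.7,
    "YOLOv11-M": 4.2, "YOLOv11-L": 3.7, "YOLOv12-M": 2.8,
}

def implied_map50(total, gains):
    """mAP50 of each compared model = WLM-Net mAP50 - reported gain (rounded to 0.1)."""
    return {name: round(total - gain, 1) for name, gain in gains.items()}

implied = implied_map50(WLM_NET_MAP50, GAINS_PP)
strongest = max(implied, key=implied.get)
print(strongest, implied[strongest])  # prints: YOLOv12-M 82.3
```

The implied 80.9% for YOLOv11-M matches the baseline mAP50 quoted in the ablation study in [0119], which is consistent with YOLOv11-M serving as that baseline.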

[0106] In the per-category analysis, WLM-Net achieved the highest AP50 values, 94.3% for cracks and 74.6% for pinholes. This result is attributed to the C3k2_WDM designed in this invention, which lets the network adapt to slender cracks and irregularly shaped pinholes through dynamic receptive field adjustment. In addition, the lightweight dynamic feature pyramid LD-FPN promotes multi-scale feature fusion, and the hybrid residual detection head MRHead strengthens the capture of fine-grained features. The combined effect of these three components improves damage detection accuracy.

[0107] To verify the contribution of each improvement in the wind turbine blade damage detection model WLM-Net proposed in this invention, the following ablation experiments were conducted. The results are shown in Table 2.

[0108] Table 2

[0109]

[0110] To present the experimental results clearly, the baseline and its seven improved variants are named as follows:

[0111] (1) The 11th-generation medium-scale version YOLOv11-M is used as the baseline model;

[0112] (2) The baseline model improved with C3k2_WDM is called W-Net;

[0113] (3) The baseline model improved with the lightweight dynamic feature pyramid LD-FPN is called L-Net;

[0114] (4) The baseline model improved with the hybrid residual detection head MRHead is called M-Net;

[0115] (5) The baseline model improved with both the lightweight dynamic feature pyramid LD-FPN and the hybrid residual detection head MRHead is called LM-Net;

[0116] (6) The baseline model improved with both C3k2_WDM and the lightweight dynamic feature pyramid LD-FPN is called WL-Net;

[0117] (7) The baseline model improved with both C3k2_WDM and the hybrid residual detection head MRHead is called WM-Net;

[0118] (8) The baseline model improved with C3k2_WDM, the lightweight dynamic feature pyramid LD-FPN, and the hybrid residual detection head MRHead is called WLM-Net.

[0119] The first four groups of experimental data in Table 2 show that C3k2_WDM, the lightweight dynamic feature pyramid LD-FPN, and the hybrid residual detection head MRHead each contribute to the detection accuracy of WLM-Net. This verifies the effectiveness and feasibility of each module.

- W-Net raised mAP50 from 80.9% to 82.9% while slightly reducing parameter count and computational complexity.
- L-Net builds the lightweight feature pyramid LD-FPN in the neck network from the lightweight linear dilated convolution module LDConv and the cross-stage partial channel-spatial grouping convolution module VoV-GSCSPC. It improved mAP50 on this dataset by more than 1.6 percentage points, strengthening the network's damage detection capability.
- M-Net, which introduces the efficient hybrid residual detection head MRHead, improved mAP50 on this dataset by 2.1 percentage points. This shows that MRHead improves detection accuracy while reducing parameter count and computational complexity to varying degrees.

[0120] In the subsequent experiments (groups 5 through 7), this invention combined the lightweight dynamic feature pyramid (LD-FPN), C3k2_WDM, and the hybrid residual detection head (MRHead) in pairs to form WL-Net, WM-Net, and LM-Net.

- WL-Net, which combines C3k2_WDM and LD-FPN, improved detection accuracy (mAP50) by 2.3 percentage points and reduced network parameter count and computational complexity.
- WM-Net, which combines C3k2_WDM and MRHead, improved mAP50 by 3.3 percentage points while reducing parameter count and computational complexity. Notably, WM-Net achieved AP50 values of 93.4% for cracks and 89.3% for epidermal detachment, improvements of 9.9 and 1.5 percentage points, respectively, over the baseline model. This demonstrates that the wavelet deformable module (WDM) designed in this invention effectively extracts features of slender damage such as cracks, while MRHead effectively captures fine-grained feature information. Their synergy significantly improves detection accuracy for these two damage types.
- LM-Net, which combines LD-FPN and the efficient MRHead, reduces the number of network parameters and the computational complexity by more than 10%. It also raises mAP50 on the dataset to 83.8%, a gain of 2.9 percentage points.

Finally, combining all three improvements yields the wind turbine blade damage detection model WLM-Net. WLM-Net achieves an mAP50 of 85.1% on the dataset, significantly improving the network's ability to detect surface damage on wind turbine blades.
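The ablation figures in [0119] and [0120] can be cross-checked with the same arithmetic. Every gain is read as percentage points over the 80.9% baseline, which is an assumption for the gains the text reports with a % sign. L-Net is left out because the text gives only a lower bound (more than 1.6) for its gain. The implied values are derived, not copied from Table 2.

```python
BASELINE_MAP50 = 80.9                                              # YOLOv11-M, from [0119]
STATED_MAP50 = {"W-Net": 82.9, "LM-Net": 83.8, "WLM-Net": 85.1}  # absolute mAP50 values in the text
STATED_GAIN = {"M-Net": 2.1, "WL-Net": 2.3, "WM-Net": 3.3}        # gains in the text (assumed pp)

def ablation_rows(baseline, absolute, gains):
    """Return {variant: (mAP50, gain)}, deriving whichever of the two the text omits."""
    rows = {name: (value, round(value - baseline, 1)) for name, value in absolute.items()}
    rows.update({name: (round(baseline + gain, 1), gain) for name, gain in gains.items()})
    return rows

rows = ablation_rows(BASELINE_MAP50, STATED_MAP50, STATED_GAIN)
for name, (map50, gain) in sorted(rows.items(), key=lambda item: item[1][0]):
    print(f"{name:8s} mAP50={map50:.1f}%  gain=+{gain:.1f} pp")
```

The derived 4.2-point gain of WLM-Net over the baseline equals its 4.2-point margin over YOLOv11-M in [0105]. The two tables are therefore consistent on this point.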

Claims

1. A wind turbine blade damage detection method for low-quality damaged images, characterized in that it comprises the following steps: S1, Constructing the dataset: Collect images of wind turbine blade damage by drone, and, after data augmentation, divide the blade image data of different damage types into a training set, a validation set, and a test set; S2, Model Construction: Design the wavelet deformable module WDM, the cross-stage local dual-core module C3k2, the lightweight dynamic feature pyramid LD-FPN, and the hybrid residual detection head MRHead to construct a wind turbine blade damage detection model. The specific process of step S2 is as follows: S21: In the backbone network, a wavelet deformable module WDM and a cross-stage local dual-core module C3k2 are designed. The combination of WDM and C3k2 is denoted C3k2_WDM. C3k2_WDM is used to extract global features and features of damage with complex textures and shapes. The Wavelet Deformable Module (WDM) employs a four-branch processing architecture, comprising a first branch, a second branch, a third branch, and a fourth branch, to achieve collaborative extraction of multimodal features. The first branch uses depth wavelet separable convolution (DWTPConv) to extract feature information. It decomposes the input fifth feature map into a low-frequency approximation component LL and three high-frequency detail components LH, HL, and HH through discrete wavelet transform (DWT). It uses a dynamic feature weighting mechanism to enhance the feature information of LL, LH, HL, and HH, and uses pointwise convolution to achieve linear combination across channels, thereby extracting feature information of different frequencies and spatial scales to obtain the eighteenth feature map. 
The second branch employs a dynamic receptive field adjustment technique, using convolutional kernels with learnable offset parameters to enable the network to adaptively capture the complex texture features of the damaged area, thus obtaining the nineteenth feature map. The third branch captures local spatial detail features through a standard convolution ConvBNSiLU with a kernel size of 3×3 and a stride of 1, resulting in the twentieth feature map. The fourth branch preserves global semantic information through identity mapping, resulting in the twenty-first feature map; Finally, the eighteenth, nineteenth, twentieth, and twenty-first feature maps are concatenated along the channel dimension and then recombined and nonlinearly mapped using a dual pointwise Gaussian error linear unit (DPGU) to generate the sixth feature map. The Dual Pointwise Gaussian Error Linear Unit (DPGU) is a feedforward structure with residual fusion. Specifically, the concatenated input features are divided into two paths: one is the residual branch, which directly performs identity mapping; the other is the main processing branch, which sequentially passes through two standard convolutional layers (ConvBNSiLU) with a kernel size of 1×1 and a stride of 1, a GELU activation function layer, and a dropout layer. The output of the main processing branch and the features of the residual branch are added element-wise to perform residual fusion, and the final sixth feature map is output. In the first branch of the Wavelet Deformable Module (WDM), the Depth Wavelet Separable Convolution (DWTPConv) processes the input fifth feature map through a two-step feature extraction process. Step 1: First, input a fifth feature map of size C1×H×W, where C1 is the number of channels, H is the height, and W is the width. Processing is done through two parallel branches: a fifth branch and a sixth branch. 
The fifth branch uses a 3×3 standard 2D convolutional layer (Conv2d) to extract features from the fifth feature map, resulting in the sixteenth feature map. The sixth branch uses Discrete Wavelet Transform (DWT) to decompose the fifth feature map into a low-frequency approximation component LL and three high-frequency detail components LH, HL, and HH. These four components form four feature maps, designated the tenth, eleventh, twelfth, and thirteenth feature maps, each of size C1×(H/2)×(W/2). The ReShape module is used to rearrange these four feature maps into a fourteenth feature map of size 4C1×(H/2)×(W/2). Next, the fourteenth feature map is processed using a 3×3 standard 2D convolutional layer (Conv2d) and inverse wavelet transform (IWT) to obtain the fifteenth feature map. Finally, the fifteenth and sixteenth feature maps undergo residual fusion to output a seventeenth feature map of size C1×H×W. Step 2: Apply pointwise convolution to the seventeenth feature map output from Step 1 to perform a linear combination across channels, complete the adjustment of channel dimensions and feature fusion, and output the eighteenth feature map with C2 channels, height H, and width W; S22: In the neck network, construct a lightweight dynamic feature pyramid LD-FPN to fuse local and global feature information to identify and locate damage; In step S22, the lightweight dynamic feature pyramid LD-FPN is formed by fusing the lightweight linear dilated convolutional module LDConv and the cross-stage partial channel-spatial grouped convolutional module VoV-GSCSPC. The feature extraction mechanism of the lightweight linear dilated convolutional module LDConv is as follows: First, based on the number of parameters N, an initial approximately square sampling grid is constructed according to the coordinate generation algorithm; then, a 2D standard convolutional layer Conv2d with a kernel size of 1×1 is used to predict the 2N-dimensional dynamic offset, 
and the dynamic offset is superimposed with the initial coordinates to form an adaptive sampling position; finally, feature resampling of irregular regions is achieved based on bilinear interpolation, and multi-scale feature aggregation is completed through a feature recombination strategy; S23: In the head network, a hybrid residual detection head MRHead is proposed to capture fine-grained features; In step S23, the hybrid residual detection head MRHead adopts a dual-branch architecture, including a regression branch and a classification branch; The regression branch first extracts features through two cascaded hybrid residual modules MRBlock, and then completes feature reconstruction through a two-dimensional standard convolutional layer Conv2d. The hybrid residual module MRBlock consists of a depthwise separable convolution DWConv and a standard convolution ConvBNSiLU with a kernel size of 1×1 and a stride of 1. It enhances feature learning ability through a residual fusion operation. The feature map output by the two-dimensional standard convolutional layer Conv2d completes target localization through the distribution focal loss DFL and the complete intersection over union loss CIoU. The classification branch includes two depthwise separable convolutional layers (DWConv), two standard convolutional layers (ConvBNSiLU) with 1×1 kernels and a stride of 1, and a two-dimensional standard convolutional layer (Conv2d). The network's learning ability for difficult samples is enhanced by the fully adaptive threshold focal loss function (ATFL). The formula for calculating the fully adaptive threshold focal loss function (ATFL) is as follows: ; where one symbol represents the current average predicted probability value and another represents the predicted value for the next batch. 
S24: Construct the wind turbine blade damage detection model WLM-Net based on the YOLO framework; In step S24, the wind turbine blade damage detection model WLM-Net includes a backbone network, a neck network, and a head network. The backbone network consists of a first standard convolution, a second standard convolution, a first C3k2_WDM with the wavelet deformable module WDM set to False, a third standard convolution, a second C3k2_WDM with the wavelet deformable module WDM set to False, a fourth standard convolution, a third C3k2_WDM with the wavelet deformable module WDM set to True, a fifth standard convolution, and a fourth C3k2_WDM with the wavelet deformable module WDM set to True. The neck network comprises, in sequence, a Fast Spatial Pyramid Pooling (SPPF) module, a cross-stage local pyramid slice attention module (C2PSA), a first upsampling module, a first concatenation module, a first cross-stage partial channel-spatial grouping convolution module, a second upsampling module, a second concatenation module, a second cross-stage partial channel-spatial grouping convolution module, a first lightweight linear dilation convolution module, a third concatenation module, a third cross-stage partial channel-spatial grouping convolution module, a second lightweight linear dilation convolution module, a fourth concatenation module, and a fourth cross-stage partial channel-spatial grouping convolution module; wherein the second C3k2_WDM is connected to the second concatenation module, the third C3k2_WDM is connected to the first concatenation module, the cross-stage local pyramid slice attention module (C2PSA) is connected to the fourth concatenation module, and the first cross-stage partial channel-spatial grouping convolution module is connected to the third concatenation module; The input features of the Fast Spatial Pyramid Pooling (SPPF) module are first processed by a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1 to adjust the number of channels. 
Then, they pass through three cascaded 5×5 max pooling layers MaxPool, each operating on the output of the previous one. The outputs of the pooling layers are concatenated with the features before pooling along the channel dimension. Finally, the features are processed by a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1 to complete channel fusion and dimension transformation, and output multi-scale fused features. The cross-stage local pyramid slicing attention module C2PSA adopts a cross-stage local dual-branch architecture: the input features are first processed by a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1, and then divided into two branches by the channel splitting module Split; one branch keeps the original features unchanged and directly passes them to the concatenation operation; the other branch passes through the position-sensitive attention module PSABlock to extract global attention information; the output features of the two branches are concatenated along the channel dimension, and then passed through a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1 to complete channel fusion and dimensional transformation, and finally output features that have fused global attention information. The input features of the position-sensitive attention module PSABlock are first processed by the multi-head self-attention module Attention to extract attention weights, and then residually connected with the input features of the position-sensitive attention module PSABlock. Subsequently, the feature transformation and residual fusion are completed through the feedforward network module FFN, which sequentially passes through a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1, and a convolutional-batch normalization module ConvBN with a kernel size of 1×1 and a stride of 1, and outputs the enhanced attention features. 
The head network includes a first hybrid residual detection head, a second hybrid residual detection head, and a third hybrid residual detection head. The first hybrid residual detection head is connected to a second cross-stage partial channel-spatial grouping convolutional module, the second hybrid residual detection head is connected to a third cross-stage partial channel-spatial grouping convolutional module, and the third hybrid residual detection head is connected to a fourth cross-stage partial channel-spatial grouping convolutional module. S3, Model Training: The wind turbine blade damage detection model is trained using the training set; S4, Perform detection: Input the wind turbine blade damage images from the test set into the trained wind turbine blade damage detection model, and based on the inference results of the wind turbine blade damage detection model on the test set, save the weight file with the best performance and determine the structure of the final wind turbine blade damage detection model.

2. The wind turbine blade damage detection method for low-quality damage images according to claim 1, characterized in that, in step S21, C3k2_WDM works as follows. First, a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1 adjusts the number of channels of the input feature map, producing the first feature map; the standard convolutional layer ConvBNSiLU consists of a convolutional layer, a batch normalization layer, and a SiLU activation layer connected in series. The channel splitting module Split then divides the first feature map into two parts along the channel dimension, producing the second feature map and the eighth feature map. The second feature map enters the feature extraction module, which extracts features differently depending on the value of the wavelet deformable module (WDM) flag: when the flag is false, the Bottleneck module extracts shallow features, producing the third feature map; when the flag is true, a dual-branch structure containing the WDM extracts deep features, producing the seventh feature map. In this dual-branch structure, one branch adjusts the number of channels of the second feature map using a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1, producing the fourth feature map. The other branch first adjusts the number of channels of the second feature map using a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1, producing the fifth feature map, and then extracts features with the WDM, producing the sixth feature map. Finally, the fourth and sixth feature maps are concatenated and their number of channels is adjusted by a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1, outputting the seventh feature map.
The eighth feature map is concatenated with the third or seventh feature map, and the number of channels is adjusted again by a standard convolutional layer ConvBNSiLU with a kernel size of 1×1 and a stride of 1, outputting the integrated ninth feature map.
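To make the data flow of C3k2_WDM easier to follow, the sketch below traces tensor shapes (channels, height, width) through the numbered feature maps. It is bookkeeping only, not an implementation: it assumes an equal channel split, a hidden width of half the output channels, and that Bottleneck and WDM preserve shape. The WDM internals are not modelled, and all function names are hypothetical.

```python
from typing import Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def conv1x1(x: Shape, out_ch: int) -> Shape:
    """ConvBNSiLU with a 1x1 kernel and stride 1: only the channel count changes."""
    return (out_ch, x[1], x[2])


def split_half(x: Shape) -> Tuple[Shape, Shape]:
    """Split along channels into two equal parts (equal split is an assumption)."""
    c, h, w = x
    if c % 2:
        raise ValueError("channel count must be even")
    return (c // 2, h, w), (c // 2, h, w)


def concat(a: Shape, b: Shape) -> Shape:
    """Channel concatenation; spatial sizes must match."""
    if a[1:] != b[1:]:
        raise ValueError("spatial sizes differ")
    return (a[0] + b[0], a[1], a[2])


def bottleneck(x: Shape) -> Shape:
    return x  # assumed shape-preserving


def wdm(x: Shape) -> Shape:
    return x  # assumed shape-preserving; WDM internals are not modelled


def c3k2_wdm(x: Shape, c_out: int, use_wdm: bool) -> Shape:
    c_hid = c_out // 2                           # assumed hidden width
    f1 = conv1x1(x, 2 * c_hid)                   # first feature map
    f2, f8 = split_half(f1)                      # second and eighth
    if not use_wdm:
        branch = bottleneck(f2)                  # third (shallow features)
    else:
        f4 = conv1x1(f2, c_hid // 2)             # fourth
        f6 = wdm(conv1x1(f2, c_hid // 2))        # fifth -> sixth
        branch = conv1x1(concat(f4, f6), c_hid)  # seventh (deep features)
    return conv1x1(concat(f8, branch), c_out)    # ninth


if __name__ == "__main__":
    print(c3k2_wdm((64, 80, 80), 128, use_wdm=True))  # (128, 80, 80)
```

Both settings of the flag end at the same output shape, which is what lets the WDM branch replace the Bottleneck without changing the surrounding network.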

3. The wind turbine blade damage detection method for low-quality damage images according to claim 2, characterized in that the cross-stage partial channel-spatial grouped convolutional module VoV-GSCSPC processes the input features through two parallel branches, the seventh branch and the eighth branch. The seventh branch adjusts the number of channels using a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1. The eighth branch first performs channel matching on the input features using a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1, and then feeds the result into the group shuffle bottleneck module GSBottleneckC for feature extraction. Finally, the output of the seventh branch and the output of GSBottleneckC in the eighth branch are concatenated along the channel dimension, and the number of channels is adjusted again by a standard convolutional module ConvBNSiLU with a kernel size of 1×1 and a stride of 1, producing the final output of VoV-GSCSPC. The group shuffle bottleneck module GSBottleneckC comprises two grouped spatial convolution modules GSConv with different kernel sizes and a depthwise separable convolution module DWConv. Its input features are processed along two parallel paths: one path extracts features with DWConv, and the other extracts features sequentially with the two GSConv modules. The outputs of the two paths are then fused residually to produce the final output of GSBottleneckC.
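A similar shape trace for VoV-GSCSPC and GSBottleneckC shows why the two paths in GSBottleneckC must produce matching shapes before residual fusion. The following are assumptions, not statements from the claim: residual fusion is element-wise addition, every convolution uses stride 1 with same padding, the hidden width is half the output channels, and the first GSConv halves the channels while the second restores them. Function names are hypothetical.

```python
from typing import Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def conv_same(x: Shape, out_ch: int) -> Shape:
    """Any stride-1, same-padded convolution (ConvBNSiLU, GSConv, DWConv here)."""
    return (out_ch, x[1], x[2])


def concat(a: Shape, b: Shape) -> Shape:
    """Channel concatenation; spatial sizes must match."""
    if a[1:] != b[1:]:
        raise ValueError("spatial sizes differ")
    return (a[0] + b[0], a[1], a[2])


def residual_add(a: Shape, b: Shape) -> Shape:
    """Residual fusion, assumed to be element-wise addition: shapes must be equal."""
    if a != b:
        raise ValueError(f"cannot add {a} and {b}")
    return a


def gs_bottleneck_c(x: Shape, c_out: int) -> Shape:
    dw_path = conv_same(x, c_out)                         # DWConv path
    gs_path = conv_same(conv_same(x, c_out // 2), c_out)  # GSConv -> GSConv (kernels assumed e.g. 1x1, 3x3)
    return residual_add(dw_path, gs_path)


def vov_gscspc(x: Shape, c_out: int) -> Shape:
    c_hid = c_out // 2                                    # assumed hidden width
    seventh = conv_same(x, c_hid)                         # 1x1 ConvBNSiLU
    eighth = gs_bottleneck_c(conv_same(x, c_hid), c_hid)  # 1x1 ConvBNSiLU -> GSBottleneckC
    return conv_same(concat(seventh, eighth), c_out)      # fuse and set output channels


if __name__ == "__main__":
    print(vov_gscspc((256, 40, 40), 128))  # (128, 40, 40)
```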

4. The wind turbine blade damage detection method for low-quality damage images according to claim 3, characterized in that the grouped spatial convolution module GSConv processes the input features through two parallel branches, the ninth branch and the tenth branch. The ninth branch first adjusts the number of channels using a standard convolution ConvBNSiLU with a kernel size of 1×1 and a stride of 1 and then extracts features using a depthwise separable convolution DWConv; the tenth branch passes the features through unchanged via identity mapping. Finally, the outputs of the ninth and tenth branches are concatenated along the channel dimension, and the channel shuffle module reorders the channels to produce the final output of GSConv.
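The channel shuffle at the end of GSConv can be illustrated on channel labels instead of tensors. This sketch assumes the common ShuffleNet-style shuffle with two groups (view the channels as a groups × per-group grid, transpose, flatten) and assumes both branches contribute the same number of channels; the claim does not specify the exact permutation, and the labels `dw*` and `id*` are hypothetical.

```python
from typing import List, Sequence


def channel_shuffle(channels: Sequence[str], groups: int = 2) -> List[str]:
    """ShuffleNet-style shuffle: view as (groups x per_group), transpose, flatten."""
    n = len(channels)
    if groups <= 0 or n % groups:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


def gsconv_channel_order(n_per_branch: int) -> List[str]:
    """Provenance of GSConv output channels (labels are hypothetical)."""
    ninth = [f"dw{i}" for i in range(n_per_branch)]  # 1x1 ConvBNSiLU -> DWConv branch
    tenth = [f"id{i}" for i in range(n_per_branch)]  # identity branch
    return channel_shuffle(ninth + tenth, groups=2)  # concatenate, then shuffle


if __name__ == "__main__":
    print(gsconv_channel_order(3))
    # ['dw0', 'id0', 'dw1', 'id1', 'dw2', 'id2']
```

The interleaving places DWConv-derived and identity channels next to each other, so later channel-wise or grouped operations see information from both branches rather than from one contiguous block.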