Voiceprint-visual bimodal diagnosis method and system for wind turbine blade cracks and icing

The wind turbine blade diagnostic method, which integrates voiceprint and visual dual-modal approaches, solves the problems of insufficient real-time performance and accuracy in existing technologies. It enables high-precision all-weather blade condition monitoring and early damage identification, and provides visualized damage location.

CN122196773A · Pending · Publication Date: 2026-06-12 · HUANENG TONGLIAO WIND POWER CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HUANENG TONGLIAO WIND POWER CO LTD
Filing Date
2026-02-28
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing methods for monitoring and diagnosing wind turbine blades suffer from poor real-time performance, strong environmental dependence, low automation, and low diagnostic accuracy, particularly in early damage identification and damage localization.

Method used

A dual-modal diagnostic method for wind turbine blade cracks and icing was adopted, which uses a microphone array to collect acoustic signals and a high-definition camera to collect visual images. The method combines deep learning models for feature extraction and information fusion to achieve early identification and localization of blade cracks and icing.

Benefits of technology

It achieves high-precision all-weather blade condition monitoring, enabling early identification of cracks and icing, reducing false alarm and missed alarm rates, and providing visual location of damage, thus improving the reliability and automation of diagnosis.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a voiceprint-visual bimodal diagnosis method and system for wind turbine blade cracks and icing. The method comprises: performing feature extraction on the preprocessed original voiceprint signal, and using a voiceprint deep learning model to recognize cracks and icing from the extracted voiceprint features, to obtain the recognition result and confidence of the voiceprint modality; performing feature extraction on the preprocessed original visual image data, and using a computer vision deep learning model to recognize cracks and icing from the extracted visual features, to obtain the recognition result and confidence of the visual modality; and performing information fusion on the recognition results and confidences of the two modalities, and generating a diagnosis conclusion from the fusion result. The method and system enable intelligent diagnosis of the wind turbine blade state through multi-source information fusion.

Description

Technical Field

[0001] This invention belongs to the field of wind turbine condition monitoring and fault diagnosis technology, and relates to an acoustic-signature and visual dual-modal diagnostic method and system for wind turbine blade cracks and icing.

Background Technology

[0002] As a crucial component of clean energy, the safe and stable operation of wind power is of paramount importance. Wind turbine blades, as key components for capturing wind energy, are constantly exposed to complex and harsh natural environments, making them highly susceptible to damage such as surface cracks and icing. Blade cracks lead to stress concentration, which propagates continuously under alternating loads, ultimately causing blade breakage or even catastrophic accidents like tower collapse. Blade icing alters its aerodynamic shape, significantly reducing power generation efficiency and causing abnormal vibrations due to rotational mass imbalance, seriously threatening operational safety. Therefore, real-time, accurate online monitoring and early diagnosis of wind turbine blade conditions are of great significance for ensuring wind farm safety and improving operational efficiency.

[0003] Currently, monitoring and diagnostic technologies for wind turbine blades can be mainly categorized as follows: First, there's the traditional manual inspection and drone-based visual inspection method. This method relies on technicians using telescopes or high-definition cameras mounted on drones to conduct regular inspections or take photos, with manual interpretation of the images afterwards to identify damage. However, this method has significant drawbacks: First, it lacks real-time performance, failing to achieve continuous online monitoring around the clock and making it difficult to promptly detect sudden damage; second, it is severely constrained by environmental conditions, with the image quality of optical cameras drastically decreasing under adverse visibility conditions such as nighttime, fog, rain, or snow, leading to detection failure; third, it has low levels of automation and intelligence, heavily relying on human experience, resulting in high rates of missed detections and false alarms, and incurring high labor costs.

[0004] Second, there is the vibration analysis-based monitoring method. This method identifies structural anomalies by installing accelerometers at the blade root or on the main structure and analyzing their vibration signal characteristics. However, its disadvantages are as follows: First, this method is usually a contact measurement, requiring wiring to install sensors on the blades. This is difficult and costly to retrofit existing units, and the sensors themselves are easily damaged under blade rotation conditions, making maintenance difficult. Second, the vibration signals are not sensitive to early blade cracks, minor icing, and other localized damage, resulting in poor diagnostic timeliness. Often, the damage can only be identified after it has developed to a certain extent. Finally, the vibration signals are easily affected by the unit's operating conditions (such as speed and power) and external wind disturbances, making feature extraction and fault isolation difficult.

[0005] Third, there are monitoring methods based on acoustic signatures / acoustic emission. These methods use a microphone array installed on the nacelle or tower to collect airborne acoustic signature signals or acoustic emission signals generated by the blades during rotation. Although non-contact measurement is achieved, existing technologies still have limitations: first, acoustic signature signals attenuate significantly during propagation in the air and are easily contaminated by environmental noise (such as wind, rain, and background industrial noise), resulting in a low signal-to-noise ratio and difficulty in extracting effective features; second, the acoustic modality alone is insufficient for precise spatial localization and visualization of damage, so the specific location of cracks and the extent of icing cannot be determined intuitively, which hampers subsequent maintenance decisions.

[0006] Fourth, other non-destructive testing methods such as thermal imaging and ultrasound. Most of these methods require equipment shutdown or close-range operation, which cannot meet the real-time requirements of online monitoring. They are only suitable for periodic maintenance, not continuous condition monitoring.

[0007] In summary, existing single-modal monitoring methods (whether purely visual or purely acoustic) each have their own technical bottlenecks. Visual methods are limited by ambient light and weather, while acoustic methods are limited by ambient noise. Furthermore, both methods have shortcomings in terms of sensitivity for early diagnosis, accuracy of damage localization, and system reliability. Therefore, the industry urgently needs a new method and system that can overcome these shortcomings, integrate the advantages of multi-source information, and achieve all-weather, high-precision, and high-reliability intelligent diagnosis of wind turbine blade conditions.

Summary of the Invention

[0008] The purpose of this invention is to overcome the shortcomings of the prior art and provide an acoustic-signature and visual dual-modal diagnostic method and system for wind turbine blade cracks and icing. This method and system can achieve intelligent diagnosis of wind turbine blade status through multi-source information fusion.

[0009] To achieve the above objectives, this invention discloses an acoustic-signature and visual dual-modal diagnostic method for wind turbine blade cracks and icing, comprising: acquiring the raw acoustic signature signal and raw visual image data of the wind turbine; preprocessing the original voiceprint signal and the original visual image data to obtain the preprocessed original voiceprint signal and the preprocessed original visual image data; performing feature extraction on the preprocessed original voiceprint signal, and using a voiceprint deep learning model to identify cracks and icing based on the extracted voiceprint features, to obtain the identification result and confidence level of the voiceprint modality; performing feature extraction on the preprocessed original visual image data, and using a computer vision deep learning model to identify cracks and icing based on the extracted visual features, to obtain the identification result and confidence level of the visual modality; and fusing information based on the recognition results and confidence levels of the voiceprint modality and of the visual modality, and generating a diagnostic conclusion based on the fusion result.

[0010] Furthermore, a microphone array installed on the nacelle or tower of the wind turbine is used to collect high-speed airflow sound, friction sound and acoustic emission signals generated by the blades during rotation in a non-contact manner, forming the original acoustic signature signal.

[0011] Furthermore, high-definition cameras and infrared thermal imagers installed on the top of the nacelle or in a predetermined location are used to non-contactly acquire visible light and infrared thermal images of the blades, forming raw visual image data.

[0012] Furthermore, the original voiceprint signal is subjected to noise reduction, filtering, and enhancement processing to obtain a preprocessed original voiceprint signal.

[0013] Furthermore, the original visual image data is subjected to dehazing, enhancement, and cropping / alignment processing to obtain preprocessed original visual image data.

[0014] Furthermore, the diagnostic conclusions include the type of damage, severity, location information, and confidence level.

[0015] Furthermore, it also includes: visually displaying the final diagnostic conclusions through a human-computer interaction interface, uploading the early warning information to the wind farm's central monitoring system through a communication interface, and simultaneously triggering local audible and visual alarm devices.

[0016] This invention discloses an acoustic-signature and visual dual-modal diagnostic system for wind turbine blade cracks and icing, comprising: a data acquisition module, used to acquire the raw acoustic signature signal and raw visual image data of the wind turbine; a data preprocessing and synchronization module, used to preprocess the original voiceprint signal and the original visual image data to obtain the preprocessed original voiceprint signal and the preprocessed original visual image data; a feature extraction and recognition module, used to extract features from the preprocessed original voiceprint signal and to use a voiceprint deep learning model to identify cracks and icing based on the extracted voiceprint features, thereby obtaining the recognition result and confidence level of the voiceprint modality, and to extract features from the preprocessed original visual image data and to use a computer vision deep learning model to identify cracks and icing based on the extracted visual features, thereby obtaining the recognition result and confidence level of the visual modality; and a multimodal information fusion decision module, used to perform information fusion based on the recognition results and confidence levels of the voiceprint modality and of the visual modality, and to generate a diagnostic conclusion based on the fusion result.

[0017] This invention discloses a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the acoustic-signature and visual dual-modal diagnostic method for wind turbine blade cracks and icing.

[0018] This invention discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the acoustic-signature and visual dual-modal diagnostic method for wind turbine blade cracks and icing.

[0019] The present invention has the following beneficial effects: The dual-modal diagnostic method and system for wind turbine blade cracks and icing described in this invention effectively overcomes the limitations of a single modality, which is constrained by the environment (light, weather, noise), through the synergy and complementarity of acoustic and visual modalities. The visual modality can serve as the primary criterion when the acoustic signature is subject to strong noise interference, while the acoustic signature modality can serve as the primary criterion when vision is limited (e.g., at night, in foggy weather), significantly improving the system's all-weather operational capability and the reliability of the diagnostic results.

[0020] Furthermore, the acoustic signature mode is extremely sensitive to the high-frequency acoustic emission signals emitted by crack propagation, enabling early crack detection; the visual and infrared modes accurately identify surface temperature and shape changes caused by icing. The combination of these two modes achieves early and sensitive detection of both types of damage.

[0021] Furthermore, by combining microphone array sound source localization (beamforming) technology with pixel-level analysis of visual images, it is possible to spatially locate the specific blade and approximate segment where the damage occurred, and provide visual image evidence of the damage, which greatly facilitates subsequent maintenance and repair work.

[0022] Furthermore, by employing deep learning models for feature extraction and recognition, the reliance on human experience is reduced, enabling end-to-end automated intelligent diagnosis and significantly lowering the false negative and false positive rates.

Attached Figure Description

[0023] To more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the description of the embodiments of this application will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0024] Figure 1 is a diagram showing the installation of the device according to the present invention; Figure 2 is a flowchart of the method of the present invention; Figure 3 is a system structure diagram of the present invention.

Detailed Implementation

[0025] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0026] In the description of this invention, it should be understood that the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and / or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and / or collections thereof.

[0027] It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise.

[0028] It should also be further understood that the term "and / or" as used in this specification and the appended claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes such combinations. For example, A and / or B can represent three cases: A alone, A and B simultaneously, and B alone. Additionally, the character " / " in this invention generally indicates that the preceding and following objects have an "or" relationship.

[0029] It should be understood that although terms such as first, second, third, etc., may be used in the embodiments of the present invention to describe the preset range, these preset ranges should not be limited to these terms. These terms are only used to distinguish the preset ranges from one another. For example, without departing from the scope of the embodiments of the present invention, the first preset range may also be referred to as the second preset range, and similarly, the second preset range may also be referred to as the first preset range.

[0030] Depending on the context, the word "if" as used here can be interpreted as "when," "upon," "in response to determination," or "in response to detection." Similarly, depending on the context, the phrase "if determination" or "if detection (of the stated condition or event)" can be interpreted as "when determination," "in response to determination," "when detection (of the stated condition or event)," or "in response to detection (of the stated condition or event)."

[0031] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments of the present invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of the present invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort are within the scope of protection of the present invention.

[0032] The accompanying drawings illustrate various structural schematic diagrams according to embodiments disclosed in this invention. These drawings are not to scale, and some details have been enlarged for clarity, and some details may have been omitted. The shapes of the various regions and layers shown in the drawings, as well as their relative sizes and positional relationships, are merely exemplary and may deviate from reality due to manufacturing tolerances or technical limitations. Furthermore, those skilled in the art can design regions / layers with different shapes, sizes, and relative positions as needed.

Example 1

[0033] Referring to Figure 1 and Figure 2, the acoustic-signature and visual dual-modal diagnostic method for wind turbine blade cracks and icing of the present invention includes: acquiring the raw acoustic signature signal and raw visual image data of the wind turbine; preprocessing the original voiceprint signal and the original visual image data to obtain the preprocessed original voiceprint signal and the preprocessed original visual image data; performing feature extraction on the preprocessed original voiceprint signal, and using a voiceprint deep learning model to identify cracks and icing based on the extracted voiceprint features, to obtain the identification result and confidence level of the voiceprint modality; performing feature extraction on the preprocessed original visual image data, and using a computer vision deep learning model to identify cracks and icing based on the extracted visual features, to obtain the identification result and confidence level of the visual modality; and fusing information based on the recognition results and confidence levels of the voiceprint modality and of the visual modality, and generating a diagnostic conclusion based on the fusion result.

[0034] In this embodiment, a microphone array installed on the nacelle or tower of the wind turbine is used to collect high-speed airflow sound, friction sound and acoustic emission signals generated by the blades during rotation in a non-contact manner, forming the original acoustic signature signal.

[0035] In this embodiment, a high-definition camera and an infrared thermal imager installed on the top of the nacelle or at a preset location are used to non-contactly acquire visible light images and infrared thermal images of the blades to form raw visual image data.

[0036] In this embodiment, the original voiceprint signal is subjected to noise reduction, filtering and enhancement processing to obtain the preprocessed original voiceprint signal.

[0037] In this embodiment, the original visual image data is subjected to dehazing, enhancement, and cropping alignment processes to obtain preprocessed original visual image data.

[0038] In this embodiment, the diagnostic conclusion includes the type of damage, severity, location information, and confidence level.

[0039] In this embodiment, the method further includes: visually displaying the final diagnostic conclusion through a human-computer interaction interface, uploading the early warning information to the wind farm central monitoring system through a communication interface, and simultaneously triggering a local audible and visual alarm device.

Example 2

[0040] Referring to Figure 3, the acoustic-signature and visual dual-modal diagnostic system for wind turbine blade cracks and icing of the present invention includes the following modules. Data acquisition module: includes an acoustic signature monitoring unit and a visual monitoring unit. The acoustic signature monitoring unit includes a microphone array installed on the nacelle or tower of the wind turbine, used to non-contactly collect high-speed airflow sound, friction sound, and acoustic emission signals generated by the blades during rotation, forming raw acoustic signature signals. The visual monitoring unit includes a high-definition camera and an infrared thermal imager installed on the top of the nacelle or at a preset position, used to non-contactly collect visible light images and infrared thermal images of the blades, forming raw visual image data.

[0041] Data preprocessing and synchronization module: electrically connected to the data acquisition module, used to perform noise reduction, filtering and enhancement processing on the original voiceprint signal, and to perform dehazing, enhancement and cropping alignment processing on the original visual image data; at the same time, this module has a built-in high-precision clock to give the voiceprint signal and visual image data a unified timestamp, so as to achieve high-precision synchronization of dual-modal data.

[0042] Feature extraction and recognition module: electrically connected to the data preprocessing and synchronization module, including a voiceprint analysis unit and an image analysis unit; the voiceprint analysis unit is used to extract time-domain, frequency-domain, and time-frequency-domain features from the preprocessed voiceprint signal, and to identify abnormal voiceprint events related to cracks or icing using a pre-trained voiceprint deep learning model; the image analysis unit is used to extract and analyze features from the preprocessed visual image using a pre-trained computer vision deep learning model, and to identify crack texture features, icing areas, and temperature anomaly features in the image.

[0043] Multimodal information fusion decision module: electrically connected to the feature extraction and recognition module, used to receive and fuse the recognition results and confidence levels from the voiceprint analysis unit and the image analysis unit; based on a preset decision-level fusion strategy, this module performs weighted analysis and comprehensive judgment on the bimodal diagnostic information, and generates a diagnostic conclusion that includes damage type, severity, location information and confidence level only when the bimodal evidence corroborates each other.

[0044] Diagnostic result output and early warning module: electrically connected to the multimodal information fusion decision module, used to visualize the final diagnostic conclusion through the human-computer interaction interface, and upload the early warning information to the wind farm central monitoring system through the communication interface, while triggering the local audible and visual alarm device.

[0045] When both the voiceprint modality and the visual modality diagnose the same type of damage and the confidence level is higher than the set threshold, the damage is finally diagnosed as that type. When only one modality diagnoses damage with extremely high confidence, while the other modality fails to diagnose it effectively due to environmental interference (such as visual failure caused by nighttime or acoustic failure caused by strong wind noise), the key monitoring process for that blade is initiated, and cross-validation is attempted using auxiliary features from the other modality. When the diagnostic results of the two modalities contradict each other, the result is considered uncertain and requires trend analysis based on historical data, or reconfirmation in the next diagnostic cycle.

Example 3

This example implements the diagnostic system on a 3 MW horizontal-axis wind turbine generator.

[0046] 1) Data acquisition module. Acoustic signature monitoring unit: A linear array of four microphones is mounted on the top of the nacelle, facing the rotor's plane of rotation. The microphones are weather-resistant condenser microphones with a frequency response range of 20 Hz to 20 kHz and a maximum sampling rate of no less than 51.2 kHz, to ensure the capture of high-frequency acoustic emission signals (typically >1 kHz) generated by blade cracks. The microphone array aperture is designed to be 0.5 meters to meet the requirements for localizing sound sources on the blades.

[0047] Visual monitoring unit: A high-speed visible light camera and a long-wave infrared (LWIR) thermal imager are mounted side-by-side within the same weather-resistant protective housing and fixed to the top of the nacelle. The visible light camera has a resolution of at least 1920×1080 pixels and a frame rate of at least 30fps; the infrared thermal imager has a resolution of at least 640×512 pixels, a thermal sensitivity (NETD) of less than 40mK, and a frame rate of at least 25Hz. A telephoto lens is used to ensure clear imaging of the blade surface at distances of 50-150 meters.

[0048] 2) The data preprocessing and synchronization module is implemented by an industrial-grade edge computing device embedded in the nacelle. Its preprocessing algorithms include the following. Voiceprint signal preprocessing: Noise reduction: Preliminary noise reduction is performed using spectral subtraction. A spectral estimate of the background noise, free of blade noise, is established first. For each frame, the noise-reduced signal spectrum is then calculated from the spectrum of the input signal and this noise estimate.

[0049] In this calculation, the over-subtraction factor is set to 1.2, the spectral floor parameter is set to 0.1, and the gain factor is set to 0.01 to handle residual background noise.
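The spectral subtraction formula itself is not reproduced in this text, so the following is only a minimal sketch of one common form (over-subtraction with a spectral floor), not necessarily the formula of the application. The function name `spectral_subtract` and the per-bin power representation are assumptions made for illustration. The role of the 0.01 gain factor is not specified in the text, so it is left out here.

```python
def spectral_subtract(frame_power, noise_power, alpha=1.2, beta=0.1):
    """Power spectral subtraction with over-subtraction and a spectral floor.

    frame_power, noise_power: per-bin power spectra of equal length.
    alpha: over-subtraction factor (1.2 in the text).
    beta: spectral floor parameter (0.1 in the text).
    The exact form used in the application is an assumption.
    """
    if len(frame_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    cleaned = []
    for y, n in zip(frame_power, noise_power):
        residual = y - alpha * n   # subtract the scaled noise estimate
        floor = beta * n           # never fall below a fraction of the noise
        cleaned.append(max(residual, floor))
    return cleaned


# Hypothetical three-bin example: one strong bin, two bins at or below the noise.
frame = [4.0, 1.0, 0.5]
noise = [1.0, 1.0, 1.0]
denoised = spectral_subtract(frame, noise)
```

In this sketch the strong bin keeps most of its energy, while bins dominated by noise are clamped to the floor rather than set to zero, which is the usual way to limit "musical noise" artifacts.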

[0050] Filtering: A 100Hz ~ 8kHz bandpass filter is used to filter out low-frequency wind noise and high-frequency irrelevant noise.

[0051] Sound source localization (beamforming): A delay-and-sum beamforming algorithm is applied to the microphone array signal to initially determine the direction of the abnormal sound source (i.e., the number of the blade that may be damaged). For a given direction $\theta$, the output power $P(\theta)$ is calculated as follows:

$$P(\theta) = \mathbf{w}^{H}(\theta)\,\mathbf{R}\,\mathbf{w}(\theta)$$

[0052] where $\mathbf{R}$ is the covariance matrix of the microphone array signal, $\mathbf{w}(\theta)$ is the weight vector for direction $\theta$, used to align the signal phases, and $(\cdot)^{H}$ denotes the conjugate transpose.
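The delay-and-sum power scan can be illustrated with a small narrowband simulation. This is a sketch under stated assumptions rather than the application's implementation: a far-field plane wave, a single analysis frequency of 1 kHz, a speed of sound of 343 m/s, and four microphones spaced evenly across the 0.5 m aperture. All function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air


def steering_weights(theta_deg, mic_x, freq_hz):
    """Weights w(theta) that phase-align a far-field plane wave from theta."""
    s = math.sin(math.radians(theta_deg))
    m = len(mic_x)
    return [cmath.exp(-2j * math.pi * freq_hz * x * s / SPEED_OF_SOUND) / m
            for x in mic_x]


def covariance(snapshots):
    """Sample covariance R = mean(x x^H) over complex array snapshots."""
    m = len(snapshots[0])
    r = [[0j] * m for _ in range(m)]
    for x in snapshots:
        for i in range(m):
            for j in range(m):
                r[i][j] += x[i] * x[j].conjugate()
    return [[v / len(snapshots) for v in row] for row in r]


def output_power(w, r):
    """P(theta) = w^H R w, returned as a real number."""
    total = 0j
    for i, wi in enumerate(w):
        for j, wj in enumerate(w):
            total += wi.conjugate() * r[i][j] * wj
    return total.real


mic_x = [0.5 * i / 3 for i in range(4)]  # 4 mics across a 0.5 m aperture
freq_hz = 1000.0                         # assumed analysis frequency
true_angle = 20                          # simulated source direction, degrees

# Noise-free synthetic snapshots of a unit-amplitude source at true_angle.
arrival = [w * len(mic_x) for w in steering_weights(true_angle, mic_x, freq_hz)]
snapshots = [[cmath.exp(0.7j * k) * a for a in arrival] for k in range(8)]

R = covariance(snapshots)
powers = {theta: output_power(steering_weights(theta, mic_x, freq_hz), R)
          for theta in range(-90, 91)}
best_angle = max(powers, key=powers.get)
```

The scan peaks at the simulated source direction. The 1 kHz frequency is chosen so that the microphone spacing stays below half a wavelength; at higher frequencies this array geometry would produce grating lobes, and the scan would need to be restricted or combined across frequencies.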

[0053] Visual image preprocessing: Image enhancement: A contrast-limited adaptive histogram equalization algorithm is used for the visible light image, with parameters set to: Clip Limit=2.0, Tile Grid Size=8x8, to enhance the contrast between the crack and the background.

[0054] ROI Extraction and Alignment: By utilizing data such as blade pitch angle and rotational speed from the wind turbine SCADA system, combined with image recognition algorithms, the position of the blades in the image is predicted in real time, and the region of interest (ROI) of the blades is automatically cropped, which greatly reduces the amount of data for subsequent processing.

[0055] Infrared image calibration: The temperature data collected by the infrared thermal imager is calibrated in real time according to the ambient temperature and humidity to eliminate the influence of atmospheric attenuation and ensure the accuracy of temperature measurement.

[0056] Synchronization mechanism: The edge computing device has a built-in GPS module or receives the signal from the NTP time server inside the nacelle, and adds a unified UTC timestamp with microsecond precision to each frame of voiceprint data and each frame of image data, which serves as the sole time reference for subsequent dual-modal data fusion.
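Once both streams carry UTC timestamps, pairing them reduces to a nearest-neighbor search in time. The sketch below shows one way this could be done. The function name `pair_by_timestamp`, the 50 ms skew tolerance, and the example timestamps are all hypothetical and not taken from the application.

```python
from bisect import bisect_left


def pair_by_timestamp(audio_ts, image_ts, max_skew_us):
    """Pair each image frame with the nearest audio frame.

    audio_ts, image_ts: ascending UTC timestamps in microseconds.
    Returns (image_index, audio_index) pairs whose time difference
    does not exceed max_skew_us; unmatched image frames are dropped.
    """
    pairs = []
    for i, t in enumerate(image_ts):
        pos = bisect_left(audio_ts, t)
        # Only the neighbors on either side of the insertion point can be nearest.
        candidates = [j for j in (pos - 1, pos) if 0 <= j < len(audio_ts)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(audio_ts[k] - t))
        if abs(audio_ts[j] - t) <= max_skew_us:
            pairs.append((i, j))
    return pairs


audio_ts = [0, 500_000, 1_000_000, 1_500_000]  # one audio window every 0.5 s
image_ts = [10_000, 520_000, 1_240_000]        # hypothetical image timestamps
pairs = pair_by_timestamp(audio_ts, image_ts, max_skew_us=50_000)
# pairs == [(0, 0), (1, 1)]; the third image is too far from any audio window
```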

[0057] 3) Implementation of the feature extraction and recognition module. This module runs pre-trained deep learning models on the edge computing device.

[0058] Voiceprint analysis unit: Feature extraction: From the preprocessed acoustic signal, a set of Mel-frequency cepstral coefficients (MFCCs) is calculated every 0.5 seconds. The specific parameters are: frame length 25ms, frame shift 10ms, pre-emphasis coefficient 0.97. The first 13 dimensions of coefficients and their first and second order differences are extracted, resulting in a total of 39 dimensions of features. At the same time, the voiceprint map within this time period is calculated as the input to the CNN.
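The framing parameters above determine the shape of the MFCC feature sequence. The following sketch works through that arithmetic and shows the pre-emphasis step. It assumes no padding at the window edges and uses 51.2 kHz (the stated lower bound on the sampling rate) as an example rate; the helper names are hypothetical.

```python
def frames_per_window(window_ms, frame_ms, shift_ms):
    """Number of full frames that fit in one analysis window (no padding)."""
    if window_ms < frame_ms:
        return 0
    return 1 + (window_ms - frame_ms) // shift_ms


def pre_emphasis(signal, coeff=0.97):
    """y[n] = x[n] - coeff * x[n-1]; the first sample is passed through."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - coeff * signal[n - 1]
                          for n in range(1, len(signal))]


def feature_dim(n_static, delta_orders):
    """Static coefficients plus one copy per order of differences."""
    return n_static * (1 + delta_orders)


SAMPLE_RATE_HZ = 51_200                      # example rate, assumed
frame_len = SAMPLE_RATE_HZ * 25 // 1000      # 1280 samples per 25 ms frame
frame_shift = SAMPLE_RATE_HZ * 10 // 1000    # 512 samples per 10 ms shift
n_frames = frames_per_window(500, 25, 10)    # 48 frames per 0.5 s window
dim = feature_dim(13, 2)                     # 13 + 13 + 13 = 39 dimensions
emphasized = pre_emphasis([1.0, 1.0, 1.0])   # a constant signal is attenuated
```

Under these assumptions, each 0.5 second window yields a 48 × 39 feature matrix as input to the one-dimensional CNN.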

[0059] Recognition Model: A one-dimensional convolutional neural network (1D-CNN) model is used to classify the MFCC feature sequence, while a two-dimensional CNN (such as ResNet-18) is used to classify the voiceprint image. The outputs of the two models are combined through a fusion layer to finally output the preliminary diagnostic results of the voiceprint modality (e.g., "no abnormality", "crack suspicion 0.85", "ice accretion suspicion 0.70") and their confidence scores.

[0060] Image analysis unit: Visible light image recognition: A semantic segmentation model based on the U-Net architecture is used to process the blade ROI image. This model has been trained on a large number of labeled blade crack images and can classify each pixel in the image, outputting a pixel-level segmentation map of the crack, from which the length and area of the crack can be calculated.

[0061] Infrared image recognition: A CNN model (such as MobileNetV3) is also used to classify infrared ROI images and identify regional temperature anomalies caused by icing. The input of this model is the calibrated temperature matrix, and the output is the icing diagnosis result and confidence level.

[0062] 4) Implementation of the multimodal information fusion decision-making module: This module receives the preliminary results of the voiceprint and visual modalities (assuming the outputs are probability values for the three categories "normal", "cracked", and "iced") and performs decision-level fusion based on Dempster-Shafer theory. The recognition framework is defined as $\Theta = \{N, C, I\}$, where $N$ denotes normal, $C$ denotes crack, and $I$ denotes icing. The voiceprint modality and the visual modality each provide a basic probability assignment (BPA) function over the propositions, denoted $m_1$ and $m_2$ respectively.

[0063] For example, voiceprint modal output: $m_1(\{C\}) = 0.75$, $m_1(\Theta) = 0.25$ (that is, 75% belief that a crack is present, with the remaining 25% left uncertain).

[0064] Visual modal output: .

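[0065] Fusion is performed using Dempster's combination rule:

$$m(A) = \frac{1}{1-K} \sum_{B \cap D = A} m_1(B)\, m_2(D), \qquad A \neq \emptyset$$

[0066] where $m_1$ and $m_2$ are the BPA functions of the voiceprint and visual modalities, $K = \sum_{B \cap D = \emptyset} m_1(B)\, m_2(D)$ is the mass assigned to conflicting evidence, and $1/(1-K)$ is the normalization constant used to exclude that conflicting evidence.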

[0067] Calculate the above example:

[0068] Then for the proposition {crack}:

[0069] After fusion, the system is 90% confident that a crack exists in the blade, which is higher than the result of either single modality. Finally, the system generates the diagnostic conclusion "A crack about 15 cm long exists about 3 meters from the blade root of blade #2, with a confidence level of 90%" and triggers a "Level 1 warning".
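The decision-level fusion can be sketched as a small implementation of Dempster's combination rule over BPAs keyed by subsets of the recognition framework. The voiceprint BPA (0.75 on crack, 0.25 on the whole framework) is taken from the example. The visual BPA below (0.60 on crack, 0.40 on the whole framework) is an assumption chosen only because it reproduces the 90% fused confidence stated above; it is not a value from the application.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two BPAs with Dempster's rule.

    m1, m2: dicts mapping frozenset propositions to mass (each sums to 1).
    Returns (fused_bpa, conflict), where conflict is the mass K that fell
    on empty intersections before normalization.
    """
    combined = {}
    conflict = 0.0
    for (a, pa), (b, pb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + pa * pb
        else:
            conflict += pa * pb  # contradictory evidence
    if conflict >= 1.0:
        raise ValueError("total conflict: the two BPAs cannot be combined")
    fused = {k: v / (1.0 - conflict) for k, v in combined.items()}
    return fused, conflict

THETA = frozenset({"normal", "crack", "icing"})  # recognition framework
CRACK = frozenset({"crack"})

m_voice = {CRACK: 0.75, THETA: 0.25}    # from the voiceprint example
m_visual = {CRACK: 0.60, THETA: 0.40}   # assumed values, for illustration only

fused, conflict = dempster_combine(m_voice, m_visual)
print(round(fused[CRACK], 2), round(fused[THETA], 2), conflict)
```

With these inputs there is no conflicting mass, so the crack belief is 0.75 × 0.60 + 0.75 × 0.40 + 0.25 × 0.60 = 0.90 and the remaining 0.10 stays on the whole framework.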

[0070] 5) Implementation of the diagnostic result output and early warning module: This module uploads the fused diagnostic conclusion, the raw voiceprint data, the visualized images (with damage areas marked by red boxes), and the processed data to the server at the wind farm's central monitoring center via 4G/5G or fiber-optic networks. The human-machine interface of the monitoring software raises alarms through pop-ups, sounds, and similar means, and generates detailed inspection reports to guide maintenance personnel in handling the issues.
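A possible shape for the uploaded diagnostic conclusion is sketched below as a JSON payload. The field names, the turbine identifier, and the timestamp are hypothetical; the damage type, blade number, location, crack length, confidence, and warning level mirror the example conclusion in the text. Severity is left empty because the example does not state one.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class DiagnosticConclusion:
    """Hypothetical record for one fused diagnosis (field names assumed)."""
    turbine_id: str
    blade_id: int
    damage_type: str              # e.g. "crack" or "icing"
    severity: Optional[str]       # not given in the example
    location_m_from_root: float
    crack_length_cm: Optional[float]
    confidence: float
    warning_level: int
    timestamp_utc: str

conclusion = DiagnosticConclusion(
    turbine_id="WT-EXAMPLE-01",   # placeholder identifier
    blade_id=2,
    damage_type="crack",
    severity=None,
    location_m_from_root=3.0,
    crack_length_cm=15.0,
    confidence=0.90,
    warning_level=1,
    timestamp_utc="2026-01-01T00:00:00.000000Z",  # placeholder timestamp
)

# Serialized form that could be sent to the monitoring server.
payload = json.dumps(asdict(conclusion), ensure_ascii=False)
print(payload)
```

The transport itself (4G/5G or fiber) and the server interface are outside this sketch.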

[0071] Example 4 A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the voiceprint-visual dual-modal diagnostic method for wind turbine blade cracks and icing. For example, the method includes: acquiring a raw voiceprint signal and raw visual image data of the wind turbine; preprocessing the raw voiceprint signal and the raw visual image data to obtain a preprocessed raw voiceprint signal and preprocessed raw visual image data; extracting features from the preprocessed raw voiceprint signal and using a voiceprint deep learning model to identify cracks and icing based on the extracted voiceprint features, obtaining the recognition result and confidence level of the voiceprint modality; extracting features from the preprocessed raw visual image data and using a computer vision deep learning model to identify cracks and icing based on the extracted visual features, obtaining the recognition result and confidence level of the visual modality; and fusing information based on the recognition results and confidence levels of the voiceprint modality and the visual modality, and generating a diagnostic conclusion based on the fusion result. The memory may include main memory, such as high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk storage device. The processor, network interface, and memory are interconnected via an internal bus, which may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus. The bus can be divided into an address bus, a data bus, and a control bus. The memory stores a program; specifically, the program may include program code comprising computer operation instructions, and the memory provides instructions and data to the processor.

[0072] Example 5 A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the voiceprint-visual dual-modal diagnostic method for wind turbine blade cracks and icing. For example, the method includes: acquiring a raw voiceprint signal and raw visual image data of the wind turbine; preprocessing the raw voiceprint signal and the raw visual image data to obtain a preprocessed raw voiceprint signal and preprocessed raw visual image data; extracting features from the preprocessed raw voiceprint signal and using a voiceprint deep learning model to identify cracks and icing based on the extracted voiceprint features, obtaining the recognition result and confidence level of the voiceprint modality; extracting features from the preprocessed raw visual image data and using a computer vision deep learning model to identify cracks and icing based on the extracted visual features, obtaining the recognition result and confidence level of the visual modality; and fusing information based on the recognition results and confidence levels of the voiceprint modality and the visual modality, and generating a diagnostic conclusion based on the fusion result. Specifically, the computer-readable storage medium includes, but is not limited to, volatile memory and/or non-volatile memory. The volatile memory may include random access memory (RAM) and/or cache memory. The non-volatile memory may include read-only memory (ROM), hard disk, flash memory, optical disk, magnetic disk, and the like.

[0073] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0074] This application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0075] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0076] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0077] Other embodiments of the invention will readily occur to those skilled in the art upon consideration of the specification and disclosure of the invention. This application is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the invention and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of the invention are indicated by the following claims.

[0078] It should be understood that the present invention is not limited to the precise structure described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of the invention is limited only by the appended claims.

[0079] The above description is merely a preferred embodiment of the present invention and does not constitute any limitation on the present invention. Any simple modifications, alterations, or equivalent structural changes made to the above embodiments based on the technical essence of the present invention shall still fall within the protection scope of the present invention.

Claims

1. A voiceprint-visual dual-modal diagnostic method for wind turbine blade cracks and icing, characterized in that it comprises: acquiring a raw voiceprint signal and raw visual image data of the wind turbine; preprocessing the raw voiceprint signal and the raw visual image data to obtain a preprocessed raw voiceprint signal and preprocessed raw visual image data; performing feature extraction on the preprocessed raw voiceprint signal, and performing crack and icing identification with a voiceprint deep learning model based on the extracted voiceprint features to obtain the recognition result and confidence level of the voiceprint modality; performing feature extraction on the preprocessed raw visual image data, and performing crack and icing identification with a computer vision deep learning model based on the extracted visual features to obtain the recognition result and confidence level of the visual modality; and performing information fusion based on the recognition result and confidence level of the voiceprint modality and the recognition result and confidence level of the visual modality, and generating a diagnostic conclusion based on the fusion result.

2. The voiceprint-visual dual-modal diagnostic method for wind turbine blade cracks and icing according to claim 1, characterized in that microphone arrays installed on the nacelle or tower of the wind turbine are used to collect, in a non-contact manner, the high-speed airflow noise, friction noise, and acoustic emission signals generated by the blades during rotation, forming the raw voiceprint signal.

3. The voiceprint-visual dual-modal diagnostic method for wind turbine blade cracks and icing according to claim 1, characterized in that high-definition cameras and infrared thermal imagers installed on top of the nacelle or at a preset location are used to acquire, in a non-contact manner, visible light images and infrared thermal images of the blades, forming the raw visual image data.

4. The voiceprint-visual dual-modal diagnostic method for wind turbine blade cracks and icing according to claim 1, characterized in that the raw voiceprint signal is subjected to noise reduction, filtering, and enhancement processing to obtain the preprocessed raw voiceprint signal.

5. The voiceprint-visual dual-modal diagnostic method for wind turbine blade cracks and icing according to claim 1, characterized in that the raw visual image data is subjected to dehazing, enhancement, and cropping/alignment processing to obtain the preprocessed raw visual image data.

6. The voiceprint-visual dual-modal diagnostic method for wind turbine blade cracks and icing according to claim 1, characterized in that the diagnostic conclusion includes the damage type, severity, location information, and confidence level.

7. The voiceprint-visual dual-modal diagnostic method for wind turbine blade cracks and icing according to claim 1, characterized in that it further comprises: visually displaying the final diagnostic conclusion through a human-computer interaction interface, uploading early warning information to the wind farm's central monitoring system through a communication interface, and simultaneously triggering local audible and visual alarm devices.

8. A voiceprint-visual dual-modal diagnostic system for wind turbine blade cracks and icing, characterized in that it comprises: a data acquisition module, used to acquire a raw voiceprint signal and raw visual image data of the wind turbine; a data preprocessing and synchronization module, used to preprocess the raw voiceprint signal and the raw visual image data to obtain a preprocessed raw voiceprint signal and preprocessed raw visual image data; a feature extraction and recognition module, used to extract features from the preprocessed raw voiceprint signal and to identify cracks and icing with a voiceprint deep learning model based on the extracted voiceprint features, thereby obtaining the recognition result and confidence level of the voiceprint modality, and to extract features from the preprocessed raw visual image data and to identify cracks and icing with a computer vision deep learning model based on the extracted visual features, thereby obtaining the recognition result and confidence level of the visual modality; and a multimodal information fusion decision module, used to perform information fusion based on the recognition result and confidence level of the voiceprint modality and the recognition result and confidence level of the visual modality, and to generate a diagnostic conclusion based on the fusion result.

9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the steps of the voiceprint-visual dual-modal diagnostic method for wind turbine blade cracks and icing according to any one of claims 1-7.

10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the voiceprint-visual dual-modal diagnostic method for wind turbine blade cracks and icing according to any one of claims 1-7.