A wind farm equipment fault diagnosis method based on neural networks and digital twins

By integrating multi-source heterogeneous data from wind farms using neural networks and digital twin technology, a hybrid model is constructed and dynamically adjusted, solving the problems of efficiency and accuracy in wind farm equipment fault diagnosis, and realizing real-time monitoring and efficient operation and maintenance of equipment.

CN122241359A · Pending · Publication Date: 2026-06-19 · HUADIAN NEW ENERGY XINJIANG MULEI NEW ENERGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: HUADIAN NEW ENERGY XINJIANG MULEI NEW ENERGY CO LTD
Filing Date: 2026-03-19
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing methods for diagnosing wind farm equipment faults are inefficient and inaccurate, making it difficult to monitor in real time and comprehensively. They also cannot adapt to complex and changing equipment operating environments, and multi-source heterogeneous data are difficult to integrate and analyze effectively.

Method used

A fault diagnosis method based on neural networks and digital twins is adopted, which integrates multi-source heterogeneous data, constructs a hybrid model of physical-driven model and data-driven model, dynamically adjusts model parameters by combining reinforcement learning, builds a fault database, simulates equipment status through fault simulation data, and performs fault diagnosis by setting confidence thresholds based on matching probability statistics.

Benefits of technology

It enables accurate identification and rapid location of equipment faults in wind farms, improves equipment operational reliability, tracks equipment status changes in real time, and reduces equipment downtime and operation and maintenance costs.



Abstract

This invention discloses a fault diagnosis method for wind farm equipment based on neural networks and digital twins. First, multimodal data from multiple equipment units and their sub-units during normal operation are collected and preprocessed. Then, physical-driven and data-driven models are constructed separately and combined to form a digital twin model. Through reinforcement learning, model parameters are dynamically adjusted based on real-time equipment operation and external environmental data. Fault simulation data is injected to construct a fault database, which is used to match fault locations and assign confidence levels when equipment malfunctions. A confidence threshold is set based on the matching probability, and the digital twin model parameters are dynamically adjusted to determine the diagnostic scheme. The multimodal data covers various aspects, including structural health monitoring, and preprocessing includes noise reduction and feature extraction. This method integrates multi-source data, improves diagnostic accuracy and real-time performance, effectively addresses complex fault scenarios, reduces operation and maintenance costs, and ensures stable operation of the wind farm.

Description

Technical Field

[0001] This invention relates to the field of fault diagnosis technology, specifically to a method for fault diagnosis of wind farm equipment based on neural networks and digital twins.

Background Technology

[0002] With the global demand for clean energy continuing to rise, wind farms, as an important component of renewable energy, are experiencing rapid growth in scale and number. Wind farm equipment, as the key carrier for converting wind energy into electricity, operates in complex and harsh natural environments, enduring the influence of multiple factors such as variable weather conditions, mechanical stress, and electrical loads. This leads to frequent equipment failures, seriously threatening the stable operation and economic benefits of wind farms.

[0003] Traditional methods for diagnosing wind farm equipment faults primarily rely on manual inspections and simple threshold-based judgments. Manual inspections are not only inefficient and resource-intensive, but also difficult to monitor in real-time and comprehensively due to the wide distribution of wind farm equipment, often resulting in delayed fault detection and missed opportunities for optimal maintenance. Simple threshold-based methods, on the other hand, rely too heavily on experience-based threshold settings, failing to adapt to the complex changes during equipment operation and prone to misdiagnosis and missed diagnosis. For example, when diagnosing wind turbine blade faults, relying solely on vibration amplitude thresholds may overlook other important characteristics such as vibration frequency and phase, leading to inaccurate fault diagnosis.

[0004] In recent years, some intelligent diagnostic technologies have begun to be applied to the field of wind farm equipment fault diagnosis. However, these technologies still face many challenges in practical applications. On the one hand, single data-driven models, such as deep learning models, while possessing powerful self-learning capabilities, rely excessively on large amounts of high-quality data for training. In actual wind farm operation, data quality varies greatly, with issues such as noise and missing values, which can easily lead to model overfitting, poor generalization ability, and difficulty in accurately diagnosing complex and ever-changing faults. On the other hand, while physical models are built based on the physical principles of the equipment and have the advantage of strong interpretability, the modeling process requires precise equipment parameters and complex physical equations, making it difficult to account for various uncertainties in equipment operation, such as component aging and environmental changes. This results in deviations between the model and the actual situation, limiting diagnostic accuracy.

[0005] The diverse and heterogeneous data generated by wind farm equipment, including structural health monitoring data, environmental and meteorological data, electrical and operational status data, image and video data, sound and acoustic data, and maintenance and log data, provides a wealth of information for fault diagnosis. However, effectively integrating and analyzing this multi-source data to uncover potential fault modes has become a key challenge in current wind farm equipment fault diagnosis. Furthermore, the fault modes of wind farm equipment are complex and diverse, and different faults may be interconnected and influence each other, making traditional diagnostic methods ill-suited to handle such complex fault scenarios.

[0006] In summary, existing wind farm equipment fault diagnosis technologies have many shortcomings and are insufficient to meet the growing demands for efficient and reliable operation of wind farms. Developing a fault diagnosis method that can comprehensively utilize multi-source heterogeneous data, adapt to complex fault scenarios, and improve diagnostic accuracy and real-time performance is urgently needed. This is of significant practical importance for ensuring the stable operation of wind farms, reducing operation and maintenance costs, and promoting the sustainable development of the wind power industry.

Summary of the Invention

[0007] The technical problem to be solved by this invention is to provide a fault diagnosis method for wind farm equipment based on neural networks and digital twins, which overcomes the shortcomings of traditional methods, integrates multi-source heterogeneous data, accurately identifies faults, quickly locates and diagnoses faults, improves equipment operation reliability, and reduces operation and maintenance costs.

[0008] To address the aforementioned technical problems, embodiments of the present invention provide the following technical solution: a method for fault diagnosis of wind farm equipment based on neural networks and digital twins, comprising the following steps: Collect multimodal data of the N equipment units of the wind farm equipment during normal operation, then separately collect the multimodal data of each sub-unit of each equipment unit, and preprocess the multimodal data, where N, M, and x are positive integers. Based on the operating principles and multimodal data of each equipment unit and each sub-unit of the wind power generation equipment, construct a physical-driven model and a data-driven model respectively, and combine the two models to build a digital twin model. Collect external environmental data and real-time operation data of the wind farm equipment at various time periods, and dynamically adjust the parameters of the digital twin model using reinforcement learning. Inject fault simulation data for each equipment unit and each sub-unit into the digital twin model, simulate the operating state of the equipment under the different injected fault simulation data, and construct a fault database. When wind farm equipment fails, match the fault location according to the fault database, and mark the confidence level of the corresponding fault index in the fault database based on the accuracy of the fault location matching. Set a confidence threshold based on the matching probability statistics, and dynamically and iteratively adjust the parameters of the digital twin model based on the confidence threshold. After adjustment, the index matching scheme that meets the confidence threshold is used as the fault diagnosis scheme when the wind farm equipment fails.

[0009] Preferably, the multimodal data includes structural health monitoring data, including but not limited to vibration, stress, strain, and temperature; environmental and meteorological data, including but not limited to wind speed, wind direction, temperature, humidity, and air pressure; electrical and operational status data, including but not limited to voltage, current, power, rotational speed, and angle; image and video data, including but not limited to visible light images, infrared imaging, and laser point clouds; sound and acoustic data, including but not limited to sound and ultrasound; and maintenance and log data, including but not limited to maintenance records and SCADA system logs. The preprocessing of the multimodal data includes, but is not limited to, any combination of noise reduction, feature extraction, outlier correction, time alignment, data standardization, image enhancement, and structured processing.
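Three of the preprocessing operations listed above (outlier correction, noise reduction, and data standardization) can be illustrated for a single one-dimensional sensor channel. The following is a minimal sketch only: the function name `preprocess`, the median-based outlier rule, the moving-average filter, and the default parameters are all assumptions chosen for illustration and are not specified in the patent text.

```python
import numpy as np

def preprocess(signal, window=5, z_limit=3.0):
    """Illustrative preprocessing of one sensor channel (assumed rules, not from the patent)."""
    x = np.asarray(signal, dtype=float)

    # Outlier correction: replace samples whose robust z-score exceeds z_limit with the median.
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    scale = 1.4826 * mad if mad > 0 else 1.0
    x = np.where(np.abs(x - median) / scale > z_limit, median, x)

    # Noise reduction: centered moving average of the given window length.
    kernel = np.ones(window) / window
    x = np.convolve(x, kernel, mode="same")

    # Data standardization: zero mean and unit variance.
    std = x.std()
    return (x - x.mean()) / (std if std > 0 else 1.0)
```

A call such as `preprocess(vibration_samples)` returns an array of the same length as its input. Time alignment, image enhancement, and structured processing of log data are outside the scope of this sketch.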

[0010] Preferably, the physical-driven model is constructed based on the principles of mechanical dynamics, thermodynamics, and electrical control of the corresponding unit or sub-unit; the data-driven model is constructed using a hybrid neural network model, selecting a corresponding feature extraction network for each data type, including convolutional neural networks (CNN), recurrent neural networks (RNN) and their variants LSTM and GRU, and support vector machines (SVM).

[0011] Preferably, the step of hybrid modeling using physical-driven and data-driven models to construct a digital twin model specifically involves: The state results calculated by the physics-driven model are used as the input data for the data-driven model, and the abnormal patterns identified by the data-driven model are used as the boundary conditions of the physics-driven model. The residuals of the physics-driven model and the data-driven model are calculated, and weights are assigned to the residuals to fuse the two models. The prediction results of the hybrid model are evaluated using an independent test dataset, evaluation metrics are calculated, and the fusion weights are dynamically adjusted based on the evaluation metrics.
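The residual-weighted fusion described above can be sketched as follows. The patent text does not state how the weights are derived from the residuals, so this example assumes one simple rule: each model is weighted by the inverse of its mean squared residual on an independent test set. The function names `fusion_weights` and `fuse` are hypothetical.

```python
import numpy as np

def fusion_weights(y_true, y_phys, y_data, eps=1e-12):
    """Weight each model by the inverse of its mean squared residual (an assumed rule)."""
    y_true, y_phys, y_data = (np.asarray(a, dtype=float) for a in (y_true, y_phys, y_data))
    mse_phys = np.mean((y_true - y_phys) ** 2)   # residual of the physics-driven model
    mse_data = np.mean((y_true - y_data) ** 2)   # residual of the data-driven model
    inverse = np.array([1.0 / (mse_phys + eps), 1.0 / (mse_data + eps)])
    return inverse / inverse.sum()               # normalize so the weights sum to 1

def fuse(y_phys, y_data, weights):
    """Combine the two model outputs with the given fusion weights."""
    return weights[0] * np.asarray(y_phys, dtype=float) + weights[1] * np.asarray(y_data, dtype=float)
```

Re-running `fusion_weights` whenever a new batch of test data is evaluated is one way to realize the dynamic adjustment of the fusion weights; other evaluation metrics could be substituted for the mean squared residual.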

[0012] Preferably, the hybrid neural network model fuses parallel channels with an attention mechanism, specifically: Multimodal data is preprocessed according to its type and then input into different branches, and feature alignment is performed: the output features of each branch are subjected to dimensional transformation or time-step processing. The attention weights are normalized to the [0, 1] interval using the Softmax function, and the features of each parallel channel are multiplied by their corresponding attention weights and then summed. After the parallel-channel branches and the attention mechanism module are fused into a single model, the loss is calculated and all network layer parameters are updated through the backpropagation algorithm. To balance the learning speed of each branch, an auxiliary loss function is added to each parallel channel branch. Each branch is pre-trained separately, and then the branches are combined with the attention mechanism fusion module to fine-tune the entire model.
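The attention-weighted summation of branch features can be sketched with NumPy. This is a forward-pass illustration only: the scoring rule (a dot product with a query vector `score_vector`) is an assumption, since the patent text does not say how the raw attention scores are produced, and training, auxiliary losses, and pre-training are omitted.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax: outputs lie in [0, 1] and sum to 1."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

def attention_fuse(branch_features, score_vector):
    """Fuse aligned branch features of shape (n_branches, d) into one vector of shape (d,).

    score_vector is a hypothetical learned query of shape (d,) used to score each branch.
    """
    feats = np.asarray(branch_features, dtype=float)
    scores = feats @ np.asarray(score_vector, dtype=float)  # one raw score per branch
    weights = softmax(scores)                               # normalized attention weights
    fused = weights @ feats                                 # weighted sum over branches
    return fused, weights
```

Here each row of `branch_features` stands for the already aligned output of one parallel channel, for example a vibration branch, an image branch, and a log branch.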

[0013] Preferably, the hybrid neural network model is optimized using lightweight techniques, including pruning and quantization, wherein:
(1) Pruning optimization includes: Determine the pruning criterion: based on weight magnitude, calculate the absolute value of each neuron connection weight, set a threshold, and prune connections whose absolute weights are less than the threshold. Perform the pruning operation using structured pruning, removing the parts that contribute little to model performance on a per-neuron, per-kernel, or per-channel basis, and adjusting the number of input channels in subsequent layers. After pruning, retrain the model: adjust the weights of the remaining parameters using the training dataset, reduce the learning rate, and monitor the model's performance metrics on the validation set.
(2) Quantization optimization includes: Use uniform quantization to map continuous floating-point values to a finite number of discrete integer values, and set the quantization range and step size. Perform the quantization operation: after model training is completed, quantize the weights into integers according to the selected scheme and store them. After quantization, fine-tune the model by adjusting the model parameters using training data to compensate for quantization errors, and monitor the performance metrics on the validation set.
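The two core operations above, thresholding weights by magnitude and uniform quantization to integers, can be illustrated on a plain weight array. This is a simplified sketch under stated assumptions: it masks individual weights rather than removing whole neurons, kernels, or channels, it uses a symmetric quantization range, and it omits the retraining and fine-tuning steps. The function names are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, threshold):
    """Zero out weights whose absolute value is below the threshold; return weights and mask."""
    w = np.asarray(weights, dtype=float)
    mask = np.abs(w) >= threshold
    return w * mask, mask

def uniform_quantize(weights, num_bits=8):
    """Symmetric uniform quantization to signed integers; returns (integers, step size)."""
    w = np.asarray(weights, dtype=float)
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8 bits
    max_abs = np.max(np.abs(w))
    step = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / step), -qmax, qmax).astype(np.int32)
    return q, step

def dequantize(q, step):
    """Map stored integers back to approximate floating-point weights."""
    return q.astype(float) * step
```

The reconstruction error of `dequantize(*uniform_quantize(w))` is bounded by half the step size per weight, which is the error that post-quantization fine-tuning is meant to compensate.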

[0014] Preferably, the external environment data and real-time operation data of the wind farm equipment are collected at various time periods, and the parameters of the digital twin model are dynamically adjusted using reinforcement learning, specifically as follows: Define a state space by using preprocessed wind farm equipment operation data and external environment data as state variables; Define the action space, which is set as the set of adjustable parameters in the digital twin model, including physical parameters in the physics-driven model and network weights and biases in the data-driven model. Initialize the reinforcement learning algorithm, select the Deep Q-Network (DQN) algorithm and set the hyperparameters of the algorithm, and determine the reward function based on the error between the predicted value and the actual observed value of the digital twin model; In each time period, the current state is input into the reinforcement learning algorithm, an action is selected according to the current policy, the adjustment value of the digital twin model parameters is determined, and the digital twin model with adjusted parameters is used to predict the operating status of wind farm equipment.

[0015] The reward value is calculated based on the prediction results and the actual observation values. The current state, action, reward and the next state are stored in the experience replay pool. When the data in the experience replay pool reaches a certain amount, a batch of data is randomly sampled from the pool to train the reinforcement learning algorithm and update the algorithm's policy network and value network.
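The reward calculation and the experience replay pool described above can be sketched with the Python standard library. The patent text only states that the reward is determined from the prediction error, so the negative absolute error used here is an assumption, as are the class name `ReplayPool` and its parameters. The DQN policy and value networks themselves are not shown.

```python
import random
from collections import deque

def reward(predicted, observed):
    """Assumed reward: negative absolute prediction error (smaller error, larger reward)."""
    return -abs(predicted - observed)

class ReplayPool:
    """Fixed-capacity pool of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity=10000, seed=None):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are dropped when full
        self.rng = random.Random(seed)

    def store(self, state, action, reward_value, next_state):
        self.buffer.append((state, action, reward_value, next_state))

    def ready(self, min_size):
        """True once enough transitions have been collected to start training."""
        return len(self.buffer) >= min_size

    def sample(self, batch_size):
        """Draw a random batch without replacement for one training update."""
        return self.rng.sample(list(self.buffer), batch_size)
```

In each time period, a transition would be stored after the adjusted digital twin produces its prediction, and a training update would run only when `ready(min_size)` returns true.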

[0016] Preferably, the step of injecting fault simulation data for each equipment unit and each sub-unit into the digital twin model, simulating the operating state of the equipment under different fault simulation data, and constructing a fault database is specifically as follows: Fault simulation data generation: based on historical fault data of each unit and sub-unit of the wind farm equipment, equipment design documents, and fault mechanism analysis, determine common fault types and their corresponding fault characteristic parameters; then use fault simulation software to generate fault simulation data of different fault degrees and combinations from the determined fault characteristic parameters. Data injection and operational status simulation: inject the generated fault simulation data into the digital twin model according to the correspondence between equipment units and sub-units, and use the digital twin model with the injected data to simulate the operational status of the wind farm equipment under different fault conditions. Fault database construction: design the structure of the fault database, including the fields of the data tables (fault type, fault characteristic parameter, equipment operating parameter, and fault occurrence time); enter the simulated fault data and equipment operating status data into the fault database one by one according to the designed structure, and establish a data index and query mechanism.
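The fault database structure described above, with its four named fields and an index for querying, can be sketched using SQLite. The table name, column names, the choice of JSON text for the parameter fields, and the sample values are assumptions made for illustration; the patent text does not name a database engine or schema.

```python
import json
import sqlite3

def create_fault_db(path=":memory:"):
    """Create a fault table with the four fields named in the text, plus an index on fault type."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS fault_records (
               id INTEGER PRIMARY KEY,
               fault_type TEXT NOT NULL,
               fault_features TEXT NOT NULL,
               operating_params TEXT NOT NULL,
               occurred_at TEXT NOT NULL
           )"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_fault_type ON fault_records (fault_type)")
    return conn

def add_record(conn, fault_type, features, operating_params, occurred_at):
    """Store one simulated fault; parameter dictionaries are serialized as JSON text."""
    conn.execute(
        "INSERT INTO fault_records (fault_type, fault_features, operating_params, occurred_at) "
        "VALUES (?, ?, ?, ?)",
        (fault_type, json.dumps(features), json.dumps(operating_params), occurred_at),
    )
    conn.commit()

def find_by_type(conn, fault_type):
    """Query all records of one fault type, decoding the JSON fields."""
    rows = conn.execute(
        "SELECT fault_type, fault_features, operating_params, occurred_at "
        "FROM fault_records WHERE fault_type = ?",
        (fault_type,),
    ).fetchall()
    return [(t, json.loads(f), json.loads(p), ts) for t, f, p, ts in rows]
```

Each run of the digital twin under an injected fault would produce one `add_record` call, so the table accumulates pairs of fault characteristics and the resulting simulated operating state.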

[0017] Preferably, the step of matching the fault location according to the fault database when wind farm equipment fails, and marking the confidence level of the corresponding fault index in the fault database based on the accuracy of the fault location matching, is specifically as follows: Generate fault keywords and use them as query conditions to perform exact and fuzzy matching in the existing fault database. For the matched fault records, extract the corresponding fault location information and perform automatic on-site fault collection and verification. Determine the matching accuracy from the verification results, and mark the confidence level of the corresponding fault index in the fault database according to the matching accuracy, the confidence level being proportional to the accuracy.
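Exact matching followed by fuzzy matching of a fault keyword can be sketched with the standard-library `difflib` module. The patent text does not specify a similarity measure, so the use of `difflib` ratios, the cutoff of 0.6, and the function name `match_fault` are assumptions; the fault index here is simply a dictionary keyed by fault description.

```python
import difflib

def match_fault(keyword, fault_index, cutoff=0.6):
    """Return (index_key, similarity) pairs: an exact hit first, otherwise close fuzzy matches."""
    if keyword in fault_index:
        return [(keyword, 1.0)]                      # exact match
    candidates = difflib.get_close_matches(keyword, list(fault_index), n=3, cutoff=cutoff)
    return [
        (candidate, difflib.SequenceMatcher(None, keyword, candidate).ratio())
        for candidate in candidates                  # sorted best match first
    ]
```

The similarity score returned here is only a text-matching measure. The confidence level in the text is assigned afterwards from on-site verification of the matched fault location, which this sketch does not model.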

[0018] Preferably, the step of setting a confidence threshold based on matching probability statistics, dynamically adjusting the digital twin model parameters based on the confidence threshold, and then using the index matching scheme that meets the confidence threshold as the fault diagnosis scheme when wind farm equipment fails, specifically: Collect historical fault data of wind farm equipment and corresponding fault database matching results, record the matching situation of fault information with different fault indexes in the database, count the number of times each fault index is matched, and calculate the matching probability of each fault index based on the number of matching and the total number of fault samples. Based on the distribution of matching probabilities, calculate the mean and standard deviation of the matching probabilities, and use the mean plus the standard deviation as the confidence threshold. Analyze the fault characteristics and equipment operating status data corresponding to fault indices that do not meet the confidence threshold, and iteratively adjust the parameters related to the fault characteristics in the digital twin model based on the analysis results; Fault indices that meet the set confidence threshold are selected, and the matching schemes corresponding to the fault indices that meet the set threshold are used as fault diagnosis schemes.
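The threshold rule above (matching probability per fault index, then mean plus standard deviation as the confidence threshold) can be written out directly. Two points are assumptions: the text does not say whether the population or sample standard deviation is meant (the population form is used here), and comparing each index's matching probability against the threshold is one reading of "meets the confidence threshold". The function names and sample counts are hypothetical.

```python
import statistics

def matching_probabilities(match_counts, total_samples):
    """Matching probability of each fault index = times matched / total fault samples."""
    return {index: count / total_samples for index, count in match_counts.items()}

def confidence_threshold(probabilities):
    """Threshold = mean + standard deviation of the matching probabilities (population form assumed)."""
    values = list(probabilities.values())
    return statistics.mean(values) + statistics.pstdev(values)

def split_by_threshold(probabilities, threshold):
    """Separate indices that meet the threshold from those needing model adjustment."""
    accepted = [i for i, p in probabilities.items() if p >= threshold]
    to_adjust = [i for i, p in probabilities.items() if p < threshold]
    return accepted, to_adjust
```

For example, with hypothetical counts `{"A": 8, "B": 1, "C": 1}` over 10 fault samples, only index `"A"` meets the threshold; `"B"` and `"C"` would be analyzed and the related digital twin parameters iteratively adjusted.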

[0019] The beneficial effects of the above-described technical solution of the present invention are as follows: 1. This invention comprehensively collects multimodal data from equipment, covering multiple aspects such as structural health monitoring, environmental meteorology, and electrical operation. Through preprocessing techniques, such as noise reduction and feature extraction, it effectively integrates this multi-source heterogeneous data, uncovers potential fault modes, and solves the problem that traditional methods struggle to fully utilize multi-source data.

[0020] 2. This invention combines a physical-driven model and a data-driven model to construct a digital twin model, integrating the advantages of both physical principles and data-driven approaches. The physical-driven model is based on the operating principles of the equipment, while the data-driven model leverages the powerful feature extraction capabilities of hybrid neural networks. These two approaches complement each other, overcoming the limitations of a single model. During fault diagnosis, by calculating residuals and fusing model results, and dynamically adjusting the fusion weights using independent test datasets, the accuracy of fault diagnosis is improved.

[0021] 3. This invention employs reinforcement learning to dynamically adjust the parameters of the digital twin model, continuously optimizing the model based on real-time equipment operating data and external environmental data. The reinforcement learning algorithm defines the state space and action space, selects the Deep Q-Network (DQN) algorithm and sets reasonable hyperparameters, and determines the reward function based on the error between the model's predicted values and the actual observed values. As time progresses, the algorithm continuously learns the optimal parameter adjustment strategy, enabling the digital twin model to track equipment state changes in real time, promptly detect potential faults, and achieve real-time fault diagnosis and early warning.

[0022] 4. This invention constructs a fault database, simulating the operating state of equipment under different fault conditions by injecting fault simulation data. The database structure is rationally designed, including fields such as fault type, characteristic parameters, equipment operating parameters, and fault occurrence time, facilitating querying and analysis. During fault diagnosis, precise and fuzzy matching is performed by generating fault keywords, combined with on-site verification to mark the confidence level of the fault index, enabling it to handle complex and diverse fault scenarios and improve the reliability of fault diagnosis.

[0023] 5. This invention sets a confidence threshold based on matching probability statistics, selects reliable fault index matching schemes as diagnostic schemes, and iteratively adjusts the parameters of the digital twin model based on the analysis results. This approach optimizes the fault diagnosis process and improves diagnostic efficiency. Accurate fault diagnosis helps maintenance personnel quickly locate and resolve faults, reducing equipment downtime, maintenance costs, and production losses. Simultaneously, it allows for early detection of potential faults, enabling reasonable maintenance planning, avoiding unnecessary maintenance work, further reducing maintenance costs, and improving the economic benefits of wind farms.

Attached Figure Description

[0024] Figure 1 is a flowchart of the wind farm equipment fault diagnosis method based on neural networks and digital twins according to the present invention.

[0025] Figure 2 is a schematic diagram of the wind farm equipment fault diagnosis method based on neural networks and digital twins according to the present invention.

Detailed Implementation

[0026] To make the technical problems, technical solutions and advantages of the present invention clearer, a detailed description will be given below in conjunction with the accompanying drawings and specific embodiments.

[0027] As shown in Figure 1, this invention proposes a fault diagnosis method for wind farm equipment based on neural networks and digital twins, comprising the following steps:
S1. Collect multimodal data of the N equipment units of the wind farm equipment during normal operation, then separately collect the multimodal data of each sub-unit of each equipment unit, and preprocess the multimodal data, where N, M, and x are positive integers.
S2. Based on the operating principles and multimodal data of each equipment unit and each sub-unit of the wind power generation equipment, construct a physical-driven model and a data-driven model respectively, and combine the two models to build a digital twin model.
S3. Collect external environmental data and real-time operation data of the wind farm equipment at various time periods, and dynamically adjust the parameters of the digital twin model using reinforcement learning.
S4. Inject fault simulation data for each equipment unit and each sub-unit into the digital twin model, simulate the operating state of the equipment under the different injected fault simulation data, and construct a fault database.
S5. When wind farm equipment fails, match the fault location according to the fault database, and mark the confidence level of the corresponding fault index in the fault database based on the accuracy of the fault location matching.
S6. Set a confidence threshold based on the matching probability statistics, dynamically and iteratively adjust the parameters of the digital twin model based on the confidence threshold, and use the index matching scheme that meets the confidence threshold as the fault diagnosis scheme when the wind farm equipment fails.

[0028] In this embodiment, step S1 collects multimodal data of the N equipment units of the wind farm equipment during normal operation, and then separately collects the multimodal data of the sub-units of each equipment unit. The N equipment units represent the main components of a wind farm's power generation system, such as wind turbine units, transmission system units, generator units, and converter units. Each equipment unit further comprises different sub-units, hence the use of ... to denote them. For example, a wind turbine unit includes sub-units such as blades, a hub, and a nacelle: the blades capture wind energy, the hub connects the blades to the transmission system, and the nacelle houses many critical pieces of equipment. A transmission system unit includes sub-units such as gearboxes, high-speed shafts, couplings, and bearings: gearboxes convert speed and torque, high-speed shafts transmit power, couplings connect different components, and bearings ensure the smooth rotation of components.

[0029] Multimodal data includes structural health monitoring data, including but not limited to vibration, stress, strain, and temperature; environmental and meteorological data, including but not limited to wind speed, wind direction, temperature, humidity, and air pressure; electrical and operational status data, including but not limited to voltage, current, power, speed, and angle; image and video data, including but not limited to visible light images, infrared imaging, and laser point clouds; sound and acoustic data, including but not limited to sound and ultrasound; and maintenance and log data, including but not limited to maintenance records and SCADA system logs. Multimodal data provides a rich and comprehensive source of information for wind farm equipment fault diagnosis. Each type of data reflects the operating status of the equipment from different perspectives and plays a crucial role in accurately diagnosing faults.

[0030] (1) Structural health monitoring data

Vibration: During wind turbine operation, various components generate vibrations, such as blade rotation, gear meshing, and bearing operation. Under normal operating conditions, the vibration amplitude and frequency fluctuate within a certain range. When equipment malfunctions, such as blade cracks, gear wear, or bearing failure, the amplitude, frequency, and phase characteristics of the vibration will change. By monitoring and analyzing vibration data, potential faults can be detected in a timely manner. For example, by using an accelerometer to collect vibration signals and performing spectrum analysis, the main frequency components of the vibration can be determined, thereby identifying the type of fault.
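The spectrum analysis mentioned above can be illustrated with a discrete Fourier transform of a sampled accelerometer signal. This is a minimal sketch: the function name `dominant_frequency`, the 1 kHz sample rate, and the synthetic 50 Hz and 120 Hz components in the usage note are assumptions for demonstration, not values from the patent.

```python
import numpy as np

def dominant_frequency(signal, sample_rate_hz):
    """Return the frequency (Hz) of the strongest spectral component of a vibration signal."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                   # remove the DC offset
    spectrum = np.abs(np.fft.rfft(x))                  # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate_hz)
    return freqs[np.argmax(spectrum)]
```

Applied to one second of a synthetic signal sampled at 1 kHz, `np.sin(2 * np.pi * 50 * t) + 0.3 * np.sin(2 * np.pi * 120 * t)`, the function picks out the stronger 50 Hz component. In practice, a shift in the dominant frequencies relative to the healthy baseline is the kind of feature that would be passed on to the diagnostic model.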

[0031] Stress and Strain: Wind turbine components such as the tower and blades are subjected to various stresses and strains during operation. Stress concentration or excessive strain can lead to fatigue damage and cracking. By installing stress and strain sensors at critical locations, the stress on components can be monitored in real time. For example, by installing strain gauges at key sections of the tower, when the tower is subjected to excessive wind loads or other external forces, the strain gauges will sense changes in strain. By analyzing the stress and strain data, the structural health of the components can be assessed, and the remaining service life of the components can be predicted.

[0032] Temperature: During equipment operation, heat is generated due to friction, current heating, and other factors, causing the temperature to rise. Under normal operation, the temperature of various parts of the equipment remains within the normal range. When a malfunction occurs, such as motor overload or poor bearing lubrication, the temperature will rise abnormally. By monitoring the temperature of various parts of the equipment, signs of malfunctions can be detected promptly. For example, temperature sensors can be installed on motor windings, bearings, etc., to monitor temperature changes in real time. Once the temperature exceeds a set threshold, an early warning signal is issued, prompting maintenance personnel to inspect and repair the equipment.

[0033] (2) Environmental and meteorological data

Wind speed and direction: Wind speed and direction are crucial factors affecting the power generation efficiency and operational safety of wind turbines. Excessive wind speed can cause turbine overspeeding and damage, while insufficient wind speed will affect power generation efficiency. Changes in wind direction subject the turbine blades to forces in different directions, and prolonged uneven stress can lead to blade fatigue damage. Installing anemometers and wind vanes allows for real-time monitoring of wind speed and direction changes, providing vital information for turbine control and fault diagnosis. For example, when the wind speed exceeds the turbine's rated wind speed, the control system automatically adjusts the blade angle to ensure safe operation.

[0034] Temperature, humidity, and air pressure: Changes in ambient temperature, humidity, and air pressure can affect equipment performance and lifespan. For example, high temperatures accelerate the aging and wear of equipment components; high humidity can easily cause electrical equipment to become damp, leading to short circuits; and changes in air pressure can affect the sealing performance of equipment. By monitoring ambient temperature, humidity, and air pressure, preventative measures can be taken to prevent equipment failures. For instance, installing temperature and humidity sensors in the electrical equipment room can activate dehumidification equipment when the humidity exceeds a set range, maintaining a dry operating environment for the equipment.

[0035] (3) Electrical and operational status data

Voltage, current, and power: Voltage, current, and power are important parameters reflecting the operating status of an electrical system. When electrical equipment malfunctions, such as a short circuit, open circuit, or overload, voltage, current, and power will change abnormally. By monitoring these parameters, electrical faults can be detected in a timely manner. For example, by installing voltage transformers and current transformers, voltage and current signals can be collected, and power can be calculated. When the current suddenly increases and the power changes abnormally, it may indicate that the equipment has an overload or short circuit fault.

[0036] Rotational speed and angle: Parameters such as blade rotational speed, gearbox rotational speed, and generator rotational speed reflect the operating status of the equipment. Abnormal rotational speed may be caused by equipment failure, load changes, or other reasons. Angle parameters such as blade angle and yaw angle are also crucial for the normal operation of the wind turbine. By monitoring rotational speed and angle, it is possible to determine whether the equipment is operating normally. For example, a sudden drop in blade rotational speed may indicate that the blades are obstructed by foreign objects or that there is a malfunction in the transmission system.

[0037] (4) Image and video data. Visible light images: Visible light images can visually display the appearance of equipment, such as whether blades have cracks or deformation, whether towers are corroded or tilted, and whether there are foreign objects on the equipment surface. By installing cameras, periodically taking visible light images of the equipment, and analyzing them with image processing technology, appearance defects can be detected in a timely manner. For example, image recognition algorithms can be used to detect cracks on the blade surface, and the development of cracks can be monitored by comparing images from different periods.

[0038] Infrared Imaging: Infrared imaging technology can detect the heat generation of equipment. Since equipment malfunctions are often accompanied by temperature changes, capturing infrared images of the equipment with an infrared imager allows for a direct view of the temperature distribution in different parts of the equipment, thereby identifying potential faults. For example, faults such as short circuits in motor windings or overheating of bearings can cause localized temperature increases, which appear as abnormal hot spots on the infrared image. By analyzing the infrared images, the fault location can be quickly determined, allowing for timely repairs.

[0039] Laser point cloud: Laser point cloud technology can acquire three-dimensional structural information of equipment. By processing and analyzing the laser point cloud data, the size, shape, and positional relationships of equipment components can be accurately measured. In wind farm equipment fault diagnosis, laser point cloud technology can be used to detect blade deformation, tower verticality, etc. For example, by laser scanning the blades to obtain three-dimensional point cloud data and comparing it with the design model, the amount of blade deformation can be accurately calculated and the health status of the blades can be assessed.

[0040] (5) Sound and acoustic data. Sound and ultrasound: Equipment emits various sounds during operation, and the sounds under normal operating conditions have certain characteristics. When equipment malfunctions, such as gear wear, bearing failure, or loose parts, the frequency, amplitude, and timbre of the sound will change. By installing microphones or acoustic sensors, sound signals from the equipment during operation can be collected, and sound analysis technology can be used to determine if the equipment is malfunctioning. For example, using voiceprint recognition technology, the sounds under normal operating and fault conditions can be modeled and compared. When a sound characteristic matches the fault model, a fault warning is issued. Ultrasonic testing technology can be used to detect internal defects in equipment, such as cracks and holes. Because ultrasonic waves have different propagation characteristics in different media, when ultrasonic waves encounter defects, they undergo reflection, refraction, and scattering. By analyzing changes in ultrasonic signals, the location and size of internal defects can be detected.

[0041] (6) Maintenance and log data. Maintenance records: Maintenance records detail the equipment's maintenance history, including maintenance time, maintenance content, and replaced parts. Analyzing these records makes it possible to understand the equipment's maintenance status, determine whether maintenance is being performed according to the prescribed schedule, and assess the quality of maintenance work. For example, if a piece of equipment frequently requires the replacement of the same part, it may indicate a design or quality issue with that part, necessitating further analysis and improvement.

[0042] SCADA System Logs: SCADA (Supervisory Control and Data Acquisition) system logs record a large amount of data, including equipment operating parameters, alarm information, and operation records. Analyzing SCADA system logs allows us to trace the equipment's operating history, understand its operating status before a failure, and review operator records. This is crucial for fault diagnosis and root cause analysis. For example, when equipment malfunctions, checking the SCADA system logs can reveal whether the equipment's operating parameters were normal before the failure, whether there were any abnormal alarms, and whether the operator performed any improper actions.

[0043] Step S1 further preprocesses the multimodal data, including but not limited to noise reduction, feature extraction, outlier correction, time alignment, data standardization, image enhancement, structured processing, or any combination thereof. Multimodal data preprocessing is a crucial step in wind farm equipment fault diagnosis; it improves data quality, extracts key information, and provides strong support for subsequent accurate diagnosis. Noise Reduction: Wind farm environments are complex, and equipment sensors are susceptible to electromagnetic interference and mechanical vibration, resulting in noisy data. Noise can interfere with the judgment of the actual operating status of the equipment and reduce data reliability. Filtering algorithms are common noise reduction methods. For example, Kalman filtering uses the system state equation and observation equation and, through alternating prediction and update steps, makes optimal estimates from noisy measurement data, effectively removing noise. Wavelet filtering, based on the wavelet transform, decomposes the signal into different frequency sub-bands and suppresses the coefficients of the sub-bands containing noise through thresholding, achieving noise reduction. During vibration data acquisition, electromagnetic interference may cause high-frequency noise in the vibration signal. Wavelet filtering can smooth the vibration curve so that it accurately reflects the actual vibration of the equipment.
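
The Kalman predict/update cycle described above can be sketched for a single sensor channel as follows. This is a minimal illustration, not the patented implementation: it assumes a scalar random-walk state model, and the function name `kalman_1d` and the default variances `process_var` and `meas_var` are illustrative choices rather than values from the source.

```python
def kalman_1d(measurements, process_var=1e-4, meas_var=0.25):
    """Scalar Kalman filter assuming a random-walk state (assumed model)."""
    if not measurements:
        return []
    estimate = measurements[0]   # initial state guess
    error_var = 1.0              # initial uncertainty of the estimate
    smoothed = []
    for z in measurements:
        # Predict: the state is assumed to persist, uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + meas_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed
```

A larger `meas_var` relative to `process_var` makes the filter trust the sensor less and smooth more strongly; in practice both would be tuned per sensor.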

[0044] Feature extraction: Raw multimodal data is massive and contains a lot of redundant information, making direct use for fault diagnosis inefficient and inaccurate. Feature extraction aims to extract the most representative and discriminative key features from complex data, which can effectively reflect the equipment's operating status and fault characteristics. For vibration data, statistical features such as amplitude, frequency, and phase are calculated. For example, vibration amplitude may increase during a fault, and certain frequency components may show anomalies. Image data can utilize edge detection and texture analysis algorithms to extract features. For instance, the Canny edge detection algorithm can accurately delineate the edge contours of equipment components, helping to detect anomalies such as cracks and deformations. Through feature extraction, data dimensionality can be reduced, improving the training speed and diagnostic accuracy of subsequent fault diagnosis models.
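
As a rough illustration of vibration feature extraction, the sketch below computes the RMS amplitude, peak amplitude, and dominant frequency of a sampled signal. The function name `vibration_features` and the choice of these three features are assumptions made for illustration; a naive DFT is used only to keep the example dependency-free, and a production system would normally use an FFT library.

```python
import cmath
import math

def vibration_features(samples, sample_rate_hz):
    """Return RMS, peak amplitude and dominant frequency of a vibration signal."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]          # remove the DC offset
    rms = math.sqrt(sum(c * c for c in centered) / n)
    peak = max(abs(c) for c in centered)
    # Naive DFT over the positive-frequency bins; keep the strongest bin.
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centered))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return {"rms": rms, "peak": peak,
            "dominant_hz": best_bin * sample_rate_hz / n}
```

Reducing a raw waveform to a handful of such values is what lowers the data dimensionality before model training.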

[0045] Outlier Correction: During data acquisition, outliers may occur due to sensor malfunctions, communication failures, or sudden external interference. These outliers deviate from the normal data range, and if left unaddressed, they can severely impact the accuracy and reliability of data analysis results. Statistical methods such as the 3σ principle can be used to identify outliers. This principle assumes a normal distribution, and data values lying more than three standard deviations above or below the mean are considered outliers. Once outliers are identified, corrections can be made based on data trends, historical data, or data from other relevant sensors at the same time. For example, if a temperature sensor collects an abnormally high temperature value, and inspection reveals a temporary sensor malfunction, the outlier can be reasonably corrected by referring to temperature data from other parts of the same equipment and the historical temperature trends of that part, ensuring the accuracy of the temperature data.
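
A minimal sketch of the 3σ rule follows. The correction strategy used here (replacing a flagged value with the average of its nearest unflagged neighbours) is only one of the options the text mentions and is an assumption of this example, as is the function name `correct_outliers_3sigma`. Note that with very short series a single extreme value inflates the standard deviation so much that it may not exceed the 3σ bound, so the rule needs a reasonably long window.

```python
import statistics

def correct_outliers_3sigma(values):
    """Flag values outside mean +/- 3*std and replace them from neighbours."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return list(values), []
    flagged = [i for i, v in enumerate(values) if abs(v - mean) > 3 * std]
    flagged_set = set(flagged)
    corrected = list(values)
    for i in flagged:
        # Nearest unflagged samples on each side, if any exist.
        left = next((values[j] for j in range(i - 1, -1, -1)
                     if j not in flagged_set), None)
        right = next((values[j] for j in range(i + 1, len(values))
                      if j not in flagged_set), None)
        neighbours = [v for v in (left, right) if v is not None]
        if neighbours:
            corrected[i] = sum(neighbours) / len(neighbours)
    return corrected, flagged
```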

[0046] Time Alignment: Multimodal data in wind farms is collected by different sensors. Due to differences in sensor sampling frequency, clock accuracy, etc., the data may be out of sync in time. Out-of-sync data cannot accurately reflect the true correlation between different equipment states, affecting the accuracy of fault diagnosis. Time alignment is achieved by using key equipment operation events (such as wind turbine start-up and shutdown) or high-precision clocks as benchmarks, employing methods such as interpolation and resampling. For example, for vibration data with a low sampling frequency and electrical data with a high sampling frequency, the vibration data can be interpolated to match the time interval of the electrical data; or high-frequency data can be resampled to reduce its sampling frequency and match it with low-frequency data, thereby ensuring accurate temporal correspondence between data from different data sources and providing a reliable basis for comprehensive analysis.
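
The interpolation variant of time alignment can be sketched as below: a low-rate series is linearly interpolated onto the timestamps of a high-rate series. The function name `interpolate_to` is hypothetical, timestamps are assumed to be sorted and already expressed on a common clock, and values outside the covered range are simply held at the nearest endpoint.

```python
from bisect import bisect_left

def interpolate_to(timestamps, values, target_times):
    """Linearly interpolate (timestamps, values) onto target_times."""
    aligned = []
    for t in target_times:
        if t <= timestamps[0]:
            aligned.append(values[0])        # hold first value before the range
        elif t >= timestamps[-1]:
            aligned.append(values[-1])       # hold last value after the range
        else:
            hi = bisect_left(timestamps, t)  # first sample at or after t
            lo = hi - 1
            frac = (t - timestamps[lo]) / (timestamps[hi] - timestamps[lo])
            aligned.append(values[lo] + frac * (values[hi] - values[lo]))
    return aligned
```

For example, vibration samples taken once per second could be aligned to electrical samples taken every 0.5 s by passing the electrical timestamps as `target_times`.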

[0047] Data Standardization: Multimodal data contains various types, with significant differences in units and numerical ranges. For example, voltage data is measured in volts and has a large numerical range, while temperature data is measured in degrees Celsius and has a relatively small numerical range. This difference can lead to unreasonable weight allocation during model training, affecting the model's convergence speed and diagnostic performance. Normalization and Z-score standardization are employed. Normalization maps the data to the [0, 1] interval, while Z-score standardization subtracts the mean and divides by the standard deviation so that the data has zero mean and unit variance. After standardization, different types of data become comparable, effectively improving model training performance and fault diagnosis accuracy.
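
Both scaling methods can be sketched in a few lines. The function names are illustrative, and the handling of a constant series (returning zeros) is an assumption of this example rather than something the source specifies.

```python
import statistics

def min_max_normalize(values):
    """Map values linearly onto the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score_standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]
```

In a real pipeline the minimum, maximum, mean, and standard deviation would be computed on training data only and then reused for incoming data.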

[0048] Image Enhancement: Wind farm environments are complex and variable, and image and video data acquisition is easily affected by factors such as lighting and weather, leading to a decline in image quality. For example, insufficient light blurs visible light images, and fog reduces image clarity. Image enhancement aims to improve image quality through various algorithms, highlighting key information in the image to facilitate observation and analysis of equipment status. Histogram equalization algorithms adjust the image's grayscale histogram to make the grayscale distribution more uniform and enhance image contrast; contrast enhancement algorithms directly adjust the image's contrast, making the details of equipment components clearer. Through image enhancement, abnormalities in equipment can be detected more accurately from images, such as tiny cracks on blade surfaces and corrosion marks on equipment surfaces.
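
Histogram equalization can be illustrated on a small grayscale image stored as a list of rows of integer pixel values. This is a textbook CDF-based mapping written without external libraries; the function name `equalize_histogram` and the list-of-lists image format are assumptions of this sketch, and the image is assumed to be non-empty.

```python
def equalize_histogram(image, levels=256):
    """Spread the grayscale histogram of `image` over the full level range."""
    flat = [pixel for row in image for pixel in row]
    hist = [0] * levels
    for pixel in flat:
        hist[pixel] += 1
    # Cumulative distribution of pixel intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = len(flat)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:                  # single-valued image: nothing to do
        return [list(row) for row in image]
    # Lookup table mapping each old level to its equalized level.
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[pixel] for pixel in row] for row in image]
```

A low-contrast image whose values cluster in a narrow band is stretched so that the darkest occupied level maps to 0 and the brightest to `levels - 1`.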

[0049] Structured processing: Maintenance records and log files are often unstructured or semi-structured data, containing a large amount of useful information, but their formats are not standardized, making them difficult to use directly for data analysis. Structured processing transforms this data into a structured form through data parsing, classification, and annotation. For example, it can be organized into tables containing fields such as maintenance time, equipment components, maintenance content, and maintenance personnel, or events in log files can be classified and annotated to give them a clear structure. After structured processing, the data is easier to store in a database, facilitating subsequent queries, statistics, and analysis. It also enables the rapid extraction of key information related to fault diagnosis from large amounts of data, improving fault diagnosis efficiency.
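
As a small illustration of structured processing, the sketch below parses one maintenance log line into a record with named fields. The pipe-separated line layout, the field names in `FIELDS`, and the timestamp format are all hypothetical; the source does not specify how its maintenance records are formatted.

```python
from datetime import datetime

# Hypothetical field layout for a pipe-separated maintenance log line.
FIELDS = ("maintenance_time", "unit", "component", "content", "personnel")

def parse_maintenance_line(line):
    """Return a dict for a well-formed line, or None if it cannot be parsed."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != len(FIELDS):
        return None
    record = dict(zip(FIELDS, parts))
    try:
        record["maintenance_time"] = datetime.strptime(
            record["maintenance_time"], "%Y-%m-%d %H:%M")
    except ValueError:
        return None
    return record
```

Records in this form can be inserted directly into a database table whose columns match the field names, which is what makes later queries and statistics straightforward.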

[0050] In this embodiment, the physical driving model in step S2 is constructed based on the principles of mechanical dynamics, thermodynamics, and electrical control of the corresponding unit or sub-unit. The data driving model is constructed using a hybrid neural network model, selecting a corresponding feature extraction network for each data type, including convolutional neural networks (CNN), recurrent neural networks (RNN) and their variants (LSTM and GRU), and support vector machines (SVM).

[0051] (1) Physically driven model, built according to the principles involved in the operation of each unit or sub-unit, including, for example: Mechanical dynamics principles: For wind turbine units and their sub-units, such as blade rotation, based on Newton's laws of motion and other mechanical dynamics knowledge, considering aerodynamics, inertial forces, gravity, etc., the motion equation of the blade is constructed, and its vibration, stress and strain under different wind speeds are analyzed to determine whether the blade has fatigue or damage risks. For transmission system units, based on gear meshing principles, shaft torsional vibration theory, etc., the dynamic model of gearbox and shaft system is constructed to monitor gear wear, shaft failures, etc.

[0052] Thermodynamic principles: In the generator unit, the generator generates heat due to resistance and other factors during operation. Using the heat transfer principle in thermodynamics, considering heat conduction, convection and radiation, a temperature field model of the generator is established to predict the temperature changes of each component and avoid failures caused by overheating. When the converter unit is working, the power module also generates heat. A heat dissipation model is constructed using thermodynamic principles to ensure that it operates within a suitable temperature range.

[0053] Electrical control principles: Based on power electronics technology and automatic control principles, the converter unit establishes its control model and analyzes how to adjust the converter's control strategy according to grid parameters and generator output to achieve high-quality power output; the generator unit combines electrical principles such as the law of electromagnetic induction to construct an electrical model of the power generation process to ensure the stability and efficiency of power generation.

[0054] (2) Data-driven model: A hybrid neural network model is used to match the corresponding feature extraction network for different types of data. Convolutional Neural Networks (CNNs) are suitable for processing image and video data, such as visible light images of wind turbine blades and infrared images of equipment. Convolutional layers in CNNs automatically extract features such as edges and textures from images, pooling layers reduce data dimensionality, and fully connected layers perform feature classification, thereby identifying anomalies in equipment within images, such as cracks in blades or overheated areas.

[0055] Recurrent Neural Networks (RNNs) and their variants (LSTM, GRU): Suitable for processing data with time-series characteristics, such as vibration, stress-strain, temperature, voltage, and current data that change over time. RNNs can handle dependencies in sequential data, while LSTM and GRU solve the gradient vanishing problem in RNNs, and can better capture key information in long sequences, used to predict changes in equipment operating status and detect potential faults in advance.

[0056] Support Vector Machine (SVM): Applicable to various data types, it classifies and regresses data by finding the optimal hyperplane. In wind farms, SVM can effectively distinguish between normal and faulty states of equipment based on multimodal data after feature extraction; for example, it can determine whether wind turbine components are malfunctioning based on vibration and sound characteristic data.

[0057] Then, the physical-driven model and the data-driven model are combined to construct a digital twin model, specifically: The results calculated by the physics-driven model serve as input data for the data-driven model, while the anomaly patterns identified by the data model are used as boundary conditions for the physics model. The physics-driven model calculates the state results of equipment units or sub-units, such as stress, temperature, and voltage values, based on principles of mechanical dynamics, thermodynamics, and electrical control. These results are provided as data to the data-driven model to help it better learn the normal and abnormal state characteristics of the equipment. For example, the physics-driven model calculates the stress distribution of wind turbine blades under specific operating conditions; the data-driven model can combine this stress data with other multimodal data for training and fault mode identification. The data-driven model analyzes multimodal data using hybrid neural networks to identify abnormal patterns in the equipment. These abnormal patterns are fed back to the physics-driven model as new boundary conditions. For instance, if the data-driven model detects an abnormal fluctuation pattern in generator current, it inputs this as a boundary condition into the physics-driven model, allowing the model to recalculate the generator's operating state under abnormal conditions.

[0058] The residuals of the physics-driven model and the data-driven model are calculated, weights are assigned to these residuals, and the two models are fused. After both models calculate and predict the same equipment state, the difference between their results is the residual. The residuals are assigned weights based on the characteristics and importance of the different data types. For example, residuals for vibration data are given higher weights because they are more sensitive to equipment failures; residuals for ambient temperature data are given lower weights because their direct impact on equipment failures is relatively small. By adjusting the weights, the results of the two models are fused to obtain a more accurate assessment of the equipment state. The fused hybrid model is then tested on independent test datasets, evaluation metrics are calculated, and the fusion weights are dynamically adjusted based on these metrics. Common evaluation metrics include accuracy, recall, and mean squared error (MSE). For fault diagnosis, accuracy measures the proportion of cases in which the model correctly identifies a normal or faulty equipment state; recall is the proportion of actual faults that the model correctly identifies; and MSE measures the deviation between the model's predicted values and the actual values. If the hybrid model is found to have low accuracy in diagnosing certain types of faults, the weights assigned to one of the models may be inappropriate. In this case, the weights of the residuals related to that type of fault in the corresponding model are increased to optimize model performance in subsequent diagnoses.
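
One possible reading of the residual-weighted fusion, together with the three evaluation metrics, is sketched below. The fusion rule (physics prediction plus a weighted share of the residual toward the data-driven prediction) is an assumption of this example, since the source does not give an explicit formula; all function names and the per-signal weight dictionary are likewise illustrative.

```python
def fuse_predictions(physics_pred, data_pred, residual_weights):
    """Fuse per-signal predictions: physics + weight * (data - physics)."""
    fused = {}
    for signal, phys in physics_pred.items():
        residual = data_pred[signal] - phys      # disagreement between models
        fused[signal] = phys + residual_weights[signal] * residual
    return fused

def accuracy(y_true, y_pred):
    """Share of samples whose predicted state matches the true state."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Share of actual faults (positive class) that were detected."""
    actual = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in actual) / len(actual) if actual else 0.0

def mse(y_true, y_pred):
    """Mean squared deviation between predicted and actual values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

Under this reading, a weight near 1 lets the data-driven model dominate for that signal and a weight near 0 keeps the physics-based value; the metrics computed on a held-out test set would then drive adjustments to `residual_weights`.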

[0059] In addition, the hybrid neural network model in step S2 employs a fusion of parallel channel construction and attention mechanisms, specifically as follows: Multimodal data from wind farms (such as vibration, images, and electrical parameters) have different characteristics. Different preprocessing methods, such as denoising and feature extraction, are used before the processed data is input into the corresponding branch networks. For example, vibration data is input into one branch after denoising and frequency domain feature extraction, while image data is input into another branch after image enhancement and edge feature extraction. Different modal data may have different feature dimensions and time steps, requiring feature alignment. For time-series data (such as vibration and electrical parameters), interpolation or resampling is used to ensure consistent time steps. For features with different dimensions, padding or dimensionality reduction is used to ensure comparability of features across branches. The features output from each branch may require further processing. For example, dimensionality transformation can be performed using fully connected or convolutional layers to adapt to the computational needs of subsequent models; for time-series features, time step compression or expansion operations may be performed to extract more effective time-related features.

[0060] Next, attention mechanism fusion is performed. The Softmax function is used to normalize the attention weights to the [0, 1] interval so that they sum to 1, and each weight represents the relative importance of that channel's features within the overall context. For example, when fusing vibration and temperature data features, their respective attention weights are calculated based on the equipment's operating characteristics and historical data. The features of each parallel channel are multiplied by their corresponding attention weights and then summed to achieve multimodal feature fusion. This approach highlights the features of important modes while still taking information from other modes into account, so that the fused features more accurately reflect the true state of the equipment.
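
The weighted-sum fusion step can be sketched as follows. The sketch assumes that each channel has already been reduced to a feature vector of the same length and that a scalar attention score per channel is available; how those scores are produced (for example by a small learned network) is outside this example, and the function names are illustrative.

```python
import math

def softmax(scores):
    """Normalize raw scores to positive weights that sum to 1."""
    m = max(scores)                               # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(channel_features, scores):
    """Weighted sum of per-channel feature vectors using softmax weights."""
    weights = softmax(scores)
    dim = len(channel_features[0])
    fused = [sum(w * feats[d] for w, feats in zip(weights, channel_features))
             for d in range(dim)]
    return fused, weights
```

With equal scores every channel contributes equally; raising the score of one channel, say vibration, shifts the fused vector toward that channel's features.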

[0061] Finally, the branches constructed from the parallel channels and the attention mechanism fusion module are treated as a single model. The loss between the predicted results and the true labels is calculated, its gradients are propagated through the network by backpropagation, and the parameters of all network layers are updated accordingly. Balancing the learning speed of each branch is crucial in this process, preventing any branch from overlearning or underlearning, and ensuring the model fully utilizes the information from the multimodal data. An auxiliary loss function is also added to each parallel channel branch, and each branch is pre-trained separately to learn the features of its corresponding modality. The pre-trained branches are then combined with the attention mechanism fusion module, and the entire model is fine-tuned, further optimizing its performance under multimodal data fusion and improving the accuracy of fault diagnosis.

[0062] In addition, lightweight optimization techniques were used to optimize the hybrid neural network model, including pruning and quantization, among which: (1) Pruning optimization includes: Determine the pruning criteria, calculate the absolute value of neuron connection weights based on the weight magnitude, set a threshold, and prune connections whose absolute weights are less than the threshold. Perform pruning operations, using structured pruning, removing parts that contribute little to model performance on a per-neuron, per-kernel, or per-channel basis, and adjusting the number of input channels in subsequent layers; After pruning, the model is retrained, and the weights of the remaining parameters are adjusted using the training dataset. The learning rate is reduced, and the model's performance metrics on the validation set are monitored.

[0063] In neural networks, the connection weights between neurons reflect the importance of information transmission. Weight-based pruning methods measure the importance of each neuron's connection weight by calculating its absolute value. A larger absolute weight value indicates a potentially greater role for the connection in model learning and prediction; conversely, smaller absolute weight values contribute less to model performance. A suitable threshold is set based on the model's complexity and performance requirements. This threshold acts like a pair of "scissors," determining which connections can be pruned. For example, a relatively simple fault diagnosis task might require a higher threshold to prune more unimportant connections; while for complex diagnostic scenarios, a lower threshold is used to retain more information. Determining pruning criteria can initially screen for connections with minimal impact on model performance, providing a basis for subsequent pruning operations and thus reducing the number of model parameters and computational cost without significantly decreasing model performance.
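
The two pruning ideas can be sketched on a plain weight matrix in which each row holds the incoming weights of one neuron. The sketch shows magnitude thresholding of individual connections and a simple structured variant that removes whole neurons and then drops the matching input columns of the next layer. The use of the mean absolute weight as the per-neuron importance score, and all function names, are assumptions of this example.

```python
def prune_by_magnitude(weights, threshold):
    """Zero out individual connections whose |weight| is below threshold."""
    return [[w if abs(w) >= threshold else 0.0 for w in row]
            for row in weights]

def prune_neurons(weights, threshold):
    """Structured pruning: drop rows (neurons) with small mean |weight|."""
    kept = [i for i, row in enumerate(weights)
            if sum(abs(w) for w in row) / len(row) >= threshold]
    return [weights[i] for i in kept], kept

def drop_input_columns(next_weights, kept):
    """Keep only the next layer's input columns for surviving neurons."""
    return [[row[i] for i in kept] for row in next_weights]
```

After either step the network would be retrained at a reduced learning rate while validation metrics are monitored, as the pruning procedure describes.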

[0064] (2) Quantization optimization includes: Uniform quantization is used to map continuous floating-point values to a finite number of discrete integer values, and the quantization range and step size are set. Perform quantization operations: after the model training is completed, quantize the weights into integers according to the selected scheme and store them; After quantization, the model is fine-tuned by adjusting the model parameters using training data to compensate for quantization errors, and the performance metrics on the validation set are monitored.

[0065] Uniform quantization maps continuous floating-point values to a finite number of discrete integer values. First, the quantization range is defined, i.e., the maximum and minimum values of the original floating-point numbers are determined. Then, the step size is determined based on the number of quantization bits; the step size is equal to the quantization range divided by (number of quantization levels - 1), where the number of levels is 2 raised to the number of quantization bits. For example, for 8-bit quantization there are 256 levels, floating-point numbers are mapped to the integer range of 0-255, and the step size is the range divided by 255. Properly setting the quantization range and step size is crucial for maintaining model performance. If the quantization range is too large, the step size grows and the quantization error increases; if the range is too small, values outside it are clipped and the variations in the original data are not adequately represented. The choice of step size also affects the quantization accuracy and needs to be optimized based on the characteristics of the model and data. Uniform quantization converts floating-point parameters in the model into integer representations, greatly reducing storage requirements and computational complexity and improving model efficiency.
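
A minimal sketch of uniform quantization follows, with the step size taken as the range divided by 2^b - 1 for b bits (255 for 8 bits). The function names and the choice to derive the range from the minimum and maximum of the weights themselves are assumptions of this example.

```python
def quantize_uniform(weights, num_bits=8):
    """Map floats to integer codes in [0, 2**num_bits - 1]."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** num_bits
    step = (hi - lo) / (levels - 1) if hi > lo else 1.0
    codes = [round((w - lo) / step) for w in weights]
    return codes, lo, step

def dequantize(codes, lo, step):
    """Recover approximate float values from integer codes."""
    return [lo + c * step for c in codes]
```

The round-trip error of any value inside the range is at most half a step, which is the quantization error that the subsequent fine-tuning is meant to compensate.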

[0066] In this embodiment, step S3 involves collecting external environmental data and real-time operational data of the wind farm equipment at various time periods, and dynamically adjusting the parameters of the digital twin model using reinforcement learning. Specifically: (1) Define the state space. Data selection: Wind farm equipment operation data and external environmental data contain a wealth of information. Preprocessed data can more accurately reflect the equipment's operating status and environmental impact. Equipment operation data includes electrical parameters (such as voltage, current, and power) and mechanical parameters (such as speed and vibration), directly reflecting the equipment's working status. External environmental data includes wind speed, wind direction, temperature, and humidity, which affect equipment performance and operational stability. For example, excessively high wind speeds may place excessive loads on the wind turbine blades, affecting power generation efficiency and equipment lifespan.

[0067] State Variable Construction: These data are used as state variables to construct a state space. The state space comprehensively describes the operating environment and state combinations of wind farm equipment at different times, providing a decision-making basis for reinforcement learning algorithms. Based on the current state variable values, the algorithm can understand the real-time status of the equipment and then make reasonable decisions to adjust the parameters of the digital twin model.

[0068] (2) Determine the action space. Parameter selection: In digital twin models, the physical parameters of the physical-driven model (such as the friction coefficient and heat exchange coefficient of mechanical components) and the network weights and biases of the data-driven model have a significant impact on the model's prediction results. By adjusting these parameters, the model can be made to better reflect the actual operating conditions of the equipment. For example, changing the friction coefficient of the gearbox adjusts how the physical-driven model simulates the gearbox's operating state; adjusting the weights and biases of the neural network can optimize the feature learning and prediction capabilities of the data-driven model.

[0069] Action Space Construction: These adjustable parameters are defined as the action space. During reinforcement learning, the algorithm selects appropriate actions (i.e., parameter adjustments) from the action space to optimize the digital twin model. The action space provides multiple possibilities for adjusting model parameters, and the reinforcement learning algorithm must find the optimal adjustment strategy within it.

[0070] (3) Initialize the reinforcement learning algorithm. Algorithm selection: The Deep Q-Network (DQN) algorithm is chosen. It combines deep learning and Q-learning, effectively handling high-dimensional state spaces and complex decision-making problems. DQN uses neural networks to estimate Q-values and breaks the correlations between data samples through an experience replay mechanism, improving learning efficiency and stability.

[0071] Hyperparameter settings: Setting algorithm hyperparameters, such as the learning rate which determines the step size of each parameter update, and the discount factor which reflects the importance attached to future rewards. Appropriate hyperparameter settings are crucial to the algorithm's convergence speed and performance. For example, an excessively large learning rate may cause the algorithm to oscillate during training and fail to converge; an excessively large discount factor may cause the algorithm to focus too much on future rewards and ignore current rewards.
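
For reference, the role of the discount factor can be written with the standard reinforcement learning definition of the discounted return. The symbols G_t (return from time step t), r (reward), and γ (discount factor) are notation introduced here for illustration and do not appear in the source.

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1}, \qquad 0 \le \gamma \le 1
```

With γ close to 0 the algorithm weighs almost only the immediate reward, while with γ close to 1 distant rewards count nearly as much as immediate ones, which is the trade-off the hyperparameter setting has to balance.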

[0072] Reward function determination: The reward function is determined based on the error between the digital twin model's predicted values and the actual observed values. A small prediction error indicates a good model simulation, resulting in a positive reward to encourage the algorithm to continue taking similar actions; a large prediction error results in a negative reward, prompting the algorithm to adjust its actions. The reward function guides the algorithm to find the optimal parameter adjustment strategy, making the digital twin model's predicted values closer to the actual observed values.
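
One simple way to turn a prediction error into such a reward is sketched below. The linear shape, the clipping to [-1, 1], and the `tolerance` parameter that marks the boundary between positive and negative reward are all assumptions of this example; the source only states that small errors are rewarded and large errors penalized.

```python
def reward_from_error(predicted, observed, tolerance):
    """Positive reward for errors below `tolerance`, negative above it."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    error = abs(predicted - observed)
    # 1.0 at zero error, 0.0 at error == tolerance, clipped to [-1, 1].
    return max(-1.0, min(1.0, 1.0 - error / tolerance))
```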

[0073] (4) Model prediction and data processing. Action selection and model adjustment: In each time period, the current state is input into the reinforcement learning algorithm. The algorithm selects an action based on the current policy and determines the adjustment values for the digital twin model parameters. Then, the adjusted digital twin model is used to predict the operating status of wind farm equipment.

[0074] Reward Calculation and Data Storage: The reward value is calculated based on the prediction results and actual observations. The reward value reflects the accuracy of the model's prediction. The current state, action, reward, and next state are stored in the experience replay pool. The experience replay pool stores historical data, breaking the temporal correlation between data and enabling the algorithm to better learn the optimal policy.

[0075] Algorithm Training and Network Update: When the data in the experience replay pool reaches a certain amount, a batch of data is randomly sampled from the pool to train the reinforcement learning algorithm. Using this data, the algorithm calculates the Q-value estimation error, updates the parameters of the policy network and value network using gradient descent, optimizes the algorithm's decision-making strategy, and improves the model's prediction accuracy. As training progresses, the algorithm continuously learns the optimal parameter adjustment strategy, enabling the digital twin model to more accurately simulate the operating status of wind farm equipment.
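
The store, sample, and update cycle can be sketched with a small replay buffer and a Q-learning update. To stay self-contained, this example substitutes a tabular Q-function (a dictionary) for the neural networks used by DQN, so it illustrates the data flow rather than the deep learning part; the class and function names, the state and action labels, and the default `lr` and `gamma` values are all hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""
    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)   # oldest entries are discarded

    def store(self, state, action, reward, next_state):
        self._items.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Random sampling breaks the temporal correlation between samples.
        return random.sample(list(self._items), batch_size)

    def __len__(self):
        return len(self._items)

def q_learning_update(q_table, batch, actions, lr=0.1, gamma=0.9):
    """Move Q(s, a) toward r + gamma * max_b Q(s', b) for each transition."""
    for state, action, reward, next_state in batch:
        best_next = max(q_table.get((next_state, b), 0.0) for b in actions)
        target = reward + gamma * best_next
        old = q_table.get((state, action), 0.0)
        q_table[(state, action)] = old + lr * (target - old)
    return q_table
```

In a DQN the dictionary lookup would be replaced by a forward pass through the Q-network, and the in-place update by a gradient descent step on the squared difference between the target and the current estimate.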

[0076] In this embodiment, step S4 injects the fault simulation data of each equipment unit and each sub-unit into the digital twin model, simulates the operating state of the equipment under different fault simulation data, and constructs a fault database, specifically as follows: (1) Generation of fault simulation data. Determining Fault Types and Characteristic Parameters: Historical fault data of wind farm equipment records various past fault scenarios and serves as a crucial basis for identifying common fault types; analysis of this data reveals frequently occurring fault patterns. Equipment design documents contain information such as the equipment's structure, operating principles, and performance indicators, revealing potential weaknesses and failure risks from a design perspective. Fault mechanism analysis examines the causes and development processes of faults and the intrinsic relationships between faults and equipment operating parameters. Combining these three aspects allows common fault types, such as wind turbine blade cracks and gearbox gear wear, to be identified precisely, and clarifies the characteristic parameters corresponding to each fault. For example, blade cracks may correspond to increased vibration amplitude and intensified vibration at specific frequencies.

[0077] Data generation using fault simulation software: After obtaining the fault type and characteristic parameters, fault simulation software is used to generate fault simulation data of different fault degrees and combinations. For example, in software such as Simulink, different degrees of gear wear parameters can be set according to the mechanical structure and dynamic model of the gearbox to simulate vibration, speed, and other data under various fault states, from slight wear to severe wear. Multiple faults can also be combined, such as simultaneously simulating gear wear and bearing failure, to generate data under complex fault conditions, providing rich material for subsequent comprehensive testing and training of fault diagnosis models.
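As a rough stand-in for a dynamics-based simulator, the sketch below generates a toy vibration trace in which each fault adds a tone whose amplitude scales with its severity. The signal model, the frequencies, and the function names are all hypothetical; a real workflow would derive these from the gearbox mechanical and dynamic model as described above.

```python
import math
import random
from typing import Dict, List


def simulate_vibration(faults: Dict[float, float],
                       sample_rate_hz: float = 2000.0,
                       n_samples: int = 2000,
                       seed: int = 0) -> List[float]:
    """Toy trace: 25 Hz base tone + one tone per fault + Gaussian noise.

    `faults` maps a hypothetical fault frequency in Hz to a severity in
    [0, 1]; several entries emulate a combination of faults.
    """
    if any(not 0.0 <= s <= 1.0 for s in faults.values()):
        raise ValueError("severity must lie in [0, 1]")
    rng = random.Random(seed)
    signal = []
    for k in range(n_samples):
        t = k / sample_rate_hz
        value = math.sin(2 * math.pi * 25.0 * t)
        for freq_hz, severity in faults.items():
            value += 2.0 * severity * math.sin(2 * math.pi * freq_hz * t)
        signal.append(value + rng.gauss(0.0, 0.05))
    return signal


def rms(samples: List[float]) -> float:
    """Root-mean-square amplitude, a common vibration severity feature."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))
```

Sweeping the severity from 0 to 1, or passing two fault frequencies at once, yields a family of traces ranging from slight to severe and from single to combined faults.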

[0078] (2) Data injection and operation status simulation. Accurate data injection: The generated fault simulation data is precisely injected into the digital twin model according to the correspondence between equipment units and sub-units. For example, for the transmission system of a wind turbine, the fault data simulating gear wear is input into the part of the digital twin model that represents the gearbox, so that the data corresponds to the actual equipment structure. This ensures the accuracy and reliability of the simulation results.

[0079] Simulating Fault Conditions: After injecting fault simulation data, the digital twin model simulates the operating status of wind farm equipment under different fault conditions. Because the digital twin model integrates physical and data-driven models, it can realistically reflect various changes in equipment under fault conditions. Through simulation, information such as changes in electrical parameters, stress on mechanical components, and temperature distribution under fault conditions can be obtained. These simulation results provide intuitive data support for studying the impact of faults on equipment.

[0080] (3) Fault database construction and management. Database Structure Design: The structure of the fault database should be comprehensive and reasonable. The Fault Type field records the specific type of fault, facilitating subsequent queries and statistical classification. The Fault Characteristic Parameter field records in detail the characteristic parameter values corresponding to each fault, providing crucial information for fault diagnosis. The Equipment Operating Parameter field records the overall operating parameters of the equipment at the time of the fault, helping to analyze the correlation between the fault and the equipment's operating status. The Fault Occurrence Time field marks the chronological order of fault occurrence, which is significant for studying the development patterns and trends of faults.

[0081] Data Entry and Management: Simulated fault data and equipment operating status data are entered line by line according to the designed database structure. After entry, a data index and query mechanism are established. The data index accelerates data retrieval; for example, indexes can be created based on fields such as fault type and fault occurrence time. The query mechanism allows users to quickly obtain the data they need based on different requirements. For example, maintenance personnel can input specific fault characteristic parameters to query historical fault cases that match them, providing a reference for actual fault diagnosis.
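The four fields and the two example indexes could be realized as in the following sketch, which uses Python's built-in `sqlite3` module. The table and column names, the JSON encoding of parameter values, and the in-memory database are assumptions made for illustration.

```python
import json
import sqlite3
from typing import Iterable, List, Tuple

# One row: (fault type, characteristic params JSON, operating params JSON, time)
Row = Tuple[str, str, str, str]

SCHEMA = """
CREATE TABLE fault_record (
    id                    INTEGER PRIMARY KEY,
    fault_type            TEXT NOT NULL,
    characteristic_params TEXT NOT NULL,  -- JSON-encoded parameter values
    operating_params      TEXT NOT NULL,  -- JSON-encoded operating state
    occurred_at           TEXT NOT NULL   -- ISO 8601 timestamp
);
CREATE INDEX idx_fault_type  ON fault_record (fault_type);
CREATE INDEX idx_occurred_at ON fault_record (occurred_at);
"""


def build_fault_db(rows: Iterable[Row]) -> sqlite3.Connection:
    """Create an in-memory fault database and enter the rows one by one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO fault_record "
        "(fault_type, characteristic_params, operating_params, occurred_at) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def query_by_type(conn: sqlite3.Connection, fault_type: str) -> List[tuple]:
    """Return records of one fault type in chronological order."""
    cur = conn.execute(
        "SELECT fault_type, characteristic_params, occurred_at "
        "FROM fault_record WHERE fault_type = ? ORDER BY occurred_at",
        (fault_type,),
    )
    return cur.fetchall()
```

Storing timestamps as ISO 8601 text keeps chronological ordering consistent with plain string ordering, which is why `ORDER BY occurred_at` works here.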

[0082] In this embodiment, step S5, when a wind farm equipment malfunctions, involves matching the fault location according to the fault database and marking the confidence level of the corresponding fault index in the fault database based on the accuracy of the fault location matching. Specifically: The system generates fault keywords, which are then used as query conditions for precise and fuzzy matching within a pre-built fault database. When wind farm equipment malfunctions, the system generates fault keywords based on real-time operating data, detected anomalies, and relevant alarm information. These keywords are concise summaries of the fault characteristics, such as "abnormal wind turbine blade vibration" or "gearbox oil temperature too high." These keywords are then used as query conditions for precise and fuzzy matching within the pre-built fault database. Precise matching quickly locates fault records that perfectly match the keywords, providing direct reference for diagnosis; fuzzy matching offers greater flexibility, finding records that partially match the keywords, avoiding the omission of potentially related fault cases, and thus comprehensively acquiring historical data on similar faults.
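The two matching modes can be sketched with the standard-library `difflib` module. The similarity measure and the `cutoff` of 0.6 are assumptions; the embodiment does not say how fuzzy matching is scored.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def match_keyword(keyword: str,
                  records: List[str],
                  cutoff: float = 0.6) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Return (exact matches, fuzzy matches sorted by descending similarity).

    `cutoff` is a hypothetical minimum similarity ratio for a fuzzy hit.
    """
    key = keyword.strip().lower()
    exact = [r for r in records if r.strip().lower() == key]
    fuzzy = []
    for record in records:
        if record in exact:
            continue
        score = SequenceMatcher(None, key, record.strip().lower()).ratio()
        if score >= cutoff:
            fuzzy.append((record, score))
    fuzzy.sort(key=lambda pair: pair[1], reverse=True)
    return exact, fuzzy
```

Exact matches give direct reference cases, while the scored fuzzy list surfaces related records that a strict comparison would miss.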

[0083] For matched fault records, the corresponding fault location information is extracted for automatic on-site fault data acquisition and verification. For fault records matched in the database, the corresponding fault location information is extracted, indicating the specific location where the fault might occur, such as a certain part of the wind turbine blade or a specific gear in the gearbox. Next, various sensors, detection equipment, and automated data acquisition systems installed at the wind farm equipment site are used for automatic on-site fault data acquisition and verification. These devices can collect relevant data from the fault site in real time, such as vibration data, temperature data, and image data, and compare and analyze them with the fault characteristics recorded in the database to determine the consistency between the actual fault situation and the matched records in the database.

[0084] The accuracy of the matching is determined based on the verification results, and the confidence level of the corresponding fault index in the fault database is marked according to the matching accuracy, where accuracy is directly proportional to confidence. The accuracy of the matching is evaluated based on the results of automatic on-site fault data collection and verification. If the actual fault characteristics are highly consistent with the fault characteristics of the matched records in the database, the matching accuracy is high; conversely, if there is a significant difference, the matching accuracy is low. Then, the confidence level of the corresponding fault index in the fault database is marked according to the matching accuracy. Since accuracy and confidence are directly proportional, the higher the matching accuracy, the higher the confidence level of the corresponding fault index. For example, if the accuracy of a match reaches 90%, the confidence level of the corresponding fault index is marked as high; if the accuracy is only 30%, the confidence level is marked as low. This confidence level marking provides an important reference for subsequent fault diagnosis, enabling maintenance personnel to prioritize fault indexes with high confidence when faced with multiple matching results, thereby improving the efficiency and accuracy of fault diagnosis.
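One possible way to turn verification results into a confidence mark is sketched below. It assumes the matching accuracy is the fraction of characteristic parameters whose on-site measurement falls within a relative tolerance of the recorded value, and it uses hypothetical label cutoffs of 0.8 and 0.5. The source only fixes the examples of 90% as high and 30% as low.

```python
from typing import Dict


def matching_accuracy(recorded: Dict[str, float],
                      measured: Dict[str, float],
                      rel_tol: float = 0.1) -> float:
    """Fraction of shared parameters measured within rel_tol of the record."""
    keys = recorded.keys() & measured.keys()
    if not keys:
        return 0.0
    hits = sum(
        1 for k in keys
        if abs(measured[k] - recorded[k]) <= rel_tol * abs(recorded[k])
    )
    return hits / len(keys)


def confidence_label(accuracy: float,
                     high: float = 0.8,
                     low: float = 0.5) -> str:
    """Confidence rises with accuracy; the cutoffs are hypothetical."""
    if accuracy >= high:
        return "high"
    if accuracy >= low:
        return "medium"
    return "low"
```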

[0085] In this embodiment, step S6 sets a confidence threshold based on matching probability statistics, dynamically adjusts the digital twin model parameters based on the confidence threshold, and after adjustment, uses the index matching scheme that meets the confidence threshold as the fault diagnosis scheme when wind farm equipment fails. Specifically: Collect historical fault data of wind farm equipment and corresponding fault database matching results, record the matching status of fault information with different fault indices in the database, count the number of times each fault index is matched, and calculate the matching probability of each fault index based on the number of matches and the total number of fault samples. That is, the historical fault data of wind farm equipment and the matching results of these faults in the fault database are collected, and the matching status of each piece of fault information with the different fault indices in the database is recorded in detail. For example, when a fault occurs, it is recorded whether it matches fault indices A, B, C, etc., and to what degree. The number of times each fault index is matched is then counted. Assuming there are a total of N fault samples and a certain fault index i is matched $n_i$ times, the matching probability $P_i$ of this fault index can be calculated through the formula $P_i = n_i / N$. The matching probability reflects how frequently the fault index appears in historical faults.

[0086] Based on the distribution of matching probabilities, the mean and standard deviation of the matching probabilities are calculated, and the mean plus the standard deviation is used as the confidence threshold. From the calculated matching probability distribution of all fault indices, the mean $\mu$ and standard deviation $\sigma$ of these matching probabilities are further calculated. The mean $\mu$ represents the average level of the matching probability, while the standard deviation $\sigma$ measures the dispersion of the matching probability relative to the mean. The mean plus the standard deviation, i.e. $\mu + \sigma$, is set as the confidence threshold. This threshold is based on statistical principles, takes into account the overall distribution of matching probabilities, and can reasonably screen for fault indices with relatively high matching probabilities and greater reliability.
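The matching probability and threshold computation can be sketched as follows. Two points are assumptions, since the source does not settle them: the population standard deviation (`statistics.pstdev`) is used rather than the sample form, and an index whose probability equals the threshold is treated as meeting it.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def matching_probabilities(match_counts: Dict[str, int],
                           total_samples: int) -> Dict[str, float]:
    """Matching probability of each fault index: match count / total samples."""
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    return {idx: count / total_samples for idx, count in match_counts.items()}


def confidence_threshold(probs: Dict[str, float]) -> float:
    """Mean plus (population) standard deviation of the probabilities."""
    values = list(probs.values())
    return mean(values) + pstdev(values)


def split_by_threshold(probs: Dict[str, float]) -> Tuple[float, List[str], List[str]]:
    """Return (threshold, indices meeting it, indices falling short)."""
    threshold = confidence_threshold(probs)
    meets = [idx for idx, p in probs.items() if p >= threshold]
    short = [idx for idx, p in probs.items() if p < threshold]
    return threshold, meets, short
```

For example, with 10 fault samples and match counts of 8, 2, and 2 for indices A, B, and C, the probabilities are 0.8, 0.2, and 0.2, the mean is 0.4, the population standard deviation is about 0.283, and only index A meets the threshold of about 0.683.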

[0087] The fault characteristics and equipment operating status data corresponding to fault indices that do not meet the confidence threshold are analyzed, and based on the analysis results, the parameters related to the fault characteristics in the digital twin model are iteratively adjusted. For fault indices that do not meet the confidence threshold, a comprehensive and in-depth study of their corresponding fault characteristics and equipment operating status data is required. Fault characteristics encompass abnormal manifestations in multimodal data such as structural health monitoring, environmental meteorology, electrical operation, image and video, and sound and acoustic data. For example, in structural health monitoring data, vibration amplitudes may far exceed the normal range and the stress and strain distribution may be abnormal; in electrical operation data, abnormal voltage fluctuations and sudden current changes may be fault characteristics. Equipment operating status data involves the overall operating parameters of the equipment at the time of the fault, such as wind turbine rotational speed and power output. Through detailed analysis of this data, the abnormal condition of the equipment at the time of the fault can be accurately grasped. The potential causes of these faults and the factors that cause deviations in the predictions of the digital twin model are then explored in depth. On the one hand, the fault may originate from the wear and aging of components caused by long-term operation of the equipment, such as fatigue cracks in wind turbine blades; it may also be caused by external environmental factors, such as the impact of extreme wind speeds on the equipment. On the other hand, inaccurate predictions by the digital twin model can stem from insufficient consideration of certain key physical processes during model construction, such as the simplification of complex interactions between components in the physical-driven model, or from the failure of the data-driven model to capture crucial information during feature extraction, such as a convolutional neural network missing subtle fault features in images. A detailed analysis of these factors provides a strong basis for subsequent adjustments to model parameters.

[0088] Based on the above analysis results, the parameters related to fault characteristics in the digital twin model are iteratively adjusted. For the physical-driven model, the corresponding physical parameters are optimized according to the cause of the fault. If the fault is found to be related to the friction coefficient of the gearbox, the friction coefficient of the gearbox in the physical-driven model is adjusted to better reflect reality. For the data-driven model, if certain feature extraction networks are found to be ineffective in processing specific fault data, the network structure or parameters can be adjusted. For example, increasing the number of convolutional layers in the convolutional neural network enhances its ability to extract image features; or adjusting the parameters of the gating units in the recurrent neural network optimizes its processing effect on time series data. Through multiple iterative adjustments, the simulation and prediction capabilities of the digital twin model for faults are continuously improved, thereby enhancing the accuracy of fault diagnosis.

[0089] Fault indices with confidence levels meeting a set threshold are selected, and the matching schemes corresponding to these indices are used as fault diagnosis schemes. These matching schemes have been validated and screened using historical data, demonstrating high reliability. When wind farm equipment experiences a fault again, these validated diagnostic schemes can be directly referenced for quick and accurate fault diagnosis and handling.

[0090] As shown in Figure 2, the present invention first collects multimodal data from multiple equipment units and their sub-units during normal operation of wind farm equipment, covering aspects such as structural health monitoring, environmental meteorology, and electrical operation. After preprocessing such as noise reduction and feature extraction, this data provides high-quality data support for subsequent analysis, ensuring that the data accurately reflects the operating status of the equipment.

[0091] Digital twin model construction: Based on the operating principles of equipment units and sub-units and multimodal data, a physical-driven model and a data-driven model are constructed respectively. The physical-driven model is based on the principles of mechanical dynamics, thermodynamics, and electrical control to simulate the physical processes of equipment operation; the data-driven model adopts a hybrid neural network model, selecting an appropriate feature extraction network according to the data type. Then, the state results of the physical-driven model are input into the data-driven model, and the abnormal patterns identified by the data-driven model are used as the boundary conditions of the physical-driven model. The residuals of the two models are calculated and fused. The fusion weights are dynamically adjusted through independent test datasets to construct the digital twin model.
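One simple reading of the weighted fusion is sketched below: the two model outputs are combined with a single weight `w`, and `w` is chosen on an independent test set by minimizing the root-mean-square residual. The convex combination, the grid search, and the choice of RMSE as the evaluation metric are assumptions; the source does not give the fusion formula.

```python
import math
from typing import List, Sequence


def fuse(physical: Sequence[float], data_driven: Sequence[float], w: float) -> List[float]:
    """Convex combination: w weights the physical model, 1 - w the data model."""
    return [w * p + (1.0 - w) * d for p, d in zip(physical, data_driven)]


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square residual between predictions and observations."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )


def tune_weight(physical: Sequence[float],
                data_driven: Sequence[float],
                observed: Sequence[float],
                steps: int = 20) -> float:
    """Pick the weight on a grid over [0, 1] with the lowest test-set RMSE."""
    candidates = [k / steps for k in range(steps + 1)]
    return min(candidates, key=lambda w: rmse(fuse(physical, data_driven, w), observed))
```

Re-running `tune_weight` whenever a new test batch arrives is one way to keep the fusion weight adjusted dynamically.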

[0092] Real-time operational data of wind farm equipment and external environmental data are collected to define the state space for reinforcement learning. Physical parameters, network weights, and biases from the digital twin model are used as the action space. A Deep Q-Network (DQN) algorithm is selected, and the reward function is determined based on the error between the digital twin model's predictions and actual observations. At each time period, actions are selected based on the current state to adjust the digital twin model parameters. The reward value is calculated based on the prediction results and actual observations and stored in an experience replay pool. When the pool reaches a certain amount of data, samples are taken from it to train the algorithm, updating the policy network and value network, enabling the digital twin model to track equipment state changes in real time.

[0093] Based on historical fault data, design documents, and fault mechanism analysis of wind farm equipment, common fault types and their characteristic parameters are identified. Fault simulation software is used to generate simulation data for different fault degrees and combinations. This data is then injected into a digital twin model to simulate the operating state of equipment under fault conditions, acquiring data such as changes in electrical parameters and stress on mechanical components. A fault database is designed, including fields such as fault type, characteristic parameters, equipment operating parameters, and fault occurrence time. Simulation data is entered, and an indexing and query mechanism is established. When equipment fails, fault keywords are generated based on real-time operating data. Precise and fuzzy matching is performed in the database to extract fault location information, which is then verified on-site. The confidence level of the fault index is marked based on the verification results.

[0094] Historical fault data and database matching results of wind farm equipment are collected. The number of fault index matches is counted, the matching probability is calculated, and a confidence threshold is determined based on the matching probability distribution. Data related to fault indices that do not meet the threshold are analyzed to identify fault causes and model prediction bias factors, and the digital twin model parameters are iteratively adjusted. Fault index matching schemes that meet the threshold are selected as fault diagnosis schemes to guide fault diagnosis and maintenance of wind farm equipment.

[0095] The above description represents the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. A method for fault diagnosis of wind farm equipment based on neural networks and digital twins, characterized in that: multimodal data of N equipment units are collected during normal operation of the wind farm equipment, multimodal data of the sub-units of each equipment unit are then collected separately, and the multimodal data are preprocessed, where N, M, and x are positive integers; based on the operating principles of the equipment units and sub-units of the wind farm and on the multimodal data, a physical-driven model and a data-driven model are constructed respectively, and the physical-driven model and the data-driven model are hybrid modeled to construct a digital twin model; external environmental data and real-time operation data of wind farm equipment are collected at various time periods, and the parameters of the digital twin model are dynamically adjusted using reinforcement learning; the fault simulation data of the equipment units and sub-units are injected into the digital twin model, the operating state of the equipment under different fault simulation data is simulated, and a fault database is constructed; when wind farm equipment fails, the fault location is matched according to the fault database, and the confidence level of the corresponding fault index in the fault database is marked based on the accuracy of the fault location matching; a confidence threshold is set based on matching probability statistics, the parameters of the digital twin model are dynamically and iteratively adjusted based on the confidence threshold, and after adjustment, the index matching scheme that meets the confidence threshold is used as the fault diagnosis scheme when the wind farm equipment fails.

2. The wind farm equipment fault diagnosis method based on neural network and digital twinning according to claim 1, characterized in that, The multimodal data includes structural health monitoring data, including but not limited to vibration, stress, strain, and temperature; Environmental and meteorological data, including but not limited to wind speed, wind direction, temperature, humidity, and air pressure; Electrical and operational status data, including but not limited to voltage, current, power, speed, and angle; image and video data, including but not limited to visible light images, infrared imaging, and laser point clouds; sound and acoustic data, including but not limited to sound and ultrasound; maintenance and log data, including but not limited to maintenance records and SCADA system logs; the preprocessing of the multimodal data includes, but is not limited to, any combination of noise reduction, feature extraction, outlier correction, time alignment, data standardization, image enhancement, and structured processing.

3. The wind farm equipment fault diagnosis method based on neural networks and digital twins according to claim 1, characterized in that, the physical-driven model is constructed based on the principles of mechanical dynamics, thermodynamics, and electrical control of the corresponding unit or sub-unit; the data-driven model is constructed using a hybrid neural network model, selecting corresponding feature extraction networks for different data types, including convolutional neural networks (CNN), recurrent neural networks (RNN) and their variants LSTM and GRU, and support vector machines (SVM).

4. The wind farm equipment fault diagnosis method based on neural networks and digital twins according to claim 1, characterized in that, the hybrid modeling of the physical-driven model and the data-driven model to construct a digital twin model specifically involves: the state results calculated by the physical-driven model are used as the input data for the data-driven model, and the abnormal patterns identified by the data-driven model are used as the boundary conditions of the physical-driven model; the residuals of the physical-driven model and the data-driven model are calculated, and weights are assigned to the residuals to fuse the physical-driven model and the data-driven model; the prediction results of the hybrid model are evaluated using an independent test dataset, evaluation metrics are calculated, and the fusion weights are dynamically adjusted based on the evaluation metrics.

5. The wind farm equipment fault diagnosis method based on neural networks and digital twins according to claim 4, characterized in that, The hybrid neural network model employs a fusion of parallel channel construction and attention mechanisms, specifically: Multimodal data is preprocessed in different ways and then input into different branches, and feature alignment is performed. The output features of each branch are then subjected to dimensional transformation or time step processing. The attention weights are normalized to the [0, 1] interval using the Softmax function, and the features of each parallel channel are multiplied by their corresponding attention weights and then summed. After fusing the branches and attention mechanism modules built in parallel channels into a whole model, the loss is calculated and all network layer parameters are updated through backpropagation algorithm to balance the learning speed of each branch. An auxiliary loss function is added to each parallel channel branch. Each branch is pre-trained separately, and then the branches are combined and added to the attention mechanism fusion module to fine-tune the entire model.

6. The wind farm equipment fault diagnosis method based on neural networks and digital twins according to claim 4, characterized in that, the hybrid neural network model is optimized using lightweight techniques, including pruning and quantization, wherein: (1) Pruning optimization includes: determining the pruning criterion, in which the absolute value of each neuron connection weight is calculated as the weight magnitude, a threshold is set, and connections whose absolute weights are less than the threshold are pruned; performing the pruning operation using structured pruning, removing parts that contribute little to model performance on a per-neuron, per-kernel, or per-channel basis, and adjusting the number of input channels in subsequent layers; after pruning, retraining the model, adjusting the weights of the remaining parameters using the training dataset with a reduced learning rate, and monitoring the model's performance metrics on the validation set. (2) Quantization optimization includes: using uniform quantization to map continuous floating-point values to a finite number of discrete integer values, and setting the quantization range and step size; performing the quantization operation, in which, after model training is completed, the weights are quantized into integers according to the selected scheme and stored; after quantization, fine-tuning the model by adjusting the model parameters using training data to compensate for quantization errors, and monitoring the performance metrics on the validation set.

7. The wind farm equipment fault diagnosis method based on neural networks and digital twins according to claim 1, characterized in that, The external environmental data and real-time operational data of the wind farm equipment are collected at various time periods, and the parameters of the digital twin model are dynamically adjusted using reinforcement learning. Specifically: Define a state space by using preprocessed wind farm equipment operation data and external environment data as state variables; Define the action space, which is set as the set of adjustable parameters in the digital twin model, including physical parameters in the physics-driven model and network weights and biases in the data-driven model. Initialize the reinforcement learning algorithm, select the Deep Q-Network (DQN) algorithm and set the hyperparameters of the algorithm, and determine the reward function based on the error between the predicted value and the actual observed value of the digital twin model; In each time period, the current state is input into the reinforcement learning algorithm, an action is selected according to the current policy, the adjustment value of the digital twin model parameters is determined, and the digital twin model with adjusted parameters is used to predict the operating status of wind farm equipment. The reward value is calculated based on the prediction results and the actual observation values. The current state, action, reward and the next state are stored in the experience replay pool. When the data in the experience replay pool reaches a certain amount, a batch of data is randomly sampled from the pool to train the reinforcement learning algorithm and update the algorithm's policy network and value network.

8. The wind farm equipment fault diagnosis method based on neural networks and digital twins according to claim 1, characterized in that, the injecting of the fault simulation data of each equipment unit and each sub-unit into the digital twin model, the simulating of the operating state of the equipment under different fault simulation data, and the constructing of a fault database are specifically as follows: fault simulation data generation: common fault types and their corresponding fault characteristic parameters are determined based on historical fault data of each unit and sub-unit of wind farm equipment, equipment design documents, and fault mechanism analysis, and fault simulation software is used to generate fault simulation data of different fault degrees and combinations based on the determined fault characteristic parameters; data injection and operational status simulation: the generated fault simulation data is accurately injected into the digital twin model according to the correspondence between equipment units and sub-units, and the digital twin model after injection of the fault simulation data is used to simulate the operational status of wind farm equipment under different fault conditions; the structure of the fault database is designed, including the field settings of the data tables, which include a fault type field, a fault characteristic parameter field, an equipment operating parameter field, and a fault occurrence time field; the simulated fault data and equipment operating status data are entered into the fault database one by one according to the designed database structure, and a data index and query mechanism are established.

9. The wind farm equipment fault diagnosis method based on neural networks and digital twins according to claim 1, characterized in that, when wind farm equipment fails, the fault location is matched according to the fault database, and the confidence level of the corresponding fault index in the fault database is marked based on the accuracy of the fault location matching, specifically: fault keywords are generated and used as query conditions to perform exact and fuzzy matching in the existing fault database; for the matched fault records, the corresponding fault location information is extracted, and automatic on-site fault data collection and verification are performed; the matching accuracy is determined based on the verification results, and the confidence level of the corresponding fault index in the fault database is marked according to the matching accuracy, the accuracy being directly proportional to the confidence level.

10. The wind farm equipment fault diagnosis method based on neural networks and digital twins according to claim 1, characterized in that, The process involves setting a confidence threshold based on matching probability statistics, dynamically adjusting the digital twin model parameters according to the confidence threshold, and then using the index matching scheme that meets the confidence threshold as the fault diagnosis scheme when wind farm equipment fails. Specifically: Collect historical fault data of wind farm equipment and corresponding fault database matching results, record the matching situation of fault information with different fault indexes in the database, count the number of times each fault index is matched, and calculate the matching probability of each fault index based on the number of matching and the total number of fault samples. Based on the distribution of matching probabilities, calculate the mean and standard deviation of the matching probabilities, and use the mean plus the standard deviation as the confidence threshold. Analyze the fault characteristics and equipment operating status data corresponding to fault indices that do not meet the confidence threshold, and iteratively adjust the parameters related to the fault characteristics in the digital twin model based on the analysis results; Fault indices that meet the set confidence threshold are selected, and the matching schemes corresponding to the fault indices that meet the set threshold are used as fault diagnosis schemes.