Wind turbine independent pitch control method and system based on deep learning

The independent pitch control method for wind turbines, which utilizes deep learning and multi-source state observation and residual reinforcement learning, generates precise pitch commands. This solves the model mismatch problem of traditional controllers under complex wind conditions, achieves accurate suppression of asymmetric loads, and improves the reliability and safety of wind turbines.

CN122190993A · Pending · Publication Date: 2026-06-12 · PENGLAI WIND POWER BRANCH OF HUANENG SHANDONG POWER GENERATION CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
PENGLAI WIND POWER BRANCH OF HUANENG SHANDONG POWER GENERATION CO LTD
Filing Date
2026-03-03
Publication Date
2026-06-12

Abstract

The embodiments of the application provide a deep learning-based independent pitch control method and system for wind turbines. By observing the system state and the control intent of the baseline controller in real time, the method learns and identifies the model mismatch error that arises because the simplified internal model of a traditional controller differs from the real physical world. When the wind turbine encounters complex operating conditions such as wake flow, a residual correction command is generated to compensate for the deficiency of the baseline command. Finally, the stable baseline command is fused with the residual correction command to form a final control command that is both stable and adaptive to operating conditions, thereby solving the technical problem of control performance degradation or even failure caused by model mismatch.
Description

Technical Field

[0001] This invention relates to the field of new energy wind power generation technology, specifically to a method and system for independent pitch control of wind turbines based on deep learning.

Background Technology

[0002] During operation, the rotating blades of a wind turbine experience uneven, asymmetric aerodynamic loads due to complex wind conditions such as wind shear, tower shadow, turbulence, and wake. These periodically varying loads are a major cause of fatigue damage to critical components such as the blades, hub, main shaft, and tower, and they shorten the turbine's design life. Independent pitch control technology emerged to suppress these asymmetric loads. It actively counteracts the uneven loads acting on the blades by adjusting the pitch angle of each blade independently and differentially, and has thus become a key technology for improving the reliability and economy of large wind turbines.

[0003] Currently, most existing independent pitch control methods are based on traditional feedback control theory, such as proportional-integral controllers. These controllers rely on a pre-established, simplified wind turbine dynamics model that describes the relationship between pitch angle changes and aerodynamic load response. Under ideal, stable inflow conditions, such a fixed-model controller can achieve a certain load suppression effect. However, the actual operating environment of wind turbines is extremely complex and variable, especially in wind farm clusters, where downstream turbines often operate in the wake region generated by upstream turbines. The wake is characterized by a wind speed deficit and a sharp rise in turbulence intensity, so the downstream turbine blades experience a highly nonlinear and time-varying aerodynamic environment. Under these complex conditions, a serious model mismatch arises between the fixed, simplified linear model within the traditional controller and the actual, nonlinear dynamic response of the wind turbine. This mismatch prevents the controller from accurately predicting the actual load reduction effect produced by its pitch actions. The calculated control commands may be suboptimal or even harmful, not only significantly reducing the load suppression effect but also potentially causing oscillations in the control system, exacerbating fatigue damage to key components, and threatening the safe and stable operation of the turbine.

[0004] Therefore, an optimized independent pitch control scheme for wind turbines is desired.

Summary of the Invention

[0005] The present invention aims to at least solve one of the technical problems existing in the prior art, and provides a method and system for independent pitch control of wind turbines based on deep learning.

[0006] In a first aspect, embodiments of the present invention provide a deep learning-based independent pitch control method for wind turbines, comprising: performing multi-source state observation processing on real-time sensor data of the wind turbine to obtain a system state vector, wherein the real-time sensor data includes flapping torque, rotor azimuth angle and rotor speed; performing baseline control command calculation on the system state vector to obtain a baseline independent pitch command; performing residual reinforcement learning-based inference processing on the system state vector and the baseline independent pitch command to obtain a residual correction command; fusing the collective pitch command provided by the wind turbine main controller, the baseline independent pitch command, and the residual correction command to obtain an unrestricted final pitch command; applying a safety envelope constraint to the unrestricted final pitch command to obtain a restricted final pitch command; and issuing the restricted final pitch command to the pitch actuator for control execution.

[0007] In the above-mentioned deep learning-based independent pitch control method for wind turbines, processing the real-time sensor data of the wind turbine by multi-source state observation to obtain the system state vector includes: performing a Park transformation on the flapping torque in the real-time sensor data to obtain the tilting torque and yaw torque; and normalizing the tilting torque, yaw torque, rotor speed and rotor azimuth angle into the system state vector.

[0008] In the aforementioned deep learning-based independent pitch control method for wind turbines, performing baseline control command calculation on the system state vector to obtain the baseline independent pitch command includes: processing the tilting torque and yaw torque in the system state vector with a proportional-integral controller to obtain control commands in the non-rotating coordinate system; and performing an inverse Park transformation, based on the real-time rotor azimuth angle, on the control commands in the non-rotating coordinate system to obtain the baseline independent pitch command.

[0009] In the aforementioned deep learning-based wind turbine independent pitch control method, the system state vector and the baseline independent pitch command are subjected to residual reinforcement learning-based inference processing to obtain the residual correction command. This includes: concatenating the system state vector and the baseline independent pitch command to obtain the state-control intent fusion enhancement vector; and passing the state-control intent fusion enhancement vector through a pre-trained Actor network to obtain the residual correction command.

[0010] In the aforementioned deep learning-based independent pitch control method for wind turbines, the training steps of the Actor network include: constructing a reward function, with minimizing the standard deviation of the flapping torque as the primary reward term and minimizing the sum of squares of the residual correction commands as the secondary penalty term, to obtain a reward signal; storing experience tuples containing the state-control intent fusion enhancement vector, the residual correction command, the reward signal, and the state-control intent fusion enhancement vector at the next time step in an experience replay pool to construct a training dataset; and sampling training data from the experience replay pool and using the Actor-Critic algorithm to iteratively update the policy network and value network to obtain the Actor network.

[0011] In the aforementioned deep learning-based wind turbine independent pitch control method, fusing the collective pitch command provided by the wind turbine main controller, the baseline independent pitch command, and the residual correction command to obtain the unrestricted final pitch command includes: adding the collective pitch command, the baseline independent pitch command, and the residual correction command element by element to obtain the unrestricted final pitch command.

[0012] In the aforementioned deep learning-based independent pitch control method for wind turbines, the unrestricted final pitch command is constrained by a safety envelope to obtain a restricted final pitch command. This includes: limiting the amplitude and pitch angle change rate of the unrestricted final pitch command to obtain a restricted final pitch command.

[0013] In a second aspect, embodiments of the present invention provide a deep learning-based independent pitch control system for wind turbines, comprising: a multi-source state observation and processing module, used to perform multi-source state observation processing on the real-time sensor data of the wind turbine to obtain the system state vector, wherein the real-time sensor data includes flapping torque, rotor azimuth angle and rotor speed; a baseline control command calculation module, used to perform baseline control command calculation on the system state vector to obtain the baseline independent pitch command; a residual reinforcement learning inference module, used to perform residual reinforcement learning-based inference processing on the system state vector and the baseline independent pitch command to obtain the residual correction command; a command fusion module, used to fuse the collective pitch command provided by the wind turbine main controller, the baseline independent pitch command, and the residual correction command to obtain the unrestricted final pitch command; a safety envelope constraint module, used to apply a safety envelope constraint to the unrestricted final pitch command to obtain the restricted final pitch command; and a command issuing module, used to issue the restricted final pitch command to the pitch actuator for control execution.

[0014] Compared with existing technologies, this invention proposes a deep learning-based independent pitch control method for wind turbines. By observing the system state and the control intent of the baseline controller in real time, it can learn in a data-driven manner and accurately identify model mismatch errors caused by the simplification of the internal model of traditional controllers compared to the real physical world. When the wind turbine encounters complex operating conditions such as wake turbulence that lead to model mismatch, it can generate an accurate residual correction command to compensate for the deficiencies of the baseline command. Finally, by fusing the stable baseline command with the intelligent residual correction command, a final control command that is both stable and highly adaptable to operating conditions is formed, thereby solving the technical problem of control performance degradation or even failure caused by model mismatch.

Description of the Drawings

[0015] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0016] Figure 1 is a flowchart of a deep learning-based independent pitch control method for wind turbines according to an embodiment of the present invention.
Figure 2 is a schematic diagram of the data flow of the deep learning-based independent pitch control method for wind turbines according to an embodiment of the present invention.
Figure 3 is a flowchart of obtaining the residual correction command by performing residual reinforcement learning-based inference processing on the system state vector and the baseline independent pitch command in the method according to an embodiment of the present invention.
Figure 4 is a flowchart of training the Actor network in the method according to an embodiment of the present invention.
Figure 5 is a block diagram of a deep learning-based independent pitch control system for wind turbines according to an embodiment of the present invention.

Detailed Implementation

[0017] To enable those skilled in the art to better understand the technical solutions of the present invention, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the described embodiments of the present invention without creative effort are within the scope of protection of the present invention.

[0018] Unless otherwise specifically stated, the technical or scientific terms used in the embodiments of this invention should be understood in their ordinary meaning as understood by one of ordinary skill in the art to which this invention pertains. The terms "comprising" or "including," as used in the embodiments of this invention, do not limit the shapes, numbers, steps, actions, operations, components, elements, and / or groups thereof mentioned, nor do they exclude the appearance or addition of one or more other different shapes, numbers, steps, actions, operations, components, elements, and / or groups thereof, or the inclusion of these.

[0019] Unless otherwise specifically stated, the relative arrangement, numerical expressions, and values of the components and steps described in these embodiments do not limit the scope of the invention. It should also be understood that, for ease of description, the dimensions of the various parts shown in the drawings are not drawn to actual scale, and techniques, methods, and apparatus known to those skilled in the art may not be discussed in detail; however, where appropriate, the illustrated techniques, methods, and apparatus should be considered part of the specification. In all the examples shown and discussed herein, any other specific example may have different values. It should be noted that similar symbols and letters in the following figures denote similar items; therefore, once an item is defined in one figure, it need not be further discussed in subsequent figures.

[0020] In the description of the embodiments of the present invention, the terms "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., refer to specific features, structures, materials, or characteristics described in connection with that embodiment or example, which are included in at least one embodiment or example of the present invention. In the embodiments of the present invention, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. Furthermore, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in the embodiments of the present invention, as well as the features of different embodiments or examples.

[0021] Hereinafter, exemplary embodiments according to the present invention will be described in detail with reference to the accompanying drawings. Obviously, the described embodiments are merely some embodiments of the present invention, and not all embodiments of the present invention. It should be understood that the present invention is not limited to the exemplary embodiments described herein.

[0022] Existing independent pitch control methods for wind turbines mostly rely on simplified fixed models. When the turbine encounters complex conditions such as wake flow, a severe mismatch occurs between this model and the actual physical dynamics. This model mismatch prevents traditional controllers from calculating accurate commands, significantly reducing load suppression effectiveness and potentially threatening the turbine's operational safety. Therefore, this invention proposes a deep learning-based independent pitch control method for wind turbines. Specifically, it first processes real-time sensor data from the wind turbine through multi-source state observation to obtain a system state vector that describes the current turbine dynamics. A traditional baseline controller then calculates a set of baseline independent pitch commands from this state vector to ensure system stability. Next, a correction module based on residual reinforcement learning receives both the system state vector and this set of baseline commands. Through inference with a pre-trained deep learning policy network, it outputs a set of residual correction commands that compensate for errors in the baseline controller caused by model mismatch. Subsequently, the collective pitch command provided by the wind turbine main controller, the baseline command, and the residual correction command output by the agent are arithmetically fused and subjected to strict safety envelope constraints, such as limits on amplitude and rate of change, to generate the final executable commands issued to each blade pitch actuator. This hybrid control architecture allows the control system to retain the stability and reliability of traditional control while gaining adaptive and learning capabilities through deep learning. This enables it to overcome the technical problem of model mismatch and achieve accurate and robust suppression of asymmetric loads under complex and variable actual operating conditions.

[0023] Figure 1 is a flowchart of a deep learning-based independent pitch control method for wind turbines according to an embodiment of the present invention. Figure 2 is a schematic diagram of the data flow of the method. As shown in Figure 1 and Figure 2, the deep learning-based independent pitch control method for wind turbines according to an embodiment of the present invention includes the following steps: S100, performing multi-source state observation processing on real-time sensor data of the wind turbine to obtain a system state vector, wherein the real-time sensor data includes flapping torque, rotor azimuth angle and rotor speed; S200, performing baseline control command calculation on the system state vector to obtain a baseline independent pitch command; S300, performing residual reinforcement learning-based inference processing on the system state vector and the baseline independent pitch command to obtain a residual correction command; S400, performing command fusion on the collective pitch command provided by the wind turbine main controller, the baseline independent pitch command and the residual correction command to obtain an unrestricted final pitch command; S500, applying a safety envelope constraint to the unrestricted final pitch command to obtain a restricted final pitch command; S600, sending the restricted final pitch command to the pitch actuator for control execution.
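Steps S400 and S500 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the treatment of the collective pitch command as a single scalar added to every blade, the units (degrees, degrees per second), and all limit values are assumptions made for illustration only.

```python
def fuse_commands(collective, baseline, residual):
    """S400 sketch: per-blade sum of collective, baseline and residual commands.

    Assumption: the collective command is one scalar shared by all blades.
    """
    return [collective + b + r for b, r in zip(baseline, residual)]


def apply_safety_envelope(command, previous, min_angle, max_angle, max_rate, dt):
    """S500 sketch: limit the pitch rate first, then the pitch amplitude.

    `previous` holds the restricted commands issued at the last control step.
    All limits are hypothetical placeholders, not values from the source.
    """
    max_step = max_rate * dt  # largest allowed change within one control step
    restricted = []
    for cmd, prev in zip(command, previous):
        step = max(-max_step, min(max_step, cmd - prev))  # rate limit
        restricted.append(max(min_angle, min(max_angle, prev + step)))  # amplitude limit
    return restricted
```

In use, the output of `fuse_commands` would be passed to `apply_safety_envelope` together with the commands issued in the previous control cycle, and the result sent to the pitch actuators (S600).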

[0024] Specifically, in step S100, the real-time sensor data of the wind turbine is processed through multi-source state observation to obtain a system state vector. The real-time sensor data includes flapping torque, rotor azimuth angle, and rotor speed. It should be understood that because the raw data such as flapping torque measured by the sensors of each blade of the wind turbine are in a rotating coordinate system, their values fluctuate drastically and periodically with blade rotation, and the data sources are scattered, making it difficult to directly and stably characterize the overall state of the asymmetric load acting on the entire hub. Therefore, in the technical solution of this invention, the real-time sensor data of the wind turbine is processed through multi-source state observation to obtain a system state vector. This converts the dynamic measurements from multiple sources in a rotating coordinate system into a standardized state vector in a fixed coordinate system that comprehensively describes the key dynamic characteristics of the turbine. This provides a concentrated, physically meaningful, and stable input for subsequent baseline control calculations and residual reinforcement learning inference, thus laying the foundation for achieving accurate and robust adaptive control.

[0025] More specifically, in a specific example of the present invention, processing the real-time sensor data of the wind turbine by multi-source state observation to obtain a system state vector includes: performing a Park transformation on the flapping torque in the real-time sensor data to obtain the tilting torque and yaw torque; and normalizing the tilting torque, yaw torque, rotor speed and rotor azimuth angle into a system state vector.

[0026] More specifically, multi-sensor data streams are first acquired in real time, including the flapping torque at the root of each of the three blades, the real-time rotational speed of the rotor, and the current rotor azimuth angle. The three flapping torque signals, which vary strongly over time, are then processed using the Park transformation. Physically, this transformation projects the three coupled torque components in the rotating coordinate system onto a fixed coordinate system, yielding two physical quantities that directly reflect the overall load state of the hub: the tilting torque and the yaw torque. Finally, the four key parameters, namely the calculated tilting torque and yaw torque together with the acquired rotor speed and rotor azimuth angle, are arranged in a predetermined order into a numerical vector. This vector is the system state vector, which can be used directly for subsequent control calculations.
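The projection described above can be sketched in Python as follows. The source does not give the transformation formula, so this sketch assumes one common convention for the Park (multi-blade coordinate) transformation: three blades spaced 120 degrees apart, azimuth measured for blade 1, and a 2/3 scaling factor. The function names and the element order of the state vector are likewise illustrative assumptions, and no scaling of the vector elements is applied here.

```python
import math


def park_transform(flap_moments, azimuth):
    """Project three blade-root flapping moments onto fixed-frame tilt and yaw axes.

    Assumed convention: blade i sits at azimuth + i * 120 degrees (radians used here).
    """
    tilt = 0.0
    yaw = 0.0
    for i, moment in enumerate(flap_moments):
        psi = azimuth + 2.0 * math.pi * i / 3.0  # azimuth of blade i
        tilt += moment * math.cos(psi)
        yaw += moment * math.sin(psi)
    return (2.0 / 3.0) * tilt, (2.0 / 3.0) * yaw


def build_state_vector(flap_moments, azimuth, rotor_speed):
    """Assemble [tilt, yaw, rotor_speed, azimuth] in a fixed, assumed order."""
    tilt, yaw = park_transform(flap_moments, azimuth)
    return [tilt, yaw, rotor_speed, azimuth]
```

With this convention, three equal blade moments give zero tilt and yaw, which reflects the fact that a symmetric load carries no asymmetric component for independent pitch control to remove.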

[0027] Specifically, in step S200, the system state vector is used to calculate baseline control commands to obtain baseline independent pitch commands. It should be understood that since the tilting moment and yaw moment in the system state vector obtained in the previous step directly quantify the asymmetric loads that need to be suppressed, a stable and reliable control law is required to respond to them. Therefore, in the technical solution of this invention, the system state vector is further used to calculate baseline independent pitch commands, thereby utilizing mature traditional feedback control methods to calculate a pitch angle adjustment that can ensure the basic stability of the system and provide the main control effect. In this way, a set of reliable independent pitch commands can be generated as the basis for subsequent intelligent correction, ensuring the stability and engineering practicality of the entire hybrid control architecture.

[0028] More specifically, in a specific example of the present invention, performing baseline control command calculation on the system state vector to obtain the baseline independent pitch command includes: processing the tilting torque and yaw torque in the system state vector with a proportional-integral controller to obtain the control command in the non-rotating coordinate system; and performing an inverse Park transformation, based on the real-time rotor azimuth angle, on the control command in the non-rotating coordinate system to obtain the baseline independent pitch command.

[0029] More specifically, the tilting torque and yaw torque components are extracted from the input system state vector. These two components are compared with their control target value, zero, to generate two independent error signals. These error signals are then input to two independent proportional-integral (PI) controllers. The proportional and integral terms yield a set of control commands, expressed in the non-rotating coordinate system, that compensate for these two errors. Finally, to convert this set of commands in the fixed coordinate system into commands executable by each individual blade, an inverse Park transformation is performed on them based on the real-time rotor azimuth angle obtained from the system state vector. The output of this transformation is three pitch angle adjustments, one for each blade, which together constitute the baseline independent pitch command.
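The baseline calculation can be sketched as two discrete PI controllers followed by an inverse Park transformation. This is an illustrative sketch only: the class and function names, the rectangular integration, the absence of anti-windup, the state vector order [tilt, yaw, rotor_speed, azimuth], and the gain values a user would pass in are all assumptions. The sign and magnitude of the gains depend on the turbine and are not given in the source.

```python
import math


class PIController:
    """Discrete proportional-integral controller (rectangular integration, no anti-windup)."""

    def __init__(self, kp, ki, dt):
        self.kp = kp
        self.ki = ki
        self.dt = dt
        self.integral = 0.0

    def step(self, error):
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral


def inverse_park(theta_tilt, theta_yaw, azimuth):
    """Map fixed-frame commands back to three per-blade pitch adjustments."""
    commands = []
    for i in range(3):
        psi = azimuth + 2.0 * math.pi * i / 3.0  # assumed 120-degree blade spacing
        commands.append(theta_tilt * math.cos(psi) + theta_yaw * math.sin(psi))
    return commands


def baseline_ipc(state, pi_tilt, pi_yaw):
    """Compute baseline per-blade commands from [tilt, yaw, rotor_speed, azimuth]."""
    tilt, yaw, _rotor_speed, azimuth = state
    theta_tilt = pi_tilt.step(0.0 - tilt)  # control target for the tilting torque is zero
    theta_yaw = pi_yaw.step(0.0 - yaw)     # control target for the yaw torque is zero
    return inverse_park(theta_tilt, theta_yaw, azimuth)
```

Because the three blade angles are 120 degrees apart, the three outputs of `inverse_park` always sum to zero, so in this sketch the baseline command redistributes pitch between blades without shifting the collective pitch angle.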

[0030] Specifically, in step S300, the system state vector and the baseline independent pitch command are subjected to residual reinforcement learning-based inference processing to obtain a residual correction command. It should be understood that since the aforementioned baseline independent pitch command is calculated based on a fixed, simplified linear model, it inherently suffers from model mismatch when facing the complex nonlinear dynamics and unmodeled disturbances in actual wind turbine operation, resulting in limited control accuracy and performance. Therefore, in the technical solution of this invention, the system state vector and the baseline independent pitch command are further subjected to residual reinforcement learning-based inference processing to obtain a residual correction command. This utilizes the powerful nonlinear fitting and data-driven learning capabilities of deep learning models to identify and compensate for performance deviations of the baseline controller online. This generates a precise and adaptive correction command specific to the current operating condition, effectively overcoming the model mismatch defects of traditional control methods and significantly improving the robustness and overall performance of the control system.

[0031] Figure 3 is a flowchart of obtaining the residual correction command by performing residual reinforcement learning-based inference processing on the system state vector and the baseline independent pitch command according to an embodiment of the present invention. As shown in Figure 3, step S300 includes: S310, concatenating the system state vector and the baseline independent pitch command to obtain the state-control intent fusion enhancement vector; S320, passing the state-control intent fusion enhancement vector through the pre-trained Actor network to obtain the residual correction command.

[0032] In step S310, the system state vector and the baseline independent pitch command are concatenated to obtain the state-control intent fusion enhancement vector. It should be understood that if only the system state vector is used as input for reinforcement learning inference, the algorithm would need to learn a complete and complex independent pitch control law from scratch. This is not only difficult to learn and inefficient in training, but also fails to fully utilize the stability and prior knowledge of existing mature baseline controllers. Therefore, in the technical solution of this invention, the system state vector and the baseline independent pitch command are further concatenated to obtain the state-control intent fusion enhancement vector. This provides the subsequent inference network with a richer information dimension and a more explicit context, transforming the learning task from solving for the absolutely optimal control quantity to solving for the optimal correction quantity to the baseline control quantity. This significantly reduces the exploration and learning difficulty of the reinforcement learning algorithm, allowing it to focus on the more specific and convergent goal of fitting the model mismatch error of the baseline controller, thereby accelerating the training process and improving the accuracy and stability of the final control strategy.

[0033] More specifically, in a concrete example of the present invention, two numerical vectors are first identified as the sources for concatenation: a system state vector containing the tilting torque, yaw torque, rotor speed, and rotor azimuth angle; and a baseline independent pitch command vector containing the pitch angle adjustments of each of the three blades. A vector concatenation operation is then performed, which appends all elements of the baseline independent pitch command vector to the end of the system state vector in their original order. The result is a new, higher-dimensional vector whose leading elements carry the physical state information of the system and whose trailing elements carry the decision intent of the baseline controller. This new vector is the aforementioned state-control intent fusion enhancement vector and serves as the input for subsequent Actor network inference.

[0034] In step S320, the state-control intent fusion enhancement vector is passed through a pre-trained Actor network to obtain the residual correction command. It should be understood that there exists a highly complex nonlinear mapping relationship between the state-control intent fusion enhancement vector constructed in the previous step and the optimal residual correction command, which cannot be directly expressed by an explicit mathematical formula. Therefore, in the technical solution of this invention, the state-control intent fusion enhancement vector is further passed through a pre-trained Actor network to obtain the residual correction command. This utilizes the powerful function approximator capability of deep neural networks to perform a fast inference calculation online, thereby solving for the optimal solution under this complex mapping relationship. In this way, the complex control strategy knowledge learned from massive amounts of data during the offline training phase can be efficiently and in real-time applied to online control decision-making, generating intelligent correction commands that can accurately respond to the current specific operating conditions.

[0035] More specifically, in a concrete example of the present invention, the inference process is implemented as follows. First, the state-control intent fusion enhancement vector is fed into the input layer of a pre-trained Actor network, where each element of the vector drives one input-layer neuron. Next, the input-layer activations are propagated forward through the network layer by layer. In each hidden layer, the output of the previous layer is multiplied by that layer's weight matrix and summed, passed through a non-linear activation function, and the result is passed to the next layer. This series of weighted summations and non-linear transformations constitutes the core computation for feature extraction and complex relationship modeling of the input information. Finally, when the signal reaches the output layer of the network, the values generated by the output-layer neurons are combined into an output vector whose dimension matches that of the required control command; this vector is the residual correction command.
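The concatenation in step S310 and the forward pass in step S320 can be sketched in plain Python. The source does not specify the network architecture, so the bias terms, the tanh activation, the bounded output scaled by a hypothetical `max_residual`, and the representation of each layer as a (weights, biases) pair are all assumptions for illustration; a trained network would supply the actual weights.

```python
import math


def concat_state_and_intent(state, baseline_cmd):
    """S310 sketch: append the baseline command to the end of the state vector."""
    return list(state) + list(baseline_cmd)


def dense(x, weights, biases):
    """One fully connected layer: weighted sum of the inputs plus a bias per neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def actor_forward(x, layers, max_residual):
    """S320 sketch: propagate x through hidden layers, then a bounded output layer.

    `layers` is a list of (weights, biases) pairs; the last pair is the output layer.
    """
    h = x
    for weights, biases in layers[:-1]:
        h = [math.tanh(v) for v in dense(h, weights, biases)]  # hidden-layer activation
    weights, biases = layers[-1]
    # tanh keeps each residual correction within +/- max_residual (assumed bound).
    return [max_residual * math.tanh(v) for v in dense(h, weights, biases)]
```

For a four-element state vector and three baseline commands, the input has seven elements and the output layer has three neurons, one residual correction per blade. Bounding the output is a design choice of this sketch that keeps the learned correction small relative to the baseline command.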

[0036] Figure 4 is a flowchart of the Actor network training for the deep learning-based independent pitch control method for wind turbines according to an embodiment of the present invention. As shown in Figure 4, the training of the Actor network includes the following steps: S321, constructing a reward function to obtain a reward signal, with minimizing the standard deviation of the flapping torque as the primary reward term and minimizing the sum of squares of the residual correction command as the secondary penalty term; S322, storing experience tuples, each containing the state-control intent fusion enhancement vector, the residual correction command, the reward signal, and the state-control intent fusion enhancement vector at the next time step, in the experience replay pool to construct a training dataset; S323, sampling training data from the experience replay pool and using the Actor-Critic algorithm to iteratively update the policy network and value network to obtain the Actor network.

[0037] In step S321, minimizing the standard deviation of the flapping torque is used as the primary reward, and minimizing the sum of squares of the residual correction commands is used as the secondary penalty. A reward function is constructed to obtain the reward signal. It should be understood that since the training process of deep reinforcement learning is a goal-oriented optimization process, it requires a clear and quantifiable feedback signal to guide the learning direction of the policy network, i.e., to evaluate the merits of its actions under specific states. Therefore, in the technical solution of this invention, minimizing the standard deviation of the flapping torque is used as the primary reward, and minimizing the sum of squares of the residual correction commands is used as the secondary penalty. A reward function is constructed to obtain the reward signal, thereby transforming the complex, multi-objective wind turbine load suppression problem into a specific mathematical expectation maximization problem that can be optimized by the algorithm. This provides a clear guiding signal containing primary and secondary optimization objectives for the iterative update of the policy network, ensuring that it learns and converges to a balanced control strategy that effectively reduces blade fatigue load while also considering control costs and actuator losses.

[0038] More specifically, in a concrete example of the present invention, the reward signal is calculated as follows. First, after the actuator completes the pitch change action, the flapping torque data of the three blades at the next moment is collected, and the standard deviation of these three values is calculated. This standard deviation directly reflects the severity of the load fluctuation on the blades and is a key indicator for assessing fatigue damage. Multiplying it by a preset negative weighting coefficient gives the primary reward term. Simultaneously, the residual correction command vector output by the Actor network in the previous control cycle is obtained, each of its elements (the pitch angle correction of one blade) is squared, and the squared values are summed. The magnitude of this sum of squares represents the intensity of the control action, or the control energy consumed. Multiplying it by another preset negative weighting coefficient gives the secondary penalty term. Finally, the primary reward term and the secondary penalty term are added to obtain a single scalar value, which is the reward signal. This signal is used in the subsequent policy update calculations.
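The reward calculation above can be sketched in a few lines. The weight values, the parameter names `w_load` and `w_effort`, and the choice of the population standard deviation are assumptions made for illustration; the text only states that both coefficients are preset and negative, which is expressed here as positive weights with an explicit minus sign.

```python
import statistics
from typing import Sequence

def reward_signal(flap_torques: Sequence[float],
                  residual_cmd: Sequence[float],
                  w_load: float = 1.0,
                  w_effort: float = 0.1) -> float:
    """Scalar reward: negative load spread minus a penalty on control effort."""
    # Primary term: standard deviation of the three blade flapping torques
    # (population form assumed; the source does not say which form is used).
    load_std = statistics.pstdev(flap_torques)
    # Secondary term: sum of squares of the per-blade residual corrections.
    effort = sum(c * c for c in residual_cmd)
    return -w_load * load_std - w_effort * effort
```

A perfectly balanced rotor with zero correction gives a reward of zero, which is the maximum attainable value under this form.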

[0039] In step S322, the experience tuples containing the state-control intent fusion enhancement vector, residual correction command, reward signal, and the state-control intent fusion enhancement vector at the next time step are stored in the experience replay pool to construct the training dataset. It should be understood that if deep reinforcement learning training directly uses sequentially generated interaction data, the training samples will be highly correlated in time. This correlation violates the independent and identically distributed (i.i.d.) assumption required for neural network training, resulting in instability or even divergence of the learning process. Therefore, in the technical solution of this invention, the experience tuples containing the state-control intent fusion enhancement vector, residual correction command, reward signal, and the state-control intent fusion enhancement vector at the next time step are further stored in the experience replay pool to construct the training dataset, thereby breaking the temporal coupling between training data and improving data utilization efficiency. In this way, subsequent random sampling can provide data that approximately conforms to the i.i.d. assumption for the parameter updates of the neural network, significantly enhancing the stability of the training process and helping the algorithm converge robustly to a high-performance control strategy.

[0040] More specifically, in a concrete example of the present invention, at each time step of offline training, the following four data items are combined into a structured data unit, namely, an experience tuple: the current state-control intention fusion enhancement vector, the current residual correction instruction, the reward signal from the environment after executing the instruction, and the state-control intention fusion enhancement vector evolved to the next time step. This experience tuple is then stored as a whole in an experience replay pool with a pre-set capacity limit. This experience replay pool is a data buffer with a first-in, first-out (FIFO) characteristic. In the initial stage of training, experience tuples are continuously added until the pool is full; when the experience replay pool is full, whenever a new experience tuple needs to be added, the oldest experience tuple in the pool is automatically deleted to make room for the new tuple. In this way, interaction experience is continuously collected and updated, and the resulting experience replay pool becomes the complete training dataset used for subsequent policy network iterations.
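A fixed-capacity first-in, first-out pool of this kind can be sketched with a bounded double-ended queue. The class and field names (`Experience`, `ReplayPool`, `sample`) are hypothetical and chosen only for this illustration; uniform random sampling is assumed, in line with the random mini-batch selection used during training.

```python
import random
from collections import deque
from typing import List, NamedTuple, Tuple

class Experience(NamedTuple):
    state: Tuple[float, ...]       # current state-control intent fusion enhancement vector
    action: Tuple[float, ...]      # current residual correction command
    reward: float                  # reward signal returned by the environment
    next_state: Tuple[float, ...]  # fusion enhancement vector at the next time step

class ReplayPool:
    """FIFO buffer: once full, adding a tuple silently drops the oldest one."""

    def __init__(self, capacity: int) -> None:
        self._buf = deque(maxlen=capacity)

    def add(self, exp: Experience) -> None:
        self._buf.append(exp)

    def sample(self, batch_size: int) -> List[Experience]:
        # Uniform sampling without replacement breaks the temporal ordering.
        return random.sample(list(self._buf), batch_size)

    def __len__(self) -> int:
        return len(self._buf)
```

Using `deque(maxlen=...)` delegates the eviction of the oldest tuple to the container, so no explicit delete step is needed.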

[0041] In step S323, training data is sampled from the experience replay pool, and the Actor-Critic algorithm is used to iteratively update the policy network and value network to obtain the Actor network. It should be understood that since the discrete data points stored in the experience replay pool cannot directly constitute an executable control policy, an effective learning algorithm must be used to utilize this data to iteratively optimize a policy network capable of generalizing to unseen states. Therefore, in the technical solution of this invention, training data is further sampled from the experience replay pool, and the Actor-Critic algorithm is used to iteratively update the policy network and value network to obtain the Actor network. This allows the value network to accurately evaluate the value of state-action pairs, and based on this evaluation, guides the policy network to optimize in a direction that yields higher long-term returns. In this way, through the collaborative learning and iterative optimization of the policy network and value network, it is ensured that the policy network eventually converges to an approximately optimal policy that maximizes cumulative rewards, i.e., a high-performance intelligent controller that can be deployed online.

[0042] More specifically, in a concrete example of the present invention, a small batch of experience tuples is randomly selected from the constructed experience replay pool as training samples for the current iteration. Subsequently, the value network, i.e., the Critic network, is updated. For each tuple in this batch of samples, the value of its next-time state is calculated using the target value network, and combined with the immediate reward signal in the tuple, a target value for the current state-action pair is constructed. The evaluation value of the current state-action pair by the current Critic network is compared with this target value, the temporal difference error between the two is calculated, and a loss function is constructed based on this error. Then, the parameters of the Critic network are updated using a gradient descent algorithm, making its value evaluation increasingly closer to the true expectation. After the Critic network is updated, the policy network, i.e., the Actor network, is updated. The states in this batch of samples are input into the Actor network to generate actions, and then the state-action pairs are input into the updated Critic network for value evaluation. Based on the evaluation results of the Critic network, a loss function for the Actor network is constructed, the goal of which is to maximize the Critic's evaluation value. Finally, the parameters of the Actor network are updated using a policy gradient method, making it inclined to output actions that obtain higher value evaluations. The above sampling and update process is repeated until the network performance converges. The final converged Actor network is the pre-trained network that can be deployed.
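The two quantities optimized in this update can be sketched as plain functions over a mini-batch. This is only an outline under stated assumptions: the discount factor `gamma`, the use of the policy to pick the next action inside the target, and the function names are not given in the text, and the gradient steps themselves are omitted because they depend on the chosen deep learning framework.

```python
from typing import Callable, Sequence, Tuple

Vec = Tuple[float, ...]
Transition = Tuple[Vec, Vec, float, Vec]  # (state, action, reward, next_state)
QFunc = Callable[[Vec, Vec], float]       # value network: (state, action) -> value
Policy = Callable[[Vec], Vec]             # policy network: state -> action

def critic_td_loss(batch: Sequence[Transition], critic: QFunc,
                   target_critic: QFunc, policy: Policy,
                   gamma: float = 0.99) -> float:
    """Mean squared temporal difference error; the Critic update descends this."""
    total = 0.0
    for s, a, r, s_next in batch:
        # Target value: immediate reward plus discounted target-network value
        # of the next state (gamma is an assumed hyperparameter).
        target = r + gamma * target_critic(s_next, policy(s_next))
        td_error = target - critic(s, a)
        total += td_error ** 2
    return total / len(batch)

def actor_objective(batch: Sequence[Transition], critic: QFunc,
                    policy: Policy) -> float:
    """Mean Critic value of the policy's own actions; the Actor update ascends this."""
    return sum(critic(s, policy(s)) for s, _, _, _ in batch) / len(batch)
```

In practice the Actor loss is usually taken as the negative of `actor_objective` so that a standard minimizer can be used for both networks.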

[0043] In step S400, the collective pitch control command, baseline independent pitch control command, and residual correction command provided by the wind turbine master controller are fused to obtain an unrestricted final pitch control command. It should be understood that since the pitch control of a wind turbine needs to simultaneously satisfy two core objectives—power regulation and asymmetric load suppression—and these two objectives are respectively dominated by the collective pitch control command and the independent pitch control command, with the independent pitch control command consisting of baseline and residual correction components, these parallel-generated commands must be integrated into a unified final command that can be executed by the actuator. Therefore, in the technical solution of this invention, the collective pitch control command, baseline independent pitch control command, and residual correction command provided by the wind turbine master controller are further fused to obtain an unrestricted final pitch control command. This superimposes multiple command components serving different control objectives into a single pitch angle setpoint that can accommodate all control objectives and act on each blade. This ensures that while performing active and adaptive load suppression operations, the overall power generation and speed regulation of the wind turbine are not affected, thereby achieving multi-objective coordinated control.

[0044] More specifically, in a specific example of the present invention, the collective pitch command, baseline independent pitch command, and residual correction command provided by the wind turbine master controller are fused to obtain an unrestricted final pitch command, including: adding the collective pitch command, baseline independent pitch command, and residual correction command element by element to obtain the unrestricted final pitch command.

[0045] More specifically, the command fusion process is implemented as follows: First, three parallel command inputs within the current control cycle are acquired: a single collective pitch command value calculated by the wind turbine main controller to adjust the unit's power and speed; a baseline independent pitch command vector containing three elements calculated by the baseline controller to provide base load suppression; and a residual correction command vector containing three elements inferred by the Actor network to provide adaptive correction. Then, these three commands are added element-wise. Specifically, the same collective pitch command value is added to each pair of corresponding elements in the baseline independent pitch command vector and the residual correction command vector, thereby calculating a fused pitch angle target value for each of the three blades. The resulting three-element vector is the unrestricted final pitch command.
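The element-wise fusion amounts to broadcasting the scalar collective command over the two per-blade vectors. A minimal sketch, with the function name `fuse_pitch_commands` chosen here for illustration:

```python
from typing import List, Sequence

def fuse_pitch_commands(collective: float,
                        baseline_ipc: Sequence[float],
                        residual: Sequence[float]) -> List[float]:
    """Unrestricted final pitch command: collective + baseline + residual, per blade."""
    if len(baseline_ipc) != len(residual):
        raise ValueError("baseline and residual vectors must have the same length")
    return [collective + b + r for b, r in zip(baseline_ipc, residual)]
```

For example, `fuse_pitch_commands(10.0, [1.0, -0.5, -0.5], [0.25, 0.0, -0.25])` returns `[11.25, 9.5, 9.25]`.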

[0046] Specifically, in step S500, the unrestricted final pitch command is subjected to a safety envelope constraint to obtain a restricted final pitch command. It should be understood that since the unrestricted final pitch command generated in the previous step is the ideal output of the control algorithm, it does not consider the physical travel limitations and maximum operating rate of the wind turbine pitch actuator itself. Directly issuing such a command might cause the command to exceed the hardware's execution capabilities, leading to execution deviations, increased component wear, or even equipment damage. Therefore, in the technical solution of this invention, the unrestricted final pitch command is further subjected to a safety envelope constraint to obtain a restricted final pitch command, thereby ensuring that the finally generated control command strictly conforms to the physical constraints and safe operation specifications of the pitch system. This ability to generate a set of final execution commands that both reflect the intent of the control algorithm and ensure physical feasibility and harmlessness to the equipment is a key step in ensuring the safe, stable, and reliable operation of the entire control system in practical engineering applications.

[0047] More specifically, in a specific example of the present invention, applying a safety envelope constraint to an unrestricted final pitch command to obtain a restricted final pitch command includes: applying an amplitude constraint and a pitch angle change rate constraint to the unrestricted final pitch command to obtain a restricted final pitch command.

[0048] More specifically, firstly, amplitude limiting is performed. Each target pitch angle value in the unrestricted final pitch command vector is compared with preset upper and lower pitch angle thresholds. If the target value exceeds the preset range, it is corrected to the corresponding upper or lower threshold. This process ensures that the command will not cause the blades to rotate beyond their mechanical travel. Next, based on amplitude limiting, pitch angle change rate limiting is performed. The amplitude-limited command is compared with the final command issued in the previous control cycle to calculate the pitch angle change within a single cycle. The required pitch rate is then calculated based on the control cycle duration. This rate is compared with the preset maximum permissible pitch rate. If the calculated rate exceeds the upper limit, the current command is adjusted so that its change within the cycle precisely corresponds to the maximum permissible pitch rate. The command vector obtained after these two levels of limiting is the restricted final pitch command.
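The two-stage limiting described above can be sketched as follows. The parameter names and the sample limits are assumptions for illustration only; actual thresholds depend on the pitch system. Comparing the per-cycle change against `max_rate * dt` is equivalent to comparing the required rate against the maximum permissible rate.

```python
import math
from typing import List, Sequence

def apply_safety_envelope(cmd: Sequence[float], prev_cmd: Sequence[float],
                          dt: float, angle_min: float, angle_max: float,
                          max_rate: float) -> List[float]:
    """Amplitude limiting followed by pitch rate limiting, per blade.

    prev_cmd is the restricted command issued in the previous control cycle and
    is assumed to lie within [angle_min, angle_max].
    """
    max_step = max_rate * dt  # largest angle change allowed in one control cycle
    limited = []
    for target, prev in zip(cmd, prev_cmd):
        # Stage 1: clamp to the mechanical travel range.
        clipped = min(max(target, angle_min), angle_max)
        # Stage 2: if the change is too fast, move by exactly the allowed step.
        step = clipped - prev
        if abs(step) > max_step:
            clipped = prev + math.copysign(max_step, step)
        limited.append(clipped)
    return limited
```

With limits of 0 to 90 degrees, a maximum rate of 8 degrees per second and a 0.5 second cycle, a request of `[95.0, 5.0, 0.2]` from a previous command of `[5.0, 5.0, 0.0]` yields `[9.0, 5.0, 0.2]`.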

[0049] Specifically, in step S600, the restricted final pitch command is sent to the pitch actuator for control execution. It should be understood that since all the aforementioned calculation steps are completed within the controller, the resulting final pitch command is only a set of digital signals. This must be converted into the actual physical movement of the wind turbine blades to effectively suppress asymmetric aerodynamic loads. Therefore, in the technical solution of this invention, the restricted final pitch command is further sent to the pitch actuator for control execution, thereby closing the entire control loop from state perception and decision calculation to physical execution, and converting the algorithm's output intent into real mechanical motion. In this way, by precisely driving each blade to rotate to the target angle, the aerodynamic characteristics of the blades can be changed in real time, thereby directly counteracting the unbalanced loads acting on the rotor, ultimately achieving the physical objective of the entire adaptive independent pitch control method.

[0050] More specifically, in a concrete example of the invention, a restricted final pitch command vector containing three target pitch angle values is sent from the main controller at the tower to three independent pitch drives located within the hub via an industrial fieldbus. Each pitch drive, upon receiving the target angle command specific to its controlled blade, compares it with the actual current blade angle detected in real time by an encoder to calculate the position deviation. Based on this position deviation, the pitch drive precisely controls the current and voltage supplied to the pitch motor, driving the motor to generate the required torque, which in turn rotates the blade via a reduction gearbox. This rotation process continues until the actual angle fed back by the encoder matches the target angle of the command, and the position deviation approaches zero, thus completing one precise positioning control of a blade. This process is performed synchronously and independently on the pitch mechanisms of the three blades, ultimately achieving precise physical execution of the command.

[0051] In summary, the deep learning-based independent pitch control method for wind turbines according to embodiments of the present invention is explained. It introduces an intelligent correction module based on residual reinforcement learning. By observing the system state and the control intent of the baseline controller in real time, it can learn in a data-driven manner and accurately identify model mismatch errors caused by the simplification of the internal model of the traditional controller compared to the real physical world. When the wind turbine encounters complex operating conditions such as wake turbulence leading to model mismatch, it can generate an accurate residual correction command to compensate for the deficiencies of the baseline command. Finally, by fusing the stable baseline command with the intelligent residual correction command, a final control command that is both stable and highly adaptable to operating conditions is formed, thereby solving the technical problem of control performance degradation or even failure caused by model mismatch.

[0052] Furthermore, a deep learning-based independent pitch control system for wind turbines is also provided.

[0053] Figure 5 is a block diagram of a deep learning-based independent pitch control system for wind turbines according to an embodiment of the present invention. As shown in Figure 5, the wind turbine independent pitch control system 100 based on deep learning according to an embodiment of the present invention includes: a multi-source state observation and processing module 110, used to perform multi-source state observation and processing on real-time sensor data of the wind turbine to obtain a system state vector, wherein the real-time sensor data includes flapping torque, rotor azimuth angle and rotor speed; a baseline control command calculation module 120, used to perform baseline control command calculation on the system state vector to obtain a baseline independent pitch command; a residual reinforcement learning-based inference module 130, used to perform residual reinforcement learning-based inference processing on the system state vector and the baseline independent pitch command to obtain a residual correction command; a command fusion module 140, used to fuse the collective pitch command provided by the wind turbine master controller, the baseline independent pitch command and the residual correction command to obtain an unrestricted final pitch command; a safety envelope restriction module 150, used to apply a safety envelope restriction to the unrestricted final pitch command to obtain a restricted final pitch command; and a command issuance module 160, used to issue the restricted final pitch command to the pitch actuator for control execution.

[0054] Furthermore, the baseline control command calculation module 120 includes: a control command acquisition unit, used to perform error calculation and control quantity solution on the tilting moment and yaw moment in the system state vector through a proportional-integral controller to obtain control commands in a non-rotating coordinate system.

[0055] The baseline independent pitch command acquisition unit is used to perform an inverse Park transformation based on the real-time rotor azimuth angle on the control commands in the non-rotating coordinate system to obtain the baseline independent pitch command.
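The two units of module 120 can be sketched together as follows. The specification does not give the transformation formulas, so the commonly used three-blade form with a 2/3 scaling and 120 degree blade spacing is assumed here, together with a zero moment reference and a simple discrete PI law; the names `park_transform`, `inverse_park_transform`, `PIController` and `baseline_ipc`, as well as the sign convention of the error, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def blade_azimuths(rotor_azimuth: float) -> List[float]:
    """Azimuth of each of the three blades, spaced 120 degrees apart (radians)."""
    return [rotor_azimuth + 2.0 * math.pi * k / 3.0 for k in range(3)]

def park_transform(blade_values: Sequence[float],
                   rotor_azimuth: float) -> Tuple[float, float]:
    """Rotating blade quantities -> (tilt, yaw) in the non-rotating frame."""
    psi = blade_azimuths(rotor_azimuth)
    tilt = (2.0 / 3.0) * sum(v * math.cos(p) for v, p in zip(blade_values, psi))
    yaw = (2.0 / 3.0) * sum(v * math.sin(p) for v, p in zip(blade_values, psi))
    return tilt, yaw

def inverse_park_transform(tilt: float, yaw: float,
                           rotor_azimuth: float) -> List[float]:
    """Non-rotating (tilt, yaw) commands -> one pitch command per blade."""
    return [tilt * math.cos(p) + yaw * math.sin(p)
            for p in blade_azimuths(rotor_azimuth)]

class PIController:
    """Discrete proportional-integral controller (rectangular integration assumed)."""

    def __init__(self, kp: float, ki: float) -> None:
        self.kp, self.ki, self.integral = kp, ki, 0.0

    def step(self, error: float, dt: float) -> float:
        self.integral += error * dt
        return self.kp * error + self.ki * self.integral

def baseline_ipc(tilt_moment: float, yaw_moment: float, rotor_azimuth: float,
                 pi_tilt: PIController, pi_yaw: PIController,
                 dt: float) -> List[float]:
    """Error calculation against a zero reference (assumed), PI, then inverse Park."""
    theta_tilt = pi_tilt.step(0.0 - tilt_moment, dt)
    theta_yaw = pi_yaw.step(0.0 - yaw_moment, dt)
    return inverse_park_transform(theta_tilt, theta_yaw, rotor_azimuth)
```

Under this form the three baseline commands always sum to zero, so they redistribute pitch between blades without shifting the collective setpoint.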

[0056] Furthermore, the residual reinforcement learning-based inference module 130 includes: The command splicing unit is used to splice the system state vector and the baseline independent pitch command to obtain the state-control intent fusion enhancement vector.

[0057] The residual correction instruction acquisition unit is used to pass the state-control intent fusion enhancement vector through the pre-trained Actor network to obtain the residual correction instruction.

[0058] As described above, the deep learning-based independent pitch control system 100 for wind turbines according to embodiments of the present invention can achieve real-time communication with the wind turbine's pitch drive system and SCADA monitoring system in a collaborative working environment between the edge computing unit at the wind turbine site and the back-end server platform of the wind farm's central control center. This includes, for example, the main controller or dedicated industrial control computer deployed inside the wind turbine nacelle or tower base. In one possible implementation, the deep learning-based independent pitch control system 100 for wind turbines according to embodiments of the present invention can be integrated into the main control system of the wind turbine as a software or hardware module. For example, on the back-end server side, this system can be a background service program running in its operating system, including functions such as experience replay pool construction, iterative updates of the policy network and value network, and model verification and distribution. Alternatively, it can be an independent intelligent control strategy training engine developed for the wind farm. Of course, the core part of the system used to perform real-time control, namely the Actor network, can also be embedded in dedicated industrial computing hardware, such as an embedded AI chip or dedicated accelerator card inside the wind turbine's main controller, to accelerate the real-time inference process of residual correction instructions and ensure low-latency output of control instructions.

[0059] It is understood that the above embodiments are merely exemplary implementations used to illustrate the principles of the present invention, and the present invention is not limited thereto. For those skilled in the art, various modifications and improvements can be made without departing from the spirit and essence of the present invention, and these modifications and improvements are also considered to be within the scope of protection of the present invention.

Claims

1. A deep learning-based independent pitch control method for wind turbines, characterized in that it comprises: performing multi-source state observation and processing on real-time sensor data of the wind turbine to obtain a system state vector, wherein the real-time sensor data includes flapping torque, rotor azimuth angle and rotor speed; performing baseline control command calculation on the system state vector to obtain a baseline independent pitch command; performing residual reinforcement learning-based inference on the system state vector and the baseline independent pitch command to obtain a residual correction command; fusing the collective pitch command provided by the wind turbine main controller, the baseline independent pitch command, and the residual correction command to obtain an unrestricted final pitch command; applying a safety envelope constraint to the unrestricted final pitch command to obtain a restricted final pitch command; and issuing the restricted final pitch command to the pitch actuator for control execution.

2. The wind turbine independent pitch control method based on deep learning according to claim 1, characterized in that performing multi-source state observation and processing on the real-time sensor data of the wind turbine to obtain the system state vector includes: subjecting the flapping torque in the real-time sensor data to a Park transformation to obtain the tilting moment and yaw moment; and normalizing the tilting moment, yaw moment, rotor speed, and rotor azimuth angle into the system state vector.

3. The wind turbine independent pitch control method based on deep learning according to claim 2, characterized in that performing baseline control command calculation on the system state vector to obtain the baseline independent pitch command includes: performing error calculation and control quantity solution on the tilting moment and yaw moment in the system state vector through a proportional-integral controller to obtain control commands in the non-rotating coordinate system; and subjecting the control commands in the non-rotating coordinate system to an inverse Park transformation based on the real-time rotor azimuth angle to obtain the baseline independent pitch command.

4. The wind turbine independent pitch control method based on deep learning according to claim 3, characterized in that, The system state vector and baseline independent pitch commands are subjected to residual reinforcement learning-based inference to obtain residual correction commands, including: The system state vector and baseline independent pitch commands are concatenated to obtain the state-control intent fusion enhancement vector; The state-control intent fusion enhancement vector is passed through a pre-trained Actor network to obtain residual correction instructions.

5. The wind turbine independent pitch control method based on deep learning according to claim 4, characterized in that the training steps of the Actor network include: constructing a reward function to obtain a reward signal, with minimizing the standard deviation of the flapping torque as the primary reward term and minimizing the sum of squares of the residual correction command as the secondary penalty term; storing the experience tuples containing the state-control intent fusion enhancement vector, residual correction command, reward signal and the state-control intent fusion enhancement vector of the next time step in the experience replay pool to construct the training dataset; and sampling training data from the experience replay pool and using the Actor-Critic algorithm to iteratively update the policy network and value network to obtain the Actor network.

6. The wind turbine independent pitch control method based on deep learning according to any one of claims 1 to 5, characterized in that, The collective pitch command, baseline independent pitch command, and residual correction command provided by the wind turbine main controller are fused to obtain an unrestricted final pitch command. This includes adding the collective pitch command, baseline independent pitch command, and residual correction command element by element to obtain the unrestricted final pitch command.

7. The deep learning-based independent pitch control method for wind turbines according to any one of claims 1 to 5, characterized in that applying a safety envelope constraint to the unrestricted final pitch command to obtain a restricted final pitch command includes: applying an amplitude limit and a pitch angle change rate limit to the unrestricted final pitch command to obtain the restricted final pitch command.

8. A deep learning-based independent pitch control system for wind turbines, characterized in that it comprises: a multi-source state observation and processing module, used to perform multi-source state observation and processing on the real-time sensor data of the wind turbine to obtain the system state vector, wherein the real-time sensor data includes flapping torque, rotor azimuth angle and rotor speed; a baseline control command calculation module, used to perform baseline control command calculation on the system state vector to obtain the baseline independent pitch command; a residual reinforcement learning-based inference module, used to perform residual reinforcement learning-based inference processing on the system state vector and the baseline independent pitch command to obtain the residual correction command; a command fusion module, used to fuse the collective pitch command provided by the wind turbine main controller, the baseline independent pitch command and the residual correction command to obtain the unrestricted final pitch command; a safety envelope constraint module, used to apply a safety envelope constraint to the unrestricted final pitch command to obtain a restricted final pitch command; and a command issuing module, used to issue the restricted final pitch command to the pitch actuator for control execution.

9. The wind turbine independent pitch control system based on deep learning according to claim 8, characterized in that the baseline control command calculation module includes: a control command acquisition unit, used to perform error calculation and control quantity solution on the tilting moment and yaw moment in the system state vector through a proportional-integral controller to obtain control commands in the non-rotating coordinate system; and a baseline independent pitch command acquisition unit, used to perform an inverse Park transformation based on the real-time rotor azimuth angle on the control commands in the non-rotating coordinate system to obtain the baseline independent pitch command.

10. The wind turbine independent pitch control system based on deep learning according to claim 8, characterized in that, The residual reinforcement learning-based inference module includes: The command splicing unit is used to splice the system state vector and the baseline independent pitch command to obtain the state-control intent fusion enhancement vector; The residual correction instruction acquisition unit is used to pass the state-control intent fusion enhancement vector through the pre-trained Actor network to obtain the residual correction instruction.