A wind farm joint control method

A deep Q-network (DQN) jointly controls the pitch and yaw of the wind turbines in a wind farm. This addresses the power generation loss and turbine fatigue caused by wake effects, achieving optimized control at the wind farm level and improving power generation efficiency and turbine lifespan.

CN115059576B · Active · Publication Date: 2026-06-16 · BEIJING HUANENG XINRUI CONTROL TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING HUANENG XINRUI CONTROL TECH
Filing Date
2022-05-24
Publication Date
2026-06-16

Abstract

The application provides a wind farm joint control method that can effectively reduce the influence of model uncertainty, increase the overall power generation of the wind farm, reduce wind turbine fatigue, and prolong service life. A deep Q-network is used to jointly control the pitch and yaw of all wind turbines in the wind farm. The wind farm controller comprises three modules: the environment, an autoencoder, and two reinforcement learning agents. The environment includes the wind turbines and a command center in the wind farm. The control variables in the environment are blade pitch and yaw angle. The sensory inputs from the wind farm include the free-stream wind speed and direction, and the resultant wind speed, rotor angular velocity, and power generated by each wind turbine. The global state $S_{t+1}$ and the global reward $r_{t+1}$ are obtained from the command center. The global state vector is passed through the autoencoder, which automatically encodes the global state $S_t$ into a reduced number of features. The autoencoder has an input layer, several hidden layers, a central layer, and an output layer.

Description

Technical Field

[0001] This invention relates to the technical field of wind farm control, specifically to a joint control method for wind farms.

Background Technology

[0002] Wind energy is a rapidly growing renewable energy source. Its sustainability makes wind power generation highly promising, but its economic viability remains crucial in competition with traditional power sources. It is therefore essential to maximize the power output of wind turbines. However, the higher pressure on the turbine rotor that accompanies increased power output leads to higher mechanical stresses and fatigue loads, especially for large turbines. Since increased fatigue shortens turbine lifespan, any optimization of power production over the turbine's lifespan should also aim to minimize the fatigue it experiences.

[0003] Wind turbines are typically part of a wind farm; standalone turbines are uncommon. Farm-level optimization involves modeling the combined interactions between turbines caused by wake effects. Wakes are aerodynamic shadows cast by upstream turbines onto downstream turbines. They are detrimental because they reduce the energy available in the wind and increase wind periodicity, which leads to turbine fatigue. Wake effects can persist for up to 15 rotor diameters downstream. Wakes have reportedly caused a 12% loss in power generation at offshore wind farms. Although turbine placement is chosen at the design stage to minimize wake effects, they remain present throughout operation. Therefore, in addition to the individual turbine-level tradeoff between power generation and fatigue, farm-level optimization also needs to address the tradeoff between upstream turbine power and downstream turbine fatigue.

[0004] In existing technologies, wind turbines are controlled individually, with each turbine attempting to maximize its own power generation using a proportional-integral-derivative (PID) controller. In abstract terms, at each control time step t, the PID controller of turbine i uses wind information and the previous system state (rotor speed) to adjust the turbine's control settings so as to maximize power generation relative to a reference power curve. Individual PID controllers only detect and react to the presence of adjacent turbines after the fact, by measuring wake effects, whereas active joint control of turbines at the wind farm level can provide better performance in terms of energy and turbine health. Joint control can match or exceed individual control because the decisions of any individual controller can be reproduced as a special case of the joint controller's decisions. Therefore, how to jointly control the pitch and yaw of all turbines at the wind farm level to optimize power and fatigue while capturing cross-turbine wake interactions is a technical problem that needs to be solved.

Summary of the Invention

[0005] To address the aforementioned issues, this invention provides a joint control method for wind farms, which can effectively reduce the impact of model uncertainties, enhance the overall power generation of the wind farm, reduce wind turbine fatigue, and extend its lifespan.

[0006] A joint control method for a wind farm is characterized by employing a deep Q-network to jointly control the pitch and yaw of all wind turbines in the wind farm. The wind farm controller involved includes three modules: an environment, an autoencoder, and two reinforcement learning agents.

[0007] The environment includes the wind turbines and the command center within the wind farm. The controlled variables in the environment are the blade pitch and yaw angles. Sensory inputs from the wind farm include the free-stream wind speed and direction, and the resultant wind speed, rotor angular velocity, and power generated by each turbine. The global state $S_{t+1}$ and global reward $r_{t+1}$ are obtained from the command center, and the global state vector is passed through an autoencoder.

[0008] The autoencoder automatically encodes the global state $S_t$ into a reduced number of features. It consists of an input layer, several hidden layers, a central layer, and an output layer. The input and output layers of the autoencoder network have the same features, and the central layer contains a minimal number of nodes that represent the compressed state. The number of nodes in the central layer represents the trade-off between effective compression and information loss.

[0009] The two reinforcement learning agents use the state and reward information to determine the action of each wind turbine sequentially; each turbine's action consists of a pitch angle and a yaw angle. The joint action vector A drives each wind turbine in the wind farm through the command center. In return, the command center provides the next state and the current reward to the reinforcement learning agents. The goal is to find the optimal policy A*, which maximizes power while minimizing rotor blade damage.

[0010] A further feature is that the method is implemented through the following specific steps:

[0011] Step 1: First, construct the action space:

[0012] $$A_t = \{\theta_1, \ldots, \theta_N, \gamma_1, \ldots, \gamma_N\}_t \qquad (1)$$

[0013] In the formula, $\theta_i$ and $\gamma_i$ are the pitch and yaw angles of the i-th wind turbine, respectively; both angles are discretized in 1° increments.

[0014] Step 2: The state space of the i-th wind turbine at time t can be expressed as:

[0015]

[0016] In the formula,

[0017]

[0018]

[0019] In the formulas, the local terms are the angular velocity and the resultant wind speed of the i-th wind turbine, and $f_1, \ldots, f_o$ are the best features representing the reduced global state;

[0020] Step 3: Since the objective is to maximize the total power generation (P) of the wind farm and minimize the total damage to the rotor blades, the instantaneous reward function at t+1 is defined as follows:

[0021]

[0022] In the formula, 0 < α < 1; the reward is defined to incentivize joint actions that balance the power generated against the damage to the blades for a given α;

[0023] Step 4: Deep Reinforcement Learning Training

[0024] First, initialize the following constants: the number of wind turbines N; the number of episodes nEpisodes; the number of batches within each episode nEpochs; $\varepsilon_{\min}$ for ε-greedy exploration; the discount factor ξ; and the update frequency K of the target network.

[0025] For each wind turbine i, reserve a separate buffer for each of its n = 2 agents, one for pitch and the other for yaw. Then initialize the policy network $Q_n$ of each agent with random weights $W_n$, and create a copy of each policy network as its target network;

[0026] In the outer loop, the environment is reset to a random state and ε is updated following a linear decay. In the inner loop, the next state and instantaneous reward are obtained, and the autoencoder is used to reduce the dimensionality of the global state. The state of each wind turbine is a combination of local and global information.

[0027] Then, the pitch and yaw agents are trained sequentially for each wind turbine in the wind farm.

[0028] Specifically, for each wind turbine and agent, the latest transition tuple is stored in a buffer in a first-in-first-out manner. A mini-batch is randomly drawn from this buffer, and the target vector is computed using standard Q-learning to train the policy network. Then, an action is selected using the ε-greedy algorithm. In the last step of the inner loop, the next state is assigned to the current state. Finally, for both agents, the weights of the policy network are copied to the target network every K sets.

[0029] Wind farms typically consist of multiple independent wind turbines, which can interfere with each other, potentially leading to decreased power generation efficiency. This invention employs a reinforcement learning-based approach, where agents learn to achieve desired control objectives through model-free interaction with the environment. Feedback is provided in the form of rewards; the agent's goal is to optimize the total reward over a learning period. The proposed method considers both energy and fatigue in the reinforcement learning reward definition. Since this invention does not require a wind turbine model, it avoids errors related to model calibration in model predictive control. This invention trains individual agents for each wind turbine in a multi-agent reinforcement learning environment with a shared learning schedule, combining the learning of individual agents with shared global rewards. This invention constructs a joint control scheme that considers all wind turbines within the wind farm, coordinating the control of their operating states to obtain the optimal control performance indicators for the entire wind farm, achieving overall optimization of turbine power and load.

Attached Figure Description

[0030] Figure 1 is a flowchart illustrating the framework of the method of the present invention.

Detailed Implementation

[0031] A joint control method for wind farms, shown in Figure 1, uses a deep Q-network to jointly control the pitch and yaw of all wind turbines in the wind farm. The wind farm controller involved includes three modules: an environment, an autoencoder, and two reinforcement learning agents.

[0032] The environment includes the wind turbines and the command center within the wind farm. The controlled variables in the environment are the blade pitch and yaw angles. Sensory inputs from the wind farm include the free-stream wind speed and direction, and the resultant wind speed, rotor angular velocity, and power generated by each turbine. The global state $S_{t+1}$ and global reward $r_{t+1}$ are obtained from the command center, and the global state vector is passed through an autoencoder.

[0033] The autoencoder automatically encodes the global state $S_t$ into a reduced number of features. It consists of an input layer, several hidden layers, a central layer, and an output layer. The input and output layers of the autoencoder network have the same features, and the central layer contains a minimal number of nodes that represent the compressed state. The number of nodes in the central layer represents the trade-off between effective compression and information loss.
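The layer structure described for the autoencoder can be illustrated with a minimal, untrained sketch. The layer sizes (12, 8, 3, 8, 12), the tanh activation, the weight initialization, and the class name `TinyAutoencoder` are all assumptions made for illustration; the text does not specify them, and a real controller would train the network to minimize reconstruction error.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weight matrix (n_out rows of n_in weights); hypothetical init."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]


def forward(layer, x):
    """Apply one dense layer with a tanh activation (bias omitted for brevity)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in layer]


class TinyAutoencoder:
    """Input layer, hidden layers, a narrow central layer, and an output layer."""

    def __init__(self, sizes, seed=0):
        if sizes[0] != sizes[-1]:
            raise ValueError("input and output layers must have the same size")
        rng = random.Random(seed)
        self.layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        # Index of the narrowest layer, i.e. the central (compressed) layer.
        self.center = sizes.index(min(sizes))

    def encode(self, x):
        """Map the global state to the reduced feature vector."""
        for layer in self.layers[: self.center]:
            x = forward(layer, x)
        return x

    def reconstruct(self, x):
        """Run the full network; training would push this output toward x."""
        for layer in self.layers:
            x = forward(layer, x)
        return x


ae = TinyAutoencoder([12, 8, 3, 8, 12])
global_state = [0.1 * k for k in range(12)]  # placeholder global state vector
features = ae.encode(global_state)           # 3 compressed features
```

Choosing a smaller central layer compresses more aggressively but discards more information, which is the trade-off the paragraph above refers to.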

[0034] The two reinforcement learning agents use the state and reward information to determine the action of each wind turbine sequentially; each turbine's action consists of a pitch angle and a yaw angle. The joint action vector A drives each wind turbine in the wind farm through the command center. In return, the command center provides the next state and the current reward to the reinforcement learning agents. The goal is to find the optimal policy A*, which maximizes power while minimizing rotor blade damage.

[0035] A further feature is that the method is implemented through the following specific steps:

[0036] Step 1: First, construct the action space:

[0037] $$A_t = \{\theta_1, \ldots, \theta_N, \gamma_1, \ldots, \gamma_N\}_t \qquad (1)$$

[0038] In the formula, $\theta_i$ and $\gamma_i$ are the pitch and yaw angles of the i-th wind turbine, respectively; both angles are discretized in 1° increments.
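As a rough illustration of the discretized action space in formula (1), the sketch below enumerates pitch and yaw settings in 1° steps and assembles a joint action vector. The angle ranges (pitch 0° to 25°, yaw -30° to 30°) and all function names are assumptions chosen for the example; the text only fixes the 1° increment.

```python
def discrete_angles(lo_deg, hi_deg, step_deg=1):
    """All admissible angles from lo_deg to hi_deg inclusive, in step_deg steps."""
    return list(range(lo_deg, hi_deg + 1, step_deg))


# Hypothetical operating ranges; only the 1-degree step comes from the text.
PITCH_DEG = discrete_angles(0, 25)
YAW_DEG = discrete_angles(-30, 30)


def joint_action(pitch_indices, yaw_indices):
    """Build A_t = [theta_1..theta_N, gamma_1..gamma_N] from per-turbine choices."""
    if len(pitch_indices) != len(yaw_indices):
        raise ValueError("one pitch and one yaw choice per turbine is required")
    thetas = [PITCH_DEG[i] for i in pitch_indices]
    gammas = [YAW_DEG[j] for j in yaw_indices]
    return thetas + gammas


def joint_space_size(n_turbines):
    """Number of distinct joint actions if a single agent chose all angles at once."""
    return (len(PITCH_DEG) * len(YAW_DEG)) ** n_turbines


# Two turbines: pitch 0 and 5 degrees, yaw 0 and 1 degree.
a_t = joint_action([0, 5], [30, 31])
```

The joint space grows exponentially with the number of turbines, whereas each pitch or yaw agent only chooses among its own discrete angles, which is one practical motivation for using separate agents per turbine.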

[0039] Step 2: The state space of the i-th wind turbine at time t can be expressed as:

[0040]

[0041] In the formula,

[0042]

[0043]

[0044] In the formulas, the local terms are the angular velocity and the resultant wind speed of the i-th wind turbine, and $f_1, \ldots, f_o$ are the best features representing the reduced global state;
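The combination of local and global information in each turbine's state can be sketched as a simple concatenation. The ordering of the entries and the function name `turbine_states` are assumptions; the text states only that the state holds the turbine's angular velocity, its resultant wind speed, and the reduced global features.

```python
def turbine_states(angular_velocities, wind_speeds, global_features):
    """Return one state vector per turbine: [omega_i, v_i, f_1, ..., f_o].

    The local part differs per turbine; the global part (the autoencoder
    features) is shared by all turbines. The ordering is an assumption.
    """
    if len(angular_velocities) != len(wind_speeds):
        raise ValueError("need one angular velocity and one wind speed per turbine")
    return [
        [omega, speed, *global_features]
        for omega, speed in zip(angular_velocities, wind_speeds)
    ]


states = turbine_states(
    angular_velocities=[1.2, 1.1],    # rad/s, placeholder values
    wind_speeds=[8.0, 6.5],           # m/s, placeholder values
    global_features=[0.3, -0.1, 0.7], # placeholder autoencoder output
)
```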

[0045] Step 3: Since the objective is to maximize the total power generation (P) of the wind farm and minimize the total damage to the rotor blades, the instantaneous reward function at t+1 is defined as follows:

[0046]

[0047] In the formula, 0 < α < 1; the reward is defined to incentivize joint actions that balance the power generated against the damage to the blades for a given α;
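The reward formula itself is not reproduced in this text, so the following is only one plausible form consistent with the description: a weighted difference in which α scales total power and (1 - α) scales total blade damage. The functional form, the normalization scales, and the function name are assumptions, not the patented formula.

```python
def instantaneous_reward(total_power, total_damage, alpha,
                         power_scale=1.0, damage_scale=1.0):
    """Assumed reward: alpha * normalized power - (1 - alpha) * normalized damage.

    This is an illustrative stand-in for the missing formula. The scales are
    hypothetical constants that bring power and damage to comparable ranges.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    power_term = total_power / power_scale
    damage_term = total_damage / damage_scale
    return alpha * power_term - (1.0 - alpha) * damage_term


# With alpha = 0.5 power and damage are weighted equally.
r_next = instantaneous_reward(total_power=10.0, total_damage=2.0, alpha=0.5)
```

Under this assumed form, raising α toward 1 favors power production, while lowering it toward 0 favors blade protection.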

[0048] Step 4: Deep Reinforcement Learning Training

[0049] First, initialize the following constants: the number of wind turbines N; the number of episodes nEpisodes; the number of batches within each episode nEpochs; $\varepsilon_{\min}$ for ε-greedy exploration; the discount factor ξ; and the update frequency K of the target network.

[0050] For each wind turbine i, reserve a separate buffer for each of its n = 2 agents, one for pitch and the other for yaw. Then initialize the policy network $Q_n$ of each agent with random weights $W_n$, and create a copy of each policy network as its target network;

[0051] In the outer loop, the environment is reset to a random state and ε is updated following a linear decay. In the inner loop, the next state and instantaneous reward are obtained, and the autoencoder is used to reduce the dimensionality of the global state. The state of each wind turbine is a combination of local and global information.
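The constants and the linear ε decay can be collected in a small configuration sketch. The field names, the starting value of ε (1.0), and the example numbers are assumptions; the text names only N, nEpisodes, nEpochs, the minimum ε, the discount factor ξ, and K.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    n_turbines: int           # N
    n_episodes: int           # nEpisodes
    n_epochs: int             # nEpochs, batches within each episode
    eps_min: float            # lower bound for epsilon-greedy exploration
    discount: float           # discount factor (xi in the text)
    target_update_every: int  # K
    eps_start: float = 1.0    # assumed starting exploration rate


def linear_epsilon(episode, cfg):
    """Decay epsilon linearly from eps_start to eps_min over the episodes."""
    span = max(1, cfg.n_episodes - 1)
    fraction = min(1.0, episode / span)
    return cfg.eps_start - fraction * (cfg.eps_start - cfg.eps_min)


# Hypothetical values for illustration only.
cfg = TrainConfig(n_turbines=4, n_episodes=11, n_epochs=50,
                  eps_min=0.05, discount=0.99, target_update_every=5)
schedule = [linear_epsilon(ep, cfg) for ep in range(cfg.n_episodes)]
```

The schedule starts with mostly random actions and ends with mostly greedy ones, which matches the usual role of ε decay in ε-greedy exploration.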

[0052] Then, the pitch and yaw agents are trained sequentially for each wind turbine in the wind farm.

[0053] Specifically, for each wind turbine and agent, the latest transition tuple is stored in a buffer in a first-in-first-out manner. A mini-batch is randomly drawn from this buffer, and the target vector is computed using standard Q-learning to train the policy network. Then, an action is selected using the ε-greedy algorithm. In the last step of the inner loop, the next state is assigned to the current state. Finally, for both agents, the weights of the policy network are copied to the target network every K sets.
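The per-agent bookkeeping in this step (first-in-first-out buffer, random mini-batch, Q-learning target, ε-greedy selection, periodic target update) can be sketched without a deep learning library. All names are hypothetical, the networks are stood in for by plain callables and lists, and the sketch assumes a continuing task with no terminal states. What the update counter counts is left to the caller, since the text does not pin down the unit of K.

```python
import copy
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity buffer; the oldest transition is dropped first (FIFO)."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def __len__(self):
        return len(self.data)

    def push(self, state, action, reward, next_state):
        self.data.append((state, action, reward, next_state))

    def sample(self, batch_size, rng):
        """Draw a random mini-batch without replacement."""
        return rng.sample(list(self.data), min(batch_size, len(self.data)))


def q_targets(batch, target_q, discount):
    """Standard Q-learning targets: r + discount * max_a' Q_target(s', a')."""
    return [r + discount * max(target_q(s_next)) for (_, _, r, s_next) in batch]


def epsilon_greedy(q_values, epsilon, rng):
    """Random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def sync_target(policy_weights, target_weights, counter, every_k):
    """Return a copy of the policy weights on every K-th count, else keep target."""
    if counter % every_k == 0:
        return copy.deepcopy(policy_weights)
    return target_weights


rng = random.Random(0)
# One buffer per turbine and per agent (pitch, yaw), here for 3 turbines.
buffers = {(i, agent): ReplayBuffer(capacity=1000)
           for i in range(3) for agent in ("pitch", "yaw")}
```

In a full implementation the targets would be used as regression labels for the policy network's output at the sampled actions.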

[0054] This invention employs a reinforcement learning-based approach, where an agent learns to achieve the desired control objective through model-free interaction with the environment. Feedback is provided in the form of rewards. The agent's goal is to optimize the total reward over a learning period. The proposed method considers both energy and fatigue in the reinforcement learning reward definition. Since this invention does not require a wind turbine model, it avoids errors related to model calibration in model predictive control. This invention trains a single agent for each wind turbine in a multi-agent reinforcement learning environment with a shared learning schedule, combining the learning of the individual agent with a shared global reward. This invention constructs a joint control scheme that considers all wind turbines within the wind farm, coordinating the control of their operating states to obtain the optimal control performance indicators for the entire wind farm, achieving overall optimization of wind turbine power and load.

[0055] Q-learning is a value-based reinforcement learning algorithm. Its central quantity is the Q-value, the expected cumulative reward of taking a given action in a given state, from which the algorithm takes its name.

[0056] Deep Q-Network (DQN) refers to a Q-learning algorithm based on deep learning. It approximates the value function with a neural network and trains that network using a target network and experience replay.
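For reference, the standard Q-learning update and the DQN training target can be written as follows. These are the textbook forms rather than formulas taken from this patent. The learning rate $\eta$, the policy network weights $W$, and the target network weights $W^{-}$ are notation introduced here; $\xi$ is the discount factor named in the training step.

```latex
% Tabular Q-learning update
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \eta \left[ r_{t+1} + \xi \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

% DQN target computed with the target network, and the loss for the policy network
y_t = r_{t+1} + \xi \max_{a'} Q(s_{t+1}, a'; W^{-}),
\qquad
L(W) = \mathbb{E}\left[ \left( y_t - Q(s_t, a_t; W) \right)^2 \right]
```

Holding $W^{-}$ fixed between periodic copies keeps the target $y_t$ from shifting at every gradient step, and sampling transitions from the replay buffer reduces the correlation between consecutive training samples.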

[0057] It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that the invention can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments should be considered in all respects as exemplary and non-limiting, and the scope of the invention is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be included within the present invention. No reference numerals in the claims should be construed as limiting the scope of the claims.

[0058] Furthermore, it should be understood that although this specification describes embodiments, not every embodiment contains only one independent technical solution. This narrative style is merely for clarity. Those skilled in the art should consider the specification as a whole, and the technical solutions in each embodiment can also be appropriately combined to form other embodiments that can be understood by those skilled in the art.

Claims

1. A joint control method for wind farms, characterized in that: a deep Q-network is used to jointly control the pitch and yaw of all wind turbines in the wind farm, and the wind farm controller involved includes three modules: an environment, an autoencoder, and two reinforcement learning agents;

wherein the environment includes the wind turbines in the wind farm and a command center; the control variables in the environment are blade pitch and yaw angle; the sensory inputs from the wind farm include the free-stream wind speed and direction, and the resultant wind speed, rotor angular velocity, and power generated by each wind turbine; the global state $S_{t+1}$ and global reward $r_{t+1}$ are obtained from the command center, and the global state vector is passed through an autoencoder;

the autoencoder automatically encodes the global state $S_t$ into a reduced number of features; the autoencoder consists of an input layer, several hidden layers, a central layer, and an output layer; the input and output layers of the autoencoder network have the same features, and the central layer contains a minimal number of nodes that represent the compressed state; the number of nodes in the central layer represents the trade-off between effective compression and information loss;

the two reinforcement learning agents use the state and reward information to determine the action of each wind turbine sequentially, each turbine's action consisting of a pitch angle and a yaw angle; the joint action vector A drives each wind turbine in the wind farm through the command center; in return, the command center provides the next state and the current reward to the reinforcement learning agents; the goal is to find the optimal policy A*, which maximizes power while minimizing rotor blade damage.

The specific implementation steps are as follows:

Step 1: First, construct the action space: $$A_t = \{\theta_1, \ldots, \theta_N, \gamma_1, \ldots, \gamma_N\}_t \qquad (1)$$ In the formula, $\theta_i$ and $\gamma_i$ are the pitch and yaw angles of the i-th wind turbine, respectively; both angles are discretized in 1° increments.

Step 2: The state space of the i-th wind turbine at time t can be expressed by formulas (2) to (4), in which the local terms are the angular velocity and the resultant wind speed of the i-th wind turbine, and $f_1, \ldots, f_o$ are the best features representing the reduced global state;

Step 3: Since the objective is to maximize the total power generation (P) of the wind farm and minimize the total damage to the rotor blades, the instantaneous reward function at t+1 is defined by formula (5), in which 0 < α < 1; the reward is defined to incentivize joint actions that balance the power generated against the damage to the blades for a given α;

Step 4: Deep reinforcement learning training. First, initialize the following constants: the number of wind turbines N; the number of episodes nEpisodes; the number of batches within each episode nEpochs; $\varepsilon_{\min}$ for ε-greedy exploration; the discount factor ξ; and the update frequency K of the target network. For each wind turbine i, reserve a separate buffer for each of its n = 2 agents, one for pitch and the other for yaw; then initialize the policy network $Q_n$ of each agent with random weights $W_n$, and create a copy of each policy network as its target network. In the outer loop, the environment is reset to a random state and ε is updated following a linear decay. In the inner loop, the next state and instantaneous reward are obtained, and the autoencoder is used to reduce the dimensionality of the global state. The state of each wind turbine is a combination of local and global information. Then, the pitch and yaw agents are trained sequentially for each wind turbine in the wind farm.

For each wind turbine and agent, the latest transition tuple is stored in a buffer in a first-in-first-out manner. A mini-batch is randomly drawn from this buffer, the target vector is computed using standard Q-learning, and the policy network is trained. Then, an action is selected using the ε-greedy algorithm. In the last step of the inner loop, the next state is assigned to the current state. Finally, for both agents, the weights of the policy network are copied to the target network every K sets.