Wind farm optimal frequency trajectory collaborative control method and system based on contrast clustering and hierarchical reinforcement learning
Using contrastive clustering and hierarchical reinforcement learning, wind turbines are dynamically selected and grouped, a multi-agent model is constructed, and a multi-objective reward function and a power allocation strategy are designed. This addresses the frequency stability problem of wind farms, realizes optimal collaborative control of wind farm frequency trajectories, and improves both frequency stability and wind turbine safety.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NANJING UNIV OF SCI & TECH
- Filing Date
- 2026-04-23
- Publication Date
- 2026-06-26
AI Technical Summary
Existing wind farm frequency stabilization control methods fail to fully utilize the flexibility of wind turbine control. Treating every turbine as an independent agent causes state space explosion and difficulty in training convergence, and these methods cannot respond effectively to changes in wind turbine state, which may lead to a secondary frequency drop.
By employing contrastive clustering and hierarchical reinforcement learning, wind turbines with frequency regulation capabilities are dynamically selected. Through feature extraction networks and adaptive density clustering, a multi-agent Markov decision process model is constructed. A multi-objective reward function and a lower-level power allocation strategy are designed to achieve coordinated control of the optimal frequency trajectory of the wind farm.
It significantly improves the frequency stability of wind farms, dynamically adapts to changes in wind turbine status, avoids secondary frequency drops, and reduces the difficulty of engineering implementation and computational burden.
Figure CN122068490B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of power system frequency stability control technology, and specifically relates to a method and system for collaborative control of the optimal frequency trajectory of wind farms based on contrastive clustering and hierarchical reinforcement learning.
Background Technology
[0002] With the large-scale grid connection of wind power, the inertia level of the power system is decreasing, and the frequency stability problem is becoming increasingly prominent. Wind turbines can provide short-term frequency support for the system by releasing rotor kinetic energy, but most existing control methods imitate the inertia response and primary frequency regulation characteristics of synchronous generators (such as virtual inertia control and droop control), failing to fully utilize the advantages of flexible and adaptable wind turbine control, and neglecting issues such as multi-machine coordination within wind farms, rotor kinetic energy constraints, and secondary frequency drops.
[0003] For large-scale wind farms (dozens to hundreds of turbines), traditional multi-agent reinforcement learning methods (such as MADDPG) treat each turbine as an independent agent, which causes state space explosion, difficulty in training convergence, and excessive communication and computational burdens. Furthermore, in actual operation, some turbines may lack frequency regulation capability due to low speed, unsuitable wind speed, or malfunctions. Issuing frequency regulation commands to these turbines could lead to excessive deceleration, shutdown, or even damage, causing more serious problems such as secondary frequency drops. Clustering algorithms are an effective means of grouping large-scale wind farms, but traditional offline clustering cannot adapt to dynamic changes in turbine states (such as wind speed changes, fault recovery, and turbines leaving operation).
Summary of the Invention
[0004] The purpose of this invention is to provide a wind farm optimal frequency trajectory collaborative control method and system based on contrastive clustering and hierarchical reinforcement learning, which realizes optimal frequency trajectory collaborative tracking at the wind farm level, significantly improves the system's lowest frequency point, dynamically adapts to changes in wind turbine status, and effectively ensures wind turbine operation safety.
[0005] The technical solution for achieving the objective of this invention is as follows:
[0006] A collaborative control method for optimal frequency trajectory of wind farms based on contrastive clustering and hierarchical reinforcement learning includes:
[0007] Step 1: Establish criteria for determining the frequency regulation capability of wind turbines, and dynamically select the set of wind turbines with frequency regulation capability based on real-time operating data;
[0008] Step 2: A feature extraction network is used to map the original state of the wind turbines with frequency regulation capability to a low-dimensional feature space. An adaptive density clustering algorithm is then used to group the wind turbines dynamically, yielding wind turbine groups.
[0009] Step 3: Define the aggregated observation state of each group, treat each group as a virtual agent, and construct a multi-agent Markov decision process model for inter-group collaboration in the upper layer.
[0010] Step 4: Design a multi-objective reward function to guide the multi-agent Markov decision process, so that the wind power frequency trajectory approaches the theoretical optimal trajectory.
[0011] Step 5: Train the multi-agent Markov decision process model using a multi-agent deep reinforcement learning algorithm;
[0012] Step 6: Design a lower-level intra-group power allocation strategy that dynamically allocates the group total power command output by the upper level to the frequency-adjustable wind turbines in the group according to the real-time frequency regulation capability of each wind turbine.
[0013] Step 7: Deploy the trained multi-agent Markov decision process model and the lower-level intra-group power allocation strategy to the wind farm group controller to perform collaborative control of the optimal frequency trajectory of the wind farm, and dynamically update the clustering results according to the changes in the wind turbine status.
[0014] Furthermore, the criteria for determining the frequency regulation capability of the wind turbine mentioned in step 1 include:
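[0015] Speed safety margin condition: $\omega_i(t) \ge \omega_{\min} + \Delta\omega_{\text{margin}}$, where $\omega_i(t)$ is the rotational speed of the $i$-th wind turbine at time $t$, $\omega_{\min}$ is the minimum safe speed of the turbine, and $\Delta\omega_{\text{margin}}$ is a preset speed safety margin;
[0016] Rotor kinetic energy level condition: $SoC_i(t) \ge SoC_{\min}$, where $SoC_i(t) = \frac{\omega_i^2(t) - \omega_{\min}^2}{\omega_{i,0}^2 - \omega_{\min}^2}$ denotes the kinetic energy state of the $i$-th wind turbine at time $t$, $SoC_{\min}$ is the minimum kinetic energy ratio threshold, and $\omega_{i,0}$ is the initial rotational speed of the $i$-th wind turbine;
[0017] Wind speed condition: $v_{\text{in}} \le v_i(t) \le v_{\text{out}}$, where $v_{\text{in}}$ and $v_{\text{out}}$ are the cut-in and cut-out wind speeds of the turbine and $v_i(t)$ is the wind speed at the $i$-th wind turbine at time $t$;
[0018] Equipment status condition: the wind turbine is fault-free and not in power-limited operation mode;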
[0019] Communication status: The wind turbine and the wind farm group controller are communicating normally.
[0020] Furthermore, the feature extraction network described in step 2 is a multilayer perceptron, which is pre-trained using the SimCLR framework for contrastive learning.
[0021] Furthermore, the SimCLR framework is used for contrastive learning pre-training, specifically including:
[0022] A large number of wind turbine status samples were collected from historical data to construct a training dataset;
[0023] Each sample is augmented twice with random data to obtain positive sample pairs;
[0024] A projection head is connected after the feature extraction network;
[0025] For each positive sample pair, calculate the NT-Xent contrast loss, and then calculate the average of the NT-Xent contrast losses of the positive sample pairs as the total loss;
[0026] The Adam optimizer is used to train the network until convergence based on the total loss. After training, the feature extraction network is retained and the projection head is discarded.
[0027] Furthermore, step 2 employs the streaming HDBSCAN adaptive density clustering algorithm for dynamic grouping of wind turbines, specifically including:
[0028] Based on the current set of wind turbines with frequency regulation capabilities, calculate the reachability distance between all point pairs and construct a minimum spanning tree; the reachability distance is:
[0029] $d_{\text{reach}}(i,j) = \max\{\text{core}_k(i),\ \text{core}_k(j),\ d(i,j)\}$
[0030] where $\text{core}_k(i)$ is the core distance of point $i$, i.e., the Euclidean distance from point $i$ to its $k$-th nearest neighbor (and likewise $\text{core}_k(j)$ for point $j$); $d(i,j) = \|z_i - z_j\|_2$ is the Euclidean distance between points $i$ and $j$; $z_i$ and $z_j$ are the feature vectors extracted by the feature extraction network for the $i$-th and $j$-th wind turbines, respectively; and $k$ is the preset minimum cluster size;
[0031] Hierarchical clustering is performed on the minimum spanning tree, and the cluster partition is selected automatically based on cluster stability. This includes calculating the stability of each cluster $C$ as $S(C) = \sum_{i \in C}\left(\lambda_{\text{leave}}(i) - \lambda_{\text{enter}}(i)\right)$, where $\lambda = 1/d_{\text{reach}}$, and $\lambda_{\text{enter}}(i)$ and $\lambda_{\text{leave}}(i)$ are the thresholds at which point $i$ enters and leaves the cluster; the clusters with the highest stability are selected as the final partition, and each point is assigned to the selected cluster that contains it. Points not assigned to any cluster are marked as noise points.
[0032] When a wind turbine joins or leaves the set of wind turbines with frequency regulation capability, the minimum spanning tree and the cluster assignments are updated dynamically.
[0033] Furthermore, in step 3, the aggregated observation status of each group is obtained by average pooling the feature vectors of the wind turbines within the group and concatenating the group statistics, that is:
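[0034] $o_m = \left[\frac{1}{|G_m|}\sum_{i \in G_m} z_i,\ \bar{\omega}_m,\ \rho_m,\ \overline{\Delta f}_m\right]$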
[0035] where $o_m$ is the aggregated observation vector of the $m$-th group, $G_m$ is the set of wind turbine indices in the $m$-th group, $|G_m|$ is the number of wind turbines in the group, $z_i$ is the wind turbine feature vector, $\bar{\omega}_m$ is the average rotational speed, $\rho_m$ is the group's proportion of the total kinetic energy, and $\overline{\Delta f}_m$ is the average frequency deviation.
[0036] Furthermore, the multi-objective reward function in step 4 includes trajectory tracking reward (aiming at the theoretically optimal frequency trajectory), energy conservation reward, inter-group cooperation reward, safety constraint reward, and smoothness reward, namely:
[0037]
[0038]
[0039]
[0040]
[0041]
[0042] where $r_{\text{track}}$ is the trajectory tracking reward, $r_{\text{energy}}$ is the energy conservation reward, $r_{\text{coop}}$ is the inter-group collaboration reward, $r_{\text{safe}}$ is the safety constraint reward, and $r_{\text{smooth}}$ is the smoothness reward; $\rho_m$ is the group's proportion of the total available kinetic energy; $\Delta f_{\text{th}}$ is the frequency regulation start threshold; $\mathbb{1}(\cdot)$ is the indicator function; $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_4$, $\alpha_5$ are set coefficients; $\varepsilon$ is a tolerance; $\Delta f(t)$ is the actual frequency deviation at time $t$ and $\Delta f^{*}(t)$ is the reference frequency deviation at time $t$; $\Delta P_m$ is the total frequency regulation power command of the $m$-th virtual agent, $m = 1, \dots, K$, where $K$ is the number of virtual agents; $\omega_{\min}$ and $\omega_{\max}$ are the minimum and maximum safe speeds of the turbine; and $\omega_i$ is the rotational speed of the $i$-th wind turbine;
[0043] At the frequency regulation end time $T_{\text{end}}$, an additional terminal sparse reward is added.
[0044] Furthermore, in the multi-agent deep reinforcement learning algorithm, each virtual agent includes an Actor network and a Critic network. The Actor network has a 6-dimensional input layer, hidden layers of [128, 64], and a 1-dimensional output layer whose tanh-activated output is scaled to the group's total power range. The Critic network has an input dimension that depends on $N_{\max}$, the preset maximum number of groups, hidden layers of [256, 128], and outputs the evaluated Q value. The multi-agent Markov decision process model is trained using a centralized training, decentralized execution framework.
[0045] Furthermore, step 6 involves designing the power allocation strategy within the lower-level group, including:
[0046] Step 6-1, based on rotor kinetic energy, speed margin, and inertia, determine the weights as follows:
[0047]
[0048] where $J_i$ is the inertia of the $i$-th wind turbine generator in the group, $\bar{J}$ is the average inertia of the frequency-adjustable wind turbines in the group, and $SoC_i(t)$ is the rotor kinetic energy state of the $i$-th wind turbine at time $t$, with $SoC_i(t) = \frac{\omega_i^2(t) - \omega_{\min}^2}{\omega_{i,0}^2 - \omega_{\min}^2}$, where $\omega_i(t)$ is the rotational speed of the $i$-th wind turbine at time $t$, $\omega_{\min}$ is the minimum safe speed of the turbine, and $\omega_{i,0}$ is the initial rotational speed of the $i$-th wind turbine;
[0049] Step 6-2: Perform preliminary allocation according to weight;
[0050] Step 6-3: Check whether the initial allocation of each wind turbine meets the power limit and speed safety estimate. If not, use iterative proportional allocation or quadratic programming for reallocation.
[0051] Step 6-4: When the wind turbine status becomes non-frequency adjustable, immediately remove it from the group and redistribute the remaining power, and iterate through Step 6-1 again to calculate and allocate weights.
[0052] A wind farm optimal frequency trajectory cooperative control system based on contrastive clustering and hierarchical reinforcement learning includes:
[0053] The wind turbine status monitoring module is used to collect real-time operating data of each wind turbine and evaluate its frequency regulation capability;
[0054] The dynamic filtering module generates a set of wind turbines that currently have frequency regulation capabilities based on the criteria for determining wind turbine frequency regulation capabilities.
[0055] The contrastive clustering module includes a pre-trained feature extraction network and an adaptive density clusterer, and performs online feature extraction and dynamic clustering of the wind turbines with frequency regulation capability;
[0056] The upper-level collaborative training module defines the aggregated observation state of each group, treats each group as a virtual agent, constructs a multi-agent Markov decision process model for upper-level group collaboration, designs a multi-objective reward function to guide the multi-agent Markov decision process, makes the wind power frequency trajectory approximate the theoretical optimal trajectory, and uses a multi-agent deep reinforcement learning algorithm to train the multi-agent Markov decision process model.
[0057] The lower-level group power allocation strategy design module designs the lower-level group power allocation strategy, which is used to dynamically allocate the total group power command output by the upper level to the adjustable frequency wind turbines in the group according to the real-time frequency adjustment capability of each wind turbine.
[0058] The cluster controller, deployed in each cluster, includes a pre-trained multi-agent Markov decision process model and a lower-level power allocation strategy within the cluster. It performs collaborative control of the optimal frequency trajectory of the wind farm and monitors the status of the wind turbines within the cluster in real time, dynamically updating the clustering results based on changes in the turbine status.
[0059] Compared with the prior art, the present invention has the following beneficial effects:
[0060] (1) Through dynamic grouping by contrastive clustering, this invention adaptively aggregates dozens to hundreds of wind turbines into a few groups (usually 5 to 12). Each group participates in upper-level collaboration as a virtual agent, which greatly reduces the state space and action dimension of reinforcement learning and makes multi-agent reinforcement learning applicable to large-scale wind farms.
[0061] (2) The present invention adopts data-driven contrastive clustering combined with model-free reinforcement learning. Only the locally measurable states of each wind turbine (rotational speed, wind speed, rotor kinetic energy state SoC, frequency deviation) are required for feature extraction, dynamic grouping, and collaborative control. No system parameter identification or modeling is required, which greatly reduces the difficulty of engineering implementation.
[0062] (3) The present invention uses a contrastive learning (SimCLR) pre-trained feature extraction network to automatically learn the deep feature representation of the wind turbine operating status in massive historical data, so that wind turbines with the same operating conditions naturally cluster in the feature space, while wind turbines with different operating conditions are far apart from each other, providing a better input for subsequent density clustering. Moreover, the feature extraction network has a small computational load and is suitable for online deployment.
[0063] (4) The upper layer of this invention uses the theoretically optimal frequency trajectory as the tracking target and trains the inter-group cooperative strategy through multi-agent reinforcement learning (MADDPG) to achieve adaptive allocation of frequency regulation tasks among groups according to the proportion of total kinetic energy. The lower layer adopts a dynamic weight allocation strategy based on rotor kinetic energy, speed margin, and inertia to decompose the total power command of the group to each wind turbine according to real-time capability, and monitors speed safety and dynamically adjusts the allocation in real time. This hierarchical architecture not only ensures the optimality of the global frequency trajectory, but also achieves fine local control, effectively avoiding excessive deceleration of wind turbines and secondary frequency drops.
Attached Figure Description
[0064] Figure 1 is an overall flowchart of the method of the present invention;
[0065] Figure 2 is a schematic diagram of dynamic grouping by contrastive clustering in the present invention;
[0066] Figure 3 is a diagram of the hierarchical collaborative control architecture of the present invention;
[0067] Figure 4 is a schematic diagram of the upper-layer multi-agent reinforcement learning training architecture;
[0068] Figure 5 is a flowchart of the lower-level power allocation strategy of the present invention;
[0069] Figure 6 is a flowchart of the dynamic screening of wind turbine frequency regulation capability according to the present invention;
[0070] Figure 7 is a schematic diagram of the online update of the streaming HDBSCAN of the present invention;
[0071] Figure 8 is a comparison graph of the system frequency response (simulation results).
Detailed Implementation
[0072] The present invention will be further described below with reference to the accompanying drawings and embodiments, but this should not be construed as limiting the present invention.
[0073] This embodiment provides a collaborative control method for the optimal frequency trajectory of wind farms based on contrastive clustering and hierarchical reinforcement learning. As shown in Figure 1, the steps are as follows:
[0074] Step 1: Establish criteria for determining the frequency regulation capability of wind turbines. Based on real-time operating data, dynamically select the set of wind turbines that currently have frequency regulation capability as the basis for subsequent clustering and control.
[0075] Step 11: Define the conditions under which the $i$-th wind turbine has frequency regulation capability at time $t$:
[0076] (1) Speed safety margin condition: $\omega_i(t) \ge \omega_{\min} + \Delta\omega_{\text{margin}}$, where $\omega_i(t)$ is the rotational speed of the $i$-th wind turbine at time $t$ and $\omega_{\min}$ is the minimum safe speed of the turbine. This condition ensures a safe operating speed margin and sufficient room for kinetic energy release;
[0077] (2) Rotor kinetic energy level condition: $SoC_i(t) \ge SoC_{\min}$, where $SoC_i(t) = \frac{\omega_i^2(t) - \omega_{\min}^2}{\omega_{i,0}^2 - \omega_{\min}^2}$, $SoC_{\min}$ is the minimum kinetic energy ratio threshold (usually 0.1 to 0.2), and $\omega_{i,0}$ is the initial rotational speed of the $i$-th wind turbine;
[0078] (3) Wind speed condition: $v_{\text{in}} \le v_i(t) \le v_{\text{out}}$, which ensures the wind turbine is in power generation mode, where $v_{\text{in}}$ and $v_{\text{out}}$ are the cut-in and cut-out wind speeds of the turbine and $v_i(t)$ is the wind speed at the $i$-th wind turbine at time $t$, in m/s;
[0079] (4) Equipment status condition: the wind turbine converter and pitch system are fault-free and not in power-limited operation mode (this can be determined from the wind turbine status word);
[0080] (5) Communication status conditions: The wind turbine communicates normally with the group controller, with no timeout or packet loss.
[0081] Step 12: Screening is performed online in real time. The frequency-adjustable set is updated at fixed intervals (e.g., 1 second) or when a wind turbine's status changes (e.g., a speed change exceeding a threshold, or a fault signal being triggered). The screening results serve as input to the subsequent clustering module.
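A minimal Python sketch of the Step 11 and Step 12 screening follows. It assumes the kinetic energy state is $SoC_i = (\omega_i^2 - \omega_{\min}^2)/(\omega_{i,0}^2 - \omega_{\min}^2)$, and the numeric defaults (`omega_min`, `margin`, `v_in`, `v_out`) are illustrative placeholders rather than values from the patent; only the 0.1 to 0.2 range for `soc_min` comes from the text. The names `TurbineState`, `kinetic_soc`, `is_adjustable`, and `screen` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TurbineState:
    omega: float       # rotational speed omega_i(t), pu
    omega0: float      # initial rotational speed omega_{i,0}, pu
    wind: float        # wind speed v_i(t), m/s
    fault_free: bool   # converter/pitch fault-free and not power-limited
    comm_ok: bool      # communicating normally with the group controller


def kinetic_soc(s: TurbineState, omega_min: float) -> float:
    """Assumed rotor kinetic energy state: (w^2 - w_min^2) / (w0^2 - w_min^2)."""
    denom = s.omega0 ** 2 - omega_min ** 2
    if denom <= 0.0:
        return 0.0  # no releasable kinetic energy above the safe minimum
    return (s.omega ** 2 - omega_min ** 2) / denom


def is_adjustable(s: TurbineState, omega_min=0.7, margin=0.05, soc_min=0.15,
                  v_in=3.0, v_out=25.0) -> bool:
    """True if the turbine state satisfies all five screening conditions."""
    return (
        s.omega >= omega_min + margin               # (1) speed safety margin
        and kinetic_soc(s, omega_min) >= soc_min    # (2) rotor kinetic energy level
        and v_in <= s.wind <= v_out                 # (3) power generation mode
        and s.fault_free                            # (4) equipment status
        and s.comm_ok                               # (5) communication status
    )


def screen(states, **limits):
    """Indices of turbines that currently have frequency regulation capability."""
    return [i for i, s in enumerate(states) if is_adjustable(s, **limits)]
```

Calling `screen` once per second, and again whenever a fault signal or a large speed change arrives, reproduces the time-triggered and event-triggered updates of Step 12.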
[0082] Step 2: Perform online clustering of the frequency-adjustable wind turbines selected in Step 1. As shown in Figure 2, a feature extraction network is trained on massive historical operating data using a contrastive learning framework to map the original state of each wind turbine to a low-dimensional feature space, and an adaptive density clustering algorithm is then applied in that feature space to cluster the wind turbines dynamically. This step integrates contrastive learning feature extraction and density clustering into a "contrastive clustering" method.
[0083] A maximum number of groups, N_max, is set; its value can be determined statistically from the wind farm's historical clustering results and is typically 1.5 to 2 times the typical number of groups. If the number of groups K obtained by contrastive clustering at the current moment is less than N_max, the virtual agents numbered m ∈ [K+1, N_max] are marked as "invalid agents". Their observation states are set to zero vectors, their action outputs are masked, and they do not interact with the environment; when the Critic network's Q value or the Actor network's policy gradient is computed, their loss terms are also masked to zero.
[0084] Step 21: The original feature vector $x_i(t)$ of the $i$-th wind turbine at time $t$ includes:
[0085] (1) Dynamic state features: rotational speed $\omega_i$ (normalized), wind speed $v_i$ (normalized), rotor kinetic energy state $SoC_i$, and local frequency deviation $\Delta f_i$;
[0086] (2) Static attributes: geographic coordinates (normalized), moment of inertia $J_i$ (normalized), and rated power (normalized).
[0087] The total feature dimension is $d = 7$ (rotational speed, wind speed, SoC, frequency deviation, x-coordinate, y-coordinate, inertia). Rated power is optional; if it is included, $d = 8$.
[0088] Step 22: A multilayer perceptron $f_\theta$ is used as the feature extractor, mapping the original features $x_i$ to a low-dimensional feature vector $z_i = f_\theta(x_i)$.
[0089] Contrastive learning pre-training was performed using the SimCLR framework:
[0090] (1) Collect a large number of wind turbine status samples from historical data to construct a training dataset $\mathcal{D}$.
[0091] (2) Apply two random data augmentations to each sample $x$ to obtain $\tilde{x}$ and $\tilde{x}'$ as a positive sample pair. Augmentation methods include adding Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma^2)$ with $\sigma$ between 0.01 and 0.05, random scaling, and random masking (randomly setting some dimensions to zero).
[0092] (3) A projection head $g$ (an additional 2-layer MLP with a 128-dimensional output, discarded after training) is connected after the feature extractor, yielding the representation $h = g(f_\theta(\tilde{x}))$ used in the contrastive loss.
[0093] (4) A batch contains $N$ original samples; augmenting each of them twice yields $2N$ augmented samples. For each positive sample pair $(i, j)$, the NT-Xent contrastive loss is computed:
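[0094] $\ell_{i,j} = -\log \frac{\exp(\mathrm{sim}(h_i, h_j)/\tau)}{\sum_{k=1}^{2N} \mathbb{1}[k \ne i]\,\exp(\mathrm{sim}(h_i, h_k)/\tau)}$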
[0095] where $\mathrm{sim}(\cdot,\cdot)$ denotes cosine similarity and $\tau$ is the temperature parameter (usually 0.5).
[0096] (5) The total loss is the average over all positive sample pairs: $\mathcal{L} = \frac{1}{2N}\sum_{k=1}^{N}\left[\ell_{2k-1,2k} + \ell_{2k,2k-1}\right]$;
[0097] (6) Train with the Adam optimizer until convergence (usually 50 to 100 epochs).
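The augmentation and loss in items (2) to (5) can be sketched in NumPy as follows. This assumes a random scaling range of 0.9 to 1.1 and a masking probability of 0.1 (neither is given in the text), uses the stated noise range and temperature, and orders the $2N$ projections so that rows $2k$ and $2k+1$ are the two views of one sample. Training the encoder and projection head (for example with Adam in a deep learning framework) is not shown; `augment` and `nt_xent` are illustrative names.

```python
import numpy as np


def augment(x, rng, sigma=0.03, scale=(0.9, 1.1), mask_p=0.1):
    """One random view of a batch x (N, d): Gaussian noise, random scaling, random masking."""
    x = x + rng.normal(0.0, sigma, size=x.shape)                     # additive Gaussian noise
    x = x * rng.uniform(scale[0], scale[1], size=(x.shape[0], 1))    # per-sample scaling
    keep = rng.random(x.shape) >= mask_p                             # zero out random dimensions
    return x * keep


def nt_xent(h, tau=0.5):
    """NT-Xent loss for 2N projections; rows 2k and 2k+1 form a positive pair."""
    h = h / np.linalg.norm(h, axis=1, keepdims=True)   # unit rows: dot product = cosine similarity
    sim = h @ h.T / tau
    np.fill_diagonal(sim, -np.inf)                     # drop k == i from the denominator
    n2 = h.shape[0]
    partner = np.arange(n2) ^ 1                        # 0<->1, 2<->3, ...
    row_max = sim.max(axis=1, keepdims=True)           # numerically stable log-sum-exp
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    losses = log_denom - sim[np.arange(n2), partner]   # -log softmax at the positive partner
    return float(losses.mean())                        # average over all 2N anchors
```

With projections `h1` and `h2` of shape `(N, p)` for the two views, `np.stack([h1, h2], axis=1).reshape(2 * N, -1)` produces the interleaved ordering that `nt_xent` expects.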
[0098] After training, the feature extractor $f_\theta$ is retained and the projection head $g$ is discarded. The feature extractor maps wind turbine operating states into a semantically meaningful low-dimensional space in which turbines with similar operating conditions lie close together.
[0099] Step 23: After deployment, the real-time state $x_i(t)$ of each wind turbine is fed into the feature extractor $f_\theta$ to obtain the feature vector $z_i(t)$, which serves as the input for subsequent clustering.
[0100] Step 24: The streaming HDBSCAN algorithm clusters the frequency-adjustable wind turbines online in the feature space, automatically discovering clusters of arbitrary shape without presetting the number of clusters.
[0101] (1) Core parameters: the minimum cluster size, typically 5 to 10, sets the minimum number of points in a cluster; the core distance parameter $k$ is usually set equal to the minimum cluster size.
[0102] (2) The core distance $\text{core}_k(i)$ is the Euclidean distance from point $i$ to its $k$-th nearest neighbor, and the reachability distance is $d_{\text{reach}}(i,j) = \max\{\text{core}_k(i), \text{core}_k(j), d(i,j)\}$, where $d(i,j)$ is the Euclidean distance between points $i$ and $j$ and $k$ is the preset minimum cluster size.
[0103] (3) Initial cluster construction:
[0104] 1. Calculate the reachability distances between all pairs of points in the current set of frequency-adjustable wind turbines;
[0105] 2. Construct a weighted adjacency matrix for the reachability graph, where the edge weights are the reachability distances;
[0106] 3. Construct a minimum spanning tree (MST) of the reachability graph using Prim's algorithm, and sort its edges in ascending order of reachability distance;
[0107] 4. Build the cluster hierarchy on the MST: deleting edges from the largest reachability distance downward splits the tree into progressively smaller connected components;
[0108] 5. For each component (cluster) $C$, calculate its stability $S(C) = \sum_{i \in C}\left(\lambda_{\text{leave}}(i) - \lambda_{\text{enter}}(i)\right)$, where $\lambda = 1/d_{\text{reach}}$, and $\lambda_{\text{enter}}(i)$ and $\lambda_{\text{leave}}(i)$ are the thresholds at which point $i$ enters and leaves the cluster, respectively;
[0109] 6. Select the clusters with the highest stability as the final cluster partition, and mark points not assigned to any cluster as "noise points".
[0110] (4) When a new wind turbine joins the frequency-adjustable set, the following steps are performed:
[0111] 1. Calculate the Euclidean distances from the new point $z_{\text{new}}$ to all existing points;
[0112] 2. Find the $k$-th nearest neighbor of $z_{\text{new}}$ and calculate its core distance $\text{core}_k(z_{\text{new}})$;
[0113] 3. Calculate the reachability distances between $z_{\text{new}}$ and all existing points;
[0114] 4. Insert $z_{\text{new}}$ into the existing minimum spanning tree: find the edge connecting it to the nearest point in the tree and insert that edge according to its distance;
[0115] 5. Update the cluster assignments of the affected region: rerun hierarchical clustering and the stability calculation on the local region around the new point to determine which cluster it belongs to or whether it forms a new cluster.
[0116] (5) When a wind turbine leaves the frequency-adjustable set, its node and associated edges are first deleted from the minimum spanning tree; the disconnected subtrees are then reconnected using the next-smallest available edge; finally, the cluster assignments of the affected region are updated.
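The construction in item (3) and the insertion in item (4) can be sketched with NumPy as below. This is a minimal illustration under stated assumptions, not the patent's implementation: it builds the mutual reachability matrix, runs Prim's algorithm on it, shows how deleting the heaviest MST edges yields one level of the cluster hierarchy (`cut_level`), and attaches a new turbine to the tree through its cheapest reachability edge. The stability-based selection of steps 5 and 6, the local re-clustering in item (4), and the deletions of item (5) are omitted. All function names are hypothetical, and the insertion leaves the core distances of existing points unchanged, an approximation that a periodic full rebuild would correct.

```python
import numpy as np


def mutual_reachability(z, k):
    """Return (R, core) with R[i, j] = max(core_k(i), core_k(j), ||z_i - z_j||)."""
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    core = np.sort(d, axis=1)[:, k]        # column 0 is the point itself, column k is the k-th neighbour
    return np.maximum(d, np.maximum(core[:, None], core[None, :])), core


def prim_mst(w):
    """Prim's algorithm on a dense symmetric weight matrix; edges returned in ascending weight."""
    n = w.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = w[0].copy()                     # cheapest known edge from the tree to each node
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j, float(best[j])))
        in_tree[j] = True
        closer = ~in_tree & (w[j] < best)  # relax edges through the newly added node
        best[closer] = w[j][closer]
        parent[closer] = j
    return sorted(edges, key=lambda e: e[2])


def cut_level(edges, n, eps):
    """Component labels after deleting all MST edges heavier than eps (one hierarchy level)."""
    root = list(range(n))

    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]        # path halving
            a = root[a]
        return a

    for i, j, wt in edges:
        if wt <= eps:
            root[find(i)] = find(j)
    roots = [find(a) for a in range(n)]
    relabel = {r: c for c, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]


def insert_point(z, core, edges, z_new, k):
    """Approximate streaming insertion of one new turbine (steps 1 to 4 of item (4))."""
    d_new = np.linalg.norm(z - z_new, axis=1)                 # 1. distances to existing points
    core_new = np.sort(d_new)[k - 1]                           # 2. k-th nearest neighbour (self excluded)
    reach = np.maximum(d_new, np.maximum(core, core_new))      # 3. reachability to existing points
    j = int(np.argmin(reach))                                  # 4. attach through the cheapest edge
    new_edges = sorted(edges + [(j, len(z), float(reach[j]))], key=lambda e: e[2])
    return np.vstack([z, z_new]), np.append(core, core_new), new_edges
```

For the periodic full rebuild, scikit-learn 1.3 and later provides `sklearn.cluster.HDBSCAN` (with `min_cluster_size` and `min_samples` parameters), which performs the stability-based cluster selection omitted here.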
[0117] To avoid error accumulation, clustering is rerun on the entire current set of frequency-adjustable wind turbines at fixed intervals (e.g., every 5 minutes) or when the cumulative number of insertions/deletions exceeds a threshold (e.g., 20), resetting the minimum spanning tree and the cluster partition.
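The rebuild trigger in [0117] reduces to a small piece of bookkeeping. The sketch below uses the example values from the text (300 s and more than 20 operations) as defaults; the class and method names are hypothetical.

```python
class RebuildScheduler:
    """Decide when to rerun full clustering: every period_s seconds, or once the
    number of incremental insertions/deletions exceeds max_updates."""

    def __init__(self, period_s=300.0, max_updates=20):
        self.period_s = period_s
        self.max_updates = max_updates
        self.last_rebuild = 0.0   # time of the last full rebuild, s
        self.updates = 0          # incremental operations since that rebuild

    def record_update(self):
        """Call after each streaming insertion or deletion."""
        self.updates += 1

    def should_rebuild(self, now):
        return (now - self.last_rebuild >= self.period_s) or (self.updates > self.max_updates)

    def mark_rebuilt(self, now):
        """Reset both counters after the MST and cluster partition are rebuilt."""
        self.last_rebuild = now
        self.updates = 0
```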
[0118] Step 25: Wind turbines not assigned to any cluster (noise points) do not participate in upper-level collaborative control and operate only in local maximum power point tracking (MPPT) mode, a well-known control mode in this field, so that frequency regulation commands are not issued to unsuitable wind turbines. A buffering mechanism can be set for noise points: if a turbine is detected as a noise point several consecutive times, an alarm is triggered and its status is checked.
[0119] Step 26: Each cluster $G_m$ is treated as a group, and its aggregated observation $o_m$ is composed of the following parts:
[0120] (1) Average pooling of all wind turbine feature vectors in the group: $\bar{z}_m = \frac{1}{|G_m|}\sum_{i \in G_m} z_i$;
[0121] (2) The average rotational speed $\bar{\omega}_m$, the group's proportion of the total kinetic energy $\rho_m$, and the average frequency deviation $\overline{\Delta f}_m$. The aggregated observation state of each group is therefore $o_m = [\bar{z}_m, \bar{\omega}_m, \rho_m, \overline{\Delta f}_m]$, where $o_m \in \mathbb{R}^{d_z + 3}$ and $d_z$ is the dimension of $z_i$.
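A sketch of the Step 26 aggregation, assuming $\rho_m$ is the group's share of the available kinetic energy summed over all frequency-adjustable turbines and that the per-turbine available kinetic energy `e_kin` is already known; the function name is hypothetical. If $z_i$ were 3-dimensional (the text does not give $d_z$), $o_m$ would have the 6 entries that the Actor input layer in Step 51 expects.

```python
import numpy as np


def group_observation(z, omega, e_kin, df, members):
    """o_m = [mean member feature vector, mean speed, kinetic energy share, mean frequency deviation]."""
    idx = np.asarray(members)
    z_bar = z[idx].mean(axis=0)              # average pooling over the group
    omega_bar = omega[idx].mean()
    rho = e_kin[idx].sum() / e_kin.sum()     # share of total available kinetic energy (assumed definition)
    df_bar = df[idx].mean()
    return np.concatenate([z_bar, [omega_bar, rho, df_bar]])
```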
[0122] Step 3: Treat each group as a virtual agent and construct a Dec-POMDP model for upper-level collaboration, as shown in Figure 3. A maximum number of groups $N_{\max}$ is set, and invalid agents are handled using a mask.
[0123] Step 31: The upper-level global state is $s = [o_1, o_2, \dots, o_{N_{\max}}]$; the observations of groups that do not exist are set to zero vectors.
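Zero-padding and masking of invalid agents (Step 31 and paragraph [0083]) can be sketched as follows. The boolean mask marks existing groups, and the flattened array serves as the global state for a centralized Critic; the function names and the flattening layout are assumptions.

```python
import numpy as np


def pad_observations(obs_list, n_max, d):
    """Stack K group observations into (n_max, d); rows K..n_max-1 stay zero (invalid agents)."""
    k = len(obs_list)
    if k > n_max:
        raise ValueError(f"{k} groups exceed N_max = {n_max}")
    obs = np.zeros((n_max, d))
    if k:
        obs[:k] = np.asarray(obs_list, dtype=float)
    mask = np.arange(n_max) < k          # True marks a valid (existing) group
    return obs, mask


def global_state(obs):
    """Upper-level global state s = [o_1, ..., o_Nmax] as one flat vector."""
    return obs.reshape(-1)
```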
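[0124] Step 32: The action $a_m = \Delta P_m$ of the $m$-th virtual agent is the group's total frequency regulation power command, subject to the following constraints:
[0125] (1) Group total power limit: $\Delta P_m$ is bounded by the sum of the power limits of the individual wind turbines in the group;
[0126] (2) Group total ramp-rate constraint: the rate of change of $\Delta P_m$ is bounded by the sum of the ramp-rate limits of the individual wind turbines in the group.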
[0127] Step 33: The system frequency dynamics are described by the swing equation:
[0128]
[0129] where the synchronous generator governor response term has no precise model and is treated as a black box within the environment.
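To make the black-box environment concrete, the sketch below integrates a swing equation of the assumed form $2H\,\frac{d\Delta f}{dt} = \Delta P_{G} + \Delta P_{W} - \Delta P_{L} - D\,\Delta f$ with forward Euler. The equation form, the first-order governor stand-in, and all parameter values ($H$, $D$, $R$, $T_g$, the load step) are illustrative assumptions, since the text treats the governor as a black box and does not reproduce the equation.

```python
class FirstOrderGovernor:
    """Hypothetical stand-in for the black-box governor: droop 1/R behind a first-order lag Tg."""

    def __init__(self, R=0.05, Tg=5.0):
        self.R, self.Tg, self.dp = R, Tg, 0.0

    def __call__(self, df, dt):
        self.dp += (-df / self.R - self.dp) * dt / self.Tg   # forward-Euler lag update
        return self.dp


def simulate_frequency(p_load, wind_support, gov, H=4.0, D=1.0, dt=0.01, t_end=20.0):
    """Forward-Euler integration of 2H d(df)/dt = dP_gov + dP_wind - dP_load - D*df (pu)."""
    df, t, traj = 0.0, 0.0, []
    while t < t_end:
        dp_gov = gov(df, dt)               # governor response, opaque to the agents
        dp_wind = wind_support(t, df)      # total wind farm support, i.e. sum of group commands
        df += (dp_gov + dp_wind - p_load - D * df) / (2.0 * H) * dt
        t += dt
        traj.append(df)
    return traj
```

Use a fresh governor instance per run, because it stores its internal state between calls.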
[0130] Step 34: The state transition is implicitly determined by the system dynamics, the total power response of each group, and the intra-group allocation results, without the need for explicit modeling.
[0131] The Dec-POMDP model uses a reinforcement learning discount factor $\gamma$.
[0132] Step 4: Design an upper-level multi-objective reward function. The upper-level reward function is used to guide the inter-group cooperative strategy so that the system frequency trajectory approximates the theoretical optimal trajectory.
[0133] Step 41: The optimal frequency trajectory that maximizes the lowest frequency point (the frequency nadir) takes an exponential approximation form:
[0134]
[0135] where three of the parameters are design parameters, typically taken as 1.1 to 1.3, and the remaining symbols denote the overall gain of the system's primary frequency regulation, the system inertia, the damping coefficient, and an adjustment coefficient.
[0136] Step 42: Design trajectory tracking rewards:
[0137]
[0138] where the weighting coefficient takes 50 to 200; this reward guides the system frequency to track the optimal trajectory.
[0139] Step 43: Design inter-group energy conservation rewards:
[0140] (1) Continuous penalty items:
[0141]
[0142] (2) Terminal sparse reward (computed at the frequency regulation end time $T_{\text{end}}$):
[0143]
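[0144] where the continuous penalty coefficient takes 20 to 100, the terminal reward coefficient takes 50 to 200, and the tolerance takes 0.01 pu·s.
[0145] Step 44: Design the inter-group collaboration reward, which encourages groups with sufficient total kinetic energy to undertake more of the frequency regulation task: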
[0146]
[0147] where $\rho_m$ is the group's proportion of the total available kinetic energy, $\Delta f_{\text{th}}$ is the frequency regulation start threshold (usually -0.03 Hz), $\mathbb{1}(\cdot)$ is the indicator function, and the weighting coefficient takes 5 to 20.
[0148] Step 45: If a frequency-adjustable wind turbine in a group exceeds its speed limits, a penalty is imposed:
[0149]
[0150] where the penalty coefficient takes 100 to 500.
[0151] Step 46: Design a smoothness reward:
[0152]
[0153] where the coefficient takes 0.5 to 5; this term suppresses power spikes.
[0154] Step 47: Design the overall upper-level reward:
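[0155] $r_t = r_{\text{track}} + r_{\text{energy}} + r_{\text{coop}} + r_{\text{safe}} + r_{\text{smooth}}$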
[0156] and the additional terminal sparse reward is added at $T_{\text{end}}$.
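The individual reward formulas of Steps 42 to 46 are not recoverable from the text, so the sketch below only illustrates how the Step 47 total could be assembled. The quadratic tracking term, the per-violation safety penalty, and the squared-change smoothness term are assumed forms; the energy and collaboration terms are passed in precomputed; the coefficients sit inside the stated ranges; and `omega_min` and `omega_max` are placeholders.

```python
import numpy as np


def upper_reward(df, df_ref, omegas, p_now, p_prev, r_energy, r_coop, *,
                 a1=100.0, a4=200.0, a5=1.0, omega_min=0.7, omega_max=1.2,
                 terminal=False, r_terminal=0.0):
    """Total upper-level reward r_t for one step (assumed term forms)."""
    r_track = -a1 * (df - df_ref) ** 2                        # assumed quadratic tracking error
    omegas = np.asarray(omegas, dtype=float)
    n_viol = int(np.sum((omegas < omega_min) | (omegas > omega_max)))
    r_safe = -a4 * n_viol                                     # assumed penalty per speed violation
    step = np.asarray(p_now, dtype=float) - np.asarray(p_prev, dtype=float)
    r_smooth = -a5 * float(np.sum(step ** 2))                 # assumed penalty on command changes
    r = r_track + r_energy + r_coop + r_safe + r_smooth
    if terminal:
        r += r_terminal                                       # extra sparse reward at T_end
    return float(r)
```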
[0157] Step 5: Train the upper-level inter-group cooperative policy with a multi-agent deep reinforcement learning algorithm using centralized training and decentralized execution. Each virtual agent (group) contains an Actor network and a Critic network. As shown in Figure 4, a masking mechanism handles the variable number of groups.
[0158] Step 51: The Actor network has a 6-dimensional input layer, hidden layers of [128, 64], and a 1-dimensional output layer (the group's total power command) whose tanh-activated output is scaled to the group's total power range. The Critic network has an input dimension that depends on $N_{\max}$, hidden layers of [256, 128], and outputs the Q value.
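A NumPy forward pass matching the Actor sizes in Step 51 (6 inputs, hidden layers of 128 and 64, one tanh output) might look like this. ReLU hidden activations, the weight initialization, and the affine mapping of $[-1, 1]$ onto a group power range `[p_lo, p_hi]` are assumptions; a trainable version would use a deep learning framework.

```python
import numpy as np


def init_actor(rng, sizes=(6, 128, 64, 1)):
    """Random (weight, bias) pairs for an MLP with the Step 51 layer sizes."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def actor_forward(params, obs, p_lo, p_hi):
    """Map group observations (batch, 6) to group power commands in [p_lo, p_hi]."""
    h = np.asarray(obs, dtype=float)
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)                # ReLU hidden layers (assumed)
    w, b = params[-1]
    a = np.tanh(h @ w + b)                            # raw action in [-1, 1]
    return p_lo + (a + 1.0) * 0.5 * (p_hi - p_lo)     # affine map onto the power range
```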
[0159] Step 52: A centralized training, decentralized execution framework is employed. During training, trajectories are sampled from the experience replay buffer; at each time step the loss is computed only over the groups that actually exist, and the loss terms of invalid agents are masked to zero.
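Masking invalid agents out of the Critic loss can be sketched as a masked mean of squared TD errors over a `(batch, N_max)` array. The squared-error form is the usual DDPG/MADDPG critic loss and is assumed here rather than stated in the text; the function name is hypothetical.

```python
import numpy as np


def masked_mse(q_pred, q_target, mask):
    """Mean squared TD error over valid agents only; invalid agents contribute exactly zero."""
    m = np.asarray(mask, dtype=float)
    err = (np.asarray(q_pred, dtype=float) - np.asarray(q_target, dtype=float)) ** 2 * m
    return float(err.sum() / max(m.sum(), 1.0))   # guard against a batch with no valid agents
```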
[0160] Step 6: Design the lower-level intra-group power allocation strategy. The group total power command output by the upper level must be allocated reasonably among the frequency-regulating wind turbines in the group, taking into account the real-time frequency regulation capability of each turbine, as shown in Figure 5.
[0161] Step 61: Weight Calculation. The weights are calculated based on the current rotor kinetic energy, speed margin, and inertia of the wind turbine.
[0162]
[0163] where the average inertia of the frequency-regulating wind turbines within the group serves as the inertia reference.
[0164] Step 62: Perform a preliminary allocation based on the weights:
[0165]
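Steps 61 and 62 can be sketched as below. The weight formula did not survive extraction, so this sketch makes two assumptions:
- The kinetic-energy state is the ratio of usable rotor energy above the minimum safe speed, (ω² - ω_min²)/(ω0² - ω_min²), which also reflects the speed margin.
- The weight is that ratio multiplied by the turbine's inertia relative to the group average.

It also assumes ω0 > ω_min, which should hold for turbines admitted by the Step 1 criteria. The function names are hypothetical.

```python
import numpy as np


def kinetic_state(omega, omega0, omega_min):
    """Assumed rotor kinetic-energy state: usable energy now / usable energy at disturbance onset."""
    return (omega**2 - omega_min**2) / (omega0**2 - omega_min**2)


def allocation_weights(omega, omega0, omega_min, H):
    """Normalised weights from kinetic-energy state (which embeds the speed margin) and relative inertia."""
    omega, omega0, H = (np.asarray(x, float) for x in (omega, omega0, H))
    w = np.clip(kinetic_state(omega, omega0, omega_min), 0.0, None) * H / H.mean()
    total = w.sum()
    return w / total if total > 0 else np.zeros_like(w)


def preliminary_allocation(p_group, weights):
    """Step 62: split the group total power command in proportion to the weights."""
    return p_group * np.asarray(weights, float)
```

A turbine sitting at the minimum safe speed receives zero weight. Among turbines with equal speed, the one with larger inertia takes a proportionally larger share.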
[0166] Step 63: Check whether the preliminary allocation of each wind turbine satisfies its power limits and the speed safety estimate:
[0167] (1) If the allocated power exceeds the turbine's upper power limit, set it to the upper limit and record the excess;
[0168] (2) If the allocated power falls below the turbine's lower power limit, set it to the lower limit and record the shortfall;
[0169] (3) Predict the change in rotational speed: estimate the expected speed approximately from the rotor motion equation, and if the expected speed is lower than the minimum safe speed, reduce the turbine's power to a safe value.
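Item (3) can be sketched with the per-unit rotor motion equation 2Hω dω/dt = P_m - P_e. The sketch assumes:
- Mechanical power stays constant over a short prediction horizon dt.
- The turbine delivers p_extra above that mechanical power.

Under these assumptions ω² falls linearly at rate p_extra/H. This gives both a speed forecast and the largest extra power that keeps the forecast at or above ω_min. The horizon, the per-unit base, and the function names are assumptions, and the change in aerodynamic power with speed is ignored.

```python
import numpy as np


def predict_speed(omega, p_extra, H, dt):
    """Per-unit rotor motion 2H*omega*domega/dt = Pm - Pe, with Pe = Pm + p_extra held over dt.

    Integrating d(omega^2)/dt = -p_extra/H gives omega^2(t+dt) = omega^2 - p_extra*dt/H.
    """
    w2 = np.asarray(omega, float) ** 2 - np.asarray(p_extra, float) * dt / H
    return np.sqrt(np.maximum(w2, 0.0))


def safe_power_cap(omega, omega_min, H, dt):
    """Largest extra power keeping the predicted speed at or above omega_min over the horizon dt."""
    return np.maximum(H * (np.asarray(omega, float) ** 2 - omega_min**2) / dt, 0.0)


def enforce_speed_safety(p_alloc, omega, omega_min, H, dt):
    """Item (3): cut any allocation whose predicted speed would fall below omega_min."""
    return np.minimum(np.asarray(p_alloc, float), safe_power_cap(omega, omega_min, H, dt))
```

For example, with H = 4 s, ω = 1.0 pu, ω_min = 0.7 pu and a 10 s horizon, the cap is 4 × (1 - 0.49) / 10 = 0.204 pu.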
[0170] The excess or shortfall is then redistributed by iterative proportional allocation or quadratic programming until all wind turbines satisfy their constraints and the total power equals the required group command.
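The iterative proportional redistribution can be sketched as below. The sketch assumes a non-negative group command and only an upper power cap per turbine. That cap could be, for example, the smaller of the turbine's power limit and its speed-safe cap. The lower-limit case in item (2) would be handled symmetrically. The function name and tolerance are hypothetical; quadratic programming is the alternative the text mentions.

```python
import numpy as np


def proportional_allocate(p_total, weights, p_max, max_iter=50, tol=1e-9):
    """Iterative proportional allocation with per-turbine caps.

    Power above a turbine's cap is clipped and the excess is re-split over the
    turbines that are still below their caps, in proportion to their weights.
    Returns (allocation, unallocated power); the latter is > 0 only if the
    command exceeds what the capped turbines can absorb.
    """
    weights = np.asarray(weights, float)
    p_max = np.asarray(p_max, float)
    alloc = np.zeros_like(weights)
    free = weights > 0                      # turbines still able to take more power
    remaining = float(p_total)
    for _ in range(max_iter):
        if remaining <= tol or not free.any():
            break
        share = weights * free
        alloc = alloc + remaining * share / share.sum()
        over = alloc > p_max
        remaining = float((alloc - p_max)[over].sum())   # excess to redistribute next pass
        alloc = np.minimum(alloc, p_max)
        free &= alloc < p_max - tol
    return alloc, remaining
```

A non-zero second return value tells the group controller that the command is infeasible for the current group. That is the situation in which reporting to the central monitoring unit becomes relevant.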
[0171] Step 64: If a wind turbine in the group loses its frequency regulation capability (e.g., its speed drops to the lower limit), immediately remove it from the current group and recalculate the weights; its remaining power is then supplied by the other frequency-regulating wind turbines. At the same time, report the change to the central monitoring unit so that the cluster structure can be adjusted at the next clustering update, as shown in Figure 6.
[0172] Step 7: Deploy the trained upper-level policy and lower-level allocation policy to the wind farm group controller, and dynamically update the clustering results according to the changes in wind turbine status. In this deployment step, each group is equipped with a group controller, which is responsible for group state aggregation, upper-level policy inference, lower-level power allocation, and communication with wind turbines; at the same time, a central monitoring unit is set up to be responsible for periodically triggering re-clustering and updating model parameters.
[0173] This embodiment also provides a wind farm optimal frequency trajectory collaborative control system based on contrastive clustering and hierarchical reinforcement learning, including:
[0174] The wind turbine status monitoring module is used to collect real-time operating data of each wind turbine and evaluate its frequency regulation capability;
[0175] The dynamic filtering module generates a set of wind turbines that currently have frequency regulation capabilities based on the criteria for determining wind turbine frequency regulation capabilities.
[0176] The contrastive clustering module includes a pre-trained feature extraction network and an adaptive density clusterer, and is used for online feature extraction and dynamic clustering of the wind turbines with frequency regulation capability;
[0177] The upper-level collaborative training module defines the aggregated observation state of each group, treats each group as a virtual agent, constructs a multi-agent Markov decision process model for upper-level group collaboration, designs a multi-objective reward function to guide the multi-agent Markov decision process, makes the wind power frequency trajectory approximate the theoretical optimal trajectory, and uses a multi-agent deep reinforcement learning algorithm to train the multi-agent Markov decision process model.
[0178] The lower-level group power allocation strategy design module designs the lower-level intra-group power allocation strategy, which dynamically allocates the group total power command output by the upper level to the frequency-regulating wind turbines in the group according to the real-time frequency regulation capability of each wind turbine.
[0179] The cluster controller, deployed in each cluster, includes a pre-trained multi-agent Markov decision process model and a lower-level power allocation strategy within the cluster. It performs collaborative control of the optimal frequency trajectory of the wind farm and monitors the status of the wind turbines within the cluster in real time, dynamically updating the clustering results based on changes in the turbine status.
[0180] The system also includes a central monitoring unit responsible for periodically triggering clustering updates and model fine-tuning, and a wind turbine local controller that receives and executes commands and switches smoothly to and from MPPT control. In practical deployment, the central monitoring unit is located in the wind farm control center and equipped with a GPU server. It maintains the parameters of the global feature extraction network, periodically (e.g., every 5 minutes) triggers a full HDBSCAN re-clustering (as shown in Figure 7), receives the wind turbine status changes reported by each group controller, and issues updated cluster configurations and model parameters. Each group controller is deployed in the local control cabinet of its group and equipped with an embedded industrial computer. It performs the following tasks:
- collects the status of each wind turbine in the group in real time (every 0.1 s);
- runs forward inference of the feature extraction network (<5 ms);
- maintains the local minimum spanning tree and cluster assignment (incremental update);
- computes the group aggregated observation;
- runs forward inference of the upper-level Actor network (<5 ms) to generate the group total power command;
- executes the lower-level allocation algorithm (<2 ms), decomposing the command into individual wind turbine commands and issuing them;
- monitors the safety status within the group and triggers local protection or reallocation when necessary.

The wind turbine local controller receives commands from the group controller and switches smoothly to and from MPPT control.
[0181] To verify the effectiveness of the wind farm frequency support control method based on contrastive clustering and hierarchical reinforcement learning proposed in this invention, a simulation platform based on the system frequency response (SFR) model was built in Python. The equivalent system parameters were selected to represent a typical power grid: the inertia time constant, the load damping coefficient, the governor droop coefficient, and the governor equivalent time constant. The power deficit was set to correspond to a 10% load surge, and the rated frequency is 50 Hz. The total capacity of the wind farm accounts for 8% of the system capacity. The simulation lasts 190 s and compares the conventional stepwise inertial control (SIC) method with the method of this invention.
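A minimal SFR model of the kind described can be simulated as below. It includes inertia, load damping, governor droop, and a first-order governor time constant. The parameter defaults are illustrative placeholders, not the values used in the document's study. The optional `wind(t, df)` callback is a hypothetical hook for a wind-farm support law. Forward Euler is used for simplicity.

```python
import numpy as np


def simulate_sfr(H=5.0, D=1.0, R=0.05, T_g=8.0, dP=0.1, wind=None,
                 t_end=120.0, dt=0.01, f0=50.0):
    """Forward-Euler SFR model with a first-order governor (per-unit except f0 in Hz).

    2H d(df)/dt   = dPm + dPw - dP - D*df
    T_g d(dPm)/dt = -df/R - dPm
    wind(t, df) optionally returns the wind-farm support power dPw (pu).
    Returns (time array, frequency array in Hz).
    """
    n = int(round(t_end / dt))
    t = np.arange(n + 1) * dt
    df_hist = np.zeros(n + 1)
    df = dPm = 0.0
    for k in range(n):
        dPw = wind(t[k], df) if wind is not None else 0.0
        d_df = (dPm + dPw - dP - D * df) / (2.0 * H)
        d_Pm = (-df / R - dPm) / T_g
        df += dt * d_df
        dPm += dt * d_Pm
        df_hist[k + 1] = df
    return t, f0 * (1.0 + df_hist)
```

Without wind support the frequency settles at f0·(1 - dP/(D + 1/R)) after an underdamped nadir. Passing a simple droop-like stand-in such as `wind=lambda t, df: -10.0 * df` raises the nadir. That stand-in is not SIC or the proposed controller.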
[0182] Figure 8 shows the system frequency response curves under the two control strategies, and Table 1 lists the corresponding data. As shown in Figure 8 and Table 1, after the disturbance the frequency nadir under SIC occurred at 45 s, at 49.63 Hz. The nadir under the proposed method also occurred at 45 s, at 49.66 Hz, which is 0.03 Hz higher than SIC. During the recovery phase, SIC eventually settled at 49.76 Hz, while the proposed method recovered to 49.80 Hz, reducing the steady-state frequency deviation by 0.04 Hz. Furthermore, the proposed method exhibited a smoother recovery between 45 s and 60 s without a secondary drop. Throughout the simulation, the net energy integral of the wind farm remained close to zero, satisfying the rotor kinetic energy conservation constraint.
[0183] Table 1 Comparison of system frequency data under two control strategies
[0184]
Time (s)   SIC frequency (Hz)   Frequency of the proposed method (Hz)
0          50.00                50.00
10         50.02                50.01
20         50.01                50.01
30         50.01                50.01
40         49.95                49.90
45         49.63                49.66
50         49.63                49.67
55         49.78                49.80
60         49.79                49.82
65         49.77                49.81
70         49.76                49.80
75         49.76                49.80
80-190     49.76                49.80
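The nadir and steady-state differences quoted for Figure 8 can be checked directly against Table 1. The short script below reads the tabulated samples, treating the 80 s value as holding until 190 s, and reproduces the 0.03 Hz and 0.04 Hz figures. The variable and function names are illustrative.

```python
# Table 1 samples; the 80 s value is held until 190 s.
times = [0, 10, 20, 30, 40, 45, 50, 55, 60, 65, 70, 75, 80]
sic = [50.00, 50.02, 50.01, 50.01, 49.95, 49.63, 49.63, 49.78, 49.79, 49.77, 49.76, 49.76, 49.76]
proposed = [50.00, 50.01, 50.01, 50.01, 49.90, 49.66, 49.67, 49.80, 49.82, 49.81, 49.80, 49.80, 49.80]


def nadir(ts, fs):
    """Return (time, frequency) of the first occurrence of the minimum frequency."""
    i = min(range(len(fs)), key=fs.__getitem__)
    return ts[i], fs[i]


t_sic, f_sic = nadir(times, sic)
t_prop, f_prop = nadir(times, proposed)
nadir_gain = round(f_prop - f_sic, 2)           # rise of the lowest frequency point, Hz
steady_gain = round(proposed[-1] - sic[-1], 2)  # reduction of steady-state deviation, Hz
print(t_sic, f_sic, t_prop, f_prop, nadir_gain, steady_gain)
# prints: 45 49.63 45 49.66 0.03 0.04
```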
[0185] The above results demonstrate that the method of the present invention can effectively improve the lowest frequency point of the system and enhance steady-state recovery performance, thus verifying its superiority in frequency support control.
[0186] By learning deep features automatically through contrastive learning, grouping wind turbines adaptively, and integrating both with hierarchical reinforcement learning, this invention achieves collaborative tracking of the optimal frequency trajectory. This significantly raises the system's frequency nadir and avoids secondary drops, while offering a simple structure, high training efficiency, and good engineering applicability.
[0187] The above are merely preferred embodiments of the present invention. The scope of protection of the present invention is not limited to the above embodiments. All technical solutions falling within the scope of the present invention's concept are within the scope of protection of the present invention. It should be noted that for those skilled in the art, any improvements and modifications made without departing from the principles of the present invention should be considered within the scope of protection of the present invention.
Claims
1. A method for coordinated control of the optimal frequency trajectory of a wind farm based on contrastive clustering and hierarchical reinforcement learning, characterized in that it comprises: Step 1: establishing criteria for determining the frequency regulation capability of wind turbines, dynamically selecting, based on real-time operating data, the set of wind turbines that currently have frequency regulation capability, and operating wind turbines without frequency regulation capability in MPPT mode; Step 2: mapping the original states of the wind turbines with frequency regulation capability to a low-dimensional feature space with a feature extraction network, and then dynamically grouping the wind turbines with an adaptive density clustering algorithm to obtain wind turbine groups; Step 3: defining the aggregated observation state of each group, treating each group as a virtual agent, and constructing an upper-level multi-agent Markov decision process model for inter-group collaboration; Step 4: designing a multi-objective reward function to guide the multi-agent Markov decision process so that the wind power frequency trajectory approaches the theoretical optimal trajectory; Step 5: training the multi-agent Markov decision process model with a multi-agent deep reinforcement learning algorithm; Step 6: designing a lower-level intra-group power allocation strategy that dynamically allocates the group total power command output by the upper level to the frequency-regulating wind turbines in the group according to the real-time frequency regulation capability of each wind turbine; Step 7: deploying the trained multi-agent Markov decision process model and the lower-level intra-group power allocation strategy to the wind farm group controllers to perform collaborative control of the optimal frequency trajectory of the wind farm, and dynamically updating the clustering results according to changes in wind turbine status.
The feature extraction network described in Step 2 is a multilayer perceptron pre-trained by contrastive learning with the SimCLR framework. The pre-training specifically includes:
- collecting a large number of wind turbine status samples from historical data to construct a training dataset;
- applying two random data augmentations to each sample to obtain a positive sample pair;
- connecting a projection head after the feature extraction network;
- computing the NT-Xent contrastive loss for each positive sample pair and taking the average over all positive sample pairs as the total loss;
- training the network with the Adam optimizer on the total loss until convergence;
- after training, retaining the feature extraction network and discarding the projection head.

Step 2 employs the streaming HDBSCAN adaptive density clustering algorithm to dynamically group the wind turbines, specifically including:
- Based on the current set of wind turbines with frequency regulation capability, computing the (mutual) reachability distance between all pairs of points and constructing a minimum spanning tree. The distance is $d_{\mathrm{mreach}}(i,j)=\max\{\mathrm{core}_k(z_i),\ \mathrm{core}_k(z_j),\ \lVert z_i-z_j\rVert_2\}$, where:
  - $z_i$ and $z_j$ are the feature vectors extracted by the feature extraction network for the i-th and j-th wind turbines;
  - $\lVert z_i-z_j\rVert_2$ is the Euclidean distance between them;
  - $\mathrm{core}_k(z_i)$ and $\mathrm{core}_k(z_j)$ are the core distances of the two points, i.e., the Euclidean distance from each point to its k-th nearest neighbor;
  - a minimum cluster size is preset.
- Performing hierarchical clustering on the minimum spanning tree and automatically selecting the cluster partition based on cluster stability. This includes calculating the stability of each cluster, with a preset threshold, and selecting for each point the cluster with the highest stability as its final cluster assignment. Points not assigned to any cluster are marked as noise points.
- When a wind turbine joins or leaves the set of wind turbines with frequency regulation capability, dynamically updating the minimum spanning tree and the cluster assignment.
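Two numerical pieces of claim 1 can be sketched in NumPy:
- The NT-Xent loss used in SimCLR pre-training. It uses cosine similarity with temperature `tau`, treats each view's partner as the positive, and treats all other views as negatives.
- The HDBSCAN mutual reachability distance followed by a minimum spanning tree.

This is a batch sketch, not the streaming variant the claim describes. The temperature, the use of Prim's algorithm, and the function names are assumptions. For the batch clustering step itself, an existing implementation such as scikit-learn's `sklearn.cluster.HDBSCAN` (scikit-learn 1.3 and later) would normally be used.

```python
import numpy as np


def nt_xent(z1, z2, tau=0.5):
    """SimCLR NT-Xent loss averaged over all 2N views; z1[i] and z2[i] are a positive pair."""
    z = np.concatenate([z1, z2]).astype(float)
    z /= np.linalg.norm(z, axis=1, keepdims=True)      # unit vectors -> dot product = cosine similarity
    n = len(z1)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity from the denominator
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    m = sim.max(axis=1, keepdims=True)
    log_denom = m[:, 0] + np.log(np.exp(sim - m).sum(axis=1))   # numerically stable log-sum-exp
    return float(np.mean(log_denom - sim[np.arange(2 * n), pos]))


def mutual_reachability(Z, k):
    """d_mreach(i, j) = max(core_k(i), core_k(j), ||z_i - z_j||)."""
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    core = np.sort(d, axis=1)[:, k]                    # column 0 is the point itself
    return np.maximum(d, np.maximum(core[:, None], core[None, :]))


def prim_mst(W):
    """Prim's algorithm on a dense symmetric weight matrix; returns (u, v, weight) edges."""
    n = len(W)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best, parent = W[0].astype(float), np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))   # cheapest vertex outside the tree
        edges.append((int(parent[j]), j, float(best[j])))
        in_tree[j] = True
        closer = W[j] < best
        parent[closer], best[closer] = j, W[j][closer]
    return edges
```

Cutting the longest edges of this tree yields the cluster hierarchy from which HDBSCAN selects the most stable clusters. For two well-separated groups of turbines in feature space, the single inter-group edge is by far the heaviest.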
2. The wind farm optimal frequency trajectory collaborative control method based on contrastive clustering and hierarchical reinforcement learning according to claim 1, characterized in that the criteria for determining the frequency regulation capability of a wind turbine in Step 1 include:
- a speed safety margin condition, $\omega_i(t)\ge\omega_{\min}+\Delta\omega$, where $\omega_i(t)$ is the rotational speed of the i-th wind turbine at time t, $\omega_{\min}$ is the minimum safe speed of the wind turbine, and $\Delta\omega$ is a preset speed safety margin;
- a rotor kinetic energy level condition, $E_i(t)\ge E_{\min}$, where $E_i(t)$ is the kinetic energy state of the i-th wind turbine at time t, determined from $\omega_i(t)$, $\omega_{\min}$, and the initial rotational speed $\omega_{i0}$ of the i-th wind turbine, and $E_{\min}$ is the minimum kinetic energy ratio threshold;
- a wind speed condition, $v_{\mathrm{in}}\le v_i(t)\le v_{\mathrm{out}}$, where $v_{\mathrm{in}}$ and $v_{\mathrm{out}}$ are the cut-in and cut-out wind speeds of the wind turbine and $v_i(t)$ is the wind speed at the i-th wind turbine at time t;
- an equipment status condition: the wind turbine is fault-free and not in power-limited operation;
- a communication status condition: the wind turbine communicates normally with the wind farm group controller.
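Claim 2's screening logic can be sketched as a single predicate. Every numeric threshold is a placeholder: minimum safe speed, speed margin, kinetic-energy ratio, and cut-in and cut-out wind speeds. The kinetic-energy-state formula (ω² - ω_min²)/(ω0² - ω_min²) is an assumption, since the claim names only the quantities it depends on. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TurbineState:
    omega: float      # rotor speed at time t, pu
    omega0: float     # initial rotor speed, pu
    wind: float       # wind speed at time t, m/s
    faulted: bool
    curtailed: bool   # in power-limited operation
    comm_ok: bool     # communication with the group controller is normal


def kinetic_state(omega, omega0, omega_min):
    """Assumed form: usable kinetic energy above omega_min, relative to its initial value."""
    if omega0 <= omega_min:
        return 0.0
    return (omega**2 - omega_min**2) / (omega0**2 - omega_min**2)


def can_regulate(s, omega_min=0.7, d_omega=0.05, e_min=0.2, v_in=3.0, v_out=25.0):
    """All five claim-2 conditions must hold; the threshold defaults are placeholders."""
    return (s.omega >= omega_min + d_omega
            and kinetic_state(s.omega, s.omega0, omega_min) >= e_min
            and v_in <= s.wind <= v_out
            and not s.faulted and not s.curtailed and s.comm_ok)
```

Turbines for which the predicate is false would stay in MPPT mode and be excluded from clustering until they qualify again.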
3. The wind farm optimal frequency trajectory collaborative control method based on contrastive clustering and hierarchical reinforcement learning according to claim 1, characterized in that, in Step 3, the aggregated observation state of each group is obtained by average pooling the feature vectors of the wind turbines within the group and concatenating the group statistics, that is: $o_k=\left[\frac{1}{|\mathcal{N}_k|}\sum_{i\in\mathcal{N}_k} z_i,\ \bar{\omega}_k,\ \eta_k,\ \overline{\Delta f}_k\right]$, where:
- $o_k$ is the aggregated observation vector of the k-th group;
- $\mathcal{N}_k$ is the set of wind turbine indexes in the group, and $|\mathcal{N}_k|$ is the number of wind turbines in the group;
- $z_i$ is the feature vector of wind turbine i;
- $\bar{\omega}_k$ is the average rotational speed;
- $\eta_k$ is the group's proportion of the total kinetic energy;
- $\overline{\Delta f}_k$ is the average frequency deviation.
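The pooling in claim 3 can be sketched as below. Two details are assumptions: the frequency deviation is treated as a per-turbine measurement that is averaged, and the argument names are hypothetical. With a 3-dimensional turbine feature vector the result would be 6-dimensional, matching the Actor input size in claim 5. However, the feature dimension is not stated in this part of the document.

```python
import numpy as np


def aggregate_obs(features, omega, energy_share, df):
    """Group observation: mean-pooled turbine features concatenated with group statistics.

    features     : (n_k, d) feature vectors of the turbines in group k
    omega        : (n_k,) rotor speeds, averaged into the group mean speed
    energy_share : the group's proportion of total kinetic energy (scalar)
    df           : (n_k,) frequency deviations seen by the turbines, averaged
    """
    pooled = np.asarray(features, float).mean(axis=0)
    stats = np.array([np.mean(omega), float(energy_share), np.mean(df)])
    return np.concatenate([pooled, stats])
```

Mean pooling makes the observation independent of both the number and the ordering of turbines in the group. This is what allows a fixed-size Actor input despite dynamic regrouping.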
4. The method for coordinated control of the optimal frequency trajectory of a wind farm based on contrastive clustering and hierarchical reinforcement learning according to claim 1, characterized in that the multi-objective reward function in Step 4 comprises five terms:
- a trajectory tracking reward (targeting the theoretically optimal frequency trajectory);
- an energy conservation reward;
- an inter-group cooperation reward;
- a safety constraint reward;
- a smoothness reward.

The quantities involved are:
- the proportion of each group's total available kinetic energy;
- the frequency regulation start threshold;
- an indicator function;
- preset coefficients;
- a tolerance;
- the actual frequency deviation and the reference frequency deviation at time t;
- the total frequency regulation power command of the k-th virtual agent, where K is the number of virtual agents;
- the minimum and maximum safe speeds of the wind turbine;
- the rotational speed of the i-th wind turbine.

An additional terminal sparse reward is applied at the end time of frequency regulation.
5. The wind farm optimal frequency trajectory collaborative control method based on contrastive clustering and hierarchical reinforcement learning according to claim 1, characterized in that, in the multi-agent deep reinforcement learning algorithm, each virtual agent includes an Actor network and a Critic network. The Actor network has a 6-dimensional input layer, hidden layers of [128, 64], and a 1-dimensional output layer scaled to the group's total power range after tanh activation. The Critic network has an input layer whose dimension is determined by the preset maximum number of groups, hidden layers of [256, 128], and outputs the evaluated Q value. The multi-agent Markov decision process model is trained under a centralized training with decentralized execution framework.
6. The method for coordinated control of the optimal frequency trajectory of a wind farm based on contrastive clustering and hierarchical reinforcement learning according to claim 1, characterized in that designing the lower-level intra-group power allocation strategy in Step 6 includes:
- Step 6-1: determining the weight of each wind turbine from its rotor kinetic energy, speed margin, and inertia, where:
  - $H_i$ is the inertia of the i-th wind turbine in the group;
  - $\bar{H}$ is the average inertia of the frequency-regulating wind turbines in the group;
  - $E_i(t)$ is the rotor kinetic energy state of the i-th wind turbine at time t, determined by its rotational speed $\omega_i(t)$, the minimum safe speed $\omega_{\min}$, and its initial rotational speed $\omega_{i0}$.
- Step 6-2: performing a preliminary allocation according to the weights;
- Step 6-3: checking whether the preliminary allocation of each wind turbine satisfies its power limits and the speed safety estimate, and, if not, reallocating by iterative proportional allocation or quadratic programming;
- Step 6-4: when a wind turbine loses its frequency regulation capability, immediately removing it from the group, redistributing its remaining power, and returning to Step 6-1 to recalculate and allocate the weights.
7. A wind farm optimal frequency trajectory cooperative control system for implementing the wind farm optimal frequency trajectory cooperative control method according to any one of claims 1-6, characterized in that it comprises:
- The wind turbine status monitoring module, used to collect real-time operating data of each wind turbine and evaluate its frequency regulation capability;
- The dynamic filtering module, which generates the set of wind turbines that currently have frequency regulation capability based on the criteria for determining wind turbine frequency regulation capability;
- The contrastive clustering module, which includes a pre-trained feature extraction network and an adaptive density clusterer and is used for online feature extraction and dynamic clustering of the wind turbines with frequency regulation capability;
- The upper-level collaborative training module, which:
  - defines the aggregated observation state of each group;
  - treats each group as a virtual agent;
  - constructs a multi-agent Markov decision process model for upper-level group collaboration;
  - designs a multi-objective reward function to guide the multi-agent Markov decision process so that the wind power frequency trajectory approaches the theoretical optimal trajectory;
  - trains the multi-agent Markov decision process model with a multi-agent deep reinforcement learning algorithm.
- The lower-level group power allocation strategy design module, which designs the lower-level intra-group power allocation strategy used to dynamically allocate the group total power command output by the upper level to the frequency-regulating wind turbines in the group according to the real-time frequency regulation capability of each wind turbine.
- The cluster controller, deployed in each cluster, which includes the pre-trained multi-agent Markov decision process model and the lower-level intra-cluster power allocation strategy.
It performs collaborative control of the optimal frequency trajectory of the wind farm and monitors the status of the wind turbines within the cluster in real time, dynamically updating the clustering results based on changes in the turbine status.