Virtual power plant secondary frequency modulation power optimization allocation method, device, equipment and medium

By using a virtual power plant secondary frequency regulation power optimization allocation method, the frequency regulation challenge of traditional AGC systems when a high proportion of renewable energy is connected to the grid is solved. This method improves the accuracy and speed of frequency regulation, balances economy and sustainability, and optimizes the utilization efficiency and carbon emissions of energy storage resources.

CN122267902A · Pending · Publication Date: 2026-06-23 · CHINA THREE GORGES CORPORATION

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: CHINA THREE GORGES CORPORATION
Filing Date: 2026-03-24
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Traditional centralized AGC systems struggle to achieve fast and accurate secondary frequency regulation when a high proportion of renewable energy is integrated into the grid. Existing robust optimization methods are conservative, leading to increased costs. Metaheuristic algorithms tend to converge prematurely under complex multi-peak objectives and lack cross-regional coupling processing capabilities. Energy storage lifetime and carbon costs are underestimated, and there is a lack of balance between short-term and long-term sustainability.

Method used

A virtual power plant secondary frequency regulation power optimization allocation method is adopted. The initial power allocation is calculated through a distributed AGC framework. The uncertainty is modeled by deep reinforcement learning, and a swarm intelligence optimization algorithm is introduced for multi-objective optimization. By utilizing historical search experience memory, adaptive search radius and cross-regional reorganization mechanism, the synergistic optimization of frequency control performance, operation economy and resource sustainability is achieved.

Benefits of technology

It improves frequency regulation accuracy and response speed, enhances system robustness and economy, and balances battery life and long-term sustainability of carbon emissions, achieving a synergistic improvement in frequency regulation performance and economy.

✦ Generated by Eureka AI based on patent content.


Abstract

The present application relates to the field of new energy grid-connection control technology and discloses a method, device, equipment, and medium for optimizing the allocation of secondary frequency regulation power in a virtual power plant. The method generates an initial power regulation command for each distributed resource based on the grid frequency deviation; acquires historical operating data and / or real-time interactive data and learns from them the uncertainty distribution characteristics of the virtual power plant's operation; and, based on the initial power regulation commands and the uncertainty distribution characteristics, uses a swarm intelligence optimization algorithm with a memory mechanism, an adaptive radius, and cross-region reorganization to iteratively solve a multi-objective optimization function covering frequency control performance, operating economy, and resource sustainability, thereby determining the optimal power regulation command for each distributed resource. This solves the problem that traditional frequency regulation methods cannot simultaneously account for dynamic uncertainty perception, global optimization, and battery life, and it improves frequency regulation accuracy and response speed.

Description

Technical Field

[0001] This invention relates to the field of new energy grid connection control technology, specifically to a method, device, equipment, and medium for optimizing the allocation of secondary frequency regulation power in a virtual power plant.

Background Technology

[0002] With a high proportion of renewable energy sources such as wind and solar power integrated into the grid, the power grid faces challenges such as frequency fluctuations, amplified prediction errors, and coupled uncertainties. As the second line of defense for grid frequency control, the AGC system achieves zero-steady-state-error frequency regulation by adjusting power output; this process is called secondary frequency regulation. The AGC (Automatic Generation Control) system controls the output of photovoltaic inverters to meet constantly changing user electricity demands, thereby ensuring the safe operation of the grid. However, traditional centralized AGC systems often rely on static parameters or linearized models, making it difficult to maintain fast and accurate secondary frequency regulation in highly time-varying scenarios. Existing robust optimization methods, such as worst-case design based on IGDT, can guarantee safety margins but often increase costs and reduce resource utilization efficiency because of their boundary-based conservatism. Existing heuristic / meta-heuristic methods, such as standard ALA and PSO, are prone to premature convergence, insufficient local search, or weak handling of cross-regional coupling under complex multi-peak objectives. Moreover, although energy storage is a key resource for secondary frequency regulation, its lifespan degradation and carbon costs have long been underestimated or treated with coarse weights, so short-term frequency regulation benefits and long-term sustainability are not balanced quantitatively.

Summary of the Invention

[0003] This invention provides a method, apparatus, equipment, and medium for optimizing power allocation in a virtual power plant's secondary frequency regulation, which solves the problem that traditional frequency regulation methods cannot simultaneously address dynamic uncertainty perception, global optimization, and battery life loss, thereby improving frequency regulation accuracy and response speed.

[0004] In a first aspect, the present invention provides a method for optimizing power allocation in a virtual power plant with secondary frequency regulation, comprising: acquiring the grid frequency deviation and generating initial power regulation commands for each distributed resource based on the grid frequency deviation; acquiring historical operating data and / or real-time interactive data, and obtaining the uncertainty distribution characteristics during the operation of the virtual power plant based on learning from the historical operating data and / or real-time interactive data; and using a swarm intelligence optimization algorithm to iteratively optimize a multi-objective optimization function including frequency control performance, operational economy, and resource sustainability based on the initial power regulation commands and the uncertainty distribution characteristics, thereby determining the optimal power regulation commands for each distributed resource; wherein, the swarm intelligence optimization algorithm introduces a historical search experience memory mechanism, an adaptive search radius adjustment mechanism, and a cross-resource region solution structure reorganization mechanism during the iterative optimization process.

[0005] This invention achieves a substantial improvement over traditional frequency regulation methods by organically combining uncertainty perception, global optimization, and multi-objective collaborative optimization. On the one hand, it learns the uncertainty distribution characteristics based on historical and real-time data, enabling it to perceive environmental changes. On the other hand, it employs a swarm intelligence optimization algorithm that introduces a memory mechanism, adaptive radius, and cross-regional reorganization to solve a multi-objective function encompassing frequency control performance, operational economy, and resource sustainability on a global scale. This improves frequency regulation accuracy and response speed while also taking into account long-term sustainability indicators such as battery life and carbon emissions, achieving a synergistic improvement in frequency regulation performance and economy.

[0006] In one optional implementation, the step of generating initial power regulation commands for each distributed resource based on grid frequency deviation includes: calculating the total regulation power required for the virtual power plant to participate in secondary frequency regulation based on the grid frequency deviation; allocating the total regulation power to each resource unit according to the participation factor of each distributed resource, generating initial power regulation commands for each distributed resource, wherein the participation factor of each distributed resource is determined based on the adjustable capacity, state of charge, and / or prediction margin of each distributed resource. By calculating the total regulation power based on the grid frequency deviation and dynamically determining the participation factor based on the adjustable capacity, state of charge, and prediction margin of each distributed resource, resources with large adjustable capacity, good health status, and high prediction accuracy can undertake more tasks. This mechanism not only lays a reasonable foundation for subsequent global optimization but also improves the system's adaptability to uncertainty by actively avoiding resources with large prediction errors, while avoiding the defects of fixed weight allocation and improving resource utilization efficiency while protecting equipment.

[0007] In one optional implementation, the step of obtaining the uncertainty distribution characteristics during the operation of a virtual power plant based on learning from historical operating data and / or real-time interactive data includes: constructing a reinforcement learning model with the virtual power plant as the agent and the power system environment as the interaction environment; and extracting uncertainty patterns from historical operating data and / or real-time interactive data through continuous interactive learning between the agent and the interaction environment, which serves as the uncertainty distribution characteristics during the operation of the virtual power plant. By constructing a reinforcement learning model with the virtual power plant as the agent and the power system environment as the interaction environment, the model can actively learn the fluctuation patterns of renewable energy output, load, and market prices through continuous interaction with the environment. This dynamic learning mechanism eliminates the reliance on prior assumptions in obtaining uncertainty characteristics, instead directly mining the true risk distribution from historical and real-time data, ensuring frequency regulation safety margins while avoiding resource waste caused by excessive conservatism. Furthermore, the reinforcement learning model possesses online adaptive capabilities, continuously updating its understanding of uncertainty as the environment changes, ensuring that the frequency regulation strategy always matches the current operating conditions, significantly improving the robustness and economy of the system in complex time-varying scenarios.

[0008] In one optional implementation, based on the initial power regulation command and uncertainty distribution characteristics, a swarm intelligence optimization algorithm is used to iteratively solve a multi-objective optimization function, including frequency control performance, operational economy, and resource sustainability, to determine the optimal power regulation command for each distributed resource. This includes: using the initial power regulation command as the initial population for the swarm intelligence optimization algorithm; determining the optimization parameters of the multi-objective optimization function based on the uncertainty distribution characteristics; and iteratively solving the multi-objective optimization function using the swarm intelligence optimization algorithm to obtain the optimal power regulation command for each distributed resource. This implementation first uses the initial power regulation command as the initial population for the swarm intelligence optimization algorithm, enabling the optimization process to fully utilize the reasonable components in the initial allocation results and perform a refined search based on existing foundations, significantly improving convergence efficiency. Second, the optimization parameters of the multi-objective optimization function are dynamically determined based on the uncertainty distribution characteristics, allowing the optimization decision to proactively respond to environmental changes. This mechanism avoids blind searching caused by random initialization and ensures that the optimization objectives match environmental risks, maximizing operational economy while guaranteeing frequency regulation reliability. The resulting optimal power regulation command truly achieves a synergistic unity of safety, economy, and sustainability.

[0009] In one alternative implementation, resource sustainability includes battery life loss cost, which is determined based on an equivalent full cycle model and modified by incorporating at least one of a temperature factor, a rate factor, and a health status factor.
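The battery life loss cost described above can be sketched as follows. This is a minimal illustration only: the patent names an equivalent full cycle (EFC) model with temperature, rate, and health-status corrections but does not give formulas here, so the multiplicative form, the function name `battery_life_loss_cost`, and all parameter names are assumptions.

```python
def battery_life_loss_cost(energy_throughput_mwh, rated_capacity_mwh,
                           cycle_life_efc, replacement_cost,
                           temp_factor=1.0, rate_factor=1.0, soh_factor=1.0):
    """Hypothetical EFC-based degradation cost.

    One equivalent full cycle is counted as one full charge plus one full
    discharge, i.e. a throughput of 2 x rated capacity. The three factors
    are assumed to be multiplicative stress multipliers (1.0 = no extra wear).
    """
    if rated_capacity_mwh <= 0 or cycle_life_efc <= 0:
        raise ValueError("capacity and cycle life must be positive")
    efc = energy_throughput_mwh / (2.0 * rated_capacity_mwh)
    base_cost = replacement_cost * efc / cycle_life_efc
    return base_cost * temp_factor * rate_factor * soh_factor


# Example: 20 MWh of throughput on a 10 MWh battery is one equivalent full cycle.
cost = battery_life_loss_cost(20.0, 10.0, cycle_life_efc=5000,
                              replacement_cost=1_000_000,
                              temp_factor=1.2, rate_factor=1.1)
```

Keeping the correction factors as inputs leaves open how they are derived (lookup tables, fitted curves, or manufacturer data), which the text does not specify.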

[0010] In one optional implementation, the historical search experience memory mechanism includes: storing the historical optimal solution of each search individual and using it as a reference direction for position updates in subsequent iterations; the adaptive search radius adjustment mechanism includes: dynamically increasing or decreasing the search step size according to the optimization progress of the search individual; and the cross-resource region solution structure recombination mechanism includes: dividing the solution vector into segments by region or resource type and cross-recombining the segments between different individuals. Specifically, the memory mechanism lets the algorithm make effective use of past search experience and avoid repeating searches because good solutions were forgotten; the adaptive search radius achieves an adaptive balance between global exploration and local exploitation; and the recombination mechanism allows good allocation schemes from different regions to be fused, explicitly handling the optimization problem under the complex constraint of cross-regional power coupling. Together, these three mechanisms enable the algorithm to maintain population diversity and avoid premature convergence under complex multi-peak objectives while still converging efficiently in a high-dimensional coupled space, ultimately yielding a better frequency regulation power allocation scheme. This enhances the global optimization capability and solution efficiency of the swarm intelligence optimization algorithm.
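To make the three mechanisms concrete, the following toy search loop shows one possible reading of them. It is not the patent's improved ALA: the update rule, the expand-on-success / shrink-on-failure radius rule, and the names `swarm_search`, `segments`, `lower`, and `upper` are all assumptions chosen for illustration.

```python
import random


def swarm_search(objective, initial, segments, lower, upper,
                 n_agents=20, n_iter=100, seed=0):
    """Toy swarm search with memory, adaptive radius, and segment recombination."""
    rng = random.Random(seed)
    span = upper - lower

    def clip(x):
        return min(upper, max(lower, x))

    # Seed the population with the initial allocation plus perturbed copies.
    population = [[clip(x) for x in initial]]
    for _ in range(n_agents - 1):
        population.append([clip(x + rng.gauss(0.0, 0.1 * span)) for x in initial])

    # Memory mechanism: each agent remembers its own best position and cost.
    memory = [list(p) for p in population]
    memory_cost = [objective(p) for p in population]
    radius = [0.1 * span] * n_agents

    for _ in range(n_iter):
        for i in range(n_agents):
            # Move toward the remembered best, plus a random step within the radius.
            candidate = [
                clip(x + rng.random() * (m - x) + rng.uniform(-radius[i], radius[i]))
                for x, m in zip(population[i], memory[i])
            ]
            # Cross-region recombination: borrow one segment from another agent.
            j = rng.randrange(n_agents)
            if j != i:
                start, stop = rng.choice(segments)
                candidate[start:stop] = memory[j][start:stop]

            cost = objective(candidate)
            population[i] = candidate
            if cost < memory_cost[i]:
                memory[i], memory_cost[i] = list(candidate), cost
                radius[i] = min(radius[i] * 1.2, span)         # progress: widen the step
            else:
                radius[i] = max(radius[i] * 0.8, 1e-6 * span)  # stall: narrow the step

    best = min(range(n_agents), key=lambda k: memory_cost[k])
    return memory[best], memory_cost[best]
```

Here each `(start, stop)` pair in `segments` would mark the slice of the solution vector that belongs to one region or resource type, so recombination swaps whole regional allocations rather than single entries.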

[0011] In an optional implementation, the method further includes: issuing the optimal power adjustment command to each distributed resource for execution and obtaining the actual residual frequency deviation after execution; generating a feedback correction command based on the actual residual frequency deviation, and superimposing the feedback correction command onto the optimal power adjustment command to generate a corrected power adjustment command. In this implementation, after issuing and executing the optimal power adjustment command obtained through optimization calculation, the system does not stop there but monitors the execution effect in real time and obtains the actual residual frequency deviation, thereby generating a feedback correction command that is superimposed onto the original command. This closed-loop mechanism effectively compensates for the uncertainties caused by model simplification, prediction errors, and external disturbances. It leverages the forward-looking advantages of the global optimization algorithm while retaining the correction capability of closed-loop feedback control, enabling the system to achieve efficient resource allocation and ensure the accuracy and reliability of frequency control in complex time-varying environments, significantly improving the engineering practical value of the frequency modulation strategy.
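The closed-loop step can be sketched as below. The paragraph above only says that a correction derived from the residual frequency deviation is superimposed on the optimal command; the proportional-integral form, the sharing of the correction by fixed `shares`, and the class name `FeedbackCorrector` are assumptions made for this sketch.

```python
class FeedbackCorrector:
    """Hypothetical PI correction superimposed on the optimised commands."""

    def __init__(self, kp, ki, dt):
        self.kp = kp          # proportional gain (MW per Hz), assumed
        self.ki = ki          # integral gain (MW per Hz per second), assumed
        self.dt = dt          # control interval in seconds
        self.integral = 0.0   # accumulated residual deviation

    def correct(self, optimal_commands, residual_delta_f, shares):
        """Return corrected commands; shares maps each resource name to its fraction."""
        self.integral += residual_delta_f * self.dt
        # Negative sign: a frequency that is still too low asks for more power.
        correction = -(self.kp * residual_delta_f + self.ki * self.integral)
        return {name: p + shares[name] * correction
                for name, p in optimal_commands.items()}


corrector = FeedbackCorrector(kp=100.0, ki=10.0, dt=1.0)
corrected = corrector.correct({"bess": 4.0, "pv": 1.0}, residual_delta_f=-0.01,
                              shares={"bess": 0.8, "pv": 0.2})
```

The integral term is what removes a persistent residual deviation over successive control intervals; a purely proportional correction would leave a steady offset.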

[0012] In a second aspect, the present invention provides a virtual power plant secondary frequency regulation power optimization allocation device, comprising: an initial allocation module, used to acquire the grid frequency deviation and generate initial power regulation commands for each distributed resource based on the grid frequency deviation; an uncertainty learning module, used to acquire historical operating data and / or real-time interactive data, and obtain the uncertainty distribution characteristics of the virtual power plant operation process based on learning from the historical operating data and / or real-time interactive data; and an optimization decision module, used to employ a swarm intelligence optimization algorithm to iteratively optimize a multi-objective optimization function including frequency control performance, operating economy, and resource sustainability based on the initial power regulation commands and the uncertainty distribution characteristics, and determine the optimal power regulation command for each distributed resource; wherein the swarm intelligence optimization algorithm introduces a historical search experience memory mechanism, an adaptive search radius adjustment mechanism, and a cross-resource region solution structure reorganization mechanism in the iterative optimization process.

[0013] In a third aspect, the present invention provides an electronic device, comprising: a memory and a processor, wherein the memory and the processor are communicatively connected to each other, the memory stores computer instructions, and the processor executes the computer instructions to perform the virtual power plant secondary frequency regulation power optimization allocation method of the first aspect or any corresponding embodiment described above.

[0014] In a fourth aspect, the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to execute the virtual power plant secondary frequency regulation power optimization allocation method of the first aspect or any corresponding embodiment described above.

Attached Figure Description

[0015] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0016] Figure 1 is a schematic diagram of the first process of the virtual power plant secondary frequency regulation power optimization allocation method according to an embodiment of the present invention;
Figure 2 is a schematic diagram of the second process of the virtual power plant secondary frequency regulation power optimization allocation method according to an embodiment of the present invention;
Figure 3 is a structural block diagram of a virtual power plant secondary frequency regulation power optimization and allocation device according to an embodiment of the present invention;
Figure 4 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present invention.

Detailed Implementation

[0017] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0018] It is understood that before using the technical solutions disclosed in the various embodiments of the present invention, users should be informed of the types, scope of use, and usage scenarios of the personal information involved in the present invention and their authorization should be obtained in accordance with relevant laws and regulations through appropriate means.

[0019] With the integration of a high proportion of renewable energy, the power grid faces increased frequency fluctuations and uncertainties. Traditional centralized AGC, relying on static parameters, struggles to achieve fast and accurate frequency regulation. Existing robust optimization methods ensure safety margins but increase costs because of boundary-based conservatism. Metaheuristic algorithms are prone to premature convergence under complex multi-peak objectives and lack the ability to handle cross-regional coupling. Furthermore, the degradation of energy storage lifespan and carbon emission costs have long been underestimated, so short-term frequency regulation benefits and long-term sustainability are not effectively balanced. Based on this, this invention provides a method, apparatus, equipment, and medium for optimizing the allocation of secondary frequency regulation power in a virtual power plant. First, at the distributed AGC layer, the total correction power is calculated from the real-time frequency deviation and initially allocated according to participation factors. Next, an uncertainty modeling layer built with deep reinforcement learning (DRL, such as DDPG) integrates time-varying disturbances such as wind and solar power output, load, and price in a continuous action space and outputs online estimates of the strategy and scenario distribution. With this prior, the improved ALA optimization layer then applies three mechanisms (individual dynamic memory, adaptive search radius, and cross-processing of multi-regional power coupling) to comprehensively optimize the frequency regulation power of each resource.
The optimization process solves a multi-objective optimization problem with "frequency deviation, adjustment range / operating cost, carbon emission cost, and battery life loss (EFC with temperature / rate / SOH correction)" as joint objectives, while satisfying constraints such as capacity, ramping, SOC, and regional power flow. At the feedback execution layer, the resulting optimal command is issued to the generating units and energy storage, the residual frequency deviation is corrected a second time using proportional / integral correction, and the operating data is fed back to the DRL and ALA layers for online adaptation and strategy evolution. This forms a stable, low-carbon, economical, and sustainable secondary frequency regulation closed loop of "perceiving uncertainty - optimizing allocation - closed-loop execution - self-evolution".
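One way to write the joint objective and constraints listed above is a weighted-sum scalarization. This is only an illustrative sketch: the patent does not state how the objectives are combined, and every symbol below (the weights $w_k$, the cost terms $J_k$, the commands $P_i$, and the limit values) is an assumption introduced here.

```latex
\begin{aligned}
\min_{P_1,\dots,P_n}\quad
  & w_1 J_{\mathrm{freq}} + w_2 J_{\mathrm{cost}}
    + w_3 J_{\mathrm{CO_2}} + w_4 J_{\mathrm{batt}} \\
\text{s.t.}\quad
  & P_i^{\min} \le P_i \le P_i^{\max}
      && \text{(capacity)} \\
  & \lvert P_i(t) - P_i(t-1) \rvert \le R_i \,\Delta t
      && \text{(ramping)} \\
  & \mathrm{SOC}_i^{\min} \le \mathrm{SOC}_i(t) \le \mathrm{SOC}_i^{\max}
      && \text{(state of charge)} \\
  & \lvert F_\ell(P_1,\dots,P_n) \rvert \le F_\ell^{\max}
      && \text{(regional power flow)}
\end{aligned}
```

Here $J_{\mathrm{freq}}$ would penalize the remaining frequency deviation, $J_{\mathrm{cost}}$ the adjustment range / operating cost, $J_{\mathrm{CO_2}}$ the carbon emission cost, and $J_{\mathrm{batt}}$ the battery life loss. A Pareto-based treatment would be an equally valid reading of "multi-objective".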

[0020] In power systems, frequency is a measure of grid operation quality, reflecting the dynamic balance between power generation and consumption. Power is a regulation mechanism used to maintain frequency stability by adjusting power generation or consumption. When power generation equals power consumption, the frequency remains stable at its rated value; when the two are unbalanced, the frequency deviates. Therefore, the essence of secondary frequency regulation is to calculate the total power required for adjustment based on the frequency deviation, then optimize the allocation of these power commands to various regulation resources, and finally restore the frequency to its rated value through power regulation. The "optimized power allocation" in the title of this invention achieves more precise and economical frequency regulation by optimizing the allocation of regulation power among various resources.

[0021] According to an embodiment of the present invention, a method for optimizing power allocation in a virtual power plant with secondary frequency regulation is provided. It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions. Furthermore, although a logical order is shown in the flowchart, in some cases, the steps shown or described may be executed in a different order than that shown here.

[0022] This embodiment provides a method for optimizing the allocation of secondary frequency regulation power in a virtual power plant. Figure 1 is a flowchart of a virtual power plant secondary frequency regulation power optimization allocation method according to an embodiment of the present invention. As shown in Figure 1, the process includes the following steps:

Step S101: Obtain the grid frequency deviation and generate initial power regulation commands for each distributed resource based on the grid frequency deviation.

[0023] Grid frequency deviation refers to the difference between the actual operating frequency of the power grid and the rated frequency. It reflects the balance between power generation and power load. A positive frequency deviation indicates a power surplus, while a negative deviation indicates a power deficit. Distributed resources are a collective term for various adjustable resources within a virtual power plant, including energy storage systems, distributed photovoltaics, wind power, controllable loads, diesel generators, etc., and are the basic units for executing frequency regulation commands.

[0024] In a virtual power plant, the Automatic Generation Control (AGC) system regulates the grid frequency by coordinating various distributed energy units (such as photovoltaic, wind power, and energy storage systems). Based on grid frequency deviations and expected load demands, the AGC system adjusts the output of each distributed energy source in real time to maintain grid frequency stability.

[0025] The purpose of this step is to calculate the total correction power of the virtual power plant based on the grid frequency deviation and decompose it to various distributed resources. Specifically, step S101 includes: Step S1011: Obtain the power grid frequency deviation.

[0026] Step S1012: Calculate the total regulation power required for the virtual power plant to participate in secondary frequency regulation based on the grid frequency deviation.

[0027] Step S1013: The total regulation power is allocated to each resource unit according to the participation factor of each distributed resource, and the initial power regulation command of each distributed resource is generated. The participation factor of each distributed resource is determined according to the adjustable capacity, state of charge and / or prediction margin of each distributed resource.

[0028] For example, in step S101 above, a distributed AGC (Automatic Generation Control) framework is first established for the virtual power plant (VPP) to participate in the secondary frequency regulation of the power grid. The virtual power plant consists of various adjustable resources (such as controllable generators, energy storage batteries, etc.) distributed in different locations, requiring coordination among these distributed units to jointly undertake the frequency regulation task. Traditional centralized AGC uses a master station to uniformly calculate frequency regulation commands, while this embodiment adopts a distributed control architecture: the virtual power plant aggregation controller calculates the total frequency correction power command of the system, and each sub-unit shares this command according to an allocation strategy. To eliminate the frequency deviation, the total secondary frequency regulation power $P_{\mathrm{total}}$ required from the virtual power plant is first calculated from the system frequency deviation $\Delta f$. A linear control law is adopted, in which the required power is proportional to the frequency deviation:

$$P_{\mathrm{total}} = -K_f \, \Delta f$$

[0029] where $K_f$ is the frequency regulation gain coefficient (positive value) and $\Delta f$ is the instantaneous frequency deviation of the system.

[0030] The above formula ensures that when the frequency is low, $P_{\mathrm{total}}$ is positive, prompting the virtual power plant to increase its output to support the frequency; conversely, if the frequency is too high, the output is reduced, thereby pulling the frequency back to the target value.

[0031] After obtaining the total frequency regulation power demand, it needs to be allocated to the resource units within the virtual power plant. Let the virtual power plant contain n adjustable units (denoted as i = 1, 2, ..., n), and let the power command allocated to the i-th unit be $P_i$.

[0032] The allocation satisfies the constraint $\sum_{i=1}^{n} P_i = P_{\mathrm{total}}$, which means that the sum of the commands of all units matches the total demand. To determine the specific allocation ratio, a participation factor $\alpha_i$ can be introduced for each unit, so that:

$$P_i = \alpha_i \, P_{\mathrm{total}}$$

[0033] where $\sum_{i=1}^{n} \alpha_i = 1$. The value of $\alpha_i$ can be initialized and updated based on factors such as unit capacity, response speed, ramp rate, state of charge, and prediction margin. For example, units with larger capacity or higher availability margin can be assigned a higher $\alpha_i$.

[0034] Through the above modeling, the total frequency correction power is decomposed in real time within the virtual power plant. Each unit adjusts its output based on the locally measured frequency deviation and the received command $P_i$. This distributed AGC framework improves the robustness and real-time performance of frequency regulation control: on the one hand, the aggregation controller's centralized calculation ensures that frequency deviations are corrected globally; on the other hand, the decentralized execution by each unit avoids single points of failure and enhances system redundancy. The model established in this step lays the foundation for subsequent optimization: the total power demand is driven by the frequency deviation, and the initial allocation of each unit is based on a fixed weight, but the allocation strategy will be further improved through optimization algorithms to enhance performance.
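Steps S1011 to S1013 can be sketched in a few lines. The linear law and the normalized participation factors follow the description above; the specific scoring rule (capacity times state of charge times prediction margin) and the names `Resource`, `participation_factors`, and `initial_commands` are hypothetical choices for illustration, since the text only lists the inputs the factor may depend on.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    adjustable_capacity_mw: float  # headroom available for regulation
    soc: float                     # state of charge in [0, 1]; 1.0 for non-storage units
    prediction_margin: float       # in [0, 1]; higher means a more reliable forecast


def total_regulation_power(delta_f_hz, gain_mw_per_hz):
    """Linear control law: low frequency (delta_f < 0) gives a positive power request."""
    return -gain_mw_per_hz * delta_f_hz


def participation_factors(resources):
    """Assumed scoring rule, normalised so the factors sum to 1."""
    scores = [r.adjustable_capacity_mw * r.soc * r.prediction_margin for r in resources]
    total = sum(scores)
    if total <= 0:
        raise ValueError("no resource has usable regulation capability")
    return [s / total for s in scores]


def initial_commands(delta_f_hz, gain_mw_per_hz, resources):
    """Split the total regulation power across resources by participation factor."""
    p_total = total_regulation_power(delta_f_hz, gain_mw_per_hz)
    factors = participation_factors(resources)
    return {r.name: a * p_total for r, a in zip(resources, factors)}


fleet = [Resource("bess", 10.0, 0.8, 1.0), Resource("pv", 5.0, 1.0, 0.4)]
commands = initial_commands(delta_f_hz=-0.05, gain_mw_per_hz=100.0, resources=fleet)
```

Weighting by state of charge in this way favors discharging; a fuller sketch would use the remaining charging headroom instead when the frequency is high and the request is negative.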

[0035] Step S102: Obtain historical operating data and / or real-time interactive data, and obtain the uncertainty distribution characteristics of the virtual power plant operation process based on learning from the historical operating data and / or real-time interactive data.

[0036] The purpose of this step is to use deep reinforcement learning to model renewable energy output, load, and price disturbances online and output policy / scenario distributions. Historical operating data refers to various records accumulated by the virtual power plant during its past operation, including wind / solar power output data, load change curves, and grid frequency records, used to uncover long-term statistical patterns of uncertainty. Real-time interactive data refers to the current state information continuously collected during system operation, such as real-time wind power output, real-time load, and real-time electricity prices, used to capture the latest environmental changes. Uncertainty distribution characteristics refer to quantitative information obtained through data learning that reflects the statistical patterns of various uncertainty factors, including renewable energy output fluctuation characteristics, load fluctuation characteristics, and market price fluctuation characteristics.

[0037] Virtual power plant operation faces numerous uncertainties, including fluctuations in renewable energy output, load forecasting errors, and uncertain changes in electricity market prices. Traditional methods often employ Information Gap Decision Theory (IGDT) for uncertainty modeling, pre-setting "information gaps" to ensure the robustness of decisions under worst-case scenarios. However, IGDT typically assumes fixed fluctuation boundaries for the uncertain parameters, which makes it difficult to reflect changes in probability distributions in complex dynamic environments and can lead to overly conservative results. To improve adaptability to uncertainty, this embodiment introduces Deep Reinforcement Learning (DRL) in place of traditional IGDT for uncertainty modeling.

[0038] Specifically, step S102 above includes: Step S1021: Obtain historical operating data and / or real-time interactive data.

[0039] Step S1022: Construct a reinforcement learning model with the virtual power plant as the intelligent agent and the power system environment as the interaction environment.

[0040] For example, a Markov Decision Process (MDP) is constructed with the virtual power plant's frequency regulation decision-maker as the agent and the power system as the environment: the state space contains observable environmental information (such as current load level, renewable output, frequency deviation, and remaining capacity of each resource), and the action space contains frequency regulation control strategies (such as adjusting the frequency regulation reserve of each unit or adjusting strategy parameters). Environmental uncertainties are presented through state transitions. Through repeated interaction with the environment, the reinforcement learning agent gradually approximates the implicit uncertainty distribution without requiring manual assumptions about fluctuation ranges.

[0041] Specifically, this embodiment uses the Deep Deterministic Policy Gradient (DDPG) algorithm as an example implementation of the reinforcement learning model, leveraging its suitability for continuous action spaces to handle the allocation of continuous frequency regulation commands. DDPG uses an actor-critic architecture: the actor network $\mu(s \mid \theta^{\mu})$ outputs a continuous control action $a$ (e.g., power command adjustments for the various resources) according to the state $s$, and the critic network $Q(s, a \mid \theta^{Q})$ evaluates the value of a given state-action pair. Designing a reasonable reward function is crucial for guiding the agent to capture the impact of uncertainty. The reward function incorporates frequency deviation penalties, adjustment costs, and risk preference terms. For example, if action $a$ at a certain moment reduces the frequency deviation, a positive reward is given; if the frequency deviation increases, a penalty is imposed; and the higher the adjustment cost caused by the action (e.g., activating expensive generators or deeply discharging batteries), the lower the reward. For the uncertain part, a risk metric is introduced: for example, additional penalties are imposed for performance degradation exceeding a certain confidence level, to ensure the policy remains robust across scenarios.
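The reward structure described above can be sketched as follows. This is a minimal illustration only: the function name, the weights `k_freq`, `k_cost`, `k_risk`, and the way the risk term is passed in as `tail_loss` are assumptions for illustration and are not specified in the text.

```python
def frequency_reward(df_before, df_after, adjust_cost, tail_loss=0.0,
                     k_freq=10.0, k_cost=0.01, k_risk=1.0):
    """Hypothetical reward: frequency improvement minus cost and risk penalties.

    df_before / df_after: frequency deviation (Hz) before and after the action.
    adjust_cost: regulation cost caused by the action (yuan).
    tail_loss: performance degradation beyond the chosen confidence level
               (0 when the outcome stays within the acceptable range).
    The k_* weights are illustrative assumptions, not values from the text.
    """
    improvement = abs(df_before) - abs(df_after)  # positive when deviation shrinks
    risk_penalty = k_risk * max(tail_loss, 0.0)
    return k_freq * improvement - k_cost * adjust_cost - risk_penalty


# An action that halves the deviation at no cost earns a positive reward;
# the same action becomes less attractive as its cost grows.
cheap = frequency_reward(0.10, 0.05, adjust_cost=1.0)
expensive = frequency_reward(0.10, 0.05, adjust_cost=10.0)
```

Keeping the three terms separate makes it easy to retune the trade-off between frequency quality, cost, and risk aversion without changing the agent.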

[0042] Step S1023: Through continuous interactive learning between the agent and the interactive environment, extract uncertainty patterns from historical operating data and / or real-time interactive data as uncertainty distribution characteristics in the operation of the virtual power plant.

[0043] Through the design of step S1022 above, the agent approximately "learns" the probability distribution of environmental uncertainty through repeated training: the results of frequency responses under different scenarios are reflected as reward signals, which drive policy updates. The DDPG algorithm updates the policy network parameters by gradient; for example, the gradient of the actor's policy is approximately:

$$\nabla_{\theta^{\mu}} J \approx \mathbb{E}_{s}\left[\nabla_{a} Q(s, a \mid \theta^{Q})\Big|_{a=\mu(s \mid \theta^{\mu})}\,\nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\right]$$

[0044] where $J$ is the expected cumulative return objective of the policy; $\theta^{\mu}$ is the set of trainable parameters of the actor network; $\mu(s \mid \theta^{\mu})$ is the deterministic policy, i.e., the action output by the actor network in state $s$; $s$ is the environmental state vector (which may include observations such as frequency deviation, renewable output, and SOC in the VPP); $a$ is the continuous action vector (such as the frequency regulation allocation or constraint hyperparameters of each unit); $\nabla_{a}$ is the gradient (partial derivative) operator with respect to the action vector; $\theta^{Q}$ is the parameter set of the critic network; and $Q(s, a \mid \theta^{Q})$ is the critic's value assessment of the state-action pair (a scalar representing long-term performance or negative cost). The term $\nabla_{a} Q(s, a \mid \theta^{Q})\big|_{a=\mu(s \mid \theta^{\mu})}$ is the critic's gradient with respect to the action, evaluated at the action output by the current policy, and indicates "in which direction to fine-tune the action so that the value increases fastest"; $\nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})$ is the Jacobian of the actor's output with respect to its parameters, describing "which parameters push the action in which direction"; and $\mathbb{E}_{s}[\cdot]$ is the expectation over the state distribution (induced by experience replay or by the policy), approximated in practice by the mean over a mini-batch of samples.
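The chain-rule structure of the deterministic policy gradient can be checked on a toy problem. The sketch below is not a DDPG implementation: it assumes a known quadratic critic and a one-parameter linear actor (both hypothetical) so that the mini-batch gradient can be written out by hand.

```python
import random


def critic_q(s, a):
    # Toy critic, assumed known: the value is highest when a == 2 * s.
    return -(a - 2.0 * s) ** 2


def dq_da(s, a):
    # Gradient of the toy critic with respect to the action.
    return -2.0 * (a - 2.0 * s)


def actor(s, theta):
    # Linear deterministic policy a = theta * s; d(actor)/d(theta) = s.
    return theta * s


def policy_gradient(states, theta):
    # Mini-batch mean of dQ/da (evaluated at a = actor(s)) times d(actor)/d(theta).
    return sum(dq_da(s, actor(s, theta)) * s for s in states) / len(states)


random.seed(0)
theta = 0.0
learning_rate = 0.5  # illustrative step size
for _ in range(200):
    batch = [random.uniform(-1.0, 1.0) for _ in range(32)]
    theta += learning_rate * policy_gradient(batch, theta)  # gradient ascent on J
```

In this toy setting the actor parameter moves toward the value the critic prefers (theta close to 2), which is the behavior the formula describes: the critic says how to move the action, and the actor's Jacobian translates that into a parameter update.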

[0045] This formula provides an approximate way to calculate the deterministic policy gradient $\nabla_{\theta^{\mu}} J$. The goal is to increase $J$, that is, to make the actions output online by the actor consistently better, which in this invention manifests as more stable frequency, lower overall cost (including carbon cost), and reduced battery life loss. By the chain rule, the effect of "a parameter change altering the action" and the effect of "the action altering the value" are multiplied and averaged over the possible states. Accordingly, the critic's gradient at the current action, $\nabla_{a} Q$, indicates how to fine-tune the action in the direction of increasing value; the actor's Jacobian $\nabla_{\theta^{\mu}} \mu$ maps this direction back to how the parameters should be modified; and averaging over a batch of states yields an estimate of $\nabla_{\theta^{\mu}} J$, which is used to update $\theta^{\mu}$. Intuitively, the update combines information about "which direction to move the action in for better results" with information about "how to adjust the parameters to push the action in that direction," thereby enabling the policy to gradually learn to provide better continuous command allocation under uncertainty in the virtual power plant secondary frequency regulation scenario.

[0046] Through such policy gradient updates, the agent continuously adjusts to maximize long-term rewards. After sufficient training, the reinforcement learning module can generate conditional probability distributions of uncertainties or corresponding adjustment strategies based on real-time observations. When environmental statistical characteristics change (such as increased fluctuations in wind power output), the agent can automatically update its strategy through continuous training, achieving dynamic adaptation. Compared to IGDT, which requires manual assumptions about the worst-case scenario, the deep reinforcement learning uncertainty modeling in this embodiment is significantly innovative: it can directly learn risk distributions from historical and real-time data, ensuring frequency regulation safety margins without being overly conservative, thereby improving the economy and flexibility of virtual power plant frequency regulation.

[0047] Step S103: Based on the initial power regulation command and the uncertainty distribution characteristics, a swarm intelligence optimization algorithm is used to iteratively optimize the multi-objective optimization function, which includes frequency control performance, operational economy and resource sustainability, to determine the optimal power regulation command for each distributed resource. In the iterative optimization process, the swarm intelligence optimization algorithm introduces a historical search experience memory mechanism, an adaptive search radius adjustment mechanism and a solution structure reorganization mechanism across resource regions.

[0048] This step takes the initial allocation command as the starting point for optimization and the uncertainty distribution characteristics as the basis for parameter adjustment. By introducing a swarm intelligence optimization algorithm with three improvement mechanisms, it seeks the optimal solution across the three dimensions of frequency control performance, operational economy, and resource sustainability, so that the final power regulation command combines environmental adaptability, global optimality, and engineering feasibility.

[0049] Specifically, step S103 above includes: Step S1031: Use the initial power adjustment command as the initial population individuals for the swarm intelligence optimization algorithm.

[0050] For example, the swarm intelligence optimization algorithm uses the Improved Artificial Lemming Optimization Algorithm (ALA). ALA is a biomimetic metaheuristic algorithm inspired by the group behavior of lemmings in nature. The standard ALA constructs four typical lemming behaviors for searching the optimal solution space: long-distance migration and burrowing correspond to global exploration, while foraging and predator avoidance correspond to local exploitation. The algorithm iteratively approaches the optimal solution by simulating the lemming group's migration and foraging process within the solution space. In this embodiment, the frequency regulation power allocation of each unit in the virtual power plant is represented as a solution vector. The ALA algorithm uses a population of solution vectors of a certain size as the initial solution and optimizes the objective function step by step by simulating the evolution of this population through the behavior of lemming groups.

[0051] Step S1032: Determine the optimization parameters of the multi-objective optimization function based on the characteristics of the uncertainty distribution; the multi-objective optimization function includes frequency control performance, operational economy, and resource sustainability.

[0052] This step involves constructing a multi-objective optimization model for the secondary frequency regulation control of a virtual power plant, clarifying each optimization objective and constraint. The virtual power plant's participation in secondary frequency regulation must ensure system frequency stability while also considering economic efficiency and sustainability. Therefore, performance indicators encompassing frequency control performance, operational economy, and resource sustainability are incorporated into the objective function system.

[0053] For example, the frequency control performance indicators include a frequency deviation indicator ($J_f$), which measures the deviation of the system frequency from its nominal value during frequency regulation. A larger deviation indicates a worse frequency regulation effect. This indicator emphasizes restoring the frequency to the 50 Hz (or 60 Hz) standard as soon as possible to ensure power quality and system safety.

[0054] For example, the operational economy indicators include the adjustment cost ($J_c$): this refers to the economic cost incurred by the various resources in the virtual power plant when executing frequency regulation commands. For example, increasing the output of a thermal power unit increases fuel costs, and frequent charging and discharging of batteries shortens their lifespan. An adjustment cost coefficient $c_i$ (yuan per kilowatt of frequency regulation power) can be set for each unit $i$, and the total regulation cost can then be expressed as the weighted sum of the frequency regulation power of each unit, such as:

$$J_c = \sum_{i} c_i \int_{0}^{T} \left|\Delta P_i(t)\right| \, dt$$

[0055] where $\Delta P_i(t)$ represents the adjustment of the output of the i-th unit at time $t$ relative to the baseline plan, and the integral (or summation) represents the cumulative adjustment effort over the entire frequency regulation cycle. A larger $c_i$ indicates a more expensive unit or one that should not be adjusted frequently (e.g., high cost coefficients for generators or life-sensitive equipment), and the optimization will tend to reduce the command range of such a unit.
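The regulation cost described above, a cost-weighted sum of each unit's accumulated adjustment, can be sketched in discrete time as follows. Taking the absolute value of each adjustment (so that up and down regulation both count) and the unit convention of `dt` are assumptions of this sketch; the function and variable names are hypothetical.

```python
def regulation_cost(delta_p, cost_coeff, dt=1.0):
    """Cost-weighted cumulative adjustment over the regulation period.

    delta_p[i][t]: adjustment of unit i at step t relative to the baseline.
    cost_coeff[i]: adjustment cost coefficient of unit i.
    dt: step length (assumed unit convention; 1.0 reduces to a plain sum).
    """
    total = 0.0
    for c_i, series in zip(cost_coeff, delta_p):
        mileage = sum(abs(x) for x in series) * dt  # cumulative adjustment effort
        total += c_i * mileage
    return total


# Unit 0 is cheap and moves a lot; unit 1 is expensive and moves little.
demo_cost = regulation_cost([[10.0, -5.0], [2.0, 2.0]], [0.5, 2.0])
```

With these numbers the cheap unit contributes 0.5 × 15 and the expensive one 2.0 × 4, which shows why a larger coefficient pushes the optimizer to shrink that unit's command range.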

[0056] For example, resource sustainability includes the carbon emission cost and the battery life loss cost; the battery life loss cost is determined based on an equivalent full cycle model and corrected by incorporating at least one of a temperature factor, a rate factor, and a state-of-health factor.

[0057] Carbon emission cost ($J_{CO_2}$): this considers the carbon emission cost incurred due to power generation adjustments during frequency regulation, in line with the requirements of low-carbon operation. If certain units in the virtual power plant (such as diesel generators) produce carbon emissions when their output changes, an emission factor $e_i$ (tons of CO2 per kilowatt-hour) and a carbon price $\pi_{CO_2}$ (yuan per ton of CO2) are introduced. The carbon emission cost can then be expressed as:

$$J_{CO_2} = \pi_{CO_2} \sum_{i} e_i \int_{0}^{T} \Delta P_i^{+}(t) \, dt$$

[0058] where $\Delta P_i^{+}(t) = \max(\Delta P_i(t), 0)$ represents the adjustment when the output of unit $i$ is increased (power generation is increased), so that only the additional emissions caused by upward adjustment are counted. For zero-carbon energy sources (wind, solar, energy storage, etc.), $e_i = 0$ can be set, meaning their emission cost is disregarded. By minimizing $J_{CO_2}$, the optimization prioritizes the use of clean or low-carbon resources in frequency regulation and reduces output variations of fossil fuel units, thereby reducing the carbon footprint.
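As a rough numerical illustration of the carbon cost described above, the sketch below counts only upward adjustments and applies a per-unit emission factor and a single carbon price. The function name, argument names, and example values are hypothetical.

```python
def carbon_cost(delta_p_kw, emission_factor, carbon_price, dt_h=1.0):
    """Carbon cost of upward regulation only.

    delta_p_kw[i][t]: adjustment of unit i at step t (kW).
    emission_factor[i]: tons of CO2 per kWh (0 for wind, solar, storage).
    carbon_price: yuan per ton of CO2.
    dt_h: step length in hours.
    """
    tons = 0.0
    for e_i, series in zip(emission_factor, delta_p_kw):
        extra_energy = sum(max(x, 0.0) for x in series) * dt_h  # kWh generated upward
        tons += e_i * extra_energy
    return carbon_price * tons


# Unit 0 is a diesel generator; unit 1 is storage with a zero emission factor.
demo_carbon = carbon_cost([[100.0, -50.0], [30.0, 30.0]], [0.001, 0.0], 80.0)
```

Only the 100 kW upward step of the diesel unit contributes here; its downward step and everything the storage unit does are free of carbon cost, which is what steers the optimizer toward low-carbon resources.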

[0059] Battery life depreciation cost ($J_{bat}$): for the scenario where battery energy storage in the virtual power plant participates in frequency regulation, the cost of battery life degradation is specifically introduced. Batteries gradually lose capacity during charge-discharge cycles; excessively high temperature ($T$), high charge-discharge rate (C-rate), and low remaining life all accelerate aging. To quantify this impact, a cycle life model is used to convert battery life loss into an economic cost. For example, assuming a certain type of battery has a rated cycle life of $N_{cyc}$ cycles under standard operating conditions and an initial cost of $C_{inv}$ (yuan), each equivalent full cycle (EFC) "consumes" approximately $C_{inv}/N_{cyc}$ of value. Furthermore, the main factors affecting life are introduced as correction factors: the temperature factor $f_T$ (the higher the temperature, the greater the life consumption), the rate factor $f_C$ (the higher the charge / discharge rate, the more severe the life loss), and the remaining-life factor $f_H$ (the older the battery, the higher the marginal cost of each cycle). Therefore, the life cost $\Delta C_j(t)$ added when battery $j$ outputs power $P_j(t)$ during a very short interval $\Delta t$ can be expressed as:

$$\Delta C_j(t) = \frac{C_{inv,j}}{N_{cyc,j}} \cdot \frac{\left|P_j(t)\right| \Delta t}{2 E_j} \cdot f_T \cdot f_C \cdot f_H$$

[0060] where $E_j$ is the rated energy capacity of battery $j$, and $\left|P_j(t)\right| \Delta t / (2 E_j)$ is the proportion of the energy discharged or charged at that moment relative to a complete cycle (the equivalent cycle depth). The above formula accounts for the impact of temperature, rate, and current state of health on the per-cycle cost. Summing over all time steps of the regulation period and over all battery units in the virtual power plant gives the total battery life depreciation cost:

$$J_{bat} = \sum_{j \in Batt} \sum_{t} \Delta C_j(t)$$

where Batt is the set of battery units.
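The equivalent-full-cycle costing described above can be sketched as follows. Two simplifications are assumptions of this sketch rather than statements from the text: one full cycle is counted as an energy throughput of twice the rated capacity (one charge plus one discharge), and the correction factors are held constant over the period. All names and numbers are hypothetical.

```python
def battery_wear_cost(power_kw, dt_h, energy_kwh, invest_cost, cycle_life,
                      f_temp=1.0, f_rate=1.0, f_health=1.0):
    """Life-loss cost of one battery over a series of power steps.

    power_kw[t]: charge (<0) or discharge (>0) power at step t.
    energy_kwh: rated energy capacity.
    invest_cost / cycle_life: value consumed by one equivalent full cycle.
    f_temp, f_rate, f_health: correction factors (>= 1 means faster aging).
    """
    throughput_kwh = sum(abs(p) for p in power_kw) * dt_h
    efc = throughput_kwh / (2.0 * energy_kwh)  # equivalent full cycles (assumed 2E per cycle)
    cost_per_cycle = invest_cost / cycle_life
    return cost_per_cycle * efc * f_temp * f_rate * f_health


# One full discharge followed by one full charge of a 50 kWh battery.
base = battery_wear_cost([50.0, -50.0], 1.0, 50.0, invest_cost=100000.0, cycle_life=5000.0)
hot = battery_wear_cost([50.0, -50.0], 1.0, 50.0, 100000.0, 5000.0, f_temp=1.5)
```

Because the correction factors multiply the per-cycle cost, the same power profile becomes more expensive for a hot or aged battery, which is how the optimizer is discouraged from leaning on stressed units.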

[0061] This metric encourages the algorithm to minimize deep battery discharge or high load impacts during optimization, avoiding excessive sacrifice of battery life for temporary frequency modulation gains, thus making the frequency modulation strategy more economically sustainable.

[0062] Combining the above four objectives, this embodiment forms a multi-objective optimization problem. Denoting the measurable frequency deviation cost, regulation cost, carbon emission cost, and battery depreciation cost as $J_f$, $J_c$, $J_{CO_2}$, and $J_{bat}$ respectively, the optimization objective can be expressed as simultaneously minimizing the vector objective $(J_f, J_c, J_{CO_2}, J_{bat})$. In practice, to facilitate the solution, weighted summation is often used to transform the multiple objectives into a single objective function $J$:

$$J = w_1 J_f + w_2 J_c + w_3 J_{CO_2} + w_4 J_{bat}$$

[0063] where $w_1, w_2, w_3, w_4$ are the weighting coefficients assigned by the decision-maker, reflecting the relative importance of each objective. The weights can be selected according to actual needs (e.g., $w_1$ is made the largest when frequency security requirements are high, and $w_3$ is increased when environmental protection is emphasized). Optimization involves not only minimizing the objective function but also satisfying various physical and operational constraints, including but not limited to: Power balance constraint: the total frequency regulation power provided by the virtual power plant must match the system demand, i.e., $\sum_i \Delta P_i(t) = \Delta P_{total}(t)$.

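[0064] Output upper and lower limit constraints: for each resource $i$, $P_i^{min} \le P_i^{0}(t) + \Delta P_i(t) \le P_i^{max}$, where $P_i^{0}(t)$ is the baseline output. This ensures that the output after adjustment does not exceed the physical limits.

[0065] Ramp rate constraint: the rate of power change of the generator sets and batteries is limited, $\left|\Delta P_i(t) - \Delta P_i(t-1)\right| \le R_i \, \Delta t$, to prevent adjustments from being made too quickly and affecting equipment stability.

[0066] Battery SOC constraints: energy storage operation must guarantee $SOC_j^{min} \le SOC_j(t) \le SOC_j^{max}$, with the SOC state updated according to $SOC_j(t+1) = SOC_j(t) - P_j(t) \, \Delta t / E_j$.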

[0067] Regional power constraints: If the virtual power plant spans multiple regions, coupling conditions such as regional power balance and line transmission capacity limitations should also be considered.

[0068] Optional factors include regional power flow and line capacity constraints.
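The constraints listed above can be checked for a candidate allocation with a few lines of code. The sketch below covers power balance, output limits, ramp limits, and a lossless SOC update; the lossless update, the sign convention (discharge positive), and all names are assumptions of this illustration, and regional power flow constraints are left out.

```python
def is_feasible(delta_p, prev_delta_p, base_p, p_min, p_max, ramp_max,
                demand, tol=1e-6):
    """Check one time step of a candidate allocation against the constraints."""
    if abs(sum(delta_p) - demand) > tol:            # power balance
        return False
    for i, dp in enumerate(delta_p):
        if not (p_min[i] <= base_p[i] + dp <= p_max[i]):   # output limits
            return False
        if abs(dp - prev_delta_p[i]) > ramp_max[i]:        # ramp limit per step
            return False
    return True


def next_soc(soc, power_kw, dt_h, energy_kwh):
    # Discharge (power > 0) lowers SOC; conversion losses are ignored here.
    return soc - power_kw * dt_h / energy_kwh


ok = is_feasible([5.0, 5.0], [0.0, 0.0], [10.0, 10.0], [0.0, 0.0],
                 [20.0, 20.0], [6.0, 6.0], demand=10.0)
too_fast = is_feasible([8.0, 2.0], [0.0, 0.0], [10.0, 10.0], [0.0, 0.0],
                       [20.0, 20.0], [6.0, 6.0], demand=10.0)
```

A check of this kind would typically be used either to reject infeasible candidates or to drive a penalty term; the text does not say which approach the embodiment takes.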

[0069] The aforementioned multi-objective model comprehensively quantifies the various factors that need to be considered in the secondary frequency regulation decision-making of the virtual power plant. Compared with traditional models that only consider frequency regulation error or cost as a single objective, this embodiment innovatively incorporates carbon emissions and battery depreciation into the objective function, making the optimization results more in line with the requirements of low-carbon and sustainable development of the future power grid. In the next step, the improved ALA algorithm is used to solve the model to obtain the optimal power command allocation scheme for each unit.
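The effect of the weighting coefficients on the weighted-sum objective can be seen by scoring two candidate allocations under different weight sets. The candidate values and weights below are invented for illustration, and the four cost terms are assumed to be already normalized to comparable scales.

```python
def weighted_objective(costs, weights):
    """Weighted sum of (frequency, regulation, carbon, battery) cost terms."""
    return sum(w * c for w, c in zip(weights, costs))


# Candidate A tracks frequency well but relies on a carbon-intensive unit;
# candidate B is cleaner but leaves a larger frequency deviation.
candidate_a = (1.0, 10.0, 8.0, 2.0)
candidate_b = (3.0, 10.0, 1.0, 2.0)

frequency_first = (0.7, 0.1, 0.1, 0.1)
carbon_first = (0.1, 0.1, 0.7, 0.1)


def pick(weights):
    # Return the label of the candidate with the lower weighted objective.
    score_a = weighted_objective(candidate_a, weights)
    score_b = weighted_objective(candidate_b, weights)
    return "A" if score_a < score_b else "B"
```

Calling `pick(frequency_first)` selects candidate A, while `pick(carbon_first)` selects candidate B, so the same pair of solutions is ranked differently depending on which objective the decision-maker emphasizes.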

[0070] Step S1033: A swarm intelligence optimization algorithm is used to iteratively solve the multi-objective optimization function to obtain the optimal power adjustment command for each distributed resource. The swarm intelligence optimization algorithm incorporates a historical search experience memory mechanism, an adaptive search radius adjustment mechanism, and a cross-resource region solution structure reorganization mechanism during the iterative optimization process.

[0071] The historical search experience memory mechanism includes: saving the historical best solution for each search individual and using it as a reference direction in subsequent iterations for position updates.

[0072] For example, to address the premature convergence or repeated entrapment in similar regions that may occur with standard ALA, a dynamic memory function is added to each "lemming" individual in the algorithm. Specifically, a memory position vector $M_i$ is introduced for each search individual $i$ to store the position of the best solution found by that individual so far (i.e., its historical best allocation scheme). In each iteration, if an individual finds a better solution than its memory, the memory is updated; otherwise, the original memory is retained. The memory update rule can be expressed as:

$$M_i^{t+1} = \begin{cases} X_i^{t+1}, & F(X_i^{t+1}) < F(M_i^{t}) \\ M_i^{t}, & \text{otherwise} \end{cases}$$

[0073] where $X_i^{t+1}$ is the new position solution of the i-th individual at iteration $t+1$, and $F(\cdot)$ is the objective function to be optimized, for which a smaller value is better (assuming a minimization problem). The above formula indicates that the memory is updated only when the new solution is better; otherwise, the memory position remains unchanged. With this memory mechanism, each individual carries its own historical best solution, which can be used as a candidate direction in subsequent iterations. For example, in the "foraging" phase, in addition to referring to the current global optimum, individuals can also explore the vicinity of their own historical best solution, which prevents previously discovered good allocation schemes from being forgotten. This mechanism is similar to the "self-cognition" component in particle swarm optimization, but here the memory is adjusted dynamically, so that the algorithm can backtrack and reuse past experience when the environment changes (such as a shift in the best solution due to load uncertainty), which accelerates re-optimization. The dynamic memory mechanism alleviates premature convergence, improves the diversity of solutions, and enhances global search capability.
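The memory update rule described above amounts to a keep-the-better comparison per individual. A minimal sketch, with a hypothetical `sphere` objective standing in for the real multi-objective function:

```python
def sphere(x):
    # Stand-in objective for illustration only; smaller is better.
    return sum(v * v for v in x)


def update_memory(memory, new_position, objective):
    """Return the individual's memory after seeing a new position."""
    if objective(new_position) < objective(memory):
        return list(new_position)   # new solution is better: overwrite memory
    return memory                   # otherwise keep the historical best


memory = [3.0, 4.0]
memory = update_memory(memory, [1.0, 1.0], sphere)   # improvement is stored
memory = update_memory(memory, [5.0, 0.0], sphere)   # worse move is ignored
```

After both calls the memory still holds `[1.0, 1.0]`, so a later exploratory move away from a good allocation does not erase it.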

[0074] The adaptive search radius adjustment mechanism includes dynamically increasing or decreasing the search step size of an individual based on its optimization progress / stagnation status, in order to balance global exploration and local development.

[0075] For example, in standard ALA, the movement step size and search radius of each lemming individual are typically fixed or decay according to a preset schedule. To address the multi-objective optimization requirements of a virtual power plant, an adaptive search radius adjustment mechanism is introduced, enabling the algorithm to dynamically adjust its exploration step size based on iteration progress, balancing global search with local fine-tuning. Specifically, each individual $i$ is assigned a search radius parameter $r_i$. It is initially given a large value to encourage extensive exploration and is then adjusted dynamically according to the convergence of the optimization and the individual's search progress. For example, the following adjustment strategy can be adopted: when an individual does not find a better solution within a certain period, its search radius is appropriately increased to escape local traps; conversely, when an individual frequently finds better solutions, the search radius is gradually reduced to refine the neighborhood search. This can be formalized as:

$$r_i^{t+1} = \begin{cases} \min\left(\alpha \, r_i^{t},\ r_{max}\right), & \text{individual } i \text{ has stagnated} \\ \beta \, r_i^{t}, & \text{individual } i \text{ has improved} \end{cases}$$

[0076] where $r_{max}$ is the preset maximum step size limit, $\alpha > 1$ is the amplification factor, and $0 < \beta < 1$ is the contraction coefficient. The above formula shows that when an individual stagnates, its step size is multiplied by $\alpha$ (but does not exceed the upper limit $r_{max}$) to expand the search scope; when progress is made, the step size is reduced in proportion to $\beta$. For example, $\beta$ may take a value such as 0.5, so that the radius gradually converges as improvements accumulate. With the adaptive search radius, the algorithm can quickly traverse a broad solution space in the early stages of iteration, increasing the probability of finding the global optimum; in the later stages, it converges to a smaller radius around the current optimal solution for fine-tuning, reducing jitter and ensuring solution accuracy. Compared to fixed step sizes or linear decay, this mechanism expands the radius once an individual's number of consecutive stagnant generations reaches a threshold and shrinks the radius exponentially when a new best solution is found, achieving a phased coarse-to-fine search. It adaptively adjusts the search intensity according to the current optimization situation, improving the algorithm's ability to handle complex multimodal functions and its convergence efficiency.
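The expand-on-stagnation, shrink-on-progress rule described above can be sketched as a small state update per individual. The parameter values (`alpha`, `beta`, `patience`, `r_max`) and the choice to reset the stagnation counter after an expansion are assumptions of this sketch.

```python
def adapt_radius(radius, improved, stagnation,
                 r_max=1.0, alpha=1.5, beta=0.5, patience=3):
    """Return (new_radius, new_stagnation_count) for one individual.

    improved: True if the individual found a better solution this iteration.
    stagnation: number of consecutive iterations without improvement so far.
    """
    if improved:
        return beta * radius, 0                    # refine around the new best
    stagnation += 1
    if stagnation >= patience:
        return min(alpha * radius, r_max), 0       # widen the search, capped at r_max
    return radius, stagnation                      # not stagnant long enough yet


radius, count = 0.4, 0
for found_better in (False, False, False, True):
    radius, count = adapt_radius(radius, found_better, count)
```

In this short run the radius first grows after three stagnant iterations and then shrinks on the improvement, which is the coarse-to-fine behavior the mechanism is meant to produce.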

[0077] The cross-resource region solution structure reorganization mechanism includes: dividing the solution vector into segments according to region or resource type, and cross-reorganizing the segments among different individuals to explicitly handle cross-regional coupling constraints and improve the efficiency of cross-regional coupling optimization.

[0078] For example, virtual power plants often consist of multiple geographically or functionally distinct resource subsets, such as distributed power sources in different regions or subgroups composed of generator sets and energy storage devices. These subsets have power coupling relationships (e.g., resources within the same region respond to frequency in a coordinated manner, and power flow exchange across regions is limited by line capacity). To better reflect this structural characteristic in the optimization, an operation simulating multi-region cross-cooperation is added to ALA. Specifically, the solution vector $X$ can be divided into $m$ segments by region or category, $X = [X^{(1)}, X^{(2)}, \dots, X^{(m)}]$, where each segment $X^{(j)}$ represents the power allocation subvector of the resources within the j-th region. When a multi-region crossover event is triggered, two lemming individuals (solution schemes) $X_a$ and $X_b$ are randomly selected from the current population. One or more region indices $k$ are randomly chosen, and the power allocation schemes of the two individuals in these regions are exchanged, thereby generating new candidate solutions. For example, if region $k$ is selected, the new solution of individual $X_a$ after the exchange is:

$$X_a' = \left[X_a^{(1)}, \dots, X_a^{(k-1)}, X_b^{(k)}, X_a^{(k+1)}, \dots, X_a^{(m)}\right]$$

[0079] and $X_b'$ correspondingly obtains the original region-$k$ part of $X_a$. This crossover operator allows good frequency regulation schemes from different regions to be recombined and fused: for example, one individual has a low-cost allocation scheme in region A, while another has a good frequency deviation suppression scheme in region B, and crossover can combine the advantages of both in the same solution. This multi-region coupling crossover mimics the migration and communication behavior of lemming populations between different habitats, helping to overcome the limitations of optimizing each sub-region separately and to find a globally better combination of resource allocations. Unlike traditional random mutation of a single individual, the crossover behavior recombines existing high-quality fragments in the population, enabling more effective exploration of a solution space containing complex coupling relationships. In particular, in the virtual power plant frequency control problem, the linkage between different regions or different types of resources is a decision-making challenge. This mechanism gives the algorithm the ability to search such decoupled-coupled structures, improving global optimization performance.
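The region-level exchange described above can be sketched as a segment swap between two solution vectors. Representing regions as index ranges and swapping a single randomly chosen region are assumptions of this sketch; the function name is hypothetical.

```python
import random


def region_crossover(parent_a, parent_b, regions, rng=random):
    """Swap one region's allocation segment between two solutions.

    regions: list of (start, end) index ranges, one per region or resource type.
    Returns two new candidate solutions; the parents are left unchanged.
    """
    start, end = rng.choice(regions)
    child_a, child_b = list(parent_a), list(parent_b)
    child_a[start:end] = parent_b[start:end]
    child_b[start:end] = parent_a[start:end]
    return child_a, child_b


# Two regions with two units each; only the second region is offered for exchange.
a, b = region_crossover([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [(2, 4)])
```

Because the swapped segments generally carry different total power, the children may violate the power balance constraint and would need a repair or penalty step before evaluation; how the embodiment handles this is not stated in the text.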

[0080] In summary, the improved ALA incorporates the above three improvements into the standard lemming algorithm's process: after initializing the population, in each iteration, individuals first perform standard migration and burrowing search operations, then apply dynamic memory to update historical bests, adjust the adaptive search radius, and perform multi-region crossover operations with a certain probability to generate new solutions. Finally, fitness is evaluated and the population is updated. With these innovative mechanisms, the algorithm can more fully search the solution space of virtual power plant frequency regulation command allocation, balancing global applicability and convergence speed, thus providing efficient support for solving complex multi-objective optimization problems.

[0081] The virtual power plant secondary frequency regulation power optimization allocation method provided in this embodiment has the following beneficial effects: Uncertainty modeling is moving from "static boundaries" to "data-driven": DRL is used to learn frequency correction strategies in continuous action space, directly approximating environmental distribution and risk profiles from historical / online data, avoiding conservative worst-case assumptions, and reducing redundancy and excess costs while ensuring safety.

[0082] ALA’s three key improvements enhance solution quality and speed: Individual dynamic memory mechanism preserves and reuses individual historical best fragments, reducing premature convergence; Adaptive optimization search radius automatically expands or contracts as stagnation / progress occurs, taking into account both global exploration and local fine-tuning; Multi-region power coupling cross-reorganizes high-quality sub-solutions at the “fragment level” in the solution space, strengthening cross-region / cross-resource coupling search capabilities.

[0083] Battery life economics are explicitly measurable: a life loss model based on equivalent full cycles (EFC) is introduced, and the per-cycle cost is adjusted by a temperature factor, a rate factor, and a state-of-health factor. This unifies the immediate frequency regulation benefit and the long-term lifetime value in the same objective framework, avoiding the short-sighted strategy of "trading lifetime for response".

[0084] Multi-objective low-carbon optimization: The objective function jointly considers frequency deviation, adjustment mileage, carbon cost, and lifetime loss, and combines regional power flow and ramp constraints to achieve overall optimization of "stability-economy-low carbon-sustainability".

[0085] Project feasibility and scalability: The solution maintains a plug-and-play interface with distributed AGC; it can be solved centrally or regionally decomposed and coordinated, making it easy to deploy and upgrade in VPPs of different sizes (including multiple terminals and cross-regional).

[0086] Closed-loop self-evolution: The DRL strategy and the improved ALA continuously absorb operational data in the feedback loop and evolve adaptively online; they still maintain convergence and stability under perturbation scenarios (such as sudden wind changes and price jumps).

[0087] This embodiment provides another method for optimizing the allocation of secondary frequency regulation power in a virtual power plant. Figure 2 is a flowchart of a virtual power plant secondary frequency regulation power optimization allocation method according to an embodiment of the present invention. As shown in Figure 2, the method further includes a feedback correction step on the basis of the embodiment shown in Figure 1, as follows: Step S201: Obtain the grid frequency deviation and generate initial power regulation commands for each distributed resource based on the grid frequency deviation. For details, refer to step S101 of the embodiment shown in Figure 1, which will not be repeated here.

[0088] Step S202: Acquire historical operating data and / or real-time interactive data, and based on learning from the historical operating data and / or real-time interactive data, obtain the uncertainty distribution characteristics during the virtual power plant operation process. For details, refer to step S102 of the embodiment shown in Figure 1, which will not be repeated here.

[0089] Step S203: Based on the initial power regulation command and the uncertainty distribution characteristics, a swarm intelligence optimization algorithm is used to iteratively optimize a multi-objective optimization function, including frequency control performance, operational economy, and resource sustainability, to determine the optimal power regulation command for each distributed resource. The swarm intelligence optimization algorithm incorporates a historical search experience memory mechanism, an adaptive search radius adjustment mechanism, and a cross-resource region solution structure reorganization mechanism during the iterative optimization process. For details, refer to step S103 of the embodiment shown in Figure 1, which will not be repeated here.

[0090] Step S204: The optimal power adjustment command is sent to each distributed resource for execution, and the actual residual frequency deviation after execution is obtained; a feedback correction command is generated based on the actual residual frequency deviation, and the feedback correction command is superimposed on the optimal power adjustment command to generate the corrected power adjustment command.

[0091] Specifically, after the optimization calculation is completed in step S203, the obtained frequency modulation command scheme needs to be used for actual control execution and dynamically corrected through a feedback mechanism to cope with real-time changes. This step S204 establishes a dynamic feedback execution closed loop to ensure that the optimized command is effectively implemented in the physical system and continuously corrects frequency deviations.

[0092] First, the virtual power plant aggregation controller issues an initial frequency regulation command to each resource unit according to the optimal power allocation. Upon receiving the command, these units (generators, loads, energy storage, etc.) adjust their output at the physical level, causing the system frequency to change accordingly. After a feedback cycle (e.g., several seconds) has elapsed since the command was executed, the dispatch center obtains the latest system frequency and the response of each unit. Due to model simplification and environmental uncertainties, the actual frequency deviation may not have been completely eliminated, and further correction is required. Therefore, a frequency deviation feedback correction strategy is introduced: an incremental correction command is calculated from the magnitude of the residual frequency deviation, redistributed to each unit according to a certain allocation principle, and superimposed on the optimized command to achieve closed-loop stability. For example, simple proportional feedback can be used, in which the corrected power regulation command of each unit is the optimized command plus a correction term proportional to the residual frequency deviation.

[0093] Here, the feedback gain coefficient of unit i is determined based on unit capacity or participation factor, so that the overall feedback correction corresponds to the required total power. In other words, when a residual frequency deviation is detected, an additional correction power command is automatically generated and superimposed on the original optimized command, with its direction opposite to the sign of the frequency deviation so as to reduce the deviation. In addition to proportional control, integral control terms can be introduced as needed to eliminate steady-state errors; for example, the integral of the accumulated frequency deviation can be used to offset persistent small deviations.
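The proportional feedback described above, with the optional integral term, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the formula of this embodiment: the function names, the total gain in MW/Hz, the example participation factors, and the sign convention (a positive command raises net injected power, so a negative residual deviation calls for a positive correction) are all hypothetical.

```python
def gains_from_participation(total_gain, participation):
    """Split an assumed total feedback gain (MW/Hz) across units.

    Each unit's gain is proportional to its participation factor, so the
    gains sum to total_gain and the per-unit corrections sum to the total
    correction power.
    """
    total = sum(participation)
    if total <= 0:
        raise ValueError("participation factors must sum to a positive value")
    return [total_gain * p / total for p in participation]


def corrected_commands(optimal, gains, residual_df,
                       integral_gains=None, df_integral=0.0):
    """Superimpose a feedback correction on the optimized commands.

    optimal        : optimized power commands per unit (MW)
    gains          : proportional gains per unit (MW/Hz)
    residual_df    : residual frequency deviation after execution (Hz)
    integral_gains : optional integral gains per unit (MW/(Hz*s))
    df_integral    : accumulated integral of the deviation (Hz*s)
    """
    if integral_gains is None:
        integral_gains = [0.0] * len(optimal)
    if not (len(optimal) == len(gains) == len(integral_gains)):
        raise ValueError("all per-unit lists must have the same length")
    # The correction opposes the sign of the deviation in order to reduce it.
    return [p - kp * residual_df - ki * df_integral
            for p, kp, ki in zip(optimal, gains, integral_gains)]


# Hypothetical example: three units, under-frequency residual of 0.02 Hz.
gains = gains_from_participation(100.0, [2.0, 1.0, 1.0])
corrected = corrected_commands([10.0, 5.0, -2.0], gains, residual_df=-0.02)
```

With these illustrative numbers the three units pick up roughly 1.0, 0.5 and 0.5 MW of additional power, which together match the assumed total gain multiplied by the magnitude of the residual deviation.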

[0094] Another key innovation of the dynamic feedback execution mechanism lies in its integration with online updates of reinforcement learning policies. Since the aforementioned steps have trained a policy network based on DDPG, it can be deployed in the feedback loop to adjust control parameters in real time. For example, when the environment undergoes significant changes (such as a sudden drop in wind power), the reinforcement learning agent can perceive a significant deviation from the training distribution in its state and adaptively adjust the frequency modulation strategy, such as temporarily increasing the proportion of energy storage output to cope with the sudden drop. This design allows feedback control to go beyond traditional linear controllers, incorporating intelligent decision-making elements and enhancing its resilience to unknown disturbances.

[0095] At the practical execution level, each unit of the virtual power plant maintains information exchange with the aggregation controller through a high-speed communication network, achieving closed-loop control at the second level or at even higher rates. Each cycle includes four stages, "command issuance → field execution → status feedback → command correction," which are iterated continuously. The frequency control objective is for the frequency deviation to gradually approach zero, indicating that the system frequency has stabilized. Evaluation metrics such as frequency regulation response time, overshoot, and steady-state error can be used to verify the effectiveness of the feedback mechanism. Through dynamic command feedback, this embodiment overcomes the shortcoming of a single offline optimization, whose result may not be accurate enough in actual operation, and ensures that the frequency can be stably controlled within the compliant range under various disturbances. In addition, this closed-loop execution provides the algorithm with actual operating data, which can be fed back to the reinforcement learning model to continuously update the environment model and allow the strategy to evolve gradually. Overall, the dynamic feedback and execution steps close the loop for the entire virtual power plant secondary frequency regulation strategy from algorithm design to practical application, making frequency regulation control both optimized and reliable and meeting the technical requirements for safe and economical grid operation.
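The evaluation metrics mentioned above can be computed from a recorded frequency deviation trace, for example as in the following Python sketch. The specific definitions used here (settling into a tolerance band, overshoot as the largest excursion past zero, steady-state error as the mean of the last samples) and the default values of `band` and `tail` are assumptions chosen for illustration; this embodiment does not prescribe them.

```python
def regulation_metrics(times, deviations, band=0.02, tail=5):
    """Compute simple frequency regulation metrics from a deviation trace.

    times      : sample times in seconds, ascending
    deviations : frequency deviation at each sample (Hz)
    band       : assumed tolerance band (Hz) that counts as "settled"
    tail       : number of final samples averaged for the steady-state error
    """
    if not times or len(times) != len(deviations):
        raise ValueError("times and deviations must be non-empty and equal length")

    # Response time: first time after which |deviation| stays inside the band.
    last_outside = -1
    for i, d in enumerate(deviations):
        if abs(d) > band:
            last_outside = i
    if last_outside == len(deviations) - 1:
        response_time = None  # never settled within the recorded window
    else:
        response_time = times[last_outside + 1] - times[0]

    # Overshoot: largest excursion on the opposite side of zero
    # from the initial deviation.
    initial = deviations[0]
    if initial < 0:
        overshoot = max(0.0, max(deviations))
    elif initial > 0:
        overshoot = max(0.0, -min(deviations))
    else:
        overshoot = 0.0

    # Steady-state error: mean deviation over the last `tail` samples.
    tail_values = deviations[-tail:]
    steady_state_error = sum(tail_values) / len(tail_values)

    return {
        "response_time": response_time,
        "overshoot": overshoot,
        "steady_state_error": steady_state_error,
    }


# Hypothetical trace sampled once per second after a disturbance.
metrics = regulation_metrics(
    times=[0, 1, 2, 3, 4, 5],
    deviations=[-0.10, -0.04, 0.01, 0.005, 0.0, 0.0],
    band=0.02,
    tail=2,
)
```

For this illustrative trace, the deviation enters the assumed 0.02 Hz band at the third sample and stays there, so the response time is 2 s, the overshoot is 0.01 Hz, and the steady-state error is zero.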

[0096] This embodiment also provides a virtual power plant secondary frequency regulation power optimization and allocation device, which is used to implement the above embodiments and preferred embodiments; what has already been described will not be repeated. As used below, the term "module" can be a combination of software and / or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

[0097] This embodiment provides a virtual power plant secondary frequency regulation power optimization and allocation device which, as shown in Figure 3, includes: an initial allocation module 301, used to obtain the grid frequency deviation and generate initial power adjustment commands for each distributed resource based on the grid frequency deviation; an uncertainty learning module 302, used to acquire historical operating data and / or real-time interactive data, and to obtain the uncertainty distribution characteristics in the operation process of the virtual power plant based on learning from the historical operating data and / or real-time interactive data; and an optimization decision module 303, used to iteratively optimize a multi-objective optimization function, including frequency control performance, operational economy and resource sustainability, based on the initial power adjustment command and the uncertainty distribution characteristics, using a swarm intelligence optimization algorithm, to determine the optimal power adjustment command for each distributed resource. The swarm intelligence optimization algorithm introduces a historical search experience memory mechanism, an adaptive search radius adjustment mechanism and a cross-resource-region solution structure reorganization mechanism during the iterative optimization process.

[0098] In an optional embodiment, the above-described device further includes a feedback correction module 304, used to send the optimal power adjustment command to each distributed resource for execution and obtain the actual residual frequency deviation after execution; to generate a feedback correction command based on the actual residual frequency deviation; and to superimpose the feedback correction command on the optimal power adjustment command to generate a corrected power adjustment command.
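The data flow between modules 301 to 304 can be illustrated with the following Python sketch. It only shows how the modules might be wired together; the class name `AllocationDevice`, the callable signatures, and the stub lambdas standing in for each module are hypothetical placeholders, not the algorithms of this embodiment.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Commands = List[float]


@dataclass
class AllocationDevice:
    # Module 301: grid frequency deviation -> initial commands per resource.
    initial_allocation: Callable[[float], Commands]
    # Module 302: historical / real-time data -> uncertainty features.
    uncertainty_learning: Callable[[Sequence[dict]], dict]
    # Module 303: (initial commands, uncertainty features) -> optimal commands.
    optimization_decision: Callable[[Commands, dict], Commands]
    # Module 304 (optional): (optimal commands, residual deviation) -> corrected.
    feedback_correction: Optional[Callable[[Commands, float], Commands]] = None

    def run(self, freq_dev: float, data: Sequence[dict],
            residual_dev: Optional[float] = None) -> Commands:
        initial = self.initial_allocation(freq_dev)
        features = self.uncertainty_learning(data)
        optimal = self.optimization_decision(initial, features)
        # Apply the correction only when module 304 is present and a
        # residual deviation has been measured after execution.
        if self.feedback_correction is not None and residual_dev is not None:
            return self.feedback_correction(optimal, residual_dev)
        return optimal


# Placeholder stubs for each module, for illustration only.
device = AllocationDevice(
    initial_allocation=lambda df: [-50.0 * df, -50.0 * df],
    uncertainty_learning=lambda data: {"n_samples": len(data)},
    optimization_decision=lambda cmds, feats: list(cmds),
    feedback_correction=lambda cmds, res: [c - 10.0 * res for c in cmds],
)
optimal_only = device.run(-0.1, [])
corrected = device.run(-0.1, [], residual_dev=-0.02)
```

Keeping module 304 optional mirrors the optional embodiment: without it, `run` returns the optimal commands unchanged.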

[0099] The virtual power plant secondary frequency regulation power optimization and allocation device provided in this embodiment of the invention can execute the virtual power plant secondary frequency regulation power optimization allocation method provided in the embodiment shown in Figure 1 or Figure 2, and has the corresponding functional modules and beneficial effects. Further functional descriptions of the above modules and units are the same as in the corresponding embodiments described above, and will not be repeated here.

[0100] Figure 4 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention.

[0101] Referring now to Figure 4 in detail, it shows a structural schematic suitable for implementing an electronic device according to embodiments of the present invention. The electronic device may include a processor (e.g., a central processing unit, graphics processor, etc.) 401, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 402 or a program loaded from memory 408 into random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the electronic device. The processor 401, ROM 402, and RAM 403 are interconnected via a bus 404. An input / output (I / O) interface 405 is also connected to the bus 404.

[0102] Typically, the following devices can be connected to the I / O interface 405: input devices 406 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 407 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; memory devices 408 including, for example, magnetic tapes, hard disks, etc.; and communication devices 409. The communication device 409 allows the electronic device to communicate with other devices, wirelessly or by wire, to exchange data. Although Figure 4 shows an electronic device with various devices, it should be understood that it is not required to implement or have all of the devices shown; more or fewer devices may be implemented or provided instead.

[0103] In particular, according to embodiments of the present invention, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via a communication device 409, or installed from a memory 408, or installed from a ROM 402. When the computer program is executed by the processor 401, it performs the functions defined in the virtual power plant secondary frequency regulation power optimization allocation method of the embodiments of the present invention.

[0104] The electronic device shown in Figure 4 is merely an example and should not be construed as limiting the functionality and scope of use of the embodiments of the present invention.

[0105] This invention also provides a computer-readable storage medium. The methods described above according to embodiments of the invention can be implemented in hardware or firmware, or implemented as computer code that can be recorded on a storage medium, or implemented as computer code downloaded via a network and originally stored on a remote storage medium or a non-transitory machine-readable storage medium and then stored on a local storage medium. Thus, the methods described herein can be processed by software stored on a storage medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware. The storage medium can be a magnetic disk, optical disk, read-only memory, random access memory, flash memory, hard disk, or solid-state drive, etc.; further, the storage medium can also include combinations of the above types of memory. It is understood that computers, processors, microprocessor controllers, or programmable hardware include storage components capable of storing or receiving software or computer code. When the software or computer code is accessed and executed by the computer, processor, or hardware, the virtual power plant secondary frequency regulation power optimization allocation method shown in the above embodiments is implemented.

[0106] A portion of this invention can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide the methods and / or technical solutions according to the invention through the operation of the computer. Those skilled in the art will understand that the forms in which computer program instructions exist in a computer-readable medium include, but are not limited to, source files, executable files, installation package files, etc. Correspondingly, the ways in which computer program instructions are executed by a computer include, but are not limited to: the computer directly executing the instructions, or the computer compiling the instructions and then executing the corresponding compiled program, or the computer reading and executing the instructions, or the computer reading and installing the instructions and then executing the corresponding installed program. Here, the computer-readable medium can be any available computer-readable storage medium or communication medium accessible to a computer.

[0107] Although embodiments of the invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations all fall within the scope defined by the appended claims.

Claims

1. A method for optimizing the allocation of secondary frequency regulation power in a virtual power plant, characterized in that the method includes: obtaining the grid frequency deviation and generating initial power regulation commands for each distributed resource based on the grid frequency deviation; acquiring historical operating data and / or real-time interactive data, and obtaining the uncertainty distribution characteristics during the operation of the virtual power plant based on learning from the historical operating data and / or real-time interactive data; and, based on the initial power regulation command and the uncertainty distribution characteristics, using a swarm intelligence optimization algorithm to iteratively optimize a multi-objective optimization function, including frequency control performance, operational economy, and resource sustainability, to determine the optimal power regulation command for each distributed resource; wherein the swarm intelligence optimization algorithm introduces a historical search experience memory mechanism, an adaptive search radius adjustment mechanism, and a cross-resource-region solution structure reorganization mechanism during the iterative optimization process.

2. The virtual power plant secondary frequency regulation power optimization allocation method according to claim 1, characterized in that the step of generating initial power regulation commands for each distributed resource based on the grid frequency deviation includes: calculating the total regulation power required for the virtual power plant to participate in secondary frequency regulation based on the grid frequency deviation; and allocating the total regulation power to each resource unit according to the participation factor of each distributed resource, to generate an initial power regulation command for each distributed resource; wherein the participation factor of each distributed resource is determined according to the adjustable capacity, state of charge and / or prediction margin of that distributed resource.

3. The virtual power plant secondary frequency regulation power optimization allocation method according to claim 1, characterized in that the step of obtaining the uncertainty distribution characteristics during the operation of the virtual power plant based on learning from the historical operating data and / or real-time interactive data includes: constructing a reinforcement learning model with the virtual power plant as the intelligent agent and the power system environment as the interaction environment; and, through continuous interactive learning between the intelligent agent and the interaction environment, extracting uncertainty patterns from the historical operating data and / or real-time interactive data to serve as the uncertainty distribution characteristics during the operation of the virtual power plant.

4. The virtual power plant secondary frequency regulation power optimization allocation method according to claim 1, characterized in that the step of determining the optimal power regulation command for each distributed resource by iteratively optimizing a multi-objective optimization function, including frequency control performance, operational economy, and resource sustainability, based on the initial power regulation command and the uncertainty distribution characteristics, using a swarm intelligence optimization algorithm, includes: using the initial power regulation command as an initial population individual of the swarm intelligence optimization algorithm; determining the optimization parameters of the multi-objective optimization function based on the uncertainty distribution characteristics; and using the swarm intelligence optimization algorithm to iteratively solve the multi-objective optimization function to obtain the optimal power regulation command for each distributed resource.

5. The virtual power plant secondary frequency regulation power optimization allocation method according to claim 1, characterized in that the resource sustainability includes a battery life loss cost, which is determined based on an equivalent full cycle model and corrected by introducing at least one of a temperature factor, a rate factor and a health status factor.

6. The virtual power plant secondary frequency regulation power optimization allocation method according to claim 1, characterized in that: the historical search experience memory mechanism includes saving the historical optimal solution of each search individual and using it as a reference direction for position updates in subsequent iterations; the adaptive search radius adjustment mechanism includes dynamically increasing or decreasing the search step size according to the optimization progress status of the search individual; and the cross-resource-region solution structure reorganization mechanism includes dividing the solution vector into segments according to region or resource type, and cross-recombining the segments among different individuals.

7. The virtual power plant secondary frequency regulation power optimization allocation method according to claim 1, characterized in that the method further includes: sending the optimal power regulation command to each distributed resource for execution, and obtaining the actual residual frequency deviation after execution; and generating a feedback correction command based on the actual residual frequency deviation, and superimposing the feedback correction command on the optimal power regulation command to generate a corrected power regulation command.

8. A virtual power plant secondary frequency regulation power optimization and allocation device, characterized in that the device includes: an initial allocation module, used to obtain the grid frequency deviation and generate initial power adjustment commands for each distributed resource based on the grid frequency deviation; an uncertainty learning module, used to acquire historical operating data and / or real-time interactive data, and to obtain the uncertainty distribution characteristics during the operation of the virtual power plant based on learning from the historical operating data and / or real-time interactive data; and an optimization decision module, used to iteratively optimize a multi-objective optimization function, including frequency control performance, operational economy, and resource sustainability, based on the initial power adjustment command and the uncertainty distribution characteristics, using a swarm intelligence optimization algorithm, to determine the optimal power adjustment command for each distributed resource; wherein the swarm intelligence optimization algorithm incorporates a historical search experience memory mechanism, an adaptive search radius adjustment mechanism, and a cross-resource-region solution structure reorganization mechanism during the iterative optimization process.

9. An electronic device, characterized in that it includes: a memory and a processor, which are interconnected, wherein the memory stores computer instructions, and the processor executes the computer instructions to perform the virtual power plant secondary frequency regulation power optimization allocation method as described in any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing a computer to execute the virtual power plant secondary frequency regulation power optimization allocation method according to any one of claims 1 to 7.