Cooperative control method and system for a new energy inverter cluster based on grid-forming energy storage inverters

A hierarchical collaborative control architecture based on the A3C reinforcement learning algorithm addresses the fixed control parameters and the insufficient multi-machine operating stability of grid-forming energy storage inverters in power grids with a high proportion of new energy. It stabilizes grid voltage and frequency, coordinates power distribution, and improves the intelligent operation capability of the system.

CN122225533A · Pending · Publication Date: 2026-06-16 · STATE GRID GANSU ELECTRIC POWER RESEARCH INSTITUTE

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: STATE GRID GANSU ELECTRIC POWER RESEARCH INSTITUTE
Filing Date: 2026-03-19
Publication Date: 2026-06-16


Abstract

The present application relates to the field of new energy power generation and power electronic conversion control technology, and provides a cooperative control method and system for a new energy inverter cluster based on grid-forming energy storage inverters. The system comprises a model construction unit, an algorithm construction unit, a reward construction unit, a hierarchical control unit, an adaptive adjustment unit and a learning update unit. The present application constructs a multi-source grid-connected power grid system model of the inverter cluster and a high-proportion new energy power grid, and deeply integrates the VSG control structure of the grid-forming energy storage inverters with the A3C reinforcement learning algorithm to form an adaptive cooperative control framework for the inverter cluster. This realizes real-time cooperative operation among multiple inverters, significantly improves the voltage stability, frequency support capability, power distribution coordination and capability to accommodate new energy fluctuations of the high-proportion new energy power grid, and provides intelligent support for the safe, stable and efficient operation of the high-proportion new energy power system.

Description

Technical Field

[0001] This invention relates to the field of new energy power generation and power electronic conversion control technology, and in particular to a collaborative control method and system for a new energy inverter cluster based on grid-forming energy storage inverters.

Background Technology

[0002] With the transformation of the global energy structure, the penetration rate of renewable energy in the power system is constantly increasing. Wind power, photovoltaic power generation devices, and other new energy power generation devices are being connected to the grid on a large scale, and the power system is gradually moving towards a grid pattern where a high proportion of renewable energy and a high proportion of power electronic equipment coexist. Compared with traditional power systems dominated by synchronous generators, this new type of grid has significantly reduced inertia, damping characteristics, and voltage support capabilities, facing severe challenges in dealing with frequency fluctuations, voltage disturbances, and fault recovery. In traditional power grids, synchronous generators rely on the physical inertia of their rotating components to provide natural frequency stability. However, new energy power generation devices are usually connected to the grid through power electronic converters, lacking the mechanical inertia of synchronous generators, and their power output depends on control system regulation. When the grid is disturbed, these devices cannot provide an instantaneous inertial response like synchronous machines, easily leading to severe grid frequency fluctuations, power imbalances, and even triggering cascading grid disconnection faults, seriously threatening system stability.

[0003] Currently, widely used grid-following (GFL) inverters primarily rely on phase-locked loops (PLLs) to obtain the grid voltage phase and follow it during operation, representing a "passive" control method. While this structure performs stably in strong grids, it struggles to maintain voltage and frequency references under weak grid or islanded operation conditions. It is prone to loss of synchronization and grid disconnection when the grid voltage collapses or frequency drifts, limiting the proportion of renewable energy absorbed by the grid. To address these issues, grid-forming (GFM) technology has emerged. Unlike traditional grid-following inverters, GFM inverters actively establish voltage and frequency references, enabling the construction of grid voltage and power support, and can still operate independently and stably in islanded or weak grids. Its core idea is to simulate the electrical and mechanical characteristics of a synchronous generator within the inverter through control algorithms, thereby achieving virtual inertia and virtual damping support. This type of inverter has significant advantages in voltage and frequency support, transient stability, short-circuit current support, and multi-machine collaborative power distribution.

[0004] Currently, mainstream grid-forming inverter control strategies include droop control, Virtual Synchronous Generator (VSG) control, matching control, and virtual oscillator control. Droop control has a simple structure but limited dynamic performance; VSG control introduces virtual inertia and damping by emulating the swing equation of a synchronous generator, enhancing dynamic stability, but its fixed parameters make it difficult to balance response speed and robustness; matching control constructs a control model based on the power transfer relationship between the DC and AC sides, achieving low-delay and high-stability grid-connected characteristics; virtual oscillator control achieves self-synchronization through a nonlinear oscillator model, which is beneficial for multi-unit parallel operation. However, all of these methods suffer from fixed parameters and insufficient adaptability, making it difficult to cope with the complex dynamics brought about by renewable energy fluctuations and changes in grid operating conditions.

[0005] In power systems with a high proportion of renewable energy penetration, grid-forming energy storage inverters typically establish the voltage and frequency references, while grid-following devices such as photovoltaic and wind power inverters participate in active / reactive power regulation and power balance as current sources. Due to differences in the control mechanisms of different inverters, and the high uncertainty of grid impedance, power coupling, and operating conditions, parallel operation of multiple inverters can easily lead to uneven power distribution, accumulated frequency offsets, and multi-inverter oscillations, affecting system stability. Furthermore, existing control methods often focus on single objectives such as frequency support or voltage regulation, lacking comprehensive optimization capabilities for energy storage balance, power economy, and renewable energy utilization efficiency. Therefore, there is an urgent need to construct a collaborative control strategy for inverter clusters, led by grid-forming energy storage inverters, for high-proportion renewable energy grids. This strategy would jointly optimize the key control parameters of the grid-forming inverters and the power regulation commands of the grid-following inverters, forming a multi-level control system with adaptive and intelligent decision-making capabilities to achieve both voltage and frequency support, dynamic power sharing, and optimal overall system operation.

[0006] With the continuous expansion of high-proportion renewable energy grid connection, traditional power systems supported by synchronous generators are gradually being replaced by a large number of renewable energy devices based on power electronic interfaces. This significantly reduces the system's inertia and damping levels, leading to frequent voltage and frequency fluctuations and a decrease in dynamic stability margin. While grid-forming energy storage inverters, as crucial equipment supporting the stable operation of new power systems, can actively establish voltage and frequency references through methods such as droop control, VSG control, matching control, and virtual oscillator control, the following key technical challenges remain in scenarios with high renewable energy penetration:

(1) The control parameters are fixed and the dynamic adaptive capability is insufficient.

[0007] Traditional grid-forming inverter control strategies often employ fixed droop coefficients, virtual inertia, and damping parameters, making it difficult to adjust them online according to grid operating conditions and power fluctuations from renewable energy sources. When power disturbances or voltage anomalies occur in the system, the inverter exhibits slow dynamic response, significant frequency overshoot, and slow voltage recovery, weakening the system's stability and disturbance rejection capabilities.

[0008] (2) Insufficient stability of multi-machine collaborative operation.

[0009] When grid-forming energy storage inverters operate in parallel with grid-following photovoltaic and wind power inverters, problems such as uneven power distribution, frequency offset accumulation, low-frequency / high-frequency oscillations, and even instability can easily occur due to differences in control mechanisms between the inverters, complex line impedance coupling, and dynamic inconsistencies in the power loops. This makes it difficult to coordinate active / reactive power among the inverters and to keep the overall system stable.

[0010] (3) The optimization objective is singular, and the overall system performance is limited.

[0011] Existing research focuses primarily on single performance indicators such as frequency stability or voltage regulation, while neglecting comprehensive objectives such as energy balance, operational economy, system losses, and renewable energy utilization rate of energy storage systems. This makes it difficult to achieve multi-objective optimization that balances system stability, energy utilization efficiency, and economy.

[0012] (4) The integration of intelligent optimization and grid-forming control is insufficient.

[0013] While some studies have attempted to adjust inverter parameters using traditional optimization algorithms or rule-based adaptive control, these methods rely on human experience, have slow response times, and weak generalization capabilities, making them ill-suited for the complex dynamics and uncertainties inherent in scenarios with a high proportion of renewable energy. There is a lack of intelligent control frameworks with online learning, autonomous decision-making, and long-term optimization capabilities.

[0014] In summary, existing inverter cluster control methods are insufficient to simultaneously meet the requirements of high-proportion renewable energy power systems in terms of dynamic stability, power coordination, and intelligent adaptability, thus limiting their further promotion and application in new power systems.

Summary of the Invention

[0015] This invention provides a collaborative control method and system for a new energy inverter cluster based on grid-forming energy storage inverters, which overcomes the shortcomings of the prior art. It effectively solves the problems that existing grid-forming energy storage inverters struggle to achieve adaptive parameter optimization in complex operating environments and cannot take into account the collaborative regulation of multiple types of inverters.

[0016] To address the above problems, the first technical solution of this invention is realized as follows: a collaborative control method for a new energy inverter cluster based on grid-forming energy storage inverters, comprising the following steps:
constructing a multi-source grid-connected power grid system model based on the inverter cluster and the high-proportion new energy power grid, wherein the inverter cluster includes grid-forming energy storage inverters, photovoltaic inverters and wind power inverters;
building the A3C reinforcement learning algorithm in the power grid system model, formulating the current control strategy, collecting the operating status information of the high-proportion new energy power grid, and adjusting the control parameters of the inverter cluster according to the current control strategy based on the operating status information;
constructing a multi-objective reward function model to dynamically balance and collaboratively optimize the control parameters of the inverter cluster, wherein the multi-objective reward function model includes the system economy and energy balance objective function, the steady-state performance and carbon emission objective function, and the comprehensive reward function;
designing a hierarchical collaborative control architecture, and performing collaborative control of the inverter cluster under the high-proportion new energy power grid based on the key control variables of the inverter cluster;
adaptively adjusting the control parameters of the inverter cluster with the A3C reinforcement learning algorithm based on the system operating status information;
training the A3C reinforcement learning algorithm with the system operating status information of the power grid system model after the adaptive adjustment of the control parameters of the inverter cluster, updating the current control strategy, and performing the next iteration until the voltage and frequency of the high-proportion new energy power grid are stable and the power distribution is coordinated.

[0017] The above-mentioned construction of the multi-source grid-connected power grid system model based on the inverter cluster and the high-proportion new energy power grid, wherein the inverter cluster includes grid-forming energy storage inverters, photovoltaic inverters and wind power inverters, includes the following.

The grid-forming energy storage inverter adopts a grid-forming voltage source model, and its equivalent rotor dynamics model is established based on the VSG principle to simulate the virtual inertia, virtual damping and droop control characteristics of a synchronous generator. The virtual inertia dynamics of the VSG are expressed as:

$$J \frac{d\omega}{dt} = P_{ref} - P_e - D(\omega - \omega_0)$$

where $J$ is the virtual inertia; $D$ is the damping coefficient; $P_{ref}$ is the reference active power; $\omega_0$ is the rated angular frequency; $\omega$ is the equivalent electrical angular frequency of the grid-forming energy storage inverter under virtual synchronous control; $J\,d\omega/dt$ is the equivalent inertial regulation quantity constructed based on the virtual synchronous generator principle; and $P_e$ is the electrical output active power of the equivalent virtual synchronous generator of the grid-forming energy storage inverter.

The output voltage of the grid-forming energy storage inverter is:

$$u(t) = U_{ref} \sin(\omega t + \theta_{ref})$$

where $u(t)$ is the output voltage of the grid-forming energy storage inverter; $U_{ref}$ is the reference amplitude of the inverter output voltage; $\omega$ is the instantaneous output angular frequency of the grid-forming energy storage inverter, calculated from the virtual synchronous generator control model; and $\theta_{ref}$ is the reference phase angle of the output voltage.

The output active power $P$ and reactive power $Q$ of the grid-forming energy storage inverter are, respectively:

$$P = \frac{3}{2} U I \cos\varphi, \qquad Q = \frac{3}{2} U I \sin\varphi$$

where $P$ is the active power delivered to the grid by the grid-forming energy storage inverter under three-phase symmetrical operation; $Q$ is the reactive power delivered to the grid under three-phase symmetrical operation; $U$ is the equivalent amplitude of the output voltage of the grid-forming energy storage inverter; $I$ is the equivalent amplitude of the output current; and $\varphi$ is the phase difference between the output voltage and the output current.

Both the photovoltaic inverters and the wind power inverters operate in a current-source grid-connected control mode, and their current commands are expressed as:

$$i_d^{*} = \frac{2}{3} \frac{P^{*}}{u_d}, \qquad i_q^{*} = -\frac{2}{3} \frac{Q^{*}}{u_d}$$

where $i_d^{*}$ is the d-axis current command component of the photovoltaic inverter or wind power inverter in the synchronous rotating coordinate system; $i_q^{*}$ is the q-axis current command component in the synchronous rotating coordinate system; $P^{*}$ is the active power reference value of the corresponding inverter; $Q^{*}$ is the reactive power reference value of the corresponding inverter; and $u_d$ is the d-axis component of the grid-connection point voltage.
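By way of non-limiting illustration, the VSG rotor dynamics and the current-source commands described in [0017] can be sketched in Python. The sketch assumes the commonly used power-form swing equation J·dω/dt = P_ref − P_e − D(ω − ω_0), forward-Euler integration, a 50 Hz grid, and a voltage-oriented dq frame with amplitude-invariant scaling (hence the 2/3 factor). The function names `vsg_step`, `simulate_load_step` and `dq_current_commands` and all numeric values are hypothetical and are not taken from the application.

```python
import math

OMEGA_0 = 2 * math.pi * 50.0  # rated angular frequency in rad/s (assumes a 50 Hz grid)


def vsg_step(omega, p_ref, p_e, inertia, damping, dt):
    """Advance the assumed swing equation by one forward-Euler step:
    inertia * d(omega)/dt = p_ref - p_e - damping * (omega - OMEGA_0)."""
    d_omega = (p_ref - p_e - damping * (omega - OMEGA_0)) / inertia
    return omega + d_omega * dt


def simulate_load_step(inertia, damping, p_ref=10e3, p_step=2e3,
                       t_end=5.0, dt=1e-3):
    """Hold the electrical power at p_ref + p_step from t = 0 and
    return the resulting angular-frequency trace."""
    omega = OMEGA_0
    trace = []
    for _ in range(round(t_end / dt)):
        omega = vsg_step(omega, p_ref, p_ref + p_step, inertia, damping, dt)
        trace.append(omega)
    return trace


def dq_current_commands(p_ref, q_ref, v_d):
    """Current commands of a current-source (PV or wind) inverter, assuming
    v_q = 0 and amplitude-invariant dq scaling (illustrative convention)."""
    i_d = 2.0 * p_ref / (3.0 * v_d)
    i_q = -2.0 * q_ref / (3.0 * v_d)
    return i_d, i_q


if __name__ == "__main__":
    low = simulate_load_step(inertia=20.0, damping=400.0)
    high = simulate_load_step(inertia=200.0, damping=400.0)
    # Index 99 is the sample at t = 0.1 s for dt = 1 ms.
    print("deviation at 0.1 s, low inertia :", low[99] - OMEGA_0)
    print("deviation at 0.1 s, high inertia:", high[99] - OMEGA_0)
    print("final deviation, low inertia    :", low[-1] - OMEGA_0)
```

In this sketch a larger virtual inertia slows the initial frequency excursion after a load step, while the damping coefficient sets the steady-state deviation (−p_step / damping). These are the two parameters that the reinforcement learning agent is described as adjusting online.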

[0018] The above-mentioned building of the A3C reinforcement learning algorithm in the power grid system model, formulating the current control strategy, collecting the operating status information of the high-proportion renewable energy power grid, and adjusting the control parameters of the inverter cluster according to the current control strategy based on the operating status information, includes the following.

The A3C reinforcement learning algorithm consists of a shared global neural network and multiple local neural networks running in parallel threads; each neural network includes an Actor network and a Critic network.

The Actor network interacts with the agent, where each agent is an inverter. It outputs the adjustment strategy for the control parameters based on the operating status information of the high-proportion renewable energy power grid, i.e., it generates the policy. The Actor network parameters $\theta$ are updated as:

$$\theta' = \theta + \alpha_\theta \left[ \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, A(s_t, a_t) + \beta\, \nabla_\theta H\big(\pi_\theta(\cdot \mid s_t)\big) \right]$$

where $\alpha_\theta$ is the learning rate of the Actor network; $\nabla_\theta$ denotes the gradient with respect to $\theta$; $A(s_t, a_t)$ is the advantage function; $H$ is the entropy of the policy; $s_t$ is the state vector of the system at time $t$; $\theta'$ is the updated Actor network parameter vector; $\beta$ is the entropy regularization coefficient; $a_t$ is the action output by the Actor network in state $s_t$, i.e., the control parameter adjustment of the inverter; and $\pi_\theta$ is the policy function parameterized by the Actor network parameters $\theta$.

The Critic network evaluates the performance under the current adjustment strategy and uses the advantage function $A(s_t, a_t)$ to guide the convergence and improvement of the policy. The Critic network parameters $w$ are updated based on the TD error:

$$\delta_t = r_t + \gamma V_w(s_{t+1}) - V_w(s_t), \qquad w' = w + \alpha_w\, \delta_t\, \nabla_w V_w(s_t)$$

where $\delta_t$ is the temporal-difference (TD) error, which measures the deviation between the current state value estimate and the value estimate after a one-step reward; $r_t$ is the immediate reward that the system receives after performing a control action; $\gamma$ is the discount factor, which represents the weight of future rewards in the current evaluation; $s_t$ and $s_{t+1}$ are the system state at the current moment and at the next moment, respectively; $V_w(s_t)$ and $V_w(s_{t+1})$ are the state value estimates of the Critic network for the current state and the next state; $\alpha_w$ is the learning rate of the Critic network; $w$ is the Critic network parameter vector; and $\nabla_w V_w(s_t)$ is the gradient of the state value function with respect to the Critic network parameters.
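By way of non-limiting illustration, one Actor and Critic update of the kind described in [0018] can be sketched with NumPy. The sketch is a single-worker, one-step advantage actor-critic update with an entropy bonus; the asynchronous multi-thread aggregation of A3C is omitted. It assumes a linear softmax policy over three discrete actions, a linear value function, and the TD error as the advantage estimate. The function names, the state layout and the hyperparameter values are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def actor_critic_update(W, w, s, a, r, s_next,
                        alpha_actor=0.01, alpha_critic=0.05,
                        gamma=0.95, beta=0.01):
    """One update of a linear softmax Actor (weights W) and a linear
    Critic (weights w) from a single transition (s, a, r, s_next)."""
    pi = softmax(W @ s)

    # TD error: delta = r + gamma * V(s') - V(s), used as the advantage.
    delta = r + gamma * (w @ s_next) - (w @ s)

    # Critic step: for V(s) = w . s the gradient with respect to w is s.
    w_new = w + alpha_critic * delta * s

    # Gradient of log pi(a|s) for a softmax policy: (onehot(a) - pi) outer s.
    onehot = np.zeros_like(pi)
    onehot[a] = 1.0
    grad_log_pi = np.outer(onehot - pi, s)

    # Gradient of the policy entropy H = -sum(pi * log pi) w.r.t. the logits
    # is -pi * (log pi + H); the small constant guards against log(0).
    log_pi = np.log(pi + 1e-12)
    entropy = -np.sum(pi * log_pi)
    grad_entropy = np.outer(-pi * (log_pi + entropy), s)

    W_new = W + alpha_actor * (delta * grad_log_pi + beta * grad_entropy)
    return W_new, w_new, delta


if __name__ == "__main__":
    # Hypothetical state: [frequency deviation, voltage deviation, bias term].
    s = np.array([0.2, -0.1, 1.0])
    s_next = np.array([0.1, -0.05, 1.0])
    W = np.zeros((3, 3))  # actions: decrease / hold / increase virtual inertia
    w = np.zeros(3)
    W, w, delta = actor_critic_update(W, w, s, a=2, r=1.0, s_next=s_next)
    print("TD error:", delta)
    print("updated policy:", softmax(W @ s))
```

With a positive TD error the probability of the action just taken increases, which is the mechanism by which the Critic's evaluation steers the Actor's parameter adjustment strategy.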

[0019] The above-mentioned construction of the multi-objective reward function model to dynamically balance and collaboratively optimize the control parameters of the inverter cluster, wherein the model includes the system economy and energy balance objective function, the steady-state performance and carbon emission objective function, and the comprehensive reward function, includes the following.

(1) The multidimensional operational performance evaluation index is expressed as:

$$R_{perf} = w_1 \phi(\Delta U) + w_2 \phi(\Delta f) + w_3 \phi(\Delta P_{share}) + w_4 \phi(E_{bal}) + w_5 \phi(\eta)$$

where $R_{perf}$ is the multidimensional performance evaluation index function; $w_1, \dots, w_5$ are the weighting coefficients of the evaluation indicators; $\Delta U$ is the voltage deviation; $\Delta f$ is the frequency deviation; $\Delta P_{share}$ is the power sharing error; $E_{bal}$ is the energy storage balance; $\eta$ is the system energy efficiency; and $\phi(\cdot)$ is a function that normalizes or penalizes each indicator, converting indicators with different dimensions into a unified reward scale.

(2) System economy and energy balance objective function, calculated as:

$$J_{eco} = \min C_{total} = \min \sum_{t=1}^{T} \left[ C_{loss}(t) + C_{ess}(t) + C_{grid}(t) \right]$$

where $J_{eco}$ is the objective function value obtained by optimizing the operating cost of the inverter cluster within the scheduling cycle; $C_{total}$ is the total system operating cost; $T$ is the total duration of the optimized scheduling cycle; $C_{loss}(t)$ is the cost of inverter losses at time $t$; $C_{ess}(t)$ is the energy cost of the energy storage system; and $C_{grid}(t)$ is the energy cost of interacting with the power grid. These are specifically expressed as:

$$C_{loss}(t) = k_1 P_{inv}(t), \qquad C_{ess}(t) = k_2 \left[ P_{ch}(t) + P_{dis}(t) \right], \qquad C_{grid}(t) = c_{grid} P_{grid}(t)$$

where $k_1$ and $k_2$ are the unit energy loss coefficients of the equipment; $c_{grid}$ is the grid-connected electricity price; $P_{inv}(t)$ is the output active power of the inverter at time $t$; $P_{ch}(t)$ and $P_{dis}(t)$ are the charging and discharging active power of the energy storage system at time $t$, respectively; and $P_{grid}(t)$ is the active power exchanged between the grid-connected inverter cluster and the public power grid at time $t$.

To ensure a balance between energy supply and demand, a power balance constraint is introduced. The power balance deviation $\Delta P(t)$ can be represented as:

$$\Delta P(t) = P_{pv}(t) + P_{ess}(t) - P_{load}(t) - P_{ch}(t) - P_{loss}(t)$$

where $P_{pv}(t)$ is the output active power of the photovoltaic power generation system at time $t$; $P_{ess}(t)$ is the active power of the energy storage system at time $t$; $P_{load}(t)$ is the active power demand of the load at time $t$; $P_{ch}(t)$ is the active power of the energy storage during charging at time $t$; and $P_{loss}(t)$ is the active power loss of the system at time $t$. When $\Delta P(t) = 0$, the system is in a state of instantaneous power balance. The power balance reward $R_{bal}$ is expressed as a function of $\Delta P(t)$; its expression is not reproduced in the source text.

(3) Steady-state performance and carbon emission objective function. The frequency and voltage stability objective $J_{stab}$ is represented as:

$$J_{stab} = \sum_{t=1}^{T} \left[ \lambda_1 \left( \frac{\Delta f(t)}{f_N} \right)^2 + \lambda_2 \left( \frac{\Delta U(t)}{U_N} \right)^2 \right]$$

where $\lambda_1$ and $\lambda_2$ are weighting coefficients that characterize the relative importance of the frequency stability index and the voltage stability index in the comprehensive objective function; $\Delta f(t)$ is the deviation of the system from its rated frequency at time $t$; $f_N$ is the rated frequency of the system; $\Delta U(t)$ is the deviation of the system from the reference voltage at time $t$; and $U_N$ is the rated voltage. By normalizing the frequency deviation and the voltage deviation to their rated values, the evaluation indicators become dimensionless, which allows different physical quantities to be measured on a unified scale and fused by weighting to characterize the comprehensive stability performance of the system's operating state.

The objective of minimizing carbon emissions can be expressed as:

$$J_{c} = \sum_{t=1}^{T} \left[ e_1 P_{inv}(t) + e_2 P_{dis}(t) + e_3 P_{grid}(t) \right]$$

where $J_c$ is the carbon emission objective function; and $e_1$, $e_2$ and $e_3$ are the equivalent carbon emission factors per unit of active power corresponding to the output power of the grid-connected inverter, the discharge power of the energy storage system, and the power exchanged with the public grid, respectively.

(4) The comprehensive reward function combines the above objective functions through weighting coefficients. Its expression, and the constraint that the weighting coefficients must satisfy, which is set based on system operation requirements, are not reproduced in the source text.
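By way of non-limiting illustration, the objective terms described in [0019] can be combined into a per-step scalar reward as sketched below. Because the formulas are not legible in the text, every functional form here is an assumption: linear cost terms, squared deviations normalized to rated values, linear carbon factors, an absolute-value penalty on the power imbalance, weights that sum to one, and a reward equal to the negative weighted sum. All names and coefficient values are hypothetical, and the terms are not scaled against each other.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GridStep:
    """Measurements for one time step (hypothetical units)."""
    p_inv: float    # inverter output active power, kW
    p_ch: float     # storage charging power, kW
    p_dis: float    # storage discharging power, kW
    p_grid: float   # power exchanged with the public grid, kW
    p_pv: float     # photovoltaic output, kW
    p_load: float   # load demand, kW
    p_loss: float   # system losses, kW
    d_freq: float   # frequency deviation, Hz
    d_volt: float   # voltage deviation, V


def operating_cost(s, k_loss=0.002, k_ess=0.01, price=0.08):
    """Assumed linear cost: inverter losses + storage usage + grid energy."""
    return k_loss * s.p_inv + k_ess * (s.p_ch + s.p_dis) + price * s.p_grid


def power_imbalance(s):
    """Power balance deviation; zero means instantaneous balance."""
    return s.p_pv + s.p_dis - s.p_load - s.p_ch - s.p_loss


def stability_index(s, lam_f=0.5, lam_v=0.5, f_rated=50.0, v_rated=311.0):
    """Assumed weighted sum of squared, normalized deviations."""
    return lam_f * (s.d_freq / f_rated) ** 2 + lam_v * (s.d_volt / v_rated) ** 2


def carbon_emission(s, e_inv=0.0, e_ess=0.05, e_grid=0.6):
    """Assumed linear emission factors per unit of active power."""
    return e_inv * s.p_inv + e_ess * s.p_dis + e_grid * s.p_grid


def comprehensive_reward(s, weights=(0.4, 0.4, 0.2), imbalance_penalty=1.0):
    """Negative weighted sum of the three objectives (assumed combination)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    w_eco, w_stab, w_c = weights
    eco = operating_cost(s) + imbalance_penalty * abs(power_imbalance(s))
    return -(w_eco * eco + w_stab * stability_index(s) + w_c * carbon_emission(s))


if __name__ == "__main__":
    nominal = GridStep(p_inv=50.0, p_ch=0.0, p_dis=10.0, p_grid=5.0,
                       p_pv=60.0, p_load=68.0, p_loss=2.0,
                       d_freq=0.0, d_volt=0.0)
    disturbed = replace(nominal, d_freq=0.5, d_volt=20.0)
    print("nominal reward  :", comprehensive_reward(nominal))
    print("disturbed reward:", comprehensive_reward(disturbed))
```

Under these assumptions a step with frequency and voltage deviations receives a lower reward than an otherwise identical step at rated conditions, so the agent is pushed toward parameter adjustments that restore stability without ignoring cost and emissions.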

[0020] The above-mentioned design of the hierarchical collaborative control architecture, which performs collaborative control of the inverter cluster under the high-proportion renewable energy power grid based on the key control variables of the inverter cluster, includes the following.

The local control layer receives control commands from the coordination control layer, coordinates control among the grid-forming energy storage inverters, photovoltaic inverters and wind power inverters, and feeds control information back to the coordination control layer.

The coordination control layer receives global control commands from the global optimization layer, exchanges real-time operating data among the inverters, realizes active power sharing and optimal allocation of available power among the inverters, and feeds control information back to the global optimization layer. The coordination control layer must satisfy the following stability constraints:

$$\sum_{i=1}^{N} P_i = P_{load}, \qquad |f_i - f_j| \le \Delta f_{max}$$

where $P_i$ is the output power of the $i$-th of the $N$ inverters; $P_{load}$ is the system load demand; $\Delta f_{max}$ is the allowable frequency deviation range; and $f_i$ and $f_j$ are the output frequencies of the $i$-th inverter and the $j$-th inverter in grid-connected operation, which characterize the frequency consistency constraint within the inverter cluster.

The global optimization layer is based on the A3C control framework. It dynamically generates control strategies according to the state of the power grid system, performs adaptive optimization on the grid-forming energy storage inverters, and performs global scheduling of the active / reactive power reference values of the photovoltaic and wind power inverters.
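By way of non-limiting illustration, the two coordination-layer checks described in [0020], namely that the inverter outputs together meet the load and that the inverter frequencies stay within an allowable spread, can be sketched as follows. Sharing the load in proportion to available capacity is an assumption made for the example; the text only states that active power is shared and optimally allocated. The function names and tolerances are hypothetical.

```python
def share_power(p_load, capacities):
    """Split the load among inverters in proportion to available capacity
    (an assumed sharing rule)."""
    total = sum(capacities)
    if total <= 0:
        raise ValueError("no available capacity")
    if p_load > total:
        raise ValueError("load exceeds total available capacity")
    return [p_load * c / total for c in capacities]


def constraints_satisfied(p_out, p_load, freqs, df_max, p_tol=1e-6):
    """Check the power balance and the frequency consistency constraint."""
    balanced = abs(sum(p_out) - p_load) <= p_tol
    # max - min <= df_max is equivalent to |f_i - f_j| <= df_max for all pairs.
    consistent = (max(freqs) - min(freqs)) <= df_max
    return balanced and consistent


if __name__ == "__main__":
    allocation = share_power(90.0, [50.0, 30.0, 20.0])
    print("allocation (kW):", allocation)
    print("constraints ok :",
          constraints_satisfied(allocation, 90.0, [50.00, 50.01, 49.99],
                                df_max=0.05))
```

A coordination layer built this way could reject or rescale a set of power references from the global optimization layer before forwarding them to the local control layer whenever either check fails.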

[0021] The above-mentioned adaptive adjustment of the control parameters of the inverter cluster by the A3C reinforcement learning algorithm based on the system operating status information includes the following.

The parameters of each agent, i.e., each inverter, are updated as:

$$\theta_i^{k+1} = \theta_i^{k} + \alpha\, \nabla_\theta \mathcal{J}(\theta_i^{k})$$

where $\alpha$ is the learning rate; $\mathcal{J}$ is the cumulative reward objective function; $\theta_i^{k}$ is the parameter vector of the control strategy of the $i$-th agent at the $k$-th iteration; $\theta_i^{k+1}$ is the updated policy parameter vector; and $\nabla_\theta \mathcal{J}(\theta_i^{k})$ is the gradient of the objective function with respect to the policy parameters, which indicates the direction of the parameter update.

When the A3C reinforcement learning algorithm detects a power grid anomaly from the system operating status information, it outputs new control parameters for the inverter cluster in real time and adaptively adjusts the control parameters. Power grid anomalies include fluctuations in new energy output, power grid disturbances, and sudden load changes. The specific adjustment mechanism is as follows:

$$J(t+1) = J(t) + \Delta J, \qquad D(t+1) = D(t) + \Delta D, \qquad K_p(t+1) = K_p(t) + \Delta K_p, \qquad K_q(t+1) = K_q(t) + \Delta K_q$$

where $J(t)$ is the virtual inertia parameter of the virtual synchronous generator at time $t$; $\Delta J$ is the adjustment increment of that parameter; $D(t)$ is the virtual damping coefficient at time $t$; $\Delta D$ is the adjustment increment of the damping parameter; $K_p(t)$ and $K_q(t)$ are the control gain parameters in the active and reactive power control loops of the inverter at time $t$; and $\Delta K_p$ and $\Delta K_q$ are the online update values of the corresponding control gains.
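By way of non-limiting illustration, the incremental adjustment of the virtual inertia, virtual damping coefficient and power-loop gains described in [0021] can be sketched as an additive update. Clipping each parameter to a safe range is an added assumption (the text does not specify bounds), and the class name, limit values and increment format are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ControlParams:
    inertia: float   # virtual inertia J
    damping: float   # virtual damping coefficient D
    k_p: float       # active power loop gain
    k_q: float       # reactive power loop gain


# Hypothetical safety limits; the source text does not specify any bounds.
LIMITS = {
    "inertia": (1.0, 500.0),
    "damping": (10.0, 2000.0),
    "k_p": (0.0, 10.0),
    "k_q": (0.0, 10.0),
}


def clip(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def apply_increments(params, increments, limits=LIMITS):
    """Return new parameters x(t+1) = x(t) + dx, clipped to the limits.
    Parameters missing from `increments` are left unchanged."""
    updated = {}
    for name, (low, high) in limits.items():
        current = getattr(params, name)
        updated[name] = clip(current + increments.get(name, 0.0), low, high)
    return ControlParams(**updated)


if __name__ == "__main__":
    params = ControlParams(inertia=20.0, damping=400.0, k_p=1.0, k_q=1.0)
    # Increments as they might be produced by the agent after a load step.
    params = apply_increments(params, {"inertia": 5.0, "damping": -50.0})
    print(params)
```

Keeping the update additive and bounded means a single poor action from the learning agent can only move the inverter a limited distance from its previous operating point.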

[0022] The above-mentioned training of the A3C reinforcement learning algorithm with the system operating status information of the power grid system model after the adaptive adjustment of the control parameters of the inverter cluster, updating the current control strategy and performing the next iteration until the voltage and frequency of the high-proportion renewable energy power grid are stable and the power distribution is coordinated, includes the following. The agent continuously collects the system state-action-reward sequence $(s_t, a_t, r_t)$ during operation, namely: for the grid-forming energy storage inverters, the virtual inertia, damping coefficient and droop control parameters of the device are continuously collected and optimized, so that the grid-forming energy storage inverters provide adaptive voltage and frequency support; for the photovoltaic inverters and wind power inverters, active / reactive output tracking is performed according to the scheduling instructions of the updated control strategy to maintain consistency with the grid reference.
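By way of non-limiting illustration, the collection of the state-action-reward sequence described in [0022] and its conversion into training targets can be sketched as follows. The sketch assumes the usual n-step bootstrapped return used by A3C-style methods, in which the return at each step is the discounted sum of later rewards plus a discounted Critic estimate for the state after the last step. The class and function names are hypothetical.

```python
class Trajectory:
    """Buffer for the (state, action, reward) sequence of one rollout."""

    def __init__(self):
        self.states, self.actions, self.rewards = [], [], []

    def record(self, state, action, reward):
        self.states.append(state)
        self.actions.append(action)
        self.rewards.append(reward)

    def __len__(self):
        return len(self.rewards)


def n_step_returns(rewards, bootstrap_value, gamma):
    """Discounted returns G_t = r_t + gamma * G_{t+1}, where the value after
    the final step is the Critic's estimate `bootstrap_value`."""
    returns = []
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Advantage estimates: return minus the Critic's value per state."""
    return [g - v for g, v in zip(returns, values)]


if __name__ == "__main__":
    traj = Trajectory()
    # Hypothetical rollout: state = (freq. deviation, volt. deviation).
    traj.record((0.20, -0.10), 2, 1.0)
    traj.record((0.10, -0.05), 1, 0.0)
    traj.record((0.05, -0.02), 1, 2.0)
    returns = n_step_returns(traj.rewards, bootstrap_value=0.5, gamma=0.9)
    print("returns   :", returns)
    print("advantages:", advantages(returns, [2.5, 2.0, 2.0]))
```

The returns serve as regression targets for the Critic and the advantages weight the Actor's policy gradient; after each update the rollout buffer would be cleared and collection would resume under the updated control strategy.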

[0023] The second technical solution of this invention is realized as follows: a collaborative control system for a new energy inverter cluster based on grid-forming energy storage inverters, comprising:
a model construction unit, which constructs a multi-source grid-connected power grid system model based on the inverter cluster and the high-proportion new energy power grid, wherein the inverter cluster includes grid-forming energy storage inverters, photovoltaic inverters and wind power inverters;
an algorithm construction unit, which builds the A3C reinforcement learning algorithm in the power grid system model, formulates the current control strategy, collects the operating status information of the high-proportion new energy power grid, and adjusts the control parameters of the inverter cluster according to the current control strategy based on the operating status information;
a reward construction unit, which constructs a multi-objective reward function model to dynamically balance and collaboratively optimize the control parameters of the inverter cluster, wherein the multi-objective reward function model includes the system economy and energy balance objective function, the steady-state performance and carbon emission objective function, and the comprehensive reward function;
a hierarchical control unit, which designs a hierarchical collaborative control architecture and performs collaborative control of the inverter cluster under the high-proportion new energy power grid based on the key control variables of the inverter cluster;
an adaptive adjustment unit, which uses the A3C reinforcement learning algorithm to adaptively adjust the control parameters of the inverter cluster based on the system operating status information;
a learning update unit, which uses the system operating status information of the power grid system model after the adaptive adjustment of the control parameters of the inverter cluster to train the A3C reinforcement learning algorithm, update the current control strategy and perform the next iteration until the voltage and frequency of the high-proportion new energy power grid are stable and the power distribution is coordinated.

[0024] Compared with the prior art, the present invention has the following advantages: This invention constructs a multi-source grid-connected power grid system model of the inverter cluster and the high-proportion renewable energy power grid. It deeply integrates the VSG control structure adopted by the grid-forming energy storage inverters with the A3C reinforcement learning algorithm to form an adaptive collaborative control framework for the inverter cluster. In the A3C reinforcement learning algorithm, the agent collects grid operating status information in real time, dynamically optimizes the key control parameters of the grid-forming energy storage inverters, and provides coordinated power reference values or reactive power adjustment commands to the photovoltaic / wind power inverters. This enables real-time collaborative operation among multiple inverters, thereby significantly improving the voltage stability, frequency support capability, power distribution coordination, and capability to accommodate renewable energy fluctuations of the high-proportion renewable energy power grid, and provides intelligent support for the safe, stable, and efficient operation of the high-proportion renewable energy power system.

Attached Figure Description

[0025] The specific embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.

[0026] Figure 1 is a flowchart of the method in Embodiment 1 of the present invention.

[0027] Figure 2 is a structural diagram of the multi-source grid-connected power grid system in Embodiment 1 of the present invention.

[0028] Figure 3 is a flowchart of the VSG control of the grid-forming energy storage inverter in Embodiment 1 of the present invention.

[0029] Figure 4 is a network structure diagram of the A3C reinforcement learning algorithm in Embodiment 1 of the present invention.

[0030] Figure 5 is a flowchart of the A3C reinforcement learning algorithm in Embodiment 1 of the present invention.

[0031] Figure 6 is a diagram of the hierarchical collaborative control architecture in Embodiment 1 of the present invention.

Detailed Implementation

[0032] The present invention is not limited to the following embodiments, and the specific implementation can be determined according to the technical solution of the present invention and the actual situation.

[0033] Embodiment 1: As shown in Figure 1, this invention discloses a collaborative control method for a new energy inverter cluster based on grid-forming energy storage inverters, comprising the following steps:
Step S101: Construct a multi-source grid-connected power grid system model based on the inverter cluster and the high-proportion new energy power grid, wherein the inverter cluster includes grid-forming energy storage inverters, photovoltaic inverters, and wind power inverters.
Step S102: Build the A3C reinforcement learning algorithm in the power grid system model, formulate the current control strategy, collect the operating state information of the high-proportion new energy power grid, and adjust the control parameters of the inverter cluster according to the current control strategy based on the operating state information.
Step S103: Construct a multi-objective reward function model to dynamically balance and collaboratively optimize the control parameters of the inverter cluster, wherein the multi-objective reward function model includes a system economic and energy balance objective function, a steady-state performance and carbon emission objective function, and a comprehensive reward function.
Step S104: Design a hierarchical collaborative control architecture. In the high-proportion new energy power grid, collaborative control of the inverter cluster is carried out based on the key control variables of the inverter cluster. The collaborative control includes functions such as voltage support, frequency stabilization, and active power distribution to ensure the stability and coordination of system operation, and uses key operating quantities such as inverter node voltage, frequency, and steady-state active power as feedback variables.
Step S105: The A3C reinforcement learning algorithm adaptively adjusts the control parameters of the inverter cluster based on the system operating state information, wherein the system operating state information includes load changes, new energy output fluctuations, grid disturbances, and inverter node voltage, frequency, and power deviations. The reinforcement learning algorithm optimizes the control parameter configuration online, and the underlying controller executes the specific control actions.
Step S106: Using the system operating state information of the power grid system model after the adaptive adjustment of the control parameters of the inverter cluster, train the A3C reinforcement learning algorithm, update the current control strategy, and perform the next iteration until the voltage and frequency of the high-proportion new energy power grid are stable and the power distribution is coordinated.

[0034] In step S101 above, a multi-source grid-connected power grid system model is constructed based on the inverter cluster and the high-proportion renewable energy grid, as shown in Figure 2; the inverter cluster includes grid-forming energy storage inverters, photovoltaic inverters, and wind power inverters. This step includes the following.

The grid-forming energy storage inverter is modeled as a grid-forming voltage source, and its equivalent rotor dynamics model is established based on the VSG principle to emulate the virtual inertia, virtual damping, and droop control characteristics of a synchronous generator. The virtual inertia dynamics of the VSG are expressed as:

$$J\frac{d\omega}{dt} = P_{ref} - P_e - D(\omega - \omega_0)$$

where $J$ is the virtual inertia, which characterizes the inertial response of the virtual synchronous generator to changes in angular velocity; $D$ is the damping coefficient; $P_{ref}$ is the reference active power; $\omega_0$ is the rated angular frequency; $\omega$ is the equivalent electrical angular frequency of the grid-forming energy storage inverter under virtual synchronous control, physically equivalent to the angular velocity in the virtual synchronous generator model; $J\,d\omega/dt$ is the equivalent inertial regulation term constructed from the virtual synchronous generator principle, which describes the effect of active power imbalance on angular frequency regulation; and $P_e$ is the electrical output active power of the equivalent virtual synchronous generator of the grid-forming energy storage inverter.

The output voltage of the grid-forming energy storage inverter is:

$$v_o(t) = V_{ref}\sin(\omega t + \theta_{ref})$$

where $v_o$ is the output voltage of the grid-forming energy storage inverter; $V_{ref}$ is the reference amplitude of the inverter output voltage; $\omega$ is the instantaneous output angular frequency of the grid-forming energy storage inverter, calculated from the virtual synchronous generator control model; and $\theta_{ref}$ is the reference phase angle of the output voltage.

The output active power $P$ and reactive power $Q$ of the grid-forming energy storage inverter are, respectively:

$$P = \tfrac{3}{2}VI\cos\varphi, \qquad Q = \tfrac{3}{2}VI\sin\varphi$$

where $P$ and $Q$ are the active and reactive power that the grid-forming energy storage inverter delivers to the grid under three-phase symmetrical operating conditions; $V$ is the equivalent amplitude of the inverter output voltage; $I$ is the equivalent amplitude of the output current; and $\varphi$ is the phase difference between the output voltage and the output current.

Both the photovoltaic and the wind power inverters operate in a current-source grid-following control mode. In a synchronous rotating coordinate system oriented to the grid voltage, the q-axis voltage component is approximately zero ($V_q \approx 0$), so active and reactive power can be adjusted independently through the d-axis and q-axis current components, respectively. The current commands can be expressed as:

$$i_d^{*} = \frac{2P^{*}}{3V_d}, \qquad i_q^{*} = -\frac{2Q^{*}}{3V_d}$$

where $i_d^{*}$ and $i_q^{*}$ are the d-axis and q-axis current command components of the photovoltaic or wind power inverter in the synchronous rotating coordinate system; $P^{*}$ and $Q^{*}$ are the active and reactive power reference values of the corresponding inverter; and $V_d$ is the d-axis component of the voltage at the point of grid connection. Because the dq coordinate system is oriented to the grid-connection voltage, the q-axis component of that voltage is zero, which yields decoupled control in which active power is controlled by the d-axis current and reactive power by the q-axis current.
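
The rotor dynamics described above can be illustrated with a short numerical sketch. It assumes the common swing-equation form J·dω/dt = P_ref − P_e − D·(ω − ω0) and integrates it with a forward-Euler step. The class name `VSGParams`, the function names, the parameter values, and the step size are all hypothetical and chosen only for illustration; they are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class VSGParams:
    J: float       # virtual inertia (illustrative value, units consistent with power)
    D: float       # damping coefficient (illustrative value)
    omega0: float  # rated angular frequency in rad/s


def swing_step(omega: float, p_ref: float, p_e: float, prm: VSGParams, dt: float) -> float:
    """One forward-Euler step of J*domega/dt = P_ref - P_e - D*(omega - omega0)."""
    domega_dt = (p_ref - p_e - prm.D * (omega - prm.omega0)) / prm.J
    return omega + dt * domega_dt


def simulate(omega: float, p_ref: float, p_e: float, prm: VSGParams,
             dt: float, steps: int) -> float:
    """Hold P_ref and P_e constant and integrate the angular frequency."""
    for _ in range(steps):
        omega = swing_step(omega, p_ref, p_e, prm, dt)
    return omega
```

Under a constant power imbalance this model settles at ω0 + (P_ref − P_e)/D, so a larger damping coefficient gives a smaller steady-state frequency offset, while a larger virtual inertia slows the rate of change of frequency.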

[0035] As can be seen from Figure 2, the DC-side energy module includes a photovoltaic cell array, a wind turbine generator, and an energy storage battery pack. The photovoltaic cell array is connected to a photovoltaic inverter, the wind turbine generator is connected to a wind power inverter, and the energy storage battery pack is connected to a grid-forming energy storage inverter, providing controllable energy with which the system builds its voltage reference and frequency support.

[0036] Specifically, a high-proportion renewable energy grid system model is constructed that includes grid-forming energy storage inverters, photovoltaic inverters, wind power inverters, and load nodes, so that the control characteristics of all three inverter types are incorporated into a single system framework in which the grid topology, node power injection relationships, and line impedance parameters are clearly defined. In this framework, the grid-forming energy storage inverters provide voltage and frequency shaping capabilities, while the photovoltaic and wind power inverters regulate their active and reactive power output according to system scheduling. The grid system model further accounts for factors such as photovoltaic and wind power output fluctuations, line impedance coupling, energy storage state changes, and load disturbances, forming a system model suitable for reinforcement learning training and control strategy optimization.

[0037] The grid-forming energy storage inverter is modeled as a grid-forming voltage source, and its equivalent rotor dynamics model is established based on the VSG principle to emulate the virtual inertia, virtual damping, and droop control characteristics of a synchronous generator. It thereby actively constructs the voltage and frequency references and provides virtual inertia support, voltage regulation, and frequency control for the system. The photovoltaic and wind power inverters, as current-source grid-following inverters, obtain the voltage phase at the point of grid connection from a phase-locked loop. Their control structure consists mainly of a current inner loop and a power regulation loop, which follow the voltage / frequency reference formed by the grid-forming energy storage inverter and execute active and reactive power regulation commands to achieve power point tracking. Dynamic factors such as line impedance, node voltage coupling, new energy output fluctuations, and load disturbances are incorporated into the system model so that it reflects the electromagnetic transient characteristics of the hybrid GFM (grid-forming) and GFL (grid-following) inverter cluster, providing a model foundation for subsequent collaborative control strategy design and reinforcement learning training.

[0038] The VSG control framework of the grid-forming energy storage inverter is shown in Figure 3. The inverter emulates the swing equation, speed governor, and droop control of a synchronous generator through a virtual synchronous machine model, generating reference values such as voltage amplitude and frequency. It then drives the voltage-source converter to output AC voltage through a dual closed-loop structure (voltage loop and current loop) and pulse width modulation, thereby providing frequency support, voltage support, and virtual inertia at the point of grid connection. The active and reactive power on the grid side are fed back to the VSG controller through a low-pass filter, realizing self-synchronous operation and stable power regulation of the grid-forming inverter under grid disturbances.

[0039] In step S102 above, the A3C reinforcement learning algorithm is built in the power grid system model, the current control strategy is formulated, the operating state information of the high-proportion renewable energy power grid is collected, and the control parameters of the inverter cluster are adjusted according to the current control strategy based on that information. This step includes the following.

The A3C reinforcement learning algorithm consists of a shared global neural network and multiple local neural networks running in parallel threads, as shown in Figure 4; each neural network includes an Actor network and a Critic network.

The Actor network interacts with the agent, i.e., the inverter, and is responsible for outputting the adjustment strategy for the control parameters based on the operating state information of the high-proportion renewable energy grid, i.e., generating the policy. The Actor network parameters $\theta$ are updated as:

$$\theta' = \theta + \alpha_a\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\,A(s_t, a_t) + \beta\,\nabla_\theta H\big(\pi_\theta(\cdot \mid s_t)\big)\right]$$

where $\alpha_a$ is the learning rate of the Actor network; $\nabla_\theta$ denotes the gradient; $A(s_t, a_t)$ is the advantage function; $H(\pi_\theta)$ is the entropy of the policy; $s_t$ is the state vector of the system at time t; $\theta'$ is the updated Actor network parameter; $\beta$ is the entropy regularization coefficient; $a_t$ is the action output by the Actor network in state $s_t$, i.e., the control parameter adjustment of the inverter; and $\pi_\theta$ is the policy function parameterized by the Actor network parameters $\theta$.

The Critic network evaluates the performance under the current adjustment strategy and uses the advantage function $A(s_t, a_t)$ to guide policy convergence and improvement. The Critic network parameters $w$ are updated based on the TD error:

$$\delta_t = r_t + \gamma V_w(s_{t+1}) - V_w(s_t), \qquad w \leftarrow w + \alpha_c\,\delta_t\,\nabla_w V_w(s_t)$$

where $\delta_t$ is the temporal-difference (TD) error, which measures the deviation between the current state value estimate and the value estimate after a one-step reward; $r_t$ is the immediate reward that the system receives after performing a control action; $\gamma$ is the discount factor, which represents the weight of future rewards in the current evaluation; $s_t$ and $s_{t+1}$ are the system states at the current and the next moment, respectively; $V_w(s_t)$ and $V_w(s_{t+1})$ are the Critic network's state value estimates for the current and the next state; $\alpha_c$ is the learning rate of the Critic network; $w$ denotes the Critic network parameters; and $\nabla_w V_w(s_t)$ is the gradient of the state value function with respect to the Critic network parameters.
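
The Actor and Critic updates described above can be sketched in a few lines. The sketch below is a simplified single-worker illustration, not the patent's network: it assumes a linear Critic V_w(s) = w·s, a Gaussian policy with a linear mean θ·s and fixed standard deviation, and it uses the one-step TD error as the advantage estimate. With a fixed standard deviation the policy entropy does not depend on θ, so the entropy term contributes no gradient here. All function names are hypothetical.

```python
from typing import List, Sequence


def value(w: Sequence[float], s: Sequence[float]) -> float:
    """Linear Critic: V_w(s) = w . s (an assumption made for illustration)."""
    return sum(wi * si for wi, si in zip(w, s))


def td_error(w, s, r: float, s_next, gamma: float) -> float:
    """delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    return r + gamma * value(w, s_next) - value(w, s)


def critic_update(w, s, delta: float, alpha_c: float) -> List[float]:
    """w <- w + alpha_c * delta * grad_w V_w(s); for a linear Critic the gradient is s."""
    return [wi + alpha_c * delta * si for wi, si in zip(w, s)]


def actor_update(theta, s, a: float, advantage: float,
                 alpha_a: float, sigma: float) -> List[float]:
    """theta <- theta + alpha_a * A * grad_theta log pi(a|s) for a Gaussian policy.

    With mean mu = theta . s and fixed sigma, grad_theta log pi = (a - mu) / sigma^2 * s.
    """
    mu = sum(ti * si for ti, si in zip(theta, s))
    score = (a - mu) / sigma ** 2
    return [ti + alpha_a * advantage * score * si for ti, si in zip(theta, s)]
```

In a full A3C setup each worker thread would accumulate such gradients locally and apply them asynchronously to the shared global parameters; that synchronization logic is omitted here.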

[0040] The A3C reinforcement learning algorithm is built into the power grid system model. Through the interaction between the agents (each inverter acting as an agent) and the power grid operating environment, online optimization of inverter control parameters and policy updates are achieved. Specifically, the Actor network outputs adjustment strategies for the inverter control parameters based on power grid state information (node voltage deviation, frequency deviation, power imbalance, energy storage status, etc.). The Critic network evaluates the performance under the current strategy and guides policy convergence and improvement through the advantage function. At the same time, the A3C reinforcement learning algorithm outputs active / reactive power regulation reference values to the grid-following photovoltaic and wind power inverters, so that the inverter cluster responds in a coordinated manner under the lead of the grid-forming energy storage inverters. Through its asynchronous multi-agent parallel learning mechanism, the A3C reinforcement learning algorithm gives the coordinated control strategy higher generalization ability and adaptability under complex operating conditions.

[0041] In step S103 above, a multi-objective reward function model is constructed to dynamically balance and collaboratively optimize the control parameters of the inverter cluster; the multi-objective reward function model includes a system economic and energy balance objective function, a steady-state performance and carbon emission objective function, and a comprehensive reward function. This step includes the following.

(1) Multidimensional operating performance evaluation index:

$$F = w_1\phi(\Delta V) + w_2\phi(\Delta f) + w_3\phi(\Delta P_{sh}) + w_4\phi(\Delta SOC) + w_5\phi(\eta)$$

where $F$ is the multidimensional performance evaluation index function; $w_1$ to $w_5$ are the weighting coefficients of the evaluation indicators; $\Delta V$ is the voltage deviation; $\Delta f$ is the frequency deviation; $\Delta P_{sh}$ is the power sharing error; $\Delta SOC$ is the energy storage balance; $\eta$ is the system energy efficiency; and $\phi(\cdot)$ is a function that normalizes or penalizes each indicator, converting indicators with different dimensions to a unified reward scale. The function $F$ characterizes the overall operating quality of the multi-inverter system in terms of voltage, frequency, power sharing, energy storage balance, and energy efficiency, and identifies the key evaluation dimensions of system performance. On this basis, the original optimization objective functions for economy, stability, energy balance, and carbon emissions are constructed and ultimately mapped uniformly to the comprehensive immediate reward function $r_t$ used by the reinforcement learning algorithm.

(2) System economic and energy balance objective function: The optimization objectives of the current control strategy are minimizing operating cost, minimizing carbon emissions, and maximizing power supply reliability. The operating cost objective is to minimize the total operating cost of the microgrid system while meeting energy demand and system constraints:

$$f_{cost} = \min C_{total}, \qquad C_{total} = \sum_{t=1}^{T}\left[C_{loss}(t) + C_{ess}(t) + C_{grid}(t)\right]$$

where $f_{cost}$ is the objective function value obtained by optimizing the operating cost of the inverter cluster within the scheduling cycle; $C_{total}$ is the total operating cost of the system, on which the economic objective is based; $T$ is the total duration of the optimization scheduling cycle; $C_{loss}(t)$ is the cost of inverter losses at time t; $C_{ess}(t)$ is the energy cost of the energy storage system; and $C_{grid}(t)$ is the energy cost of interacting with the power grid. These are specifically expressed as:

$$C_{loss}(t) = k_{loss}P_{inv}(t), \qquad C_{ess}(t) = k_{ess}\left[P_{ch}(t) + P_{dis}(t)\right], \qquad C_{grid}(t) = c_{grid}P_{grid}(t)$$

where $k_{loss}$ and $k_{ess}$ are the unit energy loss coefficients of the equipment; $c_{grid}$ is the grid electricity price; $P_{inv}(t)$ is the output active power of the inverter at time t; $P_{ch}(t)$ and $P_{dis}(t)$ are the charging and discharging active power of the energy storage system at time t, respectively; and $P_{grid}(t)$ is the active power exchanged between the grid-connected inverter cluster and the public power grid at time t.

To ensure a balance between energy supply and demand, a power balance constraint is introduced. The power balance deviation $\Delta P(t)$ can be represented as:

$$\Delta P(t) = P_{pv}(t) + P_{ess}(t) - P_{load}(t) - P_{ch}(t) - P_{loss}(t)$$

where $P_{pv}(t)$ is the output active power of the photovoltaic power generation system at time t; $P_{ess}(t)$ is the active power of the energy storage system at time t; and $P_{load}(t)$, $P_{ch}(t)$, and $P_{loss}(t)$ are the active power demand of the load, the active power of energy storage charging, and the active power loss of the system at time t, respectively. When $\Delta P(t) = 0$, the system is in a state of instantaneous power balance. The power balance reward $R_{bal}$ is defined as a function of $\Delta P(t)$ (the expression is not reproduced in the source text).

(3) Steady-state performance and carbon emission objective function: To improve the stability and sustainability of the power grid system, a frequency and voltage stability objective and a carbon emission minimization objective are introduced. The frequency and voltage stability objective $f_{stab}$ is represented as:

$$f_{stab} = \lambda_1\frac{|\Delta f(t)|}{f_N} + \lambda_2\frac{|\Delta V(t)|}{V_{ref}}$$

where $\lambda_1$ and $\lambda_2$ are weighting coefficients that characterize the relative importance of the frequency stability index and the voltage stability index in the comprehensive objective function; $\Delta f(t)$ is the deviation of the system frequency from its rated value at time t; $f_N$ is the rated frequency of the system; $\Delta V(t)$ is the deviation of the system voltage from the reference voltage $V_{ref}$ at time t. Normalizing the frequency and voltage deviations to their rated values makes the indicators dimensionless, so that different physical quantities can be measured on a common scale and fused by weighting to characterize the overall stability of the operating state.

Carbon emissions from the power grid system mainly originate from inverter losses, energy storage conversion losses, and grid energy interaction. The carbon emission minimization objective can therefore be expressed as:

$$f_{carbon} = \sum_{t=1}^{T}\left[e_{inv}P_{inv}(t) + e_{dis}P_{dis}(t) + e_{grid}P_{grid}(t)\right]$$

where $f_{carbon}$ is the carbon emission objective function, and $e_{inv}$, $e_{dis}$, and $e_{grid}$ are the equivalent carbon emission factors per unit of active power for the output power of the grid-connected inverter, the discharge power of the energy storage system, and the power exchanged with the public grid, respectively.

(4) Comprehensive reward function: Combining the above multi-objective optimization content, the instantaneous reward $r_t$ of the reinforcement learning agent at each time step is defined as a weighted combination of the above objective terms (the expression is not reproduced in the source text). The weighting coefficients are set according to system operation requirements and must satisfy a constraint that is likewise not reproduced in the source text.
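
To make the idea of a weighted multi-objective reward concrete, the following sketch combines cost, stability, carbon, and power balance terms into a single per-step reward. The functional forms are assumptions for illustration only: the stability term uses absolute deviations normalized by rated values, the balance term penalizes the absolute power imbalance, the reward is the negative weighted sum of all penalties, and the weights are required to sum to one. None of these choices, nor the function names and default weights, should be read as the patent's actual expressions.

```python
from typing import Sequence


def stability_term(d_freq: float, d_volt: float, f_rated: float, v_ref: float,
                   lam_f: float = 0.5, lam_v: float = 0.5) -> float:
    """Dimensionless stability penalty from normalized frequency and voltage deviations."""
    return lam_f * abs(d_freq) / f_rated + lam_v * abs(d_volt) / v_ref


def step_reward(cost: float, stability: float, carbon: float, dp_balance: float,
                weights: Sequence[float] = (0.25, 0.25, 0.25, 0.25)) -> float:
    """Negative weighted sum of penalty terms (assumed form, weights must sum to 1)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    w_cost, w_stab, w_carbon, w_bal = weights
    penalty = (w_cost * cost + w_stab * stability
               + w_carbon * carbon + w_bal * abs(dp_balance))
    return -penalty
```

In practice the cost and carbon terms would also need to be scaled to a range comparable with the dimensionless stability term, otherwise one objective dominates the reward regardless of the weights.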

[0042] The multi-objective reward function model reflects operating indicators such as voltage stability, frequency deviation, power distribution error, energy storage state balance, and system energy utilization efficiency, enabling comprehensive evaluation and optimization of the coordinated control performance of the inverter cluster. By assigning a weight to each performance indicator, a reward structure is established that balances system stability and economy.

[0043] In step S104 above, a hierarchical collaborative control architecture is designed to perform collaborative control of the inverter cluster based on its key control variables in the high-proportion renewable energy power grid. This step includes the following.

The local control layer receives control commands from the coordination control layer, performs coordinated control of the grid-forming energy storage inverter, photovoltaic inverter, and wind power inverter, and feeds the control information back to the coordination control layer.

The coordination control layer receives global control commands from the global optimization layer, exchanges real-time operating data among the inverters, enables active power sharing and optimal allocation of available power among the inverters, and feeds control information back to the global optimization layer. The coordination control layer must satisfy the following stability constraints:

$$\sum_{i=1}^{N} P_i = P_{load}, \qquad |f_i - f_j| \le \Delta f_{max}$$

where $P_i$ is the output power of the i-th inverter; $P_{load}$ is the system load demand to be met; $\Delta f_{max}$ is the allowable frequency deviation range; and $f_i$ and $f_j$ are the output frequencies of the i-th and j-th inverters in grid-connected operation, which characterize the frequency consistency constraint within the inverter cluster.

The global optimization layer is based on the A3C control framework. It dynamically generates control strategies according to the state of the power grid system, adaptively optimizes the grid-forming energy storage inverter, and globally schedules the active / reactive power reference values of the photovoltaic and wind power inverters.
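
A coordination layer could verify the two stability constraints described above with a check along the following lines. This is a minimal sketch: the function name, the power tolerance `p_tol`, and the sample values are hypothetical, and the power constraint is interpreted as total inverter output matching the load within that tolerance.

```python
from typing import Sequence


def coordination_constraints_ok(
    p_out: Sequence[float],
    f_out: Sequence[float],
    p_load: float,
    df_max: float,
    p_tol: float = 1e-3,
) -> bool:
    """Return True when the power balance and frequency consistency checks both pass."""
    # Power constraint: total inverter output matches the load within p_tol.
    power_ok = abs(sum(p_out) - p_load) <= p_tol
    # |f_i - f_j| <= df_max for every pair is equivalent to max(f) - min(f) <= df_max.
    freq_ok = (max(f_out) - min(f_out)) <= df_max
    return power_ok and freq_ok
```

Checking the spread between the highest and lowest frequency avoids an explicit loop over all inverter pairs while enforcing the same pairwise bound.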

[0044] As shown in Figure 6, this invention aims to achieve coordinated control of voltage support, frequency stability, and power distribution for energy storage, photovoltaic, and wind power inverter clusters under high-proportion renewable energy grid conditions. It constructs a hierarchical collaborative control architecture consisting of a local control layer, a coordination control layer, and a global optimization layer. This architecture uses the grid-forming energy storage inverters as the units that actively generate voltage and frequency, and the photovoltaic and wind power inverters as the units that execute power regulation and grid connection. The three levels complement one another: fast local control, coordinated group control, and intelligent global optimization. This hierarchical architecture reduces system control complexity, improves the real-time coordination capability of multi-inverter systems, and enhances the voltage stability, frequency support capability, and power distribution balance of high-proportion renewable energy grids.

[0045] The local control layer is responsible for the fast dynamic control of each individual inverter and forms the foundation of the three-layer structure. For grid-forming energy storage inverters, the local controller, based on the VSG principle, adjusts key control quantities such as virtual inertia, damping coefficient, active droop coefficient, reactive droop coefficient, reference voltage, and reference frequency to actively construct the voltage reference and to provide frequency support and synchronization. For grid-following inverters, the local control layer follows the reference phase angle formed by the grid-forming inverter through a phase-locked loop and performs active power regulation, reactive power support, and voltage regulation based on the active / reactive reference values issued by the coordination layer, thereby achieving stable grid-connected operation.

[0046] The coordination control layer enables power sharing, operational consistency maintenance, and energy complementarity among inverter clusters, including energy storage, photovoltaic, and wind power. This layer exchanges real-time operational data among the inverters, including node frequency and phase angle, node voltage amplitude, active and reactive power output of each inverter, and energy storage SOC status. Through consistency control, power allocation strategies, and energy storage management strategies, the coordination control layer helps inverters achieve active power sharing, optimal allocation of available power between photovoltaic and wind power inverters, multi-source complementarity, and SOC balance management. This ensures power sharing, phase angle consistency synchronization, and energy storage status coordination among multiple inverters, guaranteeing the stability of parallel operation.

[0047] The global optimization layer is based on the A3C reinforcement learning algorithm. It uses power grid system operation data to build a policy decision model, dynamically generates control strategies according to the overall performance target, adaptively optimizes control quantities such as the virtual inertia, damping, and droop parameters of the grid-forming energy storage inverters, globally schedules the active / reactive reference values of the photovoltaic and wind power inverters, and adjusts the control strategy in real time according to changes in system load, new energy fluctuations, and energy storage SOC status. This improves the overall stability of system frequency and voltage, and achieves optimized energy allocation and economical operation under high-proportion new energy conditions.

[0048] In step S105 above, the A3C reinforcement learning algorithm adaptively adjusts the control parameters of the inverter cluster based on the system operating state information. The parameters of each agent, i.e., each inverter, are updated as follows:

$$\theta_i^{k+1} = \theta_i^{k} + \eta\,\nabla_{\theta_i}\mathcal{J}(\theta_i^{k})$$

where $\eta$ is the learning rate; $\mathcal{J}$ is the cumulative reward objective; $\theta_i^{k}$ is the parameter vector of the control strategy of the i-th agent (inverter) at the k-th iteration; $\theta_i^{k+1}$ is the updated policy parameter vector; and $\nabla_{\theta_i}\mathcal{J}$ is the gradient of the objective function with respect to the policy parameters, which indicates the direction of the parameter update. This is the standard policy gradient update form: the expected reward is maximized by gradient ascent, and the rule is used to train the multi-agent reinforcement learning system.
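
The gradient-ascent form of this update can be demonstrated on a toy objective. In the sketch below the objective is a hypothetical concave function J(θ) = −Σ(θ_k − target_k)², whose gradient is known in closed form; in the actual method the gradient would come from the policy gradient estimate rather than from an analytic expression. The function names and the target values are illustrative assumptions.

```python
from typing import Callable, List, Sequence


def gradient_ascent_step(theta: Sequence[float],
                         grad_fn: Callable[[Sequence[float]], List[float]],
                         eta: float) -> List[float]:
    """theta <- theta + eta * grad J(theta): move parameters uphill on the objective."""
    grad = grad_fn(theta)
    return [t + eta * g for t, g in zip(theta, grad)]


def toy_gradient(theta: Sequence[float],
                 target: Sequence[float] = (1.0, -2.0)) -> List[float]:
    """Gradient of the toy objective J(theta) = -sum((theta_k - target_k)^2)."""
    return [-2.0 * (t - g) for t, g in zip(theta, target)]


def optimize(theta: Sequence[float], eta: float, iterations: int) -> List[float]:
    """Repeat the ascent step for a fixed number of iterations."""
    current = list(theta)
    for _ in range(iterations):
        current = gradient_ascent_step(current, toy_gradient, eta)
    return current
```

For example, `optimize([0.0, 0.0], eta=0.1, iterations=200)` drives the parameters toward the maximizer of the toy objective at (1.0, −2.0).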

[0049] When the A3C reinforcement learning algorithm detects a grid anomaly from the system operating state information, it outputs new control parameters for the inverter cluster in real time and adjusts them adaptively. Grid anomalies include fluctuations in renewable energy output, grid disturbances, and sudden load changes. The adjustment mechanism is as follows:

$$J_{t+1} = J_t + \Delta J, \qquad D_{t+1} = D_t + \Delta D, \qquad K_{p,t+1} = K_{p,t} + \Delta K_p, \qquad K_{q,t+1} = K_{q,t} + \Delta K_q$$

where $J_t$ is the virtual inertia parameter of the virtual synchronous generator at time t and $\Delta J$ is its adjustment increment; $D_t$ is the virtual damping coefficient at time t and $\Delta D$ is the adjustment increment of the damping parameter; $K_{p,t}$ and $K_{q,t}$ are the control gain parameters in the active and reactive power control loops of the inverter at time t; and $\Delta K_p$ and $\Delta K_q$ are the online update values of the corresponding control gains. These parameters are updated online to adaptively optimize system performance. Through this adaptive adjustment mechanism, the inverter cluster can achieve dynamic active / reactive power balance, rapid grid frequency stabilization, and consistent power sharing among multiple inverters under different operating conditions.
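
The incremental adjustment described above amounts to adding an agent-supplied increment to each control parameter. The sketch below applies such increments and additionally clamps each result to a safe range; the clamping step, the bound values, and the parameter keys (`"J"`, `"D"`, `"Kp"`, `"Kq"`) are assumptions added for illustration, since the text does not specify parameter limits.

```python
from typing import Dict, Tuple


def apply_increments(
    params: Dict[str, float],
    deltas: Dict[str, float],
    bounds: Dict[str, Tuple[float, float]],
) -> Dict[str, float]:
    """Return new parameters x_{t+1} = clip(x_t + delta_x, lower, upper)."""
    updated = {}
    for name, value in params.items():
        lower, upper = bounds[name]
        # Parameters without an increment in `deltas` keep their current value.
        candidate = value + deltas.get(name, 0.0)
        updated[name] = min(upper, max(lower, candidate))
    return updated
```

Clamping keeps a learning agent from pushing the virtual inertia or damping into a range where the underlying controller could become unstable, which is a common safeguard when parameters are tuned online.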

[0050] During grid-connected operation of the inverter cluster, this invention uses the A3C reinforcement learning algorithm to adaptively optimize the control parameters of the grid-forming energy storage inverters through online policy updates, and to dynamically schedule the active / reactive power reference values of the photovoltaic and wind power inverters. Each agent autonomously updates its control strategy based on real-time changes in the system operating state (such as sudden load changes, fluctuations in renewable energy output, voltage or frequency disturbances, and changes in the state of charge of energy storage), so that the inverter cluster responds in a coordinated manner and is adjusted optimally. During real-time operation, each agent dynamically optimizes its policy parameters using the gradient update rule, achieving self-learning and adaptive control.

[0051] In step S106 above, the system operating state information of the power grid system model after the adaptive adjustment of the inverter cluster's control parameters is used to train the A3C reinforcement learning algorithm, update the current control strategy, and perform the next iteration until the voltage and frequency of the high-proportion renewable energy power grid are stable and the power distribution is coordinated. This step includes the following. The agent continuously collects the system state-action-reward sequence $(s_t, a_t, r_t)$ during operation, namely: for grid-forming energy storage inverters, the virtual inertia, damping coefficient, and droop control parameters are continuously collected and optimized, so that the grid-forming energy storage inverters provide adaptive voltage and frequency support; for photovoltaic and wind power inverters, active / reactive output tracking is performed according to the scheduling instructions of the updated control strategy to maintain consistency with the grid reference.

[0052] The system collects operating state data and calculates immediate rewards through the local controller and the coordination control layer to drive the Actor-Critic network update. In the A3C reinforcement learning algorithm, the agent adaptively adjusts the key control parameters of the inverter based on the evaluation of the advantage function, so that the inverter cluster continuously optimizes its grid-forming characteristics and power regulation behavior during operation. Through online learning and policy updates, this invention continuously improves the dynamic robustness and collaborative adjustment capability of the inverter cluster under different operating conditions, achieving stable long-term optimized control.

[0053] Furthermore, to achieve adaptive adjustment and multi-machine collaborative optimization of the inverter cluster parameters, this invention treats the inverter cluster, consisting of a grid-forming energy storage inverter and several grid-following photovoltaic and wind power inverters, as a unified control object. The A3C reinforcement learning algorithm jointly optimizes the key control parameters of the grid-forming energy storage inverter and the power reference values of the photovoltaic and wind power inverters. To achieve adaptive parameter adjustment, each inverter is treated as an agent that continuously collects the system state-action-reward sequence during operation. The system updates the global network parameters using the temporal-difference method, so that the control strategy adjusts dynamically to changes in grid operating conditions. For grid-forming energy storage inverters, online learning continuously optimizes the virtual inertia, damping coefficient, and droop control parameters, achieving adaptive support for voltage and frequency. For photovoltaic and wind power inverters, active / reactive output tracking is performed based on the scheduling instructions updated by the global strategy, thereby maintaining consistency with the grid reference.

[0054] Furthermore, as shown in Figure 5, the A3C reinforcement learning algorithm is constructed as follows:

(1) Define the state space and the action space.

The state space represents the environmental state information relevant to the agent, namely the operating state of the high-proportion renewable energy power grid, including node voltage deviation, frequency deviation, power imbalance, and energy storage status. The state at time t can be defined as:

$$s_t = [\Delta f_t,\ \Delta U_t,\ \Delta P_t,\ \Delta Q_t,\ SOC_t]$$

where $\Delta f_t$ is the deviation of the system frequency from its rated value at time t; $\Delta U_t$ is the deviation of the system voltage from the reference voltage at time t; $\Delta P_t$ is the active power deviation of the system at time t relative to the power balance state; $\Delta Q_t$ is the reactive power deviation of the system at time t relative to the power balance state; and $SOC_t$ is the state of charge of the energy storage unit at time t, which characterizes the energy state of the energy storage system.

The action space defines the decisions available to the agent, including adjustments to the VSG control parameters of the grid-type energy storage inverter and adjustments to the power reference values of the photovoltaic and wind power inverters. The action at time t is defined as:

$$a_t = [\Delta J,\ \Delta D,\ \Delta m_p,\ \Delta n_q,\ \Delta P_{ref},\ \Delta Q_{ref}]$$

where $\Delta J$ is the online adjustment of the virtual inertia parameter; $\Delta D$ is the adjustment of the damping coefficient; $\Delta m_p$ is the online adjustment of the active power droop coefficient in the droop control of the grid-type inverter; $\Delta n_q$ is the online adjustment of the reactive power droop coefficient in the droop control of the grid-type inverter, which allows adaptive parameter adjustment without changing the control structure; and $\Delta P_{ref}$ and $\Delta Q_{ref}$ are the adjustment vectors of the active and reactive power reference values of the grid-connected inverters, which contain the power regulation commands for each photovoltaic inverter and wind power inverter and are used to coordinate power distribution and reactive power support among the inverters.
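The state and action definitions above can be sketched as plain data containers. This is a minimal illustration, not the applicant's implementation: the class names `GridState` and `ClusterAction`, the field names, and the ordering of the action vector (four VSG increments followed by per-inverter active and reactive reference increments) are assumptions made for the example.

```python
from dataclasses import dataclass, astuple
from typing import List


@dataclass(frozen=True)
class GridState:
    """State observed by the agent at one time step (hypothetical field names)."""
    freq_dev: float   # deviation from rated frequency
    volt_dev: float   # deviation from reference voltage
    p_dev: float      # active power imbalance
    q_dev: float      # reactive power imbalance
    soc: float        # state of charge of the storage unit, in [0, 1]

    def as_vector(self) -> List[float]:
        return list(astuple(self))


@dataclass(frozen=True)
class ClusterAction:
    """Action: VSG parameter increments plus reference increments per PV/wind inverter."""
    d_inertia: float
    d_damping: float
    d_droop_p: float
    d_droop_q: float
    d_p_ref: List[float]
    d_q_ref: List[float]

    def as_vector(self) -> List[float]:
        # Assumed layout: 4 scalar increments, then all P increments, then all Q increments.
        return [self.d_inertia, self.d_damping, self.d_droop_p, self.d_droop_q,
                *self.d_p_ref, *self.d_q_ref]


state = GridState(freq_dev=-0.05, volt_dev=0.01, p_dev=12.0, q_dev=-3.0, soc=0.6)
action = ClusterAction(0.1, -0.2, 0.0, 0.0, d_p_ref=[1.0, -1.0], d_q_ref=[0.5, 0.0])
```

With two photovoltaic/wind inverters the action vector has 4 + 2 + 2 = 8 entries; the length grows with the number of inverters receiving reference commands.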

[0055] (2) Define the reward function, which the optimal policy seeks to maximize. The weighting coefficients of the reward function are set according to system operation requirements and must satisfy a constraint on their values.

[0056] (3) Initialize the A3C neural networks. Initialize the globally shared neural network and the local neural networks in the independent threads, together with their parameters. The main parameters are the Actor network parameters $\theta$ and the Critic network parameters $w$, the learning rate $\alpha$, the discount factor $\gamma$, the number of iterations, and so on.

[0057] (4) Create multiple parallel training threads. In each thread, the agent interacts with the environment, obtains new gradients by computing the advantage function in the loss function, and shares these gradients with the global neural network.

[0058] (5) Update the globally shared neural network and the local neural networks. The parameters $\theta$ and $w$ of the global neural network are updated using the gradients shared by all threads. The updated parameters are then transmitted to the local neural networks, which completes their parameter update.

[0059] (6) Iterate until the predetermined number of training iterations is reached or the algorithm converges.
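The thread structure of steps (3) to (6) can be sketched as follows. This is only an illustration of the asynchronous gradient-sharing pattern: a toy quadratic loss stands in for the actual A3C actor and critic losses, and every name here (`TARGET`, `toy_gradient`, `worker`, `train`, the hyperparameter values) is hypothetical rather than taken from the application.

```python
import threading
from typing import List

TARGET = [1.0, -2.0]        # toy optimum standing in for the minimum of the A3C loss
LEARNING_RATE = 0.05
STEPS_PER_WORKER = 300
NUM_WORKERS = 4

global_params: List[float] = [0.0, 0.0]   # step (3): globally shared parameters
lock = threading.Lock()


def toy_gradient(params: List[float]) -> List[float]:
    """Gradient of 0.5 * ||params - TARGET||^2 (placeholder for the real loss gradient)."""
    return [p - t for p, t in zip(params, TARGET)]


def worker() -> None:
    for _ in range(STEPS_PER_WORKER):
        with lock:
            local = list(global_params)       # step (5): copy global parameters to local
        grad = toy_gradient(local)            # step (4): gradient computed in the thread
        with lock:
            for i, g in enumerate(grad):      # step (4)/(5): push gradient to the global net
                global_params[i] -= LEARNING_RATE * g


def train() -> List[float]:
    threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                              # step (6): stop after the fixed iteration budget
    return list(global_params)


final_params = train()
```

Because each worker computes its gradient on a local copy, the gradient it pushes may be slightly stale relative to the global parameters; with a small learning rate the shared parameters still settle at the optimum, which is the property the asynchronous scheme relies on.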

[0060] The A3C reinforcement learning algorithm improves on the traditional actor-critic algorithm: through multi-threaded, multi-environment interaction and parallel computation, it breaks the correlation between data samples and speeds up convergence. With the online learning and policy update design described above, the agent continuously improves the control strategy in response to real-time changes in new energy output, load, grid disturbances, and energy storage state of charge, without offline retraining. The inverter cluster can therefore keep perceiving and optimizing against its operating environment over the long term, which improves the voltage-frequency stability and power distribution coordination capability of the high-proportion new energy grid.

[0061] In summary, the invention constructs a multi-source grid-connected power grid system model of the inverter cluster and the high-proportion renewable energy power grid, and deeply integrates the VSG control structure of the grid-type energy storage inverter with the A3C reinforcement learning algorithm to form an adaptive collaborative control framework for the inverter cluster. In the A3C reinforcement learning algorithm, the agent collects grid operating status information in real time, dynamically optimizes the key control parameters of the grid-type energy storage inverter, and issues coordinated power reference values or reactive power adjustment commands to the photovoltaic and wind power inverters. This enables real-time collaborative operation among multiple inverters, improving the voltage stability, frequency support capability, power distribution coordination, and capability to accommodate renewable energy fluctuations of the high-proportion renewable energy power grid, and provides intelligent support for the safe, stable, and efficient operation of the high-proportion renewable energy power system.

[0062] Example 2: This embodiment of the invention discloses a collaborative control system for a new energy inverter cluster based on a grid-type energy storage inverter, comprising:

a model building unit, which constructs a multi-source grid-connected power grid system model based on the inverter cluster and the high-proportion new energy power grid, wherein the inverter cluster includes grid-type energy storage inverters, photovoltaic inverters, and wind power inverters;

an algorithm building unit, which builds the A3C reinforcement learning algorithm in the power grid system model, formulates the current control strategy, collects the operating status information of the high-proportion new energy power grid, and adjusts the control parameters of the inverter cluster according to the current control strategy based on the operating status information;

a reward construction unit, which constructs a multi-objective reward function model to dynamically balance and collaboratively optimize the control parameters of the inverter cluster, wherein the multi-objective reward function model includes a system economy and energy balance objective function, a steady-state performance and carbon emission objective function, and a comprehensive reward function;

a hierarchical control unit, which designs a hierarchical collaborative control architecture to perform collaborative control of the inverter cluster based on the key control variables of the inverter cluster in the high-proportion new energy power grid;

an adaptive adjustment unit, which uses the A3C reinforcement learning algorithm to adaptively adjust the control parameters of the inverter cluster based on system operating status information; and

a learning and updating unit, which uses the system operating status information of the power grid system model after the adaptive adjustment of the control parameters of the inverter cluster to train the A3C reinforcement learning algorithm, update the current control strategy, and perform the next iteration, until the voltage and frequency of the high-proportion renewable energy power grid are stable and the power distribution is coordinated.

Claims

1. A collaborative control method for a new energy inverter cluster based on a grid-type energy storage inverter, characterized in that it includes the following steps:

constructing a multi-source grid-connected power grid system model based on the inverter cluster and the high-proportion new energy power grid, wherein the inverter cluster includes grid-type energy storage inverters, photovoltaic inverters, and wind power inverters;

building an A3C reinforcement learning algorithm in the power grid system model, formulating the current control strategy, collecting the operating status information of the high-proportion new energy power grid, and adjusting the control parameters of the inverter cluster according to the current control strategy based on the operating status information;

constructing a multi-objective reward function model to dynamically balance and collaboratively optimize the control parameters of the inverter cluster, wherein the multi-objective reward function model includes a system economy and energy balance objective function, a steady-state performance and carbon emission objective function, and a comprehensive reward function;

designing a hierarchical collaborative control architecture to perform collaborative control of the inverter cluster based on the key control variables of the inverter cluster in the high-proportion new energy power grid;

adaptively adjusting, by the A3C reinforcement learning algorithm, the control parameters of the inverter cluster based on system operating status information; and

using the system operating status information of the power grid system model after the adaptive adjustment of the control parameters of the inverter cluster to train the A3C reinforcement learning algorithm, update the current control strategy, and perform the next iteration, until the voltage and frequency of the high-proportion renewable energy power grid are stable and the power distribution is coordinated.

2. The collaborative control method for a new energy inverter cluster based on a grid-type energy storage inverter according to claim 1, characterized in that constructing the multi-source grid-connected power grid system model based on the inverter cluster and the high-proportion new energy power grid, wherein the inverter cluster includes grid-type energy storage inverters, photovoltaic inverters, and wind power inverters, includes:

The grid-type energy storage inverter adopts a grid-type voltage source model, and its equivalent rotor dynamics model is established based on the VSG principle to emulate the virtual inertia, virtual damping, and droop control characteristics of a synchronous generator. The virtual inertia equation of the VSG is expressed as:

$$J\frac{d\omega}{dt} = P_{ref} - P_e - D(\omega - \omega_0)$$

where $J$ is the virtual inertia; $D$ is the damping coefficient; $P_{ref}$ is the reference active power; $\omega_0$ is the rated angular frequency; $\omega$ is the equivalent electrical angular frequency of the grid-type energy storage inverter under virtual synchronous control; $J\,d\omega/dt$ is the equivalent inertial regulation quantity constructed on the principle of the virtual synchronous generator; and $P_e$ is the electrical output active power of the equivalent virtual synchronous generator of the grid-type energy storage inverter.

The output voltage of the grid-type energy storage inverter is:

$$u_o = U_{ref}\sin(\omega t + \theta_{ref})$$

where $u_o$ is the output voltage of the grid-type energy storage inverter; $U_{ref}$ is the reference amplitude of the inverter output voltage; $\omega$ is the instantaneous output angular frequency of the grid-type energy storage inverter, calculated from the virtual synchronous generator control model; and $\theta_{ref}$ is the reference phase angle of the output voltage.

The output active power $P$ and reactive power $Q$ of the grid-type energy storage inverter are respectively:

$$P = \tfrac{3}{2}UI\cos\varphi, \qquad Q = \tfrac{3}{2}UI\sin\varphi$$

where $P$ is the active power delivered to the grid by the grid-type energy storage inverter under three-phase symmetrical operation; $Q$ is the reactive power delivered to the grid under three-phase symmetrical operation; $U$ is the equivalent amplitude of the output voltage of the grid-type energy storage inverter; $I$ is the equivalent amplitude of the output current; and $\varphi$ is the phase difference between the output voltage and the output current.

Both the photovoltaic inverters and the wind power inverters operate in a current-source grid-connected control mode, and their current commands are expressed as:

$$i_d^{*} = \frac{2P^{*}}{3u_d}, \qquad i_q^{*} = -\frac{2Q^{*}}{3u_d}$$

where $i_d^{*}$ is the d-axis current command component of the photovoltaic inverter or wind power inverter in the synchronous rotating coordinate system; $i_q^{*}$ is the q-axis current command component in the synchronous rotating coordinate system; $P^{*}$ is the active power reference value of the corresponding inverter; $Q^{*}$ is the reactive power reference value of the corresponding inverter; and $u_d$ is the d-axis component of the grid-connection voltage.
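The rotor dynamics and current-command relations described in this claim can be illustrated numerically. The sketch below is not the claimed control law itself: it assumes the common swing-equation form J·dω/dt = P_ref − P_e − D·(ω − ω₀), forward-Euler integration, grid-voltage orientation (u_q = 0), and an amplitude-invariant dq transform for the current command. The function names and all numeric values are hypothetical toy inputs.

```python
import math


def simulate_vsg(j: float, d: float, p_ref: float, p_e: float,
                 omega0: float, dt: float = 1e-3, steps: int = 5000) -> float:
    """Integrate the assumed VSG swing equation with forward Euler; return final omega."""
    omega = omega0
    for _ in range(steps):
        # Assumed form: J * d(omega)/dt = P_ref - P_e - D * (omega - omega0)
        d_omega = (p_ref - p_e - d * (omega - omega0)) / j
        omega += dt * d_omega
    return omega


def dq_current_command(p_ref: float, q_ref: float, u_d: float) -> tuple:
    """Current references for a grid-following inverter.

    Assumes u_q = 0 and an amplitude-invariant transform, so that
    P = 1.5 * u_d * i_d and Q = -1.5 * u_d * i_q.
    """
    i_d = 2.0 * p_ref / (3.0 * u_d)
    i_q = -2.0 * q_ref / (3.0 * u_d)
    return i_d, i_q


omega0 = 2.0 * math.pi * 50.0
# Constant 100-unit power surplus: the steady-state offset is (P_ref - P_e) / D = 2 rad/s.
omega_final = simulate_vsg(j=10.0, d=50.0, p_ref=1100.0, p_e=1000.0, omega0=omega0)
i_d, i_q = dq_current_command(p_ref=1500.0, q_ref=0.0, u_d=100.0)
```

In this form a larger J slows the frequency response (time constant J/D) while a larger D reduces the steady-state frequency offset, which is why both appear among the parameters the agent adjusts online.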

3. The collaborative control method for a new energy inverter cluster based on a grid-type energy storage inverter according to claim 2, characterized in that building the A3C reinforcement learning algorithm in the power grid system model, formulating the current control strategy, collecting the operating status information of the high-proportion renewable energy power grid, and adjusting the control parameters of the inverter cluster according to the current control strategy based on the operating status information includes:

The A3C reinforcement learning algorithm consists of a shared global neural network and multiple local neural networks running in parallel threads; each neural network includes an Actor network and a Critic network.

The Actor network interacts with the agent, i.e., the inverter. It is responsible for generating the policy, that is, for outputting the adjustment strategy for the control parameters based on the operating status information of the high-proportion renewable energy grid. The Actor network parameters $\theta$ are updated according to:

$$\theta' = \theta + \alpha_\theta\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\,A(s_t, a_t) + \beta\,\nabla_\theta H\big(\pi_\theta(\cdot \mid s_t)\big)\right]$$

where $\alpha_\theta$ is the learning rate of the Actor network; $\nabla_\theta$ denotes the gradient; $A(s_t, a_t)$ is the advantage function; $H$ is the entropy of the policy; $s_t$ is the state vector of the system at time t; $\theta'$ denotes the updated Actor network parameters; $\beta$ is the entropy regularization coefficient; $a_t$ is the action output by the Actor network in state $s_t$, i.e., the control parameter adjustment of the inverter; and $\pi_\theta$ is the policy function parameterized by the Actor network parameters $\theta$.

The Critic network evaluates the performance of the current adjustment strategy and uses the advantage function to guide the convergence and improvement of the policy. The Critic network parameters $w$ are updated based on the TD error:

$$\delta_t = r_t + \gamma V_w(s_{t+1}) - V_w(s_t), \qquad w' = w + \alpha_w\,\delta_t\,\nabla_w V_w(s_t)$$

where $\delta_t$ is the temporal-difference error, which measures the deviation between the current state value estimate and the value estimate after a one-step reward; $r_t$ is the immediate reward received by the system after executing a control action; $\gamma$ is the discount factor, which represents the weight of future rewards in the current evaluation; $s_t$ and $s_{t+1}$ are the system states at the current and next time steps, respectively; $V_w(s_t)$ and $V_w(s_{t+1})$ are the Critic network's state value estimates for the current and next states; $\alpha_w$ is the learning rate of the Critic network; $w$ denotes the Critic network parameters; and $\nabla_w V_w(s_t)$ is the gradient of the state value function with respect to the Critic network parameters.
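The Critic update described in this claim can be worked through on a small case. The sketch below assumes a linear value function V(s) = w·x(s) in place of a neural network, so that the gradient with respect to w is simply the feature vector; the function names and numbers are hypothetical. In one-step actor-critic variants the same TD error is commonly used as the advantage estimate fed to the Actor update, which is an assumption here rather than something the claim states.

```python
from typing import List, Tuple


def linear_value(w: List[float], features: List[float]) -> float:
    """State value under an assumed linear critic: V(s) = w . x(s)."""
    return sum(wi * xi for wi, xi in zip(w, features))


def td_error(reward: float, gamma: float, v_s: float, v_next: float) -> float:
    """One-step temporal-difference error: r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_s


def critic_update(w: List[float], features: List[float], next_features: List[float],
                  reward: float, gamma: float, lr: float) -> Tuple[List[float], float]:
    """Apply w <- w + lr * delta * grad_w V(s); for a linear critic grad_w V(s) = x(s)."""
    delta = td_error(reward, gamma,
                     linear_value(w, features),
                     linear_value(w, next_features))
    new_w = [wi + lr * delta * xi for wi, xi in zip(w, features)]
    return new_w, delta


# Starting from zero weights, both value estimates are 0, so delta equals the reward.
w, delta = critic_update([0.0, 0.0], [1.0, 0.5], [0.5, 0.5],
                         reward=1.0, gamma=0.9, lr=0.1)
```

A positive TD error means the outcome was better than the Critic expected, so the value estimate for the visited state is raised and, in the Actor update, the action taken is made more probable.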

4. The collaborative control method for a new energy inverter cluster based on a grid-type energy storage inverter according to claim 1, characterized in that constructing the multi-objective reward function model to dynamically balance and collaboratively optimize the control parameters of the inverter cluster, the model including the system economy and energy balance objective function, the steady-state performance and carbon emission objective function, and the comprehensive reward function, includes:

(1) The multidimensional operational performance evaluation index is expressed as:

$$R_{eval} = w_1\,\phi(\Delta U) + w_2\,\phi(\Delta f) + w_3\,\phi(e_P) + w_4\,\phi(B_{SOC}) + w_5\,\phi(\eta)$$

where $R_{eval}$ is the multidimensional performance evaluation index function; $w_1, \dots, w_5$ are the weighting coefficients of the evaluation indicators; $\Delta U$ is the voltage deviation; $\Delta f$ is the frequency deviation; $e_P$ is the power sharing error; $B_{SOC}$ is the energy storage balance; $\eta$ is the system energy efficiency; and $\phi(\cdot)$ is a function that normalizes or penalizes each indicator, converting indicators with different dimensions to a unified reward scale.

(2) System economy and energy balance objective function. The objective is calculated as:

$$F_{cost} = \min C_{total} = \min \sum_{t=1}^{T}\left(C_{loss,t} + C_{ess,t} + C_{grid,t}\right)$$

where $F_{cost}$ is the objective function value obtained by optimizing the operating cost of the inverter cluster within the scheduling cycle; $C_{total}$ is the total system operating cost; $T$ is the total duration of the optimized scheduling cycle; $C_{loss,t}$ is the cost of inverter losses at time t; $C_{ess,t}$ is the energy cost of the energy storage system; and $C_{grid,t}$ is the cost of energy exchanged with the power grid. These are specifically expressed as:

$$C_{loss,t} = k_{inv}\,P_{inv,t}, \qquad C_{ess,t} = k_{ess}\left(P_{ch,t} + P_{dis,t}\right), \qquad C_{grid,t} = c_{grid}\,P_{grid,t}$$

where $k_{inv}$ and $k_{ess}$ are the unit energy loss coefficients of the equipment; $c_{grid}$ is the grid-connection electricity price; $P_{inv,t}$ is the output active power of the inverter at time t; $P_{ch,t}$ and $P_{dis,t}$ are the charging and discharging active power of the energy storage system at time t, respectively; and $P_{grid,t}$ is the active power exchanged between the grid-connected inverter cluster and the public power grid at time t.

To ensure a balance between energy supply and demand, a power balance constraint is introduced. The power balance deviation $\Delta P_t$ can be expressed as:

$$\Delta P_t = P_{pv,t} + P_{ess,t} - P_{load,t} - P_{ch,t} - P_{loss,t}$$

where $P_{pv,t}$ is the output active power of the photovoltaic power generation system at time t; $P_{ess,t}$ is the active power of the energy storage system at time t; $P_{load,t}$ is the active power demand of the load at time t; $P_{ch,t}$ is the active power of the energy storage during charging at time t; and $P_{loss,t}$ is the active power loss of the system at time t. When $\Delta P_t = 0$, the system is in a state of instantaneous power balance. The power balance reward is expressed as a function of the power balance deviation $\Delta P_t$.

(3) Steady-state performance and carbon emission objective function. The frequency and voltage stability objective $F_{stab}$ is expressed as:

$$F_{stab} = \lambda_1\left|\frac{\Delta f_t}{f_N}\right| + \lambda_2\left|\frac{\Delta U_t}{U_N}\right|$$

where $\lambda_1$ and $\lambda_2$ are weighting coefficients that characterize the relative importance of the frequency stability index and the voltage stability index in the comprehensive objective function; $\Delta f_t$ is the deviation of the system frequency from its rated value at time t; $f_N$ is the rated frequency of the system; $\Delta U_t$ is the deviation of the system voltage from the reference voltage at time t; and $U_N$ is the rated voltage. Normalizing the frequency deviation and voltage deviation to their rated values makes the evaluation indicators dimensionless, so that different physical quantities can be measured on a common scale and combined by weighting to characterize the overall stability of the system's operating state.

The objective of minimizing carbon emissions can be expressed as:

$$F_{C} = \min \sum_{t=1}^{T}\left(e_{inv}\,P_{inv,t} + e_{ess}\,P_{dis,t} + e_{grid}\,P_{grid,t}\right)$$

where $F_{C}$ is the carbon emission objective function, and $e_{inv}$, $e_{ess}$, and $e_{grid}$ are the equivalent carbon emission factors per unit of active power corresponding to the output power of the grid-connected inverter, the discharge power of the energy storage system, and the power exchanged with the public grid, respectively.

(4) The comprehensive reward function is a weighted combination of the above objective terms, in which the weighting coefficients are set according to system operation requirements and must satisfy a constraint on their values.
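The reward terms in this claim can be sketched as small functions. Several details below are assumptions made only for illustration, because the original formulas are not legible in this text: the stability term uses absolute deviations normalized by rated values, the comprehensive reward is taken as the negative weighted sum of cost-like terms, and the weights are required to sum to 1. All function names, dictionary keys, and numbers are hypothetical.

```python
from typing import Dict


def power_balance_deviation(p_pv: float, p_ess: float, p_load: float,
                            p_ch: float, p_loss: float) -> float:
    """Supply minus demand, using the quantities listed in the claim; zero means balanced."""
    return p_pv + p_ess - p_load - p_ch - p_loss


def stability_objective(d_f: float, d_u: float, f_rated: float, u_rated: float,
                        w_f: float = 0.5, w_u: float = 0.5) -> float:
    """Dimensionless stability cost (assumed form: weighted absolute per-unit deviations)."""
    return w_f * abs(d_f) / f_rated + w_u * abs(d_u) / u_rated


def comprehensive_reward(costs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Assumed form: negative weighted sum of cost terms, with weights summing to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1 (assumed constraint)")
    return -sum(weights[name] * costs[name] for name in weights)


deviation = power_balance_deviation(p_pv=50.0, p_ess=20.0, p_load=60.0, p_ch=5.0, p_loss=5.0)
stab = stability_objective(d_f=0.5, d_u=10.0, f_rated=50.0, u_rated=400.0)
reward = comprehensive_reward(
    costs={"cost": 2.0, "stability": stab, "carbon": 1.0},
    weights={"cost": 0.5, "stability": 0.3, "carbon": 0.2},
)
```

Normalizing each term before weighting keeps a large-magnitude quantity such as operating cost from dominating a small-magnitude one such as per-unit frequency deviation; in practice the cost and carbon terms would also need scaling to a comparable range.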

5. The collaborative control method for a new energy inverter cluster based on a grid-type energy storage inverter according to claim 1, characterized in that designing the hierarchical collaborative control architecture to perform collaborative control of the inverter cluster based on the key control variables of the inverter cluster in the high-proportion new energy power grid includes:

The local control layer receives control commands from the coordination control layer, coordinates the control among the grid-type energy storage inverters, photovoltaic inverters, and wind power inverters, and feeds control information back to the coordination control layer.

The coordination control layer receives global control commands from the global optimization layer, exchanges real-time operating data among the inverters, realizes active power sharing and optimal allocation of available power among the inverters, and feeds control information back to the global optimization layer. The coordination control layer must satisfy the following stability constraints:

$$\sum_{i=1}^{N} P_i = P_{load}, \qquad \left|f_i - f_j\right| \le \Delta f_{max}$$

where $P_i$ is the output power of the i-th inverter; $P_{load}$ is the system load demand to be met; $\Delta f_{max}$ is the allowable frequency deviation range; and $f_i$ and $f_j$ are the output frequencies of the i-th and j-th inverters in grid-connected operation, which characterize the frequency consistency constraint within the inverter cluster.

The global optimization layer is based on the A3C control framework. It dynamically generates control strategies according to the state of the power grid system, performs adaptive optimization of the grid-type energy storage inverter, and performs global scheduling of the active/reactive power reference values of the photovoltaic and wind power inverters.
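The two stability constraints of the coordination control layer can be checked with a few lines of code. This is a sketch under stated assumptions: the power balance is tested within a small tolerance `p_tol` (a hypothetical parameter, since exact equality is not practical with measured values), and the frequency constraint is applied to every pair of inverters. The function name and example values are illustrative only.

```python
from itertools import combinations
from typing import Sequence


def check_coordination_constraints(powers: Sequence[float],
                                   frequencies: Sequence[float],
                                   p_load: float,
                                   df_max: float,
                                   p_tol: float = 1e-6) -> bool:
    """Return True if total output matches the load and all inverter frequencies agree.

    powers:      output power of each inverter
    frequencies: output frequency of each inverter, same order as powers
    p_load:      load demand the cluster has to cover
    df_max:      allowable frequency difference between any two inverters
    p_tol:       tolerance on the power balance (assumed, not from the claim)
    """
    balanced = abs(sum(powers) - p_load) <= p_tol
    consistent = all(abs(fi - fj) <= df_max
                     for fi, fj in combinations(frequencies, 2))
    return balanced and consistent


ok = check_coordination_constraints([30.0, 50.0, 20.0], [50.00, 50.01, 49.99],
                                    p_load=100.0, df_max=0.05)
violated = check_coordination_constraints([30.0, 50.0, 20.0], [50.00, 50.20, 49.99],
                                          p_load=100.0, df_max=0.05)
```

Checking every pair is quadratic in the number of inverters; comparing the maximum and minimum frequency gives the same result in linear time for large clusters.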

6. The collaborative control method for a new energy inverter cluster based on a grid-type energy storage inverter according to claim 1, characterized in that adaptively adjusting, by the A3C reinforcement learning algorithm, the control parameters of the inverter cluster based on system operating status information includes:

The parameters of each agent, i.e., each inverter, are updated as follows:

$$\theta_i^{k+1} = \theta_i^{k} + \alpha\,\nabla_{\theta}\mathcal{J}\big(\theta_i^{k}\big)$$

where $\alpha$ is the learning rate; $\mathcal{J}$ is the cumulative reward objective function; $\theta_i^{k}$ is the parameter vector of the control strategy of the i-th agent at the k-th iteration; $\theta_i^{k+1}$ denotes the updated policy parameters; and $\nabla_{\theta}\mathcal{J}(\theta_i^{k})$ is the gradient of the objective function $\mathcal{J}$ with respect to the policy parameters $\theta$, which indicates the direction of the parameter update.

When the A3C reinforcement learning algorithm detects a power grid anomaly from the system operating status information, it outputs new control parameters for the inverter cluster in real time and adaptively adjusts the control parameters, where power grid anomalies include fluctuations in new energy output, power grid disturbances, or sudden load changes. The specific adjustment mechanism is:

$$J_{t+1} = J_t + \Delta J_t, \qquad D_{t+1} = D_t + \Delta D_t, \qquad K_{p,t+1} = K_{p,t} + \Delta K_p, \qquad K_{q,t+1} = K_{q,t} + \Delta K_q$$

where $J_t$ is the virtual inertia parameter of the virtual synchronous generator at time t and $\Delta J_t$ is the adjustment increment of that parameter; $D_t$ is the virtual damping coefficient at time t and $\Delta D_t$ is the adjustment increment of the damping parameter; $K_{p,t}$ and $K_{q,t}$ are the control gain parameters in the active and reactive power control loops of the inverter at time t; and $\Delta K_p$ and $\Delta K_q$ are the online update values of the corresponding control gains.
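The incremental adjustment mechanism in this claim amounts to adding the agent's output to the current parameter values. The sketch below adds one element that the claim does not mention: each updated parameter is clamped to a bound, which is an assumption introduced here as a common safeguard against unstable values. The class `VsgParams`, the function names, and the bounds are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class VsgParams:
    inertia: float   # virtual inertia
    damping: float   # virtual damping coefficient
    k_p: float       # gain in the active power control loop
    k_q: float       # gain in the reactive power control loop


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def apply_increments(params: VsgParams, d_inertia: float, d_damping: float,
                     d_kp: float, d_kq: float,
                     bounds: Dict[str, Tuple[float, float]]) -> VsgParams:
    """Next-step parameters: current value plus increment, clamped to assumed bounds."""
    return VsgParams(
        inertia=clamp(params.inertia + d_inertia, *bounds["inertia"]),
        damping=clamp(params.damping + d_damping, *bounds["damping"]),
        k_p=clamp(params.k_p + d_kp, *bounds["k_p"]),
        k_q=clamp(params.k_q + d_kq, *bounds["k_q"]),
    )


bounds = {"inertia": (0.5, 10.0), "damping": (1.0, 100.0),
          "k_p": (0.0, 0.1), "k_q": (0.0, 0.1)}
current = VsgParams(inertia=5.0, damping=20.0, k_p=0.05, k_q=0.02)
updated = apply_increments(current, d_inertia=1.0, d_damping=-25.0,
                           d_kp=0.01, d_kq=0.0, bounds=bounds)
```

In this example the damping increment would drive the coefficient negative, so the clamp holds it at the lower bound while the other parameters move by their full increments.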

7. The collaborative control method for a new energy inverter cluster based on a grid-type energy storage inverter according to claim 1, characterized in that using the system operating status information of the power grid system model after the adaptive adjustment of the control parameters of the inverter cluster to train the A3C reinforcement learning algorithm, update the current control strategy, and perform the next iteration, until the voltage and frequency of the high-proportion renewable energy power grid are stable and the power distribution is coordinated, includes:

The agent continuously collects the state-action-reward sequence $(s_t, a_t, r_t)$ of the system during operation, namely:

For the grid-type energy storage inverter, the virtual inertia, damping coefficient, and droop control parameters of the device are continuously collected and optimized, so that the grid-type energy storage inverter provides adaptive voltage and frequency support.

For the photovoltaic inverters and wind power inverters, active/reactive output tracking is performed according to the scheduling instructions of the updated control strategy, so as to remain consistent with the grid reference.

8. A collaborative control system for a new energy inverter cluster based on a grid-type energy storage inverter, characterized in that it includes:

a model building unit, which constructs a multi-source grid-connected power grid system model based on the inverter cluster and the high-proportion new energy power grid, wherein the inverter cluster includes grid-type energy storage inverters, photovoltaic inverters, and wind power inverters;

an algorithm building unit, which builds the A3C reinforcement learning algorithm in the power grid system model, formulates the current control strategy, collects the operating status information of the high-proportion new energy power grid, and adjusts the control parameters of the inverter cluster according to the current control strategy based on the operating status information;

a reward construction unit, which constructs a multi-objective reward function model to dynamically balance and collaboratively optimize the control parameters of the inverter cluster, wherein the multi-objective reward function model includes a system economy and energy balance objective function, a steady-state performance and carbon emission objective function, and a comprehensive reward function;

a hierarchical control unit, which designs a hierarchical collaborative control architecture to perform collaborative control of the inverter cluster based on the key control variables of the inverter cluster in the high-proportion new energy power grid;

an adaptive adjustment unit, which uses the A3C reinforcement learning algorithm to adaptively adjust the control parameters of the inverter cluster based on system operating status information; and

a learning and updating unit, which uses the system operating status information of the power grid system model after the adaptive adjustment of the control parameters of the inverter cluster to train the A3C reinforcement learning algorithm, update the current control strategy, and perform the next iteration, until the voltage and frequency of the high-proportion renewable energy power grid are stable and the power distribution is coordinated.