Multi-zone power system load frequency control method, electronic device and medium
By employing a multi-agent reinforcement learning method, a load frequency regulation model for a multi-regional power system was constructed. A finite-time extended state observer and a synchronization controller were designed, which solved the problems of frequency deviation and tie-line power fluctuation in the multi-regional power system. This enabled rapid frequency recovery and economical power allocation, thereby improving the system's stability and coordinated frequency regulation performance.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NORTH CHINA ELECTRIC POWER UNIV
- Filing Date
- 2026-03-30
- Publication Date
- 2026-06-19
AI Technical Summary
In multi-regional power systems, traditional load frequency control methods are difficult to meet the requirements of high precision and high robustness. Especially when the system dynamic characteristics are complex, the inter-regional coupling is strong, and the output of renewable energy is highly volatile, the frequency deviation and tie-line power fluctuations are aggravated, and the synchronization performance is degraded.
By employing a multi-agent reinforcement learning approach, a finite-time extended state observer and a synchronous controller are designed to construct the dynamic equations of a multi-regional power system load frequency regulation model. Combined with the MA-GRU-SAC algorithm, which combines centralized training with distributed execution, decoupled modeling and collaborative optimization control of each frequency regulation unit are achieved.
It achieves rapid frequency recovery and economical power allocation, improves system frequency stability and anti-interference capability, enhances the optimized operation level of multi-regional coordinated frequency modulation, and strengthens control stability and accuracy.
Figure CN122246756A_ABST
Description
Technical Field
[0001] This invention relates to the field of power system automatic control technology, and more specifically to a multi-regional power system load frequency control method, electronic equipment, and medium.
Background Technology
[0002] With the continuous expansion of power system scale and the large-scale grid connection of renewable energy, load frequency control in multi-regional power systems faces increasingly complex challenges. Traditional load frequency control methods are mainly based on centralized or decentralized control strategies, which perform well in single-region or small-scale systems. However, in multi-regional interconnected power systems, due to factors such as complex system dynamic characteristics, strong inter-regional coupling, and large fluctuations in renewable energy output, the control performance of traditional methods often fails to meet the requirements of high precision and high robustness.
[0003] In view of this, the present invention is hereby proposed.
Summary of the Invention
[0004] The present invention is proposed in view of the above-mentioned problems. According to one aspect of the present invention, a multi-regional power system load frequency control method is provided, comprising:
Modeling the dynamic equations of the load frequency regulation model of the multi-regional power system based on the physical models of different frequency regulation units, wherein the multi-regional power system includes the power systems of multiple frequency regulation regions, and the power system of each frequency regulation region includes at least some of the following frequency regulation units: a coal-fired power generation unit, a gas-fired power generation unit, an energy storage unit, a wind power generation unit, and a photovoltaic power generation unit;
Decoupling the dynamic equations to obtain an independent subsystem model for each frequency regulation unit within a single frequency regulation region;
Constructing, based on the independent subsystem model of each frequency regulation unit, a finite-time extended state observer that estimates the state and lumped disturbance of the different frequency regulation units;
Constructing a finite-time synchronization controller from the finite-time estimates of the system state and lumped disturbance produced by the finite-time extended state observer;
Constructing a multi-region frequency regulation collaborative optimization algorithm based on the output of the finite-time synchronization controller, wherein the algorithm adopts a centralized training and distributed execution architecture and includes agents in one-to-one correspondence with the frequency regulation units together with a centralized value network, each agent adopts a policy network based on the soft actor-critic algorithm, and a gated recurrent unit layer is embedded at the front end of each agent's policy network and of the centralized value network;
Training the multi-region frequency regulation collaborative optimization algorithm, and using the trained algorithm to control the multi-regional power system.
[0005] In actual operation, frequency deviations occur in different regions when load disturbances or new energy fluctuations occur. To achieve rapid regional frequency recovery and multi-unit coordinated regulation, the above technical solution establishes the dynamic equations of the multi-regional power system load frequency regulation model based on the physical models of each frequency regulation unit, and constructs independent subsystem models corresponding to each frequency regulation unit. This allows for decoupling modeling of each frequency regulation unit, reducing the complexity of controller design. By designing a finite-time extended state observer to estimate system state variables and total loop disturbances online, and by designing a finite-time synchronous controller to generate the output power of each generating unit, time-synchronous control can be used to achieve synchronous convergence of frequencies of different frequency regulation units. Based on the finite-time synchronous controller, a multi-regional frequency regulation coordinated optimization algorithm is constructed to intelligently optimize the output power allocation of each generating unit, achieving coordinated control of rapid frequency recovery and economical power allocation, effectively improving system frequency stability, disturbance suppression capability, and optimized operation level of multi-regional coordinated frequency regulation.
[0006] Specifically, the reinforcement learning decision-making module described above employs the Soft Actor-Critic (SAC) algorithm with an Actor-Critic structure. The SAC algorithm uses a policy network (Actor) to generate actions following a Gaussian distribution for a given state, while a value network (Critic) evaluates the value (Q-value) of a given state-action pair. A key feature of SAC is the introduction of entropy during the optimization process to encourage randomness in exploration and prevent the agent from prematurely converging to a local optimum. Entropy is a measure of the uncertainty of a random variable. By maximizing entropy, the agent becomes more "diversified" in the decision-making process.
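The role of the entropy term can be illustrated with a one-dimensional Gaussian policy. The following is a minimal sketch, not the specification's implementation: the function names, the temperature `alpha`, and all numerical values are assumptions chosen for illustration.

```python
import math
import random


def gaussian_entropy(sigma: float) -> float:
    """Differential entropy of a 1-D Gaussian with standard deviation sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def gaussian_log_prob(action: float, mean: float, sigma: float) -> float:
    """Log-density of a 1-D Gaussian policy evaluated at the given action."""
    return (-0.5 * ((action - mean) / sigma) ** 2
            - math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi))


def sample_action(mean: float, sigma: float, rng: random.Random) -> float:
    """Draw one action from the Gaussian policy (Actor output)."""
    return rng.gauss(mean, sigma)


def entropy_augmented_reward(reward: float, log_prob: float, alpha: float) -> float:
    """Reward plus alpha times the single-sample entropy estimate (-log pi).

    alpha is a hypothetical temperature: larger values favor more random policies.
    """
    return reward - alpha * log_prob
```

A wider policy (larger `sigma`) has higher entropy, so under this objective it earns a larger bonus, which is what discourages premature collapse onto a single action.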
[0007] Furthermore, to address the inertia, lag, and historical dependency inherent in the multi-generation unit regulation process, this embodiment embeds a Gated Recurrent Unit (GRU) layer at the front end of each agent's policy (Actor) network. This GRU encodes the state history sequence, extracts temporal features, and outputs continuous power scheduling actions. Modeling historical operational information using GRUs improves the stability and response accuracy of the policy under dynamic disturbances. Simultaneously, a centralized Critic network is constructed, with a GRU layer also embedded at its front end. This network evaluates the joint state and actions of the multiple generation units, learns the value function, characterizes the coupling relationships between the generation units, and provides global gradient information for policy updates. By adding a GRU network before the multilayer perceptron, the network's memory performance is improved, enhancing the agent's efficiency in utilizing historical data.
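How a GRU layer condenses a state history into a single feature can be sketched with a scalar cell. This is an illustrative sketch only: the scalar hidden state, the parameter dictionary `p`, and the update convention h' = (1 - z) * h + z * h_cand (one common convention; some libraries swap the roles of z and 1 - z) are assumptions, and the specification does not give the layer sizes or weights.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x: float, h: float, p: dict) -> float:
    """One step of a scalar GRU cell.

    x is the current observation, h the previous hidden state, and p a
    dictionary of hypothetical scalar weights and biases.
    """
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])               # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])               # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])  # candidate state
    return (1.0 - z) * h + z * h_cand


def encode_history(observations, p: dict, h0: float = 0.0) -> float:
    """Run the cell over an observation sequence and return the final hidden state.

    In an Actor or Critic, this feature would be passed on to the
    multilayer perceptron that follows the GRU layer.
    """
    h = h0
    for x in observations:
        h = gru_step(x, h, p)
    return h
```

Because each new hidden state is a gated blend of the previous state and a bounded candidate, earlier observations keep influencing the feature, which is the memory effect the paragraph above relies on.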
[0008] For example, the method further includes: during training of the multi-region frequency regulation collaborative optimization algorithm, observing whether the reward function converges and whether the frequency deviation and power output of the different frequency regulation units satisfy their constraints. Training is considered complete when the reward function has converged and the frequency deviation and power output of the different frequency regulation units satisfy the constraints.
[0009] The above technical solution monitors the convergence status of the reward function, frequency deviation, and the satisfaction of power output constraints of different frequency regulation units in real time during the training process of the multi-region frequency regulation collaborative optimization algorithm. This enables accurate judgment of whether the algorithm training is complete, effectively ensuring the reliability and rationality of the training results, ensuring that the algorithm output meets the actual operating constraints and performance requirements of power system frequency regulation, and improving the stability and accuracy of multi-region frequency regulation collaborative optimization control.
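The completion criterion described above can be sketched as a pair of checks. The specification does not state how convergence is measured, so the moving-average test, the window length, the tolerance, and the limit values below are all assumptions for illustration.

```python
def reward_converged(rewards, window: int = 50, tol: float = 0.01) -> bool:
    """Assumed convergence test: the mean episode reward over the last
    `window` episodes differs from the mean over the preceding `window`
    episodes by at most a relative tolerance `tol`."""
    if len(rewards) < 2 * window:
        return False
    recent = sum(rewards[-window:]) / window
    previous = sum(rewards[-2 * window:-window]) / window
    scale = max(abs(previous), 1e-8)  # avoid division by zero
    return abs(recent - previous) / scale <= tol


def constraints_met(freq_devs, powers, max_freq_dev, power_limits) -> bool:
    """Check the frequency deviation bound and per-unit power output limits.

    power_limits holds one (lower, upper) pair per frequency regulation unit.
    """
    freq_ok = all(abs(df) <= max_freq_dev for df in freq_devs)
    power_ok = all(lo <= p <= hi for p, (lo, hi) in zip(powers, power_limits))
    return freq_ok and power_ok


def training_complete(rewards, freq_devs, powers, max_freq_dev, power_limits) -> bool:
    """Training is treated as complete only when both conditions hold."""
    return (reward_converged(rewards)
            and constraints_met(freq_devs, powers, max_freq_dev, power_limits))
```

Requiring both conditions prevents a policy whose reward has plateaued but whose outputs violate operating limits from being accepted.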
[0010] For example, the policy network of each agent is optimized based on the maximum entropy reinforcement learning objective, and the value function of the centralized value network is constrained by minimizing the Bellman error.
[0011] The above scheme optimizes the policy network of each agent by using the maximum entropy reinforcement learning objective, and at the same time uses the value function of the centralized value network constrained by minimizing the Bellman error. This can improve the agent's policy exploration ability and learning stability, effectively reduce value estimation bias, enhance the robustness and convergence efficiency of multi-agent collaborative decision-making, and enable the agent to obtain better and more stable behavioral policies in complex dynamic environments.
[0012] For example, the policy network of each agent is updated using a loss function whose terms are: the loss value of the policy network for each agent's parameters; the logarithm of each agent's policy under those parameters; the centralized value network, which takes the joint observations and joint actions of all agents as input and provides global gradient information for the policy update of each agent; the weight of the policy entropy; and the normalization constant of the state, whose logarithmic term ensures that the policy probability distribution is normalized.
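One form consistent with these term definitions is the KL-projection policy objective of soft actor-critic, written for agent i with a centralized critic. The symbols below (the parameters \theta_i and \phi, the temperature \alpha, the replay buffer \mathcal{D}, and the placement of \alpha) are assumptions chosen for illustration and may differ from the expression in the specification.

```latex
L_{\pi}(\theta_i) =
  \mathbb{E}_{o_t \sim \mathcal{D},\; a_t \sim \pi_{\theta}}
  \Big[ \alpha \log \pi_{\theta_i}\!\left(a_{i,t} \mid o_{i,t}\right)
        - Q_{\phi}\!\left(o_t, a_t\right)
        + \alpha \log Z\!\left(o_t\right) \Big]
```

In this form the term \log Z(o_t) does not depend on \theta_i, so it normalizes the target distribution without contributing to the policy gradient.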
[0013] The centralized value network is updated using functions whose terms are: the joint observations of all agents at times t and t+1; the joint actions of all agents at times t and t+1; the centralized value (Q) network, which estimates the value of the current joint state-action pair; the centralized target value network, which estimates the value of the joint state-action pair at the next time step; the immediate reward obtained by all agents after performing a joint action in a joint state; the discount factor, which weights the importance of future cumulative returns; the sum of the log-policy values of all agents, which reflects the global entropy constraint of the multi-agent system; the entropy regularization coefficient, which adjusts the trade-off between reward and policy randomness; the loss function of the centralized Q-network; the expectation over state transition samples drawn from the experience replay pool; the soft update coefficient of the target network; the parameters of the centralized target Q-network; the parameters of the centralized Q-network; the action taken by each agent at time t+1; and the observation of each agent at time t+1.
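The quantities listed for the centralized value network match the usual soft Bellman target, squared Bellman error, and soft (Polyak) target update. The sketch below is written under that assumption. The function names, the scalar Q-values, and the absence of a 1/2 factor in the loss are illustrative choices, not details confirmed by the specification.

```python
def soft_td_target(reward, gamma, q_target_next, next_log_probs, alpha):
    """Soft Bellman target for a joint transition.

    y = r + gamma * (Q_target(o', a') - alpha * sum_i log pi_i(a'_i | o'_i))
    next_log_probs holds one log-probability per agent at time t+1.
    """
    return reward + gamma * (q_target_next - alpha * sum(next_log_probs))


def critic_loss(q_values, targets):
    """Mean squared Bellman error over a batch sampled from the replay pool."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


def soft_update(target_params, online_params, tau):
    """Soft update of the target network parameters.

    Each target parameter moves a fraction tau toward the online parameter.
    """
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

For example, with a reward of 1.0, a discount factor of 0.9, a target Q-value of 2.0, per-agent log-probabilities of -1.0 and -0.5, and an entropy coefficient of 0.2, the target is 1.0 + 0.9 * (2.0 + 0.3) = 3.07.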
[0014] The above scheme designs update mechanisms for a distributed policy network and a centralized value network, respectively. The policy network updates based on its own loss function, policy logarithm, global gradient information provided by the centralized value network, policy entropy weights, and state normalization constants. This improves the rationality of single-agent decision-making while ensuring the normalization of policy probability distribution. The centralized value network constructs a loss function based on multi-agent joint observation, joint actions, immediate rewards, discount factors, global entropy constraints, and entropy regularization coefficients. It also optimizes parameters by combining the expected value of empirical replay samples and soft updates of the target network. This provides global gradient information for policy updates of each agent and balances rewards and policy randomness through global entropy constraints. This effectively improves the stability, global optimality, and training convergence of multi-agent collaborative decision-making, while enhancing the adaptability and robustness of the algorithm in multi-agent joint interaction scenarios.
[0015] For example, the step of modeling the dynamic equations of the multi-regional power system load frequency regulation model based on the physical models of different frequency regulation units includes:
Establishing the correspondence between the power of the different frequency regulation units and the system frequency deviation within the multi-regional power system;
Establishing dynamic models of the different frequency regulation units within a single frequency regulation region;
Establishing a dynamic power model for the tie line between any two frequency regulation regions;
Determining the area control error for a single frequency regulation region.
Preferably, the correspondence between the power of the different frequency regulation units and the system frequency deviation is expressed in terms of: the index i of the frequency regulation region and the total number of frequency regulation regions; the system frequency deviation; the output powers of the coal-fired power generation unit, the gas-fired power generation unit, the energy storage unit, the wind power generation unit, and the photovoltaic power generation unit, each identified by its own subscript; the regional load disturbance; the tie-line deviation signal; and the load damping constant and load inertia constant.
Preferably, the dynamic model of the different frequency regulation units within a single frequency regulation region is expressed by two equations in terms of: the index m denoting the type of frequency regulation resource; the output power of frequency regulation unit m; the time constant; an intermediate process variable; the delay time constant; the primary frequency regulation constant; and the power generation command received by frequency regulation unit m.
Preferably, the tie-line power dynamic model is expressed in terms of: the first derivative of the tie-line power; the power exchange coefficient between a pair of frequency regulation regions; and the frequency deviation of each frequency regulation region.
Preferably, the area control error is expressed in terms of the area control error itself and the frequency deviation coefficient.
[0016] The above scheme constructs physical models of different frequency regulation units in a multi-regional power system, and sequentially establishes the correspondence between the power of each frequency regulation unit and the system frequency deviation, the dynamic model of each frequency regulation unit in a single region, the dynamic model of the power of the inter-regional tie line, and the control error in a single region. This forms a complete dynamic equation for load frequency regulation in a multi-regional power system, which can accurately characterize the dynamic response characteristics of various frequency regulation resources such as coal, gas, energy storage, wind power, and photovoltaics, as well as the power interaction law between regions. It effectively improves the simulation accuracy and dynamic characterization capability of the frequency fluctuation process of complex power systems, and provides accurate and reliable theoretical model support for realizing multi-source collaborative and cross-regional coordinated load frequency optimization control.
[0017] For example, for any of the frequency regulation units, the independent subsystem model of that unit is a second-order model expressed in terms of: the second derivative of the modeled variable; the loop disturbance of the frequency regulation unit; the output gain of the frequency regulation unit; and the virtual frequency deviation of the frequency regulation unit.
[0018] The above scheme constructs an independent subsystem model for each frequency regulation unit, clarifies the mathematical mapping relationship between second-order derivative, loop disturbance, output gain, and virtual frequency deviation, and can accurately characterize the dynamic response characteristics and disturbance propagation law of the frequency regulation unit. This effectively improves the modeling accuracy and control stability of the frequency regulation process, and provides solid theoretical support and technical guarantee for achieving precise frequency regulation and reliable and stable operation of the system.
[0019] For example, the finite-time extended state observer is constructed as follows:
For any of the frequency regulation units, an extended state model is constructed from the independent subsystem model of that unit;
The finite-time extended state observer is then constructed from the extended state model.
Preferably, the extended state model is expressed in terms of: the state variables; the power generation command received by frequency regulation unit m; the output gain to be selected; the total disturbance of the system loop for the different frequency regulation resources; and the derivative, or equivalent rate of change, of the total disturbance.
Preferably, the finite-time extended state observer is expressed in terms of: the observed values of the state variables; the observation error of the state variable; the observer gains; and the exponents applied to the sign-function terms, where sgn denotes the sign function.
[0020] The above scheme adopts a technical approach of constructing a finite-time extended state observer based on an independent subsystem model. First, an extended state model containing the total disturbance of the system loop and its rate of change is established for each frequency regulation unit. Then, a corresponding finite-time extended state observer is designed based on the model. This allows for rapid and accurate real-time observation and error compensation of key information such as the state variables and total disturbance of the frequency regulation unit within a finite time. This effectively suppresses the total disturbance of the system loop existing in different frequency regulation resources, significantly improves the observation accuracy, response speed and anti-interference capability of the frequency regulation system, and ensures the stability and reliability of the frequency regulation control process.
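The observer idea can be sketched on a double-integrator plant with a constant lumped disturbance. This is a sketch under stated assumptions, not the observer of the specification: the homogeneous form with exponents alpha, 2*alpha - 1, and 3*alpha - 2, the bandwidth-style gains, the constant disturbance, and the explicit Euler integration are all illustrative choices.

```python
import math


def sig(e: float, power: float) -> float:
    """Signed power |e|^power * sgn(e), the fractional-power correction term."""
    return math.copysign(abs(e) ** power, e)


def simulate_ft_eso(disturbance=1.0, b0=1.0, t_end=3.0, dt=1e-4,
                    alpha=0.8, omega=10.0):
    """Observe x1' = x2, x2' = f + b0*u with an extended state z3 estimating f.

    All parameter values are hypothetical. Returns the true triple
    (x1, x2, f) and the estimated triple (z1, z2, z3) at time t_end.
    """
    a1, a2, a3 = alpha, 2.0 * alpha - 1.0, 3.0 * alpha - 2.0  # exponents
    k1, k2, k3 = 3.0 * omega, 3.0 * omega ** 2, omega ** 3    # observer gains
    x1 = x2 = 0.0
    z1 = z2 = z3 = 0.0
    u = 0.0  # no control input in this open-loop illustration
    for _ in range(int(round(t_end / dt))):
        e = x1 - z1                          # observation error
        dz1 = z2 + k1 * sig(e, a1)
        dz2 = z3 + b0 * u + k2 * sig(e, a2)
        dz3 = k3 * sig(e, a3)                # extended state tracks the disturbance
        dx1 = x2
        dx2 = disturbance + b0 * u
        x1 += dt * dx1
        x2 += dt * dx2
        z1 += dt * dz1
        z2 += dt * dz2
        z3 += dt * dz3
    return (x1, x2, disturbance), (z1, z2, z3)
```

The extended state `z3` is the estimate of the lumped disturbance that a controller could then cancel. The fractional exponents make the correction relatively stronger as the error shrinks, which is the property usually associated with finite-time convergence.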
[0021] For example, the finite-time synchronization controller is represented by an expression with the following parameters:
[0022] two input gain matrices and a constant gain, together with two auxiliary quantities that are each defined by their own expression.
[0023] The above scheme, by introducing a specified input gain matrix and a constant gain, can achieve precise synchronization of the system state within a finite time, effectively improving the response speed and convergence efficiency of the synchronization control, enhancing the stability and robustness of the control system, and quickly suppressing system disturbances and errors, ensuring that the synchronization process is completed quickly, stably, and accurately. This finite-time synchronization controller can guarantee strong anti-interference capability for different frequency adjustment units, ensuring that the state variables of different frequency adjustment units converge synchronously within a finite time, while also ensuring the robustness of the controller under internal system disturbances and external environmental interference.
[0024] According to another aspect of the present invention, an electronic device is provided, including a processor and a memory, wherein the memory stores a computer program, and the processor is used to execute the computer program to implement the method as described above.
[0025] According to another aspect of the present invention, a computer-readable storage medium is provided, which stores a computer program / instructions that, when executed by a processor, implement the method described above.
[0026] In the above technical solution, by establishing the dynamic equations of the multi-regional power system load frequency regulation model based on the physical models of each frequency regulation unit, and constructing independent subsystem models corresponding to each frequency regulation unit, decoupling modeling of each frequency regulation unit can be achieved, reducing the complexity of controller design. By designing a finite-time extended state observer to estimate the system state variables and total loop disturbances online, and designing a finite-time synchronous controller to generate the output power of each generation unit, time-synchronous control can be used to achieve frequency synchronization convergence of different frequency regulation units. By constructing a multi-regional frequency regulation collaborative optimization algorithm based on the finite-time synchronous controller, the output power allocation of each generation unit can be intelligently optimized, achieving collaborative control of rapid frequency recovery and economical power allocation, effectively improving the system frequency stability, disturbance suppression capability, and optimized operation level of multi-regional collaborative frequency regulation.
[0027] The above description is merely an overview of the technical solution of the present invention. In order to better understand the technical means of the present invention and to implement it in accordance with the contents of the specification, and to make the above and other objects, features and advantages of the present invention more apparent and understandable, specific embodiments of the present invention are described below.
Attached Figure Description
[0028] The above and other objects, features, and advantages of the present invention will become more apparent from the more detailed description of the embodiments of the invention in conjunction with the accompanying drawings. The drawings are provided to further illustrate the embodiments of the invention and form part of the specification. They are used together with the embodiments of the invention to explain the invention and do not constitute a limitation thereof. In the drawings, the same reference numerals generally represent the same parts or steps.
[0029] Figure 1 shows a schematic flowchart of a multi-regional power system load frequency control method according to an embodiment of the present invention;
Figure 2 shows the synchronous convergence results of state variables according to an embodiment of the present invention;
Figure 3 shows a schematic diagram of an agent network structure according to an embodiment of the present invention;
Figure 4 shows a schematic diagram of the training process of MA-GRU-SAC according to an embodiment of the present invention;
Figure 5 shows the reward curve of the agent training process according to an embodiment of the present invention;
Figure 6 shows a comparison of frequency response results according to an embodiment of the present invention;
Figure 7 shows a control structure diagram according to an embodiment of the present invention;
Figure 8 shows a schematic block diagram of an electronic device according to an embodiment of the present invention.
Detailed Implementation
[0030] To make the objectives, technical solutions, and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are merely a part of the embodiments of the present invention, and not all of the embodiments of the present invention. It should be understood that the present invention is not limited to the exemplary embodiments described herein. Based on the embodiments of the present invention described herein, all other embodiments obtained by those skilled in the art without inventive effort should fall within the protection scope of the present invention.
[0031] Research has revealed the following key technical problems in load frequency control for multi-regional power systems:
1. Strong dynamic coupling: In multi-region interconnected power systems, the frequency and power exchange of each region affect each other. Traditional distributed control methods struggle to coordinate the dynamic coupling between regions, resulting in increased system frequency deviation and tie-line power fluctuations.
2. Uncertainty of renewable energy: The large-scale integration of renewable energy sources such as wind power and photovoltaics reduces system inertia and causes frequent power fluctuations. Traditional control strategies based on fixed models struggle to adapt to this rapidly changing operating environment.
3. Limitations of centralized control: Centralized control requires global information, imposes a heavy communication burden, and depends heavily on the central node, making it poorly suited to the increasingly distributed structure of power systems.
4. Reduced power system synchronization performance: When different generator sets or equipment are connected to the power grid, the dynamic characteristics of the power sources in the system diverge, which degrades the overall frequency synchronization performance of the grid and makes coordinated, stable grid-connected operation more difficult.
[0032] To address at least some of the aforementioned technical problems, this invention provides a method, electronic device, and medium for load frequency control in a multi-regional power system. This method utilizes multi-agent reinforcement learning to achieve load frequency control in a multi-regional power system, thereby improving the frequency stability and anti-interference capability of the multi-regional power system. For ease of description, the method, electronic device, and storage medium are described below through specific embodiments.
Example
[0033] This embodiment provides a method for load frequency control in a multi-regional power system. This method is mainly applied to frequency regulation scenarios in multi-regional interconnected systems containing various types of frequency regulation resources (coal-fired units, gas-fired units, energy storage, wind power, and photovoltaics). Figure 1 shows a schematic flowchart of a multi-regional power system load frequency control method according to an embodiment of the present invention. As shown in Figure 1, the method may include the following steps S110, S120, S130, S140, S150 and S160.
[0034] In step S110, based on the physical models of different frequency regulation units, the dynamic equation of the load frequency regulation model of the multi-regional power system is modeled. The multi-regional power system includes power systems in multiple frequency regulation regions. The power system in each frequency regulation region includes at least some of the following frequency regulation units: coal-fired power generation unit, gas-fired power generation unit, energy storage unit, wind power generation unit, and photovoltaic power generation unit.
[0035] Herein, a frequency regulation region refers to a power control area in a multi-regional power system that participates in load frequency control as an independent power-frequency regulation unit. Its internal generating units work together to perform frequency regulation, and the region exchanges power and coordinates control with other regions through tie lines. Frequency regulation units include, but are not limited to, coal-fired power generation units (coal-fired turbines), gas-fired power generation units (gas turbines), energy storage units (energy storage devices), wind power generation units (wind turbine generators), and photovoltaic power generation units (photovoltaic generators).
[0036] In this embodiment, a regional frequency dynamic model (i.e., the dynamic equations) is first established based on the physical model of each frequency regulation unit. Specifically, step S110 may include the following steps S111, S112, S113 and S114.
[0037] In step S111, the correspondence between the power of the different frequency regulation units and the system frequency deviation within the multi-regional power system is established. This correspondence is expressed in terms of the following quantities: the index i of the frequency regulation region and the total number of frequency regulation regions; the system frequency deviation; the output power of the coal-fired power generation unit; the output power of the gas-fired power generation unit; the output power of the energy storage unit; the output power of the wind power generation unit; the output power of the photovoltaic power generation unit; the subscripts that identify the coal-fired power generation unit, the gas-fired power generation unit, the energy storage unit, the wind power generation unit, and the photovoltaic power generation unit, respectively; the regional load disturbance; the tie-line deviation signal; and the load damping constant and the load inertia constant.
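A form consistent with the quantities listed for step S111 is the conventional area frequency balance used in load frequency control studies. The symbol names and sign conventions below are assumptions for illustration and may differ from the expression in the specification.

```latex
M_i \, \Delta \dot{f}_i =
  \Delta P_{c,i} + \Delta P_{g,i} + \Delta P_{e,i} + \Delta P_{w,i} + \Delta P_{pv,i}
  - \Delta P_{d,i} - \Delta P_{tie,i} - D_i \, \Delta f_i,
  \qquad i = 1, \dots, N
```

Here the subscripts c, g, e, w, and pv stand for the coal-fired, gas-fired, energy storage, wind, and photovoltaic units, \Delta P_{d,i} is the regional load disturbance, \Delta P_{tie,i} is the tie-line deviation signal, and D_i and M_i are the load damping and load inertia constants.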
[0038] In the following, to unify the mathematical description of the different frequency regulation resources, the same model structure is used for every resource type, while the time constant, output gain, droop coefficient, and delay parameters are set to different values according to the actual equipment characteristics. A type index distinguishes the different frequency regulation resources.
[0039] In step S112, a dynamic model of different frequency adjustment units within a single frequency modulation region is established. The dynamic model of different frequency adjustment units within a single frequency modulation region is expressed by the following expression: ; ; in, Representing different types of frequency modulation resources; Represents frequency adjustment unit m ; output power; It is a time constant; It is an intermediate variable in the process; It is the delay time constant; It is the primary frequency modulation constant; It is a frequency adjustment unit m The power generation instruction received.
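The two expressions of step S112 are not legible here. The sketch below shows one common way to realize a unit model with the quantities the paragraph lists (output power, time constant, intermediate variable, delay time constant, primary regulation constant, generation command): two cascaded first-order lags with a droop term. This structure, the treatment of the primary regulation constant as a droop value, and all names and numbers are assumptions, not the patent's model.

```python
def simulate_unit(command, delta_f=0.0, droop=0.05, t_delay=0.1, t_unit=0.3,
                  dt=0.001, steps=5000):
    """ASSUMED two-lag unit model, integrated with explicit Euler:
    t_delay * dx/dt = command - delta_f / droop - x   (intermediate variable)
    t_unit  * dp/dt = x - p                           (unit output power)
    Returns the unit output power after steps * dt seconds."""
    x = 0.0  # intermediate process variable
    p = 0.0  # output power deviation of the unit
    for _ in range(steps):
        dx = (command - delta_f / droop - x) / t_delay
        dp = (x - p) / t_unit
        x += dt * dx
        p += dt * dp
    return p


if __name__ == "__main__":
    # The output follows the generation command when the frequency is nominal.
    print(simulate_unit(command=0.1))
```

Different resource types would reuse the same structure with different time constants and droop values, which matches the unified-model idea stated above.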
[0040] In step S113, a dynamic model of the tie-line power between any two frequency regulation areas is established. The formula for this model is not reproduced in this text. Its symbols denote, in order: the first derivative of the tie-line power; the power exchange coefficient between the i-th frequency regulation area and another frequency regulation area; and the frequency deviation of that other frequency regulation area.
[0041] In step S114, the area control error of a single frequency regulation area is determined. The formula for the area control error is not reproduced in this text; its symbols denote the area control error and the frequency deviation coefficient.
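As a hedged illustration of steps S113 and S114, the sketch below uses the textbook forms d(ΔP_tie,i)/dt = 2π · Σ_j T_ij · (Δf_i − Δf_j) and ACE_i = β_i · Δf_i + ΔP_tie,i. Whether the patent's formulas include the 2π factor or use the same sign convention cannot be confirmed from this text, so both forms, and all names below, are assumptions.

```python
import math


def tie_line_power_rate(area, delta_f, exchange_coeff):
    """ASSUMED textbook tie-line model for one area:
    d(dP_tie_i)/dt = 2*pi * sum_j T_ij * (df_i - df_j)."""
    return 2.0 * math.pi * sum(
        exchange_coeff[area][j] * (delta_f[area] - delta_f[j])
        for j in range(len(delta_f)) if j != area
    )


def area_control_error(bias, delta_f_i, delta_p_tie_i):
    """ASSUMED textbook area control error: ACE_i = bias * df_i + dP_tie_i."""
    return bias * delta_f_i + delta_p_tie_i


if __name__ == "__main__":
    delta_f = [0.02, 0.0]              # frequency deviations of two areas
    coeff = [[0.0, 0.5], [0.5, 0.0]]   # hypothetical exchange coefficients
    print(tie_line_power_rate(0, delta_f, coeff))
    print(area_control_error(bias=20.0, delta_f_i=0.01, delta_p_tie_i=0.05))
```

In a two-area system the two tie-line rates are equal and opposite, so power leaving one area enters the other.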
[0042] In step S120, the dynamic equations are decoupled to obtain an independent subsystem model for each frequency regulation unit in a single frequency regulation area.
[0043] The dynamic equations established in step S110 form a set of coupled first-order differential equations, in which the different frequency regulation units are strongly coupled through the common frequency variable. To control the units independently, the individual regulation loops within a single frequency regulation area are decoupled into separate subsystems. In this embodiment, each frequency regulation unit subsystem is first assumed to have a virtual frequency deviation, and the power output of the remaining frequency regulation units is modeled as external interference at the common coupling point. The expression for the virtual frequency deviation of each frequency regulation unit is not reproduced in this text.
[0044] Next, the expression in step S111 is differentiated and combined with steps S112 to S114 to obtain a second-order model of the virtual frequency deviation for each frequency regulation unit (formula not reproduced in this text). The symbols of that model denote the output gain and the loop disturbance of each frequency regulation unit.
[0045] For ease of representation, the above second-order model can be written in a unified form, which gives the independent subsystem model of each frequency regulation unit (formula and an accompanying auxiliary relation not reproduced in this text). Its symbols denote, in order: the second derivative of the virtual frequency deviation; the loop disturbance of the frequency regulation unit; the output gain of the frequency regulation unit; and the virtual frequency deviation of the frequency regulation unit.
[0046] In step S130, a finite-time extended state observer is constructed based on the independent subsystem model corresponding to each frequency regulation unit in a single frequency regulation region to estimate the state and lumped disturbance of different frequency regulation units.
[0047] In this embodiment, the finite-time extended state observer is constructed as follows: for any frequency adjustment unit, an extended state model is constructed based on the independent subsystem model of the frequency adjustment unit; and a finite-time extended state observer is constructed based on the extended state model.
[0048] Specifically, the independent subsystem models of the different frequency regulation units obtained in step S120 can be uniformly described by state-space equations. State variables are defined (definitions not reproduced in this text), and the unknown dynamics, uncertainties, coupling terms and external disturbances of the system are lumped into a total disturbance, which is extended into a new state. This yields the extended state model (formula not reproduced in this text), whose symbols denote: the power generation command received by frequency regulation unit m; the output gain to be selected; and the total disturbance of the system loop for each type of frequency regulation resource. The derivative, or equivalent rate of change, of the total disturbance is assumed to be bounded.
[0049] To obtain estimates of all state variables simultaneously within a finite time, a finite-time extended state observer is designed (formula not reproduced in this text). Its symbols denote, in order: the observed values of the state variables; the observation errors of the state variables; the observer gains; and the exponents applied to the sign-function terms. sgn denotes the sign function.
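The observer formula is not legible here, so the following is a sketch of one widely used homogeneous finite-time extended state observer for a second-order subsystem, built from the ingredients the paragraph names (observer gains, fractional exponents, sign function). The exponent pattern α, 2α−1, 3α−2 with 2/3 < α < 1, the gain values, and all names are assumptions rather than the patent's design, and convergence in practice depends on the chosen gains and step size.

```python
import math


def sig(value, power):
    """Signed power used by finite-time designs: |value|**power * sgn(value)."""
    return math.copysign(abs(value) ** power, value)


def eso_step(z, x1_measured, u, b0, gains, alpha, dt):
    """One explicit-Euler step of an ASSUMED third-order finite-time ESO.
    z = (z1, z2, z3) estimates the state, its derivative and the total
    (lumped) disturbance of  x1'' = total_disturbance + b0 * u."""
    z1, z2, z3 = z
    e = x1_measured - z1                      # observation error
    a1, a2, a3 = alpha, 2 * alpha - 1, 3 * alpha - 2
    dz1 = z2 + gains[0] * sig(e, a1)
    dz2 = z3 + b0 * u + gains[1] * sig(e, a2)
    dz3 = gains[2] * sig(e, a3)               # extended (disturbance) state
    return (z1 + dt * dz1, z2 + dt * dz2, z3 + dt * dz3)


if __name__ == "__main__":
    z = (0.0, 0.0, 0.0)
    z = eso_step(z, x1_measured=1.0, u=0.0, b0=1.0,
                 gains=(30.0, 300.0, 1000.0), alpha=0.8, dt=0.001)
    print(z)
```

Setting α to 1 reduces the sketch to a conventional linear extended state observer, which is a convenient baseline when tuning the gains.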
[0050] In step S140, a finite-time synchronization controller is constructed using the finite-time extended state observer's finite-time estimation results of the system state and lumped disturbances.
[0051] In this embodiment, based on the finite-time extended state observer designed in step S130, a disturbance-compensated finite-time synchronization control law is constructed from the observer's finite-time estimates of the system state and total disturbance. Specifically, to achieve finite-time convergence of the state variables, their estimated values are first obtained as described above. A feedback vector is then constructed, and a finite-time synchronization controller is designed (formulas not reproduced in this text).
[0052] In the controller expression, the symbols denote the input gain matrices and a constant gain. An auxiliary operator used in the expression is defined by a further formula that is likewise not reproduced in this text.
[0053] To compensate for the total disturbance in the system loop, the estimate of the total loop disturbance (i.e., the lumped disturbance) obtained from the finite-time extended state observer is used. Combining this estimate with the initially constructed finite-time stable controller gives the final finite-time synchronization controller (formula not reproduced in this text).
[0054] Figure 2 shows the synchronous convergence of the state variables according to an embodiment of the present invention. As shown in Figure 2, the finite-time synchronization controller ensures that the state variables of the different frequency regulation units converge to 0 simultaneously within a finite time, inside a single 2-second control cycle, which indicates the effectiveness of the controller.
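The control law itself is not legible in this text. As a hedged sketch of the general idea (finite-time state feedback plus cancellation of the estimated lumped disturbance), the code below uses a standard fractional-power feedback for a second-order loop with exponents a/(2−a) and a, 0 < a < 1. This generic law, the scalar gains k1 and k2, and all names are assumptions; the synchronization-specific gain matrices of the embodiment are not reproduced.

```python
import math


def sig(value, power):
    """Signed power: |value|**power * sgn(value)."""
    return math.copysign(abs(value) ** power, value)


def disturbance_compensated_control(z, b0, k1, k2, a):
    """ASSUMED finite-time feedback with disturbance compensation.
    z = (z1, z2, z3): estimated state, estimated derivative, and estimated
    lumped disturbance supplied by an extended state observer."""
    z1, z2, z3 = z
    a1 = a / (2.0 - a)                            # exponent on the state term
    feedback = -k1 * sig(z1, a1) - k2 * sig(z2, a)
    return (feedback - z3) / b0                   # cancel the estimated disturbance


if __name__ == "__main__":
    print(disturbance_compensated_control((8.0, 4.0, 1.0), b0=2.0,
                                          k1=1.0, k2=1.0, a=0.5))
```

Subtracting the disturbance estimate before dividing by the output gain is what turns each decoupled loop into an approximately disturbance-free double integrator for the feedback term to stabilize.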
[0055] In step S150, based on the output of the finite-time synchronization controller, a multi-area frequency regulation collaborative optimization algorithm is constructed. This algorithm employs a centralized training and distributed execution architecture, including multiple agents corresponding one-to-one with the frequency regulation units and a centralized value network. Each agent uses a policy network based on the soft actor-critic (SAC) algorithm, and a gated recurrent unit (GRU) layer is embedded at the front end of both the policy network of each agent and the centralized value network. This algorithm is referred to as the MA-GRU-SAC algorithm.
[0056] To further improve frequency regulation performance, this embodiment builds, on top of the output of the finite-time synchronization controller, a multi-area collaborative optimization scheduling framework for coordinating multiple frequency regulation units. This framework configures a reinforcement learning decision module (i.e., an agent) for each frequency regulation unit and adopts a multi-agent reinforcement learning mechanism with centralized training and distributed execution.
[0057] Figure 3 shows a schematic diagram of an agent network structure according to an embodiment of the present invention. As shown in Figure 3, the policy network (Actor) is used for distributed execution. Its input is the local observation of the corresponding agent. After temporal features are extracted by the gated recurrent unit (GRU) layer and processed by the multilayer perceptron, it outputs the local action command of the agent. The value network (Critic) is used only in the centralized training phase. Its input is the global observation (i.e., the set of observations of all agents) and the joint action of all agents. This input is likewise processed by a gated recurrent unit layer and a multilayer perceptron, and the network outputs the global value estimate (Q-value), which guides the optimization of each policy network during training. In this embodiment, to reduce overestimation bias in Q-value estimation, SAC uses a double Q-network. During policy updates, the smaller of the two Q-values is selected, which reduces the risk of overestimation.
[0058] In this embodiment, the observation space, the action space and the reward function of each agent are each defined by an expression that is not reproduced in this text. The symbols of the reward expression denote the reward function, the weighting coefficients, and the output of each agent.
[0059] In this embodiment, the policy network of each agent is optimized based on the maximum entropy reinforcement learning objective. Specifically, the optimal policy of the Actor network of each agent is defined by an expression that is not reproduced in this text. Its symbols denote, in order: the optimal policy of each agent; the state space of the system; the local observation space of each agent; the action space of each agent; the entropy of each agent's policy; the reward for taking a given action in a given state; the weight of the policy entropy; the maximization over policies; and the expectation over states and actions under the policy.
[0060] The centralized value network (Critic) is trained centrally, taking the joint observations and joint actions of all agents as input to evaluate global state-action pairs. Two symbols (not reproduced in this text) denote the joint observation and the joint action of all m agents at time t. The value function of the centralized value network is constrained by minimizing the Bellman error, so that the value predicted by the Q-network approaches the target value. The centralized value function is given by an expression that is not reproduced in this text. Its symbols denote, in order: the centralized value function of a state-action pair; the reward function of that state-action pair; the value function of the state-action pair at the next time step; the discount factor; the sum over all agents of the policy terms at time t+1, which constrains the entropy of the global policy; the logarithm of the policy; and the expectation over the next action given the observation.
[0061] Each agent's policy network is updated using a loss function whose formula is not reproduced in this text. Its symbols denote, in order: the policy network loss of each agent for its policy parameters; the logarithm of the policy of each agent under those parameters; the centralized value network with its own parameters, which takes the joint observations and joint actions of all agents as input and provides global gradient information for the policy update of each agent; the weight of the policy entropy; the state normalization constant; and the logarithm of that normalization constant, which ensures that the policy probability distribution is normalized.
[0062] The centralized value network is updated using two expressions, a loss function and a target-network update, whose formulas are not reproduced in this text. Their symbols denote, in order: the joint observations of all agents at times t and t+1; the joint actions of all agents at times t and t+1; the centralized value (Q) network, with its parameters, which estimates the value of the current joint state-action pair; the target centralized value network, with its parameters, which estimates the value of the joint state-action pair at the next time step; the immediate reward obtained by all agents after performing a joint action in a joint state; the discount factor, which weights future cumulative returns; the sum over all agents of the policy terms, which reflects the global entropy constraint of the multi-agent system; the entropy regularization coefficient, which adjusts the trade-off between reward and policy randomness; the loss function of the centralized Q-network; the expectation over state-transition samples drawn from the experience replay pool; the soft update coefficient of the target network; the parameters of the centralized target Q-network; the parameters of the centralized Q-network; the action taken by an agent at time t+1; and the observation of an agent at time t+1.
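The update formulas are not legible here. The sketch below shows the standard soft actor-critic pieces that the paragraph's symbol list describes: a soft Bellman target that subtracts the summed log-probabilities of all agents, the smaller of two target Q-values (the double Q-network mentioned for Figure 3), a mean-squared Bellman error, and a soft (Polyak) target update. Scalars and plain lists stand in for networks, and every name is hypothetical; this is not the patent's exact formulation.

```python
def soft_q_target(reward, gamma, next_q1, next_q2, alpha, next_log_probs):
    """ASSUMED soft target: r + gamma * (min(Q1', Q2') - alpha * sum_i log pi_i)."""
    min_next_q = min(next_q1, next_q2)        # smaller Q-value limits overestimation
    return reward + gamma * (min_next_q - alpha * sum(next_log_probs))


def critic_loss(q_values, targets):
    """Mean squared Bellman error over a sampled batch."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


def soft_update(target_params, online_params, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]


if __name__ == "__main__":
    y = soft_q_target(reward=1.0, gamma=0.9, next_q1=2.0, next_q2=3.0,
                      alpha=0.2, next_log_probs=[-0.5, -0.5])
    print(y, critic_loss([1.0, 2.0], [1.0, 4.0]),
          soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

A small soft update coefficient makes the target network trail the online network slowly, which is the usual way to keep the Bellman target stable during training.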
[0063] The SAC algorithm balances exploration and exploitation by automatically adjusting the entropy temperature. The update objective of the policy network is to maximize the entropy-augmented expected return. The entropy term introduces a randomness constraint during policy optimization, which enhances exploration, prevents premature convergence, and dynamically balances return maximization against exploration. The entropy temperature parameter is updated using a loss function whose formula is not reproduced in this text. Its symbols denote the loss value of the policy entropy for each agent's parameters and the initial value of the policy entropy.
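The temperature loss is not legible in this text. A minimal sketch of the usual SAC temperature update is given below, assuming the loss J(α) = E[−α · (log π + H_target)] and assuming that the "initial value of the policy entropy" plays the role of the target entropy H_target. The parameterization through log α and all names are illustrative choices, not taken from the source.

```python
import math


def temperature_loss(log_alpha, log_probs, target_entropy):
    """ASSUMED SAC temperature loss: mean of -alpha * (log_pi + target_entropy)."""
    alpha = math.exp(log_alpha)               # keeps the temperature positive
    return sum(-alpha * (lp + target_entropy) for lp in log_probs) / len(log_probs)


def temperature_step(log_alpha, log_probs, target_entropy, lr=0.01):
    """One gradient-descent step on log_alpha.
    The loss is alpha * c with c independent of alpha, so
    d(loss)/d(log_alpha) = alpha * c = loss."""
    grad = temperature_loss(log_alpha, log_probs, target_entropy)
    return log_alpha - lr * grad


if __name__ == "__main__":
    log_alpha = math.log(0.5)
    batch_log_probs = [-1.0, -3.0]            # policy entropy estimate is 2.0
    print(temperature_loss(log_alpha, batch_log_probs, target_entropy=1.0))
    print(temperature_step(log_alpha, batch_log_probs, target_entropy=1.0))
```

When the policy's entropy is above the target, the loss is positive and the temperature shrinks, which reduces the weight on exploration; the opposite happens when entropy falls below the target.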
[0064] In this embodiment, a GRU network is further introduced into the SAC algorithm; that is, a GRU layer is added before the multilayer perceptron (MLP). The GRU dynamically adjusts the weights of the current input and the historical hidden state through its update and reset gates, which mitigates the vanishing-gradient problem and improves the modeling of long-sequence dependencies. The GRU update formula is not reproduced in this text. Its symbols denote, in order: the new hidden state output at the current time step; the hidden state output at the previous time step; the candidate hidden state at the current time step; the Hadamard product; and the output of the update gate.
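Since the GRU formula is not legible here, the sketch below implements a scalar GRU cell in one common convention, h_t = (1 − z_t) · h_{t−1} + z_t · h̃_t, where z_t is the update gate and h̃_t the candidate state. Other references swap the roles of z_t and 1 − z_t, and which convention the patent uses cannot be determined from this text. The weight dictionary and its keys are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_cell(x, h_prev, w):
    """Scalar GRU step (ASSUMED convention: h = (1 - z) * h_prev + z * h_cand)."""
    z = sigmoid(w["wz"] * x + w["uz"] * h_prev + w["bz"])            # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h_prev + w["br"])            # reset gate
    h_cand = math.tanh(w["wh"] * x + w["uh"] * (r * h_prev) + w["bh"])
    return (1.0 - z) * h_prev + z * h_cand


if __name__ == "__main__":
    keys = ("wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh")
    weights = {k: 0.0 for k in keys}           # all-zero weights for illustration
    h = 1.0
    for observation in (0.3, -0.1, 0.2):       # a short observation sequence
        h = gru_cell(observation, h, weights)
    print(h)
```

In an actor or critic, the hidden state produced by this layer would be fed to the multilayer perceptron, so each decision can depend on the recent history of observations rather than on the latest sample alone.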
[0065] Figure 4 shows a schematic diagram of the MA-GRU-SAC training process according to an embodiment of the present invention. As shown in Figure 4, the training process employs a centralized training and distributed execution framework. First, multiple agents interact with the multi-regional power system environment, outputting control actions based on current state information to adjust the load frequency in each area. The environment responds to these actions with new system states and corresponding reward signals. Each agent continues to explore during the interaction to improve the generalization ability of its policy.
[0066] During the training phase, the states, actions, and rewards of all agents are collected centrally and used for training. A network evaluation mechanism is used to assess the policy network and value network, and an experience replay mechanism repeatedly samples historical data, which breaks correlations between samples and improves data efficiency and training stability. Based on this, gradients are computed and the parameters of the policy and value networks are updated, thereby optimizing the policy of each agent.
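The experience replay mechanism described above can be sketched with a fixed-capacity buffer that stores joint transitions and returns uniformly sampled mini-batches. The class name, method names and tuple layout are hypothetical; the source does not specify the buffer's capacity or sampling scheme, so uniform sampling is an assumption.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of joint transitions (oldest entries are dropped)."""

    def __init__(self, capacity):
        self._storage = deque(maxlen=capacity)

    def add(self, joint_obs, joint_action, reward, next_joint_obs):
        self._storage.append((joint_obs, joint_action, reward, next_joint_obs))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks temporal correlation.
        return rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=3)
    for step in range(5):
        buffer.add([step], [0.1 * step], float(step), [step + 1])
    print(len(buffer), buffer.sample(2))
```

Because the centralized critic needs the observations and actions of all agents, each stored transition holds the joint quantities rather than per-agent tuples.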
[0067] After training, the updated policy parameters are distributed to each distributed agent to achieve independent decision-making and collaborative control. Finally, the trained agent policies are combined with a finite-time synchronous controller and applied to load frequency control in a multi-regional power system to achieve rapid frequency stabilization and inter-regional coordinated control.
[0068] In step S160, the multi-region frequency regulation collaborative optimization algorithm is trained, and the trained multi-region frequency regulation collaborative optimization algorithm is used to control the multi-region power system.
[0069] In this embodiment, the method further includes: during training of the multi-area frequency regulation collaborative optimization algorithm, observing whether the reward function converges and whether the frequency deviation and power output of the different frequency regulation units meet the constraints; and determining that training is complete when the reward function has converged and the frequency deviation and power output of the different frequency regulation units meet the constraints. Otherwise, steps S130 to S150 are repeated. Figure 5 shows a reward curve of the agent training process according to an embodiment of the present invention. As shown in Figure 5, over 1500 rounds of agent training, the average reward and the exploration reward of the agent converge and stabilize after about 200 rounds, indicating that the MA-GRU-SAC scheduling framework can allocate power generation commands to the different frequency regulation units in real time.
[0070] Figure 6 shows a comparison of frequency responses according to an embodiment of the present invention. In Figure 6, the MA-GRU-SAC algorithm of this embodiment and a traditional PID algorithm are each used to control the load frequency of a multi-regional power system. The results show that the method of this embodiment achieves faster frequency convergence and stronger robustness than the traditional PID method, which demonstrates the effectiveness and superiority of the method.
[0071] Figure 7 shows a control structure diagram according to an embodiment of the present invention. As shown in Figure 7, in this embodiment a finite-time extended state observer, a finite-time synchronization controller, and an agent are provided for each frequency regulation unit, with each agent corresponding to a specific frequency regulation unit. Specifically, during operation of the multi-regional power system, the frequency deviation, tie-line power deviation, and related state information of each area are first input into the corresponding finite-time extended state observer to obtain system state estimates and lumped disturbance estimates. The state estimates and lumped disturbance estimates are then input into the corresponding finite-time synchronization controller to generate basic synchronization control quantities. On this basis, each agent uses its local observation information to further optimize the basic synchronization control quantities and outputs optimized control commands for the corresponding frequency regulation units, which are applied to the coal-fired power generation unit, gas-fired power generation unit, energy storage unit, wind power generation unit, and photovoltaic power generation unit, respectively.
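The stopping rule described above (reward convergence plus constraint satisfaction) can be sketched as follows. The source does not say how convergence is measured, so the moving-average comparison, the window length, the tolerance, and the limit values are all assumptions chosen for illustration.

```python
def reward_converged(episode_rewards, window=50, tolerance=0.02):
    """ASSUMED criterion: the mean reward of the last `window` episodes differs
    from the mean of the preceding `window` by at most `tolerance` (relative)."""
    if len(episode_rewards) < 2 * window:
        return False
    recent = sum(episode_rewards[-window:]) / window
    previous = sum(episode_rewards[-2 * window:-window]) / window
    scale = max(abs(previous), 1e-8)          # avoid division by zero
    return abs(recent - previous) / scale <= tolerance


def constraints_met(freq_devs, outputs, max_freq_dev, output_limits):
    """Check frequency-deviation and power-output limits for every unit."""
    freq_ok = all(abs(df) <= max_freq_dev for df in freq_devs)
    power_ok = all(lo <= p <= hi for p, (lo, hi) in zip(outputs, output_limits))
    return freq_ok and power_ok


def training_complete(episode_rewards, freq_devs, outputs,
                      max_freq_dev, output_limits):
    return (reward_converged(episode_rewards)
            and constraints_met(freq_devs, outputs, max_freq_dev, output_limits))


if __name__ == "__main__":
    rewards = [-100.0] * 10 + [-10.0] * 100   # hypothetical reward history
    print(training_complete(rewards, [0.01, -0.02], [0.5, 0.2],
                            max_freq_dev=0.05,
                            output_limits=[(0.0, 1.0), (0.0, 0.4)]))
```

If either check fails, the procedure returns to the observer, controller and algorithm construction steps, as stated in the paragraph.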
[0072] After receiving the optimized control command, each frequency regulation unit dynamically adjusts the frequency of the multi-regional power system and feeds back the adjusted frequency deviation and related state variables to the front-end observer and the input of the intelligent agent, thus forming a closed-loop control structure. This control structure enables rapid coordinated response among different frequency regulation units, improving the frequency stability and frequency regulation performance of the multi-regional power system under load disturbances and fluctuations in renewable energy power.
[0073] Therefore, this embodiment adopts the above finite-time synchronous load frequency control method for multi-regional power systems based on MA-GRU-SAC. By constructing an equivalent model of the multi-regional power system, designing a finite-time synchronization controller, and using the MA-GRU-SAC algorithm, synchronous state convergence of the different frequency regulation units and real-time allocation of generation commands are achieved, thereby improving the robustness of load frequency regulation.
[0074] According to another aspect of the present invention, an electronic device is also provided. Figure 8 shows a schematic block diagram of an electronic device according to an embodiment of the present invention. As shown in Figure 8, the electronic device 800 includes a processor 810 and a memory 820. The memory 820 stores a computer program, which the processor 810 executes to implement the method described above.
[0075] According to another aspect of the present invention, a computer-readable storage medium is also provided. The storage medium stores a computer program / instructions that, when executed by a processor, implement the method described above. The storage medium may, for example, include a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media.
[0076] Those skilled in the art will readily understand the implementation structure, working principle, and beneficial effects of electronic devices and computer-readable storage media by reading the above methods. For the sake of brevity, further details will not be elaborated here.
[0077] Although exemplary embodiments have been described herein with reference to the accompanying drawings, it should be understood that the above exemplary embodiments are merely illustrative and are not intended to limit the scope of the invention. Various changes and modifications can be made therein by those skilled in the art without departing from the scope and spirit of the invention. All such changes and modifications are intended to be included within the scope of the invention as claimed in the appended claims.
[0078] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.
[0079] In the several embodiments provided by this invention, it should be understood that the disclosed devices and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative. For instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not executed.
[0080] Numerous specific details are set forth in the specification provided herein. However, it will be understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this specification.
[0081] Similarly, it should be understood that, in order to streamline the invention and aid in understanding one or more of the various aspects of the invention, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the description of exemplary embodiments of the invention. However, this approach should not be construed as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as reflected in the corresponding claims, its inventive point lies in solving the corresponding technical problem with fewer features than all of those in a single disclosed embodiment. Therefore, the claims following the detailed description are hereby expressly incorporated into that detailed description, wherein each claim itself is a separate embodiment of the invention.
[0082] Those skilled in the art will understand that, apart from the mutual exclusion of features, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or elements of any method or apparatus so disclosed may be combined in any combination. Unless otherwise expressly stated, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature that serves the same, equivalent, or similar purpose.
[0083] Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features but not others included in other embodiments, combinations of features from different embodiments are intended to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments can be used in any combination.
[0084] The various component embodiments of the present invention can be implemented in hardware, or as software modules running on one or more processors, or a combination thereof. Those skilled in the art will understand that microprocessors or digital signal processors (DSPs) can be used in practice to implement some or all of the functions of some modules in the electronic device according to embodiments of the present invention. The present invention can also be implemented as an apparatus program (e.g., a computer program and computer program product) for performing some or all of the methods described herein. Such programs implementing the present invention can be stored on a computer-readable medium or can be in the form of one or more signals. Such signals can be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
[0085] It should be noted that the above embodiments are illustrative of the invention and not restrictive, and that those skilled in the art can devise alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses should not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in the claims. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several different elements and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by the same item of hardware. The use of the words first, second, and third, etc., does not indicate any order. These words can be interpreted as names.
[0086] The above description is merely a specific embodiment of the present invention or an explanation of that embodiment. The scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. The scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A method for load frequency control in a multi-regional power system, characterized in that the method comprises: modeling the dynamic equations of the load frequency regulation model of the multi-regional power system based on the physical models of different frequency regulation units, wherein the multi-regional power system includes the power systems of multiple frequency regulation areas, and the power system of each frequency regulation area includes at least some of the following frequency regulation units: a coal-fired power generation unit, a gas-fired power generation unit, an energy storage unit, a wind power generation unit, and a photovoltaic power generation unit; decoupling the dynamic equations to obtain an independent subsystem model for each frequency regulation unit within a single frequency regulation area; constructing, based on the independent subsystem model of each frequency regulation unit in a single frequency regulation area, a finite-time extended state observer to estimate the state and lumped disturbance of the different frequency regulation units; constructing a finite-time synchronization controller using the finite-time estimates of the system state and lumped disturbance produced by the finite-time extended state observer; constructing, based on the output of the finite-time synchronization controller, a multi-area frequency regulation collaborative optimization algorithm, wherein the algorithm adopts a centralized training and distributed execution architecture and includes agents corresponding one-to-one with the frequency regulation units and a centralized value network, each agent adopts a policy network based on the soft actor-critic algorithm, and a gated recurrent unit layer is embedded at the front end of the policy network of each agent and of the centralized value network; and training the multi-area frequency regulation collaborative optimization algorithm, and using the trained algorithm to control the multi-regional power system.
2. The method according to claim 1, characterized in that the method further comprises: during training of the multi-area frequency regulation collaborative optimization algorithm, observing whether the reward function converges and whether the frequency deviation and power output of the different frequency regulation units meet the constraints; and determining that training is complete when the reward function has converged and the frequency deviation and power output of the different frequency regulation units meet the constraints.
3. The method according to claim 1, characterized in that the policy network of each agent is optimized based on the maximum entropy reinforcement learning objective, and the value function of the centralized value network is constrained by minimizing the Bellman error.
4. The method according to claim 1, characterized in that the policy network of each agent is updated using a function whose formula is not reproduced in this text, the symbols of which denote, in order: the policy network loss of each agent for its policy parameters; the logarithm of the policy of each agent under those parameters; the centralized value network with its own parameters; the weight of the policy entropy; the state normalization constant; and the logarithm of that normalization constant; and the centralized value network is updated using two functions whose formulas are not reproduced in this text, the symbols of which denote, in order: the joint observations of all agents at times t and t+1; the joint actions of all agents at times t and t+1; the value estimate of the current joint state-action pair by the centralized value network with its parameters; the value estimate of the joint state-action pair at the next time step by the target centralized value network with its parameters; the immediate reward obtained by all agents after performing a joint action in a joint state; the discount factor; the sum over all agents of the policy terms; the entropy regularization coefficient; the loss function of the centralized value network; the expectation over state-transition samples drawn from the experience replay pool; the soft update coefficient of the target network; the parameters of the centralized target Q-network; the parameters of the centralized Q-network; the action taken by an agent at time t+1; and the observation of an agent at time t+1.
5. The method according to claim 1, characterized in that modeling the dynamic equations of the load frequency regulation model of the multi-regional power system based on the physical models of the different frequency regulation units comprises: establishing the relationship between the power of the different frequency regulation units and the system frequency deviation within the multi-regional power system; establishing dynamic models of the different frequency regulation units within a single frequency regulation region; establishing a dynamic model of the tie-line power between any two frequency regulation regions; and determining the area control error of a single frequency regulation region.
Preferably, the relationship between the power of the different frequency regulation units and the system frequency deviation within the multi-regional power system is expressed by the following expression: [formula not reproduced in the source text]; where i is the index of the frequency regulation region and runs up to the total number of frequency regulation regions, and the remaining terms denote, respectively: the system frequency deviation; the output power of the coal-fired power generation unit; the output power of the gas-fired power generation unit; the output power of the energy storage unit; the output power of the wind power generation unit; the output power of the photovoltaic power generation unit; the identifiers of the coal-fired power generation unit, the gas-fired power generation unit, the energy storage unit, the wind power generation unit, and the photovoltaic power generation unit, respectively; the regional load disturbance; the tie-line deviation signal; and the load damping constant and the load inertia constant.
Preferably, the dynamic model of the different frequency regulation units within a single frequency regulation region is expressed by the following expressions: [formulas not reproduced in the source text]; where m denotes the type of frequency regulation resource, and the remaining terms denote, respectively: the output power of frequency regulation unit m; a time constant; an intermediate process variable; the delay time constant; the primary frequency regulation constant; and the power generation command received by frequency regulation unit m.
Preferably, the tie-line power dynamic model is expressed by the following expression: [formula not reproduced in the source text]; where the terms denote, respectively: the first derivative appearing in the model; the power exchange coefficient between two frequency regulation regions; and the frequency deviation of a given frequency regulation region.
Preferably, the area control error is expressed by the following expression: [formula not reproduced in the source text]; where the terms denote the area control error and the frequency bias coefficient.
6. The method according to any one of claims 1-5, characterized in that, for any one of the frequency regulation units, the independent subsystem model of the frequency regulation unit is expressed as: [formula not reproduced in the source text]; where the terms denote, respectively: a second derivative appearing in the model, together with an associated defining relation [not reproduced in the source text]; the loop disturbance of the frequency regulation unit; the output gain of the frequency regulation unit; and the virtual frequency deviation of the frequency regulation unit.
7. The method according to any one of claims 1-5, characterized in that the finite-time extended state observer is constructed as follows: for any one of the frequency regulation units, constructing an extended state model based on the independent subsystem model of the frequency regulation unit; and constructing the finite-time extended state observer based on the extended state model.
Preferably, the extended state model is expressed by the following expression: [formula not reproduced in the source text]; where, in addition to the defining relations of the state variables [not reproduced in the source text], the terms denote, respectively: the power generation command received by frequency regulation unit m; the output gain to be selected; the total disturbance of the system loop; the index m over the different frequency regulation resources; and the derivative, or equivalent rate of change, of the total disturbance.
Preferably, the finite-time extended state observer is expressed by the following expression: [formula not reproduced in the source text]; where the terms denote, respectively: the observed values of the state variables; the observation errors of the state variables; the observer gains; and the power gains of the sign-function terms, sgn denoting the sign function.
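8. The method according to any one of claims 1-5, characterized in that the finite-time synchronization controller is expressed by the following expression: [formula not reproduced in the source text]; where the terms denote the input gain matrices and a constant gain, together with two further defining relations [not reproduced in the source text].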
9. An electronic device, characterized in that it comprises a processor and a memory, wherein the memory stores a computer program and the processor is configured to execute the computer program to implement the method according to any one of claims 1-8.
10. A computer-readable storage medium, characterized in that it stores a computer program or instructions which, when executed by a processor, implement the method according to any one of claims 1-8.
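
Illustrative note on claim 4. The claim's formulas are not reproduced in this text, so the following is only a minimal sketch of how a centralized value-network update of this kind is commonly written in soft actor-critic methods. It assumes the standard entropy-regularized target (reward plus discounted next-step value minus the entropy coefficient times the sum of the agents' log-policies), a mean-squared Bellman error, and a Polyak-style soft update of the target parameters. The function names `soft_td_target`, `critic_loss`, and `soft_update` are hypothetical and do not come from the patent.

```python
import numpy as np

def soft_td_target(reward, gamma, q_next, alpha, next_log_probs):
    """Entropy-regularized target for a centralized critic (assumed standard SAC form).

    reward         : immediate joint reward
    gamma          : discount factor
    q_next         : target-network value of the next joint observation-action pair
    alpha          : entropy regularization coefficient
    next_log_probs : log-probability of each agent's action at time t+1
    """
    return reward + gamma * (q_next - alpha * np.sum(next_log_probs))

def critic_loss(q_pred, targets):
    """Mean squared Bellman error over a batch sampled from the replay pool."""
    q_pred = np.asarray(q_pred, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return float(np.mean((q_pred - targets) ** 2))

def soft_update(target_params, online_params, tau):
    """Move each target parameter a fraction tau toward the online parameter."""
    return [(1.0 - tau) * t + tau * p for t, p in zip(target_params, online_params)]
```

With two agents, `soft_td_target(1.0, 0.9, 2.0, 0.2, [-1.0, -0.5])` adds an entropy bonus of 0.2 × 1.5 to the next-step value before discounting, which is why higher-entropy joint policies receive larger targets under this assumed form.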
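
Illustrative note on claim 5. The tie-line and area-control-error expressions are not reproduced in this text. The sketch below therefore uses the textbook load frequency control forms as an assumption: the area control error is the tie-line power deviation plus the frequency bias coefficient times the frequency deviation, and the tie-line power deviation changes at a rate proportional to the difference between the two regions' frequency deviations. The factor of 2π and all function and parameter names are assumptions for illustration, not taken from the patent.

```python
import math

def area_control_error(delta_p_tie, beta, delta_f):
    """Area control error of one region (assumed textbook form).

    delta_p_tie : tie-line power deviation of the region
    beta        : frequency bias coefficient
    delta_f     : frequency deviation of the region
    """
    return delta_p_tie + beta * delta_f

def tie_line_power_rate(t_ij, delta_f_i, delta_f_j):
    """Rate of change of the tie-line power deviation between regions i and j.

    t_ij is the power exchange (synchronizing) coefficient; the 2*pi factor
    is a convention that depends on the units chosen for frequency.
    """
    return 2.0 * math.pi * t_ij * (delta_f_i - delta_f_j)

def euler_step(value, rate, dt):
    """One explicit Euler integration step."""
    return value + dt * rate
```

For example, a region exporting 0.02 p.u. more than scheduled while its frequency is 0.01 Hz low, with a bias coefficient of 0.5, has `area_control_error(0.02, 0.5, -0.01)` equal to 0.015 under this assumed form.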
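
Illustrative note on claim 7. The observer expression is not reproduced in this text; the claim only states that it involves observed values, observation errors, observer gains, and power gains applied to sign-function terms. The sketch below assumes a common third-order structure for such observers: two states of a second-order subsystem plus one extended state for the total disturbance, each corrected by a term of the form gain × |e|^power × sgn(e). The state layout, the error sign convention, and the names `sig` and `eso_step` are all assumptions for illustration.

```python
import math

def sig(e, power):
    """Fractional-power sign term |e|**power * sgn(e), with sig(0, power) == 0."""
    return math.copysign(abs(e) ** power, e)

def eso_step(z, y, u, b0, gains, powers, dt):
    """One explicit Euler step of an assumed finite-time extended state observer.

    z      : (z1, z2, z3) estimates of output, its derivative, and total disturbance
    y      : measured output
    u      : control input (power generation command)
    b0     : output gain
    gains  : (beta1, beta2, beta3) observer gains
    powers : (a1, a2, a3) exponents of the sign-function terms
    """
    z1, z2, z3 = z
    e = z1 - y  # observation error (assumed sign convention)
    dz1 = z2 - gains[0] * sig(e, powers[0])
    dz2 = z3 + b0 * u - gains[1] * sig(e, powers[1])
    dz3 = -gains[2] * sig(e, powers[2])
    return (z1 + dt * dz1, z2 + dt * dz2, z3 + dt * dz3)
```

Exponents below 1 make the correction relatively stronger for small errors than a linear gain would, which is the usual motivation for using fractional-power terms in finite-time observers.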