Multi-agent autonomous collaborative micro-grid regulation method and system

By constructing state vectors and multi-factor payoff functions and combining them with a multi-round strategy adoption and update process, the strategy boundaries are adjusted dynamically. This addresses the problems of static strategy construction and the lack of collaborative feedback mechanisms in multi-agent distribution-microgrid regulation, and achieves adaptive optimization and collaborative stability of the system.

CN120879787B · Active · Publication Date: 2026-06-23 · STATE GRID JIANGSU ELECTRIC POWER CO LTD NANTONG POWER SUPPLY BRANCH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
STATE GRID JIANGSU ELECTRIC POWER CO LTD NANTONG POWER SUPPLY BRANCH
Filing Date
2025-07-22
Publication Date
2026-06-23

Abstract

The application discloses a multi-agent autonomous collaborative microgrid regulation and control method and system, relating to the technical field of coordinated control of distribution networks and microgrids. The method includes: constructing intelligent agents from the microgrid and power distribution entities and defining a regulation strategy space; constructing a revenue function that expresses each agent's strategy preference and resource boundary; executing multiple rounds of strategy evolution through an evolutionary game mechanism, in which the strategy adoption probability is updated according to the revenue function and the relative strategy advantage value; and, based on the regulation execution results, collecting operating deviation data and coordination indices, dynamically correcting each agent's revenue function, and synchronously adjusting the strategy boundary. By constructing multi-agent intelligent agents, introducing an evolutionary game strategy update mechanism, and using operating deviation and coordination score feedback to correct the revenue function and strategy boundary, the method achieves dynamic evolution, autonomous optimization, and collaborative stability of the regulation strategy in the microgrid environment, which improves the autonomy, responsiveness, and overall robustness of the system.

Description

Technical Field

[0001] This invention relates to the field of distribution network and microgrid coordinated control technology, specifically a multi-agent autonomous collaborative distribution-microgrid control method and system.

Background Technology

[0002] Coordinated regulation of distribution networks and microgrids is one of the key research directions in smart grids. Current mainstream research focuses on centralized optimal scheduling, distributed control algorithms, and multi-agent system models. Among these, multi-agent models are increasingly being applied to distribution-microgrid operation and management due to their adaptability to complex environments, ability to handle local states, and certain decision-making capabilities. Simultaneously, the introduction of intelligent algorithms such as evolutionary game theory and reinforcement learning has enhanced the response capability of the control system to dynamic disturbances and inter-device interactions, providing theoretical and algorithmic support for autonomous control mechanisms.

[0003] Although existing research has achieved some autonomous and collaborative goals, significant technical bottlenecks remain. Current multi-agent control models generally rely on static policy sets or predefined rules, making it difficult to adapt to the frequent changes in operating states and the heterogeneity of participating entities in distribution-microgrids. The lack of efficient collaborative feedback mechanisms and dynamic adjustment capabilities among multiple agents limits the strategy optimization space when facing problems such as power fluctuations, time delay response, and regional cooperation imbalances.

[0004] Furthermore, most existing methods are oriented toward objective function optimization and neglect behavioral stability, adaptive adjustment, and strategy convergence during strategy evolution, making it difficult to achieve continuous strategy optimization and overall system robustness. For example, Chinese invention patent CN120200256A constructs a distribution network (DN) structure containing multiple microgrids. Its day-ahead scheduling optimization model for DNs containing IEMGs focuses on operating costs and constraints: for the DN, it solves a non-convex nonlinear optimization problem that includes line losses; for the IEMGs, it handles a convex optimization problem that considers the uncertainty of WT and PV output power. Power exchange between the DN and the IEMGs takes place through connecting lines, so a two-layer model framework is constructed to analyze the system. The solution is a two-layer iterative optimization process in which the outer layer uses the ATC method to obtain the interaction power values between the DN and the IEMGs, and the inner layer uses the C&CG method to solve the stochastic distributionally robust optimization problem of the IEMGs. That method can effectively reduce system carbon emissions and operating costs while ensuring stable operation of the DN, and provides a theoretical basis for promoting the efficient utilization of ammonia energy in the DN. CN120073668A discloses a method and system for joint optimization operation of multiple smart microgrids considering carbon capture. That method considers the differences between smart microgrids in different regions and the different characteristics of the various loads in each smart microgrid. It applies carbon capture technology, activating the direct air capture operation mode for carbon dioxide during low electricity price periods and periods of high renewable energy generation, and constructs a joint optimization operation model for multiple smart microgrids that aims to minimize the overall cost. By solving this model, the joint optimization operation strategy of the multiple smart microgrids is obtained. That invention optimizes the joint operation of smart microgrids of multiple types connected to the distribution network in multiple regions. It makes full use of the spatial flexibility of new sources and loads, takes into account the impact of uncertain factors on the various devices in the smart microgrid, and considers the application of new carbon capture technology. While ensuring the safe and stable operation of the microgrid, it reduces the system's operating cost. Both of these methods are objective-oriented, and their calculation processes are extremely complex and cumbersome.

Summary of the Invention

[0005] In view of the above-mentioned problems, the present invention is proposed.

[0006] Therefore, the technical problem solved by this invention is that existing multi-agent microgrid control methods construct strategies statically and lack dynamic adaptability, lack mechanisms for behavioral stability and strategy convergence during evolution, and lack a collaborative feedback mechanism among the agents, resulting in poor system robustness. The invention also addresses how to achieve strategy evolution among multiple agents, self-adjustment of the payoff function, and dynamic synchronous adjustment of boundary constraints.

[0007] To solve the above-mentioned technical problems, the present invention provides the following technical solution:

[0008] First, this invention provides a multi-agent autonomous collaborative distribution-microgrid control method, comprising:

[0009] Each node in the microgrid and distribution network is abstracted as an independent intelligent agent node, and a state vector is defined for each agent node. The elements in the state vector include allowable boundaries, which are the influence relationships of the current agent node on other factors.

[0010] Based on the allowed boundaries, data sets under multiple dimensions are constructed, and the combination of all data sets in all dimensions forms a pre-selected strategy set. Based on the impact of each group of strategies in the pre-selected strategy set on the current node, a multi-factor function structure for evaluating the strategy returns is constructed to obtain the comprehensive return value of the current strategy. The multi-factor function structure is used to reflect the actual return level of the group of strategies in the current state.

[0011] A multi-round strategy adoption and update process is adopted to iteratively optimize the strategy selection behavior of each node, thereby ensuring that each agent node obtains the best strategy. The strategy selection trajectory, comprehensive benefit change curve and strategy replacement number of each agent node are recorded in the evolution process. Each iteration in the multi-round strategy adoption and update process represents a process of simultaneous evaluation and possible update of all strategies.

[0012] Several actual operational indicators are collected to generate a strategy execution deviation score. The strategy execution deviation score is used to quantify the degree of deviation between the strategy and the expected control target. Based on the cooperation relationship between microgrids in this iteration, the coordination score of each subject is evaluated. After each strategy evolution cycle is completed, if the strategy execution deviation score of a node is higher than the set upper limit threshold, the multi-factor function structure corresponding to it is adjusted. If the coordination score of a node is lower than the set coordination tolerance threshold, the system will perform boundary compression processing on the generated pre-selected strategy set. By limiting the upper limit of the adjustment capability and the lower limit of the control time window in its state vector, the current strategy space is convergently compressed, and the compressed strategy set is used as the update input for the next round of multi-round strategy adoption and update process.

[0013] Furthermore, including:

[0014] The multi-factor function structure includes the calculation of economic expenditure per unit of regulated power, the efficiency assessment of response delay time, and the penalty score for violations caused by voltage or power fluctuations. An adjustment coefficient is set for each factor and weighted synthesis is performed to obtain the comprehensive benefit value of the current strategy.

[0015] The economic expenditure is calculated from the adjustment operation defined in the current strategy and the resulting expenditure per unit of regulated power in the current state. The efficiency evaluation of the response delay time measures the response time from the adoption of the strategy to its execution, compares it with the system target response time to obtain the response delay time difference, and allocates a responsiveness factor score according to the size of that difference.

[0016] The penalty scoring for violations caused by voltage or power fluctuations includes incorporating the strategy into the system safety constraint model, using simulation to detect whether it may cause voltage or power parameters to exceed the limit boundaries, and calculating a violation penalty score accordingly.
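The weighted synthesis of the three factors in [0014] to [0016] can be illustrated with a minimal Python sketch. The weight values, the linear responsiveness mapping, the sign convention (cost and penalty reduce the payoff), and all names such as `PayoffWeights` and `comprehensive_payoff` are assumptions made for illustration; the text does not fix them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PayoffWeights:
    """Adjustment coefficients for the three factors (illustrative values)."""
    economic: float = 0.5
    response: float = 0.3
    safety: float = 0.2


def responsiveness_score(response_time_s: float, target_time_s: float) -> float:
    """Map the response delay difference to a score in [0, 1].

    Assumption: the score falls linearly with the delay beyond the target
    and reaches 0 once the delay is as large as the target time itself.
    """
    delay = max(0.0, response_time_s - target_time_s)
    return max(0.0, 1.0 - delay / target_time_s)


def comprehensive_payoff(cost_per_unit: float, response_time_s: float,
                         target_time_s: float, violation_penalty: float,
                         w: PayoffWeights = PayoffWeights()) -> float:
    """Weighted synthesis: expenditure and penalty lower the payoff,
    responsiveness raises it (assumed sign convention)."""
    return (-w.economic * cost_per_unit
            + w.response * responsiveness_score(response_time_s, target_time_s)
            - w.safety * violation_penalty)
```

Under these assumptions, a strategy that responds on target and causes no violation is penalized only by its economic expenditure.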

[0017] Furthermore, including:

[0018] The system safety constraint model includes voltage constraints and power constraints, wherein the voltage constraint is expressed as:

[0019] $V_{\min} \le V_s \le V_{\max}$

[0020] where $V_s$ represents the voltage at node $s$ after the strategy is applied, and $V_{\min}$ and $V_{\max}$ indicate the lower and upper limits of the voltage;

[0021] The power constraint is expressed as:

[0022] $\left| P_s - P_s^{\text{ref}} \right| \le \Delta P_{\max}$

[0023] where $P_s$ indicates the injected power after the strategy takes effect, $P_s^{\text{ref}}$ represents the reference power, and $\Delta P_{\max}$ indicates the maximum tolerable deviation.

[0024] Furthermore, including:

[0025] The penalty points for the violation are expressed as follows:

[0026]

[0027] where $S_{\text{safe}}$ represents the violation penalty score, $\alpha_v$ represents the voltage violation weight, $\alpha_p$ represents the power violation weight, $\epsilon_v$ indicates the voltage tolerance threshold, $\epsilon_p$ represents the power tolerance threshold, and $N$ represents the set of nodes affected by the strategy.
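As an illustration of how the safety constraints and the violation penalty could fit together, the following Python sketch checks each affected node against the voltage and power limits and accumulates a weighted penalty. The exact penalty expression is not reproduced in this text, so the hinge form used here (only the excess beyond the tolerance threshold is penalized), the function name, and the tuple layout are assumptions.

```python
def violation_penalty(nodes, v_min, v_max, dp_max,
                      alpha_v=1.0, alpha_p=1.0, eps_v=0.0, eps_p=0.0):
    """Accumulate a violation penalty over the nodes affected by a strategy.

    nodes: iterable of (voltage, injected_power, reference_power) tuples,
           taken from a simulation of the strategy (hypothetical layout).
    Assumed form: for each node, the amount by which the voltage leaves
    [v_min, v_max] and the amount by which |P - P_ref| exceeds dp_max are
    reduced by the tolerance thresholds, clipped at zero, and weighted.
    """
    total = 0.0
    for v, p, p_ref in nodes:
        # Distance outside the allowed voltage band (0 if inside)
        v_excess = max(v_min - v, v - v_max, 0.0)
        # Power deviation beyond the maximum tolerable deviation (0 if within)
        p_excess = max(abs(p - p_ref) - dp_max, 0.0)
        total += alpha_v * max(v_excess - eps_v, 0.0)
        total += alpha_p * max(p_excess - eps_p, 0.0)
    return total
```

A strategy whose simulated voltages and powers stay inside the limits receives a penalty of zero under this sketch.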

[0028] Furthermore, including:

[0029] The multi-round policy adoption and update process iteratively optimizes the policy selection behavior of each node to ensure that each agent node obtains the best policy, including:

[0030] For each round of evolution, the evolution step size, policy update frequency and maximum number of iterations are set. After receiving local and neighborhood policy information, each agent calculates the probability of its corresponding policy being adopted based on the current comprehensive benefit value and the comparative benefit of other agents.

[0031] The calculation employs a transformation function based on strategy advantage, which is calculated from the difference between the target subject's current payoff and the target strategy payoff. The strategy change result in the current round is determined by sampling. After the subject's strategy is updated, it participates in the next round of payoff reassessment, forming a multi-round cascading evolution mechanism.

[0032] Determine whether the strategy has reached a stable state or the maximum number of rounds has been reached. If so, terminate the iteration; otherwise, proceed to the next round of iteration.

[0033] Furthermore, including:

[0034] The probability of the corresponding strategy being adopted is expressed as a parameterized Sigmoid function, with the following specific expression:

[0035] $P_{\text{adopt}} = \dfrac{1}{1 + e^{-k \cdot \Delta U \cdot \eta}}$

[0036] where $P_{\text{adopt}}$ represents the probability of the current strategy being adopted in the next round; $\Delta U$ represents the strategy advantage value, that is, the payoff difference; $k$ represents the evolutionary temperature coefficient, which indicates the sensitivity to strategy differences; and $\eta$ represents the neighborhood trust factor, which reflects the degree of trust of the node in the neighborhood payoff judgment and is dynamically adjusted according to historical consistency. $\Delta U \cdot \eta$ together form the curvature input of the Sigmoid function, reflecting the combined effect of payoff differences and neighborhood recognition. This function ensures that the adoption probability is close to 1 when the payoff advantage is obvious, while adoption is conservative when the payoff difference is ambiguous or the neighborhood trust is low.
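The adoption rule and the multi-round loop described in [0030] to [0036] can be sketched as follows. This is a minimal illustration, not the patented procedure: it assumes the Sigmoid takes k·ΔU·η as its argument, that ΔU is the candidate strategy's payoff minus the current one, that each agent compares itself with the best strategy in its neighborhood, and that "stable" means no agent changed strategy in a round. All function names are hypothetical.

```python
import math
import random


def adoption_probability(delta_u: float, k: float, eta: float) -> float:
    """Sigmoid of k * delta_u * eta, evaluated in a numerically stable way."""
    x = k * delta_u * eta
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def maybe_switch(current, candidate, payoff, k, eta, rng):
    """Sample whether an agent replaces `current` with `candidate`."""
    delta_u = payoff[candidate] - payoff[current]  # assumed sign convention
    if rng.random() < adoption_probability(delta_u, k, eta):
        return candidate
    return current


def evolve(strategies, payoff, neighbors, k=5.0, eta=1.0, max_rounds=50, seed=0):
    """Run synchronous rounds until no agent changes or max_rounds is hit.

    strategies: dict agent -> strategy id
    payoff:     dict strategy id -> comprehensive payoff value
    neighbors:  dict agent -> list of neighboring agents
    Returns the final strategy assignment and the number of rounds used.
    """
    rng = random.Random(seed)
    for round_no in range(1, max_rounds + 1):
        updated = {}
        for agent, current in strategies.items():
            # Candidate: best-paying strategy currently used in the neighborhood
            best = max((strategies[n] for n in neighbors[agent]),
                       key=lambda s: payoff[s], default=current)
            updated[agent] = maybe_switch(current, best, payoff, k, eta, rng)
        if updated == strategies:  # stable state reached
            return updated, round_no
        strategies = updated
    return strategies, max_rounds
```

Setting `eta` close to 0 flattens the probability toward 0.5 regardless of the payoff difference, which is one way to read the "conservative adoption under low neighborhood trust" behavior described above.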

[0037] Furthermore, including:

[0038] The collection of four types of actual operational indicators generates a strategy execution deviation score, which is used to quantify the degree of deviation between the strategy and the expected control target, including:

[0039] Actual operating indicators are collected in real time from four dimensions: power flow regulation deviation, voltage fluctuation offset, strategy response delay, and neighborhood disturbance amplification. A strategy execution deviation score is calculated for each subject. The power flow regulation deviation is calculated from the difference between the actual power flow value generated after the current strategy is executed and the expected power change value of the strategy. The voltage fluctuation offset is derived from the difference in node voltage change before and after strategy execution. The strategy response delay is recorded by the time difference between the control command issuance time and the actual start of the strategy's regulation behavior. The neighborhood disturbance amplification is obtained by sensing the degree of disturbance of the operating parameters of other nodes in its communication neighborhood after the current node executes the strategy.

[0040] Furthermore, including:

[0041] The power flow regulation deviation is expressed as:

[0042]

[0043] where the expression gives the power flow regulation deviation score for power distribution entity a in terms of the actual active power of entity a after the strategy is executed and the strategy's expected power for entity a; ε represents a small positive number that prevents division by zero.

[0044] The neighborhood disturbance amplification is expressed as:

[0045]

[0046] where the expression gives the neighborhood disturbance amplification score of power distribution entity a; $N_a$ represents the number of affected agent nodes within the neighborhood of entity a; for each neighbor node j, the expression uses the node voltage after and before the strategy is executed and the injected power after and before the strategy is executed; $\omega_V$ represents the weight of the voltage disturbance effect, and $\omega_P$ represents the weight of the power disturbance effect.
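The two indicators above can be illustrated with a short Python sketch. The exact expressions are not reproduced in this text, so the forms below are assumptions chosen to match the stated definitions: a relative error with ε in the denominator for the power flow regulation deviation, and a weighted mean of absolute voltage and power changes over the affected neighbors for the neighborhood disturbance amplification. Function names and the tuple layout are hypothetical.

```python
def power_flow_deviation(p_actual, p_expected, eps=1e-6):
    """Relative gap between actual and expected power.

    eps keeps the ratio finite when the expected power is zero.
    """
    return abs(p_actual - p_expected) / (abs(p_expected) + eps)


def neighborhood_disturbance(neighbors, w_v=0.5, w_p=0.5):
    """Average weighted disturbance over the affected neighbor nodes.

    neighbors: list of (v_before, v_after, p_before, p_after) tuples,
               one per affected neighbor node j (hypothetical layout).
    """
    if not neighbors:
        return 0.0  # no affected neighbors, no disturbance
    total = sum(w_v * abs(v_after - v_before) + w_p * abs(p_after - p_before)
                for v_before, v_after, p_before, p_after in neighbors)
    return total / len(neighbors)
```

In practice the voltage and power terms would likely need to be expressed in comparable units (for example per-unit values) before weighting; the sketch leaves that choice to the caller.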

[0047] Furthermore, including:

[0048] The assessment of the collaborative scores of each entity based on the collaborative relationships between microgrids in this iteration includes:

[0049] The coordination score of each agent is evaluated from the following dimensions: consistency of command response between the current agent and its neighbors, distribution of coordination instruction execution delay, and record of behavioral stability during historical evolution.

[0050] Furthermore, the command response consistency score $S_{\text{sync}}$ is used to compare the consistency between the current subject's actual response to a command and that of its neighborhood:

[0051]

[0052] where $\theta_i$ represents the control response angle of the current subject, $\theta_j$ represents the response state of the j-th node in the neighborhood, $\theta_{\max}$ represents the maximum difference in response states, and $N$ represents the number of neighboring entities. $S_{\text{sync}}$ takes values in [0,1], and a higher value indicates a more consistent response. The control response angle represents the degree of deviation of the current subject's control behavior from the direction of neighboring subjects, mapped to angular coordinates;

[0053] The coordination instruction execution delay score $S_{\text{delay}}$ is expressed as:

[0054]

[0055] where the expression uses the actual response time of the current entity and the reference response time issued by the system, and $T_{\text{tol}}$ represents the system's maximum tolerable response time. If the tolerable value is exceeded, then $S_{\text{delay}} = 0$.

[0056] The behavioral stability score $S_{\text{stab}}$ is expressed as:

[0057]

[0058] where $n_{\text{change}}$ indicates the number of times the current subject has changed its strategy in the last 10 rounds of evolution, derived from the state records of the multi-round strategy adoption and update process;

[0059] The final coordination score is calculated as a weighted average:

[0060] $S_{\text{coop}} = 0.4 \cdot S_{\text{sync}} + 0.3 \cdot S_{\text{delay}} + 0.3 \cdot S_{\text{stab}}$;

[0061] where $S_{\text{coop}}$ represents the coordination score; a higher value indicates a higher degree of cooperation from the subject towards the collaborative goal. $S_{\text{coop}}$ takes values in [0,1] and is used to determine whether strategy boundary compression is needed.
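The three component scores and their weighted combination can be sketched in Python as follows. Only the 0.4 / 0.3 / 0.3 weighting and the rule that the delay score is 0 beyond the tolerable response time come from the text; the component formulas (mean normalized angle gap, linear decay between the reference and tolerable times, and one minus the change rate over a 10-round window) are assumptions that merely respect the stated [0,1] ranges.

```python
def sync_score(theta_i, neighbor_thetas, theta_max):
    """1 minus the mean normalized gap to the neighbors' response states."""
    if not neighbor_thetas:
        return 1.0  # nothing to disagree with
    mean_gap = sum(abs(theta_i - t) for t in neighbor_thetas) / len(neighbor_thetas)
    return max(0.0, 1.0 - mean_gap / theta_max)


def delay_score(t_actual, t_ref, t_tol):
    """1 at or below the reference time, 0 beyond the tolerable time,
    linear in between (assumed shape)."""
    if t_actual > t_tol:
        return 0.0
    if t_actual <= t_ref:
        return 1.0
    return 1.0 - (t_actual - t_ref) / (t_tol - t_ref)


def stability_score(n_change, window=10):
    """Fewer strategy changes in the last `window` rounds give a higher score."""
    return max(0.0, 1.0 - n_change / window)


def coordination_score(s_sync, s_delay, s_stab):
    """Weighted average with the weights stated in the text."""
    return 0.4 * s_sync + 0.3 * s_delay + 0.3 * s_stab
```

For example, an agent with `s_sync = 0.75`, `s_delay = 0.5`, and `s_stab = 0.7` obtains a coordination score of about 0.66 under these formulas.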

[0062] Furthermore, including:

[0063] Based on the collaborative relationships between microgrids in this iteration, after evaluating the collaborative scores of each entity, the following also applies:

[0064] The coordination score is uploaded to the local scheduling fusion node for processing, and a collaborative evaluation matrix is constructed as an external correction factor for the multi-factor function structure.

[0065] The local scheduling fusion node is established by dividing the power distribution area into multiple sub-regions during system operation. Each sub-region is equipped with a fusion node. At the end of the evolution cycle, all intelligent agent nodes report to the corresponding fusion node. The fusion node constructs the following collaborative evaluation matrix $M_{\text{coop}}$ from the collected data:

[0066] $M_{\text{coop}} = \begin{bmatrix} D_1 & S_{\text{coop},1} \\ D_2 & S_{\text{coop},2} \\ \vdots & \vdots \\ D_n & S_{\text{coop},n} \end{bmatrix}$

[0067] where $D_i$ represents the strategy execution deviation score of the i-th intelligent agent node, $S_{\text{coop},i}$ represents the coordination score of the i-th intelligent agent node, and $n$ represents the total number of intelligent agent entities participating in the evolutionary game. Each row of the matrix corresponds to an entity participating in the game evolution. The first column stores the regulation deviation score of the entity, and the second column stores its coordination score.
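A fusion node's bookkeeping can be sketched in a few lines of Python. The sketch assumes each agent reports a pair (deviation score, coordination score) and that the matrix is later scanned against the two thresholds mentioned in the method; the function names and threshold parameters are hypothetical.

```python
def build_coop_matrix(reports):
    """Build the n x 2 evaluation matrix from per-agent reports.

    reports: list of (deviation_score, coordination_score), one per agent.
    Column 0 holds the deviation score, column 1 the coordination score.
    """
    return [[float(dev), float(coop)] for dev, coop in reports]


def flag_agents(matrix, dev_upper, coop_lower):
    """Return the row indices that need payoff correction and the row
    indices that need strategy boundary compression."""
    needs_payoff_adjust = [i for i, (dev, _) in enumerate(matrix) if dev > dev_upper]
    needs_compression = [i for i, (_, coop) in enumerate(matrix) if coop < coop_lower]
    return needs_payoff_adjust, needs_compression
```

For instance, with reports `[(0.1, 0.9), (0.5, 0.4)]` and thresholds 0.2 and 0.6, only the second agent (index 1) is flagged for both corrections.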

[0068] In a second aspect, the present invention also provides a multi-agent autonomous collaborative distribution-microgrid control system, including an agent modeling and strategy setting module, a game evolution strategy updating module, and a feedback correction and boundary adjustment module;

[0069] The agent modeling and strategy setting module is used to abstract each node in the microgrid and distribution network as an independent intelligent agent node, and to define a state vector for each agent node. The elements in the state vector include allowable boundaries, which are the influence relationships of the current agent node on other factors.

[0070] The game evolution strategy update module is used to construct a data set under multiple dimensions based on the allowed boundary, and the combination of all dimensions of the data set forms a pre-selected strategy set; based on the impact of each group of strategies in the pre-selected strategy set on the current node, a multi-factor function structure for evaluating the strategy payoff is constructed to obtain the comprehensive payoff value of the current strategy, and the multi-factor function structure is used to reflect the actual payoff level of the group of strategies in the current state;

[0071] A multi-round strategy adoption and update process is adopted to iteratively optimize the strategy selection behavior of each node, thereby ensuring that each agent node obtains the best strategy. The strategy selection trajectory, comprehensive benefit change curve and strategy replacement number of each agent node are recorded in the evolution process. Each iteration in the multi-round strategy adoption and update process represents a process of simultaneous evaluation and possible update of all strategies.

[0072] The feedback correction and boundary adjustment module is used to collect several actual operation indicators to generate a strategy execution deviation score. The strategy execution deviation score is used to quantify the degree of deviation between the strategy and the expected control target. Based on the cooperation relationship between microgrids in this iteration, the collaborative score of each subject is evaluated. After each strategy evolution cycle is completed, if the strategy execution deviation score of a node is higher than the set upper limit threshold, the multi-factor function structure corresponding to it is adjusted. If the collaborative score of a node is lower than the set collaborative tolerance threshold, the system will perform boundary compression processing on the basis of the generated pre-selected strategy set. By limiting the upper limit of the adjustment capability and the lower limit of the control time window in its state vector, the current strategy space is convergently compressed, and the compressed strategy set is used as the update input for the next round of multi-round strategy adoption and update process.

[0073] Thirdly, the present invention also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, characterized in that the processor executes the computer program to implement the steps of the multi-agent autonomous collaborative distribution-microgrid control method described above.

[0074] Fourthly, the present invention also provides a computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the multi-agent autonomous collaborative distribution-microgrid control method described above.

[0075] The beneficial effects of this invention are:

[0076] (1) The multi-agent autonomous collaborative distribution-microgrid control method provided by this invention constructs a state vector and a set of strategies, and establishes a payoff function through a refined agent modeling mechanism. This payoff function includes multiple factors, such as the economic expenditure calculation per unit of regulated electricity, the efficiency assessment of response delay time, and the penalty score for violations caused by voltage or power fluctuations. The penalty score is obtained through a system security constraint model. Therefore, it achieves the beneficial effect of enabling the control strategy to have differentiated, adaptive, and multi-objective evaluation capabilities, providing both mathematical and engineering support for distributed evolutionary game theory.

[0077] (2) Through an evolutionary game mechanism, multiple rounds of strategy evolution and policy updates are performed. Each current node calculates its relative policy advantage value based on the difference between its current policy's overall payoff and the overall payoffs of other policies in its neighborhood. This advantage value is then used as an input variable in a preset policy adoption probability function to generate a policy adoption probability. Based on this probability, it is determined whether to perform a policy replacement in the current round. If a policy replacement is determined, the main node selects a new policy from its pre-selected policy set and recalculates the overall payoff index of that policy according to the multi-factor function structure, using it as the input for the next round of evolution. This method achieves the beneficial effects of avoiding local optima, enhancing the policy evolution capability driven by local information, and balancing flexibility and stability. It allows the system to achieve high-quality regulation without relying on a global optimum solution.

[0078] (3) This invention constructs a strategy execution deviation score to quantify the degree of deviation between the strategy and the expected control target, and a coordination score to determine whether strategy boundary compression is necessary. If the strategy execution deviation score of an entity is higher than a set upper limit threshold, the structure of its corresponding comprehensive payoff function is adjusted: the weight of the economic factor is appropriately reduced, and the penalty coefficient of the safety factor is increased. The adjustment ratio is adaptively updated based on the recorded historical payoff fluctuation trend to ensure the smoothness and convergence of weight changes.

[0079] Specifically, if an agent's collaboration score falls below a set collaboration tolerance threshold, the system will perform boundary compression based on the generated pre-selected strategy set. By limiting the upper limit of the adjustment capability and the lower limit of the control window in its state vector, the system convergently compresses the current strategy space and uses the compressed strategy set as its update input in the next round of evolutionary game. Ultimately, the correction process completes a closed-loop adjustment mechanism from strategy evolution feedback to strategy space definition, enabling the comprehensive payoff function and strategy set to continuously and dynamically evolve according to actual execution and collaboration performance. This improves the adaptability, stability, and robustness of the multi-agent system. Furthermore, by correcting the payoff function based on feedback and simultaneously adjusting the strategy boundary, the system achieves the beneficial effects of strengthening the adaptive capability of the payoff model, improving the stability of multi-agent collaboration, and supporting two-way feedback regulation of behavior and strategy.
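The closed-loop correction in [0078] and [0079] can be sketched as a single feedback step. The threshold values, the fixed adjustment ratio, the renormalization of the weights, and the dictionary keys (`economic`, `safety`, `max_adjust_kw`, `min_window_s`) are assumptions for illustration; in particular, the text describes an adaptively updated adjustment ratio, which this sketch replaces with a constant.

```python
def adjust_weights(weights, ratio=0.1):
    """Shift a share of the economic weight to the safety factor,
    then renormalize so the weights still sum to 1."""
    w = dict(weights)
    shift = w["economic"] * ratio
    w["economic"] -= shift
    w["safety"] += shift
    total = sum(w.values())
    return {name: value / total for name, value in w.items()}


def compress_boundary(state, factor=0.8):
    """Shrink the strategy space: lower the upper limit of the adjustment
    capability and raise the lower limit of the control time window."""
    s = dict(state)
    s["max_adjust_kw"] = state["max_adjust_kw"] * factor
    s["min_window_s"] = state["min_window_s"] / factor
    return s


def feedback_step(deviation, coop, weights, state,
                  dev_upper=0.2, coop_lower=0.6):
    """Apply payoff correction and/or boundary compression for one agent."""
    if deviation > dev_upper:
        weights = adjust_weights(weights)
    if coop < coop_lower:
        state = compress_boundary(state)
    return weights, state
```

The compressed state would then be used to regenerate the pre-selected strategy set for the next round of evolution.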

[0080] (4) By constructing a multi-agent intelligent agent, introducing an evolutionary game strategy update mechanism, and combining operational deviation and collaborative scoring feedback to correct the payoff function and strategy boundary, this invention realizes the dynamic evolution, autonomous optimization and collaborative stability of the control strategy in the distribution-microgrid environment, thereby improving the system's autonomy, responsiveness and overall robustness.

Attached Figure Description

[0081] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0082] Figure 1 is an overall flowchart of the multi-agent autonomous collaborative distribution-microgrid control method provided in the first embodiment of the present invention;

[0083] Figure 2 is a schematic diagram of the multi-agent autonomous collaborative distribution-microgrid control system provided in the second embodiment of the present invention.

Detailed Implementation

[0084] To make the above-mentioned objects, features, and advantages of the present invention more apparent and understandable, specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the protection scope of the present invention.

[0085] Example 1: referring to Figure 1, as an embodiment of the present invention, a multi-agent autonomous collaborative distribution-microgrid control method is provided, comprising:

[0086] S1: Construct intelligent agents based on microgrids and power distribution entities and define the control strategy space. Construct a revenue function to express the strategy preferences and resource boundaries of each entity.

[0087] Each node in the microgrid and distribution network is abstracted as an independent intelligent agent node, and a state vector is defined for each agent node, including the node's current state, electrical parameters, regulation capability, and allowable boundaries.

[0088] Based on the cost of electricity, voltage stability constraints, and adjustable load capacity, a set of pre-selected strategies is constructed for each distribution entity. The strategy set includes multi-dimensional decision variables, including power regulation, load reduction, energy storage charging and discharging, and interaction information rate that the distribution node can participate in.

[0089] In this invention, the power distribution entity refers to each power distribution node in the power system that participates in the control method of this invention.

[0090] Furthermore, for each microgrid and distribution node in the power distribution system that is abstracted as a smart agent, a pre-selected strategy set for that distribution entity is first constructed based on its defined state vector, combined with the current energy cost, voltage stability constraints, and the adjustable load capacity of the distribution entity. The construction process is as follows:

[0091] State vector input: Extract the regulatory capacity and allowable boundaries directly from the current state vector of the agent node.

[0092] Strategy variable dimension identification: Each strategy in the strategy set consists of multiple dimensions, including but not limited to power regulation, load reduction, energy storage charging and discharging, and interaction information rate that the distribution node can participate in.

[0093] Strategy combination generation: For each decision dimension, enumerate several possible values within the allowed boundaries. The combinations of these values across all dimensions form the pre-selected strategy set.

[0094] Output: Generates a set of pre-selected strategies for each power distribution entity, which serves as input for subsequent revenue evaluation and strategy evolution.
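
The construction steps above can be sketched as a Cartesian product over per-dimension candidate values. This is only an illustrative sketch: the dimension names, the candidate values, and the sign convention for storage power are assumptions and do not come from the specification.

```python
from itertools import product

# Hypothetical candidate values per decision dimension, assumed to be
# already restricted to the agent's allowable boundaries.
candidates = {
    "power_regulation_kw": [-50.0, 0.0, 50.0],
    "load_reduction_kw": [0.0, 20.0],
    "storage_kw": [-30.0, 0.0, 30.0],  # negative = charging (assumed convention)
    "info_rate_hz": [1.0, 10.0],
}

def build_strategy_set(candidates):
    """Return every combination of per-dimension values as a list of dicts."""
    names = list(candidates)
    return [dict(zip(names, combo)) for combo in product(*candidates.values())]

strategy_set = build_strategy_set(candidates)
print(len(strategy_set))  # 3 * 2 * 3 * 2 = 36
```

Full enumeration grows multiplicatively with the number of values per dimension, so a practical implementation would likely keep the per-dimension grids coarse.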

[0095] In this embodiment, the power distribution entity under consideration is a distribution network, and the allowable boundary is a time period under consideration. The corresponding pre-selected strategy set is then a multi-dimensional strategy set covering all distribution network nodes within the allowable boundary. It includes the power regulation, load reduction, energy storage charging and discharging, and interaction information rate that each distribution node can participate in at different times. Each strategy set contains the specific data of the current distribution node in these four dimensions within the allowable boundary.

[0096] A multi-factor function structure for evaluating the benefits of strategies is constructed based on the state vectors of each power distribution entity. This structure includes calculation of the economic expenditure per unit of regulated power, efficiency assessment of response delay time, and penalty scoring for violations caused by voltage or power fluctuations.

[0097] Furthermore, after obtaining the pre-selected strategy set, each strategy needs to be quantitatively evaluated. The evaluation method is based on the existing information in the state vectors of each agent, constructing a set of multi-factor function structures to reflect the actual payoff level of the strategy in the current state.

[0098] The construction process is as follows:

[0099] Input information includes the current node's state vector and the single policy combination to be evaluated.

[0100] Calculation process: Based on the adjustment operations defined in the current strategy, calculate the unit adjustment power consumption caused by the operation under the current conditions. Combine this with the power cost information to obtain the corresponding economic expenditure. The economic expenditure is power consumption multiplied by cost.

[0101] Furthermore, the regulation operations include power regulation, load reduction, energy storage charging and discharging, and interactive information rate.

[0102] The response time from adoption to execution of the strategy is evaluated and compared with the system target response time to obtain the response delay time difference. The responsiveness factor score is then assigned based on the magnitude of the difference. The system target response time refers to the upper limit of the acceptable response time predetermined during the design of a certain target in the system, usually in milliseconds or seconds.

[0103] The strategy is incorporated into the system safety constraint model, and simulation is used to detect whether it may cause voltage or power parameters to exceed the limit boundaries, and a violation penalty score is calculated accordingly.

[0104] Furthermore, the system safety constraint model includes voltage constraints and power constraints. Voltage constraints are expressed as:

[0105] $V_{\min} \le V_s \le V_{\max}$

[0106] where $V_s$ represents the voltage at node $s$ after the strategy is applied, and $V_{\min}$ and $V_{\max}$ represent the lower and upper voltage limits, set to 0.95 pu and 1.05 pu, respectively, in this invention.

[0107] The power constraint is expressed as:

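[0108] $\left| P_s^{\mathrm{inj}} - P_s^{\mathrm{ref}} \right| \le \Delta P_{\max}$

[0109] where $P_s^{\mathrm{inj}}$ represents the injected power after the strategy takes effect, $P_s^{\mathrm{ref}}$ represents the reference power, and $\Delta P_{\max}$ represents the maximum tolerable deviation.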

[0110] The penalty points for violations are expressed as follows:

[0111]

[0112] where $S_{\mathrm{safe}}$ represents the violation penalty score, $\alpha_v$ the voltage violation weight, $\alpha_p$ the power violation weight, $\epsilon_v$ the voltage tolerance threshold, $\epsilon_p$ the power tolerance threshold, and $N$ the set of nodes affected by the strategy.
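
As an illustration of how such a penalty could be computed, the sketch below checks each affected node against the voltage and power constraints and accumulates a weighted penalty. The hinge-style form (only the excess beyond the tolerance threshold is penalized), the function name, and all default weights and thresholds are assumptions; the specification names the symbols but the exact expression is not reproduced here.

```python
V_MIN, V_MAX = 0.95, 1.05  # per-unit voltage limits stated in the text

def violation_penalty(nodes, dp_max, alpha_v=1.0, alpha_p=1.0, eps_v=0.0, eps_p=0.0):
    """Sum an assumed hinge-style penalty over the nodes affected by a strategy.

    nodes: iterable of (voltage_pu, p_injected, p_reference) tuples.
    dp_max: maximum tolerable power deviation.
    """
    total = 0.0
    for v, p_inj, p_ref in nodes:
        # Distance outside the allowed voltage band (0 when inside).
        v_excess = max(0.0, V_MIN - v, v - V_MAX)
        # Distance beyond the allowed power deviation (0 when inside).
        p_excess = max(0.0, abs(p_inj - p_ref) - dp_max)
        total += alpha_v * max(0.0, v_excess - eps_v)
        total += alpha_p * max(0.0, p_excess - eps_p)
    return total
```

A strategy that keeps every node inside both constraints receives a penalty of zero under this sketch.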

[0113] Output results: The three factor values serve as the economic efficiency, responsiveness, and safety scores of the strategy in the current state, providing a basis for the subsequent synthesis of the overall return value.

[0114] The factor scores from the three dimensions mentioned above are weighted and combined to obtain the comprehensive return value of each strategy under the current agent node state. The specific process is as follows:

[0115] Input information: economic factor value, responsiveness factor value, and safety factor value.

[0116] Adjustment coefficient settings: Economic factor weight set to 0.4. Responsiveness factor weight set to 0.3. Safety factor weight set to 0.3.

[0117] Multiply the economic factor by 0.4, the responsiveness factor by 0.3, and the safety factor by 0.3. Add the three products together to obtain the overall payoff of the strategy. The output is a real number representing the overall payoff of this strategy combination, used for comparison, selection, and updating during the strategy game phase.
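
The weighted combination can be sketched as follows. The sketch assumes that each factor has already been mapped onto a common scale on which a higher value is better (for example, a low expenditure yielding a high economic score). That mapping is not specified in the text, and the function names are hypothetical.

```python
def economic_expenditure(energy_kwh, price_per_kwh):
    """Economic expenditure = power consumption multiplied by cost."""
    return energy_kwh * price_per_kwh

def overall_payoff(econ, resp, safety, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three factor scores with the default weights from the text."""
    w_econ, w_resp, w_safe = weights
    return w_econ * econ + w_resp * resp + w_safe * safety

payoff = overall_payoff(econ=0.8, resp=0.6, safety=1.0)
print(round(payoff, 2))  # 0.8
```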

[0118] S2: Execute multiple rounds of strategy evolution through an evolutionary game mechanism, and update the strategy adoption probability based on the payoff function and the relative strategy advantage value.

[0119] After completing the construction of the pre-selected strategy set for each intelligent agent node in S1 and calculating the corresponding comprehensive strategy benefit index, the strategy evolution stage begins.

[0120] This step iteratively optimizes the strategy selection behavior of each agent through an evolutionary game mechanism, specifically including multiple rounds of strategy adoption and updating processes, as follows:

[0121] First, a fixed evolution step size, policy update frequency, and maximum number of iterations are set for each round of evolution. In each iteration, each intelligent agent node obtains the currently selected policy from its corresponding pre-selected policy set and uses the comprehensive benefit value of the policy (output by step S1) as the performance of the current policy.

[0122] Furthermore, this invention sets the evolution step size to 1 round, meaning each iteration represents a process of simultaneous evaluation and possible updates of all subject policies. The policy update frequency is set to determine the policy update immediately after each evaluation, i.e., the update frequency is once per step. The maximum number of iterations is set to 100 rounds; exceeding this number of rounds will forcibly terminate the evolution.

[0123] Each intelligent agent node selects its current policy from its pre-selected policy set by prioritizing the policy with the highest overall return value U2 as the initial policy. If multiple policies have the same highest return value, one is randomly selected from these candidates as the initial policy. This method ensures that policy initialization has optimal bias while preserving necessary evolutionary diversity.

[0124] Each node simultaneously receives the current strategies and corresponding overall return values $U_1, \dots, U_N$ of the other agents within its neighborhood as reference information for the game. Based on this, the relative strategy advantage value ΔU of the current strategy is calculated from the difference between the overall return of the current strategy and the overall returns of the other strategies in the neighborhood. This advantage value is then used as the input variable of a preset strategy adoption probability function to generate the strategy adoption probability.

[0125] Furthermore, the difference between the overall return of the current strategy and the overall returns of the other strategies within the neighborhood is obtained by subtracting the average overall return of the current strategies of the nodes in the communication neighborhood from the overall return of the current strategy of the distribution node itself. Because there may be multiple communication neighborhood nodes, the difference in overall returns is expressed as:

[0126] $\Delta U = U_i - \dfrac{1}{|N_i|} \sum_{j \in N_i} U_j$, where $U_i$ is the overall return of the current strategy of node $i$, $N_i$ is its communication neighborhood, and $U_j$ is the overall return of the current strategy of neighbor $j$.

[0127] The policy adoption probability function adopts a parameterized sigmoid function form. The control parameters of the function are determined by three types of information: first, the evolutionary temperature set by the system; second, the magnitude of the current policy change; and third, the neighborhood trust factor. These three factors jointly influence the tendency and probability distribution of policy replacement.

[0128] Furthermore, the policy adoption probability function adopts a parameterized sigmoid function form, with the specific expression as follows:

[0129] $P_{\mathrm{adopt}} = \dfrac{1}{1 + e^{-k \cdot \Delta U \cdot \eta}}$

[0130] where $P_{\mathrm{adopt}}$ represents the probability that the current strategy is adopted (retained) in the next round; ΔU represents the aforementioned relative strategy advantage value; $k$ represents the evolutionary temperature coefficient, which indicates the sensitivity to strategy differences, takes values in [0.5, 5.0], and is set to 1.5 by default in this invention; and $\eta$ represents the neighborhood trust factor, which reflects the degree of trust a node places in its assessment of neighborhood returns, is dynamically adjusted based on historical consistency, and has an initial value of 1. The product ΔU·η forms the curvature input term of the sigmoid function, reflecting the combined effect of return differences and neighborhood acceptance. This function ensures that the adoption probability is close to 1 when the return advantage is obvious, while remaining conservative when the return difference is ambiguous or the neighborhood trust is low.
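
A minimal sketch of the advantage value and the adoption probability follows. It assumes the standard logistic form 1 / (1 + exp(-k·η·ΔU)), which is one reading of the "parameterized sigmoid" described here. The function names and the handling of an empty neighborhood are assumptions.

```python
import math

def relative_advantage(u_self, neighbor_returns):
    """Own overall return minus the mean overall return of the neighborhood."""
    if not neighbor_returns:
        return 0.0  # assumption: no neighbors means no measurable advantage
    return u_self - sum(neighbor_returns) / len(neighbor_returns)

def adoption_probability(delta_u, k=1.5, eta=1.0):
    """Probability of retaining the current strategy (assumed logistic form)."""
    z = k * eta * delta_u
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for strongly negative z
    return e / (1.0 + e)

delta_u = relative_advantage(0.8, [0.6, 0.7])
p_adopt = adoption_probability(delta_u)
```

With the default k = 1.5 and η = 1, a zero advantage gives a retention probability of exactly 0.5, and the probability rises monotonically with the advantage.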

[0131] Based on the calculated adoption probability $P_{\mathrm{adopt}}$, the system determines whether to perform a strategy replacement in the current round. If a replacement is determined, the agent node selects a new strategy from its pre-selected strategy set and recalculates the comprehensive return index of that strategy using the multi-factor function structure defined in S1; the result is used as the input for the next round of evolution.

[0132] Furthermore, the specific method for determining strategy replacement is as follows: for each agent node, a random number in the range [0,1] is generated in the current round. If the random number is greater than the calculated adoption probability $P_{\mathrm{adopt}}$, the strategy is replaced; otherwise, the current strategy is retained. Once a replacement is determined, the agent node selects a strategy from its pre-selected strategy set according to the following rule:

[0133] Excluding the current strategy, a weighted random selection is performed from the remaining strategy options, based on the overall return value. This means that strategies with higher overall returns have a greater probability of being selected, but strategies with lower returns may still be sampled. This mechanism combines greedy prioritization with perturbation exploration, balancing optimality with evolutionary diversity and avoiding early entrapment in local optima.
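
The replacement decision and the return-weighted selection can be sketched as below. The sketch assumes non-negative overall returns so that they can be used directly as sampling weights. The function name, the small weight floor, and the use of a seeded generator are illustrative choices, not part of the specification.

```python
import random

def next_strategy(current, payoffs, p_adopt, rng):
    """Return the strategy id to use in the next round.

    payoffs: dict mapping strategy id -> overall return (assumed non-negative).
    A uniform draw greater than p_adopt triggers a replacement.
    """
    if rng.random() <= p_adopt:
        return current  # current strategy is retained
    others = [s for s in payoffs if s != current]
    if not others:
        return current
    # Higher-return strategies are more likely, but low-return ones can still be drawn.
    weights = [max(payoffs[s], 0.0) + 1e-9 for s in others]
    return rng.choices(others, weights=weights, k=1)[0]

rng = random.Random(42)
payoffs = {"a": 0.9, "b": 0.5, "c": 0.1}
chosen = next_strategy("a", payoffs, p_adopt=0.7, rng=rng)
```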

[0134] This iterative process involves multiple rounds of evolution and updates, forming a cascading strategy evolution driven by layered feedback loops. A strategy stability assessment mechanism is also implemented to determine whether the evolutionary process should terminate. The criteria for termination include: the strategy remaining unchanged for several consecutive rounds; the average overall return increase across all agent nodes being less than a set return increase threshold; or the maximum set number of iterations (100) being reached. The current evolutionary process terminates when any one of these conditions is met.

[0135] Furthermore, in this embodiment, the threshold for the number of consecutive rounds without strategy changes is set to 10 rounds. The profit increase threshold of this invention is 0.5%, that is, the difference between the average profit of the current round and the previous round divided by the profit of the previous round is less than 0.005.
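
The three termination criteria can be combined into a single check, sketched below with the thresholds stated in this embodiment. The function and parameter names are hypothetical, and skipping the gain test when the previous average is zero is an assumption made to avoid division by zero.

```python
def should_terminate(round_idx, unchanged_rounds, prev_avg, curr_avg,
                     max_rounds=100, stable_rounds=10, gain_threshold=0.005):
    """Return True when any one of the three stopping conditions holds.

    round_idx: number of completed evolution rounds.
    unchanged_rounds: consecutive rounds without any strategy change.
    prev_avg, curr_avg: average overall return in the previous and current round.
    """
    if round_idx >= max_rounds:
        return True
    if unchanged_rounds >= stable_rounds:
        return True
    if prev_avg != 0:
        gain = (curr_avg - prev_avg) / prev_avg
        if gain < gain_threshold:
            return True
    return False
```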

[0136] To ensure the traceability of the evolution process, a state recording mechanism is set up to record the strategy selection trajectory, overall benefit change curve, and number of strategy replacements of each subject during the evolution process, for use in subsequent analysis and adjustment steps.

[0137] S3: Based on the results of regulation and control, collect operational deviation data and collaborative indicators, dynamically correct the revenue functions of each entity, and simultaneously adjust the strategy boundaries.

[0138] After completing multiple rounds of strategy evolution in S2 and forming the final selected strategies and their comprehensive return indicators for each subject in the current round, the strategy evaluation and feedback correction phase begins.

[0139] First, based on the current policy term output by S2 and the corresponding trajectory of changes in overall revenue, combined with the state vector and allowable boundaries defined in S1, the effect of the regulation execution is evaluated in real time.

[0140] Four types of actual operating indicators are collected, including calculating the power flow regulation deviation based on the current strategy execution status, evaluating the voltage fluctuation offset caused by the strategy implementation, recording the difference between the strategy response delay and the target response time, and detecting the disturbance amplitude in the neighborhood to reflect the degree of strategy spillover effect.

[0141] Furthermore, the power flow regulation deviation is calculated from the difference between the actual power flow value generated after the current strategy is executed and the expected power change value of the strategy. The voltage fluctuation offset is the difference in node voltage change before and after strategy execution. The difference between the strategy response delay and the target response time is calculated by recording the time difference between the control command issuance time and the actual start of the strategy's regulation behavior. The neighborhood disturbance amplitude is calculated by sensing the degree of disturbance to the operating parameters (voltage, frequency, power) of other nodes in its communication neighborhood after the current node executes the strategy.

[0142] The power flow regulation deviation is expressed as:

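[0143] $D_a^{\mathrm{flow}} = \dfrac{\left| P_a^{\mathrm{act}} - P_a^{\mathrm{exp}} \right|}{\left| P_a^{\mathrm{exp}} \right| + \varepsilon}$

[0144] where $D_a^{\mathrm{flow}}$ represents the power flow regulation deviation score of power distribution entity $a$, $P_a^{\mathrm{act}}$ represents the actual active power of power distribution entity $a$ after the strategy is executed, $P_a^{\mathrm{exp}}$ represents the power expected by the strategy for power distribution entity $a$, and $\varepsilon$ represents a small positive number that prevents division by zero.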

[0145] The magnitude of the disturbance in the neighborhood is expressed as:

[0146] $D_a^{\mathrm{nbr}} = \dfrac{1}{N_a} \sum_{j=1}^{N_a} \left( \omega_V \left| V_j^{\mathrm{post}} - V_j^{\mathrm{pre}} \right| + \omega_P \left| P_j^{\mathrm{post}} - P_j^{\mathrm{pre}} \right| \right)$

[0147] where $D_a^{\mathrm{nbr}}$ represents the neighborhood disturbance amplitude score of power distribution entity $a$; $N_a$ represents the number of affected agent nodes within the neighborhood of power distribution entity $a$; $V_j^{\mathrm{post}}$ and $V_j^{\mathrm{pre}}$ represent the node voltage of neighboring node $j$ after and before the strategy is executed; $P_j^{\mathrm{post}}$ and $P_j^{\mathrm{pre}}$ represent the injected power of neighboring node $j$ after and before the strategy is executed; $\omega_V$ represents the weight of the voltage disturbance; and $\omega_P$ represents the weight of the power disturbance.
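
The two indicators can be sketched in code as follows, assuming a relative-error form for the power flow deviation and a weighted mean of absolute voltage and power changes for the neighborhood disturbance. Both forms, the function names, and the default weights are assumptions inferred from the symbol definitions.

```python
def flow_deviation(p_actual, p_expected, eps=1e-6):
    """Relative gap between actual and expected active power (eps avoids division by zero)."""
    return abs(p_actual - p_expected) / (abs(p_expected) + eps)

def neighborhood_disturbance(neighbors, w_v=0.5, w_p=0.5):
    """Mean weighted change in voltage and injected power over affected neighbors.

    neighbors: list of (v_pre, v_post, p_pre, p_post) tuples, one per neighbor.
    """
    if not neighbors:
        return 0.0
    total = sum(
        w_v * abs(v_post - v_pre) + w_p * abs(p_post - p_pre)
        for v_pre, v_post, p_pre, p_post in neighbors
    )
    return total / len(neighbors)
```

Voltage and power changes have different units, so in practice both would likely be expressed per unit before weighting.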

[0148] After the four types of indicators are processed in a unified manner, a strategy execution deviation score $S_{\mathrm{dev}}$ is generated, which quantifies the degree of deviation between the strategy and the expected control target.

[0149] Furthermore, the unified processing includes normalization and weighting. First, the four types of indicators are normalized to [0,1], and then weighted. A weight coefficient is set for each indicator: power flow regulation deviation weight: 0.35, voltage fluctuation offset weight: 0.3, strategy response delay weight: 0.2, and neighborhood disturbance amplitude weight: 0.15.

[0150] Finally, the four categories of indicators are summed according to their weights to generate a strategy execution deviation score.
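
A sketch of the normalization and weighting step is given below. The text does not state how the indicators are normalized, so the sketch assumes each raw value is divided by a caller-supplied worst-case bound and clipped to [0,1]. The bound values and all names are hypothetical.

```python
# Weights from the text; keys are illustrative labels for the four indicators.
WEIGHTS = {"flow": 0.35, "voltage": 0.30, "delay": 0.20, "neighborhood": 0.15}

def normalize(value, upper):
    """Scale by an assumed worst-case bound and clip to [0, 1]."""
    if upper <= 0:
        raise ValueError("upper must be positive")
    return min(max(value / upper, 0.0), 1.0)

def deviation_score(raw, upper_bounds, weights=WEIGHTS):
    """Weighted sum of the four normalized indicators."""
    return sum(
        weights[name] * normalize(raw[name], upper_bounds[name])
        for name in weights
    )

bounds = {"flow": 0.2, "voltage": 0.05, "delay": 10.0, "neighborhood": 1.0}
raw = {"flow": 0.1, "voltage": 0.0, "delay": 0.0, "neighborhood": 0.0}
s_dev = deviation_score(raw, bounds)
```

Because the weights sum to 1, the resulting score also lies in [0,1], which makes it directly comparable with the 0.6 upper limit threshold used later in the feedback step.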

[0151] Based on the collaborative relationship between microgrids in this round of regulation, the collaborative score of each subject is evaluated from the following dimensions: consistency of command response between the current subject and its neighboring domain, distribution of collaborative instruction execution delay, and record of behavioral stability during historical evolution, which comes from the state recording mechanism in S2.

[0152] Furthermore, the command response consistency score $S_{\mathrm{sync}}$ measures the consistency between the actual command response of the current subject and that of its neighborhood:

[0153] $S_{\mathrm{sync}} = 1 - \dfrac{1}{N} \sum_{j=1}^{N} \dfrac{\left| \theta_i - \theta_j \right|}{\theta_{\max}}$

[0154] where $\theta_i$ represents the regulation response angle of the current subject, which indicates how far its regulation behavior deviates from the direction of neighboring subjects and is mapped to an angular coordinate or a response state code; $\theta_j$ represents the response state of the $j$-th node in the neighborhood; $\theta_{\max}$ represents the maximum difference in response states; and $N$ represents the number of neighboring subjects. $S_{\mathrm{sync}}$ takes values in [0,1]; the higher the value, the more consistent the response.

[0155] The collaborative instruction execution delay score $S_{\mathrm{delay}}$ is expressed as:

[0156]

[0157] The expression involves the actual response time of the current subject, the system's reference response time, and $T_{\mathrm{tol}}$, the system's maximum tolerable response time (set to 10 s in this invention). If the tolerable value is exceeded, $S_{\mathrm{delay}} = 0$.

[0158] The behavioral stability score $S_{\mathrm{stab}}$ is expressed as:

[0159]

[0160] where $n_{\mathrm{change}}$ represents the number of times the current subject has changed its strategy in the last 10 rounds of evolution, obtained from the state records in S2.

[0161] The final collaborative score is calculated as a weighted average:

[0162] $S_{\mathrm{coop}} = 0.4 \cdot S_{\mathrm{sync}} + 0.3 \cdot S_{\mathrm{delay}} + 0.3 \cdot S_{\mathrm{stab}}$

[0163] where $S_{\mathrm{coop}}$ represents the collaborative score; a higher value indicates a higher degree of cooperation of the subject with the collaborative goal. $S_{\mathrm{coop}}$ takes values in [0,1] and is used to determine whether strategy boundary compression is required.
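
The collaborative score can be sketched as follows. The weighted sum uses the coefficients given above. The consistency score is written as one minus the mean normalized response gap, which is an assumed form consistent with the stated [0,1] range. The delay and stability scores are taken as inputs because their expressions are not reproduced here. All function names are hypothetical.

```python
def sync_score(theta_i, neighbor_thetas, theta_max):
    """Assumed form: 1 minus the mean response gap, normalized by theta_max."""
    if not neighbor_thetas:
        return 1.0  # assumption: an isolated subject is trivially consistent
    mean_gap = sum(abs(theta_i - t) for t in neighbor_thetas) / len(neighbor_thetas)
    return max(0.0, 1.0 - mean_gap / theta_max)

def coop_score(s_sync, s_delay, s_stab):
    """Weighted sum of the three sub-scores, each expected in [0, 1]."""
    return 0.4 * s_sync + 0.3 * s_delay + 0.3 * s_stab

s_sync = sync_score(0.0, [90.0], theta_max=180.0)
s_coop = coop_score(s_sync, s_delay=1.0, s_stab=0.0)
```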

[0164] The collaborative score is uploaded to the local scheduling fusion node for processing, and a collaborative evaluation matrix is constructed, which serves as an external correction factor for the multi-factor comprehensive benefit function constructed in S1.

[0165] Furthermore, during system operation, the power distribution area is divided into multiple sub-regions, each equipped with a local scheduling fusion node. The local scheduling fusion node is implemented on an edge computing device and is capable of receiving strategy scores, performing fusion analysis, and issuing correction signals. All intelligent agent nodes report their scores to the corresponding fusion node at the end of the evolution cycle. From the collected data, the fusion node constructs the following collaborative evaluation matrix $M_{\mathrm{coop}}$:

[0166] $M_{\mathrm{coop}} = \begin{bmatrix} S_{\mathrm{dev}}^{1} & S_{\mathrm{coop}}^{1} \\ S_{\mathrm{dev}}^{2} & S_{\mathrm{coop}}^{2} \\ \vdots & \vdots \\ S_{\mathrm{dev}}^{n} & S_{\mathrm{coop}}^{n} \end{bmatrix}$

[0167] where $S_{\mathrm{dev}}^{i}$ represents the strategy execution deviation score of the $i$-th intelligent agent node, $S_{\mathrm{coop}}^{i}$ represents the collaborative score of the $i$-th intelligent agent node, and $n$ represents the total number of intelligent agents participating in the evolutionary game. Each row of the matrix corresponds to one agent participating in the game evolution. The first column stores the agent's regulation deviation score, which is used to determine whether its return function needs to be modified. The second column stores its collaborative score, which is used to determine whether its strategy boundary needs to be restricted.

[0168] After each strategy evolution cycle is completed, if the strategy execution deviation score of a subject exceeds the set upper limit threshold, the structure of its comprehensive return function is adjusted: the weight of the economic factor in S1 (default 0.4) is appropriately reduced, while the penalty coefficient of the safety factor (default 0.3) is increased. The adjustment ratio is adaptively updated based on the historical return fluctuation trend recorded in the S2 stage to ensure smooth and convergent weight changes.

[0169] If the collaborative score of a subject falls below the set cooperation tolerance threshold, the system performs boundary compression on the pre-selected strategy set generated in S1. By limiting the upper limit of the adjustment capability and the lower limit of the control window in the subject's state vector, the system convergently compresses the current strategy space and uses the compressed strategy set as the subject's update input in the next round of the evolutionary game in S2.

[0170] Furthermore, the present invention sets the upper limit threshold to 0.6 and the cooperation tolerance threshold to 0.5.
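
The threshold-triggered correction can be sketched as a pass over the rows of the collaborative evaluation matrix. The two thresholds come from the text. The fixed weight step and shrink factor stand in for the adaptive adjustment ratio, and compressing only the upper regulation limit is a simplification; these choices and all names are assumptions.

```python
DEV_UPPER = 0.6  # upper limit threshold for the deviation score (from the text)
COOP_MIN = 0.5   # cooperation tolerance threshold (from the text)

def apply_feedback(m_coop, weights, bounds, step=0.05, shrink=0.9):
    """Return corrected (weights, bounds) for every agent.

    m_coop: list of (s_dev, s_coop) rows, one per agent.
    weights: list of dicts with keys "econ", "resp", "safe".
    bounds: list of (lower, upper) regulation-capability limits.
    step, shrink: hypothetical fixed adjustment sizes.
    """
    new_weights, new_bounds = [], []
    for (s_dev, s_coop), w, (lo, hi) in zip(m_coop, weights, bounds):
        w = dict(w)  # copy so the caller's weights are not mutated
        if s_dev > DEV_UPPER:
            delta = min(step, w["econ"])
            w["econ"] -= delta  # reduce the economic weight
            w["safe"] += delta  # raise the safety penalty coefficient
        if s_coop < COOP_MIN:
            hi = lo + shrink * (hi - lo)  # compress the strategy boundary
        new_weights.append(w)
        new_bounds.append((lo, hi))
    return new_weights, new_bounds

default_w = {"econ": 0.4, "resp": 0.3, "safe": 0.3}
m_coop = [(0.7, 0.9), (0.2, 0.4)]
new_w, new_b = apply_feedback(m_coop, [default_w, default_w], [(0.0, 100.0), (0.0, 100.0)])
```

In this example the first agent has its weights shifted toward safety, and the second agent has its upper regulation limit reduced from 100 to 90.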

[0171] Ultimately, the correction process completes a closed-loop adjustment mechanism from the evolution and feedback of the S2 strategy to the definition of the S1 strategy space, enabling the comprehensive benefit function and strategy set to continuously and dynamically evolve based on actual execution and collaborative performance, thereby improving the adaptability, stability, and robustness of the multi-agent system.

[0172] It should be noted that S3 constructs a feedback loop by introducing execution deviation scoring and collaborative scoring mechanisms, dynamically linking the execution results of the S2 strategy with the comprehensive payoff function and strategy boundary constructed in S1. The design, based on a normalized and weighted deviation scoring system and a structured collaborative scoring model, can quantitatively identify individual regulatory deviations and insufficient group collaboration. Combined with a threshold judgment mechanism, it triggers payoff weight adjustments and strategy space convergence, effectively preventing the spread of imbalanced behavior, improving the stability, adaptability, and collaboration of regulatory behavior, providing a modified strategy input for the next round of evolution, and enhancing the overall robustness of the system.

[0173] Example 2: Referring to Figure 2, as an embodiment of the present invention, a multi-agent autonomous collaborative microgrid control system is provided, including an agent modeling and strategy setting module 100, a game evolution strategy update module 200, and a feedback correction and boundary adjustment module 300.

[0174] The agent modeling and strategy setting module 100 is used to construct intelligent agents based on microgrids and power distribution entities and define the control strategy space, and construct a revenue function to express the strategy preferences and resource boundaries of each entity.

[0175] The agent modeling and strategy setting module 100 includes: intelligent agent abstraction submodule 101 and strategy space construction submodule 102.

[0176] Furthermore, the intelligent agent abstraction submodule 101 is used to model each microgrid and important distribution node in the distribution network as an intelligent agent node with independent perception and response capabilities, and to define its state vector, which contains physical and strategic attributes of the node such as voltage, current, load capacity, electricity price, and adjustable boundary.

[0177] The strategy space construction submodule 102 is used to define a set of control strategies based on the state vector of the agent node. The strategy set covers the operation options that the node can execute, such as power regulation, load reduction, energy storage scheduling, and information interaction. It also combines the electricity price function, voltage stability threshold, and boundary conditions to construct the payoff function of each strategy, which serves as the evaluation basis for the subsequent game evolution process.

[0178] It should be noted that the intelligent agent abstraction submodule 101 provides the agent granularity control capability for this system, and the policy space construction submodule 102 provides a policy candidate set for each node, which is a prerequisite for executing the evolutionary game mechanism.

[0179] The game evolution strategy update module 200 is used to perform multiple rounds of strategy evolution through the evolutionary game mechanism, and update the strategy adoption probability based on the payoff function and the relative strategy advantage value.

[0180] The game evolution strategy update module 200 includes: strategy adoption evolution submodule 201 and strategy advantage calculation submodule 202.

[0181] Furthermore, the strategy advantage calculation submodule 202 is used to compare the strategies of each agent node in each evolution cycle, and calculate the relative advantage value of the current strategy in the group based on historical returns and collaborative performance.

[0182] The strategy adoption evolution submodule 201 is used to apply the replicator dynamics equation from evolutionary game theory to calculate the adoption probability based on the relative differences in strategy advantage among agent nodes, and to dynamically update the strategy selection of each node, realizing the distributed evolution of strategies across the entire network.

[0183] It should be noted that the strategy advantage calculation submodule 202 provides the basic data for the evolution of the group game, while the strategy adoption evolution submodule 201 is responsible for the actual execution of strategy updates and is the core evolution mechanism of the distribution-microgrid regulation process.

[0184] The feedback correction and boundary adjustment module 300 is used to collect operational deviation data and collaborative indicators based on the control execution results, dynamically correct the revenue functions of each subject, and synchronously adjust the strategy boundary.

[0185] The feedback correction and boundary adjustment module 300 includes: a strategy deviation evaluation submodule 301 and a revenue function and boundary correction submodule 302.

[0186] Furthermore, the strategy deviation assessment submodule 301 is used to collect the control execution deviation of each agent node after each round of strategy evolution, including the power regulation error, voltage fluctuation, response delay, and node disturbance propagation effect, and to calculate the strategy deviation score and the cooperative stability score according to predetermined weights.

[0187] The revenue function and boundary correction submodule 302 is used to adaptively adjust the weight coefficients of the economy, responsiveness and safety factors in the revenue function according to the node's scoring results. At the same time, it compresses or expands the policy response range, charging and discharging capabilities and other policy boundary parameters, so that the node has a better policy convergence space in the next round of evolution.

[0188] It should be noted that the feedback correction module 300 implements the system's punishment and control mechanism for nodes exhibiting undesirable behavior, enhancing the stability and coordination of the strategy update process. This is a key element in improving the system's dynamic robustness and multi-agent autonomy.

[0189] Other technical features of the multi-entity autonomous collaborative distribution-microgrid control system described in this embodiment are similar to those of the corresponding multi-entity autonomous collaborative distribution-microgrid control method, and will not be repeated here.

[0190] Thirdly, the present invention also provides a multi-entity autonomous and collaborative distribution-microgrid control system, comprising:

[0191] One or more processors;

[0192] Memory, used to store one or more programs;

[0193] When the one or more programs are executed by the one or more processors, the one or more processors implement the methods described above.

[0194] Finally, the present invention also provides a storage medium containing computer-executable instructions, which, when executed by a processor, enable the processor to perform the aforementioned multi-agent autonomous collaborative distribution-microgrid control method.

[0195] In the description of this invention, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. "A plurality of" means two or more, unless otherwise explicitly specified.

[0196] In this invention, unless otherwise explicitly specified and limited, the terms "installation," "connection," "linking," and "fixing," etc., should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral part; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; they can refer to the internal communication of two components or the interaction between two components. Those skilled in the art can understand the specific meaning of the above terms in this invention according to the specific circumstances.

[0197] In this invention, unless otherwise explicitly specified and limited, a first feature being "above" or "below" a second feature can mean that the first feature is in direct contact with the second feature, or that the first feature is in indirect contact with the second feature through an intermediate medium. Furthermore, the first feature being "above," "over," or "on top of" the second feature can mean that the first feature is directly above or diagonally above the second feature, or simply that the first feature is at a higher horizontal level than the second feature. The first feature being "below," "beneath," or "under" the second feature can mean that the first feature is directly below or diagonally below the second feature, or simply that the first feature is at a lower horizontal level than the second feature.

[0198] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Furthermore, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.

[0199] Any process or method description in the flowchart or otherwise herein can be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing a particular logical function or process, and the scope of the preferred embodiments of the invention includes additional implementations in which functions may be performed not in the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as will be understood by those skilled in the art to which embodiments of the invention pertain.

[0200] The logic and/or steps represented in the flowchart or otherwise described herein can, for example, be considered an ordered list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit programs for use by, or in conjunction with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection having one or more wires (electronic device), a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CD-ROM). Alternatively, the computer-readable medium may be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it as necessary, and then stored in a computer memory.

[0201] It should be understood that various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.

[0202] Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be implemented by a program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, the program performs one or a combination of the steps of the method embodiments.

[0203] Furthermore, the functional units in the various embodiments of the present invention can be integrated into a processing module, or each unit can exist physically separately, or two or more units can be integrated into a module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.

[0204] Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention. Those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention.

Claims

1. A multi-agent autonomous collaborative distribution-microgrid regulation method, characterized in that it comprises: abstracting each node in the microgrid and the distribution network as an independent intelligent agent node, and defining a state vector for each agent node, wherein the elements of the state vector include allowable boundaries, which are the influence relationships of the current agent node on other factors; constructing data sets in multiple dimensions based on the allowable boundaries, the combination of the data sets across all dimensions forming a pre-selected strategy set; constructing, based on the impact of each group of strategies in the pre-selected strategy set on the current node, a multi-factor function structure for evaluating strategy payoff, so as to obtain the comprehensive payoff value of the current strategy, the multi-factor function structure reflecting the actual payoff level of that group of strategies in the current state; iteratively optimizing the strategy selection behavior of each node through a multi-round strategy adoption and update process, so that each agent node obtains the best strategy, wherein the strategy selection trajectory, the comprehensive payoff change curve, and the number of strategy replacements of each agent node are recorded during the evolution, and each iteration of the multi-round strategy adoption and update process is a simultaneous evaluation and possible update of all strategies; collecting several actual operating indicators to generate a strategy execution deviation score, which quantifies the degree of deviation between the strategy and the expected control target; evaluating the collaboration score of each node based on the cooperation relationships between microgrids in the current iteration; and, after each strategy evolution cycle, adjusting the multi-factor function structure of a node if the strategy execution deviation score of that node is higher than a set upper limit threshold, and performing boundary compression on the generated pre-selected strategy set if the collaboration score of the node is lower than a set collaboration tolerance threshold, wherein the current strategy space is convergently compressed by limiting the upper limit of the adjustment capability and the lower limit of the control time window in the state vector of the node, and the compressed strategy set is used as the update input for the next multi-round strategy adoption and update process.
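
By way of non-limiting illustration, the feedback step at the end of claim 1 can be sketched in Python as follows. The field names, candidate grids, threshold values, and shrink factors are all hypothetical, and the compression rule (lowering the capability cap and raising the time-window floor) is only one assumed reading of "limiting" the two boundaries.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class AgentState:
    # Hypothetical fields: the claim names only these two boundaries.
    max_adjust_kw: float  # upper limit of the adjustment capability
    min_window_s: float   # lower limit of the control time window


def strategy_set(state,
                 power_grid=(20.0, 40.0, 60.0, 80.0, 100.0),
                 window_grid=(2.0, 4.0, 8.0)):
    """Pre-selected strategy set: all (power, window) pairs inside the boundaries."""
    powers = [p for p in power_grid if p <= state.max_adjust_kw]
    windows = [w for w in window_grid if w >= state.min_window_s]
    return list(product(powers, windows))


def compress_boundary(state, shrink=0.75, widen=1.25):
    """Assumed compression rule: lower the capability cap, raise the window floor."""
    return replace(
        state,
        max_adjust_kw=state.max_adjust_kw * shrink,
        min_window_s=state.min_window_s * widen,
    )


def feedback_step(state, deviation, coop, dev_max=0.3, coop_min=0.5):
    """Apply the two threshold checks after one strategy evolution cycle.

    Returns (state, adjust_payoff): the possibly compressed state and a flag
    telling the caller to adjust the node's multi-factor payoff function.
    """
    adjust_payoff = deviation > dev_max
    if coop < coop_min:
        state = compress_boundary(state)
    return state, adjust_payoff
```

In this sketch a node with a low collaboration score keeps fewer candidate strategies for the next evolution round, because `strategy_set` is rebuilt from the tightened boundaries.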

2. The multi-agent autonomous collaborative distribution-microgrid regulation method as described in claim 1, characterized in that: the multi-factor function structure includes a calculation of the economic expenditure per unit of regulated power, an efficiency assessment of the response delay time, and a penalty score for violations caused by voltage or power fluctuations; an adjustment coefficient is set for each factor, and the factors are weighted and combined to obtain the comprehensive payoff value of the current strategy; the economic expenditure is calculated, based on the adjustment operation defined in the current strategy, as the expenditure per unit of regulated power that the operation incurs in the current state; the efficiency assessment of the response delay time evaluates the response time from adoption of the strategy to its execution, compares it with the system target response time to obtain the response delay time difference, and assigns a responsiveness factor score according to the size of the difference; and the penalty score for violations caused by voltage or power fluctuations is obtained by feeding the strategy into the system safety constraint model, detecting through simulation whether the strategy may cause voltage or power parameters to exceed their limit boundaries, and calculating a violation penalty score accordingly.
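
By way of non-limiting illustration, the weighted synthesis of the three factors in claim 2 can be sketched as follows. The weights, the sign convention (cost and penalty reduce the payoff), and the linear responsiveness mapping are assumptions made for this example and are not taken from the claims.

```python
def responsiveness_score(t_response, t_target):
    """Assumed mapping: 1.0 when on target, falling linearly to 0.0 at twice the target."""
    delay = max(0.0, t_response - t_target)  # response delay time difference
    return max(0.0, 1.0 - delay / t_target)


def comprehensive_payoff(cost_per_kw, t_response, t_target, violation_penalty,
                         w_cost=0.5, w_resp=0.3, w_safe=0.2):
    """Weighted combination of the three factors; a higher value is a better strategy.

    w_cost, w_resp and w_safe stand in for the per-factor adjustment
    coefficients; their values here are placeholders.
    """
    return (-w_cost * cost_per_kw
            + w_resp * responsiveness_score(t_response, t_target)
            - w_safe * violation_penalty)
```

For example, a strategy costing 0.2 per unit of regulated power that responds in 1.5 s against a 1.0 s target, with no violation, scores about 0.05 under these placeholder weights.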

3. The multi-agent autonomous collaborative distribution-microgrid regulation method as described in claim 2, characterized in that: the system safety constraint model includes a voltage constraint and a power constraint, wherein the voltage constraint is expressed as: $V_{\min} \le V_s \le V_{\max}$, where $V_s$ represents the voltage at node $s$ after the strategy is applied, and $V_{\min}$ and $V_{\max}$ represent the lower and upper limits of the voltage; and the power constraint is expressed as: $\left| P_s - P_s^{\text{ref}} \right| \le \Delta P_{\max}$, where $P_s$ represents the injected power after the strategy takes effect, $P_s^{\text{ref}}$ represents the reference power, and $\Delta P_{\max}$ represents the maximum tolerable deviation.

4. The multi-agent autonomous collaborative distribution-microgrid regulation method as described in claim 3, characterized in that: the violation penalty score $S_{\text{safe}}$ is expressed in terms of the following quantities: $\alpha_v$, the voltage violation weight; $\alpha_p$, the power violation weight; $\epsilon_v$, the voltage tolerance threshold; $\epsilon_p$, the power tolerance threshold; and $N$, the set of nodes affected by the strategy.
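
By way of non-limiting illustration, the constraint check of claim 3 and one possible penalty of the kind named in claim 4 can be sketched as follows. The limit values are placeholders. The hinge-style penalty (weighted excess beyond a tolerance, summed over the affected nodes) is an assumed form chosen only for this example; the text does not state the exact expression.

```python
def violations(v, p, p_ref, v_min=0.95, v_max=1.05, dp_max=0.1):
    """Return (voltage_excess, power_excess) beyond the limits; 0.0 where satisfied.

    Voltage limit: v_min <= v <= v_max. Power limit: |p - p_ref| <= dp_max.
    """
    v_excess = max(0.0, v_min - v, v - v_max)
    p_excess = max(0.0, abs(p - p_ref) - dp_max)
    return v_excess, p_excess


def safety_penalty(nodes, alpha_v=1.0, alpha_p=1.0, eps_v=0.0, eps_p=0.0):
    """ASSUMED hinge form of the violation penalty score.

    nodes: iterable of (v, p, p_ref) tuples, one per node affected by the strategy.
    Excesses smaller than the tolerance thresholds eps_v / eps_p are not penalized.
    """
    total = 0.0
    for v, p, p_ref in nodes:
        v_ex, p_ex = violations(v, p, p_ref)
        total += alpha_v * max(0.0, v_ex - eps_v) + alpha_p * max(0.0, p_ex - eps_p)
    return total
```

A strategy whose simulated voltages and powers stay inside the limits at every affected node receives a penalty of zero in this sketch.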

5. The multi-agent autonomous collaborative distribution-microgrid regulation method as described in claim 1, characterized in that iteratively optimizing the strategy selection behavior of each node through the multi-round strategy adoption and update process, so that each agent node obtains the best strategy, includes: for each round of evolution, setting the evolution step size, the strategy update frequency, and the maximum number of iterations; after receiving local and neighborhood strategy information, each node calculating the probability of the corresponding strategy being adopted, based on its current comprehensive payoff value and a comparison of payoffs with other nodes, wherein the calculation uses a transformation function based on strategy advantage, the strategy advantage being the difference between the current payoff of the target node and the payoff of the target strategy; determining the strategy change in the current round by sampling; after the main strategy is updated, having it participate in the payoff reassessment of the next round, forming a multi-round cascading evolution mechanism; and determining whether the strategy has reached a stable state or the maximum number of rounds has been reached, and if so terminating the iteration, otherwise starting the next iteration.

6. The multi-agent autonomous collaborative distribution-microgrid regulation method as described in claim 5, characterized in that: the probability of the corresponding strategy being adopted is expressed as a parameterized Sigmoid function, with the following specific expression: $P_{\text{adopt}} = \dfrac{1}{1 + e^{-k \cdot \Delta U \cdot \eta}}$, where $P_{\text{adopt}}$ represents the probability of the current strategy being adopted in the next round; $\Delta U$ represents the strategy advantage, that is, the payoff difference on which the adoption decision is based; $k$ represents the evolutionary temperature coefficient, which indicates the sensitivity to strategy differences; and $\eta$ represents the neighborhood trust factor, which reflects the degree of trust of the node in the neighborhood payoff judgment and is dynamically adjusted according to historical consistency. $\Delta U \cdot \eta$ together form the curvature input of the Sigmoid function, reflecting the combined effect of payoff differences and neighborhood recognition. This function ensures that the adoption probability is close to 1 when the payoff advantage is obvious, while adoption is conservative when the payoff difference is ambiguous or the neighborhood trust is low.
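
By way of non-limiting illustration, a Sigmoid adoption rule driven by the product of the payoff difference and the neighborhood trust factor, followed by the sampling step of claim 5, can be sketched as follows. The logistic form $1/(1+e^{-k\,\Delta U\,\eta})$, the default value of `k`, and the function names are assumptions made for this example.

```python
import math
import random


def adopt_probability(delta_u, k=5.0, eta=1.0):
    """Logistic adoption probability with input k * delta_u * eta (assumed form)."""
    z = k * delta_u * eta
    # Branch on the sign of z so that exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def maybe_adopt(current, candidate, delta_u, rng, k=5.0, eta=1.0):
    """Sample the strategy change for this round: switch with probability P_adopt."""
    if rng.random() < adopt_probability(delta_u, k, eta):
        return candidate
    return current
```

With this form, a zero payoff difference or a zero trust factor both give a probability of 0.5, so the switch is a coin flip rather than a near-certain adoption; a clearly positive advantage pushes the probability toward 1.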

7. The multi-agent autonomous collaborative distribution-microgrid regulation method as described in claim 1, characterized in that collecting several actual operating indicators to generate the strategy execution deviation score, which quantifies the degree of deviation between the strategy and the expected control target, includes: collecting actual operating indicators in real time in four dimensions, namely power flow regulation deviation, voltage fluctuation offset, strategy response delay, and neighborhood disturbance amplification, and calculating a strategy execution deviation score for each node; wherein the power flow regulation deviation is calculated from the difference between the actual power flow value produced after the current strategy is executed and the expected power change value of the strategy; the voltage fluctuation offset is the difference in node voltage before and after strategy execution; the strategy response delay is recorded as the time difference between the issuance of the control command and the actual start of the strategy's regulation behavior; and the neighborhood disturbance amplification is obtained by sensing the degree to which the operating parameters of other nodes in the communication neighborhood of the current node are disturbed after the current node executes the strategy.

8. The multi-agent autonomous collaborative distribution-microgrid regulation method as described in claim 7, characterized in that: the power flow regulation deviation is expressed as: $D_a^{\text{flow}} = \dfrac{\left| P_a^{\text{act}} - P_a^{\text{exp}} \right|}{\left| P_a^{\text{exp}} \right| + \varepsilon}$, where $D_a^{\text{flow}}$ represents the power flow regulation deviation score of agent node $a$, $P_a^{\text{act}}$ represents the actual active power of agent node $a$ after the strategy is executed, $P_a^{\text{exp}}$ represents the expected power of the strategy for node $a$, and $\varepsilon$ represents a small positive number that prevents division by zero; and the neighborhood disturbance amplification score of agent node $a$ is expressed in terms of the following quantities: $N_a$, the number of affected agent nodes within the neighborhood of node $a$; $V_j$, the node voltage at neighbor node $j$ after the strategy is executed; $V_j^{\text{base}}$, the node voltage at neighbor node $j$ before the strategy is executed; the power injected at neighbor node $j$ after and before the strategy is executed; $\omega_V$, the weight of the voltage disturbance effect; and $\omega_P$, the weight of the power disturbance effect.
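
By way of non-limiting illustration, the two deviation indicators of claim 8 can be sketched as follows. The relative-error form of the power flow deviation follows from the role of the small positive number that prevents division by zero. The neighborhood score, taken here as the mean over affected neighbors of the weighted absolute voltage and power changes, is an assumed form, and the weights are placeholders.

```python
def flow_deviation(p_actual, p_expected, eps=1e-6):
    """Relative gap between actual and expected power; eps prevents division by zero."""
    return abs(p_actual - p_expected) / (abs(p_expected) + eps)


def neighborhood_disturbance(neighbors, w_v=0.5, w_p=0.5):
    """ASSUMED form: mean weighted absolute change over the affected neighbors.

    neighbors: list of (v_after, v_base, p_after, p_base), one tuple per
    affected neighbor node j; returns 0.0 when no neighbor is affected.
    """
    if not neighbors:
        return 0.0
    total = sum(w_v * abs(v - v_base) + w_p * abs(p - p_base)
                for v, v_base, p, p_base in neighbors)
    return total / len(neighbors)
```

In practice the voltage and power terms would likely be expressed in per-unit values so that the two weights are comparable; that normalization is left out of this sketch.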

9. The multi-agent autonomous collaborative distribution-microgrid regulation method as described in claim 1, characterized in that evaluating the collaboration score of each node based on the cooperation relationships between microgrids in the current iteration includes: evaluating the collaboration score of each node in the following dimensions: consistency of command response between the current agent node and its neighbors, distribution of collaborative instruction execution latency, and the record of behavioral stability during historical evolution; wherein the command response consistency score $S_{\text{sync}}$ compares the actual response command of the current node with those of its neighbors: $S_{\text{sync}} = 1 - \dfrac{1}{N} \sum_{j=1}^{N} \dfrac{\left| \theta_i - \theta_j \right|}{\theta_{\max}}$, where $\theta_i$ represents the current control response angle of the agent node, $\theta_j$ represents the response state of the $j$-th agent node in the neighborhood, $\theta_{\max}$ represents the maximum difference in response states, and $N$ represents the number of neighbors; $S_{\text{sync}}$ takes values in $[0,1]$, and the higher the value, the more consistent the response; the control response angle represents the degree to which the control behavior of the current node deviates from the direction of its neighbors, mapped to angular coordinates; the collaborative instruction execution latency score $S_{\text{delay}}$ is expressed in terms of the actual response time of the current agent node, the reference response time issued by the system, and the maximum tolerable response time $T_{\text{tol}}$ of the system, with $S_{\text{delay}} = 0$ if the tolerable value is exceeded; the behavioral stability score is expressed as: $S_{\text{stab}} = 1 - \dfrac{n_{\text{change}}}{10}$, where $n_{\text{change}}$ represents the number of strategy changes made by the current agent node in the last 10 rounds of evolution, derived from the state records of the multi-round strategy adoption and update process; and the final collaboration score is calculated as a weighted average: $S_{\text{coop}} = 0.4 \cdot S_{\text{sync}} + 0.3 \cdot S_{\text{delay}} + 0.3 \cdot S_{\text{stab}}$, where $S_{\text{coop}}$ represents the collaboration score, a higher value indicating a higher degree of cooperation of the node towards the collaborative goal; $S_{\text{coop}}$ takes values in $[0,1]$ and is used to determine whether strategy boundary compression is needed.
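
By way of non-limiting illustration, the collaboration score of claim 9 can be sketched as follows. Only the weights 0.4, 0.3, 0.3 and the 10-round window are taken from the claim. The forms of the three component scores, the clamping of angle gaps to `theta_max`, and the linear decay of the latency score between the reference and tolerable response times are assumptions made for this example.

```python
def sync_score(theta_i, neighbor_thetas, theta_max):
    """Assumed form: 1 minus the mean normalized angle gap to the neighbors."""
    if not neighbor_thetas:
        return 1.0
    # Gaps are clamped to theta_max so the score stays inside [0, 1].
    gaps = [min(abs(theta_i - t), theta_max) for t in neighbor_thetas]
    return 1.0 - (sum(gaps) / len(gaps)) / theta_max


def delay_score(t_actual, t_ref, t_tol):
    """0 beyond the tolerable time (per the claim); assumed linear decay before it."""
    if t_actual > t_tol:
        return 0.0
    if t_actual <= t_ref:
        return 1.0
    return 1.0 - (t_actual - t_ref) / (t_tol - t_ref)


def stability_score(n_change, window=10):
    """Assumed form: share of the last `window` rounds without a strategy change."""
    return 1.0 - min(n_change, window) / window


def coop_score(s_sync, s_delay, s_stab):
    """Weighted average with the weights stated in claim 9."""
    return 0.4 * s_sync + 0.3 * s_delay + 0.3 * s_stab
```

Because each component lies in [0, 1] and the weights sum to 1, the combined score also lies in [0, 1], which is what allows it to be compared against a single collaboration tolerance threshold.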

10. The multi-agent autonomous collaborative distribution-microgrid regulation method as described in claim 9, characterized in that, after the collaboration score of each node is evaluated based on the cooperation relationships between microgrids in the current iteration, the method further includes: uploading the collaboration scores to a local scheduling fusion node for processing, and constructing a collaborative evaluation matrix as an external correction factor for the multi-factor function structure; wherein the local scheduling fusion nodes are established by dividing the power distribution area into multiple sub-regions during system operation, each sub-region being equipped with one fusion node; at the end of the evolution cycle, all agent nodes report to their corresponding fusion node; and the fusion node constructs the following collaborative evaluation matrix $M_{\text{coop}}$ from the collected data: $M_{\text{coop}} = \begin{bmatrix} D_1 & S_{\text{coop},1} \\ D_2 & S_{\text{coop},2} \\ \vdots & \vdots \\ D_n & S_{\text{coop},n} \end{bmatrix}$, where $D_i$ represents the strategy execution deviation score of the $i$-th agent node, $S_{\text{coop},i}$ represents the collaboration score of the $i$-th agent node, and $n$ represents the total number of agent nodes participating in the evolutionary game; each row of the matrix corresponds to one agent node participating in the evolutionary game, the first column storing the regulation deviation score of that node and the second column storing its collaboration score.
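
By way of non-limiting illustration, a fusion node could assemble the n x 2 collaborative evaluation matrix of claim 10 and read the two threshold decisions of claim 1 from it as sketched below. The report format, the node identifiers, and the threshold values are hypothetical.

```python
def build_coop_matrix(reports):
    """Assemble the evaluation matrix from per-node reports.

    reports: dict mapping node id -> (deviation_score, coop_score).
    Returns (ids, matrix), where matrix[r] = [deviation, coop] for node ids[r].
    """
    ids = sorted(reports)
    matrix = [[reports[i][0], reports[i][1]] for i in ids]
    return ids, matrix


def flag_nodes(ids, matrix, dev_max=0.3, coop_min=0.5):
    """Return (nodes whose payoff function needs adjusting, nodes needing compression)."""
    adjust = [i for i, (dev, _) in zip(ids, matrix) if dev > dev_max]
    compress = [i for i, (_, coop) in zip(ids, matrix) if coop < coop_min]
    return adjust, compress
```

Keeping the row order aligned with a sorted list of node identifiers lets each sub-region's matrix be compared across evolution cycles without storing the identifiers inside the matrix itself.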

11. A multi-agent autonomous collaborative distribution-microgrid regulation system, characterized in that it includes an agent modeling and strategy setting module, a game evolution strategy update module, and a feedback correction and boundary adjustment module; the agent modeling and strategy setting module is used to abstract each node in the microgrid and the distribution network as an independent intelligent agent node and to define a state vector for each agent node, wherein the elements of the state vector include allowable boundaries, which are the influence relationships of the current agent node on other factors; the game evolution strategy update module is used to construct data sets in multiple dimensions based on the allowable boundaries, the combination of the data sets across all dimensions forming a pre-selected strategy set; to construct, based on the impact of each group of strategies in the pre-selected strategy set on the current node, a multi-factor function structure for evaluating strategy payoff, so as to obtain the comprehensive payoff value of the current strategy, the multi-factor function structure reflecting the actual payoff level of that group of strategies in the current state; and to iteratively optimize the strategy selection behavior of each node through a multi-round strategy adoption and update process, so that each agent node obtains the best strategy, wherein the strategy selection trajectory, the comprehensive payoff change curve, and the number of strategy replacements of each agent node are recorded during the evolution, and each iteration of the multi-round strategy adoption and update process is a simultaneous evaluation and possible update of all strategies; the feedback correction and boundary adjustment module is used to collect several actual operating indicators to generate a strategy execution deviation score, which quantifies the degree of deviation between the strategy and the expected control target; to evaluate the collaboration score of each node based on the cooperation relationships between microgrids in the current iteration; and, after each strategy evolution cycle is completed, to adjust the multi-factor function structure of a node if the strategy execution deviation score of that node is higher than a set upper limit threshold, and to perform boundary compression on the generated pre-selected strategy set if the collaboration score of the node is lower than a set collaboration tolerance threshold, wherein the current strategy space is convergently compressed by limiting the upper limit of the adjustment capability and the lower limit of the control time window in the state vector of the node, and the compressed strategy set is used as the update input for the next multi-round strategy adoption and update process.

12. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the multi-agent autonomous collaborative distribution-microgrid regulation method according to any one of claims 1 to 10.

13. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the multi-agent autonomous collaborative distribution-microgrid regulation method as described in any one of claims 1 to 10.