A power distribution network voltage control method, device and equipment

By constructing a distributed power output probability model and a reinforcement learning model, and combining them with a mixed-integer second-order cone programming algorithm, reactive power regulation commands are generated, solving the problems of untimely and inaccurate voltage control in active distribution networks, and achieving efficient and safe voltage control.

CN122246761A · Pending · Publication Date: 2026-06-19 · ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID JIBEI ELECTRIC POWER CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID JIBEI ELECTRIC POWER CO LTD
Filing Date
2026-01-29
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing methods in active distribution networks suffer from problems such as untimely online strategy generation and limited control accuracy, making it difficult to effectively address the voltage control challenges brought about by the integration of a high proportion of renewable energy.

Method used

A probability model for the output of distributed power sources is constructed. By combining a reinforcement learning model and a mixed-integer second-order cone programming algorithm, reactive power regulation commands are generated. Through offline training and real-time fine-tuning, precise control of voltage fluctuations is achieved.

Benefits of technology

It enables rapid suppression of voltage fluctuations, improves the real-time safety and control accuracy of system operation, extends equipment life, and reduces the number of emergency interventions caused by prediction errors.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention relates to a distribution network voltage control method, apparatus, and equipment, belonging to the field of power grid operation optimization technology. The method includes: constructing a power output probability model for each distributed power source based on historical power output data; constructing a day-ahead voltage control model based on the power output probability model; determining, based on the day-ahead voltage control model, the first reactive power regulation command for multiple discrete voltage control devices in the distribution network and the reference voltage of each node in the distribution network within a first scheduling cycle; constructing a general reinforcement learning model based on the intraday predicted power output data of each distributed power source and the reference voltage of each node; transferring the parameters of the general reinforcement learning model to a target reinforcement learning model; fine-tuning the target reinforcement learning model based on the intraday actual power output data of multiple distributed power sources; and using the fine-tuned target reinforcement learning model to generate a second reactive power regulation command for multiple continuous voltage control devices in the distribution network within a second scheduling cycle.

Description

Technical Field

[0001] The embodiments in this specification relate to the field of power grid operation optimization technology, specifically to a distribution network voltage control method, device, and equipment.

Background Technology

[0002] To build a clean, low-carbon, efficient, and safe modern energy system, the large-scale integration of renewable energy has become a significant trend. Simultaneously, with the continuous growth of electricity load, the number of devices with flexible adjustment capabilities in the distribution network has increased substantially, enhancing the flexibility of grid dispatch and strengthening the coupling relationship between power sources, grid, load, and storage. However, this development trend also brings dual challenges: on the one hand, distributed power sources, represented by photovoltaics and wind power, are affected by the natural environment, exhibiting strong randomness and intermittency; on the other hand, the increasing variety and number of dispatching devices have led to a sharp increase in the dimensionality of system models, making control problems increasingly complex. Against this backdrop, active distribution networks have emerged. The uncertainty of renewable energy output, coupled with the high-dimensional complexity of dispatching models, places higher demands on the real-time dispatching and voltage control of the distribution network.

[0003] Currently, in the dispatching decisions of active distribution networks, predictive algorithms are typically used to obtain output forecast data for distributed generation. However, because distributed generation is inherently volatile and random, the prediction error is often significant, directly affecting the effectiveness of subsequent optimized dispatching. To address this uncertainty, researchers have introduced various theoretical methods, mainly including chance-constrained programming, stochastic optimization, and robust optimization. Chance-constrained programming uses a probability density function to characterize the error distribution, transforming stochastic constraints into a deterministic form; stochastic optimization relies on output probability models, using Monte Carlo simulations to generate a large number of scenarios, which are then reduced to form a set of typical scenarios for optimization; robust optimization avoids using precise probability distributions, instead describing the fluctuation range with an uncertainty set and seeking the optimal solution in the worst-case scenario.

[0004] Nevertheless, the aforementioned existing methods generally suffer from problems such as untimely online strategy generation and limited control accuracy when dealing with voltage control issues in distribution networks with a high proportion of renewable energy integration, which restricts further improvement in the operating efficiency and safety level of the distribution network.

Summary of the Invention

[0005] The purpose of the embodiments in this specification is to provide a distribution network voltage control method, device, and equipment to overcome the problems of untimely online strategy generation and limited control accuracy in existing methods.

[0006] To solve the above-mentioned technical problems, the specific technical solutions of the embodiments in this specification are as follows. In one aspect, the embodiments of this specification provide a distribution network voltage control method, including:
constructing an output probability model of each distributed power source based on the historical output data of each distributed power source in the distribution network;
constructing, based on the output probability model, a day-ahead voltage control model with the goal of minimizing the active power loss of the distribution network and the node voltage exceedance risk index;
determining, based on the day-ahead voltage control model, the first reactive power regulation command of multiple discrete voltage control devices in the distribution network and the reference voltage of each node in the distribution network within the first scheduling cycle;
constructing a general reinforcement learning model based on the intraday predicted output data of each distributed power source and the reference voltage of each node;
transferring the parameters of the general reinforcement learning model to the target reinforcement learning model;
fine-tuning the target reinforcement learning model based on the intraday actual output data of multiple distributed power sources;
using the fine-tuned target reinforcement learning model to generate the second reactive power regulation command for multiple continuous voltage control devices in the distribution network within the second scheduling cycle, where each first scheduling cycle includes multiple second scheduling cycles.
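The nesting of the two scheduling cycles can be sketched as a pair of loops. This is only an illustration: the cycle counts (24 first cycles per day, 12 second cycles per first cycle), the function names, and the stub planner and agent are all hypothetical and do not come from the specification.

```python
def run_day(day_ahead_plan, act_intraday, n_second_per_first=12):
    """Replay a day-ahead plan and call the intraday agent inside each first cycle.

    day_ahead_plan: list of (discrete_command, reference_voltage), one per first cycle.
    act_intraday:   callable(t1, t2, reference_voltage) -> continuous command.
    """
    discrete_cmds, continuous_cmds = [], []
    for t1, (cmd, v_ref) in enumerate(day_ahead_plan):
        discrete_cmds.append(cmd)                 # slow devices: one command per first cycle
        for t2 in range(n_second_per_first):      # fast devices: many commands per first cycle
            continuous_cmds.append(act_intraday(t1, t2, v_ref))
    return discrete_cmds, continuous_cmds


# Hypothetical stand-ins for the day-ahead solver output and the fine-tuned agent.
stub_plan = [({"capacitor_steps": 1}, 1.0) for _ in range(24)]


def stub_agent(t1, t2, v_ref):
    return {"q_setpoint": 0.0, "v_ref": v_ref}


discrete, continuous = run_day(stub_plan, stub_agent)
```

The point of the structure is that the discrete devices receive one command per first scheduling cycle, while the continuous devices are adjusted repeatedly against the same reference voltage.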

[0007] Furthermore, the construction of the output probability model for each distributed power source includes: For each distributed power source, based on its historical output data, a nonparametric kernel density estimation algorithm is used to construct the output probability density function of the active power output of the distributed power source.

[0008] Furthermore, constructing the day-ahead voltage control model based on the output probability model, with the objective of minimizing the active power loss of the distribution network and the node voltage exceedance risk index, includes: determining, based on the output probability model of each distributed power source, the semi-invariants (also known as cumulants) of each order of the output random variable corresponding to each distributed power source using the semi-invariant algorithm; converting the semi-invariants of the output random variables corresponding to each distributed power source into the semi-invariants of the voltage of each node in the distribution network; expanding the semi-invariants of each node voltage to obtain the probability distribution of each node voltage; and constructing, based on the probability distribution of voltage at each node and with the goal of minimizing the active power loss of the distribution network and the node voltage exceedance risk index, a day-ahead voltage control model based on a mixed-integer second-order cone programming algorithm.

[0009] Further, the step of converting the semi-invariants of the output random variables corresponding to each distributed power source into the semi-invariants of the voltages of each node in the distribution network includes: The AC power flow model of the distribution network is expanded and linearized using Taylor series at the reference operating point to obtain the linear sensitivity relationship between the power injection disturbance and the voltage disturbance at the distribution network nodes. Based on the linear sensitivity relationship, the semi-invariants of the output random variables corresponding to each distributed power source are converted into the semi-invariants of the voltages of each node in the distribution network.
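As a minimal sketch of this conversion, assume the injections of the distributed power sources are mutually independent and that the linearization gives a sensitivity matrix S with ΔV = S·ΔP. Under those assumptions the order-k semi-invariant (cumulant) of each node voltage is the sum of the source cumulants weighted by the k-th power of the sensitivity coefficients. The function name and the numbers below are illustrative only.

```python
import numpy as np


def propagate_cumulants(S, injection_cumulants):
    """Map source cumulants to node-voltage cumulants through a linear sensitivity.

    S:                    (n_nodes, n_sources) sensitivity matrix, dV = S @ dP.
    injection_cumulants:  (n_sources, K) array; column k-1 holds the order-k cumulant.
    Returns an (n_nodes, K) array of node-voltage cumulants.
    Assumes independent sources, which is what makes cumulants additive.
    """
    S = np.asarray(S, dtype=float)
    kappa = np.asarray(injection_cumulants, dtype=float)
    orders = np.arange(1, kappa.shape[1] + 1)
    weights = S[:, :, None] ** orders          # shape (n_nodes, n_sources, K)
    return np.einsum("ijk,jk->ik", weights, kappa)


S = np.array([[0.02, 0.01],
              [0.01, 0.03]])
source_cumulants = np.array([[1.0, 0.04],     # source 1: order-1 and order-2 cumulants
                             [2.0, 0.09]])    # source 2
voltage_cumulants = propagate_cumulants(S, source_cumulants)
```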

[0010] Furthermore, constructing the day-ahead voltage control model based on a mixed-integer second-order cone programming algorithm includes: based on the probability distribution of voltage at each node, and with the objective of minimizing the active power loss of the distribution network and the node voltage exceedance risk index, constructing a day-ahead voltage control model subject to the following constraints: the capacitor banks in the distribution network satisfy constraints on daily switching frequency and single regulation capacity; the energy storage system in the distribution network satisfies constraints on state of charge and charging/discharging power; the photovoltaic inverters in the distribution network satisfy constraints on reactive power regulation capacity; and the nodes in the distribution network satisfy constraints on safe voltage operation and line power flow.

[0011] Furthermore, the construction of a general reinforcement learning model based on the intraday predicted output data of each distributed power source and the reference voltage of each node includes: Using the reference voltage as the control target and the intraday predicted power output data and the power grid status data of the distribution network as inputs, a deep deterministic policy gradient algorithm is used for offline training to obtain a general reinforcement learning model for continuous voltage control equipment regulation.

[0012] Furthermore, the offline training using a deep deterministic policy gradient algorithm yields a general reinforcement learning model for continuous voltage control device regulation, comprising: The state space, action space, and reward function of the general reinforcement learning model are constructed. The state space includes the voltage amplitude of each node in the distribution network, the predicted output data of the multiple distributed power sources, and time information. The action space includes the reactive power output setpoints of the multiple continuous voltage control devices. The reward function is constructed to be negatively correlated with the deviation of the node voltage from the reference voltage. An Actor-Critic network based on a deep deterministic policy gradient algorithm is constructed. The Actor network takes grid state data as input and outputs reactive power regulation action data. The Critic network takes grid state data and reactive power regulation action data as joint input and outputs the reward function Q value corresponding to the reactive power regulation action data. Based on the intraday predicted power output data, the Actor-Critic network is trained offline in a simulated power distribution network environment, and the parameters of the Actor network and Critic network are updated alternately using the gradient descent method. After the offline training is completed, the parameters of the Actor-Critic network are saved to obtain the general reinforcement learning model.
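The state, action, and reward described above can be sketched in a few lines. This is not a full deep deterministic policy gradient implementation: the sine/cosine time encoding, the single-layer tanh actor, the absolute-deviation reward, and all names and sizes are assumptions made for illustration.

```python
import numpy as np


def build_state(v_mag, der_forecast, hour):
    """Concatenate node voltages, predicted outputs, and time information."""
    angle = 2.0 * np.pi * hour / 24.0          # assumed cyclic encoding of time of day
    return np.concatenate([v_mag, der_forecast, [np.sin(angle), np.cos(angle)]])


def reward(v_mag, v_ref):
    """Negative total deviation from the reference voltage (larger deviation, lower reward)."""
    return -float(np.sum(np.abs(np.asarray(v_mag) - np.asarray(v_ref))))


def actor_forward(state, W, b, q_max):
    """Toy one-layer actor: tanh keeps each reactive power setpoint within [-q_max, q_max]."""
    return q_max * np.tanh(W @ state + b)


rng = np.random.default_rng(0)
state = build_state(v_mag=[1.02, 0.98, 1.00], der_forecast=[0.6, 0.3], hour=13)
W = rng.normal(scale=0.1, size=(2, state.size))   # 2 continuous devices (hypothetical)
b = np.zeros(2)
action = actor_forward(state, W, b, q_max=0.5)
r = reward([1.02, 0.98, 1.00], [1.0, 1.0, 1.0])
```

In an actual Actor-Critic setup the Critic would take the concatenation of `state` and `action` as input and estimate the Q value, and both networks would be trained against a simulated distribution network.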

[0013] Furthermore, the fine-tuning of the target reinforcement learning model includes: Based on the preset large model, the parameters of the network layers in the target reinforcement learning model that extract wind and solar fluctuation features are kept unchanged. Based on the intraday actual output data, the parameters of the remaining network layers in the target reinforcement learning model are fine-tuned.
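The transfer-then-fine-tune idea can be sketched with plain parameter dictionaries. The layer names (`feature_extractor`, `policy_head`), the learning rate, and the gradient values are hypothetical; in practice the gradients would come from training on intraday actual output data.

```python
import copy

import numpy as np


def transfer_parameters(general_params):
    """Initialize the target model as an independent copy of the general model."""
    return copy.deepcopy(general_params)


def fine_tune_step(params, grads, frozen_layers, lr=1e-3):
    """One gradient-descent step that skips the frozen feature-extraction layers."""
    for name, value in params.items():
        if name in frozen_layers:
            continue                       # keep feature-extraction parameters unchanged
        value -= lr * grads[name]          # in-place update of the remaining layers
    return params


general = {
    "feature_extractor": np.ones((4, 4)),
    "policy_head": np.ones((2, 4)),
}
target = transfer_parameters(general)
grads = {name: np.full_like(p, 0.5) for name, p in target.items()}
target = fine_tune_step(target, grads, frozen_layers={"feature_extractor"}, lr=0.1)
```

Copying before updating means the general model stays intact and can seed other target models later.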

[0014] Furthermore, embodiments of this specification provide a power distribution network voltage control device, comprising:
a first construction module, used to construct the output probability model of each distributed power source based on the historical output data of each distributed power source in the distribution network;
a second construction module, used to construct a day-ahead voltage control model based on the output probability model, with the goal of minimizing the active power loss of the distribution network and the node voltage exceedance risk index;
a determination module, used to determine, based on the day-ahead voltage control model, the first reactive power regulation command of multiple discrete voltage control devices in the distribution network and the reference voltage of each node in the distribution network within the first scheduling cycle;
a third construction module, used to build a general reinforcement learning model based on the intraday predicted output data of each distributed power source and the reference voltage of each node;
a transfer module, used to transfer the parameters of the general reinforcement learning model to the target reinforcement learning model;
a fine-tuning module, used to fine-tune the target reinforcement learning model based on the intraday actual output data of multiple distributed power sources;
a generation module, used to generate second reactive power regulation commands for multiple continuous voltage control devices in the distribution network within the second scheduling cycle using the fine-tuned target reinforcement learning model, where each first scheduling cycle includes multiple second scheduling cycles.

[0015] As can be seen from the technical solutions provided in the embodiments of this specification above, the embodiments of this specification can construct output probability models for each distributed power source based on historical output data of each distributed power source in the distribution network; based on the output probability models, a day-ahead voltage control model is constructed with the goal of minimizing the active power loss and node voltage exceedance risk index of the distribution network; based on the day-ahead voltage control model, the first reactive power regulation command of multiple discrete voltage control devices in the distribution network and the reference voltage of each node in the distribution network are determined within the first scheduling cycle; a general reinforcement learning model is constructed based on the intraday predicted output data of each distributed power source and the reference voltage of each node; the parameters of the general reinforcement learning model are transferred to the target reinforcement learning model; the target reinforcement learning model is fine-tuned based on the intraday actual output data of multiple distributed power sources; the fine-tuned target reinforcement learning model is used to generate the second reactive power regulation command of multiple continuous voltage control devices in the distribution network within the second scheduling cycle; each first scheduling cycle includes multiple second scheduling cycles. By constructing a distributed power source output probability model, the complex probability distribution characteristics (such as multi-peak and skewed) of photovoltaic and wind power output can be accurately characterized, thereby providing high-fidelity uncertainty input for subsequent stochastic optimization. 
Building upon this foundation, the constructed day-ahead voltage control model uses both a probability-based voltage exceedance risk index and system active power loss as optimization objectives, achieving a proactive quantitative trade-off between safety and losses. This model formulates globally optimal day-ahead switching plans for discrete devices with slow response times and limited action frequency (such as capacitor banks), effectively avoiding ineffective and frequent device actions and extending their service life while optimizing overall network operating losses. The generated reference voltage curves for each node provide a unified and coordinated tracking benchmark for intraday real-time control, ensuring the vertical coordination of multi-timescale control strategies. Through parameter transfer, the online controller inherits mature control strategies learned in offline training from the outset, avoiding the control risks and performance instability caused by random strategies in the early stages of traditional online learning, and achieving safe and smooth integration of AI controllers in critical power infrastructure. Furthermore, rapid fine-tuning of the model based on real-time data can promptly correct strategy deviations caused by prediction errors, enabling the system not only to cope with known fluctuations but also to adapt safely and efficiently to unknown real-time changes and slow time-varying drift. Ultimately, the fine-tuned target reinforcement learning model can quickly generate precise adjustment commands to coordinate multiple continuous voltage control devices based on real-time grid conditions, achieving rapid suppression of voltage fluctuations.
This real-time control layer works in conjunction with the previously planned discrete device schedule, forming a complete closed-loop control system in which the day-ahead discrete devices establish a safe operating framework, while the intraday continuous devices perform rapid and precise fine-tuning. This architecture fully integrates the response characteristics of different control devices, significantly improving the real-time safety and control accuracy of the system while protecting equipment lifespan and limiting full-cycle losses.

Attached Figure Description

[0016] To more clearly illustrate the technical solutions in the embodiments or prior art of this specification, the accompanying drawings used in the description of the embodiments or prior art will be briefly introduced below.

[0017] Figure 1 is a flowchart of a power distribution network voltage control method provided in the embodiments of this specification;
Figure 2 is a schematic diagram of the overall process of a power distribution network voltage control method provided in the embodiments of this specification;
Figure 3 is a schematic diagram of the process of constructing a target reinforcement learning model in a distribution network voltage control method provided in the embodiments of this specification;
Figure 4 is a flowchart illustrating the construction of a general reinforcement learning model in a distribution network voltage control method provided in the embodiments of this specification;
Figure 5 is a schematic diagram of the structural composition of a power distribution network voltage control device provided in the embodiments of this specification;
Figure 6 is a schematic diagram of the structural composition of a computer device provided in the embodiments of this specification.

Detailed Implementation

[0018] The technical solutions in the embodiments of this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this specification, and not all embodiments. Based on the embodiments in this specification, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of this specification.

[0019] It should be noted that the terms "first," "second," etc., used in this specification, claims, and the foregoing drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, apparatus, product, or device that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or devices.

[0020] In some embodiments, in a power system, a node can be an intersection or location point on the electrical wiring diagram of a distribution network. At this point, various components of the power grid are connected together, and electrical energy is exchanged and balanced. A node can be a geometric location where voltage is defined and monitored. In a mathematical model, the voltage magnitude and phase angle of each node can be solved, and all power flows in the distribution network are calculated based on these node voltages.

[0021] A node can connect to and contain the following types of physical entities: A load can refer to various users or equipment that absorb electrical energy from the power grid, such as residential buildings, factories, and commercial centers. A load consumes both active and reactive power.

[0022] Distributed power sources, such as photovoltaic power generation systems, wind turbines, and micro gas turbines, inject active power into nodes and may also provide or absorb reactive power (if their inverters have reactive power regulation capabilities).

[0023] The centralized power supply or upstream grid connection point is the low-voltage side busbar of the transformer in the distribution substation, which is the main entry point for the distribution network to receive electrical energy from the high-voltage transmission network.

[0024] Reactive power compensation equipment may include parallel capacitor banks / reactors and static var generators / compensators. The parallel capacitor banks / reactors are directly connected to the nodes and are used to inject or absorb fixed or grouped switching capacitive or inductive reactive power; the static var generators / compensators are power electronic-based fast reactive power regulation devices, also connected to the nodes.

[0025] Energy storage systems, such as battery energy storage, have converters connected to nodes that can operate in charging (equivalent to load) or discharging (equivalent to power supply) modes as needed, and can operate in four quadrants (independently regulating active and reactive power).

[0026] The connection point between a line and a transformer is either an end point of the line or a side of the transformer. Lines and transformers are the edges connecting different nodes, and nodes are the vertices of these edges.

[0027] In some embodiments, the reference voltage can be a target voltage value set for each such node. By adjusting discrete devices (such as capacitors) and continuous devices (such as photovoltaic inverters and energy storage) connected to certain nodes, the actual voltage of all nodes is made as close as possible to their reference value, thereby ensuring the safe and stable operation of the entire network.

[0028] Figure 1 is a flowchart of a power distribution network voltage control method provided in the embodiments of this specification. Figure 2 is a schematic diagram of the overall process of a power distribution network voltage control method provided in the embodiments of this specification. In specific implementation, the method may include the following steps:

S101: Based on the historical output data of each distributed power source in the distribution network, construct the output probability model of each distributed power source.

[0029] In some embodiments, step S101 may specifically include: for each distributed power source, constructing the power output probability density function of the distributed power source based on its historical power output data using a nonparametric kernel density estimation algorithm.

[0030] In some embodiments, a distribution network can be a medium- or low-voltage power distribution network that receives electrical energy from a high-voltage transmission network and distributes it to end users. The distribution network is subject to voltage control, and the integration of distributed power sources within it is a major factor causing voltage fluctuations.

[0031] In some embodiments, distributed power sources can be power generation devices that are distributed and installed in the distribution network or on the user side, such as photovoltaic power generation systems and wind power generation systems. Their power generation process depends on natural conditions (sunlight, wind speed), and is intermittent, random, and fluctuating. Their output cannot be accurately predicted, and they are the core source of uncertainty for voltage instability in the distribution network.

[0032] In some embodiments, historical power output data can be a time-series dataset composed of active power values actually generated by each distributed power source at different points in the past, recorded over a long period by a data acquisition and monitoring system. This data forms the objective basis for analyzing the randomness of its power output.

[0033] In some embodiments, the output probability model can be a mathematical model used to quantify the uncertainty of distributed power generation output. The output probability model does not provide deterministic predictions, but rather the probability distribution of the power generation output occurring at different power levels. The purpose of constructing this model is to provide a mathematical foundation for subsequent stochastic optimization, enabling optimization decisions to consider all possible operating scenarios and their probabilities, rather than relying on a single, potentially inaccurate, point prediction.

[0034] In some embodiments, for the i-th distributed power source in the distribution network, its historical active power output samples for N consecutive days are collected to form a sample set {P1, P2, …, Pn}. Before building the model, the data can be cleaned to remove outliers and missing values caused by communication failures, equipment maintenance, etc., to ensure the validity of the data.
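A minimal sketch of such a cleaning step is shown here. The specification does not state how outliers are identified; this example assumes a reading is an outlier when it is negative or exceeds the unit's rated power, and the function name and sample values are hypothetical.

```python
import numpy as np


def clean_output_samples(samples, rated_power):
    """Drop missing values and physically implausible readings.

    Assumed outlier rule: anything below zero or above the rated power.
    """
    x = np.asarray(samples, dtype=float)
    x = x[~np.isnan(x)]                          # missing values (e.g. communication failures)
    return x[(x >= 0.0) & (x <= rated_power)]    # out-of-range readings


raw = [0.0, 1.2, float("nan"), 3.4, -5.0, 99.0, 2.1]
cleaned = clean_output_samples(raw, rated_power=5.0)
```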

[0035] In some embodiments, based on the above sample set, the probability density function of the distributed power output can be derived. Traditional parametric methods (which assume that the output follows a specific parametric distribution, such as a normal or Weibull distribution) are not used here, because the actual output distribution may exhibit complex shapes such as multimodal or skewed distributions that are difficult to fit accurately with a single parametric distribution. Therefore, in some embodiments, a data-driven method, nonparametric kernel density estimation, can be used. Nonparametric kernel density estimation does not require prior assumptions about the shape of the output distribution and can learn directly from historical data to smoothly reconstruct the true probability density curve.

[0036] In some embodiments, using the same kernel function for every sample, the active power output probability density function of the distributed power source described above is constructed by nonparametric kernel density estimation, which in its standard form reads:

$$\hat{f}(P) = \frac{1}{nh} \sum_{j=1}^{n} K\!\left(\frac{P - P_j}{h}\right)$$

In the formula, n is the total number of historical samples, P_j is the j-th historical sample, h is the kernel bandwidth (smoothing parameter), and K(·) is the Gaussian kernel function, which smooths and spreads the influence of each sample.

[0037] In some embodiments, the cleaned historical power output samples {P1, P2, …, Pn} can be substituted into the above formula to obtain the continuous probability density function of the distributed power source's output. This function fully characterizes the probability that the power source's output will fall within any power range at any future time.
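The substitution of samples into a kernel density estimate can be sketched as follows. The bandwidth rule used here (the normal-reference rule of thumb, 1.06·σ·n^(-1/5)) is an assumption, since the text does not say how the bandwidth is chosen, and the sample values are invented for illustration.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used as the kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def rule_of_thumb_bandwidth(samples):
    """Normal-reference rule of thumb; one common choice, assumed here."""
    s = np.asarray(samples, dtype=float)
    return 1.06 * s.std(ddof=1) * s.size ** (-0.2)


def kde_pdf(p, samples, h):
    """Evaluate the kernel density estimate at the points p."""
    samples = np.asarray(samples, dtype=float)
    p = np.atleast_1d(np.asarray(p, dtype=float))
    u = (p[:, None] - samples[None, :]) / h      # one row per evaluation point
    return gaussian_kernel(u).sum(axis=1) / (samples.size * h)


# Invented bimodal samples: a cluster near zero output and a cluster near peak output.
samples = np.array([0.10, 0.20, 0.15, 3.0, 3.2, 2.9, 3.1])
h = rule_of_thumb_bandwidth(samples)
density_at_peak = kde_pdf(3.0, samples, h)[0]
```

A smaller bandwidth preserves separate peaks more sharply, while a larger one smooths them together, so the bandwidth choice matters for multimodal photovoltaic output.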

[0038] The output probability model established through nonparametric kernel density estimation can closely match the true stochastic characteristics of distributed generation output, especially effectively describing its multimodal (e.g., photovoltaic output clustering around zero power, medium power, and peak power) and asymmetric distribution. This provides high-quality, high-fidelity uncertainty input for subsequent stochastic optimization. Furthermore, based on this accurate probability model, day-ahead optimization can mathematically consider all possible output scenarios and their probabilities, thereby solving for a scheduling plan with the lowest expected cost and controllable voltage exceedance risk. Compared to deterministic optimization relying on a single prediction scenario or methods using coarse probabilistic assumptions, this approach can further reduce network losses while ensuring safe system operation (low risk). In addition, the nonparametric method does not depend on specific distribution assumptions, therefore it can be applied to any type of distributed generation (photovoltaic, wind power, etc.) and any geographical and climatic region. Only local historical data needs to be input to automatically build an adapted probability model, exhibiting good universality and adaptability, avoiding the degradation of control performance caused by model assumptions not matching reality.

[0039] S102: Based on the aforementioned output probability model, a day-ahead voltage control model is constructed with the goal of minimizing the active power loss of the distribution network and the risk index of node voltage exceeding limits.

[0040] In some embodiments, step S102 may specifically include: determining the semi-invariants of the output random variables corresponding to each distributed power source based on the output probability model of each distributed power source using a semi-invariant algorithm; converting the semi-invariants of the output random variables corresponding to each distributed power source into the semi-invariants of the voltages of each node in the distribution network; expanding the semi-invariants of the voltages of each node to obtain the probability distribution of the voltages of each node; and constructing a day-ahead voltage control model based on a mixed-integer second-order cone programming algorithm, with the goal of minimizing the active power loss of the distribution network and the node voltage over-limit risk index.
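The text does not name the expansion used to recover a distribution from semi-invariants. One common choice is the Gram-Charlier type A series, sketched here as an assumption, truncated at the fourth-order semi-invariant. The function name and numbers are illustrative.

```python
import math


def gram_charlier_pdf(x, k1, k2, k3=0.0, k4=0.0):
    """Approximate density from the first four cumulants (Gram-Charlier type A).

    k1 is the mean, k2 the variance, k3 and k4 the third- and fourth-order cumulants.
    A truncated series can go slightly negative in the tails, so it is an approximation.
    """
    sigma = math.sqrt(k2)
    z = (x - k1) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    he3 = z**3 - 3.0 * z                     # probabilists' Hermite polynomial He3
    he4 = z**4 - 6.0 * z**2 + 3.0            # probabilists' Hermite polynomial He4
    correction = 1.0 + k3 / (6.0 * sigma**3) * he3 + k4 / (24.0 * sigma**4) * he4
    return phi * correction / sigma


# Node voltage with mean 1.0 p.u. and standard deviation 0.02 p.u. (illustrative values).
density_at_mean = gram_charlier_pdf(1.0, k1=1.0, k2=0.0004)
```

With the third- and fourth-order terms set to zero the expression reduces to a normal density, which is a convenient sanity check.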

[0041] In some embodiments, active power loss can be the unavoidable power loss converted into heat energy during the transmission of electrical energy in the distribution network due to line resistance, transformers and other equipment, and its value is directly related to the system power flow (especially the square of the current).

[0042] In some embodiments, the node voltage exceedance risk index can be a quantitative probabilistic safety indicator used to measure the likelihood that the voltage of a node in the distribution network will deviate from the allowable operating range under conditions of uncertain distributed generation output. It is not a deterministic yes or no judgment, but a risk pre-assessment based on a probabilistic model. The lower the value, the higher the voltage safety margin and the more robust the system operation.

[0043] In some embodiments, the day-ahead voltage control model can be a mathematical optimization model. Its inputs are the output probability model constructed in step S101, as well as distribution network topology information, load forecasts, equipment parameters, etc.; its decision variables are the switching plans of discrete voltage control devices (such as capacitor banks) for the entire next day; its objective is to achieve optimal grid operation (such as minimum loss and minimum risk) while satisfying all physical and safety constraints; its output is the optimal equipment scheduling instructions and the voltage reference trajectory of each node.

[0044] In some embodiments, semi-invariants can be numerical features (such as mean, variance, skewness, etc.) used to describe the probability distribution characteristics of random variables. Compared with raw moments or central moments, semi-invariants have better additivity: the semi-invariant of the sum of multiple independent random variables is equal to the sum of the semi-invariants of each variable.
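The additivity property can be checked numerically. The sketch below computes the first four semi-invariants (cumulants) of two independent discrete distributions and of their sum; the distributions and function names are hypothetical and chosen only to make the check exact.

```python
def cumulants(pmf):
    """First four cumulants of a discrete distribution {value: probability}."""
    mean = sum(v * p for v, p in pmf.items())
    mu = {n: sum((v - mean) ** n * p for v, p in pmf.items()) for n in (2, 3, 4)}
    # k1 = mean, k2 = mu2, k3 = mu3, k4 = mu4 - 3 * mu2^2
    return [mean, mu[2], mu[3], mu[4] - 3.0 * mu[2] ** 2]

def convolve(pmf_a, pmf_b):
    """Distribution of A + B for independent A and B."""
    out = {}
    for va, pa in pmf_a.items():
        for vb, pb in pmf_b.items():
            key = round(va + vb, 9)
            out[key] = out.get(key, 0.0) + pa * pb
    return out

pv = {0.0: 0.3, 0.5: 0.4, 1.0: 0.3}      # hypothetical PV output (p.u.)
wind = {0.0: 0.2, 0.4: 0.5, 0.8: 0.3}    # hypothetical wind output (p.u.)

k_sum = cumulants(convolve(pv, wind))                           # cumulants of PV + wind
k_add = [a + b for a, b in zip(cumulants(pv), cumulants(wind))]  # sum of cumulants
```

Here `k_sum` and `k_add` agree up to floating-point rounding, which is what allows the distribution of a sum to be characterized without convolving the distributions themselves.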

[0045] In some embodiments, mixed-integer second-order cone programming can be a class of optimization problems whose continuous relaxation is convex. "Mixed-integer" means that the optimization variables include both continuous variables (such as energy storage output) and integer variables (such as the number of capacitor banks switched in); "second-order cone" means that the constraints include second-order cone constraints, which can efficiently describe the nonlinear relationships formed by the power flow equations of the distribution network after convex relaxation.
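One widely used relaxation of this kind is the branch flow (DistFlow) model for radial networks. The following is a sketch under that assumption and is not necessarily the exact formulation of this embodiment. With $\ell_{ij}$ the squared current of line $ij$, $v_i$ the squared voltage at its sending node, and $P_{ij}$, $Q_{ij}$ the line power flows (all notation assumed here), the non-convex equality is relaxed to an inequality, which is a second-order cone constraint:

```latex
\ell_{ij}\, v_i = P_{ij}^2 + Q_{ij}^2
\;\;\longrightarrow\;\;
\ell_{ij}\, v_i \ge P_{ij}^2 + Q_{ij}^2
\;\;\Longleftrightarrow\;\;
\left\lVert \begin{pmatrix} 2P_{ij} \\ 2Q_{ij} \\ \ell_{ij} - v_i \end{pmatrix} \right\rVert_2 \le \ell_{ij} + v_i
```

The equivalence follows from $(\ell_{ij} + v_i)^2 - (\ell_{ij} - v_i)^2 = 4\,\ell_{ij} v_i$ together with $\ell_{ij}, v_i \ge 0$.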

[0046] In some embodiments, for each distributed power source, the first few orders (e.g., orders 2-4) of semi-invariants of the random power output variable can be calculated based on its output probability density function. The semi-invariants can be obtained from its central moments or from its cumulant generating function, and they encapsulate the core statistical characteristics of the probability distribution of the random variable (such as fluctuation amplitude, distribution skewness, etc.).

[0047] In some embodiments, the nonlinear AC power flow equations describing the physical laws of the distribution network can be expanded using Taylor series at a certain baseline operating point (the operating state composed of the predicted expected values of each variable), and second-order and higher-order terms can be ignored, thereby obtaining a linearized incremental model. This linearized model can clearly define the sensitivity path of random fluctuations affecting the system state.
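As a worked form of this step, write the power flow equations as $W = f(X)$, with $W$ the vector of nodal power injections and $X$ the vector of state variables. The symbols $W$, $X$, $J_0$ and $S_0$ are notation assumed here and do not appear in the original text. The first-order expansion about the baseline point $(X_0, W_0)$ then reads:

```latex
\begin{aligned}
W_0 + \Delta W &= f(X_0 + \Delta X) \approx f(X_0) + J_0\,\Delta X,
\qquad J_0 = \left.\frac{\partial f}{\partial X}\right|_{X = X_0},\\
\Delta X &\approx J_0^{-1}\,\Delta W = S_0\,\Delta W .
\end{aligned}
```

Each entry of the sensitivity matrix $S_0$ states how strongly a power fluctuation at one node moves one state variable, such as a node voltage amplitude.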

[0048] In some embodiments, the additivity of semi-invariants and the aforementioned linearization model can be utilized to linearly superimpose the semi-invariants of each order of random fluctuations in each distributed power source according to a linear sensitivity relationship, thereby efficiently calculating the semi-invariants of each order of voltage fluctuations at each node of the system. Subsequently, mathematical methods such as the Gram-Charlier series expansion or the Cornish-Fisher expansion are used to reconstruct approximate probability density functions or cumulative distribution functions of the node voltages from their semi-invariants. Thus, a complete probabilistic description of the node voltages can be obtained without enumerating massive numbers of scenarios.
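The superposition and expansion steps can be sketched as follows, assuming independent sources and a single node whose voltage deviation is a linear combination of the source fluctuations. The sensitivity values, the source cumulants and the function names are hypothetical, and the Cornish-Fisher form shown is the standard expansion truncated at the fourth cumulant.

```python
import math
from statistics import NormalDist

def propagate_cumulants(sensitivities, source_cumulants, max_order=4):
    """Cumulants of dU = sum_j s_j * dP_j for independent sources.

    The order-n cumulant of s * X is s**n times that of X, and the
    cumulants of independent terms add.
    """
    return [
        sum(s ** n * k[n - 1] for s, k in zip(sensitivities, source_cumulants))
        for n in range(1, max_order + 1)
    ]

def cornish_fisher_quantile(cum, prob):
    """Approximate quantile from the first four cumulants."""
    mean, var, k3, k4 = cum
    std = math.sqrt(var)
    g1, g2 = k3 / std ** 3, k4 / std ** 4      # skewness, excess kurtosis
    z = NormalDist().inv_cdf(prob)
    w = (z + (z * z - 1) * g1 / 6 + (z ** 3 - 3 * z) * g2 / 24
         - (2 * z ** 3 - 5 * z) * g1 ** 2 / 36)
    return mean + std * w

# Hypothetical voltage sensitivities (p.u. voltage per p.u. power) of one
# node to two sources, and the sources' cumulants of orders 1 to 4.
sens = [0.02, 0.05]
src = [[0.0, 0.04, 0.0, 0.0], [0.0, 0.01, 0.0, 0.0]]

dv_cum = propagate_cumulants(sens, src)
dv_q95 = cornish_fisher_quantile(dv_cum, 0.95)   # 95% quantile of the deviation
```

With purely Gaussian sources, as in this example, the third and fourth cumulants vanish and the quantile reduces to the normal one; non-zero higher cumulants would shift it to reflect skewness and heavy tails.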

[0049] In some embodiments, the obtained node voltage probability distribution is used to transform the hard requirement that the voltage must be kept within its limits into a probabilistic soft constraint: the probability of the voltage exceeding its limits must not exceed a preset small threshold. This constitutes the risk constraint of the model.

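[0050] In some embodiments, the objective function can be designed to minimize the total active power loss of the distribution network over the scheduling horizon:

$$\min f_1 = \sum_{t} P_t^{\mathrm{loss}}, \qquad P_t^{\mathrm{loss}} = \sum_{k \in \Omega_L} r_k I_{k,t}^2$$

In the formula: $t$ is the scheduling time; $\Omega_L$ represents the set of distribution network lines; $\Omega_N$ represents the set of nodes in the distribution network; $\Omega_{ESS}$ represents the ESS set; $\Omega_{PV}$ represents the PV set; $r_k$ represents the resistance of the k-th line; $I_{k,t}$ represents the current in the k-th line at time t; $P_{i,t}$ represents the active power of the i-th node at time t; $P_{i,t}^{ch}$ and $P_{i,t}^{dis}$ represent the charging and discharging power of the i-th ESS at time t, respectively; $P_{i,t}^{PV}$ represents the active power output of the i-th PV at time t; $P_t^{\mathrm{loss}}$ is the network loss. The nodal, ESS and PV powers enter through the active power balance of the network.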

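[0051] In some embodiments, the voltage safety operation of the distribution network can be considered simultaneously, transforming the voltage safety constraints of the distribution network into an objective function that minimizes the node voltage over-limit risk index:

$$\min f_2 = \sum_{t} \sum_{i \in \Omega_N} \left[ \int_{U_i^{\max}}^{+\infty} f_{i,t}(U)\,dU + \int_{-\infty}^{U_i^{\min}} f_{i,t}(U)\,dU \right]$$

In the formula: $t$ is the scheduling time; $i$ indicates a distribution network node and $\Omega_N$ the set of such nodes; $U_i^{\max}$ and $U_i^{\min}$ are the maximum and minimum values of the voltage of the i-th node; $f_{i,t}(U)$ is the probability density function used to evaluate the risk of the voltage exceeding its limits.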
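A minimal numerical sketch of such a risk index is given below. It assumes, purely for illustration, that each node voltage is approximately normally distributed and that the allowable range is 0.95 to 1.05 p.u.; the node data, the threshold `epsilon` and the function name are hypothetical.

```python
from statistics import NormalDist

def overlimit_risk(mean_u, std_u, u_min=0.95, u_max=1.05):
    """Probability that a node voltage lies outside [u_min, u_max]."""
    dist = NormalDist(mean_u, std_u)
    return dist.cdf(u_min) + (1.0 - dist.cdf(u_max))

# Hypothetical (mean, standard deviation) of node voltages in p.u.
nodes = {"n1": (1.00, 0.010), "n2": (1.03, 0.015)}

risk = {name: overlimit_risk(m, s) for name, (m, s) in nodes.items()}
total_risk = sum(risk.values())            # summed risk index over the nodes

epsilon = 0.05                             # assumed acceptable risk per node
violations = [name for name, r in risk.items() if r > epsilon]
```

In this example node `n2` sits close to its upper limit, so its over-limit probability exceeds the assumed threshold, while node `n1` carries negligible risk.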

[0052] In some embodiments, the objective function, probabilistic safety constraints (which can be equivalently transformed into deterministic constraints), equipment physical constraints (such as the number of capacitor switching operations, energy storage balance, etc.), and power flow equations treated with convex relaxation (such as the second-order cone form) can be collectively constructed into a complete mixed-integer second-order cone programming model. Finally, a high-performance commercial solver is invoked to solve this convex optimization model, thereby obtaining the optimal day-ahead scheduling plan for discrete equipment in the sense of global optimum.

[0053] By combining linearization and semi-invariant methods, an approximate probability distribution of the node voltages can be obtained efficiently and analytically without requiring tens of thousands of scenario simulations. This solves the problem of low computational efficiency caused by the curse of dimensionality in traditional stochastic optimization, making refined probabilistic risk assessment of complex distribution networks computationally feasible online. Furthermore, the probability-based voltage exceedance risk index is used as one of the explicit optimization objectives and is optimized together with the active power loss objective. The optimization result is therefore driven neither solely by loss reduction at the expense of risk nor solely by conservative safety margins; instead, the model proactively and quantitatively finds a point on the Pareto front between active power loss and safety according to the risk preferences set by decision-makers, achieving intelligent risk management. Because the optimization process considers the probability weights of all possible output scenarios, the generated discrete equipment switching plans are inherently robust to an uncertain future. Compared to plans that rely on a single predicted scenario, the generated plans can ensure system safety in most possible actual situations, significantly reducing the number of emergency interventions required due to prediction deviations and improving the authority and executability of scheduling operations. Furthermore, by employing second-order cone programming to relax the power flow equations, the originally non-convex and difficult problem is transformed into one whose continuous relaxation is convex. This ensures that the obtained solution is either the globally optimal solution or a high-quality feasible solution with a very small duality gap, avoiding the tendency of traditional algorithms to get trapped in local optima, thus ensuring the theoretical optimality and practical efficiency of the day-ahead scheduling plan.

[0054] S103: Based on the day-ahead voltage control model, determine the first reactive power regulation command of multiple discrete voltage control devices in the distribution network and the reference voltage of each node in the distribution network within the first scheduling cycle.

[0055] In some embodiments, step S103 may specifically include: determining the first reactive power regulation command of multiple discrete voltage control devices in the distribution network and the reference voltage of each node in the distribution network within the first scheduling cycle based on the day-ahead voltage control model.

[0056] In some embodiments, the first scheduling cycle can be a pre-planned time range. For example, it can be a complete calendar day (24 hours) divided into several equal-length scheduling periods (e.g., 96 periods of 15 minutes each), with each such period corresponding to one second scheduling cycle. The output of step S103 above can be a global pre-arrangement for the whole first scheduling cycle.

[0057] In some embodiments, discrete voltage control devices can be reactive power regulating devices with a step-type or switching operation, including capacitor banks. Their working principle involves connecting or disconnecting one or more fixed-capacity capacitor banks from the power grid via mechanical or power electronic switches, thereby step-changing the reactive power injection at the node. Due to the limited lifespan of their mechanical switches, there is a strict upper limit to the number of daily operations for such devices.
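The daily operation limit can be checked against a candidate schedule as sketched below. Counting one operation per period in which the number of connected banks changes is an assumption made here; the schedule, the limit values and the function names are hypothetical.

```python
def count_switching_operations(schedule, initial_state=0):
    """Number of periods in which the connected-bank count changes."""
    states = [initial_state] + list(schedule)
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)

def within_daily_limit(schedule, max_operations, initial_state=0):
    """True if the schedule respects the daily operation limit."""
    return count_switching_operations(schedule, initial_state) <= max_operations

# Hypothetical number of connected capacitor banks per scheduling period.
plan = [0, 0, 1, 1, 2, 2, 2, 1, 0, 0]
operations = count_switching_operations(plan)
```

In an optimization model the same quantity would typically appear as a constraint on the sum of state changes rather than as a check applied after the fact.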

[0058] In some embodiments, the first reactive power regulation command may be a control output generated during the day-ahead optimization phase. It is a series of time-discrete, formatted commands that explicitly specify the state that each discrete voltage control device should be in during each scheduling period of the first scheduling cycle (e.g., for capacitor banks, whether it is connected or disconnected, or the specific number of banks connected). The physical essence of this command is that it specifies the discrete reactive power values that the device should inject into or absorb from the grid in future time periods; hence, it is called a reactive power regulation command.

[0059] In some embodiments, the reference voltage can be another output generated during the day-ahead optimization phase. It is a time-series vector that defines a target voltage value to be achieved or approximated for each node in the distribution network during each scheduling period of the first scheduling cycle. The reference voltage curve is an optimal operating trajectory found by the day-ahead model after comprehensively considering active power loss, security, and equipment operation constraints. It serves as the benchmark for evaluating voltage deviation and triggering adjustment actions during the intraday real-time control phase.

[0060] In some embodiments, a mathematical optimization solver (such as CPLEX or Gurobi) can be invoked to solve the day-ahead voltage control model of the mixed-integer second-order cone programming established in step S102. The solver uses algorithms such as the interior-point method and branch-and-bound to find a global or near-global optimal solution that minimizes the objective function (the weighted sum of active power loss and risk indicators). This solution contains the optimal values of all optimization variables over the entire scheduling period.

[0061] In some embodiments, the binary or integer decision variable values u_{i,t} of all discrete voltage control devices (indexed i) for all scheduling periods t can be extracted from the solution results. For example, for capacitor banks, u_{i,t} = 0 indicates disconnection, and u_{i,t} = 3 indicates the activation of 3 capacitor banks.

[0062] In some embodiments, these numerical sequences can be converted into a sequence of control instructions that can be directly recognized and executed by the device controller. The instruction format may include: device ID, timestamp (or time period number), target action or status. For example, {Device: CB_01, Time: 08:00, Instruction: Activate Group 2}. This complete set of instructions, ordered by time, constitutes the first reactive power regulation instruction.
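The conversion from a numerical sequence to time-ordered instructions could look like the sketch below. The 15-minute period length, the instruction wording, the assumption that all banks are disconnected before the first period, and the function name are illustrative choices rather than a format defined by this embodiment.

```python
from datetime import datetime, timedelta

def to_instructions(device_id, steps, period_minutes=15):
    """Emit one instruction each time the scheduled bank count changes."""
    start = datetime(2000, 1, 1)          # arbitrary date; only the time of day is used
    previous = 0                          # assumed: all banks disconnected initially
    instructions = []
    for t, banks in enumerate(steps):
        if banks != previous:
            stamp = (start + timedelta(minutes=t * period_minutes)).strftime("%H:%M")
            instructions.append({
                "Device": device_id,
                "Time": stamp,
                "Instruction": f"Set connected banks to {banks}",
            })
            previous = banks
    return instructions

# Hypothetical day-ahead result for one capacitor device over 96 periods:
# 2 banks connected from 08:00 to 18:00, none otherwise.
u = [0] * 32 + [2] * 40 + [0] * 24
commands = to_instructions("CB_01", u)
```

Emitting an instruction only on a change of state keeps the command list short and makes the number of switching operations directly visible.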

[0063] In some embodiments, the optimal values of continuous variables U_{j,t} related to the voltage amplitude of each node can be extracted from the same solution result, where j is the node index and t is the time period index. These values represent the expected voltage level of each node in the corresponding time period under the optimal equipment scheduling plan. These optimal voltage values can be organized by node and time to form a reference voltage matrix or curve. The reference voltage set reflects the globally optimal operating state that the system can achieve under the day-ahead forecast information.

[0064] In some embodiments, the generated first reactive power regulation command can be substituted into the power flow calculation to verify whether the obtained node voltage is highly consistent with the reference voltage output by the model, so as to ensure the internal consistency and physical feasibility of the optimization results.

[0065] By concretizing and operationalizing the abstract mathematical model into explicit instructions that the equipment controller can directly understand and execute, advanced stochastic optimization results can be directly transformed into reliable bases for driving physical equipment actions, ensuring the feasibility of the technical solution. The reference voltage is both a reflection of the day-ahead optimization results and a target benchmark for intraday real-time control. By transmitting it to the intraday control module (such as a reinforcement learning agent), seamless connection and target coordination between the day-ahead macro-level layout and the intraday micro-level adjustment are achieved, avoiding control conflicts caused by inconsistent targets between the two stages. Furthermore, the first reactive power regulation instruction obtained by solving the optimization model is a globally optimal plan derived under strict constraints on the number of equipment actions. This effectively avoids ineffective or frequent actions of discrete equipment, extends its mechanical life, and maximizes its regulation benefits by optimizing the switching timing, improving system efficiency from both equipment management and energy consumption operation perspectives.

[0066] S104: Construct a general reinforcement learning model based on the intraday predicted output data of each distributed power source and the reference voltage of each node.

[0067] In some embodiments, step S104 may specifically include: taking the reference voltage as the control target, taking the intraday predicted power output data and the power grid state data of the distribution network as input, and using a deep deterministic policy gradient algorithm for offline training to obtain a general reinforcement learning model for continuous voltage control equipment regulation.

[0068] In some embodiments, using the reference voltage as the control target can be achieved by setting the reference voltage curves of each node generated in step S103 as the expected voltage values that the agent needs to track when constructing the environment and reward mechanism of the reinforcement learning model (refer to Figure 4). During model training, the reward signal obtained by the agent is directly negatively correlated with the degree to which the actual voltage of the power grid deviates from the reference voltage under its control action, thereby driving its policy learning to evolve in the direction of minimizing voltage deviation.
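One simple reward with this property is the negative mean absolute deviation between actual and reference voltages, sketched below. The specific functional form, the function name and the voltage values are assumptions; the text only requires that the reward decrease as the deviation grows.

```python
def voltage_tracking_reward(actual, reference):
    """Negative mean absolute deviation from the reference voltages (p.u.)."""
    if len(actual) != len(reference):
        raise ValueError("actual and reference must cover the same nodes")
    return -sum(abs(a - r) for a, r in zip(actual, reference)) / len(actual)

reference = [1.00, 1.01, 0.99]            # hypothetical day-ahead reference voltages
good = voltage_tracking_reward([1.00, 1.01, 0.99], reference)   # perfect tracking
poor = voltage_tracking_reward([1.04, 0.97, 1.03], reference)   # large deviations
```

Perfect tracking yields the maximum reward of zero, and any deviation makes the reward negative, so a policy that maximizes cumulative reward is pushed toward the reference trajectory.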

[0069] In some embodiments, intraday predicted power output data can be one of the input features used for model training, mainly including a sequence of predicted active power values from distributed power sources such as photovoltaics and wind power in the near future. This data represents a forward-looking description of the core sources of uncertainty in the simulated operating scenario during the training phase, enabling the agent to learn preventative and coordinated control patterns under expected fluctuations.

[0070] In some embodiments, grid status data of the distribution network, as another input, may include, but is not limited to: real-time voltage amplitudes at each node, line power flow information, active and reactive load data, and network topology connections. This data provides the agent with complete contextual information reflecting the real-time operating status of the system, which is necessary for decision-making.

[0071] In some embodiments, offline training can be performed entirely within a high-fidelity digital twin simulation environment of a power distribution network. In this environment, the agent can undergo large-scale, high-speed, and absolutely safe trial-and-error learning based on historical data or synthesized various typical and extreme operating scenarios (covering different weather and load combinations), allowing it to accumulate rich regulatory experience without interacting with the real physical power grid.

[0072] The resulting general reinforcement learning model is an agent that has internalized how to implement fast and coordinated voltage control strategies under various prediction scenarios. The model's parameters form a transferable knowledge base, encapsulating general knowledge about power grid dynamic response and equipment coordination. This provides a high-performance, high-security starting point for policy initialization of online controllers deployed in real-world environments, thus solving the problems of slow training and high initial risk in real-time control using deep reinforcement learning.

[0073] S105: Transfer the parameters of the general reinforcement learning model to the target reinforcement learning model.

[0074] In some embodiments, step S105 may specifically include: transferring the parameters of the general reinforcement learning model to the target reinforcement learning model.

[0075] In some embodiments, parameter transfer, in the field of machine learning, refers to the process of applying model parameters (i.e., the weights and biases of a neural network) trained on one task (source task) as initial values to the model building process of another similar task (target task). Its core idea is to reuse learned, generalizable feature representations or decision patterns, rather than learning from random states, thereby significantly improving the learning efficiency and final performance of the new task.

[0076] In some embodiments, the target reinforcement learning model can be a reinforcement learning agent deployed in a real-time online distribution network environment to make decisions. Its network architecture (such as an Actor-Critic structure) is exactly the same as that of a general reinforcement learning model, but the task environment it faces has subtle differences: its input is based on real-time monitoring data, rather than prediction data, and it needs to interact directly with the real physical power grid in a closed loop. Therefore, it is an online controller that needs to make fast, safe, and adaptive decisions based on real-time feedback.

[0077] In some embodiments, the task scenario of the general reinforcement learning model is voltage control based on intraday forecast data. This is a scenario built in a digital simulation environment, characterized by data drawn from historical records and forecasts, covering various possible but not real-time weather and load combinations (e.g., sunny, cloudy, rainy, typical days in different seasons). In this scenario, the model learns a general strategy of "how to adjust if a certain predicted fluctuation occurs." The task scenario of the target reinforcement learning model is voltage control based on actual data. This is a real-time operation scenario of a real physical power grid, characterized by instantaneous, unique data containing unknown fluctuations (e.g., a sudden drop in photovoltaic power caused by an unforeseen cloud). In this scenario, the model needs to solve the specific problem of how to adjust precisely in the face of the actual fluctuations occurring at that moment. The core control logic (state-action mapping) of the two scenarios is highly similar, but the distribution of the input data is shifted. This high similarity makes the deep knowledge learned by the general reinforcement learning model regarding power grid dynamic response, equipment coordination patterns, etc., highly reusable.

[0078] In some embodiments, the neural network architecture of the target reinforcement learning model (including the number of layers in the Actor and Critic networks, the number of neurons per layer, the type of activation function, etc.) is ensured to be completely consistent with the general reinforcement learning model. This is a structural prerequisite for direct parameter transfer. All network parameters trained and converged in the general reinforcement learning model can be completely copied and assigned to the corresponding network parameters of the target reinforcement learning model.
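The structural prerequisite and the copy itself can be sketched as follows, representing each network as a dictionary of weight matrices. This representation, the layer names and the function names are hypothetical; a deep learning framework would provide its own mechanism for copying parameters.

```python
import copy

def layer_shapes(params):
    """Map each layer name to the (rows, cols) shape of its weight matrix."""
    return {name: (len(w), len(w[0])) for name, w in params.items()}

def transfer_parameters(source, target):
    """Copy all parameters from source to target if the architectures match."""
    if layer_shapes(source) != layer_shapes(target):
        raise ValueError("architectures differ; direct parameter transfer is not possible")
    for name, weights in source.items():
        target[name] = copy.deepcopy(weights)   # independent copy, not a shared reference
    return target

# Hypothetical trained general model and freshly created target model.
general_actor = {"hidden1": [[0.1, -0.2], [0.3, 0.4]], "output": [[0.5, -0.6]]}
target_actor = {"hidden1": [[0.0, 0.0], [0.0, 0.0]], "output": [[0.0, 0.0]]}

transfer_parameters(general_actor, target_actor)
target_actor["output"][0][0] = 0.9        # later fine-tuning touches only the copy
```

Because the parameters are deep-copied, later changes to the target model leave the general model untouched.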

[0079] In the general reinforcement learning model, the parameters of the policy network are crucial. This policy network directly maps the grid state to control actions, and its parameters embed the core decision-making logic and coordination rules regarding voltage regulation, serving as the main body for cross-scenario knowledge reuse. Inheriting these parameters provides the online controller with a readily available and reliable basic policy, enabling a warm start and significantly shortening its convergence time when adapting to real-time fluctuations. Simultaneously, the parameters of the value network are selectively transferred. The value network contains prior assessments of grid operating losses and safety; transferring its parameters helps the target reinforcement learning model quickly establish a value judgment benchmark for new real-time data, improving initial learning stability and efficiency. This hierarchical and focused transfer strategy maximizes the reuse of general knowledge while laying a sound foundation for rapid and safe fine-tuning of the target reinforcement learning model.

[0080] In some embodiments, after parameter transfer is completed, the target reinforcement learning model is in a pre-trained ready state. It already possesses a high-performance basic control strategy, can be immediately deployed online, and can generate relatively reasonable and safe initial control commands, thereby completely avoiding the cold start risk and highly random exploration phase in the early stages of online learning.

[0081] Traditional deep reinforcement learning online training must start with random policies, which in the early stages of exploration outputs a large number of invalid or even harmful random actions, potentially endangering power grid safety, and the learning convergence is slow. Through parameter transfer, the online controller possesses near-mature control capabilities from the very first moment, and the initial output commands already have the basic effectiveness of maintaining voltage stability. This achieves a safe, smooth, and performance cliff-free transition from offline to online operation, meeting the stringent safety and real-time requirements of industrial control systems.

[0082] Furthermore, transferring the parameters of the general reinforcement learning model to an independent target reinforcement learning model, rather than directly fine-tuning the general reinforcement learning model itself, is based on considerations of the distribution network's security, stability, and maintainability. Parameter transfer can protect and solidify the general knowledge base: the general reinforcement learning model is trained on rich and diverse offline prediction data, and its parameters contain general strategies and value judgments for dealing with various possible scenarios, making it a valuable reusable asset for the system. If it were fine-tuned directly, real-time data would continuously overwrite its parameters, leading to catastrophic forgetting and causing it to quickly lose its generalization ability and long-term robustness. By creating the target reinforcement learning model as a copy through parameter transfer, a safe decoupling of knowledge inheritance and real-time adaptation is achieved: the target reinforcement learning model inherits a high-performance initial strategy, achieving a rapid warm start and fast convergence, while all its adjustments are strictly limited to the copy and do not affect the integrity and stability of the general reinforcement learning model. This architecture ensures that the system has a reliable and unchanging knowledge base at all times, supporting rapid fault isolation and safe policy rollback, thereby meeting the stringent standards of operational determinism, adaptability, and long-term reliability for critical power infrastructure.

[0083] S106: Fine-tune the target reinforcement learning model based on the intraday actual output data of multiple distributed power sources.

[0084] In some embodiments, step S106 may specifically include: freezing the parameters of the network layers in the target reinforcement learning model that extract wind and solar fluctuation features, based on a preset large model; and fine-tuning the remaining network layer parameters in the target reinforcement learning model based on the intraday actual power output data.

[0085] In some embodiments, the intraday actual output data can be the active power value actually generated by each distributed power source, collected in real time by measuring devices (such as smart meters, SCADA systems) on the day the distribution network is actually operating. Compared with the intraday predicted output data, it can reflect the true and accurate operating status of the power grid, but it also includes real-time fluctuations and short-term uncertainties.

[0086] In some embodiments, referring to Figure 3, fine-tuning can be a transfer learning technique, which involves making small, targeted updates to some or all of the parameters of a pre-trained model for a target task, so that the model can better adapt to the data distribution of the target task. It can maintain the general knowledge (feature extraction ability) already learned by the pre-trained model and only adjust the specific parts responsible for the final decision, thereby achieving fast and stable domain adaptation.

[0087] In some embodiments, the pre-set large model can be a large language model that has been fully pre-trained on large-scale data (such as historical and predicted data from multiple scenarios) and has strong representation learning capabilities.

[0088] In some embodiments, freezing network layer parameters can be a specific fine-tuning strategy. Neural networks consist of multiple layers; shallow layers tend to learn general, low-level features (such as edges and textures), while deeper layers learn high-level features and decision logic relevant to a specific task. Freezing can be a process of fixing (not updating) the parameters of certain layers in the model during fine-tuning, keeping them unchanged during training on the target task, thereby preserving the general knowledge learned in the source task.

[0089] In some embodiments, the network layers for wind and solar power fluctuation feature extraction can be neural network layers in the Actor and Critic networks responsible for identifying and abstracting the randomness, periodicity, and fluctuation patterns of wind and solar power output from the raw input data (especially distributed power generation sequences). These can be the first few layers of the network or the feature encoder portion.

[0090] In some embodiments, the network structure of the target reinforcement learning model (i.e., the model after parameter transfer) can be analyzed. Based on prior knowledge or experimental analysis, it can be identified which layers in the network are mainly responsible for learning the general spatiotemporal characteristics of wind and solar power output fluctuations (e.g., the first few layers of a convolutional layer or a recurrent neural network), and which layers are mainly responsible for mapping high-level features to control actions for specific grid topology and operating constraints (e.g., fully connected output layers).

[0091] In some embodiments, the parameters of the identified network layers responsible for extracting wind and solar fluctuation features (e.g., the first to third hidden layers of an Actor network) can be set to an untrainable state. Technically, this means that the parameter gradients of these layers are ignored during backpropagation, and their values remain constant throughout the fine-tuning process. This aims to lock in the general understanding of the random nature of wind and solar output that the general reinforcement learning model has learned from massive amounts of prediction data.
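The effect of the untrainable state on a parameter update can be sketched with a plain gradient-descent step that skips the frozen layers. The flat-list representation of each layer, the layer names, the gradient values and the learning rate are hypothetical; in a deep learning framework the same effect is usually obtained by disabling gradient tracking for the frozen layers.

```python
def fine_tune_step(params, grads, frozen_layers, learning_rate=0.01):
    """One gradient-descent step that leaves frozen layers unchanged."""
    for name in params:
        if name in frozen_layers:
            continue                       # frozen: gradient ignored, values stay constant
        params[name] = [w - learning_rate * g
                        for w, g in zip(params[name], grads[name])]
    return params

# Hypothetical Actor network: two feature-extraction layers and one output layer.
actor = {"feature1": [0.2, -0.1], "feature2": [0.4, 0.3], "output": [0.5, -0.5]}
grads = {"feature1": [1.0, 1.0], "feature2": [1.0, 1.0], "output": [1.0, -1.0]}

fine_tune_step(actor, grads, frozen_layers={"feature1", "feature2"}, learning_rate=0.1)
```

After the step only the `output` layer has moved; the two feature layers keep the values inherited from the general model.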

[0092] In some embodiments, during real-time operation of the power grid, status information, including intraday actual power output data and measured node voltage values, is continuously collected. This status information can be input into the target reinforcement learning model. Inside the model, frozen network layers extract features using their fixed parameters, while unfrozen network layers use currently trainable parameters to perform calculations, ultimately outputting control actions. These control actions are then sent to continuous voltage control equipment for execution.

[0093] In some embodiments, feedback and experience collection can consist of the environment (the distribution network) producing a new state and a reward (calculated based on the deviation between the actual voltage and the reference voltage) after an action is performed. The resulting experience samples can be stored in an online experience replay buffer.
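A fixed-capacity store of experience samples with random mini-batch sampling can be sketched as follows. The class name, the tuple layout, the capacity and the sample values are hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) samples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest samples are dropped first

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a random mini-batch without replacement."""
        size = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), size)

    def __len__(self):
        return len(self.buffer)

# Hypothetical interaction: five control steps stored in a buffer of capacity 3.
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(state=[1.0 + 0.01 * step], action=[0.1],
               reward=-0.01 * step, next_state=[1.0])
batch = buffer.sample(2)
```

A bounded buffer keeps the training data weighted toward recent operating conditions, which suits online fine-tuning where the grid state drifts over the day.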

[0094] In some embodiments, small batches of data can be periodically sampled from the online experience replay buffer to calculate the loss function. During backpropagation, gradients are calculated and applied only for the parameters of the remaining network layers that were not frozen. These layers are responsible for the final output of the decision, and their adjustments enable the model to quickly learn how to combine general wind and solar fluctuation characteristics with the real-time topology, load distribution, and equipment status of the current power grid to generate more accurate and realistic reactive power regulation commands.

[0095] Through this layered, partially frozen fine-tuning approach, the target reinforcement learning model can smoothly and stably adapt to the real-time operating environment without destroying existing valuable knowledge, and its control strategy continuously optimizes over time. Specifically, by freezing the general feature extraction layer, the model's core understanding of the randomness of wind and solar energy remains stable when adapting to real-time data, fundamentally avoiding drastic changes or collapses in the control strategy that might occur due to instability in the early stages of online learning, greatly enhancing the safety of the online learning process. Simultaneously, fine-tuning only a small number of high-level parameters significantly reduces the dimensionality of the parameter space that needs optimization, enabling the model to converge rapidly in a very short time using limited real-time data, meeting the stringent real-time requirements of voltage control for minute-level or even second-level responses. Furthermore, freezing the basic feature layer ensures that the model's understanding of wind and solar energy fluctuations—the core source of uncertainty—is not eroded by subsequent real-time data. This allows the control strategy to maintain robust, principle-based judgment capabilities even when facing real-time fluctuation patterns that differ from the training data distribution and are unseen before, significantly improving the long-term robustness and generalization performance of the entire voltage control system.

[0096] S107: Use the fine-tuned target reinforcement learning model to generate the second reactive power regulation command for multiple continuous voltage control devices in the distribution network within the second scheduling cycle; each first scheduling cycle includes multiple second scheduling cycles.

[0097] In some embodiments, step S107 may specifically include: using a fine-tuned target reinforcement learning model to generate a second reactive power regulation command for multiple continuous voltage control devices in the distribution network during the second scheduling cycle.

[0098] In some embodiments, the second scheduling period can be a short time window for real-time or near-real-time control on the actual operating day. Each first scheduling period may include multiple second scheduling periods. The first scheduling period may be in the unit of days, while the second scheduling period may be in the unit of minutes or even seconds (e.g., 5 minutes, 1 minute). Within each first scheduling period, rolling optimization and execution are performed on the multiple second scheduling periods contained therein, thereby quickly responding to instantaneous changes in the system state.

[0099] In some embodiments, the fine-tuned target reinforcement learning model can be a reinforcement learning agent that has been dynamically adapted to the current actual operating environment after being fine-tuned online based on real-time data in step S106. Its neural network parameters have been optimized and calibrated for the real-time state of the current power grid and prediction errors, based on inheriting the general knowledge of the general reinforcement learning model, becoming a high-precision policy function specifically for real-time decision-making in the current time period.

[0100] In some embodiments, continuous voltage control devices, as opposed to discrete devices, refer to reactive power compensation devices capable of continuous, smooth, and stepless adjustment of output. These mainly include power conversion systems in photovoltaic inverters, wind turbine converters, and energy storage systems. They can receive continuous setpoint commands and rapidly (milliseconds to seconds) adjust their reactive power output, serving as the core actuators for achieving precise voltage control at the second / minute level.

[0101] In some embodiments, the second reactive power regulation command may be an output generated during the intraday real-time optimization phase. It comprises a series of continuous, precise numerical commands that directly specify the reactive power setpoint that each continuous voltage control device should output in the current and several future short time periods. This command is generated in real-time by a fine-tuned target reinforcement learning model based on the latest grid conditions, and is used to quickly and precisely smooth voltage fluctuations.

[0102] In some embodiments, at the beginning of each second scheduling cycle, the latest operating status of the distribution network is acquired in real time through a supervisory control and data acquisition (SCADA) system to construct the state input of the reinforcement learning model. This state includes at least: the measured voltage values of each node; the intraday actual output data of each distributed power source; real-time or ultra-short-term load forecast data; current time information; and the node reference voltage for that period obtained from the day-ahead plan.

[0103] In some embodiments, the constructed real-time state vector can be input into the Actor network of the fine-tuned target reinforcement learning model. The network performs high-speed calculations based on its fine-tuned network parameters using a forward propagation method, and instantaneously outputs a continuous action vector. This vector is the second reactive power regulation command, and each element corresponds to the reactive power setpoint of a continuous voltage control device (such as the i-th photovoltaic inverter) in the next time period.

[0104] In some embodiments, the generated second reactive power regulation command can be reliably and promptly transmitted to the local controller of each corresponding continuous voltage control device via a communication network. Upon receiving the command, the device controller immediately adjusts the internal control loop of its converter, enabling its reactive power output to track the command set value within seconds.

[0105] In some embodiments, after executing a command, the system enters the next cycle, acquires a new state, and calculates a reward. The resulting sample can be added to the online experience replay pool and can trigger the next round of model fine-tuning. This allows the target reinforcement learning model to keep learning from actual closed-loop feedback while it issues control commands, continuously refining its policy.

[0106] Leveraging the millisecond-level forward inference capability of the fine-tuned target reinforcement learning model, the optimal reactive power compensation command can be calculated within an extremely short time interval to address real-time fluctuations in the power grid (such as sudden drops in photovoltaic power caused by cloud cover), driving continuous equipment to operate rapidly. This achieves a rapid closed loop of voltage deviation perception-decision-execution, effectively suppressing voltage fluctuations at their inception and significantly improving power quality. Furthermore, the generated second reactive power regulation command works in conjunction with the previously formulated first reactive power regulation command, forming a complete voltage control solution that coordinates multiple time scales and multiple equipment types. In addition, the fine-tuned model can flexibly and non-linearly coordinate multiple continuous devices based on real-time data, and its control strategy surpasses that of traditional controllers based on fixed rules or linear models.

[0107] In some embodiments, converting the semi-invariants (cumulants) of the output random variables corresponding to each distributed power source into the semi-invariants of the voltages of each node in the distribution network may specifically include: linearizing the AC power flow model of the distribution network by Taylor series expansion at a reference operating point to obtain a linear sensitivity relationship between the power injection disturbance and the voltage disturbance at the distribution network nodes; and converting the semi-invariants of the output random variables corresponding to each distributed power source into the semi-invariants of the voltages of each node in the distribution network according to the linear sensitivity relationship.

[0108] In some embodiments, the baseline operating point can be a specific distribution network operating state selected as the analysis benchmark when constructing the linearization model. This state can be determined by the predicted expected output of each distributed generation source, the predicted load, and the network topology, and a set of corresponding node voltages and phase angles can be obtained through deterministic power flow calculations. The linearization process will revolve around this point, analyzing the impact of small disturbances.

[0109] In some embodiments, by performing a Taylor series expansion at a baseline operating point and retaining only first-order terms while ignoring all higher-order terms (such as quadratic and cubic terms), the original nonlinear relationship can be approximated as a linear relationship in the vicinity of that point. This approximation has sufficient accuracy when the perturbation is small.

[0110] In some embodiments, node injected power disturbances can be the deviations of the net active and reactive power injected by each node relative to their corresponding values at a reference operating point. For distributed generation nodes, the disturbances arise from the random deviations between their actual output and the predicted expected values.

[0111] In some embodiments, node voltage disturbances can be deviations in the voltage magnitude and phase angle of each node relative to their corresponding values at a reference operating point.

[0112] In some embodiments, the linear sensitivity relationship can be a mathematical model derived through the above linearization process that describes the causal relationship between node injected power disturbances and node voltage disturbances.

[0113] In some embodiments, at the baseline operating point, the nonlinear AC power flow equations are expanded using Taylor series, and higher-order terms of order two and above are ignored. This mathematical processing yields a linearized incremental model. This model clearly shows that when the injected active and reactive power (including fluctuations in the actual output of distributed generation relative to its predicted value) at each node undergoes a small change, the resulting changes in the voltage amplitude and phase angle at each node can be approximated by a fixed linear coefficient matrix, i.e., a sensitivity matrix. Each element in this matrix represents the degree of direct impact of a unit change in injected power at one node on the voltage of another node, i.e., the sensitivity.

[0114] In some embodiments, the power flow model of the distribution network can be linearized using a Taylor series expansion at a selected baseline operating point:

$$S = S_0 + \Delta S, \qquad X = X_0 + \Delta X, \qquad \Delta X = J_0^{-1}\,\Delta S, \qquad \Delta Z = T_0\,\Delta S$$

In the formula: $S$ is the injected power of each node; $X$ represents the state variables of the nodes, mainly including the voltage amplitude and phase angle of each node; $S_0$ and $X_0$ are the expected values of $S$ and $X$ when the power system is running under normal conditions, respectively; $\Delta S$ and $\Delta X$ are the uncertain fluctuations of the injected power and the resulting state changes in the distribution network; $J_0$ is the Jacobian matrix obtained at the final Newton-Raphson iteration; $\Delta Z$ is the resulting change in branch power; $T_0$ is the sensitivity matrix of branch power to injected power.

[0115] In some embodiments, a mathematical model describing the physical laws of the distribution network, namely the AC power flow equations, is established. This set of equations is inherently nonlinear, and directly applying it to stochastic analysis would be computationally complex. To simplify the analysis, a representative baseline operating point is selected. This point can be determined by the predicted expected output of each distributed power source, the predicted load, and the network topology, and a set of corresponding node voltage and phase angle baseline values can be obtained through deterministic power flow calculations.

[0116] In some embodiments, the change in node voltage can be characterized as a linear combination of the changes in injected power at each node. The random fluctuations in the output of each distributed power source are the main source of these random variations in injected power. Semi-invariants (cumulants) have a convenient property: for a linear combination of mutually independent random variables, the order-k semi-invariant of the output variable equals the sum of the order-k semi-invariants of the input variables, each weighted by the corresponding linear combination coefficient (i.e., the sensitivity matrix element) raised to the k-th power. Specifically, the semi-invariants (first, second, third, etc.) of each distributed power source's output random variable (obtained from its probability model) are treated as a set of inputs, and the linear sensitivity matrix obtained from the linearization serves as the transfer relationship. For each order k, the transformation is mathematically equivalent to multiplying the matrix formed by raising each sensitivity element to the k-th power by the vector of order-k semi-invariants of the power sources. The result is the semi-invariant of each order of the voltage change at each node in the distribution network. Since the voltage at the reference operating point is known, adding the reference value to the first-order semi-invariant of the voltage change yields the semi-invariants of the total node voltage; the higher orders are unaffected by this constant shift.
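As a hedged illustration of the semi-invariant (cumulant) transfer described above, the following sketch propagates source cumulants to node voltage changes through a sensitivity matrix. It assumes mutually independent sources and uses the standard property that the order-k cumulant of a linear combination weights each input cumulant by the coefficient raised to the k-th power. The function name and the numeric values are hypothetical.

```python
def propagate_cumulants(sensitivity, source_cumulants, max_order):
    """Map source cumulants to node-voltage-change cumulants.

    sensitivity[i][j]: linear sensitivity of node i voltage to source j power
    source_cumulants[j][k - 1]: order-k cumulant of source j output
    Returns result[i][k - 1]: order-k cumulant of the voltage change at node i.
    Assumes the sources are mutually independent.
    """
    result = []
    for row in sensitivity:
        node = []
        for order in range(1, max_order + 1):
            # Each coefficient is raised to the cumulant order before weighting.
            node.append(sum((a ** order) * c[order - 1]
                            for a, c in zip(row, source_cumulants)))
        result.append(node)
    return result


# Hypothetical 2-node, 2-source example with cumulants up to order 3.
sens = [[0.02, 0.01], [0.01, 0.03]]
cums = [[0.0, 4.0, 1.0], [0.0, 1.0, 0.0]]
voltage_cumulants = propagate_cumulants(sens, cums, max_order=3)
```

The whole calculation is a handful of multiplications per node and order, which is why this approach avoids repeated power flow runs.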

[0117] Traditional stochastic power flow analysis requires massive Monte Carlo simulations, resulting in a heavy computational burden. By combining a linearized model with semi-invariant analytical transfer, complex probabilistic calculations are transformed into deterministic algebraic operations based on the sensitivity matrix. This method completely avoids scenario generation and repetitive power flow calculations, enabling accurate assessment of the voltage probability characteristics of all nodes within seconds or milliseconds, even in distribution networks with numerous distributed power sources, achieving an order-of-magnitude improvement in computational efficiency. Furthermore, by establishing and utilizing linear sensitivity relationships, not only are quantitative results of the final voltage uncertainty provided, but more importantly, the complete propagation path and impact of uncertainty from its source (distributed power sources) to its end (node voltages) are clearly revealed. This allows for accurate identification of the nodes most sensitive to voltage fluctuations and the power sources most affected, providing precise decision-making basis for targeted monitoring enhancement, layout optimization, or the development of preventative control measures, achieving a leap from vague perception to precise insight.

[0118] In some embodiments, the above-mentioned day-ahead voltage control model based on the mixed-integer second-order cone programming algorithm is constructed with the goal of minimizing the active power loss and node voltage over-limit risk index of the distribution network according to the probability distribution of the voltage of each node. Specifically, it may include: constructing a day-ahead voltage control model with the goal of minimizing the active power loss and node voltage over-limit risk index of the distribution network according to the probability distribution of the voltage of each node, and with the following constraints: the capacitor bank in the distribution network meets the daily switching frequency constraint and single regulation capacity constraint; the energy storage system in the distribution network meets the state of charge constraint and charging and discharging power constraint; the photovoltaic inverter in the distribution network meets the reactive power regulation capability constraint; and the nodes in the distribution network meet the voltage safety operation constraint and line power flow constraint.

[0119] In some embodiments, active power loss can be the power lost as heat due to resistance during the transmission of electrical energy in the lines and transformers of the distribution network, and its value is proportional to the square of the branch current.

[0120] In some embodiments, the node voltage exceedance risk index can be a probability-based quantitative safety indicator. It uses the node voltage probability distribution obtained in step S102 to calculate the probability that the node voltage exceeds the safe operating range, and weights and aggregates the exceedance risks of all nodes to form a scalar value that reflects the overall voltage safety risk of the system.
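A minimal sketch of such a risk index is given below. It assumes, purely for illustration, that each node voltage is normally distributed with a known mean and standard deviation; the specification derives the distribution from the preceding steps and does not state that it is Gaussian. The limits of 0.95 and 1.05 per unit and the weights are likewise hypothetical.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal random variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def exceedance_probability(mean, std, v_min, v_max):
    """Probability that the voltage falls below v_min or above v_max."""
    return normal_cdf(v_min, mean, std) + (1.0 - normal_cdf(v_max, mean, std))


def risk_index(nodes, weights, v_min=0.95, v_max=1.05):
    """Weighted sum of per-node exceedance probabilities.

    nodes: list of (mean, std) pairs in per unit; weights: one weight per node.
    """
    return sum(w * exceedance_probability(m, s, v_min, v_max)
               for (m, s), w in zip(nodes, weights))


# Hypothetical three-node feeder: the last node sits close to the upper limit.
index = risk_index([(1.00, 0.01), (1.02, 0.01), (1.04, 0.01)], [1.0, 1.0, 1.0])
```

A scalar of this kind can be added to the loss objective with a weighting factor, which is one common way to trade off safety against losses.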

[0121] In some embodiments, device and system operating constraints may be the inclusion of various hard limitations in the model to ensure that the optimization scheme is physically feasible and complies with operating procedures.

[0122] In some embodiments, capacitor bank constraints may include a daily cumulative switching limit (for protecting mechanical switches) and the rated reactive power capacity of each capacitor bank.

[0123] In some embodiments, energy storage system constraints may include upper and lower limits of state of charge (to prevent overcharging and over-discharging), charging and discharging power limits, and energy balance equations (where the state of charge (SOC) at the end of the scheduling cycle equals that at the beginning or equals a set value).

[0124] In some embodiments, photovoltaic inverter constraints may be adjustable reactive power range constraints based on inverter capacity, given its current active power output.

[0125] In some embodiments, node voltage safety constraints may require that the voltage amplitude of each node be maintained within specified upper and lower limits.

[0126] In some embodiments, line power flow constraints may require that the active and reactive power transmitted by each line must not exceed its thermal limit or stability limit.

[0127] In some embodiments, minimizing active power loss can reduce heat loss due to resistance during power transmission in distribution network lines and transformers.

[0128] In some embodiments, minimizing the node voltage exceedance risk index can proactively manage safety risks. It utilizes the node voltage probability distribution obtained in the preceding steps to quantitatively assess the likelihood of voltage exceeding the safe range, and suppresses it to an acceptable low level through optimization, thereby achieving a shift from passively responding to exceedances to proactively preventing risks.

[0129] In some embodiments, to ensure the physical feasibility and engineering applicability of the optimization results, the model may incorporate multiple key constraints. Capacitor bank (CB) operating constraints:

$$Q^{CB}_{i,t} = y^{CB}_{i,t}\,Q^{CB,step}_{i}, \qquad 0 \le y^{CB}_{i,t} \le Y^{CB,max}_{i}, \qquad \sum_{t}\left(y^{CB}_{i,t} \oplus y^{CB}_{i,t-1}\right) \le N^{CB,max}_{i}, \qquad \forall i \in \Omega_{CB}$$

In the formula: $\Omega_{CB}$ represents the CB set; $Q^{CB}_{i,t}$ is the reactive power output of the i-th CB at time t; $Q^{CB,step}_{i}$ is the reactive power output of each capacitor group in the i-th CB; $y^{CB}_{i,t}$ and $Y^{CB,max}_{i}$ are the number of capacitor groups switched in by the i-th CB at time t and its maximum value, respectively; $\oplus$ is the XOR operator; $N^{CB,max}_{i}$ is the maximum number of times the i-th CB can be adjusted daily. The above formula shows that the model must strictly follow the physical characteristics of the capacitor bank, including its maximum allowable number of switching operations per day (to prevent mechanical fatigue of the switches), and the single adjustment amount limited by the equipment capacity for each operation (i.e., the fixed reactive power capacity switched in or out each time).
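The capacitor bank constraints can be checked for a candidate day-ahead schedule with a short sketch like the following. It treats the XOR term as an indicator that the number of connected groups changed between consecutive periods, which is an interpretation rather than something the text states explicitly; function names and values are hypothetical.

```python
def cb_schedule_feasible(steps, max_steps, max_daily_switches):
    """Check a capacitor bank schedule against the operating constraints.

    steps: number of capacitor groups switched in during each period
    max_steps: maximum number of groups available in the bank
    max_daily_switches: allowed number of adjustments per day
    """
    if any(s < 0 or s > max_steps for s in steps):
        return False
    # Count periods in which the switched-in group count changed.
    switches = sum(1 for prev, cur in zip(steps, steps[1:]) if prev != cur)
    return switches <= max_daily_switches


def cb_reactive_output(steps, q_per_group):
    """Reactive power output per period: groups switched in times group size."""
    return [s * q_per_group for s in steps]


# Hypothetical six-period schedule with three adjustments.
schedule = [0, 1, 1, 2, 2, 0]
ok = cb_schedule_feasible(schedule, max_steps=2, max_daily_switches=3)
```

In a mixed-integer solver the same limits would be expressed as linear constraints on integer variables; this sketch only verifies a given schedule after the fact.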

[0130] Energy storage system (ESS) operating constraints:

$$E_{i,t+1} = E_{i,t} + \left(\eta^{ch}_{i} P^{ch}_{i,t} - \frac{P^{dis}_{i,t}}{\eta^{dis}_{i}}\right)\Delta t, \qquad 0 \le E_{i,t} \le E^{max}_{i}$$

$$0 \le P^{ch}_{i,t} \le u_{i,t}\,P^{ch,max}_{i}, \qquad 0 \le P^{dis}_{i,t} \le \left(1 - u_{i,t}\right)P^{dis,max}_{i}, \qquad E_{i,0} = E_{i,T}$$

In the formula: $E_{i,t}$ represents the stored energy of the i-th ESS at time t; $P^{ch}_{i,t}$ and $P^{dis}_{i,t}$ are its charging and discharging power at time t, and $\Delta t$ is the length of a scheduling period; $\eta^{ch}_{i}$ and $\eta^{dis}_{i}$ represent the charge and discharge efficiencies of the i-th ESS, respectively; $E^{max}_{i}$ represents the maximum capacity of the i-th ESS; $P^{ch,max}_{i}$ and $P^{dis,max}_{i}$ represent the maximum charging and discharging power of the i-th ESS; $u_{i,t}$ represents the charging / discharging state of the i-th ESS at time t and is a binary variable, where 1 indicates charging and 0 indicates discharging; $E_{i,0}$ and $E_{i,T}$ represent the stored energy of the i-th ESS at the beginning and end of the scheduling cycle, respectively. The above formula indicates that the model needs to ensure that the operating trajectory of the energy storage system conforms to physical laws, mainly including: the state of charge of the energy storage unit must be maintained within a safe range throughout the entire scheduling cycle and take a specified value at the beginning and end of the cycle; its charging and discharging power must not exceed the maximum power of the equipment at any time, and it cannot charge and discharge simultaneously.
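The energy storage constraints can be illustrated with a small feasibility check. This is a sketch under stated assumptions: the energy balance uses separate charge and discharge efficiencies, the end-of-cycle energy is required to equal the initial energy (the text also allows a set value), and all names and numbers are hypothetical.

```python
def simulate_ess(e0, charge, discharge, eta_c, eta_d, dt=1.0):
    """Return the stored-energy trajectory for given charge/discharge powers."""
    energy = [e0]
    for p_c, p_d in zip(charge, discharge):
        # Charging adds energy with efficiency eta_c; discharging draws
        # more than the delivered power because of efficiency eta_d.
        energy.append(energy[-1] + (eta_c * p_c - p_d / eta_d) * dt)
    return energy


def ess_feasible(energy, charge, discharge, e_max, p_c_max, p_d_max, tol=1e-9):
    """Check power limits, no simultaneous charge/discharge, energy bounds,
    and equal stored energy at the start and end of the cycle."""
    for p_c, p_d in zip(charge, discharge):
        if p_c > tol and p_d > tol:
            return False
        if not (0.0 <= p_c <= p_c_max and 0.0 <= p_d <= p_d_max):
            return False
    if any(e < -tol or e > e_max + tol for e in energy):
        return False
    return abs(energy[-1] - energy[0]) <= tol


# Hypothetical two-period cycle: charge first, then discharge back to start.
charge, discharge = [1.0, 0.0], [0.0, 0.81]
trajectory = simulate_ess(1.0, charge, discharge, eta_c=0.9, eta_d=0.9)
feasible = ess_feasible(trajectory, charge, discharge, 2.0, 1.0, 1.0)
```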

[0131] Constraints on the regulation capability of photovoltaic (PV) inverters:

$$-P^{PV}_{i,t}\tan\left(\arccos\lambda^{min}_{i}\right) \le Q^{PV}_{i,t} \le P^{PV}_{i,t}\tan\left(\arccos\lambda^{min}_{i}\right)$$

In the formula: $Q^{PV}_{i,t}$ is the reactive power of the i-th PV at time t; $P^{PV}_{i,t}$ is its active power output at time t; $\lambda^{min}_{i}$ is the minimum power factor of the i-th PV. The above formula shows that the model must respect the technical capability boundaries of the photovoltaic inverter. Under the premise of ensuring full absorption of photovoltaic active power output, its reactive power adjustment range is limited by the apparent power capacity of the equipment, that is, its reactive power output cannot exceed the available capacity determined by the active power output.
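For illustration, the reactive power range of a photovoltaic inverter under a minimum power factor limit can be computed as follows. The power-factor form of the limit is an assumption based on the symbols listed in the text; an apparent-power (capacity circle) limit would be an alternative formulation. Function names are hypothetical.

```python
import math


def pv_q_limit(p_active, min_power_factor):
    """Largest reactive power magnitude allowed at the given active output."""
    return p_active * math.tan(math.acos(min_power_factor))


def clamp_pv_q(q_request, p_active, min_power_factor):
    """Clip a requested reactive setpoint to the allowed symmetric range."""
    q_max = pv_q_limit(p_active, min_power_factor)
    return max(-q_max, min(q_max, q_request))


# Hypothetical inverter producing 0.8 p.u. active power, minimum PF 0.8.
q_set = clamp_pv_q(1.0, p_active=0.8, min_power_factor=0.8)
```

With these values the allowed range is roughly plus or minus 0.6 p.u., so a request of 1.0 p.u. is clipped to the upper bound.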

[0132] System security operation constraints:

$$V^{min}_{i} \le V_{i,t} \le V^{max}_{i}$$

$$\left(P^{L}_{i,t}\right)^{2} + \left(Q^{L}_{i,t}\right)^{2} \le \left(S^{L,max}_{i}\right)^{2}, \qquad \forall i \in \Omega_{L}$$

In the formula: $\Omega_{L}$ represents the set of distribution network lines; $V^{max}_{i}$ and $V^{min}_{i}$ are the maximum and minimum allowable values of the voltage at the i-th node; $V_{i,t}$ is the voltage amplitude of the i-th node at time t; $S^{L,max}_{i}$ represents the maximum transmission capacity of the i-th line; $P^{L}_{i,t}$ and $Q^{L}_{i,t}$ represent the active and reactive power transmitted by the i-th line at time t. The above formula shows that the model must ensure the basic safety of the entire distribution network, mainly including two aspects: first, node voltage safety constraints, that is, the voltage amplitude of all nodes must be strictly maintained between the specified upper and lower limits; second, line power flow safety constraints, that is, the power flowing through each line must not exceed its thermal transmission limit to prevent line overload.
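The two security checks described above can be sketched as simple predicates. The line constraint is written in the squared (second-order cone) form, which is an assumption consistent with the mixed-integer second-order cone programming framework named in the specification; names and limits are hypothetical.

```python
def voltages_within_limits(voltages, v_min, v_max):
    """True if every node voltage magnitude lies within [v_min, v_max]."""
    return all(v_min <= v <= v_max for v in voltages)


def line_within_capacity(p, q, s_max):
    """True if the apparent power implied by (p, q) does not exceed s_max."""
    return p * p + q * q <= s_max * s_max


# Hypothetical operating point for a small feeder.
secure = (voltages_within_limits([0.98, 1.01, 1.03], 0.95, 1.05)
          and line_within_capacity(3.0, 4.0, 5.0))
```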

[0133] In some embodiments, the above dual optimization objectives and all listed constraints can be integrated and expressed according to the mathematical specifications of mixed integer second-order cone programming, ultimately forming a well-structured and fully defined mathematical model of mixed integer second-order cone programming.

[0134] By placing probabilistic safety risk indicators and active power loss indicators within the same optimization framework for joint optimization, and by fully incorporating the detailed physical constraints of all key equipment and grid safety constraints, the final scheduling plan is no longer a compromise solution that sacrifices certain constraints or a single dimension for optimal performance. Instead, it is a globally optimal solution that achieves the best balance between safety and losses while strictly satisfying all realistic constraints, greatly enhancing the scientific rigor and comprehensive value of day-ahead planning.

[0135] In some embodiments, the above-mentioned deep deterministic policy gradient algorithm is used for offline training to obtain a general reinforcement learning model for continuous voltage control equipment regulation. Specifically, this may include: constructing the state space, action space, and reward function of the general reinforcement learning model; the state space includes the voltage amplitude of each node in the distribution network, the predicted output data of the multiple distributed power sources, and time information; the action space includes the reactive power output setpoint of the multiple continuous voltage control equipment; the reward function is constructed to be negatively correlated with the deviation of the node voltage from the reference voltage; constructing an Actor-Critic network based on the deep deterministic policy gradient algorithm; the Actor network takes the grid state data as input and outputs reactive power regulation action data; the Critic network takes the grid state data and reactive power regulation action data as joint input and outputs the reward function Q value corresponding to the reactive power regulation action data; based on the intraday predicted output data, the Actor-Critic network is trained offline in a simulated distribution network environment, and the parameters of the Actor network and the Critic network are alternately updated using the gradient descent method; after the offline training is completed, the parameters of the Actor-Critic network are saved to obtain the general reinforcement learning model.

[0136] In some embodiments, the intraday predicted power output data can be a time series dataset obtained by predicting the active power generated by each distributed power source (such as photovoltaic and wind turbines) in the distribution network during each scheduling period based on ultra-short-term weather forecasts. Although there are errors, it systematically describes the spectrum of the most likely future operating scenarios, providing batches of input samples covering typical operating conditions for offline training.

[0137] In some embodiments, the reference voltage can be determined in step S103, which is the expected voltage target value curve of each node in the distribution network during each scheduling period. It is the globally optimal operating trajectory obtained by the day-ahead stochastic optimization model after comprehensively considering losses and probabilistic security. The reference voltage is the ultimate control target that the reinforcement learning agent needs to strive to approach, and it is the core basis for constructing the reward function and the guide for policy learning.

[0138] In some embodiments, the general reinforcement learning model can be a deep reinforcement learning agent with preliminary voltage regulation decision-making capabilities, pre-trained based on historical or predicted data in a completely offline simulation environment. The neural network parameters of this model encode the basic strategies and rules for coordinating voltage control of multiple continuous devices under the predicted scenario distribution, which will be transferred as valuable initial knowledge to the subsequent online controller.

[0139] In some embodiments, the Deep Deterministic Policy Gradient (DDPG) algorithm can be an advanced deep reinforcement learning algorithm specifically designed for continuous action space problems. It employs an Actor-Critic architecture: the Actor (policy) network acts as a controller, directly outputting an accurate continuous action vector (such as the reactive power setpoints of each device) based on the current environmental state; the Critic (value) network acts as an evaluator, responsible for evaluating the long-term value of a given state-action pair. DDPG significantly improves training stability by introducing a target network and an experience replay mechanism, enabling it to learn complex and smooth control policies from high-dimensional data.

[0140] In some embodiments, offline training can be a learning process for the agent that does not interact with a real physical power grid, but rather takes place in a high-fidelity digital simulation environment. Training data is derived from historical operating records or batch scenarios generated from predictive data. This approach completely avoids the safety risks of trial and error in real systems and enables thorough and rapid learning using large-scale data.

[0141] In some embodiments, the environmental information observed by the agent at each decision moment can be designed as a composite vector: $s_t = \left[V_t,\ \hat{P}^{DG}_t,\ \hat{P}^{L}_t,\ \tau_t\right]$, where $V_t$ is a vector of measured (or simulated) voltage values at each node at the end of the previous time period, reflecting the current state of the system; $\hat{P}^{DG}_t$ and $\hat{P}^{L}_t$ provide forward-looking information, namely vectors of current and short-term future distributed power generation and load forecasts; and $\tau_t$ encodes time features (such as time of day), enabling the agent to learn cyclical patterns of load and output.

[0142] In some embodiments, the control instructions executable by the agent directly correspond to the adjustment amounts of the continuous voltage control devices, and can be defined as $a_t = \left[Q^{PV}_{1,t}, \dots, Q^{PV}_{N_{PV},t},\ Q^{ESS}_{1,t}, \dots, Q^{ESS}_{N_{ESS},t}\right]$, that is, the reactive power setpoint vector for all photovoltaic inverters and energy storage converters in the next time period. Each action component is constrained within the physically feasible domain of the device (such as inverter capacity circle constraints).

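[0143] In some embodiments, the agent's reward function can be constructed as follows: $r_t = -\alpha \sum_{i} \left|V_{i,t+1} - V^{ref}_{i,t+1}\right| - \beta \sum_{i} \mathbb{1}\left(V_{i,t+1} \notin \left[V^{min}_{i}, V^{max}_{i}\right]\right)$, where $\alpha$ and $\beta$ are weighting coefficients. The first term penalizes the deviation between the voltage and the reference voltage at the next moment; the second term is a voltage over-limit indicator function, which imposes a significant penalty for any over-limit.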
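A reward of the kind described above, with a deviation penalty and an over-limit penalty, might be sketched as follows. The use of the absolute deviation, the weights `alpha` and `beta`, and the voltage limits are illustrative assumptions; the specification only states that the reward is negatively correlated with the deviation and penalizes violations heavily.

```python
def reward(voltages, references, v_min=0.95, v_max=1.05, alpha=1.0, beta=10.0):
    """Negative weighted sum of tracking error and limit violations.

    voltages: node voltage magnitudes after the action (per unit)
    references: day-ahead reference voltages for the same nodes
    """
    deviation = sum(abs(v - r) for v, r in zip(voltages, references))
    violations = sum(1 for v in voltages if v < v_min or v > v_max)
    return -alpha * deviation - beta * violations


# Hypothetical two-node case: the second node exceeds the upper limit.
r = reward([1.0, 1.06], [1.0, 1.0])
```

Making `beta` much larger than `alpha` reflects the "significant penalty" for any over-limit, so the agent first learns to stay within limits and then to track the reference.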

[0144] In some embodiments, the Actor network may take a state vector as input, process it through multiple fully connected layers and activation functions, and finally use the tanh function to limit the output to [-1,1] in the output layer, and then scale it to the actual range of each device action through a linear mapping layer.
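The output stage of the Actor network can be illustrated in isolation. The sketch below takes hypothetical pre-activation outputs, squashes them with tanh to [-1, 1], and maps each one linearly onto its device's reactive power range; the ranges are made-up values and the hidden layers are omitted.

```python
import math


def scale_action(raw_outputs, q_min, q_max):
    """Map raw network outputs to device setpoints via tanh and linear scaling.

    raw_outputs: pre-activation values of the output layer, one per device
    q_min, q_max: lower and upper reactive power limits of each device
    """
    actions = []
    for z, lo, hi in zip(raw_outputs, q_min, q_max):
        squashed = math.tanh(z)  # bounded to [-1, 1]
        # Linear map: -1 -> lo, +1 -> hi.
        actions.append(lo + 0.5 * (squashed + 1.0) * (hi - lo))
    return actions


# Hypothetical three-device example.
setpoints = scale_action([0.0, 50.0, -50.0], [-1.0, -2.0, 0.0], [1.0, 2.0, 4.0])
```

Because tanh is bounded, every setpoint produced this way respects the device range by construction, without a separate clipping step.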

[0145] In some embodiments, the Critic network can take the concatenation of the state and action vectors as input, process it through a neural network, and output a scalar Q-value representing the long-term expected reward of the state-action pair. Its structure is deeper than that of the Actor network, giving it strong value assessment capabilities.

[0146] In some embodiments, the experience replay pool can be a first-in-first-out data buffer used to store transition quadruples (state, action, reward, next state) generated by the interaction between the agent and the simulation environment. During training, small batches of samples are drawn at random to break the sequential correlation between data, thereby improving data utilization efficiency and learning stability.
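A first-in-first-out replay pool with random mini-batch sampling can be sketched with the standard library alone. The class name `ReplayPool` and its methods are hypothetical; the capacity and batch size are illustrative.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity FIFO buffer of (state, action, reward, next_state)."""

    def __init__(self, capacity):
        # deque with maxlen discards the oldest entry when full.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a random mini-batch without replacement."""
        size = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), size)

    def __len__(self):
        return len(self.buffer)


# Store five transitions in a pool of capacity three: the two oldest drop out.
pool = ReplayPool(capacity=3)
for k in range(5):
    pool.store(k, 0.0, -1.0, k + 1)
batch = pool.sample(2)
```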

[0147] In some embodiments, environmental interaction and data collection can be performed cyclically for multiple training rounds (each round simulating a typical day or a combination of multiple scenarios). At each step in each round, the agent observes the current state $s_t$ and selects the action $a_t = \mu\left(s_t \mid \theta^{\mu}\right) + \mathcal{N}_t$, that is, the output of the policy $\mu$ with added exploratory noise $\mathcal{N}_t$ (such as Ornstein-Uhlenbeck process noise). This action is performed in the simulator, the new state $s_{t+1}$ and the reward $r_t$ are obtained through power flow calculation, and the experience $\left(s_t, a_t, r_t, s_{t+1}\right)$ is stored in the replay pool.
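The exploration step can be sketched with a discretized Ornstein-Uhlenbeck process added to the policy output and clipped to the device range. The parameters `theta` and `sigma`, the unit time step, and the helper names are illustrative assumptions, not values taken from the specification.

```python
import random


class OUNoise:
    """Discretized Ornstein-Uhlenbeck noise with unit time step."""

    def __init__(self, size, theta=0.15, sigma=0.2, mu=0.0, seed=None):
        self.theta, self.sigma, self.mu = theta, sigma, mu
        self.rng = random.Random(seed)
        self.state = [mu] * size

    def sample(self):
        # Mean-reverting drift toward mu plus a Gaussian increment.
        self.state = [x + self.theta * (self.mu - x)
                      + self.sigma * self.rng.gauss(0.0, 1.0)
                      for x in self.state]
        return list(self.state)


def noisy_action(policy_action, noise_sample, q_min, q_max):
    """Add exploration noise and clip each component to its device range."""
    return [max(lo, min(hi, a + n))
            for a, n, lo, hi in zip(policy_action, noise_sample, q_min, q_max)]


# Hypothetical two-device action with seeded noise for reproducibility.
noise = OUNoise(size=2, seed=1)
action = noisy_action([0.2, -0.1], noise.sample(), [-1.0, -1.0], [1.0, 1.0])
```

Temporally correlated noise of this kind produces smoother exploratory setpoint trajectories than independent noise at each step.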

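[0148] In some embodiments, iterative updates of network parameters may be performed cyclically after the replay pool has accumulated sufficient data. The Critic network can be updated by sampling a batch of N transitions from the replay pool and calculating the target Q-value $y_j = r_j + \gamma\, Q'\left(s_{j+1}, \mu'\left(s_{j+1} \mid \theta^{\mu'}\right) \mid \theta^{Q'}\right)$, where $\gamma$ is the discount factor and $\mu'$ and $Q'$ are the target Actor and target Critic networks. The main Critic network parameters $\theta^{Q}$ are then updated by minimizing the loss function $L = \frac{1}{N}\sum_{j}\left(y_j - Q\left(s_j, a_j \mid \theta^{Q}\right)\right)^{2}$.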
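The Critic update target and loss can be illustrated without a deep learning framework by treating the networks as plain callables. This is a sketch following the standard deep deterministic policy gradient formulation; the toy lambda functions stand in for the target Actor, target Critic, and main Critic networks and are purely hypothetical.

```python
def critic_targets(batch, target_actor, target_critic, gamma=0.99):
    """Target Q-values: reward plus discounted target-network value
    of the next state under the target policy's action."""
    return [r + gamma * target_critic(s_next, target_actor(s_next))
            for (_, _, r, s_next) in batch]


def critic_loss(batch, targets, critic):
    """Mean squared error between targets and the main critic's estimates."""
    errors = [(y - critic(s, a)) ** 2
              for (s, a, _, _), y in zip(batch, targets)]
    return sum(errors) / len(errors)


# Toy stand-ins for the networks (hypothetical, for illustration only).
target_actor = lambda s: 0.0
target_critic = lambda s, a: 1.0
critic = lambda s, a: 0.0

batch = [(0, 0.0, -1.0, 1)]  # one (state, action, reward, next_state) sample
targets = critic_targets(batch, target_actor, target_critic, gamma=0.9)
loss = critic_loss(batch, targets, critic)
```

In practice the loss would be minimized by gradient descent on the main Critic parameters, with the target networks held fixed or updated slowly.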

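[0149] The Actor network can be updated by utilizing the gradient direction provided by the Critic network and applying the policy gradient theorem to the main Actor network parameters $\theta^{\mu}$ to maximize the expected reward: $\nabla_{\theta^{\mu}} J \approx \frac{1}{N}\sum_{j} \nabla_{a} Q\left(s, a \mid \theta^{Q}\right)\big|_{s=s_j,\,a=\mu(s_j)}\ \nabla_{\theta^{\mu}}\,\mu\left(s \mid \theta^{\mu}\right)\big|_{s=s_j}$, where $\theta^{Q}$ denotes the main Critic network parameters and N is the batch size.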

[0150] In some embodiments, the process described above can be repeated until the agent's performance metrics (such as average round reward and root mean square voltage deviation) on independent validation scenario sets converge stably and reach a preset standard. At this point, the parameters of the main Actor network are saved. This network is the general reinforcement learning model, which encapsulates the initial voltage control capability learned from the prediction data.

[0151] By shifting the generation of optimal control strategies from explicit mathematical programming relying on precise physical models to implicit deep representation learning based on massive data interaction, the system can automatically discover and utilize deep nonlinear device coordination laws that go beyond the simplifying assumptions of traditional linearized models. This enables it to gain generalized control capabilities to cope with complex, imprecisely modeled scenarios, providing a new technical path for dealing with extreme fluctuations. Furthermore, the general reinforcement learning model obtained through offline training has internalized effective experience in coordinating multiple continuous devices for rapid voltage regulation under the predicted scenarios. This policy network serves as a high-performance initial controller, and its parameters constitute a high-quality policy foundation. This provides an excellent starting point and knowledge guarantee for subsequent fine-tuning in the online phase. In addition, the general reinforcement learning model is trained to convergence in rich and diverse offline scenarios, ensuring that the learned policies have considerable universality and stability. This stable and rich knowledge representation, when transferred as initial values to the online model, can greatly suppress the risk of control oscillations caused by drastic policy changes in the early stages of online learning and significantly accelerate the subsequent fine-tuning convergence process based on real-time data.

[0152] As can be seen from the above embodiments of the distribution network voltage control method provided in this specification, these embodiments can construct output probability models for each distributed power source based on historical output data of each distributed power source in the distribution network; based on the output probability models, a day-ahead voltage control model is constructed with the goal of minimizing the active power loss and node voltage exceedance risk index of the distribution network; based on the day-ahead voltage control model, the first reactive power regulation command of multiple discrete voltage control devices in the distribution network and the reference voltage of each node in the distribution network are determined within the first scheduling cycle; a general reinforcement learning model is constructed based on the intraday predicted output data of each distributed power source and the reference voltage of each node; the parameters of the general reinforcement learning model are transferred to the target reinforcement learning model; the target reinforcement learning model is fine-tuned based on the intraday actual output data of multiple distributed power sources; the fine-tuned target reinforcement learning model is used to generate the second reactive power regulation command of multiple continuous voltage control devices in the distribution network within the second scheduling cycle; each first scheduling cycle includes multiple second scheduling cycles. By constructing a distributed power source output probability model, the complex probability distribution characteristics (such as multi-peak and skewed) of photovoltaic and wind power output can be accurately characterized, thereby providing high-fidelity uncertainty input for subsequent stochastic optimization. 
Building upon this foundation, the constructed day-ahead voltage control model uses both a probability-based voltage exceedance risk index and system active power loss as optimization objectives, achieving a proactive quantitative trade-off between safety and losses. This model formulates globally optimal day-ahead switching plans for discrete devices with slow response times and limited action frequency (such as capacitor banks), effectively avoiding ineffective and frequent device actions and extending their service life while optimizing overall network operating losses. The generated reference voltage curves for each node provide a unified and coordinated tracking benchmark for intraday real-time control, ensuring the vertical coordination of multi-timescale control strategies. Through parameter migration technology, the online controller inherits mature control strategies learned from offline training from the outset, completely eliminating the control risks and performance instability caused by the randomness of strategies in the early stages of traditional online learning, achieving safe and smooth integration of AI controllers in critical power infrastructure. Furthermore, rapid fine-tuning of the model based on real-time data can promptly calibrate strategy deviations caused by prediction errors, enabling the system not only to cope with known fluctuations but also to safely and efficiently adapt to unknown real-time changes and slow time-varying variations. Ultimately, the fine-tuned target reinforcement learning model can instantly generate precise adjustment commands to coordinate multiple continuous voltage control devices based on real-time grid conditions, achieving rapid suppression of voltage fluctuations. 
This real-time control layer works in conjunction with the previously planned discrete device schedule, forming a complete closed-loop control system in which the day-ahead schedule of the discrete devices establishes a safe operating framework, while the intraday continuous devices perform rapid and precise fine-tuning. This architecture fully exploits the response characteristics of the different control devices, significantly improving the real-time safety and control accuracy of the system while protecting equipment lifespan and keeping full-cycle losses low.
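The nesting of the two scheduling cycles can be sketched as follows. The function name and the durations in the usage note are hypothetical; this passage only states that each first scheduling cycle includes multiple second scheduling cycles.

```python
def build_schedule(horizon_min, first_cycle_min, second_cycle_min):
    """List each first (slow) cycle start with the second (fast) cycle starts inside it.

    All arguments are in minutes and are illustrative assumptions.
    """
    if first_cycle_min % second_cycle_min != 0:
        raise ValueError("the first cycle must contain a whole number of second cycles")
    schedule = []
    for slow_start in range(0, horizon_min, first_cycle_min):
        slow_end = min(slow_start + first_cycle_min, horizon_min)
        # Discrete devices act once at slow_start; continuous devices act at every fast start.
        fast_starts = list(range(slow_start, slow_end, second_cycle_min))
        schedule.append((slow_start, fast_starts))
    return schedule
```

For example, `build_schedule(120, 60, 15)` yields two slow cycles, each containing four fast cycles.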

[0153] Based on the above-described distribution network voltage control method, this specification also proposes embodiments of a distribution network voltage control device. As shown in Figure 5, the distribution network voltage control device 500 may specifically include the following modules: a first construction module 501, used to construct the output probability model of each distributed power source based on the historical output data of each distributed power source in the distribution network; a second construction module 502, used to construct a day-ahead voltage control model based on the output probability model, with the goal of minimizing the active power loss of the distribution network and the node voltage over-limit risk index; a determination module 503, used to determine, based on the day-ahead voltage control model, the first reactive power regulation command of multiple discrete voltage control devices in the distribution network and the reference voltage of each node in the distribution network within the first scheduling cycle; a third construction module 504, used to construct a general reinforcement learning model based on the intraday predicted output data of each distributed power source and the reference voltage of each node; a transfer module 505, used to transfer the parameters of the general reinforcement learning model to the target reinforcement learning model; a fine-tuning module 506, used to fine-tune the target reinforcement learning model based on the intraday actual output data of multiple distributed power sources; and a generation module 507, used to generate the second reactive power regulation command for multiple continuous voltage control devices in the distribution network within the second scheduling cycle using the fine-tuned target reinforcement learning model. Each first scheduling cycle includes multiple second scheduling cycles.

[0154] In some embodiments, the first construction module 501 described above can be specifically used for: For each distributed power source, based on its historical output data, a nonparametric kernel density estimation algorithm is used to construct the output probability density function of the active power output of the distributed power source.
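A minimal sketch of nonparametric kernel density estimation for one distributed power source is given below. The Gaussian kernel and the normal-reference bandwidth rule are assumptions made for illustration; the specification does not fix the kernel or the bandwidth selection here.

```python
import math


def gaussian_kde_pdf(samples, bandwidth=None):
    """Build an estimated probability density function from historical output samples.

    samples   -- historical active power outputs (e.g. in p.u.)
    bandwidth -- kernel width; if None, the normal-reference rule 1.06 * std * n^(-1/5)
                 is used (an illustrative assumption)
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    if bandwidth is None:
        avg = sum(samples) / n
        std = math.sqrt(sum((x - avg) ** 2 for x in samples) / (n - 1))
        bandwidth = 1.06 * std * n ** (-0.2)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(p):
        # Sum of Gaussian kernels centred on each historical sample.
        return norm * sum(math.exp(-0.5 * ((p - x) / bandwidth) ** 2) for x in samples)

    return pdf
```

Because no distribution family is assumed, the estimate can follow multi-peak or skewed output data, which is the property the method relies on.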

[0155] In some embodiments, the second construction module 502 described above can be specifically used for: determining, based on the output probability model of each distributed power source and by means of the semi-invariant (cumulant) algorithm, the semi-invariants of each order of the output random variable corresponding to each distributed power source; converting the semi-invariants of the output random variables corresponding to each distributed power source into the semi-invariants of the voltage of each node in the distribution network; expanding the semi-invariants of each node voltage to obtain the probability distribution of each node voltage; and constructing, based on the probability distribution of the voltage at each node and with the goal of minimizing the active power loss of the distribution network and the node voltage over-limit risk index, a day-ahead voltage control model based on a mixed-integer second-order cone programming algorithm.
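As an illustration of the first step, the semi-invariants (cumulants) of orders one to four can be obtained from raw moments with the standard moment-cumulant relations. Estimating the moments from samples, and stopping at order four, are assumptions made for this sketch.

```python
def cumulants_from_samples(samples):
    """Return the first four cumulants [k1, k2, k3, k4] of an output random variable.

    The raw moments m1..m4 are estimated from equally weighted samples
    (an illustrative assumption; they could also be integrated from the density).
    """
    n = len(samples)
    m1, m2, m3, m4 = (sum(x ** k for x in samples) / n for k in (1, 2, 3, 4))
    k1 = m1                                  # mean
    k2 = m2 - m1 ** 2                        # variance
    k3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    k4 = m4 - 4 * m3 * m1 - 3 * m2 ** 2 + 12 * m2 * m1 ** 2 - 6 * m1 ** 4
    return [k1, k2, k3, k4]
```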

[0156] In some embodiments, the second construction module 502 described above can also be used for: expanding the AC power flow model of the distribution network in a Taylor series at the reference operating point and linearizing it, to obtain the linear sensitivity relationship between the power injection disturbance and the voltage disturbance at the distribution network nodes; and converting, based on the linear sensitivity relationship, the semi-invariants of the output random variables corresponding to each distributed power source into the semi-invariants of the voltages of each node in the distribution network.
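Under a linear sensitivity relationship, the conversion can be sketched using two standard cumulant properties: homogeneity, where the order-k cumulant of a·X equals a^k times that of X, and additivity for independent variables. The assumption that the source outputs are mutually independent, and the names `sensitivity` and `source_cumulants`, are illustrative and not taken from the specification.

```python
def node_voltage_cumulants(sensitivity, source_cumulants):
    """Map source-output cumulants to node-voltage cumulants.

    sensitivity[i][j]      -- linear sensitivity of the voltage at node i to the
                              power injection of source j (hypothetical values)
    source_cumulants[j][k] -- cumulant of order k + 1 of source j's output
    Assumes mutually independent sources, so cumulants of the same order add.
    """
    orders = len(source_cumulants[0])
    return [
        [
            sum(row[j] ** k * source_cumulants[j][k - 1] for j in range(len(row)))
            for k in range(1, orders + 1)
        ]
        for row in sensitivity
    ]
```

The resulting per-node cumulants are the input to the series expansion that recovers each node voltage's probability distribution.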

[0157] In some embodiments, the second construction module 502 described above can also be used for: constructing, based on the probability distribution of the voltage at each node and with the goal of minimizing the active power loss of the distribution network and the node voltage over-limit risk index, a day-ahead voltage control model subject to the following constraints: the capacitor banks in the distribution network satisfy the daily switching frequency and single regulation capacity constraints; the energy storage system in the distribution network satisfies the state-of-charge and charging / discharging power constraints; the photovoltaic inverters in the distribution network satisfy the reactive power regulation capacity constraints; and the nodes in the distribution network satisfy the voltage safety operation and line power flow constraints.
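As a small illustration of the capacitor bank constraints, the check below verifies a candidate day-ahead tap schedule against a daily switching limit and a per-action step limit. Interpreting the single regulation capacity as a maximum step change per action is an assumption, and this check is a feasibility test only, not the mixed-integer second-order cone program itself.

```python
def capacitor_schedule_is_feasible(steps, max_daily_switches, max_step_change):
    """Check a day-ahead capacitor bank schedule against two constraints.

    steps              -- number of capacitor groups switched in at each period
    max_daily_switches -- allowed number of switching actions per day (assumed limit)
    max_step_change    -- allowed change in groups per action (assumed reading of
                          the "single regulation capacity" constraint)
    """
    changes = [abs(b - a) for a, b in zip(steps, steps[1:])]
    switches = sum(1 for c in changes if c > 0)
    return switches <= max_daily_switches and all(c <= max_step_change for c in changes)
```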

[0158] In some embodiments, the third construction module 504 described above can be specifically used for: using the reference voltage as the control target and the intraday predicted output data and the grid state data of the distribution network as inputs, performing offline training with a deep deterministic policy gradient (DDPG) algorithm to obtain a general reinforcement learning model for regulating the continuous voltage control devices.

[0159] In some embodiments, the third construction module 504 described above can also be used for: The state space, action space, and reward function of the general reinforcement learning model are constructed. The state space includes the voltage amplitude of each node in the distribution network, the predicted output data of the multiple distributed power sources, and time information. The action space includes the reactive power output setpoints of the multiple continuous voltage control devices. The reward function is constructed to be negatively correlated with the deviation of the node voltage from the reference voltage. An Actor-Critic network based on a deep deterministic policy gradient algorithm is constructed. The Actor network takes grid state data as input and outputs reactive power regulation action data. The Critic network takes grid state data and reactive power regulation action data as joint input and outputs the reward function Q value corresponding to the reactive power regulation action data. Based on the intraday predicted power output data, the Actor-Critic network is trained offline in a simulated power distribution network environment, and the parameters of the Actor network and Critic network are updated alternately using the gradient descent method. After the offline training is completed, the parameters of the Actor-Critic network are saved to obtain the general reinforcement learning model.
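The state, action and reward definitions above can be sketched as follows. The sine/cosine time encoding, the mean absolute deviation used as the reward, and the clipping of Actor outputs to a symmetric reactive power range are all assumptions; the specification only requires that the reward be negatively correlated with the deviation from the reference voltage.

```python
import math


def build_state(node_voltages, forecast_outputs, hour):
    """Concatenate voltage amplitudes, predicted outputs and time information."""
    angle = 2.0 * math.pi * hour / 24.0
    # Encoding the hour as sin/cos is an illustrative choice for the time information.
    return list(node_voltages) + list(forecast_outputs) + [math.sin(angle), math.cos(angle)]


def reward(node_voltages, reference_voltages):
    """Negative mean absolute deviation from the reference voltages (assumed form)."""
    deviations = [abs(v - r) for v, r in zip(node_voltages, reference_voltages)]
    return -sum(deviations) / len(deviations)


def scale_action(raw_action, q_limits):
    """Map Actor outputs in [-1, 1] to reactive power setpoints within +/- q_limits."""
    return [max(-1.0, min(1.0, a)) * q for a, q in zip(raw_action, q_limits)]
```

In a DDPG setup, the Actor would map the output of `build_state` to `raw_action`, and the Critic would score the state and scaled action jointly.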

[0160] In some embodiments, the fine-tuning module 506 described above can also be used for: based on the preset large model, keeping unchanged the parameters of the network layers in the target reinforcement learning model that correspond to wind and solar fluctuation feature extraction; and, based on the intraday actual output data, fine-tuning the parameters of the remaining network layers in the target reinforcement learning model.
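Parameter transfer followed by partial fine-tuning can be sketched with plain dictionaries of weights. The layer names (`feature_extractor`, `policy_head`), the learning rate and the plain gradient-descent update are hypothetical; a real implementation would use a deep learning framework's own mechanism for freezing layers.

```python
import copy


def transfer_parameters(general_params):
    """Initialise the target model with a deep copy of the general model's parameters."""
    return copy.deepcopy(general_params)


def fine_tune_step(params, grads, frozen_layers, lr=1e-3):
    """Apply one gradient-descent update to every layer that is not frozen.

    params, grads -- dicts mapping a layer name to a flat list of weights / gradients
    frozen_layers -- names of layers kept unchanged (e.g. feature extraction layers)
    """
    for name, weights in params.items():
        if name in frozen_layers:
            continue  # frozen layers keep the transferred values
        params[name] = [w - lr * g for w, g in zip(weights, grads[name])]
    return params
```

A usage sketch: copy the general model with `transfer_parameters`, then call `fine_tune_step(target, grads, {"feature_extractor"})` on gradients computed from intraday actual output data.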

[0161] As can be seen from the distribution network voltage control device provided in the embodiments of this specification above, the embodiments of this specification can construct the output probability model of each distributed power source based on the historical output data of each distributed power source in the distribution network; based on the output probability model, a day-ahead voltage control model is constructed with the goal of minimizing the active power loss and node voltage over-limit risk index of the distribution network; based on the day-ahead voltage control model, the first reactive power adjustment command of multiple discrete voltage control devices in the distribution network and the reference voltage of each node in the distribution network are determined within the first scheduling cycle; a general reinforcement learning model is constructed based on the intraday predicted output data of each distributed power source and the reference voltage of each node; the parameters of the general reinforcement learning model are transferred to the target reinforcement learning model; the target reinforcement learning model is fine-tuned based on the intraday actual output data of multiple distributed power sources; the fine-tuned target reinforcement learning model is used to generate the second reactive power adjustment command of multiple continuous voltage control devices in the distribution network within the second scheduling cycle; each first scheduling cycle includes multiple second scheduling cycles. By constructing a distributed power source output probability model, the complex probability distribution characteristics (such as multi-peak and skewed) of photovoltaic and wind power output can be accurately characterized, thereby providing high-fidelity uncertainty input for subsequent stochastic optimization. 
Building upon this foundation, the constructed day-ahead voltage control model uses both a probability-based voltage exceedance risk index and system active power loss as optimization objectives, achieving a proactive quantitative trade-off between safety and losses. This model formulates globally optimal day-ahead switching plans for discrete devices with slow response times and limited action frequency (such as capacitor banks), effectively avoiding ineffective and frequent device actions and extending their service life while optimizing overall network operating losses. The generated reference voltage curves for each node provide a unified and coordinated tracking benchmark for intraday real-time control, ensuring the vertical coordination of multi-timescale control strategies. Through parameter migration technology, the online controller inherits mature control strategies learned from offline training from the outset, completely eliminating the control risks and performance instability caused by the randomness of strategies in the early stages of traditional online learning, achieving safe and smooth integration of AI controllers in critical power infrastructure. Furthermore, rapid fine-tuning of the model based on real-time data can promptly calibrate strategy deviations caused by prediction errors, enabling the system not only to cope with known fluctuations but also to safely and efficiently adapt to unknown real-time changes and slow time-varying variations. Ultimately, the fine-tuned target reinforcement learning model can instantly generate precise adjustment commands to coordinate multiple continuous voltage control devices based on real-time grid conditions, achieving rapid suppression of voltage fluctuations. 
This real-time control layer works in conjunction with the previously planned discrete device schedule, forming a complete closed-loop control system in which the day-ahead schedule of the discrete devices establishes a safe operating framework, while the intraday continuous devices perform rapid and precise fine-tuning. This architecture fully exploits the response characteristics of the different control devices, significantly improving the real-time safety and control accuracy of the system while protecting equipment lifespan and keeping full-cycle losses low.

[0162] This specification also provides a computer device for a distribution network voltage control method, including a processor and a memory for storing processor-executable instructions. Specifically, the processor can perform the following tasks according to the instructions: constructing an output probability model for each distributed power source based on historical output data of each distributed power source in the distribution network; constructing a day-ahead voltage control model based on the output probability model, with the objective of minimizing active power loss and node voltage exceedance risk index of the distribution network; determining, based on the day-ahead voltage control model, the first reactive power adjustment command for multiple discrete voltage control devices in the distribution network and the reference voltage of each node in the distribution network within a first scheduling cycle; constructing a general reinforcement learning model based on the intraday predicted output data of each distributed power source and the reference voltage of each node; transferring the parameters of the general reinforcement learning model to a target reinforcement learning model; fine-tuning the target reinforcement learning model based on the intraday actual output data of multiple distributed power sources; and generating a second reactive power adjustment command for multiple continuous voltage control devices in the distribution network within a second scheduling cycle using the fine-tuned target reinforcement learning model; each first scheduling cycle includes multiple second scheduling cycles.

[0163] To execute the above instructions more accurately, as shown in Figure 6, the embodiments of this specification also provide another specific computer device 600. The computer device 600 includes a network communication port 601, a processor 602 and a memory 603, which are connected by internal cables so that they can exchange data with one another.

[0164] The processor 602 can be specifically used to: construct an output probability model for each distributed power source based on historical output data of each distributed power source in the distribution network; construct a day-ahead voltage control model based on the output probability model, with the goal of minimizing the active power loss and node voltage over-limit risk index of the distribution network; determine the first reactive power regulation command of multiple discrete voltage control devices in the distribution network and the reference voltage of each node in the distribution network within the first scheduling cycle based on the day-ahead voltage control model; construct a general reinforcement learning model based on the intraday predicted output data of each distributed power source and the reference voltage of each node; transfer the parameters of the general reinforcement learning model to the target reinforcement learning model; fine-tune the target reinforcement learning model based on the intraday actual output data of multiple distributed power sources; and use the fine-tuned target reinforcement learning model to generate the second reactive power regulation command of multiple continuous voltage control devices in the distribution network within the second scheduling cycle; each first scheduling cycle includes multiple second scheduling cycles.

[0165] The memory 603 can be used to store the corresponding instruction program.

[0166] In this embodiment, the network communication port 601 can be a virtual port bound to different communication protocols, thereby enabling the sending or receiving of different data. For example, the network communication port can be a port responsible for web data communication, a port responsible for FTP data communication, or a port responsible for email data communication. Furthermore, the network communication port can also be a physical communication interface or communication chip. For example, it can be a wireless mobile network communication chip, such as GSM or CDMA; it can also be a Wi-Fi chip; or it can be a Bluetooth chip.

[0167] In this embodiment, the processor 602 can be implemented in any suitable manner. For example, the processor can take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, embedded microcontrollers, etc. This specification does not limit the form of the processor.

[0168] In this embodiment, the memory 603 includes volatile memory and non-volatile memory. The memory 603 can include multiple layers. In digital systems, anything that can store binary data can be a memory; in integrated circuits, a circuit with storage function but no physical form is also called a memory, such as RAM, FIFO, etc.; in a system, a storage device with a physical form is also called a memory, such as a memory stick, TF card, etc.

[0169] Furthermore, embodiments of this specification also provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the instructions of the method shown in Figure 1.

[0170] It should be understood that in the various embodiments of this specification, the sequence number of each process does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this specification.

[0171] It should also be understood that, in the embodiments of this specification, the term "and / or" is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, and B existing alone. Additionally, the character " / " in this specification generally indicates that the preceding and following related objects have an "or" relationship.

[0172] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0173] This invention is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more processes of the flowchart and / or one or more boxes of the block diagram.

[0174] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and / or one or more boxes of the block diagram.

[0175] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational tasks to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide tasks for implementing the functions specified in one or more processes of the flowchart and / or one or more boxes of the block diagram.

[0176] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the above descriptions are merely specific embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A voltage control method for a power distribution network, characterized in that it comprises: constructing an output probability model of each distributed power source based on the historical output data of each distributed power source in the distribution network; constructing, based on the output probability model, a day-ahead voltage control model with the goal of minimizing the active power loss of the distribution network and the node voltage over-limit risk index; determining, based on the day-ahead voltage control model, the first reactive power regulation command of multiple discrete voltage control devices in the distribution network and the reference voltage of each node in the distribution network within the first scheduling cycle; constructing a general reinforcement learning model based on the intraday predicted output data of each distributed power source and the reference voltage of each node; transferring the parameters of the general reinforcement learning model to a target reinforcement learning model; fine-tuning the target reinforcement learning model based on the intraday actual output data of multiple distributed power sources; and using the fine-tuned target reinforcement learning model to generate the second reactive power regulation command for multiple continuous voltage control devices in the distribution network within the second scheduling cycle; wherein each first scheduling cycle includes multiple second scheduling cycles.

2. The method according to claim 1, characterized in that constructing the output probability model of each distributed power source comprises: for each distributed power source, using a nonparametric kernel density estimation algorithm, based on its historical output data, to construct the probability density function of the active power output of the distributed power source.

3. The method according to claim 1, characterized in that constructing, based on the output probability model, the day-ahead voltage control model with the goal of minimizing the active power loss of the distribution network and the node voltage over-limit risk index comprises: determining, based on the output probability model of each distributed power source and by means of the semi-invariant (cumulant) algorithm, the semi-invariants of each order of the output random variable corresponding to each distributed power source; converting the semi-invariants of the output random variables corresponding to each distributed power source into the semi-invariants of the voltage of each node in the distribution network; expanding the semi-invariants of each node voltage to obtain the probability distribution of each node voltage; and constructing, based on the probability distribution of the voltage at each node and with the goal of minimizing the active power loss of the distribution network and the node voltage over-limit risk index, a day-ahead voltage control model based on a mixed-integer second-order cone programming algorithm.

4. The method according to claim 3, characterized in that converting the semi-invariants of the output random variables corresponding to each distributed power source into the semi-invariants of the voltages of each node in the distribution network comprises: expanding the AC power flow model of the distribution network in a Taylor series at the reference operating point and linearizing it, to obtain the linear sensitivity relationship between the power injection disturbance and the voltage disturbance at the distribution network nodes; and converting, based on the linear sensitivity relationship, the semi-invariants of the output random variables corresponding to each distributed power source into the semi-invariants of the voltages of each node in the distribution network.

5. The method according to claim 3, characterized in that constructing, based on the probability distribution of the voltage at each node and with the goal of minimizing the active power loss of the distribution network and the node voltage over-limit risk index, the day-ahead voltage control model based on a mixed-integer second-order cone programming algorithm comprises: constructing, based on the probability distribution of the voltage at each node and with the goal of minimizing the active power loss of the distribution network and the node voltage over-limit risk index, a day-ahead voltage control model subject to the following constraints: the capacitor banks in the distribution network satisfy the daily switching frequency and single regulation capacity constraints; the energy storage system in the distribution network satisfies the state-of-charge and charging / discharging power constraints; the photovoltaic inverters in the distribution network satisfy the reactive power regulation capacity constraints; and the nodes in the distribution network satisfy the voltage safety operation and line power flow constraints.

6. The method according to claim 1, characterized in that constructing the general reinforcement learning model based on the intraday predicted output data of each distributed power source and the reference voltage of each node comprises: using the reference voltage as the control target and the intraday predicted output data and the grid state data of the distribution network as inputs, performing offline training with a deep deterministic policy gradient algorithm to obtain a general reinforcement learning model for regulating the continuous voltage control devices.

7. The method according to claim 6, characterized in that performing offline training with the deep deterministic policy gradient algorithm to obtain the general reinforcement learning model for regulating the continuous voltage control devices comprises: constructing the state space, action space, and reward function of the general reinforcement learning model, wherein the state space includes the voltage amplitude of each node in the distribution network, the predicted output data of the multiple distributed power sources, and time information, the action space includes the reactive power output setpoints of the multiple continuous voltage control devices, and the reward function is constructed to be negatively correlated with the deviation of the node voltage from the reference voltage; constructing an Actor-Critic network based on the deep deterministic policy gradient algorithm, wherein the Actor network takes grid state data as input and outputs reactive power regulation action data, and the Critic network takes grid state data and reactive power regulation action data as joint input and outputs the reward function Q value corresponding to the reactive power regulation action data; training the Actor-Critic network offline in a simulated distribution network environment based on the intraday predicted output data, and updating the parameters of the Actor network and the Critic network alternately using the gradient descent method; and, after the offline training is completed, saving the parameters of the Actor-Critic network to obtain the general reinforcement learning model.

8. The method according to claim 1, characterized in that fine-tuning the target reinforcement learning model comprises: based on the preset large model, keeping unchanged the parameters of the network layers in the target reinforcement learning model that correspond to wind and solar fluctuation feature extraction; and, based on the intraday actual output data, fine-tuning the parameters of the remaining network layers in the target reinforcement learning model.

9. A power distribution network voltage control device, characterized in that it comprises: a first construction module, used to construct the output probability model of each distributed power source based on the historical output data of each distributed power source in the distribution network; a second construction module, used to construct a day-ahead voltage control model based on the output probability model, with the goal of minimizing the active power loss of the distribution network and the node voltage over-limit risk index; a determination module, used to determine, based on the day-ahead voltage control model, the first reactive power regulation command of multiple discrete voltage control devices in the distribution network and the reference voltage of each node in the distribution network within the first scheduling cycle; a third construction module, used to construct a general reinforcement learning model based on the intraday predicted output data of each distributed power source and the reference voltage of each node; a transfer module, used to transfer the parameters of the general reinforcement learning model to the target reinforcement learning model; a fine-tuning module, used to fine-tune the target reinforcement learning model based on the intraday actual output data of multiple distributed power sources; and a generation module, used to generate the second reactive power regulation command for multiple continuous voltage control devices in the distribution network within the second scheduling cycle using the fine-tuned target reinforcement learning model; wherein each first scheduling cycle includes multiple second scheduling cycles.

10. A computer device, characterized in that it comprises: a memory, used to store a computer program; and a processor, used to execute the computer program to implement the method of any one of claims 1-8.