A state-enhanced large-scale distributed photovoltaic safety peak shaving method and system

By optimizing the peak-shaving strategy of a large-scale distributed photovoltaic system using a deep reinforcement learning model, the problems of transformer overload and control delay were solved, achieving more efficient photovoltaic consumption and transformer safety management.

CN120934092B · Active · Publication Date: 2026-06-19 · CHONGQING UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHONGQING UNIV
Filing Date
2025-08-11
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies have failed to effectively address the transformer overload problem caused by reverse power flow in large-scale distributed photovoltaic systems, and have not considered the delay and overload risk of uploading to the control center, lacking optimal control strategies for both short-term and long-term time scales.

Method used

A deep reinforcement learning model is adopted, which combines reward functions on short-term and long-term time scales. Through adaptive noise and state augmentation space, the photovoltaic peak-shaving strategy is optimized. Taking into account transformer overload and solar curtailment, the photovoltaic output is dynamically adjusted to reduce overload risk, and the scheduling decision is optimized by state delay compensation.

Benefits of technology

It improves the overload mitigation capacity of transformers, enhances photovoltaic absorption efficiency, improves the accuracy and timeliness of dispatching decisions, and reduces overload risk.

✦ Generated by Eureka AI based on patent content.


Abstract

A state-enhanced method and system for large-scale distributed photovoltaic (PV) power shaving includes: establishing a deep reinforcement learning model for distributed PV power shaving; establishing a multi-timescale reward function as a composite reward function for the deep reinforcement learning model; during training the deep reinforcement learning model, calculating the transformer overload performance index in each iteration, calculating adaptive noise based on the transformer overload performance index, generating a perturbation term, and superimposing the perturbation term on the current policy network; using the deep reinforcement learning model for distributed PV power shaving to perform peak shaving, establishing a state enhancement space based on state transmission delay and weather conditions during peak shaving, predicting the state delay compensation amount at each time step using the state enhancement space, and correcting the current state with the compensation amount. This invention improves the accuracy and timeliness of peak shaving decisions through multi-timescale reward functions, adaptive noise, and state enhancement.

Description

Technical Field

[0001] This invention belongs to the field of power distribution network peak shaving technology, and more specifically, relates to a state-enhanced large-scale distributed photovoltaic security peak shaving method and system.

Background Technology

[0002] The rapid growth of distributed energy resources (DER) has met human electricity needs and enhanced social sustainability, but it has also brought new challenges to power system security. In recent years, the large-scale installation of distributed photovoltaic (PV) panels on residential rooftops has significantly improved residents' daily electricity consumption. However, during periods of high solar irradiance, the electricity generated by PV systems often exceeds the residential load, leading to reverse power flow (RPF). This excess electricity is fed back to the upstream grid through transformers, potentially causing transformer overload and affecting the overall security of the distribution network. Therefore, further research is needed on safe peak-shaving strategies for large-scale distributed photovoltaic (DPV) systems.

[0003] CN117200338A proposes a distributed photovoltaic peak-shaving method based on a data acquisition terminal. This invention is based on the existing transformer substation structure and adds power acquisition for photovoltaic power. First, the number of photovoltaic units that need to participate in peak shaving is calculated based on the total peak-shaving demand. Then, units are selected according to the principle of prioritizing photovoltaic units that have participated in peak shaving fewer times. The actual power generation of the selected units is adjusted to construct a peak-shaving strategy, which is then implemented by sending it from the data acquisition terminal to the IoT meter.

[0004] CN111952996A discloses a peak-shaving control method for distributed photovoltaic (PV) systems with energy storage based on economic benefit assessment, belonging to the technical field of electricity market ancillary services. The economic benefit assessment of this invention considers both the costs and benefits of distributed PV systems with energy storage participating in peak shaving. Costs include the opportunity cost of the distributed PV project, the cost of purchasing electricity for energy storage charging, and the operation and maintenance costs of the energy storage. Benefits include peak-shaving compensation revenue and the revenue from the energy storage. Based on the economic benefit analysis, a reasonable strategy for distributed PV systems with energy storage participating in peak shaving is formulated.

[0005] However, none of the aforementioned patents took into account the delay in uploading to the control center, nor did they limit the risk of overload, nor did they jointly consider the optimal control on both short-term and long-term time scales.

Summary of the Invention

[0006] To address the shortcomings of existing technologies, this invention provides a state-enhanced method and system for large-scale distributed photovoltaic security peak shaving.

[0007] The present invention adopts the following technical solution.

[0008] The first aspect of this invention proposes a state-enhanced method for large-scale distributed photovoltaic (PV) security peak shaving, comprising the following:

[0009] A deep reinforcement learning model for distributed photovoltaic (PV) power shaving is established; the action of the deep reinforcement learning model is to perform power shaving for the PV power of each substation.

[0010] The scheduling day is divided into multiple time periods. A short-term reward function is established based on the transformer overload power and solar curtailment of the substation at each time period. A long-term reward function is established based on the number of overloads within the scheduling day. The short-term and long-term reward functions are merged to form a multi-time-scale reward function as the composite reward function of the deep reinforcement learning model.

[0011] When training a deep reinforcement learning model, during each iteration of action exploration, the transformer overload performance index is calculated based on the transformer overload power, adaptive noise is calculated based on the transformer overload performance index and a perturbation term is generated, and the perturbation term is superimposed on the current policy network.

[0012] A deep reinforcement learning model for distributed photovoltaic (PV) security peak shaving is used to perform PV peak shaving. During peak shaving, a state augmentation space based on state transmission delay and weather conditions is established. At each time step, the state augmentation space is used to predict the state delay compensation amount, and the compensation amount is used to correct the current state.

[0013] Preferably, the establishment of the deep reinforcement learning model for distributed photovoltaic power security peak shaving specifically involves:

[0014] The deep reinforcement learning model is a Markov decision model, whose state includes the current time, the photovoltaic output and load demand of each substation at the current time;

[0015] The action is to perform peak-shaving control on the photovoltaic power of each substation. The range of the action is between [-1, 0]. When the action of the corresponding substation is negative, the photovoltaic output of the corresponding substation is reduced. When it is 0, the photovoltaic output of the substation remains unchanged.

[0016] Preferably, the fusion of the short-term and long-term reward functions to form a multi-timescale reward function, which serves as the composite reward function of the deep reinforcement learning model, specifically involves:

[0017] The short-term reward function for a given moment is as follows: establish a reward function based on transformer overload power and a reward function based on solar curtailment, and then sum the two reward functions according to the weights set for the corresponding reward functions.

[0018] The long-term reward function is as follows: calculate the difference between the number of overload occurrences of each transformer on the dispatch day and the set maximum allowable number of overload occurrences, add up all differences greater than 0, and take the negative of the sum;

[0019] The short-term reward functions for all time periods within a scheduling day are summed, and the long-term reward function, multiplied by its set weight, is added to the sum to form the multi-time-scale reward function.

[0020] Preferably, the establishment of reward functions based on transformer overload power and solar curtailment amounts specifically involves:

[0021]

[0022] In the formula, the quantities are: the reward function based on transformer overload power at time t; the reward function based on solar curtailment at time t; the total number of transformers; the overload power of the i-th transformer at time t; the capacity of the i-th transformer; the maximum reverse power of the i-th transformer; the solar curtailment of the i-th transformer at time t; and the photovoltaic output of the i-th transformer at time t.

[0023] Preferably, the calculation of the transformer overload performance index specifically involves:

[0024] For each transformer, the overload power at each time is divided by the capacity of the corresponding transformer, and then divided by the maximum reverse power of the corresponding transformer, to give the overload coefficient of that transformer at that time.

[0025] The average overload performance is obtained by taking the square root of the sum of the squares of the overload coefficients of all transformers at each time.

[0026] The maximum overload coefficient among all transformers at each time is extracted, the difference between this maximum and the set overload threshold is calculated, and the Sigmoid function of the result is taken as the maximum overload performance.

[0027] The average overload performance and the maximum overload performance are multiplied to obtain the transformer overload performance index.

[0028] Preferably, the step of calculating adaptive noise as a disturbance term based on the transformer overload performance index specifically involves:

[0029]

[0030] In the formula, the quantities are: the adaptive noise; the set fixed linear attenuation coefficient and the feedback attenuation coefficient, respectively; the current iteration count and the set maximum iteration count, respectively; the transformer overload performance index; and the set reference noise;

[0031] The disturbance term follows a normal distribution whose standard deviation is the adaptive noise.

[0032] Preferably, the establishment of the state enhancement space based on state transmission delay and weather conditions specifically includes:

[0033] The state at the current time t, the current weather conditions, the transmission timestamp, and the expected action together form a vector as the state augmentation space; the weather conditions include light intensity and normalized light intensity change rate; the expected action is to maintain the action issued by the dispatch center at the previous moment; if the dispatch center did not issue a peak-shaving action at the previous moment, the expected action is to implement maximum power point tracking (MPPT).

[0034] Preferably, at each time step, the state enhancement space is used to predict the state delay compensation amount, and the compensation amount is used to correct the current state, specifically as follows:

[0035] The delay time is obtained by subtracting the timestamp at which the state enhancement space was sent from the timestamp at which it was received.

[0036] The photovoltaic output and load demand in the state at the current time t, the current weather conditions, the delay time and the expected action are used as the feature vector and input to the pre-trained quantile regression forest algorithm, which predicts the delay compensation amount and outputs its confidence. The delay compensation amount includes the photovoltaic output compensation amount and the load demand compensation amount.

[0037] The absolute values of the photovoltaic output compensation amount and the load demand compensation amount are each less than or equal to a ratio of the photovoltaic output and the load demand, respectively, in the state at the current time t;

[0038] When the expected action is to maintain the action issued by the dispatch center at the previous moment, the ratio is a set value;

[0039] When the expected action is to perform maximum power point tracking (MPPT), the ratio is calculated based on the confidence level of the quantile regression forest algorithm and the normalized rate of change of light intensity.

[0040] The photovoltaic output compensation amount and the load demand compensation amount are added to the photovoltaic output and the load demand, respectively, in the state at the current time t to obtain the corrected photovoltaic output and load demand.

[0041] Preferably, the ratio is calculated based on the confidence level of the quantile regression forest algorithm and the normalized rate of change of light intensity, specifically:

[0042] When the expected action is to perform maximum power point tracking (MPPT), the ratio is:

[0043]

[0044] In the formula, the quantities are: the confidence level; the two set coefficients; and the normalized rate of change of light intensity.

[0045] A second aspect of the present invention proposes a large-scale distributed photovoltaic security peak-shaving system based on the state enhancement method described in the first aspect of the present invention, comprising a peak-shaving model establishment module, a reward function construction module, an adaptive noise module, and a state enhancement compensation module, specifically:

[0046] Peak shaving model establishment module: used to establish a deep reinforcement learning model for safe peak shaving of distributed photovoltaic power; the action of the deep reinforcement learning model is to perform peak shaving for the photovoltaic power of each substation;

[0047] Reward function construction module: It is used to divide the scheduling day into multiple time periods, establish a short-term reward function based on the transformer overload power and solar curtailment of the substation at each time period, establish a long-term reward function based on the number of overloads within the scheduling day, and merge the short-term reward function and the long-term reward function to form a multi-time-scale reward function as the composite reward function of the deep reinforcement learning model.

[0048] Adaptive noise module: When training a deep reinforcement learning model, during each iteration of action exploration, the transformer overload performance index is calculated based on the transformer overload power, adaptive noise is calculated based on the transformer overload performance index and a perturbation term is generated, and the perturbation term is superimposed on the current policy network;

[0049] State enhancement compensation module: Used to perform peak shaving of photovoltaic power using a deep reinforcement learning model for distributed photovoltaic power security. During peak shaving, a state enhancement space is established based on state transmission delay and weather conditions. At each time step, the state enhancement space is used to predict the state delay compensation amount, and the compensation amount is used to correct the current state.

[0050] The beneficial effects of this invention are that, compared with the prior art,

[0051] This invention constructs a multi-timescale reward function, which not only evaluates the performance of all time periods within each distributed photovoltaic peak-shaving day, but also performs an overall evaluation of the peak-shaving performance on the peak-shaving day, further improving the ability to alleviate transformer overload. This invention introduces an adaptive noise attenuation method, which dynamically adjusts the noise intensity by providing real-time feedback on transformer overload conditions. When the overload risk increases, the noise intensity decreases to suppress high-risk behavior, prioritizing transformer safety. When the transformer has sufficient reverse power margin, exploration is enhanced, improving the efficiency of photovoltaic power consumption. This invention establishes a state enhancement space based on state transmission delay and weather conditions. At each moment, the state enhancement space is used to predict the state delay compensation amount, taking into account the impact of delay and weather, improving the accuracy and timeliness of large-scale DPV scheduling decisions under state transmission delay.

Attached Figure Description

[0052] Figure 1 is a flowchart of the method of the present invention;

[0053] Figure 2 is a schematic diagram of a power distribution network according to an embodiment of the present invention;

[0054] Figure 3 is a schematic diagram of the multi-timescale reward function;

[0055] Figure 4 is a schematic diagram illustrating the state transmission delay.

Detailed Implementation

[0056] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of this invention. The embodiments described in this application are merely some embodiments of this invention, and not all embodiments. Based on the spirit of this invention, all other embodiments obtained by those skilled in the art without creative effort are within the protection scope of this invention.

[0057] As shown in Figure 1, Embodiment 1 of the present invention provides a state-enhanced large-scale distributed photovoltaic security peak-shaving method, including the following:

[0058] A deep reinforcement learning model for distributed photovoltaic (PV) power shaving is established; the action of the deep reinforcement learning model is to perform power shaving for the PV power of each substation.

[0059] The scheduling day is divided into multiple time periods. A short-term reward function is established based on the transformer overload power and solar curtailment of the substation at each time period. A long-term reward function is established based on the number of overloads within the scheduling day. The short-term and long-term reward functions are merged to form a multi-time-scale reward function as the composite reward function of the deep reinforcement learning model.

[0060] When training a deep reinforcement learning model, during each iteration of action exploration, the transformer overload performance index is calculated based on the transformer overload power, adaptive noise is calculated based on the transformer overload performance index and a perturbation term is generated, and the perturbation term is superimposed on the current policy network.

[0061] It should be noted that the policy network is a network that explores actions in deep reinforcement learning, and noise is added to increase exploration.

[0062] A deep reinforcement learning model for distributed photovoltaic (PV) security peak shaving is used to perform PV peak shaving. During peak shaving, a state augmentation space based on state transmission delay and weather conditions is established. At each time step, the state augmentation space is used to predict the state delay compensation amount, and the compensation amount is used to correct the current state.

[0063] It should be noted that, as shown in Figure 2, the photovoltaic output and load demand data in this embodiment are derived from the daily operation sampling data of a county, with a sampling resolution of 1 hour. The three substations in the county are connected to a unified busbar, and each substation includes a transformer and distributed photovoltaics. The technical parameters of the transformers and photovoltaics of the three substations connected to the unified busbar are shown in Table 1.

[0064] Table 1. Technical parameters of transformers and photovoltaic systems at each substation

[0065]

[0066] In this preferred embodiment, the establishment of the deep reinforcement learning model for distributed photovoltaic power security peak shaving specifically involves:

[0067] The deep reinforcement learning model is a Markov decision model, whose state includes the current time, the photovoltaic output and load demand of each substation at the current time;

[0068] The action is to perform peak-shaving control on the photovoltaic power of each substation. The range of the action is between [-1, 0]. When the action of the corresponding substation is negative, the photovoltaic output of the corresponding substation is reduced. When it is 0, the photovoltaic output of the substation remains unchanged.

[0069] It should be noted that the state represents information observed in the DPV peak-shaving environment and is an important basis for guiding photovoltaic controller agents to make reasonable decisions. The formula is:

[0070]

[0071] In the formula, the quantities are: the set of all states; the current time; the real-time photovoltaic output and load demand of the i-th transformer; and the total number of transformers.

[0072] Actions are peak-shaving instructions issued by the agent to control the DPV output, based on the current policy and the state observed from the environment. The formula is:

[0073]

[0074] In the formula, the quantities are: the action of the i-th transformer at time t; and the set of actions;

[0075] When the action of a substation is negative, the photovoltaic output of the corresponding substation is reduced;
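As an illustration of the action semantics described above, the following minimal Python sketch applies an action in [-1, 0] to the photovoltaic output of each substation. The scaling rule (output multiplied by 1 plus the action) is an assumption, since the adjustment formula is not reproduced in this text, and the function name `apply_peak_shaving` is hypothetical.

```python
from typing import List


def apply_peak_shaving(pv_output: List[float], actions: List[float]) -> List[float]:
    """Scale each substation's PV output by (1 + action).

    ASSUMPTION: the curtailed output is pv * (1 + a). The text only states
    that a negative action reduces output and that 0 leaves it unchanged.
    """
    if len(pv_output) != len(actions):
        raise ValueError("one action is required per substation")
    adjusted = []
    for pv, a in zip(pv_output, actions):
        if not -1.0 <= a <= 0.0:
            raise ValueError(f"action {a} is outside [-1, 0]")
        adjusted.append(pv * (1.0 + a))
    return adjusted


# Three substations: no curtailment, 50% curtailment, full curtailment.
print(apply_peak_shaving([4.0, 2.0, 1.0], [0.0, -0.5, -1.0]))  # [4.0, 1.0, 0.0]
```

Under this assumed rule, an action of -1 curtails the substation's output completely.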

[0076] The state transition probability is the probability that, given the current state of the environment, the state transitions to the next state when the agent performs an action. The probability of state transitions is influenced not only by the uncertainty of large-scale photovoltaic power output but also by fluctuations in load demand. Traditional methods struggle to effectively address this uncertainty, while deep reinforcement learning-based methods can incorporate uncertainty into the transition probability equation and learn the relationships between states in a data-driven manner.

[0077] In this preferred embodiment, as shown in Figure 3, the fusion of short-term and long-term reward functions to form a multi-timescale reward function, which serves as the composite reward function of the deep reinforcement learning model, specifically involves:

[0078] The short-term reward functions for all time periods within a scheduling day are summed, and the long-term reward function, multiplied by its set weight, is added to the sum to form the multi-time-scale reward function.

[0079] The formula for the multi-timescale reward function is:

[0080]

[0081] In the formula, the quantities are: the reward function based on transformer overload power at time t; the reward function based on solar curtailment at time t; the short-term reward function at time t; the long-term reward function; the multi-timescale reward function; the three set weights, which sum to 1; the total number of transformers; the overload power of the i-th transformer at time t; the capacity of the i-th transformer; the maximum reverse power of the i-th transformer; the solar curtailment of the i-th transformer at time t; the photovoltaic output of the i-th transformer at time t; the number of overload occurrences and the maximum permissible number of overloads of the i-th transformer during the dispatching day; and the total number of time periods within the scheduling day.
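The way the short-term and long-term terms combine can be sketched in Python as follows. This is a minimal sketch that follows the verbal description only: the per-period overload and curtailment rewards are taken as given inputs because their formulas are not reproduced in this text, and all function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple


def long_term_reward(overload_counts: Sequence[int], max_allowed: Sequence[int]) -> float:
    """Negative sum of the positive excesses of overload counts over their limits."""
    return -float(sum(max(0, n - n_max) for n, n_max in zip(overload_counts, max_allowed)))


def short_term_reward(r_overload: float, r_curtail: float,
                      w_overload: float, w_curtail: float) -> float:
    """Weighted sum of the two per-period rewards (values supplied by the caller)."""
    return w_overload * r_overload + w_curtail * r_curtail


def multi_timescale_reward(per_period: Sequence[Tuple[float, float]],
                           overload_counts: Sequence[int],
                           max_allowed: Sequence[int],
                           weights: Tuple[float, float, float]) -> float:
    """Sum of short-term rewards over the day plus the weighted long-term reward."""
    w_overload, w_curtail, w_long = weights
    short_sum = sum(short_term_reward(ro, rc, w_overload, w_curtail) for ro, rc in per_period)
    return short_sum + w_long * long_term_reward(overload_counts, max_allowed)


# Two time periods, three transformers, weights summing to 1 (illustrative values).
total = multi_timescale_reward(
    per_period=[(-1.0, -2.0), (0.0, -4.0)],
    overload_counts=[3, 1, 5],
    max_allowed=[2, 2, 2],
    weights=(0.5, 0.25, 0.25),
)
print(total)  # -3.0
```

In this example the long-term term is -4 (excesses of 1 and 3), so a day with repeated overloads is penalized even if each individual period looks acceptable.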

[0082] It should be noted that the overload power is:

[0083]

[0084] The formula for calculating reverse power is:

[0085]
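Since the two formulas are not reproduced in this text, the following Python sketch shows one plausible reading rather than the patent's definition. It assumes that reverse power is the surplus of photovoltaic output over load demand, and that overload power is the part of the reverse power exceeding the transformer's maximum reverse power. Both rules and both function names are assumptions.

```python
def reverse_power(pv_output: float, load_demand: float) -> float:
    """ASSUMPTION: reverse power is the PV surplus over local load, never negative."""
    return max(0.0, pv_output - load_demand)


def overload_power(pv_output: float, load_demand: float, max_reverse_power: float) -> float:
    """ASSUMPTION: overload power is the reverse power above the permitted maximum."""
    return max(0.0, reverse_power(pv_output, load_demand) - max_reverse_power)


# PV output 5.0, load 1.0, permitted reverse power 3.0 (illustrative values).
print(reverse_power(5.0, 1.0))        # 4.0
print(overload_power(5.0, 1.0, 3.0))  # 1.0
```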

[0086] In a preferred embodiment, the calculation of the transformer overload performance index specifically involves:

[0087] For each transformer, the overload power at each time is divided by the capacity of the corresponding transformer, and then divided by the maximum reverse power of the corresponding transformer, to give the overload coefficient of that transformer at that time.

[0088] The average overload performance is obtained by taking the square root of the sum of the squares of the overload coefficients of all transformers at each time.

[0089] The maximum overload coefficient among all transformers at each time is extracted, the difference between this maximum and the set overload threshold is calculated, and the Sigmoid function of the result is taken as the maximum overload performance.

[0090] The average overload performance and the maximum overload performance are multiplied to obtain the transformer overload performance index.

[0091] Specifically, the formula for the transformer overload performance index is:

[0092]

[0093] In the formula, the quantities are the set overload threshold and the Sigmoid function.

[0094] It should be noted that the average overload performance characterizes the overall overload condition of all transformers, while the maximum overload performance characterizes the overload condition of the transformer with the largest overload. The transformer overload performance index therefore not only assesses the overall overload condition of the distribution network but also is sensitive to individual severe overloads.
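The index described in words above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the overload coefficients of all transformers at one time are taken as inputs, the Sigmoid argument is assumed to be the maximum coefficient minus the threshold, and the function names are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def overload_performance_index(coefficients: Sequence[float], threshold: float) -> float:
    """Product of the average and maximum overload performance at one time.

    ASSUMPTION: the Sigmoid argument is (max coefficient - threshold).
    """
    if not coefficients:
        raise ValueError("at least one overload coefficient is required")
    # Square root of the sum of squares, as stated in the text.
    average_performance = math.sqrt(sum(c * c for c in coefficients))
    maximum_performance = sigmoid(max(coefficients) - threshold)
    return average_performance * maximum_performance


# Three transformers; the worst one sits exactly at the threshold.
index = overload_performance_index([0.0, 0.3, 0.4], threshold=0.4)
print(round(index, 6))  # 0.25
```

Because the second factor depends only on the worst transformer, a single severe overload raises the index even when the other transformers are unloaded.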

[0095] In this preferred embodiment, the step of calculating adaptive noise as a disturbance term based on the transformer overload performance index specifically involves:

[0096]

[0097] In the formula, the quantities are: the adaptive noise; the set fixed linear attenuation coefficient and the feedback attenuation coefficient, respectively; the current iteration count and the set maximum iteration count, respectively; the transformer overload performance index; and the set reference noise;

[0098] The disturbance term follows a normal distribution whose standard deviation is the adaptive noise.
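How a sampled disturbance term might be used during exploration is sketched below in Python. The adaptive noise value is taken as an input because its formula is not reproduced in this text. The zero mean, the choice to add the disturbance to the action output by the policy, and the clipping back to [-1, 0] are all assumptions, and `perturb_action` is a hypothetical name.

```python
import random


def perturb_action(action: float, adaptive_noise: float, rng: random.Random) -> float:
    """Add a Gaussian disturbance and clip the result to the action range [-1, 0].

    ASSUMPTIONS: zero-mean noise, applied to the policy's output action,
    with clipping so that the perturbed action stays valid.
    """
    disturbance = rng.gauss(0.0, adaptive_noise)
    return min(0.0, max(-1.0, action + disturbance))


rng = random.Random(0)  # fixed seed so the exploration run is reproducible
explored = [perturb_action(-0.3, 0.2, rng) for _ in range(5)]
print(all(-1.0 <= a <= 0.0 for a in explored))  # True
```

A smaller adaptive noise keeps the explored actions close to the policy's own output, which matches the stated aim of suppressing risky exploration when overload risk is high.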

[0099] In this preferred embodiment, the establishment of the state enhancement space based on state transmission delay and weather conditions specifically involves:

[0100] The state at the current time t, the current weather conditions, the transmission timestamp, and the expected action together form a vector as the state augmentation space. The weather conditions include light intensity and the normalized rate of change of light intensity. It should be noted that the normalized rate of change of light intensity is the light intensity at the current time t minus the light intensity at the previous time t-1, divided by the light intensity at the current time t. The expected action is to maintain the action issued by the dispatch center at the previous moment. If the dispatch center did not issue a peak-shaving action at the previous moment, the expected action is to implement maximum power point tracking (MPPT).

[0101] It should be noted that, under normal circumstances, substations perform peak shaving based on the peak shaving instructions issued by the dispatch center, which is the action taken at the previous moment. However, faults or other abnormal situations may occur, resulting in the dispatch center not issuing instructions for a long time. In this case, the system switches to a safety strategy, which is to implement Maximum Power Point Tracking (MPPT).
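The construction of the state augmentation vector can be sketched in Python as follows. This is a minimal sketch with several assumptions: MPPT is encoded as action 0 (no curtailment), zero light intensity is treated as no change to avoid division by zero, and the ordering of the vector and all names are hypothetical.

```python
from typing import List, Optional

# ASSUMPTION: MPPT (no curtailment) is encoded as action 0 for every substation.
MPPT_ACTION = 0.0


def normalized_irradiance_change(current: float, previous: float) -> float:
    """(I_t - I_{t-1}) / I_t, following the description in the text."""
    if current == 0.0:
        return 0.0  # ASSUMPTION: zero light intensity is treated as no change
    return (current - previous) / current


def expected_action(prev_actions: Optional[List[float]], n_substations: int) -> List[float]:
    """Keep the previous dispatch action, or fall back to MPPT if none was issued."""
    if prev_actions is None:
        return [MPPT_ACTION] * n_substations
    return list(prev_actions)


def build_augmented_state(state: List[float], irradiance: float, prev_irradiance: float,
                          sent_timestamp: float, prev_actions: Optional[List[float]],
                          n_substations: int) -> List[float]:
    """Concatenate state, weather features, send timestamp and expected action."""
    rate = normalized_irradiance_change(irradiance, prev_irradiance)
    return [*state, irradiance, rate, sent_timestamp,
            *expected_action(prev_actions, n_substations)]


# One substation; state = [time, PV output, load]; no action was issued previously.
augmented = build_augmented_state([12.0, 4.0, 1.0], 800.0, 600.0, 1000.0, None, 1)
print(augmented)  # [12.0, 4.0, 1.0, 800.0, 0.25, 1000.0, 0.0]
```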

[0102] In this preferred embodiment, at each time step, the state enhancement space is used to predict the state delay compensation amount, and the compensation amount is used to correct the current state. Specifically:

[0103] The delay time is obtained by subtracting the timestamp at which the state enhancement space was sent from the timestamp at which it was received by the dispatch center.

[0104] It should be noted that large-scale DPV and local loads are concentrated in low-voltage distribution networks, which are geographically far from provincial dispatch centers. Therefore, data cannot be sent directly to the provincial dispatch center. Instead, it passes through multiple levels of data transmission and processing: county dispatch, local dispatch, and finally provincial dispatch. As a result, there is a constant communication delay when transmitting the system's operating status. However, traditional peak-shaving methods rarely consider peak-shaving scenarios with constant communication delays. A schematic diagram of the state transmission delay is shown in Figure 4, where d is the delay time.

[0105] The photovoltaic output and load demand in the state at the current time t, the current weather conditions, the delay time and the expected action are used as the feature vector and input to the pre-trained quantile regression forest algorithm, which predicts the delay compensation amount and outputs its confidence. The delay compensation amount includes the photovoltaic output compensation amount and the load demand compensation amount.

[0106] The absolute values of the photovoltaic output compensation amount and the load demand compensation amount are each less than or equal to a ratio of the photovoltaic output and the load demand, respectively, in the state at the current time t;

[0107] When the expected action is to maintain the action issued by the dispatch center at the previous moment, the ratio is a set value; it should be noted that the set value of the ratio in this embodiment is 5%.

[0108] When the expected action is to perform maximum power point tracking (MPPT), the ratio is calculated based on the confidence level of the quantile regression forest algorithm and the normalized rate of change of light intensity.

[0109] The photovoltaic output compensation amount and the load demand compensation amount are added to the photovoltaic output and the load demand, respectively, in the state at the current time t to obtain the corrected photovoltaic output and load demand.
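The delay calculation and the bounded correction of the state can be sketched in Python as follows. This is a minimal sketch: the compensation amounts are taken as inputs standing in for the output of the quantile regression forest, enforcing the bound by clipping is an assumption, and all function names are hypothetical. The default ratio of 0.05 is the set value given in this embodiment for the case where the previous dispatch action is maintained.

```python
from typing import Tuple


def delay_time(sent_timestamp: float, received_timestamp: float) -> float:
    """Delay between sending the augmented state and receiving it."""
    return received_timestamp - sent_timestamp


def clip_compensation(value: float, compensation: float, ratio: float) -> float:
    """Limit |compensation| to ratio * |value| (ASSUMPTION: the bound is enforced by clipping)."""
    bound = ratio * abs(value)
    return max(-bound, min(bound, compensation))


def correct_state(pv_output: float, load_demand: float,
                  pv_compensation: float, load_compensation: float,
                  ratio: float = 0.05) -> Tuple[float, float]:
    """Add the bounded compensation amounts to the delayed PV output and load demand."""
    corrected_pv = pv_output + clip_compensation(pv_output, pv_compensation, ratio)
    corrected_load = load_demand + clip_compensation(load_demand, load_compensation, ratio)
    return corrected_pv, corrected_load


print(delay_time(sent_timestamp=1000.0, received_timestamp=1012.0))  # 12.0
# The PV compensation of 8.0 exceeds the 5% bound (5.0) and is clipped;
# the load compensation of -1.0 is within its bound (2.0) and is kept.
print(correct_state(100.0, 40.0, pv_compensation=8.0, load_compensation=-1.0))
```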

[0110] In this preferred embodiment, the ratio is calculated based on the confidence level of the quantile regression forest algorithm and the normalized rate of change of light intensity, specifically:

[0111] When the expected action is to perform maximum power point tracking (MPPT), the ratio is:

[0112]

[0113] In the formula, the quantities are: the confidence level; the two set coefficients; and the normalized rate of change of light intensity.

[0114] It should be noted that the quantile regression forest algorithm outputs predicted values corresponding to three set quantiles, where each quantile represents the probability that the true value is less than the corresponding predicted value. The three set quantiles include the median (0.5); in this embodiment they are 0.05, 0.5, and 0.95. The predicted value corresponding to the median is the final predicted value of the quantile regression forest algorithm. The confidence level in this embodiment is:

[0115]

[0116] In the formula, the two predicted values are those corresponding to the 0.05 and 0.95 quantiles, respectively; the set compensation reference range is obtained by subtracting the minimum compensation amount from the maximum compensation amount in the historical data;

[0117] It should be noted that when the rate of change of light intensity is large, the photovoltaic output power varies widely, so the compensation range is correspondingly wide. Under the same illumination conditions, a higher confidence level in the predicted value indicates higher prediction accuracy, allowing a larger compensation amount. For cases with low confidence, a smaller constraint range is set to avoid compensation overshoot caused by inaccurate predictions. Specifically, in this embodiment, one set coefficient is 0.3 and the other is $\tanh^{-1}(0.5)$ divided by the set rate-of-change threshold.
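The coefficient choice in [0117] can be illustrated with a short sketch. The formulas in [0112] and [0115] are not reproduced in this text, so the two functional forms below are assumptions chosen only to match the stated behavior (a wider bound for faster light changes and for higher confidence); they are not the patent's formulas. The function names and the threshold value 0.2 are hypothetical. Only the values 0.3 and atanh(0.5) divided by the rate-of-change threshold come from the text.

```python
import math


def rate_coefficient(rate_threshold):
    """Coefficient from [0117]: atanh(0.5) / rate-of-change threshold."""
    return math.atanh(0.5) / rate_threshold


def interval_confidence(q05, q95, reference_range):
    """ASSUMED form: 1 - (q95 - q05) / reference_range, clipped to [0, 1].

    A narrow 0.05-0.95 prediction interval relative to the historical
    compensation range gives a confidence close to 1.
    """
    width = abs(q95 - q05)
    return max(0.0, min(1.0, 1.0 - width / reference_range))


def mppt_proportion(confidence, light_rate, k1=0.3, rate_threshold=0.2):
    """ASSUMED form: k1 * confidence * tanh(k2 * |light_rate|)."""
    k2 = rate_coefficient(rate_threshold)
    return k1 * confidence * math.tanh(k2 * abs(light_rate))


# With k2 = atanh(0.5) / threshold, the tanh term equals 0.5 exactly when
# the normalized rate of change reaches the threshold.
half = math.tanh(rate_coefficient(0.2) * 0.2)
confidence = interval_confidence(q05=-2.0, q95=2.0, reference_range=10.0)
proportion = mppt_proportion(confidence, light_rate=0.2)
```

Under these assumed forms, the second coefficient acts as a scale: a rate of change equal to the threshold uses half of the maximum bound, and faster changes approach the full bound of 0.3 times the confidence level.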

[0118] Embodiment 2 of the present invention proposes a large-scale distributed photovoltaic safety peak-shaving system based on the state-enhanced method described in Embodiment 1 of the present invention, including a peak-shaving model establishment module, a reward function construction module, an adaptive noise module, and a state enhancement compensation module, specifically as follows:

[0119] Peak shaving model establishment module: used to establish a deep reinforcement learning model for safe peak shaving of distributed photovoltaic power; the action of the deep reinforcement learning model is to perform peak shaving for the photovoltaic power of each substation;

[0120] Reward function construction module: It is used to divide the scheduling day into multiple time periods, establish a short-term reward function based on the transformer overload power and solar curtailment of the substation at each time period, establish a long-term reward function based on the number of overloads within the scheduling day, and merge the short-term reward function and the long-term reward function to form a multi-time-scale reward function as the composite reward function of the deep reinforcement learning model.
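A minimal sketch of how the reward function construction module could combine the two time scales, following the construction recited in claim 3. The function and parameter names are hypothetical, the per-moment overload and curtailment rewards are taken as inputs because their formulas are not reproduced in this text, and the numbers are illustrative only.

```python
def short_term_reward(overload_reward, curtailment_reward, w_overload, w_curtail):
    """Weighted sum of the overload-based and curtailment-based rewards."""
    return w_overload * overload_reward + w_curtail * curtailment_reward


def long_term_reward(overload_counts, max_allowed):
    """Negative of the summed positive excesses over the allowed count.

    overload_counts holds the number of overloads of each transformer
    within the scheduling day.
    """
    excesses = [count - max_allowed for count in overload_counts]
    return -sum(e for e in excesses if e > 0)


def composite_reward(short_rewards, overload_counts, max_allowed, w_long):
    """Sum of all short-term rewards plus the weighted long-term reward."""
    return sum(short_rewards) + w_long * long_term_reward(overload_counts, max_allowed)


# Hypothetical day with three time periods and three transformers.
step_rewards = [
    short_term_reward(-1.0, -0.5, w_overload=0.5, w_curtail=1.0),  # -1.0
    short_term_reward(-1.0, 0.0, w_overload=0.5, w_curtail=1.0),   # -0.5
    short_term_reward(0.0, 0.0, w_overload=0.5, w_curtail=1.0),    #  0.0
]
total = composite_reward(step_rewards, overload_counts=[3, 1, 5], max_allowed=2, w_long=0.5)
```

Only transformers that exceed the allowed number of overloads contribute to the long-term term, so a day with no excess leaves the composite reward equal to the sum of the short-term rewards.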

[0121] Adaptive noise module: When training a deep reinforcement learning model, during each iteration of action exploration, the transformer overload performance index is calculated based on the transformer overload power, adaptive noise is calculated based on the transformer overload performance index and a perturbation term is generated, and the perturbation term is superimposed on the current policy network;
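The transformer overload performance index used by this module can be sketched from the wording of claim 5. This is an illustration read literally from that wording, with hypothetical function names and sample values: the overload coefficient is the overload power divided by the capacity and then by the maximum reverse power, and the "average" term is the square root of the sum of squared coefficients. Whether the original formula also divides by the number of transformers cannot be recovered from this text. The adaptive noise formula itself is not reproduced here, so it is not sketched.

```python
import math


def overload_coefficient(overload_power, capacity, max_reverse_power):
    """Overload power divided by capacity, then by maximum reverse power."""
    return overload_power / capacity / max_reverse_power


def overload_performance_index(coefficients, overload_threshold):
    """Product of the average and maximum overload performance.

    coefficients: overload coefficients of all transformers at one moment
    (must be non-empty).
    """
    # Average overload performance: root of the sum of squares (literal reading).
    average_perf = math.sqrt(sum(c * c for c in coefficients))
    # Maximum overload performance: sigmoid of (max coefficient - threshold).
    shifted = max(coefficients) - overload_threshold
    max_perf = 1.0 / (1.0 + math.exp(-shifted))
    return average_perf * max_perf


# Hypothetical coefficients for two transformers; the largest one sits
# exactly at the threshold, so the sigmoid term is 0.5.
index = overload_performance_index([0.3, 0.4], overload_threshold=0.4)
```

The sigmoid term keeps the index small while the worst transformer is well below the threshold and lets it grow once that transformer approaches or exceeds it.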

[0122] State enhancement compensation module: used to perform photovoltaic peak shaving using the deep reinforcement learning model for distributed photovoltaic safety peak shaving. During peak shaving, a state enhancement space is established based on state transmission delay and weather conditions. At each time step, the state enhancement space is used to predict the state delay compensation amount, and the compensation amount is used to correct the current state.
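The state enhancement space and the feature vector passed to the delay-compensation predictor can be sketched as below, following claims 7 and 8. This is a simplified illustration for a single substation. The class, the function names, the numeric encoding of the expected action, and the use of `None` to mean that no peak-shaving action was issued at the previous moment are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

HOLD_PREVIOUS = 0.0  # hypothetical code: keep the previous dispatch action
MPPT = 1.0           # hypothetical code: maximum power point tracking


@dataclass(frozen=True)
class EnhancedState:
    pv_output: float
    load_demand: float
    light_intensity: float
    light_change_rate: float  # normalized rate of change of light intensity
    sent_timestamp: float
    expected_action: float


def choose_expected_action(previous_action: Optional[float]) -> float:
    """MPPT if no action was issued at the previous moment, else hold."""
    return MPPT if previous_action is None else HOLD_PREVIOUS


def delay_time(state: EnhancedState, received_timestamp: float) -> float:
    """Receiving timestamp minus sending timestamp."""
    return received_timestamp - state.sent_timestamp


def feature_vector(state: EnhancedState, received_timestamp: float) -> List[float]:
    """Features: PV output, load demand, weather, delay time, expected action."""
    return [
        state.pv_output,
        state.load_demand,
        state.light_intensity,
        state.light_change_rate,
        delay_time(state, received_timestamp),
        state.expected_action,
    ]


# Hypothetical sample: sent at t = 1000.0 s, received 2.5 s later.
state = EnhancedState(120.0, 90.0, 800.0, 0.1,
                      sent_timestamp=1000.0,
                      expected_action=choose_expected_action(None))
features = feature_vector(state, received_timestamp=1002.5)
```

Carrying the sending timestamp inside the state lets the receiver compute the delay on arrival, so the predictor sees how stale the measurements are.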

[0123] This disclosure can be a system, method, and / or computer program product. A computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for causing a processor to implement various aspects of this disclosure.

[0124] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the specific implementation of the present invention. Any modifications or equivalent substitutions that do not depart from the spirit and scope of the present invention should be covered within the protection scope of the claims of the present invention.

Claims

1. A state-enhanced large-scale distributed photovoltaic (PV) safety peak-shaving method, characterized in that it comprises: establishing a deep reinforcement learning model for distributed PV safety peak shaving, the action of the deep reinforcement learning model being to perform peak shaving on the PV power of each substation; dividing the scheduling day into multiple time periods, establishing a short-term reward function based on the transformer overload power and solar curtailment of the substation in each time period, establishing a long-term reward function based on the number of overloads within the scheduling day, and merging the short-term and long-term reward functions to form a multi-time-scale reward function as the composite reward function of the deep reinforcement learning model; when training the deep reinforcement learning model, during each iteration of action exploration, calculating the transformer overload performance index based on the transformer overload power, calculating adaptive noise based on the transformer overload performance index and generating a perturbation term, and superimposing the perturbation term on the current policy network; and using the deep reinforcement learning model for distributed PV safety peak shaving to perform PV peak shaving, wherein during peak shaving a state enhancement space is established based on state transmission delay and weather conditions, and at each time step the state enhancement space is used to predict the state delay compensation amount, which is used to correct the current state.

2. The state-enhanced large-scale distributed photovoltaic safety peak-shaving method according to claim 1, characterized in that the deep reinforcement learning model for distributed photovoltaic safety peak shaving is established specifically as follows: the deep reinforcement learning model is a Markov decision model whose state includes the current time and the photovoltaic output and load demand of each substation at the current time; the action is the peak-shaving adjustment of the photovoltaic output of each substation and takes values in the range [-1, 0]; when the action for a substation is negative, the photovoltaic output of that substation is reduced, and when it is 0, the photovoltaic output of that substation remains unchanged.

3. The state-enhanced large-scale distributed photovoltaic safety peak-shaving method according to claim 1, characterized in that the merging of the short-term and long-term reward functions to form a multi-time-scale reward function as the composite reward function of the deep reinforcement learning model is specifically as follows: the short-term reward function at a given moment is obtained by establishing a reward function based on transformer overload power and a reward function based on solar curtailment, and summing the two reward functions according to their respective set weights; the long-term reward function is obtained by calculating, for each transformer, the difference between its number of overloads on the scheduling day and the set maximum allowable number of overloads, adding up all differences greater than 0, and taking the negative of the sum; the short-term reward functions at all moments within the scheduling day are summed, and the long-term reward function multiplied by its set weight is added, to form the multi-time-scale reward function.

4. The state-enhanced large-scale distributed photovoltaic safety peak-shaving method according to claim 3, characterized in that the reward function based on transformer overload power and the reward function based on solar curtailment are established as follows: In the formula, the symbols denote, in order: the reward function based on transformer overload power at a given moment; the reward function based on solar curtailment at that moment; the total number of transformers; the overload power of a given transformer at that moment; the capacity of that transformer; the maximum reverse power of that transformer; the solar curtailment of that transformer at that moment; and the photovoltaic output of that transformer at that moment.

5. The state-enhanced large-scale distributed photovoltaic safety peak-shaving method according to claim 4, characterized in that the transformer overload performance index is calculated specifically as follows: the overload power of a transformer at a given moment is divided by the capacity of the corresponding transformer and then by the maximum reverse power of the corresponding transformer to obtain the overload coefficient of that transformer at that moment; the average overload performance is obtained by taking the square root of the sum of the squares of the overload coefficients of all transformers at that moment; the maximum overload coefficient among all transformers at that moment is extracted, a set overload threshold is subtracted from it, and the sigmoid function of the result is taken as the maximum overload performance; the average overload performance and the maximum overload performance are multiplied to obtain the transformer overload performance index.

6. The state-enhanced large-scale distributed photovoltaic safety peak-shaving method according to claim 5, characterized in that the calculation of the adaptive noise based on the transformer overload performance index and the generation of the perturbation term are specifically as follows: In the formula, the symbols denote: the adaptive noise; the set fixed linear attenuation coefficient and the set feedback attenuation coefficient, respectively; the current iteration count and the set maximum iteration count, respectively; the transformer overload performance index; and the set reference noise; the perturbation term follows a normal distribution with the given standard deviation.

7. The state-enhanced large-scale distributed photovoltaic safety peak-shaving method according to claim 2, characterized in that the establishment of the state enhancement space based on state transmission delay and weather conditions is specifically as follows: the state at the current time t, the current weather conditions, the sending timestamp, and the expected action together form a vector that serves as the state enhancement space; the weather conditions include the light intensity and the normalized rate of change of light intensity; the expected action is to maintain the action issued by the dispatch center at the previous moment; if the dispatch center did not issue a peak-shaving action at the previous moment, the expected action is to perform maximum power point tracking (MPPT).

8. The state-enhanced large-scale distributed photovoltaic safety peak-shaving method according to claim 7, characterized in that using the state enhancement space at each time step to predict the state delay compensation amount, and using the compensation amount to correct the current state, is specifically as follows: the delay time is obtained by subtracting the sending timestamp of the state enhancement space from the timestamp at which it is received; the photovoltaic output and load demand in the state at the current time t, the current weather conditions, the delay time, and the expected action are combined into a feature vector, which is input to a pre-trained quantile regression forest to predict the delay compensation amount and output its confidence level; the delay compensation amount includes the photovoltaic output compensation amount and the load demand compensation amount; the absolute values of the photovoltaic output compensation amount and the load demand compensation amount must not exceed a set proportion of the photovoltaic output and the load demand, respectively, in the state at the current time t; when the expected action is to maintain the action issued by the dispatch center at the previous moment, the proportion is a set value; when the expected action is to perform maximum power point tracking (MPPT), the proportion is calculated based on the confidence level of the quantile regression forest algorithm and the normalized rate of change of light intensity; the photovoltaic output compensation amount and the load demand compensation amount are added to the photovoltaic output and the load demand, respectively, in the state at the current time t to obtain the corrected photovoltaic output and load demand.

9. The state-enhanced large-scale distributed photovoltaic safety peak-shaving method according to claim 8, characterized in that the proportion is calculated based on the confidence level of the quantile regression forest algorithm and the normalized rate of change of light intensity, specifically: when the expected action is to perform maximum power point tracking (MPPT), the proportion is: In the formula, the symbols denote the confidence level, the two set coefficients, and the normalized rate of change of light intensity.

10. A large-scale distributed photovoltaic safety peak-shaving system based on the state-enhanced method of any one of claims 1-9, comprising a peak-shaving model establishment module, a reward function construction module, an adaptive noise module, and a state enhancement compensation module, characterized in that: the peak-shaving model establishment module is used to establish a deep reinforcement learning model for distributed photovoltaic safety peak shaving, the action of the deep reinforcement learning model being to perform peak shaving on the photovoltaic power of each substation; the reward function construction module is used to divide the scheduling day into multiple time periods, establish a short-term reward function based on the transformer overload power and solar curtailment of the substation in each time period, establish a long-term reward function based on the number of overloads within the scheduling day, and merge the short-term reward function and the long-term reward function to form a multi-time-scale reward function as the composite reward function of the deep reinforcement learning model; the adaptive noise module is used, when training the deep reinforcement learning model and during each iteration of action exploration, to calculate the transformer overload performance index based on the transformer overload power, calculate adaptive noise based on the transformer overload performance index and generate a perturbation term, and superimpose the perturbation term on the current policy network; the state enhancement compensation module is used to perform photovoltaic peak shaving using the deep reinforcement learning model for distributed photovoltaic safety peak shaving, wherein during peak shaving a state enhancement space is established based on state transmission delay and weather conditions, and at each time step the state enhancement space is used to predict the state delay compensation amount, and the compensation amount is used to correct the current state.

Citation Information

Patent Citations

  • Energy storage-containing distributed photovoltaic peak regulation control method based on economic benefit evaluation

    CN111952996A

  • Distributed photovoltaic peak regulation method based on acquisition terminal

    CN117200338A

  • Power grid active scheduling intelligent decision-making method and system based on Lagrange relaxation

    CN117254468A

  • Multi-scale electric energy flow control method and system based on deep reinforcement learning

    CN119813225A