Energy storage scheduling strategy generation method and device based on multi-objective optimization algorithm

By employing a multi-layer uncertainty quantification and dynamic weight adaptive energy storage scheduling strategy, combined with Bayesian optimization and graph neural networks, the uncertainty issues of photovoltaic output, load demand, and battery health status in energy storage systems are resolved, achieving efficient charge and discharge control and system optimization.

CN122246899A · Pending · Publication Date: 2026-06-19 · ZHEJIANG TTN ELECTRIC

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: ZHEJIANG TTN ELECTRIC
Filing Date: 2026-05-22
Publication Date: 2026-06-19

Abstract

This invention discloses a method and apparatus for generating energy storage scheduling strategies based on a multi-objective optimization algorithm. The method includes: multi-level uncertainty quantification of photovoltaic power, load, electricity price, and battery health status; constructing a distribution-sensitive multi-objective function, including objectives such as revenue, battery life, photovoltaic absorption, and demand control; generating dynamic weight coefficients online through Bayesian optimization based on the uncertainty of variable predictions and battery status; obtaining a set of coupled constraints using a graph neural network based on real-time battery data and health status, including a lower limit for photovoltaic absorption rate and a power limit for health status; solving a baseline charging and discharging plan for a future preset time period using a global scheduling layer; and inputting the dynamic weights and real-time data into a reinforcement learning strategy network at a local scheduling layer to generate and execute minute-level charging and discharging actions under coupled constraints. This invention enables intelligent scheduling and revenue optimization of energy storage systems in complex and highly uncertain environments.

Description

Technical Field

[0001] This invention relates to the field of energy storage scheduling technology, specifically to a method and apparatus for generating energy storage scheduling strategies based on a multi-objective optimization algorithm.

Background Technology

[0002] With the large-scale deployment of distributed photovoltaic (PV) and battery energy storage systems, energy storage scheduling needs to simultaneously consider multiple conflicting objectives, such as economic benefits, battery lifespan, local PV consumption, and demand control. Key variables such as PV output, load demand, real-time electricity prices, and battery health status are highly uncertain and have nonlinear coupling relationships with each other, making scheduling decisions extremely complex.

[0003] Existing energy storage dispatch methods mostly rely on deterministic point predictions to construct optimization models, aggregating multiple objectives into a single objective with fixed weights. Such methods cannot characterize the distribution of prediction errors and struggle to incorporate the expected value and risk of uncertainty into the optimization process. When prediction confidence is low or fluctuates drastically, fixed-weight strategies can easily lead to significant deviations from expected returns, insufficient photovoltaic power absorption, or increased curtailment, while simultaneously accelerating battery lifespan degradation.

[0004] Some methods introduce robust optimization or stochastic programming to handle uncertainty, but they usually use a single form of uncertainty set or a pre-defined probability distribution or interval, lacking hierarchical quantification of the confidence level of multivariate predictions, and failing to establish a dynamic mapping between the fluctuation range of uncertainty and the weights of each objective. When the battery's state of charge approaches the safety boundary or its health deteriorates, fixed weights cannot promptly strengthen the protection of battery life, thus restricting the long-term safe operation of energy storage assets.

[0005] In summary, existing energy storage dispatch strategies neither quantify prediction uncertainty hierarchically nor adapt the objective weights to it, and they urgently need improvement.

Summary of the Invention

[0006] In view of the shortcomings described in the background art above, the purpose of this invention is to provide a method, apparatus and computer-readable storage medium for generating energy storage scheduling strategies based on a multi-objective optimization algorithm.

[0007] The first aspect of the present invention provides a method for generating energy storage scheduling strategies based on a multi-objective optimization algorithm, the method comprising the following steps:
performing multi-level uncertainty quantification on the predicted values of photovoltaic power, load, electricity price and battery health status to obtain the probability distribution or interval of each variable, and constructing a distribution-sensitive multi-objective function in the form of the expected value and risk of each objective, the objectives including revenue, battery life, photovoltaic consumption and demand control;
generating dynamic weight coefficients for each objective online through Bayesian optimization, based on the fluctuation amplitude and confidence level of the probability distribution or interval, as well as the current state of charge and health status of the battery pack;
perceiving the internal state of the battery pack through a graph neural network, based on the uncertainty quantification results of the battery pack health status and the real-time operating data of the battery pack, to obtain a set of coupling constraints, the set including a lower limit on the photovoltaic absorption rate and a power limit based on the battery pack health status;
solving, at a global scheduling layer, a charging and discharging plan for a future preset period as a benchmark, using the distribution-sensitive multi-objective function, the dynamic weight coefficients and the set of coupling constraints;
at a local scheduling layer, taking the benchmark as a reference, inputting the dynamic weight coefficients and real-time operating data into a reinforcement learning policy network, and generating and executing minute-level charging and discharging actions under the set of coupling constraints.

[0008] A second aspect of the present invention provides an energy storage scheduling strategy generation device based on a multi-objective optimization algorithm, the device comprising modules that respectively perform the following:
multi-level uncertainty quantification on the predicted values of photovoltaic power, load, electricity price and battery health status to obtain the probability distribution or interval of each variable, and construction of a distribution-sensitive multi-objective function in the form of the expected value and risk of each objective, the objectives including revenue, battery life, photovoltaic consumption and demand control;
online generation of dynamic weight coefficients for each objective through Bayesian optimization, based on the fluctuation amplitude and confidence level of the probability distribution or interval, as well as the current state of charge and health status of the battery pack;
perception of the internal state of the battery pack through a graph neural network, based on the uncertainty quantification results of the battery pack health status and the real-time operating data of the battery pack, to obtain a set of coupling constraints, the set including a lower limit on the photovoltaic absorption rate and a power limit based on the battery pack health status;
solution, at a global scheduling layer, of a charging and discharging plan for a future preset period as a benchmark, using the distribution-sensitive multi-objective function, the dynamic weight coefficients and the set of coupling constraints;
at a local scheduling layer, with the benchmark as a reference, input of the dynamic weight coefficients and real-time operating data into a reinforcement learning policy network, and generation and execution of minute-level charging and discharging actions under the set of coupling constraints.

[0009] A third aspect of the present invention provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the method as described in any of the preceding claims.

[0010] Compared with the prior art, the present invention has at least the following beneficial technical effects: this invention combines multi-layer uncertainty quantification, dynamic weight adaptation, multi-objective coupling constraints, and reinforcement learning strategy networks to enable energy storage scheduling strategies to simultaneously consider multiple objectives such as economic benefits, battery life, local photovoltaic consumption, and demand control. Under highly uncertain and nonlinearly coupled operating environments, it achieves real-time, adaptive, and high-precision charge and discharge control, effectively improving photovoltaic consumption rate, reducing curtailment, extending battery life, and significantly enhancing the overall economy and safety of energy storage systems.

Attached Figure Description

[0011] Figure 1 is a flowchart illustrating a method for generating energy storage scheduling strategies based on a multi-objective optimization algorithm, as disclosed in an embodiment of the present invention.

[0012] Figure 2 is a schematic diagram illustrating the determination of the worst-performing individual cell through a message passing mechanism and an attention mechanism, as disclosed in an embodiment of the present invention.

[0013] Figure 3 is a schematic diagram of the structure of an energy storage scheduling strategy generation device based on a multi-objective optimization algorithm, as disclosed in an embodiment of the present invention.

Detailed Implementation

[0014] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this disclosure are described in detail below with reference to specific embodiments. It should be understood that the specific embodiments described herein are for illustration and explanation only and are not intended to limit this disclosure.

[0015] It is understood that the energy storage scheduling strategy generation method based on multi-objective optimization algorithm described in this embodiment can be deployed in the local controller, edge computing gateway or cloud energy management platform of the energy storage system. It is used to generate a charging and discharging scheduling strategy that takes into account revenue, battery life, photovoltaic consumption and demand control in a dynamic environment with high uncertainty and nonlinear coupling of multiple variables such as photovoltaic power generation, load fluctuation, real-time electricity price and battery health status.

[0016] To address the above technical issues, and referring to Figure 1, this invention provides a method for generating energy storage scheduling strategies based on a multi-objective optimization algorithm, comprising the following steps:

S1, perform multi-level uncertainty quantification on the predicted values of photovoltaic power, load, electricity price and battery health status to obtain the probability distribution or interval of each variable, and construct a distribution-sensitive multi-objective function in the form of the expected value and risk of each objective. The objectives include revenue, battery life, photovoltaic consumption and demand control.

Traditional energy storage dispatch methods typically rely on deterministic point predictions, meaning they output only a single predicted value for key variables such as photovoltaic power, load, and electricity price as input to the optimization model. However, in actual operation, photovoltaic output is significantly affected by weather conditions, load fluctuations are random, real-time electricity prices are driven by market supply and demand dynamics, and battery health status varies slowly with charging and discharging conditions and is difficult to measure accurately. The prediction errors for these variables are objectively present and cannot be ignored. When the prediction deviation is large, the charging and discharging strategies generated from deterministic predictions may lead to actual returns far lower than expected, insufficient photovoltaic power absorption resulting in curtailment, or accelerated battery lifespan degradation due to overcharging and over-discharging.

[0017] To address the aforementioned issues, this invention employs multi-layer uncertainty quantification to quantify the predicted values of photovoltaic power, load, electricity price, and battery health status, obtaining the probability distribution or interval for each variable. Multi-layer uncertainty quantification refers to modeling uncertainty using a hierarchical and categorized approach, addressing the differences in the distribution characteristics of prediction errors for different variables at different time scales, rather than applying a uniform uncertainty set or preset probability distribution to all variables. The probability distribution or interval means that, depending on the variable characteristics and application requirements, the uncertainty quantification results can be expressed as a complete probability density function, or as a confidence interval or prediction interval, making this invention more widely applicable in practical deployments.

[0018] Specifically, on a day-ahead or longer forecasting scale, electricity price forecasting errors can usually be characterized as a normal distribution, log-normal distribution, or Student's t-distribution by statistically testing and fitting the historical forecasting error series.

[0019] For photovoltaic power forecasting, weather type (such as sunny, cloudy, overcast, and rainy) can be used as a classification condition to establish the conditional probability distribution of forecast error under different weather types. This is because the fluctuation characteristics of photovoltaic output differ significantly under different weather conditions. The forecast accuracy is usually higher and the error distribution is more concentrated under sunny conditions, while the irradiance changes drastically under cloudy or rainy conditions, and the variance of the forecast error increases significantly.

[0020] For load demand, time-segmented conditional probability distributions can be established by combining factors such as weekday / holiday type and time period. In scenarios where the amount of data is limited or it is difficult to reliably estimate the complete probability distribution, the above variables can also be output in the form of prediction intervals, such as directly giving the upper and lower bounds of the prediction at 80% or 90% confidence levels.
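The conditional prediction intervals described above can be illustrated with a minimal sketch. It assumes that historical forecast errors are available as a plain list labelled by a condition (weather type here), and it uses empirical quantiles rather than a fitted distribution. The function names, the sample errors and the 100 kW point forecast are hypothetical and not taken from the patent.

```python
from collections import defaultdict


def empirical_quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list, 0 <= q <= 1."""
    if not sorted_vals:
        raise ValueError("no samples")
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def conditional_intervals(errors, conditions, confidence=0.8):
    """Group forecast errors by condition and return, per condition, the
    central interval that covers `confidence` of the observed errors."""
    groups = defaultdict(list)
    for err, cond in zip(errors, conditions):
        groups[cond].append(err)
    tail = (1 - confidence) / 2
    result = {}
    for cond, vals in groups.items():
        vals.sort()
        result[cond] = (empirical_quantile(vals, tail),
                        empirical_quantile(vals, 1 - tail))
    return result


# Hypothetical historical PV forecast errors (kW), labelled by weather type.
errors = [-2, -1, 0, 1, 2, -20, -10, 0, 10, 20]
conditions = ["sunny"] * 5 + ["cloudy"] * 5
intervals = conditional_intervals(errors, conditions, confidence=0.8)

# Shift the error interval by a (hypothetical) point forecast to obtain
# a prediction interval for PV power under each weather type.
point_forecast = 100.0
pv_interval = {c: (point_forecast + lo, point_forecast + hi)
               for c, (lo, hi) in intervals.items()}
```

The same grouping key could be a (day type, time period) pair for load demand. With more data, a parametric fit per group could replace the empirical quantiles.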

[0021] For battery health status, since it changes slowly and cannot be directly measured, it can be estimated online using a state estimation algorithm. The estimation result itself can be expressed as a posterior probability distribution or an estimate with a confidence interval.
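As one possible illustration of a state estimate "with a confidence interval", the sketch below applies a scalar Kalman-style Bayesian update to a state-of-health (SOH) estimate. The patent does not name a specific estimation algorithm. The Gaussian assumption, the function names and all numeric values are assumptions made for illustration only.

```python
import math


def kalman_update(prior_mean, prior_var, measurement, meas_var, process_var=0.0):
    """One scalar Bayesian update under Gaussian assumptions.
    Returns the posterior mean and variance of the state."""
    pred_var = prior_var + process_var          # slow drift between updates
    gain = pred_var / (pred_var + meas_var)     # weight given to the measurement
    post_mean = prior_mean + gain * (measurement - prior_mean)
    post_var = (1 - gain) * pred_var
    return post_mean, post_var


def central_interval(mean, var, z=1.645):
    """Central interval of a Gaussian; z = 1.645 gives roughly 90% coverage."""
    half = z * math.sqrt(var)
    return mean - half, mean + half


# Hypothetical values: prior SOH 0.92 (std 0.02), noisy capacity-based
# reading 0.90 (std 0.03).
soh_mean, soh_var = kalman_update(0.92, 0.02 ** 2, 0.90, 0.03 ** 2)
soh_interval = central_interval(soh_mean, soh_var)
```

The posterior variance produced this way is the kind of quantity that can later serve as a per-cell uncertainty feature.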

[0022] After obtaining the probability distributions or intervals of each variable, a distribution-sensitive multi-objective function is further constructed in the form of the expected value and risk of each objective. Distribution sensitivity means that the construction of this multi-objective function directly depends on the aforementioned probability distributions or intervals. When the input probability distributions or intervals change, even if the predicted means of each variable remain unchanged, the value of the multi-objective function will change accordingly, thus enabling the optimization process to perceive changes in the uncertain environment.

[0023] Let the multi-objective function vector be $\mathbf{F} = (F_1, F_2, F_3, F_4)$, whose components correspond to the four objectives: revenue, battery life, photovoltaic consumption, and demand control. Unlike deterministic multi-objective functions, which consider only the expected value of each objective, the distribution-sensitive multi-objective function builds each objective $F_i$ from two terms:

$$F_i = E_i + \lambda_i R_i$$

[0024] where $E_i$ is the expected term of the objective, calculated from the previously obtained probability distribution or interval; $R_i$ is the risk measure, likewise driven by that probability distribution or interval; and $\lambda_i$ is the risk preference coefficient, used to adjust the relative proportion of expectation and risk in the corresponding objective.

[0025] Specifically, for the revenue objective, the expected term is the anticipated revenue within the dispatch cycle, calculated from the joint probability distribution of electricity price, photovoltaic power, and load. The risk term can use downside risk or conditional value at risk (CVaR), extracting the degree of revenue loss under adverse scenarios from the left tail of the probability distribution. For the battery life objective, the expected term is the expected lifespan loss derived from the relationship between charge/discharge depth and cycle count. The risk term captures the potential loss from accelerated degradation under extreme charge/discharge depths or high-temperature conditions, as characterized by the tail of the probability distribution. For the photovoltaic consumption objective, the expected term is the expected amount of photovoltaic energy consumed under the predicted photovoltaic output. The risk term reflects the probability and degree of insufficient consumption derived from the photovoltaic output prediction distribution. For the demand control objective, the expected term is the expected demand, and the risk term is the probability and magnitude of demand exceeding its limit, derived from the load prediction distribution.

[0026] It should be understood that by simultaneously considering both expectation and risk as described above, the value of each objective depends not only on the central tendency of the predicted variables but also on the extreme-scenario risk implied by the shape of their distributions. When the volatility of the probability distribution increases or its tails thicken, the risk term increases even if the expected term remains unchanged, so the multi-objective function automatically reflects changes in the uncertain environment.
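The distribution sensitivity described above can be checked numerically with a small sketch. It assumes a cost-type objective (lower is better), represents the distribution by equally likely scenarios, and uses conditional value at risk (CVaR) as the risk measure. The scenario values, the risk coefficient of 0.5 and the function names are hypothetical.

```python
def cvar(losses, alpha=0.8):
    """Average of the worst (1 - alpha) share of scenario losses
    (at least one scenario is always included)."""
    ordered = sorted(losses, reverse=True)
    k = max(1, round(len(ordered) * (1 - alpha)))
    return sum(ordered[:k]) / k


def distribution_sensitive_objective(losses, risk_coeff, alpha=0.8):
    """Expected term plus risk-preference coefficient times the risk term."""
    expected = sum(losses) / len(losses)
    return expected + risk_coeff * cvar(losses, alpha)


# Two hypothetical scenario sets with the same mean but different spread.
narrow = [9, 10, 10, 10, 10, 10, 10, 10, 10, 11]
wide = [0, 5, 8, 10, 10, 10, 10, 12, 15, 20]

obj_narrow = distribution_sensitive_objective(narrow, risk_coeff=0.5)
obj_wide = distribution_sensitive_objective(wide, risk_coeff=0.5)
```

Both scenario sets have the same expected loss, yet the wider one scores worse because its tail is heavier. A deterministic objective built on the mean alone would not distinguish them.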

[0027] S2, based on the fluctuation amplitude and confidence level of the probability distribution or interval, as well as the current state of charge and health status of the battery pack, generate dynamic weight coefficients for each objective online through Bayesian optimization.

Traditional energy storage dispatching methods typically fix the weight of each objective in advance once the multi-objective function has been constructed, so the relative importance of revenue, battery life, photovoltaic (PV) consumption, and demand control remains constant during operation. In actual operation, however, the priorities of the dispatching strategy should be adjusted dynamically according to environmental conditions. For example, when the confidence level of electricity price forecasts is high and the fluctuation amplitude is small, the focus can be on maximizing revenue. When the confidence level is low or the fluctuations are severe, the forecast becomes less reliable; if a high weight is still assigned to revenue, a large deviation between the actual and predicted values will leave actual revenue far below expectations. When the state of charge of the battery pack approaches the safety boundary or its health deteriorates, continuing high-power charging and discharging in pursuit of revenue will accelerate battery degradation and may even create safety risks. Fixed-weight strategies therefore cannot rebalance priorities among objectives as the environment changes in real time, which limits the adaptability of dispatching strategies in complex and uncertain environments.

[0028] To address the aforementioned issues, this invention utilizes the fluctuation amplitude and confidence level of the obtained probability distribution or interval, as well as the current state of charge and health of the battery pack, to generate dynamic weight coefficients for each objective online through Bayesian optimization.

[0029] Bayesian optimization is a global optimization method applicable to complex black-box functions. Its Bayesian nature is reflected in the following two core mechanisms: (1) a probabilistic surrogate model (such as a Gaussian process) is used to model the objective function. This model outputs not only the predicted value but also the prediction uncertainty, expressing what is known about the unknown function in the form of a probability distribution; (2) in each iteration, the posterior distribution of the surrogate model is updated through Bayesian inference using new observation data, so that the model's approximation of the objective function is gradually refined. On this basis, an acquisition function guided by the posterior distribution drives sequential decision-making, so that the optimal solution is found efficiently within a limited number of online evaluations.

[0030] The fluctuation amplitude refers to the width of the probability distribution or interval obtained above and measures the degree of uncertainty of the current predicted value. The larger the fluctuation amplitude, the less accurate the prediction of the variable. The confidence level refers to the confidence level associated with the prediction interval or probability distribution. The lower the confidence level, the less certain the prediction model is about the current prediction.

[0031] In terms of adjustment mechanisms, the fluctuation amplitude and confidence level influence the weight of the revenue objective. Specifically, when the probability distribution or interval of electricity price or photovoltaic power fluctuates widely, or when the prediction confidence level is low, the current forecast of electricity price trends or photovoltaic output is less reliable. This increases the uncertainty of expected revenue, and the relative weight of the revenue objective in the overall optimization should be reduced appropriately to make the dispatch strategy more conservative and avoid large deviation losses caused by excessive profit-seeking. Conversely, when the fluctuation amplitude is small and the confidence level is high, the prediction is reliable, and the weight of the revenue objective can be relaxed appropriately to obtain greater economic benefits.

[0032] The current state of charge (SOC) and health status of the battery pack influence the adjustment of the weighting of battery life targets. Specifically, when the SOC deviates further from the preset safe range or the battery health status is lower, it indicates a higher operational risk or some degradation of the battery pack. In this case, the relative weight of the battery life target in the overall optimization should be appropriately increased to make the scheduling strategy more conservative in terms of charging and discharging power and depth, strengthen battery protection, and delay battery life loss. Conversely, when the SOC is in the middle of the safe range and the health status is good, the battery operational risk is low, and the relative weight of the battery life target can be appropriately reduced to free up more optimization space for other targets.

[0033] As an example of how to achieve the above-mentioned dynamic weight generation, the dynamic weight coefficients of each objective are generated online through Bayesian optimization, which may specifically include the following steps S21-S22.

[0034] S21, construct a Gaussian process surrogate model with each objective weight as a decision variable, determine the upper limit of the revenue weight search space based on the fluctuation amplitude and confidence level of the probability distribution or interval, and determine the lower limit of the lifetime weight search space based on the current state of charge and health status of the battery pack.

In practical implementation, the weights of the revenue objective, battery life objective, photovoltaic consumption objective, and demand control objective are used as decision variables to construct a Gaussian process surrogate model. The Gaussian process is the most classic probabilistic surrogate model in Bayesian optimization; it is a Bayesian prior defined on a function space. Unlike deterministic models such as neural networks, the Gaussian process outputs not only the predicted mean but also the predicted variance for any input, which quantifies, within the Bayesian framework, how uncertain the model is about unexplored regions.

[0035] Suppose there are $N$ objectives with weight vector $\mathbf{w} = (w_1, \dots, w_N)$, satisfying $\sum_{i=1}^{N} w_i = 1$. With the weight vector $\mathbf{w}$ as input and the weighted overall deviation $f(\mathbf{w})$ observed after the scheduling strategy is executed under this weight combination as output, a Gaussian process surrogate model is constructed:

$$f(\mathbf{w}) \sim \mathcal{GP}\big(m(\mathbf{w}),\, k(\mathbf{w}, \mathbf{w}')\big)$$

[0036] where $m(\mathbf{w})$ is the mean function and $k(\mathbf{w}, \mathbf{w}')$ is the covariance kernel function, which characterizes the correlation between different points in the weight space. The Gaussian process surrogate model uses the weighted deviation between the actual and ideal values of each objective within a historical scheduling period as its objective function.

[0037] For any unsampled weight combination $\mathbf{w}^*$, the model outputs the predicted mean $\mu(\mathbf{w}^*)$ and the predicted variance $\sigma^2(\mathbf{w}^*)$, which together constitute the posterior probability estimate of $f(\mathbf{w}^*)$.
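To make the predicted mean and variance concrete, the following sketch computes a Gaussian process posterior over weight vectors in pure Python. It assumes a squared-exponential (RBF) kernel, a constant mean function equal to the average of the observations, and a small noise term for numerical stability. These choices, the length scale and the three sample observations are illustrative assumptions, since the patent text does not fix them.

```python
import math


def rbf(a, b, length=0.3, var=1.0):
    """Squared-exponential kernel between two weight vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return var * math.exp(-0.5 * d2 / length ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def gp_posterior(X, y, x_new, noise=1e-6):
    """Posterior mean and variance at x_new, given evaluated weight
    vectors X and their observed weighted deviations y."""
    m = sum(y) / len(y)  # constant mean function (assumption)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k_star = [rbf(a, x_new) for a in X]
    alpha = solve(K, [yi - m for yi in y])
    mean = m + sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)
    var = rbf(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, max(var, 0.0)


# Hypothetical evaluated weight combinations (revenue, life, PV, demand)
# and the weighted deviation observed for each.
X = [(0.4, 0.2, 0.2, 0.2), (0.25, 0.25, 0.25, 0.25), (0.1, 0.5, 0.2, 0.2)]
y = [0.8, 0.5, 0.9]

mean_seen, var_seen = gp_posterior(X, y, X[1])
mean_far, var_far = gp_posterior(X, y, (0.0, 0.0, 0.0, 1.0))
```

At an already evaluated point the posterior reproduces the observation with near-zero variance, while far from the data the variance returns toward the prior. An acquisition function exploits exactly this behaviour.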

[0038] When determining the search space for the revenue weight, fixed upper and lower bounds are not used; instead, the bounds are set dynamically according to the current environmental state. Specifically, for the revenue weight search space, the fluctuation amplitude and confidence level of the probability distribution or interval serve as input conditions, and the upper limit of the search space for the revenue objective weight under these environmental constraints is determined by a rule built from the following quantities:

[0039] the normalized fluctuation-amplitude indicators of the probability distribution or interval for photovoltaic power and electricity price, respectively; the normalized confidence indices of the photovoltaic power and electricity price forecasts, respectively; a base scaling factor; and a set of sensitivity adjustment coefficients. When the fluctuation amplitude increases or the confidence level decreases, the exponential term in the rule decreases, so the upper limit of the revenue weight search space is reduced accordingly, which limits the strategy's profit-seeking tendency at the source of the search.

[0040] For the lifetime weight search space, the current state of charge and health status of the battery pack serve as input conditions, and the lower limit of the search space for the battery lifetime objective weight under these environmental constraints is determined by a rule built from the following quantities:

[0041] a state-of-charge deviation function, which takes its smallest value when the state of charge (SOC) is in the middle of the preset safe range and increases the closer the SOC is to the overcharge or over-discharge boundary; a normalized value of the battery state of health (SOH), ranging over [0, 1]; and two weighting coefficients. When the SOC deviates from the safe range or the SOH decreases, the corresponding terms increase, so the lower limit of the lifetime weight search space is raised accordingly.

[0042] It should be understood that, through the above method, the four input conditions—fluctuation amplitude, confidence level, state of charge, and health status—impose dynamic constraints on the search space from two directions: the upper limit of the return weight and the lower limit of the lifetime weight, respectively, so that the shape and position of the search space can be adaptively adjusted according to the environment.
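The two bounding rules can be sketched as follows. The exact formulas are not reproduced in this text, so the functional forms here are assumptions chosen only to match the described behaviour: an exponential decay for the revenue weight upper limit, and a weighted sum of SOC deviation and health loss for the lifetime weight lower limit. All parameter names and default values are hypothetical.

```python
import math


def revenue_weight_upper(v_pv, v_price, c_pv, c_price,
                         base=0.7, k=(1.0, 1.0, 1.0, 1.0)):
    """Assumed form: base * exp(-(k1*v_pv + k2*v_price
    + k3*(1 - c_pv) + k4*(1 - c_price))).
    v_* are normalized fluctuation indicators, c_* normalized confidences."""
    exponent = -(k[0] * v_pv + k[1] * v_price
                 + k[2] * (1.0 - c_pv) + k[3] * (1.0 - c_price))
    return base * math.exp(exponent)


def soc_deviation(soc, soc_min=0.2, soc_max=0.8):
    """0 at the middle of the safe range, rising to 1 at (or beyond) its edges."""
    mid = (soc_min + soc_max) / 2
    half = (soc_max - soc_min) / 2
    return min(1.0, abs(soc - mid) / half)


def lifetime_weight_lower(soc, soh, a_soc=0.2, a_soh=0.3):
    """Assumed form: a_soc * g(SOC) + a_soh * (1 - SOH), capped at 1."""
    return min(1.0, a_soc * soc_deviation(soc) + a_soh * (1.0 - soh))


# Calm, well-predicted conditions versus volatile, poorly predicted ones.
upper_calm = revenue_weight_upper(0.1, 0.1, 0.9, 0.9)
upper_volatile = revenue_weight_upper(0.6, 0.6, 0.5, 0.5)

# Healthy mid-range battery versus a degraded pack near the SOC boundary.
lower_healthy = lifetime_weight_lower(soc=0.5, soh=1.0)
lower_stressed = lifetime_weight_lower(soc=0.78, soh=0.8)
```

In a full implementation the resulting bounds would also need to be checked for feasibility against the normalization of the weight vector, for example by verifying that the lower limits do not sum to more than one.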

[0043] S22, under the constraints of the revenue weight search space and the lifetime weight search space, an acquisition function is constructed from the predicted mean and predicted variance output by the Gaussian process surrogate model. Candidate weight combinations are iteratively selected and evaluated using the acquisition function. The Gaussian process surrogate model is updated according to the evaluation results until convergence, and the dynamic weight coefficients of each objective in the current scheduling cycle are generated online.

[0044] In practice, under the aforementioned search space constraints, Bayesian optimization enters the sequential iteration phase. First, several sampling points are initialized within the constrained search space, i.e., several candidate weight combinations are selected. The weighted deviation value corresponding to each combination is then obtained through actual evaluation or simulation. These observations are used to complete the initial fitting of the Gaussian process surrogate model.

[0045] The iteration process then begins. In the $t$-th iteration, an acquisition function is constructed from the predicted mean $\mu_t(\mathbf{w})$ and predicted variance $\sigma_t^2(\mathbf{w})$ of the posterior distribution provided by the current surrogate model, and the candidate weight combination with the highest acquisition value within the constrained space is selected as the next point to evaluate. Take the commonly used Expected Improvement (EI) acquisition function as an example:

$$\mathrm{EI}(\mathbf{w}) = \big(f_{\min} - \mu_t(\mathbf{w}) - \xi\big)\,\Phi(Z) + \sigma_t(\mathbf{w})\,\varphi(Z), \qquad Z = \frac{f_{\min} - \mu_t(\mathbf{w}) - \xi}{\sigma_t(\mathbf{w})}$$

[0046] where $f_{\min}$ is the optimal (minimum) deviation value among the currently evaluated samples; $\xi$ is an exploration coefficient used to adjust the exploration tendency; and $\Phi$ and $\varphi$ are the cumulative distribution function and probability density function of the standard normal distribution, respectively.

[0047] It should be understood that, guided by the Bayesian posterior distribution, this acquisition function automatically balances exploitation and exploration: the former tends to select regions with a lower predicted mean (i.e., regions that the model considers to perform well), while the latter tends to select regions with a larger predicted variance (i.e., regions where the model is more uncertain), thereby balancing fine-grained search of known high-value regions against exploration of unknown regions.
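A minimal sketch of the Expected Improvement acquisition function for a minimization problem is given below, using only the standard library. It follows the usual textbook form of EI. The argument names and the default exploration coefficient are illustrative assumptions.

```python
import math


def norm_pdf(z):
    """Probability density function of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def norm_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def expected_improvement(mu, sigma, best, xi=0.01):
    """EI for minimization: mu and sigma are the surrogate's predicted mean
    and standard deviation at a candidate, best is the smallest deviation
    observed so far, xi is the exploration coefficient."""
    improvement = best - mu - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * norm_cdf(z) + sigma * norm_pdf(z)


# Exploitation: at equal uncertainty, a lower predicted mean scores higher.
ei_low_mean = expected_improvement(mu=0.45, sigma=0.1, best=0.5)
ei_high_mean = expected_improvement(mu=0.60, sigma=0.1, best=0.5)

# Exploration: at equal predicted mean, a larger uncertainty scores higher.
ei_certain = expected_improvement(mu=0.55, sigma=0.05, best=0.5)
ei_uncertain = expected_improvement(mu=0.55, sigma=0.30, best=0.5)
```

In the constrained setting described here, candidates would be drawn only from weight combinations that respect the revenue upper limit and the lifetime lower limit, and the one with the largest EI would be evaluated next.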

[0048] After the new candidate weight combination is evaluated, its observed result is fed back to the Gaussian process surrogate model as new evidence. It should be understood that this feedback update is Bayesian posterior inference: the observed data from the sampling points act as the likelihood and are combined with the Gaussian process prior through Bayes' theorem to obtain a new posterior distribution, so that the surrogate model's approximation of the objective function becomes progressively more accurate in key regions of the weight space.

[0049] The above iterative process continues until a preset convergence condition is met, such as the change in the optimal weight combination between adjacent iterations being less than a preset threshold, or the maximum number of iterations being reached. After convergence, the Bayesian optimization outputs the optimal dynamic weight coefficients for each objective in the current scheduling cycle. The process is executed entirely online and can adaptively generate weight combinations suitable for the current conditions based on the real-time operating environment.

[0050] S3, based on the uncertainty quantification results of the battery pack health status and the real-time operating data of the battery pack, perceive the internal state of the battery pack through a graph neural network to obtain a set of coupling constraints. The set of coupling constraints includes the lower limit of the photovoltaic absorption rate and the power limit based on the battery pack health status.

In actual operation, an energy storage battery pack consists of multiple individual cells connected in series and parallel. Due to differences in manufacturing processes, uneven temperature distribution, and varying charge-discharge cycle counts, the health status of the individual cells is inconsistent. If the battery pack is treated as an ideal whole and the additional constraints caused by cell-to-cell differences are ignored, some cells may be overcharged, over-discharged, or locally overheated, accelerating the lifespan degradation of the entire pack and even causing safety hazards.

[0051] Traditional methods typically treat battery packs as equivalent to ideal voltage sources or simple first-order models, focusing only on the average state of charge and average state of health of the entire pack. They fail to recognize the bottleneck effect at the individual cell level; that is, the worst-performing cell often determines the safe operating boundary of the entire pack. Furthermore, after the aforementioned uncertainty quantification of battery health status, the health status estimation results for each individual cell inherently contain some uncertainty. How to effectively integrate this individual-level uncertainty during the coupling constraint generation process is a problem that traditional methods have not yet solved.

[0052] To address the aforementioned issues, this step uses a graph neural network to perceive the internal state of the battery pack. Graph neural networks are naturally suited to systems with a well-defined topology, such as battery packs: the electrical connections between cells form a graph structure, the state parameters and uncertainties of each cell serve as node features, and the electrical connections serve as edge features. For example, as shown in Figure 2, through the message passing mechanism the graph neural network propagates state information between neighboring nodes, so that the representation of each node depends not only on its own features but also on information from its neighboring nodes. On this basis, an attention mechanism is used to automatically identify and focus on the worst-performing nodes, namely those whose health has deteriorated significantly and which significantly restrict the overall power capability of the pack.

[0053] As an example, the internal state of the battery pack can be perceived through a graph neural network to obtain a set of coupling constraints, which may specifically include the following steps S31-S32.

[0054] S31. A graph structure is constructed with each individual cell in the battery pack as a graph node and the electrical connection relationships as edges. The node features include the uncertainty quantification parameters of the health status of the corresponding cell and its real-time operating data. Message passing is performed by the graph neural network, and an attention mechanism aggregates the suppression effect of the health status uncertainty of the individual cells on the power capability of the whole pack, generating the battery pack aggregated state vector. In practice, the battery pack is first modeled as a graph, which can be represented as $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges. Each individual cell is treated as a graph node. The edges between nodes are determined by the series and parallel electrical topology of the battery pack: if there is a direct electrical connection between two cells, an edge is established between the corresponding nodes.

[0055] The feature vector of each node is the concatenation of two parts: (1) the uncertainty quantification parameters of the health status of the cell. Because the health status estimate is itself uncertain, it can be expressed as a posterior probability distribution, and the feature vector contains the mean and variance of that posterior distribution. (2) the real-time operating data of the cell, including real-time voltage, current, temperature, and state of charge.
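The graph construction can be illustrated with a small sketch. It assumes a pack laid out as `n_series` groups in series, each containing `n_parallel` cells in parallel, and treats cells as directly connected when they share a parallel group or sit in adjacent series groups. That layout, the `CellState` container, and the function names are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class CellState:
    soh_mean: float      # posterior mean of the health status estimate
    soh_var: float       # posterior variance of the health status estimate
    voltage: float       # real-time voltage
    current: float       # real-time current
    temperature: float   # real-time temperature
    soc: float           # real-time state of charge

def node_features(cell):
    """Concatenate the uncertainty parameters and the real-time operating data."""
    return [cell.soh_mean, cell.soh_var,
            cell.voltage, cell.current, cell.temperature, cell.soc]

def build_pack_graph(n_series, n_parallel):
    """Return neighbor lists; the cell index is group * n_parallel + position."""
    n = n_series * n_parallel
    nbrs = {i: set() for i in range(n)}

    def link(a, b):
        nbrs[a].add(b)
        nbrs[b].add(a)

    for g in range(n_series):
        group = [g * n_parallel + p for p in range(n_parallel)]
        # Cells in the same parallel group share both terminals.
        for a in group:
            for b in group:
                if a < b:
                    link(a, b)
        # Adjacent series groups share a connecting bus (assumed layout).
        if g + 1 < n_series:
            for a in group:
                for p in range(n_parallel):
                    link(a, (g + 1) * n_parallel + p)
    return {i: sorted(s) for i, s in nbrs.items()}
```

For a pure series string, `build_pack_graph(3, 1)` yields a chain in which the middle cell has two neighbors and the end cells have one each.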

[0056] After the graph structure is constructed, it is input into the graph neural network for message passing. During message passing at each layer, each node aggregates the messages from its neighboring nodes and integrates them with its own features to update its representation.

[0057] The update involves the feature vector of each node at the current layer, a transformation matrix for the node's own features, a transformation matrix for neighbor messages, the set of neighboring nodes of the node, and a nonlinear activation function. Through multiple rounds of message passing, the receptive field of each node's representation keeps expanding, enabling it to perceive state information across the global electrical topology.
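One layer of such message passing can be sketched as below. The update formula is not reproduced in this text, so the sketch assumes sum aggregation over neighbors and a ReLU activation; `W_self` and `W_nbr` stand for the two transformation matrices and would be learned in practice rather than set by hand. All names are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def message_passing_layer(h, nbrs, W_self, W_nbr):
    """Update every node from its own features plus the sum of its neighbors'."""
    out = []
    for i, h_i in enumerate(h):
        msg = [0.0] * len(h_i)
        for j in nbrs[i]:
            msg = [m + x for m, x in zip(msg, h[j])]   # sum aggregation (assumed)
        own = matvec(W_self, h_i)
        agg = matvec(W_nbr, msg)
        out.append(relu([a + b for a, b in zip(own, agg)]))
    return out

# Chain of three cells; only cell 2 starts with a non-zero feature.
nbrs = {0: [1], 1: [0, 2], 2: [1]}
h0 = [[0.0], [0.0], [1.0]]
identity = [[1.0]]
h1 = message_passing_layer(h0, nbrs, identity, identity)
h2 = message_passing_layer(h1, nbrs, identity, identity)
```

After one layer the information from cell 2 has reached only cell 1; after two layers it also reaches cell 0, which is the receptive-field growth described above.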

[0058] After several layers of message passing, an attention mechanism is used to perform a weighted aggregation of the individual cell nodes. The attention mechanism is designed to automatically identify the worst-performing cells, namely those with high health status uncertainty and a significant restricting effect on the overall power capability of the pack, and to assign them higher attention weights.

[0059] The attention weight of each node is computed from the node feature vector output by the last layer, using a query transformation matrix, a bias term, and an attention scoring vector. The resulting weights reflect the varying degrees of influence of each individual cell on the overall state of the pack. For example, when the health status of a particular cell is significantly lower than the pack average, or when the variance of its health status estimate is large, the cell is more likely to reach the safety boundary during charging and discharging, and its attention weight is correspondingly high.

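[0060] Based on the aforementioned attention weights, the features of all nodes are summed in a weighted manner to generate the battery pack aggregated state vector: $z = \sum_{i=1}^{N} \alpha_i h_i$, where $N$ is the number of cell nodes, $\alpha_i$ is the attention weight of cell node $i$, and $h_i$ is its feature vector output by the last layer.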

[0061] It should be understood that the aggregated state vector not only contains the average state information of the entire battery pack, but more importantly, it embeds the bottleneck constraints caused by the inconsistency of individual cells and the risk information caused by the uncertainty of health state estimation through attention weighting.
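The attention-weighted aggregation can be sketched as follows. The scoring formula is not reproduced in this text, so the sketch assumes additive attention with a `tanh` nonlinearity followed by a softmax over cells. `W_q`, `b`, and `v` stand for the query transformation matrix, bias term, and attention scoring vector; they are set by hand here purely for illustration and would be learned in practice.

```python
import math

def attention_pool(h, W_q, b, v):
    """Return (attention weights, aggregated state vector) for node features h."""
    scores = []
    for h_i in h:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h_i)) + b_k)
                  for row, b_k in zip(W_q, b)]
        scores.append(sum(v_k * u for v_k, u in zip(v, hidden)))
    # Numerically stable softmax over the cell scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    dim = len(h[0])
    z = [sum(a * h_i[d] for a, h_i in zip(alpha, h)) for d in range(dim)]
    return alpha, z

# Hypothetical last-layer features: [health deficit, health estimate variance].
h_last = [[0.05, 0.01], [0.30, 0.05], [0.08, 0.01]]
alpha, z = attention_pool(h_last,
                          W_q=[[1.0, 0.0], [0.0, 1.0]],
                          b=[0.0, 0.0],
                          v=[1.0, 1.0])
```

With these hand-set parameters the second cell, which has both the largest health deficit and the largest variance, receives the largest attention weight, so it dominates the aggregated vector `z`.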

[0062] S32, based on the aggregated state vector, calculate and output the coupling constraint set, which includes the lower limit of photovoltaic absorption rate and the power limit based on the health state of the battery pack.

[0063] After the aggregated state vector of the battery pack is obtained, it is transformed through a preset solution network into a coupling constraint set that can be used directly as optimization constraints. Preferably, the solution network consists of two parallel sub-networks, which output the power limit and the lower limit of the photovoltaic absorption rate, respectively.

[0064] The first subnetwork is used to generate the power limits based on the battery pack health status. From the aggregated state vector, this subnetwork extracts two sets of statistics: (1) the mean and variance of the overall health status; and (2) a weighted worst-case health status index, obtained by weighting the health status of the batch of cells with the highest attention weights. Based on these statistics, the maximum allowable charging power and discharging power are determined by the following formula:

[0065]

[0066] The formula involves the rated charging and discharging powers of the battery pack and the conservatism coefficients. It takes the smaller of two values as the power discount factor. When the cell inconsistency intensifies (the variance increases) or the health status of the worst-case cell decreases, the discount factor decreases and the power limit tightens automatically, so that both the bottleneck constraint and the estimation uncertainty are incorporated into the calculation of the power limit.
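Since the formula itself is not reproduced in this text, the following sketch shows only one form that matches the description: a discount factor taken as the smaller of a spread-penalized mean health term and a worst-case health term, applied to the rated powers. The exact terms, the use of the standard deviation, the clamping to [0, 1], and the names `k_var` and `k_worst` for the conservatism coefficients are all assumptions.

```python
import math

def power_limits(mu_soh, var_soh, soh_worst, p_ch_rated, p_dis_rated,
                 k_var=1.0, k_worst=1.0):
    """Return (max charging power, max discharging power) after discounting."""
    spread_term = mu_soh - k_var * math.sqrt(var_soh)   # penalizes inconsistency
    worst_term = k_worst * soh_worst                    # bottleneck cell
    discount = min(spread_term, worst_term)
    discount = min(1.0, max(0.0, discount))             # keep the factor in [0, 1]
    return discount * p_ch_rated, discount * p_dis_rated

# Consistent pack: the worst-case term (0.85) is the binding one.
tight_by_worst = power_limits(0.90, 0.0004, 0.85, 100.0, 120.0)
# Larger variance: the spread term (0.90 - 0.10 = 0.80) becomes binding.
tight_by_spread = power_limits(0.90, 0.01, 0.85, 100.0, 120.0)
```

In both calls the rated powers are identical; only the health statistics change, and the limits tighten as either the variance grows or the worst-case health falls.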

[0067] The second subnetwork is used to generate the lower limit of the photovoltaic absorption rate. It takes as inputs the maximum charging power output by the first subnetwork, the currently predicted photovoltaic output distribution, and the available battery capacity, and determines the lower limit by the following formula:

[0068] The formula involves the baseline lower limit of the absorption rate, the length of the scheduling time window, and the acceptable charging capacity of the battery within the current scheduling window, which is determined by the available capacity between the current state of charge and the safe upper limit. When poor battery health reduces the maximum charging power, or when the remaining chargeable capacity is insufficient, the amount of photovoltaic energy that the battery can accept is limited, and the lower limit is reduced accordingly to avoid setting a mandatory absorption target that cannot be met.
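The adjustment described here can be sketched as follows. The formula is not reproduced in this text, so the sketch assumes the lower limit is the baseline value capped by the ratio of the energy the battery can accept in the window to the expected photovoltaic energy. It ignores photovoltaic energy consumed directly by the load, and all names and units are hypothetical.

```python
def pv_absorption_lower_limit(eta_base, p_ch_max_kw, window_h, soc, soc_max,
                              capacity_kwh, pv_energy_kwh):
    """Return a lower limit of the absorption rate that the battery can meet."""
    if pv_energy_kwh <= 0.0:
        return 0.0   # no photovoltaic energy expected, the constraint is vacuous
    # Acceptable charging capacity between the current SOC and the safe upper limit.
    e_acc = max(0.0, (soc_max - soc) * capacity_kwh)
    # Energy the battery can absorb: limited by power and by remaining capacity.
    absorbable = min(p_ch_max_kw * window_h, e_acc)
    return min(eta_base, absorbable / pv_energy_kwh)

healthy = pv_absorption_lower_limit(0.9, 50.0, 1.0, 0.5, 0.9, 200.0, 40.0)
derated = pv_absorption_lower_limit(0.9, 20.0, 1.0, 0.5, 0.9, 200.0, 40.0)
nearly_full = pv_absorption_lower_limit(0.9, 50.0, 1.0, 0.88, 0.9, 200.0, 40.0)
```

Passing the output of the power-limit calculation as `p_ch_max_kw` is what couples the two constraints: a tighter power limit lowers the absorption target in the same step.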

[0069] It should be understood that the power limits and the lower limit of the photovoltaic absorption rate output by the two sub-networks are coupled through the information they share in the aggregated state vector. By sharing intermediate variables, the solution network adaptively adjusts the lower limit of the absorption rate when the power limit is tightened, thereby ensuring photovoltaic absorption as far as possible under the given battery condition without sacrificing battery safety.

[0070] S4, the global scheduling layer uses the distribution-sensitive multi-objective function, dynamic weight coefficients and coupling constraint set to solve the charging and discharging plan for the future preset time period as a benchmark; the local scheduling layer uses this benchmark as a reference, inputs the dynamic weight coefficients and real-time running data into the reinforcement learning policy network, and generates and executes minute-level charging and discharging actions under the coupling constraint set.

[0071] Energy storage dispatch faces coordination challenges on two time scales: (1) From a global economic perspective, charging and discharging must be planned on the basis of forecast information for a longer future period (such as 24 hours) to maximize revenue and balance the multiple objectives. (2) From a local real-time response perspective, photovoltaic output, load demand, and electricity prices fluctuate at the minute level, a scale that forecasts cannot cover. If the global plan is followed strictly, these fluctuations cannot be responded to in time, which may cause actual revenue to deviate from expectations or constraints to be violated. If only local real-time optimization is relied upon, the global perspective is easily lost, and short-term profit-seeking behavior may deviate from the long-term optimal direction.

[0072] To simultaneously address the needs of both scales, this invention employs a hierarchical scheduling architecture. At the global scheduling layer, a distribution-sensitive multi-objective function is used as the optimization objective, dynamically generated weight coefficients are used as the aggregation coefficients for each objective, and a set of coupled constraints are used as the constraints to solve for a charging and discharging plan for a future preset time period (e.g., 24 hours, with a time resolution of 15 minutes or 1 hour). This plan provides the target charging and discharging power curves for each time period, which are then distributed to the local scheduling layer as a benchmark. This benchmark provides macroscopic guidance for local scheduling, ensuring that real-time local adjustments do not deviate from the globally optimal direction.
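A heavily simplified sketch of a global planning step is given below. It is not the solver of the invention: it keeps only a revenue term and a throughput penalty standing in for battery life, drops the risk terms, the photovoltaic absorption constraint, and charging losses, and uses dynamic programming over a discretized energy grid. Positive power means charging. All names are hypothetical.

```python
def plan_baseline(prices, p_ch_max, p_dis_max, capacity, soc0_level,
                  w_rev, w_life, dt=1.0, levels=11):
    """Return a baseline charge(+)/discharge(-) power for each time step."""
    step = capacity / (levels - 1)          # energy between adjacent grid levels
    cost_to_go = [0.0] * levels
    choice = []
    for t in range(len(prices) - 1, -1, -1):
        new_cost = [float("inf")] * levels
        best_next = [0] * levels
        for k in range(levels):
            for k2 in range(levels):
                d_e = (k2 - k) * step       # energy change over this step
                power = d_e / dt
                if power > p_ch_max + 1e-9 or power < -p_dis_max - 1e-9:
                    continue                # violates the health-based power limit
                # Buying energy costs money, selling earns it; throughput ages the pack.
                c = w_rev * prices[t] * d_e + w_life * abs(d_e) + cost_to_go[k2]
                if c < new_cost[k]:
                    new_cost[k], best_next[k] = c, k2
        cost_to_go = new_cost
        choice.append(best_next)
    choice.reverse()
    plan, k = [], soc0_level
    for t in range(len(prices)):
        k2 = choice[t][k]
        plan.append((k2 - k) * step / dt)
        k = k2
    return plan

arbitrage = plan_baseline([1.0, 3.0], 10.0, 10.0, 10.0, 0, 1.0, 0.0, levels=3)
life_first = plan_baseline([1.0, 3.0], 10.0, 10.0, 10.0, 0, 1.0, 5.0, levels=3)
```

With a zero life weight the plan charges at the low price and discharges at the high one; with a large life weight the throughput penalty outweighs the price spread and the plan stays idle, which illustrates how the dynamic weights shift the baseline.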

[0073] At the local scheduling layer, using this benchmark as a reference boundary, the system responds in real time to minute-level actual operational fluctuations and generates charge and discharge actions that can be actually executed.

[0074] As an exemplary method for implementing local scheduling, it may specifically include the following steps S41-S42.

[0075] S41. A state vector is constructed from the real-time operating data, the dynamic weight coefficients, and the benchmark charging and discharging power corresponding to the current moment; the charging and discharging power adjustment amount is used as the action space; and the state vector is input into the trained reinforcement learning policy network, which outputs candidate actions. In specific implementation, the state vector required for reinforcement learning includes: (1) real-time operating data collected by sensors at the minute level, such as the current photovoltaic output, load power, actual electricity price, and real-time state of charge of the battery pack, which characterize the actual operating conditions at the current moment; (2) the dynamic weight coefficients of the current scheduling cycle generated online, which enable the policy network to perceive the relative priority of each objective when making decisions; and (3) the benchmark charging and discharging power corresponding to the current moment, that is, the target charging and discharging power at the current moment in the benchmark plan issued by the global scheduling layer, which provides a reference anchor point for local adjustment. For example, the state vector can be expressed as: $s_t = [P_t^{\mathrm{pv}}, P_t^{\mathrm{load}}, c_t, SOC_t, w_1, w_2, w_3, w_4, P_t^{\mathrm{ref}}]$

[0076] where $P_t^{\mathrm{pv}}$ is the current photovoltaic output; $P_t^{\mathrm{load}}$ is the current load power; $c_t$ is the current real-time electricity price; $SOC_t$ is the current state of charge of the battery pack; $w_1$ to $w_4$ are the dynamic weight coefficients for the four objectives of revenue, battery life, photovoltaic absorption, and demand control; and $P_t^{\mathrm{ref}}$ is the benchmark charging and discharging power at the current moment.

[0077] The action space is defined as the charging and discharging power adjustment amount $\Delta P_t$, that is, the increase or decrease in power relative to the benchmark charging and discharging power. The state vector $s_t$ is input into the pre-trained reinforcement learning policy network $\pi_\theta$, which outputs the candidate action: $\Delta P_t = \pi_\theta(s_t)$

[0078] It should be understood that the candidate action is the incremental charge / discharge power recommended by the policy network, which is then superimposed with the baseline charge / discharge power to obtain the candidate charge / discharge power.
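The assembly of the state vector and the superposition of the candidate action on the benchmark power can be sketched as follows. The policy is passed in as a plain callable; the `stub_policy` shown is a hypothetical placeholder, not a trained network, and the function names are assumptions.

```python
def build_state(pv_kw, load_kw, price, soc, weights, p_ref_kw):
    """Concatenate real-time data, the four dynamic weights, and the benchmark power."""
    if len(weights) != 4:
        raise ValueError("expected weights for revenue, life, PV absorption, demand")
    return [pv_kw, load_kw, price, soc, *weights, p_ref_kw]

def candidate_power(policy, state):
    """Superimpose the policy's power adjustment on the benchmark power."""
    delta = policy(state)          # recommended increase or decrease
    p_ref = state[-1]              # benchmark power is the last state entry
    return p_ref + delta

def stub_policy(state):
    """Placeholder: charge a little more when PV exceeds the load (illustrative)."""
    pv_kw, load_kw = state[0], state[1]
    return 0.1 * (pv_kw - load_kw)

state = build_state(30.0, 20.0, 0.5, 0.6, [0.4, 0.3, 0.2, 0.1], 5.0)
p_candidate = candidate_power(stub_policy, state)
```

Keeping the policy output as an increment rather than an absolute power means that a zero adjustment reproduces the benchmark plan exactly.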

[0079] S42. Based on the coupling constraint set, the candidate actions are corrected to obtain the actual minute-level charging and discharging actions; wherein the reward function of the reinforcement learning policy network is composed of the real-time contributions of each objective weighted by the dynamic weight coefficients, and a deviation penalty term between the actual action and the baseline policy.

[0080] After the candidate actions are obtained, they are verified and corrected using the coupling constraint set, which includes the lower limit of the photovoltaic absorption rate and the maximum charging and discharging power limits based on the battery pack health status.

[0081] In the correction, the charging and discharging power corresponding to the candidate action is constrained simultaneously within the power limit range and within the charging and discharging demand implied by the lower limit of the photovoltaic absorption rate. When the charging and discharging power corresponding to the candidate action exceeds the power limit based on the battery pack health status, it is trimmed to the limit range; when the candidate action would cause the photovoltaic absorption rate to fall below the lower limit, a forced correction is performed to satisfy the absorption constraint. The corrected power is the actual minute-level charging and discharging action that is executed.
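One way to realize this correction is sketched below. The correction formula is not reproduced in this text, so the sketch makes several assumptions: positive power means charging, photovoltaic output serves the load first, the absorption requirement is converted into a minimum charging power, and the health-based power limit takes precedence if the two constraints conflict. All names are hypothetical.

```python
def correct_action(p_ref, delta, p_ch_max, p_dis_max, pv_kw, load_kw, eta_min):
    """Return the executable power after applying the coupling constraints."""
    p = p_ref + delta
    # Trim to the power limits derived from the battery pack health status.
    p = max(-p_dis_max, min(p_ch_max, p))
    # Minimum charging power so that load plus charging absorbs eta_min of the PV.
    required = max(0.0, eta_min * pv_kw - load_kw)
    if required > 0.0 and p < required:
        p = min(p_ch_max, required)     # forced correction, still within the limit
    return p

clipped = correct_action(10.0, 50.0, 40.0, 40.0, 0.0, 20.0, 0.8)
forced = correct_action(0.0, -5.0, 40.0, 40.0, 100.0, 50.0, 0.8)
untouched = correct_action(-20.0, -10.0, 40.0, 40.0, 10.0, 50.0, 0.8)
```

In the third call the load alone already absorbs the photovoltaic output, so a discharging action within the limits passes through unchanged.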

[0082] The reward function of the reinforcement learning policy network consists of two parts: first, the real-time contributions of each objective weighted by the dynamic weight coefficients; and second, a penalty term for the deviation between the actual action and the baseline policy.

[0083] The reward function involves the dynamic weight coefficients for each objective; the real-time contribution metrics for the four objectives, namely minute-level revenue, battery life degradation, photovoltaic absorption, and demand control effectiveness; and a deviation penalty coefficient. The deviation penalty term measures the degree of deviation between the locally executed action and the baseline plan: the greater the deviation, the heavier the penalty.
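A reward of this two-part shape can be sketched as follows. The formula is not reproduced in this text, so the sketch assumes an absolute-value deviation penalty and assumes that each contribution is signed so that larger is better (battery life degradation therefore enters as a negative number). The function and parameter names are hypothetical.

```python
def reward(weights, contributions, p_actual, p_ref, penalty_coef):
    """Weighted objective contributions minus a penalty for leaving the baseline."""
    weighted = sum(w * c for w, c in zip(weights, contributions))
    deviation = abs(p_actual - p_ref)       # assumed form of the deviation measure
    return weighted - penalty_coef * deviation

weights = [0.4, 0.3, 0.2, 0.1]              # revenue, life, PV absorption, demand
contributions = [10.0, -2.0, 5.0, 1.0]      # life degradation enters negatively
near = reward(weights, contributions, 12.0, 10.0, 0.5)
far = reward(weights, contributions, 20.0, 10.0, 0.5)
```

The same contributions earn a lower reward when the executed power is further from the benchmark, which is what keeps local adjustments anchored to the global plan.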

[0084] It should be understood that through this reward function design, the local scheduling layer retains the ability to flexibly adjust according to real-time conditions while being guided by the baseline policy, thus avoiding excessive deviation between the local optimization direction and the global optimum. The reinforcement learning policy network of the local scheduling layer is obtained through offline training or pre-training on historical data and operates in inference mode during actual deployment, meeting the timeliness requirements of minute-level real-time decision-making.

[0085] Through the above-mentioned hierarchical scheduling architecture, this invention achieves the synergy between global planning and local real-time response. That is, the global layer plans the baseline path based on prediction information over a longer time scale, and the local layer uses the baseline as the anchor point, integrates dynamic weights and coupling constraints perceived in real time, and outputs safe, economical, and minute-level charging and discharging actions consistent with the global goal through a reinforcement learning policy network. Finally, the closed-loop execution of the energy storage scheduling strategy under multi-objective dynamic balance is achieved.

[0086] Referring to Figure 3, the present invention also provides an energy storage scheduling strategy generation device 100 based on a multi-objective optimization algorithm, the device comprising: a multi-layer uncertainty quantification unit 11, used to perform multi-layer uncertainty quantification on the predicted values of photovoltaic power, load, electricity price, and battery health status, obtain the probability distribution or interval of each variable, and construct a distribution-sensitive multi-objective function in the form of the expectation and risk of each objective, the objectives including revenue, battery life, photovoltaic absorption, and demand control; a dynamic weight generation unit 12, used to generate dynamic weight coefficients for each objective online through Bayesian optimization, based on the fluctuation amplitude and confidence level of the probability distribution or interval and on the current state of charge and health status of the battery pack; a coupling constraint acquisition unit 13, used to perceive the internal state of the battery pack through a graph neural network, based on the uncertainty quantification results of the battery pack health status and the real-time operating data of the battery pack, to obtain a coupling constraint set, the coupling constraint set including the lower limit of the photovoltaic absorption rate and the power limit based on the battery pack health status; and a hierarchical scheduling unit 14, used to solve, at the global scheduling layer, a charging and discharging plan for a future preset time period as a benchmark using the distribution-sensitive multi-objective function, the dynamic weight coefficients, and the coupling constraint set, and, at the local scheduling layer, taking this benchmark as a reference, to input the dynamic weight coefficients and real-time operating data into the reinforcement learning policy network and to generate and execute minute-level charging and discharging actions under the coupling constraint set.

[0087] As an example, as further shown in Figure 3, the dynamic weight generation unit 12 includes: a surrogate model construction subunit 121, used to construct a Gaussian process surrogate model with the weight of each objective as the decision variable, to determine the upper limit of the revenue weight search space based on the fluctuation amplitude and confidence level of the probability distribution or interval, and to determine the lower limit of the lifetime weight search space based on the current state of charge and health status of the battery pack; and a weight search subunit 122, used to construct an acquisition function from the predicted mean and predicted variance output by the Gaussian process surrogate model under the constraints of the revenue weight search space and the lifetime weight search space, to iteratively select and evaluate candidate weight combinations using the acquisition function, to update the Gaussian process surrogate model according to the evaluation results until convergence, and to generate online the dynamic weight coefficients of each objective in the current scheduling cycle.

[0088] As an example, the Gaussian process surrogate model uses the weighted deviation between the actual and ideal values of each objective within a historical scheduling period as the objective function.

[0089] As an example, as further shown in Figure 3, the coupling constraint acquisition unit 13 includes: a graph neural network perception subunit 131, used to construct a graph structure with each individual cell in the battery pack as a graph node and the electrical connection relationships as edges, the node features including the uncertainty quantification parameters of the health status of the corresponding cell and its real-time operating data, to perform message passing through the graph neural network, and to use an attention mechanism to aggregate the suppression effect of the health status uncertainty of the individual cells on the power capability of the whole pack, generating the battery pack aggregated state vector; and a constraint solving subunit 132, used to calculate and output the coupling constraint set based on the aggregated state vector, the coupling constraint set including the lower limit of the photovoltaic absorption rate and the power limit based on the battery pack health status.

[0090] As an example, the hierarchical scheduling unit 14 is specifically used to: construct a state vector from the real-time operating data, the dynamic weight coefficients, and the benchmark charging and discharging power at the current moment; use the charging and discharging power adjustment amount as the action space; input the state vector into the trained reinforcement learning policy network, which outputs candidate actions; and correct the candidate actions based on the coupling constraint set to obtain the actual minute-level charging and discharging actions; wherein the reward function of the reinforcement learning policy network is composed of the real-time contributions of each objective weighted by the dynamic weight coefficients, and a deviation penalty term between the actual action and the baseline policy.

[0091] It should be understood that each module in the energy storage scheduling strategy generation device based on the multi-objective optimization algorithm corresponds one-to-one with each step in the above method embodiment. Its specific implementation method and technical effect have been described in detail in the method embodiment, and will not be repeated here.

[0092] The present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the method as described in any of the preceding claims.

[0093] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. A method for generating energy storage scheduling strategies based on a multi-objective optimization algorithm, characterized in that it includes the following steps: S1, performing multi-level uncertainty quantification on the predicted values of photovoltaic power, load, electricity price, and battery health status to obtain the probability distribution or interval of each variable, and constructing a distribution-sensitive multi-objective function in the form of the expectation and risk of each objective, the objectives including revenue, battery life, photovoltaic absorption, and demand control; S2, generating dynamic weight coefficients for each objective online through Bayesian optimization, based on the fluctuation amplitude and confidence level of the probability distribution or interval and on the current state of charge and health status of the battery pack; S3, perceiving the internal state of the battery pack through a graph neural network, based on the uncertainty quantification results of the battery pack health status and the real-time operating data of the battery pack, to obtain a coupling constraint set, the coupling constraint set including the lower limit of the photovoltaic absorption rate and the power limit based on the battery pack health status; S4, at the global scheduling layer, solving a charging and discharging plan for a future preset time period as a benchmark using the distribution-sensitive multi-objective function, the dynamic weight coefficients, and the coupling constraint set; and, at the local scheduling layer, taking this benchmark as a reference, inputting the dynamic weight coefficients and real-time operating data into a reinforcement learning policy network, and generating and executing minute-level charging and discharging actions under the coupling constraint set.

2. The method for generating energy storage scheduling strategies based on a multi-objective optimization algorithm according to claim 1, characterized in that generating the dynamic weight coefficients for each objective online through Bayesian optimization, based on the fluctuation amplitude and confidence level of the probability distribution or interval and on the current state of charge and health status of the battery pack, includes: S21, constructing a Gaussian process surrogate model with the weight of each objective as the decision variable, determining the upper limit of the revenue weight search space based on the fluctuation amplitude and confidence level of the probability distribution or interval, and determining the lower limit of the lifetime weight search space based on the current state of charge and health status of the battery pack; S22, under the constraints of the revenue weight search space and the lifetime weight search space, constructing an acquisition function from the predicted mean and predicted variance output by the Gaussian process surrogate model, iteratively selecting and evaluating candidate weight combinations using the acquisition function, updating the Gaussian process surrogate model according to the evaluation results until convergence, and generating online the dynamic weight coefficients of each objective in the current scheduling cycle.

3. The method for generating energy storage scheduling strategies based on a multi-objective optimization algorithm according to claim 2, characterized in that the Gaussian process surrogate model uses the weighted deviation between the actual and ideal values of each objective within a historical scheduling period as the objective function.

4. The method for generating energy storage scheduling strategies based on a multi-objective optimization algorithm according to claim 1, characterized in that perceiving the internal state of the battery pack through a graph neural network, based on the uncertainty quantification results of the battery pack health status and the real-time operating data of the battery pack, to obtain a coupling constraint set includes: S31, constructing a graph structure with each individual cell in the battery pack as a graph node and the electrical connection relationships as edges, the node features including the uncertainty quantification parameters of the health status of the corresponding cell and its real-time operating data; performing message passing through the graph neural network; and using an attention mechanism to aggregate the suppression effect of the health status uncertainty of the individual cells on the power capability of the whole pack, generating the battery pack aggregated state vector; S32, calculating and outputting the coupling constraint set based on the aggregated state vector, the coupling constraint set including the lower limit of the photovoltaic absorption rate and the power limit based on the battery pack health status.

5. The method for generating energy storage scheduling strategies based on a multi-objective optimization algorithm according to claim 1, characterized in that, at the local scheduling layer, taking this benchmark as a reference, inputting the dynamic weight coefficients and real-time operating data into the reinforcement learning policy network, and generating minute-level charging and discharging actions under the coupling constraint set includes: S41, constructing a state vector from the real-time operating data, the dynamic weight coefficients, and the benchmark charging and discharging power corresponding to the current moment, using the charging and discharging power adjustment amount as the action space, and inputting the state vector into the trained reinforcement learning policy network, which outputs candidate actions; S42, correcting the candidate actions based on the coupling constraint set to obtain the actual minute-level charging and discharging actions; wherein the reward function of the reinforcement learning policy network is composed of the real-time contributions of each objective weighted by the dynamic weight coefficients, and a deviation penalty term between the actual action and the baseline policy.

6. A device for generating energy storage scheduling strategies based on a multi-objective optimization algorithm, characterized in that the device includes: a multi-layer uncertainty quantification unit, used to perform multi-layer uncertainty quantification on the predicted values of photovoltaic power, load, electricity price, and battery health status, obtain the probability distribution or interval of each variable, and construct a distribution-sensitive multi-objective function in the form of the expectation and risk of each objective, the objectives including revenue, battery life, photovoltaic absorption, and demand control; a dynamic weight generation unit, used to generate dynamic weight coefficients for each objective online through Bayesian optimization, based on the fluctuation amplitude and confidence level of the probability distribution or interval and on the current state of charge and health status of the battery pack; a coupling constraint acquisition unit, used to perceive the internal state of the battery pack through a graph neural network, based on the uncertainty quantification results of the battery pack health status and the real-time operating data of the battery pack, to obtain a coupling constraint set, the coupling constraint set including the lower limit of the photovoltaic absorption rate and the power limit based on the battery pack health status; and a hierarchical scheduling unit, used to solve, at the global scheduling layer, a charging and discharging plan for a future preset time period as a benchmark using the distribution-sensitive multi-objective function, the dynamic weight coefficients, and the coupling constraint set, and, at the local scheduling layer, taking this benchmark as a reference, to input the dynamic weight coefficients and real-time operating data into the reinforcement learning policy network and to generate and execute minute-level charging and discharging actions under the coupling constraint set.

7. The energy storage scheduling strategy generation device based on a multi-objective optimization algorithm according to claim 6, characterized in that the dynamic weight generation unit includes: a surrogate model construction subunit, used to construct a Gaussian process surrogate model with the weight of each objective as the decision variable, to determine the upper limit of the revenue weight search space based on the fluctuation amplitude and confidence level of the probability distribution or interval, and to determine the lower limit of the lifetime weight search space based on the current state of charge and health status of the battery pack; and a weight search subunit, used to construct an acquisition function from the predicted mean and predicted variance output by the Gaussian process surrogate model under the constraints of the revenue weight search space and the lifetime weight search space, to iteratively select and evaluate candidate weight combinations using the acquisition function, to update the Gaussian process surrogate model according to the evaluation results until convergence, and to generate online the dynamic weight coefficients of each objective in the current scheduling cycle.

8. The energy storage scheduling strategy generation device based on a multi-objective optimization algorithm according to claim 6, characterized in that the coupling constraint acquisition unit includes: a graph neural network perception subunit, used to construct a graph structure with each individual cell in the battery pack as a graph node and the electrical connection relationships as edges, wherein the node features include the health status uncertainty quantification parameters and the real-time operation data of the corresponding individual cell, and wherein the graph neural network performs message passing and uses an attention mechanism to aggregate the suppression effect of the health status uncertainty of the individual cells on the power capability of the entire pack, generating an aggregated state vector of the battery pack; and a constraint solving subunit, used to solve for and output the set of coupling constraints based on the aggregated state vector, the set including the lower limit of the photovoltaic absorption rate and the power limit based on the battery pack health status.
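
The perception step of claim 8 can be pictured with a toy example: one round of message passing over the cell graph, followed by attention-weighted pooling into a pack-level state vector. Everything specific here is an assumption for illustration: each cell carries only two features (mean state of health and its standard deviation), the attention vector is hand-picked rather than learned, and `power_limit` is a hypothetical derating rule. The lower limit of the photovoltaic absorption rate is not modelled.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def message_pass(features, edges):
    """One round: each cell averages its features with its neighbours'."""
    n = len(features)
    neighbours = {i: [i] for i in range(n)}
    for a, b in edges:  # edges are undirected electrical connections
        neighbours[a].append(b)
        neighbours[b].append(a)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in neighbours[i]) / len(neighbours[i])
         for d in range(dim)]
        for i in range(n)
    ]

def attention_pool(features, attn_vector):
    """Attention-weighted sum of cell features into one pack state vector."""
    scores = [sum(w * x for w, x in zip(attn_vector, f)) for f in features]
    alphas = softmax(scores)
    dim = len(features[0])
    state = [sum(a * f[d] for a, f in zip(alphas, features)) for d in range(dim)]
    return state, alphas

def power_limit(rated_kw, pack_state, k_sigma=2.0):
    """Hypothetical derating: lower health or higher uncertainty cuts power."""
    soh_mean, soh_std = pack_state
    derate = max(0.0, min(1.0, soh_mean - k_sigma * soh_std))
    return rated_kw * derate

# Three cells in series; the third is degraded and uncertain.
cells = [[0.95, 0.01], [0.90, 0.02], [0.70, 0.10]]
hidden = message_pass(cells, edges=[(0, 1), (1, 2)])
# Negative weight on health, positive on uncertainty: weak cells get attention.
pack_state, alphas = attention_pool(hidden, attn_vector=[-5.0, 20.0])
limit_kw = power_limit(100.0, pack_state)
```

With this attention vector the degraded cell receives the largest attention weight, so it dominates the aggregated state and pulls the pack power limit down.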

9. The energy storage scheduling strategy generation device based on a multi-objective optimization algorithm according to claim 6, characterized in that the hierarchical scheduling unit is specifically used to: construct a state vector from the real-time operation data, the dynamic weight coefficients, and the baseline charging and discharging power at the current moment; take the charging and discharging power adjustment amount as the action space; input the state vector into the trained reinforcement learning policy network, which outputs candidate actions; and correct the candidate actions based on the set of coupling constraints to obtain the actual minute-level charging and discharging actions; wherein the reward function of the reinforcement learning policy network is composed of the real-time contribution of each objective weighted by the dynamic weight coefficients and a deviation penalty term between the actual action and the baseline charging and discharging plan.
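
The action correction and reward structure of claim 9 can be sketched as follows. This is a simplified illustration, not the claimed implementation: the policy network itself is omitted and its output is passed in as `adjustment_kw`, positive power is assumed to mean charging, the photovoltaic absorption rule is a hypothetical simplification, and `deviation_penalty` is an assumed coefficient.

```python
from dataclasses import dataclass

@dataclass
class CouplingConstraints:
    max_charge_kw: float      # health-based charging power limit (>= 0)
    max_discharge_kw: float   # health-based discharging power limit (>= 0)
    min_pv_absorption: float  # lower limit of PV absorption rate, in [0, 1]

def correct_action(baseline_kw, adjustment_kw, pv_kw, load_kw, c):
    """Turn a candidate adjustment into a feasible charge/discharge power.

    Sign convention (assumed): positive = charging, negative = discharging.
    """
    candidate = baseline_kw + adjustment_kw
    # Enforce the health-based power limit.
    corrected = max(-c.max_discharge_kw, min(c.max_charge_kw, candidate))
    # Simplified absorption rule (assumed): load plus charging must absorb
    # at least min_pv_absorption of the available PV power.
    required_charge = c.min_pv_absorption * pv_kw - load_kw
    if required_charge > 0 and corrected < required_charge:
        corrected = min(c.max_charge_kw, required_charge)
    return corrected

def reward(weights, contributions, actual_kw, baseline_kw, deviation_penalty=0.1):
    """Weighted objective contributions minus a baseline-deviation penalty."""
    weighted = sum(weights[name] * contributions[name] for name in weights)
    return weighted - deviation_penalty * abs(actual_kw - baseline_kw)
```

Clipping after the policy output, rather than inside the network, keeps the executed action feasible even when the policy proposes an out-of-range adjustment, while the deviation penalty discourages the local layer from drifting far from the global baseline.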

10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 5.