Intelligent decision-making methods, terminal equipment and storage media for power system unit dispatching

By screening characteristic indicators in new power systems, weighting them with the CRITIC objective weighting method, segmenting scenarios with a Gaussian mixture model, and training with the proximal policy optimization (PPO) algorithm of the Actor-Critic framework, the method addresses the computational complexity and distribution shift problems of power system dispatching under high new energy penetration and achieves optimal dispatching decisions.

CN117726478B · Active · Publication Date: 2026-06-30 · HUNAN UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HUNAN UNIV
Filing Date
2023-02-22
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In new power systems with high penetration of new energy sources, the uncertainty and complexity of source and load make traditional reinforcement learning algorithms computationally expensive in power system scheduling, so optimal scheduling is difficult to achieve. Furthermore, the distribution shift problem (inconsistent data distributions between the training and test sets) results in poor online decision-making performance of the agent.

Method used

By selecting physically meaningful feature indicators, configuring weights using the CRITIC objective weighting method, combining a Gaussian mixture model for multi-scenario segmentation, and training with the proximal policy optimization (PPO) algorithm based on the Actor-Critic framework, a deep reinforcement learning method for multi-scenario segmentation is established to optimize unit scheduling strategies.

Benefits of technology

It achieves optimal scheduling decisions in different scenarios, improves computational efficiency and decision accuracy, and meets the goals of safe, economical and green scheduling of the power system.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention discloses an intelligent decision-making method, terminal equipment, and storage medium for power system unit scheduling. Based on historical power system operating data, it extracts typical features using dimensionality reduction methods and constructs a feature index set by configuring feature weights using an objective weighting method. A Gaussian mixture clustering model is used for multi-scenario partitioning as a front-end optimization measure for deep reinforcement learning methods, mitigating the suboptimal decision-making problem that may be caused by differences in data distribution across multiple scenarios under source-load uncertainty. The unit scheduling problem is modeled as a sequential decision Markov process, constructing a multi-scenario unit scheduling model based on deep reinforcement learning, overcoming the limitations of the original single-scenario model. Through a dynamic step-size update mechanism and parallel computing, the parameter update efficiency during the offline training phase of the decision network is improved.

Description

Technical Field

[0001] This invention relates to the field of power system dispatching, and in particular to an intelligent decision-making method, terminal equipment and storage medium for power system unit dispatching.

Background Technology

[0002] In new power systems with high penetration rates of new energy sources, the sources and loads exhibit high volatility and uncertainty, posing new challenges to the safe and economical dispatch and operation of the power system.

[0003] Currently, there are two main solutions for grid dispatching in new power systems: physical model-based and data-driven approaches. The increasing penetration of new energy sources and the increasing complexity of network topologies have exacerbated the difficulty of implementing dispatching based on physical models. In particular, directly solving large-scale, nonlinear mixed-integer programming problems presents challenges such as long computation times, high model accuracy requirements, strong reliance on manual control, and the generation of conservative dispatching schemes that are difficult to adapt to the dynamic changes of actual systems.

[0004] Deep reinforcement learning methods possess powerful perceptual fitting and exploratory decision-making capabilities. These methods interact with the power grid operation simulation environment, adaptively learning control strategies to flexibly handle source-load uncertainties such as wind power, photovoltaic power, and multi-energy loads, achieving optimized scheduling decisions. With the increasing penetration rate of new energy sources in the power system and the strengthening of source-load uncertainties, reinforcement learning-based decision-making schemes have high application value in large-scale complex scheduling problems with high-dimensional state-action spaces.

[0005] However, traditional reinforcement learning algorithms suffer from increased computational complexity in large-scale, highly nonlinear, and differentiated scenarios, and these limitations are particularly evident in power system scheduling applications. One key issue is the distribution shift problem, where inconsistent data distributions between the training and test sets lead to poor online decision-making performance by the agent, making it difficult to guarantee optimal scheduling.

Summary of the Invention

[0006] The technical problem to be solved by the present invention is to provide a smart decision-making method, terminal equipment and storage medium for power system unit scheduling, which addresses the shortcomings of the existing technology. By dividing multiple scenarios to reduce source-load uncertainty and training unit scheduling strategies in different scenarios, optimal scheduling is achieved.

[0007] To solve the above-mentioned technical problems, the technical solution adopted by the present invention is: an intelligent decision-making method for power system unit dispatching, comprising the following steps:

[0008] S1. Collect historical data of actual power grid operation as the raw dataset, and use the raw dataset to construct training and test sets;

[0009] S2. Filter feature indicators, process the training set using the feature indicators to obtain a dataset containing multiple feature indicators, and use the CRITIC objective weighting method to assign weights to the value of each feature indicator in the dataset to construct a feature indicator set.

[0010] S3. Use the set of feature indicators as input to the Gaussian mixture model, and output the scene categories and the set of feature indicators for each scene category.

[0011] S4. Using the training-set data corresponding to the feature indicator set of each scenario category, construct the current state $s_t$, generate the scheduling policy $\pi$, and sample an action $a_t$. Based on the action, calculate the immediate reward $r_t$ in the current state $s_t$ and generate the new environment state $s_{t+1}$, yielding the sample sequence $\langle s_t, a_t, r_t, s_{t+1} \rangle$. The generated sample sequences are used as inputs to the value network and policy network of the offline decision network to train both networks, giving the decision model corresponding to each scenario. The trained policy network can then be used to make online decisions on the test set to verify the model's effectiveness.

[0012] Current data-driven, single-scenario-based unit scheduling decision-making methods struggle to effectively address suboptimal decision-making issues caused by data distribution differences across power system scenarios. This invention selects characteristic indicators with clear physical meaning to characterize source-load properties, characterizing source-load uncertainty from both overall source-load level and trend perspectives. Compared to feature reduction methods, this approach fully preserves original source-load information and effectively extracts differentiated features across different scenarios. For each characteristic indicator, weight coefficients need to be configured to objectively reflect its importance in characterizing source-load uncertainty. This invention employs the CRITIC objective weighting method, considering both the differences between characteristic indicators and eliminating the influence of highly correlated indicators, thus reducing information overlap and making it suitable for comprehensive evaluation problems involving multiple indicators and multiple evaluation objects. This invention uses a Gaussian mixture model as the clustering analysis method. The hybrid model is a mixed probability model containing multiple sub-Gaussian models. Compared with other clustering analysis methods, such as K-means clustering, hierarchical clustering, and fuzzy C-means clustering, it can effectively approximate any continuous probability distribution by reasonably setting the number of sub-Gaussian models. It is suitable for describing the source-load difference characteristics of different scenarios and realizing multi-scenario partitioning. This invention adopts the Proximal Policy Optimization (PPO) algorithm based on the Actor-Critic framework as the decision network. In the training phase, the advantage function is introduced to dynamically update the step size. 
In the online phase, the policy network is used for real-time decision-making. Compared with other deep reinforcement learning algorithms, such as deep Q-network (DQN) and deep deterministic policy gradient (DDPG), this algorithm has good convergence and high decision efficiency, which solves the problems of computational complexity and low computational efficiency in the existing intelligent decision-making process for power system unit scheduling.

[0013] In step S2, the selected characteristic indicators characterize the uncertainty of source and load from two perspectives: overall level and trend. From an overall perspective, the selected characteristic indicators include: maximum renewable energy output I1, reflecting the maximum renewable energy output level; average renewable energy output I2, reflecting the average renewable energy output level; maximum load utilization hour rate I3, reflecting equipment time utilization efficiency; daily peak-valley difference rate I4, reflecting the grid's peak-shaving capacity; and renewable energy daily peak-valley difference rate I5, reflecting the overall change in renewable energy output of the grid. From a trend perspective, the selected characteristic indicators include: load factor I6, reflecting the overall load change; daily load fluctuation rate I7, reflecting the degree of instability in load changes; and renewable energy daily output fluctuation rate I8, reflecting the degree of instability in renewable energy output changes.

[0014] The calculation formulas for each characteristic index are as follows (of these, only the formula for I6 survives in this text): $I_6 = P_{av} / P_{max}$, where $T$ is the number of time periods into which the day is divided at the smallest time granularity; $P_{av}$, $P_{max}$, and $P_{min}$ are the average, maximum, and minimum load values, respectively, with corresponding average, maximum, and minimum values defined for new energy power output; $P_t$ is the load at time $t$, with a corresponding renewable energy output at time $t$; and $\alpha_t$ is the fluctuation rate of the load at the minimum time granularity, with a corresponding fluctuation rate for renewable energy output.

[0015] The feature indicators selected in this invention, which possess clear physical meaning and characterize the source-load properties, describe the load and new energy sources from both overall level and trend perspectives, effectively extracting the source-load difference characteristics of different scenarios. Compared to feature reduction methods based on data relationship mapping, it has no requirements on data structure; compared to feature indicators that only consider a single perspective, it can comprehensively retain the original information.

[0016] The weight $\omega_j$ of the j-th feature indicator is calculated as $\omega_j = C_j / \sum_{l=1}^{8} C_l$, with $C_j = \sigma_j \sum_{z=1}^{8} (1 - r_{zj})$, where $C_j$ is the amount of information carried by the j-th indicator; $\sigma_z$ is the standard deviation of the z-th feature indicator over the m objects to be evaluated, z = 1, 2, ..., 8; $\sigma_j$ is the standard deviation of the j-th feature indicator over the m objects; $\bar{X}'_j$ is the mean of the j-th feature indicator over the m objects; $X'_z$ and $X'_j$ are the normalized values of the z-th and j-th feature indicators of the m objects; $r_{zj}$ is the correlation coefficient between the z-th and j-th feature indicators; and $x_{ij}$ is the value of the j-th feature indicator of the i-th object. For positive indicators, $x'_{ij} = (x_{ij} - \min(x_j)) / (\max(x_j) - \min(x_j))$; for negative indicators, $x'_{ij} = (\max(x_j) - x_{ij}) / (\max(x_j) - \min(x_j))$, where $x_j$ denotes the values of the j-th feature indicator over the m objects. The positive indicators are the maximum output of new energy I1, the average output of new energy I2, the highest utilization hour rate of load I3, and the load rate I6; the negative indicators are the daily peak-valley difference rate I4, the daily peak-valley difference rate of new energy I5, the daily load fluctuation rate I7, and the daily output fluctuation rate of new energy I8.

[0017] Weights characterize the importance of each feature index in describing the uncertainty of the source load. The weighting method significantly affects the objectivity and accuracy of the evaluation results. The CRITIC objective weighting method adopted in this invention is a comprehensive evaluation method applicable to multiple indicators and multiple evaluation objects. By introducing contrast intensity and conflict, it fully considers the differences and correlations between feature indicators. Compared with subjective weighting methods and traditional objective weighting methods, the evaluation results are more objective, comprehensive, and credible.
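The CRITIC weighting described above can be illustrated with a minimal sketch. It assumes min-max normalization with negative indicators reversed, the sample standard deviation (`ddof=1`) as contrast intensity, and Pearson correlation for conflict; the function name `critic_weights` and the random demo matrix are hypothetical and are not data from the patent.

```python
import numpy as np

def critic_weights(X, positive):
    """CRITIC objective weights for an (m objects x n indicators) matrix.

    positive[j] is True for a positive (larger-is-better) indicator and
    False for a negative one.
    """
    X = np.asarray(X, dtype=float)
    positive = np.asarray(positive, dtype=bool)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)      # guard against constant columns
    # Min-max normalization; negative indicators are reversed.
    Z = np.where(positive, (X - lo) / span, (hi - X) / span)
    sigma = Z.std(axis=0, ddof=1)               # contrast intensity (assumed sample std)
    R = np.corrcoef(Z, rowvar=False)            # Pearson correlations r_zj
    conflict = (1.0 - R).sum(axis=0)            # sum over z of (1 - r_zj)
    C = sigma * conflict                        # information carried by each indicator
    return C / C.sum()                          # normalized so the weights sum to 1

# Hypothetical demo data: 30 objects, 8 indicators I1..I8.
rng = np.random.default_rng(0)
X_demo = rng.random((30, 8))
# Positive: I1, I2, I3, I6. Negative: I4, I5, I7, I8 (as stated in the text).
is_positive = [True, True, True, False, False, True, False, False]
w = critic_weights(X_demo, is_positive)
```

An indicator receives a larger weight when it varies strongly across objects and is weakly correlated with the other indicators, which is the behaviour the text attributes to contrast intensity and conflict.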

[0018] In step S3, the process of determining the parameters of the Gaussian mixture model includes:

[0019] 1) Calculate the posterior probability distribution $\gamma_{ik}$ according to Bayes' theorem: $\gamma_{ik} = \frac{\alpha_k\, p(x_i \mid \mu_k, \Sigma_k, \sigma_k)}{\sum_{l=1}^{K} \alpha_l\, p(x_i \mid \mu_l, \Sigma_l, \sigma_l)}$. The Gaussian mixture model is a mixture distribution consisting of K sub-Gaussian distributions, where K is the number of sub-Gaussian models; $\mu_k$, $\Sigma_k$, $\sigma_k$, and $\alpha_k$ are, respectively, the expectation, variance or covariance, probability in the Gaussian mixture model, and weight of the k-th sub-Gaussian model; and $p(x_i \mid \mu_k, \Sigma_k, \sigma_k)$ is the probability density of the k-th sub-Gaussian distribution;

[0020] 2) Update the parameters of the Gaussian mixture model based on the posterior probability distribution:

[0021]

[0022]

[0023]

[0024] where $\mu'_k$, $\Sigma'_k$, and $\alpha'_k$ are the updated parameters corresponding to $\mu_k$, $\sigma_k$, and $\alpha_k$, respectively; $x_i$ is the i-th object to be evaluated, i = 1, 2, 3, ..., m; and m is the number of objects to be evaluated.

[0025] 3) Repeat steps 1) and 2) until the parameters converge; the converged parameters define the Gaussian mixture model.

[0026] This invention employs the Expectation-maximization (EM) algorithm for Gaussian mixture model parameter estimation. By iteratively updating the model parameters through the re-estimation formula, the computational complexity of maximum likelihood estimation can be reduced, while ensuring effective convergence of the algorithm.
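The EM iteration described above can be sketched as follows. This is a minimal illustration using the standard EM re-estimation formulas for a Gaussian mixture, which are assumed here because the patent's own update equations are not reproduced in this text. The function names, the small regularization term `reg`, the random initialization, and the two-cluster demo data are all hypothetical.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    solved = np.linalg.solve(cov, diff.T).T          # cov^{-1} (x - mean) per row
    expo = -0.5 * np.einsum("ij,ij->i", diff, solved)
    return np.exp(expo) / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))

def em_step(X, weights, means, covs, reg=1e-6):
    """One E-step followed by one M-step; returns the updated parameters."""
    m, d = X.shape
    K = len(weights)
    # E-step: posterior gamma_ik of component k given sample i (Bayes' theorem).
    dens = np.column_stack(
        [weights[k] * gaussian_pdf(X, means[k], covs[k]) for k in range(K)]
    )
    gamma = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate means, covariances, and weights from gamma.
    Nk = gamma.sum(axis=0)
    new_means = (gamma.T @ X) / Nk[:, None]
    new_covs = np.empty((K, d, d))
    for k in range(K):
        diff = X - new_means[k]
        new_covs[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(d)
    return Nk / m, new_means, new_covs

def fit_gmm(X, K, max_iter=200, tol=1e-6, seed=0):
    """Iterate EM until the parameter change falls below tol or max_iter is hit."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    weights = np.full(K, 1.0 / K)
    means = X[rng.choice(m, size=K, replace=False)]  # random samples as initial means
    covs = np.tile(np.cov(X, rowvar=False) + 1e-6 * np.eye(d), (K, 1, 1))
    for it in range(1, max_iter + 1):
        new = em_step(X, weights, means, covs)
        change = np.sqrt(sum(np.sum((n - o) ** 2)
                             for n, o in zip(new, (weights, means, covs))))
        weights, means, covs = new
        if change < tol:                             # parameter change below threshold
            break
    return weights, means, covs, it

# Hypothetical demo: two clusters in two dimensions.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 1.0, (60, 2)), rng.normal(6.0, 1.0, (60, 2))])
weights, means, covs, n_iter = fit_gmm(X_demo, K=2)
```

Stopping on the norm of the parameter change, rather than on the log-likelihood, follows the convergence condition the embodiment states for the parameter set between successive iterations.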

[0027] In step S3, the number K of sub-Gaussian models in the Gaussian mixture model is determined by successive approximation based on the Bayesian information criterion (BIC), whose formula is $C_{BIC} = K\ln(m) - 2\ln(L)$, where $C_{BIC}$ is the BIC value used to evaluate the Gaussian mixture model, $L$ is the maximum likelihood function value of the Gaussian mixture model, and $m$ is the number of objects to be evaluated.

[0028] The Bayesian information criterion, by balancing model complexity with the model's ability to describe the data distribution, can select the number of sub-Gaussian models that best fit the data distribution while ensuring low model complexity. It is an effective method for determining the number of sub-Gaussian models in a Gaussian mixture model.
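As a worked illustration of this selection rule, the sketch below evaluates the BIC formula exactly as written in the text, with K itself as the complexity term (many BIC formulations use the number of free parameters instead). The log-likelihood values are made-up placeholders, not results from the patent, and the helper names are hypothetical; m = 370 matches the number of data sets mentioned in the embodiment.

```python
import math

def bic_value(K, m, log_likelihood):
    """C_BIC = K ln(m) - 2 ln(L), with ln(L) passed in directly."""
    return K * math.log(m) - 2.0 * log_likelihood

def select_K(log_likelihoods, m):
    """Return the candidate K with the smallest BIC, plus all scores.

    log_likelihoods maps each candidate K to the maximized ln(L) of the
    mixture fitted with K sub-Gaussian models.
    """
    scores = {K: bic_value(K, m, ll) for K, ll in log_likelihoods.items()}
    return min(scores, key=scores.get), scores

# Hypothetical maximized log-likelihoods for K = 1..5 (illustrative only).
demo_ll = {1: -1200.0, 2: -1000.0, 3: -900.0, 4: -899.0, 5: -898.5}
best_K, scores = select_K(demo_ll, m=370)
```

With these placeholder values the likelihood gain from K = 3 to K = 4 is smaller than the added penalty ln(370), so K = 3 has the lowest score.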

[0029] In step S4, the objective function $L^{CLIP}(\theta)$ optimized by the policy network is:

[0030] $L^{CLIP}(\theta) = E\left[\min\left(\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)} A(s_t, a_t),\ \mathrm{clip}\left(\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)},\ 1 - \varepsilon,\ 1 + \varepsilon\right) A(s_t, a_t)\right)\right]$

[0031] where $\theta$ denotes the policy network parameters; clip is the clipping function; $\varepsilon$ is the hyperparameter controlling the clipping interval; $\pi_\theta(a_t \mid s_t) / \pi_{\theta_{old}}(a_t \mid s_t)$ is the ratio of the sampling probabilities of the new and old policies, with $\pi_\theta(a \mid s)$ the sampling probability of the new policy and $\pi_{\theta_{old}}(a \mid s)$ that of the old policy; $A(s_t, a_t)$ is the advantage estimate of taking action $a_t$ in state $s_t$ relative to the average action, $A(s_t, a_t) = Q_u(s_t, a_t) - V_u(s_t)$, with $V_u(s_t) = E(R_t \mid s_t; \pi)$; $Q_u(s_t, a_t)$ is the action value function, representing the expected reward of executing action $a_t$ in state $s_t$ according to policy $\pi$; $V_u(s_t)$ is the state value function, representing the expected reward over all actions performed in state $s_t$ according to policy $\pi$; $E(\cdot)$ is the expectation function; and $R_t$ is the cumulative reward.
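The clipped objective can be checked numerically with a small sketch. It assumes the standard PPO clipped surrogate, computed from log-probabilities of the sampled actions under the new and old policies; the clipping value `eps=0.2` is a common default and not a value given in the patent, and the function name and sample numbers are hypothetical.

```python
import numpy as np

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean clipped surrogate objective over a batch of sampled actions."""
    new_logp = np.asarray(new_logp, dtype=float)
    old_logp = np.asarray(old_logp, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    ratio = np.exp(new_logp - old_logp)              # new / old sampling probability
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)   # restrict ratio to [1-eps, 1+eps]
    # Take the more pessimistic of the unclipped and clipped terms per sample.
    return float(np.mean(np.minimum(ratio * advantages, clipped * advantages)))

# Two hypothetical samples: ratio 1.5 with advantage +1, ratio 0.5 with advantage -1.
value = ppo_clip_objective(
    new_logp=np.log([1.5, 0.5]),
    old_logp=np.log([1.0, 1.0]),
    advantages=[1.0, -1.0],
)
```

In the first sample the gain is capped at 1.2 instead of 1.5; in the second the clipped term -0.8 is lower than the unclipped -0.5 and is therefore the one kept, so the batch mean is 0.2. Clipping in this way limits how far a single update can move the policy from the one that generated the samples.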

[0032] Furthermore, the method of the present invention also includes:

[0033] S5. Construct a test set using the original dataset, randomly select a feature dataset for a certain day within the test set, determine the scenario to which the feature dataset belongs, and match it to the corresponding decision model for scheduling decision.

[0034] As an inventive concept, the present invention also provides a terminal device, including a memory, a processor, and a computer program stored in the memory; the processor executes the computer program to implement the steps of the method described above.

[0035] As an inventive concept, the present invention also provides a computer-readable storage medium having a computer program / instructions stored thereon; when the computer program / instructions are executed by a processor, they implement the steps of the method described above.

[0036] Compared with the prior art, the beneficial effects of the present invention are as follows:

[0037] 1. This invention addresses the distribution shift problem in applying deep reinforcement learning methods to power system dispatch. It establishes a feature indicator set based on physical meaning and an objective weighting method as the basis for multi-scenario division, and proposes multi-scenario division as a front-end optimization measure for deep reinforcement learning methods, thus overcoming the impact of source-load uncertainty and complex, varied scenarios on unit dispatch.

[0038] 2. This invention fully considers the scheduling objectives and decision-making requirements of the new power system in terms of safety, economy and greenness, and designs a reinforcement learning state, action space and reward function mechanism, and proposes a new intelligent scheduling decision-making method for power systems based on multi-scenario partitioning and improved deep reinforcement learning.

[0039] 3. This invention establishes a proximal policy optimization algorithm with a dynamic step size update mechanism to differentiate the training of offline decision networks for various scenarios. By improving training efficiency through parallel computing, optimal decision-making is achieved in each scenario.

Attached Figure Description

[0040] Figure 1 is an overall architecture diagram of Embodiment 1 of the present invention;

[0041] Figure 2 is a schematic diagram of optimal multi-scenario partitioning based on the BIC criterion in Embodiment 1 of the present invention;

[0042] Figure 3 is a schematic diagram of the multi-scene segmentation results in Embodiment 1 of the present invention;

[0043] Figure 4 is a schematic diagram of the offline training framework based on the proximal policy optimization algorithm in Embodiment 1 of the present invention;

[0044] Figure 5 is a schematic diagram comparing the average rewards before and after multi-scene division in Embodiment 1 of the present invention;

[0045] Figure 6 is a schematic diagram comparing the average decision step length before and after multi-scenario partitioning in Embodiment 1 of the present invention;

[0046] Figure 7 is a schematic diagram of the voltage status of key nodes under multi-scenario scheduling decision-making in Embodiment 1 of the present invention;

[0047] Figure 8 is a schematic diagram of power grid loss under multi-scenario scheduling decision-making in Embodiment 1 of the present invention;

[0048] Figure 9 is a schematic diagram of the new energy consumption situation under multi-scenario scheduling decision-making in Embodiment 1 of the present invention.

Detailed Implementation

[0049] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0050] In this document, the terms "first," "second," and other similar words are not intended to imply any order, quantity, or importance, but are merely used to distinguish different elements. The terms "one," "a," and other similar words are not intended to indicate that only one of the stated thing exists, but rather that the description addresses one of them; there may be one or more of the stated thing. The terms "comprising," "including," and other similar words are intended to indicate a logical relationship, not a spatial relationship. For example, "A includes B" means that logically B belongs to A, not that spatially B is located inside A. Furthermore, the meanings of the terms "comprising," "including," and other similar words should be considered open-ended, not closed. For example, "A includes B" means that B belongs to A, but B does not necessarily constitute all of A; A may also include other elements such as C, D, and E.

[0051] The following uses the modified IEEE 118-node extended system as an example to illustrate and verify the multi-scenario unit scheduling decision method based on deep reinforcement learning of this invention.

[0052] Example 1

[0053] As shown in Figure 1, this embodiment proposes an intelligent decision-making method for power system unit scheduling based on Gaussian multi-scenario partitioning and deep reinforcement learning, including the following steps:

[0054] S1. Extract nearly one year of operational data from a provincial power grid, at a time granularity of 5 minutes, as the initial dataset (i.e., the original dataset), and divide it into training and test sets. The training set is used to train the decision network of the multi-scenario unit scheduling model based on deep reinforcement learning. For the scenarios covered by the test set, the policy network of the trained decision model is used for online decision-making to verify the model's effectiveness. These scenarios include stable operation (renewable energy output and overall load level are both stable); load fluctuation (load is affected by electricity demand and shows a clear trend, while renewable energy output remains stable); and renewable energy fluctuation (renewable energy is limited by natural conditions and shows a clear trend, while the overall load level remains stable). The data include the load value at each node and the maximum output of each renewable energy unit at the smallest time granularity.

[0055] S2. Process the above training set based on feature indices I1 to I8, configure the weights of each feature index using the CRITIC objective weighting method, construct a feature index set as the basis for multi-scene segmentation, perform multi-scene segmentation using the trained Gaussian mixture model, and evaluate the multi-scene segmentation effect using an evaluation system.

[0056] The specific implementation process of step S2 includes:

[0057] S2.1. The dataset is preprocessed using physical meaning-based feature indicators to obtain 370 datasets containing 8 feature indicators.

[0058] S2.2. The CRITIC objective weighting method is used to assign weights to each feature indicator of the dataset. Considering improving the renewable energy absorption rate, equipment utilization rate, and smoothing the load / renewable energy output change trend, load factor, maximum utilization hour rate, maximum renewable energy value, and average renewable energy value are selected as positive indicators, while daily peak-valley difference rate, daily load fluctuation rate, renewable energy daily peak-valley difference rate, and renewable energy daily output fluctuation rate are selected as negative indicators. The weights of each indicator after calculation are shown in Table 1.

[0059] Table 1 Weighting of Each Feature Index

[0060]

[0061] In this embodiment, the maximum output of new energy (I1) reflects the maximum output level of new energy; the average output of new energy (I2) reflects the average output level of new energy; the highest utilization hour rate of load (I3) reflects the time utilization efficiency; the daily peak-valley difference rate (I4) reflects the peak-shaving capacity of the power grid; the daily peak-valley difference rate of new energy (I5) reflects the overall change in the output of new energy in the power grid; the load rate (I6) reflects the load change situation; the daily load fluctuation rate (I7) reflects the degree of instability of load changes; and the daily output fluctuation rate of new energy (I8) reflects the degree of instability of new energy output changes.

[0062] $I_6 = P_{av} / P_{max}$, where $T$ is the number of time periods into which the day is divided at the smallest time granularity; $P_{av}$, $P_{max}$, and $P_{min}$ are the average, maximum, and minimum load values, respectively, with corresponding average, maximum, and minimum values defined for new energy power output; $P_t$ is the load at time $t$, with a corresponding renewable energy output at time $t$; and $\alpha_t$ is the fluctuation rate of the load at the minimum time granularity, with a corresponding fluctuation rate for renewable energy output.

[0063] S2.3. Configure the feature weights in the dataset according to Table 1 and construct the feature index set.

[0064] In this embodiment, to eliminate the influence of dimensions, the min-max normalization method is used to process the feature index, as shown in the following formula:

[0065] Positive indicators: $x'_{ij} = \frac{x_{ij} - \min(x_j)}{\max(x_j) - \min(x_j)}$

[0066] Negative indicators: $x'_{ij} = \frac{\max(x_j) - x_{ij}}{\max(x_j) - \min(x_j)}$

[0067] Assume there are m objects to be evaluated, i = 1, 2, ..., m, and 8 characteristic indicators I1 to I8, j = 1, 2, ..., 8; $x_{ij}$ is the value of the j-th feature index of the i-th object, and $x_j$ denotes the values of the j-th feature index over the m objects to be evaluated.

[0068] The amount of information carried by each characteristic indicator is calculated, with the standard deviation measuring contrast intensity and the correlation coefficient measuring conflict, as shown in the following formulas:

[0069] Contrast intensity: the standard deviation $\sigma_j$ of the normalized j-th indicator about its mean $\bar{X}'_j$.

[0070]

[0071] Conflict: $\sum_{z=1}^{8} (1 - r_{zj})$

[0072] Information carrying capacity: $C_j = \sigma_j \sum_{z=1}^{8} (1 - r_{zj})$

[0073] In the formula, $\sigma_z$ is the standard deviation of the z-th indicator, z = 1, 2, ..., 8; $\sigma_j$ is the standard deviation of the j-th indicator; $\bar{X}'_j$ is the mean of the j-th indicator over the m objects to be evaluated; $X'_z$ and $X'_j$ are the normalized z-th and j-th indicators of the m objects to be evaluated; $r_{zj}$ is the correlation coefficient between the z-th and j-th indicators, taken as the Pearson linear correlation coefficient; and $C_j$ is the amount of information carried by the j-th indicator: the larger its value, the greater the indicator's weight.

[0074] The objective weights of each characteristic indicator are calculated using the following formula:

[0075] $\omega_j = C_j / \sum_{l=1}^{8} C_l$

[0076] In the formula, $\omega_j$ is the proportion of the total information content accounted for by the information content of the j-th indicator, which is the objective weight of this feature indicator.

[0077] The weighting of each feature index in Table 1 is obtained through the above calculations.

[0078] S2.4. Estimate the parameters of the Gaussian mixture model using the EM algorithm, and select the optimal number of scene partitions (i.e., the number of sub-Gaussian models) based on the BIC criterion. As shown in Figure 2, the same full covariance matrix was ultimately selected, and the optimal number of scene partitions was set to 3.

[0079] In this embodiment, a Gaussian mixture model is constructed as the basis for multi-scene partitioning, and its probability distribution is represented as follows:

[0080] $\tau_k = (\mu_k, \Sigma_k, \sigma_k)$;

[0081]

[0082]

[0083] In the formula, the Gaussian mixture model is a mixture probability model containing K sub-Gaussian distributions; $x$ is the sample set; $\mu_k$, $\Sigma_k$, $\sigma_k$, and $\alpha_k$ are, respectively, the expectation, variance (or covariance), probability in the mixture model, and weight of the k-th sub-Gaussian model; $\tau$ is the set of parameters for expectation, variance (or covariance), and probability in the mixture model; $\tau_k$ is the corresponding parameter set of the k-th sub-Gaussian model; $P(\cdot)$ denotes a probability density function, which applied to the k-th sub-Gaussian model gives its probability distribution within the Gaussian mixture model; and K is the number of sub-Gaussian models.

[0084] The parameters of the Gaussian mixture model are estimated using the maximum likelihood function, as shown in the following equation:

[0085]

[0086] In this embodiment, the Expectation-maximization (EM) algorithm is used to solve for the parameters of the Gaussian mixture model, reducing the computational complexity of maximum likelihood estimation. The specific steps are as follows:

[0087] 1) Initialize the model's expectation, variance (or covariance), probability in the mixture model, and weight-related parameters;

[0088] 2) Expectation Step (E-step): The posterior probability distribution is calculated using Bayes' theorem, as shown in the following formula:

[0089]

[0090] 3) Maximization step (M-step): The parameters of the updated model are solved based on the results of the E-step. The calculation formula is as follows:

[0091]

[0092]

[0093]

[0094] 4) Repeat steps 2) and 3) until the parameters converge (the convergence condition is given below) or the maximum number of iterations is reached, which yields the trained Gaussian mixture model.

[0095] $\|\tau_{i+1} - \tau_i\| < \epsilon$

[0096] In the formula, $\epsilon$ is a very small positive number, so the condition indicates that the parameters change very little after one iteration; $\tau_{i+1}$ is the parameter set obtained in the (i+1)-th iteration, and $\tau_i$ is the parameter set obtained in the i-th iteration.

[0097] S2.5. Use the trained Gaussian mixture clustering model to perform multi-scene segmentation; the results are shown in Figure 3. Based on the multi-scenario segmentation effect evaluation system, the value of each evaluation index is calculated, as shown in Table 2:

[0098] Table 2 Multi-scenario Evaluation Index System

[0099]

[0100] In this embodiment, based on the Bayesian information criterion (BIC), the number of sub-Gaussian models in the Gaussian mixture model is determined using a successive approximation method. The BIC formula is $C_{BIC} = K\ln(m) - 2\ln(L)$, where $C_{BIC}$ is the BIC value used to evaluate the Gaussian mixture model, $K$ is the number of sub-Gaussian models in the Gaussian mixture model, and $L$ is the maximum likelihood function value of the Gaussian mixture model.

[0101] In this embodiment, the feature index set is used as the input of the Gaussian mixture model to divide the scene. The number of scenes is calculated by S2.4. The output is the divided scene categories and the feature index set of each scene category. The training set of each scene can be obtained according to the model output results.

[0102] In this embodiment, to evaluate the effectiveness of the multi-scene segmentation method, an evaluation system including the following indicators is constructed:

[0103] The Silhouette Coefficient (SC) reflects the aggregation of data within the same scene and the separation of data between different scenes.

[0104]

[0105] In the formula, a is the average distance between the current sample and the other samples in the same scenario, and b is the average distance between the current sample and the samples in the nearest other scenario.
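A minimal sketch of the silhouette coefficient follows, assuming the standard per-sample definition (b - a) / max(a, b) averaged over all samples, since the formula of this embodiment is not reproduced in the text. The function name `silhouette_coefficient` and the convention of scoring a single-sample scenario as 0 are assumptions; the sketch expects at least two scenarios and at least two distinct points.

```python
import math

def silhouette_coefficient(points, labels):
    """Mean of (b - a) / max(a, b) over all samples."""
    def dist(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Group distances from sample i to every other sample by scenario label.
        by_label = {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if i != j:
                by_label.setdefault(lab, []).append(dist(p, q))
        if own not in by_label:  # single-sample scenario: score 0 (assumed convention)
            scores.append(0.0)
            continue
        a = sum(by_label[own]) / len(by_label[own])
        b = min(sum(d) / len(d) for lab, d in by_label.items() if lab != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated scenarios, while negative values suggest samples assigned to the wrong scenario.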

[0106] The variance ratio criterion (Calinski-Harabasz, CH) is based on between-class variance and within-class variance. It evaluates the tightness of data within the same scenario and the separation of data across different scenarios. A higher CH value indicates a better multi-scenario partitioning effect.

[0107]

[0108] In the formula, $B_K$ and $W_K$ are, respectively, the covariance matrix between data of different scenarios and the covariance matrix of data within the same scenario; tr denotes the trace of a matrix.
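The CH index can be sketched as follows, assuming the standard definition $[\mathrm{tr}(B_K)/(K-1)] / [\mathrm{tr}(W_K)/(m-K)]$ with m samples and K scenarios, because the formula of this embodiment is not reproduced in the text. The traces are computed directly as sums of squared distances, and the function name `calinski_harabasz` is hypothetical.

```python
def calinski_harabasz(points, labels):
    """[tr(B_K) / (K - 1)] / [tr(W_K) / (m - K)] for labelled points."""
    dim = len(points[0])

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    overall = mean(points)
    m, k = len(points), len(groups)
    # tr(B_K): size-weighted squared distance of each scenario center to the overall mean.
    trace_b = sum(len(rows) * sq_dist(mean(rows), overall) for rows in groups.values())
    # tr(W_K): squared distance of each sample to its own scenario center.
    trace_w = sum(sq_dist(p, mean(rows)) for rows in groups.values() for p in rows)
    return (trace_b / (k - 1)) / (trace_w / (m - k))
```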

[0109] The Davies-Bouldin index (DBI) is a performance metric that considers both the similarity of data within classes and the differences between data between classes. A lower DBI value indicates better scene segmentation performance.

[0110]

[0111] In the formula, the scatter term of class i is the average Euclidean distance from the samples of class i to their class center, and $\lVert \omega_i - \omega_j \rVert_2$ is the Euclidean distance between the class centers of class i and class j.
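A small sketch of the DBI, assuming the standard definition: for each class, take the worst ratio of summed within-class scatter to center distance against any other class, then average over classes. The formula of this embodiment is not reproduced in the text, so this form and the function name `davies_bouldin` are assumptions.

```python
import math

def davies_bouldin(points, labels):
    """Average over classes of max_j (S_i + S_j) / ||w_i - w_j||_2."""
    dim = len(points[0])

    def dist(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    # Class centers and average distance of each class's samples to its center.
    centers = {
        lab: [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
        for lab, rows in groups.items()
    }
    scatter = {
        lab: sum(dist(p, centers[lab]) for p in rows) / len(rows)
        for lab, rows in groups.items()
    }
    total = 0.0
    for i in groups:
        total += max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in groups if j != i
        )
    return total / len(groups)
```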

[0112] S3. Construct an AC simulation environment of the new power system based on the grid topology and equipment models, and design a reasonable action space, state space, and reward function that take the dispatching objectives into account. The state and action space settings are shown in Table 3.

[0113] Table 3 State and Action Space Settings

[0114]

[0115]

[0116] The weights of each sub-reward are configured according to the importance of the scheduling target, as shown in Table 4.

[0117] Table 4 Sub-reward function weight settings

[0118]

[0119] In this embodiment, the state space includes: load active power, load reactive power, voltage amplitude of the node where the load is located, active power output of the unit, reactive power output of the unit, voltage amplitude of the unit, branch current load rate, load prediction value for the next time step, unit on / off status, remaining time steps for the shut-down unit to be allowed to restart, remaining time steps for the restarted unit to be allowed to shut down, remaining time steps for the disconnected branch to be reconnected, and remaining time steps for the branch to have been continuously soft overloaded.

[0120] The action space includes: the unit's active power output adjustment value and the unit's voltage adjustment value.

[0121] Reward functions reflecting the safety ($r_{safe}$), economy ($r_{eco}$), and green ($r_{env}$) operation dispatching objectives of the new power system are established as follows:

[0122] The safety reward $r_{safe}$ is obtained from the power flow calculation that the reinforcement learning training simulation environment performs after receiving the dispatching instructions.

[0123] $r_{safe} = r_{line} + r_q + r_v + r_{balance}$;

[0124] The items are as follows:

[0125] 1) Line power exceeding limits:

[0126]

[0127] In the formula, $N_{line}$ is the number of power grid branches; $I_i$ and $T_i$ are the current and thermal limit of branch i, respectively; and δ is a constant that prevents the denominator from being zero.

[0128] 2) Reactive power output exceeding limits:

[0129]

[0130]

[0131] In the formula, $N_{gen}$ is the total number of generating units; $q_i$ is the reactive power output of unit i, and the remaining two symbols denote its upper and lower limits.

[0132] 3) Node voltage exceeding limits:

[0133]

[0134]

[0135] $r_v = 0$, otherwise

[0136] In the formula, N denotes the number of nodes in the power grid; $v_i$ is the voltage of node i, and the remaining two symbols denote its upper and lower limits.

[0137] 4) Over-limit power of the balancing unit:

[0138]

[0139]

[0140] $r_{balance} = 0$, otherwise

[0141] In the formula, $N_{balance}$ is the number of balancing units; $p_i$ is the active power output of balancing unit i, and the two accompanying symbols denote its upper and lower limits; $C_{max}$ and $C_{min}$ are constants, 1.1 and 0.9 respectively.

[0142] The economic reward $r_{eco}$ is the unit operating cost incurred by the dispatching decision that the agent makes at time step t:

[0143]

[0144] In the formula, a, b, and c are the unit operating cost coefficients, and d is the unit start-up and shutdown cost. New energy units and balancing units are always kept running. An active power output $p_i$ of zero for a thermal power unit indicates that the unit is shut down.

[0145] The environmental reward $r_{env}$ is the renewable energy absorption rate, that is, the ratio of the renewable energy absorbed to the maximum renewable energy output.

[0146]

[0147] In the formula, $N_{new}$ is the number of new energy generating units; the other two symbols denote the actual and maximum active power outputs of new energy unit i at the current time step.

[0148] $r = r_{safe} + r_{eco} + r_{env}$;

[0149] In the formula, r is the total reward function, the sum of the safety reward $r_{safe}$, the economic reward $r_{eco}$, and the environmental reward $r_{env}$.
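The two reward relations that are fully stated in the text, the absorption rate and the final sum, can be sketched as follows. Reading the absorption rate as a ratio of totals over all new energy units is an assumption, as are the function names and the choice to return zero when no renewable output is available; the safety and economic terms are taken as given inputs because their formulas are not reproduced here.

```python
def absorption_rate(actual_outputs, max_outputs):
    """r_env: absorbed renewable energy divided by the maximum renewable output."""
    total_max = sum(max_outputs)
    if total_max == 0:
        return 0.0  # assumption: no available renewable output gives zero reward
    return sum(actual_outputs) / total_max

def total_reward(r_safe, r_eco, r_env):
    """r = r_safe + r_eco + r_env, as stated in the description."""
    return r_safe + r_eco + r_env
```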

[0150] S4. The proximal policy optimization algorithm is used to train a differentiated offline policy for each scenario (Zhu Jiebei, Xu Siyang. A Smart Optimization Method for Power Grid Safety Operation Strategy Based on Deep Reinforcement Learning [P]. Tianjin: CN114048903A, 2022-02-15.). The training framework is shown in Figure 4. After the policy network converges, the average reward per round and the average decision step size per round in each scenario during the training phase are shown in Figure 5 and Figure 6.

[0151] In this embodiment, for each of the multiple scenarios, an offline decision network based on the proximal policy optimization algorithm is trained in a differentiated manner. The proximal policy optimization algorithm is based on the Actor-Critic framework, and the offline training process of the decision network includes generating sample sequences, training the value network, and training the policy network. First, the policy network of each scenario constructs the current state $s_t$ from its corresponding scenario training set, generates the dispatching policy π, and samples an action $a_t$. The power system simulation environment then calculates the immediate reward $r_t$ of the current state $s_t$ from the action and generates the new environment state $s_{t+1}$, yielding the sample sequence $\langle s_t, a_t, r_t, s_{t+1} \rangle$. The value network and the policy network draw sample sequences for training and updating.
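The sample-generation loop can be sketched with a toy environment. Everything here is hypothetical: `ToyDispatchEnv` stands in for the power system simulation environment, its state is a (load, output) pair rather than the full state space of Table 3, its reward merely penalises power imbalance, and `noisy_tracking_policy` replaces the trained policy network.

```python
import random

class ToyDispatchEnv:
    """Hypothetical stand-in for the power system simulation environment."""

    def __init__(self, load_profile):
        self.load_profile = load_profile
        self.t = 0
        self.output = load_profile[0]

    def reset(self):
        self.t = 0
        self.output = self.load_profile[0]
        return (self.load_profile[0], self.output)

    def step(self, action):
        # The action is an adjustment of the unit's active power output.
        self.output += action
        reward = -abs(self.output - self.load_profile[self.t])  # penalise imbalance
        self.t += 1
        done = self.t >= len(self.load_profile)
        next_load = self.load_profile[min(self.t, len(self.load_profile) - 1)]
        return (next_load, self.output), reward, done

def noisy_tracking_policy(state, rng):
    """Toy stochastic policy: close the load gap, plus Gaussian exploration noise."""
    load, output = state
    return (load - output) + rng.gauss(0.0, 0.1)

def collect_samples(env, policy, rng):
    """Roll out one episode and return the (s_t, a_t, r_t, s_{t+1}) tuples."""
    samples = []
    state, done = env.reset(), False
    while not done:
        action = policy(state, rng)
        next_state, reward, done = env.step(action)
        samples.append((state, action, reward, next_state))
        state = next_state
    return samples
```

Each tuple's next state is the following tuple's current state, so the list forms the sample sequence that the value and policy networks later draw from.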

[0152] The value network is updated by gradient steps on a value network loss function $L_V(u)$:

[0153] $L_V(u) = E\big[(r_t + \gamma V_u(s_{t+1}) - V_u(s_t))^2\big]$;

[0154] $V_u(s_t) = E(R_t \mid s_t; \pi)$;

[0155] $R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots$;

[0156] The value network parameters are updated according to the following expression:

[0157] $u^{*} = u - \alpha_u \nabla_u L_V(u)$

[0158] In the formulas, E(·) is the expectation; $V_u(s_t)$ is the value function of state $s_t$, that is, the expected return of performing all actions from state $s_t$ according to policy π; $R_t$ is the cumulative reward, $R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots$; $r_t$ is the immediate reward of the current state $s_t$, calculated by the reward function r; $u^{*}$ is the updated value of the value network parameters; u is the value network parameters; $\alpha_u$ is the value network learning rate; $\nabla_u L_V(u)$ is the gradient of the value network loss function with respect to parameter u; and γ is the discount factor.
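The cumulative reward and the value network loss can be checked numerically with a short sketch. The function names are hypothetical, `value_fn` is any callable standing in for the value network $V_u$, and the expectation is approximated by a plain average over the sampled tuples, which is an assumption about how the loss is estimated.

```python
def discounted_return(rewards, gamma):
    """R_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ... for the first step."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

def value_loss(samples, value_fn, gamma):
    """Mean of (r_t + gamma*V(s_{t+1}) - V(s_t))^2 over (s, a, r, s') samples."""
    errors = [r + gamma * value_fn(s_next) - value_fn(s) for s, _, r, s_next in samples]
    return sum(e * e for e in errors) / len(errors)
```

With a constant reward of 1 and γ = 0.5, a value function that returns 2 everywhere makes every error term vanish, which matches the fixed point V = 1 / (1 - γ).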

[0159] The policy network introduces an advantage function as its loss function for parameter updates. The loss function (i.e., the advantage function) of the policy network, $A(s_t, a_t)$, is:

[0160] $A(s_t, a_t) = Q_u(s_t, a_t) - V_u(s_t)$;

[0161] $Q_u(s_t, a_t) = E(R_t \mid s_t, a_t, \pi)$;

[0162] In the formula, $A(s_t, a_t)$ is the estimated advantage of taking action $a_t$ in state $s_t$ over taking the average action; $Q_u(s_t, a_t)$ is the action value function, representing the expected return of executing action $a_t$ in state $s_t$ according to policy π.
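One common way to estimate the advantage from a single sample is to approximate the action value by the one-step target $r_t + \gamma V_u(s_{t+1})$. The sketch below uses that approximation; it is an assumption for illustration, not an estimator stated in the text, and the function name is hypothetical.

```python
def one_step_advantage(reward, value_s, value_next, gamma):
    """A(s_t, a_t) ~ [r_t + gamma*V(s_{t+1})] - V(s_t), a one-step estimate."""
    q_estimate = reward + gamma * value_next  # stands in for Q_u(s_t, a_t)
    return q_estimate - value_s
```

A positive result means the sampled action did better than the value network expected from that state, and a negative result means it did worse.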

[0163] Because policy network training is sensitive to the learning rate of policy gradient updates, the proximal policy optimization algorithm introduces a clipping function to limit the sampling probability ratio of the new and old policies. The objective function $L^{CLIP}(\theta)$ optimized by the policy network is:

[0164]

[0165] In the formula, θ denotes the policy network parameters, and clip is the clipping function, which has a hyperparameter that controls the clipping interval. The clipping function keeps the sampling probability ratio of the new and old policies within that interval at all times, which prevents the instability of algorithms based on policy gradient updates. $\pi_\theta(a|s)$ is the sampling probability of the new policy, and its counterpart is the sampling probability of the old policy; $lr_t(\theta)$ is the ratio of the sampling probabilities of the new and old policies and should stay as close to 1 as possible.
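A minimal sketch of a clipped objective follows, assuming the widely used PPO form: the mean of min(ratio × A, clip(ratio, 1 - eps, 1 + eps) × A). The formula of this embodiment is not reproduced in the text, so this form, the hyperparameter name `eps`, and its default of 0.2 are assumptions.

```python
def clipped_objective(new_probs, old_probs, advantages, eps=0.2):
    """Mean of min(ratio*A, clip(ratio, 1-eps, 1+eps)*A) over the samples."""
    terms = []
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old  # sampling probability ratio of new and old policies
        clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)
```

With a positive advantage, a ratio above 1 + eps earns no extra objective value, and with a negative advantage a ratio below 1 - eps is likewise capped, so a single update has no incentive to move the policy far from the old one.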

[0166] The policy network parameter update expression is:

[0167]

[0168] In the formula, $\theta^{*}$ is the updated value of the policy network parameters, $\alpha_\theta$ is the policy network learning rate, and $\nabla_\theta L^{CLIP}(\theta)$ is the gradient of the objective function with respect to parameter θ.

[0169] By continuously updating the value network and policy network, the decision network can more accurately evaluate the value of actions and make action selections. Through continuous interaction between the decision network and the environment, until the decision network training converges, a policy network capable of online intelligent scheduling decisions is obtained.

[0170] S5. Randomly select the feature dataset of one day from the test set, determine the scenario to which it belongs, and match it to the corresponding policy network for dispatching decisions. The node voltage of key node 81 and node 10, the grid loss, and the renewable energy consumption are shown in Figure 7, Figure 8, and Figure 9.
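The online matching step can be sketched as a lookup: assign the day's feature to the mixture component with the highest posterior, then call the policy trained for that scenario. The one-dimensional feature, the function names, the example component parameters, and the placeholder policies are all hypothetical; the scenario names follow the categories listed in claim 1.

```python
import math

def log_gaussian(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def match_scenario(feature, components):
    """Return the scenario whose component has the highest posterior for `feature`.

    `components` maps a scenario name to (weight, mean, variance).
    """
    return max(
        components,
        key=lambda name: math.log(components[name][0])
        + log_gaussian(feature, components[name][1], components[name][2]),
    )

def dispatch(feature, state, components, policies):
    """Pick the scenario for the day, then query that scenario's policy."""
    scenario = match_scenario(feature, components)
    return scenario, policies[scenario](state)

# Hypothetical mixture parameters and placeholder policies for illustration only.
EXAMPLE_COMPONENTS = {
    "stable_operation": (0.5, 0.1, 0.01),
    "load_fluctuation": (0.3, 0.5, 0.02),
    "new_energy_fluctuation": (0.2, 0.9, 0.02),
}
EXAMPLE_POLICIES = {name: (lambda state, n=name: n + "_action") for name in EXAMPLE_COMPONENTS}
```

Comparing log-posteriors avoids underflow when a feature lies far from a component, and the shared normalising constant can be dropped because only the ranking matters.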

[0171] Based on the combined experimental results, we can conclude that:

[0172] 1) As shown in Table 2, the multi-scenario segmentation effect was evaluated using the multi-scenario evaluation index system. The silhouette coefficient, variance ratio criterion, and Davies-Bouldin index values were all within a reasonable range and performed well, indicating the effectiveness of the multi-scenario segmentation method proposed in this invention.

[0173] 2) As Figure 5 and Figure 6 show, during the offline training of the decision network, the average reward per round and the average decision step size per round in each scenario after multi-scenario division are significantly improved compared with the original data.

[0174] 3) As Figure 7 and Figure 8 show, for the feature dataset of one day randomly selected from the test set, the node voltage of key node 81 stays in the range of 0.994 to 1.010, which strictly meets the dispatching objective of safe operation without voltage limit violations; the grid loss stays in the range of 0.40 to 0.75, which meets the dispatching objective of economic operation; and the absorption of the new energy units is close to their upper output limit, which meets the dispatching objective of green operation.

[0175] Example 2

[0176] Embodiment 2 of the present invention provides a terminal device corresponding to Embodiment 1 above. The terminal device can be a processing device for a client, such as a mobile phone, a laptop, a tablet computer, a desktop computer, etc., to execute the method of the above embodiments.

[0177] The terminal device in this embodiment includes a memory, a processor, and a computer program stored in the memory; the processor executes the computer program in the memory to implement the steps of the method in Embodiment 1 described above.

[0178] In some implementations, the memory may be high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk storage device.

[0179] In other implementations, the processor can be any type of general-purpose processor, such as a central processing unit (CPU) or a digital signal processor (DSP), and there is no limitation here.

[0180] Example 3

[0181] Embodiment 3 of the present invention provides a computer-readable storage medium corresponding to Embodiment 1 above, on which a computer program / instructions are stored. When the computer program / instructions are executed by a processor, they implement the steps of the method of Embodiment 1 above.

[0182] A computer-readable storage medium can be a tangible device that holds and stores instructions for use by an instruction execution device. A computer-readable storage medium can be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any combination thereof.

[0183] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code. The solutions in the embodiments of this application can be implemented in various computer languages, such as the object-oriented programming language Java and the interpreted scripting language JavaScript.

[0184] This application is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more processes of the flowchart and / or one or more boxes of the block diagram.

[0185] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and / or one or more boxes of the block diagram.

[0186] Although preferred embodiments of this application have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of this application.

[0187] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.

Claims

1. An intelligent decision-making method for power system unit dispatching, characterized in that it includes the following steps:
S1. Collect historical data of actual power grid operation as the raw dataset, and use the raw dataset to construct a training set and a test set; the historical data of actual power grid operation includes the load value at each node at the smallest time granularity and the maximum output of each new energy unit;
S2. Select feature indicators, process the training set using the feature indicators to obtain a dataset containing multiple feature indicators, and use the CRITIC objective weighting method to assign a weight to each feature indicator value in the dataset to construct a feature indicator set; the selected feature indicators are: maximum new energy output, average new energy output, maximum load utilization rate, daily peak-to-valley difference rate, new energy peak-to-valley difference rate, load factor, daily load volatility, and daily new energy generation volatility; the terms of the weight calculation formula are: the amount of information carried by each indicator; the standard deviation and the mean of each feature indicator over the objects to be evaluated; the normalized value of each feature indicator for each object to be evaluated; the correlation coefficient between each pair of feature indicators; and the raw value of each feature indicator for each object to be evaluated; positive indicators and negative indicators are normalized with separate formulas; the positive indicators are maximum new energy output, average new energy output, maximum load utilization rate, and load factor; the negative indicators are daily peak-to-valley difference rate, new energy peak-to-valley difference rate, daily load volatility, and daily new energy generation volatility;
S3. The feature indicator set is used as input to the Gaussian mixture model, and the output is the divided scenario categories and the feature indicator set of each scenario category; the scenario categories include stable operation scenarios, load fluctuation scenarios, and new energy fluctuation scenarios; the process of determining the parameters of the Gaussian mixture model includes: 1) calculate the posterior probability distribution according to Bayes' theorem, where the Gaussian mixture model is a mixture distribution composed of K individual Gaussian distributions, K is the number of sub-Gaussian models, and the parameters of each sub-Gaussian model are its expectation, its variance or covariance, and its probability (weight) in the Gaussian mixture model; 2) update the parameters of the Gaussian mixture model based on the posterior probability distribution, the updated parameters being computed over all objects to be evaluated; 3) repeat steps 1) and 2) until the parameters converge, and the updated parameters so obtained define the Gaussian mixture model;
S4. Construct the current state $s_t$ using the training set data corresponding to the feature indicator set of each scenario category, generate the dispatching policy π, and sample an action $a_t$; calculate the immediate reward $r_t$ of the current state from the action and generate the new environment state $s_{t+1}$, obtaining the sample sequence $\langle s_t, a_t, r_t, s_{t+1} \rangle$; the generated sample sequences are used as inputs to the value network and the policy network of the offline decision network to train them, thereby obtaining the decision model corresponding to each scenario.

2. The intelligent decision-making method for power system unit dispatching according to claim 1, characterized in that the feature indicators are calculated by formulas whose terms are defined as follows: the number of time periods into which the whole day is divided at the smallest time granularity; the average, maximum, and minimum values of the new energy output; the average, maximum, and minimum values of the load; the load and the new energy output at a given time; and the fluctuation rates of the load and of the new energy output at the smallest time granularity.

3. The intelligent decision-making method for power system unit dispatching according to claim 1, characterized in that, in step S3, the number K of sub-Gaussian models in the Gaussian mixture model is determined by the successive approximation method based on the Bayesian information criterion; the formula of the Bayesian information criterion is $C_{BIC} = K\ln(m) - 2\ln(L)$, where $C_{BIC}$ is the Bayesian information criterion value used to evaluate the Gaussian mixture model, L is the maximum likelihood function value of the Gaussian mixture model, and m is the number of objects to be evaluated.

4. The intelligent decision-making method for power system unit dispatching according to claim 1, characterized in that, in step S4, the objective function $L^{CLIP}(\theta)$ optimized by the policy network is expressed in terms of the following quantities: θ is the policy network parameters; clip is the clipping function, whose hyperparameter controls the clipping interval; $lr_t(\theta)$ is the ratio of the sampling probabilities of the new and old policies; $\pi_\theta(a|s)$ is the sampling probability of the new policy, and its counterpart is the sampling probability of the old policy; $A(s_t, a_t) = Q_u(s_t, a_t) - V_u(s_t)$ is the advantage of taking action $a_t$ in state $s_t$ over taking the average action; $Q_u(s_t, a_t) = E(R_t \mid s_t, a_t, \pi)$ is the action value function, representing the expected return of executing action $a_t$ in state $s_t$ according to policy π; $V_u(s_t) = E(R_t \mid s_t; \pi)$ is the expected return of performing all actions from state $s_t$ according to policy π; E(·) is the expectation; and $R_t$ is the cumulative reward.

5. The intelligent decision-making method for power system unit dispatching according to claim 1, characterized in that it also includes: S5. Construct a test set using the original dataset, randomly select the feature dataset of a certain day within the test set, determine the scenario to which the feature dataset belongs, and match it to the corresponding decision model for dispatching decisions.

6. A terminal device, comprising a memory, a processor, and a computer program stored in the memory, characterized in that the processor executes the computer program to implement the steps of the method according to any one of claims 1 to 5.

7. A computer-readable storage medium having a computer program / instructions stored thereon, characterized in that, when the computer program / instructions are executed by a processor, they implement the steps of the method according to any one of claims 1 to 5.