Virtual power plant multi-scale market collaborative bidding scheduling method
By constructing a cloud-cluster-device layered architecture, dynamically aggregating resources using improved modularity and market fit, and combining conditional value at risk with Lyapunov optimization, the method addresses efficient scheduling of virtual power plants across multi-scale markets, achieving efficient resource aggregation and bidding schemes that are both robust and economical.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HUANENG LIAONING ENERGY SALES LLC
- Filing Date
- 2026-05-07
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies struggle to efficiently aggregate distributed energy resources while maintaining the accuracy of physical resource constraints. They also fail to effectively handle uncertainties in new energy output and load, resulting in low computational efficiency of virtual power plant scheduling models and a lack of adaptability to different electricity market mechanisms.
A layered collaborative architecture of cloud-cluster-device is constructed, which abstracts distributed resources into a standardized model. Dynamic aggregation is performed by improving modularity and market fit. Combining conditional value at risk and Lyapunov optimization methods, a two-stage robust optimization bidding model is established. Intraday correction is performed through mixed integer programming-deep Q-network, and an adaptive interaction mode is selected to interact with the power grid.
It achieves efficient unified modeling and aggregation of heterogeneous resources, improves the scheduling and computation efficiency and robustness of virtual power plants in multi-scale markets, ensures the feasibility and economy of bidding schemes, and adapts to different power market mechanisms.
Figure CN122243120A_ABST
Abstract
Description
Technical Field
[0001] This application belongs to the field of power market technology, and specifically relates to a multi-scale market collaborative bidding and dispatching method for virtual power plants.
Background Technology
[0002] Virtual power plants aggregate dispersed distributed energy resources, energy storage systems, and controllable loads through advanced control, metering, and communication technologies, forming a unified entity to participate in the electricity market. As the electricity market system continues to improve, virtual power plants need to participate simultaneously in multiple markets, including the day-ahead energy market, the intraday rolling market, the real-time balancing market, and demand response. These different markets differ significantly in terms of time scale, response speed, and settlement rules, which places multi-scale coordination requirements on the bidding decisions and dispatch control of virtual power plants.
[0003] Existing research has made some progress in the participation of virtual power plants in a single market, but the following technical problems still exist:
[0004] First, distributed resources have diverse characteristics, including interruptible, continuously adjustable, tiered adjustable, mobile, and energy storage types. Existing methods typically use simplified equivalent models for aggregation, which makes it difficult to achieve efficient aggregation of large-scale resources while maintaining the accuracy of physical constraints. This results in a significant deviation between the aggregation model and the actual operating characteristics.
[0005] Second, the uncertainties of wind power output, photovoltaic output, and load power are superimposed. Traditional robust optimization uses a box or ellipsoidal uncertainty set, which is either too conservative, reducing economic efficiency, or fails to cover extreme scenarios and thus risks load shedding. A flexible mechanism for controlling the size of the uncertainty set is lacking.
[0006] Third, the state-of-charge temporal coupling constraints of the energy storage system correlate decision variables in different scheduling periods, causing the feasible domain of the virtual power plant to become a complex polyhedron in a high-dimensional space. In the coordinated scheduling of the transmission network and the virtual power plant, if the detailed models of each virtual power plant are directly reported, the dimensionality of the scheduling model variables will explode. If the traditional projection method is used for dimensionality reduction, a large number of redundant constraints will be generated in the elimination process, resulting in low computational efficiency.
[0007] Fourth, bidding schemes determined day-ahead struggle to cope with rapid changes in real-time operating conditions during the day. Existing intraday correction methods are mostly simple proportional adjustments or rule-based heuristic corrections, which cannot adapt to market signals and uncertainties. Meanwhile, applications of deep reinforcement learning to power system dispatch often produce actions that violate physical constraints, and penalty-function approaches cannot guarantee the feasibility of the solution.
[0008] Fifth, electricity market mechanisms differ across regions, with some adopting a centralized clearing model and others a demand-side response pricing model. Virtual power plants need to interact flexibly with different market mechanisms. Meanwhile, resource model parameters drift with factors such as aging, operation, and environmental changes, and existing methods lack a closed-loop correction mechanism.
Summary of the Invention
[0009] This application provides a multi-scale market collaborative bidding and scheduling method for virtual power plants, aiming to solve the problem that existing technologies lack methods that can systematically integrate multi-scale market collaboration, handle multiple uncertainties, and achieve efficient and reliable bidding and scheduling.
[0010] A multi-scale market collaborative bidding and scheduling method for virtual power plants, the method comprising:
[0011] A cloud-cluster-edge layered collaborative architecture is constructed. At the edge layer, distributed resources with different characteristics are abstracted into standardized models such as interruptible resources, continuously adjustable resources, tiered adjustable resources, movable resources, and energy storage-like resources. At the cluster layer, distributed resources within the edge control layer are dynamically aggregated based on improved modularity to form multiple resource aggregates. Then, based on market fit, the resource aggregates are dynamically combined into virtual power plant entities that participate in market transactions.
[0012] Identify the external characteristics of the virtual power plant entity, including power baseline, maximum adjustable power characteristics, and ramp rate characteristics;
[0013] Based on conditional value at risk, a polyhedral uncertainty set is constructed, and a two-stage robust optimization bidding model for the day-ahead electricity market and the demand response market is established. The first-stage decision variable of the two-stage robust optimization bidding model is an integer variable determined before the uncertainty is realized, and the second-stage decision variable is a continuous variable adjusted after the uncertainty is realized. The two-stage robust optimization bidding model is solved by a column and constraint generation algorithm to obtain the day-ahead bidding scheme.
[0014] The Lyapunov optimization method is used to decouple the coupling constraints of energy storage time periods. The high-dimensional operation model of the virtual power plant is projected onto a two-dimensional plane with exchange power and operating cost as coordinates by the vertex search method to obtain an equivalent projection model.
[0015] During the intraday phase, a rolling time window optimization method is used to dynamically modify the intraday bidding scheme, and a mixed integer programming-deep Q-network method is used to solve the intraday bidding decision. The deep Q-network generates original action suggestions based on the current state, and the mixed integer programming module modifies the original action suggestions into actionable actions under the condition of satisfying physical constraints.
[0016] Depending on whether the power grid provides price or demand signals, the system selects either an active or responsive interaction mode to interact with the power grid, and calculates the response deviations of each resource after the end of each day's operation to correct the standardized model parameters.
[0017] Optionally, the dynamic aggregation of distributed resources within the edge control layer based on improved modularity includes: constructing an undirected graph with nodes as elements and branches as connections, and defining the communication cost between nodes, wherein the communication cost includes geographical location distance cost and edge controller fixed configuration cost;
[0018] An improved modularity index is constructed with the goal of maximizing the difference between actual communication cost and expected communication cost.
[0019] The Louvain algorithm or a spectral clustering algorithm is used to solve the problem and obtain the resource aggregate partitioning result that minimizes communication cost and maximizes internal coupling tightness.
[0020] Optionally, the step of dynamically combining the resource aggregates into virtual power plant entities participating in market transactions based on market fit includes: treating each resource aggregate as a game participant and extracting the maximum up-adjustment power, maximum down-adjustment power, response latency, and response duration of all its internal flexible resources;
[0021] After normalizing the four indicators, the market fit index is constructed as the product of four factors: the normalized upward adjustment power, the normalized downward adjustment power, (1 - the normalized response delay), and the normalized response duration.
[0022] With the goal of maximizing the sum of market fit of all virtual power plants, the optimal alliance partitioning scheme is solved by dynamic programming or branch and bound method to obtain the virtual power plant entities.
[0023] Optionally, the construction of the polyhedral uncertainty set based on conditional value at risk includes: using the Monte Carlo method to generate scenarios from historical prediction error data of wind power output, photovoltaic power output and load power, wherein wind speed follows a Weibull distribution, light intensity follows a Beta distribution and load power follows a normal distribution;
[0024] By introducing the conditional value at risk theory, the maximum acceptable prediction error boundary is calculated at a given confidence level.
[0025] By combining the maximum prediction error boundary with the uncertainty parameter, a polyhedral uncertainty set constrained by the 1-norm and the infinity norm is constructed.
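As a rough illustration of paragraphs [0023] to [0025], the sketch below samples load prediction errors from a normal distribution, takes the CVaR of the absolute error at a given confidence level as the error bound, and tests whether a deviation vector lies in a set bounded by an infinity-norm and a 1-norm constraint. The function names, the budget parameter `gamma`, the sample parameters, and the use of absolute errors are assumptions made for illustration and are not taken from the application.

```python
import random

def cvar_bound(errors, beta):
    """CVaR of |error| at confidence level beta: mean of the worst (1 - beta) tail."""
    losses = sorted(abs(e) for e in errors)
    k = int(len(losses) * beta)          # index of the VaR sample
    tail = losses[k:] or losses[-1:]     # keep at least one sample
    return sum(tail) / len(tail)

def in_uncertainty_set(deviation, bound, gamma):
    """Check |d_t| <= bound (infinity norm) and sum|d_t| <= gamma * bound (1-norm)."""
    inf_ok = all(abs(d) <= bound for d in deviation)
    one_ok = sum(abs(d) for d in deviation) <= gamma * bound
    return inf_ok and one_ok

random.seed(0)
# Hypothetical Monte Carlo scenarios: load errors drawn from N(0, 5 MW).
samples = [random.gauss(0.0, 5.0) for _ in range(10_000)]
bound_90 = cvar_bound(samples, 0.90)
bound_99 = cvar_bound(samples, 0.99)
```

Raising the confidence level enlarges the bound and hence the set, which is the lever the text describes for trading conservatism against economy.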
[0026] Optionally, the decoupling of energy storage time-period coupling constraints using the Lyapunov optimization method includes: defining a virtual queue state variable for the energy storage system, wherein the virtual queue state variable is equal to the difference between the actual state of charge and the offset related to the vertex search direction;
[0027] Establish the temporal evolution law of the virtual queue, and take its net cumulative change of zero as the equivalent constraint that the net charge and discharge amount within the energy storage scheduling cycle is zero;
[0028] By introducing the Lyapunov drift penalty function, the objective function of the original vertex search is rewritten as the difference between the original projection objective and the drift penalty term used to suppress excessive fluctuations in the virtual queue. This decouples the cross-time-dependent optimization problem into a single-time linear programming problem that can be solved independently at each time step.
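One way to write the steps in paragraphs [0026] to [0028] is sketched below. All symbols are assumed notation: $S_t$ is the state of charge, $\theta$ the direction-dependent offset, $Q_t$ the virtual queue, $P_t$ the storage power (positive when charging), $\eta$ the efficiency, $\Delta t$ the period length, $V$ a weighting parameter, and $f_t$ the single-period part of the projection objective. The exact drift term used in the application may differ.

```latex
\begin{aligned}
Q_t &= S_t - \theta, \\
Q_{t+1} &= Q_t + \eta\,P_t\,\Delta t, \\
\sum_{t=1}^{T} \eta\,P_t\,\Delta t &= Q_{T+1} - Q_1 = 0, \\
\max_{x_t}\;& V f_t(x_t) - Q_t\,\eta\,P_t\,\Delta t \quad \text{for each } t \text{ independently.}
\end{aligned}
```

The last line comes from the quadratic Lyapunov function $\tfrac{1}{2} Q_t^2$, whose one-step drift is dominated by the term $Q_t\,(Q_{t+1} - Q_t)$.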
[0029] Optionally, the step of projecting the high-dimensional operation model of the virtual power plant onto a two-dimensional plane with exchange power and operating cost as coordinates through vertex search to obtain an equivalent projection model includes: using the exchange power and operating cost of the virtual power plant as coordinate variables of the two-dimensional projection plane;
[0030] The linear programming problem is solved sequentially along multiple uniformly distributed search directions in the two-dimensional projection plane to determine all vertices of the projected polygon;
[0031] The vertices are connected into a convex polygon using the convex hull algorithm, resulting in an equivalent projection model that is precisely described by a set of linear inequalities.
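The vertex search in paragraphs [0029] to [0031] can be sketched on a toy model. The sketch assumes each resource has only box limits and a linear unit cost, so the linear program along each search direction is separable and can be solved in closed form. A real virtual power plant model would need an LP solver at that step. The function names and the example data are hypothetical.

```python
import math

def support_point(direction, resources):
    """Solve max d_p*P + d_c*C over box-constrained resources in closed form.

    Each resource is (p_min, p_max, unit_cost); P = sum(p_i), C = sum(c_i * p_i).
    The LP is separable, so each p_i sits at whichever bound raises the objective.
    """
    d_p, d_c = direction
    power = cost = 0.0
    for p_min, p_max, c in resources:
        p = p_max if d_p + d_c * c > 0 else p_min
        power += p
        cost += c * p
    return (round(power, 9), round(cost, 9))

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def project(resources, n_dirs=64):
    """Search uniformly spaced directions and hull the vertices found."""
    found = [support_point((math.cos(2 * math.pi * k / n_dirs),
                            math.sin(2 * math.pi * k / n_dirs)), resources)
             for k in range(n_dirs)]
    return convex_hull(found)

# Hypothetical resources: (p_min MW, p_max MW, cost per MWh).
vertices = project([(0.0, 2.0, 1.0), (0.0, 1.0, 3.0)])
```

Each pair of consecutive hull vertices defines one linear inequality of the projected model in the (exchange power, operating cost) plane.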
[0032] Optionally, the method of solving intraday bidding decisions using mixed integer programming-deep Q-networks includes: modeling the day-ahead-real-time two-stage bidding decision problem as a Markov decision process, defining a state space that includes grid topology, resource operating status, market environment information and operating cost status, and a mixed action space that includes market-declared power and resource adjustment instructions;
[0033] Deep Q-networks generate initial action suggestions based on the current state;
[0034] The mixed-integer programming module aims to minimize the quadratic deviation from the original action proposal. It solves the convex quadratic programming problem under the conditions of power balance, resource physical constraints, external characteristic constraints, and market declaration limits to obtain feasible actions.
[0035] The action is applied to the virtual power plant system to calculate the reward and update the deep Q network parameters.
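The repair step in paragraph [0034] amounts to projecting the network's raw action onto the feasible set. The sketch below covers only a simplified continuous case, with per-resource power limits and a single power balance equality, and leaves out integer decisions, external characteristic constraints, and market limits. The function name `repair_action` and the example numbers are hypothetical.

```python
def repair_action(raw, lower, upper, target, tol=1e-9):
    """Project a raw action onto {lower <= x <= upper, sum(x) == target}.

    Minimizes sum((x_i - raw_i)**2); the optimum is x_i = clip(raw_i - nu),
    with the shift nu found by bisection on the power-balance residual.
    """
    if not sum(lower) <= target <= sum(upper):
        raise ValueError("power balance cannot be met within resource limits")

    def clipped(nu):
        return [min(max(r - nu, lo), hi) for r, lo, hi in zip(raw, lower, upper)]

    lo_nu = min(r - hi for r, hi in zip(raw, upper))   # every x at its upper bound
    hi_nu = max(r - lo for r, lo in zip(raw, lower))   # every x at its lower bound
    while hi_nu - lo_nu > tol:
        mid = 0.5 * (lo_nu + hi_nu)
        if sum(clipped(mid)) > target:
            lo_nu = mid      # total too high: shift further down
        else:
            hi_nu = mid
    return clipped(0.5 * (lo_nu + hi_nu))

# Hypothetical DQN suggestion for three resources (MW) and a 5 MW balance.
feasible = repair_action([4.0, 3.0, -1.0], [0.0, 0.0, 0.0], [3.0, 3.0, 2.0], 5.0)
```

Because the repaired action is always feasible by construction, the learning agent does not need penalty terms to discourage constraint violations.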
[0036] Optionally, the selection of active interaction mode or responsive interaction mode to interact with the power grid includes: when the power grid provides price or demand signals, selecting responsive interaction mode: receiving peak-shaving demand or price signals issued by the power grid, calculating the minimum cost under each response level according to a preset step size, generating a "response quantity-price" curve and reporting it to the power grid;
[0037] When the power grid does not provide price or demand signals, the active interaction mode is selected: the aggregated model of the virtual power plant is directly reported to the power grid dispatch center or market clearing engine as a generator-like model. The aggregated model includes power boundaries, ramp boundaries and piecewise linear cost functions.
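For the responsive mode in paragraph [0036], a "response quantity-price" curve can be sketched as follows, assuming each resource offers a fixed capacity at a constant marginal cost so that dispatching the cheapest resources first gives the minimum cost at every level. The function name, the tuple layout, and the numbers are hypothetical.

```python
def response_price_curve(resources, step, max_response):
    """Minimum cost for each response level, dispatching cheapest resources first.

    resources: list of (capacity_mw, marginal_cost) pairs (hypothetical data).
    Returns a list of (response_mw, min_cost, marginal_price) points.
    """
    merit_order = sorted(resources, key=lambda r: r[1])
    curve = []
    level = step
    while level <= max_response + 1e-9:
        remaining, cost, price = level, 0.0, 0.0
        for capacity, marginal_cost in merit_order:
            used = min(capacity, remaining)
            if used > 0:
                cost += used * marginal_cost
                price = marginal_cost       # cost of the last unit dispatched
                remaining -= used
        if remaining > 1e-9:
            break                            # level exceeds total capacity
        curve.append((level, cost, price))
        level += step
    return curve

curve = response_price_curve([(2.0, 50.0), (1.0, 20.0), (3.0, 80.0)], 1.0, 4.0)
```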
[0038] Compared with the prior art, this application has at least the following beneficial effects:
[0039] This application constructs a cloud-cluster-edge layered collaborative architecture. At the edge layer, distributed resources with different characteristics are abstracted into five standardized models. At the cluster layer, resources are dynamically aggregated based on improved modularity to form resource aggregates. Then, based on market fit, these aggregates are dynamically combined into virtual power plant entities, solving the technical problem of unified modeling and efficient aggregation of heterogeneous resources. The improved modularity index introduces geographical distance cost and edge controller fixed configuration cost, aiming to maximize the difference between actual communication cost and expected communication cost, thereby maximizing the communication tightness within the aggregate and minimizing cross-aggregate communication coupling. The market fit index comprehensively reflects four capabilities: power upscaling, power downscaling, response latency, and response duration. It aims to maximize the sum of the market fit of all virtual power plants, ensuring that the formed virtual power plant entities are highly matched with market demand.
[0040] This application addresses the technical challenge of balancing robustness and economic viability in bidding schemes under the coupled uncertainties of renewable energy output and load by constructing a polyhedral uncertainty set based on conditional value at risk (CVaR) and establishing a two-stage robust optimization bidding model for the day-ahead energy market and demand response market. The Monte Carlo method is used to generate prediction error scenarios for wind power, photovoltaic power, and load. CVaR theory is introduced to calculate the maximum tolerable prediction error boundary at a given confidence level. A polyhedral uncertainty set jointly constrained by the 1-norm and the infinity norm is constructed. The conservatism of robust optimization can be flexibly controlled by adjusting the confidence level. The two-stage robust optimization model places integer variable decisions before uncertainty realization and continuous variable adjustments after uncertainty realization. The model is solved iteratively through a column and constraint generation algorithm, ensuring the feasibility of the scheduling scheme under the worst-case scenario while avoiding economic losses due to excessive conservatism.
[0041] This application employs the Lyapunov optimization method to decouple the time-period coupling constraints of energy storage. By using a vertex search method, the high-dimensional operation model of the virtual power plant is projected onto a two-dimensional plane with exchange power and operating cost as coordinates, solving the technical problem of complex high-dimensional feasible region projection calculations caused by energy storage coupling constraints. The virtual queue state variables of the energy storage system are defined and their temporal evolution laws are established. A Lyapunov drift penalty function is introduced to decouple the cross-time-period coupling optimization problem into a single-time-period linear programming problem that can be solved independently at each time step. The linear programming problem is solved sequentially along multiple uniformly distributed search directions within the two-dimensional projection plane to determine the vertices of the projection polygon. An equivalent projection model, precisely described by a set of linear inequalities, is obtained through a convex hull algorithm. Compared to the traditional Fourier-Motzkin elimination method, the computational efficiency is significantly improved, making non-iterative collaborative scheduling of the virtual power plant and transmission network possible.
Description of the Drawings
[0042] Figure 1 is a flowchart of a virtual power plant multi-scale market collaborative bidding and scheduling method provided in one embodiment of this application.
Detailed Implementation
[0043] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments.
[0044] The virtual power plant multi-scale market collaborative bidding and scheduling method provided in this application, as shown in Figure 1, includes the following steps:
[0045] S1: Construct the cloud-cluster-device layered collaborative architecture and the standardized resource models, which includes the following steps:
[0046] S1.1 Establish a three-layer architecture of "cloud-cluster-terminal": The cloud platform layer is responsible for interacting with the power grid dispatch / trading center and global optimization; the cluster layer is the resource aggregation control layer, which consists of multiple resource clusters, each of which has independent aggregation, identification and deaggregation capabilities; the terminal layer is the terminal control unit for each distributed resource (wind power, photovoltaic, gas turbine, energy storage, transferable load, etc.).
[0047] S1.2 At the terminal layer, distributed resources with different characteristics are standardized and modeled, and abstracted into the following five general models: interruptible resources, continuously adjustable resources, tiered adjustable resources, movable resources, and energy storage-like resources; each resource model uses mixed integer linear constraints to describe its operating characteristics such as power upper and lower limits, energy timing coupling, and ramp rate.
[0048] Specifically, interruptible resources can discretely curtail all or part of their active power within a specific time period. Typical devices include non-core industrial loads and building air conditioning. The mathematical model is as follows: given the base power of the i-th interruptible resource at time t and a binary interruption decision variable (1 for interrupted, 0 for not interrupted), the actual power consumption is:
[0049]
[0050] The total interruption time within the scheduling cycle is limited by the maximum interruption duration:
[0051]
[0052] A minimum interval must also be maintained between adjacent interruption periods. The cost of dispatching this resource consists of the compensation paid while it is interrupted:
[0053]
[0054] where the coefficient is the compensation per unit of interrupted power;
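The formulas for paragraphs [0049] to [0053] are not reproduced in the text. A minimal form consistent with the prose, shown for full interruption only, might read as follows. The symbols are assumed notation: $P^{\mathrm{base}}_{i,t}$ is the base power, $u_{i,t}$ the interruption variable, $T^{\max}_{i}$ the maximum interruption duration, $c_{i}$ the unit compensation coefficient, and $\Delta t$ the period length.

```latex
\begin{aligned}
P_{i,t} &= P^{\mathrm{base}}_{i,t}\,(1 - u_{i,t}), \qquad u_{i,t} \in \{0, 1\}, \\
\sum_{t=1}^{T} u_{i,t}\,\Delta t &\le T^{\max}_{i}, \\
C_{i,t} &= c_{i}\,P^{\mathrm{base}}_{i,t}\,u_{i,t}\,\Delta t .
\end{aligned}
```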
[0055] Continuously adjustable resources can continuously regulate active power within their upper and lower output limits. Typical equipment includes gas turbines, photovoltaic inverters (with active power control functions), and wind power converters. Its mathematical model is as follows:
[0056] The output of the j-th continuously adjustable resource at time t is bounded by its lower and upper output limits:
[0057]
[0058] Ramp rate constraints between adjacent time periods:
[0059]
[0060] where the two limits are the maximum upward and downward ramp rates, respectively. The operating cost of this resource is represented by a linear function:
[0061]
[0062] where the two coefficients of the function are cost coefficients;
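The formulas for paragraphs [0057] to [0061] are not reproduced in the text. One form consistent with the prose is given below, with assumed symbols: $P_{j,t}$ is the output, $P^{\min}_{j}$ and $P^{\max}_{j}$ the output limits, $R^{\mathrm{up}}_{j}$ and $R^{\mathrm{down}}_{j}$ the ramp rates, and $a_{j}$, $b_{j}$ the cost coefficients.

```latex
P^{\min}_{j} \le P_{j,t} \le P^{\max}_{j}, \qquad
-R^{\mathrm{down}}_{j}\,\Delta t \le P_{j,t} - P_{j,t-1} \le R^{\mathrm{up}}_{j}\,\Delta t, \qquad
C_{j,t} = a_{j}\,P_{j,t} + b_{j}.
```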
[0063] Tiered adjustable resources can select their output among several discrete power levels. Typical equipment includes multi-level air conditioning units, multi-tap transformers, and some industrial equipment. The mathematical model is as follows: the output level of the k-th tiered adjustable resource is represented by a set of binary variables, one per level, where m is the number of levels and exactly one level can be selected at any given time:
[0064]
[0065] Each level corresponds to a fixed output power, so the actual output is:
[0066]
[0067] The number of level switches is limited by the maximum number of switches per day:
[0068]
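The formulas for paragraphs [0064] to [0068] are not reproduced in the text. A possible form, using assumed symbols, is sketched below: $z_{k,g,t}$ is 1 when level $g$ of resource $k$ is selected at time $t$, $P_{k,g}$ is the power of level $g$, and $N^{\max}_{k}$ is the daily switching limit. The way switches are counted here is one common linearizable choice and may not match the application.

```latex
\sum_{g=1}^{m} z_{k,g,t} = 1, \quad z_{k,g,t} \in \{0, 1\}, \qquad
P_{k,t} = \sum_{g=1}^{m} z_{k,g,t}\,P_{k,g}, \qquad
\sum_{t=2}^{T} \tfrac{1}{2} \sum_{g=1}^{m} \lvert z_{k,g,t} - z_{k,g,t-1} \rvert \le N^{\max}_{k}.
```

A single switch turns one indicator off and another on, so the inner sum equals 2 per switch and the factor of one half counts each switch once.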
[0069] Movable resources: These resources can shift their power consumption from one time period to another within a scheduling cycle, while the total energy consumption remains unchanged. Typical examples include industrial loads with adjustable production schedules and electric vehicle charging loads. The mathematical model is as follows:
[0070] Given the base power of the l-th movable resource at time t and a binary shift indicator variable, the actual power is:
[0071]
[0072] where one term is the power shifted out of time period t and the other is the power shifted into time period t from other time periods. The total energy consumption is conserved before and after shifting:
[0073]
[0074] The shifted power is limited by the maximum shift rate:
[0075]
[0076] Shifting must fall within the time window allowed by the schedule; that is, a shift operation is effective only within the preset shift window;
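The formulas for paragraphs [0071] to [0075] are not reproduced in the text. One form consistent with the prose is given below, with assumed symbols: $P^{\mathrm{out}}_{l,t}$ and $P^{\mathrm{in}}_{l,t}$ are the shifted-out and shifted-in power, $v_{l,t}$ is the binary shift indicator, $P^{\mathrm{shift,max}}_{l}$ is the maximum shift rate, and $\mathcal{W}_{l}$ is the allowed shift window.

```latex
\begin{aligned}
P_{l,t} &= P^{\mathrm{base}}_{l,t} - P^{\mathrm{out}}_{l,t} + P^{\mathrm{in}}_{l,t}, \\
\sum_{t=1}^{T} P^{\mathrm{out}}_{l,t} &= \sum_{t=1}^{T} P^{\mathrm{in}}_{l,t}, \\
0 \le P^{\mathrm{out}}_{l,t},\; P^{\mathrm{in}}_{l,t} &\le v_{l,t}\,P^{\mathrm{shift,max}}_{l}, \qquad
v_{l,t} = 0 \ \text{for } t \notin \mathcal{W}_{l}.
\end{aligned}
```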
[0077] Energy storage-like resources: These resources have energy timing coupling characteristics similar to energy storage systems, and can charge and discharge, or store and release energy, within a certain energy capacity range. Typical devices include battery energy storage systems, thermal storage tanks, and ice storage air conditioners. Their mathematical model is:
[0078] The power of the q-th energy storage-like resource at time t (with charging or energy storage as the positive direction) and its state of charge (or stored energy) satisfy:
[0079]
[0080] where the parameters are the charging and discharging efficiency, the duration of each scheduling period, and the self-dissipation power (usually negligible or linearly approximated). State-of-charge constraints:
[0081]
[0082] Charge and discharge power constraints (positive indicates charging, negative indicates discharging):
[0083]
[0084] To avoid the impact of deep charge and discharge on lifetime, a state-of-charge boundary protection constraint is added:
[0085]
[0086] where the offset is a safety margin. In addition, the state of charge at the beginning and end of the scheduling cycle must satisfy a consistency constraint:
[0087]
[0088] This ensures that the resource can be dispatched sustainably across scheduling cycles;
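The storage constraints described in paragraphs [0078] to [0087] can be checked on a candidate schedule with a short sketch. It assumes a single efficiency applied to signed power and a symmetric power limit, which is a simplification, and every name and number below is hypothetical.

```python
def soc_trajectory(power, soc0, eta, dt, loss=0.0):
    """State of charge after each period; power > 0 charges, power < 0 discharges.

    Assumed update: S[t+1] = S[t] + eta * P[t] * dt - loss * dt (single efficiency).
    """
    soc = [soc0]
    for p in power:
        soc.append(soc[-1] + eta * p * dt - loss * dt)
    return soc

def is_feasible(power, soc0, eta, dt, p_max, soc_min, soc_max, margin, tol=1e-9):
    """Check power limits, margin-protected SOC bounds, and start/end consistency."""
    soc = soc_trajectory(power, soc0, eta, dt)
    power_ok = all(-p_max <= p <= p_max for p in power)
    soc_ok = all(soc_min + margin - tol <= s <= soc_max - margin + tol for s in soc)
    cyclic_ok = abs(soc[-1] - soc0) <= tol
    return power_ok and soc_ok and cyclic_ok

# Hypothetical 4-period schedules (MW) for a 10 MWh unit starting at 5 MWh.
good = is_feasible([2.0, 2.0, -2.0, -2.0], 5.0, 1.0, 1.0, 3.0, 0.0, 10.0, 1.0)
bad = is_feasible([2.0, 2.0, 2.0, -2.0], 5.0, 1.0, 1.0, 3.0, 0.0, 10.0, 1.0)
```

The second schedule fails because it pushes the state of charge past the margin-protected upper bound and does not return to its starting value.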
[0089] S1.3. The standardized resource model is linked to the equipment class in the power system topology model through a unified modeling language to form an extensible resource model library;
[0090] S2: Based on improved modularity and market fit, the dynamic aggregation and external characteristic identification of virtual power plants specifically includes the following steps:
[0091] S2.1. Resource aggregate partitioning at the edge control layer based on improved modularity. At the cluster layer, this invention first dynamically aggregates distributed resources within the edge control layer according to the geographical distribution and network topology of flexible resources, forming several resource aggregates. Each aggregate is uniformly managed by an edge controller. To minimize communication costs and maximize the internal coupling tightness of the aggregates, this invention improves the traditional modularity index by introducing an inter-node communication cost factor, and constructs the following improved modularity optimization model:
[0092] Assume the power distribution network topology is an undirected graph whose node set N contains all access nodes and whose edge set B contains the branches. For any two nodes i and j, the communication cost is defined as the additional communication infrastructure and maintenance cost required to group node i and node j into the same aggregate:
[0093]
[0094] where the terms are: the geographical coordinates of node i and node j; the unit-distance communication cost coefficient (derived from the unit price of the communication medium used, such as optical fiber or 4G/5G modules, and construction costs); and the fixed configuration cost (including hardware procurement, software installation, and commissioning costs) of adding or expanding an edge controller within the aggregate. Two aggregate-level quantities are also defined: the sum of the communication costs between all flexible resources within the aggregate, and the sum of the communication costs between the flexible resource at node i and all other resources within the aggregate. A binary indicator variable is introduced that equals 1 when node i and node j are assigned to the same aggregate and 0 otherwise. The improved modularity index of the e-th aggregate is then defined as:
[0095]
[0096] This indicator represents the difference between the actual communication cost and the expected communication cost under random conditions. A larger difference indicates higher communication density between nodes within the aggregate and weaker communication coupling across aggregates. The optimization objective is to maximize the sum of improved modularity across all aggregates, with the aggregate partitioning scheme as the decision variable. With E denoting the set of aggregates and each aggregate e having its own set of member nodes, the optimization problem can be formulated as:
[0097]
[0098] where a predefined lower bound on modularity ensures sufficient cohesion in each aggregate. The optimization problem is solved using either the Louvain algorithm, which is based on a greedy strategy, or a spectral clustering algorithm to obtain the optimal resource aggregate partitioning result for the edge control layer. In this partitioning result, each aggregate contains several geographically proximate, low-communication-cost distributed resources, which are uniformly coordinated and controlled by a single edge controller, thus laying the foundation for subsequent upper-layer aggregation in the virtual power plant.
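The improved modularity formula itself is not reproduced in the text. As a point of reference only, the sketch below evaluates the standard weighted modularity of a given partition, which is the quantity the Louvain algorithm greedily increases. How the application folds communication cost into the pairwise weights cannot be recovered from the text, so the matrix `w` is a hypothetical pairwise weight and the function is not the application's exact index.

```python
def modularity(weights, partition):
    """Weighted modularity: same-group sum of w_ij - k_i*k_j/(2m), divided by 2m."""
    n = len(weights)
    degree = [sum(row) for row in weights]
    two_m = sum(degree)
    group = {node: g for g, members in enumerate(partition) for node in members}
    total = 0.0
    for i in range(n):
        for j in range(n):
            if group[i] == group[j]:
                total += weights[i][j] - degree[i] * degree[j] / two_m
    return total / two_m

# Hypothetical example: two tightly linked pairs joined by one weak link.
w = [[0, 5, 1, 0],
     [5, 0, 0, 0],
     [1, 0, 0, 5],
     [0, 0, 5, 0]]
q_pairs = modularity(w, [[0, 1], [2, 3]])
q_mixed = modularity(w, [[0, 2], [1, 3]])
```

The partition that keeps each tight pair together scores higher, which matches the "actual minus expected" reading of the index in paragraph [0096].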
[0099] S2.2. Dynamic aggregation of virtual power plants at the cloud platform layer based on market fit: After obtaining the resource aggregates at the edge control layer, this invention further combines these aggregates into virtual power plant entities participating in market transactions. The combination process is modeled as a dynamic coalition game. Each aggregate, as a game participant, obtains cooperative benefits by forming different coalitions (i.e., different virtual power plants). To quantify the market competitiveness of different coalitions, this invention constructs a market fit index as the characteristic function of a coalition S. This index comprehensively reflects how well the adjustability of resources within the coalition (including upward and downward power adjustment capacity, response speed, and adjustment duration) matches market demand.
[0100] Specifically, for each resource aggregate within coalition S, the four key adjustable capability indicators of all its internal flexible resources are first extracted: maximum upward adjustment power, maximum downward adjustment power, response latency, and response duration. To avoid the influence of different units of measurement, each indicator is normalized. The normalization formula is as follows:
[0101]
[0102] where X stands for any one of the four indicators. The market fit index is then constructed as follows:
[0103]
[0104] The physical meaning of this expression is: a virtual power plant's ability to participate in the market is directly proportional to its adjustable power capacity (both upward and downward), inversely related to its response latency (faster response is better), and directly proportional to its sustainable adjustment time. The optimization objective of the coalition game is to select the coalition partitioning scheme that maximizes the sum of the market fit indices of all virtual power plants. This problem can be solved using dynamic programming or branch and bound methods, ultimately yielding the dynamic aggregation result of the virtual power plant at the cloud platform layer. Since the number of aggregates is usually much smaller than the number of original distributed resources, this solution process can be completed in milliseconds, meeting the timeliness requirements of day-ahead scheduling.
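The normalization and index in paragraphs [0100] to [0104] can be illustrated with a small sketch. It assumes min-max normalization across the candidates and the product form described in the text, and omits the coalition search itself. The candidate names, the indicator values, and the handling of a constant column are hypothetical.

```python
def normalize(values):
    """Min-max normalization; a constant column maps to 0.0 (an assumed convention)."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def market_fit(up, down, delay, duration):
    """Product form: higher power and duration help, higher delay hurts."""
    return up * down * (1.0 - delay) * duration

# Hypothetical candidates: (up MW, down MW, delay s, duration h).
raw = {"A": (10.0, 8.0, 2.0, 4.0),
       "B": (6.0, 6.0, 1.0, 2.0),
       "C": (2.0, 4.0, 5.0, 1.0)}
names = list(raw)
# One normalized column per indicator, in the order up, down, delay, duration.
columns = [normalize([raw[n][k] for n in names]) for k in range(4)]
fit = {n: market_fit(*(col[i] for col in columns)) for i, n in enumerate(names)}
```

With the product form, a candidate that scores zero on any one indicator gets a fit of zero, so a coalition search would favor groupings that are balanced across all four capabilities.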
[0105] S2.3 Identification Method of External Characteristics of Virtual Power Plants: Based on the virtual power plants formed by the above dynamic aggregation, the dispatch external characteristics they exhibit are further identified. These external characteristics include three core elements: power baseline, maximum adjustable power characteristics, and ramp rate characteristics. These external characteristics will serve as the basis for bidding in the electricity market, demand response market, and ancillary services market for virtual power plants.
[0106] The power baseline is defined as the power exchange curve between the virtual power plant and the upstream grid via the tie line when the grid does not issue any regulation commands to the virtual power plant (i.e., the virtual power plant operates only in its own economically optimal mode). This scheme aims to minimize the total day-ahead operating cost of the virtual power plant and establishes the following optimization model to solve for the power baseline:
[0107]
[0108] Here, the index set is the collection of all dispatchable resources within the virtual power plant (including gas turbines, energy storage, interruptible loads, etc.). The cost term is the operating cost of the i-th resource at time t (using the linear cost function in S1), the price term is the forecast market electricity price, and the exchange term is the power exchanged between the virtual power plant and the grid via the tie line at time t (with outflow from the virtual power plant taken as the positive direction). The constraints include: the operational constraints of each resource (such as the five types of resource constraints described in S1), node power balance constraints, branch power flow constraints (using a linearized DistFlow model), and upper and lower bound constraints on the tie-line power. Solving this mixed-integer linear programming problem yields the optimal tie-line power curve, which is the power baseline of the virtual power plant;
[0109] Identification of Maximum Adjustable Power Characteristics: The maximum adjustable power characteristics describe the range over which power can be adjusted upward and downward relative to the power baseline over time while maintaining safe operation within the virtual power plant. This invention obtains the maximum upward adjustable power and the maximum downward adjustable power at time t by solving the following two optimization problems:
[0110]
[0111]
[0112] Here, the two tie-line power extremes are obtained by solving optimization problems whose objective functions respectively maximize and minimize the tie-line power, subject to all constraints of the power baseline model (but excluding the power baseline itself). To ensure the continuity of the adjustable range within the scheduling cycle, ramp constraints between adjacent time periods must be added to these optimization problems. After solving for all time periods, the upper and lower adjustable power boundary curves are obtained, and the band-shaped region enclosed by these boundaries is the power feasible region of the virtual power plant.
[0113] The ramp rate characteristic describes the maximum rate of change of the tie-line power of a virtual power plant between adjacent dispatch periods. The upward ramp rate is defined as the maximum increase in tie-line power from time t to time t+1, and the downward ramp rate as the maximum reduction. The identification model is as follows:
[0114]
[0115]
[0116] Here, the optimization variable is the tie-line power at time t+1. In addition to retaining all constraints from the power baseline identification, the optimization problem must add the ramp rate constraints of each resource within the virtual power plant (such as the ramp rate limits of the gas turbine and the maximum rate of change of the energy storage charge and discharge power). The virtual power plant is forced to operate at the power baseline point at time t and can be freely adjusted at time t+1. Solving this optimization problem yields the ramp rate boundary at each time;
[0117] During the external characteristic identification process described above, the time-period coupling constraints of the energy storage system (i.e., the recursive relationship between the states of charge in adjacent time periods) couple the optimization problems at different times with each other, resulting in a linear increase in solution complexity with the number of scheduling time periods. To solve this problem, this invention uses the Lyapunov optimization method in S4 below to decouple the time-period coupling constraints, which is not elaborated here. The identified external characteristics (power baseline, upper and lower adjustable power boundaries, ramp rate boundaries), together with the operating cost function of the virtual power plant (a piecewise linear function describing the relationship between regulation cost and regulation amount), constitute a complete aggregation model. This model is reported from the cluster layer to the cloud platform layer as input to the multi-market bidding optimization model in S3.
[0118] S3: A two-stage robust optimization bidding model based on a conditional value at risk (CVaR) polyhedral uncertainty set, with the following specific steps:
[0119] S3.1 Construction of Polyhedral Uncertainty Sets Based on Monte Carlo Scenario Generation and CVaR
[0120] At the cloud platform layer, when virtual power plants participate in day-ahead and demand response market bidding, they face the combined impact of three uncertainties: wind power output, photovoltaic output, and load power. To accurately characterize these uncertainties, this invention first uses the Monte Carlo method to generate scenarios based on historical forecast error data for wind power, photovoltaic power, and load.
[0121] The uncertainty in wind power output stems from random fluctuations in wind speed. Wind speed follows a two-parameter Weibull distribution with probability density function $f(\nu) = \frac{k}{c}\left(\frac{\nu}{c}\right)^{k-1}\exp\left[-\left(\frac{\nu}{c}\right)^{k}\right]$, where ν is the wind speed, k is the shape parameter, and c is the scale parameter. The relationship between wind power output and wind speed is described by a piecewise function: when the wind speed is lower than the cut-in wind speed or higher than the cut-out wind speed, the output power is zero; when the wind speed is between the cut-in wind speed and the rated wind speed, the output power increases linearly with the wind speed; and when the wind speed is between the rated wind speed and the cut-out wind speed, the output power remains at the rated power;
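As an illustration of the wind model in this paragraph, the following minimal sketch samples wind speeds from a Weibull distribution and maps them through the piecewise power curve. The function names and all numerical values (cut-in, rated, and cut-out speeds, rated power, Weibull parameters) are assumptions chosen for the example and do not come from this application.

```python
import random


def wind_power(v, v_in, v_rated, v_out, p_rated):
    """Piecewise power curve: zero outside [v_in, v_out], linear ramp
    between cut-in and rated speed, rated power up to cut-out speed."""
    if v < v_in or v > v_out:
        return 0.0
    if v < v_rated:
        return p_rated * (v - v_in) / (v_rated - v_in)
    return p_rated


def sample_wind_power(n, k, c, v_in, v_rated, v_out, p_rated, seed=0):
    """Draw n Weibull wind speeds (shape k, scale c) and convert to power."""
    rng = random.Random(seed)
    # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
    speeds = (rng.weibullvariate(c, k) for _ in range(n))
    return [wind_power(v, v_in, v_rated, v_out, p_rated) for v in speeds]


if __name__ == "__main__":
    # Hypothetical turbine: 3 / 12 / 25 m/s, 2 MW rated power.
    scenarios = sample_wind_power(20000, k=2.0, c=8.0, v_in=3.0,
                                  v_rated=12.0, v_out=25.0, p_rated=2.0)
    print(sum(scenarios) / len(scenarios))
```

The same pattern (sample the driving variable, then apply the conversion function) would apply to the photovoltaic and load models.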
[0122] The uncertainty in photovoltaic power output stems from random fluctuations in solar irradiance. Over a given period, solar irradiance follows a Beta distribution whose probability density function is expressed in terms of the irradiance r, the maximum irradiance, the two shape parameters of the Beta distribution, and the Gamma function. The photovoltaic output power is linear in the irradiance, with the proportionality determined by the photoelectric conversion efficiency and the area of the solar panel;
[0123] The uncertainty of the load power follows a normal distribution with probability density function $f(z) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(z-\mu)^{2}}{2\sigma^{2}}\right]$, where z is the load power, μ is the expected value, and σ is the standard deviation.
[0124] Based on the above probability distribution models, this invention uses the Monte Carlo method to generate 20,000 wind power output scenarios, 20,000 photovoltaic power output scenarios, and 20,000 load scenarios, each scenario corresponding to a set of prediction error values. On this basis, conditional value at risk (CVaR) theory is introduced to construct a polyhedral uncertainty set. Given the predicted wind power, the predicted photovoltaic power, and the predicted load power at a certain moment, the uncertainty in the three corresponding actual values is uniformly expressed as:
[0125]
[0126] Here, the product operator denotes the Hadamard product (element-wise multiplication). The three CVaR terms are the CVaR values for wind power, photovoltaic power, and load, respectively, calculated using historical prediction error data and the loss price at a given confidence level β; their physical meaning is the maximum prediction error boundary that the virtual power plant can withstand at this confidence level. The three remaining terms are the uncertainty parameters, each taking values in the interval [-1, 1].
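The CVaR-scaled error bound described above can be sketched as follows. This is a minimal illustration assuming an empirical CVaR defined as the mean of the largest (1 - β) fraction of absolute forecast errors; the helper names `cvar` and `realize`, the Gaussian error model, and the numbers are assumptions for the example, not the estimator prescribed by this application.

```python
import random


def cvar(losses, beta):
    """Empirical CVaR: mean of the worst (1 - beta) share of the losses.
    With beta = 1.0 the tail shrinks to the single largest loss."""
    ordered = sorted(losses)
    tail_size = max(1, round((1.0 - beta) * len(ordered)))
    tail = ordered[-tail_size:]
    return sum(tail) / len(tail)


def realize(forecast, bound, u):
    """Actual value = forecast + bound * u, element by element (Hadamard
    product), with every entry of u expected to lie in [-1, 1]."""
    return [f + b * x for f, b, x in zip(forecast, bound, u)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical history of absolute forecast errors (MW).
    errors = [abs(rng.gauss(0.0, 0.5)) for _ in range(20000)]
    for beta in (0.70, 0.90, 0.98, 1.00):
        print(beta, round(cvar(errors, beta), 3))
```

Raising β shrinks the tail that is averaged, so the bound grows toward the largest historical deviation, which matches the qualitative behaviour described for the confidence level.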
[0127] Based on the above expression, the polyhedral uncertainty set at time t can be described as:
[0128]
[0129] Here, the two norms are the infinity norm and the 1-norm of the vector of uncertainty parameters, and the remaining symbol is the uncertainty parameter that sizes the set. This uncertainty set employs a joint constraint of the 1-norm and the infinity norm, which both limits the extreme fluctuation range of each individual uncertain parameter and controls the cumulative fluctuation of the uncertain parameters as a whole, thus achieving a flexible balance between robustness and economy. By adjusting the confidence level β, this invention can dynamically control the size of the uncertainty set: the closer β is to 1.00, the closer the considered uncertainty scenario is to the worst-case scenario in the historical data, and the more conservative the robust optimization result; when β decreases, the uncertainty set shrinks accordingly, improving the economy of the robust optimization result. Compared with traditional uncertainty sets that use a fixed percentage or the historical maximum deviation, the CVaR-based polyhedral uncertainty set constructed in this invention makes full use of all historical data rather than only the extreme values, thereby effectively reducing the conservatism of robust optimization while ensuring feasibility.
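The joint norm constraint can be illustrated with a small membership check. This sketch assumes the set has the common form "infinity norm at most 1 and 1-norm at most a budget"; the name `budget` and the sample vectors are assumptions for the example.

```python
def in_uncertainty_set(u, budget):
    """Return True if u satisfies both norm limits:
    infinity norm <= 1 (each parameter individually bounded) and
    1-norm <= budget (cumulative deviation bounded)."""
    inf_norm = max(abs(x) for x in u)
    one_norm = sum(abs(x) for x in u)
    return inf_norm <= 1.0 and one_norm <= budget


if __name__ == "__main__":
    # Wind, photovoltaic, and load deviation parameters (hypothetical).
    print(in_uncertainty_set([1.0, 0.5, 0.0], budget=2.0))  # inside
    print(in_uncertainty_set([1.0, 1.0, 1.0], budget=2.0))  # cumulative limit hit
    print(in_uncertainty_set([1.2, 0.0, 0.0], budget=2.0))  # individual limit hit
```

The second vector is the corner of the plain box set; excluding such all-worst-case corners is what makes the joint set less conservative than a box.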
[0130] S3.2. A Two-Stage Robust Optimization Model for Virtual Power Plants Participating in the Energy Market and Demand Response Market. Based on the aforementioned uncertainty set, this invention establishes a two-stage robust optimization model for virtual power plants participating in the energy market and demand response market. This model divides the day-ahead scheduling decisions of virtual power plants into two stages: the first stage is a "here-and-now" decision, and the second stage is a "wait-and-see" decision. The decision variable x in the first stage is an integer variable that must be determined before the uncertainty is realized. It includes: the start-stop state of the gas turbine (0-1 variable), the charging and discharging states of the energy storage system (0-1 variables, constrained so that charging and discharging cannot occur simultaneously), and the power purchase and sale states of the virtual power plant in the electricity market (0-1 variables, subject to their own state constraint). Once determined, these variables cannot be changed after the uncertainty scenario actually occurs; they must therefore be evaluated over the uncertainty set so that the decision remains feasible throughout the entire set;
[0131] The decision variable y in the second stage is a continuous variable that can be adjusted according to the specific scenario after the uncertainty is realized. It includes: the actual active power output of the gas turbine at each moment (subject to upper and lower output limits and ramp rate constraints); the actual charging and discharging power of the energy storage system (subject to upper and lower limits on charge and discharge power and to the temporal coupling constraints on the state of charge); the actual power purchased and sold in the electricity market (constrained by the corresponding purchase and sale state variables); the actual power consumption of transferable loads (subject to its own limits); and the power reported to the demand response market (constrained so that the virtual power plant can participate in demand response only when it participates in the market as a load). These variables can be flexibly adjusted after the uncertainty scenario actually occurs in order to absorb the impact of forecast deviations.
[0132] The total operating cost of a virtual power plant consists of four parts: the power generation cost of the gas turbine, the charging and discharging cost of the energy storage system, the dispatching cost of transferable loads, and the net cost of participating in the electricity market and demand response market. The objective function adopts a two-stage robust optimization form of "minimizing the maximum cost," which can be compactly expressed as:
[0133]
[0134] Here, the first term is the fixed cost incurred by the first-stage decision (mainly gas turbine start-up and shutdown costs and energy storage state maintenance costs). The second term is the second-stage variable cost, defined as the minimum sum of operating costs and market transaction costs that the virtual power plant can achieve by optimizing the second-stage variable y under the worst-case uncertainty scenario. The specific expression of the second-stage variable cost is:
[0135]
[0136] Here, the terms are, in order: the gas turbine generation cost; the energy storage charging and discharging cost; the transferable load dispatching cost, which is transformed into a linear form by introducing auxiliary variables; and the net cost of market transactions. In the last term, one price is the forecast market electricity price and the other is the bid submitted by the virtual power plant in the demand response market (the bid is determined by minimizing the maximum cost per unit of electricity purchased). The goal of the virtual power plant is to minimize the total cost by optimizing the first-stage integer variable x and the second-stage continuous variable y, while ensuring that the scheduling scheme remains feasible under the worst-case uncertainty scenario and that all constraints are met.
[0137] S3.3 Model Solving Based on the Column-and-Constraint Generation Algorithm: The two-stage robust optimization model has a three-level nested "min-max-min" structure and cannot be solved by calling a commercial solver directly. The column-and-constraint generation (C&CG) algorithm is therefore used to decompose the model into a main problem and a subproblem that are solved iteratively, and Lagrange duality theory is used to transform the two-level max-min structure of the subproblem into a single-level problem.
[0138] The solution process for the C&CG algorithm is as follows:
[0139] Step 1: Initialization. Select a set of values from the uncertainty set as the initial worst-case scenario (for example, the case in which all uncertain parameters take their maximum values). Set the initial lower bound, upper bound, and iteration count;
[0140] Step 2: Solve the main problem. The main problem is in the following form:
[0141]
[0142]
[0143]
[0144]
[0145] Here, the auxiliary decision variable approximates the second-stage variable cost; the scenario term is the worst-case uncertainty scenario returned by the subproblem in the i-th iteration; and the associated second-stage variable is the second-stage decision corresponding to that scenario. In essence, the main problem determines the first-stage decision variable x and an estimate of the second-stage cost that holds for the worst-case scenarios identified so far. Because only these scenarios are considered, the main problem is a relaxation of the full model. Solving the main problem yields the optimal solution and the objective function value, and the lower bound is updated with this value;
[0146] Step 3: Solve the subproblem. The subproblem has the following form:
[0147]
[0148] The subproblem receives the first-stage decision variables obtained from the main problem and computes the minimum second-stage variable cost under the worst-case uncertainty scenario. Because the subproblem has a two-level "max-min" structure, it is difficult to solve directly. Lagrange duality theory is therefore used to replace the inner min problem with its dual, which turns the subproblem into a single-level maximization problem. Specifically, dual variables are introduced for all linear inequality and equality constraints of the second-stage problem, and the equivalent form of the subproblem is obtained through the strong duality theorem:
[0149]
[0150]
[0151] Here, all elements of the sign-constrained dual variables are non-negative. However, the above expression contains a nonlinear term: the uncertainty variable is itself to be optimized and is multiplied by dual variables, so their product is a bilinear term. To handle this nonlinearity, this invention uses the Big-M method to linearize the bilinear term by introducing auxiliary variables, so that the subproblem is ultimately transformed into a mixed-integer linear programming problem. The transformed subproblem can then be solved directly using commercial solvers (such as Gurobi or CPLEX). Solving the subproblem yields its objective function value and the corresponding worst-case uncertainty scenario, and the objective function value is used to update the upper bound;
[0152] Step 4: Convergence assessment and iteration. Set the convergence threshold to a small positive tolerance. If the gap between the upper bound and the lower bound is within the threshold, the algorithm is deemed to have converged, and the current optimal first-stage solution and its corresponding second-stage scheduling scheme are returned. Otherwise, the worst-case scenario returned by the subproblem and its corresponding second-stage optimal variables are added as a new column to the main problem (i.e., new variables and constraints are added to the main problem), the iteration count is incremented, and the algorithm returns to Step 2 to continue iterating;
[0153] The core advantage of the C&CG algorithm described above is that, although the size of the main problem grows with the number of iterations, each newly added column (i.e., the new worst-case scenario and the corresponding second-stage variables) tightens the approximation of the second-stage cost, thus ensuring that the algorithm converges to the global optimum within a finite number of iterations. Compared to the traditional Benders decomposition method, the C&CG algorithm retains both the second-stage variables and the uncertainty scenarios in the main problem, so it captures the coupling between the first-stage decision and the second-stage response more accurately and therefore converges faster;
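The bound-updating logic of Steps 1 to 4 can be sketched on a deliberately tiny example. In this sketch the main problem and the subproblem are solved by enumeration over a handful of options rather than by a mixed-integer solver with dualization and Big-M linearization, and the second-stage cost has a closed form; the single gas turbine, the scenario pool, and all cost figures are assumptions made up for the illustration.

```python
def second_stage_cost(x, u, d_hat=80.0, dev=30.0, cap=100.0,
                      c_gas=20.0, c_mkt=50.0):
    """Cheapest way to serve demand d_hat + dev*u: use the gas turbine
    (only if committed, x = 1) up to its capacity, buy the rest."""
    demand = d_hat + dev * u
    gas = min(demand, cap * x)
    return c_gas * gas + c_mkt * (demand - gas)


def ccg(first_stage_options=(0, 1), scenario_pool=(-1.0, 0.0, 1.0),
        startup_cost=500.0, tol=1e-6, max_iter=20):
    """Column-and-constraint generation with enumerated sub-solvers."""
    scenarios = [0.0]                      # Step 1: initial scenario
    lower, upper = float("-inf"), float("inf")
    best_x = None
    for iteration in range(1, max_iter + 1):
        # Step 2: main problem over the scenarios collected so far.
        master_value, x_star = min(
            (startup_cost * x
             + max(second_stage_cost(x, u) for u in scenarios), x)
            for x in first_stage_options)
        lower = max(lower, master_value)
        # Step 3: subproblem, worst scenario for the fixed x_star.
        worst_value, worst_u = max(
            (second_stage_cost(x_star, u), u) for u in scenario_pool)
        candidate = startup_cost * x_star + worst_value
        if candidate < upper:
            upper, best_x = candidate, x_star
        # Step 4: convergence check, otherwise add the new scenario.
        if upper - lower <= tol:
            return best_x, upper, iteration
        scenarios.append(worst_u)
    raise RuntimeError("C&CG did not converge within max_iter")


if __name__ == "__main__":
    print(ccg())
```

In this toy case the first iteration commits the turbine against the nominal scenario, the subproblem returns the high-demand scenario, and the second iteration closes the gap between the bounds.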
[0154] By solving the two-stage robust optimization model described above, the cloud platform layer obtains the optimal bidding scheme of the virtual power plant in the day-ahead electricity market and the demand response market (including the declared electricity volume and price curves for each market in each time period), and then distributes the scheme to the cluster layer and the device layer for execution. If the confidence level β is 1.00, the uncertainty set constructed in this invention degenerates into a traditional box uncertainty set based on the historical maximum deviation; when β is between 0.70 and 0.98, the uncertainty set shrinks significantly, which can reduce the conservatism of the robust optimization result by about 2%. Virtual power plant operators can therefore adjust the confidence level β according to their own risk preferences and their degree of confidence in the historical data, making a reasonable trade-off between robustness and economy.
[0155] S4: Equivalent projection of the feasible region with decoupling of the time-period coupling, and non-iterative coordination with the transmission network, with the following specific steps:
[0156] S4.1. Lyapunov-based decoupling of the energy storage time-period coupling constraints: In the external characteristic identification process of S2 mentioned above, the recursive relationship of the state-of-charge time series of the energy storage system introduces cross-time-period coupling constraints. These constraints make the decision variables of different scheduling periods interdependent and turn the feasible region of the virtual power plant into a complex polyhedron in a high-dimensional space, which significantly increases the computational burden of the vertex search method. To address this issue, this invention employs Lyapunov optimization theory to decouple the time-period coupling constraints.
[0157] Specifically, for the energy storage system at node i in the k-th virtual power plant, the virtual queue state variable is defined as:
[0158]
[0159] Here, the first term is the actual state of charge of the energy storage system at time t, and the second term is an offset related to the current vertex search direction. During the vertex search, the offset varies with the search direction: its value is determined by a piecewise rule on the horizontal coordinate of the search direction, taking one value in the first case and another value in the second case. These values are expressed in terms of the minimum state of charge, the maximum charge and discharge power of the energy storage, and a weighting coefficient. This design of the offset allows the virtual queue to adjust adaptively to the search direction and thereby meet the decoupling requirements under different boundary conditions;
[0160] The temporal evolution of the virtual queue is described by the following formula:
[0161]
[0162] The physical meaning of this formula is that the increment of the virtual queue equals the net charge of the energy storage system within a unit time period (charge amount minus discharge amount). Accumulating the above formula from t=0 to t=T-1, dividing by the period T, and combining the result with the constraint that the net charge over the energy storage scheduling period is zero shows that the net cumulative change in the virtual queue is zero. This means that the original energy storage time-period coupling constraints can be equivalently transformed into a stability problem for the virtual queue.
[0163] Based on this, the Lyapunov drift penalty function is introduced. This function measures the change in the congestion level of the virtual queue between adjacent time periods. The objective function for vertex search is rewritten as:
[0164]
[0165] The first term is the original vertex search objective, which maximizes the projected value along the search direction. The last two terms are Lyapunov drift penalty terms, used to suppress excessive fluctuations in the virtual queue. Through the above transformation, the optimization problem, which originally had cross-time-period coupling constraints, is decoupled into single-time-period linear programming problems that can be solved independently at each time step. The decision in each time step depends only on the current virtual queue state, without needing to anticipate future uncertainties. After this decoupling, the optimization problems in each time step of the vertex search process are independent of each other, which significantly reduces the computational complexity of the feasible region projection and lays the algorithmic foundation for the subsequent vertex search.
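The per-period decision rule that results from this kind of decoupling can be sketched with a generic drift-plus-penalty controller. This is not the objective of this application (whose search-direction term and offset rule are not reproduced here): the sketch assumes a simple price-driven penalty term, a queue that grows by the net charge, and a weight `v_weight`, all of which are assumptions for the example.

```python
def queue_update(q, charge, discharge, dt=1.0):
    """Virtual queue increment = net charged energy in the period."""
    return q + (charge - discharge) * dt


def per_slot_decision(q, price, v_weight, p_max):
    """Minimise v_weight*price*p + q*p over p in [-p_max, p_max],
    where p > 0 means charging. The objective is linear in p, so the
    optimum sits at one end of the interval."""
    slope = v_weight * price + q
    return -p_max if slope > 0 else p_max


def simulate(prices, q0, v_weight, p_max):
    """Run the single-period rule over a price sequence."""
    q, path = q0, []
    for price in prices:
        p = per_slot_decision(q, price, v_weight, p_max)
        q = queue_update(q, max(p, 0.0), max(-p, 0.0))
        path.append((p, q))
    return path


if __name__ == "__main__":
    for step in simulate([2.0, 10.0, 2.0, 10.0], q0=-5.0,
                         v_weight=1.0, p_max=1.0):
        print(step)
```

Each decision uses only the current queue value and the current price, which is the property that lets the periods be solved independently.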
[0166] S4.2. Linearization of the objective function based on epigraph theory: During the vertex search process, the operating cost function of the virtual power plant is usually expressed as a linear function of the gas turbine output, where a is the cost coefficient and b is a constant term. However, when a virtual power plant aggregates multiple distributed resources, the total cost function may exhibit piecewise linear or even nonlinear characteristics. Embedding it directly into the feasible region projection model makes the optimization problem nonlinear and harder to solve. Therefore, epigraph theory is introduced to linearize the objective function.
[0167] The epigraph of a function is the set of all points on or above its graph; that is, for a function $f$, $\operatorname{epi} f = \{(x, z) : z \ge f(x)\}$. Using this definition, the nonlinear part of the operating cost objective of the virtual power plant is replaced with an auxiliary variable, and the following linear constraints are added:
[0168]
[0169] The above constraints require that the auxiliary cost variable of each distributed resource be no less than the value of its linear cost function at the actual output, and that the sum of the resource costs not exceed the auxiliary variable that represents the upper bound of the operating cost of the virtual power plant at time t. After the epigraph transformation, the nonlinear cost term in the original objective function becomes a set of linear inequality constraints, and the objective function itself reduces to a linear sum of the auxiliary variables. This transformation preserves the convexity of the problem, so the subsequent vertex search and projection can be completed within a linear programming framework without a nonlinear solver.
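The effect of the epigraph constraints can be checked numerically for a convex piecewise linear cost. In this minimal sketch the cost is the maximum of several affine pieces, each given as a (slope, intercept) pair; the pieces and the helper names are assumptions for the example.

```python
def epigraph_feasible(p, z, pieces, eps=1e-9):
    """True if (p, z) satisfies z >= a*p + b for every affine piece,
    i.e. the point lies in the epigraph of the piecewise linear cost."""
    return all(z >= a * p + b - eps for a, b in pieces)


def tightest_z(p, pieces):
    """Smallest auxiliary value allowed by the epigraph constraints.
    Minimising z subject to them recovers the cost itself."""
    return max(a * p + b for a, b in pieces)


if __name__ == "__main__":
    # Hypothetical three-segment cost curve (slope, intercept).
    pieces = [(1.0, 0.0), (2.0, -10.0), (4.0, -50.0)]
    for p in (5.0, 15.0, 30.0):
        print(p, tightest_z(p, pieces))
```

Because a minimizing solver pushes the auxiliary variable down onto the highest active piece, the linear constraints reproduce the nonlinear cost exactly when that cost is convex.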
[0170] S4.3. Equivalent projection characterization of the feasible region of the virtual power plant based on vertex search method: After completing the decoupling of time-period coupling constraints and the linearization of the objective function, this invention uses the vertex search method to project the high-dimensional operation model of the virtual power plant onto a two-dimensional plane with exchange power and operating cost as coordinates, forming an equivalent projection model described by a set of linear inequalities.
[0171] The vertex search method solves a series of linear programming problems along multiple directions within the two-dimensional projection plane to determine all vertices of the projected polygon, and then obtains the complete projected region using a convex hull algorithm. Specifically, the projection variable consists of two components: the power exchanged between the virtual power plant and the transmission network at time t, and the corresponding operating cost. For the m-th search direction, the following optimization problem is solved:
[0172]
[0173] The constraints include: all distributed resource operation constraints established in S1 (power upper and lower limits, ramp rate, energy timing coupling, etc. for the five types of resources), the external characteristic constraints identified in S2 (node power balance, power flow constraints), the virtual queue stability constraints after decoupling in S4.1, and the epigraph constraints after linearization in S4.2. The above optimization problem is a linear programming problem, which can be solved efficiently using the simplex method or an interior point method.
[0174] After successively solving for the optimal values in M uniformly distributed search directions, a set of candidate vertices on the projection plane is obtained. After removing non-extreme points (i.e., points lying within the region bounded by the lines connecting other vertices), the remaining vertices are sorted by angle, and a convex hull algorithm (such as Graham's scan or Andrew's monotone chain algorithm) is used to generate a convex polygon. This convex polygon is the equivalent projected feasible region of the virtual power plant at time t, which can be represented exactly by a set of linear inequalities:
[0175]
[0176]
[0177] Here, the three coefficients are those of the i-th boundary line, and the index i runs over the number of boundary lines. It should be noted that all constraints in the above projection model are linear and contain no time-series coupling variables. Therefore, the projected feasible region at each time step can be calculated independently, without a joint solution across time periods. Compared to the traditional method of directly projecting high-dimensional constraints using Fourier-Motzkin elimination, the two-stage decoupling and vertex search method adopted in this invention avoids generating a large number of redundant constraints during elimination, significantly improving computational efficiency. Examples show that characterizing the VPP feasible region projection over 24 time periods with this method takes approximately 102 seconds, while the traditional method takes over 1000 seconds under the same conditions.
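The vertex search, convex hull, and inequality description can be sketched as follows. To stay self-contained, the sketch replaces each linear program by picking the best point from a finite cloud of feasible (exchange power, cost) points, which is an assumption made only for illustration; the hull routine is Andrew's monotone chain, and each hull edge is converted to an inequality of the form a*x + b*y <= c.

```python
import math


def vertex_search(points, m):
    """Stand-in for the M directional LPs: for each of m evenly spaced
    directions, keep the point with the largest projection."""
    found = set()
    for j in range(m):
        angle = 2.0 * math.pi * j / m
        d = (math.cos(angle), math.sin(angle))
        found.add(max(points, key=lambda p: d[0] * p[0] + d[1] * p[1]))
    return sorted(found)


def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns vertices counter-clockwise and
    drops interior and collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def halfspaces(hull):
    """One inequality a*x + b*y <= c per edge of a counter-clockwise hull."""
    rows = []
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        a, b = y2 - y1, -(x2 - x1)      # outward normal of the edge
        rows.append((a, b, a * x1 + b * y1))
    return rows


def contains(rows, point, eps=1e-9):
    return all(a * point[0] + b * point[1] <= c + eps for a, b, c in rows)


if __name__ == "__main__":
    cloud = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
    hull = convex_hull(vertex_search(cloud, 8))
    print(hull, halfspaces(hull))
```

In a real implementation the `max` over a point cloud would be replaced by a linear program over the decoupled single-period model.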
[0178] S4.4. Non-iterative collaborative scheduling of virtual power plants and transmission networks based on feasible region projection: After replacing the original high-dimensional VPP detailed model with the equivalent projection model of each virtual power plant obtained in the above steps and reporting it to the transmission network dispatch center, the dispatch optimization problem of the transmission network can be described in the following form:
[0179]
[0180] The constraints include: upper and lower output limits of the conventional generating units in the transmission network, ramp rate constraints, node power balance constraints, line transmission capacity constraints, and the equivalent projection constraints of each virtual power plant together with the exchange power relation (that is, the exchange power of a virtual power plant equals the total internal load power minus the total internal generating power). The above optimization problem is a linear programming problem, and its number of variables is only the number of virtual power plants multiplied by the number of time periods plus the number of conventional unit variables, which is much smaller than the total number of distributed resource variables in integrated scheduling;
[0181] Solving the above transmission network dispatch model yields the optimal exchange power command for each virtual power plant and the corresponding operating cost. The power grid dispatch center sends these commands to the cloud platform of each virtual power plant. Upon receiving its exchange power command, each virtual power plant independently solves its own internal resource scheduling problem, using the command as a boundary condition. This subproblem is solved under its own objective function, and its constraints include all resource operation constraints from S1, the external characteristic constraints identified in S2, and a newly added equality constraint on the exchange power. Since the subproblems of the virtual power plants are independent of each other, they can be solved in parallel, significantly reducing the overall computation time;
[0182] S5: Multi-timescale rolling correction and constraint-aware deep reinforcement learning bidding, specifically including the following steps:
[0183] S5.1. A method for revising intraday bidding plans based on rolling time windows: During the day-ahead scheduling phase, the virtual power plant formulates bidding plans for each market period of the following day (including the declared power curve for the electricity market, the declared capacity for the demand response market, and the bid curve for the reserve market) based on the two-stage robust optimization model established in S3. However, since the prediction errors of wind power output, photovoltaic power output, and load power decrease as the prediction horizon shortens, the bidding plans determined in the day-ahead phase cannot fully match the actual intraday operating conditions. To take full advantage of the accuracy of ultra-short-term prediction, this invention introduces a rolling time window optimization method in the intraday phase to dynamically revise the day-ahead bidding plan;
[0184] Assume a daily scheduling period of 24 hours and a time resolution of 1 hour. Intraday rolling optimization uses a 15-minute scheduling step and a rolling window of one hour, so each optimization covers the scheduling plan for the next hour starting from the current moment. At each intraday optimization time n, the cloud platform layer of the virtual power plant obtains, from the Supervisory Control and Data Acquisition (SCADA) system and the ultra-short-term forecasting module, the latest forecasts of wind power output, photovoltaic power output, and load power, together with rolling forecasts of the real-time market clearing electricity prices, for the periods within the current rolling window. Based on this, an intraday rolling optimization model is established whose objective function maximizes the revenue of the virtual power plant from the real-time electricity market and the reserve market within the current rolling window, as expressed below:
[0185]
[0186] Here, the decision variables are the power and the capacity declared by the virtual power plant in the real-time electricity market and the reserve market, respectively. The cost term is the overall operating cost function of the virtual power plant, expressed as the piecewise linear cost function obtained in S4 through the epigraph transformation and projection. The constraints of the optimization problem include the external characteristic constraints of the virtual power plant identified in S2 (i.e., the upper and lower power limit constraints and the ramp rate constraints) and the operational constraints of each distributed resource established in S1. Because the rolling window contains only four time periods, this optimization problem is much smaller than the day-ahead scheduling model and can be solved in milliseconds. Rolling optimization follows the strategy of "executing only the optimization result of the current time period and updating in the next time period": after the optimal bidding plan within the rolling window is solved at time n, only the decision for the first time period (i.e., time n) is taken as the actual market bid volume and submitted to the trading center; on entering the next time period n+1, the rolling window is rebuilt from the latest forecast data and a new optimal bidding plan is solved. This strategy allows the bidding plan of the virtual power plant to be corrected continuously as the forecast information is updated, thereby effectively reducing the deviation between the day-ahead plan and the actual intraday operation and improving the market returns of the virtual power plant.
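The "solve the window, execute only the first period" loop can be sketched as below. The window solver here is a greedy stand-in, not the intraday optimization model of this application: it bids at the upper limit when the forecast price exceeds an assumed marginal cost and at the lower limit otherwise, within a ramp limit. The `forecast` callable, the prices, and all limits are assumptions for the example.

```python
def solve_window(prices, prev_power, marginal_cost, p_min, p_max, ramp):
    """Greedy stand-in for the window optimization: walk through the
    window and move toward the profitable limit within the ramp limit."""
    plan, power = [], prev_power
    for price in prices:
        lo = max(p_min, power - ramp)
        hi = min(p_max, power + ramp)
        power = hi if price > marginal_cost else lo
        plan.append(power)
    return plan


def rolling_bids(forecast, n_steps, window, marginal_cost,
                 p_min, p_max, ramp, initial_power=0.0):
    """At every step, re-solve the window from fresh forecasts and
    submit only the first period of the resulting plan."""
    executed, power = [], initial_power
    for n in range(n_steps):
        prices = forecast(n, window)      # latest forecast at time n
        plan = solve_window(prices, power, marginal_cost,
                            p_min, p_max, ramp)
        power = plan[0]                   # only the first period is executed
        executed.append(power)
    return executed


if __name__ == "__main__":
    actual = [30.0, 10.0, 25.0, 5.0]

    def perfect_forecast(n, window):
        # Hypothetical forecaster that simply reads the actual prices.
        return actual[n:n + window]

    print(rolling_bids(perfect_forecast, len(actual), window=4,
                       marginal_cost=20.0, p_min=0.0, p_max=5.0, ramp=3.0))
```

The executed power is carried into the next window as its starting point, which is how ramp limits link consecutive windows even though each window is solved separately.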
[0187] S5.2 Markov Decision Process Modeling of the Day-ahead-Real-Time Two-Stage Bidding Problem: The day-ahead-real-time two-stage bidding decision problem described in S3 and S5.1 is essentially a process by which a virtual power plant maximizes its revenue through sequential decision-making in an uncertain electricity market environment. To introduce a data-driven intelligent decision-making method, this invention models the above problem as a Markov decision process. A Markov decision process consists of four basic elements: a state space, an action space, state transition probabilities, and a reward function. Its core assumption is that the state at the next moment depends only on the state at the current moment and the action taken, and is independent of the past.
[0188] The state space S is defined as follows. In scheduling period t, the state observed by the virtual power plant agent contains six categories of feature variables: (1) power grid topology and power flow information, namely the voltage amplitude and phase angle at the virtual power plant access node and the active power flow of each connected branch; (2) the operating status of each distributed resource, including the current output of the gas turbine, the current state of charge of the energy storage, and the current actual output of photovoltaic and wind power; (3) the dispatch decision results of the previous period, including the actual winning volume in the electricity market, the actual winning volume in the reserve market, and the declared volume in the demand response market; (4) market environment information, including the real-time market electricity price for the current period, the reserve market electricity price, and the electricity price forecast series for future periods; (5) the operating cost status of the virtual power plant, i.e., its total operating cost in the current period; and (6) an uncertainty scenario indicator variable for the current period, which identifies whether the system is currently in a predefined unfavorable scenario (such as a sharp drop in wind and solar power output caused by extreme weather).
[0189] The action space A is defined as follows. The decision actions that the agent must take in each time period include: (1) the power declared in the real-time electricity market, a continuous variable restricted to its admissible value range; (2) the capacity declared in the real-time reserve market, restricted to its admissible value range; (3) the power declared in real time in the demand response market, restricted to its admissible value range; and (4) power adjustment commands for internal resources, i.e., the output change of each distributed resource. This action space contains both continuous decision variables (power, capacity) and discrete decision variables (such as switching the energy storage between charging and discharging), and therefore constitutes a hybrid action space.
[0190] The state transition probability describes the probability of moving to the next state after a given action is taken in the current state. Because of the randomness of the electricity market environment and of renewable energy output, this probability is uncertain. It does not need to be modeled explicitly; in reinforcement learning it is learned implicitly through interaction with the environment and sampling.
[0191] The design of the reward function is key to Markov decision process modeling. The immediate reward defined in this invention comprises three parts: (1) a market revenue reward, i.e., the revenue obtained by the virtual power plant in the real-time electricity market and the reserve market; (2) an operating cost penalty, i.e., the operating cost of the resources within the virtual power plant; and (3) a deviation assessment penalty, i.e., the assessment fee incurred when the deviation between the actual power of the virtual power plant and the day-ahead winning bid plan exceeds the allowable range, which is scaled by a deviation assessment coefficient. Combining these three parts, the instantaneous reward function at time t is:
[0192]
[0193] The agent's optimization objective is to maximize the cumulative discounted reward over the entire scheduling cycle, in which a discount factor balances short-term and long-term returns. Through the Markov decision process modeling described above, the bidding problem of the virtual power plant is transformed into a sequential decision optimization problem solvable within a reinforcement learning framework. The definitions of its state space and action space lay the foundation for the subsequent training and online deployment of a constraint-aware deep reinforcement learning solver.
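The three-part reward and the discounted objective can be sketched as follows. The exact expressions are not reproduced in this text, so the form below is an assumption: the deviation penalty is taken as linear in the amount by which the absolute deviation exceeds an allowed band, and all parameter names (`allowed_deviation`, `deviation_coeff`, `gamma`) are hypothetical.

```python
def instant_reward(revenue, operating_cost, actual_power, day_ahead_power,
                   allowed_deviation, deviation_coeff):
    """Reward = market revenue - operating cost - deviation assessment penalty."""
    # Only the part of the deviation beyond the allowed band is penalized (assumed form).
    excess = max(0.0, abs(actual_power - day_ahead_power) - allowed_deviation)
    return revenue - operating_cost - deviation_coeff * excess


def discounted_return(rewards, gamma):
    """Cumulative discounted reward: sum over t of gamma**t * rewards[t]."""
    total = 0.0
    for r in reversed(rewards):  # backward recursion G_t = r_t + gamma * G_{t+1}
        total = r + gamma * total
    return total
```

For example, with a revenue of 100, an operating cost of 30, a 2-unit deviation, a 1-unit allowed band, and a coefficient of 5, the reward is 100 - 30 - 5 = 65.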
[0194] S5.3, Mixed Integer Programming-Deep Q-Network Solution: Traditional deep reinforcement learning methods for solving power system dispatching problems typically add security constraints (such as power balance constraints, line flow constraints, and energy storage state of charge constraints) as weighted penalty terms to the reward function. This approach has two inherent drawbacks: first, the penalty coefficients require repeated manual parameter tuning, and the selection of different coefficients significantly affects the algorithm's convergence and the feasibility of the solution; second, even with high penalty coefficients, the actions output by the deep neural network may still violate certain hard constraints, making them unexecutable in real systems. To address these issues, this invention employs a mixed integer programming-deep Q-network method to solve the Markov decision process established in S5.2.
[0195] The core idea of the mixed-integer programming-deep Q-network method is to integrate the action generation process in a deep Q-network with a mixed-integer programming solver, embedding a lightweight mixed-integer programming module within the deep Q-network framework. This module is responsible for converting the coarse action suggestions output by the deep Q-network into feasible actions that strictly satisfy all physical constraints. The specific implementation steps are as follows:
[0196] Step 1: Training the deep Q-network. Construct a deep neural network with three fully connected hidden layers (with 256, 256, and 128 neurons, respectively) to approximate the action-value function. The network input is the current state (the dimension of the state vector is consistent with the state space defined in S5.2), and the output is the Q-value estimate for each discrete action. The network uses an experience replay mechanism and a target network to improve training stability. The loss function is the mean squared Bellman error, and the optimizer is adaptive moment estimation (Adam) with the learning rate set to [value missing].
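The mean squared Bellman error with a target network can be sketched in plain Python, independent of any deep learning framework. This is a generic illustration of the standard deep Q-network loss, not the training code of this application; the Q-values are passed in as plain lists, and the function names and the `dones` end-of-episode flag are hypothetical.

```python
def td_targets(rewards, next_q_target, dones, gamma):
    """Bellman targets y = r + gamma * max_a' Q_target(s', a') (y = r at episode end)."""
    return [r if done else r + gamma * max(q_next)
            for r, q_next, done in zip(rewards, next_q_target, dones)]


def mean_squared_bellman_error(q_taken, targets):
    """Average squared gap between Q(s, a) of the taken actions and their targets."""
    return sum((q - y) ** 2 for q, y in zip(q_taken, targets)) / len(targets)
```

In a full training loop, `next_q_target` would come from the periodically synchronized target network and `q_taken` from the online network evaluated on a batch drawn from the replay pool.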
[0197] Step 2: Embedding the mixed-integer programming module. After the deep Q-network generates an original action suggestion from the current state, the suggestion is not executed directly; instead, it is passed as input to the mixed-integer programming module. The optimization objective of this module is to produce a feasible action that satisfies all constraints while staying as close as possible to the action suggestion of the deep Q-network. The mathematical model of this module can be expressed as:
[0198]
[0199] The constraints include: (1) the power balance constraint; (2) the upper and lower power limits and ramp rate constraints of distributed resources (as described in the five resource models in S1); (3) the temporal coupling constraints and the boundary constraints of the energy storage state of charge; (4) the external characteristic constraints of the virtual power plant (the feasible power region and ramp rate boundary identified in step two); and (5) the physical constraints on market declaration volumes. Since all of the above constraints are linear and the objective function is quadratic, this problem is a convex quadratic programming problem that can be solved in milliseconds. If the action suggestion output by the deep Q-network already satisfies all constraints, the mixed-integer programming module outputs it unchanged; if the suggestion violates any constraint, the module solves the above quadratic programming problem to find the action within the feasible region that is closest to the suggestion and uses it as the final action to be executed.
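The correction step can be illustrated on a reduced version of the problem. The sketch below assumes only two of the listed constraint families, a single power balance equality and per-resource upper and lower limits, for which the closest feasible point has a closed form up to one scalar: each component equals the suggestion minus a common shift, clipped to its limits, and the shift is found by bisection. Ramp, state-of-charge, and integer constraints are omitted, so a general solver would be needed for the full module. The function name `project_to_balance` is hypothetical.

```python
def project_to_balance(suggestion, lower, upper, demand, tol=1e-9):
    """Closest point to `suggestion` with lower <= x <= upper and sum(x) == demand."""
    if not (sum(lower) - tol <= demand <= sum(upper) + tol):
        raise ValueError("power balance cannot be met within the resource limits")

    def clipped(shift):
        return [min(max(a - shift, lo), hi)
                for a, lo, hi in zip(suggestion, lower, upper)]

    # sum(clipped(shift)) is non-increasing in shift, so bisection applies.
    lo_shift = min(a - hi for a, hi in zip(suggestion, upper))  # all at upper limit
    hi_shift = max(a - lo for a, lo in zip(suggestion, lower))  # all at lower limit
    for _ in range(200):
        mid = 0.5 * (lo_shift + hi_shift)
        if sum(clipped(mid)) > demand:
            lo_shift = mid  # total too high: shift further down
        else:
            hi_shift = mid
    return clipped(0.5 * (lo_shift + hi_shift))
```

A suggestion that already satisfies the limits and the balance is returned essentially unchanged (shift close to zero), matching the pass-through behavior described above.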
[0200] Step 3: Reward calculation and policy update after action correction. The feasible action output by the mixed-integer programming module is applied to the virtual power plant system, the next state and the reward value are observed, and the resulting experience tuple is stored in the experience replay pool. The deep Q-network performs gradient updates using the rewards generated by the corrected actions, thereby indirectly learning the optimal policy under the constraints. Because of the mixed-integer programming module, the deep Q-network gradually learns to avoid infeasible actions during training without manual tuning of constraint penalty coefficients.
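A minimal sketch of the experience replay pool used in this step is given below. The contents of the experience tuple are not reproduced in this text, so the layout (state, corrected action, reward, next state) is an assumption, as are the class name `ReplayPool` and its method names.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity pool; the oldest experience is dropped when full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, feasible_action, reward, next_state):
        # The corrected (feasible) action is stored, not the raw network suggestion.
        self.buffer.append((state, feasible_action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement for one gradient update.
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

Storing the corrected action means the rewards seen during training always correspond to actions that were actually executable.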
[0201] The core advantage of the mixed-integer programming-deep Q-network approach is a division of labor: the deep Q-network efficiently explores and approximates the optimal policy in a high-dimensional state space, while the mixed-integer programming module ensures the feasibility of actions. Compared with traditional deep reinforcement learning methods, the decisions generated by this method strictly satisfy all physical constraints of virtual power plant operation, avoiding the market deviation assessments and even the system safety risks caused by constraint violations.
[0202] S5.4 Deployment scheme for offline training and online execution: In order to improve the computational efficiency of intraday rolling optimization and meet the real-time requirement of completing bidding decisions within a 15-minute scheduling cycle, this invention adopts a two-stage deployment scheme of "offline training and online execution".
[0203] Offline Training Phase: The mixed-integer programming-deep Q-network model is trained using a historical dataset. The historical dataset contains the following data collected during the operation of the virtual power plant over the past year: actual and predicted wind and solar power output, actual and predicted load power, real-time market electricity price sequences, reserve market electricity price sequences, and actual scheduling records of each distributed resource of the virtual power plant. A parallel environment exploration strategy is employed during training: agents are run simultaneously in multiple independent environment replicas to collect experience, accelerating the filling of the experience replay pool. Each training epoch contains 2000 scheduling cycles (each cycle is 24 hours), for a total of 500 epochs. The model is validated every 50 epochs during training by calculating the model's average cumulative reward and constraint violation rate on an independent historical test set. When the average constraint violation rate after 5 consecutive validations is below 0.1% and the average cumulative reward no longer significantly increases, the model is considered converged, and the current network parameters are saved as the completed training model.
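The convergence criterion can be sketched as a small check over the validation history. The 5-validation window and the 0.1% threshold come from the text; the text does not quantify "no longer significantly increases", so the 1% relative-gain tolerance `reward_tol` below is a hypothetical choice, as is the function name.

```python
def has_converged(violation_rates, avg_rewards, window=5,
                  max_violation=0.001, reward_tol=0.01):
    """True when recent validations show low violations and a flat reward."""
    if len(violation_rates) < window or len(avg_rewards) < window:
        return False
    recent_viol = violation_rates[-window:]
    recent_rew = avg_rewards[-window:]
    mean_violation = sum(recent_viol) / window
    # Relative reward gain across the window (assumed measure of "significant").
    gain = recent_rew[-1] - recent_rew[0]
    scale = max(abs(recent_rew[0]), 1e-12)
    return mean_violation < max_violation and gain / scale <= reward_tol
```

The check would be called after each validation pass (every 50 epochs), and training stops and the parameters are saved the first time it returns `True`.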
[0204] Online execution phase: The trained deep Q-network parameters are loaded into the inference engine of the cloud platform layer. At each intraday rolling optimization time n, the cloud platform obtains the current state from the data acquisition system and feeds it into the deep Q-network for forward inference to obtain the original action suggestion. The mixed-integer programming module is then called to correct the suggestion against the constraints and obtain a feasible action. Finally, the market declaration part of the action is submitted to the power trading center, and the internal resource adjustment instructions are distributed to the cluster layer and device layer for execution. The forward inference time of the deep Q-network is less than 5 milliseconds, the solution time of the mixed-integer programming module is less than 10 milliseconds, and the total time of the entire online decision-making process is kept within 15 milliseconds, far below the 15-minute scheduling cycle, which meets the timeliness requirements for virtual power plants participating in the real-time market. Compared with the tens of seconds to several minutes of computation time required to solve the two-stage robust optimization model in step three, the mixed-integer programming-deep Q-network method proposed in this step achieves an order-of-magnitude improvement in computational efficiency in the intraday rolling optimization stage, enabling virtual power plants to respond to market changes and new energy fluctuations in real time on a 15-minute cycle and realizing full-time-scale collaborative optimization scheduling from "day-ahead planning" to "intraday correction" to "real-time execution".
[0205] S6: Proactive / responsive interaction and closed-loop feedback to adapt to differentiated market mechanisms, specifically including the following steps:
[0206] S6.1. Select the interaction mode based on whether the power grid provides price / demand signals, including:
[0207] Active mode: The virtual power plant directly reports the aggregated model (power boundary, ramp boundary, piecewise linear cost function) as a generator-like model, which is then cleared by the grid dispatch or market.
[0208] Responsive mode: When the power grid issues peak-shaving demand or price signals, the virtual power plant calculates the minimum cost at each response level in 10% increments and reports the "response quantity - price" curve.
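The responsive mode can be sketched as follows, assuming the aggregated cost is a convex piecewise linear function given as (capacity, marginal cost) segments. The text does not state how the reported price is derived from the minimum cost, so taking it as the marginal cost of the last segment used is an assumption; the function name `response_curve` and its parameters are hypothetical.

```python
def response_curve(segments, requested_power, step=0.10):
    """Return (quantity, minimum cost, marginal price) at each response level."""
    ordered = sorted(segments, key=lambda seg: seg[1])  # cheapest segment first
    levels = round(1 / step)
    curve = []
    for k in range(1, levels + 1):
        quantity = requested_power * k / levels
        remaining, cost, price = quantity, 0.0, None
        for capacity, marginal in ordered:
            if remaining <= 1e-9:
                break
            used = min(capacity, remaining)
            cost += used * marginal
            remaining -= used
            price = marginal  # marginal cost of the last segment used (assumed price)
        if remaining > 1e-9:
            break  # this level exceeds the available capacity
        curve.append((quantity, cost, price))
    return curve
```

Filling the cheapest segments first yields the minimum cost for a convex cost function, so each point of the curve is the least-cost way to deliver that response level.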
[0209] S6.2. After each day's operation is completed, the response deviations of each resource and each cluster are statistically evaluated, the standardized model parameters (such as response latency and reliability indicators) are corrected, and the corrected parameters are fed back to the resource model library in step one to realize closed-loop parameter updating.
[0210] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
Claims
1. A virtual power plant multi-scale market coordinated bidding and dispatching method, characterized in that, The method includes: A cloud-cluster-edge layered collaborative architecture is constructed. At the edge layer, distributed resources with different characteristics are abstracted into standardized models such as interruptible resources, continuously adjustable resources, tiered adjustable resources, movable resources, and energy storage-like resources. At the cluster layer, distributed resources within the edge control layer are dynamically aggregated based on improved modularity to form multiple resource aggregates. Then, based on market fit, the resource aggregates are dynamically combined into virtual power plant entities that participate in market transactions. Identify the external characteristics of the virtual power plant entity, including power baseline, maximum adjustable power characteristics, and ramp rate characteristics; Based on conditional value at risk, a polyhedral uncertainty set is constructed, and a two-stage robust optimization bidding model for the day-ahead electricity market and the demand response market is established. The first-stage decision variable of the two-stage robust optimization bidding model is an integer variable determined before the uncertainty is realized, and the second-stage decision variable is a continuous variable adjusted after the uncertainty is realized. The two-stage robust optimization bidding model is solved by a column and constraint generation algorithm to obtain the day-ahead bidding scheme. The Lyapunov optimization method is used to decouple the coupling constraints of energy storage time periods. The high-dimensional operation model of the virtual power plant is projected onto a two-dimensional plane with exchange power and operating cost as coordinates by the vertex search method to obtain an equivalent projection model. 
During the intraday phase, a rolling time window optimization method is used to dynamically correct the intraday bidding scheme, and a mixed integer programming-deep Q-network method is used to solve the intraday bidding decision. The deep Q-network generates original action suggestions based on the current state, and the mixed integer programming module corrects the original action suggestions into feasible actions that satisfy the physical constraints. Depending on whether the power grid provides price or demand signals, either an active or a responsive interaction mode is selected to interact with the power grid, and the response deviations of each resource are calculated after the end of each day's operation to correct the standardized model parameters.
2. The virtual power plant multi-scale market coordinated bidding and dispatching method according to claim 1, characterized in that, The dynamic aggregation of distributed resources within the edge control layer based on improved modularity includes: constructing an undirected graph with nodes as elements and branches as connections, and defining the communication cost between nodes, wherein the communication cost includes geographical distance cost and fixed configuration cost of the edge controller; An improved modularity index is constructed with the goal of maximizing the difference between actual communication cost and expected communication cost. The Louvain algorithm or a spectral clustering algorithm is used to solve the problem and obtain the resource aggregate partitioning result that minimizes communication cost and maximizes internal coupling tightness.
3. The virtual power plant multi-scale market coordinated bidding and dispatching method according to claim 1, characterized in that, The method of dynamically combining the resource aggregates into virtual power plant entities participating in market transactions based on market fit includes: treating each resource aggregate as a game participant and extracting the maximum up-adjustment power, maximum down-adjustment power, response latency, and response duration of all its internal flexible resources; After normalizing the maximum up-adjustment power, maximum down-adjustment power, response latency, and response duration, the market fit index is constructed as the product of the normalized up-adjustment power, the normalized down-adjustment power, (1 - the normalized response latency), and the normalized response duration. With the goal of maximizing the sum of the market fit of all virtual power plants, the optimal alliance partitioning scheme is solved by dynamic programming or the branch and bound method to obtain the virtual power plant entities.
4. The virtual power plant multi-scale market coordinated bidding and dispatching method according to claim 1, characterized in that, The construction of the polyhedral uncertainty set based on conditional value at risk includes: using the Monte Carlo method to generate scenarios from historical prediction error data of wind power output, photovoltaic power output and load power, wherein wind speed follows a Weibull distribution, light intensity follows a Beta distribution and load power follows a normal distribution; By introducing the conditional value at risk theory, the maximum acceptable prediction error boundary is calculated at a given confidence level. By combining the maximum prediction error boundary with the uncertainty parameter, a polyhedral uncertainty set constrained by the 1-norm and the infinite norm is constructed.
5. The virtual power plant multi-scale market coordinated bidding and dispatching method according to claim 1, characterized in that, The decoupling of energy storage time-period coupling constraints using the Lyapunov optimization method includes: defining a virtual queue state variable for the energy storage system, wherein the virtual queue state variable is equal to the difference between the actual state of charge and the offset related to the vertex search direction; Establish the temporal evolution law of the virtual queue, and take its net cumulative change of zero as the equivalent constraint that the net charge and discharge amount within the energy storage scheduling cycle is zero; By introducing the Lyapunov drift penalty function, the objective function of the original vertex search is rewritten as the difference between the original projection objective and the drift penalty term used to suppress excessive fluctuations in the virtual queue. This decouples the cross-time-dependent optimization problem into a single-time linear programming problem that can be solved independently at each time step.
6. The virtual power plant multi-scale market coordinated bidding and dispatching method according to claim 1, characterized in that, The method of projecting the high-dimensional operation model of the virtual power plant onto a two-dimensional plane with exchange power and operating cost as coordinates through vertex search to obtain an equivalent projection model includes: using the exchange power and operating cost of the virtual power plant as coordinate variables of the two-dimensional projection plane; The linear programming problem is solved sequentially along multiple uniformly distributed search directions in the two-dimensional projection plane to determine all vertices of the projected polygon; The vertices are connected into a convex polygon using the convex hull algorithm, resulting in an equivalent projection model that is precisely described by a set of linear inequalities.
7. The virtual power plant multi-scale market coordinated bidding and dispatching method according to claim 1, characterized in that, The method of solving intraday bidding decisions using mixed integer programming-deep Q-networks includes: modeling the day-ahead-real-time two-stage bidding decision problem as a Markov decision process, defining a state space that includes grid topology, resource operating status, market environment information and operating cost status, and a mixed action space that includes market-declared power and resource adjustment instructions; Deep Q-networks generate initial action suggestions based on the current state; The mixed-integer programming module aims to minimize the quadratic deviation from the original action proposal. It solves the convex quadratic programming problem under the conditions of power balance, resource physical constraints, external characteristic constraints, and market declaration limits to obtain feasible actions. The action is applied to the virtual power plant system to calculate the reward and update the deep Q network parameters.
8. The virtual power plant multi-scale market coordinated bidding and dispatching method according to claim 1, characterized in that, The selection of active or responsive interaction mode to interact with the power grid includes: when the power grid provides price or demand signals, selecting responsive interaction mode: receiving peak-shaving demand or price signals from the power grid, calculating the minimum cost at each response level according to a preset step size, generating a "response quantity - price" curve and reporting it to the power grid; When the power grid does not provide price or demand signals, the active interaction mode is selected: the aggregated model of the virtual power plant is directly reported to the power grid dispatch center or market clearing engine as a generator-like model. The aggregated model includes power boundaries, ramp boundaries and piecewise linear cost functions.