Electric vehicle charging station coordinated peak shaving method based on hierarchical reinforcement learning

By decomposing the operation mode of charging stations into two layers of intelligent agents through hierarchical reinforcement learning algorithm, a two-layer collaborative optimization model is designed. The charging strategy is optimized by Dueling DQN and TD3 algorithms, which solves the problem of low computational efficiency of electric vehicle charging stations in complex environments and achieves better charging strategy and grid regulation efficiency.

CN116031923B · Active · Publication Date: 2026-06-12 · HEFEI UNIV OF TECH +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HEFEI UNIV OF TECH
Filing Date
2023-02-24
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing collaborative peak shaving methods for electric vehicle charging stations struggle to effectively address the issues of large state spaces and sparse rewards in complex environments, resulting in low computational efficiency and an inability to achieve ideal charging strategy optimization.

Method used

A hierarchical reinforcement learning algorithm is adopted to decompose the operation mode of the charging station into two layers of intelligent agents. A two-layer collaborative optimization model is designed through a service electricity price setting unit and a charging power control unit. The Dueling DQN and TD3 algorithms are used to solve the model and optimize the charging strategy to achieve peak shaving response.

Benefits of technology

It accelerates computation speed, obtains better charging behavior strategies, effectively solves the problems of large state space and sparse rewards, improves grid regulation efficiency and user satisfaction, and obtains additional response benefits.

✦ Generated by Eureka AI based on patent content.


Abstract

The present application relates to the technical field of power systems, and in particular to a coordinated peak shaving method for electric vehicle charging stations based on hierarchical reinforcement learning. The method includes the following steps. S1: define the state of the charging pile, the state of the charging waiting parking space, and the joint state of the waiting parking spaces, and formulate the system operation mode according to the service electricity price setting unit and the charging power control unit. S2: determine the mapping relationship between the charging service electricity price and the user arrival rate, adjust the charging power of the charging piles, construct the unit-time peak shaving reward function, and establish the dual-center coordinated peak shaving system of the charging station. S3: according to the operation mode of the station, design a two-layer collaborative optimization model with the Service Price Maker (SPM) as the upper layer and the Charging Power Controller (CPC) as the lower layer. S4: establish the optimization objective function of the upper-layer agent and solve it using the Dueling DQN algorithm; establish the optimization objective function of the lower-layer agent and solve it using the TD3 algorithm.

Description

Technical Field

[0001] This invention relates to the field of power system technology, and in particular to a collaborative peak-shaving method for electric vehicle charging stations based on hierarchical reinforcement learning.

Background Technology

[0002] In recent years, against the backdrop of global energy shortages and environmental degradation, electric vehicles have been widely promoted both domestically and internationally due to their energy-saving and environmentally friendly advantages. As the number of electric vehicles grows, existing charging station capacity may be insufficient to meet charging demand, potentially leading to long charging queues. This not only wastes drivers' time but may also, in severe cases, affect the power quality of the distribution network. Developing effective electric vehicle charging guidance strategies to alleviate grid pressure is essential for the future large-scale adoption of electric vehicles.

[0003] To support the development and the safe, stable, and high-quality operation of the new generation of power systems, to build a clean, low-carbon, safe, and efficient energy system, and to control total fossil fuel consumption while improving utilization efficiency, the dispatch center can initiate a peak shaving response when fluctuations in new energy sources cause an imbalance between power grid supply and demand, guiding users to participate in grid operation regulation. Electric vehicle charging stations can participate in the peak shaving response directly or indirectly, alleviating grid pressure while also gaining additional response benefits.

[0004] Currently, the main methods for solving the collaborative peak shaving model for electric vehicle charging stations are traditional solvers and traditional reinforcement learning algorithms. Although solvers based on traditional mathematical models and traditional reinforcement learning methods can obtain optimal solutions, in complex environments and challenging tasks the number of parameters to be learned and the required storage space grow rapidly, making it difficult to achieve ideal results. Hierarchical reinforcement learning decomposes a complex problem into several sub-problems and solves them one by one through a divide-and-conquer approach, and therefore offers a new way to solve such problems.

Summary of the Invention

[0005] To address the shortcomings of existing technologies, this invention provides a collaborative peak-shaving method for electric vehicle charging stations based on hierarchical reinforcement learning. The method publishes charging service prices for the next time period based on wide-area information such as the time-of-use pricing of the power grid, and controls the charging power of fast-charging piles according to the peak-shaving needs of the upper-level dispatching agency and the current status of the charging station in each period. The collaborative peak-shaving system guides users to participate in grid operation regulation. Electric vehicle charging stations can participate in peak-shaving response directly or indirectly, alleviating grid pressure while gaining additional response benefits. A reward-constrained policy optimization method is used to train the agent: constraints are introduced into the reward function as penalty signals, which addresses the problem of the reinforcement learning agent exploiting loopholes in the reward function.
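
The idea of introducing constraints into the reward as penalty signals can be illustrated with a minimal sketch. The class name, the multiplier update rule (simple dual ascent), and all default values below are assumptions made for illustration; the patent text does not specify them.

```python
from dataclasses import dataclass


@dataclass
class PenalizedReward:
    """Illustrative constraint-penalized reward (hypothetical, not the patent's formula)."""
    lam: float = 0.0      # penalty multiplier (assumed initial value)
    lam_lr: float = 0.01  # multiplier step size (assumed)
    limit: float = 0.0    # tolerated constraint cost per step (assumed)

    def shaped(self, reward: float, cost: float) -> float:
        # The constraint cost enters the reward as a penalty signal.
        return reward - self.lam * cost

    def update(self, mean_cost: float) -> None:
        # Dual ascent: raise the multiplier while the constraint is violated,
        # lower it (never below zero) once the constraint is satisfied.
        self.lam = max(0.0, self.lam + self.lam_lr * (mean_cost - self.limit))
```

With this kind of shaping, an action that earns a high raw reward by violating a constraint becomes less attractive as the multiplier grows, which is one common way to discourage exploitation of reward loopholes.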

[0006] To solve the above-mentioned technical problems, the present invention provides the following technical solution:

[0007] A collaborative peak-shaving method for electric vehicle charging stations based on hierarchical reinforcement learning includes the following steps:

[0008] S1. Define the status of the charging pile, the status of the charging waiting space, and the combined status of the waiting spaces. Determine the system operation mode based on the service electricity price setting unit and the charging power control unit.

[0009] S2. Determine the mapping relationship between the charging service electricity price and the user arrival rate, adjust the charging power of the charging pile, construct the peak shaving reward function per unit time, and establish a dual-center collaborative peak shaving system for charging stations.

[0010] S3. Based on the operation mode of the station, a two-layer collaborative optimization model is designed with the service electricity price setting unit as the upper layer and the charging power control unit as the lower layer.

[0011] S4. Establish the optimization objective function of the upper-layer agent and solve it using the Dueling DQN algorithm. Establish the optimization objective function of the lower-layer agent and solve it using the TD3 algorithm.

[0012] As a further refinement of this technical solution, determining the system operation mode in step S1 includes the following steps:

[0013] S11. Determine the number of DC fast charging piles as J and the number of charging waiting parking spaces as L;

[0014] The J DC fast charging piles are labeled individually. At time t, the state of each charging pile is denoted as

[0015]

[0016] where the components represent the type of the electric vehicle connected to the pile, its maximum battery capacity, and its rated charging power, together with the vehicle's current state of charge (SOC) and charging power. If the pile is free, its state takes the corresponding idle value. The joint state of the J charging piles at time t collects the J individual pile states;

[0017] S12. After recording the joint state of the charging piles, label the L charging waiting spaces individually. At time t, the state of each charging waiting space is denoted by a tuple whose components represent the type of the electric vehicle parked in the space, its state of charge, and its arrival time;

[0018] S13. If no electric vehicle is waiting in a space, the state of that space takes the corresponding empty value. When electric vehicles are parked in the waiting spaces, the joint state of the L waiting parking spaces is denoted as:

[0019]

[0020]

[0021] S14. When an electric vehicle finishes charging and leaves its charging pile, if there are electric vehicles in the waiting queue, a waiting electric vehicle connects to the freed charging pile to start the charging service, as follows:

[0022]

[0023] ,

[0024]

[0025]

[0026] where the initial charging power denotes the charging power of the electric vehicle when it is first connected to the charging pile.

[0027] Taking the arrival of electric vehicles as the triggering event, assume that M types of electric vehicles arrive at the charging station sequentially according to a Poisson process with a given arrival rate. The arrival times of the electric vehicles form a time sequence whose length is the total number of electric vehicles arriving at the charging station. When the k-th electric vehicle arrives at the station, the triggering event records the type of the electric vehicle and the state of charge of its battery.
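
The Poisson arrival assumption can be sketched as a small event generator. The function name, the type probabilities, and the uniform range used for the arriving state of charge are hypothetical choices for illustration; the source does not give these values.

```python
import random


def sample_arrivals(rate_per_hour, horizon_hours, type_probs, seed=0):
    """Sample (arrival_time, ev_type, soc) events from a Poisson process.

    Inter-arrival times of a Poisson process are exponentially distributed,
    so arrival times are built by accumulating exponential gaps.
    """
    rng = random.Random(seed)
    types = list(range(len(type_probs)))  # M vehicle types, indexed 0..M-1
    t, events = 0.0, []
    while True:
        t += rng.expovariate(rate_per_hour)
        if t >= horizon_hours:
            return events
        ev_type = rng.choices(types, weights=type_probs)[0]
        soc = rng.uniform(0.1, 0.6)  # assumed range for arriving state of charge
        events.append((t, ev_type, soc))
```

Each returned tuple plays the role of one triggering event: the arrival time, the vehicle type, and the battery state of charge on arrival.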

[0028] When the m-th electric vehicle arrives at the charging station, if there are no available parking spaces in the waiting area, the electric vehicle leaves the station immediately; if there are available parking spaces in the waiting area, the electric vehicle enters a waiting parking space and the waiting queue state changes accordingly, as shown below.

[0029]

[0030] .
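
The operation mode in S11 to S14 (J piles, L waiting spaces, balking when the waiting area is full, and promotion from the queue when a pile frees up) can be sketched as follows. The class and field names are hypothetical, and the first-come-first-served promotion order and the immediate promotion of a newly arrived vehicle to an idle pile are assumptions; the source's state-transition equations were not recoverable.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class EV:
    ev_type: int
    soc: float
    arrival_time: float


@dataclass
class Station:
    num_piles: int   # J DC fast charging piles
    num_spaces: int  # L charging waiting spaces
    piles: list = field(default_factory=list)
    queue: deque = field(default_factory=deque)

    def __post_init__(self):
        self.piles = [None] * self.num_piles  # None marks an idle pile

    def arrive(self, ev: EV) -> bool:
        """Admit ev to the waiting area; return False if it leaves immediately."""
        if len(self.queue) >= self.num_spaces:
            return False
        self.queue.append(ev)
        self._fill_idle_piles()
        return True

    def depart(self, pile_index: int) -> None:
        """A vehicle finishes charging; a waiting vehicle takes the freed pile."""
        self.piles[pile_index] = None
        self._fill_idle_piles()

    def _fill_idle_piles(self) -> None:
        # Assumed first-come-first-served promotion from the waiting queue.
        for j, occupant in enumerate(self.piles):
            if occupant is None and self.queue:
                self.piles[j] = self.queue.popleft()
```

The occupancy of `piles` and the length of `queue` correspond to the charging pile occupancy rate and waiting queue length that the upper-level agent later observes.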

[0031] As a further refinement of this technical solution, establishing the dual-center collaborative peak-shaving system for charging stations in step S2 includes the following steps:

[0032] S21. Define the time interval at which time-of-use (TOU) electricity prices are issued, and let K be the total number of TOU electricity price cycles. The electricity price of the power grid at any time t within a day takes values in a finite electricity price state space. At the issuance time of the k-th TOU electricity price, the TOU electricity price sequence for the subsequent periods is recorded;

[0033] S22. At the issuance time of the k-th electricity price cycle, the service electricity price setting unit formulates the charging service electricity price for that cycle based on the station baseline and the grid time-of-use electricity price within the future time window, the peak shaving instructions of the superior dispatching agency, the number of electric vehicles queuing at the current moment, and the occupancy rate of the charging piles. For ease of representation, the service electricity price is written in abbreviated form, and it is restricted to the price adjustment range for charging station services;

[0034] S23. After the service electricity price setting unit releases the service electricity pricing plan at its decision time, the arrival rate of electric vehicles during that period is obtained;

[0035] S24. When the power grid has no peak-shaving demand, that is, when the current time lies outside the set of grid peak-shaving periods, the charging power control unit (CPC) sets each charging power to the rated charging power of the connected electric vehicle. The peak reduction command issued at the d-th scheduling time is recorded, and the total number of decisions made in a day is determined by T, the total length of a day, and the scheduling instruction issuance cycle. At each of its decision times, the charging power control unit issues a charging power control command, written in abbreviated form;

[0036] S25. When the power grid has peak-shaving demand, the charging power control unit issues the charging power control command according to the current charging station state, the peak shaving command, the charging service price, and the grid time-of-use electricity price, as shown below:

[0037]

[0038] The command is a J-dimensional vector in which each entry represents the charging power adjustment of one charging pile, namely the adjustment for the j-th charging pile at the z-th decision time of the lower-level intelligent agent.

[0039] ,

[0040] The charging power of each charging pile changes as follows:

[0041]
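
As a rough illustration of how a J-dimensional adjustment command might update pile powers, consider the sketch below. The subtraction-and-clamp rule, the function name, and the units are assumptions; the source's update equation was lost in extraction and may differ.

```python
def apply_power_command(current_kw, rated_kw, occupied, reduction_kw):
    """Apply a per-pile power reduction command (illustrative rule only).

    Each occupied pile's power is lowered by its commanded reduction and
    clamped to [0, rated power]; idle piles draw no power.
    """
    new_power = []
    for p, p_max, occ, dp in zip(current_kw, rated_kw, occupied, reduction_kw):
        if not occ:
            new_power.append(0.0)
        else:
            new_power.append(min(p_max, max(0.0, p - dp)))
    return new_power
```

For example, with three piles at 60 kW, 60 kW, and idle, a command of `[20.0, 80.0, 10.0]` would leave the piles at 40 kW, 0 kW, and 0 kW under this assumed rule.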

[0042] S26. Within the decision-making period, the charging station reduces its power consumption through the CPC, relative to its historical operating curve, and the dispatch center rewards or penalizes the peak-shaving behavior of the station based on its actual response. The reward or penalty for peak shaving at any time t within the peak-shaving period is recorded as follows:

[0043]

[0044]

[0045] where the terms denote the baseline of the charging station, the penalty coefficient, the required peak reduction, the reward coefficient, the actual reduction, the set of charging piles, and the charging power of the j-th charging pile at time t.
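
One plausible shape for such a reward-or-penalty signal is sketched below. The piecewise form (reward on the delivered reduction up to the required amount, penalty on any shortfall) is an assumption made for illustration, as are the function and parameter names; the source's exact formula is missing.

```python
def peak_shaving_reward(baseline_kw, pile_powers_kw, required_kw,
                        reward_coef, penalty_coef):
    """Illustrative unit-time peak shaving reward (assumed piecewise form).

    actual reduction = baseline minus the total charging power of all piles.
    Delivered reduction (capped at the requirement) is rewarded;
    any shortfall against the requirement is penalized.
    """
    actual_kw = max(0.0, baseline_kw - sum(pile_powers_kw))
    delivered = min(actual_kw, required_kw)
    shortfall = max(0.0, required_kw - actual_kw)
    return reward_coef * delivered - penalty_coef * shortfall
```

Under these assumptions, a station with a 300 kW baseline that draws 180 kW against a 100 kW reduction requirement earns the full reward, while one that draws 250 kW is rewarded for 50 kW and penalized for the remaining 50 kW.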

[0046] As a further refinement of this technical solution, establishing the two-layer collaborative optimization model in step S3 includes the following steps:

[0047] S31. Based on the operation mode of the charging station, a two-layer collaborative optimization model is designed with the service electricity price setting unit as the upper layer and the charging power control unit as the lower layer. The upper-layer intelligent agent comprehensively considers the operating revenue of the charging station and user satisfaction with the service electricity price, and sets the charging service price to change the arrival rate of electric vehicle users, so that the charging station can initially achieve peak shaving and valley filling. The lower-layer intelligent agent comprehensively considers the peak shaving response revenue and the user satisfaction cost of reducing charging power, and controls the charging power of the charging piles during the peak shaving period to respond to the upper-level dispatching agency.

[0048] S32. At any time within the k-th time-of-use pricing cycle, given the charging station state and the charging service electricity price, the revenue per unit time from charging services at the charging station is recorded as follows:

[0049]

[0050]

[0051] where the price term for each charging pile is the charging service price that applied when the connected electric vehicle first arrived at the station. If the charging service price increases while the electric vehicle is waiting, the charging service price for that electric vehicle remains unchanged as compensation to the user.

[0052] S33. During peak shaving periods, when the charging power of some charging piles is reduced, a certain compensation is given to the affected users. The overall revenue obtained by the station over a day's operation is as follows:

[0053]

[0054]

[0055]

[0056]

[0057] where the terms denote the revenue from providing charging services to electric vehicle users, the response compensation obtained by participating in the power grid's peak shaving response, and the compensation paid per unit time to the electric vehicles on the charging piles. The compensation cost is determined by the current charging power of the electric vehicle, its rated charging power, and the compensation factor, as follows:

[0058]
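
Since the compensation cost depends on the current charging power, the rated charging power, and a compensation factor, a simple linear form is one way to picture it. The linear rule and all names below are assumptions for illustration, not the patent's formula.

```python
def compensation_cost(current_kw, rated_kw, factor):
    """Illustrative per-unit-time user compensation (assumed linear rule).

    Each vehicle is compensated in proportion to how far its charging power
    has been reduced below its rated charging power.
    """
    return sum(factor * max(0.0, p_rated - p)
               for p, p_rated in zip(current_kw, rated_kw))
```

A vehicle charging at its rated power contributes nothing to the cost, so the compensation only appears during peak shaving periods when power is actually curtailed.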

[0059] S34. The charging service prices that the upper-level intelligent agent sets for different electricity price cycles affect the arrival rate of electric vehicle users during peak and off-peak hours, thereby increasing the utilization rate of the charging piles and improving the charging station's revenue. At the same time, some charging users may be less satisfied with the charging service because of rising service prices. The user satisfaction cost of the charging service price during the given pricing cycle is denoted as:

[0060]

[0061]

[0062] where the terms denote the price range for charging services, the original electricity price for charging services within the cycle, and a fixed cost coefficient. The upper-level agent aims to achieve the optimal daily economic benefit of the station while considering service price satisfaction. This optimization objective is as follows:

[0063]

[0064] where the coefficients are the sub-objective weight coefficients;

[0065] S35. During peak shaving periods, the lower-level agent reduces the charging power of some charging piles, which leads to longer charging times for electric vehicle users. Assuming that the lower-level intelligent agent issues a charging control command at any decision time during the peak-shaving period, then at any time t within the decision-making period the lower-level satisfaction cost per unit time is as follows:

[0066]

[0067]

[0068] The lower-level intelligent agent aims to maximize the economic benefits of peak-shaving response while balancing user satisfaction costs and compensation. Its optimization objective is denoted as:

[0069]

[0070] where the coefficients are the sub-objective weight coefficients.

[0071] As a further refinement of this technical solution, in step S4 the optimization objective functions of the upper-layer and lower-layer agents are established and solved with the corresponding algorithms:

[0072] The goal of the upper-level intelligent agent is to maximize the cumulative reward within a finite time horizon. Because vehicle arrivals are random, the cumulative reward is itself a random variable. Starting from the initial state, the total profit accumulated after K steps is:

[0073]

[0074] If we consider the initial state to be random, then the optimization objective function of the upper-level agent is:

[0075]

[0076] The optimal strategy is the control strategy that maximizes this objective function. The upper-level intelligent agent uses the Dueling DQN algorithm to solve the problem;
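
For readers unfamiliar with Dueling DQN, its defining step is how the network's state-value and advantage outputs are combined into Q-values. The sketch below shows only that standard aggregation, with plain Python lists standing in for network outputs; the function names and the interpretation of actions as price levels are illustrative, and the patent's network architecture is not described in the recoverable text.

```python
def dueling_q_values(state_value, advantages):
    """Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def greedy_price_level(state_value, advantages):
    """Index of the discrete action (here, a price level) with the largest Q-value."""
    q = dueling_q_values(state_value, advantages)
    return max(range(len(q)), key=q.__getitem__)
```

Subtracting the mean advantage keeps the value and advantage streams identifiable, which is the usual motivation for this form.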

[0077] For the lower-level agent, the decision state at each decision time is

[0078] ,

[0079] The action in this state is defined accordingly, and the mapping relationship between state-action pairs and charging pile power is as follows:

[0080]

[0081]

[0082]

[0083] where the 0/1 variable indicates whether an electric vehicle is connected to the charging pile at the decision time.

[0084] Assuming that at the decision time the lower-level intelligent agent is in a given state and takes an action, the single-step transition reward is recorded as follows:

[0085]

[0086] Considering the initial state is random, the optimization objective function of the lower-level agent is:

[0087]

[0088] The optimal strategy is the control strategy that maximizes this objective function. The lower-level intelligent agent uses the TD3 algorithm to solve the problem.
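
TD3 suits the lower layer because the charging power command is continuous. Its characteristic target computation (target policy smoothing plus the minimum of two target critics) can be sketched as below. The callables `target_actor`, `target_q1`, and `target_q2` are hypothetical stand-ins for trained networks, and the hyperparameter defaults and the [0, 1] action range are assumed values, not taken from the patent.

```python
import random


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=0.0, action_high=1.0, rng=random):
    """Compute the TD3 critic target for one transition.

    1. Target policy smoothing: add clipped Gaussian noise to the target action.
    2. Clipped double Q-learning: bootstrap from the smaller of two target critics.
    """
    smoothed = []
    for a in target_actor(next_state):
        noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        smoothed.append(max(action_low, min(action_high, a + noise)))
    q_min = min(target_q1(next_state, smoothed), target_q2(next_state, smoothed))
    return reward + gamma * (1.0 - float(done)) * q_min
```

Taking the minimum of the two critics counteracts the overestimation bias that a single critic tends to accumulate.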

[0089] As a further refinement of this technical solution, step S4 specifically includes the following steps:

[0090] S41. The adjustment range of the charging service price is discretized into several levels using a constant step, so that the action at the k-th decision time corresponds to one charging service price level.
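
The discretization in S41 can be pictured as a uniform grid over the price adjustment range. The function name, the uniform spacing, and the example prices are assumptions for illustration; the source's exact discretization formula was not recoverable.

```python
def price_levels(price_min, price_max, step):
    """Discretize [price_min, price_max] into uniformly spaced price levels.

    The index of a level serves as the discrete action of the upper-level agent.
    Rounding guards against floating-point drift in the grid values.
    """
    n_steps = int(round((price_max - price_min) / step))
    return [round(price_min + i * step, 10) for i in range(n_steps + 1)]
```

For example, `price_levels(0.4, 1.0, 0.2)` yields four levels, so a Dueling DQN head for this grid would have four outputs, one per price.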

[0091] The decision state of the upper-level intelligent agent at each decision time consists of the baseline of the charging station within the pricing cycle, the total peak reduction power within the cycle, and the time-of-use electricity price, charging pile occupancy rate, and waiting queue length within the cycle;

[0092] S42. At the decision time, the upper-level intelligent agent is in a given state and takes an action; at the next decision time the agent's state transitions to a new state. The single-step transition reward generated in this process is as follows:

[0093]

[0094]

[0095] where the terms denote the satisfaction cost per unit time and the economic benefit per unit time, together with the value taken when there is no power grid peak shaving instruction during the period;

[0096] S43. Considering the arrival rate of electric vehicles, and starting from the initial state, the total expected return accumulated over K decision steps by the upper-level intelligent agent following its control strategy is as follows:

[0097]

[0098] The optimization objective function for the upper-level intelligent agent is:

[0099]

[0100] The optimal strategy is the control strategy that maximizes this objective function;

[0101] S44. Solve the upper-layer agent using the Dueling DQN algorithm of deep reinforcement learning;

[0102] S45. The mapping relationship between state-action pairs and charging pile power is as follows:

[0103]

[0104]

[0105]

[0106] where an intermediate variable is used, the decision state and the action of the lower-level agent at each decision time are defined as above, and the 0/1 variable indicates whether an electric vehicle is connected to the charging pile at the decision time;

[0107] S46. At the decision time, the lower-level agent is in a given state and takes an action; at the next decision time the agent's state transitions to a new state. The single-step transition reward generated in this process is as follows:

[0108]

[0109] S47. Considering the arrival rate of electric vehicles, and starting from the initial state, the total expected return accumulated over Z transition steps by the lower-level intelligent agent following its control strategy is:

[0110]

[0111] Considering the initial state is random, let the optimization objective function of the lower-level agent be:

[0112]

[0113] The optimal strategy is the control strategy that maximizes this objective function;

[0114] S48. Use the TD3 algorithm to solve the lower-level agent.

[0115] Compared with existing technologies, the above technical solution has the following beneficial effects:

[0116] The hierarchical reinforcement learning-based collaborative peak-shaving method for electric vehicle charging stations effectively addresses the problems of large combined state and action spaces and sparse rewards, thereby accelerating computation and obtaining better behavioral strategies. By training the agent using a reward-constrained policy optimization method and introducing constraints as penalty signals into the reward function, the problem of the agent exploiting loopholes in the reward function during reinforcement learning is addressed.

Attached Figure Description

[0117] Figure 1 is a schematic diagram of the collaborative peak-shaving method for electric vehicle charging stations based on hierarchical reinforcement learning.

Detailed Implementation

[0118] To explain in detail the technical content, structural features, objectives, and effects of the technical solution, the following description is provided in conjunction with specific embodiments and accompanying drawings.

[0119] Figure 1 shows a flowchart of a collaborative peak-shaving method for electric vehicle charging stations based on hierarchical reinforcement learning. A preferred embodiment of this invention provides a collaborative peak-shaving method for electric vehicle charging stations based on hierarchical reinforcement learning, which includes the following steps:

[0120] S1 defines the status of the charging pile, the status of the charging waiting space, and the combined status of the waiting spaces. The system operation mode is determined based on the service electricity price setting unit and the charging power control unit.

[0121] Determining the system's operating mode includes the following steps:

[0122] S11. Determine the number of DC fast charging piles as J and the number of charging waiting parking spaces as L.

[0123] The J DC fast charging piles are labeled individually. At time t, the state of each charging pile is denoted as

[0124]

[0125] where the components represent the type of the electric vehicle connected to the pile, its maximum battery capacity, and its rated charging power,

[0126] together with the current state of charge (SOC) of the electric vehicle's battery and its charging power. If the pile is free, its state takes the corresponding idle value. The joint state of the J charging piles at time t collects the J individual pile states.

[0127] S12. After recording the joint state of the charging piles, label the L charging waiting spaces individually. At time t, the state of each charging waiting space is denoted by a tuple

[0128] whose components represent the type of the electric vehicle parked in the space, its state of charge (SOC), and its arrival time.

[0129] S13. If no electric vehicle is waiting in a space, the state of that space takes the corresponding empty value. When electric vehicles are parked in the waiting spaces, the joint state of the L waiting parking spaces is denoted as:

[0130]

[0131]

[0132] S14. When an electric vehicle finishes charging and leaves its charging pile, if there are electric vehicles in the waiting queue, a waiting electric vehicle connects to the freed charging pile to start the charging service, as follows:

[0133]

[0134] ,

[0135]

[0136]

[0137] where the initial charging power denotes the charging power of the EV when it is first connected to the charging pile. To protect the battery life of the electric vehicle, its value is a relatively small constant.

[0138] This invention considers a real-world system and uses the arrival of electric vehicles as the triggering event. It assumes that M types of electric vehicles arrive at the charging station sequentially according to a Poisson process with a given arrival rate, where M is the number of different types of electric vehicles. The arrival times of the electric vehicles form a time sequence whose length is the total number of electric vehicles arriving at the charging station. When the k-th EV arrives at the station, the triggering event records the type of the electric vehicle and its battery state of charge (SOC).

[0139] When the m-th electric vehicle arrives at the charging station, if there are no available parking spaces in the waiting area, the electric vehicle leaves the station immediately; if there are available parking spaces in the waiting area, the electric vehicle enters a waiting parking space and the waiting queue state changes accordingly, as shown below.

[0140]

[0141]

[0142] S2. Determine the mapping relationship between the charging service electricity price and the user arrival rate, adjust the charging power of the charging pile, construct the peak shaving reward function per unit time, and establish a dual-center collaborative peak shaving system for charging stations.

[0143] Establishing a dual-center collaborative peak-shaving system for charging stations includes the following steps:

[0144] S21. Define the time interval at which time-of-use (TOU) electricity prices are issued, and let K be the total number of TOU electricity price cycles. The electricity price of the power grid at any time t within a day takes values in a finite electricity price state space. At the issuance time of the k-th TOU electricity price, the TOU electricity price sequence for the subsequent periods is recorded.

[0145] S22. At the issuance time of the k-th electricity price cycle, the Service Price Maker (SPM) formulates the charging service electricity price for that cycle based on the station baseline and the grid time-of-use electricity price within the future time window, the peak shaving instructions of the superior dispatching agency, the number of electric vehicles queuing at the current moment, and the occupancy rate of the charging piles. For ease of representation, the service electricity price is written in abbreviated form, and it is restricted to the price adjustment range for charging station services.

[0146] S23. After the SPM releases the service electricity pricing plan at its decision time, the arrival rate of electric vehicles during that period is obtained.

[0147] S24. When the power grid has no peak-shaving demand, that is, when the current time lies outside the set of grid peak-shaving periods, the Charging Power Controller (CPC) sets each charging power to the rated charging power of the connected electric vehicle.

[0148] The peak reduction command issued at the d-th scheduling time is recorded, and the total number of decisions made in a day is determined by T, the total length of a day, and the scheduling instruction issuance cycle. At each of its decision times, the CPC issues a charging power control command, written in abbreviated form.

[0149] S25. When the power grid has peak-shaving demand, the CPC issues the charging power control command according to the current charging station state, the peak shaving command, the charging service price, and the grid time-of-use electricity price, as shown below:

[0150]

[0151] The command is a J-dimensional vector in which each entry represents the charging power adjustment of one charging pile, namely the adjustment for the j-th charging pile at the z-th decision time of the lower-level intelligent agent.

[0152] ,

[0153] The charging power of each charging pile changes as follows:

[0154]

[0155] S26. Within the decision-making period, the charging station reduces its power consumption through the CPC, relative to its historical operating curve. At the same time, the dispatch center rewards or penalizes the peak-shaving behavior of the station based on its actual response. The reward or penalty per unit time at any time t within the peak-shaving period is recorded as follows:

[0156]

[0157]

[0158] where the terms denote the baseline of the charging station, which is typically obtained through statistical analysis of historical operating data from typical operating days of the charging station, the penalty coefficient, the required peak reduction, the reward coefficient, the actual reduction, the set of charging piles, and the charging power of the j-th charging pile at time t.

[0159] S3. Based on the station's operation mode, design a two-layer collaborative optimization model with SPM as the upper layer and CPC as the lower layer.

[0160] Establishing a two-layer collaborative optimization model includes the following steps:

[0161] S31. Based on the operation mode of the charging stations, a two-layer collaborative optimization model is designed with SPM as the upper layer and CPC as the lower layer. The upper-layer intelligent agent comprehensively considers the operating revenue of the charging stations and user satisfaction with the service electricity price, and sets the charging service price to change the arrival rate of electric vehicle users, so that the charging stations can initially achieve peak shaving and valley filling. The lower-layer intelligent agent comprehensively considers the peak shaving response benefits and the user satisfaction cost of reducing charging power, and controls the charging power of the charging piles during peak shaving periods to respond to the upper-layer dispatching mechanism.

[0162] S32. At any time within the k-th time-of-use pricing cycle, given the state of the charging piles and the charging service price, the revenue per unit time from charging services at the station is recorded as shown below:

[0163]

[0164]

[0165] where the price term for each charging pile is the charging service price that was in effect when the connected electric vehicle first arrived at the station. If the charging service price increases while the electric vehicle is waiting, the price for that vehicle remains unchanged as compensation to the user.
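The price-locking rule of S32 can be sketched in Python as follows. The product form (locked price times charging power, summed over occupied piles) is an assumption for illustration, as are the `Pile` class and its field names; the specification's own revenue expression is not reproduced here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Pile:
    """One charging pile; `locked_price` is None when the pile is idle."""
    power_kw: float
    locked_price: Optional[float]  # service price per kWh fixed when the vehicle arrived


def service_revenue_rate(piles: Iterable[Pile]) -> float:
    """Revenue per hour: sum over occupied piles of locked price x charging power."""
    return sum(
        p.locked_price * p.power_kw for p in piles if p.locked_price is not None
    )


if __name__ == "__main__":
    station = [Pile(60.0, 0.5), Pile(40.0, 1.0), Pile(0.0, None)]
    print(service_revenue_rate(station))
```

Because each vehicle carries the price recorded at its arrival, a later price change by the upper-level agent affects only vehicles that arrive afterwards.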

[0166] S33. During peak-shaving periods, when the charging power of some charging piles is reduced, the station pays a certain compensation to the affected users. The total revenue obtained by the station over one day of operation is as shown below:

[0167]

[0168]

[0169]

[0170]

[0171] Here, the terms denote, respectively, the revenue from providing charging services to charging users, the response compensation obtained by participating in the power grid's peak-shaving response, and the compensation paid per unit time to the electric vehicles connected to the charging piles. The compensation cost is determined by the electric vehicle's current charging power, its rated charging power, and the compensation factor, as follows:

[0172]

[0173] S34. The charging service price that the upper-level intelligent agent sets for each electricity price cycle affects the arrival rate of electric vehicle users during peak and off-peak hours, which raises the utilization rate of the charging piles and improves the station's revenue. At the same time, some charging users may become less satisfied with the charging service when the service price rises. The user satisfaction cost of the charging service price during the pricing period is denoted as follows:

[0174]

[0175]

[0176] where the reference price is the original charging service price within the pricing period and the remaining coefficient is a fixed cost coefficient. Taking service price satisfaction into account, the upper-level agent aims to achieve the optimal daily economic benefit for the station; its optimization objective is as shown below:

[0177]

[0178] where the two coefficients are the sub-objective weight coefficients.

[0179] S35. During peak-shaving periods, the lower-level agent reduces the charging power of some charging piles, which lengthens the charging time for electric vehicle users. A user satisfaction index based on the relative extension of charging time is therefore designed. Suppose that at any decision time within the peak-shaving period the lower-level agent issues a charging control command; then, at any time t within that decision period, the lower-level satisfaction cost per unit time is as shown below:

[0180]

[0181]

[0182] The lower-level intelligent agent aims to maximize the economic benefit of the peak-shaving response while balancing the user satisfaction cost and compensation. Its optimization objective is as follows:

[0183]

[0184] where the two coefficients are the sub-objective weight coefficients.

[0185] S4. Establish the optimization objective function for the upper-layer agent and solve it using the Dueling DQN algorithm. Establish the optimization objective function for the lower-layer agent and solve it using the TD3 algorithm.

[0186] Developing a collaborative peak-shaving optimization strategy for charging stations based on hierarchical reinforcement learning includes the following steps:

[0187] S41. The price adjustment range for charging services is discretized by a constant step into a finite number of levels. The action at the k-th decision time then selects one of these levels, which determines the corresponding charging service price.
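The discretization in S41 can be sketched in Python as follows. Uniform spacing from the lower bound of the range, and the names `p_min`, `p_max`, and `step`, are assumptions for illustration; the specification's own symbols for the range, the constant, and the number of levels are not reproduced here.

```python
from typing import List


def price_levels(p_min: float, p_max: float, step: float) -> List[float]:
    """Discretize the price adjustment range [p_min, p_max] with a constant step."""
    if step <= 0 or p_max < p_min:
        raise ValueError("need step > 0 and p_max >= p_min")
    n_levels = int(round((p_max - p_min) / step)) + 1
    # Rounding removes floating-point noise such as 0.6000000000000001.
    return [round(p_min + i * step, 6) for i in range(n_levels)]


def action_to_price(action: int, p_min: float, p_max: float, step: float) -> float:
    """Map a discrete action index chosen by the upper-level agent to a service price."""
    levels = price_levels(p_min, p_max, step)
    if not 0 <= action < len(levels):
        raise ValueError(f"action must be in [0, {len(levels) - 1}]")
    return levels[action]


if __name__ == "__main__":
    print(price_levels(0.4, 1.2, 0.2))
    print(action_to_price(2, 0.4, 1.2, 0.2))
```

A discrete action set of this kind is what makes a value-based method such as Dueling DQN applicable to the upper layer, since the network outputs one value per price level.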

[0188] At each decision time, the decision state of the upper-level intelligent agent comprises the baseline of the charging station within the coming period, the total peak-reduction power within that period, the time-of-use electricity price, the charging pile occupancy rate, and the waiting queue length.

[0189] S42. At a decision time, the upper-level intelligent agent, in its current state, takes an action; at the next decision time the agent's state transitions to a new state. This process generates a single-step transition reward.

[0190] The reward is as shown below:

[0191]

[0192]

[0193] where the two terms are the lower-level satisfaction cost per unit time and the economic benefit per unit time. A separate case applies when no power grid peak-shaving instruction is issued during the period.

[0194] S43. Given the arrival rate of electric vehicles, and starting from the initial state, the total expected return accumulated over K transition steps when the upper-level intelligent agent makes decisions under its control policy is as follows:

[0195]

[0196] The optimization objective function for the upper-level intelligent agent is:

[0197]

[0198] The optimal policy is the control policy that maximizes this objective function.
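The K-step expected return described in S43 can be estimated by averaging over sampled reward trajectories. The Python sketch below assumes a simple Monte Carlo average and an optional discount factor `gamma` (defaulting to 1.0, i.e. undiscounted); whether the specification discounts rewards is not stated in this text, so both choices are illustrative.

```python
from typing import Sequence


def k_step_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Return accumulated along one trajectory of single-step rewards."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def expected_return(episodes: Sequence[Sequence[float]], gamma: float = 1.0) -> float:
    """Monte Carlo estimate: average the K-step return over sampled episodes.

    Each episode stands for one day simulated under a fixed control policy with
    random vehicle arrivals and a random initial state.
    """
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(k_step_return(ep, gamma) for ep in episodes) / len(episodes)


if __name__ == "__main__":
    sampled_days = [[1.0, 2.0, 3.0], [0.0, 1.0, 2.0]]
    print(expected_return(sampled_days))
```

The optimal policy is then the one whose estimated expected return is largest among the candidate policies.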

[0199] S44. Use the Dueling DQN algorithm of deep reinforcement learning to solve the upper-layer agent.
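For readers unfamiliar with Dueling DQN, its defining step is the aggregation of a scalar state value and per-action advantages into action values, Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). The Python sketch below shows only that aggregation and a greedy choice over the discrete price levels; the network that would produce `state_value` and `advantages`, and all names used here, are assumptions for illustration and not the implementation of this specification.

```python
from typing import List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) with the mean-advantage baseline."""
    if not advantages:
        raise ValueError("need at least one action")
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def greedy_action(q_values: Sequence[float]) -> int:
    """Index of the action (price level) with the largest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


if __name__ == "__main__":
    # Hypothetical outputs of the value stream and the advantage stream for one state.
    q = dueling_q_values(2.0, [1.0, 3.0, -1.0])
    print(q, greedy_action(q))
```

Subtracting the mean advantage keeps the value and advantage streams identifiable, which tends to stabilize learning when many price levels have similar values.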

[0200] S45. The mapping from the state-action pair to the charging pile power is defined as shown below:

[0201]

[0202]

[0203] where the intermediate variable is:

[0204]

[0205] At each decision time, the lower-level intelligent agent has a decision state and an action defined in that state. The state includes the baseline power of the lower-level agent at the decision time and a 0/1 variable indicating whether an electric vehicle is connected to the given charging pile at that decision time; the variable takes one value if a vehicle is connected and the other value otherwise.

[0206] S46. At a decision time, the lower-level agent, in its current state, takes an action; at the next decision time the agent's state transitions to a new state. The single-step transition reward generated in this process is as shown below:

[0207]

[0208] S47. Given the arrival rate of electric vehicles, and starting from the initial state, the total expected return accumulated over Z transition steps when the lower-level intelligent agent makes decisions under its control policy is:

[0209]

[0210] Considering the initial state is random, let the optimization objective function of the lower-level agent be:

[0211]

[0212] The optimal policy is the control policy that maximizes this objective function.

[0213] S48. Use the TD3 algorithm (Twin Delayed Deep Deterministic policy gradient algorithm) to solve the lower-level agent.
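The core of TD3 is its critic target, which combines target policy smoothing (clipped noise added to the target action) with clipped double Q-learning (the minimum of two target critics). The Python sketch below shows that target for one transition. The hyperparameter values, the normalized action range [0, 1] for the charging power setting, and the callables `target_actor`, `target_q1`, and `target_q2` are assumptions for illustration; this specification does not disclose them in this text.

```python
import random
from typing import Callable, List, Optional, Sequence

State = Sequence[float]
Action = List[float]


def td3_target(
    reward: float,
    next_state: State,
    done: bool,
    target_actor: Callable[[State], Action],
    target_q1: Callable[[State, Action], float],
    target_q2: Callable[[State, Action], float],
    gamma: float = 0.99,
    noise_std: float = 0.2,
    noise_clip: float = 0.5,
    a_low: float = 0.0,
    a_high: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """TD3 critic target: r + gamma * min(Q1', Q2')(s', smoothed target action)."""
    rng = rng or random.Random()
    smoothed: Action = []
    for a in target_actor(next_state):
        # Target policy smoothing: clipped Gaussian noise, then clip to the action range.
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        smoothed.append(max(a_low, min(a_high, a + eps)))
    # Clipped double Q-learning: take the smaller of the two target critic values.
    q_min = min(target_q1(next_state, smoothed), target_q2(next_state, smoothed))
    return reward + (0.0 if done else gamma * q_min)
```

The "delayed" part of TD3 is not shown: in the standard algorithm the actor and the target networks are updated less frequently than the two critics, typically once every two critic updates.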

[0214] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Unless otherwise specified, an element defined by the phrase "comprising..." or "including..." does not exclude the presence of additional elements in the process, method, article, or terminal device that includes said element. Additionally, in this document, "greater than," "less than," "exceeding," etc., are understood to exclude the stated number; "above," "below," "within," etc., are understood to include the stated number.

[0215] Although the above embodiments have been described, those skilled in the art, once they understand the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the above descriptions are merely embodiments of the present invention and do not limit the scope of patent protection of the present invention. Any equivalent structural or procedural transformations made using the content of the present invention's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the scope of patent protection of the present invention.

Claims

1. A collaborative peak-shaving method for electric vehicle charging stations based on hierarchical reinforcement learning, characterized in that it includes the following steps:
S1. Define the state of the charging piles, the state of the charging waiting parking spaces, and the joint state of the waiting parking spaces, and formulate the system operation mode. The state of a charging waiting parking space includes the type of electric vehicle parked in the space, its state of charge, and its arrival time. The joint state of the waiting parking spaces is the set of states of all waiting parking spaces;
S2. Determine the mapping relationship between the charging service electricity price and the user arrival rate, adjust the charging power of the charging piles, construct the peak-shaving reward function per unit time, and, based on the peak-shaving reward function and the user arrival rate mapping relationship, determine the operation objectives and constraints of the station, providing a decision basis for constructing a two-layer collaborative optimization model;
S3. Based on the operation mode of the charging station, design a two-layer collaborative optimization model with the service electricity price setting unit as the upper layer and the charging power control unit as the lower layer. Through the service electricity price setting unit, the charging station operator can dynamically adjust the charging service price paid by charging users to influence their willingness to charge, thereby adjusting the charging load of the station; the charging power control unit allows the station operator to directly control and manage the power of the charging piles within the station. The upper-layer intelligent agent comprehensively considers the operating revenue of the charging station and user satisfaction with the service electricity price, and sets the charging service price to change the arrival rate of electric vehicle users, so that the charging station initially achieves peak shaving and valley filling. The lower-layer intelligent agent comprehensively considers the peak-shaving response revenue and user satisfaction with the reduction of charging power, and controls the charging power of the charging piles during peak-shaving periods to respond to the upper-layer dispatching agency;
S4. Establish the optimization objective function of the upper-layer agent and solve it using the Dueling DQN algorithm. Establish the optimization objective function of the lower-layer agent and solve it using the TD3 algorithm.

2. The method for coordinated peak shaving of electric vehicle charging stations based on hierarchical reinforcement learning according to claim 1, characterized in that step S1, defining the system operation mode, includes the following steps:
S11. Let the number of DC fast charging piles be J and the number of charging waiting parking spaces be L, and label the J DC fast charging piles individually. At time t, the state of each charging pile comprises the type of the connected electric vehicle, its maximum battery capacity, its rated charging power, its current state of charge (SOC), and its charging power; a dedicated value denotes an idle pile. The joint state of the J charging piles at time t is the collection of these pile states;
S12. After recording the joint state of the charging piles, label the L charging waiting parking spaces individually. At time t, the state of each waiting parking space comprises the type of the electric vehicle parked in the space, its state of charge, and its arrival time;
S13. A dedicated value denotes a waiting parking space in which no electric vehicle is waiting. When vehicles are parked in the waiting area, the joint state of the L waiting parking spaces is the collection of these space states;
S14. When an electric vehicle finishes charging and leaves its charging pile, and there are electric vehicles in the waiting queue, the electric vehicle in the waiting parking space is connected to that charging pile and its charging service starts at the initial charging power of the electric vehicle when connected to the charging pile. Taking the arrival of electric vehicles as the triggering event, assume that M types of electric vehicles arrive at the charging station sequentially via a Poisson process with a given arrival rate, where M is the number of different types of electric vehicles. The arrival times of the electric vehicles form a time sequence whose length is the total number of electric vehicles arriving at the charging station, and each element is the time at which the k-th electric vehicle arrives at the station. The triggering event recorded when the k-th electric vehicle arrives comprises the type of the electric vehicle and the state of charge of its battery. When an electric vehicle arrives at the charging station, it leaves the station immediately if there is no available waiting parking space; if a waiting parking space is available, the electric vehicle enters it and the waiting queue state changes accordingly.

3. The method for coordinated peak shaving of electric vehicle charging stations based on hierarchical reinforcement learning according to claim 1, characterized in that the establishment of the dual-center collaborative peak-shaving system for charging stations in step S2 includes the following steps:
S21. Define the time interval at which time-of-use (TOU) electricity prices are issued, and let K be the total number of TOU electricity price cycles. Define the electricity price of the power grid at any time t within a day, which takes values in a finite electricity price state space. At the issuance time of the k-th TOU electricity price, the time-of-use electricity price sequence is defined accordingly;
S22. At the issuance time of the k-th electricity price cycle, the service electricity pricing unit formulates the charging service price for that cycle based on the station baseline over the future time window, the time-of-use electricity price of the power grid, the peak-shaving instructions from the higher-level dispatcher, the number of electric vehicles currently queuing, and the occupancy rate of the charging piles. For ease of representation, the service electricity price is written in abbreviated form and is constrained to the price adjustment range for charging station services;
S23. When the service electricity pricing unit issues its service electricity pricing plan at a decision time, the arrival rate of electric vehicles during that period is obtained from the plan;
S24. When the power grid has no peak-shaving demand, that is, outside the set of peak-shaving periods of the power grid, the charging power control unit (CPC) sets each charging power to the rated charging power of the electric vehicle. Here the peak reduction command issued at the d-th scheduling time is recorded, the total number of decisions made in a day is determined by the total length of a day T and the issuance cycle of scheduling instructions, and at each decision time of the CPC the charging power control unit issues a charging power control command, written in abbreviated form;
S25. When the power grid has peak-shaving demand, the charging power control unit issues a charging power control command according to the current charging pile state, the peak-shaving command, the charging service price, and the time-of-use electricity price of the power grid. The command is a J-dimensional vector, each element of which is the charging adjustment power of one charging pile; the charging power of the j-th charging pile at the z-th decision time of the lower-level intelligent agent changes accordingly;
S26. Within the decision period, the CPC reduces power consumption relative to the historical operating curve, and the dispatch center rewards or penalizes the station's peak-shaving behavior according to its actual response. The reward or penalty for peak shaving at any time t within the peak-shaving period is recorded in terms of the station baseline, the penalty coefficient, the required peak reduction, the reward coefficient, the actual reduction, the set of charging piles, and the charging power of the j-th charging pile at time t.

4. The method for coordinated peak shaving of electric vehicle charging stations based on hierarchical reinforcement learning according to claim 1, characterized in that in step S3, establishing the two-layer collaborative optimization model includes the following steps:
S31. Based on the operation mode of the charging station, a two-layer collaborative optimization model is designed with the service electricity price setting unit as the upper layer and the charging power control unit as the lower layer. The upper-layer intelligent agent comprehensively considers the operating revenue of the station and user satisfaction with the service electricity price, and sets the charging service price to change the arrival rate of electric vehicle users, so that the station initially achieves peak shaving and valley filling. The lower-layer intelligent agent comprehensively considers the peak-shaving response revenue and user satisfaction with the reduction of charging power, and controls the charging power of the charging piles during the peak-shaving period to respond to the upper-layer dispatching agency;
S32. At any time within the k-th time-of-use pricing cycle, given the state of the charging piles and the charging service price, the revenue per unit time from charging services at the station is recorded, where the price term for each charging pile is the charging service price that was in effect when the connected electric vehicle first arrived at the station. If the charging service price increases while the electric vehicle is waiting, the price for that vehicle remains unchanged as compensation to the user;
S33. During peak-shaving periods, when the charging power of some charging piles is reduced, a certain compensation is given to the affected users. The overall revenue obtained by the station over one day of operation is composed of the revenue from providing charging services to charging users, the response compensation obtained by participating in the power grid's peak-shaving response, and the compensation paid per unit time to the electric vehicles connected to the charging piles. The compensation cost is determined by the electric vehicle's current charging power, its rated charging power, and the compensation factor;
S34. The charging service price that the upper-level intelligent agent sets for each electricity price cycle affects the arrival rate of electric vehicle users during peak and off-peak hours, which raises the utilization rate of the charging piles and improves the station's revenue. At the same time, some charging users may become less satisfied with the charging service when the service price rises. The user satisfaction cost of the charging service price during the pricing period is defined in terms of the original charging service price within that period and a fixed cost coefficient. Taking service price satisfaction into account, the upper-level agent aims to achieve the optimal daily economic benefit of the station; its optimization objective combines these terms using sub-objective weight coefficients;
S35. During peak-shaving periods, the lower-level agent reduces the charging power of some charging piles, which lengthens the charging time for electric vehicle users. Suppose that at any decision time within the peak-shaving period the lower-level intelligent agent issues a charging control command; then, at any time t within that decision period, the lower-level satisfaction cost per unit time is defined. The lower-level intelligent agent aims to maximize the economic benefit of the peak-shaving response while balancing the user satisfaction cost and compensation; its optimization objective combines these terms using sub-objective weight coefficients.

5. The method for coordinated peak shaving of electric vehicle charging stations based on hierarchical reinforcement learning according to claim 1, characterized in that in step S4, the optimization objective functions of the upper-level and lower-level agents are established and solved as follows:
The goal of the upper-level intelligent agent is to maximize the cumulative reward within a finite time horizon. Because vehicle arrivals are random, the cumulative reward is a random variable. Starting from the initial state, the total return accumulated over K steps is defined, and, considering that the initial state is random, the optimization objective function of the upper-level agent is defined accordingly. The optimal policy is the control policy that maximizes this objective function. The upper-level intelligent agent is solved using the Dueling DQN algorithm;
At each decision time, the lower-level agent has a decision state and an action defined in that state, and a mapping is defined from the state-action pair to the charging pile power. The state includes a 0/1 variable indicating whether an electric vehicle is connected to the given charging pile at the decision time; the variable takes one value if a vehicle is connected and the other value otherwise. When, at a decision time, the lower-level intelligent agent in its current state takes an action, a single-step transition reward is recorded. Considering that the initial state is random, the optimization objective function of the lower-level agent is defined accordingly. The optimal policy is the control policy that maximizes this objective function. The lower-level intelligent agent is solved using the TD3 algorithm.

6. The method for coordinated peak shaving of electric vehicle charging stations based on hierarchical reinforcement learning according to claim 1, characterized in that step S4 specifically includes the following steps:
S41. The price adjustment range for charging services is discretized by a constant step into a finite number of levels, so that the action at the k-th decision time determines the corresponding charging service price. At each decision time, the decision state of the upper-level intelligent agent comprises the baseline of the charging station within the coming period, the total peak-reduction power within that period, the time-of-use electricity price, the charging pile occupancy rate, and the waiting queue length;
S42. At a decision time, the upper-level intelligent agent, in its current state, takes an action; at the next decision time the agent's state transitions to a new state. The single-step transition reward generated in this process is defined in terms of the lower-level satisfaction cost per unit time and the economic benefit per unit time, with a separate case when no power grid peak-shaving instruction is issued during the period;
S43. Given the arrival rate of electric vehicles, and starting from the initial state, the total expected return accumulated over K transition steps when the upper-level intelligent agent makes decisions under its control policy is defined, and the optimization objective function of the upper-level intelligent agent is defined accordingly. The optimal policy is the control policy that maximizes this objective function;
S44. Solve the upper-layer agent using the Dueling DQN algorithm of deep reinforcement learning;
S45. Define the mapping from the state-action pair to the charging pile power by means of an intermediate variable. At each decision time, the lower-level agent has a decision state and an action defined in that state. The state includes a 0/1 variable indicating whether an electric vehicle is connected to the given charging pile at the decision time; the variable takes one value if a vehicle is connected and the other value otherwise;
S46. At a decision time, the lower-level agent, in its current state, takes an action; at the next decision time the agent's state transitions to a new state, and a single-step transition reward is generated in this process;
S47. Given the arrival rate of electric vehicles, and starting from the initial state, the total expected return accumulated over Z transition steps when the lower-level intelligent agent makes decisions under its control policy is defined. Considering that the initial state is random, the optimization objective function of the lower-level agent is defined accordingly. The optimal policy is the control policy that maximizes this objective function;
S48. Use the TD3 algorithm to solve the lower-level agent.