A collaborative scheduling method for electric vehicle and power grid interaction based on deep reinforcement learning

A vehicle-to-grid (V2G) collaborative scheduling method based on deep reinforcement learning builds a hexagonal grid model spanning multiple transformers, defines the states and actions of the transformers and electric vehicles, designs a reward function, and trains a policy by reinforcement learning. It addresses the overload of distribution transformers caused by clustered charging and discharging of electric vehicles, smoothing the load curve and maximizing economic benefits.

CN122246731A · Pending · Publication Date: 2026-06-19 · STATE GRID SHANGHAI ENERGY INTERCONNECTION RES INST CO LTD +3

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: STATE GRID SHANGHAI ENERGY INTERCONNECTION RES INST CO LTD
Filing Date: 2026-02-09
Publication Date: 2026-06-19

Abstract

This invention relates to a vehicle-to-grid (V2G) interactive collaborative scheduling method for electric vehicles based on deep reinforcement learning, comprising: constructing a scenario model of electric vehicles charging and discharging across multiple transformers; defining the load state of the transformers based on the scenario model; defining the state and actions of the electric vehicles based on the scenario model; using the load state of the transformers and the state of the electric vehicles as the state space, and the actions of the electric vehicles as the action space, designing a reward function that includes the cost of charging and migration, the benefit of discharging, and the equivalent benefit of peak shaving and valley filling; constructing a partially observable Markov decision process (POMDP) and training it using reinforcement learning until the policy converges to obtain the optimal scheduling strategy; deploying the optimal scheduling strategy to the scheduling center, and outputting scheduling actions based on the real-time environmental state using a greedy strategy to complete the scheduling task of the electric vehicles. This invention can reduce peak loads, smooth the load curve, and improve the economic benefits of electric vehicles participating in V2G interaction.

Description

Technical Field

[0001] This invention relates to the field of power distribution network flexibility scheduling and mobile energy storage optimization technology, and in particular to a vehicle-to-grid (V2G) interactive collaborative scheduling method for electric vehicles based on deep reinforcement learning.

Background Technology

[0002] As distributed renewable energy is integrated at an accelerating pace and electricity load is unevenly distributed in time and space, distribution transformers are often overloaded during peak hours while their capacity sits idle during off-peak hours. Electric vehicles (EVs) with vehicle-to-grid (V2G) capability are essentially mobile energy storage units capable of bidirectional interaction with the grid. Without proper coordination, however, their collective charging and discharging further raises load rates and amplifies the risks of limit violations and thermal aging of insulation during peak hours, while redundant off-peak capacity remains wasted and the peak-to-valley difference widens. Existing work has shown that combining deep reinforcement learning (DRL) with distribution load monitoring and user charging demand information allows power strategies to be issued dynamically within a rolling cycle. This enables orderly charging and avoids transformer overload under uncertain arrivals / departures and heterogeneous state of charge (SOC) constraints. A typical solution employs dual-timescale control, a "15-minute rolling plan + 3-minute real-time correction", to suppress limit-violating fluctuations. On the other hand, purely model-based optimization is prone to the curse of dimensionality when state information is incomplete or when traffic / price / voltage constraints are strongly coupled.

Summary of the Invention

[0003] The technical problem to be solved by the present invention is to provide a vehicle-to-grid (V2G) interactive and collaborative scheduling method for electric vehicles based on deep reinforcement learning, which can reduce peak loads, smooth load curves, and improve the economic benefits of EVs participating in V2G interaction.

[0004] The technical solution adopted by this invention to solve its technical problem is to provide a vehicle-to-grid (V2G) interactive collaborative scheduling method for electric vehicles based on deep reinforcement learning, comprising the following steps:

[0005] A scenario model for electric vehicles charging and discharging across multiple transformers is constructed. The scenario model discretizes the target scheduling area into multiple hexagonal grids to obtain a grid set. Each grid in the grid set contains at most one transformer.

[0006] The load state of the transformer is defined based on the scenario model.

[0007] The state and actions of the electric vehicle are defined based on the scenario model;

[0008] Using the load state of the transformer and the state of the electric vehicle as the state space, and the actions of the electric vehicle as the action space, a reward function is designed that includes the cost of charging and migrating, the discharge benefit, and the equivalent benefit of peak shaving and valley filling. A partially observable Markov decision process (POMDP) is constructed and trained using reinforcement learning until the policy converges, thus obtaining the optimal scheduling policy.

[0009] The optimal scheduling strategy is deployed to the scheduling center, and a greedy strategy is used to output scheduling actions based on the real-time environmental status to complete the scheduling task of electric vehicles.

[0010] The load status of the transformer includes: the transformer's base load, the transformer's net load, and the transformer's area marking status.

[0011] The net load of the transformer is expressed as: $L_{g,t} = L^{\text{base}}_{g,t} + \sum_{i \in \mathcal{V}_{g,t}} \left( P^{\text{ch}}_{i,g,t} - P^{\text{dis}}_{i,g,t} \right)$, where $L_{g,t}$ is the net load of the transformer in grid $g$ (a grid containing a transformer) at time $t$, $L^{\text{base}}_{g,t}$ is the base load of that transformer at time $t$, $\mathcal{V}_{g,t}$ is the set of all vehicles in grid $g$ at time $t$, and $P^{\text{ch}}_{i,g,t}$ and $P^{\text{dis}}_{i,g,t}$ denote the charging power and the discharge power of electric vehicle $i$ in grid $g$ at time $t$.

[0012] The transformer area marking status is distinguished according to the load range of the expected transformer operation. When the transformer's base load is less than the lower limit of the load range, the transformer area marking status is a low-load charging zone; when the transformer's base load is greater than the upper limit of the load range, the transformer area marking status is a high-load suitable discharge zone.

[0013] The state of the electric vehicle includes the location and state of charge of the electric vehicle at time $t$; the actions of the electric vehicle include charging, discharging, and migrating to a target grid.

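[0014] The objective function for training using reinforcement learning is expressed as: $\max \; \omega_1 R - \omega_2 \Phi$, where $\max$ denotes taking the maximum value and $\omega_1$, $\omega_2$ are weighting coefficients. $R$ is the total revenue of all electric vehicles within the target scheduling area, expressed as $R = \sum_{t} \sum_{i} r_{i,t}$, where $r_{i,t}$ is the profit of electric vehicle $i$ at time $t$, expressed as: $r_{i,t} = -\, c^{\text{ch}}_{g,t}\, x^{\text{ch}}_{i,t}\, \eta^{\text{ch}} P^{\text{ch}}_{i,g,t}\, \tau + c^{\text{dis}}_{g,t}\, x^{\text{dis}}_{i,t}\, \eta^{\text{dis}} P^{\text{dis}}_{i,g,t}\, \tau - c^{\text{e}}\, x^{\text{mig}}_{i,t,g \to g'}\, e\, d_{g,g'}$. Here $c^{\text{ch}}_{g,t}$ is the unit price of charging in grid $g$ at time $t$; $x^{\text{ch}}_{i,t}$ indicates whether electric vehicle $i$ charges at time $t$; $\eta^{\text{ch}}$ is the charging efficiency; $P^{\text{ch}}_{i,g,t}$ is the charging power of electric vehicle $i$ in grid $g$ at time $t$; $\tau$ is the action time; $c^{\text{dis}}_{g,t}$ is the unit benefit of discharging in grid $g$ at time $t$; $x^{\text{dis}}_{i,t}$ indicates whether electric vehicle $i$ discharges at time $t$; $\eta^{\text{dis}}$ is the discharge efficiency; $P^{\text{dis}}_{i,g,t}$ is the discharge power of electric vehicle $i$ in grid $g$ at time $t$; $c^{\text{e}}$ is the unit price of the energy consumed by driving; $x^{\text{mig}}_{i,t,g \to g'}$ indicates whether electric vehicle $i$ migrates at time $t$ from the current grid $g$ to the target grid $g'$; $e$ is the energy consumption per unit distance; and $d_{g,g'}$ is the equivalent path length from the current grid $g$ to the target grid $g'$. $\Phi$ is the penalty term, expressed as: $\Phi = \sum_{t} \sum_{g} \left[ \max\left(0,\, L^{\text{lo}} - L_{g,t}\right) + \max\left(0,\, L_{g,t} - L^{\text{hi}}\right) \right]$, where $L_{g,t}$ is the net load of the transformer in grid $g$ at time $t$, and $L^{\text{lo}}$ and $L^{\text{hi}}$ are the lower and upper limits of the load range for the expected operation of the transformer, respectively.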

[0015] When using reinforcement learning for training, the following constraints must be met:

[0016] Transformer load constraint, expressed as: $L_{g,t} \le L^{\max}_{g}$, where $L_{g,t}$ is the net load of the transformer in grid $g$ at time $t$, and $L^{\max}_{g}$ is the maximum load limit of the transformer in grid $g$;

[0017] Charging pile constraint, expressed as: $\sum_{i \in \mathcal{V}_{g,t}} \left[ \mathbb{1}\left(P^{\text{ch}}_{i,g,t} > 0\right) + \mathbb{1}\left(P^{\text{dis}}_{i,g,t} > 0\right) \right] \le N_{g}$, where $P^{\text{ch}}_{i,g,t}$ and $P^{\text{dis}}_{i,g,t}$ denote the charging power and the discharge power of electric vehicle $i$ in grid $g$ at time $t$, and $N_{g}$ is the number of charging piles in grid $g$;

[0018] Charging and discharging are mutually exclusive and subject to rated limits, expressed as: $x^{\text{ch}}_{i,t} + x^{\text{dis}}_{i,t} \le 1$, $0 \le P^{\text{ch}}_{i,g,t} \le x^{\text{ch}}_{i,t}\, P^{\text{ch},\max}$, $0 \le P^{\text{dis}}_{i,g,t} \le x^{\text{dis}}_{i,t}\, P^{\text{dis},\max}$, where $P^{\text{ch},\max}$ and $P^{\text{dis},\max}$ are the upper limits of charging power and discharge power, respectively;

[0019] Departure target constraint, expressed as: $SOC^{\text{dep}}_{i} \ge SOC^{\text{tar}}_{i}$, where $SOC^{\text{dep}}_{i}$ is the charge level of electric vehicle $i$ when it leaves the grid and $SOC^{\text{tar}}_{i}$ is the target charge level of electric vehicle $i$;

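[0020] Low-charge, high-discharge constraint, expressed as: if $z^{\text{low}}_{g,t} = 1$, then $x^{\text{dis}}_{i,t} = 0$; if $z^{\text{high}}_{g,t} = 1$, then $x^{\text{ch}}_{i,t} = 0$, where $z^{\text{low}}_{g,t} = 1$ indicates that grid $g$ is a low-load charging zone at time $t$, and $z^{\text{high}}_{g,t} = 1$ indicates that grid $g$ is a high-load suitable discharge zone at time $t$;

[0021] Migration traffic constraint, which bounds $m_{g \to g',t}$, the number of vehicles traveling from grid $g$ to grid $g'$ at time $t$ to perform a discharge task;

[0022] Location change constraint, expressed as: if the electric vehicle is completing a migration order, that is, $x^{\text{mig}}_{i,t,g \to g'} = 1$ with $\ell_{i,t} = g$, then for the duration of the order the electric vehicle satisfies $x^{\text{ch}}_{i,t'} = x^{\text{dis}}_{i,t'} = 0$ for $t \le t' < t + \tau$, where $\ell_{i,t}$ denotes the location of electric vehicle $i$ at time $t$.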

[0023] The technical solution adopted by this invention to solve its technical problem is to provide a vehicle-to-grid (V2G) interactive collaborative scheduling device for electric vehicles based on deep reinforcement learning, comprising:

[0024] The construction module is used to build a scenario model of electric vehicles charging and discharging across multiple transformers. The scenario model discretizes the target scheduling area into multiple hexagonal grids to obtain a grid set. Each grid in the grid set contains at most one transformer.

[0025] The first definition module is used to define the load state of the transformer based on the scenario model;

[0026] The second definition module is used to define the state and actions of the electric vehicle based on the scenario model;

[0027] The training module uses the load state of the transformer and the state of the electric vehicle as the state space, and the actions of the electric vehicle as the action space, to design a reward function that includes the cost of charging and migrating, the benefit of discharging, and the equivalent benefit of peak shaving and valley filling. It constructs a partially observable Markov decision process (POMDP) and trains it using reinforcement learning until the policy converges, obtaining the optimal scheduling policy.

[0028] The scheduling module is used to deploy the optimal scheduling strategy to the scheduling center and output scheduling actions based on the real-time environmental status using a greedy strategy to complete the scheduling task of electric vehicles.

[0029] The load status of the transformer includes: the transformer's base load, the transformer's net load, and the transformer's area marking status.

[0030] The net load of the transformer is expressed as: $L_{g,t} = L^{\text{base}}_{g,t} + \sum_{i \in \mathcal{V}_{g,t}} \left( P^{\text{ch}}_{i,g,t} - P^{\text{dis}}_{i,g,t} \right)$, where $L_{g,t}$ is the net load of the transformer in grid $g$ (a grid containing a transformer) at time $t$, $L^{\text{base}}_{g,t}$ is the base load of that transformer at time $t$, $\mathcal{V}_{g,t}$ is the set of all vehicles in grid $g$ at time $t$, and $P^{\text{ch}}_{i,g,t}$ and $P^{\text{dis}}_{i,g,t}$ denote the charging power and the discharge power of electric vehicle $i$ in grid $g$ at time $t$.

[0031] The transformer area marking status is distinguished according to the load range of the expected transformer operation. When the transformer's base load is less than the lower limit of the load range, the transformer area marking status is a low-load charging zone; when the transformer's base load is greater than the upper limit of the load range, the transformer area marking status is a high-load suitable discharge zone.

[0032] The state of the electric vehicle includes the location and state of charge of the electric vehicle at time $t$; the actions of the electric vehicle include charging, discharging, and migrating to a target grid.

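[0033] The objective function for training using reinforcement learning is expressed as: $\max \; \omega_1 R - \omega_2 \Phi$, where $\max$ denotes taking the maximum value and $\omega_1$, $\omega_2$ are weighting coefficients. $R$ is the total revenue of all electric vehicles within the target scheduling area, expressed as $R = \sum_{t} \sum_{i} r_{i,t}$, where $r_{i,t}$ is the profit of electric vehicle $i$ at time $t$, expressed as: $r_{i,t} = -\, c^{\text{ch}}_{g,t}\, x^{\text{ch}}_{i,t}\, \eta^{\text{ch}} P^{\text{ch}}_{i,g,t}\, \tau + c^{\text{dis}}_{g,t}\, x^{\text{dis}}_{i,t}\, \eta^{\text{dis}} P^{\text{dis}}_{i,g,t}\, \tau - c^{\text{e}}\, x^{\text{mig}}_{i,t,g \to g'}\, e\, d_{g,g'}$. Here $c^{\text{ch}}_{g,t}$ is the unit price of charging in grid $g$ at time $t$; $x^{\text{ch}}_{i,t}$ indicates whether electric vehicle $i$ charges at time $t$; $\eta^{\text{ch}}$ is the charging efficiency; $P^{\text{ch}}_{i,g,t}$ is the charging power of electric vehicle $i$ in grid $g$ at time $t$; $\tau$ is the action time; $c^{\text{dis}}_{g,t}$ is the unit benefit of discharging in grid $g$ at time $t$; $x^{\text{dis}}_{i,t}$ indicates whether electric vehicle $i$ discharges at time $t$; $\eta^{\text{dis}}$ is the discharge efficiency; $P^{\text{dis}}_{i,g,t}$ is the discharge power of electric vehicle $i$ in grid $g$ at time $t$; $c^{\text{e}}$ is the unit price of the energy consumed by driving; $x^{\text{mig}}_{i,t,g \to g'}$ indicates whether electric vehicle $i$ migrates at time $t$ from the current grid $g$ to the target grid $g'$; $e$ is the energy consumption per unit distance; and $d_{g,g'}$ is the equivalent path length from the current grid $g$ to the target grid $g'$. $\Phi$ is the penalty term, expressed as: $\Phi = \sum_{t} \sum_{g} \left[ \max\left(0,\, L^{\text{lo}} - L_{g,t}\right) + \max\left(0,\, L_{g,t} - L^{\text{hi}}\right) \right]$, where $L_{g,t}$ is the net load of the transformer in grid $g$ at time $t$, and $L^{\text{lo}}$ and $L^{\text{hi}}$ are the lower and upper limits of the load range for the expected operation of the transformer, respectively.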

[0034] When using reinforcement learning for training, the following constraints must be met:

[0035] Transformer load constraint, expressed as: $L_{g,t} \le L^{\max}_{g}$, where $L_{g,t}$ is the net load of the transformer in grid $g$ at time $t$, and $L^{\max}_{g}$ is the maximum load limit of the transformer in grid $g$;

[0036] Charging pile constraint, expressed as: $\sum_{i \in \mathcal{V}_{g,t}} \left[ \mathbb{1}\left(P^{\text{ch}}_{i,g,t} > 0\right) + \mathbb{1}\left(P^{\text{dis}}_{i,g,t} > 0\right) \right] \le N_{g}$, where $P^{\text{ch}}_{i,g,t}$ and $P^{\text{dis}}_{i,g,t}$ denote the charging power and the discharge power of electric vehicle $i$ in grid $g$ at time $t$, and $N_{g}$ is the number of charging piles in grid $g$;

[0037] Charging and discharging are mutually exclusive and subject to rated limits, expressed as: $x^{\text{ch}}_{i,t} + x^{\text{dis}}_{i,t} \le 1$, $0 \le P^{\text{ch}}_{i,g,t} \le x^{\text{ch}}_{i,t}\, P^{\text{ch},\max}$, $0 \le P^{\text{dis}}_{i,g,t} \le x^{\text{dis}}_{i,t}\, P^{\text{dis},\max}$, where $P^{\text{ch},\max}$ and $P^{\text{dis},\max}$ are the upper limits of charging power and discharge power, respectively;

[0038] Departure target constraint, expressed as: $SOC^{\text{dep}}_{i} \ge SOC^{\text{tar}}_{i}$, where $SOC^{\text{dep}}_{i}$ is the charge level of electric vehicle $i$ when it leaves the grid and $SOC^{\text{tar}}_{i}$ is the target charge level of electric vehicle $i$;

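[0039] Low-charge, high-discharge constraint, expressed as: if $z^{\text{low}}_{g,t} = 1$, then $x^{\text{dis}}_{i,t} = 0$; if $z^{\text{high}}_{g,t} = 1$, then $x^{\text{ch}}_{i,t} = 0$, where $z^{\text{low}}_{g,t} = 1$ indicates that grid $g$ is a low-load charging zone at time $t$, and $z^{\text{high}}_{g,t} = 1$ indicates that grid $g$ is a high-load suitable discharge zone at time $t$;

[0040] Migration traffic constraint, which bounds $m_{g \to g',t}$, the number of vehicles traveling from grid $g$ to grid $g'$ at time $t$ to perform a discharge task;

[0041] Location change constraint, expressed as: if the electric vehicle is completing a migration order, that is, $x^{\text{mig}}_{i,t,g \to g'} = 1$ with $\ell_{i,t} = g$, then for the duration of the order the electric vehicle satisfies $x^{\text{ch}}_{i,t'} = x^{\text{dis}}_{i,t'} = 0$ for $t \le t' < t + \tau$, where $\ell_{i,t}$ denotes the location of electric vehicle $i$ at time $t$.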

[0042] The technical solution adopted by the present invention to solve its technical problem is to provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the above-mentioned vehicle-to-grid (V2G) interactive collaborative scheduling method for electric vehicles based on deep reinforcement learning.

[0043] The technical solution adopted by the present invention to solve its technical problem is to provide a computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the steps of the above-mentioned vehicle-to-grid (V2G) interactive collaborative scheduling method for electric vehicles based on deep reinforcement learning are implemented.

[0044] Beneficial effects

[0045] By adopting the above-mentioned technical solution, this invention has the following advantages and positive effects compared with the prior art. By discretizing the target scheduling region with a hexagonal grid, the invention reduces the system state dimension, adapts to large-scale vehicle / charging pile and multi-transformer scenarios, and scales well. It clearly defines the transformer load state and the state and actions of electric vehicles, and combines them with a reward function that integrates charging and migration costs, discharging benefits, and the equivalent benefit of peak shaving and valley filling to construct a partially observable Markov decision process (POMDP), whose optimal strategy is obtained through reinforcement learning training. This achieves the core objective of keeping the transformer load within the expected range and smoothing the load curve, while maximizing the economic benefits of EV owners participating in vehicle-to-grid interaction.

Attached Figure Description

[0046] Figure 1 is a flowchart of the vehicle-to-grid (V2G) interactive collaborative scheduling method for electric vehicles based on deep reinforcement learning according to the first embodiment of the present invention.

[0047] Figure 2 is a flowchart illustrating the training process using reinforcement learning in the first embodiment of the present invention.

Detailed Implementation

[0048] The present invention will be further illustrated below with reference to specific embodiments. It should be understood that these embodiments are for illustrative purposes only and are not intended to limit the scope of the invention. Furthermore, it should be understood that after reading the teachings of this invention, those skilled in the art can make various alterations or modifications to the invention, and these equivalent forms also fall within the scope defined by the appended claims.

[0049] The first embodiment of the present invention relates to a vehicle-to-grid (V2G) interactive collaborative scheduling method for electric vehicles based on deep reinforcement learning. Under constraints such as port capacity, SOC, and transformer load, this method guides EVs to charge at low-load transformers, to discharge at high-load transformers via V2G, and to migrate across regions as needed, achieving an orderly transfer of energy in space and time. The goal is to maintain the transformer load within the desired range, thereby reducing peak loads, smoothing the load curve, and improving the economic benefits of EVs participating in V2G interaction.

[0050] The method of this embodiment supports direct control of the charging and discharging behavior of EVs after they are connected to the grid, allowing the scheduling center to manage their discharge behavior. After connecting an EV to the grid, the user sets an estimated departure (off-grid) time $t^{\text{dep}}_{i}$ and a target charge level $SOC^{\text{tar}}_{i}$ for the vehicle. The dispatch center collects this information, along with data such as the initial charge level of the EVs. For other EVs within the target dispatch area that are not yet connected to the grid, the dispatch center uses the collected transformer load data for the area to translate the peak-shaving and valley-filling demand into a required number of EVs participating in vehicle-to-grid interaction, and then issues migration instructions to idle EVs. As shown in Figure 1, the specific steps include:

[0051] Step 1: Construct a scenario model of electric vehicles charging and discharging across multiple transformers.

[0052] This step models the scenario of optimizing the "charge-operation-discharge" process of large-scale EVs across multiple transformers, specifically as follows. The target scheduling region is discretized into multiple hexagonal grids, resulting in a grid set $\mathcal{G}$. Each grid $g \in \mathcal{G}$ contains at most one transformer, and a grid with a transformer has $N_{g}$ charging piles with V2G capability. The dispatch center's dispatch cycle is one day, divided into $T$ discrete time steps $t$, each of length $\Delta t$. The scenario model also includes the set of all vehicles $\mathcal{V}_{g,t}$ in each grid cell. The transformer in grid $g$ has a rated capacity $S_{g}$ and a base load $L^{\text{base}}_{g,t}$.
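The hexagonal discretization described above can be sketched as follows. This is a minimal illustration only: the axial coordinate system `(q, r)`, the names `HexCell`, `build_grid`, `hex_distance`, and `neighbors`, and the use of hex distance as a stand-in for the equivalent path length are all assumptions made for the example and are not taken from the patent text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Axial-coordinate offsets of the six neighbours of a hexagonal cell.
AXIAL_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


@dataclass(frozen=True)
class HexCell:
    q: int
    r: int
    # One optional field per cell, so a cell can hold at most one transformer.
    transformer_id: Optional[str] = None


def build_grid(radius: int,
               transformers: Dict[Tuple[int, int], str]) -> Dict[Tuple[int, int], HexCell]:
    """Build a hexagon-shaped region of hex cells around the origin (hypothetical layout)."""
    grid = {}
    for q in range(-radius, radius + 1):
        r_lo = max(-radius, -q - radius)
        r_hi = min(radius, -q + radius)
        for r in range(r_lo, r_hi + 1):
            grid[(q, r)] = HexCell(q, r, transformers.get((q, r)))
    return grid


def hex_distance(a: HexCell, b: HexCell) -> int:
    """Number of cell-to-cell steps between two cells in axial coordinates."""
    dq, dr = a.q - b.q, a.r - b.r
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def neighbors(cell: HexCell, grid: Dict[Tuple[int, int], HexCell]) -> List[HexCell]:
    """Adjacent cells that exist in the grid (candidate one-step migration targets)."""
    out = []
    for dq, dr in AXIAL_DIRECTIONS:
        key = (cell.q + dq, cell.r + dr)
        if key in grid:
            out.append(grid[key])
    return out


grid = build_grid(2, {(0, 0): "T1", (2, -1): "T2"})
cells_with_transformer = [c for c in grid.values() if c.transformer_id is not None]
```

Keying the grid by coordinate keeps lookups constant-time, which matters when many vehicles query neighbouring cells at every time step.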

[0053] Step 2: Define the load state of the transformer based on the scenario model.

[0054] In this embodiment, at the beginning of each time step the dispatch center collects the transformer's real-time base load $L^{\text{base}}_{g,t}$, the historical net load, the number of remaining available charging piles, the time- and zone-dependent unit charging price $c^{\text{ch}}_{g,t}$, the time- and zone-dependent unit discharge revenue $c^{\text{dis}}_{g,t}$, and so on.

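[0055] At any given time step, the net load of the transformer can be expressed as: $L_{g,t} = L^{\text{base}}_{g,t} + \sum_{i \in \mathcal{V}_{g,t}} \left( P^{\text{ch}}_{i,g,t} - P^{\text{dis}}_{i,g,t} \right)$, where $\mathcal{V}_{g,t}$ is the set of all vehicles within the grid containing the transformer, and $P^{\text{ch}}_{i,g,t}$ and $P^{\text{dis}}_{i,g,t}$ are the charging and discharging power, respectively.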

[0056] In this embodiment, the areas containing a transformer are divided into two classes according to the load range $[L^{\text{lo}}, L^{\text{hi}}]$ in which the system expects the transformer to operate: if $L^{\text{base}}_{g,t} < L^{\text{lo}}$, the area is marked as a low-load charging zone; if $L^{\text{base}}_{g,t} > L^{\text{hi}}$, the area is marked as a high-load suitable discharge zone. Together these form the zone label $z_{g,t}$. This label provides a directional prior for the subsequent "cross-regional energy channels (low-region charging, high-region discharging)" and serves as a state feature input during training.
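As a small worked illustration of Step 2, the net load (base load plus charging power minus discharging power) and the zone label can be computed as below. The function names, the label strings "low" / "high" / "normal", and the numeric values are hypothetical and chosen only for the example.

```python
from typing import Iterable


def net_load(base_load: float,
             charge_powers: Iterable[float],
             discharge_powers: Iterable[float]) -> float:
    """Net transformer load: base load + total EV charging - total EV discharging (kW)."""
    return base_load + sum(charge_powers) - sum(discharge_powers)


def zone_label(base_load: float, lower: float, upper: float) -> str:
    """Label a transformer area by comparing its base load with the desired band."""
    if base_load < lower:
        return "low"      # low-load charging zone
    if base_load > upper:
        return "high"     # high-load suitable discharge zone
    return "normal"       # inside the desired operating band


# Illustrative numbers: desired band 250..400 kW.
load = net_load(300.0, charge_powers=[7.0, 7.0], discharge_powers=[10.0])
labels = [zone_label(b, 250.0, 400.0) for b in (200.0, 300.0, 450.0)]
```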

[0057] Step 3: Define the state and actions of the electric vehicle based on the scenario model.

[0058] This step models the electric vehicle, including its state and actions. The state records, for each vehicle $i$ within each grid cell, its position $\ell_{i,t}$ at time step $t$ and its SOC state $SOC_{i,t}$. Based on the charging / discharging schedule or migration strategy that the dispatch center formulates for all EVs within the region, each EV performs one of three types of actions: charging, discharging, or migrating to a target grid. These actions can therefore be encoded with the binary variables $x^{\text{ch}}_{i,t}$, $x^{\text{dis}}_{i,t}$, and $x^{\text{mig}}_{i,t,g \to g'}$, which indicate whether vehicle $i$ at time step $t$ charges, discharges, or migrates from the current grid $g$ to grid $g'$. The evolution of the vehicle's SOC state can then be represented as:

$SOC_{i,t+\tau} = SOC_{i,t} + x^{\text{ch}}_{i,t}\, \eta^{\text{ch}} P^{\text{ch}}_{i,g,t}\, \tau - x^{\text{dis}}_{i,t}\, P^{\text{dis}}_{i,g,t}\, \tau / \eta^{\text{dis}} - x^{\text{mig}}_{i,t,g \to g'}\, e\, d_{g,g'}$

[0059] where $\tau$ is the number of time steps required to complete the action, $\eta^{\text{ch}}$ and $\eta^{\text{dis}}$ are the charging and discharging efficiencies, respectively, $e$ is the energy consumption per unit distance, and $d_{g,g'}$ is the equivalent path length from grid $g$ to grid $g'$.
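A minimal sketch of the SOC evolution in Step 3 follows. The source text for the equation is incomplete, so several points here are assumptions: the SOC is tracked as stored energy in kWh, the action duration is given in hours, charging adds energy scaled by the charging efficiency, discharging removes energy divided by the discharging efficiency, and the result is clipped to a hypothetical battery capacity. All names and default values are illustrative.

```python
def next_soc_kwh(soc_kwh: float,
                 action: str,
                 power_kw: float = 0.0,
                 duration_h: float = 0.0,
                 eta_ch: float = 0.95,
                 eta_dis: float = 0.95,
                 energy_per_km: float = 0.15,
                 distance_km: float = 0.0,
                 capacity_kwh: float = 60.0) -> float:
    """Stored energy after one action; exactly one action type applies per step."""
    if action == "charge":
        soc = soc_kwh + eta_ch * power_kw * duration_h
    elif action == "discharge":
        # Assumption: power_kw is the grid-side power, so the battery loses more.
        soc = soc_kwh - power_kw * duration_h / eta_dis
    elif action == "migrate":
        soc = soc_kwh - energy_per_km * distance_km
    else:
        raise ValueError(f"unknown action: {action}")
    # Assumption: keep the result within the physical battery limits.
    return min(max(soc, 0.0), capacity_kwh)


after_charge = next_soc_kwh(30.0, "charge", power_kw=10.0, duration_h=1.0, eta_ch=0.9)
after_discharge = next_soc_kwh(30.0, "discharge", power_kw=9.0, duration_h=1.0, eta_dis=0.9)
after_migration = next_soc_kwh(30.0, "migrate", energy_per_km=0.2, distance_km=10.0)
```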

[0060] Step 4: Using the load state of the transformer and the state of the electric vehicle as the state space, and the actions of the electric vehicle as the action space, design a reward function that includes the cost of charging and migrating, the discharge benefit, and the equivalent benefit of peak shaving and valley filling. Construct a partially observable Markov decision process (POMDP) and train it using reinforcement learning until the policy converges to obtain the optimal scheduling policy.

[0061] In this step, the problem of scheduling EVs to regulate the load of transformers in the target area is formulated as a POMDP:

[0062] 1) State space: the concatenation of the time step $t$, the transformer load state $s^{\text{T}}_{t}$, and the vehicle state $s^{\text{V}}_{t}$: $s_{t} = \left(t,\, s^{\text{T}}_{t},\, s^{\text{V}}_{t}\right)$.

[0063] 2) Action Space: Each EV has three types of executable actions: charging, discharging, or migrating to the target grid.

[0064] 3) State transition: After the dispatch center assigns a strategy to each EV in the region, the transformer net load, the base load, the EV's destination, the time $\tau$ required to complete the action, and the per-step reward $r_{i,t}$ for completing the action can all be determined. Furthermore, after the transformer load is updated, the low-load charging zone and high-load suitable discharge zone labels are updated according to the load state. After the vehicle completes its action, its grid position remains unchanged if it performed a charging or discharging action; if it performed a migration action, its position is updated to the target grid of that dispatch, and the vehicle's SOC state is updated according to Step 3. After the EV completes its action, its state transitions from $s_{i,t}$ to $s_{i,t+\tau}$.

[0065] 4) Reward function: To simultaneously achieve the goals of "peak shaving and valley filling, and maximizing owner benefits", the reward for EV actions includes not only the costs of charging and migration and the benefits of discharging, but also the equivalent benefit of "charging at low-load transformers and discharging at high-load transformers". For any EV that executes its assigned scheduling task, its immediate reward $r_{i,t}$ is calculated, and the candidate value $y_{i,t}$ is defined from the discounted cumulative value. If the task is completed and the state transitions to the terminal state, then $y_{i,t} = r_{i,t}$; otherwise $y_{i,t} = r_{i,t} + \gamma\, V_{\theta^{-}}\left(s_{i,t+\tau}\right)$, where $V_{\theta^{-}}$ is the target network, used to stabilize training, and $\gamma$ is the discount factor. In the implementation, a scheduling policy $\pi$ is generated for all EVs, and the value is then calculated according to the above formula; its specific form is consistent with the TD target.

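[0066] In this implementation, the scheduling center needs to consider two objectives when assigning strategies to each EV within a region: first, to ensure that the assigned strategies achieve the effect of "peak shaving and valley filling"; and second, to maximize the benefits for vehicle owners. The strategy layer provides a regulation target $[L^{\text{lo}}, L^{\text{hi}}]$ for each transformer at each time step. To measure the cost of deviating from the target band, the following penalty term is used:

[0067] $\Phi = \sum_{t} \sum_{g} \left[ \max\left(0,\, L^{\text{lo}} - L_{g,t}\right) + \max\left(0,\, L_{g,t} - L^{\text{hi}}\right) \right]$.

[0068] The return of each EV $i$ at time step $t$ is:

[0069] $r_{i,t} = -\, c^{\text{ch}}_{g,t}\, x^{\text{ch}}_{i,t}\, \eta^{\text{ch}} P^{\text{ch}}_{i,g,t}\, \tau + c^{\text{dis}}_{g,t}\, x^{\text{dis}}_{i,t}\, \eta^{\text{dis}} P^{\text{dis}}_{i,g,t}\, \tau - c^{\text{e}}\, x^{\text{mig}}_{i,t,g \to g'}\, e\, d_{g,g'}$

[0070] where $c^{\text{e}}$ is the unit price of the energy consumed by driving. The total revenue of all EVs within the target scheduling area is $R = \sum_{t} \sum_{i} r_{i,t}$. Therefore, the overall optimization objective can be expressed as:

[0071] $\max \; \omega_1 R - \omega_2 \Phi$,

[0072] where $\omega_1$ and $\omega_2$ are weighting parameters used to balance the benefits of the electric vehicles against the regulation targets of the transformers.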
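The penalty, per-EV return, and weighted objective described in paragraphs [0066] to [0072] can be sketched as follows. The exact formulas are not fully recoverable from the source, so this example assumes a hinge-style band penalty (distance of the net load outside the desired band), a per-EV return built from the listed factors, and a weighted difference for the objective. All function names, parameter names, and numbers are hypothetical.

```python
from typing import Iterable


def band_penalty(net_load: float, lower: float, upper: float) -> float:
    """Assumed penalty: how far the net load lies outside the desired band [lower, upper]."""
    return max(0.0, lower - net_load) + max(0.0, net_load - upper)


def ev_step_return(action: str, *,
                   price: float = 0.0,          # unit charging price or unit discharge revenue
                   power_kw: float = 0.0,
                   duration_h: float = 0.0,
                   efficiency: float = 1.0,
                   energy_price: float = 0.0,   # unit price of driving energy
                   energy_per_km: float = 0.0,
                   distance_km: float = 0.0) -> float:
    """Return of one EV for one action: charging and migration cost money, discharging earns."""
    if action == "charge":
        return -price * efficiency * power_kw * duration_h
    if action == "discharge":
        return price * efficiency * power_kw * duration_h
    if action == "migrate":
        return -energy_price * energy_per_km * distance_km
    raise ValueError(f"unknown action: {action}")


def objective(ev_returns: Iterable[float], penalties: Iterable[float],
              w_revenue: float = 1.0, w_penalty: float = 1.0) -> float:
    """Weighted objective: total EV revenue minus weighted band-deviation penalty."""
    return w_revenue * sum(ev_returns) - w_penalty * sum(penalties)


returns = [
    ev_step_return("discharge", price=1.0, power_kw=10.0, duration_h=1.0, efficiency=0.9),
    ev_step_return("migrate", energy_price=1.0, energy_per_km=0.2, distance_km=10.0),
]
penalties = [band_penalty(450.0, 250.0, 400.0), band_penalty(300.0, 250.0, 400.0)]
score = objective(returns, penalties, w_revenue=1.0, w_penalty=0.1)
```

Raising `w_penalty` relative to `w_revenue` pushes the schedule toward keeping transformers inside the band at the expense of owner revenue.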

[0073] In addition, it is subject to the following conditions:

[0074] 1) Transformer load constraint, expressed as:

[0075] $L_{g,t} \le L^{\max}_{g}$

[0076] where $L^{\max}_{g}$ is the upper limit of the transformer load. This constraint means that the transformer cannot be overloaded.

[0077] 2) Charging pile constraint, expressed as:

[0078] $\sum_{i \in \mathcal{V}_{g,t}} \left[ \mathbb{1}\left(P^{\text{ch}}_{i,g,t} > 0\right) + \mathbb{1}\left(P^{\text{dis}}_{i,g,t} > 0\right) \right] \le N_{g}$

[0079] This constraint means that the total number of EVs charging or discharging in each area with a transformer cannot exceed the total number of charging piles.

[0080] 3) Charging and discharging are mutually exclusive and subject to rated limits, expressed as:

[0081] $x^{\text{ch}}_{i,t} + x^{\text{dis}}_{i,t} \le 1$, $0 \le P^{\text{ch}}_{i,g,t} \le x^{\text{ch}}_{i,t}\, P^{\text{ch},\max}$, $0 \le P^{\text{dis}}_{i,g,t} \le x^{\text{dis}}_{i,t}\, P^{\text{dis},\max}$

[0082] where $P^{\text{ch},\max}$ and $P^{\text{dis},\max}$ are the upper limits of charging and discharging power, respectively.

[0083] 4) Departure target constraint, expressed as:

[0084] $SOC^{\text{dep}}_{i} \ge SOC^{\text{tar}}_{i}$.

[0085] This constraint ensures that the target SOC is met when the user leaves the site.

[0086] 5) Low-charge, high-discharge constraint, expressed as: if $z^{\text{low}}_{g,t} = 1$, then $x^{\text{dis}}_{i,t} = 0$; if $z^{\text{high}}_{g,t} = 1$, then $x^{\text{ch}}_{i,t} = 0$.

[0087] 6) Migration traffic constraint: the number of vehicles $m_{g \to g',t}$ traveling at time step $t$ from grid $g$ to grid $g'$ to perform a discharge task must satisfy:

[0088]

[0089] 7) Location change constraint, expressed as: if the EV is completing a migration dispatch, i.e. $x^{\text{mig}}_{i,t,g \to g'} = 1$, then charging and discharging are prohibited while the order is being completed: $x^{\text{ch}}_{i,t'} = x^{\text{dis}}_{i,t'} = 0$ for $t \le t' < t + \tau$.
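A feasibility check over a candidate set of actions for one transformer grid illustrates how several of the constraints above could be enforced before an instruction is issued. This is a sketch under assumptions: each EV contributes exactly one `(kind, power_kw)` entry, which makes charging and discharging mutually exclusive by construction, and only constraints 1), 2), 3), and 5) are covered. The function name and the violation labels are hypothetical.

```python
from typing import List, Tuple


def violated_constraints(base_load: float,
                         actions: List[Tuple[str, float]],
                         max_load: float,
                         num_piles: int,
                         p_ch_max: float,
                         p_dis_max: float,
                         zone: str) -> List[str]:
    """Return the names of violated constraints; an empty list means feasible.

    actions holds one (kind, power_kw) pair per EV using a pile,
    with kind equal to "charge" or "discharge".
    """
    charge = [p for kind, p in actions if kind == "charge"]
    discharge = [p for kind, p in actions if kind == "discharge"]
    violations = []

    # 1) Transformer must not be overloaded.
    if base_load + sum(charge) - sum(discharge) > max_load:
        violations.append("transformer_overload")
    # 2) EVs charging or discharging cannot exceed the number of piles.
    if len(charge) + len(discharge) > num_piles:
        violations.append("pile_capacity")
    # 3) Rated power limits for charging and discharging.
    if any(p < 0 or p > p_ch_max for p in charge) or \
       any(p < 0 or p > p_dis_max for p in discharge):
        violations.append("rated_power")
    # 5) Low zone: no discharging; high zone: no charging.
    if zone == "low" and discharge:
        violations.append("low_zone_discharge")
    if zone == "high" and charge:
        violations.append("high_zone_charge")
    return violations


overloaded = violated_constraints(380.0, [("charge", 30.0)], 400.0, 2, 20.0, 20.0, "normal")
wrong_zone = violated_constraints(200.0, [("discharge", 10.0)], 400.0, 2, 20.0, 20.0, "low")
feasible = violated_constraints(420.0, [("discharge", 15.0), ("discharge", 15.0)],
                                400.0, 2, 20.0, 20.0, "high")
```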

[0090] As shown in Figure 2, the training process for reinforcement learning in this embodiment is as follows:

[0091] First, initialize the grid, model, and experience replay pool. The target region is divided into hexagonal grids, and the set of grids, transformers, and vehicles is established. The online network and target network are initialized, and a region-level shared experience replay pool is maintained for subsequent TD estimation and parameter updates.

[0092] Next, at each time step, information such as base load, historical net load, charging pile availability, and EV status is collected to obtain the "low-load charging zone" and "high-load suitable discharge zone" labels, providing prior information for cross-regional energy channels.

[0093] Then, executable instructions for each EV are generated with an $\varepsilon$-greedy strategy. Under the constraints given above, a random exploratory action is generated with probability $\varepsilon$, and with probability $1 - \varepsilon$ the greedy action is selected by solving the objective function; $\varepsilon$ is reduced from 1 to 0.1 in steps and then held.

[0094] Afterwards, each EV performs charging, discharging, or inter-zone migration actions according to the generated strategy. After execution, the environment advances and generates a new state: the vehicle position and SOC are updated according to the state evolution in step 3, and the transformer's net load and the "low load charging zone" and "high load discharge zone" markers are refreshed accordingly.

[0095] Then, the immediate return is calculated for each EV that has performed its task. The immediate return includes (i) charging and migration costs, (ii) V2G discharge benefits, and (iii) the equivalent benefit of "low-load charging / high-load discharging". The transition data is written to the experience replay pool, and a TD target is constructed from the discounted cumulative value to stabilize training. Specifically, for the policy $\pi$ given by the scheduling center, the value is estimated with the TD update, which is expressed as follows:

[0096] $V_{\theta}\left(s_{i,t}\right) \leftarrow V_{\theta}\left(s_{i,t}\right) + \alpha \left[ r_{i,t} + \gamma\, V_{\theta^{-}}\left(s_{i,t+\tau}\right) - V_{\theta}\left(s_{i,t}\right) \right]$

[0097] where $V_{\theta}$ is the online network, $V_{\theta^{-}}$ is the target network, $\gamma$ is the discount factor, and $\alpha$ is the temporal-difference step size.
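The temporal-difference update can be illustrated with a tabular stand-in for the value network. This is a simplification for exposition: the source uses a neural network with a separate target network, whereas this sketch stores values in a single dictionary and bootstraps from it directly. The function names, state keys, and numeric values are hypothetical.

```python
from typing import Dict, Hashable, Optional


def td_target(reward: float, next_value: float, gamma: float, terminal: bool) -> float:
    """TD target: the reward alone at a terminal state, otherwise reward + discounted next value."""
    return reward if terminal else reward + gamma * next_value


def td_update(values: Dict[Hashable, float],
              state: Hashable,
              reward: float,
              next_state: Optional[Hashable],
              alpha: float = 0.1,
              gamma: float = 0.9,
              terminal: bool = False) -> float:
    """Move V(state) a fraction alpha toward the TD target and return the new value."""
    next_value = 0.0 if terminal else values.get(next_state, 0.0)
    target = td_target(reward, next_value, gamma, terminal)
    old = values.get(state, 0.0)
    values[state] = old + alpha * (target - old)
    return values[state]


# States are illustrative (time step, grid id, SOC bucket) tuples.
values = {(1, "g2", 5): 10.0}
v_first = td_update(values, (0, "g1", 4), reward=1.0, next_state=(1, "g2", 5),
                    alpha=0.5, gamma=0.9)
v_last = td_update(values, (1, "g2", 5), reward=2.0, next_state=None,
                   alpha=0.5, terminal=True)
```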

[0098] When the experience replay pool is full, a small batch is randomly sampled from the replay pool, the loss function is calculated, and the online network parameters are updated to minimize the TD loss; after updating the online network parameters C times, the target network parameters are synchronized with the online network parameters.

[0099] Specifically, training is performed once the experience replay pool is full. During training, $B$ samples are randomly drawn from the pool to form a mini-batch, which is used to calculate the loss function

[0100] $\mathcal{L}(\theta) = \frac{1}{B} \sum_{b=1}^{B} \left( y_{b} - V_{\theta}\left(s_{b}\right) \right)^{2}$,

[0101] and the parameters are updated to minimize the loss function:

[0102] $\theta \leftarrow \theta - \beta\, \nabla_{\theta} \mathcal{L}(\theta)$,

[0103] where $\beta$ is the learning rate. In the neural network implementation, the target network is used to calculate the mini-batch TD target value:

[0104] $y_{b} = r_{b} + \gamma\, V_{\theta^{-}}\left(s'_{b}\right)$

[0105] where the target network parameters $\theta^{-}$ are synchronized with the online network once every $C$ updates. To ensure continuous exploration, the strategy is executed $\varepsilon$-greedily: for each grid cell, with probability $\varepsilon$ actions are randomly generated for exploration under the constraints given above, and with probability $1 - \varepsilon$ greedy optimization is performed by solving the objective in Step 4 under the same constraints. $\varepsilon$ decreases from 1 to 0.1 in increments and then remains unchanged.
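The exploration schedule and the periodic target synchronization can be sketched as follows. The linear shape of the decay, the `decay_steps` parameter, and all function names are assumptions made for illustration; the source only states that the exploration probability falls from 1 to 0.1 in steps and is then held, and that the target network is synchronized every C updates.

```python
import random
from typing import Dict


def epsilon_at(step: int, start: float = 1.0, end: float = 0.1,
               decay_steps: int = 1000) -> float:
    """Exploration probability: decays linearly from start to end, then stays at end."""
    if step >= decay_steps:
        return end
    return start + (end - start) * (step / decay_steps)


def select_action(action_values: Dict[str, float], epsilon: float,
                  rng: random.Random) -> str:
    """Epsilon-greedy choice over the feasible actions given as keys of action_values."""
    actions = list(action_values)
    if rng.random() < epsilon:
        return rng.choice(actions)               # explore
    return max(actions, key=action_values.get)   # exploit


def should_sync_target(update_count: int, sync_every: int) -> bool:
    """True on every sync_every-th online update (copy online weights to the target)."""
    return update_count > 0 and update_count % sync_every == 0


rng = random.Random(0)
feasible_values = {"charge": 0.2, "discharge": 1.5, "migrate": -0.3}
greedy_choice = select_action(feasible_values, epsilon=0.0, rng=rng)
explored_choice = select_action(feasible_values, epsilon=1.0, rng=rng)
```

Only actions that already passed the constraint check should be placed in `action_values`, so that both the random and the greedy branch stay feasible.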

[0106] If the scheduling horizon is exhausted and the episode termination condition is met, the current episode ends; otherwise, the process returns to the collection and update step and rolls on to the next time step. Throughout this process, strategies are always assigned in pursuit of "peak shaving and valley filling and maximizing owner benefits", and they satisfy the constraints: no transformer overload, limited pile capacity, mutually exclusive and rated charging and discharging, user target SOC achieved, low-zone charging and high-zone discharging, and migration traffic.

[0107] Step 5: Deploy the optimal scheduling strategy to the scheduling center, and use a greedy strategy to output scheduling actions based on the real-time environmental status to complete the scheduling task of electric vehicles.

[0108] As can be seen, this embodiment models the joint scheduling problem as a partially observable Markov decision process (POMDP). To reduce the system state dimension and improve scalability, the scheduling center divides the target area into multiple hexagonal grids and implements distributed collaborative scheduling within the grids. Taking each hexagonal grid as a node, a scheduling scheme is given for all idle EVs within each node. The "state value function" of a single EV is approximated by a neural network to evaluate the future returns of the vehicle at different time steps, locations, and SOCs. Based on the current set of available tasks (charging / discharging / migrating), an optimization problem is constructed with the goal of "peak shaving and valley filling + maximizing owner benefits", which yields the "vehicle-task" matching within the grid in a single solve.
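To make the in-grid "vehicle-task" matching concrete, the following Python sketch picks the one-to-one assignment with the highest total score by brute force. The score matrix, the function name `match_vehicles_to_tasks`, and the brute-force search are illustrative assumptions; the source does not state which solver is used, and a practical system would likely use a dedicated assignment solver for larger grids.

```python
from itertools import permutations


def match_vehicles_to_tasks(scores):
    """Return the one-to-one vehicle -> task assignment with the highest total score.

    scores[v][k] is the estimated value of giving task k to vehicle v.
    Brute force over permutations; only suitable for the handful of
    vehicles inside one grid cell.
    """
    n_vehicles, n_tasks = len(scores), len(scores[0])
    if n_vehicles > n_tasks:
        raise ValueError("this sketch assumes at least as many tasks as vehicles")
    best_total, best_assignment = float("-inf"), None
    for tasks in permutations(range(n_tasks), n_vehicles):
        total = sum(scores[v][k] for v, k in enumerate(tasks))
        if total > best_total:
            best_total, best_assignment = total, tasks
    return list(best_assignment), best_total


# Two idle EVs, three candidate tasks (e.g. charge, discharge, migrate); values are made up.
scores = [
    [4.0, 1.0, 2.0],
    [3.0, 0.5, 2.5],
]
assignment, total = match_vehicles_to_tasks(scores)
```

In this toy case the first vehicle takes task 0 and the second takes task 2: both vehicles prefer task 0, and giving it to the first costs the second only 0.5 in score.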

[0109] The present invention will be further illustrated by a specific embodiment below.

[0110] This embodiment selects an urban power distribution area containing three public distribution transformers deployed in three hexagonal grid nodes, surrounded by several grids without transformers that serve as vehicle passage and parking areas. The dispatch center makes discrete decisions over a 24-hour rolling horizon with a fixed step size. At the beginning of each step, it collects the base load, historical net load, number of available charging piles, and time-of-use electricity price / discharge revenue for each transformer, and on this basis divides the grids with transformers into "low-charging zones" and "high-discharge zones". During the online phase, each EV is given three action options: charge or discharge in its current grid, or migrate to a target grid with a transformer. The state of charge (SOC) of the vehicle after completing the action is updated based on efficiency, mileage energy consumption, and equivalent distance. The dispatch problem is modeled as a POMDP, with the state concatenated as "time-load-vehicle"; the reward takes into account the revenue from electricity purchase and sale, migration costs, and the equivalent revenue of "low-charging and high-discharge". Under constraints such as "no overload, maximum pile positions / rated power, charging and discharging mutual exclusion, target SOC at departure, migration flow, and prohibition of charging and discharging during migration", the dispatch center generates the dispatch order for the current step through a hybrid solution of "value assessment + vehicle-task matching". After execution, the transition tuple is written into the shared replay pool, the parameters are updated according to the TD target and the target network, and exploration and exploitation continue under the constrained ε-greedy strategy. For nodes marked as "high-discharge zones" during the evening peak (such as around 18:00), the algorithm prioritizes assigning nearby vehicles with available discharge capacity to perform V2G discharge, while simultaneously dispatching idle vehicles from "low-charging zones" to provide reinforcement; during the nighttime off-peak hours, vehicles concentrate on recharging in "low-charging zones", forming a stable cross-regional energy channel.
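The SOC update mentioned in this embodiment can be sketched as below. The source only says that the SOC is updated from efficiency, mileage energy consumption, and equivalent distance; the specific formulas, the function name `update_soc`, and every default value here are assumptions chosen for illustration.

```python
def update_soc(soc, capacity_kwh, action, power_kw=0.0, hours=0.0,
               efficiency=0.9, kwh_per_km=0.15, distance_km=0.0):
    """Return the SOC (0..1) after one action; the energy conventions are assumed."""
    if action == "charge":
        delta_kwh = efficiency * power_kw * hours      # energy actually stored
    elif action == "discharge":
        delta_kwh = -power_kw * hours / efficiency     # battery energy drawn to deliver power_kw
    elif action == "migrate":
        delta_kwh = -kwh_per_km * distance_km          # driving energy over the equivalent distance
    else:
        raise ValueError(f"unknown action: {action}")
    return min(1.0, max(0.0, soc + delta_kwh / capacity_kwh))


soc_after_charge = update_soc(0.5, 60.0, "charge", power_kw=10.0, hours=1.0)
soc_after_move = update_soc(0.5, 60.0, "migrate", distance_km=20.0)
```

Clamping to the range 0 to 1 keeps the sketch well defined; a real scheduler would instead reject actions that would drive the SOC outside its permitted range.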

[0111] Therefore, by discretizing the target scheduling region with a hexagonal grid, this invention effectively reduces the system state dimension, which makes it suitable for scenarios with large numbers of vehicles, charging piles, and transformers and gives it good scalability. It clearly defines the transformer load state and the state and actions of electric vehicles, and combines them with a reward function that integrates charging and migration costs, discharging benefits, and the equivalent benefits of peak shaving and valley filling to construct a partially observable Markov decision process and obtain the optimal strategy through reinforcement learning training. This achieves the core objectives of keeping the transformer load within the desired range and smoothing the load curve, while maximizing the economic benefits of EV owners participating in vehicle-to-grid interaction.

[0112] The second embodiment of the present invention relates to an electric vehicle-to-grid (V2G) interactive and cooperative scheduling device based on deep reinforcement learning, comprising:

[0113] The construction module is used to build a scenario model of electric vehicles charging and discharging across multiple transformers. The scenario model discretizes the target scheduling area into multiple hexagonal grids to obtain a grid set. Each grid in the grid set contains at most one transformer.

[0114] The first definition module is used to define the load state of the transformer based on the scenario model;

[0115] The second definition module is used to define the state and actions of the electric vehicle based on the scenario model;

[0116] The training module is used to take the load state of the transformer and the state of the electric vehicle as the state space and the actions of the electric vehicle as the action space, design a reward function that includes the cost of charging and migration, the benefit of discharging, and the equivalent benefit of peak shaving and valley filling, construct a partially observable Markov decision process, and train it using reinforcement learning until the policy converges, thereby obtaining the optimal scheduling policy.

[0117] The scheduling module is used to deploy the optimal scheduling strategy to the scheduling center and output scheduling actions based on the real-time environmental status using a greedy strategy to complete the scheduling task of electric vehicles.

[0118] The load status of the transformer includes: the transformer's base load, the transformer's net load, and the transformer's area marking status.

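[0119] The net load of the transformer is expressed as follows: $P^{\mathrm{net}}_{g,t} = P^{\mathrm{base}}_{g,t} + \sum_{i \in \mathcal{V}_{g,t}} \left(P^{\mathrm{ch}}_{i,g,t} - P^{\mathrm{dis}}_{i,g,t}\right)$, where $P^{\mathrm{net}}_{g,t}$ is the net load of the transformer in grid $g$ with a transformer at time $t$, $P^{\mathrm{base}}_{g,t}$ is the base load of the transformer in grid $g$ with a transformer at time $t$, $\mathcal{V}_{g,t}$ is the set of all vehicles in grid $g$ with a transformer at time $t$, $P^{\mathrm{ch}}_{i,g,t}$ denotes the charging power of electric vehicle $i$ in grid $g$ with a transformer at time $t$, and $P^{\mathrm{dis}}_{i,g,t}$ denotes the discharging power of electric vehicle $i$ in grid $g$ with a transformer at time $t$.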

[0120] The transformer area marking status is distinguished according to the load range of the expected transformer operation. When the transformer's base load is less than the lower limit of the load range, the transformer area marking status is a low-load charging zone; when the transformer's base load is greater than the upper limit of the load range, the transformer area marking status is a high-load suitable discharge zone.
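The transformer load state defined above (base load, net load, and area marking) can be computed as in the following sketch. The function names, the kW units, and the label strings are assumptions; in particular, the source does not name the case in which the base load lies inside the expected range, so `"unmarked"` is a placeholder.

```python
def net_load(base_load_kw, charging_kw, discharging_kw):
    """Base load plus total EV charging power minus total EV discharging power."""
    return base_load_kw + sum(charging_kw) - sum(discharging_kw)


def zone_label(base_load_kw, lower_kw, upper_kw):
    """Mark a transformer grid by comparing its base load with the expected load range."""
    if base_load_kw < lower_kw:
        return "low_load_charging"
    if base_load_kw > upper_kw:
        return "high_load_discharging"
    return "unmarked"  # placeholder name; the source does not name the in-range case


load = net_load(400.0, charging_kw=[7.0, 11.0], discharging_kw=[10.0])
label = zone_label(520.0, lower_kw=200.0, upper_kw=500.0)
```

Note that, following the text, the marking is derived from the base load rather than from the net load, so it does not change as EVs are scheduled within the step.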

[0121] The state of the electric vehicle includes: the location and the state of charge of the electric vehicle at time $t$; the actions of the electric vehicle include: charging, discharging, and migrating to the target grid.

[0122] The objective function for training using reinforcement learning takes the maximum of a weighted combination, with weighting coefficients, of the total revenue of all electric vehicles within the target scheduling area and a penalty term. The total revenue is expressed in terms of the profit of each electric vehicle at each time. The profit of an electric vehicle at a given time is expressed in terms of the following quantities: the unit price of charging in the grid at that time; an indicator of whether the electric vehicle performs charging at that time; the charging efficiency; the charging power of the electric vehicle in the grid with a transformer at that time; the action duration; the unit benefit of discharging in the grid at that time; an indicator of whether the electric vehicle performs discharging at that time; the discharge efficiency; the discharge power of the electric vehicle in the grid with a transformer at that time; the unit price of the energy consumed by the electric vehicle; an indicator of whether the electric vehicle migrates at that time from its current grid to a target grid; the energy consumption per unit distance; and the equivalent path from the current grid to the target grid. The penalty term is expressed in terms of the net load of the transformer in each grid with a transformer at each time and the lower and upper limits of the load range for the expected operation of the transformer.
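One way the quantities listed above could combine into a per-vehicle profit and an overall objective is sketched below. This is an assumed form, not the formula of the application: the multiplicative placement of the efficiencies, the shape of the penalty (distance of the net load outside the expected band), the two weights, and all names and numbers are hypothetical.

```python
def ev_profit(charging, discharging, migrating, charge_price, discharge_price,
              p_ch_kw=0.0, p_dis_kw=0.0, dt_h=1.0, eta_ch=1.0, eta_dis=1.0,
              energy_price=0.0, kwh_per_km=0.0, distance_km=0.0):
    """Profit of one EV for one step: discharge benefit minus charging and migration costs.

    charging / discharging / migrating are 0-1 indicators. How the efficiencies
    enter the terms is an assumption of this sketch.
    """
    charge_cost = charge_price * charging * eta_ch * p_ch_kw * dt_h
    discharge_benefit = discharge_price * discharging * eta_dis * p_dis_kw * dt_h
    migration_cost = energy_price * migrating * kwh_per_km * distance_km
    return discharge_benefit - charge_cost - migration_cost


def band_penalty(net_load_kw, lower_kw, upper_kw):
    """Distance of the net load outside the expected band (assumed penalty shape)."""
    return max(0.0, lower_kw - net_load_kw) + max(0.0, net_load_kw - upper_kw)


def objective(profits, net_loads_kw, lower_kw, upper_kw, w_revenue=1.0, w_penalty=1.0):
    """Weighted total revenue minus weighted band-violation penalty."""
    penalty = sum(band_penalty(p, lower_kw, upper_kw) for p in net_loads_kw)
    return w_revenue * sum(profits) - w_penalty * penalty


profits = [
    ev_profit(0, 1, 0, charge_price=0.5, discharge_price=1.0, p_dis_kw=10.0, dt_h=0.5),
    ev_profit(1, 0, 0, charge_price=0.5, discharge_price=1.0, p_ch_kw=8.0, dt_h=0.5),
]
score = objective(profits, net_loads_kw=[520.0, 300.0], lower_kw=200.0, upper_kw=500.0,
                  w_penalty=0.1)
```

The weights trade owner revenue against load smoothing: raising `w_penalty` makes the scheduler accept lower revenue in exchange for keeping transformers inside the expected band.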

[0123] When using reinforcement learning for training, the following constraints must be met:

[0124] Transformer load constraint, expressed as: $P^{\mathrm{net}}_{g,t} \le P^{\max}_{g}$, where $P^{\mathrm{net}}_{g,t}$ is the net load of the transformer in grid $g$ with a transformer at time $t$, and $P^{\max}_{g}$ is the maximum load limit of the transformer in grid $g$ with a transformer;

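[0125] Charging pile constraint: in each grid $g$ with a transformer at each time $t$, the number of electric vehicles $i$ with $P^{\mathrm{ch}}_{i,g,t} > 0$ or $P^{\mathrm{dis}}_{i,g,t} > 0$ does not exceed $N_g$, where $P^{\mathrm{ch}}_{i,g,t}$ denotes the charging power of electric vehicle $i$ in grid $g$ with a transformer at time $t$, $P^{\mathrm{dis}}_{i,g,t}$ denotes the discharging power of electric vehicle $i$ in grid $g$ with a transformer at time $t$, and $N_g$ is the number of charging piles in grid $g$ with a transformer;

[0126] Charging and discharging mutual exclusion and rated power constraint, expressed as: $P^{\mathrm{ch}}_{i,g,t} \cdot P^{\mathrm{dis}}_{i,g,t} = 0$, $0 \le P^{\mathrm{ch}}_{i,g,t} \le \bar{P}^{\mathrm{ch}}$, $0 \le P^{\mathrm{dis}}_{i,g,t} \le \bar{P}^{\mathrm{dis}}$, where $\bar{P}^{\mathrm{ch}}$ and $\bar{P}^{\mathrm{dis}}$ are the upper limits of the charging power and the discharging power, respectively;

[0127] Departure target constraint, expressed as: $E^{\mathrm{dep}}_{i} \ge E^{\mathrm{tar}}_{i}$, where $E^{\mathrm{dep}}_{i}$ is the battery energy of electric vehicle $i$ when it disconnects from the grid, and $E^{\mathrm{tar}}_{i}$ is the target battery energy of electric vehicle $i$;

[0128] Low-load charging and high-load discharging constraint, expressed as: if $z^{\mathrm{low}}_{g,t} = 1$, then $P^{\mathrm{dis}}_{i,g,t} = 0$; if $z^{\mathrm{high}}_{g,t} = 1$, then $P^{\mathrm{ch}}_{i,g,t} = 0$, where $z^{\mathrm{low}}_{g,t} = 1$ indicates that grid $g$ with a transformer is a low-load charging zone at time $t$, and $z^{\mathrm{high}}_{g,t} = 1$ indicates that grid $g$ with a transformer is a high-load discharging zone at time $t$;

[0129] Migration flow constraint, expressed in terms of $n_{g \to g'}$, which denotes the number of vehicles travelling from grid $g$ to grid $g'$ to perform a discharging task;

[0130] Location change constraint, expressed in terms of $l_{i,t}$, which denotes the location of electric vehicle $i$ at time $t$: if the electric vehicle is completing a migration order, it neither charges nor discharges during the order period.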
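A feasibility check over several of the constraints listed above might look like the following sketch. The function name `check_grid_step`, the violation labels, and the reading of the charging pile and zone constraints (one pile per active EV; no discharging in a low-load zone and no charging in a high-load zone) are assumptions made for illustration, since the original formulas are not legible in this text.

```python
def check_grid_step(net_load_kw, max_load_kw, p_ch_kw, p_dis_kw, n_piles,
                    p_ch_max_kw, p_dis_max_kw, low_zone=False, high_zone=False):
    """Return the names of violated constraints for one transformer grid at one step.

    p_ch_kw[i] and p_dis_kw[i] are the charging and discharging power of EV i.
    """
    violations = []
    if net_load_kw > max_load_kw:
        violations.append("transformer_load")
    # Assumed reading: every EV that charges or discharges occupies one pile.
    active = sum(1 for c, d in zip(p_ch_kw, p_dis_kw) if c > 0 or d > 0)
    if active > n_piles:
        violations.append("charging_piles")
    if any(c > 0 and d > 0 for c, d in zip(p_ch_kw, p_dis_kw)):
        violations.append("mutual_exclusion")
    if (any(not 0 <= c <= p_ch_max_kw for c in p_ch_kw)
            or any(not 0 <= d <= p_dis_max_kw for d in p_dis_kw)):
        violations.append("rated_power")
    if low_zone and any(d > 0 for d in p_dis_kw):
        violations.append("low_zone_discharging")
    if high_zone and any(c > 0 for c in p_ch_kw):
        violations.append("high_zone_charging")
    return violations


ok = check_grid_step(450.0, 500.0, [7.0, 0.0], [0.0, 10.0], n_piles=2,
                     p_ch_max_kw=11.0, p_dis_max_kw=11.0)
bad = check_grid_step(520.0, 500.0, [7.0, 7.0, 7.0], [0.0, 0.0, 0.0], n_piles=2,
                      p_ch_max_kw=11.0, p_dis_max_kw=11.0, high_zone=True)
```

Such a check can be used both to filter randomly generated exploration actions and to validate the matching result before orders are dispatched.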

[0131] The third embodiment of the present invention relates to an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the electric vehicle-to-grid interactive cooperative scheduling method based on deep reinforcement learning of the first embodiment.

[0132] The fourth embodiment of the present invention relates to a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the electric vehicle-to-grid interactive cooperative scheduling method based on deep reinforcement learning of the first embodiment.

[0133] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.

[0134] This application is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each flow and / or block of the flowchart illustrations and / or block diagrams, and combinations of flows and / or blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0135] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0136] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0137] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A method for coordinated scheduling of electric vehicle-network interaction based on deep reinforcement learning, characterized in that it includes the following steps:
constructing a scenario model for electric vehicles charging and discharging across multiple transformers, wherein the scenario model discretizes the target scheduling area into multiple hexagonal grids to obtain a grid set, and each grid in the grid set contains at most one transformer;
defining the load state of the transformer based on the scenario model;
defining the state and actions of the electric vehicle based on the scenario model;
using the load state of the transformer and the state of the electric vehicle as the state space, and the actions of the electric vehicle as the action space, designing a reward function that includes the cost of charging and migration, the discharge benefit, and the equivalent benefit of peak shaving and valley filling, constructing a partially observable Markov decision process, and training it using reinforcement learning until the policy converges, thus obtaining the optimal scheduling policy;
deploying the optimal scheduling policy to the scheduling center, and using a greedy strategy to output scheduling actions based on the real-time environmental status to complete the scheduling task of electric vehicles.

2. The electric vehicle-network interactive cooperative scheduling method based on deep reinforcement learning according to claim 1, characterized in that, The load status of the transformer includes: the transformer's base load, the transformer's net load, and the transformer's area marking status.

3. The electric vehicle-network interactive cooperative scheduling method based on deep reinforcement learning according to claim 2, characterized in that the net load of the transformer is expressed as follows: $P^{\mathrm{net}}_{g,t} = P^{\mathrm{base}}_{g,t} + \sum_{i \in \mathcal{V}_{g,t}} \left(P^{\mathrm{ch}}_{i,g,t} - P^{\mathrm{dis}}_{i,g,t}\right)$, where $P^{\mathrm{net}}_{g,t}$ is the net load of the transformer in grid $g$ with a transformer at time $t$, $P^{\mathrm{base}}_{g,t}$ is the base load of the transformer in grid $g$ with a transformer at time $t$, $\mathcal{V}_{g,t}$ is the set of all vehicles in grid $g$ with a transformer at time $t$, $P^{\mathrm{ch}}_{i,g,t}$ denotes the charging power of electric vehicle $i$ in grid $g$ with a transformer at time $t$, and $P^{\mathrm{dis}}_{i,g,t}$ denotes the discharging power of electric vehicle $i$ in grid $g$ with a transformer at time $t$.

4. The electric vehicle-network interactive cooperative scheduling method based on deep reinforcement learning according to claim 2, characterized in that, The transformer area marking status is distinguished according to the load range of the expected transformer operation. When the transformer's base load is less than the lower limit of the load range, the transformer area marking status is a low-load charging zone; when the transformer's base load is greater than the upper limit of the load range, the transformer area marking status is a high-load suitable discharge zone.

5. The electric vehicle-network interactive cooperative scheduling method based on deep reinforcement learning according to claim 1, characterized in that the state of the electric vehicle includes: the location and the state of charge of the electric vehicle at time $t$; the actions of the electric vehicle include: charging, discharging, and migrating to the target grid.

6. The electric vehicle-network interactive cooperative scheduling method based on deep reinforcement learning according to claim 1, characterized in that the objective function for training using reinforcement learning takes the maximum of a weighted combination, with weighting coefficients, of the total revenue of all electric vehicles within the target scheduling area and a penalty term; the total revenue is expressed in terms of the profit of each electric vehicle at each time; the profit of an electric vehicle at a given time is expressed in terms of: the unit price of charging in the grid at that time; an indicator of whether the electric vehicle performs charging at that time; the charging efficiency; the charging power of the electric vehicle in the grid with a transformer at that time; the action duration; the unit benefit of discharging in the grid at that time; an indicator of whether the electric vehicle performs discharging at that time; the discharge efficiency; the discharge power of the electric vehicle in the grid with a transformer at that time; the unit price of the energy consumed by the electric vehicle; an indicator of whether the electric vehicle migrates at that time from its current grid to a target grid; the energy consumption per unit distance; and the equivalent path from the current grid to the target grid; the penalty term is expressed in terms of the net load of the transformer in each grid with a transformer at each time and the lower and upper limits of the load range for the expected operation of the transformer.

7. The electric vehicle-network interactive cooperative scheduling method based on deep reinforcement learning according to claim 1, characterized in that, when using reinforcement learning for training, the following constraints must be met:
a transformer load constraint, expressed as: $P^{\mathrm{net}}_{g,t} \le P^{\max}_{g}$, where $P^{\mathrm{net}}_{g,t}$ is the net load of the transformer in grid $g$ with a transformer at time $t$, and $P^{\max}_{g}$ is the upper limit of the transformer load in grid $g$ with a transformer;
a charging pile constraint: in each grid $g$ with a transformer at each time $t$, the number of electric vehicles $i$ with $P^{\mathrm{ch}}_{i,g,t} > 0$ or $P^{\mathrm{dis}}_{i,g,t} > 0$ does not exceed $N_g$, where $P^{\mathrm{ch}}_{i,g,t}$ and $P^{\mathrm{dis}}_{i,g,t}$ denote the charging power and the discharging power of electric vehicle $i$ in grid $g$ with a transformer at time $t$, and $N_g$ is the number of charging piles in grid $g$ with a transformer;
a charging and discharging mutual exclusion and rated power constraint, expressed as: $P^{\mathrm{ch}}_{i,g,t} \cdot P^{\mathrm{dis}}_{i,g,t} = 0$, $0 \le P^{\mathrm{ch}}_{i,g,t} \le \bar{P}^{\mathrm{ch}}$, $0 \le P^{\mathrm{dis}}_{i,g,t} \le \bar{P}^{\mathrm{dis}}$, where $\bar{P}^{\mathrm{ch}}$ and $\bar{P}^{\mathrm{dis}}$ are the upper limits of the charging power and the discharging power, respectively;
a departure target constraint, expressed as: $E^{\mathrm{dep}}_{i} \ge E^{\mathrm{tar}}_{i}$, where $E^{\mathrm{dep}}_{i}$ is the battery energy of electric vehicle $i$ when it disconnects from the grid, and $E^{\mathrm{tar}}_{i}$ is the target battery energy of electric vehicle $i$;
a low-load charging and high-load discharging constraint, expressed as: if $z^{\mathrm{low}}_{g,t} = 1$, then $P^{\mathrm{dis}}_{i,g,t} = 0$; if $z^{\mathrm{high}}_{g,t} = 1$, then $P^{\mathrm{ch}}_{i,g,t} = 0$, where $z^{\mathrm{low}}_{g,t} = 1$ indicates that grid $g$ with a transformer is a low-load charging zone at time $t$, and $z^{\mathrm{high}}_{g,t} = 1$ indicates that grid $g$ with a transformer is a high-load discharging zone at time $t$;
a migration flow constraint, expressed in terms of $n_{g \to g'}$, which denotes the number of vehicles travelling from grid $g$ to grid $g'$ to perform a discharging task;
a location change constraint, expressed in terms of $l_{i,t}$, which denotes the location of electric vehicle $i$ at time $t$: if the electric vehicle is completing a migration order, it neither charges nor discharges during the order period.

8. A vehicle-to-grid (V2G) interactive and cooperative scheduling device for electric vehicles based on deep reinforcement learning, characterized in that it includes:
a construction module, used to build a scenario model of electric vehicles charging and discharging across multiple transformers, wherein the scenario model discretizes the target scheduling area into multiple hexagonal grids to obtain a grid set, and each grid in the grid set contains at most one transformer;
a first definition module, used to define the load state of the transformer based on the scenario model;
a second definition module, used to define the state and actions of the electric vehicle based on the scenario model;
a training module, used to take the load state of the transformer and the state of the electric vehicle as the state space and the actions of the electric vehicle as the action space, design a reward function that includes the cost of charging and migration, the benefit of discharging, and the equivalent benefit of peak shaving and valley filling, construct a partially observable Markov decision process, and train it using reinforcement learning until the policy converges, thereby obtaining the optimal scheduling policy;
a scheduling module, used to deploy the optimal scheduling policy to the scheduling center and output scheduling actions based on the real-time environmental status using a greedy strategy to complete the scheduling task of electric vehicles.

9. The electric vehicle-to-network interactive and cooperative scheduling device based on deep reinforcement learning according to claim 8, characterized in that, The load status of the transformer includes: the transformer's base load, the transformer's net load, and the transformer's area marking status.

10. The electric vehicle-to-network interactive cooperative scheduling device based on deep reinforcement learning according to claim 9, characterized in that the net load of the transformer is expressed as follows: $P^{\mathrm{net}}_{g,t} = P^{\mathrm{base}}_{g,t} + \sum_{i \in \mathcal{V}_{g,t}} \left(P^{\mathrm{ch}}_{i,g,t} - P^{\mathrm{dis}}_{i,g,t}\right)$, where $P^{\mathrm{net}}_{g,t}$ is the net load of the transformer in grid $g$ with a transformer at time $t$, $P^{\mathrm{base}}_{g,t}$ is the base load of the transformer in grid $g$ with a transformer at time $t$, $\mathcal{V}_{g,t}$ is the set of all vehicles in grid $g$ with a transformer at time $t$, $P^{\mathrm{ch}}_{i,g,t}$ denotes the charging power of electric vehicle $i$ in grid $g$ with a transformer at time $t$, and $P^{\mathrm{dis}}_{i,g,t}$ denotes the discharging power of electric vehicle $i$ in grid $g$ with a transformer at time $t$.

11. The electric vehicle-to-network interactive cooperative scheduling device based on deep reinforcement learning according to claim 9, characterized in that, The transformer area marking status is distinguished according to the load range of the expected transformer operation. When the transformer's base load is less than the lower limit of the load range, the transformer area marking status is a low-load charging zone; when the transformer's base load is greater than the upper limit of the load range, the transformer area marking status is a high-load suitable discharge zone.

12. The electric vehicle-to-network interactive cooperative scheduling device based on deep reinforcement learning according to claim 8, characterized in that the state of the electric vehicle includes: the location and the state of charge of the electric vehicle at time $t$; the actions of the electric vehicle include: charging, discharging, and migrating to the target grid.

13. The electric vehicle-to-network interactive cooperative scheduling device based on deep reinforcement learning according to claim 8, characterized in that the objective function for training using reinforcement learning takes the maximum of a weighted combination, with weighting coefficients, of the total revenue of all electric vehicles within the target scheduling area and a penalty term; the total revenue is expressed in terms of the profit of each electric vehicle at each time; the profit of an electric vehicle at a given time is expressed in terms of: the unit price of charging in the grid at that time; an indicator of whether the electric vehicle performs charging at that time; the charging efficiency; the charging power of the electric vehicle in the grid with a transformer at that time; the action duration; the unit benefit of discharging in the grid at that time; an indicator of whether the electric vehicle performs discharging at that time; the discharge efficiency; the discharge power of the electric vehicle in the grid with a transformer at that time; the unit price of the energy consumed by the electric vehicle; an indicator of whether the electric vehicle migrates at that time from its current grid to a target grid; the energy consumption per unit distance; and the equivalent path from the current grid to the target grid; the penalty term is expressed in terms of the net load of the transformer in each grid with a transformer at each time and the lower and upper limits of the load range for the expected operation of the transformer.

14. The electric vehicle-to-network interactive cooperative scheduling device based on deep reinforcement learning according to claim 8, characterized in that, when using reinforcement learning for training, the following constraints must be met:
a transformer load constraint, expressed as: $P^{\mathrm{net}}_{g,t} \le P^{\max}_{g}$, where $P^{\mathrm{net}}_{g,t}$ is the net load of the transformer in grid $g$ with a transformer at time $t$, and $P^{\max}_{g}$ is the upper limit of the transformer load in grid $g$ with a transformer;
a charging pile constraint: in each grid $g$ with a transformer at each time $t$, the number of electric vehicles $i$ with $P^{\mathrm{ch}}_{i,g,t} > 0$ or $P^{\mathrm{dis}}_{i,g,t} > 0$ does not exceed $N_g$, where $P^{\mathrm{ch}}_{i,g,t}$ and $P^{\mathrm{dis}}_{i,g,t}$ denote the charging power and the discharging power of electric vehicle $i$ in grid $g$ with a transformer at time $t$, and $N_g$ is the number of charging piles in grid $g$ with a transformer;
a charging and discharging mutual exclusion and rated power constraint, expressed as: $P^{\mathrm{ch}}_{i,g,t} \cdot P^{\mathrm{dis}}_{i,g,t} = 0$, $0 \le P^{\mathrm{ch}}_{i,g,t} \le \bar{P}^{\mathrm{ch}}$, $0 \le P^{\mathrm{dis}}_{i,g,t} \le \bar{P}^{\mathrm{dis}}$, where $\bar{P}^{\mathrm{ch}}$ and $\bar{P}^{\mathrm{dis}}$ are the upper limits of the charging power and the discharging power, respectively;
a departure target constraint, expressed as: $E^{\mathrm{dep}}_{i} \ge E^{\mathrm{tar}}_{i}$, where $E^{\mathrm{dep}}_{i}$ is the battery energy of electric vehicle $i$ when it disconnects from the grid, and $E^{\mathrm{tar}}_{i}$ is the target battery energy of electric vehicle $i$;
a low-load charging and high-load discharging constraint, expressed as: if $z^{\mathrm{low}}_{g,t} = 1$, then $P^{\mathrm{dis}}_{i,g,t} = 0$; if $z^{\mathrm{high}}_{g,t} = 1$, then $P^{\mathrm{ch}}_{i,g,t} = 0$, where $z^{\mathrm{low}}_{g,t} = 1$ indicates that grid $g$ with a transformer is a low-load charging zone at time $t$, and $z^{\mathrm{high}}_{g,t} = 1$ indicates that grid $g$ with a transformer is a high-load discharging zone at time $t$;
a migration flow constraint, expressed in terms of $n_{g \to g'}$, which denotes the number of vehicles travelling from grid $g$ to grid $g'$ to perform a discharging task;
a location change constraint, expressed in terms of $l_{i,t}$, which denotes the location of electric vehicle $i$ at time $t$: if the electric vehicle is completing a migration order, it neither charges nor discharges during the order period.

15. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, When the processor executes the computer program, it implements the steps of the electric vehicle-network interactive cooperative scheduling method based on deep reinforcement learning as described in any one of claims 1-7.

16. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by the processor, it implements the steps of the electric vehicle-to-grid interactive cooperative scheduling method based on deep reinforcement learning as described in any one of claims 1-7.