A real-time optimization method, system, and device for vehicle-to-grid interaction based on a single-network architecture

By using the DQN algorithm with a single network architecture and a human risk preference model, the energy interaction between electric vehicles and the power grid is optimized, solving the problems of power grid load fluctuation and battery degradation, and realizing the orderly scheduling of electric vehicle charging and discharging and maximizing benefits.

CN122288264A · Pending · Publication Date: 2026-06-26 · GUANGDONG UNIV OF TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: GUANGDONG UNIV OF TECH
Filing Date: 2026-03-31
Publication Date: 2026-06-26

Abstract

This invention discloses a real-time optimization method, system, and device for vehicle-to-grid interaction based on a single-network architecture, belonging to the field of smart grid and electric vehicle charging and discharging optimization. The method includes training and model deployment phases. In the training phase, multi-source data is first collected to construct a state vector. This vector is then combined with a human risk preference model and a quantile mapping function to generate a fused feature vector, constructing an overall optimal action network to output the optimal charging and discharging solution. After constraint correction, the vehicle state and scheduling cost are updated. Then, based on a normalized advantage function, current and target network functions are constructed. Finally, through multiple rounds of training, the optimal action network is output. In the deployment phase, a data storage module collects real-time data, which is then fused with features and fed into the optimal action network to obtain interaction values. An intelligent decision-making module completes the energy interaction, and the interaction data is then fed back to the training module to achieve closed-loop model updates. This method simplifies the computational burden with a single network, achieving multi-objective collaborative optimization of grid load, battery degradation, and vehicle owner benefits.

Description

Technical Field

[0001] This invention relates to the field of smart grid and electric vehicle charging and discharging optimization technology, specifically to a method, system, and device for real-time optimization of vehicle-to-grid interaction based on a single network architecture.

Background Technology

[0002] Globally, carbon emission controls are being strengthened, and the number of electric vehicle users is rapidly expanding, making them an important force in promoting energy structure transformation and reducing carbon emissions. However, the large-scale popularization of electric vehicles also brings severe challenges to grid operation. The disorderly charging and discharging of large numbers of electric vehicles can easily cause grid load fluctuations and imbalances. At the same time, it is difficult to balance battery degradation control with maximizing benefits for car owners. Therefore, there is an urgent need to build an efficient vehicle-grid energy interaction optimization solution.

[0003] Existing research often employs Deep Q-Networks (DQN) combined with policy gradients and prioritized experience replay, with the action space focused primarily on pricing and energy supply scenarios; it has not been applied to electric vehicle energy exchange scenarios. The core limitation is that DQN is only suitable for scenarios with continuous states and discrete actions, so it cannot meet the scheduling requirements of the continuous action space of electric vehicle charging and discharging. Some existing technologies use an actor-critic architecture to solve this problem. The actor network updates actions using gradient ascent, while the critic network uses the DQN algorithm to evaluate the value of actions, thereby generating multi-time-period single-vehicle charging and discharging strategies and achieving preliminary scheduling optimization.

[0004] However, the actor-critic architecture inherently runs two networks in parallel, placing high demands on the computing power of terminal deployment devices. This makes it difficult to meet the lightweight deployment requirements of terminals, and balancing algorithm simplification against optimization effectiveness has become a critical issue. Furthermore, existing solutions do not fully consider human risk preferences, making it difficult to ensure that scheduling benefits align with user expectations. Therefore, how to balance benefit maximization and multi-objective optimization within a finite time horizon, using an efficient algorithm adapted to continuous action spaces, has become the core direction for overcoming existing technological bottlenecks.

Summary of the Invention

[0005] To address the aforementioned technical bottlenecks, this invention proposes the following solution. Its key feature is a DQN algorithm that works under continuous actions, which meets the continuous-action scheduling requirements of electric vehicle charging and discharging, removes the computational burden of running two networks in parallel, and facilitates lightweight terminal deployment. A human risk preference model is introduced to exploit the potential benefits in the charging and discharging sequence, maximizing charging and discharging benefits over the entire time period while ensuring that scheduling benefits align with user expectations. The result is multi-objective collaborative optimization of grid load, battery degradation, and vehicle owner benefits. This solves the core pain points of existing technologies, namely complex algorithms and benefits that do not meet user expectations.

[0006] The specific technical solution is as follows:
1) A real-time optimization method for vehicle-to-grid interaction based on a single network architecture, comprising the following steps:
S1. Data Acquisition and Storage: Data related to vehicle-to-grid interaction is collected by various terminals and transmitted to the data storage module for storage;
S2. Data Distribution and Processing: The data storage module inputs the data into the intelligent decision-making module and the historical experience training module respectively;
S3. Historical Experience Optimization: The historical experience training module calculates the comprehensive benefit based on the input data from the data storage module, performs iterative optimization through an optimization model based on a single network architecture, and outputs the overall optimal action network function to the intelligent decision-making module.

[0007] S4. Intelligent Decision Computation: The intelligent decision-making module constructs a human risk preference model and a quantile mapping function. It performs a Hadamard product between the state vector and the output of the quantile mapping function to obtain a new state vector that incorporates quantile features. This new state vector is then input into the overall optimal action network function output by the historical experience training module, which outputs the optimal energy interaction between each electric vehicle and the power grid in the current state.
S5. Command Execution and Interaction: The intelligent decision-making module sends the optimal energy interaction solution to the execution interaction module through the communication module, completing the energy interaction between the electric vehicle and the power grid.

[0008] 2) Furthermore, S3 also includes the following steps:
S3.1. Basic Data Preparation and Network Initialization: Compile the required time-series data, including charging prices, discharge revenue, charging station power load, electric vehicle travel demand, renewable energy generation, state of charge, and battery health status at each time point. Initialize the experience pool to store key data such as states, actions, rewards, and termination flags during the scheduling process. Initialize the current network function and the target network function to lay the foundation for subsequent quantile calculation, action decision-making, and value assessment.

[0009] S3.2. Time-Series State Variable Acquisition and Construction: Within each scheduling sequence, the current state variables are collected, including the electric vehicle's state of charge, battery health status, renewable energy generation at that moment, charging station power load, charging price, and discharging price, forming a complete state vector s_t, which is used for subsequent network calculations.

[0010] S3.3. Transformation and Mapping of Quantile Distortion Risk Measures: Construct a distortion risk measure function based on the human risk preference parameter.

[0011] In the formula, the chosen parameter value most closely approximates human risk preferences, and the distortion risk measure transformation is applied to all quantiles to meet the risk decision-making needs of actual scheduling. Establish the quantile mapping function:

[0012] Where m is the dimension of the state vector, and the weights and biases are parameters to be optimized. This formula maps the transformed scalar quantile to a quantile vector with the same dimension as the state vector. The Hadamard product of the state vector s_t and the mapped quantile vector is then computed:

[0013] A new state vector with fused quantile features is obtained and used as input to the subsequent network.

[0014] Preferably, the distortion risk measurement function uses eight quantiles.

[0015] S3.4. Construction of the Overall Optimal Action Network and Initial Action Selection: Establish the overall optimal action network function m(·). It takes the new state vector incorporating quantile features as input, uses two hidden layers with the ReLU activation function, and outputs the optimal action vector for energy exchange between the electric vehicles and the charging station, u_t = {u_{1,t}, u_{2,t}, ..., u_{i,t}, ...}. Set upper and lower limits for the state of charge and for energy exchange, which define the value range of each component of the action vector. By combining historical action weights with the ε-greedy exploration strategy, the actions output by the optimal action network are modified to obtain preliminary actions. This ensures a smooth transition between actions while balancing exploration and exploitation of scheduling strategies.

[0016] Furthermore, the upper and lower limits of the state of charge and the upper and lower limits of energy exchange constraints include:

[0017] In the formula, S_max and S_min are the upper and lower limits of the state of charge, respectively, and u_max and u_min are the upper and lower limits of a single energy exchange.

[0018] Furthermore, the combination of historical action weights with the ε-greedy exploration strategy is as follows:

[0019] In the formula, θ is the weight of historical actions, taken as θ = 0.7; χ is zero with probability ε, and otherwise each component follows χ ~ N(0, 1²).

[0020] S3.5. Action Constraint Correction and Determination of Effective Actions: The system determines whether each component of the initially selected action meets the preset state of charge and energy exchange constraints. If a component exceeds a limit, the system truncates it to the legal value range, thereby obtaining an executable, effective action and ensuring that the state of charge of each electric vehicle always stays within the normal range.

[0021] Furthermore, the truncation correction operation is as follows:

[0022] In the formula, E_i is the battery capacity of the i-th electric vehicle, and N_{i,t} is the travel demand of the i-th electric vehicle at time t.

[0023] S3.6. Scheduling Costs and Vehicle Status Updates: Calculate the energy exchange cost based on the effective actions:

[0024] In the formula, P_t^{inc} is the charging price at time t, and P_t^{ch} is the discharge price at time t. Update the state of charge for the next moment based on the effective actions and the electricity demand of electric vehicle travel.

[0025] The depth of discharge (DoD) is calculated based on the change in state of charge. Combined with formulas related to battery life degradation, the overall battery life degradation is calculated, and the battery health status is updated for the next time step.

[0026] In the formula, the coefficient is a scaling factor.

[0027] Furthermore, the formula related to battery life degradation includes:

[0028]

[0029]

[0030] In the formula, the current during the energy interaction process is treated as a fixed value.

[0031] S3.7. Scheduling Rewards and Experience Data Storage: Calculate the unbalanced power index of the charging station:

[0032] In the formula, abs(·) is the absolute value function, N is the total number of electric vehicles, and R_t reflects the degree of power supply and demand imbalance on the power grid side. Construct the comprehensive return function:

[0033] In the formula, α1, α2, and α3 are scaling factors. Record the state, action, reward, next state, and termination flag of the sequence as a five-element tuple and store it in the experience pool. If the next state is the state of the last sequence of the day's scheduling, the termination flag is 0; otherwise, it is 1.

[0034] S3.8. Construction of the State Value Function and Positive Definite Precision Matrix Function, and Network Calculation: Based on the normalized advantage function, establish the state value function V(·) and the positive definite precision matrix function P(·). Both take the new state vector incorporating quantile features as input and use two hidden layers with ReLU activation. The former outputs a one-dimensional scalar state value, while the latter uses a formula...

[0035] In the formula, the factor matrix is a lower triangular matrix and the resulting matrix is positive definite. Then construct the current network function for the τ-th quantile:

[0036] This function evaluates the value of an action. By combining the total reward, the termination flag, and the state value of the target network, construct the target network function for the τ-th quantile:

[0037] S3.9. Loss Calculation and Network Parameter Optimization: Calculate the temporal-difference (TD) error between the outputs of the current network and the target network:

[0038] It reflects the error in the evaluated action value. Based on the quantiles after the distortion risk measure transformation, a quantile loss function is constructed, the specific formula of which is:

[0039]

[0040]

[0041] This function incorporates the smooth L1 (Huber) loss to compute the overall loss, and the loss calculation covers all quantile combinations of the current network and the target network. With the overall loss as the optimization objective, backpropagation is performed on all parameters to be optimized in the current network, including the weights and biases of the quantile mapping function, the overall optimal action network function, the state value function, and the positive definite precision matrix function. Steps S3.2 to S3.9 are repeated until all data in the training dataset has been traversed.

[0042] S3.10. Target Network Update and Model Output: After a preset number of scheduling sequences, the parameters to be optimized are updated, including the weights and biases of the quantile mapping function G(·), the overall optimal action network function m(·), the state value function V(·), and the positive definite precision matrix function P(·). At the same time, a soft update of the target network is used to ensure the stability of network training. Repeat steps S3.2 to S3.10 to complete n rounds of training, and output the overall optimal action network function, which is then passed to the intelligent decision-making module.

[0043] 3) Furthermore, S4 also includes the following steps:
S4.1. Collect current data on renewable energy generation, charging station power load, charging price, discharge revenue, user travel demand, state of charge, and battery health in the area through the data storage module to obtain the state vector s_t;
S4.2. Construct the human risk preference model β(τ) and the quantile mapping function G(·) consistent with S3.3, and perform the Hadamard product of the state vector and the output of the quantile mapping function G(·) to obtain a new state vector X_{t,β(τ)} that incorporates quantile features;
S4.3. Obtain the overall optimal action network function m(·) output by the historical experience training module, input the new state vector X_{t,β(τ)} into m(·), and output the optimal energy interaction value between each electric vehicle and the power grid in the current state;
S4.4. The intelligent decision-making module sends energy interaction suggestions to electric vehicle users. After the users confirm the suggestions based on their own needs, they perform the energy interaction with the charging station.

[0044] S4.5. Synchronously transmit the energy interaction action information, together with the state and action information of the current time sequence, to the historical experience training module;
S4.6. The intelligent decision-making module updates the network function based on the feedback from the historical experience training module, thereby achieving closed-loop control of the model and ultimately realizing the orderly scheduling of electric vehicle charging and discharging within the region.

[0045] 4) In addition, to implement the above method, the present invention also provides a real-time optimization system for vehicle-to-grid interaction based on a single network architecture, comprising:
The data storage module, which stores data on renewable energy generation, charging station power load, charging price, discharge revenue, user travel demand, electric vehicle state of charge, and battery health status sent by various terminals, and transmits them to the corresponding modules;
The intelligent decision-making module, which executes all the steps in S4, receives and processes the various types of data, makes energy interaction suggestions, and completes the update and iteration of the network functions;
The historical experience training module, which executes all the steps in S3, completes model training and incremental training, and feeds the updated network parameters back to the intelligent decision-making module;
The execution interaction module, which displays energy interaction suggestions to electric vehicle users, receives user confirmation commands, and switches the charging pile between normal charging mode and vehicle-to-grid (V2G) mode;
The communication module, which handles information exchange between modules and includes a wireless communication submodule and a wired communication submodule. The wireless communication submodule is used for data acquisition and communication with various terminals, as well as information exchange between the intelligent decision-making module and the execution interaction module. The wired communication submodule is used for communication between the data storage module, the intelligent decision-making module, and the historical experience training module.

[0046] 5) In addition, to achieve the above objectives, the present invention also provides a real-time optimization device for vehicle-to-grid interaction based on a single network architecture, including a memory and a processor. The memory stores a program implementing the real-time optimization method for vehicle-to-grid interaction based on a single network architecture. When the processor executes the program, it implements any step of the above-described method.

[0047] Compared with the prior art, the present invention has the following advantages: the optimization of the Q-value under continuous actions is achieved using only one network, which removes the computational burden of the parallel actor-critic dual networks and addresses the limited computing power of terminal devices; by introducing a human risk preference model, the potential benefits in the charging and discharging sequence are exploited, so that scheduling benefits are maximized over the entire time period while meeting user expectations; and grid load optimization, reduced battery life degradation, and maximized owner benefits are considered together, overcoming the single-objective focus of existing technologies and solving the core pain points in vehicle-to-grid interaction.

Attached Figure Description

[0048] To more clearly illustrate the optimization schemes in the implementation of the present invention, the following is a brief introduction to the drawings used in the prior art and embodiments. The following drawings are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0049] Figure 1 is an overall block diagram of the real-time optimization system for vehicle-to-grid interaction based on a single network architecture provided by this invention.

[0050] Figure 2 is a flowchart of the real-time optimization method for vehicle-to-grid interaction based on a single network architecture provided by this invention.

[0051] Figure 3 is a comparison chart of the kernel density curves of electric vehicle owner benefits under four strategies: no control strategy, dual-network architecture strategy, single-network architecture strategy, and single-network architecture strategy combined with human risk preference.

[0052] Figure 4 is a line graph comparing electric vehicle battery health status under four strategies: no control strategy, dual-network architecture strategy, single-network architecture strategy, and single-network architecture strategy combined with human risk preference.

[0053] Figure 5 is a line graph comparing grid power imbalance under four strategies: no control strategy, dual-network architecture strategy, single-network architecture strategy, and single-network architecture strategy combined with human risk preference.

[0054] Figure 6 shows the average training time per round for three strategies: dual-network architecture strategy, single-network architecture strategy, and single-network architecture strategy combined with human risk preference.

Detailed Implementation

[0055] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.

[0056] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of this application.

[0057] Example 1
This invention provides a real-time optimization method for vehicle-to-grid interaction based on a single network architecture, comprising the following steps:
S1. Data Acquisition and Storage: Data related to vehicle-to-grid interaction is collected by various terminals and transmitted to the data storage module for storage;
S2. Data Distribution and Processing: The data storage module inputs the data into the intelligent decision-making module and the historical experience training module respectively;
S3. Historical Experience Optimization: The historical experience training module calculates the comprehensive benefit based on the input data from the data storage module, performs iterative optimization through an optimization model based on a single network architecture, and outputs the overall optimal action network function to the intelligent decision-making module.

[0058] S3.1. Basic Data and Network Initialization
S3.1.1. The data acquisition module uses the SCADA system or the power exchange API to collect multi-source historical data from the electric vehicle user side and the power grid side of the charging station in the region, including charging price, discharge revenue, charging station load, and renewable energy power generation.

[0059] S3.1.2. Then, through the mobile navigation app, record the historical travel electricity demand of the N electric vehicles in the area, and input the data into the historical experience training module to provide basic data support for scheduling at each time step.

[0060] S3.1.3. The N electric vehicles are initialized uniformly, with state of charge S_{i,0} = 100%, energy consumption E_{i,t} = 150 kWh, and initial battery health status H_{i,0} = 100% (i = 1, 2, ..., N).

[0061] S3.1.4. Initialize the experience pool D to store the five-element array for time-series scheduling, and initialize the current network function and the target network function to lay the parameter foundation for quantile transformation, action decision-making, and value assessment.

[0062] S3.2. Collect Time-Series State Variables and Construct the State Vector: Within each scheduling sequence of the historical data, the (2N+4)-dimensional state variables are collected and integrated to form the state vector s_t.

[0063] Specifically, the state vector consists of the state of charge of N vehicles, the battery health status of N vehicles, the power generation of one renewable energy source, the power load of one charging station, the charging price, and the discharging price.
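The composition described in [0063] can be sketched as a simple concatenation. This is only an illustration: the function name `build_state_vector`, the ordering of the components, and the sample numbers are assumptions made for the example and are not specified by the text.

```python
import numpy as np

def build_state_vector(soc, soh, renewable_gen, station_load,
                       charge_price, discharge_price):
    """Concatenate N states of charge, N battery health values and four
    grid-side scalars into one state vector of length 2N + 4."""
    soc = np.asarray(soc, dtype=float)
    soh = np.asarray(soh, dtype=float)
    if soc.ndim != 1 or soc.shape != soh.shape:
        raise ValueError("soc and soh must be 1-D arrays of equal length N")
    grid_side = np.array(
        [renewable_gen, station_load, charge_price, discharge_price],
        dtype=float,
    )
    return np.concatenate([soc, soh, grid_side])

# Hypothetical values for N = 3 vehicles (illustrative only).
s_t = build_state_vector(
    soc=[1.0, 0.8, 0.6], soh=[1.0, 0.99, 0.98],
    renewable_gen=120.0, station_load=300.0,
    charge_price=0.9, discharge_price=0.6,
)
print(s_t.shape)
```

With N = 3 the resulting vector has 2·3 + 4 = 10 entries.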

[0064] S3.3. Quantile Distortion Transformation and Mapping to Generate Fused Feature Vectors
S3.3.1. Eight quantiles are selected, including 0.125, 0.25, 0.375, 0.50, 0.625, 0.75, and 0.875.

[0065] Substitute the human risk preference parameter into the formula:

[0066] Each of the eight quantiles is distorted in this way so that the scheduling matches actual risk decision-making preferences, namely risk-averse when returns are low and risk-seeking when returns are high.

[0067] S3.3.2. Through the quantile mapping function G(·), the transformed scalar quantile β(τ) is mapped to a vector with the same dimension as the state vector. The mapping uses a cosine basis function and ReLU activation. The parameters to be optimized are the weights ω_{ij} and the biases b_j.

[0068] S3.3.3. Perform the Hadamard product between the state vector and the mapped quantile vector to obtain the input vector X_{t,β(τ)} that incorporates quantile features. It serves as the core input for the subsequent action network and value networks.
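Steps S3.3.2 and S3.3.3 can be sketched as follows. The exact mapping formula is not reproduced in this text, so the sketch assumes the cosine embedding commonly used in implicit quantile networks, ReLU(Σ_i cos(π·i·β)·ω_ij + b_j). The number of cosine terms (`n_cos = 64`), the random initialization, and the function names are assumptions. The distortion formula is also not available here, so the listed quantile values are used undistorted as stand-ins for β(τ).

```python
import numpy as np

def quantile_embedding(beta_tau, weights, biases):
    """Map a scalar quantile to an m-dimensional vector using a cosine
    basis followed by a linear layer and ReLU (assumed IQN-style form)."""
    n_cos = weights.shape[0]
    basis = np.cos(np.pi * np.arange(n_cos) * beta_tau)   # shape (n_cos,)
    return np.maximum(basis @ weights + biases, 0.0)       # shape (m,)

def fuse_state(state, beta_tau, weights, biases):
    """Hadamard (element-wise) product of the state vector and the
    mapped quantile vector."""
    return state * quantile_embedding(beta_tau, weights, biases)

rng = np.random.default_rng(0)
m, n_cos = 10, 64                      # m = 2N + 4 with N = 3; n_cos is assumed
weights = rng.normal(scale=0.1, size=(n_cos, m))   # omega_ij, to be trained
biases = np.zeros(m)                               # b_j, to be trained
s_t = rng.uniform(size=m)                          # placeholder state vector

# Stand-ins for the distorted quantiles beta(tau).
betas = [0.125, 0.25, 0.375, 0.50, 0.625, 0.75, 0.875]
X = np.stack([fuse_state(s_t, b, weights, biases) for b in betas])
print(X.shape)   # one fused input vector per quantile
```

Because the product is element-wise, each fused vector keeps the dimension of the state vector, which is why the mapped quantile vector must have the same length m.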

[0069] S3.4. Constructing the Overall Optimal Action Network Function
S3.4.1. Establish the overall optimal action network function m(·). The input is X_{t,β(τ)}. The network has two hidden layers, each with 512 neurons, and uses ReLU as the activation function. The output is the N-dimensional optimal action, namely u_t = {u_{1,t}, u_{2,t}, ..., u_{i,t}, ...}. Each component represents the energy exchange value between a single vehicle and the charging station.
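The architecture in S3.4.1 (two hidden layers of 512 units with ReLU, N-dimensional output) can be illustrated by a plain NumPy forward pass. The He-style initialization, the linear output layer, and the helper names are assumptions for this sketch; the text does not state how the network is initialized or implemented.

```python
import numpy as np

def init_mlp(in_dim, hidden, out_dim, rng):
    """Create (weight, bias) pairs for in -> hidden -> hidden -> out."""
    sizes = [in_dim, hidden, hidden, out_dim]
    return [
        (rng.normal(scale=np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
        for a, b in zip(sizes[:-1], sizes[1:])
    ]

def action_network(x, params):
    """Forward pass: ReLU on both hidden layers, linear output layer."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return h @ W + b

rng = np.random.default_rng(0)
N = 3                                   # number of vehicles (illustrative)
params = init_mlp(in_dim=2 * N + 4, hidden=512, out_dim=N, rng=rng)
x = rng.uniform(size=2 * N + 4)         # placeholder fused input X_{t,beta(tau)}
u = action_network(x, params)           # one energy-exchange value per vehicle
print(u.shape)
```

The raw outputs are unbounded here; the state-of-charge and energy-exchange limits are enforced afterwards by the constraint correction in S3.5.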

[0070] S3.4.2 To ensure a normal state of charge, four constraint parameters are set.

[0071] Specifically, the minimum state of charge is S_min = 20%, the maximum state of charge is S_max = 100%, the minimum energy exchange is u_min = -40, and the maximum energy exchange is u_max = 40.

[0072] By combining historical action weights with the ε-Greedy exploration strategy, the actions output by the optimal action network are corrected using the formula.

[0073] The initial exploration rate is ε = 1.00, and it is decreased by 0.02 each time until ε = 0.01; θ is the weight of historical actions, taken as θ = 0.7; χ is zero with probability ε, and otherwise each component follows χ ~ N(0, 1²).

[0074] S3.5. Action Constraints and Truncation Correction: Determine whether each component of the initially selected N-dimensional action satisfies the constraints in S3.4. If a component exceeds a limit, truncate and correct the excess part.

[0075] Specifically, the truncation correction rule is as follows:

[0076] In the formula, E_i is the battery capacity of the i-th electric vehicle, and N_{i,t} is the travel demand of the i-th electric vehicle at time t.
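The truncation rule itself is not reproduced in this text, so the following is only a plausible sketch of S3.5 under stated assumptions: actions and travel demand are in kWh per step with positive values meaning charging, the state of charge is a fraction, and the next state of charge is assumed to be S + (u − N_{i,t}) / E_i. The limits are taken from [0071]; the capacity of 150 kWh and all sample inputs are illustrative.

```python
import numpy as np

S_MIN, S_MAX = 0.20, 1.00        # state-of-charge limits from [0071]
U_MIN, U_MAX = -40.0, 40.0       # per-step energy exchange limits from [0071]

def truncate_actions(u_raw, soc, capacity, travel_demand):
    """Clip each action so that both the energy-exchange limits and the
    state-of-charge limits hold after the (assumed) SOC update."""
    u_raw = np.asarray(u_raw, dtype=float)
    soc = np.asarray(soc, dtype=float)
    travel_demand = np.asarray(travel_demand, dtype=float)
    # Assumed SOC update: S_next = S + (u - N) / E, solved for u at each limit.
    u_low = (S_MIN - soc) * capacity + travel_demand
    u_high = (S_MAX - soc) * capacity + travel_demand
    hi = np.minimum(U_MAX, u_high)
    lo = np.minimum(np.maximum(U_MIN, u_low), hi)   # keep lo <= hi
    return np.minimum(np.maximum(u_raw, lo), hi)

soc = np.array([1.0, 0.25, 0.5])
demand = np.array([0.0, 10.0, 5.0])
capacity = 150.0                                    # illustrative E_i in kWh
u_eff = truncate_actions([30.0, -60.0, 10.0], soc, capacity, demand)
soc_next = soc + (u_eff - demand) / capacity
print(u_eff, soc_next)
```

In this example the fully charged vehicle cannot charge further, the nearly empty vehicle is forced to charge slightly to cover its travel demand, and the third action is already feasible and passes through unchanged.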

[0077] S3.6. Calculate Scheduling Cost and Update State
S3.6.1. Calculate the energy exchange cost based on the effective actions:

[0078] In the formula, P_t^{inc} is the charging price at time t, and P_t^{ch} is the discharge price at time t.

[0079] S3.6.2. Update the state of charge at the next moment based on the effective actions and the electric vehicle's travel demand.

[0080] S3.6.3. Calculate the depth of discharge (DoD) based on the change in state of charge, and combine it with the relevant formulas for battery life degradation to calculate the overall battery life degradation and update the battery health status at the next time step.

[0081] In the formula, the scaling factor is taken as 0.001.

[0082] Specifically, the formula related to battery life degradation includes:

[0083]

[0084]

[0085] In the formula, the current during the energy interaction process is treated as a fixed value.

[0086] S3.7. Calculate the Overall Benefit and Store the Five-Element Tuple for Time-Series Scheduling
S3.7.1. Calculate the unbalanced power index of the charging station:

[0087] In the formula, abs(·) is the absolute value function. The index reflects the degree of power supply and demand imbalance on the power grid side.

[0088] S3.7.2. Constructing the comprehensive return function

[0089] In the formula, α1, α2, and α3 are weighting factors, with α1=1e2, α2=1e5, and α3=3e-2 respectively.

[0090] S3.7.3. Record the state, action, reward, next state, and termination flag as a five-element tuple and store it in the experience pool. When the next state is the state of the last of the 96 time steps in a day, d = 0; otherwise d = 1.
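A minimal sketch of the experience pool in S3.7.3 follows. The class name, the capacity, and the convention that `next_step` is the 0-based index of the time step the next state belongs to are assumptions. Only the five-element tuple and the flag rule (0 when the next state is the last of the 96 daily steps, 1 otherwise) come from the text.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "flag"]
)

class ExperiencePool:
    """Fixed-size pool of five-element transitions."""

    def __init__(self, capacity=100_000, steps_per_day=96):
        self.buffer = deque(maxlen=capacity)   # oldest entries are dropped first
        self.steps_per_day = steps_per_day

    def store(self, state, action, reward, next_state, next_step):
        # Flag convention from the text: 0 at the day's last step, else 1.
        flag = 0 if next_step == self.steps_per_day - 1 else 1
        self.buffer.append(Transition(state, action, reward, next_state, flag))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

pool = ExperiencePool(capacity=1000)
for next_step in range(1, 96):                 # one simulated day
    pool.store(state=next_step - 1, action=0.0, reward=0.0,
               next_state=next_step, next_step=next_step)
flags = [t.flag for t in pool.buffer]
batch = pool.sample(4)
```

Because the flag is 0 at the end of the day, it can be multiplied directly into the bootstrapped term of the target so that no value is carried across days.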

[0091] S3.8. Construction of the State Value Function and Positive Definite Precision Matrix Function, and Network Calculation
S3.8.1. Based on the normalized advantage function, establish the state value function V(·) and the positive definite precision matrix function P(·). Both take the new state vector that incorporates quantile features as input and use two hidden layers with ReLU activation. The former outputs a one-dimensional scalar, and the latter outputs an N×N matrix.

[0092] The positive definite precision matrix function is simplified to:

[0093] In the formula, the factor matrix is a lower triangular matrix and the resulting matrix is positive definite.

[0094] S3.8.2. Construct the current network function for the τ-th quantile:

[0095] This function evaluates the value of an action.
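The formulas for P(·) and the current network function are not reproduced in this text. The sketch below assumes the standard normalized advantage function (NAF) construction: a lower triangular matrix L with a strictly positive diagonal gives P = L·Lᵀ, and the action value is V − ½·(u − m)ᵀ·P·(u − m). Exponentiating the diagonal is an assumption borrowed from common NAF implementations, and all names and numbers are illustrative.

```python
import numpy as np

def precision_matrix(l_flat, n):
    """Build P = L @ L.T from n(n+1)/2 raw network outputs.

    L is lower triangular; its diagonal is exponentiated (assumption) so
    that it is strictly positive and P is positive definite.
    """
    L = np.zeros((n, n))
    L[np.tril_indices(n)] = l_flat
    diag = np.diag_indices(n)
    L[diag] = np.exp(L[diag])
    return L @ L.T

def naf_q_value(u, mu, v, P):
    """Assumed NAF form: state value minus a quadratic penalty on the
    distance between the action u and the network's optimal action mu."""
    diff = u - mu
    return v - 0.5 * diff @ P @ diff

rng = np.random.default_rng(0)
n = 3                                              # number of vehicles
P = precision_matrix(rng.normal(size=n * (n + 1) // 2), n)
mu = np.array([5.0, -3.0, 0.0])                    # output of m(.), illustrative
v = 1.2                                            # output of V(.), illustrative
q_at_mu = naf_q_value(mu, mu, v, P)
q_off = naf_q_value(mu + 1.0, mu, v, P)
```

Under this form the action value is maximized exactly at u = m(·), which is what lets a single network produce the greedy continuous action without a separate actor.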

[0096] S3.8.3. Combining the total reward, the termination flag, and the state value of the target network, construct the target network function for the τ-th quantile:
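The target formula itself is not reproduced in this text. Based only on the three ingredients named in S3.8.3, and assuming a discount factor γ that the text does not state, a target of the usual bootstrapped form might be written as:

```latex
y_{\tau} = r_t + \gamma \, d_t \, V'\!\left(X_{t+1,\beta(\tau)}\right)
```

Here r_t is the total reward, d_t is the termination flag from S3.7.3, and V′ denotes the state value function of the target network. With the flag convention of S3.7.3 (d_t = 0 at the last step of the day), the bootstrapped term vanishes at the end of each day. The symbols γ and V′ are notational assumptions made for this sketch.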

[0097] S3.9. Calculate the Loss Function and Optimize the Network Parameters
S3.9.1. Calculate the temporal-difference (TD) error between the outputs of the current network and the target network:

[0098] It reflects the error in the assessment of the value of the action; S3.9.2. Based on the quantiles after the distortion risk measurement, construct the quantile loss function, the specific formula of which is as follows:

[0099]

[0100]

[0101] Specifically, with κ=1, this function incorporates the smooth L1 loss to calculate the overall loss, and the loss calculation covers all quantile combinations of the current network and the target network.
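The loss formulas are not reproduced in this text. The sketch below assumes the quantile Huber loss commonly used in quantile-based reinforcement learning, with threshold κ and a pairwise sum over all combinations of current and target quantiles. The function name, argument layout, and the reduction (sum over current quantiles, mean over target quantiles) are assumptions.

```python
import numpy as np


def quantile_huber_loss(current, target, taus, kappa=1.0):
    """Quantile Huber loss over all (current, target) quantile pairs.

    current: shape (N,), current-network values at quantiles taus
    target:  shape (M,), target-network values
    taus:    shape (N,), quantile fractions for the current values
    """
    current = np.asarray(current, dtype=float)
    target = np.asarray(target, dtype=float)
    taus = np.asarray(taus, dtype=float)

    # Pairwise TD errors: delta[i, j] = target[j] - current[i]
    delta = target[None, :] - current[:, None]
    abs_delta = np.abs(delta)

    # Smooth L1 (Huber) term: quadratic near zero, linear beyond kappa
    huber = np.where(abs_delta <= kappa,
                     0.5 * delta ** 2,
                     kappa * (abs_delta - 0.5 * kappa))

    # Asymmetric quantile weight |tau - 1{delta < 0}|
    weight = np.abs(taus[:, None] - (delta < 0).astype(float))

    return float((weight * huber / kappa).sum(axis=0).mean())


# Usage: one current quantile at tau = 0.5, one target value
loss = quantile_huber_loss([0.0], [2.0], [0.5])  # 0.5 * (2 - 0.5) = 0.75
```

The asymmetric weight penalizes underestimation more heavily at high quantile fractions and overestimation more heavily at low ones.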

[0102] S3.9.3. Using the overall loss as the optimization objective, perform backpropagation optimization on all parameters to be optimized in the current network, including the weights and biases of the quantile mapping function, the overall optimal action network function, the state value function, and the positive definite precision matrix function. Repeat steps S3.2 to S3.9 until all data in the training dataset has been traversed.

[0103] S3.10. Target network update and terminal deployment. After each day's 96 time steps have been traversed, the parameters of the current state value function are directly assigned to the target state value function. This periodic update of the target network keeps network training stable.
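The direct parameter assignment described above can be sketched as follows. This is a minimal illustration that stores parameters in plain dictionaries. The function name `maybe_sync_target` and the dictionary layout are assumptions; a deep learning framework would provide its own parameter-copy mechanism.

```python
import copy

STEPS_PER_DAY = 96  # the target is synchronized once per 96 time steps


def maybe_sync_target(step_count, current_params, target_params):
    """Copy current parameters into the target after each full day of steps.

    Returns True when a synchronization happened.
    """
    if step_count > 0 and step_count % STEPS_PER_DAY == 0:
        target_params.clear()
        # Deep copy so later training of the current network
        # does not silently change the target network.
        target_params.update(copy.deepcopy(current_params))
        return True
    return False
```

Between synchronizations the target stays frozen, so the TD target does not move while the current network is being optimized.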

[0104] Repeat steps S3.2 to S3.10 to complete n rounds of training. After training, output the overall optimal action network function and transmit it to the cloud processor.

[0105] S4. Intelligent Decision Computation: The intelligent decision-making module constructs a human risk preference model and a quantile mapping function. It takes the Hadamard product of the state vector and the quantile mapping function to obtain a new state vector that incorporates quantile features. This new state vector is input into the overall optimal action network function output by the historical experience training module, which outputs the optimal energy interaction solution between each electric vehicle and the power grid in the current state.
S4.1. The data acquisition module collects data on renewable energy generation, charging station power load, charging price, discharge revenue, and user travel demand within the area via a SCADA system or power exchange API. It then obtains current user travel demand data from a mobile navigation app and converts the collected data into the state vector s_t.
S4.2. Construct the human risk preference model β(τ) and the quantile mapping function Γ( ), consistent with the training phase. Take the Hadamard product of the state vector and the quantile mapping function Γ( ) to obtain the new state vector X_{t,β(τ)} that incorporates quantile features.
S4.3. Obtain the overall optimal action network function μ( ) output by the historical experience training module. Input the new state vector X_{t,β(τ)} into μ( ) to output the optimal energy interaction value between each electric vehicle and the power grid in the current state.
S4.4. The intelligent decision-making module sends energy interaction suggestions to electric vehicle users. After confirming the suggestions according to their own needs, users carry out the energy interaction with the charging station.

[0106] S4.5. Synchronously transmit the energy interaction action information, together with the state and action information of the current time step, to the historical experience training module. S4.6. The intelligent decision-making module updates the network function based on feedback from the historical experience training module, thereby achieving closed-loop control of the model and ultimately realizing the orderly scheduling of electric vehicle charging and discharging within the region.
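The feature fusion in S4.2 and the network call in S4.3 can be sketched as follows. The sketch assumes a cosine-basis quantile embedding with ReLU activation, matching the construction named in the claims, and an embedding width equal to the state dimension so that the Hadamard product is defined. The function names, the basis index starting at 0, and the callables `beta` and `action_network` are assumptions. The expression of β(τ) is not reproduced in this text, so it is passed in rather than hard-coded.

```python
import numpy as np


def quantile_embedding(tau, weights, bias):
    """relu(sum_i cos(pi * i * tau) * weights[i, j] + bias[j]) for each feature j."""
    i = np.arange(weights.shape[0])          # basis indices 0..n-1 (assumed range)
    basis = np.cos(np.pi * i * tau)
    return np.maximum(basis @ weights + bias, 0.0)


def fuse_state(state, tau, beta, weights, bias):
    """Hadamard product of the state vector with the embedding of beta(tau)."""
    state = np.asarray(state, dtype=float)
    return state * quantile_embedding(beta(tau), weights, bias)


def decide(state, tau, beta, weights, bias, action_network):
    """Feed the fused state into the trained action network (any callable)."""
    return action_network(fuse_state(state, tau, beta, weights, bias))


# Usage with placeholder parameters: a risk-neutral beta and a dummy network
state = [1.0, 2.0, 3.0]
weights = np.ones((2, 3))
bias = np.zeros(3)
fused = fuse_state(state, 0.0, lambda t: t, weights, bias)
```

Passing `beta` as a callable keeps the sketch independent of the specific risk preference model, which must be the same one used during training.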

[0107] S5. Command Execution and Interaction: The intelligent decision-making module sends the optimal energy interaction solution to the execution interaction module through the communication module, completing the energy interaction between the electric vehicle and the power grid.

[0108] Embodiment 2. This invention provides a real-time optimization system for vehicle-to-grid interaction based on a single-network architecture, comprising:
The data storage module is a memory used to store data on renewable energy generation, charging station power load, charging price, discharge revenue, user travel demand, electric vehicle state of charge, and battery state of health transmitted by various terminals, and to transmit them to the corresponding modules. The memory may include high-speed RAM or stable non-volatile memory.
The intelligent decision-making module is a central processing unit (CPU) used to execute all steps of S4 in Embodiment 1, receive and process various types of data, generate energy interaction suggestions, and complete the update and iteration of the network function.
The historical experience training module is a graphics processing unit (GPU) used to execute all steps of S3 in Embodiment 1, complete model training and incremental training, and feed back the updated network function to the intelligent decision-making module.
The execution interaction module comprises a display, an input device, and a mode switching unit. The display serves as a human-computer interaction terminal, receiving energy interaction suggestions from the intelligent decision-making module and displaying them visually to electric vehicle users. The input device can be a capacitive touch screen or a keyboard with which users confirm the interaction suggestions. The mode switching unit switches the charging pile between normal charging mode and vehicle-to-grid (V2G) mode.
The communication module is used for information exchange between modules and includes a wireless communication submodule and a wired communication submodule. The wireless communication submodule uses a wide area network communication protocol for communication between the data storage module and the various terminals, and between the intelligent decision-making module and the execution interaction module. The wide area network communication protocol can be LoRaWAN or NB-IoT. The wired communication submodule uses a hardware communication protocol for communication between the data storage module, the intelligent decision-making module, and the historical experience training module. The hardware communication protocol can be USART, I2C, or SPI.

[0109] Each module in the system and its functions are implemented in the same way as the corresponding steps of the method in Embodiment 1, so they are not repeated here.
Embodiment 3. This invention provides a real-time optimization device for vehicle-to-grid interaction based on a single-network architecture. The device includes a memory storing one or more programs, which can be executed by one or more controllers to implement the real-time optimization method for vehicle-to-grid interaction based on a single-network architecture described in the above embodiments.

[0110] The above embodiments are for illustrative purposes only and do not represent the superiority or inferiority of the embodiments.

[0111] To verify the effectiveness of the proposed real-time optimization method, system, and device for vehicle-to-grid interaction based on a single-network architecture, data from the Guangdong Provincial Central Charging Station of China Southern Power Grid were used for simulation analysis. The experimental hardware configuration consisted of an Intel Core i7-13620H processor and 16GB of memory; the software configuration consisted of Python 3.11 and Windows 11 operating system.

[0112] Specifically, 300 electric vehicles were selected for simulation analysis. The model was trained on 576 historical time steps (six days at 96 time steps per day) for the 300 electric vehicles and the charging station, and a total of n=100 training rounds were completed.

[0113] Specifically, the 96 time steps of the seventh day were selected as the test set, and the cost, battery state of health, and grid imbalance power of the 300 vehicles at each time step of that day were output.

[0114] As shown in Figure 3, the average return of the single-network architecture combined with the human risk preference strategy is 0.53 yuan. The return curve as a whole shifts toward the positive-return region, with no obvious low-return tail. The strategy effectively reduces the overall energy exchange cost of the 300 electric vehicles, and the difference in returns between vehicles is small, so the scheduling strategy shows good consistency and generality. As shown in Figure 4, the average battery state of health under the same strategy is 99.9998%, with no obvious decline. The reason is that the change in state of charge is precisely controlled and the depth of discharge is kept within the low-loss range, which effectively reduces battery life degradation.

[0115] As shown in Figure 5, the average control result for grid imbalance power under the single-network architecture combined with the human risk preference strategy is 10567.22 kWh. The overall fluctuation range is small and remains in the low range, which effectively smooths grid power fluctuations and realizes coordinated operation of the electric vehicle cluster and the power grid.

[0116] As shown in Figure 6, the single-network architecture combined with the human risk preference strategy has the shortest average training time, which reduces the workload of the graphics processing unit (GPU) and eases the computing power limitations of terminal devices.

[0117] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Any simple modifications, alterations, or equivalent structural changes made to the above embodiments based on the technical essence of the present invention shall still fall within the protection scope of the present invention.

Claims

1. A real-time optimization method for vehicle-to-network interaction based on a single network architecture, characterized in that it includes the following steps:
S1. Data Acquisition and Storage: Data related to vehicle-to-network interaction is collected by various terminals and transmitted to the data storage module for storage;
S2. Data Distribution and Processing: The data storage module inputs the data into the intelligent decision-making module and the historical experience training module respectively;
S3. Historical Experience Optimization: The historical experience training module calculates the comprehensive benefit based on the input data from the data storage module, performs iterative optimization through an optimization model based on a single network architecture, and outputs the overall optimal action network function to the intelligent decision-making module;
S4. Intelligent Decision Computation: The intelligent decision-making module constructs a human risk preference model and a quantile mapping function. It performs a Hadamard product between the state vector and the quantile mapping function to obtain a new state vector that incorporates quantile features. This new state vector is then input into the overall optimal action network function output by the historical experience training module, which outputs the optimal solution for the energy interaction between each electric vehicle and the power grid in the current state;
S5. Command Execution and Interaction: The intelligent decision-making module sends the optimal energy interaction solution to the execution interaction module through the communication module, completing the energy interaction between the electric vehicle and the power grid.

2. A real-time optimization method for vehicle-to-network interaction based on a single network architecture, characterized in that S3 also includes the following steps:
S3.1. The data storage module collects historical data on renewable energy generation, charging station power load, charging price, discharge revenue, user travel demand, state of charge, and battery health in the region, and inputs this data into the historical experience training module to construct the state vector s_t;
S3.2. Initialize the core physical quantities of the electric vehicle, the experience pool, and the current and target network functions;
S3.3. Construct the human risk preference model β(τ) and the quantile mapping function Γ( ), and perform a Hadamard product operation on the state vector and the quantile mapping function to obtain a new state vector X_{t,β(τ)} that incorporates quantile features;
S3.4. Establish the overall optimal action network function μ( ), input the new state vector X_{t,β(τ)} that incorporates quantile features, perform initial action selection, and output the optimal solution for the energy interaction between each electric vehicle and the power grid in the current state;
S3.5. Constrain the electric vehicle's state of charge and the energy of a single grid interaction, and update the state vector s_{t+1} for the next time step;
S3.6. Calculate the comprehensive benefit of choosing the current action in the current state, and store the five-tuple of the current state, action, reward, next state, and termination flag into the experience pool;
S3.7. Based on the normalized advantage function, establish the state value function V( ) and the positive definite precision matrix function P( ), and design the current network function and the target network function;
S3.8. According to the preset number of scheduling time steps, assign the current network function parameters to the target network function parameters, and calculate the temporal-difference loss at each quantile based on the current network function and the target network function;
S3.9. Perform backpropagation optimization on all parameters to be optimized in the current network with the overall loss as the optimization objective, and repeat steps S3.2 to S3.9 until all data in the training dataset has been traversed;
S3.10. Repeat steps S3.1 to S3.9 to complete n rounds of training until convergence, output the overall optimal action network function μ( ), and pass it to the intelligent decision-making module.

3. A real-time optimization method for vehicle-to-network interaction based on a single network architecture, characterized in that S4 also includes the following steps:
S4.1. The data storage module collects current data on renewable energy generation, charging station power load, charging price, discharge revenue, user travel demand, state of charge, and battery health in the area, and constructs the state vector s_t;
S4.2. Construct the human risk preference model β(τ) and the quantile mapping function Γ( ), consistent with S3.3 in claim 2, and perform a Hadamard product operation on the state vector and the quantile mapping function Γ( ) to obtain a new state vector X_{t,β(τ)} that incorporates quantile features;
S4.3. Obtain the overall optimal action network function μ( ) output by the historical experience training module, input the new state vector X_{t,β(τ)} that incorporates quantile features into μ( ), and output the optimal energy interaction value between each electric vehicle and the power grid in the current state;
S4.4. The intelligent decision-making module sends energy interaction suggestions to electric vehicle users. After the user confirms, an energy interaction action is generated with the charging station;
S4.5. Synchronously transmit the energy interaction action information, together with the state and action information of the current time step, to the historical experience training module, which then performs incremental model training and feeds back the updated network function;
S4.6. The intelligent decision-making module updates the network function based on feedback from the historical experience training module, thereby achieving closed-loop control of the model and ultimately realizing the orderly scheduling of electric vehicle charging and discharging within the region.

4. The real-time optimization method for vehicle-to-network interaction based on a single network architecture as described in claim 2, characterized in that the human risk preference model β(τ) uses the following mathematical expression:

5. The real-time optimization method for vehicle-to-network interaction based on a single network architecture as described in claim 2, characterized in that the quantile mapping function Γ( ) is constructed from cosine basis functions with ReLU activation, and the parameters of Γ( ) to be optimized include the weights ω_ij and the biases b_j.

6. The real-time optimization method for vehicle-to-network interaction based on a single network architecture as described in claim 2, characterized in that the mathematical expressions used for the current network function and the target network function are as follows:
(1) The mathematical expression of the current network function at the τ-th quantile is:
(2) The mathematical expression of the target network function at the τ-th quantile is:
where r_t represents the reward for the current time step, λ is the discount factor, and V_target( ) is the state value function of the target network.

7. The real-time optimization method for vehicle-to-network interaction based on a single network architecture as described in claim 2, characterized in that all the parameters to be optimized in the current network mentioned in step S3.9 include: the weights and biases of the quantile mapping function Γ( ), the weights and biases of the overall optimal action network function μ( ), the weights and biases of the state value function V( ), and the weights and biases of the positive definite precision matrix function P( ).

8. A real-time optimization system for vehicle-to-network interaction based on a single-network architecture, used to implement the real-time optimization method for vehicle-to-network interaction based on a single-network architecture as described in any one of claims 1 to 7, characterized in that it includes a data storage module, an intelligent decision-making module, a historical experience training module, an execution interaction module, and a communication module;
The data storage module is used to store data on renewable energy generation, charging station power load, charging price, discharge revenue, user travel demand, electric vehicle state of charge, and battery state of health sent by various terminals, and to transmit them to the corresponding modules;
The intelligent decision-making module is used to execute all the steps in claim 3, receive and process various types of data, generate energy interaction suggestions, and complete the update and iteration of the network function;
The historical experience training module is used to execute all the steps in claim 2, complete model training and incremental training, and feed back the updated network function to the intelligent decision-making module;
The execution interaction module is used to display energy interaction suggestions to electric vehicle users, receive user confirmation commands, and switch the working mode of the charging pile between normal charging mode and vehicle-to-grid (V2G) mode;
The communication module is used to enable information exchange between the modules and includes a wireless communication submodule and a wired communication submodule;
The wireless communication submodule is used for communication between the data storage module and the various terminals, as well as for information exchange between the intelligent decision-making module and the execution interaction module;
The wired communication submodule is used to enable communication between the data storage module, the intelligent decision-making module, and the historical experience training module.

9. A vehicle-to-network interaction real-time optimization device based on a single network architecture, characterized in that the device includes a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the vehicle-to-network interaction real-time optimization method based on a single network architecture as described in any one of claims 1 to 7.