SAC-DRL-based continuous ETC cooperative adaptive risk warning method for expressway merging area
By applying an SAC-DRL system in highway merging zones and combining a lightweight decision tree with an SAC deep reinforcement learning model, the method issues personalized warning instructions in real time and feeds driver behavior back into the warning strategy. This addresses the poor adaptability and insufficient computing power of existing systems and improves the warning compliance rate and traffic flow management efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- WUHAN UNIV OF TECH
- Filing Date
- 2026-04-14
- Publication Date
- 2026-06-16
AI Technical Summary
Existing traffic safety warning systems at highway merging zones lack adaptability: they cannot provide differentiated guidance based on the micro-state of individual vehicles, and they have no driver behavior feedback mechanism. In addition, high-precision algorithms struggle to deliver real-time warnings on edge computing nodes. The result is low compliance with warning strategies and a decline in overall traffic efficiency.
The SAC-DRL-based method obtains vehicle data through transactions between the edge computing node (MEC) and the vehicle OBU device. Personalized warning instructions are generated using a lightweight decision tree and an SAC deep reinforcement learning model and are released in real time over DSRC and 5G. The system continuously monitors driving behavior and dynamically adjusts the warning strategy, achieving closed-loop feedback and model optimization.
It achieves accurate decision-making with low latency, improves early warning compliance rate and overall traffic capacity, overcomes the problems of poor adaptability and computing power overload of traditional early warning systems, and improves driving safety and traffic flow management efficiency in highway merging areas.
Figure CN122223968A_ABST
Description
Technical Field
[0001] This invention relates to the field of traffic safety technology, and more specifically to a continuous ETC collaborative adaptive risk warning method for highway merging zones based on SAC-DRL.
Background Technology
[0002] Merging zones on highways are typical high-risk, accident-prone sections. Due to frequent right-of-way competition and significant speed differences between main road traffic and merging traffic from ramps, merging zones have become hotspots for traffic congestion and accidents. Meanwhile, with the increasingly sophisticated ETC system and the gradual implementation of smart highway demonstration pilots, some highway demonstration sections are typically equipped with dense deployments of V2X devices with communication and sensing capabilities, as well as multi-ETC gantry systems. This provides a unique hardware foundation for achieving refined, all-weather, all-time vehicle management in complex sections such as merging zones.
[0003] Currently, traffic safety warnings for highway merging zones mainly rely on two types of methods: one is traditional physical control and static guidance, such as setting up speed reduction markings, rumble strips, or fixed speed limit signs; the other is information dissemination based on variable message signs (VMS), which involves broadcasting uniform warning text or speed limit instructions to all vehicles in the area after roadside sensing devices detect congestion or accidents. However, the above-mentioned existing technologies still have significant limitations in practical applications. First, the warning strategy lacks adaptability. The existing "one-to-many" broadcast mode cannot provide differentiated guidance for the micro-state of individual vehicles (such as speed, safety clearance), resulting in "one-size-fits-all" warnings that are easily ignored by drivers, leading to low compliance rates. Second, there is a lack of a closed-loop feedback mechanism for the driving behavior of the warned vehicles. Existing systems are mostly one-way broadcasts and cannot track the driver's actions (slowing down or changing lanes) after receiving the warning. If the risk is not eliminated, the system cannot dynamically adjust the secondary strategy. Finally, there is a contradiction between high-precision algorithms and limited edge computing power. Complex deep reinforcement learning (DRL) and other large-scale model inference are time-consuming and have a large number of parameters, making it difficult to directly deploy them on roadside edge computing nodes (MECs) with limited computing power and tight time constraints to achieve real-time early warning.
[0004] Existing early warning technologies face fundamental technical bottlenecks such as poor adaptability, lack of feedback loops, computational overload, and insufficient training. Specifically, rule-based models cannot handle complex game dynamics, while high-precision deep learning models have massive parameter counts, and running them at full scale easily overloads the MEC, so they fail to meet real-time early warning requirements of tens of milliseconds. Furthermore, the "one-way broadcast" mode cannot detect and respond to situations where drivers ignore warnings, and it lacks subsequent adaptive intervention mechanisms. Meanwhile, the "self-interested" orientation of traditional reinforcement learning easily triggers secondary accidents, and the scarcity of high-risk conflict samples makes it difficult for models to handle extreme conditions. Existing technologies often rely on a single rule-based model or a massive deep learning model; the former has poor adaptability, while the latter is computationally expensive. Traditional roadside information boards or vehicle-mounted equipment warnings typically issue a warning only once and cannot detect whether the driver has followed the advice. Most existing algorithms focus only on the traffic efficiency or collision avoidance of a single vehicle, which easily leads to a decrease in overall traffic efficiency. The scarcity of high-risk negative samples in early warning strategies results in insufficient model training.
Summary of the Invention
[0005] The technical problem to be solved by this invention is to provide a continuous ETC collaborative adaptive risk warning method for highway merging areas based on SAC-DRL, which can accurately target and push personalized warning instructions to specific vehicles, improve the warning compliance rate, and effectively solve the problem of active safety control under complex traffic flow by continuously monitoring the driving behavior feedback of vehicles.
[0006] The technical solution adopted by this invention to solve its technical problem is to construct a continuous ETC collaborative adaptive risk early warning method for highway merging areas based on SAC-DRL, including the following steps:
S1. Collect vehicle data. The edge computing nodes (MECs) installed on the ETC gantries upstream and downstream of the merging zone transact with the vehicle OBU devices to collect vehicle data upstream and downstream of the highway merging zone.
S2. Determine the timing of the warning. Using all vehicle data in the area collected in step S1 as input, the MEC runs a lightweight decision tree model and outputs a binary judgment: trigger the warning or do not trigger it.
S3. Formulate the warning content. When the warning is triggered, the SAC deep reinforcement learning model is run independently for each vehicle that passes through the ETC gantry on the upstream main road of the merging area. The MEC uses the vehicle data collected in step S1 as input to generate and encode the risk warning content, then stores it in the RSU device of the ETC gantry.
S4. Issue the initial warning. Based on the warning timing determined in step S2 and the warning content formulated in step S3, voice warning information is issued to the smart OBUs of all vehicles passing through the ETC gantry cluster.
[0007] S5. Continuously optimize the warning. When a vehicle passes through a subsequent gantry on the upstream main road of the merging zone, the system continues to collect data on all vehicles in the merging zone. By directly adjusting the input parameters and weights of the SAC deep reinforcement learning model, the system adjusts the previous warning strategy in real time and releases the adjusted warning content before the vehicle leaves the gantry.
S6. Terminate the warning and train the model offline. When a vehicle passes through the last ETC gantry on the upstream main road of the merging zone and completes the transaction, the system determines that the closed loop has ended. The complete interaction record of the vehicle is archived and added to the experience pool for cloud-based offline training, which continuously adjusts the model parameters.
[0008] According to the above scheme, in step S1, the intelligent OBU device supports DSRC and 4G/5G dual-mode communication. When the vehicle is within 300 meters of the first upstream gantry, the MEC sends a data collection command to the on-board intelligent OBU at regular intervals over the 5G channel. The collected data include the timestamp, instantaneous speed, position coordinates, vehicle type data, three-dimensional acceleration, and heading angle. The OBU uploads the collected data to the MEC over the 5G channel.
[0009] According to the above scheme, in step S2, after vehicle data collection, a thirteen-dimensional key feature vector is extracted: a thirteen-digit timestamp, instantaneous velocity, WGS84 coordinates, vehicle type, acceleration, and heading angle. The thirteen-dimensional key feature vectors are divided into three clusters according to the three collection sections (upstream of the main road, downstream of the main road, and the ramp), and data aggregation is performed on each cluster. The aggregated data for the main road upstream include: road occupancy rate, total number of vehicles, average vehicle speed, speed standard deviation, high-risk vehicle ratio, density of vehicles accelerating or decelerating rapidly, minimum collision time, and queue length. The aggregated data for the ramp include: inflow speed and minimum inflow gap. The aggregated data for the main road downstream include: average vehicle speed, speed standard deviation, high-risk vehicle ratio, and density of vehicles accelerating or decelerating rapidly. The global environmental factors include: weather, peak hours, and peak dates.
[0010] According to the above scheme, in step S3, the warning information content is generated based on the SAC deep reinforcement learning model. The warning information content includes risk warning, recommended speed limit value and lane change suggestion, and differentiated strategies are generated in combination with different vehicle types. The SAC deep reinforcement learning model is a discrete SAC model with an improved architecture for traffic scenarios. The Actor network of the discrete SAC changes from outputting action distributions to outputting discrete action probabilities to reduce computational complexity. A weight adjustment mechanism is introduced to automatically adjust the weights of each vector based on the compliance with the previous warning during subsequent warnings, improving information coherence and warning effectiveness. A priority experience replay mechanism is introduced to maintain a balanced distribution of successful and unsuccessful warning guidance experiences in the training experience pool. To enable the SAC deep reinforcement learning model to be deployed on edge computing nodes (MECs) and achieve multi-threaded parallel generation of warning content within 30ms, the model is pre-trained in a cloud information processing center, quantized and distilled, and then deployed to the MEC. Information is periodically uploaded and injected into the experience pool located in the cloud information processing center.
[0011] According to the above scheme, the SAC deep reinforcement learning model data preprocessing process uses the warning vehicle data collected in step S1 to extract the thirteen-dimensional key feature vector of the vehicle driving state, uses the data aggregation method in step S2 to obtain the ten-dimensional secondary traffic feature vector of the overall traffic flow state of the merging area, and combines the three-dimensional global feature vector built into the edge computing node MEC and the six-dimensional previous warning parameters to obtain the thirty-two-dimensional traffic feature vector.
[0012] According to the above scheme, in step S4, when the vehicle is within 30 meters of the ETC gantry, the intelligent RSU device sends a warning command to the on-board intelligent OBU device over the DSRC channel. The warning information uses the structured data packet format required by the DSRC technical standard. Its content includes a voice synthesis command, which the OBU's built-in TTS engine converts into natural speech in real time.
According to the above scheme, the MEC pushes the encoded six-dimensional early warning strategy parameter data packet in real time, over a fiber optic wired network, to the RSU device of the ETC gantry upstream of the merging zone that the vehicle is about to pass. The RSU device inserts the data packet into its regular transaction with the vehicle and waits for the target vehicle to trigger it. When the target vehicle comes within 30 meters of the ETC gantry, the on-board intelligent OBU and the roadside RSU automatically complete handshake authentication. After receiving the data packet, the OBU uses its built-in lightweight parsing engine to generate the corresponding natural language text from the packet and calls the TTS engine to broadcast it.
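The parsing step on the OBU can be illustrated with a minimal Python sketch. It assumes the six packet fields match the six previous-warning parameters named in the description (speed adjustment, rate of speed change, change of direction, lane change suggestion, repeat count, priority); the field names, units, value encodings, and message wording are all hypothetical and are not taken from the DSRC standard or from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarningPacket:
    """Six warning-strategy parameters; names, units and encodings are illustrative."""
    speed_delta_kmh: float     # target speed adjustment (negative = slow down)
    speed_change_rate: float   # suggested rate of speed change, m/s^2
    heading_change_deg: float  # suggested change of direction, degrees
    lane_change: int           # -1 = move left, 0 = keep lane, +1 = move right
    repeat_count: int          # how many times this warning has been repeated
    priority: int              # 0 = advisory, 1 = elevated, 2 = urgent


def render_text(p: WarningPacket) -> str:
    """Turn a decoded packet into a sentence that a TTS engine could speak."""
    parts = []
    if p.priority >= 2:
        parts.append("Urgent warning.")
    elif p.repeat_count > 0:
        parts.append("Reminder.")
    if p.speed_delta_kmh < 0:
        parts.append(f"Reduce speed by {abs(p.speed_delta_kmh):.0f} km/h.")
    elif p.speed_delta_kmh > 0:
        parts.append(f"Increase speed by {p.speed_delta_kmh:.0f} km/h.")
    if p.lane_change < 0:
        parts.append("Move to the left lane.")
    elif p.lane_change > 0:
        parts.append("Move to the right lane.")
    if not parts:
        parts.append("Merging traffic ahead. Keep your lane and speed.")
    return " ".join(parts)
```

Keeping the packet to six numeric fields and rendering the text on the OBU means only a small payload has to fit into the short DSRC transaction window.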
[0013] According to the above scheme, in step S5, when the warned vehicle has left the previous warning gantry and is within the MEC communication range of the subsequent gantry, the data acquisition operation of step S1 is performed, and the input parameters and weights of the SAC deep reinforcement learning model are updated to generate the adjusted warning content. The MEC then completes the subsequent broadcast by sending the warning command as in step S4. If the decision at the previous gantry was not to trigger a warning for the vehicle, the decision tree is run again to determine whether to issue a warning when the vehicle passes the subsequent gantry.
[0014] According to the above scheme, in step S6, the interaction record archiving mechanism clusters the records according to whether the guidance succeeded, the interaction records are uploaded to the cloud information processing center, and the learning process of the SAC deep reinforcement learning model periodically samples from them.
[0015] According to the above scheme, the information in the interaction record archive includes the parameters of the warning strategies related to the vehicle, the degree of vehicle compliance, the overall traffic flow in the merging area during that time period, and environmental parameters.
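The archiving and sampling described for step S6 can be sketched as two buffers, one for successful and one for failed guidance, with a fixed share of each training batch drawn from the failure buffer. This is a minimal Python sketch under stated assumptions: the class name, the `failure_ratio` parameter, and sampling with replacement are illustrative choices, not details given in the source.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Success/failure experience buffers with proportional balanced sampling."""

    def __init__(self, capacity=10000, failure_ratio=0.5, seed=None):
        self.success = deque(maxlen=capacity)
        self.failure = deque(maxlen=capacity)
        self.failure_ratio = failure_ratio  # share of each batch taken from failures
        self.rng = random.Random(seed)

    def add(self, record, guided_ok):
        """Archive one interaction record in the buffer matching its outcome."""
        (self.success if guided_ok else self.failure).append(record)

    def sample(self, batch_size):
        """Draw a batch with a fixed failure share, falling back if a buffer is empty."""
        n_fail = round(batch_size * self.failure_ratio) if self.failure else 0
        if not self.success:
            n_fail = batch_size if self.failure else 0
        n_ok = batch_size - n_fail if self.success else 0
        # Sampling with replacement lets a small failure buffer be oversampled.
        batch = self.rng.choices(list(self.failure), k=n_fail) if n_fail else []
        batch += self.rng.choices(list(self.success), k=n_ok) if n_ok else []
        self.rng.shuffle(batch)
        return batch
```

With 95 successful and 5 failed records and `failure_ratio=0.5`, half of every batch still comes from the 5 failures, which is the oversampling effect the method relies on.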
[0016] The SAC-DRL-based continuous ETC collaborative adaptive risk early warning method for highway merging zones described in this invention has the following beneficial effects:
1. This invention resolves the contradiction between the high computational load of artificial intelligence algorithms and the fast response required by existing ETC systems, achieving accurate decision-making with low latency. It uses a dual-mode architecture of a lightweight decision tree plus a lightweight SAC model. In step S2, the decision tree, whose computational cost is extremely low, acts as a front-end switch that quickly filters out risk-free scenarios, and the lightweight SAC model is activated only when necessary to generate complex strategies. This hierarchical processing significantly reduces the resident load of the edge computing nodes (MECs), so that the system can keep the warning latency at the millisecond level even in high-concurrency traffic in highway merging areas, thus guaranteeing driving safety.
[0017] 2. This invention overcomes the limitations of traditional one-way warning strategies and achieves adaptive guidance based on driver compliance feedback. In step S5, a strategy feedback adjustment mechanism quantitatively assesses the driver's willingness to cooperate by calculating the vehicle's speed compliance and lane-changing compliance in real time. When the system detects that the driver has not executed the previous instruction, it dynamically adjusts the historical strategy weights of the SAC model and the feature weights of the vehicle being warned. This forces the model to generate more targeted intervention strategies at subsequent gantries. This adaptive logic improves the reach of warning information and the driver's execution rate.
3. This invention overcomes the tendency of traditional reinforcement learning in traffic control to get trapped in local optima for individual vehicles, and achieves global collaborative optimization at the traffic flow level. In step S2, the overall secondary traffic feature vector of the merging zone is aggregated, and in step S6, a five-dimensional reward function is constructed, including a traffic flow stability reward (…). The model learns not only how to merge warned vehicles safely but also how to smooth traffic flow fluctuations in the merging zone through warnings, thereby improving overall road capacity.
4. In the continuous model training stage of step S6, the present invention adopts dual experience buffers for successes and failures together with a proportional balanced sampling mechanism. By artificially increasing the sampling ratio of low-compliance, high-risk failure samples during training, the model is prevented from losing its warning capability for high-risk vehicles because safe samples dominate, which significantly strengthens the SAC model under the extreme conditions where warnings are most needed.
Attached Figure Description
[0018] The present invention will be further described below with reference to the accompanying drawings and embodiments. In the accompanying drawings:
Figure 1 is the technical roadmap of the SAC-DRL-based continuous ETC collaborative adaptive risk warning method for highway merging areas;
Figure 2 is a schematic diagram of the adaptive optimization process for early warning in the highway vehicle-road cooperative system according to the present invention;
Figure 3 is a schematic diagram of the decision tree model structure of the present invention;
Figure 4 is a schematic diagram of the SAC deep learning model structure of the present invention.
Detailed Implementation
[0019] To provide a clearer understanding of the technical features, objectives, and effects of the present invention, specific embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
[0020] As shown in Figures 1-4, the continuous ETC collaborative adaptive risk warning method for highway merging areas based on SAC-DRL (Soft Actor-Critic Deep Reinforcement Learning) of the present invention includes the following steps:
S1. Collect vehicle data from upstream and downstream of the highway merging area and from the ramps, using the MECs (Multi-access Edge Computing nodes) located near the multiple ETC (Electronic Toll Collection) gantries in this area. When a vehicle comes within 300 meters upstream of the MEC, the node proactively initiates a data collection request every 1 second, sending standardized collection commands to the vehicle over a dedicated 5G channel. Upon receiving the command, the intelligent OBU (On-Board Unit) returns an encrypted data packet containing a timestamp, instantaneous speed, location coordinates, vehicle type data, three-dimensional acceleration, and heading angle. The MEC verifies the integrity of the data and stores valid data in the vehicle's dedicated database. If 5G communication fails or the data is incomplete, the system retries the data acquisition request twice. If it still fails, the vehicle is automatically marked as a "no warning vehicle" and no warning is issued to it. All data collection and processing is completed 30 meters before the vehicle arrives at the first ETC gantry, so that the DSRC-based (Dedicated Short Range Communications) transaction between the OBU and the ETC gantry has sufficient pre-calculation time and the ETC mainline toll transaction process is not affected at all.
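The retry rule in step S1 (one request, two further attempts, then mark the vehicle as a "no warning vehicle") can be sketched in Python as follows. The `request` callable, the field names, and the use of `ConnectionError` to signal a 5G failure are assumptions made for illustration; the source does not specify the interface.

```python
from typing import Callable, Optional

# Fields the OBU packet must contain, per the description (names are illustrative).
REQUIRED_FIELDS = ("timestamp", "speed", "position", "vehicle_type",
                   "acceleration", "heading")
MAX_ATTEMPTS = 3  # one initial request plus two retries


def is_complete(packet: Optional[dict]) -> bool:
    """A packet is valid only if every required field is present and not None."""
    return packet is not None and all(
        packet.get(key) is not None for key in REQUIRED_FIELDS
    )


def collect(request: Callable[[], Optional[dict]]) -> Optional[dict]:
    """Return a complete packet, or None if all attempts fail.

    A None result means the caller should mark the vehicle as a
    "no warning vehicle" and skip warning generation for it.
    """
    for _ in range(MAX_ATTEMPTS):
        try:
            packet = request()
        except ConnectionError:  # hypothetical signal for a failed 5G exchange
            packet = None
        if is_complete(packet):
            return packet
    return None
```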
[0021] S2. The MEC preloads a lightweight decision tree model. As shown in Figure 3, the lightweight decision tree model has a depth of only three layers, seven decision nodes, and twelve decision influencing factors, and is deployed in the real-time computing partition of the MEC.
[0022] After completing the data acquisition in step S1, the MEC immediately extracts the thirteen-dimensional key feature vector: a thirteen-digit timestamp; instantaneous velocity; WGS84 coordinates (degrees); vehicle type (length, width, height); acceleration (longitudinal, lateral, vertical, and angular acceleration); and heading angle (degrees).
[0023] The extracted feature vectors are divided into three clusters according to the three collection sections: upstream of the main road, downstream of the main road, and the ramps. Data aggregation is then performed on each cluster. The aggregated data for the upstream section are: road occupancy rate, total number of vehicles, average vehicle speed, speed standard deviation, proportion of high-risk vehicles (speeding / sudden lane changes), density of vehicles accelerating or decelerating rapidly, minimum collision time (TTC), and queue length. The aggregated data for the ramp are: inflow speed and minimum inflow gap. The aggregated data for the main road downstream are: average vehicle speed, speed standard deviation, proportion of high-risk vehicles (speeding / sudden lane changes), and density of vehicles accelerating or decelerating rapidly. The global environmental factor identifiers (weather, peak hours, peak dates) are then added, and the result is used as input to the decision tree. The data aggregation formulas are as follows:
Road occupancy rate:
[0024] In the formula, the terms denote: the total number of vehicles in the area; the length and width of the i-th vehicle; and the area of the road surface under measurement. The output range is [0,1].
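One reading of the road occupancy rate that is consistent with the listed terms is the summed vehicle footprint divided by the road area. The symbols below ($O$, $N$, $l_i$, $w_i$, $A$) are chosen here for illustration, since the original symbols are not available:

```latex
O = \frac{1}{A}\sum_{i=1}^{N} l_i\, w_i , \qquad O \in [0,1]
```

Here $N$ is the total number of vehicles in the area, $l_i$ and $w_i$ are the length and width of the $i$-th vehicle, and $A$ is the area of the road surface under measurement.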
[0025] Average vehicle speed:
[0026] In the formula, the terms denote: the total number of vehicles in the area; and the velocity components of the i-th vehicle in the x and y directions.
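Given those terms, the average vehicle speed is presumably the mean of the per-vehicle speed magnitudes. A sketch with assumed symbols ($\bar{v}$, $N$, $v_{x,i}$, $v_{y,i}$):

```latex
\bar{v} = \frac{1}{N}\sum_{i=1}^{N}\sqrt{v_{x,i}^{2} + v_{y,i}^{2}}
```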
[0027] Speed standard deviation:
[0028]
[0029] In the formula, the terms denote: the total number of vehicles in the area; the actual speed of the i-th vehicle; and the average vehicle speed.
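A form consistent with these terms is the standard deviation of the individual speeds about the average speed. The symbols ($\sigma_v$, $N$, $v_i$, $\bar{v}$) are assumed, and the population form (dividing by $N$) is used here; the source does not say whether the sample form (dividing by $N-1$) is intended:

```latex
\sigma_v = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(v_i - \bar{v}\right)^{2}}
```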
[0030] Proportion of high-risk vehicles:
[0031]
[0032]
[0033] In the formula, the terms denote: the total number of speeding vehicles; the overspeed threshold, usually set to 0.2 (20% over the speed limit); the lateral acceleration threshold, which with reference to ISO 2631-1 is typically set to 1.47 m/s² (approximately 0.15 g); the total number of vehicles making sudden lane changes; the heading angle variation threshold, set to 10° by default; and the indicator function, which is 1 when the condition is met and 0 otherwise.
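The thresholds above can be applied as in the following Python sketch. Two points are assumptions rather than statements from the source: a sudden lane change is taken to require both the lateral acceleration and the heading change to exceed their thresholds, and a vehicle that is both speeding and changing lanes suddenly is counted once. The class and function names are hypothetical.

```python
from dataclasses import dataclass

OVERSPEED_MARGIN = 0.2    # 20% over the speed limit (from the description)
LAT_ACC_THRESHOLD = 1.47  # m/s^2, about 0.15 g (from the description)
HEADING_THRESHOLD = 10.0  # degrees (default from the description)


@dataclass
class VehicleState:
    speed_kmh: float
    lateral_acc: float         # m/s^2
    heading_change_deg: float  # change in heading over the sampling interval


def high_risk_ratio(vehicles, speed_limit_kmh):
    """Share of vehicles that are speeding or making a sudden lane change."""
    if not vehicles:
        return 0.0
    risky = 0
    for v in vehicles:
        speeding = v.speed_kmh > (1 + OVERSPEED_MARGIN) * speed_limit_kmh
        # Assumption: both thresholds must be exceeded for a sudden lane change.
        sudden = (abs(v.lateral_acc) > LAT_ACC_THRESHOLD
                  and abs(v.heading_change_deg) > HEADING_THRESHOLD)
        risky += speeding or sudden  # each vehicle is counted at most once
    return risky / len(vehicles)
```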
[0034] Density of vehicles accelerating / decelerating rapidly:
[0035]
[0036]
[0037] In the formula, the terms denote: the total number of vehicles accelerating rapidly; the rapid acceleration threshold, typically set to 2.5 m/s²; the total number of vehicles decelerating rapidly; the rapid deceleration threshold, typically set to 3.4 m/s²; the length of the measured road segment; and the indicator function, which is 1 when the condition is met and 0 otherwise.
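A minimal Python sketch of this indicator follows, assuming the density is the number of rapidly accelerating plus rapidly decelerating vehicles divided by the segment length, and that deceleration is represented as negative longitudinal acceleration. The function name and the per-metre unit are illustrative choices.

```python
RAPID_ACC = 2.5  # m/s^2, rapid acceleration threshold (from the description)
RAPID_DEC = 3.4  # m/s^2, rapid deceleration threshold as a magnitude


def rapid_change_density(longitudinal_accs, segment_length_m):
    """Vehicles accelerating or decelerating rapidly, per metre of road segment."""
    if segment_length_m <= 0:
        raise ValueError("segment length must be positive")
    n_acc = sum(1 for a in longitudinal_accs if a > RAPID_ACC)
    n_dec = sum(1 for a in longitudinal_accs if a < -RAPID_DEC)
    return (n_acc + n_dec) / segment_length_m
```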
[0038] Minimum collision time:
[0039]
[0040]
[0041]
[0042] In the formula, the terms denote: the relative distance between vehicles i and j; the relative speed between vehicles i and j; the TTC value between vehicles i and j; and the maximum distance between two vehicles for the pair to be included in the calculation. Typically... s is considered high risk.
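The minimum TTC can be sketched in Python as the smallest gap-over-closing-speed value among vehicle pairs within the maximum distance. This sketch makes several simplifying assumptions that the source does not state: only consecutive vehicles in a single lane are paired, the gap is measured between reference points (vehicle length is ignored), and the default `max_gap_m` is a placeholder value.

```python
import math


def min_ttc(vehicles, max_gap_m=150.0):
    """Smallest time-to-collision among consecutive vehicles in one lane.

    vehicles: iterable of (position_m, speed_mps) along the lane.
    Returns math.inf when no follower is closing in on its leader.
    """
    ordered = sorted(vehicles)  # rear-most vehicle first
    best = math.inf
    for (x_f, v_f), (x_l, v_l) in zip(ordered, ordered[1:]):
        gap = x_l - x_f      # relative distance, follower to leader
        closing = v_f - v_l  # relative speed; positive means the gap shrinks
        if 0 < gap <= max_gap_m and closing > 0:
            best = min(best, gap / closing)
    return best
```

Pairs that are moving apart are skipped because their TTC is undefined, which is why the function returns infinity when no conflict exists.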
[0043] Queue length:
[0044]
[0045]
[0046] In the formula, the terms denote: the position of the first vehicle in the queue; and the position of the last vehicle in the queue. If there are no vehicles in the queue, the queue length is 0.
[0047] Inflow speed:
[0048]
[0049] In the formula, the terms denote: the distance from the vehicle to the merging point; the length of the merging area, that is, the distance from the ramp gantry to the merging point; the number of vehicles merging into the area; and the indicator function, which is 1 when the condition is met and 0 otherwise.
[0050] Minimum inflow gap:
[0051]
[0052]
[0053] In the formula, the terms denote: the vehicle spacing; the x-coordinates of two adjacent vehicles on the main road; the lengths of the two adjacent vehicles; the dynamic safety distance; and the maximum vehicle headway that is psychologically acceptable to the driver, usually set to 4.5 seconds according to the Highway Capacity Manual.
[0054] As shown in Figure 3, after receiving its input, the decision tree uses the road occupancy rate with total vehicle count correction as the first-level (root) decision node. The minimum inflow gap with inflow speed correction and the high-risk vehicle proportion with speed standard deviation correction are the second-level decision nodes. TTC, queue length, density of vehicles accelerating or decelerating rapidly, and the weather conditions built into the MEC are the third-level decision nodes. Global peak-hour and peak-date coefficient corrections are then applied to produce a binary yes/no warning judgment. The formulas for the first- and second-level decision nodes are as follows:
Road occupancy rate + total vehicle count correction:
[0055] In the formula, the terms denote: the base correction factor, with a default value of 1; the occupancy sensitivity, with a default value of 1; the sensitivity to the number of vehicles, with a default value of 1; and the capacity of the merging zone, estimated from the number of main road lanes within the merging zone. The three coefficients are adjusted automatically during model training.
[0056] Minimum inflow gap + inflow speed correction:
[0057] In the formula, the terms denote: the base correction factor, with a default value of 1; the time margin sensitivity, with a default value of 1; the sensitivity to the number of vehicles, with a default value of 1; and the speed limit for the upstream section of the merging zone, in km/h. The three coefficients are adjusted automatically during model training.
[0058] High-risk vehicle ratio + speed standard deviation correction:
[0059] In the formula, the terms denote: the base correction factor, with a default value of 1; the linear sensitivity, with a default value of 1; and the non-linear sensitivity, with a default value of 1. The three coefficients are adjusted automatically during model training.
[0060] The nodes are arranged for the following reasons. The road occupancy rate with total vehicle count correction, as aggregated data for the main road upstream, characterizes the traffic volume and congestion level upstream of the merging zone; it is fundamental to the risk in the merging zone and is therefore used as the root node. The minimum inflow gap with inflow speed correction, as aggregated data for the ramps, is directly affected by upstream traffic; the spatiotemporal characteristics of merging behavior directly determine the likelihood and intensity of merging conflicts and are crucial to the safety status of the merging zone. The proportion of high-risk vehicles with speed standard deviation correction, as aggregated data for the main road upstream and downstream as a whole, reflects the stability of traffic flow and amplifies the risk in the merging zone. Time to collision (TTC), as an upstream conflict verification indicator, is computed from actual vehicle speed and distance. It is a physical quantity that measures immediate collision risk, captures the dynamic conflict process accurately, and provides a basis for merging risk assessment. Queue length, as an indicator of ramp space resources, reflects the waiting status of vehicles before merging; queue growth directly affects merging efficiency and system capacity, so it serves as a parameter for judging merging congestion. The density of vehicles accelerating or decelerating rapidly, as an overall dynamic behavior indicator for the main road upstream and downstream, quantifies how often drivers adjust their speed and further reveals the degree of traffic flow instability. Weather conditions, as environmental boundary conditions, affect driving factors such as visibility and road surface adhesion coefficient and significantly influence the formation of instability. The global environmental factor identifiers (peak hours, peak dates) act as weighting coefficients that directly affect the threshold settings of the nodes.
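The three-level structure can be illustrated with a minimal Python sketch. A full binary tree of depth three has seven nodes (one root, two second-level, four third-level), which matches the node count given for the model. Everything else here is an assumption: which third-level node hangs under which branch, how the peak factor scales the comparisons, and every threshold value are placeholders, not values from the source.

```python
from dataclasses import dataclass

# Placeholder thresholds for illustration only; none of them come from the source.
ROOT_T, GAP_T, RISK_T = 0.5, 0.5, 0.5
TTC_T_S, QUEUE_T_M, DENSITY_T = 3.0, 50.0, 0.01


@dataclass
class Scene:
    occupancy_score: float    # root: occupancy corrected by total vehicle count
    gap_score: float          # level 2: minimum inflow gap corrected by inflow speed
    risk_score: float         # level 2: high-risk ratio corrected by speed std
    min_ttc_s: float          # level 3
    queue_length_m: float     # level 3
    rapid_density: float      # level 3, vehicles per metre
    bad_weather: bool         # level 3, built into the MEC
    peak_factor: float = 1.0  # above 1 during peak hours/dates; tightens thresholds


def should_warn(s: Scene) -> bool:
    """Binary trigger / do-not-trigger decision from a fixed three-level tree."""
    k = s.peak_factor
    if s.occupancy_score * k > ROOT_T:          # root: upstream is congested
        if s.gap_score * k > GAP_T:             # level 2: merging gaps are tight
            return s.min_ttc_s < TTC_T_S * k    # level 3: imminent conflict
        return s.queue_length_m * k > QUEUE_T_M  # level 3: ramp queue building
    if s.risk_score * k > RISK_T:               # level 2: unstable flow
        return s.rapid_density * k > DENSITY_T   # level 3: frequent hard braking
    return s.bad_weather                         # level 3: environment only
```

Each call evaluates at most three comparisons, which is the property that makes the tree cheap enough to act as a front-end switch before the SAC model runs.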
[0061] S3. Generate personalized warning content with the SAC deep reinforcement learning model. The model is a discrete SAC model whose architecture has been adapted to traffic scenarios. The Actor network of the discrete SAC outputs discrete action probabilities instead of action distributions, which reduces computational complexity. A weight adjustment mechanism automatically adjusts the weight of each vector in subsequent warnings according to compliance with the previous warning, improving information coherence and warning effectiveness. A priority experience replay mechanism maintains a balanced distribution of successful and failed warning guidance experiences in the training experience pool. So that the model can be deployed on the MEC and generate warning content in parallel across multiple threads within 30 ms, it is pre-trained in the cloud information processing center, quantized and distilled, and then deployed to the MEC. Information is periodically uploaded and injected into the experience pool located in the cloud information processing center.
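Because the Actor outputs a probability for every discrete action, quantities that continuous SAC estimates by sampling can be computed as exact sums. The Python sketch below shows this for the policy entropy and the soft state value $V(s) = \sum_a \pi(a|s)\,[Q(s,a) - \alpha \log \pi(a|s)]$, which is the standard discrete-SAC form. The function names and inputs are illustrative, and the network that would produce the logits and Q-values is not shown.

```python
import math


def softmax(logits):
    """Convert actor logits into discrete action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Policy entropy, computed exactly because the action set is finite."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def soft_state_value(probs, q_values, alpha):
    """V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s))."""
    return sum(p * (q - alpha * math.log(p))
               for p, q in zip(probs, q_values) if p > 0)
```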
[0062] The data preprocessing for the SAC deep reinforcement learning model uses the data of the warned vehicle collected in step S1 and extracts a 13-dimensional key feature vector describing the vehicle's driving state: a 13-digit timestamp; instantaneous velocity; WGS84 coordinates (degrees); vehicle type (length, width, height); acceleration (longitudinal, lateral, vertical, and angular acceleration); and heading angle (degrees).
[0063] The data aggregation method of step S2 is used to obtain a ten-dimensional secondary traffic feature vector of the overall traffic flow state in the merging zone, which includes: road occupancy rate, total number of vehicles, average vehicle speed, speed standard deviation, proportion of high-risk vehicles (speeding / sudden lane changes), density of vehicles accelerating or decelerating rapidly, minimum collision time (TTC), queue length, inflow speed, and minimum inflow gap.
[0064] These vehicle data are combined with the MEC's built-in three-dimensional global feature vector (weather, peak hours, peak dates) and the six-dimensional previous warning parameters (target speed adjustment amount, rate of change of velocity, change of direction, lane change recommendation parameter, number of times the warning has been repeated, and early warning priority; if this is the first warning for the vehicle, the default value 0 is used instead), resulting in a 32-dimensional traffic feature vector.
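The dimension bookkeeping (6 + 13 + 10 + 3 = 32) can be checked with a short Python sketch. The ordering of the four blocks inside the state vector and the function name are assumptions; the source only fixes the block sizes and the rule that a first warning uses zeros for the previous-warning parameters.

```python
def build_state(vehicle13, flow10, global3, prev6=None):
    """Concatenate the four feature blocks into the 32-dimensional state vector.

    prev6 defaults to zeros, the stated rule for a vehicle's first warning.
    The block order used here is an illustrative assumption.
    """
    prev6 = [0.0] * 6 if prev6 is None else list(prev6)
    blocks = (
        ("previous warning", prev6, 6),
        ("vehicle", vehicle13, 13),
        ("merging-zone flow", flow10, 10),
        ("global", global3, 3),
    )
    state = []
    for name, vec, dim in blocks:
        if len(vec) != dim:
            raise ValueError(f"{name} block must have {dim} values, got {len(vec)}")
        state.extend(float(x) for x in vec)
    return state
```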
[0065] For the above thirty-two-dimensional traffic feature vectors, different weights are assigned, and the specific weight calculation formula is as follows:
[0066]
[0067] In the formula, the feature terms are, in order: the total 32-dimensional traffic feature vector; the six-dimensional parameters of the previous warning; the thirteen-dimensional feature vector of the warned vehicle; the ten-dimensional secondary traffic feature vector of the merging zone; and the three-dimensional global feature vector. The weight terms are, in order: the total weight, which is the set of weights for all traffic feature vectors; the weight of the previous warning parameters; the weight of the warned vehicle's features; the weight of the traffic flow features in the merging zone; and the weight of the global environmental features.
[0068] Weight of the previous warning parameter:
[0069]
[0070] In the formula, the terms are, in order: the normalized average of the previous warning parameters; the normalized indicators of the individual previous warning parameters (target speed adjustment amount, rate of change of velocity, lane change direction, lane change suggestion parameter, number of warning repetitions, and warning priority); the base weight, with a default value of 1; the enhancement factor, with a default value of 1; the time decay coefficient, with a default value of 1; and the time since the last warning, in seconds (s). The base weight, enhancement factor, and time decay coefficient are adjusted automatically during model training.
[0071] Weighting of features of vehicles under warning:
[0072] In the formula, the terms are, in order: the base weight, with a default value of 1; the enhancement factor, with a default value of 1; the norm of the current feature vector of the warned vehicle; and the maximum possible norm of that vector. The base weight and the enhancement factor can be adjusted during model training.
[0073] Traffic flow characteristic weights in merging zones:
[0074] In the formula, the terms are, in order: the base weight, with a default value of 1; the enhancement factor, with a default value of 1; the road occupancy rate; and the maximum occupancy rate, with a default value of 1. The base weight, enhancement factor, and maximum occupancy rate are adjusted automatically during model training.
[0075] Global environment feature weights:
[0076] In the formula, the terms are, in order: the base weight, with a default value of 0.5; the enhancement factor, with a default value of 1; the weather feature, with default values of 1.5 for rainy / snowy days, 2.0 for foggy days, and 1.0 for sunny days (placeholders are reserved so that users can set values for other extreme weather conditions); the risk index for peak hours, with a default value of 1.5 for peak hours and 1.0 for off-peak hours; and the risk index for peak days, with a default value of 1.5 for peak days and 1.0 for off-peak days. These five parameters are adjusted automatically during model training.
[0077] The lightweight Actor network employs a fully connected neural network (MLP) architecture consisting of an input layer, two hidden layers, and an output layer. It generates a three-dimensional action from the 32-dimensional traffic feature vector as follows: the 32-dimensional traffic feature vector is normalized and used as input, and forward propagation is performed through matrix multiplication and nonlinear transformation. Both hidden layers have 128 nodes, and the activation function is ReLU (Rectified Linear Unit). The calculation process is as follows:
[0078]
[0079] In the formula, the terms are the weight matrices of the hidden layers, the bias vectors, and the feature outputs of the hidden layers.
[0080] In the 3D action output process of the discrete SAC algorithm, the hidden-layer features are fed into three independent output branches, and the Softmax function maps the network's output logits to a probability distribution. The output 3D action index is:
[0081] In the formula, the three components correspond to three dimensions of the early warning strategy: the speed guidance action, the lane change suggestion action, and the urgency of the warning.
[0082] Speed guidance action index:
[0083] In the formula, the terms are the weight matrix from the hidden layer to the speed output layer, which maps the hidden-layer features to the speed action space, and the bias vector of the speed output layer, which adjusts the activation threshold. Both are obtained through continuous optimization in cloud-based training. The output layer has 26 nodes, representing target speed adjustment levels from acceleration to 125% down to deceleration to 0%, in steps of 5%, including speed maintenance.
[0084] Lane change suggestion action index:
[0085] In the formula, the terms are the weight matrix from the hidden layer to the lane change output layer, which maps the hidden-layer features to the lane change action space, and the bias vector of the lane change output layer, which adjusts the activation threshold. Both are obtained through continuous optimization in cloud-based training. The output layer has 3 nodes, representing the lane change suggestions: maintain the current lane, change lanes to the left, or change lanes to the right.
[0086] Warning urgency index:
[0087] In the formula, the terms are the weight matrix from the hidden layer to the output layer for warning repetition count and priority, and the bias vector of that output layer. The output layer has 3 nodes, representing the repetition count and priority of the warning: low priority is broadcast once, medium priority twice, and high priority three times.
[0088] During the MEC inference phase, a deterministic policy is adopted instead of the stochastic policy used during training. That is, the action index with the highest probability in each dimension is selected as the final output.
[0089]
[0090]
[0091]
[0092] In the formula, the terms are, in order: the final three-dimensional action vector; the final speed guidance action; the final lane change guidance action; and the final warning urgency level.
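The forward pass and deterministic action selection described above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: the class name `ActorSketch`, the random placeholder weights, and the head names are hypothetical, and a deployed model would load trained, quantized parameters instead. The layer sizes (32 inputs, two hidden layers of 128 nodes, output branches of 26, 3, and 3 nodes), ReLU, Softmax, and the highest-probability selection follow the description.

```python
import math
import random
from typing import Dict, List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight matrix, bias vector)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random placeholder weights; trained parameters are assumed in practice."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x: List[float], layer: Layer) -> List[float]:
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ActorSketch:
    HEAD_SIZES = {"speed": 26, "lane": 3, "urgency": 3}

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden1 = init_layer(32, 128, rng)
        self.hidden2 = init_layer(128, 128, rng)
        self.heads = {name: init_layer(128, size, rng)
                      for name, size in self.HEAD_SIZES.items()}

    def probabilities(self, state: List[float]) -> Dict[str, List[float]]:
        if len(state) != 32:
            raise ValueError("state must be 32-dimensional")
        h = relu(linear(state, self.hidden1))
        h = relu(linear(h, self.hidden2))
        return {name: softmax(linear(h, layer))
                for name, layer in self.heads.items()}

    def act(self, state: List[float]) -> Tuple[int, int, int]:
        """Deterministic inference: highest-probability index per branch."""
        probs = self.probabilities(state)

        def best(p: List[float]) -> int:
            return max(range(len(p)), key=p.__getitem__)

        return best(probs["speed"]), best(probs["lane"]), best(probs["urgency"])
```

Calling `ActorSketch().act(state)` on a normalized 32-dimensional state returns one index per branch; the same state always yields the same indices because no sampling takes place at inference.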
[0093] The final 3D action vector output by the lightweight Actor network is converted into six concrete early warning strategy parameters through the following calculation and mapping process. Target speed adjustment amount:
[0094] In the formula, the target speed adjustment amount represents the adjustment of the suggested vehicle speed relative to the current vehicle speed.
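One way to read the 26 speed levels (125% down to 0% in 5% steps) is sketched below. The index ordering (index 0 corresponds to 125%) and the function names are assumptions, since the mapping formula itself is not reproduced in the text; only the range, the step size, and the node count come from the description.

```python
SPEED_LEVELS = 26   # number of output nodes stated in the description
STEP_PERCENT = 5    # step size between adjacent levels
MAX_PERCENT = 125   # strongest acceleration suggestion


def speed_percent(index: int) -> int:
    """Suggested speed as a percentage of the current speed.

    Assumed ordering: index 0 -> 125 %, index 5 -> 100 %, index 25 -> 0 %.
    """
    if not 0 <= index < SPEED_LEVELS:
        raise ValueError(f"index must be in [0, {SPEED_LEVELS - 1}]")
    return MAX_PERCENT - STEP_PERCENT * index


def target_speed_adjustment(index: int, current_speed: float) -> float:
    """Signed adjustment relative to the current speed (same unit as input)."""
    return (speed_percent(index) - 100) * current_speed / 100
```

Under this assumed ordering, a vehicle travelling at 100 km/h that receives index 7 (90%) is advised to adjust by -10 km/h, and index 5 corresponds to speed maintenance.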
[0095] Rate of change of velocity:
[0096] In the formula, the rate of change of velocity is the acceleration / deceleration. The three thresholds are the preset comfort level (recommended value 0.15g, referring to the comfort limits for human perception of vibration and acceleration in the ISO 2631-1 standard), the standard level (recommended value 0.25g), and the emergency level (recommended value 0.35g, referring to the benchmark deceleration used to determine "safe braking distance" in highway design). The applicable threshold is determined by the action output of the Actor network.
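As a worked illustration of the three recommended thresholds, the sketch below converts them from multiples of g to m/s² and estimates how long a speed change would take at each level. The level names used as dictionary keys, the helper names, and the constant-rate assumption are hypothetical; the values 0.15g, 0.25g, and 0.35g come from the description, and 9.80665 m/s² is the conventional standard gravity.

```python
STANDARD_GRAVITY = 9.80665  # m/s^2, conventional standard value

# Recommended thresholds from the description, in multiples of g.
ACCEL_LIMITS_G = {"comfort": 0.15, "standard": 0.25, "emergency": 0.35}


def accel_limit_ms2(level: str) -> float:
    """Threshold for the given level in m/s^2."""
    return ACCEL_LIMITS_G[level] * STANDARD_GRAVITY


def time_to_reach(current_kmh: float, target_kmh: float, level: str) -> float:
    """Seconds needed to change speed at a constant rate (an assumption)."""
    delta_ms = abs(target_kmh - current_kmh) / 3.6  # km/h -> m/s
    return delta_ms / accel_limit_ms2(level)
```

For example, the comfort level corresponds to about 1.47 m/s², so slowing from 100 km/h to 90 km/h at that rate takes roughly 1.9 s, while the emergency level (about 3.43 m/s²) needs under 1 s.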
[0097] Lane change direction:
[0098] In the formula, the lane change direction instructs the vehicle to maintain its current lane or to change lanes left / right to avoid merging zone risks. This parameter is determined by the lane change suggestion action.
[0099] Recommended lane change parameters:
[0100] In the formula, the lane change suggestion parameter quantifies the strength of the lane change suggestion. It combines the lane change action and the urgency of the risk: the higher the value, the more necessary the lane change. The warning strategy adjusts the wording of the voice prompt accordingly (for example, "suggest lane change" or "please change lane immediately").
[0101] Number of times the warning is repeated:
[0102] In the formula, the number of warning repetitions is an integer in the range [1, 3]: low priority corresponds to 1 broadcast, medium priority to 2 broadcasts, and high priority to 3 broadcasts.
[0103] Warning priority:
[0104] In the formula, the warning priority is an integer ranging from 1 to 3. When the RSU receives multiple transaction requests simultaneously, it determines the message queuing priority based on this parameter; a larger value indicates higher priority. The final six-dimensional strategy parameters are encoded into a structured data packet and transmitted to the RSU device for caching.
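The queuing rule at the RSU (larger priority value served first) can be sketched with a binary heap. The class and field names, the urgency-index ordering (0 = low, 1 = medium, 2 = high), and the first-in-first-out tie-break among equal priorities are assumptions for illustration; the priority range 1 to 3 and the matching repetition counts come from the description.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WarningPacket:
    vehicle_id: str
    priority: int     # 1 (low) to 3 (high)
    repetitions: int  # number of broadcasts, 1 to 3


# Assumed index order of the urgency branch: 0 = low, 1 = medium, 2 = high.
URGENCY_TO_PRIORITY = {0: 1, 1: 2, 2: 3}


def packet_from_urgency(vehicle_id: str, urgency_index: int) -> WarningPacket:
    priority = URGENCY_TO_PRIORITY[urgency_index]
    # Low priority -> 1 broadcast, medium -> 2, high -> 3.
    return WarningPacket(vehicle_id, priority, repetitions=priority)


class RsuQueue:
    """Serve higher-priority packets first; FIFO among equal priorities."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, WarningPacket]] = []
        self._order = itertools.count()

    def push(self, packet: WarningPacket) -> None:
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-packet.priority, next(self._order), packet))

    def pop(self) -> WarningPacket:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

The insertion counter guarantees that two packets with the same priority leave the queue in arrival order and that the packets themselves are never compared.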
[0105] S4. Issue initial warning information. Based on the warning timing determined in step S2 and the six-dimensional warning strategy parameters generated in step S3, the information is disseminated through the "MEC-RSU-OBU" link.
[0106] MEC transmits the encoded six-dimensional early warning strategy parameter data packet, including the target vehicle ID, via a fiber optic wired network. The data packet is pushed in real time to the RSU device at the upstream ETC gantry of the merging zone the vehicle is about to pass through. The RSU inserts this data packet into its regular transaction with the vehicle, waiting for the target vehicle to trigger it. When the target vehicle is within 30 meters of the ETC gantry, it enters the DSRC 5.8GHz communication coverage area, and the onboard intelligent OBU device and the roadside RSU device automatically complete a handshake authentication. After receiving the data packet, the onboard intelligent OBU uses its built-in lightweight parsing engine to generate corresponding natural language text based on the data packet and calls the TTS engine to broadcast it.
[0107] S5. Continuous Optimization of Warning Issuance. When a vehicle receiving a warning passes the subsequent ETC gantry on the upstream main road of the merging zone, the system executes a closed-loop feedback and strategy update process. The MEC node uses the warning vehicle data collected in step S1 to calculate the driver's compliance with the previous warning and dynamically adjusts the input weights of the SAC model accordingly to generate an iterative warning strategy. Subsequently, the MEC sends the warning command through the scheme in step S4 to complete the subsequent broadcast.
[0108] The above calculation process for the compliance rate of the previous round of early warnings specifically includes the following steps: Speed compliance calculation:
[0109] In the formula, the terms are, in order: the vehicle's current instantaneous speed, obtained from the vehicle data collected in step S1; the speed suggested by the previous strategy; and the tolerance threshold, set to 20% of the current road segment's speed limit. When the speed deviation exceeds this threshold, the compliance is set to zero.
[0110] Lane change compliance calculation:
[0111] In the formula, the terms are, in order: the current lane ID and the suggested target lane ID, which are determined from the MEC's built-in database based on the vehicle's uploaded location; the vehicle's current lateral speed; the maximum lateral speed for a typical lane change, taken as 1.5 m/s (a lane width of 3.75 m divided by a shortest lane change time of 2.5 s gives 3.75 / 2.5 = 1.5 m/s); and the direction indicator factor, which is 1 if the direction of the vehicle's lateral movement matches the suggested lane change direction and -1 otherwise.
[0112] Overall compliance rate:
[0113] In the formula, the weighting coefficient has a default value of 0.5 and is adjusted automatically during model training.
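The compliance formulas themselves are not reproduced in the text, so the following is only a plausible sketch built on the stated facts: a tolerance of 20% of the speed limit, zero speed compliance beyond that tolerance, and a weighting coefficient with a default of 0.5. The linear fall-off inside the tolerance band and the convex combination of the two scores are assumptions, as are the function names.

```python
def speed_compliance(current_speed: float,
                     suggested_speed: float,
                     speed_limit: float,
                     tolerance_ratio: float = 0.2) -> float:
    """Speed compliance in [0, 1].

    Stated in the description: tolerance = 20 % of the speed limit, and
    compliance is zero once the deviation exceeds it.
    Assumed here: a linear fall-off inside the tolerance band.
    """
    tolerance = tolerance_ratio * speed_limit
    deviation = abs(current_speed - suggested_speed)
    if deviation > tolerance:
        return 0.0
    return 1.0 - deviation / tolerance


def overall_compliance(speed_score: float,
                       lane_score: float,
                       alpha: float = 0.5) -> float:
    """Weighted combination; the convex form is an assumption."""
    return alpha * speed_score + (1.0 - alpha) * lane_score
```

With a 120 km/h limit the tolerance is 24 km/h, so a vehicle at 102 km/h against a suggested 90 km/h scores 0.5 on speed under this sketch, and a vehicle at 120 km/h scores 0.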
[0114] Based on the overall compliance, a set of weight correction formulas is constructed to correct, in real time, the weights of the relevant parameters of the warned vehicle in the feature weight calculation formulas of step S3, enhancing the model's adaptive ability.
[0115] Last warning parameter weight adjustment:
[0116] In the formula, the terms are the base weight of the previous warning parameters and the penalty coefficient for historical strategies, which has a default value of 1 and is adjusted automatically during model training.
[0117] Correction of feature weights for vehicles under warning:
[0118] In the formula, the terms are the base weight of the feature vector of the warned vehicle, collected by the OBU, and the vehicle status penalty coefficient, which has a default value of 1 and is adjusted automatically during model training.
[0119] The MEC uses the two corrected weight parameters to recalculate the weighted combination of the 32-dimensional traffic feature vector, which is then input into the lightweight Actor network for forward inference.
[0120] S6. Warning Termination and Offline Model Training. When a vehicle receiving a warning passes the last ETC gantry on the upstream main road of the merging zone, the system determines that the vehicle's guidance task has ended. The MEC node executes the warning termination operation, calculates the reward value based on the vehicle's final performance through the merging zone, packages the complete experience samples, and uploads them to the cloud for continuous iteration of the SAC model.
[0121] To provide the SAC model with accurate optimization direction, MEC calculates the comprehensive reward value for this guidance task based on the vehicle's driving data throughout the merging process. The reward function design comprehensively considers all dimensions. The scoring calculation formula is as follows:
[0122]
[0123] In the formula, the five reward terms are as follows. The safety reward uses the minimum time to collision calculated in step S2 to measure immediate collision risk. The traffic efficiency reward uses the average vehicle speed calculated in step S2 and the target speed adjustment amount from step S3 to assess the loss of traffic efficiency caused by speed fluctuations; its terms also include the vehicle's average speed over the entire merging zone, the speed at which the warning was triggered, and the speed limit upstream of the merging zone. The driving comfort reward uses the longitudinal acceleration and lateral acceleration collected in step S1 to assess ride comfort, each with its own weight. The traffic flow stability reward uses the speed standard deviation and the density of rapidly accelerating / decelerating vehicles, both taken from the secondary traffic feature vector of the merging zone aggregated in step S2, to assess how turbulent the overall traffic flow in the merging area becomes after the warned vehicle acts, each with its own weight. The policy compliance reward directly uses the overall compliance calculated in step S5 to measure the driver's actual compliance with the guidance instructions. Each of the five reward dimensions has a weight with a default value of 0.2, which can be adjusted during model training.
[0124] This function considers five dimensions: safety, traffic efficiency, driving comfort, traffic flow stability, and policy compliance. It quantifies the impact of individual vehicle behavior on itself and the overall traffic environment, and achieves adaptive optimization of the policy.
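The combination of the five reward dimensions can be sketched as a weighted sum. The per-dimension formulas are not reproduced in the text, so this sketch takes the five scores as precomputed inputs; the weighted-sum form, the class `RewardTerms`, and the function name are assumptions. Only the five dimensions and the default weight of 0.2 come from the description.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass(frozen=True)
class RewardTerms:
    """Per-dimension scores, assumed to be computed elsewhere."""
    safety: float
    efficiency: float
    comfort: float
    stability: float
    compliance: float


# Default weight of 0.2 per dimension, as stated in the description.
DEFAULT_WEIGHTS: Dict[str, float] = {f.name: 0.2 for f in fields(RewardTerms)}


def total_reward(terms: RewardTerms,
                 weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of the five reward dimensions (assumed form)."""
    w = DEFAULT_WEIGHTS if weights is None else weights
    missing = set(DEFAULT_WEIGHTS) - set(w)
    if missing:
        raise ValueError(f"missing weights for: {sorted(missing)}")
    return sum(w[name] * getattr(terms, name) for name in DEFAULT_WEIGHTS)
```

Passing a custom `weights` dictionary mimics the tuning of the per-dimension weights during training, for example raising the safety weight when collision risk dominates.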
[0125] The system packages the vehicle's complete driving trajectory upstream of the merging zone and aggregates it to generate a standard interaction record. This record contains the following four core data categories: previous warning strategies (recording all six-dimensional strategy parameters of the vehicle), vehicle compliance (recording the vehicle's compliance with each warning), overall traffic flow characteristics of the merging zone (recording the ten-dimensional secondary traffic feature vector obtained by aggregation in step S2 during the vehicle's passage through the merging zone), and static environment and vehicle characteristics (recording the external environment and vehicle characteristics during the vehicle's passage through the merging zone, including the three-dimensional global feature vector, the warning vehicle ID and vehicle type data collected in step S1). The standard interaction record is stored in the MEC and automatically uploaded to the cloud information processing center monthly. After receiving the anonymized interaction record, the cloud information processing center determines the label based on the guidance compliance result and stores the data in the guidance success experience buffer and guidance failure experience buffer respectively. Each experience sample is structured and stored in the reinforcement learning standard format, which includes a thirty-two-dimensional traffic feature vector, a three-dimensional action index, a comprehensive reward value, a next state feature, and a task end marker.
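The experience sample format and the two buffers described above can be sketched as follows. The class names and the compliance threshold used to label a sample as a guidance success are hypothetical, since the text states only that the label is determined from the guidance compliance result; the five stored fields and the 32-dimensional state follow the description.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Experience:
    state: Tuple[float, ...]        # 32-dimensional traffic feature vector
    action: Tuple[int, int, int]    # speed, lane change, urgency indices
    reward: float                   # comprehensive reward value
    next_state: Tuple[float, ...]   # next state feature
    done: bool                      # task end marker

    def __post_init__(self) -> None:
        if len(self.state) != 32 or len(self.next_state) != 32:
            raise ValueError("state vectors must be 32-dimensional")
        if len(self.action) != 3:
            raise ValueError("action must hold three indices")


class DualBuffer:
    """Separate stores for guidance successes and guidance failures."""

    def __init__(self, success_threshold: float = 0.5) -> None:
        # Hypothetical threshold: the description does not give a value.
        self.success_threshold = success_threshold
        self.success: List[Experience] = []
        self.failure: List[Experience] = []

    def add(self, experience: Experience, compliance: float) -> None:
        if compliance >= self.success_threshold:
            self.success.append(experience)
        else:
            self.failure.append(experience)
```

Keeping failures in their own list is what later allows the training stage to control how often they are drawn, independently of how rare they are in the raw data.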
[0126] The cloud server cluster periodically triggers the learning process of the SAC deep reinforcement learning model. During training, the system samples from the two buffers above in a balanced proportion, deliberately increasing the learning weight of low-compliance "failure samples" to strengthen the model's warning decision logic in those scenarios. The cloud uses the sampled batch to update the two Critic networks and the Actor policy network together. For value assessment, the Q-value is calculated through the two Critic networks and the network parameters are optimized by minimizing the TD error, so that the potential value of the current traffic state is assessed accurately. For policy update and optimization, the policy entropy is calculated and the Actor loss function is constructed in combination with an automatically adjusted temperature coefficient; this keeps exploration going while preserving the current best strategy and producing a better probability distribution over warning actions. The trained SAC model parameters are not distributed directly; they first undergo quantization and knowledge distillation to produce a lightweight model better suited to edge computing environments. This lightweight model significantly reduces the inference computation burden while maintaining the original decision accuracy. It is then distributed over the network to the MEC nodes of each road segment, replacing the old Actor policy network. The system thus completes a full adaptive optimization closed loop of "perception-decision-feedback-archiving-training-redeployment," enabling the warning strategy to evolve continuously with the traffic environment.
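Balanced sampling from the two buffers can be sketched as below. The function name, the default 50% failure share, and sampling with replacement (so that a small failure buffer can still fill its share) are assumptions; the description specifies only that the sampling proportion is balanced and that failure samples are deliberately given more weight.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def balanced_sample(success: Sequence[T],
                    failure: Sequence[T],
                    batch_size: int,
                    failure_fraction: float = 0.5,
                    rng: Optional[random.Random] = None) -> List[T]:
    """Draw a batch with a fixed share of failure samples.

    Sampling is with replacement, so scarce failure samples are
    over-represented relative to their share of the raw data.
    """
    if not success or not failure:
        raise ValueError("both buffers must be non-empty")
    if not 0.0 <= failure_fraction <= 1.0:
        raise ValueError("failure_fraction must be in [0, 1]")
    rng = rng or random.Random()
    n_failure = round(batch_size * failure_fraction)
    n_success = batch_size - n_failure
    batch = rng.choices(failure, k=n_failure) + rng.choices(success, k=n_success)
    rng.shuffle(batch)  # avoid ordering effects within the batch
    return batch
```

With 90 success samples and 10 failure samples, a batch of 32 drawn this way contains 16 failures, compared with about 3 under uniform sampling from the pooled data.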
[0127] In a preferred embodiment of the present invention, the SAC deep reinforcement learning model is used to generate the most suitable strategy content for the vehicle to be warned based on 32 dimensions such as different traffic conditions, vehicle type, weather, and time of day, thus realizing the adaptive construction of the warning strategy. By constructing a dual-model driven architecture of "lightweight decision tree triggering and lightweight SAC execution", the decision tree is used as a warning trigger switch to quickly calculate the overall risk of the merging area and activate the SAC model, thereby significantly reducing the system load while ensuring decision accuracy. A safety control closed loop based on "driver policy-feedback-re-decision" is introduced. By constructing a five-dimensional reward function and a dual experience buffer training mechanism, global collaborative optimization at the traffic flow level and stability improvement under extreme conditions are achieved.
[0128] In a preferred embodiment of the present invention, an adaptive early warning strategy is constructed based on the SAC (Soft Actor-Critic) deep reinforcement learning model using real-time perception data. The SAC model is an algorithm combining deep learning (DL) and reinforcement learning (RL) based on maximum entropy theory. Its core advantage lies in maximizing the policy entropy while maximizing the accumulated expected return. The SAC model introduces an entropy regularization mechanism, which endows the policy with strong exploratory capabilities and stability, enabling it to avoid getting trapped in local optima and maintain policy stability when facing complex, high-dimensional, random, and dynamically changing environments. On the one hand, the "off-policy" characteristic of the SAC model supports efficient learning using historical experience replay pools, greatly improving sample utilization and solving the training problem caused by the scarcity of high-risk samples in traffic scenarios. On the other hand, its entropy-based flexible decision-making mechanism can naturally adapt to various complex interference factors, ensuring that the generated early warning strategy still possesses extremely high generalization ability and safety redundancy under various extreme conditions.
[0129] In a preferred embodiment of the present invention, a SAC deep reinforcement learning model is used to construct the early warning strategy. The process is as follows: a 32-dimensional traffic feature vector, including vehicle type, historical driving style (such as rapid acceleration and deceleration habits), and environmental features such as weather conditions and peak hours, is used as the state input. Leveraging the powerful high-dimensional feature extraction and perception capabilities of the SAC model, the model can keenly capture subtle state differences between "conservative and aggressive drivers" and "sunny days and rainy / foggy days." During inference, the model utilizes the globally optimal strategy learned through the maximum entropy mechanism during the training phase to automatically adjust the parameters of the output action without requiring manual rule resetting. For example, when an increase in the weight of low-visibility environmental features is detected, the model will calculate vehicle deceleration relatively restrainedly; when large trucks or aggressive driving style features are identified, the model will automatically increase the safety margin for lane-changing guidance or raise the warning priority. This mechanism achieves a leap from rigid "static rule execution" to flexible "dynamic scene adaptation," ensuring that the early warning strategy always accurately matches the current traffic flow state and individual characteristics.
[0130] In a preferred embodiment of the present invention, a closed-loop strategy feedback adjustment mechanism based on driver compliance is established. The system uses the ETC gantries and MEC devices deployed continuously upstream of the merging zone in a highway demonstration section to track the behavior data of warned vehicles in real time, calculating their speed compliance and lane change compliance with the previous warning. When it detects that a driver has not fully executed the instruction, the system dynamically adjusts the "historical strategy weights" and "warned vehicle feature weights" at the input of the SAC model, which mathematically strengthens the execution resistance perceived by the model. The warning content at subsequent gantries is adjusted automatically as a result, producing a more actionable warning strategy and achieving a leap from "one-time, one-way notification" to "continuous guidance feedback."
[0131] In a preferred embodiment of the present invention, a low-load, low-latency decision architecture based on a "lightweight decision tree + lightweight SAC model" is designed. A lightweight decision tree with extremely low computational cost uses secondary macroscopic indicators such as road occupancy rate and TTC as a front-end "switch" that decides whether to activate the warning, filtering out the vast majority of risk-free periods. Only when the decision tree determines that there is a potential risk in the current merging zone is the subsequent lightweight SAC deep reinforcement learning model activated to generate the strategy. The lightweight SAC model, obtained from the original pre-trained SAC model through quantization and distillation, compresses the inference time to less than 30 milliseconds, and its resource requirements are low enough for the MEC to generate warning strategies concurrently for up to sixteen vehicles. This dual-model architecture avoids full-time, high-load deep inference for all passing vehicles, significantly reducing the resident computing power requirements of the MEC, keeping the warning latency within a reasonable range during peak traffic periods, and effectively solving the real-time warning problem.
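The gating role of the decision tree can be sketched with a small hand-written tree. All thresholds below are hypothetical placeholders (the description gives no split values), and a deployed tree would be learned from data; the sketch shows only the control flow in which the per-vehicle policy runs solely when the tree flags a risk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ZoneState:
    occupancy: float   # road occupancy rate, 0 to 1
    min_ttc_s: float   # minimum time to collision, in seconds


# Hypothetical split values, for illustration only.
TTC_CRITICAL_S = 2.0
TTC_CAUTION_S = 4.0
OCCUPANCY_HIGH = 0.6


def should_trigger_warning(zone: ZoneState) -> bool:
    """Binary output of a tiny hand-written decision tree."""
    if zone.min_ttc_s < TTC_CRITICAL_S:
        return True
    if zone.min_ttc_s < TTC_CAUTION_S:
        return zone.occupancy >= OCCUPANCY_HIGH
    return False


Policy = Callable[[List[float]], Tuple[int, int, int]]


def warn_vehicles(zone: ZoneState,
                  vehicle_states: Dict[str, List[float]],
                  policy: Policy) -> Dict[str, Tuple[int, int, int]]:
    """Run the (expensive) policy only when the gate is open."""
    if not should_trigger_warning(zone):
        return {}
    return {vid: policy(state) for vid, state in vehicle_states.items()}
```

Because the gate is evaluated once per zone state rather than once per vehicle, risk-free periods cost only a few comparisons regardless of traffic volume.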
[0132] In a preferred embodiment of the present invention, a post-training mechanism based on a five-dimensional reward function and a dual experience buffer is constructed. This mechanism includes a five-dimensional reward function encompassing safety, traffic efficiency, driving comfort, traffic flow stability, and policy compliance. This guides the model to balance individual vehicle interests with global traffic flow stability during the learning process, avoiding traffic disturbances caused by self-serving behavior. Simultaneously, the cloud training stage employs a "guided success / failure dual experience buffer" and a "proportional balanced sampling" mechanism. This artificially increases the sampling ratio of scarce high-risk (failure) samples during training, preventing the model from losing sensitivity due to a large number of mediocre safe samples. This significantly enhances the SAC model's decision-making ability when handling extremely dangerous conditions.
[0133] The embodiments of the present invention have been described above with reference to the accompanying drawings. However, the present invention is not limited to the specific embodiments described above. The specific embodiments described above are merely illustrative and not restrictive. Those skilled in the art can make many other forms under the guidance of the present invention without departing from the spirit and scope of the claims. All of these forms are within the protection scope of the present invention.
Claims
1. A continuous ETC collaborative adaptive risk early warning method for highway merging zones based on SAC-DRL, characterized in that it includes the following steps: S1. Vehicle data upstream and downstream of the highway merging zone are collected through transactions between the vehicle OBU devices and the edge computing nodes MEC set on the ETC gantries upstream and downstream of the merging zone. S2. Determine the timing of the warning. Using all vehicle data in the area collected in step S1 as input, the edge computing node MEC outputs a binary judgment, based on a lightweight decision tree model, of whether or not to trigger the warning. S3. Formulate the warning information content. When the warning is triggered, the SAC deep reinforcement learning model is run independently for each vehicle that passes through the ETC gantry on the upstream main road of the merging area. The edge computing node MEC uses the vehicle data collected in step S1 as input to generate and encode risk warning content, and then stores the warning content in the RSU device of the ETC gantry. S4. Issue initial warning information. Based on the warning timing determined in step S2 and the warning information formulated in step S3, voice warning information is issued to the smart OBUs of all vehicles passing through the ETC gantry cluster. S5. Continuously optimize warning issuance. When a vehicle passes through a subsequent gantry on the upstream main road of the merging zone, the system continues to collect data on all vehicles in the merging zone. By directly adjusting the input parameters and weights of the SAC deep reinforcement learning model, the system adjusts the previous warning strategy in real time and issues the adjusted warning content before the vehicle leaves the gantry. S6. Terminate warning issuance and train the model offline. When a vehicle successfully passes through the last ETC gantry on the upstream main road of the merging zone and completes the transaction, the system determines that the closed loop has ended. The complete interaction record of the vehicle is archived and added to the experience pool for cloud-based offline training, which continuously adjusts the model parameters.
2. The method for continuous ETC collaborative adaptive risk early warning in highway merging areas based on SAC-DRL according to claim 1, characterized in that, in step S1, the intelligent OBU device supports DSRC and 4G / 5G dual-mode communication. When the vehicle is within 300 meters of the first upstream gantry, the edge computing node MEC sends a data collection command to the vehicle-mounted intelligent OBU device at regular intervals through the 5G channel. The collected data include timestamp, instantaneous speed, position coordinates, vehicle type data, three-dimensional acceleration, and heading angle. The vehicle-mounted intelligent OBU uploads the collected data to the edge computing node MEC through the 5G channel.
3. The method for continuous ETC collaborative adaptive risk early warning in highway merging areas based on SAC-DRL according to claim 1, characterized in that, in step S2, after vehicle data collection, a thirteen-dimensional key feature vector is extracted: a thirteen-digit timestamp, instantaneous velocity, WGS84 coordinates, vehicle type, acceleration, and heading angle. The thirteen-dimensional key feature vectors are divided into three clusters according to the three collection sections (upstream of the main road, downstream of the main road, and ramps), and data aggregation is performed on each cluster. The aggregated data types for the main road upstream include: road occupancy rate, total number of vehicles, average vehicle speed, speed standard deviation, high-risk vehicle ratio, density of rapidly accelerating / decelerating vehicles, minimum time to collision, and queue length. The aggregated data types for the ramp include: inflow speed and minimum inflow gap. The aggregated data types for the main road downstream include: average vehicle speed, speed standard deviation, high-risk vehicle ratio, and density of rapidly accelerating / decelerating vehicles. The global environmental factors include: weather, peak hours, and peak dates.
4. The method for continuous ETC collaborative adaptive risk early warning in highway merging areas based on SAC-DRL according to claim 1, characterized in that, In step S3, warning information content is generated based on the SAC deep reinforcement learning model. The warning information content includes risk warnings, recommended speed limits and lane change suggestions, and differentiated strategies are generated in combination with different vehicle types. The SAC deep reinforcement learning model is a discrete SAC model with an improved architecture for traffic scenarios. The Actor network of the discrete SAC changes from outputting action distributions to outputting discrete action probabilities to reduce computational complexity. A weight adjustment mechanism is introduced to automatically adjust the weights of each vector based on the compliance with the previous warning during subsequent warnings, improving information coherence and warning effectiveness. A priority experience replay mechanism is introduced to maintain a balanced distribution of successful and unsuccessful warning guidance experiences in the training experience pool. To enable the SAC deep reinforcement learning model to be deployed on edge computing nodes (MECs) and achieve multi-threaded parallel generation of warning content within 30ms, the model is pre-trained in a cloud information processing center, quantized and distilled, and then deployed to the MEC. Information is periodically uploaded and injected into the experience pool located in the cloud information processing center.
5. The method for continuous ETC collaborative adaptive risk early warning in highway merging areas based on SAC-DRL according to claim 4, characterized in that the data preprocessing process of the SAC deep reinforcement learning model uses the data of the vehicle being warned collected in step S1 to extract the thirteen-dimensional key feature vector of the vehicle driving state, and uses the data aggregation method of step S2 to obtain the ten-dimensional secondary traffic feature vector of the overall traffic flow state of the merging area; combined with the three-dimensional global feature vector built into the edge computing node (MEC) and the six-dimensional parameters of the previous warning, a thirty-two-dimensional traffic feature vector is obtained.
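The dimension count in claim 5 (13 + 10 + 3 + 6 = 32) can be checked with a short sketch that concatenates the four component vectors. The function and constant names are assumptions for illustration, and the order of concatenation is not specified by the claim.

```python
VEHICLE_DIM = 13        # key feature vector of the vehicle driving state
TRAFFIC_DIM = 10        # secondary traffic feature vector of the merging area
GLOBAL_DIM = 3          # global feature vector held by the MEC
PREV_WARNING_DIM = 6    # parameters of the previous warning
STATE_DIM = VEHICLE_DIM + TRAFFIC_DIM + GLOBAL_DIM + PREV_WARNING_DIM


def build_state(vehicle, traffic, global_env, prev_warning):
    """Concatenate the four component vectors after checking their lengths."""
    parts = [
        ("vehicle", vehicle, VEHICLE_DIM),
        ("traffic", traffic, TRAFFIC_DIM),
        ("global_env", global_env, GLOBAL_DIM),
        ("prev_warning", prev_warning, PREV_WARNING_DIM),
    ]
    state = []
    for name, values, dim in parts:
        if len(values) != dim:
            raise ValueError(f"{name} must have {dim} entries, got {len(values)}")
        state.extend(float(v) for v in values)
    return state


# Placeholder zeros; real inputs would come from steps S1 and S2.
state = build_state([0] * 13, [0] * 10, [0] * 3, [0] * 6)
```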
6. The method for continuous ETC collaborative adaptive risk early warning in highway merging areas based on SAC-DRL according to claim 1, characterized in that, in step S4, when the vehicle is within 30 meters of the ETC gantry, the intelligent RSU device sends a warning command to the on-board intelligent OBU device over the DSRC channel; the warning information uses the structured data packet format required by the DSRC technical standard; the warning information content includes a speech synthesis command, which the OBU's built-in TTS engine converts into natural speech in real time.
7. The method for continuous ETC collaborative adaptive risk early warning in highway merging areas based on SAC-DRL according to claim 6, characterized in that the edge computing node (MEC) pushes, in real time over a wired fiber-optic network, the encoded six-dimensional early warning strategy parameter data packet to the RSU device of the ETC gantry upstream of the merging zone that the vehicle is about to pass; the RSU device inserts the data packet into the vehicle's regular transaction and waits for the target vehicle to trigger it; when the target vehicle comes within 30 meters of the ETC gantry, the on-board intelligent OBU device and the roadside RSU device automatically complete handshake authentication; after receiving the data packet, the on-board intelligent OBU uses its built-in lightweight parsing engine to generate the corresponding natural-language text and calls the TTS engine to broadcast it.
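The encode, transmit, and parse path of claim 7 might look like the following sketch. The claims do not define the six parameters or the byte layout, so everything here is hypothetical: the packet is assumed to be six little-endian 32-bit floats, and only the first three slots are given an invented meaning (risk level, advised speed in km/h, lane-change flag). It does not reproduce any actual DSRC frame format.

```python
import struct

PACKET_FORMAT = "<6f"  # assumed layout: six little-endian float32 values


def encode_warning(params):
    """MEC side: pack the six strategy parameters into a fixed-size payload."""
    if len(params) != 6:
        raise ValueError("expected exactly six strategy parameters")
    return struct.pack(PACKET_FORMAT, *params)


def decode_warning(packet):
    """OBU side: unpack the payload back into six floats."""
    return list(struct.unpack(PACKET_FORMAT, packet))


def render_text(params):
    """OBU side: turn parameters into text for the TTS engine (invented semantics)."""
    risk_level, speed_kmh, lane_change = params[0], params[1], params[2]
    parts = [f"Risk level {int(risk_level)} ahead."]
    if speed_kmh > 0:
        parts.append(f"Advised speed {int(speed_kmh)} kilometers per hour.")
    if lane_change >= 0.5:
        parts.append("Change lanes when safe.")
    return " ".join(parts)


packet = encode_warning([2.0, 80.0, 1.0, 0.0, 0.0, 0.0])
text = render_text(decode_warning(packet))
```

A fixed 24-byte payload keeps parsing on the OBU trivial, which is in the spirit of the lightweight parsing engine the claim describes.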
8. The method for continuous ETC collaborative adaptive risk early warning in highway merging areas based on SAC-DRL according to claim 1, characterized in that, in step S5, when the vehicle being warned has left the previous warning gantry and is within the MEC communication range of the subsequent gantry, the data acquisition operation of step S1 is performed, and the input parameters and weights of the SAC deep reinforcement learning model are updated to generate the adjusted warning content; the MEC then completes the subsequent broadcast by sending the warning command as in step S4; if the vehicle was determined not to trigger a warning at the previous gantry, the decision tree is run again to determine whether to issue a warning when the vehicle passes the subsequent gantry.
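The branching in claim 8 can be summarized in a small control-flow sketch. The function name and both callables are placeholders: `decision_tree_says_warn` stands in for the lightweight decision tree and `generate_warning` for the SAC model call, neither of which is implemented here.

```python
def follow_up_at_next_gantry(warned_before, compliance,
                             decision_tree_says_warn, generate_warning):
    """Decide what to broadcast at the subsequent gantry.

    warned_before: whether a warning was issued at the previous gantry.
    compliance: observed compliance with that warning (ignored if none was issued).
    """
    if warned_before:
        # Re-run the model with inputs updated by the observed compliance.
        return generate_warning(compliance)
    if decision_tree_says_warn():
        # First warning for this vehicle: no previous compliance to feed back.
        return generate_warning(None)
    return None  # still no warning needed


# Placeholder callables for illustration only.
result = follow_up_at_next_gantry(
    warned_before=True,
    compliance=0.4,
    decision_tree_says_warn=lambda: False,
    generate_warning=lambda c: {"previous_compliance": c},
)
```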
9. The method for continuous ETC collaborative adaptive risk early warning in highway merging areas based on SAC-DRL according to claim 1, characterized in that, in step S6, the interaction record archiving mechanism clusters the data according to whether the guidance was successful, uploads the interaction records to the cloud information processing center, and periodically samples them to trigger the learning process of the SAC deep reinforcement learning model.
10. The method for continuous ETC collaborative adaptive risk early warning in highway merging areas based on SAC-DRL according to claim 9, characterized in that the information recorded in the interaction record archive includes the parameters of previous warning strategies related to the vehicle, the degree of vehicle compliance, the overall traffic flow in the merging zone during that time period, and environmental parameters.
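The archive of claims 9 and 10 can be sketched as a record type plus a sampler that draws evenly from successful and unsuccessful guidance. This is plain stratified sampling, used here as a simplified stand-in for the prioritized experience replay of claim 4; the `InteractionRecord` field names and the `balanced_sample` helper are assumptions for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class InteractionRecord:
    prev_warning_params: list   # parameters of previous warning strategies
    compliance: float           # degree of vehicle compliance
    traffic_state: list         # overall traffic flow in the merging zone
    environment: list           # environmental parameters
    guided_ok: bool             # whether the guidance was successful


def balanced_sample(records, batch_size, rng):
    """Sample a batch with as even a success/failure split as the data allows."""
    ok = [r for r in records if r.guided_ok]
    bad = [r for r in records if not r.guided_ok]
    n_ok = min(batch_size // 2, len(ok))
    n_bad = min(batch_size - n_ok, len(bad))
    # If one group is short, top the batch up from the other group.
    n_ok = min(batch_size - n_bad, len(ok))
    batch = rng.sample(ok, n_ok) + rng.sample(bad, n_bad)
    rng.shuffle(batch)
    return batch


records = (
    [InteractionRecord([0.0] * 6, 1.0, [], [], True) for _ in range(6)]
    + [InteractionRecord([0.0] * 6, 0.0, [], [], False) for _ in range(6)]
)
batch = balanced_sample(records, 4, random.Random(1))
```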