A high-value scene generation method and system for automatic driving system testing
By systematically identifying and structurally representing high-value scenarios, and utilizing Frenet feature extraction and diffusion models to generate vehicle behavior, the shortcomings in scenario generation in autonomous driving system testing are addressed, thereby improving test coverage and efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- TONGJI UNIV
- Filing Date
- 2026-03-26
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies struggle to generate realistic, dangerous, and representative high-value scenarios in autonomous driving system testing, resulting in insufficient test coverage and limited efficiency.
By systematically identifying and structurally representing high-value scenarios, vehicle behavior is generated using Frenet feature extraction and diffusion models, and scene feature embedding is combined with map encoder and trajectory encoder to guide the generation of high-risk test scenarios.
It enables the generation of realistic, dangerous, and representative test scenarios, improving the efficiency and coverage of autonomous driving system verification.
Figure CN122240495A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the fields of autonomous driving system testing and intelligent transportation system technology, and in particular to a method and system for generating high-value scenarios for autonomous driving system testing.
Background Technology
[0002] The operational safety of autonomous driving systems in open, complex traffic environments is still not sufficiently assured, and reliable, efficient system testing is a prerequisite for their large-scale deployment. As vehicle-road-cloud integrated data collection has improved, a large number of real driving scenarios have been recorded, laying the foundation for a data-driven testing paradigm. However, because driving data follow a long-tail distribution, the recorded scenarios are mostly low-risk and homogeneous, while complex and dangerous scenarios that can truly expose system defects are extremely rare. As a result, testing has long remained focused on simple situations that the system handles easily, which leads to insufficient coverage and limited efficiency.
[0003] To make testing more targeted, existing research has mined high-value scenarios such as cut-in, car-following, abnormal operations, and multi-vehicle interactions from real-world driving data, extracting key segments of specific types or under specific road conditions. However, because these scenarios are scarce, unevenly distributed, and highly varied in type, data mining alone cannot meet testing needs in terms of scale and diversity. To address this, researchers have further employed scenario generation techniques, expanding test scenarios through parametric modeling or deep generative models. The former relies on manual design, which limits realism and coverage; the latter stays closer to the real-world distribution but, when high-value samples are scarce, tends to favor conventional, safe scenarios, making it difficult to guarantee complexity and realism at the same time.
[0004] In summary, the high-value scenarios discovered by mining are authentic and reliable, but their number is limited and their coverage insufficient; generated scenarios are diverse, but the plausibility of their high-value attributes is hard to ensure. An urgent technical problem is therefore how to extract high-value scenarios structurally from real-world data and use them as the basis and constraint for generation, expanding them into realistic and challenging test scenarios that provide high-quality data support for autonomous driving system testing.
Summary of the Invention
[0005] This application provides a method and system for generating high-value scenarios for autonomous driving system testing. By systematically identifying, structurally representing, and generating high-value scenarios using a feature-driven approach, a closed loop is formed between real-world data mining and generation expansion. This provides autonomous driving systems with realistic, dangerous, and representative test scenarios, improving the efficiency and coverage of system verification.
[0006] To address the aforementioned technical problems, in a first aspect, embodiments of this application provide a method for generating high-value scenarios for testing autonomous driving systems, comprising the following steps: first, lanes are obtained from the original trajectories and lane assignment is performed; then, Frenet features are extracted based on the lane assignment; next, the relative motion relationships and kinematic parameters between vehicles are obtained from the extracted Frenet features; next, high-value scenarios are obtained through scene segmentation based on these relative motion relationships and kinematic parameters, the high-value scenarios including cut-in, car-following, single-vehicle poor driving behavior, and complex multi-vehicle interaction behavior; then, the trajectories and maps corresponding to the identified high-value scenarios are saved as high-value scenario samples, from which a high-value feature distribution is obtained; next, the map structure, lane geometry information, traffic participant trajectories, and interaction relationships are uniformly encoded into a high-dimensional scene representation; finally, a diffusion model is used to model natural driving scenarios, taking the scene feature embedding jointly output by the map encoder and the trajectory encoder as the generation condition and using the high-value feature distribution together with a guiding function to constrain the diffusion denoising direction, so as to generate vehicle behavior.
[0007] In some exemplary embodiments, obtaining lanes from the original trajectories and performing lane assignment includes: assigning the vehicle's position in the world coordinate system to the lane to which the vehicle belongs based on map lane centerline information, the map lane centerline information including position, heading, and speed limit.
[0008] In some exemplary embodiments, Frenet feature extraction is performed based on the lane assignment, including: after completing the vehicle lane assignment, constructing a Frenet coordinate system with the center line of the lane to which the vehicle belongs as a reference curve, characterizing the motion state of the vehicle relative to the lane, and obtaining Frenet features.
[0009] In some exemplary embodiments, the process of obtaining Frenet features includes: projecting the vehicle's position in the global coordinate system onto the centerline of its lane, defining the cumulative arc length of the projection point along the lane centerline as the vehicle's longitudinal coordinate, and defining the perpendicular distance between the vehicle's position and the projection point as the lateral offset, the sign of the lateral offset being determined by the normal direction of the lane centerline; in the Frenet coordinate system, decomposing the vehicle's global velocity vector into a longitudinal velocity along the lane tangent and a lateral velocity along the normal; and calculating the vehicle's longitudinal acceleration from the difference between the longitudinal velocities in adjacent time frames, thereby obtaining the Frenet features.
[0010] In some exemplary embodiments, Frenet features include the vehicle's longitudinal coordinates, lateral offset, longitudinal velocity, lateral velocity, and longitudinal acceleration, used to characterize the vehicle's motion along the lane direction.
[0011] In some exemplary embodiments, based on the extracted Frenet features, the relative motion relationships and kinematic parameters between vehicles are obtained, including: after completing lane assignment and Frenet feature extraction, calculating the relative motion relationships between different vehicles at the same time stamp; the relative motion relationships between different vehicles include the longitudinal relative distance, lateral relative distance, and relative speed between vehicles; wherein, the longitudinal relative distance and lateral relative distance are used to describe the spatial proximity of two vehicles, and the relative speed is used to describe the speed difference and approach trend of two vehicles; the kinematic parameters are a set of parameters used to describe the motion state of the vehicle itself and the intensity of its changes, including vehicle position, heading, speed, and longitudinal acceleration.
[0012] In some exemplary embodiments, high-value scenarios are obtained by segmenting the scene based on the relative motion relationship between vehicles and kinematic parameters. This includes: extracting the trajectory start point, end point and average heading of each vehicle as clustering features, and using the DBSCAN clustering algorithm to segment the vehicle set, with each vehicle set constituting a behaviorally independent sub-scene; after completing the sub-scene segmentation, identifying high-risk behaviors within each sub-scene to obtain high-value scenarios.
[0013] In some exemplary embodiments, based on the high-value scene samples, a high-value feature distribution is obtained, including: performing joint distribution modeling on the kinematic features of the high-value samples to characterize the interaction structure strength and provide statistical basis for risk areas and interaction constraints in the generation stage; the kinematic features include vehicle speed, vehicle distance, acceleration, and headway; and storing the statistical kinematic features as a high-value feature distribution.
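The text says the kinematic features of the high-value samples are modeled jointly but does not name the distribution family. As a minimal sketch, assuming a multivariate Gaussian over the four features (speed, gap, acceleration, headway), the stored high-value feature distribution reduces to a mean vector and a covariance matrix. The feature order and function name below are illustrative, not from the source.

```python
from typing import List, Sequence, Tuple

# Assumed feature order (illustrative): speed (m/s), gap (m), acceleration (m/s^2), headway (s)
FEATURE_NAMES = ("speed", "gap", "accel", "headway")

def fit_feature_distribution(
    samples: Sequence[Sequence[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Fit a multivariate Gaussian (mean, unbiased sample covariance) to high-value samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are needed to estimate a covariance")
    dim = len(samples[0])
    mean = [sum(row[j] for row in samples) / n for j in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for row in samples:
        centered = [row[j] - mean[j] for j in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += centered[i] * centered[j]
    cov = [[value / (n - 1) for value in cov_row] for cov_row in cov]
    return mean, cov
```

The off-diagonal covariance terms are what carry the "joint" structure, for example whether short gaps co-occur with high speeds in the mined samples; a per-feature histogram would lose that.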
[0014] In some exemplary embodiments, the scene feature embedding is the joint output of the map encoder and the trajectory encoder, represented as $S \in \mathbb{R}^{32 \times 256}$, where 32 is the number of coding units in the scene and 256 is the scene feature dimension.
[0015] In a second aspect, this application also provides a high-value scenario generation system for autonomous driving system testing. The system implements the high-value scenario generation method described in the above embodiments and comprises, connected in sequence, a data processing and behavior recognition module, a high-value scenario sample and high-value feature distribution module, and an interaction encoding module. The data processing and behavior recognition module is used to obtain lanes from the original trajectories and perform lane assignment; to extract Frenet features based on the lane assignment; to obtain the relative motion relationships and kinematic parameters between vehicles from the extracted Frenet features; and to segment the scene based on these relationships and parameters to obtain high-value scenarios, including cut-in, car-following, single-vehicle poor driving behavior, and complex multi-vehicle interaction behavior. The high-value scenario sample and high-value feature distribution module is used to save the trajectories and maps corresponding to the identified high-value scenarios as high-value scenario samples and to obtain the high-value feature distribution from those samples. The interaction encoding module is used to uniformly encode the map structure, lane geometry information, traffic participant trajectories, and interaction relationships into a high-dimensional scenario representation. A diffusion model is used to model natural driving scenarios, taking the scenario feature embedding jointly output by the map encoder and trajectory encoder as the generation condition and using the high-value feature distribution and a guiding function to constrain the diffusion denoising direction, so as to generate vehicle behavior.
[0016] The technical solution provided in this application has at least the following advantages. This application provides a method and system for generating high-value scenarios for testing autonomous driving systems. The method comprises the steps set out in the first aspect: lane assignment from the original trajectories; Frenet feature extraction; computation of the relative motion relationships and kinematic parameters between vehicles; scene segmentation to identify high-value scenarios (cut-in, car-following, single-vehicle poor driving behavior, and complex multi-vehicle interaction behavior); storage of the corresponding trajectories and maps as high-value scenario samples; estimation of a high-value feature distribution; unified encoding of the map structure, lane geometry, traffic participant trajectories, and interaction relationships into a high-dimensional scene representation; and generation of vehicle behavior with a diffusion model conditioned on the scene feature embedding jointly output by the map encoder and trajectory encoder, with the denoising direction constrained by the high-value feature distribution and a guiding function. In this way, high-value traffic scenarios are mined automatically from real driving data, and high-risk, high-fidelity test scenarios are generated from the mining results with a diffusion model. By systematically identifying, structurally representing, and generating high-value scenarios with a feature-driven approach, the method closes the loop between real-world data mining and generative expansion, provides realistic, dangerous, and representative test scenarios for autonomous driving systems, and improves the efficiency and coverage of system verification.
Description of the Drawings
[0017] One or more embodiments are illustrated by way of example with reference to the accompanying drawings. These illustrations do not limit the embodiments, and unless otherwise stated, the figures are not drawn to scale.
[0018] Figure 1 is a schematic diagram of the architecture of the method for generating high-value scenarios for testing autonomous driving systems provided in one embodiment of this application.
Detailed Implementation
[0019] As noted in the background, existing technologies typically expand test scenarios through parametric modeling or deep generative models. The former relies on manual design and has limited realism and coverage; the latter can closely approximate the real distribution, but when high-value samples are scarce its outputs tend to be biased toward conventional, safe scenarios, making it difficult to guarantee complexity and realism at the same time.
[0020] To address the aforementioned technical issues, this application provides a method and system for generating high-value scenarios for autonomous driving system testing. By systematically identifying, structurally representing, and generating high-value scenarios using a feature-driven approach, a closed loop is formed between real-world data mining and generation expansion. This provides autonomous driving systems with realistic, dangerous, and representative test scenarios, improving the efficiency and coverage of system verification.
[0021] The embodiments of this application are described in detail below with reference to the accompanying drawings. Those skilled in the art will understand that many technical details are provided in the embodiments to aid understanding of the application; however, the technical solutions claimed in this application can be implemented even without these technical details and with various changes and modifications based on the following embodiments.
[0022] Referring to Figure 1, this application provides a method for generating high-value scenarios for testing autonomous driving systems, comprising the following steps: first, lanes are obtained from the original trajectories and lane assignment is performed; then, Frenet features are extracted based on the lane assignment; next, the relative motion relationships and kinematic parameters between vehicles are obtained from the extracted Frenet features; then, high-value scenarios, including cut-in, car-following, single-vehicle poor driving behavior, and complex multi-vehicle interaction behavior, are obtained by scene segmentation based on these relationships and parameters; then, the trajectories and maps corresponding to the identified high-value scenarios are saved as high-value scenario samples, from which a high-value feature distribution is obtained; next, the map structure, lane geometry information, traffic participant trajectories, and interaction relationships are uniformly encoded into a high-dimensional scene representation; finally, a diffusion model is used to model natural driving scenarios, taking the scene feature embedding S jointly output by the map encoder and trajectory encoder as the generation condition and using the high-value feature distribution and a guiding function to constrain the diffusion denoising direction, so as to generate vehicle behavior.
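The guiding function itself is not defined in the text. One standard way to steer a conditional diffusion sampler toward a target distribution is classifier-style guidance, which shifts the mean of each reverse denoising step along the gradient of a log-density. A sketch under that assumption, where $\mu_\theta$ and $\Sigma_t$ are the learned reverse-step mean and covariance, $S$ is the scene feature embedding, $f(\cdot)$ is a hypothetical map from a noisy trajectory sample $x_t$ to its kinematic features, $p_{\mathrm{hv}}$ is the stored high-value feature distribution, and $\lambda \ge 0$ is a hypothetical guidance scale:

$$
x_{t-1} \sim \mathcal{N}\!\left(\mu_\theta(x_t, t, S) + \lambda\,\Sigma_t\,\nabla_{x_t} \log p_{\mathrm{hv}}\big(f(x_t)\big),\; \Sigma_t\right)
$$

With $\lambda = 0$ this is ordinary conditional sampling given $S$; larger $\lambda$ pulls each step more strongly toward feature values that $p_{\mathrm{hv}}$ rates as likely.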
[0023] Specifically, the high-value scenario generation method for testing autonomous driving systems provided in this application is divided into the following three stages: data processing and behavior recognition stage, high-value scenario sample and high-value feature distribution stage, and interactive coding stage.
[0024] In the data processing and behavior recognition stage, lanes are first obtained from the original trajectories and lane assignment is performed. Lane assignment is the process of assigning the vehicle's position in the world coordinate system to a specific lane based on map lane centerline information (position, heading, speed limit). For the vehicle's two-dimensional position at time t, the system selects from the candidate lane set the target lane with the highest geometric consistency with the vehicle's current location and associates the vehicle's trajectory with that lane. Geometric consistency is determined by a combination of factors: the minimum distance from the vehicle's location to the lane centerline, whether the projected point falls within the effective range of the lane, and the angle between the vehicle's heading and the lane tangent direction. First, the centerline of each lane is divided into several vector segments, and for each segment the projected distance from the vehicle's current location and the angular deviation between the segment's tangent direction and the vehicle's heading are calculated. A lane is considered a valid candidate when the minimum projected distance from the vehicle to its centerline is no greater than 3.5 m and the angular deviation between the vehicle's heading and the corresponding lane tangent direction is no greater than 60°. When there are multiple valid candidates, the lane with the smallest projected distance is selected. The lane assignment result determines the vehicle's lane and provides the reference centerline for the subsequent Frenet feature calculation.
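The lane-assignment rule above can be sketched as follows, using the 3.5 m and 60° thresholds from the text. Two details are assumptions: the "effective range" check is approximated by clamping the projection onto each segment, and lanes are given as a hypothetical `dict` of centerline polylines with at least two points each.

```python
import math
from typing import Dict, List, Optional, Tuple

MAX_DIST_M = 3.5       # max projected distance to the centerline (from the text)
MAX_ANGLE_DEG = 60.0   # max heading deviation from the lane tangent (from the text)

Point = Tuple[float, float]

def project_to_segment(px: float, py: float, a: Point, b: Point) -> Tuple[float, float]:
    """Distance from P to segment AB (projection clamped to the segment) and the segment heading."""
    (ax, ay), (bx, by) = a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    t = 0.0 if seg_len2 == 0 else ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))  # keep the projected point inside the segment
    qx, qy = ax + t * dx, ay + t * dy
    return math.hypot(px - qx, py - qy), math.atan2(dy, dx)

def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in radians."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)

def assign_lane(x: float, y: float, heading: float,
                lanes: Dict[str, List[Point]]) -> Optional[str]:
    """Return the id of the valid candidate lane with the smallest projected distance, or None."""
    best_id, best_dist = None, math.inf
    for lane_id, pts in lanes.items():
        # Nearest segment of this lane's centerline and its tangent heading.
        dist, tangent = min(project_to_segment(x, y, a, b) for a, b in zip(pts, pts[1:]))
        if dist > MAX_DIST_M:
            continue
        if math.degrees(angle_diff(heading, tangent)) > MAX_ANGLE_DEG:
            continue  # e.g. the opposite-direction lane
        if dist < best_dist:
            best_id, best_dist = lane_id, dist
    return best_id
```

The heading test is what keeps a vehicle from being snapped onto a nearby opposite-direction lane: for a car at (5, 1) heading east, a westbound centerline at y = 0.8 is closer than the eastbound one at y = 0, but its tangent differs by 180°.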
[0025] Scene segmentation begins with Frenet feature extraction based on the lane assignment. Frenet features describe a vehicle's motion relative to its lane, using the assigned lane's centerline as the reference curve. Specifically, the vehicle's position in the global coordinate system is projected onto the lane centerline; the cumulative arc length of the projection point along the centerline is defined as the vehicle's longitudinal coordinate *s*, and the perpendicular distance between the vehicle's position and the projection point is defined as the lateral offset *d*, whose sign is determined by the normal direction of the lane centerline. In the Frenet coordinate system, the vehicle's global velocity vector is decomposed into a longitudinal velocity along the lane tangent and a lateral velocity along the normal. The vehicle's longitudinal acceleration is calculated from the difference between the longitudinal velocities in adjacent time frames. The resulting Frenet features comprise the longitudinal position *s*, lateral offset *d*, longitudinal velocity, lateral velocity, and longitudinal acceleration, which characterize the vehicle's motion along the lane direction.
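The Frenet projection described above can be sketched for a polyline centerline as below. The sign convention (positive *d* to the left of the direction of travel) and setting the first frame's acceleration to 0.0 are assumptions; the text only says the sign follows the centerline normal and that acceleration comes from adjacent-frame differences.

```python
import math
from typing import List, Sequence, Tuple

def to_frenet(x: float, y: float, vx: float, vy: float,
              centerline: Sequence[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Project a global position and velocity onto a polyline lane centerline.

    Returns (s, d, v_lon, v_lat). Assumed sign convention: d > 0 and v_lat > 0
    point to the left of the lane's direction of travel.
    """
    best = None        # (distance, s, tx, ty, qx, qy) for the nearest segment
    s_start = 0.0      # arc length at the start of the current segment
    for (ax, ay), (bx, by) in zip(centerline, centerline[1:]):
        seg_len = math.hypot(bx - ax, by - ay)
        if seg_len > 0:
            tx, ty = (bx - ax) / seg_len, (by - ay) / seg_len          # unit tangent
            along = max(0.0, min(seg_len, (x - ax) * tx + (y - ay) * ty))
            qx, qy = ax + along * tx, ay + along * ty                   # projection point
            dist = math.hypot(x - qx, y - qy)
            if best is None or dist < best[0]:
                best = (dist, s_start + along, tx, ty, qx, qy)
        s_start += seg_len
    if best is None:
        raise ValueError("centerline needs at least one non-degenerate segment")
    _, s, tx, ty, qx, qy = best
    nx, ny = -ty, tx                                   # left-pointing unit normal
    d = (x - qx) * nx + (y - qy) * ny                  # signed lateral offset
    return s, d, vx * tx + vy * ty, vx * nx + vy * ny

def longitudinal_accel(v_lon: List[float], dt: float) -> List[float]:
    """Backward difference of longitudinal speed between adjacent frames.
    The first frame has no predecessor and is set to 0.0 here (an assumption)."""
    return [0.0] + [(v1 - v0) / dt for v0, v1 in zip(v_lon, v_lon[1:])]
```

For example, on the centerline [(0, 0), (10, 0), (10, 10)], a vehicle at (11, 5) projects onto the second segment at s = 15 with d = -1, since it sits to the right of a lane running in the +y direction.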
[0026] Based on the extracted Frenet features, the relative motion relationships and kinematic parameters between vehicles are obtained. Subsequently, the trajectory start point, end point, and average heading are extracted for each vehicle as clustering features, and the DBSCAN clustering algorithm is used to segment the vehicle set (neighborhood radius 0.5, minimum sample size 1). Each vehicle set constitutes a behaviorally independent sub-scene. After completing the sub-scene segmentation, high-risk behaviors are identified within each sub-scene to obtain high-value scenes.
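A minimal sketch of the sub-scene segmentation, with a plain DBSCAN implementation so the example has no dependencies. Assumptions: trajectory frames are hypothetical `(x, y, heading)` tuples, the "average heading" is taken as a circular mean, and the features are used unscaled. Because a 0.5 radius is small relative to positions in metres, the original pipeline presumably normalizes the features in some way the text does not describe.

```python
import math
from typing import List, Optional, Sequence, Tuple

def trajectory_feature(traj: Sequence[Tuple[float, float, float]]) -> Tuple[float, ...]:
    """(x0, y0, x1, y1, mean_heading) from frames of (x, y, heading); circular mean of heading."""
    x0, y0, _ = traj[0]
    x1, y1, _ = traj[-1]
    mean_heading = math.atan2(sum(math.sin(h) for _, _, h in traj),
                              sum(math.cos(h) for _, _, h in traj))
    return (x0, y0, x1, y1, mean_heading)

def dbscan(points: Sequence[Sequence[float]], eps: float = 0.5,
           min_samples: int = 1) -> List[int]:
    """Plain DBSCAN with Euclidean distance; label -1 marks noise.
    min_samples counts the point itself, so min_samples=1 leaves no noise points."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n

    def neighbors(i: int) -> List[int]:
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1          # noise (may later become a border point)
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:   # core point: keep expanding
                queue.extend(j_neighbors)
    return labels
```

With the stated parameters (radius 0.5, minimum sample size 1) every vehicle is a core point, so DBSCAN reduces to grouping vehicles into connected components of the 0.5-radius neighborhood graph, and no vehicle is discarded as noise; each component is one sub-scene.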
[0027] The relative motion relationship between vehicles refers to the calculation of the relative motion relationship between different vehicles at the same time stamp after lane assignment and Frenet feature extraction, including the longitudinal relative distance, lateral relative distance, and relative speed between vehicles. The longitudinal and lateral relative distances describe the spatial proximity of the two vehicles, while the relative speed describes the speed difference and approach trend between them.
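The text does not state which frame these relative quantities are computed in; the cut-in rules in [0033] to [0035] use the target vehicle's own coordinate system, so this sketch rotates world-frame differences into that frame (x forward, y to the left). The `(x, y, heading, vx, vy)` state layout is hypothetical. Differencing Frenet coordinates (Δs, Δd) would be an alternative for vehicles in the same lane.

```python
import math
from typing import Tuple

# Hypothetical state layout: (x, y, heading_rad, vx, vy) in the world frame.
State = Tuple[float, float, float, float, float]

def to_body_frame(dx: float, dy: float, heading: float) -> Tuple[float, float]:
    """Rotate a world-frame vector into a vehicle frame (x forward, y to the left)."""
    c, s = math.cos(heading), math.sin(heading)
    return dx * c + dy * s, -dx * s + dy * c

def relative_motion(target: State, other: State) -> Tuple[float, float, float]:
    """Longitudinal and lateral relative distance of `other` in the target's frame, plus the
    relative longitudinal speed (other minus target, along the target's heading).
    A negative relative speed with a positive longitudinal distance means the gap is closing."""
    tx, ty, th, tvx, tvy = target
    ox, oy, _, ovx, ovy = other
    lon, lat = to_body_frame(ox - tx, oy - ty, th)
    dv_lon, _ = to_body_frame(ovx - tvx, ovy - tvy, th)
    return lon, lat, dv_lon
```

For a target at the origin heading east at 10 m/s and another vehicle at (8, 3) moving east at 7 m/s, this gives a longitudinal distance of 8 m, a lateral distance of 3 m (to the left), and a relative speed of -3 m/s, i.e. the gap is closing.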
[0028] The kinematic parameters are a set of parameters used to describe the vehicle's own motion state and the intensity of its changes, including vehicle position, heading, speed, and longitudinal acceleration.
[0029] The high-value scenarios include cut-in, car-following, single-vehicle poor driving behavior, and complex multi-vehicle interaction behavior.
[0030] (1) Cut-in behavior is identified from the ordering of vehicles along the Frenet coordinates and from changes in short-term velocity features.
[0031] For the speed characteristic, short-term speed refers to the vehicle's speed and its trend within a continuous 5-second time window; monitoring speed within this window identifies sudden acceleration or deceleration. When a vehicle's speed changes by more than 10 m/s² within the window while, at the same time, its position relative to surrounding vehicles changes (moving from the side to the front) and its lane status changes (switching from an adjacent lane to the same lane), the short-term speed change is considered worth analyzing. In this case, the vehicle's relative order along the Frenet longitudinal coordinate s and its lateral offset d, together with the peak lateral speed and any abrupt change in heading angle within the 5-second window, are used to detect a cut-in from an adjacent lane into the target lane. The validity of the cut-in event is judged from the time headway between the target vehicle and the cut-in vehicle, the minimum longitudinal distance, and the duration of the cut-in, so as to filter out high-risk cut-in segments that significantly challenge the system's braking control.
[0032] First, lanes are assigned for all vehicle trajectories, and each vehicle's lane ID over a continuous time period is calculated. When a vehicle's lane ID changes to a lane that is not connected to its previous lane (lanes A and B are connected if A is the successor of B or B is the successor of A), the vehicle is marked as a cut-in candidate and passed to the subsequent cut-in detection process.
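A minimal sketch of the candidate test, assuming per-frame lane IDs (with `None` for unassigned frames) and a hypothetical successor map from the lane topology. Moving along connected lanes, such as from a road link into its successor, is not flagged; a jump to a non-connected lane, such as the neighboring lane, is.

```python
from typing import Dict, Iterable, Optional, Set

def is_cut_in_candidate(lane_ids: Iterable[Optional[str]],
                        successors: Dict[str, Set[str]]) -> bool:
    """True if the per-frame lane sequence ever switches to a lane that is not connected
    to the previous one. Lanes A and B are connected when B follows A or A follows B.
    Frames with no lane assignment (None) are skipped."""
    def connected(a: str, b: str) -> bool:
        return b in successors.get(a, set()) or a in successors.get(b, set())

    previous = None
    for lane in lane_ids:
        if lane is None:
            continue
        if previous is not None and lane != previous and not connected(previous, lane):
            return True
        previous = lane
    return False
```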
[0033] Next, within the synchronized time series of the candidate vehicle and the target vehicle, the position of the candidate vehicle relative to the target vehicle is calculated in the target vehicle's coordinate system. At any moment at which the absolute lateral position of the candidate vehicle is between 2.5 m and 5.0 m and its longitudinal position is within... a distance of between 6.0 m and 10.0 m, the candidate vehicle is judged to have entered the side area of the target vehicle, which triggers further cut-in detection.
[0034] Once these conditions are met, the change in the candidate vehicle's relative position over time is analyzed to determine whether it enters the area in front of the target vehicle within 5 seconds. When the longitudinal distance between the cut-in vehicle and the target vehicle is no more than 12.5 m and the absolute lateral offset is no more than 1.75 m, the cut-in vehicle is judged to have moved from the adjacent lane to the front of the target vehicle in the target lane, confirming a cut-in from the adjacent lane into the target lane.
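The side-area then front-area sequence can be sketched as a scan over the candidate's per-frame `(lon, lat)` in the target vehicle's frame. The longitudinal bound of the side area is partly garbled in the text ("within..."); the sketch assumes the 6.0 m to 10.0 m range it mentions. It also assumes that "in front" means a strictly positive longitudinal distance.

```python
from typing import Optional, Sequence, Tuple

SIDE_LAT_M = (2.5, 5.0)     # |lateral| range of the side area (from the text)
SIDE_LON_M = (6.0, 10.0)    # longitudinal range of the side area (assumed reading of the text)
FRONT_MAX_LON_M = 12.5      # front area: longitudinal distance limit (from the text)
FRONT_MAX_LAT_M = 1.75      # front area: |lateral| limit (from the text)
WINDOW_S = 5.0              # time allowed to move from side area to front area

def confirm_cut_in(rel: Sequence[Tuple[float, float]], dt: float) -> Optional[int]:
    """rel[k] = (lon, lat) of the candidate in the target vehicle's frame at frame k.
    Returns the first frame index at which the candidate is in the front area within
    WINDOW_S after a side-area frame, or None if no cut-in is confirmed."""
    window = int(round(WINDOW_S / dt))
    for i, (lon, lat) in enumerate(rel):
        in_side = (SIDE_LAT_M[0] <= abs(lat) <= SIDE_LAT_M[1]
                   and SIDE_LON_M[0] <= lon <= SIDE_LON_M[1])
        if not in_side:
            continue
        for j in range(i + 1, min(len(rel), i + window + 1)):
            lon_j, lat_j = rel[j]
            # Assumption: "in front" means a strictly positive longitudinal distance.
            if 0.0 < lon_j <= FRONT_MAX_LON_M and abs(lat_j) <= FRONT_MAX_LAT_M:
                return j
    return None
```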
[0035] After a cut-in is confirmed, the trajectories of all vehicles in the target lane are checked frame by frame to determine whether an interfering vehicle lies between the target vehicle and the cut-in vehicle and actually forms a front-to-back constraint. Specifically, during a continuous period after the cut-in vehicle enters the area in front of the target vehicle, if no other vehicle simultaneously satisfies, in the target vehicle's coordinate system, a longitudinal relative distance greater than 0 and less than the longitudinal distance between the cut-in vehicle and the target vehicle, and a lateral relative distance of no more than 1.75 m, maintained continuously for at least 1 second, then no persistent interfering vehicle exists between them. In that case, the cut-in vehicle is considered to have become the target vehicle's direct preceding vehicle, and the event is recognized as a valid cut-in. Conversely, if some vehicle satisfies these longitudinal and lateral conditions continuously for at least 1 second, the target vehicle's direct preceding vehicle is not the cut-in vehicle, and the event is not a valid cut-in.
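The interfering-vehicle check reduces to a run-length test per vehicle. This sketch assumes synchronized per-frame tracks in the target vehicle's frame (`None` when a vehicle is absent) and counts each frame as `dt` seconds; the vehicle IDs and data layout are hypothetical. The same test with a 2-second minimum is what the car-following gap-vehicle check in [0039] describes.

```python
from typing import Dict, Optional, Sequence, Tuple

def has_persistent_gap_vehicle(
    lead_lon: Sequence[float],
    others: Dict[str, Sequence[Optional[Tuple[float, float]]]],
    dt: float,
    min_duration_s: float = 1.0,
    max_lat_m: float = 1.75,
) -> bool:
    """lead_lon[k]: longitudinal distance of the cut-in vehicle in the target's frame at frame k.
    others[id][k]: (lon, lat) of another vehicle in the target's frame, or None if absent.
    True if any other vehicle stays between target and cut-in vehicle
    (0 < lon < lead_lon and |lat| <= max_lat_m) for at least min_duration_s."""
    needed = max(1, int(round(min_duration_s / dt)))  # each frame is counted as dt seconds
    for track in others.values():
        run = 0
        for lead, obs in zip(lead_lon, track):
            if obs is not None and 0.0 < obs[0] < lead and abs(obs[1]) <= max_lat_m:
                run += 1
                if run >= needed:
                    return True
            else:
                run = 0
    return False
```

A cut-in is valid only when this returns `False`, meaning no vehicle persistently separates the target from the cut-in vehicle.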
[0036] Valid cut-in events are further screened for segments that pose a significant challenge to the braking of automated vehicles. During the cut-in, the time headway and longitudinal distance between the target vehicle and the cut-in vehicle are calculated; when the headway is within 1.5 seconds and the longitudinal distance is within 12.5 meters, the segment is marked as a high-risk cut-in segment (a high-value cut-in scenario).
(2) Car-following behavior is recognized as follows: the longitudinal speed difference, the front-to-front distance between the vehicles, the longitudinal distance in Frenet coordinates (s), and car-following stability constraints are used to identify car-following behavior that is continuously influenced by the preceding vehicle. Such behavior can be used to evaluate how automated driving systems respond under interference from irrational traffic participants.
[0037] First, after completing lane assignment and Frenet feature calculation, a candidate set of preceding vehicles is determined for each target vehicle at each timestamp. Candidate preceding vehicles are defined as vehicles that are in the same lane as the target vehicle or in the exit lane of the target vehicle.
[0038] Next, the effective preceding vehicle is identified among the candidates. The longitudinal distance and longitudinal speed difference between the target vehicle and each candidate preceding vehicle are calculated frame by frame. If, in each frame, the longitudinal distance is between 0 m and 12.5 m and the absolute speed difference does not exceed 3 m/s, and these conditions hold continuously for at least 4 s, the vehicle pair is confirmed as a car-following relationship.
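The car-following test is another run-length check over synchronized frames. A sketch with the thresholds from the text as defaults; treating the 0 m bound as exclusive and counting each frame as `dt` seconds are assumptions.

```python
from typing import Sequence

def is_car_following(gaps: Sequence[float], speed_diffs: Sequence[float], dt: float,
                     max_gap_m: float = 12.5, max_dv_mps: float = 3.0,
                     min_duration_s: float = 4.0) -> bool:
    """gaps[k]: longitudinal distance (m) from target to candidate preceding vehicle at frame k;
    speed_diffs[k]: their longitudinal speed difference (m/s).
    True when 0 < gap <= max_gap_m and |dv| <= max_dv_mps hold for min_duration_s in a row."""
    needed = max(1, int(round(min_duration_s / dt)))  # each frame is counted as dt seconds
    run = 0
    for gap, dv in zip(gaps, speed_diffs):
        if 0.0 < gap <= max_gap_m and abs(dv) <= max_dv_mps:
            run += 1
            if run >= needed:
                return True
        else:
            run = 0  # any violating frame breaks the continuous-duration requirement
    return False
```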
[0039] Finally, gap vehicle detection is performed to eliminate interfering vehicles. During the 4-second period when the car-following relationship is established, all other vehicles in the target lane are checked frame by frame. If a third vehicle exists that simultaneously meets the following conditions during this period: its longitudinal relative distance is greater than 0 and less than the longitudinal distance between the candidate preceding vehicle and the target vehicle, and its lateral relative distance is not greater than 1.75 meters, and it maintains this positional relationship for at least 2 seconds, then the third vehicle is determined to be continuously present between the target vehicle and the candidate preceding vehicle, and the car-following relationship is invalidated. Conversely, if no third vehicle meets the above conditions, it is confirmed that the candidate preceding vehicle forms a continuous car-following stability constraint on the target vehicle, and it is identified as a high-value car-following scenario.
[0040] Car-following scenarios can be used to test the accuracy and safety of an autonomous driving system's distance control and longitudinal decision-making while following, for example to verify the plausibility of longitudinal outputs such as accelerating to follow, decelerating to avoid a collision, maintaining distance, and the intensity and timing of acceleration or braking. High-risk car-following scenarios are characterized by persistently short gaps and frequent speed fluctuations, so the system must trade off comfort and safety in real time; these scenarios therefore test whether longitudinal decisions are timely, whether they are excessive, and whether there is a risk of rear-end collision.
(3) Single-vehicle poor driving behavior is identified by detecting harsh acceleration or deceleration, abnormal lateral offset, and speeding.
[0041] Harsh acceleration or deceleration can be determined with a preset acceleration threshold, for example when the absolute value of the acceleration is greater than 2.5 m/s².
[0042] The abnormal lateral offset is determined by the duration of the lateral position offset of the vehicle relative to the center line of its lane in the Frenet coordinate system. For example, the absolute value of the lateral position offset of the vehicle relative to the center line of its lane in the Frenet coordinate system is greater than 1.5 meters and the duration is more than 4 seconds.
[0043] Speeding refers to a vehicle's instantaneous speed exceeding the speed limit for its current lane, which is provided by map information or road attributes.
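The three single-vehicle checks can be sketched together, using the example thresholds from [0041] and [0042]. The input layout (synchronized per-frame lists, including the current lane's speed limit per frame) is an assumption, as is counting each frame as `dt` seconds when measuring how long the lateral offset lasts.

```python
from typing import Dict, Sequence

HARSH_ACCEL_MPS2 = 2.5   # |a| threshold (example value from the text)
LAT_OFFSET_M = 1.5       # |d| threshold (example value from the text)
LAT_DURATION_S = 4.0     # offset must last longer than this (from the text)

def poor_driving_flags(accels: Sequence[float], offsets: Sequence[float],
                       speeds: Sequence[float], limits: Sequence[float],
                       dt: float) -> Dict[str, bool]:
    """Per-frame longitudinal accelerations, Frenet lateral offsets d, speeds, and the
    current lane's speed limit; returns which of the three behaviors occur."""
    harsh = any(abs(a) > HARSH_ACCEL_MPS2 for a in accels)
    drift, run = False, 0
    for d in offsets:
        run = run + 1 if abs(d) > LAT_OFFSET_M else 0
        if run * dt > LAT_DURATION_S:          # each frame counted as dt seconds
            drift = True
            break
    speeding = any(v > limit for v, limit in zip(speeds, limits))
    return {"harsh_accel": harsh, "lateral_drift": drift, "speeding": speeding}
```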
[0044] (4) Complex multi-vehicle interaction scenarios are identified by fusing synchronized multi-vehicle information with the segmentation of independent scenarios. The synchronized information includes the Frenet features and relative positions of multiple vehicles. Independent scenarios are segmented by density clustering. For each independent scenario obtained, the mutual influence of actions between vehicles is analyzed with an interaction relationship graph. A scenario is judged a high-risk multi-vehicle interaction scenario if multiple vehicles simultaneously exhibit:
- sudden acceleration or deceleration;
- boundary-approaching behavior;
- converging conflict trends.

Such scenarios are used to test the system's multi-agent prediction and collaborative decision-making capabilities.
[0045] Specifically, after completing vehicle lane assignment and Frenet coordinate feature extraction, a trajectory feature vector is first constructed based on the starting and ending positions of the vehicle trajectory and the average heading angle of the trajectory. The DBSCAN density clustering method is then used to divide the large-scale traffic scene, with a neighborhood radius of 0.5 and a minimum sample size of 1. The clustering results correspond to several behaviorally independent sub-scenes. When the distance between vehicle trajectories in the feature space is less than the set radius, they are considered to be in the same cluster. Each set of vehicles obtained after clustering constitutes a behaviorally independent sub-scene. Subsequently, within each sub-scene, a joint quantitative assessment is performed on the abnormal driving behavior of individual vehicles and the interaction risk between vehicles. Abnormal individual behavior is characterized by the degree of vehicle speeding, the absolute value of longitudinal acceleration, and the absolute value of lateral offset in the Frenet coordinate system. Vehicle interaction risk is assessed by the longitudinal relative distance, lateral relative distance, and relative speed relationship between multiple vehicle pairs, combined with the collision time and collision avoidance braking deceleration calculated accordingly.
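With a minimum sample size of 1, every point is a DBSCAN core point. The clusters of [0045] are then simply the connected components of the graph linking trajectories closer than the neighborhood radius. The sketch below exploits this with a small union-find, under two assumptions:
- the trajectory features are already scaled so that a radius of 0.5 is meaningful;
- the "average heading angle" is a circular mean.

scikit-learn's `DBSCAN(eps=0.5, min_samples=1)` should give the same partition. It may differ only in how a distance exactly equal to `eps` is treated. The function names below are hypothetical.

```python
import math


def trajectory_feature(xs, ys, headings):
    """(x_start, y_start, x_end, y_end, mean_heading) for one trajectory.
    The circular mean of the heading is an assumption of this sketch."""
    mean_heading = math.atan2(sum(math.sin(h) for h in headings),
                              sum(math.cos(h) for h in headings))
    return (xs[0], ys[0], xs[-1], ys[-1], mean_heading)


def cluster_min_samples_1(features, eps=0.5):
    """DBSCAN with min_samples=1: every point is a core point, so clusters
    are the connected components of the graph linking points closer than eps."""
    parent = list(range(len(features)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if math.dist(features[i], features[j]) < eps:
                parent[find(i)] = find(j)

    labels, ids = [], {}
    for i in range(len(features)):
        labels.append(ids.setdefault(find(i), len(ids)))
    return labels


feats = [(0, 0, 0, 0, 0), (0.3, 0, 0, 0, 0), (0.6, 0, 0, 0, 0), (5, 5, 5, 5, 0)]
print(cluster_min_samples_1(feats))  # [0, 0, 0, 1]
```

Note the chaining effect: the first and third trajectories are 0.6 apart but land in one sub-scene through the middle one.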
[0046] Within each sub-scene, a set P of potential interacting vehicle pairs is first constructed. For any two vehicles, their lateral and longitudinal relative distances are computed in the Frenet coordinate system. If the lateral relative distance does not exceed 4 m and the longitudinal relative distance does not exceed 10 m, the two vehicles are considered capable of direct spatial interaction. The pair is then added to P for the subsequent interaction risk calculation.

Next, an individual anomaly score is computed for each vehicle in the sub-scene. It is a weighted combination of three kinematic anomalies:
- speeding, measured as the amount by which the vehicle speed exceeds the road speed limit;
- longitudinal acceleration, measured as the absolute longitudinal acceleration;
- lateral offset, measured as the absolute lateral offset in the Frenet coordinate system.

The three anomalies are normalized by reference scales of 5.0 m/s, 3.0 m/s² and 1.75 m respectively. They are then weighted by 0.3, 0.4 and 0.3 and summed to give the vehicle's individual anomaly score.
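A sketch of the pair set P and the individual anomaly score of [0046] follows. It assumes all vehicles' Frenet coordinates refer to a common reference line, so that lateral distances across lanes are comparable. The normalized terms are not clipped, since the text does not say they are. Names are illustrative.

```python
from itertools import combinations


def candidate_pairs(s, d, lat_max=4.0, lon_max=10.0):
    """Set P of potentially interacting pairs in one sub-scene.
    s, d map vehicle id -> Frenet longitudinal / lateral position (m)."""
    return [(i, j) for i, j in combinations(sorted(s), 2)
            if abs(d[i] - d[j]) <= lat_max and abs(s[i] - s[j]) <= lon_max]


def individual_anomaly(speed, speed_limit, accel, lat_offset,
                       scales=(5.0, 3.0, 1.75), weights=(0.3, 0.4, 0.3)):
    """Weighted sum of normalized speeding, |a_lon| and |d| anomalies."""
    terms = (max(0.0, speed - speed_limit) / scales[0],
             abs(accel) / scales[1],
             abs(lat_offset) / scales[2])
    return sum(w * t for w, t in zip(weights, terms))


print(candidate_pairs({"a": 0.0, "b": 8.0, "c": 30.0},
                      {"a": 0.0, "b": 3.5, "c": 0.0}))  # [('a', 'b')]
```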
[0047] The group interaction score is computed over the set P of potential interacting vehicle pairs. For each pair, three types of interaction risk are computed:
(1) Time-to-collision risk: the time to collision (TTC) is computed from the longitudinal relative distance and longitudinal relative speed of the two vehicles. When TTC ≥ 12.5 s the risk is 0; when 0 ≤ TTC < 12.5 s the risk is $1-\mathrm{TTC}/12.5$.
(2) Collision-avoidance braking risk: the deceleration required to avoid a collision is computed from the relative speed and longitudinal distance of the two vehicles. When the required deceleration is ≤ 0 the risk is 0; when it lies between 0 and 3.0 m/s² the risk is the required deceleration divided by 3.0; when it is ≥ 3.0 m/s² the risk is 1.
(3) Instantaneous collision risk: if the rectangular footprints of the two vehicles overlap in the plane at the current moment the risk is 1, otherwise 0.

These three risks are weighted by 0.4, 0.4 and 0.2 respectively and summed to give the interaction risk value of the pair. The values of all pairs in the same sub-scene are accumulated into the group interaction score of the sub-scene. Finally, the mean of the individual anomaly scores of the vehicles in the sub-scene is added to the group interaction score. The result is the comprehensive risk score of the sub-scene, which determines whether it is a complex multi-vehicle interaction scenario.
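The pair risk and sub-scene score of [0047] can be sketched as below. Several details are not fixed by the text and are assumptions here:
- the gap is the center-to-center longitudinal distance;
- the required deceleration uses the standard constant-deceleration expression $\Delta v^2/(2\,\text{gap})$;
- the footprints are axis-aligned rectangles in the Frenet frame with hypothetical default dimensions.

A production implementation would more likely test oriented boxes.

```python
import math
from dataclasses import dataclass


@dataclass
class VehicleState:
    s: float             # Frenet longitudinal position (m)
    d: float             # Frenet lateral position (m)
    v: float             # longitudinal speed (m/s)
    length: float = 4.8  # assumed footprint length (m)
    width: float = 1.9   # assumed footprint width (m)


def pair_risk(a, b, ttc_max=12.5, decel_max=3.0, w=(0.4, 0.4, 0.2)):
    lead, follow = (a, b) if a.s >= b.s else (b, a)
    gap = max(lead.s - follow.s, 1e-6)        # avoid division by zero
    closing = follow.v - lead.v                # > 0 when the gap shrinks
    ttc = gap / closing if closing > 0 else math.inf
    r_ttc = 0.0 if ttc >= ttc_max else 1.0 - ttc / ttc_max
    drac = closing ** 2 / (2.0 * gap) if closing > 0 else 0.0
    r_drac = min(drac / decel_max, 1.0)
    # Axis-aligned footprint overlap in the Frenet frame (simplification).
    overlap = (abs(a.s - b.s) < (a.length + b.length) / 2
               and abs(a.d - b.d) < (a.width + b.width) / 2)
    r_col = 1.0 if overlap else 0.0
    return w[0] * r_ttc + w[1] * r_drac + w[2] * r_col


def scene_risk(vehicles, pairs, individual_scores):
    """Mean individual anomaly score plus the summed pair risks over P."""
    group = sum(pair_risk(vehicles[i], vehicles[j]) for i, j in pairs)
    return sum(individual_scores) / len(individual_scores) + group
```

For a follower at 15 m/s closing on a leader at 10 m/s 25 m ahead, TTC = 5 s. The TTC risk is therefore 0.6, and the braking risk is (0.5 m/s²)/3.0.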
[0048] In this application, "multiple vehicles" means two or more vehicles.
[0049] The independent sub-scene refers to a group of vehicles that are related to each other in terms of spatial location and movement trend within the same time period, but do not have a direct interaction relationship with other vehicle groups; vehicles in an independent sub-scene are relatively close in longitudinal and lateral position and move synchronously or influence each other in time, while vehicles in different sub-scenes are spatially separated, have significantly different directions of movement, or do not have a continuous proximity relationship.
[0050] In the high-value scenario sample and high-value feature distribution stage, the trajectories and maps corresponding to the identified high-value entry, following, single-vehicle poor driving behavior, and multi-vehicle complex interaction behavior are saved as high-value scenario samples. The trajectory includes the position, heading, and speed at each moment in the scenario. The map includes lane lines and traffic signal information.
[0051] Joint distribution modeling of the kinematic features of high-value samples is used to characterize the strength of the interaction structure, providing a statistical basis for risk areas and interaction constraints in the generation stage. The kinematic features include vehicle speed v, vehicle distance d, acceleration a, and headway THW. The statistical features are stored as a high-value feature distribution.
[0052] First, discrete bin sets are preset for a, v, d and THW respectively: $\mathcal{B}_a=\{B_a^{(i)}\}$, $\mathcal{B}_v=\{B_v^{(j)}\}$, $\mathcal{B}_d=\{B_d^{(k)}\}$ and $\mathcal{B}_h=\{B_h^{(l)}\}$, where $\mathcal{B}_h$ denotes the THW bins. Within the statistical time window, each frame sample $(a(t), v(t), d(t), \mathrm{THW}(t))$ is mapped to its bin index $(i,j,k,l)$, which satisfies $a(t)\in B_a^{(i)},\ v(t)\in B_v^{(j)},\ d(t)\in B_d^{(k)},\ \mathrm{THW}(t)\in B_h^{(l)}$.
[0053] All samples are then counted to form a four-dimensional histogram $N(i,j,k,l)$, the number of samples falling into the bin combination $(i,j,k,l)$. Normalizing this histogram yields the joint probability distribution $P(i,j,k,l)=N(i,j,k,l)/N_{\mathrm{tot}}$, where $N_{\mathrm{tot}}=\sum_{i,j,k,l}N(i,j,k,l)$ is the total sample count over all bin combinations. $P$ characterizes the overall distribution of the scene in the joint feature space of acceleration, speed, spacing and headway.
[0054] Next, the joint distribution is summed over the remaining indices to obtain the marginal distribution of each feature:
- speed: $P_v(j)=\sum_{i,k,l}P(i,j,k,l)$;
- spacing: $P_d(k)=\sum_{i,j,l}P(i,j,k,l)$;
- acceleration: $P_a(i)=\sum_{j,k,l}P(i,j,k,l)$;
- headway: $P_h(l)=\sum_{i,j,k}P(i,j,k,l)$.
[0055] From each marginal distribution, the 25% and 75% quantiles are read off its cumulative distribution function, taking the corresponding bin boundaries as the quantile values. This gives the central 50% range $[Q_{25}, Q_{75}]$ of each feature.
[0056] These distributions characterize the risk profile along each feature dimension. Test value increases with:
- higher speed;
- smaller spacing;
- larger absolute acceleration;
- shorter headway.

The distributions and their statistics serve as guidance in the subsequent generation stage. They steer generated samples toward the target high-risk region of the joint feature space while keeping their kinematic characteristics consistent with the target distribution.
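The histogram, marginalization and quantile steps of [0052] to [0055] map directly onto NumPy. The sketch below has the following features:
- the bin edges are hypothetical placeholders;
- `np.histogramdd` builds the four-dimensional histogram;
- each quantile is taken as the upper edge of the bin in which the CDF first reaches the level, which is one possible reading of "bin boundary".

```python
import numpy as np


def joint_distribution(samples, edges):
    """samples: (N, 4) array of (a, v, d, THW); edges: four bin-edge arrays."""
    hist, _ = np.histogramdd(samples, bins=edges)   # N(i, j, k, l)
    return hist / hist.sum()                         # P(i, j, k, l)


def marginal(P, axis):
    """Sum the joint distribution over every axis except `axis`."""
    others = tuple(ax for ax in range(P.ndim) if ax != axis)
    return P.sum(axis=others)


def quantile_range(p_marg, edges, lo=0.25, hi=0.75):
    """[Q25, Q75] as the upper edge of the bin where the CDF first
    reaches each level (an assumed reading of "bin boundary")."""
    cdf = np.cumsum(p_marg)
    i_lo = int(np.searchsorted(cdf, lo))
    i_hi = int(np.searchsorted(cdf, hi))
    return edges[i_lo + 1], edges[i_hi + 1]


edges = [[-4, -2, 0, 2, 4], [0, 10, 20, 30, 40], [0, 20, 40, 60], [0, 1, 2, 4]]
samples = np.array([[1, 5, 10, 0.5], [1, 15, 10, 0.5],
                    [-1, 25, 30, 1.5], [3, 35, 50, 3.0]])
P = joint_distribution(samples, edges)
p_v = marginal(P, axis=1)
print(quantile_range(p_v, edges[1]))  # (10, 30)
```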
[0057] The system acquires dynamic features from high-value scenario samples, thereby obtaining behavioral statistical features and interaction structure features. Dynamic features serve as the primary features, describing the instantaneous motion state of vehicles at various moments. Based on this, behavioral statistical features reflecting the overall behavior pattern of vehicles are obtained by statistically analyzing the dynamic features of individual vehicles over time. Simultaneously, interaction structure features are constructed to characterize the interaction relationships and risks between vehicles by calculating and aggregating the relative relationships between different vehicle dynamic features at the multi-vehicle level.
[0058] The dynamic features describe the instantaneous motion state of each vehicle in the scene. They serve as the basic input for the behavioral statistical features and the interaction structure features. They are expressed in the Frenet coordinate system and comprise:
- the longitudinal position $s(t)$;
- the lateral offset $d(t)$;
- the longitudinal velocity $v_s(t)$;
- the lateral velocity $v_d(t)$;
- the longitudinal acceleration $a_s(t)$.
[0059] The longitudinal velocity $v_s$ and lateral velocity $v_d$ are obtained by projecting the vehicle's global velocity vector onto the tangential and normal directions of the lane:
[0060] $v_s = v_x t_x + v_y t_y,\qquad v_d = -v_x t_y + v_y t_x$
[0061] where:
- $v_x$ and $v_y$ are the vehicle's velocity components along the x- and y-axes of the global coordinate system;
- $\mathbf{t}=(t_x, t_y)$ is the unit tangent vector of the centerline of the vehicle's lane, i.e. the lane direction;
- $\mathbf{n}=(-t_y, t_x)$ is the corresponding unit normal.
[0062] The longitudinal acceleration is obtained by finite differencing: $a_s(t) = \dfrac{v_s(t) - v_s(t-1)}{\Delta t}$, where $v_s(t-1)$ is the longitudinal velocity at the previous time step and $\Delta t$ is the interval between two adjacent time steps.
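The projection and the backward difference of [0059] to [0062] are sketched below. Two assumptions apply. First, the lane tangent is given by a lane heading angle. Second, the normal is taken as $\mathbf{n}=(-t_y, t_x)$, which makes $v_d$ positive toward the left of the lane direction. Names are illustrative.

```python
import math


def frenet_velocity(vx, vy, lane_heading):
    """Project a global velocity onto the lane tangent t and normal n.
    n = (-t_y, t_x) points to the left of the lane direction (assumed sign)."""
    tx, ty = math.cos(lane_heading), math.sin(lane_heading)
    v_s = vx * tx + vy * ty
    v_d = -vx * ty + vy * tx
    return v_s, v_d


def longitudinal_accel(v_s, dt):
    """Backward difference a_s(t) = (v_s(t) - v_s(t-1)) / dt, for t >= 1."""
    return [(cur - prev) / dt for prev, cur in zip(v_s, v_s[1:])]


print(longitudinal_accel([10.0, 11.0, 13.0], 0.5))  # [2.0, 4.0]
```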
[0063] The behavioral statistical features are computed from the dynamic features over a time window. They comprise:
- the maximum longitudinal acceleration;
- the average overspeed intensity;
- the average (absolute) acceleration;
- the lateral offset magnitude.
[0064] Let the statistical time window for vehicle i be $T=[t_0,t_1]$, and let $N_T$ be the number of sampling points within the window.
[0065] The maximum longitudinal acceleration is the largest absolute longitudinal acceleration within the window: $a_{\max}=\max_{t\in T}|a_s(t)|$, where $a_s(t)$ is the longitudinal acceleration at time t.
[0066] The average overspeed intensity characterizes the degree of sustained speeding: $\bar{o}=\frac{1}{N_T}\sum_{t\in T}\max\bigl(0,\ v(t)-v_{\lim}\bigr)$. Here $v(t)$ is the vehicle speed at time t, $v_{\lim}$ is the speed limit of the vehicle's lane, and $N_T$ is the number of samples in the window.
[0067] The average acceleration characterizes the intensity of acceleration and deceleration: $\bar{a}=\frac{1}{N_T}\sum_{t\in T}|a_s(t)|$, where $|\cdot|$ denotes the absolute value. The lateral offset magnitude characterizes lateral instability or abnormal lateral movement. It is computed from $|d(t)|$ over the window, where $d(t)$ is the Frenet lateral offset from the lane centerline at time t.
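The window statistics of [0064] to [0067] are sketched below for one vehicle. The positive-part overspeed follows the speeding measure of [0046]. Aggregating the lateral offset as a mean of $|d(t)|$ is an assumption, since the text does not fix the aggregation.

```python
def behavior_stats(a_s, speed, speed_limit, d):
    """Window statistics for one vehicle. The positive-part overspeed and the
    mean |d| aggregation are assumptions of this sketch."""
    return {
        "max_abs_accel": max(abs(a) for a in a_s),
        "mean_overspeed": sum(max(0.0, v - speed_limit) for v in speed) / len(speed),
        "mean_abs_accel": sum(abs(a) for a in a_s) / len(a_s),
        "mean_abs_lateral_offset": sum(abs(x) for x in d) / len(d),
    }


stats = behavior_stats([1.0, -3.0, 2.0], [12.0, 16.0, 14.0], 14.0, [0.5, -1.0, 0.0])
print(stats["max_abs_accel"], stats["mean_abs_accel"])  # 3.0 2.0
```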
[0068] The interaction structure features take multi-vehicle dynamics features as input to characterize the relative positional relationships, relative motion states and interaction risk intensity between vehicles in a scene, and form an interaction structure representation that can be used for encoding.
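[0069] For any pair of vehicles (i, j) and any time t, define the following two quantities:
- the longitudinal relative distance $\Delta s_{ij}(t)=s_j(t)-s_i(t)$, where $s_i(t)$ and $s_j(t)$ are the longitudinal positions of vehicles i and j at time t;
- the relative velocity $\Delta v_{ij}(t)=v_i(t)-v_j(t)$, where $v_i(t)$ and $v_j(t)$ are their longitudinal velocities.

With j taken as the vehicle ahead, $\Delta v_{ij}>0$ means the gap is closing.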
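[0070] The time-to-collision (TTC) risk is constructed from the relative distance and relative velocity: $\mathrm{TTC}_{ij}=\Delta s_{ij}/\Delta v_{ij}$ for $\Delta v_{ij}>0$. Otherwise no collision is predicted and TTC is taken as infinite. Here $\Delta s_{ij}$ and $\Delta v_{ij}$ are the longitudinal relative distance and relative speed between vehicles i and j.
[0071] To facilitate encoding, TTC is further mapped to a normalized risk intensity $r^{\mathrm{TTC}}_{ij}\in[0,1]$. The mapping uses a TTC risk threshold $\mathrm{TTC}_{\mathrm{th}}$ and a normalization scale $\tau$.
[0072] The deceleration rate to avoid collision (DRAC) is $\mathrm{DRAC}_{ij}=\Delta v_{ij}^2/(2\Delta s_{ij})$ for $\Delta v_{ij}>0$, i.e. the constant deceleration that cancels the closing speed within the current gap. It is normalized as $r^{\mathrm{DRAC}}_{ij}=\min\bigl(\mathrm{DRAC}_{ij}/a_{\mathrm{acc}},\ 1\bigr)$, where $a_{\mathrm{acc}}$ is the acceptable deceleration threshold.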
[0073] Vehicles are then treated as nodes carrying their features, relative positions and motion states define the edges, and $r^{\mathrm{TTC}}_{ij}$ and $r^{\mathrm{DRAC}}_{ij}$ are attached as edge attributes. The resulting graph provides the conditional constraints and guidance for the subsequent diffusion generation.
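The interaction graph of [0069] to [0073] can be sketched as a dictionary of directed edges carrying the normalized risk attributes. The TTC mapping $\mathrm{clip}((\mathrm{TTC}_{\mathrm{th}}-\mathrm{TTC})/\tau,\,0,\,1)$ is an assumption, chosen so that with $\mathrm{TTC}_{\mathrm{th}}=\tau=12.5$ s it coincides with the $1-\mathrm{TTC}/12.5$ rule of [0047]. The node features are left as opaque placeholders.

```python
import math


def edge_attributes(s_i, v_i, s_j, v_j, ttc_th=12.5, tau=12.5, a_acc=3.0):
    """Edge attributes for vehicle i following vehicle j (s_j > s_i)."""
    ds = s_j - s_i                    # longitudinal relative distance
    dv = v_i - v_j                    # closing speed, > 0 when approaching
    closing = dv > 0 and ds > 0
    ttc = ds / dv if closing else math.inf
    r_ttc = min(max((ttc_th - ttc) / tau, 0.0), 1.0)   # assumed mapping
    drac = dv * dv / (2.0 * ds) if closing else 0.0
    r_drac = min(drac / a_acc, 1.0)
    return {"ds": ds, "dv": dv, "r_ttc": r_ttc, "r_drac": r_drac}


def build_interaction_graph(states):
    """states: vehicle id -> (s, v, node_features). Adds a directed edge
    i -> j whenever j is ahead of i; returns (nodes, edges)."""
    nodes = {vid: feats for vid, (_, _, feats) in states.items()}
    edges = {}
    for i, (s_i, v_i, _) in states.items():
        for j, (s_j, v_j, _) in states.items():
            if i != j and s_j > s_i:
                edges[(i, j)] = edge_attributes(s_i, v_i, s_j, v_j)
    return nodes, edges


nodes, edges = build_interaction_graph({"f": (0.0, 15.0, [0.1]), "l": (25.0, 10.0, [0.0])})
print(sorted(edges))  # [('f', 'l')]
```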
[0074] In the interactive coding stage, the map structure, lane geometry information, traffic participant trajectories and interaction relationships are uniformly encoded into a high-dimensional scene representation.
[0075] First, the map encoder takes high-definition map features as input. It discretizes elements such as lane centerlines, road edges and stop signs into a set of polylines $P=\{p_i\}_{i=1}^{N_p}$. The i-th polyline consists of sampling points $p_i=\{x_{i,m}\}_{m=1}^{M}$ with $x_{i,m}\in\mathbb{R}^{C}$. Here $C=4$, because each point feature consists of its two-dimensional coordinates and the tangent direction vector $(\cos\theta, \sin\theta)$. Each polyline is assigned a category index $c_i$ with a learned category embedding $e_{c_i}$.

Each polyline point sequence is then encoded by a two-stage PointNet polyline encoder:
- In the first stage, the point features are concatenated with the category embedding and passed through an MLP to obtain point-level features $h^{(1)}_{i,m}$. These are max-pooled into a global feature $g_i=\max_m h^{(1)}_{i,m}$.
- In the second stage, $g_i$ is concatenated with each point-level feature, passed through another MLP, and max-pooled to obtain the polyline feature vector $f_i$.

To inject global position information, the center $\bar{x}_i$ of the polyline is Fourier-embedded. This embedding is mapped to the hidden dimension by an MLP $\phi$ and combined with $f_i$ to give the polyline node representation $m_i$. The result is the map feature embedding sequence $S_{\mathrm{map}}=\{m_i\}_{i=1}^{N_p}$.
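A NumPy sketch of the two-stage polyline encoder of [0075] follows. It is an untrained illustration with random weights and several assumptions:
- the layer sizes are illustrative;
- the Fourier frequencies follow an octave schedule;
- the position embedding is combined with $f_i$ by addition.

Because both stages end in a max-pool and the center is a mean, the output does not depend on the order of the sampling points, which the test checks.

```python
import numpy as np


def fourier_embed(x, n_freq=4):
    """Map each coordinate to sin/cos at octave frequencies (assumed schedule)."""
    ang = x[..., None] * (2.0 ** np.arange(n_freq)) * np.pi
    emb = np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)
    return emb.reshape(*x.shape[:-1], -1)


def mlp(x, w1, w2):
    return np.maximum(x @ w1, 0.0) @ w2  # Linear -> ReLU -> Linear


class PolylineEncoder:
    """Two-stage PointNet polyline encoder with random weights (illustration)."""

    def __init__(self, n_classes=3, c=4, emb=8, hidden=32, n_freq=4, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(0.0, 0.1, size=shape)

        self.n_freq = n_freq
        self.cls_emb = init(n_classes, emb)
        self.stage1 = (init(c + emb, hidden), init(hidden, hidden))
        self.stage2 = (init(2 * hidden, hidden), init(hidden, hidden))
        self.pos = (init(2 * 2 * n_freq, hidden), init(hidden, hidden))

    def __call__(self, points, cls):
        """points: (M, 4) rows of (x, y, cos theta, sin theta); cls: category index."""
        m = points.shape[0]
        e = np.repeat(self.cls_emb[cls][None, :], m, axis=0)
        h1 = mlp(np.concatenate([points, e], axis=1), *self.stage1)  # (M, H)
        g = h1.max(axis=0)                                           # global feature
        h2 = mlp(np.concatenate([h1, np.repeat(g[None, :], m, axis=0)], axis=1),
                 *self.stage2)
        f = h2.max(axis=0)                                           # polyline feature
        center = points[:, :2].mean(axis=0)
        pos = mlp(fourier_embed(center[None, :], self.n_freq), *self.pos)[0]
        return f + pos  # combining by addition is an assumption


xs = np.linspace(0.0, 10.0, 5)
pts = np.column_stack([xs, np.zeros(5), np.ones(5), np.zeros(5)])
enc = PolylineEncoder()
S_map = np.stack([enc(pts, 0), enc(pts + [0, 3.5, 0, 0], 0)])
print(S_map.shape)  # (2, 32)
```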
[0076] Second, the trajectory encoder takes the historical trajectories of traffic participants as input; each frame contains position, velocity and heading angle. At each historical time step it builds a kinematic feature vector from:
- the displacement increment $\Delta\mathbf{p}_t=\mathbf{p}_t-\mathbf{p}_{t-1}$;
- the heading $\theta_t$;
- the speed $\lVert\mathbf{v}_t\rVert$, together with the angle between the velocity and the heading.

The kinematic feature vector is concatenated with the participant category embedding and passed through Fourier embedding to obtain a per-time-step representation $h_t$. Stacking these along the time dimension gives the trajectory time-series feature $S_{\mathrm{hist}}=[h_1,\dots,h_{T_h}]$.
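The per-step kinematic vector of [0076] can be sketched as follows. The exact feature layout is an assumption: $(\Delta x, \Delta y, \cos\theta, \sin\theta, \lVert v\rVert, \angle(v,\theta))$, with a zero increment at the first step. The Fourier frequency schedule is likewise assumed. The concatenate-then-embed order follows the text.

```python
import numpy as np


def fourier_embed(x, n_freq=4):
    """sin/cos features at octave frequencies for every input dimension (assumed)."""
    ang = x[..., None] * (2.0 ** np.arange(n_freq)) * np.pi
    emb = np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)
    return emb.reshape(*x.shape[:-1], -1)


def kinematic_features(pos, vel, heading):
    """pos, vel: (T, 2); heading: (T,). Returns (T, 6) rows of
    (dx, dy, cos heading, sin heading, speed, angle(velocity, heading))."""
    disp = np.diff(pos, axis=0, prepend=pos[:1])      # zero increment at t = 0
    speed = np.linalg.norm(vel, axis=1)
    vel_dir = np.arctan2(vel[:, 1], vel[:, 0])
    rel = np.angle(np.exp(1j * (vel_dir - heading)))   # wrapped to (-pi, pi]
    return np.column_stack([disp, np.cos(heading), np.sin(heading), speed, rel])


def encode_history(pos, vel, heading, cls_emb, n_freq=4):
    """Concatenate kinematics with the category embedding, then Fourier-embed."""
    k = kinematic_features(pos, vel, heading)
    c = np.repeat(cls_emb[None, :], len(k), axis=0)
    return fourier_embed(np.concatenate([k, c], axis=1), n_freq)


T = 5
pos = np.column_stack([np.arange(T, dtype=float), np.zeros(T)])  # 1 m per step east
vel = np.tile([10.0, 0.0], (T, 1))
heading = np.zeros(T)
H = encode_history(pos, vel, heading, cls_emb=np.array([1.0, 0.0, 0.0]))
print(H.shape)  # (5, 72)
```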
[0077] Finally, attention-based interaction encoding is performed. For any edge (i, j), the relative relation vector $r_{ij}=[\Delta x_{ij}, \Delta y_{ij}, \Delta v_{x,ij}, \Delta v_{y,ij}, \Delta\theta_{ij}, \Delta t_{ij}]$ is computed, where:
- $\Delta x_{ij}$ and $\Delta y_{ij}$ are the lateral and longitudinal relative positions of vehicles i and j;
- $\Delta v_{x,ij}$ and $\Delta v_{y,ij}$ are their lateral and longitudinal relative velocities;
- $\Delta\theta_{ij}$ is the difference in their heading angles;
- $\Delta t_{ij}$ is the time difference.

Attention aggregation then fuses the historical time-series information, the map polyline information and the vehicle motion information. It outputs a high-dimensional scene representation that serves as the conditional input of the subsequent diffusion decoder.
[0078] Specifically, during the interactive coding process, the mutual influence between traffic participants is modeled through an attention mechanism.
[0079] First, calculate the relative relationship vector.
[0080] For any edge (i, j), the relative relation vector $r_{ij}$ is computed as:
[0081] $r_{ij}=[\Delta x_{ij}, \Delta y_{ij}, \Delta v_{x,ij}, \Delta v_{y,ij}, \Delta\theta_{ij}, \Delta t_{ij}]$, where:
- $\Delta x_{ij}$ and $\Delta y_{ij}$ are the lateral and longitudinal relative positions of vehicles i and j;
- $\Delta v_{x,ij}$ and $\Delta v_{y,ij}$ are their lateral and longitudinal relative speeds;
- $\Delta\theta_{ij}$ is their heading-angle difference;
- $\Delta t_{ij}$ is the time difference.
[0082] Fourier embedding and relational bias: the relation vector $r_{ij}$ is Fourier-embedded to obtain the relation embedding $e_{ij}=\mathrm{FE}(r_{ij})$.
[0083] The relation embedding is then mapped to a relative position bias $b_{ij}=\psi(e_{ij})$, where $\psi$ is a learned projection.
[0084] Multi-head self-attention: in each layer, the node representations are linearly projected to obtain the query (Q), key (K) and value (V): $Q_i=W_Q h_i,\ K_j=W_K h_j,\ V_j=W_V h_j$.
[0085] The attention weights used to aggregate the representations of neighboring nodes are then computed: $\alpha_{ij}=\operatorname{softmax}_{j\in\mathcal{N}(i)}\bigl(Q_i\cdot K_j/\sqrt{d_h}+b_{ij}\bigr)$, where $d_h$ is the dimension of each attention head.
[0086] The aggregated result is the weighted sum $m_i=\sum_{j\in\mathcal{N}(i)}\alpha_{ij}V_j$.
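One attention head with a relative-position bias, as described in [0082] to [0086], is sketched below in NumPy. Two simplifications apply. First, the bias projection $\psi$ is assumed to be a single linear map to a scalar. Second, the neighborhood $\mathcal{N}(i)$ is given as a boolean mask. A multi-head layer would run this per head and concatenate the outputs.

```python
import numpy as np


def relational_attention(h, rel, W_q, W_k, W_v, w_b, mask):
    """Single attention head with an additive relative-position bias.

    h:    (N, D) node representations
    rel:  (N, N, R) relation embeddings e_ij (row i attends to column j)
    w_b:  (R,) projection giving the scalar bias b_ij = e_ij . w_b
    mask: (N, N) bool, True where j is a neighbor of i (include self-loops)
    """
    Q, K, V = h @ W_q, h @ W_k, h @ W_v
    logits = Q @ K.T / np.sqrt(Q.shape[1]) + rel @ w_b
    logits = np.where(mask, logits, -np.inf)
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    alpha = np.exp(logits)
    alpha /= alpha.sum(axis=1, keepdims=True)        # softmax over neighbors j
    return alpha @ V, alpha                           # m_i = sum_j alpha_ij V_j


rng = np.random.default_rng(0)
N, D, R, d_h = 3, 8, 6, 4
h = rng.normal(size=(N, D))
rel = rng.normal(size=(N, N, R))
mask = np.ones((N, N), dtype=bool)
mask[0, 2] = False
m, alpha = relational_attention(h, rel, rng.normal(size=(D, d_h)),
                                rng.normal(size=(D, d_h)),
                                rng.normal(size=(D, d_h)),
                                rng.normal(size=R), mask)
print(m.shape)  # (3, 4)
```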
[0087] In the high-value scenario generation process, this application models natural driving scenarios with a diffusion model. The model generates vehicle behavior conditioned on the scene feature embedding S output by the map encoder and trajectory encoder. The modeled object is the real future trajectory tensor $Y\in\mathbb{R}^{N_a\times T_f\times d_x}$, where:
- $N_a$ is the number of traffic participants;
- $T_f$ is the number of future time steps;
- $d_x$ is the single-step state dimension.

The diffusion model learns the scene distribution $p(Y\mid S)$ through forward noising followed by stepwise reverse denoising. In the sampling stage, it progressively recovers from Gaussian noise future trajectories consistent with the real distribution.
[0088] In the training phase, the forward process adds noise to the true future trajectory. For a noise level $\sigma$, noise $\epsilon\sim\mathcal{N}(0,I)$ is sampled and the noisy sample $Y_\sigma=Y+\sigma\epsilon$ is constructed, so that the data gradually degenerates toward a Gaussian distribution as $\sigma$ grows. The reverse process is carried out by the diffusion decoder given $(Y_\sigma, \sigma, S)$.

The invention uses the EDM preconditioning scheme:
- the noisy trajectory is first normalized as $c_{\mathrm{in}}(\sigma)Y_\sigma$;
- the noise level is embedded as $h_\sigma=\mathrm{Embed}(\log\sigma)$;
- the decoder $F_\theta$ outputs a normalized residual under the condition S.

This gives the denoiser output $D_\theta(Y_\sigma;\sigma,S)=c_{\mathrm{skip}}(\sigma)Y_\sigma+c_{\mathrm{out}}(\sigma)F_\theta\bigl(c_{\mathrm{in}}(\sigma)Y_\sigma;h_\sigma,S\bigr)$, which predicts the denoising direction and recovers the true future trajectory Y. The training objective is the weighted mean squared error $\mathcal{L}=\mathbb{E}_{\sigma,\epsilon}\bigl[\lambda(\sigma)\lVert D_\theta(Y_\sigma;\sigma,S)-Y\rVert_2^2\bigr]$, where $\lVert\cdot\rVert_2^2$ is the squared L2 norm. This makes the denoised output approximate the true future trajectory.
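The EDM preconditioning and weighted loss referred to in [0088] are sketched below. The coefficient formulas and the weight $\lambda(\sigma)=(\sigma^2+\sigma_{\mathrm{data}}^2)/(\sigma\,\sigma_{\mathrm{data}})^2$ are those of the EDM formulation. The value $\sigma_{\mathrm{data}}=0.5$ comes from the original EDM image experiments, and its suitability for trajectories is an assumption. `zero_net` is a hypothetical stand-in for the diffusion decoder $F_\theta$.

```python
import numpy as np

SIGMA_DATA = 0.5  # EDM's image-domain default; the value for trajectories is assumed


def edm_coefficients(sigma, sigma_data=SIGMA_DATA):
    """Preconditioning coefficients (c_skip, c_out, c_in) of the EDM formulation."""
    denom = sigma ** 2 + sigma_data ** 2
    return sigma_data ** 2 / denom, sigma * sigma_data / np.sqrt(denom), 1.0 / np.sqrt(denom)


def denoise(F, y_noisy, sigma, S):
    """D(Y_sigma; sigma, S) = c_skip * Y_sigma + c_out * F(c_in * Y_sigma; log sigma, S)."""
    c_skip, c_out, c_in = edm_coefficients(sigma)
    return c_skip * y_noisy + c_out * F(c_in * y_noisy, np.log(sigma), S)


def edm_loss(F, y, sigma, S, rng):
    """Weighted MSE with lambda(sigma) = (sigma^2 + sd^2) / (sigma * sd)^2."""
    y_noisy = y + sigma * rng.standard_normal(y.shape)   # forward noising
    weight = (sigma ** 2 + SIGMA_DATA ** 2) / (sigma * SIGMA_DATA) ** 2
    return weight * np.mean((denoise(F, y_noisy, sigma, S) - y) ** 2)


def zero_net(x, log_sigma, S):
    """Hypothetical stand-in for the diffusion decoder F_theta."""
    return np.zeros_like(x)


rng = np.random.default_rng(0)
Y = rng.normal(scale=0.5, size=(2, 80, 4))   # N_a = 2, T_f = 80, d_x = 4
print(edm_loss(zero_net, Y, 1.0, S=None, rng=rng) > 0)  # True
```

The skip connection makes the denoiser return its input almost unchanged at very small $\sigma$, so the network only has to learn the residual.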
[0089] In terms of model architecture, the diffusion decoder takes the noisy future trajectory and the scene feature embedding as conditional input and progressively denoises them into the future trajectory. With $N_a$ participants, $T_f$ future steps and single-step state dimension $d_x$, the noisy future trajectory is $Y_\sigma\in\mathbb{R}^{N_a\times T_f\times d_x}$. The scene feature embedding is $S=\{S_{\mathrm{map}}, S_{\mathrm{hist}}\}$.

For each participant i, the noisy future trajectory is flattened into a vector $y_i$. The input mapping network turns it into the initial query feature $q_i=\mathrm{MLP}_{\mathrm{in}}(y_i)\in\mathbb{R}^{D}$, where D is the hidden dimension of the decoder. The noise level is Fourier-embedded as $z_\sigma=\mathrm{FE}(\log\sigma)\in\mathbb{R}^{D}$ and fused with $q_i$ into the initial hidden state $h_i^0=\mathrm{LN}(q_i)+W_\sigma z_\sigma$. Here LN is layer normalization and $W_\sigma$ is a linear mapping.

A relation edge set E is then built inside the decoder for condition injection and interaction modeling. It contains three edge types:
- historical edges $E_{\mathrm{hist}}$, which inject the historical trajectory features $S_{\mathrm{hist}}$ output by the encoder into the current participant nodes;
- map edges $E_{\mathrm{map}}$, which inject the lane features $S_{\mathrm{map}}$ output by the map encoder into the participant nodes;
- neighboring-vehicle edges $E_{\mathrm{agent}}$, which model the mutual influence among participants.

For any edge $(p\to q)\in E$, a relative relation vector $r_{pq}$ is constructed as the relational input of attention. It contains at least the relative position and relative motion, $r_{pq}=[\Delta x, \Delta y, \Delta v_x, \Delta v_y, \Delta\theta, \Delta t]$. It is Fourier-encoded into the relation embedding $e_{pq}=\mathrm{FE}(r_{pq})$, which a linear layer $W_R$ maps to the dimension of each attention head.

In each multi-head attention update, the node representations are linearly projected into $Q_i=W_Q h_i$, $K_j=W_K h_j$ and $V_j=W_V h_j$. The relation embedding is injected into the keys as $K_{pq}=K_p+W_R e_{pq}$. For node q, attention over all incoming neighbors $p\in\mathcal{N}(q)$ gives the weights $\alpha_{pq}=\operatorname{softmax}_p\bigl(Q_q\cdot K_{pq}/\sqrt{d_h}\bigr)$, where $d_h$ is the dimension of each attention head. The outputs of the heads are concatenated and passed through a linear layer to obtain $\mathrm{AttnOut}_q$. A residual update follows: $h_q^{\ell+1/2}=h_q^{\ell}+\mathrm{AttnOut}_q$.

After L layers of attention and gated feed-forward iterations, the final decoded feature $h_i^L$ of each participant is obtained. An output head then maps these features back to the future trajectory space, $\hat{y}_i=\mathrm{MLP}_{\mathrm{out}}(h_i^L)\in\mathbb{R}^{T_f d_x}$. This is reshaped to $\hat{Y}_i\in\mathbb{R}^{T_f\times d_x}$, giving the predicted future trajectories of all participants $\hat{Y}\in\mathbb{R}^{N_a\times T_f\times d_x}$.
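[0090] The noisy future trajectory $Y_\sigma$ is a tensor of shape $N_a\times T_f\times d_x$ representing the noisy trajectories of all participants:
[0091] $Y_\sigma\in\mathbb{R}^{N_a\times T_f\times d_x}$
[0092] The scene feature embedding S consists of two parts, the map feature embedding $S_{\mathrm{map}}$ and the historical trajectory feature embedding $S_{\mathrm{hist}}$:
[0093] $S=\{S_{\mathrm{map}}, S_{\mathrm{hist}}\}$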
[0094] For each participant i, its noisy future trajectory $Y_{\sigma,i}$ is flattened into a vector: $y_i=\mathrm{vec}(Y_{\sigma,i})\in\mathbb{R}^{T_f d_x}$
[0095] The initial query feature is then obtained through an input mapping network (MLP): $q_i=\mathrm{MLP}_{\mathrm{in}}(y_i)\in\mathbb{R}^{D}$
[0096] where $\mathrm{vec}(\cdot)$ flattens the noisy trajectory $Y_{\sigma,i}$ into a $T_f d_x$-dimensional vector;
[0097] $\mathrm{MLP}_{\mathrm{in}}$ is a multilayer perceptron that maps the trajectory vector $y_i$ into the hidden space;
[0098] D is the hidden dimension of the decoder.
[0099] First, the noise level $\sigma$ is log-transformed and then Fourier-embedded: $z_\sigma=\mathrm{FE}(\log\sigma)\in\mathbb{R}^{D}$
[0100] The query feature and the noise embedding are then combined into the initial hidden state: $h_i^0=\mathrm{LN}(q_i)+W_\sigma z_\sigma$
[0101] where $\mathrm{FE}(\log\sigma)$ is the Fourier embedding of the logarithm of the noise level, giving the noise embedding vector $z_\sigma$;
[0102] $\mathrm{LN}(\cdot)$ denotes layer normalization, applied to the query feature $q_i$;
[0103] $W_\sigma$ is a linear mapping matrix that maps the noise embedding $z_\sigma$ into the hidden space.
[0104] In the decoder, a relation edge set E is constructed for condition injection and interaction modeling. It contains three edge types: (1) historical edges $E_{\mathrm{hist}}$, which inject the historical trajectory features $S_{\mathrm{hist}}$ output by the encoder into the current participant node;
[0105] (2) map edges $E_{\mathrm{map}}$, which inject the lane features $S_{\mathrm{map}}$ output by the map encoder into the current participant node;
[0106] (3) neighboring-vehicle edges $E_{\mathrm{agent}}$, which model the interactions among participants.
[0107] For any edge $(p\to q)\in E$, a relative relation vector is constructed as the relational input of attention: $r_{pq}=[\Delta x_{pq}, \Delta y_{pq}, \Delta v_{x,pq}, \Delta v_{y,pq}, \Delta\theta_{pq}, \Delta t_{pq}]$
[0108] where:
- $\Delta x_{pq}$ and $\Delta y_{pq}$ are the lateral and longitudinal relative positions of vehicles p and q;
- $\Delta v_{x,pq}$ and $\Delta v_{y,pq}$ are their lateral and longitudinal relative velocities;
- $\Delta\theta_{pq}$ is their heading-angle difference;
- $\Delta t_{pq}$ is the time difference.

[0109] The relation vector is then converted by Fourier embedding into the relation embedding $e_{pq}=\mathrm{FE}(r_{pq})$, which is linearly mapped and injected into the key: $K_{pq}=K_p+W_R e_{pq}$
[0110] where $K_{pq}$ is the key vector after relation-embedding injection.
[0111] In each multi-head self-attention layer of the decoder, each node representation $h_i$ is linearly projected to obtain the query (Q), key (K) and value (V): $Q_i=W_Q h_i,\ K_i=W_K h_i,\ V_i=W_V h_i$
[0112] The attention weights used to aggregate the representations of neighboring nodes are then computed: $\alpha_{pq}=\operatorname{softmax}_{p\in\mathcal{N}(q)}\bigl(Q_q\cdot K_{pq}/\sqrt{d_h}\bigr)$
[0113] The aggregated result for node q is: $m_q=\sum_{p\in\mathcal{N}(q)}\alpha_{pq}V_p$
[0114] where:
- $W_Q$, $W_K$ and $W_V$ are the linear projection matrices for the query, key and value;
- $\alpha_{pq}$ is the attention weight, representing the relevance of node p to node q;
- $d_h$ is the dimension of each attention head;
- $\mathcal{N}(q)$ is the set of neighbors of node q;
- $W_R e_{pq}$ is the projected relation embedding injected into the key.
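The decoder's edge-based attention of [0107] to [0114] differs from the encoder sketch in where the relation enters: it is added to the key of each incoming edge rather than to the logits. A single-head NumPy sketch follows. Edges are assumed to be given as a list of `(p, q)` pairs with one relation embedding per edge. Nodes without incoming edges receive a zero message, so the residual update leaves them unchanged.

```python
import numpy as np
from collections import defaultdict


def edge_attention(h, edges, rel_emb, W_q, W_k, W_v, W_r):
    """One attention head over directed edges p -> q with relation-injected keys.

    h: (N, D) node states; edges: list of (p, q); rel_emb[(p, q)] = e_pq, shape (R,)
    K_pq = K_p + e_pq W_r, alpha_pq = softmax_p(Q_q . K_pq / sqrt(d_h)),
    m_q = sum_p alpha_pq V_p. Nodes without incoming edges get m_q = 0.
    """
    Q, K, V = h @ W_q, h @ W_k, h @ W_v
    d_h = Q.shape[1]
    incoming = defaultdict(list)
    for p, q in edges:
        incoming[q].append(p)
    m = np.zeros((h.shape[0], V.shape[1]))
    for q, ps in incoming.items():
        K_pq = np.stack([K[p] + rel_emb[(p, q)] @ W_r for p in ps])
        logits = K_pq @ Q[q] / np.sqrt(d_h)
        alpha = np.exp(logits - logits.max())   # stable softmax over incoming p
        alpha /= alpha.sum()
        m[q] = alpha @ V[ps]
    return m


rng = np.random.default_rng(0)
N, D, R, d_h = 3, 8, 6, 4
h = rng.normal(size=(N, D))
edges = [(0, 2), (1, 2), (2, 2)]
rel = {e: rng.normal(size=R) for e in edges}
W_q, W_k, W_v, W_r = (rng.normal(size=s) for s in [(D, d_h), (D, d_h), (D, d_h), (R, d_h)])
m = edge_attention(h, edges, rel, W_q, W_k, W_v, W_r)
print(m.shape)  # (3, 4)
```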
[0115] (6) Residual update and iteration.
[0116] After the multi-head self-attention output, a residual update is applied: $h_q^{\ell+1/2}=h_q^{\ell}+\mathrm{AttnOut}_q$
[0117] After L layers of attention and gated feed-forward network iterations, the final decoded feature $h_i^L$ of each participant is obtained.
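[0118] (7) Output layer.
[0119] Finally, the output head (MLP) maps the high-dimensional feature $h_i^L$ back to the future trajectory space: $\hat{y}_i=\mathrm{MLP}_{\mathrm{out}}(h_i^L)\in\mathbb{R}^{T_f d_x}$
[0120] The result is reshaped into the future trajectory of each participant: $\hat{Y}_i=\mathrm{reshape}(\hat{y}_i)\in\mathbb{R}^{T_f\times d_x}$
[0121] Ultimately, the predicted future trajectories of all participants are obtained: $\hat{Y}=[\hat{Y}_1,\dots,\hat{Y}_{N_a}]\in\mathbb{R}^{N_a\times T_f\times d_x}$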
[0122] In the generation phase, an initial noise trajectory $Y_T\sim\mathcal{N}(0,I)$ of the same size as Y is sampled from the standard normal distribution. With the scene feature embedding S as conditional input, the diffusion decoder iterates over a multi-step sequence of noise levels to progressively obtain the generated result.
[0123] For example, the scene feature embedding $S\in\mathbb{R}^{32\times256}$ is the joint output of the map encoder and trajectory encoder, where 32 is the number of encoding units in the scene and 256 is the scene feature dimension. The noisy future trajectory is $Y_\sigma\in\mathbb{R}^{N_a\times80\times4}$, where:
- $N_a$ is the number of traffic participants;
- 80 is the number of prediction time steps;
- 4 is the per-step state dimension, encoding the tangential and normal position coordinates and the heading angle.

The scalar σ is the noise scale of the current diffusion time step. The output is the denoised future trajectory prediction $\hat{Y}\in\mathbb{R}^{N_a\times80\times4}$.
[0124] For example, the diffusion decoder is structured as follows.
- Input mapping layer: for each traffic participant i, the noisy future trajectory $Y_{\sigma,i}\in\mathbb{R}^{80\times4}$ is flattened into a vector $y_i\in\mathbb{R}^{320}$. It is mapped into the hidden space by a two-layer fully connected network (MLP): layer 1 is Linear(320 → 256) and layer 2 is Linear(256 → 256), each followed by a ReLU activation. This gives the query vector $q_i\in\mathbb{R}^{256}$.
- Noise embedding layer: the logarithm $\log\sigma$ of the noise level is mapped to a vector $z_\sigma\in\mathbb{R}^{256}$ by Fourier embedding. This vector is added to the query vector to form the initial hidden state $h_i^0$.
- Attention denoising module: the diffusion decoder contains 3 attention denoising layers. Each layer contains a multi-head self-attention layer with 8 attention heads, residual connections with normalization layers, and a feed-forward network.
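The input mapping and noise embedding of the example decoder in [0124] can be checked for shape consistency with a small NumPy sketch. It uses the stated sizes (80 × 4 = 320 → 256 → 256) and the combination $h_i^0=\mathrm{LN}(q_i)+W_\sigma z_\sigma$ of [0100]. The weights are random, and the integer-frequency Fourier embedding is an assumption. The 3-layer, 8-head attention stack is not reproduced here.

```python
import numpy as np

T_F, D_X, D = 80, 4, 256   # sizes from the example above


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def noise_embedding(sigma, dim=D):
    """Fourier embedding of log(sigma); integer frequencies are an assumption."""
    ang = np.log(sigma) * np.arange(1, dim // 2 + 1)
    return np.concatenate([np.sin(ang), np.cos(ang)])


def init_params(seed=0):
    rng = np.random.default_rng(seed)

    def linear(n_in, n_out):
        return rng.normal(0.0, n_in ** -0.5, (n_in, n_out)), np.zeros(n_out)

    return {"l1": linear(T_F * D_X, D), "l2": linear(D, D),
            "w_sigma": rng.normal(0.0, D ** -0.5, (D, D))}


def initial_hidden(y_noisy, sigma, params):
    """y_noisy: (N_a, T_F, D_X) -> h0: (N_a, D)."""
    y = y_noisy.reshape(y_noisy.shape[0], -1)       # flatten to (N_a, 320)
    (w1, b1), (w2, b2) = params["l1"], params["l2"]
    q = np.maximum(y @ w1 + b1, 0.0)                # Linear(320 -> 256) + ReLU
    q = np.maximum(q @ w2 + b2, 0.0)                # Linear(256 -> 256) + ReLU
    z = noise_embedding(sigma)                      # (256,)
    return layer_norm(q) + z @ params["w_sigma"]    # h0 = LN(q) + W_sigma z


params = init_params()
y_noisy = np.random.default_rng(1).normal(size=(6, T_F, D_X))
h0 = initial_hidden(y_noisy, 0.8, params)
print(h0.shape)  # (6, 256)
```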
[0125] Based on the diffusion model, without changing the scene encoding representation space, only the diffusion decoding module is adjusted so that the model generates trajectories containing high-value scene features during the denoising process.
[0126] Specifically, the parameters of the map encoder and trajectory encoder are frozen so that the scene features S are output consistently for any scene. Only the diffusion decoder of the diffusion model is fine-tuned. High-value samples provide the true future trajectories $Y_{hv}$ as the supervision target, and the frozen encoders produce the corresponding condition $S_{hv}$.

During training, forward noising of $Y_{hv}$ yields $Y_{hv}^{\sigma}=Y_{hv}+\sigma\epsilon$. This is fed into the noise prediction network to obtain the denoised output $\hat{Y}_{hv}=D_\theta(Y_{hv}^{\sigma};\sigma,S_{hv})$. The decoder parameters θ are updated by minimizing $\lVert\hat{Y}_{hv}-Y_{hv}\rVert_2^2$. This injects the high-value sample distribution into the reverse denoising process, achieving distribution transfer in the decoding stage.
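The freeze-then-fine-tune pattern of [0126] is illustrated below with a deliberately tiny, hypothetical model:
- a frozen linear "encoder" $S = \text{ctx}\,W_{\mathrm{enc}}$;
- a linear denoiser $D_\theta(Y_\sigma, S)=Y_\sigma A + S B$ with trainable $\theta=(A, B)$.

It omits the EDM preconditioning and uses random placeholder data. It only shows that gradients flow to the decoder parameters while the encoder weights are never touched.

```python
import numpy as np


def finetune_decoder(ctx, Y_hv, W_enc, steps=300, lr=0.1, sigma=0.5, seed=0):
    """Fine-tune a toy linear denoiser while the encoder W_enc stays frozen.

    Hypothetical toy model: S = ctx @ W_enc (frozen) and
    D_theta(Y_sigma, S) = Y_sigma @ A + S @ B, with theta = (A, B).
    """
    rng = np.random.default_rng(seed)
    S = ctx @ W_enc                     # frozen condition S_hv, never updated
    A = np.zeros((Y_hv.shape[1], Y_hv.shape[1]))
    B = np.zeros((S.shape[1], Y_hv.shape[1]))
    losses = []
    for _ in range(steps):
        Y_sigma = Y_hv + sigma * rng.standard_normal(Y_hv.shape)  # forward noising
        resid = Y_sigma @ A + S @ B - Y_hv                        # D_theta - Y_hv
        losses.append(np.mean(resid ** 2))                        # MSE objective
        scale = 2.0 / resid.size
        A -= lr * scale * Y_sigma.T @ resid                       # update theta only
        B -= lr * scale * S.T @ resid
    return losses


rng = np.random.default_rng(0)
W_enc = rng.normal(size=(6, 4))
W_before = W_enc.copy()
losses = finetune_decoder(rng.normal(size=(64, 6)), rng.normal(size=(64, 8)), W_enc)
print(round(losses[0], 2), round(losses[-1], 2))
```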
[0127] The frozen scene feature embedding consists of two parts: $S_{hv}=\{S_{\mathrm{map}}, S_{\mathrm{hist}}\}$
[0128] These features serve as the generation condition in the diffusion decoder.
[0129] Forward noising process.
[0130] Forward noising is applied to the high-value samples to construct noisy samples: $Y_{hv}^{\sigma}=Y_{hv}+\sigma\epsilon$
[0131] where:
- $Y_{hv}\in\mathbb{R}^{N_a\times T_f\times d_x}$ is the true high-value future trajectory, with $N_a$ the number of traffic participants, $T_f$ the number of future prediction steps and $d_x$ the single-step state dimension;
- $\sigma$ is the noise level, representing the noise intensity of the noising process;
- $\epsilon\sim\mathcal{N}(0,I)$ is standard Gaussian noise with zero mean and unit variance.
[0132] Denoising and fine-tuning training: in the reverse process, the noisy sample $Y_{hv}^{\sigma}$, the noise level $\sigma$ and the scene features $S_{hv}$ are fed as conditional input into the diffusion decoder for denoising prediction.
[0133] First, the denoising output is predicted.
[0134] Based on the conditional input, the diffusion decoder predicts the denoising residual: $\hat{R}=F_\theta\bigl(c_{\mathrm{in}}(\sigma)Y_{hv}^{\sigma};h_\sigma,S_{hv}\bigr)$
[0135] where:
- $F_\theta$ is the denoising function of the diffusion decoder, and θ are its trainable parameters;
- $c_{\mathrm{in}}(\sigma)Y_{hv}^{\sigma}$ is the normalized noisy trajectory;
- $h_\sigma$ is the Fourier embedding of the noise level;
- $S_{hv}$ is the high-value scene feature embedding.
[0136] Denoising output and loss function: the reverse denoising process aims to minimize the mean squared error between the denoised output $\hat{Y}_{hv}$ and the true high-value trajectory $Y_{hv}$. The diffusion decoder parameters are updated by minimizing the loss: $\mathcal{L}_{\mathrm{ft}}(\theta)=\mathbb{E}_{\sigma,\epsilon}\bigl[\lVert\hat{Y}_{hv}-Y_{hv}\rVert_2^2\bigr]$
[0137] where:
- $\hat{Y}_{hv}$ is the output of the diffusion decoder, i.e. the trajectory recovered from the noisy sample by denoising;
- $Y_{hv}$ is the true future trajectory of the high-value sample;
- $\lVert\cdot\rVert_2^2$ is the squared L2 norm (mean squared error).
[0138] The high-value feature-guided generation process unfolds based on the sampling stage of the diffusion generation model. Its core idea is to use the high-value feature distribution and the guiding function to constrain the diffusion denoising direction, so that the behavioral features of the generated result converge to the high-value feature region, while keeping the scene feature embedding S output by the map encoder and trajectory encoder as the generation condition.
[0139] Specifically, the fully trained diffusion model is still used in the generation phase. First, the initial noise trajectory $Y_T$ is sampled from a standard Gaussian distribution, with the same dimensions as the future trajectory. At the k-th denoising step, three inputs are fed into the diffusion decoder:
- the current intermediate trajectory $Y_k$;
- the corresponding noise scale $\sigma_k$;
- the scene feature embedding S.

The decoder returns the basic denoising prediction $Y_k^{\mathrm{base}}$. The kinematic features of $Y_k^{\mathrm{base}}$ (vehicle speed v, spacing d, acceleration a and headway THW) are then computed and denoted as the feature vector $F(Y_k^{\mathrm{base}})$.
[0140] The current feature $F(Y_k^{\mathrm{base}})$ is compared with the target feature $F^*$ obtained in advance from the high-value scene samples, represented as the vector of distribution means. A feature guidance function $L_{\mathrm{feat}}$ is built from their difference as the sum of the mean squared errors between the current and target features. The basic denoising result is corrected along the negative gradient of this guidance function with respect to the current trajectory. This gives the guided denoising result $Y_k^{\mathrm{guided}}$, which is used in the denoising iteration of the next time step.

The cycle of denoising prediction, feature evaluation and guided correction is repeated across the denoising steps. The diffusion model thereby progressively generates scenes that match the high-value feature distribution under the constraint of the scene features S, without retraining the model.
[0141] (1) Generate noise trajectory.
[0142] During the generation phase, the initial noise trajectory $Y_T$ is first sampled from a standard Gaussian distribution; its dimensions match those of the future trajectory:
$$Y_T \sim \mathcal{N}(0, I)$$
[0143] where $\mathcal{N}(0, I)$ denotes the standard Gaussian distribution and $Y_T$ is the initial noise trajectory.
[0144] (2) Basic denoising prediction of the diffusion decoder output.
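[0145] At the $k$-th denoising time step, three inputs are fed to the diffusion decoder: the current intermediate trajectory $Y_k$, the corresponding noise scale $\sigma_k$, and the scene feature embedding $S$. The decoder returns the basic denoising prediction $Y_k^{\mathrm{base}}$:
[0146] $$Y_k^{\mathrm{base}} = D_\theta(Y_k, \sigma_k, S)$$
[0147] where $D_\theta$ denotes the trained diffusion decoder and $Y_k^{\mathrm{base}}$ is the basic denoising prediction, i.e. the denoising result at the current time step $k$.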
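The decoder is a trained network whose architecture is not given in this section. The sketch below only illustrates the interface $Y_k^{\mathrm{base}} = D_\theta(Y_k, \sigma_k, S)$, and it does so with a hypothetical closed-form stand-in. If clean trajectories were Gaussian around a scene-dependent mean, the ideal denoiser would shrink $Y_k$ toward that mean by the factor $\sigma_{\mathrm{data}}^2 / (\sigma_{\mathrm{data}}^2 + \sigma_k^2)$. The names `toy_decoder`, `scene_mean` (standing in for $S$), and `SIGMA_DATA` are assumptions for illustration only.

```python
import numpy as np

SIGMA_DATA = 0.1  # hypothetical std of clean trajectories around the scene mean (m)

def toy_decoder(y_k: np.ndarray, sigma_k: float, scene_mean: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for D_theta(Y_k, sigma_k, S).

    For clean data Y_0 ~ N(scene_mean, SIGMA_DATA^2 I) and Y_k = Y_0 + sigma_k * noise,
    this returns the posterior mean E[Y_0 | Y_k], i.e. the ideal denoised estimate.
    """
    shrink = SIGMA_DATA**2 / (SIGMA_DATA**2 + sigma_k**2)
    return scene_mean + shrink * (y_k - scene_mean)

# Example: one vehicle, 10 future steps of (x, y) positions.
rng = np.random.default_rng(0)
scene_mean = np.stack([np.arange(10) * 5.0, np.zeros(10)], axis=1)
y_k = scene_mean + 2.0 * rng.standard_normal(scene_mean.shape)
y_base = toy_decoder(y_k, sigma_k=2.0, scene_mean=scene_mean)
print(np.abs(y_base - scene_mean).max() < np.abs(y_k - scene_mean).max())  # True
```

At $\sigma_k = 0$ the stand-in returns its input unchanged. At very large $\sigma_k$ it returns the scene mean, which mirrors how a real denoiser relies more on the condition as noise grows.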
[0148] (3) Calculation of kinematic characteristics.
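[0149] The kinematic features of the current basic denoised trajectory $Y_k^{\mathrm{base}}$ are computed and assembled into the feature vector $F(Y_k^{\mathrm{base}})$. The features are vehicle speed $v$, distance $d$, acceleration $a$, and time headway THW:
$$F(Y_k^{\mathrm{base}}) = \left[v,\ d,\ a,\ \mathrm{THW}\right]$$
[0150] where $F(Y_k^{\mathrm{base}})$ is the kinematic feature vector computed from the current trajectory $Y_k^{\mathrm{base}}$.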
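The text does not specify how each feature is aggregated over the prediction horizon. The following is a minimal sketch under several assumptions: one generated vehicle trajectory, a known lead-vehicle trajectory that defines the distance $d$, finite-difference derivatives, and time-averaged scalar features. The names `kinematic_features`, `lead_traj`, and `dt` are illustrative.

```python
import numpy as np

def kinematic_features(traj: np.ndarray, lead_traj: np.ndarray,
                       dt: float = 0.1, eps: float = 1e-6) -> np.ndarray:
    """Return F = [v, d, a, THW] for one vehicle, averaged over the horizon.

    traj, lead_traj: (H, 2) arrays of (x, y) positions sampled every dt seconds.
    Assumes d is the distance to a lead vehicle (hypothetical choice).
    """
    vel = np.diff(traj, axis=0) / dt                 # (H-1, 2) finite-difference velocity
    speed = np.linalg.norm(vel, axis=1)              # (H-1,) speed magnitude
    accel = np.diff(speed) / dt                      # (H-2,) rate of change of speed
    gap = np.linalg.norm(lead_traj - traj, axis=1)   # (H,) distance to the lead vehicle
    thw = gap[1:] / np.maximum(speed, eps)           # time headway = gap / own speed
    return np.array([speed.mean(), gap.mean(), accel.mean(), thw.mean()])

# Example: follower at a constant 10 m/s, lead vehicle 20 m ahead at the same speed.
t = np.arange(10) * 0.1
follower = np.stack([10.0 * t, np.zeros_like(t)], axis=1)
lead = follower + np.array([20.0, 0.0])
print(kinematic_features(follower, lead))  # approximately [10, 20, 0, 2]
```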
[0151] (4) Calculation of target features.
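[0152] The target features $F^*$ are obtained in advance from statistics over high-value scene samples and are represented as the mean vector of the distribution:
$$F^{*} = \mu^{*} = \left[\bar{v},\ \bar{d},\ \bar{a},\ \overline{\mathrm{THW}}\right]$$
[0153] where $\mu^*$ is the mean vector of the target features, representing the expected values of the kinematic features in high-value scenes; the bars denote averages over the high-value samples.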
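Here is a minimal sketch of how $F^*$ could be computed. It assumes each high-value sample has already been reduced to one feature vector $[v, d, a, \mathrm{THW}]$, for example as in step (3). The covariance is an extra summary of the joint feature distribution, included here as an assumption; the guidance function above uses only the mean. `target_features` is a hypothetical name.

```python
import numpy as np

def target_features(sample_features: np.ndarray):
    """Summarize high-value samples.

    sample_features: (N, 4) array, one row [v, d, a, THW] per high-value sample.
    Returns (f_star, cov): the mean vector used as F* and the sample covariance.
    """
    X = np.asarray(sample_features, dtype=float)
    f_star = X.mean(axis=0)          # distribution mean vector F* = mu*
    cov = np.cov(X, rowvar=False)    # 4x4 joint spread of the features (assumed summary)
    return f_star, cov

# Two hypothetical high-value samples.
samples = np.array([[10.0, 12.0, -1.0, 1.2],
                    [14.0, 16.0, -3.0, 1.1]])
f_star, cov = target_features(samples)
print(f_star)
```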
[0154] (5) Feature guidance function.
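[0155] The feature guidance function $L_{\mathrm{feat}}$ is constructed from the difference between the current features $F(Y_k^{\mathrm{base}})$ and the target features $F^*$ as the sum of the mean squared errors:
$$L_{\mathrm{feat}} = \sum_{j \in \{v,\, d,\, a,\, \mathrm{THW}\}} \left(F_j(Y_k^{\mathrm{base}}) - F_j^{*}\right)^2$$
[0156] where $L_{\mathrm{feat}}$ is the feature guidance function measuring the difference between the current features and the target features, and $j$ indexes the feature components.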
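The guidance function reduces to a few lines. The sketch below assumes the features are time-averaged scalars, so each component's mean squared error is a single squared difference. If per-time-step feature series were used instead, each term would be averaged over time before summing. The name `feature_guidance_loss` and the example values are hypothetical.

```python
import numpy as np

def feature_guidance_loss(f_current, f_target) -> float:
    """L_feat: sum over components of the squared error between current and target features."""
    diff = np.asarray(f_current, dtype=float) - np.asarray(f_target, dtype=float)
    return float(np.sum(diff ** 2))

# Hypothetical values: current features [v, d, a, THW] vs. target F*.
loss = feature_guidance_loss([12.0, 20.0, -1.0, 1.6], [10.0, 15.0, 0.0, 1.5])
print(round(loss, 4))  # 30.01
```

Because the terms are unweighted, features with large numeric ranges, such as distance in meters, dominate the loss. Any per-feature weighting would be a design choice beyond what the text states.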
[0157] (6) Gradient correction.
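[0158] The basic denoising result is corrected along the gradient of the feature guidance function with respect to the current trajectory. This yields the guided denoising result $Y_k^{\mathrm{guided}}$, which is used in the denoising iteration at the next time step:
$$Y_k^{\mathrm{guided}} = Y_k^{\mathrm{base}} - \eta\, \nabla_{Y_k} L_{\mathrm{feat}}$$
[0159] where $\eta$ is the learning rate and $\nabla_{Y_k} L_{\mathrm{feat}}$ is the gradient of the guidance function $L_{\mathrm{feat}}$ with respect to the current trajectory $Y_k$.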
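$L_{\mathrm{feat}}$ is evaluated on $Y_k^{\mathrm{base}}$, but the gradient is taken with respect to $Y_k$, so the gradient passes through the decoder. Assume $D_\theta$ and $F$ are differentiable and treat trajectories as flattened vectors. Writing $J_D$ and $J_F$ for their Jacobians (notation introduced here for illustration), the chain rule gives:

$$\nabla_{Y_k} L_{\mathrm{feat}} = J_D(Y_k)^{\top}\, J_F\!\left(Y_k^{\mathrm{base}}\right)^{\top}\, 2\left(F(Y_k^{\mathrm{base}}) - F^{*}\right), \qquad J_D = \frac{\partial Y_k^{\mathrm{base}}}{\partial Y_k}, \quad J_F = \frac{\partial F}{\partial Y_k^{\mathrm{base}}}$$

In practice this product would be obtained by automatic differentiation rather than by forming the Jacobians explicitly. A cheaper variant, not described in the text, differentiates only with respect to $Y_k^{\mathrm{base}}$, which amounts to replacing $J_D$ by the identity.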
[0160] (7) Denoising iteration.
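[0161] The above "denoising prediction, feature evaluation, guided correction" process is repeated over multiple denoising time steps $k$. In this way, the diffusion model progressively generates a scene that satisfies the high-value feature distribution under the constraint of the scene features $S$:
$$Y_T \xrightarrow{\;k=T\;} Y_{T-1} \xrightarrow{\;k=T-1\;} \cdots \xrightarrow{\;k=1\;} Y_0$$
[0162] Here each transition computes $Y_k^{\mathrm{base}}$, $F(Y_k^{\mathrm{base}})$, $L_{\mathrm{feat}}$, and $Y_k^{\mathrm{guided}}$ as described in steps (2) to (6), and $Y_0$ is the generated trajectory. In this process the diffusion model is not retrained; instead, it is guided step by step to generate trajectories that satisfy the high-value feature distribution.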
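Steps (1) to (7) can be combined in a small, self-contained sketch. Several pieces are assumptions rather than the method in the text:

- The trained decoder is replaced by a hypothetical Gaussian `toy_decoder`, and the scene embedding $S$ by a prior mean trajectory `scene_mean`.
- The gradient is computed by central finite differences; automatic differentiation would normally be used instead.
- The initial noise is a standard Gaussian scaled by the largest noise level $\sigma_T$. This is an EDM-style convention; with $\sigma_T = 1$ it is exactly the standard Gaussian of step (1).
- The move from $Y_k^{\mathrm{guided}}$ to $Y_{k-1}$ uses a DDIM/Euler-style re-noising step, because the text does not spell out that update.

The scenario (a follower 30 m behind a lead vehicle, with a target gap of 15 m) and all numbers are illustrative.

```python
import numpy as np

DT = 0.5          # hypothetical time step between trajectory points (s)
SIGMA_DATA = 0.1  # hypothetical spread of clean trajectories around the scene mean (m)

def toy_decoder(y_k, sigma_k, scene_mean):
    """Stand-in for D_theta(Y_k, sigma_k, S): exact denoiser for Gaussian toy data."""
    shrink = SIGMA_DATA**2 / (SIGMA_DATA**2 + sigma_k**2)
    return scene_mean + shrink * (y_k - scene_mean)

def features(traj, lead, dt=DT, eps=1e-6):
    """F = [v, d, a, THW], time-averaged, with d measured to a lead vehicle (assumed)."""
    speed = np.linalg.norm(np.diff(traj, axis=0) / dt, axis=1)
    gap = np.linalg.norm(lead - traj, axis=1)
    return np.array([speed.mean(), gap.mean(), (np.diff(speed) / dt).mean(),
                     (gap[1:] / np.maximum(speed, eps)).mean()])

def feat_loss(traj, lead, f_star):
    """L_feat as the sum of squared feature errors."""
    return float(np.sum((features(traj, lead) - f_star) ** 2))

def grad_wrt_yk(y_k, sigma_k, scene_mean, lead, f_star, h=1e-4):
    """Central finite differences of L_feat(D(Y_k)) w.r.t. Y_k (swap-in for autodiff)."""
    grad = np.zeros_like(y_k)
    for idx in np.ndindex(*y_k.shape):
        step = np.zeros_like(y_k)
        step[idx] = h
        up = feat_loss(toy_decoder(y_k + step, sigma_k, scene_mean), lead, f_star)
        down = feat_loss(toy_decoder(y_k - step, sigma_k, scene_mean), lead, f_star)
        grad[idx] = (up - down) / (2 * h)
    return grad

def guided_sample(scene_mean, lead, f_star, sigmas, eta, seed=0):
    rng = np.random.default_rng(seed)
    y = sigmas[0] * rng.standard_normal(scene_mean.shape)       # initial noise Y_T
    for k in range(len(sigmas) - 1):
        sigma_k, sigma_next = sigmas[k], sigmas[k + 1]
        y_base = toy_decoder(y, sigma_k, scene_mean)             # basic denoising prediction
        y_guided = y_base
        if eta > 0:                                              # guided correction
            y_guided = y_base - eta * grad_wrt_yk(y, sigma_k, scene_mean, lead, f_star)
        # Assumed DDIM/Euler-style move to the next noise level using the guided estimate.
        y = y_guided + (sigma_next / sigma_k) * (y - y_guided)
    return y

H = 10
t = np.arange(H) * DT
scene_mean = np.stack([10.0 * t, np.zeros(H)], axis=1)          # follower cruising at 10 m/s
lead = np.stack([30.0 + 10.0 * t, np.zeros(H)], axis=1)         # lead vehicle 30 m ahead
f_star = np.array([10.0, 15.0, 0.0, 1.5])                       # hypothetical target F*
sigmas = np.append(np.geomspace(20.0, 0.01, 25), 0.0)           # assumed noise schedule
plain = guided_sample(scene_mean, lead, f_star, sigmas, eta=0.0)
guided = guided_sample(scene_mean, lead, f_star, sigmas, eta=0.1)
print(feat_loss(plain, lead, f_star), feat_loss(guided, lead, f_star))
```

With the same random seed, the guided run ends with a lower $L_{\mathrm{feat}}$ than the unguided run, because the correction shifts the follower forward toward the target gap. The decoder itself is never modified, which mirrors the no-retraining property stated above.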
[0163] Based on the above technical solutions, this application provides a method and system for generating high-value scenarios for testing autonomous driving systems. The method includes the following steps: First, lanes are obtained from the original trajectory and lane allocation is performed; then, Frenet features are extracted based on the lane allocation; next, relative motion relationships and kinematic parameters between vehicles are obtained based on the extracted Frenet features; then, high-value scenarios are obtained by scene segmentation based on the relative motion relationships and kinematic parameters between vehicles; the high-value scenarios include cutting in, following, single-vehicle poor driving behavior, and complex multi-vehicle interaction behavior; then, the trajectories and maps corresponding to the identified high-value scenarios are saved as high-value scenario samples; based on the high-value scenario samples, a high-value feature distribution is obtained; next, the map structure, lane geometry information, traffic participant trajectories, and interaction relationships are uniformly encoded into a high-dimensional scene representation; finally, a diffusion model is used to model the natural driving scenario, using the scene feature embedding jointly output by the map encoder and trajectory encoder as the generation condition, and the high-value feature distribution is used to guide the diffusion denoising direction constrained by the guiding function to generate vehicle behavior. This application provides a method for automatically mining high-value traffic scenarios from real driving data and generating high-risk, high-fidelity test scenarios based on the mining results and a diffusion model. 
By systematically identifying and structurally representing high-value scenarios and then expanding them through feature-driven generation, the method closes the loop between real-world mining and generative augmentation. It provides realistic, dangerous, and representative test scenarios for autonomous driving systems and improves the efficiency and coverage of system verification.
[0164] Those skilled in the art will understand that the above-described embodiments are specific examples of implementing this application, and in practical applications, various changes in form and detail may be made without departing from the spirit and scope of this application. Any person skilled in the art can make their own modifications and alterations without departing from the spirit and scope of this application; therefore, the scope of protection of this application should be determined by the scope defined in the claims.
Claims
1. A method for generating high-value scenarios for testing autonomous driving systems, characterized in that it includes the following steps: obtaining lanes from the original trajectory and performing lane assignment, then extracting Frenet features based on the lane assignment; obtaining the relative motion relationships and kinematic parameters between vehicles based on the extracted Frenet features; performing scene segmentation based on the relative motion relationships and kinematic parameters between vehicles to obtain high-value scenarios, the high-value scenarios including cutting in, following, poor driving behavior of a single vehicle, and complex multi-vehicle interaction behavior; saving the trajectories and maps corresponding to the identified high-value scenarios as high-value scenario samples, and obtaining a high-value feature distribution based on the high-value scenario samples; uniformly encoding the map structure, lane geometry information, traffic participant trajectories, and interaction relationships into a high-dimensional scene representation; and modeling natural driving scenarios with a diffusion model, using the scene feature embedding output by the map encoder and trajectory encoder as the generation condition, and using the high-value feature distribution and a guidance function to constrain the diffusion denoising direction so as to generate vehicle behavior.
2. The method for generating high-value scenarios for testing autonomous driving systems according to claim 1, characterized in that obtaining lanes from the original trajectory and performing lane assignment includes: assigning the vehicle's position in the world coordinate system to the lane to which it belongs, based on the lane centerline information in the map, the lane centerline information including position, heading, and speed limit.
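The claim does not state the matching rule. One plausible sketch assigns the vehicle to the nearest lane centerline whose local heading agrees with the vehicle's heading. The nearest-point search, the `max_heading_diff` threshold, and the `lanes` dictionary layout are all hypothetical choices. The speed limit is stored with each lane, as the claim describes, but this sketch does not use it.

```python
import numpy as np

def wrap_angle(angle):
    """Map an angle (rad) into [-pi, pi)."""
    return (angle + np.pi) % (2 * np.pi) - np.pi

def assign_lane(position, heading, lanes, max_heading_diff=np.pi / 4):
    """Return the id of the nearest lane whose centerline is roughly co-directional.

    lanes: dict lane_id -> {"points": (N, 2) centerline positions,
                            "headings": (N,) centerline headings (rad),
                            "speed_limit": float (unused in this sketch)}
    """
    best_id, best_dist = None, np.inf
    for lane_id, lane in lanes.items():
        dists = np.linalg.norm(lane["points"] - position, axis=1)
        i = int(np.argmin(dists))                  # closest centerline sample
        if abs(wrap_angle(heading - lane["headings"][i])) > max_heading_diff:
            continue                               # lane points the other way
        if dists[i] < best_dist:
            best_id, best_dist = lane_id, dists[i]
    return best_id

xs = np.linspace(0.0, 50.0, 51)

def straight_lane(y, heading):
    """Hypothetical straight lane sampled every 1 m."""
    return {"points": np.stack([xs, np.full_like(xs, y)], axis=1),
            "headings": np.full_like(xs, heading), "speed_limit": 16.7}

lanes = {"A": straight_lane(0.0, 0.0), "B": straight_lane(3.5, 0.0),
         "C": straight_lane(-3.5, np.pi)}
print(assign_lane(np.array([5.2, 0.4]), 0.0, lanes))    # A
print(assign_lane(np.array([5.2, -3.0]), 0.0, lanes))   # A (C is nearer but opposite)
```

The heading check keeps a vehicle near the road center from being matched to an oncoming lane that happens to be geometrically closer.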
3. The method for generating high-value scenarios for testing autonomous driving systems according to claim 1, characterized in that performing Frenet feature extraction based on the lane assignment includes: after vehicle lane assignment is completed, constructing a Frenet coordinate system with the centerline of the lane to which the vehicle belongs as the reference curve, so as to characterize the motion state of the vehicle relative to the lane and obtain Frenet features.
4. The method for generating high-value scenarios for testing autonomous driving systems according to claim 3, characterized in that obtaining the Frenet features includes: projecting the vehicle's position in the global coordinate system onto the centerline of its lane, defining the cumulative arc length of the projection point along the lane centerline as the vehicle's longitudinal coordinate, and defining the perpendicular distance between the vehicle's position and the projection point as the lateral offset, the sign of the lateral offset being determined by the normal direction of the lane centerline; in the Frenet coordinate system, decomposing the vehicle's velocity vector in the global coordinate system into a longitudinal velocity along the lane tangent direction and a lateral velocity along the normal direction; and computing the vehicle's longitudinal acceleration from the difference in longitudinal velocity between adjacent time frames, thereby obtaining the Frenet features.
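The projection in claim 4 can be sketched for a centerline stored as a polyline. Several choices here are assumptions: the polyline representation, the left-pointing normal as the positive lateral direction, and the names `to_frenet` and `longitudinal_acceleration`. The sketch also assumes no repeated consecutive vertices, since a zero-length segment would divide by zero.

```python
import numpy as np

def to_frenet(position, velocity, centerline):
    """Project a global (x, y) position and velocity onto a polyline lane centerline.

    Returns (s, l, v_s, v_l): arc length, signed lateral offset (positive to the left
    of the driving direction), longitudinal speed, lateral speed.
    Assumes no repeated consecutive vertices (no zero-length segments).
    """
    seg = np.diff(centerline, axis=0)                     # segment vectors
    seg_len = np.linalg.norm(seg, axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg_len)])      # arc length at each vertex
    best = (np.inf, 0, 0.0)                               # (distance, segment, fraction)
    for i in range(len(seg)):
        frac = np.clip(np.dot(position - centerline[i], seg[i]) / seg_len[i] ** 2, 0.0, 1.0)
        dist = np.linalg.norm(position - (centerline[i] + frac * seg[i]))
        if dist < best[0]:
            best = (dist, i, frac)
    _, i, frac = best
    tangent = seg[i] / seg_len[i]
    normal = np.array([-tangent[1], tangent[0]])          # left-pointing normal
    foot = centerline[i] + frac * seg[i]                  # projection point
    s = arc[i] + frac * seg_len[i]                        # cumulative arc length
    l = float(np.dot(position - foot, normal))            # signed lateral offset
    return s, l, float(np.dot(velocity, tangent)), float(np.dot(velocity, normal))

def longitudinal_acceleration(v_s_series, dt):
    """Difference of longitudinal speed between adjacent frames, divided by dt."""
    return np.diff(np.asarray(v_s_series, dtype=float)) / dt

centerline = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]])
print(to_frenet(np.array([12.0, 1.5]), np.array([8.0, 0.5]), centerline))
print(longitudinal_acceleration([8.0, 8.5, 9.0], dt=0.5))
```

On a curved road, a densely sampled centerline keeps the piecewise-linear tangent close to the true lane direction.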
5. The method for generating high-value scenarios for testing autonomous driving systems according to claim 1, characterized in that the Frenet features include the vehicle's longitudinal coordinate, lateral offset, longitudinal velocity, lateral velocity, and longitudinal acceleration, which are used to characterize the vehicle's motion along the lane direction.
6. The method for generating high-value scenarios for testing autonomous driving systems according to claim 1, characterized in that obtaining the relative motion relationships and kinematic parameters between vehicles based on the extracted Frenet features includes: after lane assignment and Frenet feature extraction are completed, calculating the relative motion relationships between different vehicles at the same timestamp, the relative motion relationships including the longitudinal relative distance, the lateral relative distance, and the relative speed between the vehicles, wherein the longitudinal and lateral relative distances describe the spatial proximity of the two vehicles and the relative speed describes their speed difference and tendency to approach each other; the kinematic parameters are a set of parameters describing the vehicle's own motion state and the intensity of its changes, including vehicle position, heading, velocity, and longitudinal acceleration.
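Here is a minimal sketch of the relative motion quantities in claim 6. It assumes both vehicles' Frenet states at the same timestamp are expressed along a shared reference line. It also uses a sign convention in which a positive closing speed means the vehicles are approaching. `FrenetState` and `relative_motion` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class FrenetState:
    s: float    # longitudinal coordinate along the shared reference line (m)
    l: float    # lateral offset (m)
    v_s: float  # longitudinal speed (m/s)

def relative_motion(ego: FrenetState, other: FrenetState):
    """Return (longitudinal distance, lateral distance, closing speed).

    Closing speed > 0 means the two vehicles are approaching each other.
    """
    ds = other.s - ego.s          # > 0: the other vehicle is ahead
    dl = other.l - ego.l          # > 0: the other vehicle is to the left
    # The rear vehicle closes in when it is faster than the front vehicle.
    closing = ego.v_s - other.v_s if ds >= 0 else other.v_s - ego.v_s
    return ds, dl, closing

print(relative_motion(FrenetState(100.0, 0.0, 15.0), FrenetState(120.0, 3.5, 10.0)))
# (20.0, 3.5, 5.0)
```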
7. The method for generating high-value scenarios for testing autonomous driving systems according to claim 1, characterized in that performing scene segmentation based on the relative motion relationships and kinematic parameters between vehicles to obtain high-value scenes includes: for each vehicle, extracting the trajectory start point, end point, and average heading as clustering features, and using the DBSCAN clustering algorithm to partition the vehicles into sets, each set constituting a behaviorally independent sub-scene; and, after the sub-scene division is completed, identifying high-risk behaviors within each sub-scene to obtain high-value scenarios.
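The sub-scene split in claim 7 can be sketched with a plain NumPy DBSCAN, written out here instead of calling a library so the example stays self-contained. The claim names the clustering features but not how they are combined, so this sketch makes two assumptions. First, the mean heading is a circular mean, encoded as a unit vector scaled by a hypothetical `heading_weight` so that it is commensurate with positions in meters. Second, the values of `eps` and `min_samples` are illustrative. The subsequent high-risk behavior identification is not sketched.

```python
import numpy as np

def clustering_features(trajs, heading_weight=10.0):
    """Start point, end point and mean heading (as a scaled unit vector) per vehicle."""
    rows = []
    for traj in trajs:                                  # traj: (T, 2) positions
        step = np.diff(traj, axis=0)
        ang = np.arctan2(step[:, 1], step[:, 0])
        mean_ang = np.arctan2(np.sin(ang).mean(), np.cos(ang).mean())  # circular mean
        direction = heading_weight * np.array([np.cos(mean_ang), np.sin(mean_ang)])
        rows.append(np.concatenate([traj[0], traj[-1], direction]))
    return np.array(rows)

def dbscan(X, eps, min_samples):
    """Plain DBSCAN; returns a cluster label per row, -1 for noise."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(len(X))]
    labels = np.full(len(X), -1)
    cluster = 0
    for i in range(len(X)):
        if labels[i] != -1 or len(neighbors[i]) < min_samples:
            continue                                    # already clustered or not a core point
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_samples:    # only core points expand the cluster
                    queue.extend(neighbors[j])
        cluster += 1
    return labels

def line(start, end, n=6):
    """Hypothetical straight trajectory with n samples."""
    return np.linspace(start, end, n)

# Three eastbound vehicles side by side, two westbound vehicles on the other carriageway.
trajs = [line([0, 0], [50, 0]), line([5, 3.5], [55, 3.5]), line([10, 0], [60, 0]),
         line([60, 10], [10, 10]), line([55, 13.5], [5, 13.5])]
print(dbscan(clustering_features(trajs), eps=10.0, min_samples=2))  # [0 0 0 1 1]
```

Encoding the heading as a vector rather than a raw angle avoids the wrap-around at $\pm\pi$, where two nearly identical headings would otherwise look maximally different.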
8. The method for generating high-value scenarios for testing autonomous driving systems according to claim 1, characterized in that obtaining a high-value feature distribution based on the high-value scene samples includes: performing joint distribution modeling of the kinematic features of the high-value samples to characterize the strength of the interaction structure and to provide a statistical basis for risk regions and interaction constraints in the generation stage, the kinematic features including vehicle speed, inter-vehicle distance, acceleration, and time headway; and storing the resulting kinematic feature statistics as the high-value feature distribution.
9. The method for generating high-value scenarios for testing autonomous driving systems according to claim 1, characterized in that the scene feature embedding is jointly output by a map encoder and a trajectory encoder and is represented as $S \in \mathbb{R}^{32 \times 256}$, where 32 is the number of encoding units in the scene and 256 is the dimension of the scene features.
10. A high-value scenario generation system for autonomous driving system testing, the system being used to implement the high-value scenario generation method for autonomous driving system testing according to any one of claims 1 to 9, characterized in that it comprises a data processing and behavior recognition module, a high-value scene sample and high-value feature distribution module, and an interactive encoding module connected in sequence. The data processing and behavior recognition module is used to obtain lanes from the original trajectory, perform lane assignment, and extract Frenet features based on the lane assignment; to obtain the relative motion relationships and kinematic parameters between vehicles based on the extracted Frenet features; and to perform scene segmentation based on the relative motion relationships and kinematic parameters between vehicles to obtain high-value scenes, the high-value scenes including cutting in, following, poor driving behavior of a single vehicle, and complex multi-vehicle interaction behavior. The high-value scene sample and high-value feature distribution module is used to save the trajectories and maps corresponding to the identified high-value scenes as high-value scene samples and to obtain the high-value feature distribution based on the high-value scene samples. The interactive encoding module is used to encode the map structure, lane geometry information, traffic participant trajectories, and interaction relationships into a high-dimensional scene representation; to model natural driving scenes with a diffusion model, using the scene feature embedding jointly output by the map encoder and trajectory encoder as the generation condition; and to use the high-value feature distribution and the guidance function to constrain the diffusion denoising direction so as to generate vehicle behavior.