An automatic driving long tail scene construction method and device

CN122242231A · Pending · Publication Date: 2026-06-19 · BEIJING HETENGTUZHI TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: BEIJING HETENGTUZHI TECH CO LTD
Filing Date: 2026-03-19
Publication Date: 2026-06-19


IPC: G06F30/27; G01S13/86; G01S13/88; G01S17/88; G08G1/01; G08G1/0967; G06F30/15; G06F18/25; G06N3/0455; G06N3/0475; G06N3/084; G06N3/0985; G06F111/04; G06F119/14; G06F111/08


AI Technical Summary

Technical Problem

Existing technologies cannot effectively generate high-quality long-tail scenarios for autonomous driving, so autonomous driving systems are insufficiently trained and safety-verified in extremely low-probability scenarios. Furthermore, the generated scenarios struggle to meet the compliance and reproducibility requirements of high-level autonomous driving systems.

Method used

Real driving log data are collected to construct a structured baseline scene representation and a controllable element parameter space. Candidate scenes are generated with a conditional diffusion model whose denoising is steered by constraint-guided and risk-guided gradients. These scenes are then verified and corrected by a dynamics verifier and a traffic rule verifier. Finally, long-tail scoring and standardized encapsulation are performed.

Benefits of technology

It enables targeted generation of long-tail scenarios and improves the controllability and effectiveness of scenario generation. The generated scenarios conform to physical dynamics and traffic rules, exhibit high-risk characteristics, and meet the safety verification and training requirements of high-level autonomous driving systems.

✦ Generated by Eureka AI based on patent content.

Figure CN122242231A_ABST (abstract drawing)


Abstract

This invention provides a method and apparatus for constructing long-tail scenarios for autonomous driving. Baseline scenarios are extracted from real driving data, a structured representation and a parameter space of controllable elements are constructed, and counterfactual constraints consisting of an intervention set and a constraint set are defined. With the baseline scenarios and counterfactual constraints as generation conditions, a diffusion model generates candidate long-tail scenarios while, during denoising, constraint-guided gradients that enforce compliance with physical rules and risk-guided gradients that raise the risk value are superimposed simultaneously. The candidates are then repaired frame by frame through dynamics and traffic rule verification, and low-value and invalid scenarios are eliminated based on long-tail scores. Finally, the target scenarios are standardized, encapsulated, and stored. The invention improves the controllability, effectiveness, and realism of scenario generation, enables targeted generation of long-tail scenarios that yield training gains, ensures the reproducibility and auditability of scenarios, and significantly improves the efficiency and compliance of safety verification for autonomous driving systems.


Description

Technical Field

[0001] This invention relates to the field of autonomous driving data processing technology, and in particular to a method and apparatus for constructing long-tail scenarios for autonomous driving.

Background Technology

[0002] One of the core bottlenecks in the evolution of advanced autonomous driving systems from Level 3 to Level 4/5 lies in insufficient coverage and verification of long-tail safety scenarios. Many safety incidents and near misses in autonomous driving systems stem from extremely low-probability scenarios such as extreme weather, sudden intrusion from behind obstructions, abnormal behavior of other road users, and the superposition of multiple risk factors. These scenarios account for a very small share of real-world road testing data, and relying solely on large-scale real-world testing faces practical challenges such as high costs, long coverage periods, and the inability to reproduce extremely dangerous scenarios.

[0003] To address the aforementioned issues, the industry has proposed various construction schemes for autonomous driving scenarios, which can be mainly categorized into the following three types: The first category is scene generation schemes based on rule scripts or traffic simulators. This scheme generates scene variants by pre-setting the range of scene parameters and using parameter traversal or orthogonal experiments, then imports them into the simulator for testing. Its advantage lies in the strong controllability of the generation process. However, because scene representation is limited by pre-set rules, this scheme struggles to cover complex, long-tailed scenes with multiple coupled factors, and it cannot achieve controllable causal editing of single variables. This makes the generated scenes unsuitable for refined defect attribution in autonomous driving models.

[0004] The second category is scene generation schemes based on generative adversarial networks (GANs) or general diffusion models. These schemes use real driving data as training samples and generate diverse scene images or trajectory sequences with generative models to augment the data. Although this approach can produce high-fidelity scene data, its generation process lacks the causal constraints and risk orientation specific to the autonomous driving operational design domain, so it readily produces invalid scenarios that are physically infeasible or violate traffic rules. Furthermore, it cannot selectively generate high-risk long-tail scenarios, which limits its gains in improving the performance of autonomous driving models.

[0005] The third category is image-level enhancement schemes based on real data. These schemes can only achieve pixel-level modifications such as lighting and weather, and cannot change the interactive behavior and causal logic of traffic participants in the scene. Therefore, their effect on improving the training of end-to-end autonomous driving decision-making models is extremely limited.

[0006] Therefore, there is an urgent need for a new autonomous driving scenario generation solution to meet the core requirements of high-level autonomous driving systems for constructing long-tail scenarios.

Summary of the Invention

[0007] In view of this, embodiments of the present invention provide a method and apparatus for constructing long-tail scenes for autonomous driving, which solves the problem that the prior art cannot perform high-quality targeted generation for long-tail scenes of autonomous driving.

[0008] One aspect of the present invention provides a method for constructing long-tail scenarios for autonomous driving, comprising the following steps: collecting the full set of real driving log data and extracting multiple baseline scene segments that meet the long-tail scenario criteria of the autonomous driving system; constructing a structured baseline scene representation for the baseline scene segments and marking a controllable element parameter space, wherein the baseline scene representation is a data sequence obtained by sensing the vehicle's own state and the states of traffic participants with multiple types of sensing devices and aligning them with the environmental state and map data, and the controllable element parameter space is the set of elements in the baseline scene representation that can be intervened on, together with their value ranges; constructing, based on the controllable element parameter space, counterfactual constraints comprising an intervention set and a constraint set, wherein the intervention set marks the elements allowed to be modified in the current generation run and the constraint set marks the elements that must remain unchanged; using the baseline scene representation and the counterfactual constraints as generation conditions, sampling with a conditional diffusion model to generate candidate long-tail scene representations, wherein constraint-guided gradients and risk-guided gradients are established to correct the denoising direction; invoking a dynamics verifier and a traffic rule verifier to verify the candidate long-tail scene representations frame by frame, projecting non-compliant scene parameters into the feasible domain, and smoothing the trajectories to correct the parameters; calculating, based on a predefined long-tail scoring model, long-tail scores for the parameter-corrected candidate long-tail scene representations, and eliminating risk-free low-value scenes and invalid scenes that are bound to collide, i.e., those whose scores fall outside the preset scoring range, to obtain multiple target long-tail scene representations; and standardizing, encapsulating, and storing the target long-tail scene representations for later use.

[0009] In some embodiments, the vehicle's own state and the traffic participants' states in the baseline scene representation are expressed as a bird's-eye-view (BEV) representation of data generated by multiple types of sensing devices, including one or more of cameras, LiDAR, and/or millimeter-wave radar; the vehicle's own state includes vehicle position, speed, heading, acceleration, power parameters, and control parameters; the traffic participant's state includes participant position, speed, heading, acceleration, and an intent label; the map data includes lane topology, passable area markers, and traffic rule markers. The controllable element parameter space includes static environmental elements, dynamic interactive elements, and traffic rule elements. The static environmental elements include weather visibility, light intensity, road surface adhesion coefficient, and the location and size of obstructions. The dynamic interactive elements include traffic participant type, time of appearance, movement trajectory, time of crossing the road, and speed. The traffic rule elements include traffic light phase, speed limit, right-of-way rules, and the location of construction zones.
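A minimal Python sketch of how the controllable element parameter space might be held in memory, assuming each numeric element carries a category (static environment, dynamic interaction, or traffic rule) and a closed value range. The class `ParameterSpace`, the element names, and the ranges are hypothetical; categorical elements such as participant type or traffic light phase would need a set of allowed values instead of an interval and are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


# Hypothetical registry of controllable elements; names and ranges are illustrative only.
@dataclass
class ParameterSpace:
    # element name -> (category, (low, high)); category is "static", "dynamic" or "rule"
    elements: Dict[str, Tuple[str, Tuple[float, float]]] = field(default_factory=dict)

    def add(self, name: str, category: str, low: float, high: float) -> None:
        """Register a numeric element with its admissible [low, high] range."""
        if category not in ("static", "dynamic", "rule"):
            raise ValueError(f"unknown category: {category}")
        if low > high:
            raise ValueError(f"empty range for {name}")
        self.elements[name] = (category, (low, high))

    def clamp(self, name: str, value: float) -> float:
        """Clip a proposed value into the element's admissible range."""
        _, (low, high) = self.elements[name]
        return min(max(value, low), high)

    def by_category(self, category: str) -> List[str]:
        """Names of all elements in one category, sorted alphabetically."""
        return sorted(n for n, (c, _) in self.elements.items() if c == category)


space = ParameterSpace()
space.add("visibility_m", "static", 50.0, 10000.0)          # weather visibility
space.add("road_adhesion", "static", 0.1, 1.0)              # road surface adhesion coefficient
space.add("pedestrian_appear_time_s", "dynamic", 0.0, 8.0)  # time of appearance
space.add("speed_limit_kmh", "rule", 20.0, 120.0)           # speed limit
print(space.clamp("road_adhesion", 1.5))  # 1.0
```

Clamping proposals into the registered range is one simple way to keep interventions inside the parameter space.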

[0010] In some embodiments, the conditional diffusion model uses a denoising diffusion implicit model (DDIM) or a denoising diffusion probabilistic model (DDPM) as the backbone network, with 20 to 100 diffusion sampling steps and a classifier-free guidance weight of 1 to 4.

[0011] In some embodiments, the calculation of the constraint-guided gradient includes the following steps (the formulas themselves are not reproduced in the available text). First, the degree of violation of the dynamics constraints, $\mathcal{L}_{\text{dyn}}$, is calculated, where $a_t$ denotes the acceleration at time $t$, $a_{\max}$ the maximum permissible acceleration, $j_t$ the jerk at time $t$, $j_{\max}$ the maximum permissible jerk, $\kappa_t$ the curvature of the vehicle trajectory at time $t$, and $\kappa_{\max}$ the maximum permissible curvature. Second, the degree of violation of the traffic rule constraints is calculated: the violation of the minimum safe following distance constraint, $\mathcal{L}_{\text{dist}}$, is calculated, where $d_{\text{safe}}$ is the standard safe distance and $d_t$ is the distance to the vehicle ahead at time $t$; if the current traffic light phase is red and the vehicle's distance from the stop line is $d_{\text{stop}}$, the violation of the red-light stopping constraint, $\mathcal{L}_{\text{red}}$, is calculated, where $v_t$ is the current speed and $d_0$ is a preset standard distance; and these are combined into the traffic rule violation degree $\mathcal{L}_{\text{rule}}$. Third, the first differentiable loss $\mathcal{L}_c$ is calculated with weighting coefficients $\lambda_1$, $\lambda_2$, and $\lambda_3$. Finally, the constraint-guided gradient is calculated from $\mathcal{L}_c$.

[0012] In some embodiments, the calculation of the risk-guided gradient includes: feeding the candidate long-tail scene representation generated by the current conditional diffusion model into a pre-trained risk evaluator, which outputs the probability of a future collision or violation, and constructing a second differentiable loss from this probability; and calculating the risk-guided gradient from the second differentiable loss (formula not reproduced in the available text).

[0013] In some embodiments, the pre-training of the risk evaluator includes: collecting samples from real driving scenarios, each sample including a sample scene representation within a set time window and labeled according to whether a collision or violation occurs within a specified future time range, to construct a training sample set, wherein the sample scene representation is a data sequence obtained by sensing the vehicle's own state and the states of traffic participants with multiple types of sensing devices and aligning them with the environmental state and map data; and training an initial neural network with the training sample set, wherein the initial neural network is a spatiotemporal graph neural network or a Transformer-based neural network with multi-agent attention that takes the sample scene representation within the set time window as input and outputs a prediction of whether a collision or violation will occur within the specified future time range, and the parameters of the initial neural network are updated based on the binary cross-entropy loss and the focal loss between the prediction and the label, yielding the risk evaluator.

[0014] In some embodiments, the dynamics verifier and the traffic rule verifier are groups of discrete verification logic driven by physical rules, motion control rules, and traffic rules. Projecting the parameters of non-compliant scenes into the feasible region and smoothing the trajectory to correct the parameters includes: for speed, acceleration, and yaw rate state variables that exceed their limits, truncating them at the maximum threshold and re-integrating the states; and for trajectory points that cross the boundary of the passable area, taking the point on the boundary with the smallest Euclidean distance as the repair point, and smoothing the trajectory points before and after the repair by spatial geometric interpolation using B-spline curves or the least squares method.
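The two repair operations can be sketched in Python under simplifying assumptions: a one-dimensional speed profile integrated with forward Euler, a passable-area boundary given as a polyline, and the out-of-bounds points already identified. All function names are hypothetical, and `smooth` is a plain 3-point moving average used as a stand-in for the B-spline or least-squares smoothing named in the text.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def clip_and_reintegrate(v0: float, accels: List[float], a_max: float,
                         dt: float) -> Tuple[List[float], List[float]]:
    """Truncate |a| at a_max, then re-integrate speed so the states stay consistent."""
    clipped = [max(-a_max, min(a, a_max)) for a in accels]
    speeds = [v0]
    for a in clipped:
        # forward-Euler integration; speed floored at 0 (assumes no reversing)
        speeds.append(max(0.0, speeds[-1] + a * dt))
    return clipped, speeds


def nearest_on_segment(p: Point, a: Point, b: Point) -> Point:
    """Euclidean projection of p onto the segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return a
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return (a[0] + t * dx, a[1] + t * dy)


def nearest_on_boundary(p: Point, boundary: List[Point]) -> Point:
    """Closest point on a polyline boundary, used as the repair point."""
    candidates = [nearest_on_segment(p, boundary[i], boundary[i + 1])
                  for i in range(len(boundary) - 1)]
    return min(candidates, key=lambda q: math.dist(p, q))


def smooth(points: List[Point], passes: int = 1) -> List[Point]:
    """3-point moving average with fixed endpoints (stand-in for B-spline/least squares)."""
    pts = list(points)
    if len(pts) < 3:
        return pts
    for _ in range(passes):
        inner = [((pts[i - 1][0] + pts[i][0] + pts[i + 1][0]) / 3,
                  (pts[i - 1][1] + pts[i][1] + pts[i + 1][1]) / 3)
                 for i in range(1, len(pts) - 1)]
        pts = [pts[0]] + inner + [pts[-1]]
    return pts
```

Re-integrating after clipping matters because truncating acceleration alone would leave the stored speeds inconsistent with it.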

[0015] In some embodiments, the long-tail scoring model (formula not reproduced in the available text) combines three terms weighted by coefficients $w_1$, $w_2$, and $w_3$: $p$, the collision or violation probability output by the risk evaluator for the candidate long-tail scene, with a value range of [0, 1]; $r$, the rarity of the candidate long-tail scene in the real-road sampling data, with a value range of [0, 1]; and $u$, the prediction uncertainty of the autonomous driving model to be optimized for the candidate long-tail scene, with a value range of [0, 1]. The preset scoring range is [0.2, 0.8]: a long-tail score below 0.2 indicates a risk-free, low-value scene, while a score above 0.8 indicates an invalid scene that is bound to collide or violate the rules.
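The scoring formula itself is not reproduced above. One natural reading, assumed here rather than taken from the source, is a weighted sum $S = w_1 p + w_2 r + w_3 u$, where $p$ is the risk evaluator's collision or violation probability, $r$ the rarity, and $u$ the prediction uncertainty, with non-negative weights summing to 1 so that $S$ stays in [0, 1]. The default weights and the function names below are illustrative.

```python
from typing import Sequence


def long_tail_score(p_risk: float, rarity: float, uncertainty: float,
                    weights: Sequence[float] = (0.5, 0.3, 0.2)) -> float:
    """Assumed weighted-sum form of the long-tail score; weights are illustrative."""
    for name, v in (("p_risk", p_risk), ("rarity", rarity), ("uncertainty", uncertainty)):
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {v}")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1 so the score stays in [0, 1]")
    w1, w2, w3 = weights
    return w1 * p_risk + w2 * rarity + w3 * uncertainty


def triage(score: float, low: float = 0.2, high: float = 0.8) -> str:
    """Apply the preset scoring range [low, high] from the description."""
    if score < low:
        return "low_value"   # risk-free, discarded
    if score > high:
        return "invalid"     # bound to collide or violate, discarded
    return "keep"
```

Because the range is inclusive, a scene scoring exactly 0.2 or 0.8 is kept.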

[0016] In some embodiments, standardizing and encapsulating the target long-tail scene representation includes: adding fully traceable fields to the target long-tail scene representation, including a unique scene identifier, the long-tail score, the random seed of the counterfactual generation, the difference vector of the intervened elements, the physical verification result, a risk score, the version of the conditional diffusion model, a manifest hash value, a generation timestamp, and a hash value of the baseline scene; performing risk labeling, in which the risk type, the time of risk occurrence, the coordinates of the collision or violation risk point, the safety margin, and the expected takeover and/or obstacle avoidance action triggered by the autonomous driving system are labeled frame by frame for the target long-tail scene representation, generating a scene risk heatmap; and synchronously generating an immutable audit record for each target long-tail scene representation. The audit records are linked in a hash chain, in which the hash value of the current frame is obtained by hashing the hash value of the previous frame together with the audit record content of the current frame, and the hash value is digitally signed to achieve tamper resistance and end-to-end traceability.
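The hash-chained audit record can be sketched with Python's standard library. The record fields in the usage below are a hypothetical subset of those listed above, and HMAC-SHA256 with a shared key is used as a stand-in for the digital signature; a real deployment would presumably use an asymmetric signature scheme so that verifiers cannot forge entries.

```python
import hashlib
import hmac
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: Dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def sign(digest: str, key: bytes) -> str:
    # Stand-in: HMAC-SHA256 with a shared key, NOT an asymmetric digital signature.
    return hmac.new(key, digest.encode("utf-8"), hashlib.sha256).hexdigest()


def append(chain: List[Dict], record: Dict, key: bytes) -> None:
    """Link a new record to the tail of the chain and sign its hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    digest = entry_hash(prev, record)
    chain.append({"record": record, "prev": prev, "hash": digest, "sig": sign(digest, key)})


def verify(chain: List[Dict], key: bytes) -> bool:
    """Recompute every link; an edited record, reordering or bad signature fails."""
    prev = GENESIS
    for entry in chain:
        digest = entry_hash(prev, entry["record"])
        if entry["prev"] != prev or entry["hash"] != digest:
            return False
        if not hmac.compare_digest(entry["sig"], sign(digest, key)):
            return False
        prev = digest
    return True
```

For example, appending `{"scene_id": "S-0001", "long_tail_score": 0.55, "seed": 42}` and then editing the stored score makes `verify` return `False`, since the recomputed hash no longer matches.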

[0017] In another aspect, the present invention also provides an autonomous driving long-tail scene construction device, including a processor, a memory, and a computer program or instructions stored in the memory. The processor is configured to execute the computer program or instructions, and when they are executed, the device implements the steps of the above method.

[0018] The autonomous driving long-tail scene construction method and apparatus of this invention extract baseline scenes from real driving data, construct a structured representation and a controllable element parameter space, and then define counterfactual constraints consisting of an intervention set and a constraint set. Using the baseline scene and the counterfactual constraints as generation conditions, a conditional diffusion model generates candidate long-tail scenes; during its denoising process, constraint-guided gradients, which ensure compliance with physical dynamics and traffic rules, and risk-guided gradients, which raise the risk value of the scene, are superimposed simultaneously. A dynamics verifier and a traffic rule verifier then perform frame-by-frame verification and parameter projection repair, and low-value and invalid scenes are eliminated based on a long-tail scoring model. Finally, the target scenes that meet the requirements are standardized, encapsulated, and stored. This technical solution improves the controllability, effectiveness, and realism of scene generation, enables the targeted generation of long-tail scenes that yield training gains, enhances the reproducibility and auditability of scenes, and significantly improves the safety verification efficiency and compliance of autonomous driving systems.

[0019] Additional advantages, objects, and features of the invention will be set forth in part in the description which follows, and will also become apparent in part to those skilled in the art upon studying the description, or may be learned by practice of the invention. The objects and other advantages of the invention can be realized and obtained by means of the structures specifically pointed out in the description and drawings.

[0020] Those skilled in the art will understand that the objectives and advantages achievable with the present invention are not limited to those specifically described above, and that the above and other objectives achievable with the present invention will become clearer from the following detailed description.

Brief Description of the Drawings

[0021] The accompanying drawings, which are included to provide a further understanding of the invention and form part of this application, are not intended to limit the scope of the invention. In the drawings: Figure 1 is a flowchart illustrating the method for constructing long-tail scenarios for autonomous driving according to an embodiment of the present invention.

[0022] Figure 2 is a flowchart illustrating the method for constructing and labeling long-tail scenarios for autonomous driving based on diffusion-model counterfactual generation, according to an embodiment of the present invention.

Detailed Description of the Embodiments

[0023] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the embodiments and accompanying drawings. Here, the illustrative embodiments and descriptions of this invention are used to explain the invention, but are not intended to limit the invention.

[0024] It should also be noted that, in order to avoid obscuring the invention with unnecessary details, only the structures and / or processing steps closely related to the solution according to the invention are shown in the accompanying drawings, while other details that are not closely related to the invention are omitted.

[0025] It should be emphasized that the term "including / comprises" as used herein refers to the presence of a feature, element, step, or component, but does not exclude the presence or addition of one or more other features, elements, steps, or components.

[0026] It should also be noted that, unless otherwise specified, the term "connection" herein can refer not only to a direct connection but also to an indirect connection through an intermediary.

[0027] Current technologies for constructing long-tail scenarios in autonomous driving mainly fall into three categories: parameter sweeping schemes based on rules / simulators, generation schemes based on general generative adversarial networks or diffusion models, and image-level enhancement schemes based on real data. These schemes generally suffer from four core defects: First, poor scenario controllability, making it impossible to achieve single-variable intervention while maintaining core causal conditions, leading to chaotic causal logic; second, low effectiveness of generated scenarios, easily generating physically infeasible or rule-incompatible invalid scenarios due to the lack of strong constraints from vehicle dynamics and traffic rules; third, insufficient long-tail directional generation capabilities, lacking risk-oriented mechanisms, resulting in scenarios that cannot accurately match the shortcomings of autonomous driving models, leading to extremely low training gains; and fourth, scenarios that are not reproducible or auditable, lacking standardized encapsulation and metadata records, failing to meet the compliance verification requirements of high-level autonomous driving for traceable and verifiable scenarios.

[0028] In view of this, the present invention provides a method for constructing long-tail scenarios for autonomous driving. As shown in Figure 1, the method includes the following steps S101 to S106: Step S101: Collect the full set of real driving log data and extract multiple baseline scene segments that meet the long-tail scenario criteria of the autonomous driving system; construct a structured baseline scene representation for the baseline scene segments and mark the controllable element parameter space; wherein the baseline scene representation is a data sequence obtained by sensing the vehicle's own state and the states of traffic participants with multiple types of sensing devices and aligning them with the environmental state and map data, and the controllable element parameter space is the set of elements in the baseline scene representation that can be intervened on, together with their value ranges.

[0029] Step S102: Construct counterfactual constraints, including an intervention set and a constraint set, based on the controllable element parameter space. The intervention set labels the elements that are allowed to be modified in the current generation run, and the constraint set labels the elements that must remain unchanged.

[0030] Step S103: Using the baseline scene representation and counterfactual constraints as generation conditions, sample and generate the scene representation through a conditional diffusion model, establish constraint-guided gradients and risk-guided gradients to correct the denoising direction, and generate candidate long-tail scene representations.

[0031] Step S104: Call the dynamics verifier and the traffic rule verifier to verify the candidate long-tail scene representations frame by frame, project non-compliant scene parameters into the feasible domain, and smooth the trajectories to correct the parameters.

[0032] Step S105: Based on the predefined long-tail scoring model, calculate the long-tail score of each parameter-corrected candidate long-tail scene representation, and remove the risk-free low-value scenes and the invalid scenes that are bound to collide, i.e., those scoring outside the preset scoring range, to obtain multiple target long-tail scene representations.

[0033] Step S106: Standardize, encapsulate, and store the target long-tail scene representation for later use.

[0034] In step S101, the full set of real driving log data is collected, and high-value segments that meet the definition of long-tail scenarios are selected as baselines. Long-tail scenarios (corner cases) are extreme scenarios and multi-factor combinations that occur with extremely low probability in real driving data but have a significant impact on the safety of autonomous driving systems; they are the core bottleneck in the training and safety verification of such systems. To represent the complex real world digitally, the baseline scenario representation is constructed as a multi-dimensional data sequence comprising the vehicle's own state, the states of traffic participants, the environmental state, and map data.

[0035] In some embodiments, the vehicle's own state and the traffic participants' states in the baseline scene representation are expressed as a bird's-eye-view (BEV) representation of data generated by multiple types of sensing devices, which maps the perception information from different sensors onto a common top-down grid to form a unified spatial representation. The sensing devices include one or more of cameras, LiDAR, and/or millimeter-wave radar; the vehicle's own state includes vehicle location, speed, heading, acceleration, power parameters, and control parameters; the traffic participant's state includes participant location, speed, heading, acceleration, and an intent label; the map data includes lane topology, navigable area markers, and traffic rule markers. Data sampled at different rates are time-aligned to ensure the temporal consistency of the entire sequence.
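Time alignment across sensors sampled at different rates can be illustrated with linear interpolation onto a common timeline. This is a sketch assuming scalar channels and strictly increasing timestamps; the text does not say which interpolation the invention uses, and `resample_linear` is a hypothetical name.

```python
from bisect import bisect_left
from typing import List, Sequence


def resample_linear(ts: Sequence[float], vals: Sequence[float],
                    grid: Sequence[float]) -> List[float]:
    """Linearly interpolate samples (ts, vals) onto a common timeline `grid`.

    `ts` must be strictly increasing; grid points outside [ts[0], ts[-1]]
    are held at the nearest endpoint value.
    """
    if len(ts) != len(vals) or not ts:
        raise ValueError("ts and vals must be non-empty and equally long")
    out = []
    for g in grid:
        if g <= ts[0]:
            out.append(vals[0])
        elif g >= ts[-1]:
            out.append(vals[-1])
        else:
            i = bisect_left(ts, g)  # ts[i - 1] < g <= ts[i]
            w = (g - ts[i - 1]) / (ts[i] - ts[i - 1])
            out.append(vals[i - 1] + w * (vals[i] - vals[i - 1]))
    return out


# e.g. a 10 Hz speed channel (timestamps in ms) resampled onto a 20 Hz grid
print(resample_linear([0, 100, 200], [0.0, 1.0, 4.0], [0, 50, 100, 150, 200]))
# [0.0, 0.5, 1.0, 2.5, 4.0]
```

Holding endpoint values rather than extrapolating avoids inventing states outside the recorded window.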

[0036] Simultaneously, this invention also marks the controllable element parameter space, which serves as the basis for subsequent interventions. It clearly defines which elements can be modified and the scope of modification. Specifically, the controllable element parameter space includes static environmental elements, dynamic interactive elements, and traffic rule elements. Static environmental elements include weather visibility, light intensity, road surface adhesion coefficient, and the location and size of obstructions; dynamic interactive elements include traffic participant types, arrival times, movement trajectories, crossing times, and speeds; and traffic rule elements include traffic light phases, speed limits, yield rules, and the location of construction zones. This provides a clearly structured, element-complete, and causally logical starting point for all subsequent generation work.

[0037] Step S102 is used to construct counterfactual constraints, which is crucial for achieving controllable single-variable generation. An intervention set and a constraint set are constructed based on the previously defined controllable element parameter space. The intervention set precisely labels the specific elements that are allowed to be modified during this generation process, such as modifying only the position of occluders while all other elements remain unchanged. Conversely, the constraint set labels the core causal conditions of the baseline scene that must remain unchanged, such as the basic topology of the road and the initial state of the vehicle. In this way, the generation process is strictly confined to the framework of keeping the core causal conditions unchanged and only changing the specified intervention elements. This mechanism is called counterfactual generation. Its implementation effectively solves the problems of uncontrollable scene variables and confused causal logic in existing technologies, enabling each generated scene variant to be accurately used to analyze the impact of single-factor changes on the behavior of the autonomous driving system, providing a powerful tool for attributing model defects.
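The single-variable property can be checked mechanically by comparing a candidate against its baseline. The sketch below assumes scenes flattened into dictionaries of named numeric elements (the element names in the usage are hypothetical) and treats any change outside the intervention set as a violation, matching the example of moving only the occluder.

```python
from typing import Dict, List, Set, Tuple


def check_counterfactual(baseline: Dict[str, float], candidate: Dict[str, float],
                         intervention: Set[str], constraint: Set[str],
                         tol: float = 1e-6) -> Tuple[bool, List[str]]:
    """Return (ok, offending) for a single-variable counterfactual edit.

    An element may change only if it is in the intervention set; elements in
    the constraint set (and any unlisted element) must stay at baseline values.
    """
    overlap = intervention & constraint
    if overlap:
        raise ValueError(f"elements in both sets: {sorted(overlap)}")
    changed = {k for k, v in baseline.items() if abs(candidate[k] - v) > tol}
    offending = sorted(changed - intervention)
    return not offending, offending
```

For instance, with baseline `{"occluder_x": 10.0, "ego_v0": 12.0, "lane_count": 2.0}` and intervention set `{"occluder_x"}`, moving only the occluder passes, while also changing `ego_v0` is reported as `["ego_v0"]`.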

[0038] Step S103 uses the baseline scene representation and counterfactual constraints as generation conditions to drive sampling from the conditional diffusion model. To ensure that generation strictly adheres to the constraints while also producing high-value long-tail scenes in a targeted manner, this invention introduces a dual gradient guidance mechanism: at each denoising step of the diffusion model, in addition to classifier-free guidance, which enhances compliance with the overall conditions, constraint-guided gradients and risk-guided gradients are superimposed.

[0039] Constraint-guided gradients are differentiable losses constructed by calculating the degree to which the generated scene violates dynamic constraints such as maximum acceleration and minimum vehicle distance, as well as traffic rule constraints such as stopping at a red light. Their role is to guide the generation process towards a region feasible according to physical and traffic rules. Risk-guided gradients, on the other hand, are based on a pre-trained risk evaluator. This model assesses the probability of collisions or violations in the scene, and its corresponding gradient guides the generation process towards directions with higher risk and greater challenges. Through the collaborative correction of these two types of gradients, the final generated candidate long-tail scene representations not only strictly conform to the pre-set counterfactual constraints but also inherently possess high-risk characteristics, achieving targeted generation of long-tail scenes.
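A toy, pure-Python sketch of one denoising step with the dual guidance described above, assuming a deterministic DDIM update (eta = 0) and that the two gradients are applied as additive corrections to the denoised sample. Where exactly the gradients enter (noisy sample, predicted clean sample, or predicted noise) and their step sizes `s_c` and `s_r` are not specified in the text and are assumptions here; a real model would operate on tensors with an autodiff framework.

```python
import math
from typing import Callable, List

Vec = List[float]
GradFn = Callable[[Vec], Vec]


def cfg_eps(eps_cond: Vec, eps_uncond: Vec, w: float) -> Vec:
    """Classifier-free guidance: eps_u + w * (eps_c - eps_u)."""
    return [u + w * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def ddim_step(x_t: Vec, eps: Vec, abar_t: float, abar_prev: float) -> Vec:
    """Deterministic DDIM update (eta = 0) from noise level abar_t to abar_prev."""
    x0 = [(x - math.sqrt(1 - abar_t) * e) / math.sqrt(abar_t) for x, e in zip(x_t, eps)]
    return [math.sqrt(abar_prev) * a + math.sqrt(1 - abar_prev) * e for a, e in zip(x0, eps)]


def dual_guided_step(x_t: Vec, eps_cond: Vec, eps_uncond: Vec, w: float,
                     grad_constraint: GradFn, grad_risk: GradFn,
                     s_c: float, s_r: float, abar_t: float, abar_prev: float) -> Vec:
    """One denoising step with CFG plus constraint- and risk-guided corrections.

    Both gradients are of losses to be minimised (constraint violation, and a
    risk loss such as -log p_risk), so each is subtracted with its own step size.
    """
    x_prev = ddim_step(x_t, cfg_eps(eps_cond, eps_uncond, w), abar_t, abar_prev)
    g_c, g_r = grad_constraint(x_prev), grad_risk(x_prev)
    return [x - s_c * a - s_r * b for x, a, b in zip(x_prev, g_c, g_r)]
```

Keeping separate step sizes lets the feasibility pull and the risk push be balanced independently, which is the point of superimposing two gradients rather than one combined loss.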

[0040] In some embodiments, the calculation of the constraint-guided gradient includes steps S1031 to S1034 (the formulas themselves are not reproduced in the available text). Step S1031: Calculate the degree of violation of the dynamics constraints, $\mathcal{L}_{\text{dyn}}$, where $a_t$ denotes the acceleration at time $t$, $a_{\max}$ the maximum permissible acceleration, $j_t$ the jerk at time $t$, $j_{\max}$ the maximum permissible jerk, $\kappa_t$ the curvature of the vehicle trajectory at time $t$, and $\kappa_{\max}$ the maximum permissible curvature.

[0041] Step S1032: Calculate the degree of violation of the traffic rule constraints, which includes calculating the degree of violation of the minimum safe following distance constraint, $\mathcal{L}_{\text{dist}}$, where $d_{\text{safe}}$ is the standard safe distance and $d_t$ is the distance to the vehicle ahead at time $t$.

[0042] If the current traffic light phase is red and the vehicle's distance from the stop line is $d_{\text{stop}}$, calculate the degree of violation of the red-light stopping constraint, $\mathcal{L}_{\text{red}}$, where $v_t$ is the current speed and $d_0$ is a preset standard distance; then calculate the traffic rule violation degree $\mathcal{L}_{\text{rule}}$. Step S1033: Calculate the first differentiable loss $\mathcal{L}_c$, where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are weighting coefficients.

[0043] Step S1034: Calculate the constraint-guided gradient from $\mathcal{L}_c$.
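The patent's formulas for steps S1031 to S1034 are not reproduced in the available text, so the following Python sketch uses assumed squared-hinge penalties, which are zero inside the feasible region and smooth at its boundary, over an ego trajectory given as longitudinal positions `s` sampled every `dt` seconds. Writing $a_t$, $j_t$ and $\kappa_t$ for acceleration, jerk and curvature, the assumed dynamics term is $\sum_t [\max(0,|a_t|-a_{\max})^2 + \max(0,|j_t|-j_{\max})^2 + \max(0,|\kappa_t|-\kappa_{\max})^2]$; the red-light term and the combination $\lambda_1 L_{\text{dyn}} + \lambda_2 L_{\text{dist}} + \lambda_3 L_{\text{red}}$ are likewise assumptions. Curvature is passed in precomputed, and the gradient uses central differences as a stand-in for automatic differentiation.

```python
from typing import Callable, Dict, List, Sequence


def relu(z: float) -> float:
    return max(0.0, z)


def diffs(xs: Sequence[float], dt: float) -> List[float]:
    """Forward finite differences; the result is one element shorter than xs."""
    return [(b - a) / dt for a, b in zip(xs, xs[1:])]


def dynamics_violation(s: Sequence[float], dt: float, kappa: Sequence[float],
                       a_max: float, j_max: float, k_max: float) -> float:
    """Assumed L_dyn: squared hinge on |a_t|, |j_t| and |kappa_t| exceeding their limits."""
    v = diffs(s, dt)
    a = diffs(v, dt)
    j = diffs(a, dt)
    return (sum(relu(abs(x) - a_max) ** 2 for x in a)
            + sum(relu(abs(x) - j_max) ** 2 for x in j)
            + sum(relu(abs(k) - k_max) ** 2 for k in kappa))


def distance_violation(s: Sequence[float], lead: Sequence[float], d_safe: float) -> float:
    """Assumed L_dist: squared hinge on the gap d_t = lead_t - s_t dropping below d_safe."""
    return sum(relu(d_safe - (ld - e)) ** 2 for ld, e in zip(lead, s))


def red_light_violation(s: Sequence[float], dt: float, red: bool,
                        stop_line: float, d0: float) -> float:
    """Assumed L_red: v_t^2 weighted by how far the ego is inside d0 of the stop line."""
    if not red:
        return 0.0
    v = diffs(s, dt)
    return sum(relu(d0 - (stop_line - e)) * vi ** 2 for e, vi in zip(s, v))


def constraint_loss(s: Sequence[float], dt: float, kappa: Sequence[float],
                    lead: Sequence[float], red: bool, stop_line: float,
                    params: Dict[str, float]) -> float:
    """Assumed first differentiable loss L_c = lam1*L_dyn + lam2*L_dist + lam3*L_red."""
    p = params
    return (p["lam1"] * dynamics_violation(s, dt, kappa, p["a_max"], p["j_max"], p["k_max"])
            + p["lam2"] * distance_violation(s, lead, p["d_safe"])
            + p["lam3"] * red_light_violation(s, dt, red, stop_line, p["d0"]))


def numerical_grad(f: Callable[[List[float]], float], x: Sequence[float],
                   h: float = 1e-5) -> List[float]:
    """Central-difference gradient of f at x (stand-in for autodiff)."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad
```

A feasible constant-speed trajectory with a large gap gives zero loss and zero gradient, while a too-close trajectory yields a gradient whose descent step moves the ego back and lowers the loss.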

[0044] In some embodiments, the calculation of the risk-guided gradient includes: feeding the candidate long-tail scene representation generated by the current conditional diffusion model into a pre-trained risk evaluator, which outputs the probability of a future collision or violation, and constructing a second differentiable loss from this probability; and calculating the risk-guided gradient from the second differentiable loss (formula not reproduced in the available text).
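A sketch of the risk-guided gradient with a hypothetical logistic model standing in for the pre-trained risk evaluator. The second differentiable loss is assumed to be $L_r = -\log p$, so descending its gradient pushes the scene toward a higher predicted collision or violation probability; the text does not give the exact loss.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class LogisticRiskStub:
    """Hypothetical stand-in for the pre-trained risk evaluator: p = sigmoid(w . x + b)."""

    def __init__(self, w: Sequence[float], b: float) -> None:
        self.w, self.b = list(w), b

    def prob(self, x: Sequence[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def risk_loss_and_grad(self, x: Sequence[float]) -> Tuple[float, List[float]]:
        """Assumed second loss L_r = -log p and its gradient with respect to x.

        d(-log sigmoid(z))/dx = -(1 - p) * w, so stepping x -= eta * grad raises p.
        """
        p = self.prob(x)
        loss = -math.log(max(p, 1e-12))
        grad = [-(1.0 - p) * wi for wi in self.w]
        return loss, grad
```

In the real pipeline the evaluator would be the trained network described in steps S1035 and S1036, with the gradient obtained by backpropagation instead of this closed form.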

[0045] In some embodiments, the pre-training of the risk evaluator includes steps S1035 and S1036: Step S1035: Collect samples from real driving scenarios. Each sample includes a scene representation within a defined time window and is labeled according to whether a collision or violation occurs within a specified future time window, to construct a training sample set. The scene representations are data sequences obtained by aligning the vehicle's own state, the states of traffic participants, and the environmental and map data captured by multiple sensors. Specifically, real collision scenarios or hazardous scene fragments from open-source datasets such as nuScenes and Waymo are used, combined with synthetic collision datasets generated by simulation software such as CARLA, covering collision types such as rear-end collisions, side collisions, sudden pedestrian appearances, and traffic violations. If a collision or traffic violation occurs within the future time window, the label is set to $y = 1$; otherwise $y = 0$.

[0046] Step S1036: Train an initial neural network using the training sample set. The initial neural network is a spatiotemporal graph neural network or a Transformer neural network based on multi-agent attention. It takes the sample scene representation within the set time window as input and outputs a prediction of whether a collision or violation will occur within the specified future time range. Its parameters are updated based on the binary cross-entropy loss and the focal loss between the prediction and the label, yielding the risk evaluator. Specifically, the AdamW optimizer is used during training to update the network weights through backpropagation until the model's risk prediction accuracy on the validation set converges.
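
The combined training objective can be sketched per sample as below. The focal loss follows its standard definition, with alpha weighting the positive class and (1 - p_t)^gamma down-weighting easy examples; the defaults gamma = 2 and alpha = 0.25 and the equal weighting of the two losses are assumptions, since the source does not state them.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one prediction p in (0, 1) and label y in {0, 1}."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def focal(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Standard focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p = min(max(p, eps), 1 - eps)
    p_t = p if y == 1 else 1 - p
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


def risk_training_loss(preds, labels, w_bce=1.0, w_focal=1.0):
    """Mean of a weighted BCE + focal loss (the weighting is an assumption)."""
    total = sum(w_bce * bce(p, y) + w_focal * focal(p, y) for p, y in zip(preds, labels))
    return total / len(preds)
```

The focal term matters here because collision-positive windows are rare in real logs; it concentrates the gradient on hard, misclassified windows rather than on the many easy negatives.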

[0047] In some embodiments, the conditional diffusion model uses a denoising diffusion implicit model (DDIM) or a denoising diffusion probabilistic model (DDPM) as the backbone network, with 20 to 100 diffusion sampling steps and a classifier-free guidance weight of 1 to 4.
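
A minimal configuration object that enforces the stated ranges might look like this. The 1000-step training schedule and the evenly spaced, descending DDIM timestep subsequence are common choices assumed for illustration; the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffusionConfig:
    backbone: str = "DDIM"        # "DDIM" or "DDPM"
    sampling_steps: int = 50      # stated range: 20 to 100
    cfg_weight: float = 2.0       # stated range: 1 to 4
    train_timesteps: int = 1000   # assumption: length of the training noise schedule

    def __post_init__(self):
        if self.backbone not in ("DDIM", "DDPM"):
            raise ValueError("backbone must be DDIM or DDPM")
        if not 20 <= self.sampling_steps <= 100:
            raise ValueError("sampling_steps must be in [20, 100]")
        if not 1.0 <= self.cfg_weight <= 4.0:
            raise ValueError("cfg_weight must be in [1, 4]")

    def timesteps(self):
        """Evenly spaced timestep subsequence, from noisiest to cleanest."""
        return [i * self.train_timesteps // self.sampling_steps
                for i in range(self.sampling_steps)][::-1]
```

With the defaults, `DiffusionConfig().timesteps()` yields 50 timesteps from 980 down to 0, matching the 50-step configuration used in the embodiment.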

[0048] Step S104 rigorously verifies and repairs the candidate long-tail scene representations produced in step S103 to ensure their validity. Independent dynamics and traffic rule verifiers check each candidate scene frame by frame. If a scene parameter, such as a vehicle trajectory, causes excessive lateral acceleration or violates the safe following distance, the verifier outputs the location and magnitude of the violation. Constraint projection repair then forcibly projects the violating parameters back into the feasible domain defined by physics and traffic rules, and optimization methods such as least squares smooth the trajectory, so that the correction preserves scene continuity. Scenes that cannot be repaired into a valid state are filtered out. Verification and repair significantly improve the realism and usability of the generated scenes, raising the scene validity rate from below 75% in existing technologies to above 95% and ensuring that every retained scene is physically feasible and complies with traffic regulations.

[0049] In some embodiments, the dynamics verifier and the traffic rule verifier are groups of discrete verification logic driven by physical rules, motion control rules, and traffic rules. The parameters of a non-compliant scene are corrected by projecting them onto the feasible region and smoothing the trajectory, which includes: for state variables such as speed, acceleration, and yaw rate that exceed their limits, truncating them at the maximum threshold and re-integrating; and for trajectory points that fall outside the passable area, taking the point on the boundary with the smallest Euclidean distance as the repair point, then smoothing the trajectory points before and after the repair by spatial geometric interpolation using B-spline curves or the least squares method.
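
Truncation followed by re-integration can be sketched for a one-dimensional longitudinal track as below. Semi-implicit Euler integration and a non-negative speed (forward driving) are assumptions, the function name is hypothetical, and yaw rate would be handled the same way for heading.

```python
def clip_and_reintegrate(x0, v0, accels, dt, a_max, v_max):
    """Truncate each acceleration to [-a_max, a_max], then re-integrate speed
    (clamped to [0, v_max], forward driving assumed) and position with
    semi-implicit Euler so the repaired states stay mutually consistent."""
    xs, vs = [x0], [v0]
    for a in accels:
        a = max(-a_max, min(a, a_max))
        v = max(0.0, min(vs[-1] + a * dt, v_max))
        xs.append(xs[-1] + v * dt)
        vs.append(v)
    return xs, vs
```

Re-integrating instead of clipping each state independently is the point of the design: after truncation, position, speed, and acceleration still satisfy the same kinematic relations.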

[0050] Parameter projection is essentially a constrained optimization problem solved by truncation: it forcibly pulls illegal trajectories back within the boundaries permitted by physics and traffic rules. For example, if the speed $v_t$ in a generated frame exceeds the maximum permitted speed $v_{\max}$, a hard truncation projection is applied directly: $v_t' = \min(v_t, v_{\max})$. Similarly, the acceleration and yaw rate are truncated at their maximum thresholds, and the truncated dynamic parameters are re-integrated to generate smooth new trajectory points. Likewise, if a vehicle trajectory point $p_t$ crosses the boundary $\partial\Omega$ of the passable area $\Omega$, the point on the boundary closest to it in Euclidean distance is taken as the repair point, $p_t^{*} = \arg\min_{q \in \partial\Omega} \lVert p_t - q \rVert_2$, and the crossing point is moved to $p_t^{*}$. B-spline curves are then used for a second smoothing interpolation over the trajectory points before and after the repair, so that the repaired trajectory stays within bounds spatially and remains continuous dynamically.
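
The nearest-boundary projection can be sketched for a passable area given as a simple polygon: points inside are kept, and points outside move to the closest point on any polygon edge. Ray casting for the inside test and the polygon representation are assumptions; the B-spline smoothing pass is omitted.

```python
import math


def closest_point_on_segment(p, a, b):
    """Orthogonal projection of p onto segment ab, clamped to the endpoints."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    denom = dx * dx + dy * dy
    t = 0.0 if denom == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / denom))
    return (ax + t * dx, ay + t * dy)


def inside_polygon(p, poly):
    """Ray-casting point-in-polygon test."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def project_to_passable(p, poly):
    """Return p if it lies in the passable polygon, else the nearest boundary point."""
    if inside_polygon(p, poly):
        return p
    candidates = [closest_point_on_segment(p, poly[i], poly[(i + 1) % len(poly)])
                  for i in range(len(poly))]
    return min(candidates, key=lambda q: math.dist(p, q))
```

For a 10 m by 4 m rectangular road patch, a point at (5, 6) is pulled to (5, 4) on the near edge, which is exactly the argmin over the boundary described above.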

[0051] Step S105 identifies the most valuable long-tail scenarios among the many candidates. Each corrected candidate scenario is quantitatively evaluated with a predefined long-tail scoring model that integrates three dimensions: first, the collision or violation probability output by the risk evaluator, reflecting how dangerous the scenario is; second, the rarity of the scenario in real-world road data, reflecting its long-tail character; and third, the uncertainty of the autonomous driving model's predictions for the scenario, reflecting its informational value for exposing model weaknesses. A preset scoring range eliminates both low-scoring, risk-free, low-value scenarios and excessively high-scoring scenarios in which a collision is unavoidable, so that only high-value long-tail scenarios are retained. The scenarios used for subsequent training or testing therefore target the weaknesses of the current autonomous driving model, significantly improving data utilization efficiency and model training gain.

[0052] In some embodiments, the long-tail scoring model is calculated as: [formula not reproduced], where the first term is the collision or violation probability output by the risk evaluator for the candidate long-tail scenario, with a value range of [0, 1]; the second term is the rarity of the candidate long-tail scenario in real road data, with a value range of [0, 1]; the third term is the prediction uncertainty of the autonomous driving model to be optimized on the candidate long-tail scenario, with a value range of [0, 1]; and the formula uses three corresponding weighting coefficients.

[0053] The preset scoring range is [0.2, 0.8]. A long-tail score below 0.2 indicates a low-value scenario with no risk, while a long-tail score above 0.8 indicates a scenario that is bound to collide or violate the rules and is therefore invalid.
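
A sketch of scoring and range filtering follows, assuming the long-tail score is a weighted sum of the three [0, 1] terms with weights summing to 1, so that the score itself lies in [0, 1]. Both the linear form and the default weights (0.4, 0.3, 0.3) are hypothetical; the [0.2, 0.8] range is taken from the text.

```python
def long_tail_score(p_risk, rarity, uncertainty, weights=(0.4, 0.3, 0.3)):
    """Assumed weighted sum of the three [0, 1] terms; the weights are hypothetical."""
    for value in (p_risk, rarity, uncertainty):
        if not 0.0 <= value <= 1.0:
            raise ValueError("each term must lie in [0, 1]")
    w_risk, w_rare, w_unc = weights
    return w_risk * p_risk + w_rare * rarity + w_unc * uncertainty


def keep_scene(score, low=0.2, high=0.8):
    """Keep scenes inside the preset range: below `low` is risk-free and low-value,
    above `high` is an unavoidable collision or violation."""
    return low <= score <= high
```

A scene scoring 0.5 on all three terms scores 0.5 and is kept, whereas scores of 0.1 or 0.9 fall outside the range and are discarded.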

[0054] Step S106 packages the final target long-tail scene representation and all related metadata into a standardized scene package. Besides the scene data itself, the package contains the baseline scene from which the scene was generated, the counterfactual intervention set and constraint set, the risk and long-tail scores, the random seed used for generation, the model version, the consistency verification results, and hash values that ensure data integrity. A scene manifest file is generated at the same time, recording all traceable fields in a structured format, such as the scene's unique identifier, the intervention element difference vector, and the timestamp. This standardized encapsulation makes scenes reproducible: an identical scene can be regenerated from the metadata in the package. Because all key information across the entire lifecycle is recorded, the method meets the stringent compliance requirements of advanced autonomous driving systems for traceable, auditable, and verifiable scene data.

[0055] In some embodiments, standardizing and encapsulating the target long-tail scene representation includes adding fully traceable fields to it: a unique scenario identifier, the long-tail score, the random seed used for counterfactual generation, the intervention element difference vector, the physical verification result, the risk score, the conditional diffusion model version, the manifest hash value, the generation timestamp, and the baseline scene hash value.
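
A manifest carrying these fields can be sketched as follows. The field names are those listed for the scenario manifest in this document; canonical JSON serialization (sorted keys, compact separators), SHA-256, and computing manifest_hash over all other fields are assumptions made to keep the example stdlib-only and verifiable.

```python
import hashlib
import json
import time
import uuid


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical(obj) -> bytes:
    """Deterministic serialization so that the same fields always hash the same."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def build_manifest(baseline_bytes, long_tail_score, seed, delta_vector,
                   physics_ok, risk_score, model_version):
    manifest = {
        "scenario_id": str(uuid.uuid4()),
        "LongTail_Score": long_tail_score,
        "Counterfactual_Seed": seed,
        "delta_vector": list(delta_vector),
        "physics_check": physics_ok,
        "risk_score": risk_score,
        "model_version": model_version,
        "timestamp": int(time.time() * 1000),       # milliseconds
        "input_hash": sha256_hex(baseline_bytes),   # hash of the baseline scene
    }
    manifest["manifest_hash"] = sha256_hex(canonical(manifest))
    return manifest


def verify_manifest(manifest):
    """Recompute the hash over every field except manifest_hash itself."""
    body = {k: v for k, v in manifest.items() if k != "manifest_hash"}
    return sha256_hex(canonical(body)) == manifest["manifest_hash"]
```

Any later edit to a field, such as the risk score, makes `verify_manifest` return False, which is what allows a third party to check the package independently.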

[0056] In addition, risk labeling is performed: for each target long-tail scene representation, the risk type, the time of risk occurrence, the coordinates of collision or violation risk points, the safety margins, and the expected takeover and/or obstacle-avoidance actions to be triggered by the autonomous driving system are labeled frame by frame, and a scene risk heat map is generated. At the same time, an immutable audit record is generated for each target long-tail scene representation. The audit records are linked in a hash chain: the hash value of the current record is obtained by hashing the hash value of the previous record together with the content of the current audit record, and each hash value is digitally signed, making the records tamper-proof and traceable end to end.

[0057] In another aspect, the present invention also provides an autonomous driving long-tail scenario construction device, including a processor, a memory, and a computer program or instructions stored in the memory. The processor executes the computer program or instructions, and when they are executed, the device implements the steps of the above method.

[0058] The present invention is now described with reference to a specific embodiment. As shown in Figure 2, this embodiment provides a method, based on counterfactual generation with a diffusion model, for constructing long-tail scenarios in autonomous driving and labeling their risks. It comprises five core steps, S1 to S5, covering the entire process of scenario construction and risk labeling. S1: Baseline scene extraction and construction of the controllable element parameter space. Collect real-world driving log data, extract baseline scenarios that meet the operational design domain (ODD) requirements of the autonomous driving system, and construct a structured baseline scene representation and a controllable element parameter space. Specifically, the baseline scene is structured by mapping the fused data from multiple cameras, LiDAR, and millimeter-wave radar, together with vehicle status, high-precision maps, and traffic participant status data, into a unified baseline scene representation X0. This representation includes a static high-precision map M, a dynamic participant state sequence A, and a multi-sensor fused BEV feature map B. Data with different sampling rates are time-aligned to ensure the temporal consistency of the scene. The static high-precision map M includes lane topology, passable areas, and traffic signs; the dynamic participant state sequence A includes position, speed, heading, acceleration, and intent label.
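
The time-alignment step and the X0 = (M, A, B) container can be sketched as follows, assuming each sensor channel is a sorted list of timestamps with scalar values and that linear interpolation onto a common clock, clamped at the ends, is acceptable. The container's field layout is hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


def align_to(common_times, src_times, src_values):
    """Linearly interpolate one scalar channel onto a common timeline.
    Values outside the source time span are clamped to the nearest endpoint."""
    out = []
    for t in common_times:
        i = bisect_left(src_times, t)
        if i == 0:
            out.append(src_values[0])
        elif i == len(src_times):
            out.append(src_values[-1])
        else:
            t0, t1 = src_times[i - 1], src_times[i]
            v0, v1 = src_values[i - 1], src_values[i]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


@dataclass
class BaselineScene:
    """Hypothetical container for X0 = (M, A, B) on one shared timeline."""
    static_map: dict       # M: lane topology, passable area, traffic signs
    participants: dict     # A: agent id -> {channel name: aligned values}
    bev_features: list     # B: fused BEV feature frames
    timestamps: list = field(default_factory=list)


clock = [0.0, 0.05, 0.1]                                  # 20 Hz common clock
radar_speed = align_to(clock, [0.0, 0.1], [10.0, 11.0])   # 10 Hz channel
scene = BaselineScene({"lanes": []}, {"car_1": {"speed": radar_speed}}, [], clock)
```

Aligning every channel to a single clock before building X0 means that each frame of M, A, and B refers to the same instant, which the frame-by-frame verifiers later rely on.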

[0059] Construction of controllable element parameter space: Based on the ODD boundary of the autonomous driving system, define the set of interventionable elements and their value range, including static environmental elements, dynamic interaction elements, and traffic rule elements, to form a complete controllable element parameter space.

[0060] Static environmental elements include weather visibility, light intensity, and road surface adhesion coefficient; dynamic interactive elements include traffic participant type, time of appearance, movement trajectory, and location and size of obstructions; traffic rule elements include traffic light phase, speed limit, and yielding rules.
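
One way to represent such a parameter space is a table mapping each element to either a continuous (low, high) range or a categorical list, with helpers to sample and range-check values. The element names follow the text, but every numeric bound and category list below is a hypothetical placeholder, since the source does not give them.

```python
import random

# Hypothetical bounds; the source names the elements but not their value ranges.
PARAM_SPACE = {
    "weather_visibility_m": (50.0, 10000.0),
    "light_intensity_lux": (1.0, 100000.0),
    "road_adhesion_coeff": (0.1, 1.0),
    "participant_type": ["pedestrian", "cyclist", "car", "truck"],
    "appearance_time_s": (0.0, 10.0),
    "traffic_light_phase": ["red", "yellow", "green"],
    "speed_limit_kph": (20.0, 120.0),
}


def sample_element(name, rng):
    """Draw a value uniformly from a continuous range or a categorical list."""
    domain = PARAM_SPACE[name]
    if isinstance(domain, tuple):
        low, high = domain
        return rng.uniform(low, high)
    return rng.choice(domain)


def in_space(name, value):
    """Check that a value lies inside the ODD-bounded domain of an element."""
    domain = PARAM_SPACE[name]
    if isinstance(domain, tuple):
        return domain[0] <= value <= domain[1]
    return value in domain
```

Passing an explicit `random.Random(seed)` keeps every draw reproducible, which fits the requirement later in the method to record the generation seed.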

[0061] S2: Construction of counterfactual constraints and definition of long-tail scoring. Based on the baseline scenario, counterfactual intervention constraints and boundary conditions are constructed, and a long-tail scoring model is defined for targeted screening of high-value long-tail scenarios. Specifically: Counterfactual constraint construction: define a counterfactual intervention set I and a constraint set C. The intervention set I contains the only elements that may be modified in this generation run, and the constraint set C contains the core causal conditions of the baseline scenario, which must remain unchanged. During generation, only the scenario elements in I are modified while the core causal conditions in C are held fixed throughout, achieving controllable single-variable counterfactual generation. Long-tail score definition: construct the LongTail_Score model: [formula not reproduced], where the first term is the collision or violation probability output by the risk evaluator for the candidate long-tail scenario, with a value range of [0, 1]; the second is the rarity of the candidate long-tail scenario in real road data, with a value range of [0, 1]; the third is the prediction uncertainty of the autonomous driving model to be optimized on the candidate long-tail scenario, with a value range of [0, 1]; and the formula uses three corresponding weighting coefficients.
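
The intervention-set and constraint-set rule can be enforced with a small guard, sketched here on a flat dictionary of scene elements (a simplifying assumption). Any requested change outside I is rejected, and the result is checked to leave every element of C untouched, which also catches an element accidentally placed in both sets.

```python
def apply_intervention(baseline, intervention, intervention_set, constraint_set):
    """Return a counterfactual copy of `baseline` that modifies only elements in I
    and leaves every element in C unchanged (hypothetical helper)."""
    illegal = set(intervention) - set(intervention_set)
    if illegal:
        raise ValueError(f"elements outside intervention set I: {sorted(illegal)}")
    scene = dict(baseline)
    scene.update(intervention)
    for key in constraint_set:
        if scene.get(key) != baseline.get(key):
            raise ValueError(f"constraint element changed: {key}")
    return scene
```

For example, moving a pedestrian's appearance time from 5.0 s to 1.2 s is allowed when that element is in I, whereas changing the lead vehicle's route, a core causal condition in C, raises an error.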

[0062] The scoring range is set to [0.2, 0.8], which filters out invalid scenarios that are bound to collide as well as risk-free, low-value scenarios, so that high-value long-tail scenarios are generated in a targeted manner.

[0063] S3: Conditional diffusion sampling guided by dual gradients to generate long-tail candidate scenes. Under the counterfactual constraints, a conditional diffusion model performs sampling. At each denoising step, the constraint-guided gradient and the risk-guided gradient are superimposed simultaneously, so that the generated scene both complies with the constraints and is oriented toward risk. Specifically: Basic configuration of the diffusion model: a denoising diffusion implicit model (DDIM) or denoising diffusion probabilistic model (DDPM) serves as the backbone network, the number of diffusion sampling steps is set to 20 to 100, the classifier-free guidance (CFG) weight is set to 1.0 to 4.0, and the baseline scene representation X0 and the counterfactual constraints are used as generation conditions.
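
One guided denoising step can be sketched as a standard deterministic DDIM update (eta = 0). CFG is written as eps_u + w * (eps_c - eps_u), a common convention in which w = 1 gives the purely conditional prediction. The two guidance gradients are folded into the noise prediction by adding sqrt(1 - alpha_bar_t) times their weighted sum, the usual way loss-based guidance is applied to an epsilon-prediction model. The source does not specify how the gradients are combined, so the scales s_constraint and s_risk and all names are assumptions.

```python
import math


def guided_ddim_step(x_t, eps_uncond, eps_cond, alpha_bar_t, alpha_bar_prev,
                     cfg_weight, g_constraint, g_risk, s_constraint=1.0, s_risk=1.0):
    """One deterministic DDIM update (eta = 0) with CFG plus two loss-gradient
    corrections. g_constraint and g_risk are gradients of the first and second
    differentiable losses with respect to x_t."""
    sigma = math.sqrt(1.0 - alpha_bar_t)
    eps = []
    for eu, ec, g1, g2 in zip(eps_uncond, eps_cond, g_constraint, g_risk):
        e = eu + cfg_weight * (ec - eu)                  # classifier-free guidance
        e += sigma * (s_constraint * g1 + s_risk * g2)  # moves x0_hat against both gradients
        eps.append(e)
    # Predicted clean sample, then deterministic re-noising to the previous timestep.
    x0_hat = [(x - sigma * e) / math.sqrt(alpha_bar_t) for x, e in zip(x_t, eps)]
    return [math.sqrt(alpha_bar_prev) * x0 + math.sqrt(1.0 - alpha_bar_prev) * e
            for x0, e in zip(x0_hat, eps)]
```

Adding sigma * g to the noise estimate lowers x0_hat by sigma**2 * g / sqrt(alpha_bar_t), so each step moves the predicted clean scene down both loss surfaces while CFG keeps it tied to the baseline and the counterfactual conditions.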

[0064] Dual-gradient guided sampling: at each denoising step, two differentiable guidance gradients are superimposed on top of the CFG guidance to correct the denoising direction. Constraint-guided gradient G1: calculate the degree of violation of the dynamic constraints: [formula not reproduced], whose terms are the acceleration at time t and the maximum permissible acceleration, the jerk at time t and the maximum permissible jerk, and the curvature of the vehicle's trajectory at time t and the maximum permissible curvature.
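
Since the dynamics formula is not reproduced, the sketch below assumes squared-hinge penalties on the amount by which acceleration, jerk, and curvature exceed their maximum permissible values, with jerk obtained as a finite difference of the acceleration sequence. The function names and this exact form are assumptions.

```python
def hinge_sq(value, limit):
    """Squared excess of |value| over a limit; zero inside the limit."""
    return max(0.0, abs(value) - limit) ** 2


def dynamics_violation(accels, curvatures, dt, a_max, j_max, k_max):
    """Assumed dynamics violation: squared-hinge penalties on acceleration,
    jerk (finite difference of acceleration), and trajectory curvature."""
    jerks = [(a1 - a0) / dt for a0, a1 in zip(accels, accels[1:])]
    return (sum(hinge_sq(a, a_max) for a in accels)
            + sum(hinge_sq(j, j_max) for j in jerks)
            + sum(hinge_sq(k, k_max) for k in curvatures))
```

With accelerations [0, 4] m/s^2 over 1 s, a_max = 3, j_max = 2, curvatures [0.1, 0.3] and k_max = 0.2, the terms are 1.0, 4.0, and 0.01, giving about 5.01.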

[0065] The degree of violation of the traffic rule constraints is calculated as follows. Calculate the degree of violation of the minimum safe following distance constraint: [formula not reproduced], where the terms are the standard safe distance and the distance to the vehicle ahead at time t.

[0066] If the current traffic light phase is red, calculate the degree of violation of the red-light stopping constraint from the vehicle's distance to the stop line: [formula not reproduced], where the terms are the current speed and a preset standard distance.

[0067] Calculate the overall degree of violation of the traffic rule constraints: [formula not reproduced]. Calculate the first differentiable loss: [formula not reproduced], which uses three weighting coefficients.

[0068] Calculate the constraint-guided gradient from the first differentiable loss: [formula not reproduced].

[0069] Risk-guided gradient G2: the pre-trained risk evaluator takes the candidate long-tail scene representation generated by the current conditional diffusion model as input and outputs the probability of a future collision or violation, from which a second differentiable loss is constructed. Calculate the risk-guided gradient from this loss: [formula not reproduced].

[0070] Candidate scene generation: After completing all steps of denoising, long-tail candidate scenes that satisfy counterfactual constraints are generated.

[0071] S4: Consistency verification and scenario repair. Physical dynamics and traffic rule consistency checks are performed on the generated candidate scenes, scenes that do not meet the constraints are repaired, and invalid scenes are filtered out. Specifically: Consistency verification: the dynamics verifier and the traffic rule verifier check the trajectories, motion states, and traffic behavior of the participants in the candidate scene frame by frame, and output the verification results and the locations of constraint violations.

[0072] Constraint projection repair: for scenarios that violate constraints, the violating trajectory or state parameters are projected into the feasible region to repair the scenario.

[0073] Invalid scene filtering: Scenes that cannot be repaired are removed to ensure that the final generated scene complies with vehicle dynamics and traffic rules.

[0074] S5: Standardized encapsulation and risk labeling of scenario packages. The long-tail scenarios that pass verification are standardized and encapsulated, scenario packages and corresponding scenario manifests are generated, and full-dimensional risk labeling is completed. Specifically: Core components of the scene package: a standardized scene package (Scenario Package) is generated whose core contents include the scene data, the static map M, the counterfactual intervention set I, the constraint set C, the risk score, the long-tail score, the generation random seed, the diffusion model version, the verification results, and the integrity hash value. Manifest field definitions: all traceable fields are written into the scenario manifest, including scenario_id (unique scenario identifier), LongTail_Score (long-tail score), Counterfactual_Seed (random seed used for counterfactual generation), delta_vector (difference vector of the intervention elements), physics_check (physical verification result), risk_score (risk score), model_version (model version), manifest_hash (manifest hash value), timestamp (generation timestamp), and input_hash (baseline scenario hash value). Risk labeling: the scene is labeled frame by frame with the risk type, the time of risk occurrence, the collision risk points, the safety margin, and the expected takeover or obstacle-avoidance actions to be triggered by the autonomous driving system. A scene risk heat map is generated, and an immutable audit record is generated synchronously for each target long-tail scene representation. The audit records are linked in a hash chain, where the hash value of the current record is obtained by hashing the hash value of the previous record together with the content of the current audit record. Each hash value is digitally signed to make the records tamper-proof and traceable end to end.

[0075] Furthermore, to meet the compliance requirements of advanced autonomous driving, this invention constructs a full-link tamper-proof audit mechanism. Audit record generation: during each scenario generation cycle, an immutable audit record is generated synchronously. Its core fields include scenario_id, record_id (global serial number), timestamp (millisecond-precision timestamp), input_hash, output_hash, manifest_hash, the verification result, and the model version.

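[0076] Hash chain evidence storage: audit records are linked in a hash chain, $h_i = H(h_{i-1} \parallel \mathrm{payload}_i)$, where $h_i$ is the hash value of the i-th audit record, $H$ is the hash function, $\parallel$ denotes concatenation, and $\mathrm{payload}_i$ is the core content of the i-th audit record. Each hash value is signed with the Chinese national cryptographic algorithm SM2 to prevent the audit records from being tampered with.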
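
The chaining rule h_i = H(h_{i-1} || payload_i) can be sketched with the standard library. SHA-256 stands in for the unspecified hash function, and HMAC-SHA256 stands in for the SM2 signature only to keep the example stdlib-only; HMAC is a shared-key MAC, not a public-key signature, so a real deployment would substitute an SM2 implementation. The all-zero genesis hash and the JSON payload encoding are assumptions.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed predecessor hash of the first record


def payload_bytes(record):
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def append_record(chain, record, key):
    """Append h_i = SHA-256(h_{i-1} || payload_i) and a MAC over h_i."""
    prev = chain[-1]["hash"] if chain else GENESIS
    h = hashlib.sha256(prev.encode() + payload_bytes(record)).hexdigest()
    sig = hmac.new(key, h.encode(), hashlib.sha256).hexdigest()  # stand-in for SM2
    chain.append({"record": record, "prev": prev, "hash": h, "sig": sig})
    return chain


def verify_chain(chain, key):
    """Recompute every link; any edited record breaks its hash and all later links."""
    prev = GENESIS
    for entry in chain:
        h = hashlib.sha256(prev.encode() + payload_bytes(entry["record"])).hexdigest()
        sig = hmac.new(key, h.encode(), hashlib.sha256).hexdigest()
        if entry["prev"] != prev or entry["hash"] != h or not hmac.compare_digest(entry["sig"], sig):
            return False
        prev = h
    return True
```

Because each hash folds in its predecessor, altering any earlier record invalidates every later link, which is what makes the audit trail tamper-evident.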

[0077] Verifiable Interface: Provides a standardized verifiable interface to external parties, outputting a list of core fields for scenarios and audit records, supporting independent verification by third parties and compliance evidence.

[0078] This embodiment provides a long-tail scene construction device, including: a baseline scene processing module, a counterfactual constraint encoding module, a dual-gradient guided diffusion generation module, a consistency verification and repair module, and a scene encapsulation and risk labeling module; each module works together to execute the above method steps to achieve fully automated long-tail scene construction.

[0079] This device can be expanded to include an audit and evidence storage module, a sample screening module, and a closed-loop iteration module. It can be deployed on cloud training servers, roadside edge nodes, and vehicle-side computing platforms. Each device only performs actions within its own permission scope, without any forced cross-entity interaction.

[0080] Specifically, this embodiment provides a concrete application scenario, with a verification environment built on the CARLA 0.9.14 autonomous driving simulator. The diffusion model uses the DDIM backbone network with 50 sampling steps and a CFG weight of 2.0. The model to be optimized is a BEV-based end-to-end autonomous driving decision-making model. The public nuScenes and Waymo Open Dataset driving datasets are used for training; 10,000 long-tail scenes are generated, and two rounds of iterative model training are completed. Four evaluation metrics are calculated: long-tail scene ODD coverage, scene realism MOS score, reduction in violation rate after training, and scene effectiveness.

[0081] Long-tail scene ODD coverage is the proportion of the low-probability scene subsets within the preset ODD boundary that the generated scenes cover; higher is better. The scene realism MOS (mean opinion score) is obtained by having 10 autonomous driving simulation test engineers rate the realism and plausibility of each scene from 1 to 5 and averaging the ratings. The post-training violation rate reduction is the percentage decrease in the traffic violation rate on the same test set after training compared with before training. Scene effectiveness is the percentage of generated scenes that pass the physical and traffic rule consistency verification. Rule-enumeration scene generation, general-purpose diffusion model generation, and the proposed solution are compared; the results are shown in Table 1 below.
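
Three of the four metrics reduce to simple arithmetic and can be sketched as below; ODD coverage is omitted because it depends on how the low-probability scene subsets are partitioned, which is not specified. The function names are hypothetical, and the numbers in the checks are illustrative inputs only, not results from Table 1.

```python
from statistics import mean


def violation_rate_reduction(rate_before, rate_after):
    """Percentage decrease of the violation rate on the same test set."""
    return (rate_before - rate_after) / rate_before * 100.0


def scene_effectiveness(num_passed, num_generated):
    """Percentage of generated scenes that pass both consistency verifiers."""
    return num_passed / num_generated * 100.0


def mos(ratings):
    """Mean opinion score over 1-5 realism ratings."""
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    return mean(ratings)
```

Note that the violation-rate reduction is relative: a drop from 10% to 6% is a 40% reduction, not a 4-point one.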

[0082] As can be seen from Table 1, the present invention significantly outperforms existing mainstream solutions on the four core indicators of long-tail scene coverage, scene realism, model training gain, and scene effectiveness. It fully achieves the intended purpose of the invention, has prominent substantive features, and represents notable progress.

[0083] Corresponding to the above method, the present invention also provides an apparatus or system including a computer device, the computer device including a processor and a memory, the memory storing computer instructions, and the processor executing the computer instructions stored in the memory; when the computer instructions are executed by the processor, the apparatus or system performs the steps of the method described above.

[0084] This invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the aforementioned autonomous driving long-tail scene construction method. The computer-readable storage medium may be a tangible storage medium, such as random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, floppy disks, hard disks, removable storage disks, CD-ROMs, or any other form of storage medium known in the art.

[0085] In summary, the autonomous driving long-tail scene construction method and apparatus of this invention extract a baseline scene from real driving data, construct a structured representation and a controllable element parameter space, and define counterfactual constraints consisting of an intervention set and a constraint set. Using the baseline scene and the counterfactual constraints as generation conditions, a conditional diffusion model generates candidate long-tail scenes; during its denoising process, constraint-guided gradients that enforce compliance with physical dynamics and traffic rules and risk-guided gradients that raise the risk value of the scene are superimposed simultaneously. A dynamics verifier and a traffic rule verifier then perform frame-by-frame verification and parameter projection repair, and low-value and invalid scenes are eliminated based on a long-tail scoring model. Finally, the target scenes that meet the requirements are standardized, encapsulated, and stored. This technical solution improves the controllability, effectiveness, and realism of scene generation, enables targeted generation of long-tail scenes and the associated training gain, and enhances the reproducibility and auditability of scenes, significantly improving the safety verification efficiency and compliance of autonomous driving systems.

[0086] Those skilled in the art will understand that the exemplary components, systems, and methods described in conjunction with the embodiments disclosed herein can be implemented in hardware, software, or a combination of both. Whether implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this invention. When implemented in hardware, it can be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, etc. When implemented in software, the elements of this invention are programs or code segments used to perform the desired tasks. The programs or code segments can be stored in a machine-readable medium or transmitted over a transmission medium or communication link via data signals carried in a carrier wave.

[0087] It should be clarified that the present invention is not limited to the specific configurations and processes described above and shown in the figures. For the sake of brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method process of the present invention is not limited to the specific steps described and shown. Those skilled in the art can make various changes, modifications, and additions, or change the order of steps, after understanding the spirit of the present invention.

[0088] In this invention, features described and/or illustrated for one embodiment may be used in the same or similar manner in one or more other embodiments, and/or combined with or in place of features of other embodiments.

[0089] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. For those skilled in the art, various modifications and variations of the embodiments of the present invention are possible. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A method for constructing long-tail scenarios for autonomous driving, characterized in that the method includes the following steps: collecting real driving log data and extracting multiple baseline scene components that meet the long-tail scenario criteria of the autonomous driving system; constructing a structured baseline scene representation for the baseline scene components and marking the controllable element parameter space, wherein the baseline scene representation is a data sequence obtained by sensing the ego vehicle's state and the states of traffic participants through multiple types of sensing devices and aligning them with the environmental state and map data, and the controllable element parameter space is the set of elements in the baseline scene representation that can be intervened on, together with their value ranges; based on the controllable element parameter space, constructing counterfactual constraints comprising an intervention set and a constraint set, wherein the intervention set marks the elements allowed to be modified in the current generation process and the constraint set marks the elements that must remain unchanged; using the baseline scene representation and the counterfactual constraints as generation conditions, sampling through a conditional diffusion model to generate candidate long-tail scene representations, wherein constraint-guided gradients and risk-guided gradients are established to correct the denoising direction; invoking a dynamics verifier and a traffic rule verifier to verify the candidate long-tail scene representations frame by frame, projecting non-compliant scene parameters into the feasible domain, and smoothing the trajectory to correct the parameters; based on a predefined long-tail scoring model, calculating long-tail scores for the parameter-corrected candidate long-tail scene representations, and eliminating risk-free low-value scenes and invalid scenes bound to collide whose scores fall outside the preset scoring range, to obtain multiple target long-tail scene representations; and standardizing, encapsulating, and storing the target long-tail scene representations for later use.

2. The method for constructing long-tail scenarios for autonomous driving according to claim 1, characterized in that the ego vehicle's state and the traffic participants' states in the baseline scene representation are expressed as a bird's-eye-view (BEV) representation of data generated by multiple types of sensing devices, including one or more of cameras, lidar, and millimeter-wave radar; the vehicle's own state includes vehicle positioning, speed, heading, acceleration, power parameters, and control parameters; the traffic participant's state includes participant positioning, speed, heading, acceleration, and intent label; the map data includes lane topology, passable area markers, and traffic rule markers; the controllable element parameter space includes static environmental elements, dynamic interactive elements, and traffic rule elements; the static environmental elements include weather visibility, light intensity, road surface adhesion coefficient, and the location and size of obstructions; the dynamic interactive elements include traffic participant type, time of appearance, movement trajectory, time of crossing the road, and speed; and the traffic rule elements include traffic light phase, speed limit, right-of-way rules, and the location of construction zones.

3. The method for constructing long-tail scenarios for autonomous driving according to claim 1, characterized in that the conditional diffusion model uses a denoising diffusion implicit model or a denoising diffusion probabilistic model as the backbone network, with 20 to 100 diffusion sampling steps and a classifier-free guidance weight of 1 to 4.

4. The method for constructing long-tail scenarios for autonomous driving according to claim 3, characterized in that the steps for calculating the constraint-guided gradient include: calculating the degree of violation of the dynamic constraints: [formula not reproduced], whose terms are the acceleration at time t and the maximum permissible acceleration, the jerk at time t and the maximum permissible jerk, and the curvature of the vehicle's trajectory at time t and the maximum permissible curvature; calculating the degree of violation of the traffic rule constraints, including: calculating the degree of violation of the minimum safe following distance constraint: [formula not reproduced], whose terms are the standard safe distance and the distance to the vehicle ahead at time t; if the current traffic light phase is red, calculating the degree of violation of the red-light stopping constraint from the vehicle's distance to the stop line: [formula not reproduced], whose terms are the current speed and a preset standard distance; and calculating the overall degree of violation of the traffic rule constraints: [formula not reproduced]; calculating the first differentiable loss: [formula not reproduced], which uses three weighting coefficients; and calculating the constraint-guided gradient: [formula not reproduced].

5. The method for constructing long-tail scenarios for autonomous driving according to claim 4, characterized in that the steps for calculating the risk-guided gradient include: a pre-trained risk evaluator takes the candidate long-tail scene representation generated by the current conditional diffusion model as input and outputs the probability of a future collision or violation, from which a second differentiable loss is constructed; and calculating the risk-guided gradient: [formula not reproduced].

6. The method for constructing long-tail scenarios for autonomous driving according to claim 5, characterized in that the pre-training of the risk evaluator includes: collecting samples from real driving scenarios, each sample comprising a sample scene representation within a set time window and, as its label, whether a collision or violation occurs within a specified future time range, to construct a training sample set, wherein the sample scene representation is a data sequence obtained by sensing the ego vehicle's state and the states of traffic participants with multiple types of sensing devices and aligning them with the environmental state and map data; and training an initial neural network with the training sample set, wherein the initial neural network is a spatiotemporal graph neural network or a Transformer-based neural network with multi-agent attention, takes the sample scene representation within the set time window as input, and outputs a prediction of whether a collision or violation will occur within the specified future time range, and its parameters are updated based on the binary cross-entropy loss and the focal loss between the prediction and the label, yielding the risk evaluator.
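The training objective of claim 6 combines binary cross-entropy with focal loss, which down-weights easy, well-classified samples so that the rare collision or violation cases carry more of the gradient. The sketch below uses the standard focal-loss form FL = -alpha_t * (1 - p_t)^gamma * log(p_t); the mixing weight `lam` and the default gamma and alpha are assumptions, since the claim does not specify how the two losses are combined.

```python
import math
from typing import Sequence


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one predicted probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def focal(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25, eps: float = 1e-7) -> float:
    """Focal loss: -alpha_t * (1 - p_t) ** gamma * log(p_t)."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def combined_loss(preds: Sequence[float], labels: Sequence[int], lam: float = 1.0,
                  gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Mean of BCE + lam * focal over a batch (lam is a hypothetical mixing weight)."""
    total = sum(bce(p, y) + lam * focal(p, y, gamma, alpha) for p, y in zip(preds, labels))
    return total / len(preds)


good = combined_loss([0.9, 0.1], [1, 0])  # confident and correct
bad = combined_loss([0.1, 0.9], [1, 0])   # confident and wrong
```

With gamma = 0 the focal term reduces to an alpha-weighted cross-entropy, which is a quick sanity check on any implementation.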

7. The method for constructing long-tail scenarios for autonomous driving according to claim 1, characterized in that the dynamics verifier and the traffic rule verifier are discrete verification logic groups driven by physical rules, motion control rules, and traffic rules; and projecting the parameters of non-compliant scenes into the feasible region and performing trajectory smoothing to correct the parameters includes: truncating the speed, acceleration, and yaw rate state variables that exceed their limits at the maximum thresholds and re-integrating them; and, for trajectory points outside the boundary of the drivable area, taking the point on the boundary with the smallest Euclidean distance as the repair point, and smoothing the trajectory points before and after the repair by spatial geometric interpolation using B-spline curves or the least-squares method.
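Two of the correction steps in claim 7 can be sketched directly: truncating accelerations at their limit and re-integrating speed, and finding the repair point as the closest point on a polyline boundary of the drivable area. The sketch assumes forward-Euler integration and a boundary given as an ordered list of vertices; the B-spline or least-squares smoothing step is not shown, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def clamp_and_reintegrate(v0: float, acc: Sequence[float], a_max: float, v_max: float,
                          dt: float) -> Tuple[List[float], List[float]]:
    """Truncate accelerations to [-a_max, a_max], then re-integrate speed (forward Euler) within [0, v_max]."""
    acc_c = [max(-a_max, min(a_max, a)) for a in acc]
    speeds = [v0]
    for a in acc_c:
        speeds.append(max(0.0, min(v_max, speeds[-1] + a * dt)))
    return acc_c, speeds


def closest_point_on_segment(p: Point, a: Point, b: Point) -> Point:
    """Orthogonal projection of p onto segment ab, clamped to the segment's endpoints."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return a
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (a[0] + t * dx, a[1] + t * dy)


def repair_point(p: Point, boundary: Sequence[Point]) -> Point:
    """Point on the polyline boundary with the smallest Euclidean distance to p."""
    candidates = [closest_point_on_segment(p, a, b) for a, b in zip(boundary, boundary[1:])]
    return min(candidates, key=lambda q: math.dist(p, q))


acc_c, speeds = clamp_and_reintegrate(10.0, [1.0, 5.0, -6.0], a_max=3.0, v_max=30.0, dt=1.0)
fixed = repair_point((2.0, 1.5), [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)])  # -> (2.0, 0.0)
```

Because speed is re-integrated from the truncated accelerations, the corrected profile stays kinematically consistent instead of merely having its out-of-range samples overwritten.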

8. The method for constructing long-tail scenarios for autonomous driving according to claim 6, characterized in that the long-tail scoring model computes the long-tail score from three terms, each weighted by a weighting coefficient: the collision or violation probability output by the risk evaluator for the candidate long-tail scenario, with values in [0, 1]; the rarity of the candidate long-tail scenario in the real-road sampling data, with values in [0, 1]; and the prediction uncertainty of the autonomous driving model to be optimized for the candidate long-tail scenario, with values in [0, 1]. The preset scoring range is [0.2, 0.8]: a long-tail score below 0.2 indicates a low-value scenario with no risk, while a long-tail score above 0.8 indicates a scenario in which a collision or violation is inevitable, which is therefore invalid.
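The scoring formula itself is not reproduced in this text. A natural reading, assumed here, is a weighted sum of the three [0, 1] terms with weights summing to 1, so that the score also lies in [0, 1] and the [0.2, 0.8] filter is meaningful. The default weights below are illustrative only.

```python
def long_tail_score(p: float, r: float, u: float,
                    w_p: float = 0.4, w_r: float = 0.3, w_u: float = 0.3) -> float:
    """Assumed weighted sum of risk probability p, rarity r and model uncertainty u."""
    for name, val in (("p", p), ("r", r), ("u", u)):
        if not 0.0 <= val <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    if abs(w_p + w_r + w_u - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1 so the score stays in [0, 1]")
    return w_p * p + w_r * r + w_u * u


def keep_scene(score: float, lo: float = 0.2, hi: float = 0.8) -> bool:
    """Below lo: low-value, risk-free scene. Above hi: collision/violation inevitable, so invalid."""
    return lo <= score <= hi


mid_score = long_tail_score(0.5, 0.5, 0.5)
```

Keeping the upper cut-off matters as much as the lower one: a scene that always ends in a crash tells the driving model little about where its decision boundary lies.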

9. The method for constructing long-tail scenarios for autonomous driving according to claim 1, characterized in that standardizing and encapsulating the target long-tail scenario representation includes: adding full-traceability fields to the target long-tail scenario representation, including a unique scenario identifier, the long-tail score, the counterfactual generation random seed, the intervention element difference vector, the physical verification result, the risk score, the conditional diffusion model version, a list hash value, the generation timestamp, and the baseline scenario hash value; performing risk labeling, in which the risk type, risk occurrence time, coordinates of the collision or violation risk point, safety margin, and the expected takeover and/or obstacle avoidance actions triggered by the autonomous driving system are labeled frame by frame for the target long-tail scene representation to generate a scene risk heatmap; and synchronously generating an immutable audit record for each target long-tail scene representation, the audit records being linked in a hash chain in which the hash value of the current frame is obtained by hashing the hash value of the previous frame together with the audit record content of the current frame, and each hash value is digitally signed to achieve tamper resistance and end-to-end traceability.
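The audit trail of claim 9 is a hash chain: each record's hash covers the previous record's hash plus the current content, so editing any earlier record invalidates every later link. The sketch below uses SHA-256 over canonical JSON. Because the Python standard library has no asymmetric signatures, HMAC-SHA256 stands in for the digital signature; a real deployment would presumably use a public-key scheme such as Ed25519. The record fields and key are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, content: Dict[str, Any]) -> str:
    payload = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def _sign(h: str, key: bytes) -> str:
    # HMAC-SHA256 as a stand-in for an asymmetric digital signature
    return hmac.new(key, h.encode("utf-8"), hashlib.sha256).hexdigest()


def append_record(chain: List[Dict[str, Any]], content: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    """Append a record whose hash links it to the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    h = _digest(prev_hash, content)
    record = {"content": content, "prev_hash": prev_hash, "hash": h, "sig": _sign(h, key)}
    chain.append(record)
    return record


def verify_chain(chain: List[Dict[str, Any]], key: bytes) -> bool:
    """Recompute every link and signature; any tampering breaks verification."""
    prev = GENESIS
    for rec in chain:
        h = _digest(prev, rec["content"])
        if rec["prev_hash"] != prev or rec["hash"] != h:
            return False
        if not hmac.compare_digest(rec["sig"], _sign(h, key)):
            return False
        prev = h
    return True


key = b"demo-key"
chain: List[Dict[str, Any]] = []
append_record(chain, {"scene_id": "s-001", "long_tail_score": 0.55}, key)
append_record(chain, {"scene_id": "s-002", "long_tail_score": 0.61}, key)
```

Serializing with sorted keys and fixed separators keeps the hash reproducible regardless of dictionary insertion order.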

10. An autonomous driving long-tail scenario construction device, comprising a processor, a memory, and a computer program or instructions stored in the memory, characterized in that the processor is configured to execute the computer program or instructions, and when the computer program or instructions are executed, the device implements the steps of the method according to any one of claims 1 to 9.