Automatic driving control method and device, target vehicle, medium and program product

By acquiring and fusing heterogeneous multi-source data from the target vehicle, identifying and quantifying uncertainties, and generating strategic intent and control commands, the system addresses the safety risks and insufficient robustness of autonomous driving systems in complex scenarios, achieving highly reliable autonomous driving control.

CN121912995B · Active · Publication Date: 2026-06-26 · CHONGQING CHANGAN AUTOMOBILE CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHONGQING CHANGAN AUTOMOBILE CO LTD
Filing Date
2026-03-26
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing autonomous driving systems face high safety risks and insufficient robustness in complex open-world scenarios, mainly due to the lack of explicit modeling and effective utilization of uncertainty, leading to overconfidence or erroneous action decisions.

Method used

Heterogeneous multi-source data of the target vehicle is acquired and fused into features; the uncertainty is identified, quantified, and mapped to an uncertainty level; a strategic intent is generated for that level; and control commands are generated from the fused features and the strategic intent, improving the reliability and safety of decision-making.

Benefits of technology

It improves the safety and stability of autonomous driving systems in complex scenarios and enhances user trust, reduces the probability of overconfident decision-making, and enables precise implementation from high-level decision-making to low-level execution.

✦ Generated by Eureka AI based on patent content.


Abstract

The present application relates to the technical field of automatic driving, in particular to an automatic driving control method and device, a target vehicle, a medium and a program product. The method comprises: acquiring current heterogeneous multi-source data corresponding to a target vehicle; generating a first fusion feature based on the current heterogeneous multi-source data; identifying the first fusion feature to determine the target uncertainty of the target vehicle with respect to the current driving scene; determining an uncertainty level corresponding to the target vehicle based on the target uncertainty, and generating a target strategic intent corresponding to the uncertainty level; identifying the first fusion feature and the target strategic intent to generate a target control command corresponding to the target vehicle; and transmitting the target control command to an actuator corresponding to the target vehicle to control the actuator to execute the target control command for automatic driving. The present application addresses the core defect of overconfidence in existing systems, ensures the safe and stable operation of the target vehicle in complex open scenes, and effectively improves the reliability of, and user trust in, the automatic driving system.

Description

Technical Field

[0001] This invention relates to the field of autonomous driving technology, specifically to autonomous driving control methods, devices, target vehicles, media, and program products.

Background Technology

[0002] In recent years, with the rapid iteration of deep neural networks, large-scale pre-trained models and multimodal large model technologies, coupled with the continuous breakthroughs in multi-sensor fusion technology, the performance of autonomous driving systems in core sub-tasks such as environmental perception, trajectory prediction and behavior planning has been significantly improved, and the application boundaries are gradually being expanded to complex urban scenarios and open road scenarios.

[0003] Currently, solutions such as Tesla FSD and Wayve autonomous driving systems are able to complete routine intelligent driving tasks on structured roads; the emerging end-to-end vision-language-action (VLA) model has shown potential in natural language command following and end-to-end behavior generation, providing a new technical path for human-machine interaction and task execution in autonomous driving.

[0004] However, in complex open-world application scenarios, existing autonomous driving systems still face key technological bottlenecks such as high safety risks and insufficient robustness. This is mainly reflected in the lack of explicit modeling and effective utilization of two kinds of uncertainty during autonomous driving: accidental (aleatoric) uncertainty, caused by irreducible factors such as sensor noise and environmental randomness, and / or cognitive (epistemic) uncertainty, caused by model structure defects and biased training data distribution. Consequently, in highly uncertain scenarios, these systems are prone to making overconfident or erroneous action decisions.

[0005] In summary, how to construct an autonomous driving control method that takes into account uncertainty has become a key issue in breaking through existing technological bottlenecks and promoting the evolution of autonomous driving systems towards higher safety levels.

Summary of the Invention

[0006] This invention provides an autonomous driving control method, device, target vehicle, medium, and program product to address the problem of how to construct an autonomous driving control method that takes into account uncertainties.

[0007] In a first aspect, the present invention provides an autonomous driving control method, the method comprising: acquiring current heterogeneous multi-source data corresponding to a target vehicle; the current heterogeneous multi-source data including current visual data, current laser point cloud data, current natural language command data, and current vehicle state data; generating a first fusion feature based on the current heterogeneous multi-source data; identifying the first fusion feature to determine the target uncertainty of the target vehicle in the current driving scenario; the target uncertainty characterizing the comprehensive quantitative assessment of the autonomous driving system's own decision reliability in the current driving scenario; determining the uncertainty level corresponding to the target vehicle based on the target uncertainty; generating a target strategic intent corresponding to the uncertainty level based on the uncertainty level; the target strategic intent characterizing the top-level decision constraints generated under the uncertainty level corresponding to the current driving scenario; identifying the first fusion feature and the target strategic intent to generate a target control command corresponding to the target vehicle; the target control command including: steering angle control command, throttle control command, and brake control command; transmitting the target control command to the actuator corresponding to the target vehicle, and controlling the actuator to execute the target control command for autonomous driving.

[0008] The autonomous driving control method provided in this application acquires current heterogeneous multi-source data corresponding to the target vehicle, comprehensively collecting the basic information required for autonomous driving decision-making across four dimensions: visual environment, three-dimensional space, user intent, and the vehicle's own state. This avoids decision-making bias caused by the absence or failure of a single data source and lays a data foundation for subsequent multimodal fusion and accurate decision-making. Based on the current heterogeneous multi-source data, a first fusion feature is generated. Multimodal fusion transforms the heterogeneous data into a unified feature vector, eliminating the heterogeneity barriers between different data sources and mining deep semantic relationships among vision, laser point clouds, language commands, and vehicle state. This improves the feature's ability to represent complex scenes and avoids both the interference of low-quality data in early fusion and the semantic fragmentation problem in late fusion. Based on the first fusion feature, the target uncertainty is determined. By classifying and quantifying two types of uncertainty, namely cognitive uncertainty caused by model knowledge gaps and accidental uncertainty caused by environmental interference, the method addresses the core defect of "overconfidence" in existing systems, in which the lack of explicit modeling and effective utilization of uncertainty leads to overconfident or erroneous action decisions in highly uncertain scenarios. Then, based on the target uncertainty, the target strategic intent is determined. By treating target uncertainty as a core element of decision-making, dynamic "risk-strategy" matching is achieved. Based on the target strategic intent, target control commands are generated, transforming the abstract strategic intent into standardized control parameters (steering, throttle, brake) that can be directly recognized by the vehicle's actuators. This ensures precise implementation from "high-level decision-making" to "low-level execution," guaranteeing the smoothness and consistency of driving actions and avoiding abrupt behavioral changes. Finally, based on the target control commands, the target vehicle is controlled, and the decision results are transformed into actual driving behavior. Combined with the target uncertainty assessment and strategy optimization in the preceding steps, the safe and stable operation of the target vehicle in complex open scenarios is ensured, effectively improving the reliability of the autonomous driving system and user trust.
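The step from target uncertainty to uncertainty level to target strategic intent can be illustrated with a minimal Python sketch. The thresholds, the three level names, the intent names, and the fields of `StrategicIntent` are all hypothetical; the description does not specify any of them, only that each uncertainty level yields top-level decision constraints.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the patent text does not give numeric values.
LOW_THRESHOLD = 0.3
HIGH_THRESHOLD = 0.7


@dataclass(frozen=True)
class StrategicIntent:
    """Top-level decision constraints for one uncertainty level (illustrative fields)."""
    name: str
    max_speed_factor: float  # fraction of the nominal speed the planner may use
    allow_lane_change: bool


# Hypothetical mapping from uncertainty level to strategic intent.
INTENT_BY_LEVEL = {
    "low": StrategicIntent("normal_driving", 1.0, True),
    "medium": StrategicIntent("cautious_driving", 0.7, False),
    "high": StrategicIntent("minimal_risk_maneuver", 0.3, False),
}


def uncertainty_level(target_uncertainty: float) -> str:
    """Bucket a scalar target uncertainty in [0, 1] into a discrete level."""
    if not 0.0 <= target_uncertainty <= 1.0:
        raise ValueError("target_uncertainty must lie in [0, 1]")
    if target_uncertainty < LOW_THRESHOLD:
        return "low"
    if target_uncertainty < HIGH_THRESHOLD:
        return "medium"
    return "high"


def strategic_intent(target_uncertainty: float) -> StrategicIntent:
    """Look up the strategic intent associated with the uncertainty level."""
    return INTENT_BY_LEVEL[uncertainty_level(target_uncertainty)]
```

For example, `strategic_intent(0.9)` returns the most conservative intent in this table, so a downstream controller would be constrained before any control command is generated.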

[0009] In one optional implementation, identifying the first fusion feature to determine the target vehicle's target uncertainty in the current driving scenario includes: acquiring a historical context sequence within a preset time period prior to the current moment corresponding to the target vehicle, and a historical similarity sequence corresponding to the historical context sequence; wherein, the historical context sequence includes historical visual data, historical laser point cloud data, historical natural language command data, and historical vehicle state data corresponding to each historical moment; the historical similarity sequence is used to characterize the similarity between the historical heterogeneous multi-source data corresponding to each historical moment and the current heterogeneous multi-source data corresponding to the current moment; fusing the first fusion feature, the historical context sequence, and the historical similarity sequence to generate a second fusion feature; performing independent forward propagation processing on the second fusion feature to generate at least one preliminary strategic intent and a first intent uncertainty corresponding to each preliminary strategic intent; determining the cognitive uncertainty corresponding to the target vehicle based on the first intent uncertainty corresponding to each preliminary strategic intent; and generating target uncertainty based on the cognitive uncertainty.

[0010] The autonomous driving control method provided in this application acquires historical context sequences and historical similarity sequences, introducing historical spatiotemporal information and similarity association criteria. The historical context sequence supplements the target vehicle's environmental, command, and state data within a preset time period, avoiding "short-sighted" decision-making. The historical similarity sequence quantifies the matching degree between historical and current scenarios, providing a quantitative standard for selecting effective historical experience and improving the adaptability of decision-making to continuous temporal scenarios. A second fusion feature is generated by fusing the first fusion feature, the historical context sequence, and the historical similarity sequence, achieving deep fusion of current scenario features and effective historical experience. Compared with the first fusion feature, which relies solely on current data, the second fusion feature incorporates similarity-weighted historical information, preserving the real-time nature of the current scenario while adding decision-making experience from historical scenarios. This avoids feature bias caused by noise in data at a single moment and improves the accuracy of feature representation for complex dynamic scenarios. The second fusion feature then undergoes independent forward propagation to generate preliminary strategic intents and first intent uncertainties. Independent forward propagation outputs multiple preliminary strategic intents with corresponding uncertainties, breaking the limitation of a single intent output; at the same time, each preliminary strategic intent receives its own first intent uncertainty assessment, achieving refined "one intent, one risk" labeling that provides a direct quantitative basis for subsequent intent screening and improves the redundancy and robustness of decision-making. Cognitive uncertainty is determined based on the first intent uncertainty of each preliminary strategic intent. In this way, the cognitive uncertainty caused by factors such as insufficient model knowledge and out-of-distribution (OOD) scenarios is extracted from the uncertainties of the preliminary strategic intents, achieving explicit quantification of "the model's own cognitive limitations." This solves the problem of existing technologies being unable to distinguish between cognitive and accidental uncertainties and provides a core basis for subsequent risk assessment. Then, based on the cognitive uncertainty, the target uncertainty is generated, completing the integration from intent-level uncertainty to global target uncertainty. The dispersed risk assessments of the preliminary strategic intents are summarized into a target uncertainty that reflects the overall decision-making risk of the current scenario, providing a unified risk judgment standard for the final selection of strategic intents. This addresses the problem of one-sided uncertainty modeling in existing systems and reduces the probability of "overconfident" decisions.
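One common way to obtain an uncertainty from several independent forward passes is to measure how much the passes disagree. The sketch below assumes each pass returns a probability per preliminary strategic intent, takes the variance across passes as the first intent uncertainty, and averages those variances into a scalar cognitive uncertainty. The variance and mean aggregations, the function names, and the intent names are assumptions for illustration; the description does not state which statistics are used.

```python
from statistics import fmean, pvariance
from typing import Dict, List


def first_intent_uncertainties(passes: List[Dict[str, float]]) -> Dict[str, float]:
    """Variance of each intent's probability across independent forward passes.

    `passes` holds one dict per forward pass, mapping intent name -> probability.
    """
    if not passes:
        raise ValueError("at least one forward pass is required")
    intents = list(passes[0].keys())
    return {
        intent: pvariance([single_pass[intent] for single_pass in passes])
        for intent in intents
    }


def cognitive_uncertainty(passes: List[Dict[str, float]]) -> float:
    """Aggregate per-intent disagreement into one scalar (mean is an assumption)."""
    return fmean(list(first_intent_uncertainties(passes).values()))


# Passes that agree give zero cognitive uncertainty; passes that disagree do not.
agreeing = [{"keep_lane": 0.9, "stop": 0.1}, {"keep_lane": 0.9, "stop": 0.1}]
disagreeing = [{"keep_lane": 0.9, "stop": 0.1}, {"keep_lane": 0.1, "stop": 0.9}]
```

With these inputs, `cognitive_uncertainty(agreeing)` is zero while `cognitive_uncertainty(disagreeing)` is positive, which is the behavior one would want from a measure of the model's own knowledge gaps.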

[0011] In one optional implementation, generating target uncertainty based on cognitive uncertainty includes: processing the second fusion feature with a heteroscedastic regression head to generate a second intent uncertainty corresponding to each preliminary strategic intent; determining the random (accidental) uncertainty corresponding to the target vehicle based on the second intent uncertainty corresponding to each preliminary strategic intent; and fusing the cognitive uncertainty and the random uncertainty to generate the target uncertainty.

[0012] The autonomous driving control method provided in this application generates a second intention uncertainty based on a heteroscedastic regression head. The heteroscedastic regression head directly processes the second fused features, outputting the second intention uncertainty caused by irreducible factors such as sensor noise, environmental interference, and the randomness of target motion. This evaluation can be completed without repeated sampling, balancing real-time performance and accuracy, and effectively characterizing the inherent interference of the environment. Based on the second intention uncertainty, random uncertainty is determined. Integrating the second intention uncertainty into a global random uncertainty achieves a unified quantification of inherent environmental interference, providing a comprehensive environmental risk assessment for decision-making and avoiding the misleading influence of single sensor failure or local interference on overall decision-making. Cognitive uncertainty and random uncertainty are fused to generate target uncertainty. Through weighted fusion, the limitations of model cognition and inherent environmental interference are integrated into a unified target uncertainty, achieving a refined and comprehensive assessment of decision-making risks. This provides a precise quantitative standard for subsequent strategic intention selection, fundamentally reducing the probability of overconfident decision-making and improving the robustness and safety of the system in complex open scenarios.
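As a rough illustration, a heteroscedastic regression head is typically trained with a Gaussian negative log-likelihood in which the network predicts a log-variance alongside the mean, so the variance can be read off in a single pass. The sketch below shows that loss, a mean aggregation of per-intent variances into a random uncertainty, and a convex weighted fusion with the cognitive uncertainty. The aggregation, the default weight `w_cognitive=0.5`, and all function names are assumptions; the description only says that the fusion is weighted.

```python
import math
from typing import Sequence


def gaussian_nll(y: float, mean: float, log_var: float) -> float:
    """Per-sample Gaussian negative log-likelihood, dropping the constant term.

    A heteroscedastic head outputs both `mean` and `log_var`; large predicted
    variance down-weights the squared error but is penalised by the log term.
    """
    return 0.5 * (log_var + (y - mean) ** 2 / math.exp(log_var))


def random_uncertainty(log_vars: Sequence[float]) -> float:
    """Average the predicted per-intent variances (mean is an assumption)."""
    if not log_vars:
        raise ValueError("at least one log-variance is required")
    return sum(math.exp(lv) for lv in log_vars) / len(log_vars)


def fuse_uncertainty(cognitive: float, random_: float, w_cognitive: float = 0.5) -> float:
    """Convex combination of cognitive and random uncertainty (weight is hypothetical)."""
    if not 0.0 <= w_cognitive <= 1.0:
        raise ValueError("w_cognitive must lie in [0, 1]")
    return w_cognitive * cognitive + (1.0 - w_cognitive) * random_
```

Because the variance comes directly from the head's output, no repeated sampling is needed for this term, which matches the real-time argument made in the paragraph above.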

[0013] In one optional implementation, obtaining the historical context sequence and the historical similarity sequence corresponding to the historical context sequence within a preset time period before the current time for the target vehicle includes: storing the historical visual data, historical laser point cloud data, historical natural language command data, and historical vehicle state data corresponding to the target vehicle at each historical time; calculating the similarity between the current visual data, current laser point cloud data, current natural language command data, and current vehicle state data and the historical visual data, historical laser point cloud data, historical natural language command data, and historical vehicle state data corresponding to each historical time, and generating the historical similarity sequence corresponding to the historical context sequence.

[0014] The autonomous driving control method provided in this application stores heterogeneous multi-source data from each historical moment and constructs a historical context sequence, retaining full-dimensional historical heterogeneous multi-source data of the target vehicle's past operation across four key types of information: vision, laser point clouds, natural language commands, and vehicle state. This provides a data foundation for subsequent mining of the correlation between historical and current scenarios. The structured storage of historical heterogeneous multi-source data also supports the reuse of driving scenario experience, avoiding repeated decision-making trials in similar scenarios and improving decision-making efficiency. The similarity between the current data and the historical heterogeneous multi-source data is calculated to generate a historical similarity sequence. By quantifying the matching degree between the current heterogeneous multi-source data and the data from each historical moment, the historical similarity sequence makes it possible to filter for historical experience that is highly relevant to the current scenario: historical data with high similarity serves as a decision reference, while data with low similarity is given reduced weight. This design avoids interference from irrelevant historical information and allows the system to learn from successful decision-making experience in similar scenarios, improving the robustness and accuracy of decisions in complex dynamic scenarios.
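The similarity computation and the down-weighting of dissimilar history can be sketched as follows. This assumes that the four data types at each moment have already been encoded into a single feature vector and that cosine similarity is the metric; the description does not name a metric, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_sequence(current: Sequence[float],
                        history: List[Sequence[float]]) -> List[float]:
    """One similarity score per historical moment, in stored order."""
    return [cosine_similarity(current, past) for past in history]


def similarity_weighted_context(history: List[Sequence[float]],
                                similarities: Sequence[float]) -> List[float]:
    """Weighted average of historical vectors; negative scores are clipped to zero."""
    dim = len(history[0])
    weights = [max(score, 0.0) for score in similarities]
    total = sum(weights)
    if total == 0.0:
        return [0.0] * dim
    return [
        sum(w * past[d] for w, past in zip(weights, history)) / total
        for d in range(dim)
    ]
```

In this sketch a historical moment that is orthogonal to the current scene receives zero weight and therefore contributes nothing to the aggregated context.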

[0015] In one optional implementation, identifying the first fused feature and the target strategic intent to generate a target control command corresponding to the target vehicle includes: extracting features from the first fused feature to obtain a first local feature; acquiring historical vehicle state data within a preset time period prior to the current moment corresponding to the target vehicle; mapping each sub-state data in the historical vehicle state data to a state feature label through a linear embedding layer; mapping the first local feature to a scene feature label with the same dimension as the state feature label; encoding the target strategic intent into a task condition feature label; fusing the state feature label, scene feature label, and task condition feature label to generate a target feature label; and generating a target control command based on the target feature label.

[0016] The autonomous driving control method provided in this application extracts first local features from the first fused features. From the global first fused features, it extracts the local features most strongly correlated with the current decision, eliminates redundant information, focuses on core scene elements, reduces the computational complexity of subsequent model processing, and improves the real-time performance and accuracy of the decision. It acquires historical vehicle state data within a preset time period, incorporating past vehicle dynamic state information to supplement the temporal continuity of the decision, avoiding short-sighted decisions based solely on the current state, and ensuring that the subsequently generated behavior sequence conforms to the vehicle's physical motion laws. It maps historical sub-state data to state feature labels: through a linear embedding layer, heterogeneous historical vehicle sub-state data (such as vehicle speed and steering angle) are transformed into feature labels with unified dimensions, eliminating data format differences, enabling direct processing by subsequent models, and laying the foundation for cross-type feature fusion. It maps the first local features to scene feature labels of the same dimension, normalizing the dimension of the local scene features so that it matches the dimension of the state feature labels, which resolves the incompatibility between different feature types and allows scene information and vehicle state information to be fused efficiently. It encodes the target strategic intent into task condition feature labels, transforming the abstract high-level strategic intent into feature labels that the model can recognize, clarifying the top-level constraints for generating subsequent behavior sequences, and ensuring that the generated control commands adhere to the decision-making objective, avoiding a disconnect between intent and execution. By fusing the three types of feature labels, it achieves deep integration of three kinds of core information (vehicle state, scene environment, and task intent) and generates target feature labels that simultaneously carry temporal, environmental, and intent constraints, providing a comprehensive and unified decision-making basis for subsequent control command generation and improving the rationality and robustness of decisions. Based on these target feature labels, target control commands are generated, transforming the fused target feature labels into control parameters that vehicle actuators can directly recognize. This enables precise implementation from high-level decision-making to low-level execution, ensuring the smoothness and consistency of control commands and improving the safety and comfort of autonomous driving.
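To make the three mappings concrete, the following sketch embeds scalar sub-states with a linear layer, projects a scene feature to the same dimension, looks up an intent encoding, and fuses the results by stacking them into one sequence. The embedding dimension, the lookup table, the stacking order, and fusion by concatenation are all assumptions; the description does not say how the three label types are fused.

```python
from typing import Dict, List, Sequence

EMBED_DIM = 4  # hypothetical shared label dimension

# Hypothetical encoding table for strategic intents.
INTENT_TABLE: Dict[str, List[float]] = {
    "normal": [1.0, 0.0, 0.0, 0.0],
    "cautious": [0.0, 1.0, 0.0, 0.0],
    "minimal_risk": [0.0, 0.0, 1.0, 0.0],
}


def linear_embed(value: float, weight: Sequence[float], bias: Sequence[float]) -> List[float]:
    """Map one scalar sub-state (e.g. speed) to an EMBED_DIM vector: weight * value + bias."""
    return [w * value + b for w, b in zip(weight, bias)]


def project(feature: Sequence[float], matrix: Sequence[Sequence[float]]) -> List[float]:
    """Project a scene feature to EMBED_DIM; `matrix` has EMBED_DIM rows of len(feature)."""
    return [sum(m * f for m, f in zip(row, feature)) for row in matrix]


def build_target_labels(sub_states: Sequence[float],
                        scene_feature: Sequence[float],
                        intent: str,
                        weight: Sequence[float],
                        bias: Sequence[float],
                        matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Stack task, scene, and state labels into one sequence of equal-width vectors."""
    state_labels = [linear_embed(value, weight, bias) for value in sub_states]
    scene_label = project(scene_feature, matrix)
    task_label = list(INTENT_TABLE[intent])
    return [task_label, scene_label, *state_labels]


labels = build_target_labels(
    sub_states=[10.0, 0.1],            # e.g. speed and steering angle
    scene_feature=[1.0, 2.0],
    intent="cautious",
    weight=[1.0] * EMBED_DIM,
    bias=[0.0] * EMBED_DIM,
    matrix=[[1.0, 0.0]] * EMBED_DIM,
)
```

Because every label has the same width, a sequence model with self-attention could consume `labels` directly, which is why the dimensions are unified before fusion.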

[0017] In one optional implementation, generating target control commands based on the target feature tags includes: inputting the target feature tags into a preset control command generation model; encoding the target feature tags with the preset control command generation model and, based on a self-attention mechanism, predicting candidate behavior sequences for at least one future time step, together with the candidate confidence score and candidate behavior uncertainty score corresponding to each candidate behavior sequence; determining, based on these candidate confidence scores and candidate behavior uncertainty scores, the target behavior sequence whose candidate confidence score is greater than a preset confidence score threshold and whose candidate behavior uncertainty score is lower than a preset uncertainty score threshold; and generating the target control commands based on the target behavior sequence and on the target confidence score and target behavior sequence uncertainty score corresponding to the target behavior sequence.

[0018] The autonomous driving control method provided in this application's embodiments inputs target feature tags into a preset control command generation model. A unified feature tag integrating vehicle state, scene environment, and task intent is input into a dedicated model for processing, achieving precise adaptation between features and the model. This provides standardized input for subsequent behavior sequence reasoning, ensuring the standardization and efficiency of the decision-making process. The preset control command generation model encodes the target feature tags and predicts candidate behavior sequences and evaluation metrics. A self-attention mechanism is used to mine temporal dependencies in the state and alignment relationships between intent and state, outputting candidate behavior sequences for the next T time steps, along with confidence scores and uncertainty scores. This ensures both the physical compliance and the intent matching degree of the behavior sequences and enables quantitative evaluation of each candidate sequence, providing a clear basis for subsequent selection and improving decision robustness. Target behavior sequences are selected based on confidence and uncertainty thresholds. By setting a dual selection condition of "high confidence + low uncertainty," candidate sequences with a low matching degree or high risk are filtered out, ensuring that the selected target behavior sequence balances reliability and safety, avoiding high-risk decisions, and solving the problem of the model overconfidently outputting erroneous commands. Target control commands are generated based on the target behavior sequences and evaluation metrics. The selected behavior sequence is transformed into control commands that can be executed by the vehicle, and the command parameters are optimized with reference to its confidence and uncertainty scores. This achieves accurate implementation from abstract behavior sequence to specific control parameters, ensuring the smoothness and safety of control commands and improving the execution accuracy of the autonomous driving system.
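The dual "high confidence + low uncertainty" selection condition can be written as a simple filter. In the sketch below, the threshold values, the `Candidate` fields, the tie-breaking rule, and the `None` fallback when no candidate qualifies are assumptions; the description defines only the two threshold comparisons.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One candidate behavior sequence with its evaluation scores."""
    actions: List[str]
    confidence: float
    uncertainty: float


def select_target_sequence(candidates: List[Candidate],
                           min_confidence: float = 0.8,
                           max_uncertainty: float = 0.2) -> Optional[Candidate]:
    """Keep candidates above the confidence threshold AND below the uncertainty threshold.

    Among the survivors, prefer higher confidence, then lower uncertainty.
    Returns None when nothing qualifies; the caller must then fall back
    to some safe default, which this sketch does not define.
    """
    eligible = [
        c for c in candidates
        if c.confidence > min_confidence and c.uncertainty < max_uncertainty
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda c: (c.confidence, -c.uncertainty))


candidates = [
    Candidate(["keep_lane"], confidence=0.90, uncertainty=0.10),
    Candidate(["overtake"], confidence=0.95, uncertainty=0.40),  # confident but risky
    Candidate(["slow_down"], confidence=0.60, uncertainty=0.05),  # safe but poorly matched
]
chosen = select_target_sequence(candidates)
```

Note that the most confident candidate ("overtake") is rejected because its uncertainty is too high, which is exactly the overconfidence case the dual condition is meant to catch.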

[0019] In one optional implementation, a target control command is generated based on the target behavior sequence, the target confidence score corresponding to the target behavior sequence, and the target behavior sequence uncertainty score. This includes: performing feature fusion processing on the target behavior sequence, the target confidence score corresponding to the target behavior sequence, the target behavior sequence uncertainty score, the current vehicle state data, and the first local features to obtain a third fused feature; inputting the third fused feature into a fully connected multilayer perceptron, mapping it to the control parameter space through a nonlinear activation function, and outputting a triplet target control command.

[0020] The autonomous driving control method provided in this application integrates multi-dimensional information to generate a third fusion feature. It deeply integrates the target behavior sequence, its confidence and uncertainty scores, current vehicle state data, and first local scene features, combining four key types of information: decision-making scheme, risk assessment, vehicle physical state, and core environmental features. This compensates for the lack of real-time state and scene feedback in a single behavior sequence, providing a comprehensive and accurate decision-making basis for control command generation, ensuring that the commands not only conform to the planning intent but also adapt to the real-time state of the vehicle and environment. Based on a fully connected multilayer perceptron, it outputs triplet target control commands. Through a fully connected multilayer perceptron and a nonlinear activation function, the third fusion feature is mapped into standardized steering angle, throttle, and brake triplet control commands, achieving a precise conversion from abstract fusion features to physical parameters directly recognizable by the vehicle actuators. Simultaneously, the coordinated output of the triplet commands ensures the smoothness and coordination of driving actions, avoiding safety risks caused by sudden changes in a single command, and improving the stability and safety of the autonomous driving execution process.
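A minimal sketch of such a mapping is a two-layer perceptron whose three outputs are squashed into physically meaningful ranges. The single hidden layer, the ReLU hidden activation, `tanh` for steering in [-1, 1], and sigmoid for throttle and brake in [0, 1] are assumptions; the description mentions only a fully connected multilayer perceptron with a nonlinear activation function.

```python
import math
from typing import List, Sequence, Tuple


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dense(x: Sequence[float],
          weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def control_head(fused: Sequence[float],
                 w1: Sequence[Sequence[float]], b1: Sequence[float],
                 w2: Sequence[Sequence[float]], b2: Sequence[float]) -> Tuple[float, float, float]:
    """Map a fused feature vector to (steering, throttle, brake).

    `w2` must have exactly three rows, one per control output.
    """
    hidden = [relu(h) for h in dense(fused, w1, b1)]
    steer_raw, throttle_raw, brake_raw = dense(hidden, w2, b2)
    return math.tanh(steer_raw), sigmoid(throttle_raw), sigmoid(brake_raw)
```

Bounded output activations keep every command inside a valid actuator range regardless of the input, which is one simple way to avoid out-of-range commands.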

[0021] In one optional implementation, after transmitting the target control command to the actuator corresponding to the target vehicle and controlling the actuator to execute the target control command for autonomous driving, the method further includes: storing the target strategic intent of the target vehicle at the current moment, the cognitive uncertainty and accidental uncertainty corresponding to the target uncertainty, the target control command, and the target behavior sequence corresponding to the target control command; comparing the target uncertainty with a preset uncertainty threshold; if the target uncertainty is greater than the preset uncertainty threshold, generating a memory cue vector based on the target strategic intent, the cognitive uncertainty and accidental uncertainty corresponding to the target uncertainty, the target control command, and the target behavior sequence corresponding to the target control command; and generating, based on the memory cue vector, the strategic intent for the moment following the current moment.

[0022] The autonomous driving control method provided in this application stores the entire decision-making process at the current moment. It retains core decision-making information such as the target vehicle's strategic intent, the two types of uncertainty data, the control commands, and the behavior sequences at the current moment, constructing a decision-making experience database for the autonomous driving system. This provides data support for subsequent fault tracing and solution optimization and lays the foundation for experience reuse in similar scenarios, avoiding repeated trial and error. By comparing the target uncertainty with a preset threshold, the risk level of the current decision is quickly identified, distinguishing between "low-risk routine scenarios" and "high-risk complex scenarios." This provides a clear trigger for whether to activate the experience memory mechanism, so that system resources are spent on decision optimization in high-risk scenarios. In high-uncertainty scenarios, a memory cue vector is generated: when the target uncertainty exceeds the threshold, the memory cue vector is built from the current decision-making process data, providing a structured encapsulation of decision-making experience in high-risk scenarios. This vector focuses on the core risk elements (cognitive / accidental uncertainty) and the decision-making scheme, avoiding redundant data and improving the efficiency and accuracy of subsequent experience retrieval. The strategic intent for the next moment is generated based on the memory cue vector. This gives the system an experience-driven, self-correcting decision-making capability, directly guiding the generation of the strategic intent for the next moment by reusing decision-making experience from the current high-risk scenario. This reduces the probability of decision-making errors in similar high-risk scenarios and enhances the system's adaptability to long-tail and complex dynamic scenarios, strengthening the robustness and safety of the autonomous driving system.
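The threshold-triggered memory mechanism can be sketched as below. The record fields follow the items listed in the description, but the integer encodings of intent and behaviors, the threshold value, and building the cue by flat concatenation are assumptions; the description does not specify how the memory cue vector is constructed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DecisionRecord:
    """Decision data stored for one moment (encodings are hypothetical)."""
    intent_id: int                        # index of the target strategic intent
    cognitive_uncertainty: float
    accidental_uncertainty: float
    control: Tuple[float, float, float]   # (steering, throttle, brake)
    behavior_sequence: List[int]          # indices of planned behaviors
    target_uncertainty: float


def memory_cue(record: DecisionRecord, threshold: float = 0.7) -> Optional[List[float]]:
    """Build a memory cue vector only when target uncertainty exceeds the threshold.

    Returns None for low-risk moments, so no cue is passed to the next step.
    """
    if record.target_uncertainty <= threshold:
        return None
    return [
        float(record.intent_id),
        record.cognitive_uncertainty,
        record.accidental_uncertainty,
        *record.control,
        *(float(b) for b in record.behavior_sequence),
    ]


risky = DecisionRecord(2, 0.5, 0.3, (0.0, 0.1, 0.6), [1, 1, 3], target_uncertainty=0.8)
routine = DecisionRecord(0, 0.05, 0.1, (0.0, 0.4, 0.0), [0, 0, 0], target_uncertainty=0.2)
```

In this sketch only `risky` produces a cue, so the generation of the next moment's strategic intent would be conditioned on memory only after a high-uncertainty decision.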

[0023] In a second aspect, the present invention provides an automatic driving control device, the device comprising:

[0024] The acquisition module is used to acquire the current heterogeneous multi-source data corresponding to the target vehicle; the current heterogeneous multi-source data includes current visual data, current laser point cloud data, current natural language command data, and current vehicle status data;

[0025] The first generation module is used to generate the first fusion feature based on the current heterogeneous multi-source data;

[0026] The first determination module is used to identify the first fusion feature and determine the target uncertainty of the target vehicle in the current driving scenario; the target uncertainty represents the comprehensive quantitative assessment of the reliability of the autonomous driving system's own decision-making in the current driving scenario;

[0027] The second determination module is used to determine the uncertainty level of the target vehicle based on the target uncertainty; generate the target strategic intent corresponding to the uncertainty level based on the uncertainty level; the target strategic intent represents the top-level decision constraints generated under the uncertainty level corresponding to the current driving scenario;

[0028] The second generation module is used to identify the first fused features and the target strategic intent, and generate target control commands corresponding to the target vehicle; the target control commands include: steering angle control commands, throttle control commands and brake control commands;

[0029] The control module is used to transmit target control commands to the actuators corresponding to the target vehicle, and control the actuators to execute the target control commands for autonomous driving.

[0030] In a third aspect, the present invention provides an electronic device comprising a memory and a processor that are communicatively connected to each other; the memory stores computer instructions, and the processor executes the computer instructions to perform the automatic driving control method of the first aspect or any corresponding embodiment thereof.

[0031] In a fourth aspect, the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to execute the automatic driving control method of the first aspect or any corresponding embodiment thereof.

[0032] In a fifth aspect, the present invention provides a computer program product including computer instructions for causing a computer to execute the automatic driving control method of the first aspect or any corresponding embodiment thereof.

Brief Description of the Drawings

[0033] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0034] Figure 1 is a schematic flowchart of a first autonomous driving control method according to an embodiment of the present invention;

[0035] Figure 2 is a schematic flowchart of a second autonomous driving control method according to an embodiment of the present invention;

[0036] Figure 3 is a schematic diagram of the framework of an autonomous driving control system according to an embodiment of the present invention;

[0037] Figure 4 is a schematic diagram of the framework of another autonomous driving control system according to an embodiment of the present invention;

[0038] Figure 5 is a structural block diagram of an autonomous driving control device according to an embodiment of the present invention;

[0039] Figure 6 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present invention.

Detailed Description

[0040] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0041] It is understood that before using the technical solutions disclosed in the various embodiments of the present invention, users should be informed of the types, scope of use, and usage scenarios of the personal information involved in the present invention and their authorization should be obtained in accordance with relevant laws and regulations through appropriate means.

[0042] The terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.

[0043] According to an embodiment of the present invention, an embodiment of an autonomous driving control method is provided. It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions. Furthermore, although a logical order is shown in the flowchart, in some cases, the steps shown or described may be executed in a different order than that shown here.

[0044] This embodiment provides an autonomous driving control method that can be used in an electronic device in a target vehicle. Figure 1 is a flowchart of an autonomous driving control method according to an embodiment of the present invention. As shown in Figure 1, the process includes the following steps:

[0045] Step S101: Obtain the current heterogeneous multi-source data corresponding to the target vehicle.

[0046] The current heterogeneous multi-source data includes current visual data, current laser point cloud data, current natural language command data, and current vehicle status data.

[0047] Specifically, the target vehicle is equipped with six multi-view cameras covering six directions: forward view (main camera + wide-angle), left front side view, right front side view, left rear side view, right rear side view, and rear view, forming 360° field-of-view coverage without blind spots. Using these cameras, the electronic device can capture visual information about the environment in real time, including road structure (such as lane width and road surface smoothness), traffic participants (such as vehicles, pedestrians, and non-motorized vehicles), lane markings (such as solid lines, dashed lines, and directional arrows), traffic signs (such as prohibition, instruction, and warning signs), and falling objects, construction barriers, and other dynamic obstacles. The cameras output a raw RGB image stream, with resolution and frame rate adjusted adaptively according to the hardware configuration, which yields the current visual data.

[0048] The target vehicle is equipped with a 32- to 128-line mechanical or solid-state LiDAR with a sampling frequency of no less than 10Hz. The mechanical or solid-state LiDAR emits laser beams into the environment and calculates the distance and angle between obstacles and the vehicle (target vehicle) by receiving reflected signals. This generates high-precision 3D current laser point cloud data, acquiring spatial geometric information of objects in the environment, including size, position, and shape. It provides reliable depth perception and obstacle contour recognition capabilities in scenarios where pure vision systems are prone to failure, such as low light, backlight, rain, fog, and visual blur.

[0049] The electronic devices in the target vehicle can receive two types of user commands via voice interaction or touch input interfaces: 1. Navigation target commands: such as "go to the nearest gas station" or "navigate to the underground parking lot of XX shopping mall"; 2. Driving style preference commands: such as "drive smoothly" or "prioritize pedestrian avoidance". The electronic devices can use Natural Language Processing (NLP) algorithms to segment, tag, and identify intent in the commands, transforming unstructured natural language into structured high-level intent constraints (such as "target point coordinates: X, Y; driving constraints: vehicle speed ≤ 50 km / h, pedestrian avoidance priority is the highest"), thus obtaining the current natural language command data.

[0050] In addition, the electronic equipment can also collect the vehicle dynamics parameters and motion status of the target vehicle in real time via the vehicle's CAN bus, specifically including current vehicle speed, longitudinal / lateral acceleration, steering wheel angle, yaw rate, brake pedal opening, and accelerator pedal opening. Then, the electronic equipment can filter and denoise the collected parameters to eliminate sensor noise and signal delay, ensuring data accuracy and thus obtaining the current vehicle status data.

[0051] Step S102: Generate the first fusion feature based on the current heterogeneous multi-source data.

[0052] Specifically, the electronic device can input the current visual data (i.e., RGB images) into a pre-trained Vision Transformer (ViT) model. The ViT model divides the input RGB image uniformly into fixed-size image patches, with a default size of 16×16 pixels; the patch size can be increased (e.g., to 32×32 pixels) according to the available computing resources, which reduces the computational cost for high-resolution images. The pixel values of each image patch are then converted into a one-dimensional vector, positional encoding information (representing the position of the patch in the original image) is added, and the result is input into the encoder layers of the ViT model. A multi-head self-attention mechanism captures the global correlations between image patches, and the model finally outputs a high-dimensional visual feature vector $F_{vision}$. This vector contains high-level semantic information such as object category, scene layout, and traffic status.
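
As an illustration of the patch-splitting step described above, the following minimal Python sketch divides an image into fixed-size patches and flattens each patch into a one-dimensional vector. The function name `split_into_patches` and the nested-list image layout (height × width × channels) are assumptions made for this example. Positional encoding and the encoder itself are omitted, and a practical system would rely on a pre-trained ViT implementation rather than this code.

```python
def split_into_patches(image, patch_size=16):
    """Split an H x W x C nested-list image into flattened square patches.

    Patches are returned in row-major order (left to right, top to bottom),
    so the list index of a patch also encodes its position in the image.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            for y in range(top, top + patch_size):
                for x in range(left, left + patch_size):
                    flat.extend(image[y][x])  # append all channel values
            patches.append(flat)
    return patches
```

With a 32×32 RGB image and the default patch size, this yields four vectors of length 16 × 16 × 3 = 768. Doubling the patch size to 32 would yield a single vector, which is the trade-off between sequence length and patch granularity mentioned above.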

[0053] Electronic devices can use the PointNet++ architecture to extract features from current laser point cloud data. This PointNet++ architecture is specifically designed for unstructured 3D point cloud data and supports local feature aggregation and multi-scale feature extraction.

[0054] The PointNet++ architecture samples the current laser point cloud data (downsampling to reduce the number of points and upsampling to fill in sparse regions) and groups it (dividing spatially adjacent points into local point sets), which reduces computational cost and highlights local geometric features. Then, through multi-level set abstraction operations, features are extracted from each local point set, progressively aggregating local geometric features (such as the center point, normal vector, and curvature of the point set) with global context features to generate 3D point cloud features. These 3D point cloud features are projected onto the bird's-eye view (BEV) plane and transformed into a 2D feature map, achieving spatial alignment with the visual features, and the point cloud feature vector $F_{point}$ is finally output.

[0055] Electronic devices can use pre-trained BERT-like models (such as RoBERTa and ALBERT) to extract features from current natural language instruction data, leveraging their semantic representation capabilities learned on large-scale text corpora.

[0056] The pre-trained BERT-like model can segment the structured current natural language instruction data, breaking continuous text into independent words (e.g., "go to the nearest gas station" is broken down into "go to / nearest / of / gas station"), and add special markers ([CLS] indicates the beginning of a sentence, [SEP] indicates the end of a sentence). Then, the segmented words are converted into fixed-dimensional word vectors, which are input into the encoder layers of the BERT model. A multi-head self-attention mechanism captures the semantic relationships between words, and the model finally outputs a language semantic feature vector $F_{language}$. This vector contains core intent information such as the navigation target and driving constraints.

[0057] The electronic device can normalize each item of sub-state data in the current vehicle state data, and then generate a vehicle state feature vector $F_{vehicle}$ based on the normalized sub-state data.

[0058] The electronic device can introduce four learnable linear projection matrices $W_{vision}$, $W_{point}$, $W_{language}$, and $W_{vehicle}$, which apply a linear transformation to the visual feature vector $F_{vision}$, the point cloud feature vector $F_{point}$, the language feature vector $F_{language}$, and the vehicle state feature vector $F_{vehicle}$, respectively. This maps the four types of features to the same high-dimensional semantic space and eliminates the differences in feature distribution between modalities. Then, the electronic device performs a weighted sum of the four mapped features. The weight coefficients are learned automatically during model training, realizing dynamic weight allocation for features of different modalities. For example, in strong light scenes, the weight of the visual features is reduced while the weight of the point cloud features is increased.

[0059] To avoid gradient vanishing or exploding during feature fusion, residual connections are introduced (adding the original features to the fused features), and layer normalization is used to unify the feature distribution, eliminating redundant information and feature conflicts between modalities.

[0060] Then, the electronic device generates a unified first fusion feature F1, which is calculated using the following formula: [formula not reproduced in the source text].
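
The fusion steps described above (per-modality linear projection, weighted summation, residual connection, and layer normalization) can be sketched in Python as follows. This is a minimal sketch, not the formula of the application. The function names, the softmax normalization of the modality weights, and the choice of the mean projected feature as the residual term are all assumptions made for this example, and in practice the projection matrices and weights would be learned during training.

```python
import math


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Turn raw modality scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(vector, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(vector) / len(vector)
    var = sum((x - mean) ** 2 for x in vector) / len(vector)
    return [(x - mean) / math.sqrt(var + eps) for x in vector]


def fuse_modalities(features, projections, modality_logits):
    """Fuse per-modality feature vectors into one vector.

    features:        {name: vector}, dimensions may differ per modality
    projections:     {name: matrix} mapping each vector to a shared dimension
    modality_logits: {name: float}, hypothetical learnable weight scores
    """
    names = list(features)
    projected = {n: matvec(projections[n], features[n]) for n in names}
    weights = softmax([modality_logits[n] for n in names])
    dim = len(projected[names[0]])

    weighted_sum = [
        sum(w * projected[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]
    # Residual term (assumption): the unweighted mean of the projected
    # features, added back so no modality is fully suppressed.
    residual = [sum(projected[n][i] for n in names) / len(names)
                for i in range(dim)]
    return layer_norm([a + b for a, b in zip(weighted_sum, residual)])
```

Lowering the score of one modality in `modality_logits` (for example, vision in strong light) reduces its share of the weighted sum while the other modalities gain weight, which mirrors the dynamic weight allocation described above.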

[0061] Step S103: Identify the first fusion feature to determine the target uncertainty of the target vehicle in the current driving scenario.

[0062] Here, the target uncertainty represents a comprehensive quantitative assessment of the reliability of the autonomous driving system's own decision-making in the current driving scenario. The target uncertainty includes cognitive uncertainty and/or random uncertainty. Cognitive uncertainty is the decision-making uncertainty that arises from the limits of the autonomous driving system's own knowledge or from out-of-distribution (OOD) conditions; it reflects how ambiguous the model's understanding of the current scenario is and how much its judgments diverge. Random uncertainty is the decision-making uncertainty caused by irreducible factors such as sensor noise, environmental interference, or the randomness of target motion; it reflects the inherent interference of the environment itself and cannot be eliminated by optimizing the model or supplementing the training data.

[0063] Specifically, the electronic device can extract features from the first fused features and, based on the extracted features, determine the target uncertainty of the target vehicle in the current driving scenario.

[0064] This step will be explained in detail below.

[0065] Step S104: Based on the target uncertainty, determine the uncertainty level corresponding to the target vehicle; based on the uncertainty level, generate the target strategic intent corresponding to the uncertainty level.

[0066] Here, the target strategic intent represents the top-level decision constraints generated under the uncertainty level corresponding to the current driving scenario.

[0067] Specifically, electronic devices can classify uncertainty into three levels based on the magnitude of the uncertainty and the requirements of autonomous driving scenarios. For example, Table 1 below illustrates the uncertainty levels.

[0068] Table 1 shows the uncertainty levels.

[0069]

[0070] Optionally, if a key safety parameter is abnormal, such as tire pressure below the threshold or a brake system failure, the uncertainty level can be raised directly by one level (for example, an original "low uncertainty" is revised to "medium uncertainty") as a safety fallback.
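
The three-level classification and the one-level escalation on a safety fault can be sketched as follows. The boundaries of Table 1 are not available in this text, so the thresholds 0.3 and 0.7, the assumption that the uncertainty value lies in [0, 1], and the function name `uncertainty_level` are all hypothetical and serve only to illustrate the mechanism.

```python
LEVELS = ("low", "medium", "high")


def uncertainty_level(u, low_max=0.3, medium_max=0.7, safety_fault=False):
    """Map an uncertainty value to one of three levels.

    low_max and medium_max are placeholder thresholds (assumptions), not
    the values of Table 1. If a key safety parameter is abnormal, the
    level is raised by one step and capped at "high".
    """
    if not 0.0 <= u <= 1.0:
        raise ValueError("uncertainty is assumed to lie in [0, 1]")

    if u < low_max:
        index = 0
    elif u < medium_max:
        index = 1
    else:
        index = 2

    if safety_fault:
        index = min(index + 1, len(LEVELS) - 1)
    return LEVELS[index]
```

Under these placeholder thresholds, a value of 0.2 maps to "low", and the same value with `safety_fault=True` maps to "medium".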

[0071] Then, the electronic device can filter the key scene information that is strongly related to strategic intent from the first fusion feature. Key scene information falls into three categories: 1. Environmental perception information: types of obstacles ahead (pedestrians, vehicles, non-motorized vehicles, etc.), obstacle motion status (stationary, moving at a constant speed, accelerating, or decelerating), lane line integrity, traffic light status, etc.; 2. User instruction information: instruction type (such as path planning, speed adjustment, or lane changing and overtaking), instruction priority (whether the instruction is mandatory or non-mandatory), and whether the instruction is quantified (for example, "speed limit 60 km/h" is a quantified instruction, while "drive faster" is a non-quantified instruction); 3. Vehicle status information: current vehicle speed, acceleration, remaining range, braking system or steering system status, etc. (i.e., the normalized vehicle status feature vector).

[0072] For example, the key scene information extracted from the first fusion feature is: a stationary obstacle 50 m ahead + current vehicle speed of 60 km/h + user command "go straight through" + visual recognition confidence of 90% + normal vehicle status. Combined with the uncertainty value U=0.2, the scene is determined to be a low-uncertainty scene.

[0073] Then, the electronic device can determine the target strategic intent based on a predefined rule base that maps uncertainty levels and key scene information to target strategic intents. The rule base covers all scenarios and assigns a different decision priority to each level. It adopts a "condition-action" (IF-THEN) format and is organized hierarchically by uncertainty level.

[0074] For example, a subset of rules for low uncertainty level (efficiency priority) is: uncertainty level = low, environmental information = clear lane lines + no obstacles, instruction information = maintain current speed, then the target strategic intent = constant speed cruise, maintain lane, maintain current speed.

[0075] The subset of rules for medium uncertainty level (safety-efficiency balance) is as follows: uncertainty level = medium, environmental information = rainy weather + unclear pedestrian walking trajectory + distance 30m, vehicle status = current vehicle speed 50km / h, then the target strategic intent = lightly apply the brakes to reduce speed to 30km / h, maintain a safe distance, continuously track the pedestrian trajectory, and resume the original speed after confirming that there is no risk.

[0076] The high uncertainty level (safety priority) rule subset is: uncertainty level = high, environmental information = road construction + sensor obstruction + obstacle location unknown; then the target strategic intent = emergency deceleration to 10km / h, turn on hazard lights, continuously scan the environment, and if the risk is not eliminated, pull over and request manual takeover.
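
The three rule subsets above can be illustrated with a small IF-THEN lookup. This is a sketch under stated assumptions: the `Rule` structure, the tag names used for scene facts and intents, and the conservative fallback intent are hypothetical, and an actual rule base would contain many more rules with quantitative conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    level: str             # uncertainty level the rule applies to
    conditions: frozenset  # scene facts that must all be present (IF)
    intent: tuple          # resulting strategic intent (THEN)


RULES = [
    Rule("low",
         frozenset({"clear_lane_lines", "no_obstacles", "cmd_maintain_speed"}),
         ("constant_speed_cruise", "keep_lane", "maintain_current_speed")),
    Rule("medium",
         frozenset({"rain", "pedestrian_trajectory_unclear"}),
         ("brake_lightly_to_30_kmh", "keep_safe_distance",
          "track_pedestrian_trajectory")),
    Rule("high",
         frozenset({"road_construction", "sensor_obstruction",
                    "obstacle_location_unknown"}),
         ("emergency_decelerate_to_10_kmh", "hazard_lights_on",
          "scan_environment")),
]

# Hypothetical default when no rule matches: act conservatively.
FALLBACK_INTENT = ("slow_down", "keep_lane", "request_manual_takeover")


def select_intent(level, scene_facts):
    """Return the intent of the first rule whose level and conditions match."""
    for rule in RULES:
        if rule.level == level and rule.conditions <= scene_facts:
            return rule.intent
    return FALLBACK_INTENT
```

Keeping the uncertainty level as the first matching key reflects the hierarchical design: the same scene facts can lead to different intents depending on the level.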

[0077] Step S105: Identify the first fusion feature and the target strategic intent, and generate the target control command corresponding to the target vehicle.

[0078] The target control commands include: steering angle control commands, throttle control commands, and brake control commands.

[0079] Specifically, electronic devices can generate target behaviors based on the target's strategic intent. Then, based on the target behaviors, they can generate target control commands corresponding to the target vehicle.

[0080] This step will be explained in detail below.

[0081] Step S106: The target control command is transmitted to the actuator corresponding to the target vehicle, and the actuator is controlled to execute the target control command for autonomous driving.

[0082] Specifically, the electronic device can send the target control commands to the vehicle's ECU (Electronic Control Unit), EPS (Electric Power Steering), ESC (Electronic Stability Control), and other execution modules via the vehicle's CAN bus or Ethernet. The execution modules then act according to the command parameters (e.g., the EPS adjusts the steering wheel according to the steering angle command, and the braking system decelerates according to the braking force command). Vehicle status sensors collect the execution results in real time (e.g., actual vehicle speed, actual steering wheel angle) and feed them back to the decision-making level. If the deviation from the command exceeds a threshold (e.g., the command is to decelerate to 30 km/h, but the actual speed is 35 km/h), a command correction is triggered.
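
The feedback check at the end of this step can be sketched as a simple comparison between the commanded and measured values. The function name `command_correction` and the tolerance of 2 km/h are assumptions made for this example; the text does not specify the threshold or how the correction itself is computed.

```python
def command_correction(commanded_kmh, measured_kmh, tolerance_kmh=2.0):
    """Return the signed speed error when it exceeds the tolerance.

    A positive value means the vehicle is faster than commanded. None
    means the deviation is within tolerance and no correction is needed.
    """
    error = measured_kmh - commanded_kmh
    if abs(error) > tolerance_kmh:
        return error
    return None
```

For the example in the text, a command of 30 km/h with a measured speed of 35 km/h gives an error of 5 km/h, which exceeds the assumed tolerance and would trigger a correction.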

[0083] The autonomous driving control method provided in this application acquires the current heterogeneous multi-source data corresponding to the target vehicle, collecting the basic information required for autonomous driving decision-making across four dimensions: visual environment, three-dimensional space, user intent, and the vehicle's own state. This avoids decision bias caused by the absence or failure of a single data source and lays the data foundation for subsequent multimodal fusion and accurate decision-making. A first fusion feature is generated from the current heterogeneous multi-source data. Multimodal fusion transforms the heterogeneous data into a unified feature vector, removing the heterogeneity barriers between data sources and mining the deep semantic relationships among vision, laser point clouds, language commands, and vehicle state. This improves the feature's ability to represent complex scenes and avoids both the interference of low-quality data in early fusion and the semantic fragmentation of late fusion. The target uncertainty is determined from the first fusion feature. By classifying and quantifying the two types of uncertainty, the method addresses the core defect of "overconfidence" in existing systems: it identifies the cognitive uncertainty caused by gaps in model knowledge and assesses the random uncertainty caused by environmental interference, providing risk quantification for decision-making and reducing the probability of high-risk decisions in out-of-distribution scenarios. The target strategic intent is then determined from the target uncertainty, so that uncertainty becomes a core element of decision-making and risk is dynamically matched to strategy. Target control commands are generated based on the target strategic intent, transforming the abstract strategic intent into standardized control parameters (steering, throttle, brake) that the vehicle's actuators can recognize directly. This achieves precise implementation from "high-level decision-making" to "low-level execution," ensuring smooth and consistent driving actions and avoiding abrupt behavioral changes. Finally, the target vehicle is controlled based on the target control commands, translating the decision results into actual driving behavior. Combined with the uncertainty assessment and strategy optimization of the preceding steps, this ensures safe and stable operation of the vehicle in complex open scenarios and improves the reliability of the autonomous driving system and user trust.

[0084] This embodiment provides an autonomous driving control method that can be used in an electronic device in a target vehicle. Figure 2 is a flowchart of an autonomous driving control method according to an embodiment of the present invention. As shown in Figure 2, the process includes the following steps:

[0085] Step S201: Obtain the current heterogeneous multi-source data corresponding to the target vehicle.

[0086] The current heterogeneous multi-source data includes current visual data, current laser point cloud data, current natural language command data, and current vehicle status data.

[0087] Please refer to the above description of step S101 for details on this step, which will not be repeated here.

[0088] Step S202: Generate the first fusion feature based on the current heterogeneous multi-source data.

[0089] Please refer to the above description of step S102 for details on this step, which will not be repeated here.

[0090] Step S203: Identify the first fusion feature to determine the target uncertainty of the target vehicle in the current driving scenario.

[0091] Here, the target uncertainty represents a comprehensive quantitative assessment of the reliability of the autonomous driving system's own decision-making in the current driving scenario. The target uncertainty includes cognitive uncertainty and/or random uncertainty. Cognitive uncertainty is the decision-making uncertainty that arises from the limits of the autonomous driving system's own knowledge or from out-of-distribution (OOD) conditions; it reflects how ambiguous the model's understanding of the current scenario is and how much its judgments diverge. Random uncertainty is the decision-making uncertainty caused by irreducible factors such as sensor noise, environmental interference, or the randomness of target motion; it reflects the inherent interference of the environment itself and cannot be eliminated by optimizing the model or supplementing the training data.

[0092] Specifically, step S203 above may include the following steps:

[0093] Step S2031: Obtain the historical context sequence and the historical similarity sequence corresponding to the historical context sequence within a preset time period before the current time for the target vehicle.

[0094] The historical context sequence includes historical visual data, historical laser point cloud data, historical natural language command data, and historical vehicle status data corresponding to each historical moment; the historical similarity sequence is used to characterize the similarity between the historical heterogeneous multi-source data corresponding to each historical moment and the current heterogeneous multi-source data corresponding to the current moment.

[0095] Specifically, step S2031 above may include the following steps:

[0096] Step a1: Store the historical visual data, historical laser point cloud data, historical natural language command data, and historical vehicle status data corresponding to the target vehicle at each historical moment.

[0097] Specifically, after acquiring the current visual data, current laser point cloud data, current natural language command data, and current vehicle status data corresponding to the target vehicle each time, the electronic device can store the acquired current visual data, current laser point cloud data, current natural language command data, and current vehicle status data to the corresponding storage space of the electronic device. This enables the storage of historical visual data, historical laser point cloud data, historical natural language command data, and historical vehicle status data corresponding to the target vehicle at various historical moments.

[0098] Step a2: Calculate the similarity between the current visual data, current laser point cloud data, current natural language command data, and current vehicle status data and the historical visual data, historical laser point cloud data, historical natural language command data, and historical vehicle status data corresponding to each historical moment, and generate a historical similarity sequence corresponding to the historical context sequence.

[0099] Specifically, for each historical moment, the electronic device can calculate the visual similarity between the historical visual data corresponding to the historical moment and the current visual data corresponding to the current moment; calculate the laser point cloud similarity between the historical laser point cloud data corresponding to the historical moment and the current laser point cloud data corresponding to the current moment; calculate the natural language similarity between the historical natural language command data corresponding to the historical moment and the current natural language command data corresponding to the current moment; and simultaneously calculate the vehicle state similarity between the historical vehicle state data corresponding to the historical moment and the current vehicle state data corresponding to the current moment.

[0100] Then, the electronic device can fuse the calculated visual similarity, laser point cloud similarity, natural language similarity, and vehicle state similarity to generate historical similarity for each historical moment. Finally, the historical similarities for each historical moment are combined to generate a historical similarity sequence corresponding to the historical context sequence.
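
Steps a1 and a2 can be sketched as follows, assuming that each modality has already been encoded as a numeric feature vector. The use of cosine similarity as the per-modality measure, the weighted average as the fusion rule, and the function names are assumptions made for this example; the text does not fix a particular similarity measure or fusion method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def historical_similarity_sequence(current, history, weights=None):
    """Compute one fused similarity per historical moment.

    current: {modality: vector} for the current moment
    history: list of {modality: vector}, one entry per historical moment
    weights: optional {modality: float}; defaults to an equal-weight average
    """
    modalities = list(current)
    if weights is None:
        weights = {m: 1.0 / len(modalities) for m in modalities}

    sequence = []
    for snapshot in history:
        fused = sum(
            weights[m] * cosine_similarity(current[m], snapshot[m])
            for m in modalities
        )
        sequence.append(fused)
    return sequence
```

A historical moment whose four modality vectors all point in the same direction as the current ones receives a fused similarity close to 1, while a moment that is orthogonal in every modality receives 0.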

[0101] Step S2032: The first fusion feature, the historical context sequence, and the historical similarity sequence are fused to generate the second fusion feature.

[0102] Specifically, the electronic device can multiply each historical context feature in the historical context sequence by the corresponding similarity feature in the historical similarity sequence to obtain a weighted feature, and generate a weighted feature sequence from the weighted features. For example, each historical context feature $h_i$ is multiplied by its corresponding similarity feature $w_i$ to obtain the weighted feature $h_i' = h_i \times w_i$.

[0103] Then, the electronic device can use a temporal encoding network (such as an LSTM or a Transformer encoder) to encode the weighted feature sequence $[h_1', h_2', h_3']$ and generate a fixed-dimensional historical experience feature vector $F_h$, whose dimension is consistent with that of the first fusion feature F1. The temporal encoding network captures the evolution patterns of historical scenes, such as "the trajectory change of a pedestrian from stationary to crossing in a highly similar scene," and ensures that the more similar a historical scene is, the greater its contribution to $F_h$.

[0104] Then, the electronic device can compute, through an attention layer, the attention weight α (α∈[0,1]) between the first fusion feature F1 and the historical experience feature $F_h$. The attention weight reflects the reference value of historical experience to the current scene (e.g., when the uncertainty of the current scene is high, α increases, and the weight of historical experience increases). Then, based on the attention weight α, the first fusion feature F1 and the historical experience feature $F_h$ are fused to obtain the second fusion feature F2. The feature fusion formula is: $F2 = (1-\alpha) \times F1 + \alpha \times F_h + \mathrm{Res}(F1)$. Here, Res(F1) is a residual connection that avoids losing the core features of the current scene during fusion. F2 is the final second fusion feature, with the same dimension as the first fusion feature F1, and contains both "current real-time perception" and "highly similar historical experience" information.
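
Step S2032 can be sketched end to end as follows. Several simplifications are assumptions made for this example: mean pooling stands in for the LSTM or Transformer encoder, the attention weight α is passed in as a number instead of being computed by an attention layer, and Res(F1) is taken to be the identity residual F1. The function names are hypothetical.

```python
def weight_history(history_features, similarities):
    """Scale each historical feature vector h_i by its similarity w_i."""
    return [[w * x for x in h] for h, w in zip(history_features, similarities)]


def encode_history(weighted_features):
    """Stand-in temporal encoder (assumption): element-wise mean pooling."""
    count = len(weighted_features)
    return [sum(column) / count for column in zip(*weighted_features)]


def second_fusion(f1, f_h, alpha):
    """Compute F2 = (1 - alpha) * F1 + alpha * F_h + Res(F1).

    Res(F1) is assumed here to be the identity residual, i.e. F1 itself.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(f1) != len(f_h):
        raise ValueError("F1 and F_h must have the same dimension")
    return [(1.0 - alpha) * a + alpha * b + a for a, b in zip(f1, f_h)]
```

With α = 0 the result depends only on the current scene, and as α grows the historical experience feature contributes more, while the residual term keeps the current-scene features present in F2 for any α.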

[0105] Step S2033: Perform independent forward propagation processing on the second fusion feature to generate at least one preliminary strategic intent and the first intent uncertainty corresponding to each preliminary strategic intent.

[0106] The initial strategic intent includes all possible driving strategies. The initial strategic intent package can be classified by action type as follows: speed control, such as constant speed cruise, slowing down, emergency braking, and accelerating to overtake; directional control, such as keeping the lane, changing lanes to the left or right, and parking on the side of the road; and auxiliary actions, such as turning on hazard lights, sounding the horn, and requesting manual intervention.

[0107] Specifically, electronic devices can build independent forward propagation branches for each initial strategic intent. These forward propagation branches include a shared underlying feature extraction layer and a unique upper-level decision-making layer, ensuring that the uncertainty calculations for each initial strategic intent are independent of each other and avoiding cross-intent risk interference.

[0108] The shared bottom feature extraction layer consists of 2 to 3 fully connected layers. The input is the second fusion feature F2, and the output is the shared feature vector. The dimension of the shared feature vector is fixed, such as 256 dimensions, and it is responsible for extracting the core scene features in the second fusion feature F2.

[0109] The upper-level decision-making layer assigns an independent fully connected branch to each initial strategic intent. For example, eight initial strategic intents correspond to eight branches. Each branch includes: an intent output layer, which outputs the execution probability of the initial strategic intent, ranging from 0 to 1 and reflecting the degree to which the intent is suitable for the current scenario; and an uncertainty output layer, which outputs the first intent uncertainty corresponding to the initial strategic intent, ranging from 0 to 1. The larger the value, the higher the decision risk of the intent.

[0110] The second fusion feature F2 is input into the forward propagation branch network and passed through the shared bottom feature extraction layer to generate the shared feature vector F_share. The dedicated upper-level decision-making branch of the k-th initial strategic intent takes F_share as input and outputs the execution probability P_k through the Sigmoid activation function: P_k = Sigmoid(W_k·F_share + b_k), where W_k and b_k are the weights and biases of this branch.

[0111] Then, the Monte Carlo dropout method is used to quantify the uncertainty by simulating the output fluctuation under network parameter perturbation. A dropout layer is added to the branch layer, and the forward propagation is repeated N times (e.g., 10 times) to obtain N execution probabilities. The variance σ_k^2 of these N probabilities is then calculated and normalized to the 0~1 range to obtain the first intent uncertainty. The larger the variance, the more unstable the network's decision-making regarding the intent (i.e., the higher the uncertainty), reflecting the execution risk of the intent.
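The Monte Carlo dropout procedure for one branch can be sketched as follows. This is a minimal illustration, not the patented network: the function name, the dropout rate of 0.2, and the toy feature values are assumptions. The normalization divides by 0.25, the largest possible variance of values confined to [0, 1]; the text does not say which normalization is used.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mc_dropout_uncertainty(
    f_share: Sequence[float],
    w_k: Sequence[float],
    b_k: float,
    n_passes: int = 10,
    drop_rate: float = 0.2,  # assumption: not specified in the text
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Return (mean execution probability, variance normalized to [0, 1])."""
    rng = rng or random.Random(0)
    keep = 1.0 - drop_rate
    probs: List[float] = []
    for _ in range(n_passes):
        z = b_k
        for f, w in zip(f_share, w_k):
            # Inverted dropout: keep a feature with probability `keep` and rescale it.
            if rng.random() < keep:
                z += w * f / keep
        probs.append(sigmoid(z))
    mean = sum(probs) / n_passes
    variance = sum((p - mean) ** 2 for p in probs) / n_passes
    # Values confined to [0, 1] have variance at most 0.25, so this lands in [0, 1].
    return mean, min(1.0, variance / 0.25)


p_mean, u_first = mc_dropout_uncertainty([0.5, -1.0, 2.0], [1.0, 0.5, 0.8], b_k=0.1)
```

With `drop_rate=0.0` every pass is identical, so the variance collapses to zero, which is the expected behavior for a branch whose output does not fluctuate.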

[0112] Optionally, the electronic device can set an execution probability threshold, such as 0.5, and retain only the initial strategic intents with P_k ≥ 0.5 as the "preliminary strategic intents"; each preliminary strategic intent is bound to its corresponding first intent uncertainty, forming an "intent-uncertainty" pair.

[0113] Step S2034: Based on the first intent uncertainty corresponding to each preliminary strategic intent, determine the cognitive uncertainty corresponding to the target vehicle.

[0114] Specifically, the higher the execution probability of an initial strategic intent, the greater its contribution to cognitive uncertainty: high-probability intents are core candidates for decision-making, and their uncertainty better reflects the knowledge gaps in the model. Therefore, the electronic device can extract the execution probability set P = {P_1, P_2, ..., P_n} of the initial strategic intents, where P_k is the execution probability of the k-th initial strategic intent. Then, the execution probabilities are normalized to obtain the weighting coefficients (the coefficient formula and the constraint it satisfies are not reproduced in this text).

[0115] Then, the electronic device calculates the cognitive uncertainty U_epistemic based on the weighting coefficients corresponding to each initial strategic intent (the formula is not reproduced in this text). Finally, U_epistemic is mapped to the 0~1 range (0 = no cognitive blind spot, 1 = no matching knowledge at all).
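One plausible reading of this aggregation is sketched below. Both formulas are assumptions, since the text does not reproduce them: the weighting coefficient is taken as each execution probability divided by the sum of all probabilities, and U_epistemic as the weighted sum of the first intent uncertainties. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def epistemic_uncertainty(
    probs: Sequence[float],
    first_uncertainties: Sequence[float],
) -> Tuple[List[float], float]:
    """Return (weighting coefficients, U_epistemic clipped to [0, 1])."""
    if len(probs) != len(first_uncertainties):
        raise ValueError("one uncertainty is required per execution probability")
    total = sum(probs)
    if total <= 0.0:
        raise ValueError("execution probabilities must have a positive sum")
    # Assumption: sum-normalization, so the coefficients add up to 1.
    weights = [p / total for p in probs]
    # Assumption: probability-weighted sum of the first intent uncertainties.
    u = sum(w * s for w, s in zip(weights, first_uncertainties))
    return weights, min(1.0, max(0.0, u))


weights, u_epistemic = epistemic_uncertainty([0.9, 0.6], [0.1, 0.4])
```

Under these assumptions a high-probability intent pulls U_epistemic toward its own uncertainty, which is the behavior the paragraph on weighting describes.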

[0116] Step S2035: Generate target uncertainty based on cognitive uncertainty.

[0117] In one alternative implementation, the electronic device can take the cognitive uncertainty as the target uncertainty.

[0118] Specifically, step S2035 above may include the following steps:

[0119] Step b1: Process the second fusion features based on the heteroscedasticity regression head to generate the second intent uncertainty corresponding to each preliminary strategic intent.

[0120] Specifically, the heteroscedastic regression head needs to be connected to the shared bottom feature extraction layer in step S2033, and the input is the second fusion feature F2.

[0121] The feature mapping layer of the heteroscedastic regression head maps the second fusion feature F2 of dimension D to a high-dimensional feature space through 1-2 fully connected layers, outputting a high-dimensional feature vector F_aleo whose dimension is, for example, 128.

[0122] The output layer of the heteroscedastic regression head contains two parallel output branches, both employing linear activation functions. The mean branch outputs μ_k, representing the mean of the optimal execution parameter of the k-th initial strategic intent (e.g., the mean for the "slow down" intent is a target vehicle speed of 30 km/h); the variance branch outputs σ_k^2, representing the variance of the execution parameter of the k-th initial strategic intent. A larger variance indicates higher sensor data noise corresponding to that intent. It should be noted that the variance branch must ensure its output is non-negative, which can be enforced using the Softplus function.

[0123] To ensure that the variance output accurately reflects data noise, the training phase uses the negative log-likelihood loss (NLL Loss) to constrain the dual-output layer (the loss formula is not reproduced in this text); its label term is the actual execution parameter of the k-th intent, collected from real vehicle tests. This loss function forces the model to output a larger σ_k^2 in scenarios with high data noise.
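Since the loss formula is missing from the text, the sketch below uses the standard Gaussian negative log-likelihood as an assumption, with the constant term dropped and the variance passed through Softplus. The function names and the small `eps` stabilizer are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x); the result is always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gaussian_nll(y: float, mu: float, raw_var: float, eps: float = 1e-6) -> float:
    """Per-intent loss: 0.5 * ln(var) + (y - mu)^2 / (2 * var), var = Softplus(raw_var)."""
    var = softplus(raw_var) + eps  # eps guards against division by a vanishing variance
    return 0.5 * math.log(var) + (y - mu) ** 2 / (2.0 * var)


# Label 30 km/h, predicted mean 20 km/h: a large error is penalized less
# when the head also reports a large variance.
loss_confident = gaussian_nll(30.0, 20.0, raw_var=1.0)
loss_uncertain = gaussian_nll(30.0, 20.0, raw_var=50.0)
```

The comparison shows the mechanism the paragraph describes: when the prediction error is large, a larger predicted variance lowers the loss, so noisy scenarios push the variance branch upward.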

[0124] Finally, the electronic device can take the variance σ_k^2 output by the heteroscedastic regression head as the second intent uncertainty of the k-th preliminary strategic intent. After normalization (mapping to the 0~1 interval), the final set of second intent uncertainties is obtained.

[0125] Step b2: Determine the accidental uncertainty corresponding to the target vehicle based on the second intent uncertainty corresponding to each initial strategic intent.

[0126] Specifically, electronic devices can adopt a maximum variance priority + weighted fusion strategy. The core logic is that the scenario with the greatest data noise contributes the most to accidental uncertainty (e.g., the visual data corresponding to a certain intention has extremely high noise, and the impact of this noise on the overall decision needs to be considered).

[0127] Specifically, the electronic device can obtain the weight coefficient W_k corresponding to each preliminary strategic intent based on step b1.

[0128] Then, the electronic device can introduce a noise amplification factor β (β≥1, such as 1.2) to magnify each second intent uncertainty and highlight the effects of high noise.

[0129] Then, the amplified second intent uncertainties are combined by weighted calculation to obtain the random uncertainty U_aleatoric. Finally, U_aleatoric is mapped to the 0~1 range, where 0 indicates no noise in the data and 1 indicates extremely high noise in the data.

[0130] Optionally, if there are clear noise sources in the current scene, such as rain, nighttime, or sensor occlusion, the noise level can be determined directly using visual data quality scoring, and U_aleatoric is increased by 30% to 60%. The noise in such scenarios is a typical example of random uncertainty, and risk assessment needs to be strengthened.

[0131] Step b3: Fuse the cognitive uncertainty and the accidental uncertainty to generate the target uncertainty.

[0132] Specifically, electronic devices can dynamically adjust the weighting coefficients of cognitive uncertainty and random uncertainty according to the risk preferences of autonomous driving scenarios. The core principle is: in safety-first scenarios, such as congested urban roads and school zones, the weight of cognitive uncertainty is increased, as the risk of knowledge gaps is higher than that of data noise; in efficiency-first scenarios, such as high-speed constant-speed driving, the weight of random uncertainty is increased, as the impact of data noise is more direct.

[0133] For example, define the basic weight coefficients w_epi (cognitive uncertainty weight) and w_aleo (random uncertainty weight), satisfying w_epi + w_aleo = 1. Example of a safety-first scenario: w_epi = 0.6, w_aleo = 0.4; example of an efficiency-first scenario: w_epi = 0.4, w_aleo = 0.6.

[0134] Optionally, the electronic device can automatically adjust the weighting coefficients based on scene type features in the second fusion feature, such as urban, highway, and rural areas. For example, if a school area feature is detected, w_epi is increased to 0.7; if a highway feature is detected, w_aleo is increased to 0.7.

[0135] Next, the electronic device uses a weighted summation formula to calculate the target uncertainty U_target: U_target = w_epi×U_epistemic + w_aleo×U_aleatoric. Then, U_target is normalized to the 0~1 range to obtain the final target uncertainty quantification value.
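The weighted summation with scene-dependent weights can be sketched as below. The weight values come from the examples in the text (with the complementary weight derived from w_epi + w_aleo = 1); the scene keys, the function name, and the use of clipping as the final normalization are assumptions.

```python
from typing import Dict, Tuple

# (w_epi, w_aleo) per scene type; the key names are hypothetical labels.
SCENE_WEIGHTS: Dict[str, Tuple[float, float]] = {
    "safety_first": (0.6, 0.4),
    "efficiency_first": (0.4, 0.6),
    "school_zone": (0.7, 0.3),
    "highway": (0.3, 0.7),
}


def target_uncertainty(u_epistemic: float, u_aleatoric: float, scene: str) -> float:
    """U_target = w_epi * U_epistemic + w_aleo * U_aleatoric, clipped to [0, 1]."""
    w_epi, w_aleo = SCENE_WEIGHTS[scene]
    u = w_epi * u_epistemic + w_aleo * u_aleatoric
    return min(1.0, max(0.0, u))


u_target = target_uncertainty(0.5, 0.25, "safety_first")
```

Because both inputs already lie in [0, 1] and the weights sum to 1, the weighted sum stays in [0, 1]; the clipping only guards against out-of-range inputs.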

[0136] Step S204: Based on the target uncertainty, determine the uncertainty level corresponding to the target vehicle; based on the uncertainty level, generate the target strategic intent corresponding to the uncertainty level.

[0137] The target strategic intent represents the top-level decision constraints generated under the uncertainty level corresponding to the current driving scenario.

[0138] Please refer to the above description of step S104 for details on this step, which will not be repeated here.

[0139] Step S205: Identify the first fusion feature and the target strategic intent, and generate the target control command corresponding to the target vehicle.

[0140] The target control commands include: steering angle control commands, throttle control commands, and brake control commands.

[0141] Specifically, step S205 above may include the following steps:

[0142] Step S2051: Extract features from the first fused features to obtain the first local features.

[0143] Specifically, the electronic device can establish a BEV coordinate system with the rear axle center of the vehicle as the origin. The X-axis represents the vehicle's forward direction (positive), and the Y-axis represents the vehicle's left side (positive). The spatial range is set according to local scenario requirements, for example, a rectangular area extending 100m in front of the vehicle, 50m behind, and 50m to the left and right. Then, the spatial range under the BEV coordinate system is divided into fixed-size grids, such as 0.5m × 0.5m. Each grid corresponds to a feature unit, and the result is denoted as the BEV grid feature map M_bev, with dimensions H×W×C, where H is the grid height, W is the grid width, and C is the number of feature channels, such as obstacle presence, lane line probability, target speed, etc.

[0144] For each point in the current laser point cloud data, the electronic device converts its three-dimensional coordinates (x_l, y_l, z_l) to BEV grid coordinates (u, v) based on the vehicle's position and heading angle. For each BEV grid, the point cloud density, the mean height of the target object, and the velocity vector within the grid are calculated and then filled into the corresponding channels of M_bev.
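The coordinate conversion can be sketched as follows, using the example range (100m ahead, 50m behind, 50m to each side) and the 0.5m cell size. The sketch assumes the point is given in a world frame and that the vehicle pose is (ego_x, ego_y, ego_yaw); these names, the row/column convention, and ignoring the z coordinate for indexing are all assumptions.

```python
import math
from typing import Optional, Tuple

CELL = 0.5                       # grid resolution in metres
X_MIN, X_MAX = -50.0, 100.0      # 50 m behind to 100 m ahead of the rear axle
Y_MIN, Y_MAX = -50.0, 50.0       # 50 m to the right to 50 m to the left
H = int((X_MAX - X_MIN) / CELL)  # 300 rows along the driving direction
W = int((Y_MAX - Y_MIN) / CELL)  # 200 columns across the vehicle


def world_to_bev(
    x: float, y: float, ego_x: float, ego_y: float, ego_yaw: float
) -> Optional[Tuple[int, int]]:
    """Map a world-frame point to a BEV cell (u, v), or None if it is out of range."""
    dx, dy = x - ego_x, y - ego_y
    cos_y, sin_y = math.cos(ego_yaw), math.sin(ego_yaw)
    # Rotate into the vehicle frame: X forward, Y to the left.
    x_v = cos_y * dx + sin_y * dy
    y_v = -sin_y * dx + cos_y * dy
    if not (X_MIN <= x_v < X_MAX and Y_MIN <= y_v < Y_MAX):
        return None
    return int((x_v - X_MIN) // CELL), int((y_v - Y_MIN) // CELL)


origin_cell = world_to_bev(0.0, 0.0, 0.0, 0.0, 0.0)
```

With these constants the vehicle origin falls into cell (100, 100), and per-cell statistics such as point density can then be accumulated by counting the points mapped to each cell.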

[0145] Electronic devices can use camera intrinsic and extrinsic parameters and vehicle attitude information to project lane lines and obstacle features in the current visual image onto the BEV grid. Through inverse perspective transformation (IPM), the perspective distortion of the image is eliminated, and the road plane features are accurately mapped to the BEV space.

[0146] Then, electronic devices can perform weighted fusion of the projected features (e.g., 0.6 weight for LiDAR features and 0.4 weight for camera features) to solve the blind spot problem of a single sensor.

[0147] Next, the electronic device can write the vehicle's status features, such as current speed and heading angle, into the origin grid of M_bev, i.e., the vehicle's position, which serves as the reference for subsequent local area clipping and ensures that the BEV feature map is always centered on the vehicle.

[0148] The electronic device can dynamically define local area boundaries based on the needs of autonomous driving scenarios, such as urban roads and highways. For example, in a highway scenario, the focus is on the area 80m in front of the vehicle, 20m behind, and 30m to the left and right, paying attention to the trajectory of distant vehicles ahead; in a congested urban scenario, the focus is on the area 30m in front of the vehicle, 10m behind, and 20m to the left and right, paying attention to nearby obstacles and adjacent vehicles. The boundary coordinates of the clipped area are calculated based on the origin of the BEV coordinate system (the vehicle's position) and denoted as the local area window Win_local.

[0149] According to the boundary of Win_local, a local region is cropped from the global BEV grid feature map M_bev to obtain the local BEV feature map M_bev-local. The electronic device performs boundary padding on the cropped local BEV feature map M_bev-local, such as filling grid cells outside the global BEV range with 0 values, to ensure that the size of M_bev-local is fixed and to facilitate subsequent feature extraction. Then, invalid grids in the local area (such as grids without point clouds or image projections) are removed to reduce redundant calculations.
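The cropping and zero-padding step can be sketched on a single-channel grid stored as a list of rows. The helper names, the half-open window convention, and the single-channel simplification are assumptions made for illustration; a real feature map would carry C channels per cell.

```python
from typing import List, Tuple

Grid = List[List[float]]
Window = Tuple[int, int, int, int]  # (u0, u1, v0, v1), half-open cell ranges


def window_cells(
    front_m: float, rear_m: float, side_m: float,
    origin: Tuple[int, int], cell_m: float = 0.5,
) -> Window:
    """Convert a metric window around the vehicle cell into cell index bounds."""
    ou, ov = origin
    return (
        ou - round(rear_m / cell_m), ou + round(front_m / cell_m),
        ov - round(side_m / cell_m), ov + round(side_m / cell_m),
    )


def crop_with_padding(grid: Grid, window: Window, fill: float = 0.0) -> Grid:
    """Crop the window from the grid; cells outside the grid are filled with `fill`."""
    u0, u1, v0, v1 = window
    h = len(grid)
    w = len(grid[0]) if grid else 0
    return [
        [grid[u][v] if 0 <= u < h and 0 <= v < w else fill for v in range(v0, v1)]
        for u in range(u0, u1)
    ]


highway_window = window_cells(80.0, 20.0, 30.0, origin=(100, 100))
patch = crop_with_padding([[1.0, 2.0], [3.0, 4.0]], (-1, 2, 0, 3))
```

Because the window size depends only on the scenario, the cropped map has a fixed shape even when part of the window falls outside the global map.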

[0150] Finally, the electronic device can employ a combined network of convolutional layers, pooling layers, and attention layers to extract high-order contextual features from M_bev-local.

[0151] Specifically, the electronic device can employ a 3-layer two-dimensional convolution (2D-Conv) with a kernel size of 3×3, a stride of 1, and an activation function of ReLU. The first convolutional layer extracts basic features at the grid level, such as the presence of obstacles and lane line probabilities in a single grid cell; the second convolutional layer extracts inter-grid features, such as lane line continuity between adjacent grid cells and obstacle clustering features; and the third convolutional layer extracts scene-level contextual features, such as the direction of adjacent vehicle trajectories and the topology of the lanes.

[0152] Then, global average pooling (GAP) is used to convert the local BEV feature map M_bev-local (dimensions H×W×C) into a fixed-dimensional feature vector. Key contextual information (such as the center coordinates of obstacles and the mean curvature of lane lines) is preserved during pooling.

[0153] Next, a spatial attention mechanism is introduced to automatically identify high-priority regions in local scenes, such as pedestrians crossing the road or vehicles in front at close range; features of high-priority regions are given higher weights to suppress invalid features of background regions, such as distant trees and buildings.

[0154] The electronic device normalizes the feature vector output from the attention layer and maps it to the [0,1] interval, fusing core contextual information from the local BEV feature map: trajectory vectors of neighboring vehicles, position coordinates of lane boundaries, obstacle distribution density, and relative distance between the vehicle and key targets. The final high-dimensional vector output is the first local feature F_local1. Its core feature dimensions correspond to the key elements of the local scene and can be directly used for subsequent feature labeling and fusion.

[0155] Step S2052: Obtain the historical vehicle status data of the target vehicle within a preset time period before the current time.

[0156] Specifically, electronic devices can retrieve historical vehicle status data of the target vehicle within a preset time period prior to the current moment from the storage space.

[0157] Step S2053: Map each sub-state data in the historical vehicle state data into state feature labels through a linear embedding layer.

[0158] Specifically, since the historical vehicle states form a time series, sequence encoding is required before inputting them into the linear embedding layer. The electronic device can use one-dimensional convolutional (1D-Conv) layers or LSTM layers to extract temporal features. For example, 1D-Conv uses a 1×3 convolutional kernel to capture the associated features of adjacent sampling points, outputting a sequence feature map. Then, the sequence feature map is transformed into a fixed-dimensional temporal feature vector F_seq, which is the input data of the linear embedding layer.

[0159] The core of the linear embedding layer is to map time-series data of arbitrary length into a label vector of fixed dimension. The input of this linear embedding layer is the time-series feature vector F_seq (m sampling points, each point of dimension d_s, e.g., 4D: vehicle speed + acceleration + steering angle + throttle opening); the hidden layer is 1 fully connected layer with the ReLU activation function. The electronic device generates the state feature label through the following calculation: T_state = ReLU(W_s·F_seq + b_s), where W_s is the embedding layer weight matrix and b_s is the bias term. The output is a fixed-dimensional state feature label T_state with dimension d_token = 64. Finally, T_state is normalized to the [0,1] interval to ensure that its numerical range is consistent with that of the other feature labels.

[0160] Step S2054: Map the first local feature to a scene feature label with the same dimension as the state feature label.

[0161] The target dimension of the scene feature label must be completely consistent with that of the state feature label, i.e., d_token = 64.

[0162] The electronic device can employ a multilayer perceptron (MLP) of fully connected layers as the dimension mapping module to rescale the feature dimension. The input to the dimension mapping module is the first local feature F_local1 (dimension D_local = 256); the hidden layers are 1-2 fully connected layers with the ReLU activation function, progressively compressing the dimensionality; the output layer is a fully connected layer with an output dimension of d_token = 64. Then, the mapped feature vector is normalized to obtain the final scene feature label T_scene, which has 64 dimensions and a value range of [0,1].

[0163] For example, the calculation formula is: T_scene = Norm(MLP(F_local1)), where Norm(·) is the normalization function.

[0164] Step S2055: Encode the target strategic intent as a task condition feature label.

[0165] Specifically, the electronic device can transform the target strategic intent (natural language or enumeration type) into a structured vector. If the target strategic intent is an enumeration type (such as 10 preset intent categories), one-hot encoding is used, such as encoding "slow down and avoid pedestrians" as [0,1,0,...,0] (the dimension is the number of intent categories N_intent). If the target strategic intent is described in natural language (e.g., "maintain 30 km/h straight ahead"), a pre-trained NLP model (e.g., a lightweight version of BERT) is used to extract semantic features, outputting a semantic vector of dimension d_nlp = 128.

[0166] The electronic device can construct a fully connected embedding layer to map the structured intent vector to a conditional feature label in the target dimension. The input is the structured intent vector (one-hot encoded vector or semantic vector); the output is the task condition feature label T_task with dimension d_token = 64, consistent with the dimension of the state and scene feature labels.

[0167] For example, the calculation formula is: T_task = ReLU(W_t·F_intent + b_t), where F_intent is the structured intent vector, W_t represents the embedding layer weights, and b_t is the bias term.

[0168] Finally, the electronic device can assign T_task a priority weight λ (λ≥1) based on the priority of the target strategic intent (e.g., "emergency braking" priority > "constant speed cruise" priority): the higher the priority, the greater the weight, ensuring that high-priority tasks dominate during fusion.
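The enumeration branch of this encoding can be sketched with toy dimensions (4 intent categories mapped to 2 dimensions instead of 64). The function names and weight values are hypothetical, and applying λ as a multiplicative scale on T_task is an assumption, since the text only says that a priority weight is assigned.

```python
from typing import List, Sequence


def one_hot(index: int, n_intent: int) -> List[float]:
    """Encode an enumerated intent as a one-hot vector of length n_intent."""
    if not 0 <= index < n_intent:
        raise ValueError("intent index out of range")
    return [1.0 if i == index else 0.0 for i in range(n_intent)]


def embed_intent(
    f_intent: Sequence[float],
    w_t: Sequence[Sequence[float]],
    b_t: Sequence[float],
    priority: float = 1.0,
) -> List[float]:
    """T_task = priority * ReLU(W_t . F_intent + b_t); scaling by priority is an assumption."""
    if priority < 1.0:
        raise ValueError("priority weight must satisfy lambda >= 1")
    return [
        priority * max(0.0, sum(w * x for w, x in zip(row, f_intent)) + b)
        for row, b in zip(w_t, b_t)
    ]


# Toy embedding weights: 2 output dimensions, 4 intent categories.
W_T = [[0.1, 0.5, -0.2, 0.0], [0.3, -0.4, 0.2, 0.1]]
B_T = [0.0, 0.1]
t_task = embed_intent(one_hot(1, 4), W_T, B_T, priority=2.0)
```

With a one-hot input, the product W_t·F_intent simply selects one column of W_t, so each intent category receives its own learned embedding.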

[0169] Step S2056: The state feature marker, scene feature marker, and task condition feature marker are fused together to generate the target feature marker.

[0170] Specifically, the electronic device can sequentially concatenate the three types of feature markers into a marker sequence. The sequence length is L=3, and each marker has a dimension of d_token = 64.

[0171] Since the self-attention mechanism has no inherent awareness of order, the electronic device adds a position encoding vector P_Ei to each marker to distinguish the marker types and generate the position-encoded marker sequence. The position encoding uses a sine function to ensure that the positions of different types of markers are distinguishable.

[0172] The electronic device inputs the marker sequence into a multi-head self-attention layer and calculates the attention weight of each marker with respect to the other markers.

[0173] Specifically, the marker sequence is projected into the query (Q), key (K), and value (V) spaces.

[0174] The attention weights are then calculated, and the electronic device concatenates the outputs of the multiple attention heads and outputs the fused feature F_att through a fully connected layer. To avoid gradient vanishing, a residual connection and layer normalization (LN) are added.

[0175] Finally, the electronic device maps the fused feature, through a single fully connected layer, to the final target feature label with dimension d_target (e.g., d_target = 128).
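The attention-based fusion of the three markers can be sketched with a single attention head in plain Python. The projection, attention, and residual formulas are not reproduced in the text, so the sketch assumes standard scaled dot-product attention followed by a residual connection and layer normalization. Position encoding, multiple heads, and the final fully connected layer are omitted, and all names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def layer_norm(row: List[float], eps: float = 1e-5) -> List[float]:
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    return [(x - mean) / math.sqrt(var + eps) for x in row]


def self_attention_block(
    tokens: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix
) -> Tuple[Matrix, Matrix]:
    """Return (attention weights, LN(tokens + softmax(Q K^T / sqrt(d_k)) V))."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    d_k = len(k[0])
    scores = matmul(q, [list(col) for col in zip(*k)])  # Q K^T, shape L x L
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    attended = matmul(weights, v)
    out = [layer_norm([t + a for t, a in zip(tok, att)]) for tok, att in zip(tokens, attended)]
    return weights, out


# Three toy 2-dimensional markers standing in for T_state, T_scene, T_task.
markers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
attn, fused = self_attention_block(markers, identity, identity, identity)
```

Each row of `attn` sums to 1 and shows how strongly one marker attends to the other two, which is the quantity the attention weight calculation refers to.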

[0176] In one alternative implementation, if onboard computing power is limited, the electronic device can employ weighted summation fusion of the three feature markers with weights w1, w2, and w3, where w1+w2+w3=1. The weights can be calibrated through real vehicle testing (e.g., task label weight w3=0.5, scene label weight w2=0.3, state label weight w1=0.2).
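The lightweight alternative can be sketched as an element-wise weighted sum, using the example weights from the text (state 0.2, scene 0.3, task 0.5). The function name and the toy 2-dimensional markers are assumptions.

```python
from typing import List, Sequence, Tuple


def weighted_sum_fusion(
    t_state: Sequence[float],
    t_scene: Sequence[float],
    t_task: Sequence[float],
    weights: Tuple[float, float, float] = (0.2, 0.3, 0.5),
) -> List[float]:
    """Fuse three equally sized markers as w1*T_state + w2*T_scene + w3*T_task."""
    w1, w2, w3 = weights
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if not len(t_state) == len(t_scene) == len(t_task):
        raise ValueError("all markers must share the same dimension")
    return [w1 * a + w2 * b + w3 * c for a, b, c in zip(t_state, t_scene, t_task)]


fused_marker = weighted_sum_fusion([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
```

This variant needs only a few multiplications per dimension, which is why it suits platforms with limited computing power, at the cost of fixed rather than input-dependent weights.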

[0177] Step S2057: Generate target control instructions based on target feature markings.

[0178] Specifically, step S2057 above may include the following steps:

[0179] Step c1: Input the target feature labels into the preset control command generation model.

[0180] Specifically, the electronic device can convert the target feature label into an input feature F_in that is consistent with the input dimension of the preset control command generation model, and then input F_in into the preset control command generation model.

[0181] Step c2: The preset control instruction generation model encodes the target feature labels and predicts the candidate behavior sequence at least one time step in the future, as well as the candidate confidence score and candidate behavior uncertainty score corresponding to each candidate behavior, based on the self-attention mechanism.

[0182] Specifically, the electronic device can input the feature sequence F_in into the self-attention encoding layer (Multi-Head Self-Attention) of the preset control command generation model. By calculating the similarity between the query (Q) and the key (K) and using it to weight the value (V), the self-attention mechanism automatically learns long-term dependencies between states (such as the correlation between historical speed change trends and future speeds, and the long-term matching of lane boundary positions and steering angles) and intent-state alignment (such as the alignment of the "overtaking" task intent with the positions of neighboring vehicles and lane widths, ensuring that the behavior sequence conforms to strategic intent constraints). Then, it outputs the encoded contextual feature sequence F_ctx, which contains information on the correlation weights between features.

[0183] Next, the electronic device inputs F_ctx into the feedforward neural network (FFN), extracts high-order behavior prediction features, and outputs the encoded feature F_encode. The decoding stage generates multiple sets of candidate behavior sequences based on the encoded feature F_encode; its core logic is "multi-path prediction" (avoiding the decision risk of a single behavior).

[0184] The decoding layer of the preset control command generation model employs a multi-branch decoding structure (e.g., 5 branches), with each branch independently predicting a set of "behavior sequences for the next T time steps." Each behavior sequence is denoted as BS_k = [bs_k1, bs_k2, ..., bs_kT] (k = 1, 2, ..., N, where N is the number of candidate sequences, e.g., 5; bs_kt is the behavior parameter of the k-th sequence at the t-th time step).

[0185] For each candidate behavior sequence, the preset control command generation model simultaneously outputs two core evaluation metrics. The first is the candidate confidence score, which reflects the model's "confidence" in how well the candidate behavior sequence fits the current scene. The calculation logic is as follows: the decoding layer outputs the "matching probability" of the behavior sequence (through the Softmax activation function); combined with the average attention weight of the self-attention mechanism, a weighted confidence score Conf_k is obtained (normalized to 0~1, with 1 being the highest confidence level). For example, if BS1 has a matching probability of 0.9 and an average attention weight of 0.95, then Conf_1 = 0.9 × 0.95 = 0.855.

[0186] The second evaluation metric is the candidate behavior uncertainty score, which reflects the "execution risk" of the behavior sequence (originating from scenario uncertainty and model prediction fluctuations). The calculation logic is as follows: using the Monte Carlo Dropout method, the forward propagation is repeated 10 times to obtain 10 sets of prediction parameters for the sequence; the variances of the 10 sets of parameters (such as vehicle speed variance and turning angle variance) are calculated, and after weighted summation the result is normalized to 0~1 to obtain the uncertainty score Unc_k (1 represents the highest uncertainty). For example, if BS1 has a speed variance of 0.5 and a turning angle variance of 0.1, and their weighted sum is 0.3, then Unc_1 = 0.3.
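The two scores for one candidate sequence can be sketched as follows, reproducing the numerical examples above. The function names are hypothetical, and the equal weights of 0.5 for the speed and turning angle variances are an assumption chosen because they reproduce the weighted sum of 0.3; the text does not state the weights.

```python
from typing import Dict, Sequence


def confidence_score(match_prob: float, mean_attention: float) -> float:
    """Conf_k = matching probability x average attention weight."""
    return match_prob * mean_attention


def variance(values: Sequence[float]) -> float:
    """Population variance of one parameter across the dropout passes."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def uncertainty_score(variances: Dict[str, float], weights: Dict[str, float]) -> float:
    """Unc_k = weighted sum of per-parameter variances, clipped to [0, 1]."""
    total = sum(weights[name] * var for name, var in variances.items())
    return min(1.0, max(0.0, total))


conf_1 = confidence_score(0.9, 0.95)  # approximately 0.855
unc_1 = uncertainty_score(
    {"speed": 0.5, "steering": 0.1},
    {"speed": 0.5, "steering": 0.5},  # assumption: equal weights
)  # approximately 0.3
```

In practice the variances passed to `uncertainty_score` would come from `variance` applied to the 10 predictions of each parameter.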

[0187] Finally, the electronic device organizes the information of all candidate sequences to form a structured output table, as exemplified in Table 2, which is a schematic table of information corresponding to all candidate sequences.

[0188] Table 2. Information illustration of all candidate sequences.

[0189]

[0190] Step c3: Based on the candidate confidence score and candidate behavior uncertainty score corresponding to each candidate behavior, determine the target behavior sequence whose candidate confidence score is greater than the preset confidence score threshold and whose candidate behavior uncertainty score is lower than the preset uncertainty score threshold.

[0191] Specifically, the electronic device can set the confidence score threshold and the uncertainty score threshold based on the risk preferences of the autonomous driving scenario. For example, the confidence score threshold Conf_thr can be 0.7 (only sequences with a candidate confidence score ≥ 0.7 are retained), and the uncertainty score threshold Unc_thr can be 0.5 (only sequences with a candidate behavior uncertainty score ≤ 0.5 are retained). Optionally, for high-risk scenarios (such as school areas or rainy days), Conf_thr is increased to 0.8 and Unc_thr is reduced to 0.4; for low-risk scenarios (such as high-speed constant-speed driving), Conf_thr is decreased to 0.6 and Unc_thr is increased to 0.6.

[0192] The device can compare the candidate confidence score corresponding to each candidate behavior with a preset confidence score threshold, and compare the candidate behavior uncertainty score with a preset uncertainty score threshold. Then, it selects the candidate behaviors whose candidate confidence scores are greater than the preset confidence score threshold and whose candidate behavior uncertainty scores are lower than the preset uncertainty score threshold as the target behavior sequence.
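The threshold screening can be sketched as a simple filter. The threshold values come from the examples in the text; the `Candidate` class, the scene keys, and the scores of BS2 and BS3 are hypothetical. The sketch uses inclusive comparisons (≥ and ≤), following the worked example; the step description itself says "greater than" and "lower than", so the boundary handling is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    confidence: float   # Conf_k in [0, 1]
    uncertainty: float  # Unc_k in [0, 1]


# (Conf_thr, Unc_thr) per risk profile.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "default": (0.7, 0.5),
    "high_risk": (0.8, 0.4),
    "low_risk": (0.6, 0.6),
}


def select_targets(candidates: Sequence[Candidate], scene: str = "default") -> List[Candidate]:
    """Keep candidates that are confident enough and not too uncertain."""
    conf_thr, unc_thr = THRESHOLDS[scene]
    return [c for c in candidates if c.confidence >= conf_thr and c.uncertainty <= unc_thr]


candidates = [
    Candidate("BS1", 0.855, 0.30),
    Candidate("BS2", 0.75, 0.45),  # hypothetical values
    Candidate("BS3", 0.65, 0.55),  # hypothetical values
]
kept = [c.name for c in select_targets(candidates, "high_risk")]
```

Tightening the thresholds in high-risk scenes shrinks the set of admissible sequences; how a single target is chosen when several candidates pass is not covered by this sketch.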

[0193] Step c4: Generate target control instructions based on the target behavior sequence, the target confidence score corresponding to the target behavior sequence, and the target behavior sequence uncertainty score.

[0194] Specifically, step c4 above may include the following steps:

[0195] Step c41: Perform feature fusion processing on the target behavior sequence, the target confidence score and the uncertainty score of the target behavior sequence, the current vehicle state data, and the first local features to obtain the third fused feature.

[0196] Specifically, electronic devices can generate behavioral sequence features based on the target behavioral sequence, the target confidence score corresponding to the target behavioral sequence, and the uncertainty score of the target behavioral sequence.

[0197] Then, the behavioral sequence features, current vehicle state data, and the first local features are weighted and fused to obtain the third fused feature.

[0198] Step c42: Input the third fused feature into a fully connected multilayer perceptron, map it to the control parameter space via a nonlinear activation function, and output a triplet target control command. The physical value range of the triplet control command (adapting to vehicle hardware capabilities) is shown in Table 3 below.

[0199] Table 3. Triad Control Command Table

[0200]

[0201] The structure of the fully connected multilayer perceptron is as follows: The input layer is 256-dimensional (consistent with the F3 dimension) and receives the third fusion feature; Hidden layer 1 is 128-dimensional with ReLU activation function to extract control-related high-order features; Hidden layer 2 is 64-dimensional with ReLU activation function to further compress features and focus on core control parameters; The output layer is 3-dimensional (corresponding to steering / accelerator / brake) with piecewise activation function: 1. Steering angle: Tanh (output [-1,1], subsequently mapped to [-900°,900°]); 2. Accelerator / brake: Sigmoid (output [0,1], subsequently mapped to [0%,100%]), outputting normalized control parameters.

[0202] The electronic device inputs the third fusion feature F3 into the MLP, which passes it through the hidden layers and the output layer in sequence to obtain the normalized output vector O = [o1, o2, o3], where o1∈[-1,1] is the normalized value of the steering angle and o2, o3∈[0,1] are the normalized values of the throttle and brake.

[0203] The electronic device can convert the normalized values into executable physical parameters using a linear mapping formula: steering angle control command: θ = o1 × 900°; throttle control command: throttle = o2 × 100%; brake control command: brake = o3 × 100%.

[0204] Electronic devices can perform hard constraint verification on the mapped parameters to ensure compliance with vehicle hardware limits: if the steering angle exceeds [-900°, 900°], it is truncated to the boundary value; throttle / brake mutual exclusion constraint: if throttle > 0, then brake = 0 (to avoid the throttle and brake being activated simultaneously).
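The linear mapping and the hard constraint check can be sketched together. The function name is hypothetical, and the `throttle_deadband` parameter is an assumption that defaults to 0 so that the sketch follows the stated rule (throttle > 0 implies brake = 0) literally.

```python
from typing import Tuple

MAX_STEER_DEG = 900.0


def clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))


def to_control_command(
    o1: float, o2: float, o3: float, throttle_deadband: float = 0.0
) -> Tuple[float, float, float]:
    """Map normalized outputs to (steering in degrees, throttle in %, brake in %)."""
    steering = clamp(o1 * MAX_STEER_DEG, -MAX_STEER_DEG, MAX_STEER_DEG)
    throttle = clamp(o2 * 100.0, 0.0, 100.0)
    brake = clamp(o3 * 100.0, 0.0, 100.0)
    # Mutual exclusion: an active throttle suppresses the brake command.
    if throttle > throttle_deadband:
        brake = 0.0
    return steering, throttle, brake


steering, throttle, brake = to_control_command(0.5, 0.2, 0.3)
```

Note that a Sigmoid output is strictly positive, so with a deadband of exactly 0 the brake would always be suppressed; a practical system would presumably use a small positive deadband or an explicit arbitration rule, which the text does not specify.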

[0205] Step S206: The target control command is transmitted to the actuator corresponding to the target vehicle, and the actuator is controlled to execute the target control command for autonomous driving.

[0206] Please refer to the above description of step S106 for details on this step, which will not be repeated here.

[0207] Step S207: Store the target vehicle's strategic intent at the current moment, the cognitive uncertainty and accidental uncertainty corresponding to the target uncertainty, the target control command, and the target behavior sequence corresponding to the target control command.

[0208] Specifically, electronic devices can store the target vehicle's strategic intent at the current moment, the cognitive uncertainty and accidental uncertainty corresponding to the target uncertainty, the target control command and the target behavior sequence corresponding to the target control command, into the target location in the electronic device's storage space.

[0209] Step S208: Compare the target uncertainty with a preset uncertainty threshold.

[0210] Specifically, after calculating the target uncertainty, the electronic device can also compare the target uncertainty with a preset uncertainty threshold.

[0211] The preset uncertainty threshold can be dynamically adjusted according to the scenario to avoid the misjudgments and missed judgments caused by a "one-size-fits-all" value. For example: for congested urban road sections, the preset uncertainty threshold is 0.6, so that risk response is triggered early; for highways, it is 0.7, to balance efficiency; for school / construction areas, it is 0.5, an extremely low threshold that prioritizes safety; and for ordinary suburban roads, it is 0.65, a medium threshold.
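
The scenario-dependent thresholds above can be expressed as a simple lookup. This is a sketch only: the scenario keys are hypothetical labels, and the fallback value for unlisted scenarios is an assumption (the most conservative listed threshold).

```python
# Threshold values are the examples given in [0211]; the keys are illustrative.
SCENARIO_THRESHOLDS = {
    "urban_congested": 0.6,
    "highway": 0.7,
    "school_or_construction": 0.5,
    "suburban": 0.65,
}

# Assumption: unknown scenarios fall back to the lowest listed threshold.
DEFAULT_THRESHOLD = 0.5


def uncertainty_threshold(scenario):
    return SCENARIO_THRESHOLDS.get(scenario, DEFAULT_THRESHOLD)


def exceeds_threshold(target_uncertainty, scenario):
    # Step S208/S209 use a strict "greater than" comparison.
    return target_uncertainty > uncertainty_threshold(scenario)
```

For instance, a target uncertainty of 0.62 exceeds the threshold on a congested urban road but not on a highway.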

[0212] Step S209: If the target uncertainty is greater than a preset uncertainty threshold, a memory cue vector is generated based on the target strategic intent, the cognitive uncertainty and accidental uncertainty corresponding to the target uncertainty, the target control instructions and the target behavior sequence corresponding to the target control instructions.

[0213] Specifically, if the target uncertainty is greater than the preset uncertainty threshold, the electronic device can generate a memory cue vector from the target strategic intent corresponding to the current moment, the cognitive uncertainty and accidental uncertainty corresponding to the target uncertainty, the target control command, and the target behavior sequence corresponding to the target control command.
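
A minimal sketch of step S209 follows. This application does not specify how the memory cue vector is assembled, so plain concatenation is used here purely as an assumption; `DecisionRecord` and `build_memory_cue_vector` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class DecisionRecord:
    intent_encoding: Sequence[float]               # encoded target strategic intent
    cognitive_uncertainty: float
    accidental_uncertainty: float
    control_command: Sequence[float]               # steering, throttle, brake
    behavior_sequence: Sequence[Sequence[float]]   # one vector per time step


def build_memory_cue_vector(record: DecisionRecord,
                            target_uncertainty: float,
                            threshold: float) -> Optional[List[float]]:
    # Only high-uncertainty decisions produce a cue vector.
    if target_uncertainty <= threshold:
        return None
    # Assumption: the cue vector is a flat concatenation of all fields.
    cue = list(record.intent_encoding)
    cue += [record.cognitive_uncertainty, record.accidental_uncertainty]
    cue += list(record.control_command)
    for step in record.behavior_sequence:
        cue += list(step)
    return cue
```

A learned encoder could replace the concatenation; the sketch only shows which quantities enter the vector and when it is produced.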

[0214] Step S210: Generate, based on the memory cue vector, the strategic intent for the moment following the current moment.

[0215] Specifically, the electronic device can fuse the memory cue vector with the target uncertainty to generate a target fusion feature, and then determine the next-moment strategic intent of the target vehicle based on this target fusion feature.

[0216] For example, Figures 3 and 4 are schematic diagrams of the framework of the autonomous driving control system corresponding to the autonomous driving control method of this application. As shown in Figure 3, the multimodal input module serves as the system's perception entry point, integrating four core data sources: multi-view cameras (visual information), LiDAR (point cloud), natural language (user commands), and vehicle status (dynamic parameters), and provides the foundational data for subsequent processing. The feature extraction and fusion module (second layer) connects to the input module; it uses visual encoders, language encoders, and sensor fusion technology to extract features from the heterogeneous raw data and fuse them across modalities, eliminating data heterogeneity and generating a unified feature vector that is output to the downstream decision-making module. The three-layer cognitive decision-making module (left core) is the system's decision-making center and employs a three-tier architecture: the high level (advanced cognitive layer) is responsible for scene understanding and long-term planning; the middle level (intermediate planning layer) transforms high-level intentions into short-term behavioral sequences; and the low level (reaction control layer) generates executable control commands for the vehicle. The behavioral memory enhancement module (right core), a parallel enhancement unit of the decision-making module, uses a two-layer memory structure: short-term memory (a fixed buffer) stores recent decision information and maintains temporal continuity, while long-term memory retrieves similar scenes or abnormal events and provides experiential support for decision-making. This module interacts bidirectionally with the three-layer cognitive decision-making module to achieve experience-driven decision calibration and improve the system's robustness in complex scenarios.
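
The two-layer memory structure of the behavioral memory enhancement module can be sketched as follows. The class name, the buffer size, and the use of cosine similarity for long-term retrieval are all assumptions made for illustration; the application only states that short-term memory is a fixed buffer and that long-term memory retrieves similar scenes.

```python
import math
from collections import deque


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class BehaviorMemory:
    def __init__(self, short_term_size=8):
        # Short-term memory: fixed-size buffer, oldest entries are dropped.
        self.short_term = deque(maxlen=short_term_size)
        # Long-term memory: (key vector, payload) pairs kept for retrieval.
        self.long_term = []

    def record(self, key, payload, keep_long_term=False):
        self.short_term.append((list(key), payload))
        if keep_long_term:
            self.long_term.append((list(key), payload))

    def retrieve(self, query, top_k=1):
        # Return the payloads of the top_k most similar long-term entries.
        ranked = sorted(self.long_term,
                        key=lambda entry: cosine_similarity(query, entry[0]),
                        reverse=True)
        return [payload for _, payload in ranked[:top_k]]
```

A decision module would call `record` after each decision and `retrieve` with the current scene encoding when it needs experiential support.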

[0217] The autonomous driving control method provided in this application stores heterogeneous multi-source data from various historical moments, constructs a historical context sequence, and retains full-dimensional historical heterogeneous multi-source data of the target vehicle's past operations, covering four key types of information: vision, laser point clouds, voice commands, and vehicle status. This provides a data foundation for subsequent mining of the correlation between historical and current scenarios. Simultaneously, the structured storage of historical heterogeneous multi-source data supports the reuse of driving scenario experience, avoiding repeated decision-making trials in similar scenarios and improving decision-making efficiency. The similarity between current data and historical heterogeneous multi-source data is calculated to generate a historical similarity sequence. By quantifying the matching degree between current heterogeneous multi-source data and data from various historical moments, the output historical similarity sequence can accurately filter historical experiences highly relevant to the current scenario. Historical heterogeneous multi-source data with high similarity can be used as a decision reference, while data with low similarity can have its weight reduced. This design avoids interference from invalid historical information and allows the system to learn from successful decision-making experiences in similar scenarios, improving the robustness and accuracy of decisions in complex dynamic scenarios. A second fusion feature is generated by fusing the first fusion feature, the historical context sequence, and the historical similarity sequence. This approach achieves a deep fusion of current scenario features and historical effective experience. Compared to the first fusion feature, which relies solely on current data, the second fusion feature incorporates historical information weighted by similarity. 
This preserves the real-time nature of the current scenario while integrating decision-making experience from historical scenarios, avoiding feature bias caused by noise in the data of a single moment and improving the accuracy of feature representation for complex dynamic scenarios. The second fusion feature is subjected to independent forward propagation to generate preliminary strategic intents and first intent uncertainties. Independent forward propagation outputs multiple preliminary strategic intents and their corresponding intent-level uncertainties, overcoming the limitation of a single intent output. Each preliminary strategic intent is also given a dedicated uncertainty assessment, achieving a refined "one intent, one risk" labeling that provides a direct quantitative basis for subsequent intent screening and improves the redundancy and robustness of decision-making. Cognitive uncertainty is determined based on the first intent uncertainty of each preliminary strategic intent. In this way, the cognitive uncertainty caused by factors such as insufficient model knowledge and out-of-distribution (OOD) scenarios is accurately extracted from the uncertainties of the preliminary strategic intents, achieving explicit quantification of the model's own cognitive limitations. This solves the problem that existing technologies cannot distinguish between cognitive and accidental uncertainty, and provides a core basis for subsequent risk assessment. Then, a second intent uncertainty is generated by a heteroscedastic regression head. The heteroscedastic regression head processes the second fusion feature directly and outputs the second intent uncertainty caused by irreducible factors such as sensor noise, environmental interference, and the randomness of target motion, without requiring repeated sampling. This assessment balances real-time performance and accuracy and effectively characterizes the inherent disturbances of the environment.
Based on the uncertainty of the second intent, accidental uncertainty is determined. Integrating the uncertainty of the second intent into a global accidental uncertainty achieves a unified quantification of inherent environmental disturbances, providing a comprehensive environmental risk assessment for decision-making and avoiding the misleading influence of single sensor failures or localized disturbances on overall decision-making. By fusing cognitive uncertainty and accidental uncertainty, target uncertainty is generated. Through weighted fusion, the limitations of model cognition and inherent environmental disturbances are integrated into a unified target uncertainty, achieving a refined and comprehensive assessment of decision-making risks. This provides a precise quantitative standard for subsequent strategic intent selection, fundamentally reducing the probability of overconfident decision-making and improving the system's robustness and security in complex and open scenarios.
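
The uncertainty pipeline summarized in [0217] can be illustrated with a small sketch. Several choices here are assumptions rather than the method of this application: the first intent uncertainty is taken as the variance of each intent's score across independent forward passes, the per-intent values are aggregated by their mean, and the fusion weights default to 0.5 each. All function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def first_intent_uncertainties(pass_scores):
    """pass_scores[k][i] is the score of preliminary intent i in forward pass k.

    Assumption: the spread of each intent's score across passes serves as its
    first intent uncertainty.
    """
    n_intents = len(pass_scores[0])
    result = []
    for i in range(n_intents):
        column = [scores[i] for scores in pass_scores]
        m = mean(column)
        result.append(mean([(v - m) ** 2 for v in column]))
    return result


def cognitive_uncertainty(first_uncertainties):
    # Assumption: aggregate the per-intent values by their mean.
    return mean(first_uncertainties)


def accidental_uncertainty(second_uncertainties):
    # second_uncertainties would come from a heteroscedastic regression head,
    # which predicts a variance per intent in a single pass (no resampling).
    return mean(second_uncertainties)


def fuse_uncertainty(cognitive, accidental, w_cognitive=0.5, w_accidental=0.5):
    # Weighted fusion into a single target uncertainty; weights are assumed.
    return w_cognitive * cognitive + w_accidental * accidental
```

The split mirrors the distinction drawn in the text: the first term shrinks as the model gains knowledge, while the second reflects disturbances that more training cannot remove.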

[0218] Then, the first local features are extracted from the first fusion feature. From the global first fusion feature, local features strongly correlated with the current decision are accurately extracted, redundant information is eliminated, core scene elements are focused, the computational complexity of subsequent model processing is reduced, and the real-time performance and accuracy of the decision are improved. Historical vehicle state data within a preset time period is obtained, and past vehicle dynamic state information is introduced to supplement the temporal continuity of the decision, avoiding short-sighted decisions caused by relying solely on the current state, and ensuring that the subsequently generated behavior sequence conforms to the physical motion law of the vehicle. Historical sub-state data is mapped to state feature labels. Through a linear embedding layer, heterogeneous historical vehicle sub-state data (such as vehicle speed and steering angle) are transformed into feature labels with unified dimensions, eliminating data format differences and enabling them to be directly processed by subsequent models, laying the foundation for cross-type feature fusion. The first local features are mapped to scene feature labels of the same dimension. The dimensionality of the scene local features is normalized to ensure consistency with the dimension of the state feature labels, solving the problem of dimensional incompatibility between different types of features, and ensuring that scene information and vehicle state information can be efficiently fused. The target strategic intent is encoded as task condition feature labels. 
This approach transforms abstract high-level strategic intent into feature tags that the model can recognize, clarifies the top-level constraints for generating the subsequent behavior sequence, and ensures that the generated control commands strictly adhere to the decision-making objectives, avoiding a disconnect between intent and execution. The three types of feature tags are fused to generate the target feature tags. This achieves a deep integration of three types of core information (vehicle state, scene environment, and task intent) and generates target feature tags that simultaneously carry temporal, environmental, and intent constraints, providing a comprehensive and unified decision-making basis for subsequent control command generation and improving the rationality and robustness of decisions. The target feature tags are input into a preset control command generation model, which processes the unified feature tags that integrate vehicle state, scene environment, and task intent. This achieves precise adaptation between the features and the model, providing standardized input for subsequent behavior sequence reasoning and ensuring a standardized and efficient decision-making process. The preset control command generation model encodes the target feature tags and predicts candidate behavior sequences and evaluation metrics. Through a self-attention mechanism, it mines temporal dependencies among states and alignment relationships between intent and state, and outputs candidate behavior sequences for the next T time steps together with confidence and uncertainty scores. This ensures both the physical compliance and the intent matching of the behavior sequences and enables quantitative evaluation of each candidate sequence, providing a clear basis for subsequent selection and improving decision robustness. The target behavior sequence is then selected based on confidence and uncertainty thresholds.
By setting a dual screening condition of "high confidence + low uncertainty," candidate sequences with a low matching degree and high risk are accurately filtered out, ensuring that the finally selected target behavior sequence balances reliability and safety, effectively avoiding high-risk decisions and solving the problem of an overconfident model outputting erroneous commands. A third fusion feature is generated by fusing multi-dimensional information. The target behavior sequence, its confidence and uncertainty scores, the current vehicle state data, and the first local features are deeply fused, integrating four key types of information: the decision scheme, the risk assessment, the vehicle's physical state, and the core environmental features. This compensates for the lack of real-time state and scene feedback in a behavior sequence alone, providing a comprehensive and accurate decision-making basis for control command generation and ensuring that the commands not only conform to the planning intent but also adapt to the real-time state of the vehicle and environment. The triplet target control command is output by a fully connected multilayer perceptron. Using the fully connected multilayer perceptron and nonlinear activation functions, the third fusion feature is mapped to a standardized triplet control command of steering angle, throttle, and brake, achieving a precise conversion from abstract fusion features to physical parameters that vehicle actuators can recognize directly. At the same time, the coordinated output of the triplet command ensures the smoothness and coordination of driving actions, avoids the safety risks caused by a sudden change in a single command, and improves the stability and safety of the autonomous driving execution process.
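
The dual screening condition ("high confidence + low uncertainty") described in [0218] can be sketched as a filter over candidate behavior sequences. The `Candidate` type, the tie-breaking rule among passing candidates, and the `None` return when nothing passes are assumptions; the application does not state how these cases are handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    actions: List[str]      # behavior sequence for the next T time steps
    confidence: float
    uncertainty: float


def select_target_sequence(candidates: List[Candidate],
                           min_confidence: float,
                           max_uncertainty: float) -> Optional[Candidate]:
    # Keep only candidates that satisfy both conditions at once.
    passing = [c for c in candidates
               if c.confidence > min_confidence
               and c.uncertainty < max_uncertainty]
    if not passing:
        # Assumption: the caller decides on a fallback when nothing passes.
        return None
    # Assumption: prefer higher confidence, then lower uncertainty.
    return max(passing, key=lambda c: (c.confidence, -c.uncertainty))
```

Note that a candidate with very high confidence is still rejected if its uncertainty score is above the threshold, which is the point of screening on both values.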

[0219] Furthermore, the system stores the entire decision-making process at the current moment. It comprehensively retains core decision-making information such as the target vehicle's strategic intent, the two types of uncertainty data, the control commands, and the behavior sequence at the current moment, constructing a decision-making experience database for the autonomous driving system. This provides data support for subsequent fault tracing and solution optimization, and lays the foundation for experience reuse in similar scenarios, avoiding repeated trial and error. The target uncertainty is compared with a preset threshold. Through threshold quantification, the system quickly identifies the risk level of the current decision, distinguishing between "low-risk routine scenarios" and "high-risk complex scenarios." This provides a clear trigger for whether to activate the experience memory mechanism, ensuring that system resources are directed to decision optimization in high-risk scenarios. In high-uncertainty scenarios, a memory cue vector is generated. When the target uncertainty exceeds the threshold, the memory cue vector is generated from the current decision-making process data, achieving structured encapsulation of decision-making experience in high-risk scenarios. This vector focuses on core risk elements (cognitive / accidental uncertainty) and the decision-making solution, avoiding redundant or invalid data and improving the efficiency and accuracy of subsequent experience retrieval. The next moment's strategic intent is generated based on the memory cue vector. This gives the system an experience-driven, self-correcting decision-making capability: decision-making experience from the current high-risk scenario is reused to directly guide the generation of the strategic intent for the next moment.
This reduces the probability of decision-making errors in similar high-risk scenarios and enhances the system's adaptability to long-tail and complex dynamic scenarios, fundamentally strengthening the robustness and safety of the autonomous driving system.

[0220] This embodiment also provides an autonomous driving control device for implementing the above embodiments and preferred embodiments; details already described will not be repeated. As used below, the term "module" can refer to a combination of software and / or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, hardware implementation, or a combination of software and hardware, is also possible and contemplated.

[0221] This embodiment provides an automatic driving control device which, as shown in Figure 5, includes:

[0222] The acquisition module 301 is used to acquire the current heterogeneous multi-source data corresponding to the target vehicle; the current heterogeneous multi-source data includes the current visual data, the current laser point cloud data, the current natural language command data, and the current vehicle status data;

[0223] The first generation module 302 is used to generate a first fusion feature based on the current heterogeneous multi-source data;

[0224] The first determining module 303 is used to identify the first fused features and determine the target uncertainty of the target vehicle in the current driving scenario; the target uncertainty represents the comprehensive quantitative assessment of the reliability of the autonomous driving system's own decision-making in the current driving scenario.

[0225] The second determining module 304 is used to determine the uncertainty level corresponding to the target vehicle based on the target uncertainty; generate the target strategic intent corresponding to the uncertainty level based on the uncertainty level; the target strategic intent represents the top-level decision constraints generated under the uncertainty level corresponding to the current driving scenario.

[0226] The second generation module 305 is used to identify the first fused features and the target strategic intent, and generate target control commands corresponding to the target vehicle; the target control commands include: steering angle control commands, throttle control commands and brake control commands;

[0227] The control module 306 is used to transmit the target control command to the actuator corresponding to the target vehicle and control the actuator to execute the target control command for autonomous driving.

[0228] The autonomous driving control device provided in this embodiment of the invention can execute the autonomous driving control method provided in any embodiment of the invention, and has the corresponding functional modules and beneficial effects for executing the method. Further functional descriptions of the various modules and units described above are the same as those in the corresponding embodiments described above, and will not be repeated here.
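
The six modules 301 to 306 can be wired together as in the following sketch. Each module is represented by a plain callable so that the data flow is visible; the class name, field names, and callable signatures are assumptions for illustration and do not describe the actual implementation of the device.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DrivingControlDevice:
    acquire: Callable[[], Any]                    # acquisition module 301
    fuse: Callable[[Any], Any]                    # first generation module 302
    estimate_uncertainty: Callable[[Any], float]  # first determining module 303
    plan_intent: Callable[[float], Any]           # second determining module 304
    generate_command: Callable[[Any, Any], Any]   # second generation module 305
    actuate: Callable[[Any], None]                # control module 306

    def step(self):
        """Run one control cycle and return the command sent to the actuator."""
        data = self.acquire()
        first_fusion_feature = self.fuse(data)
        target_uncertainty = self.estimate_uncertainty(first_fusion_feature)
        intent = self.plan_intent(target_uncertainty)
        command = self.generate_command(first_fusion_feature, intent)
        self.actuate(command)
        return command
```

Because each module is injected, a module can be replaced by a stub for testing or by a hardware-backed implementation without changing the control cycle.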

[0229] Figure 6 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention.

[0230] Referring now to Figure 6, a structural schematic of an electronic device suitable for implementing embodiments of the present invention is shown. The electronic device may include a processor (e.g., a central processing unit, graphics processor, etc.) 01, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 02 or a program loaded from a memory 08 into a random access memory (RAM) 03. The RAM 03 also stores various programs and data required for the operation of the electronic device. The processor 01, ROM 02, and RAM 03 are interconnected via a bus 04. An input / output (I / O) interface 05 is also connected to the bus 04.

[0231] Typically, the following devices can be connected to the I / O interface 05: input devices 06 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 07 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; memory devices 08 including, for example, magnetic tapes, hard disks, etc.; and communication devices 09. The communication device 09 allows the electronic device to communicate wirelessly or by wire with other devices to exchange data. Although Figure 6 shows an electronic device with various devices, it should be understood that not all of the devices shown are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.

[0232] In particular, according to embodiments of the present invention, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network via communication device 09, or installed from memory 08, or installed from ROM 02. When the computer program is executed by processor 01, it performs the functions defined in the automatic driving control method of this embodiment of the invention.

[0233] The electronic device shown in Figure 6 is merely an example and should not be construed as limiting the functionality and scope of use of the embodiments of the present invention.

[0234] This invention also provides a computer-readable storage medium. The methods described above according to embodiments of the invention can be implemented in hardware or firmware, or implemented as computer code that can be recorded on a storage medium, or implemented as computer code downloaded via a network and originally stored on a remote storage medium or a non-transitory machine-readable storage medium and then stored on a local storage medium. Thus, the methods described herein can be processed by software stored on a storage medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware. The storage medium can be a magnetic disk, optical disk, read-only memory, random access memory, flash memory, hard disk, or solid-state drive, etc.; further, the storage medium can also include combinations of the above types of memory. It is understood that computers, processors, microprocessor controllers, or programmable hardware include storage components capable of storing or receiving software or computer code. When the software or computer code is accessed and executed by the computer, processor, or hardware, the autonomous driving control method shown in the above embodiments is implemented.

[0235] A portion of this invention can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide the methods and / or technical solutions according to the invention through the operation of the computer. Those skilled in the art will understand that the forms in which computer program instructions exist in a computer-readable medium include, but are not limited to, source files, executable files, installation package files, etc. Correspondingly, the ways in which computer program instructions are executed by a computer include, but are not limited to: the computer directly executing the instructions, or the computer compiling the instructions and then executing the corresponding compiled program, or the computer reading and executing the instructions, or the computer reading and installing the instructions and then executing the corresponding installed program. Here, the computer-readable medium can be any available computer-readable storage medium or communication medium accessible to a computer.

[0236] Although embodiments of the invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations all fall within the scope defined by the appended claims.

Claims

1. An automatic driving control method, characterized in that, The method includes: Obtain the current heterogeneous multi-source data corresponding to the target vehicle; the current heterogeneous multi-source data includes current visual data, current laser point cloud data, current natural language command data, and current vehicle status data; Based on the current heterogeneous multi-source data, a first fusion feature is generated; The first fused feature is identified to determine the target uncertainty of the target vehicle in the current driving scenario; the target uncertainty represents the comprehensive quantitative assessment of the reliability of the autonomous driving system's own decision-making in the current driving scenario. Based on the target uncertainty, the uncertainty level corresponding to the target vehicle is determined; based on the uncertainty level, the target strategic intent corresponding to the uncertainty level is generated; the target strategic intent represents the top-level decision constraints generated under the uncertainty level corresponding to the current driving scenario. The first fusion feature and the target strategic intent are identified to generate target control commands corresponding to the target vehicle; the target control commands include: steering angle control commands, throttle control commands, and brake control commands. The target control command is transmitted to the actuator corresponding to the target vehicle, and the actuator is controlled to execute the target control command for autonomous driving. The step of identifying the first fused feature and determining the target vehicle's target uncertainty in the current driving scenario includes: The system acquires the historical context sequence of the target vehicle within a preset time period prior to the current moment, as well as the historical similarity sequence corresponding to the historical context sequence. 
The historical context sequence includes historical visual data, historical laser point cloud data, historical natural language command data, and historical vehicle status data corresponding to each historical moment. The historical similarity sequence is used to characterize the similarity between the historical heterogeneous multi-source data corresponding to each historical moment and the current heterogeneous multi-source data corresponding to the current moment. The first fusion feature, the historical context sequence, and the historical similarity sequence are fused to generate a second fusion feature; The second fusion feature is subjected to independent forward propagation processing to generate at least one preliminary strategic intent and a first intent uncertainty corresponding to each preliminary strategic intent; Based on the uncertainty of the first intent corresponding to each of the preliminary strategic intents, the cognitive uncertainty corresponding to the target vehicle is determined; The target uncertainty is generated based on the cognitive uncertainty.

2. The method according to claim 1, characterized in that, The step of generating the target uncertainty based on the cognitive uncertainty includes: The second fusion feature is processed based on the heteroscedasticity regression head to generate the second intent uncertainty corresponding to each of the preliminary strategic intents; Based on the uncertainty of the second intent corresponding to each of the aforementioned preliminary strategic intents, the accidental uncertainty corresponding to the target vehicle is determined; The cognitive uncertainty and the accidental uncertainty are fused together to generate the target uncertainty.

3. The method according to claim 1, characterized in that, The step of obtaining the historical context sequence of the target vehicle within a preset time period prior to the current moment and the historical similarity sequence corresponding to the historical context sequence includes: The historical visual data, historical laser point cloud data, historical natural language command data, and historical vehicle status data corresponding to the target vehicle at each of the historical moments are stored. The similarity between the current visual data, the current laser point cloud data, the current natural language command data, and the current vehicle status data and the historical visual data, the historical laser point cloud data, the historical natural language command data, and the historical vehicle status data corresponding to each historical moment is calculated respectively, and the historical similarity sequence corresponding to the historical context sequence is generated.

4. The method according to claim 1, characterized in that, The step of identifying the first fused feature and the target strategic intent to generate the target control command corresponding to the target vehicle includes: Feature extraction is performed on the first fused feature to obtain the first local feature; Obtain historical vehicle status data for the target vehicle within a preset time period prior to the current moment; Each sub-state data in the historical vehicle state data is mapped to a state feature label through a linear embedding layer; The first local feature is mapped to a scene feature label with the same dimension as the state feature label; The target strategic intent is encoded as a task condition feature tag; The state feature marker, the scene feature marker, and the task condition feature marker are fused together to generate the target feature marker; Based on the target feature markers, the target control instructions are generated.

5. The method according to claim 4, characterized in that, The step of generating the target control command based on the target feature marker includes: The target feature markers are input into a preset control command generation model; The preset control instruction generation model encodes the target feature labels and, based on a self-attention mechanism, predicts the candidate behavior sequence at least one time step in the future, as well as the candidate confidence score and candidate behavior uncertainty score corresponding to each candidate behavior. Based on the candidate confidence score and the candidate behavior uncertainty score corresponding to each candidate behavior, a target behavior sequence is determined where the candidate confidence score is greater than a preset confidence score threshold and the candidate behavior uncertainty score is lower than a preset uncertainty score threshold. The target control instruction is generated based on the target behavior sequence, the target confidence score corresponding to the target behavior sequence, and the target behavior sequence uncertainty score.

6. The method according to claim 5, characterized in that, The step of generating the target control instruction based on the target behavior sequence, the target confidence score corresponding to the target behavior sequence, and the target behavior sequence uncertainty score includes: The target behavior sequence, the target confidence score and the target behavior sequence uncertainty score, the current vehicle state data, and the first local feature are subjected to feature fusion processing to obtain the third fused feature; The third fused feature is input into a fully connected multilayer perceptron, mapped to the control parameter space via a nonlinear activation function, and outputs a triplet target control command.

7. The method according to any one of claims 1-6, characterized in that, after transmitting the target control command to the actuator corresponding to the target vehicle and controlling the actuator to execute the target control command for autonomous driving, the method further includes: storing the target strategic intent of the target vehicle at the current moment, the cognitive uncertainty and accidental uncertainty corresponding to the target uncertainty, the target control command, and the target behavior sequence corresponding to the target control command; comparing the target uncertainty with a preset uncertainty threshold; if the target uncertainty is greater than the preset uncertainty threshold, generating a memory cue vector based on the target strategic intent, the cognitive uncertainty and accidental uncertainty corresponding to the target uncertainty, the target control command, and the target behavior sequence corresponding to the target control command; and generating, based on the memory cue vector, the strategic intent for the moment following the current moment.
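The storage and threshold check in claim 7 can be sketched as below. The class name `DecisionMemory`, the bounded capacity, the numeric encodings of the intent and behavior sequence, and the construction of the memory cue vector by plain concatenation are all assumptions; the claim does not state how the vector is formed or how it conditions the next strategic intent.

```python
from collections import deque

class DecisionMemory:
    """Bounded log of past decisions (capacity is an assumed parameter)."""

    def __init__(self, uncertainty_threshold, capacity=256):
        self.uncertainty_threshold = uncertainty_threshold
        self.records = deque(maxlen=capacity)

    def store(self, intent, cognitive_u, accidental_u, command, behaviors):
        # Persist everything the claim lists for the current moment.
        record = {
            "intent": list(intent),
            "cognitive_u": cognitive_u,
            "accidental_u": accidental_u,
            "command": list(command),
            "behaviors": list(behaviors),
        }
        self.records.append(record)
        return record

    def cue_vector(self, record, target_uncertainty):
        # A cue is produced only when the target uncertainty exceeds
        # the preset threshold.
        if target_uncertainty <= self.uncertainty_threshold:
            return None
        return [*record["intent"],
                record["cognitive_u"], record["accidental_u"],
                *record["command"], *record["behaviors"]]

memory = DecisionMemory(uncertainty_threshold=0.5)
rec = memory.store(intent=[1.0, 0.0], cognitive_u=0.4, accidental_u=0.3,
                   command=(0.05, 0.2, 0.0), behaviors=[0.0, 1.0])
print(memory.cue_vector(rec, target_uncertainty=0.3))  # None
print(len(memory.cue_vector(rec, target_uncertainty=0.7)))  # 9
```

In this sketch the cue vector would be supplied as an extra input when the next strategic intent is generated; that downstream step is not shown.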

8. An automatic driving control device, characterized in that the device includes: an acquisition module, used to acquire current heterogeneous multi-source data corresponding to a target vehicle, the current heterogeneous multi-source data including current visual data, current laser point cloud data, current natural language command data, and current vehicle state data; a first generation module, used to generate a first fused feature based on the current heterogeneous multi-source data; a first determining module, used to identify the first fused feature and determine a target uncertainty of the target vehicle in the current driving scenario, the target uncertainty characterizing a comprehensive quantitative assessment of the reliability of the autonomous driving system's own decision-making in the current driving scenario; wherein identifying the first fused feature and determining the target uncertainty of the target vehicle in the current driving scenario includes: acquiring a historical context sequence within a preset time period before the current moment corresponding to the target vehicle and a historical similarity sequence corresponding to the historical context sequence, wherein the historical context sequence includes historical visual data, historical laser point cloud data, historical natural language command data, and historical vehicle state data corresponding to each historical moment, and the historical similarity sequence characterizes the similarity between the historical heterogeneous multi-source data corresponding to each historical moment and the current heterogeneous multi-source data corresponding to the current moment; fusing the first fused feature, the historical context sequence, and the historical similarity sequence to generate a second fused feature; performing independent forward propagation processing on the second fused feature to generate at least one preliminary strategic intent and a first intent uncertainty corresponding to each preliminary strategic intent; determining the cognitive uncertainty corresponding to the target vehicle based on the first intent uncertainty corresponding to each preliminary strategic intent; and generating the target uncertainty based on the cognitive uncertainty; a second determining module, used to determine the uncertainty level corresponding to the target vehicle based on the target uncertainty and to generate the target strategic intent corresponding to the uncertainty level, the target strategic intent representing the top-level decision constraints generated under the uncertainty level corresponding to the current driving scenario; a second generation module, used to identify the first fused feature and the target strategic intent and generate a target control command corresponding to the target vehicle, the target control command including a steering angle control command, a throttle control command, and a brake control command; and a control module, used to transmit the target control command to the actuator corresponding to the target vehicle and control the actuator to execute the target control command for autonomous driving.
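The cognitive uncertainty described for the first determining module can be illustrated with a short sketch. It assumes that each independent forward pass outputs a probability distribution over candidate strategic intents, that the first intent uncertainty of a pass is the entropy of that distribution, and that the cognitive uncertainty is the mutual-information estimate (entropy of the mean distribution minus the mean per-pass entropy). This is a commonly used estimator, swapped in here for illustration; the claim does not specify the formula.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def cognitive_uncertainty(pass_distributions):
    """Return (preliminary intents, per-pass uncertainties, cognitive)."""
    n = len(pass_distributions)
    # Preliminary strategic intent of each pass: its most likely intent.
    intents = [max(range(len(p)), key=p.__getitem__)
               for p in pass_distributions]
    # First intent uncertainty of each pass (assumed to be its entropy).
    per_pass = [entropy(p) for p in pass_distributions]
    mean_dist = [sum(col) / n for col in zip(*pass_distributions)]
    # Disagreement between passes: entropy of the mean minus mean entropy.
    cognitive = max(0.0, entropy(mean_dist) - sum(per_pass) / n)
    return intents, per_pass, cognitive

agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
disagree = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

# Approximately zero: all passes produce the same distribution.
print(cognitive_uncertainty(agree)[2])
# Clearly positive: the passes disagree about the intent.
print(cognitive_uncertainty(disagree)[2])
```

When the passes agree, the estimate is close to zero even if each pass is individually uncertain, which is what separates model (cognitive) uncertainty from noise inherent in the scene.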

9. A target vehicle, characterized in that it includes a vehicle body and an electronic device, the electronic device including a memory and a processor that are communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions to perform the automatic driving control method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing a computer to execute the automatic driving control method according to any one of claims 1 to 7.

11. A computer program product, characterized in that it includes computer instructions for causing a computer to perform the automatic driving control method according to any one of claims 1 to 7.