A method and system for scheduling heterogeneous tasks using an NPU for driver assistance
By dynamically identifying multimodal sensor datasets and implementing three-level adaptive scheduling, the efficiency and safety issues of existing vehicle-mounted NPU heterogeneous task scheduling methods in complex scenarios are resolved. This achieves adaptive matching and optimization of computing resources, thereby improving the overall performance of the driver assistance system.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NANJING INMOT INFORMATION TECH CO LTD
- Filing Date
- 2026-05-27
- Publication Date
- 2026-06-30
AI Technical Summary
Existing heterogeneous task scheduling methods for vehicle-mounted NPUs mostly employ static or semi-static strategies, which make it difficult to achieve unified real-time perception and dynamic optimization across multiple scenarios in complex and ever-changing road conditions. This results in low computing power utilization, poor system energy efficiency, and driving safety risks.
By acquiring multimodal sensor datasets, multidimensional dynamic recognition and heterogeneous task graph construction are performed to generate dynamic driving scenario information sets. Three-level adaptive scheduling is then carried out, including inter-core scheduling, cross-unit collaborative scheduling, and dynamic task migration. Combined with multi-dimensional monitoring and data-driven closed-loop strategy iteration, resource allocation is optimized.
It improves the utilization and energy efficiency of heterogeneous computing platforms, ensures the real-time performance and stability of task execution, reduces the risk of resource allocation mismatch, and enhances the robustness and security of the system.
Figure CN122309088A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of driver assistance technology, and in particular to an NPU heterogeneous task scheduling method and system for driver assistance.
Background Technology
[0002] In the fields of advanced driver assistance and autonomous driving, in-vehicle heterogeneous computing systems have become the core foundation for realizing real-time environmental perception, intelligent decision-making, and precise control. Their task scheduling efficiency and level of intelligence directly affect the system's real-time performance, energy efficiency, and driving safety, and are one of the core technologies for the development of intelligent connected vehicles.
[0003] Existing heterogeneous task scheduling methods for in-vehicle NPUs mostly employ static or semi-static task mapping and resource allocation strategies. When faced with complex and ever-changing road scenarios, these methods struggle to achieve unified real-time perception and dynamic optimization across multiple scenarios. Under conditions of scenario switching, computing power fluctuations, or unforeseen circumstances, tasks and resources cannot be allocated rationally, reducing computing power utilization and system energy efficiency. Furthermore, scheduling delays and decision-making biases can easily lead to system response lags, posing driving safety risks.
Summary of the Invention
[0004] This application provides a method and system for scheduling heterogeneous tasks using an NPU for driver assistance systems, in order to solve the aforementioned technical problems.
[0005] In a first aspect, this application provides an NPU heterogeneous task scheduling method for assisted driving, the method comprising: acquiring a vehicle-mounted multimodal sensor dataset; based on the vehicle-mounted multimodal sensor dataset, performing multidimensional dynamic recognition of assisted driving scenarios and heterogeneous task graph construction to generate a dynamic driving scenario information set; based on the dynamic driving scenario information set, performing task-computing unit dynamic matching and driving three-level adaptive scheduling covering inter-core scheduling, cross-unit collaborative scheduling, and dynamic task migration to generate a dynamic scheduling decision information set; and based on the dynamic scheduling decision information set, performing multidimensional monitoring of scheduling efficiency and data-driven closed-loop iteration of strategies to generate an adaptive optimization result set for assisted driving.
[0006] Optionally, the generation process of the dynamic driving scenario information set includes: the vehicle-mounted multimodal sensor dataset includes raw perception data from high-definition cameras, millimeter-wave radar, lidar, and vehicle controller status data; real-time pattern matching and decision tree inference are performed based on the vehicle-mounted multimodal sensor dataset to generate driving scenario labels containing specific scenario types and quantified safety level labels; based on the driving scenario labels, atomic task nodes of the entire process of the assisted driving algorithm are instantiated from a predefined task library, and a heterogeneous driving task graph is constructed according to data flow dependencies; real-time load rate, available memory, and power consumption data of the heterogeneous computing unit cluster are collected to generate a heterogeneous computing resource status snapshot; the driving scenario labels, the heterogeneous driving task graph, and the heterogeneous computing resource status snapshot are encapsulated to generate the dynamic driving scenario information set.
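As a non-limiting illustration, the encapsulation described in this paragraph can be sketched as three nested records. The class names, field names, units, and the assumption that the risk level is normalized to [0, 1] are hypothetical and are not specified by this application.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DrivingScenarioLabel:
    scene_type: str    # assumed values: "urban", "highway", "extreme"
    risk_level: float  # assumed to be normalized to [0, 1]


@dataclass
class HeterogeneousTaskGraph:
    nodes: List[str] = field(default_factory=list)
    # Each edge is (producer, consumer), following the data flow.
    edges: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class UnitStatus:
    load_rate: float            # fraction of capacity in use, 0..1
    available_memory_mb: float
    power_w: float


@dataclass
class ScenarioInfoSet:
    label: DrivingScenarioLabel
    task_graph: HeterogeneousTaskGraph
    resource_snapshot: Dict[str, UnitStatus]


def encapsulate(label, task_graph, live_status):
    # Deep-copy the live status so later updates do not alter the snapshot.
    return ScenarioInfoSet(label, task_graph, copy.deepcopy(live_status))


live = {"NPU": UnitStatus(0.35, 2048.0, 12.5), "CPU": UnitStatus(0.10, 4096.0, 6.0)}
graph = HeterogeneousTaskGraph(
    ["segmentation", "planning"], [("segmentation", "planning")]
)
info_set = encapsulate(DrivingScenarioLabel("urban", 0.4), graph, live)
live["NPU"].load_rate = 0.90  # the snapshot keeps the value captured earlier
```

Copying the resource status at encapsulation time is one possible design choice: it lets the scheduler reason about a consistent view even while the hardware counters keep changing.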
[0007] Optionally, the process of generating the driving scene label includes: based on the image data from the high-definition camera, identifying lane line geometric change points, static obstacle contour density, and blind spot ratio, and quantifying and generating road structure change features; based on the continuous frame point cloud data from the millimeter-wave radar and the lidar, calculating the covariance of the movement trajectories of major traffic participants and the divergence of future trajectory packets, and quantifying and generating traffic intention uncertainty features; based on the vehicle controller state data, evaluating the consistency level of multi-source perception data, and quantifying and generating perception confidence decay features; mapping the road structure change features, the traffic intention uncertainty features, and the perception confidence decay features to generate specific scene types, and calculating the dynamic comprehensive risk level, which together constitute the driving scene label; the specific scene types include urban assisted driving type, highway assisted driving type, and extreme assisted driving type.
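As a non-limiting illustration, the mapping from the three quantified features to a specific scene type and a dynamic comprehensive risk level might look like the following. The weights, thresholds, and the assumption that each feature is normalized to [0, 1] are hypothetical; this application does not disclose concrete values or a concrete mapping rule.

```python
# Hypothetical weights for combining the three features into one risk value.
WEIGHTS = {"road_change": 0.3, "intent_uncertainty": 0.4, "confidence_decay": 0.3}


def classify_scene(road_change, intent_uncertainty, confidence_decay):
    """Return (scene_type, risk_level) from three features in [0, 1]."""
    risk = (
        WEIGHTS["road_change"] * road_change
        + WEIGHTS["intent_uncertainty"] * intent_uncertainty
        + WEIGHTS["confidence_decay"] * confidence_decay
    )
    # Assumed rule: very high risk or badly degraded perception is "extreme".
    if risk >= 0.7 or confidence_decay >= 0.8:
        scene_type = "extreme"
    # Assumed rule: frequent road structure changes indicate an urban scene.
    elif road_change >= 0.4:
        scene_type = "urban"
    else:
        scene_type = "highway"
    return scene_type, risk


highway_label = classify_scene(0.1, 0.1, 0.1)
urban_label = classify_scene(0.6, 0.5, 0.2)
extreme_label = classify_scene(0.9, 0.9, 0.9)
```

A decision tree, as mentioned in the preceding paragraphs, would replace the hand-written thresholds here; the sketch only shows the shape of the input and output.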
[0008] Optionally, the process of generating the heterogeneous driving task graph includes: dynamically selecting and instantiating atomic task nodes adapted to the current scenario from a predefined task library using the specific scenario type as an index; the atomic task nodes include perception-type atomic task nodes, prediction-type atomic task nodes, and decision-type atomic task nodes; dynamically adjusting the processing accuracy parameters and update frequency parameters of the perception-type atomic task nodes according to the quantified value of the road structure change features, as perception fusion node data; setting multiple hypothesis calculation branches for the prediction-type atomic task nodes and the decision-type atomic task nodes according to the quantified value of the traffic intention uncertainty features, and increasing the weights of the dependency edges between these nodes and the perception fusion node data; mapping the dynamic comprehensive risk level to a global task execution urgency coefficient, thereby uniformly labeling the heterogeneous scheduling priority of all the atomic task nodes; connecting the atomic task nodes with the perception fusion node data, the multiple hypothesis calculation branches, and the heterogeneous scheduling priority according to the data flow dependency relationship of the algorithm flow to generate the heterogeneous driving task graph.
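As a non-limiting illustration, the task graph construction described in this paragraph can be sketched with a small predefined library and a topological ordering. The library contents, the formulas for update frequency, branch count, and urgency coefficient, and all names are hypothetical. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical task library: scene type -> {task: (kind, predecessors)}.
TASK_LIBRARY = {
    "urban": {
        "segmentation": ("perception", []),
        "bev_modeling": ("perception", ["segmentation"]),
        "trajectory_prediction": ("prediction", ["bev_modeling"]),
        "planning": ("decision", ["trajectory_prediction"]),
    },
    "highway": {
        "feature_extraction": ("perception", []),
        "data_association": ("perception", ["feature_extraction"]),
        "trajectory_smoothing": ("decision", ["data_association"]),
    },
}


def build_task_graph(scene_type, road_change, intent_uncertainty, risk_level):
    # Assumed mapping from risk level to a global urgency coefficient.
    urgency = 1.0 + risk_level
    graph = {}
    for name, (kind, preds) in TASK_LIBRARY[scene_type].items():
        node = {"kind": kind, "preds": list(preds), "priority": urgency}
        if kind == "perception":
            # Assumed rule: update faster when the road structure changes more.
            node["update_hz"] = 10.0 * (1.0 + road_change)
        else:
            # Assumed rule: up to three hypothesis branches under uncertainty.
            node["hypotheses"] = 1 + int(2 * intent_uncertainty)
        graph[name] = node
    return graph


def execution_order(graph):
    # TopologicalSorter expects {node: predecessors}.
    sorter = TopologicalSorter({n: a["preds"] for n, a in graph.items()})
    return list(sorter.static_order())


graph = build_task_graph(
    "urban", road_change=0.5, intent_uncertainty=0.6, risk_level=0.4
)
order = execution_order(graph)
```

Because the example library is a simple chain, `order` lists perception before prediction before decision, which is the data flow dependency the paragraph describes.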
[0009] Optionally, the generation process of the dynamic scheduling decision information set includes: based on the heterogeneous driving task graph and the snapshot of the heterogeneous computing resource status, modeling the task allocation problem as a multi-objective optimization game, solving the optimal matching scheme through a fast convergence algorithm, and generating a task-unit matching mapping table; according to the task-unit matching mapping table, performing load-aware fine-grained task partitioning and resource allocation among several computing cores within the NPU, and generating fine-grained scheduling instructions between cores; simultaneously, performing scenario-demand-oriented computing power collaboration and priority arbitration among the heterogeneous computing unit clusters, and generating cross-computing unit collaborative scheduling instructions; continuously monitoring scenario switching events and the health status of the heterogeneous computing unit clusters, and if a scenario switching or health status abnormality is detected, triggering the generation of a task subgraph overall redeployment plan, and generating task dynamic migration guidance; encapsulating the fine-grained scheduling instructions between cores, the cross-computing unit collaborative scheduling instructions, and the task dynamic migration guidance to generate the dynamic scheduling decision information set.
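As a non-limiting illustration, a task-unit matching mapping table can be produced by a simple greedy assignment over a weighted latency and energy cost. This is a deliberately simplified stand-in, not the multi-objective optimization game or the fast convergence algorithm referred to above, whose details are not disclosed here. The cost formula, weights, and field names are hypothetical.

```python
def match_tasks(tasks, units, w_latency=0.6, w_energy=0.4):
    """Greedily map each task to the unit with the lowest weighted cost."""
    load = {name: unit["load"] for name, unit in units.items()}
    mapping = {}
    # Higher-priority tasks choose first.
    by_priority = sorted(tasks.items(), key=lambda kv: -kv[1]["priority"])
    for task_name, task in by_priority:

        def cost(unit_name):
            unit = units[unit_name]
            latency = task["work"] / unit["speed"][task["kind"]]
            energy = latency * unit["power_w"]
            # Penalize units that are already busy.
            return (w_latency * latency + w_energy * energy) * (1.0 + load[unit_name])

        best = min(units, key=cost)
        mapping[task_name] = best
        load[best] += task["load_share"]
    return mapping


units = {
    "NPU": {"load": 0.2, "power_w": 10.0, "speed": {"nn": 8.0, "control": 1.0}},
    "CPU": {"load": 0.1, "power_w": 5.0, "speed": {"nn": 1.0, "control": 4.0}},
}
tasks = {
    "segmentation": {"kind": "nn", "work": 8.0, "priority": 2.0, "load_share": 0.3},
    "lateral_control": {"kind": "control", "work": 2.0, "priority": 1.0, "load_share": 0.1},
}
mapping_table = match_tasks(tasks, units)
```

In this example the neural network task lands on the NPU and the control task on the CPU, because each unit is assumed to be much faster on its own task kind.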
[0010] Optionally, the process of generating fine-grained scheduling instructions between cores includes: dynamically and logically grouping NPU computing cores according to the computing power requirements of the driving scenario labels, wherein for unstructured urban scenarios multiple cores are grouped into collaborative computing units to process large-scale fusion models, and for structured highway scenarios core groups are decoupled and some cores are shut down to reduce power consumption; monitoring the utilization rate of each computing core and the communication latency between adjacent cores in real time through the on-chip network; when the load deviation is detected to exceed a preset threshold, migrating some computing subgraphs on high-load cores to low-load cores, generating and executing load balancing migration instructions; identifying atomic task nodes marked with fixed computing patterns and high data throughput in the heterogeneous driving task graph, offloading them from the general computing cores, and scheduling them to the dedicated hardware acceleration engine integrated in the NPU for execution, generating and executing dedicated engine offloading instructions.
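As a non-limiting illustration, the load balancing migration described in this paragraph can be sketched as follows. The threshold value, the choice of which subgraph to move, and the data layout are hypothetical; subgraph loads are assumed to be positive.

```python
def balance_cores(core_tasks, threshold=0.2):
    """Move subgraphs from the busiest core to the idlest one.

    core_tasks maps core id -> {subgraph name: load}. Returns the list of
    migration instructions as (subgraph, source core, target core).
    """
    moves = []
    while True:
        loads = {core: sum(t.values()) for core, t in core_tasks.items()}
        hot = max(loads, key=loads.get)
        cold = min(loads, key=loads.get)
        gap = loads[hot] - loads[cold]
        if gap <= threshold:
            break
        # Only subgraphs lighter than the gap reduce the imbalance.
        candidates = [(l, name) for name, l in core_tasks[hot].items() if l < gap]
        if not candidates:
            break
        load, name = max(candidates)  # the heaviest subgraph that still helps
        del core_tasks[hot][name]
        core_tasks[cold][name] = load
        moves.append((name, hot, cold))
    return moves


cores = {0: {"a": 0.4, "b": 0.3, "c": 0.2}, 1: {"d": 0.1}}
migrations = balance_cores(cores)
```

Each move strictly reduces the spread between the two cores involved, so the loop terminates once the deviation falls within the threshold or no helpful move remains.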
[0011] Optionally, the generation process of the cross-computing unit collaborative scheduling instruction includes: when the driving scenario label is the urban assisted driving type, generating a first collaborative strategy instruction: directing the image semantic segmentation and BEV spatial modeling task scheduling instructions to the NPU computing core group, directing the multi-target trajectory prediction task scheduling instructions to the GPU, directing the vehicle lateral and longitudinal control task scheduling instructions to the CPU, and configuring the ISP to enter the enhanced image preprocessing mode; when the driving scenario label is the highway assisted driving type, generating a second collaborative strategy instruction: decomposing the target tracking task into feature extraction and data association, with their scheduling instructions directed to the NPU and GPU respectively for collaborative pipeline processing, directing the path trajectory smoothing task scheduling instruction to the CPU, and generating an energy-saving control instruction to reduce the working frequency of the NPU non-core computing units; when the driving scenario label is the extreme assisted driving type, generating a third collaborative strategy instruction: broadcasting the highest priority preemption instruction to all computing units, suspending all non-safety-critical tasks, ensuring that the obstacle emergency recognition and braking decision tasks obtain exclusive computing resources until the safety level is reduced.
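As a non-limiting illustration, the three collaborative strategies can be expressed as lookup tables plus a preemption rule. The task identifiers, instruction strings, and table layout are hypothetical; only the task-to-unit assignments follow the text above.

```python
# Hypothetical routing tables; assignments follow the strategies above.
ROUTING = {
    "urban": {
        "image_semantic_segmentation": "NPU",
        "bev_spatial_modeling": "NPU",
        "multi_target_trajectory_prediction": "GPU",
        "lateral_longitudinal_control": "CPU",
    },
    "highway": {
        "tracking_feature_extraction": "NPU",
        "tracking_data_association": "GPU",
        "path_trajectory_smoothing": "CPU",
    },
}
CONTROL_NOTES = {
    "urban": ["ISP: enter enhanced image preprocessing mode"],
    "highway": ["NPU: lower frequency of non-core computing units"],
}
SAFETY_CRITICAL = {"obstacle_emergency_recognition", "braking_decision"}


def collaborative_instructions(scene_type, active_tasks):
    if scene_type == "extreme":
        # Preemption: only safety-critical tasks keep running.
        return [
            (task, "RUN_EXCLUSIVE" if task in SAFETY_CRITICAL else "SUSPEND")
            for task in active_tasks
        ]
    table = ROUTING[scene_type]
    routed = [(task, table[task]) for task in active_tasks if task in table]
    return routed + [("control", note) for note in CONTROL_NOTES[scene_type]]


urban = collaborative_instructions(
    "urban", ["image_semantic_segmentation", "lateral_longitudinal_control"]
)
extreme = collaborative_instructions(
    "extreme", ["braking_decision", "path_trajectory_smoothing"]
)
```

Keeping the strategies in tables makes it straightforward to add a scene type or retarget a task without touching the dispatch logic.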
[0012] Optionally, the process of generating the task dynamic migration guidance includes: when the scene recognition module detects a change in the specific scene type in the driving scene label, triggering a scene migration process: based on the new and old scene types, re-solving the task allocation game to generate a new matching mapping table, thereby smoothly migrating the task subgraph of the outgoing scene from its original computing unit to the newly mapped computing unit; when the health monitoring module detects that the utilization rate of any computing unit is continuously overloaded or a temperature alarm is triggered, triggering a state migration process: migrating the atomic task node with the highest load on this computing unit, together with the subgraph formed by its direct predecessor and successor dependent nodes in the heterogeneous driving task graph, as a whole to a healthy redundant computing unit; after each migration process is completed, updating the heterogeneous computing resource state snapshot, and re-evaluating the scheduling balance of the heterogeneous computing unit cluster.
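As a non-limiting illustration, the state migration process can be sketched as selecting the most loaded node on the affected unit together with its direct predecessors and successors, then reassigning that subgraph as a whole. The function name, data layout, and example values are hypothetical.

```python
def migrate_hot_subgraph(edges, placement, node_load, hot_unit, spare_unit):
    """Return (new placement, migrated nodes) for one state migration.

    edges is a list of (producer, consumer) pairs; placement maps each
    node to the unit it currently runs on.
    """
    on_hot = [n for n, unit in placement.items() if unit == hot_unit]
    if not on_hot:
        return dict(placement), set()
    hottest = max(on_hot, key=lambda n: node_load[n])
    subgraph = {hottest}
    for src, dst in edges:
        if src == hottest:
            subgraph.add(dst)  # direct successor
        if dst == hottest:
            subgraph.add(src)  # direct predecessor
    new_placement = {
        n: (spare_unit if n in subgraph else unit) for n, unit in placement.items()
    }
    return new_placement, subgraph


edges = [("camera_in", "segmentation"), ("segmentation", "bev"), ("bev", "planning")]
placement = {"camera_in": "CPU", "segmentation": "NPU0", "bev": "NPU0", "planning": "CPU"}
node_load = {"camera_in": 0.1, "segmentation": 0.7, "bev": 0.4, "planning": 0.2}
new_placement, moved = migrate_hot_subgraph(edges, placement, node_load, "NPU0", "NPU1")
```

Moving the neighbors with the hot node keeps each producer and consumer pair on the same unit, which is how the subgraph-level migration avoids interrupting data dependencies.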
[0013] Optionally, the process of generating the adaptive optimization result set for assisted driving includes: collecting, in real time, performance indicators after scheduling based on the dynamic scheduling decision information set, wherein the performance indicators include the actual completion time of each atomic task node in the heterogeneous driving task graph, the actual power consumption of the heterogeneous computing unit cluster, and the actual risk change rate of the driving scenario label; comparing and analyzing the performance indicators with the performance expectations preset based on the driving scenario label to identify performance deviations and their corresponding scenario conditions and scheduling decisions; based on the continuous analysis results of the performance deviations, dynamically updating the generation strategy of the task-unit matching mapping table and the strategy parameters in the three-level adaptive scheduling using an incremental learning method to form an optimized scheduling strategy; and encapsulating and storing the optimized scheduling strategy for direct invocation in subsequent scenarios with the same or similar driving scenario labels to generate the adaptive optimization result set for assisted driving.
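As a non-limiting illustration, one closed-loop iteration can be sketched as computing relative deviations between measured and expected indicators and nudging the matching weights accordingly. The proportional update rule below is a simple stand-in for the incremental learning method, which is not specified in detail here; the learning rate, indicator names, and units are hypothetical.

```python
def update_weights(weights, actual, expected, learning_rate=0.1):
    """Return (new weights, deviations) after one monitoring cycle.

    A positive deviation means the indicator was worse than expected, so
    the weight on that objective is increased before renormalizing.
    """
    deviations = {k: (actual[k] - expected[k]) / expected[k] for k in expected}
    raw = {
        k: max(1e-6, w * (1.0 + learning_rate * deviations[k]))
        for k, w in weights.items()
    }
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}, deviations


weights = {"latency": 0.6, "energy": 0.4}
expected = {"latency": 10.0, "energy": 10.0}  # e.g. ms and W, assumed units
actual = {"latency": 12.0, "energy": 9.0}     # slower but cheaper than expected
new_weights, deviations = update_weights(weights, actual, expected)
```

Here latency overshot its expectation, so the latency weight rises slightly and subsequent matching decisions favor faster units; repeated over many cycles this yields the gradual correction described above.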
[0014] In a second aspect, this application provides an NPU heterogeneous task scheduling system for assisted driving, the system comprising: a driving scenario module, used to acquire the vehicle-mounted multimodal sensor dataset and, based on the vehicle-mounted multimodal sensor dataset, to perform multidimensional dynamic recognition of assisted driving scenarios and construct heterogeneous task graphs to generate a dynamic driving scenario information set; a scheduling decision module, used to perform task-computing unit dynamic matching based on the dynamic driving scenario information set, and to drive the execution of three-level adaptive scheduling covering inter-core scheduling, cross-unit collaborative scheduling, and dynamic task migration to generate a dynamic scheduling decision information set; and an adaptive optimization module, used to perform multidimensional monitoring of scheduling efficiency and data-driven closed-loop iteration of strategies based on the dynamic scheduling decision information set to generate an adaptive optimization result set for assisted driving.
[0015] Compared with existing technologies, the beneficial effects of the present invention, an NPU heterogeneous task scheduling method and system for assisted driving, are as follows: 1. This application enables computing resources to be dynamically matched and allocated according to driving scenarios, improving the utilization and operational efficiency of heterogeneous computing platforms. Relying on three-level scheduling and closed-loop optimization, it ensures the real-time performance and stability of task execution in complex scenarios, improves system robustness, and can provide efficient and reliable computing scheduling support for advanced driver assistance systems.
[0016] 2. By unifying the encapsulation of driving scenario tags, heterogeneous driving task graphs, and heterogeneous computing resource status snapshots, subsequent scheduling can simultaneously consider scenario requirements, task dependencies, and hardware load, reducing the risk of mismatch caused by allocating resources based on only a single factor.
[0017] 3. By quantifying driving scenario labels from dimensions such as road structure, traffic intent, and perception confidence, the complexity and risk changes of the current driving environment can be reflected in detail, providing clearer data basis for subsequent task parameter adjustments and scheduling priority settings.
[0018] 4. By dynamically adjusting the weights of atomic task nodes, computation branches, and dependency edges according to specific scenario types and risk levels, the heterogeneous driving task graph can be adapted to the computational needs of the current scenario, reducing the problem of insufficient response of fixed task processes in complex scenarios or excessive resource consumption in simple scenarios.
[0019] 5. By unifying task-unit matching, inter-core scheduling, cross-unit collaborative scheduling, and dynamic task migration into a dynamic scheduling decision information set, a consistent execution basis can be provided for resource allocation at different levels, which helps to maintain the continuity of the scheduling process when the scenario changes or the resource status fluctuates.
[0020] 6. By logically grouping, migrating loads, and offloading to dedicated engines based on scenario requirements, the balance of computing resources within the NPU can be improved, and unnecessary computing power can be appropriately reduced in scenarios with low computing demands.
[0021] 7. By configuring collaborative strategies for NPU, GPU, CPU, and ISP for urban, highway, and extreme assisted driving types respectively, different computing units can undertake tasks more suited to their processing characteristics, and provide a more stable scheduling basis for prioritizing the execution of safety-related tasks in high-risk scenarios.
[0022] 8. By migrating tasks in units of task subgraphs when switching scene types or when the health status of computing units is abnormal, the risk of data dependency interruption caused by the isolated migration of individual tasks can be reduced, and it helps to maintain the continuous execution of task chains and the relative balance of computing resources.
[0023] 9. By collecting performance indicators such as task completion time, power consumption, and risk change rate, and incrementally updating the matching strategy and scheduling parameters based on performance deviation, the scheduling strategy in subsequent similar scenarios can be gradually corrected, making the scheduling results more consistent with the actual operating status of the vehicle.
Attached Figure Description
[0024] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0025] Figure 1 is a schematic diagram of an application scenario provided in one embodiment of this application; Figure 2 is a flowchart of an NPU heterogeneous task scheduling method for assisted driving provided in one embodiment of this application; Figure 3 is a schematic structural diagram of an NPU heterogeneous task scheduling system for assisted driving provided in one embodiment of this application.
Detailed Implementation
[0026] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort are within the scope of protection of this application.
[0027] Furthermore, the terms "and / or" in this document are merely descriptions of the relationships between related objects, indicating that three relationships can exist. For example, "A and / or B" can represent: A existing alone, A and B existing simultaneously, or B existing alone. Additionally, the character " / " in this document, unless otherwise specified, generally indicates that the preceding and following related objects have an "or" relationship.
[0028] The embodiments of this application will now be described in further detail with reference to the accompanying drawings.
[0029] Existing heterogeneous task scheduling methods for vehicle-mounted NPUs generally rely on static or semi-static task mapping and resource allocation strategies when facing complex, ever-changing, and highly dynamic road scenarios. They lack the ability to perform integrated real-time collaborative perception and dynamic optimization of diverse scenarios, making it difficult for the system to achieve optimal matching of tasks and resources under scenario switching, computing power fluctuations, or sudden safety events. This not only restricts the overall computing power utilization efficiency and system energy efficiency, but may also cause response delays due to scheduling delays or improper decision-making, posing potential risks to driving safety.
[0030] Based on this, this application provides a heterogeneous task scheduling method and system for NPUs in assisted driving. First, using multimodal sensor data, the driving scenario type and risk level are identified in real time, and a heterogeneous computing task graph labeled with priorities and parameterization requirements is dynamically constructed. Second, task allocation is modeled as an optimization problem, generating an initial mapping and driving three-level adaptive scheduling: fine-grained load balancing and dedicated engine offloading among NPU cores; computing power collaboration and priority arbitration among heterogeneous units such as NPUs, GPUs, and CPUs based on scenario strategies; and monitoring scenario switching and hardware status at the system level to trigger dynamic migration of task subgraphs. Finally, performance indicators such as latency and power consumption after scheduling are continuously monitored. Through data comparison and incremental learning, the scheduling strategy parameters are optimized in a closed loop, forming an adaptive optimization result set which is then output to automotive R&D personnel. This solution achieves adaptive and precise matching of computing resources to dynamic driving scenarios, significantly improving the overall utilization and energy efficiency of heterogeneous computing platforms. Through three-level scheduling and closed-loop optimization, it ensures the real-time performance and reliability of task processing in complex scenarios, enhances the system's robustness in dealing with emergencies, and thus provides more efficient and safer core computing scheduling support for advanced driver assistance systems.
[0031] Figure 1 is a schematic diagram of an application scenario provided by this application. During assisted driving, the method provided in this application improves the overall computing power utilization efficiency and system energy efficiency, and reduces potential driving safety risks.
[0032] Specifically, the method of this application can be applied to any server that communicates with a vehicle-mounted multimodal sensor array; the server obtains the vehicle-mounted multimodal sensor dataset provided by that array. Specific implementation details can be found in the following embodiments.
[0033] Figure 2 is a flowchart of an NPU heterogeneous task scheduling method for assisted driving provided in one embodiment of this application. The method of this embodiment can be applied to the server in the above scenario. As shown in Figure 2, the method includes:
S201. Obtain the vehicle-mounted multimodal sensor dataset. Based on the vehicle-mounted multimodal sensor dataset, perform multidimensional dynamic recognition and heterogeneous task graph construction for assisted driving scenarios, and generate a dynamic driving scenario information set.
[0034] NPU (Neural Processing Unit) refers to an embedded neural network processor. A vehicle-mounted multimodal sensor dataset refers to a collection of raw data synchronously collected by sensors based on different physical principles (such as cameras, millimeter-wave radar, and lidar) on a vehicle, along with the vehicle's own state data. It is the most direct digital representation of the vehicle's environment and its own state, and the data originates from the vehicle-mounted multimodal sensor array. Multidimensional dynamic recognition is a process that, based on multimodal sensor data, analyzes and classifies driving scenarios in real time from multiple quantitative dimensions such as road structure, traffic participant behavior, and perception system confidence, and assesses their dynamic risk level. Heterogeneous task graph construction is the process of instantiating algorithm task nodes from a predefined library based on the identified specific driving scenarios and connecting them into a directed graph model according to data flow dependencies. The nodes in the graph contain differentiated processing parameters and scheduling attributes. A dynamic driving scenario information set refers to an information encapsulation used to comprehensively describe the current scheduling decision background, typically consisting of three parts: driving scenario labels, a heterogeneous driving task graph, and a snapshot of the heterogeneous computing resource state.
[0035] Specifically, advanced driver assistance systems (ADAS) rely on a continuous and accurate understanding of the driving environment and require the rational deployment of complex driving algorithm tasks on heterogeneous in-vehicle computing platforms. However, driving scenarios, such as urban congestion, highway cruising, and intersection crossings, are complex and ever-changing. Different scenarios have vastly different requirements for the types of algorithms, computational accuracy, and real-time performance at each stage of perception, prediction, and decision-making. In-vehicle computing resources, such as NPUs, GPUs (Graphics Processing Units), and CPUs (Central Processing Units), are limited and heterogeneous. Traditional scheduling methods often employ static or predefined task allocation strategies. This rigid approach fails to perceive dynamic changes in the scenario, leading to idle computing power and wasted energy in simple scenarios, while in complex scenarios, unreasonable allocation of computing power may cause task processing delays or even loss, posing safety hazards. This solution addresses this core challenge by acquiring and fusing raw data from multimodal sensors such as cameras and radar in real time. It performs multi-dimensional quantitative identification of the environment, including factors like road structure complexity and the uncertainty of traffic participants' intentions. Based on this, it dynamically constructs a heterogeneous task graph depicting the complete algorithm flow and its dependencies. Simultaneously, it collects the real-time status of each computing unit, ultimately encapsulating this data into a unified dynamic driving scenario information set. This step provides a precise, real-time, and structured panoramic view of the environment, task, and resources for subsequent scheduling, serving as a fundamental prerequisite for adaptive scheduling.
[0036] S202. Based on the dynamic driving scenario information set, perform dynamic matching of tasks and computing units, and drive the execution of three-level adaptive scheduling covering inter-core scheduling, cross-unit collaborative scheduling and dynamic task migration, and generate a dynamic scheduling decision information set.
[0037] Task-computing unit dynamic matching can be the process of allocating task nodes in a heterogeneous task graph to the most suitable heterogeneous computing units in real time through optimization algorithms to form an initial task deployment plan. Three-level adaptive scheduling can refer to a complete scheduling mechanism encompassing three levels: fine-grained inter-core scheduling, cross-computing unit collaborative scheduling, and dynamic task migration, ranging from micro to macro and from routine to emergency. The dynamic scheduling decision information set can be a set of immediately executable scheduling instructions generated in response to specific driving scenarios and resource states, including a task mapping table, scheduling instructions at each level, and migration contingency plans.
[0038] Specifically, after obtaining a precise description of the current driving task requirements and computing resource status, the key to improving the overall system performance and reliability lies in how to efficiently and rationally map hundreds of computing tasks to the most suitable heterogeneous computing units and complete the scheduling within milliseconds. Traditional scheduling schemes typically focus on only a single level, such as scheduling only between CPU cores or statically allocating tasks to different hardware, lacking coordination and contingency mechanisms at three levels: on-chip computing cores, heterogeneous units between chips, and system-level task flows. This results in the system being unable to cope with computing hotspots within the NPU, unable to leverage the synergistic advantages of the GPU and NPU, and lacking rapid recovery capabilities in the event of sudden scene changes or occasional hardware unit failures. This solution addresses this systemic scheduling challenge by first performing global task-unit dynamic matching based on an optimization model to generate an initial allocation scheme; then, driving the execution of three-level adaptive scheduling: fine-grained load balancing and hardware accelerator offloading between cores, coordinating computing power and arbitration priorities across units according to scene strategies, and monitoring and triggering dynamic migration of task subgraphs at the system level. These three levels of scheduling cover all dimensions of resource allocation needs, from micro to macro and from routine to emergency. This step is the core decision-making process that transforms scheduling strategies into executable instructions. Its necessity lies in its ability to achieve deep, flexible, and robust adaptation of computing resources to dynamic driving demands.
[0039] S203. Based on the dynamic scheduling decision information set, perform multi-dimensional monitoring of scheduling efficiency and data-driven closed-loop iteration of strategies to generate an adaptive optimization result set for assisted driving.
[0040] The adaptive optimization result set for assisted driving can refer to the optimization strategy knowledge base formed by the system through long-term monitoring of scheduling efficiency and data-driven learning. It can be directly called in subsequent similar scenarios to improve scheduling performance.
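As a non-limiting illustration, such a knowledge base can be sketched as a store keyed by scene type and a coarse risk bucket, so that a strategy saved in one scenario is found again in a similar one. Treating "similar" as "same scene type and same risk bucket" is an assumption made for this sketch, as are the class name, the bucket count, and the strategy fields.

```python
class StrategyStore:
    """Hypothetical store of optimized scheduling strategies."""

    def __init__(self, buckets=5):
        self.buckets = buckets
        self._store = {}

    def _key(self, scene_type, risk_level):
        # Risk level is assumed to lie in [0, 1]; clamp 1.0 into the top bucket.
        bucket = min(int(risk_level * self.buckets), self.buckets - 1)
        return (scene_type, bucket)

    def save(self, scene_type, risk_level, strategy):
        # Store a copy so later edits by the caller do not change the entry.
        self._store[self._key(scene_type, risk_level)] = dict(strategy)

    def lookup(self, scene_type, risk_level, default=None):
        return self._store.get(self._key(scene_type, risk_level), default)


store = StrategyStore()
store.save("urban", 0.32, {"w_latency": 0.61, "w_energy": 0.39})
similar = store.lookup("urban", 0.38)     # same scene type, same risk bucket
different = store.lookup("urban", 0.72)   # higher risk bucket, nothing stored
other_scene = store.lookup("highway", 0.32)
```

When a lookup misses, the scheduler would fall back to its default strategy parameters and start a new learning cycle for that scenario.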
[0041] Specifically, even with the aforementioned advanced scene awareness and dynamic scheduling capabilities, an initially set scheduling strategy parameter cannot remain optimal under all operating conditions. The actual operating environments of vehicles vary greatly, onboard algorithm software is constantly updated, and hardware performance may slowly change over its lifecycle. If the scheduling system lacks the ability to learn and evolve, its initial advantages will diminish over time and with environmental changes, potentially leading to problems such as poor resource utilization or delayed response. Most existing solutions lack this continuous online optimization loop, causing the system to remain at a performance level similar to its factory settings for extended periods. This solution addresses this static pain point by simultaneously monitoring actual performance indicators across multiple dimensions after scheduling execution, including task timeliness, system power consumption, and safety margin changes. These measured data are then compared and analyzed with performance expectations based on scene predictions. By continuously accumulating data on the scenarios, scheduling decisions, and resulting deviations, the system can utilize data-driven methods, such as incremental learning, to dynamically optimize weight parameters in the task matching model and adjust trigger thresholds and strategy logic at each level of scheduling. The resulting set of optimization results enables the system to learn from high-frequency or critical scenarios, directly invoking optimization strategies when encountering similar conditions in the future, thereby achieving continuous self-improvement in scheduling efficiency. This step is the intelligent brain that ensures the system maintains an optimal or near-optimal operating state and achieves continuous evolution.
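By way of non-limiting illustration, the closed loop described above can be sketched in Python as a running comparison between measured and expected latency for each scene type. The class name `DeviationTracker`, the tolerance, the step size, and the use of latency as the only monitored indicator are all assumptions made for this sketch; the embodiment does not prescribe a particular learning rule.

```python
from dataclasses import dataclass, field


@dataclass
class DeviationTracker:
    """Running mean of (measured - expected) latency per scene type (hypothetical helper)."""
    tolerance_ms: float = 2.0   # assumed tolerance before a weight is adjusted
    step: float = 0.05          # assumed adjustment step for the time weight
    counts: dict = field(default_factory=dict)
    means: dict = field(default_factory=dict)

    def record(self, scene: str, expected_ms: float, measured_ms: float) -> float:
        """Update the incremental mean deviation for a scene and return it."""
        n = self.counts.get(scene, 0) + 1
        mean = self.means.get(scene, 0.0)
        deviation = measured_ms - expected_ms
        mean += (deviation - mean) / n  # incremental mean, no history is stored
        self.counts[scene] = n
        self.means[scene] = mean
        return mean

    def adjust_time_weight(self, scene: str, weight: float) -> float:
        """Raise the execution-time weight when tasks in this scene run persistently late."""
        if self.means.get(scene, 0.0) > self.tolerance_ms:
            return min(1.0, weight + self.step)
        return weight


tracker = DeviationTracker()
for expected, measured in [(10.0, 12.0), (10.0, 14.0), (10.0, 13.0)]:
    tracker.record("urban", expected, measured)
new_weight = tracker.adjust_time_weight("urban", 0.5)
```

In this sketch the stored per-scene means play the role of a very small strategy knowledge base: when the same scene type recurs, the adjusted weight can be reused directly instead of being relearned.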
[0042] The approach provided in this embodiment first uses multimodal sensor data to identify driving scenario types and risk levels in real time, and dynamically constructs a heterogeneous computing task graph labeled with priorities and parameterization requirements. Second, task allocation is modeled as an optimization problem, generating an initial mapping and driving three-level adaptive scheduling: fine-grained load balancing and dedicated engine offloading among NPU cores; computing power collaboration and priority arbitration among heterogeneous units such as NPU, GPU, and CPU based on scenario strategies; and monitoring scenario switching and hardware status at the system level to trigger dynamic migration of task subgraphs. Finally, the system continuously monitors performance indicators such as latency and power consumption after scheduling, and through data comparison and incremental learning, optimizes scheduling strategy parameters in a closed loop, forming an adaptive optimization result set which is then output to automotive R&D personnel. This solution achieves adaptive and precise matching of computing resources to dynamic driving scenarios, significantly improving the overall utilization and energy efficiency of the heterogeneous computing platform; through three-level scheduling and closed-loop optimization, it ensures the real-time performance and reliability of task processing in complex scenarios, enhances the system's robustness in dealing with emergencies, and thus provides more efficient and safer core computing scheduling support for advanced driver assistance systems.
[0043] In some embodiments, the vehicle-mounted multimodal sensor dataset includes raw perception data from high-definition cameras, millimeter-wave radar, and lidar, as well as vehicle controller status data. Real-time pattern matching and decision tree inference are performed based on the vehicle-mounted multimodal sensor dataset to generate driving scene labels containing specific scene types and quantified safety level tags. Based on the driving scene labels, atomic task nodes for the entire process of the assisted driving algorithm are instantiated from a predefined task library, and a heterogeneous driving task graph is constructed according to data flow dependencies. Real-time load rate, available memory, and power consumption data of the heterogeneous computing unit cluster are collected to generate a snapshot of the heterogeneous computing resource status. The driving scene labels, the heterogeneous driving task graph, and the heterogeneous computing resource status snapshot are encapsulated to generate a dynamic driving scene information set.
[0044] Raw perception data can be low-level data streams directly collected by onboard physical sensors without any high-level semantic understanding or fusion processing. Examples include pixel arrays of RGB / BGR format images output from high-definition cameras, point lists containing distance, radial velocity, and azimuth information output from millimeter-wave radar, and three-dimensional spatial coordinates (x, y, z) and reflection intensity point clouds output from lidar. Vehicle controller status data can be data reflecting the vehicle's own dynamic state and chassis actuator state, obtained through the vehicle's internal network, such as the CAN bus (Controller Area Network). Examples include vehicle speed, yaw rate, steering wheel angle, accelerator pedal opening, brake pedal state, and gear information. Real-time pattern matching involves quickly comparing the features of the current perception data, such as lane line shapes in images or obstacle distribution in point clouds, with a pre-stored feature template library for similarity. Decision tree inference can be based on a series of pre-defined, hierarchical if-then rules. For example, if the target density is high and the vehicle speed is low, it is inferred to be a congested scenario. The matching results are then comprehensively judged to ultimately output a classification conclusion. Driving scenario tags can be composite tags that semantically categorize and quantify the driving environment in which the vehicle is currently located. A predefined task library can be a pre-created and stored repository of software components, containing encapsulated computational modules required to implement various assisted driving functions. The entire assisted driving algorithm process can be a collective term for a series of ordered computational steps from sensor data input to the generation of final vehicle control commands. 
Typical processes include: environmental perception, scenario understanding, decision planning, and control. An atomic task node can be a node representing an indivisible smallest computational unit or algorithm module in a heterogeneous driving task graph. It is the basic unit of task scheduling and execution, such as an inference task running a specific neural network model or a tracking task performing Kalman filtering. Data flow dependencies can be the constraints represented by directed edges connecting atomic task nodes in a heterogeneous driving task graph. They define the execution order and data transfer paths between tasks; that is, the output of one task is the input of one or more other tasks. A heterogeneous driving task graph can be a data structure that uses a directed graph model to formally describe the complete process of the assisted driving algorithm to be executed in a specific driving scenario. Nodes in the graph are atomic task nodes, and directed edges represent data flow dependencies. A heterogeneous computing resource state snapshot can be a data summary obtained by instantaneously collecting the overall operating status of an in-vehicle heterogeneous computing platform at a specific timestamp.
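By way of non-limiting illustration, a heterogeneous driving task graph of the kind defined above can be sketched in Python as a mapping from each atomic task node to the set of nodes whose output it consumes. The node names follow the typical processes listed above; the helper names `build_task_graph` and `execution_order` are hypothetical, and the standard-library `graphlib` module (Python 3.9 or later) is used only to show that the data flow dependencies determine a valid execution order.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def build_task_graph(node_names, edges):
    """Return {node: set of upstream nodes} for a directed task graph.

    Each edge (upstream, downstream) is a data flow dependency: the output
    of `upstream` is an input of `downstream`.
    """
    graph = {name: set() for name in node_names}
    for upstream, downstream in edges:
        graph[downstream].add(upstream)
    return graph


def execution_order(graph):
    """Topological order: every task appears after the tasks it depends on."""
    return list(TopologicalSorter(graph).static_order())


node_names = [
    "environmental_perception",
    "scenario_understanding",
    "decision_planning",
    "control",
]
edges = [
    ("environmental_perception", "scenario_understanding"),
    ("scenario_understanding", "decision_planning"),
    ("decision_planning", "control"),
]
graph = build_task_graph(node_names, edges)
order = execution_order(graph)
```

If the edges contained a cycle, `static_order()` would raise `graphlib.CycleError`, which is a convenient check that the constructed graph is a valid directed acyclic graph before it is handed to a scheduler.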
[0045] Specifically, traditional driver assistance systems typically rely on fixed, pre-compiled task pipelines for computational task scheduling, which cannot cope with dynamically changing driving scenarios, such as suddenly entering a congested urban area from a highway, or real-time fluctuations in hardware resources, such as a computing core throttling due to overheating. This rigid scheduling method leads to improper allocation of computing power in complex scenarios, either resulting in idle computing power wasting energy or insufficient computing power causing perception delays or even failures.
[0046] In the specific analysis process, the system executes the following steps at fixed intervals, such as every 100 milliseconds, to generate a dynamic driving scenario information set. First, the data fusion module receives and time-aligns raw frame data from the high-definition cameras, millimeter-wave radar, and lidar, together with controller status data from the vehicle's CAN bus, such as vehicle speed and steering angle, forming a synchronized onboard multimodal sensor dataset. Next, the scene understanding engine performs real-time pattern matching and decision tree inference on this dataset: image features such as traffic light recognition results, point cloud clustering features such as the number of obstacles, and vehicle states such as low-speed driving are input into a pre-trained decision tree model based on the CART (Classification and Regression Trees) algorithm. The model outputs two results. The first is a discrete specific scene type; for example, matching a pattern such as multiple pedestrians or frequent stop-and-go driving yields the urban assisted driving type. The second is a continuous quantified safety level label; for example, a risk value of 0.8 calculated from factors such as time to collision and distance. Then, the task graph management module uses the driving scenario label as an index to instantiate atomic task nodes from the predefined task library. For example, for the urban assisted driving type, it loads node instances such as high-precision semantic segmentation, dense target tracking, and conservative decision-making. These nodes are automatically connected according to the data flow dependencies defined in the algorithm manual; for example, the output port of the semantic segmentation node is connected to the input port of the target tracking node, thereby constructing the heterogeneous driving task graph. 
Simultaneously, the resource monitoring agent collects real-time load rates, available memory, and power consumption data of the heterogeneous computing unit cluster through operating system kernel modules, such as Linux's sysfs and performance counters, and packages them into a timestamped snapshot of the heterogeneous computing resource status. Finally, the wrapper packages the above three parts of information into a unified data structure, such as JSON or Protocol Buffers format, generates a dynamic driving scenario information set, and publishes it to the message bus for the scheduler to consume.
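By way of non-limiting illustration, the final encapsulation step can be sketched in Python using JSON, one of the formats named above. The rule thresholds, the `free_flow` label, the field names, and the sample resource figures are assumptions chosen for this sketch only; the single if-then rule stands in for the trained decision tree and is not the CART model itself.

```python
import json
import time


def classify_scene(target_density: float, speed_kmh: float) -> str:
    """Toy if-then rule: dense targets at low speed suggest congestion."""
    # Both thresholds are assumed values for illustration only.
    if target_density > 0.6 and speed_kmh < 20.0:
        return "congested"
    return "free_flow"


def build_info_set(scene_type, risk, task_graph, resources, timestamp=None):
    """Bundle scene label, task graph and resource snapshot into one JSON string."""
    info = {
        "scene_label": {"scene_type": scene_type, "risk": risk},
        "task_graph": task_graph,
        "resource_snapshot": {
            "timestamp": time.time() if timestamp is None else timestamp,
            "units": resources,
        },
    }
    return json.dumps(info, sort_keys=True)


scene = classify_scene(target_density=0.8, speed_kmh=12.0)
payload = build_info_set(
    scene_type=scene,
    risk=0.8,
    task_graph={"edges": [["semantic_segmentation", "target_tracking"]]},
    resources={"npu0": {"load": 0.42, "free_mem_mb": 512, "power_w": 6.5}},
    timestamp=1700000000.0,
)
decoded = json.loads(payload)  # what a consumer on the message bus would do
```

A text format such as JSON is easy to inspect during development; a binary format or shared memory would reduce encoding cost at the expense of readability.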
[0047] In alternative or modified implementations, the real-time pattern matching and decision tree inference can be replaced or enhanced with a scene classification model based on deep neural networks, such as ResNet or Transformer, or a hybrid inference method combining rule-based expert systems and learning models. The atomic task nodes in the predefined task library can be dynamically updated and loaded according to algorithm iterations, and their granularity can also be adjusted. For example, perception fusion can be treated as a large node, or further split into two finer-grained nodes: target detection and target attribute recognition. The collection frequency of the heterogeneous computing resource status snapshots can be dynamically adjusted according to system load, increasing the collection frequency when the load is high. The data interface can also use real-time communication middleware such as DDS for transmission. The encapsulation format can also use a custom binary format to improve encoding and decoding efficiency, or a shared memory pointer reference method can be used to avoid large data copying.
[0048] In some embodiments, based on image data from high-definition cameras, abrupt changes in lane line geometry, static obstacle contour density, and blind spot ratio are identified to quantify and generate road structure abrupt change features. Based on continuous frame point cloud data from millimeter-wave radar and lidar, the covariance of the motion trajectories of major traffic participants and the divergence of future trajectory packets are calculated to quantify and generate traffic intention uncertainty features. Based on vehicle controller state data, the consistency level of multi-source perception data is evaluated to quantify and generate perception confidence decay features. The road structure abrupt change features, traffic intention uncertainty features, and perception confidence decay features are mapped to generate specific scenario types, and a dynamic comprehensive risk level is calculated to jointly constitute a driving scenario label. Specific scenario types include urban assisted driving type, highway assisted driving type, and extreme assisted driving type.
[0049] Road structure abrupt change features can be numerical indicators generated by weighted fusion of sub-features such as the number of lane line geometric abrupt change points, static obstacle outline density, and blind spot ratio, based on high-definition camera images. These indicators comprehensively characterize the complexity and perception difficulty of the road's physical structure. Major traffic participants are dynamic obstacles that directly and significantly impact the vehicle's driving safety and decision-making in the current driving environment. These typically refer to vehicles, pedestrians, and non-motorized vehicles in front and to the sides, identified by clustering and tracking point cloud data from millimeter-wave radar and lidar. The covariance of a motion trajectory is a mathematical covariance matrix calculated by statistically analyzing a traffic participant's historical position sequence over a period of time. Its principal eigenvalues reflect the spatial dispersion of the trajectory points, quantifying the instability of historical motion. The divergence of future trajectory packets can be generated by using a trajectory prediction model to create multiple possible future trajectories based on the traffic participant's historical motion state. The statistical variance of these predicted trajectories at the endpoint or other key points is then calculated to quantify the uncertainty of future motion intentions or the degree of difference between multiple possibilities. Traffic intention uncertainty can be a single numerical feature generated by normalizing and fusing the covariance of the comprehensive motion trajectory and the divergence of the future trajectory package, such as through weighted summation, to quantify the unpredictability of traffic participant behavior. 
Perception confidence decay can be an indicator that quantifies the decline in the overall reliability and data consistency of the multi-sensor fusion system at the current moment. It is obtained by spatiotemporally aligning and associating the perception results (e.g., position, speed) reported for the same target by multiple sources such as cameras, millimeter-wave radar, and lidar, calculating the degree of difference between the reported values (e.g., the average Euclidean distance difference), and then calculating the proportion of matching pairs whose difference exceeds a consistency threshold. Specific scenario types can be high-level semantic classifications of the current driving environment mapped from multi-dimensional quantitative features such as road structure abrupt changes, traffic intention uncertainty, and perception confidence decay through pre-defined classification rules or machine learning models (e.g., decision trees, support vector machines). Driving scenario labels can be information pairs or structures composed of a specific scenario type and a numerically quantified dynamic comprehensive risk level, used to comprehensively and quantitatively describe the key attributes of the current driving environment.
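By way of non-limiting illustration, the two trajectory statistics defined above can be sketched in plain Python for positions in a two-dimensional plane. The function names are hypothetical, the sample covariance (division by n - 1) is an assumed convention, and the divergence is taken here as the sum of the x and y variances of the predicted endpoints, which is only one possible reading of the "statistical variance at the endpoint".

```python
import math


def covariance_2d(points):
    """Sample covariance (sxx, sxy, syy) of a list of (x, y) positions."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy


def principal_eigenvalue(sxx, sxy, syy):
    """Largest eigenvalue of the symmetric matrix [[sxx, sxy], [sxy, syy]]."""
    half_trace = (sxx + syy) / 2.0
    radius = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    return half_trace + radius


def endpoint_divergence(predicted_trajectories):
    """Sum of the x and y sample variances of the predicted trajectory endpoints."""
    endpoints = [trajectory[-1] for trajectory in predicted_trajectories]
    sxx, _, syy = covariance_2d(endpoints)
    return sxx + syy


# Historical positions of one traffic participant moving along the x axis.
history = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
dispersion = principal_eigenvalue(*covariance_2d(history))

# Three predicted futures: keep lane, drift left, drift right.
futures = [
    [(4.0, 0.0), (10.0, 0.0)],
    [(4.0, 0.5), (10.0, 2.0)],
    [(4.0, -0.5), (10.0, -2.0)],
]
divergence = endpoint_divergence(futures)
```

The two values can then be normalized and fused, for example by weighted summation, into the single traffic intention uncertainty feature.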
[0050] Specifically, traditional driver assistance systems typically rely on pre-set simple rules or simple classification of a single modality for scene recognition, which cannot accurately quantify the dynamic and multidimensional complexity within a scene. For example, even within the same urban scenario, the computational requirements of a congested intersection are drastically different from those of a smooth main road; whether the vehicle in front is following smoothly or swaying with unclear intentions also places vastly different demands on predictive computing power.
[0051] In the specific analysis process, the system executes three feature quantization pipelines in parallel. For road structure abrupt change features: First, based on high-definition camera images, a deep learning-based lane detection model, such as LaneNet, is used to extract pixel-level lane lines, and a parameterized representation is obtained through polynomial fitting or spline curves. Abrupt points in the lane line geometry are identified by calculating the rate of change of the first and second derivatives of the fitted curve. When the rate of change of curvature exceeds a preset threshold, such as 0.1, or a bifurcation point is detected, it is marked as an abrupt point, its number is counted, and its spatial distribution density is calculated. Simultaneously, an instance segmentation model, such as Mask R-CNN, is used to identify static obstacle contours. The ratio of the sum of all contour pixels to the total number of pixels in the effective perception area of the image is calculated to obtain the static obstacle contour density. Combining the vehicle CAD model and perception results, the blind spot ratio is calculated through geometric projection. Finally, the three sub-indicators—abrupt point density, obstacle contour density, and blind spot ratio—are normalized and then weighted and summed according to preset weights, such as 0.4, 0.4, and 0.2, to quantify and generate road structure abrupt change features, such as a comprehensive value of 0.75. For traffic intention uncertainty features: Based on continuous frame point clouds from millimeter-wave radar and lidar, historical trajectories of major traffic participants are obtained through multi-target tracking algorithms, such as JPDA (Joint Probabilistic Data Association) or deep learning-based tracking. 
The covariance of the trajectories of major traffic participants is calculated through the prediction step of an extended Kalman filter to obtain the error covariance matrix of target state estimation, and its trace is used as a measure of instantaneous uncertainty. Simultaneously, the historical trajectories are input into a multimodal trajectory prediction network to generate multiple probabilistic trajectories for the next few seconds. The divergence of future trajectory packets is calculated by calculating the average Hausdorff distance or entropy value of these trajectories at the endpoint or along the entire path. The covariance trace and trajectory packet divergence are normalized and then fused to quantify and generate traffic intention uncertainty features. For the perception confidence decay feature: Based on vehicle controller state data, such as timestamps and validity indicators of raw data from each sensor, the consistency level of multi-source perception data is evaluated. After spatiotemporal alignment, the measurements of the same target attributes, such as position and speed, from different sensors are compared, and statistical measures of their differences, such as Mahalanobis distance, are calculated. This is combined with the signal-to-noise ratio or quality score reported by each sensor. Finally, a time-related decay coefficient, such as the time decay since the last high consistency confirmation, is introduced to comprehensively quantify and generate the perception confidence decay feature. Ultimately, these three feature vectors are input into a pre-defined classifier based on support vector machines or lightweight neural networks to map and generate specific scene types. A weighted formula is then used to combine the three feature values and vehicle speed and other states to calculate a dynamic comprehensive risk level, such as a value between 0 and 1. Both of these factors together constitute the driving scene label.
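By way of non-limiting illustration, the weighted fusion of the road structure sub-indicators can be sketched in Python with the example weights 0.4, 0.4, and 0.2 given above. The function names and the sample inputs are assumptions; in particular, `count_abrupt_points` approximates the rate of change of curvature by the difference between consecutive, uniformly spaced curvature samples, which is a simplification of the derivative-based test described in the text.

```python
def clamp01(value: float) -> float:
    """Clip a value into the normalized range [0, 1]."""
    return max(0.0, min(1.0, value))


def road_structure_feature(abrupt_density, contour_density, blind_spot_ratio,
                           weights=(0.4, 0.4, 0.2)):
    """Weighted sum of three sub-indicators, each already normalized to [0, 1]."""
    parts = (abrupt_density, contour_density, blind_spot_ratio)
    return sum(w * clamp01(p) for w, p in zip(weights, parts))


def count_abrupt_points(curvatures, threshold=0.1):
    """Count samples whose curvature differs from the previous one by more than threshold."""
    return sum(
        1
        for previous, current in zip(curvatures, curvatures[1:])
        if abs(current - previous) > threshold
    )


# Assumed normalized sub-indicators for one camera frame.
feature = road_structure_feature(0.8, 0.9, 0.35)

# Assumed curvature samples along one fitted lane line.
abrupt_points = count_abrupt_points([0.0, 0.02, 0.30, 0.31, 0.05])
```

With these inputs the fused feature is approximately 0.75 (0.32 + 0.36 + 0.07), matching the example value in the text, and two abrupt points are counted.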
[0052] In alternative or modified implementations, the identification of lane line geometric abrupt changes can be achieved without relying on deep learning models, using traditional edge detection and Hough transform methods. Abrupt change detection can also employ curvature scale space-based methods. The calculation of static obstacle contour density can rely not only on vision but also on incorporating density clustering results from LiDAR point clouds to obtain more accurate contours. The divergence calculation of future trajectory packets can use KL divergence or kernel density estimation-based methods. Assessing the consistency level of multi-source perception data can involve introducing more complex association matching algorithms and fuzzy logic-based confidence fusion models. Preset classifiers for mapping and generating specific scene types can employ decision trees, random forests, or continuously updated classification models through online learning. The weights in the weighted formula for calculating the dynamic comprehensive risk level can be dynamically optimized using reinforcement learning based on historical accident data or driver intervention data.
[0053] In some embodiments, atomic task nodes adapted to the current scenario are dynamically selected and instantiated from a predefined task library using the specific scenario type as an index. These atomic task nodes include perception-type, prediction-type, and decision-type atomic task nodes. Based on the quantified value of the road structure abrupt change features, the processing accuracy and update frequency parameters of the perception-type atomic task nodes are dynamically adjusted, and the adjusted parameters serve as the perception fusion node data. Based on the quantified value of the traffic intention uncertainty features, multiple hypothesis calculation branches are set for the prediction-type and decision-type atomic task nodes, and the dependency edge weights between these two node types and the perception fusion node data are enhanced. The dynamic comprehensive risk level is mapped to a global task execution urgency coefficient, which is used to uniformly label the heterogeneous scheduling priority of all atomic task nodes. The atomic task nodes, together with their perception fusion node data, multiple hypothesis calculation branches, and heterogeneous scheduling priorities, are connected according to the data flow dependencies of the algorithm flow to generate the heterogeneous driving task graph.
[0054] Perception-type atomic task nodes can be independently scheduled and managed computing units that perform basic functions such as environmental perception and data acquisition. Examples include image-based target detection task nodes, LiDAR-based point cloud segmentation task nodes, or multi-sensor data fusion task nodes. Prediction-type atomic task nodes can be computing units that infer the future motion state or intention of traffic participants based on historical and current perception data. Examples include trajectory prediction nodes, behavioral intention prediction nodes, or risk field prediction nodes. Decision-type atomic task nodes can be computing units that integrate perception and prediction information to generate vehicle motion control commands or behavioral strategies. Examples include behavior planning nodes, local path planning nodes, or emergency braking decision nodes. Perception fusion node data can be a set of runtime parameters dynamically configured and attached to perception-type atomic task nodes after instantiation, based on scene characteristics such as road structure abrupt changes. This data is mainly used to characterize the quality and timeliness requirements of the node when performing tasks, such as the image resolution used for processing, the accuracy of the neural network model, and the cycle in which the task is triggered. Multi-hypothesis computation branches can be computational paths set up in parallel within prediction or decision-making atomic task nodes, based on different initial conditions or logical assumptions, to cope with future uncertainties. For example, within a trajectory prediction node, copies of prediction algorithms under three different assumptions—maintaining the current lane, performing a left lane change, and performing a right lane change—can run simultaneously. 
Dependency edge weights can be quantified priority values labeled on directed edges connecting two atomic task nodes in the heterogeneous driving task graph, used to characterize the downstream node's time sensitivity and dependence on the upstream node's output data. The global task execution urgency coefficient can be an adjustment coefficient obtained by mapping the dynamic comprehensive risk level scalar value through a preset amplification function, such as linear scaling or exponential scaling, used to uniformly increase or decrease the urgency of all tasks in the entire task graph. Heterogeneous scheduling priority can be a unique ranking score assigned to each atomic task node in the heterogeneous driving task graph, used for arbitration during resource contention.
[0055] Specifically, traditional driver assistance systems employ static, fixed algorithm flows, which cannot flexibly adjust their computational logic and resource requirements based on the fine-grained characteristics of real-time scenarios. For example, in complex intersections with dense obstacles, using low-frequency, low-precision perception models may lead to missed target detections; when the intentions of the vehicle ahead are unclear, if the prediction module only performs single-path prediction, the decision-making risk is extremely high.
[0056] In the specific analysis process, the system first uses the generated specific scene type as the primary key to query a predefined task library. This library is a structured list stored in the vehicle database or configuration file, defining the types of atomic task nodes to be activated under different scene types, default parameters, and connection templates. For example, for the urban assisted driving type, the library defines nodes that need to be instantiated, such as high-resolution semantic segmentation, dense target tracking and attribute recognition, multimodal trajectory prediction, and conservative behavior decision-making. The system then creates instances of these nodes in memory accordingly. Next, based on the quantization value of the road structure abrupt change features, the system dynamically adjusts the processing accuracy parameters and update frequency parameters of the perception-type atomic task nodes: for example, when the feature value is 0.8, the system switches the model of the high-resolution semantic segmentation node from the lightweight MobileNet-V3 to the high-precision ResNet-50 and increases its processing frequency from 10Hz to 20Hz. The adjusted node and its output constitute the perception fusion node data. Then, based on the quantified value of the uncertainty feature of traffic intention, multiple hypothesis calculation branches are set for prediction-type atomic task nodes and decision-type atomic task nodes, and the dependency edge weights between the two and the perception fusion node data are enhanced. For example, when the feature value is 0.7, the system configures three parallel calculation branches for the multimodal trajectory prediction node, corresponding to three intention hypotheses: target vehicle keeping lane, changing lanes to the left, and decelerating and stopping. 
At the same time, the weights of the dependency edges that link this prediction node and the conservative behavior decision node to the upstream dense target tracking node are increased from the default value of 1.0 to 1.5, so that high-quality perception data is given priority. After that, the dynamic comprehensive risk level is mapped to the global task execution urgency coefficient: a risk level such as 0.85 is mapped to a coefficient such as 1.3 through a preset Sigmoid function. The heterogeneous scheduling priority of all atomic task nodes is then labeled uniformly: each node has a type-based baseline priority, such as 90 for a perception node, which is multiplied by the global urgency coefficient, such as 1.3, and rounded to the nearest integer to obtain the final priority, such as 117. Finally, based on the data flow dependencies defined in the algorithm white paper, such as semantic segmentation output -> target tracking input, target tracking output -> trajectory prediction input, and trajectory prediction output -> behavior decision input, the system uses a graph library, such as the Boost Graph Library, to connect these parameterized nodes and generate the heterogeneous driving task graph.
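By way of non-limiting illustration, the priority labeling above can be sketched in Python. The text specifies only that a preset Sigmoid function is used; the particular scaling below (output range 0.5 to 1.5, steepness 4, midpoint 0.5) is an assumption chosen so that a risk level of 0.85 maps to roughly 1.3, as in the example, and the function names are hypothetical.

```python
import math


def urgency_coefficient(risk, low=0.5, high=1.5, steepness=4.0, midpoint=0.5):
    """Map a risk level in [0, 1] to an urgency coefficient via a scaled sigmoid.

    The range, steepness and midpoint are assumed tuning parameters.
    """
    s = 1.0 / (1.0 + math.exp(-steepness * (risk - midpoint)))
    return low + (high - low) * s


def final_priority(baseline, coefficient):
    """Scale a type-based baseline priority and round to the nearest integer."""
    return round(baseline * coefficient)


coefficient = urgency_coefficient(0.85)      # about 1.30
priority = final_priority(90, coefficient)   # perception node, baseline 90
```

With these assumed parameters the coefficient is about 1.302, and the baseline priority of 90 becomes 117 after rounding, which reproduces the worked numbers in the text.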
[0057] In alternative or modified implementations, the predefined task library can be updated remotely through a version-managed configuration center to support algorithm upgrades via OTA (Over-The-Air) technology. Strategies for adjusting perception node parameters are not limited to model switching and frequency adjustment; they can also include dynamically adjusting image regions of interest, point cloud downsampling rates, or radar detection confidence thresholds. The number and type of multi-hypothesis computation branches can be planned dynamically, either continuously or discretely, based on the uncertainty features; for example, clustering algorithms can be used to automatically generate the most representative set of intent hypotheses. The dynamic enhancement of dependency edge weights can be learned by a graph attention network rather than set by fixed rules. The mapping function for the global task execution urgency coefficient can be implemented by a small neural network trained on historical intervention data. The synthesis of heterogeneous scheduling priorities can incorporate other dimensions, such as estimated node computational complexity and data freshness requirements, for multi-objective optimization.
[0058] In some embodiments, based on heterogeneous driving task graphs and snapshots of heterogeneous computing resource states, the task allocation problem is modeled as a multi-objective optimization game. A fast convergence algorithm is used to solve for the optimal matching scheme, generating a task-unit matching mapping table. Based on this mapping table, load-aware fine-grained task partitioning and resource allocation are performed among several computing cores within the NPU, generating fine-grained inter-core scheduling instructions. Simultaneously, scenario-demand-oriented computing power collaboration and priority arbitration are performed between heterogeneous computing unit clusters, generating cross-computing unit collaborative scheduling instructions. Scenario switching events and the health status of the heterogeneous computing unit clusters are continuously monitored. If a scenario switch or abnormal health status is detected, a task subgraph redeployment plan is triggered, generating dynamic task migration guidelines. The fine-grained inter-core scheduling instructions, cross-computing unit collaborative scheduling instructions, and dynamic task migration guidelines are encapsulated to generate a dynamic scheduling decision information set.
[0059] The multi-objective optimization game can be an abstraction of the complex problem of mapping a heterogeneous task graph onto a heterogeneous hardware cluster: a mathematical model with multiple conflicting objectives in which the task nodes to be assigned and the available computing units both compete and cooperate. Fast convergence algorithms can be intelligent optimization algorithms with efficient search capabilities that are applied to solve the above multi-objective optimization game model, such as improved genetic algorithms or particle swarm optimization algorithms. The task-unit matching mapping table can be the data structure generated after solving the multi-objective optimization game with a fast convergence algorithm; it records which specific heterogeneous computing unit each atomic task node is assigned to for execution. Fine-grained inter-core scheduling instructions can be a set of commands, based on the task-unit matching mapping table, that perform micro-operations such as task splitting, load migration, or calling dedicated engines among the multiple computing cores within an NPU chip; their core purpose is fine-grained management of computing resources within a single NPU chip. A heterogeneous computing unit cluster can be a hardware collection composed of computing units with different architectures and computing power characteristics, used for collaborative processing of various computing tasks in an assisted driving system. Cross-computing unit collaborative scheduling instructions can be a set of macro-coordination commands, based on the task-unit matching mapping table, used for task distribution, pipeline orchestration, or computing power priority arbitration between different types of computing units, aiming to achieve global collaboration between heterogeneous hardware clusters. 
Dynamic task migration guidance can be a plan and execution command generated by the system when a driving scenario label switch or an abnormal hardware unit health status is detected, guiding the migration of a group of associated atomic task nodes from the original computing unit to a new computing unit.
[0060] Specifically, traditional task scheduling typically employs static binding or simple polling strategies, failing to perform dynamic and global optimization matching based on real-time changes in the heterogeneous driving task graph and snapshots of heterogeneous computing resource states. This leads to severe latency caused by computing power contention in high-load scenarios, while computing power remains idle and energy is wasted in low-load scenarios.
[0061] In the specific analysis process, the scheduling decision engine periodically executes the following process, such as every 100ms or when the scene information set is updated: First, the task allocation problem is modeled as a multi-objective optimization game. The system estimates the expected execution time, power consumption, and memory usage of each atomic task node (M in total) on each candidate computing unit (N in total, including different cores of the NPU, GPU, CPU, etc.), forming an MxN dimensional cost matrix. The optimization objective is defined as a weighted minimization of the total execution time, total power consumption, and the standard deviation of the load on each computing unit. This problem is modeled as a constrained allocation game. Next, the optimal matching scheme is solved using a fast convergence algorithm, such as using an improved Hungarian algorithm to handle this multi-objective problem. By scalarization, such as assigning weights to each objective, the multi-objective problem is transformed into a single objective before solving, and a task-unit matching mapping table is quickly output. Then, based on this mapping table, three levels of instructions are generated in parallel: 1. Generate fine-grained inter-core scheduling instructions: For task nodes mapped to the NPU, the scheduler uses the NPU driver interface to segment the computation graph corresponding to the node and load it onto the specified computation core, configures the data transfer path for shared memory between cores, and sets synchronization barriers. 2. Generate cross-computing unit collaborative scheduling instructions: The scheduler sends task start instructions to units such as GPU and CPU through the operating system or middleware, configures data DMA transfer channels across PCIe or internal buses, and sets global task priorities according to the scenario type. 3. Generate dynamic task migration guidance: The health monitoring module runs continuously. 
If it detects a scenario type change or a CPU core temperature exceeding a preset threshold, such as 85℃ (this threshold is preset during system initialization based on chip specifications and heat dissipation solutions), the contingency plan generation process is immediately triggered. The contingency plan generator will generate detailed migration scripts for the task subgraphs to be migrated based on the new matching table or healthy unit list, including checkpoint saving instructions on the source unit, data migration paths, and state recovery instructions on the target unit, as guidance for future use.
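The scalarization step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed solver: the cost figures, the weights, the task and unit names, and the helper names `scalarized_cost` and `greedy_match` are all hypothetical, and a simple greedy heuristic stands in for the improved Hungarian algorithm because the load-deviation term couples the assignments and the number of tasks need not equal the number of units.

```python
from statistics import pstdev

# Hypothetical estimates per task and candidate unit: (time_ms, power_w).
# The numbers are illustrative only and do not come from the source.
COST = {
    "bev_modeling": {"npu": (15.0, 6.0), "gpu": (28.0, 20.0), "cpu": (120.0, 12.0)},
    "traj_predict": {"npu": (22.0, 5.0), "gpu": (12.0, 18.0), "cpu": (60.0, 10.0)},
    "vehicle_ctrl": {"npu": (9.0, 4.0), "gpu": (8.0, 15.0), "cpu": (2.0, 3.0)},
}
UNITS = ["npu", "gpu", "cpu"]


def scalarized_cost(assignment, w_time=1.0, w_power=0.5, w_balance=1.0):
    """Weighted sum of total time, total power, and load standard deviation."""
    loads = {unit: 0.0 for unit in UNITS}
    total_time = 0.0
    total_power = 0.0
    for task, unit in assignment.items():
        time_ms, power_w = COST[task][unit]
        loads[unit] += time_ms
        total_time += time_ms
        total_power += power_w
    balance = pstdev(list(loads.values()))
    return w_time * total_time + w_power * total_power + w_balance * balance


def greedy_match():
    """Place the heaviest tasks first, each on the unit with the lowest cost."""
    assignment = {}
    order = sorted(COST, key=lambda t: -min(c[0] for c in COST[t].values()))
    for task in order:
        best = min(UNITS, key=lambda u: scalarized_cost({**assignment, task: u}))
        assignment[task] = best
    return assignment


mapping_table = greedy_match()
```

With these illustrative numbers the resulting mapping table places `bev_modeling` on the NPU, `traj_predict` on the GPU, and `vehicle_ctrl` on the CPU. Changing the three weights shifts the trade-off between latency, power, and load balance.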
[0062] In alternative or modified implementations, the multi-objective optimization game model can employ different models such as cooperative game, non-cooperative game, or Stackelberg game. The fast convergence algorithm can be replaced by a combination of auction algorithm, greedy algorithm, and local search, or a policy network trained with deep reinforcement learning can be used to directly output matching decisions. The generation of fine-grained inter-core scheduling instructions can be more dynamic, for example, supporting task-stealing strategies that can be fine-tuned at runtime based on the actual load of the cores. The generation of cross-computing unit collaborative scheduling instructions can be based on a more complex policy rule base or a lightweight scheduling policy network. The preset threshold used to trigger migration can employ an adaptive threshold algorithm, dynamically adjusted according to historical running conditions. The generation of dynamic task migration guidelines can support more advanced fault-tolerance mechanisms such as incremental migration or hot migration.
[0063] In some embodiments, the NPU computing cores are dynamically and logically grouped according to the computing power requirements of driving scenario labels. For unstructured urban scenarios, multiple cores are grouped into collaborative computing units to process large-scale fusion models. For structured high-speed scenarios, the core groups are decoupled and some cores are shut down to reduce power consumption. The utilization rate of each computing core and the communication latency between adjacent cores are monitored in real time through the on-chip network. When the load deviation is detected to exceed a preset threshold, some computing subgraphs on high-load cores are migrated to low-load cores, and load balancing migration instructions are generated and executed. Atomic task nodes marked with fixed computing patterns and high data throughput in the heterogeneous driving task graph are identified, unloaded from the general computing cores, and scheduled to be executed by the dedicated hardware acceleration engine integrated in the NPU, generating and executing dedicated engine offloading instructions.
[0064] Dynamically grouped NPU computing cores can be dynamically divided into different logical computing units at the software level based on the different computing power requirements of the upper-layer driving scenario labels. Alternatively, they can be decoupled into independent cores and some idle cores can be shut down to flexibly adapt to the computing power and power consumption requirements of the scenario. Unstructured urban scenarios can be driving scenarios with complex traffic environments, lack of clear rule constraints, and highly uncertain behavior of traffic participants. Their characteristics include, but are not limited to, dense and irregular static obstacles, frequent traffic flow weaving, and highly dynamic intentions of traffic participants. Structured high-speed scenarios can be driving scenarios with relatively simple traffic environments, clear rules, and highly predictable vehicle behavior. Their characteristics include clear lane lines, long straight or large-radius curvature roads, and traffic participants driving in the same direction at a constant speed with a single intention. The on-chip network can be a high-speed interconnect communication architecture integrated inside the NPU chip to connect various computing cores, storage units, and dedicated acceleration engines. It allows direct, low-latency data exchange between cores and can be monitored to obtain real-time communication load and latency data between computing cores. Load balancing migration instructions refer to scheduling commands generated by the scheduler when the on-chip network monitoring system detects a significant imbalance in the load of computing cores within the NPU. These commands migrate a specific computational subgraph and its associated context currently executing on a high-load core directly to a low-load core for continued execution. 
Fixed computational patterns refer to computational tasks whose algorithmic operations are highly regular and predictable, and whose computation process can be efficiently executed in a pipelined manner by highly optimized dedicated circuits at the hardware level. High data throughput refers to the characteristic of atomic task nodes exchanging massive amounts of data between their input / output ports and memory or other computing units during execution. Dedicated engine offloading instructions refer to instructions generated by the scheduler after identifying atomic task nodes in a heterogeneous driving task graph suitable for execution by a dedicated hardware acceleration engine. These instructions unload the node from the task queue of a general-purpose computing core and reschedule it to the corresponding dedicated engine for execution.
[0065] Specifically, traditional NPU scheduling typically treats computing cores as fixed, independent resources, either binding all cores together to process large models or allowing them to run small tasks completely independently. This is ill-suited for the dynamic switching between complex urban and simple highway driving scenarios.
[0066] In the specific analysis process, the system activates the NPU internal scheduler based on the received driving scenario label. First, according to the computing power requirements of the driving scenario label, the NPU computing cores are dynamically and logically grouped: if the label is for urban assisted driving, the scheduler binds, for example, 6 out of 8 physical computing cores into a collaborative computing unit by configuring the NPU's inter-core interconnect registers. This unit shares a unified instruction dispatch and data cache and is used to process large fusion models such as BEV (Bird's Eye View) spatial modeling; the remaining 2 cores are used as independent units to handle lightweight tasks. If the label is for highway assisted driving, the scheduler decouples the previously bound core group, restores the 8 independent cores, and, based on a task load assessment, shuts down the idle cores (for example, 3 of them) using power gating technology to reduce power consumption. Second, the utilization rate of each computing core and the communication latency between adjacent cores are monitored in real time through the on-chip network: a lightweight monitoring agent periodically (for example, every 1ms) collects data from the performance counters of each computing core and the on-chip network router. When a load deviation exceeding a preset threshold is detected (e.g., core A utilization is 90% while core B utilization is only 30%), the scheduler analyzes the task graph, identifies divisible computational subgraphs on the high-load cores (e.g., a large convolutional layer can be divided into multiple channel groups), migrates some of these computational subgraphs from the high-load cores to the low-load cores, updates the data flow path of the task graph, and generates and executes load balancing migration instructions.
Finally, it identifies atomic task nodes in the heterogeneous driving task graph marked with fixed computational patterns and high data throughput: the scheduler parses task node attributes, identifying nodes with fixed patterns and high computational density, such as standard convolution computation and self-attention matrix multiplication. These nodes are offloaded from the general-purpose computational cores and scheduled to be executed by the dedicated hardware acceleration engine integrated within the NPU: by calling specific APIs (Application Programming Interfaces) provided by the NPU driver, the computational operator descriptors and input data pointers of such nodes are directly submitted to the corresponding dedicated engine, such as a Tensor Core / matrix multiplication accelerator, generating and executing dedicated engine offloading instructions. After completion, the dedicated engine writes the results back to shared memory.
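The grouping and load-balancing steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the scenario label strings, the 0.4 deviation threshold, the subgraph names, and the function names `group_cores` and `plan_rebalance` are hypothetical, and a real implementation would act through the NPU driver rather than return plain dictionaries.

```python
def group_cores(label, n_cores=8):
    """Logical core grouping per scenario label, using the example counts above."""
    cores = list(range(n_cores))
    if label == "urban":
        # 6 cores bound into one collaborative unit, 2 left independent.
        return {"bound": cores[:6], "independent": cores[6:], "gated": []}
    if label == "highway":
        # All cores decoupled; 3 idle cores are power gated.
        return {"bound": [], "independent": cores[:5], "gated": cores[5:]}
    raise ValueError(f"unknown scenario label: {label}")


def plan_rebalance(utilization, subgraphs, threshold=0.4):
    """Return a migration plan when the utilization gap exceeds the threshold.

    utilization: core name -> utilization in [0, 1]
    subgraphs:   core name -> list of (subgraph name, utilization share)
    """
    hot = max(utilization, key=utilization.get)
    cold = min(utilization, key=utilization.get)
    gap = utilization[hot] - utilization[cold]
    if gap <= threshold:
        return None
    # Move the subgraph whose share is closest to half the gap.
    name, _share = min(subgraphs[hot], key=lambda s: abs(s[1] - gap / 2))
    return {"subgraph": name, "from": hot, "to": cold}


plan = plan_rebalance(
    {"A": 0.90, "B": 0.30},
    {"A": [("conv_ch0_31", 0.25), ("conv_ch32_63", 0.25), ("attn", 0.40)], "B": []},
)
```

For the 90% versus 30% example, the plan moves one channel group of the convolution from core A to core B. Choosing the subgraph closest to half the gap is one possible selection rule; the source does not prescribe one.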
[0067] In alternative or modified implementations, dynamic grouping strategies are not limited to binary division; more flexible cluster partitioning can be employed, such as dividing the core into 2-3 clusters of different sizes to handle moderately complex scenarios. Load balancing monitoring is not limited to utilization but can also incorporate metrics such as cache hit rate and memory bandwidth usage. Computational subgraph migration can avoid static partitioning and instead utilize dynamic task stealing techniques based on runtime profiling. Identifying tasks that can be offloaded to a dedicated hardware acceleration engine can be achieved through a pre-compiled database containing operator-engine mappings, loaded during NPU driver initialization. For operators without a dedicated engine, attempts can be made to convert them into equivalent operator sequences that can be executed by existing engines.
[0068] In some embodiments, when the driving scenario label is urban assisted driving type, a first collaborative strategy instruction is generated: the image semantic segmentation and BEV spatial modeling task scheduling instructions are directed to the NPU computing core group, the multi-target trajectory prediction task scheduling instructions are directed to the GPU, the vehicle lateral and longitudinal control task scheduling instructions are directed to the CPU, and the ISP (Image Signal Processor) is configured to enter the enhanced image preprocessing mode; when the driving scenario label is high-speed assisted driving type, a second collaborative strategy instruction is generated: the target tracking task is decomposed into feature extraction and data association, and its scheduling instructions are directed to the NPU and GPU respectively for collaborative pipeline processing, the path trajectory smoothing task scheduling instruction is directed to the CPU, and an energy-saving control instruction to reduce the working frequency of the NPU non-core computing units is generated; when the driving scenario label is extreme assisted driving type, a third collaborative strategy instruction is generated: the highest priority preemption instruction is broadcast to all computing units, all non-safety-critical tasks are suspended, and the obstacle emergency recognition and braking decision tasks are ensured to obtain exclusive computing resources until the safety level is reduced.
[0069] Image semantic segmentation is a computer vision task that classifies each pixel in an image captured by an onboard camera and assigns it to a specific semantic category. It is typically performed by a deep learning model and is fundamental to environmental perception. The BEV space modeling task scheduling instruction can be a hardware scheduling command generated based on the first cooperative strategy instruction, used to schedule the atomic task node of BEV space modeling to the NPU computing core group for execution. This task is responsible for transforming and fusing images from multiple cameras into a unified bird's-eye view feature map through a neural network. The multi-object trajectory prediction task scheduling instruction can be a hardware scheduling command generated based on the first cooperative strategy instruction, used to schedule the atomic task node of multi-object trajectory prediction to the GPU for execution. This task predicts the possible future movement paths of multiple traffic participants in parallel based on historical trajectories and current states. The GPU can be a graphics processing unit, a heterogeneous computing unit adept at large-scale parallel floating-point operations. The vehicle lateral and longitudinal control task scheduling instruction can be a hardware scheduling command generated based on the first cooperative strategy instruction, used to schedule the atomic task node of vehicle lateral and longitudinal control to the CPU for execution. This task calculates specific throttle, braking, and steering control quantities based on the output of the decision module. The CPU can be a central processing unit, a heterogeneous computing unit adept at complex logical judgments, task scheduling, and serial computation. 
The ISP can be an image signal processor, a hardware unit specifically designed to process the raw signals output from image sensors, performing preprocessing operations such as de-mosaicing, noise reduction, and color correction. Enhanced image preprocessing modes can be a higher-performance ISP operating state dynamically activated by scheduling instructions under complex lighting conditions, such as in urban scenes. This could involve enabling stronger multi-frame noise reduction algorithms and wider dynamic range processing to improve the robustness of subsequent perception algorithms. Cooperative pipeline processing, in high-speed assisted driving scenarios, can decouple the target tracking task in the time dimension into two stages—feature extraction and data association—based on a second cooperative strategy instruction, and arrange for their asynchronous parallel execution on the NPU and GPU, thus forming a scheduling mechanism for an efficient data processing pipeline. Path smoothing tasks, in high-speed assisted driving scenarios, can be computational tasks executed by the CPU, performing Kalman filtering and interpolation on the initial path points or control instructions generated by the planning module to eliminate jitter and ensure smooth vehicle operation. Energy-saving control commands can be hardware control commands generated based on the second cooperative strategy command under high-speed assisted driving type, used to reduce the operating voltage or clock frequency of non-core computing units in heterogeneous computing unit clusters, such as some idle computing cores within the NPU, to achieve system-level power consumption optimization. The highest priority preemption command can be a highest-level interrupt signal generated based on the third cooperative strategy command under extreme assisted driving type, sent to all computing unit schedulers. 
This signal forcibly interrupts the current task flow and immediately suspends all non-safety-critical tasks according to a preset priority label. Non-safety-critical tasks can be task nodes marked in the heterogeneous driving task graph as suspendable or deferrable in emergency situations without directly affecting the instantaneous safety status of the vehicle, such as in-vehicle infotainment, navigation interface rendering, or some background log recording tasks.
[0070] Specifically, traditional heterogeneous computing scheduling often employs fixed hardware task binding and fails to dynamically adjust the collaboration modes and energy efficiency strategies of different hardware based on scenario requirements. For instance, in complex urban scenarios, allocating intensive image segmentation tasks to GPUs instead of the more efficient NPUs increases processing latency; in stable scenarios such as high-speed cruising, running all hardware at full speed wastes power; and in emergency collision avoidance scenarios, if critical braking tasks must queue and wait for non-critical tasks to release computing resources, the resulting delay may lead to an accident.
[0071] In the specific analysis process, the cross-unit cooperative scheduler queries and executes the corresponding predefined cooperative strategy based on the specific scenario type field in the driving scenario label. These strategies are stored in a preset mapping relationship library, which is a golden configuration template manually optimized for each scenario during the system design phase through extensive offline benchmark testing and expert analysis. When the label is of the urban assisted driving type, the scheduler generates the first cooperative strategy instruction: specifically, the scheduling instructions for the image semantic segmentation and BEV spatial modeling tasks are directed to the internally dynamically grouped NPU computing cores; the scheduling instruction for the multi-object trajectory prediction task is directed to the GPU, using its massively parallel architecture to process multiple hypothetical trajectories; and the scheduling instructions for the vehicle lateral and longitudinal control tasks are directed to the CPU's real-time cores, using their low latency and strong logic processing capabilities. At the same time, a register configuration command is sent to the ISP chip via the I2C (Inter-Integrated Circuit) bus to switch it to the enhanced image preprocessing mode to cope with complex lighting and occlusion in urban environments. When the label is of the high-speed assisted driving type, the scheduler generates the second cooperative strategy instruction: specifically, it decomposes the target tracking task, directing the scheduling instruction for the feature extraction sub-stage to the NPU's dedicated engine and the scheduling instruction for the data association sub-stage to the GPU, with the two forming a cooperative processing pipeline through shared memory; the path trajectory smoothing task scheduling instruction is directed to the CPU.
Simultaneously, an energy-saving control instruction is generated, reducing the operating frequency of the NPU's internal non-core computing units through the NPU's power management unit interface. When the label is of the extreme assisted driving type, the scheduler generates the third cooperative strategy instruction: specifically, it broadcasts a highest-priority preemption instruction to all computing units through the API of a real-time operating system, such as AUTOSAR OS; this instruction forcibly suspends all non-safety-critical tasks, such as map rendering and voice assistants, and allocates exclusive computing resources to the obstacle emergency recognition and braking decision tasks, for example by locking a CPU core and reserving a specific GPU computing unit, until the risk is eliminated.
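The preset mapping relationship library described above can be pictured as a lookup table keyed by scenario type. The sketch below is illustrative only: the key names, the unit identifiers such as `locked_cpu_core`, and the tuple-based instruction format are assumptions made for the example and are not defined in the source.

```python
# Hypothetical strategy library; one entry per scenario type.
STRATEGIES = {
    "urban": {
        "routes": {
            "semantic_segmentation": "npu_core_group",
            "bev_modeling": "npu_core_group",
            "trajectory_prediction": "gpu",
            "lateral_longitudinal_control": "cpu",
        },
        "isp_mode": "enhanced",
    },
    "highway": {
        "routes": {
            "tracking_feature_extraction": "npu",
            "tracking_data_association": "gpu",
            "path_smoothing": "cpu",
        },
        "npu_non_core_freq": "reduced",
    },
    "extreme": {
        "routes": {
            "obstacle_emergency_recognition": "reserved_gpu_unit",
            "braking_decision": "locked_cpu_core",
        },
        "preempt": True,
    },
}


def build_instructions(label, running_tasks):
    """Expand a strategy into an ordered instruction list.

    running_tasks maps task name -> True if the task is safety-critical.
    """
    strategy = STRATEGIES[label]
    instructions = []
    if strategy.get("preempt", False):
        # Suspend every non-safety-critical task before dispatching.
        instructions += [
            ("suspend", name)
            for name, critical in running_tasks.items()
            if not critical
        ]
    instructions += [
        ("dispatch", task, unit) for task, unit in strategy["routes"].items()
    ]
    for key in ("isp_mode", "npu_non_core_freq"):
        if key in strategy:
            instructions.append((key, strategy[key]))
    return instructions


extreme = build_instructions(
    "extreme", {"map_rendering": False, "braking_decision": True}
)
```

In the extreme case the suspend instructions come first, so non-critical work is halted before the emergency tasks are dispatched to their reserved units.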
[0072] In alternative or modified implementations, the preset mapping database may not be static, but can be dynamically fine-tuned based on actual execution performance through online learning. The collaborative strategies are not limited to three types; sub-strategies can be defined for more granular scenarios. The granularity of collaborative pipeline processing can be further refined, decomposing a task into more stages and expanding across more types of computing units. Energy-saving control commands are not limited to reducing frequency; dynamic voltage and frequency adjustments or shutdown of some computing units can also be used. For resource preemption mechanisms in extreme scenarios, in addition to software commands, a hardware-based watchdog circuit can be designed to directly generate a hardware interrupt to preempt resources upon detecting an emergency signal.
[0073] In some embodiments, when the scene recognition module detects a change in the specific scene type in the driving scene label, a scene migration process is triggered: based on the new and old scene types, the task allocation game is re-solved to generate a new matching mapping table, and the task subgraph of the scene that is about to fail is smoothly migrated from its original computing unit to the newly mapped computing unit; when the health monitoring module detects that the utilization rate of any computing unit is continuously overloaded or a temperature alarm is triggered, a state migration process is triggered: the highest load atomic task nodes on this computing unit, and the subgraph formed by their direct predecessor and successor dependent nodes in the heterogeneous driving task graph, are migrated as a whole to a healthy redundant computing unit; after each migration process is completed, the heterogeneous computing resource state snapshot is updated, and the scheduling balance of the heterogeneous computing unit cluster is re-evaluated.
[0074] The scene migration process can be an automated sequence of operations triggered by the scheduler when the scene recognition module detects a change in the specific scene type in the driving scene label. Its core is to reallocate computing units to all atomic task nodes based on the differences in requirements between the old and new scenes, and to migrate the currently executing tasks from the old units to the new units, so that computing resources are seamlessly reconfigured to match the scene requirements. The state migration process can be a load transfer and fault avoidance sequence triggered by the scheduler when the health monitoring module detects an abnormal health condition in a heterogeneous computing unit, such as continuous overload or a temperature exceeding the safety threshold. Its core is to migrate the most critical task subgraph, together with its data dependencies, from the abnormal unit to other healthy redundant units to maintain the stability of the overall system's computing power. Directly dependent nodes refer to nodes in the heterogeneous driving task graph that are directly connected to a target atomic task node through data flow edges: nodes that receive the output data of this node as input are called its successor dependent nodes, and nodes that provide its input data are called its predecessor dependent nodes. Scheduling balance refers to the comprehensive evaluation of the real-time load rate, memory usage, and task queue length of each unit in the heterogeneous computing unit cluster after a migration operation is completed, in order to determine whether the computing load is evenly and efficiently distributed across the units.
[0075] Specifically, in dynamic driving environments, scene transitions and occasional hardware unit failures are common occurrences. Traditional scheduling systems either fail to respond to these issues, causing tasks to run inefficiently on faulty hardware, or they can only abruptly terminate all tasks on the faulty unit and restart it, resulting in a momentary interruption of critical driving functions and posing serious safety risks.
[0076] In the specific analysis process, the task dynamic migration manager runs as a resident service. When the scene recognition module publishes a scene type change event via the message bus, for example, from the highway assisted driving type to the urban assisted driving type, the migration manager immediately triggers the scene migration process: it first calls the task allocation engine according to the new scene type to re-solve the task allocation game and generate a new matching mapping table. Then, the manager compares the old and new mapping tables to find all atomic task nodes whose allocation targets have changed. These nodes and their dependencies constitute the task subgraph of the scene that is about to fail. For each subgraph to be migrated, the manager triggers a smooth migration through the runtime API of the original computing unit: pausing the computation of the subgraph, serializing its intermediate state in memory, transferring it to the newly mapped computing unit via high-speed interconnect DMA (Direct Memory Access), and resuming execution from the checkpoint on the new unit. When the health monitoring module detects that a computing unit's utilization is continuously overloaded (e.g., >95% for 5 consecutive cycles) or that a temperature alarm has been triggered (e.g., the chip junction temperature exceeds a preset threshold of 105℃, which is set at startup based on the safety specifications in the chip datasheet and the system thermal design), the migration manager triggers the state migration process. It first locates the task node with the highest load on the abnormal unit, then traverses its direct predecessor nodes upstream and its direct successor nodes downstream, and takes these nodes together as the target of the whole-subgraph migration.
The manager selects a healthy redundant computing unit from a predefined resource pool, such as another GPU or CPU core of the same model, and migrates the entire subgraph, including checkpoint states and data flow definitions, to it, updating the routing information of the global task graph. After each migration process is completed, the migration manager calls the resource monitoring service to update the heterogeneous computing resource state snapshot and, based on the latest snapshot and task graph, reruns a lightweight load balancing evaluation algorithm to reassess the scheduling balance of the heterogeneous computing unit cluster.
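The selection logic of the two migration processes can be sketched as follows. This is a minimal illustration under stated assumptions: the task graph is represented as a plain list of directed edges, the node and unit names are invented for the example, and the function names `changed_nodes`, `overloaded`, and `state_migration_subgraph` are hypothetical. Checkpointing and DMA transfer are outside the scope of the sketch.

```python
def changed_nodes(old_map, new_map):
    """Scene migration: nodes whose target unit differs between mapping tables."""
    return sorted(n for n in new_map if old_map.get(n) != new_map[n])


def overloaded(history, limit=0.95, cycles=5):
    """State migration trigger: utilization above the limit for N cycles in a row."""
    recent = history[-cycles:]
    return len(recent) == cycles and all(u > limit for u in recent)


def state_migration_subgraph(edges, node_load, node_unit, bad_unit):
    """Pick the highest-load node on the abnormal unit plus its direct neighbors."""
    on_unit = [n for n, unit in node_unit.items() if unit == bad_unit]
    hot = max(on_unit, key=node_load.get)
    predecessors = {src for src, dst in edges if dst == hot}
    successors = {dst for src, dst in edges if src == hot}
    return hot, sorted(predecessors | successors | {hot})


# Illustrative task graph: camera -> segmentation -> bev -> planning -> control
EDGES = [
    ("camera", "segmentation"),
    ("segmentation", "bev"),
    ("bev", "planning"),
    ("planning", "control"),
]
NODE_UNIT = {"camera": "isp", "segmentation": "npu", "bev": "gpu0",
             "planning": "gpu0", "control": "cpu"}
NODE_LOAD = {"camera": 0.1, "segmentation": 0.5, "bev": 0.6,
             "planning": 0.2, "control": 0.1}

hot_node, subgraph = state_migration_subgraph(EDGES, NODE_LOAD, NODE_UNIT, "gpu0")
```

Here `bev` is the highest-load node on `gpu0`, so the subgraph to migrate consists of `bev`, its predecessor `segmentation`, and its successor `planning`.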
[0077] In alternative or modified implementations, smooth migration can be achieved without relying on complete checkpoints, instead employing a dual-active computing mode where old and new units compute in parallel for a period before switching. The preset threshold for triggering state migration can use an adaptive algorithm, dynamically adjusted based on historical load and thermal efficiency. The scope of the entire subgraph migration is not limited to directly adjacent nodes; it can be based on task graph analysis, migrating a complete functional module subgraph to optimize performance. For systems without redundant units, the state migration process can be modified to perform task degradation or offloading within the abnormal unit, rather than a complete migration out.
[0078] In some embodiments, performance indicators after scheduling based on a dynamic scheduling decision information set are collected in real time. These performance indicators include the actual completion time of each atomic task node in the heterogeneous driving task graph, the actual power consumption of the heterogeneous computing unit cluster, and the actual risk change rate of the driving scenario label. The performance indicators are compared and analyzed with the performance expectations preset based on the driving scenario label to identify performance deviations and their corresponding scenario conditions and scheduling decisions. Based on the continuous analysis results of performance deviations, the generation strategy of the task-unit matching mapping table and the strategy parameters in the three-level adaptive scheduling are dynamically updated using an incremental learning method to form an optimized scheduling strategy. The optimized scheduling strategy is encapsulated and stored for direct invocation by subsequent identical or similar driving scenario labels to generate an adaptive optimization result set for assisted driving.
[0079] Performance metrics refer to the quantifiable data set generated by the system after executing scheduling based on a dynamic scheduling decision information set, used to measure the quality of the scheduling scheme. Performance expectations refer to the pre-set target values or reasonable ranges for various performance metrics based on the current driving scenario labels, combined with system design goals and historical optimization data. For example, in urban assisted driving, the expected completion time for image semantic segmentation nodes is less than 30 milliseconds, and the expected power consumption of the NPU module is less than 8 watts. Performance deviation refers to the numerical value that quantifies the gap between actual performance and expected targets, identified by comparing and analyzing collected performance metrics such as actual completion time, actual power consumption, and actual risk change rate with the corresponding performance expectations. For example, if the actual completion time of a node is 35 milliseconds, exceeding the expectation by 5 milliseconds, then its time deviation value is +5 milliseconds. Incremental learning can be a machine learning strategy that, based on newly generated performance deviations and their corresponding scenario conditions and scheduling decisions, dynamically fine-tunes rather than completely reconstructs the task-unit matching mapping table generation strategy and key parameters in three-level adaptive scheduling, such as load balancing migration thresholds and energy-saving control frequency levels. This allows the scheduling strategy to continuously adapt to hardware characteristic drift and scenario mode changes. An optimized scheduling strategy can refer to a better set of scheduling rules and parameters formed through dynamic updates during the incremental learning process. 
This set is encapsulated and stored, and can be directly invoked when encountering the same or similar driving scenario labels later.
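The deviation calculation in the example above (35 milliseconds actual against a 30 millisecond expectation) can be expressed as a short sketch. The metric names and the function name `performance_deviation` are hypothetical, and the values are treated as upper bounds, as in the example.

```python
def performance_deviation(actual, expected):
    """Signed gap between each collected metric and its expected value."""
    return {name: actual[name] - expected[name] for name in expected if name in actual}


expected = {"segmentation_ms": 30.0, "npu_power_w": 8.0}
actual = {"segmentation_ms": 35.0, "npu_power_w": 7.5}

deviation = performance_deviation(actual, expected)
# Metrics that exceed their expected upper bound.
violations = [name for name, gap in deviation.items() if gap > 0]
```

A positive value means the metric exceeded its expectation, so in this case only the segmentation completion time (+5 milliseconds) is flagged, while the NPU power stays within its budget.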
[0080] Specifically, once deployed, traditional scheduling systems often employ static strategies that cannot learn from actual operation. As vehicle hardware and software iterate, sensor performance degrades, or driving data distribution changes, the initially preset scheduling strategies may gradually become ineffective, leading to a decline in efficiency over long-term operation.
[0081] In the specific analysis process, a lightweight monitoring agent is attached to each atomic task node and computing unit. During a driving scenario cycle, such as from entering the highway to exiting it, the monitoring agent collects performance metrics in real time. Using performance counters provided by the operating system and hardware sensor APIs, it collects the actual completion time of each task node within that cycle, the actual power consumption of the cluster as read by the PMU (Power Management Unit), and the actual risk change rate calculated from the risk levels in the driving scenario labels at two consecutive moments. These metrics are bundled into a data packet together with the driving scenario label that triggered the scheduling and the dynamic scheduling decision information set that was used. Next, the performance deviation analysis engine retrieves from the database the preset performance expectations bound to that driving scenario label. For example, based on extensive early road testing, the expected completion time for the BEV modeling task on the NPU in urban scenarios is 15±2 ms, and the expected cluster power consumption is below 45 W. The engine compares the actual metrics with the expected values item by item and identifies performance deviations, such as an actual time of 18 ms giving a deviation of +3 ms from the 15 ms nominal value, and records the precise scenario conditions and scheduling decisions that led to the deviation. The system then starts a policy iteration module, which updates the strategy dynamically using incremental learning: the deviation samples are converted into a loss function and fed to an online learner, such as a lightweight neural network or a linear model updated with Stochastic Gradient Descent (SGD). The learner adjusts the generation strategy of the task-unit matching mapping table, for example the relative weights of the computation time and power consumption objectives in the game model, and the policy parameters of the three-level adaptive scheduling, for example the trigger threshold for load balancing migration and the preference for offloading to dedicated engines. After the update is complete, an optimized scheduling policy is formed, encapsulated, and stored as a policy file with a version number and applicable scenario tags. When the system later encounters the same or a highly similar scenario, the scheduler preferentially invokes this versioned policy directly, skipping the time-consuming online game solving and achieving faster and better scheduling.
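As a non-limiting illustration, the online update and versioned policy storage described above can be sketched as follows. This is a minimal sketch under stated assumptions: the feature vector (time weight, power weight, migration threshold), the squared-error loss, the learning rate, and the class names `OnlineLinearLearner` and `PolicyStore` are all hypothetical. The application does not specify how the learner output is mapped back to scheduler parameters, so only the SGD step and the storage are shown.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineLinearLearner:
    """Linear model updated by one SGD step per deviation sample."""
    weights: list
    bias: float = 0.0
    lr: float = 0.01

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def update(self, x, target):
        """One SGD step on 0.5 * (prediction - target)^2; returns the loss before the step."""
        err = self.predict(x) - target
        self.weights = [w - self.lr * err * xi for w, xi in zip(self.weights, x)]
        self.bias -= self.lr * err
        return 0.5 * err * err


@dataclass
class PolicyStore:
    """Keeps every stored policy per scenario tag, numbered from version 1."""
    _files: dict = field(default_factory=dict)

    def save(self, scenario_tag, params):
        version = len(self._files.get(scenario_tag, [])) + 1
        entry = {"version": version, "params": dict(params)}
        self._files.setdefault(scenario_tag, []).append(entry)
        return version

    def latest(self, scenario_tag):
        versions = self._files.get(scenario_tag)
        return versions[-1] if versions else None


# Assumed decision features: (time weight, power weight, migration threshold).
sample = [0.6, 0.4, 0.8]
observed_deviation_ms = 3.0  # the +3 ms BEV deviation from the example in the text

learner = OnlineLinearLearner(weights=[0.0, 0.0, 0.0], lr=0.1)
losses = [learner.update(sample, observed_deviation_ms) for _ in range(5)]

store = PolicyStore()
version = store.save("urban", {"weights": learner.weights, "bias": learner.bias})
```

Each call to `update` adjusts the model slightly instead of retraining it, which is the property that distinguishes incremental learning from a full rebuild.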
[0082] In alternative or modified implementations, performance metrics can be collected not only at the end of a cycle but also over a sliding window for near-real-time acquisition. The preset performance expectations can be dynamic values derived from environmental parameters rather than fixed values. The incremental learning algorithm can be replaced with online ensemble learning, Bayesian updating, or rule-based expert system revision. The storage and retrieval of optimized scheduling strategies can be based on similarity retrieval, such as KNN (K-Nearest Neighbors), to find the historical optimized strategies most relevant to a new scenario, rather than on exact scenario label matching. For safety-critical strategy updates, shadow mode testing can be introduced, in which the new strategy is run in simulation in a parallel environment to verify its effectiveness before official deployment.
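As a non-limiting illustration, similarity-based retrieval of stored strategies can be sketched as follows. The sketch assumes that each stored strategy is indexed by a three-component feature vector (road structure abrupt change, traffic intention uncertainty, perception confidence decay) and that Euclidean distance is the similarity measure; the policy identifiers and feature values are hypothetical.

```python
import math


def nearest_policies(query, store, k=1):
    """Return the k (distance, policy_id) pairs whose feature vectors are closest to the query."""
    ranked = sorted((math.dist(query, feats), pid) for pid, feats in store.items())
    return ranked[:k]


# Hypothetical stored strategies, keyed by policy id.
policy_features = {
    "urban_v3": (0.8, 0.6, 0.2),
    "highway_v5": (0.1, 0.2, 0.1),
    "extreme_v1": (0.9, 0.9, 0.8),
}

# Features quantified for a newly encountered scenario (assumed values).
new_scenario = (0.7, 0.5, 0.3)

best = nearest_policies(new_scenario, policy_features, k=1)
top_two = [pid for _, pid in nearest_policies(new_scenario, policy_features, k=2)]
```

With exact label matching, a scenario that falls between known types would find no stored strategy; nearest-neighbor lookup returns the closest available one instead.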
[0083] Figure 3 is a schematic structural diagram of an NPU heterogeneous task scheduling system for assisted driving provided by an embodiment of the present application. As shown in Figure 3, the NPU heterogeneous task scheduling system 300 for assisted driving in this embodiment includes: a driving scenario module 301, a scheduling decision module 302, and an adaptive optimization module 303.
[0084] The driving scenario module 301 is used to acquire the vehicle multimodal sensor dataset, and based on the vehicle multimodal sensor dataset, to perform multi-dimensional dynamic recognition of assisted driving scenarios and construction of heterogeneous task graphs to generate a dynamic driving scenario information set. The scheduling decision module 302 is used to perform dynamic matching of tasks and computing units based on the dynamic driving scenario information set, and drive the execution of a three-level adaptive scheduling covering inter-core scheduling, cross-unit collaborative scheduling and dynamic task migration, and generate a dynamic scheduling decision information set. The adaptive optimization module 303 is used to perform multi-dimensional monitoring of scheduling efficiency and data-driven closed-loop iteration of strategies based on the dynamic scheduling decision information set, and generate an adaptive optimization result set for assisted driving.
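As a non-limiting illustration, the data flow between modules 301, 302, and 303 can be sketched as follows. All class names, the `scenario_hint` field, and the task identifiers are hypothetical, and each module body is a placeholder: a field read stands in for scenario recognition, a table lookup stands in for the game-based matching, and the optimization step only computes deviations. The urban task placement follows the first collaborative strategy described in the claims.

```python
from dataclasses import dataclass


@dataclass
class ScenarioInfo:
    label: str
    tasks: list


@dataclass
class Decision:
    label: str
    placement: dict


# Urban placement taken from the first collaborative strategy in the claims.
URBAN_PLACEMENT = {
    "image_semantic_segmentation": "NPU",
    "bev_space_modeling": "NPU",
    "multi_target_trajectory_prediction": "GPU",
    "lateral_longitudinal_control": "CPU",
}


class DrivingScenarioModule:  # module 301
    def run(self, sensor_frame):
        # Placeholder: the label is read from the frame instead of being inferred.
        return ScenarioInfo(sensor_frame["scenario_hint"], list(URBAN_PLACEMENT))


class SchedulingDecisionModule:  # module 302
    def run(self, info):
        # Placeholder: a fixed table replaces the task-unit matching game.
        return Decision(info.label, {t: URBAN_PLACEMENT[t] for t in info.tasks})


class AdaptiveOptimizationModule:  # module 303
    def run(self, decision, measured_ms, expected_ms):
        # Deviation per scheduled task for which a measurement exists.
        return {t: measured_ms[t] - expected_ms[t]
                for t in decision.placement if t in measured_ms}


info = DrivingScenarioModule().run({"scenario_hint": "urban"})
decision = SchedulingDecisionModule().run(info)
deviations = AdaptiveOptimizationModule().run(
    decision, {"bev_space_modeling": 18.0}, {"bev_space_modeling": 15.0})
```

The output of each module is the only input of the next, which mirrors the information sets passed between the three modules.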
[0085] The system in this embodiment can be used to execute the methods of any of the above embodiments, and its implementation principle and technical effect are similar, so they will not be described again here.
Claims
1. An NPU heterogeneous task scheduling method for assisted driving, characterized in that it comprises: acquiring a vehicle-mounted multimodal sensor dataset and, based on the vehicle-mounted multimodal sensor dataset, performing multi-dimensional dynamic recognition of assisted driving scenarios and heterogeneous task graph construction to generate a dynamic driving scenario information set; based on the dynamic driving scenario information set, performing dynamic matching of tasks and computing units and driving a three-level adaptive scheduling covering inter-core scheduling, cross-unit collaborative scheduling, and dynamic task migration to generate a dynamic scheduling decision information set; and, based on the dynamic scheduling decision information set, performing multi-dimensional monitoring of scheduling efficiency and data-driven closed-loop iteration of strategies to generate an adaptive optimization result set for assisted driving.
2. The method according to claim 1, characterized in that the process of generating the dynamic driving scenario information set includes: the vehicle-mounted multimodal sensor dataset includes raw perception data from high-definition cameras, millimeter-wave radar, and lidar, as well as vehicle controller status data; based on the vehicle-mounted multimodal sensor dataset, performing real-time pattern matching and decision tree inference to generate driving scenario labels that include specific scenario types and quantified safety level labels; based on the driving scenario labels, instantiating atomic task nodes covering the entire flow of the assisted driving algorithm from a predefined task library, and constructing a heterogeneous driving task graph according to the data flow dependency relationships; collecting in real time the load rate, available memory, and power consumption of the heterogeneous computing unit cluster to generate a heterogeneous computing resource status snapshot; and encapsulating the driving scenario labels, the heterogeneous driving task graph, and the heterogeneous computing resource status snapshot to generate the dynamic driving scenario information set.
3. The method according to claim 2, characterized in that the process of generating the driving scenario labels includes: based on the image data from the high-definition camera, identifying abrupt changes in lane line geometry, the density of static obstacle outlines, and the proportion of blind spots in the field of view, and quantifying them to generate road structure abrupt change features; based on the continuous-frame point cloud data from the millimeter-wave radar and the lidar, calculating the covariance of the movement trajectories of the main traffic participants and the divergence of the future trajectory packet, and quantifying them to generate traffic intention uncertainty features; based on the vehicle controller status data, evaluating the consistency level of the multi-source perception data, and quantifying it to generate perception confidence decay features; and mapping the road structure abrupt change features, the traffic intention uncertainty features, and the perception confidence decay features to generate the specific scenario type, and calculating a dynamic comprehensive risk level, the two jointly constituting the driving scenario label; wherein the specific scenario types include an urban assisted driving type, a highway assisted driving type, and an extreme assisted driving type.
4. The method according to claim 3, characterized in that the process of generating the heterogeneous driving task graph includes: using the specific scenario type as an index, dynamically selecting and instantiating, from the predefined task library, atomic task nodes adapted to the current scenario, the atomic task nodes including perception-type atomic task nodes, prediction-type atomic task nodes, and decision-type atomic task nodes; based on the quantified value of the road structure abrupt change features, dynamically adjusting the processing accuracy parameters and update frequency parameters of the perception-type atomic task nodes as perception fusion node data; based on the quantified value of the traffic intention uncertainty features, setting multi-hypothesis calculation branches for the prediction-type atomic task nodes and the decision-type atomic task nodes, and increasing the weights of the dependency edges between these nodes and the perception fusion node data; mapping the dynamic comprehensive risk level to a global task execution urgency coefficient, thereby uniformly labeling the heterogeneous scheduling priority of all atomic task nodes; and connecting the atomic task nodes carrying the perception fusion node data, the multi-hypothesis calculation branches, and the heterogeneous scheduling priority according to the data flow dependency relationships of the algorithm flow to generate the heterogeneous driving task graph.
5. The method according to claim 4, characterized in that the process of generating the dynamic scheduling decision information set includes: based on the heterogeneous driving task graph and the heterogeneous computing resource status snapshot, modeling the task allocation problem as a multi-objective optimization game, solving for the optimal matching scheme with a fast-converging algorithm, and generating a task-unit matching mapping table; based on the task-unit matching mapping table, performing load-aware fine-grained task partitioning and resource allocation among the several computing cores within the NPU to generate inter-core fine-grained scheduling instructions; simultaneously performing computing power collaboration and priority arbitration based on scenario requirements among the heterogeneous computing unit cluster to generate cross-computing unit collaborative scheduling instructions; continuously monitoring scenario switching events and the health status of the heterogeneous computing unit cluster and, if a scenario switch or an abnormal health status is detected, triggering generation of an overall redeployment plan for the task subgraph to generate a task dynamic migration guide; and encapsulating the inter-core fine-grained scheduling instructions, the cross-computing unit collaborative scheduling instructions, and the task dynamic migration guide to generate the dynamic scheduling decision information set.
6. The method according to claim 5, characterized in that the process of generating the inter-core fine-grained scheduling instructions includes: based on the computing power requirements of the driving scenario labels, dynamically grouping the NPU computing cores into logical groups, wherein for unstructured urban scenarios multiple cores are grouped into a collaborative computing unit to process large-scale fusion models, and for structured highway scenarios the core groups are decoupled and some cores are turned off to reduce power consumption; monitoring in real time, through the on-chip network, the utilization of each computing core and the communication latency between adjacent cores and, when the load deviation is detected to exceed a preset threshold, migrating the computing subgraph on the high-load core to a low-load core, and generating and executing a load balancing migration instruction; and identifying the atomic task nodes in the heterogeneous driving task graph that are marked as having fixed computing patterns and high data throughput, offloading them from the general-purpose computing cores, scheduling them for execution on the dedicated hardware acceleration engine integrated in the NPU, and generating and executing dedicated engine offloading instructions.
7. The method according to claim 5, characterized in that the process of generating the cross-computing unit collaborative scheduling instructions includes: when the driving scenario label is of the urban assisted driving type, generating a first collaborative strategy instruction, in which the scheduling instructions for the image semantic segmentation and BEV space modeling tasks are directed to the NPU computing core group, the scheduling instructions for the multi-target trajectory prediction task are directed to the GPU, the scheduling instructions for the vehicle lateral and longitudinal control task are directed to the CPU, and the ISP is configured to enter an enhanced image preprocessing mode; when the driving scenario label is of the highway assisted driving type, generating a second collaborative strategy instruction, in which the target tracking task is decomposed into feature extraction and data association whose scheduling instructions are directed to the NPU and the GPU respectively for collaborative pipeline processing, the scheduling instruction for the path trajectory smoothing task is directed to the CPU, and an energy-saving control instruction is generated to reduce the operating frequency of the NPU's non-core computing units; and when the driving scenario label is of the extreme assisted driving type, generating a third collaborative strategy instruction, in which a highest-priority preemption instruction is broadcast to all computing units, all non-safety-critical tasks are suspended, and the obstacle emergency recognition and braking decision task is given exclusive computing resources until the safety level is reduced.
8. The method according to claim 5, characterized in that the process of generating the task dynamic migration guide includes: when the scenario recognition module detects a change in the specific scenario type in the driving scenario label, triggering a scenario migration process, in which the task allocation game is re-solved based on the old and new scenario types to generate a new matching mapping table, and the task subgraph of the scenario that is about to become invalid is accordingly migrated smoothly from its original computing unit to the newly mapped computing unit; when the health monitoring module detects that the utilization of any computing unit is continuously overloaded or that a temperature alarm has been triggered, triggering a state migration process, in which the atomic task nodes with the highest load on that computing unit, together with the subgraph formed by their directly dependent nodes in the heterogeneous driving task graph, are migrated as a whole to a redundant computing unit in a healthy state; and after each migration process is completed, updating the heterogeneous computing resource status snapshot and reassessing the scheduling balance of the heterogeneous computing unit cluster.
9. The method according to claim 5, characterized in that the process of generating the adaptive optimization result set for assisted driving includes: collecting in real time the performance indicators produced after scheduling is executed based on the dynamic scheduling decision information set, the performance indicators including the actual completion time of each atomic task node in the heterogeneous driving task graph, the actual power consumption of the heterogeneous computing unit cluster, and the actual risk change rate of the driving scenario label; comparing the performance indicators with the performance expectations preset based on the driving scenario labels to identify performance deviations and their corresponding scenario conditions and scheduling decisions; based on the continuous analysis results of the performance deviations, dynamically updating, using an incremental learning method, the generation strategy of the task-unit matching mapping table and the strategy parameters of the three-level adaptive scheduling to form an optimized scheduling strategy; and encapsulating and storing the optimized scheduling strategy for direct invocation when the same or similar driving scenario labels are subsequently encountered, thereby generating the adaptive optimization result set for assisted driving.
10. An NPU heterogeneous task scheduling system for assisted driving, characterized in that it is applied to the method according to any one of claims 1-9 and comprises: a driving scenario module, used to acquire the vehicle-mounted multimodal sensor dataset and, based on the vehicle-mounted multimodal sensor dataset, to perform multi-dimensional dynamic recognition of assisted driving scenarios and construction of heterogeneous task graphs, thereby generating a dynamic driving scenario information set; a scheduling decision module, used to perform dynamic matching of tasks and computing units based on the dynamic driving scenario information set, to drive the execution of three-level adaptive scheduling covering inter-core scheduling, cross-unit collaborative scheduling, and dynamic task migration, and to generate a dynamic scheduling decision information set; and an adaptive optimization module, used to perform multi-dimensional monitoring of scheduling efficiency and data-driven closed-loop iteration of strategies based on the dynamic scheduling decision information set, and to generate an adaptive optimization result set for assisted driving.
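As a non-limiting illustration, the load balancing migration step recited in claim 6 can be sketched as follows. The sketch assumes that core utilization and subgraph cost are both expressed in percent of one core, and that the subgraph to move is the largest one costing at most half the load gap, so that the move narrows the gap without reversing it. The function name, core names, subgraph names, and threshold value are hypothetical; the claims do not fix a selection rule.

```python
def plan_migration(core_load, subgraphs, threshold):
    """Return (subgraph, source_core, target_core), or None if no migration is needed."""
    src = max(core_load, key=core_load.get)  # most loaded core
    dst = min(core_load, key=core_load.get)  # least loaded core
    gap = core_load[src] - core_load[dst]
    if gap <= threshold:
        return None  # load deviation does not exceed the preset threshold
    # Only subgraphs costing at most half the gap keep the source at or above the target.
    candidates = [(cost, name)
                  for name, cost in subgraphs.get(src, {}).items()
                  if cost <= gap / 2]
    if not candidates:
        return None
    _, name = max(candidates)
    return name, src, dst


# Assumed utilization per core, in percent.
core_load = {"core0": 90, "core1": 40, "core2": 55}
# Assumed cost of each computing subgraph currently placed on core0, in percent.
subgraphs = {"core0": {"fusion_a": 30, "fusion_b": 20, "fusion_c": 10}}

plan = plan_migration(core_load, subgraphs, threshold=25)
```

In this example the gap between core0 and core1 is 50, so `fusion_b` (cost 20) is selected; `fusion_a` (cost 30) is rejected because moving it would leave core1 more loaded than core0.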