Batch task execution method, apparatus, device, medium, and product

By assigning a target cluster and additional clusters to autonomous driving batch tasks, and combining management files with monitoring mechanisms, the scheme addresses uneven utilization of cluster resources and improves computing efficiency and accuracy.

CN115237566B · Active · Publication Date: 2026-06-19 · BEIJING BAIDU NETCOM SCI & TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD
Filing Date: 2022-07-27
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

In autonomous driving operations, there are a large number of batch computing requirements, which leads to high pressure on cluster scheduling tasks, uneven resource utilization, and idle resources in some clusters, affecting computing efficiency.

Method used

By assigning target and additional clusters to batch tasks, setting priorities, and combining associated management files and monitoring mechanisms, resources can be fully scheduled and utilized, resource idleness can be avoided, and computing efficiency can be improved.

Benefits of technology

It achieves full utilization of cluster resources, improves the execution efficiency of batch tasks, ensures stable operation and accuracy of tasks, and reduces resource waste.



Abstract

This disclosure provides a method, apparatus, device, medium, and product for batch task execution, relating to the field of data processing, and particularly to the field of autonomous driving technology. The specific implementation scheme is as follows: receiving batch tasks submitted by a user; allocating a target cluster and an additional cluster for the batch tasks based on pre-registered cluster resource information, wherein the execution priority of the batch tasks in the additional cluster is lower than that of the original batch tasks in the additional cluster, and the original batch tasks are those for which the additional cluster was allocated as the target cluster; obtaining execution data information generated when the target cluster executes the target tasks in the batch tasks, or obtaining execution data information generated when the target cluster and the additional cluster execute the target tasks in the batch tasks; and monitoring and managing the execution of the batch tasks based on the execution data information. The scheme of this disclosure improves the execution efficiency of batch tasks and achieves full scheduling and utilization of global cluster resources.

Description

Technical Field

[0001] This disclosure relates to the field of data processing, and more particularly to the field of autonomous driving technology, specifically to a method, apparatus, device, medium, and product for batch task execution.

Background Technology

[0002] As the scale of autonomous driving business continues to expand, significant computational demands arise across various application scenarios, including predictive planning algorithm verification, perception and reasoning, data processing, scene analysis, data mining, and map building. The large volume of batch computations in these scenarios places considerable pressure on cluster scheduling tasks.

Summary of the Invention

[0003] This disclosure provides a method, apparatus, device, medium, and product for batch task execution.

[0004] According to one aspect of this disclosure, a method for executing batch tasks is provided, comprising:

[0005] Receive batch tasks submitted by users; wherein the batch tasks include at least two target tasks;

[0006] Based on the pre-registered cluster resource information, a target cluster and an additional cluster are allocated to the batch tasks; wherein, the execution priority of the batch tasks in the additional cluster is lower than that of the original batch tasks in the additional cluster, and the original batch tasks are batch tasks that are allocated with the additional cluster as the target cluster.

[0007] Obtain execution data information generated by the target cluster when executing the target task in the batch task, or obtain execution data information generated by the target cluster and the additional cluster when executing the target task in the batch task, and monitor and manage the execution of the batch task based on the execution data information.

[0008] According to another aspect of this disclosure, a batch task execution apparatus is provided, comprising:

[0009] A task receiving module is used to receive batch tasks submitted by users; wherein the batch tasks include at least two target tasks.

[0010] The cluster allocation module is used to allocate a target cluster and an additional cluster to the batch tasks according to the pre-registered cluster resource information; wherein, the execution priority of the batch tasks in the additional cluster is lower than that of the original batch tasks in the additional cluster, and the original batch tasks are batch tasks that are allocated with the additional cluster as the target cluster.

[0011] The task execution management module is used to obtain execution data information generated by the target cluster when executing the target task in the batch task, or to obtain execution data information generated by the target cluster and the additional cluster when executing the target task in the batch task, and to monitor and manage the execution of the batch task based on the execution data information.

[0012] According to another aspect of this disclosure, an electronic device is provided, comprising:

[0013] At least one processor; and

[0014] A memory communicatively connected to the at least one processor; wherein,

[0015] The memory stores instructions that can be executed by the at least one processor, which, when executed, enable the at least one processor to perform the batch task execution method described in any embodiment of this disclosure.

[0016] According to another aspect of this disclosure, a non-transitory computer-readable storage medium is provided storing computer instructions, wherein the computer instructions are used to cause the computer to perform a batch task execution method according to any embodiment of this disclosure.

[0017] According to another aspect of this disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the batch task execution method according to any embodiment of this disclosure.

[0018] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.

Description of the Drawings

[0019] The accompanying drawings are provided to better understand this solution and do not constitute a limitation of this disclosure. In the drawings:

[0020] Figure 1 is a schematic diagram of a batch task execution method according to an embodiment of the present disclosure;

[0021] Figure 2 is a schematic diagram illustrating the allocation of the target cluster and the additional cluster according to an embodiment of this disclosure;

[0022] Figure 3 is a schematic diagram of another batch task execution method according to an embodiment of the present disclosure;

[0023] Figure 4 is a schematic diagram of another batch task execution method according to an embodiment of the present disclosure;

[0024] Figure 5 is a schematic diagram of another batch task execution method according to an embodiment of the present disclosure;

[0025] Figure 6 is a schematic diagram of another batch task execution method according to an embodiment of the present disclosure;

[0026] Figure 7 is a schematic diagram of the structure of a batch task execution system according to an embodiment of the present disclosure;

[0027] Figure 8 is a diagram illustrating batch task scheduling;

[0028] Figure 9 is a schematic diagram of the target task's execution cycle;

[0029] Figure 10 is a flowchart of the execution of the target task within a single container;

[0030] Figure 11 is a schematic diagram of the processing flow of the monitoring and recycling module for the completed target tasks;

[0031] Figure 12 is a flowchart of the timeout or long-tail processing for the target task;

[0032] Figure 13 is a schematic diagram illustrating the execution of the reporting and statistics module;

[0033] Figure 14 is a schematic diagram of the structure of a batch task execution apparatus according to an embodiment of the present disclosure;

[0034] Figure 15 is a block diagram of an electronic device used to implement the batch task execution method of the embodiments of this disclosure.

Detailed Implementation

[0035] The exemplary embodiments of this disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding, and should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of this disclosure. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.

[0036] Figure 1 is a schematic diagram of a batch task execution method according to an embodiment of the present disclosure. This embodiment is applicable to optimizing how batch tasks are executed. The method can be executed by a batch task execution apparatus, which can be implemented in software and/or hardware and integrated into an electronic device. The electronic device involved in this embodiment can be a local server or another device with communication and computing capabilities. Referring to Figure 1, the method includes the following:

[0037] S110. Receive batch tasks submitted by the user; wherein the batch tasks include at least two target tasks.

[0038] In this context, a batch task refers to a comprehensive computational task that aggregates tasks with the same computational requirements across different application scenarios, while a target task refers to the computational task within each application scenario. In an optional implementation of this embodiment, when simulating and verifying autonomous driving algorithms, the batch task submitted by the user is an algorithm verification task, and each target task is an algorithm verification task under a different map. For example, the autonomous driving algorithm can be a prediction algorithm, a localization algorithm, or a PNC (Planning and Control) algorithm. Besides batch computations for algorithm verification in autonomous driving, the batch task in this embodiment is also applicable to other batch computational tasks.

[0039] Specifically, on the autonomous driving simulation cloud platform, when users need to verify autonomous driving positioning algorithms, they create a batch task based on the positioning algorithm. This batch task includes positioning algorithm verification on different roads, with each road's verification being a separate target task. To ensure the correctness of the algorithm verification results, simulation verification needs to be performed on multiple roads. For example, users can submit batch tasks in various ways, such as via HTTP API, a user interface (UI), or a Linux system; there are no restrictions on the submission method.

[0040] Since the verification of autonomous driving algorithms involves a large amount of batch computation, the method in this embodiment is beneficial to improve the efficiency of batch computation and can support the stable operation of various batch tasks.

[0041] S120. Based on the pre-registered cluster resource information, allocate a target cluster and an additional cluster for the batch tasks; wherein, the execution priority of the batch tasks in the additional cluster is lower than that of the original batch tasks in the additional cluster, and the original batch tasks are the batch tasks that are allocated with the additional cluster as the target cluster.

[0042] Pre-registered clusters can be distributed across different regions. Cluster resource information is used to characterize the computing power of the cluster, including the number of physical machines and configuration information such as CPU and memory. A target cluster refers to a cluster that executes the assigned batch tasks directly, without waiting for any execution condition to be met; an additional cluster refers to a cluster that executes the assigned batch tasks only after certain execution conditions are met. Execution priority is used to characterize the order in which tasks are executed within a cluster.

[0043] Specifically, detailed parameter configuration information for each physical cluster is pre-determined. Based on this parameter configuration information, at least one target cluster and at least one additional cluster are determined for the batch task. The number of target clusters and additional clusters can be determined based on the number of tasks in the batch task and the cluster configuration information, and is not limited here. After the target cluster is determined for the batch task, the target cluster allocates queue resource information to the batch task, that is, determines the resource organization unit allocated to the batch task in the cluster queue. When the cluster executes the batch task, the allocated resource organization unit retrieves the corresponding target tasks in the batch task for execution. In addition, when the same batch task is allocated to multiple cluster queues, its configuration in each queue, including the size of the allocated resource organization unit and the execution priority, can be customized based on that queue's resource availability.

[0044] Due to the asymmetry of resource information in the cluster and the inconsistent number of tasks in each batch, the number of target tasks to be executed on different cluster queues is inconsistent. This can easily lead to situations where resource queues containing batch tasks with fewer target tasks are idle, while resource queues containing batch tasks with more target tasks are heavily queued. To avoid some cluster resources being idle, this embodiment allocates additional clusters in addition to the target cluster when allocating cluster resources for batch tasks. The additional clusters are used to execute batch tasks using idle cluster resources after the original batch tasks allocated in the current cluster have been completed, thereby improving the execution efficiency of batch tasks and avoiding the waste of idle cluster resources.

[0045] For example, each batch task can be run by multiple cluster queues, including at least one target cluster queue and at least one additional cluster queue. When a cluster queue receives a batch task, it allocates a resource organization unit to that task. The resource organization unit in the target cluster queue directly allocates containers to the batch task based on its execution priority, and the containers then execute the target tasks within the batch task. A container is a virtualization environment responsible for executing individual target tasks within the cluster. In an additional cluster, once the batch tasks that use that cluster as their target cluster have finished executing, if the target tasks of the appended batch task have not all been completed, the additional resource organization unit in the additional cluster allocates containers to execute the batch task, thus accelerating its completion. Figure 2 shows the allocation of target clusters and additional clusters. There are three pre-registered cluster queues. In each cluster queue, a batch task (Job) is executed as a resource organization unit (APP). When a batch task is executed in a cluster queue, the APP corresponding to the batch task obtains resources and creates execution containers for the batch task. For Job01, cluster queue 1 is its target cluster, and cluster queues 2 and 3 are additional clusters, in which additional resource organization units are created for Job01. For cluster queue 1, Job01 and Job03 are the original batch tasks, and Job02, Job04, and Job06 are additional batch tasks, which are executed only after cluster queue 1 has completed the execution of Job01 and Job03. The same applies to cluster queues 2 and 3.

[0046] S130. Obtain the execution data information generated when the target cluster executes the target task in the batch task, or obtain the execution data information generated when the target cluster and the additional cluster execute the target task in the batch task, and monitor and manage the execution of the batch task based on the execution data information.
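The ordering rule in the Figure 2 example can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `ClusterQueue` class, the `allocate` helper, and the two-level priority encoding are hypothetical names introduced here, and only part of the Figure 2 assignment is reproduced.

```python
from dataclasses import dataclass, field

# Assumed encoding: a lower value runs first within a cluster queue.
ORIGINAL, ADDITIONAL = 0, 1


@dataclass
class ClusterQueue:
    name: str
    entries: list = field(default_factory=list)  # (priority_class, job_id)

    def register(self, job_id, as_target):
        """Record a batch task as original (target) or additional."""
        priority = ORIGINAL if as_target else ADDITIONAL
        self.entries.append((priority, job_id))

    def run_order(self):
        # sorted() is stable, so jobs in the same class keep registration order.
        return [job for _, job in sorted(self.entries, key=lambda e: e[0])]


def allocate(job_id, target, additional):
    """Register a batch task on its target queue and on its additional queues."""
    target.register(job_id, as_target=True)
    for queue in additional:
        queue.register(job_id, as_target=False)


q1, q2, q3 = ClusterQueue("queue1"), ClusterQueue("queue2"), ClusterQueue("queue3")
allocate("Job02", target=q2, additional=[q1])
allocate("Job01", target=q1, additional=[q2, q3])
allocate("Job03", target=q1, additional=[])
print(q1.run_order())  # original jobs of queue 1 come before the appended Job02
```

Because the priority class is the only sort key, an appended batch task can never move ahead of a batch task that uses the queue as its target cluster, regardless of registration order.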

[0047] Among them, execution data information refers to the relevant task data generated during the execution of the target task, such as the start time of execution, execution status, and other task data.

[0048] If the target cluster of a batch task has completed all of the batch task's target tasks before the additional cluster completes its original batch tasks, then only the execution data generated by the target cluster during the execution of the target tasks is obtained. If the target cluster of a batch task has not completed all of the target tasks by the time the additional cluster completes its original batch tasks, then the execution data of both the target cluster and the additional cluster during the batch task is obtained. The overall execution status of the batch task is monitored and managed based on the execution data of each target task.

[0049] For example, as shown in Figure 2, if Job01 has not finished executing when Job02 and Job06 have finished in cluster queue 2, then the Job01 appended to cluster queue 2 begins executing the target tasks in Job01. If the resource organization unit corresponding to Job01 in cluster queue 1 and the additional resource organization unit for Job01 in cluster queue 2 finish executing Job01 while Job04 and Job05 have not yet finished executing in cluster queue 3, then cluster queue 3 does not execute Job01. During the execution of batch tasks in cluster queues 1 and 2, the execution data information generated is acquired in real time.

[0050] In another optional implementation of this embodiment, before obtaining the execution data information generated when the target cluster and the additional cluster execute the target task in the batch task, the method further includes:

[0051] Receive the target cluster's request to execute batch tasks and allocate target tasks to the target cluster;

[0052] Receive execution requests for batch tasks from additional clusters and allocate target tasks to the additional clusters; wherein an additional cluster initiates an execution request for the batch tasks only after it has completed the execution of its original batch tasks.

[0053] The target cluster initiates task execution requests according to the execution priority of the batch tasks. After the additional cluster finishes executing its original batch tasks, it initiates task execution requests according to the execution priority of the additional batch tasks. Upon receipt of a task execution request, target tasks that have not yet been executed are allocated from the batch task to the requesting cluster.

[0054] For example, based on the above example, after Job02 and Job06 are executed in cluster queue 2, but Job01 has not yet been completed, the Job01 appended to cluster queue 2 initiates an execution request for Job01. In this case, an unexecuted target task from Job01 is assigned to cluster queue 2, and the corresponding additional resource organization unit in cluster queue 2 executes the target task. Similarly, if cluster queue 1 initiates an execution request for Job01 according to the execution priorities of Job01 and Job03, an unexecuted target task from Job01 is assigned to cluster queue 1. The number of containers started can be customized within the resource organization unit of each cluster queue, which determines how many target tasks of the batch task can be executed simultaneously.
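The request-and-allocate exchange described above can be illustrated with a small dispatcher. This is a sketch under assumptions: `BatchDispatcher`, `request_execution`, and the `max_tasks` parameter (standing in for the number of containers a resource organization unit starts) are hypothetical names, and the road identifiers are made-up sample data.

```python
from collections import deque


class BatchDispatcher:
    """Hands out not-yet-executed target tasks of one batch task on request."""

    def __init__(self, batch_id, target_tasks):
        self.batch_id = batch_id
        self.unexecuted = deque(target_tasks)
        self.assigned = {}  # target task -> cluster queue that received it

    def request_execution(self, cluster, max_tasks=1):
        """Serve an execution request from a target or additional cluster."""
        handed_out = []
        while self.unexecuted and len(handed_out) < max_tasks:
            task = self.unexecuted.popleft()
            self.assigned[task] = cluster
            handed_out.append(task)
        return handed_out


dispatcher = BatchDispatcher("Job01", ["road_a", "road_b", "road_c"])
first = dispatcher.request_execution("queue1", max_tasks=2)  # target cluster
second = dispatcher.request_execution("queue2", max_tasks=2)  # additional cluster
third = dispatcher.request_execution("queue3")  # nothing left for queue 3
```

A cluster whose request arrives after every target task has been handed out receives an empty list, which mirrors cluster queue 3 never executing Job01 in the example.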

[0055] Setting up target and additional clusters for batch tasks enables full scheduling and utilization of global cluster resources, avoiding the situation where some cluster resources are idle.

[0056] The solution in this embodiment sets up a target cluster and an additional cluster for batch tasks. The additional cluster executes additional batch tasks only after its original batch tasks have been completed. This ensures the smooth execution of the original batch tasks while putting otherwise idle cluster resources to work, thereby improving the execution efficiency of batch tasks. It achieves full scheduling and utilization of the global cluster resources and avoids the situation where some cluster resources are idle.

[0057] Figure 3 is a schematic diagram of another batch task execution method according to an embodiment of the present disclosure. This embodiment is a further refinement of the above technical solution. After receiving the batch tasks submitted by the user, the method further includes: establishing an associated batch management file for the batch tasks, and establishing an associated target management file for each target task included in the batch tasks. The technical solution in this embodiment can be combined with the optional solutions in one or more of the above embodiments. As shown in Figure 3, the batch task execution method includes the following:

[0058] S310. Receive batch tasks submitted by users; establish associated batch management files for batch tasks, and establish associated target management files for each target task included in the batch tasks.

[0059] The batch management file is used to store overall task data related to batch tasks, such as execution data, while the target management file is used to store task data related to target tasks, such as the execution data of each target task.

[0060] Specifically, after receiving a batch task submitted by a user, an associated batch management file is created for the batch task, and the batch task is split into individual target tasks, with an associated target management file also created for each target task.
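As an illustration of this splitting step, the sketch below builds one batch management file and one target management file per scenario. The dictionary layout, the field names, the task identifier scheme, and the sample metadata are all assumptions made for the example; the disclosure does not specify a file format.

```python
def create_management_files(batch_id, metadata, scenarios):
    """Split a batch task into target tasks and build the associated files."""
    batch_file = {
        "batch_id": batch_id,
        "metadata": metadata,
        "target_ids": [],
        "execution_data": {},
    }
    target_files = []
    for index, scenario in enumerate(scenarios):
        task_id = f"{batch_id}-{index:03d}"  # hypothetical naming scheme
        batch_file["target_ids"].append(task_id)
        target_files.append({
            "task_id": task_id,
            "batch_id": batch_id,
            # Each target task inherits the batch metadata plus its own scenario.
            "metadata": {**metadata, "scenario": scenario},
            "execution_data": {"status": "not_executed"},
        })
    return batch_file, target_files


batch_file, target_files = create_management_files(
    "Job01",
    {"task_type": "localization_verification", "submitter": "alice"},
    ["road_a", "road_b"],
)
```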

[0061] S320. Based on the pre-registered cluster resource information, allocate target clusters and additional clusters for batch tasks.

[0062] S330. Obtain the execution data information generated when the target cluster executes the target task in the batch task, or obtain the execution data information generated when the target cluster and the additional cluster execute the target task in the batch task.

[0063] In another optional implementation of this embodiment, the management file also includes metadata information about the task;

[0064] Accordingly, after establishing associated batch management files for batch tasks and associated target management files for each target task included in the batch tasks, the method further includes:

[0065] The associated target management file of the target task is stored in the set of tasks to be executed, so that when the target cluster or additional cluster executes batch tasks, it can obtain the metadata information of each target task in the batch tasks from the set of tasks to be executed, and lock the associated target management file of the target task to be executed.

[0066] The metadata information of a task refers to the basic data information required for the task's execution, such as task configuration information, task type, and task submitter information. When a batch task is received, this metadata information is already included and saved in the associated batch management file. When the batch task is split into target tasks, the metadata information for each target task is determined and saved in the corresponding associated target management file. The set of tasks to be executed stores the associated target management files for all unfinished target tasks among the batch tasks submitted by the user.

[0067] Specifically, after the management file is established, the data information required for the execution of the corresponding tasks is saved in the management file, and the associated target management files of each target task are saved in the set of tasks to be executed. For example, when the target cluster or additional cluster executes the batch task, it queries the set of tasks to be executed to find the associated target management files of the target tasks belonging to the batch task, and obtains the parameter information required for the target task to run from the target management file. When any container in any cluster obtains a target management file, that target management file is locked. A locked target management file indicates that the corresponding target task is being executed and can only be executed by the container that locked it. This prevents the target task from being obtained by other clusters or by other containers in the same cluster, which would result in duplicate execution of the same target task.

[0068] By setting up a set of tasks to be executed, all unfinished target tasks can be managed, making it easier for containers in each cluster to obtain the corresponding execution tasks and preventing the same target task from being executed repeatedly by multiple containers.
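A minimal in-memory sketch of such a set of tasks to be executed is shown below. It assumes a single process and uses a `threading.Lock` to make the claim step atomic; a real deployment spanning several clusters would need a shared store with an atomic claim operation, which the disclosure does not detail. `PendingTaskSet`, `claim`, and the `locked_by` field are hypothetical names.

```python
import threading


class PendingTaskSet:
    """Holds target management files for unfinished target tasks."""

    def __init__(self):
        self._files = {}  # task_id -> target management file (a dict here)
        self._mutex = threading.Lock()

    def add(self, task_id, batch_id, metadata):
        with self._mutex:
            self._files[task_id] = {
                "batch_id": batch_id,
                "metadata": metadata,
                "locked_by": None,
            }

    def claim(self, batch_id, container_id):
        """Lock and return one unlocked target task of the batch, or None."""
        with self._mutex:
            for task_id, file in self._files.items():
                if file["batch_id"] == batch_id and file["locked_by"] is None:
                    file["locked_by"] = container_id
                    return task_id, file["metadata"]
            return None


pending = PendingTaskSet()
pending.add("Job01-000", "Job01", {"scenario": "road_a"})
pending.add("Job01-001", "Job01", {"scenario": "road_b"})
claim_a = pending.claim("Job01", "queue1/container-1")
claim_b = pending.claim("Job01", "queue2/container-7")
claim_c = pending.claim("Job01", "queue3/container-2")
```

Since the check and the lock happen under the same mutex, two containers can never receive the same target task, which is the duplicate-execution guard the text describes.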

[0069] S340. Write the execution data information of the target task into the associated target management file.

[0070] When the target cluster or additional cluster executes the target task, the generated execution data is written in real time to the associated target management file in the set of tasks to be executed, so that the execution status of the target task can be monitored. The execution data information includes execution time, execution status, and other task result data generated during execution.

[0071] S350. Determine the batch task execution data information based on the target task execution data information, and write the batch task execution data information into the associated batch management file.

[0072] Among them, batch task execution data information refers to relevant data information determined from the perspective of the batch task as a whole, such as the number of target tasks in various execution states in the batch task, and the summary results of task execution, etc.

[0073] Specifically, after obtaining the execution data information of each target task in the batch task, the overall task execution status of the batch task is determined based on the execution data information of each target task, and the relevant execution data information is written into the associated batch management file so as to determine the overall execution status of the batch task through the batch management file.
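One way to derive the batch-level execution data is to count the target tasks in each execution state. The sketch below is illustrative only: the status strings and the summary fields are assumptions, and `summarize_batch` is a hypothetical helper.

```python
from collections import Counter


def summarize_batch(target_files):
    """Derive batch-level execution data from per-target execution data."""
    counts = Counter(f["status"] for f in target_files)
    total = len(target_files)
    return {
        "total": total,
        "status_counts": dict(counts),
        # The batch is finished only when every target task has completed.
        "finished": total > 0 and counts["completed"] == total,
    }


summary = summarize_batch([
    {"task_id": "Job01-000", "status": "completed"},
    {"task_id": "Job01-001", "status": "completed"},
    {"task_id": "Job01-002", "status": "running"},
])
```

The resulting dictionary is the kind of content that would be written into the associated batch management file.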

[0074] S360. Monitor and manage the execution progress of target tasks according to the target management file, and monitor and manage the overall execution progress of batch tasks according to the batch management file.

[0075] The target management file stores the execution data of a single target task. Through the target management file, the execution progress of a single target task can be monitored and the target task can be managed based on the execution data. The batch management file stores the execution data of the entire batch task. From an overall perspective, the overall execution progress of the batch task can be monitored and the batch task can be managed based on the overall execution data.

[0076] For example, if monitoring of the target management file of a single target task reveals that the execution time of the target task exceeds a first preset threshold, then the timed-out target task needs to be handled accordingly; if monitoring of the batch management file reveals that the overall execution time of the batch task exceeds a second preset threshold, then the target tasks in the batch task that have not been completed need to be handled accordingly.
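The two threshold checks can be sketched as follows. The field names, the use of seconds as the time unit, and the sample values are assumptions; how a timed-out task is then handled is left open, as in the text.

```python
def find_timeouts(target_files, now, task_limit_s):
    """Return running target tasks whose execution time exceeds the first threshold."""
    return [
        f["task_id"]
        for f in target_files
        if f["status"] == "running" and now - f["start_time"] > task_limit_s
    ]


def batch_timed_out(batch_file, now, batch_limit_s):
    """Check the overall batch execution time against the second threshold."""
    return now - batch_file["start_time"] > batch_limit_s


files = [
    {"task_id": "t1", "status": "running", "start_time": 0.0},
    {"task_id": "t2", "status": "running", "start_time": 90.0},
    {"task_id": "t3", "status": "completed", "start_time": 0.0},
]
late = find_timeouts(files, now=100.0, task_limit_s=60.0)
batch_late = batch_timed_out({"start_time": 0.0}, now=100.0, batch_limit_s=300.0)
```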

[0077] In another optional implementation of this embodiment, the execution data information includes at least the execution status;

[0078] Correspondingly, S360 includes:

[0079] If the execution status of the target task is found to be "execution completed", then the associated target management file of the target task will be moved from the set of tasks to be executed to the set of tasks that have been completed.

[0080] The execution status is used to characterize the current execution progress of the target task, and can include not executed, in progress, and completed. The set of completed tasks is used to store the associated target management files of the completed target tasks for unified management of the completed target tasks.

[0081] During the execution of the target task in the target cluster or additional cluster, the execution data information is written to the associated target management file in real time. The execution progress of the target task can be monitored by monitoring the target management file. If the execution status of the target task is observed to be completed, the associated target management file of the target task is deleted from the set of tasks to be executed, and a target management file for the completed target task is added to the set of completed tasks.

[0082] By setting up a set of completed tasks, the management efficiency of target tasks with the execution status of completed tasks is improved, and it is also helpful to quickly locate target tasks that have failed to execute.
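The move between the two sets can be sketched with two dictionaries keyed by task identifier. This is a simplified assumption about the storage; `recycle_completed` and the status strings are hypothetical.

```python
def recycle_completed(pending, completed):
    """Move target management files with a completed status between the sets."""
    done = [task_id for task_id, f in pending.items() if f["status"] == "completed"]
    for task_id in done:
        # pop() removes the file from the pending set and returns it.
        completed[task_id] = pending.pop(task_id)
    return done


pending_set = {
    "Job01-000": {"status": "completed"},
    "Job01-001": {"status": "running"},
}
completed_set = {}
moved = recycle_completed(pending_set, completed_set)
```

The identifiers are collected before any removal so that the pending dictionary is not modified while it is being iterated.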

[0083] In another optional implementation of this embodiment, S360 includes:

[0084] After obtaining the execution status of the target task as completed, maintain the execution environment of the target task in the target cluster or additional cluster unchanged;

[0085] Filter the set of tasks to be executed to find other target tasks that belong to the same batch task as the currently executed target task;

[0086] Based on the current execution environment, execute other target tasks.

[0087] Target tasks in the cluster are executed within containers. A container is a virtualized environment responsible for executing a target task. Ordinarily, after a container is created, it first builds the necessary environment based on the target task's dependency files and then executes the target task within that environment; once the target task is completed, the container is deleted, and a new container is created to execute the next target task. However, the execution environment for target tasks within the same batch task is identical, and the tasks differ only in certain task parameters. In this embodiment, to reduce the time spent on container creation and environment preparation, already created containers are therefore reused.

[0088] Specifically, the system monitors the target management file to determine whether the target task has finished executing. If the execution status is "execution finished," the system enters the container exit or container reuse process. During this process, the container environment in the cluster that executes the target task remains unchanged. The system checks the set of tasks to be executed to determine whether there are other target tasks that belong to the same batch of tasks as the target task. If not, the system enters the container exit process. If so, the container executes the other target tasks in the current execution environment.

[0089] By reusing the execution environment in the cluster, the process of creating different target task environments for batch tasks with the same execution environment is reduced, thereby improving the overall running efficiency of batch tasks.
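The reuse process in [0088] can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `TargetTask` and `Container` classes, the `env_setups` counter, and the in-memory `pending` list standing in for the set of tasks to be executed are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TargetTask:
    task_id: str
    batch_id: str
    params: dict = field(default_factory=dict)


class Container:
    """Hypothetical container that builds its environment once per batch task."""

    def __init__(self):
        self.env_batch_id = None
        self.env_setups = 0   # counts the (expensive) environment preparations
        self.executed = []

    def prepare_env(self, batch_id):
        # In practice this would install the batch task's dependency files.
        self.env_batch_id = batch_id
        self.env_setups += 1

    def run(self, task):
        if self.env_batch_id != task.batch_id:
            self.prepare_env(task.batch_id)
        self.executed.append(task.task_id)


def run_with_reuse(container, first_task, pending):
    """Run first_task, then reuse the container for pending tasks of the same batch."""
    batch_id = first_task.batch_id
    task = first_task
    while task is not None:
        container.run(task)
        # Look for another pending target task that belongs to the same batch task.
        task = next((t for t in pending if t.batch_id == batch_id), None)
        if task is not None:
            pending.remove(task)
    # No sibling task is left: the container exit process would start here.
```

In this sketch the environment is prepared once no matter how many target tasks of the same batch task the container picks up, which is the saving the embodiment describes.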

[0090] In another optional implementation of this embodiment, the execution data information includes at least the reason for the failure of the target task; wherein, a mapping relationship between the candidate task failure reasons and the execution failure error codes is established in advance;

[0091] Correspondingly, the S360 includes:

[0092] Determine the mapped execution failure error code based on the reason for the failure of the target task, and write the execution failure error code into the associated target management file of the failed target task and the associated batch management file of the batch task.

[0093] The cause of a target task's failure is determined from the specific step at which execution finished and from the execution feedback data. Users define execution failure error codes in advance for the anomalies that may occur. Once the cause of failure has been determined, the corresponding execution failure error code is written to the relevant management files. Specifically, the error code is written to the target management file of the failed target task, so that the cause of failure of that target task can be analyzed; it is also written to the batch management file, so that the causes of target task failures across the batch task can be summarized, enabling rapid identification of failure causes and targeted solutions.

[0094] Setting execution failure error codes helps users quickly locate the cause of execution failure from a large number of execution containers in highly parallelized operation scenarios, and perform overall statistics and analysis on the reasons for the failure of the target task.
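As a rough sketch of the mapping described in [0090] to [0093], the following example looks up a pre-defined error code for a failure reason and writes it to both management files. The reason strings, the numeric codes, the `UNKNOWN_ERROR` fallback, and the use of plain dictionaries in place of management files are all hypothetical choices for illustration.

```python
# Hypothetical mapping; in the described scheme the user defines it in advance.
ERROR_CODES = {
    "dependency_missing": 1001,
    "out_of_memory": 1002,
    "business_code_exception": 2001,
}
UNKNOWN_ERROR = 9999  # assumed fallback for reasons without a mapping


def record_failure(target_file: dict, batch_file: dict, reason: str) -> int:
    """Write the mapped error code to the target file and tally it in the batch file."""
    code = ERROR_CODES.get(reason, UNKNOWN_ERROR)
    # Per-task record: used to analyze why this particular target task failed.
    target_file["error_code"] = code
    # Per-batch record: summarizes failure causes across the whole batch task.
    counts = batch_file.setdefault("error_counts", {})
    counts[code] = counts.get(code, 0) + 1
    return code
```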

[0095] The solution in this embodiment establishes an associated target management file for the target task, enabling monitoring and management of the execution status of a single task at the target task level. At the same time, by establishing an associated batch management file for batch tasks, it enables monitoring and management of the overall task execution status at the batch task level. This helps to improve the user's control over the execution of batch tasks and improve the accuracy of batch task execution.

[0096] Figure 4 is a schematic diagram of another batch task execution method according to an embodiment of the present disclosure. This embodiment is a further refinement of the above technical solution. The execution completed status includes execution success and execution failure. Correspondingly, monitoring and managing the execution progress of the target tasks according to the target management file includes: determining the failed target tasks based on the execution completed status in the target management files in the completed task set; and adding a backup target management file to the set of tasks to be executed for each failed target task, so that the target cluster or additional cluster can re-execute the failed target task. The technical solution in this embodiment can be combined with the optional solutions in one or more of the above embodiments. As shown in Figure 4, the batch task execution method includes the following:

[0097] S410. Receive batch tasks submitted by the user; establish associated batch management files for the batch tasks, and establish associated target management files for each target task included in the batch tasks, and save the associated target management files of the target tasks in the set of tasks to be executed.

[0098] S420. Based on the pre-registered cluster resource information, allocate target clusters and additional clusters for batch tasks.

[0099] S430. Obtain the execution data information generated when the target cluster executes the target task in the batch task, or obtain the execution data information generated when the target cluster and the additional cluster execute the target task in the batch task.

[0100] S440. Write the execution data information of the target task into the associated target management file; determine the batch task execution data information based on the execution data information of the target task, and write the batch task execution data information into the associated batch management file.

[0101] S450. If the execution status of the target task is obtained as execution completed, the associated target management file of the target task is moved from the set of tasks to be executed to the set of tasks completed; wherein, the execution completed status includes execution success and execution failure.

[0102] For example, if execution result data is obtained after the target task's code segment has been executed, the target task is determined to have executed successfully; otherwise, the target task is determined to have failed. Alternatively, if execution of the target task's code segment terminates before the code segment has run to completion, the target task is determined to have failed.

[0103] The target task in the execution completion state is further subdivided into target tasks that were successfully executed and target tasks that failed to execute. The further subdivision of the execution completion state is determined based on the execution feedback data.

[0104] S460. Based on the execution completed status in the target management files in the completed task set, determine the failed target tasks.

[0105] Since both successful and unsuccessful execution of a target task are considered as completed tasks, the associated target management files for that target task will appear in the completed task set. Therefore, by judging the execution status of the target management files from the completed task set, the failed target task can be found.

[0106] S470. Add a backup target management file to the set of tasks to be executed for the failed target task, so that the target cluster or additional cluster can re-execute the failed target task.

[0107] Since the failure of a target task may be accidental, it is necessary to re-execute failed tasks in order to ensure the success rate of batch tasks.

[0108] Specifically, after a failed target task is identified, a backup target management file is added to the set of tasks to be executed, based on the associated target management file of that failed target task in the completed task set. The backup target management file includes the data information required to execute the failed target task, as well as the aforementioned execution status. It is identical to the original target management file that was stored in the set of tasks to be executed and belongs to the same batch task. The corresponding target cluster or additional cluster locks and executes the backup target management file. For example, after the target cluster or additional cluster allocated to the batch task obtains the backup target management file from the set of tasks to be executed, it locks the file and re-executes the corresponding failed target task, repeating this until the number of failures of the target task exceeds a preset number or the target task is successfully executed.

[0109] The solution in this embodiment quickly locates failed target tasks from the set of completed tasks through a target management file, creates a backup target management file for the failed target tasks, and re-executes the failed target tasks by executing the backup target management file. This improves the efficiency and accuracy of re-execution of failed target tasks, thereby improving the overall execution accuracy of batch tasks.
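The retry flow of S460 and S470 might look like the sketch below. The dictionary fields (`status`, `failures`, `is_backup`, `backup_created`), the status strings, and the default `MAX_FAILURES` value are assumptions for illustration; the disclosure only specifies that retries stop once the failure count exceeds a preset number or the task succeeds.

```python
import copy

MAX_FAILURES = 3  # hypothetical preset number of allowed failures


def requeue_failed(completed: list, pending: list,
                   max_failures: int = MAX_FAILURES) -> list:
    """Scan the completed set and add a backup file to the pending set per failed task."""
    requeued = []
    for tm in completed:
        if tm["status"] != "failed" or tm.get("backup_created"):
            continue  # succeeded, or already handled in an earlier scan
        if tm["failures"] > max_failures:
            continue  # failure budget exhausted: give up on this task
        backup = copy.deepcopy(tm)   # same metadata as the original file
        backup["status"] = "pending"
        backup["is_backup"] = True
        pending.append(backup)
        tm["backup_created"] = True  # avoid queuing a second backup for this file
        requeued.append(tm["task_id"])
    return requeued
```

Marking the scanned file with `backup_created` is a design choice of this sketch so that repeated scans of the completed set stay idempotent.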

[0110] Figure 5 is a schematic diagram of another batch task execution method according to an embodiment of the present disclosure. This embodiment is a further refinement of the above technical solution. The execution data information includes at least the execution time. Correspondingly, monitoring and managing the execution progress of the target tasks by the additional cluster according to the target management file includes: determining the timed-out target tasks whose execution time exceeds a preset time threshold based on the execution time in each target management file in the set of tasks to be executed; and adding a backup target management file to the set of tasks to be executed for each timed-out target task, so that the target cluster or the additional cluster can re-execute the timed-out target task. The technical solution in this embodiment can be combined with the optional solutions in one or more of the above embodiments. As shown in Figure 5, the batch task execution method includes the following:

[0111] S510: Receive batch tasks submitted by users; establish associated batch management files for batch tasks, and establish associated target management files for each target task included in the batch tasks; save the associated target management files of the target tasks in the set of tasks to be executed.

[0112] S520: Based on the pre-registered cluster resource information, allocate target clusters and additional clusters for batch tasks.

[0113] S530. Obtain execution data information generated when the target cluster executes the target task in the batch task, or obtain execution data information generated when the target cluster and the additional cluster execute the target task in the batch task; wherein, the execution data information includes at least the execution time.

[0114] Execution time refers to the time difference between the start time of the target task and the current time. For example, the start time of the target task is determined, and the time difference between the start time and the current time is updated in real time as the execution time of the target task.

[0115] S540. Write the execution data information of the target task into the associated target management file; determine the batch task execution data information based on the execution data information of the target task, and write the batch task execution data information into the associated batch management file.

[0116] S550. Based on the execution time in each target management file in the set of tasks to be executed, determine the timeout target tasks whose execution time exceeds the preset time threshold.

[0117] The preset time threshold is the estimated execution time based on the execution status of the target task.

[0118] Since the set of tasks to be executed contains the tasks that have not been completed, examining the execution time in each target management file in that set identifies the tasks that have not been completed and whose execution time has exceeded the estimated execution time. These tasks are identified as timed-out target tasks. At this point, the timed-out target tasks are still running in a container in the target cluster or an additional cluster.

[0119] S560: Add a backup target management file to the set of tasks to be executed for the timed-out target task, so that the target cluster or additional cluster can re-execute the timed-out target task.

[0120] Since the timeout of the target task may be accidental, in order to ensure the running efficiency of batch tasks, the timeout task needs to be executed separately to ensure the rapid completion of the target task.

[0121] Specifically, after a timed-out target task is determined, a backup target management file is added to the set of tasks to be executed, based on the associated target management file of the timed-out target task in the set of tasks to be executed. The backup target management file includes the data information required to execute the timed-out target task, as well as the aforementioned execution status. It is identical to the original target management file in the set of tasks to be executed and belongs to the same batch task. The corresponding target cluster or additional cluster locks and executes the backup target management file until the timeout count of the timed-out target task exceeds a preset number or the timed-out target task is successfully executed. While the backup target management file of the timed-out target task is being executed, the original target management file of the timed-out target task continues to be executed at the same time. Once an execution result reported for any target management file of the target task is obtained and written to the batch management file, the other management files of that target task that have not finished executing stop running.

[0122] The solution in this embodiment quickly locates the timed-out target task from the set of tasks to be executed through the target management file, and creates a backup target management file for the timed-out target task. By executing the backup target management file, the timed-out target task is re-executed, which improves the efficiency and accuracy of re-execution of the timed-out target task, thereby improving the overall execution accuracy of batch tasks.
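The timeout check in S550 and the rule that the first reported result wins can be sketched as follows. The `start_time` field, the `results` key in the batch file, and the in-memory list standing in for the set of tasks to be executed are assumptions; the `now` parameter is only there to make the function easy to test.

```python
import time


def find_timed_out(pending, threshold_s, now=None):
    """Return ids of started tasks whose elapsed execution time exceeds the threshold."""
    now = time.time() if now is None else now
    timed_out = []
    for tm in pending:
        started = tm.get("start_time")
        if started is None:
            continue  # not yet picked up by any container
        if now - started > threshold_s:  # execution time = current time - start time
            timed_out.append(tm["task_id"])
    return timed_out


def report_result(batch_file, pending, task_id, result):
    """Record the first result for task_id and drop its remaining copies."""
    results = batch_file.setdefault("results", {})
    if task_id in results:
        return False  # another copy (original or backup) already reported
    results[task_id] = result
    # Stop the other management files of the same target task.
    pending[:] = [tm for tm in pending if tm["task_id"] != task_id]
    return True
```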

[0123] Figure 6 is a schematic diagram of another batch task execution method according to an embodiment of the present disclosure. This embodiment is a further refinement of the above technical solution. The batch task execution data information includes at least the execution status of each target task in the batch task. Correspondingly, monitoring and managing the overall execution progress of the batch task according to the batch management file includes: if it is determined from the execution status of each target task in the batch task's associated batch management file that the proportion of target tasks in the batch task that have not been completed is less than a preset proportion, determining the target tasks with an incomplete execution status in the batch task to be long-tail target tasks; and adding a backup target management file to the set of tasks to be executed for the long-tail target tasks, so that the target cluster or the additional cluster can re-execute the long-tail target tasks. The technical solution in this embodiment can be combined with the optional solutions in one or more of the above embodiments. As shown in Figure 6, the batch task execution method includes the following:

[0124] S610. Receive batch tasks submitted by the user; establish associated batch management files for the batch tasks, and establish associated target management files for each target task included in the batch tasks, and save the associated target management files of the target tasks in the set of tasks to be executed.

[0125] S620. Based on the pre-registered cluster resource information, allocate target clusters and additional clusters for batch tasks.

[0126] S630. Obtain the execution data information generated when the target cluster executes the target task in the batch task, or obtain the execution data information generated when the target cluster and the additional cluster execute the target task in the batch task.

[0127] S640. Write the execution data information of the target task into the associated target management file; determine the batch task execution data information based on the execution data information of the target task, and write the batch task execution data information into the associated batch management file; wherein, the batch task execution data information includes at least the execution status of each target task in the batch task.

[0128] After each target task writes its execution status to the associated management file, the overall execution status of the batch tasks is summarized based on the execution status of each target task, resulting in the execution status of each target task within the batch. For example, the batch task execution data includes the proportion of target tasks in the batch that are in the completed execution state and the proportion of target tasks that are not in the completed execution state.

[0129] S650. If, based on the execution status of each target task in the batch management file associated with the batch task, it is determined that the proportion of target tasks in the batch task that have not been completed is less than a preset proportion, then the target tasks in the batch task whose execution status is incomplete are determined to be long-tail target tasks.

[0130] When batch tasks are executed, most of the target tasks will be completed, while only a small number of target tasks will still be in the process of execution or not executed. This batch task is a long-tail batch task, and the target tasks that have not been completed in the long-tail batch task are all long-tail target tasks.

[0131] Specifically, the proportion of incomplete target tasks in a batch of tasks is determined. If this proportion is less than a preset proportion, the batch of tasks is classified as long-tail batch tasks, and the target tasks within the long-tail batch tasks that affect overall operational efficiency are identified as long-tail target tasks. Incomplete target tasks include both unexecuted target tasks and target tasks that are in the process of execution.

[0132] S660: Add a backup target management file to the set of tasks to be executed for the long-tail target tasks in the batch task, so that the target cluster or additional cluster can re-execute the long-tail target tasks.

[0133] Because the existence of long-tail target tasks affects the overall operation of batch tasks, in order to ensure the running efficiency of batch tasks, long-tail target tasks need to be executed separately to ensure their rapid completion.

[0134] Specifically, after a long-tail target task is identified, a backup target management file is added to the set of tasks to be executed, based on the associated target management file of the long-tail target task in the set of tasks to be executed. The backup target management file includes the data information required to execute the long-tail target task, as well as the aforementioned execution status. It is identical to the original target management file in the set of tasks to be executed and belongs to the same batch task. The corresponding target cluster or additional cluster locks and executes the backup target management file until the execution count of the long-tail target task exceeds a preset number or the long-tail target task is successfully executed. While the backup target management file of the long-tail target task is being executed, the original target management file of the long-tail target task continues to be executed at the same time. Once an execution result reported for any target management file of the long-tail target task is obtained and written to the batch management file, the other management files of that long-tail target task that have not finished executing stop running.

[0135] The solution in this embodiment identifies long-tail batch tasks that have timed out and have not been completed through a batch management file, and creates a backup target management file for the uncompleted target tasks in the long-tail batch tasks. By executing the backup target management file, the long-tail target tasks are re-executed, which improves the efficiency and accuracy of re-execution of long-tail target tasks, thereby improving the overall execution accuracy of batch tasks.
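The long-tail test of S650 reduces to a ratio check over the statuses recorded in the batch management file. The sketch below assumes a `task_status` dictionary mapping task ids to status strings, treats `succeeded` and `failed` as the completed states, and uses a hypothetical default preset proportion of 5%.

```python
COMPLETED_STATES = ("succeeded", "failed")  # assumed names for the completed states


def find_long_tail(batch_file, preset_ratio=0.05):
    """Return incomplete task ids if the batch task has entered its long-tail phase."""
    statuses = batch_file["task_status"]  # task_id -> execution status
    if not statuses:
        return []
    # Incomplete covers both unexecuted tasks and tasks still executing.
    incomplete = [tid for tid, s in statuses.items() if s not in COMPLETED_STATES]
    ratio = len(incomplete) / len(statuses)
    if 0 < ratio < preset_ratio:
        return incomplete  # these are the long-tail target tasks
    return []
```

For instance, with 100 target tasks of which 3 are still incomplete, the ratio is 0.03, which is below the assumed 0.05 threshold, so those 3 tasks would each receive a backup target management file.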

[0136] Figure 7 is a schematic diagram of a batch task execution system according to an embodiment of the present disclosure. This system can execute the batch task execution method involved in any embodiment of the present disclosure. Referring to Figure 7, the batch task execution system includes a user submission module (Submitter), a multi-cluster scheduling module (Launcher), a monitoring and recycling module (Tracker), a reporting and statistics module (Reporter), and a task management module (Task Manager).

[0137] The user submission module supports users to submit batch tasks in three ways. After receiving the batch tasks submitted by the user, the task management module creates an associated batch management file for the batch tasks to store metadata information. The task management module also supports the transactional addition, update and deletion of tasks.

[0138] The multi-cluster scheduling module is responsible for managing and scheduling cluster resources. It breaks down each batch task into its target tasks and creates an associated target management file for each target task. These associated target management files are stored in the task management module's set of tasks to be executed (input group). The task management module also maintains the completed task set (output group). The multi-cluster scheduling module integrates detailed parameter information for each registered physical cluster and is responsible for scheduling batch tasks to one or more clusters for execution based on cluster resource information. This module eliminates the need for users to perform underlying adaptations for different clusters. Users can schedule batch tasks to execution cluster queues either by specifying custom cluster resource queues or by using default configurations based on cluster resource information. The cluster queue creates a preset number of containers for each batch task to execute its target tasks. Figure 8 illustrates batch task scheduling. The same batch task (Job) can be scheduled on different queues, with the number and priority of containers customized based on queue resource availability. Higher priority tasks are allocated container resources first. The multi-cluster scheduling module is responsible for the unified management of cluster resources in different regions, ultimately storing the configurations of different clusters in the cloud. Updates to cluster configurations in the cloud take effect immediately without further processing, significantly expanding available computing resources.

[0139] The multi-cluster scheduling module includes a container execution engine, which is responsible for managing the execution cycle of the target task within the container. The execution cycle of the target task is shown in Figure 9. When any container in the cluster executes a target task, it first obtains the associated target management file (TM task) of the target task from the input group, and the container execution engine locks (owns) the TM task. The container runs the task business code according to the metadata information in the TM task. After the task is executed, the container execution engine deletes the target management file from the input group and writes the target management file to the output group.
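The lock, run, and move cycle of a TM task can be sketched with an in-process stand-in for the task management module. The `TaskManager` class, its method names, and the `owner` field are hypothetical; a real deployment would need a store shared across clusters rather than a `threading.Lock`.

```python
import threading


class TaskManager:
    """Hypothetical in-memory input group / output group with ownership locking."""

    def __init__(self, tm_tasks):
        self._lock = threading.Lock()
        self.input_group = {tm["task_id"]: tm for tm in tm_tasks}
        self.output_group = {}

    def acquire(self, owner):
        """Lock (own) the first unowned TM task, or return None if there is none."""
        with self._lock:
            for tm in self.input_group.values():
                if tm.get("owner") is None:
                    tm["owner"] = owner
                    return tm
        return None

    def complete(self, task_id, status):
        """Delete the TM task from the input group and write it to the output group."""
        with self._lock:
            tm = self.input_group.pop(task_id)
            tm["status"] = status
            self.output_group[task_id] = tm
```

Because ownership is assigned under the lock, two containers calling `acquire` concurrently cannot obtain the same TM task in this sketch.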

[0140] The execution flow of the target task within a single container is shown in Figure 10. The cluster first creates a container for the target task and sets up the environment for that container. The container execution engine initializes based on the target task's metadata and executes the business code segment of the target task through the business program entry executable file, writing the running status to the target management file in real time until the execution is complete. Error codes for the target task are collected, and the container execution engine determines whether the business code has finished running by checking the execution status in the target management file, thus initiating the container exit or container reuse process. While the business code is executing, the execution engine can also write runtime data (such as the time consumed at each stage) to the database, which allows the reporting and statistics module to later generate multi-dimensional analytical reports.

[0141] The container execution engine manages the task execution code, allowing users to focus on their own business logic. During the preparation phase, the code is passed into the container environment as a replacement package, achieving complete decoupling between business operators and the container.

[0142] The task management module itself has high reliability. It can ensure that no data is lost when the target management file fails to process. It also stores batch tasks in the form of batch management files and updates the task completion progress in real time. Once a fault occurs, it can promptly retrieve the execution status of the tasks before the fault from the management file in the task management module, so that user tasks can continue to execute after the fault is recovered, ensuring that no data is lost. This ensures the fault tolerance of the system from the granularity of batch tasks.

[0143] The monitoring and recycling module is responsible for monitoring the progress of batch tasks and collecting the results of target task execution. Specifically, the monitoring and recycling module recycles the target management files in the output group in real time and determines whether the most recent execution of each target task was successful. The processing flow of the monitoring and recycling module for completed target tasks is shown in Figure 11. Whether a target task succeeded is determined from its error code. If it failed, a retry mechanism is initiated: a backup target management file (backup TM task) for the failed task is added to the input group, and a container retrieves this backup target management file to rerun the task. This continues until the maximum number of retries is reached or the task succeeds. The monitoring and recycling module also calculates the time difference between the start and end of a target task to determine whether it has timed out. If a timeout occurs, the container is considered to have encountered a timeout failure, and the target task is rerun by adding a backup target management file. This continues until all target management files for that target task have timed out or one of them has succeeded. Occasionally, during the execution of a batch task, some target tasks take exceptionally long to execute, causing the entire batch task to enter a long-tail phase. When the monitoring and recycling module determines that the batch task has entered the long-tail phase, it likewise starts the backup target management file process, allowing idle containers to run the backup target management files of the long-tail target tasks and thereby speed up the long-tail phase. The timeout and long-tail handling process for target tasks is shown in Figure 12.

[0144] The reporting and statistics module is responsible for aggregating and calculating the data from completed target tasks, generating aggregated error codes and other report data. The results are also used to generate analysis reports. The module aggregates and calculates the failure frequency of different error types, and the aggregated results are fed back to the user through a front-end page or query interface. Users can understand the reason for a target task's abnormal execution based on the specific error code information. In addition to statistically analyzing target task failure information, the module can also statistically analyze the execution efficiency of each stage of the business code module, storing the aggregated results in a database for display through the front-end page or for querying through the interface. Users can quickly identify the root cause of task failure based on the aggregated error code statistics and then address it specifically. The reporting and statistics module can be configured with multi-dimensional analysis reports to assist in analyzing the main factors affecting task efficiency and stability. The execution of the reporting and statistics module is shown in Figure 13. The raw data of the operation is stored in the corresponding report database. Based on the characteristics of different databases, the required multidimensional data model is established, and data analysis services are provided to users through the BI platform or query interface, thereby helping users to quickly locate faults.
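The failure-frequency aggregation can be illustrated with a short sketch. It assumes that each recycled target management file is a dictionary whose optional `error_code` field is `None` or absent for successful tasks; these field conventions are not specified in the disclosure.

```python
from collections import Counter


def summarize_errors(output_group):
    """Count failures per error code, most frequent first."""
    counts = Counter(
        tm["error_code"]
        for tm in output_group
        if tm.get("error_code") is not None  # skip successful tasks
    )
    return counts.most_common()
```

A front-end page or query interface could then present the returned list directly, so that the most common failure cause appears first.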

[0145] This disclosure presents a general-purpose, highly available, and highly stable batch task execution system that supports flexible expansion across multiple clusters and provides unified management and scheduling of global cluster resources. This significantly improves the operational scale of batch computing tasks for autonomous driving, achieving a daily task volume exceeding one million. The underlying cluster physical resources are transparent to the user, and containers and business operators are completely decoupled, allowing users to focus more efficiently on business development. Furthermore, this disclosure provides optimization solutions for task execution failures caused by environmental anomalies or other circumstances, long-tail phenomena, and timeout failures, greatly improving the system's stability and robustness. In addition, error code collection and report analysis mechanisms can quickly improve the efficiency of users in locating problems and repairing anomalies.

[0146] Figure 14 is a schematic diagram of a batch task execution device according to an embodiment of the present disclosure. This device can execute the batch task execution method involved in any embodiment of the present disclosure. Referring to Figure 14, the batch task execution device 400 includes: a task receiving module 410, a cluster allocation module 420, and a task execution management module 430.

[0147] A task receiving module is used to receive batch tasks submitted by users; wherein the batch tasks include at least two target tasks.

[0148] The cluster allocation module is used to allocate a target cluster and an additional cluster to the batch tasks according to the pre-registered cluster resource information; wherein, the execution priority of the batch tasks in the additional cluster is lower than that of the original batch tasks in the additional cluster, and the original batch tasks are batch tasks that are allocated with the additional cluster as the target cluster.

[0149] The task execution management module is used to obtain execution data information generated by the target cluster when executing the target task in the batch task, or to obtain execution data information generated by the target cluster and the additional cluster when executing the target task in the batch task, and to monitor and manage the execution of the batch task based on the execution data information.

[0150] The solution in this embodiment sets up a target cluster and an additional cluster for batch tasks. The additional cluster executes batch tasks assigned to it as an additional cluster only after its own original batch tasks have been completed. This ensures the smooth execution of the original batch tasks and reduces the pressure on cluster resources by making use of clusters whose resources are idle, thereby improving the execution efficiency of batch tasks. It achieves full scheduling and utilization of global cluster resources and avoids leaving some cluster resources idle.
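One way to picture the priority rule is a per-cluster queue in which tasks that were assigned this cluster as their target cluster are always served before tasks that use it as an additional cluster. The `ClusterQueue` class and the two priority constants below are hypothetical; the disclosure does not prescribe a heap-based queue.

```python
import heapq
import itertools

ORIGINAL, GUEST = 0, 1  # lower value is served first


class ClusterQueue:
    """Hypothetical per-cluster queue: original batch tasks precede guest tasks."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # keeps FIFO order within one priority level

    def submit(self, task_id, as_target_cluster):
        prio = ORIGINAL if as_target_cluster else GUEST
        heapq.heappush(self._heap, (prio, next(self._seq), task_id))

    def next_task(self):
        """Pop the next task to run, or return None when the queue is empty."""
        return heapq.heappop(self._heap)[2] if self._heap else None
```

In this sketch, guest tasks only start to run once no original task is waiting, which mirrors the behavior described for the additional cluster.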

[0151] In an optional implementation of this embodiment, the apparatus further includes a task allocation module, configured to, before acquiring the execution data information generated when the target cluster and the additional cluster execute the target task in the batch tasks,

[0152] Receive the execution request of the target cluster for the batch tasks, and allocate target tasks to the target cluster;

[0153] Receive the execution request of the additional cluster for the batch tasks, and allocate target tasks to the additional cluster; wherein the additional cluster initiates the execution request for the batch tasks after completing the execution of its original batch tasks.

[0154] In an optional implementation of this embodiment, the apparatus further includes a management file creation module, used to, after receiving the batch tasks submitted by the user,

[0155] Create an associated batch management file for the batch tasks, and create an associated target management file for each target task included in the batch tasks;

[0156] Correspondingly, the task execution management module includes:

[0157] The target management file writing unit is used to write the execution data information of the target task into the associated target management file;

[0158] The batch management file writing unit is used to determine batch task execution data information based on the execution data information of the target task, and write the batch task execution data information into the associated batch management file.

[0159] The file monitoring and management unit is used to monitor and manage the execution progress of target tasks based on the target management file, and to monitor and manage the overall execution progress of batch tasks based on the batch management file.
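As a rough illustration of the batch management file and target management files, the sketch below stores them as JSON files in a temporary directory. The file naming scheme, the field names, and the helper functions are assumptions made for this example; the disclosure does not specify a file format.

```python
import json
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())  # stand-in for shared storage


def write_json(path, data):
    path.write_text(json.dumps(data))


def create_management_files(batch_id, task_ids):
    # One batch management file plus one target management file per task.
    write_json(root / f"{batch_id}.batch.json", {"batch": batch_id, "tasks": {}})
    for task_id in task_ids:
        write_json(root / f"{batch_id}.{task_id}.json", {"task": task_id})


def record_execution(batch_id, task_id, status):
    # Write task-level execution data into the target management file ...
    task_path = root / f"{batch_id}.{task_id}.json"
    task = json.loads(task_path.read_text())
    task["status"] = status
    write_json(task_path, task)
    # ... then derive batch-level data from it and write the batch file.
    batch_path = root / f"{batch_id}.batch.json"
    batch = json.loads(batch_path.read_text())
    batch["tasks"][task_id] = status
    write_json(batch_path, batch)


def batch_progress(batch_id, total):
    # Overall progress is read from the batch management file alone.
    batch = json.loads((root / f"{batch_id}.batch.json").read_text())
    done = sum(1 for s in batch["tasks"].values() if s in ("success", "failure"))
    return done / total


create_management_files("b1", ["map-1", "map-2"])
record_execution("b1", "map-1", "success")
progress = batch_progress("b1", total=2)
```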

[0160] In one optional implementation of this embodiment, the management file also includes task metadata information;

[0161] Accordingly, the device further includes a task set determination module, used to, after establishing associated batch management files for the batch tasks and establishing associated target management files for each target task included in the batch tasks,

[0162] Store the associated target management file of the target task in the set of tasks to be executed, so that when the target cluster or the additional cluster executes the batch tasks, it can obtain the metadata information of each target task in the batch tasks from the set of tasks to be executed, and lock the associated target management file of the target task to be executed.
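The set of tasks to be executed and the locking of a target management file can be sketched as follows. This is an in-memory illustration only: `PendingSet`, `claim`, and the `locked_by` field are hypothetical names, and a real deployment would need a lock that works across clusters rather than a `threading.Lock` within one process.

```python
import threading


class PendingSet:
    """Set of tasks to be executed, keyed by task id."""

    def __init__(self):
        self._files = {}  # task_id -> management file (a dict in this sketch)
        self._guard = threading.Lock()

    def add(self, task_id, metadata):
        with self._guard:
            self._files[task_id] = {"metadata": metadata, "locked_by": None}

    def claim(self, cluster_name):
        # Lock the first unlocked management file and return its metadata,
        # so that no two clusters execute the same target task.
        with self._guard:
            for task_id, f in self._files.items():
                if f["locked_by"] is None:
                    f["locked_by"] = cluster_name
                    return task_id, f["metadata"]
        return None


pending = PendingSet()
pending.add("map-1", {"map": "city"})
pending.add("map-2", {"map": "highway"})
a = pending.claim("target")
b = pending.claim("additional")
c = pending.claim("target")  # nothing left to claim
```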

[0163] In an optional implementation of this embodiment, the execution data information includes at least the execution status;

[0164] Correspondingly, the file monitoring and management unit is specifically used for:

[0165] If the execution status of the target task is found to be "execution completed", then the associated target management file of the target task is moved from the set of tasks to be executed to the set of tasks completed.

[0166] In an optional implementation of this embodiment, the file monitoring and management unit is specifically used for:

[0167] After determining that the execution status of the target task is execution completed, keep the execution environment of the target task in the target cluster or the additional cluster unchanged;

[0168] Filter other target tasks from the set of tasks to be executed that belong to the same batch task as the currently executed target task;

[0169] Based on the current execution environment, execute the other target tasks.
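The environment reuse described above, together with moving completed tasks out of the set of tasks to be executed, might look like the following sketch. The dictionaries standing in for the two task sets, the `environment` dictionary, and the `run_worker` function are assumptions for illustration.

```python
def run_worker(first_task, pending, completed, execute):
    """Run first_task, then other pending tasks of the same batch,
    reusing a single execution environment throughout."""
    environment = {"batch": pending[first_task]["batch"], "runs": 0}  # set up once
    current = first_task
    while current is not None:
        execute(current, environment)
        environment["runs"] += 1
        # Move the management file from the pending set to the completed set.
        completed[current] = pending.pop(current)
        # Select another pending task that belongs to the same batch, if any.
        current = next(
            (t for t, f in pending.items() if f["batch"] == environment["batch"]),
            None,
        )
    return environment


pending = {
    "b1/map-1": {"batch": "b1"},
    "b2/map-1": {"batch": "b2"},
    "b1/map-2": {"batch": "b1"},
}
completed = {}
order = []
env = run_worker("b1/map-1", pending, completed, lambda t, e: order.append(t))
```

Keeping the environment alive avoids repeating setup work for tasks of the same batch, while tasks of other batches (here `b2/map-1`) stay in the pending set.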

[0170] In an optional implementation of this embodiment, the execution completion status includes execution success and execution failure;

[0171] Correspondingly, the file monitoring and management unit is specifically used for:

[0172] Based on the execution completion status in the target management files in the set of completed tasks, identify the target tasks that failed to execute;

[0173] Add a backup target management file to the set of tasks to be executed for each failed target task, so that the target cluster or the additional cluster can re-execute the failed target task.
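A minimal sketch of this retry step is shown below. The `attempt` counter, the `backup_created` flag, and the `#retry` naming of backup files are hypothetical details added so that the same failure is not queued twice.

```python
import copy


def requeue_failed(pending, completed):
    """Add a backup management file to pending for every failed completed task."""
    requeued = []
    for task_id, f in completed.items():
        if f["status"] == "failure" and not f.get("backup_created"):
            backup = copy.deepcopy(f)
            backup.update(status="pending", attempt=f.get("attempt", 1) + 1)
            backup_id = f"{task_id}#retry{backup['attempt']}"
            pending[backup_id] = backup
            f["backup_created"] = True  # avoid queuing the same failure twice
            requeued.append(backup_id)
    return requeued


completed = {"map-1": {"status": "success"}, "map-2": {"status": "failure"}}
pending = {}
first_pass = requeue_failed(pending, completed)
second_pass = requeue_failed(pending, completed)
```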

[0174] In an optional implementation of this embodiment, the execution data information includes at least the execution time;

[0175] Correspondingly, the file monitoring and management unit is specifically used for:

[0176] Based on the execution time in each target management file in the set of tasks to be executed, determine the timeout target tasks whose execution time exceeds a preset time threshold;

[0177] Add a backup target management file to the set of tasks to be executed for each timed-out target task, so that the target cluster or the additional cluster can re-execute the timed-out target task.
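The timeout check can be sketched as follows, assuming each target management file records a hypothetical `started_at` timestamp in seconds and that the current time is passed in explicitly so the example is deterministic.

```python
def find_timed_out(pending, now, threshold_seconds):
    """Return ids of pending tasks that have been running longer than the threshold."""
    return [
        task_id
        for task_id, f in pending.items()
        if f.get("started_at") is not None
        and now - f["started_at"] > threshold_seconds
    ]


def add_backups(pending, task_ids):
    # A backup file lets any free cluster re-execute the timed-out task.
    for task_id in task_ids:
        pending[f"{task_id}#backup"] = {"started_at": None, "backup_of": task_id}


pending = {
    "map-1": {"started_at": 0.0},
    "map-2": {"started_at": 90.0},
    "map-3": {"started_at": None},  # not yet picked up by any cluster
}
timed_out = find_timed_out(pending, now=100.0, threshold_seconds=60.0)
add_backups(pending, timed_out)
```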

[0178] In an optional implementation of this embodiment, the batch task execution data information includes at least the execution status of each target task in the batch task;

[0179] Correspondingly, the file monitoring and management unit is specifically used for:

[0180] If, based on the execution status of each target task in the associated batch management file of the batch task, it is determined that the proportion of target tasks in the batch task whose execution status is incomplete is less than a preset proportion, then the target tasks in the batch task whose execution status is incomplete are determined to be long-tail target tasks.

[0181] For the long-tail target tasks in the batch tasks, add backup target management files to the set of tasks to be executed, so that the target cluster or the additional cluster can re-execute the long-tail target tasks.
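The long-tail rule can be illustrated with a short sketch. The status strings and the function name `find_long_tail` are assumptions, and the threshold of 0.2 is an arbitrary example value rather than one given in the disclosure.

```python
def find_long_tail(batch_status, max_incomplete_ratio):
    """Return incomplete tasks once they make up less than the preset proportion."""
    incomplete = [
        t for t, s in batch_status.items() if s not in ("success", "failure")
    ]
    if incomplete and len(incomplete) / len(batch_status) < max_incomplete_ratio:
        return incomplete
    return []


# Nine of ten tasks are done, so the remaining one is treated as long-tail.
late_batch = {f"map-{i}": "success" for i in range(9)}
late_batch["map-9"] = "running"
long_tail = find_long_tail(late_batch, max_incomplete_ratio=0.2)

# Early in a batch most tasks are still running, so none are long-tail yet.
early_batch = {"map-0": "success", "map-1": "running", "map-2": "running"}
not_yet = find_long_tail(early_batch, max_incomplete_ratio=0.2)
```

The tasks returned here would then receive backup target management files in the set of tasks to be executed, in the same way as failed or timed-out tasks.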

[0182] In an optional implementation of this embodiment, the execution data information includes at least the reason for the failure of the target task; wherein, a mapping relationship between candidate task failure reasons and execution failure error codes is established in advance;

[0183] Correspondingly, the file monitoring and management unit is specifically used for:

[0184] The execution failure error code is determined based on the reason for the failure of the target task, and the execution failure error code is written into the associated target management file of the failed target task and the associated batch management file of the batch task.
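A pre-established mapping from failure reasons to error codes could be as simple as a lookup table. The reasons, the codes, and the fallback code in this sketch are invented for illustration and do not come from the disclosure.

```python
# Hypothetical mapping, established in advance.
FAILURE_CODES = {
    "out_of_memory": "E1001",
    "map_not_found": "E2001",
    "timeout": "E3001",
}
UNKNOWN_FAILURE = "E9999"  # assumed fallback for unmapped reasons


def record_failure(target_file, batch_file, task_id, reason):
    """Write the error code into both the target and the batch management file."""
    code = FAILURE_CODES.get(reason, UNKNOWN_FAILURE)
    target_file["error_code"] = code
    batch_file.setdefault("error_codes", {})[task_id] = code
    return code


target_file = {"task": "map-2", "status": "failure"}
batch_file = {"batch": "b1"}
code = record_failure(target_file, batch_file, "map-2", "map_not_found")
fallback = record_failure({}, batch_file, "map-3", "disk_full")
```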

[0185] In an optional implementation of this embodiment, when simulating and verifying the autonomous driving algorithm, the batch task submitted by the user is an algorithm verification task, and the target task is an algorithm verification task under different maps.

[0186] The aforementioned batch task execution device can execute the batch task execution method provided in any embodiment of this disclosure, and has the corresponding functional modules and beneficial effects of the execution method. Technical details not described in detail in this embodiment can be found in the batch task execution method provided in any embodiment of this disclosure.

[0187] The batch task execution system in Figure 7 can serve as a feasible specific implementation architecture for the batch task execution device in Figure 14, wherein the task receiving module in the batch task execution device corresponds to the user submission module in the batch task execution system; the cluster allocation module and task allocation module in the batch task execution device correspond to the multi-cluster scheduling module in the batch task execution system; the management file creation module and the task set determination module in the batch task execution device correspond to the task management module in the batch task execution system; and the task execution management module in the batch task execution device corresponds to the monitoring and retrieval module and the reporting and statistics module in the batch task execution system.

[0188] The collection, storage, use, processing, transmission, provision, and disclosure of user personal information involved in the technical solution disclosed herein comply with the provisions of relevant laws and regulations and do not violate public order and good morals.

[0189] According to embodiments of this disclosure, this disclosure also provides an electronic device, a readable storage medium, and a computer program product.

[0190] Figure 15 shows a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the present disclosure described and / or claimed herein.

[0191] As shown in Figure 15, device 500 includes a computing unit 501, which can perform various appropriate actions and processes based on a computer program stored in read-only memory (ROM) 502 or a computer program loaded from storage unit 508 into random access memory (RAM) 503. RAM 503 may also store various programs and data required for the operation of device 500. The computing unit 501, ROM 502, and RAM 503 are interconnected via bus 504. Input / output (I / O) interface 505 is also connected to bus 504.

[0192] Multiple components in device 500 are connected to I / O interface 505, including: input unit 506, such as keyboard, mouse, etc.; output unit 507, such as various types of monitors, speakers, etc.; storage unit 508, such as disk, optical disk, etc.; and communication unit 509, such as network card, modem, wireless transceiver, etc. Communication unit 509 allows device 500 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.

[0193] The computing unit 501 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the various methods and processes described above, such as batch task execution methods. For example, in some embodiments, the batch task execution method may be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and / or installed on device 500 via ROM 502 and / or communication unit 509. When the computer program is loaded into RAM 503 and executed by the computing unit 501, one or more steps of the batch task execution method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform batch task execution methods by any other suitable means (e.g., by means of firmware).

[0194] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0195] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0196] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0197] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0198] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as data servers), or computing systems that include middleware components (e.g., application servers), or computing systems that include frontend components (e.g., user computers with graphical user interfaces or web browsers through which users can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., communication networks). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.

[0199] Computer systems can include clients and servers. Clients and servers are generally remote from each other and typically interact via a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. Servers can be cloud servers, servers in distributed systems, or servers incorporating blockchain technology.

[0200] It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution disclosed in this disclosure can be achieved, and this is not limited herein.

[0201] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this disclosure should be included within the scope of protection of this disclosure.

Claims

1. A method for executing batch tasks, comprising: receiving batch tasks submitted by a user, wherein the batch tasks include at least two target tasks; allocating a target cluster and an additional cluster to the batch tasks based on pre-registered cluster resource information, wherein the execution priority of the batch tasks in the additional cluster is lower than that of the original batch tasks in the additional cluster, and the original batch tasks are batch tasks that are allocated with the additional cluster as the target cluster; obtaining execution data information generated by the target cluster when executing the target task in the batch task, or obtaining execution data information generated by the target cluster and the additional cluster when executing the target task in the batch task, and monitoring and managing the execution of the batch task based on the execution data information; wherein, before obtaining the execution data information generated by the target cluster and the additional cluster when executing the target task in the batch task, the method further comprises: receiving an execution request for the batch tasks from the target cluster, and allocating to the target cluster target tasks in the batch tasks that have not yet been executed; and receiving an execution request for the batch tasks from the additional cluster, and allocating to the additional cluster target tasks in the batch tasks that have not yet been executed, wherein the additional cluster initiates the execution request for the batch tasks after completing the execution of the original batch tasks.

2. The method of claim 1, wherein, after receiving the batch tasks submitted by the user, the method further comprises: creating an associated batch management file for the batch tasks, and creating an associated target management file for each target task included in the batch tasks; and correspondingly, monitoring and managing the execution of the batch tasks based on the execution data information comprises: writing the execution data information of the target task into the associated target management file; determining batch task execution data information based on the execution data information of the target task, and writing the batch task execution data information into the associated batch management file; and monitoring and managing the execution progress of the target tasks according to the target management file, and monitoring and managing the overall execution progress of the batch tasks according to the batch management file.

3. The method of claim 2, wherein the management file further includes metadata information of the tasks; and correspondingly, after creating the associated batch management file for the batch tasks and creating the associated target management file for each target task included in the batch tasks, the method further comprises: storing the associated target management file of the target task in a set of tasks to be executed, so that when the target cluster or the additional cluster executes the batch tasks, it obtains the metadata information of each target task in the batch tasks from the set of tasks to be executed, and locks the associated target management file of the target task to be executed.

4. The method of claim 3, wherein the execution data information includes at least the execution status; and correspondingly, monitoring and managing the execution progress of the target tasks according to the target management file comprises: if the execution status of the target task is determined to be execution completed, moving the associated target management file of the target task from the set of tasks to be executed to a set of completed tasks.

5. The method of claim 4, wherein monitoring and managing the execution progress of the target tasks according to the target management file comprises: after determining that the execution status of the target task is execution completed, keeping the execution environment of the target task in the target cluster or the additional cluster unchanged; selecting, from the set of tasks to be executed, other target tasks that belong to the same batch task as the currently executed target task; and executing the other target tasks based on the current execution environment.

6. The method of claim 4, wherein the execution completion status includes execution success and execution failure; and correspondingly, monitoring and managing the execution progress of the target tasks according to the target management file comprises: identifying, based on the execution completion status in the target management files in the set of completed tasks, the target tasks that failed to execute; and adding a backup target management file to the set of tasks to be executed for each failed target task, so that the target cluster or the additional cluster re-executes the failed target task.

7. The method of claim 3, wherein the execution data information includes at least the execution time; and correspondingly, monitoring and managing the execution progress of the target tasks according to the target management file comprises: determining, based on the execution time in each target management file in the set of tasks to be executed, the timed-out target tasks whose execution time exceeds a preset time threshold; and adding a backup target management file to the set of tasks to be executed for each timed-out target task, so that the target cluster or the additional cluster re-executes the timed-out target task.

8. The method of claim 7, wherein the batch task execution data information includes at least the execution status of each target task in the batch task; and correspondingly, monitoring and managing the overall execution progress of the batch tasks according to the batch management file comprises: if it is determined, based on the execution status of each target task in the associated batch management file of the batch task, that the proportion of target tasks in the batch task whose execution status is incomplete is less than a preset proportion, determining the target tasks whose execution status is incomplete to be long-tail target tasks; and adding backup target management files to the set of tasks to be executed for the long-tail target tasks in the batch task, so that the target cluster or the additional cluster re-executes the long-tail target tasks.

9. The method according to claim 2, wherein, The execution data information includes at least the reason for the failure of the target task; wherein, a mapping relationship between the candidate task failure reasons and the execution failure error codes is established in advance. Accordingly, the execution progress of target tasks is monitored and managed according to the target management file, and the overall execution progress of batch tasks is monitored and managed according to the batch management file, including: The execution failure error code is determined based on the reason for the failure of the target task, and the execution failure error code is written into the associated target management file of the failed target task and the associated batch management file of the batch task.

10. The method according to claim 1, wherein, When simulating and verifying autonomous driving algorithms, the batch tasks submitted by the user are algorithm verification tasks, and the target tasks are algorithm verification tasks under different maps.

11. A batch task execution device, comprising: a task receiving module, used to receive batch tasks submitted by a user, wherein the batch tasks include at least two target tasks; a cluster allocation module, used to allocate a target cluster and an additional cluster to the batch tasks according to pre-registered cluster resource information, wherein the execution priority of the batch tasks in the additional cluster is lower than that of the original batch tasks in the additional cluster, and the original batch tasks are batch tasks that are allocated with the additional cluster as the target cluster; and a task execution management module, used to obtain execution data information generated by the target cluster when executing the target task in the batch task, or to obtain execution data information generated by the target cluster and the additional cluster when executing the target task in the batch task, and to monitor and manage the execution of the batch task based on the execution data information; wherein the device further includes a task allocation module, used to, before the execution data information generated by the target cluster and the additional cluster when executing the target task in the batch task is obtained: receive an execution request for the batch tasks from the target cluster, and allocate to the target cluster target tasks in the batch tasks that have not yet been executed; and receive an execution request for the batch tasks from the additional cluster, and allocate to the additional cluster target tasks in the batch tasks that have not yet been executed, wherein the additional cluster initiates the execution request for the batch tasks after completing the execution of the original batch tasks.

12. The apparatus of claim 11, wherein, The device also includes a management file creation module, used after receiving the batch tasks submitted by the user, Create an associated batch management file for the batch tasks, and create an associated target management file for each target task included in the batch tasks; Correspondingly, the task execution management module includes: The target management file writing unit is used to write the execution data information of the target task into the associated target management file; The batch management file writing unit is used to determine batch task execution data information based on the execution data information of the target task, and write the batch task execution data information into the associated batch management file. The file monitoring and management unit is used to monitor and manage the execution progress of target tasks based on the target management file, and to monitor and manage the overall execution progress of batch tasks based on the batch management file.

13. The apparatus of claim 12, wherein the management file further includes metadata information of the tasks; and correspondingly, the device further includes a task set determination module, used to, after the associated batch management file is created for the batch tasks and the associated target management file is created for each target task included in the batch tasks: store the associated target management file of the target task in a set of tasks to be executed, so that when the target cluster or the additional cluster executes the batch tasks, it obtains the metadata information of each target task in the batch tasks from the set of tasks to be executed, and locks the associated target management file of the target task to be executed.

14. The apparatus of claim 13, wherein, The execution data information includes at least the execution status; Correspondingly, the file monitoring and management unit is specifically used for: If the execution status of the target task is found to be "execution completed", then the associated target management file of the target task is moved from the set of tasks to be executed to the set of tasks completed.

15. The apparatus of claim 14, wherein the file monitoring and management unit is specifically used for: after determining that the execution status of the target task is execution completed, keeping the execution environment of the target task in the target cluster or the additional cluster unchanged; selecting, from the set of tasks to be executed, other target tasks that belong to the same batch task as the currently executed target task; and executing the other target tasks based on the current execution environment.

16. The apparatus of claim 14, wherein the execution completion status includes execution success and execution failure; and correspondingly, the file monitoring and management unit is specifically used for: identifying, based on the execution completion status in the target management files in the set of completed tasks, the target tasks that failed to execute; and adding a backup target management file to the set of tasks to be executed for each failed target task, so that the target cluster or the additional cluster re-executes the failed target task.

17. The apparatus of claim 13, wherein the execution data information includes at least the execution time; and correspondingly, the file monitoring and management unit is specifically used for: determining, based on the execution time in each target management file in the set of tasks to be executed, the timed-out target tasks whose execution time exceeds a preset time threshold; and adding a backup target management file to the set of tasks to be executed for each timed-out target task, so that the target cluster or the additional cluster re-executes the timed-out target task.

18. The apparatus of claim 17, wherein the batch task execution data information includes at least the execution status of each target task in the batch task; and correspondingly, the file monitoring and management unit is specifically used for: if it is determined, based on the execution status of each target task in the associated batch management file of the batch task, that the proportion of target tasks in the batch task whose execution status is incomplete is less than a preset proportion, determining the target tasks whose execution status is incomplete to be long-tail target tasks; and adding backup target management files to the set of tasks to be executed for the long-tail target tasks in the batch task, so that the target cluster or the additional cluster re-executes the long-tail target tasks.

19. The apparatus of claim 12, wherein, The execution data information includes at least the reason for the failure of the target task; wherein, a mapping relationship between the candidate task failure reasons and the execution failure error codes is established in advance. Correspondingly, the file monitoring and management unit is specifically used for: The execution failure error code is determined based on the reason for the failure of the target task, and the execution failure error code is written into the associated target management file of the failed target task and the associated batch management file of the batch task.

20. The apparatus of claim 11, wherein, When simulating and verifying autonomous driving algorithms, the batch tasks submitted by the user are algorithm verification tasks, and the target tasks are algorithm verification tasks under different maps.

21. An electronic device, comprising: At least one processor; as well as A memory communicatively connected to the at least one processor; wherein, The memory stores instructions executable by the at least one processor, which, when executed by the at least one processor, enables the at least one processor to perform the method of any one of claims 1-10.

22. A non-transitory computer readable storage medium having stored thereon computer instructions, wherein, The computer instructions are used to cause the computer to perform the method according to any one of claims 1-10.

23. A computer program product comprising computer programs / instructions, characterized in that, When the computer program / instructions are executed by the processor, they implement the steps of the method according to any one of claims 1-10.