Fault task transfer method and device, computer device and storage medium
By using a distributed application coordination center to detect faulty executors and allocate transfer executors, the problem of inefficiency when task executors fail is solved, enabling efficient task transfer and execution, reducing resource waste and the burden on the control server.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- INDUSTRIAL AND COMMERCIAL BANK OF CHINA
- Filing Date
- 2023-03-15
- Publication Date
- 2026-06-16
AI Technical Summary
When existing task executors malfunction or crash, tasks cannot be completed on time, resulting in low execution efficiency and even irreversible losses.
A distributed application coordination center is introduced to detect faulty executors and obtain task execution breakpoints. Based on task attributes, transfer executors are assigned, and unexecuted tasks are transferred to other executors for continued execution through the coordination center, ensuring that tasks are not duplicated and improving efficiency.
Reduce resource waste, improve task execution efficiency, reduce the workload of the control server, and ensure that tasks are completed efficiently in the event of a failure.

Figure CN116302425B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology, and in particular to a method, apparatus, computer device, and storage medium for transferring faulty tasks.
Background Technology
[0002] With the development of modern technology, artificial intelligence is increasingly replacing human labor in performing tasks, and requirements for task execution efficiency keep rising. However, if a task executor malfunctions or crashes while performing a task, the task stops until the executor is repaired. In most cases, repairing the task executor takes a long time, and waiting for the repair before continuing the task greatly reduces task execution efficiency.
[0003] Moreover, urgent tasks are frequently encountered in practice. When a task executor fails or crashes, such a task cannot be completed on time and may even cause irreversible losses. Therefore, how to complete tasks efficiently when a task executor fails or crashes has become an urgent problem to be solved.
Summary of the Invention
[0004] Therefore, to address the above technical problems, it is necessary to provide a method, apparatus, computer device, and storage medium for transferring faulty tasks that can still complete tasks efficiently when a task executor fails or crashes.
[0005] Firstly, this application provides a method for transferring a faulty task. The method includes:
[0006] If the distributed application coordination center detects a faulty executor in the executor cluster that is executing a task, then the task execution breakpoint of the faulty executor is obtained.
[0007] Based on the task execution breakpoint, obtain the unexecuted tasks of the faulty executor;
[0008] Assign a transfer executor to the unexecuted task based on its task attributes;
[0009] Through the distributed application coordination center, unexecuted tasks are distributed to the transfer executors, and the task execution results of the unexecuted tasks are received from the transfer executors.
[0010] In one embodiment, detecting, by the distributed application coordination center, a faulty executor in the executor cluster that is executing a task includes:
[0011] If it is detected that the executor registration information maintained by the distributed application coordination center has been deleted, the deleted executor will be identified based on the deleted executor registration information. The executor registration information is registered with the distributed application coordination center by each executor in the executor cluster after establishing a heartbeat connection with the distributed application coordination center. The distributed application coordination center will monitor the heartbeat connection with each executor in real time and delete the executor registration information of each executor whose heartbeat is broken.
[0012] Determine if the deleted executor is currently executing a task;
[0013] If so, the deleted executor is designated as a faulty executor.
[0014] In one embodiment, unexecuted tasks in the faulty executor stop executing once the faulty executor detects a break in its heartbeat connection with the distributed application coordination center.
[0015] In one embodiment, assigning a transfer executor to the unexecuted task based on the task attributes of the unexecuted task includes:
[0016] Determine whether an unexecuted task has expired based on its time attribute.
[0017] If not, then determine whether to split the unexecuted tasks based on the task complexity attribute of the unexecuted tasks;
[0018] If so, based on the task resource attributes of the unexecuted task, the unexecuted task is split into at least two subtasks, and a transfer executor is determined for each of the split subtasks; wherein, the task resource attributes include at least one of computing resource usage, network resource usage, and disk interface usage.
[0019] In one embodiment, based on the task resource attributes of the unexecuted task, the unexecuted task is split into at least two subtasks, and a transfer executor is determined for each of the split subtasks, including:
[0020] Based on the task resource attributes of the unexecuted tasks, determine the splitting method for the unexecuted tasks; wherein the splitting method includes physical splitting and/or horizontal splitting;
[0021] Based on the splitting method, unexecuted tasks are split into at least two sub-tasks;
[0022] Based on the task resource attributes of each subtask after splitting, a transfer executor is determined for each subtask; wherein, the transfer executor corresponding to each subtask contains at least one task execution thread.
[0023] In one embodiment, determining the splitting method for the unexecuted tasks based on their task resource attributes includes:
[0024] If the computational resource usage in the resource attributes of an unexecuted task is greater than the computational resource threshold, then the splitting method for the unexecuted task is determined to include physical splitting.
[0025] If the network resource usage in the task resource attributes of an unexecuted task exceeds the network resource threshold, or the disk interface usage exceeds the interface resource threshold, then the splitting method for the unexecuted task is determined to include horizontal splitting.
[0026] In one embodiment, obtaining the task execution breakpoint of the faulty executor includes:
[0027] The task execution breakpoint of the faulty executor is determined based on the task execution results received from the faulty executor and the original execution task issued to the faulty executor.
[0028] Secondly, this application also provides a faulty task transfer device. The device includes:
[0029] The breakpoint acquisition module is used to acquire the task execution breakpoint of the faulty executor if the distributed application coordination center detects that there is a faulty executor executing a task in the executor cluster.
[0030] The task acquisition module is used to acquire the unexecuted tasks of the faulty executor based on the task execution breakpoint;
[0031] The executor allocation module is used to allocate transfer executors to unexecuted tasks based on their task attributes.
[0032] The result receiving module is used to send unexecuted tasks to the transfer executors through the distributed application coordination center, and to receive the task execution results of the unexecuted tasks from the transfer executors.
[0033] Thirdly, this application also provides a computer device. The computer device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to perform the following steps:
[0034] If the distributed application coordination center detects a faulty executor in the executor cluster that is executing a task, then the task execution breakpoint of the faulty executor is obtained.
[0035] Based on the task execution breakpoint, obtain the unexecuted tasks of the faulty executor;
[0036] Assign a transfer executor to the unexecuted task based on its task attributes;
[0037] Through the distributed application coordination center, unexecuted tasks are distributed to the transfer executors, and the task execution results of the unexecuted tasks are received from the transfer executors.
[0038] Fourthly, this application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program thereon, which, when executed by a processor, performs the following steps:
[0039] If the distributed application coordination center detects a faulty executor in the executor cluster that is executing a task, then the task execution breakpoint of the faulty executor is obtained.
[0040] Based on the task execution breakpoint, obtain the unexecuted tasks of the faulty executor;
[0041] Assign a transfer executor to the unexecuted task based on its task attributes;
[0042] Through the distributed application coordination center, unexecuted tasks are distributed to the transfer executors, and the task execution results of the unexecuted tasks are received from the transfer executors.
[0043] Fifthly, this application also provides a computer program product. The computer program product includes a computer program that, when executed by a processor, performs the following steps:
[0044] If the distributed application coordination center detects a faulty executor in the executor cluster that is executing a task, then the task execution breakpoint of the faulty executor is obtained.
[0045] Based on the task execution breakpoint, obtain the unexecuted tasks of the faulty executor;
[0046] Assign a transfer executor to the unexecuted task based on its task attributes;
[0047] Through the distributed application coordination center, unexecuted tasks are distributed to the transfer executors, and the task execution results of the unexecuted tasks are received from the transfer executors.
[0048] In the above method, apparatus, computer device, and storage medium for transferring faulty tasks, a distributed application coordination center is introduced to perform fault detection on each executor in the executor cluster. After a faulty executor is detected, its execution breakpoint is determined, the unexecuted tasks after the breakpoint are obtained, a transfer executor is determined to continue execution based on the attributes of the unexecuted tasks, and the task execution results are obtained from the transfer executor. Because only the unexecuted tasks after the breakpoint are transferred to other executors, previously executed tasks are not executed again, which reduces resource waste and improves task execution efficiency. Furthermore, because the attributes of the unexecuted tasks are considered when selecting transfer executors, the choice of transfer executors is more reasonable. In addition, delegating fault detection and task distribution to the distributed application coordination center reduces the workload of the control server (i.e., the entity that executes this solution), thereby effectively reducing the resource consumption of the control server.
Attached Figure Description
[0049] Figure 1 is an application environment diagram of a faulty task transfer method in one embodiment;
[0050] Figure 2 is a flowchart of a faulty task transfer method in one embodiment;
[0051] Figure 3 is a flowchart of a method for determining a faulty executor in one embodiment;
[0052] Figure 4 is a flowchart of a method for determining a transfer executor in one embodiment;
[0053] Figure 5 is a flowchart of a method for determining a transfer executor in another embodiment;
[0054] Figure 6A is a flowchart of a faulty task transfer method in another embodiment;
[0055] Figure 6B is a schematic diagram of the executor registration principle in one embodiment;
[0056] Figure 6C is a schematic diagram of the task transfer principle in one embodiment;
[0057] Figure 7 is a structural block diagram of a faulty task transfer device in one embodiment;
[0058] Figure 8 is a structural block diagram of a faulty task transfer device in another embodiment;
[0059] Figure 9 is a structural block diagram of a faulty task transfer device in another embodiment;
[0060] Figure 10 is an internal structural diagram of a computer device in one embodiment.
Detailed Implementation
[0061] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0062] The faulty task transfer method provided in the embodiments of this application can be applied in the application environment shown in Figure 1, in which the executor cluster 102 communicates with the control server 104 through a distributed application coordination center. In this embodiment, at least two task executors are deployed in the executor cluster 102. Each task executor can be, but is not limited to, a personal computer, laptop, smartphone, tablet, IoT device, or portable wearable device. IoT devices can include smart speakers, smart TVs, smart air conditioners, smart in-vehicle devices, etc. Portable wearable devices can include smartwatches, smart bracelets, head-mounted devices, etc. The control server 104 can be implemented as a standalone server or as a server cluster consisting of multiple servers. When the control server 104 detects, through the heartbeat monitoring module of the distributed application coordination center, that an executor in the executor cluster 102 has failed while executing a task, the control server 104 obtains the task execution breakpoint of the faulty executor. Based on the task execution breakpoint, it obtains the unexecuted tasks of the faulty executor and assigns transfer executors to the unexecuted tasks according to their task attributes. It then re-issues the unexecuted tasks to each transfer executor in the executor cluster 102 through the instruction issuing module of the distributed application coordination center. Each transfer executor in the executor cluster 102 executes the issued tasks and feeds back its task execution results to the control server 104 through the result feedback module of the distributed application coordination center.
[0063] In one embodiment, as shown in Figure 2, a method for transferring faulty tasks is provided. Taking its application to the control server in Figure 1 as an example, the method includes the following steps:
[0064] S201. If the distributed application coordination center detects a faulty executor in the executor cluster that is currently executing a task, then obtain the task execution breakpoint of the faulty executor.
[0065] The distributed application coordination center establishes heartbeat connections with the executor cluster and monitors the task execution status of each executor in the cluster on behalf of the control server. The distributed application coordination center can be ZooKeeper. The executor cluster consists of at least two executors, which execute tasks issued by the control server through the distributed application coordination center. While the executors execute tasks, the control server receives their task execution status feedback through the distributed application coordination center. The task execution breakpoint records the position at which the executing task was interrupted when an executor failed.
[0066] Optionally, the control server controls the distributed application coordination center to monitor the task execution status of each executor in the executor cluster through a heartbeat session mechanism. When the control server detects a faulty executor through the distributed application coordination center, it will locate the executor by searching sequentially and determine whether the executor is currently executing a task. If so, the control server will use the task processing progress feedback from the faulty executor last obtained through the distributed application coordination center as the breakpoint.
[0067] Specifically, regarding how to obtain the task execution breakpoint, it can be determined by the task execution result received from the faulty executor and the original execution task sent to the faulty executor.
[0068] The original execution task is the task that the control server issued to the faulty executor for processing before the fault occurred.
[0069] For example, the original execution task being executed by the task executor may contain many subtasks, such as 100 subtasks. After the task executor completes each subtask, the control server will obtain the task execution result from the task executor through the distributed application coordination center. If the executor fails while executing the 59th subtask in the original execution task, the 59th subtask in the original execution task can be considered as the task execution breakpoint.
[0070] S202, based on the task execution breakpoint, obtain the unexecuted tasks of the faulty executor.
[0071] Here, the unexecuted tasks are the parts of the original execution task that the faulty executor should have executed but had not yet executed when it failed.
[0072] Optionally, after the task execution breakpoint is obtained, the tasks from the breakpoint position onward in the original execution task are determined to be the unexecuted tasks of the faulty executor.
[0073] For example, if the original execution task being executed by the executor contains 100 subtasks, and the 59th subtask in the original execution task is the task execution breakpoint, then the 59th to 100th subtasks in the original execution task are the unexecuted tasks of the faulty executor.
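The breakpoint and unexecuted-task rules in the example above can be sketched in Python. This is a minimal illustration that assumes subtasks run strictly in order and that a result is reported after each one completes; the function names and subtask identifiers are hypothetical and not taken from the patent.

```python
from typing import List, Sequence, Set


def find_breakpoint(original_task: Sequence[str], completed: Set[str]) -> int:
    """Return the 1-based position of the first subtask with no reported result.

    Assumes subtasks run in order and a result is reported after each one,
    as in the 100-subtask example. Returns len(original_task) + 1 if every
    subtask has reported a result.
    """
    for position, subtask_id in enumerate(original_task, start=1):
        if subtask_id not in completed:
            return position
    return len(original_task) + 1


def unexecuted_tasks(original_task: Sequence[str], position: int) -> List[str]:
    """Subtasks from the breakpoint (inclusive) to the end of the original task."""
    return list(original_task[position - 1:])


# Usage: results have been received for subtasks 1..58 when the executor fails.
original = [f"subtask-{i}" for i in range(1, 101)]
done = {f"subtask-{i}" for i in range(1, 59)}
bp = find_breakpoint(original, done)
remaining = unexecuted_tasks(original, bp)
print(bp, remaining[0], remaining[-1], len(remaining))  # 59 subtask-59 subtask-100 42
```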
[0074] S203, assign a transfer executor to the unexecuted task based on the task attributes of the unexecuted task.
[0075] Here, task attributes characterize various properties of a task and may include, but are not limited to, time attributes, complexity attributes, and task resource attributes. The transfer executor is the executor that takes over execution of the unfinished task when the original executor fails.
[0076] Optionally, based on the task attributes of the unexecuted task, such as the time attribute, if the time attribute indicates that the task needs to be completed in a short period of time, then a transfer executor with higher execution efficiency can be assigned to the unexecuted task; alternatively, based on the complexity attribute of the unexecuted task, if the complexity of the unexecuted task is high, then the task can be split and assigned to multiple transfer executors; alternatively, based on the task resource attribute of the unexecuted task, a transfer executor that meets the task resource attribute requirements of the unexecuted task can be assigned to the unexecuted task.
[0077] In addition, when assigning transfer executors to unexecuted tasks based on their task attributes, besides considering the task attributes, the idle status of each executor in the executor cluster can also be taken into account. Under the premise of satisfying the task attributes, the transfer executors are assigned according to the principle of prioritizing idle ones.
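A minimal sketch of S203 with the idle-first rule, assuming each executor reports hypothetical `free_cpu`, `free_net`, `idle`, and `efficiency` fields and that the time attribute has already been reduced to an `urgent` flag. The patent does not define a scoring scheme; the ranking tuple below is one illustrative choice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Executor:
    name: str
    idle: bool
    efficiency: float  # hypothetical relative speed score
    free_cpu: float    # hypothetical available computing resources
    free_net: float    # hypothetical available network resources


@dataclass
class Task:
    cpu: float    # computing resource usage
    net: float    # network resource usage
    urgent: bool  # assumed to be derived from the time attribute


def assign_transfer_executor(task: Task, cluster: List[Executor]) -> Optional[Executor]:
    # Keep only executors whose free resources satisfy the task's resource attributes.
    candidates = [e for e in cluster if e.free_cpu >= task.cpu and e.free_net >= task.net]
    if not candidates:
        return None

    # Idle executors rank first; for urgent tasks, higher efficiency breaks ties.
    # min() returns the first of equally ranked candidates, i.e. cluster order.
    def rank(e: Executor):
        return (not e.idle, -e.efficiency if task.urgent else 0.0)

    return min(candidates, key=rank)


cluster = [
    Executor("A", idle=False, efficiency=2.0, free_cpu=8, free_net=100),
    Executor("B", idle=True, efficiency=1.0, free_cpu=8, free_net=100),
    Executor("C", idle=True, efficiency=3.0, free_cpu=2, free_net=100),  # too little CPU
    Executor("D", idle=True, efficiency=1.5, free_cpu=16, free_net=200),
]
print(assign_transfer_executor(Task(cpu=4, net=50, urgent=True), cluster).name)   # D
print(assign_transfer_executor(Task(cpu=4, net=50, urgent=False), cluster).name)  # B
```

With `urgent=True`, D wins over B because both are idle and D is more efficient; with `urgent=False`, the first idle candidate in cluster order (B) is chosen.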
[0078] S204, distribute the unexecuted tasks to the transfer executors through the distributed application coordination center, and receive the task execution results of the unexecuted tasks from the transfer executors.
[0079] Optionally, the control server can control the distributed application coordination center to send unexecuted tasks to the transfer executor via a heartbeat session. The transfer executor will execute the unexecuted tasks while simultaneously feeding back the execution results of the unexecuted tasks to the distributed application coordination center via a heartbeat session.
[0080] In the above embodiments, a distributed application coordination center is introduced to perform fault detection on each executor in the executor cluster. Upon detecting a faulty executor, the control server determines its execution breakpoint, obtains the unexecuted tasks after the breakpoint, determines a transfer executor to continue executing them based on their attributes, and obtains the task execution results from the transfer executor. Because only the unexecuted tasks after the breakpoint are transferred to other executors, previously executed tasks are not executed again, which reduces resource waste and improves task execution efficiency. Furthermore, considering the attributes of the unexecuted tasks when selecting a transfer executor makes executor selection more reasonable. In addition, delegating fault detection and task distribution to the distributed application coordination center reduces the workload of the control server (i.e., the entity that executes this solution), thereby effectively reducing its resource consumption.
[0081] Based on the above embodiments, as shown in Figure 3, this embodiment details how to detect a faulty executor that is executing a task within the executor cluster. The method specifically includes:
[0082] S301: If it is detected that the executor registration information maintained by the distributed application coordination center has been deleted, then the deleted executor is determined based on the deleted executor registration information.
[0083] The executor registration information is the information, such as the executor number and executor function, that each executor in the executor cluster registers with the distributed application coordination center after establishing a heartbeat connection with it. The distributed application coordination center monitors its heartbeat connection with each executor in real time and deletes the registration information of any executor whose heartbeat is disconnected.
[0084] Specifically, each executor cluster includes at least two executors. The control server connects to the distributed application coordination center through a heartbeat mechanism, and the distributed application coordination center in turn connects to the executors in the executor cluster through a heartbeat mechanism. After starting up, each executor in the cluster registers its own information with the distributed application coordination center. When an executor fails or crashes, its connection with the distributed application coordination center is broken, and the distributed application coordination center deletes that executor's registration information after detecting the disconnection.
[0085] Optionally, the control server periodically retrieves the executor registration information from the distributed application coordination center and compares it with the registration information retrieved in the previous period. When the registration information of an executor that was present in the previous period is missing from the current retrieval, the control server determines that executor to be a deleted executor.
[0086] S302, determine whether the deleted executor is currently executing a task. If yes, execute S303; otherwise, return to execute S301.
[0087] Optionally, this step determines whether a task is being executed in the deleted executor. If so, S303 is performed and the deleted executor is treated as a faulty executor. If not, the process returns to monitoring the executor registration information maintained by the distributed application coordination center for further deletions.
[0088] S303, if so, designate the deleted executor as a faulty executor.
[0089] Optionally, if it is determined that a task is being executed in the deleted executor, then the deleted executor is determined to be a faulty executor.
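Steps S301 to S303 amount to a set difference between successive registration snapshots followed by a filter on running tasks. In ZooKeeper this behaviour is commonly obtained by registering each executor as an ephemeral znode, which the server removes when the session that created it ends; the patent does not prescribe a node layout, so the sketch below works on plain Python sets. `running_tasks` (executor name to task ID) is a hypothetical record kept by the control server.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def detect_faulty_executors(
    previous: Set[str], current: Set[str], running_tasks: Dict[str, str]
) -> Tuple[Set[str], Set[str]]:
    """Return (deleted, faulty) executors between two registration snapshots."""
    deleted = previous - current                          # S301: registration removed
    faulty = {e for e in deleted if e in running_tasks}   # S302/S303: was executing a task
    return deleted, faulty


def monitor(snapshots: Iterable[Set[str]], running_tasks: Dict[str, str]) -> Iterator[str]:
    """Poll successive snapshots; deleted but idle executors are ignored (back to S301)."""
    previous = None
    for current in snapshots:
        if previous is not None:
            _, faulty = detect_faulty_executors(previous, current, running_tasks)
            yield from sorted(faulty)
        previous = current


snapshots = [{"e1", "e2", "e3"}, {"e1", "e2", "e3"}, {"e1", "e3"}, {"e1"}]
print(list(monitor(snapshots, running_tasks={"e2": "task-7"})))  # ['e2']
```

Here `e3` is also deleted, but because it was not running a task it is not reported as faulty, matching the "otherwise, return to S301" branch.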
[0090] It should be noted that unexecuted tasks in the faulty executor stop executing once the faulty executor detects a break in its heartbeat connection with the distributed application coordination center. Specifically, regardless of whether the executor has actually failed, as soon as it detects that the heartbeat connection is broken, it stops the tasks it is currently executing. This prevents tasks from being executed twice when an executor only appears to be dead (it is still running but has lost its connection to the coordination center) while its unexecuted tasks have already been transferred to another executor.
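On the executor side, the rule above is a form of self-fencing: the executor checks its connection state before each subtask and stops as soon as the heartbeat is lost. A minimal sketch, assuming a hypothetical `heartbeat_alive` callback wired to the coordination-center client's session-state notifications (not shown). In this simplification the check happens between subtasks, so a subtask already in progress runs to completion; a real executor might also interrupt it.

```python
import threading
from typing import Callable, Iterable, List


def run_with_fencing(
    subtasks: Iterable[int],
    heartbeat_alive: Callable[[], bool],
    execute: Callable[[int], None],
) -> List[int]:
    """Execute subtasks in order, stopping as soon as the heartbeat is reported lost.

    Returns the subtasks that were completed; the rest are left for the
    transfer executor chosen by the control server.
    """
    completed = []
    for subtask in subtasks:
        if not heartbeat_alive():
            break  # fence: do not start work the coordinator may already have reassigned
        execute(subtask)
        completed.append(subtask)
    return completed


# Usage: a (simulated) session-loss event fires while subtask 3 is running.
session_lost = threading.Event()


def execute(subtask: int) -> None:
    if subtask == 3:
        session_lost.set()  # stands in for the client library's disconnect callback


done = run_with_fencing(range(1, 6), lambda: not session_lost.is_set(), execute)
print(done)  # [1, 2, 3]
```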
[0091] The above embodiment identifies deleted executors by monitoring the deletion of executor registration information maintained by the distributed application coordination center, and then identifies faulty executors based on the task execution status of the deleted executors. This allows for more accurate and faster identification of faulty executors.
[0092] To show in more detail how to assign a transfer executor to an unexecuted task, this embodiment, as shown in Figure 4, explains the conditions for splitting unexecuted tasks. The method specifically includes:
[0093] S401, Begin.
[0094] S402: Based on the task time attribute of the unexecuted task, determine whether the unexecuted task has expired. If yes, return to execute S401; otherwise, execute S403.
[0095] Here, the task time attribute is a time-related attribute of the task, such as its validity period.
[0096] Optionally, the task validity period in the task time attribute of the unexecuted task can be compared with the current time. If the current time is not within the validity period of the unexecuted task, the unexecuted task is determined to be invalid; if the current time is within the validity period of the unexecuted task, the unexecuted task is determined to be valid.
[0097] S403. If not, determine whether to split the unexecuted task based on the task complexity attribute of the unexecuted task. If yes, execute S404; otherwise, execute S405.
[0098] Here, the task complexity attribute characterizes how complex the task is.
[0099] Optionally, the complexity has two levels: complex and simple. When the complexity attribute of the unexecuted task is complex, the unexecuted task is split; when it is simple, the unexecuted task is not split.
[0100] S404, if so, then based on the task resource attributes of the unexecuted task, split the unexecuted task into at least two subtasks, and determine a transfer executor for each of the split subtasks.
[0101] The task resource attributes include at least one of computing resource usage, network resource usage, and disk interface usage.
[0102] Optionally, based on task resource attributes such as computational resource consumption, network resource consumption, and disk interface consumption, unexecuted tasks can be split into at least two subtasks. For example, when the computational resource consumption of an unexecuted task is large, since the computational resources of each transfer executor are limited, the unexecuted task can be split into multiple subtasks with lower computational resource consumption; when the network resource consumption of an unexecuted task is large, since the network resources of each transfer executor are limited, the unexecuted task can be split into multiple subtasks with lower network resource consumption.
[0103] S405, determine the transfer executor for the unexecuted task.
[0104] Optionally, this case corresponds to situations where unexecuted tasks do not need to be split, and the corresponding transfer executor is directly determined for the unexecuted tasks.
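The S401 to S405 decision flow, including the validity-period check of [0096], can be sketched as follows. The `PendingTask` fields and the returned labels are assumptions for illustration; the two-level complexity scale follows [0099].

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PendingTask:
    valid_from: datetime   # start of the validity period (time attribute)
    valid_until: datetime  # end of the validity period (time attribute)
    complexity: str        # "complex" or "simple", per the two-level scale


def plan_transfer(task: PendingTask, now: datetime) -> str:
    # S402: a task outside its validity period is not transferred (flow returns to S401).
    if not (task.valid_from <= now <= task.valid_until):
        return "expired"
    # S403: only complex tasks are split.
    if task.complexity == "complex":
        return "split"            # S404: split, then pick a transfer executor per subtask
    return "assign_directly"      # S405: pick one transfer executor for the whole task


now = datetime(2023, 3, 15, 12, 0)
print(plan_transfer(PendingTask(datetime(2023, 3, 15), datetime(2023, 3, 16), "complex"), now))  # split
print(plan_transfer(PendingTask(datetime(2023, 3, 15), datetime(2023, 3, 16), "simple"), now))   # assign_directly
print(plan_transfer(PendingTask(datetime(2023, 3, 1), datetime(2023, 3, 2), "simple"), now))     # expired
```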
[0105] The above embodiments split unexecuted tasks into multiple subtasks based on their task resource attributes and determine a transfer executor for each subtask. This is equivalent to having the unexecuted tasks executed by multiple executors, which further improves task execution efficiency. At the same time, the splitting considers attributes in multiple dimensions, which makes the splitting of unexecuted tasks more reasonable.
[0106] The above embodiment explains that an unexecuted task can be split into multiple subtasks based on its task resource attributes. In this embodiment, as shown in Figure 5, how an unexecuted task is split and how its transfer executors are determined is described in detail. The method is as follows:
[0107] S501, determine the splitting method for unexecuted tasks based on the task resource attributes of unexecuted tasks.
[0108] The splitting methods include physical splitting and/or horizontal splitting; physical splitting involves splitting an unexecuted task into multiple subtasks, each of which is executed by a different transfer executor; horizontal splitting involves splitting an unexecuted task into multiple subtasks, each of which is executed by a different thread within the same transfer executor.
[0109] Optionally, when the computational resource consumption of unexecuted tasks is high, the method for splitting unexecuted tasks is determined to include physical splitting; when the network resource consumption of unexecuted tasks is high, the method for splitting unexecuted tasks is determined to be horizontal splitting.
[0110] Specifically, if the computational resource usage in the task resource attributes of an unexecuted task is greater than the computational resource threshold, the splitting method for the unexecuted task is determined to include physical splitting; if the network resource usage in the task resource attributes is greater than the network resource threshold, or the disk interface usage is greater than the interface resource threshold, the splitting method is determined to include horizontal splitting.
[0111] It should be noted that the splitting of unexecuted tasks can include both physical splitting and horizontal splitting. For example, when the computational resource usage in the task resource attributes of an unexecuted task exceeds the computational resource threshold, and the network resource usage in the task resource attributes of an unexecuted task exceeds the network resource threshold, then the unexecuted task needs to be split both physically and horizontally.
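The threshold rules of [0110] and [0111] map directly onto a small function. Threshold values and units are not given in the patent, so the numbers in the usage example are hypothetical.

```python
from typing import Set


def splitting_methods(
    cpu: float, net: float, disk_io: float,
    cpu_threshold: float, net_threshold: float, io_threshold: float,
) -> Set[str]:
    """Return the splitting methods for an unexecuted task (empty set: no rule fired)."""
    methods: Set[str] = set()
    if cpu > cpu_threshold:
        methods.add("physical")    # subtasks go to different transfer executors
    if net > net_threshold or disk_io > io_threshold:
        methods.add("horizontal")  # subtasks go to threads of one transfer executor
    return methods


# Hypothetical thresholds: computing 70, network 60, disk interface 50.
print(sorted(splitting_methods(90, 80, 10, 70, 60, 50)))  # ['horizontal', 'physical']
print(sorted(splitting_methods(90, 10, 10, 70, 60, 50)))  # ['physical']
print(sorted(splitting_methods(10, 10, 60, 70, 60, 50)))  # ['horizontal']
```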
[0112] S502, based on the splitting method, split the unexecuted task into at least two subtasks.
[0113] Optionally, based on the method of splitting unexecuted tasks, the unexecuted tasks are split into at least two sub-tasks according to the preset splitting rules. For example, the preset rule may be to split the unexecuted tasks into sub-tasks in which the computational resource consumption in the task resource attributes is less than the computational resource threshold and the network resource consumption in the task resource attributes is less than the network resource threshold.
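One way to realize the example rule in [0113] is to divide the unexecuted task into n equal parts, choosing the smallest n for which each part falls strictly below both thresholds. For usage u and threshold t, u/n < t holds exactly when n > u/t, so the smallest such n is ⌊u/t⌋ + 1. Equal divisibility of the work is an assumption made only for this sketch; real tasks may need uneven splits.

```python
import math
from typing import List, Tuple


def split_evenly(
    cpu: float, net: float, cpu_threshold: float, net_threshold: float
) -> List[Tuple[float, float]]:
    """Split a task into the fewest equal parts (at least two) whose usage is
    strictly below both thresholds. Assumes the work divides evenly."""
    # Smallest integer n with usage / n < threshold is floor(usage / threshold) + 1.
    n = max(
        2,
        math.floor(cpu / cpu_threshold) + 1,
        math.floor(net / net_threshold) + 1,
    )
    return [(cpu / n, net / n) for _ in range(n)]


parts = split_evenly(cpu=150, net=100, cpu_threshold=70, net_threshold=60)
print(len(parts), parts[0][0])  # 3 50.0
```

For 150 units of computing load (threshold 70) and 100 units of network load (threshold 60), two parts would leave 75 units of computing load each, so three parts of 50 units are needed.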
[0114] S503: Determine the transfer executor for each split subtask based on the task resource attributes of that subtask.
[0115] Each subtask's corresponding transfer executor contains at least one task execution thread; each task execution thread can execute one subtask.
[0116] Optionally, the task resource attributes of each split subtask, such as its computational resource consumption and network resource consumption, are compared with the available computational and network resources of a candidate transfer executor. When the candidate's available computational resources and available network resources are both greater than or equal to the subtask's computational and network resource consumption, the candidate is determined to be the transfer executor of that subtask.
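The matching rule of [0116] amounts to a component-wise capacity check. In this sketch, `Executor`, `Subtask` and the spare-capacity fields are hypothetical, and choosing the first qualifying candidate (first fit) is an assumption, since the description does not say how to choose among several qualifying executors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Executor:
    # Hypothetical view of a candidate transfer executor's spare capacity.
    name: str
    free_cpu: float
    free_network: float


@dataclass
class Subtask:
    # Hypothetical subtask with its task resource attributes.
    name: str
    cpu: float
    network: float


def pick_transfer_executor(subtask: Subtask,
                           candidates: list[Executor]) -> Optional[Executor]:
    """Return the first executor whose spare CPU and network capacity are both
    greater than or equal to the subtask's usage, or None if none qualifies."""
    for ex in candidates:
        if ex.free_cpu >= subtask.cpu and ex.free_network >= subtask.network:
            return ex
    return None
```

A best-fit policy (for example, the qualifying executor with the least spare capacity) would be an equally valid reading of the text.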
[0117] For each subtask obtained by physical splitting, a corresponding transfer executor is determined according to the task resource attributes of that subtask. The resources of each transfer executor can satisfy the task resource attributes of its subtask, and each transfer executor includes at least one task execution thread for executing the subtask.
[0118] For the subtasks obtained by horizontal splitting, a transfer executor is determined for them, and the number of task execution threads contained in that transfer executor shall not be less than the number of subtasks.
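The two placement constraints of [0117] and [0118] can be expressed as follows. This is a hedged sketch: the `Executor` type, the thread counts and the policy of taking the first executor with enough threads are assumptions for illustration, and resource matching is omitted to keep the constraints visible.

```python
from dataclasses import dataclass


@dataclass
class Executor:
    # Hypothetical transfer executor described only by its thread count.
    name: str
    threads: int  # number of task execution threads


def assign_physical(subtasks: list[str],
                    executors: list[Executor]) -> dict[str, str]:
    """Physical splitting: each subtask goes to a different transfer executor
    that has at least one task execution thread."""
    eligible = [ex for ex in executors if ex.threads >= 1]
    if len(eligible) < len(subtasks):
        raise ValueError("not enough transfer executors for physical splitting")
    return {task: ex.name for task, ex in zip(subtasks, eligible)}


def assign_horizontal(subtasks: list[str],
                      executors: list[Executor]) -> dict[str, tuple[str, int]]:
    """Horizontal splitting: all subtasks go to one transfer executor whose
    thread count is not less than the number of subtasks; each subtask is
    mapped to its own thread index on that executor."""
    for ex in executors:
        if ex.threads >= len(subtasks):
            return {task: (ex.name, i) for i, task in enumerate(subtasks)}
    raise ValueError("no transfer executor has enough task execution threads")
```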
[0119] It should be noted that some of the split subtasks depend on each other, while others do not. When the split subtasks are sent to the transfer executors, the dependencies between them are sent as well, so that each transfer executor can execute its subtask in accordance with those dependencies.
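The description says dependencies travel with the subtasks but does not say how they are represented or scheduled. Assuming they form a directed acyclic graph given as a mapping from each subtask to the subtasks it depends on, Python's standard `graphlib.TopologicalSorter` can order them into batches that may run concurrently. The `execution_batches` helper is hypothetical.

```python
from graphlib import TopologicalSorter


def execution_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into batches: every subtask in a batch depends only on
    subtasks from earlier batches, so one batch can run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches
```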
[0120] In the above embodiment, the splitting method of an unexecuted task is determined based on its task resource attributes, and a transfer executor is determined for each split subtask based on that subtask's task resource attributes. This ensures that transfer executors are selected reasonably and that each transfer executor is suited to the subtask it receives.
[0121] To demonstrate this solution more fully, this embodiment presents an optional method for transferring faulty tasks, as shown in Figure 6A:
[0122] S601: If it is detected that the executor registration information maintained by the distributed application coordination center has been deleted, the deleted executor is determined based on the deleted executor registration information.
[0123] The executor registration information is registered with the distributed application coordination center after each executor in the executor cluster establishes a heartbeat connection with the distributed application coordination center. The distributed application coordination center will monitor the heartbeat connection with each executor in real time and delete the executor registration information of each executor whose heartbeat is disconnected.
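The registration and deletion behaviour of [0123] can be modelled with an in-memory registry. This sketch is not a real coordination service: the `CoordinationCenter` class, the explicit `now` timestamps and the timeout rule are assumptions made so the behaviour is easy to test. Production systems often obtain a similar effect from a coordination service such as Apache ZooKeeper, where an ephemeral node is removed when its client session ends, but the description does not name a specific product.

```python
from typing import Callable


class CoordinationCenter:
    """Hypothetical in-memory stand-in for the coordination center.

    Executors register and then send heartbeats; an executor whose last
    heartbeat is older than `timeout` has its registration deleted, and
    listeners (for example, the control server) are notified.
    """

    def __init__(self, timeout: float):
        self.timeout = timeout
        self._last_beat: dict[str, float] = {}
        self._listeners: list[Callable[[str], None]] = []

    def register(self, executor_id: str, now: float) -> None:
        self._last_beat[executor_id] = now

    def heartbeat(self, executor_id: str, now: float) -> None:
        # Heartbeats from executors that are no longer registered are ignored.
        if executor_id in self._last_beat:
            self._last_beat[executor_id] = now

    def on_delete(self, listener: Callable[[str], None]) -> None:
        self._listeners.append(listener)

    def check(self, now: float) -> list[str]:
        """Delete registrations whose heartbeat has lapsed; return their ids."""
        expired = [eid for eid, t in self._last_beat.items()
                   if now - t > self.timeout]
        for eid in expired:
            del self._last_beat[eid]
            for listener in self._listeners:
                listener(eid)
        return expired

    def registered(self) -> set[str]:
        return set(self._last_beat)
```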
[0124] S602: Determine whether the deleted executor is currently executing a task. If yes, execute S603; otherwise, return to S601.
[0125] S603: Designate the deleted executor as a faulty executor.
[0126] S604: Determine the task execution breakpoint of the faulty executor based on the task execution results received from the faulty executor and the original execution task sent to it.
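Step S604 compares what was sent to the faulty executor with what it reported back. Assuming the original task is an ordered list of steps and the executor reports one result per completed step, in order, the breakpoint is the first step without a result. `find_breakpoint` and `unexecuted_part` are hypothetical helpers.

```python
def find_breakpoint(original_steps: list[str], results: dict[str, object]) -> int:
    """Return the index of the first step of the original task that has no
    reported result; len(original_steps) means the task had finished."""
    for index, step in enumerate(original_steps):
        if step not in results:
            return index
    return len(original_steps)


def unexecuted_part(original_steps: list[str], results: dict[str, object]) -> list[str]:
    """Steps from the breakpoint onward: the part handed to a transfer executor."""
    return original_steps[find_breakpoint(original_steps, results):]
```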
[0127] S605: Based on the task execution breakpoint, obtain the unexecuted tasks of the faulty executor.
[0128] S606: Based on the task time attribute of the unexecuted task, determine whether the unexecuted task has expired. If yes, return to S601; otherwise, execute S607.
[0129] S607: Based on the task complexity attribute of the unexecuted task, determine whether to split it. If yes, execute S608; otherwise, execute S614.
[0130] S608: Determine whether the computational resource usage in the task resource attributes of the unexecuted task is greater than the computational resource threshold. If yes, execute S609; otherwise, execute S610.
[0131] S609: Determine that the splitting method for the unexecuted task includes physical splitting.
[0132] S610: Determine whether the network resource usage in the task resource attributes of the unexecuted task is greater than the network resource threshold, or whether the disk interface usage is greater than the interface resource threshold. If yes, execute S611; otherwise, execute S614.
[0133] S611: Determine that the splitting method for the unexecuted task includes horizontal splitting.
[0134] S612: Based on the splitting method, split the unexecuted task into at least two subtasks.
[0135] S613: Determine the transfer executor for each split subtask based on the task resource attributes of that subtask.
[0136] Each subtask's corresponding transfer executor contains at least one task execution thread.
[0137] S614: Determine the transfer executor for the unexecuted task.
[0138] S615: Through the distributed application coordination center, send the unexecuted task to the transfer executor, and receive the task execution result of the unexecuted task from the transfer executor.
[0139] It should be noted that if an unexecuted task is split, the split subtasks will be sent to the corresponding transfer executor of each subtask. If an unexecuted task is not split, it will be sent to the corresponding transfer executor of that unexecuted task.
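The decisions in S606 to S614 can be condensed into one function. This sketch is illustrative only: the `UnexecutedTask` fields, the use of a deadline for the time attribute and of a numeric limit for the complexity attribute are assumptions. It also assumes that, after S609, the S610 check still runs, so that both methods can apply as described in [0111]; the step list does not state which step follows S609.

```python
from dataclasses import dataclass


@dataclass
class UnexecutedTask:
    # Hypothetical attributes; names and units are illustrative only.
    deadline: float   # task time attribute (assumed to be a deadline)
    complexity: int   # task complexity attribute (assumed numeric)
    cpu: float        # computational resource usage
    network: float    # network resource usage
    disk_io: float    # disk interface usage


def plan_transfer(task: UnexecutedTask, now: float, complexity_limit: int,
                  cpu_limit: float, net_limit: float, io_limit: float) -> str:
    """Follow the S606-S614 decisions and report which branch is taken."""
    if now > task.deadline:                # S606: expired, back to monitoring
        return "discard"
    if task.complexity <= complexity_limit:
        return "single"                    # S607 "no" -> S614
    methods = []
    if task.cpu > cpu_limit:               # S608 -> S609
        methods.append("physical")
    if task.network > net_limit or task.disk_io > io_limit:  # S610 -> S611
        methods.append("horizontal")
    if not methods:                        # no threshold exceeded -> S614
        return "single"
    return "split:" + "+".join(methods)    # S612-S613
```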
[0140] Optionally, as shown in Figures 6B-6C, after executor A and executor B start, both register with the distributed application coordination center over heartbeat connections. The control server monitors the registration information maintained by the distributed application coordination center. When executor A fails, its heartbeat connection with the distributed application coordination center is broken, and the control server detects that the registration information of executor A has been deleted. The control server then sends the unexecuted tasks of executor A to executor B through the distributed application coordination center for continued execution.
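The hand-over from executor A to executor B can be sketched from the control server's side. The `ControlServer` class is hypothetical; it assumes the server already knows each executor's remaining (unexecuted) steps, and picking the alphabetically first live executor stands in for the resource-based selection described earlier.

```python
class ControlServer:
    """Hypothetical control server that reacts to deleted registrations."""

    def __init__(self, live: set[str], running: dict[str, list[str]]):
        self.live = set(live)          # executors whose registration still exists
        self.running = dict(running)   # executor id -> its unexecuted steps
        self.transfers: list[tuple[str, str, list[str]]] = []

    def on_registration_deleted(self, executor_id: str) -> None:
        self.live.discard(executor_id)
        pending = self.running.pop(executor_id, [])
        if not pending:
            return  # S602 "no": the executor was idle, keep monitoring
        if not self.live:
            raise RuntimeError("no live executor available for transfer")
        # S603 onward: treat it as faulty and hand its work to a survivor.
        target = min(self.live)  # assumption: any live executor will do here
        self.running.setdefault(target, []).extend(pending)
        self.transfers.append((executor_id, target, pending))
```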
[0141] The specific processes of S601-S615 described above can be found in the description of the above method embodiments. Their implementation principles and technical effects are similar, and will not be repeated here.
[0142] It should be understood that although the steps in the flowcharts of the above embodiments are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the above embodiments may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.
[0143] Based on the same inventive concept, this application also provides a fault task transfer apparatus for implementing the fault task transfer method described above. The solution provided by this apparatus is similar to the implementation described in the above method; therefore, the specific limitations in one or more fault task transfer apparatus embodiments provided below can be found in the limitations of the fault task transfer method described above, and will not be repeated here.
[0144] In one embodiment, as shown in Figure 7, a fault task transfer device 7 is provided, comprising: a breakpoint acquisition module 70, a task acquisition module 71, an executor allocation module 72, and a result receiving module 73, wherein:
[0145] The breakpoint acquisition module 70 is used to acquire the task execution breakpoint of the faulty executor if the distributed application coordination center detects that there is a faulty executor executing a task in the executor cluster.
[0146] The task acquisition module 71 is used to acquire the unexecuted tasks of the faulty executor based on the task execution breakpoint;
[0147] The executor allocation module 72 is used to allocate a transfer executor to an unexecuted task based on the task attributes of the unexecuted task;
[0148] The result receiving module 73 is used to send unexecuted tasks to the transfer executor through the distributed application coordination center, and to receive the task execution results of the unexecuted tasks from the transfer executor.
[0149] In another embodiment, as shown in Figure 8, the fault task transfer device 7 of Figure 7 further includes:
[0150] The fault detection module 74 is used to identify the deleted executor based on the deleted executor registration information if it detects that the executor registration information maintained by the distributed application coordination center has been deleted; determine whether the deleted executor is currently executing a task; if so, the deleted executor is identified as a faulty executor.
[0151] The executor registration information is registered with the distributed application coordination center after each executor in the executor cluster establishes a heartbeat connection with the distributed application coordination center. The distributed application coordination center will monitor the heartbeat connection with each executor in real time and delete the executor registration information of each executor whose heartbeat is disconnected.
[0152] In another embodiment, the unexecuted tasks in the faulty executor cease execution after the faulty executor detects a break in its heartbeat connection with the distributed application coordination center.
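Stopping the faulty executor matters because the transfer executor will resume the same work; if both kept running, the work could be executed twice. The sketch below assumes the executor checks a hypothetical `session_alive` callback between steps. The description does not state how often the check happens, and in this sketch a step already in progress when the connection drops would still run to completion.

```python
from typing import Callable


def run_steps(steps: list[str], session_alive: Callable[[], bool],
              execute: Callable[[str], None]) -> list[str]:
    """Execute steps in order, but stop as soon as the heartbeat connection to
    the coordination center is reported broken. Returns the steps left
    unexecuted, which a transfer executor is expected to take over."""
    for index, step in enumerate(steps):
        if not session_alive():
            return steps[index:]
        execute(step)
    return []
```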
[0153] In another embodiment, as shown in Figure 9, the executor allocation module 72 of Figure 7 further includes:
[0154] The expiry determination unit 720 is used to determine whether an unexecuted task has expired based on the task time attribute of the unexecuted task;
[0155] The task splitting unit 721 is used to determine, if the unexecuted task has not expired, whether to split it based on its task complexity attribute;
[0156] The executor determination unit 722 is used to, if the unexecuted task is to be split, split it into at least two subtasks according to its task resource attributes, and determine a transfer executor for each split subtask;
[0157] The task resource attributes include at least one of computing resource usage, network resource usage, and disk interface usage.
[0158] In another embodiment, the executor determination unit 722 of Figure 9 further includes:
[0159] The splitting determination subunit 7220 is used to determine the splitting method of the unexecuted task based on the task resource attributes of the unexecuted task; wherein the splitting method includes physical splitting and / or horizontal splitting;
[0160] Task splitting subunit 7221 is used to split an unexecuted task into at least two subtasks according to the splitting method;
[0161] The executor determination subunit 7222 is used to determine the transfer executor for each of the split subtasks based on the task resource attributes of each subtask.
[0162] Each subtask's corresponding transfer executor contains at least one task execution thread.
[0163] In another embodiment, the splitting determination subunit 7220 in the above embodiments is further specifically used for:
[0164] If the computational resource usage in the task resource attributes of an unexecuted task is greater than the computational resource threshold, determining that the splitting method of the unexecuted task includes physical splitting; if the network resource usage in the task resource attributes of the unexecuted task is greater than the network resource threshold, or the disk interface usage is greater than the interface resource threshold, determining that the splitting method of the unexecuted task includes horizontal splitting.
[0165] In another embodiment, the breakpoint acquisition module 70 of Figure 7 is specifically used to determine the task execution breakpoint of the faulty executor based on the task execution results received from the faulty executor and the original execution task sent to the faulty executor.
[0166] Each module in the aforementioned fault task transfer device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can call and execute the operations corresponding to each module.
[0167] In one embodiment, a computer device is provided, which may be a terminal; its internal structure may be as shown in Figure 10. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, memory, and input/output interface are connected via a system bus, and the communication interface, display unit, and input device are connected to the system bus via the input/output interface. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The input/output interface is used to exchange information between the processor and external devices. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, mobile cellular networks, NFC (Near Field Communication), or other technologies. When the computer program is executed by the processor, it implements a method for transferring faulty tasks. The display unit is used to form a visible image and can be a display screen, a projection device, or a virtual reality imaging device. The display screen can be an LCD screen or an e-ink screen. The input device of the computer device can be a touch layer covering the display screen; buttons, a trackball, or a touchpad on the casing of the computer device; or an external keyboard, touchpad, or mouse.
[0168] Those skilled in the art will understand that the structure shown in Figure 10 is merely a block diagram of the part of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, combine certain components, or have a different arrangement of components.
[0169] In one embodiment, a computer device is provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to perform the following steps:
[0170] If the distributed application coordination center detects a faulty executor in the executor cluster that is executing a task, then the task execution breakpoint of the faulty executor is obtained.
[0171] Based on the task execution breakpoint, obtain the unexecuted tasks of the faulty executor;
[0172] Assign a transfer executor to the unexecuted task based on its task attributes;
[0173] Through the distributed application coordination center, unexecuted tasks are distributed to the transfer executors, and the task execution results of the unexecuted tasks are received from the transfer executors.
[0174] In one embodiment, the processor, when executing a computer program, also performs the following steps:
[0175] If it is detected that the executor registration information maintained by the distributed application coordination center has been deleted, the deleted executor will be identified based on the deleted executor registration information. The executor registration information is registered with the distributed application coordination center by each executor in the executor cluster after establishing a heartbeat connection with the distributed application coordination center. The distributed application coordination center will monitor the heartbeat connection with each executor in real time and delete the executor registration information of each executor whose heartbeat is broken.
[0176] Determine if the deleted executor is currently executing a task;
[0177] If so, designate the deleted executor as a faulty executor.
[0178] In one embodiment, the processor, when executing a computer program, also performs the following steps:
[0179] The unexecuted tasks in the faulty executor cease execution once the faulty executor detects a break in its heartbeat connection with the distributed application coordination center.
[0180] In one embodiment, the processor, when executing a computer program, also performs the following steps:
[0181] Determine whether an unexecuted task has expired based on its time attribute.
[0182] If not, then determine whether to split the unexecuted tasks based on the task complexity attribute of the unexecuted tasks;
[0183] If so, based on the task resource attributes of the unexecuted task, the unexecuted task is split into at least two subtasks, and a transfer executor is determined for each of the split subtasks; wherein, the task resource attributes include at least one of computing resource usage, network resource usage, and disk interface usage.
[0184] In one embodiment, the processor, when executing a computer program, also performs the following steps:
[0185] Based on the task resource attributes of the unexecuted tasks, determine the splitting method for the unexecuted tasks; wherein the splitting method includes physical splitting and / or horizontal splitting;
[0186] Based on the splitting method, unexecuted tasks are split into at least two sub-tasks;
[0187] Based on the task resource attributes of each subtask after splitting, a transfer executor is determined for each subtask; wherein, the transfer executor corresponding to each subtask contains at least one task execution thread.
[0188] In one embodiment, the processor, when executing a computer program, also performs the following steps:
[0189] If the computational resource usage in the resource attributes of an unexecuted task is greater than the computational resource threshold, then the splitting method for the unexecuted task is determined to include physical splitting.
[0190] If the network resource usage in the task resource attributes of an unexecuted task exceeds the network resource threshold, or the disk interface usage exceeds the interface resource threshold, then the splitting method for the unexecuted task is determined to include horizontal splitting.
[0191] In one embodiment, the processor, when executing a computer program, also performs the following steps:
[0192] Based on the task execution results received from the faulty executor and the original execution task issued to the faulty executor, determine the task execution breakpoint of the faulty executor.
[0193] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon, the computer program performing the following steps when executed by a processor:
[0194] If the distributed application coordination center detects a faulty executor in the executor cluster that is executing a task, then the task execution breakpoint of the faulty executor is obtained.
[0195] Based on the task execution breakpoint, obtain the unexecuted tasks of the faulty executor;
[0196] Assign a transfer executor to the unexecuted task based on its task attributes;
[0197] Through the distributed application coordination center, unexecuted tasks are distributed to the transfer executors, and the task execution results of the unexecuted tasks are received from the transfer executors.
[0198] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:
[0199] If it is detected that the executor registration information maintained by the distributed application coordination center has been deleted, the deleted executor will be identified based on the deleted executor registration information. The executor registration information is registered with the distributed application coordination center by each executor in the executor cluster after establishing a heartbeat connection with the distributed application coordination center. The distributed application coordination center will monitor the heartbeat connection with each executor in real time and delete the executor registration information of each executor whose heartbeat is broken.
[0200] Determine if the deleted executor is currently executing a task;
[0201] If so, designate the deleted executor as a faulty executor.
[0202] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:
[0203] The unexecuted tasks in the faulty executor cease execution once the faulty executor detects a break in its heartbeat connection with the distributed application coordination center.
[0204] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:
[0205] Determine whether an unexecuted task has expired based on its time attribute.
[0206] If not, then determine whether to split the unexecuted tasks based on the task complexity attribute of the unexecuted tasks;
[0207] If so, based on the task resource attributes of the unexecuted task, the unexecuted task is split into at least two subtasks, and a transfer executor is determined for each of the split subtasks; wherein, the task resource attributes include at least one of computing resource usage, network resource usage, and disk interface usage.
[0208] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:
[0209] Based on the task resource attributes of the unexecuted tasks, determine the splitting method for the unexecuted tasks; wherein the splitting method includes physical splitting and / or horizontal splitting;
[0210] Based on the splitting method, unexecuted tasks are split into at least two sub-tasks;
[0211] Based on the task resource attributes of each subtask after splitting, a transfer executor is determined for each subtask; wherein, the transfer executor corresponding to each subtask contains at least one task execution thread.
[0212] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:
[0213] If the computational resource usage in the resource attributes of an unexecuted task is greater than the computational resource threshold, then the splitting method for the unexecuted task is determined to include physical splitting.
[0214] If the network resource usage in the task resource attributes of an unexecuted task exceeds the network resource threshold, or the disk interface usage exceeds the interface resource threshold, then the splitting method for the unexecuted task is determined to include horizontal splitting.
[0215] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:
[0216] Based on the task execution results received from the faulty executor and the original execution task issued to the faulty executor, determine the task execution breakpoint of the faulty executor.
[0217] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, performs the following steps:
[0218] If the distributed application coordination center detects a faulty executor in the executor cluster that is executing a task, then the task execution breakpoint of the faulty executor is obtained.
[0219] Based on the task execution breakpoint, obtain the unexecuted tasks of the faulty executor;
[0220] Assign a transfer executor to the unexecuted task based on its task attributes;
[0221] Through the distributed application coordination center, unexecuted tasks are distributed to the transfer executors, and the task execution results of the unexecuted tasks are received from the transfer executors.
[0222] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:
[0223] If it is detected that the executor registration information maintained by the distributed application coordination center has been deleted, the deleted executor will be identified based on the deleted executor registration information. The executor registration information is registered with the distributed application coordination center by each executor in the executor cluster after establishing a heartbeat connection with the distributed application coordination center. The distributed application coordination center will monitor the heartbeat connection with each executor in real time and delete the executor registration information of each executor whose heartbeat is broken.
[0224] Determine if the deleted executor is currently executing a task;
[0225] If so, designate the deleted executor as a faulty executor.
[0226] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:
[0227] The unexecuted tasks in the faulty executor cease execution once the faulty executor detects a break in its heartbeat connection with the distributed application coordination center.
[0228] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:
[0229] Determine whether an unexecuted task has expired based on its time attribute.
[0230] If not, then determine whether to split the unexecuted tasks based on the task complexity attribute of the unexecuted tasks;
[0231] If so, based on the task resource attributes of the unexecuted task, the unexecuted task is split into at least two subtasks, and a transfer executor is determined for each of the split subtasks; wherein, the task resource attributes include at least one of computing resource usage, network resource usage, and disk interface usage.
[0232] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:
[0233] Based on the task resource attributes of the unexecuted tasks, determine the splitting method for the unexecuted tasks; wherein the splitting method includes physical splitting and / or horizontal splitting;
[0234] Based on the splitting method, unexecuted tasks are split into at least two sub-tasks;
[0235] Based on the task resource attributes of each subtask after splitting, a transfer executor is determined for each subtask; wherein, the transfer executor corresponding to each subtask contains at least one task execution thread.
[0236] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:
[0237] If the computational resource usage in the resource attributes of an unexecuted task is greater than the computational resource threshold, then the splitting method for the unexecuted task is determined to include physical splitting.
[0238] If the network resource usage in the task resource attributes of an unexecuted task exceeds the network resource threshold, or the disk interface usage exceeds the interface resource threshold, then the splitting method for the unexecuted task is determined to include horizontal splitting.
[0239] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:
[0240] Based on the task execution results received from the faulty executor and the original execution task issued to the faulty executor, determine the task execution breakpoint of the faulty executor.
[0241] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments of the above methods. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to these.
[0242] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0243] The above embodiments are merely illustrative of several implementation methods of this application, and their descriptions are relatively specific and detailed. However, they should not be construed as limiting the scope of this application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.
Claims
1. A method for transferring a faulty task, characterized in that the method includes: if the distributed application coordination center detects a faulty executor in the executor cluster that is executing a task, obtaining the task execution breakpoint of the faulty executor, where the task execution breakpoint records the position at which the executing task was interrupted when the executor failed; based on the task execution breakpoint, obtaining the unexecuted task of the faulty executor, where the unexecuted task is the part of the task being executed that lies after the task execution breakpoint; based on the task attributes of the unexecuted task, assigning a transfer executor to the unexecuted task; and distributing, through the distributed application coordination center, the unexecuted task to the transfer executor and receiving the task execution result of the unexecuted task from the transfer executor. The step of assigning a transfer executor to the unexecuted task based on its task attributes includes: determining, based on the task time attribute of the unexecuted task, whether the unexecuted task has expired; if not, determining, based on the task complexity attribute of the unexecuted task, whether to split the unexecuted task; if so, splitting the unexecuted task into at least two subtasks according to its task resource attributes, wherein, if the computational resource usage in the task resource attributes of the unexecuted task is greater than the computational resource threshold, the splitting method of the unexecuted task is determined to include physical splitting, and if the network resource usage in the task resource attributes of the unexecuted task is greater than the network resource threshold, or the disk interface usage is greater than the interface resource threshold, the splitting method of the unexecuted task is determined to include horizontal splitting.
A transfer executor is then determined for each split subtask based on that subtask's task resource attributes, wherein each of the at least two subtasks obtained through physical splitting is executed by a different transfer executor, and each of the at least two subtasks obtained through horizontal splitting is executed by a different thread in the same transfer executor.
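The transfer-executor assignment in claim 1 can be sketched as follows. This is a minimal, hypothetical Python illustration, not the patentee's implementation. The claims name a computational resource threshold, a network resource threshold, and an interface resource threshold but give no values, so `CPU_THRESHOLD`, `NET_THRESHOLD`, `DISK_IO_THRESHOLD`, and `COMPLEXITY_THRESHOLD` are invented, as are the `Task` and `Assignment` types, the fixed two-way split, and the rule that physical splitting takes precedence when both conditions hold. The claims also do not say what happens to an expired task or to a task that should be split but exceeds no threshold; the sketch returns `None` for the former and keeps the latter whole.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical thresholds: the claims name them but give no values.
CPU_THRESHOLD = 0.7         # computational resource threshold
NET_THRESHOLD = 0.6         # network resource threshold
DISK_IO_THRESHOLD = 0.6     # interface resource threshold
COMPLEXITY_THRESHOLD = 5    # above this, the task is worth splitting
NUM_SUBTASKS = 2            # "at least two subtasks"


@dataclass
class Task:
    task_id: str
    deadline: float         # task time attribute (epoch seconds)
    complexity: int         # task complexity attribute
    cpu_usage: float        # task resource attributes, as fractions of capacity
    net_usage: float
    disk_io_usage: float
    items: List[str] = field(default_factory=list)


@dataclass
class Assignment:
    executor: str           # transfer executor
    thread: int             # thread index inside that executor
    items: List[str]


def chunk(items: List[str], n: int) -> List[List[str]]:
    """Split items into n contiguous parts whose sizes differ by at most one."""
    k, r = divmod(len(items), n)
    parts, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        parts.append(items[start:end])
        start = end
    return parts


def split_method(task: Task) -> Optional[str]:
    """Physical split if CPU-bound, horizontal if network- or disk-bound."""
    if task.cpu_usage > CPU_THRESHOLD:
        return "physical"   # assumption: wins when both conditions hold
    if task.net_usage > NET_THRESHOLD or task.disk_io_usage > DISK_IO_THRESHOLD:
        return "horizontal"
    return None


def assign(task: Task, executors: List[str],
           now: Optional[float] = None) -> Optional[List[Assignment]]:
    """Plan the transfer of an unexecuted task; None means it has expired."""
    now = time.time() if now is None else now
    if now > task.deadline:
        return None
    method = split_method(task) if task.complexity > COMPLEXITY_THRESHOLD else None
    if method is None or len(task.items) < NUM_SUBTASKS:
        # No split: the whole task goes to one transfer executor.
        return [Assignment(executors[0], 0, list(task.items))]
    parts = chunk(task.items, NUM_SUBTASKS)
    if method == "physical":
        if len(executors) < NUM_SUBTASKS:
            raise ValueError("physical splitting needs one executor per subtask")
        # Each subtask runs on a different transfer executor.
        return [Assignment(executors[i], 0, p) for i, p in enumerate(parts)]
    # Horizontal: each subtask runs on a different thread of the same executor.
    return [Assignment(executors[0], i, p) for i, p in enumerate(parts)]
```

For example, a CPU-heavy task with items `a`, `b`, `c` and two candidate executors is planned as `a, b` on the first executor and `c` on the second, while a network-heavy task stays on the first executor with one thread per subtask.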
2. The method according to claim 1, characterized in that detecting, through the distributed application coordination center, a faulty executor that is executing a task in the executor cluster includes: if it is detected that executor registration information maintained by the distributed application coordination center has been deleted, determining the deleted executor from the deleted executor registration information, where each executor in the executor cluster registers its executor registration information with the distributed application coordination center after establishing a heartbeat connection with it, and the distributed application coordination center monitors the heartbeat connection with each executor in real time and deletes the executor registration information of any executor whose heartbeat connection is broken; determining whether the deleted executor is currently executing a task; and if so, determining the deleted executor to be a faulty executor.
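The detection flow in claim 2 resembles the ephemeral-node pattern of coordination services such as Apache ZooKeeper, where a registration disappears once its session's heartbeats stop, but the claims do not name a product. The sketch below is an in-memory, single-process stand-in under that assumption: `CoordinationCenter`, `FaultDetector`, the `timeout` parameter, and the explicit `sweep` call (which replaces real-time monitoring) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class CoordinationCenter:
    """Hypothetical stand-in for the distributed application coordination center."""
    timeout: float                                            # heartbeat timeout (s)
    registry: Dict[str, float] = field(default_factory=dict)  # executor -> last heartbeat
    listeners: List[Callable[[str], None]] = field(default_factory=list)

    def register(self, executor: str, now: float) -> None:
        # Executors register after establishing a heartbeat connection.
        self.registry[executor] = now

    def heartbeat(self, executor: str, now: float) -> None:
        if executor in self.registry:
            self.registry[executor] = now

    def sweep(self, now: float) -> List[str]:
        """Delete registrations whose heartbeat has lapsed and notify listeners."""
        lapsed = [e for e, t in self.registry.items() if now - t > self.timeout]
        for e in lapsed:
            del self.registry[e]
            for notify in self.listeners:
                notify(e)
        return lapsed


class FaultDetector:
    def __init__(self, center: CoordinationCenter, running: Dict[str, Set[str]]):
        self.running = running          # executor -> ids of tasks it is executing
        self.faulty: List[str] = []
        center.listeners.append(self.on_deleted)

    def on_deleted(self, executor: str) -> None:
        # A deleted executor counts as faulty only if it is executing a task.
        if self.running.get(executor):
            self.faulty.append(executor)
```

An idle executor whose heartbeat lapses is still deregistered, but only a busy one is reported as faulty, which matches the two-step check in the claim.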
3. The method according to claim 2, characterized in that the unexecuted tasks in the faulty executor stop executing once the faulty executor detects that its heartbeat connection with the distributed application coordination center is broken.
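On the executor side, claim 3 only requires that the faulty executor stop its unexecuted tasks once it notices that its heartbeat connection is broken, so that the transfer executor does not duplicate work. A minimal sketch, assuming the executor processes task items in order and that a separate heartbeat thread (not shown) sets a `threading.Event` when the connection drops; `run_items` and its parameters are hypothetical.

```python
import threading
from typing import Callable, List


def run_items(items: List[str], heartbeat_lost: threading.Event,
              process: Callable[[str], str]) -> List[str]:
    """Process items in order, stopping before the next item once the heartbeat is lost."""
    results = []
    for item in items:
        if heartbeat_lost.is_set():
            break  # leave the remaining items for a transfer executor
        results.append(process(item))
    return results


if __name__ == "__main__":
    lost = threading.Event()

    def process(item: str) -> str:
        if item == "b":
            lost.set()  # simulate the heartbeat dropping while "b" runs
        return item.upper()

    print(run_items(["a", "b", "c"], lost, process))  # ['A', 'B']
```

In this sketch the item in progress when the heartbeat drops is allowed to finish; only items not yet started are abandoned.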
4. The method according to claim 1, characterized in that obtaining the task execution breakpoint of the faulty executor includes: determining the task execution breakpoint of the faulty executor based on the task execution results received from the faulty executor and the original task issued to the faulty executor.
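Claim 4 derives the breakpoint by comparing the results already received from the faulty executor with the original task it was issued. A minimal sketch, assuming the task is an ordered list of item ids executed sequentially and that results are keyed by item id; `find_breakpoint` and `unexecuted_items` are hypothetical names. Under this sequential assumption, everything from the first item without a result onward is the unexecuted task handed to the transfer executor.

```python
from typing import Dict, List


def find_breakpoint(original: List[str], results: Dict[str, object]) -> int:
    """Index of the first item in the original task that has no reported result."""
    for i, item in enumerate(original):
        if item not in results:
            return i
    return len(original)  # every item finished: nothing to transfer


def unexecuted_items(original: List[str], results: Dict[str, object]) -> List[str]:
    """Items after the breakpoint, to be re-issued to a transfer executor."""
    return original[find_breakpoint(original, results):]


if __name__ == "__main__":
    task = ["r1", "r2", "r3", "r4"]
    received = {"r1": "ok", "r2": "ok"}
    print(find_breakpoint(task, received))   # 2
    print(unexecuted_items(task, received))  # ['r3', 'r4']
```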
5. A fault task transfer device, characterized in that the device includes: a breakpoint acquisition module, configured to acquire the task execution breakpoint of the faulty executor if the distributed application coordination center detects a faulty executor that is executing a task in the executor cluster, where the task execution breakpoint records the position at which the executing task was interrupted when the executor failed; a task acquisition module, configured to acquire the unexecuted tasks of the faulty executor based on the task execution breakpoint, where the unexecuted tasks are the parts of the currently executing task that lie after the task execution breakpoint; an executor allocation module, configured to allocate a transfer executor to the unexecuted tasks based on their task attributes; and a result receiving module, configured to send the unexecuted tasks to the transfer executor through the distributed application coordination center and to receive the task execution results of the unexecuted tasks fed back by the transfer executor. The executor allocation module is specifically configured to: determine, based on the task time attribute of the unexecuted task, whether it has expired; if not, determine, based on its task complexity attribute, whether to split it; if so, split the unexecuted task into at least two subtasks based on its task resource attributes, wherein if the computational resource usage in the task resource attributes of the unexecuted task is greater than the computational resource threshold, the splitting method is determined to include physical splitting, and if the network resource usage in the task resource attributes of the unexecuted task is greater than the network resource threshold, or the disk interface usage is greater than the interface resource threshold, the splitting method is determined to include horizontal splitting; and determine a transfer executor for each split subtask based on that subtask's task resource attributes, wherein each of the at least two subtasks obtained through physical splitting is executed by a different transfer executor, and each of the at least two subtasks obtained through horizontal splitting is executed by a different thread in the same transfer executor.
6. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 4.
7. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 4.
8. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 4.