A cross-center collaborative task dynamic planning method based on a pipeline mechanism

By adopting a pipeline-based dynamic planning method for cross-center collaborative tasks, the problems of complex data fusion operations and insufficient resource utilization in cross-data center collaborative computing are solved, and efficient cross-center collaborative computing and resource optimization are achieved.

CN116225642B · Active · Publication Date: 2026-06-26 · COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI
Filing Date: 2022-12-26
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing technologies suffer from complex and inefficient data fusion operations in cross-data center collaborative computing, and lack dynamic planning schemes, resulting in insufficient resource utilization.

Method used

A pipeline-based dynamic planning method for cross-center collaborative tasks is adopted. Through collaborative task orchestration, subtask partitioning, execution node initialization, and dynamic planning, data migration and resource utilization are optimized, and subtasks are executed top-down according to a DAG (Directed Acyclic Graph).

Benefits of technology

It improves the execution efficiency and resource utilization of cross-data center collaborative tasks, and realizes efficient cross-data center collaborative computing.

✦ Generated by Eureka AI based on patent content.

Figure CN116225642B_ABST

Abstract

The application discloses a dynamic planning method for cross-center collaborative tasks based on a pipeline mechanism, comprising the following steps: 1) orchestrating data sources and collaborative model algorithms according to the collaboration requirements, and configuring a cross-center collaborative computing task; 2) dividing the cross-center collaborative computing task into subtasks according to whether data migration will occur, forming a DAG (Directed Acyclic Graph); 3) according to the DAG, dividing different data sources into different subtasks; 4) when a subtask T is assigned a data source D, sending subtask T to the data center where data source D is located for execution; for a subtask T' that is not assigned a data source, determining the data center that will execute it according to the execution status of its upstream subtasks and the resource status of each data center, and then sending subtask T' to that data center for execution. The application achieves efficient orchestration and execution of cross-data-center collaborative tasks for complex analysis scenarios.

Description

Technical Field

[0001] This invention relates to the fields of pipeline, cross-data center, and collaborative computing technologies, and proposes a dynamic planning method for cross-data center collaborative tasks based on a pipeline mechanism.

Background Technology

[0002] Typical interdisciplinary applications require the integration of scientific data from multiple disciplines and fields. For example, black soil ecological analysis and air pollution control require the integration and analysis of ecological data, atmospheric data, and soil data. These data are scattered across different data centers, are large in volume, and difficult to migrate. Furthermore, due to security and privacy considerations, they cannot be made public. Traditional methods often require manual copying of data before integration and analysis, which is complex and inefficient.

[0003] Meanwhile, the "Orchestration Method and System for Cross-Center Collaborative Computing Based on Pipeline Mechanism" (Patent No.: 2022101459584) proposes a planning method for cross-center collaborative tasks. However, the planning method only performs static planning based on the location of the data. Although it proposes dynamic planning based on the dynamic perception of the computing resources, storage resources, and data volume of upstream and downstream data centers, it does not provide a specific planning scheme.

[0004] Based on this background, the inventors of this application provide a dynamic planning method for cross-center collaborative tasks based on a pipeline mechanism.

Summary of the Invention

[0005] To improve the efficiency of cross-data center collaborative analysis, this invention provides a dynamic planning method for cross-data center collaborative tasks based on a pipeline mechanism, enabling efficient orchestration and execution of cross-data center collaborative tasks for complex analysis scenarios.

[0006] To achieve the above objectives, the present invention adopts the following technical solution:

[0007] A dynamic planning method for cross-center collaborative computing tasks based on a pipeline mechanism includes the following steps:

[0008] 1) Collaborative Task Orchestration. Users orchestrate data sources and collaborative model algorithms according to their collaborative needs, and configure cross-center collaborative computing tasks. Data center researchers register data sources and various model algorithms into the collaborative network; users, based on their own needs, search for the required data sources and collaborative model algorithms in the collaborative network and perform the planning and processing of this invention.

[0009] 2) Develop a task execution plan. Divide the cross-center collaborative computing task into subtasks based on whether data migration will occur. After subtask division, the cross-center collaborative computing task forms a high-level DAG (Directed Acyclic Graph), in which the subtasks are the nodes and the execution order of the subtasks defines the edges.

[0010] 3) Initialize subtask execution nodes. Based on the task execution plan strategy, different data sources are divided into different subtasks. A subtask is the smallest unit of execution. When a subtask contains a data source component, it is dispatched to the data center where the data source resides; when it does not, its execution node needs to be dynamically planned. Different data sources appear as different components. The task partitioning strategy splits subtasks at each Merge operation, so different data sources are always assigned to different subtasks: placing two data sources in the same subtask would require a Merge operation inside that subtask, which the partitioning rule excludes.

[0011] 4) Dynamically plan the execution of subtasks. Based on the execution status of upstream subtasks and the resource status of each data center, dynamically plan the execution nodes of subtasks.
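
The structure described in steps 2) to 4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Subtask` class, the `initialize_nodes` and `execution_order` helpers, and the sample task are all hypothetical names introduced here. Only the rules themselves (subtasks as DAG nodes, dispatch to the data center of the contained data source, top-down execution order) come from the text.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Optional


@dataclass
class Subtask:
    name: str
    data_source: Optional[str] = None             # at most one data source per subtask
    upstream: list = field(default_factory=list)  # names of upstream subtasks (DAG edges)
    node: Optional[str] = None                    # execution data center, once known


def initialize_nodes(subtasks, source_location):
    """Step 3: pin each subtask that contains a data source to that source's data center."""
    for t in subtasks.values():
        if t.data_source is not None:
            t.node = source_location[t.data_source]


def execution_order(subtasks):
    """Top-down order of the DAG: every subtask comes after its upstream subtasks."""
    graph = {t.name: set(t.upstream) for t in subtasks.values()}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical task: two source subtasks feeding one downstream subtask.
subtasks = {
    "T1": Subtask("T1", data_source="Data1"),
    "T2": Subtask("T2", data_source="Data2"),
    "T3": Subtask("T3", upstream=["T1", "T2"]),
}
initialize_nodes(subtasks, {"Data1": "DataCenter-A", "Data2": "DataCenter-B"})
order = execution_order(subtasks)

# Subtasks still without a node are the ones step 4 must plan dynamically.
pending = [t.name for t in subtasks.values() if t.node is None]
```

In this sketch `pending` contains only `T3`, the subtask without a data source component, and `order` always places `T3` after both of its upstream subtasks.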

[0012] Furthermore, the above method first orchestrates the collaborative task T. The collaborative network is set to consist of five data center nodes: DataCenter-A, DataCenter-B, DataCenter-C, DataCenter-D, and DataCenter-E. DataCenter-A, DataCenter-B, and DataCenter-C register three data sources, Data1, Data2, and Data3, with data sizes of Size1, Size2, and Size3, respectively. Users orchestrate collaborative tasks based on the three data sources Data1, Data2, and Data3 according to their collaborative analysis needs, as shown in Figure 1.

[0013] Furthermore, the above method formulates a task execution plan for collaborative task T. First, the cross-center collaborative computing task is divided into subtasks based on whether data migration will occur; the division strategy is shown in Figure 2 (left), and the result after division is shown in Figure 2 (right). The data volume threshold in the collaborative network is set to Xmax, and the task execution plan is optimized by merging subtasks according to the difference in data source volume. For example, if the difference in the number and size of the data sources contained in the two upstream parallel subtasks T_i1 and T_i2 of subtask T' is less than the data volume threshold Xmax in the collaborative network, subtask T' is assigned to a new data center. If the difference in the number and size of the data sources contained in T_i1 and T_i2 is greater than or equal to Xmax, and the number and size of the data sources contained in T_i1 are greater than those contained in T_i2, subtask T' is merged into T_i1. Merging optimization reduces data transfer across tasks while ensuring that only a small amount of data is transferred. Figure 3 shows the result of execution plan optimization when the difference in the number and size of the data sources contained in the upstream parallel subtasks exceeds the threshold.
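
The merge rule in this paragraph can be sketched as a small decision function. This is an illustrative sketch only: the function name `optimize_merge` and its return values are hypothetical, the "number and size of data sources" is modeled as a single data volume, and the mirrored case (T_i2 larger than T_i1) is an assumption, since the text only spells out the case where T_i1 is larger.

```python
def optimize_merge(size_i1: float, size_i2: float, x_max: float) -> str:
    """Decide where downstream subtask T' goes, given its two upstream subtasks.

    All three arguments share one unit (for example MB). Returns "dynamic" when
    T' stays a separate subtask whose data center is planned later, otherwise
    the name of the upstream subtask that T' is merged into.
    """
    if abs(size_i1 - size_i2) < x_max:
        # Upstream volumes are comparable: keep T' separate.
        return "dynamic"
    # The text covers size_i1 > size_i2; the mirrored case is an assumption here.
    return "T_i1" if size_i1 > size_i2 else "T_i2"


# With a 1024 MB threshold, comparable inputs keep the plan unchanged,
# while a large imbalance merges T' into the heavier upstream subtask.
balanced = optimize_merge(1024, 512, 1024)
skewed = optimize_merge(4096, 512, 1024)
```

Merging into the heavier side means only the smaller upstream output has to cross data centers, which matches the stated goal of keeping transferred data small.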

[0014] Furthermore, after the merge optimization, the above method initializes the subtask execution nodes. A subtask is the smallest execution unit in the network. If it contains a data source component, it is distributed to the data center where the data source is located for execution. If it does not contain a data source component, its execution node needs to be dynamically planned. Figure 4 is a schematic diagram of the task execution plan result.

[0015] Furthermore, the above method is executed in a top-down manner according to the DAG (Directed Acyclic Graph). Subtasks with assigned execution nodes are distributed to their corresponding nodes for execution. The allocation method for a subtask T' without an assigned execution node is as follows. First, construct the network node queue of the current collaborative network, and initialize the distribution across network nodes of the data output by the upstream subtasks T_i1 and T_i2 on which subtask T' depends, with all other network nodes initialized to 0. Then sort the nodes by data volume. When the upstream subtasks T_i1 and T_i2 on which T' depends output the same data volume, the network nodes included in collaborative task T, which have a smaller potential overhead, are prioritized. This yields the optimal execution node ranking. Finally, taking the computing resources of the network nodes into account, a computing resource threshold Cmax is set to exclude nodes whose computing resources do not meet the conditions, and the optimal execution node is returned. The network node is a data center. Figure 5 shows the flowchart of the dynamic planning strategy.
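
A minimal Python sketch of this node selection follows. It rests on several assumptions that the text does not settle: the function name `rank_execution_nodes` and its parameters are hypothetical, nodes are ranked by data volume in descending order, and Cmax is interpreted as an upper bound on a node's current compute utilization (the source does not define the metric).

```python
def rank_execution_nodes(all_nodes, upstream_output, task_nodes, utilization, c_max):
    """Return candidate data centers for subtask T', best first.

    all_nodes:       every data center in the collaborative network (the node queue)
    upstream_output: {node: data volume output there by T_i1 / T_i2}
    task_nodes:      nodes already taking part in collaborative task T
    utilization:     {node: current compute utilization, 0.0 to 1.0} (assumed metric)
    c_max:           utilization threshold; nodes above it are excluded (assumed meaning)
    """
    # Nodes that hold no upstream output start at 0.
    volume = {n: upstream_output.get(n, 0) for n in all_nodes}
    ranked = sorted(
        all_nodes,
        # Larger local volume first; on ties, prefer nodes already in task T.
        key=lambda n: (-volume[n], n not in task_nodes),
    )
    return [n for n in ranked if utilization[n] <= c_max]


nodes = ["DataCenter-A", "DataCenter-B", "DataCenter-C", "DataCenter-D", "DataCenter-E"]
candidates = rank_execution_nodes(
    nodes,
    upstream_output={"DataCenter-A": 500, "DataCenter-B": 500},
    task_nodes={"DataCenter-A", "DataCenter-B", "DataCenter-C"},
    utilization={"DataCenter-A": 0.95, "DataCenter-B": 0.4, "DataCenter-C": 0.2,
                 "DataCenter-D": 0.1, "DataCenter-E": 0.1},
    c_max=0.8,
)
best = candidates[0] if candidates else None
```

Here DataCenter-A ties with DataCenter-B on volume but is excluded by the utilization threshold, so `best` is DataCenter-B. Placing T' where most of its input already resides keeps the migrated volume small.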

[0016] The beneficial effects of this invention are as follows:

[0017] The cross-domain pipeline dynamic planning method of the present invention can effectively improve the execution efficiency of cross-node collaborative tasks and maximize resource utilization.

Description of the Attached Figures

[0018] Figure 1 is a flowchart of task definition.

[0019] Figure 2 is a flowchart of the task division strategy.

[0020] Figure 3 is a diagram of task execution plan optimization.

[0021] Figure 4 is a schematic diagram of the task execution plan result.

[0022] Figure 5 is a flowchart of the dynamic planning strategy.

[0023] Figure 6 is a schematic diagram of task orchestration in the example.

[0024] Figure 7 is a diagram of the task execution plan in the example.

[0025] Figure 8 is a diagram of the task execution plan result in the example.

[0026] Figure 9 is a diagram of the dynamic planning result for the task in the example.

Detailed Implementation

[0027] The present invention will now be described in further detail with reference to the accompanying drawings. The examples given are only for explaining the present invention and are not intended to limit the scope of the present invention.

[0028] This embodiment provides a dynamic planning method for cross-center collaborative tasks based on a pipeline mechanism, as detailed below:

[0029] 1) Users select collaborative data sources and collaborative model algorithms based on the collaborative network to orchestrate tasks;

[0030] 2) Develop a task execution plan for collaborative tasks;

[0031] 3) Initialize subtask execution nodes according to the task execution plan;

[0032] 4) Dynamically perceive network resource status to realize sub-task planning and scheduling.

[0033] The construction process of this method is illustrated using a specific user requirement as an example. The requirement is as follows: the collaborative network consists of three data centers, DataCenter-A, DataCenter-B, and DataCenter-C. Research users publish the shared grassland aboveground biomass datasets Species1 and Species2 on DataCenter-A and DataCenter-B, respectively, with data sizes of 1GB and 512MB. The user merges these two datasets to perform online processing and distribution analysis of grassland aboveground biomass.

[0034] First, users retrieve the relevant data sources and model algorithms based on their needs, and then orchestrate the collaborative task, as shown in Figure 6. The task performs quality control processing, such as field checks and null value filtering, on the grassland aboveground biomass datasets Species1 and Species2 from two different sources, and then merges the data from the two sources to calculate the total dry weight, average value, and standard deviation.

[0035] Then, an execution plan is formulated for the orchestrated task. The data volume threshold in the collaborative network is set to Xmax = 1GB. The task is divided into subtasks based on whether data migration will occur. The division result is shown in Figure 7: there are three subtasks in total, namely Task1, Task2, and Task3. The task execution plan is then merged and optimized according to the data volume difference. In this task, the execution of Task3 depends on the execution results of Task1 and Task2. The difference between the data Size1 (1GB) processed by Task1 and the data Size2 (512MB) processed by Task2 is 512MB, which is less than the collaborative network threshold Xmax, that is, |Size1 - Size2| < Xmax. The execution times of Task1 and Task2 are therefore considered roughly equivalent, so no merge is performed: the execution plan remains unchanged, and the execution node of Task3 needs to be dynamically planned.
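
The threshold comparison in this paragraph can be checked with a few lines of Python. The variable names are hypothetical, and the sketch assumes 1GB = 1024MB, which is consistent with the 512MB difference stated in the text.

```python
MB = 1
GB = 1024 * MB      # assumption: binary units, matching the 512MB difference above

x_max = 1 * GB      # data volume threshold Xmax of the collaborative network
size1 = 1 * GB      # Species1, processed by Task1
size2 = 512 * MB    # Species2, processed by Task2

difference = abs(size1 - size2)

# True means |Size1 - Size2| < Xmax: no merge, Task3 is planned dynamically.
keep_plan = difference < x_max
```

Had Species2 been, say, only a few megabytes against a much larger Species1 with the difference reaching Xmax, the comparison would fail and the merge optimization would apply instead.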

[0036] Third, the subtask execution nodes are initialized according to the task execution plan. A subtask is the smallest execution unit in the network. When a subtask contains a data source component, it is distributed to the data center where the data source is located for execution. When a subtask does not contain a data source component, its execution node needs to be dynamically planned. The initialized task execution plan is shown in Figure 8. Task1 contains Species1, which is located in DataCenter-A, so the execution node of Task1 is initialized to DataCenter-A. Task2 contains Species2, which is located in DataCenter-B, so the execution node of Task2 is initialized to DataCenter-B. Task3 depends on the outputs of Task1 and Task2, and its execution node needs to be dynamically planned.

[0037] Fourth, dynamically perceive network resource conditions to achieve subtask planning and scheduling. This collaborative task is executed in a top-down manner according to a DAG (Directed Acyclic Graph). Task1 and Task2 are distributed to DataCenter-A and DataCenter-B respectively for parallel execution. After execution, Task1 outputs 768MB of data, and Task2 outputs 256MB of data. The execution node strategy for Task3 is as follows: First, construct the current network node queue (DataCenter-A, DataCenter-B, DataCenter-C), initialize the data distribution of the upstream task nodes Task1 and Task2 that the current task Task3 depends on, and sort them according to the data size. The sorting result is (DataCenter-A: 768MB, DataCenter-B: 256MB, DataCenter-C: 0MB). The system dynamically senses the computing resources of each network node. If DataCenter-A meets the computing resources required for Task3, Task3 is scheduled to be executed in DataCenter-A. If DataCenter-A does not meet the resource requirements, Task3 is scheduled to be executed in DataCenter-B. If DataCenter-B does not meet the resource requirements, Task3 is scheduled to be executed in DataCenter-C. If DataCenter-C does not meet the resource requirements, Task3 scheduling fails, and the entire task execution fails.
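
The fallback chain for Task3 described above can be sketched as follows. The output volumes are those of the embodiment; the `schedule` function, the `SchedulingError` class, and the resource snapshot in `available` are hypothetical and only illustrate the control flow.

```python
class SchedulingError(RuntimeError):
    """Raised when no data center can run the subtask."""


def schedule(task, ranked_nodes, has_resources):
    """Try data centers in ranked order; fail the task if none qualifies."""
    for node in ranked_nodes:
        if has_resources(node):
            return node
    raise SchedulingError(f"{task}: no data center meets the resource requirements")


# Output volumes (MB) from the embodiment; DataCenter-C holds no upstream output.
output_mb = {"DataCenter-A": 768, "DataCenter-B": 256, "DataCenter-C": 0}
ranked = sorted(output_mb, key=output_mb.get, reverse=True)

# Hypothetical resource snapshot: DataCenter-A is busy, so Task3 falls back to B.
available = {"DataCenter-A": False, "DataCenter-B": True, "DataCenter-C": True}
chosen = schedule("Task3", ranked, available.get)
```

With this snapshot Task3 lands on DataCenter-B. If every entry in `available` were `False`, `schedule` would raise `SchedulingError`, mirroring the case where Task3 scheduling fails and the entire task fails.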

[0038] Although specific embodiments of the invention have been disclosed for illustrative purposes to aid in understanding and implementing the invention, those skilled in the art will understand that various substitutions, variations, and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the invention should not be limited to the content disclosed in the preferred embodiments, and the scope of protection claimed by the invention is defined by the claims.

Claims

1. A dynamic planning method for cross-center collaborative tasks based on a pipeline mechanism, comprising the following steps: 1) orchestrating data sources and collaborative model algorithms according to collaborative requirements, and configuring and generating a cross-center collaborative computing task; 2) dividing the cross-center collaborative computing task into subtasks based on whether data migration will occur, forming a DAG (Directed Acyclic Graph), wherein each subtask is a node in the DAG and the execution order of the subtasks forms the edges between the related nodes; 3) based on the DAG, dividing different data sources into different subtasks, wherein the method for dividing different data sources into different subtasks is as follows: first, construct the network node queue of the current collaborative network, and initialize the distribution of upstream data outputs that each network node currently assigns to its subtasks; then, sort the subtasks not assigned a data source according to the amount of data they depend on, and, starting from the subtask with the largest amount of dependent data, assign each subtask not assigned a data source to a network node that meets the resource requirements and has the lowest overhead; the network node is a data center; 4) when a subtask T is assigned a data source D, sending subtask T to the data center where data source D is located for execution; for a subtask T' that is not assigned a data source, determining the data center to execute subtask T' based on the execution status of the upstream subtasks of subtask T' and the resource status of each data center, and then sending subtask T' to the corresponding data center for execution.

2. The method according to claim 1, characterized in that each data center registers with the collaborative network; the registration information includes the data sources and data volume contained in the data center.

3. The method according to claim 1, characterized in that the process is executed from top to bottom based on the DAG (Directed Acyclic Graph), dividing different data sources into different subtasks.

4. The method according to claim 1, characterized in that, when the two upstream data volumes on which a subtask T' that has not been assigned a data source depends are of equal size, the subtask T' is scheduled to a network node with lower potential overhead among the network nodes included in the collaborative task.

5. The method according to claim 3, characterized in that the method for determining the data center to execute subtask T' is as follows: if the difference in the required number of data sources between the two upstream parallel subtasks T_i1 and T_i2 of subtask T' is less than the data volume threshold Xmax in the collaborative network, subtask T' is assigned to a new data center; if the difference in the required number of data sources between the two upstream parallel subtasks T_i1 and T_i2 of subtask T' is greater than or equal to the data volume threshold Xmax in the collaborative network, and the number and size of the data sources required by subtask T_i1 are less than the number and size of the data sources required by T_i2, subtask T' is sent to the data center where the data source required by subtask T_i1 is located for execution.