Data parallel job scheduling method based on branch DAG dependency

A job scheduling and branching technology, applied in digital data processing, other database retrieval, other database indexing, etc. It addresses the high complexity of existing methods, which makes them hard to apply in practice, and reduces job completion time and overhead.

Active Publication Date: 2019-09-24
NAT UNIV OF DEFENSE TECH

AI Technical Summary

Problems solved by technology

However, the complexity of this method is very high, which makes it difficult to apply in practice.



Examples


Specific Embodiment

[0055] To make the technical solutions of this application easier to understand, Figures 2 to 10 show a specific embodiment of the data parallel job scheduling method based on branch DAG dependency of the present invention, which includes the following steps:

[0056] A data parallel job scheduling method based on branch DAG dependency, comprising the following steps:

[0057] Step 1: The job terminal receives the job;

[0058] Step 2: Traverse the DAG task graph of the job and find the convergence points and bifurcation points of the DAG. A convergence point in the DAG task graph is called a branch synchronization; a chain in the DAG task graph that contains no convergence or fork is called a branch; and a branch that does not depend on any other branch, or whose prerequisite branches have all been executed, is called a suspended branch;

[0059] As shown in Figure 3, the schematic diagram of branches and branch synchronization, for a data parallel job, ...
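The terms defined in Step 2 can be illustrated with a minimal Python sketch. It is not the patent's implementation. It assumes the DAG is given as a dictionary mapping each task to its successors, that fork and convergence tasks are treated as separate tasks outside any branch, and that a branch counts as suspended once every predecessor of its first task has finished. The names `decompose`, `suspended_branches`, and the example graph are hypothetical.

```python
from collections import defaultdict

def decompose(dag):
    """Split a DAG (task -> list of successor tasks) into branches.

    Assumption: a branch is a maximal chain of tasks that are neither
    convergence points (in-degree > 1) nor forks (out-degree > 1).
    """
    nodes = set(dag) | {v for succs in dag.values() for v in succs}
    preds = defaultdict(list)
    for u, succs in dag.items():
        for v in succs:
            preds[v].append(u)
    succs_of = {n: list(dag.get(n, [])) for n in nodes}

    syncs = {n for n in nodes if len(preds[n]) > 1}      # branch synchronizations
    forks = {n for n in nodes if len(succs_of[n]) > 1}   # bifurcation points
    chain = nodes - syncs - forks                        # tasks that belong to branches

    def is_head(n):
        # A chain task starts a branch if it has no predecessor,
        # or its single predecessor is a fork or convergence task.
        p = preds[n]
        return not p or p[0] not in chain

    branches = []
    for head in sorted(n for n in chain if is_head(n)):
        branch, cur = [head], head
        # Follow the single successor while it is still a chain task.
        while succs_of[cur] and succs_of[cur][0] in chain:
            cur = succs_of[cur][0]
            branch.append(cur)
        branches.append(branch)
    return branches, syncs, forks, preds

def suspended_branches(branches, preds, done):
    """Unfinished branches whose first task has all predecessors in `done`."""
    return [b for b in branches
            if not set(b) <= done and all(p in done for p in preds[b[0]])]

# Hypothetical example: a forks into two chains that converge at f.
example = {"a": ["b", "c"], "b": ["d"], "c": ["e"],
           "d": ["f"], "e": ["f"], "f": ["g"]}
branches, syncs, forks, preds = decompose(example)
print(branches, syncs, forks)
print(suspended_branches(branches, preds, {"a"}))
```

In this example the two chains between the fork `a` and the convergence point `f` are separate branches, and both become suspended as soon as `a` has finished, while the branch after `f` has to wait for the synchronization.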


Abstract

The invention discloses a data parallel job scheduling method based on branch DAG dependency. The method comprises: 1, a job end receiving a job; 2, traversing the DAG task graph of the job and finding the branches and branch synchronizations in the DAG graph, as well as the suspended branches among the branches; 3, finding the suspended branches in each DAG graph at the job end and adding them to a suspended branch set B; 4, executing a branch scheduling algorithm on the branches in the suspended branch set B to obtain a branch scheduling sequence P; 5, when computing resources are available, allocating execution units and executing the branch tasks according to the branch scheduling sequence P; and 6, repeating steps 3 to 5 until the branches in every DAG at the job end have been executed. By determining the urgency of each branch, non-urgent branches are scheduled with a delay, which saves computing resources for more urgent work and shortens the branch synchronization completion time. Verification shows that, compared with other scheduling methods, the average job completion time is reduced by 10-15%.
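The loop formed by steps 3 to 6 can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not say how urgency is computed, so the sketch takes the urgency measure as a caller-supplied function and uses a made-up table of values; it also assumes a fixed number of execution slots per round and that every started branch finishes within its round. The names `schedule`, `deps`, `urgency`, and `slots` are hypothetical.

```python
def schedule(branches, deps, urgency, slots):
    """Return a list of rounds; each round lists the branches started in it.

    branches: list of branch names
    deps:     branch name -> set of branches that must finish first
    urgency:  function mapping a branch name to a number (higher runs first);
              a stand-in, since the source does not define the measure
    slots:    number of branches that can be started per round (assumption)
    """
    done, rounds = set(), []
    while len(done) < len(branches):
        # Step 3: collect the suspended branches into the set B.
        B = [b for b in branches
             if b not in done and deps.get(b, set()) <= done]
        if not B:
            raise ValueError("no runnable branch: dependencies form a cycle")
        # Step 4: order B by urgency to obtain the scheduling sequence P.
        P = sorted(B, key=urgency, reverse=True)
        # Step 5: start as many branches as there are free slots, following P.
        started = P[:slots]
        rounds.append(started)
        # Simplification: each started branch completes before the next round.
        done.update(started)
        # Step 6: repeat until every branch has been executed.
    return rounds

# Hypothetical input: b4 can only start after b1 and b2 have finished.
branch_names = ["b1", "b2", "b3", "b4"]
branch_deps = {"b4": {"b1", "b2"}}
urgency_table = {"b1": 3, "b2": 2, "b3": 1, "b4": 2}  # made-up values

rounds = schedule(branch_names, branch_deps, urgency_table.get, slots=2)
print(rounds)
```

With these made-up values the low-urgency branch `b3` is deferred to the second round, so that both slots of the first round go to the branches that `b4` is waiting on.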

Description

Technical field

[0001] The invention belongs to the field of parallel and distributed computing, and in particular relates to a data parallel job scheduling method based on branch DAG dependency.

Background

[0002] Big data analysis, such as machine learning, graph computing, and stream computing, has become a key part of daily life. Platforms such as Hadoop and Spark have been proposed to process data-parallel jobs efficiently. However, this involves some challenging technical issues, such as job scheduling and network communication. For big data analysis jobs, job completion time (JCT) is an extremely important indicator. JCT is the time from submission to completion of a data parallel job. A data parallel job consists of multiple computation stages and the network communication between those stages. The computation stages are executed in a specified order to ensure that dependencies are not violated. This dependency relations...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/48, G06F16/901
CPC: G06F9/4881, G06F16/9024
Inventor: 李东升, 胡智尧, 张一鸣
Owner: NAT UNIV OF DEFENSE TECH