
Scientific workflow scheduling method and device

A workflow scheduling technology, applied in the field of data processing, that addresses problems such as severe load imbalance, long scheduling-algorithm run time, and poor scheduling quality.

Active Publication Date: 2014-09-10
SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

Because DAG scheduling is an NP-complete problem, an optimal scheduling strategy cannot in practice be obtained by strict mathematical derivation. Scholars in China and abroad have therefore proposed many heuristic and meta-heuristic algorithms.
The Myopic algorithm is the simplest scheduling algorithm. Each time, it picks an arbitrary schedulable task and assigns it to the best resource for that task, but the result is often a long execution time and poor resource load balance.
The Min-Min algorithm executes the task with the minimum expected execution time among all executable tasks and allocates the corresponding resource, repeating until all tasks are scheduled. It generally shortens the total execution time, but when the resources differ greatly the load becomes unbalanced and the total execution time grows accordingly.
The Max-Min algorithm is an improvement on Min-Min. Each time, it selects only the task with the maximum expected execution time among all executable tasks and maps it to the resource that takes the least time. It balances load well, but it is not as good as Min-Min when there are few short tasks and many long tasks.
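The difference between Min-Min and Max-Min can be illustrated with a minimal Python sketch. It is not taken from the patent: it uses the common textbook form in which each task is ranked by its earliest possible finish time (resource ready time plus expected execution time), and the task names, resource names, and timings in `demo_times` are hypothetical.

```python
from typing import Dict, List, Tuple

Times = Dict[str, Dict[str, float]]  # exec_time[task][resource], hypothetical input


def greedy_schedule(exec_time: Times, pick_longest: bool) -> Tuple[List[Tuple[str, str, float]], float]:
    """Textbook Min-Min (pick_longest=False) or Max-Min (pick_longest=True)."""
    resources = sorted(next(iter(exec_time.values())))
    ready = {r: 0.0 for r in resources}  # time at which each resource becomes free
    unscheduled = set(exec_time)
    plan = []
    while unscheduled:
        # Earliest (finish time, resource) that each remaining task could obtain.
        best = {t: min((ready[r] + exec_time[t][r], r) for r in resources)
                for t in unscheduled}
        choose = max if pick_longest else min
        # Ties are broken by task name so the result is deterministic.
        task = choose(unscheduled, key=lambda t: (best[t][0], t))
        finish, resource = best[task]
        ready[resource] = finish
        unscheduled.remove(task)
        plan.append((task, resource, finish))
    return plan, max(ready.values())


# Hypothetical example: two short tasks, one long task, one fast and one slow resource.
demo_times = {
    "a": {"fast": 2.0, "slow": 4.0},
    "b": {"fast": 3.0, "slow": 6.0},
    "c": {"fast": 10.0, "slow": 20.0},
}
min_min_plan, min_min_span = greedy_schedule(demo_times, pick_longest=False)
max_min_plan, max_min_span = greedy_schedule(demo_times, pick_longest=True)
print("Min-Min:", min_min_plan, min_min_span)
print("Max-Min:", max_min_plan, max_min_span)
```

With these made-up numbers, Min-Min piles every task onto the fast resource, while Max-Min places the long task first and spreads the short ones, which matches the load-balance remark above. Other inputs can reverse the ranking.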
The Sufferage algorithm first executes the task with the largest scheduling loss. Its load balance is generally good, but it performs poorly in a multi-cluster environment.
DCP (Dynamic Critical Path) first calculates the earliest start time and the latest start time of each task; if the two are equal, the task is considered to lie on the critical path and is executed first. This scheduling algorithm generally takes longer to run than the previous ones, and the total execution time is also longer for irregular workflows.
The genetic algorithm is a meta-heuristic that searches for the optimal solution globally, so the algorithm itself takes the longest to run. It can also fall into a local optimum when the fitness function or other conditions are set improperly, giving unsatisfactory results. Its dynamic characteristics are relatively poor, and its results are not as good as those of the above algorithms.
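The "scheduling loss" used by Sufferage can be made concrete with a short Python sketch. It assumes the usual textbook definition, which the text above does not spell out: a task's loss is the gap between its second-best and best finish times across resources. All task names, resource names, and timings are hypothetical.

```python
from typing import Dict, List, Tuple

Times = Dict[str, Dict[str, float]]  # exec_time[task][resource], hypothetical input


def sufferage_schedule(exec_time: Times) -> Tuple[List[Tuple[str, str, float]], float]:
    """Repeatedly schedule the task that would lose most if denied its best resource."""
    resources = sorted(next(iter(exec_time.values())))
    ready = {r: 0.0 for r in resources}  # time at which each resource becomes free
    unscheduled = set(exec_time)
    plan = []
    while unscheduled:
        choice = None  # (loss, task, resource, finish time)
        for task in sorted(unscheduled):
            finishes = sorted((ready[r] + exec_time[task][r], r) for r in resources)
            best_finish, best_resource = finishes[0]
            # Assumed definition: second-best finish minus best finish.
            loss = finishes[1][0] - best_finish if len(finishes) > 1 else 0.0
            if choice is None or loss > choice[0]:
                choice = (loss, task, best_resource, best_finish)
        _, task, resource, finish = choice
        ready[resource] = finish
        unscheduled.remove(task)
        plan.append((task, resource, finish))
    return plan, max(ready.values())


# Hypothetical example data.
demo_times = {
    "a": {"fast": 2.0, "slow": 4.0},
    "b": {"fast": 3.0, "slow": 6.0},
    "c": {"fast": 10.0, "slow": 20.0},
}
sufferage_plan, sufferage_span = sufferage_schedule(demo_times)
print(sufferage_plan, sufferage_span)
```

The long task `c` is placed first because it would lose the most by being pushed to the slow resource. The sketch covers independent tasks only; it does not model the multi-cluster behaviour mentioned above.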



Examples


Embodiment Construction

[0042] The preferred embodiments of the invention will be further described in detail below.

[0043] As shown in Figure 1, the graph shows the mapping relationship between a scientific workflow and heterogeneous grid computing resources. Scheduling a scientific workflow is the process of mapping its tasks onto different heterogeneous computing resources for execution.

[0044] Figure 2 shows a directed acyclic graph of a scientific workflow. Each circle represents a node, and each node represents a task in the scientific workflow; together these nodes form a task set T = {T0, T1, T2, T3, ..., Ti, ...}. The number next to each node indicates the size of the corresponding task, which can be expressed in MI (million instructions). For example, the sizes of tasks T0 and T1 are 300,000 and 2,500,000, respectively. Assume that the output file size of each task is 1 GB (gigabyte, one billion bytes). The arrow relationship in the figure ...
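A workflow DAG of this kind can be held in a small data structure, sketched here in Python. Only the sizes of T0 and T1 come from the text; the sizes of T2 and T3, the edges, and the resource speed of 1,000 MIPS are hypothetical, since the figure is not reproduced here. The estimate that execution time equals task size divided by resource speed is likewise an assumption.

```python
from typing import Dict, List, Set

# Task sizes in MI (million instructions). T0 and T1 are from the text;
# T2 and T3 are hypothetical placeholders.
size_mi: Dict[str, float] = {
    "T0": 300_000,
    "T1": 2_500_000,
    "T2": 1_000_000,  # hypothetical
    "T3": 500_000,    # hypothetical
}

# predecessors[t] = tasks that must finish before t may start (hypothetical edges).
predecessors: Dict[str, Set[str]] = {
    "T0": set(),
    "T1": {"T0"},
    "T2": {"T0"},
    "T3": {"T1", "T2"},
}


def schedulable_tasks(preds: Dict[str, Set[str]], finished: Set[str]) -> List[str]:
    """Return the unfinished tasks whose predecessors have all finished."""
    return sorted(t for t, p in preds.items() if t not in finished and p <= finished)


def execution_seconds(task: str, mips: float) -> float:
    """Assumed estimate: size in MI divided by resource speed in MIPS."""
    return size_mi[task] / mips


print(schedulable_tasks(predecessors, set()))    # only the entry task
print(schedulable_tasks(predecessors, {"T0"}))   # tasks unlocked by T0
print(execution_seconds("T0", mips=1_000))       # hypothetical 1,000 MIPS resource
```

A scheduler would call `schedulable_tasks` after each task completes to refresh the set of candidates it may map to a resource.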


Abstract

The invention discloses a scientific workflow scheduling method and device under a network environment. The method comprises the following steps: the target heterogeneous computing resources are queried and the computing power PCj of each heterogeneous computing resource is recorded; all schedulable tasks in the target workflow are queried; for each task ti among the schedulable tasks, the task ratio p(ti, rj) on the j-th heterogeneous computing resource rj is computed, where ct(ti, rj) = ext(ti, rj) + rt(ti, rj); among all the task ratios obtained, the largest task ratio p(tm, rn) is found, and the m-th task tm is scheduled to the n-th heterogeneous computing resource rn for execution. The method and device achieve good resource load balance, adapt to both static and dynamic environments, have a short scheduling time and a short total execution time, and offer excellent overall performance among existing scheduling algorithms.
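The selection step in the abstract can be outlined in Python as a rough sketch. The abstract does not give the formula for the task ratio p(ti, rj), nor does it define ext and rt, so all three are passed in as caller-supplied functions. The stand-ins in the demo (ext as size divided by PCj, rt as zero, and the ratio as 1/ct) are placeholders chosen only so that the sketch runs; they are not the patent's definitions.

```python
from typing import Callable, Iterable, Optional, Tuple


def pick_largest_ratio(
    tasks: Iterable[str],
    resources: Iterable[str],
    ext: Callable[[str, str], float],
    rt: Callable[[str, str], float],
    ratio: Callable[[str, str, float], float],
) -> Optional[Tuple[float, str, str]]:
    """Return (p, tm, rn) for the largest task ratio over all task/resource pairs."""
    resources = list(resources)
    best = None
    for t in tasks:
        for r in resources:
            ct = ext(t, r) + rt(t, r)   # ct(ti, rj) = ext(ti, rj) + rt(ti, rj)
            p = ratio(t, r, ct)         # formula not given in the abstract
            if best is None or p > best[0]:
                best = (p, t, r)
    return best


# Hypothetical demo data and stand-in functions (NOT the patent's definitions).
demo_size = {"t1": 100.0, "t2": 400.0}   # task sizes, arbitrary units
demo_pc = {"r1": 10.0, "r2": 20.0}       # computing power PCj of each resource

demo_pick = pick_largest_ratio(
    tasks=demo_size,
    resources=demo_pc,
    ext=lambda t, r: demo_size[t] / demo_pc[r],  # assumed: size / computing power
    rt=lambda t, r: 0.0,                         # assumed: no extra delay
    ratio=lambda t, r, ct: 1.0 / ct,             # placeholder ratio only
)
print(demo_pick)
```

In a full scheduler this step would repeat: after tm is placed on rn, the set of schedulable tasks is refreshed and the ratios are recomputed, until no task remains.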

Description

Technical field

[0001] The invention relates to the field of data processing, in particular to a scientific workflow scheduling method and device.

Background

[0002] A grid is a global network infrastructure that provides virtual services for scientific research and business operations by integrating large-scale, dispersed, and heterogeneous computing resources, storage resources, and data resources. In recent years, more and more scientific fields, such as biomedicine, geography, and astrophysics, have begun to use grids to share, manage, and process large data sets within and between disciplines. In this environment of big data and intensive computing, the application of scientific workflows in grids is becoming more and more important. The main purpose of adopting a scientific workflow is to modularize and encapsulate the complex process of processing big data, so as to enable simple invocation and reuse of processes that require multiple calculations and repeated ...


Application Information

IPC(8): G06F9/48, G06F9/50
Inventor: 李秀, 宋靖东
Owner SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV