Two-stage self-adaptive scheduling method suitable for large-scale parallel data processing tasks

A data processing and scheduling technology, applied in the fields of electrical digital data processing, program start/switch, and program control design, that achieves more efficient use of system resources, high flexibility, and improved task processing efficiency.

Active Publication Date: 2018-11-30
中国航天系统科学与工程研究院

AI Technical Summary

Problems solved by technology

[0005] The technical problem solved by the present invention is to overcome the deficiencies of the prior art by providing a two-level adaptive scheduling method suitable for large-scale parallel data processing tasks. The method schedules work at two levels, task and subtask, to increase the degree of parallelism. It effectively solves the parallel scheduling difficulty caused by complex dependencies among tasks, achieves orderly and efficient parallel processing of large-scale data processing tasks, and reduces the overall execution time.

Examples

Embodiment Construction

[0053] To execute large-scale data processing tasks in an orderly and efficient manner, the method of the present invention schedules work at two levels, aiming to maximize parallelism and to reduce task waiting and execution time. At the first level, the task level, each task declares the predecessor tasks it depends on, and the scheduler builds a topology from these declarations to ensure that tasks are executed in dependency order, while tasks with no dependencies between them can be executed in parallel. At the second level, the subtask level, each task is divided into a series of actions or functions. The data and resources required by each subtask have already been loaded by the first-level task layer. The purpose of dividing a task into subtasks is to further increase the degree of parallelism: subtasks that have no resource conflicts and no execution-order constraints are assigned to multiple threads for simultaneous execution.
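The two levels described in [0053] can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `schedule` and `run_task`, the worker counts, and the representation of subtasks as zero-argument callables are all assumptions made for the example. Dependency ordering uses Python's standard `graphlib.TopologicalSorter`, and every predecessor named in `deps` is assumed to be a key of `tasks`.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+


def run_task(name, subtasks, subtask_workers=4):
    """Level 2 (hypothetical): run one task's subtasks on multiple threads.

    Assumption: the subtasks have no resource conflicts and no ordering
    constraints among themselves, so they may all run concurrently.
    """
    with ThreadPoolExecutor(max_workers=subtask_workers) as pool:
        # pool.map preserves the input order of the subtasks in its results.
        return name, list(pool.map(lambda fn: fn(), subtasks))


def schedule(tasks, deps, task_workers=2):
    """Level 1 (hypothetical): run tasks in dependency order.

    tasks: {task name: [zero-argument subtask callables]}
    deps:  {task name: set of predecessor task names}
    Returns (results per task, completion order of tasks).
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError on cyclic dependencies
    results, order, running = {}, [], set()
    with ThreadPoolExecutor(max_workers=task_workers) as pool:
        while sorter.is_active():
            # Submit every task whose predecessors have all finished.
            for name in sorter.get_ready():
                running.add(pool.submit(run_task, name, tasks[name]))
            done, running = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name, output = future.result()
                results[name] = output
                order.append(name)
                sorter.done(name)  # unblocks the successors of this task
    return results, order


# Illustrative task graph: collect -> clean -> report.
example_tasks = {
    "collect": [lambda: "raw"],
    "clean": [lambda: "a", lambda: "b"],
    "report": [lambda: "done"],
}
example_deps = {"clean": {"collect"}, "report": {"clean"}}
example_results, example_order = schedule(example_tasks, example_deps)
```

In this sketch, tasks that share no dependency path are submitted in the same pass of the loop and so may overlap in time, while each task's subtasks are spread across a separate thread pool.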

[0054] Data processing tasks refer to tasks such as data collection, da...

Abstract

The invention discloses a two-stage self-adaptive scheduling method suitable for large-scale parallel data processing tasks. Tasks are scheduled at two stages: a task stage and a subtask stage. The method effectively solves the parallel scheduling difficulty caused by complex dependency relationships among tasks, increases the degree of parallelism, achieves orderly and efficient parallel processing of large-scale data processing tasks, reduces task waiting and execution time, and shortens the overall execution time. In addition, executor runtime statistics can be fed back to the scheduler, which adaptively adjusts the executor pool size and task types and continually optimizes scheduling, so that system resource utilization is improved.
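The abstract does not state how the scheduler turns executor statistics into a new pool size, so the following is only one hypothetical policy, shown to make the feedback loop concrete. The `ExecutorStats` fields, the function `next_pool_size`, the doubling and halving rule, and the size bounds are all assumptions of this sketch. The adjustment of task types mentioned in the abstract is not covered.

```python
from dataclasses import dataclass


@dataclass
class ExecutorStats:
    """Hypothetical statistics reported by the executors to the scheduler."""
    busy: int    # executors currently running a task
    idle: int    # executors waiting for work
    queued: int  # tasks waiting for a free executor


def next_pool_size(stats, min_size=1, max_size=32):
    """Return an adjusted executor pool size (assumed policy, not the patent's).

    Grow when tasks are queued and no executor is idle; shrink when
    executors are idle and nothing is queued; otherwise keep the size.
    """
    current = stats.busy + stats.idle
    if stats.queued > 0 and stats.idle == 0:
        # Add one executor per queued task, but at most double the pool.
        target = current + min(stats.queued, max(current, 1))
    elif stats.queued == 0 and stats.idle > 0:
        # Release half of the idle executors.
        target = current - stats.idle // 2
    else:
        target = current
    # Clamp to the configured bounds.
    return max(min_size, min(max_size, target))
```

A scheduler following this sketch would call `next_pool_size` periodically with fresh statistics and resize its executor pool toward the returned value.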

Description

Technical field

[0001] The invention relates to a two-level adaptive scheduling method suitable for large-scale parallel data processing tasks, and belongs to the field of data processing task scheduling.

Background

[0002] With the continuous development of Internet technology, the demand for large-scale data storage and processing is growing in many fields, and the requirements on processing efficiency and processing cost are rising as well. How to allocate large-scale data processing tasks to multi-processor systems in a reasonable way, improve execution efficiency, and minimize the overall execution time has become a problem that urgently needs to be solved.

[0003] Traditional general-purpose task scheduling algorithms, such as first-come-first-served scheduling, highest-priority-first scheduling, and time-slice round-robin scheduling, have certain limitations and are not suitable for large-scale data processing task scheduli...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/48
CPC: G06F9/4881
Inventor: 顾升高刘瑞齐俊鹏胡泉杨越孙毅方
Owner: 中国航天系统科学与工程研究院