Hadoop task scheduling method and device

A task scheduling technology in the field of multiprogramming arrangements, program initiating/switching, resource allocation, etc. It addresses the problems that no effective solution has been proposed so far, that existing schedulers increase I/O resource usage and network bandwidth consumption, and that the FIFO scheduler cannot make good use of cluster resources. Its effect is to optimize scheduling and improve resource utilization.

Active Publication Date: 2019-12-27
BANK OF CHINA

AI Technical Summary

Problems solved by technology

FIFO preferentially runs the tasks of the job at the head of the queue. This can reduce the throughput of the entire system and severely limits the processing power of the cluster: tasks of the same job often have the same characteristics, so I/O and CPU resources are not fully used, and a task performing I/O blocks because the scheduler prevents it from using the CPU until the I/O operation is complete.
With the increase in the number of users and user programs in the Hadoop cluster, the FIFO scheduler cannot make good use of cluster resources, nor can it meet the service quality requirements of different applications, and in severe cases, it will also affect the normal operation of jobs.
[0004] Capacity Scheduler divides resources among the queues in proportion and sets strict constraints to prevent any queue from monopolizing resources. This solves the problem of multi-user scheduling, but the scheduling strategy lacks support for load balancing, and data locality is not ideal.
[0005] Fair Scheduler tries to allocate resources equally to all jobs. If a user submits a new job, some resources are released to the new job. This method ensures that all jobs get the same amount of resources, but data locality is not ideal, so data must be fetched from other nodes, which increases I/O resource usage and network bandwidth consumption.
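The equal-share behaviour described in [0005] can be illustrated with a minimal sketch. The function name `fair_shares` and the notion of a fixed pool of task slots are assumptions made for illustration only. Hadoop's actual Fair Scheduler is more elaborate (for example, it supports weights and minimum shares), and this sketch does not model data locality.

```python
def fair_shares(total_slots, jobs):
    """Split a pool of task slots as evenly as possible among active jobs.

    Hypothetical helper: any slots left over after the even split are
    handed out one each to the earliest-submitted jobs.
    """
    base, extra = divmod(total_slots, len(jobs))
    return {job: base + (1 if i < extra else 0) for i, job in enumerate(jobs)}


# Two jobs share 12 slots; when a third job arrives, each existing job
# gives up part of its share so that all three end up equal.
before = fair_shares(12, ["job_a", "job_b"])
after = fair_shares(12, ["job_a", "job_b", "job_c"])
```

Recomputing the shares whenever a job is submitted is what "releasing" resources to the new job amounts to in this simplified picture.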
[0006] For the above problems, no effective solutions have been proposed so far.



Embodiment Construction

[0026] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0027] Before introducing the embodiments of the present invention, the technical terms involved in the embodiments of the present invention are firstly introduced.

[0028] 1. MapReduce is a programming model for parallel computing of large-scale data sets.
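As an illustration of the MapReduce model named in [0028], the sketch below runs a word count in a single Python process. It mimics only the map, shuffle, and reduce phases and does not use Hadoop's API. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values collected for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map over every split, then shuffle and reduce the output."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle(mapped))
```

In a real cluster each call to `map_phase` would run as a separate task on the node holding its input split, which is why task placement and data locality matter to the scheduler.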

[0029] 2. Hadoop is an open-source framework, developed on the basis of the MapReduce computing model and the Google File System, for processing large-scale data in a distributed environment. The Hadoop...


Abstract

The invention provides a Hadoop task scheduling method and device. The method comprises the following steps: obtaining information on a plurality of tasks of a job that have already run on each node, and determining the job type on each node from that information; predicting the load of each node according to the job type on that node; determining the fitness between the job type and each node according to the job type and the load of each node; and allocating the tasks of the job that have not yet run to the nodes according to the job type and the fitness of each node. Because task scheduling is based on load prediction, the scheme can increase the resource utilization of the Hadoop cluster.
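The four steps of the abstract can be sketched as follows. The abstract does not say how the job (operation) type is determined, how load is predicted, or how fitness (adaptability) is computed, so every rule below is an assumption chosen for illustration: the type is the resource with the higher mean utilization, the predicted load is an exponentially weighted average, fitness is the remaining headroom, and allocation is greedy. The names `TaskRecord`, `node_fitness`, `allocate`, and the parameters `alpha` and `task_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    cpu_util: float  # fraction of node CPU used while the task ran, 0..1
    io_util: float   # fraction of node I/O bandwidth used, 0..1


def job_type(records):
    """Step 1 (assumed rule): the resource with the higher mean utilization."""
    cpu = sum(r.cpu_util for r in records) / len(records)
    io = sum(r.io_util for r in records) / len(records)
    return "cpu" if cpu >= io else "io"


def predict_load(records, kind, alpha=0.5):
    """Step 2 (assumed rule): exponentially weighted average of the
    utilization of the resource that the job type stresses."""
    samples = [r.cpu_util if kind == "cpu" else r.io_util for r in records]
    load = samples[0]
    for sample in samples[1:]:
        load = alpha * sample + (1 - alpha) * load
    return load


def node_fitness(records):
    """Step 3 (assumed rule): fitness is the predicted headroom."""
    kind = job_type(records)
    return kind, 1.0 - predict_load(records, kind)


def allocate(history, pending, task_cost=0.1):
    """Step 4 (assumed rule): give each pending task to the fittest node,
    then lower that node's fitness by an estimated per-task cost."""
    fit = {node: node_fitness(records)[1] for node, records in history.items()}
    plan = {node: [] for node in history}
    for task in pending:
        best = max(fit, key=fit.get)
        plan[best].append(task)
        fit[best] -= task_cost
    return plan


history = {
    "node1": [TaskRecord(0.9, 0.1), TaskRecord(0.9, 0.1)],  # heavily loaded
    "node2": [TaskRecord(0.2, 0.1), TaskRecord(0.2, 0.1)],  # lightly loaded
}
plan = allocate(history, ["t1", "t2", "t3"])
```

With this made-up history, all three pending tasks go to `node2`, because its predicted CPU load leaves far more headroom than `node1` even after each assignment is charged against it.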

Description

Technical field

[0001] The invention relates to the technical field of Hadoop task scheduling, in particular to a Hadoop task scheduling method and device.

Background technique

[0002] Hadoop is an open-source distributed storage and processing system for big data batch processing jobs, such as big data analysis and web page indexing. Hadoop's default job scheduling algorithm is based on FIFO. Currently, Hadoop provides a variety of job schedulers, mainly FIFO (the default scheduler), Capacity Scheduler (computing capacity scheduler) and Fair Scheduler (fair scheduler).

[0003] Clusters often run different types of jobs at the same time, and these different types of workloads have different resource requirements. For example, I/O-intensive workloads use more I/O resources, while CPU-intensive workloads use more computing resources. FIFO preferentially runs the tasks of the same job at the top of the queue, whic...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/48, G06F9/50
CPC: G06F9/4881, G06F9/505, G06F9/5061
Inventor: 祝春祥, 翁星晨
Owner: BANK OF CHINA