Task optimization scheduling method based on Hadoop

The invention relates to Hadoop cluster scheduling technology, applied in the field of Hadoop-based task optimization scheduling, and addresses problems such as the mismatch between the resources assigned to a job and the resources it requires.

Active Publication Date: 2016-04-13
THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP

AI Technical Summary

Problems solved by technology

Slot-based job scheduling creates a mismatch between the resources assigned to a job and the resources the job requires, and this mismatch cannot be resolved by changing the number of slots on a node.



Examples


Embodiment

[0058] In this embodiment, based on the configuration of the machines and the resource requirements of the jobs, the number of slots is set to 13, the number of CPUs to 2, and the memory size to 2 GB. The platform is a distributed multi-node deployment implemented with virtual machines on rack servers. The operating system is CentOS 5.5, and there are 10 deployed nodes: one JobTracker node and nine TaskTracker nodes. To analyze the performance of the new method more accurately, two large jobs with about 8 GB of data and six small jobs with about 500 MB of data are mixed.
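The experimental setup above can be written down as a small data model, which makes the node split and the job mix easy to check. This is only an illustrative sketch: the names `ClusterConfig` and `mixed_workload` are hypothetical, the source does not say whether the 13 slots are per node or cluster-wide, and the sizes are assumed to apply to each job individually.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Values stated in paragraph [0058]; field names are illustrative."""
    nodes: int = 10
    jobtracker_nodes: int = 1
    slots: int = 13          # scope (per node or total) is not stated in the source
    cpus: int = 2
    memory_gb: int = 2
    os_name: str = "CentOS 5.5"

    @property
    def tasktracker_nodes(self) -> int:
        # Every node that is not the JobTracker runs a TaskTracker.
        return self.nodes - self.jobtracker_nodes


def mixed_workload():
    """Return (job name, approximate input size in MB) for the eight mixed jobs.

    Assumption: "about 8 GB" and "about 500 MB" describe each job individually.
    """
    large = [(f"large-{i}", 8 * 1024) for i in range(1, 3)]
    small = [(f"small-{i}", 500) for i in range(1, 7)]
    return large + small
```

Under these assumptions the workload contains eight jobs and the cluster has nine TaskTracker nodes.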

[0059] Table 1 Mixed processing job data list

[0060]

[0061] Configure the commonly used first-in-first-out scheduler (FIFO), the fair scheduler (FairScheduler), and the capacity scheduler (CapacityScheduler) on the Hadoop experimental cluster in turn, submit all jobs in the same way and in the same order, observe ...


Abstract

The invention discloses a Hadoop-based task optimization scheduling method comprising the following steps: analyzing the resource demands of the running tasks in all jobs on every node of a Hadoop cluster and predicting the resource demands of tasks that have not yet been executed; allocating tasks to job nodes according to resource occupancy, where the resources comprise the cluster's CPU, memory, and input/output (IO) bandwidth; allocating the tasks to the TaskTrackers of the job nodes through the job scheduler, updating the waiting-task lists of the job nodes, optimizing the task execution order according to the rule that local tasks take priority, configuring local resources in that order, and running the jobs; and, when the job queue of a node in the cluster is empty and a query finds no task in the current job queue, taking three indexes (the number of file data replicas, the predicted idle time of every node in the cluster, and disk capacity) as parameters and executing other waiting tasks in the Hadoop cluster. The invention optimizes the utilization efficiency of cluster resources.
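The first three steps of the abstract can be sketched roughly as follows. This is not the patented algorithm: the abstract does not say how demand is predicted or how nodes are ranked, so the sketch assumes a simple mean over the usage of running tasks and a "local node first, then most free memory" tie-break. All class and function names (`Demand`, `Task`, `Node`, `predict_demand`, `assign`) are hypothetical, and the idle-node step based on the three indexes is omitted because the abstract does not state how they are combined.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass(frozen=True)
class Demand:
    """CPU, memory, and IO bandwidth, in arbitrary consistent units."""
    cpu: float
    mem: float
    io: float

    def fits(self, free: "Demand") -> bool:
        return self.cpu <= free.cpu and self.mem <= free.mem and self.io <= free.io


@dataclass
class Task:
    name: str
    job: str
    data_nodes: set  # names of nodes holding a replica of the task's input


@dataclass
class Node:
    name: str
    free: Demand
    waiting: list = field(default_factory=list)


def predict_demand(observed):
    """Assumed predictor: mean usage of the job's tasks that are already running."""
    return Demand(
        mean(d.cpu for d in observed),
        mean(d.mem for d in observed),
        mean(d.io for d in observed),
    )


def assign(task, demand, nodes):
    """Place a task on a node with enough free CPU, memory, and IO.

    Returns the chosen node, or None if no node can host the task.
    """
    candidates = [n for n in nodes if demand.fits(n.free)]
    if not candidates:
        return None
    # False sorts before True, so nodes holding the data are preferred;
    # ties are broken by the largest amount of free memory (an assumption).
    node = min(candidates, key=lambda n: (n.name not in task.data_nodes, -n.free.mem))
    node.free = Demand(
        node.free.cpu - demand.cpu,
        node.free.mem - demand.mem,
        node.free.io - demand.io,
    )
    node.waiting.append(task)
    # Stable sort keeps arrival order while moving local tasks to the front.
    node.waiting.sort(key=lambda t: node.name not in t.data_nodes)
    return node
```

Tracking the three resources separately, rather than as a single slot count, is what lets the sketch reject a node that has spare CPU but not enough memory or IO bandwidth.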

Description

Technical field

[0001] The invention belongs to the field of data processing in distributed computer systems and relates to a Hadoop-based task optimization scheduling method.

Background art

[0002] Hadoop is an open source distributed system infrastructure commonly used for large-scale data processing. Job scheduling is one of the core technologies of Hadoop. Its main function is to select and schedule jobs according to a specific algorithm and to control computing resources. The job scheduling algorithm therefore directly affects the performance of the entire Hadoop system and its resource utilization. At present, Hadoop's job scheduling algorithm abstracts the multiple types of resources in the system into a single resource, and the resources allocated to jobs are all fixed-size portions of node resources, called slots. There are three main problems in this kind of slot-based job scheduling algorithm. First of all, this type of algorithm does not take into accoun...
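The slot abstraction described in the background can be illustrated with a small calculation. The sketch below assumes that a slot is an equal share of a node's CPU and memory; the node size (4 CPUs, 8 GB) and the task requirement (2 CPUs, 0.5 GB) are made-up numbers chosen only to show the effect, and the function names are hypothetical.

```python
def slot_share(node_cpu, node_mem_gb, slots):
    """Resources carried by one slot when a node is split into equal slots."""
    return node_cpu / slots, node_mem_gb / slots


def mismatch(required, assigned):
    """Per-resource difference (assigned - required).

    Negative values mean the task is under-provisioned for that resource,
    positive values mean the resource is assigned but left idle.
    """
    return tuple(a - r for r, a in zip(required, assigned))


# A CPU-heavy task on a hypothetical node with 4 CPUs and 8 GB of memory.
required = (2.0, 0.5)  # (CPUs, GB)

four_slots = mismatch(required, slot_share(4, 8, 4))  # (-1.0, 1.5)
two_slots = mismatch(required, slot_share(4, 8, 2))   # (0.0, 3.5)
```

With four slots the task is short one CPU while 1.5 GB sits idle; halving the slot count fixes the CPU shortfall but raises the idle memory to 3.5 GB. Because a slot fixes the ratio between resources, changing the number of slots moves the mismatch around rather than removing it.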


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F9/50
CPC: G06F9/5088, G06F2209/503
Inventor: 崔桐
Owner: THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP