Hadoop job scheduling method based on genetic algorithm

A job scheduling technique based on a genetic algorithm, applied to Hadoop job scheduling, that addresses the inability of prior schedulers to take both job fairness and job execution efficiency into account.

Status: Inactive
Publication date: 2015-04-29
XI'AN POLYTECHNIC UNIVERSITY
Cites: 3; cited by: 19
AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide a Hadoop job scheduling method based on a genetic algorithm that solves two problems in the prior art: cluster resources must be preconfigured before job scheduling, and job fairness and job execution efficiency cannot both be taken into account.

Examples

Detailed Description of Embodiments

[0056] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0057] Referring to Figure 1, the Hadoop job scheduling method based on a genetic algorithm of the present invention comprises the following steps:

[0058] Step 1: Job Preprocessing

[0059] At the JobTracker node, first collect the jobs waiting to be scheduled and the TaskTracker nodes in the cluster. For each job in the job queue, count its number of splits l_m and the maximum number of TaskTrackers b_m on which it can be scheduled, as shown in Table 1:

[0060] Table 1

[0061]

job      split    TaskTracker
job 1    l_1      b_1
job 2    l_2      b_2
...      ...      ...
job m    l_m      b_m

[0062] Here, job 1, job 2, ..., job m are ordered first-come-first-served.
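The preprocessing table described above can be sketched as follows. This is only an illustration under stated assumptions: the `Job` class, its `arrival` timestamp, its `splits` list, and the `max_trackers` field are hypothetical names, and b_m is assumed to be supplied with each job because the text does not say how it is derived.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    arrival: int        # hypothetical arrival timestamp, used for FCFS ordering
    splits: List[str]   # hypothetical identifiers of the job's input splits
    max_trackers: int   # b_m, assumed to be supplied with the job


def build_job_table(jobs: List[Job]) -> List[Tuple[str, int, int]]:
    """Return rows (job, l_m, b_m) in first-come-first-served order."""
    ordered = sorted(jobs, key=lambda j: j.arrival)
    return [(j.name, len(j.splits), j.max_trackers) for j in ordered]


jobs = [
    Job("job B", arrival=2, splits=["s0", "s1", "s2"], max_trackers=2),
    Job("job A", arrival=1, splits=["s0", "s1"], max_trackers=1),
]
table = build_job_table(jobs)
for row in table:
    print(row)
```

Each row mirrors one line of Table 1: the job, its split count l_m, and its TaskTracker limit b_m.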

[0063] For each TaskTracker node, read the maximum number of parallel slots s in the corresponding con...

Abstract

The invention discloses a Hadoop job scheduling method based on a genetic algorithm. The method comprises the following steps: first, preprocessing the jobs to generate an encoding and decoding table; second, generating a plurality of initial scheduling tables for executing the jobs and ranking them by fitness to obtain a scheduling table list; finally, applying genetic operations to the scheduling tables in the list to form a final scheduling table list. The scheduling table ranked first in the final list is taken as the optimal scheduling table, and the tasks of the different jobs are distributed to the corresponding TaskTrackers for execution according to it, which completes the Hadoop job scheduling task. With this method, platform resources do not need to be preconfigured before jobs are scheduled; they are acquired, counted and allocated dynamically during scheduling, which reduces the administrator's burden. Furthermore, the method can control both the total completion time and the average completion time of the jobs, so that fairness of job execution is guaranteed and execution efficiency is also ensured.
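The generate, rank, evolve, and select loop in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the patented method: a scheduling table is modelled as a list mapping each task to a TaskTracker index, fitness is the completion time of the busiest TaskTracker (a stand-in for the completion-time criteria the abstract mentions), and the genetic operations are single-point crossover plus a random reassignment mutation. The names `makespan` and `evolve` and all parameter values are hypothetical.

```python
import random
from typing import List

Schedule = List[int]  # schedule[i] = index of the TaskTracker that runs task i


def makespan(schedule: Schedule, durations: List[float], n_trackers: int) -> float:
    """Assumed fitness: finishing time of the busiest TaskTracker (lower is better)."""
    load = [0.0] * n_trackers
    for task, tracker in enumerate(schedule):
        load[tracker] += durations[task]
    return max(load)


def evolve(durations, n_trackers, pop_size=20, generations=50, seed=0) -> Schedule:
    rng = random.Random(seed)
    n_tasks = len(durations)

    def fitness(s: Schedule) -> float:
        return makespan(s, durations, n_trackers)

    # Initial scheduling tables, ranked by fitness.
    population = sorted(
        ([rng.randrange(n_trackers) for _ in range(n_tasks)] for _ in range(pop_size)),
        key=fitness,
    )
    for _ in range(generations):
        elite = population[: pop_size // 2]      # best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_tasks)      # single-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:               # mutation: reassign one task
                child[rng.randrange(n_tasks)] = rng.randrange(n_trackers)
            children.append(child)
        population = sorted(elite + children, key=fitness)
    return population[0]                         # first-ranked table is selected


durations = [4, 3, 3, 2, 2, 2]
best = evolve(durations, n_trackers=2)
print(best, makespan(best, durations, 2))
```

Because the best half of each generation is carried over unchanged, the first-ranked table never gets worse from one generation to the next.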

Description

Technical field

[0001] The invention belongs to the field of information technology and relates to a Hadoop job scheduling method based on a genetic algorithm.

Background

[0002] Apache Hadoop is an open-source distributed platform built mainly on two core projects, MapReduce and HDFS. MapReduce is Hadoop's core computing framework. It has a master-slave structure with two roles, JobTracker and TaskTracker. The JobTracker node preprocesses a job's data into task fragments (splits) and distributes them to the TaskTracker nodes so that tasks run in parallel; each split is then processed in the Map stage, the results are aggregated in the Reduce stage, and the final output is saved. HDFS is the storage foundation on which Hadoop realizes distributed computing. It is a highly fault-tolerant system suitable for deployment on inexpensive machines. HDFS is also a frame...

Application Information

IPC(8): G06F9/50
Inventors: 薛涛, 燕明磊
Owner: XI'AN POLYTECHNIC UNIVERSITY