
A Speculative Hadoop Scheduling Method Based on Load Balancing

A scheduling method based on load balancing technology, applied to multi-programming devices, resource allocation, etc. It addresses the problem that individual slow tasks can slow down the running of an entire job, and achieves the effect of improving performance and avoiding load imbalance.

Active Publication Date: 2018-09-25
INSPUR BEIJING ELECTRONICS INFORMATION IND
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

When a job consists of hundreds or thousands of tasks, individual tasks may run slowly, causing the entire job to run slowly

Method used

Figure 1 is the overall flowchart of the speculative Hadoop scheduling method provided by the present invention; Figure 2 is the flowchart of determining a slow task; Figure 3 is the flowchart of selecting a fast node to execute the backup task of a slow task.


Examples


Embodiment 1

[0037] Referring to Figure 1, which shows the overall flowchart of the speculative Hadoop scheduling method proposed by the present invention, the method includes:

[0038] S1: Determine whether a task is a slow task;

[0039] Whether a task is a slow task is judged according to its estimated remaining execution time. Specifically, assuming the current execution progress of the task is A and the task has been running for time t, the remaining time of the task can be calculated as t1 = t/A - t. The remaining completion time of each task is estimated from its progress and running time, the tasks are sorted by remaining completion time, and the task with the longest remaining completion time is selected as the slow task; the backup task of the slow task is then placed into the slow task queue.
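The remaining-time estimate and slow-task selection described above can be sketched as follows. This is a minimal illustration; the `Task` fields and the sample progress/elapsed values are hypothetical, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class Task:
    task_id: str
    progress: float  # current execution progress A, as a fraction in (0, 1]
    elapsed: float   # time t the task has been running

def remaining_time(task: Task) -> float:
    """Estimated remaining time: t1 = t / A - t."""
    return task.elapsed / task.progress - task.elapsed

def pick_slow_task(tasks):
    """Sort tasks by estimated remaining time and pick the longest one."""
    return max(tasks, key=remaining_time)

tasks = [
    Task("t1", progress=0.8, elapsed=40.0),  # remaining: 40/0.8 - 40 = 10
    Task("t2", progress=0.2, elapsed=30.0),  # remaining: 30/0.2 - 30 = 120
    Task("t3", progress=0.5, elapsed=50.0),  # remaining: 50/0.5 - 50 = 50
]
print(pick_slow_task(tasks).task_id)  # t2
```

Note that the estimate assumes a task's past rate of progress continues unchanged, which is the usual simplification behind this style of speculative execution.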

[0040] S2: Determine which nodes in the cluster are fast nodes;

[0041] The criterion for judging is as follows: if there are many slow tasks on a certain node, t...

Embodiment 2

[0046] The process of determining a slow task proposed by the present invention is shown in Figure 2 and includes the following steps:

[0047] S11: Calculate the remaining execution time of the task according to the running progress and running time of the task;

[0048] Specifically: assuming the current execution progress of the task is A and its running time is t, the remaining time of the task can be calculated as t1 = t/A - t.

[0049] S12: Determine the slow task according to the remaining execution time calculated in step S11;

[0050] Specifically, the tasks are sorted based on the calculated remaining completion time of each task, and the task with the longest remaining completion time is selected as the slow task.

[0051] S13: Determine whether the number of backup tasks of the slow task is greater than a set upper limit; if yes, the process ends; if not, put the backup tasks of the slow task into the slow task queue, and the process ...
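Steps S11 through S13 can be sketched end to end. The backup-count cap (`MAX_BACKUPS`) is a hypothetical value, since the patent only states that the number of backup tasks must not exceed a set upper limit:

```python
from collections import deque

MAX_BACKUPS = 1            # assumed upper limit on backup tasks per slow task
slow_task_queue = deque()  # backup tasks waiting for a fast node
backup_counts = {}         # task_id -> backup tasks already created

def remaining_time(progress: float, elapsed: float) -> float:
    """S11: remaining time t1 = t / A - t."""
    return elapsed / progress - elapsed

def determine_slow_task(tasks):
    """S12: the task with the longest remaining time is the slow task.
    Each task is a (task_id, progress, elapsed) tuple."""
    return max(tasks, key=lambda t: remaining_time(t[1], t[2]))

def maybe_enqueue_backup(task_id: str) -> bool:
    """S13: enqueue a backup only if the upper limit is not yet reached."""
    if backup_counts.get(task_id, 0) >= MAX_BACKUPS:
        return False  # cap reached: the process ends for this task
    backup_counts[task_id] = backup_counts.get(task_id, 0) + 1
    slow_task_queue.append(task_id)
    return True

slow = determine_slow_task([("t1", 0.8, 40.0), ("t2", 0.2, 30.0)])
print(maybe_enqueue_backup(slow[0]))  # True the first time
print(maybe_enqueue_backup(slow[0]))  # False: upper limit reached
```

The cap prevents the scheduler from repeatedly launching backups for the same straggler, which would itself waste cluster resources.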

Embodiment 3

[0053] The flowchart of selecting a fast node to execute the backup task of the slow task is shown in Figure 3 and includes the following steps:

[0054] S21: Determine whether the head node in the node queue is a fast node; if yes, execute step S22, otherwise execute step S25;

[0055] In the cluster system, all cluster node information is placed in the queue to form a node queue; when selecting a node in the cluster system to perform the backup task of the slow task, it is judged whether the head node in the current node queue is a fast node.

[0056] This step distinguishes slow nodes from fast nodes according to the following principle: if there are many slow tasks on a node, the node is judged to be a slow node; conversely, a node with few slow tasks is judged to be a fast node.

[0057] S22: Judge whether the number of tasks currently running on the head node is greater than 20% of the average number of tasks running on all nodes in the cluster; if not, the...
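The node-selection loop of S21 onwards can be sketched as follows. The slow-task threshold, the reading of the truncated load criterion (head load compared against 1.2 times the cluster average), and the rotate-to-next-node fallback standing in for the unshown steps are all assumptions:

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    slow_tasks: int     # slow tasks observed on this node
    running_tasks: int  # tasks currently running on this node

def is_fast(node: Node, slow_threshold: int = 2) -> bool:
    """S21: a node with few slow tasks counts as fast (threshold assumed)."""
    return node.slow_tasks < slow_threshold

def select_fast_node(node_queue: deque) -> Optional[Node]:
    """Walk the node queue until a fast, lightly loaded head node is found."""
    avg = sum(n.running_tasks for n in node_queue) / len(node_queue)
    for _ in range(len(node_queue)):
        head = node_queue[0]
        # S22 (assumed reading of the truncated criterion): reject the head
        # if its load exceeds the cluster average by more than 20%.
        if is_fast(head) and head.running_tasks <= 1.2 * avg:
            return head
        node_queue.rotate(-1)  # move the head to the tail and try the next node
    return None

nodes = deque([Node("n1", 5, 12), Node("n2", 1, 4), Node("n3", 0, 6)])
print(select_fast_node(nodes).name)  # n2
```

Combining the fast-node test with the load check is what ties speculation to load balancing: a backup task only lands on a node that is both healthy and below the (assumed) load threshold.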



Abstract

Provided is a speculative Hadoop scheduling method based on load balancing. The method first determines slow tasks, then selects fast nodes to execute backup tasks of the slow tasks, and ensures load balancing of the cluster system while the selected backup tasks execute. Job execution performance is optimized through a simple, reasonable slow-task determination method and a strategy for selecting fast nodes to execute backup tasks, taking both job execution performance and cluster load balancing into account. The method avoids load imbalance in the cluster and improves the overall performance of the Hadoop cluster.

Description

Technical Field

[0001] The invention relates to the technical field of computer load balancing, and in particular to a speculative Hadoop scheduling method based on load balancing.

Background

[0002] In the Internet era, where the amount of data is increasing rapidly, Hadoop clusters have become a widely studied platform for parallel processing. The Hadoop platform supports application development through the MapReduce parallel processing framework; the parallelization technology is transparent to developers, so a program only needs to conform to the MapReduce framework to be processed in parallel.

[0003] The task scheduling algorithm is one of the core technologies of the Hadoop platform. Its main function is to reasonably control and allocate the order of task execution and the computing resources of the system. The quality of the task scheduling strategy directly affects the execution performance and system res...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F9/50
Inventors: 郭美思, 吴楠
Owner: INSPUR BEIJING ELECTRONICS INFORMATION IND