Self-adaptive load balancing method for Reduce ends in parallel computing framework

An adaptive load balancing technology for parallel computing, applied in the information field. It addresses problems such as some Reduce tasks receiving too little data, users being unable to estimate the amount of data Reduce must process, and the framework being unable to adapt automatically to differently distributed data, and thereby achieves load balancing.

Inactive Publication Date: 2012-08-08
PEKING UNIV
2 Cites, 44 Cited by

AI Technical Summary

Problems solved by technology

[0018] Precisely because of this simple division method, when the number of records per key is unevenly distributed, or when the sizes of the values associated with the keys differ, some Reduce tasks receive too much data and others too little, resulting in skew.
In addition, because in real tasks the amount of data output by Map differs from the amount of data input to Map, users cannot estimate how much data Reduce will have to process and so cannot set an appropriate number of Reduce tasks, and the framework cannot automatically adapt to data with different distributions.
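
The skew described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the use of `zlib.crc32` as a stand-in for a key hash code, and the sample records are all assumptions made for illustration. It shows that when the Reduce index depends only on a hash of the key, a single frequent key places nearly all of the data on one Reduce task.

```python
import zlib

def hash_partition(key: str, num_reducers: int) -> int:
    """Static hash partitioning: the Reduce index depends only on the key."""
    # crc32 is a hypothetical stand-in for a key hash code; it is
    # deterministic across runs, unlike Python's built-in str hash.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def reducer_loads(records, num_reducers):
    """Sum the value sizes (in characters) that land on each Reduce task."""
    loads = [0] * num_reducers
    for key, value in records:
        loads[hash_partition(key, num_reducers)] += len(value)
    return loads

# Hypothetical skewed input: one hot key with 1000 records, 100 rare keys.
records = [("hot", "x" * 10)] * 1000 + [(f"k{i}", "x" * 10) for i in range(100)]
loads = reducer_loads(records, 4)
mean_load = sum(loads) / len(loads)
print("loads per Reduce task:", loads)
print("max / mean load:", max(loads) / mean_load)
```

Every record of the hot key maps to the same Reduce task, so that task holds at least 10000 of the 11000 units regardless of how the rare keys are spread.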


Examples

Embodiment Construction

[0046] The specific implementation steps and detailed methods of the present invention are described below.

[0047] This embodiment is carried out on the Hadoop platform and mainly addresses the problems of the current Map-Reduce computing framework. We first give the design architecture diagram of the entire load balancer and explain each main module, and then describe the design and implementation of each module in detail.

[0048] The method of the present invention requires that Map first complete the processing of a certain proportion of all the data (for example 75%, which is the source of the 75% mentioned later; this proportion is chosen so that, as far as possible without affecting overall efficiency, the static hash function can fully reflect the distribution of the overall data). After that, the number of Reduce ends and a data division method for the Reduce ends are determined according to the data distribution of the Map output, so that all d...
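
As a rough illustration of choosing the number of Reduce ends from a partially observed Map output, the following Python sketch extrapolates the total output size from the sampled fraction and divides it by a per-Reduce capacity. The source text is truncated and does not state the actual rule, so the linear extrapolation, the function name `estimate_num_reducers`, and the `capacity_bytes` parameter are all assumptions.

```python
import math

def estimate_num_reducers(sampled_bytes_per_key, sampled_fraction, capacity_bytes):
    """Estimate how many Reduce tasks to start from a partial Map output.

    sampled_bytes_per_key: bytes observed per key so far (hypothetical input).
    sampled_fraction: share of the input Map has processed, e.g. 0.75.
    capacity_bytes: assumed amount of data one Reduce task should handle.
    """
    if not 0 < sampled_fraction <= 1:
        raise ValueError("sampled_fraction must be in (0, 1]")
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    observed = sum(sampled_bytes_per_key.values())
    # Assumption: the remaining input produces output at the same rate.
    estimated_total = observed / sampled_fraction
    return max(1, math.ceil(estimated_total / capacity_bytes))

# 750 bytes seen after 75% of the input, 250 bytes per Reduce task.
num_reducers = estimate_num_reducers({"a": 600, "b": 150}, 0.75, 250)
print(num_reducers)
```

The estimate only sizes the Reduce stage; it does not by itself decide which keys go to which Reduce task.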


Abstract

The invention relates to a self-adaptive load balancing method for Reduce ends in a parallel computing framework. The distribution of the data input to a task is predicted using a dynamic hash function division method, and a static hash function is generated from the distribution characteristics of the predicted data, so that under this static hash function all data are distributed across the corresponding computing nodes as evenly as possible. During task scheduling, the allocation of data and computing resources can therefore be adjusted dynamically and adaptively according to the data distribution, which reduces the skew that arises during computation and improves efficiency.
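
One way to picture a division function derived from a predicted distribution is a lookup table that assigns each key to a Reduce end. The Python sketch below builds such a table with a greedy largest-first heuristic. This is a swapped-in, well-known scheduling heuristic used only for illustration, not the patent's own construction; the names `build_partition_table` and `loads_from_table` are hypothetical.

```python
import heapq

def build_partition_table(predicted_bytes_per_key, num_reducers):
    """Assign keys to Reduce tasks, largest first, always to the lightest task."""
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    table = {}
    # Sort by descending size; break ties by key for a deterministic result.
    for key, size in sorted(predicted_bytes_per_key.items(),
                            key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        table[key] = reducer
        heapq.heappush(heap, (load + size, reducer))
    return table

def loads_from_table(predicted_bytes_per_key, table, num_reducers):
    """Total predicted bytes per Reduce task under the given table."""
    loads = [0] * num_reducers
    for key, size in predicted_bytes_per_key.items():
        loads[table[key]] += size
    return loads

predicted = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
table = build_partition_table(predicted, 2)
loads = loads_from_table(predicted, table, 2)
print(table, loads)
```

Because each key is kept whole, a single key larger than the average load still leaves one Reduce end overloaded; the sketch balances only across keys.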

Description

Technical field

[0001] The invention belongs to the field of information technology and relates to a method for balancing the load of distributed nodes in a distributed computing environment, in particular to a self-adaptive load balancing method for Reduce ends in a parallel computing framework.

Background technique

[0002] With the growth of data volumes and the rising demand for data processing capability, traditional parallel computing can no longer cope with distributed computing over large data volumes.

[0003] At present, the Map-Reduce computing framework handles task allocation and scheduling for distributed computing over large data volumes reasonably well, through random allocation of data and tasks and parallel use of hardware resources. However, because task allocation in Map-Reduce depends on the static hash function settings and the number of parallel computations, the distributed computation is not uniform enough,...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F9/50
Inventor: 王林青, 高军, 周家帅, 李红燕, 王腾蛟
Owner: PEKING UNIV