Data balancing method based on a genetic algorithm in the MapReduce computing model

A data balancing technique based on a genetic algorithm, applied to the MapReduce computing model. It addresses computation skew in the Reduce phase, which increases the execution time of individual reduce tasks and prolongs the running time of the whole Reduce phase. The effects are more consistent processing times across reduce tasks, saved computing resources, and reduced processing time.

Inactive Publication Date: 2013-05-15
XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

This method is affected by factors such as hash collisions and the limited number of reduce tasks. A large number of keys is likely to land in the same partition, so the amount of data handled by each reduce task becomes unbalanced.
[0009] 2) Reduce computation skew caused by the characteristics of the input data itself
In general, input data skew in the Reduce phase makes some reduce tasks run longer than others, prolongs the running time of the entire Reduce phase, and ultimately delays the completion of the entire MapReduce job.
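
The skew described above can be illustrated with a small sketch. It assumes a simple hash-modulo partitioner; `hash_partition`, `reducer_loads`, `skew_ratio`, and the sample key counts are hypothetical names and data chosen for illustration, not taken from the patent or from Hadoop.

```python
import zlib


def hash_partition(key, num_reducers):
    """Deterministic stand-in for a hash-based partitioner (assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(key_counts, num_reducers):
    """Total number of records each reduce task receives."""
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        loads[hash_partition(key, num_reducers)] += count
    return loads


def skew_ratio(loads):
    """Largest load divided by the mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


# Hypothetical key frequencies: two keys dominate the input.
key_counts = {"the": 5000, "of": 3000, "map": 40,
              "reduce": 35, "hadoop": 20, "skew": 5}
loads = reducer_loads(key_counts, num_reducers=3)
print(loads, round(skew_ratio(loads), 2))
```

Because all records of one key must go to the same reduce task, a single heavy key keeps the ratio well above 1.0 no matter how the hash spreads the remaining keys.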


Examples


Embodiment Construction

[0035] The present invention will be described in detail below in conjunction with the accompanying drawings.

[0036] A data balancing method based on a genetic algorithm in the MapReduce computing model, comprising the following steps:

[0037] 1) Obtain the global Map output information, and obtain the metadata of the partitions processed by the reduce tasks. The process of obtaining the reduce metadata is shown in Figure 1:

[0038] 1.1. After each Map task finishes processing and writes its output to the local disk, a task completion message is sent to the JobTracker through the TaskTracker, carried in the heartbeat information;

[0039] 1.2. The JobTracker maintains a queue of Map task completion messages for each MapReduce job. When a TaskTracker running a reduce task requests Map task completion information, the messages are taken from the queue of the job to which the reduce task belongs and passed to that TaskTracker;

[0040] 1.3. The ...
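
Steps 1.1 and 1.2 describe a per-job queue of Map task completion messages. The following is a minimal sketch of that bookkeeping, not Hadoop's actual implementation: `CompletionTracker`, its methods, `partition_totals`, and the `partition_sizes` payload carried in each message are all hypothetical, introduced only to show how per-partition metadata could be accumulated.

```python
from collections import defaultdict


class CompletionTracker:
    """Hypothetical stand-in for the JobTracker's per-job message queues."""

    def __init__(self):
        # job_id -> ordered list of Map task completion messages
        self._queues = defaultdict(list)

    def report_map_done(self, job_id, map_task_id, partition_sizes):
        """Step 1.1: record a completion message relayed by a TaskTracker."""
        self._queues[job_id].append(
            {"map_task": map_task_id, "partition_sizes": list(partition_sizes)}
        )

    def fetch_messages(self, job_id, from_index=0):
        """Step 1.2: hand out messages from the queue of the requesting job."""
        return list(self._queues.get(job_id, [])[from_index:])


def partition_totals(messages, num_partitions):
    """Sum the reported sizes per partition across completed Map tasks."""
    totals = [0] * num_partitions
    for message in messages:
        for partition, size in enumerate(message["partition_sizes"]):
            totals[partition] += size
    return totals


tracker = CompletionTracker()
tracker.report_map_done("job_A", "map_0", [120, 30, 50])
tracker.report_map_done("job_A", "map_1", [80, 10, 60])
tracker.report_map_done("job_B", "map_0", [5, 5, 5])
totals_a = partition_totals(tracker.fetch_messages("job_A"), 3)
print(totals_a)
```

Keeping one queue per job means a reduce task only ever sees messages from Map tasks of its own job, which is the property step 1.2 relies on.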


Abstract

Provided is a data balancing method based on a genetic algorithm in the MapReduce computing model. The method includes: obtaining the global Map output information; using the genetic algorithm to perform combinatorial optimization; collecting and encoding the metadata; randomly partitioning the population multiple times, each partition forming a genome; calculating the fitness function values of all subsets in each gene; applying a selection operator to the genome based on the evaluated fitness of each gene; using a roulette-wheel algorithm to randomly choose several high-quality genes from the genome; performing crossover on the chosen genes; performing mutation; after multiple generations, choosing the genes to retain according to an elitism strategy; and decoding the genes to obtain an optimal combination of the metadata, so that the amount of data processed by each reducer is approximately equal. The method solves the problem of unbalanced input data in the Reduce phase, saves computing resources, and reduces computing cost.
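
The loop outlined in the abstract (fitness evaluation, roulette-wheel selection, crossover, mutation, elitism) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented procedure: the chromosome encoding (one reducer index per metadata item), the fitness function `1 / (1 + standard deviation of reducer loads)`, single-point crossover, and all function names, parameters, and sample sizes are assumptions introduced here.

```python
import random
from statistics import pstdev


def loads_of(chrom, sizes, k):
    """Decode a chromosome: chrom[i] is the reducer assigned to item i."""
    loads = [0] * k
    for item, reducer in enumerate(chrom):
        loads[reducer] += sizes[item]
    return loads


def fitness(chrom, sizes, k):
    """Assumed fitness: higher when reducer loads are closer to equal."""
    return 1.0 / (1.0 + pstdev(loads_of(chrom, sizes, k)))


def roulette(pop, fits, rng):
    """Roulette-wheel selection: pick with probability proportional to fitness."""
    return rng.choices(pop, weights=fits, k=1)[0]


def crossover(a, b, rng):
    """Single-point crossover (requires at least two items)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(chrom, k, rate, rng):
    """Reassign each item to a random reducer with probability `rate`."""
    return [rng.randrange(k) if rng.random() < rate else g for g in chrom]


def balance(sizes, k, pop_size=40, generations=200,
            mutation_rate=0.05, elite=2, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(k) for _ in sizes] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(c, sizes, k) for c in pop]
        ranked = [c for _, c in sorted(zip(fits, pop),
                                       key=lambda p: p[0], reverse=True)]
        nxt = ranked[:elite]  # elitism: the best chromosomes survive unchanged
        while len(nxt) < pop_size:
            child = crossover(roulette(pop, fits, rng),
                              roulette(pop, fits, rng), rng)
            nxt.append(mutate(child, k, mutation_rate, rng))
        pop = nxt
    return max(pop, key=lambda c: fitness(c, sizes, k))


# Hypothetical metadata: data volume of eight key groups, three reducers.
sizes = [50, 30, 20, 20, 10, 10, 5, 5]
best = balance(sizes, k=3)
print(best, loads_of(best, sizes, 3))
```

Elitism guarantees that the best fitness in the population never decreases from one generation to the next, so the returned assignment is at least as balanced as the best member of the random initial population.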

Description

Technical field

[0001] The invention belongs to the technical field of the MapReduce computing model, and in particular relates to a genetic algorithm-based data balancing method in the MapReduce computing model.

Background

[0002] Hadoop is a highly reliable and highly scalable platform for storage and distributed parallel computing, developed by the Apache open source organization. It was first developed as the underlying platform of the open source search engine project Nutch, later became independent of the Nutch project, and is now one of the typical open source cloud computing platforms. The Hadoop core implements a block-based distributed file system (Hadoop Distributed File System, HDFS) and a MapReduce computing model for distributed computing.

[0003] The MapReduce computing model is divided into two major task processing stages, Map and Reduce. During the MapReduce process, the Map stage converts the input data into the data form of <Key,Value> key-value pa...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30; G06N3/12
Inventor: 伍卫国樊源泉魏伟朱霍高颜
Owner: XI AN JIAOTONG UNIV