High-performance ordering method for MapReduce calculation frame

A sorting method for a computing framework, applied in the field of information technology. It addresses high resource consumption, inefficiency, and wasted resources, and aims to reduce cost while remaining broadly applicable.

Inactive Publication Date: 2014-08-20
PEKING UNIV
Cites: 5; cited by: 9

AI Technical Summary

Problems solved by technology

For jobs with a large number of map tasks, the cost of the second part exceeds that of the first part and consumes most of the resources.
[0008] Therefore, both sorting stages in the existing MapReduce computing framework are inefficient, owing to the choice of algorithms or improper processing procedures. Because every job must perform sorting, this results in a large waste of resources.
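A simplified comparison-count model illustrates why the second stage can dominate when there are many map tasks. The model is an assumption for illustration, not taken from the patent: N intermediate records are spread evenly over M map tasks, the first stage is each map task sorting its own N/M records, and the second stage is the Reduce side merging M sorted segments with a binary heap (summed over all reducers). Disk I/O and constant factors are ignored.

```latex
\begin{aligned}
C_{\mathrm{map}} &\approx M \cdot \frac{N}{M}\log_2\frac{N}{M} = N\log_2\frac{N}{M},\\
C_{\mathrm{reduce}} &\approx N\log_2 M,\\
C_{\mathrm{reduce}} > C_{\mathrm{map}} &\iff M > \frac{N}{M} \iff M > \sqrt{N}.
\end{aligned}
```

Under this model the two terms always sum to N log2 N, so a growing M simply shifts work onto the merge. In Hadoop the shift costs more than the comparison count suggests, because every extra segment is another stream to read and, once the number of segments exceeds the merge factor (`mapreduce.task.io.sort.factor`), the Reduce side needs additional merge passes.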




Detailed description of embodiments

[0038] The present invention will be further described below through specific embodiments and accompanying drawings.

[0039] The present invention is implemented on Hadoop 2.2 and mainly optimizes the data flow of the MapReduce computing framework. Figure 2 is a data flow chart of the MapReduce computing framework of the present invention. We describe the implementation in two parts: first, the new data flow, in which sorting is moved from the Map stage to the Reduce stage to reduce the number of merge paths; second, the implementation details of the hybrid high-performance in-memory sorting algorithm.
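The core idea of [0039], moving the sort from the Map stage to the Reduce stage so that the merge fan-in no longer equals the number of map tasks, can be sketched in Python. This is a minimal single-process illustration under assumptions, not the patent's Hadoop implementation: the function name `reduce_side_sort` and the `pool_size` parameter are hypothetical, records are string (key, value) pairs, and Python's built-in sort stands in for the hybrid in-memory sorting algorithm, whose details are not given here.

```python
import heapq
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (key, value) pair of intermediate data


def _by_key(record: Record) -> str:
    return record[0]


def reduce_side_sort(records: Iterable[Record], pool_size: int) -> Tuple[List[Record], int]:
    """Hypothetical sketch of sorting in the Reduce stage.

    Unsorted map output is accumulated in a large buffer pool; each full pool
    is sorted in memory into one run, and the runs are k-way merged.
    Returns (sorted records, number of merge paths).
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    runs: List[List[Record]] = []
    pool: List[Record] = []
    for record in records:
        pool.append(record)
        if len(pool) == pool_size:
            pool.sort(key=_by_key)  # stand-in for the hybrid in-memory sort (assumption)
            runs.append(pool)
            pool = []
    if pool:  # last, partially filled pool
        pool.sort(key=_by_key)
        runs.append(pool)
    # The merge fan-in is len(runs) == ceil(N / pool_size), independent of
    # how many map tasks produced the N records.
    merged = list(heapq.merge(*runs, key=_by_key))
    return merged, len(runs)
```

For example, 300 records arriving from 100 map tasks with `pool_size=128` are merged over 3 paths (runs of 128, 128, and 44 records) rather than 100; in this sketch the pool size is the knob that turns the merge-path count into a user-tunable value.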

[0040] The present invention redesigns the data flow of the MapReduce computing framework. The work is explained based on Hadoop, but the sorting optimization of the present invention also applies to systems built on other MapReduce architectures.

[0041] For the Map stage, the present...



Abstract

The invention relates to a high-performance sorting method for the MapReduce computing framework. In the Map stage, a separate buffer chain is constructed for each partition, which removes the requirement that partitions be ordered; the data of each partition are organized in blocks, which lowers the cost of in-memory data copying and file I/O. No sorting is performed in the Map stage. Instead, the Reduce stage uses a large buffer pool as the basic unit of a single sort, so that the total number of merge paths in the sort-merge phase becomes a value the user can tune. Combined with a hybrid in-memory sorting algorithm, this optimizes both sorting stages of the MapReduce computing framework and essentially eliminates the impact of sorting on framework performance, thereby improving the resource efficiency of the framework and reducing the overall resource consumption of the cluster.
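The Map-side part of the abstract (one buffer chain per partition, data organized in blocks, no sorting) can be pictured with the following Python sketch. It is an illustration under assumptions, not the patent's implementation: the class `PartitionBufferChains`, its `block_capacity` parameter, and the CRC32-based `crc32_partitioner` are hypothetical, and blocks are plain Python lists rather than byte buffers.

```python
import zlib
from typing import Callable, List, Optional, Tuple

Record = Tuple[str, str]  # (key, value) pair emitted by a map task
Block = List[Record]


def crc32_partitioner(key: str, num_partitions: int) -> int:
    """Illustrative deterministic hash partitioner (an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionBufferChains:
    """Hypothetical Map-side buffer: one chain of fixed-capacity blocks per
    partition. Records are appended in arrival order; nothing is sorted, so
    partitions never need to be ordered relative to each other."""

    def __init__(self, num_partitions: int, block_capacity: int,
                 partitioner: Optional[Callable[[str, int], int]] = None) -> None:
        if num_partitions <= 0 or block_capacity <= 0:
            raise ValueError("num_partitions and block_capacity must be positive")
        self.num_partitions = num_partitions
        self.block_capacity = block_capacity
        self.partitioner = partitioner or crc32_partitioner
        self.chains: List[List[Block]] = [[] for _ in range(num_partitions)]

    def collect(self, key: str, value: str) -> None:
        chain = self.chains[self.partitioner(key, self.num_partitions)]
        # Open a new block when the chain is empty or its tail block is full.
        if not chain or len(chain[-1]) == self.block_capacity:
            chain.append([])
        chain[-1].append((key, value))

    def blocks(self, partition: int) -> List[Block]:
        """Blocks of one partition, in the order they were filled."""
        return self.chains[partition]
```

Because each partition owns its own chain, a partition's data can be handed off block by block without sorting or rearranging the whole buffer, which is one way to realize the copy and file I/O savings the abstract describes.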

Description

Technical field

[0001] The invention belongs to the field of information technology and relates to an optimization method for distributed computing frameworks, in particular a method for improving sorting performance in the MapReduce computing framework.

Background art

[0002] MapReduce is a standard framework in distributed computing, but the existing MapReduce framework is not efficient in terms of resource consumption, which wastes a large amount of cluster resources.

[0003] The existing MapReduce framework must sort the key/value pairs of intermediate data, and this sorting accounts for the main resource consumption of the framework. We take Hadoop, an open-source implementation of MapReduce, as an example to illustrate the problem.

[0004] Figure 1 is a schematic diagram of the traditional MapReduce data flow, in which the Hadoop Distributed File System (HDFS) is responsible for ...
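For contrast with the redesigned data flow, the traditional data flow that [0003] and [0004] begin to describe can be modeled as follows. This is a simplified single-process sketch under stated assumptions: the helper names are hypothetical, `partition_of` is an illustrative CRC32 partitioner rather than Hadoop's `HashPartitioner`, and spills and disk I/O are omitted. Each map task sorts its output by (partition, key); each reducer then merges one sorted segment per map task, so its merge fan-in equals the number of maps.

```python
import heapq
import zlib
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (key, value) pair of intermediate data


def partition_of(key: str, num_reducers: int) -> int:
    """Illustrative hash partitioner (assumption, not Hadoop's HashPartitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_side_sort(map_output: List[Record], num_reducers: int) -> Dict[int, List[Record]]:
    """Baseline: a map task sorts its output by (partition, key) and splits it
    into one sorted segment per partition."""
    ordered = sorted(map_output, key=lambda kv: (partition_of(kv[0], num_reducers), kv[0]))
    segments: Dict[int, List[Record]] = {p: [] for p in range(num_reducers)}
    for key, value in ordered:
        segments[partition_of(key, num_reducers)].append((key, value))
    return segments


def reduce_side_merge(segments_from_maps: List[List[Record]]) -> List[Record]:
    """Baseline: a reducer merges one sorted segment per map task, so the
    number of merge paths grows with the number of maps."""
    return list(heapq.merge(*segments_from_maps, key=lambda kv: kv[0]))
```

With M map tasks, every reducer calls `reduce_side_merge` on M segments; this growth in merge paths is what the redesigned data flow in [0039] sets out to reduce.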


Application Information

Patent type & authority: Applications (China)
IPC (8th edition): G06F17/30
CPC: G06F3/067, G06F16/134, G06F16/182
Inventors: 蒋达晟 (Jiang Dasheng), 陈薇 (Chen Wei), 王腾蛟 (Wang Tengjiao)
Owner: Peking University