MapReduce calculation process optimization method based on B-tree data structure

A process optimization technique based on a B-tree data structure, applied in the field of MapReduce computing. It addresses long running times and low work efficiency by reducing disk write operations, repeated addressing, and disk read operations.

Active Publication Date: 2019-10-25
HENAN PROVINCIAL COMM PLANNING & DESIGN INST CO LTD

AI Technical Summary

Problems solved by technology

If a Map task outputs a large amount of data, many spill files may be produced. Both generating the *.out files and merging them involve disk reads and writes (I/O operations), which take a long time and reduce work efficiency.
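To give a rough sense of scale, the number of spill files can be estimated from the size of the Map output and the usable part of the buffer. The sketch below is only a back-of-the-envelope illustration: the function name and the example sizes are hypothetical, and the 80% threshold is the one given in the embodiment of this document.

```python
import math


def estimate_spill_files(map_output_mb, buffer_mb, threshold=0.8):
    """Estimate how many spill files one Map task produces.

    Assumption: the buffer is spilled each time it reaches `threshold`
    of its capacity, so each spill holds about buffer_mb * threshold.
    """
    usable_mb = buffer_mb * threshold
    return math.ceil(map_output_mb / usable_mb)


# Hypothetical sizes: 1000 MB of Map output, 100 MB buffer.
spills = estimate_spill_files(1000, 100)
print(spills)
```

Each of these spill files is written once and read again during merging, which is where the disk I/O cost described above comes from.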




Detailed Description of Embodiments

[0021] A method for optimizing the MapReduce calculation process based on the B-tree data structure, as shown in Figure 1, includes the following steps:

[0022] 1) Execute the Map task on the data of the input split;

[0023] 2) The output of the Map task consists of an index file *.index and a data file *.out;

[0024] 3) Store the index file *.index and the data file *.out in a ring memory buffer;

[0025] 4) When the ring memory buffer is about to overflow, judge whether the current task is the last Map task. The buffer is judged to be about to overflow when its used storage reaches 80% of its capacity.

[0026] 5) If it is not the last Map task, the data files *.out are sorted, merged, and written to disk. Sorting and compression are performed continuously while the data files *.out are merged before being stored on disk, and the index file *.index will remain ...
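The steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the Python list and dict that stand in for the *.out data, the disk, and the *.index file, and the layout of the index entries are all assumptions. The last-task branch follows the abstract, which states that the data file is then passed directly to the reduce function.

```python
SPILL_THRESHOLD = 0.8  # from the text: 80% of the buffer capacity


def about_to_overflow(used, capacity):
    """Step 4: the buffer counts as almost full at 80% usage."""
    return used >= SPILL_THRESHOLD * capacity


def on_buffer_almost_full(data, index, disk, is_last_map_task, reduce_fn):
    """Handle the buffer once it is about to overflow.

    data  : list of (key, value) pairs, standing in for *.out
    index : dict kept in memory, standing in for *.index (assumed layout)
    disk  : list of sorted runs, standing in for on-disk storage
    """
    if is_last_map_task:
        # Last Map task: hand the data straight to reduce, no disk write.
        result = reduce_fn(sorted(data))
        data.clear()
        return result

    # Not the last task: sort the data and write it out as one run.
    run = sorted(data)
    spill_id = len(disk)
    disk.append(run)
    # The index stays in memory and records where each key was written.
    for offset, (key, _value) in enumerate(run):
        index.setdefault(key, []).append((spill_id, offset))
    data.clear()
    return None
```

A caller would check `about_to_overflow` after each record is added and invoke `on_buffer_almost_full` when it returns true. Compression of the merged data, which the text mentions, is left out of this sketch.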


Abstract

The invention discloses a MapReduce calculation process optimization method based on a B-tree data structure. The method comprises the following steps: 1) executing a Map task on the data of an input split; 2) after the Map task is executed, outputting a result that comprises an index file *.index and a data file *.out; 3) storing the index file *.index and the data file *.out in a ring memory buffer; 4) when the ring memory buffer is about to overflow, judging whether the task is the last Map task; 5) if not, sorting and merging the data files *.out and writing them to disk, while the index files *.index are left in the ring memory buffer; and if so, inputting the data file *.out directly into the reduce function. The method reduces the frequency of disk reads and writes, markedly shortens the calculation time, and improves both calculation efficiency and working efficiency.

Description

Technical field

[0001] The invention belongs to the technical field of MapReduce computing, and in particular relates to a method for optimizing a MapReduce computing process based on a B-tree data structure.

Background

[0002] MapReduce is a distributed computing model and one of the main components of the Hadoop ecosystem, where it provides distributed computation over large amounts of data. MapReduce includes two important phases: the Map phase (mapping), which is responsible for filtering and distributing data, and the Reduce phase (reduction), which is responsible for calculating and merging data. The output of Map is the input of Reduce, and Reduce obtains its data through Shuffle.

[0003] In the Map process, each input split is assigned to one Map task for processing. By default, one HDFS block (64 MB by default) is used as a split. The calculation result of the Map process is temporarily placed in a ring memory buffer. ...
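The Map, Shuffle, and Reduce phases described in the background can be illustrated with a small single-process word count. This is only a sketch of the programming model under simplifying assumptions: the function names are hypothetical, each string stands in for one input split, and nothing here is distributed or uses the Hadoop API.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]


def shuffle(mapped_pairs):
    """Shuffle: group all values by key so Reduce sees one key at a time."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups):
    """Reduce: merge the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    # One Map task per split; the Map output is the input of Reduce.
    mapped = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle(mapped))


print(word_count(["a b a", "b c"]))
```

In a real cluster the Map output passes through the memory buffer and spill files before Shuffle, which is the stage the method in this document targets.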


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/22, G06F16/2458, G06F9/50
CPC: G06F16/2246, G06F16/2471, G06F9/5066
Inventors: 王笑风, 田延峰, 杨博, 侯明业, 郭霄, 孙云龙, 刘满
Owner: HENAN PROVINCIAL COMM PLANNING & DESIGN INST CO LTD