Optimization system and method for shuffling stage in Hadoop MapReduce

An optimization method and shuffling technology, applied in the fields of big data and cloud computing, that addresses the shuffle-stage bottleneck affecting task completion time, so as to shorten data reading time, optimize task completion time, and reduce tail delay.

Active Publication Date: 2019-11-26

SHANGHAI JIAO TONG UNIV

Cites: 4. Cited by: 3.

Problems solved by technology

However, dividing a task into a large number of subtasks causes the shuffle phase to read and write a large number of small files.

This large volume of small, random disk I/O becomes the bottleneck of the shuffle stage and seriously affects task completion time.
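
As a rough illustration of why the number of small files grows, the sketch below assumes the usual MapReduce layout in which each map task writes one partition per reduce task, so a job with M map tasks and R reduce tasks shuffles M × R segments. The job sizes are hypothetical and not taken from the patent, and the even spread of intermediate data is a simplification.

```python
def shuffle_segments(num_map_tasks, num_reduce_tasks, intermediate_bytes):
    """Return (segment_count, average_segment_bytes) for one job.

    Assumes every map task emits exactly one partition per reduce task and
    that intermediate data is spread evenly across them (simplifications).
    """
    segments = num_map_tasks * num_reduce_tasks
    return segments, intermediate_bytes / segments


GIB = 1024 ** 3

# Hypothetical job with 100 GiB of intermediate data, split two ways.
coarse = shuffle_segments(100, 10, 100 * GIB)        # few, large subtasks
fine = shuffle_segments(10_000, 1_000, 100 * GIB)    # many, small subtasks

for name, (count, avg) in (("coarse", coarse), ("fine", fine)):
    print(f"{name}: {count} segments, about {avg / 1024:.1f} KiB each")
```

With the same total data, the finer split produces ten thousand times as many segments, each ten thousand times smaller, which is the small-file pattern described above.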

[0011] At present, no description or report of technology similar to the present invention has been found, and no similar material has been collected at home or abroad.

Embodiment Construction

[0033] The following is a detailed description of the embodiments of the present invention: this embodiment is implemented on the premise of the technical solution of the present invention, and provides detailed implementation methods and specific operation processes. It should be noted that those skilled in the art can make several modifications and improvements without departing from the concept of the present invention, and these all belong to the protection scope of the present invention.

[0034] The embodiment of the present invention provides an optimization system for the shuffle phase in Hadoop MapReduce, comprising a system master node and system worker nodes, wherein:

[0035] The master node of the system includes a scheduler module and a communication module a. The scheduler module schedules when partition files are merged in advance (pre-merging), when shuffling is performed in advance (pre-shuffling), and the destination of the shuffle results; the communication module...

Abstract

The invention provides an optimization system for the shuffle stage in Hadoop MapReduce. The optimization system runs as a daemon process on the worker nodes and the master node of Hadoop MapReduce, and communicates with Hadoop MapReduce through inter-process communication and remote procedure calls. The invention also provides an optimization method based on this system. Once running, the optimization system takes over all intermediate data produced while a Hadoop MapReduce task runs. Through pre-merging and pre-shuffling, it makes use of the network bandwidth that is idle during the Map stage, and it reduces small-file reads and writes by merging the intermediate data held on the same node, thereby shortening MapReduce task completion time.
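
The pre-merging idea in the abstract can be illustrated with a small in-memory sketch. This is not the patent's implementation: the data layout (one dict per map task, mapping a partition id to a key-sorted list of records) and the function name `pre_merge` are assumptions made for the example.

```python
from collections import defaultdict
import heapq


def pre_merge(map_outputs):
    """Merge the per-partition outputs of several map tasks on ONE node.

    map_outputs: list of dicts, one per map task, each mapping
                 partition_id -> list of (key, value) records sorted by key.
    Returns a dict partition_id -> single key-sorted list, so a reducer can
    fetch one merged segment per node instead of one segment per map task.
    """
    runs_by_partition = defaultdict(list)
    for output in map_outputs:
        for pid, records in output.items():
            runs_by_partition[pid].append(records)
    # heapq.merge performs a k-way merge of already-sorted runs.
    return {
        pid: list(heapq.merge(*runs, key=lambda kv: kv[0]))
        for pid, runs in runs_by_partition.items()
    }


# Hypothetical outputs of three map tasks that ran on the same node.
node_outputs = [
    {0: [("a", 1), ("c", 1)], 1: [("b", 1)]},
    {0: [("a", 1), ("b", 1)], 1: [("d", 1)]},
    {0: [("c", 1)]},
]
merged = pre_merge(node_outputs)
files_before = sum(len(output) for output in node_outputs)
files_after = len(merged)
print(files_before, "segments before,", files_after, "after")
```

In this toy case five small segments become two larger ones. A real system would work on files on disk and would also have to decide when to merge, which the abstract leaves to the scheduler.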

Description

Technical field

[0001] The invention relates to the technical field of big data and cloud computing, in particular to an optimization system and method for the Shuffle stage in Hadoop MapReduce.

Background technique

[0002] MapReduce is a distributed computing framework for processing big data. Hadoop MapReduce is the most well-known and widely used open source implementation of MapReduce. Hadoop MapReduce users can process massive data (TB-level or even PB-level data) in parallel on large-scale clusters (up to thousands of nodes) by simply writing Map and Reduce algorithms. Moreover, Hadoop MapReduce provides a strong fault tolerance capability to ensure that tasks are completed in thousands of nodes.

[0003] Hadoop MapReduce follows the BSP (Bulk Synchronous Parallel) model and abstracts the distributed computing process into three stages: Map, Shuffle, and Reduce.

[0004] The operation of the Map phase is divided into two sub-phases: Map calculation and partition (P...
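
The three stages named in [0003] can be sketched in a single process with a word count. This is a toy illustration of the general MapReduce model, not Hadoop's API: all function names are hypothetical, and choosing a partition by a CRC32 checksum of the key modulo the number of reducers is an assumption made for the example.

```python
from collections import defaultdict
import zlib


def map_phase(lines, num_reducers):
    """Map calculation plus partition: emit (word, 1) into a hashed partition."""
    partitions = defaultdict(list)
    for line in lines:
        for word in line.split():
            pid = zlib.crc32(word.encode()) % num_reducers
            partitions[pid].append((word, 1))
    return partitions


def shuffle_phase(all_map_outputs, num_reducers):
    """Shuffle: each reducer collects its partition from every map output."""
    reducer_inputs = {pid: [] for pid in range(num_reducers)}
    for output in all_map_outputs:
        for pid, records in output.items():
            reducer_inputs[pid].extend(records)
    return reducer_inputs


def reduce_phase(records):
    """Reduce: sum the counts for each word in one partition."""
    counts = defaultdict(int)
    for word, n in records:
        counts[word] += n
    return dict(counts)


# Two hypothetical input splits, each handled by its own map task.
splits = [["a b a"], ["b c"]]
map_outputs = [map_phase(split, 2) for split in splits]
reducer_inputs = shuffle_phase(map_outputs, 2)

result = {}
for records in reducer_inputs.values():
    result.update(reduce_phase(records))  # each word lives in one partition
print(result)
```

Every reducer reads a piece of every map output, which is why the shuffle stage dominates disk and network traffic when the number of map and reduce tasks is large.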

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F9/50, G06F16/18

CPC: G06F9/5066, G06F16/1815, Y02D10/00

Inventor: 管海兵, 吴仲轩, 任锐, 戚正伟

Owner: SHANGHAI JIAO TONG UNIV
