MapReduce performance optimization system and optimization method

A MapReduce performance optimization system and method, applied in the field of distributed computing. It addresses MapReduce performance degradation, failure to achieve load balancing, and Reduce subtasks that run for a very long time, while reducing additional overhead, keeping computing overhead low, and remaining simple and easy to implement.

Active Publication Date: 2016-12-21
YANGTZE DELTA REGION INST OF TSINGHUA UNIV ZHEJIANG
Cites: 1 | Cited by: 10

AI Technical Summary

Problems solved by technology

When the input data is heavily skewed, the number of Values corresponding to each Key varies greatly, so the same number of Keys does not mean the same amount of data. A method that balances only the number of Keys therefore cannot achieve good load balancing: Reduce subtasks that receive too much data are bound to run for a very long time, seriously degrading MapReduce performance.
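The gap between "same number of Keys" and "same amount of data" can be illustrated with a minimal sketch. The key counts, the two-reducer setup, and the helper names `reducer_loads` and `equal_key_partition` are hypothetical and chosen only for illustration; they are not taken from the patent.

```python
def reducer_loads(key_counts, num_reducers, partition):
    """Sum the number of Values each Reduce subtask would receive."""
    loads = [0] * num_reducers
    for key, n_values in key_counts.items():
        loads[partition(key, num_reducers)] += n_values
    return loads


# Hypothetical skewed input: one hot Key carries almost all Values.
key_counts = {"a": 1000, "b": 10, "c": 10, "d": 10}


def equal_key_partition(key, num_reducers):
    """Give every reducer the same number of Keys (round-robin over sorted Keys)."""
    return sorted(key_counts).index(key) % num_reducers


loads = reducer_loads(key_counts, 2, equal_key_partition)
print(loads)  # each reducer holds two Keys, but the data volumes differ widely
```

Both reducers hold two Keys, yet one receives 1010 Values and the other only 20, so the job finishes only when the overloaded reducer does.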




Embodiment Construction

[0032] Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals designate the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the figures are exemplary; they are intended to explain the present invention and should not be construed as limiting it.

[0033] The following describes the MapReduce performance optimization system and optimization method according to the embodiments of the present invention with reference to the drawings. First, the MapReduce performance optimization system according to the embodiments of the present invention will be described with reference to the drawings.

[0034] FIG. 2 is a schematic structural diagram of a MapReduce performance optimization system according to an embodiment of the present invention.

[0035] As shown in FIG. 2, ...


Abstract

The invention discloses a MapReduce performance optimization system and method. The system comprises a Skew-master node and a plurality of Skew-slave nodes. The Skew-master node acts as the master coordinator: it globally manages the distribution of Keys among Reduce subtasks and schedules the Reduce subtasks to suitable execution nodes. Each Skew-slave node comprises a Key monitor and an IO monitor, and collects Key-related information and sends it to the Skew-master node. The optimization system can optimize MapReduce performance under data skew and is simple and easy to implement.
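The division of labor in the abstract (slave-side Key monitors report Key information; the master coordinator decides how Keys are spread over Reduce subtasks) can be sketched roughly as follows. The class names `KeyMonitor` and `SkewMaster`, their methods, and the greedy largest-first assignment are all assumptions made for illustration; the text available here does not state which assignment algorithm the patent uses, and the IO monitor and node scheduling are not modeled.

```python
import heapq
from collections import Counter


class KeyMonitor:
    """Hypothetical stand-in for the Key monitor on one slave node."""

    def __init__(self):
        self.counts = Counter()

    def observe(self, key):
        # Count one Value emitted for this Key by local Map output.
        self.counts[key] += 1

    def report(self):
        # Key-related information sent to the master.
        return dict(self.counts)


class SkewMaster:
    """Hypothetical master coordinator that balances Keys by data volume."""

    def __init__(self, num_reducers):
        self.num_reducers = num_reducers
        self.global_counts = Counter()

    def receive(self, report):
        # Counter.update adds counts, merging reports from many slaves.
        self.global_counts.update(report)

    def assign_keys(self):
        # Assumed heuristic: place the heaviest Key on the least-loaded reducer.
        heap = [(0, r) for r in range(self.num_reducers)]
        heapq.heapify(heap)
        assignment = {}
        ordered = sorted(self.global_counts.items(), key=lambda kv: (-kv[1], kv[0]))
        for key, count in ordered:
            load, reducer = heapq.heappop(heap)
            assignment[key] = reducer
            heapq.heappush(heap, (load + count, reducer))
        return assignment


# Usage with two hypothetical slaves and two Reduce subtasks.
slave_a, slave_b = KeyMonitor(), KeyMonitor()
for key in ["x"] * 6 + ["y"] * 2:
    slave_a.observe(key)
for key in ["x"] * 4 + ["z"] * 3 + ["w"] * 3:
    slave_b.observe(key)

master = SkewMaster(num_reducers=2)
master.receive(slave_a.report())
master.receive(slave_b.report())
assignment = master.assign_keys()

loads = [0, 0]
for key, reducer in assignment.items():
    loads[reducer] += master.global_counts[key]
print(assignment, loads)
```

In this toy run the hot Key "x" gets a reducer to itself and the three light Keys share the other, giving loads of 10 and 8 Values rather than a split by Key count alone.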

Description

Technical field

[0001] The invention relates to the technical field of distributed computing, in particular to a MapReduce performance optimization system and optimization method.

Background

[0002] The rapid growth of the Internet and the World Wide Web has resulted in the dissemination of vast amounts of information online. Additionally, businesses and government agencies generate vast amounts of structured and unstructured information that needs to be processed, analyzed and linked. Data-intensive applications such as systems biology, climate modeling, data mining, and high-performance computing generate soaring amounts of data, ranging from gigabytes to terabytes to petabytes.

[0003] MapReduce is a programming model and distributed computing model for processing large-scale data. It has a wide range of applications, such as distributed pattern-based search, distributed sorting, web page link graph reversal, Web access log statistics, inverted index structur...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/50
CPC: G06F9/5083
Inventor: 姜进磊, 武永卫, 王博
Owner: YANGTZE DELTA REGION INST OF TSINGHUA UNIV ZHEJIANG