Multithreading-based MapReduce execution system

An execution system based on multithreading technology, applied in multiprogramming arrangements, resource allocation, etc. It addresses problems such as a high barrier to use, poor usability, and high cost, and achieves the effects of avoiding overhead, reducing resource contention, and reducing resource management pressure.

Active Publication Date: 2014-02-26
HUAZHONG UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0006] Aiming at the defects of the prior art, the object of the present invention is to provide a multi-thread-based MapReduce exec

Embodiment Construction

[0024] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0025] Hadoop is widely used largely because of its mature code base and high availability. The present invention aims to improve execution efficiency while preserving these characteristics. For this reason, the system interface is identical to that of the original Hadoop, and users do not need to modify their existing MapReduce programs to use the present invention. The user submits a job to the JobTracker through the JobClient on the user's node, and the JobTracker schedules and initializes the job. When it is ready, when eac...

Abstract

The invention discloses a multithreading-based MapReduce execution system comprising a MapReduce execution engine that implements multithreading. The multi-process execution mode of Map/Reduce tasks in the original Hadoop is changed into a multithreaded mode. Details of memory usage are extracted from the Map tasks and Reduce tasks, the MapReduce process is divided at fine granularity into multiple phases according to these details, and the shuffle process of the original Hadoop is changed from Reduce pull to Map active push. A unified memory management module and an I/O management module are implemented in the multithreaded MapReduce execution engine to centrally manage the memory usage of each task thread. Global memory scheduling and I/O scheduling algorithms are designed to dynamically schedule system resources during execution. The multithreading-based MapReduce execution system has the advantages that users can maximize memory usage without modifying their original MapReduce programs, disk bandwidth is fully utilized, and the long-standing I/O bottleneck problem in the original Hadoop is solved.
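The two structural changes named in the abstract, tasks as threads instead of processes and a shuffle in which Map pushes instead of Reduce pulling, can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented engine and not Hadoop's API: `map_task`, `reduce_task`, `run_job`, `NUM_REDUCERS`, and the in-memory queues standing in for the shuffle are all hypothetical names chosen for this word-count example.

```python
import threading
from collections import defaultdict
from queue import Queue

NUM_REDUCERS = 2   # hypothetical setting for this sketch
_DONE = object()   # sentinel: one map task has finished pushing


def map_task(lines, reducer_queues):
    """Map thread: emit (word, 1) and actively push it to the owning reducer."""
    for line in lines:
        for word in line.split():
            # Partition by key so equal words always reach the same reducer.
            target = hash(word) % len(reducer_queues)
            reducer_queues[target].put((word, 1))
    # Notify every reducer that this map task has no more output.
    for q in reducer_queues:
        q.put(_DONE)


def reduce_task(queue, num_mappers, results, lock):
    """Reduce thread: consume pushed pairs until all map tasks are done."""
    counts = defaultdict(int)
    finished = 0
    while finished < num_mappers:
        item = queue.get()
        if item is _DONE:
            finished += 1
        else:
            word, n = item
            counts[word] += n
    with lock:
        results.update(counts)


def run_job(splits):
    """Run all map and reduce tasks as threads inside one process."""
    queues = [Queue() for _ in range(NUM_REDUCERS)]
    results, lock = {}, threading.Lock()
    reducers = [
        threading.Thread(target=reduce_task, args=(q, len(splits), results, lock))
        for q in queues
    ]
    mappers = [threading.Thread(target=map_task, args=(s, queues)) for s in splits]
    for t in reducers + mappers:
        t.start()
    for t in mappers + reducers:
        t.join()
    return results


if __name__ == "__main__":
    print(run_job([["a b a"], ["b c"]]))
```

Because the reduce threads start before any map output exists and receive data as it is produced, they do not have to fetch completed map output afterwards. Sharing one address space is also what makes a central memory manager across tasks possible in the first place.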

Description

Technical field

[0001] The invention belongs to the field of big-data distributed computing, and more specifically relates to a MapReduce execution system with high I/O efficiency.

Background art

[0002] The general-purpose Hadoop system is the most popular open-source MapReduce system. It runs tasks in a multi-process manner, and the tasks have no connection with one another at runtime. This simplicity of management leads to coarse, wasteful use of resources. At present, the common scenario is that multiple CPUs and multiple disks divide the memory into independent partitions to run programs. There is a serious excess of CPU resources, yet scheduling is CPU-centric, which greatly increases the waiting time of the system. Memory usage is isolated between tasks, and the execution of Reduce must wait until all Maps are completed, so memory is seriously wasted. At the same time, disk reads and writes are poorly organized, accessing the disk in para...
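The memory-isolation problem described in the background can be made concrete with a small sketch. It contrasts fixed per-task partitions with a single pool shared by all task threads. Everything here is an assumption made for illustration: `MemoryPool`, `fits_in_isolated_partition`, and the sizes used are hypothetical and are not taken from the patent or from Hadoop.

```python
import threading


def fits_in_isolated_partition(need, total, num_tasks):
    """With fixed, equal partitions a task only ever sees total // num_tasks."""
    return need <= total // num_tasks


class MemoryPool:
    """One global budget shared by all task threads (hypothetical sketch)."""

    def __init__(self, total):
        self.total = total
        self._free = total
        self._cond = threading.Condition()

    def acquire(self, nbytes):
        """Block until nbytes are available, then reserve them."""
        if nbytes > self.total:
            raise ValueError("request exceeds the whole pool")
        with self._cond:
            while self._free < nbytes:
                self._cond.wait()
            self._free -= nbytes

    def release(self, nbytes):
        """Return nbytes to the pool and wake any waiting threads."""
        with self._cond:
            self._free += nbytes
            self._cond.notify_all()

    def free_bytes(self):
        with self._cond:
            return self._free


if __name__ == "__main__":
    total, tasks, need = 1024, 4, 600
    # Isolated: each task is capped at 256 units, so a 600-unit buffer cannot fit.
    print(fits_in_isolated_partition(need, total, tasks))
    # Shared: the same request succeeds while the other tasks are idle.
    pool = MemoryPool(total)
    pool.acquire(need)
    print(pool.free_bytes())
    pool.release(need)
```

The design point is only that a request is checked against what is free globally rather than against a fixed per-task share. A real scheduler would also have to decide which waiting task to serve first, which this sketch leaves to the condition variable.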

Application Information

IPC(8): G06F9/50
Inventor 石宣化 (Shi Xuanhua), 金海 (Jin Hai), 陈明 (Chen Ming), 吴松 (Wu Song), 陆路 (Lu Lu)
Owner HUAZHONG UNIV OF SCI & TECH