MapReduce computation process optimization method

A process optimization technique based on sub-jobs, applied in the field of multiprogramming arrangements and resource allocation. It addresses problems such as high disk load, concentrated consumption of network resources, and resource shortage. It reduces instantaneous network transmission traffic, avoids long-term occupation of disks, and improves the performance of the computation process.

Inactive Publication Date: 2015-03-04
LANGCHAO ELECTRONIC INFORMATION IND CO LTD

AI Technical Summary

Problems solved by technology

[0004] The technical problem to be solved by the present invention is to improve MapReduce task processing capability. MapReduce task processing currently suffers from high memory usage, concentrated consumption of network resources with resulting network congestion, and resource shortage caused by high disk load. To address these problems, a MapReduce computation process optimization method is provided.



Embodiment Construction

[0016] The present invention is further described below with reference to the accompanying drawings and specific embodiments:

[0017] A MapReduce computation process optimization method. First, the original data file is divided into several files, and one of the unprocessed files is selected as the input of a sub-job. It is then determined whether there are files that need to be merged. If not, the task is submitted: Map tasks with the same processing procedure are started, the Map operation is executed, the Map output is sorted, merged and partitioned, the Map output result is received, the Reduce operation is executed, and the output result is saved. If there is a file that needs to be merged, the task is submitted and Map tasks with multiple processing procedures are started: different input data are sent to the corresponding Map, the Map operations are performed, and the multiple outputs are sorted, merged and partitioned. Finally, it is checked whether there are still data files in the original data file set that have no...
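The "sort, merge and partition" step applied to the Map output can be illustrated with a minimal sketch. It is not taken from the patent: the function name `sort_merge_partition` is hypothetical, "merge" is assumed here to mean summing values that share a key (a combiner-like interpretation), and partitioning is assumed to be a stable hash of the key modulo the number of reducers.

```python
from typing import List, Tuple
from zlib import crc32

KV = Tuple[str, int]


def sort_merge_partition(pairs: List[KV], num_reducers: int) -> List[List[KV]]:
    """Sort map output by key, merge equal keys, and split it per reducer.

    Hypothetical helper for illustration only; the patent text does not
    specify how each of the three steps is implemented.
    """
    # Sort: order the intermediate pairs by key.
    ordered = sorted(pairs, key=lambda kv: kv[0])

    # Merge (assumption): sum the values of pairs that share a key, so less
    # intermediate data has to be written to disk and sent over the network.
    merged = {}
    for key, value in ordered:
        merged[key] = merged.get(key, 0) + value

    # Partition (assumption): a stable hash of the key picks the reducer,
    # so every pair with the same key reaches the same reducer.
    partitions: List[List[KV]] = [[] for _ in range(num_reducers)]
    for key, value in merged.items():
        index = crc32(key.encode("utf-8")) % num_reducers
        partitions[index].append((key, value))
    return partitions


if __name__ == "__main__":
    print(sort_merge_partition([("b", 1), ("a", 1), ("b", 1)], 2))
```

With a single reducer the result is one sorted, merged list; with several reducers each key appears in exactly one partition.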


Abstract

The invention discloses a MapReduce computation process optimization method. The method includes the steps of dividing an original data file into a plurality of files, selecting a file from the unprocessed file collection as the input of a sub-job, and determining whether files needing to be merged exist. If not, a task is submitted: Map tasks with the same processing procedure are started, the Map operation is executed, the Map output is sorted, merged and partitioned, the Map output result is received, the Reduce operation is performed, and the output result is saved. If files needing to be merged exist, a task is submitted: Map tasks with multiple processing procedures are started, different input data are sent to the corresponding Map, the Map operations are executed, and the multiple outputs are sorted, merged and partitioned. Finally, it is checked whether unprocessed data files remain in the original data file collection; if not, the program terminates, and if so, the procedure is performed again on the divided data files. The method spreads the output over time, decreases the instantaneous network traffic, reduces the occupancy of the local disk, and improves the MapReduce computation process.
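The control flow described in the abstract can be sketched as a small in-memory simulation. This is an illustration under stated assumptions, not the patented implementation: word counting stands in for the job, the "files needing to be merged" are assumed to be the saved output of the previous sub-job, and every name (`split_input`, `map_raw`, `map_previous`, `run_sub_job`, `run_incremental`) is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Tuple

KV = Tuple[str, int]


def split_input(records: List[str], chunk_size: int) -> List[List[str]]:
    """Divide the original data into several smaller 'files' (chunks)."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_raw(line: str) -> Iterable[KV]:
    """Map for a raw input chunk: emit (word, 1) for every word."""
    for word in line.split():
        yield word, 1


def map_previous(pair: KV) -> Iterable[KV]:
    """Map for an earlier result (assumption): pass the pair through unchanged."""
    yield pair


def run_sub_job(inputs: List[Tuple[list, Callable]]) -> List[KV]:
    """Run one sub-job: each input is routed to its own Map, then reduced."""
    groups = defaultdict(list)
    for records, mapper in inputs:
        for record in records:
            for key, value in mapper(record):
                groups[key].append(value)
    # Sorting by key stands in for sort/merge/partition; Reduce sums the values.
    return [(key, sum(values)) for key, values in sorted(groups.items())]


def run_incremental(records: List[str], chunk_size: int) -> List[KV]:
    """Process the chunks one sub-job at a time until none is left."""
    unprocessed = split_input(records, chunk_size)
    result: List[KV] = []
    while unprocessed:
        chunk = unprocessed.pop(0)
        if not result:
            # Nothing to merge: a single kind of Map task.
            result = run_sub_job([(chunk, map_raw)])
        else:
            # Merge needed: two kinds of Map task, one per kind of input.
            result = run_sub_job([(chunk, map_raw), (result, map_previous)])
    return result


if __name__ == "__main__":
    print(run_incremental(["a b a", "b c", "a c c"], chunk_size=1))
```

Because each sub-job only handles one chunk plus the already reduced result, the intermediate data produced at any one time stays small, which is the effect on disk and network load that the abstract attributes to the method.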

Description

Technical field

[0001] The invention relates to the technical field of computer software and parallel computing, and specifically to an optimization method that improves the MapReduce computing process by reducing the amount of intermediate data stored on the local disk during program operation and thereby reducing disk load.

Background technique

[0002] With the rapid development of computer technology and Internet technology, network penetration and the number of Internet users are increasing year by year. The combined pressure of a continuously growing user base and rapidly growing data processing demand has brought new challenges to Internet applications. Massive data requires a huge amount of storage resources, and the increasing dependence of network applications on data makes the demand for the ability to compute over and process massive data ever stronger. The cost of maintaining data storage for these applications and data calcul...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F9/50
Inventor: 刘晶杨晋博黄敏
Owner: LANGCHAO ELECTRONIC INFORMATION IND CO LTD