A Hadoop optimization method based on asynchronous startup

An optimization method using asynchronous startup technology, applied in the computer field, addresses problems such as increased memory pressure on nodes, network bandwidth consumption, and loss of Hadoop's native characteristics. It increases speed, improves utilization, and eliminates the memory-space bottleneck.

Active Publication Date: 2018-07-24
SOUTH CHINA NORMAL UNIVERSITY +1
Cites: 6 · Cited by: 0

AI Technical Summary

Problems solved by technology

[0008] Network bandwidth consumption: input data must be read from other nodes instead of being processed locally, which consumes a large amount of network bandwidth.
[0009] Many improvements to Hadoop iterative processing have been proposed in response to these performance problems, but most of them modify Hadoop's underlying source code. Users who want to switch back to the original Hadoop mode must replace the entire Hadoop framework, which is highly inconvenient.
In addition, some improved methods cache data that is repeatedly read and written. Although this avoids repeated I/O access, it increases memory pressure on the nodes and creates performance bottlenecks. It also alters the overall structure of Hadoop, so Hadoop's native characteristics are lost.




Embodiment Construction

[0043] Referring to Figure 1, the Hadoop optimization method based on asynchronous startup of the present invention comprises the following steps:

[0044] A. Upload the data file to HDFS, and divide the data file into multiple data blocks;

[0045] B. Replicate the data blocks and distribute the replicas across different machines;

[0046] C. Issue a startup command, submit a MapReduce job and assign map tasks and MyReduce tasks;

[0047] D. Execute the map task: run the Map function to process the data block, obtain and send the intermediate result data, and start the next iteration job;

[0048] E. Execute the MyReduce task: receive and process the intermediate result data to obtain the result of the current iteration, while step D of the next iteration job executes at the same time;

[0049] F. Determine from the iteration result whether the iteration end condition is met. If it is, end the iteration; otherwise, return to step E to serve the next iteration job.
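The steps above describe a loop in which the next iteration job is launched before the current MyReduce task has finished. This section of the patent does not say how the next job's map obtains the current iteration's result. The Python sketch below therefore assumes the simplest safe reading: only the next job's startup overlaps with the current reduce, and its map waits for the reduce output. The functions run_serial, run_async, fake_startup and slow_reduce are hypothetical stand-ins, not Hadoop APIs. Threads and sleeps only simulate job startup and reduce time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_serial(state, startup, map_fn, reduce_fn, done, max_iter=50):
    """Baseline: each iteration job starts only after the previous one finishes."""
    history = []
    for _ in range(max_iter):
        startup()                    # job startup cost sits on the critical path
        partial = map_fn(state)      # step D: map produces intermediate data
        state = reduce_fn(partial)   # step E: reduce yields this iteration's result
        history.append(state)
        if done(state):              # step F: stop when the end condition holds
            break
    return state, history

def run_async(state, startup, map_fn, reduce_fn, done, max_iter=50):
    """Assumed asynchronous startup: start the next job while the current reduce runs."""
    history = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        startup()                    # only the first startup is fully exposed
        for i in range(max_iter):
            partial = map_fn(state)
            # Launch the next job's startup in the background, then reduce.
            nxt = pool.submit(startup) if i + 1 < max_iter else None
            state = reduce_fn(partial)
            history.append(state)
            if done(state):
                break                # the speculative job is unused; the pool waits for it on exit
            if nxt is not None:
                nxt.result()         # next job is ready, so its map can begin
    return state, history

# Hypothetical toy workload; the delays are illustrative, not measured Hadoop figures.
def fake_startup():
    time.sleep(0.05)

def slow_reduce(partial):
    time.sleep(0.05)
    return partial

if __name__ == "__main__":
    for runner in (run_serial, run_async):
        t0 = time.perf_counter()
        result, hist = runner(100.0, fake_startup, lambda s: s / 2, slow_reduce, lambda s: s < 10)
        print(runner.__name__, result, hist, round(time.perf_counter() - t0, 2))
```

With equal simulated startup and reduce delays, the asynchronous driver should produce the same results as the serial baseline in less wall-clock time. The job launched speculatively after the final iteration is wasted work, which is the price of hiding startup latency.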

[00...



Abstract

The invention discloses a Hadoop optimization method based on asynchronous startup. The method changes job execution from fully serial to partially parallel. This optimizes the overall job execution process, greatly accelerates iterative jobs, and effectively improves execution efficiency. The method does not require modifying Hadoop's underlying code and is convenient to use. In addition, it can improve cluster utilization and does not suffer from a memory-space bottleneck. The method can be widely applied in Hadoop-based architectures.
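A rough cost model shows where the speedup could come from. This model is our own idealization and is not stated in the patent. Let S be the job startup time, M the map time, and R the reduce time per iteration, over N iterations. Serially, every iteration pays S + M + R. Now assume each startup after the first is hidden behind the previous job's reduce. Each overlap then costs max(S, R), and the total saving works out to (N − 1)·min(S, R). The example uses S = 12.5 s, the midpoint of the 10-15 s startup cited in the background section ([0004]), and R = 10 s from [0005]. The values M = 20 s and N = 10 are hypothetical.

```python
def serial_time(n, startup, map_t, reduce_t):
    """Fully serial: every iteration job pays startup + map + reduce."""
    return n * (startup + map_t + reduce_t)

def async_time(n, startup, map_t, reduce_t):
    """Idealized asynchronous startup (assumption): the first startup is exposed,
    each later startup overlaps the previous reduce at a cost of max(S, R),
    and no job is launched after the final iteration."""
    return startup + n * map_t + (n - 1) * max(startup, reduce_t) + reduce_t

if __name__ == "__main__":
    S, M, R, N = 12.5, 20.0, 10.0, 10   # M and N are hypothetical values
    serial, overlapped = serial_time(N, S, M, R), async_time(N, S, M, R)
    print(serial, overlapped)                          # 425.0 335.0
    print(serial - overlapped, (N - 1) * min(S, R))    # 90.0 90.0
```

Under these assumptions the gain is bounded by the smaller of startup and reduce time per iteration. The overlap helps most when the two are of similar size, which is the situation the background section describes.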

Description

Technical Field
[0001] The invention relates to the field of computer technology, and in particular to a Hadoop optimization method based on asynchronous startup.
Background Art
[0002] Hadoop currently has the following performance bottlenecks when processing iterative jobs:
[0003] Fully serial execution: each job must wait for the previous job to finish completely before it can start;
[0004] Long startup time: job startup takes 10-15 seconds on average, which wastes a large amount of time;
[0005] Long reduce phase: the reduce phase computes the global center points and writes the result to HDFS. It takes about 10 seconds, which is also a considerable cost;
[0006] Random selection of initial center points: the initial center points strongly affect the number of k-means iterations. Selecting better initial center points helps reduce the number of iterations and the...
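Paragraph [0005] states that the reduce phase computes the global center points, which matches the standard MapReduce decomposition of k-means. The patent does not give its Map or MyReduce code, so the sketch below assumes that decomposition. Each map task assigns the points of its data block to the nearest current center and emits per-center partial sums. The reduce step merges those partials into new global centers. The function names are hypothetical, and points are plain tuples rather than HDFS records.

```python
import math

def kmeans_map(block, centers):
    """Map side: assign each point in one data block to its nearest center and
    emit per-center partial sums and counts (combiner-style aggregation)."""
    partial = {}
    for p in block:
        j = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        s, n = partial.get(j, ([0.0] * len(p), 0))
        partial[j] = ([a + b for a, b in zip(s, p)], n + 1)
    return partial

def kmeans_reduce(partials, old_centers):
    """Reduce side: merge partial sums from all blocks into global center points."""
    sums = {}
    for part in partials:
        for j, (s, n) in part.items():
            acc, cnt = sums.get(j, ([0.0] * len(s), 0))
            sums[j] = ([a + b for a, b in zip(acc, s)], cnt + n)
    new = []
    for j, c in enumerate(old_centers):
        if j in sums:
            acc, cnt = sums[j]
            new.append(tuple(a / cnt for a in acc))
        else:
            new.append(c)   # an empty cluster keeps its previous center
    return new

def converged(old, new, eps=1e-6):
    """Iteration end condition: no center moved by more than eps."""
    return max(math.dist(a, b) for a, b in zip(old, new)) < eps

if __name__ == "__main__":
    blocks = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
    centers = [(0.0, 0.0), (10.0, 10.0)]
    for it in range(1, 20):
        new = kmeans_reduce([kmeans_map(b, centers) for b in blocks], centers)
        done = converged(centers, new)
        centers = new
        if done:
            break
    print(it, centers)   # 2 [(0.0, 0.5), (10.0, 10.5)]
```

Because each map task emits only one partial sum per center, the data shuffled to the reduce side stays small regardless of block size. The initial centers here are chosen by hand; [0006] notes that this choice strongly affects how many iterations are needed.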


Application Information

Patent type & authority: Patents (China)
IPC(8): G06F9/455; G06F9/46
Inventors: 赵淦森, 邓运亨, 王翔, 何建涛, 程庆年, 周冠宇, 周尚勤, 王欣明
Owner: SOUTH CHINA NORMAL UNIVERSITY