Hadoop parameter automatic optimization method and system based on performance pre-evaluation

A Hadoop cluster automatic optimization technology, applied in transmission systems, electrical components, program control devices, and similar fields, that addresses problems such as added Hadoop code complexity, a high threshold for use, and high cost, and achieves low cost, low system resource usage, and convenient use.

Inactive Publication Date: 2013-04-24
HUAZHONG UNIV OF SCI & TECH
Cites: 1; Cited by: 44

AI Technical Summary

Problems solved by technology

[0007] To address the defects of the prior art, the purpose of the present invention is to provide a method for automatically optimizing Hadoop parameters based on performance estimation, aiming to solve the problems of high cost, low efficiency, a high threshold for use, and added Hadoop code complexity found in existing methods.


Examples

[0081] To verify the feasibility and effectiveness of the system of the present invention, the system was deployed in a real environment and an experiment was carried out on a typical set of Hadoop applications.

[0082] The basic hardware and software configuration of the Hadoop cluster used by the present invention is shown in Table 1:

[0083]

[0084] Table 1

[0085] The modules of the present invention are deployed in two parts: the Hadoop application analysis module 1 is distributed to each slave node of the Hadoop cluster as a jar package; the remaining modules (performance estimation module 2, parameter adjustment module 3, and user interaction module 4) are stored in the working directory as user programs.

[0086] The present invention first tracks the Hadoop application submitted by the user and collects statistics on how the program runs on the cluster; it then establishes a cost model according to the characteristics o...

Abstract

The invention discloses a method for automatically optimizing Hadoop parameters based on performance pre-evaluation. The method comprises the following steps: statistically analyzing the runtime characteristics of an application that a user runs on the Hadoop cluster to generate an output file; obtaining the output file and extracting from it the run time of each stage, the amount of data processed and transmitted, the resources allocated for running the Hadoop application, and the corresponding parameter configuration scheme; computing the total run time of the Hadoop application from this extracted information by means of a MapReduce simulation technique; and adjusting the parameter configuration scheme with a genetic algorithm according to the pre-evaluated performance of the Hadoop application under the current parameter configuration scheme. The method addresses the high cost, low efficiency, high threshold for use, and added Hadoop code complexity of existing methods.
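The loop described in the abstract (estimate the run time of a configuration, then let a genetic algorithm propose better configurations) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the three tunable knobs and their ranges, the numbers in `PROFILE`, the toy wave-based formula in `estimate_runtime`, and the particular selection, crossover, and mutation scheme.

```python
import math
import random

# Hypothetical search space: knob name -> (low, high) integer bounds.
# These names are illustrative, not parameters listed in the patent.
SEARCH_SPACE = {
    "sort_buffer_mb": (50, 400),
    "reduce_tasks": (1, 64),
    "map_slots_per_node": (1, 8),
}

# Hypothetical profile standing in for the statistics extracted from the
# analysis output file (stage costs, data volumes, allocated resources).
PROFILE = {
    "input_mb": 20480.0, "split_mb": 64.0, "nodes": 8,
    "reduce_slots_per_node": 2, "map_output_ratio": 0.5,
    "map_sec_per_mb": 0.05, "shuffle_sec_per_mb": 0.01,
    "reduce_sec_per_mb": 0.03, "task_startup_sec": 2.0,
}

def estimate_runtime(cfg, profile=PROFILE):
    """Toy wave-based estimate of total job time in seconds (assumed model)."""
    p = profile
    n_maps = math.ceil(p["input_mb"] / p["split_mb"])
    map_waves = math.ceil(n_maps / (p["nodes"] * cfg["map_slots_per_node"]))
    # Map output larger than the sort buffer causes extra spills (assumed 20% each).
    spills = math.ceil(p["split_mb"] * p["map_output_ratio"] / cfg["sort_buffer_mb"])
    map_task = p["task_startup_sec"] + p["split_mb"] * p["map_sec_per_mb"] * (1 + 0.2 * (spills - 1))
    per_reduce_mb = p["input_mb"] * p["map_output_ratio"] / cfg["reduce_tasks"]
    reduce_waves = math.ceil(cfg["reduce_tasks"] / (p["nodes"] * p["reduce_slots_per_node"]))
    reduce_task = p["task_startup_sec"] + per_reduce_mb * (p["shuffle_sec_per_mb"] + p["reduce_sec_per_mb"])
    return map_waves * map_task + reduce_waves * reduce_task

def random_config(rng):
    """Draw one configuration uniformly from the search space."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}

def crossover(a, b, rng):
    """Uniform crossover: each knob is inherited from one of the two parents."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}

def mutate(cfg, rng, rate=0.2):
    """Resample each knob within its bounds with probability `rate`."""
    out = dict(cfg)
    for k, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            out[k] = rng.randint(lo, hi)
    return out

def tune(generations=30, pop_size=20, seed=0):
    """Genetic search that keeps the best half of each generation unchanged."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=estimate_runtime)
        elite = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return min(population, key=estimate_runtime)
```

Calling `tune()` returns the configuration with the lowest estimated run time found by the search, and `estimate_runtime(tune())` gives its estimate in seconds. Because each candidate is scored by the estimator rather than by running the job, the search costs no cluster time, which is the motivation for pre-evaluation stated above.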

Description

Technical field

[0001] The invention belongs to the field of distributed computing models, and more specifically relates to a method and system for automatic optimization of Hadoop parameters based on performance estimation.

Background art

[0002] With the rise of cloud computing, the MapReduce programming model has been widely used as an important means of simplifying large-scale data processing. Hadoop is an open-source implementation of MapReduce: a software framework for distributed processing of large amounts of data, with which users can easily develop distributed programs without knowing the underlying details of the distributed system. Hadoop has more than 180 parameters that control how an application runs, and users can adjust their values according to their own needs. A large number of experiments have shown that the setting of Hadoop system parameters has a great impact on the performance of the application prog...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/44, H04L29/08
Inventor: 金海, 石宣化, 吴松, 曾林西
Owner: HUAZHONG UNIV OF SCI & TECH