Data sensing-based Spark configuration parameter automatic optimization method

A data-aware technique for configuring parameters, applicable to program control devices, version control, instruments, and similar fields. It addresses the high cost of modeling, the time-consuming nature of manual parameter configuration, and the low accuracy of performance models, and achieves high-precision results.

Status: Inactive · Publication date: 2017-05-10
SHENZHEN INST OF ADVANCED TECH
Cites: 3 · Cited by: 50

AI Technical Summary

Problems solved by technology

Because the optimal configuration parameters differ across cluster environments, applications, and input data sets, configuring parameters manually is time-consuming and tedious.
[0010] Existing automatic parameter configuration methods suffer from low performance-model accuracy and high modeling cost.
Some methods build the performance model with an artificial neural network (ANN) or a support vector machine (SVM), but reaching higher accuracy (error within 10%) requires a very large training set.
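The text does not define how "within 10%" is measured. A common reading, assumed here, is the mean relative error between the execution times a performance model predicts and the times actually measured. The sketch below computes that metric; the function name and the sample numbers are illustrative only and are not results from the patent.

```python
def mean_relative_error(predicted, measured):
    # Mean of |predicted - measured| / measured over all test runs.
    # Assumed metric for "accuracy within 10%"; the source does not define it.
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(abs(p - m) / m for p, m in zip(predicted, measured)) / len(measured)


# Illustrative predicted vs. measured execution times (seconds), not real data.
err = mean_relative_error([105.0, 48.0, 210.0], [100.0, 50.0, 200.0])
print(f"{err:.1%}")  # prints 4.7%
```

Under this reading, a model "within 10%" would keep `err` at or below 0.10 on held-out runs.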




Embodiment Construction

[0037] To make the objectives, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here only explain the invention and do not limit it.

[0038] As shown in Figures 1-2, a data-aware Spark configuration parameter automatic optimization method comprises the following three steps:

[0039] 1) Collect data. Data collection comprises four sub-steps:

[0040] (1) Find the parameters that affect performance from all Spark parameters;

[0041] (2) Determine the value range of the parameter;

[0042] (3) Select the input set for the application;

[0043] (4) Randomly vary the parameters within the determined value ranges, configure Spark accordingly, run the application with different input data sets, and use the collected data ...
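Sub-steps (1) to (4) amount to a random-sampling loop that produces training samples. The sketch below assumes a small parameter space: the keys `spark.executor.memory`, `spark.executor.cores`, `spark.default.parallelism`, and `spark.shuffle.compress` are real Spark configuration properties, but choosing them, their value ranges, and the input sizes in `INPUT_SIZES` are illustrative assumptions, not the parameters the patent selects. `run_app` is a hypothetical callback that would launch the application (for example with `spark-submit` and the generated `--conf` arguments) and return its measured execution time.

```python
import random

# Steps (1)-(2), hypothetical: real Spark keys, but the selection and ranges are assumed.
PARAM_RANGES = {
    "spark.executor.memory": (1, 8),        # GiB, integer
    "spark.executor.cores": (1, 4),
    "spark.default.parallelism": (8, 256),
    "spark.shuffle.compress": (0, 1),       # boolean encoded as 0/1
}

# Step (3), hypothetical: input data-set sizes (GiB) for the application.
INPUT_SIZES = [1, 5, 10]


def random_config(rng):
    # Step (4): draw each parameter uniformly within its value range.
    return {k: rng.randint(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}


def to_conf_args(cfg):
    # Render a sampled configuration as spark-submit "--conf key=value" arguments.
    args = []
    for key, value in cfg.items():
        if key == "spark.executor.memory":
            value = f"{value}g"
        elif key == "spark.shuffle.compress":
            value = "true" if value else "false"
        args += ["--conf", f"{key}={value}"]
    return args


def collect_samples(n, run_app, seed=0):
    # Each sample: (configuration, input size, measured execution time).
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        cfg = random_config(rng)
        size = rng.choice(INPUT_SIZES)
        samples.append((cfg, size, run_app(to_conf_args(cfg), size)))
    return samples
```

Sampling uniformly and independently per parameter keeps the collector simple and unbiased toward any region of the space; the resulting tuples are the kind of training set the abstract describes.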



Abstract

The invention belongs to the technical fields of electronic information, big data, cloud computing, and the like, and in particular relates to a data-aware automatic optimization method for Spark configuration parameters. The method comprises: predetermining a Spark application and the parameters that influence Spark performance; randomly configuring those parameters to obtain a training set; building a performance model from the training set with a random forest algorithm; and searching for the optimal configuration parameters with a genetic algorithm. Without requiring the user to understand Spark's execution mechanism, the meaning and effect of each parameter, its value range, the application's characteristics, or the input set, the method finds the optimal configuration parameters for a specific application running in a specific cluster environment. Compared with conventional parameter configuration methods, it is simpler and faster, and because the random forest algorithm combines the advantages of machine learning and statistical inference, it achieves relatively high accuracy with a relatively small training set.
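The modeling and search stages in the abstract (a random forest performance model, then a genetic-algorithm search over it) can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the patent's implementation: each configuration is treated as a vector of numeric parameters with `(lo, hi)` ranges, the forest is a small from-scratch version (bootstrap samples, a random feature subset per split, variance-reduction splits), and the GA's truncation selection, uniform crossover, and random-reset mutation are choices of this sketch. `fake_run_time` is a hypothetical stand-in for actually running and timing a Spark job.

```python
import random
import statistics


def sse(vals):
    # Sum of squared deviations from the mean (regression split criterion).
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)


def build_tree(X, y, depth, max_depth, min_leaf, n_feats, rng):
    # Leaf: predict the mean execution time of the samples that reach it.
    if depth >= max_depth or len(y) <= min_leaf or len(set(y)) == 1:
        return statistics.fmean(y)
    best = None
    # Consider only a random subset of features at each split.
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # sampled features are constant here; cannot split
        return statistics.fmean(y)
    _, f, t, left, right = best

    def child(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_leaf, n_feats, rng)

    return (f, t, child(left), child(right))


def predict_tree(node, x):
    # Internal nodes are (feature, threshold, left, right); leaves are floats.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class RandomForest:
    """Bagged regression trees mapping a configuration vector to run time."""

    def __init__(self, n_trees=10, max_depth=6, min_leaf=2, seed=0):
        self.n_trees, self.max_depth, self.min_leaf = n_trees, max_depth, min_leaf
        self.seed = seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n_feats = max(1, len(X[0]) // 3)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, self.min_leaf, n_feats, rng))
        return self

    def predict(self, x):
        # Forest prediction = average of the individual trees' predictions.
        return statistics.fmean(predict_tree(tree, x) for tree in self.trees)


def genetic_search(model, ranges, pop_size=30, generations=40, mutation_rate=0.2, seed=0):
    # Minimize the model's predicted run time over configurations within `ranges`.
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in ranges] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=model.predict)          # lower predicted time = fitter
        parents = pop[: pop_size // 2]       # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            for i, (lo, hi) in enumerate(ranges):
                if rng.random() < mutation_rate:
                    child[i] = rng.uniform(lo, hi)            # random-reset mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=model.predict)


if __name__ == "__main__":
    # Hypothetical stand-in for running the Spark application and timing it.
    def fake_run_time(cfg):
        return 10 + (cfg[0] - 3) ** 2 + (cfg[1] - 5) ** 2

    ranges = [(0.0, 10.0), (0.0, 10.0)]
    rng = random.Random(1)
    X = [[rng.uniform(lo, hi) for lo, hi in ranges] for _ in range(120)]
    y = [fake_run_time(x) for x in X]
    model = RandomForest().fit(X, y)
    best = genetic_search(model, ranges)
    print(best, model.predict(best), fake_run_time(best))
```

The GA only queries the cheap model, never the cluster, which is why a reasonably accurate model built from a modest training set is the key ingredient.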

Description

Technical field

[0001] The invention belongs to the technical fields of electronic information, big data, cloud computing, and the like, and in particular relates to a data-aware automatic optimization method for Spark configuration parameters.

Background

[0002] Spark is a general-purpose parallel computing framework, similar to Hadoop MapReduce, open-sourced by the UC Berkeley AMP Lab (AMP Lab of the University of California, Berkeley). It has developed rapidly and, in just five years, became a top-level project of the Apache Software Foundation. Because Spark keeps intermediate results in memory, it runs iterative and interactive programs 10 times faster than Hadoop, the traditional disk-based computing framework. Spark holds an important position in big data analytics: according to a Typesafe survey, more than 500 enterprises were using Spark in 2015.

[0003] Configuration parameter optimization has always been one of the research hotspots in big data ...


Application Information

Patent type & authority: Applications (China)
IPC(8): G06F9/44, G06K9/62
CPC: G06F8/71, G06F18/24323, G06F18/214
Inventors: 罗妮 (Luo Ni), 喻之斌 (Yu Zhibin), 贝振东 (Bei Zhendong), 姜春涛 (Jiang Chuntao), 须成忠 (Xu Chengzhong), 熊文 (Xiong Wen)
Owner: SHENZHEN INST OF ADVANCED TECH