Automatic tuning method for Spark configuration parameters based on cluster scaling

A technology for configuring parameters and clusters, applied in the computer field. It addresses the complex model creation process and high time cost of existing approaches, lowering the threshold for optimization and reducing time cost.

Active Publication Date: 2021-03-23
河钢数字技术股份有限公司
AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to propose an automatic tuning method for Spark configuration parameters based on cluster scaling. It addresses the high time cost and complicated model creation process of prior-art methods for automatically tuning the configuration parameters of the distributed memory computing framework Spark.

Examples

Embodiment Construction

[0037] The present invention will be further described below in conjunction with the accompanying drawings.

[0038] The specific steps of the present invention are described further with reference to Figure 1.

[0039] Step 1, build a cluster.

[0040] Build a cluster of multiple computers with the same hardware configuration, each equipped with the distributed memory computing framework Spark.

[0041] Step 2, select the configuration parameter set.

[0042] From all the modifiable configuration parameters of the distributed memory computing framework Spark cluster, select those that the optimization standard recommends modifying. These form the set of configuration parameters to be optimized.

[0043] The optimization standard refers to the optimization page in the official documentation of the distributed memory computing framework Spark, which specifies the configuration parameters that should be optimized.
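Step 2 can be sketched as a simple filter. The sketch below is illustrative only: the property names are real Spark settings, but the snapshot of cluster settings, the `RECOMMENDED` subset, and the helper `select_parameter_set` are assumptions, since the text shown here does not list which parameters the optimization standard actually recommends.

```python
# Hypothetical snapshot of a cluster's modifiable settings (property -> current value).
ALL_PARAMETERS = {
    "spark.executor.memory": "1g",
    "spark.executor.cores": "1",
    "spark.serializer": "org.apache.spark.serializer.JavaSerializer",
    "spark.memory.fraction": "0.6",
    "spark.app.name": "benchmark",
    "spark.ui.port": "4040",
}

# Assumed subset that a tuning guide might recommend adjusting; not taken from the patent.
RECOMMENDED = {
    "spark.executor.memory",
    "spark.executor.cores",
    "spark.serializer",
    "spark.memory.fraction",
    "spark.default.parallelism",
}


def select_parameter_set(all_parameters, recommended):
    """Keep only the modifiable parameters that the optimization standard recommends."""
    return {name: value for name, value in all_parameters.items() if name in recommended}


to_optimize = select_parameter_set(ALL_PARAMETERS, RECOMMENDED)
print(sorted(to_optimize))
```

Parameters that are recommended but not present among the cluster's modifiable settings (here `spark.default.parallelism`) are left out, as are settings such as the application name that have no bearing on performance.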

[0044] Step 3, determine the value type...

Abstract

The present invention discloses a method for automatically tuning Spark configuration parameters based on cluster scaling. Its steps are: (1) building a cluster; (2) selecting a configuration parameter set; (3) determining the value type and range of the configuration parameters; (4) scaling the cluster; (5) training a random forest model; (6) screening for the best configuration; (7) verifying the configuration effect. The invention can be applied in the technical field of massive data processing. By scaling the value ranges of the memory configuration parameters of the distributed memory computing framework Spark and the amount of data to be processed, it shortens the time needed to evaluate each configuration. A random forest model establishes the relationship between a configuration and its effect on the performance of the Spark cluster, and is used to search for the best-performing configuration for a Spark cluster composed of multiple computers with the same hardware configuration.
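Steps (4) to (6) can be sketched roughly as follows. Everything specific here is an assumption rather than the patent's method: the value ranges, the scale factor, and the reading of "scaling" as dividing memory ranges by that factor are hypothetical; `measure_runtime` is a synthetic formula standing in for timing a real job on the scaled cluster with scaled data; and the model is a tiny pure-Python ensemble of bagged regression trees with one randomly chosen feature per split, standing in for a full random forest implementation.

```python
import random

# Hypothetical full-size value ranges (memory in MB); not taken from the patent.
FULL_RANGES = {"spark.executor.memory": (2048, 8192), "spark.executor.cores": (1, 4)}
MEMORY_PARAMS = {"spark.executor.memory"}


def scale_ranges(ranges, memory_params, factor):
    """Step 4 (assumed form): divide memory ranges by `factor`, keep the rest."""
    return {name: (lo // factor, hi // factor) if name in memory_params else (lo, hi)
            for name, (lo, hi) in ranges.items()}


def measure_runtime(config):
    """Synthetic stand-in for timing a job on the scaled cluster with scaled data."""
    return 4096.0 / config["spark.executor.memory"] + 8.0 / config["spark.executor.cores"]


def sse(values):
    """Sum of squared errors around the mean, used to score a split."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(rows, targets, depth, rng):
    """Fit a regression tree; each node splits on one randomly chosen feature."""
    if depth == 0 or len(set(targets)) <= 1:
        return sum(targets) / len(targets)
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({row[feature] for row in rows})[1:]:
        left = [i for i, row in enumerate(rows) if row[feature] < threshold]
        right = [i for i, row in enumerate(rows) if row[feature] >= threshold]
        error = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
        if best is None or error < best[0]:
            best = (error, threshold, left, right)
    if best is None:  # the chosen feature has a single value, so stop here
        return sum(targets) / len(targets)
    _, threshold, left, right = best
    return (feature, threshold,
            fit_tree([rows[i] for i in left], [targets[i] for i in left], depth - 1, rng),
            fit_tree([rows[i] for i in right], [targets[i] for i in right], depth - 1, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's mean runtime."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if row[feature] < threshold else right
    return node


def fit_forest(rows, targets, n_trees, depth, rng):
    """Step 5: train each tree on a bootstrap resample of the measurements."""
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_tree([rows[i] for i in picks],
                               [targets[i] for i in picks], depth, rng))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


def tune(ranges, n_samples=40, n_candidates=200, seed=0):
    """Steps 5-6: sample configs, fit the forest, return the best predicted config."""
    rng = random.Random(seed)
    names = sorted(ranges)
    sample = lambda: [rng.randint(*ranges[name]) for name in names]
    rows = [sample() for _ in range(n_samples)]
    targets = [measure_runtime(dict(zip(names, row))) for row in rows]
    forest = fit_forest(rows, targets, n_trees=20, depth=4, rng=rng)
    best = min((sample() for _ in range(n_candidates)),
               key=lambda row: predict_forest(forest, row))
    return dict(zip(names, best))


scaled = scale_ranges(FULL_RANGES, MEMORY_PARAMS, factor=4)
best_config = tune(scaled)
print(scaled, best_config, measure_runtime(best_config))
```

The sketch stops before step (7). Verifying the configuration effect would mean running the selected configuration on the real cluster and comparing its runtime with the default configuration, which a synthetic cost function cannot show.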

Description

Technical field

[0001] The invention belongs to the technical field of computers, and further relates to a method for automatically tuning Spark configuration parameters based on cluster scaling in the technical field of massive data processing. By scaling the distributed memory computing framework Spark cluster and training a random forest model, the present invention can obtain a configuration under which the Spark cluster performs better than it does under the default configuration.

Background technique

[0002] The distributed memory computing framework Spark is a parallel computing framework for big data. Because it is based on in-memory computing, it improves the real-time performance of data processing in big data environments while ensuring high fault tolerance and high scalability, allowing users to deploy the distributed memory computing framework Spark on a ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F8/71, G06K9/62
CPC: G06F8/71, G06F18/24323, G06F18/214
Inventor: 鲍亮, 陈炜昭, 卜晓璇
Owner: 河钢数字技术股份有限公司