Spark parameter adaptive optimization method and system

An optimization method based on self-adaptive technology, applicable in fields such as electrical digital data processing, resource allocation and program control design, that addresses problems such as the complexity of tuning Spark parameters.

Pending Publication Date: 2020-02-21
武汉联图时空信息科技有限公司 (Wuhan Liantu Spatiotemporal Information Technology Co., Ltd.)
Cites: 4; Cited by: 3

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to provide a Spark parameter self-adaptive optimization...

Examples


Example Embodiment

[0068] Example

[0069] In this embodiment, a Spark cluster is built from four servers, with 14 available cores and 44 GB of available memory. The test application is a spatial intersection computation between trajectory data points and road network data.
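The cluster resources stated above limit which executor settings are worth trying. The sketch below enumerates a parameter value space under two assumptions: that the 14 cores and 44 GB are cluster-wide totals (the text does not say whether they are per server), and that the hypothetical candidate values stand in for the actual value space given in Table 1, which is not reproduced here. The keys spark.executor.instances, spark.executor.cores and spark.executor.memory are standard Spark configuration properties. The feasibility rule is an illustrative simplification, not the patent's procedure.

```python
from itertools import product

# Assumption: the 14 cores and 44 GB in the embodiment are cluster-wide totals.
TOTAL_CORES = 14
TOTAL_MEMORY_GB = 44

# Hypothetical candidate values; the embodiment's actual value space is in Table 1.
EXECUTOR_INSTANCES = [1, 2, 3, 4, 7]
EXECUTOR_CORES = [1, 2, 3, 4]
EXECUTOR_MEMORY_GB = [2, 4, 6, 8, 10]


def feasible_configs(total_cores=TOTAL_CORES, total_mem_gb=TOTAL_MEMORY_GB):
    """Yield executor settings whose combined demand fits the cluster.

    Simplification: executor memory overhead and driver resources are ignored.
    """
    for n, cores, mem in product(EXECUTOR_INSTANCES, EXECUTOR_CORES, EXECUTOR_MEMORY_GB):
        if n * cores <= total_cores and n * mem <= total_mem_gb:
            # Values are strings, as they would be passed via --conf to spark-submit.
            yield {
                "spark.executor.instances": str(n),
                "spark.executor.cores": str(cores),
                "spark.executor.memory": f"{mem}g",
            }


if __name__ == "__main__":
    configs = list(feasible_configs())
    print(len(configs), "feasible combinations")
    for conf in configs[:3]:
        print(conf)
```

Each yielded dictionary corresponds to one submission of the application during data collection.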

[0070] Step 1. Collect experimental data for model training.

[0071] As shown in Table 1, the Spark application is submitted under each parameter combination in the corresponding parameter value space. In this embodiment, 140,746 task execution measurement records are collected. The data volume starts at 50 million and is increased step by step until it reaches the upper limit that the current parameter configuration can process. The collected task measurements are preprocessed and fed into the neural network model for training; the model parameters are shown in Table 2. The model is evaluated on the test data set, and the average prediction deviation of single-task execution time is ab...
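Step 1 trains a neural network to predict single-task execution time and reports its average prediction deviation. The patent's feature set, network architecture (Table 2) and measurements are not reproduced here, so the following is only a minimal sketch under stated assumptions. The three features, the synthetic data generator and the one-hidden-layer tanh network are hypothetical. "Average prediction deviation" is read as the mean relative error on a held-out test set.

```python
import numpy as np


def make_synthetic_tasks(n, rng):
    """Synthetic stand-in for collected task measurements (NOT data from the patent).

    Hypothetical features: input records (millions), executor cores, executor memory (GB).
    """
    records = rng.uniform(1.0, 10.0, n)
    cores = rng.integers(1, 5, n).astype(float)
    mem = rng.choice([2.0, 4.0, 8.0], n)
    # Made-up ground truth: more data, more core contention and less memory are slower.
    seconds = 2.0 * records * (1 + 0.1 * cores) * (1 + 2.0 / mem)
    seconds += rng.normal(0.0, 0.2, n)
    return np.column_stack([records, cores, mem]), seconds


def train_mlp(X, y, hidden=16, lr=0.05, epochs=3000, seed=0):
    """One-hidden-layer tanh MLP fitted by full-batch gradient descent on half-MSE."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1))
    b2 = np.zeros(1)
    n = len(y)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)
        err = (h @ W2 + b2).ravel() - y
        g_out = err[:, None] / n                # d(loss)/d(output)
        g_h = (g_out @ W2.T) * (1.0 - h ** 2)   # backpropagate through tanh
        W2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return W1, b1, W2, b2


def fit_and_evaluate(seed=0):
    """Train on 800 synthetic tasks, test on 200; return (model, baseline) mean relative error."""
    rng = np.random.default_rng(seed)
    X, y = make_synthetic_tasks(1000, rng)
    X_tr, X_te, y_tr, y_te = X[:800], X[800:], y[:800], y[800:]
    # Standardise features and target using training statistics only.
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
    y_mu, y_sd = y_tr.mean(), y_tr.std()
    W1, b1, W2, b2 = train_mlp((X_tr - mu) / sd, (y_tr - y_mu) / y_sd, seed=seed)
    h = np.tanh(((X_te - mu) / sd) @ W1 + b1)
    pred = (h @ W2 + b2).ravel() * y_sd + y_mu
    model_dev = float(np.mean(np.abs(pred - y_te) / y_te))
    baseline_dev = float(np.mean(np.abs(y_mu - y_te) / y_te))  # always predict the mean
    return model_dev, baseline_dev


if __name__ == "__main__":
    model_dev, baseline_dev = fit_and_evaluate()
    print(f"model deviation: {model_dev:.3f}, mean-baseline deviation: {baseline_dev:.3f}")
```

The mean-value baseline gives the deviation figure a reference point. Numbers printed by this sketch come from synthetic data and say nothing about the accuracy reported in the patent.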

Abstract

The invention discloses a Spark parameter adaptive optimization method and system. The method comprises the following steps: 1, building a prediction model of Spark task execution time and training it with sample data; 2, decomposing the Spark application to be executed into jobs, stages and tasks in turn according to the Spark execution mechanism, allocating the tasks to the cores of the Executor nodes, predicting the execution time of each task with the task execution time prediction model, simulating the task scheduling process of each Spark stage based on these predicted task execution times, and calculating the execution time of the stage under different parameter combinations; and 3, determining the final optimized parameter combination according to the stage execution times predicted under the different parameter combinations. Because execution time is predicted rather than measured, the method avoids actually running every parameter combination in tests, which makes self-adaptive parameter optimization practical.
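Steps 2 and 3 can be illustrated with a small simulation. This is a sketch under several assumptions. The function predict_task_time is a hypothetical formula standing in for the trained model. Every stage is assumed to consist of equally sized tasks, one per partition. The scheduler is modelled as greedy assignment of each task to the executor core that becomes free first, ignoring data locality, task launch overhead and shuffle costs. The parameter combination with the shortest simulated stage time is then selected.

```python
import heapq
from itertools import product


def predict_task_time(records_m, executor_cores, executor_memory_gb):
    """Hypothetical stand-in for the trained task-time model (seconds per task)."""
    return 2.0 * records_m * (1 + 0.1 * executor_cores) * (1 + 2.0 / executor_memory_gb)


def simulate_stage(task_times, num_executors, cores_per_executor):
    """Greedy simulation of one stage: each task, in order, starts on the
    executor core that becomes free first. Returns the stage makespan."""
    # Min-heap of the times at which each core becomes free (all zeros is a valid heap).
    free_at = [0.0] * (num_executors * cores_per_executor)
    for t in task_times:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + t)
    return max(free_at)


def stage_time(total_records_m, partitions, executors, cores, memory_gb):
    """Predicted stage time for one parameter combination (equal-sized partitions assumed)."""
    per_task = predict_task_time(total_records_m / partitions, cores, memory_gb)
    return simulate_stage([per_task] * partitions, executors, cores)


def choose_params(total_records_m, candidates):
    """Return the (partitions, executors, cores, memory_gb) tuple with the shortest predicted stage."""
    return min(candidates, key=lambda c: stage_time(total_records_m, *c))


if __name__ == "__main__":
    # Hypothetical candidate grid, filtered to fit 14 cores / 44 GB in total.
    candidates = [
        (p, n, c, m)
        for p, n, c, m in product([14, 28, 56], [1, 2, 3], [2, 4], [4, 8])
        if n * c <= 14 and n * m <= 44
    ]
    best = choose_params(500.0, candidates)
    print("best (partitions, executors, cores, memory_gb):", best)
    print("predicted stage time (s):", round(stage_time(500.0, *best), 1))
```

Because only predicted times are used, evaluating a candidate costs a few heap operations rather than a cluster run. A full application would repeat this per stage and combine the stage times.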

Description

Technical field

[0001] The invention belongs to the technical field of spatiotemporal big data computing, and in particular relates to a Spark parameter self-adaptive optimization method and system.

Background art

[0002] In recent years, we have entered the era of big data. The growth of massive data places new demands on storage management and computational analysis, and has driven the development of big data technology. The mainstream computing framework has evolved from MapReduce-based Hadoop to Spark, which is based on in-memory computing. As a complex, general-purpose distributed computing framework, Spark exposes a large number of configurable parameters so that it can meet the needs of different application scenarios and deliver maximum computing performance. However, the sheer number of parameters and the complexity of setting them make it very challenging for developers to optimize Spark applications.

Summary of the invention

[0003]...

Application Information

IPC (8): G06F9/50
CPC: G06F9/5027
Inventors: 彭旭 (Peng Xu), 呙维 (Guo Wei), 朱欣焰 (Zhu Xinyan), 佘冰 (She Bing)
Owner: 武汉联图时空信息科技有限公司