
Spark parameter adaptive optimization method and system

A self-adaptive optimization method that can be applied in fields such as electrical digital data processing, resource allocation, and program control design, and that addresses the complexity of Spark parameter tuning.

Pending Publication Date: 2020-02-21
武汉联图时空信息科技有限公司

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to provide a Spark parameter self-adaptive optimization method and system that alleviate the complexity of Spark parameter tuning.



Examples

Embodiment

[0069] In this embodiment, a Spark cluster is built from four servers with 14 available cores and 44 GB of available memory, and the spatial intersection of trajectory data points with road network data serves as the application model.
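As a rough illustration of what a parameter value space for such a cluster could look like, the sketch below enumerates executor layouts that fit on the hardware. It assumes the 14 cores and 44 GB are per server, ignores Spark's memory overhead, and uses made-up option lists, since Table 1 is not reproduced here. The keys `spark.executor.cores`, `spark.executor.memory` and `spark.executor.instances` are standard Spark properties; `feasible_configs` is a hypothetical helper.

```python
from itertools import product

# Cluster from the embodiment; per-server figures are an assumption.
NODES = 4
CORES_PER_NODE = 14
MEM_GB_PER_NODE = 44


def feasible_configs(core_options, mem_options_gb):
    """Yield Spark settings whose executors fit on every node.

    The option lists are hypothetical; the patent's Table 1 is not shown.
    Memory overhead per executor is ignored for simplicity.
    """
    for cores, mem in product(core_options, mem_options_gb):
        # Executors per node are limited by both cores and memory.
        per_node = min(CORES_PER_NODE // cores, MEM_GB_PER_NODE // mem)
        if per_node == 0:
            continue  # a single executor would not fit on a node
        yield {
            "spark.executor.cores": cores,
            "spark.executor.memory": f"{mem}g",
            "spark.executor.instances": per_node * NODES,
        }


# Illustrative option lists only.
configs = list(feasible_configs([2, 7, 14], [4, 22, 44]))
```

Each resulting dictionary could be passed to `spark-submit` as `--conf key=value` pairs when collecting measurements.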

[0070] Step 1. Collect experimental data for model training.

[0071] As shown in Table 1, the Spark application is submitted across the corresponding parameter value space. In this embodiment, a total of 140,746 task execution measurements are collected. The data volume ranges from 50 million to 50 million. The amount of data grows until it reaches the upper limit that the current parameter configuration can handle. The collected task measurements are processed and fed to the neural network model for training; the model parameters are shown in Table 2. When the model is evaluated on the test data set, the average prediction deviation of a single task's execution time is about 18%, that is,...
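The text does not say exactly how the average prediction deviation is computed. One plausible reading, assumed here, is the mean relative error over all test tasks. The function name and the sample values below are hypothetical.

```python
def mean_relative_deviation(predicted, actual):
    """Average of |predicted - actual| / actual over all tasks.

    This metric is an assumption; the patent excerpt does not define it.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / a for p, a in zip(predicted, actual))
    return total / len(actual)


# Made-up task times in seconds, for illustration only.
deviation = mean_relative_deviation([1.2, 0.8, 2.0], [1.0, 1.0, 2.0])
```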



Abstract

The invention discloses a Spark parameter adaptive optimization method and system. The method comprises the following steps: 1, building a prediction model of Spark task execution time and training it on sample data; 2, decomposing the Spark application to be executed into jobs, stages and tasks in turn according to the Spark execution mechanism, allocating each task to a core of an Executor node, predicting the execution time of each task with the task execution time prediction model, simulating the task scheduling process of each Spark stage based on the predicted task execution times, and calculating the execution time of the stage under different parameter combinations; and 3, determining the final optimized parameter combination according to the stage execution times predicted under the different parameter combinations. By predicting execution times, the method avoids having to measure the execution time of each parameter combination in an actual test run, which makes parameter self-adaptive optimization feasible.
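Steps 2 and 3 can be sketched as a small simulation. This is a minimal illustration, not the patented implementation: it assumes a greedy schedule in which each task starts on whichever core frees up first, and it uses a toy function in place of the trained prediction model. All names (`simulate_stage_time`, `best_config`, `toy_predictor`, the config keys) are hypothetical.

```python
import heapq


def simulate_stage_time(task_times, total_cores):
    """Estimate stage duration by greedily assigning tasks to free cores.

    Each task starts on whichever core becomes free first, which
    approximates one task running per core slot within a stage.
    """
    if total_cores < 1:
        raise ValueError("total_cores must be positive")
    # Min-heap holding the time at which each core next becomes free.
    free_at = [0.0] * total_cores
    heapq.heapify(free_at)
    for t in task_times:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + t)
    return max(free_at)


def best_config(candidates, predict_task_times):
    """Return (config, time) with the lowest simulated stage time.

    `predict_task_times` stands in for the trained model: it maps a
    config to a list of predicted task execution times.
    """
    scored = []
    for cfg in candidates:
        cores = cfg["executors"] * cfg["cores_per_executor"]
        stage_time = simulate_stage_time(predict_task_times(cfg), cores)
        scored.append((stage_time, cfg))
    best_time, best_cfg = min(scored, key=lambda pair: pair[0])
    return best_cfg, best_time


def toy_predictor(cfg):
    # Hypothetical model: 8 tasks, each slower with more cores per executor.
    return [1.0 + 0.1 * cfg["cores_per_executor"]] * 8


candidates = [
    {"executors": 2, "cores_per_executor": 1},
    {"executors": 2, "cores_per_executor": 2},
    {"executors": 1, "cores_per_executor": 8},
]
chosen, predicted = best_config(candidates, toy_predictor)
```

Because only predicted times are used, no candidate configuration has to be run on the cluster to be compared.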

Description

Technical field

[0001] The invention belongs to the technical field of spatiotemporal big data computing, and in particular relates to a Spark parameter self-adaptive optimization method and system.

Background

[0002] In the past few years, we have entered the era of big data. The growth of massive data places new demands on storage management and computational analysis, and drives the development of big data technology. The mainstream computing framework has also evolved from Hadoop, based on MapReduce, to Spark, based on in-memory computing. As a complex general-purpose distributed computing framework, Spark provides a large number of configurable parameters to meet the needs of different application scenarios and to maximize its computing performance. However, the large number of parameters and their complex settings also make it very challenging for developers to optimize Spark applications.

Contents of the invention

[0003]...


Application Information

IPC(8): G06F9/50
CPC: G06F9/5027
Inventor 彭旭, 呙维, 朱欣焰, 佘冰
Owner 武汉联图时空信息科技有限公司