Spark platform Shuffle process compression algorithm decision method

A compression algorithm decision method, applied in computing and special data processing applications, that addresses problems such as instability, a high barrier to entry and increased system complexity, improving efficiency and accuracy and lowering the barrier to entry.

Active Publication Date: 2018-01-19
UNIVERSITY OF CHINESE ACADEMY OF SCIENCES

AI Technical Summary

Problems solved by technology

The method addresses the problems of high cost, low efficiency, a high barrier to entry, and increased system complexity and instability in the Shuffle process at Spark application runtime.



Examples


Embodiment Construction

[0075] The invention is further described below with reference to the accompanying drawings and specific implementation cases. These cases serve only to illustrate the invention and are not intended to limit its scope. After reading this disclosure, modifications in equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of this application.

[0076] As shown in Figure 1, the invention first collects basic cluster data and the performance data of the Spark platform for the user application, including the cluster runtime environment, hardware configuration, memory occupancy, network uplink and downlink speeds, and other metrics. The detailed performance indicators are listed in Table 4.

[0077] Table 4. Performance indicators

[0078]

[0...


Abstract

The invention discloses a method for deciding the compression algorithm for the Shuffle process of the Spark platform. The method comprises the following steps: 1) the Spark platform generates a directed acyclic graph (DAG) according to the dependency relationships of the RDDs and divides the DAG into stages according to those dependencies; 2) according to the basic cluster data and the target job information provided by the user, the total revenue and total consumption of the Shuffle process are calculated for two cases, without a compression algorithm and with one; and 3) the total cost of the whole Shuffle process when executing the target job is calculated from the total revenue and total consumption obtained under the different compression configurations, and the configuration combination that the cluster adopts for the target job is then determined according to the total cost. The invention ensures the stability of the Spark platform and offers extensibility, low cost and high efficiency.
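As a rough illustration of steps 2) and 3), the sketch below estimates a total cost for each candidate compression configuration and keeps the cheapest one. It is not the patented cost model, whose formulas are not reproduced in this excerpt. The `Codec` and `Cluster` classes, the `shuffle_cost` and `choose_config` functions, and the linear cost formula are all assumptions made for illustration. The only names taken from Spark itself are the configuration keys `spark.shuffle.compress` and `spark.io.compression.codec`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Codec:
    """Hypothetical codec profile; the values would come from offline measurement."""
    name: str               # value for spark.io.compression.codec, e.g. "lz4"
    ratio: float            # compressed size / original size
    compress_mb_s: float    # compression throughput over uncompressed data
    decompress_mb_s: float  # decompression throughput over uncompressed data


@dataclass(frozen=True)
class Cluster:
    """Hypothetical subset of the basic cluster data."""
    disk_mb_s: float
    network_mb_s: float


def shuffle_cost(shuffle_mb: float, cluster: Cluster,
                 codec: Optional[Codec] = None) -> float:
    """Estimated seconds for one shuffle under an assumed linear model."""
    # Without compression: write the data to disk once, send it over the network once.
    baseline = shuffle_mb * (1.0 / cluster.disk_mb_s + 1.0 / cluster.network_mb_s)
    if codec is None:
        return baseline
    # Revenue: I/O time saved because less data is written and transferred.
    revenue = baseline * (1.0 - codec.ratio)
    # Consumption: CPU time spent compressing and decompressing.
    consumption = (shuffle_mb / codec.compress_mb_s
                   + shuffle_mb / codec.decompress_mb_s)
    return baseline - revenue + consumption


def choose_config(shuffle_mb: float, cluster: Cluster,
                  codecs: Iterable[Codec]) -> Dict[str, str]:
    """Return the Spark settings with the lowest estimated total cost."""
    best_conf = {"spark.shuffle.compress": "false"}
    best_cost = shuffle_cost(shuffle_mb, cluster)
    for codec in codecs:
        cost = shuffle_cost(shuffle_mb, cluster, codec)
        if cost < best_cost:
            best_cost = cost
            best_conf = {
                "spark.shuffle.compress": "true",
                "spark.io.compression.codec": codec.name,
            }
    return best_conf
```

Under this assumed model, compression wins only when the I/O time it saves exceeds the CPU time it costs, so a cluster with slow disks and network tends toward a stronger codec while a cluster with very fast I/O may be better off with compression disabled. The returned dictionary could be passed to `spark-submit` as `--conf key=value` pairs.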

Description

Technical field

[0001] The invention relates to performance optimization of the Shuffle process on big data processing platforms, and in particular to a method for deciding the optimal compression algorithm configuration for the Shuffle process of the Spark platform.

Background technique

[0002] With the advent of the big data era, new big data processing technologies continue to develop and a variety of big data processing platforms have emerged, the most prominent of which is Apache Spark.

[0003] Spark is a distributed, parallel big data processing platform based on in-memory computing. It integrates batch processing, real-time stream processing, interactive query and graph computing, avoiding the resource waste of deploying a separate cluster for each computing scenario.

[0004] Spark's memory-based computing makes it inherently advantageous for iterative computing, and it is especially suitable for i...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L29/06, G06F17/30
Inventors: 黄珊珊, 徐俊刚, 王国路, 刘仁峰
Owner: UNIVERSITY OF CHINESE ACADEMY OF SCIENCES