Method for improving query efficiency of Spark SQL

A technique for caching Shuffle intermediate data, applied to improve the query efficiency of Spark SQL and to address the problem of high disk I/O overhead.

Active Publication Date: 2018-10-26
SOUTHEAST UNIV

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a method for improving the query efficiency of Spark SQL...

Examples

Embodiment Construction

[0040] The present invention will be further illustrated below in conjunction with specific embodiments, and it should be understood that the following specific embodiments are only used to illustrate the present invention and are not intended to limit the scope of the present invention.

[0041] A method for improving the query efficiency of Spark SQL, the method comprising the following steps:

[0042] Step S1: Build a query pre-analysis module, use an estimation model to calculate the size of the intermediate data generated by Shuffle, and from that calculate the total size of the intermediate data cache layer used to cache the intermediate data;

[0043] Step S2: According to the total size of the intermediate data cache layer calculated in step S1, combined with the distribution of input data across the nodes in the cluster, set a reasonable memory space size for each node through the cache layer allocation module.
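
Steps S1 and S2 can be sketched as follows. This is only an illustration under stated assumptions, not the estimation model or allocation rule of the invention, whose details are not given in this excerpt. The sketch assumes that the Shuffle intermediate data size is a fixed ratio of the total input size, that the cache layer is sized to hold exactly that amount, and that each node receives a share of the cache proportional to the input data it holds. All names (`NodeInput`, `estimate_shuffle_bytes`, `allocate_cache`, `shuffle_ratio`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeInput:
    """Input data held by one cluster node (hypothetical structure)."""
    name: str
    input_bytes: int


def estimate_shuffle_bytes(total_input_bytes: int, shuffle_ratio: float) -> int:
    """Step S1 (assumed model): intermediate size = input size * ratio."""
    if total_input_bytes < 0 or shuffle_ratio < 0:
        raise ValueError("sizes and ratios must be non-negative")
    return int(total_input_bytes * shuffle_ratio)


def allocate_cache(nodes: List[NodeInput], cache_total: int) -> Dict[str, int]:
    """Step S2 (assumed rule): split the cache in proportion to node input."""
    total_input = sum(n.input_bytes for n in nodes)
    if total_input == 0:
        return {n.name: 0 for n in nodes}
    shares = {n.name: cache_total * n.input_bytes // total_input for n in nodes}
    # Integer division can drop a few bytes; hand them to the nodes
    # holding the most input so the shares sum to cache_total.
    leftover = cache_total - sum(shares.values())
    largest_first = sorted(nodes, key=lambda n: n.input_bytes, reverse=True)
    for n in largest_first[:leftover]:
        shares[n.name] += 1
    return shares


# Example: three nodes with uneven input and an assumed shuffle ratio of 0.5.
nodes = [NodeInput("node-a", 600), NodeInput("node-b", 300), NodeInput("node-c", 100)]
cache_total = estimate_shuffle_bytes(sum(n.input_bytes for n in nodes), 0.5)
allocation = allocate_cache(nodes, cache_total)
print(allocation)
```

Proportional allocation is chosen here because a node holding more input data can be expected to produce more Shuffle output; a real allocation module would also need to respect the memory actually available on each node.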

[0044] Further, the specific method for calculating the size of the intermediate...

Abstract

The invention discloses a method for improving the query efficiency of Spark SQL. The method comprises the following steps: S1, establishing a query pre-analysis module, using an estimation model to calculate the size of the intermediate data produced by Shuffle, and calculating the total size of an intermediate data cache layer for caching the intermediate data; and S2, according to the total size of the intermediate data cache layer calculated in S1 and the distribution of input data across the nodes in the cluster, setting a reasonable memory space size for each node through a cache layer allocation module. By caching the Shuffle intermediate data in this way, the method can effectively address the relatively high disk I/O cost of Spark SQL queries.