Top-down real-time big data query optimization method based on bushy tree

A top-down query optimization technique applied in the field of big data query. It addresses the long optimization time of existing algorithms and the system limitations of traditional cost models, with the aim of reducing execution time and improving query efficiency.

Inactive Publication Date: 2015-04-08
ZHEJIANG UNIV
1 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

[0007] Multi-table join order optimization is an important area of database management system performance optimization. Current bushy-tree-based or top-down query optimization algorithms take a long time to optimize, and traditional cost models have certain system limitations. To address these problems, the present invention proposes a bushy-tree-based, top-down real-time query optimization method for big data.



Examples


Embodiment Construction

[0066] The present invention will be described in detail below with reference to the drawings and specific embodiments.

[0067] In this embodiment, the cost of each query plan tree is defined as a composite of the disk read cost, network transmission cost and table size of the query plan tree. The cost of any query plan tree T is calculated according to the following formula (i.e., the cost model):

[0068]

[0069] where $\alpha_L + \alpha_R + \beta + \gamma + \delta = 1$;

[0070] L and R are respectively the left subtree and the right subtree of the query plan tree;

[0071] $C_L$ is the cost of the left subtree and $C_R$ is the cost of the right subtree, each calculated recursively using this formula;

[0072] $IO_L$ is the cost of reading the data corresponding to the left subtree from disk;

[0073] $TR$ is the network transmission cost, $S$ is the size of the data, and $S_{LR}$ is the data size after hash-joining the left subtree and the right subtree.

[0074] Since the 2 subtrees can be executed in parallel...


Abstract

The invention discloses a top-down real-time big data query optimization method based on a bushy tree. The method comprises the following steps: (1) parsing a query statement and constructing an initial query super-graph from the parsed statement; (2) decomposing the initial query super-graph level by level in a top-down manner, following the lowest-cost principle for query plan trees, until the optimal query plan tree of the initial query super-graph is obtained, which completes the real-time big data query optimization. The method constructs a bushy-tree search space, combines an optimized cost model with a pruning strategy, and jointly considers disk I/O (input/output), network transmission and the size of intermediate results. According to the invention, this guarantees that an optimal join order is generated and improves query efficiency, which in turn promotes the development of real-time big data query technology, improves the service quality of real-time big data query, and benefits production and everyday life.
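The top-down decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method. Everything named here is an assumption made for illustration: the table sizes and selectivities, the names SIZES, EDGES, connected, cardinality, partitions and best_plan, the use of a plain join graph in place of the query super-graph, and the toy cost function (the sum of intermediate result sizes), which stands in for the invention's cost model that also weighs disk I/O and network transmission. The sketch splits a connected set of tables into two connected halves, recurses on each half so that both sides may themselves be joins (a bushy tree), memoizes sub-results, and skips a split once its partial cost can no longer beat the best plan found so far.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical toy inputs: row counts per table and join selectivities.
SIZES = {"A": 1000, "B": 100, "C": 10, "D": 500}
EDGES = {frozenset("AB"): 0.01, frozenset("BC"): 0.1, frozenset("CD"): 0.02}


def connected(tables):
    """Return True if the tables form a connected subgraph of the join graph."""
    tables = set(tables)
    seen, stack = set(), [next(iter(tables))]
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        stack.extend(u for u in tables - seen if frozenset((t, u)) in EDGES)
    return seen == tables


def cardinality(tables):
    """Estimate the result size of joining all tables in the set."""
    size = 1.0
    for t in tables:
        size *= SIZES[t]
    for edge, selectivity in EDGES.items():
        if edge <= tables:
            size *= selectivity
    return size


def partitions(tables):
    """Yield every split of the set into two connected halves, each split once."""
    first, *rest = sorted(tables)
    for k in range(len(rest)):  # the left half always holds `first`
        for extra in combinations(rest, k):
            left = frozenset((first,) + extra)
            right = tables - left
            if connected(left) and connected(right):
                yield left, right


@lru_cache(maxsize=None)
def best_plan(tables):
    """Return (cost, tree) for the cheapest bushy plan over a connected set."""
    if len(tables) == 1:
        (table,) = tables
        return 0.0, table  # toy model: a base table scan costs nothing
    out = cardinality(tables)
    best_cost, best_tree = float("inf"), None
    for left, right in partitions(tables):
        left_cost, left_tree = best_plan(left)
        if left_cost + out >= best_cost:
            continue  # pruning: this split cannot beat the current best
        right_cost, right_tree = best_plan(right)
        cost = left_cost + right_cost + out
        if cost < best_cost:
            best_cost, best_tree = cost, (left_tree, right_tree)
    return best_cost, best_tree


if __name__ == "__main__":
    cost, tree = best_plan(frozenset(SIZES))
    print(cost, tree)
```

Restricting both halves to connected subsets avoids cross products, and memoizing on the frozenset of tables means each sub-problem is solved once even though many decompositions reach it.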

Description

Technical field

[0001] The invention relates to the technical field of big data query, and in particular to a bushy-tree-based, top-down real-time query optimization method for big data.

Background

[0002] With the advent of the big data era, fast querying and processing of massive data has become an urgent need for Internet, telecommunications, financial and other types of enterprises. To meet this demand, real-time big data query systems have emerged, such as Google Dremel, Berkeley Shark, and Cloudera Impala. Real-time big data query generally adopts a distributed architecture and, by weakening support for features such as transactions, meets users' real-time query needs in a massive data environment.

[0003] Query optimization consists of three main parts: the search space, the search strategy and the cost model.

[0004] The search space can be represented by query trees, which are mainly divided into left-deep trees (right-deep trees) and dens...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/2453
Inventor: 陈岭, 马骄阳
Owner: ZHEJIANG UNIV