Query log-based database statistic data histogram generation method

The method belongs to the field of statistical data and database technology, classified under electrical digital data processing and special data processing applications. It addresses the large errors of histograms generated incrementally from query logs and avoids the high cost of generating a histogram by scanning the raw data.

Inactive Publication Date: 2011-06-22
PEKING UNIV

AI Technical Summary

Problems solved by technology

Existing methods that generate histograms from query logs use incremental updates: queries are applied one at a time to correct the histogram. Because no optimization objective function is defined, the generated histogram has a relatively large error. In addition, as queries arrive successively, the histogram must be adjusted frequently (by merging or splitting).

Examples

Embodiment Construction

[0031] The present invention will be described in further detail below in conjunction with the accompanying drawings and specific embodiments:

[0032] The method of the invention derives the data density distribution from the query result row counts recorded in the database query log, applies the principle of maximum entropy, and expresses the result in the form of a histogram. As shown in Figure 3, the density distribution is obtained as follows. First, the query set Q is obtained from the database query log. Let Q = {Q_1, Q_2, ..., Q_n} be the set of queries over the attribute set A = {A_1, A_2, ..., A_d}, and assume uniformly that the value range of each attribute is [1, N], where each query Q_i has the form (a_i1 ≤ A_1 ≤ b_i1) ∧ (a_i2 ≤ A_2 ≤ b_i2) ∧ ... ∧ (a_id ≤ A_d ≤ b_id). Here d is the total dimensionality of the data, as given by the real data. The division of the data space by the query set Q and the density d...

Abstract

The invention provides a method for generating a database statistical-data histogram from a query log, comprising the following steps: 1) extracting a query set from the database query log, associating each query with an attribute set, and forming an array from the left and right boundary values on the attribute set; 2) sorting the array and generating basic intervals from adjacent pairs of points; 3) computing the Cartesian product of the intervals on the different attributes to form cubes; 4) obtaining the density value of each cube from the number of result rows returned by each query and the Cartesian products of the cubes; and 5) generating a histogram from the density values of the cubes. The method generates the histogram from the database query log. It avoids the high cost incurred when a conventional database system generates a histogram by scanning the raw data, and it avoids the large errors of conventional methods that generate the histogram incrementally from the query log.
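The five steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the query records (`ranges`, `rows`) and all function names are hypothetical, ranges are treated as continuous half-open intervals, and the density step uses a simple iterative scaling loop as a stand-in for the maximum-entropy computation, whose details are not given in this excerpt.

```python
from itertools import product

def basic_intervals(queries, dim):
    """Steps 1-2: collect the left/right bounds on one attribute,
    sort them, and pair adjacent points into basic intervals."""
    points = sorted({b for q in queries for b in q["ranges"][dim]})
    return list(zip(points, points[1:]))

def build_cubes(queries, num_dims):
    """Step 3: Cartesian product of the per-attribute basic intervals."""
    per_dim = [basic_intervals(queries, d) for d in range(num_dims)]
    return list(product(*per_dim))

def inside(cube, query):
    """A cube built from query bounds lies fully inside or fully outside a query."""
    return all(qlo <= lo and hi <= qhi
               for (lo, hi), (qlo, qhi) in zip(cube, query["ranges"]))

def volume(cube):
    v = 1.0
    for lo, hi in cube:
        v *= hi - lo
    return v

def estimate_counts(queries, cubes, sweeps=500):
    """Step 4 (assumed stand-in): start from uniform density, then repeatedly
    rescale the cubes inside each query so they sum to its logged row count.
    Cubes covered by no query keep their initial value."""
    counts = [volume(c) for c in cubes]
    for _ in range(sweeps):
        for q in queries:
            members = [i for i, c in enumerate(cubes) if inside(c, q)]
            total = sum(counts[i] for i in members)
            if total > 0:
                scale = q["rows"] / total
                for i in members:
                    counts[i] *= scale
    return counts

# Hypothetical log entries: per-attribute (left, right) bounds and result row count.
queries = [
    {"ranges": [(0, 10), (0, 4)], "rows": 100},
    {"ranges": [(5, 15), (2, 8)], "rows": 60},
    {"ranges": [(0, 15), (0, 8)], "rows": 200},  # covers the whole domain
]

cubes = build_cubes(queries, num_dims=2)
counts = estimate_counts(queries, cubes)
# Step 5: the histogram stores one density (rows per unit volume) per cube.
densities = [n / volume(c) for c, n in zip(cubes, counts)]
```

Because every cube boundary comes from a query boundary, each query's row count constrains the sum over a whole number of cubes, which is what lets the scaling loop treat each query as one constraint.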

Description

Technical field

[0001] The invention relates to a method for generating database histogram statistical information by using the query result row number information contained in the query log.

Background

[0002] In database systems, how to effectively generate accurate statistical information about data distribution is a very important basic problem. This information is used by the query optimizer to estimate the selectivity of the relevant operators in the query plan, estimate its execution cost, and select the optimal execution plan. If there is an error in the statistics, the error can spread exponentially in the query plan, resulting in a dramatic decrease in the performance of the query plan that is actually executed. Histograms are the most common means to describe the distribution of data and are widely used in current commercial databases. Building a histogram requires scanning or sampling the raw data, sorting the data and forming appropriate bucket part...
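To illustrate how an optimizer can use a histogram to estimate selectivity, the following is a minimal sketch assuming a one-dimensional histogram with half-open buckets and uniformly distributed rows inside each bucket. The bucket layout and the function name are hypothetical and are not taken from the patent.

```python
def estimate_selectivity(buckets, lo, hi):
    """Estimate the fraction of rows with lo <= value < hi.

    buckets: list of (left, right, row_count) with half-open [left, right).
    Rows are assumed to be spread uniformly inside each bucket.
    """
    total = sum(rows for _, _, rows in buckets)
    matched = 0.0
    for left, right, rows in buckets:
        # Length of the overlap between the predicate range and this bucket.
        overlap = max(0.0, min(hi, right) - max(lo, left))
        matched += rows * overlap / (right - left)
    return matched / total

# Hypothetical histogram over one attribute.
buckets = [(0, 10, 100), (10, 20, 300), (20, 30, 600)]
sel = estimate_selectivity(buckets, 5, 15)
```

An error in any bucket count feeds directly into this estimate, which is why inaccurate statistics can mislead the choice of execution plan.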

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 陈立军, 汪罕, 卢阳, 王潇
Owner: PEKING UNIV