An online aggregation method for multi-table joining based on Markov chain

The method applies Markov chain and join graph techniques in the field of big data analysis, and addresses problems such as inaccurate result estimation and slow convergence of confidence intervals.

Active Publication Date: 2019-02-05
BEIJING INSTITUTE OF CLOTHING TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0005] In view of the above problems, the present invention proposes an online aggregation method for multi-table joins based on Markov chains. The method converts multi-table join processing into a walk on a Markov chain, creates stratified samples at the starting point of the walk based on this model, and performs unbiased estimation and confidence interval calculation for the sampling method. This effectively addresses the inaccurate result estimation and slow convergence of confidence intervals caused by join load or data skew.



Embodiment Construction

[0009] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not constitute a conflict with each other.

[0010] The Markov chain based online aggregation method for multi-table joins proposed by the present invention is illustrated here by modeling a natural join of four tables. Assume the query has the form:

[0011] SELECT op(exp(t_1i, t_2j, ..., t_km)) FROM R_1, R_2, R_3, R_4

[0012] WHERE R_1.A = R_2.B AND R_2.C = R_3.D AND R_3.E = R_4.F GR...
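The four-table chain join above can be pictured as a walk through a join graph: start at a tuple of R_1, step to a matching tuple of R_2, and so on to R_4. The following is a minimal sketch of that idea using a generic random-walk estimator with inverse-probability weighting; it is not taken from the patent, and the row layout (dicts keyed by column name), the helper names `build_index` and `random_walk_estimate`, and the uniform choice at each step are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_index(rows, key):
    """Hash index: join-column value -> list of rows holding that value."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def random_walk_estimate(r1, r2, r3, r4, expr, n_walks, rng):
    """Estimate SUM(expr) over R1.A=R2.B, R2.C=R3.D, R3.E=R4.F by random walks.

    Each walk picks a uniform tuple of R1, then a uniform matching tuple at
    every later step. A completed walk is weighted by the inverse of its
    sampling probability; a walk that hits a dead end contributes zero.
    Assumes r1 is non-empty.
    """
    # (column of the current tuple, index on the next table's join column)
    steps = [
        ("A", build_index(r2, "B")),
        ("C", build_index(r3, "D")),
        ("E", build_index(r4, "F")),
    ]
    total = 0.0
    for _ in range(n_walks):
        path = [rng.choice(r1)]
        weight = len(r1)  # inverse of the probability of picking this start
        for out_col, index in steps:
            matches = index.get(path[-1][out_col], [])
            if not matches:
                weight = 0  # dead end: no join result on this path
                break
            weight *= len(matches)
            path.append(rng.choice(matches))
        if weight:
            total += weight * expr(*path)
    return total / n_walks
```

Passing `expr=lambda *t: 1` estimates COUNT, and passing a function of the four tuples estimates SUM. Because every join result is reached with a known probability and weighted by its inverse, the average over walks is unbiased, but its variance grows when match counts are skewed, which is the situation the stratified samples described in this document are meant to address.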


Abstract

The invention provides an online aggregation method for multi-table joins based on a Markov chain. The method comprises two stages: sample creation and online aggregation. In the sample creation stage, stratified samples of the original dataset are created according to the characteristics of the query load. The stratification is based on the grouped column sets in the query load, chosen so that both the probability of a column set appearing in the load and the probability of a grouped column set in the load being covered are maximized. Based on the chosen grouped column sets and the distribution of the indexes, the join order of the tables is determined, and the stratified samples are created at the starting point of the walk on the Markov chain. In the online aggregation stage, the multi-table join query submitted by the user is parsed, the sample with the lowest query cost is dynamically selected for stratified sampling, the number of samples to draw from each stratum is determined, and the query result and its confidence interval are then estimated.
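The last step of the abstract (draw a number of samples from each stratum, then estimate the result and its confidence interval) can be sketched with a textbook stratified estimator. The abstract does not say how per-stratum sample sizes are chosen, so the proportional allocation below is an assumption, as are the function names, the use of a normal-approximation interval with z = 1.96, and the representation of each stratum as a list of numeric values.

```python
import math
import random
import statistics


def proportional_allocation(strata_sizes, total_sample):
    """Assumed allocation rule: sample each stratum in proportion to its size,
    with at least 2 draws where possible so a variance can be computed."""
    population = sum(strata_sizes.values())
    return {
        h: min(size, max(2, round(total_sample * size / population)))
        for h, size in strata_sizes.items()
    }


def stratified_sum(strata, total_sample, rng, z=1.96):
    """Estimate SUM over all strata and a normal-approximation interval.

    strata maps a stratum label to the list of values in that stratum.
    Returns (estimate, (low, high)).
    """
    sizes = {h: len(values) for h, values in strata.items()}
    alloc = proportional_allocation(sizes, total_sample)
    estimate = 0.0
    variance = 0.0
    for h, values in strata.items():
        big_n, n = sizes[h], alloc[h]
        if n == 0:
            continue  # empty stratum contributes nothing
        sample = rng.sample(values, n)  # sampling without replacement
        estimate += big_n * statistics.fmean(sample)
        if n > 1:
            s2 = statistics.variance(sample)
            # finite population correction: variance vanishes when n == big_n
            variance += big_n ** 2 * (1 - n / big_n) * s2 / n
    half_width = z * math.sqrt(variance)
    return estimate, (estimate - half_width, estimate + half_width)
```

When the values inside each stratum are similar, the per-stratum variances are small and the interval is narrow even for small samples, which is the usual motivation for stratifying on the grouping columns.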

Description

Technical field

[0001] The invention relates to a big data analysis method, and mainly to an online aggregation method for multi-table joins based on Markov chains.

Background art

[0002] Social media, mobile devices, and sensors continue to generate massive amounts of data at an unprecedented rate, and exploring the value behind these data has become a matter of great concern to industry and academia. However, complex data analysis tasks run slowly on massive data. This greatly reduces the timeliness and value of the analysis results and becomes a bottleneck for data-driven tasks. Ad hoc interactive data analysis plays an important role in decision support, trend analysis, and data visualization, and supporting it has become one of the urgent problems to be solved in the field of big data analysis. Online aggregation continuously processes part of the sample data, so that statistically significant estimation results can be returned in...
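To make the idea of online aggregation concrete, the following is a minimal single-table sketch: values are processed in random order, and after each batch the running AVG estimate is reported together with a confidence half-width. It is a generic illustration and not the patent's method; the function name `online_avg`, the batch interface, and the normal-approximation interval with z = 1.96 are assumptions.

```python
import math
import random


def online_avg(values, batch_size, rng, z=1.96):
    """Yield (rows_seen, running_mean, half_width) after each batch.

    Uses Welford's update for the running mean and variance, so each row is
    touched once and no earlier batch has to be revisited.
    """
    order = list(values)
    rng.shuffle(order)  # random order makes every prefix a random sample
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for start in range(0, len(order), batch_size):
        for x in order[start:start + batch_size]:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
        if n > 1:
            half_width = z * math.sqrt(m2 / (n - 1) / n)
        else:
            half_width = float("inf")
        yield n, mean, half_width
```

A caller can stop consuming the generator as soon as the half-width is small enough for the task, which is the interactive behavior online aggregation aims for.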


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/2453, G06F16/22, G06F16/2455
Inventor: 史英杰, 刘怡, 郭飞, 刘昊
Owner: BEIJING INSTITUTE OF CLOTHING TECHNOLOGY