Cardinal number estimating method and cardinal number estimating device under multi-section query condition of big data

A technology for query conditions over big data, applied in the field of big data computing. It addresses the lack of support for cardinality statistics under interval query conditions, and achieves reduced calculation errors, improved update efficiency, and high query efficiency.

Active Publication Date: 2014-01-29
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT +1
4 Cites, 27 Cited by

AI Technical Summary

Problems solved by technology

Traditional cardinality estimation algorithms, such as Linear Counting, LogLog Counting, HyperLogLog Counting, Adaptive Counting, and Bloom filters, can compute simple counts of distinct elements, but do not support cardinality statistics under interval query conditions.
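
As an illustration of what such a traditional estimator does, the following is a minimal Python sketch of Linear Counting. The class name `LinearCounter`, the bitmap size `m`, and the use of SHA-256 as the hash are assumptions chosen for the example and do not come from the patent. The sketch returns one distinct count for everything it has seen and keeps no information about which value range an element came from, which is why it cannot answer an interval query on its own.

```python
import hashlib
import math


class LinearCounter:
    """Linear Counting: hash each item to one of m bits and estimate the
    number of distinct items from the fraction of bits that are still zero."""

    def __init__(self, m=1 << 14):
        self.m = m      # bitmap size (illustrative default, not from the source)
        self.bits = 0   # a Python int used as an m-bit bitmap

    def add(self, item):
        # Hash choice is an assumption; any well-mixed hash would do.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % self.m
        self.bits |= 1 << index

    def estimate(self):
        zeros = self.m - bin(self.bits).count("1")
        if zeros == 0:
            return float("inf")  # bitmap saturated, estimate is unbounded
        # Standard Linear Counting estimate: n ~ m * ln(m / zeros)
        return self.m * math.log(self.m / zeros)
```

Adding the same element twice sets the same bit, so duplicates do not change the estimate.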

Examples

Embodiment 1

[0046] A cardinality estimation method for multi-interval queries over big data, comprising the following steps:

[0047] Step 1: Pre-divide the big data into multiple partitions according to a numerical attribute, store one segment of the big data source in each partition, and arrange the partitions in order;

[0048] Step 2: Establish a tree index structure in which each partition serves as a node. Each node records the maximum and minimum values of its partition, and a data file and a cardinality estimator are set up in each node;

[0049] Step 3: Obtain the data source to be written into the tree index structure, and perform inverted index processing on the data source that supports interval query conditions;

[0050] Step 4: Write the corresponding parts of the inverted-indexed data source into the data file and the cardinality estimator, respectively;

[0051] Step 5: Query the n...
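
The steps above can be sketched roughly in Python as follows. This is a hedged illustration, not the patent's implementation: the names `LinearCounter`, `Partition`, and `PartitionIndex` are hypothetical, a flat sorted list of partitions stands in for the tree index, an in-memory list stands in for each node's data file, and a Linear Counting bitmap stands in for the cardinality estimator. The handling of partitions that a query interval only partly covers (scanning the stored records) is also an assumption, since the text of Step 5 is truncated here.

```python
import hashlib
import math


class LinearCounter:
    """Mergeable Linear Counting bitmap (stand-in for a node's estimator)."""

    def __init__(self, m=1 << 14):
        self.m = m
        self.bits = 0

    def add(self, item):
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        self.bits |= 1 << (int.from_bytes(digest[:8], "big") % self.m)

    def merge(self, other):
        # The union of two equal-sized sketches is a bitwise OR.
        self.bits |= other.bits

    def estimate(self):
        zeros = self.m - bin(self.bits).count("1")
        return float("inf") if zeros == 0 else self.m * math.log(self.m / zeros)


class Partition:
    """One node: min/max of its key range, a data file, and an estimator."""

    def __init__(self, lo, hi, m):
        self.lo, self.hi = lo, hi         # inclusive key range of the partition
        self.records = []                 # in-memory stand-in for the data file
        self.counter = LinearCounter(m)


class PartitionIndex:
    """Flat, ordered list of partitions (a simplification of the tree index)."""

    def __init__(self, ranges, m=1 << 14):
        self.m = m
        self.parts = [Partition(lo, hi, m) for lo, hi in sorted(ranges)]

    def write(self, key, value):
        # key: the numerical attribute queried by interval; value: the item counted.
        for p in self.parts:
            if p.lo <= key <= p.hi:
                p.records.append((key, value))
                p.counter.add(value)
                return
        raise ValueError(f"key {key!r} falls outside every partition")

    def estimate(self, intervals):
        # intervals: list of inclusive (low, high) key ranges.
        merged = LinearCounter(self.m)
        for p in self.parts:
            if any(a <= p.lo and p.hi <= b for a, b in intervals):
                merged.merge(p.counter)   # fully covered: reuse the node's estimator
            elif any(a <= p.hi and p.lo <= b for a, b in intervals):
                # Partly covered (assumed fallback): scan the stored records.
                for key, value in p.records:
                    if any(a <= key <= b for a, b in intervals):
                        merged.add(value)
        return merged.estimate()
```

For example, `PartitionIndex([(0, 9), (10, 19), (20, 29)])` followed by a series of `write(key, value)` calls can then be queried with `estimate([(0, 9), (20, 29)])`. A value written under keys in two different partitions is counted once, because both writes set the same bit in the merged bitmap.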

Abstract

The invention relates to a cardinality estimation method and a cardinality estimation device for multi-interval query conditions over big data. The method comprises the following steps: dividing the big data into a plurality of partitions in advance according to a numerical attribute; establishing a tree index structure in which each partition serves as a node; acquiring a data source to be written into the tree index structure, and performing inverted index processing on the data source that supports interval query conditions; writing the inverted-indexed data source into the nodes of the tree index structure, with the corresponding portions written into a data file and a cardinality estimator respectively; and, given an interval query condition, querying the tree index structure for the nodes that satisfy it, obtaining the cardinality estimators in those nodes, and logically processing the estimators to obtain the estimated cardinality. Cardinality counting efficiency is improved by reducing computational precision; query efficiency is high under arbitrary multi-interval query conditions; and the efficiency of updating index data online is improved by using a big data incremental update technique.

Description

Technical field

[0001] The invention relates to the field of big data computing, in particular to a cardinality estimation method and device for multi-interval queries over big data.

Background

[0002] With the development of the mobile Internet and Web 2.0, the amount of global data is growing at an alarming rate: the data generated globally amounted to 0.49 ZB in 2008 (1 ZB = 10^21 bytes), 0.8 ZB in 2009, 1.2 ZB in 2010, and 1.82 ZB in 2011. IDC predicts that by 2020 humanity will generate more than 40 ZB of data. An important type of data in big data is structured and semi-structured data, mainly including transaction data, log data, and the like. Reportedly, Taobao's new transaction data reaches 10 TB per day, eBay's analysis platform processes up to 100 PB of data per day, and Wal-Mart stores about 2.5 PB of data into its database every hour. More and more valuable information can be obtained by mining and analyzing log data. For ex...

Application Information

IPC(8): G06F17/30
CPC: G06F16/951
Inventor: 云晓春, 徐小琳, 王明华, 刘阳, 李志辉, 吴广君, 王树鹏, 王勇, 常为领
Owner: NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT