Cardinal number estimating method and cardinal number estimating device under multi-section query condition of big data

The method applies interval query conditions and big data technology in the field of big data computing. It solves the problem that cardinality statistics are not supported, and it achieves reduced calculation error, improved update efficiency, and high query efficiency.

Active Publication Date: 2014-01-29
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT +1

AI Technical Summary

Problems solved by technology

Traditional cardinality estimation algorithms, such as Linear Counting, LogLog Counting, HyperLogLog Counting, Adaptive Counting, and Bloomfil

Examples

Example Embodiment

[0045] Example 1

[0046] A cardinality estimation method under the condition of multi-interval query of big data, comprising the following steps:

[0047] Step 1: The big data is pre-divided into multiple partitions according to numerical attributes. Each partition stores one segment of the data source in the big data, and the partitions are kept in order;

[0048] Step 2: Establish a tree-shaped index structure, each partition is used as a node of the tree-shaped index structure, each node is used to record the maximum and minimum values of the corresponding partition, and a data file and a cardinality estimator are set in each node;

[0049] Step 3: Obtain the data source to be written into the tree index structure, and perform inverted index processing on the data source supporting the interval query condition;

[0050] Step 4: Write the corresponding part of the data source processed by the inverted index into the data file and the cardinality estimat...
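The write path in Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the tree-shaped index is flattened to a sorted list of nodes searched with `bisect`, the data file is an in-memory list, the inverted-index processing of Step 3 is reduced to keying each record by its numeric attribute, and a Linear Counting bitmap stands in for the cardinality estimator. The names `LinearCounter`, `Node`, `build_nodes`, and `write_record` are hypothetical.

```python
import bisect
import hashlib
import math


class LinearCounter:
    """Linear Counting bitmap; an assumed stand-in for the node's estimator."""

    def __init__(self, m=1 << 14):
        self.m = m                # number of bitmap slots (illustrative size)
        self.bits = bytearray(m)  # one byte per slot, for readability

    def add(self, item):
        digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
        self.bits[int.from_bytes(digest, "big") % self.m] = 1

    def estimate(self):
        zeros = self.m - sum(self.bits)
        if zeros == 0:
            return math.inf       # bitmap saturated, no finite estimate
        return self.m * math.log(self.m / zeros)


class Node:
    """One partition: covers keys in [lo, hi) and tracks the min/max key seen."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.min_key = None
        self.max_key = None
        self.data_file = []       # in-memory stand-in for the node's data file
        self.estimator = LinearCounter()

    def write(self, key, item):
        self.min_key = key if self.min_key is None else min(self.min_key, key)
        self.max_key = key if self.max_key is None else max(self.max_key, key)
        self.data_file.append((key, item))  # raw record goes to the data file
        self.estimator.add(item)            # counted item goes to the estimator


def build_nodes(boundaries):
    """Step 1 and 2: pre-divide the key range into ordered partitions."""
    return [Node(lo, hi) for lo, hi in zip(boundaries, boundaries[1:])]


def write_record(nodes, key, item):
    """Step 3 and 4: route a record to the partition that covers its key."""
    starts = [node.lo for node in nodes]
    i = bisect.bisect_right(starts, key) - 1
    if i < 0 or key >= nodes[i].hi:
        raise ValueError(f"key {key!r} falls outside every partition")
    nodes[i].write(key, item)
```

In this sketch a record such as `(timestamp, user_id)` would be written with `write_record(nodes, timestamp, user_id)`; the estimator then tracks distinct `user_id` values per partition, while the data file keeps every record.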

Abstract

The invention relates to a cardinality estimation method and a cardinality estimation device under multi-interval query conditions on big data. The method comprises the following steps: dividing the big data into a plurality of partitions in advance according to numerical attributes; establishing a tree-shaped index structure in which each partition serves as a node; acquiring a data source to be written into the tree-shaped index structure, and performing inverted-index processing on the data source that supports the interval query condition; writing the inverted-indexed data source into the nodes of the tree-shaped index structure, with the corresponding portions written into the data files and the cardinality estimators respectively; and, according to the interval query condition, querying the tree-shaped index structure for the nodes that satisfy the condition, obtaining the cardinality estimators in those nodes, and logically processing the estimators to obtain the estimated cardinality. Cardinality counting efficiency is improved by trading away some calculation precision; query efficiency is high under arbitrary multi-interval query conditions; and the efficiency of online index updates is improved by using an incremental update technique for big data.
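The query step can be illustrated with a short Python sketch. It rests on assumptions that the abstract does not confirm: each node's estimator is modelled as a Linear Counting bitmap held in a Python integer, the "logical processing" is read as a bitwise OR across the matching nodes, and nodes are given as hypothetical `(min_key, max_key, bitmap)` tuples rather than as a tree. A node that only partly overlaps a query interval is merged whole, so this sketch can over-count at interval edges.

```python
import hashlib
import math

M = 1 << 14  # bitmap size in bits (illustrative)


def bit_for(item):
    """Map an item to a single bit position via a 64-bit hash."""
    digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
    return 1 << (int.from_bytes(digest, "big") % M)


def make_bitmap(items):
    """Build the bitmap a node would hold after writing these items."""
    bitmap = 0
    for item in items:
        bitmap |= bit_for(item)
    return bitmap


def estimate(bitmap):
    """Linear Counting estimate from the fraction of zero bits."""
    zeros = M - bin(bitmap).count("1")
    return math.inf if zeros == 0 else M * math.log(M / zeros)


def query(nodes, intervals):
    """Estimate distinct items over several inclusive (lo, hi) intervals.

    nodes: iterable of (min_key, max_key, bitmap) tuples.
    """
    merged = 0
    for min_key, max_key, bitmap in nodes:
        # A node matches if its key range overlaps any query interval.
        if any(lo <= max_key and min_key <= hi for lo, hi in intervals):
            merged |= bitmap  # OR of bitmaps equals the bitmap of the union
    return estimate(merged)
```

Because the OR of two bitmaps is identical to the bitmap built from the union of their items, an item that appears in several matching nodes is counted once, and no raw data has to be rescanned at query time.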

Description

Technical field

[0001] The invention relates to the field of big data computing, in particular to a cardinality estimation method and device under big data multi-interval query conditions.

Background

[0002] With the development of the mobile Internet and Web 2.0, the amount of global data is growing at an alarming rate: the data generated globally amounted to 0.49 ZB in 2008 (1 ZB = 10^21 bytes), 0.8 ZB in 2009, 1.2 ZB in 2010, and 1.82 ZB in 2011. IDC predicts that by 2020 more than 40 ZB of data will be generated worldwide. An important type of data in big data is structured and semi-structured data, mainly including transaction data, log data, etc. Reportedly, Taobao's daily new transaction data reaches 10 TB, eBay's analysis platform processes up to 100 PB of data per day, and Wal-Mart stores about 2.5 PB of data into its database every hour. More and more valuable information can be obtained by mining and analyzing log data. For ex...

Application Information

IPC(8): G06F17/30
CPC: G06F16/951
Inventor: 云晓春, 徐小琳, 王明华, 刘阳, 李志辉, 吴广君, 王树鹏, 王勇, 常为领
Owner NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT