Cardinality estimation method aiming at streaming big data

A big data and cardinality technology in the field of big data computing. It addresses problems such as the inability to guarantee real-time data processing, declining estimation accuracy, and reduced algorithm efficiency, with the effects of avoiding repeated calculation, improving accuracy, and reducing computing resource consumption.

Inactive Publication Date: 2017-05-24
XIDIAN UNIV
Cites: 1; Cited by: 19
AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a cardinality estimation method for streaming big data. It aims to solve the following problems of existing cardinality estimation methods for streaming big data: their computational complexity is relatively large, which greatly reduces algorithm efficiency; their estimation accuracy decreases; and when the volume of streaming data increases sharply, the calculation time increases, so real-time data processing cannot be guaranteed.

Embodiment Construction

[0038] In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

[0039] The application principle of the present invention will be described in detail below with reference to the accompanying drawings.

[0040] As shown in Figure 1, the cardinality estimation method for streaming big data provided by the embodiment of the present invention includes the following steps:

[0041] S101: Divide the big data into multiple partitions at equal time intervals according to the arrival time of the streaming data; each partition stores one segment of the source data, and the partitions are ordered chronologically;

[0042] S102: Build a st...
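Step S101 can be sketched as follows. This is a minimal illustration, assuming each stream record arrives as a `(timestamp, value)` pair with timestamps in seconds; the function name `partition_by_arrival` and the `interval` parameter are hypothetical and not taken from the patent, which does not specify the interval length.

```python
from collections import OrderedDict

def partition_by_arrival(events, interval):
    """Group (timestamp, value) records into fixed-width time partitions.

    Hypothetical helper: partition k covers [k * interval, (k + 1) * interval).
    The result is keyed by partition index in chronological order.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    buckets = {}
    for ts, value in events:
        k = int(ts // interval)            # index of the time slot this record falls in
        buckets.setdefault(k, []).append(value)
    return OrderedDict(sorted(buckets.items()))  # keep time-series order

events = [(0.5, "a"), (1.2, "b"), (0.9, "a"), (2.7, "c"), (1.8, "a")]
parts = partition_by_arrival(events, interval=1.0)
print(dict(parts))  # {0: ['a', 'a'], 1: ['b', 'a'], 2: ['c']}
```

Each partition keeps duplicates; deduplication is left to the per-partition statistics that later steps maintain.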

Abstract

The invention discloses a cardinality estimation method for streaming big data. The method increases cardinality estimation efficiency by lowering calculation precision: the intermediate statistical information required by the HyperLogLog Counting algorithm is computed per partition; an efficient hash algorithm and an optimal number of buckets are selected; an improved bucketing method maps the hashed data evenly into different buckets; the information is maintained incrementally; and the partition results are then combined to obtain the final intermediate statistical information, from which the cardinality estimate is calculated. The advantages of the method are as follows: historical data is used effectively and repeated calculation is avoided, which greatly increases cardinality estimation efficiency; high-precision cardinality estimation is achieved, and a more efficient bucketing method than that of the traditional algorithm is provided; and the algorithm has very low space complexity, which lowers computing resource consumption.
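The abstract names HyperLogLog Counting but does not disclose its hash function, its "optimal" bucket count, or the details of its improved bucketing method. The following is a minimal sketch of standard HyperLogLog with one register array per partition, combined by element-wise maximum. The 64-bit BLAKE2b hash, `P = 12` (4096 buckets), the leading-bits bucketing, and all function names are assumptions for illustration; the bucketing shown is the standard one, not the patent's improved method.

```python
import hashlib
import math

P = 12          # assumption: 2**P = 4096 buckets; the patent's optimal count is not disclosed
M = 1 << P

def hash64(value):
    """64-bit hash via BLAKE2b (assumed stand-in for the unspecified hash)."""
    digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def new_registers():
    """One empty sketch: M registers, each holding a maximum rank."""
    return [0] * M

def add(registers, value):
    """Incrementally update a sketch with one stream element."""
    h = hash64(value)
    bucket = h >> (64 - P)                   # standard bucketing: top P bits
    rest = h & ((1 << (64 - P)) - 1)         # remaining 64 - P bits
    rank = (64 - P) - rest.bit_length() + 1  # 1-based position of the leftmost 1-bit
    if rank > registers[bucket]:
        registers[bucket] = rank

def merge(*sketches):
    """Combine partition sketches by element-wise max, without rescanning raw data."""
    return [max(regs) for regs in zip(*sketches)]

def estimate(registers):
    """HyperLogLog estimate with the usual small-range correction."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros > 0:
        return M * math.log(M / zeros)       # linear counting for small cardinalities
    return raw

# One sketch per time partition; adjacent partitions share 5000 values.
partitions = []
for start in range(0, 30000, 10000):
    regs = new_registers()
    for i in range(start, start + 15000):
        add(regs, i)
    partitions.append(regs)

total = merge(*partitions)
print(round(estimate(total)))  # should be near 35000, the true number of distinct values
```

Because merging is an element-wise maximum, the merged registers are identical to those of a single sketch built over the union of all partitions, so sketches for historical partitions can be kept and combined with new ones instead of being recomputed. This is one way to realize the reuse of historical data that the abstract describes.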

Description

Technical field

[0001] The invention belongs to the technical field of big data computing, and in particular relates to a cardinality estimation method for streaming big data.

Background

[0002] In the current era of big data, big data can be divided into two types: batch big data and streaming big data. If data is viewed as a reservoir, the water already in the reservoir is batch big data, and the water flowing in is streaming big data. Streaming big data refers to data sources that arrive as a data stream and are written to the storage management system in real time, also known as FastData. It is characterized by high throughput and huge volume, and its data scale and value range are often unpredictable. Cardinality refers to the number of distinct elements in a set (here the set may contain repeated elements, which differs slightly from the strict definition of a set in set theory). Accurate cardinality counting is often inadequate when facing big data s...
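To make the definition of cardinality concrete, exact counting over a collection with repeats can be done with a hash set, as in this minimal sketch (`exact_cardinality` is a hypothetical name). The set must hold every distinct element seen, so its memory grows with the cardinality itself; that cost is what sketch-based estimators such as HyperLogLog avoid.

```python
def exact_cardinality(stream):
    """Count distinct elements exactly; memory grows with the number of distinct values."""
    seen = set()
    for item in stream:
        seen.add(item)   # repeated elements are stored only once
    return len(seen)

print(exact_cardinality(["a", "b", "a", "c", "b"]))  # 3
```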

Application Information

Patent type & authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/2462, G06F16/24545
Inventors: 赵兴文, 王浩, 李晖, 朱辉
Owner: XIDIAN UNIV