Big data environment-oriented summary information dynamic constructing and querying method and device

A construction method and big data technology, applied in the information field, that addresses the problem that sliding-window statistics cannot count data outside the window, and that improves estimation efficiency.

Active Publication Date: 2015-05-27
INST OF INFORMATION ENG CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

The existing sliding-window approach can only answer queries over the data inside the window; it cannot count data that lies beyond the sliding window.

Examples

Embodiment Construction

[0036] To make the above objects, features and advantages of the present invention clearer and easier to understand, the invention is further described below through specific embodiments and the accompanying drawings.

[0037] The present invention uses the following parameters; their symbols and specific meanings are listed in Table 1:

[0038] Table 1. Symbol representation and specific meaning description

[0039]

[0040]

[0041] The basic idea of the DCM sketch designed by the present invention is as follows: pre-allocate a Count-Min Sketch that occupies a small space. As data is continuously loaded, when the number of data items recorded in the initial Count-Min Sketch reaches its threshold and the cardinality of the value space reaches the threshold r×w (r is a preset ratio, w is the width of the two-dimensional counting array, and at this point there is almost no "collision" in the Count-Min Sketch), a new Count-Min...
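The idea in [0041] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class names `CountMinSketch` and `DynamicCountMin`, the parameter `item_threshold`, the use of an exact `set` to track cardinality, the choice to give every new sketch the same size, and the rule of summing per-sketch estimates at query time are all hypothetical, because the source text is truncated before it specifies them.

```python
import hashlib


class CountMinSketch:
    """Plain Count-Min Sketch: a depth x width array of counters."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        self.l1 = 0        # total count recorded so far (L1 norm of the stream)
        self.seen = set()  # exact distinct values; an assumption for illustration only

    def _index(self, item, row):
        # One salted hash per row stands in for independent hash functions.
        digest = hashlib.blake2b(
            repr(item).encode(),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count
        self.l1 += count
        self.seen.add(item)

    def estimate(self, item):
        # The minimum over rows never underestimates the true count.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


class DynamicCountMin:
    """Chain of sketches; a new one is opened when the newest is 'full'."""

    def __init__(self, width, depth, item_threshold, r):
        self.width = width
        self.depth = depth
        self.item_threshold = item_threshold  # hypothetical name for the item-count threshold
        self.r = r                            # preset ratio from [0041]
        self.sketches = [CountMinSketch(width, depth)]

    def _is_full(self, sketch):
        # Both conditions from [0041]: item count AND cardinality >= r * w.
        return (
            sketch.l1 >= self.item_threshold
            and len(sketch.seen) >= self.r * sketch.width
        )

    def add(self, item, count=1):
        if self._is_full(self.sketches[-1]):
            # Assumption: the new sketch reuses the same width and depth.
            self.sketches.append(CountMinSketch(self.width, self.depth))
        self.sketches[-1].add(item, count)

    def estimate(self, item):
        # Assumption: an item's total is the sum of its per-sketch estimates.
        return sum(s.estimate(item) for s in self.sketches)
```

An exact set is used here only to keep the example short; a production system would more plausibly track cardinality with a compact estimator, which the truncated source does not describe.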

Abstract

The invention relates to a method and device for dynamically constructing and querying summary information in a big data environment. The method builds on the Count-Min Sketch: the data scale is described by the L1 norm of the data stream, and the distribution of the data is described by its cardinality. The method comprises the following steps: allocating a Count-Min Sketch structure with a small space to the streaming big data; and, as data is continuously loaded, establishing a new Count-Min Sketch structure to receive subsequent data once the number of data items recorded by the current Count-Min Sketch structure reaches a threshold and the cardinality of the value space reaches a threshold. With this method, new Sketch structures are established automatically according to the data size and the cardinality of the value space, so that data is counted with higher precision and high-precision real-time counting and analysis of streaming big data is effectively supported.
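For reference, the standard Count-Min Sketch guarantee from the general literature (it is not stated in this abstract) shows why the L1 norm is a natural measure of data scale. The symbols ε, δ, d, a_i and \hat{a}_i are notation introduced here and may differ from the patent's Table 1; w is the array width as in the description, and all updates are assumed non-negative.

```latex
% Sizing for accuracy \varepsilon and failure probability \delta
w = \left\lceil \frac{e}{\varepsilon} \right\rceil, \qquad
d = \left\lceil \ln\frac{1}{\delta} \right\rceil

% The estimate \hat{a}_i never undercounts the true frequency a_i,
% and overcounts by at most \varepsilon \|a\|_1 with probability \ge 1 - \delta
a_i \le \hat{a}_i, \qquad
\Pr\!\left[\, \hat{a}_i \le a_i + \varepsilon \,\|a\|_1 \,\right] \ge 1 - \delta
```

For a fixed width w, the additive error bound grows in proportion to the L1 norm of the stream, which is consistent with the abstract's choice to open a new structure once enough items have been recorded.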

Description

Technical field

[0001] The invention belongs to the field of information technology, and in particular relates to a method and device for dynamically constructing and querying summary information in a big data environment.

Background

[0002] Streaming big data refers to data that arrives at high speed in the form of data streams and is written to the storage management system in real time; it is also known as FastData. Streaming big data is characterized not only by high throughput and huge volume, but also by a data scale and value range that are often unpredictable. Examples include massive Weibo data, real-time transaction logs, and the click streams of portal websites. Effective processing and analysis of such data can mine the valuable information hidden in massive data sources, reveal statistical patterns in the data, and give decision-makers an important basis for their decisions.

[0003] But for streaming big data, traditional stati...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/2228, G06F16/2255, G06F16/2365, G06F16/2462, G06F16/2465
Inventor: 吴广君, 王树鹏, 陈明, 张晓宇, 张燕琴
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI