
Carry-based data frequency estimation method for Sketch data structure

A data structure and data processing technology, applied in the fields of electrical digital data processing, special data processing applications, and computing. It addresses the limited counting upper limit, the sensitivity of accuracy to available space, and the resulting performance limitations of Count-Min Sketch, with the effect of raising the counting upper limit and improving accuracy.

Active Publication Date: 2018-07-20
PEKING UNIV
3 Cites, 8 Cited by

AI Technical Summary

Problems solved by technology

[0003] However, although it is a lightweight data structure that is even used on GPUs (Y. Wang, Y. Zu, et al. Wire speed name lookup: A GPU-based approach. In Proc. USENIX NSDI, pages 199–212, 2013), Count-Min Sketch still has significant performance limitations. For example, its query accuracy is sensitive to the amount of space used, so a limit on space greatly restricts its accuracy.
At the same time, its relatively simple data structure design results in a very limited upper limit on the values it can store.



Examples


Specific example

[0062] Suppose there are 5 different query strings, namely a, b, c, d, e, and the frequencies are 1000, 300, 200, 1200, 400. In the original CM Sketch, a and c are mapped to the same position, and the count of this position is 1000+200=1200. b and d map to the same location, which has a count of 300+1200=1500.
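The collision arithmetic in [0062] can be reproduced with a minimal single-row sketch. The position numbers below and the choice to give e its own position are assumptions made for illustration; the source only states which strings collide.

```python
# Frequencies of the five query strings from [0062].
freq = {"a": 1000, "b": 300, "c": 200, "d": 1200, "e": 400}

# Assumed mapping: a and c share position 0, b and d share position 1,
# and e sits alone at position 2 (hypothetical indices).
position = {"a": 0, "c": 0, "b": 1, "d": 1, "e": 2}

# One row of counters; every occurrence of an item increments its position.
counters = [0, 0, 0]
for item, n in freq.items():
    counters[position[item]] += n

# The sketch's estimate for an item is whatever its position holds,
# so colliding items are overestimated.
estimate = {item: counters[position[item]] for item in freq}

print(counters)   # [1200, 1500, 400]
print(estimate)   # a and c read 1200, b and d read 1500, e reads 400
```

The estimates never fall below the true frequencies; collisions can only inflate them.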

[0063] Now suppose we traverse these strings in the order e, d, c, b, a, trying to find the top-3, and that the three largest values found so far are 350, 340, and 330. For e, the query value 400 is large enough, so we look up its real value, 400, in the hash table; the current top-3 become 400, 350, and 340. For d, the query value is 1500 and the real value is 1200, so the current top-3 become 1200, 400, and 350. For c, the query value is 1200 but the real value is 200, so it is ignored. Similarly, b is also ignored. Finally, for a, we look up the real value of 1000 and obtain the final top-3 of 1200, 1000, and 400. In this process, a total of 5...
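The traversal in [0063] can be sketched as a filter-then-verify loop: the sketch estimate decides whether an item is worth checking, and the hash table supplies the real value. The variable names, the use of a min-heap for the running top-3, and the lookup counter are illustrative assumptions, not details taken from the patent text.

```python
import heapq

# Real frequencies (the hash table) and sketch estimates from [0062].
real = {"a": 1000, "b": 300, "c": 200, "d": 1200, "e": 400}
estimate = {"a": 1200, "b": 1500, "c": 1200, "d": 1500, "e": 400}

# Top-3 values found before this traversal, kept as a min-heap so that
# top3[0] is always the smallest value still in the top-3.
top3 = [350, 340, 330]
heapq.heapify(top3)

lookups = 0  # how many times the exact hash table is consulted
for item in "edcba":
    # Cheap filter: skip the item if even its overestimate is too small.
    if estimate[item] <= top3[0]:
        continue
    lookups += 1
    value = real[item]
    # Verify with the real value before admitting it to the top-3.
    if value > top3[0]:
        heapq.heapreplace(top3, value)

print(sorted(top3, reverse=True))  # [1200, 1000, 400]
print(lookups)
```

Because the estimate is never smaller than the real value, an item rejected by the filter cannot belong to the top-3.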


Abstract

The invention relates to a carry-based data frequency estimation method for a Sketch data structure. The method comprises the steps of: 1) establishing the Sketch data structure, which is a two-dimensional array of counters, wherein each position is an n-bit counter and the n-bit space of the counter is divided into a flag bit and count bits; 2) during an update operation, mapping data items to the two-dimensional array through hash functions, counting with the count bits at each mapped position, and performing a carry by using the flag bit when the count bits reach their upper limit; and 3) during a query operation, returning the minimum of the query values from each row of the two-dimensional array as the query result. The method can adopt either a fixed flag bit mode or a multilevel dynamic flag bit mode, and the counting upper limit can be significantly increased while the counter size is unchanged, so that the counting accuracy can be improved.
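A minimal sketch of the three steps in a fixed flag bit mode is given below. The abstract does not spell out what the carry does to the stored value, so the rule used here is purely an assumption for illustration: once the flag is set, each stored unit stands for 2**shift arrivals and further increments are applied by randomized rounding. The class name CarrySketch, the shift parameter, and the SHA-256 based hashing are likewise hypothetical.

```python
import hashlib
import random


class CarrySketch:
    """Count-Min-style sketch; each n-bit counter is 1 flag bit + (n-1) count bits."""

    def __init__(self, rows=3, width=64, n_bits=8, shift=4, seed=0):
        self.rows, self.width = rows, width
        self.count_max = (1 << (n_bits - 1)) - 1  # also the mask of the count bits
        self.flag_mask = 1 << (n_bits - 1)        # top bit is the flag
        self.shift = shift                        # assumed scale after a carry
        self.rng = random.Random(seed)
        self.table = [[0] * width for _ in range(rows)]

    def _index(self, row, item):
        # One hash function per row, derived by prefixing the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def _decode(self, raw):
        count = raw & self.count_max
        return count << self.shift if raw & self.flag_mask else count

    def _increment(self, raw):
        flag, count = raw & self.flag_mask, raw & self.count_max
        if not flag:
            if count < self.count_max:
                return count + 1
            # Carry (assumed rule): count bits are full, set the flag and rescale.
            return self.flag_mask | ((count + 1) >> self.shift)
        # Flag set: bump the coarse count with probability 1 / 2**shift.
        if count < self.count_max and self.rng.random() < 1 / (1 << self.shift):
            count += 1
        return flag | count

    def update(self, item):
        for r in range(self.rows):
            c = self._index(r, item)
            self.table[r][c] = self._increment(self.table[r][c])

    def query(self, item):
        # Step 3: the minimum of the per-row values is the query result.
        return min(self._decode(self.table[r][self._index(r, item)])
                   for r in range(self.rows))


small = CarrySketch()
for _ in range(100):
    small.update("x")
print(small.query("x"))  # 100: below the carry threshold, counting is exact

large = CarrySketch()
for _ in range(1000):
    large.update("x")
print(large.query("x"))  # an approximate count beyond the plain 8-bit limit of 255
```

Under these assumptions an 8-bit counter can represent values up to 127 << 4 = 2032 instead of 255, at the cost of approximate counting once the flag is set.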

Description

Technical field

[0001] The invention relates to several important fields such as network security, financial analysis, machine learning, and natural language processing, and specifically to a carry-based data frequency estimation method for the Sketch data structure.

Background

[0002] At present, Count-Min Sketch (Graham Cormode, S. Muthukrishnan. An Improved Data Stream Summary: The Count-Min Sketch and Its Applications [M]) is the most widely used, best-performing, and most popular of the various Sketch data structures. It is relatively lightweight, simple and fast for real-time counting, highly scalable, and has low storage and computational complexity.

[0003] However, although it is a lightweight data structure that is even used on GPUs (Y. Wang, Y. Zu, et al. Wire speed name lookup: A GPU-based approach. In Proc. USENIX NSDI, pages 199–212, 2013), Count-Min Sketch still has significant performance limitations. For example, its query accuracy is s...


Application Information

IPC(8): G06F17/30
CPC: G06F16/2455
Inventor: 杨仝, 姜雨萌, 李晓明
Owner: PEKING UNIV