
A Data Frequency Estimation Method Based on Carry-Based Sketch Data Structure

A data structure technique, applied in the fields of electronic digital data processing, digital data information retrieval, and special data processing applications. It addresses the limited counting range of sketch counters and the strong dependence of query accuracy on the available space, and achieves improved accuracy and a higher counting upper limit.

Active Publication Date: 2021-11-16
PEKING UNIV
3 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] However, although Count-Min Sketch is lightweight enough to be used even on GPUs (Y. Wang, Y. Zu, et al. Wire speed name lookup: A GPU-based approach. In Proc. USENIX NSDI, pages 199–212, 2013), it still has significant limitations. For example, its query accuracy is sensitive to the amount of space used, so a tight space budget severely restricts its accuracy.
At the same time, its data structure design is relatively simple, so the range of values each counter can store is very limited.



Examples


Specific example

[0062] Suppose there are 5 different query strings, a, b, c, d, and e, with frequencies 1000, 300, 200, 1200, and 400 respectively. In the original CM Sketch, a and c are mapped to the same position, so the count at that position is 1000+200=1200. b and d are mapped to the same position, whose count is 300+1200=1500.
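The collision arithmetic in [0062] can be reproduced with a few lines of Python. This is only an illustrative sketch: the single counter row and the `slot_of` mapping are assumptions chosen to recreate the collisions described above, standing in for whatever the hash function would produce.

```python
# Frequencies of the five query strings from paragraph [0062].
freqs = {"a": 1000, "b": 300, "c": 200, "d": 1200, "e": 400}

# Hypothetical slot assignment (stand-in for a hash function) that
# reproduces the collisions in the text: a/c share a slot, b/d share a slot.
slot_of = {"a": 0, "c": 0, "b": 1, "d": 1, "e": 2}

# One row of counters; each item adds its frequency to its slot.
counters = [0, 0, 0]
for item, f in freqs.items():
    counters[slot_of[item]] += f

# The sketch answers a query with the value of the item's slot,
# so colliding items are overestimated.
estimates = {item: counters[slot_of[item]] for item in freqs}
print(estimates)
# {'a': 1200, 'b': 1500, 'c': 1200, 'd': 1500, 'e': 400}
```

Every estimate is at least the true frequency, and only e, which shares its slot with no other item, is reported exactly.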

[0063] Now suppose we traverse these strings in the order e, d, c, b, a to find the top-3, and that three items with values 350, 340, and 330 have already been found. For e, the queried value 400 is large enough, so we look up its real value, 400, in the hash table; the current top-3 become 400, 350, and 340. For d, the queried value is 1500 and the real value is 1200, so the current top-3 become 1200, 400, and 350. For c, the queried value is 1200 but the real value is 200, so it is ignored. Similarly, b is ignored. Finally, for a, the real value is 1000, giving a final top-3 of 1200, 1000, and 400. In this process, a total of 5...
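The traversal in [0063] can be sketched as follows. The pruning rule (skip the hash-table lookup when the sketch estimate cannot beat the current third-largest value) is inferred from the phrase "large enough"; the names `top_k_scan`, `sketch_estimate`, `true_freq`, and the placeholder keys x, y, z for the three previously found items are assumptions introduced for illustration.

```python
import heapq

# Real frequencies (the hash table) and sketch estimates from [0062].
true_freq = {"a": 1000, "b": 300, "c": 200, "d": 1200, "e": 400}
sketch_estimate = {"a": 1200, "b": 1500, "c": 1200, "d": 1500, "e": 400}


def top_k_scan(order, heap):
    """Scan items in `order`, keeping the k largest real values in a min-heap."""
    heapq.heapify(heap)
    for item in order:
        threshold = heap[0][0]          # smallest value currently in the top-k
        if sketch_estimate[item] <= threshold:
            continue                    # estimate too small: skip the lookup
        real = true_freq[item]          # consult the hash table for the real value
        if real > threshold:
            heapq.heapreplace(heap, (real, item))
    return sorted(heap, reverse=True)


# Three items found earlier, with hypothetical keys x, y, z.
top3 = top_k_scan("edcba", [(350, "x"), (340, "y"), (330, "z")])
print([value for value, _ in top3])     # [1200, 1000, 400]
```

Because a sketch estimate is never below the real value, skipping an item whose estimate is at or below the threshold cannot discard a true top-3 member.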


Abstract

The invention relates to a data frequency estimation method based on a carry-based Sketch data structure. The method comprises: 1) setting up a Sketch data structure, which is a two-dimensional array of counters in which each position is an n-bit counter whose n bits are divided into flag bits and count bits; 2) when performing an update operation, mapping the data item into the two-dimensional array with hash functions and counting with the count bits, and, when the count bits reach their upper limit, using the flag bits to carry; 3) when performing a query operation, returning the minimum of the values queried in each row of the two-dimensional array as the query result. The method can use either fixed flag bits or multi-level dynamic flag bits. The invention significantly increases the counting upper limit while the counter size remains unchanged, and improves counting accuracy.
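Steps 1) to 3) can be illustrated with a small Python sketch. It is a reading of the abstract under stated assumptions, not the patented scheme: the class name `CarrySketch`, the use of `zlib.crc32` seeded by the row index as the per-row hash, the choice of exactly one flag bit, and above all the decoding rule for a flagged counter are assumptions. The excerpt does not spell out how a carry is interpreted, so here a carry simply sets the flag, restarts the count bits, and a flagged counter is read as "count upper limit + 1 + count bits". This shows the flag/count mechanics only and does not reproduce the increased counting range the invention claims.

```python
import zlib


class CarrySketch:
    """Count-Min style sketch whose n-bit counters hold one flag bit plus count bits."""

    def __init__(self, rows=3, cols=64, n_bits=8):
        self.rows, self.cols = rows, cols
        self.count_bits = n_bits - 1                  # assumption: one fixed flag bit
        self.count_max = (1 << self.count_bits) - 1   # upper limit of the count bits
        self.flag_mask = 1 << self.count_bits         # flag is the top bit of the counter
        self.table = [[0] * cols for _ in range(rows)]

    def _col(self, row, item):
        # crc32 seeded with the row index stands in for one hash function per row.
        return zlib.crc32(item.encode("utf-8"), row) % self.cols

    def update(self, item):
        for r in range(self.rows):
            c = self._col(r, item)
            cell = self.table[r][c]
            flag, count = cell & self.flag_mask, cell & self.count_max
            if count < self.count_max:
                count += 1                            # normal counting
            elif not flag:
                flag, count = self.flag_mask, 0       # carry: set flag, restart count bits
            # else: flag already set and count bits full, so the counter saturates
            self.table[r][c] = flag | count

    def _value(self, cell):
        # Assumed decoding: a set flag stands for one full pass of the count bits.
        flag = 1 if cell & self.flag_mask else 0
        return flag * (self.count_max + 1) + (cell & self.count_max)

    def query(self, item):
        # Step 3: the minimum over the rows is the frequency estimate.
        return min(self._value(self.table[r][self._col(r, item)])
                   for r in range(self.rows))


sketch = CarrySketch(rows=3, cols=64, n_bits=4)       # 1 flag bit + 3 count bits
for _ in range(10):
    sketch.update("a")
print(sketch.query("a"))                              # 10: carry happened at the 8th update
```

With 4-bit counters the count bits top out at 7, the eighth update triggers the carry, and under this assumed decoding the counter saturates at 15.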

Description

Technical field

[0001] The invention relates to several important fields, such as network security, financial analysis, machine learning, and natural language processing, and specifically to a data frequency estimation method based on a carry-based Sketch data structure.

Background technique

[0002] At present, Count-Min Sketch (Graham Cormode, S. Muthukrishnan. An Improved Data Stream Summary: The Count-Min Sketch and Its Applications [M]) is the most widely used, best-performing, and most popular of the various sketches. It is relatively lightweight, simple and fast for real-time counting, highly scalable, and has low storage and computational complexity.

[0003] However, although Count-Min Sketch is lightweight enough to be used even on GPUs (Y. Wang, Y. Zu, et al. Wire speed name lookup: A GPU-based approach. In Proc. USENIX NSDI, pages 199–212, 2013), it still has significant limitations in performance. For example, its query accuracy is s...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F16/2455
CPC: G06F16/2455
Inventor: 杨仝, 姜雨萌, 李晓明
Owner: PEKING UNIV