Construction and query optimization methods for multiple layers of Bloom Filters

An optimization and query method, applied in fields such as instrumentation, computing, and electrical digital data processing. It addresses excessive element query time: it reduces query time, simplifies the bit-position query operation, and reduces the number of disk accesses.

Active Publication Date: 2012-11-28
HUAZHONG UNIV OF SCI & TECH
Cites: 4; Cited by: 12

AI Technical Summary

Problems solved by technology

When the multi-layer Bloom Filters are too large for memory and must be queried on disk, element query time exceeds the acceptable range.

Examples

[0066] Consider a massive data deduplication system with a storage capacity of 512TB. Assume block-level deduplication with a block size of 4KB, where each block corresponds to one fingerprint, so the number of fingerprints is 2^37. Each fingerprint is 20 bytes; with other metadata, a fingerprint entry requires 32 bytes, giving a 4TB fingerprint library that cannot fit in memory. When a new data block arrives, the system must determine whether it duplicates stored data, that is, whether the block's fingerprint matches an existing fingerprint.
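The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes binary units (1TB = 2^40 bytes, 1KB = 2^10 bytes), which is what makes the stated fingerprint count come out to exactly 2^37; the variable names are illustrative only.

```python
# Assumption: binary units, so that 512TB / 4KB is an exact power of two.
TB = 2**40
KB = 2**10

capacity_bytes = 512 * TB   # total storage capacity from the example
block_bytes = 4 * KB        # deduplication block size
entry_bytes = 32            # 20-byte fingerprint plus other metadata

# One fingerprint per block.
num_fingerprints = capacity_bytes // block_bytes
library_bytes = num_fingerprints * entry_bytes

print(num_fingerprints == 2**37)  # True
print(library_bytes // TB)        # 4
```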

[0067] To speed up fingerprint lookup, the present invention introduces a multi-layer Bloom Filter. Assuming a Bloom Filter error rate of 1/10,000 and 10 hash functions, each layer of Bloom Filters occupies up to 320GB, and two layers occupy 640GB. This also cannot fit in memory and must be placed on disk, and its query will cau...
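The 320GB figure is consistent with the standard Bloom filter false-positive approximation p ≈ (1 - e^(-k/b))^k, where k is the number of hash functions and b is the number of bits per element. The sketch below is not taken from the patent: it assumes that approximation, whole bits per element, binary units, and one layer covering all 2^37 fingerprints. The function names are hypothetical.

```python
import math

def bloom_false_positive_rate(bits_per_element: float, num_hashes: int) -> float:
    """Standard approximation of a Bloom filter's false-positive rate."""
    return (1.0 - math.exp(-num_hashes / bits_per_element)) ** num_hashes

def min_bits_per_element(target_rate: float, num_hashes: int) -> int:
    """Smallest whole number of bits per element that meets the target rate."""
    bits = 1
    while bloom_false_positive_rate(bits, num_hashes) > target_rate:
        bits += 1
    return bits

num_fingerprints = 2**37   # from the 512TB / 4KB example
num_hashes = 10
target_rate = 1e-4         # error rate of 1/10,000

bits = min_bits_per_element(target_rate, num_hashes)
layer_bytes = num_fingerprints * bits // 8

print(bits)                  # 20
print(layer_bytes // 2**30)  # 320
```

Under these assumptions, 19 bits per element gives a rate slightly above 1/10,000, so 20 bits per element is the smallest whole number that meets the target, and 2^37 elements at 20 bits each come to 320GB per layer.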

Abstract

The invention discloses construction and query optimization methods for multi-layer Bloom Filters. During construction, the bit positions of related Bloom Filters in a conventional multi-layer Bloom Filter are rearranged: the same bit positions of the Q Bloom Filters in the first layer, and the same bit positions of the Q lower-layer Bloom Filters corresponding to each upper-layer Bloom Filter, are placed in one contiguous address space. During a query, the bit positions that a hash value maps to in the Q Bloom Filters of the same layer are therefore located in one contiguous address space, so the multi-layer Bloom Filter can be queried by reading a small number of contiguous regions. With the optimized multi-layer Bloom Filter, storage space does not increase, the bit-position query operation becomes simpler, the number of disk accesses is greatly reduced, and the query time of the multi-layer Bloom Filter is effectively shortened.
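The rearrangement described in the abstract can be illustrated for a single group of Q filters. The following is a minimal in-memory sketch, not the patented implementation: it covers only one layer, stores each group of Q bits as one Python integer, and derives hash positions from SHA-256, all of which are assumptions made for illustration. The class and method names are hypothetical.

```python
import hashlib

class InterleavedBloomGroup:
    """Q Bloom filters of m bits each, laid out so that bit position p of
    all Q filters sits in one contiguous group."""

    def __init__(self, num_filters: int, bits_per_filter: int, num_hashes: int):
        self.q = num_filters
        self.m = bits_per_filter
        self.k = num_hashes
        # groups[p] holds Q bits: bit i is position p of filter i.
        self.groups = [0] * bits_per_filter

    def _positions(self, item: bytes):
        # Assumption: k positions derived from SHA-256 with a counter prefix.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, filter_index: int, item: bytes) -> None:
        for p in self._positions(item):
            self.groups[p] |= 1 << filter_index

    def query(self, item: bytes) -> list:
        """Return indices of the filters that may contain the item."""
        mask = (1 << self.q) - 1
        for p in self._positions(item):
            mask &= self.groups[p]  # one contiguous read covers all Q filters
            if mask == 0:
                break
        return [i for i in range(self.q) if (mask >> i) & 1]

group = InterleavedBloomGroup(num_filters=8, bits_per_filter=1024, num_hashes=4)
group.add(2, b"fingerprint-a")
group.add(5, b"fingerprint-b")
print(group.query(b"fingerprint-a"))
```

In this layout a query touches k contiguous groups in total, rather than k separate bit positions in each of the Q filters, which is the access pattern the abstract credits with reducing disk accesses.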

Description

Technical field

[0001] The invention relates to the field of computer storage, and in particular to construction and query optimization methods for multi-layer Bloom Filters.

Background

[0002] A Bloom Filter is a binary vector data structure proposed by Burton Howard Bloom in 1970 that can be used to quickly determine whether an element belongs to a set. Compared with methods such as hashing and trees, a Bloom Filter can guarantee the spatial locality of the data set to be queried when it is stored. As the data set to be queried grows, it can be divided into several smaller data sets of equal capacity, each corresponding to one Bloom Filter. Because a queried element must be checked against each Bloom Filter in turn until it is found or the filters are exhausted, the query time across multiple Bloom Filters increases greatly. To speed up queries over massive data sets, the multi-layer Bloom Filter is introduced. When the determination element of the upper-level Bloom...

Application Information

IPC(8): G06F17/30
Inventor 曹强黄建忠谢长生荣益麟慎涵黄国强
Owner HUAZHONG UNIV OF SCI & TECH