Method for implementing repeated data deletion technology based on single-hash averaging Bloom filter

A data deduplication technology and implementation method, applied in the computer field. It addresses the problems of increased resource consumption by Bloom filters, lower cost performance, and lower computing power, and achieves mutual independence, low computing consumption, and fast filtering.

Pending Publication Date: 2021-01-01
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

Although Bloom filters are widely used in network applications, Standard Bloom filters (SBF) place high requirements on their hash functions (mutual independence and good randomness) and have limited storage space. As a result, the Bloom filter consumes more resources, computing power is reduced, and cost performance is lower.



Examples


Embodiment

[0050] The method for implementing the deduplication technology based on the single-hash uniform distribution filter, as shown in Figure 1, includes the following steps:

[0051] S1. Determine the length of the storage area and the length of each partition, determine the first data set D1 containing the D data items that need to be stored, set j = 1, and determine the second data set D2 to be queried;

[0052] The length of the storage area is the storage size M of the single-hash uniform distribution filter. The length of each partition is then the integer part of M / k, and the length of the last partition may be less than M / k.

[0053] In this embodiment, the first data set D1 contains one data item x_1, the storage area length is 24, k is 3, and the 3 partitions p_1, p_2, p_3 each have length 8;
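The partition arithmetic in this embodiment can be sketched in a few lines of Python. This is only an illustration: the function name `partition_layout` is hypothetical, and it assumes every partition gets the integer part of M / k slots, with any remainder slots left unused. The text does not fully specify how the last partition is handled, so that detail is an assumption.

```python
def partition_layout(M: int, k: int) -> list[tuple[int, int]]:
    """Return (start, length) for each of k partitions of an M-slot storage area.

    Assumption (not stated in the source): each partition has M // k slots
    and any leftover slots at the end of the storage area stay unused.
    """
    if k <= 0 or M < k:
        raise ValueError("need k >= 1 and M >= k")
    length = M // k  # integer part of M / k
    return [(i * length, length) for i in range(k)]


# Values from the embodiment: storage length 24, k = 3.
print(partition_layout(24, 3))  # [(0, 8), (8, 8), (16, 8)]
```

With M = 24 and k = 3 this reproduces the three partitions of length 8 named in the embodiment.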

[0054] S2. Select a hash function with high requirements over the range of the storage area, perform a hash calculation on the j-th data item d_j in the first data set D1, and obtain hash ...


Abstract

The invention discloses a method for implementing a repeated data deletion (deduplication) technology based on a single-hash averaging Bloom filter. First, a hash function with high requirements is applied within the partition range. Then k hash mappings are generated by k hash functions; these k hash functions are modulo operations with extremely low computational cost, and their results are scaled and mapped to partitions of the same size. A single-hash averaging Bloom filter is computed from the stored data and saved. For new data, a new single-hash averaging Bloom filter is generated; if its mapped blocks do not coincide with those of the stored filter, the new data is proven not to exist in the stored data. With this method, data that may be repeated can be filtered quickly and effectively.
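A minimal Python sketch of the idea in the abstract follows, under stated assumptions. The abstract does not give the strong hash, the moduli, or the exact scaling rule, so everything here is illustrative: SHA-256 stands in for the "high requirements" hash, the moduli (11, 13, 17) are hypothetical, and the scaling is taken to be floor(r * L / m), where L is the partition length. The class name `SingleHashFilter` is also hypothetical.

```python
import hashlib


class SingleHashFilter:
    """Sketch: one strong hash, k cheap modulo maps, k equal-size partitions."""

    def __init__(self, M: int, k: int, moduli: tuple[int, ...]):
        self.L = M // k  # partition length (integer part of M / k)
        if len(moduli) != k or any(m < self.L for m in moduli):
            raise ValueError("need k moduli, each at least the partition length")
        self.moduli = moduli  # hypothetical choice; not given in the source
        self.bits = [False] * (self.L * k)

    def _positions(self, item: bytes):
        # The single expensive hash, computed once per item.
        h = int.from_bytes(hashlib.sha256(item).digest(), "big")
        for i, m in enumerate(self.moduli):
            r = h % m                    # cheap modulo "hash" number i
            slot = r * self.L // m       # scale r from [0, m) into [0, L)
            yield i * self.L + slot      # offset into partition i

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p] = True

    def might_contain(self, item: bytes) -> bool:
        # False means the item was definitely never added;
        # True means it may be a duplicate (false positives are possible).
        return all(self.bits[p] for p in self._positions(item))


f = SingleHashFilter(M=24, k=3, moduli=(11, 13, 17))
f.add(b"x1")
print(f.might_contain(b"x1"))  # True
```

Because each of the k positions falls in its own partition, adding one item sets exactly one bit per partition. A query that finds any of its k bits unset proves the item is not in the stored set, which matches the filtering behavior the abstract describes.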

Description

Technical field

[0001] The present invention relates to the field of computer technology, in particular to a method for implementing a deduplication technology based on a single-hash uniform distribution filter.

Background technique

[0002] Network applications today often require large-scale data screening and qualification review, as in data deduplication. Adding a filter structure is usually a good solution, and the Bloom filter is one of the most commonly used structures. Although Bloom filters are widely used in network applications, Standard Bloom filters (SBF) place high requirements on their hash functions (mutual independence and good randomness) and have limited storage space. As a result, the Bloom filter consumes more resources, computing power is reduced, and cost performance is lower. Therefore, how to reduce the resources consumed by the Bloom filter and at the same time redu...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G06F16/22
CPC: G06F16/215, G06F16/2255
Inventors: 齐德昱, 俞快
Owner: SOUTH CHINA UNIV OF TECH