Bloom filter-based method and system for deduplicating large volumes of keys

Bloom filter and large-data-volume techniques, applied in digital transmission systems, database indexing, and related fields, address the difficulty of achieving accurate deduplication when processing large volumes of data, and improve both deduplication efficiency and query efficiency.

Active Publication Date: 2021-11-02
ZHEJIANG QUANTUM TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to provide a Bloom filter-based method and system for deduplicating large volumes of keys, addressing the technical defect that prior-art deduplication methods for large-volume data processing struggle to achieve accurate deduplication.



Examples


Embodiment 1

[0069] Embodiment 1: As shown in Figure 1, the present invention provides a Bloom filter-based method for deduplicating large volumes of keys, comprising the following steps:

[0070] S1: Obtain the key data to be deduplicated. Keys K are obtained from a secure key distribution system, such as a quantum key distribution system or a quantum key relay network system, and await deduplication checking.

[0071] S2: Initialize the deduplication system. From the total number of target keys S that the system is designed for and the expected storage capacity n of a single persistent storage unit, determine the number N of storage units, then create N database tables or N files. From the expected storage capacity n of a single storage unit and a preset expected false positive rate fpp, calculate the size m of a single Bloom filter's Bitmap array and the number k of hash functions, create a Bloom filter BF, and, according to the number of storage units, create a corresponding number...
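
The sizing in step S2 can be sketched as follows. The source text is truncated and does not state which formulas it uses, so this sketch assumes the standard textbook Bloom filter sizing, m = -n ln(fpp) / (ln 2)^2 and k = (m / n) ln 2, and assumes N is S / n rounded up. The function name plan_dedup_system and its parameter names are hypothetical.

```python
import math


def plan_dedup_system(total_keys: int, unit_capacity: int, fpp: float):
    """Return (N, m, k) for an S2-style initialization.

    total_keys    -> S, total number of keys the system is designed for
    unit_capacity -> n, expected number of keys in one storage unit
    fpp           -> preset expected false positive rate of one Bloom filter
    """
    if total_keys <= 0 or unit_capacity <= 0:
        raise ValueError("S and n must be positive")
    if not 0.0 < fpp < 1.0:
        raise ValueError("fpp must lie strictly between 0 and 1")

    # Assumption: N is S / n rounded up, so every key has a storage unit.
    num_units = -(-total_keys // unit_capacity)

    # Assumption: textbook optimal sizing for a filter holding n keys.
    num_bits = math.ceil(-unit_capacity * math.log(fpp) / (math.log(2) ** 2))
    num_hashes = max(1, round(num_bits / unit_capacity * math.log(2)))
    return num_units, num_bits, num_hashes


if __name__ == "__main__":
    n_units, m_bits, k_hashes = plan_dedup_system(10_000_000, 1_000_000, 0.01)
    print(n_units, m_bits, k_hashes)
```

Under these assumptions each of the N filters is sized for n keys rather than for all S keys, which is where the per-filter space saving described in the abstract would come from.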

Embodiment 2

[0080] Embodiment 2: As shown in Figures 2, 3, and 4, the present invention also provides a Bloom filter-based system for deduplicating large volumes of keys, comprising the following components:

[0081] Data acquisition module 201: acquires the large volume of key data to be stored and deduplicated from key distribution systems such as a quantum key distribution system or a quantum key relay network system.

[0082] Deduplication system initialization module 202: creates storage units and Bloom filters according to the input parameters. As shown in Figure 3, the deduplication system initialization module 202 includes the following submodules:

[0083] (1) Storage unit creation submodule 2021: determines the number N of storage units from the expected total number of keys S input to the system and the expected storage capacity n of a single persistent storage unit, then creates N database tables or N files;

[0084] (2) Crea...

Embodiment 3

[0091] Embodiment 3: Building on Embodiment 1, and with reference to Figure 5, the positive-data traversal statistics of step S5 are described in detail. The process includes sub-steps S501, S502, S503, S504, S505, and S506, as follows:

[0092] S501: Traverse the specified storage unit and take out a key K;

[0093] S502: Determine whether key K exists in the HashSet of positive data output by step S4. If it does not, key K is unique and needs no further processing; return to step S501 for the next round of traversal statistics. If it does, proceed to S503;

[0094] S503: Key K exists in the HashSet of positive data, so it may be a duplicate. Obtain the actual storage location information of key K, that is, information such as the file offset of key K in the storage unit or the database primary key that can represent the key's actual sto...
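
Sub-steps S501 to S503 can be sketched as a single pass over one storage unit. This is an illustrative sketch only: the source is truncated before S504 to S506, so what happens to the recorded locations afterwards is not shown here. The function name collect_positive_locations, the (location, key) pair layout, and the use of a dict to group locations are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def collect_positive_locations(
    unit: Iterable[Tuple[int, bytes]],
    positives: Set[bytes],
) -> Dict[bytes, List[int]]:
    """Record where every possibly duplicated key is stored.

    unit      -> (location, key) pairs from one storage unit; the location
                 stands in for a file offset or a database primary key
    positives -> the HashSet of positive data produced by the Bloom pass (S4)
    """
    locations: Dict[bytes, List[int]] = defaultdict(list)
    for location, key in unit:           # S501: take the next key K
        if key not in positives:         # S502: not positive, so K is unique
            continue
        locations[key].append(location)  # S503: note where K is stored
    return dict(locations)


if __name__ == "__main__":
    unit = [(0, b"k1"), (16, b"k2"), (32, b"k1"), (48, b"k3")]
    print(collect_positive_locations(unit, {b"k1", b"k3"}))
```

In this sketch a key with more than one recorded location is a true duplicate, while a key with a single location was only a Bloom filter false positive.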


Abstract

The invention discloses a Bloom filter-based method for deduplicating large volumes of keys. The method comprises the following steps: obtaining the data to be deduplicated; initializing the deduplication system; partitioning and storing the data; performing Bloom deduplication on the data; performing traversal statistics on the positive data; and performing exact deduplication on the data, thereby completing exact deduplication of the large-volume key data. The invention further provides a Bloom filter-based system for deduplicating large volumes of keys. Compared with the prior art, the invention provides a divide-and-conquer storage method and a positive-data-based exact deduplication method for large volumes of keys. Keys are routed evenly to different storage units according to a hash remainder, which guarantees that duplicate keys fall in the same data set and reduces the BitSet space and the deduplication cost required by a single Bloom filter; that is, the space and time efficiency of Bloom filter deduplication is improved. Exact deduplication of the key data is then achieved through traversal statistics over the HashSet of positive data, which improves deduplication accuracy and key quality.
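
The hash-remainder routing and the Bloom pass that yields the positive data can be sketched as follows. This is a minimal illustration, not the patented implementation: the choice of SHA-256, the double-hashing scheme used to derive the k bit positions, and the names unit_index, BloomFilter, and bloom_pass are all assumptions.

```python
import hashlib
from typing import Iterable, List, Set


def unit_index(key: bytes, num_units: int) -> int:
    """Route a key to a storage unit by hash remainder (hash assumed: SHA-256)."""
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_units


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> List[int]:
        # Assumption: derive k positions from two base hashes (double hashing).
        digest = hashlib.sha256(b"bf" + key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add_and_check(self, key: bytes) -> bool:
        """Insert key; return True if every bit was already set (a positive)."""
        already_set = True
        for pos in self._positions(key):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                already_set = False
                self.bits[byte] |= mask
        return already_set


def bloom_pass(keys: Iterable[bytes], num_units: int,
               num_bits: int, num_hashes: int) -> List[Set[bytes]]:
    """Return, per storage unit, the set of keys the Bloom filter flagged."""
    filters = [BloomFilter(num_bits, num_hashes) for _ in range(num_units)]
    positives: List[Set[bytes]] = [set() for _ in range(num_units)]
    for key in keys:
        i = unit_index(key, num_units)  # equal keys always map to the same unit
        if filters[i].add_and_check(key):
            positives[i].add(key)       # possibly a duplicate, verify exactly later
    return positives


if __name__ == "__main__":
    found = bloom_pass([b"a", b"b", b"a", b"c", b"b"], 4, 1024, 3)
    print(found)
```

Because a Bloom filter has no false negatives, every real duplicate ends up in the positive set of its unit; a positive set may also contain false positives, which is why a later exact check is needed.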

Description

Technical field

[0001] The invention relates to the technical field of electrical digital data processing, and in particular to a Bloom filter-based method and system for deduplicating large volumes of keys.

Background technique

[0002] With the continuing development of quantum key distribution and quantum key relay technology, servers in practical applications now store large volumes of keys. As the volume of key data keeps growing, removing duplicates from these keys has become an urgent need. Deduplication helps ensure key security and improves key quality. At present, the Bloom filter algorithm is commonly used to deduplicate data at this scale; it relies on multiple hash functions and a Bitmap binary vector, and is relatively efficient in both time and space, but simply using this Bloom filter scheme has a misjudgm...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215; G06F16/22; H04L9/08
CPC: H04L9/0894; G06F16/215; G06F16/2237
Inventor: 丁胜建, 封连重
Owner: ZHEJIANG QUANTUM TECH CO LTD