A method and system for deduplicating large-volume key data based on a Bloom filter
Bloom filter and large-data-volume technology, applied in digital transmission systems, transmission systems, database indexes, and similar fields. It addresses the difficulty of achieving accurate deduplication when processing large volumes of data, and improves deduplication query efficiency, space usage, quality, and security.
Examples
Embodiment 1
[0068] Embodiment one: As shown in Figure 1, the present invention provides a method for deduplicating large-volume keys based on a Bloom filter, comprising the following steps:
[0069] S1: Obtain the key data to be deduplicated. Obtain the key K from a secure key distribution system, such as a quantum key distribution system or a quantum key relay network system, and queue it for deduplication detection.
[0070] S2: Initialize the deduplication system. Based on the total number of target keys S that the system is designed for and the expected storage capacity n of a single persistent storage unit, determine the number N of storage units, then create N database tables or N files. Based on the expected storage capacity n of a single storage unit and the preset expected false positive rate fpp, calculate the size m of a single Bloom filter bitmap array and the number k of hash functions, create a Bloom filter BF, and create a corresponding number according to the number of storage units...
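The parameter calculation in S2 can be sketched as follows. The text does not state which formulas it uses, so this sketch assumes the standard Bloom filter sizing formulas, m = -n ln(fpp) / (ln 2)^2 and k = (m / n) ln 2, and assumes N is S divided by n, rounded up. The function name `plan_dedup_layout` is hypothetical.

```python
import math
from typing import Tuple


def plan_dedup_layout(total_keys: int, unit_capacity: int, fpp: float) -> Tuple[int, int, int]:
    """Return (N, m, k) for total keys S, per-unit capacity n and target fpp.

    Assumption: textbook Bloom filter sizing; the source does not give its formulas.
    """
    if total_keys <= 0 or unit_capacity <= 0 or not 0.0 < fpp < 1.0:
        raise ValueError("S and n must be positive and fpp must lie in (0, 1)")

    # N: storage units needed to hold S keys at n keys per unit (ceiling division).
    num_units = (total_keys + unit_capacity - 1) // unit_capacity

    # m: bits in one filter's bitmap, m = -n * ln(fpp) / (ln 2)^2, rounded up.
    m_bits = math.ceil(-unit_capacity * math.log(fpp) / (math.log(2) ** 2))

    # k: number of hash functions, k = (m / n) * ln 2, at least one.
    k_hashes = max(1, round(m_bits / unit_capacity * math.log(2)))

    return num_units, m_bits, k_hashes


if __name__ == "__main__":
    n_units, m, k = plan_dedup_layout(10_000_000, 1_000_000, 0.01)
    print(f"N={n_units} storage units, m={m} bits per filter, k={k} hash functions")
```

Under these assumptions, a 1% false positive rate costs roughly 9.6 bits per stored key, so a filter for one million keys fits in a little over 1 MB.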
Embodiment 2
[0079] Embodiment two: As shown in Figures 2, 3, and 4, the present invention also provides a system for deduplicating large-volume keys based on a Bloom filter, including the following components:
[0080] Key acquisition module 201: used to acquire large-volume key data to be stored and deduplicated from key distribution systems such as a quantum key distribution system or a quantum key relay network system.
[0081] Deduplication system initialization module 202: used to create storage units and Bloom filters according to the input parameters. As shown in Figure 3, the deduplication system initialization module 202 includes the following submodules:
[0082] (1) Create storage unit submodule 2021: used to determine the number N of storage units according to the expected total number of keys S input to the system and the expected storage capacity n of a single persistent storage unit, and then create N database tables or N files;
[0083] ...
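For orientation, the kind of Bloom filter that initialization module 202 creates can be sketched as below. This is a minimal illustration, not the patent's implementation: the class name `BloomFilter`, the use of SHA-256, and the double-hashing scheme for deriving k bit positions are all assumptions, since the text does not specify its hash functions.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter over an m-bit bitmap with k derived hash positions."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        if m_bits <= 0 or k_hashes <= 0:
            raise ValueError("m and k must be positive")
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)  # bitmap, all bits zero

    def _positions(self, key: bytes) -> Iterator[int]:
        # Assumption: derive k positions from one SHA-256 digest by double hashing.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, key: bytes) -> bool:
        """Insert key. Return True if every bit was already set (possible duplicate)."""
        already_present = True
        for pos in self._positions(key):
            byte_index, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte_index] & mask:
                already_present = False
                self.bits[byte_index] |= mask
        return already_present

    def __contains__(self, key: bytes) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


if __name__ == "__main__":
    bf = BloomFilter(m_bits=9586, k_hashes=7)
    print(bf.add(b"key-0001"))  # first insertion of this key
    print(bf.add(b"key-0001"))  # same key again: reported as a possible duplicate
```

A `True` result from `add` only means the key may have been seen before. A Bloom filter can return false positives but never false negatives, which is why a positive result still needs to be confirmed against the stored data.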
Embodiment 3
[0090] Embodiment three: As shown in Figure 5, and building on embodiment one, the positive-data traversal statistics of step S5 are described in detail with reference to Figure 5. The process includes sub-steps S501, S502, S503, S504, S505, and S506, as follows:
[0091] S501: Traverse the specified storage unit and take out a group of keys K;
[0092] S502: Determine whether the key K already exists in the HashSet of positive data output in step S4. If it does not exist, the key K is unique and needs no processing; jump to step S501 to start the next round of traversal statistics. If it exists, go to S503 for processing;
[0093] S503: The key K exists in the HashSet of positive data, indicating that the key K may be a duplicate. Obtain the actual storage location information of the key K, that is, the file offset of the key K in the storage unit or the database primary key, which can represent information on the actual sto...
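Sub-steps S501 to S503 can be sketched as follows. Steps S504 to S506 are not shown in the text above, so the sketch stops at collecting storage locations. It assumes a storage unit can be read as (location, key) pairs, where the location is a file offset or a database primary key; the function name `collect_positive_locations` and this data layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def collect_positive_locations(
    storage_unit: Iterable[Tuple[int, bytes]],
    positives: Set[bytes],
) -> Dict[bytes, List[int]]:
    """Record where each Bloom-positive key is stored in one storage unit."""
    locations: Dict[bytes, List[int]] = defaultdict(list)
    for location, key in storage_unit:   # S501: take the next key from the unit
        if key not in positives:         # S502: not in the positive set, so unique
            continue
        locations[key].append(location)  # S503: remember its actual storage location
    return dict(locations)


if __name__ == "__main__":
    unit = [(0, b"aa"), (16, b"bb"), (32, b"aa"), (48, b"cc")]
    positive_set = {b"aa", b"cc"}  # stands in for the HashSet output by step S4
    print(collect_positive_locations(unit, positive_set))
```

One plausible use of the result, stated here as an assumption, is that a key recorded at more than one location is a genuine duplicate, while a key recorded at a single location was a Bloom filter false positive.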