Data deduplication method based on distributed in-memory computing
An in-memory computing and distributed-systems technique, applied in computing, database querying, electrical digital data processing, and related fields. It addresses the high time and system-resource cost and the low efficiency of existing deduplication, and achieves fast deduplication.
Examples
Embodiment 1
[0035] A data deduplication method based on distributed memory computing, comprising the following steps:
[0036] (1) Create a file block fingerprint set in distributed memory and cache it there. Each fingerprint set has two parts. The first part holds the path of the corresponding block, the block's creation time, the block's hash value, and so on; it maps the fingerprint set to its block. The second part holds the fingerprint set's creation time, reference count, weight, and so on; it controls whether the fingerprint set is cached in distributed memory or on disk.
[0037] (2) When a fingerprint set is created, assign it a uniform initial weight, which determines where the fingerprint set is cached. The initial weight of each fingerprint set decays gradually over time until it reaches zero.
[0038] (3) When performing f...
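Steps (1) and (2) can be illustrated with a minimal Python sketch. The field names, the linear decay rule, and the constants `INITIAL_WEIGHT` and `DECAY_PER_SECOND` are assumptions made for illustration; the source states only that every fingerprint set receives the same initial weight and that this weight decays gradually to zero.

```python
from dataclasses import dataclass

INITIAL_WEIGHT = 100.0    # assumed uniform starting weight (not given in the source)
DECAY_PER_SECOND = 1.0    # assumed linear decay rate (not given in the source)


@dataclass
class FingerprintEntry:
    # Part 1: maps the fingerprint set to its block.
    block_path: str
    block_created_at: float
    block_hash: str
    # Part 2: controls whether the entry is cached in memory or on disk.
    created_at: float
    ref_count: int = 0
    initial_weight: float = INITIAL_WEIGHT

    def decayed_initial_weight(self, now: float) -> float:
        """Initial weight remaining at time `now`, never below zero."""
        elapsed = max(0.0, now - self.created_at)
        return max(0.0, self.initial_weight - DECAY_PER_SECOND * elapsed)
```

Under these assumed constants, an entry created at time 0 keeps a weight of 60 after 40 seconds and reaches zero after 100 seconds. Any other monotonically decreasing rule that ends at zero would match the description equally well.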
Embodiment 2
[0047] The present invention is applied to data deduplication on a Spark-based system:
[0048] Figure 1 shows a flow chart of the present invention. First, a file block fingerprint set is constructed in distributed memory, and each newly created fingerprint set is given an initial weight that determines its cache location; this initial weight decays gradually over time until it reaches zero. The file is divided into blocks according to the optimal file block division strategy, and a fingerprint is computed for each block and compared with the fingerprint sets cached in memory. If a matching fingerprint set is found, a corresponding reference is added to it; if not, the block and a new fingerprint set are created on disk. The activity of a fingerprint set in memory is represented by its weight, and the fingerprint sets are ordered by weight, which controls whether each fingerprint set is cached in memory...
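The lookup-or-create part of this flow can be sketched in plain Python, without Spark. This is an illustration under stated assumptions, not the patented implementation: fixed-size chunking stands in for the unspecified "optimal file block division strategy", SHA-256 stands in for the unspecified hash, two dictionaries play the roles of the in-memory fingerprint set and the on-disk block store, and a new entry starts with a reference count of 1. The names `split_blocks`, `deduplicate`, and `BLOCK_SIZE` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # tiny assumed block size so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Fixed-size chunking, a stand-in for the unspecified division strategy."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(data: bytes, index: dict, store: dict) -> list:
    """Return the fingerprints that make up `data`, storing each unique block once.

    index: fingerprint -> reference count (stands in for the in-memory fingerprint set)
    store: fingerprint -> block bytes (stands in for the on-disk block store)
    """
    recipe = []
    for block in split_blocks(data):
        fp = hashlib.sha256(block).hexdigest()
        if fp in index:
            index[fp] += 1       # match found: add a reference
        else:
            store[fp] = block    # no match: create the block
            index[fp] = 1        # and a new fingerprint entry (assumed initial count)
        recipe.append(fp)
    return recipe
```

Calling `deduplicate(b"abcdabcdwxyz", {}, {})` splits the input into three blocks and stores only two of them, because the first two blocks share a fingerprint. The weight-based ordering and eviction described at the end of the paragraph are not modeled here.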
© 2024 PatSnap. All rights reserved. Contact: help@patsnap.com