Self-adaptive threshold repeated data deletion method based on greedy selection

A deduplication technology with a self-adaptive threshold, applied in the fields of data error detection, electrical digital data processing, data processing input/output, etc. It addresses problems such as an unnecessary reduction in the deduplication rate, rewriting effects that differ across data segments, and slower recovery.

Pending Publication Date: 2021-11-09
JINAN UNIVERSITY

AI Technical Summary

Problems solved by technology

[0004] Existing rewrite algorithms use a fixed threshold to limit references to old containers. As shown in Figure 1, with a fixed threshold of 10, the rewriting effect differs across data segments:
Data segment 1 discards references to some old containers with high utilization, unnecessarily reducing the deduplication rate.
Data segment 2 keeps references to some old containers with low utilization, which slows recovery.
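To make the fixed-threshold problem concrete, here is a small model. It assumes a capping-style rule (each data segment may reference at most `threshold` old containers; the containers holding the most of the segment's duplicate chunks are kept and the rest are rewritten), a uniform container size of 100 chunks, and invented segment contents. The helpers `cap_segment` and `utilizations` are hypothetical illustrations, not code from the patent.

```python
from collections import Counter

# Assumption: every old container holds 100 chunks, so
# utilization = chunks referenced by this segment / 100.
CONTAINER_CHUNKS = 100


def cap_segment(segment_refs, threshold):
    """Fixed-threshold (capping-style) rewrite decision for one segment.

    segment_refs: old-container ID for each duplicate chunk in the segment.
    Returns (kept_containers, rewritten_chunk_count).
    """
    counts = Counter(segment_refs)
    # Rank containers by how many of this segment's duplicate chunks they hold.
    ranked = sorted(counts, key=lambda cid: (-counts[cid], cid))
    kept = ranked[:threshold]
    # Duplicate chunks in containers beyond the threshold are rewritten.
    rewritten = sum(counts[cid] for cid in ranked[threshold:])
    return kept, rewritten


def utilizations(segment_refs, containers):
    """Fraction of each container that this segment actually references."""
    counts = Counter(segment_refs)
    return {cid: counts[cid] / CONTAINER_CHUNKS for cid in containers}


# Segment 1: 12 old containers, each 80% referenced by this segment.
seg1 = [cid for cid in range(12) for _ in range(80)]
# Segment 2: one well-used container plus two containers that are only 2% used.
seg2 = [0] * 90 + [1] * 2 + [2] * 2

kept1, rewritten1 = cap_segment(seg1, threshold=10)
kept2, rewritten2 = cap_segment(seg2, threshold=10)
print(len(kept1), rewritten1)     # 10 160
print(kept2, rewritten2)          # [0, 1, 2] 0
print(utilizations(seg2, kept2))  # {0: 0.9, 1: 0.02, 2: 0.02}
```

With the same threshold of 10, segment 1 rewrites 160 chunks that sit in 80%-utilized containers, while segment 2 keeps two containers that are only 2% utilized, mirroring the two failure modes listed above.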

Examples

Embodiment

[0042] This embodiment discloses a greedy-selection-based adaptive-threshold deduplication method. As shown in Figure 3, the method is implemented on a deduplication backup system comprising a backup performance maintenance module, a recovery performance maintenance module, a threshold adjustment module, and a data writing module.

[0043] The backup performance maintenance module resides in the memory of the deduplication backup system. It sets the lower limit of the deduplication rate according to the parameter X and the deduplication status of the previous version, and bounds the drop in deduplication rate by limiting the amount of rewritten data, so that the deduplication backup system keeps a high deduplication ratio.
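One way to read [0043] is as a running budget on rewrites. The sketch below assumes that X is the largest fractional drop in deduplication ratio allowed relative to the previous version (so the lower limit is `prev_ratio * (1 - X)`) and that the ratio is logical bytes divided by stored bytes. Both this interpretation of X and the `BackupBudget` class are hypothetical; the patent excerpt does not define them.

```python
from dataclasses import dataclass


@dataclass
class BackupBudget:
    """Hypothetical model of the backup performance maintenance module.

    Assumption: x is the maximum allowed fractional drop in deduplication
    ratio versus the previous version; ratio = logical bytes / stored bytes.
    """
    prev_ratio: float
    x: float
    logical: int = 0
    stored: int = 0

    @property
    def lower_limit(self) -> float:
        return self.prev_ratio * (1 - self.x)

    def add_unique(self, size: int) -> None:
        # New (non-duplicate) data is always stored.
        self.logical += size
        self.stored += size

    def add_duplicate(self, size: int, want_rewrite: bool) -> bool:
        """Record a duplicate chunk (size > 0); return True if it is rewritten.

        A requested rewrite is admitted only if the ratio stays at or above
        the lower limit after storing the chunk again.
        """
        self.logical += size
        if want_rewrite and self.logical / (self.stored + size) >= self.lower_limit:
            self.stored += size
            return True
        return False


budget = BackupBudget(prev_ratio=10.0, x=0.1)  # lower limit 9.0
budget.add_unique(1000)
for _ in range(100):
    budget.add_duplicate(100, want_rewrite=False)  # plain deduplicated chunks
rewrites = sum(budget.add_duplicate(100, want_rewrite=True) for _ in range(5))
print(rewrites)                        # 2
print(budget.logical, budget.stored)   # 11500 1200
```

In this run only two of the five requested rewrites are admitted; the third would push the ratio below the 9.0 lower limit. Checking the limit per chunk keeps the check cheap and streaming-friendly, at the cost of admitting rewrites first-come, first-served rather than by container value.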

[0044] The recovery performance maintenance module resides in the memory of the deduplication backup system and, according to the parameter F level and the reference amount of the old container involv...

Abstract

The invention discloses a greedy-selection-based adaptive-threshold deduplication method. As the volume of data in deduplication backup systems grows sharply, matching the total disk capacity of the system to the volume of backup data has become one of the main challenges. Rewriting algorithms are therefore added to data deduplication to mitigate the damage that fragmentation does to a deduplication backup system. The method selects and references the old containers that are most effective for deduplication, improving data recovery performance while maintaining a high deduplication rate. Specifically, different data segments are deduplicated with different thresholds, chosen according to the distribution of effective reference counts of the old containers involved in each segment; the group of old containers with the largest effective reference count is then selected greedily, so that the recovery performance of the deduplication backup system improves while a high deduplication rate is preserved.
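The abstract describes two steps per segment: derive a segment-specific threshold from the distribution of effective references to old containers, then greedily select the old-container group with the largest effective reference count. The exact threshold rule is not given in this excerpt, so the sketch below assumes a hypothetical one: the threshold is the smallest number of containers whose effective references, taken in decreasing order, cover at least a `coverage` fraction of the segment's duplicate chunks. For any fixed group size k, the k most-referenced containers maximize the group's total effective references, which is why the greedy order fits. `select_containers` and `coverage` are illustrative names, not from the patent.

```python
from collections import Counter


def select_containers(segment_refs, coverage=0.95):
    """Hypothetical adaptive-threshold greedy selection for one segment.

    segment_refs: old-container ID for each duplicate chunk in the segment.
    Returns (selected_containers, rewritten_chunk_count); the threshold for
    this segment is len(selected_containers).
    """
    counts = Counter(segment_refs)  # effective references per old container
    total = sum(counts.values())
    if total == 0:
        return [], 0
    # Greedy: take containers in decreasing order of effective references
    # until the selected group covers the required share of duplicates.
    ranked = sorted(counts, key=lambda cid: (-counts[cid], cid))
    selected, covered = [], 0
    for cid in ranked:
        if covered >= coverage * total:
            break
        selected.append(cid)
        covered += counts[cid]
    # Duplicate chunks in containers outside the selected group are rewritten.
    return selected, total - covered


# Evenly spread segment: 12 containers, 80 referenced chunks each.
seg1 = [cid for cid in range(12) for _ in range(80)]
# Skewed segment: one dominant container plus two barely used ones.
seg2 = [0] * 90 + [1] * 2 + [2] * 2

sel1, rw1 = select_containers(seg1)
sel2, rw2 = select_containers(seg2)
print(len(sel1), rw1)  # 12 0
print(sel2, rw2)       # [0] 4
```

Applied to these two invented segments, the rule picks a threshold of 12 for the evenly spread segment (nothing rewritten) and a threshold of 1 for the skewed one, rewriting the 4 chunks held in the two barely used containers.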

Description

Technical field

[0001] The invention relates to the technical field of data storage and data deduplication, and in particular to a greedy-selection-based adaptive-threshold deduplication method.

Background art

[0002] With the widespread use of computers across society, the amount of digital information in industries and enterprises has grown explosively, and so has the demand for storage capacity. The surge in data volume has also brought a series of related problems: enterprises increasingly need to back up and restore data quickly, and their spending on backup storage, management, and the associated energy consumption keeps rising, gradually becoming a major factor constraining their economic development. Studies have found that up to 60% of the data stored in the deduplication backup system is redundant, and as time goes by, the am...

Application Information

Patent Type & Authority: Application (China)
IPC (IPC8): G06F3/06; G06F9/455; G06F11/14
CPC: G06F3/0604; G06F3/0641; G06F3/065; G06F3/0676; G06F9/45558; G06F11/1448; G06F11/1469
Inventors: 邓玉辉 (Deng Yuhui), 林丽芳 (Lin Lifang)
Owner: JINAN UNIVERSITY