Redundant data marking and removing method

A redundant data marking technology, applied in digital data processing, data input/output, and input/output to record carriers. It addresses problems such as heavy occupation of system storage resources, low deduplication efficiency, and slow speed, and achieves a high redundancy recognition rate, low resource occupation, and high robustness.

Pending Publication Date: 2021-11-19
FUDAN UNIV SHANGHAI CANCER CENT

AI Technical Summary

Problems solved by technology

Byte-level deduplication has the advantage of a high deduplication rate. Its disadvantages are low deduplication efficiency and slow speed, and when the differing content makes up a large proportion of the data it also occupies a large amount of system storage resources. It is usually used in the post-processing method.
[0013] In addition, given the widespread use of distributed storage systems, traditional redundant data identification methods fall short in both performance and accuracy.



Examples


Detailed Description of the Embodiments

[0042] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0043] Figure 2 is a flowchart of the method for marking and removing redundant data according to the present invention. As shown in Figure 2, the present invention provides a redundant data marking and removal method comprising the following steps:

[0044] when a file is written, performing dynamic variable-length segmentation on the file to form a plurality of data blocks of different lengths;

[0045] grouping the plurality of data ...
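
The segmentation step in [0044] can be illustrated with a minimal sketch of content-defined (variable-length) chunking. The source does not say which segmentation algorithm is used, so the rolling hash, the function name `split_variable_length`, and the `min_size`, `avg_size` and `max_size` parameters below are all assumptions chosen for illustration, not the patented method.

```python
import random


def _gear_table(seed: int = 0) -> list:
    # Hypothetical fixed table of pseudo-random 32-bit values, one per byte value.
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(256)]


GEAR = _gear_table()


def split_variable_length(data: bytes, min_size: int = 64,
                          avg_size: int = 256, max_size: int = 1024) -> list:
    """Split data into blocks whose boundaries depend on the content.

    A cut is made when the low bits of a rolling hash are all zero, so the
    block length varies with the data. avg_size must be a power of two; it
    sets the expected spacing of candidate cut points after min_size.
    """
    mask = avg_size - 1
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shift-and-add rolling hash: older bytes fall out of the low bits.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if length < min_size:
            continue  # never cut before the minimum block size
        if (h & mask) == 0 or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing block may be shorter than min_size
    return chunks
```

Because the cut points depend on the bytes themselves rather than on fixed offsets, inserting data near the start of a file tends to shift only nearby boundaries, which is why variable-length blocks are commonly used to find duplicates at different locations in different files.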


Abstract

The invention relates to a redundant data marking and removing method and belongs to the technical field of data storage. The method comprises the following steps: when a file is written, performing dynamic variable-length segmentation on the file to form a plurality of data blocks of different lengths; grouping the plurality of data blocks to obtain data block groups, and calculating the Bloom value of each data block and the Bloom value of each data block group; processing the Bloom value of a data block to form a characteristic value of the data block; judging whether the characteristic value of the data block exists in a metadata database; if the characteristic value exists in the metadata database, calculating the Bloom value of the data block again, comparing it with the Bloom value of each data block group in the metadata database, locating the similar group of the data block, and determining the redundant data block; and marking the redundant data blocks, and deleting or retaining them according to a preset strategy. The method has the advantages of a high redundancy recognition rate, high reliability, high robustness, and low resource occupation.
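
The lookup flow in the abstract can be sketched as follows. The visible text does not define how a "Bloom value" or a "characteristic value" is computed, so this sketch makes loud assumptions: the Bloom value is a Bloom-filter bit vector over 8-byte shingles of the block, a group's Bloom value is the bitwise OR of its members' Bloom values, and the characteristic value is a SHA-256 digest of the block's Bloom value. The names `bloom_value`, `characteristic_value` and `MetadataStore` are hypothetical.

```python
import hashlib

BLOOM_BITS = 2048   # assumed filter width
NUM_HASHES = 2      # assumed number of hash functions
SHINGLE = 8         # assumed shingle length in bytes


def bloom_value(block: bytes) -> int:
    """Bloom-filter bit vector (as an int) over the block's byte shingles."""
    bits = 0
    for i in range(max(1, len(block) - SHINGLE + 1)):
        shingle = block[i:i + SHINGLE]
        for k in range(NUM_HASHES):
            digest = hashlib.blake2b(shingle, digest_size=8,
                                     salt=bytes([k])).digest()
            bits |= 1 << (int.from_bytes(digest, "big") % BLOOM_BITS)
    return bits


def characteristic_value(bloom: int) -> str:
    """Assumed characteristic value: a digest of the block's Bloom value."""
    return hashlib.sha256(bloom.to_bytes(BLOOM_BITS // 8, "big")).hexdigest()


class MetadataStore:
    """In-memory stand-in for the metadata lookup described in the abstract."""

    def __init__(self):
        self.known = set()        # characteristic values of stored blocks
        self.group_blooms = []    # one Bloom value per stored group

    def add_group(self, blocks) -> int:
        group_bloom = 0
        for block in blocks:
            bv = bloom_value(block)
            group_bloom |= bv     # group Bloom value = OR of member values
            self.known.add(characteristic_value(bv))
        self.group_blooms.append(group_bloom)
        return len(self.group_blooms) - 1

    def find_similar_group(self, block: bytes):
        """Return the index of the first matching group, or None."""
        if characteristic_value(bloom_value(block)) not in self.known:
            return None           # characteristic value unseen: not redundant
        bv = bloom_value(block)   # recomputed, as the abstract describes
        for idx, group_bloom in enumerate(self.group_blooms):
            if bv & group_bloom == bv:   # all of the block's bits are set
                return idx
        return None
```

A short usage example under the same assumptions:

```python
store = MetadataStore()
first = store.add_group([b"alpha block " * 4, b"beta block " * 4])
second = store.add_group([b"gamma payload " * 4])
print(store.find_similar_group(b"alpha block " * 4))   # index of the first group
print(store.find_similar_group(b"never stored"))       # None
```

Bloom-filter membership tests can return false positives, so a block flagged this way is only a candidate; a byte-for-byte comparison before deletion would be a sensible safeguard in any real system.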

Description

Technical field

[0001] The invention belongs to the technical field of data storage, and in particular relates to a method for marking and removing redundant data.

Background art

[0002] Duplication-mark is a data reduction technique designed to reduce the storage capacity used in a storage system. It finds redundant variable-sized data blocks at different locations in different files, marks and deletes the redundant data in the data storage system, and retains only one or a few necessary copies. Redundant data marking technology can greatly reduce the consumption of physical storage space, improve the efficiency of services such as retrieval, save transmission bandwidth, and allow users to perform efficient and economical backup data replication between different sites.

[0003] Redundant data marking and removal technology can be divided into pre-processing (Pre-Process), online processing (Online-Process) and post-processi...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F3/06
CPC: G06F3/0604; G06F3/0614; G06F3/0652; G06F3/067
Inventor: 朱敏俊王奕黄宗浩李渊张晖厉励张逸鲁高宇戴梅黄麒玮蔡云飞曹斌石强王正源王骏杰于镆铘崔敏杰胡佳迎
Owner: FUDAN UNIV SHANGHAI CANCER CENT