File data replication method for quick deduplication

A file data replication technology, applied in the fields of electrical digital data processing, special data processing applications, and instruments. It addresses problems such as the unpredictable distribution of modifications within a file, reduced copying efficiency, and the resulting economic loss.

Inactive Publication Date: 2011-04-27
COMMUNICATION UNIVERSITY OF CHINA

AI Technical Summary

Problems solved by technology

In some industries, such as film and television or IT, a large number of files must be copied regularly, individual files are large, and the modified portions are usually large as well. The position of the modifications within a file is also unpredictable; for example, heavily modified content may sit in the middle or at the end of the file. Backing up such file data therefore requires comparing the entire content, so a great deal of unmodified information is extracted for comparison. This greatly increases the up-front detection time and reduces copying efficiency. In the worst case it stalls production and causes considerable economic loss.

Examples

Embodiment 2

[0047] A file data replication method for quick deduplication, comprising:

[0048] Fingerprint acquisition step

[0049] Before the first target file in the storage medium is copied to the target file directory for the first time, the processor performs fingerprint calculation on the metadata information of the first target file to form an ID1 file, extracts the content of several file data segments from the first target file at a predetermined interval and performs fingerprint calculation on them to form an ID2 file, and stores the ID1 file and the ID2 file in the database.

[0050] After the first target file has been copied to the target file directory for the first time, and before the second target file in the storage medium is copied to the target file directory, the processor performs fingerprint calculation on the metadata information of the second target file to form an ID3 file, and performs fingerprint calculation on the content of several file data se...
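
The fingerprint acquisition described in [0049] and [0050] could be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the source does not name a hash function, the metadata fields, the segment size, or the interval, so SHA-256, the name/size/modification-time fields, `SEGMENT_SIZE`, `INTERVAL`, and both function names are hypothetical choices.

```python
import hashlib
import os

SEGMENT_SIZE = 4096      # assumed size of each sampled segment, in bytes
INTERVAL = 1024 * 1024   # assumed distance between segment starts, in bytes


def metadata_fingerprint(path: str) -> str:
    """Hash a few metadata fields (assumed: name, size, modification time)."""
    st = os.stat(path)
    record = f"{os.path.basename(path)}|{st.st_size}|{st.st_mtime_ns}"
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def sampled_content_fingerprint(path: str,
                                segment_size: int = SEGMENT_SIZE,
                                interval: int = INTERVAL) -> str:
    """Hash fixed-size segments read at regular offsets through the file."""
    digest = hashlib.sha256()
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        for offset in range(0, size, interval):
            f.seek(offset)                       # jump to the next sample point
            digest.update(f.read(segment_size))  # only this segment is read
    return digest.hexdigest()
```

In this sketch, `metadata_fingerprint` would play the role of ID1 or ID3 and `sampled_content_fingerprint` the role of ID2 or ID4. Because only the sampled segments are read, a change that falls entirely between two sample points leaves the content fingerprint unchanged, which is the trade-off that sampling makes for speed.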

Abstract

The invention discloses a file data replication method for quick deduplication. Before a first target file in a storage medium is replicated in full to a target file directory for the first time, a processor performs fingerprint calculation on the metadata information of the first target file to form ID1, and extracts the contents of a plurality of file data segments from the first target file at an interval of preset size and performs fingerprint calculation on them to form ID2. The processor likewise performs fingerprint calculation on the metadata information of a second target file to form ID3, and extracts the contents of a plurality of file data segments from the second target file at an interval of preset size and performs fingerprint calculation on them to form ID4. ID1 is then compared with ID3: if ID1 is the same as ID3, replication is skipped; if ID1 differs from ID3 and ID2 is the same as ID4, the metadata of the first target file is updated; and if ID2 differs from ID4, the second target file is replicated in full.
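
The three-way comparison in the abstract can be illustrated with a small decision function. This is a minimal sketch: the `Action` enum and the `decide` function are hypothetical names, the fingerprints are treated as plain strings, and checking ID1 against ID3 first is an assumption based on the order in which the abstract lists the cases.

```python
from enum import Enum


class Action(Enum):
    SKIP = "skip replication"
    UPDATE_METADATA = "update metadata of the first target file"
    FULL_COPY = "replicate the second target file in full"


def decide(id1: str, id2: str, id3: str, id4: str) -> Action:
    """Pick a replication action from the four fingerprints.

    id1/id3: metadata fingerprints of the first/second target file.
    id2/id4: sampled-content fingerprints of the first/second target file.
    """
    if id1 == id3:
        # Metadata fingerprints match: treat the file as unchanged.
        return Action.SKIP
    if id2 == id4:
        # Metadata differs but sampled content matches: refresh metadata only.
        return Action.UPDATE_METADATA
    # Sampled content differs: copy the whole file again.
    return Action.FULL_COPY
```

For example, `decide("m1", "c1", "m2", "c1")` returns `Action.UPDATE_METADATA`, since only the metadata fingerprint changed.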

Description

Technical field

[0001] The invention relates to a data replication method, in particular to a file data replication method for quick deduplication.

Background

[0002] Existing file data replication mainly relies on two methods: one deduplicates at the level of the entire file, and the other deduplicates the data blocks inside a file. The former refers to improvements on the whole-file deduplication algorithm. The traditional algorithm is called whole file detection (WFD). WFD uses the file as the granularity for finding duplicate data: a fingerprint (hash) is first calculated over the entire file, and the value is then compared with the hash values of files that have already been stored. If an identical value is found, the file is simply replaced with a pointer; if not, the entire file is transferred. The latter splits the file into smaller data segments and performs fingerprint calculation on the contents of the data...
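
For comparison, the whole file detection (WFD) approach described in the background could be sketched as below. This is an assumption-laden illustration: SHA-256 stands in for the unspecified hash, the in-memory `index` dictionary stands in for whatever store holds the known hashes, and `whole_file_hash` and `needs_transfer` are hypothetical names.

```python
import hashlib


def whole_file_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash every byte of the file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_transfer(path: str, index: dict) -> bool:
    """Return True if the file must be transferred, False if it is a duplicate.

    `index` maps a whole-file hash to the path of the stored copy, so a
    duplicate can be represented by a pointer to that copy.
    """
    h = whole_file_hash(path)
    if h in index:
        return False      # identical hash already stored: keep only a pointer
    index[h] = path       # new content: record it and transfer the whole file
    return True
```

Unlike a sampled fingerprint, this reads the entire file on every check, which is the detection cost that the problem statement above attributes to large, frequently copied files.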

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 朱立谷, 李强
Owner: COMMUNICATION UNIVERSITY OF CHINA