Index generation method and device for repeated data deletion

A deduplication index generation technology, applied in electrical digital data processing, special data processing applications, instruments, and related fields. It addresses the problem of an excessive amount of index information, with the effects of saving storage space, reducing the amount of index information, and reducing read/write pressure.

Active Publication Date: 2014-03-05
HUAWEI TECH CO LTD
Cites: 5; Cited by: 14

AI Technical Summary

Problems solved by technology

[0008] Embodiments of the present invention provide an index generation method and device for data deduplication, to solve the problem of excessive index information in the prior art.

Method used

Figure 1 is a flow chart of the deduplication index generation method of Embodiment 1; Figure 2 is a flow chart of the method of Embodiment 2; Figure 3 is a flow chart of the method of Embodiment 3.

Examples

Embodiment 1

[0043] The first embodiment of the present invention provides an index generation method for deduplication. As shown in the flow chart of Figure 1, the method includes:

[0044] Step 101: Receive a data stream composed of a plurality of data slices, where each data slice corresponds to a fingerprint and a number, and the numbers are in the same order as the data slices in the data stream;

[0045] Step 102: When two or more data slices have adjacent numbers, and the data slices in the single-instance library that correspond to their fingerprints also have adjacent data slice IDs, generate one piece of merged index information from the two or more data slices with adjacent numbers;

[0046] The single-instance library includes a plurality of data units; each data unit stores a data slice and the fingerprint of that data slice, and the data...
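The merging rule of Steps 101 and 102 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the single-instance library is modeled as a plain dict from fingerprint to data slice ID, every fingerprint is assumed to be present in it already, "adjacent" is taken to mean that both the number and the slice ID increase by exactly 1, and the hypothetical `MergedIndexEntry` record (first number, first slice ID, count) stands in for the merged index information, whose actual format is not given in this excerpt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MergedIndexEntry:
    """Hypothetical merged index record covering `count` consecutive data slices."""
    first_number: int    # number of the first data slice in the run
    first_slice_id: int  # data slice ID of its match in the single-instance library
    count: int           # how many consecutive data slices the record covers


def build_merged_index(stream, library):
    """Merge runs whose stream numbers and library slice IDs are both consecutive.

    stream:  list of (number, fingerprint) pairs, sorted by number
    library: dict mapping fingerprint -> data slice ID (assumed to contain every fingerprint)
    """
    entries = []
    for number, fingerprint in stream:
        slice_id = library[fingerprint]
        if entries:
            last = entries[-1]
            # Extend the current run only if BOTH the number and the slice ID are adjacent
            if (number == last.first_number + last.count
                    and slice_id == last.first_slice_id + last.count):
                entries[-1] = MergedIndexEntry(
                    last.first_number, last.first_slice_id, last.count + 1)
                continue
        # Otherwise start a new record for this data slice
        entries.append(MergedIndexEntry(number, slice_id, 1))
    return entries


stream = [(1, "fA"), (2, "fB"), (3, "fC"), (4, "fD"), (5, "fE"), (6, "fF")]
library = {"fA": 100, "fB": 101, "fC": 102, "fD": 57, "fE": 200, "fF": 201}
for entry in build_merged_index(stream, library):
    print(entry)
```

For this six-slice stream, whose slices map to IDs 100, 101, 102, 57, 200 and 201, the sketch produces three records, covering (1, 100, 3), (4, 57, 1) and (5, 200, 2), instead of six per-slice records.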

Embodiment 2

[0055] To further refine and supplement the method of the first embodiment, the second embodiment of the present invention provides an index generation method for deduplication that solves the problem of excessive index information. As shown in the flow chart of Figure 2, the method includes:

[0056] Step 201: Receive a data stream composed of multiple data slices, where each data slice corresponds to a fingerprint and a serial number;

[0057] For example, a data stream F of about 50 KB contains 6 data slices with an average length of about 8 KB. The data slices are numbered 1, 2, 3, 4, 5, and 6 in the order in which they appear in the data stream: data slice 1 (7 KB), data slice 2 (9 KB), data slice 3 (12 KB), data slice 4 (4 KB), data slice 5 (10 KB), and data slice 6 (8 KB). The fingerprint of each data slice identifies that data slice. The data ...
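The arithmetic of this example can be checked with a short Python sketch. The byte contents are hypothetical placeholders, "K" is assumed to mean 1024 bytes, and SHA-1 via `hashlib.sha1` is an assumed fingerprint function; the excerpt only says that a fingerprint identifies its data slice.

```python
import hashlib

KB = 1024  # assumption: "K" in the example means 1024 bytes

# Slice sizes from the example, in stream order (data slices 1..6)
sizes_kb = [7, 9, 12, 4, 10, 8]

# Hypothetical placeholder contents: each slice is filled with a distinct byte value
slices = [bytes([i]) * (size * KB) for i, size in enumerate(sizes_kb, start=1)]

# Number each slice in stream order and fingerprint it (SHA-1 is an assumption)
stream = [(number, hashlib.sha1(data).hexdigest())
          for number, data in enumerate(slices, start=1)]

total_kb = sum(sizes_kb)               # total size of data stream F
average_kb = total_kb / len(sizes_kb)  # average slice length

print(total_kb, round(average_kb, 2))  # prints 50 8.33
```

The six sizes sum to exactly 50 KB, and the average of about 8.3 KB matches the "about 8K" stated in the example.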

Embodiment 3

[0081] Building on the method of Embodiment 1 or Embodiment 2, and to further reduce the amount of index information when the data slices in the single-instance library that correspond to the data slices of the data stream are not referenced by index information created for any other data stream, the third embodiment of the present invention provides an index generation method for deduplication. As shown in the flow chart of Figure 3, the method includes:

[0082] Step 301: Receive a data stream composed of multiple data slices, wherein each data slice corresponds to a fingerprint, and each data slice corresponds to a serial number;

[0083] Step 301 is the same as or similar to step 201 of the second embodiment and is not repeated here.

[0084] Step 302: Search the single-instance library according to the fingerprint of each data slice of the data stream, and find the data slices of the data stream whose fingerprints are not fou...
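The lookup in Step 302 can be sketched as a partition of the stream's data slices into those whose fingerprints are found in the single-instance library and those that are not. The dict-based library and the function name `partition_by_lookup` are hypothetical; what Step 302 does with the unfound slices is cut off in this excerpt and is not modeled here.

```python
def partition_by_lookup(stream, library):
    """Split data slices by whether their fingerprint is already in the library.

    stream:  list of (number, fingerprint) pairs, in stream order
    library: dict mapping fingerprint -> data slice ID (hypothetical model)
    Returns (found, not_found): lists of data slice numbers, in stream order.
    """
    found, not_found = [], []
    for number, fingerprint in stream:
        if fingerprint in library:
            found.append(number)
        else:
            not_found.append(number)
    return found, not_found


stream = [(1, "fA"), (2, "fB"), (3, "fC")]
library = {"fB": 7}
print(partition_by_lookup(stream, library))  # prints ([2], [1, 3])
```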

Abstract

The invention discloses an index generation method and device for data deduplication. The method comprises: receiving a data stream composed of a plurality of data slices, where each data slice corresponds to a fingerprint and a serial number, and the serial numbers are in the same order as the data slices in the data stream; and, when two or more data slices have adjacent serial numbers and the data slices in the single-instance library that correspond to their fingerprints also have adjacent data slice IDs, generating one piece of merged index information from those data slices. Generating one piece of merged index information for two or more data slices with adjacent serial numbers reduces the number of pieces of index information, saves storage space, and lowers the read/write pressure when the data stream is restored.

Description

Technical field

[0001] The invention relates to the technical field of data storage, and in particular to an index generation method and device for data deduplication.

Background art

[0002] At present, with the development of global informatization, enterprise data is growing explosively, and data backup has become an important means for enterprises to prevent data loss. Data deduplication is a data reduction technique that reduces the storage space consumed by redundant data in a storage system.

[0003] The index generation method for deduplicated data provided by the prior art includes:

[0004] Step 1: Receive a data stream composed of multiple data slices, each data slice corresponding to a fingerprint;

[0005] Step 2: Search the single-instance library according to the fingerprint of each data slice of the data stream, and save the data slices of the data stream that are not found in the single-instance library into the single instanc...
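For contrast with a merged index, the prior-art baseline can be sketched as below. The excerpt is cut off after Step 2, so the one-record-per-slice index built here is an assumption consistent with the stated problem of excessive index information; the dict-based library, sequential ID allocation via `next_id`, and the function name `prior_art_index` are hypothetical.

```python
def prior_art_index(stream, library, next_id):
    """Baseline sketch: store unseen slices, then index every slice individually.

    stream:  list of (number, fingerprint) pairs, in stream order
    library: dict mapping fingerprint -> data slice ID; updated in place (hypothetical)
    next_id: first unused data slice ID in the library (assumed sequential allocation)
    Returns a list of (number, slice_id) index records, one per data slice.
    """
    index = []
    for number, fingerprint in stream:
        if fingerprint not in library:
            # Step 2: a slice not found in the single-instance library is saved into it
            library[fingerprint] = next_id
            next_id += 1
        # Every data slice, duplicate or new, gets its own index record
        index.append((number, library[fingerprint]))
    return index


library = {"fB": 3}
records = prior_art_index([(1, "fA"), (2, "fB"), (3, "fC")], library, next_id=10)
print(records)  # prints [(1, 10), (2, 3), (3, 11)]
```

Under this baseline, a stream of N data slices always yields N index records, even when consecutive slices are stored next to each other in the library.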

Application Information

Patent type and authority: Application (China)
IPC (8th edition): G06F17/30
CPC: G06F16/2272, G06F16/24556
Inventor: 刘先刚 (Liu Xiangang)
Owner: HUAWEI TECH CO LTD