Duplicate data retrieval method and device

A duplicate data retrieval technology in the storage field. It addresses the low efficiency of duplicate block queries, which arises because the single-instance library is too large to fit entirely in memory and which degrades the overall performance of data deduplication.

Status: Inactive. Publication date: 2016-05-25
HUAWEI TECH CO LTD
Cites: 4. Cited by: 0.

AI Technical Summary

Problems solved by technology

[0003] The single-instance library is usually too large to fit in memory, so it is usually stored on disk. As a result, querying whether a block is a duplicate requires frequent disk accesses. Because disk access is slow, duplicate block queries are inefficient, which degrades the overall performance of data deduplication.



Embodiment Construction

[0070] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative efforts fall within the protection scope of the present invention.

[0071] FIG. 1 is a flowchart of a duplicate data retrieval method according to an embodiment of the present invention. As shown in FIG. 1, the method of this embodiment includes the following steps:

[0072] Step 101: Divide the received data into blocks to obtain at least two data blocks.
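The visible text does not say how the data is divided in Step 101. As a hypothetical illustration only, here is a minimal fixed-size chunker in Python; the function name split_into_blocks and the 4 KiB default block size are assumptions, and content-defined (variable-size) chunking is a common alternative in deduplication systems.

```python
def split_into_blocks(data: bytes, block_size: int = 4096) -> list[bytes]:
    """Split data into consecutive fixed-size blocks; the last block may be shorter.

    Hypothetical helper: the patent text shown here does not specify the chunking method.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Slice the input at every multiple of block_size.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


if __name__ == "__main__":
    blocks = split_into_blocks(b"a" * 10000)
    print([len(b) for b in blocks])  # [4096, 4096, 1808]
```

Fixed-size blocks are cheap to compute, but an insertion near the start of a file shifts every later boundary, which is why content-defined chunking is often preferred when the input changes incrementally.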

[0073] The execution subject of this embodim...



Abstract

Provided are a duplicate data retrieval method and device. The method comprises: dividing received data into blocks to obtain at least two data blocks; grouping the at least two data blocks to obtain at least one data group; and, for each data group, applying a similarity hash algorithm to the data blocks in the group to obtain a hash value of the data group, acquiring from a hash value storage table a first hash value whose similarity to the hash value of the data group is greater than or equal to a first similarity threshold, and, if the similarity between the hash value of the data group and the first hash value is greater than or equal to a preset second similarity threshold, performing duplicate block retrieval on the data blocks in the data group. The technical solution of the present invention improves the efficiency of duplicate block retrieval, thereby improving the overall performance of data deduplication.
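The abstract names the steps but not their parameters. The Python sketch below is one possible reading under stated assumptions: 64-bit block fingerprints taken from SHA-256, a Charikar-style simhash standing in for the "similarity hash algorithm", similarity measured as the fraction of agreeing bits, fixed-size groups of four blocks, and threshold values of 0.75 and 0.9. All function names (block_fingerprint, group_blocks, simhash, find_first_hash, retrieve_duplicates) and the in-memory block_index set are hypothetical; the patent text shown here does not specify them.

```python
import hashlib
from typing import Optional

BITS = 64  # assumed width of each block fingerprint and of the group hash


def block_fingerprint(block: bytes) -> int:
    """64-bit fingerprint: the first 8 bytes of the block's SHA-256 digest (assumption)."""
    return int.from_bytes(hashlib.sha256(block).digest()[:8], "big")


def group_blocks(blocks: list[bytes], group_size: int = 4) -> list[list[bytes]]:
    """Split the block list into consecutive groups of group_size (hypothetical policy)."""
    return [blocks[i:i + group_size] for i in range(0, len(blocks), group_size)]


def simhash(group: list[bytes]) -> int:
    """Charikar-style simhash of a group, weighting every block fingerprint equally."""
    counts = [0] * BITS
    for block in group:
        fp = block_fingerprint(block)
        for bit in range(BITS):
            # Vote +1 if this fingerprint has the bit set, -1 otherwise.
            counts[bit] += 1 if (fp >> bit) & 1 else -1
    value = 0
    for bit, count in enumerate(counts):
        if count > 0:
            value |= 1 << bit
    return value


def similarity(a: int, b: int) -> float:
    """Fraction of bit positions on which two hashes agree (1.0 means identical)."""
    return 1.0 - bin(a ^ b).count("1") / BITS


def find_first_hash(group_hash: int, hash_table: set[int],
                    first_threshold: float) -> Optional[int]:
    """Return the most similar stored hash with similarity >= first_threshold, else None.

    A linear scan stands in for whatever lookup structure the patent actually uses.
    """
    best, best_sim = None, -1.0
    for stored in hash_table:
        sim = similarity(group_hash, stored)
        if sim >= first_threshold and sim > best_sim:
            best, best_sim = stored, sim
    return best


def retrieve_duplicates(blocks: list[bytes], hash_table: set[int], block_index: set[int],
                        group_size: int = 4, first_threshold: float = 0.75,
                        second_threshold: float = 0.9) -> list[Optional[list[bool]]]:
    """Per group: duplicate flags if retrieval was performed, otherwise None."""
    results: list[Optional[list[bool]]] = []
    for group in group_blocks(blocks, group_size):
        group_hash = simhash(group)
        first = find_first_hash(group_hash, hash_table, first_threshold)
        if first is not None and similarity(group_hash, first) >= second_threshold:
            # The group resembles stored data: look up each block's fingerprint.
            results.append([block_fingerprint(b) in block_index for b in group])
        else:
            # Not similar enough: duplicate block retrieval is skipped for this group.
            results.append(None)
    return results
```

For example, after storing simhash(blocks) in hash_table and each block_fingerprint in block_index, calling retrieve_duplicates(blocks, hash_table, block_index) on the same four blocks yields [[True, True, True, True]], while an empty hash_table yields [None]. Presumably the efficiency gain comes from running the per-block lookup only for groups whose group-level hash resembles something already stored; the abstract does not say how skipped groups are handled afterwards.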

Description

Technical field

[0001] The invention relates to storage technology, and in particular to a duplicate data retrieval method and device.

Background

[0002] Data deduplication is a data reduction technology designed to reduce the storage capacity used in a storage system or the amount of data transmitted over a network. It is widely used in data backup and wide area network (WAN) data transmission scenarios. The deduplication process is as follows: the input data is divided into blocks, a hash value is calculated for each block, and the calculated hash value is looked up in a single-instance library to determine whether the block is a duplicate. If the block is a duplicate, neither the block nor its hash value is stored in the single-instance library again, which reduces the amount of data.

[0003] The single-instance library is usually too large to fit in memory and is therefore usually stored on disk. In this way, when queryi...
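The deduplication flow described in [0002] can be sketched in Python as follows. This assumes SHA-1 as the block hash and uses an in-memory dict as a stand-in for the single-instance library, which per [0003] would normally live on disk. The function name deduplicate and the returned list of hashes (used here to reconstruct the input) are illustrative assumptions, not part of the patent text.

```python
import hashlib


def deduplicate(blocks: list[bytes], single_instance_library: dict[str, bytes]) -> list[str]:
    """Store only blocks whose hash is not already in the single-instance library.

    single_instance_library maps a block's hex digest to the block bytes
    (an in-memory stand-in for the on-disk library). Returns the digest of
    every input block, in order, so the input can be reconstructed later.
    """
    recipe = []
    for block in blocks:
        digest = hashlib.sha1(block).hexdigest()
        if digest not in single_instance_library:
            # New block: store it together with its hash.
            single_instance_library[digest] = block
        # Duplicate or not, the input refers to the block by its hash.
        recipe.append(digest)
    return recipe


if __name__ == "__main__":
    library: dict[str, bytes] = {}
    recipe = deduplicate([b"x", b"y", b"x"], library)
    print(len(library), len(recipe))  # 2 3
    print(b"".join(library[d] for d in recipe))  # b'xyx'
```

Every membership test in this sketch is an in-memory dict lookup; the problem stated in [0003] is that, once the library outgrows memory, each such test may become a disk access.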


Application Information

Patent type and authority: Patent (China)
IPC (8): G06F17/30; G06F3/06
CPC: G06F16/1748; G06F16/152
Inventor: 覃强
Owner: HUAWEI TECH CO LTD