Duplicated data search method and equipment

A duplicate data search technology, applied in the storage field, that addresses two problems: the single-instance library is too large to fit entirely in memory, and duplicate block queries are therefore inefficient.

Inactive Publication Date: 2013-07-03
HUAWEI TECH CO LTD
Cites: 4; Cited by: 22

AI Technical Summary

Problems solved by technology

[0003] The single-instance library is usually too large to fit into memory, so it is usually placed on disk. In this way, when querying whether a block is a duplicate block, it ...



Embodiment Construction

[0070] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention, without creative effort, fall within the protection scope of the present invention.

[0071] FIG. 1 is a flowchart of a duplicate data retrieval method provided by an embodiment of the present invention. As shown in FIG. 1, the method of this embodiment includes:

[0072] Step 101: Divide the received data into blocks to obtain at least two data blocks.

[0073] The execution subject of this embod...
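
As an illustration of Step 101, the following is a minimal sketch of dividing received data into blocks. The excerpt does not say how block boundaries are chosen, so fixed-size blocking is an assumption made here; the function name `split_into_blocks` and the default `block_size` of 4096 bytes are hypothetical.

```python
def split_into_blocks(data: bytes, block_size: int = 4096) -> list[bytes]:
    """Split data into consecutive blocks of at most block_size bytes.

    Fixed-size blocking is an assumption for illustration only; the
    source text does not specify the blocking strategy.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


if __name__ == "__main__":
    received = bytes(range(256)) * 40  # 10240 bytes of sample input
    blocks = split_into_blocks(received)
    print(len(blocks), [len(b) for b in blocks])
```

Any input longer than one block yields the "at least two data blocks" that the step requires; the last block may be shorter than the others.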


Abstract

One embodiment of the invention provides a duplicate data search method and device. The method comprises: dividing received data into blocks to obtain at least two data blocks; grouping the at least two data blocks to obtain at least one data packet; performing a similarity hash calculation on each data packet to obtain the hash value of the data packet; and obtaining, from a hash value storage list, a first hash value whose similarity to the hash value of the data packet is greater than or equal to a first similarity threshold. If the similarity between the hash value of the data packet and the first hash value is greater than or equal to a second similarity threshold, a duplicate block search is performed on the data blocks in the data packet. The technical scheme of the invention improves the efficiency of duplicate block queries and the overall performance of data deduplication.
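
The two-threshold check described in the abstract can be sketched as follows. The excerpt does not name the similarity hash or the similarity measure, so this sketch assumes a SimHash-style 64-bit fingerprint compared by the fraction of matching bits. All function names, the threshold values, and the choice to pick the most similar stored hash are hypothetical.

```python
import hashlib
from typing import Optional

BITS = 64  # fingerprint width; an assumption, not taken from the source


def block_hash64(block: bytes) -> int:
    """64-bit hash of one block (first 8 bytes of its SHA-1 digest)."""
    return int.from_bytes(hashlib.sha1(block).digest()[:8], "big")


def similarity_hash(blocks: list[bytes]) -> int:
    """SimHash-style fingerprint of a data packet (a group of blocks)."""
    votes = [0] * BITS
    for block in blocks:
        h = block_hash64(block)
        for bit in range(BITS):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(BITS) if votes[bit] > 0)


def similarity(a: int, b: int) -> float:
    """Fraction of bit positions on which two fingerprints agree."""
    return 1.0 - bin(a ^ b).count("1") / BITS


def find_first_hash(packet_hash: int, stored_hashes: list[int],
                    first_threshold: float) -> Optional[int]:
    """Return the most similar stored hash at or above first_threshold."""
    best, best_sim = None, first_threshold
    for stored in stored_hashes:
        sim = similarity(packet_hash, stored)
        if sim >= best_sim:
            best, best_sim = stored, sim
    return best


def should_search_duplicates(packet: list[bytes], stored_hashes: list[int],
                             first_threshold: float = 0.6,
                             second_threshold: float = 0.8) -> bool:
    """True if the packet's blocks should go through duplicate block search."""
    packet_hash = similarity_hash(packet)
    first_hash = find_first_hash(packet_hash, stored_hashes, first_threshold)
    if first_hash is None:
        return False
    return similarity(packet_hash, first_hash) >= second_threshold
```

Only packets that pass both thresholds proceed to the per-block lookup in this sketch. What happens to packets that fail either check is not covered by the abstract, so the sketch simply returns `False` for them.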

Description

Technical field

[0001] The invention relates to storage technology, and in particular to a duplicate data retrieval method and device.

Background

[0002] Data deduplication is a data reduction technology designed to reduce the storage capacity used in a storage system or the amount of data transmitted over a network. It is widely used in data backup and wide area network data transmission scenarios. The process of data deduplication is as follows: divide the input data into blocks, calculate the hash value of each block, and use the calculated hash value to search the single-instance library to determine whether the block is a duplicate block. If it is a duplicate block, the block and its hash value are not stored in the single-instance library, which achieves the data reduction.

[0003] The single-instance library is usually too large to fit into memory, so it is usually placed on disk. In this way, when queryi...
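
The baseline deduplication process described in the background (hash each block, look it up in the single-instance library, store only new blocks) can be sketched as follows. This is a minimal illustration that assumes SHA-256 as the block hash and an in-memory `dict` standing in for the single-instance library; neither choice comes from the source, and the function name `deduplicate` is hypothetical.

```python
import hashlib


def deduplicate(blocks: list[bytes], store: dict[str, bytes]) -> list[str]:
    """Store each unique block once and return the hash sequence of the input.

    `store` stands in for the single-instance library: it maps a block's
    hash to the block itself. The returned list of hashes is enough to
    rebuild the original data from the store.
    """
    recipe = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:  # not a duplicate: keep block and hash
            store[digest] = block
        recipe.append(digest)    # duplicates are only referenced
    return recipe


if __name__ == "__main__":
    library: dict[str, bytes] = {}
    recipe = deduplicate([b"x" * 8, b"y" * 8, b"x" * 8], library)
    print(len(recipe), "blocks in,", len(library), "blocks stored")
```

In this sketch every lookup is an in-memory dictionary access. The problem stated in [0003] arises when the library is too large for memory and resides on disk.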


Application Information

IPC(8): G06F17/30, G06F3/06
CPC: G06F16/1748, G06F16/152
Inventor: 覃强
Owner: HUAWEI TECH CO LTD