A data deduplication method and device

A data deduplication technology that operates on data blocks, applied in the computer field. It addresses the problem of reducing the capacity of the sample comparison library, and it helps avoid network congestion and reduce network operating costs.

Active Publication Date: 2017-11-17
HUAWEI TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] Embodiments of the present invention provide a data deduplication method and device that reduce the capacity of the sample comparison library while maintaining the deduplication rate.


Examples


Embodiment Construction

[0060] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0061] Figure 1 is a schematic flowchart of the first embodiment of the data deduplication method of the present invention. In this embodiment, the method includes:

[0062] S101. Receive data to be saved sent by a user, divide the data to be saved into multiple data blocks according to a preset unit, and calculate a fingerprint of each data block in the multiple data blocks.
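As a minimal sketch of step S101, the following shows one way the splitting and fingerprinting could look. The function names, the 4096-byte default block size, and the use of SHA-256 as the fingerprint are assumptions made for illustration; the source does not specify a hash function or a value for the preset unit.

```python
import hashlib


def split_into_blocks(data: bytes, block_size: int) -> list:
    """Divide data into consecutive blocks of block_size bytes.

    The last block may be shorter when len(data) is not a multiple
    of block_size. block_size stands in for the "preset unit".
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def fingerprint(block: bytes) -> str:
    """Compute a fingerprint for one block (SHA-256 is an assumption)."""
    return hashlib.sha256(block).hexdigest()


def fingerprint_blocks(data: bytes, block_size: int = 4096) -> list:
    """Return (fingerprint, block) pairs for every block of the data."""
    return [(fingerprint(b), b) for b in split_into_blocks(data, block_size)]


if __name__ == "__main__":
    pairs = fingerprint_blocks(b"a" * 10, block_size=4)
    for fp, block in pairs:
        print(fp[:12], block)
```

Identical blocks yield identical fingerprints, which is what makes the later fingerprint comparison a stand-in for comparing block contents.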

[0063] Specifically, the preset unit here can be set according to the storage con...


Abstract

An embodiment of the invention discloses a data deduplication method. The method includes: receiving to-be-saved data sent by a user, dividing the to-be-saved data into multiple data blocks according to a preset unit, and computing a fingerprint of each of the multiple data blocks; determining at least one comparison user from a saved user library according to user characteristics of the user, wherein the at least one comparison user shares at least one user characteristic with the user; taking the fingerprints corresponding to the at least one comparison user as a sample comparison library; comparing the fingerprint of each to-be-compared data block among the multiple data blocks with the fingerprints in the sample comparison library; and storing the differing data blocks among the multiple data blocks, wherein the fingerprint of each differing data block is different from the fingerprints corresponding to the at least one comparison user. The embodiment of the invention further discloses a data deduplication device. With the data deduplication method and device, the capacity of the sample comparison library is reduced while the deduplication rate is maintained.
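The flow summarized in the abstract can be sketched roughly as follows. This is an illustrative sketch only: the `UserRecord` structure, the string encoding of user characteristics, the in-memory dictionary used as the user library, and SHA-256 as the fingerprint are all assumptions, not details taken from the patent.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class UserRecord:
    """Hypothetical entry in the saved user library."""
    characteristics: set                              # e.g. {"company:x", "dept:storage"}
    fingerprints: set = field(default_factory=set)    # fingerprints of stored blocks


def select_comparison_users(user_library: dict, characteristics: set) -> list:
    """Pick users that share at least one characteristic with the uploader."""
    return [uid for uid, rec in user_library.items()
            if rec.characteristics & characteristics]


def build_sample_library(user_library: dict, comparison_users: list) -> set:
    """Union of the fingerprints belonging to the comparison users only."""
    sample = set()
    for uid in comparison_users:
        sample |= user_library[uid].fingerprints
    return sample


def differing_blocks(blocks: list, sample_library: set) -> dict:
    """Map fingerprint -> block for blocks absent from the sample library."""
    result = {}
    for block in blocks:
        fp = hashlib.sha256(block).hexdigest()  # assumed fingerprint function
        if fp not in sample_library:
            result[fp] = block
    return result


if __name__ == "__main__":
    sha = lambda b: hashlib.sha256(b).hexdigest()
    library = {
        "alice": UserRecord({"dept:storage"}, {sha(b"AAAA")}),
        "bob": UserRecord({"dept:sales"}, {sha(b"CCCC")}),
    }
    users = select_comparison_users(library, {"dept:storage", "company:x"})
    sample = build_sample_library(library, users)
    to_store = differing_blocks([b"AAAA", b"BBBB"], sample)
    print(users, list(to_store.values()))
```

Because the sample library is built only from users who share a characteristic with the uploader, it stays smaller than a library covering every stored fingerprint, which is the capacity reduction the abstract describes.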

Description

Technical field

[0001] The invention relates to the field of computers, and in particular to a data deduplication method and device.

Background

[0002] With the adoption of cloud computing technology, different users can upload their own data to a server, and previously isolated information islands can be linked together through cloud computing. However, data is often duplicated between different users, especially similar users. For example, the proportion of duplicate data is higher between users in the same field, the same company, or the same department. Storing duplicate data not only wastes storage resources but also increases the amount of data transmitted over the network, which can easily cause network congestion and increase network operating costs.

[0003] In order to reduce the repeated storage of the same data, in the prior art, the saved data is divided into data blocks of the same capacity according to the preset unit and the finge...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/1748
Inventor: 周景才
Owner: HUAWEI TECH CO LTD