A parallel data deduplication method and system

A data deduplication and data block technology, applied in the fields of electronic digital data processing, special data processing applications, instruments, etc. It addresses the inability to achieve 100% deduplication, limited network transmission bandwidth, and deduplication processing performance bottlenecks, with the effect of good scalability, maximum use of existing resources, and removal of the processing performance bottleneck.

Active Publication Date: 2017-02-15
HUAZHONG UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0004] In view of the above defects or improvement needs of the prior art, the present invention provides a parallel data deduplication method. Its purpose is to solve the technical problems of existing single-node deduplication systems: a deduplication processing performance bottleneck and limited scalability, so that performance cannot scale linearly as the system grows; constraints imposed by network transmission bandwidth; and the inability to achieve 100% deduplication.



Embodiment Construction

[0056] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not constitute a conflict with each other.

[0057] As shown in Figure 1, the computer cluster of the present invention includes multiple clients, query nodes and multiple data nodes. The clients, query nodes and data nodes are connected through switches, and all three can communicate with each other.

[0058] As shown in Figure 2, the parallel data deduplication method of the present invention comprises:

[0059] 1. ...


Abstract

The invention discloses a parallel data deduplication method. The method comprises the following steps: a client first partitions the data to be deduplicated into blocks, calculates the hash fingerprint of each block using a hash function, and then sends the hash fingerprints to different deduplication server nodes, which check the fingerprints for duplicates. If a fingerprint is confirmed by comparison to be a duplicate, only the metadata on the query server and the fingerprint counter on the deduplication server node need to be updated. If a fingerprint is confirmed by comparison to belong to a new data block, the data block is transmitted to the deduplication server, and the metadata in the fingerprint database and the metadata on the query server are updated. The method allows deduplication nodes to be scaled out: performance requirements can be met by adding nodes as needed, the parallelism across multiple nodes is used to effectively improve the performance of the deduplication system, and efficient and reliable services can be provided.
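The flow described in the abstract can be illustrated with a minimal, single-process sketch. Everything named here is an assumption made for illustration and does not come from the patent: fixed-size chunking, SHA-1 as the hash function, routing a fingerprint to a node by taking it modulo the node count, and the class and function names (`DedupNode`, `QueryServer`, `backup`). The nodes are visited one after another in this sketch; in a real deployment the lookups on different nodes would run in parallel.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # hypothetical, tiny block size chosen only for the demo


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(block: bytes) -> str:
    """Hash fingerprint of a block; SHA-1 is an assumed choice."""
    return hashlib.sha1(block).hexdigest()


@dataclass
class DedupNode:
    """Hypothetical deduplication server node: fingerprint counters plus block store."""
    refcount: dict = field(default_factory=dict)
    blocks: dict = field(default_factory=dict)

    def is_duplicate(self, fp: str) -> bool:
        return fp in self.refcount

    def add_reference(self, fp: str) -> None:
        self.refcount[fp] += 1

    def store(self, fp: str, block: bytes) -> None:
        self.blocks[fp] = block
        self.refcount[fp] = 1


@dataclass
class QueryServer:
    """Hypothetical query server: maps a file name to its ordered fingerprint list."""
    recipes: dict = field(default_factory=dict)

    def record(self, name: str, fps: list) -> None:
        self.recipes[name] = fps


def route(fp: str, node_count: int) -> int:
    """Assumed routing rule: fingerprint value modulo the number of nodes."""
    return int(fp, 16) % node_count


def backup(name: str, data: bytes, nodes: list, query: QueryServer) -> int:
    """Deduplicate one file; return how many bytes of block data were transferred."""
    sent = 0
    fps = []
    for block in chunk(data):
        fp = fingerprint(block)
        node = nodes[route(fp, len(nodes))]
        if node.is_duplicate(fp):
            node.add_reference(fp)   # duplicate: only the counter changes
        else:
            node.store(fp, block)    # new block: the data itself is sent
            sent += len(block)
        fps.append(fp)
    query.record(name, fps)          # file metadata is updated in both cases
    return sent
```

With three nodes, backing up `b"AAAABBBBAAAA"` transfers 8 bytes rather than 12, because the third block repeats the first and only increments its counter. Because the same fingerprint always routes to the same node, each node can decide on duplicates from its own fingerprint table alone.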

Description

Technical field

[0001] The invention belongs to the technical field of computer storage, and more specifically relates to a parallel data deduplication method.

Background

[0002] With the development of information technology and the advancement of science and technology, the storage of massive information poses severe challenges to storage systems, and the pressure that PB-level or even EB-level information storage places on data centers keeps increasing. Because massive data contains a large amount of duplicate data, storing this duplicate data directly not only increases the burden on the storage system but also occupies valuable network bandwidth. Data deduplication technology eliminates duplication in data and keeps only a single copy, thereby reducing the physical storage space the data requires, improving storage efficiency, and reducing the network bandwidth consumed by transmitting duplicate data.

[0003] At present, many data...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/1752, G06F16/1767, G06F16/183
Inventor: 曹强, 万胜刚, 林川, 黄国强, 谢长生
Owner: HUAZHONG UNIV OF SCI & TECH