Fault-tolerant coding method, device and system for improving expandability of data deduplication system

A fault-tolerant coding and scalability technology, applied in the field of computer storage

Active Publication Date: 2020-10-27
HUAZHONG UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

In cloud environments, the ability to scale a cluster elastically as storage demand changes is an important feature, but the fault-tolerant coding used in existing deduplication systems struggles to deliver both high availability and high scalability.



Examples


Embodiment 1

[0043] A fault-tolerant coding method for improving the scalability of a data deduplication system, as shown in Figure 1, includes:

[0044] When Δk = 1 node is added to the data deduplication system, every k + Δk = 3 locality-associated containers are divided into one associated container group, and the coding scheme is extended from RS(2,2) to RS(3,2). The newly added node is node 4. As shown in Figure 1, three containers form an associated container group, denoted G;
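The grouping step above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `divide_into_groups` and `storage_overhead` are hypothetical, and the sketch assumes that containers adjacent in the input list are the locality-associated ones, which the text does not specify.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def divide_into_groups(containers: Sequence[T], k: int, delta_k: int) -> List[List[T]]:
    """Split an ordered container list into associated groups of k + delta_k.

    Assumption: neighbouring containers in the list are locality-associated.
    A trailing partial group, if any, is returned as-is.
    """
    size = k + delta_k
    return [list(containers[i:i + size]) for i in range(0, len(containers), size)]


def storage_overhead(k: int, m: int) -> float:
    """Raw-to-user storage ratio of one RS(k, m) stripe: (k + m) / k."""
    return (k + m) / k


# Embodiment 1 parameters: k = 2, delta_k = 1, so groups of three containers.
groups = divide_into_groups(["C1", "C2", "C3", "C4", "C5", "C6"], k=2, delta_k=1)
before = storage_overhead(2, 2)  # RS(2,2)
after = storage_overhead(3, 2)   # RS(3,2)
```

With these parameters the per-stripe storage ratio falls from 2.0 under RS(2,2) to 5/3 under RS(3,2), since the same two check blocks now protect three data blocks.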

[0045] In the associated container group G of Figure 1, one container, namely the third container, is selected as the container to be migrated, and its two data blocks D5 and D6 are migrated evenly to the newly added node. For each unmigrated container C in G, that is, the first or the second container, one data block is selected from each newly added node and combined with the data blocks in container C ...
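The migration and combination step for one group can be sketched as below. The sketch rests on assumptions the truncated text does not confirm: that the last Δk containers of the group are the ones migrated, that the first two containers hold D1, D2 and D3, D4, and that the j-th unmigrated container takes the j-th block from each new node. The function name `extend_group` is hypothetical.

```python
from typing import Dict, List, Tuple


def extend_group(
    group: List[List[str]], k: int, delta_k: int
) -> Tuple[Dict[int, List[str]], List[List[str]]]:
    """Return (blocks placed on each new node, new data stripes) for one group."""
    if len(group) != k + delta_k or any(len(c) != k for c in group):
        raise ValueError("expected k + delta_k containers of k data blocks each")

    # Assumption: the last delta_k containers are the ones chosen for migration.
    kept, migrated = group[:k], group[k:]

    # Spread the delta_k * k migrated blocks evenly: k blocks per new node.
    moved = [block for container in migrated for block in container]
    new_nodes = {n: moved[n * k:(n + 1) * k] for n in range(delta_k)}

    # Assumption: unmigrated container j takes block j from every new node,
    # so each migrated block joins exactly one new stripe of k + delta_k blocks.
    stripes = [
        kept[j] + [new_nodes[n][j] for n in range(delta_k)] for j in range(k)
    ]
    return new_nodes, stripes


# Embodiment 1: three containers, the third one (D5, D6) is migrated.
group_g = [["D1", "D2"], ["D3", "D4"], ["D5", "D6"]]
new_nodes, stripes = extend_group(group_g, k=2, delta_k=1)
```

Under these assumptions the new node holds D5 and D6, and the two widened stripes are (D1, D2, D5) and (D3, D4, D6). Only the Δk × k blocks of the migrated containers move; the blocks of the unmigrated containers stay where they are.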

Embodiment 2

[0069] A fault-tolerant coding device for improving the scalability of a data deduplication system, comprising: an associated container group division module, a data block migration module, an extended coding module, and a garbage collection module;

[0070] The associated container group division module is used, when 1 node is added to the data deduplication system, to divide every 3 locality-associated containers into one associated container group and to extend the coding scheme from RS(2,2) to RS(3,2);

[0071] The data block migration module is used, for each associated container group G, to select one of the containers as the container to be migrated and to evenly migrate the two data blocks in that container to the one newly added node;

[0072] The extended coding module is used, for each unmigrated container C in the associated container group G, to select one data block from each newly added node and combine it with the data blocks in container C to obtain 3 da...

Embodiment 3

[0076] A data deduplication system that includes the fault-tolerant coding device for improving the scalability of a data deduplication system provided in Embodiment 2.

[0077] In general, when the cluster is expanded, the fault-tolerant coding method, device and system provided by the invention scale far more efficiently than traditional intra-container coding. While improving scaling performance, they preserve the system's degraded-read and node-recovery performance, and they incur lower storage overhead than inter-container coding.


Abstract

The invention discloses a fault-tolerant coding method, device and system for improving the scalability of a data deduplication system, and belongs to the field of computer storage. The method comprises the following steps: when Δk nodes are added to a data deduplication system, dividing every k + Δk locality-associated containers into an associated container group, and extending the coding scheme from RS(k, m) to RS(k + Δk, m); for each associated container group G, evenly migrating the Δk × k data blocks held in Δk containers of G to the newly added nodes; for each unmigrated container C in G, selecting one data block from each newly added node and combining it with the data blocks in container C; calculating, according to RS(k + Δk, m), the check blocks PC1' to PCm' for the k + Δk combined data blocks, storing the check blocks PC1' to PCm' on the nodes, and forming a new stripe from the k + Δk combined data blocks and the check blocks PC1' to PCm'; and deleting each container's old check blocks from the nodes. The invention can effectively improve the scalability of a data deduplication system.
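The check-block calculation for a widened stripe can be illustrated with a small sketch for the m = 2 case. The abstract does not give the coding matrix, so this sketch substitutes a RAID-6-style P/Q construction over GF(2^8), a common Reed-Solomon-style code that tolerates two lost blocks; the field polynomial 0x11D, the generator 2, and the names `gf_mul`, `gf_pow2` and `encode_m2` are all assumptions made for illustration.

```python
from typing import List, Tuple


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow2(exponent: int) -> int:
    """Return 2 raised to `exponent` in GF(2^8)."""
    value = 1
    for _ in range(exponent):
        value = gf_mul(value, 2)
    return value


def encode_m2(blocks: List[bytes]) -> Tuple[bytes, bytes]:
    """Compute two check blocks (P, Q) for equally sized data blocks.

    P is the XOR of all blocks; Q weights block i by 2**i in GF(2^8).
    """
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all data blocks must have the same length")
    p = bytearray(size)
    q = bytearray(size)
    for i, block in enumerate(blocks):
        coeff = gf_pow2(i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


# Old stripe of container C under RS(2,2), then the widened RS(3,2) stripe
# after one migrated block (here called d5) has been combined with it.
d1, d2, d5 = b"AAAA", b"BBBB", b"EEEE"
old_p, old_q = encode_m2([d1, d2])
new_p, new_q = encode_m2([d1, d2, d5])
```

Once `new_p` and `new_q` are stored, the old check blocks `old_p` and `old_q` no longer protect any stripe and can be deleted, which corresponds to the final step of the method.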

Description

Technical field

[0001] The invention belongs to the field of computer storage, and more specifically relates to a fault-tolerant coding method, device and system for improving the scalability of a data deduplication system.

Background

[0002] With the rapid development of technologies such as cloud computing and big data, the explosive growth of stored data worldwide confronts modern data centers with two severe challenges at once: reducing storage costs and improving data reliability. For storage cost, the common industry solution is to reduce data redundancy and storage overhead through data deduplication. Specifically, deduplication first divides the backup file stream into a set of fixed-size or variable-size data blocks, packs variable-length data blocks into fixed-size containers, and then uses a hash algorithm to calculate a fingerprint for each block that uniquely represents it. A new block fingerprint i...
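The chunk-and-fingerprint idea in the background can be sketched as follows. This is a generic illustration of deduplication, not the system described in the patent: it assumes fixed-size chunking and SHA-1 fingerprints, uses an in-memory dictionary as the fingerprint index, and omits container packing. The names `chunk_fixed` and `deduplicate` are hypothetical.

```python
import hashlib
from typing import Dict, List


def chunk_fixed(data: bytes, size: int) -> List[bytes]:
    """Divide a byte stream into fixed-size data blocks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(data: bytes, size: int, index: Dict[str, bytes]) -> List[str]:
    """Store unseen blocks in `index` and return the fingerprint recipe for `data`."""
    recipe = []
    for chunk in chunk_fixed(data, size):
        fingerprint = hashlib.sha1(chunk).hexdigest()  # assumed hash algorithm
        if fingerprint not in index:
            index[fingerprint] = chunk  # only new content consumes storage
        recipe.append(fingerprint)
    return recipe


index: Dict[str, bytes] = {}
recipe = deduplicate(b"abcdabcdefgh", 4, index)
restored = b"".join(index[fp] for fp in recipe)
```

Here the stream yields three blocks but only two are stored, because the first two blocks are identical; the recipe of fingerprints is enough to rebuild the original stream.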


Application Information

IPC(8): G06F3/06, G06F11/10
CPC: G06F3/0641, G06F11/1048
Inventors: 胡燏翀, 冯丹, 周嘉伟
Owner HUAZHONG UNIV OF SCI & TECH