Efficient duplicate removal method for repeated redundant data in cloud storage system

A technology for cloud storage systems and redundant data, applied in data input/output processing, electrical digital data processing, instruments, and related fields. It addresses the poor deduplication effect and reduced storage performance of cluster deduplication systems, and achieves an efficient global deduplication rate, fewer fingerprint matching operations, and improved deduplication performance.

Active Publication Date: 2016-04-13
TSINGHUA UNIV
Cites: 2; Cited by: 43

AI Technical Summary

Problems solved by technology

In cluster deduplication, global deduplication across nodes would seriously degrade system storage performance because of the overall performance overhead it imposes, so deduplication is generally performed only on the data stored within each node.
Because node-local deduplication cannot detect duplicate data that has been routed to different nodes, the data routing strategy of a cluster deduplication system has a great impact on the overall deduplication effect of the system.
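The effect of routing on node-local deduplication can be illustrated with a small count. This is a minimal sketch under stated assumptions: the letters stand in for chunk fingerprints, each inner list is one routed unit of data, and the routing vectors are hypothetical examples rather than the patent's policy.

```python
from typing import Dict, List, Set


def stored_chunks_global(superblocks: List[List[str]]) -> int:
    """Chunks kept if a single global index could deduplicate across all nodes."""
    return len({fp for sb in superblocks for fp in sb})


def stored_chunks_node_local(superblocks: List[List[str]], route: List[int], num_nodes: int) -> int:
    """Chunks kept when each node deduplicates only the data routed to it."""
    per_node: Dict[int, Set[str]] = {n: set() for n in range(num_nodes)}
    for sb, node in zip(superblocks, route):
        per_node[node].update(sb)  # duplicates are only caught inside one node
    return sum(len(fps) for fps in per_node.values())


# Hypothetical workload: units 0 and 2 are identical, unit 1 overlaps with both.
superblocks = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"], ["e", "f", "g"]]
print(stored_chunks_global(superblocks))                       # 7
print(stored_chunks_node_local(superblocks, [0, 1, 0, 1], 2))  # 9: round-robin splits duplicates
print(stored_chunks_node_local(superblocks, [0, 0, 0, 1], 2))  # 7: similar data co-located
```

Round-robin routing stores 9 chunks against a global optimum of 7, while routing similar units to the same node recovers the global result without any cross-node fingerprint lookups.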




Embodiment Construction

[0026] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention.

[0027] An efficient deduplication method for redundant data in a cloud storage system according to an embodiment of the present invention will be described below with reference to the accompanying drawings.

[0028] First, as shown in Figure 2, the cloud storage system in this embodiment includes, for example, a cluster composed of multiple data storage servers and multiple clients, wherein the cluster of data storage servers includes a data server cluster composed of multiple data servers a...



Abstract

The invention proposes an efficient deduplication method for redundant data in a cloud storage system. The method comprises the following steps: multiple clients receive data uploaded by users, use the data superblock as the data routing unit, and extract from the data the routing feature fingerprints used for data routing; a metadata server and a data server cluster handle the clients' data routing requests according to a routing policy, in which the data server cluster performs similarity matching on the routing feature fingerprints to determine the similar routing nodes, and the metadata server determines the final data routing address according to a load balancing policy; and the clients interact with the corresponding data servers, and each data server that receives similar redundant data deduplicates it efficiently and quickly. With this method, the cloud storage system removes redundant data efficiently while retaining its high-performance, large-scale, and high-throughput properties, which increases disk utilization and reduces data management costs.
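The routing flow in the abstract can be sketched end to end in a single process. This is a hypothetical illustration, not the patented algorithm: the abstract does not say how routing feature fingerprints are extracted or which load balancing policy is used, so the sketch assumes (1) the k numerically smallest chunk fingerprints of a superblock serve as its routing features, (2) similarity is the number of those features a node already holds, and (3) the metadata server picks the least-loaded node among the most similar ones. The names `routing_features`, `choose_node`, and `route_and_store` are invented for this example, and SHA-1 is an assumed fingerprint function.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Set, Tuple


def chunk_fingerprint(chunk: bytes) -> str:
    """Fingerprint of one chunk (SHA-1 is an assumption)."""
    return hashlib.sha1(chunk).hexdigest()


def routing_features(fingerprints: List[str], k: int = 4) -> Set[str]:
    """Assumed feature extraction: the k smallest fingerprints of a superblock."""
    return set(sorted(set(fingerprints))[:k])


@dataclass
class DataServer:
    node_id: int
    feature_index: Set[str] = field(default_factory=set)  # routing features held
    chunk_index: Set[str] = field(default_factory=set)    # local fingerprint index
    stored_bytes: int = 0                                  # load metric

    def similarity(self, features: Set[str]) -> int:
        """Similarity matching: count of routing features already on this node."""
        return len(features & self.feature_index)

    def store_superblock(self, chunks: List[bytes], features: Set[str]) -> int:
        """Node-local deduplication; returns the number of newly stored chunks."""
        new_chunks = 0
        for chunk in chunks:
            fp = chunk_fingerprint(chunk)
            if fp not in self.chunk_index:
                self.chunk_index.add(fp)
                self.stored_bytes += len(chunk)
                new_chunks += 1
        self.feature_index |= features
        return new_chunks


def choose_node(servers: List[DataServer], features: Set[str]) -> DataServer:
    """Assumed metadata-server policy: least-loaded node among the most similar.
    If no node is similar, every node scores 0, so this reduces to least-loaded."""
    scores = {s.node_id: s.similarity(features) for s in servers}
    best = max(scores.values())
    candidates = [s for s in servers if scores[s.node_id] == best]
    return min(candidates, key=lambda s: (s.stored_bytes, s.node_id))


def route_and_store(servers: List[DataServer], chunks: List[bytes], k: int = 4) -> Tuple[int, int]:
    """Client flow for one superblock: extract features, route, then deduplicate."""
    features = routing_features([chunk_fingerprint(c) for c in chunks], k)
    target = choose_node(servers, features)
    return target.node_id, target.store_superblock(chunks, features)
```

With two empty servers, routing `[b"a", b"b", b"c"]`, then `[b"x", b"y", b"z"]`, then `[b"a", b"b", b"c", b"d"]` sends the first superblock to node 0, the unrelated second one to the less-loaded node 1, and the third back to node 0, where only the chunk `b"d"` is new. In the patent the similarity scores come from the data server cluster and the final decision from the metadata server; here both steps run in one function for brevity.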

Description

Technical field
[0001] The invention relates to the technical field of computer information storage, and in particular to an efficient deduplication method for redundant data in a cloud storage system.
Background
[0002] Data deduplication is a specialized data compression technique that removes redundant data losslessly at a coarse granularity. It divides data into coarse-grained blocks, computes a hash fingerprint for each block using a fingerprint calculation technique, and determines whether a block is redundant by querying a fingerprint index. If a new data block has the same content as data already in the storage system, the new block is not stored; instead, a pointer to the original data block is saved to record where the data is located. Because storing a pointer costs far less than the space the data itself occupies, data deduplication can eff...
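The background steps (chunking, hash fingerprinting, index lookup, and storing a pointer in place of a duplicate block) fit in a few lines. This is a minimal sketch assuming fixed-size chunking with a deliberately tiny chunk size and SHA-256 fingerprints; the class name `DedupStore` is hypothetical, and real systems typically use kilobyte-sized or content-defined chunks.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # tiny for demonstration; an assumption, not a recommended value


class DedupStore:
    def __init__(self) -> None:
        self.blocks: List[bytes] = []   # physically stored unique blocks
        self.index: Dict[str, int] = {}  # fingerprint -> block location

    def write(self, data: bytes, chunk_size: int = CHUNK_SIZE) -> List[int]:
        """Split data into fixed-size chunks and return a recipe of block pointers."""
        recipe: List[int] = []
        for off in range(0, len(data), chunk_size):
            chunk = data[off:off + chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()  # hash fingerprint of the chunk
            loc = self.index.get(fp)                # fingerprint index query
            if loc is None:                         # new content: store it once
                loc = len(self.blocks)
                self.blocks.append(chunk)
                self.index[fp] = loc
            recipe.append(loc)                      # duplicate: keep only a pointer
        return recipe

    def read(self, recipe: List[int]) -> bytes:
        """Rebuild the original data by following the pointers."""
        return b"".join(self.blocks[loc] for loc in recipe)
```

Writing `b"ABCDABCDEFGH"` stores only two blocks and returns the recipe `[0, 0, 1]`; a later write of `b"EFGHABCD"` adds no blocks and returns `[1, 0]`. The sketch treats equal fingerprints as equal content, which relies on hash collisions being negligible.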


Application Information

Patent type & authority: Application (China)
IPC(8): G06F3/06
CPC: G06F3/0613, G06F3/0659, G06F3/067, G06F3/0674
Inventors: Zhang Guangyan, Yang Songlin, Shu Jiwu, Zheng Weimin (张广艳, 杨松霖, 舒继武, 郑纬民)
Owner: TSINGHUA UNIV