A method and system for deleting duplicate data

A data deduplication technology in the field of data processing. It addresses problems such as high load on database nodes and single-node bottlenecks, and achieves fast duplicate detection, lower bandwidth use, and reduced storage space.

Active Publication Date: 2019-03-26
SHENZHEN INST OF ADVANCED TECH +1
Cites: 11; Cited by: 8

AI Technical Summary

Problems solved by technology

[0004] The data deduplication system divides a file into data blocks according to a user-defined data division strategy, calculates the Hash value of each data block, and stores the Hash values centrally in a database. However, as the cluster grows, the number of Hash values keeps increasing, and centralized storage is likely to place a high load on the database node, creating a single-node bottleneck.

Embodiment Construction

[0033] In order to make the purpose, technical solution and advantages of the present application clearer, the present application will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, not to limit the present application.

[0034] Figure 2 is a flowchart of file storage in the storage system data deduplication method according to the first embodiment of the present application. As shown in Figure 2, file storage in the first embodiment includes the following steps:

[0035] Step 100: the client divides the file into fixed-length data blocks according to the set object size;

[0036] Step 110: use a secure hash algorithm to calculate the Hash value of the data block;

[0037] In step 110, the secure hash algorithm used is SHA-1 (Secure Hash Al...
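
Steps 100 and 110 can be sketched together as below. This is a minimal illustration, not the application's implementation: the helper names `split_fixed`, `fingerprint`, and `chunk_and_hash` are hypothetical, the file is assumed to be read as a binary stream, and the final block is assumed to be left shorter than the object size, since the excerpt does not say whether it is padded. SHA-1 comes from Python's standard `hashlib` module, matching the algorithm named in [0037].

```python
import hashlib
import io
from typing import BinaryIO, Iterator, Tuple


def split_fixed(stream: BinaryIO, object_size: int) -> Iterator[bytes]:
    """Step 100: yield consecutive fixed-length blocks of object_size bytes.

    Assumption: the final block is left short rather than padded.
    """
    if object_size <= 0:
        raise ValueError("object_size must be a positive number of bytes")
    while True:
        block = stream.read(object_size)  # a buffered read returns fewer bytes only at EOF
        if not block:
            break
        yield block


def fingerprint(block: bytes) -> str:
    """Step 110: SHA-1 digest of a block as a 40-character hex string."""
    return hashlib.sha1(block).hexdigest()


def chunk_and_hash(stream: BinaryIO, object_size: int) -> Iterator[Tuple[str, bytes]]:
    """Pair every block with its fingerprint, in file order."""
    for block in split_fixed(stream, object_size):
        yield fingerprint(block), block


# Usage: a 12-byte "file" with a 4-byte object size; blocks 1 and 3 are identical.
pairs = list(chunk_and_hash(io.BytesIO(b"abcdefghabcd"), 4))
print([block for _, block in pairs])  # [b'abcd', b'efgh', b'abcd']
print(pairs[0][0] == pairs[2][0])     # True
```

For a real file, `open(path, "rb")` can be passed in place of the `io.BytesIO` buffer. Because identical blocks always produce identical fingerprints, the fingerprint alone is enough to judge duplicates in the following step.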

Abstract

A method and system for deleting duplicate data are disclosed. The method comprises the following steps: (a) a client divides a file into fixed-length data blocks according to a set object size; (b) the Hash value of each data block is calculated with a secure hash algorithm; (c) whether each data block is a duplicate is judged from its Hash value, the data blocks are encapsulated into different types of objects according to the judgment result, and a hash calculation on each object's name yields the cluster node and disk location where the object should be stored; (d) according to the hash calculation result, the client communicates directly with the cluster node that stores the object and writes the object to the corresponding disk location. The invention not only reduces the storage space occupied by duplicate data but also saves the bandwidth consumed by transmitting it, preserves the Ceph distributed storage system's property of having no central node, and does not affect the storage system's original storage process.
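
Steps (c) and (d) can be illustrated with the sketch below. Everything in it is an assumption made for illustration rather than the disclosed design: the object types `DataObject` and `RefObject`, the naming scheme (new blocks named by their SHA-1 fingerprint, duplicates named `<file>.<index>`), the in-memory `seen` set standing in for however the system actually judges duplicates, and the `place` function, which is a simple hash-modulo placement and not Ceph's CRUSH algorithm.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple, Union


@dataclass(frozen=True)
class DataObject:
    """Hypothetical object type for a block not seen before; named by its fingerprint."""
    name: str
    payload: bytes


@dataclass(frozen=True)
class RefObject:
    """Hypothetical object type for a duplicate block; stores only the fingerprint it points to."""
    name: str
    target: str


def encapsulate(file_name: str, blocks: Iterable[bytes],
                seen: Set[str]) -> List[Union[DataObject, RefObject]]:
    """Step (c), part 1: judge each block as new or duplicate and wrap it accordingly."""
    objects: List[Union[DataObject, RefObject]] = []
    for i, block in enumerate(blocks):
        fp = hashlib.sha1(block).hexdigest()
        if fp in seen:
            # Duplicate: only a small reference needs to travel over the network.
            objects.append(RefObject(name=f"{file_name}.{i}", target=fp))
        else:
            seen.add(fp)
            objects.append(DataObject(name=fp, payload=block))
    return objects


def place(object_name: str, num_nodes: int, disks_per_node: int) -> Tuple[int, int]:
    """Step (c), part 2: hash the object name to a (node, disk) pair.

    Simplified stand-in for illustration only; this is NOT Ceph's CRUSH algorithm.
    """
    h = int.from_bytes(hashlib.sha1(object_name.encode("utf-8")).digest()[:8], "big")
    return h % num_nodes, (h // num_nodes) % disks_per_node


# Usage: the third block repeats the first, so it becomes a reference object.
seen: Set[str] = set()
objs = encapsulate("report.bin", [b"aaaa", b"bbbb", b"aaaa"], seen)
for obj in objs:
    node, disk = place(obj.name, num_nodes=4, disks_per_node=3)
    # Step (d): the client would now send obj directly to `node`, writing it to `disk`.
    print(type(obj).__name__, obj.name[:12], "->", node, disk)
```

Because placement in this sketch depends only on the object name, any client can compute the target node and disk by itself and contact that node directly, with no central lookup service, which is in the spirit of the abstract's point that Ceph's decentralized design is preserved. A complete system would also need a per-file manifest to reassemble blocks in order; that is left out here.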

Description

Technical Field

[0001] The present application belongs to the technical field of data processing, and in particular relates to a method and system for data deduplication.

Background

[0002] With the continuous advancement of social informatization and the growth of data volume, enterprises' storage demands keep growing, and enterprises face an increasing number of points in time that require rapid backup and recovery. The costs of managing and storing data, of data center space, and of energy consumption are also rising, and distributed storage systems offer many advantages for data storage. The Ceph distributed storage system is an ecosystem with rich features and connections to many open-source projects. With a complete infrastructure and a relatively stable operating mode, Ceph is being adopted by more and more enterprise users. Studies have found that up to 60% of the data stored in application systems is redundant, and more an...

Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F16/174; G06F16/13
Inventors: 王锦鹏, 王和康, 林鹏程, 刘毅, 刘凯, 王洋, 须成忠
Owner: SHENZHEN INST OF ADVANCED TECH