A method and a device for deleting duplicate data

Data deduplication technology, applied in the field of data processing, addresses the slow and time-consuming disk reads that make conventional deduplication inefficient, increasing data reading speed and improving deduplication efficiency.

Status: Inactive; Publication date: 2018-12-11
ZHENGZHOU YUNHAI INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] When a weak HASH algorithm combined with byte-by-byte data comparison is used to deduplicate data, the data corresponding to the stored data fingerprint must be read from the disk. Disk reads are slow and time-consuming, so traditional deduplication methods are inefficient.

Examples

Embodiment Construction

[0032] To help those skilled in the art better understand the solution of this application, the technical solution in the embodiments of this application is described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of this application.

[0033] Data deduplication (DDP) technology is very important for reducing storage space usage, especially for all-flash storage arrays (All Flash Array, AFA for short), where storage space is expensive. When the traditional weak HASH algorithm is combined with byte-by-byte data comparison to deduplicate data, the data corresponding to the stored data fingerprint must be read from the disk, ...

Abstract

The embodiment of the invention discloses a method and a device for deleting duplicate data. The method includes: when determining whether data to be processed needs to be deleted, obtaining the correspondence between the data fingerprints stored in a cache memory and the buffer blocks that store the data; when the data to be processed is obtained, calculating a first data fingerprint of the data to be processed; if a second data fingerprint matching the first data fingerprint exists among the stored data fingerprints, first checking whether the buffer block corresponding to the second data fingerprint in the cache memory holds the target data; and if it does, reading the target data from that buffer block rather than from the disk, and comparing the data to be processed with the target data byte by byte, so that the data to be processed is deleted when it is determined to be byte-for-byte identical to the target data. Because the method preferentially reads target data from the cache memory, it reduces the number of disk reads, increases data reading speed, and improves the efficiency of deleting duplicate data.
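The cache-first lookup described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class `DedupStore`, its fields, the use of `zlib.crc32` as the weak fingerprint, the unbounded dictionary standing in for the cache memory, and the choice to populate the cache on every first write are all hypothetical.

```python
import zlib
from typing import Dict


class DedupStore:
    """Toy model (hypothetical): a 'disk', a fingerprint index, and a cache."""

    def __init__(self) -> None:
        self.disk: Dict[int, bytes] = {}   # block id -> data stored on "disk"
        self.index: Dict[int, int] = {}    # fingerprint -> block id on disk
        self.cache: Dict[int, bytes] = {}  # fingerprint -> cached buffer block
        self.disk_reads = 0                # counts simulated disk reads

    @staticmethod
    def fingerprint(data: bytes) -> int:
        # Assumption: CRC-32 stands in for the weak hash; collisions are possible.
        return zlib.crc32(data)

    def _store(self, data: bytes) -> int:
        block_id = len(self.disk)
        self.disk[block_id] = data
        return block_id

    def _read_target(self, fp: int) -> bytes:
        # Prefer the buffer block in the cache; fall back to disk on a miss.
        cached = self.cache.get(fp)
        if cached is not None:
            return cached
        self.disk_reads += 1
        data = self.disk[self.index[fp]]
        self.cache[fp] = data
        return data

    def write(self, data: bytes) -> bool:
        """Return True if `data` was a duplicate and was therefore dropped."""
        fp = self.fingerprint(data)
        if fp in self.index:
            target = self._read_target(fp)
            if target == data:  # byte-by-byte comparison
                return True
            # Fingerprint collision: simplification, store without indexing.
            self._store(data)
            return False
        self.index[fp] = self._store(data)
        self.cache[fp] = data  # assumption: new blocks are also cached
        return False
```

Writing the same block twice through `write` drops the second copy without incrementing `disk_reads`; after `cache.clear()`, the next duplicate write costs one simulated disk read and repopulates the cache.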

Description

Technical field

[0001] The present application relates to the field of data processing, in particular to a method and device for data deduplication.

Background

[0002] In today's big data era, massive data storage occupies a large amount of storage space, and the stored data may include a large amount of redundant data, such as duplicate data. In this case, data deduplication (Data Deduplication, DDP for short) is a core technology for reducing the storage space occupied by data.

[0003] Currently used DDP algorithms include the strong hash (HASH) algorithm and the weak HASH algorithm combined with byte-by-byte data comparison. Of these, the weak HASH algorithm combined with byte-by-byte comparison is a commonly used DDP algorithm. It calculates a data fingerprint for newly written data and then compares that fingerprint with the stored data fingerprints. If the two match, it reads the data corresponding to the stored data fingerprint and c...
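A small sketch may help show why a weak fingerprint is paired with a byte-by-byte check. The hash below (a byte sum modulo 2^16) is a deliberately weak, hypothetical stand-in chosen so that collisions are easy to produce; the function names are illustrative and do not come from the application.

```python
def weak_fingerprint(block: bytes) -> int:
    """Toy weak hash (assumption): byte sum modulo 2**16.

    It ignores byte order, so different blocks can share a fingerprint.
    """
    return sum(block) & 0xFFFF


def is_duplicate(new_block: bytes, stored_block: bytes) -> bool:
    """Fingerprint check first, then a byte-by-byte confirmation."""
    if weak_fingerprint(new_block) != weak_fingerprint(stored_block):
        return False  # different fingerprints: cannot be the same data
    # Matching fingerprints only suggest a duplicate; compare the bytes.
    return new_block == stored_block
```

Here `b"ab"` and `b"ba"` share a fingerprint but are not duplicates, so only the byte comparison tells them apart. That comparison needs the stored block itself, which is why the source of the read (disk or cache) matters for speed.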

Application Information

IPC(8): G06F3/06
CPC: G06F3/0611, G06F3/0641, G06F3/0656, G06F3/0676
Inventor 何孝金
Owner ZHENGZHOU YUNHAI INFORMATION TECH CO LTD