A method and a device for deleting duplicate data

A data deduplication technology, applied in the field of data processing, that addresses slow reading speed, long reading time, and low efficiency, and achieves higher deduplication efficiency and faster data reading.

Inactive Publication Date: 2018-12-11
ZHENGZHOU YUNHAI INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] When using the weak HASH algorithm combined with data byte-by-byte comparison to deduplicate data, it is necessary to read the data corresponding to the stored data fingerp



Example Embodiment

[0032] To enable those skilled in the art to better understand the solutions of the application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this application, without creative work, fall within the protection scope of this application.

[0033] DDP technology is very important for reducing storage space usage, especially for the All Flash Array (AFA), where storage space is expensive. When the traditional method of a weak HASH algorithm combined with byte-by-byte data comparison is used for data deduplication, it is necessary to read the data corresponding to the stored data fingerprint from the disk, and read...


Abstract

The embodiment of the invention discloses a method and a device for deleting duplicate data. The method includes: when determining whether the data to be processed needs to be deleted, obtaining the correspondence between the data fingerprints stored in a cache memory and the buffer blocks storing the data; once the data to be processed is obtained, calculating a first data fingerprint of the data to be processed; if the stored data fingerprints contain a second data fingerprint that matches the first data fingerprint, first checking whether the target data is present in the buffer block of the cache memory that corresponds to the second data fingerprint; and if it is, reading the target data from that buffer block instead of from the disk, and comparing the data to be processed with the target data byte by byte, so that the data to be processed is deleted when it is determined to be byte-for-byte identical to the target data. Because the method reads target data from the cache memory first, it reduces the number of disk reads, increases data reading speed, and improves the efficiency of deleting duplicate data.
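The cache-first lookup in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class name `CacheFirstDedup`, its fields, the use of CRC32 as the weak fingerprint, and the handling of fingerprint collisions are all assumptions made for the example.

```python
import zlib


class CacheFirstDedup:
    """Toy model of cache-first deduplication; every name here is hypothetical."""

    def __init__(self):
        self.disk = []        # stands in for blocks stored on disk
        self.index = {}       # fingerprint -> position of the block in self.disk
        self.cache = {}       # fingerprint -> buffer block held in cache memory
        self.disk_reads = 0   # counts simulated disk reads

    def write(self, data: bytes) -> bool:
        """Return True if data is a duplicate and was dropped, else store it."""
        # Assumption: CRC32 plays the role of the weak HASH fingerprint.
        fp = zlib.crc32(data)
        if fp in self.index:
            # Look in the cache first; fall back to disk only on a miss.
            target = self.cache.get(fp)
            if target is None:
                self.disk_reads += 1
                target = self.disk[self.index[fp]]
                self.cache[fp] = target
            # Byte-by-byte comparison confirms or rejects the fingerprint match.
            if target == data:
                return True
            # Fingerprint collision: this sketch stores the block unindexed.
            self.disk.append(data)
            return False
        # New fingerprint: store the block and keep a copy in the cache.
        self.index[fp] = len(self.disk)
        self.disk.append(data)
        self.cache[fp] = data
        return False
```

In this sketch a repeated write of the same block is resolved entirely from `cache` and leaves `disk_reads` at zero; only after the cached copy is gone does a duplicate check cost one simulated disk read.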

Description

Technical field

[0001] The present application relates to the field of data processing, in particular to a method and device for data deduplication.

Background

[0002] In today's big data era, massive amounts of stored data occupy a large amount of storage space, and that data may include a large amount of redundant data, such as duplicate data. In this case, data deduplication (DDP for short) is a core technology that can reduce the storage space occupied by data.

[0003] Currently used DDP algorithms include a strong hash (HASH) algorithm, and a weak HASH algorithm combined with byte-by-byte data comparison. Of these, the weak HASH algorithm combined with byte-by-byte data comparison is commonly used. It calculates a data fingerprint for newly written data and then compares that fingerprint with the stored data fingerprints. If the two match, read the data corresponding to the stored data fingerprint and c...
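A short sketch may help show why a weak fingerprint has to be backed by a byte-by-byte comparison. The 16-bit byte sum used here is a deliberately weak, hypothetical fingerprint chosen only to make collisions easy to see; the source does not name a specific hash function, and `weak_fingerprint` and `is_duplicate` are illustrative names.

```python
def weak_fingerprint(data: bytes) -> int:
    # Hypothetical weak hash: a 16-bit sum of the bytes.
    # Any reordering of the same bytes produces the same value.
    return sum(data) & 0xFFFF


def is_duplicate(new_data: bytes, stored_data: bytes) -> bool:
    # Different fingerprints prove the data differs, so no comparison is needed.
    if weak_fingerprint(new_data) != weak_fingerprint(stored_data):
        return False
    # Equal fingerprints are only a hint; the byte-by-byte check decides.
    return new_data == stored_data
```

With this fingerprint, `b"ab"` and `b"ba"` collide, yet `is_duplicate(b"ab", b"ba")` still returns `False` because the byte comparison rejects the match. In a real system the stored fingerprint would be looked up rather than recomputed, and the stored data would have to be fetched before comparing, which is the read cost the background section describes.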


Application Information

IPC(8): G06F3/06
CPC: G06F3/0611, G06F3/0641, G06F3/0656, G06F3/0676
Inventor 何孝金
Owner ZHENGZHOU YUNHAI INFORMATION TECH CO LTD