Data deduplication method and device, electronic equipment and computer readable storage medium

A data deduplication technology, applied in the field of data processing, that addresses three problems of existing approaches: deduplication cannot be performed globally, duplicate data cannot be searched for in massive data sets, and searching for duplicates degrades IO performance.

Active Publication Date: 2020-08-11
ALIBABA CLOUD COMPUTING LTD

AI Technical Summary

Problems solved by technology

Currently, Windows Server 2012 implements data deduplication within a single disk on the New Technology File System (NTFS), but this implementation has the following defects:
1. Data deduplication works only within a single disk, not globally.
2. It does not support searching for duplicate data in massive data sets.
3. Searching for duplicate data degrades IO performance.




Embodiment Construction

[0090] Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. For clarity, parts not related to describing the exemplary embodiments are omitted from the drawings.

[0091] In the embodiments of the present invention, it should be understood that terms such as "comprising" or "having" are intended to indicate the presence of the features, numbers, steps, acts, components, parts, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, acts, components, parts, or combinations thereof exist or are added.

[0092] In addition, it should be noted that, in the case of no conflict, the embodiments of the present invention and the features in the embodiments can be combined with each other. The embodiments of the present invention will be descr...


Abstract

Embodiments of the invention disclose a data deduplication method, a data deduplication device, electronic equipment, and a computer-readable storage medium. The data deduplication method comprises the steps of: acquiring a to-be-processed data container; searching the existing data containers for a data container whose data similarity with the to-be-processed data container meets a preset condition, and taking that data container as the target data container; and comparing the to-be-processed data container with the target data container, confirming the duplicated data, and deleting the duplicated data in a post-processing flow. With this method and device, deduplication of massive data can be achieved on a global scale, saving storage space without reducing user IO performance.
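The three steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the fixed block size, the SHA-256 block fingerprints, the Jaccard similarity measure, the 0.5 threshold, and every function name below are assumptions chosen only to make the flow concrete, since the abstract does not specify how similarity is measured or what the preset condition is.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block size, kept tiny so the example is readable


def blocks(container: bytes) -> list:
    """Split a container into fixed-size blocks (assumed layout)."""
    return [container[i:i + BLOCK_SIZE] for i in range(0, len(container), BLOCK_SIZE)]


def fingerprints(container: bytes) -> set:
    """Fingerprint every block with SHA-256 (an assumed choice of hash)."""
    return {hashlib.sha256(b).hexdigest() for b in blocks(container)}


def similarity(a: set, b: set) -> float:
    """Jaccard similarity of two fingerprint sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_target(pending: bytes, existing: dict, threshold: float = 0.5):
    """Step 2: return the id of the most similar existing container whose
    similarity meets the (assumed) threshold, or None if there is none."""
    pending_fps = fingerprints(pending)
    best_id, best_score = None, threshold
    for container_id, data in existing.items():
        score = similarity(pending_fps, fingerprints(data))
        if score >= best_score:
            best_id, best_score = container_id, score
    return best_id


def duplicate_blocks(pending: bytes, target: bytes) -> list:
    """Step 3: compare raw block contents and return the indices of blocks
    in the pending container that also occur in the target container."""
    target_blocks = set(blocks(target))
    return [i for i, b in enumerate(blocks(pending)) if b in target_blocks]


# Usage: run as a background (post-processing) pass, off the user IO path.
existing = {"c1": b"AAAABBBBCCCC", "c2": b"XXXXYYYYZZZZ"}
pending = b"AAAABBBBDDDD"
target_id = find_target(pending, existing)
dupes = duplicate_blocks(pending, existing[target_id]) if target_id else []
```

In this sketch the similarity search narrows the comparison to a single candidate container, so the block-by-block comparison never has to scan all stored data; that is one plausible reading of how the method keeps the duplicate search cheap on massive data.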

Description

Technical field

[0001] Embodiments of the present invention relate to the technical field of data processing, and in particular to a data deduplication method, device, electronic device, and computer-readable storage medium that can be executed in a post-processing flow.

Background technique

[0002] With the development of data technology, users have increasingly high requirements for high-performance storage, especially in cloud computing block device storage, such as higher input/output operations per second (IOPS) and lower latency. As a result, the cost of high-performance storage has also increased significantly, for example all-flash storage arrays and Non-Volatile Memory Express solid-state disk (NVMe SSD) storage. In this case, it becomes very meaningful if the space occupied by storage can be saved without reducing the performance of user input and output (IO, Input Ou...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F3/06, G06F16/13, G06F16/17, G06F16/174, G06F16/182
CPC: G06F3/0608, G06F3/0641, G06F3/067, G06F16/1734, G06F16/1748, G06F16/134, G06F16/182
Inventor: 佘海斌
Owner: ALIBABA CLOUD COMPUTING LTD