Method and system for implementing repeated data deletion

A technology for data deduplication, applied to redundancy in operations for data error detection, electrical digital data processing, special data processing applications, etc.

Inactive Publication Date: 2010-11-10
无锡北方数据计算股份有限公司

AI Technical Summary

Problems solved by technology

The downside of this method is that it requires a supported back



Embodiment Construction

[0019] In order to enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments are further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0020] To achieve deduplication, one must find identical data, build a high-speed index, and use the index to replace the identical data. The key implementation points are therefore how to find identical data and how to build a fast index. Traditional deduplication technology marks data by computing its hash value and maintains the index through a large cache, which makes fast indexing difficult. Moreover, if the hash value is used as the data fingerprint, hash collisions are unavoidable; although the probability is very low, once a collision occurs it causes unpredictable data errors.
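The traditional hash-fingerprint scheme described above can be sketched as follows. This is a minimal illustration, not the patent's method: `HashDedupStore`, its dict-based index, and the block-store list are all hypothetical stand-ins (a real system keeps the index in a large cache), and trusting the hash alone exhibits exactly the collision risk the paragraph points out.

```python
import hashlib

def fingerprint(block: bytes) -> str:
    """Hash-based fingerprint of a data block (traditional approach)."""
    return hashlib.sha256(block).hexdigest()

class HashDedupStore:
    """Minimal sketch of hash-index deduplication."""
    def __init__(self):
        self.index = {}   # fingerprint -> storage slot
        self.store = []   # stored unique blocks

    def write(self, block: bytes) -> int:
        fp = fingerprint(block)
        if fp in self.index:          # duplicate: reuse the index entry
            return self.index[fp]
        slot = len(self.store)        # new data: store and index it
        self.store.append(block)
        self.index[fp] = slot
        return slot

s = HashDedupStore()
a = s.write(b"hello world")
b = s.write(b"hello world")   # duplicate: same slot returned
c = s.write(b"other data")
assert a == b and a != c
assert len(s.store) == 2      # only unique blocks are stored
```

Note that a duplicate write never touches the store: the fingerprint lookup alone decides, which is fast but, as the text notes, unsafe if two different blocks ever hash alike.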

[0021] The principle of the Simhash (similarity hash) algorithm is: each token in the data is mapped to ...



Abstract

The invention provides a method for implementing deduplication of data, comprising the following steps: computing a similarity value for the data with the Simhash (similarity hash) algorithm; entering the similarity value into an index library, the similarity value locating the storage position; writing the data into a data warehouse; if data with the same similarity value arrives, extracting the corresponding data from the data warehouse and performing a binary comparison; and if the data are identical, recording an index, otherwise recording the differing portions of the data. The invention also provides a system for implementing deduplication, comprising a similarity marking library (BitMap), a data offset marking library, the data warehouse (LBAMap), and a storage library (Resp) recording the initial data. Based on the Simhash algorithm, the method and system efficiently accomplish deduplication while guaranteeing data consistency through data comparison.
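The control flow of the steps in the abstract can be sketched as follows. This is a hedged illustration only: `SimhashDedup`, `ingest`, and the SHA-1-based `fingerprint` stand-in are hypothetical names (the stand-in is not the real simhash), and where the patent records differing data portions, the sketch simply stores the new block whole. The binary-comparison step is what guards against fingerprint collisions.

```python
import hashlib

def fingerprint(data: bytes) -> int:
    # Stand-in for the similarity value; any deterministic
    # fingerprint illustrates the control flow (hypothetical).
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")

class SimhashDedup:
    """Sketch of the pipeline: fingerprint -> index lookup ->
    binary compare -> record index or store new data."""
    def __init__(self):
        self.index = {}      # similarity value -> warehouse position
        self.warehouse = []  # unique data blocks

    def ingest(self, data: bytes):
        fp = fingerprint(data)
        if fp in self.index:
            pos = self.index[fp]
            if self.warehouse[pos] == data:   # binary comparison step
                return ("index", pos)          # identical: record index only
            # Fingerprint matched but bytes differ: the patent records
            # the differing parts; this sketch stores the block whole.
        pos = len(self.warehouse)
        self.warehouse.append(data)
        self.index.setdefault(fp, pos)
        return ("stored", pos)

d = SimhashDedup()
assert d.ingest(b"block-A") == ("stored", 0)
assert d.ingest(b"block-A") == ("index", 0)   # duplicate: index recorded
assert d.ingest(b"block-B") == ("stored", 1)
```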

Description

technical field

[0001] The invention relates to the technical field of computer storage, in particular to a method and system for realizing deduplication of data.

Background technique

[0002] An enterprise's key business data is backed up every day. Depending on how the backup strategy is customized, incremental backups may be made daily and full backups weekly. The volume of any single backup is not large, but as data accumulates, much of it is repeated, and this repetition makes the total backup volume grow in geometric progression. For example, if the initial data volume of an ERP system is 100 TB and 10 TB is added every day, with incremental backups six days a week and a full backup on the weekend, the backup data in one week reaches 160 TB. Using Data Deduplication technology, however, the initial 100 TB does not need to be backed up repeatedly, and it is further found that the 10 TB of daily incremental data can be compressed to 1 TB. Therefore,...
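The arithmetic behind the 160 TB figure in the background example checks out directly; the variable names below are illustrative only.

```python
# Figures from the example: 100 TB initial ERP data, 10 TB added per
# day, incremental backups six days a week, one full backup on weekends.
initial_tb = 100
daily_increment_tb = 10
incremental_days = 6

# Conventional weekly backup volume: one full copy plus six incrementals
week_total_tb = initial_tb + incremental_days * daily_increment_tb
assert week_total_tb == 160  # matches the 160 TB stated in the text

# With deduplication the 100 TB base is stored once, and each daily
# 10 TB increment reduces to roughly 1 TB of genuinely new data
dedup_week_tb = incremental_days * 1
print(week_total_tb, dedup_week_tb)  # 160 6
```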

Claims


Application Information

IPC(8): G06F17/30; G06F11/14
Inventor: 张庆敏, 胡刚, 谢海威, 郭栋
Owner 无锡北方数据计算股份有限公司