Method for removing repeating data before data storage

A data storage and data deduplication technology, applied in input/output processes for data processing, electrical digital data processing, special data processing applications, etc., to achieve the effects of improving effective storage utilization, reducing the probability of false-positive deletions, and reducing bandwidth utilization.

Inactive Publication Date: 2015-01-14
北京中科同向信息技术有限公司
2 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

[0012] The present invention proposes a method for deleting duplicate data before storage. It aims to solve the problem of low computer storage space utilization in computer data archiving, storage, backup, remote disaster tolerance, and disaster recovery, and to reduce the chance of false positives that arises with post-storage deduplication.

Examples

Embodiment Construction

[0021] The present invention proposes a method for removing duplicate data before data storage. The data storage process is shown in Figure 1. First, the data to be stored, referred to here as the data to be processed, is obtained, and its organizational structure is checked against that of the existing data. If the structures are consistent, the data cutter for that structure type is obtained, the data is loaded from the hard disk into memory, and the data to be processed is passed in; if not, the data cutter for the new data structure is obtained and the data to be processed is passed in. A data cutter is essentially an algorithm for dividing data into blocks. The block size is configurable, and variable-sized blocks can be produced with a sliding window: when the hash value of the sliding window matches a reference value, a block is created. The cutter divides the data to be processed into sub-data blocks, and the MD5 value of each data block is calculated to ge...
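
The sliding-window cutter described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the polynomial rolling hash, the names `cut` and `fingerprints`, and all constants (`WINDOW`, `MASK`, `REFERENCE`, `MIN_BLOCK`, `MAX_BLOCK`) are assumptions chosen for illustration. Only the overall idea, a block boundary where the window hash matches a reference value followed by an MD5 value per block, comes from the text.

```python
import hashlib

# All constants below are illustrative assumptions, not values from the patent.
WINDOW = 16       # sliding-window width in bytes
MASK = 0x3F       # compare only the low 6 bits of the window hash
REFERENCE = 0     # the "reference value" that marks a block boundary
MIN_BLOCK = 32    # smallest block the cutter will emit (except the tail)
MAX_BLOCK = 256   # force a boundary if no match occurs by this size
BASE = 257
MOD = 1 << 32


def cut(data: bytes) -> list:
    """Divide data into variable-sized blocks with a rolling window hash."""
    blocks = []
    start = 0
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Slide the window: remove the contribution of the oldest byte.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        size = i - start + 1
        matched = size >= MIN_BLOCK and (h & MASK) == REFERENCE
        if matched or size >= MAX_BLOCK:
            blocks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        blocks.append(data[start:])  # trailing block may be shorter than MIN_BLOCK
    return blocks


def fingerprints(data: bytes) -> list:
    """Return the MD5 hex digest of every block produced by cut()."""
    return [hashlib.md5(block).hexdigest() for block in cut(data)]
```

Because `MIN_BLOCK` is at least as large as `WINDOW`, every boundary decision in this sketch depends only on the last `WINDOW` bytes, which is what lets identical content produce identical blocks even when it appears at a different offset.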

Abstract

The invention discloses a method for removing duplicate data before data storage according to the organizational characteristics of the data to be processed. The method aims to solve the problem of identifying and removing duplicate data before storage. It comprises the following steps: the data to be processed is cut into sub-data blocks of different lengths according to its organizational characteristics; a standard identifier is generated for each sub-data block to determine whether duplicate data exists; and the data is thus processed before storage, which lowers the probability of misjudged deletions that arises when duplicate data is removed after storage. The method is typically used in computer data archiving, storage, backup, remote disaster tolerance, and disaster recovery to identify duplicate data and store only one copy while ignoring the others. It improves the effective utilization of computer storage space, reduces bandwidth usage, lowers the probability of misjudged deletions of duplicate data after storage, and guarantees data consistency.
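
The "store only one copy" step can be sketched as a small in-memory index keyed by block identifier. This is a hedged illustration rather than the patented method: the class name `DedupStore`, its `put` and `get` methods, and the use of an MD5 hex digest as the "standard identifier" are assumptions, and the input is taken to be a list of blocks that some cutter has already produced.

```python
import hashlib


class DedupStore:
    """Hypothetical block store that keeps one copy per distinct identifier."""

    def __init__(self):
        self.blocks = {}  # identifier -> block bytes

    def put(self, chunks):
        """Store the given blocks; return the identifiers needed to rebuild them."""
        recipe = []
        for chunk in chunks:
            ident = hashlib.md5(chunk).hexdigest()  # assumed identifier scheme
            if ident not in self.blocks:
                self.blocks[ident] = chunk  # first occurrence: store it
            recipe.append(ident)            # duplicates are only referenced
        return recipe

    def get(self, recipe):
        """Reassemble the original data from a list of identifiers."""
        return b"".join(self.blocks[ident] for ident in recipe)


store = DedupStore()
recipe = store.put([b"header", b"payload", b"header"])
restored = store.get(recipe)
```

Here three blocks are submitted but only two are stored, and `restored` equals the original byte sequence. The sketch trusts the identifier alone; comparing the block bytes whenever an identifier already exists would be one way to guard against hash collisions.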

Description

Technical field

[0001] The invention relates to a method for removing duplicate data before data storage, and belongs to the field of computer data processing.

Background

[0002] In recent years, the demand for computer-based data storage has kept growing, and the requirements on the speed and efficiency of data storage have risen with it. Enterprise data storage is currently expanding rapidly, and the amount of data can double within a short period of time, which puts great financial pressure on enterprises.

[0003] Data deduplication is a mainstream and very popular storage technology that can effectively optimize storage capacity. It is the process of comparing an incoming data stream with data previously held in the system, finding redundant sub-file information, and saving only one version of that information. This technique is very valuable during backups because most of the data is the same, especiall...

Application Information

IPC(8): G06F3/06, G06F17/30
CPC: G06F3/0638
Inventor: 邬玉良
Owner: 北京中科同向信息技术有限公司