Rapid data de-duplication method adapted to big data application

A deduplication technology for big data, applied in fields such as special data processing applications, redundancy in operations, data error detection, and response error generation. It addresses the low deduplication rate of existing methods and their inability to adapt to complex, changeable application environments and to big data workloads, with the effect of reducing the backup window and storage overhead.

Active Publication Date: 2013-09-25
和宇健康科技股份有限公司

AI Technical Summary

Problems solved by technology

The existing content-based variable-length chunking algorithm can effectively detect redundant data in data that is easily modified, but because fingerprint values are calculated frequently as the window slides, its deduplication rate (speed) is low, and it is not suitable...
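
To make the cost described above concrete, the following is a minimal sketch of generic content-defined chunking with a polynomial rolling hash. It is not the patented method: the names `cdc_chunks` and `chunk_ids` and the constants `WINDOW`, `MASK`, `MIN_CHUNK`, `MAX_CHUNK`, `BASE`, and `MOD` are all assumptions chosen for illustration. The point to notice is that the fingerprint is updated once for every input byte, which is the per-byte work that limits throughput.

```python
import hashlib
from typing import List

# All constants below are illustrative assumptions, not values from the patent.
WINDOW = 16            # sliding-window width in bytes
MASK = (1 << 6) - 1    # cut when the low 6 bits of the fingerprint are zero
MIN_CHUNK = 32         # never cut before this many bytes (>= WINDOW)
MAX_CHUNK = 256        # always cut at this many bytes
BASE = 257             # polynomial base of the rolling hash
MOD = (1 << 31) - 1    # modulus of the rolling hash


def cdc_chunks(data: bytes) -> List[bytes]:
    """Split data at content-defined boundaries using a rolling hash."""
    chunks: List[bytes] = []
    start = 0                           # first byte of the current chunk
    fp = 0                              # fingerprint of the current window
    top = pow(BASE, WINDOW - 1, MOD)    # weight of the oldest byte in a full window
    for i, byte in enumerate(data):
        # Once the window is full, drop its oldest byte before adding the new one.
        if i - start >= WINDOW:
            fp = (fp - data[i - WINDOW] * top) % MOD
        fp = (fp * BASE + byte) % MOD   # one fingerprint update per input byte
        length = i - start + 1
        if (length >= MIN_CHUNK and (fp & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            fp = 0                      # restart the window for the next chunk
    if start < len(data):
        chunks.append(data[start:])     # trailing partial chunk
    return chunks


def chunk_ids(data: bytes) -> List[str]:
    """Return a SHA-256 identifier for each chunk, as a dedup index would store."""
    return [hashlib.sha256(c).hexdigest() for c in cdc_chunks(data)]
```

Because a boundary depends only on the last `WINDOW` bytes, an insertion early in a stream tends to disturb only nearby chunks, which is why this family of algorithms copes well with modified data. The price is the byte-by-byte loop shown above.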

Embodiment Construction

[0037] The method of the present invention will be described in detail below with reference to the accompanying drawings.

[0038] As shown in Figures 1 and 2, the present invention deduplicates the redundant data that arises in the backup process. Considering the impact of existing deduplication methods on the backup window in big data applications, and their limited scope of application, the method combines the advantages of the variable-length and fixed-length chunking algorithms. It uses a deduplication factor and an acceleration factor to greatly improve the deduplication rate (speed) while preserving the deduplication ratio. The overall idea of the method is shown in Figure 1.

[0039] Data deduplication is suitable for application environments with a large amount of redundant data, such as backup systems, email systems, data migration, and disaster recovery. In these application environments, a high deduplication rate can be ach...

Abstract

The invention provides a rapid data de-duplication method adapted to a big data application. The rapid data de-duplication method is applied to backup de-duplication systems under the big data application and solves the problems that the existing variable-length partition algorithm based on content identification is low in de-duplication rate and fails to identify redundant data rapidly. According to the rapid data de-duplication method, through adjusting de-duplication factors and acceleration factors in a partition process, the de-duplication rate is substantially improved on the premise that the de-duplication ratio is ensured, de-duplication detection can be performed rapidly, the contradiction between the de-duplication ratio and the de-duplication rate is balanced, backup windows are reduced, and network bandwidth and memory spaces are saved.

Description

Technical field

[0001] The invention belongs to the technical field of computer information storage, and in particular relates to a fast data deduplication method suitable for big data applications.

Background

[0002] In the information age, with the explosive growth of data, the era of big data is arriving. Big data is characterized by huge data volume, varied types, low value density, and fast generation speed. In the era of big data, a large amount of redundant data arises during data backup and storage. How to eliminate duplicate data during backup, so as to reduce storage space and network bandwidth consumption, has become a hot research topic in the storage field.

[0003] The most effective way to eliminate redundant data in the backup process is to use data deduplication technology. It is generally believed that deduplication technology includes file-level full-file deduplication technology, block-level fi...
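
As a rough illustration of block-level deduplication in general, the sketch below applies the fixed-length chunking named in [0038] to a sequence of backup streams and reports how many bytes are actually stored. It is a simplified sketch under stated assumptions, not the patent's procedure: the function names `fixed_chunks` and `dedup_sizes`, the 64-byte block size, and the in-memory dictionary used as a fingerprint index are all hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def fixed_chunks(data: bytes, size: int = 64) -> List[bytes]:
    """Cut data into blocks of `size` bytes; the last block may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup_sizes(streams: Iterable[bytes], size: int = 64) -> Tuple[int, int]:
    """Return (logical_bytes, stored_bytes) after block-level deduplication."""
    index: Dict[bytes, int] = {}   # SHA-256 fingerprint -> length of the stored block
    logical = 0
    for stream in streams:
        for block in fixed_chunks(stream, size):
            logical += len(block)
            fingerprint = hashlib.sha256(block).digest()
            # Store a block only the first time its fingerprint is seen.
            index.setdefault(fingerprint, len(block))
    stored = sum(index.values())
    return logical, stored


def dedup_ratio(logical: int, stored: int) -> float:
    """Deduplication ratio: logical bytes divided by bytes actually stored."""
    return logical / stored if stored else 1.0
```

Backing up the same stream twice stores it once, so the ratio approaches 2. If a single byte is inserted at the front of the second copy, every block boundary shifts and fixed-length chunking finds almost nothing in common. Fixed-length chunking is fast because it computes no sliding-window fingerprints, but it is fragile when data is modified, which is the trade-off between deduplication ratio and deduplication rate that the abstract refers to.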

Application Information

IPC(8): G06F17/30, G06F11/14
Inventor: 张兴军, 朱国峰, 董小社, 朱跃光, 王龙翔, 姜晓夏
Owner: 和宇健康科技股份有限公司