Approximately optimal data fragment block rewriting method for data deduplication technology

A data deduplication and data block technology, applied in electrical digital data processing, special data processing applications, digital data information retrieval, etc. It addresses three problems of existing rewriting methods: a small improvement in data recovery performance, a large loss of data deduplication rate, and the inability to adjust rewriting strategies adaptively. It achieves the effect of improving data recovery performance, improving the data deduplication rate, and saving bandwidth.

Active Publication Date: 2020-10-30
JINAN UNIVERSITY

AI Technical Summary

Problems solved by technology

[0003] However, data deduplication technology introduces data fragmentation, which seriously damages the read performance of the data deduplication system, that is, its data recovery performance.
The main way to alleviate data fragmentation is to rewrite fragmented blocks, but current methods of rewriting fragmented blocks (referred to as rewriting methods) suffer from a large loss of deduplication rate and only a small improvement in data recovery performance.
There are two main reasons for this shortcoming: (1) Current rewriting methods only sort the containers referenced by a single data segment according to the container reference rate, and select the containers with the "lowest" container reference rate in a one-sided and arbitrary way. As a result, the containers they select are not optimal, and thus the rewritten fragment blocks are not optimal either; (2) Current rewriting algorithms cannot adaptively adjust the rewriting strategy according to different workloads.

Examples

Embodiment

[0032] As shown in Figures 1 and 2, the invention discloses an approximately optimal data fragment block rewriting method for data deduplication technology. The method uses a hash bucket array to sort the containers referenced by the first i data segments according to the container reference rate, and traverses the hash bucket array to select the optimal x containers (those with the lowest container reference rate) within the range of the i data segments. The traditional rewriting method, by contrast, sorts the containers referenced by a single data segment (such as data segment i) and selects the "optimal" containers within the range of that single data segment; when the number of cumulatively processed data segments reaches i, the selected containers satisfy x = x_1 + x_2 + … + x_m. It is worth noting that the containers selected by the traditional rewriting method are not the x containers with the lowest container reference rate within the range of the i data segments, which leads to...
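The difference between selecting containers per data segment and selecting them over a range of segments can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names, the fixed container size, the number of buckets, and the definition of container reference rate as "distinct referenced chunks divided by container size" are all assumptions made for illustration. Because containers inside one bucket are not ordered further, the selection is only approximately optimal.

```python
from collections import defaultdict


def reference_rates(segments, container_size):
    """Map container id -> fraction of its chunks referenced by the segments.

    Assumption: each segment is a list of (chunk_id, container_id) pairs and
    every container holds exactly `container_size` chunks.
    """
    referenced = defaultdict(set)
    for segment in segments:
        for chunk_id, container_id in segment:
            referenced[container_id].add(chunk_id)
    return {cid: len(chunks) / container_size
            for cid, chunks in referenced.items()}


def select_lowest(rates, x, num_buckets=10):
    """Pick up to x containers with the lowest reference rate.

    Containers are dropped into buckets by rate (a coarse sort), then the
    buckets are traversed from the lowest rate upward.
    """
    buckets = [[] for _ in range(num_buckets)]
    for cid, rate in rates.items():
        index = min(int(rate * num_buckets), num_buckets - 1)
        buckets[index].append(cid)
    selected = []
    for bucket in buckets:
        for cid in bucket:
            if len(selected) == x:
                return selected
            selected.append(cid)
    return selected


# Hypothetical workload: two data segments, containers of 4 chunks each.
seg1 = [("a1", "A"), ("b1", "B"), ("b2", "B"), ("b3", "B")]
seg2 = [("a2", "A"), ("a3", "A"), ("c1", "C"), ("c2", "C")]

# Per-segment view (traditional): A looks sparsely referenced inside seg1.
local_pick = select_lowest(reference_rates([seg1], 4), 1, num_buckets=4)

# Range view (first i = 2 segments): A is in fact well referenced, C is not.
window_rates = reference_rates([seg1, seg2], 4)
global_pick = select_lowest(window_rates, 1, num_buckets=4)
```

In this made-up workload the per-segment view selects container A for rewriting, while the view over both segments selects container C, whose reference rate is the lowest across the whole range.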

Abstract

The invention discloses an approximately optimal data fragment block rewriting method for data deduplication technology. With the arrival of the data era, the total amount of data is growing explosively, and the data storage and management requirements of the information world have reached the PB level and even the EB level. IDC research has found that nearly 75% of the data in the digital world is redundant, and ESG indicates that data redundancy in backup and archiving systems exceeds 90%. Data deduplication technology can effectively identify and eliminate duplicate data, reducing data storage cost. However, eliminating duplicate data causes data fragmentation, and data fragmentation seriously damages data recovery performance. The method accurately identifies the fragment blocks among the data blocks and relieves the degree of data fragmentation by rewriting them; meanwhile, it switches between rewriting strategies (the optimal rewriting strategy and the aggressive rewriting strategy) adaptively according to the workload, and therefore greatly improves both data recovery performance and the data deduplication rate.

Description

Technical field

[0001] The invention relates to the technical field of data storage and data deduplication, in particular to an approximately optimal data fragment block rewriting method for data deduplication technology.

Background

[0002] With the advent of the data age, the total amount of data in the world is growing explosively. IDC research shows that by 2020, the world's annual data volume will have increased 44 times, from 0.8ZB in 2009 to 35ZB. The spread of mobile devices, sensors, and similar devices intensifies this growth. The data come from a growing number of application fields, including human genetics, social networks, financial analysis, environmental protection, energy exploration, video games, and medical health. These data are not only huge in volume but also complex and diverse in structure, which brings new challenges to data storage and management and also increases the risk of data management. How to effectively...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G06F9/50
CPC: G06F16/215, G06F9/5016
Inventors: 邓玉辉, 张大统
Owner: JINAN UNIVERSITY