
An Approximate Optimal Data Fragmentation Block Rewriting Method Oriented to Data Deduplication Technology

A technology relating to data and data blocks, applied in the fields of electronic digital data processing, digital data information retrieval, and special data processing applications. It addresses three problems of existing rewriting methods: a small improvement in data recovery performance, a large loss of data deduplication rate, and the inability to adjust the rewriting strategy. Its stated effects are improved data recovery performance, an improved data deduplication rate, and maximized benefits.

Active Publication Date: 2022-02-11
JINAN UNIVERSITY

AI Technical Summary

Problems solved by technology

[0003] However, data deduplication technology introduces data fragmentation, which seriously degrades the read performance of a deduplication system, that is, its data recovery performance.
The main way to alleviate data fragmentation is to rewrite fragmented blocks, but current methods for rewriting fragmented blocks (referred to as rewriting methods) lose a large share of the data deduplication rate while improving data recovery performance only slightly.
There are two main reasons for this shortcoming: (1) Current rewriting methods sort only the containers referenced by a single data segment according to container reference rate, and select the container with the "lowest" container reference rate one-sidedly and arbitrarily. As a result, the containers they select are not optimal, and neither are the fragmented blocks they rewrite; (2) Current rewriting algorithms cannot adaptively adjust the rewriting strategy to different workloads.
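
Reason (1) can be illustrated with a small sketch. The source does not define the container reference rate, so the code below assumes it is the fraction of a container's chunks that one data segment references. The names `CONTAINER_CAPACITY`, `reference_rates`, and `per_segment_selection`, and all sample values, are hypothetical.

```python
from collections import Counter

CONTAINER_CAPACITY = 4  # hypothetical: number of chunks stored per container


def reference_rates(segment):
    """Map container id -> fraction of that container's chunks the segment references.

    Assumed definition; the source text does not spell out the formula.
    """
    counts = Counter(container_id for _fingerprint, container_id in segment)
    return {cid: n / CONTAINER_CAPACITY for cid, n in counts.items()}


def per_segment_selection(segments, quota):
    """Traditional style: inside each segment independently, pick the
    `quota` containers with the lowest reference rate."""
    selected = []
    for seg in segments:
        rates = reference_rates(seg)
        # Ties are broken by container id so the result is deterministic.
        lowest = sorted(rates, key=lambda cid: (rates[cid], cid))[:quota]
        selected.append(lowest)
    return selected


# Each chunk is (fingerprint, container_id); both values are made up.
seg1 = [("a", 1), ("b", 1), ("c", 1), ("d", 2), ("e", 2)]
seg2 = [("f", 3), ("g", 4)]
print(per_segment_selection([seg1, seg2], quota=1))  # [[2], [3]]
```

With this sample data the per-segment choice picks container 2 (rate 0.5) from the first segment, even though containers 3 and 4 (rate 0.25 each) are the two least-referenced containers across both segments.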


Examples


Embodiment

[0032] As shown in Figures 1 and 2, the present invention discloses an approximately optimal data fragment block rewriting method oriented to data deduplication technology. The method uses a hash bucket array to sort the containers referenced by the first i data segments by container reference rate, then traverses the bucket array to select the x optimal containers (those with the lowest container reference rate) across the range of those i data segments. The traditional rewriting method instead sorts the containers referenced by a single data segment (such as data segment i) and selects the "optimal" x_i containers within the range of that single segment; when the number of cumulatively processed data segments reaches i, the selections satisfy x = x_1 + x_2 + … + x_m. It is worth noting that the containers selected by the traditional rewriting method are not the x containers with the lowest container reference rate in the range of the i data segments, which leads to...
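
The bucket-array selection described in this paragraph can be sketched as follows. This is a minimal illustration, not the patented implementation: the bucket count `NUM_BUCKETS`, the function names, the keying of entries by (segment index, container id), and the sample reference rates are all assumptions made for the example.

```python
NUM_BUCKETS = 10  # hypothetical resolution of the bucket array


def build_buckets(rates_by_segment):
    """Place every (segment index, container id) pair into the bucket
    that matches its reference rate, which is assumed to lie in 0.0..1.0."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for seg_idx, rates in enumerate(rates_by_segment):
        for cid, rate in rates.items():
            idx = min(int(rate * NUM_BUCKETS), NUM_BUCKETS - 1)
            buckets[idx].append((seg_idx, cid))
    return buckets


def select_lowest(buckets, x):
    """Traverse buckets from the lowest rate upward and collect x entries."""
    chosen = []
    for bucket in buckets:
        for entry in bucket:
            if len(chosen) == x:
                return chosen
            chosen.append(entry)
    return chosen


# Made-up reference rates for the containers referenced by two segments.
rates_by_segment = [
    {1: 0.75, 2: 0.5},   # segment 0
    {3: 0.25, 4: 0.25},  # segment 1
]
buckets = build_buckets(rates_by_segment)
print(select_lowest(buckets, x=2))  # [(1, 3), (1, 4)]
```

In this sketch, filling the buckets takes one pass over the containers and needs no comparison sort. Entries inside a bucket keep their insertion order, so the selection is only as precise as the bucket width.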


Abstract

The invention discloses an approximately optimal data fragment block rewriting method oriented to data deduplication technology. With the advent of the data age, the total amount of data has grown explosively, and the data storage and management requirements of the information world have reached the PB or even EB level. IDC research found that nearly 75% of the data in the digital world is redundant, and ESG pointed out that data redundancy in backup and archiving systems exceeds 90%. Data deduplication technology can effectively identify and eliminate duplicate data and reduce the cost of data storage. However, deduplication leads to data fragmentation, which seriously impairs data recovery performance. The present invention can accurately identify the fragmented blocks among the data blocks and alleviate data fragmentation by rewriting them. At the same time, it can adaptively switch between rewriting strategies (the optimal rewriting strategy and the aggressive rewriting strategy), thereby greatly improving both data recovery performance and the data deduplication rate.

Description

Technical field

[0001] The invention relates to the technical field of data storage and data deduplication, in particular to an approximately optimal data fragment block rewriting method oriented to data deduplication technology.

Background

[0002] With the advent of the data age, the total amount of data in the world is growing explosively. IDC research projected that the world's annual data volume would grow 44-fold, from 0.8 ZB in 2009 to 35 ZB in 2020. The spread of mobile devices, sensors, and similar equipment is accelerating this growth. The data come from an increasing number of application fields, including human genetics, social networks, financial analysis, environmental protection, energy exploration, video games, and health care. These data are not only huge in volume but also complex and diverse in structure, which brings new challenges to data storage and management and increases the risk of data management. How to effectively...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/215; G06F9/50
CPC: G06F16/215; G06F9/5016
Inventors: 邓玉辉, 张大统
Owner: JINAN UNIVERSITY