Cold and hot index identification and classification management method in data deduplication system

A hot and cold index classification management technology, applied in the fields of electronic digital data processing, special data processing applications, and digital data information retrieval. It addresses the problem of reduced data backup performance, with the effects of avoiding frequent disk access, lowering the misjudgment rate, and improving performance.

Active Publication Date: 2020-06-12
JINAN UNIVERSITY

AI Technical Summary

Problems solved by technology

Updating a fragment index is very likely to trigger disk I/O, so traditional rewriting algorithms reduce data backup performance to some extent.



Examples


Embodiment 1

[0030] This embodiment discloses a method for identifying and classifying hot and cold indexes in a data deduplication system. Indexes are divided into cold indexes and hot indexes according to how frequently, or with what probability, they are accessed; cold indexes can be further divided into fragmented indexes and useless indexes. Managing the indexes by class improves the overall performance of the data deduplication system.
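The classification described in [0030] can be sketched roughly as follows. This is a minimal illustration, not the patented procedure: the source does not say how fragmented and useless indexes are told apart, so the rule used here (a cold index with zero accesses is "useless", any other cold index is "fragmented") is purely an assumption, as are the names `classify_indexes`, `IndexClass`, and `hot_threshold`.

```python
from collections import Counter
from enum import Enum


class IndexClass(Enum):
    HOT = "hot"
    FRAGMENTED = "fragmented"  # cold, but still accessed occasionally (assumed rule)
    USELESS = "useless"        # cold and never accessed (assumed rule)


def classify_indexes(known_fingerprints, access_log, hot_threshold=2):
    """Label each known fingerprint by how often it appears in access_log.

    hot_threshold is a hypothetical tuning knob, not a value from the source.
    """
    counts = Counter(access_log)
    labels = {}
    for fp in known_fingerprints:
        n = counts[fp]  # Counter returns 0 for fingerprints never accessed
        if n >= hot_threshold:
            labels[fp] = IndexClass.HOT
        elif n > 0:
            labels[fp] = IndexClass.FRAGMENTED
        else:
            labels[fp] = IndexClass.USELESS
    return labels
```

A real system would presumably derive access frequency from backup history rather than a flat log; the flat list keeps the sketch self-contained.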

[0031] Traditional methods do not identify or separate hot and cold indexes. Under such methods, the data deduplication system manages indexes in the following steps:

[0032] 1) Cold and hot indexes are stored together in memory, and all indexes (cold and hot) are mapped to the Bloom filter;

[0033] 2) As the number of backup versions and the amount of backup data grow, the number of indexes also grows. When memory is no longer sufficient to store all the indexes, a part of the cold indexes...
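One cost of mapping every index into the Bloom filter, as in step 1), can be made concrete with the standard false-positive estimate for a Bloom filter with m bits, k hash functions and n inserted items: p ≈ (1 - e^(-kn/m))^k. The sketch below evaluates that estimate. The filter size, hash count and index counts in the usage lines are illustrative assumptions, not figures from the patent.

```python
import math


def bloom_false_positive_rate(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Standard approximation of a Bloom filter's false-positive probability."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


# Hypothetical sizes: one filter, first loaded with every index,
# then loaded with an assumed hot subset only.
M_BITS = 8 * 1024 * 1024
K_HASHES = 4
rate_all = bloom_false_positive_rate(M_BITS, K_HASHES, n_items=2_000_000)
rate_hot = bloom_false_positive_rate(M_BITS, K_HASHES, n_items=500_000)
print(f"all indexes: {rate_all:.4f}, hot only: {rate_hot:.4f}")
```

For a fixed filter size, the estimate grows with the number of inserted items, so inserting fewer indexes lowers the expected false-positive rate.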

Embodiment 2

[0052] As shown in Figure 1, Figure 2 and Figure 3, the hot and cold index identification and classification management method disclosed in the present invention is intended to prevent the data deduplication system from frequently accessing the index on disk (the disk index in the figures) during index lookup. To that end, it classifies and manages indexes by container utilization (the frequency or probability with which a container is accessed during a given backup). Cold indexes are removed from the global index / memory and placed on disk, which frees memory for prefetching hot indexes so that index lookups hit in memory as often as possible. This keeps the data deduplication system from accessing the index on disk and improves data backup performance. Only hot indexes are mapped to the Bloom filter, which reduces the Bloom filter's false positive rate, so as to...
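A rough sketch of classification by container utilization might look like the following. It assumes utilization is measured as the fraction of a container's chunks referenced by one backup, that each fingerprint lives in exactly one container, and that a fixed cutoff separates hot from cold; the function names, the `threshold` value and the dictionary layout are all hypothetical and do not come from the patent.

```python
def container_utilization(container_chunks, referenced):
    """Fraction of a container's chunks referenced during one backup (assumed metric)."""
    if not container_chunks:
        return 0.0
    hits = sum(1 for fp in container_chunks if fp in referenced)
    return hits / len(container_chunks)


def split_index(containers, referenced, threshold=0.5):
    """Partition fingerprint -> container-id entries into hot and cold maps.

    containers: dict mapping container id -> list of chunk fingerprints.
    referenced: set of fingerprints touched by the backup being analysed.
    threshold:  hypothetical utilization cutoff between hot and cold.
    """
    hot, cold = {}, {}
    for container_id, chunks in containers.items():
        is_hot = container_utilization(chunks, referenced) >= threshold
        target = hot if is_hot else cold
        for fp in chunks:
            target[fp] = container_id
    return hot, cold
```

In this sketch the `hot` map stands in for the in-memory index whose keys would be the only ones inserted into the Bloom filter, and the `cold` map stands in for entries moved to the on-disk index.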


Abstract

The invention discloses a hot and cold index identification and classification management method for a data deduplication system. It targets the biggest bottleneck of data deduplication technology in the field of data storage: when the volume of backup data reaches the PB or EB level or above, memory is insufficient to hold the indexes of all data blocks, so the index-lookup-intensive data deduplication system frequently accesses the index on disk, which severely degrades its performance. The method identifies and separates hot and cold indexes for the first time, where a hot index is one that is frequently accessed and a cold index is one that is rarely accessed. Removing cold indexes from memory or the global index improves memory utilization as well as data backup and data recovery performance, and ultimately improves the overall performance of the data deduplication system. The method can be applied to data deduplication systems with high locality among the various backup data streams.

Description

Technical field

[0001] The invention relates to the technical field of data storage and deduplication, in particular to a hot and cold index identification and classification management method in a data deduplication system.

Background

[0002] The explosive growth of data poses a serious challenge to storage space. Researchers have found that data contains a large amount of duplication; storing duplicate data wastes storage space and increases storage costs. Data deduplication technology identifies duplicate and unique data blocks, and stores only a single copy of each duplicate data block along with the unique blocks, which greatly reduces storage space overhead and saves enterprises considerable cost.

[0003] Data deduplication can generally be divided into five stages: 1) read, 2) chunk, 3) compute hash values, 4) deduplicate, 5) filter. Specifically, 1) the data to be backed up is first read in the form of a data st...
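The first four stages named in [0003] can be illustrated with a toy pipeline. This is only a sketch under simplifying assumptions: fixed-size chunking, SHA-1 fingerprints, and an in-memory dictionary as the index are choices made for the example, and the fifth stage (filter) is left out because the source text is cut off before describing it. The names `deduplicate` and `restore` are hypothetical.

```python
import hashlib
import io


def deduplicate(stream, chunk_size=4096):
    """Read a byte stream, chunk it, fingerprint each chunk, and store unique chunks once."""
    index = {}   # fingerprint -> position of the chunk in `store`
    store = []   # unique chunks, each kept exactly once
    recipe = []  # ordered fingerprints needed to rebuild the stream
    while True:
        chunk = stream.read(chunk_size)           # stages 1 and 2: read and chunk
        if not chunk:
            break
        fp = hashlib.sha1(chunk).hexdigest()      # stage 3: compute hash value
        if fp not in index:                       # stage 4: index lookup
            index[fp] = len(store)
            store.append(chunk)
        recipe.append(fp)
    return index, store, recipe


def restore(index, store, recipe):
    """Rebuild the original bytes from the recipe and the unique chunk store."""
    return b"".join(store[index[fp]] for fp in recipe)


data = b"A" * 4096 * 3 + b"B" * 4096
index, store, recipe = deduplicate(io.BytesIO(data))
```

The `fp not in index` lookup is the operation that becomes expensive once the index no longer fits in memory, which is the bottleneck the abstract describes.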


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/174, G06F16/13
CPC: G06F16/1752, G06F16/13, Y02D10/00
Inventor: 邓玉辉, 张大统
Owner JINAN UNIVERSITY