Distributed duplication elimination system-oriented data routing method

A data routing technology for distributed deduplication, applied in transmission systems, database indexing, electronic digital data processing, etc. It addresses the problems of uneven deduplication effects across nodes, difficulty in achieving scalability, and the inability to guarantee the data deduplication rate, with the effects of improving reliability and scalability and suppressing the growth of communication and computing overhead.

Active Publication Date: 2014-03-12
INST OF COMPUTING TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

However, this method has two shortcomings. First, because it does not consult the data already stored, the deduplication rate at the target deduplication node cannot be guaranteed. Second, because it ignores the current storage utilization of the deduplication nodes, and because data deduplicates to different degrees on different deduplication server nodes, data islands arise: one deduplication server node ends up storing far more data than the others.
The disadvantage of this method is that it requires additional summary storage nodes for queries, and the memory overhead of the data summaries is very large, so it is difficult to scale well.
[0006] Thus, how to improve the scalability of data routing in a distributed deduplication system while balancing deduplication effect against storage utilization, and how to suppress the growth of communication and computing overhead during fingerprint queries, remain unsolved problems.

Examples

Embodiment Construction

[0036] The present invention will be described below in conjunction with the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0037] According to an embodiment of the present invention, a data routing method for a distributed deduplication system is provided. Briefly, the method uses multiple nodes as summary storage nodes, with the data summaries of each category of fingerprints stored on a different summary storage node. The fingerprints of all data blocks in the data to be deduplicated are sent, by category, to the corresponding summary storage nodes, and each fingerprint is queried there to obtain its hit score for each deduplication node. The method then aggregates, for each deduplication node, the summary score of all fingerprints in the data, and summarizes the score according to the dedupli...
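The categorize, query, and aggregate steps of this paragraph can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the class `SummaryNode`, the helper names, the choice of SHA-1 as the fingerprint, the modulo rule for assigning a fingerprint to a category, and the definition of a hit score as a simple hit count are all hypothetical, since the excerpt does not specify them.

```python
import hashlib
from collections import defaultdict


def fingerprint(block: bytes) -> str:
    # Assumption: a block fingerprint is its SHA-1 digest (not stated in the source).
    return hashlib.sha1(block).hexdigest()


def category_of(fp: str, num_summary_nodes: int) -> int:
    # Assumption: a fingerprint's category is its value modulo the number
    # of summary storage nodes, so each category maps to exactly one node.
    return int(fp, 16) % num_summary_nodes


class SummaryNode:
    """Hypothetical summary storage node for one fingerprint category.

    Maps each fingerprint to the set of deduplication nodes that already
    store the corresponding block.
    """

    def __init__(self):
        self.index = defaultdict(set)

    def record(self, fp: str, dedup_node: str) -> None:
        self.index[fp].add(dedup_node)

    def query(self, fps):
        # Assumption: the hit score of a deduplication node is the number
        # of queried fingerprints that it already holds.
        scores = defaultdict(int)
        for fp in fps:
            for node in self.index.get(fp, ()):
                scores[node] += 1
        return dict(scores)


def summary_scores(blocks, summary_nodes):
    """Server side: categorize fingerprints, query each summary node once,
    and add up the returned hit scores per deduplication node."""
    by_category = defaultdict(list)
    for block in blocks:
        fp = fingerprint(block)
        by_category[category_of(fp, len(summary_nodes))].append(fp)

    totals = defaultdict(int)
    for category, fps in by_category.items():
        for node, score in summary_nodes[category].query(fps).items():
            totals[node] += score
    return dict(totals)
```

Because each category is handled by a single summary node, the server sends at most one batched query per summary node, which is one plausible reading of how the method limits communication overhead.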

Abstract

The invention provides a data routing method for a distributed deduplication system, comprising the following steps. A server categorizes the fingerprints of all data blocks that make up the data and sends fingerprints of different categories to different summary storage nodes, each of which stores the data summaries of the fingerprints of its category. Each summary storage node queries the fingerprints it receives to obtain their hit scores for each deduplication node, and returns the hit scores to the server. The server derives a summary score for each deduplication node from the hit scores of the fingerprints, and determines the target deduplication node by combining the summary scores with the storage conditions of the deduplication nodes. The method balances deduplication effect against storage utilization, effectively suppresses the communication and computing overhead of the fingerprint query process, and improves the scalability of data routing in a distributed deduplication system.
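The final step, combining summary scores with storage conditions, might look like the following Python sketch. The abstract does not give the combining rule, so the weighting used here (summary score discounted by a node's storage use relative to the average) is an assumption chosen only for illustration, and `choose_target` and its parameters are hypothetical names.

```python
def choose_target(summary_scores, used_bytes):
    """Pick a target deduplication node.

    summary_scores: dict mapping node id -> summary score (missing = 0)
    used_bytes:     dict mapping node id -> bytes currently stored
    """
    avg_used = sum(used_bytes.values()) / len(used_bytes)

    def weighted(node):
        score = summary_scores.get(node, 0)
        # Assumed rule: relative usage above the average lowers the weight,
        # so heavily loaded nodes need a higher score to be chosen.
        relative_usage = used_bytes[node] / avg_used if avg_used > 0 else 0.0
        # The +1 terms keep the weight finite and let nodes with no hits
        # still be ranked by how empty they are.
        return (score + 1) / (1 + relative_usage)

    # Iterate in sorted order so that ties resolve deterministically.
    return max(sorted(used_bytes), key=weighted)
```

With equal summary scores the less loaded node is chosen, which counteracts the data-island effect, while a sufficiently large score advantage still routes data to a fuller node to preserve the deduplication rate.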

Description

Technical field

[0001] The present invention generally relates to deduplication technology, and specifically to a data routing method for a distributed deduplication system.

Background art

[0002] Since mankind entered the digital information age, a large amount of information has been recorded as data. From the basic needs of food, clothing, housing, and transportation to education, medical care, and business, and from the traditional Internet to the mobile Internet driven by smartphones, more and more people and devices take part in creating data, and the total amount of data generated each year is growing explosively. At the same time, because of the potential commercial and scientific value of the data, more and more of it is recorded and preserved. According to a research report by International Data Corporation (IDC), the data created and copied globally in 2011 reached 1.8 ZB, and according to the trend, this number will be...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, H04L29/08
CPC: G06F3/0634, G06F16/22, H04L67/10, H04L67/1097
Inventor: 刘厚贵, 邢晶, 霍志刚, 安学军
Owner INST OF COMPUTING TECH CHINESE ACAD OF SCI