Duplicate data detection method based on locality optimization

A duplicate data detection technique, applied in fields such as electrical digital data processing, data processing input/output, and instruments. It addresses problems such as low detection performance and the inability to detect duplicate data in large datasets, with the effect of improving detection efficiency, retrieval performance, and accuracy.

Active Publication Date: 2017-11-24
HUAZHONG UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0005] In view of the above defects or improvement needs of the prior art, the present invention provides a duplicate data detection method based on locality optimization. It addresses the technical problem of implementing effective duplicate data detection for large datasets.



Examples


Embodiment Construction

[0040] To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here serve only to explain the present invention, not to limit it. In addition, the technical features of the various embodiments described below can be combined with one another as long as they do not conflict.

[0041] The invention proposes an efficient, fast duplicate fingerprint detection technique. It mainly targets duplicate data detection for dataset types with strong locality, and uses a Bloom filter and caching to implement an optimization strategy of one level of pre-judgment followed by three levels of detection, improving the performance of the duplicate data detecti...
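
The Bloom filter pre-judgment mentioned in [0041] can be illustrated with a minimal Python sketch. This is not the patent's implementation: the class `BloomFilter`, the helper `fingerprint_of`, the bit-array size, the number of hash functions, and the use of SHA-1 and SHA-256 are all assumptions chosen for illustration, since the source does not specify them.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over byte-string fingerprints (illustrative only)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        # Sizes are hypothetical; a real system would tune them to the dataset.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, fingerprint: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + fingerprint).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, fingerprint: bytes) -> None:
        for pos in self._positions(fingerprint):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fingerprint: bytes) -> bool:
        # False means "definitely new"; True means "possibly a duplicate".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(fingerprint))


def fingerprint_of(block: bytes) -> bytes:
    # Assumption: SHA-1 as the block fingerprint; the source does not name a hash.
    return hashlib.sha1(block).digest()
```

A negative answer from `might_contain` lets a block be treated as new without any further lookup, while a positive answer only means the block may be a duplicate and still has to be confirmed by the later detection levels.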


Abstract

The invention discloses a duplicate data detection method based on locality optimization, belonging to the technical field of computer storage. It addresses the low detection efficiency of existing duplicate data detection methods, a problem that worsens as the scale of stored data grows. The method comprises four steps: Bloom filter detection, hash bucket write buffer detection, hash bucket read buffer detection, and hash bucket address table detection. The method mainly targets dataset types with high locality. By exploiting the locality within the dataset, it improves data prefetching efficiency, reduces disk access overhead, and raises the throughput of data deduplication. For data in the dataset that may be duplicated, the Bloom filter is first used to pre-judge whether a data block is a duplicate. Then, depending on the case, three levels of duplicate data detection are carried out in the hot zone of the buffer cache, the cold zone of the buffer cache, and on disk, respectively. In this way the locality of duplicate data is fully exploited and the effectiveness of duplicate data detection is improved.
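
The lookup order described in the abstract can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the claimed method: the names `DedupIndex`, `bucket_of`, `lookup`, and `flush` are hypothetical, a Python `set` stands in for the Bloom filter (so it never gives false positives), a dictionary simulates the on-disk hash bucket table, and treating the write buffer as the first level and the read buffer as the second is one possible reading of the hot-zone and cold-zone description.

```python
import hashlib
from collections import OrderedDict

NUM_BUCKETS = 8  # hypothetical number of hash buckets


def bucket_of(fp: bytes) -> int:
    # Assumption: the bucket is chosen from the first fingerprint byte.
    return fp[0] % NUM_BUCKETS


class DedupIndex:
    def __init__(self, read_cache_buckets: int = 2):
        self.seen = set()                # stand-in for the Bloom filter
        self.write_buffer = {}           # bucket id -> new, unflushed fingerprints
        self.read_cache = OrderedDict()  # bucket id -> prefetched fingerprints (LRU)
        self.disk = {}                   # simulated on-disk bucket table
        self.read_cache_buckets = read_cache_buckets

    def lookup(self, fp: bytes) -> str:
        """Return 'new', or the level at which the duplicate was found."""
        b = bucket_of(fp)
        # Pre-judgment: a miss here means the block is definitely new.
        if fp not in self.seen:
            self.seen.add(fp)
            self.write_buffer.setdefault(b, set()).add(fp)
            return "new"
        # Level 1: recently written fingerprints still held in memory.
        if fp in self.write_buffer.get(b, ()):
            return "write_buffer"
        # Level 2: buckets previously prefetched from disk.
        if b in self.read_cache and fp in self.read_cache[b]:
            self.read_cache.move_to_end(b)
            return "read_cache"
        # Level 3: the on-disk table; a hit prefetches the whole bucket.
        if fp in self.disk.get(b, ()):
            self._prefetch(b)
            return "disk"
        # Only reachable with a real Bloom filter (false positive): treat as new.
        self.write_buffer.setdefault(b, set()).add(fp)
        return "new"

    def _prefetch(self, b: int) -> None:
        self.read_cache[b] = set(self.disk[b])
        self.read_cache.move_to_end(b)
        while len(self.read_cache) > self.read_cache_buckets:
            self.read_cache.popitem(last=False)  # evict least recently used bucket

    def flush(self) -> None:
        # Move buffered fingerprints to the simulated disk table.
        for b, fps in self.write_buffer.items():
            self.disk.setdefault(b, set()).update(fps)
        self.write_buffer.clear()
```

In this sketch, looking up the same fingerprint twice returns `"new"` and then `"write_buffer"`; after `flush()`, the next lookup returns `"disk"` and prefetches the bucket, so the lookup after that returns `"read_cache"`. Prefetching a whole bucket on a disk hit is how the sketch models the locality argument: nearby duplicates are then likely to be resolved in memory.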

Description

Technical field

[0001] The invention belongs to the technical field of computer storage, and more particularly relates to a method for detecting duplicate data based on locality optimization.

Background

[0002] With the rapid development of information technology, information has become a precious resource and the biggest driving force behind the rapid growth of productivity. The widespread use of information technology also generates massive amounts of data, and more and more valuable data needs to be stored. How to effectively improve the storage efficiency of existing storage media to meet ever-increasing storage demand has therefore become one of the urgent problems in storage research. At the same time, an IDC research report shows that about 75% of existing data is redundant, that is, only 25% of the data is unique. In this context, data deduplication, as a new technology to ...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F3/06
CPC: G06F3/061, G06F3/0641, G06F3/0656
Inventors: 王桦, 周可, 张攀峰
Owner HUAZHONG UNIV OF SCI & TECH