A cloud storage data deduplication method supporting fuzzy matching

A cloud storage deduplication technique based on fuzzy matching, applied in the computer field, that reduces storage space overhead, speeds up large-scale computation, and shortens computing time.

Active Publication Date: 2016-08-17
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

Using the fuzzy hash method, according to the actual content of the file, the block hash is performed byte by byte, and the fuzzy hash value of the


Examples

Embodiment Construction

[0053] The present invention is described in further detail below with reference to Figure 1.

[0054] Step 1: Read the content of the file to be fuzzy matched byte by byte using a memory-mapped file.

[0055] The computer operating system creates a file-mapping kernel object for the file to be fuzzy matched, reads the number of bytes in the file, and sets the paging granularity of the operating system;

[0056] The computer operating system maps the file-mapping kernel object of the file to be fuzzy matched into the address space of the process;

[0057] If the computer operating system has read all the bytes of the file to be fuzzy matched, it releases the file-mapping kernel object of the file; otherwise, it continues reading the bytes of the file.
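The three sub-steps of Step 1 can be illustrated with a minimal sketch. It assumes Python's standard `mmap` module stands in for the operating system's file-mapping calls; the helper names `iter_file_bytes` and `demo` are hypothetical and do not come from the patent. The constant `mmap.ALLOCATIONGRANULARITY` exposes the platform's mapping granularity, which is probably the closest counterpart to the "paging granularity" mentioned in [0055].

```python
import mmap
import os
import tempfile


def iter_file_bytes(path):
    """Yield the file's bytes one at a time through a read-only memory map."""
    size = os.path.getsize(path)          # number of bytes in the file
    if size == 0:                         # an empty file cannot be mapped
        return
    with open(path, "rb") as f:
        # Length 0 maps the whole file into the process address space.
        # Leaving the `with` block releases the mapping.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for offset in range(size):
                yield view[offset]        # indexing yields an int in 0..255


def demo():
    """Write a small temporary file and read it back through the mapping."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"fuzzy")
        path = tmp.name
    try:
        return bytes(iter_file_bytes(path))
    finally:
        os.remove(path)
```

Mapping the file avoids copying it into a separate buffer before hashing, which matters when the later steps have to touch every byte of a large file.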

[0058] Step 2: Calculate the metadata of the file to be fuzzy matched.

[0059] Use the rolling hash algorithm to calculate the bytes of the file to be fuzzy matched, and get the checksum of the bytes o...
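The paragraph above is cut off in the source, so the exact rolling hash used by the method is not shown. As a rough illustration of the general idea only, the sketch below computes one checksum per byte from a fixed-size sliding window, using Adler-style running sums. The window size of 7 and the function name `rolling_checksums` are assumptions, not values taken from the patent.

```python
WINDOW = 7  # assumed window size in bytes (not specified in the visible text)


def rolling_checksums(data, window=WINDOW):
    """Return one checksum per byte; each depends only on the last `window` bytes."""
    ring = [0] * window   # circular buffer holding the most recent bytes
    h1 = 0                # plain sum of the bytes currently in the window
    h2 = 0                # weighted sum: newest byte has weight `window`, oldest 1
    out = []
    for n, b in enumerate(data):
        h2 = h2 - h1 + window * b         # age every byte by one, add the new one
        h1 = h1 + b - ring[n % window]    # drop the byte that leaves the window
        ring[n % window] = b
        out.append((h1 + h2) & 0xFFFFFFFF)
    return out
```

Because each value depends only on the most recent bytes, the same byte sequence produces the same checksum wherever it appears in a file. That position independence is what allows block boundaries to survive insertions at the head of a file.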

Abstract

The invention provides a cloud storage data deduplication method supporting fuzzy matching. The method comprises the following steps: first, reading the file content; second, calculating the file metadata; third, judging whether the blocking condition is met; fourth, calculating fuzzy hash values; fifth, compressing the fuzzy hash values; sixth, calculating the index similarity; seventh, comparing the fuzzy hash values; eighth, judging whether duplicate data block hash values exist; ninth, performing block-level proof of file ownership; tenth, sending the serial numbers of the non-duplicate data blocks and uploading the non-duplicate data blocks. The method solves the problems of the prior art, in which complete files are uploaded and stored, each file is divided into equal-length blocks according to bit-string length, and duplicate data cannot be identified in files whose content is similar but whose heads and tails are misaligned. It reduces network upload bandwidth and server storage space overhead and increases the deduplication rate.
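A simplified sketch can show why content-defined blocking helps with files whose heads are misaligned. It covers only a loose version of the blocking, comparison, and serial-number steps and omits hash compression, index similarity, and ownership proof entirely. Everything specific here is an assumption made for illustration: the window size, the trigger modulus, the use of SHA-256 per block in place of the patent's fuzzy hash values, and all function names.

```python
import hashlib

WINDOW = 7    # assumed rolling-window size in bytes
TRIGGER = 64  # assumed modulus: a block ends when checksum % TRIGGER == TRIGGER - 1


def split_blocks(data, window=WINDOW, trigger=TRIGGER):
    """Cut data into variable-length blocks at content-defined boundaries."""
    ring = [0] * window
    h1 = h2 = 0
    blocks, start = [], 0
    for n, b in enumerate(data):
        h2 = h2 - h1 + window * b          # Adler-style rolling sums over the window
        h1 = h1 + b - ring[n % window]
        ring[n % window] = b
        if (h1 + h2) % trigger == trigger - 1:   # blocking condition met
            blocks.append(data[start:n + 1])
            start = n + 1
    if start < len(data):                  # trailing bytes form the last block
        blocks.append(data[start:])
    return blocks


def block_hashes(blocks):
    """Hash each block (SHA-256 here, as a stand-in for the fuzzy hash values)."""
    return [hashlib.sha256(block).hexdigest() for block in blocks]


def new_block_serials(hashes, stored):
    """Return the serial numbers of blocks whose hashes are not already stored."""
    return [i for i, h in enumerate(hashes) if h not in stored]


def sample_data():
    """Deterministic pseudo-random test data (4096 bytes)."""
    return b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))
```

In use, a client would split a file, hash its blocks, and upload only the blocks listed by `new_block_serials`. If a second file is the first one with a few bytes prepended, the boundaries after the header fall on the same content, so only the leading blocks differ. Equal-length division would instead shift every block and find no duplicates.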

Description

Technical field

[0001] The invention belongs to the field of computer technology and further relates to a cloud storage data deduplication method supporting fuzzy matching in the field of information security technology. The invention is used in cloud storage systems that support deduplication of similar files; it improves the deduplication rate, reduces network upload bandwidth, and saves storage space on the cloud storage server.

Background art

[0002] With the popularity of cloud storage services, the amount of data stored by users has grown explosively. To make the best use of network upload bandwidth and reduce server-side storage overhead, cloud storage service providers need to avoid duplicate data uploads as far as possible. Data deduplication is a technique now widely used in cloud storage systems. For files or data blocks with the same content, the cloud storage server o...

Application Information

IPC(8): G06F17/30
CPC: G06F16/1752
Inventor: 张跃宇, 庞婷, 李晖, 陈杰, 王勇, 张云鹏
Owner: XIDIAN UNIV