Content-defined chunking remote file real-time updating method

A real-time remote file update technology, applied to electrical components, program control devices, program loading/starting, etc. It addresses the high update overhead of distributed storage systems, the resulting impact on the network I/O performance of the storage system, and the lack of support for random file writes.

Active Publication Date: 2014-04-16
NAT UNIV OF DEFENSE TECH
Cites: 3; Cited by: 31

AI Technical Summary

Problems solved by technology

Existing methods mostly send the entire updated file back to the distributed storage system. This increases network transmission overhead and degrades the network I/O performance of the storage system, especially under heavy I/O load.

[0005] How to address the high update overhead of distributed storage systems, and their general lack of support for random file writes, is an important technical issue for those skilled in the art. Solving it would effectively reduce network transmission overhead during file updates, and the solution could be applied to distributed storage systems that support WAN-level applications. Current mainstream distributed storage systems such as GFS and HDFS also fall short here: they support only file append operations, not random file writes.

Examples

Detailed Description of the Embodiments

[0043] Figure 1 is a schematic diagram of the file update process.

[0044] The cloud server chunks the stored file F_old by size and computes the fingerprint information and hash value of each chunk. The user downloads the file and its chunk fingerprint information from the cloud server, modifies the file, and prepares to upload the result, F_new. Before uploading, the user terminal chunks F_new by content, guided by the chunk information of F_old, and computes the hash value of each F_new chunk. The user uploads these hash values to the cloud server, and the server compares the hash values of the old and new chunks to determine which chunks of F_new already exist in the cloud (not modified by the user) and which must be uploaded over the network (modified by the user). Finally, the server stitches the chunks together to complete the file update in the cloud.
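The exchange in [0044] can be illustrated with a minimal Python sketch. It is not the patented implementation: the chunking step is skipped (the chunk lists are supplied by hand), SHA-256 stands in for the unspecified hash algorithm, and the names `digest`, `find_missing`, and `stitch` are hypothetical.

```python
import hashlib


def digest(chunk: bytes) -> str:
    """Hash value of one chunk (SHA-256 is an assumption; the source only says "Hash")."""
    return hashlib.sha256(chunk).hexdigest()


def find_missing(server_digests, new_digests):
    """Server side: indexes of F_new chunks whose hash is not already stored."""
    known = set(server_digests)
    return [i for i, d in enumerate(new_digests) if d not in known]


def stitch(server_store, uploaded, new_digests):
    """Server side: rebuild F_new from stored chunks plus the uploaded ones."""
    available = {**server_store, **uploaded}
    return b"".join(available[d] for d in new_digests)


# Hypothetical pre-chunked files; the user modified only the middle chunk.
old_chunks = [b"header;", b"body-v1;", b"footer"]
new_chunks = [b"header;", b"body-v2;", b"footer"]

# What the cloud already holds, keyed by chunk hash.
server_store = {digest(c): c for c in old_chunks}

# The client first sends only the hash values of the F_new chunks.
new_digests = [digest(c) for c in new_chunks]

# The server compares hashes and reports which chunks it lacks.
missing = find_missing(server_store.keys(), new_digests)

# The client uploads only those chunks; the server stitches the file.
uploaded = {new_digests[i]: new_chunks[i] for i in missing}
f_new = stitch(server_store, uploaded, new_digests)
```

In this toy run only one of the three chunks crosses the network; the other two are recognized by hash and reused from the server's store.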

[0045] Figure 2 is an overall flow diagram of the inventi...

Abstract

The invention discloses a content-defined chunking method for updating remote files in real time. It aims to provide an incremental file update method with low network transmission overhead for distributed storage systems, so that such systems can support random file writes at low cost. The method comprises the following steps: storing the original file F_old in chunks; computing a digest for each data chunk with a hash algorithm; computing a fingerprint for the beginning and the end of each data chunk with the Rabin fingerprinting algorithm; chunking the new file F_new by content-defined chunking, with reference to the chunking information of F_old; computing a digest for each data chunk of F_new with the hash algorithm; comparing the digests of the F_old chunks with those of the F_new chunks to find the changed chunks; deleting from F_old the chunks to be deleted; and adding to F_old the chunks to be updated. With this method, a distributed storage system can support random file writes at low overhead.
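To show why chunking by content keeps unchanged data recognizable after an edit, here is a minimal Python sketch of generic content-defined chunking. It is an illustration under stated assumptions, not the method claimed in the patent: a simple polynomial rolling hash replaces the Rabin fingerprint, boundaries are chosen from the data alone rather than by reference to the chunk information of F_old, and `cdc_chunks`, `BASE`, `MOD`, `window`, and `avg_size` are hypothetical names and parameters.

```python
import hashlib

BASE = 257             # polynomial base (assumed value)
MOD = (1 << 31) - 1    # prime modulus (assumed value)


def cdc_chunks(data: bytes, window: int = 16, avg_size: int = 64) -> list:
    """Cut `data` wherever the hash of the last `window` bytes hits a target value."""
    chunks = []
    start = 0
    h = 0
    top = pow(BASE, window - 1, MOD)  # weight of the byte that leaves the window
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # add the newest byte
        # Test for a boundary only once the window is full.
        if i >= window - 1 and h % avg_size == avg_size - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the last chunk
    return chunks


# Deterministic pseudo-random data standing in for F_old (6400 bytes).
f_old = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(200))
# F_new: the same file with a few bytes inserted at the front.
f_new = b"a few inserted bytes" + f_old

old_chunks = cdc_chunks(f_old)
new_chunks = cdc_chunks(f_new)

# Only chunks whose digest the server has not seen would need uploading.
old_digests = {hashlib.sha256(c).hexdigest() for c in old_chunks}
to_upload = [c for c in new_chunks
             if hashlib.sha256(c).hexdigest() not in old_digests]
print(len(new_chunks), "chunks in F_new,", len(to_upload), "need uploading")
```

Because the boundary test looks only at the last `window` bytes, an insertion shifts later boundaries without changing where they fall relative to the content. Every chunk of F_old after the first one therefore reappears unchanged in F_new, whereas fixed-size chunking would shift every later chunk and change its digest.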

Description

Technical field

[0001] The invention relates to a data update method in a distributed storage system, in particular a distributed storage system that supports wide-area-network-level applications.

Background

[0002] With the rapid development of cloud storage, storage technology is undergoing revolutionary changes. Traditional file systems can no longer meet the needs of massive data storage, and distributed storage systems have emerged in response. Typical distributed storage systems include master-slave distributed file systems such as Google's GFS (Google File System) and HDFS from the open-source Hadoop project, and flat, ring-structured key-value storage systems such as Amazon's Dynamo and Facebook's Cassandra.

[0003] Distributed storage systems have shown unique advantages in storage capacity, scalability, reliability, and performance. They are therefore more and more widely used in massive data processing and are grad...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/445, H04L29/08
Inventor: 廖湘科, 李珊珊, 刘晓东, 彭绍亮, 谢欣伟, 贾周阳, 董德尊, 张菁, 林彬, 孔志印, 刘磊
Owner: NAT UNIV OF DEFENSE TECH