Method and device for deleting multiple layered repetitive data based on distributed file system

A data deduplication and distributed file technology, applied in the field of information storage, that addresses the problem of low deduplication rates.

Status: Inactive; Publication Date: 2017-05-10
TOYOU FEIJI ELECTRONICS
9 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

For example, developers make targeted modifications to software according to their own needs. The modified software then differs only slightly from the original, and existing data deduplication methods achieve a low deduplication rate on such files.

Examples

Embodiment Construction

[0056] It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0057] The invention provides a multi-layer duplicate data deletion method based on a distributed file system.

[0058] A distributed system is a software system built on the network, which has a high degree of cohesion and transparency. Cohesion means that each database distribution node is highly autonomous and has a local database management system. Transparency means that each database distribution node is transparent to user applications, and it is impossible to tell whether it is local or remote. In a distributed database system, the user does not feel that the data is distributed, that is, the user does not need to know whether the relationship is divided, whether there is a copy, which site the data is stored on, and which site the transaction is executed on. What an independent computer presents to the user...

Abstract

The invention discloses a multi-layer duplicate data deletion (deduplication) method based on a distributed file system. The method comprises the following steps: obtaining the digital fingerprint of the file to be written; determining whether that fingerprint exists in the global file digital fingerprint list; if it exists, recording the metadata information of the file to be written; if it does not exist, segmenting the file to be written according to a preset method and obtaining the digital fingerprint of each segment; determining whether each segment's digital fingerprint exists in the global segment digital fingerprint list; if it exists, recording the segment metadata information of the file to be written; if it does not exist, sending the segment and its digital fingerprint to the corresponding storage node. The invention also discloses a multi-layer duplicate data deletion device based on the distributed file system. By storing the digital fingerprints of files or segments, the technical scheme improves deduplication efficiency and saves storage space.
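The two-level check in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the abstract does not say which hash or segmentation scheme is used, so SHA-256 and fixed-size segments stand in for the "digital fingerprint" and the "preset method", and the `DedupStore` class, its dictionaries, and the in-memory `segment_store` (a stand-in for the storage nodes) are hypothetical names.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Digital fingerprint of a byte string (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    """Hypothetical in-memory model of file-level, then segment-level, deduplication."""

    def __init__(self, segment_size: int = 4):
        self.segment_size = segment_size  # assumed fixed-size segmentation
        self.file_fingerprints = {}       # file fingerprint -> ordered segment fingerprints
        self.segment_store = {}           # segment fingerprint -> bytes (stand-in for storage nodes)
        self.metadata = {}                # file name -> file fingerprint

    def write(self, name: str, data: bytes) -> int:
        """Write a file and return how many new segments had to be stored."""
        file_fp = fingerprint(data)
        # Layer 1: the whole file is already known, so only record metadata.
        if file_fp in self.file_fingerprints:
            self.metadata[name] = file_fp
            return 0
        # Layer 2: segment the file and check each segment fingerprint.
        segment_fps = []
        new_segments = 0
        for start in range(0, len(data), self.segment_size):
            segment = data[start:start + self.segment_size]
            segment_fp = fingerprint(segment)
            if segment_fp not in self.segment_store:
                # Unknown segment: "send" it with its fingerprint to storage.
                self.segment_store[segment_fp] = segment
                new_segments += 1
            # Known or not, record the segment in the file's metadata.
            segment_fps.append(segment_fp)
        self.file_fingerprints[file_fp] = segment_fps
        self.metadata[name] = file_fp
        return new_segments

    def read(self, name: str) -> bytes:
        """Reassemble a file from its recorded segment fingerprints."""
        segment_fps = self.file_fingerprints[self.metadata[name]]
        return b"".join(self.segment_store[fp] for fp in segment_fps)
```

In this sketch, writing an exact copy of a known file stores nothing new, while writing a file that differs in one segment stores only that segment; the remaining segments are recorded as references to data that already exists.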

Description

Technical field

[0001] The invention relates to the field of information storage, in particular to a multi-layer duplicate data deletion method and device based on a distributed file system.

Background technique

[0002] In an existing distributed file system, data deduplication technology is used to avoid storing duplicate data, which improves disk utilization and reduces costs. However, with the development of technology and information, files have become more and more diverse, and the chance that the entire content of two files is exactly the same is getting smaller and smaller. For example, developers make targeted modifications to software according to their own needs. In this case, there are slight differences between the modified software and the original software, and the existing data deduplication method has a low deduplication rate.

Contents of the invention

[0003] The main purpose of the present invention is to provide a method and device for deduplication of multi-la...
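The background claim, that a slightly modified file defeats whole-file deduplication, can be checked with a small experiment. The sketch below is illustrative only: SHA-256, the 512-byte fixed segment size, and the helper names are assumptions, and the modification is an in-place byte change (an insertion would shift fixed-size segment boundaries and behave differently).

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Digital fingerprint of a byte string (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def segments(data: bytes, size: int) -> list:
    """Split data into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_segment_ratio(original: bytes, modified: bytes, size: int) -> float:
    """Fraction of the modified file's segments already present in the original."""
    known = {fingerprint(s) for s in segments(original, size)}
    parts = segments(modified, size)
    if not parts:
        return 0.0
    hits = sum(1 for s in parts if fingerprint(s) in known)
    return hits / len(parts)


# Eight distinct 512-byte segments, 4096 bytes in total.
original = b"".join(bytes([i]) * 512 for i in range(8))

# "Targeted modification": change a single byte in place.
patched = bytearray(original)
patched[1000] ^= 0xFF
modified = bytes(patched)

whole_file_match = fingerprint(original) == fingerprint(modified)
ratio = shared_segment_ratio(original, modified, size=512)
```

Here the whole-file fingerprints differ, so file-level deduplication saves nothing, yet seven of the eight segments of the modified file are already stored.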

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/164; G06F16/1748; G06F16/1827
Inventor: 李发明, 张勤
Owner: TOYOU FEIJI ELECTRONICS