Data de-duplication method for lossless compressed files

A deduplication technology for compressed files, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the problem that existing deduplication cannot find, identify, or remove redundant data in compressed files.

Inactive Publication Date: 2016-08-31
CHONGQING UNIV
6 Cites, 15 Cited by

AI Technical Summary

Problems solved by technology

However, for compressed files (that is, files that have already been compressed with a traditional compression method), deduplication cannot discover potentially redundant data.
This is mainly because identical or similar files, once compressed with different compression algorithms, each yield a data stream completely different from the original file, so deduplication technology cannot identify their potentially redundant data.
To reduce data volume, storage systems transmit and store many files in compressed form, so the inability to identify potentially redundant data in compressed files is a major defect of deduplication technology.
The present invention proposes a data deduplication method for lossless compressed files that can effectively identify and remove redundant data in compressed files, filling the technical gap that data deduplication cannot be performed on compressed files.
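The problem described above can be reproduced in a few lines. The following is a minimal sketch, not taken from the patent: it compresses the same sample bytes with two standard-library compressors and shows that the compressed streams differ while a CRC-32 of the uncompressed content stays the same. The sample data and variable names are illustrative assumptions.

```python
import bz2
import zlib

# Hypothetical sample content; any byte string would do.
original = b"The same report, archived twice by two different tools.\n" * 200

deflated = zlib.compress(original)  # DEFLATE stream in a zlib container
bzipped = bz2.compress(original)    # bzip2 stream

# The compressed byte streams differ, so deduplication that compares
# the stored bytes finds nothing in common between them.
streams_differ = deflated != bzipped

# A check code over the uncompressed content does not depend on the compressor.
crc_from_deflate = zlib.crc32(zlib.decompress(deflated))
crc_from_bzip2 = zlib.crc32(bz2.decompress(bzipped))
same_content_crc = crc_from_deflate == crc_from_bzip2

print(streams_differ, same_content_crc)
```

This is why a check code of the original content is a plausible candidate for a file signature: it survives a change of compression algorithm where the compressed bytes do not.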

Examples

Embodiment Construction

[0020] The parties involved in the present invention are a client and a storage server. There are two implementation modes: 1) Independent processing mode: the client transmits the compressed file to the storage server, and the storage server processes it. 2) Cooperation mode: the client and the storage server cooperate to process the compressed files.

[0021] Figure 1 is a schematic diagram of the module structure of the present invention. It mainly comprises five parts: a file signature extraction module 101, a duplicate file identification module 102, a file signature library management module 103, a compressed package and compressed package spectrum construction module 104, and a repeated data deletion module 105. The file signature extraction module 101 extracts the file signature of each compressed file in the compressed package; the duplicate file identification module 102 finds duplicate files by cons...

Abstract

The invention provides a data deduplication method for lossless compressed files. The method uses the data integrity check code of a compressed file, such as a cyclic redundancy check (CRC) code, as a file signature to identify duplicate compressed files. Where the required collision rate demands it, other file attributes such as file length can also be extracted, so that the file length and the check code together serve as the file signature for identifying duplicate files. If the compressed file has no check code, a check code is computed, or a hash value is computed with a hash algorithm, and that value serves as the file signature. The method can be integrated with conventional data deduplication techniques and fills the technical gap that data deduplication cannot be performed on compressed files.
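As an illustration of the signature idea in the abstract, the sketch below uses ZIP archives, whose metadata records a CRC-32 and the uncompressed length of every member. It is a sketch under stated assumptions rather than the patented implementation: the function names (`file_signatures`, `hash_signature`, `find_duplicates`, `make_zip`), the in-memory dictionary standing in for a signature library, and the choice of SHA-256 as the fallback hash are all hypothetical.

```python
import hashlib
import io
import zipfile


def file_signatures(archive_bytes):
    """Yield (member_name, signature) for each file in a ZIP archive.

    The signature is (CRC-32, uncompressed length), read from the archive
    metadata, so no member has to be decompressed.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            yield info.filename, (info.CRC, info.file_size)


def hash_signature(data):
    """Assumed fallback when a format stores no check code: hash the content."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(archives, signature_library=None):
    """Return (archive, member, first_seen) for members whose signature is known."""
    library = {} if signature_library is None else signature_library
    duplicates = []
    for archive_name, archive_bytes in archives:
        for member, signature in file_signatures(archive_bytes):
            if signature in library:
                duplicates.append((archive_name, member, library[signature]))
            else:
                library[signature] = (archive_name, member)
    return duplicates


def make_zip(members, compression):
    """Build a ZIP archive in memory from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=compression) as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# The same report stored in two archives with different compression methods.
report = b"quarterly figures\n" * 500
zip_a = make_zip({"report.txt": report, "notes.txt": b"draft"}, zipfile.ZIP_DEFLATED)
zip_b = make_zip({"copy/report.txt": report}, zipfile.ZIP_STORED)

duplicates = find_duplicates([("a.zip", zip_a), ("b.zip", zip_b)])
print(duplicates)
```

A CRC-32 has only 32 bits, so unrelated files can collide. Pairing it with the file length, as the abstract suggests, lowers that risk, and a system with stricter collision requirements could confirm a match with a stronger hash before discarding any data.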

Description

Technical field

[0001] The invention belongs to the technical field of computer information storage, and in particular relates to a data deduplication method for lossless compressed files.

Background art

[0002] In the field of information storage technology, data compression is a common means of reducing the amount of data. Data compression includes lossless compression and lossy compression. Lossless compression reduces the amount of data mainly by gathering statistics on the redundant information in the original data within a certain range, using those statistics to re-encode the original data, and removing the redundant data. Different lossless compression algorithms use different encoding methods. Lossless compression is widely used for text data, programs, and image data in special applications that require data to be stored exactly. The lossy compression method takes advantage of the insensitivity of human vis...

Application Information

IPC(8): G06F17/30
CPC: G06F16/1748
Inventor: 谭玉娟, 晏志超
Owner: CHONGQING UNIV