
Duplicated data deleting method and apparatus

A data deduplication technology and apparatus, applied in the fields of electrical digital data processing, special data processing applications, and instruments. It addresses the problem of a low deduplication rate and improves the deduplication rate of files.

Active Publication Date: 2013-09-18
HUAWEI TECH CO LTD
Cites: 5; Cited by: 15

AI Technical Summary

Problems solved by technology

In this case, the calculated fingerprint information 1 differs from the calculated fingerprint information 2, so file B is treated as non-duplicate data relative to file A. File B is therefore stored, even though much of its data is identical to the data in file A. As a result, the deduplication rate of the files (the ratio of the total size of the original files to the total size of the files output after deduplication) is relatively low.
[0008] In other words, for files of the same file type, the deduplication rate is low whenever the data used to calculate the fingerprint information changes.
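
The effect described above can be reproduced with a minimal sketch. It assumes SHA-256 as the fingerprint function and fingerprints each file as a whole; the source does not name a hash algorithm, and `fingerprint`, `dedup_rate`, `file_a`, and `file_b` are illustrative names only.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Whole-file fingerprint (SHA-256 is an assumption; the source names no hash)."""
    return hashlib.sha256(data).hexdigest()


def dedup_rate(files: list[bytes]) -> float:
    """Total original size divided by total size stored after whole-file deduplication."""
    stored: dict[str, int] = {}
    for data in files:
        # Keep one copy per distinct fingerprint.
        stored.setdefault(fingerprint(data), len(data))
    return sum(len(d) for d in files) / sum(stored.values())


file_a = b"header-v1" + b"x" * 1000
file_b = b"header-v2" + b"x" * 1000  # same body as file A, one byte differs

print(fingerprint(file_a) == fingerprint(file_b))  # False
print(dedup_rate([file_a, file_b]))  # 1.0: both files are stored in full
print(dedup_rate([file_a, file_a]))  # 2.0: the exact duplicate is removed
```

A one-byte change gives file B a different fingerprint, so nothing is saved even though almost all of its content matches file A.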



Examples


Embodiment 1

[0062] Figure 1 is a schematic flowchart of a data deduplication method according to Embodiment 1 of the present invention. The method includes:

[0063] Step 101: Identify the classification of files to be stored.

[0064] The classification of the files includes commonly used files and non-commonly used files.

[0065] Specifically, in step 101, the file format of the file to be stored is obtained and identified, the file type of the file is determined from it, and the classification to which that file type belongs is determined according to the file classification rules.

[0066] The file types include, but are not limited to, one or more of doc, txt, pdf, ppt, and other file types.

[0067] The file classification rules include file size (divided into large files and small files), file generation time (divided into expired files and new files), and occurrence times (divided into commonly used files an...
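
Step 101 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the file format is read from the file name extension, only the occurrence-count rule is applied, and `KNOWN_TYPES`, `COMMON_THRESHOLD`, `file_type`, and `classify` are hypothetical names and values.

```python
from pathlib import Path

# Hypothetical mapping from file format (extension) to file type.
KNOWN_TYPES = {".doc": "doc", ".txt": "txt", ".pdf": "pdf", ".ppt": "ppt"}

# Hypothetical threshold: a type seen at least this often counts as commonly used.
COMMON_THRESHOLD = 3


def file_type(name: str) -> str:
    """Determine the file type from the file format (here: the extension)."""
    return KNOWN_TYPES.get(Path(name).suffix.lower(), "other")


def classify(name: str, occurrences: dict[str, int]) -> str:
    """Return 'common' or 'non-common' using an occurrence-count rule."""
    count = occurrences.get(file_type(name), 0)
    return "common" if count >= COMMON_THRESHOLD else "non-common"


occurrences = {"doc": 12, "pdf": 1}
print(classify("report.DOC", occurrences))  # common
print(classify("scan.pdf", occurrences))    # non-common
```

The size-based and time-based rules mentioned in the text could be added as further functions with the same shape.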

Embodiment 2

[0123] Figure 2 is a schematic flowchart of a file deduplication method according to Embodiment 2 of the present invention. Embodiment 2 is based on the same inventive concept as Embodiment 1, and the method includes:

[0124] Step 201: Determine whether the received file to be stored belongs to the commonly used files stored in the commonly used file database. If yes, execute step 202; if not, execute step 206.

[0125] Specifically, in step 201, the methods for obtaining the commonly used files stored in the commonly used file database include but are not limited to:

[0126] Figure 3 is a schematic flowchart of a method for obtaining the commonly used files in the commonly used file database.

[0127] Step 21: Scan all files in the current commonly used file database, and determine the file type of each file.

[0128] Step 22: For the same file type, obtain the number of occ...
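
Steps 21 and 22 amount to a scan followed by a per-type tally. The sketch below covers only that counting part, since the rest of step 22 is cut off in this excerpt. It assumes the database can be represented as a list of file names and that the type is taken from the extension; `count_file_types` and `database` are hypothetical names.

```python
from collections import Counter
from pathlib import Path


def count_file_types(file_names: list[str]) -> Counter:
    """Determine each file's type, then count occurrences per type."""
    return Counter(
        Path(name).suffix.lower().lstrip(".") or "unknown" for name in file_names
    )


database = ["a.doc", "b.doc", "c.txt", "d.DOC", "notes"]
counts = count_file_types(database)
print(counts.most_common())  # [('doc', 3), ('txt', 1), ('unknown', 1)]
```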

Embodiment 3

[0164] Figure 4 is a schematic flowchart of a data deduplication method according to Embodiment 3 of the present invention. Embodiment 3 is based on the same inventive concept as Embodiments 1 and 2, and the method includes:

[0165] Step 301: Monitor the files to be stored that are input through the IO port, and use the file type identifier to determine the file type of each monitored file.

[0166] Specifically, in step 301, the files to be stored that are input through the IO port are monitored in real time, and the file type of each monitored file is identified by a file type identifier.

[0167] Preferably, after the file type of the file to be stored is determined, the determined file type is found from the file type basic information library, the number of occurrences of the determined file type is increased by a set value, and the file type basic information is re...
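
The bookkeeping in step 301 can be sketched as a small counter store. This is an assumption-laden illustration: `FileTypeInfoLibrary` is a hypothetical in-memory stand-in for the file type basic information library, `step` plays the role of the "set value", and `identify_type` replaces the real file type identifier with a simple extension check.

```python
class FileTypeInfoLibrary:
    """Hypothetical in-memory stand-in for the file type basic information library."""

    def __init__(self, step: int = 1) -> None:
        self.step = step  # the "set value" added per occurrence
        self.occurrences: dict[str, int] = {}

    def record(self, file_type: str) -> int:
        """Increase the occurrence count of a file type and return the new count."""
        self.occurrences[file_type] = self.occurrences.get(file_type, 0) + self.step
        return self.occurrences[file_type]


def identify_type(name: str) -> str:
    """Stand-in for the file type identifier: uses the extension only."""
    _, dot, ext = name.rpartition(".")
    return ext.lower() if dot else "unknown"


library = FileTypeInfoLibrary(step=1)
for incoming in ["a.doc", "b.pdf", "c.doc"]:  # files arriving through the IO port
    library.record(identify_type(incoming))

print(library.occurrences)  # {'doc': 2, 'pdf': 1}
```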


Abstract

The invention discloses a duplicated data deleting method and apparatus. The duplicated data deleting method comprises the following steps: identifying the classification of documents to be stored, determining duplicated data deleting rules used in stored documents according to the document classification, and performing duplicated data deleting on the documents to be stored according to the determined duplicated data deleting rules. According to the invention, the duplicated data deleting rules are determined according to the document classification, so that the duplicated data is deleted with pertinence, and the duplicated data deleting ratio is improved.
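
The three steps in the abstract (classify, pick a rule, deduplicate with that rule) can be sketched as a lookup table of rules. The concrete rules here are assumptions made for illustration and are not taken from the source: commonly used files are fingerprinted in fixed-size chunks, other files as a whole. `RULES`, `whole_file_units`, `fixed_chunk_units`, and `deduplicate` are hypothetical names, and SHA-256 is an assumed fingerprint function.

```python
import hashlib
from typing import Callable


def whole_file_units(data: bytes) -> list[bytes]:
    """Treat the whole file as a single unit."""
    return [data]


def fixed_chunk_units(data: bytes, size: int = 8) -> list[bytes]:
    """Split the file into fixed-size chunks (tiny size chosen for the demo)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


# Hypothetical rule table: classification -> how a file is split before fingerprinting.
RULES: dict[str, Callable[[bytes], list[bytes]]] = {
    "common": fixed_chunk_units,
    "non-common": whole_file_units,
}


def deduplicate(data: bytes, classification: str, store: dict[str, bytes]) -> int:
    """Store only units whose fingerprint is new; return the number of bytes added."""
    added = 0
    for unit in RULES[classification](data):
        key = hashlib.sha256(unit).hexdigest()
        if key not in store:
            store[key] = unit
            added += len(unit)
    return added


store: dict[str, bytes] = {}
print(deduplicate(b"AAAAAAAABBBBBBBB", "common", store))  # 16
print(deduplicate(b"AAAAAAAACCCCCCCC", "common", store))  # 8: first chunk already stored
```

Keeping the rules in a table means a new classification only needs a new entry, which matches the idea of choosing the rule from the classification rather than hard-coding one strategy.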

Description

Technical field

[0001] The invention relates to the field of data storage, and in particular to a method and device for deduplicating data based on file classification.

Background

[0002] With the popularization of cloud computing technology, cloud-computing-based virtual desktop infrastructure (VDI) applications have developed rapidly. At present, many large enterprises and governments, at home and abroad, have switched from traditional personal computers (PCs) to VDI desktop clouds, so that PCs that were originally isolated from each other, like information islands, are organically connected.

[0003] According to research data, 60% of the data stored by different users is duplicate data, and the proportion of duplicate data stored by different users in the same work department is as high as 80%. Therefore, in the field of data storage, how to effectively Deduplic...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 周景才
Owner: HUAWEI TECH CO LTD