A method and device for deduplication of data
A data deduplication technology and device, applied in the fields of electrical digital data processing, special-purpose data processing applications, instruments, and the like. It addresses the problem of a low deduplication rate and improves the deduplication rate of files.
Examples
Embodiment 1
[0062] Figure 1 is a schematic flowchart of a data deduplication method according to Embodiment 1 of the present invention. The method includes:
[0063] Step 101: Identify the classification of files to be stored.
[0064] The classification of files includes commonly used files and non-commonly used files.
[0065] Specifically, in step 101, the file format of the acquired file to be stored is identified to determine its file type, and the classification category to which that file type belongs is then determined according to the file classification rules.
[0066] The file types include, but are not limited to, one or more of doc, txt, pdf, ppt, and other file types.
[0067] The file classification rules include file size (divided into large files and small files), file generation time (divided into expired files and new files), and number of occurrences (divided into commonly used files an...
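The type-identification part of step 101 can be sketched as follows. This is a minimal illustration, not the patented method: the source says only that the file format is identified, so mapping by file extension, the `KNOWN_TYPES` table, and the function name `identify_file_type` are all hypothetical.

```python
from pathlib import Path

# Hypothetical extension table covering the file types listed in [0066].
# Identifying the format by extension is an assumption; the source does not
# say how the file format is recognized.
KNOWN_TYPES = {".doc": "doc", ".txt": "txt", ".pdf": "pdf", ".ppt": "ppt"}


def identify_file_type(path: str) -> str:
    """Return the file type of a file to be stored, or "other" if unrecognized."""
    return KNOWN_TYPES.get(Path(path).suffix.lower(), "other")
```

For example, `identify_file_type("Report.PDF")` returns `"pdf"`. A real identifier could inspect header bytes instead, but doc and ppt share the same OLE2 container signature, so extension checks are the simpler illustration.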
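The three classification rules in [0067] can be illustrated as independent labels applied to one file. This is a sketch under stated assumptions: the source names the rules but not their cut-off values, so `LARGE_FILE_BYTES`, `EXPIRY_AGE`, and `COMMON_MIN_OCCURRENCES` are hypothetical thresholds, and treating "occurrences" as the occurrence count of the file's type is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds; the source does not specify the cut-off values.
LARGE_FILE_BYTES = 64 * 1024 * 1024
EXPIRY_AGE = timedelta(days=365)
COMMON_MIN_OCCURRENCES = 10


@dataclass
class FileInfo:
    size_bytes: int
    created_at: datetime  # file generation time
    occurrences: int      # assumed: occurrence count of this file's type


def apply_rules(info: FileInfo, now: datetime) -> dict[str, str]:
    """Label a file under each of the three classification rules."""
    return {
        "size": "large" if info.size_bytes >= LARGE_FILE_BYTES else "small",
        "age": "expired" if now - info.created_at > EXPIRY_AGE else "new",
        "usage": (
            "commonly used"
            if info.occurrences >= COMMON_MIN_OCCURRENCES
            else "non-commonly used"
        ),
    }
```

Keeping the three rules as separate labels leaves open how they are combined, which the truncated paragraph does not state.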
Embodiment 2
[0123] Figure 2 is a schematic flowchart of a file deduplication method according to Embodiment 2 of the present invention. Embodiment 2 is based on the same inventive concept as Embodiment 1, and the method includes:
[0124] Step 201: Determine whether the received file to be stored belongs to the commonly used files stored in the commonly used file database. If so, execute step 202; if not, execute step 206.
[0125] Specifically, in step 201, the methods for obtaining the commonly used files stored in the commonly used file database include, but are not limited to, the following.
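Step 201 is a two-way branch, sketched below. The source does not say how membership in the commonly used file database is decided; testing by file type is an assumption, and `step_202` / `step_206` are hypothetical callables standing in for steps whose content is not shown in this excerpt.

```python
from typing import Callable


def step_201(
    file_type: str,
    commonly_used_types: set[str],
    step_202: Callable[[str], str],
    step_206: Callable[[str], str],
) -> str:
    """Route a received file to step 202 or step 206.

    Membership is tested by file type, which is an assumption; the
    handlers are placeholders for steps not described here.
    """
    if file_type in commonly_used_types:
        return step_202(file_type)
    return step_206(file_type)
```

For instance, `step_201("pdf", {"pdf", "doc"}, lambda t: "202", lambda t: "206")` takes the step 202 branch.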
[0126] Figure 3 is a schematic flowchart of a method for obtaining the commonly used files in the commonly used file database.
[0127] Step 21: Scan all files in the current commonly used file database and determine the file type of each file.
[0128] Step 22: For the same file type, obtain the number of occ...
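Steps 21 and 22 amount to a per-type occurrence count over the database, which can be sketched with `collections.Counter`. The database is modeled as a list of file names and types are taken from extensions, both assumptions. Because [0128] is truncated, the selection rule in `frequent_types` (keep types seen at least `min_occurrences` times) is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_types(file_names: list[str]) -> Counter:
    """Steps 21-22 (sketch): determine each file's type and count occurrences per type."""
    return Counter(
        Path(name).suffix.lower().lstrip(".") or "other" for name in file_names
    )


def frequent_types(counts: Counter, min_occurrences: int) -> set[str]:
    """Hypothetical selection rule: keep types seen at least min_occurrences times."""
    return {t for t, n in counts.items() if n >= min_occurrences}
```

For example, scanning `["a.doc", "b.DOC", "c.pdf", "README"]` yields two doc files, one pdf, and one "other"; with a minimum of 2, only `"doc"` is kept.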
Embodiment 3
[0164] Figure 4 is a schematic flowchart of a data deduplication method according to Embodiment 3 of the present invention. Embodiment 3 is based on the same inventive concept as Embodiments 1 and 2, and the method includes:
[0165] Step 301: Monitor the files to be stored that are input through the IO port, and use a file type identifier to determine the file type of each monitored file.
[0166] Specifically, in step 301, the files to be stored that are input through the IO port are monitored in real time, and the file type of each monitored file is identified by the file type identifier.
[0167] Preferably, after the file type of the file to be stored is determined, that file type is looked up in the file type basic information library, its number of occurrences is increased by a set value, and the file type basic information is re...
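Step 301 and [0167] describe a monitor-identify-count loop, sketched below. Several pieces are assumptions: the IO stream is modeled as an iterable of file names, the file type identifier is extension-based, `SET_VALUE = 1` is a hypothetical increment, and a type not yet in the library is assumed to start from 0. `FileTypeLibrary` and `monitor` are illustrative names only.

```python
from pathlib import Path
from typing import Iterable

SET_VALUE = 1  # hypothetical; the source only says "a set value"


def identify_file_type(name: str) -> str:
    """Stand-in file type identifier (assumed to be extension-based)."""
    return Path(name).suffix.lower().lstrip(".") or "other"


class FileTypeLibrary:
    """Minimal stand-in for the file type basic information library."""

    def __init__(self) -> None:
        self.occurrences: dict[str, int] = {}

    def record(self, file_type: str, increment: int = SET_VALUE) -> int:
        """Look up file_type and raise its occurrence count by increment.

        A type not yet in the library starts from 0 (an assumption).
        """
        self.occurrences[file_type] = self.occurrences.get(file_type, 0) + increment
        return self.occurrences[file_type]


def monitor(incoming: Iterable[str], library: FileTypeLibrary) -> None:
    """Identify each incoming file's type and update the library."""
    for name in incoming:
        library.record(identify_file_type(name))
```

Feeding `["a.pdf", "b.PDF", "c.txt"]` through `monitor` leaves the library with counts of 2 for pdf and 1 for txt. How the updated counts feed back into the classification is in the truncated part of [0167].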