A file marking and deduplication analysis method, terminal equipment and storage medium

A file marking and analysis technique, applied in file systems, file/folder operations, instruments, etc. It addresses the waste of time, manpower and resources caused by re-analyzing a whole file, and it prevents repeated analysis, enables efficient positioning, and improves analysis efficiency.

Active Publication Date: 2020-09-11
XIAMEN MEIYA PICO INFORMATION

AI Technical Summary

Problems solved by technology

[0002] With the development of science and technology, data is updated faster and faster. In some application scenarios, new data is periodically appended to the end of the same file under the same path, for example when storing Internet data. For such files with historical records, the content is updated in real time, so the file needs to be analyzed continuously and grows larger and larger in the process. Analyzing the entire content of the file every time wastes a great deal of time, manpower and resources.



Examples


Embodiment 1

[0024] Referring to Figure 1, the present invention provides a file marking and deduplication analysis method comprising the following steps:

[0025] S100: Record the file information of the file to be analyzed, where the file information includes the file name feature value and the total file size. The file name feature value is used to record the file name and can be computed with any commonly used algorithm. In this embodiment, a hash algorithm is used to calculate the hash value of the file name, so the file name feature value is the file name hash.
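Step S100 could be sketched roughly as follows. This is only an illustration: the names `FileInfo` and `record_file_info` are hypothetical, and SHA-256 is an assumed choice, since the text only says that a hash algorithm is applied to the file name.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """File information recorded in S100 (hypothetical structure)."""
    name_hash: str   # file name feature value: hash of the file name
    total_size: int  # total file size in bytes


def record_file_info(path: str) -> FileInfo:
    """Record the file name hash and total size of the file to be analyzed."""
    file_name = os.path.basename(path)
    # Assumption: SHA-256 over the UTF-8 file name; the source names no algorithm.
    name_hash = hashlib.sha256(file_name.encode("utf-8")).hexdigest()
    return FileInfo(name_hash=name_hash, total_size=os.path.getsize(path))
```

Hashing the name rather than the content keeps this step cheap, because the file itself does not have to be read to identify it.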

[0026] S200: Check whether a marker file exists under the path where the file to be analyzed is located. If not, go to S300; otherwise, go to S400.

[0027] The marker file is distinguished from the file to be analyzed by a special name or a special suffix, that is, it uses a different naming scheme or a different suffix from the file to be analyzed.
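The check in S200 might look like the sketch below. The marker file name `.analysis.mark` and the function `find_marker_file` are assumptions made for illustration; the text only requires that the marker file use a special name or suffix that differs from the files being analyzed.

```python
import os
from typing import Optional

# Hypothetical special name that separates the marker file from analyzed files.
MARKER_NAME = ".analysis.mark"


def find_marker_file(target_path: str) -> Optional[str]:
    """Return the marker file path in the target's directory, or None if absent."""
    directory = os.path.dirname(os.path.abspath(target_path))
    marker_path = os.path.join(directory, MARKER_NAME)
    return marker_path if os.path.isfile(marker_path) else None
```

A `None` result corresponds to the branch to S300, and a returned path corresponds to the branch to S400.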

[0028] S300: Create a tag file, and aft...

Embodiment 2

[0037] The present invention also provides a terminal device for file marking and deduplication analysis, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the steps of the method in Embodiment 1 of the present invention are implemented.


Abstract

The invention relates to a file marking and deduplication analysis method, a terminal device and a storage medium. In the method, the file information of a file to be analyzed is first recorded. It is then judged whether a marker file exists under the path where the file to be analyzed is located; if not, a marker file is created. If so, it is further judged whether the marker file contains marker information corresponding to the file to be analyzed; if not, the marker information is created. If so, the total file size in the file information is compared with the total file size in the marker information. If the two are equal, no analysis is needed. If the total file size in the file information is greater than that in the marker information, the portion of the file to be analyzed whose size corresponds to the analyzed size recorded in the marker file is skipped, and the remaining content of the file is analyzed. By analyzing only the newly added content of the file and not re-analyzing unchanged content, the invention achieves deduplicated file analysis.
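The flow summarized in the abstract can be illustrated with a minimal sketch. Several details here are assumptions rather than part of the source: the marker file is stored as JSON under the hypothetical name `.analysis.mark`, marker information maps a SHA-256 file name hash to the number of bytes already analyzed, a file with no marker information is treated as having zero analyzed bytes, and a file that has shrunk raises an error because the abstract does not cover that case.

```python
import hashlib
import json
import os
from typing import Callable, Dict

MARKER_NAME = ".analysis.mark"  # hypothetical marker file name


def analyze_incrementally(path: str, analyze: Callable[[bytes], None]) -> int:
    """Analyze only content appended since the last run; return bytes analyzed."""
    # Record file information: file name hash and total size.
    name_hash = hashlib.sha256(os.path.basename(path).encode("utf-8")).hexdigest()
    total_size = os.path.getsize(path)

    # Load the marker file if it exists; otherwise start with empty marker data.
    marker_path = os.path.join(os.path.dirname(os.path.abspath(path)), MARKER_NAME)
    marks: Dict[str, int] = {}
    if os.path.isfile(marker_path):
        with open(marker_path, "r", encoding="utf-8") as fh:
            marks = json.load(fh)

    # Assumption: missing marker information means nothing was analyzed yet.
    analyzed_size = marks.get(name_hash, 0)
    if total_size == analyzed_size:
        return 0  # sizes match, so no analysis is needed
    if total_size < analyzed_size:
        # Assumption: the source does not describe this case.
        raise ValueError("file is smaller than the recorded analyzed size")

    # Skip the already-analyzed prefix and analyze only the remainder.
    with open(path, "rb") as fh:
        fh.seek(analyzed_size)
        new_content = fh.read()
    analyze(new_content)

    # Update the marker information with the new analyzed size.
    marks[name_hash] = analyzed_size + len(new_content)
    with open(marker_path, "w", encoding="utf-8") as fh:
        json.dump(marks, fh)
    return len(new_content)
```

Calling `analyze_incrementally("history.log", handler)` repeatedly passes `handler` only the bytes appended since the previous call, where `handler` is any caller-supplied function that accepts `bytes`.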

Description

Technical field

[0001] The invention relates to the field of file analysis, in particular to a file marking and deduplication analysis method, a terminal device and a storage medium.

Background technique

[0002] With the development of science and technology, data is updated faster and faster. In some application scenarios, new data is periodically appended to the end of the same file under the same path, for example when storing Internet data. For such files with historical records, the content is updated in real time, so the file needs to be analyzed continuously and grows larger and larger in the process. Analyzing the entire content of the file every time wastes a great deal of time, manpower and resources.

Contents of the invention

[0003] In view of the above problems, the present invention aims to provide a file marking and deduplication analysi...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/16, G06F16/174
Inventor: 陈良彬, 吴鸿伟, 周成祖, 李山, 张永光
Owner: XIAMEN MEIYA PICO INFORMATION