Method for filtering same or similar files

A file filtering method, applied in special data processing applications and electronic digital data processing, that addresses the time users waste reading repeated content and the reduced convenience of data search, with the effect of reducing large numbers of repeated results.

Inactive Publication Date: 2010-05-26
ESOBI
0 Cites, 13 Cited by

AI Technical Summary

Problems solved by technology

[0003] Generally speaking, search engines present every result that matches the search keywords, so web pages with identical content are shown without any filtering. A small number of search engines do filter their results, but highly similar web pages still tend to appear again and again. Users waste time reading repeated content, and searching for data becomes less convenient.



Embodiment Construction

[0055] The method disclosed below can be implemented on general electronic equipment such as a computer, including but not limited to a personal computer, a notebook computer, or a server. Those skilled in the art should be able to implement the method after understanding the present invention.

[0056] According to one preferred embodiment of the method disclosed in the present invention, as shown in Figure 1, the method includes the following steps:

[0057] (a) reading multiple files to be filtered;

[0058] (b) converting the data structures of the multiple files to be filtered, then merging and storing them as a file in a preset data structure;

[0059] (c) setting a low threshold value, representing the minimum continuous character length;

[0060] (d) setting a high threshold value, indicating the length of continuous characte...
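Steps (a) to (c) can be illustrated with a minimal sketch. It assumes the files are already available as in-memory strings and uses a plain dictionary of fixed-length substrings as a stand-in for the preset data structure; it is not the structure the patent describes. The names `build_low_threshold_index` and `candidate_pairs` are hypothetical.

```python
from collections import defaultdict


def build_low_threshold_index(files, low):
    """Merge all files into one index: substring of length `low` -> file ids.

    Assumption: a dict of fixed-length substrings replaces the preset
    data structure; it only records which files share a run of at least
    `low` continuous characters.
    """
    index = defaultdict(set)
    for file_id, text in enumerate(files):
        # Files shorter than `low` contribute nothing (empty range).
        for start in range(len(text) - low + 1):
            index[text[start:start + low]].add(file_id)
    return index


def candidate_pairs(index):
    """Return pairs (earlier_id, later_id) sharing at least one indexed run."""
    pairs = set()
    for ids in index.values():
        ordered = sorted(ids)
        for pos, first in enumerate(ordered):
            for second in ordered[pos + 1:]:
                pairs.add((first, second))
    return pairs


files = ["hello world", "say hello", "xyz"]
index = build_low_threshold_index(files, low=5)
pairs = candidate_pairs(index)
```

Only the pairs returned here would need the more expensive comparison against the high threshold.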



Abstract

The invention discloses a method for filtering the same or similar files. The method comprises the following steps: storing a plurality of files to be filtered in a PAT tree data structure file (PT file); searching the PT file for all character string nodes whose continuous character length reaches a low threshold, together with all the files to which those nodes belong; finding, among those files, the files whose continuous character content is the same and whose shared continuous character length reaches a high threshold; searching the PT file again for all character string nodes whose continuous character length reaches the low threshold, together with all the files to which those nodes belong; finding, among those files, the files whose continuous character content is the same and for which the ratio of the shared continuous character length to the total length of the prior file's content reaches a ratio threshold; and marking the files found as having the same content or a high degree of approximation. The method can filter out files with the same content or a high degree of approximation, and solves the problem of large numbers of same or similar files being repeated.
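The threshold logic in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: it compares files pairwise with Python's `difflib.SequenceMatcher` instead of a PAT tree, treats the earlier file in the list as the "prior file", and uses hypothetical names (`longest_shared_run`, `mark_similar_files`).

```python
from difflib import SequenceMatcher


def longest_shared_run(a, b):
    """Length of the longest run of continuous characters common to a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def mark_similar_files(files, low, high, ratio):
    """Map each later file id to the earlier file id it duplicates.

    A later file is marked when its longest shared run with an earlier file
    reaches `low` and either reaches `high` or covers at least `ratio` of
    the earlier (prior) file's total length.
    """
    marked = {}
    for later in range(len(files)):
        for prior in range(later):
            if prior in marked or not files[prior]:
                continue  # skip files already marked, and empty files
            run = longest_shared_run(files[prior], files[later])
            if run < low:
                continue  # shared content too short to count
            if run >= high or run / len(files[prior]) >= ratio:
                marked[later] = prior
                break
    return marked


docs = [
    "the quick brown fox jumps over the lazy dog",
    "xx the quick brown fox jumps over yy",
    "completely different text",
]
result = mark_similar_files(docs, low=5, high=20, ratio=0.9)
```

Pairwise comparison costs far more than a shared index on large collections, which is presumably why the method stores all files in one PT file first.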

Description

Technical field

[0001] The invention relates to a file filtering method, in particular to a method that uses a computer to filter the same or similar files out of multiple files and to classify them into clusters.

Background

[0002] With the rapid development of computers and the Internet, the amount of information to be processed has grown quickly. Users often use computers to search for the data or information they need in very large files or on the Internet; an Internet search engine is a tool that helps web users quickly find data across the vast Internet.

[0003] Generally speaking, search engines present every result that matches the search keywords, so web pages with identical content are shown without any filtering. A small number of search engines do filter their results, but highly similar web pages still tend to appear again and again. For users, time will be...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 蔡弘扬, 卓训学
Owner: ESOBI