Large-scale and multi-key word matching method for text or network content analysis

A keyword-matching technology for text and network content, applied in the field of computer data processing. It addresses the low throughput, the falling average jump value, and the falling matching speed that arise with large keyword sets, and it aims to improve matching speed while achieving good space and time performance.

Inactive Publication Date: 2009-01-14
TSINGHUA UNIV
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0007] However, analysis of the WM (Wu-Manber) algorithm shows that as the number of keywords grows, the number of entries in the WM jump table whose jump value is zero increases, and the average jump value falls accordingly. The matching process can then no longer skip a large number of unnecessary character comparisons, which slows the matching of text or network content.
Experimental results show that when the number of keywords reaches 100,000, the matching speed of the WM method drops significantly, and its throughput becomes too low to meet the practical requirements of text or network content processing.
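
The effect described in [0007] can be illustrated with a small experiment. The following is a minimal sketch, not the patent's implementation: it builds a WM-style jump table for random lowercase keywords and reports the fraction of zero-valued entries and the average jump value. The function names, the hash function, the table size, and the use of random keywords are all assumptions made for illustration.

```python
import random


def block_hash(block, table_size):
    """Simple polynomial hash of a character block (an assumed hash function)."""
    h = 0
    for ch in block:
        h = (h * 31 + ord(ch)) % table_size
    return h


def build_jump_table(keywords, m, B, table_size):
    """WM-style jump table over the first m characters of each keyword."""
    default_jump = m - B + 1  # largest safe jump when the block occurs in no keyword
    table = [default_jump] * table_size
    for kw in keywords:
        for end in range(B, m + 1):
            h = block_hash(kw[end - B:end], table_size)
            # A block ending at position `end` allows a jump of at most m - end.
            table[h] = min(table[h], m - end)
    return table


def jump_table_stats(keywords, m, B, table_size):
    """Return (fraction of zero entries, average jump value) of the jump table."""
    table = build_jump_table(keywords, m, B, table_size)
    return table.count(0) / table_size, sum(table) / table_size


def random_keywords(count, length, rng):
    """Hypothetical test data: random lowercase keywords of a fixed length."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    return ["".join(rng.choice(letters) for _ in range(length)) for _ in range(count)]
```

Calling `jump_table_stats` on a small keyword set and then on a larger set that contains it shows the trend: every added keyword can only lower entries in the table, so the zero fraction rises and the average jump value falls.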

Examples

Embodiment Construction

[0031] This is a large-scale multi-keyword matching method for text or network content analysis. First, the length m of the shortest keyword in the keyword set is determined, where m is a positive integer greater than or equal to 4, and a jump table and a keyword table are established. A window of size m is placed at the beginning of the first keyword in the keyword set, and a first hash operation is performed on the data block formed by the last B characters in the window, where B is 3 or 4. The resulting hash value is used to look up the jump table. If the jump value of the corresponding entry is not zero, the window is moved by one character and the above steps are repeated; if it is zero, a second hash operation is performed on the data block, and the resulting hash value is used to look up the keyword table. If the number of keywords corresponding to the entry is not zero, move the window by one character ...

Abstract

A large-scale multi-keyword matching method for text or network content analysis. The method builds a jump table and a keyword table, computes the jump value of each jump-table entry, and associates each keyword with a keyword-table entry. During matching, it hashes the data block in the window and uses the hash value to look up a jump value in the jump table. If the jump value is not zero, the window is moved by that value; otherwise the data block is hashed a second time, the keyword table is indexed with the result, and the keywords associated with that entry are compared in turn against the corresponding field of the text to confirm whether they match.
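
The flow in the abstract can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the patented implementation: the two hash functions, the table sizes, the default jump value of m - B + 1, and the choice to file each keyword under the last B characters of its first m characters are assumptions borrowed from the general WM scheme, and all function names are hypothetical.

```python
from collections import defaultdict

JUMP_TABLE_SIZE = 1 << 16     # assumed size; not given in the source
KEYWORD_TABLE_SIZE = 1 << 16  # assumed size; not given in the source


def hash1(block):
    """First hash (assumed): indexes the jump table."""
    h = 0
    for ch in block:
        h = (h * 31 + ord(ch)) % JUMP_TABLE_SIZE
    return h


def hash2(block):
    """Second hash (assumed): indexes the keyword table."""
    h = 0
    for ch in block:
        h = (h * 131 + ord(ch)) % KEYWORD_TABLE_SIZE
    return h


def build_tables(keywords, B=3):
    """Build the jump table and keyword table; m is the shortest keyword length."""
    m = min(len(kw) for kw in keywords)
    if B > m:
        raise ValueError("B must not exceed the shortest keyword length")
    jump = [m - B + 1] * JUMP_TABLE_SIZE
    keyword_table = defaultdict(list)
    for kw in keywords:
        for end in range(B, m + 1):
            h = hash1(kw[end - B:end])
            jump[h] = min(jump[h], m - end)
        # Associate the keyword with the entry for the last block of its m-prefix.
        keyword_table[hash2(kw[m - B:m])].append(kw)
    return m, jump, keyword_table


def find_matches(text, keywords, B=3):
    """Return (position, keyword) pairs for every keyword occurrence in text."""
    m, jump, keyword_table = build_tables(keywords, B)
    matches = []
    pos = 0
    while pos + m <= len(text):
        block = text[pos + m - B:pos + m]  # last B characters in the window
        step = jump[hash1(block)]
        if step != 0:
            pos += step                    # non-zero jump value: move the window
            continue
        # Zero jump value: hash again and verify the associated keywords in turn.
        for kw in keyword_table.get(hash2(block), ()):
            if text.startswith(kw, pos):
                matches.append((pos, kw))
        pos += 1
    return matches
```

For example, `find_matches("helloworld help", ["hello", "world", "help", "lowor"])` reports each keyword together with the position where it starts. Hash collisions can only lower a jump value or add extra candidates, so under these assumptions they cost time but do not cause missed or false matches, because every candidate is verified against the text.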

Description

Technical field

[0001] The invention relates to a large-scale multi-keyword matching method for text or network content analysis, in particular to a technology for processing text or network content against a large-scale keyword set, and belongs to the technical field of computer data processing.

Background art

[0002] Multi-keyword matching is one of the fundamental problems in computer science. The problem it solves is to judge quickly and accurately whether a given text or data block contains one or more keywords from a given keyword set. Multi-keyword matching technology is now widely used in many areas of network security, such as firewalls, virus detection, intrusion detection and defense, and content filtering. It can also be extended to other disciplines, such as information management systems, network search engines, and gene sequence detection in bioinformatics. Therefore, the research and improvement of multi-keyword...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 周宗伟, 薛一波
Owner: TSINGHUA UNIV