
Multi-keyword matching method for text or network content analysis

A fast multi-keyword matching technique for text and network content analysis, based on efficient storage of a finite state automaton. It addresses the large memory requirement of existing automaton-based matching and reduces memory space consumption.

Active Publication Date: 2006-07-12
TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

[0012] The main disadvantage of the finite state automaton used by the AC method is that storing the automaton structure requires a large amount of memory.
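To give a sense of scale for this memory cost, the following minimal sketch estimates the size of a dense next-state table, the common array-based way to store such an automaton. The 256-symbol alphabet, the 4-byte table entry, and the function names are assumptions chosen for illustration; none of them come from the patent text.

```python
def max_states(keywords):
    """Upper bound on automaton states: one root plus one state per keyword character."""
    return 1 + sum(len(word) for word in keywords)


def dense_table_bytes(num_states, alphabet_size=256, entry_bytes=4):
    """Bytes for a dense table holding one next-state entry per (state, symbol) pair.

    alphabet_size and entry_bytes are illustrative assumptions, not patent values.
    """
    return num_states * alphabet_size * entry_bytes


keywords = ["he", "she", "his", "hers"]
states = max_states(keywords)
print(states, dense_table_bytes(states))
```

Under these assumptions the table grows linearly with the number of states, and every state pays for the full alphabet even when only a handful of distinct characters ever occur in the keywords.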



Examples


Embodiment Construction

[0027] The multi-keyword matching method for text or network content analysis proposed by the present invention first builds a finite state automaton with states as nodes from the keywords to be matched, and records the characters that occur in the keywords. This automaton is then converted into a finite state automaton with characters as nodes; the total number of nodes is m+1, where m is the number of characters in the keywords, and the addresses of all nodes are stored as an index table. Finally, the text or network data stream to be matched is fed to the character-node automaton as input and matched against the keywords.
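The three steps can be sketched as follows. This is one possible reading of the description, not the patented procedure: it assumes that m counts the distinct characters occurring in the keywords, that each character node holds the next-state entries of every state for that character, and that the extra node is a shared default for all characters absent from the keywords. The names `build_state_automaton`, `to_char_nodes`, and `search` are hypothetical.

```python
from collections import deque


def build_state_automaton(keywords):
    """Step 1: automaton with states as nodes; also records the keyword characters."""
    goto, out = [{}], [set()]
    for word in keywords:
        s = 0
        for ch in word:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(word)
    chars = sorted({ch for word in keywords for ch in word})
    fail = [0] * len(goto)
    queue = deque()
    for ch in chars:  # complete the root's transitions
        if ch in goto[0]:
            queue.append(goto[0][ch])
        else:
            goto[0][ch] = 0
    while queue:  # breadth-first: shallower states are completed first
        s = queue.popleft()
        for ch in chars:
            if ch in goto[s]:  # original trie edge
                t = goto[s][ch]
                fail[t] = goto[fail[s]][ch]
                out[t] |= out[fail[t]]
                queue.append(t)
            else:  # fill the missing edge from the fallback state
                goto[s][ch] = goto[fail[s]][ch]
    return goto, out, chars


def to_char_nodes(goto, chars):
    """Step 2 (assumed reading): one node per keyword character plus one default node."""
    n = len(goto)
    index = {ch: [goto[s][ch] for s in range(n)] for ch in chars}
    default = [0] * n  # a character in no keyword always leads back to the start
    return index, default


def search(text, index, default, out):
    """Step 3: feed the input through the character-indexed automaton."""
    state, hits = 0, []
    for pos, ch in enumerate(text):
        state = index.get(ch, default)[state]  # one lookup per input character
        hits.extend((pos - len(word) + 1, word) for word in out[state])
    return hits


goto, out, chars = build_state_automaton(["he", "she", "his", "hers"])
index, default = to_char_nodes(goto, chars)
print(sorted(search("ushers", index, default, out)))
```

In this sketch the keywords contain five distinct characters, so the index holds five character nodes and the default node brings the total to six, which matches the m+1 count under the stated assumption. The Python dictionary stands in for the index table of node addresses.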

[0028] In the above method, the conversion proceeds as follows: first, the value corresponding to each character in every node of the state-node automaton is used as a value of the corresponding node in the character-node automaton; and then combi...


Abstract

The invention discloses a multi-keyword matching method for text or network content analysis, in the technical field of text or network content processing. The method builds a finite state automaton with states as nodes from the keywords to be matched and records the characters in the keywords; converts it into a finite state automaton with characters as nodes, with m+1 nodes in total (m is the number of characters in the keywords); and feeds the text or network data stream to be matched to that automaton as input to match the keywords. Compared with the existing AC method, the invention introduces no additional computation, and it greatly reduces memory consumption when the number of characters that appear in the keyword set is smaller than the maximum possible number of characters.

Description

Technical field

[0001] The invention relates to a multi-keyword matching method for text or network content analysis, in particular to a fast multi-keyword matching method based on efficient storage of a finite state automaton, and belongs to the technical field of text or network content processing.

Background technique

[0002] Multiple pattern string matching is one of the basic problems in computer science. The problem is to decide quickly whether a given data block contains one or more keywords from a keyword set. Multi-keyword matching is widely used in text processing, network content analysis, intrusion detection, bioinformatics, information retrieval and other fields.

[0003] One of the classic approaches to fast multi-keyword matching is based on finite state automata. This method was first proposed by Alfred V. Aho and Margaret J. Corasick in 1975, and is usually...
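For readers unfamiliar with the automaton-based approach of Aho and Corasick, here is a minimal sketch of the textbook formulation with goto edges, failure links, and output sets. It illustrates the general technique only and is not taken from the patent; the names `build_ac` and `find_all` are hypothetical.

```python
from collections import deque


def build_ac(keywords):
    """Build goto edges, failure links and output lists for a keyword set."""
    goto, fail, out = [{}], [0], [[]]
    for word in keywords:
        s = 0
        for ch in word:
            nxt = goto[s].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[s][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            s = nxt
        out[s].append(word)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]  # inherit matches ending at the fallback state
            queue.append(t)
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_position, keyword) for every keyword occurrence in text."""
    goto, fail, out = automaton
    s, hits = 0, []
    for pos, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]  # follow failure links until a matching edge exists
        s = goto[s].get(ch, 0)
        hits.extend((pos - len(word) + 1, word) for word in out[s])
    return hits


automaton = build_ac(["he", "she", "his", "hers"])
print(sorted(find_all("ushers", automaton)))
```

The input is scanned once, whatever the number of keywords, which is what makes the approach attractive for checking a data block against a whole keyword set.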


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 余建明, 李军
Owner: TSINGHUA UNIV