Multi-keyword matching method for rapid content analysis

A keyword matching and content analysis technology, applied in the field of content analysis, that addresses the lack of a good solution for intrusion detection systems.

Status: Inactive
Publication date: 2004-06-30
INST OF COMPUTING TECHNOLOGY - CHINESE ACAD OF SCI
Cites: 0 · Cited by: 7

AI Technical Summary

Problems solved by technology

Network bandwidth is growing much faster than computer hardware performance, so real-time information inspection must rely on algorithmic improvements as well as hardware development.
There is currently no good solution for information monitoring and intrusion detection systems on gigabit-bandwidth networks.

Embodiment Construction

[0012] We designed a multi-keyword matching algorithm based on Huffman coding, called Huff-Match. In the preprocessing stage, Huff-Match first builds a Huffman code for each character according to the probability of that character appearing in the keywords, and then encodes each keyword as an integer. In the scanning stage, Huff-Match scans the text from left to right, encodes the portion scanned so far as an integer in the same way, and uses a test table to determine whether any keyword is matched. Because the encoded integer at any text position is easily computed from the encoding at the previous position, Huff-Match can process a large number of keywords quickly. In a practical implementation the test table cannot be too large, so only part of the code is used for the table lookup, and a strict (exact) comparison finally confirms each keyword occurrence. The closest previous work o...
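The paragraph above describes Huff-Match only in outline, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. Assumptions (all hypothetical): keywords form a non-empty list of non-empty strings; the "part of the code" used for lookup is the low `bits` bits of the encoding, where `bits` is capped both by the shortest encoded keyword and by an illustrative `max_bits` parameter; and a character that appears in no keyword simply resets the running value. The names `huffman_codes`, `HuffMatchSketch` and `scan` are invented for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(freq):
    """Map each symbol to (code_value, code_length) via Huffman's algorithm."""
    tick = count()  # tie-breaker so heapq never has to compare dicts
    heap = [(f, next(tick), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single distinct symbol still needs a 1-bit code
        (sym,) = heap[0][2]
        return {sym: (0, 1)}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tick), merged))
    return {s: (int(c, 2), len(c)) for s, c in heap[0][2].items()}


class HuffMatchSketch:
    """Hypothetical Huff-Match-style matcher; not the patented implementation."""

    def __init__(self, keywords, max_bits=16):
        # Preprocessing: character frequencies are taken over the keywords only.
        self.codes = huffman_codes(Counter(ch for kw in keywords for ch in kw))
        encoded = {kw: self._encode(kw) for kw in set(keywords)}
        # Assumption: the "part of the code" used for lookup is the low `bits`
        # bits, and `bits` never exceeds the shortest encoded keyword.
        self.bits = min(max_bits, min(n for _, n in encoded.values()))
        self.mask = (1 << self.bits) - 1
        self.table = {}  # test table: truncated code -> candidate keywords
        for kw, (value, _) in encoded.items():
            self.table.setdefault(value & self.mask, []).append(kw)

    def _encode(self, s):
        """Concatenate the Huffman codes of s into (integer, bit length)."""
        value, nbits = 0, 0
        for ch in s:
            code, length = self.codes[ch]
            value = (value << length) | code
            nbits += length
        return value, nbits

    def scan(self, text):
        """Return (start, keyword) pairs for every occurrence in text."""
        hits, h = [], 0
        for i, ch in enumerate(text):
            if ch not in self.codes:
                # No keyword can contain this character; resetting to 0 is
                # a hypothetical choice (exact comparison keeps results correct).
                h = 0
                continue
            code, length = self.codes[ch]
            # Rolling update: the value at i follows from the value at i - 1.
            h = ((h << length) | code) & self.mask
            for kw in self.table.get(h, ()):
                start = i - len(kw) + 1
                # Strict comparison rejects candidates that only share low bits.
                if start >= 0 and text[start:i + 1] == kw:
                    hits.append((start, kw))
        return hits
```

For example, `HuffMatchSketch(["he", "she", "his", "hers"]).scan("ushers")` reports "she" at position 1 and "he" and "hers" at position 2; the order of same-position hits can vary because the sketch builds its table from a set. Truncating the code bounds the table at 2^bits keys; keywords that share low bits only produce extra candidates, which the exact comparison filters out, so truncation costs time rather than correctness.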

Abstract

The invention is a multi-keyword matching method for fast content analysis, comprising the following steps: 1) preprocessing the keywords; 2) building a Huffman code for each character according to how often it occurs in the keywords, then encoding each keyword as an integer; 3) building a test table from all keywords; 4) scanning the text; 5) using the test table to analyze the text content. The codes can be adjusted dynamically according to the frequency of each character in the keywords, which increases multi-keyword matching speed. Its distinguishing feature is a Huffman-code-based multi-keyword matching algorithm designed for large sets of short keywords.
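Steps 2 and 4 can be written out as formulas. This is an illustrative formalization, not one stated in the patent: assume h(c) is the Huffman code of character c read as a binary integer, ℓ(c) is its length in bits, and B is a hypothetical number of low-order bits used to index the test table. A keyword k = c_1 c_2 ... c_m is encoded by concatenating its character codes, and the scan over text t_1 t_2 ... updates a running value H_i one character at a time:

```latex
% Illustrative formalization; B is an assumed number of low-order bits.
\begin{align}
E(k) &= \sum_{j=1}^{m} h(c_j)\, 2^{\sum_{l=j+1}^{m} \ell(c_l)},
  && k = c_1 c_2 \cdots c_m \\
H_i &= \bigl( H_{i-1}\, 2^{\ell(t_i)} + h(t_i) \bigr) \bmod 2^{B},
  && H_0 = 0 \\
H_i &= E(k) \bmod 2^{B}
  && \text{if } k \text{ ends at } t_i \text{ and } \textstyle\sum_{j=1}^{m} \ell(c_j) \ge B
\end{align}
```

The last line is why the test table can be keyed on E(k) mod 2^B: wherever k ends in the text, the running value already equals that key, provided k's code is at least B bits long. Because frequent characters receive shorter codes, more keyword characters fit into the same B bits; this is one plausible reading of the claim that frequency-based adjustment speeds up matching.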

Description

Technical field

[0001] The invention belongs to the field of content analysis, and in particular relates to a multi-keyword matching method for fast content analysis.

Background art

[0002] Multi-keyword matching (Keywords Matching), also called multi-pattern matching (Multiple Pattern Matching) or dictionary matching (Dictionary Matching, Set Matching), is a classic algorithmic problem: rapidly finding occurrences of many keywords (multiple patterns) in large volumes of data. Keyword matching algorithms are divided into indexing and non-indexing schemes according to whether they preprocess the text or the patterns. An indexing scheme first preprocesses the text and then performs keyword matching. We mainly consider non-indexing schemes: because they do not need to preprocess the text being searched, they are the core algorithm of network information monitoring systems.

[0003] By 2002, research reports showed that the algorithm could only handle data ...
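To make the problem in [0002] concrete, here is a brute-force, non-indexing reference matcher in Python; the function name `naive_multi_match` is hypothetical. It compares every keyword at every text position, so its cost grows with the text length times the total keyword length, which is the cost that multi-keyword algorithms are designed to reduce.

```python
def naive_multi_match(keywords, text):
    """Return every (start, keyword) occurrence of keywords in text.

    A brute-force, non-indexing reference: the text is not preprocessed,
    and each start position is compared against every keyword directly.
    """
    hits = []
    for start in range(len(text)):
        for kw in keywords:
            if kw and text.startswith(kw, start):
                hits.append((start, kw))
    return hits


print(naive_multi_match(["he", "she", "his", "hers"], "ushers"))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```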

Application Information

IPC(8): G06F11/00, G06F12/14, G06F17/00, G06F17/22
Inventors: Tan Jianlong (谭建龙), Bu Dongbo (卜东波), Zhang Xin (张鑫), Yu Zhihua (余智华), Guo Li (郭莉)
Owner: INST OF COMPUTING TECHNOLOGY - CHINESE ACAD OF SCI