Improved method for dynamic generation of data structure for Aho-Corasick algorithm

A technique for dynamically generating a data structure, applied in the field of computer theory. It addresses the problem that data processing cannot be reflected in time, and achieves multi-pattern matching.

Status: Inactive
Publication Date: 2015-03-04
BEIJING UNIV OF TECH

AI Technical Summary

Problems solved by technology

During this period, data processing cannot be reflected in time, so an algorithm is needed that guarantees multi-pattern matching and completes the data reorganization within a limited time.

Embodiment Construction

[0019] The specific embodiments of the present invention will be described below in conjunction with the accompanying drawings of the specification.

[0020] Data definitions for the technical solution: the data structures and components that need to be used.

[0021] Definition 1: Node object, comprising: 1) the character that the node represents; 2) the character string spelled out on the path from the root node to this node; 3) the collection of references to the node's child nodes; 4) the set of references to all nodes that take this node as their failure target (hereinafter the "failure attribution set"); 5) the depth of the node relative to the root node; 6) a reference to the node's parent node; 7) a reference to the node's failure target node; 8) a flag marking whether the node is the end node of a character string.
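The eight components of Definition 1 could be laid out roughly as follows. This is a minimal sketch, not the patent's own code: the class name `Node`, all field names, and the helper `add_child` are assumptions chosen for illustration, and failure links are left unset because the text sets them in a separate step.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


# eq=False keeps identity-based hashing so nodes can be stored in sets;
# repr=False avoids printing the cyclic parent/child references.
@dataclass(eq=False, repr=False)
class Node:
    char: str = ""                                              # 1) character of this node
    prefix: str = ""                                            # 2) string from the root to this node
    children: Dict[str, "Node"] = field(default_factory=dict)   # 3) child node references
    fail_sources: Set["Node"] = field(default_factory=set)      # 4) failure attribution set
    depth: int = 0                                              # 5) depth relative to the root
    parent: Optional["Node"] = None                             # 6) parent node reference
    fail: Optional["Node"] = None                               # 7) failure target reference
    is_end: bool = False                                        # 8) marks the end of a string


def add_child(parent: Node, char: str) -> Node:
    """Hypothetical helper: attach a child for `char`, filling fields 1, 2, 5 and 6."""
    child = Node(char=char, prefix=parent.prefix + char,
                 depth=parent.depth + 1, parent=parent)
    parent.children[char] = child
    return child
```

Keeping field 4 as the reverse of field 7 is what would let an update locate every node that points at a given failure target without scanning the whole tree.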

[0022] Definition 2: Table header object, comprising: 1) a collection of character table objects.

[0023] Definition 3: Root node object, including: 1) general information of ...

Abstract

The invention discloses an improved method for dynamically generating the data structure used by the Aho-Corasick algorithm. The method comprises the following steps: adding and deleting feature character strings; splitting a feature character string into single characters and adding the corresponding nodes at their positions in the deterministic finite automaton (DFA); setting the corresponding data on the new nodes and checking the failure targets of their parent nodes; finding the failure target node, namely by removing the first character of the character string that the node represents and matching the rest of the string against the DFA; finding the failure attribution set of the failure target, traversing the references to all nodes in that set, judging whether the nodes exist, and taking those nodes as failure target nodes; adding the nodes to the character table object collection in the header of the DFA; sequentially reducing the character strings from back to front; and finding the corresponding nodes. The data structure is maintained dynamically, which facilitates multi-pattern matching retrieval over a large number of continuously changing character strings within a short time.
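The insertion steps in the abstract can be illustrated with a small incremental Aho-Corasick sketch. It is written under stated assumptions and is not necessarily the patent's exact procedure: the names `Node`, `add_pattern`, `find_all` and `_set_fail` are hypothetical, the failure target is found by following the parent's failure chain (which gives the same node as matching the string without its first character), deletion is not covered, and patterns are assumed to be non-empty.

```python
class Node:
    """Trie node; field names are illustrative only."""

    def __init__(self, char="", parent=None):
        self.char = char
        self.parent = parent
        self.depth = parent.depth + 1 if parent else 0
        self.children = {}          # char -> Node
        self.fail = None            # failure target (None only for the root)
        self.fail_sources = set()   # nodes whose failure target is this node
        self.is_end = False


def _set_fail(node, target):
    """Repoint node's failure link and keep the reverse sets in sync."""
    if node.fail is not None:
        node.fail.fail_sources.discard(node)
    node.fail = target
    target.fail_sources.add(node)


def add_pattern(root, pattern):
    """Insert a pattern character by character, repairing failure links."""
    node = root
    for ch in pattern:
        if ch in node.children:
            node = node.children[ch]
            continue
        new = Node(ch, node)
        node.children[ch] = new
        # Failure target of the new node: walk the parent's failure chain
        # until a node with a child labelled ch is found, else use the root.
        f = node.fail
        while f is not None and ch not in f.children:
            f = f.fail
        _set_fail(new, f.children[ch] if f is not None else root)
        # Existing nodes may now have the new node as their longest suffix.
        # Candidates are children labelled ch of nodes that reach the parent
        # through failure links, found via the reverse (fail_sources) sets.
        stack = list(node.fail_sources)
        while stack:
            w = stack.pop()
            u = w.children.get(ch)
            if u is None:
                stack.extend(w.fail_sources)   # keep searching below w
            elif u.fail.depth < new.depth:
                _set_fail(u, new)              # new node is a longer suffix
        node = new
    node.is_end = True


def find_all(root, text):
    """Return (start_index, length) for every pattern occurrence in text."""
    hits = []
    node = root
    for i, ch in enumerate(text):
        while node is not root and ch not in node.children:
            node = node.fail
        node = node.children.get(ch, root)
        m = node
        while m is not root:                   # report all suffix matches
            if m.is_end:
                hits.append((i - m.depth + 1, m.depth))
            m = m.fail
    return hits
```

For example, after adding "she", "he" and "hers" in that order to a fresh `Node()` root, `find_all(root, "ushers")` returns `[(1, 3), (2, 2), (2, 4)]`, and the same matches are found when the patterns are added in a different order, since each insertion repairs only the failure links it affects.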

Description

Technical field

[0001] The invention belongs to the field of computer theory and provides, for the Aho-Corasick algorithm for multi-pattern string matching, an Aho-Corasick tree data structure that supports dynamic addition and deletion.

Background art

[0002] With the rapid development of information technology, especially in big data processing, fast retrieval of key fields is an increasingly prominent problem. Particularly in the Web 2.0 era, real-time traversal or search of large amounts of data is a routine operation. When processing such volumes of data, it is often necessary to search for many different strings at the same time, that is, to perform multi-pattern matching, which calls for the Aho-Corasick algorithm. This algorithm has a problem, however. As an automaton algorithm, it relies on a tree-shaped data structure generated in advance from many characteristic strings. Once the characteristic strings need to be added o...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventors: 张正欣, 张建
Owner: BEIJING UNIV OF TECH