String matching method and device based on deterministic finite state automaton

A string matching technique based on a finite state automaton, applied in the field of retrieval technology, which addresses slow character matching and prolonged processing time.

Active Publication Date: 2010-02-03
BEIJING ZHIGU TECH SERVICE

AI Technical Summary

Problems solved by technology

[0018] The embodiment of the present invention provides a character string matching method and device based on a finite state automaton, which solve the problems of slow character matching and prolonged processing time in the prior art.

Embodiment Construction

[0050] Most current mainstream processors have an L1 cache and an L2 cache, and a few high-end processors also integrate an L3 cache. The first-level (L1) cache is a high-speed cache packaged inside the CPU chip, and its access speed matches the main clock frequency of the CPU. It temporarily holds the operation instructions and the data that the CPU will use while performing operations, so that they can be delivered to the CPU. That is, the first-level cache comprises a first-level instruction cache and a first-level data cache.

[0051] The second-level (L2) cache acts as a buffer for the first-level cache. Located outside the CPU, it stores data that the CPU needs for processing but that cannot be held in the first-level cache. In the same way, the L3 cache and main memory can be regarded as buffers for the L2 cache. The L2 cache, the L3 cache, and main memory cannot store CPU operation instructions.

[0052] The latency ove...

Abstract

The invention discloses a string matching method and device based on a deterministic finite state automaton (DFA). The method comprises the following steps: when the keywords input by a user are determined to be included in a previously determined keyword block, invoking the program code of the DFA corresponding to that keyword block, where the program code is pre-generated with the Aho-Corasick algorithm from the correspondence among current state, input character and output state determined by the keyword block; executing the program code, sequentially inputting the characters of the database to be searched, and determining the output state from the current state and the input character, where the output state becomes the current state for the next input character; and outputting a character matching result according to the output state. Because the DFA is stored in the form of program code, the method reduces system processing latency and improves the speed and efficiency of character matching.
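As an illustration of what storing a DFA "in the form of program code" can mean, the following is a minimal hand-written sketch in Python. It assumes a hypothetical keyword block containing only "he" and "she"; the names `step`, `OUTPUT` and `match` are illustrative and do not come from the patent, and the code the invention actually generates may look quite different. The point is only that each (current state, input character) pair is resolved by branches in code rather than by a lookup in a transition table held as data.

```python
# Hypothetical keyword block: {"he", "she"}.
# States: 0 = start, 1 = "h", 2 = "he", 3 = "s", 4 = "sh", 5 = "she".

# Keywords reported when a state is entered (illustrative OUTPUT mapping).
OUTPUT = {2: ("he",), 5: ("she", "he")}


def step(state, ch):
    """Return the output state for a current state and an input character.

    The transitions are written as branches, not stored in a table.
    """
    if ch == "s":
        return 3                 # "s" always starts or restarts "she"
    if ch == "h":
        return 4 if state == 3 else 1
    if ch == "e":
        if state == 1:
            return 2             # "h" + "e" -> "he"
        if state == 4:
            return 5             # "sh" + "e" -> "she"
    return 0                     # any other input falls back to the start


def match(text):
    """Feed the text one character at a time and collect (position, keyword)."""
    state, hits = 0, []
    for i, ch in enumerate(text):
        state = step(state, ch)  # output state becomes the next current state
        for kw in OUTPUT.get(state, ()):
            hits.append((i - len(kw) + 1, kw))
    return hits
```

For example, `match("ushers")` reports "she" starting at index 1 and "he" starting at index 2.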

Description

Technical field

[0001] The invention relates to the field of retrieval technology, in particular to a method and device for character string matching based on a deterministic finite state automaton (DFA).

Background

[0002] The Aho-Corasick algorithm was proposed by Aho and Corasick of Bell Labs in "Efficient String Matching: An Aid to Bibliographic Search" in 1975. Its core is a deterministic finite state automaton (DFA) covering all query keywords. Each character in the database to be searched is input into the DFA one by one, and when a query keyword is hit, the DFA outputs a report.

[0003] Obtaining the DFA through the Aho-Corasick algorithm requires constructing three functions: GOTO, FAILURE and OUTPUT. The process of constructing these three functions includes:

[0004] 1.1 Construct the GOTO function.

[0005] The input required for this process is the set of keywords to be queried. For ex...
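The roles of the GOTO, FAILURE and OUTPUT functions named above can be sketched in Python as follows. This is a textbook-style construction written for illustration, not the procedure from the patent text (which is truncated here); the function names `build_automaton` and `search` and the list-of-dicts representation are assumptions.

```python
from collections import deque


def build_automaton(keywords):
    """Build GOTO, FAILURE and OUTPUT for a keyword set (state 0 is the root)."""
    goto = [{}]       # goto[state][char] -> next state
    output = [set()]  # output[state] -> keywords recognised on entering state

    # GOTO: insert every keyword into a trie.
    for kw in keywords:
        state = 0
        for ch in kw:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(kw)

    # FAILURE: breadth-first over the trie; depth-1 states fail to the root.
    failure = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = failure[r]
            while f and ch not in goto[f]:
                f = failure[f]
            failure[s] = goto[f].get(ch, 0)
            # OUTPUT: a state also reports whatever its failure state reports.
            output[s] |= output[failure[s]]
    return goto, failure, output


def search(text, keywords):
    """Return sorted (start_index, keyword) pairs for every hit in text."""
    goto, failure, output = build_automaton(keywords)
    state, hits = 0, []
    for i, ch in enumerate(text):
        # Follow failure links until the character can be consumed.
        while state and ch not in goto[state]:
            state = failure[state]
        state = goto[state].get(ch, 0)
        for kw in output[state]:
            hits.append((i - len(kw) + 1, kw))
    return sorted(hits)
```

With the keyword set `["he", "she", "his", "hers"]`, `search("ushers", ...)` finds "she" at index 1 and both "he" and "hers" at index 2. Resolving the failure links ahead of time for every (state, character) pair is what turns this structure into a DFA with exactly one output state per input.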

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 黄凯明
Owner: BEIJING ZHIGU TECH SERVICE