Method for multi-pattern string matching according to word length

A multi-pattern string matching technique, applied in electrical digital data processing, special data processing applications, instruments, etc. It addresses the problems of reduced matching efficiency, a growing number of verification entries and a shrinking average jump distance, and achieves the effect of fewer memory accesses.

Active Publication Date: 2012-07-25
顾乃杰

AI Technical Summary

Problems solved by technology

The Wu-Manber method uses hashing and jump (skip) search and has good matching efficiency. However, when the shortest pattern string in the pattern set is short, the average jump distance during the search becomes small, hash values must be computed frequently, the number of entries to verify increases, and matching efficiency drops severely.
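The dependence on the shortest pattern can be seen in a minimal sketch of the classic Wu-Manber scheme. This is a textbook-style illustration, not the patent's code: the block size `B = 2`, the function names `precompile` and `search`, and the use of Python dictionaries in place of hashed arrays are all assumptions made for readability. Note that the largest possible jump is `m - B + 1`, where `m` is the shortest pattern length, so a short pattern caps every jump.

```python
B = 2  # block size in characters (assumed; a common choice)


def precompile(patterns):
    """Build SHIFT, HASH and PREFIX tables from the first m characters of each pattern."""
    m = min(len(p) for p in patterns)      # shortest pattern length
    if m < B:
        raise ValueError("every pattern needs at least B characters")
    default_shift = m - B + 1              # largest possible jump
    shift, hash_table, prefix = {}, {}, []
    for idx, p in enumerate(patterns):
        for end in range(B, m + 1):        # every block inside the m-prefix
            block = p[end - B:end]
            shift[block] = min(shift.get(block, default_shift), m - end)
        hash_table.setdefault(p[m - B:m], []).append(idx)  # last block -> candidates
        prefix.append(p[:B])               # first block, used to filter candidates
    return m, default_shift, shift, hash_table, prefix


def search(text, patterns):
    """Return sorted (start, pattern) pairs for every occurrence in text."""
    m, default_shift, shift, hash_table, prefix = precompile(patterns)
    matches = []
    pos = m - 1                            # index of the last character of the window
    while pos < len(text):
        block = text[pos - B + 1:pos + 1]
        jump = shift.get(block, default_shift)
        if jump > 0:
            pos += jump                    # no pattern can end its m-prefix here
            continue
        start = pos - m + 1
        for idx in hash_table.get(block, ()):
            if prefix[idx] == text[start:start + B] and text.startswith(patterns[idx], start):
                matches.append((start, patterns[idx]))
        pos += 1
    return sorted(matches)
```

With the patterns `abcd`, `bcd` and `xyz`, the shortest length is 3, so no jump in this sketch can exceed 2 characters, however long the other patterns are.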

Examples

Embodiment 1

[0054] The word-length-based multi-pattern string matching method proposed by the present invention is divided into a precompilation process and a search process. The precompilation process is the same as in the traditional Wu-Manber method: it builds three tables, namely the SHIFT table, the HASH table and the PREFIX table. During the search process, the unit of each match is one machine word, that is, an integer; by shifting the integer, the hash values of the three character blocks contained in it can be obtained quickly, and matching verification is then performed against the three tables built during precompilation.
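The shifting step can be illustrated with a short sketch. It assumes a 32-bit word holding four one-byte characters, a block size of two characters, little-endian byte order, and a block hash that is simply the 16-bit block value; none of these details are spelled out in the text above, and the function names are hypothetical. Under those assumptions, one word load yields the three overlapping blocks (bytes 0-1, 1-2 and 2-3) through shifts and masks alone.

```python
def load_word(text, i):
    """Load 4 bytes at offset i as one little-endian 32-bit integer (one 'word')."""
    return int.from_bytes(text[i:i + 4], "little")


def block_hashes_from_word(word):
    """Return the three overlapping 2-byte block values held in a 32-bit word."""
    return (word & 0xFFFF, (word >> 8) & 0xFFFF, (word >> 16) & 0xFFFF)


def block_hashes_bytewise(text, i):
    """Reference version: fetch bytes one at a time and combine them with OR."""
    return tuple(text[k] | (text[k + 1] << 8) for k in range(i, i + 3))


text = b"abcdefgh"
word = load_word(text, 0)
# Both routes give the same three block values; the word route reads memory once.
assert block_hashes_from_word(word) == block_hashes_bytewise(text, 0)
```

The byte-wise reference touches the text six times for three blocks, while the word-based version touches it once, which is the kind of saving in memory accesses the description refers to.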

[0055] The implementation platform in this embodiment is the Linux operating system with a 32-bit word length, a dual-core Core 2 central processing unit and 2 gigabytes of memory. The text to be matched is 9.54 megabytes of randomly generated data, and the character set size is 256. The pattern string set...

Abstract

The invention discloses a method for multi-pattern string matching according to word length, comprising a precompilation process and a search process. A shift table, a hash table and a prefix table are constructed in the precompilation process. The method is characterized in that the text is read by word length: one integer, that is, one machine word, is loaded from the text and processed at a time, which overcomes the small jump distance caused by a short shortest pattern string. The hash values of the three character blocks contained in the integer are obtained by shifting the integer, so fetching the characters one by one and combining them with OR operations is unnecessary. This speeds up hash computation, effectively reduces the number of memory accesses, and improves memory access efficiency. The method thereby achieves higher efficiency in multi-pattern string matching.

Description

Technical field

[0001] The invention belongs to the technical field of character string matching in computers, and in particular relates to a multi-pattern string matching method based on word-length matching.

Background art

[0002] Multi-pattern string matching methods are widely used in information retrieval, network content filtering, virus detection and biological computing. Multi-pattern string matching means finding, in a text, all occurrences of every pattern string in a pattern string set. The classic multi-pattern matching methods include prefix-based, suffix-based and substring-based matching methods. Among them, a suffix-based method, the Wu-Manber method proposed in the 1994 technical report "A Fast Algorithm for Multi-Pattern Searching" from the Department of Computer Science, University of Arizona (report number TR-94-17), is currently one of the methods with the best average performance in practice...
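As a point of reference for the definition of multi-pattern matching given above, a brute-force baseline can be sketched in a few lines. The function name `find_all_occurrences` is hypothetical, and the sketch only fixes what "all occurrences of every pattern" means, including overlapping occurrences; it scans the text once per pattern and makes no attempt at the jumping behaviour of suffix-based methods.

```python
def find_all_occurrences(text, patterns):
    """Return sorted (start, pattern) pairs for every occurrence, overlaps included."""
    matches = []
    for p in patterns:
        start = text.find(p)
        while start != -1:
            matches.append((start, p))
            start = text.find(p, start + 1)  # step by 1 so overlapping hits are kept
    return sorted(matches)
```

A baseline like this is mainly useful for checking the output of a faster matcher on small inputs.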

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 顾乃杰, 汪永进, 郭利财, 任开新
Owner: 顾乃杰