High-speed accurate single-pattern character string matching method

A single-pattern string matching technology, applied in the field of information processing. It addresses two problems: the performance lost to frequent subscript out-of-bounds checks that are rarely triggered, and the dominant effect of the Scan Loop on SBNDM2 performance. The result is high performance and a wide range of applications.

Status: Inactive; Publication Date: 2009-12-23
HARBIN ENG UNIV
Cites: 0; Cited by: 16

AI Technical Summary

Problems solved by technology

Because the text to be matched is generally large, a subscript rarely goes out of bounds, so checking for it on every step costs performance while almost never triggering.
[0010] 3) Although both the Scan Loop and the Match Loop can skip ahead in the text, the performance of the SBNDM2 algorithm is determined mainly by the Scan Loop.

Examples

Embodiment Construction

[0051] The present invention is described in more detail below with reference to the accompanying drawings:

[0052] A high-speed bit-parallel string matching method described in the present invention includes two stages: preprocessing and searching.

[0053] 1. Preprocessing: Preparatory work for matching. It mainly includes the following three steps:

[0054] a) Preprocessing of the pattern: This step generates a bitmask for every character in the alphabet. First, a correspondence is defined between each character position in the pattern and a bit in the bitmask. Then a bitmask (an unsigned integer) is generated for each character in the character set as follows: if the character appears in the pattern, the bit corresponding to each position at which it occurs is set to 1, and all other bits are 0. A...
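The bitmask construction in step a) can be illustrated with a minimal Python sketch. The excerpt does not state the patent's exact position-to-bit correspondence, so the sketch assumes the classic BNDM convention in which pattern position `i` (counted from 0) maps to bit `m - 1 - i`. The function name `build_bitmasks` is hypothetical.

```python
def build_bitmasks(pattern):
    """Return a dict mapping each pattern character to its bitmask.

    Assumed convention (classic BNDM, not confirmed by the excerpt):
    position i of the pattern corresponds to bit (m - 1 - i).
    Characters that do not occur in the pattern have an implicit mask of 0.
    """
    m = len(pattern)
    masks = {}
    for pos, ch in enumerate(pattern):
        # Set the bit for every position at which ch occurs.
        masks[ch] = masks.get(ch, 0) | (1 << (m - 1 - pos))
    return masks


masks = build_bitmasks("abcab")
for ch in sorted(masks):
    print(ch, format(masks[ch], "05b"))
# a 10010   (positions 0 and 3)
# b 01001   (positions 1 and 4)
# c 00100   (position 2)
```

Because each mask must fit in one unsigned integer, this representation is only practical when the pattern length does not exceed the machine word length, which matches the restriction stated in the abstract.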

Abstract

The invention provides a high-speed exact single-pattern string matching method comprising a preprocessing phase and a search phase. The preprocessing phase has three main steps: preprocessing the pattern, preprocessing the text, and selecting the optimal matching action according to the matching conditions. The search phase performs the actual string matching and has three main steps: the Scan Loop, the Match Loop, and the subsequent decision action. The invention builds on the SBNDM2 algorithm, one of the fastest known methods for matching in English corpora, and makes the following improvements: it reduces the cost of index bounds checking by introducing an index bound protection mechanism; it simplifies the algorithm by modifying the definitions of the bitmasks and bit vectors; and it extends the loop unrolling mechanism of SBNDM2, giving a method for selecting the optimal number of unrolled characters for different pattern lengths and different corpora, which improves matching performance under different matching conditions. The method is a high-speed, bit-parallel, exact single-pattern string matching method with high performance and a broad range of applications when the pattern length does not exceed the machine word length.
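As a point of reference for the Scan Loop and Match Loop named in the abstract, the following is a minimal Python sketch in the style of the baseline SBNDM2 algorithm, not of the patented method. It makes several assumptions: it uses the classic bitmask definition (position `i` maps to bit `m - 1 - i`) rather than the patent's modified one; it avoids per-step bounds checks in the Scan Loop with the well-known trick of appending a copy of the pattern to the text as a sentinel, which may differ from the patent's protection mechanism; and the names `sbndm2_search`, `guarded` and `full` are hypothetical.

```python
def sbndm2_search(pattern, text):
    """Return the start indices of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m < 2:
        raise ValueError("this sketch needs a pattern of length >= 2")

    # Preprocessing: one bitmask per pattern character.
    masks = {}
    for pos, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << (m - 1 - pos))
    full = (1 << m) - 1          # emulates an m-bit machine word

    # Sentinel: a pattern copy after the text guarantees that the
    # Scan Loop stops before running past the end of `guarded`.
    guarded = text + pattern

    hits = []
    end = m - 1                  # index of the last character of the window
    while True:
        # Scan Loop: test the last two window characters at once and
        # jump by m - 1 while that 2-gram does not occur in the pattern.
        while True:
            d = (masks.get(guarded[end], 0) << 1) & masks.get(guarded[end - 1], 0) & full
            if d:
                break
            end += m - 1
        if end >= n:             # the single bounds check per candidate
            return hits

        # Match Loop: keep reading the window right to left.
        read = 2                 # characters of the window read so far
        while d and read < m:
            d = (d << 1) & masks.get(guarded[end - read], 0) & full
            read += 1

        if d:                    # whole window read and still alive: a match
            hits.append(end - m + 1)
            end += 1
        else:                    # restart just after the character that failed
            end += m - read + 1


print(sbndm2_search("abra", "abracadabra"))  # [0, 7]
```

The bounds test runs once per candidate window instead of once per Scan Loop step, which is the kind of saving the abstract attributes to index bound protection. Copying the text to append the sentinel is done here only for clarity; an implementation concerned with speed would reserve space after the text buffer instead.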

Description

(1) Technical field

[0001] The invention relates to a method for improving the performance of searching text data in the field of information processing, and in particular to a method for exact single-pattern string matching.

(2) Background technology

[0002] The string matching problem is the problem of finding a subsequence of symbols with certain properties in a given sequence of symbols. It is one of the basic problems in computer science, and its range of application is extremely wide. Almost every field that involves text processing, or that can be reduced to text processing, has string matching requirements (for example search engines, language translation, OCR, and spell checking). In important fields such as intrusion detection and virus detection, network information filtering and retrieval, and computational biology and bioinformatics, string matching has become the core problem...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 姚念民, 范洪博
Owner: HARBIN ENG UNIV