Multi-string matching method in a search engine

A string matching and search engine technology, applied in the fields of electrical digital data processing, special data processing applications, and instruments. It addresses the reduction of the window's average jump distance, which slows the Wu-Manber method, and achieves a larger maximum jump distance and higher matching efficiency.

Inactive Publication Date: 2012-07-04
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

If the Wu-Manber method is used to match text against the new rule set, the maximum value in the SHIFT table drops sharply from the original (100-B+1) to (6-B+1). The average jump distance of the window during matching is therefore greatly reduced, and the speed of the Wu-Manber method suffers.
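The size of this drop can be checked with a few lines of Python. In the conventional Wu-Manber method the largest SHIFT entry is m-B+1, where m is the length of the shortest pattern and B is the block size. The sketch below is only an illustration: the helper name `max_shift`, the block size B=2, and the pattern lengths other than 100 and 6 are assumptions, not values from the patent.

```python
def max_shift(pattern_lengths, B=2):
    """Largest SHIFT entry in conventional Wu-Manber: m - B + 1,
    where m is the length of the shortest pattern in the rule set."""
    m = min(pattern_lengths)
    return m - B + 1


# B = 2 and the lengths 120 and 150 are illustrative assumptions.
print(max_shift([100, 120, 150]))     # 99, i.e. (100 - B + 1)
print(max_shift([100, 120, 150, 6]))  # 5,  i.e. (6 - B + 1)
```

A single 6-character rule is enough to cut the largest possible jump from 99 to 5 characters, regardless of how long the other rules are.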



Examples


Embodiment Construction

[0064] The solution of the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0065] In this embodiment, the pattern string set P to be matched contains 3 pattern strings, "english", "kilometer" and "fine", with lengths 7, 9 and 4 and numbers 0, 1 and 2 respectively. The text T is "vmogenglishsdyfine". The method proposed by the present invention searches for the pattern strings "english", "kilometer" and "fine" in the text T as follows:
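For reference, the result that any correct matcher must produce on this example can be obtained by brute force. The Python sketch below is not the method of the invention; it is a plain baseline built on `str.find`, and the helper name `find_all` and the use of 0-based offsets are assumptions made for illustration.

```python
def find_all(text, patterns):
    """Report every (start_offset, pattern_number) occurrence, 0-based."""
    hits = []
    for number, pattern in enumerate(patterns):
        start = text.find(pattern)
        while start != -1:
            hits.append((start, number))
            # Resume one character later so overlapping hits are kept.
            start = text.find(pattern, start + 1)
    return sorted(hits)


patterns = ["english", "kilometer", "fine"]  # numbers 0, 1, 2
text = "vmogenglishsdyfine"
print(find_all(text, patterns))  # [(4, 0), (14, 2)]
```

Pattern 0 ("english") occurs at offset 4 and pattern 2 ("fine") at offset 14; pattern 1 ("kilometer") does not occur in T.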

[0066] The preprocessing stage includes building hash table HASH, jump table SHIFT, prefix table PREFIX and short pattern string filter table HOT. The specific working steps are as follows:
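As background for the tables named above, the following Python sketch builds the SHIFT and HASH tables of the conventional Wu-Manber method and uses them to scan a text. It is a baseline only: it does not include the HOT table or the separate handling of long and short strings that the invention adds, and it omits the PREFIX table for brevity. The function names, the dictionary-based tables, and the block size B=2 are assumptions made for illustration.

```python
def build_tables(patterns, B=2):
    """Build conventional Wu-Manber SHIFT and HASH tables (no HOT table)."""
    m = min(len(p) for p in patterns)   # only the first m characters are indexed
    if m < B:
        raise ValueError("shortest pattern must have at least B characters")
    default = m - B + 1                 # shift for blocks found in no pattern
    shift, hash_table = {}, {}
    for number, p in enumerate(patterns):
        for j in range(B, m + 1):       # j: 1-based end of the block in p[:m]
            block = p[j - B:j]
            shift[block] = min(shift.get(block, default), m - j)
        # Patterns are bucketed by the last block of their first m characters.
        hash_table.setdefault(p[m - B:m], []).append(number)
    return m, default, shift, hash_table


def wu_manber_search(text, patterns, B=2):
    """Return (start_offset, pattern_number) pairs, 0-based, in text order."""
    m, default, shift, hash_table = build_tables(patterns, B)
    hits = []
    end = m                             # current window is text[end - m:end]
    while end <= len(text):
        block = text[end - B:end]
        jump = shift.get(block, default)
        if jump > 0:                    # no pattern can end its prefix here
            end += jump
            continue
        start = end - m
        for number in hash_table.get(block, []):
            if text.startswith(patterns[number], start):
                hits.append((start, number))
        end += 1
    return hits


print(wu_manber_search("vmogenglishsdyfine", ["english", "kilometer", "fine"]))
# [(4, 0), (14, 2)]
```

With these three patterns the shortest length is m=4, so no jump in this baseline can exceed m-B+1=3 characters, even though two of the patterns are much longer. This is the limitation the invention sets out to remove.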

[0067] Step 1: Set the size SUM of the HOT table and the length s of character blocks selected by the HOT table; set SUM to 256, and take s=2.

[0068] Step 2: Divide all pattern strings in the patte...


Abstract

The invention relates to a multi-string matching method, belonging to the technical field of string matching. Based on the conventional Wu-Manber method, the invention separates the long strings from the short strings in a rule set and processes them differently when the SHIFT table is created. The maximum entry of the SHIFT table is thus no longer limited by the length of the short strings, which overcomes the disadvantage that the maximum skipping distance is limited by the length of the shortest string in the rule set. By introducing the HOT table and using HOT lookups during matching, the invention increases the maximum skipping distance of the window without skipping over short strings. The method achieves higher matching efficiency.

Description

Technical field

[0001] The invention relates to a multi-string matching method in a search engine, and belongs to the technical field of string matching.

Background technique

[0002] In the computer field, string matching has always been one of the focuses of research. The string matching problem can be described as follows: given t (t a positive integer) substrings to be matched (usually called pattern strings, or rules), denoted P1, P2, ..., Pt, and a string to be searched (usually called the text), denoted T[1...n] (n a positive integer), find all pattern strings that appear in the text T[1...n] and report the positions at which they occur. Multi-pattern matching means matching multiple pattern strings P1, P2, ..., Pt in the text string T[1...n] in a single pass; when t=1, multi-pattern matching degenerates into single-pattern matching.

[0003] String matching plays a key role in applicatio...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 嵩天, 黎达
Owner: BEIJING INSTITUTE OF TECHNOLOGY