Filtering method for eliminating redundancy in character string fuzzy matching

A fuzzy matching and character filtering technique in the field of string matching. It eliminates redundancy in the matching process and thereby improves matching efficiency.

Status: Inactive. Publication date: 2010-04-21
ZTE CORP
Cites: 3; cited by: 0

AI Technical Summary

Problems solved by technology

The problem studied in string fuzzy matching is then: given a text string t of length n, a pattern string p of length m, and a positive integer k, find all substrings s of t such that ed(s, p) ≤ k.
In fact, this process contains redundancy.
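As a point of reference, the problem just stated can be solved without any filtering by the classic dynamic-programming search usually credited to Sellers, which takes O(mn) time. The Python sketch below is an illustration, not part of the patent, and the name `sellers_search` is hypothetical. It returns the 0-based end positions j of t at which some substring ending at j is a k-match of p. A filtered search must reproduce exactly this output.

```python
# Illustrative sketch, not the patent's code; the function name is hypothetical.
def sellers_search(t: str, p: str, k: int) -> list[int]:
    """Return every end position j (0-based) such that some substring s of t
    ending at j satisfies ed(s, p) <= k. Plain O(m*n) dynamic programming."""
    m = len(p)
    # col[i] = smallest ed(p[:i], s) over substrings s ending at the current position
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(t):
        new = [0]  # a match may start anywhere, so row 0 is always 0
        for i in range(1, m + 1):
            new.append(min(col[i] + 1,                     # extra text character
                           new[i - 1] + 1,                 # missing pattern character
                           col[i - 1] + (p[i - 1] != c)))  # match or replacement
        col = new
        if col[m] <= k:
            ends.append(j)
    return ends
```

For example, `sellers_search("xxabcdxx", "abcd", 0)` reports only position 5, where the exact occurrence of "abcd" ends.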

Examples

Detailed description of embodiments

[0028] In Figure 2, the characters that appear in p are called useful characters, and all other characters are useless characters. The figure shows that if more than k useless characters are found while scanning from position j+m-1 toward the start of the window, the scan can jump to the nearest useful character and start a new match.
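The skip is safe because every character of a substring s that does not occur in p must be replaced or removed, so ed(s, p) is at least the number of useless characters in s. A k-match therefore contains at most k of them, and since ed(s, p) ≥ |m - |s||, it is between m - k and m + k characters long. The Python sketch below builds a skip filter on that rule. It is one possible reading of the figure under stated assumptions, not the patent's exact procedure. Matches are indexed by start position, and only starts on useful characters are reported, because dropping a leading useless character never increases the edit distance. The scanned window is assumed to be the shortest possible k-match (m - k characters, so 0 ≤ k < m). Each surviving start is verified with plain dynamic programming instead of BPM. Both function names are hypothetical.

```python
# Illustrative sketch under assumed details, not the patent's code;
# function names are hypothetical.
def match_starts_at(t: str, p: str, a: int, k: int) -> bool:
    """True if some substring t[a:b] satisfies ed(t[a:b], p) <= k (plain DP)."""
    m = len(p)
    col = list(range(m + 1))          # ed(p[:i], "") = i
    if col[m] <= k:
        return True
    for c in t[a:a + m + k]:          # a k-match has at most m + k characters
        new = [col[0] + 1]
        for i in range(1, m + 1):
            new.append(min(col[i] + 1,                     # extra text character
                           new[i - 1] + 1,                 # missing pattern character
                           col[i - 1] + (p[i - 1] != c)))  # match or replacement
        col = new
        if col[m] <= k:
            return True
    return False


def filtered_starts(t: str, p: str, k: int) -> list[int]:
    """Start positions (on useful characters) of k-matches of p in t,
    found with a useful/useless skip filter. Assumes 0 <= k < len(p)."""
    assert 0 <= k < len(p)
    useful = set(p)
    w = len(p) - k                    # length of the shortest possible k-match
    starts, a = [], 0
    while a + w <= len(t):
        if t[a] not in useful:        # only starts on useful characters are reported
            a += 1
            continue
        useless, i = 0, a + w - 1
        while i >= a:                 # scan the window from its last character back
            if t[i] not in useful:
                useless += 1
                if useless > k:
                    break
            i -= 1
        if useless > k:
            a = i + 1                 # every start in [a, i] is ruled out; skip ahead
            continue
        if match_starts_at(t, p, a, k):
            starts.append(a)
        a += 1
    return starts
```

On the example text of paragraph [0031] ("abefabcdxyz", with p = "abcd" and k = 1), `filtered_starts` returns [4, 5]. The window for the start at position 1 ends in the useless "f" and "e", so the count exceeds k and the scan moves on to the next useful character, the "a" at position 4.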

[0029] In Figure 3, BPM is a classic fuzzy matching method based on bit vectors, ABNDM is a method that improves BPM with a different filtering method, and BPM-BM is the algorithm obtained by improving BPM with the filtering method of the present invention. The figure shows that BPM-BM is 5 to 7 times faster than BPM and from 10% to 4 times faster than ABNDM.
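BPM is the name commonly used for Myers' bit-vector algorithm. It stores one column of the dynamic-programming matrix as two bit masks of vertical +1 and -1 differences and advances that column with a constant number of word operations per text character. Assuming that is the BPM meant here, the Python sketch below follows the usual textbook formulation, with Python integers masked to m bits serving as the bit vectors. It is an illustration only, not the implementation measured in Figure 3. The name `bpm_search` is hypothetical, and the sketch assumes a non-empty pattern. It reports the same end positions as the plain dynamic-programming search.

```python
# Illustrative sketch of Myers' bit-vector search; not the patent's code,
# and the function name is hypothetical. Requires len(p) >= 1.
def bpm_search(t: str, p: str, k: int) -> list[int]:
    """End positions j where some substring of t ending at j is within
    edit distance k of p."""
    m = len(p)
    mask = (1 << m) - 1               # keep every bit vector m bits wide
    high = 1 << (m - 1)               # bit of the last pattern row
    peq: dict[str, int] = {}          # peq[c] has bit i set where p[i] == c
    for i, c in enumerate(p):
        peq[c] = peq.get(c, 0) | (1 << i)
    pv, mv, score = mask, 0, m        # column 0: every vertical delta is +1
    ends = []
    for j, c in enumerate(t):
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) & mask) ^ pv) | eq
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 deltas
        mh = pv & xh                    # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # search variant: row 0 is always 0, so a 0 is shifted in (no "| 1")
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:                  # score = edit distance at the last row
            ends.append(j)
    return ends
```

In a compiled implementation a single machine word limits m (for example to 64). Python's arbitrary-precision integers hide that limit in this sketch.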

[0030] With reference to Figure 1, the following explains how the filtering method of the present invention eliminates the redundancy described above.

[0031] Assume that the character string t in Figure 1 is "abefabcdxyz......" and that fuzzy matches of the pattern string p = "abcd" with k = 1 are to be found in t.

[0032] Initially,...

Abstract

The method scans from character j+m-1 toward the start of the current window. Characters that occur in the pattern string are useful characters, and all other characters are useless characters. When the number of useless characters found exceeds the fuzziness k, the method jumps to the nearest useful character and starts a new match. On Chinese character strings, the method is 10% to 60% faster than the fastest filtering method currently known and 5 to 7 times faster than fuzzy matching of Chinese character strings without a filtering method. When applied to Chinese character strings, the invention filters out 80% to 85% of the characters, which greatly improves matching efficiency.

Description

Technical field

[0001] The invention relates to a filtering method for character string matching, in particular a filtering method for eliminating redundancy in string fuzzy matching. It is especially suitable for string matching over large character sets such as Chinese.

Background

[0002] String fuzzy matching has important applications in intrusion detection, mobile short-message filtering, text editing, information retrieval, automatic indexing, computational biology, information extraction and other fields, and it has become an important topic for improving computer performance.

[0003] Let s and t be two strings. ed(s, t) denotes the minimum number of modifications required to transform t into s, that is, the difference between s and t, where one modification inserts, replaces or deletes a single character. Because every modification can be undone by a single modification (an insertion by a deletion and vice versa, a replacement by another replacement), ed(s, t) = ed(t, s). If ed(s, t) ≤ k, s is said to be a k-match of t.

[0004] Obviously, ed(s,...
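The distance ed(s, t) defined in [0003] is the Levenshtein distance, and it can be computed with the standard Wagner-Fischer dynamic program. The short Python sketch below (function names hypothetical) computes it one row at a time and checks the k-match condition. Because insertions and deletions mirror each other, swapping the arguments gives the same value.

```python
# Illustrative sketch; function names are hypothetical.
def ed(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and
    replacements needed to turn t into s (Levenshtein distance)."""
    prev = list(range(len(s) + 1))   # turning "" into s[:i] takes i insertions
    for j, tc in enumerate(t, start=1):
        cur = [j]                    # turning t[:j] into "" takes j deletions
        for i, sc in enumerate(s, start=1):
            cur.append(min(prev[i] + 1,                # delete tc
                           cur[i - 1] + 1,             # insert sc
                           prev[i - 1] + (sc != tc)))  # keep or replace
        prev = cur
    return prev[-1]


def is_k_match(s: str, t: str, k: int) -> bool:
    """s is a k-match of t when ed(s, t) <= k."""
    return ed(s, t) <= k
```

For instance, "abcd" is a 1-match of "abd" (one insertion), but not of "abef", which needs two replacements.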

Application Information

Patent type and authority: Patents (China)
IPC(8): G06F17/22
Inventors: 陈开渠 (Chen Kaiqu), 赵洁 (Zhao Jie), 彭志威 (Peng Zhiwei)
Owner: ZTE CORP