Rapid character string matching method based on suffix array

A string matching method based on a suffix array, applicable to electrical digital data processing, special data processing applications, instruments, etc. It addresses the low search efficiency of existing methods and achieves fast matching.

Active Publication Date: 2018-11-30
南京搜文信息技术有限公司

AI Technical Summary

Problems solved by technology

[0006] The technical problem to be solved by the present invention is that existing string matching methods based on suffix arrays have low search efficiency. A fast string matching method based on a suffix array is proposed to quickly find the number of times the pattern P appears in the text T.



Embodiment Construction

[0035] To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The embodiments described below are only some of the embodiments of the invention, not all of them. All other embodiments obtained from them by persons of ordinary skill in the art without creative effort fall within the protection scope of the invention.

[0036] Figure 1 is a flow chart of the fast string matching method based on suffix arrays in the present invention.

[0037] The purpose of the present invention is to quickly find the number of occurrences of pattern P in text T. First, the prior art is used to build the suffix array SA of text T; then the number of times that P appears is found in the suffix array SA, t...
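
The baseline in the visible part of paragraph [0037] (build SA for T, then count the occurrences of P through SA) can be sketched as follows. This is a minimal illustration, not the patented procedure: the names `build_suffix_array` and `count_occurrences` are hypothetical, the construction is a naive sort standing in for whatever prior-art construction is used, and the counting step is an ordinary pair of binary searches.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order.

    Naive construction by sorting; used here only as a stand-in for a
    prior-art suffix array builder.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, sa, pattern):
    """Count occurrences of pattern in text using the suffix array sa."""
    m = len(pattern)
    if m == 0:
        return 0

    # Lower bound: first rank whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    left = lo

    # Upper bound: first rank whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every rank in [left, lo) is a suffix that starts with pattern.
    return lo - left
```

Because the suffixes are sorted, all suffixes that begin with P occupy one contiguous range of SA, so the count is simply the width of that range.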



Abstract

The invention provides a fast string matching method based on a suffix array. The method has two stages. In the first stage, binary search restricts the possible positions of the pattern string in the text string to the interval of the suffix array whose suffixes begin with the first character of the pattern string. In the second stage, the search conditions on that interval are tightened further: suffixes that are shorter than the pattern string or whose last character differs from that of the pattern string are excluded. This reduces the number of character comparisons, narrows the matching range, and quickly yields the positions at which the pattern string occurs in the text string.
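
The two stages in the abstract can be sketched roughly as below. This is one reading of the abstract, not the claimed procedure: it assumes that "last character" means the text character aligned with the final position of the pattern, and it scans the stage-one interval linearly in stage two. The function names and the naive suffix array construction are hypothetical.

```python
def build_suffix_array(text):
    """Naive suffix array by sorting; a stand-in for a prior-art builder."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def two_stage_match(text, sa, pattern):
    """Return the sorted 0-based positions where pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    first, last = pattern[0], pattern[-1]

    # Stage 1: binary search for the ranks whose suffix starts with `first`.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]] < first:
            lo = mid + 1
        else:
            hi = mid
    left = lo
    hi = n
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]] <= first:
            lo = mid + 1
        else:
            hi = mid
    right = lo

    # Stage 2: cheap exclusions before comparing the remaining characters.
    hits = []
    for rank in range(left, right):
        start = sa[rank]
        if start + m > n:
            continue  # suffix is shorter than the pattern
        if text[start + m - 1] != last:
            continue  # character at the pattern's last position differs
        # First and last characters already match; compare only the middle.
        if text[start + 1:start + m - 1] == pattern[1:-1]:
            hits.append(start)
    return sorted(hits)
```

For example, `two_stage_match("banana", build_suffix_array("banana"), "ana")` visits only the three suffixes that start with `a` and rejects the one-character suffix on length alone, before any further comparison.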

Description

Technical field

[0001] The invention relates to the technical field of natural language processing within the field of computer technology. In particular, it relates to a processing method for time information in text.

Background technique

[0002] String matching, also called pattern matching, is a key technology widely used in information retrieval, intrusion detection, computational biology, search engines, data compression and other fields. The pattern matching problem is to find all positions at which a specific pattern string $P = p_1 p_2 \dots p_m$ occurs in the text string $T = t_1 t_2 \dots t_n$, and the number of those occurrences. Depending on the research field and research object, pattern matching problems can be roughly divided into four types: exact string matching, extended string matching, regular expression matching and approximate string matching. A suffix array is an ordered integer array, which is a powerful tool in string processing, and is more pract...
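
As a small illustration of the two notions introduced in the background, the sketch below lists the suffix array of a short text and computes the reference answer to the exact matching problem by brute force. The function names are hypothetical, positions are 0-based (the text above indexes from 1), and the sort-based construction is chosen for clarity rather than speed.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, sorted lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def naive_occurrences(text, pattern):
    """Every 0-based position at which pattern occurs in text (brute force)."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


if __name__ == "__main__":
    T = "banana"
    for rank, start in enumerate(build_suffix_array(T)):
        print(rank, start, T[start:])      # rank, start position, suffix
    print(naive_occurrences(T, "ana"))     # positions of P in T
```

For T = "banana" the suffix array is [5, 3, 1, 0, 4, 2], corresponding to the sorted suffixes a, ana, anana, banana, na, nana. The two suffixes that start with "ana" sit next to each other in that order, which is the property suffix-array search relies on.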


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 路松峰
Owner: 南京搜文信息技术有限公司