Multi-mode matching algorithm and system based on encoding association

An encoding-association and pattern-string technique, applied in computing and special-purpose data processing, that reduces storage space consumption, avoids encoding conversion, optimizes the storage structure, and achieves high processing speed.

Active Publication Date: 2012-11-28
CHENGDU WANGAN TECH DEV

AI Technical Summary

Problems solved by technology

It can be seen that although the mandatory code splitting avoids cumbersome code conversion and solves the problem of excessiv



Examples


Embodiment 1

[0078] Referring to Figure 6, this embodiment of the present invention provides a method for building the automatic state machine and the failure-jump function. The method splits Unicode-encoded characters by their encoding to generate the Goto function, which reduces the storage space required by the traditional Wang algorithm. It then uses the association between the encoded characters to generate the FailedJump function, which compensates for the loss of running efficiency caused by the storage optimization and improves the running speed of the algorithm in the same encoding environment. The specific steps are as follows:

[0079] 101: Input a keyword pattern group, split it on the preset keyword-group separator, and obtain the resulting keyword set;

[0080] The keyword pattern group is an overall pattern string obtained by concatenating multiple keyword pattern strings with a pre-set delimiter, in order to facilitate the overall input after selecting multiple keywor...
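Step 101 can be illustrated with a minimal sketch. The function name `split_keyword_group`, the default separator `"|"`, and the choice to drop empty and duplicate entries are assumptions made for illustration; the source does not specify them.

```python
def split_keyword_group(pattern_group: str, separator: str = "|") -> list:
    """Split a concatenated keyword pattern group into individual keywords.

    Assumption: empty fragments and repeated keywords are dropped so that
    the result behaves like a set while keeping the input order.
    """
    if not separator:
        raise ValueError("separator must be a non-empty string")
    seen = set()
    keywords = []
    for part in pattern_group.split(separator):
        if part and part not in seen:
            seen.add(part)
            keywords.append(part)
    return keywords


if __name__ == "__main__":
    # Three hypothetical keywords joined by the assumed "|" separator.
    print(split_keyword_group("机密|绝密|内部资料"))
```

Keeping the input order makes the resulting keyword list reproducible, which is convenient if later steps assign state numbers in insertion order.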

Embodiment 2

[0093] Referring to Figure 7, this embodiment of the present invention provides a system for multi-pattern matching in electronic documents. Using the two functions generated by the initialization module, it quickly performs parallel multi-keyword matching on effective content segments that are obtained from electronic documents or other environments and encoded in Unicode. The system includes:

[0094] The initialization module uses the method described in step 102 and step 103 to initialize and generate the Goto function and the FailedTable function.

[0095] The text content extraction module is used to extract effective content in electronic documents or other information carriers as target text objects for multi-keyword matching.

[0096] Wherein, the detected target text object of the output should be the content segment obtained directly from the electronic document (doc, xls, ppt and some txt files) obtained with the text content extraction module and using Unicode encoding as the Chines...
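To make the idea of splitting encoded characters concrete, here is a rough sketch of a byte-keyed transition table built from UTF-16 encoded keywords, with a naive scan that only starts at code-unit boundaries. It is not the patent's Goto or FailedTable construction, which this excerpt does not describe: the names `build_goto` and `find_keywords`, the nested-dict layout, and the choice of UTF-16-LE are all assumptions, and no failure function is implemented.

```python
def build_goto(keywords, encoding="utf-16-le"):
    """Build a nested-dict trie keyed by the single bytes of each keyword.

    Each state has at most 256 outgoing byte transitions instead of one
    transition per possible 16-bit code unit.
    """
    root = {}
    for word in keywords:
        node = root
        for byte in word.encode(encoding):
            node = node.setdefault(byte, {})
        node[None] = word  # the None key marks a terminal state
    return root


def find_keywords(data: bytes, goto, unit=2):
    """Return (code_unit_index, keyword) pairs found in encoded text.

    Scanning starts only at multiples of `unit`, so a match can never
    begin in the middle of a 2-byte code unit.
    """
    hits = []
    for start in range(0, len(data), unit):
        node = goto
        for pos in range(start, len(data)):
            node = node.get(data[pos])
            if node is None:
                break
            if None in node:
                hits.append((start // unit, node[None]))
    return hits


if __name__ == "__main__":
    goto = build_goto(["机密", "密码"])
    text = "这是机密码".encode("utf-16-le")
    print(find_keywords(text, goto))
```

The scan restarts from every aligned offset, so it is quadratic in the worst case; a real multi-pattern matcher would replace the restart with a failure or skip function.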


Abstract

The invention discloses a multi-pattern matching algorithm and system based on encoding association. The system comprises a keyword group input module for acquiring the set of keywords to match, a Goto function generation module for generating the auxiliary Goto function, a FailedJump function generation module for generating the auxiliary FailedJump function, and a matching execution module for reading a search object and performing multi-keyword matching. Character splitting optimizes the storage structure of the Wang algorithm in a Unicode and Chinese semantic environment. Encoding association eliminates the staggered matching that the Wang algorithm exhibits after encoding conversion and increases its maximum skip value, which speeds up the Wang algorithm in the current encoding environment. The algorithm can be widely applied in computer products that use Unicode as their encoding, for example for scanning and locating key information and for keyword matching and checking across various types of electronic documents.
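One plausible reading of "staggered matching" is a byte-level hit that straddles two adjacent characters once 2-byte code units are split into single bytes. The sketch below demonstrates that effect with plain byte searching; it does not reproduce the patent's encoding-association mechanism, and the helper `byte_offsets`, the sample characters, and the use of UTF-16-BE are illustrative assumptions.

```python
def byte_offsets(data: bytes, pattern: bytes):
    """Return every offset (overlapping ones included) where pattern occurs."""
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(pattern, pos + 1)
    return offsets


ENCODING = "utf-16-be"
keyword = "\u4e00".encode(ENCODING)  # bytes 4E 00
text = "\u554eA".encode(ENCODING)    # bytes 55 4E 00 41

# The keyword character does not occur in the text, yet its byte pair
# appears across the boundary between the two encoded characters.
all_hits = byte_offsets(text, keyword)
aligned_hits = [p for p in all_hits if p % 2 == 0]

if __name__ == "__main__":
    print("raw byte hits:", all_hits)
    print("hits on code-unit boundaries:", aligned_hits)
```

Filtering on even offsets is the simplest way to discard such hits after the fact; an approach that avoids generating them during the scan would save the wasted comparisons.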

Description

Technical field

[0001] The invention belongs to the technical field of text content processing and searching, and in particular relates to a multi-pattern matching algorithm and system based on encoding association.

Background art

[0002] With the development of information technology and the spread of personal computers, electronic documents have become an important carrier of information in today's society. Because electronic documents are so widely used, the information processing technologies built around them, such as indexing specified keywords in electronic documents and checking for specified keywords in confidential documents, have become increasingly important. The multi-pattern matching algorithm, a fast algorithm for finding specified keywords in a target content segment, is the core technology for keyword information processing of electronic documents. The spatial performance of the ...


Application Information

IPC(8): G06F17/30
Inventor: 朱永强
Owner: CHENGDU WANGAN TECH DEV