Multi-pattern matching algorithm and system based on encoding association
An encoding and pattern string technology, applied in computing, special data processing applications, instruments, and similar fields. It addresses problems such as large storage space consumption and the resulting impact, and it achieves high processing speed, avoids encoding conversion, and optimizes the storage structure.
Examples
Embodiment 1
[0078] Referring to Figure 6, this embodiment of the present invention provides a method for establishing an automaton (state machine) and a failure jump function. The method splits the encoding of Unicode-encoded characters to generate the Goto function, which reduces the storage space required by the traditional Wang algorithm. It then uses the association between encoded characters to generate the Failedjump function, which compensates for the loss of operating efficiency caused by the storage optimization and improves the running speed of the algorithm in the same encoding environment. The specific steps are as follows:
[0079] 101: Input a keyword pattern group, divide the input keyword group according to a preset keyword group separator, and obtain a divided keyword set;
[0080] The keyword pattern group is an overall pattern string obtained by concatenating multiple keyword pattern strings with a pre-set delimiter, in order to facilitate the overall input after selecting multiple keywor...
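As a rough illustration of step 101 and of the encoding-splitting idea described above, the following Python sketch splits a keyword pattern group on a delimiter and builds a trie-shaped goto table whose edges are single bytes. It is not the patented procedure: the `|` separator, the choice of UTF-16 big-endian as the Unicode encoding, and the names `split_keyword_group`, `split_code_units`, and `build_goto` are all assumptions made for the example. The point it shows is that splitting each 16-bit code unit into a high byte and a low byte limits every state to at most 256 outgoing edges instead of 65,536.

```python
def split_keyword_group(group: str, separator: str = "|") -> list[str]:
    """Step 101 sketch: split the pattern group on the separator.

    Dropping empty entries is an assumption, not something the source states.
    """
    return [kw for kw in group.split(separator) if kw]


def split_code_units(keyword: str) -> bytes:
    """Split each 16-bit code unit into a high byte and a low byte.

    UTF-16 big-endian is assumed here as the Unicode encoding.
    """
    return keyword.encode("utf-16-be")


def build_goto(keywords: list[str]):
    """Build a goto table over bytes: goto[state][byte] -> next state."""
    goto = [{}]   # state 0 is the root
    output = {}   # accepting state -> keyword
    for kw in keywords:
        state = 0
        for b in split_code_units(kw):
            if b not in goto[state]:
                goto.append({})
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        output[state] = kw
    return goto, output


keywords = split_keyword_group("中国|中文|data")
goto, output = build_goto(keywords)
print(keywords)
print(len(goto), "states;", "root edges:", sorted(goto[0]))
```

Because "中国" and "中文" share their first character, the two keywords share the first two byte edges in the table, which is where the storage saving of a byte-level trie comes from.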
Embodiment 2
[0093] Referring to Figure 7, this embodiment of the present invention provides a system for multi-pattern matching in electronic documents. Using the two functions generated by the initialization module, the system quickly performs parallel multi-keyword matching on effective content segments that are obtained from electronic documents or other environments and encoded in Unicode. The system includes:
[0094] The initialization module uses the method described in step 102 and step 103 to initialize and generate the Goto function and the FailedTable function.
[0095] The text content extraction module is used to extract effective content in electronic documents or other information carriers as target text objects for multi-keyword matching.
[0096] The target text object output for detection should be the content segment that the text content extraction module obtains directly from the electronic document (doc, xls, ppt and some txt files) and using Unicode encoding as the Chines...
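The module layout above can be sketched in Python as an initialization step that builds the tables and a matching step that scans already-extracted text. This is only a sketch under stated assumptions: the excerpt does not show how the patent derives its failure function from the association of encoded characters, so the standard Aho-Corasick breadth-first failure links are used as a stand-in. UTF-16 big-endian is assumed as the Unicode encoding, the text is assumed to be already extracted into a string, and the names `initialize` and `match` are hypothetical.

```python
from collections import deque


def initialize(keywords):
    """Initialization module sketch: goto, failure and output tables over bytes."""
    goto, fail, out = [{}], [0], [[]]
    for kw in keywords:
        state = 0
        for b in kw.encode("utf-16-be"):   # assumed encoding
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].append(kw)
    # Standard Aho-Corasick failure links, computed breadth-first.
    # This is a stand-in, not the patent's own failure function.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def match(text, tables):
    """Matching sketch: returns (start index in UTF-16 code units, keyword) pairs."""
    goto, fail, out = tables
    hits, state = [], 0
    for i, b in enumerate(text.encode("utf-16-be")):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        # Report only after a low byte, so a hit never straddles two characters.
        if i % 2 == 1:
            for kw in out[state]:
                units = len(kw.encode("utf-16-be")) // 2
                hits.append(((i + 1) // 2 - units, kw))
    return hits


tables = initialize(["中国", "中文", "国文"])
print(match("我爱中国文化", tables))
```

The alignment check matters because the automaton runs over split bytes: without it, the low byte of one character followed by the high byte of the next could be reported as a keyword that does not actually occur in the text.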