Large-scale rapid matching method at the sentence level

A large-scale matching method, applied in special data processing applications, instruments, electrical digital data processing, etc., that addresses problems such as existing methods being unsuitable for large-scale text applications, the lack of high-speed fuzzy matching for SMS content, and complex matching rules.

Active Publication Date: 2008-12-24
讯飞医疗科技股份有限公司

AI Technical Summary

Problems solved by technology

Common methods of calculating sentence similarity can be used for fuzzy matching of sentences, but they are not suitable for large-scale text applications. If a keyword-based search matching algorithm is applied to fuzzy matching at the sentence level, multiple substring keywords must be established for each sentence, which leads to a large number of keywords, complex matching rules, and low matching efficiency. Such an approach also cannot meet the requirements of high-capacity, high-speed fuzzy matching of SMS content.



Examples


Embodiment Construction

[0017] The fast sentence matching algorithm provided by the present invention is further explained below with reference to the accompanying drawings. As shown in Figure 1, the algorithm can be divided into two stages: index library establishment and matching search.

[0018] To improve matching accuracy, the algorithm provides a text preprocessing module that preprocesses sentences. Preprocessing includes deleting spaces, special symbols, and other characters that cannot be used as matching keywords; full-width/half-width conversion; case conversion; and conversion to a unified encoding, which supports matching between sentences with different encodings. This module is called in both the indexing stage and the matching stage. After preprocessing, every sentence can be treated as an encoded sequence with 2 bytes per character.
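The preprocessing steps in [0018] can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, Unicode NFKC normalization stands in for full-width/half-width conversion, "characters that cannot be used as matching keywords" is assumed to mean anything that is not a letter or digit, and UTF-16 is assumed as the unified 2-byte encoding (the source does not name one).

```python
import unicodedata


def preprocess(sentence: str) -> str:
    """Normalize a sentence before indexing or matching (illustrative sketch)."""
    # NFKC folds full-width forms such as 'Ａ' or '１' into half-width 'A' and '1'.
    text = unicodedata.normalize("NFKC", sentence)
    # Case conversion: compare everything in lower case.
    text = text.lower()
    # Assumption: only letters and digits (including CJK characters) are usable
    # as matching keywords, so spaces, punctuation and symbols are dropped.
    return "".join(ch for ch in text if ch.isalnum())


def to_two_byte_sequence(text: str) -> bytes:
    """Encode text with 2 bytes per character.

    Assumption: UTF-16 big-endian without a BOM, which uses exactly 2 bytes
    for every character in the Basic Multilingual Plane.
    """
    return text.encode("utf-16-be")
```

For example, `preprocess("Ｈｅｌｌｏ， World！ １２３")` yields `"helloworld123"`, and the same function would be called on both the indexed sentences and each incoming query so that the two sides are compared in the same normalized form.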

[0019] For each sentence, multiple indexes are established. Specifically, a sliding window that can accommodate L characters is ...
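The source text of [0019] is cut off, so the following Python sketch only illustrates what sliding an L-character window over a sentence produces. How the patent stores or selects these windows is not stated here; the inverted index below, the function names, and the handling of sentences shorter than L are all assumptions made for illustration.

```python
from collections import defaultdict


def window_keys(sentence: str, L: int) -> list[str]:
    """Return every L-character substring seen by a window sliding one character at a time."""
    if L <= 0:
        raise ValueError("L must be positive")
    if len(sentence) <= L:
        # Assumption: a sentence no longer than the window yields itself as its only key.
        return [sentence] if sentence else []
    return [sentence[i:i + L] for i in range(len(sentence) - L + 1)]


def build_index(sentences: list[str], L: int) -> dict[str, set[int]]:
    """Map each window key to the ids (list positions) of the sentences containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for sentence_id, sentence in enumerate(sentences):
        for key in window_keys(sentence, L):
            index[key].add(sentence_id)
    return index
```

With `L = 3`, the sentence `"abcde"` produces the keys `"abc"`, `"bcd"` and `"cde"`, so two sentences that share any three consecutive characters end up under a common key.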



Abstract

The invention relates to a large-scale fast matching method at the sentence level. The method comprises three stages: index establishment, fuzzy matching, and exact matching. The index establishment stage is responsible for standardizing sentence content and converting its encoding. The fuzzy matching stage picks out, from a large number of sentences, the candidate sentences that may match a new sentence, and keeps the number of candidates within a practicable range. The exact matching stage adopts a similarity measure based on edit distance; the final matched sentences are obtained by ranking the candidate sentences according to their exact-matching similarity. In practical tests the method shows excellent performance, high search efficiency, and a low missed-detection rate, and it is capable of meeting practical requirements.
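The exact matching stage described in the abstract can be sketched in Python as follows. The abstract only says the similarity measure is based on edit distance; the use of Levenshtein distance, the normalization by the longer sentence's length, and the function names are assumptions for illustration, not the patent's definition.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer sentence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def rank_candidates(query: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Score each candidate against the query and sort from most to least similar."""
    scored = [(c, similarity(query, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because edit distance costs time proportional to the product of the two sentence lengths, it is only practical to run it on the short candidate list that the fuzzy matching stage returns, which is consistent with the abstract's requirement that the number of candidates be kept within a practicable range.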

Description

Technical field

[0001] The invention relates to a text retrieval method, in particular to a large-scale rapid matching method at the sentence level for text retrieval queries.

Background

[0002] Search matching algorithms are currently widely used in Internet search and management information systems, and the algorithm differs with the purpose of the application. The most common approach generates matching rules from fixed keywords combined through logical AND/OR relationships. More intelligent algorithms also support searching for keywords that are similar in sound or shape.

[0003] The patent "Linear Parameter Matching Algorithm for SMS Content" (publication number 200410061271.4), published on the website of the State Intellectual Property Office of China, provides a matching method that linearly adjusts the matching parameters through matching feedback information, so as to match a certain flow of SMS within t...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 陈志刚, 胡国平, 胡郁, 刘庆峰, 王仁华
Owner: 讯飞医疗科技股份有限公司