A Lucene-based query method for typos

A query method for typos, applied in text database querying, electronic digital data processing, unstructured text data retrieval, and similar areas. It addresses problems such as low proofreading efficiency, poor user experience, and interference with sentence syntax and semantics, and improves query accuracy.

Active Publication Date: 2020-03-20
China Southern Power Grid Internet Service Co., Ltd. (南方电网互联网服务有限公司)
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

[0006] 3) Real-word errors interfere with the grammar and semantics of the whole sentence, so detecting them requires substantial linguistic knowledge and resources;
[0007] 4) Data sparseness is a major obstacle to the automatic proofreading of real-word errors.
A prior-art automatic proofreading method for Chinese real-word errors addresses data sparseness, misjudgment of correct words, and low proofreading efficiency, and is effective and accurate. However, it still has shortcomings: in practice it requires training on a large corpus, and retrieval is time-consuming, which degrades the user experience.



Examples


Embodiment

[0030] This embodiment applies the Lucene-based typo query method to the query text "优花" (Youhua). The query method comprises the following steps:

[0031] (1) Segment the query text "优花"; the segmentation result is "优" and "花";
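The patent text does not say which segmenter produces this split. As a hedged illustration only, assuming a dictionary-based forward maximum matching segmenter and a small hypothetical dictionary, the sketch below shows why a typo such as "优花" falls apart into single-character words: no dictionary entry covers it, so each character is emitted on its own, while the correct word "优化" stays whole.

```python
# Hypothetical toy dictionary (assumption); a real system would load a full lexicon.
DICTIONARY = {"优化", "查询", "方法"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def fmm_segment(text: str, dictionary: set[str], max_len: int) -> list[str]:
    """Forward maximum matching: at each position take the longest
    dictionary word, falling back to a single character."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit
        # or until only one character is left.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in dictionary:
                words.append(piece)
                i += length
                break
    return words


print(fmm_segment("优花", DICTIONARY, MAX_WORD_LEN))  # ['优', '花']
print(fmm_segment("优化", DICTIONARY, MAX_WORD_LEN))  # ['优化']
```

Because the typo yields single-character fragments, the later steps only need to examine single-character words as possible error sites.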

[0032] (2) Read the word "优", determine whether it is a multi-character word or a single-character word (here it is a single-character word), and obtain the simset of "优" = [you, squid, yo, 莸, post, yo, you, 牖, lure, yo, 蝣, 蝤, weed, friend, wart, and, larvae, especially, gnat, young, worried, secluded, you, you, you, uranium, long, you, oil, right, pomelo, still, have, excellent, brachial, 奶, you, you, 卣, europium, you, yo, glaze];

[0033] (3) Read the word "花", determine that it is a single-character word, and obtain the simset of "花" = [化, hua, 吪, slip, 植, 姡, cunning, hua, stroke, 呚, flower, wow, birch, flower, Hua, Long, words];
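Steps (2) and (3) both turn a single-character word into a simset. A minimal sketch, assuming the simset is the character itself plus the union of its sound-similar and shape-similar table entries (both simsets in the embodiment include the original character). The table contents below are hypothetical excerpts chosen for illustration, not the patent's tables.

```python
# Hypothetical excerpts of the sound-similar and shape-similar tables (assumption).
SOUND_SIMILAR = {
    "优": ["由", "有", "友", "忧", "油"],
    "花": ["化", "华", "话", "画", "哗"],
}
SHAPE_SIMILAR = {
    "优": ["忧", "扰"],
    "花": ["化", "苍"],
}


def is_single_char_word(word: str) -> bool:
    """Only single-character words are expanded into a simset."""
    return len(word) == 1


def simset(char: str) -> list[str]:
    """Return the character plus its sound- and shape-similar candidates.

    A character listed in both tables appears once; first-seen order is
    kept. A character missing from both tables maps to [char].
    """
    merged = [char] + SOUND_SIMILAR.get(char, []) + SHAPE_SIMILAR.get(char, [])
    return list(dict.fromkeys(merged))


print(simset("花"))  # ['花', '化', '华', '话', '画', '哗', '苍']
```

Using `dict.fromkeys` removes duplicates such as "化", which this toy data lists as both sound-similar and shape-similar to "花", while preserving the table order.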

[0034] (4) The resulting Cartesian product is Result = [Youhua, Youhua, Youhua, Yo...
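Step (4) combines the two simsets. A minimal sketch, assuming the dictionary is an in-memory set of words (the patent does not say how it is stored) and using hypothetical, shortened simsets: every pairing is generated with `itertools.product`, and only pairings that are dictionary words survive as corrections.

```python
from itertools import product

# Hypothetical dictionary and shortened simsets (assumption, toy data only).
DICTIONARY = {"优化", "花朵", "优秀"}
SIMSET_YOU = ["优", "由", "有", "忧"]
SIMSET_HUA = ["花", "化", "华", "话"]


def cartesian_candidates(left: list[str], right: list[str]) -> list[str]:
    """Join every left candidate with every right candidate."""
    return ["".join(pair) for pair in product(left, right)]


def match_dictionary(candidates: list[str], dictionary: set[str]) -> list[str]:
    """Keep the candidates that are dictionary words, in candidate order."""
    return [c for c in candidates if c in dictionary]


candidates = cartesian_candidates(SIMSET_YOU, SIMSET_HUA)
print(len(candidates))                           # 16
print(match_dictionary(candidates, DICTIONARY))  # ['优化']
```

With the full simsets listed in steps (2) and (3), the product runs to hundreds of candidates, so a hashed set lookup per candidate keeps the matching step cheap in this sketch.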


Abstract

The invention relates to a Lucene-based query method for typos. The method segments the text to be queried and takes the first word. If it is a single-character word, the method looks it up in a sound-similar table and a shape-similar table, which together return the query result simset. It then forms the Cartesian product of this simset with the next word, or with the next word's simset, and matches every element of the product against all words in the dictionary. If a match succeeds, the matched word is returned as an error-correction result and added to the error-correction result set. If the first word of the query text is not a single-character word, or no element of the product matches a dictionary word, the method reads the following characters and repeats the previous steps. If the error-correction result set is empty, a null value is returned and matching exits; otherwise all error-correction results are returned and used for the query. The invention makes Lucene search more accurate and user-friendly, improving search accuracy.
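The control flow in the abstract can be sketched as a single loop over the segmented words. This is a hedged sketch, not the patent's implementation: the merged similarity table and the dictionary are hypothetical toys, the input is assumed to be already segmented, and advancing past both words after a successful match is an assumption, since the abstract only states what happens when matching fails.

```python
from itertools import product
from typing import Optional

# Hypothetical merged sound/shape-similar table and dictionary (toy data only).
SIMILAR = {
    "优": ["由", "有", "忧"],
    "花": ["化", "华", "话"],
}
DICTIONARY = {"优化", "方法"}


def simset(char: str) -> list[str]:
    """The character itself plus its similar candidates, without duplicates."""
    return list(dict.fromkeys([char] + SIMILAR.get(char, [])))


def correct(words: list[str]) -> Optional[list[str]]:
    """Return dictionary-verified corrections, or None if none were found."""
    corrections: list[str] = []
    i = 0
    while i < len(words):
        word = words[i]
        # Only a single-character word that has a following word is checked.
        if len(word) == 1 and i + 1 < len(words):
            nxt = words[i + 1]
            # Pair with the next word itself, or with its simset when the
            # next word is also a single-character word.
            right = simset(nxt) if len(nxt) == 1 else [nxt]
            matches = [a + b for a, b in product(simset(word), right)
                       if a + b in DICTIONARY]
            if matches:
                for m in matches:
                    if m not in corrections:
                        corrections.append(m)
                i += 2  # assumption: skip past the pair that was corrected
                continue
        i += 1  # no match: move on to the next word and repeat
    # An empty result set maps to a null value (None), as in the abstract.
    return corrections or None


print(correct(["优", "花"]))  # ['优化']
print(correct(["方法"]))      # None
```

In the patent the returned corrections are then used for the Lucene query; that step depends on the index and analyzer setup and is not shown here.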

Description

Technical field

[0001] The invention belongs to natural language processing within artificial intelligence, and in particular relates to a Lucene-based query method for typos.

Background art

[0002] With the rapid development of information processing technology and the Internet, traditional text work has been almost entirely taken over by computers. Electronic publications such as e-books, e-newspapers, emails, and office documents keep emerging, and texts contain more and more errors.

[0003] At present, proofreading is mostly done manually. Manual proofreading is monotonous, labor-intensive, and inefficient, and can no longer meet the demand for text proofreading. Research on automatic text proofreading is therefore of far-reaching theoretical and practical significance. Automatic text proofreading is one of the main applications of natural language processing, and it is also a difficult problem in natural language under...


Application Information

Patent type & authority: Patents (China)
IPC (8): G06F16/33; G06F40/289; G06F40/30
CPC: G06F16/3344; G06F40/289; G06F40/30
Inventor 张晓如陈璟刘嘎琼陈国程文月刘亮亮
Owner: China Southern Power Grid Internet Service Co., Ltd. (南方电网互联网服务有限公司)