Search engine-oriented error correction method and system of Chinese and English mixed querying

A search engine and query string technology, applied to search engine-oriented error correction of Chinese-English mixed queries. It addresses the insufficient support for such queries in existing systems and improves the accuracy of error correction.

Active Publication Date: 2017-09-22
SUN YAT SEN UNIV
Cites: 4; Cited by: 21

AI Technical Summary

Problems solved by technology

[0007] Most Chinese information retrieval systems support error correction only for purely Chinese or purely English query words; support for the Chinese-English mixed queries that users enter remains incomplete.



Examples


Embodiment 1

[0053] As shown in Figure 1 and Figures 2 to 5, a search engine-oriented method for error correction of Chinese-English mixed queries includes the following steps:

[0054] S1. Use crawler technology to crawl Internet webpage content;

[0055] S2. Use the webpage content crawled in step S1 and the search logs as a corpus to construct a language model, and construct a pinyin-based dictionary tree, an English index table, and a word segmentation dictionary;

[0056] S3. For the query string input by the user, first evaluate it with the language model to calculate its rationality probability. If the rationality probability is lower than a set threshold A, or the number of search results obtained for the query string is less than a threshold B, proceed to the error correction process of step S4;

[0057] S4. (1) If the query string contains only Chinese, the following error correction process is performed, as shown in Figure 2:

[0058] S101. If the input ...
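
The gate in step S3 can be illustrated with a small sketch. The source does not specify the N-gram order, the smoothing, or how the "rationality probability" is normalized, so the bigram model, add-one smoothing, and length-normalized log probability below are assumptions, as are all function names. The query is assumed to be already segmented into tokens, and `result_count` stands in for the number of results the search engine returns.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over tokenized sentences (hypothetical helper)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in corpus:
        padded = ["<s>"] + tokens + ["</s>"]  # sentence boundary markers
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def avg_log_prob(tokens, unigrams, bigrams):
    """Length-normalized log probability with add-one smoothing (an assumption)."""
    padded = ["<s>"] + tokens + ["</s>"]
    vocab = len(unigrams)
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
    return total / (len(padded) - 1)

def needs_correction(tokens, result_count, model, threshold_a, threshold_b):
    """Step S3 gate: correct if the score is below A or the results are fewer than B."""
    unigrams, bigrams = model
    score = avg_log_prob(tokens, unigrams, bigrams)
    return score < threshold_a or result_count < threshold_b

# Toy corpus standing in for crawled pages and search logs.
model = train_bigram([["search", "engine"], ["search", "engine", "query"]])
print(needs_correction(["search", "engine"], 100, model, -1.5, 10))
print(needs_correction(["engine", "search"], 100, model, -1.5, 10))
```

In this sketch, threshold A is expressed in log space because the score is an average log probability; a system that thresholds raw probabilities would need a different scale.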

Embodiment 2

[0073] This embodiment provides a system that applies the method of Embodiment 1. As shown in Figure 1, the specific scheme is as follows:

[0074] The system includes a learning module, an error correction module, and a training module;

[0075] The learning module mines new words from the corpus and adds them to the word segmentation dictionary, which is used to segment the query string in step S3;

[0076] The training module builds a language model from the corpus, and builds a pinyin-based dictionary tree, an English index table, and a word segmentation dictionary;

[0077] The error correction module is used for error correction processing.

[0078] In a specific implementation process, the error correction module includes a Chinese error correction submodule, a Chinese and letter error correction submodule, an English and Pinyin error correction submodule, wherein the Chinese error co...
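
As an illustration of the pinyin-based dictionary tree built by the training module, the following is a minimal sketch. The source does not describe the tree's layout, so keying each level by a whole pinyin syllable and storing the matching words at the final node are assumptions; the class and method names are hypothetical, and the sample entries are ordinary dictionary words chosen only for the example.

```python
class TrieNode:
    """One node of the tree: child nodes keyed by pinyin syllable, plus stored words."""
    def __init__(self):
        self.children = {}
        self.words = []

class PinyinTrie:
    """Hypothetical pinyin-keyed dictionary tree mapping syllable sequences to words."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, syllables, word):
        """Walk or create the path for the syllable sequence and store the word."""
        node = self.root
        for syllable in syllables:
            node = node.children.setdefault(syllable, TrieNode())
        node.words.append(word)

    def lookup(self, syllables):
        """Return all words whose pinyin is exactly this syllable sequence."""
        node = self.root
        for syllable in syllables:
            node = node.children.get(syllable)
            if node is None:
                return []
        return list(node.words)

trie = PinyinTrie()
trie.insert(["quan", "li"], "权利")
trie.insert(["quan", "li"], "权力")
trie.insert(["sou", "suo"], "搜索")
print(trie.lookup(["quan", "li"]))  # homophones share one node
```

Because homophones share a node, a lookup returns every candidate word for a pinyin sequence; choosing among those candidates would fall to some ranking step such as the language model.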



Abstract

The invention relates to a search engine-oriented method and system for error correction of Chinese-English mixed queries. Based on an N-gram language model and a variety of error correction strategies, the method or system corrects Chinese-English mixed queries that contain partial errors in a search engine.
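
Applying different strategies to different queries implies routing each query by its script composition. The sketch below shows one way to do that routing, using the three submodule names from Embodiment 2. The exact conditions the patent uses are not visible in this text, so the mapping from character mix to submodule, the Unicode range used to detect Chinese, and the function name are all assumptions.

```python
import re

# CJK Unified Ideographs basic block; treating this range as "Chinese" is an assumption.
CJK = re.compile(r"[\u4e00-\u9fff]")
LATIN = re.compile(r"[A-Za-z]")

def classify_query(query):
    """Pick a (hypothetical) error correction submodule from the query's scripts."""
    has_cjk = bool(CJK.search(query))
    has_latin = bool(LATIN.search(query))
    if has_cjk and has_latin:
        return "chinese_and_letter"
    if has_cjk:
        return "chinese"
    if has_latin:
        # Latin-only input may be English, pinyin, or a mix of both.
        return "english_and_pinyin"
    return "other"

for q in ["搜索引擎", "搜索engine", "sousuo engine", "12345"]:
    print(q, classify_query(q))
```

Distinguishing English words from pinyin inside a Latin-only query is left to the downstream submodule in this sketch.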

Description

Technical field

[0001] The present invention relates to the technical field of search engines, and more specifically to a search engine-oriented method and system for error correction of Chinese-English mixed queries.

Background art

[0002] The demand for query error correction originated in the analysis of search engine logs, which revealed a large number of query words containing errors. When a query contains errors, the recall and precision of the search engine drop sharply. Query correction was therefore introduced into search engine systems to solve the problem of invalid queries caused by users entering erroneous query words.

[0003] Query error correction targets the spelling correction of query statements in information retrieval systems. The query statement directly affects the reliability and accuracy of the results returned by the information retriev...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/322, G06F16/3335, G06F16/374, G06F16/951
Inventors: 刘玉葆, 占明明, 葛又铭, 戴戈南
Owner: SUN YAT SEN UNIV