Query error correction method and system for mixed-language queries in Chinese search engines

A query error correction method and system for mixed-language queries in Chinese search engines. It addresses two problems of existing approaches: they cannot handle queries in which Chinese, English, and Pinyin coexist, and they cannot handle the full range of Chinese query errors. The stated effect is simple processing.

Active Publication Date: 2013-01-09
INST OF COMPUTING TECHNOLOGY - CHINESE ACAD OF SCI +1
Cites: 2; Cited by: 32

AI Technical Summary

Problems solved by technology

The existing solution cannot handle Chinese query errors outside its correction set, and it cannot handle queries in which Chinese, English, and Pinyin coexist, as they do in Chinese search engines.
[0007] Because queries to Chinese search engines mix languages, neither the English query error correction method nor the simple fuzzy-sound-matching Chinese query error correction method is suitable for query error correction in Chinese search engines.

Examples

Embodiment Construction

[0067] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments, but it is not intended to limit the present invention.

[0068] As shown in Figure 1, a query error correction method for mixed-language queries in Chinese search engines includes the following steps:

[0069] (1) Construct a heterogeneous character tree dictionary for mixed languages, and use high-frequency or high-click query text to build a language model.

[0070] Step (1) corresponds to step 102 and step 106 in Figure 1.
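The language-model half of step (1) can be illustrated with a small sketch. The text does not say which kind of language model is built, so the bigram model with add-one smoothing below is an assumption, as are the names `train_bigram` and `log_prob` and the input format (pre-segmented queries mapped to their frequency or click count).

```python
import math
from collections import Counter

def train_bigram(query_counts):
    """Build bigram counts from {tuple_of_tokens: frequency_or_clicks}.

    Assumption: queries are already segmented into tokens, and each
    query is weighted by how often it was issued or clicked.
    """
    context_counts = Counter()   # how often each token appears as a context
    bigram_counts = Counter()    # how often each (context, token) pair appears
    vocab = {"</s>"}
    for tokens, freq in query_counts.items():
        vocab.update(tokens)
        seq = ("<s>",) + tuple(tokens) + ("</s>",)
        for a, b in zip(seq, seq[1:]):
            context_counts[a] += freq
            bigram_counts[(a, b)] += freq
    return context_counts, bigram_counts, len(vocab)

def log_prob(tokens, model):
    """Log-probability of a token sequence under add-one smoothing."""
    context_counts, bigram_counts, vocab_size = model
    seq = ("<s>",) + tuple(tokens) + ("</s>",)
    return sum(
        math.log((bigram_counts[(a, b)] + 1) / (context_counts[a] + vocab_size))
        for a, b in zip(seq, seq[1:])
    )

# Hypothetical log data: segmented query -> frequency.
model = train_bigram({
    ("北京", "天气"): 50,
    ("北京", "地图"): 30,
    ("iphone", "手机"): 20,
})
```

With such a model, a candidate correction that matches frequent queries, such as `("北京", "天气")`, scores higher than an unseen combination such as `("北京", "手机")`.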

[0071] In step 102, the thesaurus file 104 is used to construct the heterogeneous character tree dictionary for mixed languages; the detailed process is shown in Figure 2.

[0072] In the process of building the dictionary tree, Chinese characters and other characters are handled slightly differently. As ca...
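A character tree (trie) that holds Chinese, English, and Pinyin entries side by side can be sketched as follows. Because the paragraph above is truncated, the actual difference in handling is not known; this sketch assumes, purely for illustration, that Chinese characters are stored as-is while other characters are case-folded. The names `MixedTrie`, `is_cjk`, and `normalize` are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # character -> TrieNode
        self.is_entry = False  # True if a dictionary entry ends here

def is_cjk(ch):
    # Basic CJK Unified Ideographs block only; extension blocks are ignored.
    return "\u4e00" <= ch <= "\u9fff"

def normalize(ch):
    # Assumption: Chinese characters are kept unchanged, while other
    # characters (English letters, Pinyin) are lowercased.
    return ch if is_cjk(ch) else ch.lower()

class MixedTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, entry):
        node = self.root
        for ch in entry:
            node = node.children.setdefault(normalize(ch), TrieNode())
        node.is_entry = True

    def _walk(self, text):
        # Follow text from the root; return the last node or None.
        node = self.root
        for ch in text:
            node = node.children.get(normalize(ch))
            if node is None:
                return None
        return node

    def contains(self, entry):
        node = self._walk(entry)
        return node is not None and node.is_entry

    def has_prefix(self, prefix):
        return self._walk(prefix) is not None

trie = MixedTrie()
for entry in ["北京", "beijing", "iPhone手机"]:
    trie.insert(entry)
```

Here `trie.contains("iphone手机")` succeeds because the Latin part is case-folded, while `trie.contains("北")` fails and `trie.has_prefix("北")` succeeds, which is the distinction a segmenter needs between a finished entry and one still in progress.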


Abstract

The invention relates to a query error correction method and system for mixed-language queries in Chinese search engines. To suit the mixed-language nature of such queries, the method uses a heterogeneous character tree dictionary for mixed languages together with a language model built from high-frequency or high-click user query logs, and performs segmentation and error correction on the user's query simultaneously; the query is segmented by switching between states. Two queues record the N best completed states and the M best uncompleted states after each editing step, which keeps error correction fast and also yields the best segmentation and the corresponding combination of replacement entries. Finally, the characteristics of the candidate set of error correction results are used to decide which results satisfy the limiting conditions, and only those are output, which effectively improves accuracy.
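The double-queue idea in the abstract can be sketched as a beam search that keeps two pruned lists per input character: states whose last segment is a finished dictionary entry (completed) and states still in the middle of an entry (uncompleted). This is a simplified illustration, not the patented procedure: the only edit operation is character substitution, and a fixed per-segment penalty stands in for the language model score. The function name `correct_query` and all parameters are hypothetical.

```python
import heapq

def correct_query(query, lexicon, n_completed=2, m_uncompleted=3,
                  sub_cost=1.0, seg_cost=0.1):
    """Segment and correct `query` at the same time.

    A state is (cost, segments, buffer). It is completed when buffer is
    empty and uncompleted when buffer is a proper prefix of some entry.
    """
    prefixes = {e[:i] for e in lexicon for i in range(1, len(e))}
    alphabet = sorted({ch for e in lexicon for ch in e})
    completed = [(0.0, (), "")]
    uncompleted = []
    for c in query:
        new_completed, new_uncompleted = [], []
        # Keep the observed character for free, or substitute it at a cost.
        candidates = [(c, 0.0)] + [(x, sub_cost) for x in alphabet if x != c]
        for cost, segs, buf in completed + uncompleted:
            for ch, extra in candidates:
                word = buf + ch
                if word in prefixes:   # entry still in progress
                    new_uncompleted.append((cost + extra, segs, word))
                if word in lexicon:    # entry finished: switch state
                    new_completed.append(
                        (cost + extra + seg_cost, segs + (word,), ""))
        # The two queues: N best completed and M best uncompleted states.
        completed = heapq.nsmallest(
            n_completed, new_completed, key=lambda s: s[0])
        uncompleted = heapq.nsmallest(
            m_uncompleted, new_uncompleted, key=lambda s: s[0])
    return [(list(segs), cost) for cost, segs, _ in completed]

lexicon = {"北京", "天气", "天气预报", "beijing"}
results = correct_query("北京天汽", lexicon)
```

For the misspelled query `"北京天汽"`, the best completed state segments and corrects it as `["北京", "天气"]`. Pruning each queue separately means a cheap but unfinished entry cannot crowd out finished segmentations, and the reverse also holds.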

Description

Technical field

[0001] The invention belongs to natural language processing technology, and in particular relates to a query error correction method and system for mixed-language queries in Chinese search engines.

Background

[0002] Existing search engines interact with the user mainly as follows: the user inputs a search term, and the search engine provides web pages matching that term. Correctly understanding the user's query intent from the search terms entered is therefore one of the functions a search engine must continuously improve. Compared with traditional text, search terms entered into search engines have a higher probability of error and more types of errors, mainly because of the huge user base of search engines and the novelty and variety of online language. According to statistics, 10%-15% of queries entered into English search engines contain spelling errors. Query error correction technology is a n...

Application Information

IPC(8): G06F17/30, G06F17/24, G06F11/07
Inventor: 程舒杨, 熊锦华, 公帅, 颛悦, 张成, 程学旗, 廖华明
Owner INST OF COMPUTING TECHNOLOGY - CHINESE ACAD OF SCI