Chinese text error correction method based on same or similar pinyin

The technology combines text error correction with pinyin and is applied in the field of text error correction. It addresses the low accuracy, high time cost, and inconvenient use of existing methods, and achieves better generalization ability, faster error correction, and more convenient use.

Active Publication Date: 2020-10-09
杭州云嘉云计算有限公司

AI Technical Summary

Problems solved by technology

[0003] Building a confusion set takes a lot of time and manual maintenance, which makes it costly and inconvenient to use, and the accuracy of traditional methods is low. To address these problems, the present invention proposes a Chinese text error correction method based on the same or similar pinyin. It establishes a Chinese character structure language model whose granularity is a single Chinese character, uses the confusion set and the MAD algorithm to detect errors in the candidate sequence, and uses the double-selection Viterbi algorithm to decode and output the error correction result.
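The text names a "MAD algorithm" for error detection without expanding the acronym. Assuming it stands for median absolute deviation, a minimal sketch of MAD-based outlier detection over per-character scores could look like the following. The function name `mad_outliers`, the threshold of 3.5, and the example scores are illustrative assumptions, not values from the patent.

```python
from statistics import median


def mad_outliers(scores, threshold=3.5):
    """Return indices whose score lies unusually far below the median.

    Assumption: "MAD" means median absolute deviation, and a low
    language-model score marks a suspicious character.
    """
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    if mad == 0:
        return []  # scale is undefined when MAD is zero; flag nothing
    flagged = []
    for i, s in enumerate(scores):
        # 0.6745 rescales MAD to be comparable with a standard deviation
        # for normally distributed data (the "modified z-score").
        z = 0.6745 * (s - med) / mad
        if z < -threshold:
            flagged.append(i)
    return flagged


# Hypothetical log-probabilities, one per character of a sentence.
scores = [-2.1, -1.9, -2.3, -9.8, -2.0, -2.2]
print(mad_outliers(scores))
```

Because the median and MAD are barely affected by a single extreme value, the one very low score stands out clearly, which is why this kind of statistic is often preferred over the mean and standard deviation for short sequences.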


Examples


Embodiment

[0052] This embodiment proposes a Chinese text error correction method based on the same or similar pinyin which, referring to Figure 1, includes the following steps:

[0053] S1: Adjust the traditional n-gram language model to establish a Chinese character structure language model whose granularity is a single Chinese character.

[0054] Building the model goes through the following steps: collect a text corpus, segment the text into words, convert the segmented text into the character structure, generate statistical count files, and then generate the language model. In this way the character-granularity language model retains the advantages of a word-granularity language model while remaining convenient for character-by-character error detection of sentences.
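The counting and model-generation steps can be illustrated with a small sketch. It assumes the corpus has already been segmented and converted into token sequences, counts n-grams, and derives an add-one smoothed bigram probability. The function names, the `<s>`/`</s>` padding markers, and the smoothing choice are assumptions made for illustration; the source does not say which toolkit or smoothing method is used.

```python
from collections import Counter


def count_ngrams(sentences, n=2):
    """Count all 1..n-grams over tokenised sentences, padded with <s> and </s>."""
    counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
        for order in range(1, n + 1):
            for i in range(len(padded) - order + 1):
                counts[tuple(padded[i:i + order])] += 1
    return counts


def bigram_prob(counts, prev, cur, vocab_size):
    """Add-one smoothed estimate of P(cur | prev)."""
    return (counts[(prev, cur)] + 1) / (counts[(prev,)] + vocab_size)


# Toy corpus; real tokens would be character-structure units.
corpus = [["a", "b"], ["a", "c"]]
counts = count_ngrams(corpus, n=2)
print(counts[("a", "b")], bigram_prob(counts, "a", "b", vocab_size=4))
```

Here `vocab_size=4` covers the tokens `a`, `b`, `c`, and `</s>`. A production model would more likely use a dedicated language-model toolkit with a stronger smoothing scheme.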

[0055] Character structure description: each unit is composed of the Chinese character + pinyin + word position number. There are 6 types of position numbers: a single-character word is numbered s, the characters of a two-character word are numbered b2 and e2, and words with three or more characters...
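A minimal sketch of this conversion, covering only the position numbers that the text spells out (s, b2, e2). The `|` separator, the function name `to_char_structures`, and the toneless pinyin strings are assumptions for illustration. The source is truncated before it names the numbers for longer words, so the sketch deliberately does not guess them.

```python
def to_char_structures(word, pinyins):
    """Convert one segmented word into "character|pinyin|position" tokens."""
    if not word or len(word) != len(pinyins):
        raise ValueError("need a non-empty word and one pinyin per character")
    if len(word) == 1:
        labels = ["s"]
    elif len(word) == 2:
        labels = ["b2", "e2"]
    else:
        # The source text is cut off before it gives these position numbers.
        raise NotImplementedError("position numbers for 3+ character words are not given")
    return [f"{ch}|{py}|{lab}" for ch, py, lab in zip(word, pinyins, labels)]


tokens = to_char_structures("我", ["wo"]) + to_char_structures("中国", ["zhong", "guo"])
print(tokens)
```

Encoding the position inside the token lets a character-level n-gram model still see where word boundaries fall, which is how the character-granularity model can keep word-level information.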


Abstract

The invention provides a Chinese text error correction method based on the same or similar pinyin. The method comprises the following steps: S1, adjusting a conventional n-gram language model to build a Chinese character structure language model whose granularity is a single Chinese character; S2, performing candidate processing on the sentence to be corrected to generate a candidate sequence; S3, performing error detection on the candidate sequence based on the confusion set and the MAD algorithm to obtain the candidate sequence of the sentence to be corrected; and S4, based on the maximum posterior probability under the Chinese character structure language model, decoding and outputting the error correction result using a double-selection Viterbi algorithm. Compared with traditional methods, the accuracy is higher and error correction is faster.
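To make the decoding step concrete, here is a sketch of standard Viterbi decoding over a lattice of same-pinyin candidates scored by a bigram model. It is plain Viterbi, not the double-selection variant, which this excerpt names but does not define. The candidate lists, the toy bigram table, and the `1e-6` floor for unseen pairs are all illustrative assumptions.

```python
import math


def viterbi(candidates, log_prob):
    """Pick the highest-scoring path through a lattice of candidates.

    candidates: list of lists; candidates[i] holds the options for position i.
    log_prob(prev, cur): log P(cur | prev); prev is "<s>" at the first position.
    """
    # best[c] = (score of the best path ending in c, that path)
    best = {c: (log_prob("<s>", c), [c]) for c in candidates[0]}
    for options in candidates[1:]:
        step = {}
        for cur in options:
            score, path = max(
                ((s + log_prob(prev, cur), p) for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            step[cur] = (score, path + [cur])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


# Toy bigram probabilities; 再 and 在 share the pinyin "zai".
bigrams = {("<s>", "我"): 0.5, ("我", "在"): 0.4, ("我", "再"): 0.05,
           ("在", "家"): 0.3, ("再", "家"): 0.01}


def log_prob(prev, cur):
    return math.log(bigrams.get((prev, cur), 1e-6))


candidates = [["我"], ["再", "在"], ["家"]]
print("".join(viterbi(candidates, log_prob)))
```

With these toy numbers the decoder prefers 在 over 再 at the second position, because the path through 在 has the higher total log-probability.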

Description

Technical field

[0001] The invention relates to the technical field of text error correction, in particular to a Chinese text error correction method based on the same or similar pinyin.

Background

[0002] Text error correction is applicable to many fields. In manual typing assistance, it can automatically check for and flag typos after the user enters text, which reduces errors caused by negligence and effectively improves the efficiency and quality of user input. In search error correction, users of search interfaces such as e-commerce sites and search engines often make input errors; by analyzing the form and characteristics of search terms, the system can automatically correct them and prompt the user, and then provide search results that better meet the user's needs, effectively shielding the user's real intent from the impact of typos. In speech recognition or robot dialogue: embed text error correction into the dialogue ...


Application Information

IPC(8): G06F40/232, G06F40/289
CPC: G06F40/232, G06F40/289
Inventor: 何卓威
Owner: 杭州云嘉云计算有限公司