A Chinese text error correction method based on the same or similar pinyin

A pinyin-based text error correction technique, applied in the field of text error correction. It addresses the low word-granularity accuracy, long processing time and inconvenient use of existing methods, and achieves high accuracy, fast error correction and convenient use.

Active Publication Date: 2021-07-27
杭州云嘉云计算有限公司

AI Technical Summary

Problems solved by technology

[0003] Building a confusion set takes considerable time and manual maintenance, which makes it costly and inconvenient to use, and traditional methods have low accuracy. To address these problems, the present invention proposes a Chinese text error correction method based on the same or similar pinyin. It establishes a Chinese character structure language model whose granularity is a single Chinese character, uses the confusion set and the MAD algorithm to detect errors in the candidate sequence, and uses the double-choice Viterbi algorithm to decode and output the error correction result.
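The excerpt does not expand "MAD". A minimal sketch, assuming it denotes median absolute deviation applied to per-character language-model scores, might look like the following. The function name, the threshold value and the sample scores are illustrative assumptions and are not taken from the patent; how the confusion set is combined with this check is not shown.

```python
from statistics import median

def mad_suspects(scores, threshold=2.0):
    """Return indices whose score is unusually low relative to the median.

    scores: per-character language-model scores (higher = more plausible).
    threshold: assumed cut-off on the modified z-score; not from the source.
    """
    med = median(scores)
    abs_dev = [abs(s - med) for s in scores]
    mad = median(abs_dev)
    if mad == 0:
        # At least half of the scores equal the median, so the ratio
        # below is undefined; report nothing in that case.
        return []
    suspects = []
    for i, s in enumerate(scores):
        # 0.6745 rescales MAD so it is comparable to a standard deviation.
        z = 0.6745 * (s - med) / mad
        if z < -threshold:
            suspects.append(i)
    return suspects

# Hypothetical log-scores for a six-character sentence.
toy_scores = [-2.1, -2.3, -1.9, -7.5, -2.0, -2.2]
suspect_positions = mad_suspects(toy_scores)
```

Only the fourth position stands out in the toy data, so it alone would be passed on as a suspected error. Using the median rather than the mean keeps a single very bad score from masking itself by shifting the baseline.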



Examples


Embodiment

[0052] This embodiment proposes a Chinese text error correction method based on the same or similar pinyin. Referring to Figure 1, the method includes the following steps:

[0053] S1: adjust the traditional n-gram language model to establish a Chinese character structure language model whose granularity is a single Chinese character;

[0054] Building the model involves the following steps: collect a text corpus, segment the text into words, convert the words into character structures, generate statistical count files, and then generate the language model. In this way the character-granularity language model retains the advantages of a word-granularity language model while remaining convenient for character-by-character error detection of sentences.
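As a rough illustration of the counting step, the sketch below builds unigram and bigram counts over single-character tokens and derives an add-one-smoothed bigram probability. The smoothing choice, the boundary markers and the toy corpus are assumptions made for the example; the excerpt does not specify them, and plain characters stand in for the structure tokens the patent describes.

```python
from collections import Counter

BOS, EOS = "<s>", "</s>"  # assumed sentence-boundary markers

def count_ngrams(sentences):
    """Count unigrams and bigrams over lists of character-level tokens."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = [BOS] + list(tokens) + [EOS]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(prev, cur, unigrams, bigrams):
    """Add-one smoothed P(cur | prev); the smoothing is an assumption."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)

# Hypothetical two-sentence corpus, already split into characters.
corpus = [list("我们去北京"), list("我们去上海")]
uni, bi = count_ngrams(corpus)
p_seen = bigram_prob("我", "们", uni, bi)
p_unseen = bigram_prob("我", "京", uni, bi)
```

A bigram that occurs in the corpus receives a higher probability than one that never does, which is the signal a character-by-character error detector can build on.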

[0055] Character structure description: each unit is composed of a Chinese character + pinyin + a word position number. There are 6 types of position numbers: a single-character word is numbered s, the characters of a two-character word are numbered b2 and e2, and words with three or more characters...
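The structure described here can be sketched as follows, reading b2 and e2 as the first and second character of a two-character word. The pinyin table, the "/" separator and the function name are hypothetical. Because the text is truncated before the position numbers for longer words are given, the sketch deliberately refuses such words rather than guessing.

```python
# Toy pinyin table; a real system would use a full dictionary (assumption).
TOY_PINYIN = {"我": "wo", "去": "qu", "北": "bei", "京": "jing"}

def to_structure_tokens(words, pinyin=TOY_PINYIN, sep="/"):
    """Map segmented words to character + pinyin + position-number tokens."""
    tokens = []
    for word in words:
        if len(word) == 1:
            tags = ["s"]          # single-character word
        elif len(word) == 2:
            tags = ["b2", "e2"]   # two-character word
        else:
            # The source is truncated before the tags for longer words.
            raise NotImplementedError(
                "position numbers for 3+ characters are elided in the source"
            )
        for ch, tag in zip(word, tags):
            tokens.append(sep.join((ch, pinyin[ch], tag)))
    return tokens

# Hypothetical segmented sentence: 我 / 去 / 北京
structure_tokens = to_structure_tokens(["我", "去", "北京"])
```

Each token carries the character, its pinyin and its position inside the word, so a model trained on these tokens works one character at a time while still seeing word-boundary information.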


Abstract

The present invention proposes a Chinese text error correction method based on the same or similar pinyin, comprising the following steps: S1, adjusting the traditional n-gram language model to establish a Chinese character structure language model whose granularity is a single Chinese character; S2, performing candidate processing on the sentence to be corrected to generate candidate sequences; S3, performing error detection on the candidate sequences based on a confusion set and the MAD algorithm to obtain the candidate sequences of the sentence to be corrected; S4, based on the maximum posterior probability of the Chinese character structure language model, decoding with the double-choice Viterbi algorithm and outputting the error correction result. Compared with the traditional method, the present invention achieves higher word-granularity accuracy and faster error correction.
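Steps S2 and S4 can be illustrated with a small sketch that generates same-pinyin candidates for each character and picks the best path with a standard Viterbi search. This is plain Viterbi, not the double-choice variant, which the excerpt does not describe. The candidate table, the bigram log-probabilities and the back-off score are toy assumptions, and similar-pinyin candidates are not covered.

```python
# Toy data (assumptions): characters grouped by identical toneless pinyin,
# and a handful of bigram log-probabilities.
SAME_PINYIN = {"在": ["在", "再"], "再": ["再", "在"], "见": ["见", "件"]}
BIGRAM_LOGP = {
    ("<s>", "再"): -1.0, ("<s>", "在"): -1.2,
    ("再", "见"): -0.5, ("在", "见"): -4.0,
    ("再", "件"): -5.0, ("在", "件"): -5.0,
}
UNSEEN = -8.0  # assumed back-off score for unseen bigrams

def candidates(ch):
    """Same-pinyin candidates; unknown characters map to themselves."""
    return SAME_PINYIN.get(ch, [ch])

def viterbi_correct(sentence):
    """Pick the highest-scoring candidate path with standard Viterbi."""
    # best[c] = (log-score of the best path ending in c, that path)
    best = {"<s>": (0.0, [])}
    for ch in sentence:
        nxt = {}
        for cand in candidates(ch):
            nxt[cand] = max(
                (score + BIGRAM_LOGP.get((prev, cand), UNSEEN), path + [cand])
                for prev, (score, path) in best.items()
            )
        best = nxt
    return "".join(max(best.values())[1])

# Hypothetical input with a same-pinyin typo ("在见" for "再见").
corrected = viterbi_correct("在见")
```

With the toy scores, the path 再见 outscores every alternative, so the misspelled input is rewritten to it. Keeping only the best path per candidate at each position is what keeps the search linear in sentence length.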

Description

Technical field

[0001] The invention relates to the technical field of text error correction, in particular to a Chinese text error correction method based on the same or similar pinyin.

Background

[0002] Text error correction is applicable to many fields. In manual typing assistance, it can automatically check for typos and prompt the user after input, which reduces errors caused by negligence and effectively improves the efficiency and quality of user input. In search error correction, users of search interfaces such as e-commerce sites and search engines often make input errors; by analyzing the form and characteristics of search terms, the system can automatically correct them and prompt the user, and then provide search results that better meet the user's needs, effectively shielding the user's real needs from the impact of typos. In speech recognition or robot dialogue: embed text error correction into the dialogue ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/232, G06F40/289
CPC: G06F40/232, G06F40/289
Inventor: 何卓威
Owner: 杭州云嘉云计算有限公司