Chinese error correction method and system based on pinyin feature representation

A Chinese error correction method, applied in the field of data processing. It addresses the problems that existing models ignore the pinyin relationship between correct Chinese characters and typos and that their prediction accuracy is low, and it achieves the effect of improving accuracy and efficiency.

Active Publication Date: 2021-06-15
灯塔财经信息有限公司

AI Technical Summary

Problems solved by technology

However, the above-mentioned models focus on enhancing or processing the semantics of Chinese characters and pay no attention to the connection between correct Chinese characters and typos in pinyin input. As a result, their prediction accuracy remains low when correcting typos that are strongly related to pinyin.

Examples

Embodiment 1

[0050] The present embodiment provides a Chinese error correction method based on pinyin feature representation, which includes the following steps:

[0051] S1. Constructing a pinyin fuzzy set of Chinese characters and constructing Chinese sentence training samples containing Chinese typos;

[0052] The fuzzy set corresponding to the pinyin of each Chinese character includes: all pinyin combinations of the fuzzy initials corresponding to the pinyin's initial and the fuzzy finals corresponding to the pinyin's final; and/or pinyin that is similar in pronunciation and has an edit distance of less than 2. Here, "fuzzy" refers to confusion caused by failing to distinguish front nasal from back nasal finals, and/or flat-tongue from retroflex (curled-tongue) initials, and/or voiced from unvoiced sounds, and/or lateral from nasal sounds; for example, "cai chai ca", "ban bang ba", "chang chan can ...
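
The fuzzy-set construction described in [0052] can be sketched in Python as follows. This is only an illustration under stated assumptions: the confusion tables `FUZZY_INITIALS` and `FUZZY_FINALS`, the helper names, and the toy syllable inventory are hypothetical and are not taken from the patent, which names the confusion categories but does not list the exact tables in this excerpt. The voiced/unvoiced category and the "similar pronunciation" condition are left out because their definitions are not given here; only the edit-distance condition is applied.

```python
from itertools import product

# Assumed confusion groups (hypothetical tables, not from the patent text).
FUZZY_INITIALS = [{"z", "zh"}, {"c", "ch"}, {"s", "sh"}, {"l", "n"}]
FUZZY_FINALS = [{"an", "ang"}, {"en", "eng"}, {"in", "ing"}]

# Pinyin initials, two-letter ones first so "ch" is matched before "c".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l", "g",
            "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_pinyin(pinyin):
    """Split a toneless syllable into (initial, final)."""
    for initial in INITIALS:
        if pinyin.startswith(initial):
            return initial, pinyin[len(initial):]
    return "", pinyin  # zero-initial syllable such as "an"


def variants(part, groups):
    """Return the part together with everything it is confused with."""
    for group in groups:
        if part in group:
            return sorted(group)
    return [part]


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fuzzy_set(pinyin, valid_syllables):
    """Fuzzy initial x fuzzy final combinations, plus syllables with edit distance < 2."""
    initial, final = split_pinyin(pinyin)
    combos = {i + f for i, f in product(variants(initial, FUZZY_INITIALS),
                                        variants(final, FUZZY_FINALS))}
    near = {s for s in valid_syllables if edit_distance(pinyin, s) < 2}
    # Keep only real syllables and drop the query syllable itself.
    return ((combos | near) & set(valid_syllables)) - {pinyin}
```

With a toy inventory containing "chang", "chan", "can" and "cang", `fuzzy_set("chang", ...)` returns the other three syllables, which matches the "chang chan can" pattern in the example above. Excluding the query syllable from its own set is a design choice of this sketch.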

Embodiment 2

[0089] This embodiment provides a Chinese error correction system for implementing the Chinese error correction method described in Embodiment 1 above. As shown in Figure 3, it includes:

[0090] Pinyin fuzzy set construction unit 1, which is used to store the fuzzy set corresponding to each Chinese character pinyin;

[0091] Training sample construction unit 2, which is used to obtain a number of training corpora corresponding to correct Chinese sentences, where, within these training corpora, each Chinese character in the correct Chinese sentence has corresponding typos; specifically, for the method by which the training sample construction unit 2 obtains training samples, refer to steps S11-S14 of Embodiment 1;

[0092] Sample training unit 3, which stores a training model for performing sample training on the above-mentioned training samples; specifically, for the method by which the sample training unit 3 performs sample training on the above-mentioned training corpus, refer to step S2 of Em...
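
As a rough illustration of what training sample construction unit 2 produces, the Python sketch below pairs a correct sentence with a noisy copy in which some characters are replaced by characters whose pinyin is identical or belongs to the same fuzzy set. Steps S11-S14 are not reproduced in this excerpt, so this is not the patented procedure: the tables `CHAR_TO_PINYIN` and `FUZZY`, the function names, and the `error_rate` parameter are all hypothetical.

```python
import random

# Toy lookup tables (hypothetical, for illustration only).
CHAR_TO_PINYIN = {"才": "cai", "财": "cai", "柴": "chai", "班": "ban", "帮": "bang"}
FUZZY = {"cai": {"chai"}, "chai": {"cai"}, "ban": {"bang"}, "bang": {"ban"}}


def candidates(ch):
    """Characters whose pinyin equals, or is fuzzy-confusable with, that of `ch`."""
    pinyin = CHAR_TO_PINYIN.get(ch)
    if pinyin is None:
        return []
    allowed = {pinyin} | FUZZY.get(pinyin, set())
    return sorted(c for c, p in CHAR_TO_PINYIN.items() if p in allowed and c != ch)


def make_sample(sentence, rng, error_rate=0.3):
    """Return (noisy_sentence, correct_sentence) with position-aligned characters."""
    noisy = []
    for ch in sentence:
        cands = candidates(ch)
        if cands and rng.random() < error_rate:
            noisy.append(rng.choice(cands))  # inject a pinyin-related typo
        else:
            noisy.append(ch)                 # keep the correct character
    return "".join(noisy), sentence


if __name__ == "__main__":
    print(make_sample("他很有才", random.Random(0), error_rate=1.0))
```

Because substitution is one character for one character, the noisy and correct sentences stay the same length, so the correct sentence can serve directly as a per-position label.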

Abstract

The invention provides a Chinese error correction method and system based on pinyin feature representation. The method comprises the following steps: S1, constructing a pinyin fuzzy set of Chinese characters and constructing Chinese sentence training samples containing typos; S2, performing model training using the training samples; and S3, extracting a Chinese character embedding sequence and a pinyin character embedding sequence for the Chinese characters in the target Chinese sentence, inputting both sequences into the trained model to obtain a Chinese character prediction for each position in the target sentence, and finally obtaining the corrected Chinese sentence. In this method, the pinyin fuzzy set is obtained from the mapping between correct Chinese characters and typos with pinyin as the medium, and the training model is built on a mixed attention module, which improves the learning efficiency and the prediction accuracy for typos.
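
The input side of step S3 can be pictured with the small Python sketch below, which turns a sentence into two position-aligned ID sequences, one for the Chinese characters and one for their pinyin, ready for embedding lookup. It is a minimal sketch under assumptions: the vocabularies, the `[UNK]` convention, and the function names are hypothetical, the pinyin is treated as one token per character, and the mixed attention module that consumes the two embedding sequences is not shown because its details are not given in this excerpt.

```python
# Toy character-to-pinyin table (hypothetical, for illustration only).
CHAR_TO_PINYIN = {"才": "cai", "财": "cai", "有": "you"}


def build_vocab(tokens):
    """Assign consecutive IDs to tokens, reserving 0 for unknown tokens."""
    vocab = {"[UNK]": 0}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


HANZI_VOCAB = build_vocab(CHAR_TO_PINYIN)            # one ID per character
PINYIN_VOCAB = build_vocab(CHAR_TO_PINYIN.values())  # one ID per syllable


def encode(sentence):
    """Return (hanzi_ids, pinyin_ids); index i in both lists refers to character i."""
    hanzi_ids = [HANZI_VOCAB.get(ch, 0) for ch in sentence]
    pinyin_ids = [PINYIN_VOCAB.get(CHAR_TO_PINYIN.get(ch, "[UNK]"), 0)
                  for ch in sentence]
    return hanzi_ids, pinyin_ids


if __name__ == "__main__":
    print(encode("有才财"))
```

In this toy setup the homophones 才 and 财 receive different character IDs but the same pinyin ID, which is the kind of shared signal a pinyin-aware model could use to relate a typo to its correct character.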

Description

Technical field

[0001] The invention relates to the field of data processing, in particular to a Chinese error correction method and system based on pinyin feature representation.

Background

[0002] Chinese character error correction has long been an active topic in natural language processing research in China. Since deep learning enables a model to learn effective language knowledge automatically, newly proposed deep learning methods have in recent years generally surpassed methods based on traditional machine learning on this problem. At this stage, methods based on the BERT (Bidirectional Encoder Representations from Transformers) model have reached a new level of effectiveness. The advantage of this approach is that its pre-training phase enables the language model to learn very effective language knowledge.

[0003] Treating a sentence as a sequence of Chinese characters, using language knowledge to correct a typo is actually to...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/232, G06N3/04, G06N3/08
CPC: G06F40/232, G06N3/084, G06N3/047
Inventor: 许振兴曾庆斌庞洵朱留锋
Owner: 灯塔财经信息有限公司