
A Chinese error correction method and system based on pinyin feature representation

A Chinese error correction method and technology, applied in the field of data processing. It addresses two problems: existing models ignore the connection between correct Chinese characters and typos, and their prediction accuracy is low.

Active Publication Date: 2021-09-14
灯塔财经信息有限公司
Cites: 5 | Cited by: 0

AI Technical Summary

Problems solved by technology

However, the above-mentioned models focus on enhancing or processing the semantics of Chinese characters. They ignore the connection, under pinyin input, between correct Chinese characters and typos. As a result, these models still have low prediction accuracy when correcting typos that are strongly related to pinyin.



Examples


Embodiment 1

[0050] This embodiment provides a Chinese error correction method based on pinyin feature representation, comprising the following steps:

[0051] S1: construct a pinyin fuzzy set for Chinese characters, and construct Chinese sentence training samples containing typos;

[0052] The fuzzy set corresponding to each pinyin comprises:

- all pinyin syllables formed by combining the fuzzy initials of the pinyin's initial with the fuzzy finals of its final; and/or
- pinyin syllables that sound similar and whose edit distance from the pinyin is less than 2.

Here, "fuzzy" refers to confusion among speakers from regions that do not distinguish certain sounds:

- front and back nasal finals, and/or
- flat-tongue and retroflex initials, and/or
- voiced and unvoiced sounds, and/or
- nasal and lateral initials.

Examples include "cai chai ca", "ban bang ba", "chang chan can cang", and "lang nang lan nan rang".
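The following is a minimal Python sketch of how such a fuzzy set could be built. It is not the patent's implementation and rests on several assumptions:

- Syllables are toneless and drawn from a supplied vocabulary.
- The confusion groups are inferred from the examples above: c/ch, z/zh, s/sh, and l/n/r for initials; an/ang, en/eng, and in/ing for finals.
- The group tables, the `INITIALS` list, and the functions `split_pinyin`, `edit_distance`, and `fuzzy_set` are hypothetical names.
- Candidates are kept only if they appear in the vocabulary.

```python
from itertools import product

# Hypothetical confusion groups inferred from the examples in [0052];
# the patent's actual groups are not given in this excerpt.
INITIAL_GROUPS = [{"c", "ch"}, {"z", "zh"}, {"s", "sh"}, {"l", "n", "r"}]
FINAL_GROUPS = [{"an", "ang"}, {"en", "eng"}, {"in", "ing"}]

# Pinyin initials, two-letter ones first so "zh" is matched before "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_pinyin(syllable):
    """Split a toneless syllable into (initial, final); the initial may be ''."""
    for ini in INITIALS:
        if syllable.startswith(ini):
            return ini, syllable[len(ini):]
    return "", syllable


def variants(unit, groups):
    """Return the unit plus every member of any confusion group containing it."""
    out = {unit}
    for group in groups:
        if unit in group:
            out |= group
    return out


def edit_distance(a, b):
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def fuzzy_set(syllable, vocabulary):
    """Union of (fuzzy initial x fuzzy final) recombinations and vocabulary
    syllables within edit distance < 2, restricted to valid syllables."""
    ini, fin = split_pinyin(syllable)
    combos = {i + f for i, f in product(variants(ini, INITIAL_GROUPS),
                                        variants(fin, FINAL_GROUPS))}
    near = {v for v in vocabulary if edit_distance(syllable, v) < 2}
    return (combos | near) & set(vocabulary)
```

Take `vocab = ["chang", "chan", "can", "cang", "cai", "chai", "ca", "ban", "bang", "ba"]`. Then `fuzzy_set("chang", vocab)` returns `{"chang", "chan", "can", "cang"}`, which matches the "chang chan can cang" example. The edit-distance rule also admits "can" for "cai", so the two rules together can be broader than accent-based confusion alone.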

[0053] Pinyin follows regular patterns; after the...
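Step S1 also turns correct sentences into training samples that contain typos. The concrete procedure (steps S11 to S14) is elided in this excerpt, so this Python sketch is only an assumption. It does the following:

- picks random positions in a correct sentence;
- replaces each chosen character with another character whose pinyin lies in the original character's fuzzy set;
- records the pinyin of every character in the noisy sentence.

The toy lexicon `CHAR_TO_PINYIN`, the hard-coded `FUZZY_SETS`, and the functions `confusable_chars` and `make_training_sample` are hypothetical.

```python
import random

# Hypothetical toy lexicon; a real system would use a full pinyin dictionary.
CHAR_TO_PINYIN = {"我": "wo", "在": "zai", "再": "zai", "长": "chang",
                  "常": "chang", "藏": "cang", "蚕": "can", "城": "cheng",
                  "成": "cheng", "陈": "chen"}

# Reverse index: pinyin -> characters with that pinyin.
PINYIN_TO_CHARS = {}
for _char, _py in CHAR_TO_PINYIN.items():
    PINYIN_TO_CHARS.setdefault(_py, []).append(_char)

# Hypothetical precomputed fuzzy sets (step S1) for the syllables used here.
FUZZY_SETS = {"wo": {"wo"}, "zai": {"zai", "zhai"},
              "chang": {"chang", "chan", "cang", "can"},
              "cheng": {"cheng", "chen", "ceng"}}


def confusable_chars(char):
    """Other characters whose pinyin lies in the fuzzy set of char's pinyin."""
    py = CHAR_TO_PINYIN.get(char)
    if py is None:
        return []
    return sorted(c for p in FUZZY_SETS.get(py, {py})
                  for c in PINYIN_TO_CHARS.get(p, []) if c != char)


def make_training_sample(correct, rng, n_typos=1):
    """Corrupt up to n_typos positions of a correct sentence.

    Returns (noisy sentence, pinyin of each noisy character, correct sentence).
    """
    chars = list(correct)
    candidates = [i for i, c in enumerate(chars) if confusable_chars(c)]
    for i in rng.sample(candidates, min(n_typos, len(candidates))):
        chars[i] = rng.choice(confusable_chars(chars[i]))
    pinyin = [CHAR_TO_PINYIN.get(c, "") for c in chars]
    return "".join(chars), pinyin, correct
```

For example, `make_training_sample("我在长城", random.Random(0))` replaces exactly one of 在, 长, or 城 with a character of confusable pinyin. It leaves 我 untouched, because no other character in the toy lexicon shares its fuzzy set. Each sample pairs the noisy input with its pinyin and the correct target sentence.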

Embodiment 2

[0089] This embodiment provides a Chinese error correction system for implementing the Chinese error correction method of Embodiment 1. As shown in Figure 3, the system comprises:

[0090] a pinyin fuzzy set construction unit 1, for storing the fuzzy set corresponding to each pinyin;

[0091] a training sample construction unit 2, for obtaining a plurality of training samples corresponding to a correct Chinese sentence. Each Chinese character in the correct sentence and in the training samples has a corresponding pinyin. Unit 2 obtains the training samples by steps S11 to S14 of Embodiment 1;

[0092] a sample training unit 3, which stores a training model and trains it on the above training samples. Unit 3 trains on the samples by step S2 of Embodiment 1;

[0093] and a sentence prediction unit 4, connected to the sample t...
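The training objective that sample training unit 3 uses in step S2 is not given in this excerpt. This Python sketch assumes one common choice for character-level correction models; it is not taken from the patent. The loss is cross-entropy averaged over every position of the sentence, so the model learns both to keep correct characters and to replace typos. The names `log_sum_exp`, `position_cross_entropy`, `logits_per_position`, and `target_ids` are hypothetical.

```python
import math


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def position_cross_entropy(logits_per_position, target_ids):
    """Mean negative log-likelihood of the gold character at each position.

    logits_per_position: one list of vocabulary scores per sentence position.
    target_ids: index of the correct character at each position.
    """
    if not target_ids or len(logits_per_position) != len(target_ids):
        raise ValueError("need one target per position and at least one position")
    losses = [log_sum_exp(logits) - logits[gold]
              for logits, gold in zip(logits_per_position, target_ids)]
    return sum(losses) / len(losses)
```

Suppose a two-position sentence has uniform scores over a four-character vocabulary. Then `position_cross_entropy([[0.0] * 4, [0.0] * 4], [1, 3])` equals ln 4 (about 1.386), the loss of a model that has learned nothing yet.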



Abstract

The present invention proposes a Chinese error correction method and system based on pinyin feature representation, comprising the following steps:

S1: construct a pinyin fuzzy set for Chinese characters, and construct Chinese sentence training samples that contain typos.

S2: train a model on these training samples.

S3: extract the Chinese character embedding sequence and the pinyin embedding sequence of the characters in a target Chinese sentence. Input them into the trained model to obtain a predicted Chinese character for each position, which yields the error-corrected sentence.

The invention builds the pinyin fuzzy set from the mapping between correct Chinese characters and typos, with pinyin as the intermediary. It bases the training model on a mixed attention module. Together, these improve learning efficiency and the accuracy of typo prediction.
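The data flow of step S3 can be illustrated with a small, untrained Python model. This is an assumption-laden stand-in, because the patent's mixed attention module is not described in this excerpt. The sketch works as follows:

- Character and pinyin embeddings are summed at each position.
- The result passes through one head of scaled dot-product self-attention with a residual connection.
- Each position is scored against the character vocabulary, and the highest-scoring character is chosen.

The class `ToyPinyinCorrector`, its parameters, and the helpers `matvec` and `softmax` are hypothetical. With random weights the predictions carry no meaning; only the shapes and the flow are illustrative.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ToyPinyinCorrector:
    """Untrained toy: fused char+pinyin embeddings -> self-attention -> one char per position."""

    def __init__(self, chars, pinyins, dim=8, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.chars = list(chars)

        def vec():
            return [rng.gauss(0.0, 1.0) for _ in range(dim)]

        def mat():
            return [[rng.gauss(0.0, dim ** -0.5) for _ in range(dim)] for _ in range(dim)]

        self.char_emb = {c: vec() for c in self.chars}
        self.pinyin_emb = {p: vec() for p in pinyins}
        self.wq, self.wk, self.wv = mat(), mat(), mat()

    def predict(self, sentence, pinyin):
        if len(sentence) != len(pinyin):
            raise ValueError("need exactly one pinyin per character")
        # Fuse the character and pinyin embedding sequences position by position.
        x = [[a + b for a, b in zip(self.char_emb[c], self.pinyin_emb[p])]
             for c, p in zip(sentence, pinyin)]
        q = [matvec(self.wq, xi) for xi in x]
        k = [matvec(self.wk, xi) for xi in x]
        vals = [matvec(self.wv, xi) for xi in x]
        hidden = []
        for xi, qi in zip(x, q):
            scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(self.dim) for kj in k]
            weights = softmax(scores)
            context = [sum(w * vj[d] for w, vj in zip(weights, vals))
                       for d in range(self.dim)]
            hidden.append([a + b for a, b in zip(xi, context)])  # residual connection
        # Score each position against every character embedding (tied output weights).
        out = []
        for h in hidden:
            logits = [sum(a * b for a, b in zip(h, self.char_emb[c])) for c in self.chars]
            best = max(range(len(self.chars)), key=lambda i: logits[i])
            out.append(self.chars[best])
        return "".join(out)
```

For example, build `model = ToyPinyinCorrector("我在再长常城成", ["wo", "zai", "chang", "cheng"])` and call `model.predict("我再长城", ["wo", "zai", "chang", "cheng"])`. The call returns one character per input position. In a trained system, positions where the prediction differs from the input would be the corrected typos.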

Description

Technical field

[0001] The present invention relates to data processing, and more particularly to a Chinese error correction method and system based on pinyin feature representation.

Background

[0002] Chinese character error correction has long been a research hotspot in natural language processing. Deep learning models can automatically learn effective linguistic knowledge, so the deep-learning methods proposed for this problem in recent years outperform traditional machine learning methods overall. Today, methods based on BERT (Bidirectional Encoder Representations from Transformers) have reached a new level of performance. Their advantage lies in the pre-training phase, in which the language model learns highly effective linguistic knowledge.

[0003] If a sentence is regarded as a sequence of Chinese characters, then using linguistic knowledge to correct a typo in fact amounts to establishing the mapping between the cor...


Application Information

Patent Type & Authority: Patents (China)
IPC (8th edition): G06F40/232; G06N3/04; G06N3/08
CPC: G06F40/232; G06N3/084; G06N3/047
Inventors: 许振兴, 曾庆斌, 庞洵, 朱留锋
Owner: 灯塔财经信息有限公司