A Chinese error correction method and system based on pinyin feature representation
A Chinese error correction method, applied in the field of data processing, that addresses two problems: existing models pay no attention to the pinyin relationship between correct Chinese characters and typos, and their prediction accuracy is therefore low.
Active Publication Date: 2021-09-14
灯塔财经信息有限公司
Problems solved by technology
However, the above-mentioned models focus on enhancing or processing the semantics of Chinese characters and pay no attention to the connection, in pinyin input, between correct Chinese characters and typos. As a result, their prediction accuracy remains low when correcting typos that are strongly related to pinyin.
Embodiment 1
[0050] This embodiment provides a Chinese error correction method based on pinyin feature representation, comprising the following steps:
[0051] S1, constructing pinyin fuzzy sets of Chinese characters and constructing Chinese sentence training samples containing typos;
[0052] Wherein the fuzzy set corresponding to each pinyin comprises: all pinyin formed by combining the fuzzy initials corresponding to the pinyin's initial (consonant) with the fuzzy finals corresponding to its final (vowel); and/or pinyin that is pronounced similarly to the pinyin and whose edit distance from it is less than 2. Here "fuzzy" refers to the confusion that arises in regions that do not distinguish front and back nasal finals, and/or regions that do not distinguish flat-tongue and retroflex initials, and/or regions that do not distinguish voiced and unvoiced sounds, and/or regions where lateral and nasal sounds cannot be told apart; for example, "cai chai ca", "ban bang ba", "chang chan can cang", "lang nang lan nan rang" and so on.
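The fuzzy-set definition in [0052] can be illustrated with a minimal Python sketch. The confusion groups in `FUZZY_INITIALS` and `FUZZY_FINALS`, the `INITIALS` table, and all function names are assumptions made for illustration; the text above does not enumerate them. The sketch also reads "and/or" as a union and approximates "similar pronunciation" by edit distance alone, which is looser than the patent's wording.

```python
from itertools import product

# Hypothetical confusion groups (assumption): flat/retroflex initials,
# l/n/r initials, and front/back nasal finals.
FUZZY_INITIALS = [{"z", "zh"}, {"c", "ch"}, {"s", "sh"}, {"l", "n", "r"}]
FUZZY_FINALS = [{"an", "ang"}, {"en", "eng"}, {"in", "ing"}]

# Two-letter initials come first so they are matched before one-letter ones.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l", "g",
            "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_pinyin(pinyin):
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if pinyin.startswith(initial):
            return initial, pinyin[len(initial):]
    return "", pinyin  # zero-initial syllable such as "an"


def fuzzy_variants(part, groups):
    """Return `part` together with every member of a group that contains it."""
    variants = {part}
    for group in groups:
        if part in group:
            variants |= group
    return variants


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_set(pinyin, valid_syllables):
    """Union of fuzzy initial/final combinations and near-edit-distance pinyin."""
    initial, final = split_pinyin(pinyin)
    combos = {i + f for i, f in product(fuzzy_variants(initial, FUZZY_INITIALS),
                                        fuzzy_variants(final, FUZZY_FINALS))}
    near = {s for s in valid_syllables if edit_distance(pinyin, s) < 2}
    # Keep only strings that are real syllables in the supplied inventory.
    return (combos | near) & set(valid_syllables)


valid = ["cai", "chai", "ca", "can", "cang", "chan", "chang", "ban"]
print(sorted(fuzzy_set("chang", valid)))  # ['can', 'cang', 'chan', 'chang']
```

With this toy syllable inventory, the result for "chang" matches the "chang chan can cang" group quoted in [0052]. A fuller version would need a complete syllable table and a real pronunciation-similarity test alongside the edit-distance filter.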
[0089] This embodiment provides a Chinese error correction system for implementing the Chinese error correction method of Embodiment 1. As shown in Figure 3, it comprises:
[0090] A pinyin fuzzy set construction unit 1, for storing the fuzzy set corresponding to each pinyin;
[0091] A training sample construction unit 2, for obtaining a plurality of correct Chinese sentences as training data, in which each Chinese character of each correct sentence has a corresponding pinyin; specifically, for the method by which the training sample construction unit 2 obtains the training samples, see steps S11-S14 of Embodiment 1;
[0092] A sample training unit 3, which stores a training model and is used to train on the above training samples; specifically, for the method by which the sample training unit 3 trains on the training samples, see step S2 of Embodiment 1;
[0093] And a sentence prediction unit 4, connected to the sample t...
Abstract
The present invention proposes a Chinese error correction method and system based on pinyin feature representation, which includes the following steps: S1, constructing pinyin fuzzy sets of Chinese characters and constructing Chinese sentence training samples containing typos; S2, using the above training samples to train a model; and S3, extracting the Chinese character embedding sequence and the pinyin character embedding sequence of the Chinese characters in the target Chinese sentence, inputting them into the trained model to obtain the predicted Chinese character for each position in the target Chinese sentence, and finally obtaining the error-corrected Chinese sentence. The invention obtains pinyin fuzzy sets through the mapping between correct Chinese characters and typos, using pinyin as the medium, and establishes a training model based on a mixed attention module, thereby improving learning efficiency and the accuracy of typo prediction.
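One way to picture the sample construction in step S1 is sketched below in Python. It is an illustration under stated assumptions, not the patent's procedure: steps S11-S14 are not reproduced on this page, so the character-substitution strategy, the toy lexicon `PINYIN_TO_CHARS`, the `FUZZY_SETS` table, and both function names are hypothetical.

```python
import random

# Hypothetical toy lexicon (assumption): toneless pinyin -> characters read that way.
PINYIN_TO_CHARS = {
    "chang": ["长", "常"],
    "chan": ["产", "缠"],
    "cang": ["藏", "仓"],
}
# Hypothetical fuzzy sets (assumption), e.g. the output of a fuzzy-set construction step.
FUZZY_SETS = {"chang": {"chang", "chan", "cang"}}


def confusable_chars(char, pinyin):
    """Characters whose pinyin lies in the fuzzy set of `pinyin`, excluding `char`."""
    candidates = []
    for fuzzy_pinyin in sorted(FUZZY_SETS.get(pinyin, {pinyin})):
        candidates.extend(PINYIN_TO_CHARS.get(fuzzy_pinyin, []))
    return [c for c in candidates if c != char]


def make_typo_sample(sentence, pinyins, rate, rng):
    """Return (corrupted, correct): some characters replaced by confusable ones."""
    corrupted = []
    for char, pinyin in zip(sentence, pinyins):
        options = confusable_chars(char, pinyin)
        # Corrupt a character only if a confusable candidate exists.
        if options and rng.random() < rate:
            corrupted.append(rng.choice(options))
        else:
            corrupted.append(char)
    return "".join(corrupted), sentence


rng = random.Random(0)
typo_sentence, correct_sentence = make_typo_sample("经常", ["jing", "chang"], 1.0, rng)
print(typo_sentence, correct_sentence)
```

Each pair keeps the corrupted and correct sentences aligned position by position, which fits the per-position character prediction described in S3. The substitution rate and the choice of replacement characters are free parameters in this sketch.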
Description
Technical field
[0001] The present invention relates to data processing, and more particularly to a Chinese error correction method and system based on pinyin feature representation.
Background art
[0002] Chinese character error correction has long been a hot topic in natural language processing research. Because deep learning models can automatically learn effective knowledge of a language, the deep-learning-based methods proposed for this problem in recent years generally outperform traditional machine learning methods. At present, methods based on BERT (Bidirectional Encoder Representations from Transformers) have reached a new level of effectiveness; their advantage is that the pre-training phase lets the language model learn very effective language knowledge.
[0003] If a sentence is regarded as a sequence of Chinese characters, then using language knowledge to correct a typo is in fact establishing the mapping between the cor...