Chinese error correction method and system based on pinyin feature representation
A Chinese error correction technology, applied in the field of data processing, that addresses two problems: existing models do not attend to both the correct Chinese characters and the typos, and their prediction accuracy is low. It has the effect of improving accuracy and efficiency.
Examples
Embodiment 1
[0050] The present embodiment provides a Chinese error correction method based on pinyin feature representation, which includes the following steps:
[0051] S1. Construct a pinyin fuzzy set for Chinese characters, and construct Chinese sentence training samples that contain typos;
[0052] Wherein, the fuzzy set corresponding to the pinyin of each Chinese character includes: all pinyin combinations formed from the fuzzy initials corresponding to the pinyin's initial and the fuzzy finals corresponding to the pinyin's final; and/or, pinyin that is similar in pronunciation and has an edit distance of less than 2 from the original pinyin. Here, "fuzzy" refers to confusion caused by failing to distinguish front nasal from back nasal finals, and/or flat-tongue from retroflex (curled-tongue) initials, and/or voiced from unvoiced sounds, and/or lateral from nasal sounds. For example, "cai chai ca", "ban bang ba", "chang chan can ...
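The fuzzy-set rule in [0052] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. The confusion tables `FUZZY_INITIALS` and `FUZZY_FINALS` are hypothetical and only cover the front/back nasal, flat-tongue/retroflex, and lateral/nasal pairs. Voiced/unvoiced pairs are omitted because the source does not list them. The "similar pronunciation" condition is approximated by the edit-distance test alone, applied over a caller-supplied `vocabulary` of valid syllables.

```python
from itertools import product

# Hypothetical confusion tables (illustrative assumptions, not the patent's tables).
FUZZY_INITIALS = {
    "c": {"c", "ch"}, "ch": {"ch", "c"},   # flat-tongue vs retroflex
    "z": {"z", "zh"}, "zh": {"zh", "z"},
    "s": {"s", "sh"}, "sh": {"sh", "s"},
    "l": {"l", "n"}, "n": {"n", "l"},      # lateral vs nasal
}
FUZZY_FINALS = {
    "an": {"an", "ang"}, "ang": {"ang", "an"},   # front vs back nasal
    "en": {"en", "eng"}, "eng": {"eng", "en"},
    "in": {"in", "ing"}, "ing": {"ing", "in"},
}
# Two-letter initials are listed first so "zh" is matched before "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_pinyin(py):
    """Split a toneless syllable into (initial, final)."""
    for ini in INITIALS:
        if py.startswith(ini):
            return ini, py[len(ini):]
    return "", py  # zero-initial syllable such as "an"


def edit_distance(a, b):
    """Levenshtein distance, computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_set(py, vocabulary):
    """Union of fuzzy initial x fuzzy final combinations and near syllables."""
    ini, fin = split_pinyin(py)
    combos = {i + f for i, f in product(FUZZY_INITIALS.get(ini, {ini}),
                                        FUZZY_FINALS.get(fin, {fin}))}
    near = {v for v in vocabulary if edit_distance(py, v) < 2}
    return combos | near
```

For example, `fuzzy_set("ban", ["ba"])` yields the set `{"ban", "bang", "ba"}`, which matches the "ban bang ba" example above. The initial/final product produces "bang", and the edit-distance test adds "ba".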
Embodiment 2
[0089] This embodiment provides a Chinese error correction system for implementing the Chinese error correction method described in Embodiment 1. As shown in Figure 3, the system includes:
[0090] Pinyin fuzzy set construction unit 1, which is used to store the fuzzy set corresponding to the pinyin of each Chinese character;
[0091] Training sample construction unit 2, which is used to obtain training corpora corresponding to correct Chinese sentences; in some of these corpora, each Chinese character of the correct sentence has corresponding typos. Specifically, for the method by which training sample construction unit 2 obtains training samples, refer to steps S11-S14 of Embodiment 1;
[0092] Sample training unit 3, which stores a training model used to train on the above training samples; specifically, for the method by which sample training unit 3 trains on the training corpus, refer to step S2 of Em...
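The three units in [0090]-[0092] can be wired together roughly as below. This is a hedged architectural sketch, not the patent's system. Steps S11-S14 and S2 are not reproduced here, so the substitution strategy in `corrupt` is an assumption: replace a character, with probability `typo_rate`, by a character whose pinyin lies in its fuzzy set. The training routine is left as a pluggable `train_fn`. All class names, fields, and the sample dictionaries are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class PinyinFuzzySetUnit:
    """Unit 1 (hypothetical): stores the fuzzy set for each pinyin syllable."""
    fuzzy_sets: Dict[str, Set[str]]

    def lookup(self, pinyin: str) -> Set[str]:
        return self.fuzzy_sets.get(pinyin, {pinyin})


@dataclass
class TrainingSampleUnit:
    """Unit 2 (hypothetical): builds (typo sentence, correct sentence) pairs."""
    fuzzy: PinyinFuzzySetUnit
    char_to_pinyin: Dict[str, str]
    pinyin_to_chars: Dict[str, List[str]]
    typo_rate: float = 0.3
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def corrupt(self, sentence: str) -> str:
        out = []
        for ch in sentence:
            py = self.char_to_pinyin.get(ch)
            # Characters without known pinyin (e.g. punctuation) pass through.
            if py is not None and self.rng.random() < self.typo_rate:
                candidates = [c for p in sorted(self.fuzzy.lookup(py))
                              for c in self.pinyin_to_chars.get(p, [])
                              if c != ch]
                if candidates:
                    ch = self.rng.choice(candidates)
            out.append(ch)
        return "".join(out)

    def build(self, sentences: List[str]) -> List[Tuple[str, str]]:
        return [(self.corrupt(s), s) for s in sentences]


@dataclass
class SampleTrainingUnit:
    """Unit 3 (hypothetical): delegates to whatever training step S2 specifies."""
    train_fn: Callable[[List[Tuple[str, str]]], object]

    def train(self, samples: List[Tuple[str, str]]) -> object:
        return self.train_fn(samples)
```

As a usage example, `TrainingSampleUnit(PinyinFuzzySetUnit({"ban": {"ban", "bang", "ba"}}), {"班": "ban"}, {"ban": ["班"], "bang": ["帮"], "ba": ["八"]}, typo_rate=1.0).build(["班"])` returns one pair whose typo side is "帮" or "八" and whose correct side is "班". Keeping the trainer as an injected callable leaves step S2's model choice open.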