
Chinese spelling error correction method and device based on multiple representations and multiple pre-training models

An error correction method using pre-training technology, applied in the field of Chinese error correction. It addresses problems such as the inability of prior approaches to achieve comprehensive error correction, and improves the accuracy of error correction.

Status: pending. Publication date: 2021-11-09
NANJING UNIV OF SCI & TECH +1
Cites: 0; cited by: 4

AI Technical Summary

Problems solved by technology

[0004] This application provides a Chinese spelling error correction method and device based on multiple representations and multiple pre-training models, which can be used to solve the technical problem that the prior art cannot achieve comprehensive error correction.



Examples


Embodiment Construction

[0089] To make the purpose, technical solutions and advantages of the present application clearer, its implementations are described in further detail below with reference to the accompanying drawings.

[0090] To facilitate the description of the embodiments of the present application, the technical terms they involve are explained first.

[0091] Chinese spelling correction: the task of detecting and correcting misspelled words in sentences.

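[0092] Word boundary features: features marking the boundaries between words (phrases) in a Chinese sentence. Because Chinese is written without spaces between words, these boundaries are not explicit in the text.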
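The excerpt does not say how word boundary features are encoded. One common choice, assumed here purely for illustration, is to segment the sentence into words and give each character a B/I/E/S tag (begin, inside, end, single-character word). The function name `boundary_tags` is hypothetical, and the segmentation is supplied by hand; a real system would obtain it from a word segmenter.

```python
from typing import List


def boundary_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to one boundary tag per character.

    Assumed BIES scheme: B = first character of a multi-character word,
    I = inside, E = last character, S = single-character word.
    """
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.append("B")
            tags.extend(["I"] * (len(word) - 2))
            tags.append("E")
    return tags


# Hand-made segmentation of "我们明天去图书馆" ("we go to the library tomorrow").
words = ["我们", "明天", "去", "图书馆"]
print(boundary_tags(words))
# → ['B', 'E', 'B', 'E', 'S', 'B', 'I', 'E']
```

Each tag can then be mapped to an integer id and fed to the model alongside the character itself, giving one boundary feature value per character.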

[0093] BiLSTM (bidirectional long short-term memory) network: a neural network suited to sequential data, composed of a forward LSTM and a backward LSTM.
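To make the forward/backward composition concrete, the following pure-Python sketch runs two independent LSTMs over a sequence, one left to right and one right to left, and concatenates their hidden states at each position. The weights are random and untrained, the class names `LSTM` and `BiLSTM` are hypothetical, and no claim is made that the patent uses this exact cell, size or initialization.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]
GATES = "ifog"  # input, forget, output gates and candidate cell value g


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LSTM:
    """Single-direction LSTM with small random (untrained) weights."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        self.hidden_size = hidden_size

        def rand(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W = {g: rand(hidden_size, input_size) for g in GATES}   # input weights
        self.U = {g: rand(hidden_size, hidden_size) for g in GATES}  # recurrent weights
        self.b = {g: [0.0] * hidden_size for g in GATES}             # biases

    def run(self, xs: List[Vector]) -> List[Vector]:
        h = [0.0] * self.hidden_size  # hidden state
        c = [0.0] * self.hidden_size  # cell state
        outputs = []
        for x in xs:
            pre = {
                g: [wx + uh + b for wx, uh, b in
                    zip(matvec(self.W[g], x), matvec(self.U[g], h), self.b[g])]
                for g in GATES
            }
            i = [sigmoid(z) for z in pre["i"]]
            f = [sigmoid(z) for z in pre["f"]]
            o = [sigmoid(z) for z in pre["o"]]
            g = [math.tanh(z) for z in pre["g"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


class BiLSTM:
    """Forward LSTM + backward LSTM; outputs are concatenated per position."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.forward = LSTM(input_size, hidden_size, rng)
        self.backward = LSTM(input_size, hidden_size, rng)

    def run(self, xs: List[Vector]) -> List[Vector]:
        fwd = self.forward.run(xs)
        bwd = self.backward.run(xs[::-1])[::-1]  # read right to left, then realign
        return [hf + hb for hf, hb in zip(fwd, bwd)]


rng = random.Random(1)
xs = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]  # 5 steps, 4-dim inputs
out = BiLSTM(input_size=4, hidden_size=3).run(xs)
print(len(out), len(out[0]))  # → 5 6
```

Because of the concatenation, the vector at each position carries information from both its left and its right context, which is useful when every character of a sentence must be tagged.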

[0094] Conditional random field (CRF): a discriminative probabilistic model, used to select, from among the many candidate label sequences, the one closest to the true label sequence. ...
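A linear-chain CRF usually picks its best label sequence with Viterbi decoding, a dynamic program over per-position (emission) scores and label-to-label (transition) scores. The sketch below assumes a two-label scheme, C for a correct character and E for an erroneous one; the labels, the function name `viterbi` and all scores are hypothetical, since the excerpt does not specify the label set.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(emissions: Sequence[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            labels: Sequence[str]) -> Tuple[List[str], float]:
    """Return the highest-scoring label sequence and its score, where
    score(y) = sum_t emissions[t][y_t] + sum_{t>0} transitions[(y_{t-1}, y_t)]."""
    # best[t][y]: best score of any label prefix that ends with y at position t
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []  # back-pointers to the best previous label
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        ptrs: Dict[str, str] = {}
        for y in labels:
            prev, s = max(((p, best[-1][p] + transitions[(p, y)]) for p in labels),
                          key=lambda ps: ps[1])
            scores[y] = s + emissions[t][y]
            ptrs[y] = prev
        best.append(scores)
        back.append(ptrs)
    # Trace back from the best final label.
    last = max(labels, key=lambda y: best[-1][y])
    path = [last]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, best[-1][last]


labels = ["C", "E"]
emissions = [{"C": 2.0, "E": 0.0}, {"C": 0.5, "E": 1.0}, {"C": 1.5, "E": 0.0}]
transitions = {("C", "C"): 0.5, ("C", "E"): -0.5, ("E", "C"): 0.0, ("E", "E"): -1.0}
print(viterbi(emissions, transitions, labels))
# → (['C', 'C', 'C'], 5.0)
```

With these scores a greedy per-position choice would tag the second character E (1.0 > 0.5), but the transition scores make C, C, C the best sequence overall (5.0 versus 4.0 for C, E, C). Accounting for dependencies between neighboring labels in this way is what the CRF layer contributes.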



Abstract

The invention provides a Chinese spelling error correction method and device based on multiple representations and multiple pre-trained models. The method comprises the following steps: fusing word boundaries into the Chinese text to be corrected and extracting the radical features of each of its characters, to obtain the text to be corrected with feature values, wherein the feature values comprise word boundary feature values and radical feature values; inputting the text with feature values into a pre-trained erroneous-character recognition model to identify the characters to be corrected; replacing the characters to be corrected with a preset mark to obtain intermediate text to be corrected; and inputting the intermediate text into the pre-trained multiple pre-trained models, which select a target correct character from a preset confusion set to replace each character to be corrected, yielding the corrected Chinese text. With this method, spelling errors can be identified from multiple perspectives, and the accuracy of spelling error correction is improved.
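The correction steps that follow detection (mask the flagged characters, then let several models choose replacements from a confusion set) can be sketched as follows. Everything model-related is a hypothetical stand-in: the detector's output is given directly as a set of positions, `[MASK]` stands in for the "preset mark", two toy bigram scorers take the place of the pre-trained models, and combining them by averaging is an assumption, not a rule stated in the patent. The example sentence 他的心晴很好 ("his mood is good") has 晴 mistyped for 情.

```python
from typing import Callable, Dict, List, Sequence, Set

MASK = "[MASK]"  # hypothetical stand-in for the patent's "preset mark"

# A scorer rates how well candidate character c fits position i of the masked sentence.
Scorer = Callable[[List[str], int, str], float]


def mask_positions(chars: Sequence[str], flagged: Set[int]) -> List[str]:
    """Replace every character flagged by the detector with MASK."""
    return [MASK if i in flagged else ch for i, ch in enumerate(chars)]


def correct(chars: Sequence[str], flagged: Set[int],
            confusion: Dict[str, Set[str]], scorers: List[Scorer]) -> List[str]:
    """Fill each flagged position with the best-scoring confusion-set candidate."""
    masked = mask_positions(chars, flagged)
    result = list(chars)
    for i in sorted(flagged):
        candidates = confusion.get(chars[i])
        if not candidates:
            continue  # no confusion-set entry: leave the character unchanged
        # Average the scores of all models (ensembling by averaging is an assumption);
        # sorted() makes tie-breaking deterministic.
        result[i] = max(
            sorted(candidates),
            key=lambda c: sum(score(masked, i, c) for score in scorers) / len(scorers),
        )
    return result


# Toy "models": hypothetical bigram counts standing in for pre-trained models.
BIGRAMS = {("心", "情"): 8, ("情", "很"): 3, ("清", "很"): 1, ("心", "睛"): 1}


def left_context(masked: List[str], i: int, c: str) -> float:
    prev = masked[i - 1] if i > 0 else None
    return BIGRAMS.get((prev, c), 0)


def right_context(masked: List[str], i: int, c: str) -> float:
    nxt = masked[i + 1] if i + 1 < len(masked) else None
    return BIGRAMS.get((c, nxt), 0)


CONFUSION = {"晴": {"情", "睛", "清"}}  # hypothetical confusion-set entry
sentence = list("他的心晴很好")  # 晴 (index 3) is a misspelling of 情
print("".join(correct(sentence, {3}, CONFUSION, [left_context, right_context])))
# → 他的心情很好
```

Restricting candidates to the confusion set keeps the search small and biased toward phonetically or visually similar characters; the scorers only have to rank a handful of options rather than the whole vocabulary.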

Description

Technical field

[0001] The present application relates to the technical field of Chinese error correction, and in particular to a Chinese spelling error correction method and device based on multiple representations and multiple pre-trained models.

Background

[0002] Spelling errors, including speech recognition errors, image-to-text errors and writing errors, are widespread in everyday life. According to surveys, about 83% of spelling errors are related to phonetic similarity and about 48% to visual similarity. These misspellings strongly affect downstream tasks such as pattern recognition and named entity recognition.

[0003] Spelling error correction is also a very challenging task: a good correction method requires deep language understanding and the ability to draw on context. At present, spelling errors arise mainly for the following three reasons: reason 1 is that Chinese text lacks word boundary...


Application Information

IPC(8): G06F40/232; G06N3/04; G06N3/08
CPC: G06F40/232; G06N3/08; G06N3/044
Inventors: 黄河燕 (Huang Heyan), 顾雅涵 (Gu Yahan)
Owner: NANJING UNIV OF SCI & TECH