Method and system for realizing cross-domain Chinese text error correction

A cross-domain text error correction technology, applied in the fields of instruments, biological neural network models, and electrical digital data processing, that addresses problems such as high model complexity, limited scope of application, and ambiguity.

Pending Publication Date: 2021-07-06
XIAMEN KUAISHANGTONG TECH CORP LTD

AI Technical Summary

Problems solved by technology

[0003] However, the former is relatively inefficient at ranking and recalling candidates for correcting erroneous text, and because its candidate set is finite, the corrected text it produces has a limited scope of application, which may also lead t



Embodiment Construction

[0044] The embodiment of the present invention proposes a cross-domain Chinese text error correction method, namely a pipeline of error detection → candidate recall → error correction ranking, which handles the cross-domain text error correction problem more generally. Recalling text with a language model trained by deep learning improves the perplexity of the recalled text (lower perplexity indicating more fluent text), and because the models are decoupled from one another, efficiency is improved.
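The decoupling described here can be illustrated with a minimal Python sketch in which each stage is an independent, swappable function. All names (`correct`, `detect`, `recall`, `perplexity`) are hypothetical and not taken from the patent; the sketch also assumes single-character errors and keeps the original character when no candidate lowers the perplexity, which is a design choice of this example rather than something the source specifies.

```python
from typing import Callable, List

# Hypothetical stage signatures; the patent does not define these interfaces.
Detector = Callable[[str], List[int]]       # sentence -> suspected error positions
Recaller = Callable[[str, int], List[str]]  # sentence, position -> candidate characters
Scorer = Callable[[str], float]             # sentence -> perplexity (lower is better)


def correct(sentence: str, detect: Detector, recall: Recaller,
            perplexity: Scorer) -> str:
    """Run detection, candidate recall, and perplexity ranking in sequence."""
    for pos in detect(sentence):
        best, best_ppl = sentence, perplexity(sentence)
        for cand in recall(sentence, pos):
            # Substitute one candidate character at the flagged position.
            trial = sentence[:pos] + cand + sentence[pos + 1:]
            ppl = perplexity(trial)
            if ppl < best_ppl:
                best, best_ppl = trial, ppl
        sentence = best
    return sentence
```

Because the three stages only communicate through these plain function signatures, any one of them can be retrained or replaced for a new domain without touching the others.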

[0045] Figure 1 shows a specific flow chart of a cross-domain Chinese text error correction method provided by an embodiment of the present invention, which includes the following steps:

[0046] S101: Perform error detection using a sequence-labeling error detection model combined with a general-domain supervised-data training model;

[0047] Specifically, error detection is performed using a sequence-labeling error detection model combined with a supervised-data training model i...
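As a rough illustration of what a sequence-labeling detector hands to the later stages, the sketch below converts per-character tags into error spans. The tag scheme (`"E"` for an erroneous character, `"O"` otherwise) and the function name `error_spans` are assumptions made for this example; the source does not state which labels the model emits.

```python
from typing import List, Tuple


def error_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Group consecutive "E" tags into half-open (start, end) spans.

    The "E"/"O" tag scheme is an assumption made for illustration only.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "E" and start is None:
            start = i                  # a new error span begins here
        elif tag != "E" and start is not None:
            spans.append((start, i))   # the span ended just before position i
            start = None
    if start is not None:
        spans.append((start, len(tags)))  # span runs to the end of the sentence
    return spans
```

Each returned span marks the characters that the candidate recall step would then try to replace.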


Abstract

The invention provides a method for realizing cross-domain Chinese text error correction. The method comprises the following steps: performing error detection using a sequence-labeling error detection model combined with a general-domain supervised-data training model; retrieving candidates for each error from the pinyin library of the vocabulary by edit distance or Jaccard distance to obtain an error replacement set; and substituting each word in the error replacement set for the erroneous word in turn, using an RNNLM language model to calculate the perplexity of each resulting sentence, and determining the correct word in the error replacement set according to the calculated sentence perplexity, thereby completing Chinese text error correction. The method is a pipeline of error detection, candidate recall, and error correction ranking that handles the cross-domain text error correction problem more generally. Text is recalled through a language model trained by deep learning, which improves the perplexity of the recalled text, and the models are decoupled from one another, which improves error correction efficiency.
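The recall and ranking steps named in the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the pinyin vocabulary is a hand-written dictionary, the Jaccard distance is computed over character sets, and `token_log_probs` is a hypothetical stand-in for whatever per-token log-probabilities the RNNLM would supply.

```python
import math
from typing import Callable, Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the character sets of a and b."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def recall_candidates(wrong_pinyin: str, pinyin_vocab: Dict[str, str],
                      max_edits: int = 1) -> List[str]:
    """Return vocabulary words whose pinyin is within max_edits of the error."""
    return [word for word, py in pinyin_vocab.items()
            if edit_distance(wrong_pinyin, py) <= max_edits]


def perplexity(log_probs: List[float]) -> float:
    """exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(log_probs) / len(log_probs))


def pick_best(sentence: str, start: int, end: int, candidates: List[str],
              token_log_probs: Callable[[str], List[float]]) -> str:
    """Substitute each candidate for sentence[start:end]; keep the lowest perplexity."""
    return min(candidates, key=lambda c: perplexity(
        token_log_probs(sentence[:start] + c + sentence[end:])))
```

For example, with a toy vocabulary `{"在": "zai", "再": "zai", "找": "zhao"}`, the erroneous pinyin `"zai"` recalls `在` and `再` but not `找`, and a language model that scores `我们再见` as more probable than `我们在见` would lead `pick_best` to select `再`. The `max_edits` threshold is an assumed parameter; `jaccard_distance` could be used in its place inside `recall_candidates` with a suitable cutoff.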

Description

Technical field

[0001] The invention relates to the field of text error correction, in particular to a method and system for realizing cross-domain Chinese text error correction.

Background

[0002] In daily life, typos often appear when we use social tools such as WeChat and Weibo, browse the web, or read articles on official accounts, and they make the meaning of the text ambiguous. Chinese text error correction is an important technology that automatically checks and corrects Chinese sentences through natural language processing algorithms. Its purpose is to improve the correctness of language and enhance the efficiency and value of text interaction. Existing mainstream text error correction technologies fall into two types. One is a pipeline method that finds the position of the text error through sequence learning and then corrects the erroneous text through ranking. The other is an end-to-end model base...


Application Information

IPC(8): G06F40/232, G06F40/289, G06N3/04
CPC: G06F40/232, G06F40/289, G06N3/044
Inventor: 宋正博, 肖龙源, 李稀敏, 李威
Owner XIAMEN KUAISHANGTONG TECH CORP LTD