Text error correction method and system

A text error correction technology based on model confidence, applied in the field of text error correction methods and systems, that addresses the lack of a large training corpus.

Active Publication Date: 2020-06-19
新华智云科技有限公司

AI Technical Summary

Problems solved by technology

[0004] The present invention addresses shortcomings in the prior art. It uses BERT to solve the lack of a large training corpus in prior solutions, uses a new Chinese character encoding method to account for the influence of Chinese character spelling (pinyin) and glyph shape on erroneous text, and adapts dynamically to various error correction tasks.

Examples

Detailed Description of the Embodiments

[0038] The present invention is described in further detail below with reference to examples. The following examples explain the present invention, and the present invention is not limited to them.

[0039] A text error correction method, comprising the following steps:

[0040] Train the BERT model;

[0041] Input the sentence to be checked into the BERT model to obtain, at each position, the TopK candidate set ranked by confidence;

[0042] Encode the Chinese characters, and calculate the similarity between each candidate in the candidate set and the original character based on the Chinese character encoding;

[0043] Combine the similarity and the confidence to calculate the error correction probability;

[0044] Reorder the candidate set according to the error correction probability;

[0045] Compare the error correction probability with the set threshold: if the probability is lower than the threshold, no correction is made; otherwise, the Top1 in the cand...
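
The steps in [0042] to [0045] can be sketched for a single character position as below. This is a minimal illustration under stated assumptions, not the patent's implementation: the excerpt gives neither the character encoding nor the formula that combines similarity and confidence, so the toy table `TOY_CODES`, the string-ratio `similarity`, the weighted sum with weight `alpha`, and the default `threshold` are all hypothetical. The TopK candidates and their confidences are passed in as plain data and stand in for the output of a trained BERT model.

```python
from difflib import SequenceMatcher

# Hypothetical toy encoding: character -> (pinyin with tone, component string).
# The patent's actual encoding scheme is not given in this excerpt.
TOY_CODES = {
    "帐": ("zhang4", "巾长"),
    "账": ("zhang4", "贝长"),
    "章": ("zhang1", "立早"),
}


def similarity(original, candidate):
    """Assumed similarity in [0, 1]: mean of sound and shape string ratios."""
    if original == candidate:
        return 1.0
    if original not in TOY_CODES or candidate not in TOY_CODES:
        return 0.0
    (sound_a, shape_a), (sound_b, shape_b) = TOY_CODES[original], TOY_CODES[candidate]
    sound = SequenceMatcher(None, sound_a, sound_b).ratio()
    shape = SequenceMatcher(None, shape_a, shape_b).ratio()
    return 0.5 * (sound + shape)


def correct_char(original, candidates, alpha=0.5, threshold=0.6):
    """Rerank TopK (char, confidence) pairs and apply the threshold test.

    The weighted sum below is an assumed way to combine the two signals.
    """
    scored = [
        (char, alpha * conf + (1 - alpha) * similarity(original, char))
        for char, conf in candidates
    ]
    # Reorder the candidate set by the combined error correction probability.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    best_char, best_prob = scored[0]
    # Below the threshold, leave the original character unchanged.
    return best_char if best_prob >= threshold else original


if __name__ == "__main__":
    # Made-up confidences for the first character of "帐户".
    topk = [("账", 0.7), ("章", 0.2), ("帐", 0.05)]
    print(correct_char("帐", topk))
    print(correct_char("帐", topk, threshold=0.8))
```

Raising `threshold` makes the sketch more conservative: the same candidate set that triggers a replacement at the default setting leaves the original character in place at a stricter one.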

Abstract

The invention discloses a text error correction method and system. The method comprises the steps of: training a BERT model; encoding the Chinese characters; inputting a sentence to be checked into the BERT model to obtain, at each position, a TopK candidate set ranked by confidence; calculating the similarity between each candidate and the original character using the Chinese character encoding; calculating an error correction probability from the similarity and the confidence; reordering the candidate set according to the error correction probability; and comparing the probability with a set threshold, where no correction is made if the probability is lower than the threshold, and otherwise the Top1 in the candidate set is taken as the final error correction result. The method uses BERT to solve the lack of a large training corpus in existing technical schemes, uses a novel Chinese character encoding to account for the influence of Chinese character spelling (pinyin) and glyph shape on erroneous text, and adapts dynamically to various error correction tasks.

Description

Technical field

[0001] The invention relates to the field of language processing, in particular to a text error correction method and system.

Background technique

[0002] Existing text error correction methods fall mainly into statistical models and neural network models. Taking the N-gram model as an example of the statistical approach, the method computes the probabilities of the adjacent N-grams centered on the target word in the sentence to detect whether the target word is an error, ranks the confusion set at the same time, and selects the top-ranked candidate word to correct the erroneous text. Taking the Seq2Seq model as an example of the neural network approach, the sentence is input into the Encoder, and after the model's computation the Decoder outputs, at each position in the sentence, the character with the Top1 confidence score as the correction.

[0003] The above prior art has the following disadvantages: (1) The...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/232, G06F40/126, G06K9/62
CPC: G06F18/22, Y02D10/00
Inventor: 陈司浩
Owner: 新华智云科技有限公司