Text grammar error correction method fusing monolingual data

A grammatical error correction technology, applied in the field of text error correction, that addresses the problem of grammatical errors in text.

Inactive Publication Date: 2020-11-24
HARBIN INST OF TECH

AI Technical Summary

Problems solved by technology

Most previous work has focused only on generating a few specific types of grammatical errors.

Examples


Specific Embodiment 1

[0034] The present invention provides a text grammatical error correction method that integrates monolingual data, specifically:

[0035] A text grammatical error correction method fused with monolingual data, comprising the following steps:

[0036] Step 1: build the reverse grammatical error generation model, and train the reverse grammatical error generation model;

[0037] Step 1 is specifically:

[0038] Build a reverse grammatical error generation model. The input of the model is the correctly written sentence of a parallel sentence pair, and the output is the sentence containing grammatical errors from the same pair. The reverse grammatical error generation model uses exactly the same network structure, learning criterion, and training method as the forward grammatical error correction model. Given an error sentence x = (x_1, x_2, ..., x_m) and the corresponding corrected sentence y = (y_1, y_2, ..., y_n), the noise-adding probability p(x|y) modele...

Specific Embodiment 2

[0058] The present invention uses the back-translation method from neural machine translation to synthesize pseudo data.

[0059] First, a reverse grammatical error generation model is trained using the seed corpus.

[0060] During training, the source input of the model is the correctly written (corrected) sentence of an "error-correction" parallel sentence pair, and the target output is the sentence containing grammatical errors from the same pair.
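The direction swap described above can be sketched as follows. This is a minimal illustration only, assuming the seed corpus is held as (error sentence, corrected sentence) tuples; the function name `make_reverse_training_pairs` and the sample sentences are hypothetical and do not come from the patent.

```python
from typing import List, Tuple

# Each seed item is assumed to be (error_sentence, corrected_sentence).
SeedPair = Tuple[str, str]


def make_reverse_training_pairs(seed_corpus: List[SeedPair]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs for the reverse model.

    The corrected sentence becomes the source input and the
    error sentence becomes the target output.
    """
    return [(corrected, erroneous) for erroneous, corrected in seed_corpus]


# Hypothetical seed corpus, for illustration only.
seed_corpus = [
    ("She go to school .", "She goes to school ."),
    ("I has a pen .", "I have a pen ."),
]

reverse_pairs = make_reverse_training_pairs(seed_corpus)
for source, target in reverse_pairs:
    print(f"source={source!r} -> target={target!r}")
```

The forward correction model would consume the same seed corpus in its original direction, which is why both models can share one network structure and training procedure.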

[0061] Once trained, the reverse model can be used to "translate" a large number of correctly written texts into texts with grammatical errors, thereby constructing pseudo "error-correction" parallel sentence pairs.
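As a rough sketch of this construction step, the snippet below pairs each monolingual sentence with the output of an error generator. The trained reverse model is replaced here by a trivial rule-based stand-in, `toy_reverse_model`, which is purely hypothetical; `build_pseudo_pairs` is likewise an illustrative name rather than anything defined in the patent.

```python
from typing import Callable, Iterable, List, Tuple


def build_pseudo_pairs(
    monolingual_sentences: Iterable[str],
    generate_errors: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return pseudo (error_sentence, corrected_sentence) pairs.

    `generate_errors` stands in for the trained reverse model: it takes a
    correct sentence and returns a version containing grammatical errors.
    """
    return [(generate_errors(correct), correct) for correct in monolingual_sentences]


def toy_reverse_model(sentence: str) -> str:
    """Hypothetical placeholder: a real system would decode from a trained model."""
    return sentence.replace("goes", "go")


pseudo_pairs = build_pseudo_pairs(["She goes to school ."], toy_reverse_model)
print(pseudo_pairs)
```

The resulting pairs have the same (error, correction) orientation as the seed corpus, so they can be added to the training data of the forward correction model.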

[0062] Past studies have shown that if the Beam Search strategy is adopted directly in the decoding stage of the reverse model, the generated pseudo-source error sentences lack diversity in their grammatical errors. Different from the noise-added Beam Search decoding strategy used in ...
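The diversity problem can be illustrated with a toy example, offered as a sketch under stated assumptions rather than the patent's implementation. `TOY_MODEL` is a hypothetical table of next-token probabilities standing in for a trained reverse model, and greedy decoding (beam search with a beam width of 1) is used as the simplest deterministic stand-in for Beam Search. Sampling draws each next token from the model distribution instead of always taking the most probable one.

```python
import random
from typing import Dict, List, Optional

# Hypothetical next-token distributions; a real reverse model would
# compute these with a neural network conditioned on the source sentence.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "<s>": {"She": 1.0},
    "She": {"goes": 0.6, "go": 0.3, "going": 0.1},
    "goes": {"home": 1.0},
    "go": {"home": 1.0},
    "going": {"home": 1.0},
    "home": {"</s>": 1.0},
}


def decode(rng: Optional[random.Random] = None) -> List[str]:
    """Decode one sentence: greedy if `rng` is None, otherwise by sampling."""
    token, output = "<s>", []
    while True:
        dist = TOY_MODEL[token]
        if rng is None:
            # Deterministic choice: always the most probable next token.
            token = max(dist, key=dist.get)
        else:
            # Sampling: draw the next token in proportion to its probability.
            token = rng.choices(list(dist), weights=list(dist.values()))[0]
        if token == "</s>":
            return output
        output.append(token)


greedy_output = decode()
rng = random.Random(0)
sampled_outputs = {" ".join(decode(rng)) for _ in range(200)}

print("greedy :", " ".join(greedy_output))
print("sampled:", sorted(sampled_outputs))
```

Greedy decoding returns the same sentence on every call, so lower-probability error patterns never appear, whereas repeated sampling yields several distinct variants of the same source sentence.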


Abstract

The invention relates to a text grammatical error correction method fusing monolingual data, and belongs to the technical field of text error correction. The method comprises the following steps: constructing a reverse grammatical error generation model and training it; using the trained reverse grammatical error generation model to construct "error-correction" parallel sentence pairs for text containing grammatical errors; adversarially training the reverse grammatical error generation model, and distinguishing grammatical error sentences from the "error-correction" parallel sentence pairs; and correcting grammatical error sentences using an adversarially trained forward grammatical error correction model. According to the method, a sampling decoding strategy is adopted in back-translation for the first time to construct pseudo "error-correction" parallel sentence pairs; a grammatical error generation model is trained based on an adversarial learning framework, and a more realistic pseudo "error-correction" parallel corpus is constructed using that model.

Description

Technical field

[0001] The invention relates to the technical field of text error correction, and in particular to a text grammatical error correction method that integrates monolingual data.

Background

[0002] With the steady advance of informatization, a large amount of text is being produced. Text written by humans inevitably contains some hidden grammatical errors, and the sheer volume of text poses a severe challenge to traditional manual proofreading. Correcting these hidden errors makes writing more fluent and easier to read; moreover, in some special texts a grammatical or logical error can have a huge impact. Processing large amounts of text by manual proofreading is clearly unrealistic, which is why text error correction technology has attracted growing attention in recent years. This patent conducts a detailed analysis of the text grammatical error correction metho...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/253, G06F40/211, G06N3/04, G06N3/08
CPC: G06F40/253, G06F40/211, G06N3/084, G06N3/045
Inventor: 朱海麒, 白明骏, 姜峰
Owner: HARBIN INST OF TECH