Method and device for enhancing grammatical error correction data based on real error patterns

A grammatical error correction data enhancement technique based on real error patterns, applied in the field of data processing. It addresses the problem that artificially generated error data contains unrealistic errors, which propagate into the grammatical error correction model and reduce its overall performance.

Pending Publication Date: 2021-11-16
GUANGDONG UNIVERSITY OF FOREIGN STUDIES

AI Technical Summary

Problems solved by technology

[0004] Existing data enhancement technologies mainly include methods based on editing operations and methods based on machine translation and back-translation. Methods based on editing operations apply edits such as replacing or inserting words selected at random from a dictionary, random deletion, and random position exchange. Although they can generate a large amount of artificial error data, the generated errors include ones that learners would never make. Such unrealistic, low-quality error data propagates errors into the grammatical error correction model and reduces its overall performance. Back-translation-based methods can achieve better results, but they still require large-scale labeled training data.
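For contrast with the method described later, the editing-operation baseline criticized above can be sketched roughly as follows. This is an illustrative sketch only: the function name `random_edit_noise`, the parameter `p`, and the toy vocabulary are assumptions, not taken from the patent.

```python
import random

def random_edit_noise(tokens, vocab, p=0.1, rng=None):
    """Baseline sketch: with probability p, apply one random edit
    (replace, insert, delete, or swap) at each token position."""
    if rng is None:
        rng = random.Random()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if rng.random() >= p:
            out.append(tok)              # leave this token untouched
            i += 1
            continue
        op = rng.choice(["replace", "insert", "delete", "swap"])
        if op == "replace":
            out.append(rng.choice(vocab))    # random dictionary word
        elif op == "insert":
            out.append(tok)
            out.append(rng.choice(vocab))    # random word after the token
        elif op == "delete":
            pass                             # drop the token
        elif op == "swap" and i + 1 < len(tokens):
            out.extend([tokens[i + 1], tok]) # exchange with the next token
            i += 1                           # the next token is consumed
        else:
            out.append(tok)                  # swap at the last position: keep
        i += 1
    return out

# Hypothetical usage with a toy vocabulary.
noisy = random_edit_noise("she goes to school".split(),
                          vocab=["table", "green", "run"], p=0.3)
```

Because the replacement and insertion words are drawn blindly from the vocabulary, many of the resulting errors are ones a learner would be unlikely to produce, which is the weakness the passage above points out.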

Examples

Embodiment Construction

[0057] To make the purpose, technical solution, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.

[0058] To address the problems in the prior art, one aspect of the present invention provides a grammatical error correction data enhancement method based on real error patterns, including:

[0059] Obtaining a sentence to be noised and a set of noise-adding strategies, wherein the sentence to be noised contains a plurality of words;

[0060] Determining the noise-adding probability of each word in the sentence to be noised;

[0061] Randomly selecting a noise-adding strategy from the noise-adding strategy set according to the noise-ad...
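The steps listed so far (a per-word noise probability, then a random draw from the strategy set) might be sketched as below. The text available here does not say how the probability is determined, so the uniform `base_p` is purely an assumption, and `noise_probabilities`, `add_noise`, and the toy `drop_final_s` strategy are hypothetical names introduced for illustration.

```python
import random
from typing import Callable, List

# A strategy maps a word (plus a random source) to a possibly noised word.
Strategy = Callable[[str, random.Random], str]

def noise_probabilities(words: List[str], base_p: float = 0.15) -> List[float]:
    """Assumption: every word gets the same probability. The source
    does not state how the per-word probability is computed."""
    return [base_p for _ in words]

def add_noise(words: List[str], strategies: List[Strategy],
              probs: List[float], rng: random.Random) -> List[str]:
    """For each word, decide by its probability whether to noise it,
    then pick one strategy uniformly at random from the set."""
    noised = []
    for word, p in zip(words, probs):
        if strategies and rng.random() < p:
            strategy = rng.choice(strategies)
            noised.append(strategy(word, rng))
        else:
            noised.append(word)
    return noised

def drop_final_s(word: str, rng: random.Random) -> str:
    """Toy placeholder strategy: strip a trailing 's' if present."""
    return word[:-1] if word.endswith("s") else word

# Hypothetical usage.
sentence = "She goes to school".split()
result = add_noise(sentence, [drop_final_s],
                   noise_probabilities(sentence), random.Random())
```

Keeping the probability computation separate from the noising loop means a non-uniform scheme could be swapped in without touching the rest of the sketch.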

Abstract

The invention discloses a method and a device for enhancing grammatical error correction data based on real error patterns. The method comprises the following steps: acquiring a sentence to be noised and a set of noise-adding strategies; determining the noise-adding probability of each word in the sentence to be noised; randomly selecting a noise-adding strategy from the strategy set according to the noise-adding probability and applying it to the word to be noised; and constructing parallel sentence pairs from the erroneous sentences produced by noising and the correct sentences before noising. The noise-adding strategy set comprises a replacement strategy based on real error patterns, a synonym replacement strategy, a function word replacement strategy, a similar spelling replacement strategy, and an inflection replacement strategy. According to the embodiments of the invention, introducing real errors and simulating various kinds of real errors makes it possible to generate high-quality artificial error data that is more realistic and closer to the errors learners actually make, and the multiple types of noise schemes can produce a variety of grammatical errors. The method and the device can be widely applied in the technical field of data processing.
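A minimal sketch of the five-strategy set and the construction of a parallel pair is given below. Every lookup table is a tiny hypothetical example, and the names `from_table`, `function_word_swap`, and `make_pair` are assumptions; the patent text shown here does not specify how each strategy finds its replacement candidates or how the probability `p` is set.

```python
import random

# Hypothetical toy resources; a real system would use much larger ones.
REAL_ERRORS = {"advice": ["advise"], "their": ["there"]}  # learner error patterns
SYNONYMS = {"big": ["large"], "quick": ["fast"]}
FUNCTION_WORDS = ["in", "on", "at", "for", "to"]
SIMILAR_SPELLING = {"quite": ["quiet"], "form": ["from"]}
INFLECTIONS = {"go": ["goes", "went", "going"], "book": ["books"]}

def from_table(table):
    """Build a strategy that replaces a word using a lookup table,
    leaving the word unchanged when the table has no entry for it."""
    def strategy(word, rng):
        options = table.get(word.lower())
        return rng.choice(options) if options else word
    return strategy

def function_word_swap(word, rng):
    """Replace a function word with a different one from the list."""
    if word.lower() not in FUNCTION_WORDS:
        return word
    return rng.choice([w for w in FUNCTION_WORDS if w != word.lower()])

STRATEGIES = [
    from_table(REAL_ERRORS),       # replacement based on real error patterns
    from_table(SYNONYMS),          # synonym replacement
    function_word_swap,            # function word replacement
    from_table(SIMILAR_SPELLING),  # similar spelling replacement
    from_table(INFLECTIONS),       # inflection replacement
]

def make_pair(correct, p=0.3, rng=None):
    """Noise a correct sentence and return (erroneous, correct)."""
    if rng is None:
        rng = random.Random()
    words = correct.split()
    noised = [rng.choice(STRATEGIES)(w, rng) if rng.random() < p else w
              for w in words]
    return " ".join(noised), correct

# Hypothetical usage: the pair can serve as a (source, target) training example.
erroneous, reference = make_pair("They go to school in their car")
```

All five strategies here are word-for-word replacements, so the erroneous and correct sentences stay aligned token by token in this sketch.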

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a grammatical error correction data enhancement method and device based on real error patterns.

Background

[0002] Grammatical error correction refers to the process of detecting and correcting grammatical errors in sentences and generating natural, fluent, and grammatically correct sentences. In recent years, grammatical error correction models based on neural networks have achieved good results, but these models usually require large-scale labeled parallel training data. In practice, because manually labeling data at that scale is expensive, researchers try to improve the performance of the error correction model from another perspective: through data enhancement, noise is introduced into correct sentences to obtain pseudo-labeled data with errors, as a supplement to the manually labeled data, thereby i...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/211, G06F40/253, G06F40/284, G06F40/242
CPC: G06F40/211, G06F40/253, G06F40/284, G06F40/242
Inventors: 李霞, 何俊毅
Owner: GUANGDONG UNIVERSITY OF FOREIGN STUDIES