Reading comprehension data enhancement method and device based on back-translation

A technology for reading comprehension data annotation, applied in neural learning methods, electrical digital data processing, natural language data processing, etc. It addresses the difficulty of obtaining a reliable reading comprehension model by increasing the scale and diversity of the training data, alleviating the effects of data scarcity.

Pending Publication Date: 2022-03-22
BEIJING UNISOUND INFORMATION TECH

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a reading comprehension data enhancement method and device based on back-translation, to solve the problem in the prior art that a reliable reading comprehension model is difficult to obtain.


Examples


Embodiment 1

[0057] This application proposes a reading comprehension data enhancement method based on back-translation, which effectively expands the scale of the training data so that a more reliable reading comprehension model can be obtained even from a limited amount of data. Reading comprehension training data usually consists of three parts (document d, question q, answer a), and the training goal is to select the answer to the question from the document. The method works as follows. First, an open-source Chinese-English bilingual parallel corpus is used to train a bidirectional neural machine translation model (English->Chinese, Chinese->English). The translation model then translates the documents in the reading comprehension training data from Chinese into English, and translates the English results back into Chinese. Then, based on the back-translation results, construct the answer to the quest...
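The round trip described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `zh_to_en` and `en_to_zh` are hypothetical stand-ins for the two directions of the trained translation model (toy lookup tables here), and `keep_if_verbatim` is an assumed placeholder for the answer-construction step, whose details are not given in this excerpt.

```python
from typing import Callable, List, NamedTuple, Optional


class Sample(NamedTuple):
    """One reading comprehension training triple (d, q, a)."""
    document: str
    question: str
    answer: str


Translator = Callable[[str], str]
AnswerBuilder = Callable[[str, str], Optional[str]]


def back_translate(text: str, zh_to_en: Translator, en_to_zh: Translator) -> str:
    """Translate Chinese -> English -> Chinese to obtain a paraphrase."""
    return en_to_zh(zh_to_en(text))


def keep_if_verbatim(new_document: str, original_answer: str) -> Optional[str]:
    """ASSUMED placeholder rule: reuse the original answer only if it still
    appears verbatim in the back-translated document, otherwise give up."""
    return original_answer if original_answer in new_document else None


def augment(
    samples: List[Sample],
    zh_to_en: Translator,
    en_to_zh: Translator,
    build_answer: AnswerBuilder = keep_if_verbatim,
) -> List[Sample]:
    """Return new samples whose documents are back-translated variants."""
    augmented = []
    for sample in samples:
        new_document = back_translate(sample.document, zh_to_en, en_to_zh)
        if new_document == sample.document:
            continue  # the round trip produced nothing new
        new_answer = build_answer(new_document, sample.answer)
        if new_answer is not None:
            augmented.append(Sample(new_document, sample.question, new_answer))
    return augmented


# Toy lookup tables standing in for a trained translation model (hypothetical).
ZH_EN = {"他住在北京。": "He lives in Beijing."}
EN_ZH = {"He lives in Beijing.": "他居住在北京。"}

original = [Sample("他住在北京。", "他住在哪里?", "北京")]
augmented = augment(original, ZH_EN.__getitem__, EN_ZH.__getitem__)
training_set = original + augmented  # original data plus the new variants
```

The question is left untouched in this sketch because the paragraph only describes translating the documents; samples whose answer cannot be rebuilt are dropped rather than kept with a possibly wrong label.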

Embodiment 2

[0103] Figure 5 is a schematic flow chart of the back-translation-based reading comprehension data enhancement method provided by Embodiment 2 of the present invention. The device for enhancing reading comprehension data based on back-translation includes a training module 510, an expansion module 520 and a construction module 530.

[0104] The training module 510 is used to train a bidirectional neural machine translation model through the Chinese-English bilingual parallel corpus;

[0105] The expansion module 520 is used to expand the pre-labeled reading comprehension document data through the neural machine translation model;

[0106] The construction module 530 is used for constructing the answers of the pre-labeled data according to the answers in the reading comprehension training data and the pre-labeled data.

[0107] Wherein, the training module 510 is specifically used for:

[0108] Obtain the Chinese-English bilingual parallel corpus pair; the Chinese-Eng...
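The three-module structure of this embodiment can be pictured as three pluggable callables wired into one object. The sketch below is an assumption-laden illustration only: the class `AugmentationDevice`, its field names, and the `toy_*` functions are hypothetical, the "training" step merely memorises sentence pairs, and the answer rule is a placeholder because the excerpt does not spell out how module 530 builds answers.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]        # (document, question, answer)
Pair = Tuple[str, str]               # (Chinese sentence, English sentence)
Translator = Callable[[str], str]


@dataclass
class AugmentationDevice:
    # Module 510: parallel corpus -> (zh->en, en->zh) translators
    train: Callable[[List[Pair]], Tuple[Translator, Translator]]
    # Module 520: document + translators -> expanded (back-translated) document
    expand: Callable[[str, Translator, Translator], str]
    # Module 530: (expanded document, original answer) -> new answer or None
    construct: Callable[[str, str], Optional[str]]

    def run(self, corpus: List[Pair], data: List[Triple]) -> List[Triple]:
        zh_to_en, en_to_zh = self.train(corpus)
        expanded = []
        for document, question, answer in data:
            new_document = self.expand(document, zh_to_en, en_to_zh)
            new_answer = self.construct(new_document, answer)
            if new_answer is not None:
                expanded.append((new_document, question, new_answer))
        return expanded


def toy_train(corpus: List[Pair]) -> Tuple[Translator, Translator]:
    """Toy stand-in for training: memorise the pairs as lookup tables."""
    zh_en = {zh: en for zh, en in corpus}
    en_zh = {en: zh for zh, en in corpus}  # a later pair overwrites an earlier one
    return zh_en.__getitem__, en_zh.__getitem__


def toy_expand(document: str, zh_to_en: Translator, en_to_zh: Translator) -> str:
    return en_to_zh(zh_to_en(document))


def toy_construct(new_document: str, answer: str) -> Optional[str]:
    """ASSUMED rule: keep the answer only if it survives verbatim."""
    return answer if answer in new_document else None


device = AugmentationDevice(toy_train, toy_expand, toy_construct)
corpus = [("他住在北京。", "He lives in Beijing."),
          ("他居住在北京。", "He lives in Beijing.")]
expanded = device.run(corpus, [("他住在北京。", "他住在哪里?", "北京")])
```

Keeping each module as a separate callable mirrors the 510/520/530 split, so a real translation model or a different answer-construction rule could be swapped in without changing `run`.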

Embodiment 3

[0125] Embodiment 3 of the invention provides a device including a memory and a processor. The memory is used to store programs and can be connected to the processor through a bus. The memory can be non-volatile memory, such as a hard drive or flash memory, in which software programs and device drivers are stored. The software program can execute the functions of the above method provided by the embodiment of the present invention; the device driver can be a network and interface driver. The processor is configured to execute the software program, and when the software program is executed, the method provided in Embodiment 1 of the present invention is realized.


Abstract

The invention relates to a reading comprehension data enhancement method based on back-translation. The method comprises the following steps: training a bidirectional neural machine translation model on a Chinese-English bilingual parallel corpus; expanding the pre-annotated reading comprehension document data through the neural machine translation model; and constructing the answers of the pre-annotated data according to the answers in the reading comprehension training data and the pre-annotated data.

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a method and device for enhancing reading comprehension data.

Background

[0002] Current machine reading comprehension systems are mainly built in the following steps: first, annotate a given text and its corresponding question, marking a segment of the text as the answer; then build a neural network model that takes the text and the question as input and outputs the correct answer segment. This application proposes a back-translation-based reading comprehension data augmentation method, which provides more usable training data for building a reliable reading comprehension system in a lower-resource domain, thereby alleviating the scarcity of training data in that domain.

[0003] Existing relatively mature machine reading comprehension models are often trained on large-scale labeled news corpora. However, other professional fields (such ...


Application Information

IPC(8): G06F16/332, G06F40/58, G06F40/49, G06F40/44, G06F40/289, G06N3/04, G06N3/08
CPC: G06F16/3329, G06F40/58, G06F40/49, G06F40/289, G06F40/44, G06N3/08, G06N3/047
Inventor: 王亦宁, 梁家恩
Owner: BEIJING UNISOUND INFORMATION TECH