Reading comprehension data enhancement method and device based on back-translation
A technology for reading comprehension and labeling data, applied in neural learning methods, electrical digital data processing, natural language data processing, etc. It addresses the difficulty of obtaining a reliable reading comprehension model from limited data: it increases data scale and data diversity and alleviates the effects of data scarcity.
Examples
Embodiment 1
[0057] This embodiment proposes a reading comprehension data enhancement method based on back-translation. It effectively expands the scale of the training data, so that a more reliable reading comprehension model can be obtained even from a limited amount of data. Reading comprehension training data usually consists of three parts (document d, question q, answer a), and the training goal is to select the answer to the question from the document. The method proceeds as follows. First, an open-source Chinese-English bilingual parallel corpus is used to train a bidirectional neural machine translation model (English->Chinese, Chinese->English). The translation model then translates the documents in the reading comprehension training data from Chinese to English, and the English translations are translated back into Chinese. Then, based on the back-translation results, construct the answer to the quest...
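The round trip described above (Chinese to English, then English back to Chinese) can be sketched as follows. This is a minimal illustration, not the application's implementation: the `Example` type, the function names, and the lookup-table "translators" are all hypothetical stand-ins. In practice the two callables would wrap the trained bidirectional neural machine translation model.

```python
from dataclasses import dataclass
from typing import Callable, List

# A translator is any callable mapping text to text. Here it is a hypothetical
# stand-in for one direction of a trained neural machine translation model.
Translator = Callable[[str], str]


@dataclass
class Example:
    """One reading comprehension training triple (d, q, a)."""
    document: str
    question: str
    answer: str


def back_translate(text: str, zh_to_en: Translator, en_to_zh: Translator) -> str:
    """Translate Chinese text to English, then translate the result back to Chinese."""
    return en_to_zh(zh_to_en(text))


def augment_documents(examples: List[Example], zh_to_en: Translator,
                      en_to_zh: Translator) -> List[str]:
    """Return one back-translated document per training example."""
    return [back_translate(ex.document, zh_to_en, en_to_zh) for ex in examples]


# Toy lookup tables (assumption) so the sketch runs without a trained model.
_ZH_EN = {"北京是中国的首都。": "Beijing is the capital of China."}
_EN_ZH = {"Beijing is the capital of China.": "北京是中国首都。"}


def toy_zh_to_en(text: str) -> str:
    return _ZH_EN[text]


def toy_en_to_zh(text: str) -> str:
    return _EN_ZH[text]


examples = [Example("北京是中国的首都。", "中国的首都是哪里？", "北京")]
augmented = augment_documents(examples, toy_zh_to_en, toy_en_to_zh)
```

The back-translated document usually differs slightly in wording from the original, which is the source of the added data diversity. It also means the original answer span may no longer appear verbatim, so the answer has to be reconstructed for the new document.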
Embodiment 2
[0103] Figure 5 is a schematic flow chart of the back-translation-based reading comprehension data enhancement method provided by Embodiment 2 of the present invention. The device for enhancing reading comprehension data based on back-translation includes a training module 510, an expansion module 520 and a construction module 530.
[0104] The training module 510 is used to train a bidirectional neural machine translation model through the Chinese-English bilingual parallel corpus;
[0105] The expansion module 520 is used to expand the pre-labeled data of the reading comprehension documents through the neural machine translation model;
[0106] The construction module 530 is used for constructing the answers of the pre-labeled data according to the answers in the reading comprehension training data and the pre-labeled data.
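As an illustration of what a construction step of this kind might do, the sketch below locates the span of a back-translated document that best matches the original answer. The matching strategy (character-level fuzzy matching with Python's standard `difflib.SequenceMatcher`) is an assumption chosen for illustration; the source does not state here how the answers are constructed. The function name `align_answer` and the `slack` parameter are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Tuple


def align_answer(answer: str, new_document: str, slack: int = 2) -> Tuple[str, float]:
    """Return the span of new_document most similar to answer, with its similarity.

    Candidate spans have lengths within `slack` characters of the original answer.
    Fuzzy matching is an illustrative assumption, not the source's procedure.
    """
    best_span, best_score = "", 0.0
    min_len = max(1, len(answer) - slack)
    max_len = min(len(new_document), len(answer) + slack)
    for length in range(min_len, max_len + 1):
        for start in range(len(new_document) - length + 1):
            span = new_document[start:start + length]
            score = SequenceMatcher(None, answer, span).ratio()
            # Keep the first span that strictly improves on the best score so far.
            if score > best_score:
                best_span, best_score = span, score
    return best_span, best_score


# The answer survives back-translation unchanged.
exact_span, exact_score = align_answer("北京", "北京是中国首都。")

# The answer was reworded by back-translation ("的" was dropped).
fuzzy_span, fuzzy_score = align_answer("中国的首都", "北京是中国首都。")
```

A real pipeline would likely discard augmented examples whose best similarity falls below some threshold, since a poorly aligned answer would add label noise rather than useful training data. The threshold value would need to be tuned and is not specified by the source.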
[0107] Wherein, the training module 510 is specifically used for:
[0108] Obtain the Chinese-English bilingual parallel corpus pair; the Chinese-Eng...
Embodiment 3
[0125] Embodiment 3 of the invention provides a device including a memory and a processor. The memory is used to store programs and can be connected to the processor through a bus. The memory can be non-volatile memory, such as a hard drive or flash memory, in which software programs and device drivers are stored. The software program can execute the functions of the above method provided by the embodiment of the present invention; the device driver can be a network and interface driver. The processor is configured to execute the software program, and when the software program is executed, the method provided in Embodiment 1 of the present invention is realized.