Data enhancement method and system based on multilingual machine translation

A data enhancement technique based on multilingual machine translation, applied in the fields of translation and artificial intelligence. It addresses three problems of existing approaches: poor suitability for generating data in low-resource languages, limited improvement in data diversity, and the high cost of manual labeling. Its stated effects are reduced memory requirements, improved model performance, and reduced resource cost.

Pending Publication Date: 2021-05-04
BEIJING UNISOUND INFORMATION TECH +1

AI Technical Summary

Problems solved by technology

[0002] General-purpose neural machine translation systems are based on the end-to-end Encoder-Decoder framework and generally require a large-scale parallel corpus for supervised model training. Because manual labeling is expensive, bilingual corpora for low-resource languages are scarce, so data enhancement methods such as back-translation and data noising are usually used to synthesize parallel corpora that are added to the training data. However, data noising improves data diversity only to a limited extent. The quality of data generated by back-translation depends on the translation model, and the quality of the translation model in turn depends on the size of the parallel corpus, so back-translation is poorly suited to generating data for low-resource languages. In addition, as the number of languages increases, more translation models need to be trained.


Examples

Detailed Description of Embodiments

[0031] The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the present invention and are not intended to limit its scope.

[0032] Figure 1 is a flow chart of a data enhancement method based on multilingual machine translation provided by an embodiment of the present invention. As shown in Figure 1, the method includes:

[0033] Step 110: Use a pre-trained multilingual translation model to translate the original sentence from the source language to the target language, obtaining multiple candidate translations with different probabilities, where the source language is the same as the target language;

[0034] Specifically, before step 110 is performed, the multilingual translation model must be trained on training data that includes bilingual parallel corpora for multiple language pairs, such as language ...
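The visible text does not say how the multiple candidate translations in step 110 are produced. One common way to obtain several scored hypotheses from a sequence model is n-best beam search, so the sketch below illustrates that under stated assumptions. The names `n_best`, `toy_next`, `TOY`, and the `</s>` end token are hypothetical, and the toy table stands in for the next-token distribution that a real multilingual model would compute given the source sentence.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"  # hypothetical end-of-sentence token
Prefix = Tuple[str, ...]


def n_best(next_probs: Callable[[Prefix], Dict[str, float]],
           beam_size: int, max_len: int = 10) -> List[Tuple[str, float]]:
    """Return up to beam_size finished hypotheses with their probabilities."""
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]  # (tokens, log-probability)
    finished: List[Tuple[Prefix, float]] = []
    for _ in range(max_len):
        expanded: List[Tuple[Prefix, float]] = []
        for tokens, score in beams:
            for tok, p in next_probs(tokens).items():
                if p <= 0.0:
                    continue
                hyp = (tokens + (tok,), score + math.log(p))
                if tok == EOS:
                    finished.append(hyp)
                else:
                    expanded.append(hyp)
        if not expanded:
            break
        # Keep only the beam_size most probable partial hypotheses.
        expanded.sort(key=lambda h: h[1], reverse=True)
        beams = expanded[:beam_size]
    finished.sort(key=lambda h: h[1], reverse=True)
    # Drop the EOS token and convert log-probabilities back to probabilities.
    return [(" ".join(toks[:-1]), math.exp(s)) for toks, s in finished[:beam_size]]


# Toy stand-in for a model's next-token distribution (assumption, not real data).
TOY: Dict[Prefix, Dict[str, float]] = {
    (): {"the": 1.0},
    ("the",): {"cat": 0.6, "kitten": 0.4},
    ("the", "cat"): {"sleeps": 0.7, "naps": 0.3},
    ("the", "kitten"): {"sleeps": 0.6, "naps": 0.4},
}


def toy_next(prefix: Prefix) -> Dict[str, float]:
    # Any prefix not in the table ends the sentence.
    return TOY.get(prefix, {EOS: 1.0})


candidates = n_best(toy_next, beam_size=3)
for text, prob in candidates:
    print(f"{prob:.2f}  {text}")
```

With a beam of 3, the least probable of the four possible sentences is pruned, and the remaining candidates come back ordered by probability, which matches the "multiple candidate translations with different probabilities" described in step 110.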

Abstract

The invention relates to a data enhancement method and system based on multilingual machine translation. The method comprises: translating an original sentence from a source language to a target language with a pre-trained multilingual translation model to obtain a plurality of candidate translations with different probabilities, the source language being the same as the target language; and retaining those candidate translations that are not completely identical to the original sentence as training data for training the corresponding translation model. Owing to joint training and knowledge transfer, translation of low-resource and zero-resource languages in the multilingual translation model benefits from high-resource languages, so high-quality and diverse training data can be obtained for training the translation model, which helps improve model performance. Moreover, because the multilingual translation model is capable of both multilingual translation and zero-resource translation, it can translate within the same language for multiple languages. Data enhancement for multiple languages therefore requires training only one multilingual translation model, which reduces the resource cost.
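The retention step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: `keep_non_identical`, `fake_nbest`, and the `(sentence, src_lang, tgt_lang, n)` call signature are hypothetical, the canned candidates are made up, and "not completely identical" is interpreted here as plain string inequality.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]  # (candidate text, probability)
NBestFn = Callable[[str, str, str, int], List[Candidate]]


def keep_non_identical(original: str, lang: str,
                       translate_nbest: NBestFn, n: int = 5) -> List[Candidate]:
    """Translate lang -> lang and keep candidates that differ from the original."""
    candidates = translate_nbest(original, lang, lang, n)
    # Assumption: "not completely identical" means exact string inequality.
    return [(text, prob) for text, prob in candidates if text != original]


def fake_nbest(sentence: str, src_lang: str, tgt_lang: str, n: int) -> List[Candidate]:
    """Hypothetical stand-in for a multilingual model's n-best output."""
    assert src_lang == tgt_lang  # the method translates within one language
    canned = [(sentence, 0.5), ("the cat is sleeping", 0.3), ("a cat sleeps", 0.2)]
    return canned[:n]


kept = keep_non_identical("the cat sleeps", "en", fake_nbest)
print(kept)
```

Passing the model in as a callable keeps the filter independent of any particular translation toolkit. How the retained candidates are then paired with other sentences to form training examples is not stated in the visible text, so the sketch stops at the filtered list.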

Description

Technical field

[0001] The invention relates to the fields of translation and artificial intelligence, and in particular to a data enhancement method and system based on multilingual machine translation.

Background

[0002] General-purpose neural machine translation systems are based on the end-to-end Encoder-Decoder framework and generally require a large-scale parallel corpus for supervised model training. Because manual labeling is expensive, bilingual corpora for low-resource languages are scarce, so data enhancement methods such as back-translation and data noising are usually used to synthesize parallel corpora that are added to the training data. However, data noising improves data diversity only to a limited extent. The quality of data generated by back-translation depends on the translation model, and the quality of the translation model in turn depends on the size of the parallel corpus, so back-translation is poorly suited to generating data for low-resource languages, and when the number of la...

Application Information

IPC(8): G06F40/58, G06F40/289, G06N3/02
CPC: G06F40/58, G06F40/289, G06N3/02
Inventors: 丁颖, 孙见青, 梁家恩
Owner: BEIJING UNISOUND INFORMATION TECH