High-quality Mongolian-Chinese unsupervised neural machine translation method

An unsupervised machine translation technique, applied in the field of neural machine translation, that addresses the lack of Mongolian-Chinese parallel corpus resources. It improves generation quality, translation fluency and translation accuracy, and the method is simple and feasible.

Active Publication Date: 2021-03-30
INNER MONGOLIA UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0003] To overcome the shortcomings of the prior art, the purpose of the present invention is to provide a high-quality Mongolian-Chinese unsupervised neural machine translation method that applies high-quality unsupervised learning strategies to Mongolian-Chinese neural machine translation, making full use of the large amount of existing unlabeled monolingual data to alleviate the shortage of Mongolian-Chinese parallel corpus resources.



Examples


Embodiment Construction

[0019] The implementation of the present invention will be described in detail below in conjunction with the drawings and examples.

[0020] As shown in Figure 1, the high-quality Mongolian-Chinese unsupervised neural machine translation method of the present invention proceeds as follows:

[0021] Step 1. Use BERT to train an unsupervised tokenizer. Taking Mongolian as an example: first pre-segment the large-scale Mongolian monolingual corpus with BPE, then use BERT to perform monolingual language model pre-training on the segmented corpus. Once the Mongolian monolingual language model is trained, use it as prior knowledge, combined with a correlation matrix generation method that fuses subword segments, to train the unsupervised Mongolian word segmenter. For each Mongolian sentence to be segmented, the correlation between any two subwords is then scored to complete the word segmentation. The same procedure applies to Chinese.
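The last part of Step 1, turning pairwise subword correlation scores into word boundaries, can be illustrated with a minimal sketch. The text visible here does not say how scores are converted into boundaries, so the rule below (merge adjacent subwords whose score reaches a threshold) is an assumption, as are the function name `segment_by_correlation`, the `threshold` parameter, and the hand-written toy scores, which stand in for the output of the BERT-based scorer. English subwords are used only to keep the example readable.

```python
from typing import List, Sequence


def segment_by_correlation(
    subwords: Sequence[str],
    corr: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the patent
) -> List[str]:
    """Merge adjacent subwords whose correlation score reaches the threshold.

    corr[i][j] is the correlation between subwords i and j; only the
    entries for adjacent pairs (i, i + 1) are used by this simple rule.
    """
    n = len(subwords)
    if len(corr) != n or any(len(row) != n for row in corr):
        raise ValueError("corr must be an n x n matrix for n subwords")
    if n == 0:
        return []

    words = [subwords[0]]
    for i in range(1, n):
        if corr[i - 1][i] >= threshold:
            words[-1] += subwords[i]   # strongly related: same word
        else:
            words.append(subwords[i])  # weakly related: start a new word
    return words


# Toy input: BPE-style pieces and made-up correlation scores.
pieces = ["un", "super", "vised", "trans", "lation"]
scores = [
    [1.0, 0.9, 0.6, 0.1, 0.1],
    [0.9, 1.0, 0.8, 0.1, 0.1],
    [0.6, 0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.1, 0.7, 1.0],
]
print(segment_by_correlation(pieces, scores))
```

Only adjacent-pair scores are consulted here; a rule that also uses the non-adjacent entries of the matrix would be an equally plausible reading of "any two subwords".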

[...


Abstract

The invention provides a high-quality Mongolian-Chinese unsupervised neural machine translation method. Large-scale Mongolian and Chinese monolingual corpora are pre-segmented, and BERT is used to carry out monolingual language model pre-training on the segmented corpora to obtain Mongolian and Chinese language models. An unsupervised Mongolian-Chinese word segmenter is then trained in combination with a correlation matrix generation method that fuses subword segments, and the correlation between any two subwords in the Mongolian or Chinese sentence to be segmented is scored to complete word segmentation. The segmented Mongolian and Chinese are embedded into a shared latent space, and the Mongolian-Chinese bilingual word vector space is optimally aligned by using an unsupervised adversarial autonomous learning method. A Mongolian-Chinese language model is trained on the segmented monolingual corpora in that space, and nearest neighbor search is performed by using a CSLS method to obtain a Mongolian-Chinese bilingual dictionary based on a GAS framework. An initial Mongolian-Chinese translation model is trained in combination with the Mongolian-Chinese language model generated by the pre-training model, and a high-quality Mongolian-Chinese-Mongolian bidirectional dual unsupervised translation model is jointly trained by using an unsupervised back-translation method in combination with a dual learning strategy.
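The CSLS nearest neighbor search mentioned in the abstract can be sketched as follows. This is a generic illustration of the standard CSLS score, CSLS(x, y) = 2·cos(x, y) − r_T(x) − r_S(y), where r_T(x) is the mean cosine similarity of source word x to its k nearest target words and r_S(y) is the corresponding mean for target word y. It is not the patent's implementation: the function names, the value of `k`, and the two-dimensional toy vectors with placeholder words are assumptions, and the embeddings are assumed to be already aligned in a shared space.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_top_k(values: List[float], k: int) -> float:
    """Mean of the k largest values (all of them if fewer than k exist)."""
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)


def csls_dictionary(
    src: Dict[str, Vector], tgt: Dict[str, Vector], k: int = 2
) -> Dict[str, str]:
    """Map each source word to the target word with the highest CSLS score."""
    if not src or not tgt or k < 1:
        raise ValueError("src and tgt must be non-empty and k must be >= 1")

    # Pairwise cosine similarities between every source and target word.
    sim = {s: {t: cosine(sv, tv) for t, tv in tgt.items()} for s, sv in src.items()}
    # Neighborhood density terms that penalize "hub" words.
    r_src = {s: mean_top_k(list(sim[s].values()), k) for s in src}
    r_tgt = {t: mean_top_k([sim[s][t] for s in src], k) for t in tgt}

    return {
        s: max(tgt, key=lambda t: 2 * sim[s][t] - r_src[s] - r_tgt[t])
        for s in src
    }


# Toy embeddings with placeholder words; real vectors would come from the
# aligned Mongolian and Chinese embedding spaces.
src_vecs = {"src_a": [1.0, 0.0], "src_b": [0.0, 1.0]}
tgt_vecs = {"tgt_x": [0.9, 0.1], "tgt_y": [0.1, 0.9]}
print(csls_dictionary(src_vecs, tgt_vecs))
```

Subtracting the two density terms lowers the score of target words that are close to many source words, which is the reason CSLS is commonly preferred over plain cosine nearest neighbor search for dictionary induction.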

Description

Technical field

[0001] The invention belongs to the technical field of neural machine translation, and in particular relates to a high-quality Mongolian-Chinese unsupervised neural machine translation method.

Background

[0002] Machine translation has flourished in recent years, and machine translation tasks for low-resource and minority languages have also gained more attention. Mongolian is a language that is widely used across multiple countries and regions, and it is the official language of the Inner Mongolia Autonomous Region. On the one hand, research on Mongolian-Chinese machine translation is of great significance for promoting cultural dissemination and communication among ethnic groups. On the other hand, it plays a positive role in promoting the development of machine translation research for low-resource and minority languages. However, due to the relatively late start of research on Mongolian natu...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/58, G06F40/205, G06F40/242, G06F40/284, G06F40/289, G06F40/30
CPC: G06F40/58, G06F40/205, G06F40/284, G06F40/289, G06F40/242, G06F40/30, Y02D10/00
Inventor: 苏依拉, 王昊, 贺玉玺, 仁庆道尔吉, 李雷孝, 石宝
Owner: INNER MONGOLIA UNIV OF TECH