Mongolian and Chinese inter-translation method based on monolingual corpus training

A monolingual-corpus technique, applied in the field of machine translation, that addresses the shortage of parallel resources and improves translation quality.

Status: Inactive
Publication Date: 2018-11-16
INNER MONGOLIA UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0006] To overcome the shortcomings of the prior art described above and to make full use of existing data to alleviate the resource shortage, the object of the present invention is to provide a Mongolian-Chinese inter-translation method based on monolingual corpus training. Given only unlabeled data (i.e. monolingual corpora), the method introduces a denoising autoencoder and cross-domain training to learn translation between Mongolian and Chinese, and uses adversarial training to learn a similar latent space for the two languages. It establishes a denoising autoencoder loss function, a translation-process loss function, and a discriminator loss function, and sets the constraint that the sum of these three losses be minimized.
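
The constraint stated in [0006] can be written compactly as follows. This is only a notational sketch: the symbols are assumptions introduced here and do not appear in the patent text, and the unweighted sum simply mirrors the wording "the sum of the above three different losses".

```latex
% L_auto  : denoising autoencoder (reconstruction) loss
% L_trans : translation-process (cross-domain) loss
% L_disc  : discriminator loss
% theta   : all trainable parameters of the model (assumed notation)
\mathcal{L}_{\mathrm{total}}(\theta)
  = \mathcal{L}_{\mathrm{auto}}(\theta)
  + \mathcal{L}_{\mathrm{trans}}(\theta)
  + \mathcal{L}_{\mathrm{disc}}(\theta),
\qquad
\theta^{*} = \arg\min_{\theta} \, \mathcal{L}_{\mathrm{total}}(\theta)
```

Whether the three terms carry separate weights, and how updates to the discriminator and to the encoders are alternated, is not stated in this excerpt.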


Examples


Embodiment Construction

[0041] The implementation of the present invention will be described in detail below in conjunction with the drawings and examples.

[0042] The present invention is a Mongolian-Chinese inter-translation method based on monolingual corpus training. It trains a Mongolian-Chinese inter-translation model under constraints: a denoising autoencoder loss function, a translation-process loss function, and a discriminator loss function are established, and the constraint is set that the sum of these three losses be minimized. Since this is a sequence-to-sequence problem, the present invention uses long short-term memory (LSTM) networks, with two LSTM-based autoencoders, one each for Mongolian and Chinese. In the present invention, training the Mongolian-Chinese inter-translation model includes the following main steps:

[0043] (1) Use the encoder of language A (Mongolian or Chinese) and the decode...
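
As an illustration of the denoising autoencoder component named in [0042], the following Python sketch shows one common way to corrupt an input sentence before an autoencoder is asked to reconstruct it: random word dropout followed by a bounded local shuffle. The patent text shown here does not specify the noise model, so the function name `add_noise` and the parameters `drop_prob` and `max_shift` are assumptions made purely for illustration.

```python
import random


def add_noise(tokens, drop_prob=0.1, max_shift=3, rng=None):
    """Return a corrupted copy of `tokens` (hypothetical noise model).

    Step 1: drop each token independently with probability `drop_prob`.
    Step 2: shuffle locally so that no surviving token moves more than
            `max_shift` positions from where it stood after step 1.
    """
    if not tokens:
        return []
    rng = rng or random.Random()

    # Word dropout: keep a token when the random draw is >= drop_prob.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept:
        # Never hand the autoencoder an empty sentence.
        kept = [rng.choice(tokens)]

    # Local shuffle: sort by (position + random offset in [0, max_shift]).
    keys = [i + rng.uniform(0, max_shift) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]


if __name__ == "__main__":
    sentence = ["w1", "w2", "w3", "w4", "w5", "w6"]
    print(add_noise(sentence, rng=random.Random(0)))
```

A denoising autoencoder would then be trained to map the output of `add_noise(sentence)` back to the original `sentence`, which prevents the encoder-decoder pair from learning a trivial copy operation.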

Abstract

The invention discloses a Mongolian-Chinese inter-translation method based on monolingual corpus training. The method uses two denoising autoencoders, one for Mongolian and one for Chinese; trains a Mongolian-Chinese inter-translation model on a monolingual source-language corpus and a monolingual target-language corpus; and sets three constraints during training, namely a denoising loss function for the autoencoders, a loss function for the translation process, and a loss function for the discriminator, so that the sum of the three losses is minimized. This improves the Mongolian-Chinese machine translation system and yields better translations. By making the fullest use of existing Mongolian and Chinese monolingual corpora, the method alleviates the poor quality of Mongolian-Chinese translations caused by the scarcity of Mongolian-Chinese parallel corpora. It also stimulates related research in Mongolian linguistics, provides research evidence for the development of machine translation and multilingual speech technologies, advances Mongolian information processing, and offers a reference for related research on other minority languages, and is therefore of some significance.
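
To make the three losses in the abstract concrete, here is a minimal numeric sketch in plain Python. It assumes token-level cross-entropy for the autoencoder and translation losses and binary cross-entropy for the discriminator; the abstract does not state which loss forms are used, and the function names and toy probabilities below are hypothetical.

```python
import math


def cross_entropy(probs, target_ids):
    """Mean negative log-likelihood of the target tokens.

    probs[t][k] is the model's probability of vocabulary item k at
    output position t; target_ids[t] is the reference token index.
    """
    nll = [-math.log(p[k]) for p, k in zip(probs, target_ids)]
    return sum(nll) / len(nll)


def discriminator_loss(p_is_mongolian, is_mongolian):
    """Binary cross-entropy for guessing the language of a latent vector."""
    p_correct = p_is_mongolian if is_mongolian else 1.0 - p_is_mongolian
    return -math.log(p_correct)


if __name__ == "__main__":
    # Toy probabilities over a 2-word vocabulary (illustrative values only).
    l_auto = cross_entropy([[0.9, 0.1], [0.2, 0.8]], [0, 1])   # reconstruction
    l_trans = cross_entropy([[0.6, 0.4], [0.3, 0.7]], [0, 1])  # translation
    l_disc = discriminator_loss(0.7, is_mongolian=True)
    # The quantity the abstract says training should minimize.
    print(l_auto + l_trans + l_disc)
```

In a real system these values would come from the LSTM autoencoders and a discriminator network; the sketch only shows how three separately computed losses combine into the single quantity being minimized.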

Description

Technical field

[0001] The invention belongs to the technical field of machine translation, and in particular relates to a Mongolian-Chinese inter-translation method based on monolingual corpus training.

Background technique

[0002] Machine translation, the study of how to use computers to convert automatically between natural languages, is one of the important research directions in artificial intelligence and natural language processing. As a key technology for breaking through the "language barrier" that different countries and nations face in information transmission, machine translation is of great significance for promoting national unity, strengthening cultural exchange, and promoting foreign trade. In recent years, machine translation has received more and more attention. On the one hand, there is urgent social demand for machine translation technology, and even imperfect machine translation is increasingly widely used in industry. Real-time voice trans...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/28
CPC: G06F40/53, G06F40/58
Inventor: 苏依拉, 牛向华, 赵亚平
Owner: INNER MONGOLIA UNIV OF TECH