
Mongolian-Chinese neural machine translation domain adaptation method based on curriculum learning

A domain adaptation method for machine translation that addresses the lack of corpus resources and the resulting insufficient support for model training, shortening convergence time and helping the model reach a better local minimum.

Status: Inactive. Publication Date: 2021-02-19
INNER MONGOLIA UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0004] However, parallel corpus resources for the Mongolian-Chinese language pair are scarce, and the domain-specific corpus needed to train a model for a particular domain is too small to support training.

Examples


Embodiment Construction

[0029] The implementation of the present invention will be described in detail below in conjunction with the drawings and examples.

[0030] As shown in Figure 1, the present invention is a curriculum-learning-based Mongolian-Chinese neural machine translation domain adaptation method built on the Transformer framework, comprising the following steps:

[0031] Step 1, corpus preparation

[0032] The prepared corpora are: a first out-of-domain parallel corpus for training the out-of-domain translation model, and an in-domain parallel corpus plus a second out-of-domain parallel corpus for mixed fine-tuning. Each of the three parts is segmented with BPE (byte pair encoding). Alternatively, the whole corpus can be processed with BPE first and then divided into the first out-of-domain parallel corpus, the in-domain parallel corpus, and the second out-of-domain parallel corpus. The number of sentences is in the tens of thousands.
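The three-way split in Step 1 can be sketched as follows. This is a minimal illustration and not the patent's implementation: the function name `prepare_corpora`, the `segment` callable standing in for a BPE segmenter, the random shuffle, and the choice to make the two out-of-domain parts disjoint are all assumptions introduced here.

```python
import random
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (Mongolian sentence, Chinese sentence)


def prepare_corpora(
    out_of_domain: List[Pair],
    in_domain: List[Pair],
    n_out_for_finetune: int,
    segment: Callable[[str], str] = lambda s: s,  # hypothetical stand-in for BPE
    seed: int = 0,
) -> Dict[str, List[Pair]]:
    """Split the data into the three parts of Step 1 and segment each part."""
    if not 0 <= n_out_for_finetune <= len(out_of_domain):
        raise ValueError("n_out_for_finetune must be between 0 and len(out_of_domain)")

    # Assumption: the two out-of-domain parts are disjoint random subsets.
    shuffled = list(out_of_domain)
    random.Random(seed).shuffle(shuffled)

    parts = {
        "out_of_domain_2": shuffled[:n_out_for_finetune],  # used in mixed fine-tuning
        "out_of_domain_1": shuffled[n_out_for_finetune:],  # trains the out-of-domain model
        "in_domain": list(in_domain),                      # used in mixed fine-tuning
    }
    # Apply the segmenter to both sides of every sentence pair.
    return {
        name: [(segment(src), segment(tgt)) for src, tgt in pairs]
        for name, pairs in parts.items()
    }
```

Passing a real BPE segmenter as `segment` gives the "split first, then BPE" order; segmenting the sentences beforehand and leaving `segment` at its identity default gives the "BPE first, then split" order that the paragraph also allows.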

[0033] In transfer learning, when the data distribution of ...


Abstract

The invention discloses a neural machine translation method based on curriculum learning. The method is built on the Transformer framework and comprises the following steps: first, the corpora to be used are processed with BPE; a model is then trained on the processed out-of-domain parallel corpus to obtain an out-of-domain translation model, which is used to initialize a new sub-model. Next, a data selection method computes the distance between the out-of-domain corpus and the in-domain corpus, and the data are fed to the sub-model in a defined order following a curriculum learning strategy. Finally, combined with dropout regularization, a translation model for the specific domain is obtained.
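The scoring and ordering step in the abstract can be sketched roughly as below. The visible text does not name the data selection method or the feeding order, so this sketch makes two labeled assumptions: the distance is a cross-entropy difference between add-one-smoothed unigram models (a simplified Moore-Lewis-style score), and sentences closest to the in-domain corpus are fed first. All function names are hypothetical, and dropout and model training are outside the scope of the example.

```python
import math
from collections import Counter
from typing import Callable, List


def make_unigram_lm(sentences: List[str], vocab_size: int) -> Callable[[str], float]:
    """Return a function giving per-token cross-entropy under an add-one unigram model."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values()) + vocab_size + 1  # +1 reserves mass for unseen tokens

    def cross_entropy(sentence: str) -> float:
        toks = sentence.split()
        if not toks:
            return 0.0
        return -sum(math.log((counts[t] + 1) / total) for t in toks) / len(toks)

    return cross_entropy


def domain_distance_scores(
    candidates: List[str], in_domain: List[str], out_domain: List[str]
) -> List[float]:
    """Assumed distance: H_in(s) - H_out(s); lower means closer to the in-domain corpus."""
    vocab = {tok for s in in_domain + out_domain + candidates for tok in s.split()}
    h_in = make_unigram_lm(in_domain, len(vocab))
    h_out = make_unigram_lm(out_domain, len(vocab))
    return [h_in(s) - h_out(s) for s in candidates]


def curriculum_stages(
    candidates: List[str], scores: List[float], n_stages: int
) -> List[List[str]]:
    """Sort by score (assumed order: closest first) and cut into at most n_stages groups."""
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    if not candidates:
        return []
    order = sorted(range(len(candidates)), key=lambda i: scores[i])
    ranked = [candidates[i] for i in order]
    size = math.ceil(len(ranked) / n_stages)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]
```

A training loop would then fine-tune the sub-model on each stage in turn. Whether the stages are used cumulatively or one at a time is a design choice that this text does not specify.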

Description

Technical field

[0001] The invention belongs to the technical field of machine translation, and in particular relates to a curriculum-learning-based Mongolian-Chinese neural machine translation domain adaptation method.

Background

[0002] With the rapid development of computer technology, computers play a growing role in people's lives and are becoming increasingly intelligent. In areas where humans are weak, such as computation, computers excel. In areas where humans are particularly strong, such as language and images, computers remain mediocre, which has given rise to two major research areas: natural language processing and computer vision. Natural language processing can be subdivided into multiple fields, one of which is machine translation (MT), which uses a computer to translate text or speech fragments in one language into text or speech fragments in another language. However, it is di...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/58, G06N3/04, G06N3/08
CPC: G06F40/58, G06N3/08, G06N3/045
Inventor: 苏依拉, 范婷婷, 卞乐乐, 薛媛, 赵旭, 仁庆道尔吉
Owner: INNER MONGOLIA UNIV OF TECH