Neural machine translation method oriented to small language

A machine translation technique for small languages, applied in the field of neural machine translation, that aims to improve translation quality

Active Publication Date: 2019-10-15
UNIV OF ELECTRONICS SCI & TECH OF CHINA
6 Cites, 30 Cited by

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is to provide a neural machine translation method for minority languages that addresses the problem of neural machine translation when parallel corpora are lacking.



Examples


Embodiment

[0070] The overall structure of the neural machine translation model in this embodiment includes four parts: a language model, a mapper, a discriminator, and a translation model, as shown in Figure 2. The implementation process of the model is shown in Figure 1. It mainly includes five parts: data preprocessing, language model training, mapper initialization, discriminator training, and translation model training.
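The five stages listed in [0070] run in a fixed order. The following is a minimal sketch of that ordering only. The names `STAGES`, `COMPONENTS`, `PipelineState`, and `run_pipeline` are hypothetical and do not come from the patent, and the stage handlers are placeholders rather than the patent's training procedures.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# The four model components named in paragraph [0070].
COMPONENTS = ("language_model", "mapper", "discriminator", "translation_model")

# The five implementation stages, in the order the embodiment lists them.
STAGES = (
    "data_preprocessing",
    "language_model_training",
    "mapper_initialization",
    "discriminator_training",
    "translation_model_training",
)


@dataclass
class PipelineState:
    """Hypothetical container that records which stages have finished."""
    completed: List[str] = field(default_factory=list)


def run_pipeline(handlers: Dict[str, Callable[[PipelineState], None]]) -> PipelineState:
    """Run every stage in the documented order; fail early if a handler is missing."""
    missing = [stage for stage in STAGES if stage not in handlers]
    if missing:
        raise KeyError(f"no handler for stages: {missing}")
    state = PipelineState()
    for stage in STAGES:
        handlers[stage](state)          # placeholder for the real stage logic
        state.completed.append(stage)
    return state


if __name__ == "__main__":
    # Assumption: no-op handlers stand in for the actual training code.
    noop_handlers = {stage: (lambda state: None) for stage in STAGES}
    print(run_pipeline(noop_handlers).completed)
```

Checking for missing handlers before any stage runs keeps a partially configured pipeline from training some components and then stopping midway.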

[0071] 1. Data preprocessing comprises two steps: collecting monolingual corpora and preprocessing the corpora. Specifically:

[0072] 1.1 Collect a large amount of monolingual corpus data in both the source language and the target language from the Internet, for example by crawling relevant websites;

[0073] 1.2 Preprocess the small-scale parallel corpus and the monolingual corpora, specifically including:

[0074] 1.2.1 Word segmentation: segment the source-language and target-language sentences into words;

[0075] 1.2.2...
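Step 1.2.1 can be illustrated with a minimal sketch. The source does not say which segmentation tool is used, so the regular-expression rule below is an assumption, and `segment` and `segment_parallel` are hypothetical helper names. The rule only suits languages that separate words with whitespace. A language written without spaces would need a dedicated segmenter.

```python
import re
from typing import List, Sequence, Tuple

# Assumption: a token is either a run of word characters or a single
# punctuation mark. This is an illustrative rule, not the patent's method.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def segment(sentence: str) -> List[str]:
    """Split one sentence into word and punctuation tokens."""
    return _TOKEN_RE.findall(sentence)


def segment_parallel(
    pairs: Sequence[Tuple[str, str]],
) -> List[Tuple[List[str], List[str]]]:
    """Segment both sides of each (source, target) sentence pair."""
    return [(segment(src), segment(tgt)) for src, tgt in pairs]


if __name__ == "__main__":
    print(segment("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

The same `segment` function can be applied to the monolingual corpora, so that all corpora share one tokenization.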



Abstract

The invention relates to the technical field of neural machine translation and discloses a neural machine translation method oriented to small languages. The method addresses the problem of neural machine translation when parallel corpora are lacking. A neural machine translation model is constructed and trained through the following steps: (1) obtaining and preprocessing monolingual corpora; (2) training a language model for the source language and a language model for the target language on the monolingual corpora; (3) training mappers, each of which maps the encoding of one language into the space of the other language, using the encodings that the source and target language models produce for the bilingual parallel corpus of the small language; (4) training a discriminator model on the monolingual corpora; and (5) training a translation model using the language models, the mappers, the discriminator model, the bilingual parallel corpus, and the monolingual corpora. The method is suitable for translation between small languages that have only small-scale parallel corpora.
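The mapper in step 3 carries an encoding from one language's space into the other's, and is fitted on encodings of parallel sentence pairs. The abstract does not state the mapper's form, so the sketch below assumes a plain linear map trained by gradient descent on mean squared error. The names `apply_map` and `train_linear_mapper` are hypothetical, and the toy vectors stand in for real language-model encodings.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def apply_map(weights: Matrix, vec: Sequence[float]) -> Vector:
    """Multiply a row-major weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def train_linear_mapper(
    src: Sequence[Sequence[float]],
    tgt: Sequence[Sequence[float]],
    lr: float = 0.1,
    epochs: int = 500,
) -> Matrix:
    """Fit W so that W @ src[k] approximates tgt[k] for every aligned pair.

    Assumption: a linear mapper with a mean-squared-error loss; the source
    does not specify the mapper architecture or its objective.
    """
    dim_in, dim_out, n = len(src[0]), len(tgt[0]), len(src)
    weights = [[0.0] * dim_in for _ in range(dim_out)]
    for _ in range(epochs):
        grad = [[0.0] * dim_in for _ in range(dim_out)]
        for x, y in zip(src, tgt):
            pred = apply_map(weights, x)
            for i in range(dim_out):
                err = pred[i] - y[i]
                for j in range(dim_in):
                    grad[i][j] += 2.0 * err * x[j] / n
        for i in range(dim_out):
            for j in range(dim_in):
                weights[i][j] -= lr * grad[i][j]
    return weights


if __name__ == "__main__":
    # Toy "encodings": the target space swaps the two source coordinates.
    source_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    target_enc = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    mapper = train_linear_mapper(source_enc, target_enc)
    print([round(v, 3) for v in apply_map(mapper, [1.0, 0.0])])
```

A second mapper for the opposite direction would be trained the same way with the two argument lists swapped.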

Description

Technical field

[0001] The invention relates to the technical field of neural machine translation, in particular to a neural machine translation method for a small language with only a small-scale parallel corpus.

Background technique

[0002] Machine translation is a branch of natural language processing and one of the goals of artificial intelligence. With the development of neural network-related theories and technologies, research on machine translation has gradually shifted from traditional statistics-based machine translation to neural network-based machine translation. Neural machine translation has become one of the research focuses of current scholars. While promoting the development of theory and technology, it has also played an important role in promoting world economic and cultural exchanges.

[0003] Neural machine translation has some characteristics of neural networks, such as: large data requirements, high computing performance requirements, etc., but also ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/28, G06N3/04, G06N3/08
CPC: G06N3/08, G06F40/58, G06N3/044, G06N3/045
Inventor: 田玲, 朱大勇, 秦科, 罗光春, 杨洋
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA