Chinese-Vietnamese unsupervised neural machine translation method fusing EMD minimized bilingual dictionary

A bilingual dictionary and machine translation technology, applied in the field of machine translation, can solve problems such as poor translation quality

Active Publication Date: 2020-10-09
KUNMING UNIV OF SCI & TECH
View PDF4 Cites 15 Cited by

AI Technical Summary

Problems solved by technology

However, neural machine translation requires a large-scale parallel corpus to achieve good results. When training data is insufficient, translation quality is poor.



Examples


Embodiment 1

[0055] Embodiment 1: As shown in Figures 1-7, the Chinese-Vietnamese unsupervised neural machine translation method fusing the EMD minimized bilingual dictionary proceeds as follows. Step1, first obtain monolingual corpora: 58 million Chinese monolingual sentences crawled from the Internet, and 30 million Vietnamese monolingual sentences.

[0056] Step2, corpus preprocessing: on the basis of Step1, the Chinese and Vietnamese monolingual sentences are word-segmented and part-of-speech tagged, and monolingual word vectors are then trained. Vietnamese word segmentation and part-of-speech tagging are performed with the underthesea Vietnamese NLP toolkit; Chinese word segmentation and part-of-speech tagging are performed with the jieba word segmentation tool. word2vec is used to train the Chinese and Vietnamese monolingual word vectors. Both Chinese and Vietnamese use 300-dimensional word vectors. The 300-dimensional word embeddings are trained using the skip-gr...
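The preprocessing step feeds segmented sentences to word2vec. As a rough illustration only, and assuming the truncated "skip-gr..." refers to the word2vec skip-gram objective, the sketch below shows how (center, context) training pairs could be derived from one segmented sentence. The function name `skipgram_pairs`, the window size, and the toy sentences are hypothetical and do not come from the patent; the actual segmentation would be done by jieba and underthesea, and the actual training by word2vec.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for one already-segmented sentence."""
    pairs = []
    for i, center in enumerate(tokens):
        # Context positions lie within `window` tokens of the center word.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical toy sentences, assumed to be segmented already
# (illustrative only, not taken from the patent's corpus).
zh_tokens = ["我", "喜欢", "机器", "翻译"]
vi_tokens = ["tôi", "thích", "dịch", "máy"]

zh_pairs = skipgram_pairs(zh_tokens, window=1)
vi_pairs = skipgram_pairs(vi_tokens, window=1)
```

Each language is processed separately here, which matches the text's description of training monolingual vectors for Chinese and Vietnamese independently before any bilingual alignment.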



Abstract

The invention relates to a Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD minimized bilingual dictionary, and belongs to the technical field of machine translation. The method comprises the steps of collecting corpora by crawling Chinese and Vietnamese monolingual sentences with a web crawler; training monolingual word embeddings for Chinese and Vietnamese respectively, and obtaining a Chinese-Vietnamese bilingual dictionary through training that minimizes the EMD between the word embedding distributions; taking the dictionary as a seed dictionary for training to obtain Chinese-Vietnamese bilingual word embeddings; and finally, applying the bilingual word embeddings to an unsupervised machine translation model with a shared encoder to construct the Chinese-Vietnamese neural machine translation method fusing the EMD minimized bilingual dictionary. According to the method, the performance of Chinese-Vietnamese unsupervised neural machine translation can be effectively improved.
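To make the idea of an EMD-based dictionary concrete, here is a minimal sketch on toy data. It assumes the two sets of embeddings already live in a comparable space, have equal size, and carry uniform weights; under those assumptions an optimal transport plan is a one-to-one matching, so the EMD can be found by brute force over permutations. The patent's actual training procedure is not shown in this excerpt and is not reproduced here; the helper `emd_uniform`, the 2-dimensional vectors, and the word lists are all hypothetical.

```python
import itertools
import math
from typing import List, Tuple


def emd_uniform(src: List[List[float]],
                tgt: List[List[float]]) -> Tuple[float, Tuple[int, ...]]:
    """EMD between two equal-size point sets with uniform weights.

    Returns the minimal average Euclidean transport cost and the
    matching (as a permutation of target indices) that achieves it.
    Brute force: only feasible for tiny toy inputs.
    """
    n = len(src)
    assert n == len(tgt), "this sketch assumes equal vocabulary sizes"
    best_cost, best_perm = math.inf, tuple(range(n))
    for perm in itertools.permutations(range(n)):
        cost = sum(math.dist(src[i], tgt[perm[i]]) for i in range(n)) / n
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm


# Hypothetical toy embeddings (2-dimensional for readability; the text uses 300).
zh_words = ["猫", "狗", "书"]
zh_vecs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
vi_words = ["sách", "mèo", "chó"]
vi_vecs = [[1.1, 1.0], [0.0, 0.9], [1.0, 0.1]]

emd, perm = emd_uniform(zh_vecs, vi_vecs)
# The matching that minimizes the EMD is read off as a small dictionary.
dictionary = {zh_words[i]: vi_words[perm[i]] for i in range(len(zh_words))}
```

In this toy case the matching pairs each Chinese word with its nearest Vietnamese neighbour. A realistic vocabulary would need an optimal transport solver rather than enumeration, and the resulting pairs would serve only as a seed dictionary for the later bilingual embedding step.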

Description

Technical field

[0001] The invention relates to a Chinese-Vietnamese unsupervised neural machine translation method that integrates an EMD (Earth Mover's Distance) minimized bilingual dictionary, and belongs to the technical field of machine translation.

Background technique

[0002] Neural machine translation is a machine translation method proposed in recent years. Its quality has surpassed that of statistical machine translation on multiple language pairs, making it the mainstream translation method. However, neural machine translation requires a large-scale parallel corpus to achieve good results; when training data is insufficient, translation quality is poor. Parallel corpora between Chinese and Vietnamese are scarce and not easy to obtain, so Chinese-Vietnamese machine translation is a typical low-resource machine translation task. However, large amounts of monolingual corpus are available for both Chinese and Vietnamese. This paper explores the...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/58, G06F40/289, G06F40/284, G06F40/242, G06F40/247, G06F16/951
CPC: G06F16/951
Inventor: 余正涛, 薛明亚, 高盛祥, 赖华, 翟家欣, 朱恩昌, 陈玮
Owner: KUNMING UNIV OF SCI & TECH