
Chinese-Vietnamese unsupervised neural machine translation method based on shared encoder

A machine translation and encoder technology, applied in the field of Chinese-Vietnamese unsupervised neural machine translation.

Pending Publication Date: 2021-01-29
KUNMING UNIV OF SCI & TECH
6 cites, 4 cited by

AI Technical Summary

Problems solved by technology

Pivot-language and semi-supervised methods are currently used to address the low-resource problem, but these methods still require a large amount of cross-lingual information.



Examples


Detailed Description of Embodiments

[0022] As shown in Figures 1-2, the specific steps of the shared-encoder-based Chinese-Vietnamese unsupervised neural machine translation method are as follows:

[0023] Step 1. First, obtain monolingual corpora for Chinese and Vietnamese, and use them to train the monolingual word embedding matrices X and Y, where $X_{i*}$ is the embedding of the i-th source-language word and $Y_{j*}$ is the embedding of the j-th target-language word. Represent the dictionary as a binary matrix D, with $D_{ij} = 1$ when the i-th source-language word and the j-th target-language word are aligned with each other. The goal of learning the word mapping is to find the optimal mapping matrix $W^*$ that minimizes the Euclidean distance between the mapped $X_{i*}$ and $Y_{j*}$. The formula is as follows:

[0024]
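The formula is not legible in this copy. A reconstruction consistent with the definitions in [0023], assuming the objective sums squared Euclidean distances over the aligned dictionary entries (the squaring is an assumption, not stated in the visible text), would read:

```latex
W^{*} = \arg\min_{W} \sum_{i} \sum_{j} D_{ij} \, \lVert X_{i*} W - Y_{j*} \rVert^{2}
```

Here the term for a pair (i, j) contributes only when $D_{ij} = 1$, so unaligned word pairs do not affect $W^*$.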

[0025] Length-normalize and center the...
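The mapping objective described in Step 1 can be illustrated with a minimal NumPy sketch. It assumes an unconstrained least-squares solution over the aligned pairs; the visible text does not say whether the method additionally constrains W (for example to be orthogonal), and it omits the normalization in [0025]. The function name `learn_mapping` and the toy data are hypothetical.

```python
import numpy as np

def learn_mapping(X, Y, D):
    """Return W minimizing sum_ij D[i, j] * ||X[i] @ W - Y[j]||^2.

    Assumption: plain least squares with no constraint on W.
    """
    src_idx, tgt_idx = np.nonzero(D)   # aligned (i, j) pairs where D[i, j] == 1
    A = X[src_idx]                     # source embeddings of dictionary entries
    B = Y[tgt_idx]                     # matching target embeddings
    W, *_ = np.linalg.lstsq(A, B, rcond=None)
    return W

# Toy data (hypothetical): 5 source words, 5 target words, 3-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
true_W = rng.normal(size=(3, 3))
Y = X @ true_W                         # target space is an exact linear image of the source
D = np.eye(5, dtype=int)               # word i is aligned with word i
W = learn_mapping(X, Y, D)
```

On this toy data the target embeddings are an exact linear image of the source embeddings, so the recovered `W` matches `true_W` up to floating-point error. Real embeddings are only approximately related, and the fit is then a best approximation over the seed dictionary.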


Abstract

The invention relates to a Chinese-Vietnamese unsupervised neural machine translation method based on a shared encoder. The method uses only Chinese and Vietnamese monolingual corpora and is trained in an unsupervised manner. First, aligned numerals are used as a seed dictionary to train Chinese-Vietnamese bilingual word embeddings. The bilingual word embeddings are then applied to a shared-encoder model, which maps Chinese and Vietnamese into the same semantic space, and a decoder decodes from that space to realize shared-encoder-based Chinese-Vietnamese unsupervised neural machine translation. Compared with GNMT and Transformer, the Chinese-Vietnamese unsupervised neural machine translation model has a substantial advantage under extremely low-resource conditions. In addition, a semi-supervised translation model obtained by adding a small amount of parallel corpus to the unsupervised model outperforms a supervised translation model trained directly on the same amount of parallel corpus.
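The seed-dictionary step can be sketched as follows, assuming "number alignment" means that a numeral token written identically in both vocabularies is treated as a translation pair. The function name `numeral_seed_dictionary` and the toy vocabularies are hypothetical, not taken from the patent.

```python
def numeral_seed_dictionary(src_vocab, tgt_vocab):
    """Pair every digit-only token that occurs in both vocabularies with itself.

    Returns a list of (source_index, target_index) pairs, i.e. the positions
    where the binary dictionary matrix D would be 1.
    """
    tgt_index = {word: j for j, word in enumerate(tgt_vocab)}
    pairs = []
    for i, word in enumerate(src_vocab):
        if word.isdigit() and word in tgt_index:
            pairs.append((i, tgt_index[word]))
    return pairs

# Toy vocabularies (hypothetical).
zh_vocab = ["我", "2020", "翻译", "15", "7"]
vi_vocab = ["tôi", "15", "dịch", "2020", "100"]
seed = numeral_seed_dictionary(zh_vocab, vi_vocab)
```

Only "2020" and "15" occur in both vocabularies, so they form the seed pairs. Ordinary words are left for the learned embedding mapping to align.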

Description

Technical field

[0001] The invention relates to a Chinese-Vietnamese unsupervised neural machine translation method based on a shared encoder, and belongs to the technical field of natural language processing.

Background

[0002] In Vietnamese natural language processing, Chinese-Vietnamese machine translation faces a practical problem: Chinese and Vietnamese differ considerably and share no cognates, and while monolingual corpora are plentiful, large-scale, high-quality bilingual parallel corpora are scarce. Chinese-Vietnamese translation is therefore a typical low-resource machine translation scenario. At present, pivot-language and semi-supervised methods are used to address the low-resource problem, but these methods still require a large amount of cross-lingual information. Therefore, the unsupervised Chinese-Vietnamese machine translation method using only monolingual corpus is studied, a...


Application Information

IPC(8): G06F40/58, G06F40/126, G06F40/242, G06F40/30, G06N3/04, G06N3/08
CPC: G06F40/58, G06F40/30, G06F40/242, G06F40/126, G06N3/08, G06N3/044, G06N3/045
Inventors: 余正涛, 薛振宇, 文永华, 郭军军, 王振晗, 相艳
Owner: KUNMING UNIV OF SCI & TECH