Neural machine translation method of merging multilingual coded information

A neural machine translation technique that fuses encoded information from multiple source languages, addressing the low translation accuracy of prior approaches.

Active Publication Date: 2017-11-17
HARBIN INST OF TECH
5 Cites, 58 Cited by

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to solve the problem of low translation accuracy in the prior art by proposing a neural machine translation method that fuses multilingual encoded information.



Examples


Specific Embodiment 1

[0049] Specific Embodiment 1: This embodiment is described with reference to Figure 1 and Figure 2. The specific process of the neural machine translation method fusing multilingual encoded information in this embodiment is as follows:

[0050] Step 1. Use the tokenization script tokenizer.perl provided by the statistical machine translation platform Moses to segment the trilingual (Chinese and English to Japanese) parallel corpus to be processed. Then use BPE (Byte Pair Encoding; the learn_bpe.py script under the Nematus platform) to represent the segmented trilingual parallel corpus as a sequence of subword symbols for each language, and use the build_dictionary.py script under the Nematus platform to build the source input language dictionaries dic_s1 and dic_s2 and the target language dictionary dic_t;

[0051] Step 2. Based on the source input language dictionary dic_s1, the subword symbol sequence...
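The subword segmentation and dictionary construction of Step 1 can be sketched as follows. This is a minimal illustration of the byte pair encoding idea, not the code of the Nematus learn_bpe.py or build_dictionary.py scripts. The function names, the end-of-word marker `</w>`, the tie-breaking rule, and the reserved entries `<eos>` and `<unk>` are all assumptions made for the example.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to num_merges BPE merge operations from tokenized words."""
    # Each word becomes a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties broken by the pair itself (assumption).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges, vocab


def build_dictionary(vocab):
    """Map each subword symbol to an integer id, most frequent first."""
    counts = Counter()
    for symbols, freq in vocab.items():
        for s in symbols:
            counts[s] += freq
    # Ids 0 and 1 are reserved here for hypothetical special tokens.
    dic = {"<eos>": 0, "<unk>": 1}
    for s in sorted(counts, key=lambda s: (-counts[s], s)):
        dic[s] = len(dic)
    return dic


# Toy source-side corpus standing in for one language of the trilingual data.
merges, vocab = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
dic_s1 = build_dictionary(vocab)
```

In the method described above, the same procedure would be run once per language, giving one dictionary for each of the two source languages and one for the target language.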

Specific Embodiment 2

[0066] Specific Embodiment 2: This embodiment differs from Specific Embodiment 1 in the following respect. In Step 3, a bidirectional recurrent encoder is built from a recurrent neural network of GRU units. The encoder encodes the word vectors W = (w_1, w_2, ..., w_T) and W' = (w'_1, w'_2, ..., w'_{T'}) to obtain the encoding vector ctx_s1 of W and the encoding vector ctx_s2 of W'. The specific process is:

[0067] Step 3.1:

[0068] For W = (w_1, w_2, ..., w_T), the bidirectional encoder computes the forward encoding state information following the forward word order.

[0069] For W = (w_1, w_2, ..., w_T), the bidirectional encoder computes the reverse encoding state information following the reverse word order.

[0070] For W' = (w'_1, w'_2, ..., w'_{T'}), the bidirectional encoder computes the forward encoding state informa...
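The forward and reverse passes described in Step 3.1 can be illustrated with a small bidirectional GRU encoder. The text here does not reproduce the encoder formulas, so this sketch uses the standard GRU equations with biases omitted, and it assumes that the forward and reverse states at each position are concatenated to form the encoding. The parameter names, dimensions, and random initialization are hypothetical.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_gru(in_dim, hid_dim, rng):
    """Random GRU weights: W* act on the input, U* on the previous state."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {k: mat(hid_dim, in_dim if k[0] == "W" else hid_dim)
            for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}


def gru_step(p, x, h):
    """One GRU update: update gate z, reset gate r, candidate state."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    # One common convention; some papers swap the roles of z and (1 - z).
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_gru(p, seq, hid_dim):
    """Run the GRU over seq from a zero state, returning every hidden state."""
    h = [0.0] * hid_dim
    states = []
    for x in seq:
        h = gru_step(p, x, h)
        states.append(h)
    return states


def encode_bidirectional(fwd, bwd, seq, hid_dim):
    """Concatenate forward and reverse states position by position."""
    forward = run_gru(fwd, seq, hid_dim)
    # Read the sequence in reverse, then realign states to original positions.
    backward = run_gru(bwd, seq[::-1], hid_dim)[::-1]
    return [f + b for f, b in zip(forward, backward)]


rng = random.Random(0)
W = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]  # T = 5 word vectors
fwd, bwd = init_gru(4, 3, rng), init_gru(4, 3, rng)
ctx_s1 = encode_bidirectional(fwd, bwd, W, 3)
```

The second source sequence W' would be encoded the same way to give ctx_s2. Whether the two encoders share parameters is not stated in this excerpt.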

Specific Embodiment 3

[0075] Specific Embodiment 3: This embodiment is described with reference to Figure 1 and Figure 3. The specific process of the neural machine translation method fusing multilingual encoded information in this embodiment is as follows:

[0076] Step 1): Use the tokenization script tokenizer.perl provided by the statistical machine translation platform Moses to segment the trilingual (Chinese and English to Japanese) parallel corpus to be processed. Then use BPE (Byte Pair Encoding; the learn_bpe.py script under the Nematus platform) to represent the segmented trilingual parallel corpus as a sequence of subword symbols for each language, and use the build_dictionary.py script under the Nematus platform to build the source input language dictionaries dic_s1 and dic_s2 and the target language dictionary dic_t;

[0077] Step 2): Based on the source input language dictionary dic_s1, the subc...



Abstract

The invention provides a neural machine translation method that fuses multilingual encoded information, and relates to neural machine translation methods. Its purpose is to solve the low translation accuracy of the prior art. The process comprises: first, obtaining the subword symbol sequence for each language and building dic_s1, dic_s2 and dic_t; second, feeding the word vectors into an NMT model for training, updating the word vectors from their initial values during training, until the BLEU score of the NMT model has increased by 1-3 points; third, obtaining ctx_s1 and ctx_s2; fourth, obtaining the fused result; fifth, obtaining C; sixth, computing q_{t+1} at time t+1 according to the formula to obtain the probability distribution p_{t+1} of the target-language word y'_{t+1} at time t+1, and sampling the target word y'_{t+1} at time t+1 according to p_{t+1}, until the end-of-sentence tag is decoded and decoding is complete. The method is used in the field of machine translation.
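The sixth step, sampling one target word at a time until the end-of-sentence tag appears, can be sketched as a simple loop. The formula for q_{t+1} is not given in this excerpt, so the decoder is represented by a hypothetical callback `step_fn` that returns unnormalized scores. The softmax normalization, the `</s>` tag, and the `max_len` cutoff are likewise assumptions for illustration.

```python
import math
import random


def softmax(scores):
    """Turn unnormalized scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_decode(step_fn, vocab, eos="</s>", max_len=50, rng=None):
    """Sample target words step by step until eos or max_len is reached."""
    rng = rng or random.Random()
    output, state = [], None
    for _ in range(max_len):
        # step_fn stands in for the decoder: it plays the role of q_{t+1}.
        scores, state = step_fn(output, state)
        probs = softmax(scores)  # plays the role of p_{t+1}
        word = rng.choices(vocab, weights=probs, k=1)[0]
        if word == eos:
            break
        output.append(word)
    return output


# Toy decoder: one-hot scores make the sampled sequence deterministic.
vocab = ["y1", "y2", "y3", "</s>"]
script = [0, 1, 2, 3]


def toy_step(prefix, state):
    target = script[len(prefix)]
    scores = [0.0 if i == target else float("-inf") for i in range(len(vocab))]
    return scores, state


result = sample_decode(toy_step, vocab, rng=random.Random(0))
```

In a real model the scores would depend on the fused encoder context and the previous decoder state rather than on a fixed script.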

Description

Technical field

[0001] The present invention relates to neural machine translation methods.

Background

[0002] Machine translation is the process of using a computer to convert text in a source language into a target language. Before the rise of neural networks, mainstream statistical machine translation built a statistical translation model through statistical analysis of large parallel corpora and then relied on structures such as syntax trees and word alignments. The process was cumbersome and the models were complicated to implement. With the development and spread of neural networks, researchers have been applying them to machine translation. The most mainstream approach today, the end-to-end neural machine translation model, requires neither complex structures nor laborious hand-crafted features; instead, a neural network is used to map the source language to the target language, and th...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/28, G06F17/27
CPC: G06F40/247, G06F40/289, G06F40/58
Inventor: 朱聪慧, 曹海龙, 赵铁军, 刘笛, 杨沐昀, 郑德权, 徐冰
Owner: HARBIN INST OF TECH