Neural machine translation method of merging multilingual coded information

A technology for encoding information and machine translation, applied in the field of neural machine translation, which can solve problems such as low translation accuracy

Active Publication Date: 2017-11-17

HARBIN INST OF TECH

5 Cites, 58 Cited by


Problems solved by technology

[0006] The purpose of the present invention is to solve the problem of low translation accuracy in the prior art, and to propose a neural machine translation method that fuses multilingual encoding information.


Examples


Specific Embodiment 1

[0049] Specific Embodiment 1: This embodiment is described with reference to Figure 1 and Figure 2. The specific process of the neural machine translation method that fuses multilingual encoding information in this embodiment is as follows:

[0050] Step 1. Use the tokenization script tokenizer.perl provided by the statistical machine translation platform Moses to tokenize the trilingual (Chinese and English to Japanese) parallel corpus to be processed. Then use BPE (Byte Pair Encoding; the learn_bpe.py script under the Nematus platform) to represent the tokenized trilingual parallel corpus as a series of subword symbol sequences for each language, and use the build_dictionary.py script under the Nematus platform to build the source input language dictionaries dic_s1 and dic_s2 and the target language dictionary dic_t;

[0051] Step 2. Based on the source input language dictionary dic_s1, for the subword symbol sequence...
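The subword segmentation and dictionary building in Step 1 can be illustrated with a minimal sketch. This is a toy re-implementation of the general BPE idea, not the learn_bpe.py or build_dictionary.py scripts themselves. The function names, the "</w>" end-of-word marker, the tie-breaking rule, and the reserved "eos" and "UNK" entries are all assumptions made for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge operations from a list of tokenized words."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(word) + ("</w>",) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself (assumption).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment one word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


def build_dictionary(sequences):
    """Map each subword to an integer id, most frequent first.

    Ids 0 and 1 are reserved for hypothetical "eos" and "UNK" entries.
    """
    counts = Counter(tok for seq in sequences for tok in seq)
    dictionary = {"eos": 0, "UNK": 1}
    for tok in sorted(counts, key=lambda t: (-counts[t], t)):
        if tok not in dictionary:
            dictionary[tok] = len(dictionary)
    return dictionary


# Toy stand-in for one tokenized source-language corpus.
corpus = ["low", "low", "lower"]
merges = learn_bpe(corpus, num_merges=2)
segmented = [apply_bpe(word, merges) for word in corpus]
dic_s1 = build_dictionary(segmented)
```

In this toy run the frequent stem "low" becomes a single subword, while the rarer suffix of "lower" stays split into characters. In the method described above, the same procedure would be applied separately to each of the three languages to obtain dic_s1, dic_s2 and dic_t.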

Specific Embodiment 2

[0066] Specific Embodiment 2: This embodiment differs from Specific Embodiment 1 in that, in Step 3, a bidirectional recurrent encoder is built from a recurrent neural network of GRU units. The bidirectional recurrent encoder encodes the word vectors W = (w_1, w_2, ..., w_T) and W' = (w'_1, w'_2, ..., w'_T') to obtain the encoding vector ctx_s1 of W and the encoding vector ctx_s2 of W'. The specific process is:

[0067] Step 3.1:

[0068] For W = (w_1, w_2, ..., w_T), the bidirectional encoder calculates the forward encoding state information according to the forward word sequence.

[0069] For W = (w_1, w_2, ..., w_T), the bidirectional encoder calculates the reverse encoding state information according to the reverse word sequence.

[0070] For W' = (w'_1, w'_2, ..., w'_T'), the bidirectional encoder calculates the forward encoding state information...
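The bidirectional GRU encoding described in this embodiment can be sketched in plain Python as follows. The patent's own formulas are not reproduced in the text above, so this sketch uses the standard GRU update and assumes that the forward and reverse states are concatenated at each position. The parameter names, dimensions, and random initialization are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_gru(input_dim, hidden_dim, rng):
    """Create small random GRU weights (hypothetical initialization)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    # W* matrices act on the input, U* matrices act on the previous state.
    return {name: mat(hidden_dim, input_dim if name.startswith("W") else hidden_dim)
            for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}


def gru_step(p, x, h):
    """One standard GRU update; gate conventions vary between toolkits."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], reset_h))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_gru(p, inputs, hidden_dim):
    """Run a GRU over a sequence and return the state at every position."""
    h = [0.0] * hidden_dim
    states = []
    for x in inputs:
        h = gru_step(p, x, h)
        states.append(h)
    return states


def bidirectional_encode(fwd, bwd, word_vectors, hidden_dim):
    """Encode in forward and reverse word order, then concatenate per position."""
    forward = run_gru(fwd, word_vectors, hidden_dim)
    # Reverse the input, run the second GRU, then restore the original order.
    backward = run_gru(bwd, word_vectors[::-1], hidden_dim)[::-1]
    return [f + b for f, b in zip(forward, backward)]


rng = random.Random(0)
input_dim, hidden_dim, T = 3, 4, 5
# Toy word vectors standing in for W = (w_1, ..., w_T).
W = [[rng.uniform(-1, 1) for _ in range(input_dim)] for _ in range(T)]
fwd = init_gru(input_dim, hidden_dim, rng)
bwd = init_gru(input_dim, hidden_dim, rng)
ctx_s1 = bidirectional_encode(fwd, bwd, W, hidden_dim)
```

Under these assumptions, ctx_s1 holds one vector of size 2 * hidden_dim per source position. Encoding W' with a second encoder in the same way would give ctx_s2.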

Specific Embodiment 3

[0075] Specific Embodiment 3: This embodiment is described with reference to Figure 1 and Figure 3. The specific process of the neural machine translation method that fuses multilingual encoding information in this embodiment is as follows:

[0076] Step 1). Use the tokenization script tokenizer.perl provided by the statistical machine translation platform Moses to tokenize the trilingual (Chinese and English to Japanese) parallel corpus to be processed. Then use BPE (Byte Pair Encoding; the learn_bpe.py script under the Nematus platform) to represent the tokenized trilingual parallel corpus as a series of subword symbol sequences for each language, and use the build_dictionary.py script under the Nematus platform to build the source input language dictionaries dic_s1 and dic_s2 and the target language dictionary dic_t;

[0077] Step 2). Based on the source input language dictionary dic_s1, for the subword...


Abstract

The invention provides a neural machine translation method that fuses multilingual encoding information, and relates to neural machine translation methods. Its purpose is to solve the problem of low translation accuracy in the prior art. The process comprises: first, obtaining the subword symbol sequences for each language and building the dictionaries dic_s1, dic_s2 and dic_t; second, inputting the word vectors into an NMT model for training, with the word vectors updated from their initial values during training until the BLEU score of the NMT model has increased by 1-3 points; third, obtaining ctx_s1 and ctx_s2; fourth, obtaining the fused result; fifth, obtaining C; sixth, calculating q_{t+1} at time t+1 according to the formula to obtain the probability distribution p_{t+1} of the target-language word y'_{t+1} at time t+1, and sampling the target word y'_{t+1} according to p_{t+1}, until the end-of-sentence tag is decoded and the translation is complete. The method is used in the field of machine translation.
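The sixth step, sampling target words until the end-of-sentence tag is decoded, can be sketched as a simple loop. The formula for q_{t+1} is not reproduced in the abstract, so this sketch assumes that p_{t+1} is a softmax over q_{t+1}. The scoring function, the vocabulary, and the "eos" tag name are hypothetical stand-ins for the trained decoder.

```python
import math
import random


def softmax(scores):
    """Turn raw scores q into a probability distribution p (assumed normalization)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_translation(score_fn, vocab, eos="eos", max_len=50, rng=None):
    """Sample one target word per time step until the end-of-sentence tag appears."""
    rng = rng or random.Random()
    output = []
    for _ in range(max_len):
        q_next = score_fn(output)      # scores for time t+1 given words so far
        p_next = softmax(q_next)       # distribution over the target vocabulary
        word = rng.choices(vocab, weights=p_next, k=1)[0]
        if word == eos:                # closing tag decoded: translation finished
            break
        output.append(word)
    return output


def make_toy_scorer(vocab, reference, eos="eos"):
    """Hypothetical stand-in for a trained decoder: strongly prefers `reference`."""
    def score_fn(prefix):
        target = reference[len(prefix)] if len(prefix) < len(reference) else eos
        return [50.0 if word == target else 0.0 for word in vocab]
    return score_fn


vocab = ["eos", "UNK", "low", "e", "r"]
scorer = make_toy_scorer(vocab, ["low", "e", "r"])
translation = sample_translation(scorer, vocab, rng=random.Random(0))
```

With a real decoder, score_fn would compute q_{t+1} from the previous decoder state, the previously sampled word, and the context obtained from the fused encodings. The max_len cap is an added safeguard in case the end tag is never sampled.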

Description

Technical field

[0001] The present invention relates to neural machine translation methods.

Background

[0002] Machine translation is the process of using a computer to convert a source language into a target language. Before the rise of neural networks, mainstream statistical machine translation built a statistical translation model through statistical analysis of large parallel corpora, relying on structures such as syntax trees and word alignment. The process was cumbersome, and the models were complicated to implement. With the development and spread of neural networks, researchers began applying them to machine translation. The now-mainstream end-to-end neural machine translation model does not require complex structures or laborious hand-crafted features; instead, a neural network is used to map the source language to the target language, and th...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F17/28, G06F17/27

CPC: G06F40/247, G06F40/289, G06F40/58

Inventor: 朱聪慧, 曹海龙, 赵铁军, 刘笛, 杨沐昀, 郑德权, 徐冰

Owner: HARBIN INST OF TECH
