Machine translation method for semantic vector based on multilingual parallel corpus

A machine translation method based on semantic vectors of a multilingual parallel corpus. Applied in the field of machine translation, it addresses problems such as the curse of dimensionality and the inability to measure the relationship between two words.

Active Publication Date: 2016-12-07
黑龙江省工研院资产经营管理有限公司

AI Technical Summary

Problems solved by technology

The traditional one-hot representation is simple but sparse: words are isolated from one another, so it is impossible to measure whether two words are related, and it leads to the curse of dimensionality in certain tasks.
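
The isolation of words under a one-hot representation can be illustrated with a minimal sketch. The toy vocabulary and the helper names `one_hot` and `cosine` are assumptions made for illustration and do not come from the patent. Because any two distinct one-hot vectors are orthogonal, their cosine similarity is always zero, whatever the words mean.

```python
import math

def one_hot(index, vocab_size):
    """Return a one-hot vector: 1.0 at `index`, 0.0 elsewhere."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vocabulary, for illustration only.
vocab = ["cat", "dog", "car", "truck"]
vectors = {word: one_hot(i, len(vocab)) for i, word in enumerate(vocab)}

# Related and unrelated word pairs look identical: both are orthogonal.
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_car = cosine(vectors["cat"], vectors["car"])
sim_cat_cat = cosine(vectors["cat"], vectors["cat"])
```

The vector length equals the vocabulary size, so a realistic vocabulary produces very high-dimensional, almost entirely zero vectors, which is the sparsity the text refers to.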


Examples


Specific Embodiment 1

[0047] Specific Embodiment 1: In this embodiment, the machine translation method based on semantic vectors of a multilingual parallel corpus is carried out according to the following steps:

[0048] Step 1. Because the whole process is in effect a two-way implicit splicing of vectors, the model is called a vector-based implicit splicing model. Figure 1 depicts the process of the model. As the figure shows, the translation system from source language 1 to the target language and the translation system from source language 2 to the target language are not completely independent, and their parameters are not independent either. During training, the inputs are the parallel source language 1, source language 2 and the standard target language, where the standard target language is the target language corresponding to both the parallel source language 1 and source language 2.
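
The idea of feeding two source-side vectors into one shared target side can be sketched as follows. This is only an illustration under stated assumptions: the patent text available here does not say how the splicing is computed, so concatenation is assumed, and averaging stands in for the encoders. The names `encode` and `splice` are hypothetical.

```python
def encode(token_vectors):
    """Placeholder encoder: average token vectors into one sentence vector.

    Assumption: averaging is used here only to keep the sketch short;
    it is not the encoder described in the patent.
    """
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[k] for vec in token_vectors) / count for k in range(dim)]

def splice(c1, c2):
    """Assumed splicing operation: concatenate the two source vectors."""
    return c1 + c2

# Hypothetical token vectors for one sentence in each source language.
src1_tokens = [[1.0, 0.0], [0.0, 1.0]]
src2_tokens = [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]]

c1 = encode(src1_tokens)   # vector from source language 1
c2 = encode(src2_tokens)   # vector from source language 2
c = splice(c1, c2)         # single vector handed to the shared decoder
```

Because the decoder would consume `c`, a training error on the target side depends on both source-side encoders at once, which is one way to read the statement that the two translation systems are not independent.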

[0049] During the training process, the source...

Specific Embodiment 2

[0076] Specific Embodiment 2: This embodiment differs from Specific Embodiment 1 in the specific process for calculating $h'_i$ in step 2: when the encoding part builds a recurrent neural network (RNN), random initialization is performed first, followed by calculation using formula (5):

[0077] $\overrightarrow{h}'_i = \sigma(W_{\overrightarrow{h}' x'})\, x'_j + W_{\overrightarrow{h}' \overrightarrow{h}'} \ldots$
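
Since formula (5) is truncated in this listing, the following is a sketch of a generic sigmoid RNN recurrence, h_i = σ(W_hx·x_i + W_hh·h_{i-1}), and not a reconstruction of the patent's exact formula. The weight names `W_hx` and `W_hh`, the dimensions, and the initialization range are assumptions chosen for illustration.

```python
import math
import random

def sigmoid(z):
    """Logistic function, applied to one scalar."""
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def random_matrix(rows, cols, rng, scale=0.1):
    """Random initialization in [-scale, scale] (assumed range)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def rnn_states(W_hx, W_hh, xs, h0):
    """Return the hidden state after each input, scanning left to right."""
    states, h = [], h0
    for x in xs:
        pre = [a + b for a, b in zip(matvec(W_hx, x), matvec(W_hh, h))]
        h = [sigmoid(p) for p in pre]
        states.append(h)
    return states

rng = random.Random(0)          # fixed seed so the sketch is repeatable
hidden_size, input_size = 3, 2  # hypothetical sizes
W_hx = random_matrix(hidden_size, input_size, rng)
W_hh = random_matrix(hidden_size, hidden_size, rng)

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy input word vectors
states = rnn_states(W_hx, W_hh, xs, [0.0] * hidden_size)
```

Each hidden state depends on the current input and the previous state, so the last state summarizes the whole prefix read so far.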

Specific Embodiment 3

[0098] Specific Embodiment 3: This embodiment differs from Specific Embodiment 1 or 2 in the specific process for calculating $h''_i$ in step 2:

[0099] In the encoding part, when the recurrent neural network (RNN) is built in the reverse direction, random initialization is performed first, followed by initialization using formula (2) and calculation by formula (6):

[0100] $\overrightarrow{h}''_0 = \sigma(W_{\overrightarrow{h}'' x''})\, x''_0 + W_{h \ldots}$
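
Building a recurrence in the reverse direction can be sketched as scanning the input from the last position to the first while keeping each state aligned with its input position. The helper `run_recurrence` and the linear `toy_step` are hypothetical. The toy step is deliberately not the sigmoid update of formula (6), which is truncated in this listing; it is chosen so the values are easy to follow by hand.

```python
def run_recurrence(step, xs, h0, reverse=False):
    """Apply `step(x, h_prev)` along xs.

    With reverse=True the sequence is scanned right to left. In both
    cases states[i] is the state produced when xs[i] was consumed.
    """
    if reverse:
        order = range(len(xs) - 1, -1, -1)
    else:
        order = range(len(xs))
    states = [None] * len(xs)
    h = h0
    for i in order:
        h = step(xs[i], h)
        states[i] = h
    return states

def toy_step(x, h_prev):
    """Hypothetical linear update used only for illustration."""
    return 0.5 * h_prev + x

xs = [1.0, 2.0, 4.0]
forward = run_recurrence(toy_step, xs, 0.0)
backward = run_recurrence(toy_step, xs, 0.0, reverse=True)
```

In the forward scan the state at position i reflects inputs up to i, while in the reverse scan it reflects inputs from i to the end of the sentence.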

Abstract

The invention provides a machine translation method based on semantic vectors of a multilingual parallel corpus, and relates to machine translation methods. The problem addressed by the invention is that the semantic information obtained from a bilingual parallel corpus is usually limited. The method comprises the following steps: 1, inputting parallel source languages 1 and 2 and a target language; 2, calculating according to formulas (1) to (6) to obtain the hidden states h' and h''; 3, calculating the vector c; 4, generating the target language. Alternatively: 1, inputting source languages 1 and 2 and the target language; 2, calculating the normalized cosine distance between vector c1 and vector c2; 3, comparing the similarity of vector c1 and vector c2; 4, requiring dis(c1, c2) to be greater than a threshold δ and, given a sentence set S1 of source language 1 and a sentence set S2 of source language 2, expressing this as a constrained optimization problem; and 5, establishing the final objective function. The method is applied in the field of machine translation.
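
The comparison of c1 and c2 against a threshold can be sketched as follows. The abstract does not define the normalization, so this sketch assumes dis(c1, c2) = 0.5 + 0.5·cos(c1, c2), which maps the cosine into [0, 1] with larger values meaning more similar vectors. The function names and the value of δ are illustrative assumptions.

```python
import math

def normalized_cosine(c1, c2):
    """Assumed form of dis(c1, c2): cosine rescaled from [-1, 1] to [0, 1]."""
    dot = sum(a * b for a, b in zip(c1, c2))
    norm1 = math.sqrt(sum(a * a for a in c1))
    norm2 = math.sqrt(sum(b * b for b in c2))
    return 0.5 + 0.5 * dot / (norm1 * norm2)

def satisfies_constraint(c1, c2, delta):
    """Check the constraint dis(c1, c2) > delta for one sentence pair."""
    return normalized_cosine(c1, c2) > delta

delta = 0.8  # hypothetical threshold

same = normalized_cosine([1.0, 0.0], [1.0, 0.0])
orthogonal = normalized_cosine([1.0, 0.0], [0.0, 1.0])
opposite = normalized_cosine([1.0, 0.0], [-1.0, 0.0])

ok_pair = satisfies_constraint([1.0, 0.0], [1.0, 0.0], delta)
bad_pair = satisfies_constraint([1.0, 0.0], [0.0, 1.0], delta)
```

Under this reading, the constraint asks the vectors produced from the two source-language versions of the same sentence to stay close to each other during training.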

Description

Technical field

[0001] The invention relates to a machine translation method using semantic vectors, in particular to a machine translation method based on semantic vectors of a multilingual parallel corpus.

Background

[0002] Vector representation is a commonly used way to formalize text in natural language processing. It has developed from the traditional 0-1 vector (one-hot representation) to the word embedding representation now used in deep learning, which has had a profound impact on a variety of mainstream tasks in natural language processing. The traditional one-hot representation is simple but sparse: words are isolated from one another, it is impossible to measure whether two words are related, and it leads to the curse of dimensionality in certain tasks. The Word Embedding representation method in deep learning is low...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/28, G06F17/27, G06N3/04, G06N3/08
CPC: G06N3/08, G06F40/30, G06F40/58, G06N3/044
Inventor: 朱聪慧, 赵铁军, 郑德权, 杨沐昀, 徐冰, 曹海龙
Owner: 黑龙江省工研院资产经营管理有限公司