Method and device for fusing multiple machine translation systems

A machine translation technology concerning the fusion of translation results, applied in the fields of instruments, special data processing applications, electrical digital data processing, and others. It addresses the failure to fully consider decoding-process information and the decoding search space, and achieves good scalability and improved performance.

Status: Inactive. Publication date: 2014-03-19
HARBIN UNIV OF SCI & TECH
Cites: 3; Cited by: 28

AI Technical Summary

Problems solved by technology

[0006] The present invention aims to solve the problem that traditional methods, which perform system fusion as a post-processing step, do not fully consider information from the decoding process and cannot fully exploit the huge search space explored during decoding. To this end, it provides a method and device for fusing multiple machine translation systems.


Examples


Specific Embodiment 1

[0036] Specific Embodiment 1: The device for fusing multiple machine translation systems in this embodiment comprises a monolingual/bilingual preprocessor, a phrase extractor, a language model generator, trainers for the multiple machine translation systems, and a decoder.

[0037] The monolingual/bilingual preprocessor preprocesses the monolingual and bilingual corpora. The phrase extractor extracts phrases from the bilingual training corpus and stores them in the phrase table. The language model generator trains a language model on the monolingual training corpus. Each machine translation system before fusion is trained using the phrase table and the language model, and the parameter weights obtained from training are used as the weights of the final decoder. The decoder decodes the test corpus to generate translation results, which are then evaluated and scored.
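The embodiment does not specify how the phrase extractor works. As an illustration only, the sketch below implements the common alignment-consistent phrase extraction used in phrase-based translation: a source span and a target span form a phrase pair if they share at least one alignment link and no word inside either span is aligned to a word outside the other. The function name `extract_phrases`, the `max_len` limit, and the index-pair representation of the alignment are assumptions, not details taken from the patent.

```python
from typing import List, Set, Tuple

# Word alignment as a set of (source index, target index) links.
Alignment = Set[Tuple[int, int]]


def extract_phrases(src: List[str], tgt: List[str],
                    alignment: Alignment, max_len: int = 3) -> Set[Tuple[str, str]]:
    """Return (source phrase, target phrase) pairs consistent with the alignment.

    Simplification: unaligned target words at the span boundaries are not
    used to extend the phrase, as fuller implementations usually do.
    """
    pairs: Set[Tuple[str, str]] = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Reject the pair if a target word in [j1, j2] links outside the source span.
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs
```

With a crossing alignment such as `{(0, 0), (1, 2), (2, 1)}` over three-word sentences, the span covering the first two source words is rejected, because its target span would have to include a word aligned to the third source word.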

Specific Embodiment 2

[0038] Specific Embodiment 2: The method for fusing multiple machine translation systems in this embodiment is implemented by the following steps:

[0039] 1. Preprocess the data for the machine translation systems;

[0040] 2. Build a translation hypergraph for each translation system;

[0041] 3. Fuse the two translation hypergraphs and train on the training set;

[0042] Here, training consists of two parts: each single machine translation system before fusion uses a BTG reordering model trained with maximum entropy, while the fused machine translation system is tuned with minimum error rate training (MERT);

[0043] 4. Decode the test set to generate translation results and score them, which completes the method for fusing multiple machine translation systems.
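The steps above fuse the systems inside the decoding search space rather than on their N-best outputs. The text here does not say how nodes of the two hypergraphs are matched, so the sketch below is only one plausible reading under stated assumptions: nodes are identified by a hypothetical `(label, start, end)` signature, fusion is the union of the hyperedges (so nodes with the same signature merge), every edge receives a system-indicator feature, and the best derivation is found by Viterbi search under a linear model. The names `Edge`, `Hypergraph`, `fuse`, `viterbi`, and the `sys=<name>` feature are illustrative; in the described method the weights would come from MERT, which is not implemented here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical node signature: (nonterminal label, source start, source end).
NodeKey = Tuple[str, int, int]


@dataclass
class Edge:
    head: NodeKey
    tails: Tuple[NodeKey, ...]  # empty for edges that apply a lexical rule
    features: Dict[str, float]  # feature name -> value


@dataclass
class Hypergraph:
    edges: List[Edge] = field(default_factory=list)


def fuse(graphs: Dict[str, Hypergraph]) -> Hypergraph:
    """Union several hypergraphs into one.

    Nodes are identified only by their key, so identical keys from different
    systems become one node and a derivation may mix edges from several
    systems. Each edge gets an indicator feature naming its source system.
    """
    fused = Hypergraph()
    for name, graph in graphs.items():
        for edge in graph.edges:
            feats = dict(edge.features)
            feats["sys=" + name] = 1.0
            fused.edges.append(Edge(edge.head, edge.tails, feats))
    return fused


def viterbi(graph: Hypergraph, goal: NodeKey, weights: Dict[str, float]) -> float:
    """Score of the best derivation of `goal` under a linear model.

    Assumes an acyclic hypergraph; a node with no incoming edges scores -inf.
    """
    incoming: Dict[NodeKey, List[Edge]] = {}
    for edge in graph.edges:
        incoming.setdefault(edge.head, []).append(edge)
    memo: Dict[NodeKey, float] = {}

    def best(node: NodeKey) -> float:
        if node not in memo:
            score = float("-inf")
            for edge in incoming.get(node, []):
                local = sum(weights.get(k, 0.0) * v for k, v in edge.features.items())
                score = max(score, local + sum(best(t) for t in edge.tails))
            memo[node] = score
        return memo[node]

    return best(goal)
```

If system A scores the first span better and system B the second, the best derivation in the fused graph combines A's edge for the first span with B's edge for the second, a derivation that neither input graph contains on its own.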

[0044] Modern machine translation technology is based on bilingual grammars. A bilingual grammar is a quadruple

[0045] $G = (V_N, V_T, P, S)$, where $V_N$ is the set of no...

Specific Embodiment 3

[0090] Specific Embodiment 3: This embodiment differs from Specific Embodiment 2 in that the preprocessing of the machine translation systems specifically comprises:

[0091] (1) Segment the source-language and target-language text into words;

[0092] (2) Tag the sentences that require part-of-speech tagging, and perform bilingual word alignment at the same time;

[0093] (3) Parse the sentences that require syntactic analysis;

[0094] (4) Merge the alignment information with the part-of-speech and syntactic information;

[0095] (5) Extract phrases and compute the feature scores associated with them.

[0096] To illustrate the preprocessing more clearly, this embodiment uses a tree-to-string model as an example. Figure 3 shows the word segmentation of the sentence "Bush and Sharon held talks"; Figure 4 shows the part-of-speech tagging of the segmented sentence; Figure 5...
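Step (5) computes feature scores for the extracted phrases but does not list which scores are used. A minimal sketch, assuming the two relative-frequency phrase translation probabilities that phrase-based systems commonly use, $\phi(e \mid f) = c(f, e) / \sum_{e'} c(f, e')$ and its inverse $\phi(f \mid e)$; lexical weights and other features are omitted. The function name `phrase_probs` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def phrase_probs(pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Map each (src, tgt) phrase pair to (phi(tgt|src), phi(src|tgt)).

    `pairs` holds one entry per extracted occurrence, so repeated pairs
    are counted; probabilities are plain relative frequencies (no smoothing).
    """
    pair_counts = Counter(pairs)
    src_counts: Counter = Counter()
    tgt_counts: Counter = Counter()
    for (f, e), c in pair_counts.items():
        src_counts[f] += c
        tgt_counts[e] += c
    return {
        (f, e): (c / src_counts[f], c / tgt_counts[e])
        for (f, e), c in pair_counts.items()
    }
```

Keeping both directions lets the later weight training decide how much each probability should contribute to the model.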


Abstract

The invention discloses a method and a device for fusing multiple machine translation systems. It relates to the field of machine translation and addresses the problems that traditional methods, which perform system fusion as a post-processing step, do not fully consider information from the decoding process and cannot fully consider the search space explored during decoding. The device for fusing multiple machine translation systems comprises a preprocessor, a phrase extractor, a language model generator, trainers for the multiple machine translation systems, and a decoder. The method comprises: 1, preprocessing the data for the machine translation systems; 2, building a translation hypergraph for each translation system; 3, fusing the two translation hypergraphs and training on the training set, where training comprises two parts: each single machine translation system before fusion uses a BTG reordering model trained with maximum entropy, and the fused machine translation system uses minimum error rate training (MERT); 4, decoding the test set to generate translation results and scoring the results. The method and the device are applied in the field of machine translation.

Description

Technical Field

[0001] The invention relates to a method and a device for fusing multiple machine translation systems, and belongs to the field of machine translation.

Background Art

[0002] With the rapid development of computers, the use of computers to translate between different languages has long been well known. Machine translation system fusion combines the N-best outputs of multiple systems to generate new translation results, and fused translations have been shown to be better than the output of any single system. By granularity, fusion can be performed at the sentence, phrase, or word level. Recently, word-level system fusion based on confusion networks has achieved substantial performance improvements, but all of these methods perform fusion as a post-processing step of machine translation. The traditional method of system fusion in post-processing does not fully co...


Application Information

Patent type and authority: Application (China)
IPC(8): G06F17/28
Inventor: 刘宇鹏 (Liu Yupeng)
Owner: HARBIN UNIV OF SCI & TECH