Generalized reordering statistic translation method and device based on non-continuous phrase

A statistical translation technique using discontinuous phrases, applied to generalized reordering statistical translation methods and devices. It addresses the limited accuracy of translation models and their dependence on parsers, and achieves strong generalization ability.

Inactive Publication Date: 2010-03-31
INST OF AUTOMATION CHINESE ACAD OF SCI
Cites: 0; Cited by: 19

AI Technical Summary

Problems solved by technology

The second issue is the reordering of phrases.
However, these models are highly dependent on the parser, and the performance of the translation model is limited by the accuracy of the parser.



Examples


Embodiment Construction

[0030] The details of the technical solution of the present invention are described below.

[0031] The present invention proposes a generalized reordering statistical translation method based on discontinuous phrases, which is divided into two parts: a training process and a translation process. In the present invention, Chinese is taken as the example source language (the language to be translated) and English as the example target language. The specific process is as follows:

[0032] The training process includes:

[0033] a) For the Chinese-English parallel training corpus, run GIZA++ in both directions (from Chinese to English and from English to Chinese) and apply the grow-diag-final heuristic to obtain a many-to-many word alignment for each sentence pair.

[0034] b) Use the SRILM tool to train the translated Eng...
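The symmetrization in step a) can be sketched as follows. This is a minimal Python illustration of a grow-diag-final style heuristic, assuming both directional alignments have already been read into sets of (source index, target index) links. The function name `symmetrize` and the data layout are hypothetical, and this is neither the GIZA++ interface nor necessarily the exact variant used in the patent.

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (source word index, target word index)

# Horizontal, vertical and diagonal neighbors in the alignment matrix.
NEIGHBORS = [(-1, 0), (0, -1), (1, 0), (0, 1),
             (-1, -1), (-1, 1), (1, -1), (1, 1)]


def symmetrize(src2tgt: Set[Link], tgt2src: Set[Link]) -> Set[Link]:
    """Merge two directional alignments (hypothetical helper).

    Both inputs are assumed to use the same (source, target) orientation.
    """
    union = src2tgt | tgt2src
    alignment = set(src2tgt & tgt2src)  # start from the high-precision links

    def has_free_word(link: Link) -> bool:
        # True if the source word or the target word is still unaligned.
        i, j = link
        src_free = all(a[0] != i for a in alignment)
        tgt_free = all(a[1] != j for a in alignment)
        return src_free or tgt_free

    # grow-diag: repeatedly add union links adjacent to accepted links.
    added = True
    while added:
        added = False
        for i, j in sorted(alignment):
            for di, dj in NEIGHBORS:
                cand = (i + di, j + dj)
                if cand in union and cand not in alignment and has_free_word(cand):
                    alignment.add(cand)
                    added = True

    # final: add remaining directional links that cover an unaligned word.
    for directional in (src2tgt, tgt2src):
        for cand in sorted(directional):
            if cand not in alignment and has_free_word(cand):
                alignment.add(cand)
    return alignment


if __name__ == "__main__":
    zh_to_en = {(0, 0), (1, 1), (2, 1)}
    en_to_zh = {(0, 0), (1, 1), (1, 2)}
    print(sorted(symmetrize(zh_to_en, en_to_zh)))
```

Starting from the intersection keeps the most reliable links, and the growing steps recover many-to-many links such as one source word aligned to two target words.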



Abstract

The invention provides a generalized reordering statistical translation method and device based on non-continuous phrases. The device consists of a word alignment module, a language model module, a phrase extraction module, a maximum entropy classifier training module, a minimum error training module and a decoder. It provides a generalized reordering model for phrase-based statistical machine translation and introduces non-continuous phrases: for any continuous string in a given source sentence, it combines continuous and non-continuous phrases by rules so as to obtain as many continuous target translations as possible, and simultaneously combines this with a reordering submodel to realize local and global reordering of phrases, so as to obtain the final target translation of the source-language sentence. The model captures local and global reordering knowledge of phrases and gains generalization ability through non-continuous phrases. Experimental results show that the model improves the BLEU score over the maximum-entropy-based reordering model and the hierarchical-phrase-based translation model by about 1.54 percent and 0.66 percent, respectively.
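To make the notion of a non-continuous phrase concrete, the following minimal Python sketch matches a phrase with a single gap against a tokenized sentence. The function name `find_gapped_matches`, the one-gap restriction and the `max_gap` limit are assumptions made for illustration; they are not the patent's phrase extraction or combination rules, and English tokens are used only for readability.

```python
from typing import List, Tuple


def find_gapped_matches(
    tokens: List[str],
    left: List[str],
    right: List[str],
    max_gap: int = 3,
) -> List[Tuple[int, int, int, int]]:
    """Find occurrences of the pattern `left <gap> right` (hypothetical helper).

    Returns tuples (start, gap_start, gap_end, end) with exclusive end
    indices, where the gap covers between 1 and `max_gap` tokens.
    `left` and `right` are assumed to be non-empty.
    """
    matches = []
    n = len(tokens)
    for start in range(n - len(left) + 1):
        if tokens[start:start + len(left)] != left:
            continue
        gap_start = start + len(left)
        for gap_len in range(1, max_gap + 1):
            gap_end = gap_start + gap_len
            end = gap_end + len(right)
            if end > n:
                break  # longer gaps would run past the sentence end
            if tokens[gap_end:end] == right:
                matches.append((start, gap_start, gap_end, end))
    return matches


if __name__ == "__main__":
    sentence = ["I", "do", "not", "smoke", "anymore"]
    # The words "not ... anymore" form one phrase with a one-word gap.
    print(find_gapped_matches(sentence, ["not"], ["anymore"]))
```

A continuous phrase can only cover an unbroken span, whereas the gapped pattern covers "not" and "anymore" together and leaves the gap to be filled by another phrase, which is the kind of generalization the abstract attributes to non-continuous phrases.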

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, and is a new generalized reordering statistical translation method and device based on discontinuous phrases.

Background

[0002] In statistical machine translation, phrase-based translation models have improved on word-based translation models. In a phrase-based translation model, a phrase is any continuous substring without syntactic constraints. Such phrases can capture some local knowledge, such as local ordering, the translation of multi-word expressions, and the insertion and deletion of words related to local contexts. However, key issues in phrase-based translation models, such as the lack of discontinuous phrases and weak phrase reordering and generalization abilities, are still not effectively addressed.

[0003] In order to improve phrase-based translation models, two issues must be addressed. One is the type of phrase, which must include both continuous p...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/28
Inventors: 宗成庆, 何彦青
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI