Two-stage-type machine translation method with preferentiality of idiomatic phrases

A two-stage machine translation technique, applied in the field of machine translation, that alleviates the data sparseness caused by limited corpus size and thereby improves translation quality.

Active Publication Date: 2016-11-23
UNIV OF ELECTRONIC SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0006] In view of the above prior art, the purpose of the present invention is to provide a statistical machine translation method that addresses the data sparsity problem in prior-art phrase-based statistical machine translation, which is caused by the limited size of the corpus and the length of the extracted phrases.

Examples

Embodiment 1

[0044] Phrase-based statistical machine translation comprises two parts: training and translation. The training part mainly produces the models required by the decoder; the phrase translation probability table used in step S3 is obtained in this part. Once the training results, such as the phrase translation probability table, are available, the decoder uses them to translate the sentence to be translated.
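As an illustration of what a phrase translation probability table contains, the following minimal sketch estimates p(target | source) by relative frequency over already-extracted phrase pairs. This is a common textbook estimate and is assumed here for illustration only; the source does not state which estimator the invention uses. The function name `estimate_phrase_table` and the toy English-French phrase pairs are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_phrase_table(phrase_pairs):
    """Estimate p(target | source) by relative frequency.

    phrase_pairs: list of (source_phrase, target_phrase) tuples, assumed to
    have been extracted from a word-aligned parallel corpus beforehand.
    """
    pair_counts = Counter(phrase_pairs)
    source_counts = Counter(src for src, _ in phrase_pairs)
    table = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        # Share of this source phrase's occurrences that pair with tgt.
        table[src][tgt] = n / source_counts[src]
    return dict(table)

# Hypothetical toy data, not taken from the patent.
pairs = (
    [("the house", "la maison")] * 3
    + [("the house", "le batiment")]
    + [("is small", "est petite")] * 2
)
table = estimate_phrase_table(pairs)
print(table["the house"])
```

With a small corpus, long phrases occur rarely, so their counts (and therefore these estimates) become unreliable. This is the data sparsity problem described in [0006].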

[0045] 1. The specific implementation of the training part is as follows:

[0046] Training mainly includes three parts: translation model training, language model training, and tuning. For details, see Figure 2. Those skilled in the art will understand that translation model training mainly serves to obtain the phrase translation probability table. Many training methods exist; one of them, shown in Figure 3, is divided into the following three steps:

[...

Abstract

The invention relates to a machine translation method that achieves whole-sentence translation by preferentially translating idiomatic phrases, each composed of one or more nested phrases. According to the embodiment, the method comprises the following steps: (1) marking idiomatic phrases, in which the idiomatic phrases in a source-language sentence are marked; (2) translating the idiomatic phrases, in which each idiomatic phrase is divided into two parts that are translated separately and the translations are then recombined; (3) phrase segmentation, in which the remaining parts of the original sentence are divided into all possible phrases and the idiomatic phrases are treated as already translated; (4) constructing a candidate phrase list, in which only phrases that exist in the phrase translation probability table are selected and added to the list; and (5) translating the sentence, in which an existing heuristic decoder uses the candidate phrase list to generate the optimal translation of the partially translated source sentence, which consists of the idiomatic-phrase translations and the remaining untranslated parts. In the first stage, the idiomatic phrases are translated; in the second stage, the remaining parts of the sentence are translated.
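The two-stage flow described in the abstract can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the patented implementation: idiomatic phrases are found by longest-match lookup in a hypothetical dictionary `IDIOMS` (the two-part split and recombination of step 2 is omitted), each untranslated span is decoded independently rather than as part of one partially translated sentence, and a greedy longest-match decoder stands in for the heuristic decoder. The toy English-French entries and all function names are invented for the example.

```python
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]

# Hypothetical toy resources; a real system would learn these from a corpus.
IDIOMS: Dict[Phrase, str] = {
    ("raining", "cats", "and", "dogs"): "pleut des cordes",
}
PHRASE_TABLE: Dict[Phrase, Dict[str, float]] = {
    ("it", "is"): {"il": 0.8, "c'est": 0.2},
    ("it",): {"il": 0.7},
    ("is",): {"est": 0.9},
    ("today",): {"aujourd'hui": 1.0},
}

def mark_idioms(tokens: List[str]) -> List[Tuple[int, int]]:
    """Stage 1a: mark idiom spans (start, end) by longest match, left to right."""
    spans, i = [], 0
    max_len = max(len(k) for k in IDIOMS)
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in IDIOMS:
                spans.append((i, i + n))
                i += n
                break
        else:  # no idiom starts at position i
            i += 1
    return spans

def candidate_phrases(tokens: List[str]) -> Dict[Tuple[int, int], Phrase]:
    """Stage 2a: enumerate all contiguous phrases, keep those in the table."""
    cands = {}
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            phrase = tuple(tokens[i:j])
            if phrase in PHRASE_TABLE:
                cands[(i, j)] = phrase
    return cands

def decode_greedy(tokens: List[str]) -> List[str]:
    """Stage 2b: greedy stand-in for a heuristic decoder (assumption)."""
    cands, out, i = candidate_phrases(tokens), [], 0
    while i < len(tokens):
        ends = [j for (s, j) in cands if s == i]
        if ends:
            j = max(ends)  # prefer the longest candidate phrase
            options = PHRASE_TABLE[cands[(i, j)]]
            out.append(max(options, key=options.get))
            i = j
        else:
            out.append(tokens[i])  # pass unknown words through unchanged
            i += 1
    return out

def translate(sentence: str) -> str:
    tokens, out, pos = sentence.split(), [], 0
    for start, end in mark_idioms(tokens):
        out += decode_greedy(tokens[pos:start])          # untranslated span
        out.append(IDIOMS[tuple(tokens[start:end])])     # idiom, stage 1
        pos = end
    out += decode_greedy(tokens[pos:])
    return " ".join(out)

result = translate("it is raining cats and dogs today")
print(result)
```

Translating the idiom as a unit first means the second stage only has to cover the shorter remaining spans, which are more likely to match entries in the phrase table.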

Description

Technical field

[0001] The invention relates to the field of machine translation, in particular to a two-stage statistical machine translation method that preferentially translates fixed collocations.

Background

[0002] Statistical machine translation is a data-driven translation method. It treats the translation of natural language as a machine learning problem, models the translation process mathematically, and trains the model and its parameters on a bilingual parallel corpus of sufficient size. Finally, this model is used to generate the translation with the highest probability. Compared with rule-based translation methods, statistical machine translation does not require human experts to write translation rules; the rules are obtained automatically from parallel corpora through the training process. In addition, statistical machine translation is language-independent. As long as a parallel corpus of the corresponding language p...

Application Information

IPC(8): G06F17/28
CPC: G06F40/58
Inventor: 秦科, 刘贵松, 罗光春, 段贵多
Owner: UNIV OF ELECTRONIC SCI & TECH OF CHINA