Machine translation method

A syntax-based machine translation technology, classified under instruments, special data processing applications, electronic digital data processing, etc. It addresses the problems of high translation error rate, sensitivity to syntactic parsing performance, and wasted space, and achieves high translation performance and fast translation speed.

Inactive Publication Date: 2011-02-16
INST OF COMPUTING TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

Compared with the string-based model, the tree-based model takes a syntax tree as input. Its advantages are fast decoding, a concise model, and no need for binarization. However, the model has a flaw: only a single syntax tree guides translation, and because syntax-based models are sensitive to parsing performance, parsing errors lead to translation errors.
A simple remedy is to use the N-best trees: decode each tree and output the translation with the highest probability. However, this approach has a limited search space and cannot share identical nodes across different trees, so many nodes are decoded repeatedly, which wastes both space and time.
In addition, because such a tree-based statistical machine translation system uses only a single syntax tree to guide translation, it often suffers from a high translation error rate.
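The node-sharing argument can be illustrated with a small sketch. The forest below is hypothetical toy data (the node names, spans, and the `count_trees` helper are assumptions, not taken from the patent): each node maps to its alternative hyperedges, and a dynamic program counts how many distinct trees the forest encodes without enumerating them.

```python
from functools import lru_cache

# Hypothetical toy forest: node -> list of hyperedges, each a tuple of child nodes.
# A leaf has a single hyperedge with no children.
FOREST = {
    "S[0,4]": [("NP[0,1]", "VP[1,4]")],
    # Two competing analyses of the same span share all of their sub-nodes.
    "VP[1,4]": [("V[1,2]", "NP[2,4]"), ("VP[1,3]", "PP[3,4]")],
    "VP[1,3]": [("V[1,2]", "NP[2,3]")],
    "NP[2,4]": [("NP[2,3]", "PP[3,4]")],
    "NP[0,1]": [()],
    "V[1,2]": [()],
    "NP[2,3]": [()],
    "PP[3,4]": [()],
}


def count_trees(forest, root):
    """Count the trees encoded under root; each node is evaluated only once."""

    @lru_cache(maxsize=None)
    def count(node):
        total = 0
        for children in forest[node]:
            ways = 1
            for child in children:
                ways *= count(child)
            total += ways
        return total

    return count(root)


if __name__ == "__main__":
    print(count_trees(FOREST, "S[0,4]"), "trees in", len(FOREST), "nodes")
```

In this toy case the forest stores 8 nodes for 2 trees, whereas decoding the two trees separately would visit 7 nodes each, 6 of them twice. The gap widens quickly as the number of ambiguous spans grows.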

Embodiment Construction

[0027] Figure 1 shows the implementation flowchart of the overall technical solution of the machine translation decoding method based on the shared compressed forest provided by the present invention. The method includes the following steps:

[0028] Step 101): use a syntactic parser to parse the source language string and output its shared compressed syntax forest;

[0029] The main task of syntactic analysis is to parse the input source language string into a corresponding syntax tree. Available phrase-structure parsers include the Charniak parser, Bikel parser, Stanford parser, Collins parser, and MuskCpars. The parser should output not only the 1-best tree but the entire shared compressed forest, that is, the forest composed of all parse trees that can ultimately generate the root node. In the present embodiment, the MuskCpar parser is used. Refer to Deyi Xiong, Shuanglong Li, Qun Liu, Shouxun Lin, Yueliang Qian. 20...
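To make "output the whole forest rather than the 1-best tree" concrete, here is a minimal sketch of a CKY-style chart parser that keeps every derivation of every chart item. It is not the MuskCpar parser or any of the parsers named above; the grammar, lexicon, and the function name `parse_to_forest` are hypothetical, and probabilities are left out for brevity.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (an assumption, not from the patent).
BINARY = {  # (left label, right label) -> possible parent labels
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],
    ("NP", "PP"): ["NP"],
    ("P", "NP"): ["PP"],
}
LEXICON = {"I": ["NP"], "saw": ["V"], "her": ["NP"], "with": ["P"], "glasses": ["NP"]}


def parse_to_forest(words):
    """Return node -> list of hyperedges, where a node is (label, start, end).

    A 1-best parser would keep one hyperedge per node; keeping all of them
    yields the shared forest, in which sub-analyses are stored only once.
    """
    n = len(words)
    forest = defaultdict(list)
    labels = defaultdict(set)  # (start, end) -> labels found for that span
    for i, word in enumerate(words):
        for lab in LEXICON[word]:
            forest[(lab, i, i + 1)].append(())  # leaf: hyperedge with no children
            labels[(i, i + 1)].add(lab)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point
                for left in sorted(labels[(i, k)]):
                    for right in sorted(labels[(k, j)]):
                        for parent in BINARY.get((left, right), []):
                            forest[(parent, i, j)].append(((left, i, k), (right, k, j)))
                            labels[(i, j)].add(parent)
    return dict(forest)


if __name__ == "__main__":
    forest = parse_to_forest("I saw her with glasses".split())
    for children in forest[("VP", 1, 5)]:
        print(children)
```

For the toy sentence, the node `("VP", 1, 5)` ends up with two hyperedges (verb attachment and noun attachment of the prepositional phrase), both reusing the same `("PP", 3, 5)` node. The chart may also contain items that are not reachable from the root; a real system would typically discard or prune those.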

Abstract

The invention provides a machine translation method comprising the following steps: 1) the source language string is parsed to obtain its shared compressed syntax forest; 2) the syntax forest is matched against the known set of translation rules between the source language and the target language to obtain a shared compressed translation forest; 3) a search algorithm is run over the translation forest to generate the final translation result. Because the method uses the shared compressed forest to guide translation, it can search for the translation across many trees, and its search space far exceeds that of decoding N-best trees independently. On a parallel bilingual data set of 2.23 million sentence pairs, compared with a model that decodes 30-best trees, the method translates 1.4 times faster and scores 1.7 BLEU points higher.
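Steps 2) and 3) can be sketched as follows, under strong simplifying assumptions: each rule here matches exactly one hyperedge (real tree-to-string rules can span several levels of a tree), scores are plain probability products, and no language model is used. All data, the rule format, and the function names are hypothetical and only illustrate the idea of matching rules against a forest and then searching the resulting translation forest.

```python
from math import prod

# Source forest (toy data): node -> list of (child nodes, parse probability).
FOREST = {
    "VP[0,3]": [(("V[0,1]", "NP[1,3]"), 0.6), (("VP[0,2]", "PP[2,3]"), 0.4)],
    "NP[1,3]": [(("NP[1,2]", "PP[2,3]"), 1.0)],
    "VP[0,2]": [(("V[0,1]", "NP[1,2]"), 1.0)],
}
LEAVES = {"V[0,1]": "w1", "NP[1,2]": "w2", "PP[2,3]": "w3"}  # node -> source word

# Rules: (parent label, child labels) -> list of (target template, probability).
# An int in a template refers to the translation of the child at that index.
RULES = {
    ("VP", ("V", "NP")): [((0, 1), 0.9)],
    ("VP", ("VP", "PP")): [((1, 0), 0.8)],
    ("NP", ("NP", "PP")): [((1, 0), 0.3)],
}
LEXICON = {("V", "w1"): [("t1", 1.0)], ("NP", "w2"): [("t2", 1.0)], ("PP", "w3"): [("t3", 1.0)]}


def label(node):
    return node.split("[")[0]


def build_translation_forest(forest, leaves, rules, lexicon):
    """Step 2: one translation hyperedge per (source hyperedge, matching rule) pair."""
    tforest = {}
    for node, word in leaves.items():
        tforest[node] = [((), (t,), p) for t, p in lexicon.get((label(node), word), [])]
    for node, edges in forest.items():
        tforest[node] = []
        for children, parse_p in edges:
            key = (label(node), tuple(label(c) for c in children))
            for template, rule_p in rules.get(key, []):
                tforest[node].append((children, template, parse_p * rule_p))
    return tforest


def best_translation(tforest, node, memo=None):
    """Step 3: memoized best-first (Viterbi) search; returns (score, target string)."""
    memo = {} if memo is None else memo
    if node not in memo:
        candidates = []
        for children, template, p in tforest[node]:
            subs = [best_translation(tforest, c, memo) for c in children]
            score = p * prod(s for s, _ in subs)
            words = [subs[t][1] if isinstance(t, int) else t for t in template]
            candidates.append((score, " ".join(words)))
        memo[node] = max(candidates) if candidates else (0.0, "")
    return memo[node]


if __name__ == "__main__":
    tf = build_translation_forest(FOREST, LEAVES, RULES, LEXICON)
    print(best_translation(tf, "VP[0,3]"))
```

In this toy setup the best translation comes from the lower-probability parse hyperedge (0.4 rather than 0.6), because the rules that match it score higher overall. A decoder restricted to the 1-best tree could not reach that derivation.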

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing; in particular, it relates to tree-based statistical machine translation.

Background technique

[0002] Syntax-based statistical machine translation models have become the current mainstream translation methods. According to their input, they can be divided into string-based models and tree-based models (for tree-based models, see Yang Liu, Qun Liu, and Shouxun Lin. 2006. Tree-to-string alignment template for statistical machine translation. In Proceedings of COLING-ACL, pages 609-616, Sydney, Australia, July; and Liang Huang, Kevin Knight, and Aravind Joshi. 2006. Statistical syntax-directed translation with extended domain of locality. In Proceedings of AMTA). Compared with the string-based model, the tree-based model takes a syntax tree as input. Its advantages are fast decoding, a concise model, and no need for binariza...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/28, G06F17/30
Inventor: Mi Haitao (米海涛), Huang Liang (黄亮), Liu Qun (刘群)
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI