Method for establishing tree structure and tree-structure-based machine translation system

A machine translation and tree structure technology, applied in the fields of instruments, special data processing applications, and electrical digital data processing. It addresses problems such as limited treebank resources and the reduced accuracy or unavailability of syntactic parsers, with the effects of increased coverage, expanded applicability, and good coordination.

Active Publication Date: 2012-09-12
北京中科凡语科技有限公司
Cites: 2; Cited by: 9

AI Technical Summary

Problems solved by technology

However, because manually annotated treebank resources are limited, many language pairs have resources in only a few domains; once sentences from other domains are involved, parser accuracy drops so sharply that the parser becomes unusable.
What's more serious is that at present, a large numbe

Examples

Specific embodiments

[0031] 1. Perform word segmentation, part-of-speech tagging and word alignment on bilingual sentence pairs in the bilingual corpus. The specific implementation is as follows:

[0032] The source-language sentence and the target-language sentence in each bilingual sentence pair are segmented to obtain word segmentation results for both sides. If neither the source language nor the target language is Chinese, no word segmentation is required; if either side is Chinese, the Chinese text must be segmented. There are many methods for Chinese word segmentation. In this embodiment of the invention, the lexical analysis tool Urheen is used to segment Chinese automatically. Urheen can be downloaded free of charge from: http://www.openpr.org.cn/index.php/NLP-Toolkit-for-Natural-Language-Processing/.
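The conditional segmentation described in [0032] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the Han-character test, the function names, and the per-character stand-in segmenter are all assumptions made here. A real setup would pass a proper Chinese segmenter (such as a wrapper around Urheen, whose interface is not described in this text) as the `segmenter` argument.

```python
import re
from typing import Callable, List, Tuple

# Assumption: "contains Chinese" is approximated by the basic
# CJK Unified Ideographs block (U+4E00..U+9FFF).
_HAN = re.compile(r"[\u4e00-\u9fff]")


def contains_chinese(sentence: str) -> bool:
    """Return True if the sentence has at least one Han character."""
    return _HAN.search(sentence) is not None


def char_segmenter(sentence: str) -> List[str]:
    """Hypothetical stand-in segmenter: every Han character becomes one
    token, and every run of other non-space characters stays whole."""
    return re.findall(r"[\u4e00-\u9fff]|[^\u4e00-\u9fff\s]+", sentence)


def tokenize(sentence: str,
             segmenter: Callable[[str], List[str]] = char_segmenter) -> List[str]:
    """Segment only when Chinese is present; otherwise split on whitespace."""
    if contains_chinese(sentence):
        return segmenter(sentence)
    return sentence.split()


def tokenize_pair(src: str, tgt: str,
                  segmenter: Callable[[str], List[str]] = char_segmenter
                  ) -> Tuple[List[str], List[str]]:
    """Apply the same rule independently to both sides of a sentence pair."""
    return tokenize(src, segmenter), tokenize(tgt, segmenter)
```

Passing the segmenter as a parameter keeps the language check separate from the choice of tool, so the stand-in can be swapped out without touching the rest of the pipeline.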

[0033] After obtaining the word segmentation results of...

Abstract

The invention relates to a method for establishing a tree structure and to a tree-structure-based machine translation system. The method comprises: performing word segmentation, part-of-speech tagging and word alignment on the bilingual sentence pairs in a bilingual corpus; splitting each bilingual sentence pair, according to the word alignment result, into bilingual sub-sentence pairs shorter than the original pair, and re-aligning the words of the generated sub-sentence pairs; merging the sub-sentences, using their word re-alignment results to obtain the word alignment of the full bilingual sentence pair, and constructing a compressed forest for the sentence pair; and selecting a suitable tree structure from the compressed forest. The method makes it possible to build a tree-structure-based translation system for language pairs that have part-of-speech tagging resources but no syntactic treebank resources.
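The split-and-merge steps in the abstract can be illustrated with a small sketch. The abstract does not state the splitting criterion, so the code below assumes one: a cut is allowed only where no alignment link crosses it, and sub-sentence pairs are kept in the same order on both sides. The names `consistent_cuts`, `split_pair`, and `merge_links` are hypothetical, and the re-alignment step itself (running an aligner on each sub-sentence pair) is left out.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source index, target index)


def consistent_cuts(n_src: int, links: Set[Link]) -> List[Link]:
    """Return cut points (i, j) such that every link lies either in
    src[:i] x tgt[:j] or in src[i:] x tgt[j:] (no link crosses the cut)."""
    cuts = []
    for i in range(1, n_src):
        left = [t for s, t in links if s < i]
        right = [t for s, t in links if s >= i]
        if not left or not right:
            continue  # one side has no aligned words: no evidence for a cut
        j = max(left) + 1
        if j <= min(right):
            cuts.append((i, j))
    return cuts


def split_pair(src: List[str], tgt: List[str],
               links: Set[Link]) -> List[Tuple[List[str], List[str]]]:
    """Split a sentence pair into sub-sentence pairs at the consistent cuts."""
    pieces = []
    prev_i = prev_j = 0
    for i, j in consistent_cuts(len(src), links) + [(len(src), len(tgt))]:
        if i > prev_i and j > prev_j:  # skip cuts that would leave a side empty
            pieces.append((src[prev_i:i], tgt[prev_j:j]))
            prev_i, prev_j = i, j
    return pieces


def merge_links(pieces: List[Tuple[List[str], List[str]]],
                piece_links: List[Set[Link]]) -> Set[Link]:
    """Shift each piece's local links by its offsets to rebuild the
    alignment of the full sentence pair."""
    merged: Set[Link] = set()
    off_s = off_t = 0
    for (s_words, t_words), local in zip(pieces, piece_links):
        merged.update((s + off_s, t + off_t) for s, t in local)
        off_s += len(s_words)
        off_t += len(t_words)
    return merged
```

In this sketch, a pair whose first two words are swapped between the two sides cannot be cut between those words, so they stay together in one sub-sentence pair. Shorter pairs give the aligner less room for spurious links, which is the motivation the abstract gives for re-aligning after the split.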

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, in particular to a method for constructing a tree structure and a tree-structure-based machine translation system, and more specifically to a method for unsupervised tree structure derivation from a bilingual corpus and a method for building a machine translation system based on that tree structure.

Background technique

[0002] Statistical machine translation is a technology that automatically learns translation rules from a parallel bilingual corpus and applies these rules to translate new sentences automatically. Statistical machine translation mainly includes translation systems based on word-based models, phrase-based models, and models based on syntactic parse tree structures. Among them, the translation system based on the structure model of the syntactic anal...

Application Information

IPC(8): G06F17/28, G06F17/27
Inventor: 宗成庆, 翟飞飞
Owner: 北京中科凡语科技有限公司