Translation method and translation system oriented to morphologically-rich language

A translation method and system for morphologically rich languages, applied in special data processing applications, electronic digital data processing, etc., that addresses the problems of ambiguity and loss of information in translation rules

Inactive Publication Date: 2012-09-19
INST OF COMPUTING TECH CHINESE ACAD OF SCI
Cites: 2. Cited by: 10.

AI Technical Summary

Problems solved by technology

However, this method loses the affix information after all,

Embodiment Construction

[0042] Specific embodiments of the present invention are given below, and the present invention is described in detail in conjunction with the accompanying drawings.

[0043] The purpose of the present invention is to propose a translation method for morphologically rich languages. By treating stems and affixes differently and using stems as atomic translation units, the problem of data sparsity is alleviated; affixes associated with translation rules are used to disambiguate translation rules, thereby improving the quality of morphologically rich language translation.

[0044] In order to achieve the above-mentioned purpose of the invention, the present invention provides a specific machine translation method, comprising the following steps:

[0045] Step 1) Perform morphological analysis on the morphologically rich language to obtain stem and affix information;

[0046] Step 2) When extracting translation rules, the word stem is used as the atomic translation unit, and the...
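Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the suffix-stripping analyzer, the Turkish-like example words, and the names `analyze` and `extract_rules` are all assumptions made for the example, and the aligned source/target pairs are assumed to come from an upstream phrase-extraction step that is not shown. The rule table is keyed by stem sequence, and each rule retains counts of the affixes it was observed with, in line with the affix distribution information described in the abstract.

```python
from collections import Counter, defaultdict

# Hypothetical toy suffix inventory; a real system would use a proper
# morphological analyzer for the language in question.
TOY_SUFFIXES = ("lerde", "ler", "de", "i")


def analyze(word):
    """Split a word into (stem, affix) by stripping the longest known suffix."""
    for suffix in sorted(TOY_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], "+" + suffix
    return word, ""  # no affix recognised


def extract_rules(aligned_pairs):
    """Build a rule table keyed by stem sequence.

    aligned_pairs: iterable of (source_words, target_phrase) pairs.
    For every (stem sequence, target) rule, a Counter records how often
    each affix sequence was seen with that rule.
    """
    table = defaultdict(lambda: defaultdict(Counter))
    for source_words, target in aligned_pairs:
        analyses = [analyze(w) for w in source_words]
        stems = tuple(stem for stem, _ in analyses)
        affixes = tuple(affix for _, affix in analyses)
        table[stems][target][affixes] += 1
    return table


# Illustrative data only.
pairs = [
    (["evde"], "at home"),
    (["evde"], "at home"),
    (["evler"], "houses"),
    (["evlerde"], "in the houses"),
]
rules = extract_rules(pairs)
```

All three surface forms share the single stem key `("ev",)`, so the rule statistics are pooled under the stem while the affix counts stay available for later disambiguation.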

Abstract

The invention relates to a translation method and a translation system for morphologically rich languages. The method comprises the following steps: (1) performing morphological analysis on the morphologically rich language to obtain stem and affix information; (2) when extracting translation rules, taking the stem as the atomic translation unit and retaining the corresponding affix distribution information; and (3) during translation, obtaining the stem sequence and affix distribution of the fragment to be translated, where the stem sequence (a sequence consisting of several stems) is used to query the rule table, and the similarity between the fragment's affix distribution and the candidate affix distribution of each matching rule is computed to characterize how closely the two agree and to guide decoding.
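Step (3) can be illustrated with a short sketch. The abstract does not say which similarity measure is used, so cosine similarity over affix counts is an assumption here, as are the contents of `RULE_TABLE` and the function names. The stem sequence selects the candidate rules, and the affixes of the fragment being translated are compared with the affix counts stored with each candidate.

```python
import math
from collections import Counter


def cosine(p, q):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


# Hypothetical rule table: stem sequence -> list of
# (target phrase, affix counts observed with that rule at extraction time).
RULE_TABLE = {
    ("ev",): [
        ("at home", Counter({"+de": 8, "+lerde": 1})),
        ("houses", Counter({"+ler": 9})),
    ],
}


def rank_candidates(stems, affixes, table=RULE_TABLE):
    """Look up rules by stem sequence and score each by affix similarity."""
    observed = Counter(affixes)
    candidates = table.get(tuple(stems), [])
    scored = [(cosine(observed, dist), target) for target, dist in candidates]
    return sorted(scored, reverse=True)


# Fragment "evde" analysed as stem "ev" with affix "+de".
ranking = rank_candidates(["ev"], ["+de"])
```

In a full decoder the similarity would presumably enter as one feature among others rather than decide the ranking alone; the sketch only shows how the score separates rules that share the same stem key.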

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing; in particular, it relates to a translation method and system for morphologically rich languages.

Background

[0002] Current statistical machine translation (SMT) techniques were developed mainly for English and similar languages, and they assume that words are the atomic translation units. Word-based, phrase-based, and syntax-based translation models have been proposed on this assumption; given a large corpus, such methods effectively improve translation for isolating languages (such as Chinese) and for languages that are not rich in morphological change (such as English and French).

[0003] A morphologically rich language, however, exhibits a range of morphological phenomena: inflection, phonetic harmony, agreement, compounding, etc. A given stem can therefore theoretically generate hundreds of word forms. Tho...
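The sparsity concern in the background can be made concrete with a small count. The corpus below is invented for illustration and uses a hypothetical pre-segmented notation (`stem+affix`); it only shows that the number of distinct surface forms grows faster than the number of distinct stems.

```python
# Invented, pre-segmented toy corpus: "stem+affix".
corpus = ["ev+de", "ev+ler", "ev+lerde", "kitap+lar", "kitap+ta", "ev+de"]

# Distinct surface word forms (what a word-based model must estimate).
surface_types = {token.replace("+", "") for token in corpus}

# Distinct stems (what a stem-based model must estimate).
stem_types = {token.split("+", 1)[0] for token in corpus}

print(len(surface_types), "surface forms vs", len(stem_types), "stems")
```

With word forms as translation units, each of the five forms needs its own statistics; with stems as units, the same observations are shared between two entries.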

Application Information

IPC(8): G06F17/28
Inventor: 王志洋, 吕雅娟, 刘群
Owner INST OF COMPUTING TECH CHINESE ACAD OF SCI