Mongolian-Chinese machine translation system based on byte pair coding technology

A byte pair encoding and machine translation technology, applied in the field of Mongolian-Chinese machine translation, that addresses the large number of unregistered (out-of-vocabulary) words in Mongolian. Its effects are to reduce the number of Mongolian unregistered words, improve translation performance, and preserve the structural features and fluency of sentences.

Status: Inactive. Publication date: 2020-01-10
INNER MONGOLIA UNIV OF TECH
Cites 2 · Cited by 11

AI Technical Summary

Problems solved by technology

This greatly preserves the structural features and fluency of sentences and reduces the number of unregistered (out-of-vocabulary) words in Mongolian…



Examples


Embodiment Construction

[0038] The implementation of the present invention will be described in detail below in conjunction with the drawings and examples.

[0039] The present invention aims to reduce the number of unregistered (out-of-vocabulary) Mongolian words in Mongolian-Chinese translation and to improve the quality of Mongolian-Chinese machine translation. It targets two causes of the serious deviations that remain in translation results: the excessive number of unregistered words encountered during Mongolian-Chinese translation, and the complexity of Mongolian sentence structure itself. To address them, a Mongolian-Chinese machine translation system based on byte pair encoding (BPE) is proposed. The implementation process is as follows:

[0040] 1. BPE-based preprocessing of the corpus

[0041] First, the constituent characters of all English, Mongolian, and Chinese words in the corpus are added to the dictionary to form the initial dictionary. All words are converted into the form of character s...
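The character-pair merging loop described in [0041] and in the abstract can be sketched as follows. This is a minimal Python illustration of standard BPE merge learning, not the patent's own implementation: it assumes whitespace-tokenized words, appends an end-of-word marker `</w>` (a common BPE convention, assumed here) so that pairs are counted only within words, and the helper names `learn_bpe`, `merge_pair`, and `apply_bpe` are hypothetical. `num_merges` plays the role of the preset number of iterations.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge operations from a list of word tokens."""
    # Each word starts as single characters plus the assumed "</w>" marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs inside words only, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties: first pair encountered wins
        merges.append(best)
        # Merge the most frequent pair everywhere it occurs.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment a word, seen or unseen, by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


if __name__ == "__main__":
    corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
    merges = learn_bpe(corpus, num_merges=5)
    print(merges)
    print(apply_bpe("lowest", merges))  # ('low', 'est</w>')
```

On this toy corpus, "lowest" never appears in training, yet it is segmented into the known subwords `low` and `est</w>`. Covering unseen words with known subword units in this way is how BPE reduces the number of unregistered words; the same code runs unchanged on Mongolian or Chinese strings, since Python iterates over them by code point.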



Abstract

The invention discloses a Mongolian-Chinese machine translation system based on byte pair encoding (BPE). First, the English-Chinese and Mongolian-Chinese parallel corpora are preprocessed with BPE: English, Mongolian, and Chinese words are all split into single characters, the frequency of adjacent character pairs is counted within each word, and in each iteration the most frequent character pair is merged and stored, until the preset number of iterations has been completed. Second, a model is trained on the preprocessed English-Chinese parallel corpus within a neural machine translation framework. The parameter weights of this English-Chinese translation model are then transferred into a Mongolian-Chinese neural machine translation framework, and a neural machine translation model is trained on the preprocessed Mongolian-Chinese parallel corpus, yielding a BPE-based Mongolian-Chinese neural machine translation prototype system. Finally, the BLEU score of this system's translations is compared with that of a statistical machine translation system, with the ultimate aim of improving Mongolian-Chinese machine translation performance.
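The transfer step in the abstract, where the English-Chinese model's parameter weights are used to start the Mongolian-Chinese model, can be sketched as below. The text available here does not say which parameters are transferred, so this Python sketch assumes a simple policy: copy every parameter whose name and shape match, and keep the child model's own initialization otherwise. Parameters are modeled as nested lists to keep the example dependency-free; `shape_of`, `transfer_parameters`, and the parameter names are hypothetical.

```python
import copy


def shape_of(tensor):
    """Shape of a nested-list tensor, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        tensor = tensor[0] if tensor else None
    return tuple(shape)


def transfer_parameters(parent, child):
    """Initialize a child (Mongolian-Chinese) model from a parent (English-Chinese) one.

    Assumed policy: parameters with the same name and shape are copied from the
    parent; the rest keep the child's fresh initialization. Returns the new
    parameter dict and the names that were transferred.
    """
    new_params = copy.deepcopy(child)
    transferred = []
    for name, value in parent.items():
        if name in child and shape_of(value) == shape_of(child[name]):
            new_params[name] = copy.deepcopy(value)
            transferred.append(name)
    return new_params, transferred


if __name__ == "__main__":
    # Hypothetical parameter names and toy values for illustration only.
    parent = {
        "src_embed": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # 3-symbol source vocab
        "decoder.W": [[1.0, 2.0], [3.0, 4.0]],
        "tgt_embed": [[0.7, 0.8], [0.9, 1.0]],
    }
    child = {
        "src_embed": [[0.0, 0.0], [0.0, 0.0]],  # different vocab size: not copied
        "decoder.W": [[0.0, 0.0], [0.0, 0.0]],
        "tgt_embed": [[0.0, 0.0], [0.0, 0.0]],
    }
    params, names = transfer_parameters(parent, child)
    print(names)  # ['decoder.W', 'tgt_embed']
```

After this initialization, the child model would be trained on the preprocessed Mongolian-Chinese corpus. In a PyTorch implementation, the same filtering could be applied to the parent's `state_dict()` before calling `child.load_state_dict(filtered, strict=False)`; the filter matters because `strict=False` tolerates missing or extra keys but still rejects shape mismatches.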

Description

Technical field

[0001] The invention belongs to the technical field of neural machine translation, and in particular relates to a Mongolian-Chinese machine translation system based on byte pair encoding.

Background

[0002] Machine translation is the process of using computers to automatically convert one natural language into another natural language with exactly the same meaning. With the rapid development of economic globalization and the Internet, machine translation plays an increasingly important role in promoting political, economic, and cultural exchanges. Neural machine translation is a novel approach to machine translation that has developed rapidly in recent years and achieved many important results. In terms of translation fluency and accuracy in particular, neural machine translation produces smoother output than traditional statistical machine translation.

[0003] However, neural machine translation als...


Application Information

IPC(8): G06F40/58; G06F40/242; G06F40/216; G06N3/04; G06N3/08
CPC: G06N3/084; G06N3/044
Inventors: 苏依拉 (Su Yila), 王昊 (Wang Hao), 贺玉玺 (He Yuxi)
Owner: INNER MONGOLIA UNIV OF TECH