Neural machine translation system based on Romanized Uyghur

A neural machine translation system for the Uyghur language. It addresses low translation quality, incorrect sentence segmentation, and the scarcity of research in this area, and it improves translation quality while reducing data sparsity.

Pending Publication Date: 2021-03-16
NANJING UNIV

AI Technical Summary

Problems solved by technology

[0010] 2) For morphologically rich, highly inflected languages such as Uyghur, BPE cannot segment sentences correctly, which reduces translation quality;
[0011] 3) There are few studies on Chinese-Uyghur neural machine translation.
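
A minimal sketch of BPE (byte-pair encoding) merge learning, to make the segmentation problem above concrete. The function names (`learn_bpe`, `merge_pair`, `apply_bpe`) and the romanized sample words are illustrative assumptions and are not taken from the patent. BPE repeatedly merges the most frequent adjacent symbol pair, so its subword boundaries follow frequency, not morphology, and need not coincide with root and affix boundaries.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties are broken lexicographically for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment one word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Illustrative romanized forms (assumption): root, root + plural affix.
    corpus = ["mektep", "mektepler", "kitab", "kitablar"]
    merges = learn_bpe(corpus, num_merges=8)
    for w in corpus:
        print(w, "->", apply_bpe(w, merges))
```

Running the demo prints the learned segmentation for each sample word; whether a boundary falls between root and affix depends entirely on pair frequencies in the training words.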


Examples

Embodiment Construction

[0068] The present invention will be further described below in conjunction with the accompanying drawings.

[0069] Figure 1 shows the flow chart for building the Chinese-Uyghur translation model. The Chinese-Uyghur translation system comprises a model-building process and a translation process, and it preprocesses the Chinese corpus and the Uyghur corpus differently. After the bilingual parallel corpus is obtained, and before the bilingual statistical translation system is built, a bilingual data preprocessing step provides properly segmented and formatted bilingual data for subsequent processing such as word alignment. The preprocessing depends on the characteristics of each corpus: the Chinese corpus is segmented with Jieba and then further split with BPE coding, while the "r...

Abstract

The invention discloses a neural machine translation system based on Romanized Uyghur. Before the Chinese-Uyghur translation system is built, preprocessing provides properly segmented and formatted bilingual data for subsequent word alignment. Each corpus is preprocessed according to its characteristics: the Chinese corpus is segmented with Jieba and then further split with BPE coding; the Uyghur corpus is first segmented into "root + affix" form, then romanized, and finally processed with BPE coding. A Transformer translation model is trained to obtain the final translation model. In the translation process, the Uyghur input is preprocessed in the same way, namely "root + affix" segmentation followed by romanization; the processed text is then translated with the trained model to obtain the Chinese translation.
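
The Uyghur-side preprocessing order described in the abstract (root + affix segmentation first, then romanization) can be sketched as follows. This is a toy illustration under stated assumptions: the affix list, the letter table, and the function names (`split_root_affix`, `romanize`, `preprocess_uyghur`) are hypothetical, and the patent's actual segmenter and romanization scheme are not given in this text. The BPE step that follows romanization is omitted here.

```python
# Hypothetical letter table (assumption): a handful of Uyghur Arabic-script
# letters mapped to Latin letters. Unicode escapes are used so the exact
# code points are unambiguous.
ROMAN_TABLE = {
    "\u0626": "",   # hamza carrier written before initial vowels, dropped
    "\u0627": "a",
    "\u06d5": "e",
    "\u0649": "i",
    "\u0628": "b",
    "\u067e": "p",
    "\u062a": "t",
    "\u0643": "k",
    "\u0644": "l",
    "\u0645": "m",
    "\u0631": "r",
}

# Hypothetical affix list in Arabic script: the plural suffixes -ler and -lar.
AFFIXES = ["\u0644\u06d5\u0631", "\u0644\u0627\u0631"]


def split_root_affix(word):
    """Split off at most one known suffix, returning [root, affix] or [word]."""
    for affix in AFFIXES:
        if word.endswith(affix) and len(word) > len(affix):
            return [word[: -len(affix)], affix]
    return [word]


def romanize(token):
    """Map each character through the table; unknown characters pass through."""
    return "".join(ROMAN_TABLE.get(ch, ch) for ch in token)


def preprocess_uyghur(sentence):
    """Segment each word into root + affix, then romanize every piece."""
    pieces = []
    for word in sentence.split():
        pieces.extend(romanize(part) for part in split_root_affix(word))
    return pieces


if __name__ == "__main__":
    # "mektepler" (schools) and "kitab" (book), written with escapes.
    sample = "\u0645\u06d5\u0643\u062a\u06d5\u067e\u0644\u06d5\u0631 \u0643\u0649\u062a\u0627\u0628"
    print(preprocess_uyghur(sample))
```

Segmenting before romanizing mirrors the order stated in the abstract; a real system would replace the suffix list with a proper morphological segmenter and pass the romanized pieces on to BPE.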

Description

Technical field

[0001] The invention relates to the technical field of machine translation, and mainly to a neural machine translation system based on Romanized Uyghur.

Background

[0002] Machine translation is a technology that uses computers to automatically convert a source language into a target language. The translation methods in common use today are statistical machine translation and neural machine translation. Neural Machine Translation (NMT) has achieved impressive results in recent years, outperforming traditional Phrase-Based Statistical Machine Translation (PBSMT) methods. State-of-the-art NMT systems rely on encoder-decoder architectures and introduce attention mechanisms to model word alignment; the model encodes the source sentence into a fixed-length vector, which is then decoded word by word to output the target string from the vector representation.

[0003] NMT and Statistical Machine Translation (SMT) models are se...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/58, G06F40/284, G06F40/289, G06F40/242
CPC: G06F40/58, G06F40/284, G06F40/242, G06F40/289, Y02D10/00
Inventor: 王健, 陈昊钰, 陈思宇, 侯潇钰
Owner: NANJING UNIV