Neural machine translation system based on Romanized Uyghur language

A machine translation technology for the Uyghur language, applied in the field of neural machine translation systems. It addresses low translation quality, the inability to segment sentences correctly, and the scarcity of prior research, and it improves translation quality by reducing data sparsity.
CN112507734A · Pending · Publication Date: 2021-03-16 · NANJING UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NANJING UNIV
Publication Date
2021-03-16


Abstract

The invention discloses a neural machine translation system based on Romanized Uyghur. Before the translation system is built, the Chinese-Uyghur bilingual data is word-segmented and put into a proper format for subsequent word alignment. In the preprocessing stage, each corpus is processed according to its own characteristics: the Chinese corpus is segmented with Jieba and then split further with BPE coding; the Uyghur corpus is first segmented into "root + affix" form, then Romanized, and finally processed with BPE coding. A Transformer translation model is then trained on the processed corpora to obtain the final translation model. At translation time, the Uyghur input is preprocessed in the same way, namely "root + affix" segmentation followed by Romanization; the processed text is then translated with the trained model to obtain the Chinese output.
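The Uyghur side of the preprocessing described above (root + affix segmentation, then Romanization, then BPE) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the two-entry affix list, the partial letter table (written in the style of Uyghur Latin Yéziqi), and the toy BPE learner are hypothetical stand-ins, not the patent's segmenter, Romanization table, or BPE tooling.

```python
from collections import Counter

# ASSUMPTION: partial Arabic-script -> Latin table, for illustration only.
# The patent's own Romanization table is not given in this excerpt.
ROMAN_MAP = {
    "\u0627": "a", "\u0628": "b", "\u062a": "t", "\u0631": "r",
    "\u0643": "k", "\u0644": "l", "\u0646": "n", "\u0649": "i",
}

# ASSUMPTION: hypothetical affix list (plural "-lar", accusative "-ni").
AFFIXES = ["\u0644\u0627\u0631", "\u0646\u0649"]


def segment(word, affixes=AFFIXES):
    """Greedily strip known affixes from the end: [root, affix, ...]."""
    found = []
    stripped = True
    while stripped:
        stripped = False
        for affix in affixes:
            # Leave at least two characters for the root.
            if word.endswith(affix) and len(word) >= len(affix) + 2:
                found.insert(0, affix)
                word = word[: -len(affix)]
                stripped = True
                break
    return [word] + found


def romanize(text):
    """Map each character through ROMAN_MAP; unknown characters pass through."""
    return "".join(ROMAN_MAP.get(ch, ch) for ch in text)


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Toy BPE: repeatedly merge the most frequent adjacent symbol pair."""
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = Counter()
        for symbols, freq in vocab.items():
            merged[merge_pair(symbols, best)] += freq
        vocab = merged
    return merges


def apply_bpe(word, merges):
    """Split a word into characters, then replay the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


word = "\u0643\u0649\u062a\u0627\u0628\u0644\u0627\u0631\u0646\u0649"
morphemes = [romanize(m) for m in segment(word)]
print(morphemes)  # ['kitab', 'lar', 'ni']
```

Segmenting before Romanization mirrors the order given in the abstract. A real system would presumably replace the suffix list with a proper morphological analyzer and learn BPE merges on the full Romanized corpus.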

Description

Technical field

[0001] The invention relates to the technical field of machine translation, and mainly relates to a neural machine translation system based on Romanized Uyghur.

Background art

[0002] Machine translation is a technology that uses computers to automatically convert a source language into a target language. The translation methods in common use today are statistical machine translation and neural machine translation. Neural Machine Translation (NMT) has achieved impressive results in recent years, outperforming traditional Phrase-Based Statistical Machine Translation (PBSMT) methods. State-of-the-art NMT systems rely on an encoder-decoder architecture and introduce an attention mechanism to simulate word alignment: the model encodes the source sentence into a fixed-length vector, which is then decoded word by word to output the target string from that vector representation.
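To make the "attention simulates word alignment" idea concrete, the following is a minimal sketch of one decoding step, assuming scaled dot-product scoring (the text above does not say which scoring function is meant). The function names `softmax` and `attend` and the toy vectors are illustrative only. The attention weights form a probability distribution over source positions, which can be read as a soft alignment for the target word being generated.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """One attention step: soft alignment weights and the context vector.

    query:          decoder state for the current target position
    encoder_states: one vector per source token, same width as query
    """
    dim = len(query)
    # ASSUMPTION: scaled dot-product scoring between query and each state.
    scores = [
        sum(q * h for q, h in zip(query, state)) / math.sqrt(dim)
        for state in encoder_states
    ]
    weights = softmax(scores)
    # Context vector: weighted average of the encoder states.
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Two source tokens; the query is most similar to the second one.
weights, context = attend([0.0, 3.0], [[1.0, 0.0], [0.0, 1.0]])
print(weights)
```

In this toy case the second source position receives the larger weight, which is the sense in which attention stands in for an explicit word-alignment model.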

[0003] NMT and Statistical Machine Translation (SMT) models are se...
