
Method for processing out-of-set words in Chinese-Vietnamese neural machine translation integrated with a classification dictionary

A machine translation processing method, applied to the handling of out-of-set words in Chinese-Vietnamese neural machine translation. It addresses problems such as incomplete translation of context and the difficulty of handling out-of-set words effectively, so as to improve translation performance and effect and reduce the difficulty of out-of-set word processing.

Active Publication Date: 2019-11-15
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0004] The present invention provides a method for processing out-of-set words in Chinese-Vietnamese neural machine translation integrated with a classification dictionary. It addresses the following problems: most out-of-set word processing methods do not consider general applicability across languages; relying on other resources to alleviate the problem affects the translation of the words surrounding the out-of-set words, resulting in incomplete context translation; and replacing out-of-set words by means of a single general dictionary runs into the problem of one word having more than one translation, which makes it difficult to handle out-of-set words effectively.
[0005] The present invention studies the characteristics of different out-of-set words, proposes a classification approach, classifies out-of-set words to build a classification dictionary, and integrates the dictionary into the neural machine translation model to handle out-of-set words, which addresses the negative impact of out-of-set words on the translation effect in neural machine translation.



Examples


Embodiment 1

[0043] Embodiment 1: As shown in Figures 1-3, a method for processing out-of-set words in Chinese-Vietnamese neural machine translation integrated with a classification dictionary comprises the following specific steps:

[0044] Step1. Obtain the homepages of Chinese-Vietnamese websites, crawl the Chinese-Vietnamese data using web crawler technology, denoise it, and organize it into a training set, a test set, and a validation set;

[0045] Step2. Construction of the classification dictionary: analyze the characteristics of the out-of-set words, and divide the out-of-set words into three categories to build the classification dictionary;

[0046] The first category is rare words, that is, words outside the regular vocabulary, which are used to build a bilingual dictionary. The construction method is: use GIZA++ to align the corpus, and then exclude ...
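The three-way classification in Step2 can be illustrated with a minimal Python sketch. The category names follow the abstract of this document (rare words, entity words, and rule-based words such as numbers, symbols, times and dates). The function name `classify_oov`, the regular expression, the order of the checks, and the sample entries are all assumptions made for illustration; the source does not specify how each category is detected.

```python
import re

# Hypothetical pattern for "rule" words: ISO-style dates, clock times,
# plain or percentage numbers, and runs of punctuation symbols.
RULE_PATTERN = re.compile(
    r"^(\d{4}-\d{1,2}-\d{1,2}|\d{1,2}:\d{2}|\d+(\.\d+)?%?|[^\w\s]+)$"
)

def classify_oov(word, vocab, entity_dict):
    """Return the assumed category of an out-of-set word, or None if in vocabulary."""
    if word in vocab:
        return None          # in-vocabulary words need no special handling
    if RULE_PATTERN.match(word):
        return "rule"        # numbers, symbols, times, dates
    if word in entity_dict:
        return "entity"      # named entities with a known translation
    return "rare"            # everything else is treated as a rare word

# Illustrative data only; not taken from the patent.
vocab = {"我", "去"}
entity_dict = {"昆明": "Côn Minh"}
for w in ["我", "2019-11-15", "昆明", "饕餮"]:
    print(w, classify_oov(w, vocab, entity_dict))
```

Checking the rule pattern first keeps purely numeric or symbolic tokens out of the dictionaries; a real system might order the checks differently.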


Abstract

The invention relates to a method for processing out-of-set words in Chinese-Vietnamese neural machine translation integrated with a classification dictionary, and belongs to the technical field of natural language processing. Out-of-set words are classified, so that different types of out-of-set words can be processed by different methods. A classification dictionary is built in a targeted manner: a bilingual dictionary is used to solve the translation problem of rare words outside the word list; an entity dictionary is used to solve the problem of inaccurate entity word translation; and a rule dictionary is used to solve the translation problem of numbers, symbols, times, dates and other such words. Then, in the preprocessing stage of the model, out-of-set words are recognized by querying the classification dictionary, and label replacement is performed on the out-of-set words at the encoding end of the model. A translation result with labels is obtained after model translation, and the labels are restored to translations by querying the classification dictionary. By fusing the classification dictionary into neural machine translation, out-of-set words can be translated more accurately, and the performance and effect of the neural machine translation system are improved.
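The label replacement and recovery described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the label format `<OOV0>`, the function names `replace_with_labels` and `restore_labels`, and the sample dictionary entry are hypothetical, and the model's translation output is simulated by hand rather than produced by a neural model.

```python
def replace_with_labels(tokens, vocab, dictionary):
    """Replace out-of-set tokens that have a dictionary entry with numbered labels.

    Returns the labeled token list and a mapping from each label to the
    target-language translation looked up in the dictionary.
    """
    labeled, mapping = [], {}
    for tok in tokens:
        if tok not in vocab and tok in dictionary:
            label = f"<OOV{len(mapping)}>"   # hypothetical label format
            mapping[label] = dictionary[tok]
            labeled.append(label)
        else:
            labeled.append(tok)
    return labeled, mapping

def restore_labels(translated_tokens, mapping):
    """Swap each label in the translated output back to its dictionary translation."""
    return [mapping.get(tok, tok) for tok in translated_tokens]

# Illustrative data only; not taken from the patent.
vocab = {"我", "去"}
dictionary = {"昆明": "Côn Minh"}
source, mapping = replace_with_labels(["我", "去", "昆明"], vocab, dictionary)
# A real system would translate `source` with the NMT model here;
# the output below is simulated for the example.
simulated_output = ["Tôi", "đi", "<OOV0>"]
print(" ".join(restore_labels(simulated_output, mapping)))
```

Numbering the labels lets a sentence carry several out-of-set words without mixing up their translations, provided the model copies each label through to its output.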

Description

Technical field

[0001] The invention relates to a method for processing out-of-set words in Chinese-Vietnamese neural machine translation integrated with a classification dictionary, and belongs to the technical field of natural language processing.

Background art

[0002] Neural machine translation is a machine translation method proposed in recent years. It has achieved good results in resource-rich translation tasks, but its effect in low-resource language translation is not ideal. To control the computational complexity, which grows in proportion to the size of the target vocabulary, most neural machine translation systems limit the vocabulary to only the 30,000 to 80,000 most common words in the parallel data and, when translating, convert out-of-set words into UNK symbols. The obvious problem with this method is that the neural machine translation model cannot effectively translate out-of-set words, and meaningless UNK symbols will increase the ambigui...
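The vocabulary limit and UNK conversion described in the background can be shown with a short Python sketch. The function names `build_vocab` and `apply_unk` and the toy corpus are assumptions for illustration; real systems use vocabularies of tens of thousands of words rather than the tiny limit used here.

```python
from collections import Counter

def build_vocab(sentences, max_size):
    """Keep only the max_size most frequent tokens in the tokenized corpus."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {tok for tok, _ in counts.most_common(max_size)}

def apply_unk(tokens, vocab, unk="UNK"):
    """Replace every token outside the vocabulary with the UNK symbol."""
    return [tok if tok in vocab else unk for tok in tokens]

# Toy corpus; "c" is too rare to make it into a vocabulary of size 2.
corpus = [["a", "b", "a"], ["a", "b", "c"]]
vocab = build_vocab(corpus, max_size=2)
print(apply_unk(["a", "c", "d"], vocab))
```

Every out-of-set word collapses to the same symbol, so the model loses the information needed to translate it, which is the problem the classification dictionary is meant to address.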


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/28, G06F16/35, G06F17/27
CPC: G06F16/35
Inventor: 赖华, 贾承勋, 余正涛, 朱恩昌, 车万金, 文永华, 高盛祥
Owner: KUNMING UNIV OF SCI & TECH