Method for processing out-of-vocabulary words in Chinese-Vietnamese hybrid-network neural machine translation with an integrated classification dictionary

A hybrid-network machine translation technology, applied in the fields of electronic digital data processing and special data processing applications. It addresses the lack of research on out-of-vocabulary words, alleviating the out-of-vocabulary problem and improving translation accuracy and performance.

Active Publication Date: 2019-11-26
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

Prior work made little use of external knowledge such as bilingual dictionaries and did not study the characteristics of out-of-vocabulary words.


Examples


Embodiment 1

[0035] Embodiment 1: As shown in Figures 1-4, the method for processing out-of-vocabulary words in Chinese-Vietnamese hybrid-network neural machine translation with an integrated classification dictionary comprises the following steps:

[0036] Step 1. Construct the classification dictionary: build a classification dictionary according to the categories of out-of-vocabulary words. The constructed dictionary comprises a bilingual dictionary, an entity dictionary, and a rule dictionary.

[0037] The bilingual dictionary is built by processing the Chinese-Vietnamese bilingual corpus with the GIZA++ word alignment tool, excluding words already in the model vocabulary from the alignment results, and adding manually compiled bilingual entries, which yields a bilingual dictionary of 8,735 entries. The entity dictionary is extracted from Wikipedia entries. Based on the linked HTML information, a total of 18,741 entity entries are extracted, including 6,418 person name entities, 2,934 pl...
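The bilingual-dictionary step in [0037] can be sketched as a simple filter. This is a minimal illustration, not the patent's implementation: it assumes the word alignments have already been extracted into (source, target) pairs, and the names `build_bilingual_dictionary`, `aligned_pairs`, `model_vocab`, and `manual_entries` are hypothetical. Parsing the output of GIZA++ itself is out of scope here.

```python
def build_bilingual_dictionary(aligned_pairs, model_vocab, manual_entries=()):
    """Build a source-to-target dictionary for words outside the model vocabulary.

    aligned_pairs: iterable of (source_word, target_word) pairs, assumed to
        come from a word aligner (hypothetical pre-extracted format).
    model_vocab: set of source words the translation model already covers.
    manual_entries: extra hand-compiled (source_word, target_word) pairs.
    """
    dictionary = {}
    for src, tgt in aligned_pairs:
        # Words the model already knows are excluded from the dictionary.
        if src in model_vocab:
            continue
        # Keep the first translation seen for each source word.
        dictionary.setdefault(src, tgt)
    # Manually added entries take precedence over aligned ones.
    for src, tgt in manual_entries:
        dictionary[src] = tgt
    return dictionary


pairs = [("昆明", "Côn Minh"), ("我", "tôi")]
bilingual = build_bilingual_dictionary(
    pairs, model_vocab={"我"}, manual_entries=[("越南", "Việt Nam")]
)
```

Keeping only the first aligned translation per word is a simplifying choice for this sketch; a real dictionary might rank candidates by alignment frequency instead.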



Abstract

The invention relates to a method for processing out-of-vocabulary words in Chinese-Vietnamese hybrid-network neural machine translation with an integrated classification dictionary, and belongs to the technical field of neural machine translation for resource-scarce languages. The method comprises the following steps: first, a classification dictionary is constructed; next, the segmented source-language sentences are merged by scanning and looking up the classification dictionary to recover the regular phrases in the sentences, and the phrases are then labeled using the encoder of RNNSearch; finally, a hybrid network decoder containing a word-level mode and a phrase mode is constructed, in which a gating unit decides which decoding mode to use during decoding, and the final translation is generated. By integrating the classification dictionary and constructing the hybrid network, the method effectively alleviates the out-of-vocabulary problem in machine translation of resource-scarce languages and improves translation accuracy.
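The phrase-merging step mentioned in the abstract can be illustrated with a greedy longest-match scan over the segmented tokens. This is a hedged sketch under stated assumptions: the abstract does not specify the scanning strategy, so longest-match is a stand-in, and `merge_phrases`, `phrase_dict`, and `max_len` are hypothetical names. The category attached to each merged phrase comes straight from the dictionary here and is not the encoder-based labeling the abstract describes.

```python
def merge_phrases(tokens, phrase_dict, max_len=4):
    """Merge adjacent segmented tokens that together form a dictionary phrase.

    tokens: list of segmented source-language words.
    phrase_dict: maps a phrase string to a category label (assumed format).
    max_len: maximum number of tokens tried for one phrase.
    Returns a list of (text, label) pairs; unmatched tokens get "WORD".
    """
    merged = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest span first, down to spans of two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + n])
            if candidate in phrase_dict:
                match = (candidate, phrase_dict[candidate], n)
                break
        if match:
            phrase, label, n = match
            merged.append((phrase, label))
            i += n
        else:
            merged.append((tokens[i], "WORD"))
            i += 1
    return merged


result = merge_phrases(
    ["我", "在", "昆明", "理工", "大学"], {"昆明理工大学": "ENTITY"}
)
```

In this example the last three tokens are recombined into the single entity phrase 昆明理工大学, while the first two tokens pass through unchanged.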

Description

Technical field

[0001] The invention relates to a method for processing out-of-vocabulary words in Chinese-Vietnamese hybrid-network neural machine translation with an integrated classification dictionary, and belongs to the technical field of neural machine translation for resource-scarce languages.

Background

[0002] At present, to control the computational complexity that grows in proportion to the size of the target vocabulary, most neural machine translation systems limit the vocabulary to the 30,000 to 80,000 most common words in the source-language and target-language corpora. Words outside this vocabulary are called out-of-set words (out-of-vocabulary words). Out-of-set words have long been a research hotspot in neural machine translation and have a large impact on translation performance, so how to handle them remains a major research direction.

[0003] In the neural machine translation of resource-scarce languages, ther...
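The vocabulary limit described in paragraph [0002] can be illustrated with a small frequency cutoff. This is a minimal sketch, assuming a tokenized corpus; the names `build_vocab` and `mark_oov` and the `<unk>` placeholder are illustrative choices, not taken from the patent.

```python
from collections import Counter


def build_vocab(sentences, max_size):
    """Keep only the max_size most frequent tokens of a tokenized corpus."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {tok for tok, _ in counts.most_common(max_size)}


def mark_oov(sentence, vocab, unk="<unk>"):
    """Replace every token outside the vocabulary with a placeholder."""
    return [tok if tok in vocab else unk for tok in sentence]


corpus = [["a", "b", "a"], ["a", "b", "c"]]
vocab = build_vocab(corpus, max_size=2)  # "c" is the rarest token and is cut
marked = mark_oov(["a", "c"], vocab)
```

With a real system the cutoff would be in the range of tens of thousands of words, as [0002] states; every token replaced by the placeholder is an out-of-set word that the translation model cannot generate or read directly.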


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/28, G06F17/27, G06F16/36
CPC: G06F16/374
Inventors: 余正涛, 徐毓, 赖华, 郭军军, 车万金, 王红斌, 线岩团
Owner: KUNMING UNIV OF SCI & TECH