A Chinese-Vietnamese Hybrid Network Neural Machine Translation Method for Out-of-Set Word Processing Integrating a Classification Dictionary

A hybrid-network machine translation technology, applied in natural language translation, electronic digital data processing, and special-purpose data processing applications. It addresses the lack of research on out-of-set words, alleviating the out-of-set word problem and improving translation accuracy.

Active Publication Date: 2022-07-19
KUNMING UNIV OF SCI & TECH
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

Prior work has made little use of external knowledge such as bilingual dictionaries, and has not studied the characteristics of out-of-set words.



Examples


Embodiment 1

[0035] Example 1: As shown in Figures 1-4, the Chinese-Vietnamese hybrid network neural machine translation method for processing out-of-set words, integrated with a classification dictionary, comprises the following steps:

[0036] Step 1. Construct the classification dictionary according to the categories of out-of-set words. The resulting classification dictionary comprises a bilingual dictionary, an entity dictionary, and a rule dictionary;

[0037] The bilingual dictionary is built by running the GIZA++ word alignment tool over the Chinese-Vietnamese bilingual corpus, removing words already in the model vocabulary from the alignment results, and adding some manually compiled bilingual entries, yielding a bilingual dictionary of 8,735 entries. The entity dictionary is extracted from Wikipedia entries; based on the linked HTML information, a total of 18,741 entity dictionary entries are extrac...
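The alignment-filtering step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the inventors' implementation: parsing GIZA++ output is omitted and the aligned word pairs are assumed to be already extracted as `(Chinese, Vietnamese)` tuples. Keeping the most frequently aligned translation per source word, letting manual entries override aligned ones, and the name `build_bilingual_dict` are all hypothetical choices.

```python
from collections import Counter, defaultdict


def build_bilingual_dict(aligned_pairs, model_vocab, manual_entries):
    """Hypothetical sketch of the bilingual-dictionary step.

    aligned_pairs:  iterable of (zh_word, vi_word) links, assumed to be
                    extracted from GIZA++ alignments beforehand.
    model_vocab:    set of source words already in the NMT vocabulary.
    manual_entries: dict of hand-compiled zh -> vi entries.
    """
    counts = defaultdict(Counter)
    for zh, vi in aligned_pairs:
        # Skip words the model vocabulary already covers; keep only out-of-set words.
        if zh not in model_vocab:
            counts[zh][vi] += 1
    # For each source word, keep its most frequently aligned translation.
    dictionary = {zh: c.most_common(1)[0][0] for zh, c in counts.items()}
    # Add the manually compiled entries (assumed here to take precedence).
    dictionary.update(manual_entries)
    return dictionary
```

For example, with links `("河内", "Hà Nội")` twice and `("河内", "Hanoi")` once, the sketch keeps `"Hà Nội"` for `河内`, while any word in `model_vocab` never enters the dictionary.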


Abstract

The invention relates to a method for processing out-of-set words in Chinese-Vietnamese hybrid network neural machine translation integrated with a classification dictionary, and belongs to the technical field of neural machine translation for low-resource languages. The invention first constructs a classification dictionary. It then scans the segmented source-language sentences and looks them up in the classification dictionary, merging split tokens to restore the regular phrases in each sentence, and uses the RNNSearch encoder to label these phrases. Finally, a gating unit is used to build a hybrid network decoder containing a word-level mode and a phrase mode; during decoding, the gate decides which mode to use, and the decoder generates the final translation. By integrating the classification dictionary and constructing a hybrid network, the invention alleviates the out-of-set word problem in machine translation for low-resource languages and improves translation accuracy.
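The two stages described in the abstract (phrase restoration by dictionary scanning, then a gate choosing between word and phrase decoding) can be sketched as follows. The patent text here does not specify the scanning algorithm or the gate's inputs, so this sketch assumes greedy forward maximum matching for merging and a single sigmoid gate with a hard 0.5 threshold. The names `merge_phrases`, `choose_mode`, `max_len`, and the gate weights are hypothetical; in a trained system the gate parameters would presumably be learned jointly with the decoder rather than set by hand.

```python
import math


def merge_phrases(tokens, known_phrases, max_len=4):
    """Greedy forward maximum matching over segmented tokens (assumed algorithm).

    Re-joins adjacent tokens whose concatenation is a dictionary entry, so a
    phrase the word segmenter split apart is restored as one unit.
    Returns the merged tokens and a flag per token marking dictionary phrases.
    """
    merged, flags, i = [], [], 0
    while i < len(tokens):
        # Try the longest span first and shrink until a dictionary hit;
        # a single token is always accepted as a fallback, so the loop advances.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = "".join(tokens[i:i + n])  # Chinese needs no separator
            if n == 1 or candidate in known_phrases:
                merged.append(candidate)
                flags.append(candidate in known_phrases)
                i += n
                break
    return merged, flags


def choose_mode(state, weights, bias):
    """Hypothetical gating unit: g = sigmoid(w . s + b) is the phrase-mode probability."""
    z = sum(w * s for w, s in zip(weights, state)) + bias
    g = 1.0 / (1.0 + math.exp(-z))
    # Hard decision: phrase mode if g >= 0.5, otherwise word-level mode.
    return ("phrase" if g >= 0.5 else "word"), g
```

For instance, `merge_phrases(["我", "在", "昆明", "理工", "大学", "学习"], {"昆明理工大学"})` restores `昆明理工大学` as a single unit and flags only that token; the flags stand in for the phrase labels the encoder would attach.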

Description

Technical Field

[0001] The invention relates to a method for processing out-of-set words in Chinese-Vietnamese hybrid network neural machine translation integrated with a classification dictionary, and belongs to the technical field of neural machine translation for low-resource languages.

Background

[0002] To control the computational complexity, which grows in proportion to the size of the target vocabulary, most neural machine translation systems restrict the vocabulary to the 30,000 to 80,000 most common words in the source- and target-language corpora; all other words are called out-of-set (out-of-vocabulary) words. Out-of-set words have a large impact on translation performance, and handling them has long been a major research topic in neural machine translation.

[0003] In the neural machine translation of resource-scarce languages, there is litt...
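The vocabulary restriction described in [0002] can be illustrated with a toy sketch: keep the K most frequent words and map every other word to a single unknown symbol. The `<unk>` token and the function names are assumptions for illustration, and K is tiny here; the systems described above use K in the 30,000 to 80,000 range.

```python
from collections import Counter


def build_vocab(corpus_tokens, size):
    """Keep the `size` most frequent words; every other word is out-of-set."""
    counts = Counter(corpus_tokens)
    return {word for word, _ in counts.most_common(size)}


def map_oov(tokens, vocab, unk="<unk>"):
    """Replace out-of-set words with one UNK symbol, as a fixed-vocabulary model sees them."""
    return [t if t in vocab else unk for t in tokens]
```

With the corpus `["的", "的", "的", "越南", "越南", "中国", "河内"]` and `size=2`, the vocabulary is `{"的", "越南"}`, so `河内` becomes `<unk>`. This information loss is what the classification dictionary is meant to recover.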


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F40/58; G06F40/242; G06F40/289; G06F16/36
CPC: G06F16/374
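Inventors: 余正涛 (Yu Zhengtao), 徐毓 (Xu Yu), 赖华 (Lai Hua), 郭军军 (Guo Junjun), 车万金 (Che Wanjin), 王红斌 (Wang Hongbin), 线岩团 (Xian Yantuan)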
Owner KUNMING UNIV OF SCI & TECH