
Synonym data enhancement-based Chinese-Vietnamese neural machine translation method

A machine translation and synonym technology, applied in natural language translation, digital data processing, natural language data processing, etc. It addresses the low performance of neural machine translation, with the effects of raising translation performance, improving translation quality, and achieving good recognition.

Pending Publication Date: 2020-09-18
KUNMING UNIV OF SCI & TECH
3 Cites, 5 Cited by

AI Technical Summary

Problems solved by technology

[0003] The invention provides a Chinese-Vietnamese neural machine translation method based on synonym data enhancement, which solves the problem of low neural machine translation performance caused by the scarcity of Chinese-Vietnamese parallel corpus resources.

Method used



Examples


Embodiment 1

[0029] Embodiment 1: As shown in Figures 1-3, the specific steps of the Chinese-Vietnamese neural machine translation method based on synonym data enhancement are as follows:

[0030] Step 1. First construct a vocabulary V over the source-language side of the training corpus, and then add words with a frequency of N (1≤NR ;
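Step 1 amounts to counting word frequencies on the source side and picking out the rare words. A minimal sketch follows, assuming whitespace-tokenized (already word-segmented) input. The exact frequency bound in the text above is garbled, so the hypothetical parameter `max_freq` stands in for the upper bound; the selection rule 1 ≤ N ≤ `max_freq` is an assumption, not the patent's definition.

```python
from collections import Counter


def build_vocab(sentences):
    """Build vocabulary V: token -> frequency over the source side of the corpus.

    Assumes each sentence is already word-segmented and space-separated.
    """
    counts = Counter()
    for sent in sentences:
        counts.update(sent.split())
    return counts


def low_frequency_words(vocab, max_freq):
    """Return words whose frequency N satisfies 1 <= N <= max_freq.

    max_freq is a hypothetical stand-in for the (garbled) bound in the source.
    """
    return {word for word, n in vocab.items() if 1 <= n <= max_freq}


if __name__ == "__main__":
    corpus = ["我 喜欢 苹果", "我 喜欢 香蕉", "他 讨厌 苹果"]
    vocab = build_vocab(corpus)
    print(sorted(low_frequency_words(vocab, max_freq=1)))
```

With `max_freq=1` the demo selects the three words that occur only once (他, 讨厌, 香蕉); in practice the threshold would be tuned to the corpus size.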

[0031] Step 2. Find synonyms of low-frequency words through monolingual word vectors: in a monolingual semantic space, the similarity between two words can be judged from the distance between them. The present invention therefore uses large-scale Chinese and Vietnamese monolingual corpora, trains monolingual word vectors to obtain a feature-vector representation of each word, and judges the similarity of two words by calculating the cosine of the angle between their vectors, then ...
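The similarity judgment in Step 2 can be sketched as cosine similarity followed by a nearest-neighbour search over the trained word vectors. This is a minimal illustration, assuming the vectors are already available as a plain dictionary (the source does not name the training tool); the parameters `top_k` and `min_sim` are hypothetical, not values from the patent.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v: (u . v) / (|u| |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # undefined for zero vectors; treat as no similarity
    return dot / (norm_u * norm_v)


def find_synonyms(word, embeddings, top_k=3, min_sim=0.5):
    """Return up to top_k (candidate, similarity) pairs, most similar first.

    embeddings: dict mapping each word to its monolingual word vector.
    top_k and min_sim are hypothetical tuning parameters.
    """
    if word not in embeddings:
        return []
    target = embeddings[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    # Keep only sufficiently similar words, ranked by descending similarity.
    scored = [(w, s) for w, s in scored if s >= min_sim]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    toy_vectors = {"快乐": [1.0, 0.0], "高兴": [0.9, 0.1], "悲伤": [-1.0, 0.0]}
    print(find_synonyms("快乐", toy_vectors))
```

The toy two-dimensional vectors only illustrate the ranking; real word vectors would have hundreds of dimensions and a vocabulary large enough that an approximate nearest-neighbour index may be preferable to this linear scan.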



Abstract

The invention relates to a synonym data enhancement-based Chinese-Vietnamese neural machine translation method and belongs to the technical field of natural language processing. The scarcity of Chinese-Vietnamese parallel corpus resources greatly limits the quality of Chinese-Vietnamese machine translation, and data enhancement is an effective way to improve it. In the method, a synonym list for the low-frequency words of the language on one side is first obtained through monolingual word vector learning; synonym replacement is then performed on those low-frequency words, and the replaced sentences are screened with a language model; finally, the screened sentences are matched with sentences in the language on the other side to obtain an extended parallel corpus. The method provides strong support for expanding the Chinese-Vietnamese parallel corpus and solves the problem of low Chinese-Vietnamese neural machine translation performance caused by scarce corpus resources.
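The replace-screen-match pipeline in the abstract can be sketched end to end. This is an illustration under stated assumptions, not the patent's implementation: the source does not specify the language model, so a toy add-one-smoothed bigram model stands in for it; the screening rule (keep a candidate whose perplexity is at most `max_ppl_ratio` times the original's) is hypothetical; and "matching" is interpreted as pairing each kept sentence with the original, unchanged target sentence.

```python
import math
from collections import Counter


class BigramLM:
    """Toy add-one-smoothed bigram language model (stand-in for the unspecified LM)."""

    def __init__(self, sentences):
        self.context = Counter()  # counts of each token used as a bigram history
        self.bigrams = Counter()  # counts of (history, next-token) pairs
        predictable = set()       # tokens that can be predicted, incl. the end marker
        for sent in sentences:
            tokens = ["<s>"] + sent.split() + ["</s>"]
            self.context.update(tokens[:-1])
            self.bigrams.update(zip(tokens[:-1], tokens[1:]))
            predictable.update(tokens[1:])
        self.vocab_size = len(predictable)

    def perplexity(self, sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        log_prob = 0.0
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            # Add-one smoothing: P(cur | prev) = (c(prev, cur) + 1) / (c(prev) + V)
            p = (self.bigrams[(prev, cur)] + 1) / (self.context[prev] + self.vocab_size)
            log_prob += math.log(p)
        return math.exp(-log_prob / (len(tokens) - 1))


def augment(pairs, synonyms, lm, max_ppl_ratio=1.0):
    """Create extra (source, target) pairs by synonym replacement on the source side.

    synonyms is assumed to map only low-frequency words to their synonym lists,
    so only those words are ever replaced. A candidate is kept when its perplexity
    is at most max_ppl_ratio times the original's (hypothetical screening rule)
    and is paired with the original, unchanged target sentence.
    """
    new_pairs = []
    for src, tgt in pairs:
        tokens = src.split()
        base_ppl = lm.perplexity(src)
        for i, tok in enumerate(tokens):
            for syn in synonyms.get(tok, []):
                candidate = " ".join(tokens[:i] + [syn] + tokens[i + 1:])
                if lm.perplexity(candidate) <= max_ppl_ratio * base_ppl:
                    new_pairs.append((candidate, tgt))
    return new_pairs


if __name__ == "__main__":
    lm = BigramLM(["我 很 高兴", "我 很 高兴", "我 很 快乐", "我 吃 苹果"])
    pairs = [("我 很 快乐", "tôi rất vui")]
    synonyms = {"快乐": ["高兴", "苹果"]}
    print(augment(pairs, synonyms, lm))
```

Running the demo prints `[('我 很 高兴', 'tôi rất vui')]`: the fluent substitution 高兴 passes the language-model screen, while the implausible candidate 苹果 raises perplexity and is discarded. A real system would use a language model trained on large monolingual data and could perform replacement on either the Chinese or the Vietnamese side.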

Description

Technical field

[0001] The invention relates to a Chinese-Vietnamese neural machine translation method based on synonym data enhancement, and belongs to the technical field of natural language processing.

Background art

[0002] In Chinese and Vietnamese natural language processing, a high-quality Chinese-Vietnamese parallel corpus is the foundation, prerequisite and mainstay of Chinese-Vietnamese neural machine translation. The quality and scale of the parallel corpus directly affect translation performance, so a high-quality Chinese-Vietnamese parallel corpus must be built to ensure the quality and performance of follow-up work. At present, data enhancement methods in machine translation fall into two main types: vocabulary replacement and back translation. Since Chinese and Vietnamese are low-resource languages, bilingual dictionaries are...

Claims


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F40/58, G06F40/247, G06F40/205, G06F40/279
CPC: G06F40/58, G06F40/247, G06F40/205, G06F40/279
Inventors: Gao Shengxiang (高盛祥), You Congcong (尤丛丛), Yu Zhengtao (余正涛), Mao Cunli (毛存礼), Pan Runhai (潘润海)
Owner KUNMING UNIV OF SCI & TECH