
A Chinese-Vietnamese Neural Machine Translation Method Fused with Syntactic Parsing Trees

A machine translation technology that uses syntactic information, applied in the fields of neural architectures, natural language translation, and biological neural network models. It addresses the poor performance of Chinese-Vietnamese neural machine translation models caused by an insufficient corpus, and improves translation fluency and accuracy, with high robustness and generalization ability.

Active Publication Date: 2020-08-28
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0004] The present invention provides a Chinese-Vietnamese neural machine translation method fused with syntactic parse trees, to solve the poor performance of Chinese-Vietnamese neural machine translation models caused by an insufficient bilingual parallel corpus.



Examples


Embodiment 1

[0032] Embodiment 1: As shown in Figures 1-4, the Chinese-Vietnamese neural machine translation method fused with a syntactic parse tree proceeds through the following specific steps:

[0033] Model building process:

[0034] Step 1. From the 146K parallel sentence pairs collected through Internet crawling and manual translation, 144K pairs were randomly selected as the training and development sets for training the translation model, and the remaining 2K pairs were used as the test set for evaluating the experimental results;
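The random split in Step 1 can be illustrated with a minimal sketch. The function name `split_corpus`, the fixed seed, and the toy corpus (146 pairs standing in for 146K) are assumptions made for illustration; the source does not say how the random selection was implemented or how the 144K pairs were divided between training and development.

```python
import random

def split_corpus(pairs, test_size, seed=0):
    """Randomly hold out `test_size` sentence pairs for testing (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(pairs)      # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]

# Toy stand-in for the crawled and manually translated parallel corpus.
corpus = [(f"zh sentence {i}", f"vi sentence {i}") for i in range(146)]
train_dev, test = split_corpus(corpus, test_size=2)
print(len(train_dev), len(test))
```

Fixing the seed keeps the held-out test set stable across runs, so results from different model variants stay comparable.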

[0035] Step 2. Preprocessing of the training corpus: first, segment the Chinese text with a Chinese word segmentation tool; then tokenize, lowercase, and clean all training data; finally, retain only sentence pairs shorter than 80 words;
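A rough sketch of the cleaning and length filtering in Step 2 follows. It assumes the Chinese side has already been segmented into space-separated words (the source does not name the segmentation tool), treats "clean" as whitespace normalization plus dropping empty sides, and applies the 80-word limit to both sides of a pair. All three choices, and the names `clean` and `preprocess`, are assumptions.

```python
import re

MAX_LEN = 80  # Step 2 keeps sentence pairs shorter than 80 words

def clean(sentence):
    """Lowercase and collapse whitespace; assumes tokens are space-separated."""
    return re.sub(r"\s+", " ", sentence).strip().lower()

def preprocess(pairs, max_len=MAX_LEN):
    """Keep pairs whose two sides are both non-empty and shorter than max_len."""
    kept = []
    for src, tgt in pairs:
        src, tgt = clean(src), clean(tgt)
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if 0 < n_src < max_len and 0 < n_tgt < max_len:
            kept.append((src, tgt))
    return kept

pairs = [
    ("我 爱  越南", "Tôi yêu Việt Nam"),  # kept after normalization
    ("好 " * 80, "tốt"),                  # 80 source words: dropped
    ("", "x"),                            # empty source side: dropped
]
result = preprocess(pairs)
print(result)
```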

[0036] Step 3. Use Stanford's Chinese syntactic parsing model (ChinesePCFG) to parse the Chinese sentences and obtain Chinese syntactic parse trees, as shown in Figure 2. Use the Vietnamese p...
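To make the parse-tree output of Step 3 concrete, the sketch below reads a Penn-style bracketed tree string and lists each word with the label directly above it. The bracketed format, the hand-written toy tree, and the helper names `read_tree` and `leaves_with_tags` are assumptions for illustration; the sketch does not call the Stanford parser, and the source does not describe how the tree is consumed downstream.

```python
def read_tree(text):
    """Parse a Penn-style bracketed string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())      # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return parse()

def leaves_with_tags(tree):
    """Return (word, tag) pairs, where tag is the label directly above each word."""
    label, children = tree
    out = []
    for child in children:
        if isinstance(child, str):
            out.append((child, label))
        else:
            out.extend(leaves_with_tags(child))
    return out

# Hand-written toy tree, not actual parser output.
tree = read_tree("(ROOT (IP (NP (PN 我)) (VP (VV 爱) (NP (NR 越南)))))")
tagged = leaves_with_tags(tree)
print(tagged)
```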


Abstract

The invention relates to a Chinese-Vietnamese neural machine translation method fused with a syntactic parse tree, and belongs to the technical field of natural language processing. The invention supports machine translation in both directions, Chinese to Vietnamese and Vietnamese to Chinese. A Chinese-Vietnamese bilingual parallel corpus, built by crawling the Internet and by human translation, serves as the training data set. To reduce the translation errors caused by insufficient training data in current Chinese-Vietnamese machine translation, the invention first performs word segmentation, part-of-speech tagging, and syntactic parsing on the source language to obtain its syntax tree. The syntactic tags are then vectorized and fused into the encoding process during training of the machine translation model. The resulting model can translate efficiently between Chinese and Vietnamese. Experimental results show that, compared with a baseline system without the syntactic parse tree, the translations produced by this method are more fluent and the BLEU score improves by 0.6.
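As a rough illustration of vectorizing syntactic tags and fusing them into the encoder input, the sketch below concatenates a word vector with a tag vector for each token. The abstract does not state the fusion operation, so concatenation is an assumption, as are the embedding sizes, the random stand-in vectors (a real model would learn them), and the names `make_embedding_table` and `fuse`.

```python
import random

def make_embedding_table(vocab, dim, seed):
    """Assign each symbol a fixed random vector (stand-in for learned embeddings)."""
    rng = random.Random(seed)
    return {sym: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for sym in vocab}

def fuse(tagged_tokens, word_emb, tag_emb):
    """Concatenate each word vector with the vector of its syntactic tag."""
    return [word_emb[w] + tag_emb[t] for w, t in tagged_tokens]

# Toy (word, tag) pairs; the tags are illustrative.
tagged_tokens = [("我", "PN"), ("爱", "VV"), ("越南", "NR")]
word_emb = make_embedding_table([w for w, _ in tagged_tokens], dim=4, seed=1)
tag_emb = make_embedding_table([t for _, t in tagged_tokens], dim=2, seed=2)

encoder_inputs = fuse(tagged_tokens, word_emb, tag_emb)
print(len(encoder_inputs), len(encoder_inputs[0]))
```

Concatenation keeps the word and tag signals in separate dimensions, at the cost of a wider encoder input; summing equal-sized vectors would be another option.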

Description

Technical field

[0001] The invention relates to a Chinese-Vietnamese neural machine translation method fused with a syntactic parse tree, and belongs to the technical field of natural language processing.

Background art

[0002] Machine translation is the process of using a computer to automatically convert one language into another, and it is an active and difficult research topic in natural language processing. There are currently two main types of machine translation technology: statistical machine translation and neural machine translation. Statistical machine translation builds a translation model by statistically analyzing a large parallel corpus. In recent years, with the rise of deep learning, the performance of machine translation models built with deep-learning-based neural machine translation methods has improved significantly. Neural machine translation is a machine translation method proposed by Sutsk...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/58, G06F40/47, G06F40/211, G06N3/04
CPC: G06N3/049, G06F40/211, G06F40/47, G06F40/58
Inventor: 余正涛, 王振晗, 高盛祥, 何健雅琳, 文永华
Owner: KUNMING UNIV OF SCI & TECH