Chinese-Vietnamese neural machine translation method fusing syntactic parse trees

A machine translation technique that incorporates syntax, classified under neural architectures and biological neural network models. It addresses the poor performance of Chinese-Vietnamese neural machine translation models caused by an insufficient corpus, and aims for high robustness and generalization ability, improved fluency and accuracy, and more accurate translation.

Active Publication Date: 2019-10-25
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0004] The present invention provides a Chinese-Vietnamese neural machine translation method fusing syntactic parse trees, to solve the poor performance of Chinese-Vietnamese neural machine translation models caused by an insufficient bilingual parallel corpus.



Examples


Embodiment 1

[0032] Embodiment 1: As shown in Figures 1-4, the Chinese-Vietnamese neural machine translation method fusing syntactic parse trees comprises the following specific steps:

[0033] Model building process:

[0034] Step 1. From the 146K parallel sentence pairs collected through Internet crawling and manual translation, 144K pairs were randomly selected as the training and development sets for training the translation model, and the remaining 2K pairs were used as the test set for evaluating the experimental results;
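The random split in Step 1 can be sketched as follows. This is a minimal illustration and not the inventors' code: the function name `split_corpus`, the fixed seed, and the toy corpus (146 placeholder pairs standing in for 146K) are all assumptions made for the example.

```python
import random

def split_corpus(pairs, test_size, seed=0):
    """Shuffle sentence pairs and hold out `test_size` of them for testing."""
    if test_size > len(pairs):
        raise ValueError("test_size exceeds corpus size")
    shuffled = list(pairs)                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)       # seeded for a reproducible split
    return shuffled[test_size:], shuffled[:test_size]

# Toy stand-in for the corpus: hypothetical (Chinese, Vietnamese) placeholders.
toy_pairs = [(f"zh-{i}", f"vi-{i}") for i in range(146)]
train_dev, test = split_corpus(toy_pairs, test_size=2)
print(len(train_dev), len(test))  # 144 2
```

The training/development portion would still need to be divided further; the visible text does not state that ratio, so it is left out here.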

[0035] Step 2. Preprocessing of the training corpus: first, segment the Chinese text with a Chinese word segmentation tool; then tokenize, lowercase, and clean all training data; finally, retain only sentence pairs shorter than 80 words;
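A rough sketch of the Step 2 cleaning pass is given below. It assumes the Chinese side has already been segmented into space-separated words (the source does not name the segmentation tool), and the helper names `normalize` and `preprocess` as well as the punctuation rule are illustrative choices, not the inventors' pipeline.

```python
import re

MAX_LEN = 80  # the source keeps sentence pairs shorter than 80 words

def normalize(sentence):
    """Lowercase, split ASCII punctuation off words, and tokenize on whitespace."""
    sentence = sentence.lower()
    sentence = re.sub(r'([.,!?;:()"])', r" \1 ", sentence)
    return sentence.split()

def preprocess(pairs, max_len=MAX_LEN):
    """Drop empty pairs and pairs where either side has max_len or more tokens."""
    kept = []
    for zh, vi in pairs:
        zh_tokens, vi_tokens = normalize(zh), normalize(vi)
        if not zh_tokens or not vi_tokens:
            continue  # cleaning: one side is empty
        if len(zh_tokens) >= max_len or len(vi_tokens) >= max_len:
            continue  # length filter
        kept.append((zh_tokens, vi_tokens))
    return kept

raw_pairs = [
    ("我 爱 越南 。", "Tôi yêu Việt Nam."),
    ("", "x"),                            # removed: empty source side
    (" ".join(["词"] * 80), "dài"),        # removed: 80 tokens is not < 80
]
cleaned = preprocess(raw_pairs)
print(len(cleaned))  # 1
```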

[0036] Step 3. Use Stanford's Chinese syntactic parsing model (ChinesePCFG) to parse the Chinese sentences and obtain a Chinese syntactic parse tree, as shown in Figure 2. Use the Vietnamese p...
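To illustrate what a parse tree contributes per word, the sketch below reads a bracketed tree string and lists each word with its part-of-speech tag and enclosing phrase label. The tree is a hand-written example using Penn Chinese Treebank-style labels, not actual parser output, and the functions `parse_bracketed` and `word_labels` are hypothetical helpers; the sketch does not call the Stanford parser itself.

```python
def parse_bracketed(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # leaf word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree

def word_labels(tree, parent=None):
    """Yield (word, POS tag, parent phrase label) for each leaf, left to right."""
    label, children = tree
    for child in children:
        if isinstance(child, str):
            yield (child, label, parent)
        else:
            yield from word_labels(child, parent=label)

# Hand-written example tree (an assumption for illustration).
example = "(ROOT (IP (NP (PN 我)) (VP (VV 爱) (NP (NR 越南)))))"
for item in word_labels(parse_bracketed(example)):
    print(item)
```

Each printed triple pairs a word with syntactic labels that could later be vectorized alongside the word itself.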


Abstract

The invention relates to a Chinese-Vietnamese neural machine translation method fusing syntactic parse trees, and belongs to the technical field of natural language processing. It supports both Chinese-to-Vietnamese and Vietnamese-to-Chinese machine translation. A Chinese-Vietnamese bilingual parallel corpus constructed through Internet crawling and manual translation is used as the training data set. The method aims to reduce translation errors caused by insufficient training corpus in current Chinese-Vietnamese machine translation. The method comprises the following steps: performing word segmentation, part-of-speech tagging and syntactic parsing on the source language to obtain its syntactic tree; then vectorizing the syntactic labels and fusing the resulting vectors into the encoding process during training of the machine translation model. The resulting model can effectively translate between Chinese and Vietnamese. Experimental results show that, compared with a baseline system that does not fuse the syntactic parse tree, the translations produced by the method are more fluent and the BLEU score improves by 0.6.
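The abstract describes vectorizing syntactic labels and fusing them into the encoder. One simple way to picture this is to concatenate a label embedding onto each word embedding before encoding. The sketch below assumes concatenation as the fusion operation, random vectors in place of learned embeddings, and arbitrary dimensions of 8 and 4; the visible text does not confirm any of these choices.

```python
import random

def build_embedding(vocab, dim, rng):
    """Map each symbol to a small random vector (stand-in for learned weights)."""
    return {sym: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for sym in sorted(vocab)}

def fuse(words, labels, word_emb, label_emb):
    """Concatenate each word vector with the vector of its syntactic label."""
    if len(words) != len(labels):
        raise ValueError("need exactly one label per word")
    return [word_emb[w] + label_emb[t] for w, t in zip(words, labels)]

rng = random.Random(0)
words = ["我", "爱", "越南"]
labels = ["PN", "VV", "NR"]          # one syntactic label per source word
word_emb = build_embedding(set(words), dim=8, rng=rng)
label_emb = build_embedding(set(labels), dim=4, rng=rng)

encoder_inputs = fuse(words, labels, word_emb, label_emb)
print(len(encoder_inputs), len(encoder_inputs[0]))  # 3 12
```

Concatenation keeps the word and label information in separate dimensions; summing equal-sized vectors would be another plausible option.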

Description

Technical field

[0001] The invention relates to a Chinese-Vietnamese neural machine translation method fusing syntactic parse trees, and belongs to the technical field of natural language processing.

Background

[0002] Machine translation is the process of using a computer to automatically convert one language into another, and it is an active and difficult problem in natural language processing. Two main types of machine translation technology exist today: statistical machine translation and neural machine translation. Statistical machine translation builds a translation model by statistically analyzing a large parallel corpus. In recent years, with the rise of deep learning, neural machine translation methods have significantly improved the performance of machine translation models. Neural machine translation is a machine translation method proposed by Sutsk...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/28, G06F17/27, G06N3/04
CPC: G06N3/049, G06F40/211, G06F40/47, G06F40/58
Inventor: 余正涛, 王振晗, 高盛祥, 何健雅琳, 文永华
Owner: KUNMING UNIV OF SCI & TECH