An English word and case joint prediction method based on neural machine translation

A machine translation and capitalization technique in the field of machine translation. It addresses three problems: extra processing steps and time overhead, neglect of the source-side corpus, and interference with the restoration of word case information. Its effects are a smaller vocabulary, fewer model parameters and improved translation quality.

Active Publication Date: 2019-01-11
BEIJING UNIV OF TECH
Cites: 0; Cited by: 23

AI Technical Summary

Problems solved by technology

The above methods all operate on a single corpus. The case of the target translation is restored only after translation is complete, which adds processing steps and time overhead.
These methods also ignore the source-side corpus, so an inaccurate translation result severely interferes with the recovery of word case information.

Method used




Embodiment Construction

[0027] 1) The parallel corpus is the English-Chinese machine translation evaluation corpus of the 2017 China Workshop on Machine Translation (CWMT). After denoising, deduplication and removal of ill-formed sentences, 7 million sentence pairs were obtained. The training data set contains a Chinese corpus and an English corpus, and each Chinese sentence in the Chinese corpus corresponds to one English translation in the English corpus. The case of English words is divided into four categories: a) other, b) all lowercase, c) first letter uppercase, d) all uppercase.
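The four case categories in [0027] can be illustrated with the minimal Python sketch below. Several details are assumptions not stated in the patent text: the tag names, the integer ids, whitespace tokenization, and the rule that a lone "I" counts as "first letter uppercase". The inverse function `restore_case` (a hypothetical helper) shows why a lowercase word plus its tag is enough for three of the categories. The "other" category is lossy, because the tag alone cannot rebuild forms such as "iPhone".

```python
from enum import IntEnum


class CaseTag(IntEnum):
    """The four case categories of [0027]; the integer ids are hypothetical."""
    OTHER = 0        # a) anything else: mixed case ("iPhone"), digits, punctuation
    LOWER = 1        # b) all lowercase ("translation")
    CAPITALIZED = 2  # c) first letter uppercase ("Beijing")
    UPPER = 3        # d) all uppercase ("CWMT")


def case_tag(word: str) -> CaseTag:
    """Assign one of the four case categories to a single English token."""
    if word.islower():
        return CaseTag.LOWER
    # Uppercase first letter with a lowercase remainder; a lone "I" also lands here (assumption).
    if word[:1].isupper() and word[1:] == word[1:].lower():
        return CaseTag.CAPITALIZED
    if word.isupper():
        return CaseTag.UPPER
    return CaseTag.OTHER


def restore_case(word: str, tag: CaseTag) -> str:
    """Rebuild a surface form from a lowercase word and its predicted case tag."""
    if tag == CaseTag.CAPITALIZED:
        return word[:1].upper() + word[1:]
    if tag == CaseTag.UPPER:
        return word.upper()
    # LOWER and OTHER are returned unchanged; OTHER cannot be recovered from the tag alone.
    return word


def tag_sentence(sentence: str) -> tuple[list[str], list[CaseTag]]:
    """Split on whitespace and return lowercased tokens plus one case tag per token."""
    tokens = sentence.split()
    return [t.lower() for t in tokens], [case_tag(t) for t in tokens]
```

For example, `tag_sentence("The CWMT corpus is used in Beijing")` yields the lowercased tokens together with the tag sequence CAPITALIZED, UPPER, LOWER, LOWER, LOWER, LOWER, CAPITALIZED. Applying `restore_case` token by token reproduces the original sentence.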

[0028] From the English corpus, a case label is created for each word, which forms an English label corpus. Each word corresponds to one case tag, so each English translation corresponds to a sequence of case tags. The whole English corpus is then converted to lowercase, the frequency of each English word in the corpus is counted, and the words are arranged in descending order from high frequency ...
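The lowercasing and frequency-ranked vocabulary of [0028] can be sketched roughly as follows. The sketch also covers the conversion of sentences into id vectors that the abstract mentions. The excerpt is truncated before the remaining details, so several choices here are hypothetical: the special tokens, the `max_size` cut-off, and alphabetical tie-breaking between equally frequent words.

```python
from collections import Counter

# Hypothetical special tokens; the patent excerpt does not specify them.
SPECIALS = ["<pad>", "<unk>", "<s>", "</s>"]


def build_vocab(english_sentences: list[str], max_size: int) -> dict[str, int]:
    """Lowercase the corpus, count word frequencies and keep the most frequent words.

    max_size is a hypothetical cut-off that includes the special tokens.
    """
    counts = Counter(w for s in english_sentences for w in s.lower().split())
    # Descending frequency; ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    for word, _ in ranked[: max(0, max_size - len(SPECIALS))]:
        vocab[word] = len(vocab)
    return vocab


def to_ids(sentence: str, vocab: dict[str, int]) -> list[int]:
    """Map a sentence to word ids: lowercase first, then use <unk> for out-of-vocabulary words."""
    unk = vocab["<unk>"]
    return [vocab.get(w, unk) for w in sentence.lower().split()]
```

Because "The" and "the" (or "Cat" and "cat") collapse into a single entry, case variants no longer occupy separate vocabulary slots. This is where the smaller vocabulary claimed in the abstract comes from. The case information is carried instead by the tag sequences built from the same corpus.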



Abstract

The invention discloses a method for jointly predicting English words and their case based on neural machine translation. It mainly comprises the following steps:

- building a training data set and a vocabulary;
- converting the training data set into a vector training set according to the vocabulary;
- training the translation model, using the sum of the word prediction loss and the word-case prediction loss as the model's total prediction loss;
- stopping training when the total loss no longer decreases;
- translating Chinese with the trained translation model;
- after translation, restoring each word in the translation to its proper form according to the English translation and the corresponding word-case attribute information, which yields the final translation.

The method reduces both the vocabulary size and the number of model parameters, and it also improves translation quality.
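The abstract's training objective (word loss plus word-case loss) and its stopping rule can be sketched in plain Python for a single target position. Some parts of this sketch are assumptions: a shared decoder state that feeds two linear output heads, the names `w_word` and `w_case`, and the `patience` value. The patent excerpt states only that the two losses are added and that training stops once the total loss no longer decreases.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def linear(weights: list[list[float]], h: list[float]) -> list[float]:
    """One score per weight row: a hypothetical output head over decoder state h."""
    return [sum(w * x for w, x in zip(row, h)) for row in weights]


def joint_step_loss(h: list[float], w_word: list[list[float]], w_case: list[list[float]],
                    word_id: int, case_id: int) -> float:
    """Word cross-entropy plus case cross-entropy at one target position.

    Both heads read the same decoder state h (an assumed design choice).
    """
    word_loss = -log_softmax(linear(w_word, h))[word_id]
    case_loss = -log_softmax(linear(w_case, h))[case_id]
    return word_loss + case_loss


def should_stop(loss_history: list[float], patience: int = 3) -> bool:
    """Stop when the total loss has not improved over the last `patience` checks (hypothetical)."""
    if len(loss_history) <= patience:
        return False
    best_before = min(loss_history[:-patience])
    return min(loss_history[-patience:]) >= best_before
```

In a full model, the per-position losses would be averaged over the sentence and the batch. The unweighted sum mirrors the abstract, which does not mention a weighting factor between the two losses.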

Description

Technical field
[0001] The invention relates to a machine translation method, and in particular to a method for jointly predicting English words and their case in Chinese-to-English translation.
Background
[0002] Driven by practical applications, machine translation has attracted considerable research attention in recent years. Before neural methods, the mainstream approach to machine translation was statistical machine translation. Deep learning and neural networks have recently advanced rapidly in image processing and have exceeded human performance on classification tasks, and neural network methods are quickly spreading to other fields. In 2014, Jacob Devlin proposed a neural network joint model that achieved significant improvements over traditional statistical machine translation methods. This year, Hany and colleagues at Microsoft applied neural machine translation and, for the first time, surpassed the quality of human translation. [0...

Claims


Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/28; G06F17/27
CPC: G06F40/284; G06F40/58
Inventors: Zhang Nan (张楠), Jin Xiaoning (靳晓宁)
Owner: BEIJING UNIV OF TECH