Neural network machine translation corpus extension method based on statistical phrase table

A corpus extension technology for machine translation, applied in the fields of computer applications and machine translation. It addresses the difficulty of obtaining satisfactory translation results, alleviating the adverse effects involved and improving evaluation metrics.

Inactive Publication Date: 2018-08-03
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

For languages where bilingual parallel resources are relatively scarce, it is difficult to obtain satisfactory results by applying neural networks to translation.



Examples


Embodiment 1

[0053] This embodiment describes the flow of the method of the present invention and its specific examples.

[0054] Figure 1 is a flow chart of the statistical phrase table-based neural network machine translation corpus expansion method of the present invention, as applied in this embodiment.

[0055] As Figure 1 shows, the present invention comprises two stages: 1) the training set expansion stage and 2) the model training stage.

[0056] Take Uyghur-to-Chinese translation as an example, where Uyghur is the source language and Chinese is the target language.

[0057] 1) Training set expansion stage:

[0058] Step 1. Preprocess the original training set according to Definition 1, Definition 2, Definition 3, Definition 4, and Definition 5. The specific preprocessing procedure varies with the source and target languages. Its purpose is to standardize the training set. Among them, the preprocessing process of th...

Embodiment 2

[0070] The training set of the Uyghur-Chinese news translation task provided by CWMT2017 is randomly split into a training set, a development set, and test set 1. In addition, the development set of the Uyghur-Chinese news translation evaluation task provided by CWMT2017 is used as test set 2. With the original training set, development set, test set data, and neural machine translation model held the same, the method of the present invention is compared with a neural machine translation model training method that does not use it, with BLEU computed over Chinese characters as the evaluation index. The following experimental results are obtained.

[0071] Table 1. Comparison of BLEU values before and after applying the training set expansion method proposed by the present invention

[0072]

[0073] The experimental results in Table 1 show that: in the case of the same training set, development set and test set data, the BLEU evaluation ...
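
The character-based BLEU index used in this embodiment can be illustrated with a minimal sketch. It assumes a single reference per sentence, corpus-level counts, n-grams up to length 4, and no smoothing; the function names `char_tokens` and `char_bleu` are hypothetical, and this is not necessarily the scoring script used for the CWMT2017 evaluation.

```python
import math
from collections import Counter
from typing import List


def char_tokens(sentence: str) -> List[str]:
    """Split a sentence into characters, dropping whitespace."""
    return [ch for ch in sentence if not ch.isspace()]


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU over characters, one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = 0
    ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = char_tokens(hyp), char_tokens(ref)
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: each n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, any zero precision makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

Scoring at the character level avoids any dependence on a Chinese word segmenter, so two systems can be compared without segmentation differences affecting the score.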


Abstract

The invention discloses a neural network machine translation corpus extension method based on a statistical phrase table, belonging to the technical field of machine translation. The method is intended for neural network machine translation and effectively extends the scale of a corpus starting from the original machine translation training set. It comprises two stages: a training set extension stage and a model training stage. In the first stage, a phrase table is learned from the original training set through a statistical machine learning method, and the phrase table and the original training set are fused into a new extended training set according to a certain filtering rule. In the second stage, a neural machine translation model is trained: it is first pre-trained on the extended training set, then trained on the original training set for optimization and adjustment, yielding the final model. Experimental results show that, compared with a machine translation model that does not use the corpus extension method, the BLEU evaluation index increases markedly.
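
The two stages described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the phrase table is taken to be an in-memory list of (source phrase, target phrase, probability) entries, and the filtering rule (a probability threshold plus a minimum phrase length) is hypothetical, since the text here says only that "a certain filtering rule" is applied. The names `extend_training_set` and `training_schedule` are likewise invented for the example.

```python
from typing import List, Tuple

Pair = Tuple[str, str]                # (source sentence or phrase, target)
PhraseEntry = Tuple[str, str, float]  # (source phrase, target phrase, probability)


def extend_training_set(
    original: List[Pair],
    phrase_table: List[PhraseEntry],
    min_prob: float = 0.5,  # hypothetical threshold, not from the patent
    min_len: int = 2,       # hypothetical minimum source length in tokens
) -> List[Pair]:
    """Stage 1: fuse filtered phrase pairs with the original training set."""
    seen = set(original)
    extended = list(original)
    for src, tgt, prob in phrase_table:
        if prob < min_prob:
            continue  # drop low-confidence phrase pairs
        if len(src.split()) < min_len:
            continue  # drop very short phrases
        pair = (src, tgt)
        if pair in seen:
            continue  # skip duplicates
        seen.add(pair)
        extended.append(pair)
    return extended


def training_schedule(
    original: List[Pair], extended: List[Pair]
) -> List[Tuple[str, List[Pair]]]:
    """Stage 2: pre-train on the extended set, then fine-tune on the original."""
    return [("pretrain", extended), ("finetune", original)]


# Placeholder tokens stand in for real Uyghur-Chinese data.
original = [("a b c", "x y z")]
phrase_table = [("a b", "x y", 0.9), ("a", "x", 0.95), ("b c", "y z", 0.2)]
extended = extend_training_set(original, phrase_table)
schedule = training_schedule(original, extended)
```

Returning the schedule as an ordered list keeps the sketch independent of any particular neural machine translation toolkit; a real training loop would consume each stage in turn.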

Description

Technical field

[0001] The invention relates to a neural network machine translation corpus extension method based on a statistical phrase table, and belongs to the technical fields of computer application and machine translation.

Background art

[0002] Machine translation is a technology that uses computers to automatically translate one language (the source language) into another language (the target language).

[0003] With the development of artificial neural networks and deep learning, neural network machine translation based on deep learning (hereinafter referred to as neural machine translation) has made important achievements in recent years. Neural machine translation requires less linguistic knowledge and manual intervention, needs less space for model storage, and produces smooth and natural translation output. Neural machine translation is generally considered the best choice for translation tasks rich in bilingual resources. At pre...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/28, G06N3/08
CPC: G06F40/58, G06N3/08
Inventor: 黄河燕, 史学文, 鉴萍, 唐翼琨
Owner: BEIJING INSTITUTE OF TECHNOLOGY