Data-enhanced machine translation method based on similar word and synonym replacement

A machine translation method based on similar-word and synonym replacement, applied in the field of natural language processing or conversion. It addresses the difficulty of obtaining bilingual parallel corpora and the resulting shortage of training data and poor performance of neural machine translation models, and it improves translation quality.

Active Publication Date: 2018-11-30
GLOBAL TONE COMM TECH

AI Technical Summary

Problems solved by technology

[0004] (1) Large-scale, high-quality bilingual parallel corpora are difficult to obtain, and constructing them by human translation is costly.
[0005] (2) Lack of large-scale, high-quality bilingual parallel corpus, resulting in ins



Examples


Embodiment Construction

[0035] To make the object, technical solution and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with the examples. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.

[0036] As shown in Figure 1, the data-enhanced machine translation method based on similar-word and synonym replacement provided by the embodiment of the present invention includes the following steps:

[0037] S101: Exploit the property that word vectors eventually cluster well to obtain a higher-quality similar-word table and synonym table;

[0038] S102: Use the word vectors obtained while training on a large language to construct the similar-word table and synonym table, and then replace similar words and synonyms in the scarce small language;

[0039] S103: Expand the parallel corpus o...
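Steps S101 and S102 can be illustrated with a minimal sketch. The source does not specify how the tables are derived from the word vectors or how replacement is applied to sentence pairs, so everything below is an assumption made for illustration: the cosine-similarity criterion, the `top_k` and `min_sim` parameters, the toy vectors, the function names, and the choice to leave the target side unchanged when a source word is replaced.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_similar_table(vectors, top_k=2, min_sim=0.9):
    """Map each word to its nearest neighbours in vector space.

    Hypothetical criterion: keep at most top_k neighbours whose cosine
    similarity is at least min_sim. The patent text does not give one.
    """
    table = {}
    for word, vec in vectors.items():
        scored = [(cosine(vec, other_vec), other)
                  for other, other_vec in vectors.items() if other != word]
        scored = [(s, w) for s, w in scored if s >= min_sim]
        scored.sort(key=lambda p: (-p[0], p[1]))  # best first, ties by word
        table[word] = [w for _, w in scored[:top_k]]
    return table

def augment(pairs, table):
    """Expand a parallel corpus by single-word replacement on the source side.

    Assumption: the target sentence is reused unchanged for each variant.
    """
    out = list(pairs)
    for src, tgt in pairs:
        tokens = src.split()
        for i, tok in enumerate(tokens):
            for sub in table.get(tok, []):
                new_src = " ".join(tokens[:i] + [sub] + tokens[i + 1:])
                out.append((new_src, tgt))
    return out

# Toy vectors invented for this example; real ones would come from training.
TOY_VECTORS = {
    "big": [0.9, 0.1, 0.0],
    "large": [0.88, 0.12, 0.01],
    "house": [0.1, 0.9, 0.2],
    "home": [0.12, 0.88, 0.22],
    "runs": [0.0, 0.2, 0.95],
}

table = build_similar_table(TOY_VECTORS)
corpus = [("big house", "TARGET_1")]  # placeholder target-side sentence
augmented = augment(corpus, table)
```

With these toy vectors, the one-pair corpus grows to three pairs, one per single-word substitution. A real pipeline would also need to decide whether replacements should be mirrored on the target side, which this sketch does not attempt.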



Abstract

The invention belongs to the technical field of natural language processing or conversion, and discloses a data-enhanced machine translation method based on similar-word and synonym replacement. The property that word vectors eventually cluster well is exploited to obtain a high-quality similar-word table and synonym table. The similar-word table and the synonym table are constructed from the word vectors obtained while training on a large language, and similar words and synonyms in a scarce small language are replaced. The parallel corpus of the small language is thereby expanded, and a neural machine translation model for the small language is trained using a neural network with an encoder-decoder structure and an attention mechanism. Because the training data is expanded, the parameters of the neural translation model can be learned well from sufficient data, and the problem of out-of-vocabulary (unregistered) words in neural machine translation is alleviated, so the translation quality of the model improves. Once the translation quality of the entire network on a development set no longer improves significantly, the network parameters are considered well learned.
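The stopping criterion in the last sentence of the abstract can be sketched as a simple check on development-set scores. The source names neither the quality metric nor any threshold, so the `patience` and `min_delta` parameters, the function name, and the BLEU-like scores below are all hypothetical.

```python
def should_stop(dev_scores, patience=3, min_delta=0.1):
    """Return True when the last `patience` evaluations gained less than
    `min_delta` over the best score seen before them.

    dev_scores: development-set quality after each evaluation, higher is
    better. Both thresholds are illustrative assumptions.
    """
    if len(dev_scores) <= patience:
        return False  # not enough history to judge
    best_before = max(dev_scores[:-patience])
    recent_best = max(dev_scores[-patience:])
    return recent_best - best_before < min_delta

# Hypothetical score histories: one that has plateaued, one still improving.
plateaued = should_stop([10.0, 15.0, 18.0, 18.02, 18.05, 18.01])
improving = should_stop([10.0, 15.0, 18.0, 19.0])
```

Here `plateaued` evaluates to `True` and `improving` to `False`. In practice the thresholds would be tuned to the metric and evaluation frequency.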

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing or conversion, and in particular relates to a data-enhanced machine translation method based on similar-word and synonym replacement.

Background

[0002] At present, the techniques commonly used in the industry are as follows. With the improvement of computing power and the application of big data, deep learning has been applied more widely, and Neural Machine Translation (NMT) based on deep learning has attracted more and more attention. As a research hotspot of artificial intelligence, machine translation has very important scientific research value and practical value. In the NMT field, one of the most commonly used translation models is the encoder-decoder model with an attention mechanism. The main idea is to encode the sentence to be translated (referred to as the "source sentence" hereinafter) into a vector representation through an...


Application Information

IPC(8): G06F17/28
CPC: G06F40/58
Inventor: 汪一鸣, 熊德意, 秦文杰, 程国艮
Owner: GLOBAL TONE COMM TECH