A Method for Improving the Quality of English-Chinese Machine Translation Based on Data Selection

This data selection technique for machine translation addresses two problems: long training times, and translation results that fall short of the best achievable for a specific domain. It reduces the amount of training data, which saves training time and storage space.

Active Publication Date: 2020-04-14
GLOBAL TONE COMM TECH

AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is to provide a method for improving the quality of English-Chinese machine translation based on data selection. It aims to solve two problems. First, most current machine translation systems require a large amount of training time over the whole training process, as well as a large amount of disk space to store data and models. Second, a translation system trained on a large amount of multi-domain training data cannot achieve the best translation results for a specific domain.




Embodiment Construction

[0034] To make the object, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the examples. The specific embodiments described here serve only to explain the present invention and do not limit it.

[0035] The application principle of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0036] As shown in Figure 1, the method for improving the quality of English-Chinese machine translation based on data selection provided by an embodiment of the present invention includes:

[0037] S101: Re-express the data in a bag-of-words representation.

[0038] S102: Use the cosine measure to express the distance between sentences, and then obtain the final...
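Steps S101 and S102 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the lowercase whitespace tokenization, the use of raw term counts as vector weights, and the function names `bag_of_words` and `cosine_similarity` are all assumptions made for illustration. The sketch computes cosine similarity, so a higher value means the two sentences are closer.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Map a sentence to a token -> count vector.

    Assumption: lowercase whitespace tokens and raw counts; the source
    does not state the tokenization or the weighting scheme.
    """
    return Counter(sentence.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors.

    Returns 0.0 when either vector is empty.
    """
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example sentences, not taken from the patent.
in_domain = bag_of_words("the patient received a dose of the drug")
candidate = bag_of_words("the drug dose was reduced")
unrelated = bag_of_words("stock prices fell sharply today")

print(cosine_similarity(in_domain, candidate))  # shares "the", "drug", "dose"
print(cosine_similarity(in_domain, unrelated))  # no shared tokens
```

Because the vectors are sparse dictionaries, the dot product only visits tokens that occur in the first sentence, which keeps the comparison cheap even for a large vocabulary.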



Abstract

The invention discloses a method for improving English-Chinese machine translation quality based on data selection. The method re-represents the data in bag-of-words form; expresses the distance between sentences using cosine computation and obtains a final score for each sentence pair from the cosine correlation; and sorts the general-domain data by score, selects the relevant data, and uses it to train the machine translation system. The method reduces the time and storage costs of training a statistical machine translation system, and it uses less training data than a system trained on multi-domain general data. Because the selected data and the test data come from the same domain and are closely related in content, a system trained on the selected data can in theory outperform a machine translation system trained on all of the data.
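The score, sort, and select stage described in the abstract might look like the following Python sketch. Several details are assumptions, because the source does not give them: each general-domain sentence pair is scored by the cosine similarity between its English side and a single aggregate bag of words built from in-domain sentences, and the cut-off is a fixed count `keep`. The names `bow`, `cosine`, and `select_pairs` and the example sentences are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    # Assumption: lowercase whitespace tokens with raw counts.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(c * b[t] for t, c in a.items() if t in b)
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_pairs(general_pairs, in_domain_sentences, keep):
    """Rank (english, chinese) pairs by similarity to the target domain.

    Assumption: one aggregate bag of words stands in for the domain,
    and only the English side of each pair is scored.
    """
    domain = Counter()
    for sentence in in_domain_sentences:
        domain.update(bow(sentence))
    scored = [(cosine(bow(en), domain), en, zh) for en, zh in general_pairs]
    # Highest score first, then keep the top `keep` pairs for training.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:keep]


general_pairs = [
    ("the stock market fell today", "今天股市下跌"),
    ("the doctor adjusted the drug dose", "医生调整了药物剂量"),
    ("the patient received the drug", "患者接受了该药物"),
]
in_domain_sentences = [
    "the patient received a dose of the drug",
    "the doctor examined the patient",
]

selected = select_pairs(general_pairs, in_domain_sentences, keep=2)
for score, en, zh in selected:
    print(f"{score:.3f}\t{en}\t{zh}")
```

In this sketch the finance sentence still gets a non-zero score because it shares the function word "the" with the domain data. A real pipeline would probably need stop-word handling or a weighting scheme such as TF-IDF, neither of which is confirmed by the source.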

Description

Technical field

[0001] The invention belongs to the technical field of data selection, in particular to a method for improving the quality of English-Chinese machine translation based on data selection.

Background technique

[0002] With the introduction of the IBM statistical model, statistical-based machine translation methods have gradually replaced rule-based translation methods and become the mainstream machine translation methods at this stage. The basic idea is to use statistical methods to automatically learn translation knowledge from large-scale bilingual corpora and build translation models.

[0003] In traditional statistical machine translation, the quality of the corpus directly determines the quality of the final translation system. In this era of information explosion, the information on the Internet is growing exponentially, and it also provides a large amount of monolingual or bilingual corpus for machine translation.

[0004] Theoretically, as the amount...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/58
CPC: G06F40/58
Inventor: 程国艮, 汪一鸣
Owner: GLOBAL TONE COMM TECH