Training method and device of a machine translation model

A training method for machine translation models, applied in the field of electronic technology, that addresses low translation accuracy by reducing over-learning of punctuation marks and avoiding over-fitting.

Active Publication Date: 2019-07-05
BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD

AI Technical Summary

Problems solved by technology

[0004] The embodiments of the present invention provide a training method and device for a machine translation model, which solve the prior-art technical problem of low translation accuracy in machine translation models and achieve the technical effect of improving the translation accuracy of the machine translation model.

Examples

Embodiment 1

[0086] This embodiment provides a training method for a machine translation model which, as shown in Figure 1, includes:

[0087] Step S101: Obtain the first bilingual sentence pair.

[0088] Specifically, the first bilingual sentence pair includes a sentence to be translated (also called a "source sentence") and a translation sentence corresponding to the sentence to be translated (also called a "target sentence").

[0089] In the specific implementation process, when training the machine translation model, a large number of bilingual sentence pairs are needed, wherein each bilingual sentence pair includes a sentence to be translated and a translation sentence corresponding to the sentence to be translated.

[0090] In a specific implementation process, the sentence to be translated may be a sentence in any language, and the translation sentence may be a sentence in any other language different from that of the sentence to be translated.
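As a minimal sketch of the data structure described in paragraphs [0088] to [0090], a bilingual sentence pair can be held as a small immutable record. The class name `BilingualSentencePair`, its field names, and the example sentences are illustrative assumptions and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BilingualSentencePair:
    """One training example (hypothetical type, not named in the source)."""

    source: str  # sentence to be translated ("source sentence")
    target: str  # corresponding translation ("target sentence")

    def __post_init__(self):
        # Reject pairs in which either side is empty or whitespace only.
        if not self.source.strip() or not self.target.strip():
            raise ValueError("both sentences must be non-empty")


# Illustrative Chinese-to-English pair; any two different languages would do.
pair = BilingualSentencePair(
    source="今天天气很好。",
    target="The weather is nice today.",
)
```

A training corpus is then simply a large collection of such records.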

[0091] For example, during the training ...

Embodiment 2

[0129] Based on the same inventive concept, this embodiment provides a machine translation model training device 200, including:

[0130] An acquisition unit 201, configured to acquire the first bilingual sentence pair;

[0131] A deletion unit 202, configured to delete the punctuation marks in the first bilingual sentence pair according to a preset probability to obtain a second bilingual sentence pair;

[0132] The training unit 203 is configured to use the second bilingual sentence pair to train the machine translation model.
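The acquisition and deletion units above can be sketched roughly as follows. This is one possible reading, not the patent's implementation: the visible text does not say whether the preset probability is applied per sentence pair or per punctuation mark, nor which marks are affected. The sketch assumes a per-pair decision that strips trailing sentence-ending marks from both sides. The names `SENTENCE_END_MARKS`, `delete_punctuation`, and `build_training_pairs`, and the default probability of 0.3, are hypothetical.

```python
import random

# Assumed set of sentence-ending marks; the source does not enumerate them.
SENTENCE_END_MARKS = "。.!?！？"


def delete_punctuation(source, target, probability, rng):
    """With the preset probability, strip trailing sentence-ending marks
    from both sentences of a pair; otherwise return the pair unchanged."""
    if rng.random() < probability:
        source = source.rstrip().rstrip(SENTENCE_END_MARKS)
        target = target.rstrip().rstrip(SENTENCE_END_MARKS)
    return source, target


def build_training_pairs(first_pairs, probability=0.3, seed=0):
    """Turn 'first' bilingual sentence pairs into 'second' pairs.

    The result would then be handed to whatever training routine the
    model uses; that step is outside the scope of this sketch.
    """
    rng = random.Random(seed)  # seeded for reproducible augmentation
    return [delete_punctuation(s, t, probability, rng) for s, t in first_pairs]
```

With a probability strictly between 0 and 1, the model sees both punctuated and unpunctuated versions of the sentence pattern, which is the property the method relies on to reduce over-learning of punctuation.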

[0133] As an optional implementation manner, the first bilingual sentence pair includes:

[0134] A sentence to be translated, and a translation sentence corresponding to the sentence to be translated.

[0135] As an optional implementation manner, the deletion unit is specifically used for:

[0136] According to a preset probability, delete the first punctuation mark in the sentence to be translated to obtain a second bilingual sentence pair; or ...

Embodiment 3

[0150] Based on the same inventive concept, this embodiment provides a training device for a machine translation model, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the following steps:

[0151] Obtaining a first bilingual sentence pair; deleting punctuation marks in the first bilingual sentence pair according to a preset probability to obtain a second bilingual sentence pair; and using the second bilingual sentence pair to train a machine translation model.

[0152] As an optional embodiment, the first bilingual sentence pair includes:

[0153] A sentence to be translated, and a translation sentence corresponding to the sentence to be translated.

[0154] As an optional embodiment, deleting the punctuation marks in the first bilingual sentence pair according to a preset probability to obtain the second bilingual sentence pair includes:

[0155] Accord...

Abstract

The invention discloses a training method for a machine translation model. The method comprises the steps of obtaining a first bilingual sentence pair; deleting punctuation marks in the first bilingual sentence pair according to a preset probability to obtain a second bilingual sentence pair; and training a machine translation model by using the second bilingual sentence pair. The technical effects of reducing over-learning of punctuation marks by the machine translation model and improving translation accuracy are achieved. The invention further discloses a training device for the machine translation model.

Description

Technical field

[0001] The invention relates to the field of electronic technology, in particular to a training method and device for a machine translation model.

Background art

[0002] With the accumulation of a large number of bilingual sentence pairs, the improvement of computer hardware computing power, and the advancement of machine translation algorithms, the performance of machine translation has been greatly improved.

[0003] For some languages, bilingual sentence pairs generally have sentence-ending punctuation. For example, most sentences in Chinese end with "。", and most sentences in English end with ".". Because such punctuation is very common in source sentences and always appears at the end of the sentence, it forms a very clear pattern. When such bilingual sentence pairs are used to train the machine translation model, the model often over-learns the punctuation at the end of the sentence. ...
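To make the pattern described in the background concrete, the following sketch measures how often both sides of a corpus end with a sentence-ending mark. It is a diagnostic added for illustration only and is not part of the patented method. The function name `final_punctuation_rate`, the set `END_MARKS`, and the toy corpus are assumptions.

```python
# Assumed sentence-ending marks for Chinese and English text.
END_MARKS = ("。", ".", "!", "?", "！", "？")


def final_punctuation_rate(pairs):
    """Return the fraction of (source, target) pairs in which both
    sentences end with one of the assumed sentence-ending marks."""
    if not pairs:
        return 0.0
    hits = sum(
        1
        for source, target in pairs
        if source.rstrip().endswith(END_MARKS)
        and target.rstrip().endswith(END_MARKS)
    )
    return hits / len(pairs)


# Toy corpus: three of the four pairs end with punctuation on both sides.
corpus = [
    ("你好。", "Hello."),
    ("谢谢。", "Thank you."),
    ("早上好", "Good morning"),
    ("再见。", "Goodbye."),
]
rate = final_punctuation_rate(corpus)
```

A rate close to 1.0 on a real corpus would indicate the strong end-of-sentence regularity that the background section identifies as the cause of over-learning.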

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/28
CPC: G06F40/44, G06F40/58, Y02D10/00
Inventor: 施亮亮, 王宇光, 姜里羊, 阳家俊, 李响, 卫林钰, 陈伟
Owner: BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD