Method and device for generating Chinese-Thai bilingual corpus based on multistage translation model

A technology relating to translation models and corpora, applied in the field of text translation, that addresses insufficient translation accuracy and improves the quality of the synthesized corpus.

Active Publication Date: 2020-05-19
IOL WUHAN INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

Current training methods do not achieve sufficiently high translation accuracy.

Examples

Detailed Description of the Embodiments

[0045] To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments, without creative effort, fall within the protection scope of the present invention.

[0046] Figure 1 is a schematic flow diagram of the method for generating a Chinese-Thai bilingual corpus based on the multi-level translation model according to an embodiment of the present invention. As shown in Figure 1, the method includes:

[0047] S101. Obtain original Chinese sentences and original Thai sentences;

[0048] In the embodiment of the pr...

Abstract

The embodiment of the invention provides a method and device for generating a Chinese-Thai bilingual corpus based on a multistage translation model. The method comprises the following steps: obtaining original Chinese sentences and original Thai sentences; inputting the original Chinese sentences into a pre-trained first two-stage translation model and outputting Thai translation sentences; and inputting the original Thai sentences into a pre-trained second two-stage translation model and outputting Chinese translation sentences. The first and second two-stage translation models are obtained by joint training, in both translation directions, on a high-resource Chinese-English bilingual corpus, a high-resource English-Thai bilingual corpus and a low-resource Chinese-Thai bilingual corpus. According to the embodiment of the invention, a translation model capable of generating a Chinese-Thai bilingual corpus is obtained even when only the Chinese-English and English-Thai bilingual corpora are available, and this model is then jointly trained in both translation directions on the low-resource Chinese-Thai bilingual corpus, which improves the model's performance so that it synthesizes the corpus better.
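The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration only, assuming that each two-stage model pivots through English; the names `compose_two_stage`, `synthesize_pairs` and `lookup` are hypothetical, and the dictionary lookups stand in for trained neural models. The joint training step itself is not shown.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A translator is modeled as any function from a sentence to a sentence.
Translator = Callable[[str], str]


def compose_two_stage(first: Translator, second: Translator) -> Translator:
    """Chain two translators through a pivot language (assumed: English)."""
    def translate(sentence: str) -> str:
        return second(first(sentence))
    return translate


def synthesize_pairs(
    zh_originals: Iterable[str],
    th_originals: Iterable[str],
    zh_to_th: Translator,
    th_to_zh: Translator,
) -> List[Tuple[str, str]]:
    """Return (Chinese, Thai) pairs; one side of each pair is model output."""
    pairs = [(zh, zh_to_th(zh)) for zh in zh_originals]
    pairs += [(th_to_zh(th), th) for th in th_originals]
    return pairs


def lookup(table: Dict[str, str]) -> Translator:
    """Hypothetical stand-in for a trained model: plain dictionary lookup."""
    return lambda sentence: table.get(sentence, "<unk>")


# Toy single-stage "models" (assumptions for illustration only).
zh_en = lookup({"你好": "hello"})
en_th = lookup({"hello": "สวัสดี"})
th_en = lookup({"ขอบคุณ": "thank you"})
en_zh = lookup({"thank you": "谢谢"})

# First two-stage model: Chinese -> English -> Thai.
zh_to_th = compose_two_stage(zh_en, en_th)
# Second two-stage model: Thai -> English -> Chinese.
th_to_zh = compose_two_stage(th_en, en_zh)

corpus = synthesize_pairs(["你好"], ["ขอบคุณ"], zh_to_th, th_to_zh)
```

Each original sentence keeps its human-written side, and only the other side of the pair is synthetic, so the resulting corpus covers both translation directions.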

Description

Technical field

[0001] The present invention relates to the technical field of text translation, and more specifically, to a method and device for generating a Chinese-Thai bilingual corpus based on a multi-level translation model.

Background

[0002] Training a high-quality machine translation model often requires a bilingual parallel corpus on the scale of millions of entries. However, for language pairs with relatively scarce resources, such as Chinese and Thai, building a machine translation model is often a major challenge.

[0003] To solve this problem, NLP (Natural Language Processing) engineers often synthesize data to generate more bilingual data, and then use the synthetic corpora to train machine translation models. Because high-quality monolingual corpora are easy to obtain in large quantities, the main current synthesis methods rely on a large monolingual corpus and a small bilingual corpus. For...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/58, G06F40/289, G06N3/04, G06N3/08
CPC: G06N3/084, G06N3/044, G06N3/045, Y02D10/00
Inventor: 张睦
Owner: IOL WUHAN INFORMATION TECH CO LTD