Machine translation model and pseudo-professional parallel corpus determination method, system and device

A machine translation and parallel corpus technology in the information field, which addresses the limited amount of pseudo-professional parallel corpus and the limited improvement in the translation quality of neural machine translation models.

Active Publication Date: 2020-03-17
HUAWEI TECH CO LTD
Cites: 5; Cited by: 4

AI Technical Summary

Problems solved by technology

[0004] This application provides a machine translation model, a method for determining a pseudo-professional parallel corpus, a system, and a device. They address two problems in related technologies: only a limited amount of pseudo-professional parallel corpus is generated, and the translation quality of neural machine translation models in professional fields improves only to a limited extent.



Examples


Embodiment Construction

[0071] To make the objectives, technical solutions, and advantages of the present application clearer, the implementations of the present application are described in further detail below with reference to the accompanying drawings.

[0072] First, some terms involved in the embodiments of the present application are explained to facilitate understanding.

[0073] Neural machine translation (NMT): a machine translation approach that emerged in 2014. NMT applies techniques such as recurrent neural networks, convolutional neural networks, and attention mechanisms to build encoder-decoder models over text sequences, and uses these models to translate text. Since 2016, neural machine translation has essentially replaced traditional statistical machine translation.
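Paragraph [0073] names attention as one building block of NMT encoder-decoder models. The following is a minimal sketch of one common variant, scaled dot-product attention; the application does not say which attention variant it uses, and the function names and toy vectors here are illustrative assumptions. A decoder query scores each encoder state, a softmax turns the scores into weights, and the weights mix the encoder states into a context vector.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention for one decoder query over encoder states.

    Returns (weights, context): one weight per source position, and the
    weight-averaged value vector.
    """
    d = len(query)
    # Score each source position, scaled by sqrt(d) to keep scores moderate.
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    # Context vector: component-wise weighted sum of the value vectors.
    context = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, context


# Toy example (hypothetical data): three encoder states of dimension 2,
# used as both keys and values, and one decoder query.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_query = [1.0, 0.0]
weights, context = attention(decoder_query, encoder_states, encoder_states)
```

Here the first and third source positions receive equal, larger weights because the query aligns with both, so the context vector leans toward them. Real NMT systems compute this in batched matrix form with learned projections for queries, keys, and values.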

[0074] Domain adaptation: a transfer learning method. Domain adaptation is suited to addressing the challenge of inconsistent distributions of ...


Abstract

The invention discloses a machine translation model and a method, system, and device for determining a pseudo-professional parallel corpus. It belongs to the field of information technology and further relates to the application of artificial intelligence in this field. The method comprises: obtaining a first general parallel corpus and professional parallel word pairs in a professional field; searching the first general parallel corpus for candidate parallel sentence pairs corresponding to each professional parallel word pair; and replacing the corresponding general parallel word pair in each candidate parallel sentence pair with the professional parallel word pair to obtain a pseudo-professional parallel corpus. The scheme therefore generates more pseudo-professional parallel corpora from the professional parallel word pairs. Moreover, because the scheme introduces the professional information carried by the professional parallel word pairs, further fine-tuning a basic neural machine translation model with the generated pseudo-professional parallel corpus greatly improves the resulting model's translation quality in the professional field.
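The generation step in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method: the excerpt does not say how the general parallel word pair to be replaced is chosen for each professional parallel word pair, so the sketch assumes a hypothetical, precomputed mapping `professional_to_general`. It also assumes whitespace tokenization and uses a simple membership test where a real system would likely rely on word alignment. All function names and the toy English-French data are hypothetical.

```python
from typing import Dict, List, Tuple

SentencePair = Tuple[List[str], List[str]]  # (source tokens, target tokens)
WordPair = Tuple[str, str]                  # (source word, target word)


def find_candidates(corpus: List[SentencePair], general: WordPair) -> List[int]:
    """Indices of sentence pairs containing the general word on both sides.

    Assumption: co-occurrence stands in for a proper word-alignment check.
    """
    src_word, tgt_word = general
    return [i for i, (src, tgt) in enumerate(corpus)
            if src_word in src and tgt_word in tgt]


def replace_pair(pair: SentencePair, general: WordPair,
                 professional: WordPair) -> SentencePair:
    """Swap the general word pair for the professional one (new lists)."""
    src, tgt = pair
    new_src = [professional[0] if tok == general[0] else tok for tok in src]
    new_tgt = [professional[1] if tok == general[1] else tok for tok in tgt]
    return new_src, new_tgt


def build_pseudo_corpus(corpus: List[SentencePair],
                        professional_to_general: Dict[WordPair, WordPair]
                        ) -> List[SentencePair]:
    """Generate pseudo-professional sentence pairs from a general corpus.

    professional_to_general is a hypothetical mapping from each
    professional word pair to the general word pair it replaces.
    """
    pseudo: List[SentencePair] = []
    for professional, general in professional_to_general.items():
        for i in find_candidates(corpus, general):
            pseudo.append(replace_pair(corpus[i], general, professional))
    return pseudo


# Toy general parallel corpus (hypothetical data).
corpus: List[SentencePair] = [
    ("take this medicine every day".split(),
     "prenez ce médicament chaque jour".split()),
    ("the weather is nice today".split(),
     "il fait beau aujourd'hui".split()),
]
# Hypothetical professional word pair mapped to the general pair it replaces.
mapping = {("ibuprofen", "ibuprofène"): ("medicine", "médicament")}
pseudo = build_pseudo_corpus(corpus, mapping)
```

Each professional word pair can match many candidate sentence pairs, which is how a small professional lexicon can yield a larger pseudo-professional corpus. Plain token substitution ignores grammatical agreement (correct French here would be "cet ibuprofène"), which is one reason the output is only pseudo-professional.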

Description

Technical field

[0001] The present application relates to the field of information technology, and in particular to a machine translation model and a method, system, and device for determining a pseudo-professional parallel corpus.

Background

[0002] Neural machine translation (NMT) models are now widely used in daily life, for example, to translate documents and news. However, text carries different meanings in different contexts and fields. Current neural machine translation models are mostly trained on general-purpose parallel corpora, in which most parallel sentences come from non-professional domains (out-of-domain, OOD) and few come from the professional domain (in-domain, IND). As a result, a neural machine translation model trained on a general parallel corpus translates poorly in the professional domain. In order to ensure the...


Application Information

Patent type and authority: Application (China)
IPC(8): G06F40/58; G06F40/284
Inventors: 黄崇轩 (Huang Chongxuan), 彭伟 (Peng Wei), 赵金阳 (Zhao Jinyang), 刘群 (Liu Qun), 陈云 (Chen Yun)
Owner: HUAWEI TECH CO LTD