Machine translation method using part-of-speech information

A machine translation technique that uses part-of-speech information, applied in the field of machine translation. It addresses the problem that existing methods underuse part-of-speech information, which is not used to improve the text generation capability of the translation model. It reduces translations that are only partly correct ("half-right, half-wrong") and improves translation performance.

Pending Publication Date: 2022-03-11
EAST CHINA NORMAL UNIVERSITY

AI Technical Summary

Problems solved by technology

[0006] Existing machine translation technology does not fully utilize part-of-speech information: it is used only to help the translation model understand semantics, not to improve the model's text generation ability.

Examples

Embodiment 1

[0040] Referring to Figure 1, machine translation with multi-source information is carried out in the following steps:

[0041] Step 1: Prepare a bilingual corpus

[0042] Download the English-German and English-French datasets from the official WMT14 website (http://www.statmt.org/wmt14/), then apply tokenization and the BPE algorithm to each sentence to obtain the preprocessed text. Each dataset contains a training set and a test set for the corresponding language pair.

[0043] Text in each language is first segmented into tokens, and preprocessing is then completed with BPE encoding. BPE is a segmentation method that splits words into subwords: after training on a monolingual corpus, it identifies frequently occurring subwords and uses them as the smallest units of text.
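The BPE training described in [0043] can be illustrated with the standard merge-learning loop: start from characters, repeatedly count adjacent symbol pairs weighted by word frequency, and merge the most frequent pair. This is a minimal sketch under stated assumptions, not the patent's implementation; the function names, the `</w>` end-of-word marker, and the toy word counts are illustrative choices, and a real pipeline would normally rely on an established BPE toolkit.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def get_pair_counts(vocab: Dict[str, int]) -> Counter:
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs: Counter = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair: Pair, vocab: Dict[str, int]) -> Dict[str, int]:
    """Replace each whole-symbol occurrence of `pair` with the merged symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(lambda _m: merged, word): freq for word, freq in vocab.items()}


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> Tuple[List[Pair], Dict[str, int]]:
    """Learn up to `num_merges` merge operations from a word-frequency table."""
    # Each word starts as space-separated characters plus an (assumed) end-of-word marker.
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_freqs.items()}
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab
```

On the toy counts `{"low": 5, "lower": 2, "newest": 6, "widest": 3}`, the first three merges build the frequent suffix "est</w>", which then behaves as a single subword unit.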

[0044] Step 2: Extract part-of-speech tags

[0045] Use the part-of-speech tagging tool spaCy to tag the bilingual corpus with part-of-speech tags, extract the...
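Because spaCy tags whole words while the model consumes BPE subwords, the word-level tags have to be mapped onto subwords somehow. The excerpt is cut off before it says how, so the following is only a sketch of one common convention, assuming that each subword inherits the tag of the word it was split from and that non-final pieces carry a `@@` continuation marker. The function name and marker are hypothetical. The word-level tags could come from spaCy's `Token.pos_` attribute, provided spaCy's tokenization matches the tokenization used before BPE.

```python
from typing import List, Tuple


def align_pos_to_subwords(
    words: List[str], tags: List[str], subwords: List[str], sep: str = "@@"
) -> List[Tuple[str, str]]:
    """Give each BPE subword the POS tag of the word it came from.

    Assumes (hypothetically) that every non-final piece of a word ends with `sep`.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    aligned: List[Tuple[str, str]] = []
    word_idx = 0
    for piece in subwords:
        if word_idx >= len(tags):
            raise ValueError("more subword groups than words")
        aligned.append((piece, tags[word_idx]))
        # A piece without the continuation marker closes the current word.
        if not piece.endswith(sep):
            word_idx += 1
    if word_idx != len(words):
        raise ValueError("subwords do not cover all words")
    return aligned
```

For example, the words `the lowest price` tagged `DET ADJ NOUN` and segmented as `the low@@ est price` yield the subword tags `DET ADJ ADJ NOUN`.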

Abstract

The invention discloses a machine translation method using part-of-speech information. A translation model is trained on a bilingual corpus whose training set carries the corresponding part-of-speech tags, and the part-of-speech information is then used to assist machine translation. The translation model comprises a shared encoder, a decoder, and a part-of-speech classifier; part-of-speech information is introduced in different ways to strengthen the model's language understanding and text generation. The shared encoder consists of a source sentence encoder and an auxiliary sentence encoder; the decoder comprises an original multi-head attention module and an auxiliary multi-head attention module. Compared with the prior art, the method improves the performance of the machine translation model by introducing part-of-speech information. The model is built on the Transformer, and part-of-speech information is introduced into both the encoder and the decoder in different ways, so that the model can use it to strengthen language understanding and text generation, which in turn improves machine translation quality.
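The abstract lists a part-of-speech classifier next to the encoder and decoder but does not say how its training signal is combined with the translation objective. One common option for such auxiliary classifiers is a weighted multi-task loss, sketched below under that assumption: token-level cross-entropy for translation plus a weighted token-level cross-entropy for POS prediction. The names `joint_loss` and `pos_weight` and the default weight are hypothetical; the patent does not state which representations feed the classifier or how the terms are weighted.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of class scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_cross_entropy(logits_seq: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean negative log-likelihood of the gold class at each position."""
    if len(logits_seq) != len(targets) or not targets:
        raise ValueError("need one logit vector per target, and at least one target")
    total = 0.0
    for logits, target in zip(logits_seq, targets):
        total -= log_softmax(logits)[target]
    return total / len(targets)


def joint_loss(mt_logits, mt_targets, pos_logits, pos_targets, pos_weight: float = 0.5) -> float:
    """Hypothetical multi-task objective: translation loss + weighted POS-classification loss.

    mt_logits:  decoder scores over the target vocabulary, one vector per target token.
    pos_logits: POS-classifier scores over the tag set, one vector per tagged token.
    pos_weight: assumed interpolation weight (not specified in the patent).
    """
    mt = token_cross_entropy(mt_logits, mt_targets)
    pos = token_cross_entropy(pos_logits, pos_targets)
    return mt + pos_weight * pos
```

With uniform scores, each translation token costs ln 2 over a two-word vocabulary and the POS token costs ln 4 over four tags, so `pos_weight=0.5` gives a total of 2 ln 2.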

Description

Technical field

[0001] The invention relates to the technical field of machine translation, and in particular to a machine translation method that uses part-of-speech information to improve translation quality.

Background art

[0002] Current machine translation technology draws on typical machine translation, multi-source machine translation, the Transformer model, part-of-speech tagging, and the BPE algorithm. Typical machine translation aims to translate a sentence in the source language into a sentence in the target language. Denote the source language by $S$ and the target language by $T$, and let $X_L^{(i)}$ denote the $i$-th sentence in language $L$ ($L \in \{S, T\}$). The dataset for language $L$ consists of these sentences, $\{X_L^{(i)}\}_{i=1}^{N_L}$, where $N_L$ is the number of sentences. Each sentence is an ordered sequence of tokens, $X_L^{(i)} = (x_1, x_2, \ldots, x_n)$, where every token comes from the vocabulary $V_L$ of language $L$; that is, $x_j \in V_L$ for any token $x_j$. The goal of ...
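The notation in [0002] maps directly onto a simple data structure: a list of $N_L$ tokenized sentences and a vocabulary $V_L$ that contains every token. The sketch below assumes, for illustration only, that $V_L$ is built from the dataset itself and that tokens receive integer ids in first-seen order. The class name `MonolingualDataset` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MonolingualDataset:
    """The dataset {X_L^(1), ..., X_L^(N_L)}: tokenized sentences in one language L."""

    sentences: List[List[str]]
    vocab: Dict[str, int] = field(init=False)

    def __post_init__(self) -> None:
        # Build V_L by giving each distinct token an integer id in first-seen order.
        self.vocab = {}
        for sentence in self.sentences:
            for token in sentence:
                self.vocab.setdefault(token, len(self.vocab))

    @property
    def size(self) -> int:
        """N_L, the number of sentences."""
        return len(self.sentences)

    def encode(self, i: int) -> List[int]:
        """Map sentence X_L^(i) (1-indexed, as in the notation) to vocabulary ids."""
        return [self.vocab[token] for token in self.sentences[i - 1]]
```

Because the vocabulary is built from the sentences themselves, the condition $x_j \in V_L$ holds by construction and `encode` never fails on a stored sentence. A real system would instead fix $V_L$ from the training data, typically as the BPE subword inventory.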

Application Information

IPC(8): G06F40/58; G06F40/253
CPC: G06F40/58; G06F40/253
Inventors: Zhao Jing (赵静), Pan Lingchao (潘凌超), Sun Shiliang (孙仕亮)
Owner: EAST CHINA NORMAL UNIVERSITY