
Multi-language general translation method based on multi-granularity semantic alignment

A multi-language, multi-granularity technique applied in the field of machine translation. It addresses three problems of existing approaches: they cannot use multilingual knowledge beyond the training languages, they reduce the semantic relevance of phrases and sentences, and they are difficult to train. The method aims for good generalization and practicability.

Pending Publication Date: 2021-11-02
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

However, there is currently no multilingual machine translation model that uses the knowledge in existing multilingual pre-trained language models efficiently and with high quality. Specifically:
[0004] (1) Current approaches either design a separate encoder or decoder for each language, which makes the model bulky and inconvenient, or design shared encoders and decoders with a large number of parameters that require massive parallel corpora for training. Both are difficult to train;
[0005] (2) Existing machine translation models that use a multilingual pre-trained model are still designed as translation models for a single pair of languages, and cannot use multilingual knowledge from languages other than the training languages;
[0006] (3) Existing methods align the vector spaces of different languages only at the level of word vectors and give little consideration to aligning higher-level feature vectors such as phrase vectors and sentence vectors, which reduces the semantic relevance of phrases and sentences that have the same meaning in different languages.
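
To make the distinction in point (3) concrete, the following is a minimal sketch of measuring cross-lingual similarity at two granularities: a single word pair, and a whole sentence. It is an illustration only and not the method of the invention. The toy vectors, the word glosses, the use of mean pooling to build a sentence vector, and the helper names `cosine` and `mean_pool` are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pool(vectors):
    """Average a list of word vectors into one phrase or sentence vector."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


# Hypothetical 2-dimensional word vectors for two languages (toy values).
en_vectors = {"good": [0.9, 0.1], "morning": [0.2, 0.8]}
fr_vectors = {"bon": [0.8, 0.2], "matin": [0.1, 0.9]}

# Word-level alignment: compare one translation pair.
word_similarity = cosine(en_vectors["good"], fr_vectors["bon"])

# Sentence-level alignment: compare pooled vectors of the whole sentences.
en_sentence = mean_pool([en_vectors[w] for w in ("good", "morning")])
fr_sentence = mean_pool([fr_vectors[w] for w in ("bon", "matin")])
sentence_similarity = cosine(en_sentence, fr_sentence)

print(f"word-level: {word_similarity:.3f}, sentence-level: {sentence_similarity:.3f}")
```

A training objective that only pushes `word_similarity` up optimizes the first quantity; the text's criticism is that the second, coarser-grained quantity is rarely targeted directly.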

Embodiment Construction

[0037] The accompanying drawings are for illustrative purposes only and cannot be construed as limiting the patent;

[0038] In order to better illustrate this embodiment, some parts in the drawings will be omitted, enlarged or reduced, and do not represent the size of the actual product;

[0039] For those skilled in the art, it is understandable that some well-known structures and descriptions thereof may be omitted in the drawings.

[0040] The technical solutions of the present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0041] As shown in Figures 1-2, in this embodiment the source language sentence in the data set is denoted as S, the source language as L_s, the target language sentence as T, and the target language as L_t. H is the set of labels indicating whether partial translation has been applied or not, H = {yes, no}. For a piece of data in the data set, that is, a parallel sente...
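
The notation above can be sketched as a small record type. This is a minimal illustration under stated assumptions: the class name `ParallelExample`, its field names, and the default label are hypothetical, and the excerpt does not show how the label from H is actually assigned or used.

```python
from dataclasses import dataclass

# H = {yes, no}: the label set named in paragraph [0041].
H = ("yes", "no")


@dataclass(frozen=True)
class ParallelExample:
    """One parallel sentence pair; field names are illustrative only."""

    source_sentence: str   # S
    source_language: str   # L_s
    target_sentence: str   # T
    target_language: str   # L_t
    label: str = "no"      # an element of H (default chosen for this sketch)

    def __post_init__(self):
        # Reject labels outside H so malformed records fail early.
        if self.label not in H:
            raise ValueError(f"label must be one of {H}, got {self.label!r}")


example = ParallelExample("Good morning", "en", "Bonjour", "fr")
print(example)
```

Making the record frozen keeps each data point immutable once loaded, which is a design choice of this sketch rather than something the source specifies.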

Abstract

The invention provides a multi-language general translation method based on multi-granularity semantic alignment. To address the problem that the word vectors of a multilingual pre-trained model are not aligned, it proposes a local translation method for the first time and aligns semantic features at different granularities. Knowledge learned by the multilingual pre-trained model is fused and transferred to new languages, so the method handles low-resource corpora effectively and also supports zero-shot learning. Dependencies among words are obtained by analyzing the multilingual pre-trained model, which yields a multilingual form of syntactic structure analysis; universal dependencies across languages are obtained through a tree structure, removing the influence of differing word orders between languages. This further improves the generalization of the model and makes it a universal translation model. It is verified that the multilingual pre-trained model can transfer knowledge to a new language, achieving good generalization and practicability.

Description

Technical field

[0001] The present invention relates to the field of machine translation, and more specifically to a multi-language general translation method based on multi-granularity semantic alignment.

Background technique

[0002] As one of the core tasks of natural language processing, machine translation has long attracted the attention of many scholars. Machine translation works by understanding and analyzing the meaning of the source language and converting it into the target language while preserving the original meaning as much as possible. In recent years, neural machine translation, one implementation of machine translation, has become the dominant paradigm in both academia and industry. Compared with traditional statistical machine translation, neural machine translation mainly benefits from distributed word vectors and end-to-end model design. The former makes fine-grained features possible, and the latter reduces error transm...

Application Information

IPC(8): G06F40/58, G06F40/211, G06F40/279, G06F40/30, G06N3/08
CPC: G06F40/58, G06F40/211, G06F40/279, G06F40/30, G06N3/084
Inventor: 万海, 苏蓝, 彭勃, 黄佳莉, 曾娟
Owner SUN YAT SEN UNIV