Low-frequency word translation method based on semantic information fusion

A low-frequency word translation technology based on semantic information, applied in the field of machine translation

Active Publication Date: 2020-06-12
NANJING NEW GENERATION ARTIFICIAL INTELLIGENCE RES INST CO LTD

AI Technical Summary

Problems solved by technology

Current neural machine translation systems usually combine two methods, subword segmentation and a replacement mechanism, to handle the translation of low-frequency words. In this setting, low-frequency words are usually segmented into subword sequences.
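To make the segmentation step concrete, the following is a minimal sketch of splitting a low-frequency word into a subword sequence. The record does not say which segmentation algorithm is used, so greedy longest-match against a fixed subword vocabulary is an assumption made here for illustration; the function name `segment_into_subwords` and the vocabulary `SUBWORD_VOCAB` are hypothetical.

```python
def segment_into_subwords(word, subword_vocab):
    """Greedy longest-match segmentation (an assumed strategy, not the patent's).

    Characters that match no known subword are emitted as single-character pieces.
    """
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest candidate first and shrink until a known subword is found.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in subword_vocab or end == start + 1:
                pieces.append(piece)
                start = end
                break
    return pieces


# Hypothetical subword vocabulary, for illustration only.
SUBWORD_VOCAB = {"trans", "lat", "ion", "s"}

print(segment_into_subwords("translations", SUBWORD_VOCAB))
```

With this toy vocabulary the word is split into pieces that each appear in the vocabulary. A production system would more likely learn its subword inventory from the training corpus.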


Embodiment Construction

[0059] To help those skilled in the art better understand the technical solution of the present invention, the workflow of a neural machine translation system is first described, taking a translation system based on a recurrent neural network (RNN) and an attention mechanism (Attention) as an example. This framework is then used to illustrate how to effectively fuse the vector representations of source-language low-frequency words and target-language low-frequency words with the vector representation of the wildcard UNKi. Note that the present invention can also be extended to other neural translation systems, such as those based on convolutional neural networks (CNN) and those based entirely on attention mechanisms.
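As a rough illustration of what fusing these vector representations could look like, the sketch below pools the subword vectors of a low-frequency word and combines the result with the vector of its wildcard UNKi. The text available here does not specify the fusion operator, so mean pooling followed by element-wise addition is purely an assumption, and the names `mean_vector` and `fuse` are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fuse(unk_vector, source_subword_vectors=None, target_subword_vectors=None):
    """Add pooled low-frequency word vectors to the UNKi vector.

    Passing only source vectors, only target vectors, or both mirrors the
    source-side, target-side, and two-sided fusion variants. The additive
    combination and the mean pooling are assumptions made for this sketch.
    """
    fused = list(unk_vector)
    for subword_vectors in (source_subword_vectors, target_subword_vectors):
        if subword_vectors:
            pooled = mean_vector(subword_vectors)
            fused = [a + b for a, b in zip(fused, pooled)]
    return fused


# Toy 2-dimensional vectors, for illustration only.
unk = [1.0, 0.0]
source_pieces = [[2.0, 2.0], [4.0, 0.0]]
target_pieces = [[0.0, 1.0]]

print(fuse(unk, source_subword_vectors=source_pieces))
print(fuse(unk, source_subword_vectors=source_pieces, target_subword_vectors=target_pieces))
```

In a trained model the wildcard and subword vectors would come from the embedding layers of the encoder and decoder, and the combination could equally be a learned gate or projection rather than a plain sum.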

[0060] Description of translation system based on RNN and Attention:

[0061] As shown in Figure 1, the schematic diagram of the neural network translation model ...

Abstract

The invention provides a low-frequency word translation method based on semantic information fusion, belonging to the field of machine translation. Bilingual sentence pairs (x, y) are input into a translation system, where x is the source language sentence and y is the corresponding target language sentence. The method obtains the subword sequence of each low-frequency word in the source language sentence and the corresponding target translation of that low-frequency word in the target language sentence, replaces the low-frequency words in the bilingual sentence pair (x, y) with a wildcard UNKi to obtain a new bilingual sentence pair, and fuses the vector representation of the source language low-frequency word and/or the target language low-frequency word with the vector representation of the wildcard UNKi. The invention closely follows the core idea of semantic fusion and provides three specific forms of it: integrating the source language low-frequency word vector representation, integrating the target language low-frequency word vector representation, and fusing the low-frequency word vector representations from both ends. In this way, the vectors of low-frequency words in the two languages and two vector spaces are fully used to represent the semantic information of the low-frequency words.
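The wildcard replacement step can be sketched as follows. This is a minimal illustration under stated assumptions: the set of low-frequency words and the source-to-target word alignment are taken as given, wildcards are numbered from 1 in source order, and the function name `replace_low_frequency_words` is hypothetical. The abstract does not describe how the alignment or the numbering is actually obtained.

```python
def replace_low_frequency_words(source_tokens, target_tokens, alignment, low_frequency_words):
    """Replace aligned low-frequency words on both sides with UNK1, UNK2, ...

    alignment maps a source token index to the index of its target translation
    (assumed to be supplied by an external word aligner). Returns the new
    sentence pair and a table recording which words each wildcard stands for.
    """
    new_source = list(source_tokens)
    new_target = list(target_tokens)
    replaced = {}
    counter = 0
    for src_index, word in enumerate(source_tokens):
        if word in low_frequency_words and src_index in alignment:
            counter += 1
            wildcard = f"UNK{counter}"
            tgt_index = alignment[src_index]
            # Remember the original words so their vectors can be fused later.
            replaced[wildcard] = (word, target_tokens[tgt_index])
            new_source[src_index] = wildcard
            new_target[tgt_index] = wildcard
    return new_source, new_target, replaced


# Toy sentence pair and alignment, for illustration only.
x = ["I", "visited", "Ouagadougou", "yesterday"]
y = ["Ich", "besuchte", "gestern", "Ouagadougou"]
new_x, new_y, table = replace_low_frequency_words(x, y, {2: 3}, {"Ouagadougou"})
print(new_x, new_y, table)
```

Keeping the table of replaced words alongside the new sentence pair is what lets a later step look up the original low-frequency words and combine their vectors with the wildcard's vector.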

Description

Technical field

[0001] The present invention relates to the field of machine translation, and in particular to the translation of low-frequency words in neural machine translation systems. By making full use of the semantic vector representations of source-side and target-side low-frequency words during model training and decoding, it improves the translation quality of low-frequency words and even of full sentences.

Background technique

[0002] Low-frequency words are words that occur sparsely, or never appear, in large-scale bilingual parallel corpora. Depending on how rare they are, they are usually called unknown words or out-of-vocabulary (OOV) words in natural language processing. Because of their sparse frequency and the singular nature of their translations, translating low-frequency words has always been a focus and a difficulty of machine translation research. Especially in the current mainstream neural machine translation, the word list is li...

Application Information

IPC(8): G06F40/58, G06F40/284, G06K9/62, G06N3/04
CPC: G06N3/044, G06N3/045, G06F18/253
Inventor: 张学强, 董晓飞, 曹峰, 石霖, 孙明俊
Owner: NANJING NEW GENERATION ARTIFICIAL INTELLIGENCE RES INST CO LTD