A Method for Translation of Low-Frequency Words Based on Semantic Information Fusion

A semantic information fusion technique for low-frequency words, applied in the field of machine translation

Active Publication Date: 2021-02-05
NANJING NEW GENERATION ARTIFICIAL INTELLIGENCE RES INST CO LTD

AI Technical Summary

Problems solved by technology

In addition, current neural machine translation systems usually rely on two methods, subword segmentation and a replacement mechanism, to solve the translation problem of low-frequency words. With subword segmentation, low-frequency words are usually segmented into subword sequences.

Examples

Embodiment Construction

[0059] To help those skilled in the art better understand the technical solution of the present invention, the workflow of a neural machine translation system is first described, taking a translation system based on a recurrent neural network (RNN) and an attention mechanism (Attention) as an example. This framework is then used to illustrate how to effectively fuse the vector representations of low-frequency words in the source language and low-frequency words in the target language with the vector representation of the wildcard UNKi. It should be noted that the present invention can also be extended to other neural network translation systems, such as convolutional neural network (CNN)-based translation systems and attention-based translation systems.
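The fusion step described above can be pictured with a small sketch. The listing does not state the fusion operator, so everything here is an assumption made for illustration: the hypothetical helpers `mean_vector` and `fuse`, the averaging of subword vectors, and the interpolation weight of 0.5. Fusion "at both ends" is shown as one fusion on the source side and a separate one on the target side, since the two languages use separate vector spaces.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Average equally sized vectors element by element."""
    if not vectors:
        raise ValueError("need at least one vector")
    size = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(size)]


def fuse(unk_vector: Vector, word_vectors: Sequence[Vector], weight: float = 0.5) -> Vector:
    """Interpolate a wildcard vector with the mean of the given word vectors.

    Assumption: linear interpolation stands in for the fusion operator,
    which the source text does not specify.
    """
    word_vector = mean_vector(word_vectors)
    return [(1.0 - weight) * u + weight * w for u, w in zip(unk_vector, word_vector)]


# Toy two-dimensional vectors, invented for the example.
unk_source = [0.0, 1.0]                      # UNKi embedding on the source side
unk_target = [2.0, 0.0]                      # UNKi embedding on the target side
source_subwords = [[1.0, 0.0], [3.0, 2.0]]   # subword vectors of the source word
target_word = [[4.0, 3.0]]                   # vector of the target translation

# Fusion at both ends: each side is fused within its own vector space.
encoder_input = fuse(unk_source, source_subwords)
decoder_input = fuse(unk_target, target_word)
print(encoder_input, decoder_input)
```

Using only the first call corresponds to source-side fusion, and using only the second corresponds to target-side fusion.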

[0060] Description of translation system based on RNN and Attention:

[0061] Figure 1 shows a schematic diagram of the neural network translation model based on RNN and Attention,...

Abstract

The present invention proposes a low-frequency word translation method based on semantic information fusion, which belongs to the field of machine translation. A bilingual sentence pair (x, y) is input into the translation system, consisting of a source language sentence x and its corresponding target language sentence y. The subword sequence of each low-frequency word in the source language sentence is obtained, together with the target translation corresponding to that low-frequency word in the target language sentence. The wildcard UNKi is used to replace the low-frequency words in the bilingual sentence pair (x, y), yielding a new bilingual sentence pair. The vector representations of the low-frequency words in the source language and/or the target language are then fused with the vector representation of the wildcard UNKi. The present invention closely follows the core idea of semantic fusion and proposes three specific forms of it: integrating low-frequency word vector representations in the source language, integrating low-frequency word vector representations in the target language, and integrating low-frequency word vector representations at both ends. Vectors in two languages and two vector spaces are used to represent the semantic information of low-frequency words.
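The wildcard replacement step can be sketched as follows. This is a minimal illustration, not the patented procedure: the function name `replace_low_frequency_words`, the dictionary of low-frequency words and their translations, and the toy English-Chinese sentence pair are all assumptions, and the alignment between a low-frequency word and its translation is taken as given.

```python
from typing import Dict, List, Tuple


def replace_low_frequency_words(
    source: List[str],
    target: List[str],
    low_freq_translations: Dict[str, str],
) -> Tuple[List[str], List[str], Dict[str, Tuple[str, str]]]:
    """Replace each low-frequency word and its translation with UNK1, UNK2, ...

    `low_freq_translations` maps a low-frequency source word to its target
    translation (assumed to be known in advance for this sketch).
    """
    new_source = list(source)
    new_target = list(target)
    mapping: Dict[str, Tuple[str, str]] = {}
    index = 1
    for pos, word in enumerate(source):
        translation = low_freq_translations.get(word)
        # Skip ordinary words and pairs whose translation is not in the target.
        if translation is None or translation not in new_target:
            continue
        wildcard = f"UNK{index}"
        new_source[pos] = wildcard
        new_target[new_target.index(translation)] = wildcard
        mapping[wildcard] = (word, translation)
        index += 1
    return new_source, new_target, mapping


x = ["I", "visited", "Ouagadougou", "yesterday"]
y = ["昨天", "我", "去", "了", "瓦加杜古"]
new_x, new_y, unk_map = replace_low_frequency_words(x, y, {"Ouagadougou": "瓦加杜古"})
print(new_x)    # ['I', 'visited', 'UNK1', 'yesterday']
print(new_y)    # ['昨天', '我', '去', '了', 'UNK1']
print(unk_map)  # {'UNK1': ('Ouagadougou', '瓦加杜古')}
```

Keeping `unk_map` alongside the new sentence pair makes it possible to look up the original word and its translation for each wildcard when their vector representations are fused later.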

Description

Technical field

[0001] The present invention relates to the field of machine translation, and in particular to the translation of low-frequency words in a neural machine translation system. By making full use of the semantic vector representations of source-side and target-side low-frequency words during model training and decoding, the translation quality of the low-frequency words, and even of the whole sentence, is improved.

Background technique

[0002] Low-frequency words are words that occur rarely, or never, in large-scale bilingual parallel corpora. Depending on how rare they are, they are usually called unknown words or out-of-vocabulary (OOV) words in natural language processing. Because they occur rarely and typically have a single translation, the translation of low-frequency words has long been both a focus and a difficulty of machine translation research. Especially in the current mainstream neural machine translation, the word list is...
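To make the definition of low-frequency words concrete, the sketch below counts token frequencies in a toy corpus and flags words below a threshold. The function name `find_low_frequency_words`, the `min_count` threshold of 2, and the sample sentences are assumptions for illustration; the source text does not give a specific cutoff.

```python
from collections import Counter
from typing import Iterable, List, Set


def find_low_frequency_words(corpus: Iterable[List[str]], min_count: int = 2) -> Set[str]:
    """Return the words that occur fewer than `min_count` times in the corpus."""
    counts = Counter(token for sentence in corpus for token in sentence)
    return {word for word, count in counts.items() if count < min_count}


corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "axolotl", "slept"],
]
low_frequency = find_low_frequency_words(corpus)
print(sorted(low_frequency))  # ['axolotl', 'cat', 'dog', 'slept']
```

A word that never appears in the corpus at all would not show up in the counts, which corresponds to the out-of-vocabulary case mentioned above.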

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/58, G06F40/284, G06K9/62, G06N3/04
CPC: G06N3/044, G06N3/045, G06F18/253
Inventor: 张学强, 董晓飞, 曹峰, 石霖, 孙明俊
Owner: NANJING NEW GENERATION ARTIFICIAL INTELLIGENCE RES INST CO LTD