A Method for Avoiding Duplication of Segments in Machine Translation

This machine translation technology, applied in natural language translation, instruments, computing, and related fields, addresses two problems: existing approaches have significant limitations, and repetition penalty strategies cannot be fully effective. It achieves the effect of avoiding repetition in translations.

Active Publication Date: 2021-09-24
NANJING NEW GENERATION ARTIFICIAL INTELLIGENCE RES INST CO LTD
3 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0012] The above-mentioned technology is mainly used in beam search, when evaluating multiple candidate translations. It has relatively large limitations and is not suitable for greedy decoding and other settings. In addition, it adopts both a length penalty and a repetition penalty. Because the two penalty factors act within the same function, they influence each other, and the repetition penalty strategy cannot be fully effective.
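
The interaction described in [0012] can be illustrated with a small sketch. The prior-art scoring formula is not given in this text, so everything below is an assumption: `length_penalty` uses the GNMT-style normalization term, and `combined_score`, `beta`, and the two example hypotheses are hypothetical values chosen only to show how a length term and a repetition term in one function can work against each other.

```python
def length_penalty(length, alpha):
    """GNMT-style length normalization term (assumed form, not taken from the patent)."""
    return ((5 + length) / 6) ** alpha


def combined_score(log_prob, length, repeats, alpha, beta):
    """Hypothetical single scoring function that mixes both penalties."""
    return log_prob / length_penalty(length, alpha) - beta * repeats


# Two made-up beam candidates: the second one is longer because it repeats a segment.
clean = dict(log_prob=-6.0, length=10, repeats=0)
repetitive = dict(log_prob=-6.5, length=16, repeats=1)


def prefers_clean(alpha, beta):
    """True if the non-repetitive candidate gets the higher combined score."""
    return (combined_score(**clean, alpha=alpha, beta=beta)
            > combined_score(**repetitive, alpha=alpha, beta=beta))


print(prefers_clean(alpha=0.0, beta=0.5))  # no length normalization
print(prefers_clean(alpha=1.0, beta=0.5))  # with length normalization
```

With these illustrative numbers, the repetition penalty decides the ranking when `alpha` is 0, but once length normalization is switched on, the extra length contributed by the repeated segment raises the normalized score enough to cancel the repetition penalty.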

Examples

Embodiment Construction

[0041] The present invention will be described in further detail below in conjunction with the accompanying drawings and specific embodiments.

[0042] As shown in Figure 1, this embodiment discloses a method for avoiding repeated segments in machine translation output. During greedy decoding, a repeated-segment detection mechanism over the translation is used to penalize the generation probability of repeated target words. As the length of the repeated segment increases, the generation probability of the target word is penalized at the logarithmic, linear, and exponential levels in turn, so that the machine translation system avoids generating repeated segments. The method specifically includes the following steps:

[0043] Step 1: Data processing: Process the bilingual parallel corpus into sentence pairs of the form (source language sentence, target language sentence), namely (s_i, t_i)...
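
The escalating penalty described in [0042] can be sketched roughly as follows. This is a minimal illustration, not the patented procedure: the text available here gives neither the penalty formulas nor the length thresholds, so `repeat_length`, `penalty`, `greedy_step`, and the cut-offs `linear_from` and `exp_from` are all hypothetical choices.

```python
import math


def repeat_length(prefix, candidate):
    """Length of the longest segment ending in `candidate` that already
    occurs earlier in the translation (overlapping occurrences allowed)."""
    seq = list(prefix) + [candidate]
    total = len(seq)
    best = 0
    for end in range(1, total):  # exclusive end index of an earlier occurrence
        n = 0
        while n < end and seq[end - 1 - n] == seq[total - 1 - n]:
            n += 1
        best = max(best, n)
    return best


def penalty(n, linear_from=3, exp_from=5):
    """Penalty on the log-probability; thresholds are assumed, not from the source."""
    if n <= 0:
        return 0.0
    if n < linear_from:
        return math.log(1 + n)        # logarithmic level for short repeats
    if n < exp_from:
        return float(n)               # linear level for medium repeats
    return math.exp(n - linear_from)  # exponential level for long repeats


def greedy_step(prefix, log_probs):
    """Pick the next token greedily after penalizing repeated candidates.
    `log_probs` maps each candidate token to its model log-probability."""
    adjusted = {tok: lp - penalty(repeat_length(prefix, tok))
                for tok, lp in log_probs.items()}
    return max(adjusted, key=adjusted.get)


prefix = ["the", "cat", "sat", "the", "cat"]
# "sat" would complete a 3-token repeat, so the less likely "ran" is chosen.
print(greedy_step(prefix, {"sat": -0.1, "ran": -1.5}))  # ran
```

Because short repeats only receive the mild logarithmic penalty, ordinary recurrences of a single word barely change the ranking, which matches the stated goal of not disturbing the original decoding too much.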

Abstract

The invention discloses a method for avoiding repeated segments in machine translation output, which belongs to the field of machine translation in natural language processing. During greedy decoding, a repeated-segment detection mechanism over the translation is used to penalize the generation probability of repeated target words. As the length of the repeated segment increases, logarithmic, linear, and exponential penalties are applied in turn to the target word's generation probability, so that the system avoids generating repeated segments. Because the penalty is applied stepwise and grows gradually, it does not interfere too much with the original decoding process and can effectively reduce the false alarm rate of repeated-segment penalties. The invention also takes into account that the source text may itself contain repeated segments, and allows repeated segments in the translation whose length is less than or equal to the corresponding length in the source text. This preserves the consistency between source text and translation to the greatest extent and further reduces the false alarm rate of the proposed method.
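
The source-aware allowance mentioned in the abstract could look something like the sketch below. It rests on one reading of the text, namely that a repeat in the translation is tolerated up to the length of the longest repeated segment in the source sentence. The helper names `longest_repeated_segment` and `penalized_length` are hypothetical, and comparing token counts directly across two languages is a simplification.

```python
def longest_repeated_segment(tokens):
    """Length of the longest contiguous segment that occurs at least twice
    in `tokens` (overlapping occurrences allowed), via an O(n^2) table."""
    n = len(tokens)
    best = 0
    prev = [0] * (n + 1)
    for i in range(1, n + 1):
        cur = [0] * (n + 1)
        for j in range(i + 1, n + 1):
            if tokens[i - 1] == tokens[j - 1]:
                # Extend the common segment ending at positions i-1 and j-1.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def penalized_length(target_repeat_len, source_tokens):
    """Part of a target-side repeat that exceeds what the source justifies."""
    allowed = longest_repeated_segment(source_tokens)
    return max(0, target_repeat_len - allowed)


source = "step by step we go".split()
print(longest_repeated_segment(source))  # 1, because "step" occurs twice
print(penalized_length(1, source))       # 0, so a one-token repeat is tolerated
print(penalized_length(3, source))       # 2, so only the excess is penalized
```

Only the excess length would then be passed to the penalty schedule, so translations of sources that legitimately repeat themselves are not flagged.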

Description

Technical field

[0001] The invention relates to the field of machine translation in natural language processing, and in particular to a method for avoiding repeated segments in machine translation output.

[0002] During machine translation decoding, a repeated-segment detection mechanism is used to penalize the generation probability of repeatedly generated target words. As the length of the repeated segment increases, logarithmic, linear, and exponential penalty strategies are applied in turn to the target words, so that the system avoids generating repeated segments.

Background technique

[0003] As globalization advances and with the proposal of the Belt and Road Initiative, machine translation has become an important research topic for communication between groups of different languages and ethnicities. However, both academia and industry have found that machine translation based on deep learning metho...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/58, G06F40/44, G06F40/242
CPC: G06F40/242, G06F40/44, G06F40/58
Inventor: 张学强, 张丹, 董晓飞, 万怡方, 曹峰
Owner: NANJING NEW GENERATION ARTIFICIAL INTELLIGENCE RES INST CO LTD