Method for constructing machine translation test set in chapter-level English translation

A machine translation test-set construction technology, applied to the construction of text-level English-Chinese machine translation test sets, which addresses problems such as the lack of evaluation metrics for discourse-level phenomena.

Active Publication Date: 2021-02-19
TIANJIN UNIV
Cites: 5; Cited by: 1

AI Technical Summary

Problems solved by technology

[0004] Most existing evaluation metrics are automatic metrics. When computing a score, most of them consider only linguistic phenomena within a single sentence, so they are better suited to evaluating sentence-level phenomena. No metrics have been designed specifically for discourse-level linguistic phenomena.



Examples


Example 1

[0088]Source language data:

[0089]Previous sentence: You rich guys think that money can buy anything.

[0090]Current sentence: How right you are.

[0091]Target language data:

[0092]Previous sentence: Your rich people always have money to buy everything.

[0093]Current sentence: You are too right.

[0094]Chapter-level connective test set: the current sentence of the source-language data must contain one of the five chapter-level connectives "as", "or", "while", "since", and "though", and the part-of-speech tag of that word must be one of "CC", "IN", or "WRB". Because Chinese chapter-level connectives are expressed in more diverse ways, sentence pairs are first filtered automatically and then checked manually. The manual check keeps pairs that satisfy the conditions on the source-language data but whose target-language data does not contain the corresponding connective, then verifies whether the information the connective needs for disambiguation is located in the previous sentence, and finally each meaning of e...
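The automatic part of this filter can be sketched in Python. This is a minimal sketch, assuming the sentences are already tokenized and POS-tagged with Penn Treebank tags (CC, IN, and WRB come from that tag set). The function names and the `ZH_RENDERINGS` table are hypothetical: the text does not list which Chinese words count as "the corresponding connective", and plain substring matching over Chinese is only a crude stand-in for the manual check.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag)

# The five connectives and three POS tags named in the text.
CONNECTIVES = {"as", "or", "while", "since", "though"}
ALLOWED_TAGS = {"CC", "IN", "WRB"}

# HYPOTHETICAL Chinese renderings per connective; the text does not list them.
ZH_RENDERINGS: Dict[str, List[str]] = {
    "as": ["因为", "由于", "随着"],
    "or": ["或", "还是", "否则"],
    "while": ["而", "然而", "虽然"],
    "since": ["既然", "因为", "自从"],
    "though": ["虽然", "尽管", "不过"],
}


def source_connectives(current: List[Token]) -> List[str]:
    """Connectives in the source current sentence that carry a CC, IN or WRB tag."""
    return [word.lower() for word, tag in current
            if word.lower() in CONNECTIVES and tag in ALLOWED_TAGS]


def is_connective_candidate(src_current: List[Token], tgt_current: str) -> bool:
    """True if some tagged source connective has none of its (hypothetical)
    Chinese renderings in the target current sentence."""
    for conn in source_connectives(src_current):
        if not any(zh in tgt_current for zh in ZH_RENDERINGS[conn]):
            return True
    return False


# Source side of Example 2, hand-tagged for illustration.
src = [("While", "IN"), ("for", "IN"), ("others", "NNS"), ("it", "PRP"),
       ("'s", "VBZ"), ("all", "DT"), ("child", "NN"), ("'s", "POS"),
       ("play", "NN"), (".", ".")]
print(source_connectives(src))  # ['while']
```

Pairs flagged by `is_connective_candidate` would still go through the manual checks described in the paragraph above (disambiguating context in the previous sentence, coverage of each connective meaning).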

Example 2

[0097]Source language data:

[0098]Previous sentence: Everything is so difficult in life, for me.

[0099]Current sentence: While for others it's all child's play.

[0100]Target language data:

[0101]Previous sentence: For me, life is very difficult.

[0102]Current sentence: It is like children.

[0103]The omission test set first filters the source-language data for sentence pairs whose current sentence contains "do", "does", "can", "could", "should", "is", "am", "are", or "may". It then requires the previous sentence of the source-language data to contain a verb, i.e., a word tagged "VC", "VE", or "VV", and checks whether the verb in the current sentence of the target-language data is consistent with the verb in the previous sentence. Finally, a certain number of test cases are selected to form the omission test set.

[0104]Then check the verbs contained in the previous sentence of the source-language data, i.e., words tagged "VE" or "VV", and then check the verbs in the current sentence in the targe...
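The omission filter can be sketched in Python along the same lines. This is a minimal sketch under several assumptions. VC, VE, and VV are Chinese Treebank verb tags, so the sketch expects already-segmented, tagged sentences and does not decide which side of the corpus supplies the "previous sentence". "Consistency" is interpreted here as the two sentences sharing at least one verb. The text does not say whether consistent or inconsistent pairs are kept, so the sketch returns the flag instead of making that choice. All function names and the hand-tagged Chinese sentences are hypothetical illustrations, not data from the patent.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, POS tag)

# Trigger words for the source current sentence, as listed in the text.
TRIGGERS = {"do", "does", "can", "could", "should", "is", "am", "are", "may"}
# Verb tags named in the text (Chinese Treebank tag set).
VERB_TAGS = {"VC", "VE", "VV"}


def has_trigger(src_current: List[str]) -> bool:
    """True if the source current sentence contains one of the trigger words."""
    return any(word.lower() in TRIGGERS for word in src_current)


def verbs(tagged: List[Token]) -> List[str]:
    """Words tagged VC, VE or VV."""
    return [word for word, tag in tagged if tag in VERB_TAGS]


def verb_consistent(prev_tagged: List[Token], tgt_current: List[Token]) -> bool:
    """ASSUMPTION: 'consistent' means the two sentences share at least one verb."""
    return bool(set(verbs(prev_tagged)) & set(verbs(tgt_current)))


def omission_candidate(src_current: List[str], prev_tagged: List[Token],
                       tgt_current: List[Token]) -> Optional[bool]:
    """None if the pair fails the filters; otherwise the consistency flag
    that the manual selection step would look at."""
    if not has_trigger(src_current) or not verbs(prev_tagged):
        return None
    return verb_consistent(prev_tagged, tgt_current)


# Hypothetical hand-tagged Chinese sentences in the spirit of Example 3.
prev_zh = [("你", "PN"), ("看", "VV"), ("，", "PU"), ("她", "PN"),
           ("不", "AD"), ("知道", "VV"), ("。", "PU")]
cur_zh = [("我", "PN"), ("也", "AD"), ("不", "AD"), ("知道", "VV"), ("。", "PU")]
print(omission_candidate(["Neither", "do", "I", "."], prev_zh, cur_zh))  # True
```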

Example 3

[0106]Source language data:

[0107]Previous sentence: You see, she doesn't know.

[0108]Current sentence: Neither do I.

[0109]Target language data:

[0110]Previous sentence: Look, she doesn't know.

[0111]Current sentence: When I don't know.

[0112]Step 4: manually inspect the selected test cases and correct translation errors.

[0113]Table 1: BLEU scores from automatic evaluation

[0114]
Model      Pronoun   Chapter-level connective   Omission
THUMT      12.4      9.8                        18.2
CADEC      19.1      15.3                       25.5
BERT-NMT   13.9      12.7                       19.1

[0115]As Table 1 shows, in terms of BLEU (Bilingual Evaluation Understudy) score, the CADEC (context-aware decoder) model scores highest on all three linguistic phenomena, indicating that it translates all three chapter-level phenomena best. BERT-NMT (neural machine translation incorporating BERT) has the second-highest BLEU score, and the THUMT (Tsinghua University machine translation toolkit) model has the lowest, indicating th...
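To show how scores like those in Table 1 are computed, here is a minimal corpus-level BLEU sketch. It combines clipped n-gram precisions for n = 1 to 4 through their geometric mean, multiplied by a brevity penalty. The text does not say which BLEU implementation or tokenization was used. Character-level tokenization of the Chinese output, a single reference per sentence, and no smoothing are therefore assumptions here. This sketch is not expected to reproduce Table 1 exactly, and published results are normally computed with a standard tool such as sacreBLEU.

```python
import math
from collections import Counter
from typing import List


def char_ngrams(chars: List[str], n: int) -> Counter:
    """Counts of the n-grams in a character sequence."""
    return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))


def corpus_bleu(hyps: List[str], refs: List[str], max_n: int = 4) -> float:
    """Corpus BLEU (0-100), one reference per hypothesis, character tokens, no smoothing."""
    if len(hyps) != len(refs):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h = [c for c in hyp if not c.isspace()]
        r = [c for c in ref if not c.isspace()]
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            overlap = char_ngrams(h, n) & char_ngrams(r, n)  # min counts = clipping
            matches[n - 1] += sum(overlap.values())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # some precision is zero, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

Scoring each test set separately with such a function yields one number per model and phenomenon, which is the layout of Table 1.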



Abstract

The invention discloses a method for constructing a machine translation test set for chapter-level English translation. The method comprises the following steps: obtaining chapter-level English text data exhibiting anaphora, connective, and omission phenomena, together with the corresponding Chinese text data; filtering the acquired data so that it contains only English and Chinese vocabulary; taking the English text data as source-language data and the Chinese text data as target-language data; selecting ambiguous pronouns, polysemous connectives, and auxiliary verbs respectively as search parameters, searching the source-language data, and checking and correcting the target-language data; performing word segmentation and part-of-speech tagging on the checked and corrected data in both languages to obtain a candidate data set; and setting screening parameters for each phenomenon, screening the corresponding source-language and target-language data from the candidate data set, and building a pronoun (anaphora) test set, a chapter-level connective test set, and an omission test set respectively. The method can be used to test and evaluate the chapter-level translation ability of different machine translation models.
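The abstract's filtering step ("only English and Chinese vocabulary") and the source/target split can be approximated with a character-class check. This is a sketch under an assumed interpretation: a sentence pair is kept only if both sides consist of printable ASCII, CJK Unified Ideographs, and common Chinese punctuation. The exact character ranges and the helper names `is_clean` and `split_source_target` are hypothetical, not taken from the patent.

```python
import re
from typing import Iterable, List, Tuple

# ASSUMED allowed character set: printable ASCII, CJK Unified Ideographs,
# CJK symbols/punctuation, full-width ASCII variants, and the curly quotes,
# dash, and ellipsis commonly used in Chinese text.
ALLOWED = re.compile(
    r"[\x20-\x7E\u4E00-\u9FFF\u3000-\u303F\uFF01-\uFF5E"
    r"\u2014\u2018\u2019\u201C\u201D\u2026]*"
)


def is_clean(text: str) -> bool:
    """True if every character of `text` is in the assumed English/Chinese set."""
    return ALLOWED.fullmatch(text) is not None


def split_source_target(
    pairs: Iterable[Tuple[str, str]]
) -> Tuple[List[str], List[str]]:
    """Drop pairs with out-of-set characters; return (English source, Chinese target)."""
    sources: List[str] = []
    targets: List[str] = []
    for en, zh in pairs:
        if is_clean(en) and is_clean(zh):
            sources.append(en)
            targets.append(zh)
    return sources, targets
```

Dropping whole pairs, rather than stripping stray characters, keeps source and target aligned. Which of the two the patent intends is not stated.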

Description

Technical field

[0001]The invention relates to the field of machine translation, and in particular to a method for constructing a machine translation test set for text-level English translation.

Background art

[0002]With the steady improvement of machine translation technology, more and more research targets practical applications, and the research focus of the field has gradually shifted from the sentence level to the text level. Compared with sentence-level machine translation, text-level machine translation covers a wider span of text and must handle more problems and phenomena, which makes it more difficult.

[0003]While studying how to further improve the translation ability of machine translation models, researchers also face the problem of how to evaluate that ability more reasonably. A text-level machine translation model should not only consider the translation...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/58; G06F40/44; G06F40/289
CPC: G06F40/58; G06F40/44; G06F40/289
Inventors: Cai Xinyi (蔡心怡), Xiong Deyi (熊德意)
Owner: TIANJIN UNIV