Term translation mining method based on parallel corpus

A term translation mining method based on a parallel corpus, applied in the field of terminology translation. It addresses two problems: terminology dictionaries whose coverage cannot meet translation needs, and the low efficiency of manual translation. The method obtains term translations quickly, ensures their consistency, and improves accuracy and efficiency.

Pending Publication Date: 2022-05-31
IOL WUHAN INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

At present, once terms have been extracted from a manuscript to be translated, their translations are generally obtained either by manual translation or by searching and matching in terminology dictionaries. Manual translation is inefficient, and the coverage of terminology dictionaries may not meet the needs of term translation.

Examples

Embodiment 1

[0043] As shown in Figure 1, a term translation mining method based on a parallel corpus includes the following steps:

[0044] S1. Extract the original term S from the manuscript to be translated;

[0045] S2. Retrieve the parallel corpus M: perform a character string search for the original term S in the parallel corpus M;

[0046] S3. Obtain a set Q of candidate original and translated sentence pairs that contain the same original term S;

[0047] S4. Word segmentation and word alignment: perform word segmentation and word alignment on each sentence pair in the candidate set Q;

[0048] S5. Extract the term translation t corresponding to the original term S from the word alignment result;

[0049] S6. Generate the translation phrase set: use a phrase extraction algorithm or the spaCy tool to extract the phrase segments of each candidate translation sentence, and form the translation phrase set p after deduplic...
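
Steps S2, S3, S5 and S6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the toy corpus, the whitespace tokenization, the hand-written alignment pairs standing in for the output of S4, and the helper names `retrieve`, `extract_translation` and `ngram_phrases` are all hypothetical. Plain word n-grams are swapped in for the phrase extraction algorithm or spaCy named in S6.

```python
from collections import Counter

# Hypothetical toy parallel corpus M. Each entry holds a source sentence, a
# target sentence, and word alignments as (source_index, target_index) pairs.
# The alignments are written by hand here; in practice they would come from
# the word aligner run in step S4.
M = [
    {"src": "the neural network converges well",
     "tgt": "la red neuronal converge bien",
     "align": [(0, 0), (1, 2), (2, 1), (3, 3), (4, 4)]},
    {"src": "we train a neural network",
     "tgt": "entrenamos una red neuronal",
     "align": [(0, 0), (1, 0), (2, 1), (3, 3), (4, 2)]},
    {"src": "the model is small",
     "tgt": "el modelo es pequeño",
     "align": [(0, 0), (1, 1), (2, 2), (3, 3)]},
]


def retrieve(corpus, term):
    """S2/S3: string search for the term; returns the candidate set Q."""
    return [entry for entry in corpus if term in entry["src"]]


def find_span(tokens, term_tokens):
    """Return the start index of term_tokens inside tokens, or -1."""
    n = len(term_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == term_tokens:
            return i
    return -1


def extract_translation(entry, term):
    """S5: read the term translation t off the word alignment."""
    src, tgt = entry["src"].split(), entry["tgt"].split()
    term_tokens = term.split()
    start = find_span(src, term_tokens)
    if start < 0:
        return None
    span = range(start, start + len(term_tokens))
    tgt_idx = sorted({j for i, j in entry["align"] if i in span})
    if not tgt_idx:
        return None
    # Use the contiguous target range that covers every aligned position.
    return " ".join(tgt[tgt_idx[0]:tgt_idx[-1] + 1])


def ngram_phrases(sentence, max_len=3):
    """S6 stand-in: all word n-grams up to max_len (not a real phrase extractor)."""
    tokens = sentence.split()
    return {" ".join(tokens[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(tokens) - n + 1)}


S = "neural network"
Q = retrieve(M, S)
votes = Counter(t for t in (extract_translation(e, S) for e in Q) if t)
p = set().union(*(ngram_phrases(e["tgt"]) for e in Q))  # set semantics deduplicate
print(votes.most_common(1))
```

Counting how often each candidate t appears across Q gives a simple signal of which translation the corpus uses most consistently.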

Abstract

The invention discloses a term translation mining method based on a parallel corpus. The method comprises the following steps: S1, extracting an original term S from a manuscript to be translated; S2, retrieving a parallel corpus M; S3, obtaining a set Q of candidate original and translated sentence pairs containing the same original term S; S4, performing word segmentation and word alignment; S5, extracting the term translation t corresponding to the original term S from the word alignment result; S6, generating a translation phrase set p; S7, generating embedding vectors and calculating a distance d; and S8, fusing the models to obtain the optimal translation of the original term S. By retrieving the parallel corpus, the method quickly obtains the optimal translation of the original term, which ensures the accuracy and consistency of term translation and improves translation efficiency.
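
The abstract does not say which embedding model is used in S7, which vectors the distance d compares, or how S8 fuses the models. The sketch below is therefore only one possible reading, with every choice an assumption: character trigram counts stand in for a learned embedding, d is taken as the cosine distance between a candidate translation t and its nearest phrase in p, and the fusion is a hypothetical weighted sum (weight `alpha`) of the alignment vote share and `1 - d`.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy embedding: character n-gram counts (stand-in for a learned model)."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_distance(a, b):
    """d = 1 - cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0


def fuse(alignment_votes, phrases, alpha=0.5):
    """Hypothetical S8: combine alignment vote share with embedding closeness."""
    total = sum(alignment_votes.values())
    scores = {}
    for cand, count in alignment_votes.items():
        # Distance from the candidate to its nearest phrase in p (assumed reading of d).
        d = min((cosine_distance(embed(cand), embed(ph)) for ph in phrases),
                default=1.0)
        scores[cand] = alpha * count / total + (1 - alpha) * (1.0 - d)
    best = max(scores, key=scores.get)
    return best, scores


# Illustrative inputs only: votes as produced by alignment (S5), phrases from S6.
votes = Counter({"red neuronal": 2, "red": 1})
p = {"red neuronal", "la red", "converge"}
best, scores = fuse(votes, p)
print(best)
```

A candidate that both wins the alignment vote and closely matches an extracted phrase ranks highest; a truncated alignment such as a single word is penalized on both terms.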

Description

Technical field

[0001] The invention belongs to the technical field of term translation, and in particular relates to a term translation mining method based on a parallel corpus.

Background

[0002] The translation of terminology is a key issue in translation projects. Because terminology is specialized and may even consist of newly coined words, its accuracy directly affects the quality of the entire translation. Existing translation projects handle term translation as follows: during pre-translation processing, terms are extracted from the manuscript to be translated with the assistance of human translators or term extraction tools; when a sentence containing a term is encountered, the term's translation is applied directly, which ensures the accuracy and consistency of term translation across the entire manuscript. At present, in the process of extracting terms from manuscripts to be translated, methods such as manual translation or...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/211, G06F40/289
CPC: G06F40/211, G06F40/289
Inventor: 毛红保
Owner: IOL WUHAN INFORMATION TECH CO LTD