Online corpus alignment method and system

Corpus and sentence-pair technology, applied in the field of translation, improving alignment accuracy and efficiency and reducing alignment time

Active Publication Date: 2016-11-16
上海一者信息科技有限公司
Cites: 4; Cited by: 15

AI Technical Summary

Problems solved by technology

[0003] Aligning large-scale bilingual parallel texts requires solving two problems: alignment accuracy and alignment efficiency.



Examples


Embodiment 1

[0048] Figure 1 is a flowchart of the online corpus alignment method of the present invention. Referring to Figure 1, the online corpus alignment method includes the following steps:

[0049] S1: Parse the bilingual parallel (mutually translated) file to obtain a result file;

[0050] S2: Adjust the paragraphs of the result file so that the paragraphs of the original text and the translation correspond;

[0051] S3: Automatically segment the original text and the translation into original sentences and translated sentences using preset sentence segmentation rules, and compute the permutations and combinations of the original and translated sentences according to preset arrangement rules;

[0052] S4: Calculate the sentence similarity for each permutation and combination of original and translated sentences, and select the permutation and combination with the highest similarity as the final sentence alignment ...
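Step S3 depends on "preset sentence segmentation rules" that this excerpt does not spell out. A minimal Python sketch, assuming a hypothetical rule set in which a sentence ends at full-width Chinese terminators (。！？) or at an ASCII `.`, `!` or `?` followed by whitespace or the end of the text; the name `segment_sentences` is illustrative, not taken from the patent:

```python
import re
from typing import List

# Hypothetical preset rules: a sentence ends at one or more full-width Chinese
# terminators, or at ASCII . ! ? followed by whitespace or the end of the text
# (so decimals such as "3.5" are not split).
_SENTENCE_END = re.compile(r"[。！？]+|[.!?]+(?=\s|$)")


def segment_sentences(text: str) -> List[str]:
    """Split a paragraph into sentences, keeping each terminator with its sentence."""
    sentences: List[str] = []
    start = 0
    for match in _SENTENCE_END.finditer(text):
        piece = text[start:match.end()].strip()
        if piece:
            sentences.append(piece)
        start = match.end()
    tail = text[start:].strip()  # text after the last terminator, if any
    if tail:
        sentences.append(tail)
    return sentences


print(segment_sentences("Hello world. How are you? Fine"))
# ['Hello world.', 'How are you?', 'Fine']
print(segment_sentences("版本3.5很好。你好！"))
# ['版本3.5很好。', '你好！']
```

Abbreviations such as "e.g." and closing quotation marks after a terminator are not handled; a practical rule set would need exceptions for them.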

Embodiment 2

[0062] Figure 2 is a schematic diagram of the online corpus alignment system in Embodiment 2 of the present invention. The online corpus alignment system 100 includes:

[0063] a file parsing filter 10, configured to parse bilingual parallel files to obtain a result file;

[0064] a paragraph alignment module 20, configured to adjust the paragraphs of the result file so that the paragraphs of the original text and the translation correspond;

[0065] a sentence alignment module 30, configured to automatically segment the original text and the translation into original sentences and translated sentences using preset sentence segmentation rules, and to compute the permutations and combinations of the original and translated sentences according to preset arrangement rules; and then to calculate the sentence similarity for each permutation and combination of original and translated sentences, and the permutation and combination with the h...
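The three modules of system 100 can be wired together as a simple pipeline. A minimal Python sketch under stated assumptions: the input layout (original text, then a line containing only `---`, then the translation, with paragraphs separated by blank lines) is hypothetical; the paragraph module here only checks that paragraph counts match instead of adjusting them; and the sentence module takes its segmentation and alignment functions as parameters. All class names are illustrative, not from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

SentencePairs = List[Tuple[List[str], List[str]]]


def _split_paragraphs(text: str) -> List[str]:
    # Paragraphs are assumed to be separated by blank lines.
    return [p.strip() for p in text.split("\n\n") if p.strip()]


@dataclass
class FileParsingFilter:
    """Module 10 (sketch): split a bilingual file into original and translated paragraphs."""
    separator: str = "---"  # hypothetical line marking where the translation starts

    def parse(self, content: str) -> Tuple[List[str], List[str]]:
        original, _, translation = content.partition(f"\n{self.separator}\n")
        return _split_paragraphs(original), _split_paragraphs(translation)


class ParagraphAlignmentModule:
    """Module 20 (sketch): pair each original paragraph with its translated paragraph."""

    def align(self, src: List[str], tgt: List[str]) -> List[Tuple[str, str]]:
        if len(src) != len(tgt):
            # The described module adjusts paragraphs; this sketch only reports a mismatch.
            raise ValueError(f"paragraph count mismatch: {len(src)} vs {len(tgt)}")
        return list(zip(src, tgt))


@dataclass
class SentenceAlignmentModule:
    """Module 30 (sketch): segment each paragraph pair, then align its sentences."""
    segment: Callable[[str], List[str]]
    align: Callable[[List[str], List[str]], SentencePairs]

    def run(self, paragraph_pairs: List[Tuple[str, str]]) -> SentencePairs:
        pairs: SentencePairs = []
        for src_par, tgt_par in paragraph_pairs:
            pairs.extend(self.align(self.segment(src_par), self.segment(tgt_par)))
        return pairs


@dataclass
class OnlineCorpusAlignmentSystem:
    """System 100 (sketch): parsing filter -> paragraph alignment -> sentence alignment."""
    parser: FileParsingFilter
    paragraphs: ParagraphAlignmentModule
    sentences: SentenceAlignmentModule

    def process(self, content: str) -> SentencePairs:
        src, tgt = self.parser.parse(content)
        return self.sentences.run(self.paragraphs.align(src, tgt))


# Usage with toy components: '|' as the sentence boundary and a naive 1-1 pairing.
system = OnlineCorpusAlignmentSystem(
    FileParsingFilter(),
    ParagraphAlignmentModule(),
    SentenceAlignmentModule(
        segment=lambda p: [s for s in p.split("|") if s],
        align=lambda s, t: [([a], [b]) for a, b in zip(s, t)],
    ),
)
result = system.process("A1|A2\n\nB1\n---\nX1|X2\n\nY1")
print(result)
# [(['A1'], ['X1']), (['A2'], ['X2']), (['B1'], ['Y1'])]
```

Passing the segmentation and alignment steps in as functions keeps module 30 independent of any particular rule set or similarity measure.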


Abstract

The invention discloses an online corpus alignment method and system. The method comprises the steps of: parsing a bilingual parallel (mutually translated) file to obtain a result file; adjusting the paragraphs of the result file so that the paragraphs of the original text and the translated text correspond; automatically segmenting the original text and the translated text into original sentences and translated sentences using a preset sentence segmentation rule, and computing the arrangement combinations of the original and translated sentences according to a preset arrangement rule; and calculating the sentence similarity of each arrangement combination of original and translated sentences, and selecting the arrangement combination with the maximum similarity as the final sentence alignment result. The method and system can improve alignment accuracy.
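The selection step described above (score every arrangement combination and keep the one with maximum similarity) can be sketched in Python. The text shown here does not define the arrangement rule or the similarity measure, so both are assumptions: the hypothetical `ARRANGEMENT_RULES` allow 1-1, 1-2 and 2-1 groupings in monotone order, and `length_similarity` is a stand-in that compares character lengths scaled by an assumed ratio. Because the total score is a sum of per-group scores, memoized recursion returns the same maximum as enumerating every combination, without the exponential cost.

```python
from functools import lru_cache
from typing import List, Sequence, Tuple

# Hypothetical arrangement rules: allowed (original count, translated count) groupings.
ARRANGEMENT_RULES: Tuple[Tuple[int, int], ...] = ((1, 1), (1, 2), (2, 1))

Pairs = List[Tuple[List[str], List[str]]]


def length_similarity(src: Sequence[str], tgt: Sequence[str], ratio: float = 1.0) -> float:
    """Stand-in similarity in [0, 1] based on character lengths.

    `ratio` is an assumed expected translated/original length ratio.
    """
    s = sum(len(x) for x in src) * ratio
    t = sum(len(x) for x in tgt)
    if max(s, t) == 0:
        return 1.0
    return 1.0 - abs(s - t) / max(s, t)


def align_sentences(src: List[str], tgt: List[str], ratio: float = 1.0) -> Pairs:
    """Return the monotone grouping of sentences with the largest total similarity."""
    n, m = len(src), len(tgt)

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> Tuple[float, Tuple[Tuple[int, int], ...]]:
        # Best (score, groupings) for aligning src[i:] with tgt[j:].
        if i == n and j == m:
            return 0.0, ()
        result: Tuple[float, Tuple[Tuple[int, int], ...]] = (float("-inf"), ())
        for ds, dt in ARRANGEMENT_RULES:
            if i + ds > n or j + dt > m:
                continue  # grouping would run past the end of a side
            rest_score, rest = best(i + ds, j + dt)
            score = length_similarity(src[i:i + ds], tgt[j:j + dt], ratio) + rest_score
            if score > result[0]:
                result = (score, ((ds, dt),) + rest)
        return result

    score, groups = best(0, 0)
    if score == float("-inf"):
        raise ValueError("no combination covers all sentences under the given rules")
    pairs: Pairs = []
    i = j = 0
    for ds, dt in groups:
        pairs.append((src[i:i + ds], tgt[j:j + dt]))
        i, j = i + ds, j + dt
    return pairs


print(align_sentences(["aaaa", "bb", "cc"], ["xxxx", "yyyy"]))
# [(['aaaa'], ['xxxx']), (['bb', 'cc'], ['yyyy'])]
```

Swapping in a different similarity measure only requires replacing `length_similarity`. Recursion depth grows with the number of sentences, so the sketch suits per-paragraph use after paragraph alignment.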

Description

Technical field

[0001] The invention relates to the technical field of translation, in particular to an online corpus alignment method and system.

Background

[0002] Today, traditional translation companies, industry clients, universities and the Internet hold a large volume of bilingual translated texts. How to build a large-scale bilingual corpus from these texts has become a crucial issue in processing them.

[0003] Aligning large-scale bilingual parallel texts requires solving two problems. One is alignment accuracy; the other is alignment efficiency, since efficient alignment minimizes manual participation and improves production efficiency. Both are problems that those skilled in the art urgently need to solve.

[0004] It should be noted that the above introduction to the technical background is only for the convenience of a clear and complete...


Application Information

IPC(8): G06F17/28
CPC: G06F40/45, G06F40/58
Inventor: 张井陈件
Owner: 上海一者信息科技有限公司