A method and system for online corpus alignment

A corpus and sentence-pair alignment technology, applied in the field of translation, that improves alignment accuracy and efficiency and reduces alignment time

Active Publication Date: 2019-10-22
上海一者信息科技有限公司

AI Technical Summary

Problems solved by technology

[0003] Aligning large-scale bilingual translation texts requires solving two problems. One is alignment accuracy; the other is alignment efficiency, so that manual participation is minimized and production efficiency is improved.

Examples

Embodiment 1

[0048] Figure 1 is a flow chart of the online corpus alignment method of the present invention. Referring to Figure 1, the method includes the following steps:

[0049] S1: Parse the bilingual translation file to obtain a result file;

[0050] S2: Perform paragraph adjustment on the result file so that the paragraphs of the original text and the translated text correspond to each other;

[0051] S3: Automatically segment the original text and the translated text into sentences using preset segmentation rules to obtain original sentences and translated sentences, and compute the candidate arrangements (permutations and combinations) of the original sentences and the translated sentences according to a preset arrangement rule;

[0052] S4: Calculate the sentence similarity for each candidate arrangement of original and translated sentences, and select the arrangement with the highest similarity as the final sentence alignment result.
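
Steps S3 and S4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the text does not disclose the segmentation rules, the arrangement rule, or the similarity measure, so all three are assumptions here. The sketch assumes punctuation-based segmentation, an arrangement rule that only allows order-preserving 1-1, 1-2, and 2-1 sentence groupings, and a character-length ratio as the similarity. The names `segment`, `arrangements`, `length_similarity`, and `align` are hypothetical.

```python
import re
from typing import Callable, Iterator, List, Tuple

# One aligned group: (original sentences, translated sentences).
Bead = Tuple[List[str], List[str]]

# Assumed segmentation rule: a sentence ends at Western or CJK end punctuation.
SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")

# Assumed arrangement rule: how many original/translated sentences one group may hold.
ALLOWED = [(1, 1), (1, 2), (2, 1)]


def segment(paragraph: str) -> List[str]:
    """S3, first half: split a paragraph into sentences."""
    return [s.strip() for s in SENTENCE.findall(paragraph) if s.strip()]


def arrangements(src: List[str], tgt: List[str]) -> Iterator[List[Bead]]:
    """S3, second half: enumerate every order-preserving grouping allowed by ALLOWED."""
    if not src and not tgt:
        yield []
        return
    for m, n in ALLOWED:
        if m <= len(src) and n <= len(tgt):
            for rest in arrangements(src[m:], tgt[n:]):
                yield [(src[:m], tgt[:n])] + rest


def length_similarity(bead: Bead) -> float:
    """Assumed similarity: ratio of the shorter side to the longer side, in characters."""
    a = sum(len(s) for s in bead[0])
    b = sum(len(s) for s in bead[1])
    return min(a, b) / max(a, b)


def align(src_par: str, tgt_par: str,
          sim: Callable[[Bead], float] = length_similarity) -> List[Bead]:
    """S4: score every candidate arrangement and keep the one with the highest mean similarity."""
    candidates = [c for c in arrangements(segment(src_par), segment(tgt_par)) if c]
    if not candidates:
        return []
    return max(candidates, key=lambda c: sum(sim(b) for b in c) / len(c))


if __name__ == "__main__":
    result = align(
        "Short one. This is a much longer second sentence here.",
        "Court. Ceci est une phrase. Elle est bien plus longue.",
    )
    for originals, translations in result:
        print(originals, "<->", translations)
```

Exhaustive enumeration grows quickly with paragraph length, which is one reason the paragraph-level adjustment in S2 matters: it keeps each search confined to a single pair of paragraphs. A production system would likely replace the length ratio with a lexical or embedding-based similarity.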

[0053] The beneficial...

Embodiment 2

[0062] Figure 2 is a schematic diagram of an online corpus alignment system according to the second embodiment of the present invention. The online corpus alignment system 100 includes:

[0063] A file parsing filter 10, used to parse the bilingual translation file to obtain a result file;

[0064] A paragraph alignment module 20, used to perform paragraph adjustment on the result file so that the paragraphs of the original text and the translated text correspond to each other;

[0065] A sentence alignment module 30, used to automatically segment the original text and the translated text into sentences according to preset segmentation rules to obtain original sentences and translated sentences, and to compute the candidate arrangements of the original sentences and the translated sentences according to a preset arrangement rule; it then calculates the sentence similarity corresponding to each permutation and c...
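
The division into three modules can be pictured as a small pipeline. The following Python sketch only illustrates how the file parsing filter 10, paragraph alignment module 20, and sentence alignment module 30 might be chained; the stand-in functions, the `AlignmentSystem` class, and the `===` separator between original and translated text are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Parsed = Tuple[List[str], List[str]]   # (original paragraphs, translated paragraphs)
ParagraphPair = Tuple[str, str]
SentencePair = Tuple[str, str]


def _paragraphs(block: str) -> List[str]:
    """Split a text block into paragraphs on blank lines."""
    return [p.strip() for p in block.split("\n\n") if p.strip()]


def parse_file(text: str) -> Parsed:
    """Stand-in for file parsing filter 10 (assumes '===' separates the two languages)."""
    src, _, tgt = text.partition("\n===\n")
    return _paragraphs(src), _paragraphs(tgt)


def pair_paragraphs(parsed: Parsed) -> List[ParagraphPair]:
    """Stand-in for paragraph alignment module 20 (pairs paragraphs by position)."""
    src, tgt = parsed
    return list(zip(src, tgt))


def pair_sentences(pair: ParagraphPair) -> List[SentencePair]:
    """Stand-in for sentence alignment module 30 (treats each paragraph as one sentence)."""
    return [pair]


@dataclass
class AlignmentSystem:
    """Chains the three modules; each one can be swapped independently."""
    file_filter: Callable[[str], Parsed]
    paragraph_module: Callable[[Parsed], List[ParagraphPair]]
    sentence_module: Callable[[ParagraphPair], List[SentencePair]]

    def run(self, text: str) -> List[SentencePair]:
        pairs = self.paragraph_module(self.file_filter(text))
        return [sp for pair in pairs for sp in self.sentence_module(pair)]


if __name__ == "__main__":
    system = AlignmentSystem(parse_file, pair_paragraphs, pair_sentences)
    sample = "Hello world.\n\nSecond paragraph.\n===\nBonjour le monde.\n\nDeuxième paragraphe."
    for original, translation in system.run(sample):
        print(original, "<->", translation)
```

Keeping each module behind a plain callable means a more capable sentence aligner can replace `pair_sentences` without touching file parsing or paragraph pairing.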

Abstract

The invention discloses an online corpus alignment method and system. The method comprises: parsing a bilingual translation file to obtain a result file; performing paragraph adjustment on the result file so that the paragraphs of the original text and the translated text correspond; automatically segmenting the original text and the translated text into sentences using a preset sentence segmentation rule to obtain original sentences and translated sentences, and computing the candidate arrangements of the original sentences and the translated sentences according to a preset arrangement rule; and calculating the sentence similarity corresponding to each candidate arrangement, and selecting the arrangement with the maximum similarity as the final sentence alignment result. The method and system can improve alignment accuracy.

Description

Technical field

[0001] The invention relates to the technical field of translation, and in particular to an online corpus alignment method and system.

Background

[0002] Traditional translation companies, industry customers, universities, and the Internet hold large volumes of bilingual translation texts. How to build a large-scale bilingual corpus from these texts has become a crucial issue in processing them.

[0003] Aligning large-scale bilingual translation texts requires solving two problems. One is alignment accuracy; the other is alignment efficiency, so that manual participation is minimized and production efficiency is improved. Both are problems that those skilled in the art urgently need to solve.

[0004] It should be noted that the above description of the technical background is only for the convenience of cl...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F17/28
CPC: G06F40/45, G06F40/58
Inventor: 张井陈件
Owner: 上海一者信息科技有限公司