Method and device for parallel corpus alignment

A parallel-corpus sentence alignment technology, applied in the field of translation, that saves time, automates alignment, and improves efficiency

Active Publication Date: 2018-08-10
IOL WUHAN INFORMATION TECH CO LTD
8 Cites, 0 Cited by
AI Technical Summary

Problems solved by technology

[0004] One purpose of the embodiments of the present invention is to overcome the above-mentioned deficiencies in the prior art by providing a method for aligning parallel corpora that aligns original and translated texts based on the similarity of content words.
[0005] Another purpose of the embodiments of the present invention is to overcome the above-mentioned deficiencies in the prior art by providing a device for aligning parallel corpora that aligns original and translated texts based on the similarity of content words.

Examples

Embodiment Construction

[0026] In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention, not to limit it.

[0027] An embodiment of the present invention provides a method for aligning parallel corpora. Figure 1 shows a flow chart of the method for aligning parallel corpora in an embodiment of the present invention. The specific process of the parallel corpus alignment method is as follows:

[0028] Step S10: Convert all the original text sentences in the original text and all the translated text sentences in the translated text into characters of the same character encoding.
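The following is a minimal sketch of what bringing both texts into one encoding could look like. It assumes each input arrives as raw bytes with a known source encoding, and it additionally applies Unicode NFKC normalization so that full-width and half-width variants compare equal. The function name `to_common_encoding` and the choice of NFKC are illustrative assumptions, not details stated in the patent.

```python
import unicodedata


def to_common_encoding(raw: bytes, source_encoding: str) -> str:
    """Decode bytes to a Unicode string and normalize it.

    Hypothetical helper: the patent only requires that source and
    target sentences end up in the same encoding; decoding to Unicode
    and applying NFKC is one assumed way to achieve that.
    """
    text = raw.decode(source_encoding)
    # NFKC folds compatibility variants (e.g. full-width Latin letters)
    # into their canonical forms.
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    from_gbk = to_common_encoding("中文".encode("gbk"), "gbk")
    from_utf8 = to_common_encoding("中文".encode("utf-8"), "utf-8")
    print(from_gbk == from_utf8)
```

After this step, sentences from both texts are ordinary Unicode strings, so later dictionary lookups and string matching do not depend on the files' original encodings.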

[0029] Step S10 comprises the following steps:

[0030] Step S101: Read the characters or character strings in the original s...

Abstract

The invention discloses a parallel corpus alignment method. The method comprises the following steps: converting all the original text sentences in an original text and all the translated text sentences in a translated text into characters with the same encoding; performing word segmentation on the converted original text sentences and removing stop words to obtain content words; obtaining all the translations of each content word of each original text sentence; matching all the translations of each content word of each original text sentence against the converted translated text sentences to obtain the similarity between each content word of each original text sentence and the translated text sentences; matching each original text sentence with the translated text sentences according to these content-word similarities to obtain the similarity between each original text sentence and the translated text sentences; and aligning each original text sentence with the translated text sentence that has the highest similarity to it. The invention furthermore discloses a parallel corpus alignment device. The method and device solve the problem of aligning original texts with translated texts.
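The steps in the abstract can be sketched roughly as follows. This is an illustrative toy, not the patented implementation: the stop-word list, the bilingual dictionary, whitespace segmentation, the binary word-match score, and averaging over content words are all assumptions made here, since the abstract does not specify how the similarities are computed.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy resources; a real system would use a proper word
# segmenter, stop-word list, and bilingual dictionary.
STOP_WORDS: Set[str] = {"the", "a", "is", "on"}
DICTIONARY: Dict[str, List[str]] = {
    "cat": ["chat"],
    "sleeps": ["dort"],
    "dog": ["chien"],
    "barks": ["aboie"],
}


def content_words(sentence: str) -> List[str]:
    """Segment on whitespace and drop stop words."""
    return [w for w in sentence.lower().split() if w not in STOP_WORDS]


def word_similarity(word: str, target: str) -> float:
    """Assumed score: 1.0 if any translation of `word` occurs in `target`."""
    target_tokens = set(target.lower().split())
    translations = DICTIONARY.get(word, [])
    return 1.0 if any(t in target_tokens for t in translations) else 0.0


def sentence_similarity(source: str, target: str) -> float:
    """Assumed aggregation: mean content-word similarity."""
    words = content_words(source)
    if not words:
        return 0.0
    return sum(word_similarity(w, target) for w in words) / len(words)


def align(sources: List[str], targets: List[str]) -> List[Tuple[str, str]]:
    """Pair each source sentence with its highest-scoring target sentence."""
    return [
        (s, max(targets, key=lambda t: sentence_similarity(s, t)))
        for s in sources
    ]


if __name__ == "__main__":
    pairs = align(
        ["The cat sleeps", "The dog barks"],
        ["Le chien aboie", "Le chat dort"],
    )
    for source, target in pairs:
        print(source, "<->", target)
```

In this sketch each source sentence independently picks its best target, so two source sentences may select the same target, and `align` expects a non-empty `targets` list. Whether the patented method enforces one-to-one alignment is not stated in the abstract.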

Description

Technical field

[0001] The invention relates to the technical field of translation, in particular to a method and device for aligning parallel corpora.

Background technique

[0002] Parallel corpora play a fundamental role in many fields such as machine translation, assisted translation, semantic disambiguation, and lexicography. Alignment of parallel corpora refers to matching the original text and the translated text at different segmentation granularities to form a standardized language pair. The unit of corpus alignment has different granularities, from large to small: chapters, paragraphs, sentences, and words. The smaller the granularity of the parallel corpus, the richer the language information it provides and the greater its application value.

[0003] Generally speaking, if the corpus is aligned by chapter or paragraph, the original text and the translation can be aligned in order. However, aligning the source text and the target text by sentences or smaller...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F17/25, G06F17/22, G06F17/27, G06F40/189
CPC: G06F40/126, G06F40/189, G06F40/30
Inventor: 江潮, 张芃
Owner: IOL WUHAN INFORMATION TECH CO LTD