Parallel corpus aligning method and device

A parallel corpus and sentence alignment technology, applied in the field of translation, that aligns an original text with its translation, saving time and improving efficiency.

Active Publication Date: 2016-06-08
IOL WUHAN INFORMATION TECH CO LTD
Cites: 8 | Cited by: 2

AI Technical Summary

Problems solved by technology

[0004] The purpose of the embodiments of the present invention is to overcome the above-mentioned deficiencies in the prior art by providing a method for aligning parallel corpora that aligns original and translated texts based on the similarity of content words.
[0005] Another purpose of the embodiments of the present invention is to overcome the above-mentioned deficiencies in the prior art by providing a device for aligning parallel corpora that aligns original and translated texts based on the similarity of content words.

Embodiment Construction

[0026] In order to make the objectives, technical solutions and advantages of the present invention clearer, the following further describes the present invention in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

[0027] An embodiment of the present invention provides a method for parallel corpus alignment. Figure 1 is a flowchart of the parallel corpus alignment method according to an embodiment of the present invention. The method proceeds as follows:

[0028] Step S10: Convert all original sentences in the original text and all target sentences in the translation into characters of the same encoding.

[0029] Step S10 includes the following steps:

[0030] Step S101: Read the characters or character strings in the original sentence accor...
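
Step S101 is cut off in this copy, so the exact conversion procedure is not given. As a hedged illustration of the goal of Step S10 only, the sketch below assumes each file's source encoding is known (for example GBK for a Chinese translation), decodes the bytes to Unicode, and applies NFKC normalization so that full-width and half-width forms of the same character compare equal. The function names `to_common_encoding` and `convert_corpus` are hypothetical, and NFKC is an assumed choice, not one stated in the patent.

```python
import unicodedata


def to_common_encoding(raw: bytes, source_encoding: str) -> str:
    """Decode bytes from an (assumed known) encoding and normalize to NFKC."""
    text = raw.decode(source_encoding)
    # NFKC folds compatibility forms, e.g. full-width 'Ａ' (U+FF21) becomes 'A'.
    return unicodedata.normalize("NFKC", text)


def convert_corpus(original: list[bytes], translation: list[bytes],
                   original_encoding: str,
                   translation_encoding: str) -> tuple[list[str], list[str]]:
    """Bring every original and target sentence into one common representation."""
    src = [to_common_encoding(s, original_encoding) for s in original]
    tgt = [to_common_encoding(t, translation_encoding) for t in translation]
    return src, tgt
```

For example, `to_common_encoding("ＡＢＣ１２３".encode("utf-8"), "utf-8")` yields `"ABC123"`. Once both sides are Python `str` objects in the same normal form, later substring matching of translation entries does not depend on how each file was originally encoded.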

Abstract

The invention discloses a parallel corpus alignment method comprising the following steps: converting all original sentences in an original text and all translated sentences in a translation into characters of the same encoding; performing word segmentation on the converted original sentences and removing stop words to obtain content words; obtaining all translation entries for each content word of each original sentence; matching those translation entries against the converted translated sentences to obtain the similarity between each content word of each original sentence and each translated sentence; computing, from these content-word similarities, the similarity between each original sentence and each translated sentence; and aligning each original sentence with the translated sentence that has the highest similarity to it. The invention further discloses a parallel corpus alignment device. The method and device solve the problem of aligning original texts with their translations.
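
The steps in the abstract can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the English-to-Chinese dictionary and stop-word list are hypothetical toy data, segmentation is a simple regex over an English source, a content word scores 1.0 when any of its translation entries occurs in the target sentence, and sentence similarity is the fraction of matched content words. The abstract does not specify these scoring or aggregation rules.

```python
import re
import unicodedata

# Hypothetical toy resources, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "is", "to", "and", "in"}
DICTIONARY = {
    "cat": ["猫"],
    "sleeps": ["睡觉", "睡"],
    "dog": ["狗"],
    "barks": ["叫", "吠"],
}


def normalize(text: str) -> str:
    """Stand-in for the encoding step: one Unicode normal form (assumed NFKC)."""
    return unicodedata.normalize("NFKC", text)


def content_words(sentence: str) -> list[str]:
    """Segment an English sentence into lowercase words and drop stop words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def word_similarity(word: str, target: str) -> float:
    """Assumed rule: 1.0 if any translation entry of the word occurs in the target."""
    entries = DICTIONARY.get(word, [])
    return 1.0 if any(entry in target for entry in entries) else 0.0


def sentence_similarity(source: str, target: str) -> float:
    """Assumed aggregation: fraction of content words whose translation is found."""
    words = content_words(source)
    if not words:
        return 0.0
    return sum(word_similarity(w, target) for w in words) / len(words)


def align(sources: list[str], targets: list[str]) -> list[tuple[int, int]]:
    """Pair each source sentence index with its highest-similarity target index."""
    if not targets:
        return []
    sources = [normalize(s) for s in sources]
    targets = [normalize(t) for t in targets]
    pairs = []
    for i, src in enumerate(sources):
        scores = [sentence_similarity(src, tgt) for tgt in targets]
        # max() keeps the first target on ties.
        best = max(range(len(targets)), key=scores.__getitem__)
        pairs.append((i, best))
    return pairs
```

With the toy dictionary, `align(["The dog barks.", "The cat sleeps."], ["猫在睡觉。", "狗在叫。"])` returns `[(0, 1), (1, 0)]`, so the alignment follows content rather than sentence order. Substring matching is used because Chinese target sentences are not space-delimited; a real system would use a proper segmenter and a full bilingual dictionary.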

Description

Technical field

[0001] The present invention relates to the technical field of translation, and in particular to a method and device for parallel corpus alignment.

Background art

[0002] Parallel corpora play a fundamental role in many fields, such as machine translation, computer-aided translation, semantic disambiguation and dictionary compilation. Aligning a parallel corpus means matching the original text with its translation at a chosen granularity to form standardized language pairs. Alignment units range from coarse to fine: chapters, paragraphs, sentences and words. The finer the alignment granularity, the richer the linguistic information the corpus provides and the greater its application value.

[0003] Generally speaking, if the corpus is aligned by chapter or paragraph, the original text and the target text can be aligned in order. However, aligning the original text and tar...

Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/25; G06F17/22; G06F17/27; G06F40/189
CPC: G06F40/126; G06F40/189; G06F40/30
Inventors: 江潮 (Jiang Chao), 张芃 (Zhang Peng)
Owner: IOL WUHAN INFORMATION TECH CO LTD