Multi-translation parallel corpus construction system

A parallel-corpus and translation technology, applied in semantic tool creation, natural language translation, and unstructured text data retrieval. It addresses the inability to align multiple translations and the inaccurate alignment of corpora, enabling alignment of multiple translations and improving alignment precision.

Active Publication Date: 2016-08-10
BEIJING LANGUAGE AND CULTURE UNIVERSITY
Cites: 6. Cited by: 16.

AI Technical Summary

Problems solved by technology

At the same time, the alignment of existing parallel corpora is imprecise. For automatic alignment, some approaches use statistical methods, while others use sentence-sorting methods, such as sorting the sentence pairs in the parallel corpus according to certain criteria, s...



Embodiment Construction

[0024] To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments and accompanying drawings. The exemplary embodiments and their descriptions are intended to explain the present invention, not to limit it.

[0025] Figure 1 is a schematic structural diagram of the multi-translation parallel corpus construction system in an embodiment of the present invention. As shown in Figure 1, the system includes:

[0026] A deep semantic similarity calculation device 10, which is used to calculate the deep semantic similarity between the source language text sentence and the sentence to be matched in each of the multiple translations;

[0027] A calculation device 20 for representative dictionary similarity and other statistical information similarity, used to calculate the representative dictionary...
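The per-translation scoring attributed to device 10 can be illustrated with a minimal sketch. It assumes each sentence has already been mapped to a fixed-length vector and uses cosine similarity as a stand-in score; this excerpt does not say how the deep semantic similarity is actually computed, so the measure and the names `cosine_similarity` and `similarity_to_translations` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_to_translations(
    source_vec: Sequence[float],
    translations: Dict[str, List[Sequence[float]]],
) -> Dict[str, List[float]]:
    """Score one source sentence vector against every candidate sentence
    vector of every translation (assumed stand-in for device 10)."""
    return {
        name: [cosine_similarity(source_vec, cand) for cand in candidates]
        for name, candidates in translations.items()
    }
```

Calling `similarity_to_translations` once per source sentence yields, for each translation, one score per sentence to be matched, which is the shape of input a later matching step would need.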


Abstract

The present invention provides a multi-translation parallel corpus construction system. The system comprises: a deep semantic similarity calculator, used to calculate the deep semantic similarity between a source language text sentence and the sentence to be matched in each of multiple translations; a calculator for representative dictionary similarity and other statistical information similarity; a fusion matching degree calculator, used to calculate the fusion matching degree between the source language text sentence and the sentence to be matched in each of the multiple translations; a sentence matching apparatus, used to match sentences between the source language text and each translation according to the fusion matching degree, where the fusion matching degrees between the source language text and the other translations are taken into account during matching; and a multi-translation parallel corpus construction apparatus, used to construct the multi-translation parallel corpus from the matching result. This technical scheme enables construction of a multi-translation parallel corpus and improves the precision of corpus alignment, and the resulting corpus is robust.
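The pipeline in the abstract (fuse two scores, match sentences while consulting the other translations, then build the corpus) can be sketched roughly as follows. The abstract does not give the fusion formula or the matching rule, so everything here is an assumption: a weighted linear fusion with a hypothetical weight `alpha`, and a matching rule that blends each best score with the mean best score the same source sentence reaches in the other translations, using hypothetical parameters `beta` and `threshold`.

```python
from typing import Dict, List, Optional

# Matrix[i][j]: score of source sentence i against candidate sentence j.
Matrix = List[List[float]]


def fuse(deep: Matrix, lexical: Matrix, alpha: float = 0.5) -> Matrix:
    """Weighted linear fusion of two same-shaped score matrices (assumed formula)."""
    return [
        [alpha * d + (1.0 - alpha) * l for d, l in zip(deep_row, lex_row)]
        for deep_row, lex_row in zip(deep, lexical)
    ]


def match_sentences(
    fused: Dict[str, Matrix], beta: float = 0.2, threshold: float = 0.5
) -> Dict[str, List[Optional[int]]]:
    """Pick the best candidate per source sentence in each translation.

    The best score is blended with the mean best score of the same source
    sentence in the other translations; results below threshold become None.
    All matrices are assumed to have one row per source sentence.
    """
    best = {name: [max(row) if row else 0.0 for row in m] for name, m in fused.items()}
    result: Dict[str, List[Optional[int]]] = {}
    for name, matrix in fused.items():
        others = [n for n in fused if n != name]
        aligned: List[Optional[int]] = []
        for i, row in enumerate(matrix):
            if not row:
                aligned.append(None)
                continue
            j = max(range(len(row)), key=row.__getitem__)
            support = sum(best[o][i] for o in others) / len(others) if others else 0.0
            score = (1.0 - beta) * row[j] + beta * support
            aligned.append(j if score >= threshold else None)
        result[name] = aligned
    return result


def build_corpus(alignments: Dict[str, List[Optional[int]]]) -> List[Dict[str, Optional[int]]]:
    """One row per source sentence, mapping each translation to its matched index."""
    names = sorted(alignments)
    n = len(alignments[names[0]]) if names else 0
    return [{name: alignments[name][i] for name in names} for i in range(n)]
```

In this sketch a source sentence that aligns confidently in the other translations gets a small boost, and one that aligns poorly everywhere is left unmatched. The blending rule is only one plausible reading of "referred to during matching".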

Description

Technical field

[0001] The invention relates to the technical field of corpus construction, and in particular to a system for constructing a multi-translation parallel corpus.

Background

[0002] The rapid development of the Internet has brought explosive growth in network data and text, along with a wealth of multilingual information. With the help of this rich multilingual information, better machine translation systems can be built. Manual translation is time-consuming and costly, and it can no longer meet people's growing demand for multilingual information. Machine translation can automatically translate one natural language into another. Using machine translation to quickly obtain multilingual information and resources has become an inevitable trend. This makes machine translation systems and equipment that can provide multilingual, high-quality, and accessible translation ser...


Application Information

IPC(8): G06F17/28, G06F17/30
CPC: G06F16/3344, G06F16/36, G06F40/58
Inventor: 吴平, 吴增欣, 唐嘉梨, 张弛, 安丰科
Owner BEIJING LANGUAGE AND CULTURE UNIVERSITY