Method and device for extension of data in bilingual corpuses

A bilingual corpus data expansion technology, applied in natural language data processing, digital data processing, and special data processing applications, that addresses data sparseness in bilingual corpora.

Active Publication Date: 2014-02-12
BEIJING BAIDU NETCOM SCI & TECH CO LTD
Cites: 4; Cited by: 12

AI Technical Summary

Problems solved by technology

[0004] In view of this, the present invention proposes a bilingual corpus da...

Embodiment Construction

[0023] The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention, not to limit it. In addition, for convenience of description, the drawings show only the parts related to the present invention rather than all of the content.

[0024] Figure 1 shows a first embodiment of the invention.

[0025] Figure 1 is a schematic flowchart of the bilingual corpus data expansion method provided by the first embodiment of the present invention. The method is suitable for expanding a source language-target language bilingual corpus based on a source language-pivot language corpus and a pivot language-target language corpus. Specifically, it can be implemented by a bilingual corpus data expansion device, and the device can be configu...

Abstract

The invention discloses a method and device for extension of data in bilingual corpuses. The data expansion method includes the following steps: searching a source language-pivot language corpus for at least one first pivot language phrase whose meaning matches a first source language phrase; searching the source language-pivot language corpus for at least one second source language phrase whose meaning matches each first pivot language phrase; searching a pivot language-target language corpus for at least one first target language phrase whose meaning matches each first pivot language phrase; combining the second source language phrases in the resulting source language phrase set with the first target language phrases in the resulting target language phrase set; and storing the combined source-target phrase pairs in the source language-target language corpus. The method expands the data in bilingual corpora and thereby addresses the problem of data sparseness in them.
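The five steps in the abstract can be sketched in Python as below. This is a minimal, hypothetical illustration, not the patented implementation: it assumes each corpus is an in-memory list of phrase pairs and approximates "whose meaning matches" with an exact lookup in those pairs, whereas the patent's actual matching criteria and storage format are not given in this excerpt. The names `build_index` and `expand_corpus` and the toy data are invented for the example.

```python
from collections import defaultdict

# Hypothetical sketch: corpora are lists of (phrase, phrase) tuples, and
# "meaning matches" is approximated by an exact phrase-pair lookup.


def build_index(pairs):
    """Map each left-hand phrase to the set of right-hand phrases paired with it."""
    index = defaultdict(set)
    for left, right in pairs:
        index[left].add(right)
    return index


def expand_corpus(src_pivot, pivot_tgt, src_tgt, first_source):
    """Add new source-target pairs for first_source to src_tgt via the pivot language.

    Returns the newly stored pairs, sorted.
    """
    src_to_pivot = build_index(src_pivot)
    pivot_to_src = build_index((p, s) for s, p in src_pivot)
    pivot_to_tgt = build_index(pivot_tgt)

    # Step 1: first pivot phrases that match the first source phrase.
    first_pivots = src_to_pivot.get(first_source, set())

    source_set, target_set = set(), set()
    for pivot in first_pivots:
        # Step 2: second source phrases that match each first pivot phrase.
        source_set |= pivot_to_src.get(pivot, set())
        # Step 3: first target phrases that match each first pivot phrase.
        target_set |= pivot_to_tgt.get(pivot, set())

    # Step 4: combine every source phrase with every target phrase.
    # Step 5: store only the pairs the source-target corpus does not yet have.
    existing = set(src_tgt)
    new_pairs = sorted(
        (s, t) for s in source_set for t in target_set if (s, t) not in existing
    )
    src_tgt.extend(new_pairs)
    return new_pairs


# Toy Spanish-English-French data (invented for illustration).
src_pivot = [("hola", "hello"), ("buenas", "hello"), ("adios", "goodbye")]
pivot_tgt = [("hello", "bonjour"), ("hello", "salut"), ("goodbye", "au revoir")]
src_tgt = [("hola", "bonjour")]

new = expand_corpus(src_pivot, pivot_tgt, src_tgt, "hola")
print(new)  # [('buenas', 'bonjour'), ('buenas', 'salut'), ('hola', 'salut')]
```

Because the second source phrase set also contains the first source phrase itself, the sketch recovers direct pairs the source-target corpus was missing (here `hola`/`salut`) as well as pairs for pivot-linked paraphrases such as `buenas`.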

Description

Technical field

[0001] The invention relates to the technical field of machine translation, and in particular to a data expansion method and device for a bilingual corpus.

Background

[0002] Machine translation systems can be divided into rule-based, example-based, and statistics-based systems. Statistical machine translation emerged in the 1990s and is currently the most important type of machine translation system. Because it does not require manually written rules and can be applied to any language, it is widely used.

[0003] The translation quality of a statistical machine translation system depends largely on the quality of its corpus: the more data the corpus contains and the higher its quality, the better the system translates. At the beginning of the establi...

Application Information

IPC(8): G06F17/28; G06F17/30; G06F40/00
CPC: G06F16/3329; G06F16/3337; G06F40/45; G06F16/24522; G06F16/24556; G06F40/242; G06F40/49
Inventors: 朱晓宁 (Zhu Xiaoning), 何中军 (He Zhongjun), 吴华 (Wu Hua), 王海峰 (Wang Haifeng)
Owner BEIJING BAIDU NETCOM SCI & TECH CO LTD