Data expansion method and device for bilingual corpus

A bilingual corpus data expansion technology, applied in natural language data processing, digital data processing, special data processing applications, etc., that addresses the problem of data sparseness in bilingual corpora.

Active Publication Date: 2018-01-23
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] In view of this, the present invention proposes a bilingual corpus data expansion method and device to solve the data sparseness problem of bilingual corpora.


Examples


Embodiment Construction

[0023] The present invention is described in further detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it. It should also be noted that, for convenience of description, the drawings show only the parts related to the present invention rather than all content.

[0024] Figure 1 shows a first embodiment of the invention.

[0025] Figure 1 is a schematic flowchart of the bilingual corpus data expansion method provided by the first embodiment of the present invention. This method is suitable for expanding a source language-target language bilingual corpus based on a source language-pivot language corpus and a pivot language-target language corpus. Specifically, it can be carried out by the data expansion device of the bilingual corpus; the device can be configu...


Abstract

The invention discloses a data expansion method and device for a bilingual corpus. The data expansion method includes: searching the source language-pivot language corpus for at least one first pivot language phrase that semantically matches a first source language phrase; searching for at least one second source language phrase that semantically matches each first pivot language phrase, forming a source language phrase set; searching the pivot language-target language corpus for at least one first target language phrase that semantically matches each first pivot language phrase, forming a target language phrase set; combining the second source language phrases in the source language phrase set with the first target language phrases in the target language phrase set; and storing the resulting source language-target language phrase pairs in the source language-target language corpus. The invention thereby expands the data in the bilingual corpus and solves its data sparseness problem.
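
The steps in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the names `build_indexes` and `expand_corpus` are hypothetical, "semantic matching" is simplified to co-occurrence in a stored phrase pair, and the second source language phrases are assumed to be looked up in the source language-pivot language corpus.

```python
from collections import defaultdict


def build_indexes(pairs):
    """Index a list of (a, b) phrase pairs in both directions."""
    forward, reverse = defaultdict(set), defaultdict(set)
    for a, b in pairs:
        forward[a].add(b)
        reverse[b].add(a)
    return forward, reverse


def expand_corpus(first_source, src_pivot_pairs, pivot_tgt_pairs, src_tgt_corpus):
    """Add pivot-derived phrase pairs to src_tgt_corpus (a set of tuples).

    Assumption: two phrases "semantically match" when they appear together
    in a phrase pair. A real system would use a richer matching criterion.
    Returns the set of pairs that were not already in the corpus.
    """
    src_to_pivot, pivot_to_src = build_indexes(src_pivot_pairs)
    pivot_to_tgt, _ = build_indexes(pivot_tgt_pairs)

    # Step 1: first pivot phrases matching the first source phrase.
    pivots = src_to_pivot.get(first_source, set())

    # Steps 2 and 3: collect the source phrase set and target phrase set
    # reachable through each first pivot phrase.
    source_set, target_set = set(), set()
    for pivot in pivots:
        source_set |= pivot_to_src.get(pivot, set())
        target_set |= pivot_to_tgt.get(pivot, set())

    # Step 4: combine every source phrase with every target phrase.
    new_pairs = {(s, t) for s in source_set for t in target_set}

    # Step 5: store the combined pairs in the source-target corpus.
    added = new_pairs - src_tgt_corpus
    src_tgt_corpus |= new_pairs
    return added


if __name__ == "__main__":
    # Toy data: Spanish (source), English (pivot), French (target).
    corpus = {("hola", "bonjour")}
    added = expand_corpus(
        "hola",
        [("hola", "hello"), ("buenas", "hello"), ("hola", "hi")],
        [("hello", "bonjour"), ("hi", "salut")],
        corpus,
    )
    print(sorted(added))
```

Note that the full cross-product can pair phrases that never shared a pivot phrase (here, "buenas" with "salut"). The abstract describes combining the two sets as a whole, so the sketch does the same; any scoring or filtering of the combined pairs would be an additional design choice not stated in this text.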

Description

Technical field

[0001] The invention relates to the technical field of machine translation, and in particular to a data expansion method and device for a bilingual corpus.

Background technique

[0002] Machine translation systems can be divided into rule-based, instance-based, and statistics-based machine translation systems. The statistics-based machine translation system emerged in the 1990s and is currently the most important kind of machine translation system. It does not require manually written rules and is applicable to all languages, so it is widely used.

[0003] The translation quality of a statistics-based machine translation system largely depends on the quality of the corpus. That is, the more data in the corpus and the higher its quality, the higher the translation quality of the statistics-based machine translation system. At the beginning of the establi...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/28, G06F17/30, G06F40/00
CPC: G06F16/3329, G06F16/3337, G06F40/45, G06F16/24522, G06F16/24556, G06F40/242, G06F40/49
Inventor: 朱晓宁, 何中军, 吴华, 王海峰
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD