Annotation library generation method and annotation library generation device

A technique for generating annotation libraries (labeled corpora). It addresses the shortage of labeled data and the heavy manual effort that labeling requires, and it does so at low cost.

Active Publication Date: 2017-06-09
HUAWEI TECH CO LTD

AI Technical Summary

Problems solved by technology

However, at present, the data in the manually-labeled corpus is seriously lacking, and ge...



Examples


Embodiment Construction

[0051] To help those skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0052] The terms "first", "second", "third" and "fourth" in the specification and claims of the present invention and the above drawings are used to distinguish different objects, rather than to describe a specific order. Furthermore, the terms "include" and "have", as well as any variations thereof, are intended to cover a non-excl...


Abstract

An embodiment of the invention discloses an annotation library generation method and an annotation library generation device. The method includes: aligning a target statement and a source statement in a bilingual parallel corpus, the two statements being mutual translations at the sentence level; if a first specific word in the target statement, belonging to a first word class, has no corresponding source specific word in the source statement, acquiring a first candidate set, which includes candidate source specific words that are translations of the first specific word; acquiring a candidate position set according to the alignment relation between the source statement and the target statement, wherein the candidate position set includes the positions in the source statement at which a source specific word may have been omitted; acquiring the correctness probability of each statement in a second candidate set according to a preset language probability model, wherein the second candidate set includes the candidate source statements formed by filling the candidate source specific words of the first candidate set into the positions of the candidate position set; and generating an annotation library that includes the candidate source statements selected according to those correctness probabilities.
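The steps in the abstract can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the word class is taken to be pronouns, the `PRONOUNS` dictionary and the example sentences are toy data, `candidate_positions` uses a simple neighbour-alignment heuristic, and an add-one-smoothed bigram model (`BigramLM`) stands in for the "preset language probability model". The function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy dictionary: target-side pronoun -> candidate source pronouns.
PRONOUNS = {"i": ["我"], "you": ["你", "你们"], "it": ["它"]}

def unaligned_pronouns(target, alignment):
    """Indices of target pronouns that no source word is aligned to."""
    aligned = {t for _, t in alignment}
    return [i for i, w in enumerate(target)
            if w.lower() in PRONOUNS and i not in aligned]

def candidate_positions(t_idx, alignment, source_len):
    """Assumed heuristic: the gap between the source words aligned to the
    nearest target words on the left and on the right of the pronoun."""
    left = [s for s, t in alignment if t < t_idx]
    right = [s for s, t in alignment if t > t_idx]
    lo = max(left) + 1 if left else 0
    hi = min(right) if right else source_len
    lo, hi = min(lo, hi), max(lo, hi)
    return list(range(lo, hi + 1))

class BigramLM:
    """Add-one-smoothed bigram model, a stand-in for the preset model."""
    def __init__(self, corpus):
        self.contexts, self.bigrams = Counter(), Counter()
        for sent in corpus:
            toks = ["<s>"] + sent + ["</s>"]
            self.contexts.update(toks[:-1])
            self.bigrams.update(zip(toks, toks[1:]))
        self.vocab = len({w for s in corpus for w in s}) + 1  # +1 for </s>

    def logprob(self, sent):
        toks = ["<s>"] + sent + ["</s>"]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.contexts[a] + self.vocab))
            for a, b in zip(toks, toks[1:]))

def fill_pronoun(source, target, alignment, t_idx, lm):
    """Build the second candidate set and return its most probable member."""
    words = PRONOUNS[target[t_idx].lower()]                          # first candidate set
    positions = candidate_positions(t_idx, alignment, len(source))   # candidate position set
    candidates = [source[:p] + [w] + source[p:]
                  for w in words for p in positions]                 # second candidate set
    return max(candidates, key=lm.logprob)

# Toy example: the source sentence omits the pronoun that "you" translates.
source = ["喜欢", "这个", "吗"]
target = ["do", "you", "like", "this", "?"]
alignment = [(0, 2), (1, 3), (2, 4)]  # (source index, target index) pairs
lm = BigramLM([["你", "喜欢", "这个", "吗"], ["我", "喜欢", "它"], ["你", "喜欢", "它", "吗"]])

missing = unaligned_pronouns(target, alignment)
best = fill_pronoun(source, target, alignment, missing[0], lm)
print(" ".join(best))
```

The sketch fills only one missing pronoun per call; handling several omissions in one sentence would require updating the alignment indices after each insertion, which is left out here.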

Description

Technical field

[0001] The present invention relates to the field of computers, in particular to a method and device for generating an annotation library.

Background

[0002] In recent years, the field of Statistical Machine Translation (SMT) has made tremendous progress. Pronouns play a very important role in SMT. In languages such as Japanese and Chinese, pronoun omission is extremely common, whereas in languages such as English, pronouns are indispensable sentence components. As a result, most omitted pronouns are difficult to translate correctly when translating from a language that readily omits pronouns into one that does not. Automatically generating pronouns can therefore help statistical machine translation recover the missing pronouns in this translation direction, so that the translation is mo...


Application Information

IPC(8): G06F17/28
CPC: G06F40/58
Inventor: 涂兆鹏, 李航, 刘群
Owner: HUAWEI TECH CO LTD