Method and system for determining intertranslation relationship of bilingual sentence pairs

A technology for sentence pairs and bilingual corpora, applied in natural language translation, special data processing applications, instruments, etc. It can solve the problems of thresholds that do not cover the distribution of corpora, reduced recall rate, and reduced accuracy rate.

Active Publication Date: 2017-04-26
BEIJING KINGSOFT OFFICE SOFTWARE INC +1
12 Cites, 19 Cited by

AI Technical Summary

Problems solved by technology

The feature threshold is set empirically, often by someone who has examined only a few corpus resources, so it cannot cover the distribution of most corpora.
Moreover, when the empirically set feature threshold is too low, the accuracy rate drops, and when it is too high, the recall rate drops.
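The trade-off described above can be illustrated with a minimal Python sketch. It reads the "accuracy rate" as precision, which is an assumption, and the scores and labels are invented purely for illustration; they do not come from the patent.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every pair scoring >= threshold is accepted."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical matching-feature scores for seven sentence pairs, and whether
# each pair really is a mutual translation (made-up data).
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20]
labels = [True, True, False, True, False, True, False]

for threshold in (0.25, 0.5, 0.9):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy data a low threshold accepts every true pair but also admits wrong ones, while a high threshold accepts only correct pairs but misses most of the true ones.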


Embodiment Construction

[0095] The invention provides a method for determining the inter-translation relationship of bilingual sentence pairs, which is used to improve the universality, accuracy rate and recall rate of a corpus.

[0096] Refer to Figure 1 and Figure 2. Figure 1 is a flow chart of the first embodiment of the method for determining the inter-translation relationship between bilingual sentence pairs in the present invention, and Figure 2 is a flow chart of building the classification model in Figure 1.

[0097] The method for determining the inter-translation relationship between bilingual sentence pairs described in the first embodiment of the present invention, as shown in Figure 1, includes the following steps:

[0098] S100. Obtain target bilingual sentence pairs in the bilingual corpus.

[0099] Wherein, the target bilingual sentence pair may be a sentence pair composed of a first sentence in the first language and a second sentence in the second language, and there is a...

Abstract

The invention discloses a method and system for determining an intertranslation relationship of bilingual sentence pairs. The method comprises determining matching feature values of the bilingual sentence pairs; filtering and classifying the bilingual sentence pairs with a pre-established, trained classification model, according to the weights of the matching feature values in the intertranslation relationship; and determining whether the bilingual sentence pairs satisfy the requirements of the intertranslation relationship. With the method provided by the embodiment of the invention, a bilingual corpus with a huge data size can therefore be processed quickly and conveniently. Using the classification idea of the trained model, the problem of determining the intertranslation relationship of bilingual sentence pairs is converted into a binary classification problem, so that the weights of the matching features of the bilingual corpus can be determined more scientifically and reasonably. Compared with the existing empirical method, the universality is better, and the accuracy and the recall rate are improved accordingly.
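The binary-classification idea in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the excerpt does not name the classifier or the matching features, so logistic regression, the two features (a length-ratio score and a dictionary-coverage score), and all training values here are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, epochs=2000, lr=0.5):
    """Fit feature weights and a bias by batch gradient descent on log loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # derivative of the log loss w.r.t. the linear score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def is_translation_pair(x, weights, bias):
    """Classify a pair as a mutual translation if its probability is >= 0.5."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) >= 0.5


# Hypothetical training data: [length_ratio_score, dictionary_coverage]
# per sentence pair, with label 1 for a translation pair and 0 otherwise.
train_x = [[0.9, 0.8], [0.8, 0.9], [0.95, 0.7], [0.85, 0.75],
           [0.3, 0.2], [0.5, 0.1], [0.2, 0.3], [0.4, 0.25]]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logistic(train_x, train_y)
print("learned weights:", weights, "bias:", bias)
print(is_translation_pair([0.9, 0.85], weights, bias))
print(is_translation_pair([0.25, 0.15], weights, bias))
```

The point of the sketch is that the feature weights are learned from labeled pairs rather than fixed by hand, which is the contrast with empirical thresholds that the abstract draws.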

Description

Technical field

[0001] The invention relates to determining the mutual translation relationship of bilingual sentence pairs, and in particular to a method and system for determining the mutual translation relationship of bilingual sentence pairs.

Background

[0002] The great value of corpus resources for natural language processing research has been increasingly recognized. This is especially true of the parallel bilingual corpus, a special corpus that contains information about mutual translation between two languages. Parallel bilingual corpora can provide rich matching information between two languages, and have important application value in the acquisition of translation knowledge, the establishment of bilingual dictionaries, statistical or example-based machine translation, word sense disambiguation, etc.; the role of high-quality corpora is especially prominent.

[0003] There are two main ways to build a corpus, one is the traditional method of manual...

Application Information

IPC(8): G06F17/28
CPC: G06F40/42, G06F40/58
Inventor: 武英波
Owner: BEIJING KINGSOFT OFFICE SOFTWARE INC