Method and device for calculating sentence similarity and method and device for machine translation

A sentence similarity technique, applied in the computer field

Active Publication Date: 2013-04-10
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

Although it has been proposed to account for the similarity between different words in the similarity calculation by using a synonym dictionary, in many applications, such as the machine translation application mentioned above, the collocation relationship between...


Examples

Embodiment 1

[0095] Figure 1 is a flow chart of the method for calculating sentence similarity provided by Embodiment 1 of the present invention. As shown in Figure 1, the method may include the following steps:

[0096] Step 101: Compare the sentence E1 and the sentence E2, and determine the difference word pairs.

[0097] The embodiment of the present invention builds on basic text processing of the sentences, such as word segmentation and alignment. Since this processing is prior art, it is not described further here.

[0098] The words in sentence E1 and sentence E2 are compared, and the words that differ form difference word pairs, for example:

[0099] Sentence E1 is: Can I take a picture of the painting?

[0100] Sentence E2 is: Can we take a photo of the painting?

[0101] The difference word pairs are then determined to be: the pair formed by I and we, and the pair formed by picture and photo.
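
Step 101 can be illustrated with a minimal sketch. It assumes whitespace-style tokenization and uses Python's standard difflib.SequenceMatcher as a stand-in for the word alignment that the text treats as prior art; the function names tokenize and difference_word_pairs are hypothetical and do not come from the patent.

```python
import re
from difflib import SequenceMatcher


def tokenize(sentence):
    # Naive tokenizer (an assumption): keep runs of word characters only.
    return re.findall(r"\w+", sentence)


def difference_word_pairs(e1, e2):
    """Return (word_from_e1, word_from_e2) pairs where the sentences differ."""
    a, b = tokenize(e1), tokenize(e2)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only substitutions of equal length give one-to-one word pairs;
        # insertions and deletions are ignored in this sketch.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            pairs.extend(zip(a[i1:i2], b[j1:j2]))
    return pairs


E1 = "Can I take a picture of the painting?"
E2 = "Can we take a photo of the painting?"
print(difference_word_pairs(E1, E2))
```

For the two example sentences this yields the pairs (I, we) and (picture, photo), matching paragraph [0101].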

[0102] Step 102: Use the collocation pr...

Embodiment 2

[0117] Figure 2 is a flow chart of the method for calculating sentence similarity provided by Embodiment 2 of the present invention. As shown in Figure 2, the method may include the following steps:

[0118] Step 201 is the same as step 101 in the first embodiment.

[0119] Step 202 is the same as step 102 in the first embodiment.

[0120] Step 203: Determine the feature vectors of the two different words in the difference word pair, and use the feature vectors of the two different words to calculate the similarity distance between the two different words.

[0121] In the second embodiment, the degree of similarity of the difference words in a specific corpus can be further considered, and the degree of similarity is reflected by the distance between the feature vectors of the two difference words in the difference word pair.

[0122] The feature vector of the difference word can be composed of words that have a higher collocation probability with the difference word....
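
The idea in step 203 can be sketched as follows. The collocation table holds made-up toy values, the top_k cut-off is a hypothetical parameter, and cosine distance is only one possible choice, since the visible text does not say which distance measure is used.

```python
import math

# Hypothetical toy collocation probabilities (not taken from the patent).
COLLOCATION = {
    "picture": {"take": 0.30, "painting": 0.10, "frame": 0.20},
    "photo": {"take": 0.35, "painting": 0.05, "album": 0.15},
}


def feature_vector(word, table, top_k=2):
    """Keep the top_k collocates with the highest collocation probability."""
    collocates = table.get(word, {})
    top = sorted(collocates.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return dict(top)


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


u = feature_vector("picture", COLLOCATION)
v = feature_vector("photo", COLLOCATION)
print(cosine_distance(u, v))
```

A smaller distance means the two difference words tend to collocate with the same words in the corpus, which is the notion of similarity paragraph [0121] describes.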

Embodiment 3

[0140] Figure 3 is a flow chart of the machine translation method provided by Embodiment 3 of the present invention. As shown in Figure 3, the method may include the following steps:

[0141] Step 301: Calculate the similarity between the sentence to be translated and the sentence in the preset example sentence database.

[0142] In this step, the method described in Embodiment 1 or Embodiment 2 can be used to calculate the similarity between the sentence to be translated and the sentence in the example sentence database, so as to prepare for further selection of similar example sentences.

[0143] Because the number of example sentences in the example sentence database is very large, using the method of Embodiment 1 or Embodiment 2 to calculate the similarity between the sentence to be translated and every sentence in the database one by one would be inefficient. In order to improve efficiency, you can first calculate the edit distance...
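
The paragraph above is truncated, so how the edit distance is actually used is not visible here. As one plausible reading, the sketch below computes a word-level Levenshtein distance and keeps only the example sentences within a threshold, so that the costlier similarity calculation runs on a short list. The function shortlist and the parameter max_distance are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def shortlist(query, examples, max_distance=2):
    """Keep example sentences within max_distance word edits of the query."""
    query_words = query.split()
    return [s for s in examples
            if edit_distance(query_words, s.split()) <= max_distance]


examples = ["Can we take a photo", "Where is the station"]
print(shortlist("Can I take a picture", examples))
```

Only the sentences that survive this cheap filter would then be scored with the method of Embodiment 1 or Embodiment 2.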

Abstract

The invention provides a method and a device for calculating sentence similarity and a method and a device for machine translation. The method for calculating sentence similarity comprises the following steps: a first sentence and a second sentence are compared to determine difference word pairs; each difference word in a difference word pair is scored using its collocation probability with the other words in the first or second sentence that contains it, where the collocation probability of two words is obtained by querying a collocation probability model, and the model derives that probability by counting the co-occurrence frequency of the two words in a preset corpus; the scores of the difference words in each pair are used to score the difference word pair; and the scores of the difference word pairs are used to determine the similarity of the first sentence and the second sentence. The method and the device reflect the degree of matching between two sentences more accurately, thereby improving the quality of applications such as machine translation.
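
A minimal sketch of the collocation (matching) probability model and the per-word scoring described in the abstract. Sentence-level co-occurrence counting, normalization by the total pair count, and averaging over the other words in the sentence are all assumptions made for illustration; the abstract does not give the exact formulas, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_collocation_model(corpus):
    """Estimate pair probabilities from sentence-level co-occurrence counts."""
    pair_counts = Counter()
    for sentence in corpus:
        words = sorted(set(sentence.lower().split()))
        # Each unordered pair of distinct words in a sentence counts once.
        pair_counts.update(combinations(words, 2))
    total = sum(pair_counts.values())
    return {pair: n / total for pair, n in pair_counts.items()} if total else {}


def collocation_probability(model, w1, w2):
    """Look up the probability of an unordered word pair (0.0 if unseen)."""
    return model.get(tuple(sorted((w1.lower(), w2.lower()))), 0.0)


def score_word(model, word, sentence_words):
    """Average collocation probability of word with the other words."""
    others = [w for w in sentence_words if w.lower() != word.lower()]
    if not others:
        return 0.0
    return sum(collocation_probability(model, word, w) for w in others) / len(others)


corpus = ["take a photo", "take a picture", "take a photo now"]
model = build_collocation_model(corpus)
print(score_word(model, "photo", ["take", "a", "photo"]))
print(score_word(model, "picture", ["take", "a", "picture"]))
```

In this toy corpus "photo" co-occurs with "take" more often than "picture" does, so it receives the higher score. Combining word scores into pair scores and a sentence similarity is left out, because the visible text does not specify how that is done.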

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a method and device for calculating sentence similarity and a method and device for machine translation.

Background art

[0002] Sentence similarity calculation has important application value in fields such as question retrieval, bilingual example sentence retrieval, machine translation, and document summarization. Which sentence similarity calculation method accurately reflects the similarity between two sentences is the key factor affecting the quality of these applications.

[0003] Taking machine translation as an example, preprocessed bilingual example sentences are usually used as the main translation resource, and the final translation is generated by editing similar example sentences that match the sentence to be translated. Specifically, the following steps are included: [00...


Application Information

IPC(8): G06F17/28
Inventors: 刘占一, 吴华, 王海峰
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD