Method and device for calculating text similarity and realizing search processing through computer

Text similarity and computer technology, applied in computing, digital data processing, and special data processing applications, can solve problems such as matching errors between search sentences and web page titles, which degrade the display and sorting of search results and the user experience. The effects are more accurate similarity values and a solution to the dependency problem.

Active Publication Date: 2015-03-25
BEIJING BAIDU NETCOM SCI & TECH CO LTD
Cites: 8 | Cited by: 17

AI Technical Summary

Problems solved by technology

However, this method cannot solve the problem of long-distance dependency in the target language. It can perform only simple semantic matching, so the semantics of the search statement are not truly reflected and the search statement is incorrectly matched to web page titles. This affects the display and sorting of search results, which in turn affects the user experience.
For example, the sentence "Why didn't Guan Yu kill Cao Cao back then" is matched to "Why didn't Cao Cao kill Guan Yu back then". In the original sentence (query), "Guan Yu" is the subject and "Cao Cao" is the object. Because the long-distance dependency problem is not solved, the search statement and the web page title match only at the word level, and the dependency relationships of the actual sentence are not reflected.
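
The mismatch in this example can be reproduced with a small Python sketch. It is only an illustration under stated assumptions: the tokenization and the dependency triples are written by hand (no parser is run), and `bow_cosine` and `triple_overlap` are hypothetical helper names, not functions from the patent.

```python
from collections import Counter
from math import sqrt

def bow_cosine(a_tokens, b_tokens):
    """Cosine similarity between bag-of-words count vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def triple_overlap(a_triples, b_triples):
    """Jaccard overlap between sets of (head, relation, dependent) triples."""
    a, b = set(a_triples), set(b_triples)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hand-tokenized query and title (assumed segmentation).
query_tokens = ["why", "didn't", "Guan Yu", "kill", "Cao Cao", "back then"]
title_tokens = ["why", "didn't", "Cao Cao", "kill", "Guan Yu", "back then"]

# Hand-written dependency triples (assumed, not parser output).
query_deps = [("kill", "subj", "Guan Yu"), ("kill", "obj", "Cao Cao")]
title_deps = [("kill", "subj", "Cao Cao"), ("kill", "obj", "Guan Yu")]

word_score = bow_cosine(query_tokens, title_tokens)   # same words, so maximal
dep_score = triple_overlap(query_deps, title_deps)    # no shared triples
print(word_score, dep_score)
```

Word-level similarity treats the two sentences as identical because they contain the same words, while the dependency triples share nothing because subject and object are swapped.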

Embodiment Construction

[0016] The basic idea of the present invention is to realize semantic structure matching in information processing by introducing a dependency structure model of the target language into the translation model. In text matching, the translation model is combined with the dependency structure model during decoding to generate the Top K translation text strings. Semantic structure matching is then performed between these translation text strings and the other text string to be compared or matched, which strengthens the semantic structure information. The semantic similarity calculation is then used to push web page titles that match the search statement to the user.
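
As a rough illustration of combining the two models to pick the Top K translations, the Python sketch below reranks a fixed candidate list by a weighted sum of log-scores. Everything in it is an assumption made for illustration: the candidate strings, the score values, the `dep_weight` parameter, and the function name `top_k_translations`. A real decoder would apply the models while building hypotheses rather than rerank a finished list.

```python
import heapq

def top_k_translations(candidates, k, dep_weight=0.5):
    """Return the k candidate texts with the highest combined log-score.

    candidates: list of (text, translation_logprob, dependency_logprob).
    The combined score is translation_logprob + dep_weight * dependency_logprob.
    """
    scored = [(tm + dep_weight * dep, text) for text, tm, dep in candidates]
    return [text for _, text in heapq.nlargest(k, scored)]

# Toy candidates with made-up scores (higher, i.e. closer to 0, is better).
candidates = [
    ("why didn't Guan Yu kill Cao Cao", -1.0, -0.5),  # combined: -1.25
    ("why didn't Cao Cao kill Guan Yu", -1.1, -4.0),  # combined: -3.10
    ("reason Guan Yu spared Cao Cao", -1.8, -0.7),    # combined: -2.15
    ("Guan Yu Cao Cao kill why", -1.5, -6.0),         # combined: -4.50
]

top_2 = top_k_translations(candidates, k=2)
print(top_2)
```

With these toy numbers, the candidate that swaps subject and object has a good translation score but a poor dependency score, so it falls out of the Top 2.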

[0017] When translating search words into Top K titles, traditional phrase translation models use an N-gram language model to examine whether the translated titles conform to the language rules of the target language. In the present invention, in order to further examine th...
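
For readers unfamiliar with N-gram language models, the following Python sketch scores a candidate title with a bigram model and add-one smoothing. It is a generic textbook construction, not the model described in the patent; the two-sentence corpus and the names `train_bigram` and `bigram_logprob` are assumptions for illustration.

```python
from collections import Counter
from math import log

def train_bigram(corpus):
    """Count context unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return unigrams, bigrams, len(vocab)

def bigram_logprob(sentence, model):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )

# Tiny toy corpus (assumed).
model = train_bigram(["the title matches the query", "the query matches the title"])

fluent = bigram_logprob("the title matches the query", model)
scrambled = bigram_logprob("query the the matches title", model)
print(fluent > scrambled)
```

A bigram model conditions each word only on the word immediately before it, so it can check local fluency but cannot see relations between words that are far apart in the sentence.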

Abstract

The invention provides a computer-implemented method and device for calculating text similarity and performing search processing. The method comprises the following steps: acquiring a first text string and a second text string; decoding the first text string according to a preset phrase translation model and a dependency structure model to obtain K translation text strings; calculating a first semantic similarity value between each of the K translation text strings and the second text string; and calculating a second semantic similarity value between the first text string and the second text string according to the K calculated first semantic similarity values. The method and device address the problem of long-distance dependency relationships in sentences, so that the semantics of search sentences are expressed better, search sentences are matched better with web page titles, and the user obtains semantically matching search result items, which improves the user's search experience.
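
The last two steps of the abstract can be sketched in Python as follows. This is a minimal sketch under explicit assumptions: the abstract does not say how each first similarity value is computed or how the K values are combined, so token-level Jaccard similarity and `max` are used here purely as placeholders, and the names `jaccard` and `second_similarity` are hypothetical.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity (placeholder for the first similarity)."""
    a_set, b_set = set(a.split()), set(b.split())
    union = a_set | b_set
    return len(a_set & b_set) / len(union) if union else 0.0

def second_similarity(translations, second_text, first_sim=jaccard, combine=max):
    """Compute K first similarity values, then combine them into one value.

    translations: the K translation text strings of the first text string.
    combine: assumed aggregation; the source does not specify one.
    """
    first_values = [first_sim(t, second_text) for t in translations]
    return combine(first_values), first_values

# Assumed toy inputs: K = 3 translations of a query, and one web page title.
translations = ["useful search tips", "helpful search advice", "search tips"]
title = "useful search tips for beginners"

score, values = second_similarity(translations, title)
print(score, values)
```

Passing a different `combine` callable, for example an average, changes only the aggregation step and leaves the per-translation comparison untouched.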

Description

Technical field

[0001] The invention relates to natural language processing technology, in particular to a computer-implemented method and device for calculating text similarity and performing search processing.

Background technique

[0002] In search engines, in order to match the search term (or Query) input by the user to each field of a document (for example, title and content) as well as possible, a method based on complete word matching is usually used.

[0003] At present, there are also methods that use translation models. From the perspective of translation, titles and search terms (for example, Query) are assumed to be written in different sub-languages, and semantic matching is achieved through phrase translation, for example translating "effective" into "useful". However, this method cannot solve the problem of long-distance dependency in the target language. It can only perform semantic matching simply, so that the semantics of the search statement cannot...
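
The difference between complete word matching and phrase-translation matching in the background can be illustrated with a short Python sketch. The one-entry phrase table and the function names `exact_match` and `translated_match` are assumptions made for illustration, not part of the patent text.

```python
def exact_match(query, title):
    """Complete word matching: every query word must appear in the title."""
    title_words = set(title.split())
    return all(word in title_words for word in query.split())

# Hypothetical phrase table mapping a query word to title-side equivalents.
PHRASE_TABLE = {"effective": ["useful"]}

def translated_match(query, title, table=PHRASE_TABLE):
    """Match each query word directly or through one of its translations."""
    title_words = set(title.split())
    return all(
        word in title_words or any(alt in title_words for alt in table.get(word, []))
        for word in query.split()
    )

query = "effective study methods"
title = "useful study methods"

print(exact_match(query, title))       # word "effective" is missing from the title
print(translated_match(query, title))  # "effective" is matched through "useful"
```

Both functions ignore word order and sentence structure, which is the limitation the background goes on to describe.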

Application Information

IPC(8): G06F17/27, G06F17/30
Inventor: 张军, 吴先超, 刘占一
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD