
Determination method and determination device for semantic redundancy and corresponding search method and device

A technique for determining semantic redundancy, applied in the field of natural language processing. It addresses the problem that semantically matching pages may fail to rank highly or to be recalled at all, and it improves the recall rate and the search effect.

Active Publication Date: 2013-11-13
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

User queries contain a large proportion of semantically redundant expressions. For example, "Where is Beijing Zhongguancun" means the same as "Where is Zhongguancun"; "Apple iphone4s" means the same as "iphone4s"; "a new film directed by Zhang Yimou" means the same as "Zhang Yimou's new film"; and "what should I do if I hate my ex-husband after divorce" means the same as "what should I do if I hate my ex-husband". Because the keyword matching method requires every keyword to match before a page can rank highly, a web page that matches the query semantically but does not contain the semantically redundant keywords may fail to obtain a top ranking, or may not be recalled at all.



Examples


Embodiment 1

[0055] Figure 1 is a flow chart of the method for determining semantic redundancy provided by Embodiment 1 of the present invention. As shown in Figure 1, the method may include:

[0056] Step 101: Determine word A in semantic redundancy mining.

[0057] Because the central word of a semantically redundant expression is usually a noun, this step determines word A mainly from nouns: statistics are gathered over a large-scale corpus, and nouns whose frequency of occurrence is greater than a preset first frequency threshold are used as word A. The first frequency threshold can be set according to actual needs; for example, a noun that occurs more than 10 times in the corpus is used as word A.
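The frequency filter in Step 101 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the corpus has already been segmented and part-of-speech tagged, and the function name `select_candidate_words`, the `(word, pos)` tuple layout, and the tag `"n"` for nouns are all hypothetical choices made for the example.

```python
from collections import Counter


def select_candidate_words(tagged_corpus, first_freq_threshold=10):
    """Return the nouns whose corpus frequency exceeds the threshold.

    tagged_corpus: iterable of sentences, each a list of (word, pos) tuples.
    The tag "n" for nouns is an assumption of this sketch.
    """
    noun_counts = Counter(
        word
        for sentence in tagged_corpus
        for word, pos in sentence
        if pos == "n"
    )
    # Strictly greater than the threshold, as the text describes.
    return {word for word, count in noun_counts.items()
            if count > first_freq_threshold}
```

With the example threshold of 10 from the text, a noun seen exactly 10 times is excluded and a noun seen 11 times is kept.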

[0058] Step 102: Determine the collocation word B of the word A.

[0059] The collocation word B determined in this step is used for subsequent mining of redundant words. In view of forming semantic redundancy with word A...

Embodiment 2

[0080] Figure 2 is a flow chart of the search method provided by Embodiment 2 of the present invention. As shown in Figure 2, the search method includes:

[0081] Step 201: Perform word segmentation processing on the query input by the user.

[0082] Step 202: Determine the collocation word pairs formed pairwise from the words obtained by the word segmentation process.

[0083] Collocation word pairs can be determined in this step in a manner similar to step 102 of Embodiment 1: among the words obtained by the word segmentation process, two words that co-occur within a preset window range and whose co-occurrence matches a preset first template form a collocation word pair. The first template may include, but is not limited to: adjective+noun, noun+noun, noun+verb, verb+noun, and so on.
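The window-and-template check described in this step might look like the sketch below. It assumes the query has already been segmented and tagged; the tag names (`"a"`, `"n"`, `"v"`), the default window size of 2, and the names `FIRST_TEMPLATE` and `collocation_pairs` are illustrative assumptions, not values given in the patent.

```python
# Part-of-speech patterns taken from the listed examples of the first template:
# adjective+noun, noun+noun, noun+verb, verb+noun. Tag names are assumed.
FIRST_TEMPLATE = {("a", "n"), ("n", "n"), ("n", "v"), ("v", "n")}


def collocation_pairs(tagged_words, window=2, templates=FIRST_TEMPLATE):
    """Return ordered word pairs that co-occur within `window` positions
    and whose part-of-speech pattern is in `templates`.

    tagged_words: list of (word, pos) tuples for one segmented query.
    """
    pairs = []
    for i, (w1, p1) in enumerate(tagged_words):
        # Look only at the next `window` words to the right of position i.
        for j in range(i + 1, min(i + 1 + window, len(tagged_words))):
            w2, p2 = tagged_words[j]
            if (p1, p2) in templates:
                pairs.append((w1, w2))
    return pairs
```

For a query tagged as `[("Beijing", "n"), ("Zhongguancun", "n"), ("where", "r")]`, only the noun+noun pair `("Beijing", "Zhongguancun")` survives, because the pronoun tag `"r"` matches no template.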

[0084] Step 203: Use the determined collocation word pair to search the semantic redundancy pair database, and if a semantic ...

Embodiment 3

[0093] Figure 3 is a structural diagram of the device for determining semantic redundancy provided by Embodiment 3 of the present invention. As shown in Figure 3, the apparatus may include: a collocation word pair determination unit 300, a context vector determination unit 310, and a redundant pair determination unit 320.

[0094] The collocation word pair determination unit 300 determines the word A and its collocation word B.

[0095] The collocation word pair determination unit 300 may specifically include a candidate word determination subunit 301, configured to determine a noun in the corpus whose frequency of occurrence is greater than a preset first frequency threshold as word A.

[0096] Because the central word of a semantically redundant expression is usually a noun, the candidate word determination subunit 301 determines word A mainly from nouns, performing statistics over a large-scale corpus, and nouns with a frequen...


Abstract

The invention provides a method and a device for determining semantic redundancy, and a corresponding search method and device. The method for determining semantic redundancy comprises: S1, determining a word A and its collocation word B; S2, computing from a corpus the context vector of the collocation word pair formed by word A and word B, and the context vector of word A; S3, computing the similarity between the context vector of the collocation word pair and the context vector of word A and, if the similarity is greater than a preset similarity threshold, determining that the collocation word pair and word A form a semantic redundancy pair, in which word B is the redundant word. The invention effectively identifies semantic redundancy in a query and provides a basis for eliminating it. When the search is carried out with the redundancy-eliminated query, redundant keywords need not participate in matching, which improves the recall rate of search results and the search effect.
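Steps S2 and S3 can be illustrated with a short sketch. The abstract does not say how context vectors are built or which similarity measure is used, so everything specific here is an assumption: contexts are bag-of-words counts in a fixed window, the pair is treated as word B immediately followed by word A, similarity is cosine similarity, and the default threshold of 0.8 is arbitrary. The function names are hypothetical.

```python
import math
from collections import Counter


def context_vector(sentences, target, window=2):
    """Count words within `window` tokens of each occurrence of `target`.

    sentences: list of token lists; target: tuple of consecutive tokens.
    """
    n = len(target)
    vec = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == target:
                vec.update(tokens[max(0, i - window):i])   # left context
                vec.update(tokens[i + n:i + n + window])   # right context
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def is_redundant_pair(sentences, word_a, word_b, threshold=0.8):
    """S3: the pair (B, A) and A alone form a redundancy pair if their
    context vectors are more similar than the threshold."""
    pair_vec = context_vector(sentences, (word_b, word_a))
    a_vec = context_vector(sentences, (word_a,))
    return cosine(pair_vec, a_vec) > threshold
```

The intuition is that if "Apple iphone4s" appears in nearly the same contexts as "iphone4s" alone, then "Apple" adds no meaning and can be dropped from the query before matching.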

Description

【Technical field】

[0001] The present invention relates to natural language processing technology, in particular to a method and device for determining semantic redundancy, and a corresponding search method and device.

【Background art】

[0002] With the continuous development of search engine technology, the traditional strategy based on keyword matching is increasingly inadequate for semantic matching in modern search engines. User queries contain a large proportion of semantically redundant expressions. For example, "Where is Beijing Zhongguancun" means the same as "Where is Zhongguancun"; "Apple iphone4s" means the same as "iphone4s"; "a new film directed by Zhang Yimou" means the same as "Zhang Yimou's new film"; and "what should I do if I hate my ex-husband after divorce" means the same as "what should I do if I hate my ex-husband". Because the keyword matching method requires each keyword to be able to match t...


Application Information

IPC(8): G06F17/30; G06F17/27
Inventor 方高林
Owner BEIJING BAIDU NETCOM SCI & TECH CO LTD