A method and device for determining semantic redundancy, and a corresponding search method and device

A technique for determining semantic redundancy, applied in the field of natural language processing. It addresses the problem that pages matching a query semantically may fail to rank highly, or fail to be recalled at all, and it increases the recall rate and improves search results.

Active Publication Date: 2018-03-02
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

A large proportion of user search queries contain semantically redundant expressions. For example, "Where is Beijing Zhongguancun" has the same meaning as "Where is Zhongguancun"; "Apple iphone4s" and "iphone4s" express the same meaning; "a new film directed by Zhang Yimou" and "Zhang Yimou's new film" express the same meaning; and "what should I do if I hate my ex-husband after divorce" means the same as "what should I do if I hate my ex-husband". Because the keyword matching method requires every keyword to match for a page to rank highly, a web page that matches the query semantically but does not contain the redundant keywords may fail to rank highly, or may not be recalled at all.



Examples


Embodiment 1

[0055] Figure 1 is a flow chart of the method for determining semantic redundancy provided by Embodiment 1 of the present invention. As shown in Figure 1, the method may include:

[0056] Step 101: Determine word A in semantic redundancy mining.

[0057] In semantically redundant expressions the central word is usually a noun, so this step selects word A mainly from nouns. Statistics are gathered over a large-scale corpus, and nouns whose frequency of occurrence is greater than a preset first frequency threshold are used as word A. The first frequency threshold can be set according to actual needs; for example, a noun that occurs more than 10 times in the corpus is used as word A.
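Step 101 can be sketched as a simple frequency filter. This is a minimal illustration, assuming the corpus has already been segmented and part-of-speech tagged into (word, tag) pairs. The function name `select_candidate_words`, the noun tag `"n"`, and the toy corpus are hypothetical and do not come from the patent.

```python
from collections import Counter

def select_candidate_words(tagged_corpus, first_freq_threshold=10, noun_tag="n"):
    """Return the nouns whose corpus frequency exceeds the first frequency threshold.

    tagged_corpus: iterable of sentences, each a list of (word, pos_tag) pairs.
    noun_tag: assumed tag used for nouns by the (hypothetical) tagger.
    """
    noun_counts = Counter(
        word
        for sentence in tagged_corpus
        for word, tag in sentence
        if tag == noun_tag
    )
    # Strictly greater than the threshold, as in "frequency greater than 10".
    return {word for word, freq in noun_counts.items() if freq > first_freq_threshold}

# Toy corpus (illustrative only): "Zhongguancun" occurs 11 times, "Beijing" 3 times.
toy_corpus = [[("Zhongguancun", "n"), ("where", "r")]] * 11 + [[("Beijing", "n")]] * 3
candidates = select_candidate_words(toy_corpus, first_freq_threshold=10)
```

With the threshold of 10 used as the example in the text, only the noun that occurs 11 times is kept as a candidate word A.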

[0058] Step 102: Determine the collocation word B of the word A.

[0059] The collocation word B determined in this step is used for subsequent mining of redundant words. In view of forming semantic redundancy with word A...

Embodiment 2

[0080] Figure 2 is a flow chart of the search method provided by Embodiment 2 of the present invention. As shown in Figure 2, the search method includes:

[0081] Step 201: Perform word segmentation processing on the query input by the user.

[0082] Step 202: Determine the collocation word pairs formed by pairing the words obtained from the word segmentation.

[0083] Collocation word pairs can be determined in this step in a manner similar to step 102 of Embodiment 1: among the words obtained from the word segmentation, two words that co-occur within a preset window range and whose co-occurrence matches a preset first template form a collocation word pair. The first template may include, but is not limited to: adjective+noun, noun+noun, noun+verb, verb+noun, and so on.
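The pairing rule in step 202 can be sketched as follows. This is a rough illustration under stated assumptions: the query is already segmented and tagged, the "window" is taken to mean the distance in tokens between the two words, and the tag names (`"a"`, `"n"`, `"v"`) and the function `collocation_pairs` are hypothetical rather than taken from the patent.

```python
from itertools import combinations

# Assumed encoding of the first template as (tag of first word, tag of second word):
# adjective+noun, noun+noun, noun+verb, verb+noun.
FIRST_TEMPLATE = {("a", "n"), ("n", "n"), ("n", "v"), ("v", "n")}

def collocation_pairs(tagged_words, window=3, template=FIRST_TEMPLATE):
    """Return word pairs that co-occur within `window` tokens and match the template."""
    pairs = []
    # combinations() keeps the original order, so i < j for every candidate pair.
    for (i, (w1, t1)), (j, (w2, t2)) in combinations(enumerate(tagged_words), 2):
        if j - i <= window and (t1, t2) in template:
            pairs.append((w1, w2))
    return pairs

# Toy segmented query (illustrative tags only).
query = [("Beijing", "n"), ("Zhongguancun", "n"), ("where", "r")]
pairs = collocation_pairs(query)
```

Here the only pair that satisfies both the window and the noun+noun template is ("Beijing", "Zhongguancun"); the pairs involving "where" are rejected because their tag combination is not in the template.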

[0084] Step 203: Use the determined collocation word pair to search the semantic redundancy pair database, and if a semantic ...

Embodiment 3

[0093] Figure 3 is a structural diagram of the device for determining semantic redundancy provided by Embodiment 3 of the present invention. As shown in Figure 3, the device may include: a collocation word pair determination unit 300, a context vector determination unit 310, and a redundant pair determination unit 320.

[0094] The collocation word pair determination unit 300 determines the word A and its collocation word B.

[0095] The collocation word pair determination unit 300 may specifically include a candidate word determination subunit 301, configured to determine a noun in the corpus whose occurrence frequency is greater than a preset first frequency threshold as the word A.

[0096] In semantically redundant expressions the central word is usually a noun, so the candidate word determination subunit 301 determines the word A mainly from nouns, and at the same time performs statistics in a large-scale corpus, and nouns with a frequen...



Abstract

The present invention provides a method and device for determining semantic redundancy, and a corresponding search method and device. The method for determining semantic redundancy includes: S1, determining word A and its collocation word B; S2, computing from a corpus the context vector of the collocation word pair formed by word A and word B, and the context vector of word A; S3, calculating the similarity between the context vector of the collocation word pair formed by word A and word B and the context vector of word A, and, if the similarity is greater than a preset similarity threshold, determining that the collocation word pair formed by word A and word B forms a semantic redundant pair with word A, where word B is the redundant word. The present invention can effectively determine the semantic redundancy present in a query and provides a basis for removing redundancy from the query. When the de-redundant query is used for searching, redundant keywords need not participate in matching, which improves the recall rate of search results and the overall search effect.
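Steps S2 and S3 can be illustrated with a small sketch. The abstract does not say how context vectors are built or which similarity measure is used, so the choices here are assumptions: context vectors are counts of the words within a fixed window around the target, the collocation pair is treated as a contiguous token sequence, and similarity is cosine similarity. All function names, the threshold value, and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def context_vector(sentences, target_tokens, window=2):
    """Count the words within `window` tokens on either side of the target sequence."""
    vec = Counter()
    target = list(target_tokens)
    n = len(target)
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + n:i + n + window]
                vec.update(left + right)
    return vec

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (assumed similarity measure)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def is_redundant_pair(sentences, word_a, pair_tokens, threshold=0.7, window=2):
    """S3: the pair is redundant with word A if its context vector is similar enough."""
    pair_vec = context_vector(sentences, pair_tokens, window)
    a_vec = context_vector(sentences, [word_a], window)
    return cosine(pair_vec, a_vec) > threshold

# Toy segmented corpus (illustrative only).
corpus = [
    ["where", "is", "Beijing", "Zhongguancun"],
    ["where", "is", "Zhongguancun"],
    ["Zhongguancun", "subway", "station"],
    ["Beijing", "Zhongguancun", "subway", "station"],
]
similarity = cosine(
    context_vector(corpus, ["Beijing", "Zhongguancun"]),
    context_vector(corpus, ["Zhongguancun"]),
)
redundant = is_redundant_pair(corpus, "Zhongguancun", ["Beijing", "Zhongguancun"])
```

In this toy corpus "Beijing Zhongguancun" appears in much the same contexts as "Zhongguancun" alone, so the pair exceeds the assumed threshold and "Beijing" would be treated as the redundant word B. A real system would use a far larger corpus and a tuned threshold.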

Description

Technical field

[0001] The present invention relates to natural language processing technology, in particular to a method and device for determining semantic redundancy, and a corresponding search method and device.

Background

[0002] With the continuous development of search engine technology, the traditional strategy based on keyword matching is increasingly unable to meet the semantic matching needs of modern search engines. A large proportion of user search queries contain semantically redundant expressions. For example, "Where is Beijing Zhongguancun" has the same meaning as "Where is Zhongguancun"; "Apple iphone4s" and "iphone4s" express the same meaning; "a new film directed by Zhang Yimou" and "Zhang Yimou's new film" express the same meaning; and "what should I do if I hate my ex-husband after divorce" means the same as "what should I do if I hate my ex-husband". Because the keyword matching method requires each keyword to be able to match t...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06F17/27
Inventor: 方高林
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD