Determination method and determination device for semantic redundancy and corresponding search method and device

A technique for determining semantic redundancy, applied in the field of natural language processing. It addresses problems such as relevant results failing to be recalled or ranked, and improves the recall rate and overall search quality.

Active Publication Date: 2013-11-13
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

A large proportion of user search queries contain semantically redundant expressions. For example, "Where is Beijing Zhongguancun" has the same meaning as "Where is Zhongguancun"; "Apple iphone4s" and "iphone4s" express the same meaning; "a new film directed by Zhang Yimou" and "Zhang Yimou's new film" express the same meaning; "what should I do if I hate my ex-


Examples

Example Embodiment

[0054] Embodiment one

[0055] Figure 1 is a flowchart of the method for determining semantic redundancy provided by Embodiment 1 of the present invention. As shown in Figure 1, the method may include:

[0056] Step 101: Determine the word A in semantic redundancy mining.

[0057] Because nouns serve as the central word in most cases of semantic redundancy, this step mainly considers nouns when determining word A. Frequencies are counted over a large-scale corpus, and nouns whose frequency exceeds a preset first frequency threshold are taken as word A. The first frequency threshold can be set according to actual needs; for example, nouns that occur more than 10 times in the corpus are taken as word A.
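The candidate selection in step 101 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `select_candidate_nouns`, the `(word, pos)` input format, and the noun tag `"n"` are all assumptions made for the example.

```python
from collections import Counter


def select_candidate_nouns(tagged_corpus, first_freq_threshold=10):
    """Return the nouns whose corpus frequency exceeds the threshold.

    tagged_corpus: iterable of sentences, each a list of (word, pos) tuples.
    The tag "n" for nouns is an assumption of this sketch.
    """
    counts = Counter(
        word
        for sentence in tagged_corpus
        for word, pos in sentence
        if pos == "n"
    )
    # "Greater than" the threshold, as described in step 101.
    return {word for word, freq in counts.items() if freq > first_freq_threshold}


# Hypothetical toy corpus: "iphone4s" occurs 11 times, "rarity" once.
corpus = [[("iphone4s", "n"), ("buy", "v")]] * 11 + [[("rarity", "n")]]
print(select_candidate_nouns(corpus))  # {'iphone4s'}
```

The comparison is strict, so a noun that occurs exactly as often as the threshold is not selected.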

[0058] Step 102: Determine the collocation word B of word A.

[0059] The collocation word B determined in this step is used for subsequent mining of redundant words. In view of the semantic redundancy with word A, it usually me...

[0079] Embodiment two

[0080] Figure 2 shows the search method provided by Embodiment 2 of the present invention. As shown in Figure 2, the search method includes:

[0081] Step 201: Perform word segmentation processing on the query input by the user.

[0082] Step 202: Determine the collocation word pair formed by each word after word segmentation processing.

[0083] The collocation word pairs in this step can be determined in a manner similar to step 102 of Embodiment 1: any two words obtained from word segmentation that co-occur within a preset window, and whose co-occurrence matches a preset first template, form a collocation word pair. The first template may include, but is not limited to: adjective + noun, noun + noun, noun + verb, verb + noun, and so on.
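A minimal sketch of the pairing rule in step 202, assuming the segmented query arrives as `(word, pos)` tuples. The tag names (`"a"`, `"n"`, `"v"`), the default window size, and the function name `collocation_pairs` are illustrative assumptions; the source only states that two words must co-occur within a preset window and match the first template.

```python
# Part-of-speech patterns from the first template:
# adjective + noun, noun + noun, noun + verb, verb + noun.
FIRST_TEMPLATE = {("a", "n"), ("n", "n"), ("n", "v"), ("v", "n")}


def collocation_pairs(tagged_words, window=2, template=FIRST_TEMPLATE):
    """Pair each word with the following words at most `window` positions away,
    keeping the pairs whose POS pattern appears in the template."""
    pairs = []
    for i, (w1, p1) in enumerate(tagged_words):
        for j in range(i + 1, min(i + 1 + window, len(tagged_words))):
            w2, p2 = tagged_words[j]
            if (p1, p2) in template:
                pairs.append((w1, w2))
    return pairs


# Hypothetical segmented query; "r" is an assumed tag for the question word.
query = [("Beijing", "n"), ("Zhongguancun", "n"), ("where", "r")]
print(collocation_pairs(query))  # [('Beijing', 'Zhongguancun')]
```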

[0084] Step 203: Use the determined collocation word pair to search the semantic redundancy pair database, and if the semant...
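The text of step 203 is cut off here, so the following is only a hypothetical sketch based on the abstract's statement that redundant keywords need not participate in matching. The database layout (a dict from collocation pair to its redundant word) and the function name `remove_redundant_words` are assumptions made for illustration.

```python
def remove_redundant_words(tokens, query_pairs, redundancy_db):
    """Drop the redundant word of every collocation pair found in the database.

    redundancy_db: assumed mapping {(word1, word2): redundant_word}.
    """
    redundant = {
        redundancy_db[pair] for pair in query_pairs if pair in redundancy_db
    }
    return [token for token in tokens if token not in redundant]


# Hypothetical data: "Apple" is redundant next to "iphone4s".
db = {("Apple", "iphone4s"): "Apple"}
tokens = ["Apple", "iphone4s", "price"]
pairs = [("Apple", "iphone4s"), ("iphone4s", "price")]
print(remove_redundant_words(tokens, pairs, db))  # ['iphone4s', 'price']
```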

[0092] Embodiment three

[0093] Figure 3 is a structural diagram of the device for determining semantic redundancy provided by Embodiment 3 of the present invention. As shown in Figure 3, the device may include: a collocation word pair determining unit 300, a context vector determining unit 310, and a redundant pair determining unit 320.

[0094] The collocation word pair determining unit 300 determines the word A and its collocation word B.

[0095] The collocation word pair determining unit 300 may specifically include: a candidate word determining subunit 301, configured to determine a noun whose appearance frequency is greater than a preset first frequency threshold in the corpus as the word A.

[0096] Since nouns are used as the central word in most cases of semantic redundancy, the candidate word determination subunit 301 uses nouns as the main word to determine the word A, and at the same time, performs statistics in a large-scale corpus to find nouns with a ...

Abstract

The invention provides a method and a device for determining semantic redundancy, and a corresponding search method and device. The method for determining semantic redundancy comprises the steps of: S1, determining a word A and its collocation word B; S2, collecting from a corpus the context vector of the collocation word pair formed by word A and word B, and the context vector of word A; S3, computing the similarity between the context vector of the collocation word pair and the context vector of word A, and, if the similarity is greater than a preset similarity threshold, determining that the collocation word pair and word A form a semantic redundancy pair, in which word B is the redundant word. The invention effectively identifies semantic redundancy in a query and provides a basis for removing it. Searching with the de-redundant query means redundant keywords need not take part in matching, which improves the recall rate of search results and the overall search quality.
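Steps S2 and S3 can be sketched as below. The abstract does not specify how the context vector is built or which similarity measure is used, so this sketch assumes bag-of-words counts over a fixed token window and cosine similarity, and it assumes word B immediately precedes word A. The function names and the 0.8 threshold are illustrative only.

```python
import math
from collections import Counter


def context_vector(sentences, target, window=3):
    """Count the tokens within `window` positions of each occurrence of
    `target`, a tuple of consecutive tokens such as ("Beijing", "Zhongguancun")."""
    n = len(target)
    vec = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == target:
                vec.update(tokens[max(0, i - window):i])      # left context
                vec.update(tokens[i + n:i + n + window])      # right context
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def is_redundant_pair(sentences, word_a, word_b, threshold=0.8):
    """True if the pair (B, A) and A alone occur in similar contexts (S3)."""
    pair_vec = context_vector(sentences, (word_b, word_a))
    a_vec = context_vector(sentences, (word_a,))
    return cosine(pair_vec, a_vec) > threshold


# Hypothetical pre-segmented corpus.
sentences = [
    ["where", "is", "Beijing", "Zhongguancun"],
    ["where", "is", "Zhongguancun"],
    ["Zhongguancun", "subway", "map"],
    ["Beijing", "Zhongguancun", "subway", "map"],
]
print(is_redundant_pair(sentences, "Zhongguancun", "Beijing"))  # True
```

In this toy corpus the similarity is about 0.89, so "Beijing" is flagged as redundant next to "Zhongguancun". Note that in this simplified version word B itself appears in the context of word A, which lowers the score slightly; a fuller treatment might exclude it.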

Description

[Technical field]

[0001] The present invention relates to natural language processing technology, in particular to a method and device for determining semantic redundancy, and a corresponding search method and device.

[Background]

[0002] As search engine technology continues to develop, the traditional strategy based on keyword matching is increasingly unable to meet the semantic matching needs of modern search engines. A large proportion of user search queries contain semantically redundant expressions. For example, "Where is Beijing Zhongguancun" has the same meaning as "Where is Zhongguancun"; "Apple iphone4s" and "iphone4s" express the same meaning; "a new film directed by Zhang Yimou" and "Zhang Yimou's new film" express the same meaning; as do "what should I do if I hate my ex-husband after divorce" and "what should I do if I hate my ex-husband". Because the keyword matching method requires each keyword to be able to match t...

Application Information

IPC IPC(8): G06F17/30, G06F17/27
Inventor 方高林
Owner BEIJING BAIDU NETCOM SCI & TECH CO LTD