Method and device for excavating synonymous attribute words

The technology applies attribute words and dictionaries to the mining of synonymous attribute words, and addresses the problems of low recall rate, low efficiency, and heavy consumption of human resources.

Active Publication Date: 2013-05-15
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] When entity attributes are identified, entity words and attribute words are matched against a preset entity word dictionary and a preset attribute word dictionary, respectively. However, the expression of an entity word is usually relatively unique and fixed, whereas an attribute word may be expressed in different ways. The attribute word dictionary usually contains a standardized expression form.



Examples


Embodiment 1

[0075] Figure 1 is a flow chart of the method provided by Embodiment 1 of the present invention. As shown in Figure 1, the method includes the following steps:

[0076] Step 101: Obtain a query set.

[0077] The query set within a certain period of time can be obtained from the search log as the corpus for extracting synonymous attribute words.

[0078] Step 102: Determine the click vector of each query in the query set, wherein the click vector of the query is composed of the clicked url corresponding to the query and the click weight of each url.

[0079] In the click vector of query_i, the click weight w_{ij} of url_j can be the proportion of the clicks for query_i that fall on url_j, which can be expressed as the following formula:

[0080] w_{ij} = click_{ij} / \sum_{k=1}^{n} click ...
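The click vector of Step 102 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the formula above is truncated in this record, so the sketch assumes the denominator is the total number of clicks recorded for the query over all of its clicked URLs. The function name `click_vectors`, the `(query, url)` log format, and the example URLs are hypothetical.

```python
from collections import Counter, defaultdict


def click_vectors(click_log):
    """Build a click vector for every query.

    click_log: iterable of (query, url) pairs, one pair per recorded click
    (a hypothetical log format chosen for this sketch).
    Returns {query: {url: weight}}.
    """
    counts = defaultdict(Counter)
    for query, url in click_log:
        counts[query][url] += 1

    vectors = {}
    for query, url_counts in counts.items():
        # Assumption: the weight is the share of this query's clicks on each URL.
        total = sum(url_counts.values())
        vectors[query] = {url: c / total for url, c in url_counts.items()}
    return vectors


# Made-up example data.
log = [
    ("andy lau height", "a.example/bio"),
    ("andy lau height", "a.example/bio"),
    ("andy lau height", "b.example/stars"),
    ("how tall is andy lau", "a.example/bio"),
]
vectors = click_vectors(log)
```

Under this assumption the weights of each query sum to 1, so queries with very different click volumes remain comparable.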

Embodiment 2

[0127] Figure 2 is a structural diagram of the device provided by Embodiment 2 of the present invention. As shown in Figure 2, the device may include: a data acquisition unit 201, a structured parsing unit 202, a data extraction unit 203, a candidate word extraction unit 204 and a synonym extraction unit 205.

[0128] The data acquisition unit 201 acquires a query set. Specifically, the query set within a certain period of time may be acquired from a search log as the corpus for extracting synonymous attribute words.

[0129] The structured parsing unit 202 performs structured parsing on each query in the query set based on the existing entity word dictionary and attribute word dictionary, and extracts standard queries. A standard query is composed of a combination of entity words and attribute words; a query from which no standard query is extracted is treated as a non-standard query.
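The split into standard and non-standard queries might look like the sketch below. The detailed parsing rule is truncated in this record, so the sketch assumes the simplest case: a query is standard when it consists of exactly one dictionary entity word plus one dictionary attribute word. The names `parse_standard` and `split_queries` and the example dictionaries are hypothetical.

```python
def parse_standard(query, entity_words, attribute_words):
    """Return (entity, attribute) if the query is one entity word plus one
    attribute word from the dictionaries, otherwise None.

    Assumption: plain substring matching; a real parser would need proper
    word segmentation.
    """
    text = query.strip()
    for entity in entity_words:
        if entity not in text:
            continue
        rest = text.replace(entity, "", 1).strip()
        if rest in attribute_words:
            return entity, rest
    return None


def split_queries(queries, entity_words, attribute_words):
    """Partition queries into standard ({query: (entity, attribute)})
    and non-standard (list of queries)."""
    standard, non_standard = {}, []
    for q in queries:
        parsed = parse_standard(q, entity_words, attribute_words)
        if parsed:
            standard[q] = parsed
        else:
            non_standard.append(q)
    return standard, non_standard


# Made-up dictionaries and queries.
entities = {"andy lau"}
attributes = {"height"}
standard, non_standard = split_queries(
    ["andy lau height", "how tall is andy lau"], entities, attributes
)
```

Here "andy lau height" is standard, while "how tall is andy lau" is non-standard because "how tall is" is not in the attribute word dictionary.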

[0130] Specifically, when the structured parsing unit 202 performs structur...


Abstract

The invention provides a method and a device for mining synonymous attribute words. The method includes: performing structured parsing on each query in a query set based on an existing entity word dictionary and an existing attribute word dictionary, and extracting standard queries, each formed by a combination of entity words and attribute words; for each standard query, calculating the click similarity between each non-standard query and the standard query, and identifying the non-standard queries that contain the same entity words as the current standard query and whose click similarity meets a preset similarity requirement; removing from the identified non-standard queries the entity words they share with the current standard query to obtain candidate synonymous attribute words; and scoring each candidate synonymous attribute word and, based on the scores, determining the synonymous attribute words of the attribute word in the current standard query. This saves human resources and improves efficiency and recall rate.
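The candidate extraction step of the abstract can be illustrated with the following sketch. The record does not state how click similarity or the final score is computed, so two assumptions are made and should be read as such: cosine similarity between click vectors stands in for the click similarity, and the similarity value is returned alongside each candidate only as a rough signal, not as the patent's scoring. The function names and the `threshold` value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse click vectors {url: weight}."""
    dot = sum(w * v[url] for url, w in u.items() if url in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_attributes(entity, standard_vec, non_standard_vecs, threshold=0.5):
    """Collect candidate synonymous attribute words for one standard query.

    A non-standard query qualifies if it contains the same entity word and
    its click similarity to the standard query reaches the (hypothetical)
    threshold. The candidate is what remains after removing the entity word.
    """
    candidates = {}
    for query, vec in non_standard_vecs.items():
        if entity not in query:
            continue
        sim = cosine(standard_vec, vec)
        if sim >= threshold:
            candidate = query.replace(entity, "", 1).strip()
            if candidate:
                candidates[candidate] = sim
    return candidates


# Made-up click vectors for the standard query "andy lau height".
standard_vec = {"a.example/bio": 0.8, "b.example/stars": 0.2}
non_standard = {
    "how tall is andy lau": {"a.example/bio": 1.0},
    "andy lau songs": {"c.example/music": 1.0},
    "how tall is tony leung": {"a.example/bio": 1.0},
}
result = candidate_attributes("andy lau", standard_vec, non_standard)
```

In this made-up data only "how tall is" survives: "andy lau songs" shares no clicked URL with the standard query, and "how tall is tony leung" does not contain the entity word.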

Description

【Technical field】

[0001] The invention relates to the field of computer technology, in particular to a method and device for mining synonymous attribute words.

【Background technique】

[0002] With the continuous development of network technology, search engines have become an important way for people to obtain information. By entering a search term (query) into a search engine, users obtain the search results that the engine returns for that query. To return search results in a targeted manner, demand analysis must be performed on the query, in which the identification of entity attributes is the basis for analyzing user needs and also the basis for realizing structured search (vertical search). That is, the entity word and the attribute word are parsed from the query. For example, for the query "What is Andy Lau's height", the entity word is "Andy Lau" and the attribute word is "height", which indicates that the user wants to know specific information about Andy Lau's height.

[0003] In the identifi...


Application Information

IPC(8): G06F17/27, G06F17/30
Inventors: 陈庆轩, 李皛皛
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD