A method and device for mining attribute name paraphrases

A technology involving attribute names and phrase pairs, applied in natural language data processing, special data processing applications, network data retrieval, etc., that can solve problems such as obvious differences in how attribute names are expressed.

Active Publication Date: 2018-04-03
BEIJING BAIDU NETCOM SCI & TECH CO LTD
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

[0003] However, in users' actual searches, the language they use may differ from the expressions in the structured database, especially in attribute names.

Examples

Embodiment 1

[0056] Figure 1 is a flow chart of the method for mining attribute name paraphrases provided by Embodiment 1 of the present invention. As shown in Figure 1, the method may include the following steps:

[0057] Step 101: Obtain at least one of the Q-Q, Q-T, and T-T resources from a search log (query log) as candidate sentence pairs.

[0058] This step obtains, from the query log, the sentence-pair resources used for subsequent mining. The query log records users' query sessions and their clicks on webpage titles. The query log used may cover a specified period of time, for example one day.
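As a rough illustration of Step 101's input handling, the sketch below groups one day's query-log records by session. The record layout (`LogRecord` with `session_id`, `timestamp`, `query`, `clicked_titles`) and the helper `sessions_for_day` are hypothetical; the patent excerpt does not specify a log format.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date, datetime


@dataclass
class LogRecord:
    # Hypothetical record layout; the patent does not define a log schema.
    session_id: str
    timestamp: datetime
    query: str
    clicked_titles: list[str] = field(default_factory=list)


def sessions_for_day(records, day):
    """Group the records from one calendar day by session, in time order."""
    sessions = defaultdict(list)
    for rec in sorted(records, key=lambda r: r.timestamp):
        if rec.timestamp.date() == day:
            sessions[rec.session_id].append(rec)
    return dict(sessions)


log = [
    LogRecord("s1", datetime(2017, 6, 1, 9, 0), "how tall is yao ming",
              ["Yao Ming height"]),
    LogRecord("s1", datetime(2017, 6, 1, 9, 1), "yao ming height"),
    LogRecord("s2", datetime(2017, 6, 2, 10, 0), "yao ming age"),
]
day_sessions = sessions_for_day(log, date(2017, 6, 1))
print({sid: [r.query for r in recs] for sid, recs in day_sessions.items()})
# → {'s1': ['how tall is yao ming', 'yao ming height']}
```

Restricting to a single day keeps the log volume bounded, matching the "query log for one day" example in the text.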

[0059] Q-Q refers to a query-query pair, i.e., two queries searched by a user within one session; the two queries may have the same meaning.

[0060] Q-T refers to a query-clicked title pair, i.e., a query and the title the user clicked for it. Usually...
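Under the definitions above, Q-Q and Q-T pairs might be generated as follows. This is a minimal sketch, assuming that every pair of distinct queries in a session counts as a Q-Q candidate (a stricter variant could keep only consecutive queries); the function names `qq_pairs` and `qt_pairs` are hypothetical.

```python
from itertools import combinations


def qq_pairs(session_queries):
    """Q-Q candidates: each pair of distinct queries issued in one session."""
    unique = list(dict.fromkeys(session_queries))  # drop repeats, keep order
    return list(combinations(unique, 2))


def qt_pairs(query, clicked_titles):
    """Q-T candidates: the query paired with each distinct title clicked for it."""
    return [(query, title) for title in dict.fromkeys(clicked_titles)]


print(qq_pairs(["how tall is yao ming", "yao ming height", "how tall is yao ming"]))
# → [('how tall is yao ming', 'yao ming height')]
print(qt_pairs("how tall is yao ming", ["Yao Ming height", "Yao Ming height"]))
# → [('how tall is yao ming', 'Yao Ming height')]
```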

Embodiment 2

[0091] Figure 2 is a structural diagram of the device for mining attribute name paraphrases provided by Embodiment 2 of the present invention. As shown in Figure 2, the device includes a candidate sentence pair acquisition unit 201, a first phrase pair extraction unit 202, a second phrase pair extraction unit 203, and a noise filtering unit 204.
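The four units can be pictured as a linear pipeline in which each unit consumes the previous unit's output. The sketch below wires them together; the class `AttributeParaphraseMiner` and its callable fields are hypothetical stand-ins, not interfaces defined in the patent.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Pair = tuple[str, str]


@dataclass
class AttributeParaphraseMiner:
    # One hypothetical hook per unit of Embodiment 2.
    acquire_sentence_pairs: Callable[[], Iterable[Pair]]          # unit 201
    extract_phrase_pairs: Callable[[Iterable[Pair]], list[Pair]]  # unit 202
    select_attribute_pairs: Callable[[list[Pair]], list[Pair]]    # unit 203
    filter_noise: Callable[[list[Pair]], list[Pair]]              # unit 204

    def run(self) -> list[Pair]:
        """Chain the units: sentence pairs, phrase pairs, attribute pairs, filtered pairs."""
        sentence_pairs = self.acquire_sentence_pairs()
        candidates = self.extract_phrase_pairs(sentence_pairs)
        attribute_candidates = self.select_attribute_pairs(candidates)
        return self.filter_noise(attribute_candidates)


# Usage with trivial stub units, only to show the data flow.
miner = AttributeParaphraseMiner(
    acquire_sentence_pairs=lambda: [("yao ming height", "yao ming stature")],
    extract_phrase_pairs=lambda pairs: [("height", "stature")],
    select_attribute_pairs=lambda cands: [c for c in cands if "height" in c],
    filter_noise=lambda cands: cands,
)
print(miner.run())  # → [('height', 'stature')]
```

Keeping each unit behind its own callable makes it easy to swap in a different extraction or filtering strategy without touching the others.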

[0092] The candidate sentence pair acquisition unit 201 obtains at least one of the Q-Q, Q-T, and T-T resources from the query log as candidate sentence pairs, where Q-Q is a sentence pair composed of two queries searched by a user in one session, Q-T is a sentence pair formed by a query and the corresponding clicked title, and T-T is a sentence pair formed by two clicked titles corresponding to the same query.
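A T-T source and the merge performed by unit 201 could be sketched as below, assuming the click data is available as (query, clicked title) tuples. The names `tt_pairs` and `candidate_sentence_pairs` are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def tt_pairs(clicks):
    """T-T candidates: pairs of distinct titles clicked for the same query.

    `clicks` is an iterable of (query, clicked_title) tuples, an assumed shape.
    """
    titles_by_query = defaultdict(list)
    for query, title in clicks:
        if title not in titles_by_query[query]:
            titles_by_query[query].append(title)
    pairs = []
    for titles in titles_by_query.values():
        pairs.extend(combinations(titles, 2))
    return pairs


def candidate_sentence_pairs(qq=(), qt=(), tt=()):
    """Unit 201: merge whichever sources are supplied, dropping exact duplicates."""
    return list(dict.fromkeys([*qq, *qt, *tt]))


clicks = [
    ("yao ming height", "Yao Ming height is 2.26 m"),
    ("yao ming height", "How tall is Yao Ming"),
    ("yao ming age", "Yao Ming age"),
]
tt = tt_pairs(clicks)
print(tt)
# → [('Yao Ming height is 2.26 m', 'How tall is Yao Ming')]
print(len(candidate_sentence_pairs(qq=[("a", "b")], tt=tt + tt)))  # → 2
```

The defaults of `()` reflect the "at least one of" wording: any subset of the three sources can be passed.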

[0093] The first phrase pair extraction unit 202 extracts phrase pairs that share the same context from each candidate sentence pair as candidate paraphrase phrase pairs. Specifically, phrase pairs can be extracted as candi...
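The excerpt is truncated before it says exactly how same-context phrase pairs are extracted. One plausible reading, sketched here under stated assumptions, is that two sentences sharing identical tokens to the left and right of a single differing span yield that span pair as a candidate paraphrase. Whitespace tokenization, the `max_len` limit, and the function names are assumptions; Chinese queries would need word segmentation first.

```python
def same_context_phrase_pair(s1, s2, max_len=4):
    """Return (phrase1, phrase2) if s1 and s2 differ in exactly one middle span.

    Assumptions: whitespace tokens, 'same context' = identical tokens to the
    left and right of the differing span, and at least one shared token.
    """
    a, b = s1.split(), s2.split()
    shortest = min(len(a), len(b))
    # Length of the longest common prefix (left context).
    left = 0
    while left < shortest and a[left] == b[left]:
        left += 1
    # Length of the longest common suffix (right context), not overlapping the prefix.
    right = 0
    while right < shortest - left and a[-1 - right] == b[-1 - right]:
        right += 1
    p1, p2 = a[left:len(a) - right], b[left:len(b) - right]
    if left + right == 0 or not p1 or not p2:
        return None  # no shared context, or one side has nothing to swap
    if len(p1) > max_len or len(p2) > max_len:
        return None  # differing span too long to be treated as a phrase
    return " ".join(p1), " ".join(p2)


def extract_candidate_phrase_pairs(sentence_pairs, max_len=4):
    """Apply the same-context rule to every candidate sentence pair."""
    results = []
    for s1, s2 in sentence_pairs:
        pair = same_context_phrase_pair(s1, s2, max_len)
        if pair is not None:
            results.append(pair)
    return results


print(same_context_phrase_pair("what is the height of yao ming",
                               "what is the stature of yao ming"))
# → ('height', 'stature')
print(same_context_phrase_pair("yao ming height", "yao ming how tall"))
# → ('height', 'how tall')
```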

Abstract

The invention provides a method and device for mining attribute name paraphrases. The method comprises: obtaining at least one of the Q-Q, Q-T, and T-T resources from a search log as candidate sentence pairs, where Q-Q is a sentence pair formed by two queries searched by a user in one session, Q-T is a sentence pair formed by a query and the clicked webpage title corresponding to it, and T-T is a sentence pair formed by two clicked titles corresponding to the same query; extracting phrase pairs with the same context from the candidate sentence pairs as candidate paraphrase phrase pairs; extracting, from the candidate paraphrase phrase pairs, those in which at least one phrase is contained in an attribute list; and performing noise filtering on the candidate paraphrase phrase pairs extracted in the third step to obtain attribute name paraphrase phrase pairs. The method and device can obtain alternative expressions of attribute names, so that users' flexible and diverse query expressions can be better matched.
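Steps three and four of the abstract might look like the following sketch. The attribute-list step is read here as keeping pairs in which at least one phrase is a known attribute name, which is an interpretation of the abstract's wording. The patent's actual noise-filtering criteria are not given in this excerpt, so a simple support-count threshold (`min_count`) is swapped in purely for illustration.

```python
from collections import Counter


def select_attribute_pairs(phrase_pairs, attribute_names):
    """Keep candidate pairs in which at least one phrase is a known attribute name."""
    return [(p, q) for p, q in phrase_pairs
            if p in attribute_names or q in attribute_names]


def filter_noise(phrase_pairs, min_count=2):
    """Illustrative noise filter: keep unordered pairs seen at least min_count times.

    The support-count threshold is a stand-in; the excerpt does not state the
    patent's filtering criteria. Pairs of identical phrases are discarded.
    """
    counts = Counter(tuple(sorted(pair)) for pair in phrase_pairs
                     if pair[0] != pair[1])
    return sorted(pair for pair, n in counts.items() if n >= min_count)


candidates = [("height", "stature"), ("stature", "height"),
              ("height", "how tall"), ("price", "cost")]
selected = select_attribute_pairs(candidates, {"height", "birthday"})
print(filter_noise(selected))  # → [('height', 'stature')]
```

Sorting each pair before counting treats (a, b) and (b, a) as the same paraphrase relation, so evidence from both directions accumulates.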

Description

【Technical field】

[0001] The present invention relates to the field of computer application technology, and in particular to a method and device for mining attribute name paraphrases.

【Background art】

[0002] In the field of network information, a piece of triple data can be expressed as (e, a, v), where e is the entity name (entity), a is the attribute name (attribute), and v is the attribute value (value); for example, (Yao Ming, height, 2.26 meters) is a triple. Triple data is used in many ways, especially in search engines, where it is stored in a structured database to provide data sources for vertical search. When a user searches for an entity attribute, the search engine can directly return the corresponding data. For example, when a user searches for "how tall is Yao Ming", the exact answer "2.26 meters" can be returned directly.

[0003] However, in the actual search process of the user, the language expression used may be different from t...
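To show why attribute name paraphrases matter for the triple lookup described in [0002] and [0003], the toy sketch below answers a query from (e, a, v) triples, optionally expanding the attribute name with mined paraphrases. The `ATTRIBUTE_PARAPHRASES` mapping and the substring matching are hypothetical simplifications, not how a production search engine parses queries.

```python
# (e, a, v) triples as described in [0002].
TRIPLES = [("Yao Ming", "height", "2.26 meters")]

# Hypothetical mined paraphrases: surface form -> attribute name in the database.
ATTRIBUTE_PARAPHRASES = {"how tall": "height", "stature": "height"}


def answer(query, triples, paraphrases):
    """Return v for the first triple whose entity and some attribute form occur in the query."""
    q = query.lower()
    for entity, attribute, value in triples:
        if entity.lower() not in q:
            continue
        # Accept the stored attribute name or any of its mined paraphrases.
        forms = [attribute] + [s for s, a in paraphrases.items() if a == attribute]
        if any(form in q for form in forms):
            return value
    return None


print(answer("how tall is Yao Ming", TRIPLES, {}))                     # → None
print(answer("how tall is Yao Ming", TRIPLES, ATTRIBUTE_PARAPHRASES))  # → 2.26 meters
```

Without the paraphrase "how tall", the query never mentions the stored attribute name "height", so the lookup fails; this is the mismatch that [0003] describes.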

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/951; G06F40/284
Inventor: Zhao Shiqi (赵世奇)
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD