
Method and device for mining attribute name paraphrases

A technique for mining attribute-name phrase pairs, applicable to natural language data processing, special data processing applications, network data retrieval, and similar fields. It addresses the problem that users express attribute names differently from the way a structured database does.

Active Publication Date: 2014-03-12
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] However, when users actually search, the language they use may differ from the expressions in the structured database, and this difference is especially evident in attribute names.


Examples


Embodiment 1

[0056] Figure 1 is a flowchart of the method for mining attribute name paraphrases provided by the first embodiment of the present invention. As shown in Figure 1, the method may include the following steps:

[0057] Step 101: Obtain at least one resource among Q-Q, Q-T, and T-T from a query log as a candidate sentence pair.

[0058] The purpose of this step is to obtain, from the query log, the sentence pair resources used for subsequent mining. The query log records the data of each user query session and the titles of the clicked pages. The query log used can cover a specified period of time, such as the query log of one day.

[0059] Q-Q denotes a query-query pair: two queries searched by a user within one session, which may have the same meaning.

[0060] Q-T denotes a query-clicked-title pair: a query and the corresponding clicked title. Usually, the semantics between the q...
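
The collection of candidate sentence pairs in Step 101 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the `LogRecord` fields and the `candidate_sentence_pairs` function are hypothetical names, the log format is assumed, and pairing every two distinct queries in a session is an assumption (the text only says the two queries come from one session). T-T pairs, two clicked titles for the same query, are included for completeness.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    """One line of an assumed query log (field names are hypothetical)."""
    session_id: str                 # identifies the user's query session
    query: str                      # the query text
    clicked_title: Optional[str]    # title of the clicked page, if any


def candidate_sentence_pairs(records):
    """Return the Q-Q, Q-T and T-T candidate sentence pairs in `records`."""
    queries_by_session = defaultdict(list)
    titles_by_query = defaultdict(list)
    for r in records:
        # Keep each distinct query once per session, in first-seen order.
        if r.query not in queries_by_session[r.session_id]:
            queries_by_session[r.session_id].append(r.query)
        # Keep each distinct clicked title once per query.
        if r.clicked_title and r.clicked_title not in titles_by_query[r.query]:
            titles_by_query[r.query].append(r.clicked_title)

    # Q-Q: two queries from the same session (assumption: any two of them).
    qq = [p for qs in queries_by_session.values() for p in combinations(qs, 2)]
    # Q-T: a query and a title clicked for that query.
    qt = [(q, t) for q, ts in titles_by_query.items() for t in ts]
    # T-T: two titles clicked for the same query.
    tt = [p for ts in titles_by_query.values() for p in combinations(ts, 2)]
    return {"Q-Q": qq, "Q-T": qt, "T-T": tt}


# Invented sample log, for illustration only.
log = [
    LogRecord("s1", "how tall is Yao Ming", "Yao Ming height - encyclopedia"),
    LogRecord("s1", "how tall is Yao Ming", "Yao Ming's height and weight"),
    LogRecord("s1", "Yao Ming height", None),
]
pairs = candidate_sentence_pairs(log)
```

Here `pairs["Q-Q"]` holds the single pair of the two distinct queries from session `s1`, while `pairs["Q-T"]` and `pairs["T-T"]` are built from the two clicked titles of the first query.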

Embodiment 2

[0091] Figure 2 is a structural diagram of the apparatus for mining attribute name paraphrases provided by the second embodiment of the present invention. As shown in Figure 2, the apparatus includes: a candidate sentence pair acquisition unit 201, a first phrase pair extraction unit 202, a second phrase pair extraction unit 203, and a noise filtering unit 204.

[0092] The candidate sentence pair acquisition unit 201 obtains at least one resource among Q-Q, Q-T, and T-T from the query log as candidate sentence pairs, where Q-Q is a sentence pair formed by two queries searched by the user in a session, Q-T is a sentence pair formed by a query and its corresponding clicked title, and T-T is a sentence pair formed by two clicked titles corresponding to the same query.

[0093] The first phrase pair extraction unit 202 extracts phrase pairs with the same context from each candidate sentence pair as candidate paraphrase phrase pairs. Speci...
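
The details of how unit 202 finds phrases "with the same context" are cut off above, so the following is only one plausible reading, sketched under stated assumptions: two sentences share a context when they have a common prefix and/or suffix, and the differing middle parts form the candidate pair. The function name `same_context_phrase_pair`, the `min_context` parameter, and whitespace tokenization are all assumptions made for illustration; Chinese text would need a word segmenter instead of `split()`.

```python
def same_context_phrase_pair(sentence_a, sentence_b, min_context=1):
    """Return the differing middle phrases of two sentences that share a
    common prefix and/or suffix, or None if no such pair exists.

    Assumed interpretation of "same context"; not taken from the patent.
    """
    a, b = sentence_a.split(), sentence_b.split()
    shortest = min(len(a), len(b))

    # Length of the longest common prefix, in tokens.
    prefix = 0
    while prefix < shortest and a[prefix] == b[prefix]:
        prefix += 1

    # Length of the longest common suffix that does not overlap the prefix.
    suffix = 0
    while (suffix < shortest - prefix
           and a[len(a) - 1 - suffix] == b[len(b) - 1 - suffix]):
        suffix += 1

    mid_a = a[prefix:len(a) - suffix]
    mid_b = b[prefix:len(b) - suffix]

    # Require some shared context and a non-empty phrase on both sides.
    if prefix + suffix < min_context or not mid_a or not mid_b:
        return None
    return " ".join(mid_a), " ".join(mid_b)


# Invented example sentences, for illustration only.
pair = same_context_phrase_pair("what is the height of Yao Ming",
                                "what is the stature of Yao Ming")
```

A prefix/suffix match is the simplest form of shared context; it misses pairs where the sentences differ in more than one place, which a real system would presumably handle with alignment or pattern templates.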


Abstract

The invention provides a method and device for mining attribute name paraphrases. The method comprises: obtaining at least one resource among Q-Q, Q-T, and T-T from a query log as candidate sentence pairs, where Q-Q is a sentence pair formed by two queries searched by a user in a session, Q-T is a sentence pair formed by a query and the clicked page title corresponding to that query, and T-T is a sentence pair formed by two clicked titles corresponding to the same query; extracting phrase pairs with the same context from the candidate sentence pairs as candidate paraphrase phrase pairs; extracting, from the candidate paraphrase phrase pairs, those pairs in which at least one phrase appears in an attribute list; and performing noise filtering on the candidate paraphrase phrase pairs extracted in the previous step to obtain attribute-name paraphrase phrase pairs. The method and device can obtain the varied expression forms of attribute names, so that users' flexible and diverse query expressions can be matched better.

Description

Technical field

[0001] The invention relates to the technical field of computer applications, in particular to a method and device for mining attribute name paraphrases.

Background

[0002] In the field of network information, a data triple can be represented as (e, a, v), where e is the entity name (entity), a is the attribute name (attribute), and v is the attribute value (value). For example, (Yao Ming, height, 2.26 meters) is a triple. Triple data has many applications, especially in search engines, where it is stored in a structured database to provide a data source for vertical search. When a user searches for an attribute of an entity, the search engine can directly return the corresponding attribute value to the user. For example, when a user searches for "how tall is Yao Ming", the exact answer "2.26 meters" can be returned directly.

[0003] However, in the process of actual search by users, the language expression used may be...
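
To make the role of attribute-name paraphrases concrete, here is a minimal Python sketch of a triple lookup. The `triples` store, the `paraphrases` table, and the `lookup` function are hypothetical names introduced for illustration, and the paraphrase entries are invented examples rather than data from the patent.

```python
# Structured database of (entity, attribute) -> value triples.
# The single entry is the example triple given in the text.
triples = {("Yao Ming", "height"): "2.26 meters"}

# Hypothetical paraphrase table: user expression -> canonical attribute name.
paraphrases = {"how tall": "height", "stature": "height"}


def lookup(entity, attribute_expression):
    """Map the user's attribute expression to the canonical attribute name,
    then return the stored value, or None if no triple matches."""
    canonical = paraphrases.get(attribute_expression, attribute_expression)
    return triples.get((entity, canonical))


answer = lookup("Yao Ming", "how tall")
```

Without the `paraphrases` table, the expression "how tall" would not match the stored attribute name "height" and the lookup would return nothing, which is the mismatch that mined paraphrases are meant to bridge.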


Application Information

IPC(8): G06F17/30
CPC: G06F16/951, G06F40/284
Inventor 赵世奇
Owner BEIJING BAIDU NETCOM SCI & TECH CO LTD