Mining method and device of single entity instance

A technique for mining single entity instances, applied in the field of data processing, that addresses problems such as inaccurate knowledge bases, inaccurate entity instances, and inaccurate query results.

Active Publication Date: 2016-05-04
BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a mining method and device for a single entity instance to solve the inaccurate description of the entity instance of the same entity in the e



Examples


Embodiment 1

[0044] Referring to Figure 1, a flow chart of the steps of a mining method for a single entity instance in Embodiment 1 of the present invention is shown.

[0045] The mining method of a single entity instance in this embodiment may include the following steps:

[0046] Step 101, crawling pages from multiple data sources that contain entity instances corresponding to entities of a specific type.

[0047] Among them, an entity is a specific thing or concept, and entities are generally divided into types, such as person-type entities, movie-type entities, and so on. The same entity can correspond to multiple entity instances. An entity instance is a descriptive page (content) for an entity in the network (or other media). For example, various encyclopedia pages contain the entity instance corresponding to the entity.
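The distinction between an entity and its entity instances can be sketched as a small data model. This is only an illustrative assumption: the class name EntityInstance, its fields, and the sample values are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class EntityInstance:
    """One descriptive page about an entity, taken from one data source.

    All field names here are hypothetical, chosen for illustration.
    """
    entity_name: str                 # name of the entity the page describes
    entity_type: str                 # e.g. "person" or "movie"
    source: str                      # label of the data source the page came from
    attributes: Dict[str, str] = field(default_factory=dict)


# Two instances from different (made-up) sources that share an entity name.
# Whether they describe the same entity is what the mining method must decide.
page_a = EntityInstance("Zhang Wei", "person", "source_a", {"occupation": "actor"})
page_b = EntityInstance("Zhang Wei", "person", "source_b", {"occupation": "actor"})
same_name = page_a.entity_name == page_b.entity_name
```

Keeping the source on each instance makes it possible to trace a merged result back to the pages it was built from.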

[0048] In the embodiment of the present invention, pages from multiple data sources including entity instances corresponding to entities of a specific type are firstly craw...

Embodiment 2

[0059] Referring to Figure 3, a flow chart of the steps of a method for mining a single entity instance according to Embodiment 2 of the present invention is shown.

[0060] The mining method of a single entity instance in this embodiment may include the following steps:

[0061] Step 301, crawling pages from multiple data sources that contain entity instances corresponding to entities of a specific type.

[0062] In the embodiment of the present invention, processing is performed on entities of a specific type. The description below takes the person type as an example. For the processing of other types of entities, refer to the processing of person-type entities.

[0063] Step 302, extracting the entity name, attribute names, and attribute values of each entity instance included in the pages.

[0064] Crawl pages from various webpages, such as Baidu Encyclopedia, Sogou Encyclopedia, Haosou Encyclopedia, etc., and contain multiple pages of entity i...
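As a rough illustration of how the extracted entity names, attribute names, and attribute values might be organized, the sketch below groups extracted records by entity name, so that each group forms a set of same-name entity instances. The record layout, the function name group_by_entity_name, and the sample values are assumptions made for illustration; the excerpt does not specify them.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_entity_name(records: List[dict]) -> Dict[str, List[dict]]:
    """Group extracted records into sets of same-name entity instances.

    Each record is assumed to look like:
        {"entity_name": str, "source": str, "attributes": {name: value}}
    """
    groups = defaultdict(list)
    for record in records:
        groups[record["entity_name"]].append(record)
    return dict(groups)


# Hypothetical extraction output from three pages.
records = [
    {"entity_name": "Zhang Wei", "source": "source_a",
     "attributes": {"birth_date": "1980-01-01", "occupation": "actor"}},
    {"entity_name": "Zhang Wei", "source": "source_b",
     "attributes": {"birth_date": "1992-05-20", "occupation": "singer"}},
    {"entity_name": "Li Fang", "source": "source_a",
     "attributes": {"birth_date": "1975-03-03"}},
]
same_name_sets = group_by_entity_name(records)
```

Each value in same_name_sets is a candidate set in which some instances may describe the same entity and others may describe different entities that merely share a name.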

Embodiment 3

[0104] Referring to Figure 4, a flow chart of the steps of a method for building a knowledge base in Embodiment 3 of the present invention is shown.

[0105] The method for building a knowledge base in this embodiment may include the following steps:

[0106] Step 401, crawling pages from multiple data sources that contain entity instances corresponding to entities of a specific type.

[0107] Step 402, extracting the entity name, attribute names, and attribute values of each entity instance included in the pages.

[0108] Step 403, for the set of entity instances of same-name entities, merging the entity instances in the set that describe the same entity into a single entity instance of that entity, according to the distribution entropy index of the attribute values under the attribute names that have a single degree of discrimination for the same-name entities.
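The excerpt does not define the distribution entropy index or the exact merging rule, so the following is only one possible reading, stated as an assumption: the Shannon entropy of the attribute values under an attribute name is used as the index, and instances that share the same value under a chosen attribute name are merged. The function names value_entropy and merge_by_attribute and the sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def value_entropy(instances: List[dict], attr: str) -> float:
    """Shannon entropy (in bits) of the values seen under one attribute name.

    Assumption: this stands in for the patent's "distribution entropy index".
    """
    values = [i["attributes"][attr] for i in instances if attr in i["attributes"]]
    if not values:
        return 0.0
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def merge_by_attribute(instances: List[dict], attr: str) -> Dict[str, dict]:
    """Merge instances that share the same value under `attr`.

    Instances lacking `attr` are skipped in this sketch.
    """
    merged = defaultdict(dict)
    for inst in instances:
        key = inst["attributes"].get(attr)
        if key is not None:
            merged[key].update(inst["attributes"])
    return dict(merged)


# Hypothetical same-name set: four pages, apparently about two different people.
same_name_set = [
    {"attributes": {"birth_date": "1980", "gender": "male", "occupation": "actor"}},
    {"attributes": {"birth_date": "1980", "gender": "male", "birthplace": "Beijing"}},
    {"attributes": {"birth_date": "1990", "gender": "male", "occupation": "singer"}},
    {"attributes": {"birth_date": "1990", "gender": "male"}},
]
birth_entropy = value_entropy(same_name_set, "birth_date")   # two equally likely values
gender_entropy = value_entropy(same_name_set, "gender")      # a single value
single_instances = merge_by_attribute(same_name_set, "birth_date")
```

In this toy data, gender takes one value and cannot separate the pages, while birth_date splits them into two groups, so merging on birth_date yields two single entity instances. How the patent selects attribute names and thresholds is not recoverable from the excerpt.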

[0109] For the specific process of the above step 401, step 402, and step 403, it is t...



Abstract

The invention provides a mining method and device for a single entity instance. The method comprises the following steps: crawling pages from a plurality of data sources that contain entity instances corresponding to entities of a specific type; extracting the entity name, attribute names, and attribute values of each entity instance contained in the pages; and, for the set of entity instances of same-name entities, merging the entity instances in the set that describe the same entity into a single entity instance of that entity, according to the distribution entropy index of the attribute values under the attribute names that have a single degree of discrimination for the same-name entities, where same-name entities are those whose entity instances have the same entity name. Because the single entity instance obtained by merging describes one and the same entity, the mining result is accurate. A knowledge base built from such single entity instances is therefore more accurate, subsequent user queries against the knowledge base return more accurate results, and user experience is improved.

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a mining method and device for a single entity instance.

Background

[0002] In knowledge engineering, a knowledge base is a structured, easy-to-operate, easy-to-use, and comprehensively organized cluster of knowledge. It is a collection of interconnected pieces of knowledge that is stored, organized, managed, and used in computer memory with one or more knowledge representation methods, to meet the needs of solving problems in one or more domains. These pieces of knowledge include domain-related theoretical knowledge, factual data, and heuristic knowledge obtained from expert experience, such as definitions, theorems, algorithms, and common-sense knowledge related to a domain.

[0003] Before establishing a knowledge base, it is necessary to establish a unified data structure in the field through domain knowledge. The data structure i...


Application Information

IPC(8): G06F17/30
CPC: G06F16/212
Inventor: 邸楠
Owner: BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD