Unique constraint based Deep Web entity identification method

An entity identification technology based on uniqueness constraints, applied in special data processing applications, instruments, and electrical digital data processing, that can solve problems such as errors, erroneous entity identification, and omissions.

Inactive Publication Date: 2013-08-21
SUZHOU UNIV

AI Technical Summary

Problems solved by technology

[0004] However, method one has three problems: first, wrong attribute values may lead to wrong entity recognition; second, method one will miss other correct at


Examples


Embodiment Construction

[0042] The present invention will be described in detail below.

[0043] A Deep Web entity recognition method based on unique constraints, comprising the following steps:

[0044] Step 1) Uniqueness constraint definition

[0045] Hard Uniqueness Constraints

[0046] Consider a set of entities in a given domain and an attribute of those entities. If each entity in the set has a unique value on the attribute (null values included), then a hard uniqueness constraint of the attribute with respect to the entity set is defined on that domain;
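
The hard constraint can be illustrated with a minimal sketch, assuming the reading that every entity carries exactly one distinct value for the attribute and that a missing value counts as the null value. The record layout, the sample data, and the name `satisfies_hard_uniqueness` are hypothetical and not taken from the patent.

```python
from collections import defaultdict


def satisfies_hard_uniqueness(records, attribute):
    """Return True if every entity has exactly one distinct value for `attribute`.

    `records` is an iterable of (entity_id, {attribute: value}) pairs.
    A missing attribute is treated as the null value None (an assumption).
    """
    values_per_entity = defaultdict(set)
    for entity_id, attrs in records:
        values_per_entity[entity_id].add(attrs.get(attribute))
    return all(len(values) == 1 for values in values_per_entity.values())


# Hypothetical sample data: two consistent records for book1, a null for book2.
consistent = [
    ("book1", {"isbn": "isbn-1"}),
    ("book1", {"isbn": "isbn-1"}),
    ("book2", {"isbn": None}),
]
# Adding a second, different value for book2 breaks the constraint.
conflicting = consistent + [("book2", {"isbn": "isbn-2"})]
```

Here `satisfies_hard_uniqueness(consistent, "isbn")` holds, while the same call on `conflicting` does not, because `book2` then has both a null and a non-null value.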

[0047] Soft Uniqueness Constraints

[0048] Consider a set of entities in a given domain and an attribute of those entities. A soft uniqueness constraint of the attribute with respect to the entity set is defined on that domain by two parameters: the first is the upper-bound probability that an entity in the set has multiple values on the attribute, and the second is the upper-bound probability that a value of the attribute is shared by multiple entities;
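
A minimal sketch of checking a soft constraint follows. It assumes the two upper-bound probabilities can be compared against observed fractions in a non-empty sample of (entity, value) pairs, which is an interpretation rather than something the text states. The names `p1`, `p2`, `violation_rates`, and `satisfies_soft_uniqueness` are hypothetical.

```python
from collections import defaultdict


def violation_rates(pairs):
    """Return (fraction of entities with several values,
    fraction of values shared by several entities).

    `pairs` is a non-empty iterable of (entity_id, value) tuples.
    """
    values_of = defaultdict(set)
    entities_of = defaultdict(set)
    for entity_id, value in pairs:
        values_of[entity_id].add(value)
        entities_of[value].add(entity_id)
    multi_valued = sum(len(v) > 1 for v in values_of.values()) / len(values_of)
    shared = sum(len(e) > 1 for e in entities_of.values()) / len(entities_of)
    return multi_valued, shared


def satisfies_soft_uniqueness(pairs, p1, p2):
    """p1 bounds the multi-valued entity rate, p2 the shared-value rate (assumed reading)."""
    multi_valued, shared = violation_rates(pairs)
    return multi_valued <= p1 and shared <= p2


# Hypothetical sample: e2 has two values, and v4 is shared by e3 and e4.
pairs = [
    ("e1", "v1"),
    ("e2", "v2"), ("e2", "v3"),
    ("e3", "v4"),
    ("e4", "v4"),
]
```

In this sample one of four entities is multi-valued and one of four values is shared, so bounds of 0.3 and 0.3 are met while a bound of 0.2 on the first rate is not.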

[0049] k-partite graph encoding

[0050] Consider a set of entities with k unique attributes, together with a set of the data s...
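
Because the definition above is cut off, the following is only one plausible reading of a k-partite encoding and not the patent's construction. It assumes that each attribute contributes one partition of value nodes, and that two values from different attributes are joined by an edge when some record contains both, with the edge annotated by the sources that support it. All names and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_k_partite_graph(records, attributes):
    """Build a k-partite graph over attribute values.

    `records` is an iterable of (source_id, {attribute: value}) pairs.
    Returns (nodes, edges): `nodes` maps each attribute to its set of values;
    `edges` maps frozenset({(attr_a, val_a), (attr_b, val_b)}) to the set of
    source ids whose records contain both values.
    """
    nodes = {attr: set() for attr in attributes}
    edges = defaultdict(set)
    for source_id, attrs in records:
        # At most one (attribute, value) node per attribute and record,
        # so edges never connect two nodes of the same partition.
        present = [(a, attrs[a]) for a in attributes if attrs.get(a) is not None]
        for attr, value in present:
            nodes[attr].add(value)
        for left, right in combinations(present, 2):
            edges[frozenset((left, right))].add(source_id)
    return nodes, dict(edges)


# Hypothetical sample: three sources describe a book; s3 disagrees on the ISBN.
records = [
    ("s1", {"title": "T1", "isbn": "I1"}),
    ("s2", {"title": "T1", "isbn": "I1"}),
    ("s3", {"title": "T1", "isbn": "I2"}),
]
nodes, edges = build_k_partite_graph(records, ["title", "isbn"])
```

With k = 2 the sample yields one title node, two ISBN nodes, and two edges; the edge between `T1` and `I1` is supported by two sources, which is the kind of evidence a clustering step could weigh.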

Abstract

The invention discloses a Deep Web entity identification method based on uniqueness constraints. The method first formulates the problem, under hard constraints, as a k-partite graph clustering problem and provides a clustering algorithm. It then extends the k-partite graph clustering problem to soft constraints, formulates entity identification as an optimization problem, and provides a matching algorithm. Record linkage and data integration are combined and applied globally. The global decision is based on both the similarity of attribute values and the association between attribute values in the same record, so incorrect values can be identified and distinguished from correct values from the outset, which yields better identification results. Clustering on attribute values also gives a finer-grained clustering.

Description

Technical Field

[0001] The invention relates to an information integration method, in particular to a Deep Web entity recognition method based on uniqueness constraints.

Background Art

[0002] According to statistics, the amount of information on the Web is growing at a rate of 30% every year. Many domains have a large number of data sources whose data partly overlap. Different data sources provide information about the same entity; they may represent the same attribute value in different ways, and some data sources even provide wrong attribute values. An important part of data integration is linking and fusing the different records that refer to the same entity.

[0003] In practice, many attributes satisfy uniqueness constraints, that is, each entity (or most entities) has a unique value on these attributes, such as a book's title, publisher, and ISBN (International Standard Book Number). However, sometimes these data do not all satisfy the u...

Application Information

IPC(8): G06F17/30
Inventor: 赵朋朋, 辛洁, 陆姗姗, 鲜学丰, 崔志明
Owner: SUZHOU UNIV