Asymptotic expression entity identification method based on multi-path partitioning

An asymptotic entity recognition technology, applied in special data processing applications, instruments, electrical digital data processing, etc., that achieves a strong asymptotic effect

Active Publication Date: 2017-06-30
NORTHEASTERN UNIV

AI Technical Summary

Problems solved by technology

However, there are currently many applications that require (approximate) real-time



Examples


Embodiment Construction

[0032] The following is a concrete implementation example of the present invention.

[0033] As shown in Table 1, there is a sample dataset containing 7 records. This is a dirty data set, and the corresponding real recognition result is {{r1, r2, r3, r4}, {r5}, {r6}, {r7}}. The current requirement is to identify this dirty dataset asymptotically, that is, to identify as many duplicate record pairs as possible given a short running time.

[0034] The sample dirty data set in Table 1 contains 7 personal records whose attributes include name, age, job, and city.

[0035] Table 1

ID   Name        Age  Work     City
r1   John Young  29   Waiter   Poston
r2   John Joong  29   Waiter   Boston
r3   Jon Young   -    Waiter   Boston
r4   John Young  29   Waiter   Boston
r5   Bob Brown   27   Waiter   Austin
r6   Jeff Allen  29   -        Boston
r7   Will Green  29   teacher  Boston
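
As a quick check on the example, the snippet below transcribes Table 1 into Python and counts how many record pairs the stated recognition result implies. The names RECORDS, TRUE_CLUSTERS and duplicate_pairs are illustrative only and do not come from the patent.

```python
from itertools import combinations

# Table 1 transcribed as-is, including the dirty values ("Poston", "Joong").
# None stands for a missing value ("-" in the table).
RECORDS = {
    "r1": ("John Young", 29, "Waiter", "Poston"),
    "r2": ("John Joong", 29, "Waiter", "Boston"),
    "r3": ("Jon Young", None, "Waiter", "Boston"),
    "r4": ("John Young", 29, "Waiter", "Boston"),
    "r5": ("Bob Brown", 27, "Waiter", "Austin"),
    "r6": ("Jeff Allen", 29, None, "Boston"),
    "r7": ("Will Green", 29, "teacher", "Boston"),
}

# Recognition result stated in paragraph [0033].
TRUE_CLUSTERS = [{"r1", "r2", "r3", "r4"}, {"r5"}, {"r6"}, {"r7"}]


def duplicate_pairs(clusters):
    """Return every unordered pair of record IDs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        pairs.update(combinations(sorted(cluster), 2))
    return pairs


TRUE_PAIRS = duplicate_pairs(TRUE_CLUSTERS)
ALL_PAIRS = set(combinations(sorted(RECORDS), 2))
print(len(TRUE_PAIRS), "of", len(ALL_PAIRS), "pairs are duplicates")
# prints: 6 of 21 pairs are duplicates
```

Only 6 of the 21 possible pairs are duplicates, so the order in which pairs are compared determines how many duplicates are found under a short time budget.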

[0036] 1. First, perform multi-way partit...


Abstract

The invention discloses an asymptotic expression entity identification method based on multi-path partitioning. The method first performs multi-path partitioning to obtain intersecting blocks, builds a block diagram to eliminate block redundancy, initializes the block credibility and the candidate pair credibility, sorts the candidate pairs by credibility, and inserts them into a candidate queue. It then repeats three steps until the candidate queue is empty, outputting identified duplicate data object pairs as it goes: (1) process the candidate pairs in the candidate queue; (2) update the credibility of some of the candidate pairs according to the identification result; (3) reorder the candidate queue according to the updated credibility of the candidate pairs. The method can identify more duplicate data objects within a relatively short time budget. It updates the credibility of the candidate pairs by dynamically estimating the redundancy of the blocks, and it keeps the identification highly asymptotic by always selecting the most likely matching candidate pairs in real time.
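
The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the abstract gives no formulas, so the credibility estimate (a smoothed per-block match rate), the blocking keys (exact attribute values), the fuzzy_match comparison, and all function names are hypothetical. Block redundancy is handled here simply by keeping each candidate pair once, however many blocks contain it, which stands in for the block diagram step.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def build_blocks(records):
    """Multi-pass blocking: one pass per attribute, grouping by exact value.

    Blocks from different passes may intersect. Singleton blocks produce
    no candidate pairs and are dropped.
    """
    blocks = defaultdict(set)
    for rid, values in records.items():
        for attr, value in enumerate(values):
            if value is not None:
                blocks[(attr, value)].add(rid)
    return {key: ids for key, ids in blocks.items() if len(ids) > 1}


def fuzzy_match(a, b, threshold=0.85):
    """Hypothetical matcher: mean string similarity over shared attributes."""
    scores = [SequenceMatcher(None, str(x), str(y)).ratio()
              for x, y in zip(a, b) if x is not None and y is not None]
    return bool(scores) and sum(scores) / len(scores) >= threshold


def progressive_resolve(records, match=fuzzy_match, budget=None):
    """Yield duplicate pairs one at a time, most credible candidates first.

    budget caps the number of comparisons; None means no cap.
    """
    blocks = build_blocks(records)
    pair_blocks = defaultdict(list)  # candidate pair -> blocks containing it
    for key, ids in blocks.items():
        for pair in combinations(sorted(ids), 2):
            pair_blocks[pair].append(key)
    stats = {key: [0, 0] for key in blocks}  # per block: [matches, comparisons]

    def block_cred(key):
        # Assumed estimate: smoothed match rate, starting at 1 / block size.
        matches, comparisons = stats[key]
        return (matches + 1) / (comparisons + len(blocks[key]))

    def pair_cred(pair):
        return sum(block_cred(key) for key in pair_blocks[pair])

    pending = set(pair_blocks)
    while pending and (budget is None or budget > 0):
        # Re-rank on every step so updated credibilities take effect at once.
        pair = min(pending, key=lambda p: (-pair_cred(p), p))
        pending.remove(pair)
        matched = bool(match(records[pair[0]], records[pair[1]]))
        for key in pair_blocks[pair]:  # feed the result back into the blocks
            stats[key][0] += int(matched)
            stats[key][1] += 1
        if budget is not None:
            budget -= 1
        if matched:
            yield pair
```

Calling progressive_resolve on a dictionary that maps record IDs to attribute tuples yields pairs as they are confirmed, so a caller can stop consuming the generator whenever its time budget runs out. Rescanning the pending set on every step keeps the sketch short; a larger data set would need a priority queue that supports credibility updates.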

Description

Technical field

[0001] The invention belongs to the field of data quality and data integration, and mainly relates to an asymptotic entity recognition method based on multi-way blocking.

Background technique

[0002] In the era of big data, an important feature of data is diversity. Data objects that describe the same real-world entity may appear repeatedly, in different forms, in one or more data sources. This lowers data quality, reduces the availability and value of the data, and becomes a bottleneck for big data integration, processing, analysis and mining. Entity recognition is an important aspect of data quality: by analyzing a dirty data set, it places the duplicate data objects that describe the same entity into the same group, thereby improving data quality. Entity recognition usually deals with structured data objects, including data records in relational databases, data records in CSV files, data records in XML files, etc. Ent...


Application Information

IPC(8): G06F17/30
CPC: G06F16/215, G06F16/245
Inventor: 申德荣, 孙琛琛, 寇月, 聂铁铮, 于戈
Owner: NORTHEASTERN UNIV