Two level compute memoing for large scale entity resolution

A large-scale entity resolution technique, applicable to computing, computing models, and special-purpose data processing applications. It addresses two problems: each iteration takes several minutes to tens of minutes, and current solutions do not scale well to large data sets.

Pending Publication Date: 2021-02-26
IBM CORP

AI Technical Summary

Problems solved by technology

Current solutions do not scale well to large data sets.
For data sets with millions of records, each iteration can take several minutes to tens of minutes on a 6-node cluster.

Embodiment Construction

[0018] The description of different embodiments has been presented for illustrative purposes, but is not intended to be exhaustive or limited to the disclosed embodiments. Without departing from the scope and spirit of the described embodiments, many modifications and changes will be apparent to those of ordinary skill in the art. The terms used herein are selected to best explain the principles of the embodiments, practical applications, or technical improvements found in the market, or to enable those of ordinary skill in the art to understand the embodiments disclosed herein.

[0019] First of all, it should be understood that although the present disclosure includes a detailed description about cloud computing, the implementation of the technical solutions described therein is not limited to a cloud computing environment, but can be implemented in combination with any other type of computing environment now known or developed in the future.

[0020] The embodiment relates to the eliminat...

Abstract

One embodiment provides for a method that includes performing, by a processor, active learning of large scale entity resolution using a distributed compute memoing cache to eliminate redundant computation. Link feature vector tables are determined for intermediate results of the active learning of large scale entity resolution. The link feature vector tables are managed by a two-level cache hierarchy.
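
The abstract does not say what the two cache levels are, so the following is only a minimal sketch of the general idea, not the patented method. It assumes a small least-recently-used level 1 in front of a larger level 2, both plain in-process dictionaries; in a distributed setting level 2 might instead be a disk or cluster-wide store. The names `TwoLevelMemoCache`, `get_or_compute`, and `link_features` are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable, Tuple

FeatureVector = Tuple[float, ...]


class TwoLevelMemoCache:
    """Hypothetical memo cache: small LRU level 1 backed by a larger level 2."""

    def __init__(self, l1_capacity: int) -> None:
        self.l1: "OrderedDict[Hashable, FeatureVector]" = OrderedDict()
        self.l2: Dict[Hashable, FeatureVector] = {}  # assumed unbounded here
        self.l1_capacity = l1_capacity
        self.computations = 0  # how many times the real work was done

    def get_or_compute(
        self, key: Hashable, compute: Callable[[], FeatureVector]
    ) -> FeatureVector:
        if key in self.l1:  # level 1 hit: refresh recency and return
            self.l1.move_to_end(key)
            return self.l1[key]
        if key in self.l2:  # level 2 hit: reuse the stored result
            value = self.l2[key]
        else:  # miss in both levels: compute once and store in level 2
            value = compute()
            self.computations += 1
            self.l2[key] = value
        self._promote(key, value)
        return value

    def _promote(self, key: Hashable, value: FeatureVector) -> None:
        self.l1[key] = value
        self.l1.move_to_end(key)
        if len(self.l1) > self.l1_capacity:
            # Evict the least recently used entry; it remains in level 2.
            self.l1.popitem(last=False)


def link_features(a: str, b: str) -> FeatureVector:
    # Toy feature vector (an assumption): exact-match flag, length difference.
    return (float(a == b), float(abs(len(a) - len(b))))


cache = TwoLevelMemoCache(l1_capacity=2)
pairs = [("ibm", "ibm"), ("ibm", "i.b.m."), ("acme", "acme"), ("ibm", "ibm")]
for a, b in pairs:
    cache.get_or_compute((a, b), lambda a=a, b=b: link_features(a, b))
```

In this toy run the fourth pair repeats the first, which has already been evicted from level 1, so it is served from level 2 without recomputation.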

Description

Background technique

[0001] Active learning of Entity Resolution (ER) rules eases the burden on users in a setting where interactivity is essential. Current solutions do not scale well to large data sets: for a data set with millions of records, each iteration may take several minutes to tens of minutes on a 6-node cluster.

[0002] Matching functions are the basic units that compose an ER rule and are provided by the user. Active learning learns the composition and thresholds of several matching functions and generates ER rules. Multiple iterations of the active learning process output multiple ER rules, which together identify records that refer to the same real-world entity.

[0003] A blocking function is a specific type of matching function incorporated into an ER rule. An ER rule should have at least one blocking function. The blocking function reduces the number of pairs to be compared from the two input data sets, thereby reducing the computational cost.

Summary of ...
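
To make the roles of matching and blocking functions concrete, here is a small illustrative sketch. It assumes an ER rule is a conjunction of (matching function, threshold) pairs, which the text above does not state, and every name in it (`blocking_key`, `candidate_pairs`, `name_overlap`, `same_city`, `RULE`) and all records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]


def blocking_key(record: Record) -> str:
    # Hypothetical blocking function: first letter of the lower-cased name.
    return record["name"][:1].lower()


def candidate_pairs(
    left: List[Record], right: List[Record]
) -> List[Tuple[Record, Record]]:
    # Only records that share a blocking key are paired for comparison.
    blocks: Dict[str, List[Record]] = defaultdict(list)
    for r in right:
        blocks[blocking_key(r)].append(r)
    return [(l, r) for l in left for r in blocks.get(blocking_key(l), [])]


def name_overlap(a: Record, b: Record) -> float:
    # Matching function: Jaccard similarity of lower-cased name tokens.
    ta = set(a["name"].lower().split())
    tb = set(b["name"].lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def same_city(a: Record, b: Record) -> float:
    # Matching function: 1.0 if the cities agree ignoring case, else 0.0.
    return float(a["city"].lower() == b["city"].lower())


# Assumed rule shape: every matching function must reach its threshold.
RULE = [(name_overlap, 0.5), (same_city, 1.0)]


def rule_matches(a: Record, b: Record) -> bool:
    return all(fn(a, b) >= threshold for fn, threshold in RULE)


left = [
    {"name": "IBM Corp", "city": "Armonk"},
    {"name": "Acme Inc", "city": "Austin"},
]
right = [
    {"name": "ibm corp", "city": "armonk"},
    {"name": "Initech", "city": "Austin"},
    {"name": "Acme Ltd", "city": "Austin"},
]

pairs = candidate_pairs(left, right)
matches = [(a["name"], b["name"]) for a, b in pairs if rule_matches(a, b)]
```

With these toy records, blocking leaves 3 candidate pairs instead of the 6 in the full cross product, and only those 3 are evaluated against the rule.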

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F12/0813, G06F12/0866, G06F12/0897, G06F12/0811, G06N20/00, G06F3/06, G06N5/02
CPC: G06N20/00, G06N5/025, G06F12/0866, G06F2212/284, G06F12/0897, G06F2212/154, G06F2212/454, G06F2212/1048, G06F16/215, G06F12/0811, G06F12/0813, G06F3/067, G06F3/0608, G06F3/0641
Inventor: 李旻, L·普帕, P·瑟恩
Owner IBM CORP