Active learning for data matching

A data matching technique that applies active learning to data points and data sets. It addresses problems such as inconsistent data and unprocessed clerical records, and aims at accurate classification while saving processing resources.

Pending Publication Date: 2022-03-11
IBM CORP

AI Technical Summary

Problems solved by technology

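Despite efforts to automate record matching, the number of clerical records keeps growing. As a result, most of them are not processed for a long period, during which time inconsistent data may be used in the system configuration.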



Embodiment Construction

[0031] The description of various embodiments of the present invention has been presented for purposes of illustration, and is not intended to be exhaustive or limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvements over technologies found in the marketplace, or to enable a person of ordinary skill in the art to understand the embodiments disclosed herein.

[0032] A dataset is a collection of one or more data records. For example, a dataset may be provided as a collection of related records contained in a file, such as a file containing records for all students in a class. A dataset can also be, for example, a table of a database or a file of the Hadoop file system. ...


Abstract

The inventive method comprises:
a) Training a machine learning model using a current set of labeled data points. Each data point is a plurality of data records, and the label of a data point indicates its classification. The training results in a trained machine learning model configured to classify a data point as representing the same entity or different entities.
b) A subset of unlabeled data points may be selected from the current set of unlabeled data points, using the classification results obtained for that set.
c) The subset of unlabeled data points may be provided to a classifier, and labels for the subset may be received in response.
Steps a) to c) may be repeated, using the current set of labeled data points plus the newly labeled subset as the current set of labeled data points.
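The loop in steps a) to c) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the method claimed in the application: the abstract does not say which model, features, or selection rule are used. Here a nearest-centroid threshold on a string-similarity score stands in for the machine learning model, the points closest to the decision boundary are selected (uncertainty sampling), and a simulated `oracle` function stands in for whoever supplies the labels. All names (`similarity`, `train`, `select`, `oracle`, `active_learning`) and the sample records are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean

FIELDS = ("name", "city")  # hypothetical record attributes

def similarity(pair):
    """Feature of one data point (a pair of records): mean field similarity in [0, 1]."""
    a, b = pair
    return mean(SequenceMatcher(None, a[f], b[f]).ratio() for f in FIELDS)

def train(labeled):
    """Step a): fit a nearest-centroid model and return its decision threshold.
    Assumes the labeled set contains at least one example of each class."""
    same = [similarity(p) for p, y in labeled if y == 1]
    diff = [similarity(p) for p, y in labeled if y == 0]
    return (mean(same) + mean(diff)) / 2

def classify(threshold, pair):
    """Return 1 for 'same entity', 0 for 'different entities'."""
    return 1 if similarity(pair) >= threshold else 0

def select(threshold, unlabeled, k):
    """Step b): pick the k points closest to the decision boundary (assumed rule)."""
    return sorted(unlabeled, key=lambda p: abs(similarity(p) - threshold))[:k]

def active_learning(labeled, unlabeled, oracle, rounds=2, k=1):
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        threshold = train(labeled)                      # step a)
        batch = select(threshold, unlabeled, k)         # step b)
        labeled += [(p, oracle(p)) for p in batch]      # step c)
        unlabeled = [p for p in unlabeled if p not in batch]
    return train(labeled), labeled, unlabeled

def rec(name, city):
    return {"name": name, "city": city}

def oracle(pair):
    """Simulated labeler: compares surnames. A real setup would ask a person."""
    return int(pair[0]["name"].split()[-1] == pair[1]["name"].split()[-1])

seed = [
    ((rec("John Smith", "Berlin"), rec("John Smith", "Berlin")), 1),
    ((rec("John Smith", "Berlin"), rec("Maria Lopez", "Madrid")), 0),
]
pool = [
    (rec("Jon Smith", "Berlin"), rec("John Smith", "Berlin")),
    (rec("Anna Meyer", "Bonn"), rec("Peter Kurz", "Ulm")),
    (rec("A. Meyer", "Bonn"), rec("Anna Meyer", "Bonn")),
]

threshold, labeled, unlabeled = active_learning(seed, pool, oracle)
for pair in unlabeled:
    print(pair[0]["name"], "/", pair[1]["name"], "->", classify(threshold, pair))
```

Each round retrains on the grown labeled set, so the labeling effort is spent on the pairs the current model is least sure about rather than on the whole pool.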

Description

Background

[0001] The present invention relates to the field of digital computer systems, in particular to a method for matching data.

[0002] Clerical records are records for which a given matching process cannot determine whether they are duplicates of each other, and should therefore be merged, or whether one or more of them should be considered non-matching and therefore be kept separate from each other. These clerical records may require user intervention to look more closely at the values of the data records. Despite tremendous efforts to automate and improve the record matching process, the number of these clerical records continues to increase (e.g., there can be millions of clerical records). This results in most of the clerical records not being processed for a long period, during which time inconsistent data may be used in the system configuration.

Summary of the invention

[0003] Various embodiments as described by the subject matter of the independent claims pro...


Application Information

IPC(8): G06F16/00
CPC: G06N20/20, G06N5/01, G06F16/285, G06N20/00
Inventor: L·布雷默, U·巴杰帕, M·奥伯霍菲尔, A·鲁茨夏维尔德考斯塔
Owner IBM CORP