Robust detector of fuzzy duplicates

A processor-based technique for detecting fuzzy duplicates, applied in the field of databases and data warehouses, that addresses problems such as false positives and false negatives.

Active Publication Date: 2006-03-08
MICROSOFT TECH LICENSING LLC
Cites: 1; Cited by: 2

AI Technical Summary

Problems solved by technology

Threshold-based duplicate detection results in a large number of false positives (tuples that are not true duplicates but are presumed to be) or a large number of false negatives (tuples that are true duplicates but are not identified).
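The trade-off behind a single global threshold can be sketched as follows. This is only an illustration under stated assumptions: the similarity function (Python's `difflib.SequenceMatcher`), the helper names `similarity` and `threshold_duplicates`, and the sample records are all hypothetical and are not taken from the patent.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 only when the case-folded strings are identical.

    SequenceMatcher is an assumed stand-in for whatever similarity
    function a real system would use.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def threshold_duplicates(records, threshold):
    """Report every pair whose similarity meets one global threshold."""
    return {
        (a, b)
        for a, b in combinations(records, 2)
        if similarity(a, b) >= threshold
    }


# Hypothetical records: the first two name the same company, the third does not.
records = ["Acme Corp", "Acme Corporation", "Apex Corp"]

# A very strict threshold accepts exact matches only, so the true
# duplicate pair is missed (a false negative).
strict = threshold_duplicates(records, 1.0)

# A very loose threshold accepts every pair, so unrelated records
# are reported as duplicates (false positives).
loose = threshold_duplicates(records, 0.0)

print("strict:", strict)
print("loose:", loose)
```

Any threshold between these two extremes trades one kind of error for the other, because the same cut-off is applied to every pair regardless of how the surrounding data is distributed.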


Examples

Detailed Description

[0016] The following description sets forth techniques that facilitate the detection and elimination of fuzzy duplicate tuples in databases. These techniques can be implemented in numerous ways, including but not limited to program modules, general and special purpose computing systems, special purpose appliances, and as part of one or more computer networks.

[0017] An exemplary implementation of these techniques may be referred to as an "exemplary fuzzy duplicate detector" and is described below.

[0018] An exemplary fuzzy duplicate detector can solve the fuzzy duplicate removal problem. Here, "fuzzy duplicates" are multiple, seemingly distinct tuples (i.e., records) that are not exact matches but represent the same real-world entity or phenomenon. Detecting and eliminating fuzzy duplicates is the fuzzy duplicate removal problem.

[0019] Criteria characterizing duplicates

[0020] In detecting fuzzy duplicates, the exemplary fuzzy duplicate detector utilizes ...

Abstract

At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.

Description

Technical field

[0001] The present invention generally relates to technologies related to databases and data warehouses.

Background

[0002] Decision support analysis on data warehouses influences important business decisions; therefore, the accuracy of such analysis is critical. However, data received at the data warehouse from external sources often contains errors (e.g., spelling mistakes, inconsistent conventions across data sources). These errors often result in duplicate entry of tuples. Therefore, significant time and money are spent on data cleaning, the task of detecting and correcting errors in data.

[0003] The problem of detecting and eliminating duplicate tuples in databases is one of the main problems in the broad field of data cleaning and data quality. It is often the case that the same logical real-world entity has multiple representations within the data warehouse.

[0004] For example, when a customer named Isabel shopped twi...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F17/30303, G06F16/215, G06F11/00, G06F12/00, Y10S707/99937, Y10S707/99942, Y10S707/99943, Y10S707/99945, Y10S707/99932, Y10S707/99933
Inventor: R. Motwani, S. Chaudhuri, V. Ganti
Owner: MICROSOFT TECH LICENSING LLC