
System and method for adaptive multi-cultural searching and matching of personal names

A multicultural personal-name technology, applied in the field of automatic data processing systems. It addresses the significant information-retrieval challenges posed by the behavior and permutations of names, and the inability of existing systems to handle the full range of spelling variation, with the aim of achieving the desired level of precision and recall.

Inactive Publication Date: 2005-12-08
IBM CORP
54 Cites, 27 Cited by

AI Technical Summary

Benefits of technology

The present invention provides an improved system and method for searching names in a database using multiple processing options. It automatically selects an appropriate culture-specific set of algorithms to search for database names and to evaluate their proximity to a query name. The system incorporates a name classifier, a multi-step process, and a cultural intelligence component to identify matches. It also uses key-searching technology based on the International Phonetic Alphabet and selectively applies sets of generic and language-specific spelling rules to infer possible phonological manifestations of personal names. The system integrates fuzzy logic and compensates for transpositions, inversions, and affixes to improve search accuracy. The search methodologies can be used selectively in various combinations, so the system can be customized for different applications.

Problems solved by technology

The nature of names, however, and their behavior and permutations, pose significant challenges to information retrieval.
Other name search or information retrieval systems are generally unable to recognize or address the full range of variation in names.
These systems cannot accommodate even the slightest spelling variations, initials or abbreviations (JOS. Z. BROWN / JOSEPH ZACHARY BROWNE).
Other systems may use techniques or keys (such as Soundex or Soundex-like keys) that permit some minor spelling differences between names (DORSHER / DOERSHER) but these techniques generally fail to cope with significant variation (DOERSHER / DOESHER) or problems posed by names from non-Anglo cultures (ABDEL RAHMAN / ABDURRAMAN).
Some of the more common variants can be accommodated in this way, but retrieval is then limited to those items on the list and cannot accommodate new representations or random variation or keying errors (GOMEZ / BOMEZ).
Although spelling variations can often be addressed through character-matching techniques (e.g., SMITH / SMYTH), false-positive matches can result from traditional string or character comparisons when common morphological endings, such as OVICH, occur at the end of otherwise dissimilar names (e.g., ZELENOVICH / JOVANOVICH).
Character-based systems may also be confronted with significant retrieval problems caused by names with the same pronunciation but with divergent spellings.
Another common cause of name variation, which creates retrieval difficulty for name search systems, is the inclusion or exclusion of name data.
However, this system uses Soundex algorithms to process Unicode input for all cases, rather than providing a name searching system with culture-specific algorithms.
Its architecture included sets of algorithms applicable to different cultures, but no automatic classification of the cultural origin of a name.
None of these earlier systems provide a satisfactory system and method for multicultural name searching.
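
Two of the limitations listed above can be reproduced in a few lines. The following is a minimal sketch, not the method of the invention: `soundex` is a plain implementation of the standard American Soundex key, while `char_similarity` and `strip_suffix` (with its single OVICH suffix) are hypothetical helpers introduced only for illustration. It shows that Soundex tolerates DORSHER / DOERSHER but separates DOERSHER / DOESHER and ABDEL RAHMAN / ABDURRAMAN, and that a shared OVICH ending inflates a character-based similarity score for ZELENOVICH / JOVANOVICH.

```python
from difflib import SequenceMatcher

# Standard American Soundex letter-to-digit table.
_CODES = {ch: digit
          for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                 ("L", "4"), ("MN", "5"), ("R", "6"))
          for ch in letters}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex key (spaces are ignored)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    prev = _CODES.get(first, "")
    digits = []
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # vowels reset prev, so a repeated code is kept after them
    return (first + "".join(digits) + "000")[:4]


def char_similarity(a: str, b: str) -> float:
    """Character-based similarity in [0, 1] (illustrative scorer only)."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def strip_suffix(name: str, suffix: str = "OVICH") -> str:
    """Hypothetical helper: drop one common morphological ending."""
    name = name.upper()
    if name.endswith(suffix) and len(name) > len(suffix):
        return name[:-len(suffix)]
    return name


if __name__ == "__main__":
    for n in ("DORSHER", "DOERSHER", "DOESHER", "ABDEL RAHMAN", "ABDURRAMAN"):
        print(n, soundex(n))
    print(char_similarity("ZELENOVICH", "JOVANOVICH"))
    print(char_similarity(strip_suffix("ZELENOVICH"), strip_suffix("JOVANOVICH")))
```

DORSHER and DOERSHER share the key D626, while DOESHER yields D260, so a Soundex-only index would miss that variant. Comparing the stems after removing the shared ending gives a much lower score than comparing the full names, which is one reason affixes need separate treatment.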


Embodiment Construction

[0041] FIG. 1 shows a multi-algorithmic name search system 100 according to a first preferred embodiment of the present invention, in block schematic form. In this embodiment, system 100 sequentially performs three basic processes. First, system 100 selects a search strategy based on the cultural origin, distribution, language or ethnicity of the name in question and pre-processes the name to break it into its component parts for processing. Second, a subset of the available database records is selected, based on a culture-relevant key-indexing strategy. The objective of this subsetting process is to select a set of keys that are likely matches for the name in question. Finally, the records selected in the second process are subjected to a similarity measurement, using a complex algorithm tailored according to the selected search strategy, to evaluate and rank-order potential matches. Thus, system 100 adopts a search strategy that is specific to the ethnicity or cultural origin of th...
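
The three processes in paragraph [0041] can be sketched as a small pipeline. This is an illustrative outline under stated assumptions, not the patented implementation: the classification rule, the two key functions, the culture labels, and the use of `difflib.SequenceMatcher` as the similarity measure are all placeholders invented for the example, and a real system would look keys up in a precomputed index rather than scanning a list.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Uppercase the name and keep letters only."""
    return "".join(ch for ch in name.upper() if ch.isalpha())


def classify_culture(name: str) -> str:
    """Step 1 (placeholder rule): choose a strategy label for the query."""
    return "arabic" if normalize(name).startswith("ABD") else "generic"


def generic_key(name: str) -> str:
    """Placeholder key: first letter of the last name token."""
    tokens = name.upper().split()
    return tokens[-1][:1] if tokens else ""


def arabic_key(name: str) -> str:
    """Placeholder key: first letter of the whole name, ignoring spaces."""
    return normalize(name)[:1]


def ratio_score(query: str, candidate: str) -> float:
    """Placeholder similarity measure shared by both strategies."""
    return SequenceMatcher(None, normalize(query), normalize(candidate)).ratio()


# Each strategy pairs a key function with a scoring function.
STRATEGIES = {
    "generic": (generic_key, ratio_score),
    "arabic": (arabic_key, ratio_score),
}


def search(query: str, database: list[str], top_n: int = 5):
    key_fn, score_fn = STRATEGIES[classify_culture(query)]      # step 1
    query_key = key_fn(query)
    candidates = [n for n in database if key_fn(n) == query_key]  # step 2
    scored = [(n, score_fn(query, n)) for n in candidates]        # step 3
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    db = ["ABDURRAMAN", "JOSEPH BROWNE", "ABDEL RAHMAN", "ALAN SMITH"]
    for name, score in search("ABDEL RAHMAN", db):
        print(f"{name}: {score:.2f}")
```

Keeping the key function and the scorer together in one strategy entry mirrors the idea that both the subsetting step and the similarity measurement depend on the strategy chosen in the first step.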


Abstract

An automated name searching system incorporates an automatic name classifier and a multi-path architecture in which different algorithms are applied based on the cultural identity of the query name. The name classifier operates with a preemptive list, analysis of morphological elements, length, and linguistic rules. A name regularizer produces a character-based computational representation of the name. A pronunciation-equivalent representation, such as an IPA representation, and language-specific rules for generating name-searching keys are used in a first pass to eliminate database entries that are obviously not matches for the query name. The methods can also be implemented as a callable set of library routines, including an intelligent preprocessor and a name evaluator that produces a score comparing a query name and a database name, based on a variety of user-adjustable parameters. The user-controlled parameters permit tuning of the search methodologies for specific custom applications.

Description

[0001] This application claims the benefit of U.S. Provisional Patent Application Ser. No. 60/079,233, filed Mar. 25, 1998, the entire disclosure of which is incorporated herein by reference.

[0002] A portion of this disclosure contains material in which copyright is claimed by the applicant and/or others. The copyright owner has no objection to the copying of this material in the course of making copies of the application file or any patents that may issue on the application, but all other rights whatsoever in the copyrighted material are reserved.

FIELD OF THE INVENTION

[0003] The present invention relates generally to automatic data processing systems that search and retrieve records from a database based on matching of personal names, and to improved systems and methods for intelligently processing name comparisons.

BACKGROUND OF THE INVENTION

[0004] Information about individuals is often stored in a computer. Access to that information is most readily gained by using the name of...


Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F7/00, G06F17/30, G06F40/00
CPC: Y10S707/99936, Y10S707/99933, G06F16/33, Y10S707/99942, G06F16/90344, Y10S707/99945
Inventor: HERMANSEN, JOHN CHRISTIAN; SHAEFER, LEONARD ARTHUR JR.; MCCALLUM-BAYLISS, HEATHER; LUTZ, RICHARD D.
Owner: IBM CORP