Method for performing efficient similarity search

A similarity search technology, applied in the field of similarity search methods, that addresses the problems of limited scalability to large collections of objects and limited efficiency, while only marginally affecting the accuracy of the produced ranking.

Inactive Publication Date: 2010-04-29
ESULI ANDREA +1

AI Technical Summary

Benefits of technology

"The patent text describes a method for performing similarity searches in a collection of objects using an index data structure. The method performs approximate k nearest neighbors analysis, which permits efficient and fast searches. Similarity search methods are divided into two types: exact methods and approximate methods. Exact methods may require a linear scan of the entire database to retrieve the objects that meet the constraint of the query, while approximate methods relax some of the constraints to speed up the search process. The method uses a data structure that scales well with the database size and supports efficient retrieval of objects. The technical effect of the patent is to provide a faster and more scalable method for performing similarity searches in large databases."

Problems solved by technology

This simple data organization results in limited scalability to large collections of objects, due to the large amount of main memory required to store the sequences, and in limited efficiency, due to the non-optimized pattern of disk accesses needed to retrieve the objects to be compared with the query.
In [1], the authors propose two optimizations that improve the efficiency of the search process while only marginally affecting the accuracy of the produced ranking.

Embodiment Construction

[0048]This section describes the data structures defined by the invention, the input values taken by the invention to build and access such data structures, and how the data structures are used to provide an efficient similarity search functionality.

7.1 Data Structures

[0049]This section describes the data structure, i.e. the index, defined by the invention.

[0050]The invention performs approximate k-NN similarity search on a database D of objects belonging to a domain O, on the basis of a distance function d: O × O → ℝ.
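For reference, the exact k-NN problem that the invention approximates can be sketched as a linear scan over the database. This is only an illustrative baseline, not the method of the invention; the function name `exact_knn`, the helper `abs_diff`, and the toy numeric data are assumptions introduced for the example.

```python
import heapq
from typing import Callable, List, Sequence


def exact_knn(query: float,
              database: Sequence[float],
              d: Callable[[float, float], float],
              k: int) -> List[float]:
    """Return the k objects of `database` closest to `query` under `d`.

    Every object is compared with the query, so the cost grows linearly
    with the size of the database.
    """
    return heapq.nsmallest(k, database, key=lambda obj: d(query, obj))


def abs_diff(a: float, b: float) -> float:
    """Toy distance function on real numbers (hypothetical example)."""
    return abs(a - b)


D = [1.0, 4.0, 9.0, 16.0, 25.0]
nearest = exact_knn(5.5, D, abs_diff, k=2)
print(nearest)
```

An approximate method trades some accuracy of this ranking for the ability to avoid comparing the query with every object.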

[0051]To build the index, the invention takes as input a set of reference objects R belonging to the domain O, where each object r ∈ R is uniquely identified by a number from 0 to #R−1 (the #X operator returns the number of elements in the set X), that is, R = {r_0, r_1, . . . , r_{#R−1}}.

[0052]The invention uses a function ƒ_I(o, R, d, l) (FIG. 3) that, given an element o ∈ O, the set of reference objects R, and the distance function d, returns a sequence s...
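The excerpt is cut off before it says how the sequence is built, so the following is only a guess at one plausible sequence generation function, not the definition given in the patent. It assumes that the sequence lists the identifiers of the l reference objects closest to o, ordered by increasing distance, and that l is the sequence length. The name `sequence_of` and the toy data are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def sequence_of(o: float,
                R: Sequence[float],
                d: Callable[[float, float], float],
                l: int) -> Tuple[int, ...]:
    """Return a sequence of reference-object identifiers for object `o`.

    ASSUMPTION (not confirmed by the excerpt): the sequence holds the
    identifiers of the `l` reference objects nearest to `o`, sorted by
    increasing distance. Ties are broken by the smaller identifier.
    """
    order = sorted(range(len(R)), key=lambda i: (d(o, R[i]), i))
    return tuple(order[:l])


# Reference objects are identified by their position, 0 to #R-1.
R = [0.0, 10.0, 20.0, 30.0]
s = sequence_of(12.0, R, lambda a, b: abs(a - b), l=2)
print(s)
```

Under this assumption, objects that are close to each other tend to see the reference objects in a similar order and therefore receive sequences that share a prefix.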

Abstract

The present invention provides systems and methods for performing efficient approximate k-NN similarity search on a database of objects. The invention is based on the definition of an index data structure that enables fast searches and very good scalability with respect to the database size. The index makes efficient use of both the main and the secondary memory of the computer, taking advantage of the specific properties of each kind of memory.

A prefix tree is built on all the sequences assigned to the database objects by a sequence generation function. The prefix tree is stored in the main memory.

The information required to identify each database object and to compute the similarity between database objects and query objects is stored in a data storage kept in the secondary memory.

Given a query object and a request for the k nearest neighbors, the search functionality of the invention uses the prefix tree to quickly identify a set of candidate objects. The organization of the data storage is then used to efficiently retrieve the information relative to the candidate objects. This information is used to compute the similarity of each candidate object with the query, in order to select the k most similar ones, which are returned as the result.
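The pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the sequence generation function (`sequence_of`), the candidate selection rule (follow the query's sequence down the tree, then back off towards the root until at least `min_candidates` objects are found), the class name `PrefixTreeIndex`, and the in-memory list standing in for the secondary-memory data storage are all hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

LEAF = "ids"  # key under which a tree node stores object identifiers


def sequence_of(o: float, R: Sequence[float],
                d: Callable[[float, float], float], l: int) -> Tuple[int, ...]:
    # ASSUMPTION: identifiers of the l nearest reference objects, by distance.
    order = sorted(range(len(R)), key=lambda i: (d(o, R[i]), i))
    return tuple(order[:l])


class PrefixTreeIndex:
    def __init__(self, R: Sequence[float],
                 d: Callable[[float, float], float], l: int) -> None:
        self.R, self.d, self.l = R, d, l
        self.root: Dict = {}             # prefix tree, kept in main memory
        self.storage: List[float] = []   # stand-in for secondary-memory storage

    def add(self, obj: float) -> None:
        """Store the object and register its identifier in the prefix tree."""
        obj_id = len(self.storage)
        self.storage.append(obj)
        node = self.root
        for ref_id in sequence_of(obj, self.R, self.d, self.l):
            node = node.setdefault(ref_id, {})
        node.setdefault(LEAF, []).append(obj_id)

    def _ids(self, node: Dict) -> List[int]:
        """Collect all object identifiers stored in the subtree of `node`."""
        ids = list(node.get(LEAF, []))
        for key, child in node.items():
            if key != LEAF:
                ids.extend(self._ids(child))
        return ids

    def search(self, query: float, k: int, min_candidates: int) -> List[float]:
        # 1. Follow the query's sequence as deep as the tree allows.
        path = [self.root]
        for ref_id in sequence_of(query, self.R, self.d, self.l):
            child = path[-1].get(ref_id)
            if child is None:
                break
            path.append(child)
        # 2. Back off towards the root until enough candidates are found.
        candidates: List[int] = []
        for node in reversed(path):
            candidates = self._ids(node)
            if len(candidates) >= min_candidates:
                break
        # 3. Fetch the candidates and rank them by true distance to the query.
        objects = (self.storage[i] for i in candidates)
        return heapq.nsmallest(k, objects, key=lambda obj: self.d(query, obj))


index = PrefixTreeIndex(R=[0.0, 10.0, 20.0, 30.0],
                        d=lambda a, b: abs(a - b), l=2)
for obj in [1.0, 2.0, 11.0, 12.0, 13.0, 29.0]:
    index.add(obj)

result = index.search(12.4, k=2, min_candidates=2)
print(result)
```

In this sketch only the candidate objects are compared with the query, which is where the saving over a full scan comes from. Because objects outside the selected subtree are never examined, the result is approximate: a true nearest neighbor whose sequence differs from the query's can be missed.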

Description

1 PROVISIONAL LINK

Related U.S. Application Data

[0001]Provisional application No. 61/108,943, filed 28 Oct. 2008, by the same inventors as the present application.

2 FIELD OF THE INVENTION

[0002]This invention relates generally to methods for performing similarity searches in a collection of objects. In particular, the invention performs approximate k nearest neighbors analysis using a particular index data structure that permits efficient and fast searches.

3 BACKGROUND

[0003]Many modern applications need to find, in a database, objects similar to a given one, on the basis of a degree of similarity. Similarity search methods are well suited to this problem. In these methods, a distance function is used to determine whether one object is similar to another: the smaller the distance between two objects, the higher their relative similarity.

[0004]More formally, the problem can be expressed in the following way:

[0005]a database D contains obj...

Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F17/30, G06F7/00
CPC: G06K9/6276, G06F17/30961, G06F16/9027, G06F18/24147
Inventors: ESULI, ANDREA; GALEOTTI, CRISTINA
Owner: ESULI ANDREA