Method for performing efficient similarity search

An efficient similarity search technique, applied in the field of similarity search methods, that addresses the problems of limited scalability to large collections of objects, limited efficiency, and marginally affected accuracy of the produced ranking.

Inactive Publication Date: 2010-04-29
ESULI ANDREA +1

AI Technical Summary

Benefits of technology

[0032]The main contribution of the invention is the definition of an index data structure that enables fast searches and very good scalability with respect to the database size. The index makes efficient use of both the main and secondary memory of the computer, taking advantage of the specific properties of each kind of memory. The main memory is a relatively small but very fast random-access memory that allows fast access to, and navigation through, complex data structures. The secondary memory is permanent storage that can hold large amounts of data. It is orders of magnitude slower than the main memory, but it still guarantees good I/O performance for sequential accesses.

Problems solved by technology

This simple data organization results in limited scalability to large collections of objects, due to the large amount of main memory required to store the sequences, and in limited efficiency, due to the non-optimized pattern of disk accesses used to retrieve the objects to be compared with the query.
In [1], the authors propose two optimizations that improve the efficiency of the search process while only marginally affecting the accuracy of the produced ranking.



Examples


Embodiment Construction

[0048]This section describes the data structures defined by the invention, the input values taken by the invention to build and access such data structures, and how the data structures are used to provide an efficient similarity search functionality.

7.1 Data Structures

[0049]This section describes the data structure, i.e. the index, defined by the invention.

[0050]The invention performs approximate k-NN similarity search on a database D of objects belonging to a domain 𝒪, on the basis of a distance function d: 𝒪×𝒪→ℝ.

[0051]In order to build the index, the invention takes as input a set of reference objects R belonging to the domain 𝒪, where each object r∈R is uniquely identified by a number from 0 to #R−1, and the #X operator returns the number of elements in the set X; that is, R = {r_0, r_1, ..., r_{#R−1}}.

[0052]The invention uses a function ƒI(o, R, d, l) (FIG. 3) that, given an element o∈𝒪, the set of reference objects R and the distance function d, returns a sequence s...
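
A minimal Python sketch of one possible sequence generation function with the same inputs as ƒI(o, R, d, l) follows. The text above does not state how the sequence is constructed, so the sketch assumes that it lists the identifiers of the l reference objects closest to o, ordered by increasing distance, with ties broken by identifier. The name `sequence_prefix` and the sample values are hypothetical.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sequence_prefix(
    o: T,
    refs: Sequence[T],
    d: Callable[[T, T], float],
    l: int,
) -> Tuple[int, ...]:
    """Return the ids of the l reference objects closest to o, nearest first.

    Assumption (not taken from the text): the sequence is a prefix of the
    permutation of reference ids sorted by distance from o.
    """
    if not 0 < l <= len(refs):
        raise ValueError("l must be between 1 and the number of reference objects")
    # Reference objects are identified by their position, 0 .. #R-1.
    order = sorted(range(len(refs)), key=lambda i: (d(o, refs[i]), i))
    return tuple(order[:l])


# Four reference objects on the real line, absolute difference as distance.
refs = [0.0, 10.0, 4.0, 7.0]
dist = lambda a, b: abs(a - b)
print(sequence_prefix(5.0, refs, dist, 2))  # (2, 3)
```

Objects that are close to each other tend to see the reference objects in a similar order, which is why sequences of this kind can serve as keys for grouping candidates.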


Abstract

The present invention provides systems and methods for performing efficient approximate k-NN similarity search on a database of objects. The invention is based on the definition of an index data structure that enables fast searches and very good scalability with respect to the database size. The index makes efficient use of both the main and secondary memory of the computer, taking advantage of the specific properties of each kind of memory.

A prefix tree is built on all the sequences assigned to the database objects by a sequence generation function. The prefix tree is stored in the main memory.

The information required to identify each database object and to compute the similarity between database objects and query objects is stored in a data storage kept in the secondary memory.

Given a query object and a request for the k nearest neighbors, the search functionality of the invention uses the prefix tree to quickly identify a set of candidate objects. The organization of the data storage is then used to efficiently retrieve the information relative to the candidate objects. This information is used to compute the similarity of each candidate object with the query, in order to select the k most similar ones, which are returned as the result.
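
The pipeline in the abstract can be sketched in Python as follows. This is an illustrative toy under stated assumptions, not the patented implementation: the sequence generation function is assumed to return the ids of the l nearest reference objects, the "data storage" is an in-memory list standing in for a file in secondary memory, and the class name `PrefixTreeIndex`, the node layout, and the `min_candidates` parameter are hypothetical.

```python
import heapq


def prefix(o, refs, d, l):
    # Assumed sequence generation: ids of the l nearest reference objects.
    order = sorted(range(len(refs)), key=lambda i: (d(o, refs[i]), i))
    return tuple(order[:l])


class PrefixTreeIndex:
    def __init__(self, objects, refs, d, l):
        self.refs, self.d, self.l = refs, d, l
        # Data storage: records sorted by sequence, so that objects sharing
        # a prefix are contiguous and can be fetched with one sequential read.
        keyed = sorted(
            ((prefix(o, refs, d, l), o) for o in objects), key=lambda t: t[0]
        )
        self.storage = [o for _, o in keyed]
        # Prefix tree kept in main memory: each node records the
        # [start, end) range of storage positions covered by its subtree.
        self.root = {"children": {}, "start": 0, "end": len(keyed)}
        for pos, (seq, _) in enumerate(keyed):
            node = self.root
            for rid in seq:
                child = node["children"].get(rid)
                if child is None:
                    child = {"children": {}, "start": pos, "end": pos}
                    node["children"][rid] = child
                child["end"] = pos + 1
                node = child

    def search(self, q, k, min_candidates=0):
        wanted = max(k, min_candidates)
        node = self.root
        # Follow the query's sequence as deep as possible while the subtree
        # still holds at least `wanted` candidate objects.
        for rid in prefix(q, self.refs, self.d, self.l):
            child = node["children"].get(rid)
            if child is None or child["end"] - child["start"] < wanted:
                break
            node = child
        # One contiguous slice models a sequential read from secondary memory.
        candidates = self.storage[node["start"]:node["end"]]
        # Re-rank the candidates by their true distance from the query.
        return heapq.nsmallest(k, candidates, key=lambda o: self.d(q, o))


# Toy database: integers 0..99, five reference objects, absolute difference.
index = PrefixTreeIndex(
    objects=list(range(100)),
    refs=[10, 30, 50, 70, 90],
    d=lambda a, b: abs(a - b),
    l=2,
)
print(index.search(42, k=3))
```

Only the objects in the selected subtree are compared with the query, so the result is approximate in general: a true neighbor whose sequence differs from the query's is missed. Raising `min_candidates` in this sketch trades speed for accuracy by stopping the descent earlier.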

Description

1 PROVISIONAL LINK

Related U.S. Application Data

[0001]Provisional application No. 61/108,943, filed 28 Oct. 2008, by the same inventors as the present application.

2 FIELD OF THE INVENTION

[0002]This invention relates generally to methods for performing similarity searches in a collection of objects. In particular, the invention performs approximate k-nearest-neighbors analysis using a particular index data structure that permits efficient and fast searches.

3 BACKGROUND

[0003]Many modern applications need to find, in a database, objects similar to a given one, on the basis of a degree of similarity. This problem can be solved advantageously with similarity search methods. In these methods, a distance function is used to determine whether one object is similar to another: the smaller the distance between two objects, the higher their similarity.

[0004]More formally, the problem can be expressed in the following way:

[0005]a database D contains obj...
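
The distance-based notion of similarity described in the background can be illustrated with an exhaustive k-NN search, the exact baseline that approximate methods aim to speed up. This is a minimal Python sketch and not part of the patent text: the function name `knn_exhaustive`, the sample points, and the choice of Euclidean distance are assumptions made for illustration.

```python
import heapq
import math


def knn_exhaustive(database, query, d, k):
    """Exact k-NN: compare the query with every object, keep the k closest."""
    return heapq.nsmallest(k, database, key=lambda o: d(query, o))


# Four 2-D points; a smaller Euclidean distance means a higher similarity.
points = [(0, 0), (3, 4), (1, 1), (6, 8)]
nearest = knn_exhaustive(points, (0, 0.5), math.dist, k=2)
print(nearest)  # [(0, 0), (1, 1)]
```

The cost of this baseline grows linearly with the database size, since every object is compared with the query.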


Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F17/30, G06F7/00
CPC: G06K9/6276, G06F17/30961, G06F16/9027
Inventors: ESULI, ANDREA; GALEOTTI, CRISTINA
Owner: ESULI ANDREA