Distributed index method based on LSH (Locality Sensitive Hashing)

Locality-sensitive hashing combined with distributed computing, applied in the computer field, addresses the poor scalability, low search efficiency, and long response time of conventional locality-sensitive hashing. The method overcomes the limits of a single node, improves search efficiency, and increases processing capacity.

Inactive Publication Date: 2014-04-23
NANJING UNIV

AI Technical Summary

Problems solved by technology

[0007] The problem addressed by the present invention is that, in retrieval and query of high-dimensional data, traditional locality-sensitive hashing scales poorly and searches inefficiently; in online search, computation designed for offline processing takes too long to respond.



Examples


Embodiment Construction

[0029] The invention addresses the shortcomings of the prior art by proposing a distributed index method based on locality-sensitive hashing for approximate nearest neighbor search over massive data, which improves search efficiency. It can serve as a search service for downstream applications such as multimedia information retrieval, multimedia data classification, and data mining. The invention is described in detail below with reference to the accompanying drawings.

[0030] In this embodiment, the original data set is first clustered, and each class is then mapped to a different computer node, as shown in Figure 1. On each node, locality-sensitive hashing is applied to all data points on that node, and the hash results are stored in hash buckets, which serve as the index files. The method includes the following steps:

[0031] A general search framework consists of two phases: index creation a...
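The layout described in [0030] (cluster, map each class to a node, hash on each node) can be sketched as follows. This is a minimal single-process illustration, not the patent's implementation: it assumes the cluster centers have already been computed by some clustering algorithm, simulates each node as an in-memory object, and uses the standard p-stable hash h(v) = floor((a·v + b) / w) with Gaussian projections. The names `nearest_center`, `PStableLSH`, and `build_distributed_index`, and the parameters `k`, `w`, and `seed`, are hypothetical.

```python
import math
import random
from collections import defaultdict


def nearest_center(point, centers):
    """Return the index of the cluster center closest to `point` (Euclidean)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


class PStableLSH:
    """One hash table keyed by k p-stable hashes h(v) = floor((a.v + b) / w)."""

    def __init__(self, dim, k=4, w=4.0, seed=0):
        rng = random.Random(seed)
        self.w = w
        # Gaussian entries are 2-stable, which suits Euclidean distance.
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
        self.b = [rng.uniform(0.0, w) for _ in range(k)]
        self.buckets = defaultdict(list)  # bucket key -> list of point ids

    def key(self, v):
        """Concatenate the k hash values into one bucket key."""
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in zip(self.a, self.b)
        )

    def add(self, point_id, v):
        self.buckets[self.key(v)].append(point_id)


def build_distributed_index(points, centers, **lsh_kwargs):
    """Send each point to the node of its nearest center and hash it there."""
    dim = len(centers[0])
    nodes = [PStableLSH(dim, **lsh_kwargs) for _ in centers]  # one index per node
    for pid, v in enumerate(points):
        nodes[nearest_center(v, centers)].add(pid, v)
    return nodes
```

In this sketch every node uses the same seed and therefore the same hash functions; a real deployment could just as well draw them independently per node.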


Abstract

The invention relates to a distributed index method based on LSH (Locality Sensitive Hashing). The method comprises the following steps: first, a clustering algorithm clusters the massive data set; next, the cluster centers are mapped to different computing nodes; then the original image or video feature data are mapped to the computing node corresponding to their class, so that each node handles one class; finally, a p-stable-distribution LSH method builds the data index on each node. To reduce the time spent merging search results from different computing nodes and to improve result quality, the invention provides two methods for selecting the m nearest classes for subsequent detailed search. The invention gives guidance for automatically mapping massive data to different computing nodes; in addition, the method effectively reduces the number of detailed comparisons performed during an LSH search, making search over massive data more accurate and efficient.
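The idea of restricting detailed search to the m nearest classes can be illustrated with a short sketch. The abstract does not say what the invention's two selection methods are, so the rule below (pick the m cluster centers with the smallest Euclidean distance to the query) is only an assumed stand-in. The names `select_nearest_classes` and `search` are hypothetical, and `node_candidates` stands in for whatever candidate list each node's LSH lookup would return.

```python
import heapq
import math


def select_nearest_classes(query, centers, m):
    """Return the indices of the m cluster centers closest to the query."""
    return heapq.nsmallest(
        m, range(len(centers)), key=lambda i: math.dist(query, centers[i])
    )


def search(query, centers, node_candidates, m, top_k):
    """Merge candidates from the m selected nodes and rank by exact distance.

    node_candidates[i] is an iterable of (point_id, vector) pairs that
    node i would return for this query (assumed, e.g. from its LSH buckets).
    """
    hits = []
    for i in select_nearest_classes(query, centers, m):
        for pid, v in node_candidates[i]:
            hits.append((math.dist(query, v), pid))
    # Only m nodes contribute, so the merge step stays small.
    return heapq.nsmallest(top_k, hits)
```

Because only m nodes are consulted, a true neighbor stored on an unselected node is missed; this is the usual accuracy-for-speed trade of approximate search, and m controls it.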

Description

Technical field

[0001] The invention belongs to the field of computer technology. It relates to a distributed index and query method for high-dimensional data spaces and to approximate nearest neighbor search, and in particular to a distributed index method based on locality-sensitive hashing.

Background

[0002] Traditional nearest neighbor search plays an important role in information retrieval, data mining, databases, and other applications. Classic tree-based search methods usually perform well on low-dimensional data with fewer than 10 dimensions. However, the feature vectors used to analyze data typically have far more than 10 dimensions, so these tree-based methods all degrade to the efficiency of linear search, a phenomenon known as the "curse of dimensionality".

[0003] In the last 10 years, some scholars have proposed using approximate nearest neighbor search in place of traditional nearest neighbor search. This me...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/244, G06F16/2255, G06F16/2471
Inventors: 武港山, 徐向阳
Owner: NANJING UNIV