LSH (Locality Sensitive Hashing)-based clustering and indexing method and LSH-based clustering and indexing system

A locality-sensitive hashing (LSH) clustering and indexing technology, applied in the field of information filtering. It addresses the uneven distribution of data points in the hash table, the limited randomness in hash function selection, and the lack of improvement in query matching speed. The stated effects are improved accuracy, improved query efficiency, and stable query performance.

Active Publication Date: 2014-03-12
INST OF INFORMATION ENG CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

However, because data sets are irregularly distributed, data points are unevenly distributed across the locality-sensitive hash table. In addition, the original LSH method hashes the entire data set, which limits matching speed and makes matching performance sensitive to non-uniformity in the data distribution.
Some papers improve how well the hash functions fit the data set by optimizing the LSH parameters, but requiring the hash functions to adapt to the data set limits the randomness with which they can be selected. These methods still hash and query the entire data set, so query matching speed is no better than with the original LSH.

Examples

Embodiment Construction

[0051] The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the present invention and are not intended to limit its scope.

[0052] Traditional search and matching strategies use a tree index structure. Search is fast when the data dimension is low, but when the dimension exceeds ten it can be even slower than linear search. The LSH method maps similar data points to the same hash bucket. At query time it computes the hash value of the query point, takes the points in the hash bucket with the same hash value as candidate points, computes the Euclidean distance between each candidate point and the query point, and returns the nearest neighbor found. Through hash function mapping, the LSH method guarantees that the true nearest neighbor is returned with a certain probability, thus great...
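
The basic LSH query described in [0052] might be sketched as follows. This is a minimal illustration, not the patent's implementation. The hash family (floor of a random Gaussian projection plus offset, divided by a bucket width `w`) is an assumption, since the text does not name one, and the names `make_hash`, `build_table`, and `lsh_query` are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, k, w, rng):
    """Return a function mapping a vector to a tuple of k bucket indices.

    Assumption: each component is floor((a . v + b) / w) with Gaussian a,
    a common hash family for Euclidean distance; the source does not name one.
    """
    params = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
              for _ in range(k)]

    def h(v):
        return tuple(math.floor((sum(x * y for x, y in zip(a, v)) + b) / w)
                     for a, b in params)
    return h


def build_table(points, h):
    """Group point indices into hash buckets keyed by hash value."""
    table = defaultdict(list)
    for i, p in enumerate(points):
        table[h(p)].append(i)
    return table


def lsh_query(q, points, table, h):
    """Return (index, distance) of the closest candidate in q's bucket, or None."""
    candidates = table.get(h(q), [])
    if not candidates:
        return None
    best = min(candidates, key=lambda i: math.dist(points[i], q))
    return best, math.dist(points[best], q)


# Toy data: 200 random 4-dimensional points (illustrative values only).
rng = random.Random(0)
points = [[rng.uniform(0, 10) for _ in range(4)] for _ in range(200)]
h = make_hash(dim=4, k=3, w=4.0, rng=rng)
table = build_table(points, h)
result = lsh_query(points[17], points, table, h)
```

Only the points sharing the query's bucket are compared by Euclidean distance, which is where the speedup over a linear scan comes from. A query whose bucket is empty returns `None` in this sketch; practical systems usually use several tables to reduce that risk.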

Abstract

The invention relates to a clustering and indexing method and system based on Locality Sensitive Hashing (LSH). The method comprises: step 1, performing cluster analysis on a data set, dividing it into a plurality of categories, and determining the cluster center of each category; step 2, building a hash table within each category using the LSH method; step 3, calculating the Euclidean distance between each cluster center and a query point, and selecting the several categories with the smallest Euclidean distances as candidate categories; step 4, calculating the hash value of the query point in each candidate category and, using the hash tables built in step 2, selecting the data points in the candidate categories whose hash values equal that of the query point as candidate points; step 5, calculating the Euclidean distance between each candidate point and the query point, and taking the candidate point with the smallest Euclidean distance as the nearest neighbor of the query point. The method and system greatly increase query efficiency and keep query performance relatively stable at the cost of only a small loss in accuracy.
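
The five steps in the abstract could be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the patented implementation: k-means stands in for the unspecified "clustering analysis", the hash family is a random Gaussian projection with bucket width `w`, and the names `ClusteredLSH`, `kmeans`, `make_hash`, and `n_probe` (the number of candidate categories) are hypothetical.

```python
import math
import random
from collections import defaultdict


def kmeans(points, n_clusters, rng, iters=10):
    """Step 1 (assumed k-means): return (centers, labels)."""
    centers = [list(p) for p in rng.sample(points, n_clusters)]

    def nearest(p):
        return min(range(n_clusters), key=lambda c: math.dist(p, centers[c]))

    for _ in range(iters):
        labels = [nearest(p) for p in points]
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so labels agree with the returned centers.
    return centers, [nearest(p) for p in points]


def make_hash(dim, k, w, rng):
    """Assumed hash family: k values of floor((a . v + b) / w), Gaussian a."""
    params = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
              for _ in range(k)]

    def h(v):
        return tuple(math.floor((sum(x * y for x, y in zip(a, v)) + b) / w)
                     for a, b in params)
    return h


class ClusteredLSH:
    def __init__(self, points, n_clusters=4, k=2, w=4.0, seed=0):
        rng = random.Random(seed)
        self.points = points
        self.centers, labels = kmeans(points, n_clusters, rng)
        dim = len(points[0])
        # Step 2: one hash function and one hash table per category.
        self.hashes = [make_hash(dim, k, w, rng) for _ in range(n_clusters)]
        self.tables = [defaultdict(list) for _ in range(n_clusters)]
        for i, (p, c) in enumerate(zip(points, labels)):
            self.tables[c][self.hashes[c](p)].append(i)

    def query(self, q, n_probe=2):
        # Step 3: candidate categories are the n_probe closest centers.
        order = sorted(range(len(self.centers)),
                       key=lambda c: math.dist(q, self.centers[c]))
        candidates = []
        for c in order[:n_probe]:
            # Step 4: points sharing the query's hash value in that category.
            candidates.extend(self.tables[c].get(self.hashes[c](q), []))
        if not candidates:
            return None
        # Step 5: the closest candidate is reported as the nearest neighbor.
        return min(candidates, key=lambda i: math.dist(self.points[i], q))


# Toy data: 300 random 3-dimensional points (illustrative values only).
rng = random.Random(1)
points = [[rng.uniform(0, 10) for _ in range(3)] for _ in range(300)]
index = ClusteredLSH(points, n_clusters=4)
found = index.query(points[5])
```

Because only the hash tables of the `n_probe` nearest categories are consulted, the query touches a fraction of the data set, which matches the efficiency argument in the abstract. Raising `n_probe` trades speed for a higher chance of finding the true nearest neighbor.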

Description

Technical field

[0001] The present invention relates to the technical field of information filtering, in particular to a clustering index method and system based on Locality Sensitive Hashing (LSH for short).

Background

[0002] The following terms are explained as used in this field; their scope of use is limited to the present invention:

[0003] Hash bucket (HashBucket): to handle hash collisions, multiple elements may be stored at the same position in a hash table, so each position in the hash table represents a hash bucket.

[0004] Nearest neighbor: nearest neighbor search is the optimization problem of finding the closest point in a metric space. Given a point set S in a metric space M and a target point q ∈ M, find the point in S closest to q; that point is the nearest neighbor.

[0005] With the rapid development of the Internet, the number of images on the Internet has grown exponentially. For example, the current number of images on Facebo...
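
The two terms defined in [0003] and [0004] can be illustrated with a small sketch. The toy hash (`key % 4`), the sample data, and the function name `nearest_neighbor` are assumptions chosen only for illustration, and Euclidean distance is assumed as the metric.

```python
import math
from collections import defaultdict

# Hash bucket: keys that collide at the same table position share one bucket.
buckets = defaultdict(list)
for key in [3, 7, 10]:
    buckets[key % 4].append(key)  # toy hash function with 4 positions


def nearest_neighbor(S, q):
    """Return the point in S closest to q (exact linear scan)."""
    return min(S, key=lambda p: math.dist(p, q))


S = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
q = (0.9, 1.2)
nn = nearest_neighbor(S, q)
```

Here keys 3 and 7 land in the same bucket, and the linear scan compares q against every point in S, which is the exact baseline that LSH approximates.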

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/2255, G06F16/285
Inventors: 谢洪涛, 王鹏, 徐克付, 谭建龙
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI