Local-sensitive hash high-dimensional indexing method based on distribution entropy

A local sensitive hash and local sensitive technology, applied in special data processing applications, instruments, electrical digital data processing, etc., can solve the problems of complex learning process and high overall complexity, reduce memory resources, improve retrieval performance, high The effect of query efficiency and precision

Active Publication Date: 2012-07-25
INST OF COMPUTING TECH CHINESE ACAD OF SCI
View PDF3 Cites 39 Cited by
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

The above methods based on machine learning often lead to high overall complexity due to the complex learning process

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
View more

Image

Smart Image Click on the blue labels to locate them in the text.
Viewing Examples
Smart Image
  • Local-sensitive hash high-dimensional indexing method based on distribution entropy
  • Local-sensitive hash high-dimensional indexing method based on distribution entropy
  • Local-sensitive hash high-dimensional indexing method based on distribution entropy

Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0045] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below through specific embodiments in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0046] In an embodiment of the present invention, a local sensitive hash index method based on distribution entropy is provided. Based on the training data set, the method calculates the distribution entropy value of each hash function in the candidate set of locally sensitive hash functions, and selects L hash functions with the highest distribution entropy values ​​as the set of locally sensitive hash functions, and then according to the local A collection of hash functions to store the data set to be indexed in a hash table. More specifically, the method mainly includes the followi...

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to view more

PUM

No PUM Login to view more

Abstract

The invention provides a local-sensitive hash high-dimensional indexing method based on distribution entropy. The method comprises: firstly, generating a local-sensitive hash function candidate set; secondly, calculating the distribution entropy of each hash function in the local-sensitive hash function candidate set according to a training data set, and selecting L hash functions with the highest distribution entropy as the local-sensitive hash function set; thirdly, storing a data set to be indexed to a hash table according to the local-sensitive hash function set; and querying the above hash table by adopting a query algorithm based on triangle inequality filtering and Euclidean distance sorting to obtain a result set similar to the query data. The method can well suit the data distribution by selecting the hash functions with the highest distribution entropy, thereby optimizing the hash table index structure, reducing memory consumption for indexing, and ensuring more accurate and efficient query.

Description

technical field [0001] The invention relates to an indexing method and a query method in a high-dimensional data space, in particular to an approximate nearest neighbor query method. Background technique [0002] The exponential growth of images and videos on the Internet has brought great challenges to the organization and management of information. At the same time, the demand for mass image and video content analysis is also increasing. Content analysis relies on similarity matching between visual feature data extracted from images and videos. These feature data are not only large in number but also hundreds of dimensions at every turn. High-dimensional index research is exactly how to accurately and efficiently query data sets similar to given data from massive high-dimensional databases. The most basic query method of high-dimensional index is the nearest neighbor query. In the Euclidean space, the formal description of the nearest neighbor of the data x in the databa...

Claims

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to view more

Application Information

Patent Timeline
no application Login to view more
Patent Type & Authority Applications(China)
IPC IPC(8): G06F17/30
Inventor 张伟高科张勇东李锦涛
Owner INST OF COMPUTING TECH CHINESE ACAD OF SCI
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Try Eureka
PatSnap group products