Efficient distributed locality-sensitive hashing method

A locality-sensitive hashing and distributed computing technique, applied in instrumentation, database indexing, computing, etc., that reduces message transmission overhead and improves query performance and efficiency.

Active Publication Date: 2021-10-01
NAT UNIV OF DEFENSE TECH

AI Technical Summary

Problems solved by technology

[0007] To address the similarity search problem for massive high-dimensional data, the present invention implements an efficient distributed locality-sensitive hashing method on the Spark platform.

Embodiment Construction

[0028] To make the technical solutions of this application easier to understand, they are described clearly and in detail below with reference to the drawings and specific embodiments of this application:

[0029] As shown in Figure 1, the present invention discloses an efficient distributed locality-sensitive hashing method, designed and implemented on the distributed computing framework Spark, which includes the following parts:

[0030] A client is an application that defines a specific task, such as building a hash table or executing a query. The client submits the task to the master node, which schedules it and dispatches it to each computing node for parallel execution; the client then waits to receive the calculation result.

[0031] The master node communicates with the client and the worker nodes, receives the jobs submitted by clients, divides each job into a set of tasks, schedules the tasks accordin...
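The client, master, and worker roles described above can be illustrated with a small stand-alone sketch. This is not Spark code: a thread pool stands in for the worker nodes, and `worker_task` and `master_run` are hypothetical names introduced only for illustration. The splitting rule (round-robin slices) is likewise an assumption, since the source does not say how jobs are divided.

```python
from concurrent.futures import ThreadPoolExecutor


def worker_task(chunk):
    """Hypothetical per-worker task: count the items in one slice of the job."""
    return len(chunk)


def master_run(job_data, num_workers=3):
    """Split a job into tasks, run them in parallel, and gather the results."""
    # Assumption: round-robin slicing, one task per worker.
    tasks = [job_data[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves task order, so results line up with tasks.
        return list(pool.map(worker_task, tasks))


# Client side: submit a job, then wait for and combine the partial results.
partial_counts = master_run(list(range(10)))
total = sum(partial_counts)
```

In an actual Spark deployment the driver and cluster manager play the scheduling role, and the tasks operate on dataset partitions rather than list slices.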

Abstract

A distributed locality-sensitive hashing method, comprising: loading original data from a distributed file system, reading a set of original data vectors, and generating a first resilient distributed dataset (RDD); constructing L composite hash functions given a quantity k; computing the L hash values of each data item, thereby mapping each item to a hash bucket in each hash table; for each of the L hash values, merging the key-value pair formed by the hash table identifier and the composite hash function value into a string and mapping that string to a numeric key, then pairing the numeric key with the data identifier and saving the resulting key-value pairs as a second RDD; and repartitioning the second dataset by the numeric key of each data item, so that data with the same numeric key is stored in the same partition, which completes construction of the hash tables. The method reduces the amount of shuffling generated while constructing the hash tables, improves the efficiency of index construction, and reduces message transmission overhead during queries.
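The construction steps in the abstract can be sketched in plain Python. This is a single-process illustration under stated assumptions, not the patented Spark implementation: the hash family (Gaussian random projections with bucket width `w`), the reading of k as the number of elementary hash functions per composite function, the use of CRC32 for the string-to-number mapping, and all function names are assumptions introduced here. A list of dictionaries stands in for the repartitioned second dataset.

```python
import math
import random
import zlib
from collections import defaultdict


def make_composite_hashes(dim, k, L, w=4.0, seed=0):
    """Build L composite hash functions, each a list of k (a, b) pairs.

    Assumption: h(v) = floor((a . v + b) / w) with Gaussian a and uniform b.
    """
    rng = random.Random(seed)
    return [
        [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
         for _ in range(k)]
        for _ in range(L)
    ]


def composite_value(g, v, w=4.0):
    """Evaluate one composite hash function g on vector v."""
    return tuple(
        math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
        for a, b in g
    )


def numeric_key(table_id, value):
    """Merge (table id, composite hash value) into a string, then map it to an integer."""
    merged = f"{table_id}:{','.join(map(str, value))}"
    return zlib.crc32(merged.encode("utf-8"))  # assumed string-to-number mapping


def build_index(vectors, hashes, num_partitions=4, w=4.0):
    """Return partitions[p][key] -> list of data ids sharing that numeric key."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for data_id, v in vectors.items():
        for table_id, g in enumerate(hashes):
            key = numeric_key(table_id, composite_value(g, v, w))
            # Equal keys always land in the same partition.
            partitions[key % num_partitions][key].append(data_id)
    return partitions


vectors = {0: [1.0, 2.0], 1: [1.0, 2.0], 2: [50.0, -40.0]}
hashes = make_composite_hashes(dim=2, k=3, L=2)
index = build_index(vectors, hashes)
```

Because the table identifier is folded into the key, a single keyed dataset can hold all L tables, and one repartition by key places every bucket in exactly one partition.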

Description

Technical field

[0001] The invention belongs to the field of big-data mining in Internet technology, and in particular relates to a distributed implementation of locality-sensitive hashing that accelerates similarity search over massive high-dimensional data.

Background

[0002] Similarity search is an important problem in multimedia information retrieval: finding the object with the highest similarity (or the smallest distance) to a given query. To improve the efficiency of similarity search, indexing methods such as KD-tree, R-tree, and SR-tree were proposed one after another, and they work well in low-dimensional spaces. As the data dimension increases, however, the performance of these methods declines sharply, a phenomenon known as the "curse of dimensionality". To overcome it, many approximate search methods have been proposed; one of the best known is Locality Sensitive...
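As a point of reference for the definition of similarity search, the following minimal sketch performs an exact nearest-neighbor search by exhaustive scan. The Euclidean metric and the names `euclidean` and `nearest_neighbor` are assumptions chosen for illustration; the source does not fix a distance measure. Each query costs time proportional to the number of vectors times their dimension, which is the baseline that index structures and approximate methods aim to improve on.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbor(query, dataset):
    """Return (id, distance) of the closest vector, found by scanning every item."""
    best_id = min(dataset, key=lambda i: euclidean(query, dataset[i]))
    return best_id, euclidean(query, dataset[best_id])


dataset = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [10.0, 10.0]}
best_id, best_dist = nearest_neighbor([2.0, 3.0], dataset)
```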

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/22, G06F16/2458, G06F16/2453, G06F16/2455
CPC: G06F16/2255, G06F16/2453, G06F16/24554, G06F16/2471
Inventors: 张万新, 李东升, 徐颖
Owner: NAT UNIV OF DEFENSE TECH