A high-dimensional data nearest neighbor query method based on variable-length Hash coding

A hash coding technique for high-dimensional data, applied in the field of information retrieval. It addresses problems in existing methods such as insufficient use of data set distribution information, short query point code length, and loss of information.

Active Publication Date: 2019-04-26
NINGBO UNIV

AI Technical Summary

Problems solved by technology

Related hashing techniques do not make full use of the distribution information of the data set. In addition, the query point code is short and retains little information, so these techniques need to be improved.


Examples

Detailed Description of Embodiments

[0012] The present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments.

[0013] A nearest neighbor query method for high-dimensional data based on variable-length hash coding, comprising the following steps:

[0014] ① Obtain an original high-dimensional data set containing multiple original high-dimensional data items, together with a given query point. Perform a low-dimensional mapping on the original high-dimensional data set to generate a set of random Fourier feature vectors, one for each original high-dimensional data item.
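The excerpt does not give the mapping formula for step ①. As a minimal sketch only, the block below uses the standard random Fourier feature construction for a Gaussian kernel, z(x) = sqrt(2/m) · cos(Wx + b). The function name `random_fourier_features` and the parameters `out_dim`, `gamma`, and `seed` are hypothetical and do not come from the patent.

```python
import math
import random


def random_fourier_features(data, out_dim, gamma=1.0, seed=0):
    """Map each input vector to an out_dim-dimensional random Fourier feature vector.

    Assumption: the target kernel is exp(-gamma * ||x - y||^2), so the
    projection weights are drawn from N(0, 2 * gamma) and the phase
    offsets from U[0, 2*pi]. The patent excerpt does not specify this.
    """
    rng = random.Random(seed)
    in_dim = len(data[0])
    std = math.sqrt(2.0 * gamma)
    # One random projection direction and one random phase per output dimension.
    weights = [[rng.gauss(0.0, std) for _ in range(in_dim)] for _ in range(out_dim)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(out_dim)]
    norm = math.sqrt(2.0 / out_dim)

    features = []
    for x in data:
        z = [
            norm * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
            for w, b in zip(weights, phases)
        ]
        features.append(z)
    return features


if __name__ == "__main__":
    points = [[0.0, 1.0, 2.0], [1.0, 0.0, -1.0]]
    for z in random_fourier_features(points, out_dim=4):
        print(z)
```

Choosing `out_dim` smaller than the input dimension gives the low-dimensional mapping described in the step. The query point would be mapped with the same weights and phases so that its features are comparable to those of the data set.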

[0015] ② Encode according to the hash value of each random Fourier feature vector to obtain the hash code corresponding to each original high-dimensional data item, and count the number of occurrences of each hash code among all hash codes to obtain a coding frequency representing how often each hash code occurs; the hash codes with the same coding frequency are us...
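The encoding and counting in step ② can be sketched as follows. The excerpt does not say how a feature vector is turned into bits, so sign thresholding is used here purely as an assumption. The helper names `encode` and `group_by_code_frequency` are hypothetical, and the descending sort order of the groups is also an assumption.

```python
from collections import Counter, defaultdict


def encode(features):
    """Turn each feature vector into a bit string.

    Assumption: a component >= 0 becomes '1', otherwise '0'.
    """
    return ["".join("1" if v >= 0.0 else "0" for v in z) for z in features]


def group_by_code_frequency(codes):
    """Count how often each hash code occurs and group item indices by that count.

    Returns (freq, groups): freq maps code -> number of occurrences,
    groups maps a coding frequency -> list of item indices whose code
    occurs exactly that many times, ordered from most to least frequent.
    """
    freq = Counter(codes)
    groups = defaultdict(list)
    for idx, code in enumerate(codes):
        groups[freq[code]].append(idx)
    return freq, dict(sorted(groups.items(), reverse=True))


if __name__ == "__main__":
    feats = [[0.5, -0.2], [0.1, -0.9], [-0.3, 0.4]]
    codes = encode(feats)
    freq, groups = group_by_code_frequency(codes)
    print(codes, dict(freq), groups)
```

Items that share a frequently occurring code lie in dense regions of the code space, which is one way the grouping can reflect the distribution of the data set.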

Abstract

The invention discloses a nearest neighbor query method for high-dimensional data based on variable-length hash coding. The method comprises: first obtaining an original high-dimensional data set and a given query point; generating a set of random Fourier feature vectors; obtaining the hash code corresponding to each original high-dimensional data item and the coding frequency of each hash code; taking the hash codes with the same coding frequency as one group of sub-data sets and sorting the sub-data sets; setting a compression ratio for each group of sub-data sets; compressing and training each group of sub-data sets according to its compression ratio; obtaining the hash codes and original codes corresponding to each trained group of sub-data sets, copying the hash codes of each trained group to obtain multiple duplicates, concatenating each original code with its corresponding duplicates to obtain serial hash codes, and fusing these into a final nearest neighbor query table; and finally obtaining the query code of the query point and searching the final nearest neighbor query table for the nearest neighbor data set to complete the query. The stated advantage of the method is that query efficiency and accuracy are greatly improved.
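The last two stages of the abstract, concatenating codes and then looking up the query code, can be illustrated with a small sketch. The abstract does not state how many duplicates are made, how the compression and training are carried out, or which distance is used for the search. The sketch therefore takes the number of copies as a parameter, assumes all serial codes end up with equal length, and ranks by Hamming distance. The names `serial_code`, `hamming`, and `query` are hypothetical.

```python
def serial_code(original_code, trained_code, copies):
    """Concatenate an original code with `copies` duplicates of its trained code."""
    return original_code + trained_code * copies


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def query(table, query_code, k=1):
    """Return the k item ids whose serial codes are closest to query_code.

    Assumption: `table` maps item id -> serial code and every code has the
    same length as query_code. Ties are broken by item id.
    """
    ranked = sorted(table, key=lambda item: (hamming(table[item], query_code), item))
    return ranked[:k]


if __name__ == "__main__":
    table = {
        "a": serial_code("1010", "10", copies=2),
        "b": serial_code("0101", "01", copies=2),
    }
    print(table)
    print(query(table, "10101011", k=1))
```

A linear scan is used here for clarity. A real query table would typically be indexed so that candidates can be retrieved without comparing against every entry.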

Description

Technical field

[0001] The invention relates to the technical field of information retrieval, in particular to a nearest neighbor query method for high-dimensional data based on variable-length hash coding.

Background

[0002] Hashing is currently an effective solution for large-scale high-dimensional retrieval. In related techniques, a single hash coding method is applied to the entire data set to obtain a lower-dimensional hash code index of uniform length, and the query point is generally hashed with the same method as the data set. In real large-scale high-dimensional data, however, the distribution of the data set is irregular. Related hashing techniques do not make full use of the distribution information of the data set. In addition, the query point code is short and retains little information, so these techniques need to be improved.

Summary of the invention

[0003] The technical problem to be solved by the presen...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/901, G06F16/9032
CPC: G06F16/2255, G06F16/2264, G06N20/00, G06F16/24553, G06F17/14
Inventor: 任艳多, 钱江波, 孙瑶, 胡伟
Owner: NINGBO UNIV