Fast exact nearest-neighbor search method for high-dimensional data based on Euclidean distance

A technique based on Euclidean distance and high-dimensional data indexing, applied in the field of data processing. It addresses the performance degradation of existing methods on high-dimensional data and their inability to return exact neighbors efficiently, and it achieves a narrower search range, faster queries, and accurate results.

Active Publication Date: 2013-09-04
ZHEJIANG UNIV
Cites: 2; Cited by: 14

AI Technical Summary

Problems solved by technology

[0004] Traditional neighbor query algorithms have many shortcomings. Tree structures that adopt a space-partitioning strategy, such as the k-d tree and the ball tree, work well on low-dimensional data, but when the data dimensi



Examples


Embodiment Construction

[0019] The invention is further described below with reference to the accompanying drawing:

[0020] A nearest-neighbor query method for high-dimensional data, based on upper and lower bounds of the Euclidean distance and a data filtering strategy, comprises the following steps:

[0021] 1. After expressing the data as vectors, perform the following processing:

[0022] 1) Embed the high-dimensional data into the two-dimensional space S formed by mean and variance, and index the embedded two-dimensional data with a vantage-point tree (VP-tree); this index is denoted index1;

[0023] 2) Build a sampling neighbor index over the high-dimensional data itself, denoted index2. Any approximate neighbor index structure can be used for this index, such as an R-tree, a k-d tree, or locality-sensitive hashing;

[0024] 3) For a query point q, first sample through index2 to obtain the threshold T, then query the set of data points whose Euclidean distance from the t...
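The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the second embedding coordinate is taken to be the population standard deviation (the square root of the variance named in the text), a plain random sample stands in for index2, and a linear scan over the embedded points stands in for the tree-based index1. The names `embed`, `lower_bound`, `exact_nearest`, and `sample_size` are hypothetical.

```python
import math
import random


def embed(x):
    """Map a vector to (mean, population standard deviation)."""
    d = len(x)
    mu = sum(x) / d
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / d)
    return mu, sigma


def euclidean(x, y):
    """Exact Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def lower_bound(ex, eq, d):
    """sqrt(d) times the 2-D distance between two embeddings.

    Assumption of this sketch: with (mean, std) coordinates this value
    never exceeds the true d-dimensional Euclidean distance.
    """
    return math.sqrt(d) * math.hypot(ex[0] - eq[0], ex[1] - eq[1])


def exact_nearest(data, embedded, q, sample_size=8, rng=None):
    """Return (index of the exact nearest neighbor, number of candidates)."""
    rng = rng or random.Random(0)
    d = len(q)
    eq = embed(q)

    # Stand-in for index2: the threshold T is the best distance found
    # among a small random sample of the original vectors.
    sample = rng.sample(range(len(data)), min(sample_size, len(data)))
    T = min(euclidean(data[i], q) for i in sample)

    # Stand-in for index1: keep every point whose lower bound does not
    # exceed T. A tree index would answer this as a 2-D range query.
    # The small tolerance guards against floating-point rounding.
    candidates = [
        i for i, e in enumerate(embedded)
        if lower_bound(e, eq, d) <= T + 1e-9
    ]

    # Refinement: linear scan of the candidates with exact distances.
    best = min(candidates, key=lambda i: euclidean(data[i], q))
    return best, len(candidates)
```

A caller would compute `embedded = [embed(x) for x in data]` once, then call `exact_nearest(data, embedded, q)` per query. Because any point closer than T must have a lower bound no greater than T, the filter cannot discard the true nearest neighbor; how many candidates survive depends on how tight the sampled threshold is.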


Abstract

A fast exact nearest-neighbor search method for high-dimensional data based on Euclidean distance is provided. The method expresses the high-dimensional data as vectors, embeds them into a two-dimensional space formed by mean and variance, and at the same time builds a sampling index over the original high-dimensional data. When a query point is input, the sampling index is first used to obtain a filtering threshold. The threshold is then used to filter out non-neighbor data in the two-dimensional space, which yields a candidate data set. Finally, the distances between all candidate data points and the query point are computed by linear traversal, and the nearest neighbor of the query point is determined. The method can process high-dimensional data quickly and can find the exact nearest neighbor.
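One way to see why filtering in the two-dimensional space can be safe is the following pair of bounds. This is a sketch under the assumption that the second coordinate is the population standard deviation σ (the square root of the variance); the symbols d (dimension), μ, σ, x' and q' are introduced here for illustration, and the excerpt does not state the exact form of the bounds the invention uses.

```latex
\begin{aligned}
x &= \mu_x \mathbf{1} + x', \qquad q = \mu_q \mathbf{1} + q',
  \qquad \|x'\| = \sqrt{d}\,\sigma_x, \quad \|q'\| = \sqrt{d}\,\sigma_q, \\
\|x - q\|^2 &= d\,(\mu_x - \mu_q)^2 + \|x' - q'\|^2
  \qquad \text{(the cross term vanishes because } x', q' \text{ sum to zero)}, \\
\bigl(\|x'\| - \|q'\|\bigr)^2 &\le \|x' - q'\|^2 \le \bigl(\|x'\| + \|q'\|\bigr)^2
  \qquad \text{(triangle inequality)}, \\
d\left[(\mu_x - \mu_q)^2 + (\sigma_x - \sigma_q)^2\right]
  &\le \|x - q\|^2
  \le d\left[(\mu_x - \mu_q)^2 + (\sigma_x + \sigma_q)^2\right].
\end{aligned}
```

Under this reading, the left-hand side is d times the squared distance between the two embedded points, so any data point whose embedded distance to the query exceeds T/√d is farther than T in the original space and can be dropped without losing the exact nearest neighbor.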

Description

Technical field

[0001] The invention relates to data processing fields such as information retrieval, data mining and cluster analysis, and specifically to indexing high-dimensional data and performing exact nearest-neighbor queries by using upper and lower bounds of the Euclidean distance together with suitable data structures.

Background

[0002] With the rapid development of information technology and the Internet, and the widespread use of multimedia digital devices, we now have more network information than in any previous era. It contains a large amount of high-dimensional data, such as pictures, audio and video. How to index and retrieve this massive high-dimensional data quickly and accurately is an urgent problem.

[0003] An important function of indexing and retrieval is the nearest-neighbor query, which finds the data in the database most similar to the input data. This is a very basic but important operation. In...


Application Information

IPC(8): G06F17/30
Inventor: 陈纯, 王灿, 卜佳俊, 朱林, 徐斌, 吴晓凡, 汪识翰
Owner: ZHEJIANG UNIV