High-dimensional data similarity join query method and device based on distance partition tree

A technology relating to similarity and distance, applied in the field of data processing. It addresses high computational complexity, low query efficiency, and the inability of existing methods to meet performance requirements, and it reduces complexity while improving query efficiency.

Pending Publication Date: 2018-11-16
LUOYANG NORMAL UNIV
Cites: 0; Cited by: 5

AI Technical Summary

Problems solved by technology

[0003] The similarity join query of massive high-dimensional data is a computationally intensive operation. With the continuous increase of data size and dimension, traditional centralized processing methods and index-based algorithms can no longer meet the performance requirements.
[0004] Similarity query is an important and widely used operation. At present, a lot of research...




Embodiment Construction

[0073] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. The components of the embodiments of the invention generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without making creative efforts belong to the protection scope of the present invention.

[0074] It should be noted that like numerals and letters...



Abstract

The embodiment of the invention provides a high-dimensional data similarity join query method and device based on a distance partition tree. The method comprises the following steps: acquiring high-dimensional original data and mapping it into a one-dimensional space; determining a second distance threshold from a first distance threshold and a property of the chi-square distribution, and constructing the distance partition tree from the original data and the second distance threshold; traversing the distance partition tree and comparing its nodes to obtain a set of candidate similar node pairs; and calculating the original distance between the original data in each candidate similar node pair, then comparing each original distance with the first distance threshold to obtain the similarity query result. The device is used to execute the method. In the embodiment of the invention, mapping the high-dimensional original data into a one-dimensional space reduces computational complexity, the distance partition tree finds candidate results at low cost and improves the filtering effect, and query efficiency is therefore greatly improved.
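The abstract names the steps (map to one dimension, derive a second threshold from the chi-square distribution, build and traverse a partition tree, verify candidates) but not the mapping, the tree layout, or the exact threshold rule. The Python below is a minimal sketch under loudly stated assumptions, not the patented method: the one-dimensional mapping is assumed to be a Gaussian random projection, under which the squared projected difference of a pair divided by its squared Euclidean distance follows a chi-square law with one degree of freedom; the "distance partition tree" is assumed to be a binary tree over the sorted projected values, traversed pairwise and pruned whenever two nodes' intervals lie farther apart than the second threshold. All names (`second_threshold`, `project`, `Node`, `candidate_pairs`, `similarity_join`) and parameters (`recall`, `leaf_size`, `seed`) are hypothetical.

```python
import math
import random
from statistics import NormalDist


def second_threshold(r, recall=0.95):
    """ASSUMED rule: with a ~ N(0, I), (a.x - a.y)^2 / ||x - y||^2 ~ chi-square(1).
    A pair within r in the original space then stays within r * sqrt(q) in 1-D
    with probability `recall`, where q is the chi-square(1) quantile at `recall`,
    i.e. the squared standard-normal quantile at (1 + recall) / 2."""
    z = NormalDist().inv_cdf((1 + recall) / 2)
    return r * z  # sqrt(q) == z because z > 0


def project(points, seed=0):
    """ASSUMED mapping: dot product with one random Gaussian vector."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(len(points[0]))]
    return [sum(ai * xi for ai, xi in zip(a, p)) for p in points]


class Node:
    """Hypothetical tree node covering a contiguous run of key-sorted ids."""

    def __init__(self, ids, keys, leaf_size):
        self.ids = ids
        self.lo, self.hi = keys[ids[0]], keys[ids[-1]]  # 1-D interval of the node
        self.children = []
        if len(ids) > leaf_size:
            mid = len(ids) // 2
            self.children = [Node(ids[:mid], keys, leaf_size),
                             Node(ids[mid:], keys, leaf_size)]


def candidate_pairs(u, v, keys, r2, out):
    """Compare node pairs; skip any pair whose intervals are more than r2 apart."""
    if max(v.lo - u.hi, u.lo - v.hi, 0.0) > r2:
        return
    if not u.children and not v.children:
        for i in u.ids:
            for j in v.ids:
                if (u is not v or i < j) and abs(keys[i] - keys[j]) <= r2:
                    out.add((min(i, j), max(i, j)))
    elif u is v:
        a, b = u.children
        for x, y in ((a, a), (a, b), (b, b)):
            candidate_pairs(x, y, keys, r2, out)
    elif u.children and (len(u.ids) >= len(v.ids) or not v.children):
        for c in u.children:
            candidate_pairs(c, v, keys, r2, out)
    else:
        for c in v.children:
            candidate_pairs(u, c, keys, r2, out)


def similarity_join(points, r, recall=0.95, leaf_size=4, seed=0):
    keys = project(points, seed)
    r2 = second_threshold(r, recall)
    order = sorted(range(len(points)), key=keys.__getitem__)
    cands = set()
    root = Node(order, keys, leaf_size)
    candidate_pairs(root, root, keys, r2, cands)
    # Verification: keep only candidates within the first threshold in the original space.
    return sorted((i, j) for i, j in cands if math.dist(points[i], points[j]) <= r)
```

For example, `similarity_join([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (5.0, 5.0, 5.0)], 0.1)` returns `[(0, 1)]`. Under these assumptions a true pair survives filtering only with probability `recall`, so the sketch can miss pairs, but it never returns a false positive because every candidate is re-checked against the first threshold.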

Description

Technical Field

[0001] The present invention relates to the technical field of data processing, and in particular to a high-dimensional data similarity join query method and device based on mapping space division.

Background

[0002] With the development of data acquisition technology and the advancement of data acquisition equipment, data scale, data accuracy, and data dimensionality have all grown at an unprecedented rate. Many types of data, such as images, videos, trajectories, and time series, can have thousands or even tens of thousands of dimensions. The purpose of a high-dimensional data similarity join query is to find, in large high-dimensional data sets, all data pairs whose similarity is greater than or equal to a given similarity threshold, or whose distance is less than or equal to a given distance threshold. Important applications include image clustering, document deduplication, similar vide...
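The distance form of the definition above can be written as a nested-loop baseline. This is a minimal sketch assuming Euclidean distance (the text does not fix the metric), and `naive_similarity_join` is a hypothetical name. It compares every pair, so its cost grows as O(n² · d) for n points of dimension d, which is the kind of computation the background describes as impractical at scale.

```python
import math
from itertools import combinations


def naive_similarity_join(points, r):
    """Return every index pair (i, j), i < j, whose Euclidean distance is <= r.

    Brute force: each of the n * (n - 1) / 2 pairs costs O(d) to compare."""
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) <= r]
```

For instance, `naive_similarity_join([(0, 0), (3, 4), (0, 1)], 1.0)` returns `[(0, 2)]`, since only points 0 and 2 lie within distance 1 of each other.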


Application Information

IPC(8): G06F17/30
Inventors: 马友忠 (Ma Youzhong), 张瑞玲 (Zhang Ruiling), 林春杰 (Lin Chunjie), 李莹 (Li Ying)
Owner: LUOYANG NORMAL UNIV