High-dimensional data similarity join query method and device based on mapping space partition

A technology for high-dimensional data and mapping spaces, applied in the field of data processing. It addresses low query efficiency, failure to meet performance requirements, and high computational complexity, and achieves lower computational complexity, fewer candidate results, and improved query efficiency.

Pending Publication Date: 2018-11-20
LUOYANG NORMAL UNIV

AI Technical Summary

Problems solved by technology

[0003] Similarity join queries over massive high-dimensional data are computationally intensive. As data size and dimensionality keep growing, traditional centralized processing methods and index-based algorithms can no longer meet performance requirements.
[0004] Similarity query is an important and widely used operation, and it has been studied extensively. To address the performance and scalability problems of similarity join queries over large-scale data, the prior art uses the MapReduce framework. However, when a similarity query over high-dimensional data is performed with MapReduce, the computational complexity is high, which results in low query efficiency.

Examples

Embodiment Construction

[0049] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. The components of the embodiments of the invention generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without making creative efforts belong to the protection scope of the present invention.

[0050] It should be noted that like numerals and lett...

Abstract

Embodiments of the present invention provide a high-dimensional data similarity join query method and device based on mapping space partition. The method comprises: acquiring high-dimensional raw data and mapping the raw data to a one-dimensional space; determining a second distance threshold according to a first distance threshold and a property of the chi-square distribution, and dividing the one-dimensional space into a plurality of subspaces according to the second distance threshold; determining the subspace number of each raw data item; obtaining candidate data pairs according to the second distance threshold and the subspace numbers; and calculating the original distance of each candidate data pair and comparing it with the first distance threshold to obtain the similarity query result. A device for performing the method is also provided. Because the high-dimensional raw data is mapped to a one-dimensional space and partitioned there according to the second distance threshold before the similarity query is carried out, the computational complexity is lowered, the number of candidate results is reduced, and query efficiency is improved.
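The abstract does not say how the mapping or the chi-square rule is defined, so the following is only a minimal Python sketch of the five steps under loudly stated assumptions. It assumes (1) the mapping is a projection onto a single random vector with i.i.d. standard normal entries, (2) the original distance is Euclidean, and (3) the second threshold t2 is chosen so that a pair at original distance at most t1 stays within t2 after projection with probability at least `recall`. Under assumption (1), the projected gap of a pair at distance d is d·|Z| with Z ~ N(0, 1), so (gap/d)² follows a chi-square distribution with one degree of freedom, giving t2 = t1·sqrt(q), where q is the chi-square(1) quantile at `recall`. The names `second_threshold`, `similarity_join`, `recall`, and `seed` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def second_threshold(t1, recall):
    """Hypothetical rule for the second (1-D) distance threshold.

    Assumes a Gaussian random projection: for a pair at original distance d,
    the projected gap is d * |Z| with Z ~ N(0, 1), so (gap / d) ** 2 follows a
    chi-square distribution with 1 degree of freedom. Its quantile at `recall`
    equals NormalDist().inv_cdf((1 + recall) / 2) ** 2.
    """
    chi2_quantile = NormalDist().inv_cdf((1.0 + recall) / 2.0) ** 2
    return t1 * math.sqrt(chi2_quantile)


def similarity_join(data, t1, recall=0.95, seed=0):
    """Return sorted index pairs (i, j), i < j, with Euclidean distance <= t1."""
    dim = len(data[0])
    rng = random.Random(seed)

    # Step 1 (assumed mapping): project every vector onto one random Gaussian direction.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    keys = [sum(a * x for a, x in zip(direction, row)) for row in data]

    # Steps 2-3: split the 1-D line into cells of width t2 and number each point's cell.
    t2 = second_threshold(t1, recall)
    cells = {}
    for i, key in enumerate(keys):
        cells.setdefault(math.floor(key / t2), []).append(i)

    # Step 4: candidates come from the same cell or the next cell up, with 1-D gap <= t2.
    candidates = []
    for cell, members in cells.items():
        for other in (cell, cell + 1):
            for i in members:
                for j in cells.get(other, []):
                    if other == cell and j <= i:
                        continue  # visit each same-cell pair only once
                    if abs(keys[i] - keys[j]) <= t2:
                        candidates.append((min(i, j), max(i, j)))

    # Step 5: verify each candidate with its original distance against t1.
    return sorted((i, j) for i, j in candidates if math.dist(data[i], data[j]) <= t1)
```

Because every candidate is checked against its original distance, this sketch never reports a pair farther apart than t1; it can, however, miss a true pair whose projected gap exceeds t2, which under the Gaussian assumption happens with probability at most 1 - recall per pair. Comparing only the same and the next cell is also an assumption of the sketch: with cells of width t2, points whose cell numbers differ by two or more are always more than t2 apart on the line, so skipping them loses no candidates.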

Description

Technical Field

[0001] The present invention relates to the technical field of data processing, and in particular to a high-dimensional data similarity join query method and device based on mapping space partition.

Background

[0002] With the development of data acquisition technology and the advancement of data acquisition equipment, data scale, data accuracy, and data dimensionality have all grown at an unprecedented rate. Many types of data, such as graphic images, videos, trajectories, and time series, can have thousands or even tens of thousands of dimensions. The purpose of a high-dimensional data similarity join query is to find, in a large high-dimensional data set, all data pairs whose similarity is greater than or equal to a given similarity threshold, or whose distance is less than or equal to a given distance threshold. Important applications, such as image clustering, document deduplication, similar vide...
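For reference, the distance-threshold query defined in [0002] can be answered exhaustively by comparing every pair. The sketch below treats it as a self-join over a single set and assumes Euclidean distance (the paragraph only says "distance"); the name `naive_similarity_join` is hypothetical. It shows why the cost grows quickly with both data size and dimensionality.

```python
import math
from itertools import combinations


def naive_similarity_join(data, threshold):
    """Exhaustive self-join: return index pairs (i, j), i < j, with distance <= threshold.

    Every one of the n * (n - 1) / 2 pairs is compared, and each comparison
    touches all D coordinates, so the cost is O(n^2 * D).
    """
    return [
        (i, j)
        for (i, x), (j, y) in combinations(enumerate(data), 2)
        if math.dist(x, y) <= threshold
    ]


print(naive_similarity_join([[0, 0], [3, 4], [0, 1]], 1.0))  # [(0, 2)]
```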

Application Information

Patent Type & Authority: Application (China)
IPC IPC(8): G06F17/30
Inventors: 马友忠, 张瑞玲, 林春杰, 李莹
Owner LUOYANG NORMAL UNIV