Measurement space data similarity query method and device based on SQL

A technology for metric space data similarity queries, applied in the field of data processing. It addresses the mismatch between the index structures built by existing metric space similarity query methods and the index types supported by relational databases, improving applicability and performance.

Active Publication Date: 2018-01-09
RENMIN UNIVERSITY OF CHINA
Cites: 5; Cited by: 4

AI Technical Summary

Problems solved by technology

[0005] The present invention provides an SQL-based method and device for metric space data similarity queries. It solves the prior-art problem that the index structures constructed by metric space similarity query methods do not match the index structure types supported by an RDBMS. The invention thereby allows metric space similarity queries to be carried out in a database using SQL, improving the applicability and performance of similarity queries in RDBMS databases.


Examples


Embodiment 1

[0054] This embodiment provides an SQL-based method for querying the similarity of metric space data. As shown in Figure 1, the method may include:

[0055] Step 101: partition the data set to obtain multiple partitions, where each partition contains data objects and a reference point.

[0056] Because metric space data come in many types and in very large volumes, query efficiency can be improved by a preprocessing step that divides all the data in the database into multiple partitions. All the data in the metric space form a data set. After the data set is divided, each partition carries a partition serial number that identifies it, and each partition contains at least one reference point and at least one data object.

[0057] Specifically, as shown in Figure 2, multiple objects can be arbitrarily selected from the data set as reference points, and the data space can be ...
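Because the excerpt is truncated, the exact rule for dividing the data space is not shown. The sketch below is a minimal illustration under stated assumptions: data objects are 2-D points under Euclidean distance, reference points are picked at random from the data set, and every remaining object joins the partition of its nearest reference point (a common choice in reference-point indexes, not confirmed by the source). The names `Partition`, `euclidean`, and `partition_data_set` are hypothetical.

```python
import math
import random
from dataclasses import dataclass, field

Point = tuple[float, float]  # assumption: 2-D points stand in for metric space objects


@dataclass
class Partition:
    serial: int                 # partition serial number identifying the partition
    reference: Point            # the partition's reference point
    objects: list[Point] = field(default_factory=list)  # data objects assigned to it


def euclidean(a: Point, b: Point) -> float:
    # Assumption: Euclidean distance as the metric
    return math.dist(a, b)


def partition_data_set(data: list[Point], k: int, seed: int = 0) -> list[Partition]:
    """Pick k objects at random as reference points, then assign every other
    object to the partition whose reference point is nearest (assumed rule)."""
    rng = random.Random(seed)
    refs = rng.sample(data, k)
    partitions = [Partition(serial=i, reference=r) for i, r in enumerate(refs)]
    for obj in data:
        if obj in refs:
            continue  # in this sketch, reference points are not stored as data objects
        nearest = min(partitions, key=lambda p: euclidean(obj, p.reference))
        nearest.objects.append(obj)
    return partitions
```

For example, `partition_data_set(points, 2)` returns two `Partition` records with serial numbers 0 and 1, whose objects together cover every non-reference point.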

Embodiment 2

[0078] This embodiment provides an SQL-based method for querying the similarity of metric space data. As shown in Figure 3, the method may include:

[0079] Step 201: determine multiple reference points in the data set.

[0080] There are many ways to determine the reference points in the data set and assign data objects to them. For example, the data objects remaining in the data set after the reference points are excluded can be divided equally among the reference points by count. Alternatively, based on the distance to each reference point, the data objects whose distance to a reference point falls within a preset range can be assigned to that reference point.
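The two assignment strategies in [0080] can be sketched as follows. The source does not say which objects go to which reference point under the equal-count strategy, so the sketch assumes each object goes to its nearest reference point that still has room, with a per-partition capacity of ceil(n / k). For the distance-range strategy it assumes an object joins the nearest reference point within the preset radius and that objects outside every radius are returned separately. Function names, the 2-D Euclidean setting, and the `radius` parameter are hypothetical.

```python
import math

Point = tuple[float, float]  # assumption: 2-D points with Euclidean distance


def assign_equal_count(objects: list[Point], refs: list[Point]) -> list[list[Point]]:
    """Strategy 1 (sketch): divide the non-reference objects among the reference
    points by count. Each partition holds at most ceil(n / k) objects; an object
    goes to its nearest reference point that still has spare capacity."""
    cap = math.ceil(len(objects) / len(refs))
    groups: list[list[Point]] = [[] for _ in refs]
    for obj in objects:
        # reference indices ordered from nearest to farthest for this object
        order = sorted(range(len(refs)), key=lambda i: math.dist(obj, refs[i]))
        for i in order:
            if len(groups[i]) < cap:
                groups[i].append(obj)
                break  # total capacity cap * k >= n, so a slot always exists
    return groups


def assign_by_radius(objects: list[Point], refs: list[Point], radius: float):
    """Strategy 2 (sketch): an object joins its nearest reference point if that
    distance is within `radius`; otherwise it is returned in `leftover`."""
    groups: list[list[Point]] = [[] for _ in refs]
    leftover: list[Point] = []
    for obj in objects:
        dists = [math.dist(obj, r) for r in refs]
        i = min(range(len(refs)), key=dists.__getitem__)
        if dists[i] <= radius:
            groups[i].append(obj)
        else:
            leftover.append(obj)
    return groups, leftover
```

The capacity check is what makes the first strategy balance partition sizes: once a nearby reference point is full, later objects spill over to the next-nearest one, even if that makes its partition wider.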

[0081] Since the distribution of reference points and data objects is irregular, the selection of reference points affects the number of data objects in each partition and the distance from each data object to the reference point. Therefore, the quality of reference point selection directly affects the performance of simila...
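To make the point of [0081] concrete, the sketch below measures, for a given choice of reference points, how many objects land in each partition and each partition's radius (the largest object-to-reference distance). It assumes nearest-reference assignment and 2-D Euclidean points; `partition_stats` is a hypothetical helper, and the toy data are illustrative, not from the patent.

```python
import math

Point = tuple[float, float]  # assumption: 2-D points with Euclidean distance


def partition_stats(objects: list[Point], refs: list[Point]):
    """Assign each object to its nearest reference point (assumed rule) and
    return (sizes, radii): objects per partition and the largest
    object-to-reference distance in each partition."""
    sizes = [0] * len(refs)
    radii = [0.0] * len(refs)
    for obj in objects:
        d = [math.dist(obj, r) for r in refs]
        i = min(range(len(refs)), key=d.__getitem__)
        sizes[i] += 1
        radii[i] = max(radii[i], d[i])
    return sizes, radii


# Two well-separated clusters of three points each
objs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
good = partition_stats(objs, [(0.0, 0.0), (10.0, 10.0)])  # one reference per cluster
poor = partition_stats(objs, [(0.0, 0.0), (1.0, 0.0)])    # both references in one cluster
```

With one reference point per cluster, both partitions hold three objects with radius 1.0. With both reference points in the same cluster, one partition absorbs the far cluster and its radius grows to about 14.2, so a distance-range filter around that reference point separates objects far less sharply.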

Embodiment 3

[0122] This embodiment provides an SQL-based method for querying the similarity of metric space data. As shown in Figure 4, the method may include:

[0123] Step 301: partition the data set to obtain multiple partitions, where each partition contains data objects and a reference point.

[0124] Because metric space data come in many types and in very large volumes, query efficiency can be improved by a preprocessing step that divides all the data in the database into multiple partitions. All the data in the metric space form a data set. After the data set is divided, each partition carries a partition serial number that identifies it, and each partition contains at least one reference point and at least one data object.

[0125] Specifically, as shown in Figure 2, multiple objects can be arbitrarily selected from the data set as reference points, and the data space can be...



Abstract

The invention provides an SQL-based method and device for metric space data similarity queries. The method includes the following steps: the data set is partitioned, each partition comprising data objects and a reference point; a first distance between each data object and the reference point of its partition is determined; an index structure for the data objects is determined from the first distances; a second distance between the query object in a query request and the reference point of each partition is determined, and a query range of the query object in each partition is determined from the second distance and a preset distance threshold; and the target data objects falling within the query range of each partition are determined in the index structure. The method enables metric space data similarity queries in a database based on SQL technology, improving the applicability and performance of similarity queries in RDBMS databases.
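One plausible reading of the abstract's query-range step uses the triangle inequality: for a reference point p, any object o with d(q, o) ≤ θ satisfies |d(o, p) − d(q, p)| ≤ θ. So only objects whose stored first distance d(o, p) lies in [d(q, p) − θ, d(q, p) + θ] need their true distance checked. The sketch below is an assumption-laden illustration, not the patented implementation: it uses SQLite through Python's standard `sqlite3` module as a stand-in RDBMS, 2-D points with Euclidean distance, and an ordinary composite B-tree index on `(pid, dist)` as the index structure. The table `objects`, its columns, and the functions `build_index` and `range_query` are hypothetical.

```python
import math
import sqlite3


def build_index(conn: sqlite3.Connection, partitions) -> None:
    """partitions: list of (reference_point, [data objects]) pairs.
    Stores every object with its partition id and its first distance
    (object to its partition's reference point)."""
    conn.execute("CREATE TABLE objects (pid INTEGER, dist REAL, x REAL, y REAL)")
    # Composite B-tree index: an index type any mainstream RDBMS supports
    conn.execute("CREATE INDEX idx_pid_dist ON objects (pid, dist)")
    rows = [(pid, math.dist(o, ref), o[0], o[1])
            for pid, (ref, objs) in enumerate(partitions) for o in objs]
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", rows)


def range_query(conn: sqlite3.Connection, partitions, q, theta: float):
    """Return all stored objects within distance theta of query object q."""
    result = []
    for pid, (ref, _) in enumerate(partitions):
        d_qp = math.dist(q, ref)  # second distance: query object to reference point
        # Filter: |d(o,p) - d(q,p)| <= theta is necessary for d(q,o) <= theta
        cur = conn.execute(
            "SELECT x, y FROM objects WHERE pid = ? AND dist BETWEEN ? AND ?",
            (pid, d_qp - theta, d_qp + theta))
        # Refinement: the filter admits candidates only; check the true distance
        result.extend((x, y) for x, y in cur if math.dist(q, (x, y)) <= theta)
    return sorted(result)
```

Usage: after `conn = sqlite3.connect(":memory:")` and `build_index(conn, parts)`, calling `range_query(conn, parts, (1.0, 1.0), 1.5)` runs one indexed `BETWEEN` scan per partition. The design point is that the filter is expressed as a plain range predicate on a scalar column, so the database's own B-tree can serve it; objects outside the range in every partition are never fetched.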

Description

Technical field
[0001] The invention relates to the technical field of data processing, and in particular to an SQL-based method and device for querying the similarity of metric space data.
Background
[0002] Given a data set R, a query object q, a similarity function, and a user-specified threshold θ, a similarity query finds all data objects r in R whose distance to q is less than or equal to θ; each such r is considered similar to q. Similarity queries are used in many fields, including face recognition, fingerprint recognition, spatial location queries, text error correction, and pattern recognition (for example, DNA or protein sequences). With the rapid growth of data volumes and the diversity of data types, the objects of similarity queries have extended from early multidimensional data in Euclidean space and string data in text space to the present The...
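The definition in [0002] amounts to the set {r ∈ R : d(q, r) ≤ θ}. The sketch below computes it by a linear scan, the index-free baseline that partitioning and indexing aim to beat. As an illustration of the text error correction use case, it assumes Levenshtein edit distance as the metric; `edit_distance` and `similarity_query` are hypothetical names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute each cost 1); a metric on strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # substitute, or match at no cost
        prev = cur
    return prev[-1]


def similarity_query(R, q, dist, theta):
    """Return every r in R with dist(q, r) <= theta (linear scan, no index)."""
    return [r for r in R if dist(q, r) <= theta]
```

For instance, `similarity_query(["database", "datebase", "data", "databases"], "database", edit_distance, 1)` keeps the three strings within one edit of the query and drops `"data"`, which is four deletions away. The scan computes the distance to every object, which is what becomes expensive as data volumes grow.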


Application Information

IPC(8): G06F17/30
Inventors: 卢卫 (Lu Wei), 杜小勇 (Du Xiaoyong), 侯佳佳 (Hou Jiajia)
Owner RENMIN UNIVERSITY OF CHINA