Similarity connection query method and device

A similarity join query technique, applied in the field of data processing, that addresses the low efficiency and the redundant, repeated computation of threshold-based similarity join queries.

Active Publication Date: 2019-05-21
LUOYANG NORMAL UNIV

AI Technical Summary

Problems solved by technology

However, in practice, the threshold-based similarity join query method requires the threshold to be preset manually and then repeatedly modified, with the similarity join query rerun according to the new thresh...

Examples

Embodiment 1

[0070] Please refer to Figure 1, a schematic flow diagram of a similarity join query method provided by an embodiment of the present application. As shown in Figure 1, the similarity join query method includes:

[0071] S101. Obtain a set of original vectors to be queried, the number of result vectors, and a set of initial result vector pairs.

[0072] In the embodiment of the present application, the original vector set is the data set for the similarity join query, the initial result vector pair set is the initial data set of the similarity join query result, and the number of result vectors represents the number of vector pairs of the similarity join query result.

[0073] Similarity join query has important applications in many fields, such as image clustering, duplicate webpage detection, similar user recommendation, etc. Correspondingly, the original vector set to be queried may be an original image data set, an original webpage data set,...
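
The three inputs named in step S101 can be pictured as one small record. The following is only a minimal sketch: the class name `JoinQueryInput`, its field names, the `(distance, i, j)` pair layout, and the validation rules are assumptions made for illustration and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vector = Tuple[float, ...]
Pair = Tuple[float, int, int]  # assumed layout: (distance, index_i, index_j)


@dataclass
class JoinQueryInput:
    """Hypothetical container for the three inputs of step S101."""
    original_vectors: List[Vector]   # data set for the similarity join query
    result_count: int                # number of vector pairs wanted in the result
    initial_pairs: List[Pair] = field(default_factory=list)  # initial result set

    def __post_init__(self) -> None:
        n = len(self.original_vectors)
        if n < 2:
            raise ValueError("need at least two vectors to form a pair")
        if len({len(v) for v in self.original_vectors}) != 1:
            raise ValueError("all original vectors must share one dimension")
        max_pairs = n * (n - 1) // 2  # number of distinct unordered pairs
        if not 1 <= self.result_count <= max_pairs:
            raise ValueError(f"result_count must be in [1, {max_pairs}]")


# Three 2-dimensional vectors, two result pairs requested, empty initial result set.
query = JoinQueryInput(
    original_vectors=[(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)],
    result_count=2,
)
```

Here the initial result vector pair set simply starts empty; the application does not state in the visible text how it is initialized.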

Embodiment 2

[0085] Please refer to Figure 2, a schematic flow diagram of a similarity join query method provided by an embodiment of the present application. As shown in Figure 2, the similarity join query method includes:

[0086] S201. Obtain a set of original vectors to be queried, the number of result vectors, and a set of initial result vector pairs.

[0087] In the embodiment of the present application, the original vector set is the data set for the similarity join query, the initial result vector pair set is the initial data set of the similarity join query result, and the number of result vectors represents the number of vector pairs of the similarity join query result.

[0088] S202. Obtain the original vector dimension of the original vector set.

[0089] In the embodiment of the present application, the original vector set is a high-dimensional vector set, including multiple original vectors, where the dimension of each original vector is the ...

Embodiment 3

[0172] Please refer to Figure 3, a schematic block diagram of a similarity join query device provided by an embodiment of the present application. As shown in Figure 3, the similarity join query device includes:

[0173] The first obtaining module 310 is configured to obtain the original vector set to be queried, the number of result vectors, and the initial result vector pair set, wherein the original vector set is the data set for the similarity join query, the initial result vector pair set is the initial data set of the similarity join query result, and the number of result vectors represents the number of vector pairs in the similarity join query result.

[0174] The grouping module 320 is configured to perform grouping processing on the original vector set to obtain multiple sub-vector grouping sets.

[0175] A construction module 330, configured to construct a similarity distribution histogram of the original vector set according...

Abstract

The invention discloses a similarity join query method and device, and relates to the field of data processing. When a similarity join query is carried out, the method first obtains the original vector set to be queried, the number of result vectors (the number of vector pairs in the similarity join query result), and the initial result vector pair set (the initial data set of the similarity join query result). It then groups the original vector set to obtain a plurality of sub-vector grouping sets, constructs a similarity distribution histogram of the original vector set, and calculates a vector distance threshold from the similarity distribution histogram and the number of result vectors. Finally, the initial result vector pair set is updated according to the sub-vector grouping sets, the vector distance threshold, and the number of result vectors, yielding a result vector pair set that represents the similarity join query result. The vector distance threshold therefore does not need to be set manually in advance, a large amount of redundant calculation is avoided, and the efficiency of the similarity join query is improved.
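
The idea of deriving the distance threshold from a histogram and the requested number of result pairs can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the visible text does not say how the histogram is built or how grouping is used, so the sketch assumes Euclidean distance, an equal-width histogram over all pairwise distances, and a threshold equal to the upper edge of the first bucket at which the cumulative count reaches `k`. The grouping step and the initial result vector pair set are omitted, and all function names are hypothetical.

```python
import bisect
import heapq
import math
from itertools import combinations


def distance_histogram(vectors, num_buckets):
    """Count pairwise Euclidean distances in equal-width buckets (assumed layout)."""
    dists = [math.dist(a, b) for a, b in combinations(vectors, 2)]
    max_d = max(dists)
    # Upper edge of each bucket; pin the last edge to max_d to avoid float drift.
    edges = [max_d * (i + 1) / num_buckets for i in range(num_buckets)]
    edges[-1] = max_d
    counts = [0] * num_buckets
    for d in dists:
        # First bucket whose upper edge is >= d, so d <= edges[bucket] always holds.
        counts[bisect.bisect_left(edges, d)] += 1
    return counts, edges


def threshold_for_k(counts, edges, k):
    """Smallest bucket upper edge whose cumulative pair count reaches k."""
    total = 0
    for count, edge in zip(counts, edges):
        total += count
        if total >= k:
            return edge
    raise ValueError("k exceeds the number of vector pairs")


def top_k_join(vectors, k, num_buckets=10):
    """Return the k closest pairs as (distance, i, j), using the derived threshold."""
    counts, edges = distance_histogram(vectors, num_buckets)
    threshold = threshold_for_k(counts, edges, k)
    candidates = []
    for i, j in combinations(range(len(vectors)), 2):
        d = math.dist(vectors[i], vectors[j])
        if d <= threshold:  # the threshold guarantees at least k candidates
            candidates.append((d, i, j))
    return heapq.nsmallest(k, candidates)


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0)]
closest = top_k_join(points, k=2)  # [(1.0, 0, 1), (1.5, 2, 3)]
```

Building the histogram over every pair, as done here for brevity, costs as much as a brute-force join; the sketch only shows how a threshold can follow from the histogram and `k` without being set by hand.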

Description

Technical field

[0001] The present application relates to the field of data processing, and in particular to a similarity join query method and device.

Background

[0002] A similarity join query finds, in a large high-dimensional data set, the data pairs whose similarity is greater than or equal to a given similarity threshold or whose distance is less than or equal to a given distance threshold. It has important applications in many fields, such as image clustering, duplicate web page detection, and similar user recommendation. At present, a threshold-based similarity join query method can be used to perform the query. The threshold must be chosen manually according to the distance distribution between vectors in the vector set to be queried, so that a preset number of data pairs is finally obtained. However, in practice, it is found that the threshold-based similarity join query method needs t...
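
For comparison, a conventional threshold-based join of the kind described in the background can be sketched in a few lines. The function name `threshold_join`, the use of Euclidean distance, and the sample points are assumptions for illustration only.

```python
import math
from itertools import combinations


def threshold_join(vectors, max_distance):
    """Return all index pairs whose Euclidean distance is <= max_distance."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(vectors), 2)
        if math.dist(a, b) <= max_distance
    ]


points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 4.0)]

# The number of result pairs is only known after the join has run, so reaching a
# target count means guessing a threshold, checking the result, and rerunning.
counts_by_threshold = {
    t: len(threshold_join(points, t)) for t in (0.5, 1.0, 3.5, 5.0)
}
# counts_by_threshold == {0.5: 0, 1.0: 1, 3.5: 3, 5.0: 6}
```

Each trial threshold triggers a full pass over the pairs, which is the repeated work the background paragraph refers to.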

Application Information

IPC(8): G06F16/2458, G06F16/28
Inventors: 马友忠, 张瑞玲, 林春杰, 李莹
Owner: LUOYANG NORMAL UNIV