A KNN Query Method Based on Hybrid Granularity Distributed Memory Grid Index

The method combines a grid index with mixed-granularity partitioning and applies to the field of information retrieval. It addresses low KNN query efficiency, cluster data skew, and poor real-time query performance.

Active Publication Date: 2018-07-17
南方电网互联网服务有限公司
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] KNN query methods are relatively inefficient in big data environments. The key cause is the lack of an effective index structure and of a KNN query algorithm supported by such a structure. This shows up in two ways: ① Research on index structures for centralized environments is relatively mature, and many fairly effective KNN algorithms have been proposed on top of these structures, but with the explosive growth of data volume, single-machine processing performance in a centralized environment has become an insurmountable bottleneck. ② In distributed environments, there are some studies on index structures based on the MapReduce architecture and on related KNN query algorithms. However, because MapReduce is a batch processing model, intermediate results must be written back to disk, which increases I/O and leads to poor real-time query performance. In addition, the existing algorithms do not consider the data distribution, which easily causes data skew across the cluster.
[0005] In view of the low efficiency of KNN queries in current big data environments and the shortcomings of existing technologies, we propose a KNN query method based on a mixed-granularity distributed memory grid index. First, taking the characteristics of memory clusters into account, we obtain a summary estimate of the overall data and build a coarse-fine mixed-granularity distributed memory grid index structure to reduce data skew and improve data retrieval efficiency. Second, we design a fine-grained grid search algorithm that does not lose neighbors, so that the grids neighboring the query object are located quickly and accurately. Finally, based on this index structure and search algorithm, and combined with the distributed memory computing model, the traditional centralized KNN algorithm is extended to a distributed setting. This removes the single-machine performance bottleneck of centralized KNN algorithms and the I/O bottleneck of MapReduce-based KNN algorithms, enabling fast KNN queries on massive data.
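The idea of a grid search that does not lose neighbors can be illustrated with a small sketch. This is not the patent's algorithm, whose details are not given in this excerpt. It assumes a single uniform grid (the fine-grained level only), and the names `FineGrid`, `ring_cells`, and `cell_size` are hypothetical. The search visits rings of cells around the query and stops only when no unvisited cell can contain a closer point.

```python
import heapq
import math
from collections import defaultdict


def ring_cells(cx, cy, ring):
    """Yield the cells whose Chebyshev distance from (cx, cy) equals ring."""
    if ring == 0:
        yield (cx, cy)
        return
    for dx in range(-ring, ring + 1):
        yield (cx + dx, cy - ring)
        yield (cx + dx, cy + ring)
    for dy in range(-ring + 1, ring):
        yield (cx - ring, cy + dy)
        yield (cx + ring, cy + dy)


class FineGrid:
    """Uniform in-memory grid (illustrative stand-in for a fine-grained index)."""

    def __init__(self, points, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        for p in points:
            self.cells[self.cell_of(p)].append(p)

    def cell_of(self, p):
        s = self.cell_size
        return (math.floor(p[0] / s), math.floor(p[1] / s))

    def knn(self, q, k):
        """Return up to k (distance, point) pairs, nearest first."""
        if k <= 0:
            return []
        cx, cy = self.cell_of(q)
        s = self.cell_size
        # Once ring reaches last_ring, every occupied cell has been visited.
        last_ring = max(
            (max(abs(x - cx), abs(y - cy)) for x, y in self.cells), default=0
        )
        best = []  # max-heap of (-distance, point): the k closest seen so far
        ring = 0
        while True:
            for cell in ring_cells(cx, cy, ring):
                for p in self.cells.get(cell, ()):
                    d = math.dist(p, q)
                    if len(best) < k:
                        heapq.heappush(best, (-d, p))
                    elif d < -best[0][0]:
                        heapq.heapreplace(best, (-d, p))
            # Unvisited points lie outside the square block of visited cells,
            # so each of them is at least `safe` away from q.
            safe = min(
                q[0] - (cx - ring) * s, (cx + ring + 1) * s - q[0],
                q[1] - (cy - ring) * s, (cy + ring + 1) * s - q[1],
            )
            if (len(best) == k and -best[0][0] <= safe) or ring >= last_ring:
                break
            ring += 1
        return sorted((-neg_d, p) for neg_d, p in best)
```

For example, `FineGrid(points, 10).knn((30, 30), 3)` returns the three nearest points with their distances. The stopping test on `safe` is what keeps the search exact: stopping as soon as k candidates are found could miss a closer point in a neighboring cell.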


Examples


Specific Embodiment

[0085] 1. Without loss of generality, a cluster of 6 servers is taken as the experimental platform (1 serves as the Master node and 5 serve as Slave nodes), and KNN queries over two-dimensional spatial data are used as an example to describe the technical solution of the present invention in detail. The overall data is shown in the table below, and its spatial distribution is shown in Figure 3.

[0086] Table 1. Overall data (two-dimensional points)

(12,68)  (31,73)  (58,63)  (57,23)  (4,26)
(28,33)  (11,16)  (56,8)   (21,66)  (16,72)
(13,56)  (62,78)  (52,29)  (7,34)   (32,19)
(13,26)  (53,16)  (23,61)  (18,61)  (26,57)
(65,66)  (59,49)  (6,23)   (28,13)  (26,23)
(56,43)  (27,71)  (11,63)  (7,55)   (67,72)
(64,24)  (9,11)   (38,26)  (67,43)  (66,16)
(53,72)  (8,73)   (21,53)  (46,33)  (62,36)
(8,2)    (37,13)  (2,12)   (57,71)  (56,76)
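To make the sample data concrete, the sketch below buckets the 45 points into square grid cells and counts the points per cell, which gives a rough picture of how unevenly the data is spread. The cell size of 20 and the names `POINTS` and `cell_counts` are assumptions chosen for illustration. The patent's actual grid parameters are not given in this excerpt.

```python
from collections import Counter

# The 45 sample points from the embodiment.
POINTS = [
    (12, 68), (31, 73), (58, 63), (57, 23), (4, 26),
    (28, 33), (11, 16), (56, 8), (21, 66), (16, 72),
    (13, 56), (62, 78), (52, 29), (7, 34), (32, 19),
    (13, 26), (53, 16), (23, 61), (18, 61), (26, 57),
    (65, 66), (59, 49), (6, 23), (28, 13), (26, 23),
    (56, 43), (27, 71), (11, 63), (7, 55), (67, 72),
    (64, 24), (9, 11), (38, 26), (67, 43), (66, 16),
    (53, 72), (8, 73), (21, 53), (46, 33), (62, 36),
    (8, 2), (37, 13), (2, 12), (57, 71), (56, 76),
]


def cell_counts(points, cell_size):
    """Count points per square cell; keys are (column, row) cell coordinates."""
    return Counter((x // cell_size, y // cell_size) for x, y in points)


counts = cell_counts(POINTS, 20)  # cell size 20 is an illustrative choice
for cell, n in sorted(counts.items()):
    print(cell, n)
```

Cells with many points are candidates for finer subdivision, while sparse cells can stay coarse. That is the intuition behind estimating density before building the index.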

[0087] 2. Spatially partition the data in Table 1 using grid ...


Abstract

The invention discloses a KNN query method based on a hybrid-granularity distributed memory grid index. The method is realized by the following steps. Data pre-processing step: based on grid and density, the overall data is spatially divided to obtain a summary estimate of the overall data distribution. Data query step: a hybrid-granularity distributed memory grid index structure is established, that is, a non-bisecting coarse-grained grid index and a bisecting fine-grained grid index; on this basis, a distributed KNN query algorithm is designed to realize fast KNN queries on massive data. Compared with the prior art, the method reduces data skew in the cluster, improves data indexing efficiency, and supports distributed algorithms by designing and establishing the hybrid-granularity distributed memory grid index. By using a KNN query algorithm based on this index structure, it removes the single-machine processing bottleneck of centralized KNN query algorithms and the loss of real-time performance that MapReduce-based KNN query algorithms incur by writing intermediate results back to disk.
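The distributed query step can be pictured as follows: each partition computes its own k nearest candidates in memory, and a coordinator merges them into the global answer. The sketch below is a single-process illustration under stated assumptions, not the patented algorithm. The balanced x-stripe partitioning in `split_balanced` is a simple stand-in for the coarse-grained index, and all function names are hypothetical.

```python
import heapq
import math


def split_balanced(points, n_parts):
    """Sort by x and cut into near-equal-size stripes of varying width."""
    ordered = sorted(points)
    size, extra = divmod(len(ordered), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(ordered[start:end])
        start = end
    return parts


def local_knn(part, q, k):
    """k nearest (distance, point) pairs within one partition."""
    return heapq.nsmallest(k, ((math.dist(p, q), p) for p in part))


def distributed_knn(parts, q, k):
    """Merge per-partition candidates into the global k nearest."""
    candidates = []
    for part in parts:  # on a cluster, each partition would run on its own worker
        candidates.extend(local_knn(part, q, k))
    return heapq.nsmallest(k, candidates)
```

The merge is exact because any global top-k point is also in the top k of its own partition. Cutting stripes by point count instead of by width keeps the load per partition nearly equal, which is one simple way to limit data skew.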

Description

Technical field

[0001] The invention relates to the field of information retrieval, in particular to a highly practical KNN query method based on a mixed granularity distributed memory grid index.

Background technique

[0002] With the rapid development of technologies such as the Internet, the Internet of Things, and big data, KNN query, as a basic operation, is widely used in various location-based applications. However, as the amount of data continues to grow, the traditional centralized KNN query and the KNN query method based on the MapReduce architecture cannot process massive data quickly and effectively. Extending the traditional centralized KNN query algorithm to the big data environment by combining the characteristics of memory clusters, indexing technology, and distributed memory computing technology is the fundamental way to solve this problem.

[0003] Indexing technology is a key component of the KNN query algorithm. Its basic idea is to use methods such as divi...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/1847, G06F16/2282
Inventor: 蔡斌雷, 朱世伟, 郭芹, 杨子江, 于俊凤, 魏墨济, 李思思, 徐蓓蓓, 李晨, 巴志超, 鞠镁隆
Owner: 南方电网互联网服务有限公司