High-performance parallel implementation device for K-NN on GPU processor

A high-performance, GPU-processor-based technology, applied in multi-programming devices, electrical digital data processing, instruments, etc. It addresses three problems in existing schemes: lack of fine tuning, unresolved Top-K data dependence, and insufficient use of computing resources. Its aims are to use computing resources rationally, speed up target recognition, and reduce data dependence.

Active Publication Date: 2021-02-19
PEKING UNIV

AI Technical Summary

Problems solved by technology

Currently, the best implementation of K-NN on a single GPU is that of Xiaoxing Tang et al. (Tang X, Huang Z, Eyers D, et al. Efficient selection algorithm for fast k-NN search on GPUs[C]//2015 IEEE International Parallel and Distributed Processing Symposium. IEEE, 2015: 397-406), which proposes optimizations from three perspectives: the data structure used to store the data, the use of buffers to reduce branching in the computation, and the use of tree structures to reduce the number of operations. However, it does not fundamentally solve the most serious problem, the data dependence in the Top-K process, so the optimization it achieves is limited.
In addition, Zhengyuan Xue et al. (Xue Z, Li R, Zhang H, et al. DC-Top-k: A Novel Top-k Selecting Algorithm and Its Parallelization[C]//2016 45th International Conference on Parallel Processing (ICPP). IEEE, 2016: 370-379) proposed a parallel-friendly Top-K selection scheme and gave a corresponding MPI implementation (MPI is a communication protocol used mostly for cross-machine communication in high-performance computing clusters). In this scheme, the array to be selected from is divided into k groups, a threshold is obtained from the groups, and the original array is filtered against that threshold to produce a candidate set whose size depends only on k. The final result is obtained by performing Top-K on the candidate set, which greatly reduces the data dependence in the selection process. However, the scheme lacks fine tuning: because the performance of the algorithm depends on the size of k, computing resources are not fully utilized when it is applied to a platform such as a GPU.
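The divide, threshold, filter, and select steps described above can be sketched sequentially as follows. This is a minimal illustration, not the cited authors' MPI implementation. The text does not say how the threshold is derived, so the rule used here (the largest of the k group minima) is an assumption, as are the function name `dc_top_k_smallest` and the strided grouping. The sketch selects the k smallest values, which is the direction K-NN needs.

```python
import heapq


def dc_top_k_smallest(values, k):
    """Return the k smallest values in ascending order using a threshold filter.

    Hypothetical sketch: the threshold rule below is an assumption, not taken
    from the cited paper or the patent text.
    """
    n = len(values)
    if k <= 0:
        return []
    if k >= n:
        return sorted(values)

    # Step 1: divide the array into k groups. Strided slicing keeps every
    # group non-empty because k < n at this point.
    groups = [values[g::k] for g in range(k)]

    # Step 2: take the minimum of each group; the largest of these minima is
    # the threshold. Each group contributes at least one value <= threshold,
    # so at least k values pass the filter and none of the true k smallest
    # values can be filtered out.
    threshold = max(min(group) for group in groups)

    # Step 3: filter the original array against the threshold.
    candidates = [v for v in values if v <= threshold]

    # Step 4: run an ordinary Top-K on the (usually much smaller) candidate set.
    return heapq.nsmallest(k, candidates)
```

Steps 1 to 3 have no dependence between groups or between elements, which is what makes the scheme attractive for parallel hardware; only step 4 is a conventional selection, and it runs on the reduced candidate set.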


Examples


Embodiment Construction

[0027] The present invention will be further described in detail with reference to the accompanying drawings and embodiments.

[0028] In this embodiment, the method of the present invention is described using the classic application scenario of recognizing the MNIST handwritten digit data set. MNIST consists of the digits 0 to 9 handwritten by different people. Each sample is 28×28 pixels and is stored in binary format. Given an input handwritten digit image, the MNIST handwritten digit recognition model aims to determine which digit the image shows. K-NN is used as the algorithm for building the recognition model. Whenever a new handwritten digit image is input, the distance between its pixels and the pixels of every known training sample is calculated and stored in a distance matrix, the k training samples closest to the image are then selected, ...
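The recognition flow in this paragraph can be sketched for a single query image as below. This is a sequential illustration only, not the GPU implementation. The paragraph is truncated before it says how the label is derived from the k neighbors and it does not name a distance metric, so the squared Euclidean distance, the majority vote, and the names `squared_distance` and `knn_classify` are all assumptions.

```python
import heapq
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two flattened images (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(train_images, train_labels, query, k):
    """Classify one query image with K-NN (hypothetical sketch)."""
    # One row of the distance matrix: the query against every training sample.
    distances = [squared_distance(query, image) for image in train_images]

    # Top-K step: indices of the k training samples closest to the query.
    nearest = heapq.nsmallest(k, range(len(distances)), key=distances.__getitem__)

    # Label step (assumed majority vote): the most frequent label among the
    # k neighbors; on a tie, the label seen first among the nearest wins.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

For MNIST, each image would be a flattened list of 784 pixel values. With many query images, the `distances` list becomes one row per query, which is the distance matrix the paragraph refers to.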


Abstract

The invention provides a high-performance parallel implementation device for K-NN on a GPU processor, used to accelerate classification in parallel on the GPU. The device comprises: a data read-in module that stores the training data and test data of an application scenario in matrix form; a sample distance calculation module that calculates the distance between each test sample and all the training samples; a Top-K selection module that uses a pre-trained decision tree model to decide the execution granularity (thread level optimization, thread bundle level optimization, thread block level optimization, multi-thread block level optimization, or radix-based sorting optimization) and then selects the first k elements; and a label selection module that assigns a category label to each test sample. Because the invention uses a Top-K parallel framework based on the divide-and-conquer method, unnecessary operations are greatly reduced and hardware resources are used more fully, which improves the parallel efficiency of K-NN on the GPU and shortens its running time.
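The granularity decision made by the Top-K selection module can be pictured as a small dispatch function. The sketch below is a hand-written stand-in for the pre-trained decision tree, not the patented model: the abstract does not state which features the tree uses or where it splits, so the inputs `n` (row length) and `k`, every threshold, and the names `Granularity` and `choose_granularity` are hypothetical. "Thread bundle" corresponds to what CUDA calls a warp.

```python
from enum import Enum


class Granularity(Enum):
    THREAD = "thread level"
    WARP = "thread bundle (warp) level"
    BLOCK = "thread block level"
    MULTI_BLOCK = "multi-thread block level"
    RADIX_SORT = "radix-based sorting"


# HYPOTHETICAL split points, invented for illustration only. A real decision
# tree would learn its splits from timing measurements on the target GPU.
THREAD_MAX_N = 64
WARP_MAX_N = 1024
BLOCK_MAX_N = 65536


def choose_granularity(n, k):
    """Pick a Top-K strategy for one distance row of length n (hypothetical)."""
    # Assumed rule: when k is a large share of the row, selection degenerates
    # toward sorting, so fall back to the radix-based path.
    if k >= n // 2:
        return Granularity.RADIX_SORT
    if n <= THREAD_MAX_N:
        return Granularity.THREAD
    if n <= WARP_MAX_N:
        return Granularity.WARP
    if n <= BLOCK_MAX_N:
        return Granularity.BLOCK
    return Granularity.MULTI_BLOCK
```

For example, `choose_granularity(10000, 10)` falls through to the thread block branch under these made-up thresholds. The point of the sketch is only the shape of the dispatch: one cheap decision per problem size, made before any Top-K kernel is launched.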

Description

Technical Field

[0001] The invention belongs to the field of parallel acceleration of machine learning classification algorithms on computers, and in particular relates to a high-performance parallel implementation device for the K-NN algorithm on a general-purpose Nvidia GPU processor.

Background Technique

[0002] In machine learning, computer vision, pattern recognition, computational geometry, and bioinformatics, classifying target data is a basic problem. Data classification can be described as a nearest neighbor search problem: the category of the target data is determined by finding data similar to it in a known data set. The K-Nearest Neighbor (K-NN) algorithm builds on nearest neighbor search. It finds the k data items closest to the target data in the known data set and determines the category of the target data from the categories of those k items. Taking the more...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/50, G06N20/00
CPC: G06F9/5027, G06F9/5044, G06N20/00
Inventor: 杨超, 李雨芮, 敖玉龙, 李敏, 李克森
Owner: PEKING UNIV