GPDSP-oriented multi-core parallel computing method for convolutional neural networks

A convolutional neural network and parallel computing technology, applied in the field of deep learning. It addresses the problem of accelerating convolutional neural network computation on GPDSP, and achieves efficient parallel computing by drawing on the GPDSP's powerful parallel computing and high-bandwidth vector data loading capabilities.

Active Publication Date: 2018-11-30
NAT UNIV OF DEFENSE TECH

AI Technical Summary

Problems solved by technology

The powerful computing capability of GPDSP makes it a very good platform for accelerating convolutional neural network calculations. However, GPDSP is a heterogeneous multi-core processor that includes CPU cores and DSP cores, with a multi-level storage architecture comprising register files, scalar memory, on-chip vector array memory, an on-chip shared memory array, and off-chip DDR memory. Existing convolutional neural network computing methods therefore cannot be applied directly to GPDSP.
Implementing convolutional neural network calculations on GPDSP still raises open problems, such as how to map the calculations onto the GPDSP's CPU core and its multiple DSP cores with 64-bit vector processing arrays, and how to exploit the multi-level parallelism of GPDSP. There is currently no effective solution for convolutional neural network computing on GPDSP. A GPDSP-oriented convolutional neural network multi-core parallel computing method is urgently needed, one that takes advantage of the structural features and multi-level parallelism of GPDSP to improve the computational efficiency of convolutional neural networks.

Examples

Embodiment Construction

[0041] The present invention will be further described below in conjunction with the accompanying drawings and specific preferred embodiments, but the protection scope of the present invention is not limited thereby.

[0042] The simplified memory access structure model of the GPDSP adopted in this embodiment is shown in Figure 1. The system includes a CPU core unit and a DSP core unit. The DSP core unit includes several 64-bit vector processing array computing units together with dedicated on-chip scalar memory and vector array memory. The system also includes on-chip shared memory shared by the CPU core unit and the DSP core unit, and large-capacity off-chip DDR memory. In other words, the GPDSP contains multiple DSP cores with 64-bit vector processing arrays, which can process data in parallel through SIMD.
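The two levels of parallelism described in this paragraph (several DSP cores, each with a SIMD vector array) can be pictured with a minimal sketch. It is plain Python that only emulates the idea; the core count, the lane count, and the function names `split_across_cores` and `simd_multiply_accumulate` are hypothetical and are not taken from the patent.

```python
NUM_DSP_CORES = 4  # hypothetical core count, not stated in the source
VECTOR_LANES = 8   # hypothetical SIMD lane count, not stated in the source


def split_across_cores(n_items, n_cores=NUM_DSP_CORES):
    """Core-level parallelism: give each core a contiguous, near-equal range."""
    base, extra = divmod(n_items, n_cores)
    ranges, start = [], 0
    for core in range(n_cores):
        size = base + (1 if core < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def simd_multiply_accumulate(acc, x, w, lanes=VECTOR_LANES):
    """Lane-level parallelism: acc += x * w, handled `lanes` elements per step.

    Each loop iteration stands in for one vector instruction that updates
    all lanes at once; here the lanes are simply processed in a list
    comprehension.
    """
    for base in range(0, len(acc), lanes):
        chunk = slice(base, base + lanes)
        acc[chunk] = [a + xi * w for a, xi in zip(acc[chunk], x[chunk])]
    return acc
```

For example, `split_across_cores(10, 4)` assigns items 0-2, 3-5, 6-7, and 8-9 to the four emulated cores, and each core would then run `simd_multiply_accumulate` over its own share of the data.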

[0043] As shown in Figure 2, the convolutional neural network multi-core parallel computing method of this embodiment, oriented to GP...

Abstract

The invention discloses a GPDSP-oriented multi-core parallel computing method for convolutional neural networks. The method comprises the following steps: S1, the CPU core constructs two data buffer regions and one weight data buffer region in off-chip memory; S2, the CPU core combines a designated number of convolution kernels and stores the result in the weight data buffer region; S3, the CPU core reads a designated number of images to be computed, combines them, and transfers the data to a free data buffer region; S4, if the DSP cores are idle and the data in a data buffer region is ready, the address of that region is passed to the DSP cores; S5, all DSP cores perform the convolutional neural network calculation; S6, the result of the current calculation is output; S7, steps S3 to S6 are repeated until all calculations are completed. The method fully exploits the performance and multi-level parallelism of the CPU core and the DSP cores in the GPDSP and achieves efficient convolutional neural network calculation.
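Steps S1 to S7 describe a double-buffered producer/consumer pipeline, which can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: Python threads stand in for the CPU core and the DSP cores, plain lists stand in for the off-chip buffer regions, a 1-D valid cross-correlation stands in for the real CNN computation, and the names `run_pipeline`, `conv1d_valid`, and `NUM_DSP_CORES` are hypothetical.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

NUM_DSP_CORES = 4  # hypothetical core count, not stated in the source


def conv1d_valid(signal, kernel):
    """Stand-in for the per-core CNN computation (1-D valid cross-correlation)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def run_pipeline(images, kernels, batch_size=2):
    # S1: two data buffers and one weight buffer (lists stand in for DDR regions).
    data_buffers = [None, None]
    free_q, ready_q = queue.Queue(), queue.Queue()
    free_q.put(0)
    free_q.put(1)

    # S2: the CPU side packs the kernels into the weight buffer once.
    weight_buffer = list(kernels)

    def cpu_producer():
        # S3: pack a batch of images into whichever data buffer is free.
        for start in range(0, len(images), batch_size):
            idx = free_q.get()
            data_buffers[idx] = images[start:start + batch_size]
            ready_q.put(idx)  # S4: pass the ready buffer's index ("address") on
        ready_q.put(None)     # sentinel: no more batches

    def dsp_core(core_id, batch):
        # S5: each emulated core handles its round-robin share of the kernels.
        my_kernels = range(core_id, len(weight_buffer), NUM_DSP_CORES)
        return [(k, [conv1d_valid(img, weight_buffer[k]) for img in batch])
                for k in my_kernels]

    results = []
    producer = threading.Thread(target=cpu_producer)
    producer.start()
    with ThreadPoolExecutor(max_workers=NUM_DSP_CORES) as cores:
        while True:
            idx = ready_q.get()
            if idx is None:
                break
            batch = data_buffers[idx]
            per_core = list(cores.map(lambda c: dsp_core(c, batch),
                                      range(NUM_DSP_CORES)))
            # S6: merge per-core outputs into one result per image.
            by_kernel = dict(pair for out in per_core for pair in out)
            for i in range(len(batch)):
                results.append([by_kernel[k][i]
                                for k in range(len(weight_buffer))])
            free_q.put(idx)   # S7: the buffer is free again, so the loop repeats
    producer.join()
    return results
```

Because there are two data buffers, the producer can fill one while the emulated DSP cores are still working on the other, which is the overlap of data preparation and computation that the two-buffer scheme appears intended to provide.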

Description

Technical field

[0001] The present invention relates to the technical field of deep learning, in particular to a convolutional neural network multi-core parallel computing method oriented to GPDSP (General-Purpose Digital Signal Processor).

Background art

[0002] At present, deep learning models based on convolutional neural networks (CNN) have made remarkable achievements in image recognition and classification, machine translation, automatic text processing, speech recognition, automatic driving, video analysis, and other areas, and have become a research hotspot in various fields. A convolutional neural network is a deep feed-forward neural network, usually composed of several alternating convolutional layers, activation layers, and pooling layers. The convolutional layer extracts features by convolving the convolution kernel with the input features, thereby learning features at each level. In the calculation of c...
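The layer types named in the background (convolution, activation, pooling) can be illustrated with a small pure-Python sketch. It assumes a single channel, stride 1, no padding, a ReLU activation, and 2x2 max pooling; none of these choices come from the patent text, and the function names are hypothetical. As is common in CNN software, the "convolution" here is computed as a cross-correlation, that is, without flipping the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    """Activation layer: clamp negative responses to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Pooling layer: keep the maximum of each non-overlapping 2x2 window."""
    return [[max(fmap[r][c], fmap[r][c + 1],
                 fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]
```

A typical use chains the three stages, for example `max_pool2x2(relu(conv2d_valid(image, kernel)))`, which mirrors the alternating layer structure described above.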

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F15/80, G06N3/04, G06N3/063
CPC: G06F15/8069, G06N3/063, G06N3/045
Inventor: 刘仲, 郭阳, 扈啸, 田希, 陈海燕, 陈跃跃, 孙永节, 王丽萍
Owner: NAT UNIV OF DEFENSE TECH