A Deep Learning Accelerator for Stacked Hourglass Networks

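A deep learning accelerator for stacked hourglass networks, applied in the field of neural network training. It addresses three problems of earlier accelerators: memory access delay that occupies most of the hardware running time, reduced battery life, and the lack of effective acceleration for this network structure. It does so by improving data acquisition efficiency, improving the utilization of cached data, and reducing memory access delay.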

Active Publication Date: 2021-04-13
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

[0004] The stacked hourglass network structure uses a large number of depthwise separable convolution modules and multi-level residual structures. During computation, these layers require many calculation units to access memory to fetch the data they need, and the delay incurred by these memory accesses takes up most of the hardware running time. Earlier deep neural network accelerators did not provide computing circuits optimized for the memory access patterns of this network structure, so they cannot accelerate it effectively.
At the same time, the additional memory accesses caused by unoptimized circuit design consume extra power, which greatly reduces the battery life of devices equipped with this type of accelerator unit.
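The memory-bound behaviour described in [0004] can be illustrated with a rough cost model. The sketch below is not taken from the patent: it assumes a stride-1, same-padded layer, counts every tensor as moved exactly once, and assumes the intermediate depthwise result is written out and read back. The function names are hypothetical.

```python
def conv_cost(h, w, c_in, c_out, k, depthwise_separable):
    """Return (macs, elements_moved) for one stride-1, same-padded conv layer.

    Idealized assumption: inputs, weights, and outputs are each moved once.
    For the separable case the intermediate feature map between the depthwise
    and pointwise stages is also written and read back.
    """
    if not depthwise_separable:
        macs = h * w * c_in * c_out * k * k
        moved = h * w * c_in + k * k * c_in * c_out + h * w * c_out
        return macs, moved
    # Depthwise stage: one k x k filter per input channel.
    dw_macs = h * w * c_in * k * k
    dw_moved = h * w * c_in + k * k * c_in + h * w * c_in
    # Pointwise stage: 1 x 1 convolution mixing channels.
    pw_macs = h * w * c_in * c_out
    pw_moved = h * w * c_in + c_in * c_out + h * w * c_out
    return dw_macs + pw_macs, dw_moved + pw_moved


def macs_per_element(h, w, c_in, c_out, k, depthwise_separable):
    """Arithmetic intensity: multiply-accumulates per element moved."""
    macs, moved = conv_cost(h, w, c_in, c_out, k, depthwise_separable)
    return macs / moved


if __name__ == "__main__":
    shape = dict(h=64, w=64, c_in=256, c_out=256, k=3)
    for separable in (False, True):
        macs, moved = conv_cost(depthwise_separable=separable, **shape)
        print(separable, macs, moved, macs / moved)
```

Under these assumptions the separable layer performs far fewer multiply-accumulates but moves more data per useful operation, which is one way to see why memory access, rather than arithmetic, can dominate the running time.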

Embodiment Construction

[0046] The preferred embodiments of the present invention will be described below in conjunction with the accompanying drawings. It should be understood that the preferred embodiments described here are only used to illustrate and explain the present invention, and are not intended to limit the present invention.

[0047] As shown in Figure 1, this embodiment discloses a deep learning accelerator suitable for a stacked hourglass network, including a control module 1, a data calculation module 2, and a data cache module 3;

[0048] The control module 1 is connected to the main control processor, and is used to receive the control signal input by the main control processor, and control the data calculation module 2 and the data cache module 3 according to the control signal;

[0049] Specifically, as shown in Figure 2, the data calculation module 2 includes a plurality of layer calculation units 21; the layer calculation units 21 are used to perform data processing op...
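The relationship between the three modules in [0047] to [0049] can be sketched in software. This is only an illustrative model, not the patented circuit: the class names mirror the module names in the text, while the control signal format (a dict with `unit`, `src`, and `dst` keys) and the use of Python lists as buffers are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataCacheModule:
    """Stand-in for data cache module 3: named on-chip buffers (assumed)."""
    buffers: Dict[str, List[float]] = field(default_factory=dict)

    def load(self, name: str, data: List[float]) -> None:
        self.buffers[name] = list(data)

    def read(self, name: str) -> List[float]:
        return self.buffers[name]


@dataclass
class LayerCalculationUnit:
    """Stand-in for a layer calculation unit 21: applies one operation."""
    op: Callable[[List[float]], List[float]]

    def run(self, data: List[float]) -> List[float]:
        return self.op(data)


@dataclass
class DataCalculationModule:
    """Stand-in for data calculation module 2: holds several layer units."""
    units: List[LayerCalculationUnit]

    def run(self, unit_index: int, data: List[float]) -> List[float]:
        return self.units[unit_index].run(data)


@dataclass
class ControlModule:
    """Stand-in for control module 1: drives the other two modules."""
    calc: DataCalculationModule
    cache: DataCacheModule

    def handle(self, signal: Dict[str, object]) -> List[float]:
        # Hypothetical signal format: which unit to run, and which cache
        # buffers to read from and write to.
        data = self.cache.read(signal["src"])
        result = self.calc.run(signal["unit"], data)
        self.cache.load(signal["dst"], result)
        return result


def relu(values: List[float]) -> List[float]:
    return [max(0.0, v) for v in values]


if __name__ == "__main__":
    cache = DataCacheModule()
    cache.load("x", [-1.0, 2.0])
    control = ControlModule(DataCalculationModule([LayerCalculationUnit(relu)]), cache)
    print(control.handle({"unit": 0, "src": "x", "dst": "y"}))
```

Here the main control processor is represented only by whoever calls `handle`; the point of the sketch is that the control module never computes anything itself and only routes data between the cache and the layer units.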

Abstract

The invention discloses a deep learning accelerator suitable for a stacked hourglass network. Layer calculation units that operate in parallel increase computational parallelism, and the data cache module improves the utilization of data loaded into the accelerator's internal cache while speeding up computation. In addition, the data adjuster inside the accelerator adapts the order in which data is arranged to the operation of the current computing layer, which makes each fetch return more complete data, improves the efficiency of data acquisition, and reduces the delay of the memory access process. As a result, while increasing the computing speed of the algorithm, the accelerator lowers its memory bandwidth requirement by reducing the number of memory accesses and improving memory access efficiency, thereby improving its overall acceleration performance.
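The abstract does not say how the data adjuster reorders data, so the following is only a sketch of why reordering can help, under stated assumptions: a feature map stored linearly in either channel-major (`CHW`) or channel-last (`HWC`) order, and a cost measured as the number of contiguous address runs ("bursts") a layer must fetch. All function names are hypothetical.

```python
def address(layout, c, y, x, C, H, W):
    """Linear address of element (c, y, x) under the given layout."""
    if layout == "CHW":
        return (c * H + y) * W + x
    if layout == "HWC":
        return (y * W + x) * C + c
    raise ValueError(layout)


def count_bursts(addresses):
    """Number of contiguous address runs needed to fetch the given addresses."""
    addrs = sorted(set(addresses))
    if not addrs:
        return 0
    return 1 + sum(1 for a, b in zip(addrs, addrs[1:]) if b != a + 1)


def pointwise_reads(layout, y, x, C, H, W):
    """A 1 x 1 convolution reads every channel at one pixel."""
    return [address(layout, c, y, x, C, H, W) for c in range(C)]


def depthwise_reads(layout, c, y0, x0, k, C, H, W):
    """A depthwise convolution reads a k x k window of a single channel."""
    return [address(layout, c, y, x, C, H, W)
            for y in range(y0, y0 + k) for x in range(x0, x0 + k)]


def best_layout(reads_fn):
    """Pick the layout that needs the fewest bursts for this access pattern."""
    return min(("CHW", "HWC"), key=lambda layout: count_bursts(reads_fn(layout)))


if __name__ == "__main__":
    C, H, W = 8, 4, 4
    print(best_layout(lambda l: pointwise_reads(l, 1, 2, C, H, W)))
    print(best_layout(lambda l: depthwise_reads(l, 3, 0, 0, 3, C, H, W)))
```

In this toy model the pointwise access pattern is cheapest in channel-last order and the depthwise pattern is cheapest in channel-major order, which is the kind of layer-dependent preference that a layout-adjusting stage could exploit.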

Description

Technical field

[0001] The invention belongs to the field of neural network training, and in particular relates to a deep learning accelerator suitable for a stacked hourglass network.

Background technique

[0002] A deep neural network (DNN) is an algorithm model in deep learning. Because it outperforms traditional algorithms, it has been widely used in fields such as image classification, target recognition, and gesture recognition. Deep neural networks require a large amount of computation. Traditional general-purpose processors are limited by their architecture, compute slowly, and cannot meet the needs of real-time applications. It is therefore necessary to design dedicated neural network accelerators that provide hardware support for real-time deep neural network computation.

[0003] In the application of gesture recognition, a deep neural network structure called stacked hourglass network (Stacked...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06N3/063, G06N3/08, G06N3/04
CPC: G06N3/063, G06N3/08, G06N3/045
Inventor: 栗涛, 陈弟虎, 梁东宝, 萧嘉乐, 叶灵昶
Owner: SUN YAT SEN UNIV