Efficient Configurable Convolution Computing Accelerator for Convolutional Neural Networks

A convolutional neural network accelerator technology, applied to the hardware structure of general-purpose convolutional neural network accelerators.

Active Publication Date: 2021-12-03
南京风兴科技有限公司
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

Most convolutional networks use 3*3 or 5*5 convolution kernels; a small number use larger kernels such as 7*7 and 11*11, and other sizes also occur. ... not used effectively



Examples


Embodiment Construction

[0043] This section introduces the configuration of the RCC structure and its implementation in the different modes. Its input and output interface names correspond one-to-one with those in Figure 2.

[0044] To enable 3*3 mode, set the control signals {cs_3, cs_7, cs_11} to {1, 0, 0}. Each of the two fast convolution modules then implements three independent 3*3 convolution calculations. The input and output data streams of the three independent 3*3 convolutions performed by the first fast convolution module are shown in Table 1. The three sets of convolution input and output data in the second fast convolution module follow the same pattern; simply replace the subscript a in Table 1 with b.

[0045]

[0046] Table 1. Input and output data flow of 3*3 mode

[0047] To enable 5*5 mode, set control signals {cs_3, cs_7, cs_11} to {0, 0, 0}. The two fast convolution modules realize two 6*6 convolution calculations in total, and realize 5*5 convolutio...
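The mode selection described in [0044] and [0047] can be sketched as a small lookup table. This is a hypothetical illustration: the function name `control_signals` and the dictionary layout are assumptions, and only the two settings stated in the text are filled in. The settings for the 7*7 and 11*11 modes are not given in this extract, so the sketch refuses to guess them.

```python
# Hypothetical lookup of RCC control signals by kernel size.
# Only the 3*3 and 5*5 settings come from the text; nothing else is assumed.
KNOWN_MODES = {
    3: {"cs_3": 1, "cs_7": 0, "cs_11": 0},  # 3*3 mode, paragraph [0044]
    5: {"cs_3": 0, "cs_7": 0, "cs_11": 0},  # 5*5 mode, paragraph [0047]
}


def control_signals(kernel_size):
    """Return the {cs_3, cs_7, cs_11} setting for a documented mode."""
    if kernel_size not in KNOWN_MODES:
        # The 7*7 and 11*11 settings are not stated in this extract.
        raise ValueError(f"no documented setting for {kernel_size}*{kernel_size} mode")
    return dict(KNOWN_MODES[kernel_size])


if __name__ == "__main__":
    print(control_signals(3))
    print(control_signals(5))
```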


Abstract

The invention discloses an efficient, configurable convolution computing accelerator for convolutional neural networks. Through configuration, the structure can efficiently implement convolution for the four mainstream kernel sizes used in convolutional neural networks, as well as all sizes below 12*12, while significantly reducing the complexity of the convolution calculation. The invention first introduces a hardware structure based on the fast FIR algorithm (FFIR), then cascades a 3-parallel FFIR onto a 2-parallel FFIR structure to form a 6-parallel FFIR (6P-FFIR), and uses a compressor to optimize the 6P-FFIR. Based on the 6P-FFIR structure, an efficient and configurable convolution computing accelerator (RCC) is designed. Compared with a traditional FIR filter, the invention saves 33% to 47% of the multiplications when computing convolutions of the four mainstream sizes. The architecture can therefore save substantial hardware area and power consumption. It is well suited to scenarios with strict power requirements, such as the Internet of Things and embedded chips, and it can also be used where convolutions of various sizes are required, improving the effective throughput of the system.
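To illustrate the idea behind the fast FIR algorithm mentioned in the abstract, here is a minimal sketch of the textbook 2-parallel form in plain Python. It is not the patent's 6P-FFIR or RCC structure; the function names `direct_convolve` and `ffa2_convolve` are assumptions made for this example. The input and the filter are each split into even and odd samples, and the output is rebuilt from three half-length sub-filters instead of the four that a direct 2-parallel filter would need.

```python
def direct_convolve(x, h):
    """Plain 1D convolution, used as the reference and as the sub-filter."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def ffa2_convolve(x, h):
    """2-parallel fast FIR: three half-length sub-filters instead of four.

    Assumes x and h are non-empty sequences of numbers.
    """
    out_len = len(x) + len(h) - 1
    # Zero-pad to even length so both sequences split cleanly.
    x = list(x) + [0.0] * (len(x) % 2)
    h = list(h) + [0.0] * (len(h) % 2)
    x0, x1 = x[0::2], x[1::2]  # even / odd input samples
    h0, h1 = h[0::2], h[1::2]  # even / odd filter taps

    a = direct_convolve(x0, h0)  # H0 * X0
    b = direct_convolve(x1, h1)  # H1 * X1
    c = direct_convolve([p + q for p, q in zip(x0, x1)],
                        [p + q for p, q in zip(h0, h1)])  # (H0+H1) * (X0+X1)

    n = len(a)
    y0 = [0.0] * (n + 1)  # even outputs: a plus b delayed by one sample
    y1 = [0.0] * (n + 1)  # odd outputs: c - a - b
    for k in range(n):
        y0[k] += a[k]
        y0[k + 1] += b[k]
        y1[k] = c[k] - a[k] - b[k]

    y = [0.0] * (2 * (n + 1))
    y[0::2] = y0
    y[1::2] = y1
    return y[:out_len]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    h = [1, 0, -1]
    print(ffa2_convolve(x, h))
    print(direct_convolve(x, h))
```

With three sub-filters in place of four, this 2-parallel form needs 25% fewer multiplications than the direct 2-parallel filter, at the cost of extra additions. The 33% to 47% savings quoted in the abstract refer to the patent's own structure and are not reproduced by this sketch.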

Description

Technical field

[0001] The invention relates to the fields of integrated circuits and machine learning, and in particular to the hardware structure of a general-purpose convolutional neural network accelerator that efficiently implements convolutions of the four sizes 3*3, 5*5, 7*7 and 11*11 in a convolutional neural network, and can also implement convolutions of 12*12 and all smaller sizes.

Background

[0002] The convolutional neural network (CNN) is currently one of the most studied and most widely used machine learning algorithms. Convolution is the part of a CNN that consumes the most computing resources. Most convolutional neural network models now run on cloud platforms built around CPUs or GPUs. As artificial intelligence technology advances and spreads, demand is also growing for convolutional neural networks in embedded systems and real-time systems, which place strict requirements on hardware resources,...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06N3/063
CPC: G06N3/063
Inventors: 王中风, 王昊楠, 林军
Owner: 南京风兴科技有限公司