A Fully Pipelined Multiply-Add Cell Array Circuit for Convolutional Neural Networks

The invention relates to convolutional neural network and cell array technology, applied in the field of hardware implementation of artificial intelligence algorithms. It addresses the problems that existing MAC array circuit structures are difficult to realize and apply, have low hardware resource utilization, and require complex control logic. Its effects are higher data reuse in space and in time, higher utilization, and improved system performance.

Active Publication Date: 2021-09-17
HUAZHONG UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

In addition, convolutional neural networks involve a huge amount of computation and data, and the convolution operation consists largely of repeated multiply-add operations on data. Designing a data flow that maximizes data reuse in hardware, keeps the computing resources fully utilized, and keeps the control logic simple and easy to implement is a severe challenge for hardware designs of convolutional neural network algorithms.
[0004] The paper "Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA" (Kaiyuan Guo, Lingzhi Sui, Jiantao Qiu, Jincheng Yu, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 37, No. 1, 2018) discloses a MAC array circuit structure that unrolls convolution loops (1), (2) and (4) in parallel and achieves a high data reuse rate by computing with multiple multiplication units in parallel, followed by an adder tree. However, the parallel multiplication mode of this structure leaves the calculation units idle during most of the convolution operation, and the control logic is complex, so the structure suffers from low computing efficiency and low hardware resource utilization.
In lightweight or resource-limited applications, the MAC array circuit structure disclosed in that paper is difficult to realize and apply.
The data flows and MAC array circuit structures realized in current research struggle to achieve a high data reuse rate, high hardware resource utilization and high computing efficiency at the same time.
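To make the scale of the repeated multiply-add work concrete, the following is a minimal Python sketch of a direct convolution that also counts its MAC operations. It is only an illustration: the function name `conv2d_direct`, the tensor layouts, and the choice of stride 1 with no padding are assumptions, not details taken from the patent or from the cited paper.

```python
def conv2d_direct(ifmap, weights):
    """Direct convolution, stride 1, no padding (illustrative assumptions).

    ifmap:   nested lists indexed [channel][row][col]
    weights: nested lists indexed [filter][channel][row][col]
    Returns (ofmap, mac_count).
    """
    C, H, W = len(ifmap), len(ifmap[0]), len(ifmap[0][0])
    K, R, S = len(weights), len(weights[0][0]), len(weights[0][0][0])
    OH, OW = H - R + 1, W - S + 1
    ofmap = [[[0] * OW for _ in range(OH)] for _ in range(K)]
    macs = 0
    for k in range(K):                      # output channels (filters)
        for y in range(OH):                 # output rows
            for x in range(OW):             # output columns
                acc = 0
                for c in range(C):          # input channels
                    for r in range(R):      # kernel rows
                        for s in range(S):  # kernel columns
                            acc += ifmap[c][y + r][x + s] * weights[k][c][r][s]
                            macs += 1       # one multiply-accumulate
                ofmap[k][y][x] = acc
    return ofmap, macs
```

Every input pixel and every weight is read many times inside this loop nest, which is why the choice of data flow and the degree of data reuse matter so much for a hardware implementation.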

Embodiment Construction

[0026] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not constitute a conflict with each other.

[0027] Referring to Figure 1, an embodiment of the present application provides a MAC array comprising a plurality of MAC units. The MAC units are arranged as follows: a single MAC unit is repeated n times along the first direction A1, and the n MAC units are connected in cascade to form a MAC sub-module 102; the MAC sub-modules 102 are repea...
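One way to picture a cascaded sub-module of n MAC units is as a pipeline in which each unit adds its own product to the partial sum registered by the unit before it. The Python sketch below simulates that behaviour cycle by cycle. The paragraph above only states that the units are connected in cascade, so the partial-sum forwarding, the one-register-per-unit timing, and the name `cascade_dot_products` are all assumptions made for illustration.

```python
def cascade_dot_products(vectors, weights):
    """Cycle-level model of n cascaded MAC units (assumed behaviour).

    Unit j holds weights[j]. A new input vector enters the chain every
    cycle; unit j works on the vector that entered j cycles earlier and
    adds vectors[v][j] * weights[j] to the partial sum from unit j - 1.
    Returns (results, total_cycles).
    """
    n = len(weights)
    regs = [None] * n            # output register of each unit; None = bubble
    results = []
    total_cycles = len(vectors) + n - 1
    for t in range(total_cycles):
        new_regs = [None] * n
        for j in range(n):
            v = t - j            # index of the vector reaching unit j at cycle t
            if 0 <= v < len(vectors):
                psum_in = 0 if j == 0 else regs[j - 1]
                new_regs[j] = psum_in + vectors[v][j] * weights[j]
        regs = new_regs          # all registers update together on the clock edge
        if regs[n - 1] is not None:
            results.append(regs[n - 1])   # last unit emits a finished dot product
    return results, total_cycles
```

Under these assumptions, once the chain has filled (after n - 1 cycles) it produces one finished dot product per cycle and every unit is busy in every cycle, which is the kind of utilization a fully pipelined design is aiming for.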

Abstract

The invention discloses a fully pipelined multiply-add unit array circuit for convolutional neural networks. The circuit includes multiple multiply-add units arranged as follows: a single multiply-add unit is repeated n times along a first direction, and the n multiply-add units are connected in cascade to form a multiply-add sub-module; the multiply-add sub-module is repeated m times along a second direction to form a multiply-add core module; the multiply-add core module is repeated i times along a third direction to form an array circuit including n*m*i multiply-add units. Here m, n and i are integers not less than 2, and the first, second and third directions are different. The circuit can effectively improve the data reuse rate, substantially reduce the idle time of the operation units, and increase the efficiency of hardware implementations of the convolution operation.
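The three-level arrangement described in the abstract can be sketched as a small data structure. The Python below enumerates the n*m*i units and records, for each unit, the neighbour it is cascaded from inside its sub-module. The names `MacUnit` and `build_mac_array`, the coordinate order, and the representation of the cascade as an `upstream` link are hypothetical choices for illustration; the abstract does not specify how sub-modules or core modules are wired to one another, so no such links are modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MacUnit:
    core: int   # position along the third direction, 0 .. i-1
    sub: int    # position along the second direction, 0 .. m-1
    pos: int    # position along the first direction, 0 .. n-1
    upstream: Optional[Tuple[int, int, int]]  # cascaded predecessor, if any


def build_mac_array(n: int, m: int, i: int) -> List[MacUnit]:
    """Enumerate an array of n*m*i MAC units (n, m, i all at least 2)."""
    if min(n, m, i) < 2:
        raise ValueError("n, m and i must be integers not less than 2")
    units = []
    for core in range(i):
        for sub in range(m):
            for pos in range(n):
                # Only units inside the same sub-module are cascaded here.
                upstream = (core, sub, pos - 1) if pos > 0 else None
                units.append(MacUnit(core, sub, pos, upstream))
    return units
```

For example, `build_mac_array(3, 2, 4)` yields 24 units grouped into 8 cascaded chains of length 3.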

Description

Technical field

[0001] The invention belongs to the field of hardware implementation of artificial intelligence algorithms, and more specifically relates to a fully pipelined multiply-add unit (Multiplication and Accumulation, MAC) array circuit for convolutional neural networks.

Background

[0002] In the era of big data, the performance of traditional CPUs is no longer sufficient to support the large-scale data calculations of artificial intelligence algorithms; the structure of their general-purpose computing units greatly limits the speed of algorithm prediction and judgment. GPUs are difficult to deploy on a large scale because of their high cost and energy consumption. Therefore, dedicated hardware circuits for artificial intelligence algorithms that demand huge computation and throughput have broad application prospects.

[0003] The convolutional neural network algorithm is one of the most widely used algorithms in artificial intelli...

Application Information

Patent Type & Authority Patents(China)
IPC(8): G06F7/544, G06N3/04
CPC: G06F7/5443, G06N3/045
Inventor 刘冬生陆家昊成轩魏来刘子龙李奥博徐影雄马贤
Owner HUAZHONG UNIV OF SCI & TECH