
Scale-extensible convolutional neural network acceleration system

A convolutional neural network acceleration system technology, applied to biological neural network models, neural architectures, neural learning methods, etc. It addresses problems such as reduced accelerator efficiency, a high idle rate of computing units, and poor scalability, and achieves a simplified hardware design structure, extensive resource reuse, and good flexibility.

Pending Publication Date: 2022-05-20
南京广捷智能科技有限公司

AI Technical Summary

Problems solved by technology

[0011] The systolic array structure has the advantages of a simple and regular design, easily achieved high parallelism, and relatively simple communication between computing units. However, it suffers from a high idle rate of computing units, which reduces accelerator efficiency, and from poor scalability.



Examples


Embodiment 1

[0043] Referring to figure 1, this embodiment proposes a scale-extensible convolutional neural network acceleration system, comprising an XDMA module, a memory interface module, a synchronization module, a control module, an external memory, and at least one acceleration core;

[0044] The XDMA module is used for data transmission between the host computer and the FPGA;

[0045] The memory interface module implements the logic for controlling reads and writes to the external memory;

[0046] The synchronization module handles cross-clock-domain data transfer between the XDMA module and the acceleration core and memory interface module;

[0047] The control module controls the operation of each functional module;

[0048] The external memory (off-chip main memory) stores the data needed for the acceleration core's operations and the data generated once those operations complete.

[0049] The acceleration core includes an operation unit, an input cache...
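The module composition described in Embodiment 1 can be sketched as a minimal behavioral model. All class and method names here are illustrative assumptions; the patent describes hardware modules, not a software API:

```python
# Hypothetical behavioral model of the acceleration system in Embodiment 1.
# Names and sizes are illustrative, not taken from the patent.

class AccelerationCore:
    def compute(self, memory):
        # Placeholder for the operation unit plus input/weight caches
        # of the real core: here it just sums the first 16 bytes.
        return sum(memory[:16])

class AccelerationSystem:
    def __init__(self, num_cores=1):
        # Off-chip external memory holding operands and results.
        self.external_memory = bytearray(1 << 20)
        # At least one acceleration core, as the claim requires.
        self.cores = [AccelerationCore() for _ in range(num_cores)]

    def load_from_host(self, addr, data):
        """XDMA path: host computer -> (synchronization module) -> external memory."""
        self.external_memory[addr:addr + len(data)] = data

    def run(self):
        """Control module starts every acceleration core and collects results."""
        return [core.compute(self.external_memory) for core in self.cores]
```

The point of the sketch is the data path: the host only ever talks to external memory through the XDMA/synchronization path, and cores only ever read operands from that memory.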

Embodiment 2

[0099] Referring to figure 1 and figure 2, this embodiment proposes a scale-extensible convolutional neural network acceleration method:

[0100] S1: The XDMA module receives the original data (including image data and weight data) from the host computer over the PCIe interface and, through the synchronization module, stores it in the corresponding address space of the external memory;

[0101] S2: Once the original data required for the operation are ready, the control module starts the acceleration core and controls the input cache unit and the weight cache unit to read and store the first set of data from the external memory.

[0102] S3: The multiplier reads a set of data from the input cache unit and the weight cache unit, performs the multiplication, and stores the result in the on-chip cache. While the calculation is in progress, the input cache unit and the weight cache unit read and store the second set of data from the ex...
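Steps S2 and S3 describe a double-buffering (ping-pong) scheme: while the operation unit consumes the current set of data, the caches prefetch the next set from external memory. A minimal software sketch of that overlap, with illustrative names (the patent specifies the scheme, not this code):

```python
# Minimal ping-pong (double-buffer) model of steps S2-S3.

def conv_mac(inputs, weights):
    """Operation unit: multiply-accumulate over one data set."""
    return sum(x * w for x, w in zip(inputs, weights))

def run_pipeline(data_sets, weight_sets):
    # S2: load the first set into the input/weight cache units.
    buf_in, buf_w = data_sets[0], weight_sets[0]
    results = []
    for i in range(len(data_sets)):
        # S3: compute on the current set while (conceptually) the caches
        # fetch the next set from external memory in parallel.
        next_in = data_sets[i + 1] if i + 1 < len(data_sets) else None
        next_w = weight_sets[i + 1] if i + 1 < len(weight_sets) else None
        results.append(conv_mac(buf_in, buf_w))  # result goes to on-chip cache
        buf_in, buf_w = next_in, next_w          # buffers swap roles
    return results
```

In hardware the compute and the prefetch genuinely overlap in time; the sequential loop above only models the buffer hand-off, not the concurrency.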



Abstract

The invention discloses a scale-extensible convolutional neural network acceleration system. The system comprises an XDMA module, a memory interface module, a synchronization module, a control module, an external memory, and at least one acceleration core. The main operation of a convolutional neural network is multiply-and-add calculation; realizing this calculation with a dedicated circuit greatly improves performance and power consumption compared with a processor. A multi-channel parallel operation framework is provided, overcoming the high idle rate and poor extensibility of the systolic array framework. The input data and the weight parameters are stored separately in an input cache unit and a weight cache unit, so that data can be accessed efficiently during operation. The weight parameter cache and the cache addresses of the input cache unit are switched according to a set rule, and the contents of the input cache unit and the weight parameters are fed into the operation unit in sequence for convolution; this unifies convolution operations of different sizes and different step lengths and simplifies the hardware design structure.
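The multiply-and-add operation the abstract identifies as the core workload is the standard convolution multiply-accumulate (MAC). A plain software reference version, for orientation only (this is the generic algorithm, not the patent's dedicated circuit):

```python
# Reference 2-D convolution written as explicit multiply-accumulate loops
# (valid padding, stride 1). Software model only, not the patented hardware.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1       # output height
    ow = len(image[0]) - kw + 1    # output width
    out = [[0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    # The innermost multiply-add is what the dedicated
                    # circuit accelerates across parallel channels.
                    acc += image[r + i][c + j] * kernel[i][j]
            out[r][c] = acc
    return out
```

Every output pixel is an independent sum of products, which is why the workload parallelizes well across the multiple channels the abstract describes.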

Description

technical field

[0001] The invention relates to the field of convolutional neural network acceleration, and in particular to a scale-extensible convolutional neural network acceleration system.

Background technique

[0002] Convolutional Neural Networks (CNN) are a class of feedforward neural networks that include convolution calculations and have a deep structure, and are among the representative algorithms of deep learning. Convolutional neural networks have representation-learning capability and can perform shift-invariant classification of input information according to their hierarchical structure, so they are also called Shift-Invariant Artificial Neural Networks (SIANN).

[0003] Research on convolutional neural networks began in the 1980s and 1990s. Time-delay networks and LeNet-5 were the earliest convolutional neural networks. After the turn of the 21st century, with the introduction of deep learning theory and the improvement of numerical calculation...

Claims


Application Information

IPC(8): G06N3/08, G06N3/04
CPC: G06N3/082, G06N3/045, Y02D10/00
Inventor 沈琳喻
Owner 南京广捷智能科技有限公司