Scale-extensible convolutional neural network acceleration system

A convolutional neural network acceleration system, applicable to biological neural network models, neural architectures, neural learning methods, and related fields. It addresses reduced accelerator efficiency, a high idle rate of computing units, and poor scalability, and achieves a simplified hardware design structure, extensive resource reuse, and good flexibility.

Pending Publication Date: 2022-05-20

南京广捷智能科技有限公司

Problems solved by technology

[0011] The systolic array structure has the advantages of a simple and regular design, easy realization of high parallelism, and relatively simple communication between computing units. However, it suffers from reduced accelerator efficiency, a high idle rate of computing units, and poor scalability.

Examples

Embodiment 1

[0043] Referring to Figure 1, this embodiment proposes a scalable convolutional neural network acceleration system comprising an XDMA module, a memory interface module, a synchronization module, a control module, an external memory, and at least one acceleration core.

[0044] The XDMA module handles data transmission between the host computer and the FPGA.

[0045] The memory interface module implements the logic that controls reads from and writes to the external memory.

[0046] The synchronization module handles cross-clock-domain data transmission between the XDMA module and both the acceleration core and the memory interface module.

[0047] The control module controls the operation of each functional module.

[0048] The external (off-chip) memory stores the data required by the acceleration core's operations and the data produced once those operations complete.

[0049] The acceleration core includes an operation unit, an input cache...
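The text says the system contains "at least one acceleration core" but, in the portion shown here, does not say how work is divided among cores. As a rough illustration only, the sketch below assumes output channels are split as evenly as possible across cores. The function name `split_channels` and the partitioning rule are hypothetical and are not taken from the patent.

```python
from typing import List


def split_channels(num_out_channels: int, num_cores: int) -> List[range]:
    """Assign contiguous output-channel ranges to each core (assumed scheme)."""
    if num_cores < 1:
        raise ValueError("the system needs at least one acceleration core")
    base, extra = divmod(num_out_channels, num_cores)
    parts: List[range] = []
    start = 0
    for core in range(num_cores):
        # The first `extra` cores take one additional channel each.
        size = base + (1 if core < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts


if __name__ == "__main__":
    for core_id, channels in enumerate(split_channels(10, 3)):
        print(core_id, list(channels))
```

Under this assumption, adding cores shortens each core's channel range without changing the per-core logic, which is one plausible reading of "scalable".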

Embodiment 2

[0099] Referring to Figure 1 and Figure 2, this embodiment proposes a scalable convolutional neural network acceleration method comprising the following steps:

[0100] S1: The XDMA module receives the original data (image data and weight data) sent by the host computer over the PCIe interface and stores it, through the synchronization module, in the corresponding address space of the external memory.

[0101] S2: Once the original data required for the operation is ready, the control module starts the acceleration core and directs the input cache unit and the weight cache unit to read the first set of data from the external memory and store it.

[0102] S3: The multiplier reads a set of data from the input cache unit and the weight cache unit, multiplies it, and stores the result in the on-chip cache. While this calculation is in progress, the input cache unit and the weight cache unit read and store the second set of data from the ex...
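Steps S2 and S3 describe loading the next set of data while the current set is being computed. A minimal software model of that overlap is sketched below, assuming two cache banks that alternate roles (a "ping-pong" arrangement) and assuming each set reduces to a dot product. The names `run_ping_pong` and `fetch` are hypothetical, and in hardware the prefetch and the computation would run concurrently rather than one after the other.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# One "set" of data: (input values, weight values).
DataSet = Tuple[Sequence[float], Sequence[float]]


def run_ping_pong(fetch: Callable[[int], DataSet], num_sets: int) -> List[float]:
    """Model 'compute set k while loading set k+1' with two buffers."""
    buffers: List[Optional[DataSet]] = [None, None]  # two banks (assumed)
    results: List[float] = []
    if num_sets <= 0:
        return results
    buffers[0] = fetch(0)  # S2: preload the first set
    for k in range(num_sets):
        active = k % 2
        if k + 1 < num_sets:
            # S3: prefetch the next set into the idle bank.
            buffers[1 - active] = fetch(k + 1)
        current = buffers[active]
        assert current is not None
        inputs, weights = current
        # Multiply-accumulate over the active bank.
        results.append(sum(x * w for x, w in zip(inputs, weights)))
    return results


if __name__ == "__main__":
    print(run_ping_pong(lambda k: ([k, k + 1], [1, 2]), 3))
```

Each set is fetched exactly once, and the compute stage never waits on a bank that is still being filled, which is the point of the overlap.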


Abstract

The invention discloses a scale-extensible convolutional neural network acceleration system comprising an XDMA module, a memory interface module, a synchronization module, a control module, an external memory, and at least one acceleration core. The main operation of a convolutional neural network is multiply-add calculation; implementing it in a dedicated circuit greatly improves performance and power consumption compared with a processor. A multi-channel parallel operation architecture is provided, overcoming the high idle rate and poor scalability of the systolic array architecture. The input data and the weight parameters are stored in an input cache unit and a weight cache unit respectively, so that data can be accessed efficiently during operation. The cache addresses of the weight cache unit and the input cache unit are switched according to a set rule, and the input data and weight parameters are fed sequentially into the operation unit for convolution. This unifies convolution operations of different sizes and different stride lengths and simplifies the hardware design structure.
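To make concrete the idea that one multiply-add loop can serve kernels of different sizes and strides, here is a minimal software sketch. It assumes a single channel, a square kernel, and no padding, and it computes cross-correlation, as is conventional for CNN layers. The function name `conv2d_mac` and the index arithmetic are illustrative assumptions, not the patent's address-switching rule.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_mac(image: Matrix, kernel: Matrix, stride: int = 1) -> Matrix:
    """Valid 2-D convolution expressed as nested multiply-accumulate."""
    k = len(kernel)
    height, width = len(image), len(image[0])
    out_h = (height - k) // stride + 1
    out_w = (width - k) // stride + 1
    out: Matrix = []
    for oy in range(out_h):
        row: List[float] = []
        for ox in range(out_w):
            acc = 0.0
            for ky in range(k):
                for kx in range(k):
                    # Only the read address depends on kernel size and stride;
                    # the multiply-add itself is identical in every case.
                    acc += image[oy * stride + ky][ox * stride + kx] * kernel[ky][kx]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    img = [[float(r * 4 + c + 1) for c in range(4)] for r in range(4)]
    ones2 = [[1.0, 1.0], [1.0, 1.0]]
    print(conv2d_mac(img, ones2, stride=2))
```

The same inner loop handles a 2x2 kernel with stride 2 and a 3x3 kernel with stride 1; only the address calculation changes.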

Description

Technical field

[0001] The invention relates to the field of convolutional neural network acceleration, and in particular to a scalable convolutional neural network acceleration system.

Background

[0002] Convolutional Neural Networks (CNN) are a type of Feedforward Neural Network that includes convolution calculations and has a deep structure, and they are one of the representative algorithms of deep learning. A convolutional neural network is capable of representation learning and can perform shift-invariant classification of input information according to its hierarchical structure, so it is also called a "Shift-Invariant Artificial Neural Network" (SIANN).

[0003] Research on convolutional neural networks began in the 1980s and 1990s. Time-delay networks and LeNet-5 were the earliest convolutional neural networks; after the turn of the 21st century, with the introduction of deep learning theory and the improvement of numerical calculation...


Application Information


IPC(8): G06N3/08, G06N3/04

CPC: G06N3/082, G06N3/045, Y02D10/00

Inventor: 沈琳喻

Owner: 南京广捷智能科技有限公司
