
Separable array-based reconfigurable accelerator and realization method thereof

A reconfigurable accelerator and its realization method, applied to physical implementation, neural learning methods, biological neural network models, and related fields, addressing problems such as wasted computing and bandwidth resources and low utilization of memory bandwidth.

Active Publication Date: 2017-11-10
TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

In a hybrid neural network, the convolutional network is computationally intensive: each data transfer can participate in dozens (or even hundreds) of convolution operations, so the convolutional network needs only a fraction of the memory bandwidth to keep all computing resources supplied with data, resulting in low utilization of memory bandwidth.
Conversely, fully connected networks and recurrent networks are memory-intensive: each data transfer participates in only one operation. These two networks can therefore feed only a fraction of the computing resources even when using all of the memory bandwidth, resulting in low utilization of computing resources.
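To make the bandwidth argument concrete, the sketch below models a layer by its reuse factor, i.e. the number of multiply-accumulate operations each fetched datum participates in. The throughput, bandwidth, and reuse numbers are assumptions chosen for illustration, not figures from the patent.

```python
# Back-of-the-envelope model (all numbers are assumptions, not from the patent) of how
# data reuse decides whether compute or memory bandwidth becomes the idle resource.

def utilization(reuse_factor, peak_ops_per_s, bandwidth_bytes_per_s, bytes_per_datum=2):
    """Return (compute utilization, bandwidth utilization) for a layer in which every
    fetched datum participates in `reuse_factor` multiply-accumulate operations."""
    ops_fed = (bandwidth_bytes_per_s / bytes_per_datum) * reuse_factor  # ops the memory can feed
    achieved_ops = min(peak_ops_per_s, ops_fed)
    compute_util = achieved_ops / peak_ops_per_s
    needed_bw = (achieved_ops / reuse_factor) * bytes_per_datum         # bandwidth those ops require
    return compute_util, needed_bw / bandwidth_bytes_per_s

PEAK_OPS = 1e12    # assumed 1 TOPS of compute
BANDWIDTH = 25e9   # assumed 25 GB/s of memory bandwidth

# Convolution: one fetched datum feeds ~200 operations -> compute saturates, bandwidth idles.
print("conv:", utilization(200, PEAK_OPS, BANDWIDTH))   # (1.0, 0.4)
# Fully connected / recurrent: one fetched datum feeds one operation -> bandwidth saturates, compute idles.
print("fc:  ", utilization(1, PEAK_OPS, BANDWIDTH))     # (0.0125, 1.0)
```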
[0006] Second, the waste of resources caused by sparsity.
Fully connected networks have a very high degree of sparsity, so exploiting sparse computation to accelerate them improves both performance and energy efficiency. Existing convolution accelerators, however, are not compatible with sparse network computation, so computing resources and bandwidth resources are wasted at the same time.
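As an illustration of the sparsity point, the sketch below contrasts a dense fully-connected layer with one that stores only its non-zero weights and skips the rest. The shapes, the 90% sparsity level, and the function names are assumptions made for this example, not the patent's scheme.

```python
import numpy as np

# Hypothetical illustration (not the patent's dataflow) of how sparsity cuts the work
# of a fully-connected layer: only non-zero weights cost a multiply-accumulate.

def dense_fc(x, W):
    """Dense fully-connected layer: every weight costs one MAC."""
    return W @ x

def sparse_fc(x, W):
    """Sparse fully-connected layer: skip zero weights entirely."""
    y = np.zeros(W.shape[0])
    rows, cols = np.nonzero(W)        # index lists, as a compressed sparse format would store
    for r, c in zip(rows, cols):
        y[r] += W[r, c] * x[c]        # one MAC per non-zero weight only
    return y

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))
W[rng.random(W.shape) < 0.9] = 0.0    # 90% sparsity, typical of a pruned FC layer
x = rng.standard_normal(256)

assert np.allclose(dense_fc(x, W), sparse_fc(x, W))
print("MACs dense :", W.size)
print("MACs sparse:", np.count_nonzero(W))   # roughly 10x fewer operations
```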

Method used




Embodiment Construction

[0078] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0079] Figure 1 is a schematic structural diagram of a reconfigurable accelerator based on a divisible array according to an embodiment of the present invention. As shown in Figure 1, the reconfigurable accelerator includes: a scratch-pad memory buffer (Scratch-Pad Memory Buffer, SPM buffer for short), a register buffer, and a partitionable computing array (computing array). The register buffer is connected to the computing array, an...
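A minimal sketch of that three-block structure is given below. The unit count, buffer sizes, and field names are assumptions made for illustration, since the excerpt does not specify them.

```python
from dataclasses import dataclass, field

# Hypothetical sketch (names and sizes are assumptions, not from the patent text) of the
# three blocks described above: an SPM buffer, a register buffer, and a computing array
# that can be partitioned into a convolution part and a sparse fully-connected part.

@dataclass
class ComputeUnit:
    mode: str = "conv"            # each reconfigurable unit runs either "conv" or "sparse_fc"

@dataclass
class ReconfigurableAccelerator:
    spm_kb: int = 256             # scratch-pad memory buffer, used for data reuse
    register_buffer_entries: int = 64
    units: list = field(default_factory=lambda: [ComputeUnit() for _ in range(16 * 16)])

    def partition(self, conv_units: int):
        """Split the array: the first `conv_units` units do convolution,
        the remaining units do sparse fully-connected computation."""
        for i, u in enumerate(self.units):
            u.mode = "conv" if i < conv_units else "sparse_fc"

acc = ReconfigurableAccelerator()
acc.partition(conv_units=192)     # e.g. 192 units for conv, 64 for sparse FC
print(sum(u.mode == "conv" for u in acc.units), "conv units,",
      sum(u.mode == "sparse_fc" for u in acc.units), "sparse-FC units")
```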



Abstract

The invention provides a separable array-based reconfigurable accelerator and a realization method thereof. The reconfigurable accelerator comprises a scratch-pad memory buffer, a separable computing array, and a register buffer. The scratch-pad memory buffer enables reuse of the data of convolution computation and sparse fully-connected computation. The separable computing array comprises multiple reconfigurable computing units and is divided into a convolution computing array and a sparse fully-connected computing array. The register buffer is a storage area formed by multiple registers, and provides the input data, weight data, and corresponding output results for convolution computation and sparse fully-connected computation. Input data and weight data of convolution computation are fed into the convolution computing array, which outputs the convolution result; input data and weight data of sparse fully-connected computation are fed into the sparse fully-connected computing array, which outputs the sparse fully-connected result. By fusing the characteristics of the two neural networks, the accelerator improves the utilization of both the chip's computing resources and its memory bandwidth.
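The closing claim about improved utilization can be illustrated with a rough model: when the convolution partition and the sparse fully-connected partition run side by side, one saturates the compute it owns while the other soaks up the leftover bandwidth. All numbers below are assumptions for illustration, not measurements from the patent.

```python
# Rough model (all numbers are assumptions, not measurements from the patent) of why
# splitting the array between the two workloads lifts both utilization ratios.

PEAK_OPS   = 1e12   # assumed total compute throughput, ops/s
BANDWIDTH  = 25e9   # assumed total memory bandwidth, bytes/s
BYTES      = 2      # assumed 16-bit data
REUSE_CONV = 200    # each fetched datum feeds ~200 conv operations
REUSE_FC   = 1      # each fetched weight feeds 1 sparse-FC operation

def fused_utilization(conv_compute_share):
    """Utilization when a `conv_compute_share` fraction of the array runs convolution
    and the sparse-FC partition consumes whatever bandwidth is left over."""
    conv_ops = conv_compute_share * PEAK_OPS
    conv_bw = conv_ops / REUSE_CONV * BYTES        # bandwidth the conv partition needs
    fc_bw = max(BANDWIDTH - conv_bw, 0.0)          # leftover bandwidth goes to sparse FC
    fc_ops = fc_bw / BYTES * REUSE_FC              # ops that bandwidth can feed
    compute_util = min((conv_ops + fc_ops) / PEAK_OPS, 1.0)
    bandwidth_util = min((conv_bw + fc_bw) / BANDWIDTH, 1.0)
    return compute_util, bandwidth_util

for share in (0.5, 0.8, 0.95):
    print(share, fused_utilization(share))
# e.g. share = 0.95 gives roughly (0.96, 1.0): bandwidth is fully used and compute
# utilization stays high, whereas running the sparse-FC workload alone would leave
# almost 99% of the compute idle (see the reuse model earlier on this page).
```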

Description

Technical field

[0001] The invention relates to neural network accelerator technology, and in particular to a reconfigurable accelerator based on a divisible array and its realization method.

Background technique

[0002] Over the past ten years, deep learning (Deep Learning) technology has driven the rapid development of artificial intelligence. Artificial intelligence based on deep learning has achieved great success in image recognition, video analysis, speech recognition and natural semantic understanding, and in some scenarios it even surpasses human intelligence. The deep neural network (Deep Neural Network) based on deep learning is the core technology for realizing intelligent tasks. At this stage, an intelligent task is often composed of multiple deep neural networks. The current mainstream deep neural networks mainly include: the deep convolutional network (Deep Convolution Neural Network, CNN), the deep fully-connected network (Deep Full Conne...

Claims


Application Information

IPC(8): G06N 3/063, G06N 3/08
CPC: G06N 3/063, G06N 3/084
Inventor: 尹首一, 唐士斌, 欧阳鹏, 涂锋斌, 刘雷波, 魏少军
Owner: TSINGHUA UNIV