Split accumulator for convolutional neural network accelerator

A convolutional neural network accelerator technology, applied in the field of neural network computation, that addresses the problems of reduced accelerator frequency, added design complexity, and unpredictable cycles, with the effects of reducing zero slack and reducing the number of weights.

Pending Publication Date: 2021-11-18
INST OF COMPUTING TECH CHINESE ACAD OF SCI

AI Technical Summary

Benefits of technology

The invention is a split accumulator for a convolutional neural network accelerator that uses a weight kneading technique for acceleration. This technique improves the efficiency of modern DCNN models by reducing the number of weights without loss of accuracy. The patent also describes the architecture of a Tetris accelerator that further accelerates convolution computation. The technical effects of the invention are increased efficiency and speed in the acceleration of convolutional neural networks.

Problems solved by technology

Different weights or activations may incur different latencies during acceleration, so cycle counts become unpredictable.
Hardware design must cover the worst case; when the worst-case cycle is taken as the cycle of the accelerator, the processing cycle lengthens and the frequency of the accelerator drops, while the design also becomes more complex.
Moreover, different DCNN models require different accuracies, and even different layers of the same model have different accuracy requirements, so the multiplier designed for the convolutional neural network accelerator must cover the worst case.
A main component of the typical DCNN accelerator is the multiply-adder, and the main problem of the multiply-adder is that it performs invalid operations.
Secondly, the intermediate results of the multiply-add operation are useless.
However, such solutions necessarily sacrifice the accuracy of the result; on large data sets in particular, their accuracy is seriously degraded.
However, it is impossible to avoid computing the zero bits.
However, this objective is quite difficult to realize, because it requires modifying the current MAC computing mode and reconstructing the hardware architecture to support a new computing mode.



Examples


Embodiment Construction

[0027]The invention reconstructs the inference computing mode of the DCNN model, replacing the typical MAC computing mode with a split accumulator (SAC). The typical multiplication operation is eliminated and replaced by a series of low-cost adders. The invention makes full use of the essential bits in the weights, and the SAC is formed of adders and shifters without multipliers. In a traditional multiplier, each weight/activation pair performs its own shift-and-add operation, where “weight/activation” means “weight and activation”. The invention instead performs several accumulations over multiple weight/activation pairs but only one shift-and-add summation, thereby achieving a large speedup.
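The contrast between MAC and SAC can be illustrated with a minimal software sketch. It is not the patented hardware design: it assumes unsigned integer weights of a fixed bit width (sign handling is omitted), and the names `mac_dot`, `sac_dot`, and `bits` are hypothetical. Each set ("essential") weight bit routes its activation into an accumulator for that bit position, and a single shift-and-add pass at the end combines the accumulators.

```python
def mac_dot(weights, activations):
    """Reference MAC: one multiplication per weight/activation pair."""
    return sum(w * a for w, a in zip(weights, activations))


def sac_dot(weights, activations, bits=8):
    """Split-accumulator sketch (assumes unsigned weights below 2**bits).

    Activations are accumulated per weight bit position using additions
    only; zero ("slack") bits contribute nothing and are skipped.
    """
    segment_sums = [0] * bits  # one accumulator per bit position
    for w, a in zip(weights, activations):
        for b in range(bits):
            if (w >> b) & 1:  # essential bit: add the activation
                segment_sums[b] += a
    # Single shift-and-add pass over the per-bit accumulators.
    return sum(s << b for b, s in enumerate(segment_sums))


weights = [5, 3, 0, 12]
activations = [2, 7, 9, 1]
result_mac = mac_dot(weights, activations)
result_sac = sac_dot(weights, activations, bits=4)
```

Both functions return the same dot product for these inputs; the SAC version reaches it without any multiplication, which is the property the paragraph above describes.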

[0028]Finally, the invention proposes a Tetris accelerator to tap the maximum potential of the weight kneading technique and the split accumulator SAC. The Tetris accelerator is formed of a series of SAC units, and uses the kneading weight...



Abstract

Disclosed embodiments relate to a split accumulator for a convolutional neural network accelerator, comprising: arranging original weights in a computation sequence and aligning them by bit to obtain a weight matrix; removing slack bits in the weight matrix and allowing essential bits in each column of the weight matrix to fill the vacancies according to the computation sequence to obtain an intermediate matrix; removing null rows in the intermediate matrix to obtain a kneading matrix, wherein each row of the kneading matrix serves as a kneading weight; obtaining positional information of the activation corresponding to each bit of the kneading weight; dividing the kneading weight by bit into multiple weight segments; processing summation of the weight segments and the corresponding activations according to the positional information; and sending a processing result to an adder tree to obtain an output feature map by executing shift-and-add on the processing result.
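The kneading steps in the abstract can be sketched in software as follows. This is an illustrative model under stated assumptions, not the claimed circuit: weights are taken to be unsigned integers of a fixed width, positional information is modeled as the index of the source activation stored in each bit slot (`None` marks an empty slot), and the adder tree is stood in for by a plain sum. The function names `knead` and `sac_from_kneaded` are hypothetical.

```python
def knead(weights, bits=8):
    """Knead weights: per bit column, pack essential bits upward.

    Returns rows (kneaded weights); each slot holds the index of the
    activation belonging to that essential bit, or None if empty.
    """
    # For each bit position, list the weights (in computation order)
    # that have an essential (1) bit there; slack (0) bits are dropped.
    columns = [[i for i, w in enumerate(weights) if (w >> b) & 1]
               for b in range(bits)]
    # Rows that would be all-empty ("null rows") are never generated.
    depth = max((len(col) for col in columns), default=0)
    return [[col[r] if r < len(col) else None for col in columns]
            for r in range(depth)]


def sac_from_kneaded(kneaded, activations, bits=8):
    """Accumulate activations per bit segment, then shift-and-add once."""
    segments = [0] * bits
    for row in kneaded:
        for b, idx in enumerate(row):
            if idx is not None:
                segments[b] += activations[idx]
    # Stand-in for the adder tree: one shift-and-add over the segments.
    return sum(s << b for b, s in enumerate(segments))


weights = [5, 3, 0, 12]          # binary: 0101, 0011, 0000, 1100
activations = [2, 7, 9, 1]
kneaded = knead(weights, bits=4)
output = sac_from_kneaded(kneaded, activations, bits=4)
```

In this example four original weights are kneaded into two rows, and the result equals the ordinary dot product of the weights and activations.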

Description

CROSS REFERENCE TO RELATED APPLICATIONS

[0001]This application is a national application of PCT/CN2019/087769, filed on May 21, 2019. The contents of PCT/CN2019/087769 are all hereby incorporated by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

[0002]The invention relates to the field of neural network computation, and particularly to a split accumulator for a convolutional neural network accelerator.

2. Related Art

[0003]Deep convolutional neural networks have achieved significant progress in machine learning applications, for example, real-time image recognition, detection and natural language processing. In order to improve accuracy, the architecture of an advanced deep convolutional neural network (DCNN) has complex connections and massive numbers of neurons and synapses to satisfy the requirements of highly accurate and complex tasks. In a convolution operation, weights are multiplied by corresponding activations, and finally, the products are added up to perform a summation. That i...


Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06N3/063, G06N3/04
CPC: G06N3/063, G06N3/04, G06F17/15, G06N3/082, G06N3/045, H03M7/4037, H03M7/70, H03M7/3059, G06N3/08, G06N3/048, G06F5/01, G06F7/50, H03M7/40, G06F7/5443, G06F17/16, G06F2207/386
Inventor: LI, XIAOWEI; WEI, XIN; LU, HANG
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI