
Accelerator for end-side real-time training

An accelerator and weight-sparsity technology, applied in the field of meta-learning, that addresses problems such as accuracy drop and load imbalance

Pending Publication Date: 2022-08-05
NANJING UNIV

AI Technical Summary

Problems solved by technology

[0004] This application proposes an accelerator for end-side real-time training. The accelerator is an unstructured-sparse accelerator with a queuing mechanism, which mitigates the load-imbalance problem of existing unstructured accelerators, reduces a large number of redundant calculations, and avoids the accuracy drop caused by structured sparsity.
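The load-balancing idea named above can be illustrated with a small simulation. This is a hypothetical sketch, not the patent's actual hardware design: after unstructured pruning, different weight rows contain very different numbers of non-zero values, so a fixed row-to-PE (processing element) assignment leaves some PEs idle; dispatching rows through a shared queue to whichever PE frees up first evens out the work.

```python
from collections import deque

def static_dispatch(rows_nnz, num_pes):
    """Fixed mapping: PE i gets rows i, i + num_pes, ... regardless of row length."""
    loads = [0] * num_pes
    for i, nnz in enumerate(rows_nnz):
        loads[i % num_pes] += nnz
    return loads

def queued_dispatch(rows_nnz, num_pes):
    """Greedy queue: each row goes to the currently least-loaded PE,
    modelling PEs that pop new work as soon as they go idle."""
    queue = deque(rows_nnz)
    loads = [0] * num_pes
    while queue:
        nnz = queue.popleft()
        idx = loads.index(min(loads))  # the PE that frees up first
        loads[idx] += nnz
    return loads

# Non-zero counts per weight row after unstructured pruning (illustrative numbers).
rows = [9, 1, 8, 2, 7, 1, 9, 1]
static_loads = static_dispatch(rows, num_pes=2)
queued_loads = queued_dispatch(rows, num_pes=2)
imbalance_static = max(static_loads) - min(static_loads)
imbalance_queued = max(queued_loads) - min(queued_loads)
```

With these numbers the static mapping piles the long rows onto one PE, while the queued dispatch keeps the two PEs' total work close together; the total amount of work is identical in both cases.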




Embodiment Construction

[0050] To facilitate understanding of the technical solution of this application, some concepts involved in the application are first described below.

[0051] Training a convolutional neural network generally comprises three stages of computation: FF (feed-forward, the forward-propagation stage), BP (backward propagation, the back-propagation stage) and WG (weight-gradient generation, the weight-update stage). In the FF stage, a batch of data is forward-propagated and its loss is obtained. The BP stage obtains the error of each intermediate feature map by back-propagating the loss. The WG stage uses the errors of the intermediate features to compute the gradient and update value of each weight, and performs the update.
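The three stages can be sketched in a few lines of NumPy. This is an illustrative sketch using a plain fully-connected layer for brevity (the patent targets convolutional layers; all variable names here are illustrative, not the patent's notation):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))      # batch of 4 inputs (previous layer's output)
W = rng.standard_normal((8, 3))      # this layer's weights
b = np.zeros(3)
target = rng.standard_normal((4, 3))

# FF: forward-propagate the batch and obtain the loss.
a = x @ W + b
loss = 0.5 * np.mean((a - target) ** 2)

# BP: back-propagate the loss to get the error of the intermediate features.
grad_a = (a - target) / a.size       # dLoss/da
grad_x = grad_a @ W.T                # error passed back to the previous layer

# WG: use the intermediate errors to form weight gradients and update the weights.
grad_W = x.T @ grad_a
grad_b = grad_a.sum(axis=0)
lr = 0.1
W -= lr * grad_W
b -= lr * grad_b

# Running FF again confirms the update reduced the loss.
loss_after = 0.5 * np.mean((x @ W + b - target) ** 2)
```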

[0052] The specific calculations involved in the above three stages are as follows:

[0053] In the FF stage, the intermediate feature map a^(l-1) of the previous convolutional layer l-1 is convolved with the weight W^l of the current convolutional layer l, and, after the bias b^l is add...
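The truncated formula above can be reconstructed, under standard CNN notation (an assumption, since the original text is cut off before the equation), as:

a^l = f( W^l * a^(l-1) + b^l )

where * denotes convolution and f is the activation function applied after the bias addition.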


Abstract

The invention discloses an accelerator for end-side real-time training. The accelerator comprises a data module, an address-decoding module, a calculation module and a control module. The data module comprises a feature storage block, a weight non-zero-value storage block and a weight bit-map storage block; the feature storage block comprises a plurality of storage units, which are used for storing the input blocks of the input groups corresponding to each stage. An input block is a data matrix to be multiplied by a corresponding weight value; the weight values sequentially detect the values of each dimension of a first position from front to back; and the weight non-zero-value storage block and the weight bit-map storage block are used for storing sparse weight data. The control module is used for controlling the execution process of each stage. The accelerator provided by the invention can alleviate the load-imbalance problem and improve network detection speed.
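The split between a non-zero-value block and a bit-map block can be illustrated with a small encoder/decoder. This is a hypothetical sketch of the general bitmap sparse-storage scheme, not the patent's exact memory layout: the pruned weights are stored as a packed list of non-zero values plus a bitmap with one bit per position marking where each value belongs.

```python
def encode_bitmap(weights):
    """Split a flat weight list into packed non-zero values plus a 0/1 bitmap."""
    values = [w for w in weights if w != 0]
    bitmap = [1 if w != 0 else 0 for w in weights]
    return values, bitmap

def decode_bitmap(values, bitmap):
    """Rebuild the dense weight list from the two storage blocks."""
    it = iter(values)
    return [next(it) if bit else 0 for bit in bitmap]

dense = [0, 3, 0, 0, -2, 5, 0, 1]
values, bitmap = encode_bitmap(dense)
restored = decode_bitmap(values, bitmap)
```

For a sparsity level of 50% as in this toy example, the bitmap costs one bit per weight while halving the number of full-width values stored, which is where the memory saving of unstructured sparse storage comes from.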

Description

technical field [0001] The invention relates to the field of meta-learning, in particular to an accelerator for end-side real-time training. Background technique [0002] In an intelligent military system, combat machines are often required to learn from real-time data in order to adapt to a dynamic battlefield, and deep-network training is usually used to learn such data. In a dynamic environment, however, new training samples are continuously generated, while combat machines have lightweight requirements: the huge amounts of data and computation involved in deep-network training hinder its efficient deployment on resource-constrained devices, and direct, inefficient deployment incurs high latency and power consumption, which seriously affects system performance. Few-shot learning provides a way to achieve efficient training on the device side. Building on a pre-trained deep neural network, selecting an appropriate few-shot learning algorithm can realize low-power, low-latency...


Application Information

Patent Type & Authority Applications(China)
IPC IPC(8): G06N3/063; G06N3/08; G06N3/04; G06F17/16; G06F7/544
CPC: G06N3/063; G06N3/084; G06F17/16; G06F7/5443; G06N3/045; Y02D10/00
Inventor: 王中风, 王美琪, 薛睿鑫, 程昕
Owner NANJING UNIV