Neural network acceleration hardware architecture and method for quantization bit width dynamic selection

A neural network acceleration hardware architecture and method, applied in neural architectures, neural learning methods, biological neural network models, and related fields. It addresses the problem that differences between neurons are not taken into account, and aims to reduce inference time while preserving accuracy and improving performance.

Pending Publication Date: 2022-01-07
GUIZHOU POWER GRID CO LTD +1

AI Technical Summary

Problems solved by technology

Existing approaches still do not take into account the differences between channels and neurons within the same layer.

Embodiment Construction

[0044] To achieve its purpose, the present invention comprises the following steps:

[0045] Step 1. According to the parallel computing characteristics of the deployment hardware, the neurons of each feature map in the network are divided into blocks. Each block is further divided into groups along the spatial dimension, and the group is defined as the smallest unit on which a dynamic quantization operation is performed.

[0046] Step 2. Configure a trainable threshold parameter for each block in all feature maps of the target network, and determine the upper and lower bounds of the selectable sparsity of each block according to the given base quantization bit width and the target total bit width constraint.

[0047] Step 3. Establish a training and inference model for the dynamically quantized neural network. Inference is divided into two parts, a high-precision calculation and a low-precision calculation, and whether to perform the low-precision calculation is judged based on ...
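
One possible reading of Steps 1 and 3 is sketched below in Python. It is an illustration under stated assumptions, not the patent's implementation. The layer is assumed to be a pointwise (1x1) convolution; blocks are assumed to be runs of consecutive output channels and groups runs of consecutive spatial positions; the "high-precision" part is taken to be the high-order bits of each activation and the "low-precision" part the remaining low-order bits; and a group is assumed to run the low-order pass when its largest high-order partial sum exceeds the block threshold. The names `split_bits`, `dynamic_pointwise`, `block_channels`, and `group_size` are hypothetical.

```python
from typing import Dict, List, Tuple


def split_bits(x: int, total_bits: int, hi_bits: int) -> Tuple[int, int]:
    """Split an unsigned activation into (high-order part, low-order part).

    The high-order part keeps its bit positions, so hi + lo == x.
    """
    lo_bits = total_bits - hi_bits
    lo = x & ((1 << lo_bits) - 1)
    return x - lo, lo


def dynamic_pointwise(
    x: List[List[int]],        # activations, indexed [input channel][position]
    w: List[List[int]],        # weights, indexed [output channel][input channel]
    thresholds: List[int],     # one threshold per block (assumed already trained)
    block_channels: int,       # output channels per block (assumption)
    group_size: int,           # spatial positions per group (assumption)
    total_bits: int = 8,
    hi_bits: int = 4,
) -> Tuple[List[List[int]], Dict[Tuple[int, int], bool]]:
    c_in, positions, c_out = len(x), len(x[0]), len(w)
    parts = [[split_bits(v, total_bits, hi_bits) for v in row] for row in x]
    y = [[0] * positions for _ in range(c_out)]
    ran_low: Dict[Tuple[int, int], bool] = {}

    for b0 in range(0, c_out, block_channels):
        chans = range(b0, min(b0 + block_channels, c_out))
        for g0 in range(0, positions, group_size):
            pos = range(g0, min(g0 + group_size, positions))
            # Pass 1: every neuron in the group uses high-order bits only.
            for o in chans:
                for q in pos:
                    y[o][q] = sum(w[o][i] * parts[i][q][0] for i in range(c_in))
            # Decide once per group, using the result of pass 1.
            score = max(abs(y[o][q]) for o in chans for q in pos)
            key = (b0 // block_channels, g0 // group_size)
            ran_low[key] = score > thresholds[key[0]]
            # Pass 2 (conditional): add the low-order contribution.
            if ran_low[key]:
                for o in chans:
                    for q in pos:
                        y[o][q] += sum(w[o][i] * parts[i][q][1] for i in range(c_in))
    return y, ran_low


# Illustrative data: 2 input channels, 4 positions, 2 output channels.
x = [[18, 243, 7, 128], [5, 64, 200, 15]]
w = [[1, -1], [2, 1]]
y, ran_low = dynamic_pointwise(x, w, thresholds=[180, 10**9],
                               block_channels=1, group_size=2)
```

In this sketch the decision is made once per group, so all neurons in a group follow the same path, which is what makes the group the smallest unit of dynamic quantization. A threshold below zero makes every group run both passes and reproduces the full-width result.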

Abstract

The invention discloses a neural network acceleration hardware architecture and method for dynamic selection of quantization bit width. The hardware architecture comprises a global storage module, a data scheduling module, a local storage module, a dynamic quantization prediction controller, and a calculation unit array. The method comprises the following steps: dividing the neurons of each feature map in the network into blocks, further dividing each block into groups along the spatial dimension, and defining the group as the minimum unit for executing a dynamic quantization operation; configuring a trainable threshold parameter for each block in all feature maps of the target network, and determining the upper and lower bounds of the selectable sparsity of each block according to a given base quantization bit width and a target total bit width constraint; establishing a training and inference model for the dynamically quantized neural network, dividing inference into a high-precision calculation and a low-precision calculation, and judging whether to execute the low-precision calculation according to the result of the high-precision calculation. On the premise that accuracy is not lost, the inference time on actual hardware is reduced as much as possible.
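
The abstract does not spell out how the sparsity bounds follow from the base bit width and the bit width budget. As a rough illustration only, the Python sketch below assumes a simple cost model that is not taken from the patent: every group always pays `hi_bits` for the high-precision pass, and a group that is not skipped pays an extra `lo_bits` for the low-precision pass. Here "sparsity" is assumed to mean the fraction of groups that skip the low-precision pass. The function names are hypothetical.

```python
def average_bit_width(hi_bits: int, lo_bits: int, sparsity: float) -> float:
    """Average bits per activation under the assumed two-pass cost model."""
    return hi_bits + (1.0 - sparsity) * lo_bits


def required_sparsity(hi_bits: int, lo_bits: int, target_avg_bits: float) -> float:
    """Smallest sparsity that meets the target average, clamped to [0, 1]."""
    s = 1.0 - (target_avg_bits - hi_bits) / lo_bits
    return min(1.0, max(0.0, s))


# With a 4-bit high-order pass and a 4-bit low-order pass, an average
# budget of 6 bits requires half of the groups to skip the second pass.
s = required_sparsity(hi_bits=4, lo_bits=4, target_avg_bits=6)
```

Under this model a budget equal to `hi_bits` forces every group to skip the second pass, and a budget of `hi_bits + lo_bits` allows none to be skipped, which gives one way to think about upper and lower limits on the sparsity a block may select.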

Description

Technical field

[0001] The invention relates to the technical fields of embedded intelligent data processing and artificial intelligence; in particular, it relates to a neural network acceleration hardware architecture and method for dynamic selection of quantization bit width.

Background

[0002] In recent years, deep learning has been widely used in data processing fields such as images and voice. Its powerful data analysis, prediction, modeling, and classification capabilities also offer unique advantages for complex problems in the field of digital power grids. For example, image and voice recognition technology enables image analysis for transmission line inspection and intelligent monitoring and alarms for power distribution rooms and cable pipe galleries, and allows a more accurate assessment of safety risks around power grid equipment. In addition, intelligent analysis of data in deep learning can be applied to power grids Control text knowl...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/063, G06N3/08, G06N3/04, G06N5/04, G06F9/50, G06F13/16
CPC: G06N3/063, G06N3/08, G06N5/046, G06F9/5027, G06F13/1673, G06N3/042, G06N3/045
Inventor: 徐长宝, 辛明勇, 高吉普, 王宇, 张历, 刘卓毅, 习伟, 姚浩, 陈军健, 于杨, 陶伟
Owner: GUIZHOU POWER GRID CO LTD