
A bit-width-adaptive neural network quantization method and a bit-width-adaptive neural network quantization device

A neural network quantization technology, applied to neural learning methods, biological neural network models, complex mathematical operations, etc. It addresses the problems of heavy computing resource consumption and high power consumption, and achieves the effects of reducing inference time, storage and memory footprint, and computational complexity.

Pending Publication Date: 2021-04-20
NANJING UNIV

AI Technical Summary

Problems solved by technology

[0005] The present invention mainly addresses the problems of large memory and computing resource consumption and high power consumption of existing convolutional neural networks, and proposes a multi-bit neural network quantization method and device with adaptive bit width.



Examples


Embodiment 1

[0029] This embodiment provides a neural network quantization method with adaptive bit width; the specific steps are shown in Figure 1. In this embodiment, the weight parameters of the neural network are divided into groups as the basic unit, where a group is typically a layer or a channel. A layer refers to all weight parameters of one convolutional or fully connected layer, whose tensor shape is generally C_o × C_i × K × K (convolutional layer) or F_o × F_i (fully connected layer). A channel refers to the weight data along the C_o or F_o dimension, i.e. each C_i × K × K slice (convolutional layer) or F_i slice (fully connected layer) is treated as a whole. Weight parameters in the same group are quantized to the same number of quantization bits, and weight parameters in different groups may be quantized to different numbers of quantization bits.
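The following is a minimal sketch of this per-layer and per-channel grouping, assuming NumPy arrays; the helper names (group_by_channel, group_by_layer) are illustrative and not taken from the patent.

```python
import numpy as np

def group_by_channel(weights: np.ndarray):
    """Split a C_o x C_i x K x K conv weight tensor into C_o groups of C_i*K*K values."""
    c_out = weights.shape[0]
    return [weights[i].reshape(-1) for i in range(c_out)]

def group_by_layer(weights: np.ndarray):
    """Treat all weights of one layer as a single group."""
    return [weights.reshape(-1)]

# Example: a small conv layer with C_o=4, C_i=3, K=3
w = np.random.randn(4, 3, 3, 3)
groups = group_by_channel(w)   # 4 groups, each of length 27
```

Each group produced this way can then be assigned its own number of quantization bits.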

[0030] In this embodiment, each group of parameters to be quantized is modeled as a stan...

Embodiment 2

[0060] This embodiment provides a neural network quantization device with adaptive bit width. As shown in Figure 2, it includes a multi-bit quantization module and a quantization bit number adjustment module.

[0061] Under the multi-bit quantization constraint, the multi-bit quantization module minimizes the expected quantization mean square error by brute-force search for the standard Gaussian distribution and the standard Laplace distribution, and establishes lookup tables of the quantization strategy (i.e., the quantization levels) for different numbers of quantization bits: a Gaussian lookup table and a Laplace lookup table. In the Gaussian case, the Gaussian lookup table and the multi-bit Gaussian quantization formula are used during quantization training to quantize each group of weight parameters: the corresponding quantization levels are looked up, then the mean of each group of weight parameters is subtracted and the result is divided...
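Below is a hedged sketch of this construction for the Gaussian case only: a brute-force search over per-bit scales β minimizes a Monte Carlo estimate of the expected MSE under a standard Gaussian, and each weight group is quantized after being shifted by its mean and scaled by its standard deviation. The sample count, search grid, and function names (build_gaussian_lut, quantize_group) are assumptions for illustration, not taken from the patent.

```python
import itertools
import numpy as np

def build_gaussian_lut(num_bits: int, samples: int = 20000,
                       grid=np.linspace(0.05, 1.6, 16)):
    """Brute-force search for multi-bit binary scales (levels of the form
    ±β1 ± β2 ± ...) that minimize the expected MSE for standard Gaussian data.
    Only practical for small bit widths; this is a sketch, not an optimized search."""
    x = np.random.randn(samples)                      # standard Gaussian samples
    signs = np.array(list(itertools.product([-1.0, 1.0], repeat=num_bits)))
    best_levels, best_mse = None, np.inf
    for betas in itertools.product(grid, repeat=num_bits):
        levels = signs @ np.array(betas)              # all 2^b candidate levels
        q = levels[np.abs(x[:, None] - levels[None, :]).argmin(axis=1)]
        mse = np.mean((x - q) ** 2)
        if mse < best_mse:
            best_mse, best_levels = mse, np.sort(levels)
    return best_levels                                # lookup table for num_bits

def quantize_group(w: np.ndarray, levels: np.ndarray):
    """Normalize a weight group to zero mean / unit std, snap each value to the
    nearest lookup-table level, then rescale back."""
    mu, sigma = w.mean(), w.std() + 1e-12
    z = (w - mu) / sigma
    q = levels[np.abs(z[:, None] - levels[None, :]).argmin(axis=1)]
    return q * sigma + mu
```

A Laplace lookup table could be built the same way by replacing the Gaussian samples with Laplace-distributed ones.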



Abstract

The invention discloses a bit-width-adaptive neural network quantization method and a bit-width-adaptive neural network quantization device. The method comprises the following steps: obtaining lookup tables of the quantization strategy for different numbers of quantization bits by minimizing the expected quantization mean square error under multi-bit binary constraints; quantizing the weight parameters using the lookup tables during quantization training; and, after the forward and backward passes, computing the sensitivity of each weight parameter with respect to the loss function from the gradient of the quantized weight parameter, and adjusting the number of quantization bits of each weight parameter according to the accumulated sensitivity after a certain number of iterations. The device comprises a multi-bit quantization module and a quantization bit number adjustment module. The method and the device can greatly reduce the storage space and computation occupied by the neural network parameters and shorten the training time, so that neural networks can be conveniently deployed on embedded devices, mobile devices, or other terminals.
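The abstract's sensitivity-driven bit adjustment can be pictured with the following minimal sketch. The sensitivity proxy (|gradient × quantized weight| summed per group), the quantile-based adjustment rule, and all thresholds and parameter names are assumptions made for illustration; the patent does not specify them here.

```python
import numpy as np

class BitWidthController:
    """Accumulate a per-group sensitivity from gradients of quantized weights and
    periodically re-assign quantization bit widths (illustrative sketch only)."""

    def __init__(self, num_groups, init_bits=4, min_bits=2, max_bits=8, period=100):
        self.bits = np.full(num_groups, init_bits)
        self.sensitivity = np.zeros(num_groups)
        self.min_bits, self.max_bits = min_bits, max_bits
        self.period = period
        self.step = 0

    def accumulate(self, grads_per_group, quant_weights_per_group):
        # One plausible sensitivity proxy: sum of |g * w_q| over each group.
        for i, (g, wq) in enumerate(zip(grads_per_group, quant_weights_per_group)):
            self.sensitivity[i] += np.abs(g * wq).sum()
        self.step += 1
        if self.step % self.period == 0:
            self._adjust()

    def _adjust(self):
        # Give more bits to the most sensitive groups, fewer to the least sensitive.
        hi, lo = np.quantile(self.sensitivity, [0.75, 0.25])
        self.bits = np.clip(
            self.bits + (self.sensitivity > hi) - (self.sensitivity < lo),
            self.min_bits, self.max_bits)
        self.sensitivity[:] = 0.0   # reset accumulation for the next window
```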

Description

Technical field
[0001] The invention relates to the field of neural network compression, and in particular to a quantization method and device for a neural network with adaptive bit width.
Background
[0002] In recent years, deep neural networks have made significant progress in many fields, such as object recognition, image restoration, and semantic segmentation. However, a single neural network layer may contain tens of thousands of parameters, resulting in millions of operations per iteration. Due to limitations in computing power, memory, and power consumption, neural network algorithms are difficult to deploy on portable devices such as smartphones, smart wearable devices, and drones.
[0003] In neural network algorithms, the data to be operated on usually consists of 32-bit floating-point numbers, and the main operations are floating-point multiplications and additions. These operations and floating-point numbers consume most of the computing re...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N3/08, G06F17/15, G06F17/18
Inventors: 岳涛, 赵思杰, 胡雪梅
Owner: NANJING UNIV