A bit-width-adaptive neural network quantization method and a bit-width-adaptive neural network quantization device
A neural-network quantization technology, applicable to neural learning methods, biological neural network models, complex mathematical operations, and related fields. It addresses the problems of large computing-resource consumption and high power consumption, with the effects of reducing inference time, storage and memory footprint, and computational complexity.
Examples
Embodiment 1
[0029] This embodiment provides a neural network quantization method with an adaptive bit width; the specific steps are shown in figure 1. In this embodiment, the weight parameters of the neural network are divided into groups as the basic unit, where a group is generally a layer or a channel. A layer refers to the weight parameters of an entire convolutional or fully connected layer, whose tensor shape is generally C_o×C_i×K×K (convolutional layer) or F_o×F_i (fully connected layer). A channel refers to the weight data along the C_o or F_o dimension, that is, each C_i×K×K (convolutional layer) or F_i (fully connected layer) slice is treated as a whole. Weight parameters in the same group are quantized to the same number of quantization bits, while weight parameters in different groups may be quantized to different numbers of bits.
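The per-group quantization described in [0029] can be sketched as follows. This is a minimal NumPy illustration, assuming a simple uniform rounding scheme and a hypothetical per-channel bit assignment; the patent's actual method uses distribution-based lookup tables rather than uniform levels.

```python
import numpy as np

def quantize_group(w, num_bits):
    """Uniformly quantize one group of weights to 2**num_bits levels.
    (Placeholder scheme for illustration only.)"""
    levels = 2 ** num_bits
    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min) / (levels - 1) if w_max > w_min else 1.0
    q = np.round((w - w_min) / scale)   # integer level index per weight
    return q * scale + w_min            # dequantized value

# A convolutional weight tensor of shape C_o x C_i x K x K, grouped
# along the output-channel (C_o) dimension, with a possibly different
# bit width per channel (hypothetical assignment).
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 8, 3, 3))   # C_o=4, C_i=8, K=3
bits_per_channel = [2, 3, 4, 8]

quantized = np.stack([
    quantize_group(weights[c], b)
    for c, b in enumerate(bits_per_channel)
])
```

Each group (here, each output channel) shares one set of quantization levels, while different groups may use different bit widths, as the paragraph above specifies.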
[0030] In this embodiment, any set of quantized parameters is modeled as a stan...
Embodiment 2
[0060] This embodiment provides a neural network quantization device with an adaptive bit width. As shown in figure 2, it includes a multi-bit quantization module and a quantization bit-number adjustment module.
[0061] The multi-bit quantization module optimizes the expected quantization mean square error, under the constraints of multi-bit quantization, by a brute-force search over quantization strategies (that is, quantization levels) for the standard Gaussian and standard Laplace distributions, and thereby establishes a Gaussian lookup table and a Laplacian lookup table for different numbers of quantization bits. In the Gaussian case, the Gaussian lookup table and the multi-bit Gaussian quantization formula are used during quantization training to perform multi-bit quantization on each group of weight parameters: find the corresponding quantization level, then subtract the mean value of each group of weight parameters, divide it...
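The lookup-table construction and normalize-then-quantize procedure in [0061] can be sketched as below. This is a coarse illustration, assuming a grid search over symmetric clipping thresholds with uniformly spaced levels and a Monte-Carlo estimate of the expected MSE under a standard Gaussian; the patent's search is over the quantization levels themselves, and the function names are hypothetical.

```python
import numpy as np

def build_gaussian_table(bit_widths, samples=20_000, seed=0):
    """For each bit width, brute-force search candidate level sets and
    keep the one minimizing expected MSE for a standard Gaussian."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(samples)
    table = {}
    for b in bit_widths:
        n = 2 ** b
        best_mse, best_levels = np.inf, None
        for t in np.linspace(0.5, 4.0, 36):       # candidate clip range
            levels = np.linspace(-t, t, n)
            idx = np.abs(x[:, None] - levels[None, :]).argmin(axis=1)
            mse = np.mean((x - levels[idx]) ** 2)
            if mse < best_mse:
                best_mse, best_levels = mse, levels
        table[b] = best_levels
    return table

def quantize_with_table(w, levels):
    """Subtract the group mean, divide by the standard deviation, snap
    to the nearest lookup-table level, then map back."""
    mu, sigma = w.mean(), w.std()
    z = (w - mu) / sigma
    idx = np.abs(z.ravel()[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx].reshape(w.shape) * sigma + mu
```

Because the table is built once for the standard distribution, any weight group can reuse it after the mean-subtraction and standard-deviation normalization step described above.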