4-bit quantization method and system of neural network

A neural network bit-quantization technique, applied in the field of 4-bit quantization methods and systems for neural networks, that addresses low quantization efficiency and has the effects of improving computation speed, improving practicality, and saving training time.

Pending Publication Date: 2020-11-03
SUZHOU LANGCHAO INTELLIGENT TECH CO LTD

AI Technical Summary

Problems solved by technology

[0006] This application provides a 4-bit quantization method and system for a neural network to solve the problem of low quantization efficiency in prior-art neural network quantization methods.


Examples


Embodiment 1

[0062] Refer to Figure 1, a schematic flowchart of the 4-bit quantization method for a neural network provided in this embodiment of the present application. As Figure 1 shows, the 4-bit quantization method of this embodiment mainly includes the following steps:

[0063] S1: Load the pre-trained model of the neural network.

[0064] S2: In the pre-trained model, determine from statistics the initial value of each saturated activation layer satRelu.

[0065] Specifically, step S2 includes:

[0066] S21: Replace all activation layers relu in the neural network with saturated activation layers satRelu.

[0067] S22: Obtain the activation values of each saturated activation layer satRelu according to the received command.

[0068] S23: Build a histogram of the distribution of the activation values.

[0069] S24: Select the activation value located at the 99.999% point of the histogram as the initial value of the p...
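Steps S22 to S24 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `sat_relu_initial_value`, the bin count, and the choice to return the upper edge of the selected bin are all assumptions made for illustration. It only shows how a histogram and its cumulative distribution yield the 99.999% point.

```python
import numpy as np

def sat_relu_initial_value(activations, percentile=99.999, bins=2048):
    """Return the activation value at the given percentile of a histogram.

    Hypothetical helper: the name, bin count, and bin-edge choice are
    illustrative assumptions, not taken from the patent text.
    """
    acts = np.asarray(activations, dtype=np.float64).ravel()
    acts = np.maximum(acts, 0.0)  # relu outputs are non-negative

    # S23: histogram of the collected activation values.
    counts, edges = np.histogram(acts, bins=bins)

    # S24: walk the cumulative distribution to the requested point.
    cdf = np.cumsum(counts) / counts.sum()
    idx = int(np.searchsorted(cdf, percentile / 100.0))
    idx = min(idx, bins - 1)
    return float(edges[idx + 1])  # upper edge of the selected bin


if __name__ == "__main__":
    # One extreme outlier among a million ordinary activations.
    acts = np.concatenate([np.linspace(0.0, 1.0, 1_000_000), [1000.0]])
    print(sat_relu_initial_value(acts))
```

Choosing a high percentile rather than the maximum keeps a single outlier from stretching the range that the 4-bit codes have to cover.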

Embodiment 2

[0110] On the basis of the embodiment shown in Figure 1, refer to Figure 2, a schematic structural diagram of the 4-bit quantization system for a neural network provided by this embodiment of the present application. As Figure 2 shows, the 4-bit quantization system of this embodiment mainly includes: a loading module, a statistics module, a retraining module, a judgment module and a conversion module.

[0111] The loading module loads the pre-trained model of the neural network; the statistics module determines from statistics the initial value of each saturated activation layer satRelu in the pre-trained model; the retraining module adds pseudo-quantization nodes to the neural network and uses the initial values of satRelu to retrain the network, obtaining a pseudo-quantized model; the judgment module judges whether the accuracy of the pseudo-quantized model converges to...
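The excerpt does not spell out what a pseudo-quantization node computes. The following Python sketch shows one common reading, under stated assumptions: satRelu clips activations to a range from 0 to a saturation threshold, and the node rounds the clipped value onto a uniform 4-bit grid and maps it back to floating point. The names `sat_relu`, `fake_quantize_4bit`, and `threshold` are hypothetical, and only the forward pass is shown.

```python
import numpy as np

def sat_relu(x, threshold):
    """Assumed saturated relu: clip activations to [0, threshold]."""
    return np.clip(x, 0.0, threshold)

def fake_quantize_4bit(x, threshold):
    """Quantize to 16 uniform levels, then dequantize back to float.

    Illustrative assumption: an unsigned, uniform 4-bit grid on
    [0, threshold]; the patent excerpt does not specify the scheme.
    """
    steps = 2 ** 4 - 1                 # 15 steps between 16 codes
    scale = threshold / steps          # float value of one step
    codes = np.round(sat_relu(x, threshold) / scale)  # integer codes 0..15
    return codes * scale               # float values on the 4-bit grid


if __name__ == "__main__":
    x = np.array([-1.0, 0.0, 0.5, 3.0])
    print(fake_quantize_4bit(x, threshold=1.5))
```

Because the output stays in floating point but takes at most 16 distinct values, retraining with such a node exposes the network to 4-bit rounding error before the model is converted.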


Abstract

The invention discloses a 4-bit quantization method and system for a neural network. The method comprises: loading a pre-trained model of the neural network; in the pre-trained model, determining from statistics an initial value of each saturated activation layer satRelu; adding pseudo-quantization nodes to the neural network and using the initial values of satRelu to retrain the neural network, obtaining a pseudo-quantized model; judging whether the accuracy of the pseudo-quantized model converges to the set accuracy; if so, performing inference preprocessing on the pseudo-quantized model and converting it into a 4-bit inference model that can be used for inference; otherwise, returning to retrain the neural network. The system mainly comprises a loading module, a statistics module, a retraining module, a judgment module and a conversion module. With the method and the system, training efficiency can be effectively improved while the accuracy of the training result is maintained.
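The retrain, judge, convert loop in the abstract can be sketched as plain control flow. Everything named here is hypothetical: `retrain`, `evaluate`, and `convert` stand in for the retraining, judgment, and conversion modules, "converges to the set accuracy" is read as reaching or exceeding a target value, and the `max_rounds` limit is an added safeguard that the abstract does not mention.

```python
from typing import Callable, TypeVar

M = TypeVar("M")  # pseudo-quantized model type
Q = TypeVar("Q")  # converted 4-bit inference model type

def quantize_until_converged(
    model: M,
    retrain: Callable[[M], M],
    evaluate: Callable[[M], float],
    convert: Callable[[M], Q],
    target_accuracy: float,
    max_rounds: int = 100,  # assumption: the abstract gives no round limit
) -> Q:
    """Retrain until the accuracy target is met, then convert the model."""
    for _ in range(max_rounds):
        model = retrain(model)                    # retraining module
        if evaluate(model) >= target_accuracy:    # judgment module
            return convert(model)                 # conversion module
        # Otherwise fall through and retrain again.
    raise RuntimeError("accuracy target not reached within max_rounds")


if __name__ == "__main__":
    # Toy stand-ins: the "model" is an integer whose score grows per round.
    result = quantize_until_converged(
        model=0,
        retrain=lambda m: m + 1,
        evaluate=lambda m: float(m * 10),
        convert=lambda m: ("int4", m),
        target_accuracy=30.0,
    )
    print(result)
```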

Description

Technical field

[0001] The present application relates to the technical field of neural network model compression, in particular to a 4-bit quantization method and system for neural networks.

Background

[0002] Neural network models generally take up a lot of disk space. For example, the model file of AlexNet exceeds 200MB. Models contain millions of parameters, and most of the disk space is used to store them. Since the model parameters are of floating-point type, it is difficult for ordinary compression algorithms to reduce their size. Model quantization, which compresses the original network by reducing the number of bits required to represent each weight, can therefore greatly improve the operating speed of the network. How to quantize neural networks is thus an important technical issue.

[0003] At present, the mainstream method of neural network quantization is 8-bit quantization, and most training a...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/08, G06N3/04
CPC: G06N3/082, G06N3/084, G06N3/045
Inventor: 王曦辉
Owner: SUZHOU LANGCHAO INTELLIGENT TECH CO LTD