Iterative Neural Network Quantization Method and System Based on Vector Quantization
A neural network technology in the field of neural network quantization. It addresses problems such as low compression efficiency and the large number of bits required for convolution layers, and it improves performance while preserving network accuracy and offering good scalability.
Examples
Embodiment 1
[0051] This embodiment provides an iterative neural network quantization system based on vector quantization, including: a clustering module, an error-based division module, a parameter sharing module and a retraining module, wherein:
[0052] The clustering module exploits the distribution of the parameters themselves to control the quantization error: it clusters the network parameters into a specified number of categories and stores the cluster centers. Because the clustering operation fully accounts for how the parameters are distributed, the quantization error is easy to control.
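The storage saving behind clustering-based quantization can be illustrated with a small NumPy sketch. Each weight is replaced by the index of its nearest stored center, so a layer costs k float centers plus ceil(log2 k) bits per weight instead of one full float per weight. The function names `encode_with_codebook` and `storage_bits` and the 32-bit float assumption are hypothetical, not taken from the patent.

```python
import numpy as np

def encode_with_codebook(weights, centers):
    """Assign each weight to the index of its nearest cluster center.

    `centers` is assumed to be a 1-D array of centers produced by the
    clustering step; the returned indices have the same shape as `weights`.
    """
    flat = np.asarray(weights, dtype=np.float32).ravel()
    centers = np.asarray(centers, dtype=np.float32)
    # Distance from every weight to every center, shape (num_weights, k).
    dist = np.abs(flat[:, None] - centers[None, :])
    indices = np.argmin(dist, axis=1)
    return indices.reshape(np.shape(weights))

def storage_bits(num_weights, k, float_bits=32):
    """Bits for k float centers plus one ceil(log2 k)-bit index per weight."""
    index_bits = max(1, int(np.ceil(np.log2(k))))
    return k * float_bits + num_weights * index_bits
```

Decoding is a lookup, `centers[indices]`, so a layer of 1000 weights with 16 centers needs 16 * 32 + 1000 * 4 = 4512 bits rather than 32000.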
[0053] The error-based division module sorts the clustered classes by the impact that quantizing them has on network performance (i.e., on the network loss) and divides all classes into two parts: the classes whose quantization has a large impact on network performance form the quantization part, and the classes with little impact form the retraining part.
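The division step can be sketched as follows, assuming that the impact of a class is measured by snapping only that class's weights to its center and evaluating the loss. Following the text, the classes with the largest loss impact go to the quantization part. The function name `divide_by_quantization_loss`, the caller-supplied `loss_fn`, and the split ratio `quant_fraction` are hypothetical illustrations, not the patent's actual interface.

```python
import numpy as np

def divide_by_quantization_loss(weights, labels, centers, loss_fn, quant_fraction=0.5):
    """Split cluster ids into a quantization part and a retraining part.

    For each class c, only the weights labelled c are set to centers[c];
    loss_fn of the result is that class's quantization loss. Classes are
    sorted by this loss and the highest-impact ones form the quantization part.
    """
    weights = np.asarray(weights, dtype=float)
    labels = np.asarray(labels)
    k = len(centers)
    losses = np.empty(k)
    for c in range(k):
        trial = weights.copy()
        trial[labels == c] = centers[c]      # quantize this class only
        losses[c] = loss_fn(trial)
    order = np.argsort(losses)[::-1]         # largest loss first
    n_quant = int(round(quant_fraction * k))
    quant_part = sorted(order[:n_quant].tolist())
    retrain_part = sorted(order[n_quant:].tolist())
    return quant_part, retrain_part, losses
```

Sorting by absolute loss is equivalent to sorting by loss increase, since the unquantized baseline loss is the same for every class.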
[0054] The parameter sharing module quantizes the network param...
Embodiment 2
[0062] This embodiment provides an iterative neural network quantization method based on vector quantization, including the following steps:
[0063] Step S1, clustering: cluster the network parameters and store the center of each category;
[0064] Step S2, error-based division: measure the network loss caused by quantizing each class, i.e., the quantization loss, and divide all classes obtained in step S1 into a quantization part and a retraining part according to that loss;
[0065] Step S3, parameter sharing: quantize each network parameter in the quantization part to the center of the class it belongs to;
[0066] Step S4, retraining: fix the quantized network parameters and update the network parameters in the retraining part to compensate for the quantization error, recovering the accuracy of the quantized network.
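Steps S3 and S4 can be sketched on a toy least-squares model y ~ X @ w, standing in for a real network layer (an assumption made only to keep the example small). Weights in the quantization part are set to their class centers and then frozen by zeroing their gradients, while plain gradient descent updates the retraining part. The name `share_and_retrain` and the `lr` and `steps` values are hypothetical.

```python
import numpy as np

def share_and_retrain(X, y, w, labels, centers, quant_part, lr=0.1, steps=200):
    """Toy S3 + S4 on a linear model trained with mean squared error.

    S3: weights whose class is in quant_part are set to their class center.
    S4: those weights stay fixed; only the other weights are retrained.
    """
    w = np.asarray(w, dtype=float).copy()
    labels = np.asarray(labels)
    centers = np.asarray(centers, dtype=float)
    frozen = np.isin(labels, quant_part)
    w[frozen] = centers[labels[frozen]]          # parameter sharing (S3)
    n = X.shape[0]
    for _ in range(steps):
        grad = (2.0 / n) * X.T @ (X @ w - y)     # gradient of the MSE
        grad[frozen] = 0.0                       # quantized weights are fixed
        w -= lr * grad                           # retraining update (S4)
    return w, frozen
```

Masking the gradient, rather than removing the frozen weights from the model, keeps the parameter layout unchanged, which is convenient if the retraining part is later quantized as well.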
[0067] Further, step S1 uses the k-means clustering method.
[0068] Further, the k-means clust...
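A minimal 1-D k-means (Lloyd's algorithm) over a layer's weight values illustrates step S1. The linear initialisation between the minimum and maximum weight, the iteration cap, and the name `kmeans_1d` are assumptions for this sketch; the patent's own k-means details are not reproduced here.

```python
import numpy as np

def kmeans_1d(weights, k, iters=50):
    """Lloyd's k-means on scalar weights; returns (centers, labels).

    Centers start evenly spaced between the min and max weight (an assumed
    initialisation). Labels have the same shape as `weights`.
    """
    flat = np.asarray(weights, dtype=float).ravel()
    centers = np.linspace(flat.min(), flat.max(), k)
    for _ in range(iters):
        # Assignment step: nearest center for every weight.
        labels = np.argmin(np.abs(flat[:, None] - centers[None, :]), axis=1)
        # Update step: each center moves to the mean of its members.
        new_centers = centers.copy()
        for c in range(k):
            members = flat[labels == c]
            if members.size:                 # leave empty clusters in place
                new_centers[c] = members.mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.argmin(np.abs(flat[:, None] - centers[None, :]), axis=1)
    return centers, labels.reshape(np.shape(weights))
```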