Iterative Neural Network Quantization Method and System Based on Vector Quantization

A neural network quantization technique that addresses problems such as low compression efficiency and the large number of bits used to quantize convolutional layers. It improves performance while preserving the accuracy of the network, and scales well.

Active Publication Date: 2020-11-10

SHANGHAI JIAOTONG UNIV

6 Cites, 0 Cited by


AI Technical Summary

Problems solved by technology

However, updating parameters with the gradients of all parameters in the same class is not optimal.

In the prior work, the convolutional layers are quantized with a large number of bits, so the compression efficiency is low.


Examples


Embodiment 1

[0051] This embodiment provides an iterative neural network quantization system based on vector quantization, including: a clustering module, an error-based division module, a parameter sharing module and a retraining module, wherein:

[0052] The clustering module uses the distribution of the parameters themselves to control the quantization error: it clusters the network parameters into a specified number of classes and stores the cluster centers. Because the clustering operation takes the parameter distribution into account, the error is easy to control.
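The clustering described in [0052] can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes scalar weights, plain Lloyd-style k-means, and random initialisation from the data. The function name `kmeans_1d` and its parameters are hypothetical.

```python
import random

def kmeans_1d(weights, k, iters=20, seed=0):
    """Cluster scalar weights into k classes (Lloyd's algorithm).

    Hypothetical helper for illustration; returns (centers, labels).
    """
    rng = random.Random(seed)
    centers = rng.sample(weights, k)  # assumption: initialise from the data

    def nearest(w):
        return min(range(k), key=lambda c: abs(w - centers[c]))

    for _ in range(iters):
        # Assignment step: each weight joins its nearest center.
        labels = [nearest(w) for w in weights]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [w for w, l in zip(weights, labels) if l == c]
            if members:  # an empty class keeps its previous center
                centers[c] = sum(members) / len(members)
    # Final assignment against the stored centers.
    return centers, [nearest(w) for w in weights]

# Toy "network parameters": 200 weights drawn from a narrow Gaussian.
rng = random.Random(1)
weights = [rng.gauss(0.0, 0.1) for _ in range(200)]
centers, labels = kmeans_1d(weights, k=4)
```

Only the `centers` list and one small integer label per weight need to be stored, which is where the compression comes from.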

[0053] The error-based division module sorts the clustered classes by the impact that quantizing them has on network performance (i.e., the network loss) and divides all classes into two parts: the classes whose quantization has a large impact on network performance form the quantization part, and the classes whose quantization has little impact form the retraining part.
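A minimal sketch of the division in [0053], under stated assumptions: the "network loss" is replaced by a toy loss (squared deviation from the original weights), and the share of classes sent to the quantization part is a free parameter `quant_fraction`. The function name and both parameters are hypothetical. The ordering follows the paragraph as written: classes with the largest loss impact go to the quantization part.

```python
def partition_by_loss(weights, labels, centers, loss_fn, quant_fraction=0.5):
    """Split classes into a quantization part and a retraining part.

    Hypothetical helper: measures the loss increase caused by quantizing
    one class at a time, then ranks classes by that increase.
    """
    base = loss_fn(weights)
    impact = {}
    for c in range(len(centers)):
        # Quantize only class c and leave every other weight untouched.
        trial = [centers[c] if l == c else w for w, l in zip(weights, labels)]
        impact[c] = loss_fn(trial) - base
    ranked = sorted(impact, key=impact.get, reverse=True)  # largest impact first
    n_quant = max(1, int(len(ranked) * quant_fraction))
    return ranked[:n_quant], ranked[n_quant:], impact

# Toy example: six weights in three classes, each center is the class mean.
weights = [0.10, 0.12, -0.30, -0.34, 0.50, 0.58]
labels = [0, 0, 1, 1, 2, 2]
centers = [0.11, -0.32, 0.54]

def toy_loss(ws):
    # Assumption: stand-in for the network loss, not a real model.
    return sum((w - ref) ** 2 for w, ref in zip(ws, weights))

quant_part, retrain_part, impact = partition_by_loss(
    weights, labels, centers, toy_loss
)
```

In a real network, `loss_fn` would evaluate the model on a batch of data, so each class costs one forward pass to rank.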

[0054] The parameter sharing module quantizes the network param...

Embodiment 2

[0062] This embodiment provides an iterative neural network quantization method based on vector quantization, including the following steps:

[0063] Step S1, clustering: cluster the network parameters and store the center of each class;

[0064] Step S2, error-based division: measure the network loss caused by quantizing each class, i.e., the quantization loss, and divide all the classes obtained in step S1 into a quantization part and a retraining part according to the quantization loss;

[0065] Step S3, parameter sharing: quantize the network parameters in the quantization part to the center of the class to which they belong;

[0066] Step S4, retraining: fix the quantized network parameters and update the network parameters in the retraining part to compensate for the quantization error and recover the accuracy of the quantized network.
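Steps S1 to S4 can be put together in a toy loop. This is a sketch under several assumptions that the text does not state: the "network" is a linear model with a mean-squared-error loss, roughly half of the classes are quantized per round, retraining is plain gradient descent, and the number of classes shrinks each round so that the final codebook has at most `k` entries. All function and parameter names are hypothetical.

```python
import random

def kmeans_1d(values, k, iters=15):
    """Lloyd's algorithm on scalars, seeded at evenly spaced order statistics."""
    srt = sorted(values)
    centers = [srt[(2 * j + 1) * len(srt) // (2 * k)] for j in range(k)]

    def nearest(v):
        return min(range(k), key=lambda c: abs(v - centers[c]))

    for _ in range(iters):
        labels = [nearest(v) for v in values]
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, [nearest(v) for v in values]

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def loss(w, data):
    """Toy stand-in for the network loss: MSE of a linear model."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def grad(w, data):
    g = [0.0] * len(w)
    for x, y in data:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            g[i] += 2.0 * err * xi / len(data)
    return g

def iterative_quantize(w, data, k=4, lr=0.05, retrain_steps=50):
    w, frozen, codebook, k_left = list(w), [False] * len(w), [], k
    while not all(frozen):
        free = [i for i, f in enumerate(frozen) if not f]
        # S1: cluster the weights that are not yet quantized.
        kk = min(k_left, len({w[i] for i in free}))
        centers, labels = kmeans_1d([w[i] for i in free], kk)
        # S2: quantization loss of each non-empty class, one class at a time.
        base, impact = loss(w, data), {}
        for c in set(labels):
            trial = list(w)
            for i, l in zip(free, labels):
                if l == c:
                    trial[i] = centers[c]
            impact[c] = loss(trial, data) - base
        ranked = sorted(impact, key=impact.get, reverse=True)
        chosen = set(ranked[:max(1, len(ranked) // 2)])  # roughly half
        # S3: parameter sharing; chosen weights snap to their center and freeze.
        for i, l in zip(free, labels):
            if l in chosen:
                w[i], frozen[i] = centers[l], True
        codebook.extend(centers[c] for c in chosen)
        k_left -= len(chosen)
        # S4: retraining; only the weights that are still free are updated.
        for _ in range(retrain_steps):
            g = grad(w, data)
            for i in range(len(w)):
                if not frozen[i]:
                    w[i] -= lr * g[i]
    return w, codebook

rng = random.Random(0)
dim = 12
w0 = [rng.gauss(0.0, 0.5) for _ in range(dim)]  # "pretrained" weights
data = []
for _ in range(40):
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    data.append((x, predict(w0, x)))

w_q, codebook = iterative_quantize(w0, data, k=4)
loss_after = loss(w_q, data)
```

The loop ends when every weight is frozen, at which point each entry of `w_q` is one of the values in `codebook`.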

[0067] Further, step S1 uses the k-means clustering method.

[0068] Further, the k-means clust...


Abstract

The present invention provides an iterative neural network quantization system based on vector quantization, including a clustering module, an error-based division module, a parameter sharing module and a retraining module. The clustering module uses the distribution of the parameters themselves to control the quantization error; the error-based division module divides the network parameters into a quantization part and a retraining part; the parameter sharing module quantizes the quantization part; and the retraining module fixes the quantized parameters and updates the parameters of the retraining part to compensate for the quantization error and recover the accuracy of the quantized network. These four steps are iterated until all parameters of the network are quantized. An iterative neural network quantization method based on vector quantization is also provided. The invention can quantize the 32-bit floating-point parameters of a neural network to 4 bits without loss of network accuracy, and has high practical value.
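To illustrate what a 4-bit representation means in storage terms: with 16 shared centers, each weight can be stored as a 4-bit index into a codebook, two indices per byte. The packing layout below (high nibble first) and the function names are assumptions made for illustration; the source does not specify a storage format.

```python
def pack_4bit(indices):
    """Pack codebook indices in the range 0..15, two per byte, high nibble first."""
    if any(not 0 <= i < 16 for i in indices):
        raise ValueError("4-bit indices must lie in 0..15")
    padded = list(indices) + [0] * (len(indices) % 2)  # pad odd lengths with 0
    return bytes((padded[j] << 4) | padded[j + 1] for j in range(0, len(padded), 2))

def unpack_4bit(packed, count):
    """Inverse of pack_4bit; `count` drops the padding nibble if there was one."""
    out = []
    for b in packed:
        out.extend((b >> 4, b & 0x0F))
    return out[:count]

indices = [3, 15, 0, 7, 9]
packed = pack_4bit(indices)           # 3 bytes for 5 indices
restored = unpack_4bit(packed, len(indices))
```

Dequantization is then a table lookup: the weight is `codebook[index]`, where the codebook holds the 16 full-precision centers.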

Description

Technical field

[0001] The invention relates to a neural network quantization scheme, in particular to an iterative neural network quantization method and system based on vector quantization.

Background technique

[0002] Deep convolutional neural networks have achieved great success in computer vision fields such as image classification, object detection, and semantic segmentation. The excellent performance of deep convolutional networks is caused by many factors. In addition to more and more data resources and increasingly powerful computing hardware, a large number of learnable parameters is the most important factor. In order to achieve a high accuracy rate, the design of the neural network is developing in a wider and deeper direction, which brings a great burden on computing and storage resources. Deploying deep networks on mobile devices has become more difficult. For example, the VGG-16 model has 138.34 million parameters and takes up about 500MB of storage space....
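The storage figure for VGG-16 can be checked with simple arithmetic. The sketch below assumes 1 MB = 10^6 bytes and ignores the small codebook overhead; the 4-bit figure uses the bit width stated in the abstract.

```python
params = 138_340_000  # VGG-16 parameter count from the text

def size_mb(num_params, bits_per_param):
    """Storage in megabytes (10**6 bytes) at a given bit width."""
    return num_params * bits_per_param / 8 / 1e6

fp32_mb = size_mb(params, 32)  # 32-bit floating point
int4_mb = size_mb(params, 4)   # 4-bit indices, codebook overhead ignored
ratio = fp32_mb / int4_mb      # compression ratio from bit width alone
```

At 32 bits the parameters occupy roughly 553 MB, in line with the "about 500MB" quoted above; at 4 bits they occupy roughly 69 MB, an 8x reduction.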


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G06N3/08, G06K9/62

CPC: G06N3/08, G06F18/23213

Inventors: 熊红凯 (Xiong Hongkai), 徐宇辉 (Xu Yuhui)

Owner: SHANGHAI JIAOTONG UNIV
