A Quantization Method for Deep Neural Networks Based on Elastic Significant Bits
A deep neural network quantization technology, applied in the field of deep neural network quantization based on elastic significant bits. It addresses the problems of the difficult step function, decreased precision, and reduced computational efficiency of quantized models, achieving efficient multiplication, improved accuracy, and low quantization loss.
Examples
Embodiment 1
[0040] Embodiments of the present invention provide a deep neural network quantization method based on elastic significant bits, which quantizes fixed-point or floating-point numbers into quantized values with elastic significant bits and discards the redundant mantissa part. The elastic significant bits are reserved starting from the most significant bit, and the number of significant bits is limited.
[0041] For a given data value v, starting from the position of its most significant bit, k+1 significant bits are specified, expressed as follows:
[0042] $v=\sum_{i=n-k}^{n} b_i\,2^{i}+\sum_{i=-\infty}^{n-k-1} b_i\,2^{i},\qquad b_i\in\{0,1\}$
[0043] Among them, the part from bit (n-k) to bit n is the reserved significant part, and the part from bit 0 (for fixed-point data) or -∞ (for floating-point data) to bit (n-k-1) is the mantissa part that needs to be rounded. The quantization of fixed-point or floating-point numbers is:
[0044] P(v) = R(v >> (n-k)) << (n-k)
[0045] Among them, >> and << are shifting operations, and R() is a rounding operation.
[0046] As shown in Figure 3, in this embodiment the value 91 is quantized, and the elastic significant bits are s...
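As an illustration of the projection P(v) = R(v >> (n-k)) << (n-k) above, the following is a minimal Python sketch, not the patent's reference implementation; the function name esb_quantize, the round-to-nearest behavior of Python's round(), and the value k = 2 in the demo are assumptions made purely for illustration.

```python
import math

def esb_quantize(v: float, k: int) -> float:
    """Project v onto a value with at most k+1 significant bits,
    following P(v) = R(v >> (n-k)) << (n-k), where n is the position
    of the most significant bit of |v| and R() rounds to nearest."""
    if v == 0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    mag = abs(v)
    n = math.floor(math.log2(mag))   # position of the most significant bit
    scale = 2.0 ** (n - k)           # bits below position (n-k) form the mantissa
    # shift right by (n-k), round away the mantissa, then shift back
    return sign * round(mag / scale) * scale

# Quantizing 91 while keeping three significant bits (k = 2):
print(esb_quantize(91, 2))  # 96.0 = round(91 / 16) * 16
```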
Embodiment 2
[0049] Embodiments of the present invention provide a deep neural network quantization method based on elastic significant bits, which quantizes fixed-point or floating-point numbers into quantized values with elastic significant bits and discards the redundant mantissa part. The elastic significant bits are reserved starting from the most significant bit, and the number of significant bits is limited.
[0050] For a given data value v, starting from the position of its most significant bit, k+1 significant bits are specified, expressed as follows:
[0051] $v=\sum_{i=n-k}^{n} b_i\,2^{i}+\sum_{i=-\infty}^{n-k-1} b_i\,2^{i},\qquad b_i\in\{0,1\}$
[0052] Among them, the part from bit (n-k) to bit n is the reserved significant part, and the part from bit 0 (for fixed-point data) or -∞ (for floating-point data) to bit (n-k-1) is the mantissa part that needs to be rounded. The quantization of fixed-point or floating-point numbers is:
[0053] P(v) = R(v >> (n-k)) << (n-k)
[0054] Among them, >> and << are shifting operations, and R() is a rounding operation.
[0055] As shown in Figure 5, in this embodiment the value 92 is quantized, and the elastic significant bits are ...
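For concreteness, a worked instance of the same projection formula, with k = 2 (three significant bits) assumed purely for illustration:

$$92=(1011100)_2,\quad n=6:\qquad P(92)=R(92\gg 4)\ll 4=R(5.75)\cdot 2^{4}=6\cdot 2^{4}=96.$$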
Embodiment 3
[0058] This embodiment quantitatively evaluates the distribution difference between the quantized values and the original data by solving for a feasible solution:
[0059] Let the weight to be quantized be W, sampled from a random variable t ~ p(t), and let Q be the set of all quantized values. The distribution difference function is defined as follows:
[0060] $D(Q)=\sum_{q\in Q}\int_{S}\left|t-q\right|\,p(t)\,dt$
[0061] $\text{s.t. }\; S=(q-q_l,\,q+q_u]$
[0062] Among them, (q - q_l, q + q_u] represents the range of continuous data that can be projected onto the quantized value q; the range is centered at q, and q_l and q_u indicate its lower and upper floating ranges. The solution is illustrated in Figure 7. The distribution difference can be used to evaluate the optimal quantization under different numbers of elastic significant bits.
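A minimal numerical sketch of evaluating such a distribution difference, assuming p(t) is the standard normal density, the pointwise error is |t - q| weighted by p(t), and the bounds q_l and q_u are taken as midpoints to the neighbouring quantized values; the function name distribution_difference and the use of SciPy quadrature are assumptions, not the patent's implementation.

```python
import numpy as np
from scipy import integrate, stats

def distribution_difference(Q, pdf=stats.norm(0.0, 1.0).pdf):
    """Sum over q in Q of the integral of |t - q| p(t) dt over the range
    S = (q - q_l, q + q_u] projected onto q; here q_l and q_u are chosen
    as midpoints to the neighbouring quantized values."""
    Q = np.sort(np.asarray(Q, dtype=float))
    # interval edges: midpoints between adjacent values, padded at the ends
    edges = np.concatenate(([Q[0] - 4.0], (Q[:-1] + Q[1:]) / 2.0, [Q[-1] + 4.0]))
    total = 0.0
    for q, lo, hi in zip(Q, edges[:-1], edges[1:]):
        err, _ = integrate.quad(lambda t, q=q: abs(t - q) * pdf(t), lo, hi)
        total += err
    return total

# Example: distribution difference of 16 uniformly spaced quantized values
print(distribution_difference(np.linspace(-1.5, 1.5, 16)))
```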
[0063] Input: a DNN weight w_f sampled from the standard normal distribution N(0, 1), which needs to be quantized to a 4-bit low-precision value;
[0064] Output: the optimal number of significant bits and the quantized weight w_q.
[0...
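A self-contained sketch of how such a selection could be carried out, using a Monte-Carlo estimate of the distribution difference rather than the exact integral; the candidate_set construction of 4-bit value sets with k+1 significant bits, the nearest-value projection, and all function names are illustrative assumptions rather than the patent's codebook or algorithm.

```python
import numpy as np

def candidate_set(k: int, total_bits: int = 4) -> np.ndarray:
    """Illustrative 4-bit value set whose magnitudes carry k+1 significant
    bits: mantissas (1.b1...bk)_2 scaled by powers of two, with the exponent
    range shrinking as k grows so the code count stays within the budget."""
    n_mag = 2 ** (total_bits - 1) - 1                 # codes left after sign and zero
    mant = 1.0 + np.arange(2 ** k) / 2 ** k           # mantissas (1.b1...bk)_2
    exps = 2.0 ** -np.arange((n_mag + len(mant) - 1) // len(mant))
    mags = np.unique(np.outer(mant, exps).ravel())[-n_mag:]
    return np.concatenate((-mags[::-1], [0.0], mags))

def nearest_project(w: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Project each weight onto its nearest value in the quantization set Q."""
    return Q[np.argmin(np.abs(w[:, None] - Q[None, :]), axis=1)]

# Monte-Carlo estimate of the distribution difference (expected |t - q|)
# for weights drawn from N(0, 1), used to pick the best k for a 4-bit budget.
rng = np.random.default_rng(0)
w_f = rng.standard_normal(100_000)
diffs = {k: np.mean(np.abs(w_f - nearest_project(w_f, candidate_set(k))))
         for k in range(4)}
best_k = min(diffs, key=diffs.get)
w_q = nearest_project(w_f, candidate_set(best_k))
print("optimal number of significant bits:", best_k + 1)
```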