
CNN quantification method based on low-precision floating-point number, forward calculation method and device

A quantization method and forward-computation technology in the field of deep convolutional neural network quantization, addressing problems such as low acceleration performance

Active Publication Date: 2020-02-28
深圳市比昂芯科技有限公司

AI Technical Summary

Problems solved by technology

[0004] Although current technology improves quantization and its accuracy, several limitations remain: 1) for deep convolutional neural networks (more than 100 convolutional / fully-connected layers), retraining is required to guarantee accuracy; 2) quantization must use 16-bit floating-point numbers or 8-bit fixed-point numbers to maintain accuracy; 3) without retraining, and while preserving accuracy, current technology can perform at most two multiplication operations per DSP, resulting in low acceleration performance on FPGAs.



Examples


Embodiment 1

[0072] In view of the prior art's need for retraining to ensure accuracy, and its reliance on 16-bit floating-point or 8-bit fixed-point numbers, the quantization method of this application uses the low-precision floating-point representation MaEb. Without retraining, it can find the optimal data representation with only a 4-bit or 5-bit mantissa, keeping the top-1 / top-5 accuracy loss negligible: within 0.5% and 0.3%, respectively. The method is as follows:

[0073] The CNN quantization method based on low-precision floating-point numbers includes the following steps, performed in each layer of the convolutional neural network:

[0074] Step 1: Define the low-precision floating-point representation MaEb of the network; the representation includes a sign bit, an a-bit mantissa and a b-bit exponent, where a and b are positive integers;

[00...
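To make Step 1 concrete, the following is a minimal Python sketch of quantizing single-precision values to a hypothetical MaEb format (1 sign bit, a-bit mantissa, b-bit exponent). The IEEE-style exponent bias, the saturation rule and the function name quantize_maeb are assumptions of this sketch, not details taken from the patent.

    import numpy as np

    def quantize_maeb(x, a, b, scale=1.0):
        """Round x (after scaling) to the nearest value representable with
        a sign bit, an a-bit mantissa and a b-bit exponent.
        The exponent bias and saturation rule are assumptions."""
        bias = 2 ** (b - 1) - 1                       # assumed IEEE-style bias
        y = np.asarray(x, dtype=np.float64) * scale
        sign, mag = np.sign(y), np.abs(y)
        safe = np.where(mag > 0, mag, 1.0)            # avoid log2(0)
        e = np.clip(np.floor(np.log2(safe)), -bias, 2 ** b - 1 - bias)
        step = 2.0 ** (e - a)                         # spacing of representable values
        q = np.round(mag / step) * step
        max_val = (2.0 - 2.0 ** -a) * 2.0 ** (2 ** b - 1 - bias)
        q = np.minimum(q, max_val)                    # saturate instead of overflowing
        q = np.where(mag == 0, 0.0, q)
        return sign * q / scale

For example, quantize_maeb(np.array([0.137, -1.92]), a=4, b=3) snaps each value to the nearest point on the 4-bit-mantissa grid of its binade, which is the rounding behavior the MSE search below measures.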

Embodiment 2

[0098] Based on Embodiment 1, this embodiment provides a convolutional layer forward calculation method, including the following steps in the convolutional neural network:

[0099] Step a: Quantize the single-precision floating-point input data into low-precision floating-point numbers in the MaEb form; the input data includes input activation values, weights and biases, and a and b are positive integers;

[0100] Step b: Distribute the MaEb floating-point numbers to the N_m parallel low-precision floating-point multipliers in the floating-point function module and perform the forward calculation to obtain full-precision floating-point products, where N_m denotes the number of low-precision floating-point multipliers in one processing unit PE of the floating-point function module;

[0101] Step c: Transmit the full-precision floating-point products to the data conversion module to obtain a fixed-poin...
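The flow of steps a and b can be sketched in a few lines of Python, reusing quantize_maeb (and the numpy import) from the sketch under Embodiment 1. Grouping the products in chunks of n_mult stands in for the N_m parallel multipliers of one PE; the chunk size, the accumulation order, and the use of an ordinary float sum in place of step c's fixed-point conversion are all assumptions of this sketch.

    def forward_dot_maeb(acts, weights, bias, a=4, b=3, scale=1.0, n_mult=16):
        """Sketch of steps a-b for one output value; step c's fixed-point
        accumulation is approximated here by ordinary float sums."""
        qa = quantize_maeb(acts, a, b, scale)      # step a: quantize activations
        qw = quantize_maeb(weights, a, b, scale)   # step a: quantize weights
        prods = (qa * qw).ravel()                  # step b: full-precision products
        acc = 0.0
        for i in range(0, prods.size, n_mult):     # one pass per group of N_m multipliers
            acc += prods[i:i + n_mult].sum()
        return acc + float(quantize_maeb(bias, a, b, scale))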

Embodiment 3

[0128] Based on Embodiment 1 or 2, this embodiment provides a device, as shown in Figure 3. The device includes a floating-point function module implemented as either a customized circuit or a non-customized circuit. The floating-point function module distributes input data to different processing units PE, computes dot products of MaEb floating-point numbers in parallel through the low-precision floating-point representation, and completes the forward calculation of the convolutional layer;

[0129] The floating-point function module includes n parallel processing units PE, and each processing unit PE implements N_m MaEb floating-point multipliers, where n is a positive integer, a and b are both positive integers, and N_m denotes the number of low-precision floating-point multipliers in one processing unit PE.

[0130] Each processing unit PE includes 4T parallel branches, and each parallel branch contains...
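A structural sketch of this hierarchy (module → n PEs → 4·T branches per PE) in Python follows. The class and field names are illustrative, and because paragraph [0130] is truncated here, the assumption that the N_m multipliers divide evenly across the 4·T branches is this sketch's, not the patent's.

    from dataclasses import dataclass

    @dataclass
    class FloatFunctionModule:
        """Structural sketch of the Embodiment 3 device: n parallel PEs,
        each implementing N_m MaEb multipliers over 4*T branches.
        Names and the even per-branch split are assumptions."""
        n: int      # number of parallel processing units PE
        n_m: int    # MaEb floating-point multipliers per PE
        t: int      # each PE has 4*T parallel branches

        def peak_multiplies_per_cycle(self) -> int:
            return self.n * self.n_m               # all multipliers busy at once

        def multipliers_per_branch(self) -> int:
            assert self.n_m % (4 * self.t) == 0, "assumed even split"
            return self.n_m // (4 * self.t)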



Abstract

The invention discloses a CNN quantization method based on low-precision floating-point numbers, together with a forward calculation method and device, and relates to the field of deep convolutional neural network quantization. The quantization method comprises the following steps: a low-precision floating-point representation MaEb of the network is defined; in the process of optimizing this representation, the optimal representation and the optimal scaling factor, corresponding to the minimum mean-square-error value, are obtained by changing the scaling factor, changing the combination of a and b, and calculating the mean square error of the weights and activation values before and after quantization; based on the low-precision floating-point representation and the optimal scaling factor, single-precision floating-point numbers are quantized into low-precision floating-point numbers. Because the low-precision floating-point representation MaEb is calculated and used, the accuracy of network quantization is ensured without retraining; with accuracy guaranteed, the acceleration performance of a customized circuit or a non-customized circuit is greatly improved, where the customized circuit is an ASIC or an SoC and the non-customized circuit comprises an FPGA.
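The search the abstract describes, varying the scaling factor and the (a, b) combination and keeping whichever pair minimizes the mean square error between the original and quantized tensor, can be sketched as a small loop over candidates, again reusing quantize_maeb from Embodiment 1. The candidate formats and scaling factors below are illustrative assumptions, not values from the patent.

    import numpy as np

    def search_maeb(tensor, formats=((4, 2), (4, 3), (5, 2), (5, 3)),
                    scales=(0.25, 0.5, 1.0, 2.0, 4.0)):
        """Pick the (a, b) combination and scaling factor that minimize the
        mean square error before and after quantization.
        The candidate lists are assumptions of this sketch."""
        best = None
        for a, b in formats:
            for s in scales:
                q = quantize_maeb(tensor, a, b, s)
                mse = float(np.mean((tensor - q) ** 2))
                if best is None or mse < best[0]:
                    best = (mse, a, b, s)
        return best  # (mse, a, b, scale)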

Description

Technical Field

[0001] The invention relates to the field of deep convolutional neural network quantization, and in particular to a CNN quantization method, forward calculation method and device based on low-precision floating-point numbers.

Background

[0002] In recent years, AI (Artificial Intelligence) applications have penetrated many areas, such as face recognition, game playing, image processing and simulation. Although processing accuracy has improved, neural networks contain many layers and a large number of parameters, and therefore require very large computation cost and storage space. In response, engineers have proposed neural network compression schemes: changing the network structure, or using quantization and approximation methods, to reduce network parameters or storage space without greatly affecting neural network performance. ...


Application Information

IPC(8): G06N 3/08, G06N 3/04
CPC: G06N 3/082, G06N 3/045
Inventors: 吴晨, 王铭宇, 徐世平
Owner: 深圳市比昂芯科技有限公司