
Adaptive quantization method adapted to a neural network accelerator running on an FPGA

An adaptive quantization and neural network technology, applied in the field of adaptive quantization and neural network accelerators. It solves problems such as data overflow and the inability to guarantee correct calculation results, and achieves easy deployment and implementation, savings in storage space and computing resources, and correct results.

Pending Publication Date: 2022-02-01
XIAN MICROELECTRONICS TECH INST

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to overcome the shortcoming of existing quantization methods, which do not consider the overflow that may occur during integer operations when deployed on an FPGA and therefore cannot guarantee correct calculation results, and to provide an adaptive quantization method adapted to a neural network accelerator running on an FPGA.
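To make the overflow problem concrete (an illustration added to this summary, not part of the patent text), here is a minimal sketch in Python. It assumes a hypothetical accelerator that multiplies int8 operands but keeps partial sums in only a 16-bit register; the accumulator width and the wraparound model are assumptions chosen for illustration.

```python
import numpy as np

ACC_BITS = 16  # assumed accumulator width of the hypothetical accelerator


def wrap_to_acc(value, bits=ACC_BITS):
    """Simulate two's-complement wraparound of a fixed-width accumulator."""
    mask = (1 << bits) - 1
    value &= mask
    if value >= 1 << (bits - 1):  # reinterpret as a signed value
        value -= 1 << bits
    return value


# Worst-case style operands near the top of the int8 range.
x = np.full(16, 127, dtype=np.int64)  # 16 quantized activations
w = np.full(16, 127, dtype=np.int64)  # 16 quantized weights

wide_sum = int(np.dot(x, w))  # 16 * 127 * 127 = 258064

acc = 0
for xi, wi in zip(x, w):
    acc = wrap_to_acc(acc + int(xi) * int(wi))  # narrow accumulator overflows silently

print("wide accumulator  :", wide_sum)  # 258064
print("16-bit accumulator:", acc)       # -4080, a silently wrong result
```

Because the overflow is silent, the dequantized output of the layer is simply wrong, which is the failure mode the adaptive method is designed to prevent.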



Examples


Embodiment

[0030] Taking a neural network model with only one Convolution layer as an example, the present invention will be further described in detail.

[0031] (1) If the invention is not adopted, that is, if the overflow that may occur during integer operations is not considered when deploying on the FPGA, then the calculation process and results are derived as follows:

[0032] The parameters of the convolution layer in this neural network model are pad = 0, stride = 1, the input size is 1×3×3, the filter size is 2×2, and the bias is 1.213093. The process of deploying the neural network model on the neural network accelerator is as follows:

[0033] (a) The quantization coefficients are pre-calculated by the quantization software.

[0034] Take a quantization bit width of 8 as an example. The quantization parameters obtained by the quantization software are th_in = 11.32375, th_w = 8.134685, and th_iout = 153.37865. The quantization coeff...
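The coefficient formula itself is cut off in this extract. As a hedged illustration only, the sketch below assumes the common symmetric mapping used by many quantization tools, scale = (2^(bit_width - 1) - 1) / th, and applies it to the thresholds quoted above; the actual coefficients computed by the patent's quantization software may be defined differently.

```python
# Hedged sketch: one common way quantization coefficients are derived from
# calibration thresholds. The patent's exact formula is truncated in this
# extract, so this mapping is an assumption, not the patented computation.
BIT_WIDTH = 8
QMAX = 2 ** (BIT_WIDTH - 1) - 1  # 127 for 8-bit symmetric quantization

thresholds = {
    "th_in": 11.32375,     # input threshold from the embodiment
    "th_w": 8.134685,      # weight threshold from the embodiment
    "th_iout": 153.37865,  # output threshold from the embodiment
}

scales = {name: QMAX / th for name, th in thresholds.items()}

for name, s in scales.items():
    print(f"scale for {name}: {s:.6f}")
# e.g. scale for th_in = 127 / 11.32375 ≈ 11.22
```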



Abstract

The invention discloses an adaptive quantization method adapted to a neural network accelerator running on an FPGA, and belongs to the field of neural networks. The degree of overflow in the accelerator's calculations is automatically pre-judged according to the actual bit widths used in the calculation process, and the quantization parameters are adaptively adjusted according to that degree of overflow. This avoids data overflow when a neural network algorithm is computed on an FPGA and thereby guarantees the correctness of the neural network model's results. The adaptive quantization method combines the quantization operation with the resource planning of the neural network accelerator hardware, ensures correct results when the accelerator deploys an algorithm, and can effectively reduce the model scale without losing model precision or execution efficiency. The method is easy to deploy and implement under resource-constrained conditions, saves storage space and computing resources, and has important research significance and application value.
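As a rough illustration of the idea stated in the abstract (a sketch under stated assumptions, not the patented procedure), the overflow pre-judgment can be thought of as comparing the worst-case bit width of a layer's accumulated result against the accelerator's actual accumulator width, and narrowing the quantization bit width until the result is guaranteed to fit. The register widths and the adjustment policy below are hypothetical.

```python
import math


def required_acc_bits(act_bits, w_bits, num_accumulations):
    """Worst-case signed bit width of a sum of num_accumulations products of an
    act_bits-wide activation and a w_bits-wide weight (both two's complement)."""
    product_bits = act_bits + w_bits                       # worst-case product width
    growth_bits = math.ceil(math.log2(num_accumulations))  # growth from accumulation
    return product_bits + growth_bits


def choose_bit_width(hw_acc_bits, num_accumulations, start_bits=8, min_bits=2):
    """Hypothetical adaptive adjustment: shrink the quantization bit width until
    the worst-case accumulation fits the hardware accumulator."""
    bits = start_bits
    while bits > min_bits and required_acc_bits(bits, bits, num_accumulations) > hw_acc_bits:
        bits -= 1
    return bits


# Example: a 2x2 filter over one input channel -> 4 multiply-accumulates per output.
print(required_acc_bits(8, 8, 4))   # 8 + 8 + 2 = 18 bits needed in the worst case
print(choose_bit_width(18, 4))      # 8  (fits an assumed 18-bit accumulator)
print(choose_bit_width(16, 4))      # 7  (8-bit would overflow a 16-bit accumulator)
```

A real implementation would also account for the bias addition and for the accelerator's actual datapath, but the same fit-or-narrow reasoning applies.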

Description

Technical field

[0001] The invention belongs to the field of neural networks, and in particular relates to an adaptive quantization method adapted to a neural network accelerator running on an FPGA.

Background technique

[0002] In order to achieve high-speed, low-power calculation, neural network accelerators generally support low numerical precision, such as 8-bit or 6-bit fixed-point arithmetic, while the original numerical precision of neural network models is generally 32-bit floating point. Therefore, when a neural network accelerator deploys a neural network algorithm, the neural network model must be automatically compressed into an 8-bit or 6-bit integer network through a quantization operation.

[0003] The foreign giant Nvidia dominates the neural network accelerator market on the strength of its complete GPU+CUDA ecosystem, and maps floating-point data to integer data. However, its products are expensive, are not independently controllable, and GPU computing per...
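For readers unfamiliar with the quantization operation mentioned above, the following is a generic example of mapping 32-bit floating-point values to 8-bit integers with a symmetric scale (a textbook-style scheme used here only to illustrate the background; it is not claimed to be the patent's method).

```python
import numpy as np


def quantize_symmetric(x, th, bit_width=8):
    """Map float values in [-th, th] to signed integers of the given bit width."""
    qmax = 2 ** (bit_width - 1) - 1
    scale = qmax / th
    q = np.clip(np.round(x * scale), -qmax, qmax).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Approximate reconstruction of the original floating-point values."""
    return q.astype(np.float32) / scale


x = np.array([0.5, -3.2, 7.9, 11.0], dtype=np.float32)
q, scale = quantize_symmetric(x, th=11.32375)
print(q)                     # int8 codes, e.g. [  6 -36  89 123]
print(dequantize(q, scale))  # values close to the original x
```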

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N3/02
CPC: G06N3/02
Inventors: 魏璐, 马钟, 王月娇, 杨超杰
Owner: XIAN MICROELECTRONICS TECH INST