Neural network low-bit quantization method

A neural network quantization technology, applied in the field of data compression, that addresses the large memory footprint limiting the application of neural networks. It is highly practical and improves both quantization efficiency and accuracy.

Pending Publication Date: 2021-02-19
BEIJING TSINGMICRO INTELLIGENT TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] As models grow, they place higher demands on the computing performance of the device, and increasing network scale and power consumption have gradually become the main obstacles limiting the application of neural networks.
Ever-larger networks require more and more memory to compute, and larger models also require greater bandwidth, which greatly limits the application of neural networks in embedded devices.

Examples

Embodiment Construction

[0023] The specific embodiments of the present invention are described below with reference to the drawings.

[0024] As shown in Figure 1, an embodiment of the present invention is a neural network low-bit quantization method comprising steps S101 to S114.

[0025] S101: Obtain an initial neural network, and acquire the weights and biases of its C channels, where each channel includes a convolutional layer.

[0026] The initial neural network can be applied to tasks such as image classification, object detection, and natural language processing. It has already been trained, and it stores and computes in floating point: each weight is represented as a float32 value.

[0027] Quantizing the initial neural network means converting its floating-point storage and operations into integer storage and operations, which realizes compression of the model's ...
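The float-to-integer conversion described in [0027] can be illustrated with a minimal sketch. It assumes a symmetric linear scheme with one shared scale per tensor, which this excerpt does not confirm; the function names `quantize_symmetric` and `dequantize`, the 8-bit width, and the sample weights are all hypothetical.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one shared scale.

    Assumption: symmetric linear quantization; the excerpt does not
    specify the exact scheme used by the method.
    """
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integers and the scale."""
    return [qi * scale for qi in q]


weights = [0.4, -1.0, 0.25, 0.0]                # hypothetical float32 weights
q, scale = quantize_symmetric(weights, num_bits=8)
restored = dequantize(q, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy cost paid for storing integers instead of float32 values.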

Abstract

The invention relates to a neural network low-bit quantization method. The weights of each channel of a neural network are quantized to low-bit fixed-point weights. The method further includes: obtaining an input quantization coefficient of the current convolutional layer according to the target quantization threshold and the floating-point quantization interval; taking the input quantization coefficient of the next convolutional layer as the output quantization coefficient of the current convolutional layer; quantizing the input floating-point data to obtain input fixed-point data; quantizing the output floating-point data to obtain output fixed-point data; converting the scaling factor and the bias into a scaling-factor fixed-point value and a bias fixed-point value, respectively; obtaining a quantized neural network from the low-bit fixed-point weights, the input fixed-point data, the output fixed-point data, the scaling-factor fixed-point value, and the bias fixed-point value; and applying the quantized neural network model to embedded devices.
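The abstract does not give formulas in this excerpt, so the following is only a sketch of how per-channel weight quantization and a fixed-point scaling factor commonly fit together. It assumes symmetric per-channel scales, a combined scaling factor of `s_in * s_w / s_out`, and 16 fractional bits; the names `quantize_per_channel`, `to_fixed_point`, `requantize`, and all numeric values are hypothetical and are not taken from the patent.

```python
def quantize_per_channel(weights, num_bits=4):
    """Quantize each channel with its own scale (assumed symmetric scheme)."""
    qmax = 2 ** (num_bits - 1) - 1              # 7 for 4 bits
    q_channels, scales = [], []
    for channel in weights:
        max_abs = max(abs(w) for w in channel)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        q_channels.append(
            [max(-qmax, min(qmax, round(w / scale))) for w in channel]
        )
        scales.append(scale)
    return q_channels, scales


def to_fixed_point(value, frac_bits=16):
    """Approximate a float as an integer numerator over 2**frac_bits."""
    return round(value * (1 << frac_bits))


def requantize(acc, factor_fixed, bias_fixed, frac_bits=16, num_bits=8):
    """Integer-only rescale of an accumulator, rounding to nearest."""
    qmax = 2 ** (num_bits - 1) - 1
    half = 1 << (frac_bits - 1)                 # rounding offset
    out = (acc * factor_fixed + bias_fixed + half) >> frac_bits
    return max(-qmax, min(qmax, out))


weights = [[0.7, -0.3], [0.07, 0.02]]           # two hypothetical channels
q_weights, w_scales = quantize_per_channel(weights, num_bits=4)

s_in, s_out = 0.05, 0.1                         # hypothetical input/output coefficients
factors = [to_fixed_point(s_in * s_w / s_out) for s_w in w_scales]
bias_fixed = to_fixed_point(0.2 / s_out)        # hypothetical bias of 0.2, in output steps
out = requantize(1000, factors[0], bias_fixed)  # 1000 is a made-up integer accumulator
```

Giving each channel its own scale lets the second channel, whose weights are ten times smaller, use the full 4-bit range instead of collapsing toward zero. Storing the scaling factor and bias as integers over `2**16` keeps the whole rescaling step in integer arithmetic, which suits embedded hardware without a floating-point unit.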

Description

Technical field

[0001] The invention relates to the field of data compression, in particular to a neural network low-bit quantization method.

Background

[0002] Neural network technology has achieved good results in tasks including image classification, object detection, and natural language processing. To improve recognition accuracy, neural network models have kept growing in both scale and complexity.

[0003] As a consequence, higher requirements have been placed on the computing performance of the equipment, and increasing network scale and power consumption have gradually become the main obstacles limiting the application of neural networks. Ever-larger networks need more and more memory to compute. At the same time, larger models also require greater bandwidth, which greatly limits the application of neural networks in embedded devices.

Summary of the in...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/04, G06N3/06
CPC: G06N3/06, G06N3/045
Inventors: 张书瑞, 欧阳鹏, 尹首一
Owner: BEIJING TSINGMICRO INTELLIGENT TECH CO LTD