
Neural network model quantification method, device and system, electronic equipment and storage medium

A neural network model quantization method, device, and system, applied to electronic equipment and storage media. It addresses problems such as quantization error, increased hardware design overhead, and loss of neural network prediction performance, with the effects of improving accuracy, reducing distribution differences, and reducing quantization errors.

Pending Publication Date: 2022-01-07
NANJING HOUMO TECH CO LTD
Cites: 0 · Cited by: 1

AI Technical Summary

Problems solved by technology

[0004] In the process of realizing the present disclosure, the inventors found the following. In existing quantization algorithms that quantize at the granularity of a neural network layer, the weights of each layer share one set of quantization parameters; because the weight distributions of different channels within the same layer generally differ greatly, a single set of quantization parameters causes large quantization errors for channels with narrow distribution ranges, and if a low quantization bit width is used, the prediction performance of the neural network suffers a large loss. In existing quantization algorithms that quantize at the granularity of a channel, the weights of each output channel of each layer share one set of quantization parameters; since the quantization parameters differ from channel to channel, the fine-grained handling of different channels increases the hardware design overhead and reduces the operating efficiency of the hardware. In existing quantization algorithms that quantize at the granularity of a storage-and-computation (compute-in-memory) unit array, each layer of the neural network is mapped onto multiple such arrays and the weights within each array share one set of quantization parameters; because an array stores the weights of different channels, whose distributions also differ greatly, quantization errors still arise when one set of quantization parameters is used.
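The layer-granularity problem described above can be illustrated with a small sketch. This is not code from the patent: the toy weights, bit width, and symmetric uniform quantizer are all assumptions chosen to show why a channel with a narrow distribution suffers under a scale shared with a wide channel.

```python
import numpy as np

# Hypothetical toy weights: two channels of one layer with very different
# distribution ranges, as the paragraph above describes.
rng = np.random.default_rng(0)
weights = np.stack([
    rng.normal(0.0, 1.0, size=256),   # channel 0: wide distribution
    rng.normal(0.0, 0.01, size=256),  # channel 1: narrow distribution
])

def quantize(w, scale, bits=8):
    """Uniform symmetric quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

# Layer granularity: one scale shared by all channels.
layer_scale = np.abs(weights).max() / 127.0
err_layer = np.abs(weights - quantize(weights, layer_scale)).mean(axis=1)

# Channel granularity: each channel gets its own scale.
chan_scales = np.abs(weights).max(axis=1, keepdims=True) / 127.0
err_chan = np.abs(weights - quantize(weights, chan_scales)).mean(axis=1)

# The narrow channel suffers far more under the shared (per-layer) scale.
print(err_layer[1] / err_chan[1])
```

Per-channel scales remove that error, but, as the paragraph notes, at the cost of per-channel handling in hardware.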




Embodiment Construction

[0038] Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.

[0039] Those skilled in the art will understand that terms such as "first" and "second" in the embodiments of the present disclosure are only used to distinguish different steps, devices, or modules, and represent neither any particular technical meaning nor any necessary logical sequence among them.

[0040] It should also be understood that in the embodiments of the present disclosure, "plurality" may refer to two or more than two, and "at least one" may refer to one, two or more than two.

[0041] It should also be understood that any component, data or structure mentioned in the embodiments of the present disclosure can generally be understood as one or more unless there i...



Abstract

The embodiments of the invention disclose a quantization method, device, and system for a neural network model, together with electronic equipment and a storage medium. The method comprises: for each network layer to be quantized in the neural network model, obtaining the weight matrix of that layer; performing a matrix transformation on the weight matrix to obtain a weight matrix to be quantized; quantizing the transformed weight matrix to obtain a quantized weight matrix; and obtaining the quantized neural network model based on the quantized weight matrices of the network layers. According to the embodiments of the invention, the distribution differences among the weight data of the channels in a weight matrix can be reduced, the quantization error can be reduced, and the accuracy of the quantized neural network can be improved.
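The abstract's pipeline (obtain weight matrix → matrix transformation → quantize) can be sketched as follows. The text shown here does not disclose the concrete transformation, so this sketch assumes an invertible diagonal scaling that equalizes per-channel (per-row) ranges before a single shared quantization scale is applied; the function `quantize_layer` and every parameter are illustrative, not the patent's actual method.

```python
import numpy as np

def quantize_layer(weight, bits=8):
    """Sketch of the abstract's pipeline (assumed transform, not the patent's).

    1. Matrix-transform the weight matrix so that channel (row) distributions
       are more alike -- here via an invertible diagonal scaling D @ W.
    2. Quantize the transformed matrix with one shared scale.
    """
    # Step 1: equalize per-channel ranges with a diagonal transform.
    row_max = np.abs(weight).max(axis=1, keepdims=True)
    d = 1.0 / np.where(row_max == 0, 1.0, row_max)  # diagonal of D
    transformed = d * weight                         # every row now in [-1, 1]

    # Step 2: quantize with one scale shared by the whole layer.
    scale = np.abs(transformed).max() / (2 ** (bits - 1) - 1)
    q = np.clip(np.round(transformed / scale),
                -2 ** (bits - 1), 2 ** (bits - 1) - 1).astype(np.int32)
    # D**-1 would be folded into adjacent computation to undo the transform.
    return q, scale, 1.0 / d

w = np.array([[0.5, -2.0, 1.5],
              [0.01, -0.02, 0.005]])
q, scale, inv_d = quantize_layer(w)
recon = (q * scale) * inv_d
print(np.abs(w - recon).max())  # small reconstruction error for both rows
```

Because both rows occupy the full quantizer range after the transform, the narrow second row no longer loses most of its resolution to the wide first row.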

Description

Technical field

[0001] The present disclosure relates to artificial intelligence technology, and in particular to a quantization method, device, and system for a neural network model, as well as electronic equipment and a storage medium.

Background technique

[0002] With the rapid development of artificial intelligence, artificial neural networks are being applied more and more widely. The main operation in an artificial neural network is matrix-vector multiplication; the convolutional layers and fully connected layers, for example, are computed as matrix-vector multiplications. A memory-computing integrated (compute-in-memory) neural network accelerator integrates the computing units into the storage units, so it can run matrix-vector multiplication efficiently, which largely reduces the frequent data interaction between the computing units and the storage units and can also greatly reduce the interaction of intermediate data with the off-chip main memory. Therefore, the use of memor...
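As background for [0002], the role of matrix-vector multiplication can be shown with a minimal fully connected layer; the shapes and values below are arbitrary illustrations, not anything from the patent.

```python
import numpy as np

# A fully connected layer computes y = W @ x + b: one matrix-vector multiply.
rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))  # weights: 4 output channels x 8 inputs
x = rng.normal(size=8)       # input activation vector
b = np.zeros(4)              # bias

y = W @ x + b

# The same result written as explicit multiply-accumulates -- the operation a
# compute-in-memory array performs in place, without moving W off-chip.
y_mac = np.array([sum(W[i, j] * x[j] for j in range(8)) for i in range(4)])
print(np.allclose(y, y_mac))
```

In a compute-in-memory accelerator, W is programmed into the storage cells and the multiply-accumulate happens where the weights are stored, which is why this operation dominates the data-movement savings described above.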


Application Information

Patent Type & Authority: Application (China)
IPC (IPC8): G06N 3/08; G06N 3/063
CPC: G06N 3/08; G06N 3/063
Inventor: 袁之航, 陈亮, 赵亦彤, 王辉, 吴强
Owner: NANJING HOUMO TECH CO LTD