Quantization method of improving the model inference accuracy

A technology for quantizing the parameters of neural network models, applied in the field of artificial intelligence engines; it can solve problems such as the decline in the inference accuracy of quantized neural network models.

Publication Date: 2020-11-13 (pending)
Applicant: BAIDU USA LLC

Problems solved by technology

However, reducing the bitwidth through quantization usually leads to a drastic drop in the inference accuracy of the quantized neural network model.

Embodiment Construction

[0033] Various embodiments and aspects of the disclosure will be described with reference to details discussed below and illustrated in the accompanying drawings. The following description and drawings are illustrative of the present disclosure and should not be construed as limiting the present disclosure. Numerous specific details are described to provide a thorough understanding of various embodiments of the present disclosure. However, in certain instances, well-known or conventional details are not described in order to provide a concise discussion of embodiments of the present disclosure.

[0034] Reference in this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present disclosure. The appearances of the phrase "in one embodiment" in various places in this specification are not necessarily all referring to the same embodiment.

Abstract

The disclosure describes various embodiments for quantizing a trained neural network model. In one embodiment, a two-stage quantization method is described. In the offline stage, statically generated metadata (e.g., weights and biases) of the neural network model is quantized from floating-point numbers to integers of a lower bit width on a per-channel basis for each layer. Dynamically generated metadata (e.g., an input feature map) is not quantized in the offline stage. Instead, a quantization model is generated for the dynamically generated metadata on a per-channel basis for each layer. The quantization models and the quantized metadata can be stored in a quantization meta file, which can be deployed as part of the neural network model to an AI engine for execution. One or more specially programmed hardware components can quantize each layer of the neural network model based on information in the quantization meta file.
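
As a concrete sketch of the two-stage scheme summarized above, the hypothetical Python below (an illustration under assumed details, not the patent's actual implementation; names such as quantize_weights_per_channel and the meta-file layout are invented) quantizes one layer's weights per output channel to 8-bit integers in the offline stage, derives a per-channel quantization model (here, a single scale per channel) for the layer's input feature maps from calibration data, and collects both into a dict standing in for one layer's entry in the quantization meta file.

    import numpy as np

    def quantize_weights_per_channel(weights, bits=8):
        # Offline stage: statically generated metadata (weights) is quantized
        # from fp32 to signed integers, with one scale per output channel.
        qmax = 2 ** (bits - 1) - 1                     # 127 for 8 bits
        flat = weights.reshape(weights.shape[0], -1)
        scales = np.abs(flat).max(axis=1) / qmax       # per-channel scale
        scales = np.where(scales == 0.0, 1.0, scales)  # guard all-zero channels
        q = np.round(flat / scales[:, None]).clip(-qmax - 1, qmax)
        return q.astype(np.int8).reshape(weights.shape), scales

    def activation_quant_model(calib_feature_maps, bits=8):
        # Offline stage: the dynamically generated input feature maps are NOT
        # quantized; instead a per-channel quantization model (a scale) is
        # derived from calibration runs, for the runtime to apply later.
        qmax = 2 ** (bits - 1) - 1
        per_channel_max = np.max(
            [np.abs(fm).reshape(fm.shape[0], -1).max(axis=1)
             for fm in calib_feature_maps], axis=0)
        return per_channel_max / qmax

    # Hypothetical meta-file entry for a single convolution layer.
    weights = np.random.randn(4, 3, 3, 3).astype(np.float32)   # (out_ch, ...)
    calib = [np.random.randn(3, 8, 8).astype(np.float32) for _ in range(16)]
    q_weights, w_scales = quantize_weights_per_channel(weights)
    meta_entry = {
        "layer": "conv1",
        "quantized_weights": q_weights,          # quantized offline
        "weight_scales": w_scales,               # one per output channel
        "input_scales": activation_quant_model(calib),  # applied at run time
    }

The per-channel scales follow the abstract's per-channel, per-layer granularity; a real AI engine would additionally serialize such entries into whatever meta-file format it expects.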

Description

Technical Field

[0001] Embodiments of the present disclosure relate generally to artificial intelligence (AI) engines. More specifically, embodiments of the present disclosure relate to neural network quantization.

Background

[0002] As a branch of artificial intelligence (AI), machine learning can perform a task without using an application specifically programmed for that task. Instead, machine learning learns from past examples of a given task during training, which typically involves learning weights from a dataset.

[0003] A trained machine learning model (e.g., a neural network model) can perform tasks on input data by inference, and typically uses a 32-bit floating-point representation as the default representation for the model's metadata (e.g., weights and biases). During inference, the input feature map can be represented as a 32-bit integer. Larger bitwidths for the metadata and input feature maps can severely impact the performance of neural network inference.
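
To make the bit-width trade-off concrete, the following minimal sketch (the author's own illustration using the standard affine quantization formula, which the patent does not necessarily prescribe) round-trips a tensor from 32-bit floats to 8-bit integers: storage shrinks fourfold while the reconstruction error stays bounded by roughly half the quantization scale.

    import numpy as np

    x = np.random.randn(1024).astype(np.float32)     # fp32 tensor: 4096 bytes

    # Affine quantization: q = clip(round(x / scale) + zero_point, qmin, qmax)
    qmin, qmax = -128, 127                           # signed 8-bit range
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)

    x_hat = scale * (q.astype(np.float32) - zero_point)   # dequantize
    print(q.nbytes, "bytes vs", x.nbytes, "bytes")   # 1024 vs 4096, 4x smaller
    print("max abs error:", np.abs(x - x_hat).max()) # about scale / 2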

Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N3/063; G06N3/08
CPC: G06N3/063; G06N3/08; H03M7/24; H03M7/3059; G06N3/045; G06N20/20
Inventor: Min Guo (郭敏)
Owner: BAIDU USA LLC