Quantization method of improving the model inference accuracy

A technology for quantizing the parameters of neural network models, applied in the field of artificial intelligence engines; it can solve problems such as the decline in the inference accuracy of quantized neural network models.

Publication Date: 2020-11-13 (pending)
Applicant: BAIDU USA LLC

Problems solved by technology

However, reducing the bitwidth through quantization usually leads to a drastic drop in the inference accuracy of the quantized neural network model.

Embodiment Construction

[0033] Various embodiments and aspects of the disclosure will be described with reference to details discussed below and illustrated in the accompanying drawings. The following description and drawings are illustrative of the present disclosure and should not be construed as limiting the present disclosure. Numerous specific details are described to provide a thorough understanding of various embodiments of the present disclosure. However, in certain instances, well-known or conventional details are not described in order to provide a concise discussion of embodiments of the present disclosure.

[0034] Reference in this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present disclosure. The appearances of the phrase "in one embodiment" in various places in this specification are not necessarily all referring to the same embodiment.

Abstract

The disclosure describes various embodiments for quantizing a trained neural network model. In one embodiment, a two-stage quantization method is described. In the offline stage, statically generated metadata (e.g., weights and biases) of the neural network model is quantized from floating-point numbers to integers of a lower bit width on a per-channel basis for each layer. Dynamically generated metadata (e.g., an input feature map) is not quantized in the offline stage. Instead, a quantization model is generated for the dynamically generated metadata on a per-channel basis for each layer. The quantization models and the quantized metadata can be stored in a quantization meta file, which can be deployed as part of the neural network model to an AI engine for execution. One or more specially programmed hardware components can quantize each layer of the neural network model based on information in the quantization meta file.
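
As a concrete sketch of the two-stage scheme summarized above, the hypothetical Python below (an illustration under assumed details, not the patent's actual implementation; names such as quantize_weights_per_channel and the meta-file layout are invented) quantizes one layer's weights per output channel to 8-bit integers in the offline stage, derives a per-channel quantization model (here, a single scale per channel) for the layer's input feature maps from calibration data, and collects both into a dict standing in for one layer's entry in the quantization meta file.

    import numpy as np

    def quantize_weights_per_channel(weights, bits=8):
        # Offline stage: statically generated metadata (weights) is quantized
        # from fp32 to signed integers, with one scale per output channel.
        qmax = 2 ** (bits - 1) - 1                     # 127 for 8 bits
        flat = weights.reshape(weights.shape[0], -1)
        scales = np.abs(flat).max(axis=1) / qmax       # per-channel scale
        scales = np.where(scales == 0.0, 1.0, scales)  # guard all-zero channels
        q = np.round(flat / scales[:, None]).clip(-qmax - 1, qmax)
        return q.astype(np.int8).reshape(weights.shape), scales

    def activation_quant_model(calib_feature_maps, bits=8):
        # Offline stage: the dynamically generated input feature maps are NOT
        # quantized; instead a per-channel quantization model (a scale) is
        # derived from calibration runs, for the runtime to apply later.
        qmax = 2 ** (bits - 1) - 1
        per_channel_max = np.max(
            [np.abs(fm).reshape(fm.shape[0], -1).max(axis=1)
             for fm in calib_feature_maps], axis=0)
        return per_channel_max / qmax

    # Hypothetical meta-file entry for a single convolution layer.
    weights = np.random.randn(4, 3, 3, 3).astype(np.float32)   # (out_ch, ...)
    calib = [np.random.randn(3, 8, 8).astype(np.float32) for _ in range(16)]
    q_weights, w_scales = quantize_weights_per_channel(weights)
    meta_entry = {
        "layer": "conv1",
        "quantized_weights": q_weights,          # quantized offline
        "weight_scales": w_scales,               # one per output channel
        "input_scales": activation_quant_model(calib),  # applied at run time
    }

The per-channel scales follow the abstract's per-channel, per-layer granularity; a real AI engine would additionally serialize such entries into whatever meta-file format it expects.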

Description

Technical Field

[0001] Embodiments of the present disclosure relate generally to artificial intelligence (AI) engines. More specifically, embodiments of the present disclosure relate to neural network quantization.

Background

[0002] As a branch of artificial intelligence (AI), machine learning can perform a task without using an application specifically programmed for that task. Instead, machine learning learns from past examples of a given task during training, which typically involves learning weights from a dataset.

[0003] A trained machine learning model (e.g., a neural network model) can perform tasks on input data by inference, and typically uses a 32-bit floating-point representation as the default representation for the model's metadata (e.g., weights and biases). During inference, the input feature map can be represented as a 32-bit integer. Larger bitwidths for the metadata and input feature maps can severely impact the performance of neural network inference.
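
To make the bit-width trade-off concrete, the following minimal sketch (the author's own illustration using the standard affine quantization formula, which the patent does not necessarily prescribe) round-trips a tensor from 32-bit floats to 8-bit integers: storage shrinks fourfold while the reconstruction error stays bounded by roughly half the quantization scale.

    import numpy as np

    x = np.random.randn(1024).astype(np.float32)     # fp32 tensor: 4096 bytes

    # Affine quantization: q = clip(round(x / scale) + zero_point, qmin, qmax)
    qmin, qmax = -128, 127                           # signed 8-bit range
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)

    x_hat = scale * (q.astype(np.float32) - zero_point)   # dequantize
    print(q.nbytes, "bytes vs", x.nbytes, "bytes")   # 1024 vs 4096, 4x smaller
    print("max abs error:", np.abs(x - x_hat).max()) # about scale / 2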

Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N3/063; G06N3/08
CPC: G06N3/063; G06N3/08; H03M7/24; H03M7/3059; G06N3/045; G06N20/20
Inventor: Min Guo (郭敏)
Owner: BAIDU USA LLC