
Post-training quantization compression method and system for speech recognition tasks

A technology for quantization compression in speech recognition, applied to neural learning methods, complex mathematical operations, biological neural network models, etc. It addresses problems such as increased calculation error, inconsistent numerical distributions, and increased computation, improving inference efficiency while remaining easy to implement and avoiding extra computational cost.

Pending Publication Date: 2022-01-04
UNIV OF SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

Quantization falls into two categories: quantization-aware training and post-training quantization. The former requires additional training, which adds a large amount of computation and is unsuitable when training data is difficult to obtain due to privacy or licensing restrictions.
[0004] For post-training quantization, the common approach applies a uniform scaling ratio to the whole weight matrix. However, the distribution of values in each row and column of the weight matrix is often inconsistent, which increases the error introduced by quantization. In addition, the minimum-information-loss (KL divergence) criterion commonly used to select the optimal scaling ratio does not necessarily minimize the loss of model accuracy.
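The effect described above can be seen in a minimal sketch (illustrative data only, not the patented method): when the rows of a weight matrix span very different magnitudes, a single per-tensor scale wastes most of the int8 range on the small rows, while per-row scales fit each row's own dynamic range.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weight matrix whose rows have very different magnitudes,
# as the text notes is common in practice.
W = rng.standard_normal((4, 8)) * np.array([[0.01], [0.1], [1.0], [10.0]])

def quantize_int8(M, scale):
    """Symmetric int8 quantization with a given scale, then dequantize
    so the round-trip error can be measured."""
    q = np.clip(np.round(M / scale), -127, 127)
    return q * scale

# Uniform (per-tensor) scale: dominated by the largest row.
s_uniform = np.abs(W).max() / 127.0
err_uniform = np.abs(W - quantize_int8(W, s_uniform)).mean()

# Per-row scales: each row uses its own dynamic range.
s_rows = np.abs(W).max(axis=1, keepdims=True) / 127.0
err_rows = np.abs(W - quantize_int8(W, s_rows)).mean()

print(err_rows < err_uniform)  # per-row scaling gives lower mean error
```

The patent goes further than plain per-row scaling by choosing the scales through a theoretically derived optimization, but the gap shown here is the motivation.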



Embodiment Construction

[0020] The technical solutions in the embodiments of the present invention will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art without creative effort, based on the embodiments of the present invention, fall within the protection scope of the present invention.

[0021] First, terms that may be used herein are explained as follows:

[0022] The terms "comprising", "including", "containing", "having", or other descriptions with similar meanings shall be construed as non-exclusive inclusions. For example: including certain technical feature elements (such as raw materials, components, ingredients, carriers, dosage forms, materials, dimensions, parts, components, mechanisms, devices, steps, proced...



Abstract

The invention discloses a post-training quantization compression method and system for speech recognition tasks. The method scales the input vector and each row of the weight matrix through a diagonal matrix, performs quantization, and estimates the theoretical expected error of the computation result after quantization. Assuming that no truncation occurs in the post-quantization computation with the scaled weight matrix, corresponding constraint conditions are established, and the diagonal matrix that minimizes the theoretical expected error is solved with an alternating iterative optimization algorithm. Alternatively, a pre-truncation boundary is introduced to adjust the constraints, and the error-minimizing diagonal matrix is then solved in the same way. Through this finer-grained scaling and an optimization algorithm grounded in theoretical derivation, the scheme keeps the loss of model accuracy well controlled while markedly improving inference efficiency, that is, significantly reducing storage space and runtime, thereby broadening the application scenarios of speech recognition models.
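The abstract's core idea can be illustrated with a rough sketch, assuming one of the equivalent placements of the diagonal matrix: since W @ x = (W D) @ (D⁻¹ x), the diagonal entries of D can be tuned so that both rescaled operands quantize with less output error. The coordinate-grid search, the candidate multipliers, and the random calibration data below are all illustrative assumptions, not the patent's alternating iterative algorithm; scales are chosen from each operand's maximum so that no truncation (clipping) occurs, mirroring the constraint the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((16, 32))     # hypothetical weight matrix
X = rng.standard_normal((32, 100))    # hypothetical calibration inputs

QMAX = 127  # symmetric int8 range

def quant(M, scale):
    """Symmetric int8 quantize, then dequantize for error measurement."""
    q = np.clip(np.round(M / scale), -QMAX, QMAX)
    return q * scale

def output_error(W, X, d):
    """Mean |error| of (W diag(d)) @ (diag(1/d) X) after quantization."""
    Ws = W * d                        # scale columns of W
    Xs = X / d[:, None]               # compensating scale on X; W @ X unchanged
    sw = np.abs(Ws).max() / QMAX      # scales chosen from the max value, so
    sx = np.abs(Xs).max() / QMAX      # no truncation occurs during quantization
    return np.abs(W @ X - quant(Ws, sw) @ quant(Xs, sx)).mean()

# Alternating coordinate search: update one diagonal entry at a time over a
# small candidate grid (which includes the current value, so the error never
# increases), keeping the other entries fixed.
d = np.ones(W.shape[1])
for _ in range(3):                    # a few outer iterations
    for j in range(W.shape[1]):
        cands = d[j] * np.array([0.5, 0.8, 1.0, 1.25, 2.0])
        errs = []
        for c in cands:
            dj = d.copy()
            dj[j] = c
            errs.append(output_error(W, X, dj))
        d[j] = cands[int(np.argmin(errs))]

print(output_error(W, X, np.ones(W.shape[1])) >= output_error(W, X, d))
```

The patent instead derives the theoretical expected error in closed form and minimizes it directly; the grid search here only demonstrates that optimizing the diagonal scaling can reduce the quantized output error relative to no rescaling.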

Description

Technical field

[0001] The invention relates to the technical field of deep learning and artificial intelligence, and in particular to a post-training quantization compression method and system for speech recognition tasks.

Background technique

[0002] The speech recognition task is the process of converting speech into text, and it is one of the most important and common tasks in the field of deep learning and artificial intelligence. At present, deep neural network models, such as the end-to-end VGG-Transformer and Conformer models, often achieve the best results in speech recognition tasks, but further improvement of the results is usually accompanied by an increase in model complexity. The ever-increasing storage and computing requirements limit application and deployment in scenarios such as mobile terminals and embedded devices.

[0003] In order to reduce the storage and computing costs of deep speech recognition models, the fol...

Claims


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06N3/08; G06N3/04; G06F17/16
CPC: G06N3/082; G06F17/16; G06N3/045
Inventors: 杨周旺, 胡云鹤, 王星宇, 杜叶倩
Owner: UNIV OF SCI & TECH OF CHINA