Low-bit quantization method and system for large language model

By applying a two-stage rotation transformation to the activation data matrix of a large language model and a low-rank decomposition to its weights, the method addresses the accuracy loss caused by massive activations and uneven weight distribution in ultra-low-bit quantization. It achieves efficient model compression with improved accuracy and is suitable for edge-device and cloud deployment.

CN122197988A · Pending · Publication Date: 2026-06-12 · BEIJING SILICONFLOW TECHNOLOGY CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: BEIJING SILICONFLOW TECHNOLOGY CO LTD
Filing Date: 2026-03-12
Publication Date: 2026-06-12

Application Information

Patent Timeline

12 Mar 2026: Application
12 Jun 2026: Publication (CN122197988A)

IPC: G06N3/0495; G06N3/045; G06N3/0499; G06N5/04; G06F18/25

AI Tagging

Application Domain

Biological models; Inference methods

AI Technical Summary

Technical Problem

Large language models suffer severe accuracy loss under ultra-low-bit quantization because of massive activations (a few activation values far larger than the rest) and uneven weight distribution. Existing methods do not effectively correct these data distributions inside the model, so accuracy drops catastrophically after quantization.
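To make the problem concrete, the following is a minimal sketch, not taken from the patent, of how a single massive activation degrades 4-bit quantization. It assumes a plain symmetric uniform quantizer with one scale per tensor; the function names, the outlier value of 500 and the vector length are all hypothetical choices for illustration.

```python
import numpy as np

def quantize_symmetric(x, bits=4):
    """Round x onto a symmetric uniform grid (assumed quantizer, one scale per tensor)."""
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit
    scale = np.max(np.abs(x)) / qmax      # grid step is set by the largest magnitude
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
activations = rng.standard_normal(1024)   # ordinary activation values
with_outlier = activations.copy()
with_outlier[0] = 500.0                   # hypothetical massive activation

def mse_on_ordinary_values(x):
    """Quantization error measured on every entry except index 0."""
    err = (quantize_symmetric(x) - x)[1:]
    return float(np.mean(err ** 2))

mse_clean = mse_on_ordinary_values(activations)
mse_outlier = mse_on_ordinary_values(with_outlier)
print(f"MSE without outlier: {mse_clean:.4f}")
print(f"MSE with outlier:    {mse_outlier:.4f}")
```

In this sketch the outlier stretches the grid step to about 71, so every ordinary value rounds to zero and its information is lost. This is the kind of distribution problem the method aims to reduce before quantizing.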

Method used

A two-stage rotation transformation is optimized on the activation data matrix of the large language model: uniform preprocessing followed by data-driven fine optimization, which together reduce the grid-to-standard-deviation ratio of the activations. This is combined with low-rank decomposition and row-level fine-tuning to optimize the weight and residual distributions, and low-bit quantization is performed as the final step.
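The effect of a rotation on activation statistics can be sketched as follows. This is an illustration under stated assumptions, not the patent's procedure: a normalized Hadamard matrix stands in for the uniform first-stage rotation (the source does not say which matrix is used), the data-driven second stage is not reproduced, and the grid-to-standard-deviation ratio is assumed here to mean the largest absolute value divided by the standard deviation. All names and sizes are hypothetical.

```python
import numpy as np

def hadamard(n):
    """Normalized Hadamard matrix via the Sylvester construction (n must be a power of 2)."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)                  # orthogonal: h @ h.T == identity

def range_to_std_ratio(x):
    """Assumed metric: largest magnitude divided by the standard deviation."""
    return float(np.max(np.abs(x)) / np.std(x))

rng = np.random.default_rng(0)
tokens, dim = 128, 64
X = rng.standard_normal((tokens, dim))     # activation matrix (tokens x channels)
X[5, 3] = 1000.0                           # hypothetical massive activation
W = rng.standard_normal((dim, dim))        # weight matrix of the following linear layer

R = hadamard(dim)
X_rot = X @ R                              # rotate the activations
W_rot = R.T @ W                            # apply the same rotation to the weights

ratio_before = range_to_std_ratio(X)
ratio_after = range_to_std_ratio(X_rot)
print(f"ratio before rotation: {ratio_before:.1f}")
print(f"ratio after rotation:  {ratio_after:.1f}")
```

Because R is orthogonal, `X_rot @ W_rot` equals `X @ W` up to floating-point error, so the layer output is unchanged while the outlier's energy is spread across all channels and the ratio drops.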

Benefits of technology

It significantly improves the accuracy of large language models in ultra-low-bit scenarios, compresses model storage, improves inference efficiency, and is suitable for resource-constrained edge devices and cloud deployments.

✦ Generated by Eureka AI based on patent content.

Smart Images

Figure CN122197988A_ABST

Patent Text Reader

Abstract

The application belongs to the technical field of large language models and relates to a low-bit quantization method and system for a large language model. A two-stage rotation transformation, consisting of uniform preprocessing and data-driven fine optimization, is applied to the activation data of the model to reduce the grid-to-standard-deviation ratio of the activation values, so that their distribution better fits the quantization grid. The rotation matrix is then applied to the weights, and low-rank decomposition and residual distribution processing are performed on the weights to separate a low-rank part, which is retained at high precision, from a residual part to be quantized, which is optimized by row-level fine-tuning. Finally, low-bit quantization is performed on the residual to generate a deployable model. This effectively addresses the severe accuracy loss that large language models suffer in ultra-low-bit (such as W4A4) post-training quantization because of massive activations and unevenly distributed weights. Without model retraining, ultra-low-bit quantization accuracy, model storage, memory footprint and inference efficiency are significantly improved.
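The split into a high-precision low-rank part and a quantized residual can be sketched as follows. This is a minimal illustration under assumptions, not the claimed implementation: truncated SVD is assumed for the low-rank decomposition, a symmetric 4-bit quantizer with one scale per row is assumed for the residual, and the row-level fine-tuning step is not reproduced. The matrix sizes, the rank and the synthetic weight matrix are hypothetical.

```python
import numpy as np

def quantize_rows(m, bits=4):
    """Symmetric uniform quantization with one scale per row (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(m), axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)       # guard against all-zero rows
    return np.clip(np.round(m / scale), -qmax, qmax) * scale

def split_low_rank(w, rank):
    """Return (low_rank, residual) with low_rank the best rank-`rank` approximation."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return low_rank, w - low_rank

rng = np.random.default_rng(0)
# Synthetic weights: a dominant rank-4 component plus small noise (illustrative only).
W = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 64))
W += 0.1 * rng.standard_normal((64, 64))

low_rank, residual = split_low_rank(W, rank=4)
W_split = low_rank + quantize_rows(residual)       # low-rank kept in float, residual in 4-bit
W_direct = quantize_rows(W)                        # baseline: quantize W directly

err_split = float(np.linalg.norm(W - W_split))
err_direct = float(np.linalg.norm(W - W_direct))
print(f"error, low-rank + quantized residual: {err_split:.3f}")
print(f"error, direct 4-bit quantization:     {err_direct:.3f}")
```

On this synthetic matrix the residual has a much smaller dynamic range than the full weights, so its 4-bit grid is finer and the reconstruction error is lower. How large the gain is on real model weights depends on how much of their energy a low-rank component captures.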
