Method and apparatus for encoding weight parameters of a large language model

Applying a randomized Hadamard transform and a linearly constrained quantization grid mapping to the weight parameters of a large language model addresses a limitation of existing techniques, which cannot obtain the advantages of scalar quantization and vector quantization at the same time. The result is high quantization accuracy and computational efficiency at low bit widths, which suits edge devices and large-scale serving scenarios.

CN122242597A · Pending · Publication Date: 2026-06-19 · 启元实验室

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
启元实验室
Filing Date
2026-05-22
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing techniques cannot combine the advantages of scalar quantization and vector quantization, so models quantized at low bit widths suffer degraded accuracy and low computational efficiency.

Method used

A randomized Hadamard transform is applied to the weight parameters of the large language model, and the transformed weights are then mapped onto a grid composed of multiple discrete vectors. Because this quantization grid is linearly constrained, the method integrates scalar quantization with vector quantization. An affine transformation and LDL decomposition are used to optimize the quantization process.
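The transform step can be illustrated with a minimal NumPy sketch. It assumes the common formulation of a randomized Hadamard transform (random sign flips followed by a normalized Hadamard matrix, applied along a power-of-two input dimension); the patent text here does not spell out the exact variant, and the function names `hadamard`, `randomized_hadamard`, and `inverse_randomized_hadamard` are hypothetical.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Sylvester construction of an n x n Hadamard matrix (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def randomized_hadamard(W: np.ndarray, seed: int = 0):
    """Return (W * signs) @ H / sqrt(n) and the random sign vector.

    Assumption: the transform acts on the second (input) dimension of W.
    """
    n = W.shape[1]
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=n)   # random diagonal of +/-1
    H = hadamard(n) / np.sqrt(n)              # orthonormal: H @ H.T == I
    return (W * signs) @ H, signs

def inverse_randomized_hadamard(W_t: np.ndarray, signs: np.ndarray) -> np.ndarray:
    """Undo the transform: multiply by H.T, then flip the signs back."""
    n = W_t.shape[1]
    H = hadamard(n) / np.sqrt(n)
    return (W_t @ H.T) * signs

# Usage: the transform is orthogonal, so it is exactly invertible and
# preserves row norms while spreading outlier weights across coordinates.
W = np.random.default_rng(1).normal(size=(4, 8))
W_t, signs = randomized_hadamard(W)
W_back = inverse_randomized_hadamard(W_t, signs)
```

Because the transform is orthogonal, quantization error introduced in the transformed domain maps back to the original weights without amplification.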

Benefits of technology

The method maintains high quantization accuracy and computational efficiency at low bit widths, reduces storage requirements, and makes the model easier to deploy on edge devices and in large-scale serving scenarios.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122242597A_ABST

Abstract

This application proposes a method and apparatus for encoding the weight parameters of a large language model. The method includes: performing a randomized Hadamard transform on the weight parameters of the large language model; mapping the transformed weight parameters onto a grid composed of multiple discrete vectors to obtain the discrete vector corresponding to each transformed weight parameter; and using those discrete vectors to quantize and encode the transformed weight parameters, yielding integer vectors that correspond to the weight parameters. According to an example embodiment of this application, introducing a linearly constrained mapping grid integrates scalar quantization with vector quantization: the quantization process gains the flexibility of vector quantization while the computational structure keeps the simplicity of scalar quantization. Compared with scalar quantization, the method has more degrees of freedom and fits the weight distribution better; compared with vector quantization, it has a more regular structure.
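The mapping and encoding steps can be sketched as follows. This is one possible reading, not the patent's stated construction: it assumes that "linearly constrained" means every grid point has the affine form A z + b for an integer vector z, so that the stored code is z itself. The matrix `A`, offset `b`, block dimension, bit width, and the function names `build_grid`, `encode`, and `decode` are all hypothetical.

```python
import itertools
import numpy as np

def build_grid(A: np.ndarray, b: np.ndarray, bits: int):
    """Enumerate grid points A @ z + b for integer z in {0..2^bits-1}^d.

    Assumption: this affine form stands in for the linearly constrained grid.
    """
    d = A.shape[1]
    levels = range(2 ** bits)
    Z = np.array(list(itertools.product(levels, repeat=d)))  # (K, d) integer codes
    points = Z @ A.T + b                                     # (K, d) discrete vectors
    return Z, points

def encode(x_blocks: np.ndarray, Z: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Map each d-dimensional block to its nearest grid point; return integer codes."""
    dist2 = ((x_blocks[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return Z[dist2.argmin(axis=1)]

def decode(codes: np.ndarray, A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Reconstruct weights with a single affine map, as in scalar dequantization."""
    return codes @ A.T + b

# Usage: a non-diagonal A couples the coordinates (vector-quantization-like),
# yet decoding stays a plain matrix multiply plus offset.
A = np.array([[0.5, 0.1], [0.0, 0.5]])
b = np.array([-0.9, -0.75])
Z, points = build_grid(A, b, bits=2)
x = np.random.default_rng(0).normal(scale=0.5, size=(6, 2))
codes = encode(x, Z, points)
x_hat = decode(codes, A, b)
```

With a diagonal `A` this reduces to per-coordinate uniform scalar quantization. The exhaustive nearest-point search used here is only practical for small block dimensions and bit widths.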