A neural network position encoding method and system based on learnable power-law decay

By configuring a learnable power-law decay parameter for each attention head in the self-attention module, the method addresses the insufficient extrapolation capability of self-attention on long sequences, improving long-range dependency modeling, model adaptability, and the performance and stability of Transformer models.
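
The summary does not state the decay formula. As an illustrative assumption only, one way to obtain power-law decay is to add to each attention score s_ij a bias proportional to the log of the relative distance, with a learnable, non-negative slope α_h for head h:

```latex
b_h(i,j) = -\alpha_h \log\bigl(1 + |i-j|\bigr), \qquad
a_h(i,j) = \frac{\exp\bigl(s_{ij} + b_h(i,j)\bigr)}{\sum_{k}\exp\bigl(s_{ik} + b_h(i,k)\bigr)}
         = \frac{e^{s_{ij}}\,(1+|i-j|)^{-\alpha_h}}{\sum_{k} e^{s_{ik}}\,(1+|i-k|)^{-\alpha_h}}
```

Under this assumed form, exp(b_h(i,j)) = (1 + |i - j|)^(-α_h), so distant keys are down-weighted polynomially rather than exponentially, and the bias is defined for any distance, including distances longer than those seen in training.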

CN121543673B · Active · Publication Date: 2026-06-19 · JIANGXI QIANAN ELECTRONIC TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
JIANGXI QIANAN ELECTRONIC TECH CO LTD
Filing Date
2026-01-16
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing self-attention mechanisms lack extrapolation capabilities when dealing with sequences longer than those used in training, leading to a decline in model performance. Furthermore, existing positional encoding methods lack adaptability and flexibility when modeling long-range dependencies.

Method used

We employ a neural network position encoding method based on learnable power-law decay. A learnable decay parameter is configured for each attention head in the self-attention module and updated during training. Each head's decay parameter defines a positional bias matrix, which is combined with the attention score matrix to give corrected scores; the Softmax function is then applied to the corrected scores to compute the final attention weights. This enhances the model's length extrapolation capability and adaptability.
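
The claim that the decay parameter is "updated during training" can be illustrated with a toy sketch in plain Python. Everything here is an assumption for illustration, not the patented procedure: the bias form -alpha * log(1 + |i - j|), the one-row toy loss (negative log of the weight on a chosen target position), plain gradient descent, and clamping alpha at zero. The helper names `log_distance`, `attention_row`, and `train_alpha` are hypothetical.

```python
import math
from typing import List, Sequence


def log_distance(i: int, j: int) -> float:
    # Assumed power-law form: the bias is -alpha * log(1 + |i - j|),
    # so exp(bias) = (1 + |i - j|) ** (-alpha).
    return math.log1p(abs(i - j))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_row(scores: Sequence[float], query_pos: int, alpha: float) -> List[float]:
    """Attention weights of one query over all key positions for one head."""
    corrected = [s - alpha * log_distance(query_pos, j) for j, s in enumerate(scores)]
    return softmax(corrected)


def train_alpha(scores: Sequence[float], query_pos: int, target_pos: int,
                alpha: float = 1.0, lr: float = 0.5, steps: int = 50) -> float:
    """Plain gradient descent on one head's decay slope (toy objective).

    Loss = -log(weight on target_pos). With z_j = s_j - alpha * l_j,
    dLoss/dalpha = l_target - sum_j w_j * l_j.
    """
    dists = [log_distance(query_pos, j) for j in range(len(scores))]
    for _ in range(steps):
        w = attention_row(scores, query_pos, alpha)
        grad = dists[target_pos] - sum(wj * lj for wj, lj in zip(w, dists))
        # Clamp at zero so the bias never rewards distance (an assumption).
        alpha = max(0.0, alpha - lr * grad)
    return alpha


# A distant target pushes the slope down (weaker decay);
# a nearby target pushes it up (stronger decay).
far = train_alpha([0.0] * 8, query_pos=0, target_pos=7)
near = train_alpha([0.0] * 8, query_pos=0, target_pos=0)
print(far, near > 1.0)  # prints: 0.0 True
```

The gradient has a simple reading: it is the target's log-distance minus the attention-weighted average log-distance, so the slope shrinks when useful information sits farther away than the head currently attends, and grows otherwise.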

Benefits of technology

It significantly improves the stability and performance of the model in long-sequence processing, better preserves key long-distance information, enhances the model's expressive power and flexibility, and is compatible with, and easy to integrate into, existing Transformer architectures.

✦ Generated by Eureka AI based on patent content.

Figure CN121543673B_ABST

Abstract

This invention provides a neural network positional encoding method and system based on learnable power-law decay. The method comprises: acquiring each query vector and key vector of the input sequence in the self-attention module; processing the query and key vectors with a positional encoding method; calculating initial attention scores from the processed query and key vectors and forming an attention score matrix from them; configuring a learnable decay parameter for each attention head in the self-attention module, the decay parameter being a parameter of the neural network model that is updated during training by an optimization algorithm; calculating a positional bias matrix for each attention head from its learnable decay parameter; determining a corrected attention score matrix from the attention score matrix and the positional bias matrix; and applying the Softmax function to the corrected attention score matrix to obtain the final attention weights.
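
The abstract's steps can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not specify the positional encoding method, so `identity_encoding` is a hypothetical placeholder; the scaled dot-product scores, the bias form -alpha_h * log(1 + |i - j|), and combining scores and bias by addition are also assumptions. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def identity_encoding(vecs: Sequence[Sequence[float]]) -> Matrix:
    # Hypothetical placeholder for the abstract's unspecified
    # "positional encoding method" applied to query and key vectors.
    return [list(v) for v in vecs]


def score_matrix(queries: Matrix, keys: Matrix) -> Matrix:
    # Initial attention scores: scaled dot products (scaling is an assumption).
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [[scale * sum(a * b for a, b in zip(q, k)) for k in keys] for q in queries]


def power_law_bias(n_q: int, n_k: int, alpha: float) -> Matrix:
    # Assumed bias form: -alpha * log(1 + |i - j|).
    return [[-alpha * math.log1p(abs(i - j)) for j in range(n_k)] for i in range(n_q)]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        mx = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - mx) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def multi_head_weights(
    q_heads: Sequence[Sequence[Sequence[float]]],
    k_heads: Sequence[Sequence[Sequence[float]]],
    alphas: Sequence[float],
    encode: Callable[[Sequence[Sequence[float]]], Matrix] = identity_encoding,
) -> List[Matrix]:
    """Final attention weights for each head, following the abstract's steps."""
    weights = []
    for q, k, alpha in zip(q_heads, k_heads, alphas):
        q_enc, k_enc = encode(q), encode(k)                   # positional encoding step
        scores = score_matrix(q_enc, k_enc)                   # attention score matrix
        bias = power_law_bias(len(q_enc), len(k_enc), alpha)  # per-head bias matrix
        corrected = [[s + b for s, b in zip(sr, br)] for sr, br in zip(scores, bias)]
        weights.append(softmax_rows(corrected))               # final attention weights
    return weights


# Two heads over four tokens with identical vectors, so only the bias differs.
x = [[1.0, 0.0]] * 4
w = multi_head_weights([x, x], [x, x], alphas=[0.0, 2.0])
print([round(v, 3) for v in w[0][0]])  # head with alpha = 0: uniform weights
print([round(v, 3) for v in w[1][0]])  # head with alpha = 2: power-law decay
```

With alpha = 2, the first query's weight on a key at distance 0 is exactly four times its weight at distance 1, since (1 + 0)^(-2) / (1 + 1)^(-2) = 4; each head can learn a different slope, so some heads stay local while others keep long-range context.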