A neural network position encoding method and system based on learnable power-law decay
By configuring a learnable power-law decay parameter for each attention head in the self-attention module, the method addresses the limited extrapolation capability of the self-attention mechanism on long sequences. This improves long-range dependency modeling and model adaptability, as well as the performance and stability of the Transformer model.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: JIANGXI QIANAN ELECTRONIC TECH CO LTD
- Filing Date: 2026-01-16
- Publication Date: 2026-06-19
AI Technical Summary
Existing self-attention mechanisms extrapolate poorly to sequences longer than those seen in training, which degrades model performance. Existing positional encoding methods also lack adaptability and flexibility when modeling long-range dependencies.
The method is a neural network position encoding based on learnable power-law decay. Each attention head in the self-attention module is assigned its own learnable decay parameter, and these parameters are updated during training. The position bias matrix is then combined with the Softmax function to compute the final attention weights, which enhances the model's length extrapolation capability and adaptability.
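The summary does not give the exact form of the bias, so the following is only a minimal sketch under stated assumptions: the bias for query position i and key position j is taken to be -alpha * log(1 + |i - j|), added to the raw attention scores before Softmax, so that the unnormalized weight falls off as (1 + |i - j|)^(-alpha). The function names, the bidirectional (non-causal) distance, and the example exponents are hypothetical. In a real model each `alpha` would be a trainable parameter updated by the optimizer; here it is a plain float to show the forward computation only.

```python
import math

def power_law_bias(seq_len, alpha):
    # Assumed form (not taken from the patent text): b[i][j] = -alpha * log(1 + |i - j|)
    return [[-alpha * math.log1p(abs(i - j)) for j in range(seq_len)]
            for i in range(seq_len)]

def softmax(row):
    # Numerically stable softmax over one row of scores
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(scores, alpha):
    # Add the head's position bias to its raw scores, then normalize each row
    n = len(scores)
    bias = power_law_bias(n, alpha)
    return [softmax([scores[i][j] + bias[i][j] for j in range(n)])
            for i in range(n)]

# Hypothetical decay exponents, one per attention head
head_alphas = [0.5, 2.0]
# Uniform content scores isolate the effect of the position bias
scores = [[0.0] * 4 for _ in range(4)]
weights_per_head = [attention_weights(scores, a) for a in head_alphas]
```

With uniform content scores, the head with the larger exponent concentrates its weight on nearby positions, while the head with the smaller exponent keeps more weight on distant ones. This is one way per-head decay parameters could let different heads specialize in short-range or long-range context.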
The method significantly improves model stability and performance on long sequences, better preserves key long-distance information, and enhances the model's expressive power and flexibility. It is also highly compatible with existing Transformer architectures and easy to integrate into them.

Figure CN121543673B_ABST