A neural network position encoding method and system based on learnable power-law decay

Configuring a learnable power-law decay parameter for each attention head in the self-attention module addresses the self-attention mechanism's weak extrapolation to long sequences. This improves long-range dependency modeling and model adaptability, and with them the performance and stability of the Transformer model.

CN121543673B · Active · Publication Date: 2026-06-19 · JIANGXI QIANAN ELECTRONIC TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: JIANGXI QIANAN ELECTRONIC TECH CO LTD
Filing Date: 2026-01-16
Publication Date: 2026-06-19

IPC: G06N3/10; G06N3/084; G06N3/045

AI Tagging

Application Domain

Computer simulations

AI Technical Summary

Technical Problem

Existing self-attention mechanisms lack extrapolation capabilities when dealing with sequences longer than those used in training, leading to a decline in model performance. Furthermore, existing positional encoding methods lack adaptability and flexibility when modeling long-range dependencies.

Method used

We employ a neural network position encoding method based on learnable power-law decay. By configuring learnable decay parameters for each attention head in the self-attention module and updating these parameters during training, we combine the position bias matrix and the Softmax function to calculate the final attention weights, thereby enhancing the model's length extrapolation capability and adaptability.
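The summary does not state the formula for the position bias, so the following is only one plausible reading, written out as an assumption. Here $S_{ij}$ is the initial attention score between positions $i$ and $j$, $\alpha_h \ge 0$ is the learnable decay parameter of head $h$, and the logarithmic form of $B^{(h)}_{ij}$ is hypothetical. Under that assumption, adding the bias before Softmax multiplies each unnormalized weight by a power of the distance, which is what "power-law decay" would mean:

```latex
B^{(h)}_{ij} = -\alpha_h \,\log\bigl(1 + |i - j|\bigr)
\qquad \text{(assumed form of the position bias)}

A^{(h)}_{ij}
  = \frac{\exp\bigl(S_{ij} + B^{(h)}_{ij}\bigr)}
         {\sum_{k} \exp\bigl(S_{ik} + B^{(h)}_{ik}\bigr)}
  = \frac{e^{S_{ij}} \,\bigl(1 + |i - j|\bigr)^{-\alpha_h}}
         {\sum_{k} e^{S_{ik}} \,\bigl(1 + |i - k|\bigr)^{-\alpha_h}}
```

With $\alpha_h = 0$ the bias vanishes and the head reduces to ordinary Softmax attention; larger $\alpha_h$ makes the head favor nearby positions more strongly, while the polynomial tail decays more slowly than an exponential one would.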

Benefits of technology

It significantly improves the stability and performance of the model in long sequence processing, better preserves key long-distance information, enhances the model's expressive power and flexibility, and is highly compatible and easy to integrate into existing Transformer architectures.

✦ Generated by Eureka AI based on patent content.

Smart Images

Figure CN121543673B_ABST

Patent Text Reader

Abstract

This invention provides a neural network positional encoding method and system based on learnable power-law decay. Within the self-attention module, the method acquires each query vector and key vector of the input sequence and processes them with a positional encoding method. It then calculates initial attention scores from the processed query and key vectors and assembles them into an attention score matrix. Each attention head in the self-attention module is given a learnable decay parameter; this parameter is part of the neural network model and is updated during training by an optimization algorithm. From this parameter, a positional bias matrix is calculated for each attention head. A corrected attention score matrix is determined from the attention score matrix and the positional bias matrix, and the Softmax function is applied to it to obtain the final attention weights.
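The last three steps of the abstract (bias matrix, corrected scores, Softmax) can be sketched for a single head in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract gives no bias formula, so the logarithmic bias `-alpha * log(1 + |i - j|)` and the choice to add it to the scores are assumptions, and the names `power_law_bias`, `softmax`, and `attention_weights` are hypothetical. The initial score matrix is taken as given, since the positional encoding applied to the query and key vectors is not specified.

```python
import math


def power_law_bias(seq_len, alpha):
    """Positional bias matrix for one head.

    Assumed form (not given in the abstract):
    B[i][j] = -alpha * log(1 + |i - j|), so that exp(B[i][j]) = (1 + |i - j|) ** -alpha.
    """
    return [[-alpha * math.log1p(abs(i - j)) for j in range(seq_len)]
            for i in range(seq_len)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(scores, alpha):
    """Turn a square matrix of initial attention scores into final weights.

    Assumption: the "corrected" score matrix is scores + bias (element-wise).
    """
    bias = power_law_bias(len(scores), alpha)
    corrected = [[s + b for s, b in zip(s_row, b_row)]
                 for s_row, b_row in zip(scores, bias)]
    return [softmax(row) for row in corrected]


# With all-zero scores the weights depend only on distance.
flat_scores = [[0.0] * 3 for _ in range(3)]
weights = attention_weights(flat_scores, alpha=1.0)
# The middle row is approximately [0.25, 0.5, 0.25]: each neighbor at
# distance 1 gets half the unnormalized weight of the position itself.
```

Here `alpha` is a plain float for readability. In a training framework it would instead be one trainable scalar per attention head, updated by the optimizer together with the other model weights, as the abstract describes.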