Large language model text answering method incorporating draft answers and KV cache eviction
By combining draft answers with KV cache eviction, this large language model text answering method addresses the loss of answer quality that KV cache eviction causes in long-context scenarios, and generates answers more efficiently under a small cache budget.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- HARBIN INST OF TECH
- Filing Date
- 2025-07-22
- Publication Date
- 2026-06-26
AI Technical Summary
In long-context scenarios, existing key-value (KV) cache eviction methods do not reflect the overall information in the context and are inconsistent with what the model actually attends to, which degrades answer quality.
The invention proposes a large language model text answering method that incorporates draft answers and KV cache eviction. The long text sequence is segmented and encoded, and the query vectors at the end of the query vector set are retained. Attention scores computed with these retained query vectors determine which key vectors and value vectors are important enough to keep, and autoregressive decoding over the retained entries then generates more accurate answers.
The method generates more accurate answers and, at the same answer accuracy, reduces KV cache usage, which saves the GPU memory the model needs to generate answers.
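The eviction step summarized above can be illustrated with a minimal single-head sketch. It is not the patented procedure: the function name `select_kv`, the parameters `window` and `budget`, and the choice to sum softmax attention weights over the trailing queries are all assumptions made for illustration. The sketch scores each cached position by how much attention the last `window` query vectors give it, then keeps the `budget` highest-scoring key and value vectors in their original order.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_kv(queries, keys, values, window, budget):
    """Keep the `budget` cache positions most attended by the trailing queries.

    Hypothetical helper: `window` is how many query vectors at the end of the
    query set are retained for scoring; `budget` is the cache size after eviction.
    """
    if not 0 < budget <= len(keys):
        raise ValueError("budget must be between 1 and the number of cached keys")
    if window < 1:
        raise ValueError("window must be at least 1")
    scale = 1.0 / math.sqrt(len(keys[0]))
    scores = [0.0] * len(keys)
    # Accumulate the attention weight each key receives from the trailing queries.
    for q in queries[-window:]:
        probs = softmax([dot(q, k) * scale for k in keys])
        for i, p in enumerate(probs):
            scores[i] += p
    # Take the top-`budget` positions, then restore sequence order.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:budget]
    kept = sorted(top)
    return kept, [keys[i] for i in kept], [values[i] for i in kept]


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0], [40.0]]
queries = [[0.0, 1.0], [1.0, 2.0], [1.0, 3.0]]

kept, kept_keys, kept_values = select_kv(queries, keys, values, window=2, budget=2)
print(kept)         # [1, 2]
print(kept_values)  # [[20.0], [30.0]]
```

In a real model this selection would run per attention head and per layer on the cached tensors, and autoregressive decoding would then attend only to the retained entries. How the draft answer enters the scoring is not detailed in the summary, so the sketch leaves it out.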
Figure CN120849565B_ABST