Large language model text answering method incorporating draft answers and KV cache eviction

By incorporating draft answers into KV cache eviction, this large language model text answering method addresses the drop in answer quality that KV cache eviction causes in long-context scenarios, and generates answers more efficiently under a small cache budget.

CN120849565B · Active · Publication Date: 2026-06-26 · HARBIN INST OF TECH


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: HARBIN INST OF TECH
Filing Date: 2025-07-22
Publication Date: 2026-06-26

Application Information

Patent Timeline

22 Jul 2025: Application
26 Jun 2026: Publication (CN120849565B)

IPC: G06F16/3329; G06F40/289; G06N5/04

AI Tagging

Technology Topics

Engineering; Data mining

Technical Efficacy Phrases

Think comprehensively; Answer accurately

AI Technical Summary

Technical Problem

In long-context scenarios, existing key-value (KV) cache eviction methods fail to reflect the information in the context as a whole and are inconsistent with what the model actually focuses on, which degrades response quality.

Method used

The patent proposes a text answering method for large language models that combines draft answers with KV cache eviction. The long text sequence is segmented and encoded, and the query vectors at the end of the query vector set are retained. Attention scores are then calculated to decide which important key vectors and value vectors to retain, and autoregressive decoding is performed to generate a more accurate answer.
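The selection step described above can be illustrated with a minimal sketch. It is not the patented procedure: it assumes a single attention head, scores each cached key by the attention weight it receives from the last few query vectors, and keeps the top-scoring key/value pairs. The function name `select_kv` and the parameters `window` and `budget` are hypothetical, and the way the draft answer enters the scoring is not specified in this summary, so it is left out.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def select_kv(queries, keys, values, window, budget):
    """Keep the `budget` key/value pairs with the highest attention scores.

    Assumption: the score of a cached position is its attention weight
    summed over the last `window` query vectors (single head).
    """
    d = len(keys[0])
    scores = [0.0] * len(keys)
    for q in queries[-window:]:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        for i, w in enumerate(softmax(logits)):
            scores[i] += w
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:budget]
    keep = sorted(top)  # restore the original token order
    return keep, [keys[i] for i in keep], [values[i] for i in keep]

# Toy example: 4 cached positions, 2-dimensional vectors.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0], [40.0]]
queries = [[0.0, 0.0], [2.0, 1.0]]  # only the last query is used (window=1)

keep, k2, v2 = select_kv(queries, keys, values, window=1, budget=2)
print(keep)  # indices of the retained positions
```

Decoding would then attend only to the retained pairs (named `k2` and `v2` here for illustration), which is what makes the cache smaller.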

Benefits of technology

The method reduces KV cache usage at the same answer accuracy, generates more accurate answers, and saves the GPU memory the model uses to generate answers.
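To give a sense of the memory at stake, the sketch below applies the standard size estimate for a KV cache: two tensors (keys and values) per layer, each holding one vector per head per token. The model dimensions and token counts are hypothetical and do not come from the patent; `kv_cache_bytes` is an illustrative helper.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_value=2):
    """Estimate KV cache size in bytes for one sequence.

    The leading factor 2 counts keys and values; bytes_per_value=2
    assumes 16-bit storage (an assumption, not a value from the patent).
    """
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_value

# Hypothetical model: 32 layers, 32 KV heads, head dimension 128.
full = kv_cache_bytes(32, 32, 128, num_tokens=32768)    # full long context
evicted = kv_cache_bytes(32, 32, 128, num_tokens=1024)  # after eviction to 1024 tokens

print(full / 2**30, "GiB")     # 16.0 GiB
print(evicted / 2**30, "GiB")  # 0.5 GiB
```

Under these assumed dimensions, each cached token costs 0.5 MiB, so the retained token count directly sets the GPU memory used by the cache.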

✦ Generated by Eureka AI based on patent content.

Smart Images

Figure CN120849565B_ABST

Patent Text Reader

Abstract

The large language model text answering method that integrates draft answers with KV cache eviction belongs to the field of large language model text answer generation. It addresses the low answer quality of large language model answering methods built on existing KV cache eviction methods. Information from the draft answer is used so that the small portion of the KV cache that is retained (K2 and V2) holds the more important entries, and attention scores are introduced when obtaining the retained KV cache (K2 and V2) so that information is considered more comprehensively and the model generates more accurate answers. The application is mainly used for large language models answering questions about text.
