Task processing method and device, storage medium and program product
By identifying shared prefixes among query statements in a RAG system and applying a dynamic scheduling strategy, the method avoids wasted computation and makes better use of GPU memory and compute resources, improving the system's computational efficiency and stability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- INSPUR SUZHOU INTELLIGENT TECH CO LTD
- Filing Date
- 2024-12-20
- Publication Date
- 2026-06-23
AI Technical Summary
Existing RAG systems waste computing resources. This is especially true in Corrective Retrieval-Augmented Generation (CRAG), where document scoring tasks for the same or similar questions are assigned indiscriminately to multiple GPU nodes, so the same computation is executed repeatedly.
By identifying the target shared prefix in the query statements and applying a dynamic scheduling strategy based on execution duration, multiple tasks can share key-value (KV) cache resources on a GPU node during execution. This reduces the need for each node to maintain an independent KV cache and avoids redundant computation.
The method reduces GPU memory usage and computational load, avoids resource waste, improves the computational efficiency and resource utilization of the GPU inference framework, and maintains load balancing among tasks and overall system stability.
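The grouping and scheduling idea described in this summary can be illustrated with a minimal Python sketch. It is not the patented implementation; the summary does not give that level of detail. The sketch assumes that each scoring prompt consists of a common head (instruction plus question) followed by a per-document part introduced by a hypothetical `"\nDocument:"` delimiter, and that each task carries a hypothetical `est_seconds` estimate of its execution duration. Tasks with the same head are kept together on one node, where a prefix-caching inference engine could reuse the KV cache for that head, and groups are placed on the least-loaded node, longest group first.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    prompt: str         # full query statement sent to the scoring model
    est_seconds: float  # hypothetical estimate of execution duration


def shared_prefix_key(prompt: str, delimiter: str = "\nDocument:") -> str:
    """Return the prompt text before the per-document part.

    The delimiter is an assumption made for this sketch. If it is absent,
    the whole prompt is returned, so the task forms its own group.
    """
    head, _, _ = prompt.partition(delimiter)
    return head


def schedule(tasks: list[Task], num_nodes: int) -> dict[int, list[Task]]:
    """Assign tasks to nodes so that tasks sharing a prefix stay together."""
    # Group tasks whose prompts begin with the same shared prefix.
    groups: dict[str, list[Task]] = defaultdict(list)
    for task in tasks:
        groups[shared_prefix_key(task.prompt)].append(task)

    # Min-heap of (accumulated estimated seconds, node id).
    heap = [(0.0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment: dict[int, list[Task]] = {node: [] for node in range(num_nodes)}

    # Place the longest-running groups first, each on the least-loaded node.
    def group_cost(group: list[Task]) -> float:
        return sum(t.est_seconds for t in group)

    for group in sorted(groups.values(), key=group_cost, reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].extend(group)
        heapq.heappush(heap, (load + group_cost(group), node))
    return assignment
```

Keeping a whole prefix group on one node trades some balancing freedom for cache reuse: a very large group cannot be split in this sketch, so a production scheduler would likely need a rule for dividing oversized groups.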
Smart Images

Figure CN119759529B_ABST