Task processing method and device, storage medium and program product

By identifying shared prefixes in query statements within a RAG system and applying a dynamic scheduling strategy, the method addresses the problem of wasted computing resources and optimizes the use of GPU memory and compute, thereby improving the system's computational efficiency and stability.

CN119759529B · Active · Publication Date: 2026-06-23 · INSPUR SUZHOU INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: INSPUR SUZHOU INTELLIGENT TECH CO LTD
Filing Date: 2024-12-20
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Existing RAG systems waste computing resources, particularly in Corrective Retrieval-Augmented Generation (CRAG), where document-scoring tasks for the same or similar questions are assigned indiscriminately to multiple GPU nodes, so the same computation is executed repeatedly.

Method used

The method identifies the target shared prefix in the query statement and applies a dynamic scheduling strategy based on execution duration, so that multiple tasks share key-value resources on the same GPU node during execution. This reduces the need for each node to maintain an independent KV cache and avoids redundant computation.
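
The shared-prefix routing idea can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `shared_prefix_len` and `route_by_prefix`, the whitespace tokenization, and the `min_shared` threshold are all assumptions made for illustration. Tasks whose prompts share a long enough leading token run are placed in the same queue, and therefore on the same node.

```python
from typing import List, Sequence


def shared_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Return how many leading tokens a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route_by_prefix(
    queues: List[List[Sequence[str]]],
    task: Sequence[str],
    min_shared: int,
) -> int:
    """Place task in the queue whose first task shares the longest prefix.

    A new queue is opened when no existing queue shares at least
    min_shared tokens (a hypothetical threshold). Returns the queue index.
    """
    best, best_len = -1, 0
    for i, queue in enumerate(queues):
        n = shared_prefix_len(queue[0], task)
        if n > best_len:
            best, best_len = i, n
    if best >= 0 and best_len >= min_shared:
        queues[best].append(task)
        return best
    queues.append([task])
    return len(queues) - 1


# Two scoring prompts for the same question differ only in the document.
queues: List[List[Sequence[str]]] = []
t1 = "grade doc for question A : doc1".split()
t2 = "grade doc for question A : doc2".split()
t3 = "summarize this text".split()
for t in (t1, t2, t3):
    route_by_prefix(queues, t, min_shared=4)
```

Here `t1` and `t2` land in the same queue because they share six leading tokens, while `t3` opens a queue of its own. A production system would compare tokenizer IDs rather than whitespace-split words.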

Benefits of technology

It significantly reduces memory usage and computational load, avoids resource waste, improves the computational efficiency and resource utilization of the GPU inference framework, and ensures load balancing and system stability among tasks.

✦ Generated by Eureka AI based on patent content.

Figure CN119759529B_ABST

Abstract

Embodiments of the present application provide a task processing method and device, a storage medium and a program product, and relate to the field of computers. The method comprises: when a first task is added to a first queue, sending the first task to a first node matched with the first queue, the first node storing a first key-value resource matched with the first task in a first storage space and processing the first task based on the first key-value resource; when the execution duration of the first task reaches a first predetermined duration, adding a second task matched with the first task to the first queue; and when the execution duration of the first task is determined to reach a second predetermined duration, sending the second task to the first node, the first node storing a second key-value resource matched with the second task in the first storage space and processing the second task based on a target resource in the second key-value resource and on the first key-value resource. This scheme solves the technical problem of wasted computing resources.
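
The two-threshold flow in the abstract can be sketched as a small tick-based simulation. This is an illustration under stated assumptions, not the claimed method: the `Node` class, the `run` function, the integer ticks, and the reading that the second task reuses the first task's prefix entries and computes only its own non-shared entries are all hypothetical. The sketch also assumes the first predetermined duration does not exceed the second.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical GPU node with a single storage space of KV entries."""
    storage: Dict[Tuple[str, ...], str] = field(default_factory=dict)
    computed: int = 0  # KV entries that had to be computed, not reused

    def process(self, tokens: List[str]) -> int:
        """Store one KV entry per prefix position; return how many were reused."""
        reused = 0
        for i in range(1, len(tokens) + 1):
            key = tuple(tokens[:i])
            if key in self.storage:
                reused += 1
            else:
                self.storage[key] = f"kv({tokens[i - 1]})"
                self.computed += 1
        return reused


def run(first: List[str], second: List[str],
        t_enqueue: int, t_dispatch: int, total_ticks: int):
    """Simulate: dispatch first at tick 0, enqueue second at t_enqueue,
    dispatch second to the same node at t_dispatch."""
    if t_enqueue > t_dispatch:
        raise ValueError("assumed: first duration <= second duration")
    node = Node()
    queue = [first]
    events = [(0, "dispatch first")]
    node.process(first)
    reused: Optional[int] = None
    for elapsed in range(1, total_ticks + 1):
        if elapsed == t_enqueue:
            queue.append(second)
            events.append((elapsed, "enqueue second"))
        if elapsed == t_dispatch:
            reused = node.process(queue[1])
            events.append((elapsed, "dispatch second"))
    return events, reused, node.computed


events, reused, computed = run(
    ["sys", "question", "docA"], ["sys", "question", "docB"],
    t_enqueue=2, t_dispatch=3, total_ticks=5,
)
```

Because both tasks run on the same node and share the two-token prefix, the second task reuses two stored entries and only its final entry is computed. Sending the second task to a different node would have required computing all three.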