Instruction cache system and instruction acquisition method

Introducing a shared instruction cache into the GPU removes the pipeline stalls caused by giving each computing unit its own dedicated instruction cache, which improves instruction processing efficiency and computational throughput.

CN122285083A · Pending · Publication Date: 2026-06-26 · SUZHOU YIZHU INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
SUZHOU YIZHU INTELLIGENT TECH CO LTD
Filing Date
2026-03-31
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

In existing GPU architectures, each computing unit has its own dedicated instruction cache. When an instruction access misses, subsequent requests must wait behind it, which stalls the pipeline and lowers instruction cache utilization and computing throughput.

Method used

Multiple computing units share a single instruction cache. The cache preloads instructions from memory and handles requests out of order: when one request misses, it continues to process and respond to other requests instead of waiting for the miss to resolve.

Benefits of technology

The shared cache improves instruction processing efficiency, eliminates pipeline stalls, and raises overall instruction access efficiency and GPU computing throughput.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122285083A_ABST

Abstract

This disclosure proposes an instruction cache system and an instruction fetching method. The system includes multiple computing units, each including at least one execution unit; a shared instruction cache, coupled to the multiple computing units, for caching instructions and processing concurrent instruction access requests from the multiple computing units; and a first memory, coupled to the shared instruction cache, for storing the instructions executed by the multiple computing units. The shared instruction cache also preloads instructions from the first memory and processes instruction access requests from the multiple computing units out of order. When a first instruction access request misses, the shared instruction cache sends an instruction fetch request to the first memory while simultaneously processing a second instruction access request and returning the corresponding instruction response data. By having multiple computing units share a single instruction cache, this disclosure reduces storage resource waste, improves instruction processing efficiency, eliminates pipeline stalls, and enhances overall instruction access efficiency and GPU computational throughput.
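
The out-of-order behaviour described in the abstract can be illustrated with a small software model. The following Python sketch is not the patented hardware design; it is a toy simulation under stated assumptions. The names `SharedInstructionCache`, `Request`, `MISS_LATENCY`, and `demo` are hypothetical, as are the fixed four-cycle memory latency, the one-request-per-cycle service rate, and the absence of any replacement policy or merging of misses to the same address. It only shows the idea that a miss from one computing unit is sent to the first memory while later requests that hit are answered first.

```python
from collections import deque
from dataclasses import dataclass

MISS_LATENCY = 4  # assumed number of cycles for the first memory to answer a fetch


@dataclass(frozen=True)
class Request:
    cu_id: int    # computing unit that issued the request
    address: int  # instruction address being requested


class SharedInstructionCache:
    """Toy model of a non-blocking instruction cache shared by several computing units."""

    def __init__(self, memory, preload=()):
        self.memory = memory  # stands in for the "first memory": address -> instruction
        # Preloaded instructions are resident before any request arrives.
        self.lines = {addr: memory[addr] for addr in preload}
        self.queue = deque()  # access requests waiting to be looked up
        self.in_flight = []   # (ready_cycle, request) for outstanding misses
        self.cycle = 0

    def submit(self, request):
        self.queue.append(request)

    def step(self):
        """Advance one cycle and return responses as (cycle, cu_id, address, instruction)."""
        self.cycle += 1
        responses = []

        # Complete any fetches whose assumed memory latency has elapsed.
        pending = []
        for ready_cycle, req in self.in_flight:
            if ready_cycle <= self.cycle:
                instruction = self.memory[req.address]
                self.lines[req.address] = instruction
                responses.append((self.cycle, req.cu_id, req.address, instruction))
            else:
                pending.append((ready_cycle, req))
        self.in_flight = pending

        # Look up one queued request per cycle (an assumption of this sketch).
        if self.queue:
            req = self.queue.popleft()
            if req.address in self.lines:
                # Hit: respond immediately, even if earlier misses are outstanding.
                responses.append((self.cycle, req.cu_id, req.address, self.lines[req.address]))
            else:
                # Miss: send a fetch to memory and keep serving other requests.
                self.in_flight.append((self.cycle + MISS_LATENCY, req))
        return responses

    def run(self):
        responses = []
        while self.queue or self.in_flight:
            responses.extend(self.step())
        return responses


def demo():
    memory = {0x00: "ADD", 0x04: "MUL", 0x08: "LD"}
    cache = SharedInstructionCache(memory, preload=(0x04, 0x08))
    cache.submit(Request(cu_id=0, address=0x00))  # first request: misses
    cache.submit(Request(cu_id=1, address=0x04))  # second request: hits
    cache.submit(Request(cu_id=2, address=0x08))  # third request: hits
    return cache.run()


if __name__ == "__main__":
    for response in demo():
        print(response)
```

In this sketch the requests from computing units 1 and 2 are answered in cycles 2 and 3, while the miss from computing unit 0 completes in cycle 5. A blocking cache with the same assumed latency would have held both hits behind the miss.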