
Method and device for accessing fine-grained cache by GPU (Graphics Processing Unit) multi-granularity memory access request

A memory-access-request, fine-grained cache technology, applied in the field of cache architecture, which can solve the problems of an increased number of memory access requests in the storage system, increased processing delay of memory access instructions, and reduced GPU performance, achieving the effects of avoiding processing delay, reducing premature eviction, and increasing processing speed.

Pending Publication Date: 2022-06-24
CIVIL AVIATION UNIV OF CHINA

AI Technical Summary

Problems solved by technology

[0005] However, this cache structure requires the GPU to use the fine-grained cache line size as the merge window when coalescing memory access requests, so regular memory access instructions generate more fine-grained memory access requests. This not only increases the processing delay of memory access instructions but also increases the number of memory access requests in the storage system, ultimately reducing GPU performance. [6]



Examples


Embodiment 1

[0043] The GPU places the generated memory access requests into a FIFO (first-in, first-out) queue. If the memory access request popped from the FIFO is 32 bytes, the corresponding cache group number is obtained from the request's address information, and the cache column to be accessed by the request is then determined from that group number. As shown in Figure 1, assuming the group number is 5, the request will access cache column 1.
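
As a minimal illustration of this mapping, the sketch below reduces a 32-byte request's address to a cache group number and then to a cache column. The 32-byte line size and the group-to-column examples come from the embodiments; the group count, column count, and modulo mapping are illustrative assumptions, not values given in the patent.

```python
LINE_SIZE = 32     # fine-grained cache line size, per the embodiments
NUM_GROUPS = 8     # assumed number of cache groups (sets); not from the patent
NUM_COLUMNS = 4    # four cache columns, per Embodiment 3

def group_number(addr: int) -> int:
    """Strip the 32-byte line offset, then take the group (set) index."""
    return (addr // LINE_SIZE) % NUM_GROUPS

def cache_column(group: int) -> int:
    """Assumed group-to-column mapping; consistent with group 5 -> column 1
    (Embodiment 1) and group 6 -> column 2 (Embodiment 2)."""
    return group % NUM_COLUMNS

addr = 0xA0                              # example: 0xA0 // 32 = 5, so group 5
print(group_number(addr))                # -> 5
print(cache_column(group_number(addr)))  # -> 1, i.e. cache column 1
```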

[0044] The memory access request first accesses the tag memory to check for a matching tag. If no matching tag is found, first tri-state gate 1 is not turned on, the output of the NAND gate is true, and the second tri-state gate is turned on; the request's address information is stored in MSHR 1, and MSHR 1 is then sent to the merging unit. Since there is only one memory access request at this time, the merging unit sends it directly to the next-level mem...
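
The same miss path can be sketched in software, with the MSHR reduced to a list of pending addresses and the merging unit forwarding a lone request unchanged; the class and function names here are illustrative, not the patent's:

```python
from dataclasses import dataclass, field

@dataclass
class MSHR:
    """Miss Status Holding Register: parks addresses of outstanding misses."""
    pending: list = field(default_factory=list)

    def allocate(self, addr: int) -> None:
        self.pending.append(addr)

def merge_and_issue(mshr: MSHR, send_to_next_level) -> None:
    """With a single outstanding request there is nothing to coalesce,
    so the merging unit forwards it to the next-level memory as-is."""
    if len(mshr.pending) == 1:
        send_to_next_level(mshr.pending.pop())

mshr_1 = MSHR()
mshr_1.allocate(0xA0)   # tag miss: store the request's address in MSHR 1
merge_and_issue(mshr_1, lambda a: print(f"to next-level memory: {a:#x}"))
```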

Embodiment 2

[0046] If the memory access request popped from the FIFO is 32 bytes, and assuming its group number is 6, the request will access cache column 2.

[0047] The memory access request first accesses the tag memory to check for a matching tag. This time a matching tag is found, that is, the hit signal is true; the output of the NAND gate is false, the first tri-state gate is turned on, and the cache line index number is sent to the data memory as the address signal to access the hit data. The index numbers presented to the other cache columns default to an address that is invalid for the data memory, so no data access occurs in the other three data memories.
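
In software terms, the tri-state gating described above amounts to presenting a valid line index only to the hit column's data memory; in this minimal sketch the invalid-address sentinel and data-memory layout are illustrative assumptions:

```python
INVALID = -1   # assumed sentinel for "invalid address" at a data-memory port

def drive_index_buses(hit_column: int, line_index: int, num_columns: int = 4):
    """Model the tri-state gates: only the hit column's data memory
    receives a valid cache line index; the rest see the invalid default."""
    return [line_index if col == hit_column else INVALID
            for col in range(num_columns)]

def access_data_memories(buses, data_memories):
    """A port with an invalid index performs no access (returns None)."""
    return [mem[idx] if idx != INVALID else None
            for idx, mem in zip(buses, data_memories)]

mems = [[f"col{c}_line{i}" for i in range(8)] for c in range(4)]
buses = drive_index_buses(hit_column=2, line_index=3)
print(access_data_memories(buses, mems))  # only cache column 2 returns data
```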

Embodiment 3

[0049] If the memory access request popped from the FIFO is 128 bytes, it needs to be split into four 32-byte memory access requests; that is, its address at 128-byte granularity is decomposed into four addresses at 32-byte granularity. Assuming its address is a, the four addresses after decomposition are a, a+32, a+64, and a+96. According to the cache address mapping scheme, the four addresses map to four consecutive cache groups; in other words, the four split requests access four different cache columns. The four requests are then issued in parallel to the tag memories in the four cache groups for match checking.
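
The decomposition itself takes only a few lines; reusing the assumed address mapping from the earlier sketch shows the four sub-addresses landing in four consecutive cache groups:

```python
LINE_SIZE, NUM_GROUPS = 32, 8   # as in the earlier sketch (group count assumed)

def group_number(addr: int) -> int:
    return (addr // LINE_SIZE) % NUM_GROUPS

def split_128_byte(a: int) -> list[int]:
    """Decompose one 128-byte request at address a into four
    32-byte sub-requests at a, a+32, a+64 and a+96."""
    return [a + 32 * i for i in range(4)]

subs = split_128_byte(0x100)
print([hex(s) for s in subs])           # ['0x100', '0x120', '0x140', '0x160']
print([group_number(s) for s in subs])  # [0, 1, 2, 3]: consecutive groups
```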

[0050] If all four memory access requests hit, that is, all hit signals are true, then all four first tri-state gates are turned on and the four cache line index numbers are sent as address information to the four data memories for data access; the outputs of the NOT gates are accordingly all false.

[0051] If the memory ...
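
Following the flow spelled out in the Abstract below — on a miss, only the hit sub-requests' data is returned immediately, and the four sub-requests are merged into one new request for the next-level memory — here is a minimal sketch of that decision, assuming a lookup(addr) interface (data on a hit, None on a miss) that is not part of the patent:

```python
def handle_four_sub_requests(subs, lookup, send_to_next_level):
    """subs: the four 32-byte sub-request addresses of one 128-byte request.
    lookup(addr) returns cached data on a hit, or None on a miss
    (an assumed interface, not the patent's)."""
    hits = {}
    for addr in subs:
        data = lookup(addr)          # tag check in that sub-request's column
        if data is not None:
            hits[addr] = data
    if len(hits) == len(subs):
        return hits                  # all four hit: return the data, done
    # Partial (or full) miss: return only the hit data now and merge the
    # four sub-requests into one 128-byte request for the next-level memory.
    send_to_next_level(min(subs), 128)
    return hits

# Example: sub-requests at 0x100..0x160, with only 0x120 resident.
cache = {0x120: "line@0x120"}
print(handle_four_sub_requests(
    [0x100, 0x120, 0x140, 0x160],
    cache.get,
    lambda a, n: print(f"merged {n}-byte request to next level at {a:#x}")))
```

Per the Abstract, when the merged request returns with the missing data it is split into four 32-byte sub-requests again, and the lines for the previously missed sub-requests are filled into the cache.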



Abstract

The invention discloses a method and a device for a GPU (Graphics Processing Unit) multi-granularity memory access request to access a fine-grained cache, comprising the following steps: when a memory access request accesses the cache and the request is 128 bytes, splitting the current request into four consecutive 32-byte sub-memory access requests for parallel processing; judging whether the four sub-memory access requests all hit; if so, accessing the data memory, returning the hit data, and ending the process; if not, a data miss has occurred for the 128-byte memory access request, so only the data of the hit sub-memory access requests is returned, and the four sub-memory access requests are merged into a new memory access request sent to the next-level memory; when the new memory access request carrying the missing data returns from the next-level memory, splitting it into four 32-byte sub-memory access requests again, storing the missing data required by the previously missed sub-memory access requests into the cache, and ending the process. According to the invention, congestion caused by too many memory access requests is reduced.

Description

Technical field

[0001] The invention relates to the field of cache (cache memory) architecture in a GPU (graphics processing unit), and in particular to a method for a GPU multi-granularity memory access request to access a fine-grained cache, thereby increasing the number of accessible cache lines and supporting the parallel issue of coarse-grained memory access requests.

Background technique

[0002] In recent years, the GPU has developed into a general-purpose high-performance computing platform thanks to its powerful parallel computing capability, and more and more applications with parallel computing characteristics have begun to use GPUs for acceleration. At the same time, to meet the memory bandwidth requirements of parallel computing, GPUs deploy register files, caches, and shared memory on-chip [1], while off-chip they are equipped with GDDR5 (Graphics Double Data Rate 5) or HBM (High Bandwidth Memory), which offer higher bandwidth than DDR (Double Data Rate) memory [2...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F9/30; G06F9/38; G06F12/0806; G06T1/20
CPC: G06F9/3012; G06F9/3816; G06F12/0806; G06T1/20
Inventor: 李炳超, 赵静玉, 赵柏潼, 徐龙, 文佳伟
Owner: CIVIL AVIATION UNIV OF CHINA