Method and device for fine-grained cache access by GPU (Graphics Processing Unit) multi-granularity memory access requests
This patent applies fine-grained memory access techniques in the field of cache architecture. It addresses the problems that multi-granularity memory access requests increase the number of requests in the storage system, lengthen the processing latency of memory access instructions, and reduce GPU performance. The described method avoids these processing delays, reduces premature eviction of cache lines, and increases processing speed.
Examples
Embodiment 1
[0043] The GPU places generated memory access requests into a FIFO (first-in, first-out) queue. If the memory access request popped from the FIFO is 32 bytes, the corresponding cache group number is obtained from the request's address information, and the cache column to be accessed by the request is then determined from that group number. As shown in Figure 1, assuming the group number is 5, the request accesses cache column 1.
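The group-number lookup and the group-to-column mapping described above can be sketched as follows. The 32-byte line size and the four cache columns come from the embodiments; the total number of sets (`NUM_SETS`) and the modulo mapping are assumptions chosen so that the examples in the text (group 5 → column 1, group 6 → column 2) hold.

```python
LINE_SIZE = 32     # bytes per cache line (from the embodiments)
NUM_SETS = 64      # assumed total number of cache groups (sets)
NUM_COLUMNS = 4    # the cache is organized as four columns

def cache_group(addr: int) -> int:
    """Map a byte address to its cache group (set) number."""
    return (addr // LINE_SIZE) % NUM_SETS

def cache_column(group_no: int) -> int:
    """Assumed interleaving: consecutive groups cycle across the four columns."""
    return group_no % NUM_COLUMNS
```

Under this assumed mapping, group 5 lands in column 1 and group 6 in column 2, matching Embodiments 1 and 2.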
[0044] The memory access request first accesses the tag memory to check for a matching tag. If no matching tag is found, the first tri-state gate 1 is not turned on, the output of the NAND gate is true, and the second tri-state gate is turned on, so the request's address information is stored in MSHR-1; MSHR-1 is then sent to the merging unit. Since there is only one memory access request at this time, the merging unit sends the request directly to the next-level mem...
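The miss path above records the missing address in an MSHR and lets a merging unit coalesce requests to the same block. The following is a minimal sketch of that idea, not the patent's actual hardware: the class name, the per-block pending list, and the boolean return convention are all illustrative assumptions.

```python
from collections import OrderedDict

BLOCK = 32  # 32-byte cache line, per the embodiments

class MSHRFile:
    """Simplified miss-status handling (illustrative sketch).

    On a tag miss the request address is recorded; misses to the same
    32-byte block are merged so that only the first one is forwarded
    to the next memory level."""

    def __init__(self):
        self.pending = OrderedDict()  # block address -> pending requester ids

    def record_miss(self, addr: int, req_id: int) -> bool:
        block = (addr // BLOCK) * BLOCK
        is_first = block not in self.pending
        self.pending.setdefault(block, []).append(req_id)
        return is_first  # True: forward to next level; False: merged
```

With a single outstanding miss, as in Embodiment 1, `record_miss` returns `True` and the request goes straight to the next memory level; a later miss to the same block would be merged instead.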
Embodiment 2
[0046] If the memory access request popped from the FIFO is 32 bytes, and its group number is assumed to be 6, the request accesses cache column 2.
[0047] The memory access request first accesses the tag memory to check for a matching tag. This time a matching tag is found, i.e. the hit signal is true, so the output of the NAND gate is false and the first tri-state gate is turned on, sending the cache line index number to the data memory as the address signal to access the hit data. The index inputs of the other cache columns default to an address that is invalid for their data memories, so no data access occurs in the other three data memories.
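Embodiments 1 and 2 together describe complementary control: the hit signal enables the gate feeding the data memory, and its inversion enables the gate feeding the MSHR. A small truth-table sketch of that control logic (illustrative only; the real circuit uses tri-state gates and an inverting gate on the hit signal):

```python
def control_signals(hit: bool):
    """Illustrative sketch of the hit-signal control logic.

    first_gate_on:  passes the cache line index to the data memory (hit path).
    second_gate_on: latches the request address into the MSHR (miss path);
                    this corresponds to the inverted hit signal in the figures.
    """
    first_gate_on = hit
    second_gate_on = not hit
    return first_gate_on, second_gate_on
```

On a hit, only the data-memory path is enabled; on a miss, only the MSHR path is, so the two paths are never driven at once.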
Embodiment 3
[0049] If the memory access request popped from the FIFO is 128 bytes, it must be split into four 32-byte memory access requests, i.e. its 128-byte-aligned address is decomposed into four 32-byte-aligned addresses. Assuming the address is a, the four decomposed addresses are a, a+32, a+64, and a+96. Under the cache address mapping scheme, the four addresses map to four consecutive cache groups; that is, the four split requests access four different cache columns. The four memory access requests are therefore issued in parallel to the tag memories of the four cache groups for matching checks.
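The 128-byte decomposition above is straightforward to sketch. The alignment assertion is an assumption (the text implies 128-byte-based addresses); with the interleaved group-to-column mapping assumed earlier, the four sub-requests necessarily land in four distinct columns.

```python
LINE_SIZE = 32
NUM_COLUMNS = 4

def split_128b(addr: int) -> list[int]:
    """Split a 128-byte request at address a into four 32-byte requests
    at a, a+32, a+64, a+96 (assumes 128-byte alignment)."""
    assert addr % 128 == 0, "128-byte requests are assumed 128-byte aligned"
    return [addr + LINE_SIZE * i for i in range(4)]

def columns_touched(addr: int) -> set[int]:
    """Columns accessed by the four sub-requests, under the assumed
    group-interleaved mapping (group % NUM_COLUMNS)."""
    return {(a // LINE_SIZE) % NUM_COLUMNS for a in split_128b(addr)}
```

Because the four addresses fall in four consecutive groups, `columns_touched` always returns all four columns, which is what allows the four tag lookups to proceed in parallel.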
[0050] If all four memory access requests hit, i.e. all four hit signals are true, all four first tri-state gates are turned on and the four cache line index numbers are sent as address information to the four data memories for data access; the output of the NOT gate is likewise false.
[0051] If the memory ...



