Storage access device, storage access method, and electronic device
By using a scheduling mechanism with buffer queue items and a scheduling module in the storage access device, the conflict problem of multi-core concurrent access to shared memory is solved, achieving efficient data transmission in the event of memory access conflicts, and reducing hardware resource overhead and circuit layout area.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANGHAI BIREN TECH CO LTD
- Filing Date
- 2026-03-10
- Publication Date
- 2026-06-12
AI Technical Summary
When multiple cores concurrently access shared memory, memory access conflicts lead to increased memory access latency, decreased bandwidth utilization, and reduced computer system efficiency.
The storage access device employs multiple buffer queues and a scheduling module. The scheduling module stores access information with storage access conflicts into different buffer queues, and the information is output to the storage array through a shared buffer queue. This avoids changing the access address and dynamically adapts to the conflict situation of different storage groups.
In the event of a memory access conflict, data writing or reading operations can still be completed in each clock cycle, saving hardware resource consumption and reducing circuit layout area.
Figure CN122195873A_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of integrated circuits, and more specifically to a storage access device, a storage access method, and an electronic device.
Background Technology
[0002] As computing system performance improves, the demand for faster memory access response keeps increasing, and concurrent access to shared memory by multiple cores has become commonplace. However, when multiple threads request different addresses of the same memory bank within the same clock cycle, the hardware cannot respond to them simultaneously and must serve them sequentially, resulting in memory access conflicts. This directly leads to a significant increase in memory access latency and decreased bandwidth utilization, and degrades the operating efficiency of the computer system.
Summary of the Invention
[0003] At least one embodiment of this disclosure provides a storage access device, comprising: a plurality of buffer queue entries, a first scheduling module, and a second scheduling module; wherein the first scheduling module is connected to the input ports of the plurality of buffer queue entries respectively, and is configured to, in response to a storage access conflict between the access addresses of at least two access requests, store the access information corresponding to each of the at least two access addresses into different buffer queue entries respectively; the second scheduling module is connected to the output ports of the plurality of buffer queue entries respectively, and is configured to determine, among the plurality of buffer queue entries, a first buffer queue entry whose access information is to be output to the storage array; wherein the plurality of buffer queue entries are shared by a plurality of storage groups in the storage array.
[0004] At least one embodiment of this disclosure provides a storage access method, which includes: in response to a storage access conflict between the access addresses of at least two access requests, storing access information corresponding to the at least two access addresses into different buffer queue entries in a plurality of buffer queue entries; determining a first buffer queue entry in the plurality of buffer queue entries that corresponds to outputting access information to a storage array, so as to output the access information stored in the first buffer queue entry to a target storage group of a plurality of storage groups in the storage array; wherein the plurality of buffer queue entries are shared by the plurality of storage groups.
[0005] At least one embodiment of this disclosure provides an electronic device, which includes a storage array and the storage access device provided in any embodiment of this disclosure.
Attached Figure Description
[0006] To more clearly illustrate the technical solutions of the embodiments of this disclosure, the accompanying drawings of the embodiments will be briefly described below. Obviously, the drawings described below only relate to some embodiments of this disclosure and are not intended to limit this disclosure.
[0007] Figure 1 shows a schematic block diagram of a general-purpose graphics processing unit (GPGPU).
[0008] Figure 2 shows an exemplary memory organization diagram.
[0009] Figure 3 shows an organizational chart of an exemplary computer system including a storage access device, provided in at least one embodiment of the present disclosure.
[0010] Figure 4 shows a schematic diagram illustrating an example operating logic of a storage access device provided in at least one embodiment of the present disclosure.
[0011] Figure 5 shows an exemplary flowchart of a storage access method provided by at least one embodiment of the present disclosure.
[0012] Figure 6 shows an exemplary block diagram of an electronic device provided by at least one embodiment of the present disclosure.
[0013] Figure 7 shows an exemplary block diagram of an electronic device provided by at least one embodiment of the present disclosure.
Detailed Implementation
[0014] To make the objectives, technical solutions, and advantages of the embodiments of this disclosure clearer, the technical solutions of the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this disclosure. All other embodiments obtained by those skilled in the art based on the described embodiments of this disclosure without creative effort are within the scope of protection of this disclosure.
[0015] Unless otherwise defined, the technical or scientific terms used in this disclosure shall have the ordinary meaning understood by one of ordinary skill in the art to which this disclosure pertains. The terms “first,” “second,” and similar terms used in this disclosure do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Similarly, the terms “an,” “a,” or “the,” and similar terms do not indicate a quantity limitation, but rather indicate the presence of at least one. The terms “including,” “comprising,” or “containing,” and similar terms mean that the element or object preceding the word encompasses the elements or objects listed following the word and their equivalents, without excluding other elements or objects. The terms “connected,” “linked,” or similar terms are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect. The terms “upper,” “lower,” “left,” and “right,” etc., are used only to indicate relative positional relationships, and these relative positional relationships may change accordingly when the absolute position of the described objects changes.
[0016] The present disclosure will now be described through several specific embodiments. To keep the following description of the embodiments of the present disclosure clear and concise, detailed descriptions of known functions and components may be omitted. When any component of the embodiments of the present disclosure appears in more than one drawing, the component is represented by the same or similar reference numerals in each drawing.
[0017] In the field of artificial intelligence, AI processors are used for efficient data computation. An AI processor can be a graphics processing unit (GPU), a tensor processing unit (TPU), a general-purpose graphics processing unit (GPGPU), a deep learning processing unit (DPU), an accelerated processing unit (APU), a neural network processing unit (NPU), etc. The following description uses a general-purpose graphics processing unit (GPGPU) as an example of an AI processor.
[0018] Figure 1 shows a schematic structural diagram of a general-purpose graphics processing unit (GPGPU) provided in at least one embodiment of this disclosure. It will be understood that the GPGPU shown in Figure 1 may also include additional units and / or modules not shown.
[0019] As shown in Figure 1, a general-purpose graphics processor includes an array of programmable multiprocessors, such as streaming processor clusters (SPCs), for example the streaming processor clusters 1, ..., M shown in Figure 1, where M is a positive integer greater than 1. In a general-purpose graphics processor, one streaming processor cluster handles one computational task, or multiple streaming processor clusters handle one computational task. Multiple streaming processor clusters share data through a global cache or global memory.
[0020] As shown in Figure 1, taking streaming processor cluster 1 as an example, one streaming processor cluster includes multiple computing units, such as computing unit 1, computing unit 2, ..., computing unit N shown in Figure 1, where N is a positive integer. Each computing unit (CU) performs arithmetic and logical operations, such as accumulation, reduction, and ordinary addition, subtraction, multiplication, and division. A computing unit includes multiple cores (also called computing cores), each containing an arithmetic logic unit (ALU), a floating-point unit, etc., which are used to execute specific computational tasks. Furthermore, the computing unit also includes a buffer queue item heap (e.g., the buffer queue item heap shown in Figure 1) and shared memory, which are used to hierarchically store source and destination data related to computing tasks. The shared memory in a computing unit is used to share data between the cores of that computing unit.
[0021] As shown in Figure 1, each computing unit also provides a tensor core for performing tensor-related computations, such as tensor contraction operations. Tensor cores can accelerate tensor operations such as matrix multiplication. Tensor cores in multiple computing units can be scheduled and controlled uniformly.
[0022] As shown in Figure 1, each streaming processor cluster also provides a buffer for caching data across the N computing units in the streaming processor cluster.
[0023] In parallel computing, computational tasks are typically executed by multiple threads. Before execution in a general-purpose graphics processor (or parallel computing processor), these threads are divided into multiple thread blocks, and the thread blocks are then distributed to the computing units via a thread block distribution module (not shown in Figure 1). All threads in a thread block must be assigned to the same computing unit for execution. At the same time, each thread block is split into minimum execution units called thread bundles (or simply warps), each containing a fixed number of threads (or fewer than this fixed number), for example, 32 threads. Multiple thread blocks can execute in the same computing unit or in different computing units.
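The splitting of a thread block into thread bundles can be sketched as follows. This is a minimal illustration only: the function name `split_into_warps` and the representation of threads as integer indices are assumptions made for this example, and the warp size of 32 is taken from the example in the text.

```python
def split_into_warps(num_threads: int, warp_size: int = 32) -> list[range]:
    """Split thread indices 0..num_threads-1 into warps.

    Every warp holds warp_size threads except possibly the last one,
    which holds the remainder (fewer than warp_size threads).
    """
    if num_threads < 0 or warp_size <= 0:
        raise ValueError("num_threads must be >= 0 and warp_size must be > 0")
    return [
        range(start, min(start + warp_size, num_threads))
        for start in range(0, num_threads, warp_size)
    ]


# A hypothetical thread block of 100 threads yields three full warps and one partial warp.
warp_sizes = [len(w) for w in split_into_warps(100)]
print(warp_sizes)  # [32, 32, 32, 4]
```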
[0024] In each computing unit, the thread bundle scheduling / distribution module (not shown in Figure 1) schedules and allocates thread bundles so that the multiple computing cores within the computing unit run the thread bundles. Depending on the number of computing cores in the computing unit, multiple thread bundles within a thread block can be executed concurrently or in a time-sharing manner. Multiple threads within each thread bundle execute the same instruction. Memory access instructions are issued to the shared memory (composed of SRAM) within the computing unit, or further to an intermediate-level cache, the global cache, or global memory (composed of DRAM, e.g., the High Bandwidth Memory (HBM) shown in Figure 1) for read and write operations.
[0025] Taking computing unit 1 as an example, when each core in the computing unit performs calculations, each core loads the data to be calculated from global memory or the global cache. For example, each core can first transfer the data in the global cache or global memory, via shared memory, to its corresponding buffer queue item heap shown in Figure 1, so that each core performs its computation based on the data in the buffer queue item heap. After computing unit 1 completes the computation and obtains the corresponding result, it stores the result in global memory. For instance, the result can first be transferred from the buffer queue item heap corresponding to each core to shared memory, and then transported from shared memory to the global cache or global memory for storage. Therefore, the response efficiency of access requests to shared memory, global cache, or global memory (hereinafter referred to as "memory") determines the overall data computation efficiency of the GPGPU.
[0026] In the field of artificial intelligence, the data used for computation is usually multidimensional arrays, such as matrices, images, and feature maps. When loading and storing this data, bank conflicts often occur due to multiple access requests to the storage array. For example, the access information corresponding to multiple access requests must be written into the same storage bank at the same time.
[0027] Figure 2 shows an exemplary organizational structure diagram of a memory.
[0028] As shown in Figure 2, the memory 10 includes control logic 101, address buffer queue 102, row decoder 103, column decoder 104, sense amplifier 105, data buffer 106, memory bank groups 0~X and memory banks 0~Y (X and Y are positive integers).
[0029] For example, the storage hierarchy of the memory 10 may include a bank group, a bank, a row, and a column; wherein each bank group includes multiple banks, each bank includes multiple rows, and each row includes multiple columns.
[0030] For example, the control logic 101 of the memory 10 is configured to receive access requests, and the address buffer queue 102 is configured to parse the access address (i.e., the physical address) of a received access request (e.g., a read request or a write request) to determine the address fields in the access address. These address fields may include the bank group address field, the bank address field, the row address field, and the column address field. The corresponding bank group and bank can then be selected based on the bank group address field and the bank address field. For example, after parsing the bank group address field, a gating mechanism can be used to select the target bank group, and after parsing the bank address field, a gating mechanism can be used to select the target bank.
[0031] Then, the target row (i.e., word line) in the storage array is activated by sending the row address to the row decoder 103 of the corresponding bank; the column address is sent to the column decoder 104 of the bank to locate the target column (i.e., bit line), thereby locating the specific storage bit in the storage array at the intersection given by the row address field and the column address field.
[0032] Subsequently, the weak signal of the storage capacitor is detected and amplified by the sense amplifier 105 built into each bank, and finally the data is matched to the data bus rate and transmitted bidirectionally via the data buffer 106.
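The address parsing described above can be illustrated with a short sketch. The source does not specify an address layout, so the field widths and the ordering of the fields (bank group, bank, row, column from most to least significant bits) are hypothetical, as are the names `DecodedAddress` and `decode_address`.

```python
from typing import NamedTuple

# Hypothetical field widths in bits; the source does not define them.
COLUMN_BITS = 10
ROW_BITS = 14
BANK_BITS = 2
BANK_GROUP_BITS = 2


class DecodedAddress(NamedTuple):
    bank_group: int
    bank: int
    row: int
    column: int


def decode_address(addr: int) -> DecodedAddress:
    """Split a physical address into its address fields.

    Assumed layout (MSB to LSB): bank group | bank | row | column.
    """
    column = addr & ((1 << COLUMN_BITS) - 1)
    addr >>= COLUMN_BITS
    row = addr & ((1 << ROW_BITS) - 1)
    addr >>= ROW_BITS
    bank = addr & ((1 << BANK_BITS) - 1)
    addr >>= BANK_BITS
    bank_group = addr & ((1 << BANK_GROUP_BITS) - 1)
    return DecodedAddress(bank_group, bank, row, column)
```

Under this assumed layout, the bank group and bank fields select the target bank, while the row and column fields drive the row decoder and column decoder respectively.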
[0033] For example, a request to access a row requires a precharge operation on the currently active row before activating the target row and executing the corresponding read / write command. Therefore, only one access request can be responded to at any given time (e.g., within the same clock cycle) within the same memory bank. Consequently, when multiple write or read requests simultaneously access different rows within the same memory bank, a bank conflict occurs, causing latency in the memory array, reducing data read / write efficiency, and ultimately impacting the processor's data processing efficiency.
[0034] To resolve storage access conflicts (hereinafter referred to as "access conflicts" or "access address conflicts"), one approach is to process physical addresses with hash calculations, redirecting some of the physical addresses that access the same memory bank to other memory banks to avoid response delays caused by access conflicts. However, this method modifies the original physical address through calculation, so complex formulas are needed to accurately reconstruct the original physical address in order to respond to the access request correctly, which increases the burden on the entire processing logic.
[0035] At least one embodiment of this disclosure provides a storage access device, which includes: a plurality of buffer queue items, a first scheduling module, and a second scheduling module; wherein, the first scheduling module is connected to the input ports of the plurality of buffer queue items respectively, and is configured to, in response to a storage access conflict between the access addresses of at least two access requests, store the access information corresponding to each of the at least two access addresses into different buffer queue items respectively; the second scheduling module is connected to the output ports of the plurality of buffer queue items respectively, and is configured to determine, among the plurality of buffer queue items, a first buffer queue item whose access information is to be output to the storage array; wherein, the plurality of buffer queue items are shared by a plurality of storage groups in the storage array.
[0036] In the storage access apparatus of the above embodiments of this disclosure, by configuring multiple buffer queue entries shared by multiple memory banks in a storage array, and by scheduling at least two access addresses with memory bank access conflicts according to a first scheduling module and a second scheduling module, memory bank access conflicts in the storage array can be scheduled and processed without changing the access addresses. Furthermore, since the multiple buffer queue entries in this storage access apparatus are shared by multiple memory banks, it can dynamically adapt to different memory bank access conflicts occurring in different memory banks, thus eliminating the need to add additional hardware resources for each memory bank group to handle access conflicts, saving hardware overhead and consequently saving hardware circuit layout area.
[0037] At least one embodiment of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that the same reference numerals will be used to refer to the same parts described in different drawings.
[0038] Figure 3 shows an organizational chart of an exemplary computer system including a storage access device, provided in at least one embodiment of the present disclosure.
[0039] As shown in Figure 3, the computer system 20 includes a processing module 201, control logic 101, storage access device 200, and storage array 204.
[0040] For example, the processing module 201 may be a processor or a processor core, such as one or more cores in the GPGPU shown in Figure 1. The processor can be any processing circuitry with processing capabilities implemented in hardware or firmware, and can include a single processing core (also called a computing core) or multiple processing cores. For example, the processor can be a central processing unit (CPU) or coprocessor, a microcontroller unit (MCU), or a digital signal processor (DSP); the coprocessor can be, for example, an artificial intelligence processor such as a general-purpose graphics processing unit (GPGPU), an accelerator (e.g., a graphics accelerator or digital signal processing unit), a graphics processing unit (GPU), a programmable logic array, or any other processor with instruction execution capabilities, and the embodiments of this disclosure are not limited thereto.
[0041] For example, the control logic 101 is the control logic in the memory 10 shown in Figure 2 and may be, for example, a memory controller.
[0042] For example, the storage array 204 may include the memory bank groups 0~X and memory banks 0~Y in the memory 10 shown in Figure 2.
[0043] After receiving an access request, the computer system 20 can use its internal instruction pipeline structure to determine the access address (i.e., physical address) of the request. The determined access address is then sent to the control logic 101, which handles bank conflicts. For example, the execution unit within the processing module 201 can send multiple access requests, including access addresses, to the control logic 101.
[0044] Control logic 101 parses all pending access addresses received within a single clock cycle to determine whether there is a memory access conflict. For example, it can compare the memory bank address fields of each access address within a single clock cycle. If there are two or more access requests accessing the same memory bank in the same memory bank group within that clock cycle, it is determined that at least two access requests have a memory access conflict.
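The comparison of bank address fields within one clock cycle can be sketched as below. This is an illustrative model rather than the control logic itself: the function name `find_bank_conflicts` and the tuple representation of a request are assumptions for the example.

```python
from collections import defaultdict


def find_bank_conflicts(requests):
    """Detect access conflicts among requests received in one clock cycle.

    requests: iterable of (request_id, bank_group, bank) tuples.
    Returns a dict mapping (bank_group, bank) to the list of request ids
    that target that bank, for every bank targeted by two or more requests.
    """
    by_bank = defaultdict(list)
    for request_id, bank_group, bank in requests:
        by_bank[(bank_group, bank)].append(request_id)
    return {key: ids for key, ids in by_bank.items() if len(ids) >= 2}


# Req1 and Req2 target bank 1 of bank group 0; Req3 targets bank 1 of bank group 1.
cycle_requests = [("Req1", 0, 1), ("Req2", 0, 1), ("Req3", 1, 1)]
print(find_bank_conflicts(cycle_requests))  # {(0, 1): ['Req1', 'Req2']}
```

Note that the same bank index in a different bank group (Req3 here) does not count as a conflict, since the comparison key is the pair of bank group and bank.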
[0045] The ideal access mode for the storage array 204 is that a new access request can be received every clock cycle (i.e., 1 cycle in) and a data write or read operation can be completed every clock cycle (i.e., 1 cycle out).
[0046] However, when multiple access requests received by control logic 101 simultaneously attempt to access different rows within the same bank, control logic 101 is constrained by the row cycle time (tRC), limiting it to activating only one row within the same bank at a time. When controlling access to a new target row, control logic 101 must first precharge the currently accessed row within the bank before activating the new target row. This process takes 2-3 clock cycles (i.e., row switching latency), making it impossible to switch between different rows within the same bank within a single clock cycle. In other words, when multiple access addresses experience bank access conflicts, although the request is initiated within that clock cycle (1-cycle in), the row switching latency caused by the bank access conflict prevents data from returning within that clock cycle (i.e., 1-cycle out).
[0047] For example, when the control logic 101 detects that multiple access requests have memory access conflicts, the storage access device 200 processes the multiple access requests to achieve an ideal access mode for the storage array 204 (that is, to achieve 1 Cycle in and 1 Cycle out).
[0048] The storage access device 200 includes multiple buffer queue items Slot1 to SlotK (K>1 and K is a positive integer), a first scheduling module 202, and a second scheduling module 203.
[0049] The first scheduling module 202 is connected to the input ports of multiple buffer queue items Slot1~SlotK respectively, and is configured to store the access information corresponding to the at least two access addresses into different buffer queue items in response to the memory access conflict between the access addresses of at least two access requests.
[0050] The second scheduling module 203 is connected to the output ports of multiple buffer queue items Slot1~SlotK respectively, and is configured to determine, among the multiple buffer queue items, the first buffer queue item whose access information is to be output to the storage array 204.
[0051] For example, multiple buffer queue items Slot1 to SlotK are shared by multiple memory banks in memory array 204.
[0052] "First buffer queue item" refers to the buffer queue item selected from the multiple buffer queue items in the current clock cycle to output its access information to the storage array. The first buffer queue item can be one buffer queue item or multiple buffer queue items.
[0053] A buffer queue item is a hardware storage unit that temporarily stores information in a first-in-first-out (FIFO) queue. For example, a buffer queue item can be an SRAM block or a register, and this disclosure does not limit its implementation.
[0054] For example, an access request can be a read request or a write request. The access address for a read request is the read address, and the access address for a write request is the write address. For example, for a write request, the access information is the accessed data (i.e., the data being written). For example, for a read request, the access information is the accessed metadata (i.e., the read request information). For example, read request information may include information such as the access address corresponding to the read request. For example, for at least two write requests with storage access conflicts, the write data corresponding to each write request can be stored in different buffer queue entries. For example, for at least two read requests with storage access conflicts, the read request information corresponding to each read request can be stored in different buffer queue entries.
[0055] Therefore, with the processing of the storage access device 200, the first scheduling module 202 ensures that a new access request can be received in each clock cycle even if there is a storage access conflict (i.e., 1 Cycle in). The second scheduling module 203 determines the first buffer queue item and outputs the access information stored in it to the storage array 204, so that a data write or read operation can be completed in each clock cycle even if there is a storage access conflict (i.e., 1 Cycle out).
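The cooperation of the two scheduling modules can be illustrated with a toy behavioral model. It is a sketch under stated assumptions, not a cycle-accurate description of the device: the class name `StorageAccessDeviceModel`, the lowest-index-first choice of idle slots, and the oldest-first output policy are all hypothetical choices that the source does not prescribe.

```python
from collections import deque


class StorageAccessDeviceModel:
    """Toy model of K shared buffer queue items with two schedulers."""

    def __init__(self, num_slots: int):
        self.slots = [None] * num_slots  # None marks an idle slot
        self.arrival_order = deque()     # busy slot indices, oldest first

    def store_conflicting(self, access_infos):
        """First scheduling module: put each conflicting access info
        into a different idle slot (assumed policy: lowest index first)."""
        idle = [i for i, content in enumerate(self.slots) if content is None]
        if len(idle) < len(access_infos):
            raise RuntimeError("not enough idle buffer queue items")
        used = []
        for slot, info in zip(idle, access_infos):
            self.slots[slot] = info
            self.arrival_order.append(slot)
            used.append(slot)
        return used

    def output_one(self):
        """Second scheduling module: select the first buffer queue item
        (assumed policy: oldest entry) and release its access info."""
        if not self.arrival_order:
            return None
        slot = self.arrival_order.popleft()
        info, self.slots[slot] = self.slots[slot], None
        return slot, info


device = StorageAccessDeviceModel(num_slots=4)
device.store_conflicting(["write A", "write B"])  # two conflicting requests arrive
print(device.output_one())  # (0, 'write A')
```

In this model, accepting new requests and draining one stored entry are independent operations, which is the property that lets the input side and the output side each make progress in every cycle.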
[0056] Furthermore, by sharing multiple buffer queue entries among multiple bank groups, even if a bank group experiences a large number of access conflicts within a certain period, it is not necessary to configure additional buffer queue entries for that bank group. This allows for dynamic adaptation to different bank group access conflict scenarios, saving hardware resources and thus reducing chip circuit layout area. For example, if each of the multiple bank groups had its own bank access conflict handling capability, at least twice the number of corresponding hardware units (e.g., handling two access conflicts) would be required. However, even with these hardware units, if a bank group experiences a larger number of access conflicts, the additional hardware units would be insufficient to handle the increased number of conflicts. Instead, each bank group would need to add more hardware units based on the maximum possible number of access conflicts, resulting in wasted hardware resources and increased chip circuit layout area.
[0057] It should be noted that when multiple storage groups share multiple buffer queue items, the storage groups can be assigned specific buffer queue items to share according to a certain rule, or no rule is set and every buffer queue item can be shared by every storage group. This disclosure does not restrict the specific sharing method.
[0058] To further reduce the hardware area of the storage access device, in some embodiments of this disclosure, the first number (the number of buffer queue items) is less than twice the second number (the number of memory bank groups) and greater than or equal to the maximum number of memory bank access conflicts, that is, the largest number of memory bank access conflicts that may occur in any one of the multiple memory bank groups.
[0059] For example, among four memory bank groups, if the maximum number of memory bank access conflicts that may occur in two of the memory bank groups is 2 (i.e., two access conflicts), and the maximum number of memory bank access conflicts that may occur in the other two memory bank groups is 3 (i.e., three access conflicts), then the maximum number of memory bank access conflicts for the four memory bank groups is 3. With four memory bank groups, twice the second number is 8, so the first number of buffer queue items can be any integer in the range [3, 8).
[0060] It should be noted that the first quantity, the second quantity, and the maximum number of storage access conflicts are all integers. The specific values of the second quantity and the maximum number of storage access conflicts can be determined according to actual needs, and this disclosure does not impose any restrictions on them.
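The sizing rule for the first quantity can be checked with a few lines of code. The function name `valid_slot_counts` is an assumption made for this sketch; the rule itself (at least the maximum conflict count, less than twice the number of memory bank groups) comes from the text.

```python
def valid_slot_counts(max_conflicts_per_group):
    """Return the admissible numbers of buffer queue items.

    max_conflicts_per_group: one entry per memory bank group, giving the
    maximum number of access conflicts that may occur in that group.
    The result is the half-open integer range [max conflicts, 2 * groups).
    """
    if not max_conflicts_per_group:
        raise ValueError("at least one memory bank group is required")
    lower = max(max_conflicts_per_group)      # inclusive lower bound
    upper = 2 * len(max_conflicts_per_group)  # exclusive upper bound
    return range(lower, upper)


# Four bank groups: two with at most 2 conflicts, two with at most 3.
print(list(valid_slot_counts([2, 2, 3, 3])))  # [3, 4, 5, 6, 7]
```

If the maximum conflict count is not smaller than twice the number of groups, the range is empty, meaning the rule cannot be satisfied for that configuration.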
[0061] The following description takes as an example the case where the access request is a write request and the buffer queue entries of the storage access device are composed of registers.
[0062] Figure 4 shows a schematic diagram illustrating an example operating logic of a storage access device provided in at least one embodiment of the present disclosure.
[0063] As shown in Figure 4, the storage access device includes four buffer queue entries consisting of four registers: Reg0, Reg1, Reg2, and Reg3. Taking the case where the memory receives two write requests, Req1 and Req2, in the same clock cycle as an example, the write data of write requests Req1 and Req2 is transmitted to the memory array through the data bus. The data bus is the data path between each memory bank group in the memory array and the execution unit of the processor. For example, the data bus can be the data bus of a Network-on-Chip (NoC).
[0064] For example, the data bus can transmit a maximum of 2 KB (KiloByte) of bus data in the same clock cycle. For instance, the data bus supports data transmission of 2 KB, 1 KB, or 512 B (Byte).
[0065] For example, for 1 KB of bus data, there are instances where the 512 B write data of write request Req1 and the 512 B write data of write request Req2 are written to the same memory bank; or, for 1 KB of bus data, there are instances where the 512 B write data of write request Req1 and the 512 B write data of write request Req2 are written to different memory banks.
[0066] For example, each of the four registers Reg0, Reg1, Reg2, and Reg3 has a maximum data capacity of 1KB. For instance, if write requests Req1 and Req2 are written to the same memory bank (an access conflict exists), the 512 bytes of data from write request Req1 can be written to register Reg0, and the 512 bytes of data from write request Req2 can be written to register Reg1. Conversely, if write requests Req1 and Req2 are written to different memory banks (no access conflict exists), and the total size of the data written by both requests matches the register's data capacity (i.e., does not exceed the register's data capacity), then both the 512 bytes of data written by write request Req1 and Req2 can be written to register Reg0.
[0067] It should be noted that the data capacity of the register can be selected according to the actual situation, and this disclosure does not impose any restrictions on it. For example, the data capacity of the register can also be 2KB, etc.
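The register selection for the two cases above (conflict and no conflict) can be sketched as follows. The function name `place_writes` is hypothetical, all registers are assumed idle and are filled starting from Reg0, and the handling of non-conflicting writes that do not fit into one register is an assumption, since the text does not cover that case.

```python
REGISTER_CAPACITY = 1024  # bytes; the 1 KB capacity from the example in the text


def place_writes(sizes, conflict, capacity=REGISTER_CAPACITY):
    """Return the register index assigned to each write request.

    sizes: write data size in bytes for each request, in request order.
    conflict: True if the requests target the same memory bank.
    """
    if conflict:
        # Conflicting writes go to different registers: Reg0, Reg1, ...
        return list(range(len(sizes)))
    if sum(sizes) <= capacity:
        # Non-conflicting writes that fit together share Reg0.
        return [0] * len(sizes)
    # Assumption: non-conflicting writes that do not fit get separate registers.
    return list(range(len(sizes)))


print(place_writes([512, 512], conflict=True))   # [0, 1]
print(place_writes([512, 512], conflict=False))  # [0, 0]
```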
[0068] Referring again to Figure 4, for example, if the bus data is 2 KB, whether the 2 KB of bus data is split into two paths or kept as one path (i.e., stored in two registers or in one register) can be determined based on whether there is a memory access conflict between write requests Req1 and Req2.
[0069] For example, in response to a memory access conflict between the access addresses of write requests Req1 and Req2, the write data corresponding to the two requests is stored in different registers respectively.
[0070] For example, when selecting registers from the four registers Reg0, Reg1, Reg2, and Reg3 to store data, certain rules can be followed. For instance, lower-numbered registers can be prioritized, so that the write data of write requests Req1 and Req2 is preferentially stored in registers Reg0 and Reg1, respectively. Alternatively, higher-numbered registers can be prioritized; this disclosure does not impose any restrictions on this.
[0071] In some embodiments of this disclosure, the first scheduling module 202 is further configured to determine the storable buffer queue items of the current clock cycle based on the working status of multiple buffer queue items in the current clock cycle, wherein each of the storable buffer queue items is used to store one piece of access information, and the working status of a buffer queue item is either a busy state or an idle state.
[0072] A busy state for a buffer queue item indicates that the current buffer queue item stores access information, while an idle state indicates that the current buffer queue item does not store access information.
[0073] The storeable buffer queue items are those buffer queue items that can store access information in the current clock cycle.
[0074] Referring also to Figure 4, for example, the working state of the four registers Reg0, Reg1, Reg2, and Reg3 can be represented by binary code. For example, the working state of the four registers Reg3, Reg2, Reg1, and Reg0 can be represented as 0011, where binary code 0 represents the idle state of a register and binary code 1 represents the busy state of a register.
[0075] For example, the storeable buffer queue items for the current clock cycle, determined from the working status of the multiple buffer queue items, can also be represented by binary code. For instance, they can be expressed in a code format in which the binary code representing the working state of each buffer queue item comes first, followed by the binary code representing the storeable buffer queue items.
[0076] For example, for write requests Req1 and Req2 and the four registers Reg3, Reg2, Reg1, and Reg0, the binary code 0011:1100 can express both the working status of each register and which registers are available for storage. The first binary code (working status binary code) 0011 indicates that registers Reg3 and Reg2 are idle, while registers Reg1 and Reg0 are busy. The second binary code (storeable binary code) 1100 indicates the storeable registers selected: binary code 1 indicates that registers Reg3 and Reg2 are available for storage during the current clock cycle, while binary code 0 indicates that registers Reg1 and Reg0 are not available for storage during the current clock cycle. In other words, for write requests Req1 and Req2 that have memory access conflicts, the corresponding data can be written to registers Reg3 and Reg2 respectively, based on the binary code 0011:1100.
[0077] It should be noted that, although Figure 4 takes write requests as an example, the processing of read requests follows the same principle. A read request corresponds to the read request information described above, so this disclosure does not elaborate on the processing of read requests. In addition, although Figure 4 uses two write requests as an example, this does not imply that this disclosure limits the number of write requests that can be received.
[0078] In some embodiments of this disclosure, the first scheduling module 202 is further configured to, in response to the number of second buffer queue items (buffer queue items whose working state is idle) being greater than or equal to a third number, namely the number of the at least two access addresses with memory access conflicts, determine the third number of storeable buffer queue items from among the second buffer queue items.
[0079] The second buffer queue items are the buffer queue items in the idle state among the multiple buffer queue items; the third number is the number of access addresses (access requests) that actually have memory access conflicts, and the third number is less than or equal to the maximum number of memory access conflicts.
[0080] Referring also to Figure 4, for write requests Req1 and Req2 that have memory access conflicts and the four registers Reg3, Reg2, Reg1, and Reg0 mentioned above, if the working state binary code is 0011, the number of registers in the idle state (that is, the number of second buffer queue items) is 2. Since the number of write requests with memory access conflicts in the current clock cycle (that is, the third number) is also 2, a number of registers equal to the third number can be selected from registers Reg3 and Reg2 as the storeable registers, which can be represented by the binary code 1100.
[0081] For example, the underlying principles represented by binary codes 1100:0011, 0101:1010, and 0110:1001 are the same as those described above, and will not be repeated here.
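The selection of storeable registers from idle registers can be sketched as a bit-mask operation. This is an illustrative sketch only: the names `storeable_mask` and `encode` are hypothetical, the lowest-order-first rule is just one of the selection rules the text allows, and registers that are about to drain are deliberately ignored here.

```python
NUM_REGS = 4  # Reg3..Reg0; bit i of a mask corresponds to Reg i


def storeable_mask(state, needed, num_regs=NUM_REGS):
    """Pick `needed` idle registers, lowest-order first.

    state  -- working-state mask (1 = busy, 0 = idle)
    needed -- number of conflicting requests (the "third number")
    Returns the storeable mask, or None if too few registers are idle.
    """
    idle = [i for i in range(num_regs) if not (state >> i) & 1]
    if len(idle) < needed:
        return None
    mask = 0
    for i in idle[:needed]:
        mask |= 1 << i
    return mask


def encode(state, mask, num_regs=NUM_REGS):
    """Format as 'working-state:storeable', e.g. '0011:1100'."""
    return f"{state:0{num_regs}b}:{mask:0{num_regs}b}"


for state in (0b0011, 0b1100, 0b0101, 0b0110):
    print(encode(state, storeable_mask(state, needed=2)))
```

Under these assumptions the loop reproduces the four codes quoted in the text: 0011:1100, 1100:0011, 0101:1010, and 0110:1001.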
[0082] In some embodiments of this disclosure, the first scheduling module 202 is further configured to determine buffer queue items that can be stored based on the working status of multiple buffer queue items in the current clock cycle and the first buffer queue item.
[0083] Referring also to Figure 4, for write requests Req1 and Req2 that have memory access conflicts and the four registers Reg3, Reg2, Reg1, and Reg0 mentioned above, if the working state binary code is 0001, the data can either be stored in registers chosen from Reg3, Reg2, and Reg1, or the storeable registers can be determined based on the register that outputs to the memory array (i.e., the first buffer queue item).
[0084] For example, if only register Reg0 is busy (holds access information) during the current clock cycle, it can be determined that the access information stored in register Reg0 will be output to the memory array, and that Reg0 will become idle, during the current clock cycle. Therefore, if a rule of prioritizing storage in low-order registers has been set, the storeable registers can be determined as registers Reg1 and Reg0, which means that the storeable binary code can be 0011.
[0085] For example, the principles represented by binary codes 0000:0011, 0010:0011, 0100:0011, and 1000:0011 are all the same as those described above, and will not be repeated here.
[0086] In some embodiments of this disclosure, the first scheduling module 202 is further configured to, in response to the number of second buffer queue items (buffer queue items whose working state is idle) being less than the third number of the at least two access addresses with memory access conflicts, and in response to the sum of the number of second buffer queue items and the number of first buffer queue items being greater than or equal to the third number, determine the third number of storeable buffer queue items from among the second buffer queue items and the first buffer queue items.
[0087] Referring also to Figure 4, for write requests Req1 and Req2 with memory access conflicts and the above four registers Reg3, Reg2, Reg1, and Reg0, if the working state binary code is 0111, only register Reg3 is idle, so the number of idle registers (that is, the number of second buffer queue items) is 1, which is less than the number of write requests with memory access conflicts, namely 2 (that is, the third number). It is then necessary to determine the number of registers that will output to the memory array in the current clock cycle (that is, the number of first buffer queue items).
[0088] For example, if it is determined that register Reg2 (i.e., the first buffer queue item) will output access information to the memory array in the current clock cycle, then the sum of the number of output registers, 1 (i.e., the number of first buffer queue items), and the number of idle registers, 1, equals the number of write requests, 2. Therefore, the two registers Reg3 and Reg2 can be identified as the storeable registers (i.e., storeable buffer queue items).
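The case where idle registers alone are insufficient can be sketched by also counting registers predicted to drain in the current cycle. This is a sketch under stated assumptions: the function name `storeable_with_drain` and the `draining` mask are hypothetical, and picking the lowest-order candidates is only one permitted rule.

```python
def storeable_with_drain(state, draining, needed, num_regs=4):
    """Choose storeable registers among idle ones and ones about to drain.

    state    -- working-state mask (1 = busy, 0 = idle), bit i = Reg i
    draining -- mask of busy registers predicted to output this cycle
    needed   -- number of conflicting requests (the "third number")
    Returns the storeable mask, or None if idle + draining < needed.
    """
    full = (1 << num_regs) - 1
    candidates = (~state & full) | (state & draining)
    picked = [i for i in range(num_regs) if (candidates >> i) & 1][:needed]
    if len(picked) < needed:
        return None
    return sum(1 << i for i in picked)


# State 0111 with Reg2 draining: Reg3 (idle) and Reg2 (draining) are used.
print(f"{storeable_with_drain(0b0111, 0b0100, 2):04b}")  # 1100
# State 0001 with Reg0 draining and low-order priority: Reg1 and Reg0.
print(f"{storeable_with_drain(0b0001, 0b0001, 2):04b}")  # 0011
```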
[0089] For example, the first scheduling module 202 can be configured with the same rules for determining the first buffer queue item as the second scheduling module 203, meaning the first scheduling module 202 can independently determine the buffer queue item that can be stored. By configuring the same rules for determining the first buffer queue item in the first scheduling module 202 as in the second scheduling module 203, the first buffer queue item determined by the second scheduling module 203 can be accurately predicted and determined, thereby accurately determining the buffer queue item that can be stored and avoiding errors in the buffer queue item determined in the current period that could cause access information to be overwritten or lost.
[0090] In some embodiments of this disclosure, the second scheduling module 203 is further configured to update the working status of the first buffer queue item from busy to idle in response to the first buffer queue item outputting the corresponding access information to the storage array.
[0091] For example, continuing to refer to Figure 4, after determining the first buffer queue entries, the second scheduling module 203 can output the access information in each first buffer queue entry to the data buffer 106 (for example, outputting n KB of access information, where n is a positive integer in the example of Figure 4), so that the access information is input into the storage array through the data buffer 106. The description of data transmission through the data buffer 106 is the same as above, and will not be repeated here.
[0092] For example, after a register that is busy in the current clock cycle outputs access information to the memory array, the working state of that register will be updated in the next clock cycle.
[0093] For example, when two memory access requests with conflicting access are received in each clock cycle, if the four registers Reg3, Reg2, Reg1, and Reg0 are all initially idle (i.e., 0000), then the binary code for the first clock cycle (Cycle1) is 0000:0011 (access information is stored in registers Reg1 and Reg0 in the first clock cycle); the binary code for the second clock cycle (Cycle2) can be 0011:1100 (no access information can be output to the memory array in the first clock cycle, and access information is stored in registers Reg3 and Reg2); and the binary code for the third clock cycle (Cycle3) can be 1110:0011 (the access information in register Reg0 is output to the memory array in the second clock cycle, the access information in register Reg1 is expected to be output to the memory array in the third clock cycle, and access information is stored in registers Reg1 and Reg0 in the third clock cycle).
[0094] The method by which the second scheduling module 203 determines the output register (i.e., the first buffer queue item) is described in detail below with reference to Figure 4.
[0095] In some embodiments of this disclosure, the second scheduling module 203 is further configured to determine a first buffer queue item according to the buffer queue item output principle, wherein the buffer queue item output principle includes a priority principle.
[0096] For example, the second scheduling module 203 is further configured to, in response to at least two of the multiple buffer queue entries storing access information, determine the buffer queue entry whose access information has been stored the longest as the first sub-buffer queue entry according to the priority principle, and, in response to the sum of the number of first sub-buffer queue entries and the number of second buffer queue entries being greater than or equal to the third number, determine the first sub-buffer queue entry as the first buffer queue entry.
[0097] As shown in Figure 4, a counter 205 can be provided in the storage access device. The counter 205 counts, clock cycle by clock cycle, for each buffer queue item storing access information, so as to determine how long the access information has been stored in each buffer queue item. For example, the counter 205 can be implemented as a hardware counter or in software, and this disclosure does not limit this.
[0098] For example, continuing to refer to Figure 4, if two write requests Req1 and Req2 that have memory access conflicts need to be processed in each clock cycle, the binary code representation of Cycle1 for registers Reg3, Reg2, Reg1, and Reg0 is 0000:0011. Since all registers are idle at the start of Cycle1, there is no register that has stored access information for the longest time, and no access information is output in Cycle1.
[0099] Since registers Reg1 and Reg0 now store access information, the binary code representation of Cycle2 is 0011:1100. Registers Reg1 and Reg0 are tied as the registers that have stored access information for the longest time in the current clock cycle (e.g., both have a count of 1), and the access information stored in them has a memory access conflict, so one of registers Reg1 and Reg0 can be randomly selected for output. Furthermore, since the register to be output cannot be uniquely determined between Reg1 and Reg0, idle registers are selected to store the access information received in Cycle2.
[0100] For example, if the register that outputs access information in Cycle2 is register Reg0, the binary code representation of Cycle3 is 1110:0011. Since the access information in register Reg1 was stored in Cycle1 and has therefore been stored longer than the access information stored in registers Reg3 and Reg2 in Cycle2 (for example, registers Reg3 and Reg2 each have a count of 1, while register Reg1 has a count of 2), register Reg1 is selected as the register for outputting access information in Cycle3 (that is, register Reg1 is the first sub-buffer queue item determined according to the priority principle).
[0101] Since the number of first sub-buffer queue entries (register Reg1) is 1 and the number of second buffer queue entries (register Reg0, in the idle state) is 1, the sum of the two is 2. This is equal to the number of write requests Req1 and Req2 that have memory access conflicts (that is, the third number is 2), so the two conflicting write requests can be handled in the current cycle. Therefore, the first sub-buffer queue entry can be determined as the first buffer queue entry.
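The priority principle walked through in Cycle1 to Cycle3 can be sketched as below. This is a simplified model rather than the disclosed hardware: `first_entry_by_priority` is a hypothetical name, the per-register counter values are passed in as a list, and a tie between oldest registers is reported as `None` because the text resolves it by random selection, which cannot be predicted.

```python
def first_entry_by_priority(age, needed):
    """Return the index of the register chosen for output, or None.

    age    -- age[i] is the counter value of Reg i (cycles its access
              information has been held), or None if Reg i is idle
    needed -- number of conflicting requests (the "third number")
    """
    busy = {i: a for i, a in enumerate(age) if a is not None}
    if not busy:
        return None  # nothing stored, nothing to output (Cycle1)
    top = max(busy.values())
    oldest = [i for i, a in busy.items() if a == top]
    if len(oldest) != 1:
        return None  # tie: the text picks one at random, so not predictable
    idle = len(age) - len(busy)
    # The oldest entry qualifies only if draining it frees enough room.
    return oldest[0] if 1 + idle >= needed else None


print(first_entry_by_priority([None, None, None, None], 2))  # None (Cycle1)
print(first_entry_by_priority([1, 1, None, None], 2))        # None (Cycle2 tie)
print(first_entry_by_priority([None, 2, 1, 1], 2))           # 1 (Cycle3, Reg1)
```

Returning `None` when all registers are busy corresponds to the situation where the priority principle alone is insufficient.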
[0102] For example, the first buffer queue item involved in the first scheduling module 202 when determining the item that can be stored in the buffer queue is also determined in accordance with the output principle of the buffer queue item.
[0103] Based on the example above, the first scheduling module 202 can execute the corresponding binary code: {1110: Reg3_cnt_max?1001; Reg2_cnt_max?0101; 0011}, where “Reg3_cnt_max?” tests whether register Reg3 is the register that has stored access information for the longest time. Since neither register Reg3 nor register Reg2 has stored access information for the longest time, the first scheduling module 202 determines the storeable registers of the current cycle according to the working state binary code, and stores the access information of the current cycle into registers Reg1 and Reg0 as indicated by the binary code 0011.
[0104] For example, the first scheduling module 202 can also execute various binary codes such as the following:
[0105] {0111:Reg2_cnt_max?1100; Reg1_cnt_max?1010; 1001}
[0106] {1101: Reg3_cnt_max?1010; Reg2_cnt_max?0110;0011} or
[0107] {1011: Reg3_cnt_max?1100; Reg1_cnt_max?0110; 0101} etc.
[0108] Since the principles represented by these binary codes are similar to those of the aforementioned binary codes: {1110: Reg3_cnt_max?1001; Reg2_cnt_max?0101; 0011}, they can be deduced by analogy, and will not be elaborated here.
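The four selection tables above share one pattern: with exactly one idle register, the storeable code is the idle register's bit combined with the bit of the longest-stored register. A minimal sketch of that pattern follows; the function name `storeable_one_idle` is hypothetical, and the rule is inferred from the listed codes rather than stated explicitly in the text.

```python
def storeable_one_idle(state, oldest, num_regs=4):
    """Storeable mask when exactly one register is idle.

    state  -- working-state mask (1 = busy, 0 = idle), bit i = Reg i
    oldest -- index of the busy register that has held data the longest
              (the register for which 'RegN_cnt_max?' is true)
    """
    full = (1 << num_regs) - 1
    idle = ~state & full
    if bin(idle).count("1") != 1:
        raise ValueError("expected exactly one idle register")
    if not (state >> oldest) & 1:
        raise ValueError("the oldest register must be busy")
    # The idle register plus the register that drains this cycle.
    return idle | (1 << oldest)


for oldest in (3, 2, 1):
    code = storeable_one_idle(0b1110, oldest)
    print(f"1110, Reg{oldest} oldest -> {code:04b}")
```

For state 1110 this yields 1001, 0101, and 0011, matching the table {1110: Reg3_cnt_max?1001; Reg2_cnt_max?0101; 0011}.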
[0109] For example, if the multiple buffer queue entries are all busy in the current clock cycle, then even if the first sub-buffer queue entry determined by the priority principle is used as the first buffer queue entry, the multiple buffer queue entries cannot receive two requests with access conflicts in the current clock cycle. So that the memory array can still follow the one-cycle-in, one-cycle-out access mode as far as possible when memory access conflicts occur, it can further be determined, on the basis of the above priority principle, whether there are other registers that can output access information to the memory array.
[0110] For example, in some embodiments of this disclosure, the buffer queue item output principle also includes a storage access conflict principle.
[0111] For example, the second scheduling module 203 is also configured to, in response to the sum of the number of the first buffer queue entries and the number of the second buffer queue entries being less than a third number, determine a second sub-buffer queue entry that does not have a memory access conflict with the first sub-buffer queue entry according to the memory access conflict principle.
[0112] For example, after the second scheduling module 203 determines the buffer queue item (i.e., the first sub-buffer queue item) that has been stored for the longest time using the priority principle, it can determine whether there is a second sub-buffer queue item among the multiple buffer queue items that does not have a memory access conflict with the first sub-buffer queue item.
[0113] Furthermore, the first sub-buffer queue item and the second sub-buffer queue item are determined as the first buffer queue items, where the sum of the number of first sub-buffer queue items, the number of second sub-buffer queue items, and the number of second buffer queue items is greater than or equal to the third number.
[0114] Referring also to Figure 4, for write requests Req1 and Req2 that have memory access conflicts and the four registers Reg3, Reg2, Reg1, and Reg0, if the working status binary code is 1111, then all registers are busy. In this case, the above memory access conflict principle can also be used when the first scheduling module 202 determines the storeable buffer queue items.
[0115] For example, the first scheduling module 202 can execute the following binary code:
[0116] {1111: Reg3_cnt_max?((~32_bc) ?1100; (~31_bc) ?1010; (~30_bc) ?1001;0000) ; Reg2_cnt_max?((~23_bc) ?1100; (~21_bc) ?0110; (~20_bc) ?0101; 0000);Reg1_cnt_max?((~13_bc) ?1010; (~12_bc) ?0110; (~10_bc) ?0011; 0000); ((~03_bc) ?1001; (~02_bc) ?0101; (~01_bc) ?0011; 0000))},
[0117] Here, “(~32_bc) ?” tests whether there is no memory access conflict between the access information stored in register Reg3 (the longest-stored register) and the access information stored in register Reg2. If the test is true, there is no memory access conflict between register Reg3 and register Reg2, which means that the access information of both registers can be output to the memory array in the current clock cycle. Therefore, in the current clock cycle, the access information of write requests Req1 and Req2 that have memory access conflicts can be stored in registers Reg3 and Reg2 respectively; that is, the storeable binary code for the current clock cycle is 1100.
[0118] The meanings of “(~23_bc) ?” and “(~12_bc) ?” in the binary code above are similar to those of “(~32_bc) ?” above, and can be deduced from the meaning of “(~32_bc) ?”, so they will not be elaborated here.
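The all-busy table can be read as: take the longest-stored register, scan the remaining registers from the highest order down for one whose access does not conflict with it, and mark both as storeable; otherwise nothing is storeable. The sketch below models this reading. The name `storeable_all_busy` and the per-register `bank` list are assumptions, and a conflict is modeled simply as two registers targeting the same bank.

```python
def storeable_all_busy(oldest, bank, num_regs=4):
    """Storeable mask when every register is busy (state 1111).

    oldest -- index of the register that has held data the longest
    bank   -- bank[i] is the memory bank targeted by the access in Reg i
    Returns the storeable mask, or 0 (binary 0000) if every other register
    conflicts with the oldest one.
    """
    # Scan order follows the table: highest-order register first.
    for other in range(num_regs - 1, -1, -1):
        if other != oldest and bank[other] != bank[oldest]:
            # Both registers can drain together, so both can be refilled.
            return (1 << oldest) | (1 << other)
    return 0


# Reg3 is oldest; Reg2 targets a different bank, so "(~32_bc)" holds.
print(f"{storeable_all_busy(3, [0, 0, 1, 0]):04b}")  # 1100
# Every register targets the same bank: nothing can be stored.
print(f"{storeable_all_busy(3, [5, 5, 5, 5]):04b}")  # 0000
```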
[0119] By applying different priority principles and memory access conflict principles to the output of buffer queue entries under different memory access conflict conditions, the first buffer queue entry for outputting access information to the memory array in the memory access device can be flexibly determined according to different access conflict conditions. This allows for flexible determination of the buffer queue entries that can be stored in the current clock cycle. This not only reduces the hardware overhead of handling memory access conflicts and the chip area overhead, but also enables the memory array to respond to access requests according to the ideal access pattern (i.e., 1 cycle in, 1 cycle out) to the greatest extent possible.
[0120] It should be noted that the modules in the above-mentioned storage access device can be combined and separated as needed, and this disclosure does not impose any restrictions on this.
[0121] At least one embodiment of this disclosure also provides a storage access method. Figure 5 shows an exemplary flowchart of a storage access method provided by at least one embodiment of the present disclosure.
[0122] As shown in Figure 5, the storage access method is executed, for example, by a processor or computer system, and includes the following steps S10 and S20.
[0123] Step S10: In response to the storage access conflict between the access addresses of at least two access requests, the access information corresponding to the at least two access addresses is stored into different buffer queue entries in multiple buffer queue entries respectively.
[0124] Step S20: Determine, among the multiple buffer queue entries, the first buffer queue entry that is to output its corresponding access information to the storage array, so as to output the access information stored in the first buffer queue entry to the target storage group among the multiple storage groups in the storage array.
[0125] In this context, multiple buffer queue entries are shared by multiple memory groups.
[0126] For example, the first number of multiple buffer queue items is less than twice the second number of multiple memory groups, and the first number is greater than or equal to the maximum memory access conflict number, which is the maximum number of memory access conflicts that may occur in each memory group among the multiple memory groups.
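The sizing constraint on the number of buffer queue items can be written as a simple check. This is only an illustration of the inequality; the function name `valid_entry_count` and the numeric values used below are hypothetical and do not come from the disclosure.

```python
def valid_entry_count(first_number, second_number, max_conflicts):
    """Check max_conflicts <= first_number < 2 * second_number.

    first_number  -- number of buffer queue items
    second_number -- number of memory groups sharing those items
    max_conflicts -- maximum number of conflicts possible per memory group
    """
    return max_conflicts <= first_number < 2 * second_number


# Hypothetical sizes: 4 buffer queue items shared by 16 memory groups,
# with at most 2 conflicting accesses per group.
print(valid_entry_count(4, 16, 2))  # True
# 4 items for only 2 groups does not satisfy "less than twice".
print(valid_entry_count(4, 2, 2))   # False
```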
[0127] In some embodiments of this disclosure, the above-described storage access method further includes: determining the storeable buffer queue items for the current clock cycle based on the working status of multiple buffer queue items in the current clock cycle, wherein each storeable buffer queue item is used to store each access information respectively, and the working status of the multiple buffer queue items includes a busy state and an idle state.
[0128] In some embodiments of this disclosure, the above-described storage access method further includes: in response to the number of second buffer queue entries (buffer queue entries whose working state is idle) being greater than or equal to a third number, namely the number of the at least two access addresses with storage access conflicts, determining the third number of storeable buffer queue entries from among the second buffer queue entries.
[0129] In some embodiments of this disclosure, the above-described storage access method further includes: determining a buffer queue item that can be stored based on the working status of multiple buffer queue items in the current clock cycle and a first buffer queue item.
[0130] In some embodiments of this disclosure, the above-described storage access method further includes: in response to the fact that the number of buffer queue entries of a second buffer queue entry whose working state is idle is less than a third number of at least two access addresses with storage access conflicts, and in response to the fact that the sum of the number of buffer queue entries of the second buffer queue entry and the number of buffer queue entries of the first buffer queue entry is greater than or equal to the third number, determining a third number of storeable buffer queue entries in the second buffer queue entry and the first buffer queue entry.
[0131] In some embodiments of this disclosure, the above storage access method further includes: in response to the first buffer queue item outputting the corresponding access information to the storage array, updating the working state of the first buffer queue item to the first scheduling module from a busy state to an idle state.
[0132] In some embodiments of this disclosure, the above storage access method further includes: determining a first buffer queue item according to the buffer queue item output principle, wherein the buffer queue item output principle includes a priority principle.
[0133] For example, in response to at least two of the multiple buffer queue entries storing access information, the buffer queue entry whose access information has been stored the longest is determined as the first sub-buffer queue entry according to the priority principle, and, in response to the sum of the number of first sub-buffer queue entries and the number of second buffer queue entries being greater than or equal to the third number, the first sub-buffer queue entry is determined as the first buffer queue entry.
[0134] For example, the buffer queue item output principle also includes the storage access conflict principle.
[0135] In some embodiments of this disclosure, the above-described storage access method further includes: in response to the sum of the number of first sub-buffer queue entries and the number of second buffer queue entries being less than the third number, determining a second sub-buffer queue entry that does not have a storage access conflict with the first sub-buffer queue entry according to the storage access conflict principle; and determining the first sub-buffer queue entry and the second sub-buffer queue entry as the first buffer queue entries, wherein the sum of the number of first sub-buffer queue entries, the number of second sub-buffer queue entries, and the number of second buffer queue entries is greater than or equal to the third number.
[0136] The technical effects of the storage access method in the above embodiments of this disclosure are the same as those of the storage access device described above, and therefore will not be repeated.
[0137] Figure 6 shows an exemplary block diagram of an electronic device provided by at least one embodiment of the present disclosure.
[0138] At least one embodiment of this disclosure also provides an electronic device. As shown in Figure 6, the electronic device 30 includes the storage access device 200 and the storage array 204 described in at least one of the above embodiments.
[0139] The storage access device 200 may be located inside the memory, for example, it may be located in the memory controller, and this disclosure does not impose any limitations.
[0140] In some embodiments of this disclosure, the storage array 204 may be a memory, such as a semiconductor memory cell. For example, the memory may be any memory with storage function, such as dynamic random access memory (DRAM), random access memory (RAM), or static random access memory (SRAM). For example, it may be the outermost storage in a GPGPU, low power double data rate synchronous dynamic random access memory (LPDDR), etc. This disclosure does not limit it.
[0141] The technical effects of the electronic device described in the above embodiments of this disclosure are the same as those of the storage access device described above, and therefore will not be repeated here.
[0142] Figure 7 shows an exemplary block diagram of an electronic device provided by at least one embodiment of the present disclosure.
[0143] The electronic device 1000 illustrated in Figure 7 is merely an example and should not be construed as limiting the functionality and scope of use of the embodiments of this disclosure. For instance, the electronic device 1000 may include the computer system 20 in at least one embodiment of this disclosure.
[0144] The electronic devices in this disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (e.g., in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
[0145] For example, referring to Figure 7, in some examples, electronic device 1000 includes a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 1001, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 1002 or a program loaded from storage access device 2008 into random access memory (RAM) 1003. For example, processing device 1001 can execute the storage access method provided in any of the above embodiments of this disclosure; for example, processing device 1001 can be a GPGPU. Various programs and data required for the operation of the computer system are also stored in RAM 1003. Processing device 1001, ROM 1002, and RAM 1003 are connected to one another via interconnection network 1004. Input/output (I/O) interface 1005 is also connected to interconnection network 1004.
[0146] For example, the following components can be connected to I/O interface 1005: input devices 1006 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 1007 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage access devices 2008 including, for example, magnetic tapes, hard disks, etc.; and communication devices 1009, including, for example, network interface cards such as LAN cards and modems. Communication device 1009 allows electronic device 1000 to exchange data with other devices by wireless or wired means, performing communication processing via networks such as the Internet. Drive 1010 is also connected to I/O interface 1005 as needed. Removable media 1011, such as magnetic disks, optical disks, magneto-optical disks, semiconductor memories, etc., are installed on drive 1010 as needed so that computer programs read from them can be installed into storage access device 2008 as needed. Although Figure 7 shows an electronic device 1000 including various devices, it should be understood that it is not required to implement or include all of the devices shown. More or fewer devices may alternatively be implemented or included.
[0147] For example, the electronic device 1000 may further include a peripheral interface (not shown in the figure). This peripheral interface can be various types of interfaces, such as a USB interface, a Lightning interface, etc. The communication device 1009 can communicate wirelessly with a network and other devices, such as the Internet, an intranet, and / or a wireless network such as a cellular telephone network, a wireless local area network (LAN), and / or a metropolitan area network (MAN). Wireless communication can use any of a variety of communication standards, protocols, and technologies, including but not limited to Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (W-CDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Bluetooth, Wi-Fi (e.g., based on IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, and / or IEEE 802.11n standards), Voice over Internet Protocol (VoIP), Wi-MAX, protocols for email, instant messaging, and / or Short Message Service (SMS), or any other suitable communication protocol.
[0148] For example, the electronic device 1000 can be any device such as a mobile phone, tablet computer, laptop computer, e-book, game console, television, digital photo frame, navigator, server, etc., or any combination of data processing device and hardware. The embodiments disclosed herein do not limit this.
[0149] The following points need to be clarified regarding this disclosure:
[0150] (1) The accompanying drawings of the embodiments of this disclosure relate only to the structures involved in the embodiments of this disclosure. For other structures, reference may be made to conventional designs.
[0151] (2) Where there is no conflict, features of the same embodiment and different embodiments of this disclosure can be combined with each other.
[0152] The above are merely specific embodiments of this disclosure, but the scope of protection of this disclosure is not limited thereto. Any variations or substitutions that those skilled in the art could readily conceive within the technical scope disclosed herein shall fall within the scope of protection of this disclosure. Therefore, the scope of protection of this disclosure shall be determined by the scope of the claims.
Claims
1. A storage access device, characterized in that the storage access device comprises: a plurality of buffer queue items; a first scheduling module, connected to the input ports of the plurality of buffer queue items respectively, and configured to, in response to a memory access conflict between the access addresses of at least two access requests, store the access information corresponding to the at least two access addresses into different buffer queue items respectively; and a second scheduling module, connected to the output ports of the plurality of buffer queue items respectively, and configured to determine, from the plurality of buffer queue items, a first buffer queue item that is to output its corresponding access information to a storage array; wherein the plurality of buffer queue items are shared by a plurality of storage groups in the storage array.
2. The storage access device according to claim 1, characterized in that a first number of the plurality of buffer queue items is less than twice a second number of the plurality of storage groups, and the first number is greater than or equal to a maximum memory access conflict number, wherein the maximum memory access conflict number is the maximum number of memory access conflicts that may occur in each of the plurality of storage groups.
3. The storage access device according to claim 1, characterized in that the first scheduling module is further configured to determine, based on the working states of the plurality of buffer queue items in the current clock cycle, the storable buffer queue items for the current clock cycle, wherein the storable buffer queue items are used to respectively store the access information, and the working states of the plurality of buffer queue items include a busy state and an idle state.
4. The storage access device according to claim 3, characterized in that the first scheduling module is further configured to, in response to the number of second buffer queue items, whose working state is the idle state, being greater than or equal to a third number of the at least two access addresses that have the memory access conflict, determine the third number of storable buffer queue items from the second buffer queue items.
5. The storage access device according to claim 3, characterized in that the first scheduling module is further configured to determine the storable buffer queue items based on the working states of the plurality of buffer queue items in the current clock cycle and on the first buffer queue item.
6. The storage access device according to claim 5, characterized in that the first scheduling module is further configured to, in response to the number of second buffer queue items being less than a third number of the at least two access addresses that have the memory access conflict, and in response to the sum of the number of second buffer queue items and the number of first buffer queue items being greater than or equal to the third number, determine the third number of storable buffer queue items from the second buffer queue items and the first buffer queue items.
7. The storage access device according to claim 3, characterized in that the second scheduling module is further configured to update the working state of the first buffer queue item from the busy state to the idle state in response to the first buffer queue item outputting its corresponding access information to the storage array.
8. The storage access device according to any one of claims 1-7, characterized in that the second scheduling module is further configured to determine the first buffer queue item according to a buffer queue item output principle, wherein the buffer queue item output principle includes a priority principle; and the second scheduling module is further configured to, in response to at least two of the plurality of buffer queue items storing access information, determine, according to the priority principle, the buffer queue item whose access information has been stored the longest as a first sub-buffer queue item, and, in response to the sum of the number of first sub-buffer queue items and the number of second buffer queue items being greater than or equal to a third number, determine the first sub-buffer queue item as the first buffer queue item.
9. The storage access device according to claim 8, characterized in that the buffer queue item output principle further includes a memory access conflict principle; and the second scheduling module is further configured to, in response to the sum of the number of first sub-buffer queue items and the number of second buffer queue items being less than the third number, determine, according to the memory access conflict principle, a second sub-buffer queue item that has no memory access conflict with the first sub-buffer queue item, and determine the first sub-buffer queue item and the second sub-buffer queue item as the first buffer queue item, wherein the sum of the number of first sub-buffer queue items, the number of second sub-buffer queue items, and the number of second buffer queue items is greater than or equal to the third number.
10. An electronic device, characterized in that the electronic device comprises the storage access device according to any one of claims 1-9 and the storage array.
11. A storage access method, characterized in that the storage access method comprises: in response to a memory access conflict between the access addresses of at least two access requests, storing the access information corresponding to the at least two access addresses into different buffer queue items among a plurality of buffer queue items respectively; and determining, from the plurality of buffer queue items, a first buffer queue item that is to output access information to a storage array, so as to output the access information stored in the first buffer queue item to a target storage group among a plurality of storage groups in the storage array; wherein the plurality of buffer queue items are shared by the plurality of storage groups.
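The scheduling behavior recited in the claims above can be illustrated with a small software model. The following Python sketch is an assumption-laden illustration only, not the claimed circuit: the names `Access`, `SharedBufferQueue`, `store_conflicting`, and `select_outputs` are hypothetical, the per-cycle timing is simplified, and the output step is generalized to pick as many non-conflicting items as needed, whereas the claims describe a first and a second sub-buffer queue item. It shows conflicting accesses being placed into separate idle buffer queue items that are shared by all storage groups, and the longest-stored item being output first, with further items added only if they target a different storage group and more idle items are needed.

```python
from dataclasses import dataclass


@dataclass
class Access:
    group: int        # target storage group of the access address
    address: int
    arrival: int = 0  # clock cycle in which the access was buffered


class SharedBufferQueue:
    """Hypothetical model of buffer queue items shared by all storage groups."""

    def __init__(self, num_items):
        # None marks an idle item; an Access marks a busy item.
        self.items = [None] * num_items

    def idle_indices(self):
        return [i for i, a in enumerate(self.items) if a is None]

    def store_conflicting(self, accesses, cycle):
        """First scheduling step: one idle item per conflicting access."""
        idle = self.idle_indices()
        if len(idle) < len(accesses):
            return False  # too few idle items; nothing is stored
        for idx, acc in zip(idle, accesses):
            acc.arrival = cycle
            self.items[idx] = acc
        return True

    def select_outputs(self, needed=0):
        """Second scheduling step: choose the items to output this cycle.

        The longest-stored item is always chosen. More items are added
        only while idle + freed items < needed, and only if they target
        a storage group not already chosen (no conflict at the array).
        """
        busy = sorted(
            (i for i, a in enumerate(self.items) if a is not None),
            key=lambda i: (self.items[i].arrival, i),
        )
        idle = len(self.idle_indices())
        chosen, used_groups = [], set()
        for i in busy:
            if chosen and idle + len(chosen) >= needed:
                break
            if self.items[i].group in used_groups:
                continue
            chosen.append(i)
            used_groups.add(self.items[i].group)
        outputs = [self.items[i] for i in chosen]
        for i in chosen:
            self.items[i] = None  # working state: busy -> idle
        return outputs


# Usage: two accesses that conflict on storage group 0 are buffered
# separately, then drained one per cycle.
buf = SharedBufferQueue(num_items=4)
buf.store_conflicting([Access(0, 0x10), Access(0, 0x20)], cycle=0)
first = buf.select_outputs()
second = buf.select_outputs()
```

In this model the `needed` argument plays the role of the "third number" (the count of incoming conflicting accesses), so the output step frees just enough items for the next store to succeed.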