An L2P accelerator
By optimizing the access method of L2P tables through an L2P accelerator, the problems of excessive memory space occupation and increased access time caused by the increase in storage device capacity are solved, and more efficient storage device performance is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHENGDU STARBLAZE TECH CO LTD
- Filing Date
- 2022-04-27
- Publication Date
- 2026-06-19
Abstract
Description
Technical Field
[0001] This application generally relates to the field of memory. More specifically, this application relates to an L2P accelerator.
Background Technology
[0002] Figure 1 shows a block diagram of a solid-state storage device (SSD). The solid-state storage device 102 is coupled to a host computer to provide storage capabilities. The host computer and the solid-state storage device 102 can be coupled in various ways, including but not limited to SATA (Serial Advanced Technology Attachment), SCSI (Small Computer System Interface), SAS (Serial Attached SCSI), IDE (Integrated Drive Electronics), USB (Universal Serial Bus), PCIe (Peripheral Component Interconnect Express), NVMe (NVM Express), Ethernet, Fibre Channel, and wireless communication networks. The host computer can be any information processing device capable of communicating with the storage device in the above ways, such as a personal computer, tablet computer, server, laptop computer, network switch, router, cellular phone, or personal digital assistant. The solid-state storage device 102 (hereinafter referred to as the storage device) includes interface 103, control unit 104, one or more NVM chips 105, and DRAM (Dynamic Random Access Memory) 110.
[0003] The aforementioned NVM chip 105 includes NAND flash memory, phase-change memory, FeRAM (Ferroelectric RAM), MRAM (Magnetic Random Access Memory), RRAM (Resistive Random Access Memory), etc., which are common storage media.
[0004] The aforementioned interface 103 can be adapted to exchange data with the host via methods such as SATA, IDE, USB, PCIe, NVMe, SAS, Ethernet, and Fibre Channel.
[0005] The aforementioned control unit 104 is used to control data transmission between interface 103, NVM chip 105, and DRAM 110, and is also used for memory management, mapping of host logical addresses to flash physical addresses, wear leveling, bad block management, etc. Control unit 104 can be implemented in various ways, including software, hardware, firmware, or combinations thereof. For example, control unit 104 can be in the form of an FPGA (Field-Programmable Gate Array), an ASIC (Application-Specific Integrated Circuit), or a combination thereof. Control unit 104 may also include a processor or controller, in which software is executed to manipulate the hardware of control unit 104 to process I/O (Input/Output) commands. Control unit 104 may also include a memory controller for coupling to DRAM 110 and accessing data in DRAM 110.
[0006] The control unit 104 includes a flash interface controller (or media interface, media interface controller, flash channel controller), which is coupled to the NVM chip 105 and issues commands to the NVM chip 105 in accordance with the interface protocol of the NVM chip 105 to operate the NVM chip 105, and receives the command execution results output from the NVM chip 105. Known NVM chip interface protocols include "Toggle", "ONFI", etc.
[0007] NVM storage media typically store and retrieve data in pages, while erasing data in blocks. A block (also called a physical block) on an NVM storage medium contains multiple pages. A page (called a physical page) on the storage medium has a fixed size, such as 17664 bytes. Physical pages can also have other sizes.
[0008] In storage devices, the FTL (Flash Translation Layer) is used to maintain the mapping information from logical addresses to physical addresses. Logical addresses constitute the storage space of the storage device as perceived by upper-layer software such as the operating system. Physical addresses are the addresses used to access the physical storage units of the solid-state storage device. In existing technologies, address mapping can also be implemented using intermediate address formats. For example, a logical address can be mapped to an intermediate address, and then the intermediate address can be further mapped to a physical address. Optionally, the host accessing the storage device provides the FTL.
[0009] A table structure that stores mapping information from logical addresses to physical addresses is called an FTL table (also known as an L2P table). Typically, the data items in an FTL table record the address mapping relationships in storage devices in units of specified storage units (e.g., 512 bytes, 2KB, 4KB, etc.).
[0010] As storage device capacity increases, the L2P table grows to record more storage units and therefore requires more memory. To address the additional storage units, the size of each entry in the L2P table also needs to increase. For example, a 32-bit L2P table entry can address 2^32 (4G) data units. If each data unit is 4KB, 2^32 data units correspond to a 16TB storage capacity, and the L2P table itself is 16GB (4G entries x 4B per entry = 16GB), requiring at least 16GB of memory space. However, storage devices come in various capacities. For example, if the storage device provided to a user has a capacity of 4TB and the entries remain 32 bits wide, the L2P table itself is 4GB in size. With 4KB data units, 4TB of storage space corresponds to 1G units, that is, 2^30 units. The L2P table therefore only needs to manage 2^30 data units, so each entry only needs to be 30 bits in size, and the L2P table size becomes 30 * 2^30 bits (3.75GB, less than 4GB). However, due to limitations imposed by memory chips and CPU addressing methods, CPU buses typically use data widths that are multiples of 32 bits or of a byte, and memory chips also generally use data widths that are multiples of a byte. Therefore, while a 30-bit entry reduces the overall size of the L2P table, entries that cross byte boundaries may require two or more bus or memory accesses to load into the CPU. This significantly increases the loading time for L2P table entries and limits the performance of the storage device.
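The arithmetic in this paragraph can be checked with a short sketch. It assumes binary units (1KB = 2^10 bytes, as the 2^32 and 2^30 figures imply), a tightly packed table, and a 32-bit access width; the function names `l2p_table_bytes` and `straddles_word` are illustrative and do not come from the application.

```python
KIB, GIB, TIB = 2**10, 2**30, 2**40

def l2p_table_bytes(capacity_bytes, unit_bytes, entry_bits):
    """Bytes needed for a tightly packed L2P table, rounded up to a whole byte."""
    entries = capacity_bytes // unit_bytes
    return (entries * entry_bits + 7) // 8

def straddles_word(index, entry_bits, word_bits=32):
    """True if packed entry `index` crosses a word boundary (needs two accesses)."""
    first_bit = index * entry_bits
    last_bit = first_bit + entry_bits - 1
    return first_bit // word_bits != last_bit // word_bits

size_16tb_32bit = l2p_table_bytes(16 * TIB, 4 * KIB, 32) / GIB
size_4tb_32bit = l2p_table_bytes(4 * TIB, 4 * KIB, 32) / GIB
size_4tb_30bit = l2p_table_bytes(4 * TIB, 4 * KIB, 30) / GIB

# Count how many of the first 16 packed 30-bit entries span two 32-bit words.
straddling = sum(straddles_word(i, 30) for i in range(16))

print(size_16tb_32bit, size_4tb_32bit, size_4tb_30bit)  # 16.0 4.0 3.75
print(straddling, "of the first 16 packed 30-bit entries cross a 32-bit word")
```

Under these assumptions the packing pattern repeats every 16 entries (480 bits, or 15 words), and 14 of those 16 entries span two words, which is why the extra accesses matter.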
[0011] To reduce the memory space occupied by L2P tables when providing storage devices of various capacities, and to reduce or eliminate the impact of non-byte-aligned L2P table entries on the CPU or other devices within the chip that access L2P table entries, compressed L2P tables are typically provided. The size of entries in a compressed L2P table may not be an integer multiple of bytes. Furthermore, compressed L2P table entries are tightly packed in memory without leaving unused memory space between entries for byte alignment. However, to eliminate the impact of using compressed L2P tables on the CPU or other devices, the CPU or other devices typically still access the L2P table in their existing byte-aligned or byte-multiple-aligned manner.
Summary of the Invention
[0012] As storage device capacity increases, the L2P table grows to record more storage units and therefore requires more memory. To address the additional storage units, the size of each entry in the L2P table also needs to increase. For example, a 32-bit L2P table entry can address 2^32 (4G) data units. If each data unit is 4KB, 2^32 data units correspond to a 16TB storage capacity, and the L2P table itself is 16GB in size (4G entries x 4B per entry = 16GB), requiring at least 16GB of memory space. Storage devices come in various capacities. For example, if the storage device provided to a user has a capacity of 4TB and the entries remain 32 bits wide, the L2P table itself is 4GB in size. With 4KB data units, 4TB of storage space corresponds to 1G units, that is, 2^30 units. The L2P table therefore only needs to manage 2^30 data units, so each entry only needs to be 30 bits in size, and the L2P table size becomes 30 * 2^30 bits (3.75GB, less than 4GB). However, due to limitations imposed by memory chips and CPU addressing methods, CPU buses typically use data widths that are multiples of 32 bits or of a byte, and memory chips also generally use data widths that are multiples of a byte. Therefore, while a 30-bit entry reduces the overall size of the L2P table, entries that cross byte boundaries may require two or more bus or memory accesses to load into the CPU. This significantly increases the loading time for L2P table entries and limits the performance of the storage device.
[0013] To reduce the memory space occupied by L2P tables when providing storage devices of various capacities, and to reduce or eliminate the impact of non-byte-aligned L2P table entries on CPU or other devices within the chip accessing L2P table entries, compressed L2P tables are typically provided. The size of entries in the provided compressed L2P table may not be an integer multiple of bytes. Furthermore, compressed L2P table entries are tightly packed in memory without leaving unused memory space between entries for byte alignment. However, to eliminate the impact of using compressed L2P tables on the CPU or other devices, the CPU or other devices typically still access the L2P table in their existing manner, either byte-aligned or aligned to integer multiples of bytes. For L2P tables in memory, the master device accesses the L2P table by sending read commands and writes L2P table entries into the L2P table in memory by sending write commands. Embodiments of this application aim to accelerate the processing of read commands sent by the master device to access the L2P table and write commands instructing the writing of L2P table entries to the L2P table using hardware accelerators, thereby offloading the CPU load and improving the performance of the storage device.
[0014] According to a first aspect of this application, a first L2P accelerator is provided for coupling a master device and a memory, and for accelerating the processing of read and write commands issued by the master device to an L2P table in the memory. The accelerator includes a write channel and a read channel, wherein:
[0015] The read channel, in response to receiving one or more first read commands from the master device, generates one or more second read commands based on each first read command; and, in response to receiving first response data from the memory for each second read command, determines the L2P table entry to be read indicated by each first read command based on all the first response data corresponding to that first read command, and sends the L2P table entry corresponding to each first read command, together with first protocol information, to the master device as the response to that first read command;
[0016] The write channel, in response to receiving one or more write commands from the master device, obtains the address index and L2P table entry corresponding to each write command; determines, based on the address index corresponding to each write command and the number of valid data bits of its L2P table entry, one or more memory addresses corresponding to the write command and the position in the memory of the first bit of the valid data of its L2P table entry; and writes the valid data of the corresponding L2P table entry into the memory based on the one or more memory addresses corresponding to each write command and the position of that first bit in the memory.
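The read path described above can be sketched in software as follows. This is only an illustration of the address calculation, not the claimed hardware: it assumes 4-byte aligned memory accesses, little-endian bit numbering within the packed table, and no protocol information. The names `second_read_addresses` and `read_entry` are hypothetical.

```python
WORD_BYTES = 4  # assumed width of one aligned memory access

def second_read_addresses(index, entry_bits):
    """Aligned byte addresses of the words that cover packed entry `index`."""
    first_bit = index * entry_bits
    last_bit = first_bit + entry_bits - 1
    first_word = first_bit // (8 * WORD_BYTES)
    last_word = last_bit // (8 * WORD_BYTES)
    return [w * WORD_BYTES for w in range(first_word, last_word + 1)]

def read_entry(memory, index, entry_bits):
    """Serve one 'first read command' by issuing aligned 'second read commands'."""
    addrs = second_read_addresses(index, entry_bits)
    # Each second read command returns one aligned word of response data.
    responses = [memory[a:a + WORD_BYTES] for a in addrs]
    combined = int.from_bytes(b"".join(responses), "little")
    # Position of the entry's first bit inside the combined response data.
    shift = index * entry_bits - addrs[0] * 8
    return (combined >> shift) & ((1 << entry_bits) - 1)

# Build a small packed table of 30-bit entries for demonstration.
values = [0x3FFFFFFF, 0x12345678, 5, 0x2AAAAAAA]
packed = 0
for i, v in enumerate(values):
    packed |= v << (i * 30)
memory = packed.to_bytes(16, "little")

print(second_read_addresses(1, 30))                # [0, 4]
print([hex(read_entry(memory, i, 30)) for i in range(4)])
```

Entry 1 starts at bit 30, so it spans the words at byte addresses 0 and 4 and needs two second read commands; the master device still sees a single response.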
[0017] According to the first L2P accelerator provided in the first aspect of this application, a second L2P accelerator according to this application is provided, wherein the read channel further, in response to the presence of a first write command among the one or more write commands, wherein the valid data of the first L2P table entry indicated by the first write command is not byte-aligned and/or the first bit of the valid data of the first L2P table entry is not located at the starting position of its corresponding storage unit in the memory, generates one or more third read commands according to the memory address corresponding to the first write command, and sends the one or more third read commands to the memory;
[0018] The write channel, in response to receiving second response data from all third read commands fed back by the memory, combines the valid data with a portion of the data in the second response data according to the position of the first bit in the valid data of the first L2P table entry in the memory to obtain first data; and generates second data according to the second protocol information stored in the cache and the first data, and sends the second data to the memory.
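The read-modify-write step for an unaligned entry can be sketched as below. The sketch assumes 4-byte aligned accesses and little-endian bit numbering, and it leaves out the second protocol information; `write_entry` is a hypothetical name, not one used by the application.

```python
WORD_BYTES = 4  # assumed width of one aligned memory access

def write_entry(memory, index, entry_bits, value):
    """Merge the valid data of one entry into the aligned words that hold it."""
    first_bit = index * entry_bits
    last_bit = first_bit + entry_bits - 1
    start = (first_bit // (8 * WORD_BYTES)) * WORD_BYTES
    end = (last_bit // (8 * WORD_BYTES) + 1) * WORD_BYTES
    # "Third read commands": fetch the current contents of the covering words.
    old = int.from_bytes(memory[start:end], "little")
    shift = first_bit - start * 8
    mask = ((1 << entry_bits) - 1) << shift
    # Keep the neighbouring bits from the response data, replace only the entry.
    new = (old & ~mask) | ((value << shift) & mask)
    memory[start:end] = new.to_bytes(end - start, "little")

memory = bytearray(b"\xff" * 8)
write_entry(memory, 1, 30, 0x155)

table = int.from_bytes(memory, "little")
entry0 = table & ((1 << 30) - 1)
entry1 = (table >> 30) & ((1 << 30) - 1)
tail = table >> 60
print(hex(entry0), hex(entry1), hex(tail))
```

Only the 30 bits of entry 1 change; entry 0 and the bits after bit 59 keep the values that were read back from memory.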
[0019] According to the first or second L2P accelerator provided in the first aspect of this application, a third L2P accelerator according to this application is provided, wherein the one or more write commands include a second write command and a third write command, the second write command indicates a second L2P table entry, the third write command indicates a third L2P table entry, and the second L2P table entry and the third L2P table entry are different entries in the L2P table.
[0020] The write channel, in response to the second L2P table entry and the third L2P table entry being concatenable, concatenates the valid data of the second L2P table entry and the valid data of the third L2P table entry to obtain one or more pieces of concatenated data, and then writes the concatenated data into the memory.
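A minimal sketch of concatenation follows. The application does not define here when two entries "can be concatenated"; the sketch assumes, purely for illustration, that this means their address indexes are adjacent, so that their valid data occupies one contiguous bit range. The name `try_concatenate` is hypothetical.

```python
def try_concatenate(index_a, value_a, index_b, value_b, entry_bits):
    """Return (first_index, data, total_bits) for adjacent entries, else None."""
    if index_b == index_a + 1:
        first_index, low, high = index_a, value_a, value_b
    elif index_a == index_b + 1:
        first_index, low, high = index_b, value_b, value_a
    else:
        return None  # assumed rule: only adjacent entries are concatenated
    # The lower-indexed entry occupies the lower bits of the combined data.
    return first_index, low | (high << entry_bits), 2 * entry_bits

merged = try_concatenate(4, 0x1, 5, 0x2, 30)
swapped = try_concatenate(5, 0x2, 4, 0x1, 30)
apart = try_concatenate(4, 0x1, 7, 0x2, 30)
print(merged, swapped, apart)
```

Writing the 60-bit concatenated data once replaces two separate writes to overlapping memory words.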
[0021] According to the third L2P accelerator provided in the first aspect of this application, a fourth L2P accelerator according to this application is provided, wherein the read channel generates one or more fourth read commands based on one or more memory addresses corresponding to the second write command, and the write channel, in response to receiving third response data of all fourth read commands fed back by the memory, combines the third response data with the concatenated data to obtain third data, and sends the third data and the second protocol information to the memory; or
[0022] The read channel generates one or more fifth read commands based on one or more memory addresses corresponding to the third write command, and the write channel, in response to receiving fourth response data of all fifth read commands fed back by the memory, combines the fourth response data with the concatenated data to obtain fourth data, and sends the fourth data and the second protocol information to the memory.
[0023] According to the fourth L2P accelerator provided in the first aspect of this application, a fifth L2P accelerator according to this application is provided, wherein, in response to the second write command having one or more corresponding fourth read commands and the third write command having one or more corresponding fifth read commands, the write channel, upon receiving the third response data and the fourth response data, combines the concatenated data with the third response data and the fourth response data to obtain fifth data, and sends the fifth data and the second protocol information to the memory.
[0024] According to the third to fifth L2P accelerators provided in the first aspect of this application, a sixth L2P accelerator according to this application is provided, wherein the write channel, in response to the fact that the second L2P table entry and the third L2P table entry cannot be concatenated, and that the memory addresses indicated by one or more fourth read commands and one or more fifth read commands do not conflict, writes the valid data of the second L2P table entry and the valid data of the third L2P table entry into the memory in parallel.
[0025] According to the sixth L2P accelerator provided in the first aspect of this application, a seventh L2P accelerator according to this application is provided, wherein the write channel, in response to a conflict between the memory addresses indicated by the one or more fourth read commands and the one or more fifth read commands, writes the valid data of the second L2P table entry into the memory and then issues the one or more fifth read commands to the memory; or writes the valid data of the third L2P table entry into the memory and then issues the one or more fourth read commands to the memory.
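The choice between concatenating, writing in parallel, and serializing two writes can be sketched as a small decision function. It assumes that "conflict" means the two read-modify-write sequences touch at least one common aligned word, and that adjacent indexes are the concatenable case; both rules, and the names `touched_words` and `plan_writes`, are assumptions made for illustration.

```python
def touched_words(index, entry_bits, word_bits=32):
    """Set of aligned word numbers covered by packed entry `index`."""
    first_bit = index * entry_bits
    last_bit = first_bit + entry_bits - 1
    return set(range(first_bit // word_bits, last_bit // word_bits + 1))

def plan_writes(index_a, index_b, entry_bits, word_bits=32):
    """Decide how two pending entry writes are handled."""
    if abs(index_a - index_b) == 1:
        return "concatenate"   # assumed concatenation rule: adjacent indexes
    words_a = touched_words(index_a, entry_bits, word_bits)
    words_b = touched_words(index_b, entry_bits, word_bits)
    if words_a & words_b:
        return "serialize"     # shared word: finish one write before reading for the other
    return "parallel"          # disjoint words: both writes can proceed at once

print(plan_writes(0, 1, 10))   # concatenate
print(plan_writes(0, 2, 10))   # serialize
print(plan_writes(0, 5, 10))   # parallel
```

Serializing matters because a read for the second write that is issued before the first write lands would return stale bits, and merging into them would undo the first write.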
[0026] According to the seventh L2P accelerator provided in the first aspect of this application, an eighth L2P accelerator according to this application is provided, wherein the write channel, in response to a conflict between the memory addresses indicated by the one or more fourth read commands and the one or more fifth read commands, sends one or more memory addresses corresponding to the second write command to the read channel.
[0027] The read channel, in response to receiving the one or more memory addresses corresponding to the second write command, generates one or more fourth read commands based on the memory addresses;
[0028] The write channel, in response to receiving third response data of all fourth read commands fed back by the memory, combines the third response data with the valid data of the second L2P table entry to obtain sixth data, and sends the sixth data and the second protocol information to the memory.
[0029] In response to receiving the third response data of all fourth read commands from the memory, the write channel sends one or more memory addresses corresponding to the third write command to the read channel.
[0030] The read channel, in response to receiving the one or more memory addresses corresponding to the third write command, generates one or more fifth read commands based on the memory addresses;
[0031] The write channel, in response to receiving fourth response data of all fifth read commands fed back by the memory, combines the fourth response data with the valid data of the third L2P table entry to obtain seventh data, and sends the seventh data and the second protocol information to the memory.
[0032] According to the third to eighth L2P accelerators provided in the first aspect of this application, a ninth L2P accelerator according to this application is provided, wherein the write channel responds to the concatenation of the second L2P table entry and the third L2P table entry by concatenating all valid data of the second L2P table entry with all valid data of the third L2P table entry to obtain one or more concatenated data, and writes the concatenated data into the memory.
[0033] According to the first to ninth L2P accelerators provided in the first aspect of this application, a tenth L2P accelerator according to this application is provided, wherein the read channel includes a first logic circuit and a first plurality of caches;
[0034] The first logic circuit, in response to receiving a first read command from the master device, generates one or more second read commands based on the received first read command, and stores the relationship between the first identification information identifying each first read command and the second identification information identifying the one or more second read commands corresponding to it in the first plurality of caches; and in response to receiving first response data for each second read command fed back by the memory, processes the first response data corresponding to each second read command to obtain eighth data and third protocol information, determines all the eighth data corresponding to each first read command and generates the first protocol information based on the third protocol information and the relationship, processes all the eighth data corresponding to each first read command to obtain the entry of the L2P table indicated by each first read command; and sends the first protocol information and the entry of the L2P table indicated by it as a response to each first read command to the master device.
[0035] According to the tenth L2P accelerator provided in the first aspect of this application, an eleventh L2P accelerator according to this application is provided, wherein the first logic circuit includes: a first parsing module, a first calculation module, and a command generation module; wherein:
[0036] The first parsing module, in response to receiving one or more first read commands, parses each first read command to obtain the address index of its corresponding L2P table entry, and stores the address index in a first cache among the first plurality of caches;
[0037] The first calculation module is coupled to the first cache. It calculates the memory address accessed by one or more second read commands based on the address index corresponding to each first read command. It sets a corresponding second identifier for each second read command and stores the relationship between the first identifier and its corresponding one or more second identifiers in the second cache.
[0038] The command generation module is coupled to the first calculation module, generates at least one second read command based on the memory address, and sends the at least one second read command to the memory.
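The bookkeeping between first and second identifiers can be sketched as follows. This is a software analogy under stated assumptions: second identifiers are small integers handed out in order, responses may arrive out of order, and the class name `ReadTracker` and its methods are hypothetical stand-ins for the relationship stored in the second cache.

```python
from itertools import count

class ReadTracker:
    """Maps each first read command to the second read commands split from it."""

    def __init__(self):
        self._next_id = count()
        self.children = {}  # first ID -> second IDs, in memory-address order
        self.parent = {}    # second ID -> first ID
        self.pending = {}   # first ID -> {second ID: response data}

    def split(self, first_id, addresses):
        """Assign one second ID per memory address; return (second ID, address) pairs."""
        ids = [next(self._next_id) for _ in addresses]
        self.children[first_id] = ids
        for second_id in ids:
            self.parent[second_id] = first_id
        self.pending[first_id] = {}
        return list(zip(ids, addresses))

    def on_response(self, second_id, data):
        """Record one response; return (first ID, data in address order) when complete."""
        first_id = self.parent.pop(second_id)
        received = self.pending[first_id]
        received[second_id] = data
        if len(received) < len(self.children[first_id]):
            return None
        ordered = b"".join(received[s] for s in self.children.pop(first_id))
        del self.pending[first_id]
        return first_id, ordered

tracker = ReadTracker()
commands = tracker.split("A", [0, 4])
early = tracker.on_response(1, b"\x02")   # second word arrives first
done = tracker.on_response(0, b"\x01")
print(commands, early, done)
```

Keeping the second identifiers in address order lets the response data be reassembled correctly even when the memory answers out of order.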
[0039] According to the eleventh L2P accelerator provided in the first aspect of this application, a twelfth L2P accelerator according to this application is provided. The first parsing module responds to receiving first response data fed back by the memory based on each second read command, and parses the first response data to obtain eighth data and third protocol information or eighth data, third protocol information and a marker, wherein the marker is used to identify the position of the last bit in the valid data of the L2P table entry in the corresponding eighth data.
[0040] Based on the marker, the valid data of the corresponding L2P table entry is parsed from all the eighth data corresponding to each first read command, and the valid data of the L2P table entry and the first protocol information are used as the data in response to the first read command.
[0041] According to the eleventh or twelfth L2P accelerator provided in the first aspect of this application, a thirteenth L2P accelerator according to this application is provided. The first logic circuit further includes a first merging unit. The first merging unit merges the valid data and empty bit data of the corresponding L2P table entry according to the length of the L2P table entry corresponding to each first read command to obtain the L2P table entry to be accessed indicated therein. The valid data of the L2P table entry is located in the first N consecutive bits of the L2P table entry, where N is the length of the valid data.
[0042] The L2P table entry corresponding to each first read command and the first protocol information are combined to obtain the data that serves as the response to the first read command.
[0043] According to the thirteenth L2P accelerator provided in the first aspect of this application, a fourteenth L2P accelerator according to this application is provided, wherein the first merging unit updates a marker in the L2P table entry in response to obtaining the L2P table entry corresponding to each first read command, such that the updated marker indicates the position of the last bit of the L2P table entry or the valid data of the L2P table entry.
[0044] According to the tenth to fourteenth L2P accelerators provided in the first aspect of this application, a fifteenth L2P accelerator according to this application is provided, wherein the first plurality of caches includes: a first cache, a second cache, a third cache, a fourth cache, a fifth cache, a sixth cache, and a seventh cache; wherein the first cache is used to cache the address index indicated by the first read command; the second cache is used to cache the mapping relationship between the first identification information of each first read command and the second identification information of one or more corresponding second read commands; the third cache is used to cache the first response data corresponding to each second read command; the fourth cache is coupled to the third cache and is used to cache the third protocol information; the fifth cache is coupled to the third cache and is used to cache all the eighth data and markers corresponding to each first read command; the sixth cache is coupled to the fifth cache and is used to cache the L2P table entry and the updated marker corresponding to each first read command; the seventh cache is coupled to the sixth cache and is used to cache the response of each first read command.
[0045] According to the fifteenth L2P accelerator provided in the first aspect of this application, a sixteenth L2P accelerator according to this application is provided, wherein the first logic circuit, in response to receiving first response data for each second read command, stores the first response data of each second read command into the third cache;
[0046] In response to storing the first response data of each second read command in the third cache, obtaining third protocol information from the first response data in the third cache and storing it in the fourth cache, and obtaining the eighth data and storing it in the fifth cache;
[0047] Specifically, based on the relationship between the first identification information and the second identification information stored in the second cache, in response to receiving all first response data corresponding to each first read command, the first merging unit obtains the valid data of the L2P table entry indicated by the first read command from all eighth data corresponding to each first read command in the fifth cache, and merges the valid data of the L2P table entry with the empty bit data according to the length of the L2P table entry to obtain the L2P table entry, and stores the L2P table entry in the sixth cache; and updates the marker, storing the updated marker in the sixth cache as well.
[0048] The L2P table entry and the updated marker are retrieved from the sixth cache. The first identification information of the first read command is retrieved from the second cache. The first protocol information is generated based on the first identification information. The L2P table entry and the first protocol information are used as the response to the first read command, and the response is stored in the seventh cache.
[0049] According to the first to sixteenth L2P accelerators provided in the first aspect of this application, a seventeenth L2P accelerator according to this application is provided, wherein the write channel includes a second logic circuit and a second plurality of caches; wherein,
[0050] In response to receiving one or more write commands from the master device, the second logic circuit retrieves the address index and the data of the L2P table entry indicated by it from each write command, and stores the address index and the L2P table entry data in the second plurality of caches; determines one or more memory addresses in the memory that store the valid data of the L2P table entry, and the position of the first bit of the valid data in the memory, based on the address index and the number of valid data bits of the L2P table entry; and writes the valid data of the corresponding L2P table entry into the memory according to the one or more memory addresses corresponding to each write command and the position of its first bit in the memory.
[0051] According to the seventeenth L2P accelerator provided in the first aspect of this application, an eighteenth L2P accelerator according to this application is provided. The second logic circuit, in response to receiving a second write command among the one or more write commands sent by the master device, obtains a second address index and a second L2P table entry from the second write command, and stores the second address index and the second L2P table entry data in the second plurality of caches; determines, based on the second address index and the number of valid data bits of the second L2P table entry, one or more second memory addresses and a first position in the memory of the first bit of the valid data of the second L2P table entry, and stores the mapping relationship between the identification information of the second write command and the second memory address and the first position in the second plurality of caches; and stores the valid data of the second L2P table entry in the second plurality of caches.
[0052] In response to receiving a third write command among the one or more write commands, the second logic circuit, regardless of whether the operation of writing the valid data of the second L2P table entry into the memory has completed, obtains a third address index and a third L2P table entry from the third write command; determines, based on the third address index and the number of valid data bits of the third L2P table entry, one or more third memory addresses and a second position in the memory of the first bit of the valid data of the third L2P table entry, and stores the mapping relationship between the identification information of the third write command and the third memory address and the second position in the second plurality of caches; and stores the valid data of the third L2P table entry in the second plurality of caches.
[0053] The valid data of the second L2P table entry and/or the valid data of the third L2P table entry are written from the cache into the memory, wherein the address of the valid data of the second L2P table entry in the memory corresponds to the second memory address and the first position, and the address of the valid data of the third L2P table entry in the memory corresponds to the third memory address and the second position.
[0054] According to the eighteenth L2P accelerator provided in the first aspect of this application, a nineteenth L2P accelerator according to this application is provided. In response to the valid data of the second L2P table entry being non-byte aligned or the first bit in the valid data of the second L2P table entry not being located at the starting position of its corresponding storage cell in the memory, the command generation module in the read channel generates one or more fourth read commands according to one or more second memory addresses corresponding to the second write command, and sends the one or more fourth read commands to the memory.
[0055] In response to receiving third response data fed back by the memory for each fourth read command, the second logic circuit combines the valid data of the second L2P table entry with a portion of the data in the third response data according to the position in the memory of the first bit of the valid data of the second L2P table entry to obtain third data; and sends the third data to the memory according to the second protocol information stored in the cache.
[0056] According to the nineteenth L2P accelerator provided in the first aspect of this application, a twentieth L2P accelerator according to this application is provided, wherein the second logic circuit, depending on whether the second write command has a corresponding fourth read command, stores in the second plurality of caches a first mapping relationship, either between the identification information of the second write command and the identification information of its corresponding fourth read command, or between the identification information of the second write command and information indicating that no fourth read command has been generated; and/or
[0057] Depending on whether the third write command has a corresponding fifth read command, the second logic circuit stores in the second plurality of caches a second mapping relationship, either between the identification information of the third write command and the identification information of its corresponding fifth read command, or between the identification information of the third write command and information indicating that no fifth read command has been generated.
[0058] According to the twentieth L2P accelerator provided in the first aspect of this application, a twenty-first L2P accelerator according to this application is provided. The second logic circuit, in response to completing the writing of the second L2P table entry to memory, can concatenate the second L2P table entry and the third L2P table entry, concatenating the valid data of the second L2P table entry with the valid data of the third L2P table entry to obtain one or more concatenated data sets, and updating the first mapping relationship and the second mapping relationship.
[0059] In response to receiving one or more concatenated data, the command generation module of the read channel generates one or more sixth read commands to replace the one or more fourth read commands and / or fifth read commands based on the memory address of the concatenated data; the updated combination of the first mapping relationship and the second mapping relationship includes the identifiers of all sixth read commands, and the identifiers of the sixth read commands in the first mapping relationship and the second mapping relationship may be the same or different.
[0060] According to the first aspect of this application, a twenty-second L2P accelerator is provided, wherein the second logic circuit, in response to issuing the one or more fourth read commands and before generating one or more fifth read commands based on the third memory address, identifies a conflict between the one or more fifth read commands and the memory address indicated by the one or more fourth read commands, and suspends processing of the third write command.
[0061] According to the twenty-second L2P accelerator provided in the first aspect of this application, a twenty-third L2P accelerator according to this application is provided, which, in response to pausing processing of a third write command, also pauses processing of subsequent write commands.
[0062] According to the twenty-third L2P accelerator provided in the first aspect of this application, a twenty-fourth L2P accelerator is provided, wherein the second logic circuit, in response to receiving information that valid data of the second L2P table entry has been written to memory, resumes processing of the suspended third write command.
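The suspend-and-resume behaviour of the twenty-second to twenty-fourth accelerators can be sketched in Python as follows. This is only a software model under stated assumptions: the class `WriteScheduler`, its method names, and the use of sets of storage-unit addresses are hypothetical and do not come from this application.

```python
from collections import deque


class WriteScheduler:
    """Toy model: a write whose addresses conflict with an in-flight
    write is suspended, and so are all writes that arrive after it."""

    def __init__(self):
        self.in_flight = {}       # write id -> set of unit addresses
        self.suspended = deque()  # (write id, addresses), in arrival order

    def _busy(self):
        return set().union(*self.in_flight.values())

    def submit(self, write_id, addresses):
        """Return True if the write proceeds, False if it is suspended."""
        addresses = set(addresses)
        if self.suspended or (addresses & self._busy()):
            self.suspended.append((write_id, addresses))
            return False
        self.in_flight[write_id] = addresses
        return True

    def complete(self, write_id):
        """Mark a write as stored in memory; return the resumed write ids."""
        del self.in_flight[write_id]
        resumed = []
        while self.suspended:
            wid, addrs = self.suspended[0]
            if addrs & self._busy():
                break  # still conflicting, keep order and stay suspended
            self.suspended.popleft()
            self.in_flight[wid] = addrs
            resumed.append(wid)
        return resumed
```

In this model, a second write touching units 0 and 1 blocks a third write touching unit 1, and a fourth write to an unrelated unit also waits until the second write completes, which mirrors the ordering described above.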
[0063] According to the seventeenth to twenty-fourth L2P accelerators provided in the first aspect of this application, a twenty-fifth L2P accelerator according to this application is provided. The second logic circuit includes: a second parsing module, a second calculation module, and a packing module; wherein...
[0064] The second parsing module, in response to receiving the second write command, parses the second write command to obtain the second address index and the second L2P table entry, and caches the second address index in the eighth cache of the second plurality of caches and caches the second L2P table entry in the ninth cache of the second plurality of caches;
[0065] The second calculation module, coupled to the eighth cache, calculates the one or more second memory addresses and the first location based on the second address index and the number of valid data bits; stores the second memory address and the first location in the tenth cache of the second plurality of caches and stores the mapping relationship between the identification information of the second write command and the second memory address and the first location in the eleventh cache; and caches the valid data of the second L2P table entry in the thirteenth cache of the second plurality of caches.
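As an illustration of how a memory address and first-bit location might be derived from an address index and the number of valid data bits, the following Python sketch assumes that entries are packed back to back starting at bit 0 of storage unit 0, that a storage unit is 64 bits, and that each entry has 36 valid bits. These values and the function name `locate_entry` are assumptions, not taken from this application.

```python
def locate_entry(address_index: int, valid_bits: int = 36,
                 unit_bits: int = 64):
    """Return (storage unit addresses, first-bit position) of one entry.

    An entry may straddle two storage units, in which case two
    addresses are returned.
    """
    start_bit = address_index * valid_bits   # entries have no padding
    end_bit = start_bit + valid_bits - 1     # last bit, inclusive
    first_unit = start_bit // unit_bits
    last_unit = end_bit // unit_bits
    units = list(range(first_unit, last_unit + 1))
    return units, start_bit % unit_bits
```

With these assumed sizes, entry 1 begins at bit 36 of unit 0 and spills into unit 1, so it corresponds to two memory addresses, while entry 2 lies entirely in unit 1 starting at bit 8.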
[0066] The second parsing module receives a third write command. Regardless of whether the operation of writing the valid data of the second L2P table entry into the memory is completed, it obtains the third address index and the third L2P table entry from the third write command, stores the third address index in the eighth cache, and stores the third L2P table entry in the ninth cache.
[0067] The second calculation module is also coupled to the eleventh cache, and determines, based on the third address index and the number of bits of the valid data of the third L2P table entry, one or more third memory addresses and the second position, in the memory, of the first bit of the valid data of the third L2P table entry; stores the third memory address and the second position in the tenth cache of the second plurality of caches, and stores the mapping relationship between the identification information of the third write command and the third memory address and the second position in the eleventh cache; and caches the valid data of the third L2P table entry in the thirteenth cache;
[0068] The packing module is coupled to the ninth cache, the tenth cache, the thirteenth cache, and the twelfth cache containing the alignment information of the valid data bytes. It writes the valid data of the second L2P table entry and / or the valid data of the third L2P table entry from the thirteenth cache into the memory. The address of the valid data of the second L2P table entry in the memory corresponds to the second memory address and the first location, and the address of the valid data of the third L2P table entry in the memory corresponds to the third memory address and the second location.
[0069] According to the twenty-fifth L2P accelerator provided in the first aspect of this application, a twenty-sixth L2P accelerator according to this application is provided, wherein the second logic circuit further includes a second merging unit; the second merging unit extracts valid data of the second L2P table entry from the second L2P table entry and stores the valid data of the second L2P table entry in the sixth cache, and extracts valid data of the third L2P table entry from the third L2P table entry and stores the valid data of the third L2P table entry in the thirteenth cache.
[0070] According to the twenty-sixth L2P accelerator provided in the first aspect of this application, a twenty-seventh L2P accelerator is provided, wherein the second merging unit, in response to the ability to concatenate the second L2P table entry and the third L2P table entry, concatenates the valid data of the second L2P table entry and the valid data of the third L2P table entry according to the first position and the second position to obtain one or more concatenated data.
[0071] According to the twenty-seventh L2P accelerator provided in the first aspect of this application, a twenty-eighth L2P accelerator according to this application is provided. In response to obtaining one or more concatenated data, and a second write command having a corresponding one or more fourth read commands and / or a third write command having a corresponding one or more fifth read commands, the second merging unit further stores the identification information of the second write command and the identification information of one or more fourth read commands in a fourteenth cache; and / or stores the identification information of the third write command and the identification information of one or more fifth read commands in a fourteenth cache.
[0072] According to the twenty-seventh or twenty-eighth L2P accelerator provided in the first aspect of this application, a twenty-ninth L2P accelerator according to this application is provided. In response to obtaining one or more concatenated data, and neither the second write command nor the third write command having a corresponding one or more read commands, the packing module generates ninth data based on the concatenated data and the second protocol information, and sends the ninth data to the memory.
[0073] According to the twenty-ninth L2P accelerator provided in the first aspect of this application, a thirtieth L2P accelerator according to this application is provided. In response to receiving one or more concatenated data, and the read channel generating one or more fourth read commands based on the first memory address and / or generating one or more fifth read commands based on the second memory address, wherein the one or more fourth read commands do not access the same memory address as the one or more fifth read commands; the packing module, in response to receiving third response data from all fourth read commands and / or fourth response data from all fifth read commands fed back from the memory, combines the concatenated data with a portion of the third response data and / or fourth response data to obtain tenth data; generates eleventh data based on protocol information stored in the cache and the tenth data, and sends the eleventh data to the memory.
[0074] According to the seventeenth to thirtieth L2P accelerators provided in the first aspect of this application, a thirty-first L2P accelerator according to this application is provided, wherein the second plurality of caches includes: an eighth cache, a ninth cache, a tenth cache, an eleventh cache, a twelfth cache, a thirteenth cache, a fourteenth cache, a fifteenth cache, a sixteenth cache, and a seventeenth cache; wherein the eighth cache is used to cache index addresses; the ninth cache is used to cache L2P table entries indicated by write commands; the tenth cache is used to cache the memory address corresponding to the L2P table entry and the location in memory of the first bit of the valid data of the L2P table entry; the eleventh cache is used to cache the mapping relationship between the identification information of the write command and the memory address and the location of the first bit of the valid data of the L2P table entry in memory; the twelfth cache caches the byte alignment information of the valid data; the thirteenth cache is coupled to the ninth cache and caches the valid data of the L2P table entry indicated by the write command; the fourteenth cache is used to cache the identification information of the write command and the identification information of one or more corresponding read commands; the fifteenth cache is used to cache protocol information; the sixteenth cache is coupled to the thirteenth cache and is used to cache the valid data of the L2P table entry; the seventeenth cache is coupled to the logic circuit and is used to cache the response data of the read command sent by the read channel.
[0075] According to the thirty-first L2P accelerator provided in the first aspect of this application, a thirty-second L2P accelerator according to this application is provided, wherein the second logic circuit, in response to obtaining a second address index and a second L2P table entry from a second write command, stores the second address index in the eighth cache and caches the second L2P table entry in the ninth cache; and determines one or more second memory addresses and the first location based on the second address index and the effective data bits, stores the one or more second memory addresses and the first location in the tenth cache; and stores the mapping relationship between the identification information of the second write command and the second memory address and the first location in the eleventh cache;
[0076] In response to caching the second L2P table entry in the ninth cache, the valid data of the second L2P table entry is retrieved from the ninth cache and stored in the thirteenth cache;
[0077] In response to the fact that the third L2P table entry in the ninth cache cannot be concatenated with the second L2P table entry stored in the thirteenth cache, the valid data of the second L2P table entry is moved to the sixteenth cache; in response to the fact that the third L2P table entry in the ninth cache can be concatenated with the second L2P table entry stored in the thirteenth cache, the valid data of the third L2P table entry is written to the thirteenth cache, and then the valid data of the concatenated second L2P table entry and the valid data of the third L2P table entry are moved from the thirteenth cache to the sixteenth cache.
[0078] In response to moving the valid data of the second L2P table entry and the valid data of the third L2P table entry to the sixteenth cache, and in response to the second write command and the third write command not having a corresponding read command, the valid data of the second L2P table entry and the valid data of the third L2P table entry are written from the sixteenth cache into the memory.
[0079] According to the thirty-second L2P accelerator provided in the first aspect of this application, a thirty-third L2P accelerator according to this application is provided. In response to a second write command having one or more corresponding fourth read commands and / or a third write command having one or more corresponding fifth read commands, in response to storing third response data corresponding to all fourth read commands in the tenth cache and / or in response to storing fourth response data corresponding to all fifth read commands in the tenth cache, the valid data of the second L2P table entry is combined with the third response data to obtain twelfth data and / or the valid data of the third L2P table entry is combined with the fourth response data to obtain thirteenth data; fourteenth data is generated based on protocol information stored in the sixth cache and the twelfth or thirteenth data, and the fourteenth data is sent to the memory.
[0080] According to the thirty-third L2P accelerator provided in the first aspect of this application, a thirty-fourth L2P accelerator according to this application is provided, wherein the thirteenth cache includes a first storage unit and a second storage unit, the size of the first storage unit and the second storage unit being the same as the size of the storage unit in the memory;
[0081] In response to the fact that the third L2P table entry in the ninth cache can be concatenated with the second L2P table entry stored in the thirteenth cache, and that the second L2P table entry is adjacent to and precedes the third L2P table entry in the L2P table, and that the memory addresses corresponding to the second write command and the third write command are the same, the second merging unit stores the valid data of the second L2P table entry into the first storage unit in the sixth cache according to the position of the first bit in the valid data of the second L2P table entry in its corresponding memory storage unit.
[0082] The second merging unit concatenates the valid data of the third L2P table entry and the valid data of the second L2P table entry in the first storage unit to obtain a spliced data, wherein the position of the first bit in the valid data of the third L2P table entry in the first storage unit is the same as its position in the corresponding memory storage unit.
[0083] According to the thirty-fourth L2P accelerator provided in the first aspect of this application, a thirty-fifth L2P accelerator according to this application is provided, wherein, in response to the memory addresses corresponding to the second write command and the third write command being different, the second merging unit stores the valid data of the second L2P table entry in the first storage unit according to the position of the first bit in the valid data of the second L2P table entry in its corresponding memory storage unit to obtain the first concatenated data, and stores the valid data of the third L2P table entry in the second storage unit according to the position of the first bit in the valid data of the third L2P table entry in its corresponding memory storage unit to obtain the second concatenated data.
[0084] According to the thirty-fifth L2P accelerator provided in the first aspect of this application, a thirty-sixth L2P accelerator according to this application is provided. In response to the memory addresses corresponding to the second write command and the third write command being partially the same, a portion of the valid data of the third L2P table entry is stored in the first storage unit and sequentially concatenated with the valid data of the second L2P table entry to obtain the first concatenated data. The other portion of the valid data of the third L2P table entry is stored in the second storage unit to obtain the second concatenated data.
[0085] According to the thirty-sixth L2P accelerator provided in the first aspect of this application, a thirty-seventh L2P accelerator according to this application is provided. In response to the third L2P table entry being adjacent to and preceding the second L2P table entry in the L2P table, after the second merging unit stores the valid data of the second L2P table entry in the first storage unit, if the number of bits between the starting position and the first bit of the valid data of the second L2P table entry in the first storage unit is less than the number of bits of the valid data of the third L2P table entry, then the second merging unit stores a portion of the valid data of the third L2P table entry in the first storage unit between the starting position and the first bit of the valid data of the second L2P table entry to obtain a concatenated data, moves the obtained concatenated data to the sixteenth cache, and stores the remaining portion of the valid data of the third L2P table entry in the thirteenth cache.
[0086] According to the thirty-first to thirty-seventh L2P accelerators provided in the first aspect of this application, a thirty-eighth L2P accelerator according to this application is provided, which, in response to caching the data of a third L2P table entry in the ninth cache, and wherein the third L2P table entry and the second L2P table entry cannot be concatenated, moves the valid data of the second L2P table entry to the sixteenth cache, and moves the valid data of the third L2P table entry to the thirteenth cache.
[0087] According to the thirty-first to thirty-eighth L2P accelerators provided in the first aspect of this application, a thirty-ninth L2P accelerator according to this application is provided. In response to the fact that the memory address corresponding to the valid data of the second L2P table entry and the valid data of the third L2P table entry are the same as the location of the first bit in the memory, the second merging unit overwrites the valid data of the second L2P table entry in the first storage unit with the valid data of the third L2P table entry to obtain a spliced data.
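The concatenation cases described above (entries sharing a storage unit, lying in different units, straddling two units, or overwriting the same location) can be modelled with a short Python sketch. It is a bit-by-bit software illustration, not the merging unit itself. The function name `splice`, the 36-bit valid data width, the 64-bit storage unit, and the least-significant-bit-first numbering are assumptions.

```python
def splice(writes, valid_bits: int = 36, unit_bits: int = 64):
    """Place the valid data of several entries at their memory positions.

    writes: list of (address_index, valid_data) in arrival order.
    Returns {unit_address: (data, mask)}, where mask marks the bits of
    that storage unit covered by the spliced data.
    """
    units = {}
    for index, data in writes:
        start = index * valid_bits
        for bit in range(valid_bits):
            unit, pos = divmod(start + bit, unit_bits)
            value, mask = units.get(unit, (0, 0))
            value &= ~(1 << pos)                 # a later write overwrites
            value |= ((data >> bit) & 1) << pos
            units[unit] = (value, mask | (1 << pos))
    return units
```

A unit whose mask covers all of its bits could be written to memory directly, whereas a partially covered unit would still need the read-and-combine step before being written back.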
[0088] According to the thirty-ninth L2P accelerator provided in the first aspect of this application, a fortieth L2P accelerator according to this application is provided, wherein, in response to obtaining one or two concatenated data, the second merging unit also moves the data in the first storage unit and / or the second storage unit as a whole to the sixteenth cache.
[0089] According to the fortieth L2P accelerator provided in the first aspect of this application, a forty-first L2P accelerator according to this application is provided, wherein the thirteenth cache includes a first storage unit and a second storage unit, the size of the first storage unit and the second storage unit being the same as the size of the storage unit in the memory;
[0090] The second merging unit stores the valid data of the second L2P table entry into the first storage unit of the thirteenth cache according to the position of the first bit of the valid data of the second L2P table entry in its corresponding memory storage unit.
[0091] In response to caching the data of the second L2P table entry in the ninth cache, the second merging unit retrieves the valid data of the second L2P table entry from the ninth cache, and stores the valid data of the second L2P table entry into the first storage unit of the thirteenth cache according to the position of the first bit of the valid data of the second L2P table entry in its corresponding memory storage unit. The first bit of the valid data of the second L2P table entry in the first storage unit is at the same position as its position in the corresponding memory storage unit. In response to caching the data of the third L2P table entry in the ninth cache, and given that the second L2P table entry and the third L2P table entry are adjacent in the L2P table, the second merging unit stores the valid data of the third L2P table entry into the first storage unit and / or the second storage unit of the thirteenth cache according to the position of the first bit of the valid data of the third L2P table entry in its corresponding memory storage unit. The position of the first bit of the valid data of the third L2P table entry in the first storage unit and / or the second storage unit is the same as its position in the corresponding memory storage unit, and the valid data of the second L2P table entry is adjacent to and does not overlap with the valid data of the third L2P table entry in the thirteenth cache.
[0092] According to the thirty-first to forty-first L2P accelerators provided in the first aspect of this application, a forty-second L2P accelerator according to this application is provided. After the command generation module issues the one or more fourth read commands in the read channel, the logic circuit, in response to a conflict between the memory address indicated by the one or more fifth read commands and the one or more fourth read commands, does not store the identification information of the third write command and the identification information of the corresponding one or more fifth read commands in the fourteenth cache and does not store the valid data of the third L2P table entry in the thirteenth cache, thereby suspending the processing of the third write command and also suspending the processing of subsequent write commands.
[0093] According to the forty-second L2P accelerator provided in the first aspect of this application, a forty-third L2P accelerator according to this application is provided, wherein the second logic circuit, in response to receiving information that the valid data of the second L2P table entry has been written to memory, resumes processing of the third write command, stores the identification information of the third write command and the identification information of one or more corresponding fifth read commands in a fourteenth cache and stores the valid data of the third L2P table entry in a thirteenth cache; and moves the valid data of the third L2P table entry to a sixteenth cache to write the valid data of the third L2P table entry into memory.
[0094] According to the thirty-first to forty-third L2P accelerators provided in the first aspect of this application, a forty-fourth L2P accelerator according to this application is provided, wherein the second logic circuit, in response to the eleventh cache receiving information that the valid data of the second L2P table entry has been written to memory, deletes the mapping relationship between the identification information of the second write command and the second memory address and the first location.
[0095] According to a second aspect of this application, a first control component according to this application is provided, comprising an L2P accelerator according to any of the first to forty-fourth L2P accelerators of the first aspect.
Description of the Drawings
[0096] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this application. For those skilled in the art, other drawings can be obtained based on these drawings.
[0097] Figure 1 shows a block diagram of a prior-art solid-state storage device;
[0098] Figure 2A shows a schematic diagram of the structure of the control component provided in an embodiment of this application;
[0099] Figure 2B shows a schematic diagram of the structure of the L2P accelerator provided in an embodiment of this application;
[0100] Figure 2C shows a schematic diagram of the conversion between L2P table entries perceived by the master device and L2P table entries stored in the memory, as provided in this application;
[0101] Figure 3A shows a schematic diagram of the accelerator processing write commands, provided in an embodiment of this application;
[0102] Figure 3B shows a schematic diagram of the accelerator processing multiple write commands in parallel, provided in an embodiment of this application;
[0103] Figure 3C shows the combination of the valid data of the L2P table entry indicated by a write command with response data fed back by the memory, provided in an embodiment of this application;
[0104] Figure 4 shows a schematic diagram of another accelerator structure provided in an embodiment of this application;
[0105] Figure 5A shows a schematic diagram of concatenating valid data from multiple L2P table entries, provided in an embodiment of this application;
[0106] Figure 5B shows another schematic diagram of concatenating valid data from multiple L2P table entries, provided in an embodiment of this application;
[0107] Figure 5C shows another schematic diagram of concatenating valid data from multiple L2P table entries, provided in an embodiment of this application;
[0108] Figure 5D shows another schematic diagram of concatenating valid data from multiple L2P table entries, provided in an embodiment of this application;
[0109] Figure 5E shows another schematic diagram of concatenating valid data from multiple L2P table entries, provided in an embodiment of this application;
[0110] Figure 5F shows another schematic diagram of concatenating valid data from multiple L2P table entries, provided in an embodiment of this application;
[0111] Figure 6A shows a schematic diagram of combining spliced data with response data to obtain combined data, provided in an embodiment of this application;
[0112] Figure 6B shows another schematic diagram of combining spliced data with response data to obtain combined data, provided in an embodiment of this application;
[0113] Figure 7 shows a schematic diagram of processing multiple write commands with multiple caches, provided in an embodiment of this application;
[0114] Figure 8A shows a schematic diagram of the accelerator processing read commands, provided in an embodiment of this application;
[0115] Figure 8B shows a schematic diagram of the read channel structure provided in an embodiment of this application;
[0116] Figure 8C shows a schematic diagram of the accelerator processing multiple read commands A in parallel;
[0117] Figure 8D shows a schematic diagram of another read channel structure provided in an embodiment of this application;
[0118] Figure 8E shows a schematic diagram of the data storage process of each cache in the logic circuit, provided in an embodiment of this application;
[0119] Figure 8F shows a schematic diagram of the structure of multiple caches in the logic circuit, provided in an embodiment of this application;
[0120] Figure 9 shows a schematic diagram of the accelerator performing read-modify-write operations, provided in an embodiment of this application.
Detailed Implementation
[0121] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort are within the scope of protection of this application.
[0122] Figure 2A shows a schematic diagram of the structure of the control component provided in the embodiments of this application.
[0123] In Figure 2A, the control component includes a master device, an accelerator, and a slave device. For example, the master device may be a CPU, a media interface controller, or a processing core; the slave device may be a memory controller. The master device and the accelerator, and / or the accelerator and the slave device, are coupled, for example, via a bus. As another example, the accelerator in this embodiment may be an L2P accelerator, used to accelerate storing the valid data of L2P table entries, indicated by write commands sent by the master device, into the L2P table in memory.
[0124] The control component is also coupled to an external memory (see Figure 2A), and the memory controller is used to access the external memory (DRAM). For example, the accelerator includes a slave interface and a master interface, and is coupled to a bus via both interfaces. Thus, one or more master devices of the control component (e.g., CPU, media interface controller) can access the accelerator as a bus slave via the slave interface. Conversely, the accelerator can, as a master, access one or more slave devices of the control component (e.g., the memory controller) via the master interface.
[0125] As an example, a memory located outside the control component is used to store the L2P table. The master device can write L2P table entry data to the L2P table in the memory. The master device issues a write command indicating the L2P table entry data to the bus. The bus sends the write command to an accelerator coupled to the bus. The accelerator determines the location of the L2P table entry data in the memory based on the address index indicated in the received write command, and sends the L2P table entry data indicated by the write command to a slave device (such as a memory controller) via the bus. The slave device sends the received L2P table entry data to the memory, and the memory stores the L2P table entry data in the L2P table according to its location.
[0126] Figure 2B shows a schematic diagram of the structure of the L2P accelerator provided in the embodiments of this application.
[0127] In the embodiment of Figure 2B, the L2P accelerator includes a write channel and a read channel. The read channel is a circuit that forms a data path for reading data from memory, and the write channel is a circuit that forms a data path for writing data to memory. To reduce the memory space occupied by the L2P table and to reduce or eliminate the impact of non-byte-aligned L2P table entries on access to L2P table entries by the CPU or other devices within the chip, the L2P table stored in memory is a compressed L2P table. The size of a compressed L2P table entry may not be an integer multiple of a byte. Furthermore, the compressed L2P table entries are tightly packed in memory, without unused memory space left between entries for byte alignment.
[0128] Furthermore, to enable the memory to store the compressed L2P table, the accelerator provided in this embodiment processes the data of the L2P table entries indicated by the write command, extracting only the valid data of the L2P table entries and sending it to the memory for storage, while the invalid data of the L2P table entries is not provided to the memory. Additionally, the memory includes multiple aligned storage units, each used to store the valid data of multiple entries in the L2P table. The valid data of the multiple entries in the L2P table does not need to be stored in the memory according to byte boundary alignment.
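The idea of keeping only the valid data of each entry and packing it tightly can be sketched in Python as below. The sketch assumes that the valid data occupies the low N bits of each entry and that entries are packed in little-endian bit order. Neither detail, nor the function name `pack_entries`, is specified by this application.

```python
def pack_entries(entries, valid_bits: int = 36) -> bytes:
    """Pack the valid data of each entry back to back, with no padding.

    entries: iterable of integers as perceived by the master device.
    Assumption: the valid data is the low `valid_bits` bits of each entry.
    """
    entries = list(entries)
    packed = 0
    for i, entry in enumerate(entries):
        valid = entry & ((1 << valid_bits) - 1)   # drop the invalid bits
        packed |= valid << (i * valid_bits)       # place right after entry i-1
    total_bits = len(entries) * valid_bits
    return packed.to_bytes((total_bits + 7) // 8, "little")
```

For eight entries with 36 valid bits each, this sketch produces 36 bytes, compared with 64 bytes when every entry occupies a full 8 bytes.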
[0129] Furthermore, in the solution provided in this application embodiment, the accelerator reads L2P table entries from the memory according to the read command sent by the master device. For example, the memory stores a compressed L2P table, and the L2P table entries read by the accelerator from the memory are valid data of the L2P table entries. Data is transmitted between the master device and the accelerator via a bus, and the transmitted data must meet the bus protocol. For example, the L2P table entries transmitted between the accelerator and the master device must meet byte alignment or 8-byte alignment. If the L2P table entries stored in the memory do not meet the bus protocol, during the process of the accelerator reading the L2P table entries from the memory according to the read command sent by the master device and transmitting them to the master device, it is necessary to convert the L2P table entries read from the memory into L2P table entries perceived by the master device.
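The conversion on the read path, from valid data stored in memory to an entry that meets the bus alignment requirement, might look like the following Python sketch. It assumes zero padding of the unused high bits, an 8-byte entry on the bus, and little-endian packing. The function name `read_entry` and these conventions are illustrative assumptions only.

```python
def read_entry(packed: bytes, index: int, valid_bits: int = 36) -> bytes:
    """Extract entry `index` from a tightly packed table and return it
    as an 8-byte, zero-padded entry as perceived by the master device."""
    if not 0 < valid_bits <= 64:
        raise ValueError("valid_bits must be between 1 and 64")
    table = int.from_bytes(packed, "little")
    valid = (table >> (index * valid_bits)) & ((1 << valid_bits) - 1)
    return valid.to_bytes(8, "little")
```

The result is always 8 bytes long, regardless of where the valid data started inside its storage unit.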
[0130] Figure 2C shows a schematic diagram of the conversion between L2P table entries perceived by the master device and L2P table entries stored in the memory, as provided in this application.
[0131] An L2P table stored in memory (SRAM or DRAM) consists of multiple entries, and each entry in the L2P table is addressed by a logical address (denoted as LBA). In Figure 2C, the L2P table entries perceived by the master device correspond one-to-one with the L2P table entries stored in memory. Therefore, the L2P table perceived by the master device and the L2P table stored in memory have the same number of entries. For example, the L2P table includes 8 entries, namely entry 0, entry 1, entry 2, entry 3, entry 4, entry 5, entry 6, and entry 7; the size of each L2P table entry perceived by the master device is M bits, and the size of each L2P table entry stored in memory is N bits, where M and N are both positive integers. It should be understood that the L2P table entries perceived by the master device can be entries that the master device writes via a write command or entries that it reads via a read command. The L2P table perceived by the master device is also called the logical L2P table.
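The one-to-one conversion between a logical entry of M bits and a stored entry of N bits can be sketched as follows. This is only an illustrative sketch: the function names `to_stored` and `to_logical` are hypothetical, the widths 64 and 30 are chosen for illustration, and it is assumed that the valid data occupies the low-order N bits of the logical entry and that the remaining bits are zero.

```python
M_BITS = 64  # assumed width of an entry as perceived by the master device
N_BITS = 30  # assumed width of the valid data kept in memory


def to_stored(logical_entry: int, n_bits: int = N_BITS) -> int:
    """Keep only the low n_bits of a logical entry (assumed to be the valid data)."""
    return logical_entry & ((1 << n_bits) - 1)


def to_logical(stored_entry: int, n_bits: int = N_BITS, m_bits: int = M_BITS) -> int:
    """Zero-extend stored valid data to an m_bits-wide logical entry."""
    if stored_entry >> n_bits:
        raise ValueError("stored entry is wider than n_bits")
    return stored_entry & ((1 << m_bits) - 1)


if __name__ == "__main__":
    pba = 0x1234567
    print(hex(to_logical(to_stored(pba))))
```

The write path corresponds to `to_stored` (invalid bits are dropped before the data reaches memory) and the read path to `to_logical` (the entry is widened again so that it satisfies the bus alignment).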
[0132] To facilitate access to the logical L2P table by, for example, the CPU, M is an integer multiple of bytes (e.g., 8 bytes), so that entries in the logical L2P table are aligned to bytes or to 8 bytes. In Figure 2C, from the perspective of the CPU accessing the logical L2P table, each entry in the L2P table perceived by the master device is M bits in size (in the example of Figure 2A, M = 64), and the entries of the L2P table perceived by the master device are arranged end to end in the storage space. The corresponding L2P table entry is obtained by indexing the storage space of the L2P table perceived by the master device using a logical address (LBA). For example, ... where size(L2P entry) represents the size of an L2P table entry, for example, 64 bits, and ⌊ ⌋ indicates rounding down. The L2P table entries record the addresses used by the NVM chip (called physical addresses, denoted as PBA). Because the L2P table perceived by the master device is aligned to bytes or to 8 bytes, each entry starts and ends at a multiple of 8 bytes in memory. In the example of Figure 2C, when the CPU accesses the corresponding entry in the L2P table using the logical address (LBA) as an index, it obtains the address of that entry by, for example, LBA*8 (64 bits corresponding to 8 bytes).
[0133] Each entry in the L2P table perceived by the master device may consist partly or entirely of valid data. When every bit of each entry perceived by the master device is valid data, N equals M. When each entry in the L2P table perceived by the master device contains some valid data and some empty bits, the size N of the L2P table entries stored in the memory equals the number of bits of valid data in an entry of the L2P table perceived by the master device. The number of valid data bits is determined by the number of data units (e.g., pages) to be addressed in the NVM chip. For example, to address 2^30 data units, N is 30. Generally, if an entry in the L2P table stored in the memory can address one of 2^n data units, then N = n. As an example, in Figure 2C, N = 30. The L2P table stored in the memory stores only the valid data of each entry, and the valid data of the entries is stored end to end in the storage space provided by the memory, with no unused storage space between adjacent entries. Therefore, the start and/or end positions of some entries are not located at byte boundaries.
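As a minimal sketch of how a tightly packed entry might be located, the following assumes that entry LBA begins at absolute bit offset LBA × N and that the memory is divided into aligned storage units of 64 bits. The function name `locate`, the constant `CELL_BITS`, and the offset rule itself are assumptions made for illustration; they are not stated in this form in the description.

```python
CELL_BITS = 64  # assumed width of one aligned storage unit in memory


def locate(lba: int, n_bits: int, cell_bits: int = CELL_BITS):
    """Return (indices of the storage units touched, first-bit position in the first unit)."""
    start = lba * n_bits          # absolute bit offset of the first valid bit
    end = start + n_bits - 1      # absolute bit offset of the last valid bit
    first_cell, first_bit = divmod(start, cell_bits)
    last_cell = end // cell_bits
    return list(range(first_cell, last_cell + 1)), first_bit


if __name__ == "__main__":
    for lba in (0, 1, 2):
        print(lba, locate(lba, 30))
```

Under these assumptions entry 2 (bits 60 to 89) straddles two storage units, which is why a single write command can give rise to more than one memory address. For 8 entries, the packed table occupies 8 × 30 = 240 bits, compared with 8 × 64 = 512 bits for the logical table.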
[0134] Returning to Figure 2B, the master device sends, for example, one or more write commands to the accelerator. The accelerator can process the write commands one by one or process multiple write commands in parallel to store the L2P table entries indicated by the write commands into memory.
[0135] Figure 3A shows a schematic diagram of the accelerator processing write commands, as provided in an embodiment of this application.
[0136] As an example, in the solution provided in the embodiments of this application, the accelerator mainly processes write commands through a write channel. In the accelerator structure shown in Figure 3A, the write channel includes logic circuit Q1 and multiple caches N1. Logic circuit Q1 includes a parsing module 11, a calculation module 12, and a packing module 13. The parsing module 11, in response to receiving a write command, parses the write command to obtain the address index and the L2P table entry data indicated by it, and caches the address index and the L2P table entry data in the multiple caches N1. The calculation module 12, coupled to the multiple caches N1, determines, based on the address index and the number of valid data bits, the address of the storage unit that stores the valid data of the L2P table entry in the memory and the position of the first bit of the valid data in the memory, and stores both in the multiple caches N1. The packing module 13, coupled to the multiple caches N1, in response to the valid data not being byte-aligned, receives the response data returned by the memory for each read command and combines the valid data with a portion of the response data; it then generates data 1 based on the first protocol information stored in the multiple caches N1 and the combined data, and sends data 1 to the memory. It should be understood that the data 1 sent by the memory controller to the memory carries protocol information (such as AXI protocol information), but the memory only stores the valid data of the L2P table entries in the storage cell and does not store the protocol information.
[0137] The master device sends a write command to the write channel, and the parsing module 11 in the logic circuit Q1 of the write channel receives the write command, as shown in process (2.1). During this process, the master device and the accelerator can interact via a bus, such as the AXI bus. The write command indicates the data of the L2P table entry and the address index (e.g., logical address LBA). In response to receiving the write command sent by the master device, the parsing module 11 obtains the address index and the data of the L2P table entry from the write command, and stores them in the cache N1, as shown in process (2.2). Then, the calculation module 12 calculates, based on the address index, one or more memory addresses (e.g., one or more memory cell addresses) in the memory that store the valid data of the L2P table entry. After calculating the one or more memory addresses, the calculation module 12 determines the position of the first bit of the valid data of the L2P table entry in its corresponding memory cell based on the number of bits of the valid data, and stores the one or more memory addresses and the position of the first bit of the valid data in the memory in the multiple caches N1, as shown in process (2.3). In addition, as an example, in the accelerator shown in Figure 3A, the logic circuit Q1 includes a merging unit 14 in addition to the parsing module 11, the calculation module 12, and the packing module 13. The merging unit 14 obtains the valid data of the L2P table entry indicated by the write command from the multiple caches N1 and stores it in the multiple caches N1, as shown in process (2.4).
[0138] Furthermore, since the accelerator exchanges data with the master device and the slave device via a bus, the exchanged data must follow the data transmission method defined by the bus protocol (such as the AXI protocol). For example, the bus protocol requires that data be transmitted through the bus in a byte-aligned manner; that is, the transmitted data bit width must be an integer multiple of bytes, or an integer multiple of 8 bytes. Consequently, the accelerator provided in this application needs to transmit the valid data of the L2P table entry to the slave device in a byte-aligned manner. If the valid data of the L2P table entry is not byte-aligned, or its first bit is not located at the beginning of its corresponding memory cell in the memory, the accelerator also needs to perform a read-modify-write operation through the read channel. The read-modify-write operation includes processes (2.5) to (2.12). As an example, in response to the valid data not being byte-aligned, or being byte-aligned but with its first bit not located at the start position of its corresponding storage cell in the memory, the read channel generates one or more read commands based on the one or more memory addresses, as shown in process (2.5); the read channel sends the one or more read commands to the memory, as shown in processes (2.6) to (2.7). 
In response to receiving response data from the memory based on each read command, the packing module 13 combines the valid data of the L2P table entry with a portion of the data in the response data according to the position of the first bit in the memory; it generates data 1 based on the first protocol information stored in the cache and the combined data, wherein the first protocol information is AXI protocol information, data 1 contains protocol information and valid data of the L2P table entry, and sends data 1 to the memory so that the memory stores the valid data of the L2P table entry in data 1, as shown in processes (2.8) to (2.12). For example, the length of the L2P table entry is 64 bits, the valid data of the L2P table entry is 30 bits, the first 30 bits of data 1 are the valid data of the L2P table entry, and the last 34 bits are a portion of the data in the response data. In addition, after writing the valid data of the L2P table entry indicated by the write command into the memory, the write channel also sends feedback information to the master device to indicate that the write command processing is complete, which is represented as process (2.13).
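The read-modify-write operation described above can be sketched in software as follows. This is a simplified model under stated assumptions, not the circuit itself: the memory is modeled as a Python list of 64-bit integers, bit 0 is taken to be the least significant bit of a storage unit, entry LBA is assumed to start at bit offset LBA × N, and the names `rmw_write` and `read_entry` are hypothetical. Protocol information such as AXI signalling is omitted.

```python
CELL_BITS = 64
CELL_MASK = (1 << CELL_BITS) - 1


def rmw_write(memory: list, lba: int, valid: int, n_bits: int) -> None:
    """Write n_bits of valid data for entry lba, preserving neighbouring bits."""
    cell, first_bit = divmod(lba * n_bits, CELL_BITS)
    value = valid & ((1 << n_bits) - 1)
    remaining = n_bits
    while remaining > 0:
        take = min(remaining, CELL_BITS - first_bit)  # bits that fit in this unit
        mask = ((1 << take) - 1) << first_bit
        old = memory[cell]                            # read: response data
        chunk = (value & ((1 << take) - 1)) << first_bit
        memory[cell] = (old & ~mask & CELL_MASK) | chunk  # modify, then write back
        value >>= take
        remaining -= take
        cell, first_bit = cell + 1, 0                 # continue in the next unit


def read_entry(memory: list, lba: int, n_bits: int) -> int:
    """Read back the valid data of entry lba (n_bits <= CELL_BITS assumed)."""
    cell, first_bit = divmod(lba * n_bits, CELL_BITS)
    window = memory[cell]
    if cell + 1 < len(memory):
        window |= memory[cell + 1] << CELL_BITS
    return (window >> first_bit) & ((1 << n_bits) - 1)


if __name__ == "__main__":
    mem = [CELL_MASK, CELL_MASK]
    rmw_write(mem, 2, 0x2AAAAAAA, 30)
    print(hex(read_entry(mem, 2, 30)), hex(read_entry(mem, 1, 30)))
```

In this sketch the entry at index 2 straddles two storage units, so two units are read and rewritten, while the bits belonging to the neighbouring entries are left unchanged.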
[0139] As another example, if the valid data of the L2P table entry indicated by the write command is byte-aligned and its first bit is located at the beginning of its corresponding storage unit, the packing module 13 directly obtains the valid data of the L2P table entry, generates data 1 from the valid data of the L2P table entry and the first protocol information, and sends data 1 to the memory according to the memory address, as represented by processes (2.10) to (2.12). That is, in the solution provided by the embodiments of this application, the read-modify-write operation (processes (2.5) to (2.12)) is performed only when the valid data of the L2P table entry is not byte-aligned or its first bit is not located at the beginning of its corresponding storage unit in the memory.
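The decision between a direct write and a read-modify-write operation can be expressed as a short predicate. The sketch below is one possible reading of the condition: "byte-aligned" is interpreted as the number of valid bits being a whole number of bytes, and the function name `needs_read_modify_write` is hypothetical.

```python
def needs_read_modify_write(first_bit: int, n_bits: int) -> bool:
    """True when the valid data cannot be written directly to its storage unit."""
    byte_aligned = n_bits % 8 == 0     # assumed meaning of byte alignment
    at_cell_start = first_bit == 0     # first bit at the start of the storage unit
    return not (byte_aligned and at_cell_start)


if __name__ == "__main__":
    print(needs_read_modify_write(0, 32), needs_read_modify_write(31, 30))
```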
[0140] Furthermore, in the solution provided in this application embodiment, during the processing of write commands, the accelerator can perform a write operation immediately after receiving a write command, writing the valid data of the L2P table entry indicated by the write command into the memory. To save bandwidth resources or reduce the amount of data accessed to the memory, the accelerator can also wait to receive multiple L2P table entries indicated by write commands before performing a write operation to write the valid data of multiple L2P table entries into the memory. As another example, after the accelerator waits to receive multiple L2P table entries indicated by write commands, the merging unit 14 can concatenate the valid data of multiple L2P table entries to obtain one or more concatenated data sets, and then the accelerator can perform a write operation to write one or more concatenated data sets into the memory (the specific concatenation process is described below); alternatively, the accelerator can directly write the valid data of multiple L2P table entries into the memory based on the memory address corresponding to each L2P table entry and the position of the first bit of the valid data in the memory.
[0141] Furthermore, as an example, there may be more than one L2P table entry to be written to memory. The master device writes multiple L2P table entries to memory by sending multiple write commands, and the accelerator can process these write commands one by one or in parallel. The parallel processing mechanism of the accelerator is explained below using two write commands sent by the master device as an example: write command A1, which instructs writing L2P table entry 120 to memory, and write command A2, which instructs writing L2P table entry 121 to memory.
[0142] Figure 3B shows a schematic diagram of the parallel processing of multiple write commands by an accelerator, as provided in an embodiment of this application.
[0143] As an example, in Figure 3B, the master device sends a write command A1 to the write channel, as shown in process (3.1). In response to receiving the write command A1 from the master device, logic circuit Q1 in the write channel obtains address index 1 and the data of L2P table entry 120 from the write command A1, and stores address index 1 and the data of L2P table entry 120 into cache N1. Then, based on address index 1, it calculates one or more memory addresses (such as one or more memory cell addresses) in memory that store the valid data of L2P table entry 120. After calculating the one or more memory addresses, it also determines the position of the first bit of the valid data of L2P table entry 120 in its corresponding memory cell based on the number of bits of the valid data in L2P table entry 120, and stores the one or more memory addresses, the position of the first bit of the valid data in memory, and L2P table entry 120 into cache N1, as shown in process (3.2).
[0144] After the one or more memory addresses corresponding to write command A1, the position of the first bit of the valid data of L2P table entry 120 in its corresponding memory cell, and L2P table entry 120 have been stored in the cache, the write channel receives write command A2 from the master device, as shown in process (3.3). After receiving write command A2, regardless of whether the write channel has completed the operation of writing the valid data of L2P table entry 120 to memory, the logic circuit Q1 in the write channel obtains address index 2 and L2P table entry 121 from the write command A2; determines one or more memory addresses and the position of the first bit of the valid data of L2P table entry 121 in memory according to address index 2 and the number of bits of the valid data of L2P table entry 121; stores the mapping relationship between the identification information of write command A2 and memory address 2 and its position in cache N1; and stores the valid data of L2P table entry 121 in cache N1, as shown in process (3.4).
[0145] According to an embodiment of this application, after receiving write command A2, regardless of whether the operation of writing the valid data of L2P table entry 120 to memory is completed, the accelerator can still process the received write command A2, parse it to obtain address index 2 and L2P table entry 121, and determine one or more memory addresses and the position of the first bit of the valid data of L2P table entry 121 in memory based on address index 2 and the number of bits of valid data in L2P table entry 121. Thus, the accelerator has the ability to process multiple write commands issued by the master device in parallel. Although the example of Figure 3B uses two write commands, A1 and A2, issued by the master device for illustration, it should be understood that the accelerator can process a larger number of write commands from the master device in parallel.
[0146] Furthermore, in the solution provided in this application embodiment, during the processing of multiple write commands, the accelerator can perform a write operation immediately after receiving a write command, writing the valid data of the L2P table entry indicated by the write command into the memory. To save bandwidth resources or reduce the amount of data accessed to memory, the accelerator can also wait to receive multiple L2P table entries indicated by multiple write commands before performing a write operation to write the valid data of multiple L2P table entries into memory. As another example, after waiting to receive multiple L2P table entries indicated by multiple write commands, the accelerator can concatenate the valid data of multiple L2P table entries to obtain one or more concatenated data sets, and then write one or more concatenated data sets into memory by performing a single write operation (the specific concatenation process is described below); alternatively, it can avoid concatenating the valid data of multiple L2P table entries and directly write the valid data of multiple L2P table entries into memory based on the memory address corresponding to each L2P table entry and the position of the first bit of the valid data in memory.
[0147] Furthermore, as described above with reference to Figure 3A, during the process of writing the valid data of L2P table entries indicated by write commands into memory, if the valid data of an L2P table entry is not byte-aligned, or is byte-aligned but its first bit is not located at the beginning of its corresponding memory cell, the accelerator needs to perform a read-modify-write operation. When the accelerator processes multiple write commands in parallel, it may perform read-modify-write operations for multiple independent write commands, or for multiple write commands whose indicated L2P table entries can be concatenated. For ease of understanding, the following briefly introduces the read-modify-write operation for multiple write commands whose indicated L2P table entries can be concatenated, using write commands A1 and A2 as examples. Specifically, L2P table entry 120 indicated by write command A1 and L2P table entry 121 indicated by write command A2 can be concatenated.
[0148] As an example, in the process of Figure 3B, after logic circuit Q1 determines the one or more memory addresses and the position in memory of the first bit of the valid data of L2P table entry 120, based on address index 1 corresponding to write command A1 and the number of bits of the valid data, if the valid data of L2P table entry 120 is not byte-aligned, or is byte-aligned but its first bit is not located at the beginning of its corresponding memory cell, then the read channel generates read command B11 and read command B12 based on the one or more memory addresses determined by logic circuit Q1. The identification information of write command A1 is ID1, that of read command B11 is ID1_1, and that of read command B12 is ID1_2. After the read channel generates read command B11 and read command B12, the accelerator stores the mapping relationship between the identification information of write command A1 and the identification information of read commands B11 and B12 in cache N1, for example as <ID1→ID1_1,ID1_2>; this is represented as process (3.5). Further, after logic circuit Q1 determines the one or more memory addresses and the position in memory of the first bit of the valid data of L2P table entry 121, based on address index 2 corresponding to write command A2 and the number of bits of the valid data of L2P table entry 121, if the valid data of L2P table entry 121 is not byte-aligned, or is byte-aligned but its first bit is not located at the beginning of its corresponding memory cell, the read channel generates read command B21 and read command B22 based on the one or more memory addresses determined by the write channel. The identification information of write command A2 is ID2, that of read command B21 is ID2_1, and that of read command B22 is ID2_2. 
After the read channel generates read command B21 and read command B22, the accelerator stores the mapping relationship between the identification information of write command A2 and the identification information of read commands B21 and B22 in cache N1, for example as <ID2→ID2_1,ID2_2>; this is represented as process (3.6). At this time, both the mapping relationship between the identification information of write command A1 and that of read commands B11 and B12, and the mapping relationship between the identification information of write command A2 and that of read commands B21 and B22, are stored in cache N1.
[0149] Furthermore, logic circuit Q1 extracts the valid data of L2P table entry 120 and of L2P table entry 121 from the entries stored in the cache, concatenates them, and stores the concatenated valid data of L2P table entries 120 and 121 in cache N1. The read channel can generate one or more new read commands based on the memory addresses corresponding to the concatenated valid data. For example, the memory addresses corresponding to the concatenated valid data of L2P table entries 120 and 121 include two different memory addresses, and the read channel generates read command C1 and read command C2 based on these two addresses. The identification information of read command C1 is ID3_1, and that of read command C2 is ID3_2. Based on the identification information of read commands C1 and C2, the cached mapping relationships between the identification information of write command A1 and that of read commands B11 and B12, and between the identification information of write command A2 and that of read commands B21 and B22, are updated. It should be understood that in the updated mapping relationships, the identification information of the read command corresponding to write command A1 may be the same as or different from that corresponding to write command A2. For example, the updated mapping relationships are <ID1→ID3_1> and <ID2→ID3_2>. In order to replace read commands B11, B12, B21, and B22 with read commands C1 and C2, it is necessary to wait for a period of time (denoted as threshold Tw) after generating read commands B11 and B12, so as to identify whether another write command, such as write command A2, that can be merged with write command A1 has been received. 
The read channel sends read commands C1 and C2 to the memory. In response to receiving the response data of read commands C1 and C2 from the memory, logic circuit Q1 combines the concatenated valid data of L2P table entries 120 and 121 with a portion of the response data, and sends the combined data along with the first protocol information to the memory. The memory receives the combined data and stores it. The valid data of L2P table entry 120 is stored in the memory at memory address 1 and its corresponding position; the valid data of L2P table entry 121 is stored in the memory at memory address 2 and its corresponding position, as represented by processes (3.7) to (3.14). Processes (3.7) to (3.14) constitute the read-modify-write operation for multiple write commands whose indicated L2P table entries can be concatenated. In addition, after writing the valid data of L2P table entry 120 indicated by write command A1 and the valid data of L2P table entry 121 indicated by write command A2 into the memory, the write channel also sends feedback information to the master device to indicate that the write command processing is complete, which is represented as process (3.15).
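The replacement of the per-command read commands by merged read commands, together with the update of the cached mapping relationships, can be sketched as follows. The sketch assumes one read command per distinct storage unit of 64 bits and an entry start offset of LBA × N; the function names, the tag `ID3`, and the data layout are hypothetical and serve only to illustrate the bookkeeping.

```python
CELL_BITS = 64


def cells_for(start_bit: int, n_bits: int) -> list:
    """Storage units touched by n_bits of data starting at start_bit."""
    return list(range(start_bit // CELL_BITS,
                      (start_bit + n_bits - 1) // CELL_BITS + 1))


def merge_read_commands(writes: dict, merged_tag: str = "ID3"):
    """writes maps write-command ID -> (start_bit, n_bits).

    Returns (storage unit -> merged read ID, write ID -> merged read IDs).
    """
    cells = sorted({c for start, n in writes.values() for c in cells_for(start, n)})
    read_id = {cell: f"{merged_tag}_{i + 1}" for i, cell in enumerate(cells)}
    mapping = {wid: [read_id[c] for c in cells_for(start, n)]
               for wid, (start, n) in writes.items()}
    return read_id, mapping


if __name__ == "__main__":
    n = 30
    reads, mapping = merge_read_commands({"ID1": (120 * n, n), "ID2": (121 * n, n)})
    print(reads)
    print(mapping)
```

With these assumed offsets, the two write commands together touch two storage units, so two merged read commands suffice. Whether a given write command maps to one or to both merged read commands depends on where its valid data falls, which is consistent with the statement that the read command identifiers of the two write commands may be the same or different.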
[0150] As another example, instead of replacing read commands B11, B12, B21, and B22 with read commands C1 and C2, after generating read commands B21 and B22 the accelerator checks whether the data to be read by read commands B21 and B22 can already be obtained through read commands B11 and/or B12. If the data read by read commands B11/B12 already contains the data to be read by read commands B21/B22, then read commands B21/B22 do not need to be sent to the memory; only read commands B11/B12 are sent. This reduces bus bandwidth usage and the memory access load. Understandably, if the data to be read by only one of read commands B21/B22 is contained in the data read by read commands B11/B12, then only the read command whose data is not covered by the earlier read commands (B11/B12) is sent to the memory. Correspondingly, the cached mapping relationships do not need to be updated; they remain: the identification information of write command A1 → the identification information of read commands B11 and B12, and the identification information of write command A2 → the identification information of read commands B21 and B22.
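The filtering of redundant read commands can be sketched as a set-membership check. In this sketch each read command is assumed to fetch exactly one storage unit, identified by its address; the function name `reads_to_send` and the example addresses are hypothetical.

```python
def reads_to_send(earlier_cells: set, new_cells: list) -> list:
    """Return the storage units that still need a read command.

    earlier_cells: units already covered by earlier read commands (e.g. B11/B12).
    new_cells: units wanted by later read commands (e.g. B21/B22).
    """
    return [cell for cell in new_cells if cell not in earlier_cells]


if __name__ == "__main__":
    print(reads_to_send({56, 57}, [56, 57]))  # both already covered
    print(reads_to_send({56, 57}, [57, 58]))  # only one more read needed
```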
[0151] As another example, if the valid data of L2P table entry 120 indicated by write command A1 is byte-aligned, and the first bit of its valid data is located at the beginning of its corresponding memory cell, then the read channel does not need to generate one or more read commands based on write command A1. The cache still needs to store the identification information of write command A1 and the information that no read command has been generated, for example, <Identifier of write command A1, no read command generated>. Similarly, if the valid data of L2P table entry 121 indicated by write command A2 is byte-aligned, and the first bit of its valid data is located at the beginning of its corresponding memory cell, then the cache still needs to store the identification information of write command A2 and the information that no read command has been generated, for example, <Identifier of write command A2, no read command generated>.
[0152] As another example, during the parallel processing of write commands A1 and A2, after issuing read commands B11 and B12 to the memory, if the waiting time (threshold Tw) for merging write commands is exceeded, or if a write command corresponding to a read-modify-write operation of write command A1 has already been issued to the memory, in response to generating read commands B21 and B22 based on memory address 2, the accelerator identifies a conflict between the memory addresses corresponding to read commands B11 and B12 and the memory addresses indicated by read commands B21 and B22. That is, there is a conflict between the memory addresses corresponding to read commands B11 and B12 and one or more memory addresses 2 corresponding to write command A2. For example, if the memory address indicated by read command B11 is the same as a memory address 2, the accelerator suspends the processing of write command A2 to avoid mutual interference between write commands A1 and A2, leading to data errors. Furthermore, in response to suspending the processing of write command A2, the accelerator also suspends the processing of other write commands received after write command A2. As another example, in response to receiving information that write command A1 has been completed, such as information that valid data in L2P table entry 120 has been written to memory, the accelerator resumes processing of the suspended write command A2.
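The conflict handling described above (suspend a write command whose memory addresses overlap those of an in-flight write command, suspend every later write command as well, and resume once the earlier command completes) can be sketched as a small scheduler. The class `WriteScheduler` and its methods are hypothetical, and memory addresses are modeled as integer storage-unit indices.

```python
from collections import deque


class WriteScheduler:
    """Toy model of suspend/resume on memory-address conflicts."""

    def __init__(self):
        self.in_flight = {}       # write ID -> set of storage units being modified
        self.suspended = deque()  # (write ID, units) in arrival order

    def _busy(self) -> set:
        return set().union(*self.in_flight.values())

    def submit(self, write_id: str, cells) -> bool:
        """Return True if processing starts, False if the command is suspended."""
        cells = set(cells)
        # Later commands are also held back while any command is suspended.
        if self.suspended or (cells & self._busy()):
            self.suspended.append((write_id, cells))
            return False
        self.in_flight[write_id] = cells
        return True

    def complete(self, write_id: str) -> list:
        """Mark a command complete and resume suspended commands in order."""
        del self.in_flight[write_id]
        resumed = []
        while self.suspended and not (self.suspended[0][1] & self._busy()):
            wid, cells = self.suspended.popleft()
            self.in_flight[wid] = cells
            resumed.append(wid)
        return resumed


if __name__ == "__main__":
    s = WriteScheduler()
    print(s.submit("A1", [56, 57]), s.submit("A2", [57, 58]), s.submit("A3", [100]))
    print(s.complete("A1"))
```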
[0153] As another example, in response to receiving information that write command A1 has been completed, the accelerator resumes processing of the suspended write command A2. For instance, after the write channel receives information that the valid data of L2P table entry 120 has been written to memory, it controls the read channel to send read commands B21 and B22 corresponding to write command A2 to the memory. Alternatively, after the write channel has received all the response data for read commands B11 and B12 from the memory, it sends the one or more memory addresses corresponding to write command A2 to the read channel to generate read commands B21 and B22. The read channel then sends read commands B21 and B22 to the memory.
[0154] Figure 3C shows how the valid data of the L2P table entry indicated by the write command is combined with the response data fed back by the memory, as provided in an embodiment of this application.
[0155] As an example, in Figure 3C, the write command A1 received by the accelerator indicates L2P table entry 120, which is 64 bits in size and has 30 bits of valid data. The memory cell storing the valid data of the L2P table entry is also 64 bits (0-63). The valid data of L2P table entry 120 is stored in bits 31 to 60 of the first memory cell. This means the valid data of L2P table entry 120 is not byte-aligned, and the first bit of the valid data is located at bit 31 of the first memory cell, not at the beginning of its corresponding memory cell. Therefore, when the accelerator writes the valid data of L2P table entry 120 into memory, a read-modify-write operation is required. The read channel generates a read command based on the address of the first storage cell in the memory and sends the read command to the memory controller. The memory controller reads 64 bits of data from the first storage cell in the memory and sends it to the accelerator. The accelerator combines the valid data of L2P table entry 120 with the data from bits 0 to 30 and the data from bits 61 to 63 of the first storage cell to obtain data A. The valid data of L2P table entry 120 is located in bits 31 to 60 of data A.
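The combination described for Figure 3C can be illustrated with bit masks. In the sketch below, bit 0 is assumed to be the least significant bit of the 64-bit storage cell, and the names `combine`, `FIELD_MASK`, and `KEEP_MASK` are hypothetical.

```python
CELL_BITS = 64
FIRST_BIT = 31   # first bit of the valid data in the storage cell
N_BITS = 30      # valid data occupies bits 31 to 60

FIELD_MASK = ((1 << N_BITS) - 1) << FIRST_BIT     # bits 31..60
KEEP_MASK = ((1 << CELL_BITS) - 1) ^ FIELD_MASK   # bits 0..30 and 61..63


def combine(response: int, valid: int) -> int:
    """Build data A: response data outside bits 31..60, valid data inside."""
    return (response & KEEP_MASK) | ((valid << FIRST_BIT) & FIELD_MASK)


if __name__ == "__main__":
    data_a = combine((1 << CELL_BITS) - 1, 0)
    print(f"{data_a:064b}")
```

Only the 30 bits of the field change; the 31 low-order bits and the 3 high-order bits of the response data are carried over unchanged into data A.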
[0156] Figure 4 shows a schematic diagram of another accelerator provided in an embodiment of this application.
[0157] As an example, in the write channel of Figure 4, the multiple caches N1 include cache 1, cache 2, cache 3, cache 4, cache 5, cache 6, cache 7, and cache 8. Cache 1 caches the address index obtained by parsing the write command; cache 2 caches the data to be written (the complete data of the L2P table entry) obtained by parsing the write command; cache 3 caches the one or more memory addresses, and the position in memory of the first bit of the valid data of the L2P table entry, determined based on the address index and the number of valid data bits; cache 4 caches valid data byte alignment information; cache 5 caches protocol information; cache 6 is coupled to cache 2 and caches the valid data of the L2P table entry indicated by the write command; cache 7 is coupled to cache 6 and caches the valid data of the L2P table entry; cache 8 is coupled to logic circuit Q1 and caches the response data of the read commands sent by the read channel. In addition, cache 3, cache 4, or cache 5 among the multiple caches N1 can also store the identification information of the one or more read commands generated by the read channel, so that when the logic circuit receives the response data fed back for each read command, it can determine, according to the identification information, which valid data of an L2P table entry is to be combined with that response data.
[0158] Further, referring to Figure 4, the multiple caches N1 of the write channel also include cache 9 and cache 10. When the accelerator processes multiple write commands in parallel, cache 9 caches the mapping between the identification information of a write command and the memory address and the position in memory of the first bit of the valid data of the L2P table entry; cache 10 caches the identification information of a write command and the identification information of its corresponding one or more read commands. For example, in Figure 4, cache 9 can be any of cache 3, cache 4, or cache 5; or it can be a cache other than cache 1, cache 2, cache 3, cache 4, cache 5, cache 6, cache 7, and cache 8. Similarly, cache 10 can be any of cache 3, cache 4, or cache 5; or it can be a cache other than cache 1, cache 2, cache 3, cache 4, cache 5, cache 6, cache 7, and cache 8.
[0159] As another example, when the accelerator receives a write command and performs a write operation, cache 6 stores the valid data of one or more individual L2P table entries. When the accelerator performs a write operation on multiple received write commands, cache 6 can store the valid data of individual L2P table entries, or it can store one or more concatenated data sets of valid data from multiple L2P table entries. Here, the valid data of an individual L2P table entry is relative to the concatenated data; "individual" means that the valid data of multiple L2P table entries are not concatenated (instead, the valid data of each L2P table entry occupies cache 6 in a time-sharing manner, and at any given time, cache 6 only holds the valid data of a single L2P table entry; when the valid data of the L2P tables of two write commands cannot be concatenated, cache 6 stores the valid data of a single L2P table entry). As another example, in order to store one or more concatenated data sets in cache 6, cache 6 includes one or more storage units.
[0160] The size of the storage units in cache 6 is the same as the size of the storage units in memory; furthermore, the position at which the concatenated data is stored within a storage unit of cache 6 is the same as the position at which the valid data of each L2P table entry is stored within the corresponding storage unit in memory. For example, if in memory the valid data of L2P table entry 120 is stored in bits 0 to 29 of a certain storage unit and the valid data of L2P table entry 121 is stored in bits 30 to 59 of that storage unit (see also Figure 3C), then in cache 6 the valid data of L2P table entry 120 is stored in bits 0 to 29 of its first storage unit, and the valid data of L2P table entry 121 is stored in bits 30 to 59 of its first storage unit.
[0161] As another example, in Figure 4, the multiple caches N1 may include cache 1, cache 2, cache 6, and cache 7. Cache 6 is coupled to cache 2 and caches the valid data of the L2P table entry indicated by the write command. The one or more memory addresses, the location of the first bit of the valid data in memory, the valid data byte alignment information, the protocol information, and the data read from memory for read-modify-write operations can be stored separately. The valid data byte alignment information and the protocol information are known to the accelerator in advance.
[0162] As a further example, in Figure 4, cache 7 comprises multiple storage units. The size of the storage units in cache 7 is, for example, the same as that in cache 6. Valid data of L2P table entries that requires a read-modify-write operation is placed in cache 7, where it waits for the data read from memory. Valid data of L2P table entries that does not require a read-modify-write operation can be written to memory after entering cache 7.
[0163] For ease of understanding, the working process of the accelerator is briefly described below with reference to Figure 4.
[0164] In Figure 4, the master device sends write command A1 to the accelerator. Upon receiving write command A1, the parsing module 11 in logic circuit Q1 parses it to obtain address index 1 and the data of L2P table entry 120, stores address index 1 in cache 1, and stores the data of L2P table entry 120 in cache 2. The calculation module 12 in logic circuit Q1, which is coupled to cache 1, then obtains address index 1 from cache 1 and, based on address index 1 and the number of valid data bits, calculates the one or more memory addresses 1 at which the valid data of L2P table entry 120 is stored in memory and the position 1 of the first bit of that valid data in memory. It stores the one or more memory addresses 1 and position 1 in cache 3, and stores the mapping between the identification information of write command A1 and memory address 1 and position 1 in cache 9. The merging unit 14 retrieves the valid data of L2P table entry 120 from cache 2 and stores it in cache 6. During this process, the merging unit 14 may also concatenate the valid data of multiple L2P table entries to obtain one or more concatenated data in cache 6. The merging unit 14 then moves the valid data or concatenated data in cache 6 to cache 7.
[0165] As another example, during the accelerator's processing of write command A1, the master device sends write command A2 to the accelerator. Regardless of whether write command A1 has been completed, the parsing module 11 parses write command A2 to obtain address index 2 and L2P table entry 121 data, and stores address index 2 in cache 1; and stores L2P table entry 121 data in cache 2; then the calculation module 12 retrieves address index 2 from cache 1, and calculates one or more memory addresses 2 storing the valid data of L2P table entry 121 in the memory, and the position 2 of the first bit of the valid data in the memory, based on address index 2 and the number of valid data bits, and stores one or more memory addresses 2 and the position 2 of the first bit of the valid data in the memory in cache 3; and stores the mapping relationship between the identification information of write command A2 and memory address 2 and position 2 in cache 9.
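The calculation performed by the calculation module 12 can be illustrated with a minimal sketch. It assumes, purely for illustration, that valid data is packed back to back with no padding, that a storage unit is 64 bits wide, and that a "memory address" is simply a storage-unit number counted from a hypothetical base; the names `locate_entry`, `CELL_BITS`, and `base_cell` do not come from this application.

```python
from typing import List, Tuple

CELL_BITS = 64  # assumed storage-unit width, matching the 64-bit units in the examples


def locate_entry(address_index: int, valid_bits: int,
                 base_cell: int = 0) -> Tuple[List[int], int]:
    """Return (storage-unit numbers, first-bit position) for one L2P entry.

    Assumption: entry i starts at absolute bit i * valid_bits, i.e. the
    valid data of consecutive entries is packed without gaps.
    """
    if not 1 <= valid_bits <= CELL_BITS:
        raise ValueError("valid_bits must be between 1 and CELL_BITS")
    first_bit = address_index * valid_bits
    last_bit = first_bit + valid_bits - 1
    first_cell = first_bit // CELL_BITS
    last_cell = last_bit // CELL_BITS
    # An entry touches one storage unit, or two when it straddles a boundary.
    cells = [base_cell + c for c in range(first_cell, last_cell + 1)]
    return cells, first_bit % CELL_BITS
```

Under these assumptions, with 30 valid bits per entry, the entries at indices 0 and 1 share one storage unit at bit positions 0 and 30, while the entry at index 2 starts at bit 60 and spills into the next storage unit.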
[0166] Furthermore, in response to L2P table entry 121 in cache 2 not being concatenable with L2P table entry 120 stored in cache 6, the merging unit 14 moves the valid data of L2P table entry 120 to cache 7; in response to L2P table entry 121 in cache 2 being concatenable with L2P table entry 120 stored in cache 6, the merging unit 14 writes the valid data of L2P table entry 121 to cache 6, and then moves the concatenated valid data of L2P table entries 120 and 121 from cache 6 to cache 7; or, in response to the valid data of L2P table entry 120 and the valid data of L2P table entry 121 having been moved to cache 7 and no read commands corresponding to write commands A1 and A2 existing, the merging unit 14 writes the valid data of L2P table entries 120 and 121 from cache 7 into memory.
[0167] When no read-modify-write operation is required during the processing of write command A1 and / or write command A2, the packing module 13 retrieves valid data or concatenated data from cache 7 and protocol information from cache 5. It then generates a new data set by combining the valid data or concatenated data with the protocol information and sends this new data to the memory controller. The memory controller then sends this new data to the memory. Additionally, after sending this new data to the memory controller, the accelerator sends a feedback message to the master device indicating that the write command processing is complete. For example, after generating this new data, the packing module 13 can add a marker to the new data to obtain processed data, which is then sent to the memory. The marker identifies the position of the last bit in the processed data. Furthermore, the calculation module 12 stores the identification information of write command A1 and information indicating that one or more read commands have not been generated in cache 10; and / or the calculation module 12 stores the identification information of write command A2 and information indicating that one or more read commands have not been generated in cache 10.
[0168] Furthermore, when a read-modify-write operation needs to be performed during the processing of write command A1 and / or write command A2, the read-modify-write operation needs to be completed by combining the read channel and the write channel. The detailed process of how each module in the read channel cooperates with each module in the write channel to complete the read-modify-write operation is described below and will not be described here.
[0169] Figure 5A shows a schematic diagram of concatenating valid data from multiple L2P table entries, as provided in an embodiment of this application.
[0170] As an example, in Figure 5A, cache 6 is 128 bits in size and comprises two storage units, storage unit 1 and storage unit 2, each 64 bits in size. Storage unit 1 covers bits 0 to 63, and storage unit 2 covers bits 64 to 127. After receiving the data of L2P table entries 120 and 121, the accelerator performs a write operation that writes the valid data of L2P table entries 120 and 121 into memory. L2P table entries 120 and 121 are adjacent in the L2P table, with entry 120 preceding entry 121; both are 64 bits in size, each with M bits of valid data, where 1 ≤ M ≤ 64. Suppose the storage units corresponding to L2P table entries 120 and 121 are the same storage unit in memory, the first bit of the valid data of L2P table entry 120 is bit 0 of that storage unit, and the first bit of the valid data of L2P table entry 121 is bit M of that storage unit. If the accelerator receives L2P table entry 120 before L2P table entry 121, the merging unit extracts the valid data of L2P table entry 120 from cache 2 and stores it in bits 0 to M-1 of storage unit 1. After extracting the valid data of L2P table entry 121, it stores it in bits M to 2M-1 of storage unit 1 to obtain one concatenated data. The size of the concatenated data is 2M bits, and its position in storage unit 1 is bits 0 to 2M-1. It should be understood that when the storage units corresponding to L2P table entries 120 and 121 are the same storage unit in memory, the first bit of the valid data of L2P table entry 120 need not be located at bit 0 of the storage unit, but can be at any bit L of the storage unit, where 1 ≤ L ≤ 63-2M. This case is not elaborated here.
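The same-storage-unit concatenation of Figure 5A can be sketched with ordinary bit operations. This is only an illustration under stated assumptions: a storage unit is modeled as a 64-bit integer with bit 0 as the least significant bit, M = 30 is chosen as a sample width, and the helper `place` and the sample values are hypothetical.

```python
CELL_BITS = 64  # assumed storage-unit width


def place(unit: int, value: int, offset: int, width: int) -> int:
    """Write the low `width` bits of `value` into `unit` starting at bit `offset`.

    Bits of `unit` outside [offset, offset + width) are left unchanged.
    """
    if offset < 0 or width < 1 or offset + width > CELL_BITS:
        raise ValueError("field does not fit in one storage unit")
    mask = ((1 << width) - 1) << offset
    return (unit & ~mask) | ((value << offset) & mask)


M = 30                   # sample number of valid bits per entry
entry_120 = 0x2AAAAAAA   # sample 30-bit valid data (hypothetical)
entry_121 = 0x15555555   # sample 30-bit valid data (hypothetical)

unit_1 = place(0, entry_120, 0, M)       # bits 0 to M-1
unit_1 = place(unit_1, entry_121, M, M)  # bits M to 2M-1
```

After both calls, `unit_1` holds one concatenated data of 2M = 60 bits at bits 0 to 59, and the order in which the two entries arrive does not change the result because each entry has a fixed position.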
[0171] Figure 5B shows another schematic diagram of concatenating valid data from multiple L2P table entries, as provided in an embodiment of this application.
[0172] As an example, in Figure 5B, L2P table entries 120 and 121 are adjacent in the L2P table, with entry 120 preceding entry 121. Both entries are 64 bits in size, and each has M bits of valid data, where 1 ≤ M ≤ 64. Suppose the storage units corresponding to L2P table entries 120 and 121 are different storage units in memory: the first bit of the valid data of L2P table entry 120 is located at bit 63-M of one storage unit, and the first bit of the valid data of L2P table entry 121 is located at bit 0 of another storage unit. The two storage units are adjacent in memory. If the accelerator receives L2P table entry 120 before L2P table entry 121, the merging unit extracts the valid data of L2P table entry 120 from cache 2 and stores it in bits 63-M to 63 of storage unit 1. After extracting the valid data of L2P table entry 121, it stores it in bits 0 to M-1 of storage unit 2, resulting in two concatenated data. The concatenated data stored in storage unit 1 is M bits in size and occupies bits 63-M to 63 of storage unit 1. The concatenated data stored in storage unit 2 is M bits in size and occupies bits 0 to M-1 of storage unit 2.
[0173] Figure 5C shows another schematic diagram of concatenating valid data from multiple L2P table entries, as provided in an embodiment of this application.
[0174] As an example, in Figure 5C, the storage units corresponding to L2P table entries 120 and 121 are partially the same. For example, one portion of the valid data of L2P table entry 120 is stored in one storage unit, while the other portion is stored in another storage unit together with the valid data of L2P table entry 121. For instance, the first Q bits of the valid data of L2P table entry 120 are stored in the preceding storage unit, and the remaining M-Q bits are stored in the same storage unit as the valid data of L2P table entry 121, where 1 ≤ Q ≤ M. In this case, when the merging unit concatenates the valid data of L2P table entries 120 and 121, it obtains two concatenated data. One, stored in storage unit 1 of cache 6, is the first Q bits of the valid data of L2P table entry 120. The other, stored in storage unit 2 of cache 6, is obtained by concatenating the last M-Q bits of the valid data of L2P table entry 120 with the valid data of L2P table entry 121.
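The split of one entry across two storage units in Figure 5C can be sketched as follows. The sketch assumes that the "first Q bits" are the low-order Q bits of the valid data, that storage units are 64 bits wide, and that entry 120 starts at bit 60 of storage unit 1 with M = 30 (so Q = 4); the helpers `place` and `split_entry` and the sample values are hypothetical.

```python
CELL_BITS = 64  # assumed storage-unit width


def place(unit: int, value: int, offset: int, width: int) -> int:
    """Write the low `width` bits of `value` into `unit` at bit `offset`."""
    if offset < 0 or width < 1 or offset + width > CELL_BITS:
        raise ValueError("field does not fit in one storage unit")
    mask = ((1 << width) - 1) << offset
    return (unit & ~mask) | ((value << offset) & mask)


def split_entry(value: int, width: int, offset: int):
    """Split `width` bits of valid data that start at bit `offset` of a unit.

    Returns (q, head, tail): q bits fit in the current unit (head), and the
    remaining width - q bits (tail) go to the start of the next unit.
    """
    q = min(width, CELL_BITS - offset)
    head = value & ((1 << q) - 1)  # assumed: "first Q bits" = low-order bits
    tail = value >> q
    return q, head, tail


M = 30
entry_120 = 0x2AAAAAAA  # sample valid data (hypothetical)
entry_121 = 0x15555555  # sample valid data (hypothetical)

q, head, tail = split_entry(entry_120, M, offset=60)
unit_1 = place(0, head, 60, q)               # first Q bits of entry 120
unit_2 = place(0, tail, 0, M - q)            # last M-Q bits of entry 120
unit_2 = place(unit_2, entry_121, M - q, M)  # followed by entry 121
```

Here storage unit 1 carries Q bits and storage unit 2 carries 2M-Q bits, which corresponds to the two concatenated data described above.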
[0175] Figure 5D shows another schematic diagram of concatenating valid data from multiple L2P table entries, as provided in an embodiment of this application.
[0176] As an example, in Figure 5D, L2P table entries 120 and 121 are adjacent in the L2P table, with entry 120 preceding entry 121. Both entries are 64 bits in size, and each has M bits of valid data, where 1 ≤ M ≤ 64. Suppose the storage units corresponding to L2P table entries 120 and 121 are partially the same; for example, one portion of the valid data of L2P table entry 120 is stored in the same storage unit as the valid data of L2P table entry 121, and the other portion is stored in the storage unit preceding it (see Figure 3C). For example, the first Q bits of the valid data of L2P table entry 120 are stored in the preceding storage unit, and the remaining M-Q bits are stored in the same storage unit as the valid data of L2P table entry 121, where 1 ≤ Q ≤ M. If the accelerator receives L2P table entry 121 before L2P table entry 120, the merging unit extracts the valid data of L2P table entry 121 from cache 2 and stores it in bits M-Q to 2M-Q-1 of storage unit 1. Since L2P table entry 120 precedes L2P table entry 121 in the L2P table, after extracting the valid data of L2P table entry 120 the merging unit needs to place it before the valid data of L2P table entry 121 in cache 6. However, only M-Q idle/invalid bits precede the valid data of L2P table entry 121 in storage unit 1, while the valid data of L2P table entry 120 is M bits in size. Therefore, the first Q bits of the valid data of L2P table entry 120 cannot be stored in cache 6. The merging unit concatenates the last M-Q bits of the valid data of L2P table entry 120 with the valid data of L2P table entry 121 to obtain one concatenated data in cache 6. The size of the concatenated data is 2M-Q bits, and its position in storage unit 1 is bits 0 to 2M-Q-1.
[0177] Optionally, for the first Q bits of the valid data in L2P table entry 120, after moving the concatenated data in storage unit 1 to cache 7, storage unit 1 is cleared, and the cleared storage unit 1 is used to accommodate the first Q bits of the valid data in L2P table entry 120.
[0178] Figure 5E shows another schematic diagram of concatenating valid data from multiple L2P table entries, as provided in an embodiment of this application.
[0179] As an example, in Figure 5E, L2P table entries 120 and 121 are both 64 bits in size, and each has M bits of valid data, where 1 ≤ M ≤ 64. The valid data of L2P table entries 120 and 121 is stored in the same storage unit, and the first bits of their valid data are located at the same position in that storage unit, for example bit L, where 0 ≤ L ≤ 63-M. That is, L2P table entries 120 and 121 are the same L2P table entry. If the accelerator receives L2P table entry 121 before L2P table entry 120, the merging unit 14 extracts the valid data of L2P table entry 121 from cache 2 and stores it in storage unit 1. After extracting the valid data of L2P table entry 120, the merging unit 14 overwrites the valid data of L2P table entry 121 in storage unit 1 with the valid data of L2P table entry 120 to obtain one concatenated data. This concatenated data is located in storage unit 1, and its size equals that of the valid data of L2P table entry 120 or 121. Similarly, if the accelerator receives L2P table entry 120 before L2P table entry 121, it overwrites the valid data of L2P table entry 120 in storage unit 1 with the valid data of L2P table entry 121 to obtain one concatenated data. That is, the concatenated data is either the valid data of L2P table entry 120 or the valid data of L2P table entry 121.
[0180] Figure 5F shows another schematic diagram of concatenating valid data from multiple L2P table entries, as provided in an embodiment of this application.
[0181] As an example, in Figure 5F, L2P table entries 120 and 121 are both 64 bits in size, and each has M bits of valid data, where 1 ≤ M ≤ 64. The storage units corresponding to L2P table entries 120 and 121 are different, and the two entries are not adjacent in the L2P table. If the accelerator receives L2P table entry 120 before L2P table entry 121, the merging unit 14 extracts the valid data of L2P table entry 120 from cache 2 and stores it in storage unit 1. For example, the valid data of L2P table entry 120 is stored starting from bit L of storage unit 1, where 0 ≤ L ≤ 63-M. Since L2P table entries 121 and 120 are not adjacent in the L2P table, after the merging unit 14 extracts the valid data of L2P table entry 121, that data cannot be stored in cache 6 alongside the valid data of L2P table entry 120. Therefore, the merging unit does not concatenate the valid data of L2P table entry 120 with the valid data of L2P table entry 121. Instead, it stores the valid data of L2P table entry 120 in cache 6 and transfers it from cache 6 to cache 7. Next, the merging unit stores the valid data of L2P table entry 121 in cache 6.
[0182] Then, the valid data of L2P table entry 121 in cache 6 is transferred to cache 7.
[0183] Furthermore, after the logic circuit of the write channel in the accelerator stores the valid data of the L2P table entry, or one or more concatenated data, into cache 6, it also needs to store that data into memory. As mentioned above, since the accelerator and slave devices (such as memory controllers) transmit data via a bus, the transmitted data needs to conform to the bus protocol; for example, the valid data of the transmitted L2P table entry must be byte-aligned or 8-byte aligned. If the valid data of the transmitted L2P table entry is not byte-aligned, or its first bit is not located at the beginning of its corresponding storage unit, a read-modify-write operation needs to be performed. For ease of understanding, the read-modify-write process is briefly described below with reference to the scenarios of Figures 5A to 5F.
[0184] For the scenario shown in Figure 5A, as an example, if the valid data size M of L2P table entries 120 and 121 is not an integer number of bytes (e.g., M = 30), then the valid data of L2P table entries 120 and 121 is not byte-aligned, and a read-modify-write operation is required. In this case, the command generation module in the read channel generates a read command based on the address of the storage unit corresponding to L2P table entries 120 and 121, and sends this read command to the memory. In response to the read command, the memory sends the data (response data) of that storage unit to logic circuit Q1. Logic circuit Q1 combines the concatenated data in storage unit 1 of cache 6 with the response data to obtain the combined data (see Figure 6A), and the combined data is sent to the slave device (such as the memory controller). As another example, if M is an integer number of bytes (e.g., M = 24) but the first bit of the valid data of L2P table entry 120 is not located at bit 0 of the storage unit (e.g., it is located at bit 2), then the valid data does not start at the beginning of its corresponding storage unit. In this case a read-modify-write operation is also required; it is similar to the above and is not elaborated here. As a further example, if M is an integer number of bytes and the first bit of the valid data of L2P table entry 120 is located at bit 0 of the storage unit, then no read-modify-write operation is required. The data of storage unit 1 in cache 7 is sent directly to the slave device so that it is stored in memory.
For the scenario shown in Figure 5E, one concatenated data is also cached in cache 6. The specific process is similar to that of Figure 5A and is not described in detail here.
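The decision and the combining step for the Figure 5A scenario can be sketched as below. The sketch assumes 64-bit storage units modeled as integers with bit 0 as the least significant bit, and treats "byte-aligned" as "a whole number of bytes"; the function names `needs_rmw` and `merge_with_response` are hypothetical and are not part of this application.

```python
CELL_BITS = 64  # assumed storage-unit width


def needs_rmw(offset: int, width: int) -> bool:
    """Return True when writing the field requires a read-modify-write.

    Per the description: required if the data is not a whole number of
    bytes, or if its first bit is not at the beginning of the storage unit.
    """
    return width % 8 != 0 or offset != 0


def merge_with_response(response: int, data: int, offset: int, width: int) -> int:
    """Overlay `width` bits of `data` at bit `offset` onto the response data.

    Bits of the storage unit not covered by the new data keep the values
    that were read back from memory.
    """
    if offset < 0 or width < 1 or offset + width > CELL_BITS:
        raise ValueError("field does not fit in one storage unit")
    mask = ((1 << width) - 1) << offset
    return (response & ~mask) | ((data << offset) & mask)
```

For example, with M = 30 the concatenated data is 60 bits wide, so `needs_rmw(0, 60)` is true and the top 4 bits of the storage unit are taken from the response data; with M = 24 starting at bit 0, the 48-bit concatenated data needs no read.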
[0185] For the scenario shown in Figure 5B, as an example, two concatenated data are stored in cache 6. If at least one of the concatenated data in storage unit 1 or storage unit 2 is not byte-aligned, or its first bit is not located at the beginning of its corresponding storage unit, a read-modify-write operation is performed on that data. The specific read-modify-write operation is similar to the above and is not elaborated here. If the concatenated data in both storage unit 1 and storage unit 2 of cache 6 is byte-aligned and its first bit is located at the beginning of the respective storage unit, no read-modify-write operation is performed. Storage units 1 and 2 are moved as a whole to cache 7, and the logic circuit then sends the entire data of storage units 1 and 2 to the slave device (memory controller); see Figure 6B. For the scenario shown in Figure 5C, two concatenated data are also cached in cache 6. The specific process is similar to that of Figure 5B and is not elaborated here.
[0186] For example, for the scenario shown in Figure 5D, when storing the valid data of L2P table entries 120 and 121 into cache 6, since the first Q bits of the valid data of L2P table entry 120 cannot be stored in cache 6 together with the valid data of L2P table entry 121, the merging unit concatenates the last M-Q bits of the valid data of L2P table entry 120 with the valid data of L2P table entry 121 to obtain one concatenated data. This concatenated data, 2M-Q bits in size, is stored in cache 6 at bits 0 to 2M-Q-1 of storage unit 1. The merging unit also sends the concatenated data stored in cache 6 to cache 7, from which it is passed to the logic circuit so that the logic circuit can send it to the slave device.
[0187] The merging unit also stores the first Q bits of the valid data of L2P table entry 120 into cache 6. At this point, the merging unit can check whether there is a new L2P table entry in cache 2 and whether that new entry can be concatenated with the first Q bits of the valid data of L2P table entry 120 in cache 6. The concatenation method is similar to the cases described above and is not repeated here. If concatenation is not possible, the first Q bits of the valid data of L2P table entry 120 in cache 6 are sent to cache 7.
[0188] Furthermore, whether a read-modify-write operation is required when writing the concatenated data, or the first Q bits of the valid data of L2P table entry 120, into memory is determined in a manner similar to the scenario shown in Figure 5A and is not detailed here.
[0189] For example, for the scenario shown in Figure 5F, the merging unit does not concatenate the valid data of L2P table entry 120 with the valid data of L2P table entry 121. Instead, it stores the valid data of L2P table entry 120 in cache 6 and transfers it from cache 6 to cache 7. The merging unit then stores the valid data of L2P table entry 121 in cache 6 and transfers it from cache 6 to cache 7.
[0190] Furthermore, whether a read-modify-write operation is required when writing the valid data of L2P table entry 120 or the valid data of L2P table entry 121 into memory is determined in a manner similar to the scenario shown in Figure 5A and is not detailed here.
[0191] Furthermore, as an example, in Figure 4, the merging unit 14 obtains the valid data of the L2P table entry from cache 2 and stores it in cache 6; it also deletes the data of the L2P table entry from cache 2.
[0192] As another example, in response to the valid data of the L2P table entry, or the one or more concatenated data, in cache 7 having been combined with a portion of the response data in cache 8, new data having been generated from the combined data and the protocol information in cache 5, and the new data having been sent to the memory, the one or more memory addresses and the location of the first bit of the valid data in memory are deleted from cache 3, or the valid data of the L2P table entry is deleted from cache 6.
[0193] As another example, in response to determining one or more memory addresses in the memory that store valid data of the L2P table entry and the location of the first bit of the valid data in the memory, the address index of the write command is deleted from cache 1.
[0194] Figure 7 shows a schematic diagram of processing multiple write commands using multiple caches, as provided in an embodiment of this application.
[0195] In Figure 7, T0-T10 represent multiple consecutive time periods, and the content below each time period indicates the operations performed on the write channel of the accelerator within that time period. The multiple caches N1 shown in Figure 7 include cache 2, cache 3, cache 6, cache 7, cache 8, and cache 10; among them, cache 2 is coupled to cache 6 and caches the data indicated by a write command. As an example, cache 2 includes a single storage unit for caching the data indicated by one write command, while cache 3, cache 6, cache 7, cache 8, and cache 10 may each include multiple storage units, with different storage units storing the data corresponding to different write commands.
[0196] During the T0 time period, the logic circuit receives a write command A1 (LBA1, data1, id1), where the write command A1 indicates the address index LBA1, the L2P table entry data data1, and the identification information id1, and parses the write command A1 to obtain the address index LBA1, data1, and id1. The data data1 of the received write command A1 is stored in cache 2.
[0197] During the T1 time period (a time period after T0), the logic circuit extracts the valid data of data1 from cache 2 and stores it in cache 6. Based on the address index LBA1 indicated by write command A1 and the number of valid data bits, it determines the one or more memory addresses 1 and the position in memory at which the first bit of the valid data of data1 is stored, and stores the mapping between LBA1 and its corresponding memory address 1 in cache 3, for example <LBA1, memory address 1>. In addition, after determining the memory address corresponding to write command A1 and the position at which the valid data of data1 is stored in memory, if it is recognized that the valid data of data1 is not byte-aligned, or is byte-aligned but its first bit is not located at the starting position of the storage unit in memory, the read channel generates read commands B11 and B12 according to memory address 1, where the identification information of read command B11 is id1_1 and that of read command B12 is id1_2. The logic circuit stores the mapping between the identification information of write command A1 and that of its corresponding read commands B11 and B12 in cache 10, for example <id1→id1_1, id1_2>. Also during the T1 time period, the logic circuit receives write command A2 (LBA2, data2, id2), which indicates the address index LBA2, the L2P table entry data data2, and the identification information id2; it parses write command A2 to obtain LBA2, data2, and id2, and stores data2 in cache 2.
[0198] It can be understood that Figure 7 is schematic. The valid data of data1 can be moved from cache 2 to cache 6 immediately after data1 is added to cache 2, without waiting for write command A2 to be received, and the move need not occur simultaneously with the reception of write command A2. The valid data of data1 is moved from cache 2 to cache 6 as early as possible so that cache 2 becomes idle earlier to receive write command A2. Cache 6 is also used to merge the valid data of multiple write commands. When there is no valid data to be merged, the data in cache 6 is moved to cache 7 as early as possible. When there is valid data to be merged, the data is moved to cache 7 after the merging is completed in cache 6.
[0199] Continuing to refer to Figure 7, within time period T2 (a time period after T1), since the address indices indicated by write commands A1 and A2 are adjacent, the data they indicate is also adjacent in the L2P table and can be spliced/merged. After the logic circuit obtains data2, it splices the valid data of data2 with the valid data of data1 in cache 6 to obtain the spliced valid data (using cache 6 to complete the merging), and updates the mapping in cache 10. For example, the updated mapping is <id1→id1_1, id1_2>, <id2→id2_1, id2_2>.
[0200] In addition, the logic circuit determines one or more memory addresses 2 and the position where the first bit of the valid data of data2 is stored in the memory according to the address index LBA2 indicated by write command A2 and the number of bits of the valid data, and stores the mapping relationship between the address index LBA2 and its corresponding memory address 2 in cache 3. At this time, cache 3 records "<LBA1, memory address 1> and <LBA2, memory address 2>" corresponding to the two write commands A1 and A2.
[0201] After using cache 6 to complete the merging of the valid data of data1 and the valid data of data2, next, the valid data in cache 6 is moved to cache 7. Cache 7 includes multiple storage units and can accommodate multiple copies of data from cache 6. In the case of read-modify-write, the valid data waits for the data read from the memory in cache 7.
[0202] Although Figure 7 shows the merged valid data in cache 6 in time period T2, and write command A3 is also received in time period T2, the merging of the valid data is unrelated to the reception of write command A3, and there is no temporal dependence between the two. The operation of moving the unmerged or merged valid data to cache 7 is likewise unrelated to the reception of write command A3.
[0203] Furthermore, after determining the memory address corresponding to write command A2 and the location of the valid data of data2 in memory, if it is identified that the valid data of data2 is not byte-aligned or is byte-aligned but its first bit is not located at the beginning of the memory cell, the read channel generates read command B21 and read command B22 according to the memory address 2 corresponding to write command A2. The identifier information of read command B21 is id2_1, and the identifier information of read command B22 is id2_2. The logic circuit stores the mapping relationship between the identifier information of write command A2 and the identifier information of its corresponding read commands B21 and B22 in cache 10. At this time, cache 10 records the mapping relationship between the identifier information of the two write commands A1 and A2 and the identifier information of their corresponding read commands. In addition, during the T2 time period, the logic circuit also receives a write command A3 (LBA4, data4, id4). The write command A3 indicates the address index LBA4, the L2P table entry data data4, and the identification information id4. The write command A3 is parsed to obtain the address index LBA4, data4, and id4, and data4 is stored in cache 2.
[0204] Continuing with Figure 7, during time period T3 (which follows time period T2), the processing of read commands B11 and B12 is not yet complete. A conflict is detected between read commands B21 and B22 and read commands B11 and B12, meaning that B21 and B22 access the same addresses as B11 and B12, respectively. Since the data read back by read commands B11 and B12 will include the data required by read commands B21 and B22, processing of read commands B21 and B22 is terminated, and the mapping relationship of id2 recorded in cache 10 is modified to <id2→id1_1,id1_2>.
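The conflict handling described for time period T3 can be sketched as follows. This is a minimal illustration, not the accelerator's actual logic: the dictionaries `pending_map` (standing in for cache 10) and `inflight_reads`, the function `issue_reads`, and the memory addresses 0x100 and 0x108 are all hypothetical names and values chosen for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical model of cache 10: write-command id -> ids of the read
# commands whose responses that write command is waiting for.
pending_map: Dict[str, List[str]] = {}
# Hypothetical table of in-flight read commands: memory address -> read id.
inflight_reads: Dict[int, str] = {}


def issue_reads(write_id: str, reads: List[Tuple[str, int]]) -> List[str]:
    """Register the reads a write needs; return the ids actually sent to memory."""
    sent: List[str] = []
    waits_on: List[str] = []
    for read_id, addr in reads:
        if addr in inflight_reads:
            # Conflict: an earlier read already fetches this address,
            # so the new read is dropped and the earlier id is reused.
            waits_on.append(inflight_reads[addr])
        else:
            inflight_reads[addr] = read_id
            waits_on.append(read_id)
            sent.append(read_id)
    pending_map[write_id] = waits_on
    return sent


# Write command A1 issues B11 and B12; write command A2 would issue B21 and
# B22 to the same (assumed) addresses, so both are redirected.
sent_a1 = issue_reads("id1", [("id1_1", 0x100), ("id1_2", 0x108)])
sent_a2 = issue_reads("id2", [("id2_1", 0x100), ("id2_2", 0x108)])
```

After both calls, `pending_map` holds the two relationships that the text describes for cache 10, with id2 pointing at the read commands generated for id1.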
[0205] In addition, the valid data of the merged data1 and data2 in cache 6 is moved to cache 7 (this operation can be completed before time period T3). After that, cache 6 becomes available, and the logic circuit reads the valid data of data4 from cache 2 and moves it to cache 6. Additionally, the logic circuit also determines one or more memory addresses 3 and the position in the memory where the first bit of the valid data of data4 is stored based on the address index LBA4 indicated by the write command A3 and the number of bits of the valid data, and stores the mapping relationship between the address index LBA4 and its corresponding memory address 3 in cache 3. At this time, cache 3 records the three write commands A1, A2, and A3, namely "<LBA1, memory address 1>", "<LBA2, memory address 2>", and "<LBA4, memory address 3>". Further, after determining the memory address corresponding to the write command A3 and the position in the memory where the valid data of data4 is stored, if it is recognized that the valid data of data4 is byte-aligned and its first bit is at the starting position of the storage unit in the memory, the accelerator's processing of the write command A3 will not trigger a read-modify-write operation, that is, it will not trigger the read channel to generate one or more read commands based on the memory address 3 corresponding to the write command A3. At this time, cache 10 records the mapping relationships between the identification information of the two write commands A1 and A2 and the identification information of their corresponding read commands, as well as the relationship between the identification information of the write command A3 and the information that no read command is generated, <id1→id1_1,id1_2>, <id2→id1_1,id1_2>, and <id4→none>.
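The decision of whether a write command triggers a read-modify-write operation can be sketched as a simple predicate. This is one possible reading of the condition in the text: "byte-aligned" is interpreted here as the valid data being a whole number of bytes long, and the 64-bit storage-unit width and the name `needs_read_modify_write` are assumptions made for illustration.

```python
UNIT_BITS = 64  # assumed storage-unit width (8 bytes)


def needs_read_modify_write(start_bit: int, length_bits: int) -> bool:
    """Return True when the write cannot be sent to memory directly.

    start_bit is the bit position in memory where the first bit of the
    valid data is stored; length_bits is the number of valid bits.
    """
    whole_bytes = length_bits % 8 == 0          # interpreted as "byte-aligned"
    at_unit_start = start_bit % UNIT_BITS == 0  # first bit at start of a storage unit
    return not (whole_bytes and at_unit_start)
```

Under these assumptions, a write such as A3 whose valid data is a whole number of bytes and begins at a unit boundary returns `False`, while a 30-bit entry stored at bit offset 30 returns `True`.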
[0206] During time period T4 (time period T4 is a time period after time period T3), no new write command is received. At this time, it is possible to continue waiting for the next command. If the data indicated by the next command can be concatenated with the valid data of data4 in cache 6, the data is concatenated in cache 6 and the concatenated data is moved to cache 7; it is also possible not to wait for the next command, and the logic circuit moves the valid data of data4 in cache 6 to cache 7. At this time, the valid data of the merged data1 and data2 and the valid data of data4 are stored in different storage units of cache 7.
[0207] During time period T5 (a time period after time period T4), since the processing of write command A3 does not trigger a read-modify-write operation, after moving the valid data of data4 to cache 7, the logic circuit can directly generate data comprising the valid data of data4 in cache 7 and protocol information (for example, AXI protocol information), and send that data to the memory so that the valid data of data4 is stored in the memory.
[0208] In addition, in response to sending the valid data of data4 to the memory, the logic circuit may delete the mapping relationship between the address index indicated by the write command A3 stored in cache 3 and the memory address 3. At this time, "<LBA1, memory address 1> and <LBA2, memory address 2>" corresponding to the two write commands A1 and A2 are recorded in cache 3; and the relationship between the identification information of the write command A3 and the information of the ungenerated read command in cache 10 is deleted. At this time, the mapping relationships <id1 →id1_1,id1_2>, <id2→id1_1,id1_2> of the identification information of the two write commands A1 and A2 and the identification information of the corresponding read commands are recorded in cache 10; and the valid data of data4 stored in cache 7 is deleted.
[0209] In addition, within the time period T5, the logic circuit also receives a write command A4 (LBA10, data10, id10), where the write command A4 indicates an address index LBA10, an L2P table entry data data10, and identification information id10. The logic circuit parses the write command A4 to obtain the address index LBA10, data10, and id10, stores data10 in cache 2, and determines one or more memory addresses 4 and the position in the memory where the first bit of the valid data of data10 is stored according to the address index LBA10 indicated by the write command A4 and the number of bits of the valid data. After determining the memory addresses 4 corresponding to the write command A4 and the position in the memory where the valid data of data10 is stored, if it is recognized that the valid data of data10 is not byte-aligned or is byte-aligned but the first bit of its valid data is not at the starting position of the storage unit in the memory, then a read-modify-write operation needs to be performed during the processing of the write command A4. At this time, the processing of the read commands B11 and B12 has not been completed, and it is also detected that there is a conflict between the write command A4 and the read commands B11 and / or B12, that is, the one or more read commands corresponding to the read-modify-write operation during the processing of the write command A4 have the same address as the read commands B11 and / or B12. Therefore, the processing of the write command A4 is suspended. On the one hand, it does not trigger the read channel to generate one or more read commands according to the memory addresses 4, or does not send the one or more generated read commands to the memory after generating them. On the other hand, the mapping relationship between the address index LBA10 and its corresponding memory address 4 is not stored in cache 3. 
Additionally or alternatively, the mapping relationship between the identification information of the write command A4 and the information of its corresponding read command is not stored in cache 10. At this time, only the "<LBA1, memory address 1> and <LBA2, memory address 2>" of the two write commands A1 and A2 are recorded in cache 3; and only the mapping relationships <id1→id1_1,id1_2>, <id2→id1_1,id1_2> between the respective identification information of the two write commands A1 and A2 and the identification information of the corresponding read commands are recorded in cache 10.
[0210] Continuing with Figure 7, within time period T6 (a time period after time period T5), the logic circuit receives neither the data read back by read commands B11 and B12 nor a new write command. Therefore, the logic circuit needs to wait for the data read back by read commands B11 and B12. It should be understood that waiting for this data is independent of receiving a new write command; there is no temporal or logical relationship between the two.
[0211] Further, within time period T7 (a time period after time period T6), the logic circuit receives the data read back by read commands B11 and B12 and stores it in cache 8. In response to storing the response data of read commands B11 (id1_1) and B12 (id1_2) in cache 8, it is recognized from the mapping relationships <id1→id1_1,id1_2> and <id2→id1_1,id1_2> recorded in cache 10 that this response data applies to the valid data of both write commands A1 and A2. Thus, all response data is obtained from cache 8, and the merged valid data of data1 and data2 is obtained from cache 7. The merged valid data of data1 and data2 is combined with part of the response data, and the combined data and protocol information are used to generate data that includes the protocol information and the merged valid data of data1 and data2. This data is sent to the memory so that the merged valid data of data1 and data2 is stored in the memory. The valid data of data1 is stored at memory address 1 and its corresponding position, and the valid data of data2 is stored at memory address 2 and its corresponding position.
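The combining step of a read-modify-write operation can be illustrated with a small bit-manipulation sketch. It assumes a single 64-bit storage unit, bits numbered from the least significant end, and a hypothetical helper name `merge_into_unit`; the text does not specify bit ordering, and valid data spanning two units would need the same operation applied per unit.

```python
def merge_into_unit(old_unit: int, valid: int, start_bit: int,
                    length_bits: int, unit_bits: int = 64) -> int:
    """Overwrite length_bits bits of old_unit, starting at start_bit, with valid.

    old_unit is the data read back from memory; the bits outside the
    written range are preserved, which is the purpose of reading first.
    """
    unit_mask = (1 << unit_bits) - 1
    field_mask = ((1 << length_bits) - 1) << start_bit
    kept = old_unit & ~field_mask & unit_mask      # bits taken from the read-back data
    written = (valid << start_bit) & field_mask    # bits taken from the valid data
    return kept | written
```

For instance, writing eight zero bits at bit offset 4 into a unit of all ones leaves every other bit of the read-back data unchanged.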
[0212] Continuing with Figure 7, within time period T8 (a time period after time period T7), in response to storing the merged valid data of data1 and data2 in the memory, the logic circuit may delete the mapping relationships between the address indexes indicated by write commands A1 and A2 and the memory addresses stored in cache 3; delete the relationships in cache 10 between the respective identification information of write commands A1 and A2 and the information of their corresponding read commands; and delete the merged valid data of data1 and data2 in cache 7.
[0213] At this time, the accelerator has completed the processing of write commands A1 and A2 and resumes the processing of write command A4. The valid data in data10 is obtained from cache 2 and stored in cache 6; the read channel generates read command B41 according to memory address 4 corresponding to write command A4 and sends read command B41 to the memory. The identification information of read command B41 is id10_1, and the logic circuit stores the mapping relationship between the identification information of write command A4 and the identification information of its corresponding read command B41 in cache 10. At this time, the mapping relationship <id10→id10_1> between the identification information of write command A4 and the identification information of its corresponding read command is recorded in cache 10, and the mapping relationship <LBA10→memory address 4> between address index LBA10 and its corresponding memory address 4 is stored in cache 3.
[0214] The T8 time period shows both the completion of write commands A1 and A2 and the resumed processing of write command A4. There is a temporal relationship between the two: the processing of write command A4 can resume only after write commands A1 and A2 have been completed.
[0215] During time period T9 (which is the time period after T8), the logic circuit moves the valid data in data10 in cache 6 to cache 7 and receives the read command response identified as id10_1 from the memory. The read command response identified as id10_1 is stored in cache 8. At this time, the logic circuit receives all the response data corresponding to write command A4, combines the valid data in data10 with a part of the response data to obtain the combined data, and sends the combined data to the memory for storage, thus completing the processing of write command A4.
[0216] Returning to Figure 2B, for example, in order to read an L2P table entry from memory, the master device sends one or more read commands to the accelerator. The accelerator can process the read commands one by one or process multiple read commands in parallel to read the L2P table entry from memory.
[0217] Figure 8A shows a schematic diagram of the accelerator processing read commands, as provided in an embodiment of this application.
[0218] As an example, in Figure 8A, the master device sends a read command D1 to the logic circuit Q2, denoted as process (8.1). During this process, the master device and the logic circuit Q2 can exchange data via a bus, such as the AXI bus. The read command D1 includes the L2P table entry address perceived by the master device (obtained by mapping from the logical address; for example, the L2P table entry address perceived by the master device = base address + LBA * size(L2P entry), where size(L2P entry) represents the size of the L2P table entry) and third identification information (used to identify the read command D1 itself, such as an ID). After receiving the read command D1, logic circuit Q2 parses it to obtain the L2P table entry address perceived by the master device and the third identification information. Then, it stores the L2P table entry address or the third identification information in the same or different caches among the multiple caches (for example, in cache 1_1 of the multiple caches described below), denoted as process (8.2). Further, after parsing out the L2P table entry address, logic circuit Q2 calculates, based on that address, the address at which the L2P table entry is stored in memory, denoted as process (8.3).
[0219] When each entry in the L2P table perceived by the master device consists partly of valid data and partly of empty bits, the memory stores only the valid data of the L2P table entries. This means that each memory cell stores the valid data of one or more L2P table entries perceived by the master device, or a portion of the valid data of an L2P table entry. Furthermore, the L2P table entries are stored in memory sequentially, end-to-end. Therefore, the valid data of the L2P table entry accessed by the read command D1 sent by the master device may be stored in one memory cell or in multiple memory cells; that is, the number of memory cells occupied varies between L2P table entries. When the L2P table entry accessed by the master device occupies multiple memory cells (the master device needs to access multiple memory cells), logic circuit Q2 generates one read command for each memory cell accessed by the read command D1, such as read command E11 or read command E12. Each such read command is used to read data from one memory cell. For example, if the L2P table entry to be accessed by read command D1 occupies two memory cells, logic circuit Q2 generates two read commands, such as read commands E11 and E12; if it occupies one memory cell, logic circuit Q2 generates one read command. Next, logic circuit Q2 sends the one or more generated read commands to the memory controller, as shown in process (8.4). Then, the memory controller reads data from the memory according to the one or more read commands, as shown in processes (8.5) and (8.6). For example, if the memory cells in the memory are aligned to 8 bytes, with corresponding byte addresses 0, 8, 16, 24, etc., then each read command is used to read 8 bytes of data starting from one of these byte addresses. 
Then, the memory controller sends the read data and protocol information as a response to read command E11 or E12 to the L2P accelerator, as shown in process (8.7). For example, the protocol information is AXI protocol information, which includes the identification information of read command E11 or E12. Logic circuit Q2 processes the response to one or more read commands to obtain the L2P table entry to be accessed by read command D1, and feeds back part of read command D1 and the L2P table entry to be accessed by read command D1 as a response to read command D1 to the master device, as shown in process (8.8).
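The address calculation of process (8.3) and the generation of one read command per memory cell can be sketched as below. The 64-bit perceived entry size and the 8-byte cell alignment follow examples in the text; the 30 valid bits per entry, the helper names, and the `packed_base` parameter are assumptions made for this illustration.

```python
from typing import List

ENTRY_BYTES = 8   # size(L2P entry) perceived by the master device (64 bits)
VALID_BITS = 30   # assumed number of valid bits stored per entry
UNIT_BYTES = 8    # memory cells aligned to 8 bytes


def perceived_address(lba: int, base: int = 0) -> int:
    """Address the master device computes: base + LBA * size(L2P entry)."""
    return base + lba * ENTRY_BYTES


def cell_addresses(perceived: int, base: int = 0, packed_base: int = 0) -> List[int]:
    """Byte addresses of the memory cells holding one packed entry.

    Entries are stored end-to-end, so entry `lba` occupies bits
    [lba * VALID_BITS, (lba + 1) * VALID_BITS) of the packed table.
    One read command would be generated per returned address.
    """
    lba = (perceived - base) // ENTRY_BYTES
    unit_bits = UNIT_BYTES * 8
    first_bit = lba * VALID_BITS
    last_bit = first_bit + VALID_BITS - 1
    first_unit = first_bit // unit_bits
    last_unit = last_bit // unit_bits
    return [packed_base + u * UNIT_BYTES for u in range(first_unit, last_unit + 1)]
```

With these assumed sizes, entry 0 fits in the cell at byte address 0, while entry 2 (bits 60 to 89) straddles the cells at byte addresses 0 and 8 and would therefore require two read commands.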
[0220] As another example, the master device may need to access more than one L2P table entry. The master device reads multiple L2P table entries by sending multiple read commands to the accelerator. The accelerator can process multiple read commands one by one or in parallel. Taking the parallel processing of read commands D1 and D2 sent by the master device by the accelerator as an example, the parallel processing mechanism of multiple read commands by the accelerator will be explained.
[0221] When logic circuit Q2 in the read channel processes read commands D1 and D2 in parallel, it generates one or more new read commands based on each read command. Each new read command is used to read data from one memory cell. For example, if the L2P table entry to be accessed by read command D1 occupies two memory cells, logic circuit Q2 will generate two read commands for read command D1, namely read command E11 and read command E12; if the L2P table entry to be accessed by read command D2 also occupies two memory cells, logic circuit Q2 will generate two read commands for read command D2, namely read command E21 and read command E22. When processing read commands D1 and D2, the read channel generates the response to read command D1 or read command D2 based on the response data fed back for all new read commands corresponding to that read command. Therefore, in order to determine which new read command a piece of response data received from the memory belongs to, and whether that new read command corresponds to read command D1 or read command D2, in Figure 8A, when logic circuit Q2 processes read command D1 or read command D2, it executes process (8.9) for each read command in addition to processes (8.1) to (8.8) shown in Figure 8A. That is, when the read channel processes multiple read commands in parallel, the processing of each read command includes processes (8.1) to (8.9). For example, for read command D1, process (8.9) includes: after generating read command E11 and read command E12 for read command D1, setting identification information for read command E11 and read command E12 to identify them, constructing the relationship between the identification information of read command D1 and the identification information of its corresponding read commands E11 and E12, and saving it to multiple caches N2. 
For example, the identification information of read command D1 is ID_D1, and the identification information of read command E11 and read command E12 is ID_E11 and ID_E12, respectively. The relationship is then stored in multiple caches N2 in the form <ID_D1→ID_E11,ID_E12>. For example, the L2P table entry address and identification information stored in process (8.2) may be kept in the same cache as the relationship stored in process (8.9) between the identification information of read command D1 and that of its corresponding read commands E11 and E12. Alternatively, they can be stored in different caches. The processing of read command D2 is similar to that of read command D1 and will not be described in detail here.
[0222] Figure 8B shows a schematic diagram of the read channel structure provided in an embodiment of this application.
[0223] As an example, in Figure 8B, the read channel includes logic circuit Q2 and multiple caches N2. Logic circuit Q2 includes a parsing module 21, a calculation module 22, and a command generation module 23. In response to receiving one or more first read commands from the master device, the parsing module 21 parses each first read command to obtain the address index of its corresponding L2P table entry and stores the address index in multiple caches N2. The calculation module 22 is coupled to multiple caches N2; based on the address index corresponding to each first read command, it determines the one or more second read commands corresponding to that first read command and calculates the memory addresses they access. It also sets corresponding second identification information for each second read command and stores the relationship between the first identification information and its corresponding one or more second identification information in multiple caches N2. The command generation module 23 is coupled to the calculation module 22, generates at least one second read command based on the memory address, and sends the at least one second read command to the memory.
[0224] In Figure 8B, the master device sends one or more read commands D to the accelerator, and the parsing module 21 receives the one or more read commands D, as represented by process (9.1). During this process, the master device and the accelerator can exchange data via a bus, such as the AXI bus. The logic circuits in the accelerator receive the one or more read commands D from the bus. Each read command D indicates the address index of the L2P table entry perceived by the master device (for example, the address index is the logical address LBA) and identification information used to identify the read command D itself, such as an ID. After the logic circuit Q2 receives each read command D, the parsing module 21 parses it to obtain the address index, the identification information, and so on. After the address index indicated by each read command D has been parsed out, the calculation module 22 calculates, based on that address index, the address in memory of the L2P table entry perceived by the master device, and then stores the L2P table entry address and identification information in the cache, as shown in process (9.2). For example, the address in memory of the L2P table entry perceived by the master device = base address + LBA * size(L2P entry), where size(L2P entry) represents the size of each L2P table entry perceived by the master device, for example, 64 bits.
[0225] When each entry in the L2P table perceived by the master device contains part valid data and part empty bits, the memory stores only the valid data of the L2P table entries. This means that each memory cell stores the valid data of one or more L2P table entries perceived by the master device, or stores a portion of the valid data of the L2P table entries perceived by the master device. Furthermore, the L2P table entries are stored in the memory sequentially, end-to-end. Therefore, the valid data of each L2P table entry accessed by each read command D sent by the master device can be stored in one memory cell or multiple memory cells. That is, the number of memory cells occupied by different L2P table entries accessed by the master device varies. When the L2P table entry accessed by the master device occupies multiple memory cells (the master device needs to access multiple memory cells), the command generation module 23 generates multiple read commands E for each read command D. Each read command E is used to read data from one memory cell, as represented by process (9.3). For example, if each read command D accesses an L2P table entry that occupies two memory cells, the logic circuit will generate two read commands E; if each read command D accesses an L2P table entry that occupies one memory cell, the logic circuit will generate one read command E for each read command D.
[0226] As another example, logic circuit Q2 also includes a merging unit 24. The merging unit 24 merges the valid data and empty bit data of the entry corresponding to each read command D according to the entry length indicated by each read command D to obtain the entry of the L2P table to be accessed. The valid data is located in the first N consecutive bits of the first entry, where N is the length of the valid data. Second protocol information is generated based on the first protocol information of one or more read commands E corresponding to each read command D. The entry and the second protocol information are merged to obtain data as a response to read command D. For example, if the length of the L2P table entry read by read command D is 64 bits, and the length of the valid data is 30 bits, then the merging unit merges the obtained 30 bits of valid data with 34 bits of empty data to obtain the 64-bit L2P table entry to be read by read command D.
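The work of the merging unit can be sketched as extracting the valid bits from the data returned for the read commands E and padding them with empty bits. The sketch assumes that bits are numbered from the least significant end, that empty bits are zero, and that the valid data occupies the low-order N bits of the rebuilt entry; the function name `extract_entry` and these conventions are illustrative and are not specified in the text.

```python
from typing import List


def extract_entry(units: List[int], start_bit: int,
                  valid_bits: int = 30, unit_bits: int = 64) -> int:
    """Rebuild the 64-bit entry perceived by the master device.

    units holds the data of the one or two memory cells read back, in
    address order; start_bit is the offset of the entry's first valid
    bit within the first cell.
    """
    combined = 0
    for i, unit in enumerate(units):
        combined |= unit << (i * unit_bits)   # concatenate cells end-to-end
    # Keep only the valid bits; the remaining high-order bits of the
    # returned entry are the (assumed zero) empty bits.
    return (combined >> start_bit) & ((1 << valid_bits) - 1)
```

With a 30-bit valid field, the returned value always fits in the low 30 bits, which corresponds to merging 30 bits of valid data with 34 empty bits to form the 64-bit entry.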
[0227] As another example, to improve the processing efficiency of read commands, logic circuit Q2 can process multiple read commands D in parallel. Based on each read command D, one or more read commands E are generated. Therefore, after generating one or more read commands E for each read command D, command generation module 23 also sets identification information for each read command E, constructs the relationship between the identification information of each read command D and the identification information of its corresponding one or more read commands E, and saves it to multiple caches N2. This process is represented as process (9.4). For example, the L2P table entry address and identification information stored in process (9.2) may be kept in the same cache as the relationship stored in process (9.4), or they can be stored in different caches. Next, the command generation module 23 sends the one or more read commands E corresponding to each read command D to the memory controller, as shown in process (9.5). Then, the memory controller reads the response data of each read command E from the memory according to that read command E, as shown in processes (9.6) and (9.7). For example, if the memory cells in the memory are aligned to 8 bytes, with corresponding byte addresses 0, 8, 16, 24, etc., then each read command E is used to read 8 bytes of data starting from one of these byte addresses. Then, the memory controller sends the first data read according to each read command E, together with its corresponding first protocol information, to the accelerator (also called the L2P accelerator) as the response to that read command E, as shown in process (9.8). The response data refers to the data of a storage unit read according to each read command E, and the first protocol information contains the identification information of each read command E. 
Then, the logic circuit Q2 processes the response corresponding to each read command E to obtain the data and the first protocol information. Based on the first protocol information and the relationship, it determines the one or more pieces of data corresponding to each read command D and generates second protocol information. It processes the one or more pieces of data corresponding to each read command D to obtain the entry of the L2P table to be accessed by that read command D. The second protocol information and the entry of the L2P table to be accessed are sent to the master device as a response to each read command D, as shown in process (9.9). The data obtained by processing the response data refers to the partial data of the L2P table entry to be accessed by read command D contained in the first data corresponding to each read command E, and the second protocol information contains the identification information of read command D.
[0228] Figure 8C shows the accelerator processing multiple read commands D in parallel.
[0229] As an example, in Figure 8C, the accelerator receives two read commands from the master device, namely read command D1 and read command D2. Based on read command D1, the accelerator generates two memory access commands, namely read command E11 and read command E12. Based on read command D2, the accelerator generates two memory access commands, namely read command E21 and read command E22. The following explanation uses the accelerator's processing of read commands D1 and D2 as an example to illustrate the accelerator's parallel processing mechanism.
[0230] In Figure 8C, T0-T4 represent multiple consecutive time periods, and the content below each time period indicates the operations performed by each module of the accelerator within that time period.
[0231] During the T0 time period, the parsing module 21 receives the read command D1 and parses it to obtain the address index and identification information. The calculation module 22 then calculates the memory address based on the address index and identification information of the read command D1. Next, based on the memory address, the command generation module 23 generates the read command E11 and read command E12 corresponding to the read command D1, and the relationship between the identification information of the read command D1 and the identification information of the read commands E11 and E12 is stored in multiple caches N2.
[0232] During the T1 time period (which is the time period after the T0 time period), the accelerator receives the data corresponding to the read command E11 and stores the data corresponding to the read command E11 in multiple caches N2.
[0233] During time period T2 (which follows time period T1), parsing module 21 receives read command D2 and parses it to obtain the address index and identification information. After obtaining the address index and identification information of read command D2, calculation module 22 calculates the memory address based on the address index and identification information of read command D2. Then, after calculating the memory address, command generation module 23 generates read commands E21 and E22 corresponding to read command D2 based on the memory address. Next, after generating read commands E21 and E22, the relationship between the identification information of read command D2 and the identification information of read commands E21 and E22 is stored in multiple caches N2. At this time, since read command D1 has not been processed, in addition to storing the relationship between the identification information of read command D2 and the identification information of read commands E21 and E22, the second cache also stores the relationship between the identification information of read command D1 and the identification information of read commands E11 and E12. According to an embodiment of this application, during time period T2, although read command D1 has not yet been processed, the L2P accelerator can still process the received read command D2. Therefore, the L2P accelerator has the ability to process multiple read commands issued by the master device in parallel. Although Figure 8C The example provided uses two read commands, D1 and D2, issued by the master device to illustrate that, understandably, the L2P accelerator can process a larger number of read commands from the host in parallel.
[0234] During time period T3 (which follows time period T2), the accelerator receives the data corresponding to read command E12 and stores it in multiple caches N2. At this time, the multiple caches N2 store the data corresponding to read commands E11 and E12. Further, after the accelerator receives the data corresponding to read commands B11 and E12, all data corresponding to read command D1 is received. The accelerator processes the received data corresponding to read commands E11 and E12 and concatenates them to obtain the entry of the L2P table to be accessed by read command D1. Based on the identification information of read command D1, the accelerator generates corresponding protocol information and stores this protocol information and the entry of the L2P table to be accessed by read command D1 as a response to read command D1 in the seventh cache. At this point, since the response to read command D1 has been received, the relationship between the identification information of read command D1 and the identification information of read command E11 and read command E12 in multiple caches N2 can be deleted, and the relationship between the identification information of the remaining unprocessed read command D2 and the identification information of command E21 and read command E22 can be deleted.
[0235] During time period T4 (which follows time period T3), the accelerator receives the data corresponding to read commands E21 and E22 and stores this data in the multiple caches N2. At this point, the multiple caches N2 store the data corresponding to read commands E21 and E22; that is, all data corresponding to read command D2 has been received. The accelerator processes the received data corresponding to read commands E21 and E22 and concatenates it to obtain the entry of the L2P table that read command D2 needs to access. Based on the identification information of read command D2, it generates corresponding protocol information and stores this protocol information, together with the L2P table entry that read command D2 needs to access, in the multiple caches N2 as a response to read command D2. Since the response to read command D2 has been generated, the relationship between the identification information of read command D2 and the identification information of read commands E21 and E22 in the multiple caches N2 can be deleted. Since both read command D1 and read command D2 have been processed at this time, the multiple caches N2 no longer hold identification information for any pending read command.
[0236] As can be seen from the above, when receiving data from read commands E11 and E12, the data received from read commands E11 and E12 can be discontinuous in time. That is, between receiving data from read commands E11 and E12, the accelerator can process other read commands (read command D2). Therefore, during the processing of read commands D1 and D2, the accelerator can process read commands D1 and D2 in parallel.
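The interleaving described for time periods T1 to T4 can be sketched as a small bookkeeping table. This is only an illustrative model, not the circuit itself: the class `PendingTable`, its method names, and the string payloads are hypothetical, and the sketch covers only the relationship between a master-device read command and the read commands generated from it.

```python
class PendingTable:
    """Hypothetical model of the ID relationships kept for in-flight read commands."""

    def __init__(self):
        self.children = {}  # first-command ID -> ordered list of second-command IDs
        self.parent = {}    # second-command ID -> first-command ID
        self.data = {}      # second-command ID -> response data received so far

    def issue(self, first_id, second_ids):
        """Record the relationship when second read commands are generated."""
        self.children[first_id] = list(second_ids)
        for sid in second_ids:
            self.parent[sid] = first_id

    def on_response(self, second_id, payload):
        """Store one response; return (first_id, payloads) once all have arrived."""
        first_id = self.parent[second_id]
        self.data[second_id] = payload
        ids = self.children[first_id]
        if not all(sid in self.data for sid in ids):
            return None
        payloads = [self.data.pop(sid) for sid in ids]
        for sid in ids:
            del self.parent[sid]
        del self.children[first_id]  # relationship deleted once the set is complete
        return first_id, payloads


table = PendingTable()
table.issue("D1", ["E11", "E12"])
r1 = table.on_response("E11", "q11")   # D1 still incomplete
table.issue("D2", ["E21", "E22"])      # D2 accepted while D1 is pending
r2 = table.on_response("E12", "q12")   # D1 completes, D2 relationship is kept
pending_after_d1 = sorted(table.children)
table.on_response("E22", "q22")        # responses may arrive in any order
r3 = table.on_response("E21", "q21")   # D2 completes
```

Keeping a reverse map from each generated command to its originating command is a design choice of this sketch; it lets a response be matched without scanning every pending relationship.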
[0237] Figure 8D shows a schematic diagram of another read channel structure provided in an embodiment of this application.
[0238] As an example, in Figure 8D, the multiple caches N2 include cache 1_1, cache 1_2, cache 1_3, cache 1_4, cache 1_5, cache 1_6, and cache 1_7. Specifically, parsing module 21, in response to receiving one or more first read commands from the master device, parses each first read command to obtain the address index of its corresponding L2P table entry and stores the address index in cache 1_1. Calculation module 22, coupled to cache 1_1, calculates, based on the address index, the memory address accessed by the one or more second read commands corresponding to each first read command; sets corresponding second identification information for each second read command; and stores the relationship between the first identification information and its corresponding one or more second identification information in cache 1_2. Command generation module 23, coupled to calculation module 22, generates at least one second read command based on the memory address and sends the at least one second read command to the memory.
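The address calculation performed by calculation module 22 can be illustrated with a short sketch. The packing scheme is an assumption made here for illustration only: entries are taken to be stored back to back, each occupying only its valid bits, in 64-bit (8-byte) storage cells starting at a base address. The function name `second_read_addresses` and the parameter `base_addr` are hypothetical.

```python
CELL_BITS = 64  # assumed width of one memory storage cell (8 bytes)


def second_read_addresses(address_index, valid_bits, base_addr=0):
    """Return (byte addresses of the cells to read, bit offset in the first cell).

    Assumes entries are packed back to back, valid_bits each, from base_addr.
    """
    first_bit = address_index * valid_bits
    last_bit = first_bit + valid_bits - 1
    first_cell = first_bit // CELL_BITS
    last_cell = last_bit // CELL_BITS
    cell_bytes = CELL_BITS // 8
    addrs = [base_addr + c * cell_bytes for c in range(first_cell, last_cell + 1)]
    return addrs, first_bit % CELL_BITS


# An entry that fits in one cell yields one second read command.
single = second_read_addresses(0, 30)
# An entry that straddles a cell boundary yields two second read commands.
straddling = second_read_addresses(2, 30, base_addr=0x1000)
```

Under this assumed packing, the entry at index 2 with 30 valid bits starts at bit 60 of the first cell, so 4 bits lie in the first cell and 26 bits in the next, and one read command per cell is generated.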
[0239] Furthermore, in response to receiving the response to the second read command from the memory controller, logic circuit Q2 stores the entire response to the second read command into cache 1_3. Merging unit 24, in response to the response to the second read command being stored in cache 1_3, obtains the first protocol information from that response in cache 1_3 and stores it in cache 1_4, and obtains the second data and the marker from that response in cache 1_3 and stores them in cache 1_5. According to the relationship between the first identification information and the second identification information stored in cache 1_2, in response to receiving the responses to all second read commands generated from any first read command, merging unit 24 obtains the valid data of the L2P table entry indicated by the first read command from the one or more second data in cache 1_5, merges the valid data of the entry with empty bit data according to the entry length to obtain the entry, and stores the entry in cache 1_6. Merging unit 24 also updates the marker, which indicates the position in cache 1_6 of the last bit of the obtained entry or of the valid data of the entry; the updated marker is also stored in cache 1_6. The entry and the updated marker are then obtained from cache 1_6, the second protocol information corresponding to the first protocol information is obtained from cache 1_2, and the response to the first read command is generated according to the entry and the second protocol information and stored in cache 1_7. Next, the response to the first read command is retrieved from cache 1_7 and provided to the master device via the bus.
[0240] Figure 8E illustrates the process of storing data in each cache in the logic circuit of an embodiment of this application.
[0241] Take as an example the logic circuit receiving a read command D1 from the master device, where read command D1 requests access to the data corresponding to entry 122 in the L2P table. Entry 122 is 64 bits long with 30 bits of valid data, and the valid data of entry 122 is stored in two consecutive memory cells, with the first 4 bits of the valid data in the first memory cell and the remaining 26 bits in the second memory cell. Since the valid data of entry 122 spans two consecutive memory cells, after receiving read command D1 the logic circuit generates two read commands based on read command D1: read command E11 and read command E12. Read command E11 is used to read the 64 bits (8 bytes) of data from the memory cell holding the first 4 bits of the valid data of entry 122, and read command E12 is used to read the 64 bits (8 bytes) of data from the memory cell holding the remaining 26 bits of the valid data of entry 122. In Figure 8E, the logic circuit stores the identification information of read command D1 in cache 1_2 (the second cache) in association with the identification information of read commands E11 and E12, for example, in the form of <identification information of read command D1, identification information of read command E11, identification information of read command E12>, represented as process (10.1). The memory controller sends the data M11 read according to read command E11 to the L2P accelerator. The L2P accelerator controls the storage of data M11 in cache 1_3, represented as process (10.2). Then, the L2P accelerator parses data M11 to obtain protocol information 11 and data Q11, controls the storage of protocol information 11 in cache 1_4, represented as process (10.3), and controls the storage of data Q11 in cache 1_5, represented as process (10.4). Here, data Q11 represents the 64 bits of data read from the memory storage cell according to read command E11.
In response to the memory controller sending data M12 read according to read command E12 to the L2P accelerator, the L2P accelerator controls the storage of data M12 in cache 1_3, represented as process (10.5). Then, the L2P accelerator parses data M12 to obtain protocol information 12 and data Q12, controls the storage of protocol information 12 in cache 1_4, represented as process (10.6), and controls the storage of data Q12 in cache 1_5, represented as process (10.7). Here, data Q12 represents the 64 bits of data read from the memory storage cell according to read command E12. Although these steps are represented as processes (10.2) and (10.5) respectively, according to the embodiments of this application, the order in which the memory controller provides data M11 and M12 to the L2P accelerator is not limited. Furthermore, between receiving data M11 and receiving data M12, the L2P accelerator may also receive response data for other read commands from the memory controller.
[0242] Based on protocol information 11 and protocol information 12 stored in cache 1_4, and on the relationship stored in cache 1_2 between the identification information of read command D1 and the identification information of read commands E11 and E12, it is determined whether the data corresponding to read commands E11 and E12 are both stored in cache 1_5, represented as process (10.8). After the data corresponding to read commands E11 and E12 are both stored in cache 1_5, the L2P accelerator retrieves the valid data of entry 122 from cache 1_5, merges the valid data of entry 122 with empty bit data according to the length of entry 122 to obtain entry 122, and stores entry 122 in cache 1_6; it also generates a new flag Q to indicate the position in cache 1_6 of the last bit of the retrieved entry 122 or of the valid data of entry 122, represented as process (10.9). Then, based on the correspondence recorded in cache 1_2 between the identification information of read command D1 and the identification information of read commands E11 and E12, the L2P accelerator retrieves the identification information of read command D1 from cache 1_2 and generates the corresponding protocol information 2 (containing the identification information of read command D1) based on that identification information, represented as process (10.10). It then generates a response to read command D1 based on the generated protocol information 2 and the entry 122 retrieved from cache 1_6, and stores the response in cache 1_7, represented as process (10.11).
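The merging step of process (10.9) can be sketched for the entry 122 example (30 valid bits, 4 in the first cell and 26 in the second). The bit ordering is an assumption of this sketch and is not specified in the text: bit 0 is taken as the least significant bit of a cell, the second cell holds the higher bits, and the empty bits are placed as zeros above the valid data. The function name `merge_entry` is hypothetical.

```python
CELL_BITS = 64  # each read returns one 64-bit storage cell


def merge_entry(cells, bit_offset, valid_bits, entry_bits=64):
    """Extract valid_bits starting at bit_offset across cells, zero-padded to entry_bits.

    Assumption: bit 0 is the least significant bit; later cells hold higher bits.
    """
    combined = 0
    for i, cell in enumerate(cells):
        combined |= (cell & ((1 << CELL_BITS) - 1)) << (i * CELL_BITS)
    valid = (combined >> bit_offset) & ((1 << valid_bits) - 1)
    # The upper (entry_bits - valid_bits) bits stay zero: the "empty bit data".
    return valid & ((1 << entry_bits) - 1)


value = 0x2ABCDEF1  # an arbitrary 30-bit valid value used for illustration
# Q11: low 4 valid bits sit in bits 60..63; lower bits belong to other entries.
q11 = ((value << 60) & ((1 << 64) - 1)) | 0x0FFF
# Q12: remaining 26 valid bits sit in bits 0..25; higher bits belong to other entries.
q12 = (value >> 4) | (0x5 << 26)
entry = merge_entry([q11, q12], bit_offset=60, valid_bits=30)
```

Bits belonging to neighbouring entries in Q11 and Q12 are discarded by the shift and mask, so only the 30 valid bits of the requested entry survive in the 64-bit result.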
[0243] According to embodiments of this application, by providing multiple caches to record the memory controller's responses to the read commands issued by the L2P accelerator, the L2P accelerator can process responses to multiple read commands at the same time. These responses do not need to correspond to the same read command from the master device, but can correspond to multiple read commands from the master device. For example, each response received from the memory controller is recorded in cache 1_3, so that even if the response does not yet provide the complete entry required by the read command from the master device, the response can still be cached without affecting the reception of other responses. Furthermore, after the data of a given response has been moved from cache 1_3 to cache 1_4 and cache 1_5 respectively, that data can be deleted from cache 1_3 to reduce the occupation of cache 1_3. As another example, cache 1_5 records the valid data of the entry the master device wants to access. Until the L2P accelerator has received all the valid data for an entry, the partially received valid data is held in cache 1_5. Even if the master device issues multiple read commands simultaneously, the portion of the data received by the L2P accelerator for each of those read commands is recorded in cache 1_5, thus supporting parallel processing of multiple read commands issued by the master device. When the L2P accelerator has received all the valid data for an entry the master device wants to access, it promptly moves this valid data from cache 1_5 to cache 1_6 to construct the entry, and frees the space occupied by this valid data in cache 1_5. Therefore, cache 1_5 serves to cache the responses to the multiple read commands generated from the read commands issued by the master device, whereas cache 1_6 serves to assemble the complete entry the master device wants to access.
[0244] The following section uses the example of a logic circuit receiving a read command D1 from the master device and generating read commands E11 and E12 based on read command D1 to explain other operations in each cache.
[0245] As an example, in Figure 8E, in response to the response to read command E11 being stored in cache 1_3, the corresponding protocol information 11 is obtained from that response and stored in cache 1_4, and the corresponding data Q11 and tag Q are obtained from that response and stored in cache 1_5; the response to read command E11 is then deleted from cache 1_3.
[0246] As another example, in response to storing the protocol information 11 corresponding to read command E11 in cache 1_4, cache 1_2 is also accessed according to protocol information 11 to determine whether the responses to all read commands corresponding to read command D1 (read commands E11 and E12) have been received.
[0247] As another example, if the response to read command E12 corresponding to read command D1 has not been received, the number of read commands whose responses have been received, or have not yet been received, is recorded.
[0248] As another example, if both read commands E11 and E12 have been received, the valid data of the entry for accessing the L2P table indicated by read command D1 is obtained from the data Q11 corresponding to read command E11 and the data Q12 corresponding to read command E12 in cache 1_5, and the valid data of the entry is merged with the null bit data to obtain the entry, and the entry is stored in cache 1_6; and the data corresponding to read commands E11 and E12 and the tag Q are deleted from cache 1_5.
[0249] As another example, if the responses to both read command E11 and read command E12 have been received, the protocol information 11 corresponding to read command E11 and the protocol information 12 corresponding to read command E12 are deleted from cache 1_4.
[0250] As another example, in response to storing the entry for accessing the L2P table indicated by the read command D1 in cache 1_6, a response to the read command D1 is generated based on the protocol information 2 corresponding to the read command D1 and the entry obtained from cache 1_6, and stored in cache 1_7; and the protocol information 2 corresponding to the read command D1, the protocol information 11 corresponding to the read command E11, and the protocol information 12 corresponding to the read command E12 are deleted from cache 1_2, and the entry is deleted from cache 1_6.
[0251] As another example, the response to read command D1 is stored in cache 1_7, the response to read command D1 is retrieved from cache 1_7 and sent to the master device, and the response to read command D1 is deleted from cache 1_7.
[0252] As another example, in response to generating read commands E11 and E12 based on read command D1, the address index of read command D1 is deleted from cache 1_1.
[0253] By promptly deleting data from the caches, cache utilization is improved, reducing the required cache size while still supporting the processing of multiple concurrent read commands. For example, if the L2P accelerator supports processing a maximum of N read commands from the master device simultaneously, then cache 1_6 needs to hold a maximum of N L2P table entries and their corresponding Q values simultaneously; cache 1_3 needs to hold a maximum of 2N responses from the memory controller simultaneously; and caches 1_4 and 1_5 each need to hold a maximum of 2N data items simultaneously. Cache 1_7 only needs to accommodate the response to one master device read command. Caches 1_1 and 1_2 each need to hold a maximum of N data items simultaneously. Optionally, the capacity of each cache can be smaller than the above values to reduce cost without significantly reducing the L2P accelerator's concurrent processing capability for master device read commands.
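The sizing relationship can be written out as a small calculation. This is a sketch under stated assumptions: each master-device read command is taken to generate at most two read commands to the memory (which is what the 2N bounds imply), and cache 1_6 is taken to scale with N, one entry per in-flight master-device read command. The function name `cache_capacities` and the dictionary keys are hypothetical.

```python
def cache_capacities(n, max_second_per_first=2):
    """Upper bounds on simultaneously cached items for n concurrent read commands."""
    per_second = max_second_per_first * n
    return {
        "cache_1_1": n,           # address indexes
        "cache_1_2": n,           # ID relationships
        "cache_1_3": per_second,  # raw responses from the memory controller
        "cache_1_4": per_second,  # protocol information per response
        "cache_1_5": per_second,  # response data per response
        "cache_1_6": n,           # assembled entries (assumed to scale with n)
        "cache_1_7": 1,           # one outgoing response at a time
    }


caps = cache_capacities(4)
```

For N = 4 this gives bounds of 8 items for caches 1_3, 1_4, and 1_5, 4 items for caches 1_1, 1_2, and 1_6, and a single response for cache 1_7.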
[0254] For example, cache 1_2, cache 1_4, cache 1_5, and cache 1_6 are cache arrays. The cache array includes multiple cache units, and each cache unit is respectively used to store the relationship between the identification information of a read command D and the identification information of one or more corresponding read commands E, the protocol information of one or more read commands E corresponding to each read command D, the response data corresponding to each read command E, or the entries and updated tags corresponding to each read command D.
[0255] Figure 8F shows a schematic diagram of the structures of multiple caches in the logic circuit according to an embodiment of the present application.
[0256] Take as an example the logic circuit receiving two read commands D1 and D2 from the master device, generating read commands E11 and E12 according to read command D1, and generating read commands E21 and E22 according to read command D2. In Figure 8F, the cache array corresponding to cache 1_2 includes two cache units, which are respectively used to cache the relationship between the identification information of read command D1 and the identifications of read commands E11 and E12, and to cache the relationship between the identification information of read command D2 and the identifications of read commands E21 and E22, for example, <identifications of D1, E11, E12> and <identifications of D2, E21, E22>. The cache array corresponding to cache 1_4 includes four cache units, which are respectively used to cache the protocol information of read command E11, the protocol information of read command E12, the protocol information of read command E21, and the protocol information of read command E22. Here, the protocol information of read command E11 contains the E11 identification, the protocol information of read command E12 contains the E12 identification, the protocol information of read command E21 contains the E21 identification, and the protocol information of read command E22 contains the E22 identification. The cache array corresponding to cache 1_5 includes four cache units, which are respectively used to cache the response data of read command E11, the response data of read command E12, the response data of read command E21, and the response data of read command E22. The cache array corresponding to cache 1_6 includes two cache units, which are respectively used to cache the L2P table entry and the updated tag to be accessed by read command D1, and to cache the L2P table entry and the updated tag to be accessed by read command D2.
[0257] In addition, it should be understood that in Figure 8F, once the L2P table entry to be accessed by read command D1 is cached in cache 1_6, the E11 and E12 identifiers cached in cache 1_4, the relationship between the identification information of read command D1 and the identifiers of read commands E11 and E12 in cache 1_2, and the E11 and E12 data in cache 1_5 can be deleted. Similarly, once the L2P table entry to be accessed by read command D2 is cached in cache 1_6, the E21 and E22 identifiers cached in cache 1_4, the relationship between the identification information of read command D2 and the identifiers of read commands E21 and E22 in cache 1_2, and the E21 and E22 data in cache 1_5 can be deleted.
[0258] Furthermore, as mentioned above with reference to Figure 4, when the accelerator needs to perform a read-modify-write operation during the processing of a write command, the processing of the write command requires the participation of both the write channel and the read channel. To facilitate understanding, the processing performed by each module involved in the write channel and the read channel is briefly introduced below, taking as an example the master device sending a write command A1 to the accelerator, where the accelerator needs to perform a read-modify-write operation while processing write command A1.
[0259] Figure 9 shows a schematic diagram of the accelerator performing read-modify-write operations according to an embodiment of this application.
[0260] As an example, in Figure 9, in the process of performing a read-modify-write operation, the command generation module 23 of the read channel obtains the byte alignment information of the valid data of the L2P table entry indicated by write command A1 from cache 4 of the write channel, and obtains one or more memory addresses corresponding to write command A1 from cache 3. The command generation module 23 generates a corresponding read command based on each memory address, and adds the content corresponding to each generated read command to cache 1_1 of the read channel. Each read command generates a record in cache 1_1 of the read channel, recording <read address, identifier>, where the "identifier" has a specific value indicating that this is a read command from the write channel, and that the result read through it should be sent to the write channel, not to the master device. After the read channel generates the one or more read commands corresponding to write command A1, the subsequent processing of these read commands is the same as the processing of read commands sent by the master device, such as filling cache 1_2. In cache 1_2, the "identifier" is also used to record that this is a read command from the write channel, and the one or more read commands are sent to the memory through the AXI bus. As another example, after the command generation module 23 generates a corresponding read command based on each memory address, it can also directly add the content corresponding to each generated read command to cache 1_2 of the read channel. Each read command generates a record in cache 1_2 of the read channel, recording <read address, identifier>.
[0261] Furthermore, after receiving the one or more read commands corresponding to write command A1, the memory reads data according to each read command and feeds back the read data as response data to the read channel. The read channel receives the response data corresponding to each read command and stores the response data in cache 1_5. In response to storing the response data in cache 1_5, the read channel records the specific "identifier" in cache 1_4. By querying cache 1_2, it determines that this specific identifier represents a read command from the write channel, and thus moves the data in cache 1_5 to cache 8 of the write channel. In response to the response data being moved to cache 8, the packing module 13 of the write channel, during the data packing process, obtains response data from cache 8 in addition to obtaining valid data or concatenated data from cache 7 and protocol information from cache 5. Then, the packing module 13 combines the obtained valid data or concatenated data with the response data to obtain combined data, and generates the data to be written according to the combined data and the protocol information. This data is then sent to the memory controller, and the memory controller sends it to the memory.
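The combining step performed by the packing module can be illustrated as a masked overlay of the new valid data onto the cell contents read back from memory. As with the earlier description, the bit layout is an assumption of this sketch: 64-bit cells, bit 0 as the least significant bit, and later cells holding higher bits. The function name `read_modify_write` is hypothetical.

```python
CELL_BITS = 64
CELL_MASK = (1 << CELL_BITS) - 1


def read_modify_write(old_cells, bit_offset, valid_bits, new_valid):
    """Overlay new_valid (valid_bits wide) onto old cell contents at bit_offset."""
    if bit_offset + valid_bits > len(old_cells) * CELL_BITS:
        raise ValueError("valid data does not fit in the cells read back")
    combined = 0
    for i, cell in enumerate(old_cells):
        combined |= (cell & CELL_MASK) << (i * CELL_BITS)
    field_mask = ((1 << valid_bits) - 1) << bit_offset
    # Keep every bit outside the field, replace the bits inside it.
    combined = (combined & ~field_mask) | ((new_valid << bit_offset) & field_mask)
    return [(combined >> (i * CELL_BITS)) & CELL_MASK for i in range(len(old_cells))]


# Writing a 30-bit value at bit offset 60 touches two cells.
written = read_modify_write([0, 0], 60, 30, 0x2ABCDEF1)
# Clearing the same field leaves the neighbouring bits of both cells intact.
cleared = read_modify_write([CELL_MASK, CELL_MASK], 60, 30, 0)
```

The read-back is what allows the bits outside the 30-bit field, which belong to other entries sharing the same cells, to be written back unchanged.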
[0262] It should be noted that, for the sake of brevity, this application describes some methods and their embodiments as a series of actions and combinations thereof. However, those skilled in the art will understand that the solution of this application is not limited to the order of the described actions. Therefore, based on the disclosure or teachings of this application, those skilled in the art will understand that some steps can be performed in other orders or simultaneously. Furthermore, those skilled in the art will understand that the embodiments described in this application can be considered as optional embodiments, that is, the actions or modules involved are not necessarily essential for the implementation of one or more solutions of this application. In addition, depending on the solution, the description of some embodiments in this application also has different emphases. In view of this, those skilled in the art will understand that parts not described in detail in a certain embodiment of this application can also be referred to the relevant descriptions of other embodiments.
[0263] In terms of specific implementation, based on the disclosure and teachings of this application, those skilled in the art will understand that the several embodiments disclosed in this application can also be implemented in other ways not disclosed herein. For example, regarding the various units in the electronic device or device embodiments described above, this document has divided them based on logical functions, but in actual implementation, there may be other ways of division. As another example, multiple units or components can be combined or integrated into another system, or some features or functions in a unit or component can be selectively disabled. Regarding the connection relationship between different units or components, the connection discussed above in conjunction with the accompanying drawings can be a direct or indirect coupling between units or components. In some scenarios, the aforementioned direct or indirect coupling involves a communication connection utilizing an interface, wherein the communication interface can support electrical, optical, acoustic, magnetic, or other forms of signal transmission.
[0264] Although preferred embodiments of this application have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of this application. Clearly, those skilled in the art can make various alterations and variations to this application without departing from its spirit and scope. Thus, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.
Claims
1. An L2P accelerator for coupling a master device and a memory, and accelerating the processing of read and write commands issued by the master device to an L2P table in the memory, characterized in that it comprises a write channel and a read channel, wherein: The read channel, in response to receiving one or more first read commands from the master device, generates one or more second read commands based on each first read command; in response to receiving first response data from the memory for each second read command, determines the L2P table entry to be read indicated by each first read command based on all the first response data corresponding to each first read command, and sends the L2P table entry corresponding to each first read command and the first protocol information as a response to the first read command to the master device; The write channel, in response to receiving one or more write commands from the master device, obtains the corresponding address index and L2P table entry for each write command; determines, based on the address index and the number of valid data bits in the L2P table entry, one or more memory addresses corresponding to each write command and the position in the memory of the first bit of the valid data in its L2P table entry; and writes the valid data of the corresponding L2P table entry into the memory based on the one or more memory addresses corresponding to each write command and the position of its first bit in the memory.
The read channel also responds to the presence of a first write command in one or more write commands, wherein the valid data of the first L2P table entry indicated by the first write command is not byte aligned and / or the first bit in the valid data of the first L2P table entry is not located at the starting position of its corresponding storage cell in the memory, generates one or more third read commands according to the memory address corresponding to the first write command, and sends one or more third read commands to the memory. The write channel, in response to receiving second response data from all third read commands fed back by the memory, combines the valid data with a portion of the data in the second response data according to the position of the first bit in the valid data of the first L2P table entry in the memory to obtain first data; and generates second data according to the second protocol information stored in the cache and the first data, and sends the second data to the memory.
2. The accelerator according to claim 1, characterized in that, The one or more write commands include a second write command and a third write command, wherein the second write command indicates a second L2P table entry, the third write command indicates a third L2P table entry, and the second L2P table entry and the third L2P table entry are different entries in the L2P table; The write channel responds to the fact that the second L2P table entry and the third L2P table entry can be concatenated, concatenating the valid data of the second L2P table entry and the valid data of the third L2P table entry to obtain one or more concatenated data, and then writing the concatenated data into the memory.
3. The accelerator according to claim 2, characterized in that, The read channel generates one or more fourth read commands based on one or more memory addresses corresponding to the second write command; the write channel responds to receiving third response data from all fourth read commands fed back by the memory; the third response data is combined with the concatenated data to obtain third data, and the third data and the second protocol information are sent to the memory; or The read channel generates one or more fifth read commands based on one or more memory addresses corresponding to the third write command; the write channel responds to the fourth response data received from all fifth read commands fed back by the memory; the fourth response data is combined with the concatenated data to obtain fourth data, and the fourth data and the second protocol information are sent to the memory.
4. The accelerator according to claim 3, characterized in that, In response to the second write command, there are one or more corresponding fourth read commands, and the third write command has one or more corresponding fifth read commands. In response to receiving the third response data and the fourth response data, the write channel combines the concatenated data with the third response data and the fourth response data to obtain the fifth data, and sends the fifth data and the second protocol information to the memory.
5. The accelerator according to any one of claims 2-4, characterized in that, The write channel responds to the condition that the second L2P table entry and the third L2P table entry cannot be concatenated, and that the memory addresses indicated by one or more fourth read commands and one or more fifth read commands do not conflict, by writing the valid data of the second L2P table entry and the valid data of the third L2P table entry into the memory in parallel.
6. The accelerator according to claim 5, characterized in that, In response to a conflict between the memory addresses indicated by one or more fourth read commands and one or more fifth read commands, the write channel writes the valid data of the second L2P table entry into the memory and then issues one or more fifth read commands to the memory; or, after writing the valid data of the third L2P table entry into the memory, issues one or more fourth read commands to the memory.
7. The accelerator according to claim 6, characterized in that, In response to a conflict between one or more fourth read commands and one or more fifth read commands indicating memory addresses, the write channel sends one or more memory addresses corresponding to the second write command to the read channel. The read channel responds to receiving one or more memory addresses corresponding to the second write command, and generates one or more fourth read commands based on the memory addresses; The write channel responds to the third response data received from all fourth read commands fed back by the memory, combines the third response data with the valid data of the second L2P table entry to obtain the sixth data, and sends the sixth data and the second protocol information to the memory. In response to receiving third response data for all fourth read commands from the memory, the write channel sends one or more memory addresses corresponding to the third write command to the read channel. The read channel responds to receiving one or more memory addresses corresponding to the third write command, and generates one or more fifth read commands based on the memory addresses; The write channel responds to the fourth response data received from all fifth read commands fed back by the memory, combines the fourth response data with the valid data of the third L2P table entry to obtain the seventh data, and sends the seventh data and the second protocol information to the memory.
8. The accelerator according to claim 1, characterized in that, the read channel includes a first logic circuit and a first plurality of caches, wherein the first plurality of caches includes a first cache, a second cache, a third cache, a fourth cache, a fifth cache, a sixth cache, and a seventh cache; in response to receiving a first read command from the master device, the first logic circuit parses the first read command to obtain the address index indicated by the first read command and stores the address index in the first cache, generates one or more second read commands based on the received first read command, and stores, in the second cache, the relationship between the first identification information identifying each first read command and the second identification information identifying its corresponding one or more second read commands; in response to receiving first response data for each second read command, the first logic circuit stores the first response data for each second read command in the third cache; in response to storing the first response data of each second read command in the third cache, the first logic circuit obtains third protocol information from the first response data in the third cache and stores it in the fourth cache, and obtains eighth data and stores it in the fifth cache; wherein, based on the relationship between the first identification information and the second identification information stored in the second cache, and in response to receiving all of the first response data corresponding to each first read command, the first logic circuit obtains the valid data of the L2P table entry indicated by the first read command from all of the eighth data corresponding to that first read command in the fifth cache, merges the valid data of the L2P table entry with empty-bit data according to the length of the L2P table entry to obtain the L2P table entry, stores the L2P table entry in the sixth cache, updates the flag, and stores the updated flag in the sixth cache as well; the L2P table entry and the updated flag are retrieved from the sixth cache, the first identification information of the first read command is retrieved from the second cache, and the first protocol information is generated based on the first identification information; and the L2P table entry and the first protocol information are used as the response to the first read command, the response being stored in the seventh cache.
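As an illustration of the read path in claim 8, and not as part of the claims, the sketch below maps an address index to one or more memory addresses, extracts the valid data from the response data, and pads it with empty bits up to the entry length. The widths `VALID_BITS`, `ENTRY_BITS` and `WORD_BITS`, the little-endian layout, and the helper `pack_table` are all assumptions chosen for this example; the caches, identification information and flag handling are omitted.

```python
VALID_BITS = 36   # hypothetical number of valid data bits per L2P table entry
ENTRY_BITS = 64   # hypothetical full length of an L2P table entry
WORD_BITS = 64    # hypothetical width of one memory access
VALID_MASK = (1 << VALID_BITS) - 1
WORD_MASK = (1 << WORD_BITS) - 1


def pack_table(entries):
    """Helper for the example: store only the valid bits back to back."""
    bits = 0
    for i, entry in enumerate(entries):
        bits |= (entry & VALID_MASK) << (i * VALID_BITS)
    total_bits = len(entries) * VALID_BITS
    n_words = (total_bits + WORD_BITS - 1) // WORD_BITS
    return [(bits >> (w * WORD_BITS)) & WORD_MASK for w in range(n_words)]


def second_read_addresses(index):
    """Memory words touched by one entry: one word, or two if it straddles."""
    first_bit = index * VALID_BITS
    last_bit = first_bit + VALID_BITS - 1
    return list(range(first_bit // WORD_BITS, last_bit // WORD_BITS + 1))


def read_entry(memory, index):
    """Serve one first read command: issue the second reads, extract the
    valid data, and pad it with zero bits to ENTRY_BITS."""
    addresses = second_read_addresses(index)
    combined = 0
    for i, addr in enumerate(addresses):
        combined |= memory[addr] << (i * WORD_BITS)   # response data, in order
    offset = index * VALID_BITS - addresses[0] * WORD_BITS
    valid = (combined >> offset) & VALID_MASK
    return valid.to_bytes(ENTRY_BITS // 8, "little")  # upper bits are empty


if __name__ == "__main__":
    table = pack_table([0x123456789, 0xABCDEF012, 0xFFFFFFFFF])
    print(second_read_addresses(1))
    print(read_entry(table, 1).hex())
```

With these assumed widths, the entry at index 1 straddles two memory words, which is the case where a single first read command produces more than one second read command.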
9. The accelerator according to claim 2, characterized in that, the write channel includes a second logic circuit and a second plurality of caches, wherein the second plurality of caches includes an eighth cache, a ninth cache, a tenth cache, an eleventh cache, a twelfth cache, a thirteenth cache, a fourteenth cache, a fifteenth cache, a sixteenth cache, and a seventeenth cache; in response to obtaining an address index and an L2P table entry from a write command, the second logic circuit stores the address index in the eighth cache and the L2P table entry in the ninth cache, determines one or more memory addresses and a first location based on the address index and the number of valid data bits, stores the one or more memory addresses and the first location in the tenth cache, and stores, in the eleventh cache, the mapping relationship between the identification information of the write command and the memory addresses and the first location; in response to the L2P table entry being cached in the ninth cache, the valid data of the L2P table entry is retrieved from the ninth cache and stored in the thirteenth cache; in response to an L2P table entry in the ninth cache not being concatenable with an L2P table entry stored in the thirteenth cache, the valid data of the L2P table entry is moved to the sixteenth cache; in response to an L2P table entry in the ninth cache being concatenable with an L2P table entry stored in the thirteenth cache, the valid data of the L2P table entry is written to the thirteenth cache, and the concatenated valid data of the L2P table entries is then moved from the thirteenth cache to the sixteenth cache; and in response to the valid data of the L2P table entry being moved to the sixteenth cache, and to neither the second write command nor the third write command having a corresponding read command, the valid data of the second L2P table entry and the valid data of the third L2P table entry are written from the sixteenth cache into the memory.
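Two steps of claim 9 can be sketched in Python for illustration: deriving the memory addresses and the first location from the address index and the number of valid data bits, and concatenating the valid data of entries before it is moved on. The claim does not define when two entries can be concatenated; this sketch assumes that entries with consecutive address indexes qualify. The bit widths, the `Stager` class and its lists (loosely standing in for the thirteenth and sixteenth caches) are likewise hypothetical, and the explicit `flush` call is a simplification of the claimed triggers.

```python
VALID_BITS = 36   # hypothetical number of valid data bits per L2P table entry
WORD_BITS = 64    # hypothetical width of one memory access
VALID_MASK = (1 << VALID_BITS) - 1


def locate(index):
    """Return (memory addresses, first location) for an address index,
    where the first location is the bit offset inside the first word."""
    first_bit = index * VALID_BITS
    last_bit = first_bit + VALID_BITS - 1
    addresses = list(range(first_bit // WORD_BITS, last_bit // WORD_BITS + 1))
    return addresses, first_bit % WORD_BITS


class Stager:
    def __init__(self):
        self.staged = []   # stand-in for the thirteenth cache
        self.outgoing = [] # stand-in for the sixteenth cache

    def push(self, index, entry):
        """Stage the valid data of one entry; if it cannot be concatenated
        with what is already staged, move the staged data on first."""
        if self.staged and index != self.staged[-1][0] + 1:
            self.flush()
        self.staged.append((index, entry & VALID_MASK))

    def flush(self):
        """Move the concatenated valid data to the outgoing list as
        (starting index, number of entries, packed bits)."""
        if not self.staged:
            return
        packed = 0
        for i, (_, valid) in enumerate(self.staged):
            packed |= valid << (i * VALID_BITS)
        self.outgoing.append((self.staged[0][0], len(self.staged), packed))
        self.staged = []


if __name__ == "__main__":
    print(locate(1))
    stager = Stager()
    stager.push(4, 0xAAA)
    stager.push(5, 0xBBB)  # consecutive index: concatenated with entry 4
    stager.push(9, 0xCCC)  # not consecutive: entries 4 and 5 are moved on
    stager.flush()
    print(stager.outgoing)
```

Concatenating adjacent entries before they leave the staging area lets one memory write carry the valid data of several entries, at the cost of tracking which indexes are currently staged.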