FPGA-based NVMe host controller hardware offloading device

Offloading the NVMe protocol stack to FPGA hardware addresses the high CPU resource consumption of software NVMe stacks in embedded systems, improving performance and stability; in particular, it reduces latency and increases bandwidth.

CN120540735B · Active · Publication Date: 2026-06-23 · ZHEJIANG UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: ZHEJIANG UNIV
Filing Date: 2025-05-16
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

In embedded systems, software-implemented NVMe protocol stacks result in high CPU resource consumption, making it impossible to fully utilize the performance of NVMe SSDs, especially in embedded systems with limited processing power.

Method used

An FPGA-based NVMe host controller hardware offloading device is adopted, which includes a MicroBlaze initialization and configuration module, an I/O queue processing module, a master AXI bus control module, a slave AXI bus control module, and a data buffer DMA module. The key processes of the NVMe protocol stack are offloaded to the FPGA hardware logic circuit for queue management, data scheduling, and status monitoring.

Benefits of technology

Significantly reduces CPU load, improves system performance and stability, especially in application scenarios with limited resources and strict response time requirements, increases bandwidth and reduces latency, and avoids system instability caused by software resource allocation issues.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses an FPGA-based NVMe host controller hardware offloading device comprising: a MicroBlaze initialization configuration module, which initializes and configures the system after power-on; an I/O queue processing module, which performs command splitting, encapsulation, and distribution and maintains the doorbell registers; a master AXI bus control module, which packs doorbell values into AXI data packets and submits them to the NVMe SSD; a slave AXI bus control module, which converts read and write transactions initiated by the SSD into operations such as SQE/PRP reads, data reads and writes, CQE writes, and interrupt submission; and a data buffer DMA module, which reads, writes, and moves data in the data buffer according to base address and length. The device offloads the NVMe protocol stack to FPGA hardware, implementing functions such as queue management, data transfer, and interrupt handling in hardware. This optimizes resource allocation, overcomes the performance bottleneck of existing software implementations of the NVMe protocol stack, and makes fuller use of the bandwidth of the NVMe SSD.

Description

Technical Field

[0001] This invention targets embedded application scenarios and specifically relates to an FPGA-based NVMe host controller hardware offloading device.

Background Technology

[0002] NVMe (Non-Volatile Memory Express) solid-state drives are widely used in data centers, the Internet of Things (IoT), and cloud computing due to their low latency and high bandwidth. The software-based NVMe protocol stack has significant advantages over the traditional AHCI (Advanced Host Controller Interface) protocol, greatly reducing latency, increasing storage I / O parallelism, and significantly improving data read and write speeds.

[0003] As storage capacity continues to expand and the demand for high real-time embedded scenarios (such as autonomous driving) increases, software-implemented NVMe protocol stacks are gradually revealing some bottlenecks. Throughout the interaction with NVMe SSDs, the CPU frequently participates in protocol parsing, queue scheduling, and interrupt handling. These tasks require significant computing resources, leading to a substantial increase in context switching and memory access overhead, further increasing the CPU load. This is particularly pronounced in embedded systems with relatively low operating frequencies and weak processing power, where software-implemented NVMe drivers exacerbate CPU resource consumption. Limited by CPU processing power, the performance of NVMe SSDs in embedded systems cannot be fully utilized.

Summary of the Invention

[0004] Therefore, the present invention provides an FPGA-based NVMe host controller hardware offloading device to solve the aforementioned technical problems, specifically adopting the following technical solution:

[0005] An FPGA-based NVMe host controller hardware offloading device includes: a MicroBlaze initialization configuration module, an I / O queue processing module, a master AXI bus control module, a slave AXI bus control module, and a data buffer DMA module;

[0006] The MicroBlaze initialization configuration module is used to initialize the system after power-on, including initializing the NVMe SSD BAR space, establishing the admin (management) queue and the I/O queues, obtaining the parameters of the NVMe SSD, configuring MSI-X interrupt information, and configuring the doorbell register addresses.

[0007] The I / O queue processing module is used to cut the NVMe host controller commands issued by the user into NVMe host controller sub-commands according to a certain step size, and encapsulate them into standard I / O submission queue entries based on the NVMe protocol. It maps the physical memory space pointed to by the submission queue entries through the PRP addressing model, maintains the pointer and doorbell status of the queue, and generates a completion signal after the data is completely transmitted, and returns the completion status.

[0008] The master AXI bus control module, built on a PCIe bus DMA bridge, is used to package the I/O queue doorbell value into an AXI data packet and submit it to the NVMe SSD.

[0009] The slave AXI bus control module, also built on the PCIe bus DMA bridge, is used to parse the read and write transactions initiated by the NVMe SSD, according to their addresses, into SQE/PRP read commands, data read and write commands, CQE fill commands, and MSI-X interrupt submission commands, and to send them to the I/O queue processing module or the data buffer DMA module to perform I/O queue data transfers or data transfers between the SSD and the buffer.

[0010] The data buffer DMA module is used to read, write, and move data in the data buffer according to the base address and length.

[0011] Furthermore, the I / O queue processing module includes: a command parsing submodule, a command encapsulation submodule, an SQ doorbell register maintenance submodule, a command retrieval submodule, and a completion signal generation submodule;

[0012] The command parsing submodule is used to divide the NVMe host controller command submitted by the user into several NVMe host controller subcommands according to the parameters set during initialization and a certain step size. Based on the set data size that a single SQE can describe, the submodule submits the number of SQEs contained in the NVMe host controller subcommand to the SQ doorbell register maintenance submodule and submits the number of divided NVMe host controller subcommands to the completion signal generation submodule.

[0013] The command encapsulation submodule is used to encapsulate the segmented NVMe host controller subcommands into SQEs that conform to the NVMe protocol specification and use the PRP addressing model, and to store the generated SQEs in Block RAM. When writing SQEs, the queue depth and the current SQ tail pointer are used to determine whether the queue is empty or full. When the queue is full, writing stops. When the queue is not full, the SQE is written to the queue, and an update command for the SQ tail pointer is generated and submitted to the SQ doorbell register maintenance submodule.

[0014] The SQ doorbell register maintenance submodule is used to generate and update the SQ Tail Doorbell register based on the SQ head and tail pointer states.

[0015] The command retrieval submodule is used to receive the SQE/PRP read command from the slave AXI bus control module. The command retrieval submodule reads the SQE from the Block RAM, generates a PRP entry, and returns the content of the read SQE or the generated PRP entry.

[0016] The completion signal generation submodule is used to generate NVMe host controller command completion signals and their completion status.
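As an illustration of the encapsulation step described for the command encapsulation submodule, the following is a minimal software sketch of packing one 64-byte read/write SQE in PRP mode. The field positions follow the NVMe base specification layout for NVM read/write commands; the function name `build_io_sqe` and its parameters are hypothetical and do not come from the patent, which implements this step in FPGA logic.

```python
import struct

OPC_WRITE = 0x01  # NVM command set: Write
OPC_READ = 0x02   # NVM command set: Read

def build_io_sqe(opcode, cid, nsid, prp1, prp2, slba, num_blocks):
    """Pack a 64-byte NVMe read/write SQE that uses PRP addressing.

    Hypothetical helper for illustration only.
    """
    if not 1 <= num_blocks <= 0x10000:
        raise ValueError("num_blocks must be in 1..65536")
    # DW0: opcode in bits 7:0, command identifier in bits 31:16.
    # FUSE and PSDT are left at 0 (PSDT = 0 selects PRP).
    dw0 = (opcode & 0xFF) | ((cid & 0xFFFF) << 16)
    # DW12 bits 15:0: number of logical blocks, zero-based.
    dw12 = (num_blocks - 1) & 0xFFFF
    return struct.pack(
        "<IIQQQQQIIII",
        dw0,         # DW0
        nsid,        # DW1: namespace identifier
        0,           # DW2-3: reserved
        0,           # DW4-5: metadata pointer (unused here)
        prp1,        # DW6-7: PRP entry 1
        prp2,        # DW8-9: PRP entry 2 (or PRP list pointer)
        slba,        # DW10-11: starting LBA
        dw12,        # DW12
        0, 0, 0,     # DW13-15
    )
```

For example, `build_io_sqe(OPC_READ, 7, 1, 0x1000, 0, 0x20, 8)` describes an 8-block read starting at LBA 0x20 into a buffer at address 0x1000.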

[0017] Furthermore, in an SQE using the PRP addressing model, there are two PRP entries, and each PRP entry points to only one physical page. If the data distribution described by the SQE is no more than one physical page, only one PRP entry is used in the SQE; if the data distribution described by the SQE is more than one physical page but no more than two physical pages, two PRP entries are used in the SQE; if the data distribution described by the SQE is more than two physical pages, the first PRP entry in the SQE points to the data of one physical page, while the second PRP entry points to a list that stores more PRP entries, thus enabling an SQE to describe more data.
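The three PRP cases above can be sketched in software as follows. This is only an illustrative model: the 4096-byte page size, the function names, and the `prp_list_addr` parameter (the address at which a PRP list would be exposed) are assumptions, not details taken from the patent.

```python
PAGE_SIZE = 4096  # assumed physical page size

def pages_spanned(addr, length, page_size=PAGE_SIZE):
    """Number of physical pages touched by [addr, addr + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = addr // page_size
    last = (addr + length - 1) // page_size
    return last - first + 1

def select_prps(addr, length, prp_list_addr, page_size=PAGE_SIZE):
    """Return (prp1, prp2, mode) for a transfer of `length` bytes at `addr`."""
    n = pages_spanned(addr, length, page_size)
    if n == 1:
        # Data fits in one page: only PRP1 is used.
        return addr, 0, "single"
    if n == 2:
        # Two pages: PRP2 points directly at the second page.
        next_page = (addr // page_size + 1) * page_size
        return addr, next_page, "double"
    # More than two pages: PRP2 points at a list of further PRP entries.
    return addr, prp_list_addr, "list"
```

For instance, a 12 KiB transfer starting on a page boundary spans three pages, so the sketch returns the list case with `prp2` set to the PRP list address.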

[0018] Furthermore, the specific method for generating PRP entries is to increment the content of the PRP entry according to the offset address of the accessed PRP space based on the physical page size, thereby describing a large, contiguous data address space in the local data buffer.

[0019] Furthermore, the completion signal generation submodule compares the number of NVMe host controller subcommands with the number of CQEs submitted. When it is ensured that the data transmission of all NVMe host controller subcommands has been completed, a completion signal for an NVMe host controller command is generated.

[0020] The completion signal generation submodule also counts the status of CQE and integrates the status information of all CQEs to return the completion status of the NVMe host controller command.
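The counting logic of the completion signal generation submodule can be modeled with a small sketch. The patent does not state how the individual CQE statuses are combined; the version below assumes that the first non-zero status is reported, and the class and method names are hypothetical.

```python
class CompletionTracker:
    """Tracks CQEs for one host controller command split into subcommands."""

    def __init__(self, num_subcommands):
        if num_subcommands <= 0:
            raise ValueError("num_subcommands must be positive")
        self.expected = num_subcommands
        self.received = 0
        self.first_error = 0  # assumption: report the first non-zero status

    def on_cqe(self, status):
        """Record one CQE status; return True once all subcommands are done."""
        self.received += 1
        if status != 0 and self.first_error == 0:
            self.first_error = status
        return self.received == self.expected

    @property
    def status(self):
        """Aggregated completion status (0 means every CQE succeeded)."""
        return self.first_error
```

The completion signal corresponds to `on_cqe` returning `True`, and the completion status corresponds to the `status` property read at that moment.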

[0021] Furthermore, after receiving a new doorbell value update command, the master AXI bus control module initiates an AXI write transaction to write the doorbell value to the SQ Tail Doorbell register address or CQ Head Doorbell register address set during initialization.
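The patent only says that the doorbell register addresses are configured during initialization. As a hedged illustration of where such addresses typically come from, the sketch below computes doorbell offsets within the controller's register space using the layout defined in the NVMe base specification (doorbells start at offset 0x1000, with a stride derived from CAP.DSTRD). The function name is hypothetical.

```python
def doorbell_offset(qid, is_cq, dstrd=0):
    """Offset of a queue doorbell register inside the NVMe register BAR.

    qid   -- queue identifier (0 is the admin queue)
    is_cq -- False for the SQ Tail Doorbell, True for the CQ Head Doorbell
    dstrd -- doorbell stride field from the CAP register
    """
    stride = 4 << dstrd
    return 0x1000 + (2 * qid + (1 if is_cq else 0)) * stride
```

With the default stride, I/O queue 1 has its SQ Tail Doorbell at offset 0x1008 and its CQ Head Doorbell at offset 0x100C.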

[0022] Furthermore, the slave AXI bus control module includes:

[0023] The read transaction processing submodule is used to parse the AXI read transactions initiated by the DMA bridge of the PCIe bus into read transaction commands containing address and length, distribute them to other modules, and receive the read transaction data returned by other modules and submit them to the NVMe SSD through the AXI bus.

[0024] The write transaction processing submodule is used to parse the AXI write transactions initiated by the DMA bridge of the PCIe bus into write transaction commands containing address, length and data, distribute them to other modules, and receive the write transaction results returned by other modules and submit them to the NVMe SSD.

[0025] Furthermore, the read transaction processing submodule determines, based on the address of the AXI read request, whether the content to be read is an SQE/PRP entry or data in the buffer waiting to be written to the SSD. It then generates a corresponding read transaction command according to the type of content to be read. If it is an SQE/PRP read command, the command is submitted to the I/O queue processing module, which performs entry retrieval based on the read transaction command. If it is a read command for data to be written to the SSD, the command is submitted to the data buffer DMA module, which reads the data in the corresponding buffer according to the base address and length in the command and returns the result.

[0026] Furthermore, the write transaction processing submodule includes:

[0027] The AXI write transaction channel control submodule is used to parse different types of write transaction commands based on the write transaction address, and to distribute different types of write transaction commands and the data to be sent.

[0028] The CQ processing submodule is used to receive CQEs and process them accordingly, while updating the CQ Head Doorbell value.

[0029] The MSI-X interrupt handling submodule is used to receive interrupt requests and generate interrupt signals to submit to the MicroBlaze initialization configuration module.

[0030] Furthermore, when the bus initiates a write transaction, the AXI write transaction channel control submodule determines from the address of the write request whether the request carries a CQE, an MSI-X interrupt, or SSD data read by the user. If it is a CQE, the data to be written is submitted to the CQ processing submodule. After receiving the CQE, the CQ processing submodule updates its internal CQ Head Doorbell value and submits a CQ Head Doorbell update command to the master AXI bus control module. At the same time, the CQ processing submodule submits the completion status information contained in the CQE to the I/O queue processing module, which generates the read/write completion and read/write completion status signals. If it is an MSI-X interrupt, the interrupt address and data are submitted to the MSI-X interrupt processing submodule, which compares them with preset parameters and, after confirming that the interrupt information is correct, generates an interrupt signal and submits it to the MicroBlaze initialization configuration module for interrupt handling. If it is SSD data to be written to the buffer, the write transaction channel control submodule submits the base address and length as a data write command to the data buffer DMA module and simultaneously transfers the write data to the data buffer.

[0031] The advantage of this invention lies in the fact that the FPGA-based NVMe host controller hardware offloading device offloads key processes and steps in the NVMe protocol stack onto the FPGA hardware logic circuit, optimizing resource allocation and improving stability while enhancing performance. Especially in application scenarios with limited software resources and stringent response time requirements, leveraging the parallel processing capabilities of the FPGA to complete key processes such as queue management, data scheduling, and status monitoring significantly reduces CPU load, lowers latency, increases bandwidth, and improves overall system performance.

[0032] A further advantage of this invention is that the FPGA-based NVMe host controller hardware offloading device offloads the key steps of the NVMe protocol stack to the FPGA logic circuit. It can take advantage of the strong anti-interference capability of the hardware circuit to enable the embedded system to work normally in complex electromagnetic environments. At the same time, it avoids software crashes or deadlocks caused by resource allocation problems during operation, which would otherwise affect the stability of the system.

Attached Figure Description

[0033] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0034] Figure 1 is a system structure block diagram of an FPGA-based NVMe host controller hardware offloading device according to the present invention.

[0035] Figure 2 shows the content and description of the NVMe host controller commands in an FPGA-based NVMe host controller hardware offloading device according to the present invention.

[0036] Figure 3 is a functional block diagram of the I/O queue processing module of an FPGA-based NVMe host controller hardware offloading device according to the present invention.

[0037] Figure 4 is a flowchart of the I/O queue processing module of an FPGA-based NVMe host controller hardware offloading device according to the present invention.

[0038] Figure 5 shows the mapping relationship between PRP entries and host memory address space when the data distribution described by the SQE does not exceed one physical page in an FPGA-based NVMe host controller hardware offloading device according to the present invention.

[0039] Figure 6 shows the mapping relationship between PRP entries and host memory address space when the data distribution described by the SQE exceeds one physical page but does not exceed two physical pages in an FPGA-based NVMe host controller hardware offloading device according to the present invention.

[0040] Figure 7 shows the mapping relationship between PRP entries and host memory address space when the data distribution described by the SQE exceeds two physical pages in an FPGA-based NVMe host controller hardware offloading device according to the present invention.

[0041] Figure 8 is a functional structure block diagram of the master AXI interface control module in an FPGA-based NVMe host controller hardware offloading device according to the present invention.

[0042] Figure 9 is a functional block diagram of the read transaction processing submodule of the slave AXI interface control module in an FPGA-based NVMe host controller hardware offloading device.

[0043] Figure 10 is a functional block diagram of the write transaction processing submodule of the slave AXI interface control module in an FPGA-based NVMe host controller hardware offloading device.

[0044] Figure 11 is a flowchart of the slave AXI interface control module for DMABridge For PCIe in an FPGA-based NVMe host controller hardware offloading device.

Detailed Implementation

[0045] The embodiments of this application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain this application, and should not be construed as limiting this application.

[0046] Figure 1 shows a system architecture block diagram of an FPGA-based NVMe host controller hardware offloading device according to the present invention. The device includes: a MicroBlaze initialization configuration module, an I/O queue processing module, a master AXI bus control module for DMABridge For PCIe (a DMA bridge for the PCIe bus), a slave AXI bus control module for DMABridge For PCIe, and a data buffer DMA module. The MicroBlaze initialization configuration module initializes and configures the system after power-on. The I/O queue processing module splits the NVMe host controller commands issued by the user into NVMe host controller subcommands according to a certain step size and encapsulates them into standard Submission Queue Entries (SQEs) based on the NVMe protocol. It maps the physical memory space pointed to by the SQEs through the Physical Region Page (PRP) addressing model while maintaining the queue pointers and doorbell status. After the data has been completely transferred, it generates a completion signal and returns a completion status. The master AXI bus control module for DMABridge For PCIe packages the doorbell value into an AXI data packet and submits it to the NVMe SSD. The slave AXI bus control module for DMABridge For PCIe parses read/write transactions initiated by the NVMe SSD, based on their addresses, into SQE/PRP read commands, data read/write commands, CQE (Completion Queue Entry) fill commands, and MSI-X interrupt submission commands. These commands are then sent to the I/O queue processing module or the data buffer DMA module for I/O queue data transfer or data transfer between the SSD and the buffer. The data buffer DMA module reads, writes, and moves data in the data buffer according to the base address and length.

[0047] In the embodiments of this application, after the system is powered on, the MicroBlaze initialization configuration module performs operations such as initializing the NVMe SSD BAR space, establishing the admin (management) queue and the I/O queues, obtaining the parameters of the NVMe SSD (such as physical block size and count, maximum queue depth, and maximum read/write length of a single submission queue entry), configuring MSI-X interrupt information, and configuring the doorbell register addresses. Afterward, SSD read/write operations are controlled entirely by hardware without software intervention.

[0048] The system's workflow is explained in detail below with reference to Figure 4, the workflow diagram of the I/O queue processing module of the FPGA-based NVMe host controller hardware offloading device, and Figure 11, the flowchart of the slave AXI interface control module of the device.

[0049] In embodiments of this application, the user submits the NVMe host controller commands shown in Figure 2 to the NVMe host controller through a custom-format command interface. The I/O queue processing module then splits these commands into NVMe host controller subcommands at a certain step size according to the initialization parameters and encapsulates them into standard Submission Queue Entries (SQEs) based on the NVMe protocol. The module maps the physical memory space pointed to by the SQEs through the Physical Region Page (PRP) addressing model, maintains the queue pointers and doorbell status, and submits the doorbell to the master AXI bus control module.
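The splitting step can be illustrated with a short software sketch. It assumes that a user command is described by a starting logical block address and a block count, and that the step size is expressed in blocks; these names and units are illustrative assumptions, since the patent defines the actual command format in Figure 2.

```python
def split_command(start_lba, num_blocks, step_blocks):
    """Split one user command into (slba, nblocks) subcommands.

    Each subcommand covers at most `step_blocks` blocks; the last one
    carries whatever remains.
    """
    if num_blocks <= 0 or step_blocks <= 0:
        raise ValueError("num_blocks and step_blocks must be positive")
    subcommands = []
    offset = 0
    while offset < num_blocks:
        n = min(step_blocks, num_blocks - offset)
        subcommands.append((start_lba + offset, n))
        offset += n
    return subcommands
```

A 10-block command at LBA 100 with a step of 4 blocks yields three subcommands: `(100, 4)`, `(104, 4)`, and `(108, 2)`. The length of the returned list is the count that would be handed to the completion signal generation logic.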

[0050] In the embodiments of this application, after receiving the SQ Tail Doorbell from the I/O queue processing module, the master AXI bus control module encapsulates it into an AXI data packet and submits it to the NVMe SSD. Upon receiving the SQ Tail Doorbell from the hardware host, the NVMe SSD proactively initiates a read transaction via the PCIe link and submits the transaction to the slave AXI bus control module. The slave AXI bus control module parses the read transaction into an SQE read command, retrieves the corresponding SQE from the Block RAM storing the SQE, and returns it to the NVMe SSD.

[0051] In the embodiments of this application, after obtaining the SQE, the NVMe controller actively initiates read/write transactions to the host's local memory via the DMA bridge IP core of the PCIe bus, based on the SQE contents. The slave AXI bus control module parses the read/write transactions initiated by the NVMe SSD into SQE/PRP read commands, data read/write commands, CQE fill commands, and MSI-X interrupt submission commands, and sends them to the I/O queue processing module or the data buffer DMA module for I/O queue data transfer or data transfer between the SSD and the buffer.

[0052] In the embodiments of this application, after receiving a read / write data command, the data buffer DMA module reads, writes, and moves data in the data buffer according to the base address and length.
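A minimal software model of the base-address-and-length interface of the data buffer DMA module might look like the following. The class name, the bounds checking, and the use of a `bytearray` as the buffer are assumptions made for illustration; the patent implements this module in hardware.

```python
class DataBuffer:
    """Byte-addressable buffer accessed by base address and length."""

    def __init__(self, size):
        self.mem = bytearray(size)

    def _check(self, base, length):
        # Reject accesses that fall outside the buffer.
        if base < 0 or length < 0 or base + length > len(self.mem):
            raise ValueError("access outside data buffer")

    def write(self, base, data):
        self._check(base, len(data))
        self.mem[base:base + len(data)] = data

    def read(self, base, length):
        self._check(base, length)
        return bytes(self.mem[base:base + length])

    def move(self, src, dst, length):
        # Read first so overlapping regions are copied correctly.
        self.write(dst, self.read(src, length))
```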

[0053] In the embodiments of this application, when the slave AXI bus control module receives the CQE fill command from the NVMe SSD and the hardware host has confirmed that all data transfers of the current SQE are complete, it updates the CQ Head Doorbell value and submits the value to the NVMe SSD through the master AXI bus control module.

[0054] In the embodiments of this application, the I / O queue processing module compares the number of NVMe host controller subcommands (equal to the number of SQEs) with the number of CQEs submitted. When it is ensured that the data transmission of all NVMe host controller subcommands has been completed, a completion status of an NVMe host controller command will be generated. It will also count the status of CQEs and, by combining the status information of all CQEs, return the completion status of the NVMe host controller command.

[0055] In the embodiments of this application, as shown in Figure 3, the I/O queue processing module includes: a command parsing submodule, a command encapsulation submodule, an SQ doorbell register maintenance submodule, a command retrieval submodule, and a completion signal generation submodule.

[0056] The command parsing submodule divides the user-submitted NVMe host controller command into several NVMe host controller subcommands according to the initialization parameters and a certain step size. Based on the set data size that a single SQE can describe, it submits the number of SQEs contained in the NVMe host controller subcommand to the SQ doorbell register maintenance submodule and the number of divided NVMe host controller subcommands to the completion signal generation submodule. The command encapsulation submodule encapsulates the divided NVMe host controller subcommands into SQEs conforming to the NVMe protocol specification and using the PRP addressing model. It stores the generated SQEs in Block RAM. When writing an SQE, it checks the queue depth and the current SQ tail pointer to determine whether the queue is empty or full. When the queue is full, writing stops; when the queue is not full, the SQE is written to the queue, and an SQ tail pointer update command is generated and submitted to the SQ doorbell register maintenance submodule. The SQ doorbell register maintenance submodule generates and updates the SQ Tail Doorbell register based on the SQ head and tail pointer status. The command retrieval submodule receives SQE/PRP read commands from the slave AXI bus control module. It reads the SQE from Block RAM, generates a PRP entry, and returns the content of either the read SQE or the generated PRP entry. The completion signal generation submodule generates the NVMe host controller command completion signal and its completion status.
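The queue full/empty check and the tail pointer update described above can be sketched as a circular queue. This is an illustrative model that follows the usual NVMe convention (the queue is full when advancing the tail would make it equal to the head, so one slot always stays unused). How the hardware learns the new head value is an assumption here: the sketch takes it from the SQ head pointer that NVMe completion entries carry.

```python
class SubmissionQueue:
    """Circular submission queue with NVMe-style full/empty detection."""

    def __init__(self, depth):
        if depth < 2:
            raise ValueError("depth must be at least 2")
        self.depth = depth
        self.slots = [None] * depth  # stands in for the Block RAM
        self.head = 0
        self.tail = 0

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        return (self.tail + 1) % self.depth == self.head

    def push(self, sqe):
        """Store an SQE; return the new tail doorbell value, or None if full."""
        if self.is_full():
            return None  # writing stops while the queue is full
        self.slots[self.tail] = sqe
        self.tail = (self.tail + 1) % self.depth
        return self.tail

    def update_head(self, sq_head):
        """Record the head pointer reported by the SSD (assumed from CQEs)."""
        self.head = sq_head % self.depth
```

The value returned by `push` is what would be written to the SQ Tail Doorbell register.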

[0057] In the embodiments of this application, the SQE using the PRP addressing model contains two PRP entries, and each PRP entry points to only one physical page. If the data distribution described by the SQE does not exceed one physical page, only one PRP entry is used in the SQE, as shown in Figure 5. If the data distribution described by the SQE exceeds one physical page but does not exceed two physical pages, the SQE uses two PRP entries, as shown in Figure 6. If the data distribution described by an SQE exceeds two physical pages, the first PRP entry in the SQE points to the data on one physical page, while the second PRP entry points to a list containing more PRP entries (PRP List), as shown in Figure 7. This allows an SQE to describe more data.

[0058] In the embodiments of this application, the specific method for generating PRP entries is to increment the content of the PRP entry according to the offset address of the accessed PRP space based on the physical page size, thereby describing a large, contiguous data address space in the local data buffer.
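One way to read this method is that the PRP list is never stored: when the SSD reads some offset of the PRP space, the entries are computed on the fly from that offset. The sketch below follows that reading under several stated assumptions: 8-byte PRP entries, a 4096-byte page, a page-aligned contiguous buffer, and list entry 0 describing the second page because PRP1 in the SQE already covers the first. The function name is hypothetical.

```python
import struct

PAGE_SIZE = 4096       # assumed physical page size
PRP_ENTRY_BYTES = 8    # each PRP entry is a 64-bit address

def prp_list_read(buffer_base, byte_offset, length, page_size=PAGE_SIZE):
    """Synthesize `length` bytes of a PRP list starting at `byte_offset`.

    Entry i of the list is buffer_base + (i + 1) * page_size, so the list
    describes one contiguous region without being stored anywhere.
    """
    if byte_offset % PRP_ENTRY_BYTES or length % PRP_ENTRY_BYTES:
        raise ValueError("offset and length must be multiples of 8")
    first = byte_offset // PRP_ENTRY_BYTES
    count = length // PRP_ENTRY_BYTES
    entries = [buffer_base + (first + 1 + i) * page_size for i in range(count)]
    return struct.pack("<%dQ" % count, *entries)
```

Reading 16 bytes at offset 0 for a buffer at 0x100000 yields the two entries 0x101000 and 0x102000; reading 8 bytes at offset 16 yields 0x103000.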

[0059] In the embodiments of this application, the completion signal generation submodule compares the number of NVMe host controller subcommands (equal to the number of SQEs) with the number of CQEs submitted. When it is ensured that the data transmission of all NVMe host controller subcommands has been completed, a completion signal for the NVMe host controller command is generated. The completion signal generation submodule also counts the status of CQEs and, by combining the status information of all CQEs, returns the completion status of the NVMe host controller command.

[0060] In the embodiments of this application, after receiving a new doorbell value update command, the master AXI bus control module initiates an AXI write transaction to write the doorbell value to the SQ Tail Doorbell register address or CQ Head Doorbell register address set during initialization.

[0061] As shown in Figure 8, in an embodiment of this application, the master AXI bus control module includes:

[0062] The data channel control submodule is used to receive SQ Tail Doorbell and CQ Head Doorbell values and temporarily store them in the FIFO within the module. When a write transaction is initiated, the doorbell value is submitted to the DMA bridge of the PCIe bus through the AXI write data channel.

[0063] The address channel control submodule is used to initiate an AXI write transaction request to the corresponding address based on the doorbell type when the doorbell FIFO in the data channel control submodule is not empty.

[0064] The response channel control submodule is used to receive AXI write transaction response signals from the DMA bridge on the PCIe bus.
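The interplay of the three channel control submodules can be approximated by a behavioral model: doorbells are queued in a FIFO, each issued write pairs the address for that doorbell type with the stored value, and responses retire outstanding writes. The class, the string keys, and the single shared FIFO are illustrative assumptions rather than the patent's actual signal-level design.

```python
from collections import deque

class MasterDoorbellWriter:
    """Behavioral model of doorbell writes through the master AXI port."""

    def __init__(self, sq_tail_addr, cq_head_addr):
        # Addresses are assumed to have been configured at initialization.
        self.addr = {"sq_tail": sq_tail_addr, "cq_head": cq_head_addr}
        self.fifo = deque()
        self.outstanding = 0

    def push(self, kind, value):
        """Data channel side: buffer a doorbell value of the given type."""
        if kind not in self.addr:
            raise ValueError("unknown doorbell type")
        self.fifo.append((kind, value))

    def issue(self):
        """Address channel side: start a write if the FIFO is not empty."""
        if not self.fifo:
            return None
        kind, value = self.fifo.popleft()
        self.outstanding += 1
        return self.addr[kind], value & 0xFFFFFFFF

    def on_response(self):
        """Response channel side: one write has been acknowledged."""
        self.outstanding -= 1
```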

[0065] In embodiments of this application, the slave AXI bus control module includes a read transaction processing submodule and a write transaction processing submodule.

[0066] The read transaction processing submodule parses AXI read transactions initiated by the DMA bridge on the PCIe bus into read transaction commands containing address and length, distributes them to other modules, and receives read transaction data returned by other modules, submitting it to the NVMe SSD via the AXI bus. The write transaction processing submodule parses AXI write transactions initiated by the DMA bridge on the PCIe bus into write transaction commands containing address, length, and data, distributes them to other modules, and receives write transaction results returned by other modules, submitting them to the NVMe SSD.

[0067] As shown in Figure 9, in the embodiments of this application, the read transaction processing submodule includes:

[0068] The read address channel control submodule is used to determine the type of content to be read in this read transaction based on the AXI read transaction address initiated by the DMA bridge of the PCIe bus, and encapsulate the address and length of the read transaction into a read command with flag bits and submit it to the read command distribution submodule according to the type.

[0069] The read command distribution submodule is used to determine the content to be read in this read transaction based on the flag bits in the received read command, and further encapsulate the read command into an SQE/PRP read command with flag bits or a read command for data to be written to the SSD, and distribute it to the corresponding module.

[0070] The read data channel control submodule is used to receive SQE / PRP data or data to be written to the SSD from the corresponding module, and return this data to the DMA bridge of the PCIe bus through the AXI read data channel.

[0071] Specifically, the read transaction processing submodule determines, based on the address of the AXI read request, whether the content to be read is an SQE/PRP entry or data in the buffer waiting to be written to the SSD. It then generates the corresponding read transaction command according to the type of content to be read. If it is an SQE/PRP read command, the command is submitted to the I/O queue processing module, which performs entry retrieval based on the read transaction command. If it is a read command for data to be written to the SSD, the command is submitted to the data buffer DMA module, which reads the data in the corresponding buffer according to the base address and length in the command and returns the result.
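The address-based classification of read transactions can be sketched as a simple decoder. The address map below (base addresses and window sizes for the SQ, PRP, and data regions) is entirely hypothetical; the patent does not disclose the actual values, only that the type is decided from the address.

```python
# Hypothetical address map, for illustration only.
SQ_BASE, SQ_SIZE = 0x0000_0000, 0x1_0000
PRP_BASE, PRP_SIZE = 0x0001_0000, 0x1_0000
DATA_BASE, DATA_SIZE = 0x1000_0000, 0x1000_0000

def decode_read(addr, length):
    """Map an AXI read request to (target module, kind, offset, length)."""
    if SQ_BASE <= addr < SQ_BASE + SQ_SIZE:
        return ("io_queue", "sqe", addr - SQ_BASE, length)
    if PRP_BASE <= addr < PRP_BASE + PRP_SIZE:
        return ("io_queue", "prp", addr - PRP_BASE, length)
    if DATA_BASE <= addr < DATA_BASE + DATA_SIZE:
        return ("data_buffer_dma", "data", addr - DATA_BASE, length)
    raise ValueError("read address outside every known region")
```

A 64-byte read at address 0x40 would be classified as the fetch of the second SQE, while a read inside the data window is forwarded as a base-address-and-length command to the data buffer DMA module.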

[0072] As shown in Figure 10, in the embodiments of this application, the write transaction processing submodule includes: an AXI write transaction channel control submodule, a CQ processing submodule, and an MSI-X interrupt processing submodule.

[0073] The AXI write transaction channel control submodule is used to parse different types of write transaction commands based on the write transaction address, and to distribute different types of write transaction commands and the data to be sent.

[0074] The CQ processing submodule is used to receive CQEs and process them accordingly, while updating the CQ Head Doorbell value.

[0075] The MSI-X interrupt processing submodule is used to receive interrupt requests and generate interrupt signals to submit to the MicroBlaze initialization configuration module.

[0076] In the embodiments of this application, when a write transaction is initiated on the bus, the AXI write transaction channel control submodule determines, based on the address of the write request, whether the content of the request is a CQE, an MSI-X interrupt, or SSD data read by the user. If it is a CQE, the data to be written is submitted to the CQ processing submodule. After receiving the CQE, the CQ processing submodule updates the CQ Head Doorbell value internally and submits a CQ Head Doorbell update command to the master AXI bus control module. Simultaneously, the CQ processing submodule submits the completion status information contained in the CQE to the I / O queue control module, which generates the read / write completion signal and the read / write completion status signal. If it is an MSI-X interrupt, the interrupt address and data information are submitted to the MSI-X interrupt processing submodule, which compares this information with preset parameters and, after confirming that the interrupt information is correct, generates an interrupt signal and submits it to the MicroBlaze initialization configuration module for interrupt processing. If it is SSD data to be written to the buffer, the AXI write transaction channel control submodule submits the base address and length information as a data write command to the data buffer DMA control module, and simultaneously transfers the write data to the data buffer.
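The three-way write dispatch of paragraph [0076] can be modelled roughly as below. This is a behavioural sketch under stated assumptions rather than the actual RTL: the address windows, the preset MSI-X address and data values, the `WritePath` class, and the `"msix_rejected"` outcome for a mismatched interrupt are hypothetical. The only detail taken from outside the patent is the standard NVMe completion entry layout, in which the status field occupies bits 31:17 of the fourth dword.

```python
import struct
from dataclasses import dataclass, field

# Hypothetical address map and MSI-X presets; not taken from the patent.
CQ_BASE, CQ_SIZE = 0x0002_0000, 0x0001_0000
MSIX_ADDR, MSIX_DATA = 0x0003_0000, 0x0000_0001
DATA_BASE, DATA_SIZE = 0x1000_0000, 0x1000_0000


@dataclass
class WritePath:
    cq_depth: int
    cq_head: int = 0
    doorbell_cmds: list = field(default_factory=list)  # sent to the master AXI side
    statuses: list = field(default_factory=list)       # sent to the I/O queue logic
    irq_count: int = 0                                  # interrupts raised to MicroBlaze
    buffer: dict = field(default_factory=dict)          # offset -> bytes written

    def handle_write(self, addr: int, payload: bytes) -> str:
        """Dispatch one write transaction by address and return its type."""
        if CQ_BASE <= addr < CQ_BASE + CQ_SIZE:
            # CQE: extract the status field (dword 3, bits 31:17).
            dw3 = struct.unpack_from("<I", payload, 12)[0]
            self.statuses.append((dw3 >> 17) & 0x7FFF)
            # Advance the CQ head with wrap-around and queue a doorbell update.
            self.cq_head = (self.cq_head + 1) % self.cq_depth
            self.doorbell_cmds.append(("cq_head_doorbell", self.cq_head))
            return "cqe"
        if addr == MSIX_ADDR:
            # Compare the interrupt data with the preset value before raising it.
            if struct.unpack_from("<I", payload, 0)[0] == MSIX_DATA:
                self.irq_count += 1
                return "msix"
            return "msix_rejected"  # assumed behaviour on mismatch
        if DATA_BASE <= addr < DATA_BASE + DATA_SIZE:
            # SSD read data: store it at the buffer offset given by the address.
            self.buffer[addr - DATA_BASE] = bytes(payload)
            return "data"
        raise ValueError(f"write address {addr:#x} is outside all known windows")
```

A call such as `WritePath(cq_depth=4).handle_write(CQ_BASE, cqe_bytes)` advances the head pointer by one entry and records one doorbell update, which corresponds to the CQ processing step described above.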

[0077] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the above embodiments do not limit the present invention in any way, and all technical solutions obtained by equivalent substitution or equivalent transformation fall within the protection scope of the present invention.

Claims

1. A hardware offloading device for an FPGA-based NVMe host controller, characterized in that it includes: a MicroBlaze initialization configuration module, an I / O queue processing module, a master AXI bus control module, a slave AXI bus control module, and a data buffer DMA module; the MicroBlaze initialization configuration module is used to initialize the system after power-on, including initializing the NVMe SSD BAR space, establishing the admin queue and I / O queue, obtaining various parameters of the NVMe SSD, configuring MSI-X interrupt information, and configuring the doorbell register address; the I / O queue processing module is used to split the NVMe host controller commands issued by the user into NVMe host controller subcommands according to a set step size, encapsulate them into standard I / O submission queue entries based on the NVMe protocol, map the physical memory space pointed to by the submission queue entries through the PRP addressing model, maintain the pointer and doorbell status of the queue, generate a completion signal after the data is completely transmitted, and return the completion status; the master AXI bus control module is based on a PCIe bus DMA bridge and is used to package the I / O queue doorbell value into an AXI data packet and submit it to the NVMe SSD; the slave AXI bus control module is based on a PCIe bus DMA bridge and is used to parse, according to the address, the read and write transactions initiated by the NVMe SSD into SQE / PRP read commands, data read and write commands, CQE fill commands, and MSI-X interrupt submission commands, and send them to the I / O queue processing module or the data buffer DMA module to perform I / O queue data transmission or data transmission between the SSD and the buffer; the data buffer DMA module is used to read, write, and move data in the data buffer according to the base address and length.

2. The FPGA-based NVMe host controller hardware offloading device according to claim 1, characterized in that the I / O queue processing module includes: a command parsing submodule, a command encapsulation submodule, an SQ doorbell register maintenance submodule, a command retrieval submodule, and a completion signal generation submodule; the command parsing submodule is used to divide the NVMe host controller command submitted by the user into several NVMe host controller subcommands according to the parameters set during initialization and a set step size, and, based on the set data size that a single SQE can describe, to submit the number of SQEs contained in each NVMe host controller subcommand to the SQ doorbell register maintenance submodule and submit the number of divided NVMe host controller subcommands to the completion signal generation submodule; the command encapsulation submodule is used to encapsulate the divided NVMe host controller subcommands into SQEs that conform to the NVMe protocol specification and use the PRP addressing model, and store the generated SQEs in Block RAM; when writing SQEs, the queue depth and the current SQ tail pointer information are used to determine whether the queue is empty or full; when the queue is full, writing stops; when the queue is not full, the SQE is written to the queue, and an update command for the SQ tail pointer is generated and submitted to the SQ doorbell register maintenance submodule; the SQ doorbell register maintenance submodule is used to generate and update the SQ Tail Doorbell register based on the SQ head and tail pointer states; the command retrieval submodule is used to receive the SQE / PRP read command from the slave AXI bus control module, read the SQE from the Block RAM or generate a PRP entry, and return the content of the read SQE or the generated PRP entry; the completion signal generation submodule is used to generate NVMe host controller command completion signals and their completion status.

3. The FPGA-based NVMe host controller hardware offloading device according to claim 2, characterized in that, in an SQE using the PRP addressing model, there are two PRP entries, and each PRP entry points to only one physical page. If the data described by the SQE spans no more than one physical page, only one PRP entry is used in the SQE; if the data described by the SQE spans more than one physical page but no more than two physical pages, two PRP entries are used in the SQE; if the data described by the SQE spans more than two physical pages, the first PRP entry in the SQE points to the data of one physical page, while the second PRP entry points to a list that stores more PRP entries, so that an SQE can describe more data.

4. The FPGA-based NVMe host controller hardware offloading device according to claim 2, characterized in that, The specific method for generating PRP entries is to increment the content of the PRP entry according to the physical page size based on the offset address of the accessed PRP space, thereby describing a large, contiguous data address space in the local data buffer.

5. The FPGA-based NVMe host controller hardware offloading device according to claim 2, characterized in that, The completion signal generation submodule compares the number of NVMe host controller subcommands with the number of CQEs submitted. Once it is ensured that the data transmission of all NVMe host controller subcommands has been completed, a completion signal for an NVMe host controller command is generated. The completion signal generation submodule also counts the status of CQE and integrates the status information of all CQEs to return the completion status of the NVMe host controller command.

6. The FPGA-based NVMe host controller hardware offloading device according to claim 1, characterized in that, after receiving a new doorbell value update command, the master AXI bus control module initiates an AXI write transaction to write the doorbell value to the SQ Tail Doorbell register address or CQ Head Doorbell register address set during initialization.

7. The FPGA-based NVMe host controller hardware offloading device according to claim 1, characterized in that the slave AXI bus control module includes: a read transaction processing submodule, used to parse the AXI read transactions initiated by the DMA bridge of the PCIe bus into read transaction commands containing address and length, distribute them to other modules, receive the read transaction data returned by other modules, and submit the data to the NVMe SSD through the AXI bus; and a write transaction processing submodule, used to parse the AXI write transactions initiated by the DMA bridge of the PCIe bus into write transaction commands containing address, length and data, distribute them to other modules, receive the write transaction results returned by other modules, and submit the results to the NVMe SSD.

8. The FPGA-based NVMe host controller hardware offloading device according to claim 7, characterized in that the read transaction processing submodule determines, based on the address of the AXI read request, whether the content to be read is an SQE or PRP entry or data in the buffer waiting to be written to the SSD, and then generates the corresponding read transaction command according to the type of content to be read. If it is an SQE / PRP read command, the command is submitted to the I / O queue management module, which performs entry retrieval based on the read transaction command; if it is a read command for data to be written to the SSD, the command is submitted to the data buffer DMA control module, which reads the data in the corresponding buffer according to the base address and length in the command and returns the result.

9. The FPGA-based NVMe host controller hardware offloading device according to claim 7, characterized in that the write transaction processing submodule includes: an AXI write transaction channel control submodule, used to parse different types of write transaction commands based on the write transaction address, and to distribute the different types of write transaction commands and the data to be sent; a CQ processing submodule, used to receive CQEs and process them accordingly, while also updating the CQ Head Doorbell value; and an MSI-X interrupt processing submodule, used to receive interrupt requests and generate interrupt signals to submit to the MicroBlaze initialization configuration module.

10. The FPGA-based NVMe host controller hardware offloading device according to claim 9, characterized in that, when a write transaction is initiated on the bus, the AXI write transaction channel control submodule determines, based on the address of the write request, whether the content of the request is a CQE, an MSI-X interrupt, or SSD data read by the user. If it is a CQE, the data to be written is submitted to the CQ processing submodule. Upon receiving the CQE, the CQ processing submodule updates the CQ Head Doorbell value internally and submits a CQ Head Doorbell update command to the master AXI bus control module. Simultaneously, the CQ processing submodule submits the completion status information contained in the CQE to the I / O queue control module, which generates the read / write completion signal and the read / write completion status signal. If it is an MSI-X interrupt, the interrupt address and data information are submitted to the MSI-X interrupt processing submodule, which compares this information with preset parameters and, after confirming that the interrupt information is correct, generates an interrupt signal and submits it to the MicroBlaze initialization configuration module for interrupt processing. If it is SSD data to be written to the buffer, the AXI write transaction channel control submodule submits the base address and length information as a data write command to the data buffer DMA control module, and simultaneously transfers the write data to the data buffer.
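Illustrative note (not part of the claims): the PRP rules recited in claims 3 and 4 can be sketched as a small function. This is a simplified model under several assumptions: the buffer base address is page-aligned, the page size is 4096 bytes, the PRP list fits in a single page, and `prp_list_addr` is a hypothetical address at which the device would fetch the list. The function name `build_prps` is likewise invented for the example.

```python
PAGE_SIZE = 4096  # assumed physical page size


def build_prps(base: int, length: int, prp_list_addr: int,
               page_size: int = PAGE_SIZE):
    """Return (prp1, prp2, prp_list) for a contiguous, page-aligned buffer.

    prp2 is 0 when unused; prp_list is empty unless more than two pages
    are described, in which case prp2 holds the address of the list.
    """
    if base % page_size != 0:
        raise ValueError("this sketch assumes a page-aligned base address")
    if length <= 0:
        raise ValueError("length must be positive")

    pages = -(-length // page_size)  # ceiling division
    if pages == 1:
        # At most one physical page: only the first PRP entry is used.
        return base, 0, []
    if pages == 2:
        # At most two physical pages: both PRP entries point at data pages.
        return base, base + page_size, []
    # More than two pages: PRP2 points to a list whose entries are generated
    # by incrementing the address one page at a time.
    prp_list = [base + i * page_size for i in range(1, pages)]
    return base, prp_list_addr, prp_list
```

Because the data buffer is contiguous, every list entry is simply the previous one plus the page size, which is why the entries can be generated on the fly from the offset being read instead of being stored in memory.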