Multi-function storage device and method of processing messages

By designing a multi-functional storage device and utilizing the SR-IOV and NVMe protocols, efficient data transmission and management of multiple hosts simultaneously connected to the storage device are achieved, solving the cost and complexity issues caused by the increase in the number of storage devices in existing technologies.

CN113253919B · Active · Publication Date: 2026-06-26 · SAMSUNG ELECTRONICS CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: SAMSUNG ELECTRONICS CO LTD
Filing Date: 2021-02-03
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing storage devices can typically only be used with one host, which means that adding more storage devices requires additional costs, network connectivity resources, and energy consumption. At the same time, adding software to the host server increases complexity and computational overhead.

Method used

The approach uses a multi-functional storage device comprising a chassis, a storage device, a bridging device, and a controller. The device supports single root input/output virtualization (SR-IOV), enables multi-host communication through an embedded network interface controller and a bridging circuit, uses the NVMe protocol for data transmission, and manages and isolates multiple virtual functions through an FPGA and an NVMe controller.

Benefits of technology

It enables multiple hosts to connect to the storage device simultaneously without additional overhead, reducing costs and complexity, and supporting efficient data transfer and management of multiple remote hosts.

Abstract

A multifunctional storage device and a method of processing a message are disclosed. The multifunctional storage device can include a chassis, a storage device, and a bridge device. The storage device can include a connector to receive a first message from a host using a first protocol, a physical function (PF) and a virtual function (VF) exposed by the storage device via the connector, a storage to store data related to the first message, and a controller to manage writing write data to the storage and reading read data from the storage. The bridge device can include an embedded network interface controller (eNIC) to receive a second message from the host using a second protocol, a write buffer, a read buffer, a bridge circuit to convert the second message using the second protocol into the first message using the first protocol, and a root port to identify the storage device and to transmit the first message to the VF. The bridge device can be configured to map the host to the VF.

Description

[0001] This application claims the benefit of U.S. Provisional Patent Application No. 62/971,902, filed February 7, 2020, and U.S. Provisional Patent Application No. 63/075,092, filed September 4, 2020, both of which are incorporated herein by reference for all purposes.

Technical Field

[0002] The inventive concept generally relates to storage devices, and more specifically, to the use of storage devices by multiple remote hosts.

Background Technology

[0003] Virtual machines provide a way for host servers to support different functions in response to different requests from the server. However, some storage devices can provide a single controller that can be used with only one host at a time.

[0004] In some respects, storage providers can use an increase in the number of storage devices to enable more virtual machines to connect to the storage devices simultaneously. However, more storage devices can lead to additional costs for purchasing storage devices (and the servers that house them), greater network connectivity resources, and increased energy expenditure to power the storage devices.

[0005] In other respects, software may be included at the host server, for example, to enable multiple virtual machines (which may appear as separate hosts) to communicate with the storage device. However, adding software to the host server can lead to increased server complexity and additional computational overhead.

Summary of the Invention

[0006] According to one aspect of an example embodiment, a multi-functional storage device is provided, comprising: a chassis; a storage device associated with the chassis, the storage device including: a connector for receiving a first message originating from a host using a first protocol; a physical function (PF) and a virtual function (VF) exposed by the storage device via the connector; a storage for storing data associated with the first message; and a controller for managing writing write data to the storage and reading read data from the storage; and a bridging device associated with the chassis, the bridging device including: an embedded network interface controller (eNIC) for receiving a second message from the host using a second protocol; a write buffer for storing write data to be written by the host to the storage device; a read buffer for storing read data to be read from the storage device for the host; bridging circuitry for converting the second message using the second protocol into the first message using the first protocol; and a root port for identifying the storage device and for sending the first message to the VF. The bridging device can map the host to the VF.

[0007] The storage device can implement single root input/output virtualization (SR-IOV).

[0008] The storage device may expose the PF, the VF, and a second VF. The multi-functional storage device may support the host communicating with the storage device using the VF and a second host communicating with the storage device using the second VF.

[0009] The storage device may include a solid-state drive (SSD).

[0010] The storage device may include: a Non-Volatile Memory Express (NVMe) SSD; and a first NVMe controller associated with the PF and a second NVMe controller associated with the VF. The first protocol may include the NVMe protocol. The second protocol may include the NVMe over Fabrics (NVMeoF) protocol.

[0011] The host may include a hypervisor and a virtual machine (VM). The storage device can communicate with the hypervisor using the PF and with the VM using the VF.

[0012] The bridging device may include: a first NVMeoF submission/completion queue pair for the hypervisor; a second NVMeoF submission/completion queue pair for the VM; a first NVMe submission/completion queue pair for the PF; and a second NVMe submission/completion queue pair for the VF.

[0013] The bridging device may further include a third NVMe submission/completion queue pair for the VM. The bridging device may use the second and third NVMe submission/completion queue pairs to meet the VM's Quality of Service (QoS) requirements.

[0014] The bridging circuit can generate at least two first messages based at least in part on the second message to send to the storage device.

[0015] The bridging device is operable to assign a tag to a second message.

[0016] According to one aspect of an example embodiment, a method is provided comprising: receiving a first message from a host at a bridging device, the first message using a first protocol; generating a second message at least in part based on the first message, the second message using a second protocol; mapping the host to a virtual function (VF) exposed by a storage device; and sending the second message from the bridging device to the VF exposed by the storage device. The storage device may simultaneously receive a third message originating from a second host.

[0017] The storage device may include: a Non-Volatile Memory Express (NVMe) solid-state drive (SSD); and an NVMe controller associated with the VF. The second protocol may include the NVMe protocol. The first protocol may include the NVMe over Fabrics (NVMeoF) protocol.

[0018] The host may include a hypervisor and a virtual machine (VM). The step of mapping the host to a VF exposed by the storage device may include: mapping the VM to the VF exposed by the storage device; and mapping the hypervisor to a physical function (PF) exposed by the storage device. The method may further include: receiving a fifth message from the hypervisor at the bridging device, the fifth message using the first protocol; generating a fourth message based at least in part on the fifth message, the fourth message using the second protocol; and sending the fourth message from the bridging device to the PF exposed by the storage device.

[0019] The step of receiving a first message from a host at a bridging device may include: receiving a write request from a host at a bridging device, the write request using a first protocol; receiving data for the write request from a host at a bridging device; and buffering the data for the write request in a write buffer.

[0020] The method may further include: receiving a fourth message at a bridging device from a VF exposed by a storage device, the fourth message using a second protocol and at least partially based on a first message; generating a fifth message at least partially based on the fourth message, the fifth message using the first protocol; mapping the VF exposed by the storage device to a host; and sending the fifth message from the bridging device to the host.

[0021] The step of receiving a fourth message from a VF exposed by a storage device at a bridging device may include: receiving a read response from the VF exposed by the storage device at the bridging device, the read response using a second protocol; receiving data for the read response from the VF exposed by the storage device at the bridging device; and buffering the data for the read response in a read buffer.

[0022] The bridging device may include: an NVMeoF submission/completion queue pair for the VM; a first NVMe submission/completion queue pair for the VM; and a second NVMe submission/completion queue pair for the VM. The method may also include: enforcing QoS requirements for the host using the first and second NVMe submission/completion queue pairs for the VM.

[0023] The step of receiving the first message from the host at the bridging device may include: assigning a tag to the first message.

[0024] According to one aspect of an example embodiment, an apparatus is provided including a non-transitory storage medium having instructions stored thereon, the instructions, when executed by a machine, causing: receiving a first message from a host at a bridging device, the first message using a first protocol; generating a second message at least partially based on the first message, the second message using a second protocol; mapping a host to a virtual function (VF) exposed by the storage device; and sending the second message from the bridging device to the VF exposed by the storage device. The storage device may simultaneously receive a third message originating from a second host.

[0025] The non-transitory storage medium may have further instructions stored thereon, which, when executed by a machine, cause: receiving a fourth message at the bridging device from a VF exposed by the storage device, the fourth message using a second protocol and being at least partially based on a first message; generating a fifth message at least partially based on the fourth message, the fifth message using the first protocol; mapping the VF exposed by the storage device to a host; and sending the fifth message from the bridging device to the host.

Description of the Drawings

[0026] The accompanying drawings described below are examples of how embodiments of the inventive concept can be implemented and are not intended to limit the embodiments of the inventive concept. Various embodiments of the inventive concept may include elements not shown in a particular drawing and / or elements shown in a particular drawing may be omitted. The drawings are intended to provide illustration and may not be to scale.

[0027] Figure 1 shows a machine including a multi-functional storage device capable of supporting multiple hosts, according to an embodiment of the inventive concept.

[0028] Figure 2 shows additional details of the machine of Figure 1, according to an embodiment of the inventive concept.

[0029] Figure 3 shows a remote host communicating with the machine of Figure 1, according to an embodiment of the inventive concept.

[0030] Figures 4A to 4B compare how the remote host of Figure 3 views the multi-functional storage device of Figure 1 with how the multi-functional storage device of Figure 1 is actually implemented, according to embodiments of the inventive concept.

[0031] Figures 5A to 5C show connections between the multi-functional storage device of Figure 1 and the remote host of Figure 4B, according to embodiments of the inventive concept.

[0032] Figure 6 shows details of the bridging device of Figure 4B, according to an embodiment of the inventive concept.

[0033] Figure 7 shows details of the Non-Volatile Memory Express (NVMe) SSD of Figure 4B, according to an embodiment of the inventive concept.

[0034] Figures 8 to 9 show the bridging device of Figure 4B processing messages between the remote host of Figure 3 and the virtual function of Figure 4B of the multi-functional storage device of Figure 1, according to embodiments of the inventive concept.

[0035] Figures 10A to 10B show a flowchart of an example process for the bridging device of Figure 4B to process messages between the remote host of Figure 3 and the virtual function of Figure 4B of the multi-functional storage device of Figure 1, according to embodiments of the inventive concept.

[0036] Figure 11 shows a flowchart of an example process for the bridging device of Figure 4B to map a remote host to the functions of Figure 4B exposed by the multi-functional storage device, according to an embodiment of the inventive concept.

[0037] Figure 12 shows a flowchart of an example process for the bridging device of Figure 4B to handle a write request received from the remote host of Figure 3, according to an embodiment of the inventive concept.

[0038] Figure 13 shows a flowchart of an example process for the bridging device of Figure 4B to process a read response from the virtual function of Figure 4B of the multi-functional storage device of Figure 1, according to an embodiment of the inventive concept.

[0039] Figure 14 shows a flowchart of an example process for the bridging device of Figure 4B to use submission/completion queue pairs to enforce Quality of Service (QoS) requirements, according to an embodiment of the inventive concept.

[0040] Figure 15 shows a flowchart of an example process for the bridging device of Figure 4B to process a message received from the remote host of Figure 3 using multiple messages sent to the virtual function of Figure 4B of the multi-functional storage device of Figure 1, according to an embodiment of the inventive concept.

Detailed Implementation

[0041] Reference will now be made in detail to embodiments of the inventive concept, examples of which are illustrated in the accompanying drawings. Numerous specific details are set forth in the following detailed description to provide a thorough understanding of the inventive concept. However, it should be understood that those skilled in the art can practice the inventive concept without these specific details. In other instances, well-known methods, processes, components, circuits, and networks have not been described in detail to avoid unnecessarily obscuring various aspects of the embodiments.

[0042] It will be understood that although the terms first, second, etc., may be used herein to describe various elements, these elements should not be limited by these terms. These terms are used only to distinguish one element from another. For example, without departing from the scope of the inventive concept, a first module may be referred to as a second module, and similarly, a second module may be referred to as a first module.

[0043] The terminology used herein to describe the inventive concept is for the purpose of describing particular embodiments only and is not intended to limit the inventive concept. As used in the description of the inventive concept and the appended claims, the singular form is intended to include the plural form as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein means and includes any and all possible combinations of one or more of the associated listed items. It will also be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of the stated features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Components and features in the drawings are not necessarily drawn to scale.

[0044] Non-Volatile Memory Express over Fabrics (NVMeoF, also known as NVMe-oF) allows storage devices such as solid-state drives (SSDs) to be connected over a network. NVMeoF SSDs can support low-latency remote direct-attach architectures via network protocols including, but not limited to, Ethernet, Transmission Control Protocol/Internet Protocol (TCP/IP), User Datagram Protocol/Internet Protocol (UDP/IP), Remote Direct Memory Access (RDMA), Fibre Channel, or other protocols over a network. NVMeoF SSDs can be connected to one or more remote hosts via switches (such as Ethernet switches or other network switches), which may be located within a chassis or mechanically and/or electrically connected to the chassis. NVMeoF SSDs can operate independently of each other, and the chassis may appear as a pool of SSDs to the remote hosts.

[0045] For example, an NVMeoF Ethernet SSD (eSSD) chassis can contain 48 SSDs. The NVMeoF SSDs can be connected to one or more remote hosts via an Ethernet switch or other network switches present in the chassis. The chassis may also contain a Baseboard Management Controller (BMC) device (or a processing element configured to perform BMC-like operations) and an enclosure that manages at least a portion of the SSDs. Each SSD may have an associated NVMeoF driver running on a remote host. The SSDs can operate independently of each other: to the remote host, such a chassis can appear as a pool of SSDs, also known as Just a Bunch of Flash (JBOF).

[0046] In some respects, NVMeoF SSDs can be more complex (and therefore more expensive) than Non-Volatile Memory Express (NVMe) SSDs. Furthermore, because the controller can be associated with a single host, multiple hosts may not be able to access data on a single SSD simultaneously. With the increasing use of virtualization, associating a specific controller with a single virtual machine (VM) means that other VMs may not be able to access that SSD.

[0047] In various embodiments of the inventive concept, the disclosed system may include a relatively compact, low-latency multi-functional storage device (appearing as, for example, an NVMeoF storage device capable of communicating with multiple independent remote hosts or VMs) without requiring any special applications, drivers, or NVMe controllers. In some embodiments of the inventive concept, the storage device may include, for example, a field-programmable gate array (FPGA) and an NVMe controller with single-root input/output virtualization (SR-IOV) support that exposes multiple functions of the storage device, each function appearing as a separate storage device to the remote host. SR-IOV provides a mechanism for virtualizing access to the storage device within the machine by using both physical functions (PFs) and virtual functions (VFs) of the storage device. By handling virtualization within the storage device, such an implementation avoids the overhead associated with virtualization intermediary management layers that sit between the processor and the system image. More information about SR-IOV can be found, for example, in the Single Root I/O Virtualization and Sharing Specification Revision 1.1, published on January 20, 2010 by the Peripheral Component Interconnect Special Interest Group (PCI-SIG), which is incorporated herein by reference for all purposes.

[0048] Using such a multi-functional storage device, multiple VMs can communicate with a dedicated storage controller simultaneously without additional overhead: for example, no sideband communication channels are added between hosts, and the complexity of the associated hypervisor is not significantly increased. Therefore, using such a multifunctional device can be an economical and energy-efficient way to enable multiple remote hosts to connect to a single storage device simultaneously.

[0049] Multi-functional NVMeoF storage devices

[0050] The storage device can implement the NVMeoF host protocol over Ethernet and other network transports. Although the host interface can be NVMeoF, a standard Peripheral Component Interconnect Express (PCIe) SSD controller with virtual function (VF) support, rather than an NVMeoF storage device, can be used.

[0051] In some embodiments of the inventive concept, the SSD controller may expose a PCIe Physical Function (PF) and a set of Virtual Functions (VFs). External logic may be used to implement the network host interface and the logic for connecting to the VFs of the SSD. The external logic may be implemented in an FPGA or System-on-Chip (SoC) processor that may be part of the storage device. The external logic may perform NVMeoF-to-NVMe bridging. The external NVMeoF-to-NVMe bridge may map remote hosts to the PF and the set of VFs. From the hosts' perspective, multiple independent NVMe controllers may be exposed. A remote hypervisor may connect to the NVMe controller associated with the PF, while remote VMs may be mapped and connected to the NVMe controllers associated with the VFs. The external NVMeoF-to-NVMe bridge may serve as a lightweight implementation of a host root complex for the back-end SSD controller. The external NVMeoF-to-NVMe bridge may implement the SR-IOV protocol to configure and manage the VFs of the back-end SSD controller.
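The host-to-function mapping described in this paragraph can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the class name `FunctionMap`, the convention that function number 0 stands for the PF and 1..n for the VFs, and the first-free allocation policy are all hypothetical and are not taken from the patent text.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionMap:
    """Maps remote hosts to PCIe functions exposed by one SR-IOV SSD.

    Assumption for this sketch: function 0 stands for the PF and
    functions 1..num_vfs stand for the VFs.
    """
    num_vfs: int
    host_to_fn: dict = field(default_factory=dict)

    def map_hypervisor(self, host_id: str) -> int:
        # The hypervisor is mapped to the PF (management path).
        self.host_to_fn[host_id] = 0
        return 0

    def map_vm(self, host_id: str) -> int:
        # Reuse an existing mapping so a reconnecting VM keeps its VF.
        if host_id in self.host_to_fn:
            return self.host_to_fn[host_id]
        used = set(self.host_to_fn.values())
        for vf in range(1, self.num_vfs + 1):
            if vf not in used:
                self.host_to_fn[host_id] = vf
                return vf
        raise RuntimeError("no free VF for host " + host_id)

    def host_for(self, fn: int) -> str:
        # Reverse lookup, used when a response comes back from a function.
        for host, mapped in self.host_to_fn.items():
            if mapped == fn:
                return host
        raise KeyError(fn)
```

With two VFs, a hypervisor would land on function 0 and the first two VMs on functions 1 and 2; a third VM would be rejected in this sketch, although a real bridge might queue or refuse the connection differently.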

[0052] The hypervisor can perform various management and administrative functions on the NVMe controllers assigned to VMs. These management functions may include, but are not limited to: 1) establishing Quality-of-Service (QoS) policies; 2) providing resource allocation (such as bandwidth and capacity); 3) enabling and disabling storage functions; and 4) providing overall storage device management. The storage device can support such management functions on the PF of the SSD controller and can use the VF NVMe controllers for input/output (I/O) functions. Privacy and isolation can be achieved between the VF NVMe controllers of the storage device.

[0053] The FPGA can implement the physical layer, media access controller (MAC), and embedded network interface controller (eNIC) (sometimes referred to as embedded network interface circuitry) to provide network communication between VMs and the multi-functional NVMeoF storage device.

[0054] The storage device can utilize an existing NVMe controller that implements a Physical Function (PF) and a large number of Virtual Functions (VFs) to provide multiple independent channels or connections for NVMeoF communication. Even if only a single storage device physically exists and communicates with a remote host, the remote host can see n independent NVMeoF storage devices, where n is the number of VFs populated in the NVMe controller. The host hypervisor can set the Quality of Service (QoS) requirements for the VMs. The FPGA can provide isolation and fairness between VMs.

[0055] In some embodiments of the inventive concept, an NVMeoF-to-NVMe bridge, along with a configuration manager and root port (RP) module, provides an application-level interface to the NVMe controller, for example, via a PCIe link. The bridge can perform the necessary protocol translations. The bridge can terminate the NVMeoF protocol between the remote host and the FPGA, and generate the NVMe protocol between the FPGA and the SSD controller. The FPGA can emulate a lightweight root complex and the PCIe driver of the back-end NVMe controller. The bridge can generate one or more memory read/write transactions and can map them to the VF before forwarding addresses to the NVMe controller.

[0056] In some embodiments of the inventive concept, the configuration manager can perform the required PCIe initialization for different modules in the device. The configuration manager may consist of a state machine for initializing the RP and NVMe controller, and can provide status monitoring and performance metrics. The configuration manager can initialize and configure SR-IOV capabilities in the SSD controller, and can enable the use of VFs in the SSD controller. For each storage function, an independent admin submission queue and multiple I/O submission queues (SQs) can be created, maintaining compliance with at least some of the NVMeoF (NVMe over Fabrics) specification.

[0057] In various embodiments of the inventive concept, a given function (PF or VF) supporting remote connections from a VM may support a separate namespace. Alternatively, two or more functions may support a shared namespace. If multiple SSDs communicate, functions on different SSDs may also be able to share a namespace.

[0058] Multifunctional devices can implement multiple storage functions with NVMeoF support in a compact system. Any SSD with SR-IOV support can be used. Each storage device can serve multiple remote hosts simultaneously. Initializing VF requires no specific software or drivers. Initializing the storage device does not require additional sideband communication between hosts. Multifunctional devices are economical, energy-efficient, and scalable.

[0059] Storage devices may include network-connected SSDs using transport technologies such as Ethernet, Fibre Channel, etc. The NVMeoF protocol can be used as the host interface. The eNIC can be used to perform any network session management-related functions. Some examples of such functions are TCP session management, RDMA connection management, and combinations thereof.

[0060] Using the NVMeoF protocol, one or more host command submission queues can be implemented on the storage device side. Each host can be provided with a set of queues. The NVMeoF-to-NVMe bridge can terminate the NVMeoF protocol and translate it into the NVMe protocol of the back-end SSD controller. NVMe commands can be placed in the PCIe Submission Queue (SQ) for the SSD controller to execute. The SSD controller can place NVMe completion entries into the PCIe Completion Queue (CQ). The SSD controller has access to the SQ and CQ, as well as the write and read data buffers used in PCIe requests. The SSD controller can implement one or more PFs and a set of VFs. The PF performs SR-IOV management of the VFs. The PF and each VF can have an associated NVMe controller. The NVMe controller can then communicate with the Flash Translation Layer (FTL) for data persistence. The Flash Channel (FC) module performs flash media management and data persistence.
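The submission/completion queue handshake between the bridge and the SSD controller can be illustrated with a small Python sketch. This is a simplified model, not the patent's implementation: the names `Ring` and `controller_step`, the dictionary-based entries, and the status value 0 for success are assumptions made for illustration. The head/tail circular-buffer layout follows the general style of NVMe queues, where one slot is left unused to distinguish a full queue from an empty one.

```python
class Ring:
    """Circular queue with head/tail indexes, in the style of an NVMe SQ or CQ."""

    def __init__(self, size: int):
        if size < 2:
            raise ValueError("a ring needs at least two slots")
        self.entries = [None] * size
        self.head = 0  # next entry to consume
        self.tail = 0  # next free slot to fill

    def is_empty(self) -> bool:
        return self.head == self.tail

    def is_full(self) -> bool:
        # One slot stays unused so that full and empty are distinguishable.
        return (self.tail + 1) % len(self.entries) == self.head

    def push(self, entry) -> None:
        if self.is_full():
            raise RuntimeError("queue full")
        self.entries[self.tail] = entry
        self.tail = (self.tail + 1) % len(self.entries)

    def pop(self):
        if self.is_empty():
            raise RuntimeError("queue empty")
        entry = self.entries[self.head]
        self.head = (self.head + 1) % len(self.entries)
        return entry


def controller_step(sq: Ring, cq: Ring) -> None:
    """Stand-in for the SSD controller: take one command, post its completion."""
    cmd = sq.pop()
    cq.push({"cid": cmd["cid"], "status": 0})
```

In this model the bridge would call `sq.push(...)` for each translated NVMe command and later drain `cq` to build the corresponding NVMeoF completion.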

[0061] The storage device can implement the NVMeoF protocol to connect to a remote host. The storage device can use various network transport protocols, including but not limited to: Ethernet, Fibre Channel, TCP/IP, UDP, RDMA, InfiniBand, iWARP, etc. The storage device can allow multiple VMs and hypervisors on a remote server to access the storage device. That is, hypervisors and VMs on the server can have dedicated NVMeoF controllers to write data to and read data from the storage device. The storage device can implement a PCIe-based SSD controller including PCIe VFs to implement the NVMe controllers required by the hypervisors and VMs. Hypervisors and VMs may not be aware that the storage device uses a PF and a set of VFs to implement a set of NVMe controllers behind the NVMeoF-to-NVMe bridging function. While the following description provides an example of a storage device using the RDMA-based NVMeoF protocol and a single host, multiple hosts (e.g., hypervisors and/or VMs) and other network transport technologies can operate in a similar manner.

[0062] Therefore, even though the various NVMeoF controllers can be implemented using the FPGA, and the associated NVMe controllers and flash memory can all be housed in a single SSD, as described above, the storage device can still appear to a remote host as multiple independent NVMeoF controllers. In other words, the hypervisor and the VM may not be aware that the storage device uses a PF and a set of VFs to implement a set of NVMe controllers behind an NVMeoF-to-NVMe bridge.

[0063] In some embodiments of the inventive concept, the NVMeoF storage device can reliably exchange commands and data between a remote host and the NVMeoF controller using the RDMA protocol. For example, NVMe commands can be sent using the RDMA_SEND service (or a similar service) provided by embedded RDMA network interface card (eRNIC) logic. In an NVMeoF-based RDAS architecture, the submission queue (SQ) can be implemented on the storage device (SSD) side. When a command capsule or data packet is received at the eRNIC, the eRNIC can forward it, along with the number of the queue pair (QP) on which it was received, to the NVMeoF controller. The received command can be stored in the SQ of the corresponding host. Each QP can have a 1:1 mapping to a fabric-side submission queue. The first level of mapping can map the active host to its associated fabric-side submission queues. Then, one of the selected host's SQs can be chosen for command execution, and a command can be retrieved from the selected queue for further processing.
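The two-level selection described above (first a host, then one of that host's submission queues) can be sketched in Python as follows. The text does not specify an arbitration policy, so the round-robin choice between hosts and the first-non-empty choice among a host's queues are assumptions of this sketch, as are the names `FabricQueues`, `add_queue`, `receive`, and `next_command`.

```python
from collections import deque


class FabricQueues:
    """Fabric-side submission queues keyed by RDMA queue pair (QP) number."""

    def __init__(self):
        self.sq_by_qp = {}          # QP number -> deque of capsules (1:1 mapping)
        self.qps_by_host = {}       # host id -> list of that host's QP numbers
        self._host_order = deque()  # rotation order for round-robin (assumed policy)

    def add_queue(self, host, qp):
        self.sq_by_qp[qp] = deque()
        if host not in self.qps_by_host:
            self.qps_by_host[host] = []
            self._host_order.append(host)
        self.qps_by_host[host].append(qp)

    def receive(self, qp, capsule):
        # The eRNIC hands over the capsule together with the QP it arrived on.
        self.sq_by_qp[qp].append(capsule)

    def next_command(self):
        # Level 1: visit hosts in round-robin order.
        for _ in range(len(self._host_order)):
            host = self._host_order[0]
            self._host_order.rotate(-1)
            # Level 2: take the first non-empty SQ of that host.
            for qp in self.qps_by_host[host]:
                if self.sq_by_qp[qp]:
                    return host, qp, self.sq_by_qp[qp].popleft()
        return None  # nothing pending on any queue
```

For example, if host A has pending capsules on QPs 1 and 2 and host B has one on QP 3, successive calls would serve A, then B, then A again, so one busy host cannot starve another in this model.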

[0064] Command capsules can be parsed to check the type of command. There can be up to five types of NVMeoF command: Fabrics, Admin, Write, Read, and Special. The handling of these commands may differ depending on the FPGA and/or NVMe storage device. For each command selected for execution, a Fabrics command context can be created. The Fabrics command context can track parameters specific to that command, such as the submission queue (i.e., the QP number that identifies the submission queue), command ID, etc. Commands can also be assigned tags that allow access to the context throughout the command's lifetime. Tags can be simple indexes into the context RAM or unique values that can be further looked up. These tags can be recorded in the command context for the duration of the command's execution.

[0065] NVMeoF commands can be categorized into at least three distinct classes: 1) Fabrics commands, 2) Admin commands, and 3) I/O commands. Fabrics commands may involve reading and/or writing SSD NVMe controller registers that expose various features and capabilities to a remote host. Some advanced Fabrics commands may involve firmware support, for example to implement commands related to authentication and security. Admin commands perform SSD NVMe controller register read/write operations as well as data buffer transfers.
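The option of using tags as simple indexes into a context store can be sketched in Python. This is a minimal illustration, assuming a fixed number of context slots and a free list; the names `CommandContext`, `ContextTable`, and the fields shown are hypothetical, chosen to mirror the parameters named in the paragraph (QP number and command ID).

```python
from dataclasses import dataclass


@dataclass
class CommandContext:
    qp_number: int   # identifies the fabric-side submission queue
    command_id: int  # command ID taken from the capsule
    opcode: str      # for example "Write" or "Read"


class ContextTable:
    """Fixed-size context store in which a tag is simply a slot index."""

    def __init__(self, slots: int):
        self._slots = [None] * slots
        # Free list ordered so that pop() hands out slot 0 first.
        self._free = list(range(slots - 1, -1, -1))

    def allocate(self, ctx: CommandContext) -> int:
        if not self._free:
            raise RuntimeError("no free command context")
        tag = self._free.pop()
        self._slots[tag] = ctx
        return tag

    def lookup(self, tag: int) -> CommandContext:
        ctx = self._slots[tag]
        if ctx is None:
            raise KeyError(tag)
        return ctx

    def release(self, tag: int) -> None:
        # Called when the command completes; the tag can then be reused.
        self.lookup(tag)
        self._slots[tag] = None
        self._free.append(tag)
```

Because the tag is an index, a completion arriving from the SSD can reach its context in constant time; the alternative mentioned in the text, unique values that are looked up, would trade that for a larger tag space.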

[0066] In one embodiment of the inventive concept, the NVMeoF write I / O command can first be broken down into subcommands with appropriate data transfer lengths. For each write subcommand, the write data can be retrieved from the remote host to the local write data buffer (WDB) using an RDMA read service. Once the write data is retrieved, the NVMe write command can be submitted to the NVMe controller within the SSD controller. This process can be repeated until the entire NVMeoF write command is complete. The number of NVMe write subcommands used can be maintained in the FPGA's command context. The FPGA's command context can be updated when the NVMe write subcommand is completed by the NVMe storage device. Once all write subcommands have successfully completed, the FPGA can generate an NVMeoF completion entry for the remote host. The FPGA can then use an RDMA send (RDMA-SEND) operation to transmit the completion entry to the host CQ.
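The subcommand splitting and completion counting described for write commands can be sketched in Python. This is an illustration under assumptions: the text does not give a maximum transfer length or an addressing unit, so the block-based `split_transfer` helper, its `max_blocks` parameter, and the `WriteContext` class (including its handling of a failed subcommand) are hypothetical.

```python
def split_transfer(start_lba: int, num_blocks: int, max_blocks: int):
    """Split one transfer into (start_lba, block_count) subcommands."""
    if num_blocks <= 0 or max_blocks <= 0:
        raise ValueError("block counts must be positive")
    subcommands = []
    offset = 0
    while offset < num_blocks:
        count = min(max_blocks, num_blocks - offset)
        subcommands.append((start_lba + offset, count))
        offset += count
    return subcommands


class WriteContext:
    """Tracks outstanding NVMe write subcommands for one NVMeoF write."""

    def __init__(self, subcommands):
        self.outstanding = len(subcommands)
        self.failed = False  # assumption: remember if any subcommand failed

    def on_subcommand_done(self, ok: bool) -> bool:
        # Returns True once every subcommand has reported back, which is
        # the point at which the NVMeoF completion entry can be generated.
        self.outstanding -= 1
        self.failed = self.failed or not ok
        return self.outstanding == 0
```

For a 10-block write starting at LBA 100 with a 4-block limit, this yields three subcommands of 4, 4, and 2 blocks, and the completion entry is produced only after the third one finishes.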

[0067] In another embodiment of the inventive concept, the entire write data corresponding to the NVMeoF write command can be acquired, and then a single NVMe write command can be issued to the NVMe controller within the SSD controller.

[0068] NVMeoF read commands can follow a flow similar to that of NVMeoF write commands. In one embodiment of the inventive concept, a read command can be broken down into appropriately sized NVMe read subcommands. The number of such read subcommands can be maintained within the FPGA's command context. As the SSD controller stores the read data into the local read data buffer (RDB), the NVMe controller within the SSD controller can place the NVMe completion entry for that subcommand into the appropriate VF's FPGA completion queue. Once a read subcommand is complete, the read data block can be transferred to the remote host using the RDMA WRITE service. Once all NVMe read subcommands have completed successfully and all read data has been transferred to the remote host, the NVMeoF read command can be prepared for completion. The FPGA's command context can be used to create the NVMeoF completion entry, which can be placed into the appropriate CQ located in the remote host's memory.

[0069] In another embodiment of the inventive concept, the NVMeoF read command is executed as a single NVMe read command, rather than being further divided into subcommands as described in the previous embodiments of the inventive concept.

[0070] Various data structures can be used to track the execution of NVMeoF commands and bridged NVMe commands, maintaining consistency as these commands are processed. For example, tables or other data structures can be used to track the status of individual commands, queue pairs to which individual commands are submitted, tags (if any) associated with individual commands, which commands are submitted to the NVMe controller within the SSD controller, and so on.

[0071] Since multiple remote hosts can be active at any given time, multiple concurrent NVMeoF and NVMe commands can be executed in parallel.

[0072] A hypervisor on a remote host can send management commands, such as commands setting QoS requirements and/or resource allocation parameters for a VM host. The hypervisor can be mapped to a PF NVMe controller within the SSD controller, and a VM host can be mapped to a VF NVMe controller within the SSD controller. QoS and resource allocation parameters sent from the hypervisor can be forwarded to the PF NVMe controller within the SSD controller, which can serve as a management NVMe controller.

[0073] Embedded multi-connection / multi-host NIC

[0074] The eNIC implements the network protocols and the physical and data link layers for network communication between the network-attached SSD and multiple hosts. The eNIC can also provide control communication channels between the SSD and the hypervisors and VMs.

[0075] Various wired, optical, or wireless network topologies can be used to create links to remote hosts. The NVMeoF transport layer can be implemented independently on top of the network layer. According to the NVMeoF specification, the transport layer can encapsulate NVMe commands, responses, and data transfers. As with the network layer, different flavors of NVMeoF transport layer can be implemented (such as RDMA, Transmission Control Protocol (TCP), RDMA over Converged Ethernet (RoCE), RoCE v2, RDMA over InfiniBand, Internet Wide Area RDMA Protocol (iWARP), SCSI RDMA, NVMe over TCP, and NVMe over Fibre Channel).

[0076] Network protocols can support both control path and data path functions. Control path functions are used for various control functions (such as discovery, identification, configuration, connection establishment, and other network connection management functions). Data path functions can include transmitting data messages in either direction, flow control, error detection and repair, etc.

[0077] The NVMe protocol uses network protocols as a transport mechanism to send NVMe commands to the storage device and transfer data between the host and the device. For this purpose, a single host can use a set of network connections (i.e., one or more network connections). Therefore, the storage device can support multiple sets of connections, one set per remote host.

[0078] The storage device can establish simultaneous connections to multiple hosts and perform NVMeoF communication with each host independently. After one or more connections are established, a lookup table containing host IDs (such as a Content Addressable Memory (CAM) table) can be used to determine and differentiate channel indices based on the network topology and transport layer, and to forward control or data packets to the appropriate channel-based interface.
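As a rough software analogy for the host ID lookup table, the sketch below uses a dictionary as a stand-in for a CAM (exact-match lookup from host ID to channel index). The class `ChannelTable`, its methods, and the use of strings as host IDs are assumptions made for illustration; the text does not specify the table layout or the allocation policy.

```python
from typing import Dict, List


class ChannelTable:
    """Software stand-in for a CAM: exact-match lookup from host ID to channel."""

    def __init__(self, num_channels: int) -> None:
        self._by_host: Dict[str, int] = {}
        # Free channel indices, ordered so that the lowest index is used first.
        self._free: List[int] = list(range(num_channels - 1, -1, -1))

    def connect(self, host_id: str) -> int:
        """Assign a channel on first connection; reuse it for later connections."""
        if host_id in self._by_host:
            return self._by_host[host_id]
        if not self._free:
            raise RuntimeError("all channels are in use")
        channel = self._free.pop()
        self._by_host[host_id] = channel
        return channel

    def channel_for(self, host_id: str) -> int:
        """Look up the channel for a packet from host_id (KeyError if unknown)."""
        return self._by_host[host_id]

    def disconnect(self, host_id: str) -> None:
        """Release the channel held by host_id so another host can use it."""
        self._free.append(self._by_host.pop(host_id))
```

Because a host may hold a set of connections, `connect` returns the same channel for every connection from the same host, so all of that host's traffic lands on one channel-based interface.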

[0079] NVMeoF Command Queue

[0080] The storage device can implement a queuing mechanism to buffer and arbitrate NVMeoF commands, responses, and data from multiple hosts.

[0081] Each host can encapsulate a Submission Queue Entry (SQE) and optional data or a Scatter Gather List (SGL) into an NVMeoF command capsule that can be sent to the storage device. In the FPGA, the command capsule can first be extracted from the fabric packet and stored in a channel-based NVMeoF command queue. If the command capsule includes data or an SGL, the data can be retrieved and stored in the WDB.

[0082] Each channel-based command queue can store command capsules from a specific host and can be independent of the other queues. The SQE command identifier field can be unique among the commands in each channel-based queue. A selection scheme can be implemented to select a command entry from the queues and deliver it to the NVMeoF-to-NVMe bridge. If the hypervisor has not specified a QoS policy for selecting the next command entry, the next command can be chosen by round-robin selection among the available queues.
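The default round-robin selection among per-host command queues can be sketched as follows. This is one plausible reading, assuming that empty queues are skipped and that the search resumes after the channel served last; the class name `RoundRobinSelector` and the use of strings as command capsules are illustrative only.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class RoundRobinSelector:
    """Selects the next command capsule from per-channel queues in turn."""

    def __init__(self, num_channels: int) -> None:
        self.queues: List[Deque[str]] = [deque() for _ in range(num_channels)]
        self._next = 0  # channel at which the next search starts

    def enqueue(self, channel: int, capsule: str) -> None:
        self.queues[channel].append(capsule)

    def select(self) -> Optional[Tuple[int, str]]:
        """Return (channel, capsule) for the next non-empty queue, or None."""
        n = len(self.queues)
        for i in range(n):
            ch = (self._next + i) % n
            if self.queues[ch]:
                self._next = (ch + 1) % n  # resume after the served channel
                return ch, self.queues[ch].popleft()
        return None
```

With two capsules queued on channel 0 and one on channel 2, successive calls serve channel 0, then channel 2, then channel 0 again, so a busy host cannot starve the others.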

[0083] A similar set of queues can be implemented for response capsules, which contain completion queue entries (CQEs) and optional data received from the NVMeoF-to-NVMe bridge. A channel can be selected from the channels with pending responses, a response entry can be selected, and a response can be generated from the selected entry and sent to the eNIC module. If the CQE specifies a data block or SGL, the data can be fetched from the RDB and transmitted to the eNIC.

[0084] The hypervisor can define the QoS criteria and the size of each queue. Queue sizes can be the same or different for each channel. Depending on the queue size, the queue memory can be allocated in the FPGA, in on-board DRAM, or in both.

[0085] Data buffer

[0086] The data buffer can be physical on-chip memory or an off-chip DDR region dedicated to staging in-flight data. The data buffer can be protected with error-correcting codes (ECC) if needed. Each VM can have a different memory allocation type (on-chip or off-chip); which type is used for a given VM can be determined by the VM's latency requirements. The data buffer has two parts: the write data buffer (WDB) and the read data buffer (RDB).

[0087] Write data buffer (WDB):

[0088] Ingress data received from the host at the FPGA can be temporarily stored in the WDB until it is ready to be forwarded to the SSD. Typically, for a given queue, the NVMeoF-to-NVMe bridge can generate an NVMe transaction to the SSD when the Maximum Transfer Unit (MTU) is reached or when all data for the transaction has been received from the host. The data in the WDB can then be fetched and moved to the SSD accordingly. In some embodiments of the inventive concept, the WDB can be a shared memory, i.e., a buffer pool shared among all SQs and hosts. In other embodiments of the inventive concept, the WDB can be a dedicated memory space for each individual host. The memory space allocated to each host can be determined by QoS settings set by the hypervisor.
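The forwarding condition for staged write data (threshold reached, or all data received) can be sketched as below. The sketch treats the MTU simply as a byte threshold, which is an interpretation rather than something the text states; the class `WriteStaging` and its parameters are hypothetical, and only byte counts are tracked, not the data itself.

```python
from typing import List


class WriteStaging:
    """Decides when staged write data for one transaction can go to the SSD."""

    def __init__(self, total_len: int, threshold: int) -> None:
        if total_len <= 0 or threshold <= 0:
            raise ValueError("total_len and threshold must be positive")
        self.total_len = total_len
        self.threshold = threshold  # assumed stand-in for the MTU, in bytes
        self.received = 0           # bytes received from the host so far
        self.buffered = 0           # bytes staged but not yet forwarded

    def receive(self, nbytes: int) -> List[int]:
        """Stage nbytes; return the lengths of the transfers that are now ready."""
        if nbytes < 0 or self.received + nbytes > self.total_len:
            raise ValueError("invalid byte count")
        self.received += nbytes
        self.buffered += nbytes
        ready = []
        # Forward full threshold-sized units as soon as they are available.
        while self.buffered >= self.threshold:
            ready.append(self.threshold)
            self.buffered -= self.threshold
        # Forward the remainder once the host has sent everything.
        if self.received == self.total_len and self.buffered > 0:
            ready.append(self.buffered)
            self.buffered = 0
        return ready
```

For a 10-byte transaction with a 4-byte threshold, arrivals of 3, 3, and 4 bytes produce no transfer, one 4-byte transfer, and then a 4-byte transfer followed by the final 2-byte transfer.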

[0089] Read data buffer (RDB):

[0090] Egress data from the SSD to the host can be temporarily stored in the RDB until it is ready to be sent to the host. Typically, for a given queue, the NVMeoF-to-NVMe bridge can generate an RDMA transaction and move the data to the host when the MTU is reached or when all data for the transaction has been received from the SSD. In some embodiments of the inventive concept, the RDB can be a shared memory, i.e., a buffer pool shared among all SQs and hosts. In other embodiments of the inventive concept, the RDB can be a dedicated memory space for each individual host. The memory space allocated to each host can be determined by QoS settings set by the hypervisor.

[0091] NVMeoF to NVMe bridging

[0092] In embodiments of the inventive concept, multiple VMs on a remote server can access storage devices using the NVMeoF protocol. The FPGA can internally bridge the NVMeoF protocol to the NVMe protocol supported by the NVMe SSD controller. That is, the external logic can terminate the NVMeoF protocol from the remote host and generate the NVMe protocol for the NVMe controller within the SSD controller. Because the NVMeoF and NVMe protocols differ in some respects, the bridging function can perform the conversion between them. Some example differences that can be handled by the NVMeoF-to-NVMe bridge are: the NVMe create and delete I/O SQ commands versus the Connect command in NVMeoF; differences in data structures; commands that may only exist or be useful in NVMeoF (such as Connect and Keep Alive); differences in the memory model, which uses keyed SGLs; differences in NVMe register access; and authentication commands that exist in NVMeoF but not in NVMe.

[0093] When a remote host sends an NVMeoF command to the storage device, the equivalent actions and commands can be issued to the NVMe controller within the SSD controller. In other words, the bridging logic makes the existence of the NVMe controller within the SSD controller transparent to the remote host. At the same time, the bridging function makes the existence of the remote host transparent to the NVMe controller within the SSD controller.

[0094] The bridging function creates an equal number of command submission queues in the NVMe domain for the remote hosts. It fetches NVMeoF write command data into the WDB and then issues NVMe write commands to the appropriate VF and NVMe controllers. Similarly, for NVMeoF read commands, the bridging function first fetches the data into the RDB and then returns it to the remote host. Data read and write operations between the remote NVMeoF host and the NVMe controller within the SSD controller can be performed using cut-through and/or store-and-forward methods.

[0095] At any given time, multiple NVMeoF commands from multiple remote hosts can be active or executing. The bridging function can track these commands using unique tags and ensure that the appropriate privacy, security, and QoS policies are applied during command execution.

[0096] NVMe host emulation

[0097] The NVMeoF-to-NVMe bridge essentially functions as a lightweight host for the NVMe SSD controller, which can be a standard NVMe SSD controller using PCIe as the transport link. The bridge implements a lightweight PCIe root complex and an NVMe host interface to the SSD controller. The PCIe root complex performs PCIe configuration and PCIe bus enumeration of the PCIe endpoints present in the SSD controller. The bridge can configure the appropriate PCIe capabilities and features. The configuration manager within the FPGA performs all discovery and configuration functions. The configuration manager can also configure the PCIe SR-IOV capabilities to enable the desired number of VFs and configure the VFs.

[0098] The NVMeoF-to-NVMe bridge sends commands to the SSD controller via the command queuing mechanism defined by NVMe. The command processing module of the bridge creates commands and provides the resulting submission queue entries (SQEs), along with SQ identifiers (such as QP numbers), to the PCIe Submission Queue (PSQ) module. The PSQ module arbitrates among command requesters to determine which requester is served next and places commands at the appropriate queue position. The PSQ module maintains the data structures needed for these command queues. Each SQ has a head pointer and a tail pointer. The SQ tail pointer is maintained by the PSQ module to add new SQEs to the SQ. The SQ head pointer is maintained by the SSD controller to read the next SQE from that SQ. When there are pending commands in a submission queue, the NVMe-defined doorbell mechanism can be used to trigger the backend SSD controller. This trigger, called the doorbell, can be a PCIe memory write operation that updates the submission queue tail pointer in the SSD controller. The PSQ module performs flow control on the submission queues and maintains the full and empty states of all submission queues. The SSD controller supports SQ-level flow control by periodically sending SQ head pointer values to the PSQ module in Completion Entries (CEs). The storage device can implement a set of PSQs to match the number of PFs and VFs enabled on the SSD controller. In other words, the number of PSQ sets can match the number of supported remote hosts in a 1:1 mapping.
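The head/tail bookkeeping described above can be modeled with a small circular queue. This is a simplified software sketch, not the PSQ module itself: the class `SubmissionQueue` is a hypothetical name, the doorbell is represented by a list of written tail values instead of a PCIe memory write, and the full condition follows the usual circular-queue convention of keeping one slot unused.

```python
from typing import List, Optional


class SubmissionQueue:
    """Circular SQ: the producer owns the tail, the consumer reports the head."""

    def __init__(self, depth: int) -> None:
        if depth < 2:
            raise ValueError("depth must be at least 2")
        self.depth = depth
        self.entries: List[Optional[str]] = [None] * depth
        self.tail = 0  # next free slot, advanced by the producer (PSQ side)
        self.head = 0  # last head value reported by the SSD controller
        self.doorbell_writes: List[int] = []  # stand-in for PCIe memory writes

    def is_empty(self) -> bool:
        return self.head == self.tail

    def is_full(self) -> bool:
        # One slot stays unused so that full and empty can be told apart.
        return (self.tail + 1) % self.depth == self.head

    def submit(self, sqe: str) -> None:
        """Place an SQE at the tail and ring the doorbell with the new tail."""
        if self.is_full():
            raise BufferError("submission queue is full")
        self.entries[self.tail] = sqe
        self.tail = (self.tail + 1) % self.depth
        self.doorbell_writes.append(self.tail)

    def update_head(self, sq_head: int) -> None:
        """Apply the SQ head pointer carried in a completion entry."""
        if not 0 <= sq_head < self.depth:
            raise ValueError("head pointer out of range")
        self.head = sq_head
```

In this model, a queue of depth 4 accepts three entries before reporting full, and accepts more only after a completion entry reports that the head has advanced.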

[0099] The PCIe Completion Queue (PCQ) module implements the command completion queues. When the data transfer for a command is complete, the SSD controller can post a completion entry (CE) for that command. The CE can be placed in the completion queue (CQ) corresponding to the submission queue (SQ) to which the command belongs. Each CQ can have a head pointer and a tail pointer for flow control. The tail pointer can be maintained by the SSD controller to write new CEs. The head pointer can be maintained by the PCQ module to read posted CEs. Received CEs can be removed from the CQ and then parsed to extract command and status information. The parsed CEs can also provide SQ flow control in the form of an SQ head pointer, which can be sent to the appropriate PSQ module. Finally, the CQ can use a CQ head doorbell mechanism: when a CE is removed from the CQ, the updated CQ head pointer can be sent to the SSD controller in the form of a CQ doorbell. The storage device can implement a set of PCQs to match the number of PFs and VFs enabled on the SSD controller. In other words, the number of PCQ sets can match the number of supported remote hosts in a 1:1 mapping.

[0100] Quality of service

[0101] QoS definitions are sets of profiles, policies, and guidelines used to prioritize resource allocation and to limit the traffic of individual VMs accessing the NVMeoF storage device. The host hypervisor is responsible for assigning QoS policies and communicating with the storage device. The FPGA, for its part, provides isolation and fairness across all VMs based on the defined QoS profile, ensuring that the entire bandwidth is not consumed by a single channel or group of channels. The FPGA can also measure key metrics (such as bandwidth and buffer levels) and provide them to the hypervisor for monitoring and fine-tuning of policies.

[0102] QoS profiles can include, but are not limited to, strict priority arbitration, round-robin arbitration, weighted round-robin arbitration, or time division multiple access (TDMA). If no policy is defined by the hypervisor, the FPGA can apply an unbiased traffic scheduling mechanism (such as round-robin scheduling). As described throughout this document, QoS policies can be defined and implemented in various design components, for example in the eNIC, the NVMeoF command queues, the NVMeoF-to-NVMe bridge, or the root complex module. Policies can also be forwarded to the NVMe controller associated with the PF, which acts as a management module.
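Of the arbitration schemes listed, weighted round-robin is easy to illustrate. The sketch below assumes one common form of the scheme, in which channel i may send up to `weights[i]` entries per round; the function name and the per-round interpretation of the weights are assumptions, not details given in the text.

```python
from collections import deque
from typing import Deque, Iterator, List, Tuple


def weighted_round_robin(
    queues: List[Deque[str]], weights: List[int]
) -> Iterator[Tuple[int, str]]:
    """Drain the queues; in each round, channel i yields up to weights[i] items."""
    if len(queues) != len(weights):
        raise ValueError("one weight is required per queue")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    # An empty deque is falsy, so the loop ends once every queue is drained.
    while any(queues):
        for ch, q in enumerate(queues):
            for _ in range(weights[ch]):
                if not q:
                    break
                yield ch, q.popleft()
```

With weights `[2, 1]`, a channel holding four entries and a channel holding two are served in the order 0, 0, 1, 0, 0, 1, so the first channel receives twice the service of the second while both have work pending.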

[0103] Since each network connection can be managed per VM, privacy for each channel can be maintained within the FPGA. The SR-IOV device is likewise aware of each physical host as being connected to a separate VF. Implementing security measures in the system and method is therefore feasible. Privacy and multi-tenancy policies from the hypervisor can be received by the FPGA and then forwarded to the NVMe controller associated with the PF.

[0104] The FPGA can provide an independent link and expose a set of registers to the hypervisor for setting network information for VMs, specifying QoS and privacy policies, and monitoring status, errors, and other performance measurements.

[0105] PCIe root complex

[0106] This system and method enable a lightweight PCIe root complex that interfaces with a backend SSD controller. External logic (such as an FPGA) can emulate an NVMe host to the SSD controller and provide the NVMeoF-to-NVMe bridging functionality.

[0107] The root complex provides the necessary configuration, PCIe enumeration, and initialization of the SSD controller's PCIe endpoints (EPs). The root complex generates PCIe configuration transactions to the EPs to configure the PCIe configuration space and various capabilities. The root complex generates, receives, analyzes, and completes PCIe memory space transaction requests issued by the SSD controller. The root complex then forwards memory read/write transactions generated by the SSD controller to the NVMeoF-to-NVMe bridge.

[0108] The Configuration Manager can perform PCIe bus enumeration for EPs in the SSD controller. The Configuration Manager can initialize and configure EPs in the SSD controller. The Configuration Manager can set various PCIe capabilities in the SSD controller, including but not limited to: commands, base address registers (BAR), PCIe, SR-IOV, ARI, device control, link control, power management, and interrupts. The Configuration Manager can enable SR-IOV and configure multiple VFs in the controller. When all configurations are complete, the Configuration Manager can enable PCIe transactions.

[0109] The PCIe standard provides the SR-IOV specification, which allows a single physical function to be shared across multiple VMs. SR-IOV can be used in environments where multiple NVMe controllers reside within a single PCIe SSD device. However, SR-IOV functionality can introduce complexity and performance overhead within the VM. To address this, a PF-based NVMe controller is desirable, which can be assigned a separate channel in the NVMeoF-to-NVMe bridge. The root complex in this architecture manages the VFs and provides the VF-to-PF translation before forwarding requests to the NVMeoF-to-NVMe bridge. Any request from a VM to access its dedicated physical storage device can similarly be translated and mapped to the appropriate VF before being forwarded to the SSD controller via the PCIe link.

[0110] The root complex is responsible for creating, removing, configuring, and managing Virtual Functions (VFs). The root complex controls both the Physical Function (PF) and the VFs, handles all events (such as errors), monitors all states, maintains QoS, and resets functions when necessary. Overall state and settings (such as QoS management) are available to the hypervisor for monitoring or modification as needed.

[0111] Figure 1 shows a machine including a multi-function storage device capable of supporting multiple hosts, according to an embodiment of the inventive concept. In Figure 1, machine 105 is shown. Machine 105 may include processor 110. Processor 110 may be any type of processor. (For ease of illustration, processor 110 and the other components discussed below are shown outside machine 105; embodiments of the inventive concept may include these components within machine 105.) Although Figure 1 shows a single processor 110 in machine 105, machine 105 may include any number of processors, each of which may be a single-core or multi-core processor, each of which may implement a reduced instruction set computer (RISC) architecture or a complex instruction set computer (CISC) architecture (among other possibilities), and which may be mixed in any desired combination.

[0112] Machine 105 may also include memory 115. Memory 115 may be any type of memory (such as flash memory, dynamic random access memory (DRAM), static random access memory (SRAM), persistent random access memory, ferroelectric random access memory (FRAM), or non-volatile random access memory (NVRAM) (such as magnetoresistive random access memory (MRAM))). Memory 115 may also be any desired combination of different memory types. Machine 105 may also include a memory controller 120 for managing access to memory 115.

[0113] Machine 105 may also include a storage device, such as multi-function storage device 125. Multi-function storage device 125 can be used to store data. Processor 110 may run a device driver 130 that supports access to multi-function storage device 125. While embodiments of the inventive concept may focus on a solid-state drive (SSD) as part of multi-function storage device 125, any desired storage device operating on any desired storage principle may be used. Therefore, multi-function storage device 125 may include block-based SSDs, key-value SSDs (KV-SSDs), hard disk drives, or any other desired storage device. Although Figure 1 shows only one multi-function storage device 125, embodiments of the inventive concept can support any number of storage devices installed in machine 105, and for any particular element those storage devices may differ from one another or be similar or identical to one another.

[0114] Key-value stores use keys instead of logical block addresses (LBAs) to identify data. Unlike block-based stores, where data is expected to be written to and read in units of a specific, predefined size (e.g., pages or blocks), objects can be of any size. The size of an object stored on a key-value store can be limited by the capacity of the key-value store. Therefore, an object can be smaller than a page or block, or larger than a page or block. (While the size of a page or block still controls how an object is stored, how the storage device is managed is separate from how the object can be written to or read from.)

[0115] Similarly, while block-based storage devices are expected to have LBAs that fit a specific range of values (and therefore use a specific predefined number of bits in the LBA), keys can be of any size and can take any desired value. Because the number of bits in the key can vary, key-value stores are more flexible than block-based storage devices. However, other considerations exist. For example, although the LBAs used by different applications may be unique, nothing prevents different applications from attempting to write data using the same key. In this case, the key-value store can return an error to the second application, informing it that the key has already been used and therefore the value cannot be written to the key-value store.
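The key-value behavior described in the two preceding paragraphs (arbitrary-size keys and values, object size bounded by capacity, and an error on a duplicate key) can be illustrated with a toy in-memory model. The class and exception names are hypothetical, and for simplicity the sketch counts only value bytes, not key bytes, against the capacity.

```python
from typing import Dict


class KeyExistsError(Exception):
    """Raised when a write uses a key that is already present."""


class CapacityError(Exception):
    """Raised when a value does not fit in the remaining capacity."""


class KeyValueStore:
    """Toy model: keys and values are byte strings of any length."""

    def __init__(self, capacity_bytes: int) -> None:
        self._data: Dict[bytes, bytes] = {}
        self._capacity = capacity_bytes
        self._used = 0  # value bytes stored so far (keys are not counted)

    def put(self, key: bytes, value: bytes) -> None:
        """Store value under key; refuse duplicate keys and oversized values."""
        if key in self._data:
            raise KeyExistsError(key)
        if self._used + len(value) > self._capacity:
            raise CapacityError(f"{len(value)} bytes do not fit")
        self._data[key] = value
        self._used += len(value)

    def get(self, key: bytes) -> bytes:
        """Return the value stored under key (KeyError if absent)."""
        return self._data[key]
```

Here a second application that calls `put` with a key already written by another application receives `KeyExistsError`, which corresponds to the error return described above.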

[0116] Machine 105 is shown connected to network 135. Network 135 can be any kind of network (such as a local area network (LAN), wide area network (WAN), metropolitan area network (MAN), global network (such as the Internet), or any combination thereof). Furthermore, network 135 may include any type of connection, including wired and wireless connections (or combinations thereof). Network 135 may have other devices connected to it (as discussed below with reference to Figure 3).

[0117] In addition, Figure 1 shows multi-function storage device 125 connected to network 135. By connecting directly to network 135, multi-function storage device 125 can bypass the internal components of machine 105. This arrangement can result in faster communication with multi-function storage device 125 from a remote host. Furthermore, with sufficient intelligence built into multi-function storage device 125 (as discussed below), components such as processor 110, memory 115, and memory controller 120 can be reduced in functionality or eliminated entirely. Embodiments of the inventive concept are intended to cover all such variations, whether machine 105 includes processor 110, memory 115, and memory controller 120 or omits those components.

[0118] Although Figure 1 depicts machine 105 as a server (which may be a standalone server or a rack server), embodiments of the inventive concept may include any desired type of machine 105 without limitation. For example, machine 105 may be replaced by a desktop computer, a laptop computer, or any other machine that may benefit from embodiments of the inventive concept. Machine 105 may also include dedicated portable computing machines, tablet computers, smartphones, and other computing machines. Furthermore, applications that access data from multi-function storage device 125 may reside in another machine, which is separate from machine 105 and accesses machine 105 via a network connection traversing one or more networks of any type (wired, wireless, global, etc.). Machine 105 may be a conventional server (such as one that supports virtual machines) or, as discussed below with reference to Figure 3, a storage server (in which processor 110, memory 115, memory controller 120, and/or device driver 130 may be omitted).

[0119] Figure 2 shows additional details of the machine of Figure 1, according to an embodiment of the inventive concept. In Figure 2, machine 105 typically includes one or more processors 110, which may include a memory controller 120 and a clock 205 for coordinating the operation of the components of machine 105. Processor 110 may also be coupled to memory 115, which may include, for example, random access memory (RAM), read-only memory (ROM), or other state-preserving media. Processor 110 may also be coupled to multi-function storage device 125 and to a network connector 210, which may be, for example, an Ethernet connector or a wireless connector. Processor 110 may also be connected to bus 215, to which other components, such as user interface 220 and input/output interface ports that can be managed using input/output engine 225, may be attached.

[0120] Figure 3 shows remote hosts communicating with the machine of Figure 1, according to an embodiment of the inventive concept. In Figure 3, remote hosts 305-1, 305-2, and 305-3 are shown. Although Figure 3 shows three remote hosts, embodiments of the inventive concept may include any number (zero or more) of remote hosts connected to network 135 at any given time. Furthermore, as discussed further below with reference to Figures 4A to 4B, a remote host can be an independent computer system or a virtual machine (VM) running on a computer system and managed by a hypervisor (the hypervisor itself can also be considered a remote host).

[0121] Remote hosts 305-1, 305-2, and 305-3 can be connected to network 135, which in turn can be connected to machine 105. More specifically, Ethernet switch 310 can connect machine 105 to network 135. Ethernet SSDs (eSSDs) 125-1, 125-2, and 125-3 (e.g., eSSD 1, eSSD 2, and eSSD n, where n is an integer greater than 1) can be connected to Ethernet switch 310, thereby enabling eSSDs 125-1, 125-2, and 125-3 to connect to network 135 (and to remote hosts 305-1, 305-2, and 305-3). Although Figure 3 shows three eSSDs 125-1, 125-2, and 125-3, embodiments of the inventive concept may include any number of eSSDs (one or more). Furthermore, eSSDs 125-1, 125-2, and 125-3 may be replaced by other network-enabled storage devices without loss of generality.

[0122] In addition to Ethernet switch 310 and eSSDs 125-1, 125-2, and 125-3, machine 105 may also include a baseboard management controller (BMC) 315, a Peripheral Component Interconnect Express (PCIe) switch 320, and a midplane 325. BMC 315 can be connected to Ethernet switch 310 and, in turn, to PCIe switch 320. Through midplane 325, PCIe switch 320 can connect to eSSDs 125-1, 125-2, and 125-3 using a communication path separate from the one that connects eSSDs 125-1, 125-2, and 125-3 to Ethernet switch 310. In practice, the existence of two connections provides several alternatives for different embodiments of the inventive concept. In some embodiments of the inventive concept, the direct communication path from Ethernet switch 310 to eSSDs 125-1, 125-2, and 125-3 can be used as a data path, and the communication path from Ethernet switch 310 to eSSDs 125-1, 125-2, and 125-3 via BMC 315, PCIe switch 320, and midplane 325 can be used as a control path (i.e., the communication path via BMC 315 can be used to send information related to the management of eSSDs 125-1, 125-2, and 125-3, rather than requests to write data to or read data from eSSDs 125-1, 125-2, and 125-3). In other embodiments of the inventive concept, if the direct communication path from Ethernet switch 310 to eSSDs 125-1, 125-2, and 125-3 fails for some reason, the communication path from Ethernet switch 310 to eSSDs 125-1, 125-2, and 125-3 via BMC 315, PCIe switch 320, and midplane 325 can be used as a backup. In yet other embodiments of the inventive concept, the two communication paths can be used in parallel for any purpose, thereby potentially accelerating communication between remote hosts 305-1, 305-2, and 305-3 and eSSDs 125-1, 125-2, and 125-3.

[0123] Figures 4A to 4B compare how remote host 305-1 of Figure 3 views multi-function storage device 125 of Figure 1 with how multi-function storage device 125 of Figure 1 is actually implemented, according to embodiments of the inventive concept. In Figure 4A, remote host 305-1 is shown as including hypervisor 405 and VMs 410-1, 410-2, and 410-3. Although Figure 4A shows remote host 305-1 as including three VMs (e.g., VM 1, VM 2, and VM m, where m is an integer greater than 1), embodiments of the inventive concept can support remote hosts with any number of VMs, and different remote hosts can have different numbers of VMs. Furthermore, remote host 305-1 can support multiple hypervisors, each managing a subset of the VMs running on remote host 305-1.

[0124] From the perspective of remote host 305-1 (more specifically, from the perspective of hypervisor 405 and VMs 410-1, 410-2, and 410-3), multi-function storage device 125, which may include chassis 415 and its internal components, appears to include Non-Volatile Memory Express over Fabrics (NVMeoF) SSDs 420-1 and 420-2 (e.g., NVMeoF device 1 and NVMeoF device k, where k is an integer greater than 1). While Figure 4A shows only two NVMeoF SSDs 420-1 and 420-2, chassis 415 can appear to include any number (one or more) of NVMeoF SSDs (the exact number can vary depending on the number of Physical Functions (PFs) and Virtual Functions (VFs) exposed by the storage devices within chassis 415, and on how many of those functions are enabled or disabled). Typically, the number of NVMeoF SSDs "seen" by remote host 305-1 is at least as large as (and typically several times larger than) the number of actual SSDs installed within chassis 415. Each NVMeoF SSD appears to be an SSD that includes flash memory (such as flash memory 425-1 or 425-2) with a front end that presents an NVMeoF interface to remote host 305-1.

[0125] Because the remote host 305-1 “sees” multiple individual NVMeoF SSDs, the remote host can connect to any available NVMeoF SSD without worrying that the connection might affect any other remote host connected to one of the other NVMeoF SSDs.

[0126] As shown in Figure 4B, the actual implementation inside chassis 415 differs from what remote host 305-1 "sees." In Figure 4B, chassis 415 may include bridging device (or external circuitry, or bridging device circuitry) 430. The implementation of bridging device 430 is discussed further below with reference to Figure 6. Bridging device 430 can interface with Non-Volatile Memory Express (NVMe) SSD (hereinafter also referred to as the storage device) 435. NVMe SSD 435 can expose PF 440 and VFs 445-1 and 445-2. For each exposed function (physical or virtual), NVMe SSD 435 can include an NVMe controller (such as NVMe controllers 450-1 and 450-2), each of which can interface with flash memory 425. NVMe SSD 435 can implement single root input/output virtualization (SR-IOV) to support multiple remote hosts communicating with NVMe SSD 435.

[0127] The term "external" in "external circuitry (or bridging device)" should be understood to mean that the circuitry is located "outside" the storage device itself. In such embodiments of the inventive concept, the storage device itself need not be modified, and in theory any storage device that could otherwise be used can be used with bridging device 430. However, bridging device 430 does not necessarily have to be "outside" everything, or even visibly separate from storage device 435. For example, both bridging device 430 and storage device 435 may be inside rack 415 and therefore not individually visible to the customer. In some embodiments of the inventive concept, however, it is feasible to separate bridging device 430 from NVMe SSD 435. In other words, bridging device 430 and NVMe SSD 435 need not both be inside rack 415 and can be geographically distant from each other, with communication between them traversing some kind of connection (which may or may not include network 135 of Figure 1). Even so, the association between bridging device 430 and NVMe SSD 435 remains, allowing bridging device 430 to operate so that NVMe SSD 435 appears to be multiple NVMeoF SSDs.

[0128] Although Figure 4B shows one PF 440 and VFs 445-1 and 445-2 (e.g., VF 1 through VF k, where k is an integer greater than 1), embodiments of the inventive concept can support any number (one or more) of PFs and any number (one or more) of VFs. Typically, each PF may include its own hardware resources, which may be shared by a subset of the VFs. Although Figure 4B shows rack 415 as including one NVMe SSD 435, embodiments of the inventive concept may include any number (one or more) of storage devices within rack 415. Finally, embodiments of the inventive concept can be extended to storage devices that communicate using other protocols (i.e., other than NVMe) or that use other storage technologies (such as hard disk drives).

[0129] NVMe controllers 450-1 and 450-2 can be implemented using hardware (such as a processor, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or a general-purpose GPU (GPGPU), to name only a few possible implementations), software (which can run on suitable hardware), or a combination of both.

[0130] In some embodiments of the inventive concept, each function exposed by NVMe SSD 435 can communicate with at most one remote host 305-1 at a time. (Note that from the perspective of remote host 305-1, this relationship does not have to be one-to-one: remote host 305-1 can communicate with multiple functions exposed by NVMe SSD 435 or with functions exposed by other storage devices.) In other embodiments of the inventive concept, bridging device 430 can enable multiple remote hosts 305-1 to communicate simultaneously with a single function exposed by NVMe SSD 435 (this should be understood to mean that bridging device 430 can communicate with multiple remote hosts 305-1 at approximately the same time, and possibly simultaneously, but the different communications can be in different states, and at any given moment bridging device 430 may be communicating with zero, one, or more remote hosts 305-1). However, allowing multiple remote hosts 305-1 to communicate simultaneously with a single function exposed by NVMe SSD 435 does carry a potential risk: data belonging to one host might not remain secure against access by another host. In such embodiments of the inventive concept, bridging device 430 can subdivide the address space of the function exposed by NVMe SSD 435 in a manner that maintains data isolation between hosts or within a host. (This is similar to how bridging device 430 can enforce isolation within a host, as discussed below.)
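
As an illustration only, the address-space subdivision described in this paragraph might be sketched as follows. The equal-sized split, the names `Slice`, `subdivide`, and `translate`, and the host labels are assumptions made for this sketch; the description does not specify how the address space is divided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    """Contiguous range of logical block addresses (LBAs) given to one host."""
    base: int    # first LBA of the slice within the function's address space
    length: int  # number of LBAs in the slice


def subdivide(total_lbas: int, hosts: list[str]) -> dict[str, Slice]:
    """Split a function's address space into equal, non-overlapping slices.

    Equal sizing is an assumption of this sketch, not taken from the description.
    """
    per_host = total_lbas // len(hosts)
    return {h: Slice(i * per_host, per_host) for i, h in enumerate(hosts)}


def translate(slices: dict[str, Slice], host: str, lba: int, count: int = 1) -> int:
    """Map a host-relative LBA to a function LBA, rejecting out-of-slice access."""
    s = slices[host]
    if lba < 0 or count < 1 or lba + count > s.length:
        raise PermissionError(f"{host}: LBA range outside its slice")
    return s.base + lba


slices = subdivide(1000, ["host-a", "host-b"])
print(translate(slices, "host-b", 10))  # 510
```

Because every host-relative address is checked against the host's own slice before translation, one host cannot name a block that belongs to another host.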

[0131] Different functions exposed by NVMe SSD 435 can also share namespaces. By sharing namespaces, two (or more) remote hosts 305-1 (or hypervisor 405 or VMs 410-1, 410-2, or 410-3) can share information. These functions can be exposed by a single NVMe SSD 435 or by multiple NVMe SSDs 435 (potentially involving communication between the NVMe SSDs 435).

[0132] Figures 5A to 5C show examples, according to embodiments of the inventive concept, of remote host 305-1 of Figure 4B connecting to multi-function storage device 125 of Figure 1. In Figure 5A, remote host 305-1 can communicate with multi-function storage device 125 via wired connection 505. (Although Figures 5A to 5C show remote host 305-1 rather than VM 410-1 or 410-2 of Figure 4B, embodiments of the inventive concept may include VM 410-1 or 410-2 of Figure 4B running on remote host 305-1, so that, figuratively speaking, the VM can be the "remote host" connected to multi-function storage device 125.) In Figure 5B, remote host 305-1 can communicate with multi-function storage device 125 via wireless connection 510. In Figure 5C, remote host 305-1 can communicate with multi-function storage device 125 via optical connection 515. Other types of connections are also possible.

[0133] Figure 6 shows details of bridging device 430 of Figure 4B, according to embodiments of the inventive concept. In Figure 6, bridging device 430 may be implemented using a processor, FPGA, ASIC, GPU, or GPGPU, to name only a few possible implementations. Bridging device 430 may also be implemented using software that can run on suitable hardware, or using a combination of hardware and software.

[0134] The bridging device 430 may include a media access controller (MAC) 605 and an embedded network interface controller (eNIC) 610. MAC 605 can be used as the physical interface of bridging device 430, allowing bridging device 430 to receive communications from remote hosts (such as remote hosts 305-1, 305-2, and 305-3 of Figure 3) or from VMs running on those remote hosts. eNIC 610 acts as a network interface controller, converting signals received via MAC 605 and interpreting the protocol used for data transmission (such as NVMeoF). Together, MAC 605 and eNIC 610 implement the network protocols and the physical and data link layers for network communication with remote hosts. Any desired transport can be used: for example, RDMA over Converged Ethernet (RoCE), RoCE v2, RDMA over InfiniBand, Internet Wide Area RDMA Protocol (iWARP), Small Computer System Interface (SCSI) RDMA Protocol, NVMe over Transmission Control Protocol (TCP), or NVMe over Fibre Channel, to name a few. Messages can be encoded as NVMeoF messages by the remote host using techniques such as a Submission Queue Entry (SQE) or a Scatter Gather List (SGL) before being encapsulated for transmission.

[0135] Upon receiving an NVMeoF message, bridging device 430 may assign a tag to the NVMeoF message. This tag may be assigned by MAC 605, eNIC 610, or any other component of bridging device 430. By assigning tags to NVMeoF messages, bridging device 430 can keep track of various messages from various remote hosts being processed simultaneously through bridging device 430.
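
A minimal sketch of such tagging is shown below, assuming a simple monotonically increasing counter. The class `TagAllocator`, its methods, and the `(host_id, command_id)` record are hypothetical names chosen for illustration; the description does not define the tag format.

```python
import itertools


class TagAllocator:
    """Hands out a unique tag per in-flight message and remembers its origin."""

    def __init__(self) -> None:
        self._next = itertools.count(1)  # assumed tag format: a running counter
        self.in_flight: dict[int, tuple[str, int]] = {}

    def assign(self, host_id: str, command_id: int) -> int:
        """Tag an incoming message and record which host and command it came from."""
        tag = next(self._next)
        self.in_flight[tag] = (host_id, command_id)
        return tag

    def release(self, tag: int) -> tuple[str, int]:
        """Look up and forget the origin once the response has been sent."""
        return self.in_flight.pop(tag)


tags = TagAllocator()
t1 = tags.assign("host-0", command_id=7)
t2 = tags.assign("host-1", command_id=7)  # same command id, different host
print(t1 != t2)  # True
```

Two hosts may reuse the same command identifier, so the tag, rather than the host's own identifier, is what keeps their messages apart inside the bridging device.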

[0136] The bridging device 430 can extract commands from NVMeoF messages and place them in the submission queue of the NVMeoF submission/completion queue pair 615 for the remote host within bridging device 430. As shown in Figure 6, bridging device 430 may include any number of NVMeoF submission/completion queue pairs 615, one for each remote host (or VM on a remote host) currently communicating with multi-function storage device 125 of Figure 1. For example, Figure 6 shows three NVMeoF submission/completion queue pairs: one for host 0 (e.g., the hypervisor), one for host 1 (e.g., VM 1), and one for host m (e.g., VM m), where m is a positive integer. (Note that, because bridging device 430 can be responsible for sending the response to the remote host once the request has completed, rather than the remote host retrieving the response from bridging device 430, NVMeoF submission/completion queue pair 615 may consist of only the submission queue, and the completion queue may be omitted. For the purposes of this discussion, the term "submission/completion queue pair" means either a pair of queues, one for submission and one for completion, or a single queue for submission only.)

[0137] In addition to NVMeoF submission/completion queue pairs 615, bridging device 430 may include write buffer 620. Write buffer 620 can be used to store data received from a remote host that is to be written to storage device 435. That is, when bridging device 430 receives a write request from a remote host, the request itself may be stored in NVMeoF submission/completion queue pair 615, while the data to be written may be stored in write buffer 620. As discussed below, storage device 435 may access the data in write buffer 620 to perform the write request, or bridging device 430 may retrieve the data from write buffer 620 and include it as part of the request sent to storage device 435. Write buffer 620 may be implemented using on-chip storage or external memory (such as memory 115 of Figure 1, which may be, for example, double data rate (DDR) DRAM).

[0138] At this point, messages originating from the remote host can still use the protocol used by the remote host: for example, NVMeoF. However, storage device 435 can use a different protocol: for example, NVMe. To handle the conversion from one protocol to the other, bridging device 430 may include NVMeoF to NVMe bridge 625. (If protocols other than NVMeoF or NVMe are used, NVMeoF to NVMe bridge 625 can be replaced by a bridge that converts between those other protocols; the more general term "bridge circuit" can therefore be used to describe NVMeoF to NVMe bridge 625.) While NVMeoF and NVMe are similar, they also have some differences. For example, NVMeoF can use messages for communication, while NVMe can use memory space transactions to transfer requests and data between the storage device and its host. Therefore, the information carried in a request message from a remote host can be stored in shared memory (such as write buffer 620 for data to be written to the storage device, or some other shared memory for the request itself) rather than being encapsulated in a message. To this end, NVMeoF to NVMe bridge 625 can terminate the NVMeoF protocol used by the remote host and can generate (acting as a root) the NVMe protocol toward the NVMe controller used by storage device 435. NVMeoF to NVMe bridge 625 can act as a lightweight PCIe root complex (via root port 630) to perform PCIe enumeration, initialization, and configuration of storage device 435 and its controllers. Root port 630 can be used to carry communication with storage device 435, supporting enumeration of storage device 435 via PCIe. Although Figure 6 shows bridging device 430 as including one root port 630, embodiments of the inventive concept may allow bridging device 430 to include any number (one or more) of root ports 630.

[0139] On the NVMe side of NVMeoF to NVMe bridge 625, bridging device 430 can communicate with storage device 435 using another protocol (such as NVMe). Bridging device 430 may include NVMe submission/completion queue pairs 635, one for each function exposed by storage device 435. Since storage device 435 may expose more than one function, there may be more than one NVMe submission/completion queue pair for each storage device 435. For example, Figure 6 shows three NVMe submission/completion queue pairs: one for the PF, one for VF 1, and one for VF k.

[0140] Furthermore, for a given function exposed by storage device 435, there may be more than one NVMe submission/completion queue pair. For example, bridging device 430 may support two (or more) NVMe submission/completion queue pairs for a single exposed function to support the QoS requirements of the VM using that function: one queue pair may be used for normal requests, while another may be used for priority requests (which may be processed before any requests pending in the queue pair used for normal requests).
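
The idea of keeping two request queues for one exposed function can be sketched as below. This is a simplified model with strict priority ordering; the class `FunctionQueues` and the request labels are assumptions for illustration and do not model real NVMe queue structures.

```python
from collections import deque


class FunctionQueues:
    """Two request queues for a single exposed function: priority and normal."""

    def __init__(self) -> None:
        self.priority = deque()
        self.normal = deque()

    def submit(self, request: str, priority: bool = False) -> None:
        """Queue a request on the priority or the normal queue."""
        (self.priority if priority else self.normal).append(request)

    def next_request(self):
        """Return the next request, draining the priority queue first."""
        if self.priority:
            return self.priority.popleft()
        if self.normal:
            return self.normal.popleft()
        return None  # nothing pending


q = FunctionQueues()
q.submit("read-A")
q.submit("write-B", priority=True)
print(q.next_request(), q.next_request())  # write-B read-A
```

Strict ordering like this is the simplest policy; a real device would likely combine it with some bound on how long normal requests may wait.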

[0141] The bridging device 430 may also include read buffer 640. Read buffer 640 is similar to write buffer 620 and can be used to buffer data read from storage device 435. When data is read from storage device 435, the read data can be held in read buffer 640 until all of the data has been read. Once all of the data has been read, it can be taken from read buffer 640 and assembled into a message that can be sent to the remote host using, for example, the NVMeoF protocol. Read buffer 640 can be implemented using on-chip storage or external memory (such as memory 115 of Figure 1, which may be, for example, double data rate (DDR) DRAM).

[0142] There are several reasons why the bridging device 430 may include a write buffer 620 and a read buffer 640. One reason for including write buffer 620 and read buffer 640 could be to speed up command processing. For example, if a command requested by a remote host involves processing a large amount of data, waiting until all the data becomes available can cause the overall processing of the request to take longer than expected. By processing the request in smaller blocks, each block can be processed as soon as it becomes available, resulting in faster overall processing.

[0143] Another reason for including write buffer 620 and read buffer 640 could be to support commands involving very large amounts of data. Depending on the protocol used to communicate with storage device 435, there may be a limit on how much data can be sent to or received from storage device 435 in a single request. For requests that exceed the size storage device 435 can handle in a single request, write buffer 620 and read buffer 640 can be used to stage, in segments, the data received from or to be sent to a remote host.
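
A minimal sketch of this segmentation follows, assuming the per-request limit is expressed as a maximum number of logical blocks. The function name `split_transfer` and its parameters are hypothetical.

```python
def split_transfer(start_lba: int, num_blocks: int, max_blocks_per_request: int):
    """Yield (lba, count) pairs that cover the range in device-sized pieces."""
    if max_blocks_per_request < 1:
        raise ValueError("max_blocks_per_request must be at least 1")
    offset = 0
    while offset < num_blocks:
        # The final piece may be shorter than the limit.
        count = min(max_blocks_per_request, num_blocks - offset)
        yield start_lba + offset, count
        offset += count


print(list(split_transfer(100, 10, 4)))  # [(100, 4), (104, 4), (108, 2)]
```

Each yielded pair could back one request to the storage device, with the corresponding slice of data held in the write buffer or read buffer while the other pieces are in flight.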

[0144] Because NVMeoF to NVMe bridge 625 places messages in NVMe submission/completion queue pair 635 to send them to storage device 435, and places responses in NVMeoF submission/completion queue pair 615 to send them to remote hosts 305-1, 305-2, and/or 305-3 of Figure 3, NVMeoF to NVMe bridge 625 can store a mapping between the remote hosts and the functions exposed by storage device 435. This mapping may include, for example, a table that associates each function with the remote host that sends requests to that function.
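
One way such a table might look is sketched below. It assumes the embodiments in which a function serves at most one host while a host may use several functions; the class `HostFunctionMap` and the string identifiers are illustrative only.

```python
class HostFunctionMap:
    """Two-way table between remote hosts and exposed functions (PF or VF)."""

    def __init__(self) -> None:
        self._host_to_functions: dict[str, set[str]] = {}
        self._function_to_host: dict[str, str] = {}

    def bind(self, host: str, function: str) -> None:
        """Associate a function with a host; a function serves one host only."""
        if function in self._function_to_host:
            raise ValueError(f"{function} is already bound")
        self._host_to_functions.setdefault(host, set()).add(function)
        self._function_to_host[function] = host

    def functions_for(self, host: str) -> set[str]:
        """Used on the request path: which functions may this host address?"""
        return set(self._host_to_functions.get(host, ()))

    def host_for(self, function: str) -> str:
        """Used on the response path: which host gets this function's reply?"""
        return self._function_to_host[function]


table = HostFunctionMap()
table.bind("hypervisor", "PF")
table.bind("vm-1", "VF1")
print(table.host_for("VF1"))  # vm-1
```

Keeping both directions lets the bridge route a request to the right function and route the resulting response back to the right host without searching.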

[0145] Once bridging device 430 has received all data from storage device 435 (via the completion queue of NVMe submission/completion queue pair 635), NVMeoF to NVMe bridge 625 can convert the response from the NVMe protocol to the NVMeoF protocol. This process may involve, for example, taking a memory space transaction and generating a response suitable for the original NVMeoF request. Bridging device 430 can encapsulate the completion queue entry into an NVMeoF message for transmission back to the remote host.

[0146] As implied above with reference to Figure 4B, hypervisor 405 of Figure 4B can also communicate with storage device 435. Hypervisor 405 of Figure 4B does not send user data to or receive user data from storage device 435; however, hypervisor 405 may want to configure storage device 435 and/or bridging device 430. Therefore, bridging device 430 may include an NVMeoF submission/completion queue pair 615 for hypervisor 405 of Figure 4B (with a "corresponding" NVMe submission/completion queue pair 635 for the PF exposed by storage device 435). Hypervisor 405 of Figure 4B can use its NVMeoF submission/completion queue pair 615 to send requests to configure bridging device 430 and/or storage device 435 (more specifically, the functions exposed by storage device 435).

[0147] To configure storage device 435, hypervisor 405 of Figure 4B can send requests to the PF exposed by storage device 435. Examples of how hypervisor 405 of Figure 4B can configure storage device 435 include configuring the NVMe controllers for one or more of the functions exposed by storage device 435, allocating resources (such as bandwidth and storage capacity) to those functions, and other management of storage device 435. Hypervisor 405 of Figure 4B can also send privacy and multi-tenancy policies, which can be forwarded to PF 440 of Figure 4B and the associated NVMe controller 450-1.

[0148] When hypervisor 405 of Figure 4B sends a request to configure bridging device 430, configuration manager 645 can be used. Examples of commands that can be sent to configuration manager 645 include Quality of Service (QoS) requirements for one of the VMs running on the remote host and information about which functions are enabled or disabled on storage device 435. Configuration manager 645 can then use this information to configure bridging device 430. For example, configuration manager 645 may establish additional NVMe submission/completion queue pairs 635 for a specific function exposed by storage device 435 to provide different ways to organize the requests sent to that function.

[0149] The bridging device 430 can be responsible for managing QoS for the remote hosts. Although the host (more specifically, hypervisor 405 of Figure 4B) can be responsible for generating QoS policies, bridging device 430 can enforce (or satisfy) those policies. Bridging device 430 can enforce a QoS policy in any desired component: for example, the QoS policy can be enforced by eNIC 610, NVMeoF submission/completion queue pair 615, NVMeoF to NVMe bridge 625, NVMe submission/completion queue pair 635, or root port 630. Furthermore, bridging device circuitry 430 can enable NVMe controller 450-1 of Figure 4B, which is associated with PF 440 of Figure 4B, to operate as a management module.

[0150] Bridging device 430 can ensure that data is isolated between different remote hosts while all remote hosts (and all requests originating from them) are treated fairly. For example, bridging device 430 can ensure that no request is preempted indefinitely by higher-priority requests. Bridging device 430 can also provide monitoring information to hypervisor 405 of Figure 4B, enabling hypervisor 405 of Figure 4B to monitor the effectiveness of QoS policies and fine-tune them as appropriate.
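
The description does not say how indefinite preemption is prevented. One possible approach, shown here purely as an assumption, is to cap how many times in a row a waiting normal request may be bypassed by priority requests. The class `BoundedPriorityScheduler` and the `max_bypass` parameter are hypothetical.

```python
from collections import deque


class BoundedPriorityScheduler:
    """Priority-first scheduling with a cap on consecutive bypasses."""

    def __init__(self, max_bypass: int = 3) -> None:
        self.max_bypass = max_bypass
        self.priority = deque()
        self.normal = deque()
        self._bypassed = 0  # times in a row a waiting normal request was skipped

    def submit(self, request: str, priority: bool = False) -> None:
        (self.priority if priority else self.normal).append(request)

    def next_request(self):
        # Once the cap is reached, a waiting normal request must go next.
        must_serve_normal = bool(self.normal) and self._bypassed >= self.max_bypass
        if self.priority and not must_serve_normal:
            if self.normal:
                self._bypassed += 1  # a normal request was made to wait
            return self.priority.popleft()
        if self.normal:
            self._bypassed = 0
            return self.normal.popleft()
        return None


s = BoundedPriorityScheduler(max_bypass=2)
s.submit("n1")
for name in ("p1", "p2", "p3"):
    s.submit(name, priority=True)
print([s.next_request() for _ in range(4)])  # ['p1', 'p2', 'n1', 'p3']
```

With `max_bypass=2`, the normal request `n1` is served after at most two priority requests, no matter how many more priority requests keep arriving.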

[0151] Configuration manager 645 also has other uses. Configuration manager 645 can be used to configure the SR-IOV capability of storage device 435. Configuration manager 645 can also create, remove, configure, and manage the functions exposed by storage device 435, based on requests from hypervisor 405 of Figure 4B.

[0152] Although Figure 6 shows bridging device 430 connected to one storage device 435, embodiments of the inventive concept may include bridging device 430 of Figure 4B connected to more than one storage device 435. For each attached storage device 435, bridging device 430 may include a separate root port 630, or root port 630 may be shared by more than one storage device 435. Furthermore, although the above discussion suggests that storage device 435 of Figure 4B may include one PF with multiple associated VFs, embodiments of the inventive concept may include storage device 435 exposing more than one PF (with a set of associated VFs for each exposed PF).

[0153] Figure 7 shows details of NVMe SSD 435 of Figure 4B, according to embodiments of the inventive concept. In Figure 7, NVMe SSD 435 may include connector 705 (also referred to as a port), which can be used to connect NVMe SSD 435 to bridging device 430 of Figure 4B. NVMe SSD 435 may also include host interface 710 (also referred to as host interface logic or HIL), SSD controller 715, and various flash memory chips 425-1 to 425-8 (also referred to as flash storage devices), which can be organized into various channels 720-1 to 720-4. Host interface logic 710 can manage communication between NVMe SSD 435 and other components (such as bridging device 430 of Figure 4B). Host interface logic 710 can manage an interface across only a single connector 705, or it can manage interfaces across multiple connectors 705. Alternatively, NVMe SSD 435 may include multiple connectors 705, each of which may have a separate host interface logic 710 to manage the interface across that connector 705. Embodiments of the inventive concept may also combine these possibilities (e.g., an SSD with three connectors may have a first host interface logic for managing one connector and a second host interface logic for managing the other two connectors).

[0154] The SSD controller 715 can use a flash memory controller (not shown in Figure 7) to manage read and write operations, garbage collection, and other operations on flash memory chips 425-1 to 425-8. SSD controller 715 may include flash translation layer 725. SSD controller 715 may be implemented using some form of hardware running appropriate software (e.g., a custom controller for NVMe SSD 435, an FPGA, an ASIC, a general-purpose processor, a GPU, or a GPGPU, among other possibilities).

[0155] The SSD controller 715 may include flash translation layer 725, which can manage the translation between logical block addresses (e.g., as used by bridging device 430 of Figure 6) and the physical block addresses in flash memory chips 425-1 to 425-8 where the data is stored. Although Figure 7 shows SSD controller 715 as including flash translation layer 725, embodiments of the inventive concept may place flash translation layer 725 in any desired location. Embodiments of the inventive concept may also place these modules within different portions of NVMe SSD 435 (e.g., with none of these modules within SSD controller 715).
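
The logical-to-physical translation performed by a flash translation layer can be illustrated with the toy model below. It is a simplified sketch that assumes block-granularity mapping and out-of-place writes; the class `FlashTranslationLayer` and its fields are hypothetical, and real flash translation layers also handle wear leveling, garbage collection, and persistence.

```python
class FlashTranslationLayer:
    """Toy logical-to-physical block map with out-of-place writes."""

    def __init__(self, num_physical_blocks: int) -> None:
        self._free = list(range(num_physical_blocks))  # unwritten physical blocks
        self._l2p: dict[int, int] = {}                  # logical -> physical
        self.invalid: set[int] = set()                  # stale blocks awaiting GC

    def write(self, lba: int) -> int:
        """Place a write in a fresh physical block and remap the LBA to it."""
        if not self._free:
            raise RuntimeError("no free physical blocks; garbage collection needed")
        pba = self._free.pop(0)
        old = self._l2p.get(lba)
        if old is not None:
            self.invalid.add(old)  # flash is not overwritten in place
        self._l2p[lba] = pba
        return pba

    def lookup(self, lba: int) -> int:
        """Translate a logical block address for a read."""
        return self._l2p[lba]


ftl = FlashTranslationLayer(num_physical_blocks=4)
ftl.write(7)
ftl.write(7)                        # rewriting the same LBA moves it
print(ftl.lookup(7), ftl.invalid)   # 1 {0}
```

The host-side address (here LBA 7) stays stable while the physical location changes, which is why the bridging device can work purely with logical block addresses.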

[0156] Although Figure 7 shows NVMe SSD 435 as comprising eight flash memory chips 425-1 to 425-8 organized into four channels 720-1 to 720-4, embodiments of the inventive concept can support any number of flash memory chips organized into any number of channels. Similarly, although Figure 7 shows the structure of an SSD, other storage devices (e.g., hard disk drives) can be implemented using a different structure with similar potential benefits.

[0157] Figures 8 to 9 show bridging device 430 of Figure 4B processing messages between remote hosts 305-1, 305-2, and 305-3 of Figure 3 and virtual functions 445-1 and 445-2 of Figure 4B of multi-function storage device 125 of Figure 1. In Figure 8, VM 410-1 is shown sending message 805 toward NVMe SSD 435 of Figure 4B (recall that VM 410-1 is unaware that bridging device 430 makes NVMe SSD 435 of Figure 4B appear to be multiple NVMeoF SSDs). Message 805 may use the NVMeoF protocol and may include data 810.

[0158] Upon receiving message 805, bridging device 430 may apply tag 815 to message 805, enabling bridging device 430 to distinguish message 805 from messages sent by other remote hosts (or from other messages sent by VM 410-1). Bridging device 430 can then use the submission queue of NVMeoF submission/completion queue pair 615 of Figure 6 and NVMeoF to NVMe bridge 625 of Figure 6 to convert (NVMeoF) message 805 into messages 820-1 and 820-2, which can use the NVMe protocol. Each of messages 820-1 and 820-2 can include its own data 825-1 and 825-2, respectively. Messages 820-1 and 820-2 can be placed in the submission queue of NVMe submission/completion queue pair 635 of Figure 6 for VF 445-1 (the function that, from the perspective of VM 410-1, appears to be the target NVMeoF SSD). Messages 820-1 and 820-2 may include information that associates them with tag 815: for example, part of messages 820-1 and 820-2 may be tag 815 itself, so that once VF 445-1 has finished processing the request, NVMeoF to NVMe bridge 625 of Figure 6 can respond appropriately to message 805. Alternatively, bridging device 430 may internally track which (NVMe) messages 820-1 and 820-2 are associated with (NVMeoF) message 805.

[0159] Note that (NVMeoF) message 805 can give rise to multiple (NVMe) messages 820-1 and 820-2. As discussed above with reference to Figure 6, this may be because sending multiple sub-commands to VF 445-1 is more efficient than sending a single command (or is necessary for other reasons). For example, a single NVMeoF write request can place data in write buffer 620 of Figure 6 in blocks, with each block becoming part of a separate (NVMe) write request sent to VF 445-1. Although Figure 8 shows two messages 820-1 and 820-2, bridging device 430 can send any number (one or more) of (NVMe) messages to VF 445-1 in response to (NVMeoF) message 805.

[0160] Messages 820-1 and 820-2 may differ from message 805 in some details (e.g., the protocol used). However, in context, messages 820-1 and 820-2 may be generated by NVMeoF to NVMe bridge 625 of Figure 6 based on message 805. Therefore, messages 820-1 and 820-2 can be said to "originate" from VM 410-1. In theory, message 805 itself could have been relayed from another host: for example, when VM 410-1 or remote host 305-1 of Figure 3 operates as a router. In that case, depending on the viewpoint applied, message 805 may be said to "originate" at VM 410-1, at remote host 305-1 of Figure 3, or at whichever host sent the original message to remote host 305-1 of Figure 3 or VM 410-1.

[0161] Figure 9 shows the reverse of the operation of Figure 8. In Figure 9, VF 445-1 can send one or more NVMe responses 905-1 and 905-2 to bridging device 430. Responses 905-1 and 905-2 can include data 910-1 and 910-2, respectively. These responses can be received in the completion queue of NVMe submission/completion queue pair 635 of Figure 6. Once all responses to the original (NVMeoF) message 805 of Figure 8 have been received, bridging device 430 can generate (NVMeoF) response 915 with data 920, which can be placed in the completion queue of NVMeoF submission/completion queue pair 615 of Figure 6 (or response 915 can be sent directly to VM 410-1 without using the completion queue of NVMeoF submission/completion queue pair 615 of Figure 6).

[0162] As discussed above with reference to Figure 8, messages 820-1 and 820-2 of Figure 8 may include tag 815 of Figure 8 so that messages 820-1 and 820-2 can be matched with (NVMeoF) message 805 of Figure 8. Then, when responses 905-1 and 905-2 are sent back to bridging device 430, VF 445-1 can include that tag in responses 905-1 and 905-2.

[0163] Because (NVMeoF) message 805 of Figure 8 can give rise to multiple NVMe messages 820-1 and 820-2 of Figure 8, VF 445-1 can generate multiple responses 905-1 and 905-2. Typically, VF 445-1 generates one response 905-1 or 905-2 for each of messages 820-1 and 820-2 of Figure 8, but in some embodiments of the inventive concept, two or more messages may result in a single response, or a single message may result in two or more responses. As discussed above with reference to Figure 6, this may be because having VF 445-1 send multiple sub-responses to bridging device 430 is more efficient than sending a single response (or is necessary for other reasons).
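
As a sketch of how sub-responses carrying the same tag might be gathered into one reply, the example below assumes the typical case of exactly one response per sub-command and assumes each response carries an index giving its position. The class `ResponseCollector` and its methods are hypothetical.

```python
class ResponseCollector:
    """Gathers tagged sub-responses and releases one combined payload."""

    def __init__(self) -> None:
        # tag -> {"expected": number of sub-commands, "parts": {index: data}}
        self._pending: dict[int, dict] = {}

    def expect(self, tag: int, num_sub_commands: int) -> None:
        """Record how many sub-commands were issued for this tag."""
        self._pending[tag] = {"expected": num_sub_commands, "parts": {}}

    def on_response(self, tag: int, index: int, data: bytes):
        """Store one sub-response; return the joined data once all have arrived."""
        entry = self._pending[tag]
        entry["parts"][index] = data
        if len(entry["parts"]) < entry["expected"]:
            return None  # still waiting for other sub-responses
        del self._pending[tag]
        return b"".join(entry["parts"][i] for i in sorted(entry["parts"]))


collector = ResponseCollector()
collector.expect(tag=815, num_sub_commands=2)
print(collector.on_response(815, 1, b"world"))   # None
print(collector.on_response(815, 0, b"hello "))  # b'hello world'
```

Sub-responses may arrive out of order, so the payload is joined by index rather than by arrival order before the single reply is built.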

[0164] Figures 10A to 10B show a flowchart of an example process for bridging device 430 of Figure 4B to process messages between remote hosts 305-1, 305-2, and 305-3 of Figure 3 and functions 440, 445-1, and 445-2 of Figure 4B of storage device 435 of Figure 4B, according to embodiments of the inventive concept. In Figure 10A, at block 1005, bridging device 430 of Figure 6 can receive NVMeoF message 805 of Figure 8 from remote host 305-1, 305-2, or 305-3 of Figure 3. At block 1010, bridging device 430 of Figure 6 can assign tag 815 to NVMeoF message 805 of Figure 8. At block 1015, bridging device 430 of Figure 6 can place NVMeoF message 805 of Figure 8 in the submission queue of NVMeoF submission/completion queue pair 615 of Figure 6. This may also involve storing write data 810 of Figure 8 in write buffer 620 of Figure 6. At block 1020, NVMeoF to NVMe bridge 625 of Figure 6 can generate NVMe messages 820-1 and 820-2 of Figure 8 based on NVMeoF message 805 of Figure 8. At block 1025, bridging device 430 of Figure 6 can map remote host 305-1, 305-2, or 305-3 of Figure 3 to function 440, 445-1, or 445-2 of Figure 4B exposed by storage device 435 of Figure 4B.

[0165] At block 1030 (Figure 10B), bridging device 430 of Figure 6 can send messages 820-1 and/or 820-2 of Figure 8 to storage device 435 of Figure 4B. Messages 820-1 and/or 820-2 of Figure 8 can use the NVMe protocol. Bridging device 430 of Figure 6 can send messages 820-1 and/or 820-2 of Figure 8 to storage device 435 of Figure 4B by placing them in the submission queue of NVMe submission/completion queue pair 635 of Figure 6. When it is time for storage device 435 of Figure 4B to process messages 820-1 and/or 820-2 of Figure 8, storage device 435 can retrieve them from the submission queue of NVMe submission/completion queue pair 635 of Figure 6; alternatively, bridging device 430 of Figure 6 can generate memory space transactions for messages 820-1 and/or 820-2 of Figure 8 and send them to storage device 435 of Figure 4B.

[0166] Eventually, storage device 435 of Figure 4B can send NVMe responses 905-1 and/or 905-2 of Figure 9 to the completion queue of NVMe submission/completion queue pair 635 of Figure 6. Storage device 435 of Figure 4B can place responses 905-1 and/or 905-2 of Figure 9 in the completion queue of NVMe submission/completion queue pair 635 of Figure 6, or storage device 435 can generate memory space transactions for responses 905-1 and/or 905-2 and send them to bridging device 430 of Figure 6 when the responses are ready. At block 1035, bridging device 430 of Figure 6 can receive responses 905-1 and/or 905-2 of Figure 9 from storage device 435 of Figure 4B (possibly by retrieving them from the completion queue of NVMe submission/completion queue pair 635 of Figure 6). This may involve buffering read data 910-1 and/or 910-2 of Figure 9 in read buffer 640 of Figure 6. At block 1040, NVMeoF to NVMe bridge 625 of Figure 6 can generate NVMeoF response 915 of Figure 9. Finally, at block 1045, bridging device 430 of Figure 6 can send response 915 of Figure 9 to remote host 305-1, 305-2, or 305-3 of Figure 3, either by placing response 915 in the completion queue of NVMeoF submission/completion queue pair 615 of Figure 6 or by sending response 915 directly to the remote host.

[0167] Although Figures 10A to 10B describe the processing of a single message from a remote host, embodiments of the inventive concept may include bridging device 430 of Figure 6 performing the operations described in Figures 10A to 10B more than once, with different messages, possibly simultaneously and at different stages of processing. That is, the example process shown in Figures 10A to 10B may be performed multiple times, with different messages from different remote hosts being sent to different functions exposed by storage device 435 of Figure 4B.

[0168] Figure 11 shows a flowchart of an example process by which bridging device 430 of Figure 4B maps remote hosts to functions 440, 445-1, and 445-2 exposed by storage device 435 of Figure 4B, according to embodiments of the inventive concept. In Figure 11, at block 1105, bridging device 430 of Figure 6 may map hypervisor 405 of Figure 4B to PF 440 of Figure 4B, and at block 1110, bridging device 430 of Figure 6 may map individual VMs 410-1, 410-2, and 410-3 of Figure 4B to VFs 445-1 and 445-2 of Figure 4B.
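As a rough illustration of the mapping in Figure 11, the following sketch builds a lookup table from host-side entities to exposed functions. The function name and string identifiers are hypothetical. The text does not say how three VMs are distributed over two VFs, so the round-robin assignment used here is purely an assumption.

```python
def map_hosts_to_functions(hypervisor, vms, pf, vfs):
    """Return a dict mapping each host-side entity to an exposed function.

    Assumption (not from the source): when VMs outnumber VFs,
    VMs are assigned to VFs round-robin.
    """
    mapping = {hypervisor: pf}           # block 1105: hypervisor -> PF
    for i, vm in enumerate(vms):         # block 1110: each VM -> some VF
        mapping[vm] = vfs[i % len(vfs)]
    return mapping


mapping = map_hosts_to_functions(
    "hypervisor-405",
    ["vm-410-1", "vm-410-2", "vm-410-3"],
    "pf-440",
    ["vf-445-1", "vf-445-2"],
)
```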

[0169] Figure 12 shows a flowchart of an example process by which bridging device 430 of Figure 4B handles a write request of Figure 8 received from remote hosts 305-1, 305-2, and 305-3 of Figure 3, according to embodiments of the inventive concept. In Figure 12, at block 1205, bridging device 430 of Figure 6 may receive the write request of Figure 8 from remote hosts 305-1, 305-2, and/or 305-3 of Figure 3. At block 1210, bridging device 430 of Figure 6 may receive write data 810 of Figure 8 from remote hosts 305-1, 305-2, and/or 305-3 of Figure 3. At block 1215, bridging device 430 of Figure 6 may buffer write data 810 of Figure 8 in write buffer 620 of Figure 6. If write data 810 of Figure 8 is received from remote hosts 305-1, 305-2, and/or 305-3 of Figure 3 in chunks, blocks 1210 and 1215 may be repeated. Finally, at block 1220, bridging device 430 of Figure 6 may send write data 810 of Figure 8 from write buffer 620 of Figure 6 to VF 445-1 or 445-2 of Figure 4B (as discussed above, this may involve sending write data 810 of Figure 8 as part of NVMe messages 820-1 and/or 820-2 of Figure 8, or using memory space transactions).
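The buffering loop of Figure 12 can be sketched as follows. This is a simplified software model, not the device's implementation: `WriteBuffer`, `handle_write`, and the `send_to_vf` callback are hypothetical names, and the sketch assumes all chunks are buffered before anything is forwarded to the VF.

```python
class WriteBuffer:
    """Hypothetical in-memory stand-in for the bridging device's write buffer."""

    def __init__(self):
        self._chunks = []

    def buffer(self, chunk):
        self._chunks.append(chunk)

    def drain(self):
        # Return everything buffered so far and empty the buffer.
        data = b"".join(self._chunks)
        self._chunks.clear()
        return data


def handle_write(chunks, send_to_vf):
    buf = WriteBuffer()
    for chunk in chunks:       # blocks 1210 and 1215, repeated per chunk
        buf.buffer(chunk)
    send_to_vf(buf.drain())    # block 1220: forward buffered data to the VF


sent = []
handle_write([b"abc", b"def"], sent.append)
```

The read path of Figure 13 follows the same pattern in the opposite direction, with the read buffer in place of the write buffer.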

[0170] Figure 13 shows a flowchart of an example process by which bridging device 430 of Figure 4B handles read responses 905-1 and 905-2 of Figure 9 from virtual functions 445-1 and 445-2 of Figure 4B of multifunctional storage device 125 of Figure 1, according to embodiments of the inventive concept.

[0171] In Figure 13, at block 1305, bridging device 430 of Figure 6 may receive read responses 905-1 and/or 905-2 of Figure 9 from VF 445-1 or 445-2 of Figure 4B (as discussed above, this may involve receiving data 910-1 and 910-2 of Figure 9 as data 920 of NVMeoF response 915 of Figure 9, or using memory space transactions). At block 1310, bridging device 430 of Figure 6 may receive read data 910-1 and/or 910-2 of Figure 9 from VF 445-1 or 445-2 of Figure 4B. At block 1315, bridging device 430 of Figure 6 may buffer read data 910-1 and/or 910-2 of Figure 9 in read buffer 640 of Figure 6. If multiple chunks of read data 910-1 and/or 910-2 of Figure 9 are received from VF 445-1 or 445-2 of Figure 4B, blocks 1310 and 1315 may be repeated. Finally, at block 1320, bridging device 430 of Figure 6 may send read data 920 of Figure 9 from read buffer 640 of Figure 6 to remote hosts 305-1, 305-2, and/or 305-3 of Figure 3.

[0172] Figure 14 shows a flowchart of an example process by which bridging device 430 of Figure 4B uses submission/completion queue pairs to enforce QoS requirements, according to embodiments of the inventive concept. In Figure 14, at block 1405, bridging device 430 of Figure 6 may receive message 805 of Figure 8 from remote host 305-1, 305-2, or 305-3 of Figure 3. At block 1410, NVMeoF-to-NVMe bridge 625 of Figure 6 may select one of two or more NVMe submission/completion queue pairs 635 of Figure 6 to be used to enforce a QoS policy. At block 1405, NVMeoF-to-NVMe bridge 625 of Figure 6 may use the selected NVMe submission/completion queue pair 635 of Figure 6 to enforce the QoS policy.
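One way to picture the queue-pair selection is the sketch below. The text does not specify how a queue pair is chosen, so everything here is an assumption: the policy is modeled as a table from host identifier to queue-pair index, and hosts without an entry fall back to the last queue pair. All names are hypothetical.

```python
def select_queue_pair(host_id, policy, queue_pairs):
    """Pick one of several queue pairs for a message from `host_id`.

    Assumption (not from the source): `policy` maps a host id to an index
    into `queue_pairs`; unknown hosts get the last (lowest-priority) pair.
    """
    index = policy.get(host_id, len(queue_pairs) - 1)
    return queue_pairs[index]


queue_pairs = ["qp-high", "qp-low"]
policy = {"host-305-1": 0}

chosen_known = select_queue_pair("host-305-1", policy, queue_pairs)
chosen_default = select_queue_pair("host-305-2", policy, queue_pairs)
```

A real policy could instead weigh queue depth, bandwidth budgets, or latency targets; the table lookup is only the simplest stand-in.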

[0173] Figure 15 shows a flowchart of an example process by which bridging device 430 of Figure 4B handles a message received from remote hosts 305-1, 305-2, and 305-3 of Figure 3 using multiple messages sent to virtual functions 445-1 and 445-2 of Figure 4B of multifunctional storage device 125 of Figure 1, according to embodiments of the inventive concept. In Figure 15, at block 1505, NVMeoF-to-NVMe bridge 625 may generate two or more NVMe messages 820-1 and/or 820-2 of Figure 8 based at least in part on NVMeoF message 805 of Figure 8. Finally, at block 1510, NVMeoF-to-NVMe bridge 625 may send the two or more NVMe messages 820-1 and/or 820-2 of Figure 8 to the appropriate function exposed by storage device 435 of Figure 4B.
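The source does not state why or how one NVMeoF message becomes several NVMe messages. As one plausible illustration, the sketch below splits a request by an assumed per-command block limit. The function name, the dictionary fields, and the limit itself are hypothetical, and block counts are plain counts rather than any on-the-wire encoding.

```python
def split_request(start_lba, num_blocks, max_blocks):
    """Split one logical request into commands of at most `max_blocks` blocks.

    Assumption (not from the source): splitting is driven by a maximum
    number of blocks per command.
    """
    if num_blocks <= 0 or max_blocks <= 0:
        raise ValueError("num_blocks and max_blocks must be positive")
    commands = []
    offset = 0
    while offset < num_blocks:
        count = min(max_blocks, num_blocks - offset)
        commands.append({"start_lba": start_lba + offset, "block_count": count})
        offset += count
    return commands


commands = split_request(start_lba=100, num_blocks=10, max_blocks=4)
```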

[0174] In Figures 10A to 15, some embodiments of the inventive concept are shown. However, those skilled in the art will recognize that other embodiments of the inventive concept are also possible, by changing the order of the blocks, by omitting blocks, or by including links not shown in the figures, irrespective of any elements that may be specifically omitted. All such variations of the flowcharts, whether explicitly described or not, are considered embodiments of the inventive concept.

[0175] Embodiments of the inventive concept include technical advantages over some implementations. Embodiments of the inventive concept provide a compact solution for implementing multiple storage functions with NVMeoF support. Any storage device exposing the functions (such as an SR-IOV storage device) can be used, and multiple remote hosts can connect to the same storage device simultaneously. No special functions or drivers are required to initialize the virtual functions, and remote hosts are completely unaware that the backend storage device may not be an NVMeoF storage device, or even a physically different storage device. No sideband communication between hosts or between a host and a storage device is required to initialize the exposed functions. Embodiments of the inventive concept are economical and energy-efficient, and the design is scalable.

[0176] The following discussion is intended to provide a brief general description of one or more suitable machines for realizing specific aspects of the inventive concept. One or more machines can be controlled at least in part by input from input devices (such as keyboards, mice, etc.) and by instructions received from another machine, interaction with a virtual reality (VR) environment, biometric feedback, or other input signals. As used herein, the term "machine" is intended to broadly encompass a single machine, a virtual machine, or a system of machines, virtual machines, or devices that are communicatively connected or operate together. Exemplary machines include computing devices (such as personal computers, workstations, servers, portable computers, handheld devices, telephones, tablets, etc.) and transportation devices (such as private or public transportation (e.g., cars, trains, taxis, etc.)).

[0177] One or more machines may include embedded controllers (such as programmable or non-programmable logic devices or arrays, application-specific integrated circuits (ASICs), embedded computers, smart cards, etc.). One or more machines may utilize one or more connections to one or more remote machines (such as via network interfaces, modems, or other communication connections). Machines may be interconnected via physical and/or logical networks (such as intranets, the Internet, local area networks, wide area networks, etc.). Those skilled in the art will understand that network communication may utilize various wired and/or wireless short-range or long-range carriers and protocols (including radio frequency (RF), satellite, microwave, IEEE 802.11, Bluetooth, optical, infrared, cable, laser, etc.).

[0178] Embodiments of the inventive concept can be described by reference to or in conjunction with associated data including functions, processes, data structures, applications, etc., which, when accessed by a machine, cause the machine to perform tasks or define abstract data types or low-level hardware contexts. The associated data may be stored, for example, in volatile and / or non-volatile memory (e.g., RAM, ROM, etc.), or in other storage devices and their associated storage media (including hard disk drives, floppy disks, optical storage devices, magnetic tape, flash memory, memory sticks, digital video disks, bio-storage devices, etc.). The associated data may be transmitted over a transmission environment (including physical and / or logical networks) in the form of data packets, serial data, parallel data, propagated signals, etc., and may be used in compressed or encrypted formats. The associated data may be used in a distributed environment and stored locally and / or remotely for machine access.

[0179] Embodiments of the inventive concept may include a tangible, non-transitory machine-readable medium comprising instructions executable by one or more processors, the instructions including instructions for performing elements of the inventive concept as described herein.

[0180] Having described and illustrated the principles of the inventive concept with reference to the illustrated embodiments, it will be appreciated that the illustrated embodiments may be modified in arrangement and detail, and may be combined in any desired manner without departing from such principles. Furthermore, although the foregoing discussion focuses on particular embodiments, other configurations are contemplated. Specifically, even though expressions such as "according to embodiments of the inventive concept" are used herein, these phrases indicate the general applicability of the embodiments and are not intended to limit the inventive concept to particular embodiment configurations. As used herein, these terms may refer to the same or different embodiments that can be combined into other embodiments.

[0181] The foregoing illustrative embodiments should not be construed as limiting the inventive concept. Although several embodiments have been described, those skilled in the art will readily understand that numerous modifications to these embodiments may be possible without substantially departing from the novel teachings and advantages of this disclosure. Therefore, all such modifications are intended to be included within the scope of the inventive concept as defined in the claims.

[0182] Embodiments of the inventive concept can be extended to the following statements without limitation:

[0183] Statement 1. Embodiments of the inventive concept include a multi-functional storage device, comprising:

[0184] A rack;

[0185] A storage device associated with the rack, the storage device including:

[0186] A connector to receive a first message originating from a host, the first message using a first protocol;

[0187] A physical function (PF) and a virtual function (VF) exposed by the storage device via the connector;

[0188] Storage for data relating to the first message; and

[0189] A controller to manage writing write data to the storage and reading read data from the storage; and

[0190] A bridging device associated with the rack, the bridging device including:

[0191] An embedded network interface controller (eNIC) to receive a second message from the host, the second message using a second protocol;

[0192] A write buffer to store the write data to be written by the host to the storage device;

[0193] A read buffer to store the read data to be read from the storage device for the host;

[0194] A bridging circuit to convert the second message using the second protocol into the first message using the first protocol; and

[0195] A root port to identify the storage device and to send the first message to the VF,

[0196] Wherein the bridging device is configured to map the host to the VF.

[0197] Statement 2. Embodiments of the inventive concept include the multi-functional storage device according to Statement 1, wherein the storage device implements single root input / output virtualization (SR-IOV).

[0198] Statement 3. Embodiments of the inventive concept include the multi-functional storage device according to Statement 1, wherein the connector includes a Peripheral Component Interconnect Express (PCIe) connector.

[0199] Statement 4. Embodiments of the inventive concept include the multifunctional storage device according to Statement 1, wherein the bridging device is implemented using at least one of the following: a processor, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), and a general-purpose GPU (GPGPU).

[0200] Statement 5. Embodiments of the inventive concept include the multi-functional storage device according to Statement 1, wherein the connection between the eNIC and the host is at least one of a wired connection, a wireless connection, and an optical connection.

[0201] Statement 6. Embodiments of the inventive concept include the multifunctional storage device according to Statement 1, wherein the write buffer is at least one of an on-chip storage device and an external memory.

[0202] Statement 7. Embodiments of the inventive concept include the multifunctional storage device according to Statement 1, wherein the read buffer is at least one of an on-chip storage device and an external memory.

[0203] Statement 8. Embodiments of the inventive concept include the multifunctional storage device according to Statement 1, wherein:

[0204] The multifunctional storage device further includes a second storage device associated with the rack, the second storage device including:

[0205] A second connector;

[0206] A second PF and a second VF exposed by the second storage device via the second connector;

[0207] Second storage for second data; and

[0208] A second controller to manage writing second write data to the second storage and reading second read data from the second storage; and

[0209] The root port of the bridging device is configured to identify the storage device and the second storage device.

[0210] Statement 9. Embodiments of the inventive concept include the multifunctional storage device according to Statement 1, wherein:

[0211] The storage device is configured to expose the PF, VF, and second VF, and

[0212] The multi-functional storage device is configured to support a host that communicates with the storage device using a VF and a second host that communicates with the storage device using a second VF.

[0213] Statement 10. Embodiments of the inventive concept include the multifunctional storage device according to Statement 9, wherein:

[0214] The host includes a first hypervisor and a first virtual machine (VM), and

[0215] The second host includes a second hypervisor and a second VM.

[0216] Statement 11. Embodiments of the inventive concept include the multifunctional storage device according to Statement 1, wherein the storage device includes a solid-state drive (SSD).

[0217] Statement 12. Embodiments of the inventive concept include the multifunctional storage device according to Statement 11, wherein:

[0218] The storage device includes:

[0219] A Non-Volatile Memory Express (NVMe) SSD; and

[0220] A first NVMe controller associated with the PF and a second NVMe controller associated with the VF,

[0221] The first protocol includes the NVMe protocol, and

[0222] The second protocol includes the NVMe over Fabrics (NVMeoF) protocol.

[0223] Statement 13. Embodiments of the inventive concept include the multifunctional storage device according to Statement 12, wherein,

[0224] The host includes a hypervisor and a VM, and

[0225] The storage device is configured to communicate with the hypervisor using the PF and with the VM using the VF.

[0226] Statement 14. Embodiments of the inventive concept include the multi-functional storage device according to Statement 13, wherein the bridging device includes:

[0227] A first NVMeoF submission/completion queue pair for the hypervisor;

[0228] A second NVMeoF submission/completion queue pair for the VM;

[0229] A first NVMe submission/completion queue pair for the PF; and

[0230] A second NVMe submission/completion queue pair for the VF.

[0231] Statement 15. Embodiments of the inventive concept include the multifunctional storage device according to Statement 14, wherein:

[0232] The bridging device further includes a third NVMe submission/completion queue pair for the VF, and

[0233] The bridging device is configured to use the second NVMe submission/completion queue pair and the third NVMe submission/completion queue pair to satisfy a quality of service (QoS) requirement of the VM.

[0234] Statement 16. Embodiments of the inventive concept include the multifunctional storage device according to Statement 13, wherein:

[0235] The bridging device includes a configuration manager, and

[0236] The hypervisor is configured to use the configuration manager to manage the bridging device.

[0237] Statement 17. Embodiments of the inventive concept include the multi-functional storage device according to Statement 16, wherein the configuration manager is configured to manage an NVMe controller associated with the VF.

[0238] Statement 18. Embodiments of the inventive concept include the multi-functional storage device according to Statement 16, wherein the hypervisor is configured to establish QoS requirements for the VM.

[0239] Statement 19. Embodiments of the inventive concept include the multi-functional storage device according to Statement 16, wherein the hypervisor is configured to allocate resources of the storage device to the VM.

[0240] Statement 20. Embodiments of the inventive concept include the multifunctional storage device according to Statement 16, wherein the hypervisor is configured to enable or disable a storage function of the storage device.

[0241] Statement 21. Embodiments of the inventive concept include the multifunctional storage device according to Statement 16, wherein the hypervisor is configured to manage the storage device.

[0242] Statement 22. An embodiment of the inventive concept includes a multi-functional storage device according to Statement 12, wherein the bridging circuitry is configured to generate at least two first messages based at least in part on the second message for transmission to the storage device.

[0243] Statement 23. An embodiment of the inventive concept includes the multifunctional storage device according to Statement 1, wherein the bridging device operates to assign a tag to a second message.

[0244] Statement 24. An embodiment of the inventive concept includes the multi-functional storage device according to Statement 23, wherein the bridging device operates to assign a second tag to a third message, wherein the second tag is different from the first tag.

[0245] Statement 25. An embodiment of the inventive concept includes a method comprising:

[0246] Receiving, at a bridging device, a first message from a host, the first message using a first protocol;

[0247] Generating a second message based at least in part on the first message, the second message using a second protocol;

[0248] Mapping the host to a virtual function (VF) exposed by a storage device; and

[0249] Sending the second message from the bridging device to the VF exposed by the storage device,

[0250] Wherein the storage device concurrently receives a third message originating from a second host.

[0251] Statement 26. Embodiments of the inventive concept include the method according to Statement 25, wherein the bridging device is implemented using at least one of the following: a processor, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), and a general-purpose GPU (GPGPU).

[0252] Statement 27. Embodiments of the inventive concept include the method according to Statement 25, wherein the storage device implements single root input / output virtualization (SR-IOV).

[0253] Statement 28. Embodiments of the inventive concept include the method according to Statement 25, wherein the step of sending the second message from the bridging device to the VF exposed by the storage device includes: sending the second message from the bridging device to the VF exposed by the storage device via a Peripheral Component Interconnect Express (PCIe) connection.

[0254] Statement 29. An embodiment of the inventive concept includes the method according to Statement 25, wherein the step of receiving a first message from a host at the bridging device includes: receiving the first message from a host at the embedded network interface controller (eNIC) of the bridging device.

[0255] Statement 30. Embodiments of the inventive concept include the method according to Statement 29, wherein the step of receiving the first message from the host at the embedded network interface controller (eNIC) of the bridging device includes: receiving the first message from the host at the eNIC of the bridging device using at least one of a wired connection, a wireless connection, and an optical connection.

[0256] Statement 31. Embodiments of the inventive concept include the method according to Statement 25, further comprising:

[0257] Receiving, at the bridging device, a fourth message from the host, the fourth message using the first protocol;

[0258] Generating a fifth message based at least in part on the fourth message, the fifth message using the second protocol;

[0259] Mapping the host to a second VF exposed by a second storage device; and

[0260] Sending the fifth message from the bridging device to the second VF exposed by the second storage device.

[0261] Statement 32. Embodiments of the inventive concept include the method according to Statement 25, further comprising:

[0262] Receiving, at the bridging device, a fourth message from the second host, the fourth message using the first protocol;

[0263] Generating a fifth message based at least in part on the fourth message, the fifth message using the second protocol;

[0264] Mapping the host to a second VF exposed by the storage device; and

[0265] Sending the fifth message from the bridging device to the second VF exposed by the storage device.

[0266] Statement 33. Embodiments of the inventive concept include the method according to statement 25, wherein the storage device includes a solid-state drive (SSD).

[0267] Statement 34. Embodiments of the inventive concept include the method according to statement 33, wherein,

[0268] The storage device includes:

[0269] A Non-Volatile Memory Express (NVMe) SSD; and

[0270] An NVMe controller associated with the VF,

[0271] The second protocol includes the NVMe protocol, and

[0272] The first protocol includes the NVMe over Fabrics (NVMeoF) protocol.

[0273] Statement 35. Embodiments of the inventive concept include the method according to statement 34, wherein,

[0274] The host includes a hypervisor and a VM,

[0275] The step of mapping the host to the VF exposed by the storage device includes:

[0276] Mapping the VM to the VF exposed by the storage device; and

[0277] Mapping the hypervisor to a physical function (PF) exposed by the storage device, and

[0278] The method further includes:

[0279] Receiving, at the bridging device, a fifth message from the hypervisor, the fifth message using the first protocol; generating a fourth message based at least in part on the fifth message, the fourth message using the second protocol; and

[0280] Sending the fourth message from the bridging device to the PF exposed by the storage device.

[0281] Statement 36. An embodiment of the inventive concept includes the method according to statement 35, wherein the fifth message includes configuration information of at least one of the bridging device and the storage device.

[0282] Statement 37. Embodiments of the inventive concept include the method according to Statement 36, wherein the configuration information includes at least one of the following: NVMe configuration of an NVMe controller associated with a VF, resource allocation for a storage device for a VM, quality of service (QoS) requirements for the VM, instructions to enable or disable storage functionality of the storage device, and instructions to manage the storage device.

[0283] Statement 38. Embodiments of the inventive concept include the method according to statement 34, wherein the step of receiving the first message from the host at the bridging device includes:

[0284] Receiving, at the bridging device, a write request from the host, the write request using the first protocol;

[0285] Receiving, at the bridging device, data for the write request from the host; and

[0286] Buffering the data for the write request in a write buffer.

[0287] Statement 39. An embodiment of the inventive concept includes the method according to statement 38, wherein the step of sending a second message from the bridging device to a VF exposed by the storage device includes: sending data for a write request from a write buffer to the VF exposed by the storage device.

[0288] Statement 40. Embodiments of the inventive concept include the method according to Statement 34, further comprising:

[0289] Receiving, at the bridging device, a fourth message from the VF exposed by the storage device, the fourth message using the second protocol and being based at least in part on the first message;

[0290] Generating a fifth message based at least in part on the fourth message, the fifth message using the first protocol;

[0291] Mapping the VF exposed by the storage device to the host; and

[0292] Sending the fifth message from the bridging device to the host.

[0293] Statement 41. Embodiments of the inventive concept include the method according to Statement 40, wherein the step of receiving the fourth message at the bridging device from the VF exposed by the storage device includes:

[0294] Receiving, at the bridging device, a read response from the VF exposed by the storage device, the read response using the second protocol;

[0295] Receiving, at the bridging device, data for the read response from the VF exposed by the storage device; and

[0296] Buffering the data for the read response in a read buffer.

[0297] Statement 42. Embodiments of the inventive concept include the method according to Statement 41, wherein the step of sending the fifth message from the bridging device to the host includes: sending the data for the read response from the read buffer to the host.

[0298] Statement 43. Embodiments of the inventive concept include the method according to Statement 34, wherein,

[0299] The bridging device includes:

[0300] An NVMeoF submission/completion queue pair for a VM;

[0301] A first NVMe submission/completion queue pair for the VF; and

[0302] A second NVMe submission/completion queue pair for the VF, and

[0303] The method further includes enforcing a QoS requirement of the host using the first NVMe submission/completion queue pair for the VF and the second NVMe submission/completion queue pair for the VF.

[0304] Statement 44. Embodiments of the inventive concept include the method according to Statement 25, wherein,

[0305] The step of generating the second message based at least in part on the first message includes: generating a fourth message based at least in part on the first message, the fourth message using the second protocol, and

[0306] The step of sending the second message from the bridging device to the VF exposed by the storage device includes: sending the fourth message from the bridging device to the VF exposed by the storage device.

[0307] Statement 45. Embodiments of the inventive concept include the method according to Statement 25, wherein the step of receiving the first message from the host at the bridging device includes: assigning a tag to the first message.

[0308] Statement 46. An embodiment of the inventive concept includes the method according to statement 45, wherein the tag assigned to the first message is different from the second tag assigned to the third message.

[0309] Statement 47. Embodiments of the inventive concept include an apparatus comprising a non-transitory storage medium having instructions stored thereon, which, when executed by a machine, result in:

[0310] Receiving, at a bridging device, a first message from a host, the first message using a first protocol;

[0311] Generating a second message based at least in part on the first message, the second message using a second protocol;

[0312] Mapping the host to a virtual function (VF) exposed by a storage device; and

[0313] Sending the second message from the bridging device to the VF exposed by the storage device,

[0314] Wherein the storage device concurrently receives a third message originating from a second host.

[0315] Statement 48. Embodiments of the inventive concept include the apparatus according to Statement 47, wherein the bridging device is implemented using at least one of the following: a processor, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), and a general-purpose GPU (GPGPU).

[0316] Statement 49. Embodiments of the inventive concept include the apparatus according to Statement 47, wherein the storage device implements single root input / output virtualization (SR-IOV).

[0317] Statement 50. Embodiments of the inventive concept include the apparatus according to Statement 47, wherein the step of sending the second message from the bridging device to the VF exposed by the storage device includes: sending the second message from the bridging device to the VF exposed by the storage device via a Peripheral Component Interconnect Express (PCIe) connection.

[0318] Statement 51. An embodiment of the inventive concept includes the apparatus according to Statement 47, wherein the step of receiving a first message from a host at the bridging device includes: receiving the first message from the host at the embedded network interface controller (eNIC) of the bridging device.

[0319] Statement 52, embodiments of the inventive concept include the apparatus according to Statement 51, wherein the step of receiving a first message from a host at the embedded network interface controller (eNIC) of the bridging device includes: receiving the first message from the host at the embedded network interface controller (eNIC) of the bridging device using at least one of a wired connection, a wireless connection, and an optical connection.

[0320] Statement 53, embodiments of the inventive concept include the apparatus according to Statement 47, wherein a non-transitory storage medium has further instructions stored thereon, which, when executed by a machine, cause:

[0321] The fourth message, which has the first protocol, is received from the host at the bridging device.

[0322] The fifth message is generated at least in part based on the fourth message, and the fifth message uses the second protocol;

[0323] Map the host to the second VF exposed by the second storage device; and

[0324] The fifth message is sent from the bridging device to the second VF exposed by the second storage device.

[0325] Statement 54. Embodiments of the inventive concept include the apparatus according to Statement 47, wherein the non-transitory storage medium has further instructions stored thereon which, when executed by the machine, result in:

[0326] receiving a fourth message from a second host at the bridging device, the fourth message using the first protocol;

[0327] generating a fifth message based at least in part on the fourth message, the fifth message using the second protocol;

[0328] mapping the second host to a second VF exposed by the storage device; and

[0329] sending the fifth message from the bridging device to the second VF exposed by the storage device.
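
The mapping described in Statements 53 and 54 can be pictured as a small ownership table. The following is only a minimal sketch: the names `FunctionAddress` and `HostMap`, the string host identifiers, and the rule that a VF serves at most one host are assumptions made for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class FunctionAddress:
    """Hypothetical address of one VF: which storage device, which VF on it."""
    device_id: int
    vf_index: int


class HostMap:
    """Tracks which host each VF serves (assumed: at most one host per VF)."""

    def __init__(self) -> None:
        self._owner = {}  # FunctionAddress -> host id

    def bind(self, host_id: str, target: FunctionAddress) -> None:
        # setdefault returns the existing owner if the VF is already bound.
        owner = self._owner.setdefault(target, host_id)
        if owner != host_id:
            raise ValueError(f"{target} already serves {owner}")

    def functions_for(self, host_id: str) -> list:
        """Return every VF bound to the given host, in sorted order."""
        return sorted(f for f, h in self._owner.items() if h == host_id)


host_map = HostMap()
host_map.bind("host-a", FunctionAddress(device_id=0, vf_index=1))
# Statement 53 case: the same host reaches a second VF on a second storage device.
host_map.bind("host-a", FunctionAddress(device_id=1, vf_index=1))
# Statement 54 case: a second host reaches a second VF on the same storage device.
host_map.bind("host-b", FunctionAddress(device_id=0, vf_index=2))
```

Keying the table by VF rather than by host lets one host hold several VFs while still rejecting an attempt to bind a VF that another host already uses.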

[0330] Statement 55. Embodiments of the inventive concept include the apparatus according to statement 47, wherein the storage device includes a solid-state drive (SSD).

[0331] Statement 56. Embodiments of the inventive concept include the apparatus according to Statement 55, wherein:

[0332] the storage device includes:

[0333] a Non-Volatile Memory Express (NVMe) SSD; and

[0334] an NVMe controller associated with the VF;

[0335] the second protocol includes the NVMe protocol; and

[0336] the first protocol includes the NVMe over Fabrics (NVMeoF) protocol.

[0337] Statement 57. Embodiments of the inventive concept include the apparatus according to Statement 56, wherein:

[0338] the host includes a hypervisor and a virtual machine (VM);

[0339] mapping the host to the VF exposed by the storage device includes:

[0340] mapping the VM to the VF exposed by the storage device; and

[0341] mapping the hypervisor to a physical function (PF) exposed by the storage device; and

[0342] the apparatus further includes:

[0343] receiving a fifth message from the hypervisor at the bridging device, the fifth message using the first protocol;

[0344] generating a fourth message based at least in part on the fifth message, the fourth message using the second protocol; and

[0345] sending the fourth message from the bridging device to the PF exposed by the storage device.
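
The routing rule in Statement 57 (hypervisor traffic goes to the PF, VM traffic goes to the VM's VF) can be sketched as follows. The message classes, their fields, and the `bridge` function are hypothetical stand-ins; real NVMeoF capsules and NVMe commands carry far more state than a payload.

```python
from dataclasses import dataclass
from enum import Enum


class Sender(Enum):
    HYPERVISOR = "hypervisor"
    VM = "vm"


@dataclass(frozen=True)
class FabricMessage:
    """Message in the first (fabric-side) protocol; fields are illustrative."""
    sender: Sender
    sender_id: str
    payload: bytes


@dataclass(frozen=True)
class DeviceMessage:
    """Message in the second (device-side) protocol; fields are illustrative."""
    function: str  # "PF" or "VF<n>"
    payload: bytes


def bridge(msg: FabricMessage, vm_to_vf: dict) -> DeviceMessage:
    """Generate the device-side message and pick the function that receives it."""
    if msg.sender is Sender.HYPERVISOR:
        # Configuration and management traffic is directed to the physical function.
        return DeviceMessage("PF", msg.payload)
    # I/O from a VM is directed to the virtual function mapped to that VM.
    return DeviceMessage(f"VF{vm_to_vf[msg.sender_id]}", msg.payload)


vm_to_vf = {"vm1": 1, "vm2": 2}
config = bridge(FabricMessage(Sender.HYPERVISOR, "hv0", b"cfg"), vm_to_vf)
io = bridge(FabricMessage(Sender.VM, "vm2", b"io"), vm_to_vf)
```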

[0346] Statement 58. An embodiment of the inventive concept includes the apparatus according to Statement 57, wherein the fifth message includes configuration information of at least one of the bridging device and the storage device.

[0347] Statement 59. Embodiments of the inventive concept include the apparatus according to Statement 58, wherein the configuration information includes at least one of the following: NVMe configuration of an NVMe controller associated with a VF, resource allocation of a storage device for a VM, quality of service (QoS) requirements of the VM, instructions to enable or disable storage functionality of the storage device, and instructions to manage the storage device.

[0348] Statement 60. An embodiment of the inventive concept includes the apparatus according to Statement 56, wherein receiving the first message from the host at the bridging device includes:

[0349] receiving a write request from the host at the bridging device, the write request using the first protocol;

[0350] receiving data for the write request from the host at the bridging device; and

[0351] buffering the data for the write request in a write buffer.

[0352] Statement 61. An embodiment of the inventive concept includes the apparatus according to Statement 60, wherein sending the second message from the bridging device to the VF exposed by the storage device includes: sending the data for the write request from the write buffer to the VF exposed by the storage device.
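
The write path of Statements 60 and 61 amounts to staging data in the bridging device and later handing it to the VF. A minimal sketch follows, assuming a byte-counted capacity limit and integer request identifiers; neither detail comes from the disclosure, and `WriteBuffer` is a hypothetical name.

```python
class WriteBuffer:
    """Holds write data between arrival from the host and delivery to the VF."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # assumed limit, in bytes
        self._pending = {}        # request id -> data
        self._used = 0

    def stage(self, request_id: int, data: bytes) -> bool:
        """Buffer data for a write request; refuse it if it does not fit."""
        if request_id in self._pending or self._used + len(data) > self.capacity:
            return False
        self._pending[request_id] = data
        self._used += len(data)
        return True

    def drain(self, request_id: int) -> bytes:
        """Remove and return the data so it can be sent on to the VF."""
        data = self._pending.pop(request_id)
        self._used -= len(data)
        return data


buf = WriteBuffer(capacity=8)
accepted = buf.stage(1, b"abcd")     # fits
rejected = buf.stage(2, b"efghijk")  # 4 + 7 bytes would exceed the capacity
to_vf = buf.drain(1)                 # data leaves the buffer on its way to the VF
```

Returning `False` instead of raising lets the caller apply back-pressure to the host, which is one plausible reason to place a buffer in the bridging device at all.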

[0353] Statement 62. An embodiment of the inventive concept includes the apparatus according to Statement 56, wherein the non-transitory storage medium has further instructions stored thereon which, when executed by the machine, result in:

[0354] receiving a fourth message at the bridging device from the VF exposed by the storage device, the fourth message using the second protocol and being based at least in part on the first message;

[0355] generating a fifth message based at least in part on the fourth message, the fifth message using the first protocol;

[0356] mapping the VF exposed by the storage device to the host; and

[0357] sending the fifth message from the bridging device to the host.

[0358] Statement 63. Embodiments of the inventive concept include the apparatus according to Statement 62, wherein receiving the fourth message from the VF exposed by the storage device at the bridging device includes:

[0359] receiving a read response from the VF exposed by the storage device at the bridging device, the read response using the second protocol;

[0360] receiving data for the read response from the VF exposed by the storage device at the bridging device; and

[0361] buffering the data for the read response in a read buffer.

[0362] Statement 64. An embodiment of the inventive concept includes the apparatus according to Statement 63, wherein sending the fifth message from the bridging device to the host includes: sending the data for the read response from the read buffer to the host.
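
The return path of Statements 62 to 64 reverses the earlier mapping: a response arrives from a VF, is attributed to the host that VF serves, is buffered, and is then sent on. The sketch below assumes a FIFO read buffer and a caller-supplied `send` callback; `ReturnPath` and both of those choices are illustrative assumptions.

```python
from collections import deque


class ReturnPath:
    """Carries read responses from a VF back to the host that the VF serves."""

    def __init__(self, vf_to_host: dict) -> None:
        self.vf_to_host = vf_to_host
        self.read_buffer = deque()  # FIFO of (host id, data) pairs

    def on_read_response(self, vf_index: int, data: bytes) -> None:
        host = self.vf_to_host[vf_index]       # map the VF back to its host
        self.read_buffer.append((host, data))  # buffer the read data

    def flush(self, send) -> int:
        """Send every buffered response in arrival order; return the count."""
        sent = 0
        while self.read_buffer:
            host, data = self.read_buffer.popleft()
            send(host, data)
            sent += 1
        return sent


delivered = []
path = ReturnPath({1: "host-a", 2: "host-b"})
path.on_read_response(2, b"blk7")
path.on_read_response(1, b"blk3")
count = path.flush(lambda host, data: delivered.append((host, data)))
```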

[0363] Statement 65. Embodiments of the inventive concept include the apparatus according to Statement 56, wherein:

[0364] the bridging device includes:

[0365] an NVMeoF submission/completion queue pair for the VM;

[0366] a first NVMe submission/completion queue pair for the VF; and

[0367] a second NVMe submission/completion queue pair for the VF; and

[0368] the apparatus further includes: enforcing QoS requirements of the host using the first NVMe submission/completion queue pair for the VF and the second NVMe submission/completion queue pair for the VF.
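
Statement 65 does not say how two NVMe queue pairs for the same VF enforce QoS. One plausible reading, assumed here purely for illustration, is that urgent and bulk commands are placed in separate queues and served by weighted arbitration. The class name, the `urgent` flag, and the default weight of 3 are all hypothetical.

```python
from collections import deque


class TwoQueueScheduler:
    """Serves two queues for one VF with a fixed high-to-low service ratio."""

    def __init__(self, high_weight: int = 3) -> None:
        self.high = deque()  # stands in for the first NVMe queue pair
        self.low = deque()   # stands in for the second NVMe queue pair
        self.high_weight = high_weight

    def submit(self, command: str, urgent: bool) -> None:
        (self.high if urgent else self.low).append(command)

    def dispatch_order(self) -> list:
        """Drain both queues: up to high_weight urgent commands, then one bulk."""
        order = []
        while self.high or self.low:
            for _ in range(self.high_weight):
                if not self.high:
                    break
                order.append(self.high.popleft())
            if self.low:
                order.append(self.low.popleft())
        return order


sched = TwoQueueScheduler(high_weight=3)
for name in ("h1", "h2", "h3", "h4"):
    sched.submit(name, urgent=True)
for name in ("l1", "l2"):
    sched.submit(name, urgent=False)
order = sched.dispatch_order()
```

Serving one bulk command after every burst of urgent ones keeps the second queue from starving while still favoring the first.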

[0369] Statement 66. Embodiments of the inventive concept include the apparatus according to Statement 47, wherein:

[0370] generating the second message based at least in part on the first message includes: generating a fourth message based at least in part on the first message, the fourth message using the second protocol; and

[0371] sending the second message from the bridging device to the VF exposed by the storage device includes: sending the fourth message from the bridging device to the VF exposed by the storage device.
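
Statement 66 covers the case where one message from the host yields more than one message to the VF. The disclosure does not state why this happens; the sketch below assumes, for illustration only, that a large request is split to respect a maximum transfer size. The function name and its block-based parameters are hypothetical.

```python
def split_request(start_lba: int, block_count: int, max_blocks: int) -> list:
    """Split one request into (start_lba, block_count) pieces of bounded size."""
    if max_blocks <= 0:
        raise ValueError("max_blocks must be positive")
    parts = []
    offset = 0
    while offset < block_count:
        n = min(max_blocks, block_count - offset)
        parts.append((start_lba + offset, n))
        offset += n
    return parts


# A 10-block request with a 4-block limit becomes three device-side messages.
pieces = split_request(start_lba=100, block_count=10, max_blocks=4)
```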

[0372] Statement 67. An embodiment of the inventive concept includes the apparatus according to statement 47, wherein the step of receiving a first message from a host at a bridging device includes: assigning a tag to the first message.

[0373] Statement 68. An embodiment of the inventive concept includes the apparatus according to statement 67, wherein the tag assigned to the first message is different from the second tag assigned to the third message.
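
Statements 67 and 68 require only that messages receive distinct tags. A minimal sketch of one way to do that follows, assuming a fixed pool of integer tags that are recycled on completion; the pool, the `TagAllocator` name, and the stored host identifier are assumptions, not details from the disclosure.

```python
class TagAllocator:
    """Hands out tags so that no two in-flight messages share one."""

    def __init__(self, size: int) -> None:
        self._free = list(range(size - 1, -1, -1))  # pop() yields 0 first
        self._in_flight = {}                        # tag -> host id

    def assign(self, host_id: str) -> int:
        if not self._free:
            raise RuntimeError("no free tags")
        tag = self._free.pop()
        self._in_flight[tag] = host_id
        return tag

    def complete(self, tag: int) -> str:
        """Release the tag and report which host the message came from."""
        host_id = self._in_flight.pop(tag)
        self._free.append(tag)
        return host_id


tags = TagAllocator(size=4)
first_tag = tags.assign("host-a")  # tag for the first message
third_tag = tags.assign("host-b")  # tag for a third message from a second host
origin = tags.complete(third_tag)
```

Recording the host against the tag is what lets a response that comes back from a VF be matched to the request, and therefore the host, that caused it.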

[0374] Therefore, given the various arrangements of the embodiments described herein, this detailed description and the appended materials are intended to be illustrative only and should not be considered as limiting the scope of the inventive concept. Thus, the claimed inventive concept encompasses all such modifications that are possible within the scope and spirit of the claims and their equivalents.

Claims

1. A multi-functional storage device, comprising:
a chassis;
a storage device associated with the chassis, including:
a connector for receiving a first message originating from a host using a first protocol, the host including a hypervisor and a virtual machine;
a physical function and a virtual function exposed by the storage device via the connector;
storage for data related to the first message; and
a controller for managing the writing of write data to the storage and the reading of read data from the storage; and
a bridging device associated with the chassis, including:
a first NVMeoF submission/completion queue pair for the hypervisor;
a second NVMeoF submission/completion queue pair for the virtual machine;
a first NVMe submission/completion queue pair for the physical function;
a second NVMe submission/completion queue pair for the virtual function;
an embedded network interface controller for receiving a second message from the host using a second protocol;
a write buffer for storing write data to be written by the host to the storage device;
a read buffer for storing read data to be read by the host from the storage device;
a bridging circuit for converting the second message using the second protocol into the first message using the first protocol; and
a root port for identifying the storage device and sending the first message to the virtual function,
wherein the bridging device is configured to map the hypervisor to the physical function exposed by the storage device and the virtual machine to the virtual function exposed by the storage device, and
wherein the storage device is configured to communicate with the hypervisor using the physical function and with the virtual machine using the virtual function.

2. The multi-functional storage device according to claim 1, wherein the storage device implements single root input/output virtualization (SR-IOV).

3. The multi-functional storage device according to claim 1, wherein:
the storage device is configured to expose the physical function, the virtual function, and a second virtual function, and
the multi-functional storage device is configured to support the host, which communicates with the storage device using the virtual function, and a second host, which communicates with the storage device using the second virtual function.

4. The multi-functional storage device according to claim 1, wherein the storage device includes a solid-state drive.

5. The multi-functional storage device according to claim 4, wherein:
the storage device includes:
a Non-Volatile Memory Express (NVMe) solid-state drive; and
a first NVMe controller associated with the physical function and a second NVMe controller associated with the virtual function,
the first protocol includes the NVMe protocol, and
the second protocol includes the NVMe over Fabrics (NVMeoF) protocol.

6. The multi-functional storage device according to claim 1, wherein:
the bridging device further includes a third NVMe submission/completion queue pair for the virtual function, and
the bridging device is configured to use the second NVMe submission/completion queue pair and the third NVMe submission/completion queue pair to meet quality of service requirements of the virtual machine.

7. The multi-functional storage device according to claim 5, wherein the bridging circuit is configured to generate at least two first messages based at least in part on the second message and to send the at least two first messages to the storage device.

8. The multi-functional storage device according to any one of claims 1 to 7, wherein the bridging device is operative to assign a tag to the second message.

9. A method for processing messages, comprising:
receiving a first message from a host at a bridging device of a multi-functional storage device, the first message using a first protocol, wherein the host includes a hypervisor and a virtual machine, and the first message is placed in a submission queue of an NVMeoF submission/completion queue pair of the bridging device;
generating, by the bridging device, a second message based at least in part on the first message, the second message using a second protocol;
mapping, by the bridging device, the host to a virtual function exposed by a storage device within the multi-functional storage device; and
sending the second message from the bridging device to the virtual function exposed by the storage device using a submission queue of an NVMe submission/completion queue pair of the bridging device,
wherein the storage device simultaneously receives a third message from a second host,
wherein mapping the host to the virtual function exposed by the storage device includes:
mapping the virtual machine to the virtual function exposed by the storage device; and
mapping the hypervisor to a physical function exposed by the storage device, and
wherein the storage device is configured to communicate with the hypervisor using the physical function and with the virtual machine using the virtual function.

10. The method according to claim 9, wherein:
the storage device includes:
a Non-Volatile Memory Express (NVMe) solid-state drive; and
an NVMe controller associated with the virtual function,
the second protocol includes the NVMe protocol, and
the first protocol includes the NVMe over Fabrics (NVMeoF) protocol.

11. The method according to claim 10, further comprising:
receiving a fifth message from the hypervisor at the bridging device, the fifth message using the first protocol;
generating a fourth message based at least in part on the fifth message, the fourth message using the second protocol; and
sending the fourth message from the bridging device to the physical function exposed by the storage device.

12. The method according to claim 10, wherein receiving the first message from the host at the bridging device of the multi-functional storage device includes:
receiving a write request from the host at the bridging device, the write request using the first protocol;
receiving data for the write request from the host at the bridging device; and
buffering the data for the write request in a write buffer of the bridging device.

13. The method according to claim 10, further comprising:
receiving, at the bridging device, a fourth message from the virtual function exposed by the storage device, the fourth message using the second protocol and being based at least in part on the first message;
generating, by the bridging device, a fifth message based at least in part on the fourth message, the fifth message using the first protocol;
mapping, by the bridging device, the virtual function exposed by the storage device to the host; and
sending, by the bridging device, the fifth message to the host.

14. The method according to claim 13, wherein receiving the fourth message from the virtual function exposed by the storage device at the bridging device includes:
receiving a read response from the virtual function exposed by the storage device at the bridging device, the read response using the second protocol;
receiving data for the read response from the virtual function exposed by the storage device at the bridging device; and
buffering the data for the read response in a read buffer of the bridging device.

15. The method according to claim 10, wherein:
the bridging device includes:
an NVMeoF submission/completion queue pair for the virtual machine;
a first NVMe submission/completion queue pair for the virtual function; and
a second NVMe submission/completion queue pair for the virtual function, and
the method further includes enforcing quality of service requirements of the host using the first NVMe submission/completion queue pair for the virtual function and the second NVMe submission/completion queue pair for the virtual function.

16. The method according to any one of claims 9 to 15, wherein receiving the first message from the host at the bridging device includes: assigning a tag to the first message.

17. An apparatus comprising a non-transitory storage medium having instructions stored thereon, the instructions, when executed by a machine, causing the machine to perform a method of processing a message, the method comprising:
receiving a first message from a host at a bridging device of a multi-functional storage device, the first message using a first protocol, wherein the host includes a hypervisor and a virtual machine, and the first message is placed in a submission queue of an NVMeoF submission/completion queue pair of the bridging device;
generating, by the bridging device, a second message based at least in part on the first message, the second message using a second protocol;
mapping, by the bridging device, the host to a virtual function exposed by a storage device within the multi-functional storage device; and
sending the second message from the bridging device to the virtual function exposed by the storage device using a submission queue of an NVMe submission/completion queue pair of the bridging device,
wherein the storage device simultaneously receives a third message from a second host,
wherein mapping the host to the virtual function exposed by the storage device includes:
mapping the virtual machine to the virtual function exposed by the storage device; and
mapping the hypervisor to a physical function exposed by the storage device, and
wherein the storage device is configured to communicate with the hypervisor using the physical function and with the virtual machine using the virtual function.

18. The apparatus according to claim 17, wherein the method further includes:
receiving, at the bridging device, a fourth message from the virtual function exposed by the storage device, the fourth message using the second protocol and being based at least in part on the first message;
generating, by the bridging device, a fifth message based at least in part on the fourth message, the fifth message using the first protocol;
mapping, by the bridging device, the virtual function exposed by the storage device to the host; and
sending, by the bridging device, the fifth message to the host.