An inter-card communication method, a data processing unit, a network card, a medium and a product

By establishing a direct inter-card management channel within the DPU and utilizing the P2P communication capability of the PCIe bus, DMA data transmission between the DPU and the NIC is achieved. This solves the problem that CPU-based network card management in AI servers consumes CPU computing power and offers low bandwidth, and it realizes efficient inter-card data transmission.

CN122293618A · Pending · Publication Date: 2026-06-26 · SHENZHEN JAGUAR MICROSYSTEMS CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHENZHEN JAGUAR MICROSYSTEMS CO LTD
Filing Date
2026-05-26
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

In existing AI servers, network card management is handled by the CPU, which consumes computing resources and has low bandwidth, making it impossible to achieve high-bandwidth, bidirectional, and flexible data transmission between cards.

Method used

By establishing a direct card-to-card management channel within the Data Processing Unit (DPU) and utilizing the P2P communication capability of the PCIe bus, DMA data transfer between the DPU and NIC is achieved, bypassing host memory and CPU intervention, and employing a custom communication channel design.

Benefits of technology

It frees up CPU computing resources, significantly improves data throughput, meets the low latency requirements for large-capacity data transmission in AI scenarios, and has a bandwidth of up to 64GB/s.

✦ Generated by Eureka AI based on patent content.


Abstract

This application relates to an inter-card communication method, a data processing unit, a network interface card (NIC), a medium, and a product. The data processing unit includes a first physical functional device (PVM). The first BAR space of the first PVM includes at least the physical address of the second PVM of the first NIC, a first transmit queue producer pointer, a first transmit queue consumer pointer, and a first data buffer to be transmitted. The inter-card communication method includes: the data processing unit acquiring first data to be transmitted; writing the first data into the first data buffer to be transmitted according to the first transmit queue producer pointer value and the first transmit queue consumer pointer value; updating the first transmit queue producer pointer value according to the storage location of the first data; and writing the updated first transmit queue producer pointer value into the first NIC according to the physical address of the second PVM to notify the first NIC to read the first data. This application enables high-bandwidth transmission between the DPU and the NIC.

Description

Technical Field

[0001] This application relates to the field of computer communication technology, specifically to an inter-card communication method, a data processing unit, a network card, a computer-readable storage medium, and a computer program product.

Background Technology

[0002] With the rapid development of artificial intelligence (AI) technology, the computing power demand of AI servers is increasing day by day. As AI application scenarios continue to upgrade and iterate, they pose new challenges to the control plane management channel. One such challenge is increased bandwidth requirements: new application scenarios require the transmission of large amounts of data, such as the "parent-child machine mapping table" (containing a large amount of IP address mapping information) and various statistical information inside the NIC.

[0003] In a typical AI server architecture, multiple graphics processing units (GPUs) and network interface cards (NICs) are usually installed. For example, a common one-machine-eight-NIC topology includes a server configured with eight GPUs and eight NICs.

[0004] Currently, NIC management is typically handled by a management program (MGMT) running on the central processing unit (CPU). This management program accesses the NIC's control plane through the PCIe switch to perform read and write operations on the NIC's internal registers, as shown by path ① in Figure 1. This access method has three drawbacks. First, it places the management program on the host CPU, which consumes the CPU's valuable computing resources. Second, each access can only be a 4B register read/write, so the bandwidth is too low. Third, each access can only be initiated by the management program on the host CPU acting as the Master, so the managed NIC cannot actively send data to the management program.

[0005] Therefore, there is an urgent need for a new card management channel solution that can support high bandwidth and flexible bidirectional transmission.

Summary of the Invention

[0006] The purpose of this application is to propose a card-to-card communication method, device, and server to at least support high-bandwidth transmission between the DPU and NIC.

[0007] According to a first aspect of this application, an embodiment of this application proposes an inter-card communication method applied to a data processing unit. The data processing unit includes a first physical function device, the first physical function device includes a first BAR space, and the first BAR space includes at least a first physical address storage space, a first transmit queue producer pointer storage space, a first transmit queue consumer pointer storage space, and a first data buffer to be transmitted. The first physical address storage space is used to store the physical address of the second physical function device of the first network card, the first transmit queue producer pointer storage space is used to store the first transmit queue producer pointer value, the first transmit queue consumer pointer storage space is used to store the first transmit queue consumer pointer value, and the first data buffer to be transmitted is used to store data to be transmitted. The method includes:
Obtaining first data to be transmitted, and writing the first data into the first data buffer to be transmitted according to the first transmit queue producer pointer value and the first transmit queue consumer pointer value;
Updating the first transmit queue producer pointer value according to the storage location of the first data in the first data buffer to be transmitted;
Writing the updated first transmit queue producer pointer value into the second physical function device of the first network card according to the physical address of the second physical function device, so as to notify the first network card to read the first data; wherein the first transmit queue consumer pointer value is updated by the first network card after it reads at least a portion of the first data.
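The transmit steps of the first aspect can be illustrated with a minimal software sketch. It is a simulation under stated assumptions, not the patented implementation: the names `DpuBar`, `NicBar`, `dpu_send`, and `RING_SIZE` are hypothetical, the pointers are assumed to increase monotonically with the buffer index taken modulo the ring size, and the PCIe P2P write to the network card is modelled as a plain attribute assignment.

```python
from dataclasses import dataclass, field

RING_SIZE = 16  # bytes; hypothetical size, kept small for illustration


@dataclass
class NicBar:
    """Stand-in for the second physical function device of the first network card."""
    tx_producer: int = 0  # NIC-side copy of the producer pointer


@dataclass
class DpuBar:
    """Stand-in for the first BAR space of the data processing unit."""
    tx_producer: int = 0
    tx_consumer: int = 0  # updated by the NIC after it reads data
    tx_buffer: bytearray = field(default_factory=lambda: bytearray(RING_SIZE))


def free_space(bar: DpuBar) -> int:
    # Assumption: pointers grow monotonically; the ring index is pointer % RING_SIZE.
    return RING_SIZE - (bar.tx_producer - bar.tx_consumer)


def dpu_send(bar: DpuBar, nic: NicBar, data: bytes) -> bool:
    """Write data into the ring, advance the producer pointer, notify the NIC."""
    if len(data) > free_space(bar):
        return False  # no room until the NIC advances the consumer pointer
    for i, byte in enumerate(data):
        bar.tx_buffer[(bar.tx_producer + i) % RING_SIZE] = byte
    bar.tx_producer += len(data)
    # In hardware this would be a write to the NIC's stored physical address.
    nic.tx_producer = bar.tx_producer
    return True
```

In this sketch a send that does not fit is simply rejected; once the consumer pointer advances, the same call succeeds and the write wraps around the end of the buffer.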

[0008] In some specific implementations, the first BAR space further includes a first receive queue producer pointer storage space, a first receive queue consumer pointer storage space, and a first data buffer to be received. The first receive queue producer pointer storage space is used to store the first receive queue producer pointer value, the first receive queue consumer pointer storage space is used to store the first receive queue consumer pointer value, and the first data buffer to be received is used to store the data to be received. The method further includes:
Updating the first receive queue consumer pointer value according to the data reading status of the first data buffer to be received, and writing the updated first receive queue consumer pointer value into the second physical function device of the first network card to notify the first network card to send data;
In response to the first network card updating the first receive queue producer pointer value, reading second data from the first data buffer to be received according to the updated first receive queue producer pointer value and the first receive queue consumer pointer value;
Updating the first receive queue consumer pointer value according to the storage location of the second data in the first data buffer to be received;
Writing the updated first receive queue consumer pointer value into the second physical function device of the first network card according to the physical address of the second physical function device, so that the first network card determines, according to the updated first receive queue consumer pointer value, that the second data has been received by the data processing unit.
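The receive direction reverses the roles: the network card is the producer and the data processing unit is the consumer. The following sketch is a simulation only, under the same assumptions of monotonically increasing pointers and a hypothetical ring size; `DpuRxRing`, `NicState`, `nic_dma_write`, and `dpu_receive` are illustrative names that do not come from the application.

```python
RING_SIZE = 8  # bytes; hypothetical


class DpuRxRing:
    """Stand-in for the receive queue fields of the first BAR space."""

    def __init__(self):
        self.buffer = bytearray(RING_SIZE)
        self.producer = 0  # written by the NIC
        self.consumer = 0  # advanced by the DPU


class NicState:
    """NIC-side copy of the consumer pointer, written by the DPU."""

    def __init__(self):
        self.rx_consumer = 0


def nic_dma_write(ring: DpuRxRing, nic: NicState, payload: bytes) -> bool:
    """Simulate the NIC writing data into the DPU buffer and updating the producer."""
    free = RING_SIZE - (ring.producer - nic.rx_consumer)
    if len(payload) > free:
        return False
    for i, byte in enumerate(payload):
        ring.buffer[(ring.producer + i) % RING_SIZE] = byte
    ring.producer += len(payload)
    return True


def dpu_receive(ring: DpuRxRing, nic: NicState) -> bytes:
    """Read everything between consumer and producer, then report the new consumer."""
    count = ring.producer - ring.consumer
    data = bytes(ring.buffer[(ring.consumer + i) % RING_SIZE] for i in range(count))
    ring.consumer += count
    nic.rx_consumer = ring.consumer  # stands in for the write to the NIC
    return data
```

Because the network card only sees the consumer pointer value that the data processing unit wrote back, it cannot overwrite data that has not yet been read.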

[0009] In some specific implementations, the first physical function device further includes a first CFG space, which stores the vendor ID, device ID, and physical address of the first physical function device. The method further includes:
Running the firmware to generate the first physical function device;
Exposing the first physical function device to the host, receiving and responding to the host's read request to access the first CFG space, and sending the vendor ID and device ID of the first physical function device to the host;
Receiving and parsing the host's write request to access the first CFG space to obtain the physical address of the first physical function device, and writing the physical address of the first physical function device into the first CFG space; wherein the physical address of the first physical function device is obtained by the host through system address space allocation based on the vendor ID and device ID of the first physical function device;
Receiving and responding to the host's read request to access the first CFG space, and sending the physical address of the first physical function device in the first CFG space to the host, so that the host writes the physical address of the first physical function device into the second physical function device of the first network card;
Receiving and parsing the host's write request to access the first BAR space to obtain the physical address of the first network card, and writing the physical address of the first network card into the first BAR space.
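The address exchange described above can be pictured as a two-phase handshake driven by the host. The sketch below is a toy model, not real PCIe enumeration: `PfDevice`, `host_enumerate`, `host_exchange`, the ID values, and the base address are all hypothetical, and alignment rules for real BAR allocation are ignored.

```python
class PfDevice:
    """Toy physical function device with a CFG space and a BAR space."""

    def __init__(self, vendor_id: int, device_id: int, bar_size: int):
        self.cfg = {"vendor_id": vendor_id, "device_id": device_id, "bar_addr": None}
        self.bar = {"peer_addr": None}
        self.bar_size = bar_size


def host_enumerate(devices, base=0x1000_0000):
    """Phase 1: the host reads the IDs and assigns each device a physical address."""
    next_addr = base
    for dev in devices:
        _ids = (dev.cfg["vendor_id"], dev.cfg["device_id"])  # CFG read
        dev.cfg["bar_addr"] = next_addr                      # CFG write
        next_addr += dev.bar_size


def host_exchange(a: PfDevice, b: PfDevice):
    """Phase 2: the host reads each address back and writes it into the peer's BAR."""
    a.bar["peer_addr"] = b.cfg["bar_addr"]
    b.bar["peer_addr"] = a.cfg["bar_addr"]


dpu_pf = PfDevice(vendor_id=0x1234, device_id=0x0001, bar_size=0x10000)
nic_pf = PfDevice(vendor_id=0x1234, device_id=0x0002, bar_size=0x4000)
host_enumerate([dpu_pf, nic_pf])
host_exchange(dpu_pf, nic_pf)
```

After the exchange each side holds the other's physical address, which is what later allows pointer updates to be written directly to the peer without going through the host.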

[0010] In some specific implementations, the first data includes multiple data packets; The method includes: During the process of continuously writing multiple data packets of the first data into the first data buffer to be sent, the first sending queue producer pointer value is updated once for each data packet written. If the distance between the first send queue producer pointer value and the first send queue consumer pointer value is less than or equal to a preset distance threshold, then data writing is paused, and the first send queue producer pointer value is written into the second physical function device of the first network card according to the physical address of the second physical function device. When the distance between the first send queue producer pointer value and the first send queue consumer pointer value is greater than the preset distance threshold, the remaining data is written again. If N data packets have been written consecutively, the first sending queue producer pointer value is written into the second physical function device of the first network card according to the physical address of the second physical function device; where N is a preset quantity threshold.
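The two notification conditions in this implementation (a distance threshold and a packet count N) can be sketched as follows. This is one possible reading, with assumptions stated: the "distance" is interpreted as the free space remaining before the producer would catch up with the consumer, the consumer pointer is held fixed during one burst, and a final notification is issued for any packets left unannounced at the end of the data. The function name and parameters are hypothetical.

```python
def write_burst(packet_sizes, ring_size, producer, consumer,
                distance_threshold, n_threshold):
    """Return (packets_written, new_producer, notified_producer_values)."""
    notifications = []
    written = 0
    since_notify = 0
    for size in packet_sizes:
        free = ring_size - (producer - consumer)
        if free <= distance_threshold or size > free:
            break  # paused: wait for the NIC to advance the consumer pointer
        producer += size  # producer pointer is updated once per packet
        written += 1
        since_notify += 1
        free = ring_size - (producer - consumer)
        if free <= distance_threshold:
            notifications.append(producer)  # notify the NIC, then pause writing
            since_notify = 0
            break
        if since_notify == n_threshold:
            notifications.append(producer)  # N packets written consecutively
            since_notify = 0
    if since_notify:
        notifications.append(producer)  # assumption: flush at end of data
    return written, producer, notifications


first = write_burst([10] * 10, ring_size=64, producer=0, consumer=0,
                    distance_threshold=16, n_threshold=2)
resumed = write_burst([10] * 5, ring_size=64, producer=50, consumer=40,
                      distance_threshold=16, n_threshold=2)
```

Batching the notification every N packets reduces the number of small writes to the network card, while the distance threshold keeps the producer from overrunning data that has not yet been read.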

[0011] In some specific implementations, writing the first data into the first data to be sent buffer based on the first send queue producer pointer value and the first send queue consumer pointer value includes: Generate a corresponding record header for each data packet of the first data, and concatenate the generated record header with the payload of the corresponding data packet and write it into the first data buffer to be sent. The record header includes at least one of a flag bit, a data packet sequence number, a record length, a true length, and a timestamp; the flag bit is used to carry out-of-band information; the data packet sequence number is used by the data receiver to determine whether there is packet loss; the record length is used to indicate the length of the data block; the true length is used to indicate the true length of the payload; and the timestamp is used to record the time information of the data packet.
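As an illustration of the record header, the sketch below packs the five fields in front of a payload. The application does not give field widths or ordering here, so the layout (four 32-bit fields followed by a 64-bit timestamp, little-endian), the 8-byte padding of the data block, and the names `HEADER`, `ALIGN`, and `build_record` are all assumptions made for the example.

```python
import struct

# Hypothetical layout: flags, sequence number, record length, true length, timestamp.
HEADER = struct.Struct("<IIIIQ")
ALIGN = 8  # assumed alignment of each data block in the buffer


def build_record(seq: int, payload: bytes, flags: int = 0, timestamp: int = 0) -> bytes:
    """Concatenate a record header with the payload, padded to ALIGN bytes."""
    padded_len = -(-len(payload) // ALIGN) * ALIGN  # round up to a multiple of ALIGN
    record_len = HEADER.size + padded_len           # length of the whole data block
    header = HEADER.pack(flags, seq, record_len, len(payload), timestamp)
    return header + payload + bytes(padded_len - len(payload))


record = build_record(seq=7, payload=b"hello", flags=1, timestamp=123)
```

Carrying both a record length and a true length lets the receiver step from block to block using the padded size while still recovering the exact payload.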

[0012] In some specific implementations, the second data includes multiple data packets; The method includes: During the process of continuously reading multiple data packets of the second data from the first data buffer to be received, the consumer pointer value of the first receiving queue is updated once for each data packet read. If the distance between the first receiving queue producer pointer value and the first receiving queue consumer pointer value is less than or equal to a preset distance threshold, then data reading is paused, and the first receiving queue consumer pointer value is written into the second physical function device of the first network card according to the physical address of the second physical function device. When the distance between the first receiving queue producer pointer value and the first receiving queue consumer pointer value is greater than the preset distance threshold, the remaining data is read again. If N data packets have been read consecutively, the first receive queue consumer pointer value is written into the second physical function device of the first network card according to the physical address of the second physical function device; where N is a preset quantity threshold.

[0013] In some specific implementations, reading the second data from the first data buffer based on the updated first receive queue producer pointer value and the first receive queue consumer pointer value includes: When reading any data packet of the second data, the record header of the data packet is extracted; wherein, the record header includes at least one of a flag bit, a data packet sequence number, a record length, a true length, and a timestamp; the flag bit is used to carry out-of-band information of the data; the data packet sequence number is used by the data receiver to determine whether there is packet loss; the record length is used to indicate the length of the data block; the true length is used to indicate the true length of the payload; the timestamp is used to record the time information of the data packet; The storage boundary of the data block is determined based on the record length in the record header to read the payload, and / or the payload is extracted from the data block based on the actual length in the record header, and / or the packet sequence number in the record header is used to determine whether there is packet loss, and / or the out-of-band information of the data is obtained based on the flag bit in the record header.
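The receiver-side use of the record header can be sketched as a small parser. The header layout is the same hypothetical one (four 32-bit fields plus a 64-bit timestamp, little-endian), which the application does not specify; `parse_records` and its return shape are illustrative only.

```python
import struct

# Hypothetical layout: flags, sequence number, record length, true length, timestamp.
HEADER = struct.Struct("<IIIIQ")


def parse_records(buf: bytes, expected_seq: int = 0):
    """Return ([(flags, payload), ...], [missing sequence numbers])."""
    payloads, lost = [], []
    offset = 0
    while offset + HEADER.size <= len(buf):
        flags, seq, record_len, true_len, _ts = HEADER.unpack_from(buf, offset)
        # Stop on a malformed or incomplete record rather than reading past it.
        if record_len < HEADER.size + true_len or offset + record_len > len(buf):
            break
        if seq > expected_seq:
            lost.extend(range(expected_seq, seq))  # gap in sequence numbers
        start = offset + HEADER.size
        payloads.append((flags, buf[start:start + true_len]))  # true length trims padding
        expected_seq = seq + 1
        offset += record_len  # record length gives the storage boundary
    return payloads, lost
```

A gap between the expected and the received sequence number is reported as loss, and the flag value is returned with each payload so the caller can act on the out-of-band information.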

[0014] In some specific implementations, the first BAR space further includes a second physical address storage space, a second transmit queue producer pointer storage space, a second transmit queue consumer pointer storage space, a second receive queue producer pointer storage space, a second receive queue consumer pointer storage space, a second data to be transmitted buffer, and a second data to be received buffer. The second physical address storage space is used to store the physical address of the third physical function device of the second network card. The second transmit queue producer pointer storage space is used to store the second transmit queue producer pointer value. The second transmit queue consumer pointer storage space is used to store the second transmit queue consumer pointer value. The second receive queue producer pointer storage space is used to store the second receive queue producer pointer value. The second receive queue consumer pointer storage space is used to store the second receive queue consumer pointer value. The second data to be transmitted buffer is used to store data to be transmitted. The second data to be received buffer is used to store data to be received. 
The method includes: Obtain the third data to be sent, and write the third data into the second data to be sent buffer according to the producer pointer value of the second sending queue and the consumer pointer value of the second sending queue; Update the second sending queue producer pointer value according to the storage location of the third data in the second data to be sent buffer; The updated second transmit queue producer pointer value is written into the third physical function device of the second network interface card (NIC) according to the physical address of the third physical function device, so as to notify the second NIC to read the third data according to the updated second transmit queue producer pointer value; wherein, the second transmit queue consumer pointer value is updated by the second NIC after reading at least a portion of the third data; In response to the second network interface card updating the second receive queue producer pointer value, the fourth data is read from the second data to be received buffer according to the updated second receive queue producer pointer value and the second receive queue consumer pointer value; Update the consumer pointer value of the second receiving queue according to the storage location of the fourth data in the second data to be received buffer; The updated second receive queue consumer pointer value is written into the second network card according to the physical address of the third physical function device, so as to notify the second network card to determine, based on the updated second receive queue consumer pointer value, that at least a portion of the fourth data has been received by the data processing unit.

[0015] According to a second aspect of this application, an embodiment of this application proposes an inter-card communication method applied to a first network interface card (NIC). The first NIC includes a second physical function device (PVM). The second PVM of the first NIC includes a second BAR space. The second BAR space includes at least a third physical address storage space, a third send queue producer pointer storage space, and a third send queue consumer pointer storage space. The third physical address storage space is used to store the physical address of the first PVM of the data processing unit. The third send queue producer pointer storage space is used to store the third send queue producer pointer value. The third send queue consumer pointer storage space is used to store the third send queue consumer pointer value. In response to the data processing unit updating the third sending queue producer pointer value, the first data is read from the first data to be sent buffer of the data processing unit according to the physical address of the first physical function device, the third sending queue producer pointer value, and the third sending queue consumer pointer value; The third sending queue consumer pointer value is updated according to the storage location of the first data in the first data to be sent buffer. The updated third send queue consumer pointer value is written into the first physical function device of the data processing unit to notify the data processing unit that at least a portion of the first data has been read by the first network card.
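From the network card's point of view, the same exchange looks like this: a producer pointer update arrives, the card uses the stored physical address of the first physical function device to locate the buffer, reads the new data, and writes its consumer pointer back. The sketch is a simulation in which a dictionary stands in for the system address space; `DpuPf`, `NicPf`, `on_producer_update`, and the address value are hypothetical.

```python
RING_SIZE = 16  # bytes; hypothetical


class DpuPf:
    """Stand-in for the first physical function device of the data processing unit."""

    def __init__(self):
        self.tx_buffer = bytearray(RING_SIZE)
        self.tx_consumer = 0  # written back by the NIC


class NicPf:
    """Stand-in for the second physical function device of the first network card."""

    def __init__(self, bus: dict, dpu_phys_addr: int):
        self.bus = bus                      # toy address space: address -> device
        self.dpu_phys_addr = dpu_phys_addr  # third physical address storage space
        self.tx_producer = 0                # third transmit queue producer pointer
        self.tx_consumer = 0                # third transmit queue consumer pointer

    def on_producer_update(self, new_producer: int) -> bytes:
        """Read newly produced bytes from the DPU buffer and report progress."""
        self.tx_producer = new_producer
        dpu = self.bus[self.dpu_phys_addr]
        count = self.tx_producer - self.tx_consumer
        data = bytes(dpu.tx_buffer[(self.tx_consumer + i) % RING_SIZE]
                     for i in range(count))
        self.tx_consumer += count
        dpu.tx_consumer = self.tx_consumer  # stands in for the write to the DPU
        return data


dpu = DpuPf()
nic = NicPf(bus={0x1000_0000: dpu}, dpu_phys_addr=0x1000_0000)
dpu.tx_buffer[0:4] = b"ping"
```

Calling `nic.on_producer_update(4)` in this setup returns the four bytes the data processing unit placed in its buffer and leaves both consumer pointers at 4.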

[0016] In some specific implementations, the second BAR space also includes a third receiving queue producer pointer storage space and a third receiving queue consumer pointer storage space. The third receiving queue producer pointer storage space is used to store the third receiving queue producer pointer value, and the third receiving queue consumer pointer storage space is used to store the third receiving queue consumer pointer value. The method includes: The second data to be sent is obtained, and the second data is written into the first data to be received buffer of the data processing unit according to the physical address of the first physical function device, the producer pointer value of the third receiving queue, and the consumer pointer value of the third receiving queue. The producer pointer value of the third receiving queue is updated according to the storage location of the second data in the first data to be received buffer. The updated third receive queue producer pointer value is written to the first physical function device of the data processing unit to notify the data processing unit to read the second data from the first data to be received buffer.

[0017] In some specific implementations, the second BAR space also includes a channel identifier storage space, which is used to store a channel identifier that uniquely corresponds to the first network card. The step of reading the first data from the first data buffer to be sent in the data processing unit according to the physical address of the first physical function device, the third sending queue producer pointer value, and the third sending queue consumer pointer value specifically includes: reading the first data from the first data buffer to be sent of the data processing unit according to the physical address of the first physical function device, the third sending queue producer pointer value, the third sending queue consumer pointer value, and the channel identifier. The step of writing the second data into the first data buffer to be received of the data processing unit according to the physical address of the first physical function device, the third receiving queue producer pointer value, and the third receiving queue consumer pointer value specifically includes: writing the second data into the first data buffer to be received of the data processing unit according to the physical address of the first physical function device, the third receiving queue producer pointer value, the third receiving queue consumer pointer value, and the channel identifier.

[0018] In some specific implementations, the second physical function device also includes a second CFG space, which stores the vendor ID, device ID, and physical address of the second physical function device. The method further includes:
Running the firmware to generate the second physical function device;
Exposing the second physical function device to the host, receiving and responding to the host's read request to access the second CFG space, and sending the vendor ID and device ID of the second physical function device to the host;
Receiving and parsing the host's write request to access the second CFG space to obtain the physical address of the second physical function device, and writing the physical address of the second physical function device into the second CFG space; wherein the physical address of the second physical function device is obtained by the host through system address space allocation based on the vendor ID and device ID of the second physical function device;
Receiving and responding to the host's read request to access the second CFG space, and sending the physical address of the second physical function device in the second CFG space to the host, so that the host writes the physical address of the second physical function device into the first physical function device of the data processing unit;
Receiving and parsing the host's write request to access the second BAR space to obtain the physical address of the second physical function device, and writing the physical address of the second physical function device into the second BAR space.

[0019] In some specific implementations, the first data includes multiple data packets; The method includes: During the process of continuously reading multiple data packets of the first data from the first data buffer to be sent, the consumer pointer value of the third sending queue is updated once for each data packet read. If the distance between the third sending queue producer pointer value and the third sending queue consumer pointer value is less than or equal to a preset distance threshold, then data reading is paused, and the third sending queue consumer pointer value is written into the first physical function device of the data processing unit according to the physical address of the first physical function device. When the distance between the third sending queue producer pointer value and the third sending queue consumer pointer value is greater than the preset distance threshold, the remaining data is read again. If N data packets have been read consecutively, the third sending queue consumer pointer value is written into the first physical function device of the data processing unit according to the physical address of the first physical function device; where N is a preset quantity threshold.

[0020] In some specific implementations, reading the first data from the first data buffer to be sent in the data processing unit based on the physical address of the first physical functional device, the third sending queue producer pointer value, and the third sending queue consumer pointer value includes: When reading any data packet of the first data, the record header of the data packet is extracted; wherein, the record header includes at least one of a flag bit, a data packet sequence number, a record length, a true length, and a timestamp; the flag bit is used to carry out-of-band information of the data; the data packet sequence number is used by the data receiver to determine whether there is packet loss; the record length is used to indicate the length of the data block; the true length is used to indicate the true length of the payload; the timestamp is used to record the time information of the data packet; The storage boundary of the data block is determined based on the record length in the record header to read the payload, and / or the payload is extracted from the data block based on the actual length in the record header, and / or the packet sequence number in the record header is used to determine whether there is packet loss, and / or the out-of-band information of the data is obtained based on the flag bit in the record header.

[0021] In some specific implementations, the second data includes multiple data packets; The method includes: During the process of continuously writing multiple data packets of the second data into the first data buffer to be received, the producer pointer value of the third receiving queue is updated once for each data packet written. If the distance between the third receiving queue producer pointer value and the third receiving queue consumer pointer value is less than or equal to a preset distance threshold, then data writing is paused, and the third receiving queue producer pointer value is written into the first physical function device of the data processing unit according to the physical address of the first physical function device. When the distance between the third receiving queue producer pointer value and the third receiving queue consumer pointer value is greater than the preset distance threshold, the remaining data is written again. If N data packets have been written consecutively, the third receiving queue producer pointer value is written into the first physical function device of the data processing unit according to the physical address of the first physical function device; where N is a preset quantity threshold.

[0022] In some specific implementations, writing the second data into the first data buffer of the data processing unit based on the physical address of the first physical functional device, the producer pointer value of the third receiving queue, and the consumer pointer value of the third receiving queue includes: Generate a corresponding record header for each data packet of the second data, and concatenate the generated record header with the payload of the corresponding data packet and write it into the first data buffer to be received. The record header includes at least one of a flag bit, a data packet sequence number, a record length, a true length, and a timestamp; the flag bit is used to carry out-of-band information; the data packet sequence number is used by the data receiver to determine whether there is packet loss; the record length is used to indicate the length of the data block; the true length is used to indicate the true length of the payload; and the timestamp is used to record the time information of the data packet.

[0023] According to a third aspect of this application, embodiments of this application provide a data processing unit, including a module for performing the method as described in the first aspect of this application.

[0024] According to a fourth aspect of this application, embodiments of this application provide a network interface card (NIC) including a module for performing the method as described in the second aspect of this application.

[0025] According to a fifth aspect of this application, embodiments of this application provide a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method described in the first or second aspect of this application.

[0026] According to a sixth aspect of this application, embodiments of this application provide a computer program product, including a computer program that, when executed by a processor, implements the method described in the first or second aspect of this application.

[0027] This application proposes an inter-card communication method, data processing unit, network interface card (NIC), medium, and product. It establishes a direct inter-card management channel between the DPU and NIC through a first physical functional device within the data processing unit (DPU) and a second physical functional device within the first network interface card (NIC). Utilizing the P2P (Peer-to-Peer) communication capability of the PCIe bus, it enables data to be transferred between the DPU and NIC directly via DMA, bypassing host memory and CPU intervention. Compared to traditional register read/write methods, this application's embodiments offer the following significant advantages. First, by offloading the management channel to the DPU, CPU computing resources are freed up, eliminating the need for the CPU to spend computing power on managing and controlling the NIC. Second, through a custom communication channel design, large-block data transfers are performed directly via DMA, significantly improving the data throughput of the management channel compared to traditional register read/write methods. This meets the low-latency requirements for transferring large amounts of data, such as parent-child machine mapping tables, in AI scenarios. Traditional register read/write methods can only handle 4 bytes per read/write operation, with a bandwidth of less than 500 MB/s, while this embodiment uses DMA to transfer large blocks of data, achieving a bandwidth of up to 64 GB/s × 80% ≈ 51 GB/s (based on PCIe Gen5 x16).

Attached Figure Description

[0028] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the accompanying drawings required in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0029] Figure 1 is a schematic diagram of the topology and management channel path of the AI server disclosed in the background technology.

[0030] Figure 2 is a flowchart illustrating an inter-card communication method in one embodiment of this application.

[0031] Figure 3 is a schematic diagram of the design of the CFG space and BAR space of the physical functional devices on the DPU side and NIC side in one embodiment of this application.

[0032] Figure 4 is a schematic diagram illustrating the management channel device presentation and system address space distribution in one embodiment of this application; Figure 5 is a schematic diagram of the software call stack hierarchy of the management channel in one embodiment of this application.

Detailed Implementation

[0033] The detailed description of the accompanying drawings is intended to illustrate the present preferred embodiments of this application and is not intended to represent only the forms in which this application can be implemented. It should be understood that the same or equivalent functions can be achieved by different embodiments intended to be included within the spirit and scope of this application.

[0034] Embodiment 1: Referring to Figure 2, Embodiment 1 of this application proposes an inter-card communication method applied to a data processing unit (DPU). The data processing unit includes a first physical function device, which includes a first BAR space. The first BAR space includes at least a first physical address storage space, a first transmit queue producer pointer storage space, a first transmit queue consumer pointer storage space, and a first data buffer to be transmitted. The first physical address storage space is used to store the physical address of the second physical function device of the first network interface card (NIC). The first transmit queue producer pointer storage space is used to store the first transmit queue producer pointer value. The first transmit queue consumer pointer storage space is used to store the first transmit queue consumer pointer value. The first data buffer to be transmitted is used to store data to be transmitted. Specifically, the first physical function device (APF) is generated by the Integrated Management Unit (IMU) within the DPU through firmware emulation. The first APF is exposed to the host via the PCIe bus, and the host allocates a physical address for it in the system address space. The first BAR space is a set of registers mapped into the host memory address space. As shown in Figure 3, it includes at least: a first physical address storage space (e.g., Chnl 0 BAR0H / L in Figure 3), used to store the base address of the second physical function device of the first NIC; a first transmit queue producer pointer storage space (e.g., Chnl 0 tx pi), which is read-only and whose first transmit queue producer pointer value (tx pi) is written to the corresponding tx ring pi register on the first NIC side; a first transmit queue consumer pointer storage space (e.g., the register Chnl 0 tx ci in Figure 3), which is readable and writable and is updated by the first NIC, and whose first transmit queue consumer pointer value (tx ci) indicates the position up to which the first NIC has processed data; and a first data buffer to be transmitted (e.g., Data Tx Buf 0 in Figure 3), used to store the data to be sent to the first NIC.

[0035] The MGMT program (the management module responsible for the control plane within the NIC driver), which runs on the host side in the prior art, is offloaded to the DPU. The ARM CPU in the DPU runs the driver of the first NIC, and the MGMT module within that driver accesses the control-plane registers of the first NIC through the PCIe switch, thereby managing the NIC.

[0036] Specifically, the method includes: Step S101: Obtain first data to be transmitted, and write the first data into the first data buffer to be transmitted according to the first transmit queue producer pointer value and the first transmit queue consumer pointer value. Specifically, the DPU acquires the first data to be transmitted (e.g., a parent-child machine mapping table or configuration commands). The first transmit queue producer pointer points to the next writable empty slot in the first data buffer to be transmitted, and the first transmit queue consumer pointer points to the position up to which the first NIC has already fetched data. From these two pointers the DPU can calculate the currently available buffer space, i.e., the distance on the ring from the producer pointer forward to the consumer pointer, from which a 16-byte safety window can be reserved. It then writes the first data at the buffer position indicated by the first transmit queue producer pointer. If the first data includes multiple data packets, they are written sequentially.

[0037] Step S102: Update the first transmit queue producer pointer value according to the storage location of the first data in the first data buffer to be transmitted. Specifically, after the first data has been written into the first data buffer to be transmitted, the DPU advances the first transmit queue producer pointer to the new next writable position. The updated first transmit queue producer pointer value indicates that new data is ready.

[0038] Step S103: Write the updated first transmit queue producer pointer value into the second physical function device of the first NIC according to the physical address of the second physical function device, so as to notify the first NIC to read the first data; wherein the first transmit queue consumer pointer value is updated by the first NIC after it reads at least a portion of the first data. Specifically, the DPU reads the physical address of the second physical function device of the first NIC stored in the first BAR space, and then initiates a write operation over the PCIe bus (e.g., an SDMA/DMA write) that writes the updated first transmit queue producer pointer value into the first-NIC-side register corresponding to that physical address (e.g., tx ring pi of the first NIC in Figure 3), thereby updating the third transmit queue producer pointer value on the first NIC side. This write operation is equivalent to sending a doorbell signal that notifies the first NIC that data is available for reading. Upon receiving the doorbell signal, the first NIC, based on the third transmit queue producer pointer value it holds (i.e., the updated first transmit queue producer pointer value that the DPU wrote into the first NIC's register) and the third transmit queue consumer pointer value (updated by the first NIC itself according to its data read status), initiates a data fetch through its SDMA module and reads the first data from the first data buffer to be transmitted of the DPU. After reading the first data, the first NIC updates the third transmit queue consumer pointer value on its side (e.g., tx ring ci of the first NIC in Figure 3), and the SDMA/DMA inside the first NIC writes the third transmit queue consumer pointer value back to the first transmit queue consumer pointer storage space of the DPU, updating the first transmit queue consumer pointer value on the DPU side. This write-back is equivalent to sending a doorbell signal that notifies the DPU that the data has been read.
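As an illustration only, the transmit flow of steps S101 to S103 can be modeled in a few lines of Python. This is a software sketch, not the firmware of the embodiment: the ring size of 256 bytes, the class names DpuChannel and NicRegs, and the function nic_fetch are assumptions introduced here, and the PCIe P2P writes are replaced by plain attribute assignments.

```python
from dataclasses import dataclass, field

RING_SIZE = 256      # assumed buffer size in bytes (not stated in the source)
SAFETY_WINDOW = 16   # safety window mentioned in the description


@dataclass
class NicRegs:
    """Stand-in for the NIC-side registers (tx ring pi / tx ring ci)."""
    tx_ring_pi: int = 0
    tx_ring_ci: int = 0


@dataclass
class DpuChannel:
    """Stand-in for one channel of the DPU's first BAR space."""
    tx_pi: int = 0
    tx_ci: int = 0
    tx_buf: bytearray = field(default_factory=lambda: bytearray(RING_SIZE))

    def free_space(self) -> int:
        # Bytes that may still be written without entering the safety window.
        used = (self.tx_pi - self.tx_ci) % RING_SIZE
        return RING_SIZE - used - SAFETY_WINDOW

    def send(self, data: bytes, nic: NicRegs) -> bool:
        if len(data) > self.free_space():
            return False                      # caller must wait for tx_ci to advance
        for i, b in enumerate(data):          # S101: write at the producer position
            self.tx_buf[(self.tx_pi + i) % RING_SIZE] = b
        self.tx_pi = (self.tx_pi + len(data)) % RING_SIZE   # S102: advance producer
        nic.tx_ring_pi = self.tx_pi           # S103: doorbell (models the P2P write)
        return True


def nic_fetch(ch: DpuChannel, nic: NicRegs) -> bytes:
    """NIC side: pull pending bytes, then write the consumer pointer back."""
    pending = (nic.tx_ring_pi - nic.tx_ring_ci) % RING_SIZE
    data = bytes(ch.tx_buf[(nic.tx_ring_ci + i) % RING_SIZE] for i in range(pending))
    nic.tx_ring_ci = nic.tx_ring_pi
    ch.tx_ci = nic.tx_ring_ci                 # write-back doorbell to the DPU
    return data
```

Because the safety window keeps the ring from ever filling completely, equal producer and consumer pointers always mean "empty" in this model.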

[0039] DMA (Direct Memory Access) is a data transfer technology that allows peripherals to bypass the CPU and directly read and write memory, thereby improving data transfer efficiency. SDMA (System DMA) is an enhanced implementation based on this technology, specifically designed for this scenario. It is integrated into the DPU and the first network card chip, programmably controlled by the IMU firmware, and supports point-to-point (P2P) direct transmission between endpoint devices on the PCIe bus. This allows the DPU and the first network card to bypass the host memory and CPU and efficiently transfer large blocks of data directly between their data buffers, better meeting the management channel's requirements for high bandwidth and flexible bidirectional transmission.

[0040] It should be noted that, unlike the traditional approach where only the CPU, acting as the MASTER, can initiate read and write operations between the host CPU and the managed NIC, the custom management channel allows the SLAVE to proactively initiate data transmission to the MASTER. This management channel is more flexible and can be applied to a wider range of control plane management application scenarios.

[0041] In some embodiments, the first BAR space further includes a first receive queue producer pointer storage space, a first receive queue consumer pointer storage space, and a first data buffer to be received. The first receive queue producer pointer storage space is used to store the first receive queue producer pointer value, the first receive queue consumer pointer storage space is used to store the first receive queue consumer pointer value, and the first data buffer to be received is used to store the data to be received. As shown in Figure 3, the first BAR space also includes at least: a first receive queue producer pointer storage space (e.g., Chnl 0 rx pi), which is read-only and whose first receive queue producer pointer value (rx pi) is written to the corresponding rx ring pi register on the first NIC side; a first receive queue consumer pointer storage space (e.g., the register Chnl 0 rx ci in Figure 3), which is readable and writable and is updated by the first NIC, and whose first receive queue consumer pointer value indicates the position up to which the first NIC has processed data; and a first data buffer to be received (e.g., Data Rx Buf 0 in Figure 3), used to store the data to be received by the DPU.

[0042] The method further includes: Step S201: Update the first receiving queue consumer pointer value according to the data reading status of the first data to be received buffer, and write the updated first receiving queue consumer pointer value into the second physical function device of the first network card to notify the first network card to send data; Specifically, the DPU updates the first receive queue consumer pointer value based on the data reading status of the first data to be received buffer, indicating which position in the first data to be received buffer the DPU has read data from, and promptly reclaims the cache resources of the first data to be received buffer. Then, the updated first receive queue consumer pointer value is written to the second physical function device of the first network card to notify the first network card, so that the first network card can know the data reading status of the first data to be received buffer by the DPU, and combined with the third receive queue producer pointer value maintained by the first network card itself, thereby determining whether there are enough free cache resources on the DPU side for receiving data.

[0043] Step S202: In response to the first network card updating the first receive queue producer pointer value, read the second data from the first data to be received buffer according to the updated first receive queue producer pointer value and the first receive queue consumer pointer value; Specifically, when the DPU has sufficient free buffer resources for receiving data, if the first network interface card (NIC) has second data (such as statistical DFX information) that needs to be actively reported to the DPU, the first NIC will call its internal SDMA / DMA module to write the second data into the DPU's first data buffer to be received via a PCIe write operation. After writing the data, it will update its own third receive queue producer pointer value and call the NIC's internal SDMA / DMA module again to write the updated third receive queue producer pointer value into the DPU's first receive queue producer pointer storage space via a PCIe write operation to update the first receive queue producer pointer value, which is equivalent to sending a doorbell signal to the DPU. When the DPU detects that the first receive queue producer pointer value in the first receive queue producer pointer storage space has been updated (i.e., a doorbell signal has been received), it will read the second data from the first data buffer to be received based on the updated first receive queue producer pointer value and its local first receive queue consumer pointer.

[0044] Step S203: Update the first receiving queue consumer pointer value according to the storage location of the second data in the first data to be received buffer; Specifically, each time the DPU reads a data packet from the first data buffer to be received, it updates the first receive queue consumer pointer to point to the next data to be read. The updated first receive queue consumer pointer value indicates the data position that the DPU has finished processing.

[0045] Step S204: Write the updated first receive queue consumer pointer value into the second physical function device of the first NIC according to the physical address of the second physical function device, so as to notify the first NIC to determine, according to the updated first receive queue consumer pointer value, that the second data has been received by the data processing unit. Specifically, the DPU writes the updated first receive queue consumer pointer value (e.g., through an SDMA/DMA write operation) into the corresponding storage space in the second physical function device of the first NIC (e.g., rx ring ci of the first NIC in Figure 3), thereby updating the third receive queue consumer pointer value on the first NIC side. This write operation is equivalent to sending a doorbell signal to the first NIC, notifying it that the DPU has successfully read the data and that the corresponding buffer space has been released, so the first NIC can continue to write new data into that area.
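The receive direction of steps S201 to S204 mirrors the transmit direction, with the first NIC acting as producer and the DPU as consumer. The following Python sketch models that handshake under stated assumptions: the 128-byte ring, the class name RxPath, and the method names nic_report and dpu_poll are hypothetical, and each PCIe write is modeled as a simple assignment.

```python
RING_SIZE = 128      # assumed size of the first data buffer to be received
SAFETY_WINDOW = 16   # safety window mentioned in the description


class RxPath:
    """Toy model of the DPU receive ring and the NIC's mirrored pointers."""

    def __init__(self) -> None:
        self.rx_buf = bytearray(RING_SIZE)  # lives in the DPU's first BAR space
        self.dpu_rx_pi = 0   # first receive queue producer pointer (written by NIC)
        self.dpu_rx_ci = 0   # first receive queue consumer pointer (kept by DPU)
        self.nic_rx_pi = 0   # third receive queue producer pointer (kept by NIC)
        self.nic_rx_ci = 0   # third receive queue consumer pointer (written by DPU)

    def nic_report(self, data: bytes) -> bool:
        """NIC side of S202: write data into the DPU buffer, then ring the doorbell."""
        used = (self.nic_rx_pi - self.nic_rx_ci) % RING_SIZE
        if len(data) > RING_SIZE - used - SAFETY_WINDOW:
            return False                      # not enough free space on the DPU side
        for i, b in enumerate(data):
            self.rx_buf[(self.nic_rx_pi + i) % RING_SIZE] = b
        self.nic_rx_pi = (self.nic_rx_pi + len(data)) % RING_SIZE
        self.dpu_rx_pi = self.nic_rx_pi       # doorbell: producer pointer to the DPU
        return True

    def dpu_poll(self) -> bytes:
        """DPU side of S202 to S204: read pending data and return the consumer pointer."""
        pending = (self.dpu_rx_pi - self.dpu_rx_ci) % RING_SIZE
        data = bytes(self.rx_buf[(self.dpu_rx_ci + i) % RING_SIZE]
                     for i in range(pending))
        self.dpu_rx_ci = self.dpu_rx_pi       # S203: advance the consumer pointer
        self.nic_rx_ci = self.dpu_rx_ci       # S204: doorbell back to the NIC
        return data
```

In this model the NIC decides whether it may write using only its own producer pointer and the consumer pointer the DPU last wrote back, which is the check described for step S201.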

[0046] In some embodiments, the first physical function device further includes a first CFG space, which stores the vendor ID, device ID and physical address of the first physical function device. Specifically, the first CFG space (configuration space) conforms to the PCIe specification and stores the vendor ID (VendorId), the device ID (DeviceId), and the physical address (BAR address) that the host assigns to the first physical function device. The vendor ID can be the same as that of the service PF (physical function), while the device ID is set to a different value to prevent the host from loading a standard driver for the device.

[0047] The method further includes: Step S301: During the startup phase, firmware is run to generate the first physical functional device; Specifically, during the DPU startup phase, the firmware running inside the IMU executes initialization code, creates a descriptor for the first physical functional device in the internal data structure, and configures its vendor ID, device ID, BAR space layout, and address mapping rules, thereby logically generating the first physical functional device.

[0048] Step S302: During the device enumeration phase, the first physical functional device is exposed to the host, a read request from the host to access the first CFG space is received and responded to, and the vendor ID and device ID of the first physical functional device are sent to the host; a write request from the host to access the first CFG space is received and parsed to obtain the physical address of the first physical functional device, and the physical address of the first physical functional device is written into the first CFG space; wherein, the physical address of the first physical functional device is obtained by the host through system address space allocation based on the vendor ID and device ID of the first physical functional device; Specifically, when the host performs PCIe device enumeration, it sends a configuration space read request to obtain the device's vendor ID and device ID. The DPU's IMU captures the request and returns the preset vendor ID and special device ID. The host identifies the device based on these IDs, allocates a physical address in the system address space for it, and then writes the physical address back through a configuration space write request. The IMU receives and parses the write request and stores the physical address allocated by the host in the first CFG space.

[0049] Step S303: During the driver loading phase, receive and respond to the host's read request to access the first CFG space, and send the physical address of the first physical function device in the first CFG space to the host, so that the host writes the physical address of the first physical function device into the second physical function device of the first NIC; receive and parse the host's write request to access the first BAR space to obtain the physical address of the first NIC, and write the physical address of the first NIC into the first BAR space. Specifically, during the driver loading phase, the host reads the first CFG space to obtain the physical address of the first physical function device and, through driver configuration, writes this physical address into the BAR space of the second physical function device of the first NIC (i.e., into the BAR address register for the peer physical function device). At the same time, the host writes the physical address of the second physical function device of the first NIC into the first physical address storage space in the first BAR space of the DPU. All of the above read and write requests are answered by the IMU of the DPU through emulation.

[0050] As shown in Figure 4, in Embodiment 1 the host enumerates the physical function devices (APFs) exposed by the DPU and the NIC through the ECAM space, reads their CFG spaces to obtain the vendor ID and device ID, and assigns them BAR addresses in the system address space. The physical function devices exposed by the DPU and the NIC are mapped to different address regions of the host memory address space. The DPU and the NIC exchange the physical addresses of each other's physical function devices through the host's driver configuration, thereby establishing the address basis for P2P communication.
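The address exchange of steps S301 to S303 can be illustrated with a small host-side model. The sketch below is hypothetical throughout: the Apf class, the enumerate_and_pair function, the ID values and the base address are invented for illustration, and only the natural alignment of BARs to their power-of-two size reflects standard PCIe behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Apf:
    """Emulated physical function device (field names are assumptions)."""
    vendor_id: int
    device_id: int
    bar_size: int                     # BAR sizes are powers of two
    bar_addr: Optional[int] = None    # CFG space: address assigned by the host
    peer_addr: Optional[int] = None   # BAR space: address of the peer device


def enumerate_and_pair(dpu: Apf, nic: Apf, base: int = 0x9000_0000) -> None:
    """Model of the host: assign BAR addresses, then exchange them."""
    next_addr = base
    for dev in (dpu, nic):
        # Enumeration (S302): align each BAR to its own size and assign it.
        next_addr = (next_addr + dev.bar_size - 1) & ~(dev.bar_size - 1)
        dev.bar_addr = next_addr
        next_addr += dev.bar_size
    # Driver loading (S303): each side learns the other's physical address.
    dpu.peer_addr = nic.bar_addr
    nic.peer_addr = dpu.bar_addr
```

After pairing, each device holds the physical address it needs to target the other with P2P writes, without any further host involvement on the data path.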

[0051] In some embodiments, the first data includes a plurality of data packets; The method includes: During the process of continuously writing multiple data packets of the first data into the first data buffer to be sent, the first sending queue producer pointer value is updated once for each data packet written. Specifically, when the DPU needs to send multiple data packets of a large block of data, it can write them continuously in one batch; after each data packet is successfully written, the first send queue producer pointer is updated immediately.

[0052] If the distance between the first send queue producer pointer value and the first send queue consumer pointer value is less than or equal to a preset distance threshold, then data writing is paused, and the first send queue producer pointer value is written into the second physical function device of the first network card according to the physical address of the second physical function device. When the distance between the first send queue producer pointer value and the first send queue consumer pointer value is greater than the preset distance threshold, the remaining data is written again. Specifically, to prevent overflow, after each update, the distance between the first send queue producer pointer value and the first send queue consumer pointer value (i.e., the available space on the ring) is checked; if the distance is less than or equal to a preset threshold (e.g., a 16B safety window), writing is paused, and the current first send queue producer pointer value is written to the first network interface card to trigger data migration; after the first network interface card processes and updates the first send queue consumer pointer value, the distance between the first send queue producer pointer value and the first send queue consumer pointer value recovers to be greater than the threshold, and writing continues.

[0053] If N data packets have been written consecutively, the first sending queue producer pointer value is written into the second physical function device of the first network card according to the physical address of the second physical function device; where N is a preset quantity threshold.

[0054] Specifically, to avoid wasting bandwidth due to frequent PCIe doorbell write operations, when the number of continuously written data packets reaches a preset threshold N (e.g., 8 or 16), even if the safety window boundary has not been reached, the current first sending queue producer pointer value will be actively written to the first network card, triggering a batch data retrieval. The value of N can be configured according to the data packet size and latency requirements in the actual application.
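The two doorbell triggers described above (the safety window boundary and the quantity threshold N) can be sketched as follows. This is a simplified model under several assumptions not found in the source: the function name plan_doorbells is hypothetical, the NIC is assumed to drain all pending data and write its consumer pointer back immediately after each doorbell, a doorbell is also rung early when the next packet would cut into the safety window, and a final doorbell flushes any remainder.

```python
def plan_doorbells(packet_sizes, ring_size=256, window=16, n_threshold=8):
    """Return the packet indices after which a doorbell write is issued."""
    pi = ci = 0        # producer / consumer byte offsets on the ring
    pending = 0        # packets written since the last doorbell
    bells = []

    def ring_doorbell(after_idx):
        nonlocal ci, pending
        bells.append(after_idx)
        ci = pi        # assumption: the NIC drains everything right away
        pending = 0

    for idx, size in enumerate(packet_sizes):
        if size > ring_size - window:
            raise ValueError("packet does not fit in the usable ring space")
        free = ring_size - (pi - ci) % ring_size
        if size > free - window and pending:
            ring_doorbell(idx - 1)            # make room before writing
        pi = (pi + size) % ring_size          # write packet, update producer
        pending += 1
        free = ring_size - (pi - ci) % ring_size
        if free <= window or pending >= n_threshold:
            ring_doorbell(idx)                # window reached or N packets written
    if pending:
        ring_doorbell(len(packet_sizes) - 1)  # flush the remainder
    return bells
```

For example, twenty 16-byte packets with N = 8 produce doorbells after packets 7, 15 and 19, so three PCIe doorbell writes replace twenty.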

[0055] In some embodiments, writing the first data into the first data to be sent buffer according to the first sending queue producer pointer value and the first sending queue consumer pointer value includes: Generate a corresponding record header for each data packet of the first data, and concatenate the generated record header with the payload of the corresponding data packet and write it into the first data buffer to be sent. The record header includes at least one of a flag bit, a data packet sequence number, a record length, a true length, and a timestamp; the flag bit is used to carry out-of-band information; the data packet sequence number is used by the data receiver to determine whether there is packet loss; the record length is used to indicate the length of the data block; the true length is used to indicate the true length of the payload; and the timestamp is used to record the time information of the data packet.

[0056] Specifically, the custom Record data format in this embodiment is shown in Table 1; Table 1 Record Header Definitions

[0057] Specifically, before writing each data packet to the first data buffer to be sent, the DPU first constructs a 16-byte record header. The meanings of the fields in the record header are as follows: Flags are used to carry out-of-band information such as parity checks or priority markers; PSN (Packet Sequence Number) is used by the receiver to detect packet loss. If the receiver finds that the PSN is discontinuous, it can determine that a packet was lost during transmission. Record Length indicates the length of the entire data block (including the record header and payload), and must be a multiple of 4 bytes. Real Length indicates the actual number of bytes in the payload. Because PCIe transmissions typically require address alignment (such as 4-byte alignment), the payload may need padding. Record Length indicates the total length including padding, helping the receiver locate the next Record. Real Length indicates the length of valid data, helping the receiver extract valid information.

[0058] A timestamp is used to record the time when a data packet is generated, which facilitates latency analysis.
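To make the relationship between Record Length and Real Length concrete, the following Python sketch builds one record. The field widths are an assumption made only so that the header totals 16 bytes (2-byte flags, 2-byte PSN, 2-byte record length, 2-byte real length, 8-byte timestamp, little-endian); the actual layout is the one defined in Table 1, and the function name build_record is hypothetical.

```python
import struct

# Assumed layout: flags, PSN, record length, real length (16 bits each),
# then a 64-bit timestamp. Table 1 of the source defines the real layout.
HEADER_FMT = "<HHHHQ"
HEADER_SIZE = struct.calcsize(HEADER_FMT)   # 16 bytes


def build_record(payload: bytes, psn: int, flags: int = 0, timestamp: int = 0) -> bytes:
    """Prepend a record header and pad the payload to a 4-byte boundary."""
    real_len = len(payload)
    padded_len = (real_len + 3) & ~3          # round up to a multiple of 4
    record_len = HEADER_SIZE + padded_len     # header + payload + padding
    if record_len > 0xFFFF:
        raise ValueError("record too long for the assumed 16-bit length field")
    header = struct.pack(HEADER_FMT, flags, psn & 0xFFFF,
                         record_len, real_len, timestamp)
    return header + payload + b"\x00" * (padded_len - real_len)
```

A 5-byte payload therefore yields a 24-byte record: Record Length is 24 (16-byte header plus 8 bytes of padded payload) while Real Length stays 5.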

[0059] In some embodiments, the second data includes a plurality of data packets; The method includes: During the process of continuously reading multiple data packets of the second data from the first data buffer to be received, the consumer pointer value of the first receiving queue is updated once for each data packet read. Specifically, when the DPU reads multiple data packets continuously from the first data buffer to be received, it updates the consumer pointer of the first receive queue every time it successfully parses and reads the payload of a data packet.

[0060] If the distance between the first receiving queue producer pointer value and the first receiving queue consumer pointer value is less than or equal to a preset distance threshold, then data reading is paused, and the first receiving queue consumer pointer value is written into the second physical function device of the first network card according to the physical address of the second physical function device. When the distance between the first receiving queue producer pointer value and the first receiving queue consumer pointer value is greater than the preset distance threshold, the remaining data is read again. Specifically, if the distance between the consumer pointer and the producer pointer of the first receiving queue is less than or equal to a preset threshold (indicating that the data to be processed is about to be read empty), reading is paused, and the current value of the consumer pointer of the first receiving queue is written to the first network card, notifying the first network card that new data can continue to be written; after the first network card updates the value of the producer pointer of the first receiving queue (i.e., indicating that new data is written to the first data buffer to be received), the distance between the producer pointer value and the consumer pointer value of the first receiving queue returns to a value greater than the preset distance threshold, and reading continues.

[0061] If N data packets have been read consecutively, the first receive queue consumer pointer value is written into the second physical function device of the first NIC according to the physical address of the second physical function device, where N is a preset quantity threshold. Specifically, to reduce the number of doorbell write operations, when the number of data packets continuously read by the DPU reaches the preset threshold N, the current first receive queue consumer pointer value is written to the first NIC even if the safety window boundary has not been reached, notifying the first NIC that N data packets have been consumed so that it can continue sending subsequent data.

[0062] In some embodiments, reading the second data from the first data buffer to be received based on the updated first receive queue producer pointer value and the first receive queue consumer pointer value includes: When reading any data packet of the second data, the record header of the data packet is extracted; wherein, the record header includes at least one of a flag bit, a data packet sequence number, a record length, a true length, and a timestamp; the flag bit is used to carry out-of-band information of the data; the data packet sequence number is used by the data receiver to determine whether there is packet loss; the record length is used to indicate the length of the data block; the true length is used to indicate the true length of the payload; the timestamp is used to record the time information of the data packet; The storage boundary of the data block is determined according to the record length in the record header to read the payload, and / or the payload is extracted from the data block according to the actual length in the record header, and / or the packet sequence number in the record header is used to determine whether there is packet loss, and / or the out-of-band information of the data is obtained according to the flag bit in the record header; Specifically, when reading any data packet, the DPU first reads the first 16 bytes from the first data buffer to be received as a header and parses out each field. The record length in the header determines the boundary of the entire data block, thus correctly segmenting subsequent data packets; the payload is extracted from the data block based on the actual length, avoiding reading invalid padding bytes; the PSN sequence number is used to determine if there is packet loss or out-of-order delivery; and out-of-band information (such as checksum results) is obtained based on the flags.
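The header-driven parsing described above can be sketched as a loop over the received bytes. As before, the field widths (16-bit flags, PSN, record length and real length, followed by a 64-bit timestamp, little-endian) are an assumption chosen to total 16 bytes, and the function name parse_records is hypothetical.

```python
import struct

HEADER_FMT = "<HHHHQ"   # assumed layout; Table 1 of the source is authoritative
HEADER_SIZE = struct.calcsize(HEADER_FMT)   # 16 bytes


def parse_records(buf: bytes, expected_psn: int = 0):
    """Split buf into payloads and report PSN gaps as (expected, seen) pairs."""
    payloads, gaps = [], []
    offset = 0
    while offset + HEADER_SIZE <= len(buf):
        flags, psn, record_len, real_len, timestamp = struct.unpack_from(
            HEADER_FMT, buf, offset)
        if (record_len < HEADER_SIZE or record_len % 4
                or real_len > record_len - HEADER_SIZE
                or offset + record_len > len(buf)):
            raise ValueError(f"malformed record at offset {offset}")
        if psn != expected_psn:
            gaps.append((expected_psn, psn))      # discontinuous PSN: loss suspected
        start = offset + HEADER_SIZE
        payloads.append(buf[start:start + real_len])   # real length skips padding
        expected_psn = (psn + 1) & 0xFFFF
        offset += record_len                      # record length finds the next record
    return payloads, gaps
```

Record length is used only to step from one record to the next, and real length only to trim padding, which matches the division of roles given for the two fields.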

[0063] In some embodiments, the preset distance threshold can be set to 16 bytes, i.e., a minimum safety window is retained to prevent the producer pointer from catching up with the consumer pointer; the preset quantity threshold N can be set according to the PCIe Max Payload Size (MPS) and the typical packet size, for example, to 8, 16 or 32.

[0064] In some embodiments, the first BAR space further includes a second physical address storage space, a second transmit queue producer pointer storage space, a second transmit queue consumer pointer storage space, a second receive queue producer pointer storage space, a second receive queue consumer pointer storage space, a second data buffer to be transmitted, and a second data buffer to be received. The second physical address storage space is used to store the physical address of the third physical function device of the second network interface card (NIC). The second transmit queue producer pointer storage space is used to store the second transmit queue producer pointer value. The second transmit queue consumer pointer storage space is used to store the second transmit queue consumer pointer value. The second receive queue producer pointer storage space is used to store the second receive queue producer pointer value. The second receive queue consumer pointer storage space is used to store the second receive queue consumer pointer value. The second data buffer to be transmitted is used to store data to be transmitted. The second data buffer to be received is used to store data to be received. Specifically, in this embodiment, the DPU can establish management channels with multiple NICs simultaneously. A set of registers is independently allocated to each channel (each NIC) in the first BAR space, and the configuration for each NIC is the same as that for the first NIC. For example, as shown in Figure 3, the first BAR space also includes a second physical address storage space (Chnl 1 BAR0H / L), a second transmit queue producer pointer storage space (Chnl 1 tx pi), a second transmit queue consumer pointer storage space (Chnl 1 tx ci), a second receive queue producer pointer storage space (Chnl 1 rx pi), a second receive queue consumer pointer storage space (Chnl 1 rx ci), a second data buffer to be transmitted (Data Tx Buf 1), and a second data buffer to be received (Data Rx Buf 1).

[0065] The method includes: Obtain the third data to be sent, and write the third data into the second data to be sent buffer according to the producer pointer value of the second sending queue and the consumer pointer value of the second sending queue; Update the second sending queue producer pointer value according to the storage location of the third data in the second data to be sent buffer; The updated second transmit queue producer pointer value is written into the third physical function device of the second network interface card (NIC) according to the physical address of the third physical function device, so as to notify the second NIC to read the third data according to the updated second transmit queue producer pointer value; wherein, the second transmit queue consumer pointer value is updated by the second NIC after reading at least a portion of the third data; In response to the second network interface card updating the second receive queue producer pointer value, the fourth data is read from the second data to be received buffer according to the updated second receive queue producer pointer value and the second receive queue consumer pointer value; Update the consumer pointer value of the second receiving queue according to the storage location of the fourth data in the second data to be received buffer; The updated second receive queue consumer pointer value is written into the second network card according to the physical address of the third physical function device, so as to notify the second network card to determine, based on the updated second receive queue consumer pointer value, that at least a portion of the fourth data has been received by the data processing unit.

[0066] Specifically, the second NIC has the same structure as the first NIC. For example, as shown in Figure 3, the first BAR space of the DPU contains channels Chnl 0 (corresponding to the first NIC) and Chnl 1 (corresponding to the second NIC), each of which is independently configured with a physical address storage space for the peer physical function device (Chnl 0 / 1 BAR0H / L), pointer storage spaces (Chnl 0 / 1 tx / rx pi / ci), and independent data buffers to be transmitted and received (Data Tx / Rx Buf 0 / 1). The second NIC can perform the same functions as the first NIC, and the communication process between the DPU and the second NIC is the same as that between the DPU and the first NIC, so it is not described again in this embodiment. For the relevant technical features and principles of the second NIC, refer to the description of the first NIC.
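The per-channel register groups can be pictured as a simple offset table. The sketch below is purely illustrative: the register order, the 4-byte register width, the 32-byte channel stride, and the reading of BAR0H / L as the high and low 32-bit halves of a 64-bit peer address are all assumptions, since the source gives the register names from Figure 3 but not their offsets.

```python
# Register names follow Figure 3; order, width and stride are assumptions.
REG_NAMES = ("BAR0H", "BAR0L", "tx_pi", "tx_ci", "rx_pi", "rx_ci")
REG_WIDTH = 4          # assumed 32-bit registers
CHANNEL_STRIDE = 32    # assumed spacing between Chnl 0, Chnl 1, ...


def reg_offset(channel: int, name: str) -> int:
    """Byte offset of a channel register inside the first BAR space."""
    if channel < 0:
        raise ValueError("channel index must be non-negative")
    return channel * CHANNEL_STRIDE + REG_NAMES.index(name) * REG_WIDTH


def peer_base(bar0h: int, bar0l: int) -> int:
    """Combine the assumed high/low halves into a 64-bit peer base address."""
    return ((bar0h & 0xFFFF_FFFF) << 32) | (bar0l & 0xFFFF_FFFF)
```

With such a layout, adding a further NIC only means adding one more register group and one more pair of data buffers, while the per-channel logic stays unchanged.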

[0067] Embodiment 2: Embodiment 2 of this application proposes an inter-card communication method applied to a first network interface card (NIC). The first NIC includes a second physical function device. The second physical function device of the first NIC includes a second BAR space. The second BAR space includes at least a third physical address storage space, a third send queue producer pointer storage space, and a third send queue consumer pointer storage space. The third physical address storage space is used to store the physical address of the first physical function device of the data processing unit (DPU). The third send queue producer pointer storage space is used to store the third send queue producer pointer value. The third send queue consumer pointer storage space is used to store the third send queue consumer pointer value. Specifically, the second physical function device of the first NIC is also generated by the IMU inside the first NIC through firmware emulation. The second BAR space is a register group and includes at least: the third physical address storage space (such as Admin BAR0H / L in Figure 3), used to store the physical address of the first physical function device on the DPU side; the third send queue producer pointer storage space (such as tx ring pi in Figure 3), used to store the third send queue producer pointer value written by the DPU (this value is updated by the DPU and indicates the position up to which the DPU has written data); and the third send queue consumer pointer storage space (such as tx ring ci in Figure 3), used to store the third send queue consumer pointer value maintained by the first NIC itself (indicating the position up to which data has been read).

[0068] The method includes: Step S401: In response to the data processing unit updating the third sending queue producer pointer value, first data is read from the first data to be sent buffer of the data processing unit according to the physical address of the first physical function device, the third sending queue producer pointer value, and the third sending queue consumer pointer value. Specifically, when the DPU executes steps S101 to S103 in Embodiment 1 above, writing first data into the first data buffer to be transmitted in the DPU, updating the first transmit queue producer pointer value of the DPU, and writing the first transmit queue producer pointer value into the third transmit queue producer pointer storage space of the first network card to update the third transmit queue producer pointer value, the IMU of the first network card detects that the third transmit queue producer pointer value has changed (equivalent to receiving a doorbell signal). The first network card calculates the address and length of the data to be read based on the physical address of the first physical function device in the third physical address storage space, as well as the current third transmit queue producer pointer value and third transmit queue consumer pointer value. Then, it calls the SDMA / DMA module inside the first network card to directly pull the first data from the first data buffer to be transmitted in the DPU through a PCIe P2P read operation.

[0069] Step S402: Update the third sending queue consumer pointer value according to the storage location of the first data in the first data to be sent buffer; Specifically, each time the first network card successfully reads a data packet (e.g., after pulling it from the DPU to the local SRAM), it updates the local third send queue consumer pointer value to point to the next position to be read. The updated third send queue consumer pointer value indicates that the corresponding data on the DPU side has been consumed by the first network card.

[0070] Step S403: Write the updated third send queue consumer pointer value into the first physical function device of the data processing unit to notify the data processing unit that at least a portion of the first data has been read by the first network card. Specifically, the first network interface card (NIC) writes the updated third transmit queue consumer pointer value into the first transmit queue consumer pointer storage space of the first physical functional device of the DPU via a PCIe write operation. This is equivalent to sending a doorbell signal to the DPU, notifying the DPU that at least a portion of the first data has been read by the first NIC. After receiving the doorbell signal, the DPU can release or reuse the corresponding buffer space. It should be noted that the first NIC can write the updated third transmit queue consumer pointer value to the DPU in real time, periodically, or according to other preset conditions to update the first transmit queue consumer pointer value on the DPU side. The third transmit queue producer pointer storage space and the third transmit queue consumer pointer storage space, together with the first data buffer to be transmitted on the DPU side, realize data transmission from the DPU to the first NIC.
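The send path described in steps S101 to S103 and S401 to S403 can be sketched as a small simulation. This is only an illustrative model, not the patented implementation: the class names `DpuPf` and `NicPf`, the 64-byte ring size, and the choice to treat the pointers as monotonically increasing byte counters are all assumptions, and the PCIe P2P reads and writes are modeled as plain Python attribute accesses.

```python
from dataclasses import dataclass, field

RING_SIZE = 64  # assumed ring size in bytes (hypothetical value)


@dataclass
class DpuPf:
    """Stand-in for the DPU's first physical function device (first BAR space)."""
    tx_buf: bytearray = field(default_factory=lambda: bytearray(RING_SIZE))
    tx_pi: int = 0  # first send queue producer pointer
    tx_ci: int = 0  # first send queue consumer pointer (written by the NIC)


@dataclass
class NicPf:
    """Stand-in for the NIC's second physical function device (second BAR space)."""
    tx_pi: int = 0  # third send queue producer pointer (written by the DPU)
    tx_ci: int = 0  # third send queue consumer pointer (maintained by the NIC)


def dpu_send(dpu: DpuPf, nic: NicPf, data: bytes) -> None:
    """DPU side: write data into the ring, advance pi, mirror pi to the NIC."""
    free = RING_SIZE - (dpu.tx_pi - dpu.tx_ci)
    if len(data) > free:
        raise BufferError("send ring is full")
    for i, b in enumerate(data):
        dpu.tx_buf[(dpu.tx_pi + i) % RING_SIZE] = b
    dpu.tx_pi += len(data)
    nic.tx_pi = dpu.tx_pi  # models the P2P write that acts as a doorbell


def nic_on_doorbell(dpu: DpuPf, nic: NicPf) -> bytes:
    """NIC side (S401 to S403): pull data, advance ci, write ci back to the DPU."""
    length = nic.tx_pi - nic.tx_ci                     # unread bytes on the ring
    data = bytes(dpu.tx_buf[(nic.tx_ci + i) % RING_SIZE] for i in range(length))
    nic.tx_ci += length                                # S402: local consumer update
    dpu.tx_ci = nic.tx_ci                              # S403: doorbell back to DPU
    return data
```

Calling `dpu_send(dpu, nic, b"hello")` followed by `nic_on_doorbell(dpu, nic)` returns the same bytes and leaves both consumer pointers equal to the producer pointer, which is the condition under which the DPU may reuse the buffer space.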

[0071] In some embodiments, the second BAR space further includes a third receive queue producer pointer storage space and a third receive queue consumer pointer storage space, wherein the third receive queue producer pointer storage space is used to store the third receive queue producer pointer value, and the third receive queue consumer pointer storage space is used to store the third receive queue consumer pointer value. Specifically, the second BAR space also includes the registers required for the first network interface card (NIC) to actively send data to the DPU: the third receive queue producer pointer storage space (such as rx ring pi in Figure 3), used to store the third receive queue producer pointer value maintained locally by the first NIC (indicating the position up to which the first NIC has written data on the DPU side); and the third receive queue consumer pointer storage space (such as rx ring ci in Figure 3), used to store the third receive queue consumer pointer value maintained by the DPU (indicating the position up to which the DPU has processed the data sent to it by the first NIC). The third receive queue producer pointer storage space and the third receive queue consumer pointer storage space work together with the first data buffer to be received on the DPU side to realize data transmission from the first NIC to the DPU.

[0072] The method includes: Step S501: Obtain the second data to be sent, and write the second data into the first data to be received buffer of the data processing unit according to the physical address of the first physical function device, the producer pointer value of the third receiving queue, and the consumer pointer value of the third receiving queue. Specifically, when the first network card needs to report data to the DPU (such as statistical DFX information), the first network card obtains the second data to be sent, calculates the free position in the first data buffer to be received on the DPU side based on the physical address of the first physical function device, as well as the current third receive queue producer pointer value and third receive queue consumer pointer value, and then writes the second data directly into the corresponding free position in the buffer through the SDMA / DMA module.

[0073] Step S502: Update the producer pointer value of the third receiving queue according to the storage location of the second data in the first data to be received buffer; Specifically, each time the first network card writes a data packet to the first data buffer to be received, it updates the third receive queue producer pointer value once. The updated third receive queue producer pointer value indicates that the new data has arrived at the corresponding position in the first data buffer to be received of the DPU.

[0074] Step S503: Write the updated third receiving queue producer pointer value into the first physical function device of the data processing unit to notify the data processing unit to read the second data from the first data to be received buffer. Specifically, the first network interface card (NIC) calls its internal SDMA / DMA module to write the updated third receive queue producer pointer value into the DPU's first receive queue producer pointer storage space via a PCIe write operation. This is equivalent to sending a doorbell signal to the DPU, notifying it that data is available for reading. Upon receiving this doorbell signal, the DPU executes the read operations in steps S201-S203 of the above embodiment. It should be noted that the first NIC can write the updated third receive queue producer pointer value to the DPU in real-time, periodically, or according to other preset conditions to update the first receive queue producer pointer value on the DPU side.
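Steps S501 to S503 can be illustrated with a flat byte array standing in for the system address space. The register offsets, the 32-byte ring size, and the helper names `p2p_write` and `nic_report` below are hypothetical and chosen only for the sketch; the actual BAR layout is the one shown in Figure 3.

```python
RX_PI_OFFSET = 0x10    # assumed offset of the first receive queue pi register
RX_BUF_OFFSET = 0x100  # assumed offset of the first data buffer to be received
RX_RING = 32           # assumed ring size in bytes


def p2p_write(bus: bytearray, addr: int, payload: bytes) -> None:
    """Model a PCIe P2P memory write into a flat address space."""
    bus[addr:addr + len(payload)] = payload


def nic_report(bus: bytearray, dpu_bar_base: int,
               rx_pi: int, rx_ci: int, data: bytes) -> int:
    """NIC side: write data into the DPU receive ring and return the new pi."""
    # S501: free space follows from the producer and consumer pointer values.
    if len(data) > RX_RING - (rx_pi - rx_ci):
        raise BufferError("DPU receive buffer has no room")
    for i, b in enumerate(data):
        # Byte-wise for clarity; a real DMA engine would issue larger bursts.
        addr = dpu_bar_base + RX_BUF_OFFSET + (rx_pi + i) % RX_RING
        p2p_write(bus, addr, bytes([b]))
    rx_pi += len(data)                                   # S502: advance producer
    p2p_write(bus, dpu_bar_base + RX_PI_OFFSET,
              rx_pi.to_bytes(4, "little"))               # S503: doorbell write
    return rx_pi
```

In this model the doorbell is simply the write of the new producer pointer value into the DPU-side register; the DPU would then read between its consumer pointer and that value.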

[0075] In some embodiments, the second BAR space further includes a channel identifier storage space, which is used to store a channel identifier that uniquely corresponds to the first network interface card (NIC). Specifically, the second BAR space also includes a read-only channel identifier storage space (such as ChnlId in Figure 3). The channel identifier is assigned to each first NIC during system initialization and uniquely identifies the management channel between that NIC and the DPU. When the DPU is connected to multiple first NICs simultaneously, the first BAR space of the DPU's first physical function device includes multiple data buffers (such as Data Tx Buf0, Data Tx Buf1, etc. in Figure 3) and pointer storage spaces (such as Chnl 0 tx pi, Chnl 1 tx pi, etc. in Figure 3) that correspond one-to-one with the first NICs / management channels. In this case, the first NIC needs to carry the channel identifier when reading or writing the DPU's first data buffer to be sent, first data buffer to be received, first send queue consumer pointer storage space, first receive queue producer pointer storage space, and so on, so that the access is correctly routed to the corresponding buffer or storage space.

[0076] The step of reading the first data from the first data buffer to be sent of the data processing unit according to the physical address of the first physical function device, the third send queue producer pointer value, and the third send queue consumer pointer value specifically includes: reading the first data from the first data buffer to be sent of the data processing unit according to the physical address of the first physical function device, the third send queue producer pointer value, the third send queue consumer pointer value, and the channel identifier. Specifically, when the first network interface card (NIC) initiates a DMA read, in addition to using the base address of the first physical function device on the DPU side and the offset calculated from the third send queue producer pointer value and the third send queue consumer pointer value, it also uses the channel identifier, either as part of the address calculation or as routing information in the DMA descriptor. For example, as shown in Figure 3, the DPU side allocates an independent data buffer (Data Tx Buf0, Data Tx Buf1, etc.) for each channel. The first NIC selects the data buffer to be sent that corresponds to its own channel identifier (e.g., 0 or 1) and reads from it, thereby avoiding data confusion between multiple NICs.

[0077] The step of writing the second data into the first data buffer of the data processing unit according to the physical address of the first physical functional device, the producer pointer value of the third receiving queue, and the consumer pointer value of the third receiving queue specifically includes: The second data is written into the first data buffer of the data processing unit according to the physical address of the first physical functional device, the producer pointer value of the third receiving queue, the consumer pointer value of the third receiving queue, and the channel identifier.

[0078] Specifically, when the first network interface card (NIC) writes data to the DPU, it similarly determines the corresponding data buffer to be received on the DPU side (such as Data Rx Buf0, Data Rx Buf1, etc. in Figure 3) based on its own channel identifier, combines this with the free position calculated from the third receive queue producer pointer value and the third receive queue consumer pointer value, and initiates a DMA write that places the data directly in the correct target buffer, thereby ensuring data isolation and correctness in a multi-channel environment.
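One way to picture the channel identifier taking part in the address calculation is a fixed stride per channel. The text leaves open whether the identifier enters the address or the DMA descriptor, so the sketch below covers only the first option, and the stride, region offsets, and buffer size are assumed values rather than the layout of Figure 3.

```python
CHANNEL_STRIDE = 0x1000  # assumed size of one channel's region in the first BAR
TX_BUF_OFFSET = 0x000    # assumed offset of Data Tx Buf within a channel region
RX_BUF_OFFSET = 0x800    # assumed offset of Data Rx Buf within a channel region
BUF_SIZE = 0x800         # assumed size of each data buffer in bytes


def buffer_address(bar_base: int, chnl_id: int, direction: str, pointer: int) -> int:
    """Return the P2P target address for a pointer value on a given channel."""
    if direction not in ("tx", "rx"):
        raise ValueError("direction must be 'tx' or 'rx'")
    region = TX_BUF_OFFSET if direction == "tx" else RX_BUF_OFFSET
    # The channel identifier selects the region; the pointer wraps in the buffer.
    return bar_base + chnl_id * CHANNEL_STRIDE + region + pointer % BUF_SIZE
```

With these assumed constants, two NICs holding channel identifiers 0 and 1 can never compute overlapping addresses, which is the isolation property the paragraph describes.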

[0079] In some embodiments, the second physical function device further includes a second CFG space, which stores the vendor ID, device ID and physical address of the second physical function device; Specifically, the second CFG space (configuration space) conforms to the PCIe specification and stores the vendor ID (VendorId), device ID (DeviceId), and physical address (BAR address) assigned to the second physical function device by the host. The vendor ID can be the same as the service PF (physical device), while the device ID is set to a different value to prevent the host from loading standard drivers for it.

[0080] The method further includes: Step S601: During the startup phase, firmware is run to generate the second physical function device. Specifically, during the startup phase of the first network interface card (NIC), its internal IMU runs firmware to create a descriptor for the second physical function device within its internal data structures. The firmware configures the vendor ID, device ID, BAR space layout (including the aforementioned third physical address storage space, the various pointer storage spaces, and the channel identifier storage space), and address mapping rules of this device, thereby logically generating the second physical function device. The vendor ID can be the same as that of the service PF on the DPU side, while the device ID is set to a value that does not match the standard NIC driver, to prevent the host from loading the default driver.

[0081] Step S602: During the device enumeration phase, the second physical function device is exposed to the host, a read request from the host to access the second CFG space is received and responded to, and the vendor ID and device ID of the second physical function device are sent to the host; a write request from the host to access the second CFG space is received and parsed to obtain the physical address of the second physical function device, and the physical address of the second physical function device is written into the second CFG space; wherein, the physical address of the second physical function device is obtained by the host through system address space allocation based on the vendor ID and device ID of the second physical function device; Specifically, when the host performs PCIe device enumeration, it sends a CFG space read request. The IMU of the first network interface card (NIC) captures the read request for the second physical function device and returns a preset vendor ID and a unique device ID. The host identifies the device based on these IDs, allocates a physical address (i.e., a BAR address) for it in the system address space, and then writes this physical address back to the second CFG space via a configuration space write request. The IMU receives and parses the write request, storing the physical address allocated by the host in the corresponding storage space within the second CFG space.

[0082] Step S603: During the driver loading phase, receive and respond to the host's read request to access the second CFG space, and send the physical address of the second physical function device in the second CFG space to the host, so that the host writes the physical address of the second physical function device into the first physical function device of the data processing unit; receive and parse the host's write request to access the second BAR space to obtain the physical address of the first physical function device, and write the physical address of the first physical function device into the second BAR space. Specifically, during the host driver loading phase, the driver reads the second CFG space to obtain the physical address of the second physical function device, and then, through host software configuration, writes this address into the first physical address storage space in the first BAR space of the first physical function device of the DPU. At the same time, the host writes the physical address of the first physical function device on the DPU side (already obtained by the host in step S303 of Embodiment 1) into the third physical address storage space (Admin BAR0H / L) in the second BAR space of the first network interface card (NIC). These read and write requests are all answered by the IMU of the first NIC through emulation. This completes the exchange of physical function device addresses between the DPU and the NIC and establishes the address mapping basis for subsequent P2P DMA transmission between the NIC and the DPU.

[0083] As shown in Figure 4, in Embodiment 2 the host enumerates the physical function devices (APFs) exposed by the DPU and the NIC through the ECAM space, reads their CFG spaces to obtain the vendor ID and device ID, and assigns them BAR addresses in the system address space. The physical function devices exposed by the DPU and the NIC are mapped to different address regions of the host's system address space. The DPU and the NIC exchange the physical addresses of each other's physical function devices through the host's driver configuration, thereby establishing the address basis for P2P communication.
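The enumeration and address exchange flow of steps S601 to S603 can be modeled in a few lines. Everything concrete here is an assumption made for illustration: the `EmulatedPf` class, the dictionary keys, the vendor and device ID values, and the allocation base and BAR size. The sketch captures only the ordering of the host's actions, not real PCIe configuration cycles.

```python
class EmulatedPf:
    """Firmware-emulated physical function device with CFG and BAR spaces."""

    def __init__(self, vendor_id: int, device_id: int) -> None:
        self.cfg = {"vendor_id": vendor_id, "device_id": device_id, "bar0": None}
        self.bar = {"peer_addr": None}  # physical address storage space


def enumerate_and_exchange(dpu_pf: EmulatedPf, nic_pf: EmulatedPf,
                           alloc_base: int = 0x8000_0000,
                           bar_size: int = 0x10000) -> None:
    """Model the host: enumerate both devices, then cross-write their addresses."""
    next_addr = alloc_base
    for pf in (dpu_pf, nic_pf):
        # Enumeration: the host reads the IDs, then writes an allocated BAR address.
        _ids = (pf.cfg["vendor_id"], pf.cfg["device_id"])
        pf.cfg["bar0"] = next_addr
        next_addr += bar_size
    # Driver loading: each side receives the peer's physical address.
    dpu_pf.bar["peer_addr"] = nic_pf.cfg["bar0"]
    nic_pf.bar["peer_addr"] = dpu_pf.cfg["bar0"]
```

After the exchange, each device holds the address it needs to target the peer directly, so later pointer updates and data transfers do not have to pass through host memory.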

[0084] In some embodiments, the first data includes a plurality of data packets; The method includes: During the process of continuously reading multiple data packets of the first data from the first data buffer to be sent, the consumer pointer value of the third sending queue is updated once for each data packet read. Specifically, when the first network card continuously retrieves multiple data packets from the first data buffer to be sent in the DPU, it updates the local third send queue consumer pointer value once after successfully receiving each data packet (for example, after moving the data from the PCIe bus to the local SRAM), so that it points to the next position to be read.

[0085] If the distance between the third sending queue producer pointer value and the third sending queue consumer pointer value is less than or equal to a preset distance threshold, then data reading is paused, and the third sending queue consumer pointer value is written into the first physical function device of the data processing unit according to the physical address of the first physical function device. When the distance between the third sending queue producer pointer value and the third sending queue consumer pointer value is greater than the preset distance threshold, the remaining data is read again. Specifically, to prevent data overwriting due to untimely processing by the first network interface card (NIC), after each update of the third transmit queue consumer pointer value, the first NIC checks the difference between the third transmit queue producer pointer value and the third transmit queue consumer pointer value (i.e., the amount of unprocessed data on the ring). If the difference is less than or equal to a preset distance threshold (e.g., a 16-bit safety window), it indicates that the writable space on the DPU side is about to be exhausted. At this time, the first NIC pauses initiating new DMA reads and calls the SDMA / DMA module to write the current third transmit queue consumer pointer value back to the first transmit queue consumer pointer storage space of the DPU via a PCIe write operation, sending a doorbell signal to the DPU. After receiving the doorbell signal, if there is still data to be sent, the DPU continues to write new data to the first data buffer to be sent, thereby increasing the distance between the third transmit queue producer pointer value and the third transmit queue consumer pointer value. When the first NIC detects that the distance is greater than the preset distance threshold, it resumes the read operation.

[0086] If N data packets have been read consecutively, the third sending queue consumer pointer value is written into the first physical function device of the data processing unit according to the physical address of the first physical function device; where N is a preset quantity threshold.

[0087] Specifically, to avoid frequent PCIe doorbell write operations, when the number of data packets read consecutively by the first network card reaches the preset threshold N (e.g., 8 or 16), the first network card actively writes the current third send queue consumer pointer value to the first physical function device of the DPU, even if the safety window boundary has not been reached. This notifies the DPU that N data packets have been consumed and allows the DPU to release or reuse the corresponding send buffer resources.

[0088] The method in this embodiment can effectively reduce small data write transactions on the PCIe bus and improve bus utilization efficiency through a batch update mechanism.
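The two write-back conditions described above (the distance threshold and the count threshold N) can be combined into one decision function. The sketch assumes that the pointers count whole packets and that the window check takes precedence when both conditions hold; neither detail is stated in the text, and the function names are hypothetical.

```python
def doorbell_reason(pi, ci, since_last, distance_threshold, n):
    """Return why the NIC should write ci back to the DPU, or None."""
    if pi - ci <= distance_threshold:
        return "window"   # pause reading and notify the DPU
    if since_last >= n:
        return "batch"    # N packets consumed since the last doorbell
    return None


def consume(pi, ci, distance_threshold, n):
    """Read packets one at a time until the window check pauses reading.

    Returns the final consumer pointer and the list of doorbells issued.
    """
    doorbells, since_last = [], 0
    while ci < pi:
        ci += 1                      # one packet pulled into local SRAM
        since_last += 1
        reason = doorbell_reason(pi, ci, since_last, distance_threshold, n)
        if reason:
            doorbells.append((reason, ci))   # models the PCIe write of ci
            since_last = 0
        if reason == "window":
            break                    # resume only after the DPU advances pi
    return ci, doorbells
```

For example, with 20 packets pending, a distance threshold of 4, and N = 8, the sketch issues two doorbells for 16 packets read instead of 16 separate pointer writes.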

[0089] In some embodiments, reading the first data from the first data buffer to be sent of the data processing unit according to the physical address of the first physical function device, the third send queue producer pointer value, and the third send queue consumer pointer value includes: when reading any data packet of the first data, extracting the record header of the data packet, wherein the record header includes at least one of a flag bit, a data packet sequence number, a record length, a true length, and a timestamp; the flag bit is used to carry out-of-band information of the data; the data packet sequence number is used by the data receiver to determine whether there is packet loss; the record length is used to indicate the length of the data block; the true length is used to indicate the true length of the payload; and the timestamp is used to record the time information of the data packet; and determining the storage boundary of the data block according to the record length in the record header in order to read the payload, and / or extracting the payload from the data block according to the true length in the record header, and / or determining whether there is packet loss according to the data packet sequence number in the record header, and / or obtaining the out-of-band information of the data according to the flag bit in the record header. Specifically, when the first network interface card (NIC) reads data from the first data buffer to be sent of the DPU, each data packet is preceded by a 16-byte record header (format as shown in Table 1). The first NIC first reads this header and parses out its fields. Based on the record length (Record Length), the number of bytes in the entire data block (header + payload + padding) can be determined, which locates the start position of the next header. Based on the true length (Real Length), the payload can be accurately extracted from the data block, ignoring padding bytes. Based on the data packet sequence number (PSN), packet loss or out-of-order delivery can be detected (e.g., the received PSN does not follow consecutively from the expected PSN). Based on the flag bit (Flags), out-of-band information such as hardware-calculated checksums or priority markers can be obtained. These parsing operations are performed by the IMU firmware or hardware logic inside the first NIC, ensuring that the received data can be correctly unpacked and verified.
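The role of the record length and true length fields can be shown with a short pack and parse example. The field widths and ordering used here (16-bit Flags, PSN, Record Length, and Real Length followed by a 64-bit Timestamp, little-endian) are an assumption chosen only so that the header comes to 16 bytes; the authoritative layout is the one in Table 1. The 8-byte padding alignment and the function names are likewise hypothetical.

```python
import struct

# Assumed layout: flags, psn, record_len, real_len (u16 each), timestamp (u64).
HEADER = struct.Struct("<HHHHQ")  # 16 bytes in total


def build_record(psn, payload, flags=0, timestamp=0, align=8):
    """Concatenate header, payload, and zero padding into one data block."""
    padded = -(-len(payload) // align) * align        # round up to alignment
    header = HEADER.pack(flags, psn, HEADER.size + padded, len(payload), timestamp)
    return header + payload + bytes(padded - len(payload))


def parse_records(buf):
    """Walk a buffer of back-to-back records and return (psn, payload) pairs."""
    records, offset = [], 0
    while offset + HEADER.size <= len(buf):
        _flags, psn, record_len, real_len, _ts = HEADER.unpack_from(buf, offset)
        if record_len < HEADER.size:
            break                                     # malformed or empty slot
        start = offset + HEADER.size
        records.append((psn, bytes(buf[start:start + real_len])))
        offset += record_len                          # jump to the next header
    return records
```

The parser never inspects padding bytes: the record length moves it to the next header, and the true length bounds the payload slice, which mirrors the division of labor between the two fields described above.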

[0090] In some embodiments, the second data includes a plurality of data packets; The method includes: During the process of continuously writing multiple data packets of the second data into the first data buffer to be received, the producer pointer value of the third receiving queue is updated once for each data packet written. Specifically, when the first network card needs to send multiple data packets (such as DFX statistics) to the DPU in batches, after each DMA write operation of a data packet is completed (that is, the data is moved from the local SRAM / DMA module of the first network card to the first data buffer to be received in the DPU), the local third receive queue producer pointer value is updated once to point to the next free position to be written.

[0091] If the distance between the third receiving queue producer pointer value and the third receiving queue consumer pointer value is less than or equal to a preset distance threshold, then data writing is paused, and the third receiving queue producer pointer value is written into the first physical function device of the data processing unit according to the physical address of the first physical function device. When the distance between the third receiving queue producer pointer value and the third receiving queue consumer pointer value is greater than the preset distance threshold, the remaining data is written again. Specifically, to prevent the DPU's receive buffer from filling up, after each update of the third receive queue producer pointer value, the first network interface card (NIC) checks the difference between the third receive queue producer pointer value and the locally cached third receive queue consumer pointer value (updated by the DPU via a doorbell). If the difference is less than or equal to a preset distance threshold (indicating insufficient free buffer on the DPU side), new data writing is paused, and the SDMA / DMA module is invoked to write the current third receive queue producer pointer value to the DPU's first receive queue producer pointer storage space via a PCIe write operation. A doorbell signal is then sent to the DPU to notify it that new data is available for reading. After reading the data, the DPU updates the first receive queue consumer pointer value and writes it back to the first NIC's second physical function device via the doorbell signal, thereby updating the third receive queue consumer pointer value. When the distance between the third receive queue producer pointer value and the third receive queue consumer pointer value exceeds the preset distance threshold, the remaining data of the second data set is written.

[0092] If N data packets have been written consecutively, the third receive queue producer pointer value is written into the first physical function device of the data processing unit according to the physical address of the first physical function device, where N is a preset quantity threshold. Specifically, to reduce the number of doorbell write operations, when the number of data packets written consecutively by the first network card reaches the preset threshold N (e.g., 8 or 16), the first network card actively writes the current third receive queue producer pointer value to the first physical function device of the DPU, even if the safety window boundary has not been reached. This notifies the DPU that N new data packets are ready and requests the DPU to read them in time.

[0093] The method in this embodiment balances real-time performance and bus overhead through the aforementioned batch notification mechanism.

[0094] In some embodiments, writing the second data into the first data buffer of the data processing unit according to the physical address of the first physical functional device, the third receive queue producer pointer value, and the third receive queue consumer pointer value includes: Generate a corresponding record header for each data packet of the second data, and concatenate the generated record header with the payload of the corresponding data packet and write it into the first data buffer to be received. The record header includes at least one of a flag bit, a data packet sequence number, a record length, a true length, and a timestamp; the flag bit is used to carry out-of-band information; the data packet sequence number is used by the data receiver to determine whether there is packet loss; the record length is used to indicate the length of the data block; the true length is used to indicate the true length of the payload; and the timestamp is used to record the time information of the data packet.

[0095] Specifically, similar to the DPU's transmission direction, when the first network interface card (NIC) writes data packets to the DPU, it also constructs a 16-byte record header for each data packet (format as shown in Table 1), which is filled with Flags, PSN, Record Length, Real Length, and an optional Timestamp. Then, the record header is concatenated with the payload and written to the DPU's first receive data buffer via DMA. When the DPU reads the data, it correctly extracts the payload of each data packet, detects packet loss, and extracts out-of-band information by following the same record header parsing rules.

[0096] It should be noted that Embodiments 1 and 2 describe a bidirectional data transmission method for the same management channel from the DPU side and the NIC side, respectively. Specifically, the first transmit queue producer pointer value on the DPU side corresponds to the third transmit queue producer pointer value on the NIC side (the two are synchronized); the first transmit queue consumer pointer value on the DPU side corresponds to the third transmit queue consumer pointer value on the NIC side; similarly, the first receive queue producer pointer value on the DPU side corresponds to the third receive queue producer pointer value on the NIC side; and the first receive queue consumer pointer value on the DPU side corresponds to the third receive queue consumer pointer value on the NIC side. Through the cross-chip synchronization and doorbell mechanism of these pointers, efficient and reliable bidirectional data communication between the DPU and the NIC is achieved.

[0097] As shown in Figure 5, the management channel software stack in Embodiments 1 and 2 consists of three layers: the application layer directly sends and receives raw data (such as mapping tables and DFX information); the message layer encapsulates the data into a custom message format (including message type, operation object, etc.); and the data transmission layer further encapsulates the message into the Record format and calls the DMA / SDMA module for transmission, reception, and flow control.

[0098] For example, when the NIC sends the second data to the DPU, the NIC (producer) is responsible for encapsulation. It automatically generates a Record header and concatenates it with the data payload, ensuring that the buffer stores a uniformly formatted data stream. The DPU (consumer) is responsible for parsing. The DPU hardware or firmware first reads the Record header, parsing out boundary and length information to accurately extract the data payload. Simultaneously, the DPU uses the PSN field to check data integrity. If a PSN jump is detected, a retransmission mechanism can be triggered or an error log can be recorded, enhancing the reliability of the management channel. For instance, when the DPU reads data from the receive buffer, it first reads the 16-byte header, parses it to find a Record Length of 64 bytes and a PSN of 101. The DPU then reads the following 64-byte payload, checks the PSN, and finds that the previous packet's PSN was 99. Determining that a packet with PSN 100 has been lost, the DPU sends a packet loss alarm to the NIC.
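The PSN check in this example can be expressed as a small helper. The sketch assumes a 16-bit sequence number that wraps around and that packets arrive in order; handling duplicates or reordering is outside its scope. The function name `missing_psns` is hypothetical.

```python
def missing_psns(last_psn, psn, modulo=1 << 16):
    """Return the sequence numbers skipped between two received packets.

    Assumes in-order delivery; a duplicate PSN is not handled by this sketch.
    """
    gap = (psn - last_psn - 1) % modulo
    return [(last_psn + 1 + i) % modulo for i in range(gap)]
```

Applied to the example above, `missing_psns(99, 101)` yields `[100]`, which is the packet the DPU reports as lost, while consecutive sequence numbers yield an empty list.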

[0099] Embodiment 3: Embodiment 3 of this application provides a data processing unit, including a module for performing the method described in Embodiment 1 of this application.

[0100] Specifically, the data processing unit (DPU) includes at least a PCIe interface, an integrated management unit (IMU), a DMA / SDMA module, and internal memory. The IMU runs firmware configured to execute the steps described in Embodiment 1, including but not limited to: generating the first physical function device (APF), managing its CFG and BAR spaces, responding to the host's configuration read / write requests, writing data to be sent into the first data buffer to be sent based on the first transmit queue producer / consumer pointers, updating the pointers, sending a doorbell message to the peer network card via the SDMA module, and, in response to the network card's doorbell message, reading data from the first data buffer to be received and updating the receive queue consumer pointer. The SDMA module supports PCIe point-to-point (P2P) transmission, enabling data to be transferred directly between the DPU and the network card, bypassing the host. This data processing unit can be a specially designed chip or chipset used to offload host management tasks and achieve high-performance inter-card communication.

[0101] Embodiment 4: This embodiment provides a network interface card (NIC) that includes a module for performing the method described in Embodiment 2 of this application.

[0102] Specifically, the network interface card (NIC) includes at least a PCIe interface, an integrated management unit (IMU), and a DMA / SDMA module. The IMU runs firmware configured to execute the method steps described in Embodiment 2, including but not limited to: generating a second physical function device (APF); managing its CFG and BAR spaces; responding to the host's configuration read / write requests; pulling data from the DPU's first data buffer to be transmitted when the DPU-side producer pointer changes; updating the consumer pointer and writing back the doorbell; and actively writing data to be reported into the DPU's first data buffer to be received, updating the producer pointer, and sending the doorbell. The SDMA module likewise supports PCIe P2P transmission. The network interface card can be a SmartNIC or a dedicated network interface card that supports the DPU, and it works with the management channel to achieve high-bandwidth bidirectional communication.
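The NIC-side pull path (react to a changed producer pointer, read the pending bytes from the DPU's buffer, then write the consumer pointer back) might look like the sketch below. The class name `NicTxConsumer` and the two callbacks are assumptions: `p2p_read(offset, length)` models an SDMA P2P read from the DPU's buffer, and `p2p_write_cons(value)` models the write-back of the consumer pointer into the DPU's BAR space.

```python
class NicTxConsumer:
    """NIC-side reader of the DPU's transmit ring (illustrative names only)."""

    def __init__(self, ring_size: int, p2p_read, p2p_write_cons):
        self.ring_size = ring_size
        self.cons = 0                         # local copy of the consumer pointer
        self.p2p_read = p2p_read              # hypothetical SDMA read from the DPU buffer
        self.p2p_write_cons = p2p_write_cons  # hypothetical write-back into the DPU BAR

    def on_doorbell(self, prod: int) -> bytes:
        """Handle a producer pointer update and return the newly available bytes."""
        pending = prod - self.cons
        if not 0 <= pending <= self.ring_size:
            raise ValueError("producer pointer out of range")
        start = self.cons % self.ring_size
        # Read up to the end of the ring first, then wrap to offset 0 if needed.
        first = min(pending, self.ring_size - start)
        data = self.p2p_read(start, first)
        if first < pending:
            data += self.p2p_read(0, pending - first)
        self.cons = prod
        self.p2p_write_cons(self.cons)        # tells the DPU the space can be reused
        return data
```

Splitting the read into at most two contiguous transfers mirrors how a DMA engine would typically handle a wrapped region, although the source does not say how the actual hardware does this.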

[0103] Embodiment 5: This embodiment provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the method described in Embodiment 1 or Embodiment 2 of the present application.

[0104] Specifically, the computer-readable storage medium can be any tangible medium that contains or stores program instructions, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, a flash memory, etc. When the computer program is executed by a processor inside the DPU or the network interface card (such as a small core in the IMU or an independent CPU core), it enables the processor to execute the inter-card communication method described in Embodiment 1 or Embodiment 2, including steps such as device emulation, pointer update, doorbell sending, and data transfer. This storage medium can be integrated inside the DPU or network interface card chip, or can be used as an external firmware memory.

[0105] Embodiment 6: This embodiment provides a computer program product that includes a computer program. When executed by a processor, the computer program implements the method described in Embodiment 1 or Embodiment 2 of the present application.

[0106] Specifically, the computer program product can be in the form of a firmware image, driver software, or an SDK (software development kit). When the computer program is loaded into the processor of the DPU or the network interface card and executed, it enables the processor to implement the inter-card communication method described in Embodiment 1 or Embodiment 2. This computer program product can be downloaded through the network or pre-installed in the hardware, facilitating system integration and mass deployment.

[0107] The embodiments of this application have been described above. The description is exemplary, not exhaustive, and the application is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies available in the market, or to enable others skilled in the art to understand the embodiments disclosed herein.

Claims

1. A method for inter-card communication, applied to a data processing unit, characterized in that the data processing unit includes a first physical function device, which includes a first BAR space; the first BAR space includes at least a first physical address storage space, a first transmit queue producer pointer storage space, a first transmit queue consumer pointer storage space, and a first data buffer to be transmitted; the first physical address storage space is used to store the physical address of the second physical function device of the first network card; the first transmit queue producer pointer storage space is used to store the first transmit queue producer pointer value; the first transmit queue consumer pointer storage space is used to store the first transmit queue consumer pointer value; the first data buffer to be transmitted is used to store data to be transmitted; the method includes: obtaining first data to be sent, and writing the first data into the first data buffer to be transmitted according to the first transmit queue producer pointer value and the first transmit queue consumer pointer value; updating the first transmit queue producer pointer value according to the storage location of the first data in the first data buffer to be transmitted; and writing the updated first transmit queue producer pointer value into the second physical function device of the first network card according to the physical address of the second physical function device, so as to notify the first network card to read the first data; wherein the first transmit queue consumer pointer value is updated by the first network card after reading at least a portion of the first data.

2. The method of claim 1, wherein, The first BAR space also includes a first receive queue producer pointer storage space, a first receive queue consumer pointer storage space, and a first data to be received buffer. The first receive queue producer pointer storage space is used to store the first receive queue producer pointer value, the first receive queue consumer pointer storage space is used to store the first receive queue consumer pointer value, and the first data to be received buffer is used to store the data to be received. The method further includes: The first receiving queue consumer pointer value is updated according to the data reading status of the first receiving data buffer, and the updated first receiving queue consumer pointer value is written into the second physical function device of the first network card to notify the first network card to send data. In response to the first network interface card updating the first receive queue producer pointer value, the second data is read from the first data to be received buffer according to the updated first receive queue producer pointer value and the first receive queue consumer pointer value; Update the first receive queue consumer pointer value according to the storage location of the second data in the first receive data buffer; The updated first receive queue consumer pointer value is written into the second physical function device of the first network card according to the physical address of the second physical function device, so as to notify the first network card to determine that the second data has been received by the data processing unit according to the updated first receive queue consumer pointer value.

3. The method according to claim 1 or 2, characterized in that, The first physical functional device further includes a first CFG space, which stores the vendor ID, device ID, and physical address of the first physical functional device; The method further includes: Run the firmware to generate the first physical functional device; The first physical functional device is exposed to the host, and read requests from the host to access the first CFG space are received and responded to. The vendor ID and device ID of the first physical functional device are sent to the host. The write requests from the host to access the first CFG space are received and parsed to obtain the physical address of the first physical functional device. The physical address of the first physical functional device is written into the first CFG space. The physical address of the first physical functional device is obtained by the host through system address space allocation based on the vendor ID and device ID of the first physical functional device. The system receives and responds to the host's read request to access the first CFG space, sends the physical address of the first physical function device in the first CFG space to the host, so that the host writes the physical address of the first physical function device into the second physical function device of the first network card; it also receives and parses the host's write request to access the first BAR space to obtain the physical address of the first network card, and writes the physical address of the first network card into the first BAR space.

4. The method of claim 1, wherein, The first data includes multiple data packets; The method includes: During the process of continuously writing multiple data packets of the first data into the first data buffer to be sent, the first sending queue producer pointer value is updated once for each data packet written. If the distance between the first send queue producer pointer value and the first send queue consumer pointer value is less than or equal to a preset distance threshold, then data writing is paused, and the first send queue producer pointer value is written into the second physical function device of the first network card according to the physical address of the second physical function device. When the distance between the first send queue producer pointer value and the first send queue consumer pointer value is greater than the preset distance threshold, the remaining data is written again. If N data packets have been written consecutively, the first sending queue producer pointer value is written into the second physical function device of the first network card according to the physical address of the second physical function device; where N is a preset quantity threshold.

5. The method of claim 4, wherein, The step of writing the first data into the first data to be sent buffer based on the first sending queue producer pointer value and the first sending queue consumer pointer value includes: Generate a corresponding record header for each data packet of the first data, and concatenate the generated record header with the payload of the corresponding data packet and write it into the first data buffer to be sent. The record header includes at least one of a flag bit, a data packet sequence number, a record length, a true length, and a timestamp; the flag bit is used to carry out-of-band information; the data packet sequence number is used by the data receiver to determine whether there is packet loss; the record length is used to indicate the length of the data block; the true length is used to indicate the true length of the payload; and the timestamp is used to record the time information of the data packet.

6. The method of claim 2, wherein, The second data includes multiple data packets; The method includes: During the process of continuously reading multiple data packets of the second data from the first data buffer to be received, the consumer pointer value of the first receiving queue is updated once for each data packet read. If the distance between the first receiving queue producer pointer value and the first receiving queue consumer pointer value is less than or equal to a preset distance threshold, then data reading is paused, and the first receiving queue consumer pointer value is written into the second physical function device of the first network card according to the physical address of the second physical function device. When the distance between the first receiving queue producer pointer value and the first receiving queue consumer pointer value is greater than the preset distance threshold, the remaining data is read again. If N data packets have been read consecutively, the first receive queue consumer pointer value is written into the second physical function device of the first network card according to the physical address of the second physical function device; where N is a preset quantity threshold.

7. The method of claim 6, wherein, The step of reading the second data from the first data buffer based on the updated first receive queue producer pointer value and the first receive queue consumer pointer value includes: When reading any data packet of the second data, the record header of the data packet is extracted; wherein, the record header includes at least one of a flag bit, a data packet sequence number, a record length, a true length, and a timestamp; the flag bit is used to carry out-of-band information of the data; the data packet sequence number is used by the data receiver to determine whether there is packet loss; the record length is used to indicate the length of the data block; the true length is used to indicate the true length of the payload; the timestamp is used to record the time information of the data packet; The storage boundary of the data block is determined based on the record length in the record header to read the payload, and / or the payload is extracted from the data block based on the actual length in the record header, and / or the packet sequence number in the record header is used to determine whether there is packet loss, and / or the out-of-band information of the data is obtained based on the flag bit in the record header.

8. The method of claim 2, wherein, The first BAR space further includes a second physical address storage space, a second transmit queue producer pointer storage space, a second transmit queue consumer pointer storage space, a second receive queue producer pointer storage space, a second receive queue consumer pointer storage space, a second data to be transmitted buffer, and a second data to be received buffer. The second physical address storage space is used to store the physical address of the third physical function device of the second network card. The second transmit queue producer pointer storage space is used to store the second transmit queue producer pointer value. The second transmit queue consumer pointer storage space is used to store the second transmit queue consumer pointer value. The second receive queue producer pointer storage space is used to store the second receive queue producer pointer value. The second receive queue consumer pointer storage space is used to store the second receive queue consumer pointer value. The second data to be transmitted buffer is used to store data to be transmitted. The second data to be received buffer is used to store data to be received. 
The method includes: Obtain the third data to be sent, and write the third data into the second data to be sent buffer according to the producer pointer value of the second sending queue and the consumer pointer value of the second sending queue; Update the second sending queue producer pointer value according to the storage location of the third data in the second data to be sent buffer; The updated second transmit queue producer pointer value is written into the third physical function device of the second network interface card (NIC) according to the physical address of the third physical function device, so as to notify the second NIC to read the third data according to the updated second transmit queue producer pointer value; wherein, the second transmit queue consumer pointer value is updated by the second NIC after reading at least a portion of the third data; In response to the second network interface card updating the second receive queue producer pointer value, the fourth data is read from the second data to be received buffer according to the updated second receive queue producer pointer value and the second receive queue consumer pointer value; Update the consumer pointer value of the second receiving queue according to the storage location of the fourth data in the second data to be received buffer; The updated second receive queue consumer pointer value is written into the second network card according to the physical address of the third physical function device, so as to notify the second network card to determine, based on the updated second receive queue consumer pointer value, that at least a portion of the fourth data has been received by the data processing unit.

9. A method for inter-card communication, applied to a first network card, characterized in that the first network card includes a second physical function device; the second physical function device includes a second BAR space; the second BAR space includes at least a third physical address storage space, a third transmit queue producer pointer storage space, and a third transmit queue consumer pointer storage space; the third physical address storage space is used to store the physical address of the first physical function device of the data processing unit; the third transmit queue producer pointer storage space is used to store the third transmit queue producer pointer value; the third transmit queue consumer pointer storage space is used to store the third transmit queue consumer pointer value; the method includes: in response to the data processing unit updating the third transmit queue producer pointer value, reading first data from the first data buffer to be transmitted of the data processing unit according to the physical address of the first physical function device, the third transmit queue producer pointer value, and the third transmit queue consumer pointer value; updating the third transmit queue consumer pointer value according to the storage location of the first data in the first data buffer to be transmitted; and writing the updated third transmit queue consumer pointer value into the first physical function device of the data processing unit to notify the data processing unit that at least a portion of the first data has been read by the first network card.

10. The method of claim 9, wherein, The second BAR space also includes a third receive queue producer pointer storage space and a third receive queue consumer pointer storage space. The third receive queue producer pointer storage space is used to store the third receive queue producer pointer value, and the third receive queue consumer pointer storage space is used to store the third receive queue consumer pointer value. The method includes: The second data to be sent is obtained, and the second data is written into the first data to be received buffer of the data processing unit according to the physical address of the first physical function device, the producer pointer value of the third receiving queue, and the consumer pointer value of the third receiving queue. The producer pointer value of the third receiving queue is updated according to the storage location of the second data in the first data to be received buffer. The updated third receive queue producer pointer value is written to the first physical function device of the data processing unit to notify the data processing unit to read the second data from the first data to be received buffer.

11. The method of claim 10, wherein the second BAR space further includes a channel identifier storage space, which is used to store a channel identifier that uniquely corresponds to the first network card; the step of reading the first data from the first data buffer to be transmitted of the data processing unit according to the physical address of the first physical function device, the third transmit queue producer pointer value, and the third transmit queue consumer pointer value specifically includes: reading the first data from the first data buffer to be transmitted of the data processing unit according to the physical address of the first physical function device, the third transmit queue producer pointer value, the third transmit queue consumer pointer value, and the channel identifier; and the step of writing the second data into the first data buffer to be received of the data processing unit according to the physical address of the first physical function device, the third receive queue producer pointer value, and the third receive queue consumer pointer value specifically includes: writing the second data into the first data buffer to be received of the data processing unit according to the physical address of the first physical function device, the third receive queue producer pointer value, the third receive queue consumer pointer value, and the channel identifier.

12. The method according to any one of claims 9 to 11, characterized in that the second physical function device further includes a second CFG space, which stores the vendor ID, device ID, and physical address of the second physical function device; the method further includes: running firmware to generate the second physical function device; exposing the second physical function device to the host, receiving and responding to the host's read request to access the second CFG space, and sending the vendor ID and device ID of the second physical function device to the host; receiving and parsing the host's write request to access the second CFG space to obtain the physical address of the second physical function device, and writing the physical address of the second physical function device into the second CFG space; wherein the physical address of the second physical function device is obtained by the host through system address space allocation based on the vendor ID and device ID of the second physical function device; receiving and responding to the host's read request to access the second CFG space, and sending the physical address of the second physical function device in the second CFG space to the host, so that the host writes the physical address of the second physical function device into the first physical function device of the data processing unit; and receiving and parsing the host's write request to access the second BAR space to obtain the physical address of the second physical function device, and writing the physical address of the second physical function device into the second BAR space.

13. The method according to claim 9, characterized in that, The first data includes multiple data packets; The method includes: During the process of continuously reading multiple data packets of the first data from the first data buffer to be sent, the consumer pointer value of the third sending queue is updated once for each data packet read. If the distance between the third sending queue producer pointer value and the third sending queue consumer pointer value is less than or equal to a preset distance threshold, then data reading is paused, and the third sending queue consumer pointer value is written into the first physical function device of the data processing unit according to the physical address of the first physical function device. When the distance between the third sending queue producer pointer value and the third sending queue consumer pointer value is greater than the preset distance threshold, the remaining data is read again. If N data packets have been read consecutively, the third sending queue consumer pointer value is written into the first physical function device of the data processing unit according to the physical address of the first physical function device; where N is a preset quantity threshold.

14. The method according to claim 13, characterized in that, The step of reading first data from the first data buffer to be sent in the data processing unit according to the physical address of the first physical functional device, the third sending queue producer pointer value, and the third sending queue consumer pointer value includes: When reading any data packet of the first data, the record header of the data packet is extracted; wherein, the record header includes at least one of a flag bit, a data packet sequence number, a record length, a true length, and a timestamp; the flag bit is used to carry out-of-band information of the data; the data packet sequence number is used by the data receiver to determine whether there is packet loss; the record length is used to indicate the length of the data block; the true length is used to indicate the true length of the payload; the timestamp is used to record the time information of the data packet; The storage boundary of the data block is determined based on the record length in the record header to read the payload, and / or the payload is extracted from the data block based on the actual length in the record header, and / or the packet sequence number in the record header is used to determine whether there is packet loss, and / or the out-of-band information of the data is obtained based on the flag bit in the record header.

15. The method according to claim 10, characterized in that, The second data includes multiple data packets; The method includes: During the process of continuously writing multiple data packets of the second data into the first data buffer to be received, the producer pointer value of the third receiving queue is updated once for each data packet written. If the distance between the third receiving queue producer pointer value and the third receiving queue consumer pointer value is less than or equal to a preset distance threshold, then data writing is paused, and the third receiving queue producer pointer value is written into the first physical function device of the data processing unit according to the physical address of the first physical function device. When the distance between the third receiving queue producer pointer value and the third receiving queue consumer pointer value is greater than the preset distance threshold, the remaining data is written again. If N data packets have been written consecutively, the third receiving queue producer pointer value is written into the first physical function device of the data processing unit according to the physical address of the first physical function device; where N is a preset quantity threshold.

16. The method according to claim 15, characterized in that, The step of writing the second data into the first data buffer of the data processing unit according to the physical address of the first physical functional device, the producer pointer value of the third receiving queue, and the consumer pointer value of the third receiving queue includes: Generate a corresponding record header for each data packet of the second data, and concatenate the generated record header with the payload of the corresponding data packet and write it into the first data buffer to be received. The record header includes at least one of a flag bit, a data packet sequence number, a record length, a true length, and a timestamp; the flag bit is used to carry out-of-band information; the data packet sequence number is used by the data receiver to determine whether there is packet loss; the record length is used to indicate the length of the data block; the true length is used to indicate the true length of the payload; and the timestamp is used to record the time information of the data packet.

17. A data processing unit, characterized in that it includes a module for performing the method as described in any one of claims 1 to 8.

18. A network interface card (NIC), characterized in that it includes a module for performing the method as described in any one of claims 9 to 16.

19. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the method as described in any one of claims 1 to 16.

20. A computer program product, characterized in that it includes a computer program which, when executed by a processor, implements the method as described in any one of claims 1 to 16.