RDMA-based packet sending method, RDMA-based packet processing method, system, and electronic device

By carrying message identifiers and indication information in RDMA messages, out-of-order message reception is allowed, which solves the problem of low transmission efficiency in RDMA ordered transmission mode and achieves efficient network transmission and load balancing.

WO2026138245A1 · PCT designated stage · Publication Date: 2026-07-02 · ZTE CORP

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Application
Current Assignee / Owner: ZTE CORP
Filing Date: 2025-11-14
Publication Date: 2026-07-02

AI Technical Summary

Technical Problem

When using the order-preserving transmission mode provided by RDMA technology to transmit messages, the transmission efficiency is low, especially since packet loss or out-of-order delivery leads to high retransmission costs, affecting network transmission efficiency.

Method used

The sending end carries message identification and indication information in the message, allowing messages to be received out of order. The receiving end processes the messages according to the indication information, and only performs order preservation within the message, reducing the scope of retransmission.

Benefits of technology

By limiting retransmissions to within the message itself, network transmission efficiency and load balancing are improved, retransmission costs are reduced, and overall network utilization is increased.

✦ Generated by Eureka AI based on patent content.

[Figure: CN2025134952_02072026_PF_FP_ABST]

Abstract

The present application provides an RDMA-based packet sending method, an RDMA-based packet processing method, a system, and an electronic device. The packet sending method comprises: sending a first packet to a receiving end, wherein the first packet carries indication information and identification information of a message to which the first packet belongs, and the indication information is used for indicating whether out-of-order message reception is allowed.

Description

RDMA-based message sending and processing method and system, and electronic device

[0001] Cross-reference to Related Applications

[0002] The present application claims priority to the Chinese patent application No. 202411917494.X, filed on December 24, 2024, and entitled "RDMA-based message sending and processing method, system and electronic device", the content of which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

[0003] The present application relates to the field of network communication technology, and in particular to an RDMA-based message sending and processing method, system and electronic device.

BACKGROUND

[0004] Remote Direct Memory Access (RDMA) is a network communication technology that allows computers to transfer data directly between memories over a network. RDMA technology can provide multiple transmission modes at the transport layer, the most commonly used being the ordered transmission mode, such as the Reliable Connection (RC) mode used by RoCEv2 and the Reliable Ordered Delivery (ROD) mode used by the Ultra Ethernet Consortium (UEC). In the related art, when the ordered transmission mode provided by RDMA technology is used for data transmission, the sending end splits the data to be sent into packets, and the receiving end checks whether the received packets are consecutive. If they are not, the receiving end considers that packet loss or reordering has occurred and triggers retransmission, thereby ensuring ordered transmission of the packets. In practice, however, this transmission mode incurs a high retransmission cost when packet loss or reordering occurs, which reduces the transmission efficiency of the network.

SUMMARY

[0005] The present application provides an RDMA-based message sending and processing method, system and electronic device, which is used to solve the problem of low transmission efficiency when using the ordered transmission mode provided by the RDMA technology to transmit messages.

[0006] In a first aspect, an RDMA-based packet sending method is provided, which is applied to a sending end and includes: sending a first packet to a receiving end, the first packet carrying indication information and identification information of a message to which the first packet belongs, the indication information being used to indicate whether inter-message out-of-order reception is allowed.

[0007] In a second aspect, an RDMA-based packet processing method is provided, which is applied to a receiving end and includes: receiving a first packet sent by a sending end, the first packet carrying indication information and identification information of a message to which the first packet belongs, the indication information being used to indicate whether inter-message out-of-order reception is allowed; and processing the first packet according to the indication information and the identification information, the processing including inter-message out-of-order reception or inter-message in-order reception of the message to which the first packet belongs.

[0008] In a third aspect, an RDMA-based packet sending and processing system is provided, which includes a sending end and a receiving end, wherein: the sending end sends a first packet to the receiving end, the first packet carrying indication information and identification information of a message to which the first packet belongs, the indication information being used to indicate whether inter-message out-of-order reception is allowed; and the receiving end receives the first packet and processes it according to the indication information and the identification information, the processing including inter-message out-of-order reception or inter-message in-order reception of the message to which the first packet belongs.

[0009] In a fourth aspect, an electronic device is provided, which includes: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the method according to the first aspect or the second aspect.

[0010] In a fifth aspect, a computer-readable storage medium is provided. When instructions stored in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the method according to the first aspect or the second aspect.

[0011] In a sixth aspect, a computer program product is provided, which includes a non-transitory computer readable storage medium storing a computer program, the computer program being operable to cause a computer to perform some or all of the steps in the method according to the first aspect, or to perform some or all of the steps in the method according to the second aspect.

BRIEF DESCRIPTION OF DRAWINGS

[0012] In order to more clearly illustrate the technical solutions in the present application or the prior art, the drawings needed in the embodiments or prior art description will be briefly introduced as follows. Obviously, the drawings in the following description are only some embodiments described in the present application, and for those skilled in the art, other drawings can also be obtained without creative labor.

[0013] Figure 1 is a schematic diagram of an embodiment of this application in which the application layer does not require inter-message ordering;

[0014] Figure 2 is a flowchart of an embodiment of the message sending method based on RDMA according to this application;

[0015] Figure 3 is a schematic diagram of an embodiment of this application showing the application layer sending the first parameter through an extended programming interface;

[0016] Figure 4 is a schematic diagram of load balancing at the sending end according to an embodiment of this application;

[0017] Figure 5 is a schematic diagram of a retransmission message sent by the sender according to an embodiment of this application;

[0018] Figure 6 is a flowchart illustrating an embodiment of the message processing method based on RDMA according to this application;

[0019] Figure 7 is a schematic diagram of the message processing process according to an embodiment of this application;

[0020] Figure 8 is a schematic diagram of message-granular load balancing according to an embodiment of this application;

[0021] Figure 9 is a schematic diagram of the basic encapsulation format of a message according to an embodiment of this application;

[0022] Figure 10 is a schematic diagram illustrating an embodiment of this application that extends the basic encapsulation format of a message;

[0023] Figure 11 is a schematic diagram illustrating an embodiment of this application that extends the semantic header encapsulation format of a message;

[0024] Figure 12 is a schematic diagram illustrating an embodiment of this application that extends the encapsulation format of NACK messages;

[0025] Figure 13 is a schematic diagram of the structure of an electronic device according to an embodiment of this application;

[0026] Figure 14 is a schematic diagram of the structure of a message transmission device based on RDMA according to an embodiment of this application;

[0027] Figure 15 is a schematic diagram of the structure of an electronic device according to an embodiment of this application;

[0028] Figure 16 is a schematic diagram of the structure of a message processing device based on RDMA according to an embodiment of this application.

DETAILED DESCRIPTION

[0029] RDMA is a network communication technology that employs mechanisms such as "zero-copy," kernel bypass, and hardware offloading to allow computers to directly transfer data between memory locations over a network. By using RDMA for data transfer, throughput can be increased while end-to-end communication latency is reduced, making RDMA a crucial technology in high-performance computing and intelligent data center networks.

[0030] RDMA technology provides multiple transmission modes at the transport layer. The widely used RoCEv2, for example, includes modes such as reliable connection (RC) and unreliable datagram (UD). RC is the most commonly used mode and provides reliable, ordered transmission at the packet level. For example, the sender divides the data to be sent into packets, and each packet carries a packet sequence number (PSN). In RC mode, the receiver checks whether the received PSNs are consecutive; if they are not, it treats this as packet loss by default and triggers retransmission. Other RDMA technologies are similar. For example, the currently popular UEC provides reliable ordered delivery (ROD) and reliable unordered delivery (RUD) modes at the transport layer. In ROD mode, the receiver's transport layer orders the received packets. In RUD mode, the receiver delivers packets directly to host memory, so the packets of the same message may arrive out of order; when the transport layer judges that a packet is lost (typically based on an out-of-order threshold), a selective retransmission mechanism must be used. Owing to the shortcomings of selective retransmission, current commercial RDMA network cards still use the go-back-N retransmission mechanism in production environments.

[0031] In the related art, when transmitting packets using an ordered transmission mode (such as the RC mode used by RoCEv2 or the ROD mode used by UEC), since the packets need to be ordered, once the packets are lost or out of order, the receiver will trigger retransmission, resulting in high retransmission cost and affecting the transmission efficiency of the network.

[0032] In some application scenarios, the transport layer provides programming interfaces (Verbs, libfabric, etc.) to the application layer, and application-layer programs may require intra-message ordering but not inter-message ordering. As shown in Figure 1, the sending application layer needs to send two high-performance computing messages, each carrying matrix data and each segmented into four packets. The receiving application layer needs to receive these two messages and perform matrix addition. In this case, intra-message ordering is required; otherwise, the data carried by a message will be incorrect. However, because matrix addition is commutative, the arrival order of the messages does not affect the calculation result, so inter-message ordering is not required. Other scenarios, such as mathematical multiplication and solving linear equations, satisfy the commutative law and similarly do not require inter-message ordering. Thus, if the transport layer could provide a processing method that ensures only intra-message ordering and not inter-message ordering, it could significantly reduce the scope of retransmission caused by packet loss or reordering in traditional reliable connections while still satisfying the above application scenarios, thereby improving network transmission efficiency. However, current RDMA technology lacks such a fine-grained transmission processing mechanism.
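The distinction drawn in the Figure 1 scenario can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each message is a 2x2 matrix and that each of its four packets carries one element; the helper names `reassemble` and `add_matrices` are hypothetical and do not come from this application.

```python
def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def reassemble(packets):
    """Rebuild a 2x2 matrix from four single-element packets, in arrival order."""
    return [packets[0:2], packets[2:4]]


# Two messages, each segmented into four packets (assumed: one element per packet).
msg1_packets = [1, 2, 3, 4]
msg2_packets = [10, 20, 30, 40]

m1 = reassemble(msg1_packets)
m2 = reassemble(msg2_packets)

# Inter-message order does not matter: addition is commutative.
sum_12 = add_matrices(m1, m2)
sum_21 = add_matrices(m2, m1)

# Intra-message order does matter: swapping two packets corrupts the matrix.
scrambled = reassemble([2, 1, 3, 4])
```

Here `sum_12` and `sum_21` are identical, while `scrambled` differs from `m1`, which is why only intra-message ordering has to be preserved in this scenario.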

[0033] This application provides an RDMA-based method, system, and electronic device for packet sending and processing. When transmitting over RDMA, the sending end and the receiving end can transmit packets on a per-message basis. When sending a first packet to the receiving end, the sending end can carry in it the identification information of the message to which the first packet belongs, together with indication information indicating whether inter-message out-of-order reception is allowed. After receiving the first packet, the receiving end can process it according to the identification information and indication information it carries. Thus, when the first packet indicates that inter-message out-of-order reception is allowed, the receiving end only needs to preserve the order of packets within a message and does not need to preserve order between messages. That is, the receiving end does not trigger retransmission when messages arrive out of order relative to one another; it triggers retransmission only when packets within a message are lost or out of order. This limits the retransmission scope to the message itself, thereby improving transmission efficiency.

[0034] It should be noted that the technical solutions provided in this application embodiment can be applied to routers or switches and their corresponding configuration units.

[0035] To enable those skilled in the art to better understand the technical solutions in this application, the technical solutions in this application will be clearly and completely described below with reference to the accompanying drawings of one or more embodiments. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort should fall within the protection scope of this application.

[0036] The terms "first," "second," etc., used in this application and the claims are used to distinguish similar objects and not to describe a specific order or sequence. It should be understood that such use of data can be interchanged where appropriate so that this application can be implemented in orders other than those illustrated or described herein. Furthermore, in this application and the claims, "and / or" indicates at least one of the connected objects, and the character " / " generally indicates that the preceding and following objects are in an "or" relationship.

[0037] The technical solutions provided by the various embodiments of this application are described in detail below with reference to the accompanying drawings.

[0038] Figure 2 is a flowchart illustrating an embodiment of the message transmission method based on RDMA according to this application. The message transmission method shown in Figure 2 is applied to the sending end; that is, the message transmission method shown in Figure 2 can be executed by software or hardware installed on the sending end. In some embodiments, the message transmission method shown in Figure 2 can be applied to the transport layer of the sending end. The message transmission method shown in Figure 2 is described below.

[0039] S202: Send a first packet to the receiving end. The first packet carries indication information and identification information of the message to which the first packet belongs. The indication information indicates whether inter-message out-of-order reception is allowed.

[0040] When transmitting over RDMA, the sending end can transmit packets on a per-message basis. For example, as shown in Figure 1, when the sending end sends msg1 and msg2 to the receiving end, it can split msg1 and msg2 into four packets each and then send these packets to the receiving end in sequence. In this embodiment, taking the sending of the first packet as an example, the sending end can carry indication information and identification information in the first packet. The indication information indicates whether inter-message out-of-order reception is allowed. For example, when the sending end sends msg1 and msg2 to the receiving end, each packet sent carries indication information that tells the receiving end whether msg1 and msg2 may be received out of order relative to each other. The identification information identifies the message to which the first packet belongs. For example, if the first packet was obtained by splitting msg1, then the identification information is that of msg1, such as the ID of msg1.
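The tagging described in this step can be sketched as follows. This is only an illustration under stated assumptions: the `Packet` class, the field names `msg_id` and `allow_ooo`, the `segment` helper, and the fixed payload size are hypothetical and are not defined by this application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    msg_id: int       # identification information of the owning message
    allow_ooo: bool   # indication information: inter-message out-of-order allowed?
    psn: int          # packet sequence number
    payload: bytes


def segment(msg_id, data, chunk_size, first_psn, allow_ooo):
    """Split one message into packets that all carry the same msg_id and flag."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [
        Packet(msg_id, allow_ooo, first_psn + i, chunk)
        for i, chunk in enumerate(chunks)
    ]


# msg1 and msg2 are each split into four packets, as in the Figure 1 scenario.
msg1_packets = segment(1, b"A" * 16, 4, first_psn=1, allow_ooo=True)
msg2_packets = segment(2, b"B" * 16, 4, first_psn=5, allow_ooo=True)
```

Every packet produced from msg1 carries `msg_id == 1`, so a receiver can tell which message a packet belongs to without relying on arrival order.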

[0041] In this way, because the sending end carries in the first packet both the message identification information and the indication information on whether inter-message out-of-order reception is allowed, the receiving end only needs to preserve the order of packets within a message when the indication information allows inter-message out-of-order reception; it does not need to preserve order between messages. In other words, the receiving end does not trigger retransmission when messages arrive out of order relative to one another, and triggers retransmission only when packets within a message are lost or out of order. This limits the retransmission scope to the message itself and thus improves the transmission efficiency of the network.

[0042] In some implementations, sending a first packet from the sending end to the receiving end may include: receiving a first parameter sent by the application layer, the first parameter indicating whether inter-message out-of-order reception is allowed; and sending the first packet to the receiving end according to the first parameter.

[0043] In this embodiment, the sending application layer can pass a first parameter to the sending transport layer according to whether inter-message ordering is required. The first parameter indicates whether inter-message out-of-order reception is allowed. For example, if the application layer does not require inter-message ordering, the first parameter can indicate that inter-message out-of-order reception is allowed; if the application layer requires inter-message ordering, the first parameter can indicate that it is not allowed.

[0044] To facilitate the application layer in passing the first parameter, in some implementations, the programming interface provided by the transport layer to the application layer can be extended. This allows the application layer to send the first parameter to the transport layer through the extended programming interface, and the transport layer to receive the first parameter sent by the application layer based on the extended programming interface. The extended programming interface may include an extended first field, which can be used to carry the first parameter. In other words, when extending the programming interface, a first field can be added to the interface so that the application layer can use this first field to transmit its message ordering requirement to the transport layer.

[0045] It should be noted that different RDMA technologies use different programming interfaces. For example, RoCEv2 uses libibverbs, UEC uses libfabric, Intel uses PSM2, and Cisco uses usNIC. While the details of these programming interfaces differ, the extension methods are similar. For instance, in the IB verbs interface, new transport sub-modes can be added to the ib_create_qp interface parameter ibv_qp_init_attr to convey application layer requirements. As shown in Figure 3, when qp_type is RC and qp_sub_type = true, the application layer does not require inter-message ordering, meaning that it allows inter-message out-of-order reception.

[0046] After receiving the first parameter sent by the application layer, the sending end can send the first packet to the receiving end based on the first parameter. Specifically, the sending end sets the indication information carried in the first packet according to the first parameter. For example, if the first parameter indicates that inter-message out-of-order reception is allowed, the indication information carried in the first packet indicates that it is allowed; if the first parameter indicates that inter-message out-of-order reception is not allowed, the indication information carried in the first packet indicates that it is not allowed.
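How the first parameter might flow from the application layer into the indication information can be sketched as follows. This is a Python model only, not the verbs API: `QpInitAttr` is a hypothetical stand-in for the extended attribute structure, with `qp_type` and `qp_sub_type` named after the fields shown in Figure 3, and `indication_for` is an assumed helper.

```python
from dataclasses import dataclass
from enum import Enum


class QpType(Enum):
    RC = "RC"   # reliable connection
    UD = "UD"   # unreliable datagram


@dataclass
class QpInitAttr:
    """Hypothetical model of the extended queue-pair attributes."""
    qp_type: QpType
    qp_sub_type: bool = False   # True: inter-message ordering is NOT required


def indication_for(attr: QpInitAttr) -> bool:
    """Value the transport layer would place in each packet's indication field.

    Assumption: the relaxed sub-mode is only meaningful on an RC queue pair.
    """
    return attr.qp_type is QpType.RC and attr.qp_sub_type


relaxed = QpInitAttr(QpType.RC, qp_sub_type=True)
strict = QpInitAttr(QpType.RC)
```

With these attributes, `indication_for(relaxed)` yields `True` (out-of-order reception between messages allowed) and `indication_for(strict)` yields `False`.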

[0047] In some implementations, the first packet can be extended as needed to carry message identification information and indication information indicating whether inter-message out-of-order reception is allowed. Regarding message identification information: it is not currently defined in RoCEv2 service packets but is defined in UEC transport layer packets, so RoCEv2 service packets need to be extended accordingly, while UEC transport layer packets do not. Regarding the indication information: it is not currently defined in either RoCEv2 service packets or UEC transport layer packets, so both need to be extended so that they can carry it.

[0048] A packet can be extended according to the packet format of the RDMA technology used. For the indication information indicating whether inter-message out-of-order reception is allowed, in one example, a 1-bit flag can be used. For instance, when the flag is true (or the flag value is 1), inter-message out-of-order reception is allowed; when the flag is false (or the flag value is 0), it is not allowed.
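A 1-bit flag of this kind can be sketched with ordinary bit operations. The bit position and the one-byte flags field used here are assumptions made for illustration; this application does not fix where the bit sits in the header.

```python
OOO_FLAG_BIT = 0x01  # assumed position of the 1-bit flag inside a one-byte field


def set_ooo_flag(flags: int, allow: bool) -> int:
    """Return the flags byte with the out-of-order bit set or cleared."""
    if allow:
        return (flags | OOO_FLAG_BIT) & 0xFF
    return flags & ~OOO_FLAG_BIT & 0xFF


def ooo_allowed(flags: int) -> bool:
    """True when the flag value is 1, i.e. inter-message reordering is allowed."""
    return bool(flags & OOO_FLAG_BIT)


flags_allowed = set_ooo_flag(0b1000_0000, True)
flags_denied = set_ooo_flag(flags_allowed, False)
```

The other bits of the byte are left untouched, so the flag can share a field with existing header bits.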

[0049] In some implementations, the first packet may also carry an opcode, which characterizes the current transmission mode, such as reliable mode or unreliable mode. When sending the first packet to the receiver, the sender may also perform load balancing based on the identification information in the first packet, provided that a first condition is met. The first condition is that the opcode indicates reliable mode (e.g., RC0), the first packet carries message identification information and indication information, and the indication information indicates that inter-message out-of-order reception is permitted. When this condition is met, the sender is allowed to perform load balancing at message granularity.

[0050] Currently widely used flow-level hash routing algorithms are based on the IP 5-tuple of service flows, which suffers from numerous problems such as hash collision polarization. When the sender determines that load balancing can be performed at the message granularity level, it can use the message's identifier information as input to the hash algorithm. In this case, packets belonging to the same message (i.e., packets with the same identifier information) can be hashed to the same path, thus ensuring the packet order within the message. Packets belonging to different messages (i.e., packets with different identifier information) may be sent to different paths, thereby improving the load balancing effect. In other words, when performing load balancing based on identifier information, the sender can send packets with the same identifier information to the same transmission path, and send packets with different identifier information to the same or different transmission paths.

[0051] Please refer to Figure 4. The sending end sends msg1, msg2, msg3, and msg4 to the receiving end. Each message contains multiple packets (not shown in Figure 4). When the sending end determines that load balancing can be performed at the message granularity level, the sending method for each message can be as shown in Figure 4. In Figure 4, when performing load balancing, the sending end sends packets belonging to the same message to a single transmission path, and sends packets belonging to different messages to the same or different transmission paths. Taking the first switch connected to the sending end in Figure 4 as an example, when forwarding msg1, msg2, msg3, and msg4, each message is forwarded through a single transmission path, ensuring that packets within the same message are sent to the same path without being split. Different messages are forwarded through the same or different transmission paths; that is, msg1 and msg4 are sent to one transmission path, and msg2 and msg3 are sent to another, allowing packets within different messages to be sent to the same or different transmission paths.
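Message-granular path selection can be sketched as follows. The sketch makes several assumptions that are not part of this application: CRC32 stands in for whatever hash the device actually uses, the flow 5-tuple is mixed with the message identifier, and `select_path` with its parameters is a hypothetical helper.

```python
import zlib


def select_path(five_tuple, num_paths, *, reliable, msg_id=None, allow_ooo=False):
    """Pick a path index for one packet.

    Message granularity is used only when the first condition holds:
    reliable mode, a message identifier is present, and inter-message
    out-of-order reception is allowed. Otherwise fall back to flow hashing.
    """
    if reliable and msg_id is not None and allow_ooo:
        key = repr((five_tuple, msg_id))
    else:
        key = repr(five_tuple)
    return zlib.crc32(key.encode()) % num_paths


flow = ("10.0.0.1", "10.0.0.2", 17, 49152, 4791)

# All packets of msg1 hash to one path, so intra-message order is preserved.
msg1_paths = {
    select_path(flow, 4, reliable=True, msg_id=1, allow_ooo=True) for _ in range(4)
}

# Without the flag, the message identifier is ignored and the flow stays on one path.
strict_paths = {
    select_path(flow, 4, reliable=True, msg_id=m, allow_ooo=False) for m in (1, 2, 3, 4)
}
```

Different message identifiers may or may not land on different paths, which matches the "same or different transmission paths" behaviour described for Figure 4.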

[0052] It should be noted that when the sender performs load balancing based on message granularity, the load balancing effect is related to the message size. Generally, the smaller the message, the better the load balancing effect. The message size can be selected according to different application characteristics, and no limitation is made here.

[0053] In some implementations, when the sending end sends the first packet to the receiving end, the first packet may also carry a Packet Sequence Number (PSN), a Start Of Msg (SOM) marker, an End Of Msg (EOM) marker, an identifier of the Source Queue Pair (SQP), and an identifier of the Destination Queue Pair (DQP). This information can be used by the receiving end to process the first packet. The processing flow is described in the embodiment shown in Figure 6 and is not detailed here. The PSN represents the order of packets and can be used by the receiving end to preserve packet order. The SOM marker indicates whether a packet is the first packet of the message to which it belongs. In one example, the SOM marker uses 1 bit: a value of 1 indicates that the packet is the first packet of its message, and a value of 0 indicates that it is not. The EOM marker indicates whether a packet is the last packet of the message to which it belongs. In one example, the EOM marker also uses 1 bit: a value of 1 indicates that the packet is the last packet of its message, and a value of 0 indicates that it is not. The identifier of the source queue pair can be the ID of the source queue pair, and the identifier of the destination queue pair can be the ID of the destination queue pair.
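The fields listed above can be pictured with a small sketch that derives the SOM and EOM bits for each packet of a message. The `Header` class, its field names, and `headers_for_message` are hypothetical; field widths and on-wire layout are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    psn: int         # packet sequence number
    som: int         # 1 if this is the first packet of its message, else 0
    eom: int         # 1 if this is the last packet of its message, else 0
    sqp: int         # source queue pair ID
    dqp: int         # destination queue pair ID
    msg_id: int      # identification information of the message
    allow_ooo: bool  # indication information


def headers_for_message(msg_id, first_psn, count, sqp, dqp, allow_ooo):
    """Build the headers for a message split into `count` packets."""
    return [
        Header(
            psn=first_psn + i,
            som=int(i == 0),
            eom=int(i == count - 1),
            sqp=sqp,
            dqp=dqp,
            msg_id=msg_id,
            allow_ooo=allow_ooo,
        )
        for i in range(count)
    ]


msg1_headers = headers_for_message(1, first_psn=1, count=4, sqp=7, dqp=9, allow_ooo=True)
single = headers_for_message(2, first_psn=5, count=1, sqp=7, dqp=9, allow_ooo=True)
```

A message that fits in a single packet has both bits set to 1 on that packet.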

[0054] In some implementations, where the indication information carried in the first packet indicates that inter-message out-of-order reception is permitted, the sending end may also perform the following operations: receive a NACK packet sent by the receiving end, the NACK packet being sent when the receiving end determines that a packet within the message to which the first packet belongs has been lost or is out of order; and retransmit packets based on the NACK packet.

[0055] In this embodiment, when the indication information carried in the first packet indicates that inter-message out-of-order reception is permitted, the receiving end can perform only intra-message ordering and not inter-message ordering. That is, it needs to receive the packets of the message to which the first packet belongs in sequence, while different messages may be received out of order relative to one another. However, when the receiving end performs intra-message ordering, packets within that message may be lost or arrive out of order. In this case, the receiving end can trigger retransmission and send a NACK packet to the sending end. Upon receiving the NACK packet, the sending end can retransmit packets based on it.

[0056] In some implementations, to limit the retransmission scope to a single message, the NACK packet may carry the identification information of the message to which the packets to be retransmitted belong (i.e., the identification information of the message to which the first packet belongs) and the retransmission start packet sequence number. Thus, when retransmitting based on the NACK packet, the sender can use the identification information and the retransmission start packet sequence number carried in the NACK packet to resend only the packets of the identified message, starting from that sequence number. This limits the retransmission scope to the given message, reduces the number of retransmitted packets, lowers the retransmission cost, and improves network transmission efficiency.

[0057] To enable the NACK packet to carry message identification information, some implementations can extend the NACK packet as needed. That is, if the current NACK packet format does not define message identification information, it needs to be extended to include it. In one example, the NACK packet can also carry indication information indicating that inter-message out-of-order reception is allowed; in this case, the NACK packet also needs to be extended accordingly to include this indication information.

[0058] Figure 5 is a schematic diagram of the retransmission of a message by the sending end according to an embodiment of this application.

[0059] In Figure 5, the sender sends msg1 and msg2 to the receiver. msg1 contains four packets with PSNs 1, 2, 3, and 4, while msg2 contains five packets with PSNs 5, 6, 7, 8, and 9. The sender's application layer allows inter-message out-of-order reception. While receiving packets, the receiver finds that the packet with PSN = 3 is lost, triggers retransmission, and sends a NACK packet to the sender. The NACK packet carries the message identifier msg1, the retransmission start packet sequence number PSN = 3, and the indication information msg_order_flag = true, which indicates that inter-message out-of-order reception is allowed. After receiving the NACK packet, the sender can retransmit only the packets with PSN = 3 and PSN = 4 in msg1, i.e., the retransmission range is PSN = {3~4}. It should be noted that, since inter-message out-of-order reception is allowed, msg2 may arrive at the receiving end's transport layer and be delivered to the receiving host's memory before the packets with PSN 3 and 4 of msg1.
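The receiver-side behaviour in this scenario can be sketched as follows. This is a simplified model under stated assumptions, not the processing flow of Figure 6: the `Receiver` class is hypothetical, the first packet of each message (SOM) is assumed to arrive, packets after a gap are discarded until retransmission, and PSN wraparound is ignored.

```python
class Receiver:
    """Per-message ordering only: each message tracks its own next PSN."""

    def __init__(self):
        self.expected = {}   # msg_id -> next PSN expected within that message
        self.delivered = []  # msg_ids in the order they were completed
        self.nacks = []      # (msg_id, retransmission start PSN) sent so far

    def on_packet(self, msg_id, psn, som=False, eom=False):
        if som:
            self.expected[msg_id] = psn
        want = self.expected.get(msg_id)
        if want is None or psn < want:
            return  # SOM not yet seen, or duplicate: not handled in this sketch
        if psn > want:
            # Gap inside this message only: request retransmission from `want`.
            if (msg_id, want) not in self.nacks:
                self.nacks.append((msg_id, want))
            return
        self.expected[msg_id] = psn + 1
        if eom:
            self.delivered.append(msg_id)


rx = Receiver()
# msg1: PSN 3 is lost in flight.
rx.on_packet("msg1", 1, som=True)
rx.on_packet("msg1", 2)
rx.on_packet("msg1", 4, eom=True)
# msg2 arrives complete and is delivered without waiting for msg1.
for psn in (5, 6, 7, 8, 9):
    rx.on_packet("msg2", psn, som=(psn == 5), eom=(psn == 9))
# Retransmitted packets 3 and 4 of msg1 arrive afterwards.
rx.on_packet("msg1", 3)
rx.on_packet("msg1", 4, eom=True)
```

In this run a single NACK for `("msg1", 3)` is recorded, and msg2 is completed before msg1, matching the note that msg2 may be delivered first.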

[0060] Assuming that the sending application layer does not allow out-of-order message reception, then in Figure 5, when a message with PSN=3 is lost, the retransmission range of the sending end is PSN={3~9}. However, when out-of-order message reception is allowed, the retransmission range of the sending end is PSN={3~4}. It can be seen that when the sending application layer allows out-of-order message reception, the retransmission range of the sending end can be reduced, thereby improving transmission efficiency.
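The two retransmission ranges compared above can be reproduced with a minimal sketch. The `Nack` class, its field names `msg_id` and `start_psn`, and the helper `retransmit_range` are hypothetical names introduced only for illustration; the application does not define a concrete NACK layout, and the ordered-mode branch assumes that everything from the start PSN up to the last sent PSN is resent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nack:
    """Hypothetical extended NACK carrying the fields described in the text."""
    msg_id: str            # identifier of the message whose packets are missing
    start_psn: int         # retransmission start packet sequence number
    msg_order_flag: bool   # True: out-of-order message reception is allowed


def retransmit_range(nack, messages, last_sent_psn):
    """Return the PSNs the sender would resend for this NACK.

    messages maps a message identifier to the PSNs of its packets.
    """
    if nack.msg_order_flag:
        # Retransmission stays inside the message named in the NACK.
        return [psn for psn in messages[nack.msg_id] if psn >= nack.start_psn]
    # Assumed ordered mode: resend everything from the start PSN onward.
    return list(range(nack.start_psn, last_sent_psn + 1))


# Figure 5: msg1 = PSN 1..4, msg2 = PSN 5..9, packet with PSN 3 is lost.
messages = {"msg1": [1, 2, 3, 4], "msg2": [5, 6, 7, 8, 9]}
print(retransmit_range(Nack("msg1", 3, True), messages, 9))   # [3, 4]
print(retransmit_range(Nack("msg1", 3, False), messages, 9))  # [3, 4, 5, 6, 7, 8, 9]
```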

[0061] In this embodiment, the sending end carries, in the first packet, the message identification information and the indication information indicating whether out-of-order message reception is allowed. When the indication information indicates that out-of-order message reception is allowed, the receiving end only needs to maintain the order of the packets within a message and does not need to maintain order between different messages. In other words, the receiving end does not trigger a retransmission when different messages arrive out of order relative to one another; it triggers a retransmission only when packets within a message are lost or out of order. This limits the retransmission scope to the message itself and thereby improves network transmission efficiency. Furthermore, since the sending end can perform load balancing at message granularity when out-of-order message reception is allowed, blocking between messages can be avoided, further improving network transmission efficiency and overall network utilization.

[0062] Figure 6 is a schematic flowchart of an RDMA-based packet processing method according to an embodiment of this application. The packet processing method shown in Figure 6 is applied to the receiving end; that is, it can be executed by software or hardware installed at the receiving end. In some embodiments, the method can be applied to the transport layer of the receiving end. The packet processing method shown in Figure 6 is described below.

[0063] S602: Receive the first packet sent by the sending end. The first packet carries indication information and the identification information of the message to which the first packet belongs. The indication information indicates whether out-of-order message reception is allowed.

[0064] After the sending end sends the first packet to the receiving end over RDMA, the receiving end receives it. The first packet carries the indication information indicating whether out-of-order message reception is allowed and the identification information of the message to which the first packet belongs. For how the sending end sends the first packet to the receiving end, refer to the corresponding content of the embodiment shown in Figure 2; it is not described again here.

[0065] S604: Process the first packet according to the indication information and the identification information. The processing includes out-of-order reception or ordered reception of the message to which the first packet belongs.

[0066] After receiving the first packet, the receiving end can process it according to the indication information and identification information that it carries. The processing can include out-of-order reception of the message to which the first packet belongs, i.e., not maintaining order between messages. In that case, the receiving end does not trigger a retransmission when messages arrive out of order relative to one another, which reduces retransmission events and improves transmission efficiency. The processing can also include ordered reception, i.e., maintaining the order of the packets within the message to which the first packet belongs. If the receiving end maintains order within the message but not between messages, it triggers a retransmission only when packets within a message are lost or out of order, and not when whole messages are out of order, thereby limiting the retransmission scope to the message itself and improving transmission efficiency.

[0067] In some implementations, processing the first packet according to the indication information and identification information may include: if the indication information indicates that out-of-order message reception is allowed, performing first processing on the first packet based on the identification information, where the first processing maintains the order of the packets within the message to which the first packet belongs but does not maintain order between that message and other messages; and, if the indication information indicates that out-of-order message reception is not allowed, performing second processing on the first packet, where the second processing maintains both the order of the packets within the message to which the first packet belongs and the order between that message and other messages.

[0068] The indication information carried in the first packet covers two cases: out-of-order message reception is allowed, or it is not allowed. When it is allowed, the receiving end must preserve intra-message order during packet processing but need not preserve inter-message order. That is, the receiving end performs the first processing on the first packet based on the identification information: it preserves the order of the packets within the same message and does not preserve inter-message order for the message to which the first packet belongs. When out-of-order message reception is not allowed, the receiving end must preserve both intra-message order and inter-message order. That is, the receiving end performs the second processing on the first packet: it preserves the order of the packets within the same message and also preserves inter-message order for the message to which the first packet belongs. The second processing can be implemented as in the related technology and is not described in detail here. The embodiments of this application focus on how the receiving end performs the first processing on the first packet according to the identification information.

[0069] In some implementations, the first packet may also carry a packet sequence number, a first packet marker, a last packet marker, a source queue pair identifier, and a destination queue pair identifier. These fields are described in the corresponding content of the embodiment shown in Figure 2 and are not described again here. When the first packet carries these fields, the receiving end can perform the first processing based on the identification information, the packet sequence number, the first packet marker, the last packet marker, the source queue pair identifier, and the destination queue pair identifier. This may include: determining, based on the first packet marker and the last packet marker, whether the first packet is the first packet or the last packet of its message; and performing the first processing on the first packet based on the determination result, the identification information, the packet sequence number, the source queue pair identifier, and the destination queue pair identifier.

[0070] The first packet marker indicates whether the first packet is the first packet of its message. For example, the first packet marker can occupy 1 bit, where a value of 1 indicates that the packet is the first packet of its message and a value of 0 indicates that it is not. The last packet marker indicates whether the first packet is the last packet of its message. It can also occupy 1 bit, where a value of 1 indicates that the packet is the last packet of its message and a value of 0 indicates that it is not. The receiving end makes the determination from the values of the two markers. If the first packet marker is 1, the first packet is the first packet of the message. If the last packet marker is 1, the first packet is the last packet of the message. If both markers are 0, the first packet is neither the first nor the last packet of the message, but one of the packets in between.
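The marker decoding described above can be sketched as a small function. The function name `classify_position` and the returned labels are illustrative assumptions. The case in which both markers are 1 is not spelled out in the text; the sketch assumes it denotes a message that fits in a single packet.

```python
def classify_position(som: int, eom: int) -> str:
    """Map the 1-bit first packet marker (SOM) and last packet marker (EOM)
    to the position of a packet within its message."""
    if som == 1 and eom == 1:
        # Assumption: a single-packet message sets both markers.
        return "single"
    if som == 1:
        return "first"
    if eom == 1:
        return "last"
    return "middle"


print(classify_position(1, 0))  # first
print(classify_position(0, 0))  # middle
print(classify_position(0, 1))  # last
```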

[0071] After determining whether the first packet is the first or the last packet of its message, the receiving end can process the first packet based on the determination result, the identification information carried in the first packet, the packet sequence number, the source queue pair identifier, and the destination queue pair identifier.

[0072] As mentioned earlier, the determination result has three cases: the first packet is the first packet of its message; the first packet is neither the first nor the last packet of its message; or the first packet is the last packet of its message. The receiving end performs different processing operations for each case, as explained in detail below.

[0073] If the determination result is that the first packet is the first packet of its message, then in some implementations the processing may include: generating a first variable and a second variable corresponding to the identification information, the source queue pair identifier, and the destination queue pair identifier, where the first variable records the packet sequence number of the last packet received in order within the message, and the second variable indicates whether the first packet of the message has been successfully received; initializing the two variables, where the initial value of the first variable is the packet sequence number of the first packet, and the initial value of the second variable indicates that the first packet of the message has been successfully received; and delivering the first packet to the host memory.

[0074] In this embodiment, when the first packet is the first packet of a message, two variables can be generated for that message. One records the packet sequence number of the last packet received in order within the message, i.e., the first variable mentioned above (which can be denoted msg_psn_cur). The other indicates whether the first packet of the message has been successfully received, i.e., the second variable mentioned above (which can be denoted som_received). The first variable can have a one-to-one mapping with the combination of the message identifier, the source queue pair identifier, and the destination queue pair identifier. This mapping can be stored in a table whose key is the message identifier, the source queue pair identifier, and the destination queue pair identifier, and whose value is the first variable. The second variable can be mapped to the same key in the same way.

[0075] After the first and second variables are generated, they can be initialized. Since the first packet is the first packet of its message and has been successfully received by the receiving end, after initialization the value of the first variable is the packet sequence number of the first packet, and the value of the second variable indicates that the first packet of the message has been successfully received. For example, if the second variable is represented by 1 bit, its value after initialization can be true or 1.

[0076] After the first and second variables are initialized, the first packet can be delivered to the host memory.

[0077] In some implementations, if the determination result is that the first packet is neither the first nor the last packet of its message, the processing may include: searching for the second variable corresponding to the identification information, the source queue pair identifier, and the destination queue pair identifier; if the second variable exists and indicates that the first packet of the message has been successfully received, searching for the first variable corresponding to the same identification information, source queue pair identifier, and destination queue pair identifier; if the packet sequence number of the first packet equals the value of the first variable plus 1, delivering the first packet to the host memory and updating the first variable to the packet sequence number of the first packet; and if the packet sequence number of the first packet does not equal the value of the first variable plus 1, determining that a packet of the message to which the first packet belongs has been lost or has arrived out of order, and sending a NACK packet to the sending end.

[0078] In this embodiment, if the first packet is neither the first nor the last packet of its message, the corresponding second variable (which indicates whether the first packet of the message has been successfully received) can be looked up using the identification information, source queue pair identifier, and destination queue pair identifier carried in the first packet. If the second variable is not found, or its value indicates that the first packet of the message has not been successfully received (e.g., the value is false), the receiving end has not successfully received the first packet of the message to which the first packet belongs. In this case, the first packet can be processed as in the related technology, which is not described in detail here. If the second variable exists and its value indicates that the first packet of the message has been successfully received (e.g., the value is true), processing continues: the corresponding first variable (which records the packet sequence number of the last packet received in order within the message) is looked up using the same identification information, source queue pair identifier, and destination queue pair identifier. Because the receiving end generates the first and second variables together, under the same key, when it receives the first packet of the message, the first variable can always be found whenever the second variable is found.

[0079] After the first variable is located, the receiving end checks whether the packet sequence number of the first packet equals the value of the first variable plus 1. If it does, the packet sequence number of the first packet is contiguous with that of the previous packet received in order, and no packet loss or reordering has occurred. In this case, the first packet is delivered to the host memory and the first variable is updated to the packet sequence number of the first packet, so that when the next packet arrives the receiving end can check its sequence number against this one and detect loss or reordering. If the packet sequence number of the first packet does not equal the value of the first variable plus 1, it is not contiguous with that of the previous packet received in order, and packet loss or reordering has occurred within the message to which the first packet belongs. In this case, the receiving end triggers a retransmission and sends a NACK packet to the sending end.

[0080] When the receiving end sends a NACK packet to the sending end, in order to limit the sending end's retransmission scope to the message itself, in some implementations the NACK packet may carry the identification information of the message to which the packets to be retransmitted belong (i.e., the identification information of the message to which the first packet belongs) and the retransmission start packet sequence number. The retransmission start packet sequence number can equal the value of the first variable plus 1, indicating that packet loss or reordering occurred starting from that sequence number and that the packets of the message must be retransmitted from there. When the sending end receives the NACK packet and retransmits, it can use the identification information and the retransmission start packet sequence number to resend only the packets of the identified message, starting from the retransmission start packet sequence number. This limits the retransmission scope to the message itself, reduces the number of retransmitted packets, lowers the retransmission cost, and improves network transmission efficiency. See the embodiment shown in Figure 5; no further example is given here.

[0081] If the determination result is that the first packet is the last packet of its message, then in some implementations the processing includes: finding the first variable and the second variable corresponding to the identification information, the source queue pair identifier, and the destination queue pair identifier; releasing the first variable and the second variable; delivering the first packet to the host memory; and sending an ACK packet to the sending end.

[0082] To facilitate understanding of the packet processing flow at the receiving end, one implementation, shown in Figure 7, is described below as an example. As shown in Figure 7, processing a packet may include the following steps.

[0083] Step 1: Extract the following fields from the packet: msg_id, packet sequence number (PSN), first packet marker (SOM), last packet marker (EOM), msg_order_flag (indicating whether out-of-order message reception is allowed), source queue pair (SQP), destination queue pair (DQP), etc.

[0084] Step 2: Check the value of msg_order_flag. If msg_order_flag = false, meaning out-of-order message reception is not allowed, process the packet according to the related-technology procedure and proceed to step 8. If msg_order_flag = true, meaning out-of-order message reception is allowed, proceed to step 3.

[0085] Step 3: Check the value of SOM. If SOM = 1, maintain a variable msg_psn_cur for the msg_id, with its initial value set to the PSN of the packet, and also maintain a variable som_received = true. Then proceed to step 5. If SOM = 0, check whether som_received exists for the msg and whether it is true. If it exists and is true, find the corresponding msg_psn_cur based on the msg_id and proceed to step 4. If it does not exist or is not true, process the packet according to the related-technology procedure and proceed to step 8.

[0086] Step 4: Determine whether the PSN of the packet equals msg_psn_cur + 1. If yes, update msg_psn_cur to the PSN of the packet and proceed to step 5. If no, proceed to step 7.

[0087] Step 5: Deliver the packet to the host memory and determine whether it is the last packet of the message. If it is, proceed to step 6; otherwise, proceed to step 8.

[0088] Step 6: If the packet has EOM = 1, release the current msg_psn_cur variable, send an ACK to the source (i.e., the sending end), generate a Completion Queue Entry (CQE) (this process is the same as in the related technology), and proceed to step 8.

[0089] Step 7: If the PSN of the packet does not equal msg_psn_cur + 1, trigger a retransmission, that is, send a NACK packet to the source indicating that the retransmission start sequence number is msg_psn_cur + 1, and then proceed to step 8.

[0090] Step 8: Processing of the current packet is complete.
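The eight steps above can be sketched as a small receiver model. This is a minimal illustration under stated assumptions, not the implementation of the application: the `Packet` and `Receiver` classes, the default queue pair values, and the returned strings are hypothetical; the related-technology path is reduced to returning `"legacy"`; host memory, ACKs, and NACKs are modelled as Python lists; and CQE generation is omitted. Following paragraph [0081], the sketch releases both per-message variables when the last packet arrives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    msg_id: int
    psn: int
    som: int               # first packet marker
    eom: int               # last packet marker
    msg_order_flag: bool   # True: out-of-order message reception is allowed
    sqp: int = 1           # source queue pair identifier (illustrative value)
    dqp: int = 2           # destination queue pair identifier (illustrative value)


class Receiver:
    def __init__(self):
        self.state = {}      # (msg_id, sqp, dqp) -> per-message variables
        self.delivered = []  # PSNs delivered to host memory
        self.acks = []       # msg_ids acknowledged
        self.nacks = []      # (msg_id, retransmission start PSN)

    def on_packet(self, p: Packet) -> str:
        # Step 2: ordered mode is left to the related-technology path.
        if not p.msg_order_flag:
            return "legacy"
        key = (p.msg_id, p.sqp, p.dqp)
        if p.som == 1:
            # Step 3, SOM = 1: create and initialize both variables.
            self.state[key] = {"msg_psn_cur": p.psn, "som_received": True}
        else:
            st = self.state.get(key)
            if st is None or not st["som_received"]:
                return "legacy"  # first packet of the message not seen
            # Step 4: intra-message continuity check.
            if p.psn != st["msg_psn_cur"] + 1:
                # Step 7: request retransmission from msg_psn_cur + 1.
                self.nacks.append((p.msg_id, st["msg_psn_cur"] + 1))
                return "nack"
            st["msg_psn_cur"] = p.psn
        # Step 5: deliver to host memory.
        self.delivered.append(p.psn)
        if p.eom == 1:
            # Step 6: release per-message state and acknowledge.
            del self.state[key]
            self.acks.append(p.msg_id)
            return "complete"
        return "delivered"


rx = Receiver()
for psn, som, eom in [(1, 1, 0), (2, 0, 0), (4, 0, 1), (3, 0, 0), (4, 0, 1)]:
    print(psn, rx.on_packet(Packet(1, psn, som, eom, True)))
```

In this usage example the packet with PSN 4 first arrives while PSN 3 is missing, so the model records a NACK with start PSN 3; after PSN 3 and PSN 4 arrive again, the message completes.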

[0091] In this embodiment, the first packet received by the receiving end carries the identification information of the message to which it belongs and the indication information indicating whether out-of-order reception between messages is allowed. When the first packet indicates that out-of-order reception between messages is allowed, the receiving end only needs to maintain the order of the packets within a message and does not need to maintain order between messages. That is, the receiving end does not trigger a retransmission when messages arrive out of order relative to one another, and triggers a retransmission only when packets within a message are lost or out of order. This limits the retransmission scope to the message itself and thus improves transmission efficiency.

[0092] To facilitate understanding of the technical solutions provided in the embodiments of this application, three implementation methods will be used as examples for explanation below.

[0093] Example 1: This example illustrates how to perform load balancing at message granularity when the sending application layer allows out-of-order message reception. It uses a typical spine-leaf architecture. Assume each switch has 4 ports, each with a bandwidth of 25Gbps. In a network without bandwidth oversubscription, each leaf switch has 2 available uplinks. Assume the remaining available bandwidth from spine1 to leaf2 is 10Gbps.

[0094] Server 1 sends an elephant flow (flow1) to Server 2; assume its rate is 16Gbps. On the leaf switch, hash routing based on the flow ID (typically the 5-tuple) is random and cannot be optimized according to the state of the spine egress links (for example, in the manner of WCMP). The flow may therefore hash to the first uplink and be sent entirely to the spine1 node. Since only 10Gbps remains from spine1 to leaf2, this causes congestion at the spine1->leaf2 egress and triggers congestion control, flow control, and other mechanisms, which eventually limit the sending rate of flow1 to below 10Gbps and increase the flow's tail latency, as shown in Figure 8(a).

[0095] If the packets carry the flag `msg_order_flag=true`, the leaf switch can identify the service type of the packets and flexibly perform hash routing based on `msg_id` + `flow ID`. Because the hash is random, the messages are spread across the uplinks, so that each uplink of leaf1 carries 8Gbps of traffic and congestion at the spine1->leaf2 egress is avoided. Ultimately, flow1 can send at its full rate of 16Gbps without loss of throughput, as shown in Figure 8(b).

[0096] When the packets arrive at Server 2, the transport layer of the receiving network card determines from the msg_order_flag carried in the packets that inter-message ordering is not required. There is therefore no need to reorder messages that arrive out of order, which avoids the overhead of message reordering.
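The difference between hashing on the flow ID alone and hashing on `msg_id` + `flow ID` can be sketched as follows. The function `pick_uplink`, the use of SHA-256 as the hash, and the example 5-tuple values are assumptions made for illustration; the application does not specify the hash function a switch would use.

```python
import hashlib


def pick_uplink(flow_id: tuple, msg_id: int, msg_order_flag: bool, n_uplinks: int) -> int:
    """Choose an uplink index for a packet.

    With msg_order_flag set, the message identifier is mixed into the hash key,
    so different messages of one flow may take different uplinks while all
    packets of one message still follow the same path.
    """
    key = repr(flow_id)
    if msg_order_flag:
        key += f"|{msg_id}"
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_uplinks


# Illustrative 5-tuple: source IP, destination IP, protocol, source port, destination port.
flow1 = ("10.0.0.1", "10.0.0.2", 17, 4791, 4791)

per_flow = {pick_uplink(flow1, m, False, 2) for m in range(1000)}
per_msg = {pick_uplink(flow1, m, True, 2) for m in range(1000)}
print(len(per_flow))  # 1: the whole flow is pinned to a single uplink
print(len(per_msg))   # 2: messages of the same flow use both uplinks
```

Keeping all packets of one message on one path is what allows the receiver to expect in-order arrival within the message while tolerating reordering between messages.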

[0097] Example 2: This example illustrates the overall processing flow of the receiving transport layer when it operates at message granularity. Assume a service flow (flow1) does not require inter-message ordering, and its packets carry the flag msg_order_flag=true. When the receiving transport layer parses the packets, it performs transport-layer reception processing at message granularity. The advantages of this fine-grained, message-based processing are illustrated in two scenarios.

[0098] Scenario 1: Flow1 contains two messages, msg1 and msg2, each of which is split into 4 packets. The PSNs of msg1 are 1-4, and those of msg2 are 5-8. The source sends the packets in the order PSN1->PSN8. Assume that, due to a message-level load balancing strategy, the packets of the two messages arrive interleaved (while remaining in order within each message), and that no packets are lost. Assume the packets arrive at the receiving end in the PSN order 1, 2, 5, 3, 6, 4, 7, 8. The receiving end processes the packets as follows.

[0099] 1) The PSN1 packet arrives with msg number 1 and SOM flag 1. The variables msg1_psn_cur = 1 and msg1_som_received = true are created, and the PSN1 packet is delivered to the host memory. The EOM flag is 0, so no message completion processing is performed.

[0100] 2) The PSN2 packet arrives with msg number 1 and SOM flag 0. Looking up msg1 yields msg1_psn_cur = 1 and msg1_som_received = true. The PSN of the packet is 2, which equals msg1_psn_cur + 1, so the packet is in order. msg1_psn_cur is updated to 2 and the PSN2 packet is delivered to the host memory. The EOM flag is 0, so no message completion processing is performed.

[0101] 3) The PSN5 packet arrives with msg number 2 and SOM flag 1. The variables msg2_psn_cur = 5 and msg2_som_received = true are created, and the PSN5 packet is delivered to the host memory. The EOM flag is 0, so no message completion processing is performed.

[0102] 4) The PSN3 packet arrives with msg number 1 and SOM flag 0. Looking up msg1 yields msg1_psn_cur = 2 and msg1_som_received = true. The PSN of the packet is 3, which equals msg1_psn_cur + 1, so the packet is in order. msg1_psn_cur is updated to 3 and the PSN3 packet is delivered to the host memory. The EOM flag is 0, so no message completion processing is performed.

[0103] 5) The PSN6 packet arrives with msg number 2 and SOM flag 0. Looking up msg2 yields msg2_psn_cur = 5 and msg2_som_received = true. The PSN of the packet is 6, which equals msg2_psn_cur + 1, so the packet is in order. msg2_psn_cur is updated to 6 and the PSN6 packet is delivered to the host memory. The EOM flag is 0, so no message completion processing is performed.

[0104] 6) The PSN4 packet arrives with msg number 1 and SOM flag 0. Looking up msg1 yields msg1_psn_cur = 3 and msg1_som_received = true. The PSN of the packet is 4, which equals msg1_psn_cur + 1, so the packet is in order and is delivered to the host memory. Because the EOM flag is 1, message completion processing is performed and the msg1_psn_cur variable is cleared.

[0105] 7) The PSN7 packet arrives with msg number 2 and SOM flag 0. Looking up msg2 yields msg2_psn_cur = 6 and msg2_som_received = true. The PSN of the packet is 7, which equals msg2_psn_cur + 1, so the packet is in order. msg2_psn_cur is updated to 7 and the PSN7 packet is delivered to the host memory. The EOM flag is 0, so no message completion processing is performed.

[0106] 8) The PSN8 packet arrives with msg number 2 and SOM flag 0. Looking up msg2 yields msg2_psn_cur = 7 and msg2_som_received = true. The PSN of the packet is 8, which equals msg2_psn_cur + 1, so the packet is in order and is delivered to the host memory. Because the EOM flag is 1, message completion processing is performed and the msg2_psn_cur variable is cleared.

[0107] In this scenario, although the packets arrive at the receiving end out of PSN order, no inter-message order needs to be maintained and ordering is checked only within each msg, so the packet loss and retransmission process is not triggered.
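The effect in Scenario 1 can be checked with a short comparison of a flow-wide ordering check against a per-message one. This is a simplified sketch: the function `count_order_violations` is a hypothetical helper, and the flow-wide branch assumes a strict model in which any packet that is not the next expected PSN is rejected and would trigger a retransmission request.

```python
def count_order_violations(arrivals, messages, per_message):
    """Count arrivals that are not the next expected PSN in their scope.

    arrivals: PSNs in arrival order.
    messages: message name -> list of PSNs belonging to that message.
    per_message: True checks order within each message only;
                 False checks order across the whole flow.
    """
    msg_of = {psn: name for name, psns in messages.items() for psn in psns}
    if per_message:
        expected = {name: min(psns) for name, psns in messages.items()}
    else:
        expected = {"flow": min(msg_of)}
    violations = 0
    for psn in arrivals:
        scope = msg_of[psn] if per_message else "flow"
        if psn == expected[scope]:
            expected[scope] += 1
        else:
            violations += 1  # would trigger a NACK; the packet is not accepted
    return violations


messages = {"msg1": [1, 2, 3, 4], "msg2": [5, 6, 7, 8]}
arrivals = [1, 2, 5, 3, 6, 4, 7, 8]
print(count_order_violations(arrivals, messages, per_message=True))   # 0
print(count_order_violations(arrivals, messages, per_message=False))  # 4
```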

[0108] Scenario 2: Flow1 contains four messages: msg1, msg2, msg3, and msg4. msg1, msg2, and msg4 are each segmented into four packets, while msg3 is segmented into one packet. The PSNs of msg1 are 1-4, msg2's are 5-8, msg3's is 9, and msg4's are 10-13. The packets are sent at the source in the order of PSN1->PSN13. Assuming that packets PSN3, PSN9, and PSN10 are lost due to bit errors during transmission, the receiving end receives packets with the following PSN order: 1, 2, 4, 5, 6, 7, 8, 11, 12, 13. The receiving end then processes the packets as follows.

[0109] 1) When a PSN1 message arrives, with msg number 1 and SOM flag 1, variables msg1_psn_cur = 1 and msg1_som_received = true are created. The PSN1 message is delivered to the host memory, the EOM flag is set to 0, and no message completion processing is performed.

[0110] 2) When the PSN2 message arrives, the msg number is 1 and the SOM flag is 0. Based on msg1, we look up msg1_psn_cur = 1 and msg1_som_received = true. Since the PSN of the PSN2 message is 2, which is equal to msg1_psn_cur + 1, the current message order is correct. We update msg1_psn_cur to 2, deliver the PSN1 message to the host memory, set the EOM flag to 0, and do not perform any message completion processing.

[0111] 3) When the PSN4 message arrives, the msg number is 1 and the SOM flag is 0. Based on msg1, we look up msg1_psn_cur = 2 and msg1_som_received = true. Since the PSN of the PSN4 message is equal to 4, which is not equal to msg1_psn_cur + 1 (i.e., 3), the current message order is incorrect. The current PSN4 message is discarded, and a NACK message is returned to the sender, carrying the information of msg1, psn3 and msg_order_flag = true.

[0112] 4) When the PSN5 packet arrives, with msg number 2 and SOM flag 1, the variables msg2_psn_cur = 5 and msg2_som_received = true are created. The PSN5 packet is delivered to the host memory. Because its EOM flag is 0, no message completion processing is performed.

[0113] 5) When the PSN6 packet arrives, its msg number is 2 and its SOM flag is 0. Looking up the state for msg2 yields msg2_psn_cur = 5 and msg2_som_received = true. Since the packet's PSN is 6, which equals msg2_psn_cur + 1, the packet is in order. msg2_psn_cur is updated to 6 and the PSN6 packet is delivered to the host memory. Because its EOM flag is 0, no message completion processing is performed.

[0114] 6) When the PSN7 packet arrives, its msg number is 2 and its SOM flag is 0. Looking up the state for msg2 yields msg2_psn_cur = 6 and msg2_som_received = true. Since the packet's PSN is 7, which equals msg2_psn_cur + 1, the packet is in order. msg2_psn_cur is updated to 7 and the PSN7 packet is delivered to the host memory. Because its EOM flag is 0, no message completion processing is performed.

[0115] 7) When the PSN8 packet arrives, its msg number is 2 and its SOM flag is 0. Looking up the state for msg2 yields msg2_psn_cur = 7 and msg2_som_received = true. Since the packet's PSN is 8, which equals msg2_psn_cur + 1, the packet is in order and is delivered to the host memory. Because its EOM flag is 1, message completion processing is performed and the msg2_psn_cur variable is cleared.

[0116] 8) When the PSN11 packet arrives, its msg number is 4 and its SOM flag is 0. Looking up msg4 finds no msg4_som_received information, which indicates that the first packet of msg4 was lost, so the original process is followed.

[0117] 9) When the PSN12 packet arrives, its msg number is 4 and its SOM flag is 0. Looking up msg4 finds no msg4_som_received information, which indicates that the first packet of msg4 was lost, so the original process is followed.

[0118] 10) When the PSN13 packet arrives, its msg number is 4 and its SOM flag is 0. Looking up msg4 finds no msg4_som_received information, which indicates that the first packet of msg4 was lost, so the original process is followed.

[0119] 11) When the retransmitted PSN3 packet arrives, its msg number is 1 and its SOM flag is 0. Looking up the state for msg1 yields msg1_psn_cur = 2 and msg1_som_received = true. Since the packet's PSN is 3, which equals msg1_psn_cur + 1, the packet is in order. msg1_psn_cur is updated to 3 and the PSN3 packet is delivered to the host memory. Because its EOM flag is 0, no message completion processing is performed.

[0120] 12) When the retransmitted PSN4 packet arrives, its msg number is 1 and its SOM flag is 0. Looking up the state for msg1 yields msg1_psn_cur = 3 and msg1_som_received = true. Since the packet's PSN is 4, which equals msg1_psn_cur + 1, the packet is in order and is delivered to the host memory. Because its EOM flag is 1, message completion processing is performed and the msg1_psn_cur variable is cleared.

[0121] 13) When the PSN9 packet arrives after timeout retransmission, its msg number is 3. Since it carries both the SOM and EOM flags, it is delivered to the host memory and message completion processing is performed.

[0122] 14) When the PSN10 to PSN13 packets arrive after timeout retransmission, they are processed in exactly the same way as the packets of msg2, which is not repeated here.

[0123] In this scenario, although packet loss occurs, the retransmission triggered by the NACK is confined to msg1, so only two packets (PSN3 and PSN4) need to be retransmitted in response to it. This reduces the amount of retransmission caused by packet loss and improves transmission efficiency.
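The receiver-side steps of Scenario 2 can be replayed with a short simulation. The following is a minimal sketch and not the implementation of this application: the names `Packet`, `Receiver`, and `build_flow1` are hypothetical, the per-message state is keyed by msg number only, and the "original process" for a packet whose first packet was lost is modeled simply as recording the packet without delivering it (an assumption, since the text does not detail that process).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    psn: int
    msg_id: int
    som: bool  # start-of-message flag
    eom: bool  # end-of-message flag


def build_flow1():
    """Flow1 from Scenario 2: msg1=PSN1-4, msg2=PSN5-8, msg3=PSN9, msg4=PSN10-13."""
    ranges = {1: (1, 4), 2: (5, 8), 3: (9, 9), 4: (10, 13)}
    return {
        psn: Packet(psn, msg_id, som=(psn == first), eom=(psn == last))
        for msg_id, (first, last) in ranges.items()
        for psn in range(first, last + 1)
    }


class Receiver:
    def __init__(self):
        # msg_id -> PSN of the last in-order packet (msgN_psn_cur).
        # A key being present plays the role of msgN_som_received = true.
        self.psn_cur = {}
        self.delivered = []  # PSNs delivered to host memory
        self.completed = []  # msg numbers whose completion processing ran
        self.nacks = []      # (msg_id, retransmission start PSN)
        self.fallback = []   # PSNs handed to the "original process" (assumed: not delivered)

    def on_packet(self, pkt):
        if pkt.som:
            self.psn_cur[pkt.msg_id] = pkt.psn
        elif pkt.msg_id not in self.psn_cur:
            self.fallback.append(pkt.psn)  # first packet of the message was lost
            return
        elif pkt.psn == self.psn_cur[pkt.msg_id] + 1:
            self.psn_cur[pkt.msg_id] = pkt.psn
        else:
            # Out of order within the message: discard and NACK from psn_cur + 1.
            self.nacks.append((pkt.msg_id, self.psn_cur[pkt.msg_id] + 1))
            return
        self.delivered.append(pkt.psn)
        if pkt.eom:
            self.completed.append(pkt.msg_id)
            del self.psn_cur[pkt.msg_id]  # clear the per-message state


flow = build_flow1()
# First pass with PSN3, PSN9, PSN10 lost, then the retransmitted packets.
arrival = [1, 2, 4, 5, 6, 7, 8, 11, 12, 13, 3, 4, 9, 10, 11, 12, 13]
rx = Receiver()
for psn in arrival:
    rx.on_packet(flow[psn])

print("delivered:", rx.delivered)
print("completed:", rx.completed)
print("nacks:", rx.nacks)
print("fallback:", rx.fallback)
```

In this replay msg2 completes before msg1, which is the inter-message reordering that msg_order_flag = true permits, while order is still preserved inside each message.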

[0124] Example 3: Taking the currently popular InfiniBand (IB) and UEC protocols as examples, this example illustrates how the encapsulation can be extended so that service packets carry msg_id (if needed) and msg_order_flag, and NACK packets likewise carry msg_id (if needed) and msg_order_flag.

[0125] For the InfiniBand protocol, the basic encapsulation format of a packet is shown in Figure 9. The Opcode indicates the type of the current packet and consists of 8 bits. When the Opcode starts with 000, the current packet is an RC service packet or an ACK / NACK packet. The last 5 bits indicate the packet type.

[0126] In the basic encapsulation format, there are two Reserved fields that can be reused. One bit can be used to carry the msg_order_flag information, and the remaining 12 bits can be used to carry the msg_id information. This extension method is suitable for both service packets and NACK packets; the only difference between the two lies in the Opcode definition, as shown in Figure 10.
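As an illustration of the 1-bit plus 12-bit split, the two values can be packed into a single 13-bit quantity. This is only a sketch: the function names are hypothetical, and placing msg_order_flag in the most significant of the 13 bits is an assumption, since the actual bit positions are defined by Figure 10 rather than by the text.

```python
MSG_ID_BITS = 12
MSG_ID_MASK = (1 << MSG_ID_BITS) - 1  # 0x0FFF


def pack_reserved(msg_order_flag: bool, msg_id: int) -> int:
    """Pack the flag (assumed top bit) and a 12-bit msg_id into 13 reserved bits."""
    if not 0 <= msg_id <= MSG_ID_MASK:
        raise ValueError("msg_id must fit in 12 bits")
    return (int(msg_order_flag) << MSG_ID_BITS) | msg_id


def unpack_reserved(value: int):
    """Return (msg_order_flag, msg_id) from the 13 reserved bits."""
    return bool((value >> MSG_ID_BITS) & 1), value & MSG_ID_MASK


packed = pack_reserved(True, 0x123)
print(hex(packed), unpack_reserved(packed))
```

A 12-bit field distinguishes at most 4096 message identifiers, so a sender following this layout would have to reuse identifiers once that many messages have been issued.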

[0127] New RC message types can be defined by modifying the Opcode. In the Opcode field of the RC operation, values from 00011000 to 00011111 are reserved and can be used to define new encapsulation formats. For example, in this invention, 00011000 indicates that the message is a Write message and does not require message ordering. Similarly, 00011111 can be used to indicate that the message is a NACK message, while still containing the content described in this invention. The NACK message carries an AETH header (Ack Extension Transport Header), the definition of which remains unchanged.
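The Opcode rules described above can be expressed as a small classifier. This is a sketch under the values given in the text; the constant and function names are hypothetical, and the two example assignments (00011000 and 00011111) are the illustrative choices of this application, not values defined by the InfiniBand specification.

```python
RC_PREFIX = 0b000                  # top 3 bits for RC service and ACK / NACK packets
RESERVED_FIRST = 0b00011000        # start of the reserved RC range
RESERVED_LAST = 0b00011111         # end of the reserved RC range
OP_WRITE_UNORDERED = 0b00011000    # example: Write, no inter-message ordering
OP_NACK_EXTENDED = 0b00011111      # example: NACK carrying the extended fields


def split_opcode(opcode: int):
    """Split an 8-bit Opcode into (3-bit prefix, 5-bit packet type)."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("Opcode must fit in 8 bits")
    return opcode >> 5, opcode & 0x1F


def is_rc(opcode: int) -> bool:
    return split_opcode(opcode)[0] == RC_PREFIX


def in_reserved_rc_range(opcode: int) -> bool:
    return RESERVED_FIRST <= opcode <= RESERVED_LAST


for op in (OP_WRITE_UNORDERED, OP_NACK_EXTENDED):
    print(f"{op:08b}", split_opcode(op), is_rc(op), in_reserved_rc_range(op))
```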

[0128] For the UEC protocol, the semantic header encapsulation format extension is shown in Figure 11. The packet already carries a 16-bit message id field, so no further extension is required for msg_id. Additionally, the packet contains three Reserved fields, totaling 13 bits, which can be used to carry the msg_order_flag. The example in the figure uses 1 bit of the first Reserved field to carry the msg_order_flag. A reserved Opcode value, for example 0x11, indicates the packet encapsulation format described in this invention.

[0129] For a NACK packet, the extended encapsulation format is shown in Figure 12. The message_id is carried in the semantic header, which uses the generic return header encapsulation. The PSN is carried in the delivery sublayer header. A Code value not equal to 0 indicates that the current packet is a NACK packet. The index generation field in the generic semantic header is not used in a NACK packet and can therefore carry the msg_order_flag. A reserved Opcode value, for example 0x12, is defined to indicate this encapsulation type of the semantic header and the delivery sublayer packet.

[0130] The three embodiments described above are merely embodiments of this application and are not intended to limit this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

[0131] The foregoing has described specific embodiments of this application. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired results. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0132] Figure 13 is a schematic diagram of the structure of an electronic device according to an embodiment of this application. Referring to Figure 13, at the hardware level, the electronic device includes a processor, an internal bus, a network interface, and a memory. The memory may include internal memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one disk storage device. Of course, the electronic device may also include other hardware required for other services.

[0133] The processor, network interface, and memory can be interconnected via an internal bus, which can be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, or an EISA (Extended Industry Standard Architecture) bus, etc. This bus can be categorized as an address bus, data bus, control bus, etc. For ease of illustration, only a single bidirectional arrow is used in Figure 13, but this does not imply that there is only one bus or one type of bus.

[0134] Memory is used to store programs. Programs may include program code, which includes computer operation instructions. Memory may include main memory and non-volatile memory, and provides instructions and data to the processor.

[0135] The processor reads the corresponding computer program from non-volatile memory into memory and then runs it, forming a message transmission device based on RDMA at the logical level. The processor executes the program stored in memory and performs the following operations: sending a first message to the receiving end, the first message carrying indication information and identification information of the message to which the first message belongs, the indication information being used to indicate whether out-of-order message reception is allowed.

[0136] The method executed by the RDMA-based message transmission device disclosed in the embodiment shown in Figure 13 of this application can be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by integrated logic circuits in the processor's hardware or by instructions in software form. The processor can be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in this application can be directly embodied in the execution of a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module can reside in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. This storage medium is located in memory, and the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above method.

[0137] The electronic device can also execute the method of Figure 2 and implement the functions of the RDMA-based message sending device in the embodiment shown in Figure 2, which will not be described in detail here.

[0138] Of course, in addition to software implementation, the electronic device of this application does not exclude other implementation methods, such as logic devices or a combination of hardware and software, etc. In other words, the execution subject of the following processing flow is not limited to each logic unit, but can also be hardware or logic devices.

[0139] This application also proposes a computer-readable storage medium that stores one or more programs, the one or more programs including instructions that, when executed by a portable electronic device including multiple applications, enable the portable electronic device to perform the method of the embodiment shown in Figure 2 and to perform the following operations: sending a first message to a receiving end, the first message carrying indication information and identification information of the message to which the first message belongs, the indication information being used to indicate whether out-of-order reception of messages is allowed.

[0140] Figure 14 is a schematic diagram of the structure of an RDMA-based message sending device 140 according to an embodiment of this application. Referring to Figure 14, in one software implementation, the RDMA-based message sending device 140 may include: a sending module 141, wherein the sending module 141 sends a first message to a receiving end, the first message carrying indication information and identification information of the message to which the first message belongs, the indication information being used to indicate whether out-of-order message reception is allowed.

[0141] In some embodiments, the sending module 141 sends a first message to the receiving end, including: receiving a first parameter sent by the application layer, the first parameter being used to indicate whether out-of-order message reception is allowed; and sending a first message to the receiving end according to the first parameter; wherein, when the first parameter indicates that out-of-order message reception is allowed, the indication information carried in the first message is used to indicate that out-of-order message reception is allowed, and when the first parameter indicates that out-of-order message reception is not allowed, the indication information carried in the first message is used to indicate that out-of-order message reception is not allowed.

[0142] In some implementations, the sending module 141 receives a first parameter sent by the application layer, including receiving the first parameter sent by the application layer based on an extended programming interface, wherein the extended programming interface includes an extended first field, and the first field is used to carry the first parameter.

[0143] In some implementations, the first message also carries an opcode; the sending module 141 further performs load balancing based on the identification information when a first condition is met; wherein, the first condition includes the opcode being in reliable mode and the indication information being used to indicate that out-of-order message reception is allowed; the load balancing based on the identification information includes sending messages with the same identification information to the same transmission path, and sending messages with different identification information to the same or different transmission paths.
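The path selection rule can be sketched as follows. This is a minimal illustration, not the method of this application: the function name `select_path` is hypothetical, mapping the identification information to a path with a modulo operation is one possible choice among many, and falling back to a single per-flow path when the first condition is not met is an assumption.

```python
def select_path(msg_id: int, reliable: bool, out_of_order_allowed: bool,
                num_paths: int, flow_path: int = 0) -> int:
    """Pick a transmission path index for a packet of message msg_id."""
    if num_paths <= 0:
        raise ValueError("num_paths must be positive")
    if reliable and out_of_order_allowed:
        # First condition met: every packet of one message maps to the same
        # path, while different messages may map to the same or other paths.
        return msg_id % num_paths
    # Assumed fallback: keep the whole flow on one path to preserve order.
    return flow_path


# Packets of msg 5 always share a path; msgs 5, 6, 9 spread over 4 paths.
paths = {m: select_path(m, True, True, 4) for m in (5, 6, 9)}
print(paths)
```

Because each message stays on one path, packets of a message are unlikely to be reordered relative to each other by the network, and reordering between messages is tolerated by the receiver.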

[0144] In some implementations, when the indication information is used to indicate that out-of-order reception of messages is permitted, the sending module 141 further performs: receiving a NACK message sent by the receiving end, the NACK message being sent by the receiving end when it determines that a packet within the message to which the first message belongs has been lost or has arrived out of order; and retransmitting based on the NACK message.

[0145] In some implementations, the NACK message carries the identification information and the retransmission start packet sequence number; the sending module 141 performs message retransmission based on the NACK message, including: sending the message corresponding to the identification information, starting from the start packet sequence number, to the receiving end based on the identification information and the start packet sequence number.
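The retransmission range implied by such a NACK can be sketched as follows, using the packet numbering of Scenario 2. The function names and the `sent_by_msg` bookkeeping structure are hypothetical, and the flow-wide list is included only for comparison under the assumption of a conventional go-back-N sender. PSN wraparound is ignored.

```python
def nack_retransmit_list(sent_by_msg, msg_id, start_psn):
    """PSNs to resend for a NACK carrying (msg_id, start_psn): only packets of
    that message, from the start packet sequence number onward."""
    return [psn for psn in sent_by_msg.get(msg_id, []) if psn >= start_psn]


def go_back_n_list(sent_by_msg, start_psn):
    """Comparison only (assumed go-back-N): every sent PSN from start_psn onward."""
    return sorted(psn for psns in sent_by_msg.values()
                  for psn in psns if psn >= start_psn)


# Sent packets per message, as in Scenario 2.
sent_by_msg = {1: [1, 2, 3, 4], 2: [5, 6, 7, 8], 3: [9], 4: [10, 11, 12, 13]}

per_message = nack_retransmit_list(sent_by_msg, msg_id=1, start_psn=3)
flow_wide = go_back_n_list(sent_by_msg, start_psn=3)
print(per_message, len(flow_wide))
```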

[0146] The RDMA-based message sending device 140 provided in this application can also execute the method of Figure 2 and realize the function of the RDMA-based message sending device 140 in the embodiment shown in Figure 2, which will not be described again here.

[0147] Figure 15 is a schematic diagram of the structure of an electronic device according to an embodiment of this application. Referring to Figure 15, at the hardware level, the electronic device includes a processor, an internal bus, a network interface, and a memory. The memory may include internal memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one disk storage device. Of course, the electronic device may also include other hardware required for other services.

[0148] The processor, network interface, and memory can be interconnected via an internal bus, which can be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, or an EISA (Extended Industry Standard Architecture) bus, etc. This bus can be categorized as an address bus, data bus, control bus, etc. For ease of illustration, only a single bidirectional arrow is used in Figure 15, but this does not imply that there is only one bus or one type of bus.

[0149] Memory is used to store programs. Programs may include program code, which includes computer operation instructions. Memory may include main memory and non-volatile memory, and provides instructions and data to the processor.

[0150] The processor reads the corresponding computer program from non-volatile memory into memory and then runs it, forming a message processing device based on RDMA at the logical level. The processor executes the program stored in memory and performs the following operations: receiving a first message sent by a transmitter, the first message carrying indication information and identification information of the message to which the first message belongs, the indication information indicating whether out-of-order message reception is allowed; processing the first message according to the indication information and the identification information, the processing including out-of-order message reception or ordered message reception of the message to which the first message belongs.

[0151] The method executed by the RDMA-based message processing device disclosed in the embodiment shown in Figure 15 of this application can be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by integrated logic circuits in the processor's hardware or by instructions in software form. The processor can be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in this application can be directly embodied in the execution of a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module can reside in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. This storage medium is located in memory, and the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above method.

[0152] The electronic device can also execute the method of Figure 6 and implement the functions of the RDMA-based message processing device in the embodiment shown in Figure 6, which will not be described in detail here.

[0153] Of course, in addition to software implementation, the electronic device of this application does not exclude other implementation methods, such as logic devices or a combination of hardware and software, etc. In other words, the execution subject of the following processing flow is not limited to each logic unit, but can also be hardware or logic devices.

[0154] This application also proposes a computer-readable storage medium storing one or more programs, the programs including instructions that, when executed by a portable electronic device including multiple applications, enable the portable electronic device to perform the method of the embodiment shown in Figure 6 and to perform the following operations: receiving a first message sent by a transmitter, the first message carrying indication information and identification information of the message to which the first message belongs, the indication information indicating whether out-of-order message reception is allowed; and processing the first message according to the indication information and the identification information, the processing including out-of-order message reception or ordered message reception of the message to which the first message belongs.

[0155] Figure 16 is a schematic diagram of the structure of an RDMA-based message processing device 160 according to an embodiment of this application. Referring to Figure 16, in one software implementation, the RDMA-based message processing device 160 may include: a receiving module 161 and a processing module 162, wherein: the receiving module 161 receives a first message sent by a sending end, the first message carrying indication information and identification information of the message to which the first message belongs, the indication information being used to indicate whether out-of-order message reception is allowed; the processing module 162 processes the first message according to the indication information and the identification information, the processing including out-of-order message reception or ordered message reception of the message to which the first message belongs.

[0156] In some embodiments, the processing module 162 processes the first message according to the indication information and the identification information, including: when the indication information indicates that out-of-order message reception is allowed, performing first processing on the first message according to the identification information, wherein the processing principle of the first processing includes preserving the order of the multiple packets within the message to which the first message belongs while not preserving the order between messages; and when the indication information indicates that out-of-order message reception is not allowed, performing second processing on the first message, wherein the processing principle of the second processing includes preserving the order of the multiple packets within the message to which the first message belongs and also preserving the order between messages.

[0157] In some implementations, the first message also carries a packet sequence number, a first packet marker, a last packet marker, an identifier for the source queue pair, and an identifier for the destination queue pair. The processing module 162 performs a first processing on the first message based on the identifier information, including: determining whether the first message is the first or last message in the message based on the first packet marker and the last packet marker; and performing the first processing on the first message based on the determination result, the identifier information, the packet sequence number, the identifier for the source queue pair, and the identifier for the destination queue pair.

[0158] In some implementations, the determination result includes that the first message is the first message in the message; the processing module 162 performs a first processing on the first message based on the determination result, the identification information, the packet sequence number, the identifier of the source queue pair, and the identifier of the destination queue pair, including: generating corresponding first and second variables based on the identification information, the identifier of the source queue pair, and the identifier of the destination queue pair, wherein the first variable is used to record the packet sequence number of the last message received in order within the message, and the second variable is used to indicate whether the first message in the message has been successfully received; initializing the first and second variables, wherein the value of the first variable after initialization is the packet sequence number of the first message, and the value of the second variable after initialization is used to indicate that the first message in the message has been successfully received; and delivering the first message to the host memory.

[0159] In some implementations, the determination result includes that the first message is not the first or last message in the message to which it belongs; the processing module 162 performs a first processing on the first message based on the determination result, the identification information, the packet sequence number, the identifier of the source queue pair, and the identifier of the destination queue pair, including: finding a second variable corresponding to the identification information, the identifier of the source queue pair, and the identifier of the destination queue pair; if the second variable exists and indicates that the first message in the message has been successfully received, finding a first variable corresponding to the identification information, the identifier of the source queue pair, and the identifier of the destination queue pair; if the packet sequence number is equal to the value of the first variable plus 1, delivering the first message to the host memory and updating the value of the first variable to the packet sequence number; if the packet sequence number is not equal to the value of the first variable plus 1, determining that packet loss or reordering has occurred within the message to which the first message belongs, and sending a NACK message to the sending end.

[0160] In some implementations, the NACK message carries the identification information and the retransmission start packet sequence number, which is equal to the sum of the value of the first variable and 1.

[0161] In some implementations, the determination result includes that the first message is the last message in the message to which it belongs; the processing module 162 performs a first processing on the first message based on the determination result, the identification information, the packet sequence number, the identifier of the source queue pair, and the identifier of the destination queue pair, including: finding a first variable and a second variable corresponding to the identification information, the identifier of the source queue pair, and the identifier of the destination queue pair; releasing the first variable and the second variable; delivering the first message to the host memory; and sending an ACK message to the sending end.

[0162] The RDMA-based message processing apparatus 160 provided in this application can also execute the method of Figure 6 and implement the functions of the RDMA-based message processing apparatus 160 in the embodiment shown in Figure 6, which will not be described again here.

[0163] This application also provides an RDMA-based message transmission and processing system, which includes a sender and a receiver. The sender sends a first message to the receiver. The first message carries indication information and identification information of the message to which the first message belongs. The indication information indicates whether out-of-order message reception is allowed. The receiver receives the first message and processes it according to the indication information and identification information. This processing includes performing out-of-order message reception or ordered message reception on the message to which the first message belongs.

[0164] In this embodiment, the implementation of each step performed by the sending end can be found in the implementation of the corresponding steps in the embodiment shown in Figure 2, and the same technical effect can be achieved. Therefore, it will not be described in detail here. Similarly, the implementation of each step performed by the receiving end can be found in the implementation of the corresponding steps in the embodiment shown in Figure 6, and the same technical effect can be achieved. Therefore, it will not be described in detail here either.

[0165] This application also proposes a computer program product comprising a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps in the above-described RDMA-based message transmission method embodiments, or to perform some or all of the steps in the above-described RDMA-based message processing method embodiments.

[0166] In summary, the above description is merely a preferred embodiment of this application and is not intended to limit the scope of protection of this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application.

[0167] The systems, apparatus, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products having certain functions. A typical implementation device is a computer. A computer can be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or any combination of these devices.

[0168] Computer-readable media include both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0169] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0170] The various embodiments in this application are described in a progressive manner. For parts that are similar or identical between embodiments, the embodiments can be referred to one another. Each embodiment focuses on describing its differences from the other embodiments. In particular, the system embodiments are basically similar to the method embodiments, so their description is relatively brief; for relevant parts, refer to the descriptions of the method embodiments.

Claims

1. A method for sending RDMA-based messages, applied to a sending end, comprising: sending a first message to a receiving end, the first message carrying indication information and identification information of a message to which the first message belongs, the indication information being used to indicate whether inter-message out-of-order reception is allowed. 2.The method of claim 1, wherein the sending of the first message to the receiving end comprises: receiving a first parameter sent by an application layer, the first parameter being used to indicate whether inter-message out-of-order reception is allowed; sending the first message to the receiving end according to the first parameter; wherein, in a case where the first parameter indicates that inter-message out-of-order reception is allowed, the indication information carried in the first message is used to indicate that inter-message out-of-order reception is allowed, and in a case where the first parameter indicates that inter-message out-of-order reception is not allowed, the indication information carried in the first message is used to indicate that inter-message out-of-order reception is not allowed. 3.The method of claim 2, wherein the receiving of the first parameter sent by the application layer comprises: receiving the first parameter sent by the application layer based on an extended programming interface, the extended programming interface comprising an extended first field, the first field being used to carry the first parameter. 
4. The method of claim 1, wherein the first packet further carries an operation code; and the method further comprises: in a case where a first condition is met, performing load balancing according to the identification information; wherein the first condition comprises that the operation code indicates a reliable mode and the indication information indicates that inter-message out-of-order reception is allowed; and the performing of the load balancing according to the identification information comprises sending packets with the same identification information over the same transmission path and sending packets with different identification information over the same or different transmission paths.

5. The method of claim 1, wherein, in a case where the indication information indicates that inter-message out-of-order reception is allowed, the method further comprises: receiving a NACK packet sent by the receiving end, the NACK packet being sent when the receiving end determines that packet loss or out-of-order delivery has occurred in the message to which the first packet belongs; and performing packet retransmission according to the NACK packet.

6. The method of claim 5, wherein the NACK packet carries the identification information and a starting packet sequence number of retransmission; and the performing of the packet retransmission according to the NACK packet comprises: sending to the receiving end, according to the identification information and the starting packet sequence number, the packets of the message corresponding to the identification information, starting from the starting packet sequence number.
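The load-balancing and retransmission steps in the claims above can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: `choose_path`, `retransmit`, `num_paths`, and `default_path` are hypothetical names, CRC32 is just one possible hash, and sequence-number wraparound is ignored.

```python
import zlib
from typing import Dict, List, Tuple


def choose_path(msg_id: int, opcode_reliable: bool, allow_ooo: bool,
                num_paths: int, default_path: int = 0) -> int:
    """Pick a transmission path for a packet.

    When the first condition holds (reliable opcode and inter-message
    out-of-order reception allowed), the path depends only on the message
    identifier, so all packets of one message share a path while different
    messages may land on the same or different paths.
    """
    if opcode_reliable and allow_ooo:
        return zlib.crc32(msg_id.to_bytes(8, "big")) % num_paths
    return default_path  # assumption: otherwise stay on a single fixed path


def retransmit(sent: Dict[int, List[Tuple[int, bytes]]],
               nack_msg_id: int, nack_start_psn: int) -> List[Tuple[int, bytes]]:
    """Return the (psn, payload) pairs to resend for one NACK.

    Only packets of the NACKed message are resent, starting from the
    sequence number carried in the NACK.
    """
    return [(psn, payload) for psn, payload in sent[nack_msg_id]
            if psn >= nack_start_psn]
```

Keying the path on the message identifier keeps each message in order on its own path, and keying retransmission on the same identifier confines the resend to that one message.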
7. An RDMA-based packet processing method, applied to a receiving end, comprising: receiving a first packet sent by a sending end, the first packet carrying indication information and identification information of a message to which the first packet belongs, the indication information being used to indicate whether inter-message out-of-order reception is allowed; and processing the first packet according to the indication information and the identification information, the processing comprising inter-message out-of-order reception or inter-message in-order reception of the message to which the first packet belongs.

8. The method of claim 7, wherein processing the first packet according to the indication information and the identification information comprises: in a case where the indication information indicates that inter-message out-of-order reception is allowed, performing first processing on the first packet according to the identification information, wherein the first processing includes processing packets in the message to which the first packet belongs in order, without processing the message to which the first packet belongs in inter-message order; and in a case where the indication information indicates that inter-message out-of-order reception is not allowed, performing second processing on the first packet according to the identification information, wherein the second processing includes processing packets in the message to which the first packet belongs in order and processing the message to which the first packet belongs in inter-message order.

9. The method of claim 8, wherein the first packet further carries a packet sequence number, a first packet flag, a last packet flag, an identification of a source queue pair, and an identification of a destination queue pair; and the processing of the first packet according to the identification information comprises: determining whether the first packet is the first packet or the last packet in the message according to the first packet flag and the last packet flag; and processing the first packet according to the determination result, the identification information, the packet sequence number, the identification of the source queue pair, and the identification of the destination queue pair.

10. The method of claim 9, wherein the determination result includes that the first packet is the first packet in the message, and the processing of the first packet according to the determination result, the identification information, the packet sequence number, the identification of the source queue pair, and the identification of the destination queue pair comprises: generating a first variable and a second variable corresponding to the identification information, the identification of the source queue pair, and the identification of the destination queue pair, wherein the first variable is used to record the packet sequence number of the last packet in the message that has been received in order, and the second variable is used to indicate whether the first packet in the message has been successfully received; initializing the first variable and the second variable, wherein the value of the first variable after initialization is the packet sequence number of the first packet, and the value of the second variable after initialization is used to indicate that the first packet in the message has been successfully received; and delivering the first packet to a host memory.

11. The method of claim 9, wherein the determination result includes that the first packet is neither the first packet nor the last packet in the message, and the processing of the first packet according to the determination result, the identification information, the packet sequence number, the identification of the source queue pair, and the identification of the destination queue pair comprises: finding a second variable corresponding to the identification information, the identification of the source queue pair, and the identification of the destination queue pair; in a case where the second variable exists and indicates that the first packet in the message has been successfully received, finding a first variable corresponding to the identification information, the identification of the source queue pair, and the identification of the destination queue pair; in a case where the packet sequence number is equal to the sum of the value of the first variable and 1, delivering the first packet to a host memory and updating the value of the first variable to the packet sequence number; and in a case where the packet sequence number is not equal to the sum of the value of the first variable and 1, determining that packet loss or out-of-order delivery has occurred in the message to which the first packet belongs, and sending a NACK packet to the sending end.

12. The method of claim 11, wherein the NACK packet carries the identification information and a start packet sequence number for retransmission, and the start packet sequence number is equal to the sum of the value of the first variable and 1.

13. The method of claim 9, wherein the determination result includes that the first packet is the last packet in the message to which the first packet belongs, and the processing of the first packet according to the determination result, the identification information, the packet sequence number, the identification of the source queue pair, and the identification of the destination queue pair includes: finding a first variable and a second variable corresponding to the identification information, the identification of the source queue pair, and the identification of the destination queue pair; releasing the first variable and the second variable; delivering the first packet to a host memory; and sending an ACK packet to the sending end.
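The receiving-side handling of first, middle, and last packets in claims 10 to 13 can be illustrated with a small per-message state tracker. This is a sketch under stated assumptions rather than the claimed implementation: `Receiver`, `MsgState`, and the returned action strings are hypothetical. Dropping a packet whose state is missing, applying the sequence check to the last packet, and handling a single-packet message are assumptions that the claims do not spell out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, int, int]  # (message id, source QP id, destination QP id)


@dataclass
class MsgState:
    last_in_order_psn: int  # "first variable": PSN of the last packet received in order
    first_received: bool    # "second variable": first packet of the message was received


class Receiver:
    def __init__(self) -> None:
        self.states: Dict[Key, MsgState] = {}
        self.host_memory: List[bytes] = []  # stand-in for delivery to host memory

    def on_packet(self, msg_id: int, src_qp: int, dst_qp: int, psn: int,
                  first: bool, last: bool, payload: bytes) -> Tuple[str, Optional[int]]:
        """Return (action, start_psn); start_psn is set only for a NACK."""
        key = (msg_id, src_qp, dst_qp)
        if first:
            # Claim 10: create and initialize both variables, then deliver.
            self.states[key] = MsgState(psn, True)
            self.host_memory.append(payload)
            if last:  # assumption: single-packet message completes immediately
                del self.states[key]
                return ("ack", None)
            return ("delivered", None)
        state = self.states.get(key)
        if state is None or not state.first_received:
            return ("dropped", None)  # assumption: behavior not specified in the claims
        if psn != state.last_in_order_psn + 1:
            # Claims 11 and 12: loss or reordering inside this message only.
            # Assumption: the same check is also applied to the last packet.
            return ("nack", state.last_in_order_psn + 1)
        self.host_memory.append(payload)
        if last:
            # Claim 13: release the variables, deliver, and acknowledge.
            del self.states[key]
            return ("ack", None)
        state.last_in_order_psn = psn
        return ("delivered", None)
```

Because the state is keyed by message identifier and queue pair identifiers, packets of a second message can interleave with the first without triggering a NACK; only a gap inside one message does.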

14. An RDMA-based packet sending and processing system, comprising a sending end and a receiving end, wherein: the sending end sends a first packet to the receiving end, the first packet carrying indication information and identification information of a message to which the first packet belongs, the indication information being used to indicate whether inter-message out-of-order reception is allowed; the receiving end receives the first packet, and processes the first packet according to the indication information and the identification information, the processing including inter-message out-of-order reception or inter-message in-order reception of the message to which the first packet belongs.

15. An electronic device, comprising: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the method of any one of claims 1 to 13.

16. A computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of claims 1 to 13.

17. A computer program product, comprising a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps in the method of any one of claims 1 to 13.