RDMA network receiver-side load balancing methods, devices, equipment, media, and products
By using a load balancing method on the receiver side of the RDMA network, which utilizes hashing and locking mechanisms to distribute packets to different linked lists and schedules them based on virtual and physical functions, the problem of traffic and load imbalance on the receiver side of the RDMA network is solved, thereby improving system performance and throughput.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- WUXI STARS MICRO SYSTEM TECHNOLOGIES CO LTD
- Filing Date
- 2025-02-19
- Publication Date
- 2026-06-30

Figure CN120034492B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computer network communication technology, specifically to a load balancing method, apparatus, device, medium, and product for the receiving side of an RDMA network.
Background Technology
[0002] RDMA (Remote Direct Memory Access) is a direct memory access technology used to solve the problem of server-side data processing latency in network transmission. Its core concept is to allow one computer system to directly read and write to the memory of another system without much intervention from the operating system kernel and CPU of both systems.
[0003] RDMA is a stateful service, and the processing contexts of packets within a single stream influence each other, making single-stream performance a bottleneck on the receiving side. Head-of-line blocking can occur between received traffic within the network interface card (NIC). For example, if a specific traffic function experiences a write-to-host traffic bottleneck, it can congest the NIC's overall receive traffic. To improve processing performance, the receiving side typically divides processing into multiple pipeline stages. However, because the service is stateful, processing in later pipeline stages can affect processing in earlier stages; therefore, there is an inherent processing bottleneck across the pipeline stages of the same stream. High-bandwidth, high-performance processing usually requires parallel processing of traffic using multiple engines, similar to a multi-process architecture. However, because RDMA is a stateful service, traffic of the same flow is coupled and cannot interact across the processing modules of multiple engines. This results in uneven load states across the engines, limiting maximum performance.
[0004] Therefore, there is an urgent need for a method to perform traffic and load balancing in RDMA network receiver congestion scenarios, so as to ensure balanced inter-flow processing under congestion conditions.
Summary of the Invention
[0005] In view of this, this application provides a load balancing method, apparatus, device, medium and product for the receiving side of an RDMA network, which performs traffic and load balancing in congestion scenarios on the receiving side of an RDMA network to ensure inter-flow balancing processing under congestion conditions. The technical solution is as follows.
[0006] In a first aspect, this application provides a load balancing method for the receiver side of an RDMA network, wherein the receiver side of the RDMA network includes a receive buffer and a receiver side processing engine, and the method includes:
[0007] The received traffic data is retrieved from the receive buffer; this traffic data consists of multiple packets from different data streams.
[0008] Hash the traffic data to obtain a data linked list corresponding to the packets of different data streams;
[0009] The data list is categorized based on the load status of the receiving side processing engine;
[0010] Select the virtual function corresponding to the data list, perform flow control round-robin fair scheduling based on the virtual function, and schedule the corresponding target packet from the target data list.
[0011] Based on the current load of the receiving side processing engine, the target message is transmitted to the target receiving side processing engine.
[0012] In one optional implementation, after the step of transmitting the target message to the target receiving side processing engine, the method further includes: locking the target data list, and releasing the target data list after the target receiving side processing engine has finished processing the target message.
[0013] In one optional implementation, the data list is classified according to the load status of the receiving-side processing engine, including:
[0014] The data list is classified based on physical or virtual functions using a load balancing algorithm to ensure fair traffic scheduling for the physical or virtual functions.
[0015] In one optional implementation, the step of performing flow control round-robin fair scheduling based on the virtual function to schedule the corresponding target packet from the target data list includes:
[0016] When traffic congestion occurs in a scheduled function, scheduling of packets from the data list for that function is suspended.
[0017] The RDMA network receiver-side load balancing method provided in this application has the following advantages:
[0018] The RDMA network receiver-side load balancing method in this application performs load balancing between the receive buffer and the receiver-side processing engine under receiver-side congestion scenarios, and implements traffic balancing across virtual functions. Before packets are pushed from the receive buffer to the receiving-side processing engine, the received traffic data, composed of packets from multiple different data streams, is retrieved from the receive buffer. The packets are first hashed; the hashing function chains packets from different streams into different chains. Since different traffic streams do not require order preservation relative to each other, small packets can overtake other traffic even under congestion. After the packets are chained through hashing, a locking function is applied to each chain. After a packet is scheduled out of a chain, the chain is locked to ensure that only one packet from the same stream is processed by subsequent modules at a time. After the subsequent modules complete processing, the lock on the corresponding chain is released so that the next packet can be scheduled for processing. Traffic corresponding to the chains is allocated based on traffic partitioning methods such as VF (Virtual Function). When VF traffic is congested, scheduling of the corresponding VF is stopped to prevent its packets from occupying the receiving-side processing engine's processing space, thus achieving traffic balancing. Furthermore, traffic can be allocated based on the Physical Function (PF) corresponding to the linked list. Packets scheduled from the data linked list are distributed to different receiving-side processing engines based on the engines' current load. Since each flow has only one packet in the processing pipeline at a time, there is no context interaction between receiving-side processing engines, and no rollback processing between different pipeline stages within the same receiving-side processing engine.
This ensures balanced processing between flows even under receiving-side congestion; supports inter-flow bypass to reduce tail latency for small packets; and supports flexible architecture to improve performance.
[0019] Secondly, this application provides a load balancing device for the receiver side of an RDMA network, wherein the receiver side of the RDMA network includes a receive buffer and a receiver side processing engine, and the device includes:
[0020] The acquisition module is used to acquire received traffic data from the receive buffer; this traffic data consists of multiple packets from different data streams.
[0021] The hash module is used to hash the traffic data to obtain a data linked list corresponding to the packets of different data streams;
[0022] The allocation module is used to classify the data list according to the load status of the receiving side processing engine;
[0023] The scheduling module is used to select the virtual function corresponding to the data list, perform flow control round-robin fair scheduling based on the virtual function, and schedule the corresponding target packet from the target data list.
[0024] The processing module is used to transmit the target message to the target receiving side processing engine based on the current load status of the receiving side processing engine.
[0025] In one optional implementation, the scheduling module is further configured to lock the target data list and release the lock on the target data list after the target receiving side processing engine has finished processing the target message.
[0026] In one optional implementation, the scheduling module is further configured to: suspend the corresponding scheduling function from scheduling packets from the data list when there is traffic congestion in the scheduling function.
[0027] In one alternative implementation, the allocation module is specifically used to: classify the data list based on physical or virtual functions using a load balancing algorithm, so as to ensure fair traffic scheduling for the physical or virtual functions.
[0028] Thirdly, this application provides a computer device, including: a memory and a processor, which are communicatively connected to each other. The memory stores computer instructions, and the processor executes the computer instructions to perform the RDMA network receiver-side load balancing method described in the first aspect or any corresponding embodiment.
[0029] Fourthly, this application provides a computer-readable storage medium storing computer instructions for causing a computer to execute the RDMA network receiver-side load balancing method described in the first aspect or any corresponding embodiment thereof.
[0030] Fifthly, this application provides a computer program product, including computer instructions for causing a computer to execute the RDMA network receiver-side load balancing method described in the first aspect or any corresponding embodiment thereof.
Attached Figure Description
[0031] To more clearly illustrate the technical solutions in the specific embodiments of this application or the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0032] Figure 1 is a flowchart illustrating a typical RX processing flow.
[0033] Figure 2 is a flowchart illustrating the exception handling pipeline of an RX Engine.
[0034] Figure 3 is a flowchart illustrating a multi-RX Engine parallel processing workflow.
[0035] Figure 4 is a flowchart illustrating an RDMA network receiver-side load balancing method according to an exemplary embodiment of this application.
[0036] Figure 5 is a schematic diagram illustrating traffic load balancing processing according to an exemplary embodiment of this application.
[0037] Figure 6 is a schematic diagram of the structure of the RDMA network receiver-side load balancing device provided in the embodiments of this application.
[0038] Figure 7 is a schematic diagram of the structure of a computer device provided in an optional embodiment of this application.
Detailed Implementation
[0039] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0040] First, the terminology used in this application is introduced.
[0041] RDMA: Remote Direct Memory Access, a technology to solve the problem of server-side data processing latency in network transmission;
[0042] CPU: Central Processing Unit;
[0043] RX Engine: Receiver-side processing engine;
[0044] RC: Reliable Connection, a type of transport service in RDMA technology;
[0045] InfiniBand: A high-speed, low-latency computer network communication standard;
[0046] WR: Work Request, a work request issued by a user process;
[0047] SEND: An RDMA operation type used to send data from the local end to the remote end, which needs to be received by the remote end;
[0048] WRITE: An RDMA operation type used for writing data from the local end to the remote end;
[0049] RX_BUFFER: Receive-side buffer space;
[0050] PIPE1, PIPE2, PIPE3: These are used to describe the pipes through which data flows through different processing stages.
[0051] In traditional network communication, when a computer system needs to transmit data to another system, the data typically undergoes multiple layers of processing. For example, at the sending end, data is first copied from the application buffer to the operating system kernel buffer, and then processed through layers of encapsulation by the network protocol stack before being sent over the network. At the receiving end, the data needs to be unpacked in reverse and copied from the kernel buffer to the application buffer. This multiple data copying and complex software layer processing introduces numerous problems.
[0052] High latency: Data transmission and processing between different layers consume a significant amount of time, resulting in a substantial increase in end-to-end latency. This is extremely detrimental to applications with high real-time requirements, such as financial trading systems and real-time data interaction in high-performance computing. Low bandwidth utilization: Frequent data copying operations, along with the processing overhead of the operating system and network protocol stack, consume a large amount of CPU resources and network bandwidth, significantly reducing the bandwidth actually available for effective data transmission.
[0053] To overcome these shortcomings of traditional network communication, Remote Direct Memory Access (RDMA) technology emerged. The core concept of RDMA is to allow one computer system to directly read and write to the memory of another system without excessive intervention from the operating system kernels and CPUs of both systems. In RDMA technology, the commonly used Reliable Connection (RC) service type ensures that information is sent completely and without error to the destination, and the destination returns an acknowledgment notifying the requesting end that the information has been received completely and without error. The RC service type supports the following operation types. SEND operation: the local end sends data to the remote end, which stores it in its receiving space; the remote host reads and processes the data, and returns an ACK (Acknowledgement) message after receiving it. WRITE operation: the local end writes data to the remote end without the remote host's intervention, and the remote end returns an ACK upon completion. READ operation: the local end reads data from the remote end without the remote host's intervention, and the remote end returns a RESP message carrying the data back to the local end upon completion.
[0054] Due to the aforementioned advantages, current standard network interface cards (NICs) and smart NICs typically require offloading the RDMA protocol internally. However, RDMA is a stateful protocol offloading mechanism, and typical receiver-side protocol offloading implementations suffer from the following bottlenecks:
[0055] RDMA, as a popular high-speed network solution, has high bandwidth and low latency as its key technical features. In current GPU model-training traffic, the slowest path becomes the biggest bottleneck to training efficiency, and tail latency is becoming an increasingly important metric. Furthermore, under incast congestion on the receiving network card side, an "elephant flow" rapidly increases the tail latency of a "mouse flow", and a "mouse flow" is typically very sensitive to latency. As shown in Figure 1, in a typical RDMA RX Engine processing flow, packets enter the network card sequentially and are stored in the network card's BUFFER space. The RX_BUFFER sequentially pushes the packets to the RX_ENG for processing. Different colors represent different flows, and processing does not need to preserve order across flows. Because of receive-side processing congestion on the network card, the red traffic is queued far back, so latency-sensitive traffic cannot overtake other traffic.
[0056] In terms of technical implementation, traffic is typically sent and received in bursts. Therefore, the bursty nature of elephant flows causes short periods of single-stream processing at the receiving end. Since RDMA is a stateful service, the processing contexts within a single stream influence each other, making single-stream performance a bottleneck for the receiving end. For example, as shown in Figure 1, when the RX_ENG processes packets sequentially, many of the leading packets come from the same traffic stream, as indicated by the yellow traffic in the figure. Throughput is therefore limited to single-stream processing performance for a short period, leading to a decrease in overall RX_ENG performance. Head-of-line blocking may exist between received traffic within the network interface card (NIC). For example, if a specific traffic function experiences a write-to-host traffic bottleneck, it can cause congestion in the overall NIC receive traffic.
[0057] To improve processing performance, the receiving side typically divides processing into multiple pipeline stages. However, due to the stateful nature of the processing, the processing of later pipeline stages may affect the processing of earlier pipeline stages; therefore, there is an inherent processing bottleneck within the pipeline stages of the same flow. For example, as shown in Figure 2, during exception handling, if an error occurs in PIPE3, it may affect the processing of a packet of the same flow that is currently being processed in PIPE2. However, packets that belong to different flows do not affect each other.
[0058] High-bandwidth, high-performance processing typically requires the use of multiple engines, similar to a multi-process architecture, to process service traffic in parallel, as shown in Figure 3. RDMA is a stateful service: traffic of the same flow is coupled and cannot interact across the processing modules of multiple Engines, resulting in different load states for the Engines and affecting maximum performance. As shown in the figure, when ENG0 experiences packet congestion and cannot process packets, packets from the same flow, being stateful, cannot be sent to other ENGs, while other flows are blocked behind those packets in the RX BUFFER, causing RX_ENG1 to become idle.
[0059] To address the congestion on the receiving side in RDMA protocol offloading implementations, this application provides a load balancing method for the receiving side of an RDMA network to ensure inter-flow load balancing under congestion conditions; it supports inter-flow bypass to reduce tail latency for small packets; and it supports flexible architecture to improve performance.
[0060] In this embodiment of the RDMA network receiver-side load balancing method, the RDMA network receiver side includes a receive buffer and a receiver-side processing engine. As shown in Figure 4, the method of this embodiment includes the following steps:
[0061] S401. Obtain the received traffic data from the receive buffer; the traffic data consists of multiple messages from different data streams.
[0062] Specifically, in step S401, before the message is pushed from the receive buffer to the receiving side processing engine, the received traffic data is obtained from the receive buffer. This traffic data consists of messages from multiple different data streams.
[0063] S402. Hash the traffic data to obtain the data linked list corresponding to the packets of different data streams.
[0064] Specifically, in step S402, the traffic data is hashed using a hash algorithm, linking packets from different data streams into different data lists. The purpose of the hash operation is to allocate data packets from different data streams to different lists, ensuring that data packets from the same data stream are processed sequentially in subsequent processing. By forming different lists for each data stream's packets, it is ensured that data from different data streams does not interfere with each other during processing, effectively reducing the problem of congestion across all traffic caused by congestion in a specific list.
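The hashing of step S402 can be sketched as follows. This is a minimal illustration, not the implementation of this application: the flow identifier field `flow`, the number of lists `NUM_LISTS`, and the use of CRC32 as the hash are all assumptions made for the example.

```python
import zlib
from collections import deque

NUM_LISTS = 4  # hypothetical number of data linked lists


def flow_hash(flow_id: str, num_lists: int = NUM_LISTS) -> int:
    """Map a flow identifier to a list index with a deterministic hash."""
    return zlib.crc32(flow_id.encode()) % num_lists


def chain_packets(packets, num_lists: int = NUM_LISTS):
    """Chain packets into per-hash lists, keeping arrival order inside each list."""
    lists = [deque() for _ in range(num_lists)]
    for pkt in packets:
        lists[flow_hash(pkt["flow"], num_lists)].append(pkt)
    return lists


# Packets as they sit in the receive buffer (hypothetical contents).
rx_buffer = [
    {"flow": "qp-1", "seq": 0},
    {"flow": "qp-2", "seq": 0},
    {"flow": "qp-1", "seq": 1},
    {"flow": "qp-3", "seq": 0},
    {"flow": "qp-1", "seq": 2},
]
lists = chain_packets(rx_buffer)
```

Because the hash depends only on the flow identifier, every packet of a flow lands in the same list in arrival order, which is the property the later locking step relies on.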
[0065] S403. Classify the data list according to the load status of the receiving side processing engine.
[0066] Specifically, in step S403, traffic is allocated based on Virtual Functions (VFs). Each Virtual Function (VF) corresponds to different traffic, and congestion is controlled through real-time traffic monitoring. When traffic in a VF becomes congested, scheduling of that VF is suspended to prevent it from continuing to occupy resources and affect the overall system performance. This method effectively avoids excessive resource consumption, especially under high load conditions, and can balance the load between different flows, reducing system congestion and latency. Furthermore, traffic can also be allocated based on Physical Functions (PFs) corresponding to linked lists.
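The VF-based flow control described above can be sketched as a round-robin scheduler that skips any VF currently marked congested. The class `VfScheduler`, its method names, and the way congestion is signalled are hypothetical; the application does not specify how congestion is detected.

```python
from collections import deque


class VfScheduler:
    """Round-robin over virtual functions, skipping any VF marked congested."""

    def __init__(self, vf_ids):
        self.queues = {vf: deque() for vf in vf_ids}
        self.congested = set()
        self.order = deque(vf_ids)

    def enqueue(self, vf, packet):
        self.queues[vf].append(packet)

    def set_congested(self, vf, flag):
        """Suspend (flag=True) or resume (flag=False) scheduling of one VF."""
        if flag:
            self.congested.add(vf)
        else:
            self.congested.discard(vf)

    def next_packet(self):
        """Return (vf, packet) for the next eligible VF, or None if none is eligible."""
        for _ in range(len(self.order)):
            vf = self.order[0]
            self.order.rotate(-1)  # move this VF to the back of the rotation
            if vf in self.congested or not self.queues[vf]:
                continue
            return vf, self.queues[vf].popleft()
        return None


sched = VfScheduler(["vf0", "vf1"])
for i in range(3):
    sched.enqueue("vf0", f"a{i}")
    sched.enqueue("vf1", f"b{i}")

sched.set_congested("vf0", True)
while_congested = [sched.next_packet() for _ in range(2)]
sched.set_congested("vf0", False)
after_resume = [sched.next_packet() for _ in range(4)]
```

While `vf0` is congested its packets stay queued and only `vf1` is served, so the congested VF does not occupy engine capacity; once it resumes, the two VFs alternate again.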
[0067] Optionally, the classification of the data linked lists in the above steps relies on a load balancing algorithm, but the specific implementation differs from traditional load balancing. Specifically, based on the load status of the receiving-side processing engines, the data linked lists are first statically hashed and distributed. By evenly distributing the hash keys of the flows, different flows are distributed across multiple linked lists, essentially a static load balancing strategy. Hash distribution only completes the initial load allocation; subsequent steps require adjustments using dynamic load balancing algorithms. For example, the least-connections algorithm assigns newly arriving packets to the processing engine with the fewest currently active connections, avoiding overload of a single engine. It also monitors the queue depth and processing latency of each engine in real time, prioritizing the lightest-loaded node. Another example is the weighted round-robin algorithm, which allocates weights based on the processing capacity of the receiving-side engines, proportionally distributing packets to different processing engines based on these weights, allocating more resources to high-priority VFs or traffic categories. Through the layered collaboration of static and dynamic load balancing, the requirements for data flow order preservation, low latency, and high throughput can be balanced.
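As an illustration of the weighted round-robin idea mentioned above, the following sketch expands integer weights into a repeating schedule. The engine names and weights are hypothetical, and the sketch deliberately ignores per-flow locking and real-time load monitoring.

```python
from itertools import cycle


def weighted_round_robin(weights):
    """Return an endless iterator of engine names in proportion to integer weights."""
    schedule = [name for name, w in weights.items() for _ in range(w)]
    return cycle(schedule)


# Hypothetical weights: eng0 is assumed to have twice the capacity of eng1.
engine_weights = {"eng0": 2, "eng1": 1}
picker = weighted_round_robin(engine_weights)
assignments = [next(picker) for _ in range(6)]
```

Over any full cycle, each engine receives a share of packets equal to its weight divided by the sum of the weights; a production scheduler would more likely interleave the picks instead of emitting them in runs.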
[0068] S404. Select the virtual function corresponding to the data list, perform flow control round-robin fair scheduling based on the virtual function, and schedule the corresponding target packet from the target data list.
[0069] Optionally, in step S404, after scheduling the corresponding packet from the data linked list using the scheduling function, the linked list is locked to ensure that only one packet from the same flow is processed by subsequent modules. The main purpose of locking is to prevent multiple packets from being scheduled to subsequent processing modules simultaneously, causing resource contention and conflicts. After the subsequent module completes processing, it releases the lock on the corresponding linked list so that the next packet can be scheduled for processing. This avoids resource conflicts and interference between different flows, ensuring efficient traffic scheduling.
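The lock and release behavior of the linked list can be sketched as below. The class `LockedList` and its methods are hypothetical names for illustration; a hardware implementation would typically use a per-list status bit instead of a Python object.

```python
from collections import deque


class LockedList:
    """A per-flow data list that releases at most one packet at a time."""

    def __init__(self, packets):
        self.packets = deque(packets)
        self.locked = False

    def try_schedule(self):
        """Return the head packet and lock the list, or None if locked or empty."""
        if self.locked or not self.packets:
            return None
        self.locked = True
        return self.packets.popleft()

    def release(self):
        """Called when the downstream engine has finished the in-flight packet."""
        self.locked = False


flow_a = LockedList(["a0", "a1"])
flow_b = LockedList(["b0"])

p1 = flow_a.try_schedule()  # first packet of flow A goes out; list A is now locked
p2 = flow_a.try_schedule()  # blocked: flow A already has a packet in flight
p3 = flow_b.try_schedule()  # flow B is independent and can still be scheduled
flow_a.release()            # engine reports completion for flow A
p4 = flow_a.try_schedule()  # next packet of flow A may now be scheduled
```

The lock serializes packets within one flow while leaving other flows free to proceed, which is how a small flow can overtake a backlog belonging to another flow.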
[0070] S405. Based on the current load of the receiving side processing engine, transmit the target message to the target receiving side processing engine.
[0071] Specifically, in step S405, the scheduled packets are assigned to different ENGINEs (processing units) based on each ENGINE's current load. At any given time, each ENGINE processes only one packet from a given stream, while packets from multiple streams can be processed simultaneously. Therefore, there is no context switching or rollback issue between different streams. In this way, the load of each processing unit can be balanced, avoiding the performance loss caused by context switching and improving the parallel processing capability of multi-core and multi-processing-unit systems.
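Load-based dispatch in step S405 can be sketched as choosing the engine with the smallest outstanding load for each scheduled packet. The load metric (an abstract cost per packet), the engine names, and the tie-breaking rule are assumptions for the example.

```python
def pick_engine(loads):
    """Return the engine with the smallest outstanding load (ties broken by name)."""
    return min(sorted(loads), key=lambda name: loads[name])


def dispatch(packets, loads):
    """Assign each (packet, cost) pair to the currently least-loaded engine."""
    placement = []
    for pkt, cost in packets:
        eng = pick_engine(loads)
        loads[eng] += cost  # the engine's outstanding load grows by the packet's cost
        placement.append((pkt, eng))
    return placement


# Hypothetical starting loads: eng0 is already busy, eng1 is idle.
engine_loads = {"eng0": 5, "eng1": 0, "eng2": 2}
placement = dispatch([("p0", 3), ("p1", 1), ("p2", 1)], engine_loads)
```

In this run the busy engine `eng0` receives nothing until the others catch up, which is the behavior that keeps an idle engine from starving while another is congested.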
[0072] In summary, the RDMA network receiver-side load balancing method provided in this embodiment of the application performs load balancing between the receive buffer and the receiver-side processing engine under receiver-side congestion scenarios, and implements traffic balancing across virtual functions. Before packets are pushed from the receive buffer to the receiving-side processing engine, the received traffic data, composed of packets from multiple different data streams, is retrieved from the receive buffer. The packets are first hashed; the hashing function chains packets from different streams into different chains. Since different traffic streams do not require order preservation relative to each other, small packets can overtake other traffic even under congestion. After the packets are chained through hashing, a locking function is applied to each chain. After a packet is scheduled out of a chain, the chain is locked to ensure that only one packet from the same stream is processed by subsequent modules at a time. After the subsequent modules complete processing, the lock on the corresponding chain is released so that the next packet can be scheduled for processing. Traffic corresponding to the chains is allocated based on traffic partitioning methods such as VF (Virtual Function). When VF traffic is congested, scheduling of the corresponding VF is stopped to prevent its packets from occupying the receiving-side processing engine's processing space, thus achieving traffic balancing. Furthermore, traffic can be allocated based on the Physical Function (PF) corresponding to the linked list. Packets scheduled from the data linked list are distributed to different receiving-side processing engines based on the engines' current load. Since each flow has only one packet in the processing pipeline at a time, there is no context interaction between receiving-side processing engines, and no rollback processing between different pipeline stages within the same receiving-side processing engine.
This ensures balanced processing between flows even under receiving-side congestion; supports inter-flow bypass to reduce tail latency for small packets; and supports flexible architecture to improve performance.
[0073] For example, based on the RDMA network receiver-side load balancing method of the above embodiments, a traffic load balancing module is constructed between RX_BUFFER and RX_ENG, as shown in Figure 5. Traffic hashing avoids head-of-line blocking between flows and reduces latency for small packets. The hashed traffic can be differentiated by Virtual Function (VF), enabling function-based traffic balancing control. By hashing the traffic, different flows can be allocated to different VFs for processing. Each VF can be viewed as an independent virtual functional unit responsible for processing packets from specific flows. VF-based differentiation achieves traffic balancing control and prevents congestion and blocking between VFs. VFs are allocated and managed through traffic scheduling, avoiding resource contention among multiple traffic processing tasks within the same VF, thus reducing system congestion and latency and preventing blocking between functions. At the same time, the traffic scheduling process locks each flow, ensuring that only one packet from each flow is processed by the RX side at any given time, which prevents rollback of stateful services and ensures no blocking between ENGs, allowing load balancing among ENGINEs. This traffic load balancing module has a flexible structure and can integrate any number of RX_ENGINEs depending on the required performance. Through single-stream locking, it converts stateful services into stateless services, avoids conflicts, and solves the multi-stage pipeline rollback problem. It achieves traffic balancing based on traffic partitioning such as PF / VF. In addition, the conversion of stateful services into stateless services improves parallel processing capability and enables load balancing between engines.
[0074] By introducing a traffic load balancing module, a linked list locking and unlocking mechanism, VF flow control, and intelligent packet scheduling, the traffic management and load balancing on the RDMA receiver side are optimized. These measures effectively reduce resource contention and avoid congestion when facing large-scale data flows and complex network loads, improving data processing efficiency and stability. Optimizing traffic allocation and scheduling avoids congestion problems, reduces small packet latency, and increases overall system throughput. Overall, this design significantly improves the processing performance of the RDMA system, especially in high-traffic, high-load network environments, demonstrating better reliability and efficiency.
[0075] This application also provides an RDMA network receiver-side load balancing device for implementing the above embodiments and preferred embodiments; details already described will not be repeated. As used below, the term "module" can refer to a combination of software and / or hardware that performs a predetermined function. Although the device described in the following embodiments is preferably implemented in software, hardware implementation, or a combination of software and hardware, is also possible and contemplated.
[0076] This application provides an RDMA network receiver-side load balancing device. Figure 6 is a schematic diagram of a load balancing device for the receiver side of an RDMA network provided in an embodiment of this application. The RDMA network receiver side includes a receive buffer and a receiver side processing engine. The device includes:
[0077] The acquisition module 601 is used to acquire received traffic data from the receive buffer; the traffic data consists of multiple packets of different data streams;
[0078] The hash module 602 is used to hash the traffic data to obtain a data linked list corresponding to the packets of different data streams;
[0079] The allocation module 603 is used to classify the data list according to the load status of the receiving side processing engine;
[0080] The scheduling module 604 is used to select the virtual function corresponding to the data list, perform flow control round-robin fair scheduling based on the virtual function, and schedule the corresponding target packet from the target data list.
[0081] The processing module 605 is used to transmit the target message to the target receiving side processing engine based on the current load status of the receiving side processing engine.
[0082] In an optional implementation, the scheduling module 604 is further configured to lock the target data list and release the lock on the target data list after the target receiving side processing engine has finished processing the target message.
[0083] In an optional implementation, the scheduling module 604 is further configured to: suspend the corresponding scheduling function from scheduling packets from the data list when there is traffic congestion in the scheduling function.
[0084] In an optional implementation, the allocation module 603 is specifically used to: classify the data linked lists by physical function or virtual function using a load balancing algorithm, so as to enable fair scheduling of traffic among the physical or virtual functions.
[0085] Further functional descriptions of the above modules and units are the same as those in the corresponding embodiments described above, and will not be repeated here.
[0086] In this embodiment, the RDMA network receiver-side load balancing device is presented in the form of functional units. Here, a unit refers to an ASIC (Application Specific Integrated Circuit), a processor and memory that execute one or more software or firmware programs, and / or other devices that can provide the above functions.
[0087] This application also provides a computer device that has the RDMA network receiver-side load balancing device shown in Figure 6.
[0088] Please refer to Figure 7, which is a schematic diagram of the structure of a computer device provided in an optional embodiment of this application. As shown in Figure 7, the computer device includes one or more processors 10, memory 20, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components communicate with each other via different buses and can be mounted on a common motherboard or otherwise installed as needed. The processors can process instructions executed within the computer device, including instructions stored in or on memory to display graphical information in a graphical user interface on an external input / output device (such as a display device coupled to the interface). In some alternative implementations, multiple processors and / or multiple buses can be used with multiple memories and multiple memory modules, if desired. Similarly, multiple computer devices can be connected, each providing some of the necessary operations (e.g., as a server array, a group of blade servers, or a multiprocessor system). Figure 7 takes one processor 10 as an example.
[0089] Processor 10 may be a central processing unit, a network processor, or a combination thereof. Processor 10 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The programmable logic device may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
[0090] The memory 20 stores instructions executable by at least one processor 10 to cause the at least one processor 10 to perform the method shown in the above embodiments.
[0091] The memory 20 may include a program storage area and a data storage area. The program storage area may store the operating system and applications required for at least one function; the data storage area may store data created based on the use of the computer device. Furthermore, the memory 20 may include high-speed random access memory and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, the memory 20 may optionally include memory remotely located relative to the processor 10, and these remote memories may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
[0092] The memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk or solid-state drive; the memory 20 may also include a combination of the above types of memory.
[0093] The computer device also includes an input device 30 and an output device 40. The processor 10, memory 20, input device 30, and output device 40 can be connected via a bus or other means. Figure 7 takes connection via a bus as an example.
[0094] This application also provides a computer-readable storage medium. The methods described in this application can be implemented in hardware or firmware, or implemented as recordable on a storage medium, or implemented as computer code downloaded over a network and originally stored on a remote storage medium or a non-transitory machine-readable storage medium and subsequently stored on a local storage medium. Thus, the methods described herein can be processed by software stored on a storage medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware. The storage medium can be a magnetic disk, optical disk, read-only memory, random access memory, flash memory, hard disk, or solid-state drive, etc.; further, the storage medium can also include combinations of the above types of memory. It is understood that computers, processors, microprocessor controllers, or programmable hardware include storage components capable of storing or receiving software or computer code. When the software or computer code is accessed and executed by the computer, processor, or hardware, the methods shown in the above embodiments are implemented.
[0095] A portion of this application can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide the methods and / or technical solutions according to this application through the operation of the computer. Those skilled in the art will understand that the forms in which computer program instructions exist in a computer-readable medium include, but are not limited to, source files, executable files, installation package files, etc. Correspondingly, the ways in which computer program instructions are executed by a computer include, but are not limited to: the computer directly executing the instructions, or the computer compiling the instructions and then executing the corresponding compiled program, or the computer reading and executing the instructions, or the computer reading and installing the instructions and then executing the corresponding installed program. Here, the computer-readable medium can be any available computer-readable storage medium or communication medium accessible to a computer.
[0096] Although embodiments of this application have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of this application, and all such modifications and variations fall within the scope defined by the appended claims.
Claims
1. A load balancing method for the receiver side of an RDMA network, characterized in that the RDMA network receiver side includes a receive buffer and a receiver-side processing engine, and the method includes: acquiring received traffic data from the receive buffer, the traffic data consisting of multiple packets of different data streams; hashing the traffic data to obtain data linked lists corresponding to the packets of the different data streams; classifying the data linked lists according to the load status of the receiver-side processing engine; selecting the virtual function corresponding to a data linked list, performing flow-controlled round-robin fair scheduling based on the virtual function, and scheduling the corresponding target packet from the target data linked list; and transmitting the target packet to the target receiver-side processing engine based on the current load status of the receiver-side processing engine; the method further includes: after the corresponding packet is scheduled from the data linked list by the scheduling function, locking the target data linked list to ensure that only one packet of the same flow is processed by the subsequent module at a time; and after the subsequent module finishes processing, unlocking the target data linked list so that the next packet can be scheduled for processing.
2. The method according to claim 1, characterized in that, after the step of transmitting the target packet to the target receiver-side processing engine, the method further includes: locking the target data linked list, and releasing the target data linked list after the target receiver-side processing engine has finished processing the target packet.
3. The method according to claim 2, characterized in that the step of classifying the data linked lists according to the load status of the receiver-side processing engine includes: classifying the data linked lists by physical function or virtual function using a load balancing algorithm, so as to enable fair scheduling of traffic among the physical or virtual functions.
4. The method according to claim 3, characterized in that the step of performing flow-controlled round-robin fair scheduling based on the virtual function and scheduling the corresponding target packet from the target data linked list includes: when traffic congestion occurs in a scheduling function, pausing that scheduling function so that it stops scheduling packets from the data linked list.
5. A load balancing device for the receiver side of an RDMA network, characterized in that the RDMA network receiver side includes a receive buffer and a receiver-side processing engine, and the device includes: an acquisition module, used to acquire received traffic data from the receive buffer, the traffic data consisting of multiple packets of different data streams; a hash module, used to hash the traffic data to obtain data linked lists corresponding to the packets of the different data streams; an allocation module, used to classify the data linked lists according to the load status of the receiver-side processing engine; a scheduling module, used to select the virtual function corresponding to a data linked list, perform flow-controlled round-robin fair scheduling based on the virtual function, and schedule the corresponding target packet from the target data linked list; and a processing module, used to transmit the target packet to the target receiver-side processing engine based on the current load status of the receiver-side processing engine; the scheduling module is further used to: after the corresponding packet is scheduled from the data linked list by the scheduling function, lock the target data linked list to ensure that only one packet of the same flow is processed by the subsequent module at a time; and after the subsequent module finishes processing, unlock the target data linked list so that the next packet can be scheduled for processing.
6. The device according to claim 5, characterized in that the device further includes: a locking module, used to lock the target data linked list and to release the target data linked list after the target receiver-side processing engine has finished processing the target packet.
7. The device according to claim 6, characterized in that the scheduling module is further used to: when traffic congestion occurs in a scheduling function, pause that scheduling function so that it stops scheduling packets from the data linked list.
8. A computer device, characterized in that it includes: a memory and a processor that are communicatively connected, wherein the memory stores computer instructions, and the processor executes the computer instructions to perform the RDMA network receiver-side load balancing method according to any one of claims 1 to 4.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing a computer to perform the RDMA network receiver-side load balancing method according to any one of claims 1 to 4.
10. A computer program product, characterized in that it includes computer instructions for causing a computer to perform the RDMA network receiver-side load balancing method according to any one of claims 1 to 4.