Congestion control method, apparatus, device and medium for data center networks

By detecting congestion and employing differentiated rate control strategies in cross-domain multi-computing cluster networks, the problem of insufficient network performance in existing technologies is solved, thereby improving network resource utilization and large model training efficiency.

CN120378361B · Active · Publication Date: 2026-06-26 · BEIJING JILIU TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING JILIU TECH CO LTD
Filing Date
2025-05-29
Publication Date
2026-06-26


Abstract

This application provides a congestion control method, apparatus, device, and medium for a data center network. The method is applied to a first node in a first intelligent computing cluster in a multi-intelligent computing cluster network. The first node is communicatively connected to a second node, which is located in the first intelligent computing cluster or in a second intelligent computing cluster different from the first intelligent computing cluster, and the first node is used to transmit target data to the second node. When an explicit congestion notification packet returned by the second node is detected, the traffic type of the currently congested target data stream is determined from that packet, and when the traffic type is cross-intelligent computing cluster traffic, the congestion location of the target data stream is determined. The congestion level and transmission progress of the target data stream are determined from the information in historical explicit congestion notification packets. The transmission rate of the target data stream is then reduced according to the traffic type, the congestion location, the congestion level, and the transmission progress. The application effectively solves the problems in the related art.

Description

Technical Field

[0001] This application relates to the field of communication technology, and in particular to a congestion control method, apparatus, device and medium for data center networks.

Background Technology

[0002] Intelligent computing refers to computing specifically designed for artificial intelligence tasks, such as deep learning model training, large-scale data analysis, natural language processing, and computer vision. Clustering refers to connecting multiple independent computers (called nodes) through a high-speed network to work together as a whole, providing more powerful computing capabilities, storage capabilities, or higher availability than a single computer. Intelligent computing clusters are computer clusters specifically built to meet the needs of large-scale artificial intelligence computing. They typically have the following characteristics: (1) Powerful parallel computing capabilities: extensive use of graphics processors, tensor processors, or other AI acceleration chips; (2) High-speed interconnection network: extremely high bandwidth and extremely low latency network connections are required between nodes to efficiently transmit massive amounts of training data and model parameters; (3) High-performance storage: storage systems capable of quickly reading and writing large files are required to match the data consumption speed of computing nodes; (4) Optimized software stack: running specialized AI frameworks, libraries, and cluster management software. Cross-domain multi-intelligent computing clusters are a whole system composed of multiple intelligent computing clusters distributed in different geographical locations or network domains, and these clusters need to communicate and work together.

[0003] Data Center Quantized Congestion Notification (DCQCN) is a congestion control mechanism specifically designed for Remote Direct Memory Access (RDMA) based networks. Its goal is to prevent congestion and maximize throughput in the high-speed, low-latency networks of modern data centers. High Precision Congestion Control (HPCC) is a congestion control algorithm designed to achieve extremely low latency (near-zero queuing), high bandwidth utilization, and fast convergence, typically targeting high-performance data center networks. RTT-based congestion control is a general term for a class of congestion control algorithms that primarily infer network congestion by measuring and analyzing the round-trip time (RTT) of data packets, rather than relying on explicit congestion notification (ECN) or in-band telemetry.

[0004] Existing network congestion control algorithms (such as the DCQCN, HPCC, and RTT-based algorithms mentioned above) have shortcomings when applied to cross-domain multi-intelligent computing cluster networks. The main problems include: (1) Adaptability issues: Although DCQCN is more hardware-adaptable than HPCC and RTT-based algorithms (no special function modifications are required for network cards and switches), existing DCQCN implementations (such as those on Nvidia network cards) cannot adapt well to the traffic patterns of intelligent computing cluster networks, so network performance is not fully utilized; (2) Cross-domain characteristics not considered: Existing network congestion control algorithms are not configured or optimized for the long link latency and insufficient link bandwidth caused by cross-domain connections; (3) Low network utilization: In the cross-domain multi-intelligent computing cluster network environment, existing congestion control algorithms achieve insufficient network utilization, which affects the efficiency of tasks such as large model training.

Summary of the Invention

[0005] This application provides a method, apparatus, device, and medium for congestion control in data center networks, which can effectively solve the problems existing in the prior art.

[0006] This application provides a congestion control method for a data center network, applied to a first node in a first intelligent computing cluster in a multi-intelligent computing cluster network. The first node is communicatively connected to a second node, which is located within the first intelligent computing cluster or in a second intelligent computing cluster different from the first intelligent computing cluster. The intelligent computing clusters are located in a data center. The first node is used to transmit target data to the second node. The network congestion control method includes:

[0007] When an explicit congestion notification packet returned by the second node is detected, the traffic type of the target data stream currently experiencing congestion is determined based on the explicit congestion notification packet. If the traffic type of the target data stream is cross-computing cluster traffic, the congestion location of the target data stream is determined. The target data stream is any one of at least one data stream included in the target data.

[0008] Based on the information received from historical explicit congestion notification packets, determine the congestion level and transmission progress of the target data stream;

[0009] The transmission rate of the target data stream is reduced based on the traffic type of the target data stream, the location of the congestion, the degree of congestion, and the transmission progress.

[0010] According to the congestion control method provided in this application, the traffic type of the target data stream includes cross-computing cluster traffic and intra-computing cluster traffic, and the congestion location includes the link connecting the first computing cluster and the second computing cluster and the internal network of the first computing cluster; the step of reducing the transmission rate of the target data stream according to the traffic type of the target data stream, the congestion location, the congestion degree, and the transmission progress includes:

[0011] If the traffic type of the target data stream is cross-computing cluster traffic, and the congestion location is the link connecting the first intelligent computing cluster and the second intelligent computing cluster, a first multiplicative rate reduction factor is determined according to the congestion level and the transmission progress, a first rate is determined according to the first multiplicative rate reduction factor, and the transmission rate of the target data stream is controlled according to the first rate.

[0012] If the traffic type of the target data stream is cross-computing cluster traffic, and the congestion location is the internal network of the computing cluster, a second multiplicative rate reduction factor is determined according to the congestion level and the transmission progress, a second rate is determined according to the second multiplicative rate reduction factor, and the transmission rate of the target data stream is controlled according to the second rate.

[0013] If the traffic type of the target data stream is intra-computing cluster traffic, a third multiplicative rate reduction factor is determined according to the congestion level and the transmission progress, a third rate is determined according to the third multiplicative rate reduction factor, and the transmission rate of the target data stream is controlled according to the third rate.
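
The three cases above can be illustrated with a minimal Python sketch. It only shows how a factor is selected from the traffic type and congestion location. The source does not give the factor formulas at this point, so the factor functions are passed in by the caller, and the constants in `placeholder_fns` are purely illustrative assumptions. The sketch also assumes that the reduced rate is the current rate multiplied by a factor in (0, 1]. All names (`TrafficType`, `CongestionLocation`, `select_case`, `reduce_rate`) are hypothetical.

```python
from enum import Enum


class TrafficType(Enum):
    CROSS_CLUSTER = "cross_cluster"
    INTRA_CLUSTER = "intra_cluster"


class CongestionLocation(Enum):
    INTER_CLUSTER_LINK = "inter_cluster_link"
    INTERNAL_NETWORK = "internal_network"


def select_case(traffic_type, location=None):
    """Pick which of the three reduction factors applies."""
    if traffic_type is TrafficType.INTRA_CLUSTER:
        # The congestion location is not evaluated for intra-cluster traffic.
        return "third"
    if location is CongestionLocation.INTER_CLUSTER_LINK:
        return "first"
    if location is CongestionLocation.INTERNAL_NETWORK:
        return "second"
    raise ValueError("cross-cluster traffic needs a congestion location")


def reduce_rate(current_rate, traffic_type, location, congestion_level,
                progress, factor_fns):
    """Return the reduced rate (assumed to be current_rate * factor)."""
    case = select_case(traffic_type, location)
    factor = factor_fns[case](congestion_level, progress)
    if not 0.0 < factor <= 1.0:
        raise ValueError("reduction factor must lie in (0, 1]")
    return current_rate * factor


# Placeholder factor functions: NOT from the source, purely illustrative.
placeholder_fns = {
    "first": lambda level, progress: 0.5,
    "second": lambda level, progress: 0.75,
    "third": lambda level, progress: 0.9,
}
```

Keeping the factor functions pluggable separates the differentiated dispatch described in the text from the formulas themselves, which are left open in this sketch.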

[0014] According to the congestion control method provided in this application, after reducing the transmission rate of the target data stream, the method further includes:

[0015] When a speed-up event is detected for the first time, the transmission rate of the target data stream is controlled to enter a fast recovery mode. In the fast recovery mode, the transmission rate approaches the target rate logarithmically. The target rate is the transmission rate of the target data stream before that rate was reduced.

[0016] When the number of detected speed-up events meets the first preset condition, the transmission rate of the target data stream is controlled to switch from the fast recovery mode to the aggressive increase mode. In the aggressive increase mode, the transmission rate approaches the target rate in increments of a first preset constant value.

[0017] When the number of detected speed-up events meets the second preset condition, the transmission rate of the target data stream is controlled to switch from the aggressive increase mode to the super aggressive increase mode. In the super aggressive increase mode, the transmission rate approaches the target rate in increments of a second preset constant value, where the second preset constant value is greater than the first preset constant value.

[0018] According to the congestion control method provided in this application, the method further includes:

[0019] In any of the fast recovery mode, the aggressive increase mode, and the super aggressive increase mode, if an explicit congestion notification packet returned by the second node is detected, the step of reducing the transmission rate of the target data stream is re-executed.
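
The three increase modes and the fallback on a new congestion notification can be sketched as a small state machine. This is a rough illustration under stated assumptions, not the claimed implementation: the two preset conditions are modelled as event-count thresholds (`fast_events`, `aggressive_events`), fast recovery is modelled by halving the remaining gap to the target rate on each event, the additive modes are capped at the target rate, and all names and default values are hypothetical.

```python
class RateIncreaser:
    """Sender-side rate increase after a reduction (illustrative sketch)."""

    def __init__(self, target_rate, current_rate, fast_events=5,
                 aggressive_events=10, step=5.0, super_step=50.0):
        if super_step <= step:
            raise ValueError("super_step must exceed step")
        self.target_rate = target_rate      # rate before the reduction
        self.rate = current_rate            # rate after the reduction
        self.fast_events = fast_events              # assumed first condition
        self.aggressive_events = aggressive_events  # assumed second condition
        self.step = step
        self.super_step = super_step
        self.events = 0
        self.mode = "reduced"

    def on_speed_up_event(self):
        """Advance the mode if needed and raise the rate once."""
        self.events += 1
        if self.events > self.aggressive_events:
            self.mode = "super_aggressive"
        elif self.events > self.fast_events:
            self.mode = "aggressive"
        else:
            self.mode = "fast_recovery"

        if self.mode == "fast_recovery":
            # Assumption: close half of the remaining gap per event.
            self.rate += (self.target_rate - self.rate) / 2
        elif self.mode == "aggressive":
            self.rate = min(self.target_rate, self.rate + self.step)
        else:
            self.rate = min(self.target_rate, self.rate + self.super_step)
        return self.rate

    def on_cnp(self, reduced_rate):
        """A new CNP arrived: record the pre-reduction rate and start over."""
        self.target_rate = self.rate
        self.rate = reduced_rate
        self.events = 0
        self.mode = "reduced"
```

For example, with `fast_events=2` and `aggressive_events=4`, the first two speed-up events run in fast recovery, the next two in aggressive increase, and later events in super aggressive increase, until a congestion notification resets the cycle.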

[0020] According to the congestion control method provided in this application, the acceleration event is generated when the following rules are met:

[0021] No explicit congestion notification packet was detected within a preset time interval, and a preset amount of data (in bytes) in the target data stream was successfully sent.
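
The rule for generating a speed-up event can be sketched as a per-stream detector. The sketch assumes that both the quiet interval and the byte count are measured from the most recent congestion notification or the most recent speed-up event, whichever is later. The source does not spell this out, and the names `SpeedUpDetector`, `interval_us`, and `byte_threshold` are hypothetical.

```python
class SpeedUpDetector:
    """Emit a speed-up event when the stream has been CNP-free for
    interval_us and at least byte_threshold bytes were sent (sketch)."""

    def __init__(self, interval_us, byte_threshold, start_us=0):
        self.interval_us = interval_us
        self.byte_threshold = byte_threshold
        self.window_start = start_us
        self.bytes_sent = 0

    def on_cnp(self, now_us):
        """A CNP restarts both the quiet interval and the byte count."""
        self.window_start = now_us
        self.bytes_sent = 0

    def on_bytes_sent(self, now_us, nbytes):
        """Record successfully sent bytes; return True on a speed-up event."""
        self.bytes_sent += nbytes
        quiet = now_us - self.window_start >= self.interval_us
        enough = self.bytes_sent >= self.byte_threshold
        if quiet and enough:
            # Start counting toward the next event.
            self.window_start = now_us
            self.bytes_sent = 0
            return True
        return False
```

Requiring both conditions means an idle stream does not speed up merely because time passed, and a busy stream does not speed up while notifications are still arriving.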

[0022] According to the network congestion control method provided in this application, the first node communicates with the second node through a switch, and the congestion control strategy adopted in the switch is as follows:

[0023] Upon receiving any data packet from the target data stream, determine the length of the queue at the switch's egress end;

[0024] If the length of the queue is less than a preset minimum threshold, the data packet is not explicitly marked, and the data packet is sent to the second node;

[0025] If the length of the queue is greater than or equal to the preset minimum threshold and less than or equal to the preset maximum threshold, the data packet is explicitly marked according to the marking probability matching the length of the queue, and the processed data packet is sent to the second node;

[0026] If the length of the queue at the egress end of the switch is greater than the preset maximum threshold, the data packet is explicitly marked according to the marking probability of 1, and the marked data packet is sent to the second node.

[0027] According to the congestion control method provided in this application, the congestion control strategy adopted in the second node is as follows:

[0028] Upon first detection of a data packet carrying the explicit tag, an explicit congestion notification packet is generated for the target data stream containing the data packet, and the explicit congestion notification packet is returned to the first node;

[0029] After a preset time interval, it is determined whether at least one data packet carrying the explicit tag of the target data stream is received within the preset time interval. If the result is yes, an explicit congestion notification packet is generated for the target data stream, and the explicit congestion notification packet is returned to the first node.

[0030] This application also provides a congestion control device for a data center network, configured in a first node of a first intelligent computing cluster in a multi-intelligent computing cluster network. The first node is communicatively connected to a second node, which is located within the first intelligent computing cluster or in a second intelligent computing cluster different from the first intelligent computing cluster. The intelligent computing clusters are located in a data center. The first node is used to transmit target data to the second node. The network congestion control device includes:

[0031] The first determining module is used to determine, when an explicit congestion notification packet returned by the second node is detected, the traffic type of the target data stream currently experiencing congestion based on the explicit congestion notification packet, and to determine the congestion location of the target data stream when the traffic type of the target data stream is cross-computing cluster traffic, wherein the target data stream is any one of at least one data stream included in the target data.

[0032] The second determining module is used to determine the congestion level and transmission progress of the target data stream based on the information of the received historical explicit congestion notification packets.

[0033] The speed reduction module is used to reduce the transmission rate of the target data stream based on the traffic type of the target data stream, the congestion location, the congestion degree, and the transmission progress.

[0034] This application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement a congestion control method for a data center network as described above.

[0035] This application also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements a congestion control method for a data center network as described above.

[0036] This application also provides a computer program product, including a computer program that, when executed by a processor, implements a congestion control method for a data center network as described above.

[0037] In the congestion control method for a data center network according to this application, when an explicit congestion notification packet returned by the second node is detected, the traffic type of the target data stream currently experiencing congestion is determined based on the explicit congestion notification packet. If the traffic type of the target data stream is cross-computing cluster traffic, the congestion location of the target data stream is determined. The target data stream is any one of at least one data stream included in the target data. Next, based on information from previously received historical explicit congestion notification packets, the congestion level and transmission progress of the target data stream are determined. Finally, based on the traffic type, congestion location, congestion level, and transmission progress of the target data stream, the transmission rate of the target data stream is reduced. When the target data stream is cross-computing cluster traffic, this application further subdivides the congestion location of the target data stream and controls the rate accordingly. This can effectively address the long latency and insufficient bandwidth caused by cross-domain connections, better adapt to the traffic patterns of the computing cluster network, fully utilize network performance, and overcome the insufficient network utilization of existing congestion control algorithms in cross-domain multi-computing cluster network environments. This can significantly improve the efficiency of tasks such as large model training based on computing clusters.

Attached Figure Description

[0038] To more clearly illustrate the technical solutions in this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0039] Figure 1 is a schematic diagram of a communication architecture according to an embodiment of this application;

[0040] Figure 2 is a flowchart illustrating a congestion control method for a data center network according to an embodiment of this application;

[0041] Figure 3 is a schematic diagram of a switch packet marking algorithm according to an embodiment of this application;

[0042] Figure 4 is a schematic diagram of a state machine of a receiving end according to an embodiment of this application;

[0043] Figure 5 is a schematic diagram of a state machine of a transmitting end according to an embodiment of this application;

[0044] Figure 6 is a schematic diagram illustrating a deceleration control principle according to an embodiment of this application;

[0045] Figure 7 is a schematic diagram illustrating an acceleration process according to an embodiment of this application;

[0046] Figure 8 is a schematic diagram of the parameters of the key mathematical formulas in the LLM-CC algorithm according to an embodiment of this application;

[0047] Figure 9 is a schematic diagram of the control logic of the LLM-CC algorithm according to an embodiment of this application;

[0048] Figure 10 is a structural block diagram of a congestion control device for a data center network according to an embodiment of this application;

[0049] Figure 11 is a schematic diagram of the physical structure of an electronic device according to an embodiment of this application.

Detailed Implementation

[0050] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0051] To address the problems in existing technologies, this application proposes a novel congestion control algorithm, an improvement upon the DCQCN algorithm, named LLM-CC. The LLM-CC protocol is implemented entirely on the network interface card (NIC), primarily modifying the protocol stacks of the NIC and the switch. The LLM-CC algorithm consists of corresponding algorithms for the sender NIC, the switch, and the receiver NIC. Specifically, the sender corresponds to the Reaction Point (RP), the switch corresponds to the Congestion Point (CP), and the receiver corresponds to the Notification Point (NP), as shown in Figure 1.

[0052] In this application, the sender refers to a specific computing node within the intelligent computing cluster (specifically, the network interface card (NIC) responsible for data transmission on that node and the protocol stack it runs). An intelligent computing cluster is a collection of many computing nodes (typically including CPUs, GPUs, etc.), high-speed interconnect network devices (switches, NICs), and storage systems. It is a computing environment or infrastructure. The entity that executes the sender logic (RP function) is the NIC of a single node together with its associated drivers and software. For example, in intra-cluster communication (Intra-DC), the first node A (as the sender RP) in the first intelligent computing cluster sends data to the second node B (as the receiver NP) in the same cluster. As another example, in inter-cluster communication (Inter-DC), the first node A (as the sender RP) in the first intelligent computing cluster sends data to the second node C (as the receiver NP) in the second intelligent computing cluster.

[0053] The network congestion control method of this application is applied to a first node in a first intelligent computing cluster in a multi-intelligent computing cluster network. The first node is communicatively connected to a second node and is used to transmit target data to the second node. The second node is located within the first intelligent computing cluster or in a second intelligent computing cluster different from the first intelligent computing cluster. The first node acts as the sender (RP), and the second node acts as the receiver (NP).

[0054] Figure 2 is a flowchart illustrating a congestion control method for a data center network according to an embodiment of this application. Referring to Figure 2, the method of this application includes:

[0055] Step 201: When an explicit congestion notification packet (CNP) returned by the second node is detected, the traffic type of the target data stream currently experiencing congestion is determined according to the explicit congestion notification packet. If the traffic type of the target data stream is cross-computing cluster traffic, the congestion location of the target data stream is determined. The target data stream is any one of the at least one data stream included in the target data.

[0056] Step 202: Determine the congestion level and transmission progress of the target data stream based on the information from the received historical explicit congestion notification packets.

[0057] Step 203: Reduce the transmission rate of the target data stream based on its traffic type, congestion location, congestion level, and transmission progress.

[0058] Before describing the control strategy of the transmitting end in this application, the congestion control strategy (CP algorithm) used in the switch will be described in detail below. Specifically, the congestion control strategy used in the switch is as follows:

[0059] When any data packet is received from the target data stream, determine the length of the queue at the switch's egress end;

[0060] If the length of the queue is less than the preset minimum threshold, the data packet is not explicitly marked, and the data packet is sent to the second node;

[0061] If the length of the queue is greater than or equal to the preset minimum threshold and less than or equal to the preset maximum threshold, the data packet is explicitly marked according to the marking probability matching the length of the queue, and the processed data packet is sent to the second node.

[0062] If the length of the queue at the switch's egress end is greater than the preset maximum threshold, the data packet is explicitly marked with a marking probability of 1, and the marked data packet is sent to the second node.

[0063] In this application, the CP algorithm is the same as in the DCQCN algorithm. In the egress queue, if the queue length (QueueSize) exceeds a threshold, arriving packets are ECN-marked. This is accomplished through the Random Early Detection (RED) feature supported by all modern switches. The marking decision is a probabilistic function of the queue length: as shown in Figure 3, two queue length thresholds define the marking probability. When the queue length is below the lower threshold, the ECN bit is not set. When the queue length exceeds the upper threshold, all packets transmitted from that queue are ECN-marked. When the queue length is between the two thresholds, packets are ECN-marked with a probability that increases linearly with the queue length.

[0064] Specifically, the core of the CP algorithm is to use the RED mechanism, which is commonly supported by switches, to predict congestion by probabilistically marking data packets before the queue becomes completely full and packet loss occurs.

[0065] The implementation process of the CP algorithm includes the following aspects:

[0066] (1) Monitoring object: The switch monitors the real-time length of the queue (Egress Queue) of its egress port (usually referring to the number of data packets or bytes buffered in the queue).

[0067] (2) Key thresholds: The algorithm presets two key queue length thresholds, a minimum threshold and a maximum threshold. When the queue length is below the minimum threshold, the network is considered unobstructed. When the queue length exceeds the maximum threshold, the network is considered significantly congested.

[0068] (3) Marking logic (based on queue length): When a data packet arrives at the switch and is about to enter the egress queue, the switch checks the current length of the queue and decides whether to mark the data packet with ECN according to the following rules:

[0069] A: If the queue length is less than the minimum threshold, the network is unobstructed and the switch does not ECN-mark the data packet; the marking probability is 0, as shown by the segment from 0 to the minimum threshold in Figure 3.

[0070] B: When the queue length is between the minimum and maximum thresholds, congestion is in an early warning stage. The switch ECN-marks the packet with a certain probability. This probability increases linearly with the queue length, from 0 (when the queue length equals the minimum threshold) to a maximum marking probability (when the queue length equals the maximum threshold), as shown by the sloped segment in Figure 3. The maximum marking probability is usually a value close to 1, or it can be 1, depending on the specific RED configuration.

[0071] C: When the queue length exceeds the maximum threshold, the network is significantly congested. The switch deterministically (i.e., with 100% probability) ECN-marks all packets arriving in that queue; the marking probability is 1, as shown by the horizontal line after the maximum threshold in Figure 3.

[0072] In summary, the CP algorithm uses the RED mechanism with the minimum and maximum thresholds to divide the queue state into three regions: an unobstructed zone (no marking), a warning zone (probabilistic marking), and a congested zone (certain marking). Congestion is therefore detected as soon as its first signs appear (when the queue length lies between the two thresholds) instead of only after the queue is completely full and packets are dropped. Congestion control acts earlier because the receiver is notified via ECN marks and, through the receiver, the sender's sending strategy is adjusted.
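
The three marking regions described above can be written as a short Python sketch. The parameter names `k_min`, `k_max`, and `p_max` (minimum threshold, maximum threshold, and maximum marking probability in the warning zone) are assumptions chosen for illustration, as are the numeric values in the usage note. The source describes these quantities but does not name or fix them here.

```python
import random


def marking_probability(queue_len, k_min, k_max, p_max):
    """ECN marking probability for the current egress queue length."""
    if queue_len < k_min:
        return 0.0   # unobstructed zone: never mark
    if queue_len > k_max:
        return 1.0   # congested zone: always mark
    if k_max == k_min:
        return p_max  # degenerate configuration with a single threshold
    # Warning zone: linear ramp from 0 at k_min to p_max at k_max.
    return p_max * (queue_len - k_min) / (k_max - k_min)


def should_mark(queue_len, k_min, k_max, p_max, rng=random.random):
    """Draw once per arriving packet; rng() is expected to lie in [0, 1)."""
    return rng() < marking_probability(queue_len, k_min, k_max, p_max)
```

With hypothetical thresholds of 100 and 400 queue units and a maximum marking probability of 0.2, a queue length of 250 sits halfway through the warning zone, so roughly one packet in ten would be marked.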

[0073] Next, we will introduce the control strategy of the receiver NP algorithm in this application. Specifically, the control strategy (NP algorithm) of the receiver is as follows:

[0074] After the first detection of a packet carrying an explicit tag, an explicit congestion notification packet is generated for the target data stream containing the packet, and the explicit congestion notification packet is returned to the first node;

[0075] After a preset time interval, determine whether at least one data packet carrying an explicit tag has been received from the target data stream within the preset time interval. If the result is yes, generate an explicit congestion notification packet for the target data stream and return the explicit congestion notification packet to the first node.

[0076] In this application, in the NP algorithm, the receiving end NP determines that network congestion has occurred after receiving a data packet marked with ECN, and then transmits this information back to the sending end RP. The RoCEv2 standard defines the Explicit Congestion Notification Packet (CNP) for this purpose. The NP algorithm is used to specify how and when to generate CNPs.

[0077] For each data stream, the algorithm follows the state machine shown in Figure 4. If a tagged packet for a data stream arrives at the receiver and no CNP has been sent for that data stream in the last N microseconds, a CNP is sent immediately. After that, the network interface card (NIC) generates at most one CNP for that data stream every N microseconds, and only if a packet that arrived within that time window was tagged. Processing tagged packets and generating CNPs is costly, so this application minimizes the work done per tagged packet. In Figure 4, N is the configurable time window.

[0078] In this application, the NP algorithm is a key component deployed at the receiver in the LLM-CC (and its underlying DCQCN) congestion control framework. Its core objective is to monitor the ECN markers set by the network switch (CP) in the incoming data stream and, when a congestion signal is detected, to feed back congestion information to the sender (RP) of the data stream according to specific rules so that the sender can adjust its transmission rate accordingly.

[0079] The NP algorithm operates based on each independent data stream and maintains corresponding state information for each data stream. Its main implementation process revolves around the generation and transmission logic of CNP, which strictly follows the rate limiting principle. The specific steps are as follows:

[0080] (1) Congestion event detection: When the receiving end (NP) receives a data packet whose IP header or transport protocol header carries the ECN Congestion Experienced mark (an ECN-marked packet), the event is regarded as a congestion indication signal.

[0081] (2) State variables and timers: For each data stream, the NP maintains at least one state variable and one timer. The state variable is used to record whether a CNP has been sent for the data stream in the most recent time window, and the timer is used to implement the rate limit for CNP generation.

[0082] (3) CNP generation rules - first trigger:

[0083] Judgment criteria: When a data stream (e.g., the target data stream) is detected to have received a packet marked with an ECN, the NP algorithm first checks the state variables associated with the data stream to determine whether a CNP has been sent for the data stream within the most recent N microsecond time window.

[0084] Triggering action: If the determination result is negative (i.e., no CNP has been sent within N microseconds), the NP algorithm immediately generates a CNP for the data stream and sends it back to the sender (RP) of the data stream over the network. At the same time, the state variable of the stream is updated (marked as "sent") and the N microsecond timer is started (or reset).

[0085] (4) CNP generation rules - subsequent suppression and rate limiting:

[0086] Suppression Phase: After successfully sending a CNP and starting an N-microsecond timer, a suppression period begins. During these N microseconds, even if the data stream receives more ECN-tagged packets, the NP algorithm will not immediately generate a new CNP.

[0087] Periodic Check and Generation: When the N-microsecond timer expires, the NP algorithm checks whether at least one ECN-tagged packet for the data stream was received within the recently passed N-microsecond time window. If the check is yes, the NP algorithm generates at most one new CNP for the data stream and sends it to the sender, then resets the N-microsecond timer again and enters the next suppression and check cycle. If the check is no, the NP algorithm does not generate a new CNP, and the state can return to the waiting-for-first-trigger state. This mechanism ensures that for any data stream continuously experiencing congestion, the CNP sending frequency is strictly limited to no more than one per N microsecond.

[0088] (5) Parameter configuration and optimization: The time window parameter N is configurable, for example, it can be set to 50 microseconds.

[0089] In summary, the NP algorithm effectively detects and responds to network congestion signals by monitoring ECN tags, managing the state of each data stream, and implementing a strict N-microsecond time window rate limiting mechanism. This ensures that congestion information can be transmitted to the sender in a timely manner (through the first-trigger mechanism) and efficiently (through the rate limiting mechanism).
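The NP rules described above (immediate first CNP, suppression for N microseconds, at most one further CNP per window) can be sketched as a small per-flow pacer. This is a minimal illustration, not the implementation of this application: the class name `CnpPacer`, its method names, and the event-driven calling convention (the caller reports marked packets and timer expiries) are assumptions introduced here; only the window length N = 50 microseconds comes from the example value given in the text.

```python
class CnpPacer:
    """Hypothetical per-flow CNP pacing at the receiver (NP)."""

    def __init__(self, window_us=50):
        self.window_us = window_us  # N, configurable (50 us is the example value)
        self.deadline = {}          # flow id -> time at which the current window ends
        self.pending = set()        # flows that saw a marked packet during the window

    def on_marked_packet(self, flow, now_us):
        """Handle an ECN-marked packet. Returns True if a CNP is sent now."""
        if flow not in self.deadline:
            # First trigger: no CNP sent in the last N us, so send immediately
            # and start the suppression window.
            self.deadline[flow] = now_us + self.window_us
            return True
        # Suppression phase: remember the mark, but do not send a CNP yet.
        self.pending.add(flow)
        return False

    def on_timer_expiry(self, flow, now_us):
        """Handle expiry of the N-us timer. Returns True if a CNP is sent now."""
        if flow in self.pending:
            # A marked packet arrived during the window: send one CNP and
            # start the next window.
            self.pending.discard(flow)
            self.deadline[flow] = now_us + self.window_us
            return True
        # No marked packet in the window: return to the waiting state.
        self.deadline.pop(flow, None)
        return False
```

Keeping only a deadline and a one-bit "pending" flag per flow reflects the stated goal of minimizing the work done per tagged packet.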

[0090] In this application, the traffic types of the target data flow include cross-computing cluster traffic and intra-computing cluster traffic, and the congestion locations include links connecting the first and second computing clusters and the internal network of the first computing cluster. Accordingly, step 203 may include:

[0091] Step 2031: If the traffic type of the target data stream is cross-computing cluster traffic and the congestion location is the link connecting the first and second intelligent computing clusters, determine the first multiplicative rate reduction factor according to the congestion level and transmission progress, determine the first rate according to the first multiplicative rate reduction factor, and control the transmission rate of the target data stream according to the first rate.

[0092] Step 2032: If the traffic type of the target data stream is cross-computing cluster traffic and the congestion location is the internal network of the computing cluster, determine the second multiplicative rate reduction factor according to the congestion level and transmission progress, determine the second rate according to the second multiplicative rate reduction factor, and control the transmission rate of the target data stream according to the second rate.

[0093] Step 2033: If the traffic type of the target data stream is intra-computing cluster traffic, determine the third multiplicative rate reduction factor according to the congestion level and transmission progress, determine the third rate according to the third multiplicative rate reduction factor, and control the transmission rate of the target data stream according to the third rate.

[0094] The control strategy (RP algorithm) of the sending end (i.e., the first node) of this application will be described in detail below.

[0095] 1. Congestion level parameter update:

[0096] In this application, time is divided into configurable time slots, and each time slot records whether a CNP arrived within it. The congestion level parameter is a continuously updated average that represents the proportion of time slots in which a CNP arrived (if more than one CNP arrives within the same time slot, the effect is the same as if only one had arrived). At the end of each time slot, the parameter is updated by a formula in which g is a constant parameter between 0 and 1 and CNP_arrived is a bit field indicating whether a CNP arrived in the previous time slot.

[0097] In this application, the update function is a core component of the sender (RP) congestion control logic. Its main objective is to dynamically calculate and maintain a key state parameter. This parameter is designed to quantify and smoothly reflect the degree of congestion experienced on recent network paths, based on the arrival frequency of the Congestion Notification Packets (CNPs) returned by the receiver (NP). The calculated value serves as the input for the subsequent rate adjustment decisions of the RP algorithm.

[0098] The function's implementation is based on the following key mechanisms:

[0099] (1) Time discretization and periodic execution: The algorithm divides continuous time into discrete, configurable time slots. A timer in the RP state machine shown in Figure 5 defines these time slots. The function is called and executed periodically; its triggering event is the expiry of this timer, that is, the value update is computed once at the end of each time slot. Figure 5 is a schematic diagram of the state machine of a transmitting end according to an embodiment of this application.

[0100] (2) Congestion signal monitoring: In each time slot, the RP monitors whether at least one CNP arrives from the receiver. The calculation of the value depends only on whether a CNP arrived within the time slot, not on how many CNPs arrived.

[0101] (3) State variable: The RP maintains a Boolean (or bit) state variable, CNP_arrived. At the end of each time slot, before the update is executed, this variable is set according to the monitoring result for the slot that has just ended: it is set to 1 if a CNP arrived during the slot and to 0 if no CNP arrived.

[0102] (4) Exponential moving average (EMA) update: The parameter itself is an exponential moving average that reflects the proportion of recent time slots containing a CNP arrival. The function computes the new value for the current time slot from three quantities: the old value calculated at the end of the previous time slot, the pre-configured smoothing factor g, and CNP_arrived (0 or 1, as determined in the step above), which indicates whether a CNP arrived in the most recent time slot. The factor g determines the relative weight of the historical value and the current slot state: the larger g is, the greater the influence of history and the smoother the change; the smaller g is, the more sensitive the value is to recent changes.

[0103] The EMA formula combines the historical average congestion level (the old value) with the latest congestion indication (CNP_arrived). If a CNP arrived within the most recent time slot (CNP_arrived = 1), the value moves toward 1; if no CNP arrived (CNP_arrived = 0), the value moves toward 0.

[0104] Through this function, which relies on monitoring over fixed time slots and on an exponential moving average, the RP algorithm transforms discrete CNP arrival events into a continuous, smoothly varying congestion level parameter. The parameter reflects recent congestion trends and is used by the RP algorithm in subsequent, more refined rate control decisions, thereby achieving an adaptive response to network congestion conditions.
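The per-slot update can be illustrated with a short sketch. The exact formula is not legible in this text, so the form below is an assumption: it places the weight g on the historical value, which matches the statement that a larger g gives history more influence. Some DCQCN-style formulations place g on the new sample instead. The function name `update_congestion_level` is hypothetical.

```python
def update_congestion_level(level, cnp_arrived, g):
    """Return the congestion level for the next time slot (assumed EMA form).

    level       -- value computed at the end of the previous slot, in [0, 1]
    cnp_arrived -- True if at least one CNP arrived in the slot that just ended
    g           -- smoothing constant strictly between 0 and 1 (assumed to
                   weight the historical value)
    """
    if not 0.0 < g < 1.0:
        raise ValueError("g must lie strictly between 0 and 1")
    sample = 1.0 if cnp_arrived else 0.0
    # Weighted mix of history and the latest slot; several CNPs in one slot
    # count the same as a single CNP because only the flag is used.
    return g * level + (1.0 - g) * sample
```

With g = 0.5, a level of 0.5 rises to 0.75 after a slot with a CNP and falls to 0.25 after a slot without one, which shows the value moving toward 1 or 0 as described.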

[0105] 2. Rate reduction strategy.

[0106] In this application, congested traffic is first classified, based on the baseline round-trip time, into two types: traffic across intelligent computing clusters and traffic within an intelligent computing cluster. Then, for cross-DC traffic, the BTH identifier is used to distinguish congestion on the links between data centers (intelligent computing clusters) from congestion within a data center (intelligent computing cluster). A different rate reduction strategy is applied to each case.

[0107] CutRate indicates rate reduction. The CutRate rate reduction process is triggered when the sender (RP) receives a Congestion Notification Packet (CNP) sent back by the receiver (NP). The core objective of this process is to respond to detected network congestion by reducing the transmission rate to alleviate congestion. A key feature of LLM-CC is its implementation of differentiated rate reduction strategies, that is, applying different rate adjustment logic based on the inferred link type and location of the congestion.

[0108] CutRate's execution flow follows a structured decision-making process, as follows:

[0109] (1) Preliminary determination of traffic type (based on the baseline round-trip time): Distinguish whether the currently congested communication flow (queue pair, QP) is cross-computing cluster / cross-data center traffic or intra-computing cluster / intra-data center traffic. The primary criterion is the baseline round-trip time of the data stream. Comparing it with a preset threshold separates long-latency connections (usually cross-data center) from short-latency connections (usually intra-data center). This determination corresponds to the first diamond-shaped decision node in the flow of Figure 6, which assigns the traffic to one of these two types. Figure 6 is a schematic diagram illustrating a rate reduction control principle in one embodiment of this application.

[0110] (2) Congestion location subdivision (for cross-data center traffic, based on BTH): If the previous step determined the traffic type to be cross-data center traffic, the location where congestion may have occurred must be refined further. This step uses specific information in the transport protocol header of the data packet, namely the BTH identifier (see the flow of Figure 6). By parsing the BTH information, the algorithm infers whether the congestion event occurred on a long-distance link connecting different data centers (intelligent computing clusters) or within the internal network of a remote data center (intelligent computing cluster). This judgment corresponds to the second diamond-shaped decision node in Figure 6 and is executed only on the cross-data center branch.

[0111] (3) Select and execute a specific deceleration function:

[0112] Based on the congestion scenario classification completed through the above steps, the algorithm will select and call one of three independent deceleration processing functions:

[0113] The first function is called when the traffic is judged to be cross-data center traffic and the congestion is located on the link between data centers.

[0114] The second function is called when the traffic is judged to be cross-data center traffic and the congestion is located in the internal network of a data center.

[0115] The third function is called when the first step determines the traffic to be intra-data center traffic and congestion is detected.

[0116] Each function encapsulates multiplicative decrease (MD) logic tailored to its congestion scenario. This involves calculating a scenario-specific multiplicative decrease factor, which depends on the perceived congestion (i.e., the congestion level), the transmission progress ratio, and other parameters, and applying this factor to reduce the current transmission rate. The formulas or parameters used by the different processing functions can differ in order to achieve different rate reduction effects.

[0117] In conclusion, the rate reduction mechanism of LLM-CC employs a two-stage classification (first distinguishing cross-data center traffic from intra-data center traffic, then subdividing cross-data center traffic by congestion location) that maps received congestion signals to three specific congestion scenarios. For each scenario, a dedicated processing function is invoked to perform a differentiated multiplicative rate reduction. This fine-grained processing aims to respond more accurately to network congestion of different natures, alleviating congestion while minimizing unnecessary rate reductions on non-congested paths or on traffic that contributes little to congestion, thereby improving overall network performance and resource utilization.
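The two-stage classification can be sketched as follows. This is an illustrative sketch under stated assumptions: the names `Scenario`, `classify`, and `cut_rate` are hypothetical, the BTH-derived information is reduced to a single Boolean, and the three reduction functions are supplied by the caller because their formulas are not reproduced in this text.

```python
from enum import Enum


class Scenario(Enum):
    CROSS_DC_INTER_DC_LINK = "cross-cluster traffic, congestion on the link between clusters"
    CROSS_DC_REMOTE_INTERNAL = "cross-cluster traffic, congestion inside a cluster network"
    INTRA_DC = "intra-cluster traffic"


def classify(base_rtt_us, rtt_threshold_us, bth_marks_inter_dc):
    """Map a congestion signal to one of the three scenarios.

    Stage 1 compares the baseline RTT with a preset threshold (long latency is
    treated as cross-cluster). Stage 2, only on the cross-cluster branch, uses
    a flag assumed to be derived from the BTH identifier.
    """
    if base_rtt_us > rtt_threshold_us:
        if bth_marks_inter_dc:
            return Scenario.CROSS_DC_INTER_DC_LINK
        return Scenario.CROSS_DC_REMOTE_INTERNAL
    return Scenario.INTRA_DC


def cut_rate(rate, scenario, handlers):
    """Apply the scenario-specific reduction function chosen by the caller."""
    return handlers[scenario](rate)
```

For example, `classify(2000, 500, False)` selects the remote-internal scenario, and `cut_rate` then dispatches to whichever reduction function the caller registered for it.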

[0118] In this application, after step 203, the method further includes:

[0119] When a speed-up event is first detected, the transmission rate of the target data stream is controlled to enter a fast recovery mode. In fast recovery mode, the transmission rate approaches the target rate logarithmically. The target rate is the transmission rate of the target data stream before that rate was reduced.

[0120] When the number of detected speed-up events meets the first preset condition, the transmission rate of the target data stream is controlled to switch from fast recovery mode to aggressive increase mode. In aggressive increase mode, the transmission rate approaches the target rate by increasing at a first preset constant value.

[0121] When the number of detected speed-up events meets the second preset condition, the transmission rate of the target data stream is controlled to switch from aggressive increase mode to super aggressive increase mode. In super aggressive increase mode, the transmission rate approaches the target rate with an increase of the second preset constant value, which is greater than the first preset constant value.

[0122] In any of the fast recovery mode, aggressive increase mode, and super aggressive increase mode, if an explicit congestion notification packet returned by the second node is detected, the step of reducing the transmission rate of the target data stream is re-executed.

[0123] In one implementation, the speed-up event is generated when the following rule is met:

[0124] No explicit congestion notification packets were detected within a preset time interval, and the preset amount of data in the target data stream was successfully sent.

[0125] The following section details the speed-up strategy for the sending end.

[0126] In this application, the speed-up logic is divided into three sequential phases: fast recovery, aggressive increase (continued probing), and super aggressive increase (continued probing).

[0127] The transition from one phase to the next is defined by the number of speed-up events counted in that phase. Once the number of speed-up events in a phase exceeds a predefined threshold, the logic moves to the next phase. A rate reduction event resets all speed-up counters and returns the logic to the fast recovery phase. In addition, whenever a rate reduction occurs, the rate in effect before the reduction is saved in a register as the target rate. A speed-up event occurs if, since the last speed-up, no rate reduction event has occurred during a predefined time interval or while a predefined number of bytes was sent. During the fast recovery phase, each speed-up event increases the rate by half the remaining distance to the target rate (that is, the rate approaches the target logarithmically). This allows a rapid recovery toward the rate at which congestion occurred at the start of the fast recovery phase, followed by a more cautious increase as the congestion-inducing rate is approached. In the latter two phases, the rate increases by a constant value on each speed-up event, which allows throughput to grow when bandwidth is released.

[0128] Specifically, in this application, the speed-up logic of the algorithm is activated when the sending end has performed a rate reduction operation and has received no congestion notification packet during the subsequent observation period. Its overall goal is to cautiously increase the transmission rate after confirming that network congestion has been alleviated or eliminated, in order to probe and use potentially available network bandwidth. The speed-up process is not continuous; it is driven by discrete speed-up events.

[0129] In this application, a speed-up event is defined as follows: since the last rate adjustment (whether speed-up or rate reduction), either of the following conditions is met while no congestion notification packet has been received during this period: 1) a predefined time interval has elapsed (see the flow of Figure 7); 2) a predefined number of bytes has been successfully sent (see the flow of Figure 7). Figure 7 is a schematic diagram illustrating a speed-up process according to an embodiment of this application. In this diagram, T counts the cumulative number of detected speed-up events, and BC counts the cumulative number of times a predefined number of bytes has been successfully sent.

[0130] The ramp-up process of LLM-CC is designed as three sequentially executed phases to balance rate recovery speed and network stability:

[0131] Phase 1: Fast recovery.

[0132] Entry condition: This is the default initial phase, which is entered when a speed-up event is triggered for the first time after each rate reduction event.

[0133] Core objective: To quickly restore the transmission rate to near its pre-congestion level. The algorithm uses a state variable, the target rate, which stores the sending rate in effect before the most recent rate reduction operation.

[0134] Rate update mechanism: During this phase, each time a speed-up event occurs, the current transmission rate (Rcur, or RC) is moved halfway toward the target rate RT, that is, $RC \leftarrow (RC + RT)/2$. The rate therefore approaches the target rate logarithmically: the initial increment is large, and as the rate nears the target rate the increment gradually decreases, giving rapid growth at first and cautious growth later.

[0135] Phase 2: Aggressive increase (continued probing).

[0136] Entry condition: The algorithm logic migrates to this phase when the cumulative number of speed-up events during the fast recovery phase meets the first preset condition, as shown in Figure 7.

[0137] Core objective: To linearly probe available bandwidth in a stable and conservative manner once the rate has recovered to a certain level.

[0138] Rate update mechanism: During this phase, each time a speed-up event occurs, a fixed, smaller constant value is added to the current sending rate.

[0139] Phase 3: Super aggressive increase (continued probing).

[0140] Entry condition: The algorithm logic migrates to this phase when the cumulative number of speed-up events during the aggressive increase phase meets the second preset condition, as shown in Figure 7.

[0141] Core objective: To proactively probe for potential higher bandwidth at a faster rate while maintaining stable network performance.

[0142] Rate update mechanism: During this phase, each time a speed-up event occurs, a fixed, relatively large constant value is added to the current sending rate.

[0143] Phase transition and reset mechanism:

[0144] Phase transition: The condition for moving from one phase to the next higher phase is determined by the cumulative number of consecutive speed-up events and the cumulative number of times the predefined number of bytes has been successfully sent within that phase. The first preset condition governs the move from phase 1 to phase 2, and the second preset condition governs the move from phase 2 to phase 3.

[0145] Global reset: At any time, once the sender (RP) receives a CNP (i.e., a rate reduction event occurs), the entire three-phase speed-up logic is immediately interrupted and reset. All speed-up state (including the current phase and the accumulated event counters T and BC) is restored to its initial value. The next time the speed-up event condition is met, execution restarts from phase 1, fast recovery.

[0146] In summary, the speed-up mechanism of LLM-CC raises the rate through a structured three-phase process (fast recovery, aggressive increase, and super aggressive increase). It uses the target rate for a rapid initial recovery, followed by continued bandwidth probing with two additive increments of different magnitudes. The transition between phases is controlled by the cumulative number of uninterrupted speed-up events, while any congestion signal (CNP) triggers an immediate reset, ensuring a rapid response and adaptability to changes in network congestion during the speed-up process. This design aims to use network bandwidth efficiently while keeping the probing process stable.
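The three-phase logic and the global reset can be sketched as a small state machine. This is a simplified illustration, not the method of this application: the class `SpeedUp`, the thresholds, and the step sizes are hypothetical values; the two counters T and BC are merged into one event counter; a phase advances when the counter reaches its threshold; and the constant increment is added directly to the current rate.

```python
class SpeedUp:
    """Hypothetical sender-side (RP) speed-up state machine."""

    def __init__(self, rate, first_threshold=5, second_threshold=5,
                 small_step=5.0, large_step=50.0):
        self.rate = rate                  # current sending rate (RC)
        self.target = rate                # target rate (RT), saved before a reduction
        self.phase = "fast_recovery"
        self.events = 0                   # speed-up events counted in this phase
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold
        self.small_step = small_step      # constant for aggressive increase
        self.large_step = large_step      # larger constant for super aggressive increase

    def on_cnp(self, reduced_rate):
        """Rate reduction event: save the old rate and reset all speed-up state."""
        self.target = self.rate
        self.rate = reduced_rate
        self.phase = "fast_recovery"
        self.events = 0

    def on_speed_up_event(self):
        """Apply one speed-up event and advance the phase if its threshold is met."""
        if self.phase == "fast_recovery":
            # Move halfway toward the target rate (logarithmic approach).
            self.rate = (self.rate + self.target) / 2
        elif self.phase == "aggressive":
            self.rate += self.small_step
        else:
            self.rate += self.large_step
        self.events += 1
        if self.phase == "fast_recovery" and self.events >= self.first_threshold:
            self.phase, self.events = "aggressive", 0
        elif self.phase == "aggressive" and self.events >= self.second_threshold:
            self.phase, self.events = "super_aggressive", 0
        return self.rate
```

Starting from a rate of 100 cut to 50, two fast-recovery events give 75 and then 87.5; later events add the small and then the large constant, and any CNP returns the machine to fast recovery.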

[0147] Next, the meaning of each key mathematical formula in the LLM-CC algorithm is explained in detail with reference to Figure 8. Figure 8 is a schematic diagram of the parameters of the key mathematical formulas in the LLM-CC algorithm according to an embodiment of this application.

[0148] (1) Transmission progress ratio. The transmission progress is the ratio of the amount of data already sent in the current data transmission task to the total amount of data that needs to be sent. For example, if a large file is 100 MB in total and 30 MB has already been sent, the ratio is 0.3 (or 30%). The ratio is used to adjust the intensity of the rate reduction or speed-up.

[0149] (2) .

[0150] This formula calculates the multiplicative reduction factor and uses it to update the current rate. The congestion term in the formula is the recent level of congestion perceived by the sending end, as described earlier.

[0151] (3) .

[0152] This formula calculates the multiplicative reduction factor and uses it to update the current rate.

[0153] (4) .

[0154] When using this formula, .

[0155] (5) .

[0156] In this formula, RC is the current rate and RT is the target rate.

[0157] (6) .

[0158] This formula involves the additive increase adjustment factor, the target rate, and the current actual rate; θ is a growth factor used to fine-tune the increment step size. During the aggressive increase phase, the target rate RT itself increases by an amount equal to a base, smaller increment step multiplied by the adjustment factor.

[0159] (7) .

[0160] This formula involves the additive increase adjustment factor, the target rate, and the current actual rate. During the super aggressive increase phase, the target rate RT increases by an amount equal to a base, relatively large increment step multiplied by the adjustment factor.

[0161] Next, the core control logic of the LLM-CC algorithm is explained in detail with reference to Figure 9. Figure 9 is a schematic diagram of the control logic of the LLM-CC algorithm according to an embodiment of this application.

[0162] The pseudocode in Figure 9 defines the core logic of the LLM-CC congestion control algorithm at the sender (RP). It integrates two main functions: congestion control (rate adjustment) and priority scheduling. The algorithm is implemented through three independent functions: one for congestion control on cross-data center connections, one for congestion control on intra-data center connections, and one for priority scheduling based on application characteristics (such as AI training). The symbols in Figure 9 denote, respectively, the dynamic adjustment factors calculated for the multiplicative decrease and the additive increase, a smoothed average reflecting recent congestion levels, a factor used to adjust the additive increase, the current transmission rate RC, and the base increase step sizes for the aggressive and super aggressive increase phases.

[0163] The first function (lines 1-11) handles the congestion control logic for cross-data center queue pairs, that is, rate adjustment for long-distance, high-latency connections. The execution flow is as follows:

[0164] Line 3: Based on the ECN flag bit of the received data packet and the base transport header (BTH) information, determine the current congestion type. This corresponds to the step of the CutRate process that subdivides the congestion location of cross-data center traffic.

[0165] Line 4: Calculate the completion ratio (transmission progress) of the current transmission task.

[0166] Lines 5-6 (congestion between data centers): If congestion is determined to occur on a link between data centers, calculate the multiplicative rate reduction factor and apply the multiplicative rate reduction: the current rate is reduced according to the calculated factor and the perceived congestion level.

[0167] Lines 7-8 (congestion inside a remote data center): If congestion is determined to occur on an internal link within a remote data center, calculate the multiplicative rate reduction factor and apply the multiplicative rate reduction.

[0168] Lines 9-11 (no congestion): If no congestion is determined, calculate the additive increase factor; a note in the pseudocode indicates that this factor is primarily determined by the ratio. Then apply the additive increase: the current rate Rcur increases by an amount determined by the base increase step size and the adjustment factor. This corresponds to the operation in the aggressive increase or super aggressive increase phase of the speed-up logic.

[0169] The second function (lines 12-21) handles the congestion control logic for intra-data center queue pairs, that is, rate adjustment for short-distance, low-latency connections. The execution flow is as follows:

[0170] Line 14: Get the congestion type (usually congestion or no congestion).

[0171] Line 15: Calculate the transmission progress.

[0172] Lines 16-18 (no congestion): If no congestion is determined, calculate the additive increase factor and apply the additive increase.

[0173] Lines 19-21 (congestion): If congestion is detected, calculate the multiplicative rate reduction factor and apply the multiplicative rate reduction.

[0174] The third function (lines 22-27) is independent of congestion control and prioritizes the data packets about to be sent based on the characteristics of the application (especially AI model training). The execution flow is as follows:

[0175] Line 24: Obtain the current communication round (CommRound) information from the application layer (Model_Framework).

[0176] Line 25: Calculate a priority value based on the communication round CommRound. A modulo operation is used here, which indicates that priorities may change periodically.

[0177] Line 26: Write the calculated priority value (Prior) into a specific field of the packet header (e.g., the DSCP field in the IP header).

[0178] Line 27: Place this packet with the priority tag into the corresponding hardware send queue on the network interface card (NIC). The network interface card (NIC) hardware then determines the actual transmission order of data packets based on queue priority. The purpose is to achieve fine-grained scheduling of specific data streams (such as different data blocks in AI training), prioritize the transmission of critical data, and ensure the synchronization and efficiency of the application layer (such as model training).
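The priority scheduling steps can be illustrated with a short sketch. The modulus is not given in this text, so the number of priority levels (`num_levels`) is an assumption, and the function names are hypothetical. The one fixed fact used here is that the DSCP value occupies the upper six bits of the IP header's differentiated services byte, with the two ECN bits below it.

```python
def priority_for_round(comm_round, num_levels=8):
    """Derive a priority from the communication round with a modulo operation.

    num_levels is an assumed parameter; priorities repeat every num_levels rounds.
    """
    if num_levels <= 0:
        raise ValueError("num_levels must be positive")
    return comm_round % num_levels


def ds_byte(dscp, ecn=0):
    """Pack a 6-bit DSCP value and a 2-bit ECN value into the DS byte."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP is a 6-bit value")
    if not 0 <= ecn <= 0x3:
        raise ValueError("ECN is a 2-bit value")
    return (dscp << 2) | ecn


def enqueue(queues, packet, comm_round, num_levels=8):
    """Tag the packet and place it in the send queue matching its priority."""
    prior = priority_for_round(comm_round, num_levels)
    packet["ds_byte"] = ds_byte(prior)
    queues.setdefault(prior, []).append(packet)
    return prior
```

Here the queues are a plain dictionary standing in for the NIC hardware send queues; in practice the NIC, not software, decides the transmission order across them.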

[0179] In conclusion, Figure 9 describes LLM-CC as an integrated sender-side algorithm framework. Through its two congestion control functions, it implements differentiated congestion perception and rate adjustment (including multiplicative decrease and additive increase) based on connection type and congestion location. At the same time, its priority scheduling function introduces an application-aware priority scheduling mechanism, allocating data packets to different sending queues according to their importance in a specific application (such as an AI training round). The combination of these two aspects aims to comprehensively optimize end-to-end transmission performance for high-performance applications in complex network environments, especially across multi-domain computing clusters.

[0180] When the target data stream is cross-computing cluster traffic, this application further subdivides the congestion location of the target data stream and controls it accordingly. This can effectively address the long latency and insufficient bandwidth caused by cross-domain connections, better adapt to the traffic patterns of intelligent computing cluster networks, fully utilize network performance, and overcome the problem of insufficient network utilization in existing congestion control algorithms in cross-domain multi-computing cluster network environments. It can significantly improve the efficiency of tasks such as large model training based on intelligent computing clusters.

[0181] The congestion control device for the data center network provided in this application is described below. The congestion control device described below can be referred to in correspondence with the congestion control method described above.

[0182] Figure 10 is a structural block diagram of a congestion control device for a data center network according to an embodiment of this application. Referring to Figure 10, the congestion control device 1000 of this application is deployed on a first node in a first intelligent computing cluster in a multi-intelligent computing cluster network. The first node is communicatively connected to a second node. The second node is located within the first intelligent computing cluster or within a second intelligent computing cluster different from the first intelligent computing cluster. The intelligent computing cluster is located in a data center. The first node is used to transmit target data to the second node. The congestion control device 1000 includes:

[0183] The first determining module 1001 is used to determine, when an explicit congestion notification packet returned by the second node is detected, the traffic type of the target data stream currently experiencing congestion based on that packet, and to determine the congestion location of the target data stream when its traffic type is cross-computing cluster traffic, wherein the target data stream is any one of at least one data stream included in the target data.

[0184] The second determining module 1002 is used to determine the congestion level and transmission progress of the target data stream based on the information of the received historical explicit congestion notification packets.

[0185] The speed reduction module 1003 is used to reduce the transmission rate of the target data stream based on the traffic type of the target data stream, the congestion location, the congestion degree, and the transmission progress.

[0186] According to the congestion control device 1000 of this application, the traffic type of the target data stream includes cross-computing cluster traffic and intra-computing cluster traffic, and the congestion location includes the link connecting the first computing cluster and the second computing cluster and the internal network of the first computing cluster; the speed reduction module 1003 includes:

[0187] The first speed-reduction submodule is used, if the traffic type of the target data stream is cross-computing cluster traffic and the congestion location is a link connecting the first computing cluster and the second computing cluster, to determine a first multiplicative speed-reduction factor based on the congestion level and the transmission progress, determine a first rate based on the first multiplicative speed-reduction factor, and control the transmission rate of the target data stream according to the first rate.

[0188] The second speed-reduction submodule is used, if the traffic type of the target data stream is cross-computing cluster traffic and the congestion location is the internal network of the first computing cluster, to determine a second multiplicative speed-reduction factor based on the congestion level and the transmission progress, determine a second rate based on the second multiplicative speed-reduction factor, and control the transmission rate of the target data stream according to the second rate.

[0189] The third speed-reduction submodule is used, if the traffic type of the target data stream is intra-computing cluster traffic, to determine a third multiplicative speed-reduction factor based on the congestion level and the transmission progress, determine a third rate based on the third multiplicative speed-reduction factor, and control the transmission rate of the target data stream according to the third rate.
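
As a non-limiting illustration, the choice among the three speed-reduction submodules can be sketched in Python as follows. The per-case weights in `CASE_WEIGHT`, the formula in `reduction_factor`, and the normalisation of the congestion level and transmission progress to the range 0 to 1 are assumptions made only for this sketch; this passage does not fix how each multiplicative factor is computed.

```python
from enum import Enum


class TrafficType(Enum):
    CROSS_CLUSTER = "cross"
    INTRA_CLUSTER = "intra"


class CongestionLocation(Enum):
    INTER_CLUSTER_LINK = "link"      # link connecting the first and second clusters
    INTERNAL_NETWORK = "internal"    # internal network of the first cluster


# Hypothetical weights: the source gives no numeric values.
CASE_WEIGHT = {"first": 0.5, "second": 0.3, "third": 0.4}


def select_case(traffic, location=None):
    """Map (traffic type, congestion location) to the first/second/third submodule."""
    if traffic is TrafficType.INTRA_CLUSTER:
        return "third"  # location is not needed for intra-cluster traffic
    if location is CongestionLocation.INTER_CLUSTER_LINK:
        return "first"
    if location is CongestionLocation.INTERNAL_NETWORK:
        return "second"
    raise ValueError("cross-cluster traffic requires a congestion location")


def reduction_factor(case, congestion_level, progress):
    """Assumed formula: cut more under heavy congestion, less for nearly finished flows."""
    if not (0.0 <= congestion_level <= 1.0 and 0.0 <= progress <= 1.0):
        raise ValueError("congestion_level and progress must lie in [0, 1]")
    return CASE_WEIGHT[case] * congestion_level * (1.0 - 0.5 * progress)


def reduced_rate(current_rate, traffic, location, congestion_level, progress):
    """Apply the multiplicative decrease selected for this flow."""
    case = select_case(traffic, location)
    return current_rate * (1.0 - reduction_factor(case, congestion_level, progress))
```

For example, under these assumed weights a cross-cluster flow sending at 100 units whose congestion lies on the inter-cluster link, with congestion level 1.0 and progress 0.0, would be cut to 50 units.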

[0190] The congestion control device 1000 according to this application further includes:

[0191] The first acceleration module is used to control the transmission rate of the target data stream to enter a fast recovery mode when an acceleration event is detected for the first time. In the fast recovery mode, the transmission rate approaches the target rate logarithmically. The target rate is the transmission rate of the target data stream before the transmission rate of the target data stream was reduced.

[0192] The second acceleration module is used to control the transmission rate of the target data stream to switch from the fast recovery mode to the aggressive increase mode when the number of detected acceleration events meets the first preset condition. In the aggressive increase mode, the transmission rate approaches the target rate by an increase of a first preset constant value.

[0193] The third acceleration module is used to control the transmission rate of the target data stream to switch from the aggressive increase mode to the super aggressive increase mode when the number of detected acceleration events meets the second preset condition. In the super aggressive increase mode, the transmission rate approaches the target rate with an increase of a second preset constant value, where the second preset constant value is greater than the first preset constant value.
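
The three acceleration modes can be pictured as a small state machine, sketched below in Python. Several details are assumptions made for illustration only: fast recovery is modelled by halving the gap to the target rate on each acceleration event (one common way to obtain logarithmic convergence), the first and second preset conditions are modelled as event-count thresholds, and the numeric defaults are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    FAST_RECOVERY = 1
    AGGRESSIVE = 2
    SUPER_AGGRESSIVE = 3


@dataclass
class RateRecovery:
    rate: float                 # current (already reduced) transmission rate
    target: float               # transmission rate before the reduction
    step_small: float = 5.0     # first preset constant value (assumed)
    step_large: float = 20.0    # second preset constant value (assumed, larger)
    first_threshold: int = 3    # assumed form of the first preset condition
    second_threshold: int = 6   # assumed form of the second preset condition
    events: int = 0
    mode: Optional[Mode] = None  # no mode until the first acceleration event

    def on_acceleration_event(self) -> float:
        """Advance the mode if a threshold is met, then raise the rate once."""
        self.events += 1
        if self.events >= self.second_threshold:
            self.mode = Mode.SUPER_AGGRESSIVE
        elif self.events >= self.first_threshold:
            self.mode = Mode.AGGRESSIVE
        else:
            self.mode = Mode.FAST_RECOVERY

        if self.mode is Mode.FAST_RECOVERY:
            # Halve the remaining gap to the target rate.
            self.rate = (self.rate + self.target) / 2.0
        else:
            step = self.step_small if self.mode is Mode.AGGRESSIVE else self.step_large
            # Constant increment, never overshooting the target rate.
            self.rate = min(self.target, self.rate + step)
        return self.rate
```

Starting from a rate of 20 with a target of 100, successive acceleration events in this sketch give 60 and 80 in fast recovery, then 85, 90 and 95 in the aggressive increase mode, and 100 once the super aggressive increase mode is reached.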

[0194] The congestion control device 1000 according to this application further includes:

[0195] The monitoring module is configured to re-execute the step of reducing the transmission rate of the target data stream if, in any of the fast recovery mode, the aggressive increase mode, and the super aggressive increase mode, an explicit congestion notification packet returned by the second node is detected.

[0196] According to the congestion control device 1000 of this application, the acceleration event is generated when the following rules are met:

[0197] No explicit congestion notification packet was detected within a preset time interval, and a preset amount of data (in bytes) in the target data stream was successfully sent.
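
The rule above combines a quiet-time condition with a byte-count condition. A minimal Python sketch of one way to evaluate it is given below; the class name `AccelerationEventDetector`, the choice to restart both the interval and the byte counter whenever an explicit congestion notification packet arrives or an event fires, and the use of caller-supplied timestamps are assumptions of this sketch rather than details stated in the source.

```python
class AccelerationEventDetector:
    """Signals an acceleration event when no congestion notification has arrived
    for `interval_s` seconds and at least `byte_threshold` bytes were sent."""

    def __init__(self, interval_s: float, byte_threshold: int, start: float = 0.0):
        self.interval_s = interval_s
        self.byte_threshold = byte_threshold
        self.window_start = start  # time of the last notification or last event
        self.bytes_sent = 0        # bytes sent since window_start

    def on_congestion_notification(self, now: float) -> None:
        # Congestion was reported: restart the quiet interval and the byte count.
        self.window_start = now
        self.bytes_sent = 0

    def on_bytes_sent(self, now: float, nbytes: int) -> bool:
        """Record successfully sent bytes; return True if an event is generated."""
        self.bytes_sent += nbytes
        quiet = (now - self.window_start) >= self.interval_s
        if quiet and self.bytes_sent >= self.byte_threshold:
            # Both conditions hold: emit one event and start a fresh window.
            self.window_start = now
            self.bytes_sent = 0
            return True
        return False
```

Each `True` returned by `on_bytes_sent` would count as one acceleration event for the purpose of the mode switching described earlier.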

[0198] According to the congestion control device 1000 of this application, the first node is communicatively connected to the second node through a switch, and the congestion control strategy adopted in the switch is as follows:

[0199] Upon receiving any data packet from the target data stream, determine the length of the queue at the switch's egress end;

[0200] If the length of the queue is less than a preset minimum threshold, the data packet is not explicitly marked, and the data packet is sent to the second node;

[0201] If the length of the queue is greater than or equal to the preset minimum threshold and less than or equal to the preset maximum threshold, the data packet is explicitly marked according to the marking probability matching the length of the queue, and the processed data packet is sent to the second node;

[0202] If the length of the queue at the egress end of the switch is greater than the preset maximum threshold, the data packet is explicitly marked according to the marking probability of 1, and the marked data packet is sent to the second node.
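
The switch-side marking rule, with probability 0 below the minimum threshold and probability 1 above the maximum threshold, can be sketched as a single Python function. The linear ramp between the two thresholds and the `p_max` parameter are assumptions borrowed from RED-style marking; the source only says that the probability matches the queue length.

```python
import random


def marking_probability(queue_len: float, k_min: float, k_max: float,
                        p_max: float = 1.0) -> float:
    """Probability of explicitly marking a packet given the egress queue length."""
    if k_min >= k_max:
        raise ValueError("k_min must be smaller than k_max")
    if queue_len < k_min:
        return 0.0   # below the minimum threshold: never mark
    if queue_len > k_max:
        return 1.0   # above the maximum threshold: always mark
    # Assumed linear ramp from 0 at k_min up to p_max at k_max.
    return p_max * (queue_len - k_min) / (k_max - k_min)


def should_mark(queue_len: float, k_min: float, k_max: float,
                p_max: float = 1.0, rng=random.random) -> bool:
    """Draw once per packet; `rng` is injectable so the decision can be tested."""
    return rng() < marking_probability(queue_len, k_min, k_max, p_max)
```

In either case the packet is still forwarded to the second node; only the mark differs.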

[0203] According to the congestion control device 1000 of this application, the congestion control strategy adopted in the second node is as follows:

[0204] Upon first detection of a data packet carrying the explicit tag, an explicit congestion notification packet is generated for the target data stream containing the data packet, and the explicit congestion notification packet is returned to the first node;

[0205] After a preset time interval, it is determined whether at least one data packet carrying the explicit tag of the target data stream is received within the preset time interval. If the result is yes, an explicit congestion notification packet is generated for the target data stream, and the explicit congestion notification packet is returned to the first node.
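
The behaviour of the second node can be sketched in Python as a per-flow timer. The class name `NotificationGenerator`, the integer timestamps, and the decision to forget a flow after an interval with no marked packets (so that its next marked packet again counts as a first detection) are assumptions of this sketch.

```python
class NotificationGenerator:
    """Decides when the receiver returns an explicit congestion notification packet."""

    def __init__(self, interval: int):
        self.interval = interval
        self.window_end = {}   # flow id -> end time of the current interval
        self.marked_seen = {}  # flow id -> marked packet seen in this interval?

    def on_packet(self, flow_id, marked: bool, now: int) -> bool:
        """Return True if a notification should be sent immediately."""
        if not marked:
            return False
        if flow_id not in self.window_end:
            # First marked packet of this flow: notify at once and start timing.
            self.window_end[flow_id] = now + self.interval
            self.marked_seen[flow_id] = False
            return True
        # Later marked packets are only remembered until the interval ends.
        self.marked_seen[flow_id] = True
        return False

    def on_timer(self, flow_id, now: int) -> bool:
        """Call when the interval may have elapsed; True means send a notification."""
        end = self.window_end.get(flow_id)
        if end is None or now < end:
            return False
        if self.marked_seen[flow_id]:
            self.marked_seen[flow_id] = False
            self.window_end[flow_id] = now + self.interval
            return True
        # Quiet interval (assumed behaviour): drop the flow state.
        del self.window_end[flow_id]
        del self.marked_seen[flow_id]
        return False
```

With this arrangement at most one notification per flow is returned in each preset time interval, regardless of how many marked packets arrive within it.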

[0206] Figure 11 is a schematic diagram of the physical structure of an electronic device according to an embodiment of this application. As shown in Figure 11, the electronic device may include a processor 1110, a communications interface 1120, a memory 1130, and a communication bus 1140, wherein the processor 1110, the communications interface 1120, and the memory 1130 communicate with each other through the communication bus 1140. The processor 1110 can call logical instructions in the memory 1130 to execute the congestion control method for a data center network.

[0207] Furthermore, the logical instructions in the aforementioned memory 1130 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0208] In another aspect, this application also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer is able to execute the congestion control method for a data center network provided by the methods described above.

[0209] In yet another aspect, this application also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the congestion control method for a data center network provided by the methods described above.

[0210] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0211] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.

[0212] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.

Claims

1. A congestion control method for a data center network, characterized in that the method is applied to a first node in a first intelligent computing cluster in a multi-intelligent computing cluster network; the first node is communicatively connected to a second node, which is located either within the first intelligent computing cluster or within a second intelligent computing cluster different from the first intelligent computing cluster; the intelligent computing clusters are located in a data center; the first node is used to transmit target data to the second node; and the method includes: when an explicit congestion notification packet returned by the second node is detected, determining the traffic type of the target data stream currently experiencing congestion based on the explicit congestion notification packet, and, if the traffic type of the target data stream is cross-computing cluster traffic, determining the congestion location of the target data stream, wherein the target data stream is any one of at least one data stream included in the target data, the traffic type of the target data stream includes cross-computing cluster traffic and intra-computing cluster traffic, and the congestion location includes the link connecting the first intelligent computing cluster and the second intelligent computing cluster and the internal network of the first intelligent computing cluster;
determining the congestion level and transmission progress of the target data stream based on the information of the received historical explicit congestion notification packets; and reducing the transmission rate of the target data stream based on its traffic type, congestion location, congestion level, and transmission progress, which includes: if the traffic type of the target data stream is cross-computing cluster traffic and the congestion location is the link connecting the first and second intelligent computing clusters, a first multiplicative rate reduction factor is determined based on the congestion level and transmission progress, a first rate is determined based on the first multiplicative rate reduction factor, and the transmission rate of the target data stream is controlled according to the first rate; if the traffic type of the target data stream is cross-computing cluster traffic and the congestion location is the internal network of the first intelligent computing cluster, a second multiplicative rate reduction factor is determined based on the congestion level and transmission progress, a second rate is determined based on the second multiplicative rate reduction factor, and the transmission rate of the target data stream is controlled according to the second rate; if the traffic type of the target data stream is intra-computing cluster traffic, a third multiplicative rate reduction factor is determined based on the congestion level and transmission progress, a third rate is determined based on the third multiplicative rate reduction factor, and the transmission rate of the target data stream is controlled according to the third rate.

2. The congestion control method according to claim 1, characterized in that, after reducing the transmission rate of the target data stream, the method further includes: upon first detection of an acceleration event, the transmission rate of the target data stream is controlled to enter a fast recovery mode, wherein in the fast recovery mode the transmission rate approaches the target rate logarithmically, and the target rate is the transmission rate of the target data stream before the transmission rate of the target data stream was reduced; when the number of detected acceleration events meets the first preset condition, the transmission rate of the target data stream is controlled to switch from the fast recovery mode to the aggressive increase mode, wherein in the aggressive increase mode the transmission rate approaches the target rate in increments of a first preset constant value; when the number of detected acceleration events meets the second preset condition, the transmission rate of the target data stream is controlled to switch from the aggressive increase mode to the super aggressive increase mode, wherein in the super aggressive increase mode the transmission rate approaches the target rate in increments of a second preset constant value, and the second preset constant value is greater than the first preset constant value.

3. The congestion control method according to claim 2, characterized in that the method further includes: in any of the fast recovery mode, the aggressive increase mode, and the super aggressive increase mode, if an explicit congestion notification packet returned by the second node is detected, the step of reducing the transmission rate of the target data stream is re-executed.

4. The congestion control method according to claim 2, characterized in that the acceleration event is generated when the following rule is met: no explicit congestion notification packet is detected within a preset time interval, and a preset amount of data (in bytes) in the target data stream is successfully sent.

5. The congestion control method according to claim 1, characterized in that the first node communicates with the second node through a switch, and the congestion control strategy used in the switch is as follows: upon receiving any data packet from the target data stream, determine the length of the queue at the switch's egress end; if the length of the queue is less than a preset minimum threshold, the data packet is not explicitly marked, and the data packet is sent to the second node; if the length of the queue is greater than or equal to the preset minimum threshold and less than or equal to the preset maximum threshold, the data packet is explicitly marked according to the marking probability matching the length of the queue, and the processed data packet is sent to the second node; if the length of the queue at the egress end of the switch is greater than the preset maximum threshold, the data packet is explicitly marked according to the marking probability of 1, and the marked data packet is sent to the second node.

6. The congestion control method according to claim 5, characterized in that the congestion control strategy used in the second node is as follows: upon first detection of a data packet carrying the explicit mark, an explicit congestion notification packet is generated for the target data stream containing the data packet, and the explicit congestion notification packet is returned to the first node; after a preset time interval, it is determined whether at least one data packet of the target data stream carrying the explicit mark was received within the preset time interval, and if so, an explicit congestion notification packet is generated for the target data stream, and the explicit congestion notification packet is returned to the first node.

7. A congestion control device for a data center network, characterized in that the device is configured in a first node in a first intelligent computing cluster in a multi-intelligent computing cluster network; the first node is communicatively connected to a second node; the second node is located within the first intelligent computing cluster or within a second intelligent computing cluster different from the first intelligent computing cluster; the intelligent computing clusters are located in a data center; the first node is used to transmit target data to the second node; and the congestion control device includes: a first determining module, configured to, upon detecting an explicit congestion notification packet returned by the second node, determine the traffic type of the target data stream currently experiencing congestion based on the explicit congestion notification packet, and determine the congestion location of the target data stream if the traffic type of the target data stream is cross-computing cluster traffic, wherein the target data stream is any one of at least one data stream included in the target data, the traffic type of the target data stream includes cross-computing cluster traffic and intra-computing cluster traffic, and the congestion location includes the link connecting the first intelligent computing cluster and the second intelligent computing cluster and the internal network of the first intelligent computing cluster; a second determining module, used to determine the congestion level and transmission progress of the target data stream based on the information of the received historical explicit congestion notification packets; and a speed reduction module, used to reduce the transmission rate of the target data stream based on the traffic type of the target data stream, the congestion location, the congestion level, and the transmission progress.
The speed reduction module includes: a first speed-reduction submodule, used, if the traffic type of the target data stream is cross-computing cluster traffic and the congestion location is the link connecting the first intelligent computing cluster and the second intelligent computing cluster, to determine a first multiplicative speed-reduction factor based on the congestion level and the transmission progress, determine a first rate based on the first multiplicative speed-reduction factor, and control the transmission rate of the target data stream according to the first rate; a second speed-reduction submodule, used, if the traffic type of the target data stream is cross-computing cluster traffic and the congestion location is the internal network of the first intelligent computing cluster, to determine a second multiplicative speed-reduction factor based on the congestion level and the transmission progress, determine a second rate based on the second multiplicative speed-reduction factor, and control the transmission rate of the target data stream according to the second rate; and a third speed-reduction submodule, used, if the traffic type of the target data stream is intra-computing cluster traffic, to determine a third multiplicative speed-reduction factor based on the congestion level and the transmission progress, determine a third rate based on the third multiplicative speed-reduction factor, and control the transmission rate of the target data stream according to the third rate.

8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the congestion control method for a data center network as described in any one of claims 1 to 6.

9. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the congestion control method for a data center network as described in any one of claims 1 to 6.