Congestion control capability test method and electronic device

By performing point-to-point testing and low-flow latency testing between devices within the RDMA network, the problem of evaluating congestion control capabilities without accessing the switch is solved, enabling fast and accurate congestion control capability testing and improving testing efficiency and accuracy.

WO2026138047A1 · PCT designated stage · Publication Date: 2026-07-02 · CHINA TELECOM CORP LTD TECHNOLOGY INNOVATION CENTER +1

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
CHINA TELECOM CORP LTD TECHNOLOGY INNOVATION CENTER
Filing Date
2025-09-28
Publication Date
2026-07-02


Abstract

A congestion control capability test method and an electronic device. The method comprises: controlling an i-th device among N devices in a network under test to send first traffic data to a j-th device so as to acquire an ij-th transmission parameter; controlling the j-th device to send the first traffic data to the i-th device so as to acquire a ji-th transmission parameter, thereby acquiring N(N-1) transmission parameters; on the basis of the N(N-1) transmission parameters, determining whether the N devices and a switch are faulty; among M non-faulty devices, controlling M-2 devices to send second traffic data to a k-th device and controlling one device to send the second traffic data and third traffic data to the k-th device so as to acquire a k-th small flow delay corresponding to the third traffic data; and on the basis of the N(N-1) transmission parameters and M small flow delays corresponding to the third traffic data, determining whether the congestion control capability of the network under test is acceptable. In embodiments of the present disclosure, the congestion control capability can be tested without accessing the switch.

Description

Congestion control capability testing methods and electronic equipment

[0001] Cross-references

[0002] This disclosure claims priority to Chinese Patent Application No. 2024119104608, filed on December 23, 2024, entitled "Congestion Control Capability Testing Method and Electronic Equipment", the entire contents of which are incorporated herein by reference.

Technical Field

[0003] This disclosure relates to the Internet field, and more specifically, to a method and electronic device for testing congestion control capabilities.

Background Technology

[0004] Currently, programmable network congestion control technology is becoming increasingly mature, and congestion control algorithms in RDMA networks within network clusters, such as intelligent computing centers and storage centers, are being updated and iterated rapidly. After each algorithm iteration, the congestion control capability of the algorithm needs to be evaluated to ensure that it meets business requirements. Performance evaluation of the congestion control algorithm is a crucial step in verifying algorithm performance.

[0005] In related technologies, it is necessary to determine the congestion control capability of the algorithm by accessing the switch queue. However, since switches in the existing network are usually maintained by dedicated personnel, it is usually difficult for end-side network card maintenance personnel to access them, making deployment difficult.

[0006] Therefore, there is a need for a method that can accurately and efficiently evaluate the congestion control capabilities of congestion control algorithms without accessing a switch.

[0007] It should be noted that the information disclosed in the background section above is only used to enhance the understanding of the background of this disclosure, and therefore may include information that does not constitute prior art known to those skilled in the art.

Summary of the Invention

[0008] The purpose of this disclosure is to provide a congestion control capability testing method and electronic device for automatically and accurately evaluating the congestion control capability of a network under test without accessing a switch.

[0009] According to a first aspect of the present disclosure, a congestion control capability testing method is provided, comprising: determining N devices in a network under test, wherein the N devices are connected to at least one switch; controlling the i-th device to send first traffic data to the j-th device to obtain the ij-th transmission parameter; controlling the j-th device to send the first traffic data to the i-th device to obtain the ji-th transmission parameter, thereby obtaining N(N-1) transmission parameters, 1≤i≤N, 1≤j≤N, i≠j; determining, based on the N(N-1) transmission parameters, whether the N devices and the switch connected to the N devices are faulty, marking the faulty switches and faulty devices as faulty switches and faulty devices respectively, and marking the devices connected to the faulty switches as faulty devices; among the M devices remaining after the faulty devices are excluded from the N devices, controlling M-2 devices to send second traffic data to the k-th device, and controlling one device to send the second traffic data and third traffic data to the k-th device, to obtain the k-th small flow delay corresponding to the third traffic data, wherein the data amount of the third traffic data is less than a first preset data amount, 2≤M≤N, 1≤k≤M; and determining, based on the N(N-1) transmission parameters and the M small flow delays corresponding to the third traffic data, whether the congestion control capability of the network under test is qualified.

[0010] In an exemplary embodiment of this disclosure, the transmission parameters include transmission delay. Determining whether the N devices and the switches connected to the N devices are faulty based on the N(N-1) transmission parameters includes: identifying two or more devices corresponding to transmission delays greater than a first preset delay among the N(N-1) transmission delays; if the transmission delay corresponding to one of the two or more devices exceeds a first preset proportion, marking the device as a suspicious device; if the proportion of suspicious devices among the devices connected to a switch is greater than a second preset proportion, marking the switch as a suspicious switch; and determining whether the suspicious device and the suspicious switch are faulty based on the suspicious device and the suspicious switch.
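
The marking logic of this embodiment can be sketched as follows. This is only an illustrative reading, not the disclosed implementation: it assumes that a device becomes suspicious when the share of its measurements exceeding the first preset delay is greater than the first preset proportion, and the function name, the `switch_of` mapping, and all parameter names are hypothetical.

```python
from collections import defaultdict

def find_suspicious(delays, switch_of, first_preset_delay, first_ratio, second_ratio):
    """delays: {(i, j): delay} for every ordered device pair, i != j.
    switch_of: {device: switch}, a hypothetical record of the topology."""
    slow = defaultdict(int)   # slow measurements each device took part in
    total = defaultdict(int)  # all measurements each device took part in
    for (i, j), delay in delays.items():
        for dev in (i, j):
            total[dev] += 1
            if delay > first_preset_delay:
                slow[dev] += 1

    # Assumption: a device is suspicious when the share of its measurements
    # that are slow exceeds the first preset proportion.
    suspicious_devices = {d for d in total if slow[d] / total[d] > first_ratio}

    # Group devices by switch, then compare the suspicious share per switch
    # against the second preset proportion.
    members = defaultdict(set)
    for dev, sw in switch_of.items():
        members[sw].add(dev)
    suspicious_switches = {
        sw for sw, devs in members.items()
        if len(devs & suspicious_devices) / len(devs) > second_ratio
    }
    return suspicious_devices, suspicious_switches
```

Under this reading, a single slow link raises the slow share of both endpoints only slightly, whereas a device that is slow towards every peer stands out clearly.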

[0011] In an exemplary embodiment of this disclosure, determining whether the congestion control capability of the network under test is qualified based on the N(N-1) transmission parameters and the M small flow delays corresponding to the third traffic data includes: if there is a transmission delay greater than a second preset delay among the N(N-1) transmission delays, it is determined that the congestion control capability of the network under test is unqualified.

[0012] In an exemplary embodiment of this disclosure, the transmission parameters include the transmission duration for completing the transmission of the first traffic data, the first traffic data being greater than a second preset data amount, the second preset data amount being greater than the first preset data amount, and determining whether the congestion control capability of the network under test is qualified based on the N(N-1) transmission parameters and the M small flow delays corresponding to the third traffic data includes: if at least one of the N(N-1) transmission durations has a transmission duration greater than a first preset duration, it is determined that the congestion control capability of the network under test is unqualified.

[0013] In an exemplary embodiment of this disclosure, determining whether the congestion control capability of the network under test is qualified based on the N(N-1) transmission parameters and the M small flow delays corresponding to the third traffic data includes: if at least one of the M small flow delays is greater than a third preset delay, it is determined that the congestion control capability of the network under test is unqualified.
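
Taken together, the three embodiments above describe three independent failure conditions. A minimal sketch of combining them is shown below; the function and parameter names are hypothetical, and treating the network as qualified only when none of the conditions fires is an assumption about how the embodiments compose.

```python
def is_qualified(latencies, durations, small_flow_delays,
                 second_preset_delay, first_preset_duration, third_preset_delay):
    """Return False as soon as any measurement exceeds its preset limit."""
    # Any point-to-point transmission delay above the second preset delay.
    if any(d > second_preset_delay for d in latencies):
        return False
    # Any transmission duration above the first preset duration.
    if any(t > first_preset_duration for t in durations):
        return False
    # Any small flow delay above the third preset delay.
    if any(s > third_preset_delay for s in small_flow_delays):
        return False
    return True
```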

[0014] In one exemplary embodiment of this disclosure, after marking the faulty switches and faulty devices and marking the devices connected to the faulty switches as faulty devices, the method further includes: in response to a message indicating that a switch has accessed the network under test, determining the identifier of the switch; if the identifier of the switch matches the identifier of a faulty switch, controlling the M fault-free devices in the network under test and the devices connected to the faulty switch to send the first traffic data to each other to obtain multiple transmission parameters; and if it is determined from the multiple transmission parameters that the faulty switch has recovered, marking the faulty switch as a fault-free switch, marking the devices connected to the faulty switch as fault-free devices, and updating the number M of fault-free devices.

[0015] In one exemplary embodiment of this disclosure, after marking the faulty switches and faulty devices and marking the devices connected to the faulty switches as faulty devices, the method further includes: in response to a message indicating that a device has accessed the network under test, determining the identifier of the device; if the identifier of the device matches the identifier of a faulty device, controlling the M fault-free devices in the network under test to send the first traffic data to the faulty device to obtain the transmission parameters corresponding to the faulty device; and if it is determined from those transmission parameters that the faulty device has returned to normal, marking the faulty device as a fault-free device and updating the number M of fault-free devices.
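
The device recovery flow can be sketched as a small state update. The `retest` callback stands in for sending the first traffic data from the fault-free devices to the rejoining device and judging the resulting transmission parameters; it, the function name, and the set-based bookkeeping are assumptions made for illustration.

```python
def handle_device_rejoin(device_id, faulty_devices, healthy_devices, retest):
    """Update the fault-free set when a device re-accesses the network.

    retest(healthy_ids, device_id) -> bool is a hypothetical callback that
    returns True when the retest shows the device has recovered.
    Returns the updated number M of fault-free devices.
    """
    if device_id not in faulty_devices:
        return len(healthy_devices)      # not a known faulty device: no change
    if retest(sorted(healthy_devices), device_id):
        faulty_devices.discard(device_id)
        healthy_devices.add(device_id)   # M grows by one
    return len(healthy_devices)
```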

[0016] According to a second aspect of the present disclosure, an electronic device is provided, comprising: a memory; and a processor coupled to the memory, the processor being configured to perform the method as described in any one of the preceding embodiments based on instructions stored in the memory.

[0017] According to a third aspect of this disclosure, a computer-readable storage medium is provided having a program stored thereon that, when executed by a processor, implements the congestion control capability testing method as described in any of the preceding claims.

[0018] According to a fourth aspect of this disclosure, a computer program product is provided, comprising a computer program that, when executed by a processor, implements the steps of the method as described in any of the preceding claims.

[0019] Embodiments of this disclosure first control the devices in the network under test to perform point-to-point tests, which rule out device and switch faults, assess the basic network status, and give a preliminary indication of obvious congestion control deficiencies. Such deficiencies can be identified quickly without accessing the switch, which improves the efficiency of congestion control capability testing. In addition, by performing latency tests on the fault-free devices under many-to-one congestion scenarios, the method can accurately determine whether the congestion control capability of the network under test is qualified. The queue processing capacity of the switch can thus be measured, and the congestion control capability accurately tested, without accessing the switch.

[0020] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.

Attached Figure Description

[0021] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure. It is obvious that the drawings described below are merely some embodiments of this disclosure, and those skilled in the art can obtain other drawings based on these drawings without any inventive effort.

[0022] Figure 1 is a flowchart of a congestion control capability testing method in an exemplary embodiment of this disclosure.

[0023] Figure 2 is a schematic diagram of the network under test in an exemplary embodiment of this disclosure.

[0024] Figure 3 is a sub-flowchart of step S3 in an exemplary embodiment of this disclosure.

[0025] Figure 4 is a flowchart of updating the number of devices after step S3 in an exemplary embodiment of this disclosure.

[0026] Figure 5 is a flowchart of updating the number of devices after step S3 in an exemplary embodiment of this disclosure.

[0027] Figure 6 is a schematic diagram of a small-flow latency test performed in an exemplary embodiment of this disclosure.

[0028] Figure 7 is a schematic diagram of the test process in an exemplary embodiment of this disclosure.

[0029] Figure 8 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Implementation

[0030] Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided to make this disclosure more comprehensive and complete, and to fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics can be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a full understanding of embodiments of this disclosure. However, those skilled in the art will recognize that the technical solutions of this disclosure can be practiced with one or more of the specific details omitted, or other methods, components, apparatus, steps, etc., can be employed. In other instances, well-known technical solutions are not shown or described in detail to avoid obscuring various aspects of this disclosure.

[0031] Furthermore, the accompanying drawings are merely illustrative of this disclosure, and the same reference numerals in the drawings denote the same or similar parts, thus repeated descriptions of them will be omitted. Some block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different network and / or processor devices and / or microcontroller devices.

[0032] The exemplary embodiments of this disclosure will now be described in detail with reference to the accompanying drawings.

[0033] Figure 1 is a flowchart of a congestion control capability testing method in an exemplary embodiment of this disclosure.

[0034] Referring to Figure 1, the congestion control capability testing method 100 may include:

[0035] Step S1: Identify N devices in the network under test, wherein the N devices are connected to at least one switch;

[0036] Step S2: Control the i-th device to send the first traffic data to the j-th device to obtain the ij-th transmission parameter; control the j-th device to send the first traffic data to the i-th device to obtain the ji-th transmission parameter, thereby obtaining N(N-1) transmission parameters, 1≤i≤N, 1≤j≤N, i≠j;

[0037] Step S3: Determine whether the N devices and the switches connected to the N devices are faulty based on the N(N-1) transmission parameters, mark the faulty switches and devices as faulty switches and faulty devices respectively, and mark the devices connected to the faulty switches as faulty devices.

[0038] Step S4: Among the M devices remaining after the faulty devices are excluded from the N devices, control M-2 devices to send second traffic data to the k-th device, and control one device to send the second traffic data and third traffic data to the k-th device, to obtain the k-th small flow delay corresponding to the third traffic data. The data amount of the third traffic data is less than the first preset data amount, 2≤M≤N, 1≤k≤M;

[0039] Step S5: Based on the N(N-1) transmission parameters and the M small flow delays corresponding to the third traffic data, determine whether the congestion control capability of the network under test is qualified.
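
As an illustration of the many-to-one arrangement in step S4, the sketch below builds one round per receiver k. Of the other M-1 fault-free devices, one is chosen to send both the second and the third traffic data and the remaining M-2 send only the second traffic data. The function name and the choice of the first remaining device as the small-flow sender are assumptions; the text does not say how that device is selected.

```python
def many_to_one_rounds(healthy):
    """Build one test round per receiver among the M fault-free devices."""
    if len(healthy) < 2:
        raise ValueError("step S4 requires M >= 2 fault-free devices")
    rounds = []
    for k in healthy:
        others = [d for d in healthy if d != k]
        # Assumption: the first remaining device carries the small flow.
        probe, background = others[0], others[1:]
        rounds.append({
            "receiver": k,
            "probe_sender": probe,             # sends second + third traffic data
            "background_senders": background,  # M-2 devices, second traffic data only
        })
    return rounds
```

Running all M rounds yields the M small flow delays used in step S5.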

[0040] Embodiments of this disclosure first control the devices in the network under test to perform point-to-point tests, which rule out device and switch faults, assess the basic network status, and give a preliminary indication of obvious congestion control deficiencies. Such deficiencies can be identified quickly without accessing the switch, which improves the efficiency of congestion control capability testing. In addition, by performing latency tests on the fault-free devices under many-to-one congestion scenarios, the method can accurately determine whether the congestion control capability of the network under test is qualified. The queue processing capacity of the switch can thus be measured, and the congestion control capability accurately tested, without accessing the switch.

[0041] The following is a detailed explanation of each step of the congestion control capability testing method 100.

[0042] In step S1, N devices in the network under test are identified, and the N devices are connected to at least one switch.

[0043] Figure 2 is a schematic diagram of the network under test in an exemplary embodiment of this disclosure.

[0044] Referring to Figure 2, the network under test 200 may include at least one switch 21, and each switch 21 is connected to at least one device 22. The number of devices in the network under test 200 is N, where N≥1. Devices 22 may be various computing nodes, storage devices, and network peripheral devices, and devices 22 can communicate with each other through switches 21. The communication connections between switches 21 are either direct connections or connections through upper-layer switches such as core switches.

[0045] In an exemplary embodiment, the network under test 200 is, for example, an RDMA (Remote Direct Memory Access) network of a computing center or storage center. RDMA is a high-performance network communication technology that allows computers to directly transfer data between the memory of different nodes without the intervention of the operating system kernel, greatly reducing data transmission latency and improving transmission efficiency.

[0046] When the network under test 200 is an RDMA network, switch 21 can identify and process RDMA data packets, and quickly forward the data packets to device 22 according to the network topology and traffic load. The device 22 connected to each switch 21 can be a server with powerful computing capabilities, equipped with high-performance processors and large-capacity memory, specifically designed to handle complex computing tasks; or it can be a massive storage device, such as a disk array, used to store massive amounts of data resources.

[0047] Due to the stringent performance and stability requirements of RDMA networks, congestion control algorithms need to be configured on switch 21 and device 22 to ensure the data transmission performance of the entire network. Congestion control algorithms are a crucial mechanism designed to prevent network performance degradation or even collapse due to excessive data traffic.

[0048] At the switch 21 level, the congestion control algorithm monitors the traffic load of each port, for example by counting the number of packets received per unit time, the byte traffic, and the queue length. When it detects that the traffic on a port is approaching or exceeding the port's capacity, it initiates congestion control. This includes prioritizing incoming packets so that high-priority RDMA traffic (such as data transmission related to real-time computing tasks) passes first, while low-priority traffic (such as some background data synchronization tasks) is appropriately delayed or limited. The congestion control algorithm of switch 21 can also exchange information with neighboring switches to collaboratively adjust traffic paths, redirecting some traffic to more lightly loaded links to balance the overall network load.

[0049] At the device 22 level, its congestion control algorithm primarily focuses on managing its own transmit and receive buffers and responding to network congestion signals. At the transmitting end, device 22 dynamically adjusts its transmission rate based on its own buffer occupancy and congestion feedback received from the network (such as congestion notifications sent by switch 21 or indirect congestion detection through changes in packet latency). When the buffer is about to fill or network congestion appears, device 22 reduces its transmission rate to avoid further exacerbating network congestion. At the receiving end, device 22 allocates memory resources to process received data, preventing data loss due to receive buffer overflow. For example, when the amount of received data suddenly increases, device 22 can temporarily request more memory space or preprocess the data to speed up the transfer of data from the buffer.

[0050] In the RDMA network, the congestion control algorithms of switch 21 and device 22 cooperate and work together, acting like the network's "traffic police," constantly monitoring the network's "traffic flow" to ensure that data can be transmitted smoothly and efficiently in the network. This provides a stable and reliable network environment for various complex tasks in the intelligent computing center and storage center, ensuring the continuous and stable operation of critical services such as large-scale parallel computing, massive data storage and retrieval, and enabling the entire RDMA network to fully leverage its high-performance advantages.

[0051] Therefore, as the congestion control algorithm is updated, it is necessary to test the congestion control capability of the congestion control algorithm configured in the network under test to ensure that the congestion control algorithm can meet the business requirements.

[0052] In step S1, the topology of the network under test 200 first needs to be sorted out and recorded, clarifying the connection relationship between each switch 21 and each device 22, including the connected port information. Basic configuration information for each device 22 also needs to be collected, such as its supported RDMA protocol version, network interface card parameters, and memory configuration. After determining the switches 21 and devices 22 participating in the test within the network under test 200, the switches 21 and devices 22 need to be configured for testing to enable the subsequent point-to-point device testing and the small-flow testing in many-to-one congestion scenarios.

[0053] In an exemplary embodiment, the testing method of this disclosure can be implemented by one of the N devices 22, or by a dedicated testing device connected to each switch 21. Regardless of the device used, the tester can input test configuration information into the device executing method 100 before testing. This configuration information includes, for example, the management port IP address, RDMA port IP address, and test result storage path for each device 22. Since the method of this disclosure is implemented without accessing the switches, it is sufficient to ensure that the communication connection between the device executing method 100 and each switch 21 is normal; no configuration of the switches 21 is required.

[0054] In addition, it is also necessary to set the traffic type (e.g., write, send, or read), data volume (e.g., in bytes or bits), transmission protocol (TCP, UDP, etc.), and other related parameters (e.g., packet size) of the traffic data sent by device 22.
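
The configuration items listed above could be captured in a small structure such as the following sketch. All class names, field names, and default values are hypothetical; the disclosure does not prescribe a format.

```python
from dataclasses import dataclass

TRAFFIC_TYPES = {"write", "send", "read"}

@dataclass(frozen=True)
class DeviceEndpoint:
    device_id: str
    mgmt_ip: str   # management port IP address
    rdma_ip: str   # RDMA port IP address

@dataclass(frozen=True)
class TrafficProfile:
    traffic_type: str       # one of TRAFFIC_TYPES
    data_bytes: int         # fixed data volume for this test type
    packet_bytes: int       # packet size
    protocol: str = "UDP"   # illustrative default only

    def __post_init__(self):
        if self.traffic_type not in TRAFFIC_TYPES:
            raise ValueError(f"unknown traffic type: {self.traffic_type}")
        if self.data_bytes <= 0 or self.packet_bytes <= 0:
            raise ValueError("data_bytes and packet_bytes must be positive")

@dataclass(frozen=True)
class RunConfig:
    devices: tuple            # DeviceEndpoint entries
    result_path: str          # where test results are stored
    profile: TrafficProfile
```

Validating the traffic type and sizes when the profile is built catches a mistyped configuration before any traffic is sent.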

[0055] In this embodiment of the disclosure, devices 22 are controlled to transmit a fixed amount of traffic data to one another for testing. Within the same type of test, the amount of traffic data transmitted between devices 22 is the same. Therefore, the amount of traffic data can be preset according to the type of test.

[0056] In an exemplary embodiment, multiple runs can be set up for the same test type, with each run corresponding to a different traffic type. For example, a transmission latency test can be run three times, measuring the transmission latency between devices 22 for write-type, send-type, and read-type first traffic data respectively. For this purpose, three kinds of first traffic data can be set, each corresponding to a different traffic type. Since transmission latencies are compared only within the same traffic type, the data volume of the first traffic data for different traffic types can be the same or different, as detailed in subsequent embodiments.

[0057] Finally, the serial numbers of each device 22 can be set to enable subsequent test control. For example, the identifiers of N devices 22 can be written into a list or array, and these identifiers can be sorted according to certain rules, such as the physical location of the devices, performance parameters, or hierarchical relationship in the network topology.

[0058] When sorting by physical location, if the devices are distributed in the computer room according to the location of the cabinets and racks, the devices can be assigned serial numbers in order from left to right and from top to bottom. In this way, when conducting regional network testing, it is convenient to select a group of devices in adjacent or specific areas for targeted testing, and quickly locate possible local network problems, such as communication abnormalities between devices in a certain cabinet.

[0059] Furthermore, devices can be sorted based on performance parameters, prioritizing those with high processing power, large memory, and high network interface speeds, and placing relatively weaker devices at the bottom. During congestion scenario simulations, test data can clearly observe the collaborative working relationship between high-performance and low-performance devices, as well as the rationality of network resource allocation among devices of different performance levels.

[0060] Sorting according to the hierarchical relationship in the network topology is suitable for complex network architectures, such as the device division into core layer, aggregation layer, and access layer. Placing core layer devices at the front of the list, followed by aggregation layer devices, and then access layer devices last, allows for a systematic approach when testing the performance of the network's layered architecture, data flow, and interactions between different layers.
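
The three ordering rules can be sketched as sort keys over a device inventory. The record fields (`cabinet`, `rack_slot`, `nic_gbps`, `mem_gb`, `layer`) and the function name are hypothetical; any inventory carrying equivalent information would serve.

```python
LAYER_ORDER = {"core": 0, "aggregation": 1, "access": 2}

SORT_KEYS = {
    # Left to right, then top to bottom within a cabinet.
    "location": lambda d: (d["cabinet"], d["rack_slot"]),
    # Stronger devices first (negated so larger values sort earlier).
    "performance": lambda d: (-d["nic_gbps"], -d["mem_gb"]),
    # Core layer first, access layer last.
    "topology": lambda d: LAYER_ORDER[d["layer"]],
}

def number_devices(devices, rule):
    """Return {serial_number: device_id}, numbering from 1 under the given rule."""
    ordered = sorted(devices, key=SORT_KEYS[rule])
    return {n: d["id"] for n, d in enumerate(ordered, start=1)}
```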

[0061] Once the device serial numbers are determined and recorded in a list or array, specific devices can be quickly located by serial number during test control. This allows for individual parameter configuration, traffic transmission and reception control, and status monitoring. For example, during concurrent multi-device traffic-sending tests, traffic sending can be initiated sequentially according to the device serial numbers, keeping the test orderly and accurate. Alternatively, when a fault is detected in a certain area of the network, the faulty device and the devices connected around it can be quickly identified by serial number, improving the efficiency of fault diagnosis and repair. This keeps the entire network testing process efficient and stable, and provides a solid foundation for network performance evaluation and optimization in complex network environments such as intelligent computing centers and storage centers.

[0062] After confirming that the test configuration for each device 22 participating in the test is complete, the method proceeds to step S2 to start the point-to-point test.

[0063] In step S2, control the i-th device to send the first traffic data to the j-th device to obtain the ij-th transmission parameter; control the j-th device to send the first traffic data to the i-th device to obtain the ji-th transmission parameter, thereby obtaining N(N-1) transmission parameters, 1≤i≤N, 1≤j≤N, i≠j.

[0064] In step S2, for a given test type, point-to-point one-way communication is performed between each ordered pair of the N devices to obtain the transmission parameters corresponding to the N(N-1) point-to-point communications.
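
The count N(N-1) follows from testing each of the N(N-1)/2 unordered device pairs in both directions. A minimal sketch of generating that schedule, with a hypothetical function name and devices referred to by serial number, might look like this:

```python
from itertools import combinations

def point_to_point_schedule(n):
    """List the one-way tests (sender, receiver) for devices numbered 1..n."""
    schedule = []
    for i, j in combinations(range(1, n + 1), 2):
        schedule.append((i, j))  # i -> j gives the ij-th transmission parameter
        schedule.append((j, i))  # j -> i gives the ji-th transmission parameter
    return schedule
```

For n = 3 the schedule is (1, 2), (2, 1), (1, 3), (3, 1), (2, 3), (3, 2), that is, 3 × 2 = 6 one-way tests.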

[0065] In an exemplary embodiment, the transmission parameters include transmission latency, and the test types include testing the transmission latency between each pair of N devices in two corresponding directions. In this type of test, the traffic type of the first traffic data can be changed.

[0066] For example, when testing transmission latency, first, the first traffic data is set to the read type. Then, the data volume of the first traffic data is set, which can be flexibly adjusted according to the network characteristics, network functions, and actual application scenario requirements of the network under test. Finally, according to the device sequence number, communication is performed between each pair of devices. The difference between the reception time and the transmission time of the first traffic data sent from the i-th device to the j-th device is recorded as the ij-th transmission latency, and the difference between the reception time and the transmission time of the first traffic data sent from the j-th device to the i-th device is recorded as the ji-th transmission latency. Then, the next group of devices is tested, until the pairwise testing of N devices is completed, resulting in N(N-1) transmission latencies corresponding to the read type traffic.

[0067] Next, the first traffic data can be changed to the write type, and the pairwise tests between devices can be performed again to obtain N(N-1) transmission delays corresponding to the write type traffic. This process can be repeated. Depending on the testing requirements, other elements of the first traffic data (traffic type is only an example) can be adjusted to measure the transmission delay corresponding to each combination of elements.
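
The repetition over traffic types can be sketched as a nested loop. The `send` callback is a hypothetical stand-in for whatever mechanism actually drives the devices; it is assumed to return the transmission and reception timestamps of the first traffic data.

```python
def measure_latencies(pairs, traffic_types, send):
    """Collect one transmission latency per (traffic type, sender, receiver).

    send(src, dst, traffic_type) -> (sent_at, received_at) is a hypothetical
    callback; the timestamps are assumed to share one clock.
    """
    results = {}
    for traffic_type in traffic_types:
        for src, dst in pairs:
            sent_at, received_at = send(src, dst, traffic_type)
            # Latency is the reception time minus the transmission time.
            results[(traffic_type, src, dst)] = received_at - sent_at
    return results
```

Keying the results by traffic type keeps the later comparisons within a single traffic type, as the text requires.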

[0068] Transmission latency reflects the basic operational status of the current network. Excessive transmission latency indicates a problem with at least one of the devices, switches, or congestion control algorithms. Detailed judgment logic will be explained later.

[0069] In an exemplary embodiment, the transmission parameters include the transmission duration for the device to complete sending the first traffic data. In this case, the data amount of the first traffic data needs to be fixed and relatively large, for example, greater than a second preset data amount, to facilitate measurement of the transmission duration. The transmission duration is used to determine the network bandwidth of the network under test under the current congestion control algorithm. The congestion control algorithm adjusts the traffic transmission rate of each device based on the network bandwidth.

[0070] In this embodiment, the transmission duration of first traffic data with a fixed data amount is measured for each device. This reflects the device's transmission rate, and thereby shows whether the congestion control algorithm has automatically adjusted the transmission rate and whether it has affected network bandwidth. Since the test is conducted through one-way communication between devices, if the data amount of the first traffic data is not excessive, it will not cause network congestion and should not trigger the congestion control algorithm to adjust the transmission rate. Therefore, if a transmission duration exceeds a first preset duration, it indicates that the congestion control algorithm has adjusted the transmission rate even though no congestion was present, suggesting that the congestion control algorithm is ineffective.
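
Because the data amount is fixed, a transmission duration maps directly to an effective rate. The sketch below shows that conversion and the threshold check; the function names and the choice of Gbit/s as the unit are illustrative assumptions.

```python
def effective_gbps(data_bytes, duration_s):
    """Effective rate in Gbit/s for sending data_bytes in duration_s seconds."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return data_bytes * 8 / duration_s / 1e9

def exceeds_first_preset_duration(duration_s, first_preset_duration_s):
    """True when the transfer took longer than the first preset duration."""
    return duration_s > first_preset_duration_s
```

For example, sending 1,250,000,000 bytes in 0.5 s corresponds to 20 Gbit/s, so a first preset duration can be derived from the lowest rate considered acceptable on the link.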

[0071] Sending duration tests can typically be performed for different types of traffic, such as read-type traffic and write-type traffic mentioned above. These will not be elaborated upon further here.

[0072] It should be noted that during the testing process, it is necessary to ensure the stability of the network environment under test, eliminate other external interference factors, such as other irrelevant traffic bursts, network resource consumption by other devices, and potential impacts of complex network topologies. In addition, it is necessary to ensure that each device is not running other complex tasks (e.g., the total CPU or memory usage exceeds 10%), so as to avoid other factors interfering with the judgment of test results.

[0073] The aforementioned transmission latency test and network bandwidth (sending duration) test can be performed individually, simultaneously (measuring both sending duration and transmission latency for the first traffic data), or sequentially (e.g., testing transmission latency first, followed by testing sending duration). In this embodiment, performing the transmission latency and sending duration tests sequentially results in higher test accuracy.

[0074] In the above transmission latency test and transmission duration test, since only one device sends unidirectional traffic and one device receives traffic each time, the basic parameters of the network can be tested under conditions such as fewer transmission tasks, fewer concurrent connections, and lower data packet transmission frequency.

[0075] When devices communicate with each other, the network is relatively idle, meaning there are fewer transmission tasks and fewer concurrent connections. The measured transmission parameters can more clearly reflect the health status of the devices themselves and the initial performance of the congestion control mechanism. Fewer concurrent connections mean a limited number of channels for simultaneous data exchange; if an anomaly occurs, it's easier to pinpoint the problem to a specific device or connection link. Furthermore, because of the low frequency of data packet transmission between devices, mutual interference during data transmission is reduced, making the transmission status of individual data packets easier to observe and analyze. For example, whether the transmission latency is stable is crucial for determining whether the device is working properly and whether there are any anomalies in congestion control. When the network is idle, any abnormalities in congestion control capabilities can be detected more intuitively and quickly in this relatively simple testing environment.

[0076] Therefore, if there are problems with transmission parameters such as transmission delay or transmission duration, in an environment without complex network conditions to mask or mislead, it is highly likely to be one of the three reasons: equipment failure, switch failure, or inadequate congestion control capability. Setting step S2 can quickly locate the fault, detect the fault as early as possible, and improve testing efficiency.

[0077] In addition to testing transmission latency and transmission duration (network bandwidth) as mentioned above, other types of transmission parameters can be tested to assess the transmission parameters of each device in network environments with fewer transmission tasks, fewer concurrent connections, and lower data packet transmission frequency. This information can be used to diagnose device faults and quickly and intuitively detect abnormalities in congestion control capabilities. Those skilled in the art can set the types and number of transmission parameters according to testing needs; this disclosure is not limited thereto.

[0078] After measuring the transmission parameters, the faults can be initially screened based on these parameters to quickly identify obvious problems and improve testing efficiency.

[0079] In step S3, based on the N(N-1) transmission parameters, it is determined whether the N devices and the switches connected to the N devices are faulty. The faulty switches and devices are marked as faulty switches and faulty devices, respectively, and the devices connected to the faulty switches are marked as faulty devices.

[0080] Figure 3 is a sub-flowchart of step S3 in an exemplary embodiment of this disclosure.

[0081] Referring to Figure 3, in an exemplary embodiment, step S3 may include:

[0082] Step S31: Among the N(N-1) transmission delays, determine two or more devices corresponding to the transmission delay that is greater than the first preset delay;

[0083] Step S32: If, for one of the two or more devices, the proportion of its transmission delays that are greater than the first preset delay exceeds the first preset proportion, the device is marked as a suspicious device.

[0084] Step S33: If the proportion of suspicious devices among the devices connected to a switch is greater than the second preset proportion, mark the switch as a suspicious switch.

[0085] Step S34: Determine whether the suspicious device or the suspicious switch is malfunctioning based on the suspicious device and the suspicious switch.

[0086] As analyzed above, in a network environment with fewer transmission tasks, fewer concurrent connections, and a lower data packet transmission frequency, a large transmission delay likely indicates at least one of three problems: equipment failure, switch failure, or inadequate congestion control capabilities. Therefore, a first preset delay is set to filter out problematic transmission delays.

[0087] The first preset delay can be determined based on factors such as the environment of the network under test, the type and performance of the equipment, and the data amount of the first traffic data. For example, a reasonable transmission delay can be determined from actual test results, which need to be obtained in a network environment whose equipment is fault-free and whose congestion control capability is normal. On this basis, the transmission delay corresponding to abnormal conditions can be determined, and the first preset delay can be set accordingly.

[0088] In step S31, if a transmission delay exceeds the first preset delay, the two devices (the sending device and the receiving device) corresponding to the transmission delay can be marked as suspicious devices, and it is recorded that the device is suspicious when sending or receiving (because some devices may have normal sending function but abnormal receiving function, etc.). The reason for marking them as suspicious devices rather than faulty devices is that some transmission anomalies may be caused by switch failure or congestion control capabilities, so only suspicious devices are marked in step S31.

[0089] Next, in step S32, the transmission delays (transmission delay as a receiver and transmission delay as a sender) corresponding to each suspicious device are queried, and the proportion of abnormal transmission delays is calculated. If most of the transmission delays corresponding to a suspicious device exceed a first preset delay (the proportion exceeds the first preset proportion), it indicates that the device is likely to be abnormal. This first preset proportion can be set relatively high, such as 80%, to improve the accuracy of the judgment.
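The screening of steps S31 and S32 can be sketched as follows. This is a minimal sketch under stated assumptions: the `delays` mapping keyed by (sender, receiver), the function name, and the default proportion of 0.8 (taken from the 80% example above) are illustrative choices, not the disclosed implementation.

```python
from collections import defaultdict


def find_suspicious_devices(delays, first_preset_delay, first_preset_proportion=0.8):
    """Return devices whose share of abnormal delays exceeds the preset proportion.

    delays maps (sender, receiver) -> measured transmission delay. Each
    measurement counts toward both the sender and the receiver.
    """
    total = defaultdict(int)
    abnormal = defaultdict(int)
    for (sender, receiver), delay in delays.items():
        for device in (sender, receiver):
            total[device] += 1
            if delay > first_preset_delay:  # step S31: delay above the threshold
                abnormal[device] += 1
    # Step S32: keep devices whose abnormal share is above the preset proportion.
    return {d for d in total if abnormal[d] / total[d] > first_preset_proportion}
```

With three devices where every measurement involving device "c" is abnormal, only "c" is returned, because the other two devices each have just half of their measurements above the threshold.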

[0090] If the transmission delay for a suspicious device is mostly within the first preset delay range, other causes can be investigated further.

[0091] In step S33, during the process of determining the switch fault, the devices connected to the switch are first summarized, and the proportion of abnormal devices is determined. If a switch is connected to more than the second preset proportion of abnormal devices, it means that the abnormality of these abnormal devices is most likely caused by the switch, and the switch can be marked as a suspicious switch.
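Step S33 can be illustrated in the same way. The sketch below assumes a hypothetical `switch_ports` mapping from each switch to the devices connected to it; the name and data layout are assumptions made for illustration.

```python
def find_suspicious_switches(switch_ports, suspicious_devices, second_preset_proportion):
    """Return switches whose share of suspicious devices exceeds the preset proportion.

    switch_ports maps a switch identifier to the list of devices attached to it.
    """
    suspicious = set()
    for switch, devices in switch_ports.items():
        if not devices:
            continue  # nothing attached, so nothing to judge
        share = sum(1 for d in devices if d in suspicious_devices) / len(devices)
        if share > second_preset_proportion:
            suspicious.add(switch)
    return suspicious
```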

[0092] Next, in step S34, further screening is conducted based on suspicious switches and suspicious devices.

[0093] In some embodiments, the method for determining faulty devices and switches based on suspicious devices and suspicious switches includes, for example, temporarily interrupting the connection of a suspicious device to other devices, using a backup link (if available) or directly connecting it to a known normal switch port, restarting the flow test with other normal devices, and observing whether its transmission latency returns to normal. If it returns to normal, it can be basically determined that the device is a faulty device, and the cause of its failure is most likely due to its own hardware or software configuration problems, such as network card failure, driver abnormality, etc.

[0094] For a suspected switch, select some devices connected to its ports and transfer their connections to another normally functioning switch (provided a spare switch is available). Perform another flow test and compare the transmission latency changes before and after the transfer. If the latency significantly improves after the transfer, approaching or reaching the expected value under normal network conditions, then the switch is likely faulty. The fault may lie in the switch's internal switching matrix, port modules, or congestion control modules.

[0095] Furthermore, clustering analysis techniques can be introduced to assist in the judgment. Relevant parameters of all devices and switches, such as average transmission delay, standard deviation of transmission delay, traffic rate, and number of connected devices, are used as feature vectors, and clustering algorithms (such as K-Means) are applied for clustering. Normally functioning devices and switches will typically cluster in one or a few clusters with similar characteristics, while suspicious devices and switches, due to their abnormal parameter behavior, may be assigned to other clusters. By observing the distribution within and between clusters, faulty devices and switches can be further identified. For example, if other devices in the cluster of a suspicious device also exhibit similar problems such as high latency and low traffic rate (high transmission time), and this cluster is clearly distinguishable from normal clusters, then the likelihood of that device being faulty is higher. Similarly, for a suspicious switch, if most of the devices connected to it are located in the problematic cluster, this also provides strong evidence for judging its fault.
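As a rough illustration of the clustering idea, the following sketch runs a plain K-Means (Lloyd's algorithm) over two-dimensional feature vectors of the form (average transmission delay, standard deviation of transmission delay). The choice of features, the seeding with the first k points, the fixed iteration count, and the absence of feature scaling are all simplifying assumptions; a production analysis would more likely use an established library.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Devices that end up in a small cluster far from the majority cluster are candidates for closer inspection; the clustering alone does not prove a fault.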

[0096] Throughout the troubleshooting process, various test data can be automatically recorded and output (e.g., to a txt document or to the screen), including the detailed results of each flow test and the configuration changes of devices and switches. These records support subsequent review and analysis, continuous optimization of fault diagnosis methods, and improved efficiency and accuracy of network operation and maintenance. They also help ensure that the network under test operates stably and efficiently in various complex application scenarios, meeting the stringent network performance requirements of intelligent computing centers, storage centers, and the like.

[0097] After a faulty device or switch is identified, it can be removed from the network under test so that the next test proceeds with only the remaining M devices and switches that are confirmed to be functioning properly. Alternatively, testing can wait for a period of time to see whether the faulty device or switch can be repaired, and the repaired device or switch can then rejoin the network under test.

[0098] Figure 4 is a flowchart of updating the number of devices after step S3 in an exemplary embodiment of this disclosure.

[0099] Referring to Figure 4, the process of updating the number of devices may include:

[0100] Step S301: In response to a message indicating that a switch has accessed the network under test, determine the identifier of the switch;

[0101] Step S302: If the identifier of the switch corresponds to the identifier of a faulty switch, control the M fault-free devices in the network under test and the devices connected to the faulty switch to send first traffic data to each other in order to obtain multiple transmission parameters.

[0102] Step S303: If the faulty switch is determined to have recovered to normal based on multiple transmission parameters, the faulty switch is marked as a fault-free switch, the devices connected to the faulty switch are marked as fault-free devices, and the number M of fault-free devices is updated.

[0103] In the embodiment shown in Figure 4, if a faulty switch is repaired and rejoins the network under test, its identifier can be used to determine that it is a previously faulty switch that has been repaired. A detection program is then initiated, controlling the devices connected to that switch to perform pairwise tests with the other normal devices in the network under test, as in step S2, to determine whether the transmission latency of the devices connected to the switch has returned to normal. For example, if the transmission latencies of these devices are all lower than the first preset latency, it is determined that the devices and the switch have returned to normal. Then, the marked status of the switch and its connected devices is updated, and the number M of fault-free devices is updated.
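The bookkeeping in steps S301 to S303 can be sketched as follows. The sets `faulty_switches` and `fault_free`, the `switch_ports` mapping, and the function name are hypothetical; the retest delays are assumed to have been collected already by pairwise tests as in step S2.

```python
def readmit_switch(switch_id, faulty_switches, switch_ports, fault_free,
                   retest_delays, first_preset_delay):
    """Re-mark a repaired switch and its devices as fault-free if the retest passes.

    Returns True when the switch was re-admitted; the sets are updated in place.
    """
    if switch_id not in faulty_switches:
        return False  # not a previously faulty switch, nothing to re-check
    if not retest_delays:
        return False  # no measurements, so recovery cannot be confirmed
    if any(delay >= first_preset_delay for delay in retest_delays):
        return False  # at least one delay is still not below the threshold
    faulty_switches.discard(switch_id)
    fault_free.update(switch_ports[switch_id])  # the size of fault_free is the new M
    return True
```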

[0104] Figure 5 is a flowchart of updating the number of devices after step S3 in an exemplary embodiment of this disclosure.

[0105] Referring to Figure 5, the process of updating the number of devices may also include:

[0106] Step S304: In response to a message indicating that a device has accessed the network under test, determine the identifier of the device;

[0107] Step S305: If the identifier of the device corresponds to the identifier of a faulty device, control the M fault-free devices in the network under test and the faulty device to send first traffic data to each other in order to obtain the transmission parameters corresponding to the faulty device.

[0108] Step S306: If the faulty device is determined to have recovered to normal based on the transmission parameters corresponding to the faulty device, the faulty device is marked as a fault-free device, and the number M of fault-free devices is updated.

[0109] In the embodiment shown in Figure 5, if a faulty device is repaired and rejoins the network under test, a transmission parameter test can be initiated based on its identifier to determine whether the device has been repaired. If it is determined that the device has been repaired (e.g., none of its corresponding transmission delays exceeds the first preset delay), the status of the device and the number M of fault-free devices are updated.
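Steps S304 to S306 can be sketched in a similar way. The helper names and the use of Python sets are assumptions for illustration; `retest_pairs` only lists which (sender, receiver) measurements would be taken, and the actual traffic generation is outside the scope of the sketch.

```python
def retest_pairs(fault_free, returning):
    """List both directions between each fault-free device and the returning device."""
    pairs = []
    for device in sorted(fault_free):
        pairs.append((device, returning))
        pairs.append((returning, device))
    return pairs


def readmit_device(returning, faulty, fault_free, retest_delays, first_preset_delay):
    """Re-mark a repaired device as fault-free if no retest delay exceeds the threshold.

    Returns the updated number M of fault-free devices.
    """
    if returning in faulty and retest_delays and all(
        delay <= first_preset_delay for delay in retest_delays
    ):
        faulty.discard(returning)
        fault_free.add(returning)
    return len(fault_free)
```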

[0110] By identifying faulty devices and switches based on transmission parameters and updating device records after repair, abnormal test results caused by device or switch failures can be eliminated in a simple test environment. Subsequent tests can then be conducted in a normal device environment, improving test efficiency and accuracy.

[0111] Since the embodiments of this disclosure need to address the problem of inaccessibility to the switch, the inventors, through analysis and testing, determined to use small flow testing to measure the system's congestion control capability.

[0112] That is, in step S4, among the M devices that remain after the faulty devices are excluded from the N devices, M-2 devices are controlled to send the second traffic data to the k-th device, and one device is controlled to send both the second traffic data and the third traffic data to the k-th device, so as to obtain the k-th small flow delay corresponding to the third traffic data. The data amount of the third traffic data is less than the first preset data amount, where 2≤M≤N and 1≤k≤M.

[0113] The first preset data amount can be relatively small so that the third traffic data is limited to a "small flow".

[0114] Small flows (data streams with small data packets and short durations) refer to data streams with relatively small data volumes and specific traffic characteristics in network data transmission. When a large number of large flows are transmitted in the network, they may occupy the buffer space and bandwidth resources of network devices, causing small flows to face queuing or delayed processing during transmission. Therefore, the latency of a small flow when passing through the switch queue mainly consists of the waiting time in the queue and the transmission time. When the length of the switch queue increases, the waiting time of the small flow after entering the queue will be longer, thus increasing the latency of the small flow. Therefore, the length of the switch queue can be indirectly reflected by monitoring the latency of small flows. In step S4, in the scenario of actively causing congestion (controlling multiple devices to send flows to one device), obtaining the latency data of the small flows superimposed on the flow can reflect the length of the current switch queue in the network under test when the network is congested, thereby evaluating the congestion control capability of the algorithm. At the same time, it avoids accessing the switch, greatly simplifying the deployment difficulty.

[0115] Figure 6 is a schematic diagram of a small-flow latency test performed in an exemplary embodiment of this disclosure.

[0116] Referring to Figure 6, assume that there are two switches 211 and 212 in the network under test 200, and each switch connects three devices, for a total of six devices 221 to 226. All six devices and the two switches are fault-free, and M = 6.

[0117] In step S4, during the small flow latency test, device 221 is first set as the receiving device, and devices 222-226 are set as the sending devices to transmit the second traffic data A2, thereby creating a network congestion environment. Simultaneously, any one of devices 222-226 is set to superimpose the third traffic data A3 (i.e., a small flow) while transmitting the second traffic data A2, so that the transmission latency of the third traffic data, i.e., the small flow latency, can be measured. This latency reflects the switch queue length of the network under test 200 in a congested scenario. Since small flows are not bandwidth-limited and will not further affect bandwidth or exacerbate congestion, the speed at which the switch processes small flows is mainly affected by its queue length, and the switch queue length can in turn reflect the congestion control capability of the current congestion control algorithm.

[0118] After the small flow delay corresponding to device 221 as the receiving device is measured, device 222 is set as the receiving device, and devices 221 and 223-226 are set as sending devices to send the second traffic data. At the same time, any one of devices 221 and 223-226 is set to superimpose the third traffic data, i.e., the small flow, while sending the second traffic data, thereby measuring the small flow delay corresponding to device 222 as the receiving device.

[0119] Similarly, a total of six small flow delays are measured. Since the six devices are distributed across the two switches, these six tests and six small flow delays can reflect the queue lengths of the two switches from multiple directions, providing a comprehensive test of the switch queue lengths.
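The rotation of the receiving device described above can be sketched as a simple schedule. The function name and the choice of the first sender as the device that carries the small flow are assumptions; the text allows any one of the senders to carry it.

```python
def small_flow_schedule(devices):
    """Build one test round per receiver.

    Each round is (receiver, background_senders, overlay_sender): all M-1
    other devices send the second traffic data, and one of them additionally
    sends the third traffic data (the small flow).
    """
    if len(devices) < 2:
        raise ValueError("at least two fault-free devices are required")
    rounds = []
    for receiver in devices:
        senders = [d for d in devices if d != receiver]
        overlay_sender = senders[0]  # arbitrary choice; any sender would do
        rounds.append((receiver, senders, overlay_sender))
    return rounds
```

For the six devices 221 to 226 of Figure 6, this yields six rounds, each with five senders, and therefore six small flow delays.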

[0120] After the test is completed, the test result data, such as transmission delay, transmission duration, and small flow delay, can be automatically output to the screen or a file, and can also be automatically processed to provide an evaluation result of the congestion control capability of the current congestion control algorithm.

[0121] In step S5, based on the N(N-1) transmission parameters and the M small flow delays corresponding to the third traffic data, it is determined whether the congestion control capability of the network under test is qualified.

[0122] In an exemplary embodiment, if there is a transmission delay greater than the second preset delay among the N(N-1) transmission delays, the congestion control capability of the network under test is determined to be unqualified.

[0123] In the above embodiments, a first preset delay is set to filter out devices and switches with problems in the sending and receiving mechanisms. This first preset delay can be calculated or measured and corresponds to scenarios such as device failure and switch failure.

[0124] In this embodiment, a second preset delay is set to identify congestion control failures. This second preset delay can be set more leniently than the first preset delay. The first preset delay primarily focuses on situations where devices and switches experience abnormally high transmission delays due to hardware or software malfunctions during normal operation, aiming to accurately pinpoint potentially faulty individual devices or switch ports. The second preset delay, however, considers the overall network congestion control, taking into account the allowable load fluctuations and reasonable latency range during normal network operation. When the transmission delay exceeds the second preset delay, it indicates a severe network congestion state where the congestion control mechanism has failed to effectively intervene and adjust, resulting in a significant deterioration in overall network performance and an inability to meet basic quality of service requirements.

[0125] Determining the second preset latency requires considering multiple factors. Firstly, the design specifications and expected service level of the network under test must be taken into account. For example, if the network under test is intended to support a real-time financial trading system that is extremely sensitive to latency, then the second preset latency must be set very low to ensure that network congestion will not substantially affect the rapid execution of transactions under any circumstances. Conversely, if it is a network for ordinary enterprise offices, which has a relatively high tolerance for latency, then the second preset latency can be appropriately relaxed. Secondly, historical network performance data needs to be analyzed. By monitoring the changes in transmission latency under different load conditions over a long period, statistical analysis methods can be used to determine a reasonable threshold. This ensures that, under normal business peak periods and other conditions, as long as the congestion control mechanism operates normally, the transmission latency will not easily exceed this threshold.

[0126] Using a second preset delay to quickly screen for unqualified congestion control capability can improve the efficiency of problem localization: if the congestion control capability is determined to be unqualified based on the transmission delay, no further processing or testing is required.

[0127] Of course, even if no transmission delay exceeds the second preset delay, this does not mean that the congestion control capability is necessarily free of problems. Further judgment is needed by combining other transmission parameters.

[0128] In an exemplary embodiment, if at least one of the N(N-1) transmission durations has a transmission duration longer than a first preset duration, the congestion control capability of the network under test is determined to be unqualified.

[0129] As analyzed above, when the data amount of the first traffic data is controlled during point-to-point testing, network congestion will not occur, and the transmission durations of the devices for the first traffic data should be relatively similar. If a measured transmission duration exceeds the first preset duration after equipment failure has been ruled out, it indicates a problem with the congestion control mechanism, confirming that the congestion control capability is inadequate and that the current congestion control algorithm needs to be modified.

[0130] Furthermore, in this embodiment of the disclosure, the congestion control capability is primarily determined based on the small flow delays under a congestion scenario. If at least one of the M small flow delays is greater than a third preset delay, the congestion control capability of the network under test is determined to be unqualified.

[0131] As analyzed above, the latency of a small flow (which can be understood as a data stream with small packets and short duration) passing through the switch queue mainly consists of the waiting time in the queue and the transmission time. When the switch queue length increases, the waiting time for a small flow after entering the queue becomes longer, thus increasing the small flow latency. Therefore, the switch queue length can be indirectly reflected by monitoring the small flow latency. The small flow latency reflects the switch queue length; if the latency of any small flow exceeds a third preset latency, it indicates that the switch queue length in that small flow transmission direction is too long, and the congestion control capability is inadequate. The third preset latency can be set based on factors such as the data volume of the third flow, the number of devices connected to the system, and the performance of the devices and switches, etc., and will not be elaborated upon here.
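The three checks described for step S5 can be combined as in the following sketch. The function and parameter names are assumptions, and the disclosure presents these checks as exemplary embodiments, so treating them as a single conjunctive test is one possible reading rather than the definitive logic.

```python
def congestion_control_qualified(transmission_delays, transmission_durations,
                                 small_flow_delays, second_preset_delay,
                                 first_preset_duration, third_preset_delay):
    """Return False as soon as any of the three exemplary checks fails."""
    # Point-to-point delay check against the second preset delay.
    if any(d > second_preset_delay for d in transmission_delays):
        return False
    # Point-to-point duration check against the first preset duration.
    if any(t > first_preset_duration for t in transmission_durations):
        return False
    # Small flow delay check under congestion against the third preset delay.
    if any(s > third_preset_delay for s in small_flow_delays):
        return False
    return True
```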

[0132] Figure 7 is a schematic diagram of the test process in an exemplary embodiment of this disclosure.

[0133] Referring to Figure 7, in step S71, the configuration information of the test node, i.e., the device of the network under test, is first obtained.

[0134] In step S72, a point-to-point bandwidth test is performed to determine the transmission duration of a fixed amount of traffic data sent by the device. The point-to-point test requires two loops: an outer loop and an inner loop. The outer loop records the number of loop iterations *i* and the server *i* (i.e., the i-th device) required to start the test. The inner loop records the number of loop iterations *j*, the server *j* (i.e., the j-th device) required to start the test, and outputs the test results. For example, in the outer loop, a device *i* is selected, the corresponding server is started, and the inner loop corresponding to device *i* is entered. In the inner loop, N-1 other devices are selected and communicate with device *i*. Each device communicates with device *i* twice, recording and outputting the corresponding transmission parameters for each communication. After all N-1 devices have completed communication, the current inner loop ends, the outer loop is returned to, device *i+1* is selected, and the inner loop corresponding to device *i+1* is entered, achieving 2*(N-1) communications. This process continues.
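The outer and inner loops can be sketched as an enumeration of ordered (sender, receiver) pairs. This is a simplification under an assumption: each direction is listed exactly once so that the total matches the N(N-1) transmission parameters stated for the method, and the function name is hypothetical.

```python
def point_to_point_pairs(n):
    """Enumerate ordered (i, j) pairs with i != j, using 1-based device indices."""
    pairs = []
    for i in range(1, n + 1):        # outer loop: select device i
        for j in range(1, n + 1):    # inner loop: select each other device j
            if i != j:
                pairs.append((i, j))  # device i sends first traffic data to device j
    return pairs
```

For N = 4 this gives 12 pairs, and both (1, 2) and (2, 1) appear, so each pair of devices is tested in both directions.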

[0135] In step S73, a point-to-point latency test is performed to measure the latency of transmitting the first traffic data between any two devices. The point-to-point process also enters the outer and inner loops described above, which will not be repeated here.

[0136] In step S74, a small-flow latency test is performed under congestion scenarios. Specifically, this includes:

[0137] Step S741: Initiate incast background traffic, which means controlling M-1 devices to simultaneously send second traffic data to one device. Incast traffic is a special traffic pattern that typically occurs in scenarios such as data center networks. When multiple senders simultaneously send traffic to a single receiver, the queue status of the switch is easily affected.

[0138] Step S742: Start the small flow overlay, that is, control any one of the M-1 devices to superimpose the third traffic data, which is a small flow, on the second traffic data it is sending.

[0139] Step S743: Record the small flow latency corresponding to each node / device.

[0140] In this embodiment, the node bandwidth can be obtained from the output of perftest.

[0141] During testing, if testing is conducted on an RDMA network, the specific RDMA operation (send, read, write, etc.), the number of QPs (Queue Pairs), and the traffic direction used in each test are all configurable by the operations and maintenance personnel. In the RDMA architecture, a QP is the basic unit of communication, consisting of a send queue and a receive queue, used to achieve efficient data transmission between different nodes (such as servers and other devices). Each QP corresponds to a specific source and destination, and can be viewed as a "channel" specifically opened for data interaction between two nodes.

[0142] The above process can be implemented through virtual modules such as automated testing modules, test result acquisition modules, and result output modules, and this disclosure does not impose any special restrictions on it.

[0143] When automatically generating test reports, the test reports can include RDMA communication types such as write, send, and read, the amount of traffic data, the bandwidth test results (sending duration) of each node (device), the fairness (distribution) of transmission latency and bandwidth test results, the status of bottleneck switch queues, and other information. They can also include information such as the average transmission latency, 99% tail latency, and the type of background traffic (second traffic data) corresponding to small flow latency.
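The average and 99% tail latency fields of such a report can be computed as in the sketch below. The nearest-rank definition of the 99th percentile and the dictionary layout are assumptions; the disclosure does not specify how the tail latency is calculated.

```python
import statistics


def summarize_latencies(samples):
    """Return the average and the nearest-rank 99th percentile of latency samples."""
    if not samples:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(samples)
    # Nearest-rank method: rank = ceil(0.99 * n), computed with integer arithmetic.
    rank = -(-99 * len(ordered) // 100)
    return {"average": statistics.mean(ordered), "p99": ordered[rank - 1]}
```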

[0144] In summary, the embodiments of this disclosure can implement a congestion control assessment scheme in a physical environment without requiring switch login information. Existing schemes require configuring switch login information, logging into the switch, and collecting data via the SNMP protocol; since end-side network card maintenance personnel typically have difficulty obtaining switch login permissions, such schemes are difficult to deploy in practice. By indirectly perceiving the queuing status of the switch through the small-flow latency in congestion scenarios, pure end-side deployment of congestion control capability testing can be achieved, making the scheme highly feasible. Furthermore, the automated assessment of point-to-point basic capabilities in the network under test makes it easy to quickly identify and locate problem nodes. At the same time, it enables a comprehensive assessment of the congestion control capabilities of congestion control algorithms for different communication primitives (traffic types) and different message lengths (data amounts). Finally, maintenance personnel only need to perform simple configuration of the relevant end-side information to obtain a complete test report, which can greatly improve the efficiency and accuracy of assessment and verification.

[0145] Therefore, the embodiments of this disclosure can efficiently and accurately evaluate the congestion control capability of the congestion control algorithm currently used by the network under test through simple settings, testing, and judgment, without logging into the switch, relying solely on the terminal device.

[0146] It should be noted that although several modules or units for the device used to perform actions have been mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of this disclosure, the features and functions of two or more modules or units described above can be embodied in one module or unit. Conversely, the features and functions of one module or unit described above can be further divided and embodied by multiple modules or units.

[0147] In an exemplary embodiment of this disclosure, an electronic device capable of implementing the above-described method is also provided.

[0148] Those skilled in the art will understand that various aspects of this disclosure can be implemented as a system, method, or program product. Therefore, various aspects of this disclosure can be specifically implemented in the following forms: a completely hardware implementation, a completely software implementation (including firmware, microcode, etc.), or a combination of hardware and software aspects, collectively referred to herein as a "circuit," "module," or "system."

[0149] The electronic device 800 according to this embodiment of the present disclosure will now be described with reference to Figure 8. The electronic device 800 shown in Figure 8 is merely an example and should not be construed as limiting the functionality or scope of the embodiments of the present disclosure. The electronic device 800 may be any one of the devices 22 in the network under test 200, or it may be a dedicated test device separate from each device 22. The electronic device 800 is used to perform the methods described in the above embodiments.

[0150] As shown in Figure 8, the electronic device 800 is presented in the form of a general-purpose computing device. The components of the electronic device 800 may include, but are not limited to: at least one processor 810, at least one memory 820, and a bus 830 connecting different system components (including memory 820 and processor 810).

[0151] The memory 820 stores program code that can be executed by the processor 810, causing the processor 810 to perform the steps described in the "Exemplary Methods" section above, according to various exemplary embodiments of this disclosure. For example, the processor 810 can perform the methods shown in the embodiments of this disclosure.

[0152] The memory 820 may include a readable medium in the form of volatile memory, such as random access memory (RAM) 8201 and / or cache memory 8202, and may further include read-only memory (ROM) 8203.

[0153] The memory 820 may also include a program / utility 8204 having a set (at least one) of program modules 8205, including but not limited to: an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.

[0154] Bus 830 can represent one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures.

[0155] Electronic device 800 can also communicate with one or more external devices 900 (e.g., keyboard, pointing device, Bluetooth device, etc.), and with one or more devices that enable a user to interact with electronic device 800, and / or with any device that enables electronic device 800 to communicate with one or more other computing devices (e.g., router, modem, etc.). This communication can be performed via input / output (I / O) interface 850. Furthermore, electronic device 800 can also communicate with one or more networks (e.g., local area network (LAN), wide area network (WAN), and / or public networks, such as the Internet) via network adapter 860. As shown, network adapter 860 communicates with other modules of electronic device 800 via bus 830. It should be understood that, although not shown in the figures, other hardware and / or software modules can be used in conjunction with electronic device 800, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

[0156] From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein can be implemented by software or by combining software with necessary hardware. Therefore, the technical solutions according to the embodiments of this disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, external hard drive, etc.) or on a network, including several instructions to cause a computing device (such as a personal computer, server, terminal device, or network device, etc.) to execute the methods according to the embodiments of this disclosure.

[0157] In exemplary embodiments of this disclosure, a computer-readable storage medium is also provided, on which a program product capable of implementing the methods described above is stored. In some possible implementations, various aspects of this disclosure may also be implemented as a program product including program code that, when the program product is run on a terminal device, causes the terminal device to perform the steps of the various exemplary embodiments of this disclosure described in the "Exemplary Methods" section above.

[0158] The program product for implementing the above-described method according to embodiments of the present disclosure may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto. In this document, the readable storage medium may be any tangible medium containing or storing a program that may be used by or in conjunction with an instruction execution system, apparatus, or device.

[0159] The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of readable storage media (a non-exhaustive list) include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.

[0160] Computer-readable signal media may include data signals propagated in baseband or as part of a carrier wave, carrying readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium, capable of sending, propagating, or transmitting programs for use by or in conjunction with an instruction execution system, apparatus, or device.

[0161] The program code contained on the readable medium may be transmitted using any suitable medium, including but not limited to wireless, wired, optical fiber, RF, etc., or any suitable combination thereof.

[0162] Program code for performing the operations of this disclosure can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as C or similar languages. The program code can execute entirely on the user's computing device, partially on the user's computing device, as a standalone software package, partially on the user's computing device and partially on a remote computing device, or entirely on a remote computing device or server. In cases involving remote computing devices, the remote computing device can be connected to the user's computing device via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (e.g., via the Internet using an Internet service provider).

[0163] Furthermore, the above figures are merely illustrative of the processes included in the method according to exemplary embodiments of this disclosure and are not intended to be limiting. It is readily understood that the processes shown in the above figures do not indicate or limit the temporal order of these processes. Additionally, it is readily understood that these processes may be executed synchronously or asynchronously, for example, in multiple modules.

[0164] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and concept of this disclosure are indicated by the claims.

Industrial applicability

[0165] Embodiments of this disclosure first control the devices in the network under test to perform point-to-point tests, which rules out device and switch faults, assesses the basic network status, and gives a preliminary indication of obvious congestion control deficiencies. Such deficiencies can be identified quickly without accessing the switch, which improves the efficiency of congestion control capability testing. In addition, by performing latency tests on the fault-free devices under many-to-one congestion scenarios, it can be accurately determined whether the congestion control capability of the network under test is qualified. The queue processing capacity of the switch is thereby measured, and the congestion control capability accurately tested, without accessing the switch.
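The two test phases described above can be sketched as a pair of planning helpers. This is only an illustration under stated assumptions: the function names, the device labels, and the `"second"` / `"third"` traffic tags are hypothetical, and actually generating RDMA traffic and measuring latency is outside the sketch.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def point_to_point_pairs(devices: List[str]) -> List[Tuple[str, str]]:
    """All ordered (sender, receiver) pairs with sender != receiver.

    For N devices this yields the N(N-1) point-to-point tests.
    """
    return list(permutations(devices, 2))


def many_to_one_round(
    healthy: List[str], receiver: str, probe_sender: str
) -> Dict[str, List[str]]:
    """Plan one many-to-one round toward `receiver` among M healthy devices.

    M-2 devices send only the second (background) traffic; `probe_sender`
    sends the second traffic plus the third (small) traffic whose delay is
    the quantity of interest.
    """
    if receiver not in healthy:
        raise ValueError("receiver must be one of the healthy devices")
    senders = [d for d in healthy if d != receiver]
    if probe_sender not in senders:
        raise ValueError("probe_sender must be a healthy device other than the receiver")
    return {
        d: (["second", "third"] if d == probe_sender else ["second"])
        for d in senders
    }
```

For example, with four fault-free devices `["a", "b", "c", "d"]`, `point_to_point_pairs` plans 12 tests, and `many_to_one_round(devices, "d", "a")` has `b` and `c` send background traffic while `a` sends background traffic plus the small probe flow.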

Claims

1. A congestion control capability testing method, characterized by comprising: identifying N devices within a network under test, wherein the N devices are connected to at least one switch; controlling an i-th device to send first traffic data to a j-th device to obtain an ij-th transmission parameter, and controlling the j-th device to send the first traffic data to the i-th device to obtain a ji-th transmission parameter, thereby obtaining N(N-1) transmission parameters, where 1≤i≤N, 1≤j≤N, and i≠j; determining, based on the N(N-1) transmission parameters, whether the N devices and the switches connected to the N devices are faulty, marking each faulty switch as a faulty switch and each faulty device as a faulty device, and marking the devices connected to a faulty switch as faulty devices; among the M devices that remain after the faulty devices are excluded from the N devices, controlling M-2 devices to send second traffic data to a k-th device and controlling one device to send the second traffic data and third traffic data to the k-th device, to obtain a k-th small flow delay corresponding to the third traffic data, wherein the data amount of the third traffic data is less than a first preset data amount, 2≤M≤N, and 1≤k≤M; and determining, based on the N(N-1) transmission parameters and the M small flow delays corresponding to the third traffic data, whether the congestion control capability of the network under test is qualified.

2. The congestion control capability testing method of claim 1, wherein the transmission parameters include a transmission delay, and determining, based on the N(N-1) transmission parameters, whether the N devices and the switches connected to the N devices are faulty includes: identifying, among the N(N-1) transmission delays, the two or more devices corresponding to each transmission delay that is greater than a first preset delay; if, for one of the two or more devices, the proportion of its transmission delays that are greater than the first preset delay exceeds a first preset proportion, marking that device as a suspicious device; if the proportion of suspicious devices among the devices connected to a switch is greater than a second preset proportion, marking that switch as a suspicious switch; and determining, based on the suspicious device and the suspicious switch, whether the suspicious device or the suspicious switch is faulty.

3. The congestion control capability testing method of claim 2, wherein determining, based on the N(N-1) transmission parameters and the M small flow delays corresponding to the third traffic data, whether the congestion control capability of the network under test is qualified includes: if any of the N(N-1) transmission delays is greater than a second preset delay, determining that the congestion control capability of the network under test is unqualified.

4. The congestion control capability testing method of claim 1, wherein the transmission parameters include a transmission duration for completing the transmission of the first traffic data, the data amount of the first traffic data is greater than a second preset data amount, and the second preset data amount is greater than the first preset data amount; and determining, based on the N(N-1) transmission parameters and the M small flow delays corresponding to the third traffic data, whether the congestion control capability of the network under test is qualified includes: if at least one of the N(N-1) transmission durations is longer than a first preset duration, determining that the congestion control capability of the network under test is unqualified.

5. The congestion control capability testing method of claim 1, wherein determining, based on the N(N-1) transmission parameters and the M small flow delays corresponding to the third traffic data, whether the congestion control capability of the network under test is qualified includes: if at least one of the M small flow delays is greater than a third preset delay, determining that the congestion control capability of the network under test is unqualified.

6. The congestion control capability testing method of claim 1, wherein, after marking each faulty switch as a faulty switch, marking each faulty device as a faulty device, and marking the devices connected to a faulty switch as faulty devices, the method further includes: in response to a message indicating that a switch has accessed the network under test, determining the identifier of the switch; if the identifier of the switch corresponds to the identifier of the faulty switch, controlling the M fault-free devices in the network under test and the devices connected to the faulty switch to send the first traffic data to each other, to obtain multiple transmission parameters; and if it is determined, based on the multiple transmission parameters, that the faulty switch has returned to normal, marking the faulty switch as a fault-free switch, marking the devices connected to the faulty switch as fault-free devices, and updating the number M of fault-free devices.

7. The congestion control capability testing method of claim 1 or 6, wherein, after marking each faulty switch as a faulty switch, marking each faulty device as a faulty device, and marking the devices connected to a faulty switch as faulty devices, the method further includes: in response to a message indicating that a device has accessed the network under test, determining the identifier of the device; if the identifier of the device corresponds to the identifier of the faulty device, controlling the M fault-free devices in the network under test and the faulty device to send the first traffic data to each other, to obtain the transmission parameters corresponding to the faulty device; and if it is determined, based on the transmission parameters corresponding to the faulty device, that the faulty device has returned to normal, marking the faulty device as a fault-free device and updating the number M of fault-free devices.

8. An electronic device, comprising: a memory; and a processor coupled to the memory, the processor being configured to perform, based on instructions stored in the memory, the method as described in any one of claims 1-7.

9. A computer-readable storage medium having a program stored thereon that, when executed by a processor, implements the method as described in any one of claims 1-7.

10. A computer program product comprising a computer program, characterized in that, when executed by a processor, the computer program implements the steps of the method according to any one of claims 1-7.
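The suspicious-device and suspicious-switch marking recited in claim 2 can be illustrated with a short sketch. It rests on one reading of that claim, namely that a device is suspicious when the share of its measured delays (as sender or receiver) above the first preset delay exceeds the first preset proportion; this reading, the function names, and the parameter names `delay_limit`, `device_ratio`, and `switch_ratio` are assumptions rather than part of the disclosure.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def suspicious_devices(
    delays: Dict[Tuple[str, str], float],
    delay_limit: float,
    device_ratio: float,
) -> Set[str]:
    """Devices whose share of over-limit delays exceeds `device_ratio`.

    `delays` maps (sender, receiver) to the measured transmission delay;
    each measurement counts toward both endpoints (an assumption).
    """
    total: Dict[str, int] = defaultdict(int)
    slow: Dict[str, int] = defaultdict(int)
    for (src, dst), delay in delays.items():
        for dev in (src, dst):
            total[dev] += 1
            if delay > delay_limit:
                slow[dev] += 1
    return {dev for dev in total if slow[dev] / total[dev] > device_ratio}


def suspicious_switches(
    attached: Dict[str, List[str]],
    suspects: Set[str],
    switch_ratio: float,
) -> Set[str]:
    """Switches whose share of suspicious attached devices exceeds `switch_ratio`."""
    flagged: Set[str] = set()
    for switch, devs in attached.items():
        if not devs:
            continue  # no attached devices, nothing to judge
        share = sum(1 for d in devs if d in suspects) / len(devs)
        if share > switch_ratio:
            flagged.add(switch)
    return flagged
```

The sketch stops at the "suspicious" stage; the final step of claim 2, deciding whether a suspicious device or switch is actually faulty, is not specified in the claim and is left out.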