Graphics processing unit (GPU) cluster system and server

By adopting an orthogonal architecture design in the GPU cluster system, compute nodes and switching nodes are interconnected through orthogonal OD connectors, which solves the problem of limited GPU interconnection capacity under cable interconnection and realizes high-performance and high-bandwidth GPU cluster computing.

WO2026138119A1 · PCT designated stage · Publication Date: 2026-07-02 · ZTE CORP

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: ZTE CORP
Filing Date: 2025-10-27
Publication Date: 2026-07-02

AI Technical Summary

Technical Problem

In related technologies, the use of cable interconnection limits the number of GPU interconnects, resulting in lower computing scale and performance of GPU clusters.

Method used

The system adopts an orthogonal architecture design, with compute nodes and switching nodes arranged orthogonally within the rack. The interconnection between the GPU and the switching chip is achieved through orthogonal OD connectors, thereby increasing the number of interconnects and the performance of the GPU cluster.

Benefits of technology

It significantly increases the number of interconnects between GPUs and computing performance, meeting the high computing power and high bandwidth communication requirements of large-scale AI computing, while also having stronger heat dissipation and power supply capabilities.



Abstract

Embodiments of the present disclosure provide a graphics processing unit (GPU) cluster system and a server. The system comprises: a computing node and a switching node; a first board where the computing node is located and a second board where the switching node is located are orthogonally arranged in a cabinet, and each GPU of the computing node is interconnected with each switching chip of the switching node respectively by means of an orthogonal OD connector.

Description

Graphics Processing Unit (GPU) Cluster System and Server

[0001] Cross-references to related applications

[0002] This disclosure is based on and claims priority to Chinese patent application CN202411929333.2, filed on December 25, 2024, entitled “Graphics Processing Unit GPU Cluster System and Server”, and incorporates the entire contents of that patent application by reference.

Technical Field

[0003] This disclosure relates to the field of intelligent computing, and more specifically, to a graphics processing unit (GPU) cluster system and server.

Background Technology

[0004] With the rapid growth of large model parameters and training data, there is a demand for large-scale computing power expansion in intelligent computing systems. By fully considering the computing, network, and storage requirements of distributed training of large models, high-performance, scalable GPU (Graphics Processing Unit) clusters can be built to meet the computing power requirements. A GPU cluster is an intelligent computing system composed of multiple GPUs. By working together, the computing capabilities of large models for training and inference can be significantly improved.

[0005] In related technologies, GPU clusters are built using a modular approach, which can achieve large-scale computing power expansion. Within the GPU cluster, interconnect bandwidth is improved through scale-up interconnection between GPUs. However, the interconnect bandwidth achieved by this solution is relatively limited, and it can only interconnect GPUs with a scale of 4-16 cards.

[0006] A cable tray solution is therefore adopted, in which the GPUs of the computing node are connected to the switching chips of the switching node through cables. However, cable interconnection has the following drawbacks: (1) the consistency of the cable interconnection is low and the cost is high; (2) the number of GPU interconnections is limited, resulting in a lower computing scale and performance of the intelligent computing system.

[0007] In summary, no effective solution has yet been proposed in the relevant technologies.

Summary of the Invention

[0008] This disclosure provides a graphics processing unit (GPU) cluster system and server to at least solve the problem in the related art where the number of interconnected GPUs is limited by the interconnection method via cables, resulting in low computing scale and performance of the GPU cluster.

[0009] According to one embodiment of this disclosure, a graphics processing unit (GPU) cluster system is provided, including: compute nodes and switching nodes; a first board containing the compute nodes and a second board containing the switching nodes are orthogonally arranged in a rack, and each GPU of the compute nodes is interconnected with each switching chip of the switching nodes through orthogonal OD connectors.

Attached Figure Description

[0010] Figure 1 is a schematic diagram of a GPU cluster system according to an embodiment of the present disclosure;

[0011] Figure 2 is a block diagram of the physical structure of the cabinet according to an embodiment of the present disclosure;

[0012] Figure 3 is a network architecture diagram of component interconnection in a computing node according to an embodiment of the present disclosure;

[0013] Figure 4 is a network architecture diagram of component interconnection in a switching node according to an embodiment of the present disclosure;

[0014] Figure 5 is a network architecture diagram of component interconnection in a chassis management unit according to an embodiment of the present disclosure;

[0015] Figure 6 is a block diagram of the physical structure of the components in the power supply unit according to an embodiment of the present disclosure;

[0016] Figure 7 is a block diagram of the physical structure of the components in the liquid cooling unit according to an embodiment of the present disclosure.

Detailed Implementation

[0017] The embodiments of this disclosure will be described in detail below with reference to the accompanying drawings and examples.

[0018] It should be noted that the terms "first," "second," etc., in the specification, claims, and drawings of this disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

[0019] Currently, with the rapid development of large-scale model technology, higher demands are being placed on the computing performance of hardware. AI servers provide larger-scale computing power through the parallel computing advantages of GPUs. The interconnection between multiple GPUs can significantly improve the computing performance of hardware. However, GPUs consume a lot of power, which also places higher demands on the power supply and heat dissipation of hardware.

[0020] In view of the above problems, this disclosure proposes a GPU cluster system based on an orthogonal architecture, involving server equipment in a whole rack. It expands the interconnection of 4-16 GPUs in related technologies to 64-128 GPUs interconnected via Ethernet switching chips, improves the high bandwidth domain (HBD), builds a higher performance GPU cluster, meets the requirements of large model parameter volume, and has the function of supporting greater power consumption and stronger heat dissipation capabilities.

[0021] Figure 1 is a schematic diagram of a GPU cluster system according to an embodiment of the present disclosure. As shown in Figure 1, the GPU cluster system includes a compute node 100 and a switching node 200.

[0022] As shown in Figure 2, the entire rack includes: compute nodes, switching nodes, chassis management boards, and power supply racks.

[0023] Compute nodes are the core component of a GPU cluster system, responsible for performing the actual computational tasks, especially the training and inference of AI models. Each compute node contains one or more Central Processing Units (CPUs) and multiple GPUs. The CPU is primarily responsible for controlling and coordinating various operations within the node, while the GPUs utilize their parallel computing capabilities to accelerate large-scale data processing and computational tasks.

[0024] Switching nodes act as the network hub in a GPU cluster system, enabling high-speed data exchange between nodes by containing multiple switching chips. These switching chips handle both scale-up and scale-out network requirements. Scale-up networks primarily optimize the GPU interconnects within compute nodes, while scale-out networks horizontally extend data to other nodes in the cluster. The CPU and BMC (Baseboard Management Controller) within the switching node manage the node and monitor its status, ensuring normal network operation and maintenance.

[0025] Chassis Management Board: Responsible for the management and monitoring of the entire system, including the configuration, status monitoring, and fault diagnosis of compute nodes, switching nodes, and power supply racks. Through communication with the BMC and other management units, the chassis management board can centrally control and adjust the cluster's operating parameters, monitor hardware status such as temperature, voltage, and fan speed, and connect to the external management network, achieving unified management and maintenance of the cluster.

[0026] Power shelf: Provides power to the entire GPU cluster system. It contains one or more power modules that distribute power to compute and switching nodes via power busbars and a switching backplane. The power shelf is designed for blind insertion, meaning power modules can be directly inserted and connected without needing to be aligned to a specific position. This simplifies maintenance and upgrades, and improves system flexibility and availability.

[0027] In summary, compute nodes, switching nodes, chassis management boards, and power racks each undertake core tasks in a GPU cluster system, including computing, data exchange management, system management, and power supply. They cooperate to build a high-performance, scalable, and easily maintainable GPU cluster computing environment. Through orthogonal architecture design, they can work together efficiently, thus meeting the demands of large-scale AI computing for high computing power, high-bandwidth communication, and stable power supply.

[0028] In this embodiment, both the computing nodes and the switching nodes are modularly designed, capable of supporting different CPUs, GPUs, and switching chips, thereby enabling the evolution of orthogonal architecture.

[0029] In one embodiment, the first board containing the compute node and the second board containing the switching node are orthogonally arranged in the rack, and each GPU of the compute node is interconnected with each switching chip of the switching node through an orthogonal OD connector.

[0030] In this embodiment, the second board containing the switching node is placed vertically inside the rack, and the first board containing the computing node is placed horizontally inside the rack. It should be noted that the second board containing the switching node can instead be placed horizontally, and the first board containing the computing node vertically; this is not a limitation.

[0031] In one embodiment, the switching chips include a scale-up (vertically extended) switching chip and a scale-out (horizontally extended) switching chip; or, only scale-up (vertically extended) switching chips.

[0032] In this embodiment, there are two types of switching chips for the switching node: one is a scale-up switching chip interconnected to the GPU, and the other is a scale-out switching chip interconnected to the network card module. The switching chip can be a combination of scale-up switching chips and scale-out switching chips, or it can be a combination of the same scale-up switching chips.

[0033] In one embodiment, the computing node further includes a scale-out (horizontally expandable) network interface card (NIC) module for providing an external network port to the computing node.

[0034] In this embodiment, the computing node further includes a first central processing unit (CPU), which is connected to each GPU via a PCIe bus.

[0035] In one embodiment, the OD connector includes a plurality of first OD connectors, wherein the ports of i of the plurality of first OD connectors are connected to each GPU; the ports of j of the plurality of first OD connectors are respectively connected to the GPU and the scale-out (horizontal expansion) network interface card, wherein i is an integer greater than 0 and j is an integer greater than 0 and less than i.

[0036] In this embodiment, Figure 3 is a network architecture diagram of the interconnection of components in a computing node according to an embodiment of the present disclosure. As shown in Figure 3, for example, the computing node includes 1 CPU, 4 GPUs, 2 RDIMM*8 (Registered DIMM), a network card module, a hard disk, and 8 OD (orthogonal design) connectors.

[0037] In this embodiment, the CPU, GPU, OD connector, and network card module (in the case of scale-out network aggregation) are the core components of the computing node in this embodiment. The number of components in the computing node can be adjusted according to the actual situation.

[0038] Among them, RDIMM provides high-speed data access capabilities, and the network interface card (NIC) module is responsible for external network communication. The NIC module includes scale-out NICs, storage NICs, management NICs, and service NICs.

[0039] The CPU (i.e., the first CPU) of the compute node is interconnected with the GPU via PCIe. The scale-out network card is interconnected with the GPU via PCIe. The storage network card, management network card, service network card, and hard disk are interconnected with the CPU via PCIe. The GPU is interconnected with the switching chip of the switching node via the OD connector of the compute node (i.e., the first OD connector). The GPU is interconnected with the switching chip in an orthogonal manner using protocols such as RoCE (RDMA over Converged Ethernet) or improved Ethernet adapted for GPU interconnection.
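The link types listed in paragraph [0039] can be summarised as a small lookup table. The following Python sketch is illustrative only: the component labels and the `links_of` helper are hypothetical names chosen for this example and do not appear in the disclosure.

```python
# Illustrative link table for one compute node, following paragraph [0039].
# All component labels are hypothetical names chosen for this sketch.
COMPUTE_NODE_LINKS = [
    ("cpu", "gpu", "PCIe"),
    ("scale_out_nic", "gpu", "PCIe"),
    ("storage_nic", "cpu", "PCIe"),
    ("management_nic", "cpu", "PCIe"),
    ("service_nic", "cpu", "PCIe"),
    ("hard_disk", "cpu", "PCIe"),
    ("gpu", "first_od_connector", "Ethernet-based (e.g. RoCE)"),
]


def links_of(component: str) -> list[tuple[str, str]]:
    """Return (peer, link_type) pairs for every link touching `component`."""
    peers = []
    for a, b, link in COMPUTE_NODE_LINKS:
        if a == component:
            peers.append((b, link))
        elif b == component:
            peers.append((a, link))
    return peers


if __name__ == "__main__":
    for peer, link in links_of("gpu"):
        print(f"gpu <-> {peer}: {link}")
```

Querying `links_of("gpu")` lists the three kinds of GPU attachment described above: the CPU and the scale-out network card over PCIe, and the first OD connector toward the switching node.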

[0040] In one embodiment, GPU system interconnect includes the following two interconnection methods:

[0041] (1) The first OD connectors are of two types: one type carries the scale-up signal, and the other carries both the scale-up and scale-out signals.

[0042] In this embodiment, for example, there are 8 first OD connectors. All first OD connectors are interconnected with a scale-up network, i.e., a GPU-Scale-up switching chip. The scale-out network can be interconnected by one or more first OD connectors according to actual needs. The corresponding switching chip responsible for scale-out on the switching node can then satisfy the interconnection.

[0043] When the first OD connector transmits the scale-up signal, the signals in the GPU need to be transmitted to the first OD connector; when the first OD connector transmits the scale-out signal, the signals in the network card module need to be transmitted to the first OD connector.

[0044] In one embodiment, when i=6, the first OD connector interconnects the scale-up signal. The ports of the four GPUs are connected to the six first OD connectors in an equal manner. The six first OD connectors interconnect the scale-up switching chip of the switching node, and this part of the signal is interconnected to the switching chip responsible for scale-up. This part of the interconnection is called scale-up interconnection.

[0045] When j=2, the first OD connector interconnects the Scale up and Scale out signals. The ports of the four GPUs are connected to the two first OD connectors in an equal manner. The two first OD connectors are interconnected with the Scale up switching chip of the switching node, and this part of the signal is interconnected to the switching chip responsible for Scale up. The Scale out network can choose to interconnect two first OD connectors according to actual needs. The ports of the Scale out network cards are interconnected to the OD connector of the switching node with Scale out aggregation function (i.e., the second OD connector). The second OD connector is interconnected to the switching chip responsible for Scale out on the switching node, realizing the Scale out network aggregation.

[0046] (2) The first OD connectors are of two types: one type carries the scale-up signal, and the other carries the scale-out signal.

[0047] In this embodiment, for example, there are 8 first OD connectors, of which 6 first OD connectors are interconnected with scale-up networks, i.e. GPU-Scale-up switching chips. The scale-out network can be interconnected with 2 first OD connectors according to actual needs, and the corresponding switching chip responsible for scale-out on the switching node can meet the interconnection requirements.

[0048] In one embodiment, when i=6, the first OD connector interconnects the scale-up signal. The ports of the four GPUs are connected to the six first OD connectors in an equal manner. The six first OD connectors interconnect the scale-up switching chip of the switching node, and this part of the signal is interconnected to the switching chip responsible for scale-up. This part of the interconnection is called scale-up interconnection.

[0049] When j=2, the first OD connector interconnects the scale out signal, where the ports of the two GPUs are connected equally to the two first OD connectors. The scale out network can choose to interconnect the two first OD connectors according to actual needs. The ports of the scale out network cards are interconnected to the OD connectors of the switching nodes with scale out aggregation function (i.e., the second OD connectors). The second OD connectors are then interconnected to the switching chip on the switching node responsible for scale out, thus realizing the scale out network aggregation.
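The two interconnection methods differ only in what the j connectors carry. As a rough sketch, the Python function below labels each first OD connector by signal class. The function name and the requirement that i + j equals the connector count are assumptions of this example; the constraint 0 < j < i is taken from paragraph [0035].

```python
def assign_connector_roles(total: int, i: int, j: int, method: int) -> list[str]:
    """Label each first OD connector with the signal class it carries.

    method 1: i connectors carry scale-up only, j carry scale-up + scale-out.
    method 2: i connectors carry scale-up only, j carry scale-out only.
    """
    # Constraint stated in paragraph [0035]: i > 0 and 0 < j < i.
    if not (i > 0 and 0 < j < i):
        raise ValueError("requires i > 0 and 0 < j < i")
    # Assumption of this sketch: every connector belongs to exactly one group.
    if i + j != total:
        raise ValueError("i + j must equal the number of connectors")
    if method == 1:
        shared = "scale-up+scale-out"
    elif method == 2:
        shared = "scale-out"
    else:
        raise ValueError("method must be 1 or 2")
    return ["scale-up"] * i + [shared] * j


if __name__ == "__main__":
    # Example values from the embodiments: 8 connectors, i=6, j=2.
    print(assign_connector_roles(8, 6, 2, method=1))
    print(assign_connector_roles(8, 6, 2, method=2))
```

With the example values, method 1 leaves all 8 connectors carrying scale-up traffic (2 of them also carrying scale-out), whereas method 2 dedicates 6 connectors to scale-up and 2 to scale-out.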

[0050] In one embodiment, the OD connector further includes a plurality of second OD connectors, wherein the ports of the plurality of second OD connectors are respectively connected to each GPU and the scale-up (vertical expansion) switching chip, and the ports of the plurality of second OD connectors are respectively connected to the scale-out (horizontal expansion) network interface card and the scale-out switching chip; or, the ports of the plurality of second OD connectors are respectively connected to each GPU and the scale-up switching chip; or, the ports of the plurality of second OD connectors are respectively connected to the scale-out network interface card and the scale-out switching chip.

[0051] In one embodiment, the switching node further includes a second central processing unit (CPU) connected to each switching chip via PCIe.

[0052] In this embodiment, Figure 4 is a network architecture diagram of the interconnection of components in the switching node according to an embodiment of the present disclosure. As shown in Figure 4, the switching node includes 1 BMC, 1 CPU, 2 switching chips (SWitchA and SWitchB) and 16 OD connectors.

[0053] In this embodiment, the BMC, CPU, switching chip, and OD connector are the core components of the switching node. The number of core components in the switching node can be adjusted according to the actual situation.

[0054] The BMC and the CPU of the switching node (i.e., the second CPU) use MISC (Multiple Independent Sub-clusters) to configure, manage, and upgrade the switching node. The second CPU is connected to the SWitchA and SWitchB switching chips via PCIe.

[0055] There are two types of switching chips in the switching nodes: scale-up switching chips that interconnect to the GPU and scale-out switching chips that interconnect to the network interface card (NIC) module. The number of switching chips can be configured according to the aggregation requirements of the CPU and scale-out switching chips. The switching chips can be a combination of scale-up and scale-out switching chips, or a combination of the same scale-up switching chips.

[0056] In this embodiment, if the switching chip is a scale-up switching chip and a scale-out switching chip, the second OD connector can transmit signals in two ways: one can transmit scale-up signals, and the other can transmit both scale-up and scale-out signals. Furthermore, the second OD connector can also transmit signals in two ways: one can transmit only scale-up signals, and the other can transmit only scale-out signals. The interconnection method of the second OD connector in this embodiment is basically the same as the interconnection method of the first OD connector in the previous embodiment; therefore, the similarities will not be repeated here.

[0057] If all switching chips are scale-up switching chips, then all scale-up switching chips are connected to the ports of the second OD connector, and the ports of the second OD connector are connected to each GPU of the compute node.

[0058] In this embodiment, the number of switching nodes matches the number of OD connectors of the computing nodes, thereby realizing a fully interconnected scenario where the computing nodes are interconnected with each switching node and the switching nodes are interconnected with each computing node.
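The full-interconnection condition in paragraph [0058] can be checked with a short Python sketch. It assumes, as a simplification not stated in the disclosure, that each (compute node, switching node) pair mates through exactly one OD connector; the function names are hypothetical.

```python
from itertools import product


def build_full_mesh(num_compute: int, num_switch: int) -> set[tuple[int, int]]:
    """Return every (compute_node, switching_node) pair joined by one OD connector."""
    return set(product(range(num_compute), range(num_switch)))


def connectors_needed(num_compute: int, num_switch: int) -> tuple[int, int]:
    """OD connectors required per compute node and per switching node."""
    # One connector per peer: a compute node mates with every switching node,
    # and a switching node mates with every compute node.
    return num_switch, num_compute


if __name__ == "__main__":
    # Example figures from the embodiments: 8 OD connectors per compute node
    # ([0036]), 16 per switching node ([0052]), 4 GPUs per compute node.
    per_compute, per_switch, gpus_per_node = 8, 16, 4
    num_switch, num_compute = per_compute, per_switch
    mesh = build_full_mesh(num_compute, num_switch)
    print("mating points:", len(mesh))
    print("GPUs in rack:", num_compute * gpus_per_node)
```

Under these assumptions the example figures give 8 switching nodes, 16 compute nodes, and 64 GPUs, which falls within the 64-128 GPU range cited in paragraph [0020].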

[0059] In one embodiment, the GPU cluster system further includes a chassis management module; the chassis management module includes a baseboard management controller (BMC) and a peripheral management interface, wherein the BMC is configured to manage the GPU cluster system through the peripheral management interface.

[0060] In this embodiment, Figure 5 is a network architecture diagram of component interconnection in the chassis management unit according to an embodiment of the present disclosure. As shown in Figure 5, the chassis management unit mainly consists of a BMC and a peripheral management interface (panel interface).

[0061] The BMC (Baseboard Management Controller) is the core of the chassis management unit. It monitors and manages system status in real time by communicating with hardware components such as compute nodes, switching nodes, and power supply racks. The BMC interface can not only monitor the operating temperature, voltage, and current of components such as CPU, GPU, and memory, but also track the health status of power supplies, fans, and liquid cooling units, thereby providing timely alarms and troubleshooting when anomalies occur.

[0062] The peripheral management interface is mainly divided into two parts. One part handles liquid leak detection for the whole cabinet and management of the CDU (coolant distribution unit) and power rack. The other part is the management network port, which connects to the management network side through the external leaf switch to realize the unified management of computing nodes, switching nodes and GPU clusters.

[0063] In this embodiment, the peripheral management interface in the chassis management unit is a key component for unified monitoring and control of the entire rack-mounted GPU cluster system. It is responsible for communicating with internal and external management units to ensure the normal operation and efficient management of the cluster. The following is a detailed description of the peripheral management interface:

[0064] DDR4, Flash, and eMMC interfaces: used for local data storage and access of the BMC. DDR4 memory provides high-speed data caching and processing capabilities, while Flash and eMMC are used to store system configuration, firmware, and log information, ensuring that the management unit can save critical data and configuration information even in abnormal situations such as power outages, so as to quickly restore the operating state after a restart.

[0065] LM75, SPI, I2C, GPIO interfaces: LM75 is a temperature sensor interface used to acquire temperature information from hardware; SPI (Serial Peripheral Interface) and I2C (Inter-Integrated Circuit) are serial interfaces for low-speed peripheral communication, which can be connected to temperature sensors, fan controllers, and other monitoring devices; GPIO (General-Purpose Input / Output) interfaces provide general input / output control capabilities, which can be used to control devices such as indicator lights and buttons, and can also be used as an expansion interface to connect additional monitoring or control hardware.
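As an illustration of how a reading from an LM75-type sensor might be interpreted, the Python sketch below decodes a raw register value and compares it with a limit. The disclosure names the LM75 but does not describe register handling; the sketch assumes the classic 9-bit LM75 format, and the function names and the limit value are hypothetical.

```python
def lm75_raw_to_celsius(raw: int) -> float:
    """Convert a 16-bit LM75 temperature register value to degrees Celsius.

    Assumes the classic 9-bit format: the reading occupies bits 15..7 as a
    two's-complement number with 0.5 degC per step. Higher-resolution
    variants use more bits and need a different shift.
    """
    value = (raw >> 7) & 0x1FF  # keep the 9 data bits
    if value & 0x100:           # sign bit set: negative temperature
        value -= 0x200
    return value * 0.5


def over_temperature(raw: int, limit_c: float) -> bool:
    """True when the decoded reading exceeds a caller-supplied limit."""
    return lm75_raw_to_celsius(raw) > limit_c


if __name__ == "__main__":
    sample = 0x1900  # raw register value as it would be read over I2C
    print(lm75_raw_to_celsius(sample), over_temperature(sample, limit_c=85.0))
```

A BMC firmware routine would obtain the raw value over I2C; the bus access itself is omitted here because it depends on the platform.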

[0066] RGMII, PHY, and MDI interfaces: these are associated with network communication. RGMII (Reduced Gigabit Media Independent Interface) is an Ethernet interface that connects the BMC to the PHY, through which it reaches external network devices. The PHY (Physical Layer) interface is the physical layer network signal transceiver port, and the MDI (Media Dependent Interface) is the media-dependent connection between the PHY and the physical transmission medium.

[0067] Panel Interfaces (RS485, RS232, Pwr / Sys / Health LED): The panel interfaces are used to visually display the system status. The RS485 and RS232 interfaces can be connected to external devices, such as leak detection systems and CDUs (coolant distribution units), for monitoring and managing the liquid cooling system's operation. The Pwr / Sys / Health LED indicators directly display the power status, system health, and operating status on the front panel of the chassis, facilitating on-site operation and initial fault diagnosis.

[0068] RJ45 interface: This is an Ethernet interface used for direct physical connection between the BMC and external management networks. Through the RJ45 interface, the BMC can access a local area network (LAN) or wide area network (WAN) to communicate and exchange data with other management systems, thereby enabling remote monitoring and management operations.

[0069] Through the management interfaces described above, the chassis management unit can effectively monitor and control various components of the GPU cluster system, including but not limited to hardware status monitoring, network communication management, fault diagnosis, and system configuration adjustments. The collaborative work of these management interfaces ensures high availability, maintainability, and security of the system, which is crucial for the stable operation and efficient management of large-scale intelligent computing servers. In data center and cloud computing environments, the chassis management unit, through integration with enterprise-level monitoring systems, can achieve automated cluster operation and maintenance, significantly reducing operational costs and complexity, and improving overall computing efficiency.

[0070] In this embodiment, the leaf switch is integrated into the rack via electrical interconnection, saving optical modules and leaf switches and reducing networking costs.

[0071] In one embodiment, the GPU cluster system further includes a power supply unit for supplying power to the compute nodes and switching nodes via blind plugging.

[0072] In this embodiment, Figure 6 is a block diagram of the physical structure of the components in the power supply unit according to an embodiment of the present disclosure. As shown in Figure 6, the power supply unit mainly includes a power shelf, a busbar, and a switching power supply backplane, etc., to provide power to the computing node and the switching node in a blind-plug form.

[0073] In one embodiment, a blind-plug design under an orthogonal architecture is used, such as placing the power supply copper busbar of the power supply unit vertically and the switching power supply backplane horizontally, thereby realizing the blind-plug operation of the power supply unit and continuously providing power to the computing nodes and switching nodes. Blind plugging ensures that the power supply unit is automatically aligned and connected with the power supply when assembled in the rack, effectively connecting the hardware without additional manual operation. This is especially important in the daily operation and maintenance of data centers, enhancing the stability and reliability of the system.

[0074] In one embodiment, the GPU cluster system further includes a liquid cooling module for liquid cooling of compute nodes and switching nodes via blind insertion.

[0075] In this embodiment, Figure 7 is a block diagram of the physical structure of the components in the liquid cooling unit according to an embodiment of the present disclosure. As shown in Figure 7, the liquid cooling unit mainly consists of a compute node quick-connect fitting, a switch node quick-connect fitting, an inlet pipe, and an outlet pipe. It also includes a CDU and a manifold (distributor). The CDU provides cold water input. The manifold evenly distributes the cold water from the CDU to the compute node quick-connect fittings and switch node quick-connect fittings in the inlet pipe, and then delivers the cold water to the compute node and switch node. The hot water is then returned to the CDU through the manifold via the compute node quick-connect fittings and switch node quick-connect fittings in the outlet pipe, thereby achieving heat transfer of the entire device. The heat dissipation of high-power chips in the compute node and switch node is achieved through liquid cooling. Heat dissipation can also be achieved through fans on the compute node to achieve a cooling effect.
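The heat transfer described in paragraph [0075] follows the standard sensible-heat relation Q = m · c_p · ΔT. The Python sketch below estimates the coolant flow a CDU would need to supply and splits it evenly across manifold branches. The heat load, temperature rise, and branch count are hypothetical example inputs, not figures from the disclosure, and the coolant is assumed to be water.

```python
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0  # approximate value for liquid water


def required_flow_kg_per_s(heat_load_w: float, delta_t_k: float) -> float:
    """Coolant mass flow needed to carry `heat_load_w` at a given temperature rise."""
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    return heat_load_w / (WATER_SPECIFIC_HEAT_J_PER_KG_K * delta_t_k)


def split_evenly(total_flow: float, num_branches: int) -> list[float]:
    """Even distribution across manifold branches, as paragraph [0075] describes."""
    if num_branches <= 0:
        raise ValueError("need at least one branch")
    return [total_flow / num_branches] * num_branches


if __name__ == "__main__":
    # Hypothetical inputs: 100 kW rack heat load, 10 K coolant temperature
    # rise, 24 quick-connect branches (compute plus switching nodes).
    total = required_flow_kg_per_s(100_000.0, 10.0)
    branches = split_evenly(total, 24)
    print(f"total flow: {total:.2f} kg/s, per branch: {branches[0]:.3f} kg/s")
```

A real manifold would be balanced against per-node heat loads and pressure drop; the even split here simply mirrors the wording of the paragraph.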

[0076] In this embodiment of the disclosure, the CPU and GPU parts of the computing node can be separated, and the GPU cluster interconnection and blind insertion operation under the orthogonal architecture can be realized through the OD connector and the channel on the switching node. This method can be adapted to users who need to decouple the CPU and GPU nodes for networking.

[0077] In this embodiment of the disclosure, the GPU cluster system based on orthogonal architecture can be applied to big data centers, cloud services, enterprises, scientific research, education and high-performance computing fields, and can provide stable power supply, heat dissipation conditions and high-speed network connection.

[0078] Through the above steps, a GPU cluster system is provided. The first board containing the compute nodes and the second board containing the switching nodes are orthogonally arranged in the rack. That is, through an orthogonal interconnect architecture, each GPU of the compute node is interconnected with each switching chip of the switching node through an orthogonal OD connector, which significantly improves the GPU density and shortens the computing time. Therefore, it can solve the problem of limited GPU interconnection capacity and low GPU cluster computing scale and performance caused by cable interconnection in related technologies, thereby achieving the effect of increasing the number of GPU interconnections and computing performance.

[0079] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods according to the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this disclosure, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk), and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods described in the various embodiments of this disclosure.

[0080] It should be noted that the above modules can be implemented by software or hardware. For the latter, they can be implemented in the following ways, but are not limited to: all the above modules are located in the same processor; or, the above modules are located in different processors in any combination.

[0081] Specific examples in this embodiment can be found in the examples described in the above embodiments and exemplary implementations, and will not be repeated here.

[0082] It is obvious to those skilled in the art that the modules or steps of this disclosure described above can be implemented using general-purpose computing devices. They can be centralized on a single computing device or distributed across a network of multiple computing devices. They can be implemented using computer-executable program code, and thus can be stored in a storage device for execution by a computing device. In some cases, the steps shown or described can be performed in a different order than those presented herein, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. Thus, this disclosure is not limited to any particular combination of hardware and software.

[0083] The above description presents merely preferred embodiments of this disclosure and is not intended to limit this disclosure. Those skilled in the art can make various modifications and variations to this disclosure. Any modifications, equivalent substitutions, improvements, etc., made within the principles of this disclosure shall fall within the scope of protection of this disclosure.

Claims

1. A graphics processing unit (GPU) cluster system, comprising: a computing node and a switching node; wherein a first board containing the computing node and a second board containing the switching node are orthogonally arranged in a rack, and each GPU of the computing node is interconnected with each switching chip of the switching node through an orthogonal OD connector.

2. The GPU cluster system according to claim 1, wherein the switching chip includes a vertical expansion switching chip and a horizontal expansion switching chip; or the switching chip includes a vertical expansion switching chip.

3. The GPU cluster system according to claim 1, wherein the computing node further includes a horizontal expansion network interface card (NIC) module, which provides an external internet port for the computing node.

4. The GPU cluster system according to claim 1 or 3, wherein the OD connector includes a plurality of first OD connectors; the ports of i of the plurality of first OD connectors are connected to each GPU; and the ports of j of the plurality of first OD connectors are respectively connected to the GPU and the horizontal expansion NIC module, wherein i is an integer greater than 0 and j is an integer greater than 0 and less than i.

5. The GPU cluster system according to any one of claims 1 to 3, wherein the OD connector further includes a plurality of second OD connectors, wherein the ports of the plurality of second OD connectors are respectively connected to each GPU and the vertical expansion switching chip, and the ports of the plurality of second OD connectors are respectively connected to the horizontal expansion NIC module and the horizontal expansion switching chip; or the ports of the plurality of second OD connectors are respectively connected to each GPU and the vertical expansion switching chip; or the ports of the plurality of second OD connectors are respectively connected to the horizontal expansion NIC module and the horizontal expansion switching chip.

6. The GPU cluster system according to claim 1, wherein the computing node further includes a first central processing unit (CPU), which is connected to each GPU via a PCIe bus.

7. The GPU cluster system according to claim 1, wherein the switching node further includes a second central processing unit (CPU), which is connected to each switching chip via PCIe.

8. The GPU cluster system according to claim 1, further comprising a chassis management module, wherein the chassis management module includes a baseboard management controller (BMC) and a peripheral management interface, and the BMC is configured to manage the GPU cluster system through the peripheral management interface.

9. A server comprising the GPU cluster system of any one of claims 1 to 8.
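By way of illustration only, the numeric condition recited in claim 4 (i is an integer greater than 0, and j is an integer greater than 0 and less than i) can be checked as follows. This is a sketch under stated assumptions: the function name `valid_first_connector_split` is hypothetical, and the code checks only the integer constraint, not how the connectors are physically assigned.

```python
def valid_first_connector_split(i: int, j: int) -> bool:
    """Check the claim 4 condition: i > 0 and 0 < j < i, both integers.

    Hypothetical helper for illustration; it does not model port wiring.
    """
    if not isinstance(i, int) or not isinstance(j, int):
        return False
    return i > 0 and 0 < j < i


print(valid_first_connector_split(4, 2))  # True
print(valid_first_connector_split(1, 1))  # False: j must be less than i
```

Because j must be at least 1 and strictly less than i, the condition can only be met when i is at least 2.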