Wafer level chip, board card and electronic device
By employing a multi-layer interconnect structure of ring and star networks in wafer-level chips, the problems of interconnect topology limitations and poor robustness of wafer-level chips are solved, achieving a balance between high bandwidth and low latency, and improving robustness in the event of node failure.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING TSINGMICRO INTELLIGENT TECH CO LTD
- Filing Date
- 2026-02-14
- Publication Date
- 2026-06-19
AI Technical Summary
Wafer-level chips suffer from interconnect topology limitations and poor robustness under high-density interconnection, especially when computing nodes or switching nodes fail, affecting the normal operation of the entire chip.
A multi-layer network architecture is adopted. Within each computation block, the computing nodes are interconnected through a ring network and connected to the block's switching node through a star network. The switching nodes of the computation blocks are interconnected through a higher-bandwidth second network. Routing paths are dynamically selected to cope with node failures.
It achieves a balance between high bandwidth and low latency, and improves the chip's robustness in the event of node failure, ensuring that other computing blocks are not affected.

Figure CN122240553A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of chip design technology, more particularly to wafer-level chip architecture design, and specifically to a wafer-level chip, a board, and an electronic device.
Background Technology
[0002] A wafer-scale chip (WSE) is a type of chip that is designed, manufactured, and packaged as a single, giant chip, directly on an entire wafer (or most of it). It integrates all functional units, such as computing cores, memory, and communication networks, that would otherwise require multiple independent chips interconnected externally.
[0003] Wafer-level chips offer advantages such as low latency and high bandwidth. However, their high integration density limits interconnect resources and thus restricts the choice of interconnect topology. Furthermore, the failure of an individual compute node can disrupt the operation of the entire chip, so robust design is crucial to ensure proper functioning and mitigate the risk of failures. Specifically, existing technologies typically use mesh networks to interconnect wafer-level chips. Although mesh networks scale easily and provide high aggregate bandwidth, they suffer from limited global communication bandwidth and long worst-case paths, and the failure of a compute or switching node significantly reduces the bandwidth available to adjacent nodes.
[0004] This section is intended to provide background or context for the embodiments of the invention set forth in the claims. The description herein is not an admission that it is prior art simply because it is included in this section.
Summary of the Invention
[0005] This invention provides a wafer-level chip that balances bandwidth and latency while maintaining high bandwidth and low latency within the chip, and that improves robustness.
[0006] The wafer-level chip includes: multiple computing blocks, each computing block including a switching node and multiple computing nodes distributed around the switching node; the multiple computing nodes in the computing block and the switching node are interconnected through a first network, the first network including a ring network and a star network, the multiple computing nodes in the computing block are interconnected through the ring network, and the switching node in the computing block is interconnected with the multiple computing nodes through the star network; the switching nodes of the multiple computing blocks are interconnected through a second network; wherein, the bandwidth of the first network is less than the bandwidth of the second network.
[0007] In some embodiments, data transmission within a compute block dynamically selects a routing path based on the compute node status and load within the compute block, wherein the compute node status is used to indicate whether a compute node has failed.
[0008] In some embodiments, the bandwidth of the second network is n times the bandwidth of the first network, where n is not greater than the number of computing nodes in the computing block.
[0009] In some embodiments, when a single computing node, or several adjacent computing nodes, in the computing block fail, the other computing nodes communicate through the ring network or the star network.
[0010] In some embodiments, when multiple non-adjacent computing nodes in the computing block fail, the other computing nodes communicate through the star network.
[0011] In some embodiments, multiple computing nodes of different computing blocks communicate through the first network and the second network; the number of computing nodes differs between different computing blocks.
[0012] In some embodiments, the second network is a ring network.
[0013] In some embodiments, the second network is a mesh network.
[0014] This invention also provides a board, including the wafer-level chip described above.
[0015] This invention also provides an electronic device including the aforementioned board.
[0016] As described above, the wafer-level chip provided in this embodiment of the invention includes: multiple computing blocks, each computing block including a switching node and multiple computing nodes distributed around the switching node; the multiple computing nodes in the computing block and the switching node are interconnected through a first network, the first network including a ring network and a star network, the multiple computing nodes in the computing block are interconnected through the ring network, and the switching node in the computing block is interconnected with the multiple computing nodes through the star network; the switching nodes of the multiple computing blocks are interconnected through a second network; wherein, the bandwidth of the first network is less than the bandwidth of the second network.
[0017] The wafer-level chip provided by this invention balances bandwidth and latency while providing high bandwidth and low latency, and significantly improves robustness when a node fails.
Attached Figure Description
[0018] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from these drawings without creative effort. In the drawings:
Figure 1 is a schematic diagram of the structure of a wafer-level chip in an embodiment of the present invention;
Figure 2 is a schematic diagram of the structure of a computation block in an embodiment of the present invention;
Figure 3 is a schematic diagram of the ring network structure in an embodiment of the present invention;
Figure 4 is a schematic diagram of the star network structure in an embodiment of the present invention;
Figure 5 is a first schematic diagram of the robustness analysis of a wafer-level chip in an embodiment of the present invention;
Figure 6 is a second schematic diagram of the robustness analysis of a wafer-level chip in an embodiment of the present invention;
Figure 7 is a schematic diagram of the communication route between computing nodes of different computing blocks in an embodiment of the present invention;
Figure 8 is a third schematic diagram of the robustness analysis of a wafer-level chip in an embodiment of the present invention.
Detailed Implementation
[0019] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments of the present invention will be further described in detail below with reference to the accompanying drawings. Here, the illustrative embodiments and their descriptions are used to explain the present invention, but are not intended to limit the present invention. It should be noted that, unless otherwise specified, the embodiments and features in the embodiments of this application can be arbitrarily combined with each other. The acquisition, storage, use, and processing of data in the technical solutions of this application all comply with relevant laws and regulations. The user information in the embodiments of this application is obtained through legal and compliant means, and the acquisition, storage, use, and processing of user information have been authorized and agreed upon by the customer.
[0020] To facilitate understanding of the technical solution provided in this application, the relevant content of the technical solution in this application will be explained below.
[0021] In existing technologies, wafer-level chip architectures face the following core contradiction: the number of computing cores is growing exponentially (from hundreds of thousands to millions), but data supply capacity (bandwidth) and energy efficiency (data-transfer power consumption) have become the biggest bottlenecks. The more powerful the computing units become, the "hungrier" they are for data; yet in the traditional von Neumann architecture, data must travel long distances between memory and computing units, resulting in extremely high energy consumption.
[0022] In addition, when a computing node or switching node in an existing wafer-level chip fails, the bandwidth of adjacent nodes is significantly affected; that is, robustness is poor.
[0023] To address the aforementioned technical problems, this invention provides a wafer-level chip. As shown in Figure 1, the wafer-level chip includes multiple computation blocks 3 (dashed boxes in the figure). Each computation block 3 includes a switching node 2 (blue box in the figure) and multiple computation nodes 1 (white boxes in the figure) distributed around the switching node 2. The multiple computation nodes 1 in the computation block 3 and the switching node 2 are interconnected through a first network, which includes a ring network (see Figure 3) and a star network (see Figure 4): the multiple computation nodes 1 of the computation block 3 are interconnected through the ring network, and the switching node 2 in the computation block 3 is interconnected with the multiple computation nodes 1 through the star network. The switching nodes 2 of the multiple computation blocks 3 are interconnected through a second network, wherein the bandwidth of the first network is less than the bandwidth of the second network.
[0024] As described above, the wafer-level chip provided in this embodiment of the invention includes: multiple computing blocks, each computing block including a switching node and multiple computing nodes distributed around the switching node; the multiple computing nodes in the computing block and the switching node are interconnected through a first network, the first network including a ring network and a star network, the multiple computing nodes in the computing block are interconnected through the ring network, and the switching node in the computing block is interconnected with the multiple computing nodes through the star network; the switching nodes of the multiple computing blocks are interconnected through a second network; wherein, the bandwidth of the first network is less than the bandwidth of the second network.
[0025] The wafer-level chip provided by this invention balances bandwidth and latency while providing high bandwidth and low latency, and significantly improves robustness when a node fails.
[0026] In some embodiments, the nodes of the wafer-level chip are divided into two types: compute nodes and switching nodes.
[0027] Compute nodes are the computation cores of a coarse-grained reconfigurable architecture (CGRA), responsible for executing computational tasks. Unlike fine-grained FPGAs (which operate on a per-logic-gate basis), CGRA computation cores are "coarse-grained," typically consisting of word-length (e.g., 16- or 32-bit) arithmetic units (ALUs, multiply-accumulate units) or small processing units. They are connected through a highly reconfigurable on-chip interconnect network, which can quickly reconstruct the optimal hardware data flow path at runtime according to different algorithms.
[0028] Switching node: A specially designed data switching core responsible for data switching.
[0029] The aforementioned computing nodes can be CPU cores (such as Arm cores), dedicated accelerator cores (such as AI tensor cores, graphics cores, cryptography cores, etc.), or a combination of both. Their core characteristics are shown in Table 1.
[0030] Table 1
[0031] In some embodiments, data transmission within a compute block dynamically selects a routing path based on the compute node status and load within the compute block, wherein the compute node status is used to indicate whether a compute node has failed.
[0032] Specifically, the routing path can be selected dynamically by software, or each node can determine the next hop itself; this application does not limit the method used.
[0033] In some embodiments, the wafer-level chip includes at least two computing blocks.
[0034] It should be noted that different computing blocks do not share computing nodes; that is, a computing node belongs to only one computing block and cannot belong to two or more computing blocks at the same time.
[0035] In some embodiments, as shown in Figure 1, adjacent computing nodes 1 are directly interconnected to form a ring-shaped computing-node network (the ring network mentioned above).
[0036] In some embodiments, to balance bandwidth and latency across the wafer-level chip, this application configures the second network between the switching nodes 2 as a high-speed transmission network, and configures the first network within each computing block 3 (formed among the computing nodes 1 and between the computing nodes 1 and the switching node 2) as a lower-speed network.
[0037] It should be noted that, as shown in Figure 2, a computation block 3 contains only one switching node 2 and multiple computation nodes 1. Different computation blocks 3 may contain different numbers of computation nodes 1 or the same number.
[0038] In some embodiments, the bandwidth of the second network is n times the bandwidth of the first network, where n is not greater than the number of computing nodes in the computing block. For example, as shown in Figure 1, if computing block 3 contains 8 computing nodes, the bandwidth of the second network is 1 to 8 times the bandwidth of the first network (n need not be an integer).
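The block membership rules in [0034] and [0037] (exactly one switching node per block, a variable number of compute nodes, and no compute node shared between blocks) can be captured in a small data structure. The following Python is a minimal sketch under assumed names: `ComputeBlock`, `build_membership`, and `affected_blocks` are hypothetical, and string node IDs are an illustrative choice. Because each compute node has exactly one owner, a set of failed compute nodes maps back to exactly the blocks it affects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ComputeBlock:
    """A computation block: exactly one switching node plus its compute nodes."""
    block_id: int
    switch_id: str
    compute_ids: List[str] = field(default_factory=list)


def build_membership(blocks: List[ComputeBlock]) -> Dict[str, int]:
    """Map every compute node to its owning block.

    Raises ValueError if a block is empty or a compute node appears in two
    blocks (the document states that blocks never share compute nodes).
    """
    owner: Dict[str, int] = {}
    for block in blocks:
        if not block.compute_ids:
            raise ValueError(f"block {block.block_id} has no compute nodes")
        for node in block.compute_ids:
            if node in owner:
                raise ValueError(
                    f"{node} is in blocks {owner[node]} and {block.block_id}"
                )
            owner[node] = block.block_id
    return owner


def affected_blocks(owner: Dict[str, int], failed: Set[str]) -> Set[int]:
    """Return the ids of the blocks that contain at least one failed compute node."""
    return {owner[node] for node in failed}


# Hypothetical example: blocks of different sizes are allowed.
blocks = [
    ComputeBlock(0, "S0", [f"c0_{i}" for i in range(8)]),
    ComputeBlock(1, "S1", [f"c1_{i}" for i in range(6)]),
]
owner = build_membership(blocks)
print(affected_blocks(owner, {"c0_2", "c0_5"}))  # {0}
```

Keeping a single `switch_id` field, rather than a list, encodes the one-switching-node-per-block rule directly in the type.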
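The bandwidth relation in [0038] reduces to simple arithmetic. The sketch below uses a hypothetical helper, `second_network_bandwidth`, and assumes each network can be summarized by a single bandwidth figure in arbitrary units. It enforces 1 < n ≤ (compute nodes per block): the strict lower bound is an assumption drawn from claim 1 (the first network must be slower than the second), whereas [0038] and [0053] describe the range loosely as starting at 1.

```python
def second_network_bandwidth(first_bw: float, n: float, nodes_per_block: int) -> float:
    """Return the second-network bandwidth as n times the first-network bandwidth.

    Assumed constraint: 1 < n <= nodes_per_block. The strict lower bound follows
    claim 1 (first network slower than second); n may be fractional.
    """
    if nodes_per_block < 1:
        raise ValueError("a block needs at least one compute node")
    if not 1 < n <= nodes_per_block:
        raise ValueError(f"n={n} must satisfy 1 < n <= {nodes_per_block}")
    return n * first_bw


# Hypothetical figures: 8 compute nodes per block, first network at 100 units.
print(second_network_bandwidth(100.0, 4, 8))    # 400.0
print(second_network_bandwidth(100.0, 2.5, 8))  # 250.0
```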
[0039] In some embodiments, as shown in Figure 2, the multiple computing nodes 1 of the computing block 3 are connected in a ring.
[0040] The connection relationship between the switching node 2 of the computing block 3 and multiple computing nodes 1 is a star connection, that is, the switching node 2 is directly interconnected with all the computing nodes 1 around it, forming a star-shaped switching node network.
[0041] Specifically, as shown in Figure 3, within a single computation block, all computation nodes 1 surrounding the switching node 2 are connected in a ring, forming a ring network.
[0042] Next, as shown in Figure 4, within a single computation block, the switching node 2 and all surrounding computation nodes 1 are connected in a star topology, with the switching node 2 at its center, forming a star network.
[0043] In some embodiments, computing nodes within a computing block are routed along the shortest path of the ring network.
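On an n-node ring, the shortest-path rule in [0043] comes down to comparing the clockwise and counter-clockwise hop counts. The sketch below assumes the compute nodes are numbered 0 to n-1 around the ring and breaks ties clockwise; both choices, and the function name `ring_shortest_path`, are illustrative rather than taken from the document.

```python
from typing import List


def ring_shortest_path(n: int, src: int, dst: int) -> List[int]:
    """Shortest route between compute nodes src and dst on an n-node ring.

    Nodes are numbered 0..n-1 around the ring. When both directions have the
    same length, the clockwise (+1) direction is chosen.
    """
    if not (0 <= src < n and 0 <= dst < n):
        raise ValueError("node index out of range")
    cw = (dst - src) % n   # hops going clockwise (+1)
    ccw = (src - dst) % n  # hops going counter-clockwise (-1)
    step, hops = (1, cw) if cw <= ccw else (-1, ccw)
    return [(src + step * k) % n for k in range(hops + 1)]


print(ring_shortest_path(8, 1, 6))  # [1, 0, 7, 6]
```

For the 8-node blocks of Figure 1, the worst case is 4 hops (opposite nodes), which bounds intra-block ring latency when no node has failed.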
[0044] In some embodiments, when a single computing node, or several adjacent computing nodes, in the computing block fail, the other computing nodes communicate through the ring network or the star network.
[0045] In some embodiments, when multiple non-adjacent computing nodes in the computing block fail, the other computing nodes communicate through the star network.
[0046] It can be understood that when a compute node 1 fails, only the compute block 3 containing that compute node 1 is affected, and the other compute blocks 3 are not. Furthermore, as shown in Figure 5, when a compute node 1 (the dashed circle in the figure) fails, the star network inside its compute block 3, together with the intact portion of the ring network, can compensate for the broken part of the ring network.
[0047] Specifically, as shown in Figure 6, when multiple non-adjacent computing nodes 1 fail, the computing nodes 1 located between the failed computing nodes 1 can only communicate through the star network.
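The failure cases in [0044] to [0047] can be combined into a single routing rule. The sketch below is an assumption-laden illustration, not the document's algorithm: it prefers the shortest ring arc that contains no failed node and falls back to the star path through the switching node (labelled with the hypothetical constant `SWITCH`) when both arcs are broken. The `ring_congested` flag is a hypothetical stand-in for the load information mentioned in [0031].

```python
from typing import AbstractSet, List, Optional, Union

SWITCH = "S"  # hypothetical label for the block's switching node 2
Hop = Union[int, str]


def ring_arc(n: int, src: int, dst: int, step: int) -> List[int]:
    """Nodes visited walking the ring from src to dst in direction step (+1 or -1)."""
    path = [src]
    node = src
    while node != dst:
        node = (node + step) % n
        path.append(node)
    return path


def route_in_block(
    n: int,
    src: int,
    dst: int,
    failed: AbstractSet[int],
    ring_congested: bool = False,
) -> Optional[List[Hop]]:
    """Choose a route between two compute nodes of one n-node block.

    Prefers the shortest ring arc that contains no failed node; otherwise (or
    when the ring is flagged as congested) uses the star path via the switch.
    Returns None if either endpoint has failed.
    """
    if src in failed or dst in failed:
        return None
    if src == dst:
        return [src]
    if not ring_congested:
        arcs = [ring_arc(n, src, dst, step) for step in (1, -1)]
        intact = [arc for arc in arcs if not any(node in failed for node in arc)]
        if intact:
            return min(intact, key=len)
    return [src, SWITCH, dst]


print(route_in_block(8, 0, 2, {1}))     # [0, 7, 6, 5, 4, 3, 2]
print(route_in_block(8, 2, 6, {0, 4}))  # [2, 'S', 6]
print(route_in_block(8, 2, 3, {0, 4}))  # [2, 3]
```

With nodes 0 and 4 failed in an 8-node block, nodes 2 and 6 sit in different ring segments, so the only route is through the star, as in [0047]; nodes 2 and 3 remain in the same segment and still use the ring. A real design might also compare the 2-hop star path against a long surviving ring arc, which this sketch deliberately does not do.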
[0048] In some embodiments, multiple computing nodes of different computing blocks communicate through the first network and the second network.
[0049] Specifically, when computing nodes in different computation blocks need to exchange data, the data first travels through the star network to the local switching node. The switching nodes then forward it along the shortest path in the second network (that is, the path through the fewest switching nodes), and the destination switching node delivers it through its own star network to the destination computing node. For example, as shown in Figure 7, when two computation nodes 1 (red boxes) in the two upper computation blocks 3 communicate, the route between them is shown by the green line in the figure.
[0050] In some embodiments, different computation blocks have the same number of computation nodes. For example, as shown in Figure 1, each computation block 3 contains 8 computation nodes 1.
[0051] In some embodiments, as shown in Figure 1, the second network is a ring network; that is, the multiple switching nodes 2 are connected in a ring in the second network (red lines).
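The inter-block route in [0049] has three stages: a star hop up to the local switching node, a fewest-switching-node path across the second network, and a star hop down. Breadth-first search finds a fewest-hop path in an unweighted graph, so this sketch uses it for the middle stage. The adjacency dictionary, the `owner_switch` mapping, and the function names are hypothetical; the example graph is a four-switch ring as in [0051], but any adjacency (for example, a mesh) could be passed in.

```python
from collections import deque
from typing import Dict, List, Optional


def switch_path(graph: Dict[str, List[str]], start: str, goal: str) -> List[str]:
    """Breadth-first search: a path through the fewest switching nodes."""
    prev: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if goal not in prev:
        raise ValueError(f"no path from {start} to {goal}")
    path: List[str] = []
    cur: Optional[str] = goal
    while cur is not None:
        path.append(cur)
        cur = prev[cur]
    return path[::-1]


def inter_block_route(
    graph: Dict[str, List[str]], owner_switch: Dict[str, str], src: str, dst: str
) -> List[str]:
    """Star hop up, fewest-switch path across the second network, star hop down."""
    return [src] + switch_path(graph, owner_switch[src], owner_switch[dst]) + [dst]


# Hypothetical second network: four switching nodes in a ring.
ring = {"S0": ["S1", "S3"], "S1": ["S0", "S2"], "S2": ["S1", "S3"], "S3": ["S2", "S0"]}
owner_switch = {"a": "S0", "b": "S2", "c": "S1"}  # compute node -> its block's switch
print(inter_block_route(ring, owner_switch, "a", "b"))  # ['a', 'S0', 'S1', 'S2', 'b']
```

Traffic between nodes of the same block would normally use the intra-block ring or star instead; this function only models the cross-block case.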
[0052] In some embodiments, as shown in Figure 8, in its initial state the wafer-level chip uses three computation blocks 3 (excluding the computation block 3 in the lower right corner of the figure), and one or more spare computation blocks are preset. The computation block 3 in the lower right corner of Figure 8 (marked by the dashed circle) is such a spare block; through logic configuration, it can be introduced into the second network to improve the robustness of the wafer-level chip.
[0053] In some embodiments, the second network is a mesh network. The first network consists of a ring network and a star network whose bandwidth is lower than that of the mesh network, and the bandwidth of the mesh network connecting the switching nodes 2 is n times the bandwidth of the first network, where n ranges from 1 to the number of computing nodes, so as to balance bandwidth and latency across the wafer-level chip.
[0054] A mesh network is a network topology in which nodes are interconnected in a regular grid. In the field of wafer-level chips, it specifically refers to a regular, scalable physical routing scheme. Put simply, it can be pictured as a "chessboard city road network" for integrated circuits: each computing unit (intersection) is directly connected by roads to its east, west, north, and south neighbors, and data packets can follow the shortest or optimal path along the grid to reach their destination.
[0055] Mesh networks compare favorably with other common interconnect topologies in high-performance computing because they strike a good balance among regularity, bandwidth, and latency. A comparison with other common interconnect topologies is shown in Table 2.
Table 2
[0056] In a wafer-level chip scenario, mesh networks offer the following technical advantages:
1. Extreme regularity and manufacturability: the checkerboard structure of a mesh network maps directly onto the physical layout of the silicon. The design of each compute node and router can be replicated exactly, greatly simplifying the physical design, verification, and manufacturing of ultra-large-scale chips. This regularity is what makes it feasible to integrate hundreds of thousands of cores.
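The document does not specify how packets are routed on the mesh. Dimension-order (XY) routing is one common deterministic choice for 2-D meshes and is used here purely as an illustration: a packet first moves along the x axis to the destination column, then along the y axis. Representing nodes as `(x, y)` tuples and the name `xy_route` are assumptions of this sketch.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Dimension-order (XY) route on a 2-D mesh: move along x first, then along y."""
    (x, y), (tx, ty) = src, dst
    path = [(x, y)]
    while x != tx:
        x += 1 if tx > x else -1
        path.append((x, y))
    while y != ty:
        y += 1 if ty > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

The route length always equals the Manhattan distance, so it is minimal, and because a packet never turns from the y direction back to x, XY routing is deadlock-free on a mesh. Its drawback is that it cannot steer around a failed or congested node.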
[0057] 2. High bandwidth and scalability: Bandwidth aggregation: The total bandwidth of the network increases linearly with the number of nodes. Because the communication channels are distributed, there is no centralized bottleneck.
[0058] Scalability: within the limits of the manufacturing process, the number of rows and columns can in principle be increased continually to expand the scale, from dozens of cores to hundreds of thousands of cores.
[0059] 3. High fault tolerance and flexible routing: Multipath redundancy: There are usually multiple paths between two points. When a link or node fails, the data can be automatically redirected.
[0060] Dynamic adaptive routing: Routers can dynamically select idle paths based on real-time network congestion to achieve load balancing and avoid local "traffic jams".
[0061] 4. Flexible mapping of computing tasks: Any computing task that requires close communication can be mapped to a cluster of nodes that are physically adjacent, minimizing communication overhead.
[0062] 5. Near-linear performance scaling: as long as the algorithm has sufficient parallelism, adding computing cores can yield near-linear performance improvements because communication bottlenecks are minimized.
[0063] As described above, the wafer-level chip provided in this embodiment of the invention uses a two-layer network architecture (the first network and the second network) to balance bandwidth and latency, and separates compute nodes from switching nodes to improve robustness. Therefore, compared with existing wafer-level chip designs, this invention has the following beneficial effects in terms of robustness:
1. When a compute node fails, it affects only the compute block in which it is located and does not affect other compute blocks.
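The multipath redundancy and dynamic adaptive routing described in [0059] and [0060] can be illustrated with a minimal adaptive router: at each hop it considers only moves that bring the packet closer to its destination and picks the least-loaded one, skipping failed nodes. This is a sketch of a generic technique, not a design from the document; the `load` map and the name `adaptive_route` are hypothetical, and the sketch omits the deadlock avoidance (such as turn restrictions or virtual channels) that a real on-chip router needs.

```python
from typing import AbstractSet, Dict, List, Tuple

Coord = Tuple[int, int]


def adaptive_route(
    src: Coord,
    dst: Coord,
    load: Dict[Coord, float],
    failed: AbstractSet[Coord] = frozenset(),
) -> List[Coord]:
    """Minimal adaptive route on a 2-D mesh.

    At each hop, only moves that reduce the Manhattan distance to dst are
    considered; failed nodes are skipped and the least-loaded candidate wins
    (ties go to the x-direction move). Raises RuntimeError if no such move exists.
    """
    x, y = src
    tx, ty = dst
    path = [src]
    while (x, y) != (tx, ty):
        candidates = []
        if x != tx:
            candidates.append((x + (1 if tx > x else -1), y))
        if y != ty:
            candidates.append((x, y + (1 if ty > y else -1)))
        candidates = [c for c in candidates if c not in failed]
        if not candidates:
            raise RuntimeError(f"no minimal move available at {(x, y)}")
        x, y = min(candidates, key=lambda c: load.get(c, 0.0))
        path.append((x, y))
    return path


load = {(1, 0): 5.0, (0, 1): 1.0}  # hypothetical per-node congestion
print(adaptive_route((0, 0), (1, 1), load))                # [(0, 0), (0, 1), (1, 1)]
print(adaptive_route((0, 0), (1, 1), {}, failed={(0, 1)}))  # [(0, 0), (1, 0), (1, 1)]
```

Because it only takes minimal moves, this router raises an error when failures block every productive direction instead of detouring; production designs typically add a non-minimal fallback for that case.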
[0064] 2. When a switching node fails, it only affects the computation block where the switching node is located, and does not affect other computation blocks.
[0065] 3. When computing nodes fail, star networks can supplement ring networks.
[0066] 4. The switching nodes consist mainly of logic and are much smaller than the computing nodes, so their failure rate is low.
[0067] As described above, the wafer-level chip provided in the embodiments of the present invention includes multiple computing blocks, each computing block including a switching node and multiple computing nodes distributed around the switching node; the multiple computing nodes in the computing block and the switching node are interconnected through a first network, the first network including a ring network and a star network, the multiple computing nodes in the computing block are interconnected through the ring network, and the switching node in the computing block is interconnected with the multiple computing nodes through the star network; the switching nodes of the multiple computing blocks are interconnected through a second network; wherein the bandwidth of the first network is less than the bandwidth of the second network.
[0069] The wafer-level chip provided by this invention balances bandwidth and latency while providing high bandwidth and low latency, and significantly improves robustness when a node fails.
[0070] This application is described with reference to flowchart illustrations and / or block diagrams of methods and apparatus (systems) according to embodiments of this application.
[0071] In the description of this specification, the references to terms such as "an embodiment," "a specific embodiment," "some embodiments," "for example," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.
[0072] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of this application. It should be understood that the above descriptions are merely specific embodiments of this application and are not intended to limit the scope of protection of this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application.
Claims
1. A wafer-level chip, characterized in that, The wafer-level chip includes multiple computing blocks, and each computing block includes a switching node and multiple computing nodes distributed around the switching node; The multiple computing nodes in the computing block and the switching node are interconnected through a first network, which includes a ring network and a star network. The multiple computing nodes in the computing block are interconnected through the ring network, and the switching node in the computing block is interconnected with the multiple computing nodes through the star network. The switching nodes of multiple computing blocks are interconnected through a second network; wherein the bandwidth of the first network is less than the bandwidth of the second network.
2. The wafer-level chip according to claim 1, characterized in that, Data transmission within a compute block dynamically selects a routing path based on the compute node status and load within the compute block, whereby the compute node status indicates whether a compute node has failed.
3. The wafer-level chip according to claim 2, characterized in that, when a single computing node, or adjacent computing nodes, in the computing block fail, the other computing nodes communicate through the ring network or the star network.
4. The wafer-level chip according to claim 3, characterized in that, when multiple non-adjacent computing nodes in the computing block fail, the other computing nodes communicate through the star network.
5. The wafer-level chip according to claim 1, characterized in that, The bandwidth of the second network is n times the bandwidth of the first network, where n is not greater than the number of computing nodes in the computing block.
6. The wafer-level chip according to claim 1, characterized in that, multiple computing nodes from different computing blocks communicate through the first network and the second network; the number of computing nodes differs between different computing blocks.
7. The wafer-level chip according to claim 1, characterized in that, The second network is a ring network.
8. The wafer-level chip according to claim 1, characterized in that, The second network is a mesh network.
9. A circuit board, characterized in that, Includes the wafer-level chip as described in any one of claims 1 to 8.
10. An electronic device, characterized in that, Includes the board as described in claim 9.