Method, apparatus and computer device for data routing of network-on-chip

CN121887709B · Active · Publication Date: 2026-06-30 · MOXIN ARTIFICIAL INTELLIGENCE TECH (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
MOXIN ARTIFICIAL INTELLIGENCE TECH (SHENZHEN) CO LTD
Filing Date
2026-03-09
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing on-chip network multicast and broadcast solutions are inadequate in terms of communication efficiency, bandwidth usage, hardware overhead, and scalability, and cannot meet high-performance requirements.

Method used

At the source node, a target address code is generated, including the reference coordinates and mask, to determine the coverage area of the data routing task. The data packets are then routed to the nodes within the coverage area via unicast operations, and the replication branch is used to ensure that the data packets reach all destination nodes.

Benefits of technology

It enables efficient multicast or broadcast communication, reduces link redundancy overhead, improves communication efficiency and scalability, and reduces software intervention costs.



Abstract

This disclosure provides a method, apparatus, and computer device for data routing in an on-chip network. The method includes: generating a target address code at a source node based on multiple destination nodes of a data routing task, wherein the target address code includes reference coordinates and a mask, and the value of each bit of the mask indicates whether that bit is a wildcard bit; determining a coverage area for the data routing task based on the reference coordinates and the mask; in response to determining, based on the mask, that the current node is not within the coverage area, performing unicast routing with the reference coordinates as the target node to route the data packets of the data routing task to the next node; and in response to determining, based on the mask, that the current node is within the coverage area, determining a replication branch for the data packets so that they are routed from at least two output ports of the current node to the at least two corresponding target neighbor nodes.

Description

Technical Field

[0001] This disclosure relates to networks on a chip, and more particularly to a method for data routing in networks on a chip.

Background Technology

[0002] With the continuous evolution of semiconductor manufacturing technology, Network on Chip (NoC) has gradually become a core solution for efficient data interaction between multiple nodes on a chip due to its modularity, scalability, and parallel communication capabilities.

[0003] In the various communication scenarios of on-chip networks, multicast and broadcast are both one-to-many data transmission modes. Broadcast communication refers to a transmission method in which the source node sends the same message to all reachable destination nodes in the on-chip network, so the transmission coverage is global. Multicast communication refers to a transmission method in which the source node sends the same message to all destination nodes in a pre-defined multicast group, completing data distribution to the nodes in the group in a strongly targeted way. Multicast and broadcast communication are widely used in core processes such as cache coherence protocol interaction and configuration information synchronization, and are fundamental supporting technologies for the collaborative work of multiple nodes on the chip.

Summary of the Invention

[0004] This disclosure provides a method and apparatus for data routing in a network-on-a-chip, as well as a computer device, computer-readable medium, and computer program product for performing the method.

[0005] According to embodiments of this disclosure, a method for data routing in an on-chip network is provided. The method includes: generating a target address code at a source node based on multiple destination nodes of a data routing task, wherein the target address code includes reference coordinates and a mask, and the value of each bit of the mask indicates whether the bit is a wildcard bit; determining a coverage area for the data routing task based on the reference coordinates and the mask; in response to determining based on the mask that a current node is not located within the coverage area, performing a unicast route using the reference coordinates as the target node to route data packets of the data routing task to the next node; and in response to determining based on the mask that the current node is located within the coverage area, determining a replication branch for routing the data packets to route the data packets of the data routing task to the corresponding target neighbor nodes from at least two output ports corresponding to at least two target neighbor nodes of the current node.

[0006] In some implementations, determining the coverage area for the data routing task based on the reference coordinates and the mask includes: obtaining the minimum value of the coverage area by performing an AND operation between the reference coordinates and the inverse mask of the mask; and obtaining the maximum value of the coverage area by performing an OR operation between the reference coordinates and the mask.

[0007] In some implementations, whether the current node is within the coverage area is determined as follows: the current node is determined to be within the coverage area in response to the result of an AND operation between the coordinates of the current node and the inverse mask of the mask being equal to the result of an AND operation between the reference coordinates and the inverse mask of the mask.

[0008] In some implementations, the reference coordinates are the coordinates of a specific destination node among the plurality of destination nodes.

[0009] In some implementations, the target neighbor node is determined based on the following steps: in response to determining that the current node is located within the coverage area based on the mask, determining whether a first candidate neighbor node in a first candidate neighbor node set is located within the coverage area, wherein the first candidate neighbor node includes the current node's neighbor node in a first direction and excludes the node preceding the current node that routes the data packet to the current node; in response to determining that at least one first candidate neighbor node is located within the coverage area, determining the at least one first candidate neighbor node as the target neighbor node; in response to determining that none of the first candidate neighbor nodes in the first candidate neighbor node set are located within the coverage area, determining whether a second candidate neighbor node in a second candidate neighbor node set is located within the coverage area, wherein the second candidate neighbor node includes the current node's neighbor node in a second direction and excludes the node preceding the current node that routes the data packet to the current node, the second direction being different from the first direction; in response to determining that at least one second candidate neighbor node is located within the coverage area, determining the at least one second candidate neighbor node as the target neighbor node.
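The selection rule of this paragraph can be sketched in Python as follows. This is a minimal illustration rather than the disclosed implementation: the names `in_coverage` and `target_neighbors`, the choice of X as the first direction and Y as the second direction, the 2-bit coordinate width, and the bounds check against a 4×4 mesh are all assumptions made for the example. Any additional rules needed to avoid revisiting nodes are outside this sketch.

```python
def in_coverage(node, ref, mask, width=2):
    """True if every non-wildcard bit of node equals the same bit of ref."""
    full = (1 << width) - 1
    return all((n & ~m & full) == (r & ~m & full)
               for n, r, m in zip(node, ref, mask))


def target_neighbors(cur, prev, ref, mask, width=2):
    """Pick target neighbors: first direction first, second direction as fallback."""
    size = 1 << width
    x, y = cur
    first = [(x - 1, y), (x + 1, y)]    # assumed first direction: X
    second = [(x, y - 1), (x, y + 1)]   # assumed second direction: Y
    for candidates in (first, second):
        hits = [n for n in candidates
                if n != prev                                   # skip the upstream node
                and 0 <= n[0] < size and 0 <= n[1] < size      # stay inside the mesh
                and in_coverage(n, ref, mask, width)]
        if hits:
            return hits
    return []


# Column x = 1 (reference (1, 3), MaskX = 00, MaskY = 11), entered from the east.
print(target_neighbors((1, 2), prev=(2, 2), ref=(1, 3), mask=(0b00, 0b11)))
# Row y = 3 (MaskX = 11, MaskY = 00), entered from below.
print(target_neighbors((1, 3), prev=(1, 2), ref=(1, 3), mask=(0b11, 0b00)))
```

In both calls two neighbors are returned, which corresponds to a replication branch with two output ports.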

[0010] In some embodiments, the method further includes: in response to determining that the current node is within the coverage area, delivering the data packet of the data routing task to the local network interface unit of the current node; the local network interface unit determining whether the current node belongs to the plurality of destination nodes; in response to determining that the current node does not belong to the plurality of destination nodes, discarding the data packet; and in response to determining that the current node belongs to the plurality of destination nodes, using the processing unit of the current node to obtain the payload of the data packet.
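The local delivery check can be illustrated with a short Python sketch. The function name `deliver_local` and the representation of the destination nodes as a set of (x, y) tuples are assumptions for illustration only; the source does not specify how the destination information is encoded.

```python
def deliver_local(node, dests, payload):
    """Model of the local network interface unit at a node inside the coverage area.

    Returns the payload if the node is a real destination, otherwise None
    (the packet is discarded locally).
    """
    if node in dests:
        return payload      # handed to the node's processing unit
    return None             # covered by the area, but not a destination


# Coverage area is the column x = 1; only (1, 0) and (1, 3) are destinations.
dests = {(1, 0), (1, 3)}
for node in [(1, 0), (1, 1), (1, 2), (1, 3)]:
    print(node, deliver_local(node, dests, b"data"))
```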

[0011] In some implementations, the packet header includes a type field that indicates whether the data routing task is a unicast or multicast task.

[0012] In some implementations, the data packet includes information indicating the plurality of destination nodes.

[0013] In some implementations, the data packets corresponding to each copy branch logically point to the same data payload storage area.
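A possible in-memory model of such a packet is sketched below in Python. The class name `Packet`, its field names, and the `replicate` helper are hypothetical; the sketch only shows a header carrying a type field and destination information, and replicated copies that reference one payload object instead of duplicating it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    kind: str            # type field: "unicast" or "multicast"
    ref: tuple           # reference coordinates (X, Y)
    mask: tuple          # (MaskX, MaskY)
    dests: frozenset     # information indicating the destination nodes
    payload: bytearray   # payload storage area, shared by all copies
    out_port: str = ""   # output port chosen for this copy


def replicate(pkt, ports):
    """Create one header copy per output port; the payload object is not duplicated."""
    return [replace(pkt, out_port=p) for p in ports]


pkt = Packet("multicast", (1, 3), (0b00, 0b11),
             frozenset({(1, 0), (1, 3)}), bytearray(b"data"))
north, south = replicate(pkt, ["north", "south"])
print(north.out_port, south.out_port)
print(north.payload is south.payload)  # both copies point to the same payload
```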

[0014] In some implementations, the plurality of destination nodes include a plurality of nodes to which a plurality of operation requests collected at the source node during a predetermined time window are directed.

[0015] According to another embodiment of this disclosure, an apparatus for data routing in an on-chip network is also provided. The apparatus includes: an address encoding unit configured to generate a target address code at a source node based on a plurality of destination nodes of a data routing task, wherein the target address code includes reference coordinates and a mask, and the value of each bit of the mask indicates whether the bit is a wildcard bit; a coverage determination unit configured to determine a coverage area for the data routing task based on the reference coordinates and the mask; a first routing unit configured to, in response to determining based on the mask that a current node is not located within the coverage area, perform unicast routing with the reference coordinates as the target node, routing the data packets of the data routing task to the next node; and a second routing unit configured to, in response to determining based on the mask that a current node is located within the coverage area, determine a replication branch for routing the data packets, for routing the data packets of the data routing task to the corresponding target neighbor nodes from at least two output ports corresponding to at least two target neighbor nodes of the current node.

[0016] According to another embodiment of this disclosure, a computer device is provided, the computer device comprising: at least one processor; and a memory having a computer program stored thereon, wherein, when executed by the at least one processor, the computer program causes the at least one processor to perform the aforementioned method.

[0017] According to another embodiment of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a processor, causes the processor to perform the aforementioned method.

[0018] According to another embodiment of this disclosure, a computer program product is provided, the computer program product including a computer program that, when executed by a processor, causes the processor to perform the aforementioned method.

[0019] With the data routing method for on-chip networks provided by the embodiments of this disclosure, a single unicast operation from the source node is enough for the data packets to reach all destination nodes, which makes multicast or broadcast communication more efficient, less bandwidth-consuming, more scalable, and more portable.

[0020] These and other aspects of this disclosure will be apparent from the embodiments described below, and will be elucidated with reference to the embodiments described below.

Description of the Drawings

[0021] The accompanying drawings exemplify embodiments and form part of the specification, serving together with the textual description to explain exemplary implementations of the embodiments. The illustrated embodiments are for illustrative purposes only and do not limit the scope of this disclosure. Throughout the drawings, the same reference numerals refer to similar but not necessarily identical elements.

[0022] Figure 1 shows an exemplary flowchart of a method for data routing in an on-chip network according to embodiments of the present disclosure.

[0023] Figure 2 shows an example network-on-chip according to an embodiment of the present disclosure.

[0024] Figure 3A shows an example target address encoding for data routing in a network-on-chip according to an embodiment of the present disclosure.

[0025] Figure 3B shows another example target address encoding for data routing in a network-on-chip according to embodiments of the present disclosure.

[0026] Figure 4 shows an exemplary block diagram of an apparatus for data routing in a network-on-chip according to embodiments of the present disclosure.

[0027] Figure 5 shows an exemplary block diagram of a computer device according to an embodiment of the present disclosure.

Detailed Implementation

[0028] The exemplary embodiments of this disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding, and should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope of this disclosure. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.

[0029] In this disclosure, unless otherwise stated, the use of terms such as "first," "second," etc., to describe various elements is not intended to limit the positional, temporal, or importance relationships of these elements; such terms are merely used to distinguish one element from another. In some examples, the first element and the second element may refer to the same instance of that element, while in other cases, based on the context, they may refer to different instances.

[0030] The terminology used in the description of the various examples described in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context explicitly indicates otherwise, an element may be one or more unless the number of elements is specifically limited. As used herein, the term "multiple" means two or more, and the term "based on" should be interpreted as "at least partially based on". Furthermore, the terms "and / or" and "at least one of..." cover any one of the listed items and all possible combinations thereof.

[0031] In related technologies, multicast and broadcast solutions for on-chip networks have shortcomings in terms of communication efficiency, bandwidth consumption, hardware overhead, and scalability, failing to meet the comprehensive requirements of current large-scale, high-performance on-chip networks for high efficiency, low power consumption, high scalability, and high reliability in multicast and broadcast communication. Therefore, an improved solution is urgently needed to overcome the deficiencies of existing technologies.

[0032] Currently, existing systems mainly implement multicast or broadcast functions through the following methods.

[0033] Firstly, multicast or broadcast can be achieved through multiple unicasts. That is, the source node performs multiple unicast operations, sending data packets one by one to each destination node. However, the communication efficiency of multiple unicasts is extremely low, and the multiple transmissions of unicast data packets consume a large amount of bandwidth, which can easily cause network congestion.

[0034] Secondly, multicast or broadcast can be achieved explicitly. For example, software triggers packet copying via dedicated instructions or APIs. However, this approach requires new instruction semantics, compiler support, or runtime library cooperation, and also necessitates coordinated modifications to system and application software, resulting in poor cross-platform portability. As another example, software triggers copying via an explicit multicast destination list. The destination nodes are encoded and written into the multicast address segment of the packet header. At each hop, hardware determines the location of all destination nodes and forwards the packet to the port in the direction of each destination node. However, in this approach, when the set of destination nodes is large, the packet header overhead grows significantly, and the hardware distribution strategy becomes more complex, increasing implementation costs.

[0035] Third, dedicated transmission structures or coherence directory protocols can be used to implement multicast or broadcast. For example, tree-like or ring-like dedicated transmission structures can be used specifically to carry broadcast / multicast data, or a message distribution path can be built around directory nodes within a coherence directory protocol framework to achieve targeted distribution of the same message to multiple destination nodes. However, once a dedicated transmission structure is established, it cannot be flexibly expanded as the number of on-chip nodes grows or the topology changes. Furthermore, in coherence directory protocol schemes, directory nodes must undertake all message management and distribution tasks, which can easily create performance bottlenecks.

[0036] Based on the above problems, this disclosure proposes a data routing method for on-chip networks, so that multicast / broadcast still appears to the upper-layer software as a regular read / write process, but data routing to multiple destination nodes can be implemented inside the NoC.

[0037] Specifically, in the embodiments provided in this disclosure, the source node can encode the target addresses of the destination nodes, determine the coverage area of the data routing task in the NoC based on the target address encoding, and write the target address encoding into the packet header. In this way, multiple destination nodes can be implicitly represented as a region in the NoC (e.g., an entire row, an entire column, a rectangle, the entire chip, etc.). The source node can then unicast the data packet toward the coverage area, after which the data packet is propagated to all nodes within the coverage area according to routing rules, thereby routing the data packet to all destination nodes.

[0038] Therefore, the data routing method for implicit broadcasting proposed in this disclosure only requires the source node to perform a single unicast operation to ensure that data packets reach all destination nodes. Furthermore, upper-layer software can initiate read / write requests using only regular access requests, without requiring dedicated instructions or APIs. Moreover, this method is not limited to dedicated transport structures or specific nodes. Consequently, the communication efficiency of NoC multicast or broadcast communication is improved, link redundancy overhead is reduced, software intervention costs are lowered, and scalability is enhanced.

[0039] The following first describes, with reference to Figure 1, an example process for implicit broadcasting in an on-chip network according to embodiments of the present disclosure; then, with reference to Figure 2, an example on-chip network; then, with reference to Figures 3A and 3B, example target address encodings for data routing in on-chip networks; and finally, with reference to Figures 4-5, an apparatus and a computer device for data routing in on-chip networks. Computer-readable media and computer program products according to embodiments of the present disclosure are also described.

[0040] Figure 1 illustrates a method 100 for data routing in a network-on-chip according to an embodiment of the present disclosure.

[0041] At step 101, a target address code is generated at the source node based on the multiple destination nodes of the data routing task. The target address code includes reference coordinates and a mask, where each bit of the mask indicates whether it is a wildcard bit.

[0042] In step 102, the coverage area for the data routing task is determined based on the reference coordinates and the mask.

[0043] In step 103, in response to determining that the current node is not within the coverage area based on the mask, a unicast route is executed with the reference coordinates as the target node, and the data packets of the data routing task are routed to the next node.

[0044] In step 104, in response to determining, based on the mask, that the current node is within the coverage area, a replication branch for routing the data packets is determined, so that the data packets of the data routing task are routed from at least two output ports of the current node to the at least two corresponding target neighbor nodes.

[0045] Using the method provided in the embodiments of this disclosure, regardless of the location or number of destination nodes, the source node can perform unicast routing towards the reference coordinates. After reaching the coverage area indicated by the reference coordinates and mask, a replication branch for routing data packets is determined, thereby ensuring that data packets reach all destination nodes. Therefore, before reaching the coverage area, the method is similar to unicast routing, requiring no dedicated multicast or broadcast instructions from the upper-layer software. The upper-layer software also does not need to concern itself with the meaning or distribution strategy of multicast or broadcast when issuing requests. Moreover, in the method described above, as long as the addresses of multiple destination nodes are determined at the source node, the coverage area of the data routing task can be determined, and data packets can be routed to multiple destination nodes according to the routing strategy. Therefore, this method is not limited to dedicated transmission structures or specific nodes. Thus, the method provided in this disclosure can achieve data packet replication and propagation in NoC networks based on destination nodes without requiring dedicated configuration or costly modifications to upper-layer software or systems. This improves communication efficiency, reduces link redundancy overhead, lowers software intervention costs, and enhances scalability.

[0046] The principles of this disclosure will now be described in detail.

[0047] In step 101, a target address code can be generated at the source node based on the multiple destination nodes of the data routing task.

[0048] In NoC communication scenarios, upper-layer software may need to initiate the same operation request to one or more nodes of the NoC (e.g., reading data from or writing data to multiple destination nodes). The source node can then determine the multiple destination nodes to which these operation requests are directed, generate data packets in response to the requests, and initiate a data routing task from the source node to the multiple destination nodes. The upper-layer software referred to here can be application drivers, operating system kernels, or other application- or operating system-oriented software that does not directly manipulate the NoC hardware circuitry. The source node refers to the starting point of the data routing task, which can be a processor core, accelerator, direct memory access (DMA) controller, memory controller, etc., deployed on the NoC.

[0049] The multiple destination nodes mentioned here can include the nodes to which multiple operation requests collected at the source node are directed. The source node can determine the destination nodes by collecting operation requests in real time. Alternatively, the source node can collect the operation requests (e.g., from software) directed to multiple nodes during a predetermined time window and generate data packets for the collected requests, creating a data routing task to multiple destination nodes. Compared with collecting operation requests in real time, this approach significantly reduces the number of small data packets frequently routed over the links, reduces header overhead, avoids unnecessary packet forwarding in NoC routing, and improves bandwidth utilization.
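Window-based collection can be sketched as follows. The source does not say how the time window is delimited, so this Python example assumes fixed-length windows aligned to time zero; the function name `batch_requests` and the (timestamp, destination) request format are likewise hypothetical.

```python
def batch_requests(requests, window):
    """Group (timestamp, dest) requests by fixed time window.

    Returns one sorted destination list per non-empty window, in time order.
    Each list would become the destination set of a single data routing task.
    """
    batches = {}
    for ts, dest in requests:
        batches.setdefault(ts // window, set()).add(dest)
    return [sorted(dests) for _, dests in sorted(batches.items())]


requests = [(0, (1, 0)), (3, (1, 3)), (12, (2, 2))]
print(batch_requests(requests, window=10))
```

Here the first two requests fall into the same window and would share one data routing task, while the third starts a new one.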

[0050] Based on the addresses of the multiple destination nodes of the data routing task, the source node can generate a target address encoding for data routing in the on-chip network according to embodiments of this disclosure. This target address encoding can be included in the header of the data packet to be routed.

[0051] The target address encoding may include a reference coordinate and a mask, where the value of each bit in the mask indicates whether the bit is a wildcard bit.

[0052] In embodiments of this disclosure, the reference coordinates can be represented as (X, Y), and can be the coordinates of a specified destination node among the multiple destination nodes. Any other suitable method can be used to represent the reference coordinates without departing from the principles of this disclosure. The reference coordinates are included in the target address encoding so that, before the data packet reaches the coverage area determined in step 102, it can be routed by unicast routing in the direction of the reference coordinates until it reaches the coverage area. Taking the 4×4 2D mesh NoC network 200 in Figure 2 as an example, exemplary destination nodes may include A(1,0) and B(1,3), and source node S can specify the coordinates of any destination node as the reference coordinates. For example, source node S can specify the coordinates (1,3) of destination node B as the reference coordinates. Alternatively, the reference coordinates (X, Y) can be the coordinates of a non-destination node included in the coverage area determined in step 102. For example, source node S can specify the reference coordinates as (1,2).

[0053] The mask can be represented as MaskX and MaskY. The mask is a bitwise mask, where the value of each bit indicates whether it is a wildcard bit. In embodiments of this disclosure, a wildcard bit can be indicated by setting a bit to 1 and a non-wildcard bit by setting it to 0. For example, if a bit in MaskX is 0, indicating that the corresponding bit in the binary representation of the X-coordinate of a node within the coverage area needs to be equal to the corresponding bit in the binary representation of the X-coordinate of the reference coordinate, then the bit needs to be a strict match and is not a wildcard bit. If a bit in MaskX is 1, indicating that the corresponding bit in the binary representation of the X-coordinate of a node within the coverage area can be 0 or 1 and does not need to be equal to the corresponding bit in the binary representation of the X-coordinate of the reference coordinate, then the bit is a wildcard bit. The same applies to MaskY. Any other suitable value can be used to indicate whether a bit is a wildcard bit without departing from the principles of this disclosure. In embodiments of this disclosure, the coverage area refers to the set of all possible coordinates determined by the mask and the reference coordinates.
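The wildcard semantics described here amount to comparing only the non-wildcard bits of a node's coordinates with those of the reference coordinates. A minimal Python sketch follows; the function name `in_coverage`, the (x, y) tuple layout, and the default 2-bit coordinate width are assumptions for illustration.

```python
def in_coverage(node, ref, mask, width=2):
    """Return True if every non-wildcard bit of node equals the same bit of ref.

    node, ref: (x, y) coordinates; mask: (MaskX, MaskY), where 1 marks a wildcard bit.
    """
    full = (1 << width) - 1              # e.g. 0b11 for 2-bit coordinates
    (x, y), (rx, ry), (mx, my) = node, ref, mask
    strict_x = ~mx & full                # X bits that must match exactly
    strict_y = ~my & full                # Y bits that must match exactly
    return (x & strict_x) == (rx & strict_x) and (y & strict_y) == (ry & strict_y)


# Reference (1, 3) with MaskX = 00 and MaskY = 11: X must equal 1, Y is free.
print(in_coverage((1, 2), ref=(1, 3), mask=(0b00, 0b11)))  # True
print(in_coverage((2, 2), ref=(1, 3), mask=(0b00, 0b11)))  # False
```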

[0054] Taking the NoC network 200 in Figure 2 as an example, the destination nodes include A(1,0) and B(1,3). In the X dimension, the binary X coordinates of destination nodes A and B are Ax = 01 and Bx = 01, respectively. That is, for both nodes bit 0 of the X coordinate is 1 and bit 1 is 0, meaning that A and B are both in the column with x-coordinate 1. In this case, bits 0 and 1 of the X coordinate must strictly match, so neither bit of MaskX is a wildcard bit and both should be 0. Therefore, source node S can determine the mask MaskX = 00. Similarly, in the Y dimension, the binary Y coordinates of destination nodes A and B are Ay = 00 and By = 11, respectively. Both bit 0 and bit 1 differ between A and B, so each of these bits can be either 0 or 1. Therefore, both bits of MaskY are wildcard bits, and source node S can determine the mask MaskY = 11.

[0055] It is understandable that the above process uses a 2-bit width for the node address as an example to describe how the mask is determined. Depending on the actual application (e.g., the size of the mesh network), the coordinates of the destination node and the corresponding mask can have any other suitable bit width.

[0056] Furthermore, the principles of this disclosure have been described using two destination nodes as an example in the above process. It is understood that the number of destination nodes can be higher than two without departing from the principles of this disclosure. Based on the same method as the process of determining the mask described above, the value of each bit of the mask can be determined for the destination nodes.
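One way to derive the mask for any number of destination nodes is to mark as wildcard every bit in which some destination differs from the reference coordinates. The Python sketch below illustrates this; the XOR/OR formulation and the name `encode_targets` are assumptions for illustration, since the source only describes the per-bit reasoning.

```python
from functools import reduce


def encode_targets(dests, ref=None):
    """Derive (reference, (MaskX, MaskY)) from a list of (x, y) destinations.

    A mask bit is set (wildcard) when at least one destination differs
    from the reference coordinates in that bit.
    """
    ref = ref if ref is not None else dests[0]
    mask_x = reduce(lambda m, d: m | (d[0] ^ ref[0]), dests, 0)
    mask_y = reduce(lambda m, d: m | (d[1] ^ ref[1]), dests, 0)
    return ref, (mask_x, mask_y)


# Destinations A(1,0) and B(1,3), with B chosen as the reference.
ref, (mask_x, mask_y) = encode_targets([(1, 0), (1, 3)], ref=(1, 3))
print(ref, format(mask_x, "02b"), format(mask_y, "02b"))  # (1, 3) 00 11

# Three destinations in the same row give a row-wide mask.
print(encode_targets([(0, 1), (1, 1), (3, 1)]))  # ((0, 1), (3, 0))
```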

[0057] In step 102, the coverage area for the data routing task can be determined based on the reference coordinates and the mask.

[0058] In some embodiments, step 102 may include: obtaining the minimum coverage area by performing a bitwise AND operation between the reference coordinates and the inverse mask; and obtaining the maximum coverage area by performing an OR operation between the reference coordinates and the mask. Using the above method, bitwise operations can be used to quickly determine the upper and lower bounds of the coverage area, improving routing matching efficiency and computational performance.

[0059] In other embodiments, the coverage area can be determined based on reference coordinates, a mask, and additional constraints. For example, a constraint could be that the coordinate span of the coverage area in a first direction (such as the X direction) does not exceed 5. Another example is that the maximum value of the coverage area in a second direction does not exceed 6. The above examples are not intended to limit the scope of this disclosure. Appropriate constraints can be set according to actual circumstances without departing from the principles of this disclosure.

[0060] Taking the NoC network 200 in Figure 2 as an example, source node S specifies the reference coordinates (1,3) and the masks MaskX = 00 and MaskY = 11.

[0061] In the X dimension, based on the reference coordinate X = 01 and the mask MaskX = 00, the minimum value xmin and the maximum value xmax of the coverage area for the data routing task can be determined as follows: Minimum value xmin = X & ~ MaskX = 01, Maximum value xmax = X | MaskX = 01. Therefore, in the X dimension, the coverage area is 01, that is, the column with x-coordinate 1.

[0062] Similarly, in the Y dimension, based on the reference coordinate Y = 11 and the mask MaskY = 11, the minimum and maximum values of the coverage area for the data routing task can be determined as follows: ymin = Y & ~ MaskY = 00, ymax = Y | MaskY = 11. Therefore, in the Y dimension, the coverage area is 00~11, that is, the four rows with y coordinates from 0 to 3. Combining the X and Y dimensions, the coverage area for the data routing task can be determined to be nodes (1,0) to (1,3).
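The bound calculation above can be reproduced with a few lines of Python. The function name `coverage_bounds` and the explicit `width` parameter (needed because Python integers are unbounded, so the inverse mask must be truncated to the coordinate width) are assumptions for illustration.

```python
def coverage_bounds(ref, mask, width=2):
    """Return (min, max) of the coverage area along one dimension."""
    full = (1 << width) - 1
    lo = ref & ~mask & full      # reference AND inverse mask
    hi = (ref | mask) & full     # reference OR mask
    return lo, hi


xmin, xmax = coverage_bounds(0b01, 0b00)   # X = 01, MaskX = 00
ymin, ymax = coverage_bounds(0b11, 0b11)   # Y = 11, MaskY = 11
print(format(xmin, "02b"), format(xmax, "02b"))  # 01 01
print(format(ymin, "02b"), format(ymax, "02b"))  # 00 11
```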

[0063] As can be seen from the above calculation of the coverage area, each bit of the mask can be configured as a wildcard bit as needed, so that the coverage area can be configured flexibly. Regardless of the number and location of the destination nodes in the data routing task, a coverage area containing all given destination nodes can be calculated through step 102 above. Moreover, because upper-layer software may request destination nodes in different locations and the coverage area is determined per data routing task, the coverage area is not limited to the entire column described in this embodiment; it may also be an entire row, a specific rectangular area, or even the entire chip. For example, if the coverage area for the data routing task is an entire row, the mask can be set to MaskX = 11, MaskY = 00. Each bit of MaskX is then a wildcard bit, so the 0th and 1st bits of the X coordinate of a destination node can each be 0 or 1, while no bit of MaskY is a wildcard bit, so the 0th and 1st bits of the Y coordinate of a destination node must strictly match the 0th and 1st bits of the Y reference coordinate; the mask thus determines a coverage area of an entire row. As another example, if the mask is set to MaskX = 01 and MaskY = 01, the coverage area for the data routing task can include a rectangular area of two rows and two columns. As yet another example, if the mask is set to MaskX = 11 and MaskY = 11, the coverage area for the data routing task can include the entire 4×4 2D mesh network illustrated in Figure 2. Broadcast or multicast domains can thus be flexibly configured through the mask settings.
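To make the mask configurations above concrete, the following sketch enumerates the nodes covered by a given reference coordinate and mask. The helper names `covered_values` and `coverage_area`, the 2-bit coordinate width, and the reuse of the reference coordinates (1,3) for every case are assumptions made for illustration only.

```python
from itertools import product


def covered_values(ref: int, mask: int, bits: int = 2) -> list[int]:
    """All coordinate values that equal `ref` on every non-wildcard bit."""
    keep = ~mask & ((1 << bits) - 1)   # bits that must match strictly
    return [v for v in range(1 << bits) if (v & keep) == (ref & keep)]


def coverage_area(ref_xy, mask_xy, bits: int = 2):
    """Sorted list of (x, y) nodes covered by reference coordinates and masks."""
    xs = covered_values(ref_xy[0], mask_xy[0], bits)
    ys = covered_values(ref_xy[1], mask_xy[1], bits)
    return sorted(product(xs, ys))


ref = (1, 3)                                  # reference coordinates (X, Y)
column = coverage_area(ref, (0b00, 0b11))     # whole column x = 1
row = coverage_area(ref, (0b11, 0b00))        # whole row y = 3
block = coverage_area(ref, (0b01, 0b01))      # 2 x 2 rectangle
chip = coverage_area(ref, (0b11, 0b11))       # entire 4 x 4 mesh
print(column, row, block, len(chip))
```

Note that a wildcard in a high bit only (for example a mask of 10) selects values that are not contiguous, such as 0 and 2, so in that case the minimum and maximum alone do not list the covered values.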

[0064] In step 103, in response to determining that the current node is not within the coverage area based on the mask, a unicast route with the reference coordinates as the target node can be executed to route the data packets of the data routing task to the next node.

[0065] When a data routing task is started at the source node, the source node is first used as the current node to determine whether the current node (i.e., the source node) is within the coverage area. Then, after the data packet is routed to the next node, the next node is used as the current node to determine whether the current node is within the coverage area.

[0066] The following method can be used to determine whether the current node is within the coverage area.

[0067] In some embodiments, the current node can be determined to be within the coverage area in response to the result of a bitwise AND operation between the coordinates of the current node and the inverted mask being equal to the result of a bitwise AND operation between the reference coordinates and the inverted mask. That is, if the current node's coordinates are (x, y), the reference coordinates are (X, Y), and the masks are MaskX and MaskY, it is determined whether both of the following equations hold: x & ~MaskX = X & ~MaskX, and y & ~MaskY = Y & ~MaskY. In this way, it can be efficiently determined whether a node is within the coverage area containing the destination nodes. By performing this determination step, the NoC can determine at which node the coverage area has been reached, and thereby decide on the subsequent packet replication strategy. In other embodiments, the current node can perform a bitwise AND operation with the mask on its own coordinates and on the reference coordinates, respectively; if the results are equal, the current node is determined to be within the coverage area. Without departing from the principles of this disclosure, those skilled in the art can use any suitable means to determine whether the current node is within the coverage area.

[0068] Taking the NoC network 200 of Figure 2 as an example, the source node S is the current node, with coordinates S(x,y) of (0,1); the reference coordinates (X,Y) are (1,3), and the masks are MaskX = 00 and MaskY = 11. Calculation shows that in the X dimension, Sx & ~MaskX equals 00 while X & ~MaskX equals 01, which are not equal. In the Y dimension, Sy & ~MaskY and Y & ~MaskY both equal 00. Because the equation does not hold in the X dimension, the current node (0,1) is not within the coverage area. It can be understood that this determination step is not limited to the source node. During the subsequent routing of the data packet, the packet may pass through multiple nodes, and for each of them it is necessary to determine whether the node is within the aforementioned coverage area.
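The membership test described above amounts to one comparison per dimension. The sketch below is a minimal illustration; the function name `in_coverage` and the 2-bit coordinate width are assumptions introduced here.

```python
def in_coverage(node, ref, mask, bits: int = 2) -> bool:
    """True if `node` matches `ref` on every non-wildcard bit in every dimension.

    node, ref and mask are (x, y) tuples; a mask bit of 1 marks a wildcard bit.
    """
    full = (1 << bits) - 1
    return all(
        (n & ~m & full) == (r & ~m & full)
        for n, r, m in zip(node, ref, mask)
    )


ref, mask = (1, 3), (0b00, 0b11)
source_inside = in_coverage((0, 1), ref, mask)   # X differs: 00 vs 01
next_inside = in_coverage((1, 1), ref, mask)     # X matches, Y is all wildcard
print(source_inside, next_inside)
```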

[0069] In response to the mask-based determination that the current node is not within the coverage area, unicast routing with the reference coordinates as the target node is performed, routing the data packets of the data routing task to the next node. By performing this step, the current node routes toward the reference coordinates before the coverage area is reached. Because the reference coordinates lie within the coverage area, routing toward the reference coordinates is routing toward the coverage area, so the data packet eventually reaches the coverage area. Before the coverage area is reached, the routing process is similar to unicast routing with the reference coordinates as the target node. This unicast routing can be performed using any suitable routing strategy, such as Dimension-Order Routing (DOR) or any other suitable deadlock-free and / or loop-free routing strategy.

[0070] For example, an XY routing strategy can be used to perform unicast routing from the current node to the reference coordinates. In XY routing, path selection strictly follows "dimension priority": the packet first moves along the X dimension (horizontal) until the x-coordinate of the current node matches the x-coordinate of the destination node, and then moves along the Y dimension (vertical) until the destination node is reached. Taking the NoC network 200 of Figure 2 as an example, the current node is (0,1). It takes the reference coordinates (1,3) as the target node and performs XY routing. Therefore, the packet moves along the X dimension and is routed to the next node (1,1).
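A single XY routing step can be sketched as follows. The function name `xy_next_hop` is hypothetical, and the sketch assumes a plain 2D mesh in which neighbouring nodes differ by 1 in exactly one coordinate.

```python
def xy_next_hop(cur, dst):
    """One hop of XY (dimension-order) routing on a 2D mesh.

    Moves along X until the x-coordinates match, then along Y.
    Returns `cur` unchanged if the destination has been reached.
    """
    x, y = cur
    dx, dy = dst
    if x != dx:
        return (x + (1 if dx > x else -1), y)
    if y != dy:
        return (x, y + (1 if dy > y else -1))
    return cur


first_hop = xy_next_hop((0, 1), (1, 3))   # X dimension is corrected first
print(first_hop)
```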

[0071] Furthermore, as described in step 101, the reference coordinates (X, Y) can also be the coordinates of other non-destination nodes included in the coverage area. This is because the inventors of this disclosure realized that since the current node routes towards the reference coordinates with the ultimate goal of reaching the coverage area, the current node will route towards the coverage area as long as the specified reference coordinates are within the coverage area. Therefore, the reference coordinates can be any value within the coverage area, can be one of the given destination nodes, or can be a non-destination node.

[0072] At step 104, in response to determining that the current node is within the coverage area based on the mask, a replica branch for routing data packets can be determined to route data packets of the data routing task to the corresponding target neighboring nodes from at least two output ports corresponding to at least two target neighboring nodes in the current node.

[0073] Continuing with the NoC network 200 shown in Figure 2, the case in which the current node is determined in step 104 to be within the aforementioned coverage area is described. As described earlier, data packets move from the source node (0,1) to (1,1) according to XY routing. At this point, (1,1) is taken as the current node to determine whether the current node is within the aforementioned coverage area. Using the method described above in conjunction with step 103, calculation shows that in the X dimension, x & ~MaskX and X & ~MaskX are equal, both being 01; in the Y dimension, y & ~MaskY and Y & ~MaskY are equal, both being 00. That is to say, the current node (1,1) is within the coverage area.

[0074] After determining that the current node is within the coverage area, a replication branch for routing data packets needs to be determined. This branch routes the data packets of the data routing task from at least two output ports of the current node, corresponding to at least two target neighboring nodes, to the corresponding target neighboring nodes. During this process, the current node determines, according to routing rules, which output ports to use to route the data packets to which target neighboring nodes, thereby replicating and spreading the data packets to these target neighboring nodes. It is understood that although the unicast routing performed before reaching the coverage area uses the reference coordinates as the target node, this unicast routing does not necessarily end at the reference coordinates. Once the data packets are routed into the coverage area, even if the reference coordinates have not yet been reached, unicast routing can be terminated, and data packet replication can proceed within the coverage area until the data packets are routed to all nodes within the coverage area.

[0075] The target neighbor node can be determined based on the following steps: in response to determining that the current node is within the coverage area based on the mask, determining whether a first candidate neighbor node in a first candidate neighbor node set is within the coverage area, wherein the first candidate neighbor node includes the current node's neighbor node in a first direction and excludes the node preceding the current node that routes the data packet to the current node; in response to determining that at least one first candidate neighbor node is within the coverage area, determining at least one first candidate neighbor node as the target neighbor node; in response to determining that none of the first candidate neighbor nodes in the first candidate neighbor node set are within the coverage area, determining whether a second candidate neighbor node in a second candidate neighbor node set is within the coverage area, wherein the second candidate neighbor node includes the current node's neighbor node in a second direction and excludes the node preceding the current node that routes the data packet to the current node, the second direction being different from the first direction; in response to determining that at least one second candidate neighbor node is within the coverage area, determining at least one second candidate neighbor node as the target neighbor node.

[0076] In a 2D mesh network, the first direction can be the X direction, and the second direction can be the Y direction. It is understood that, without departing from the principles of this disclosure, the definitions of the first and second directions can be changed according to the actual situation; for example, the first direction can be the Y direction, and the second direction can be the X direction. When other network structures are used on the chip, the specific meanings of the first and second directions can also be determined according to the actual network structure.

[0077] In other words, the current node can first determine whether the first candidate neighbor node in the first candidate neighbor node set in the first direction (e.g., the current routing direction) is within the coverage area to determine if a target neighbor node can be identified in the first direction. If the first candidate neighbor node in the first direction is within the coverage area, then the first candidate neighbor node in that direction can be identified as the target neighbor node. If none of the first candidate neighbor nodes in the first candidate neighbor node set in the first direction are within the coverage area, then the node can next determine whether the second candidate neighbor node in the second candidate neighbor node set in the second routing direction is within the coverage area to determine if a target neighbor node can be identified in the second direction. If the second candidate neighbor node in the second routing direction is within the coverage area, then the second candidate neighbor node in the second routing direction can be identified as the target neighbor node.

[0078] Continuing with the NoC network 200 shown in Figure 2 as an example, the reference coordinates (X,Y) are (1,3), the current node (x,y) is (1,1), and the masks are MaskX = 00 and MaskY = 11. It has been determined that the current node (1,1) is within the coverage area. In response to determining based on the mask that the current node (1,1) is within the coverage area, it is necessary to determine whether the first candidate neighbor node in the first candidate neighbor node set is within the coverage area. That is, it is necessary to determine whether the neighbor node of (1,1) in the first direction is within the coverage area. According to the XY routing rule, routing moves along the X dimension first, so the first direction can be the X-dimensional direction, and the first candidate neighbor nodes may therefore be (0,1) and (2,1). However, since the first candidate neighbor nodes do not include the node (0,1) that routed the data packet to the current node (1,1), the first candidate neighbor node set only includes (2,1). Whether (2,1) is within the coverage area can be calculated in the manner described in step 103. Calculation shows that x & ~MaskX equals 10, while X & ~MaskX equals 01. Therefore, (2,1) is not within the coverage area. In other words, in this example, the first candidate neighbor node in the first direction is not within the coverage area.

[0079] In response to determining that none of the first candidate neighbor nodes in the first candidate neighbor node set are within the coverage area, it can be determined whether the second candidate neighbor node in the second candidate neighbor node set is within the coverage area. Still taking the NoC network 200 shown in Figure 2 as an example, according to the XY routing rule, the second direction can be the Y-dimensional direction. The second candidate neighbor nodes in the second candidate neighbor node set can be (1,0) and (1,2). Whether (1,0) and (1,2) are within the coverage area can be calculated in the manner described in step 103. For each of the two nodes, in the X dimension, x & ~MaskX and X & ~MaskX are equal, both being 01; in the Y dimension, y & ~MaskY and Y & ~MaskY are equal, both being 00. Therefore, both (1,0) and (1,2) are within the coverage area, and both can be determined as target neighbor nodes. Thus, the current node (1,1) can determine that the replication branches used for routing data packets include two replication branches pointing to (1,0) and (1,2), which route the data packets of the data routing task from the two output ports corresponding to the two target neighbor nodes (1,0) and (1,2) to those target neighbor nodes.
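The two-stage neighbor selection walked through above can be sketched as follows. This is an illustrative reading of the described rule, not a definitive implementation: the names `in_coverage` and `target_neighbors`, the 4×4 mesh size, the 2-bit coordinates, and the choice of X as the first direction and Y as the second are all assumptions.

```python
def in_coverage(node, ref, mask, bits=2):
    """True if `node` matches `ref` on every non-wildcard bit."""
    full = (1 << bits) - 1
    return all((n & ~m & full) == (r & ~m & full)
               for n, r, m in zip(node, ref, mask))


def target_neighbors(cur, prev, ref, mask, size=4, bits=2):
    """Pick replication targets: first-direction (X) neighbors that lie in the
    coverage area, otherwise second-direction (Y) neighbors.

    `prev` is the node that routed the packet to `cur` (None at the source);
    it is excluded from both candidate sets.
    """
    x, y = cur
    first = [(x - 1, y), (x + 1, y)]     # candidates in the X direction
    second = [(x, y - 1), (x, y + 1)]    # candidates in the Y direction
    for candidates in (first, second):
        hits = [n for n in candidates
                if n != prev
                and 0 <= n[0] < size and 0 <= n[1] < size   # stay on the mesh
                and in_coverage(n, ref, mask, bits)]
        if hits:
            return hits
    return []


targets = target_neighbors((1, 1), (0, 1), (1, 3), (0b00, 0b11))
print(targets)
```

For the Figure 2 values, the only X candidate (2,1) is rejected, so the two Y neighbors become the replication targets.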

[0080] It can be understood that the number of output ports or target neighboring nodes here is determined by the number of candidate neighboring nodes that lie within the coverage area. There are cases with two such candidate neighboring nodes, and thus two target neighboring nodes and two output ports, as well as cases with only one such candidate neighboring node, and thus one target neighboring node and one output port.

[0081] Furthermore, it can be understood that the first and second directions here are not limited to bidirectional directions in the X or Y dimension. For example, in the bidirectional case, the first direction could include both the X+ and X- packet replication and diffusion directions, and the second direction could include both the Y+ and Y- packet replication and diffusion directions. Alternatively, to reduce replication branches and control bandwidth, the first and second directions could each include only a single direction in the X or Y dimension. For example, the first direction could include only the X+ direction, and the second direction could include only the Y+ direction. After the coverage area for the data routing task has been determined using the reference coordinates and the mask of the target address code, the coverage area can be further narrowed by specifying the packet replication and diffusion directions.

[0082] By performing the above steps, the replication branches of the data packet after it is routed into the coverage area can be determined, thereby spreading the data packet to the entire coverage area determined in step 102, and eventually reaching each destination node.
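As a rough end-to-end illustration of steps 102 to 104 on the Figure 2 example, the sketch below simulates a packet travelling from the source node until no replication branch remains. All names, the 4×4 mesh size, and the X-then-Y direction order are assumptions made here. The `seen` set is only a safeguard of this simulation against revisiting a node; the text above does not describe such a mechanism.

```python
from collections import deque

SIZE, BITS = 4, 2                 # assumed 4x4 mesh with 2-bit coordinates
FULL = (1 << BITS) - 1


def in_coverage(node, ref, mask):
    return all((n & ~m & FULL) == (r & ~m & FULL)
               for n, r, m in zip(node, ref, mask))


def xy_next_hop(cur, dst):
    (x, y), (dx, dy) = cur, dst
    if x != dx:
        return (x + (1 if dx > x else -1), y)
    return (x, y + (1 if dy > y else -1))


def target_neighbors(cur, prev, ref, mask):
    x, y = cur
    for candidates in ([(x - 1, y), (x + 1, y)], [(x, y - 1), (x, y + 1)]):
        hits = [n for n in candidates
                if n != prev and all(0 <= c < SIZE for c in n)
                and in_coverage(n, ref, mask)]
        if hits:
            return hits
    return []


def route(source, ref, mask):
    """Return the set of nodes whose local interface receives the packet."""
    delivered, seen = set(), set()
    queue = deque([(source, None)])          # (current node, previous node)
    while queue:
        cur, prev = queue.popleft()
        if cur in seen:                      # simulation-only safeguard
            continue
        seen.add(cur)
        if not in_coverage(cur, ref, mask):
            # Outside the coverage area: unicast toward the reference coordinates.
            queue.append((xy_next_hop(cur, ref), cur))
            continue
        delivered.add(cur)                   # inside: deliver locally, then replicate
        for nxt in target_neighbors(cur, prev, ref, mask):
            queue.append((nxt, cur))
    return delivered


delivered = route((0, 1), (1, 3), (0b00, 0b11))
print(sorted(delivered))
```

In this run the packet leaves S (0,1), enters the coverage area at (1,1), and is replicated along the column until all four nodes (1,0) to (1,3) have received it.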

[0083] Furthermore, in response to determining that the current node is within the coverage area, the data packets of the data routing task are delivered to the local network interface unit of the current node, and the local network interface unit determines whether the current node is one of the multiple destination nodes determined in step 101. In response to determining that the current node is not one of the multiple destination nodes, the data packets are discarded. In response to determining that the current node is one of the multiple destination nodes, the payload of the data packets is obtained using the processing unit of the current node.

[0084] The above process explains at which nodes data packets are delivered, and how those nodes acquire or discard the packets. In other words, if the current node is within the coverage area, the data packet is delivered. After delivery, if the local network interface unit of the current node determines that the node is one of the destination nodes, the payload of the data packet can be obtained through the current node's processing unit. If it determines that the current node is not one of the destination nodes, the local network interface unit can discard the data packet. With this method, the routing strategy for data packets does not need to consider which nodes need to acquire the payload; instead, the data packet is routed to all nodes that might need to acquire the payload, and the nodes themselves determine whether to actually acquire it.

[0085] Continuing with the example of Figure 2, in the previous step, the data packet was routed to the current node (1,1). Since (1,1) is within the coverage area, the data packet can be delivered to the local network interface unit of (1,1). The local network interface unit can, for example, receive the data packet from the local port of the current node (1,1) and determine whether the current node (1,1) is one of the multiple destination nodes.

[0086] To enable the local network interface unit to determine whether the current node is one of the multiple destination nodes, the data packet includes information indicating the multiple destination nodes. For example, the coordinates of the multiple destination nodes can be carried in the packet header or the packet payload, so that the local network interface unit can determine the multiple destination nodes by parsing the packet header or by unpacking the packet. In particular, carrying the coordinates of the multiple destination nodes in the packet payload reduces the overhead of the packet header.

[0087] In some embodiments, the header of the routed data packet may include a type field indicating whether the data routing task is a unicast or multicast task. The local network interface unit can determine whether it needs to determine if the local node is a destination node based on the data packet type. In some examples, when the type field indicates that the data routing task is a unicast task, the local node to which the packet is delivered is the destination node for the unicast route, and therefore the local node does not need to perform a determination on whether to discard the data packet. In other examples, when the type field indicates that the data routing task is a multicast task, as mentioned above, the data packet will be routed and delivered to all nodes within the coverage area that may be the destination node. Therefore, in this case, the local node to which the packet is delivered needs to perform a determination on whether to discard the data packet.

[0088] On the one hand, in response to determining that the current node is not one of the multiple destination nodes, data packets can be discarded. Continuing with the example of Figure 2, in response to determining that the current node (1,1) is neither destination node A (1,0) nor destination node B (1,3), the local network interface unit of (1,1) can discard the data packet.

[0089] On the other hand, in response to determining that the current node is one of the multiple destination nodes, the payload of the data packet can be obtained using the processing unit of the current node. Continuing with the example of Figure 2, when a data packet is routed from node (1,1) to the aforementioned target neighboring nodes (1,0) and (1,2), the current node (1,0) is destination node A (1,0), so the processing unit of the current node (1,0) can obtain the payload of the data packet. The current node (1,2) is not a destination node, so the local network interface unit of (1,2) can discard the data packet.
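The accept-or-discard decision at the local network interface unit can be sketched as below. The packet is modelled as a plain dictionary whose keys (`type`, `destinations`, `payload`) are hypothetical, as is the function name `local_receive`; the disclosure only states that the packet carries a type indication and information identifying the destination nodes.

```python
def local_receive(node, packet):
    """Return the payload if `node` should consume the packet, else None.

    For a unicast packet the delivering node is the destination, so no
    membership check is needed; for multicast the node checks the list of
    destination coordinates carried in the packet.
    """
    if packet["type"] == "unicast":
        return packet["payload"]
    if node in packet["destinations"]:
        return packet["payload"]
    return None                      # not a destination: discard


packet = {
    "type": "multicast",
    "destinations": {(1, 0), (1, 3)},    # destination nodes A and B
    "payload": b"data",
}
results = {n: local_receive(n, packet) for n in [(1, 0), (1, 1), (1, 2), (1, 3)]}
print(results)
```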

[0090] Furthermore, to reduce the overhead of packet replication at nodes during NoC routing, the packets corresponding to the individual replication branches can logically point to the same payload storage area. That is, only the packet header needs to be replicated on each replication branch, so that multiple replication branches share the same payload buffer. This reduces the bandwidth required for the routing process and improves bandwidth utilization.
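The header-only replication idea can be illustrated in software terms as follows. This is an analogy under assumptions, not a description of the hardware: the `Header` and `Packet` classes, the `out_port` field, and the port labels are invented for the sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Header:
    ref: tuple        # reference coordinates (X, Y)
    mask: tuple       # (MaskX, MaskY)
    out_port: str     # hypothetical per-branch field


@dataclass
class Packet:
    header: Header
    payload: bytearray    # shared buffer, never copied per branch


def replicate(packet: Packet, out_ports: list[str]) -> list[Packet]:
    """Create one packet per replication branch.

    Only the header is copied; every branch references the same payload object.
    """
    return [Packet(replace(packet.header, out_port=p), packet.payload)
            for p in out_ports]


original = Packet(Header((1, 3), (0b00, 0b11), "local"), bytearray(b"payload"))
branches = replicate(original, ["Y-", "Y+"])
print([b.header.out_port for b in branches])
```

Because each branch holds a reference to the same buffer, the payload is stored once regardless of the number of branches.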

[0091] With the data routing method described above in conjunction with Figure 1, a coverage area containing all destination nodes can be determined in advance. The source node only needs to perform unicast routing with the reference coordinates as the target node to ensure that data packets reach all destination nodes within the coverage area. In other words, multicast or broadcast can be achieved through replication in an address-driven NoC network, thereby improving communication efficiency, simplifying replication and distribution strategies, and enhancing the energy efficiency of on-chip interconnects.

[0092] Figure 2 illustrates an example on-chip network 200 according to an embodiment of the present disclosure. As previously described, the example on-chip network 200 is a 4×4 2D mesh network with source node S (0,1) and destination nodes A (1,0) and B (1,3). How the method shown in Figure 1 can be executed on the example on-chip network 200 has already been described in conjunction with Figure 1 and is not elaborated upon here.

[0093] It can be understood that although Figure 2 uses a 4×4 2D mesh network as an example, this disclosure can be applied to 2D meshes of any size, as well as to other common NoC topologies such as torus and fat-tree. The size and structure of the NoC topology are not limited by this disclosure.

[0094] Figure 3A illustrates an example target address code for data routing for a network-on-chip according to an embodiment of the present disclosure. The target address code 301 illustrated in Figure 3A can be located in the header of the packet to be routed and can be used in the method described in conjunction with Figure 1.

[0095] As can be seen from Figure 3A, the target address code includes the reference coordinates and the mask, and may specifically include the following fields.

[0096] - The reference coordinate field, also known as the X / Y field, is used to indicate the reference coordinates. As mentioned earlier, the reference coordinates can be the coordinates of a specific destination node among multiple destination nodes, or the coordinates of another, non-destination node within the coverage area of the data routing task.

[0097] - The mask field, namely the MaskX / MaskY field. The mask is a bitwise mask, where each bit indicates whether it is a wildcard. For example, a bit in MaskX being 0 indicates that the corresponding bit in the binary representation of the X-coordinate of a node within the coverage area must be equal to the corresponding bit in the binary representation of the X-coordinate in the reference coordinates; that is, the bit must match strictly and is not a wildcard. A bit in MaskX being 1 indicates that the corresponding bit in the binary representation of the X-coordinate of a node within the coverage area can be either 0 or 1 and does not need to be equal to the corresponding bit in the binary representation of the X-coordinate in the reference coordinates; that is, the bit is a wildcard. The same applies to MaskY. Without departing from the principles of this disclosure, any other suitable value can be used to indicate whether a bit is a wildcard. The bit width of each field can be flexibly set according to the NoC size and topology parameters.

[0098] Figure 3B illustrates another example target address code for data routing for a network-on-chip according to embodiments of the present disclosure. The target address code 302 illustrated in Figure 3B can be located in the header of the packet to be routed and can be used in the method described in conjunction with Figure 1.

[0099] As can be seen from Figure 3B, in addition to the reference coordinates and the mask, the target address code can also include other fields, specifically the following fields.

[0100] - The Type field indicates whether the data routing task is a unicast or multicast task. Specifically, if the field indicates that the data routing task is a unicast task, then each node can perform unicast routing to the destination node; if the field indicates that the data routing task is a multicast task, then each node can perform the data routing method as described in steps 101 to 104.

[0101] - The GroupID field is used to define a multicast / broadcast domain or permission domain. For example, the NoC can use this field to define a global broadcast domain, a kernel-mode permission domain, or other broadcast or permission domains.

[0102] - The reference coordinate field, also known as the X / Y field, is used to indicate the reference coordinates. This field is the same as described above in conjunction with Figure 3A and is not described again here.

[0103] - The mask field, i.e., the MaskX / MaskY field. This field is the same as described above in conjunction with Figure 3A and is not described again here.

[0104] - Reserved fields. These fields can be reserved for future expansion.

[0105] The bit width of each field can be flexibly set according to the NoC size and topology parameters.
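As one possible illustration of such a header layout, the sketch below packs the fields listed above into a single integer and unpacks them again. The field order, the field names, and every bit width (1-bit Type, 3-bit GroupID, 2-bit coordinates and masks, 4 reserved bits) are assumptions chosen for a 4×4 mesh; the disclosure leaves the widths configurable and does not fix a layout.

```python
# Hypothetical layout, most significant field first: (name, width in bits)
FIELDS = [
    ("type", 1),        # 0 = unicast, 1 = multicast (assumed encoding)
    ("group_id", 3),
    ("x", 2),
    ("y", 2),
    ("mask_x", 2),
    ("mask_y", 2),
    ("reserved", 4),
]


def pack(values: dict) -> int:
    """Pack field values into one integer according to FIELDS."""
    word = 0
    for name, width in FIELDS:
        v = values.get(name, 0)
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word = (word << width) | v
    return word


def unpack(word: int) -> dict:
    """Inverse of pack(): split an integer back into named fields."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out


header = {"type": 1, "group_id": 0, "x": 1, "y": 3,
          "mask_x": 0b00, "mask_y": 0b11, "reserved": 0}
word = pack(header)
print(hex(word), unpack(word) == header)
```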

[0106] An apparatus 400 for data routing for a network-on-chip according to embodiments of the present disclosure is described below with reference to Figure 4. Figure 4 shows an exemplary block diagram of the apparatus 400 for implicit broadcasting in an on-chip network according to an embodiment of the present disclosure. The method 100 described in conjunction with Figure 1 can be implemented using the apparatus 400 described in conjunction with Figure 4.

[0107] The apparatus 400 includes an address encoding unit 401, a coverage area determination unit 402, a first routing unit 403, and a second routing unit 404. In addition to these units, the apparatus 400 may also include other components; however, since these components are not relevant to the content of this embodiment, their illustrations and descriptions are omitted here. Furthermore, the specific details of the operations performed by the apparatus 400 according to this embodiment are the same as the details described above in conjunction with Figure 2, so repeated descriptions of the same details are omitted here.

[0108] The address encoding unit 401 of the device 400 is configured to generate a target address code at the source node based on multiple destination nodes of the data routing task. The target address code includes a reference coordinate and a mask, wherein the value of each bit of the mask indicates whether the bit is a wildcard bit.
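One simple way such an address code could be derived from a list of destination nodes is sketched below. It is an assumption made for illustration, not the encoding procedure of the disclosure: every bit position in which any destination differs from the reference coordinates is marked as a wildcard, which yields a coverage area containing all destinations. The function name `encode_destinations` and the default choice of the first destination as the reference are likewise hypothetical.

```python
def encode_destinations(dests, ref=None):
    """Derive (reference coordinates, (MaskX, MaskY)) from destination nodes.

    Assumed rule: a mask bit is set to 1 (wildcard) wherever some destination
    differs from the reference in that bit position.
    """
    if ref is None:
        ref = dests[0]              # default: first destination as reference
    mask_x = mask_y = 0
    for x, y in dests:
        mask_x |= x ^ ref[0]        # XOR exposes differing bits
        mask_y |= y ^ ref[1]
    return ref, (mask_x, mask_y)


# Figure 2 example: destinations A (1,0) and B (1,3), reference taken as B
code = encode_destinations([(1, 0), (1, 3)], ref=(1, 3))
print(code)
```

For the Figure 2 destinations this reproduces the values used in the worked example, MaskX = 00 and MaskY = 11.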

[0109] The coverage area determination unit 402 is configured to determine the coverage area for the data routing task based on the reference coordinates and the mask.

[0110] The first routing unit 403 is configured to, in response to determining based on the mask that the current node is not within the coverage area, perform unicast routing with the reference coordinates as the target node, and route the data packets of the data routing task to the next node.

[0111] The second routing unit 404 is configured to determine a replica branch for routing data packets in response to determining that the current node is within the coverage area based on the mask, for routing data packets of the data routing task to the corresponding target adjacent nodes from at least two output ports corresponding to at least two target adjacent nodes in the current node.

[0112] A computer device 500 according to embodiments of the present disclosure is described below with reference to Figure 5. Figure 5 shows an exemplary block diagram of the computer device 500 according to an embodiment of the present disclosure.

[0113] Computer device 500 may be used as one or more components for implementing the systems and methods described above. Computer device 500 may include a bus 502 or other communication mechanism for communicating information, and one or more processors 504 coupled to the bus 502 for processing information. Processor 504 may be, for example, one or more general-purpose microprocessors.

[0114] Computer device 500 may also include main memory 506, such as random access memory (RAM), cache, and / or other dynamic storage devices, coupled to bus 502, for storing information and instructions to be executed by processor 504. Main memory 506 may also be used to store temporary variables or other intermediate information during the execution of instructions to be executed by processor 504. Such instructions, when stored in a storage medium accessible to processor 504, can make computer device 500 a special-purpose machine customized to perform the operations specified in the instructions. Main memory 506 may include non-volatile media and / or volatile media. Non-volatile media may include, for example, optical discs or magnetic disks. Volatile media may include dynamic memory. Common media formats may include, for example, floppy disks, flexible disks, hard disks, solid-state drives, magnetic tapes or any other magnetic data storage media, CD-ROMs (compact disc read-only memory), any other optical data storage media, any physical media with patterns of holes, RAM (random access memory), DRAM (dynamic random access memory), PROM (programmable read-only memory) and EPROM (erasable programmable read-only memory), FLASH-EPROM (flash erasable programmable read-only memory), NVRAM (non-volatile random access memory), any other memory chips or cartridges, or networked versions of the above.

[0115] Computer device 500 may implement the techniques described herein using custom hardwired logic, one or more ASICs (Application-Specific Integrated Circuits) or FPGAs (Field-Programmable Gate Arrays), firmware, and / or program logic, which, when combined with computer device 500, enable computer device 500 to become a special-purpose machine or to be programmed therein. According to one embodiment, the techniques described herein are executed by computer device 500 in response to processor 504 executing one or more sequences of one or more instructions contained in main memory 506. Such instructions may be read into main memory 506 from another storage medium, such as storage device 508. Executing the sequence of instructions contained in main memory 506 causes processor 504 to perform the processing steps described herein. For example, the processes / methods disclosed herein may be implemented by computer program instructions stored in main memory 506. When these instructions are executed by processor 504, they may perform the steps shown in the corresponding figures and as described above. In alternative embodiments, hardwired circuitry may be used in place of or in combination with software instructions.

[0116] Computer device 500 also includes a network interface 510 coupled to bus 502. Network interface 510 can provide bidirectional data communication coupled to one or more network links connected to one or more networks. For example, network interface 510 can be a local area network (LAN) card to provide data communication connectivity with a compatible LAN (or a WAN component for communicating with a wide area network (WAN)). Wireless links can also be implemented.

[0117] The performance of certain operations can be distributed across processors, not just residing within a single machine, but deployed across many machines. In some exemplary embodiments, the processor or the processor-implemented engine may reside in a single geographic location (e.g., in a home environment, office environment, or server farm). In other exemplary embodiments, the processor or the processor-implemented engine may be distributed across many geographic locations.

[0118] Each process, method, and algorithm described in the preceding sections can be embodied in a code module executed by one or more computer systems or computer processors including computer hardware, and can be fully or partially automated by them. These processes and algorithms can be implemented, in part or in whole, in application-specific circuitry.

[0119] When the functions disclosed herein are implemented as software functional units and sold or used as independent products, they can be stored in a processor-executable, non-volatile, computer-readable storage medium. The technical solutions disclosed herein (in whole or in part), or the aspects thereof that contribute over the prior art, can be embodied in the form of a software product. This software product can be stored in a storage medium and includes instructions to cause a computer device (which may be a personal computer, server, network device, etc.) to perform all or part of the steps of the methods described in the embodiments of this application. The storage medium may include a flash drive, a portable hard drive, ROM, RAM, a magnetic disk, an optical disk, another medium suitable for storing program code, or any combination thereof.

[0120] The embodiments disclosed herein can be implemented via a cloud platform, server, or group of servers that interact with a client. The client can be a terminal device or a client registered by a user on the platform, wherein the terminal device can be a mobile terminal, a personal computer (PC), or any device that can install platform applications.

[0121] The various features and processes described above can be used independently or combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of this disclosure. Furthermore, certain method or process blocks may be omitted in some embodiments. The methods and processes described herein are not limited to any particular order, and associated blocks or states may be executed in other suitable orders. For example, described blocks or states may be executed in a non-specifically disclosed order, or multiple blocks or states may be combined in a single block or state. Exemplary blocks or states may be executed serially, in parallel, or otherwise. Blocks or states may be added to or removed from the disclosed exemplary embodiments. The exemplary systems and components described herein may be configured differently from those described. For example, elements may be added, removed, or rearranged compared to the disclosed exemplary embodiments.

[0122] The various operations of the exemplary methods described herein can be performed at least in part by an algorithm. An algorithm may consist of program code or instructions stored in memory (such as the non-transitory computer-readable storage medium described above). Such an algorithm may include a machine learning algorithm. In some embodiments, the machine learning algorithm may not be explicitly programmed into the computer to perform the function, but may learn from training data to obtain a predictive model for performing that function.

[0123] The various operations of the exemplary methods described herein can be performed at least in part by one or more processors, which are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors can constitute a processor-implemented engine that operates to perform one or more of the operations or functions described herein.

[0124] Similarly, the methods described herein can be implemented at least partially by a processor, where a specific processor or one or more processors are examples of hardware. For example, at least some operations of the methods can be performed by one or more processors or an engine implemented by a processor. Furthermore, one or more processors can also run in a “cloud computing” environment or as “Software as a Service” (SaaS) to support the execution of the relevant operations. For example, at least some operations can be performed by a group of computers (as an example of a machine including processors), which can be accessed via a network (e.g., the Internet) and through one or more appropriate interfaces (e.g., application programming interfaces (APIs)).

[0125] The performance of certain operations can be distributed across processors, not just residing within a single machine, but deployed across many machines. In some exemplary embodiments, the processor or the processor-implemented engine may reside in a single geographic location (e.g., in a home environment, office environment, or server farm). In other exemplary embodiments, the processor or the processor-implemented engine may be distributed across many geographic locations.

[0126] The following describes a computer storage medium according to the present invention, on which a computer program is stored, which, when executed by a processor, causes the processor to perform the aforementioned method for data routing in an on-chip network.

[0127] The following describes a computer program product according to the present invention, which includes a computer program that, when executed by a processor, causes the processor to perform the aforementioned method for data routing in an on-chip network.

[0128] In this specification, multiple instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are described and illustrated as independent operations, one or more individual operations may be performed concurrently, and these operations are not required to be performed in the order shown. Structures and functionalities presented as independent components in the example configuration may be implemented as combined structures or components. Similarly, structures and functionalities presented as individual components may be implemented as independent components. These and other variations, modifications, additions, and improvements are all within the scope of this document.

[0129] As used herein, “or” is inclusive rather than exclusive unless explicitly stated or indicated by context. Furthermore, “and” is both joint and several unless explicitly stated or indicated by context. Moreover, multiple instances may be provided for the resources, operations, or structures described herein as a single instance. The boundaries between various resources, operations, engines, and data stores are somewhat arbitrary, and specific operations are illustrated within the context of a particular illustrative configuration. Other allocations of functionality are conceivable and may fall within the scope of various embodiments of this disclosure. Generally, structures and functionalities presented as independent resources in example configurations may be implemented as combined structures or resources. Similarly, structures and functionalities presented as individual resources may be implemented as independent resources. These and other variations, modifications, additions, and improvements are all within the scope of embodiments of this disclosure. Therefore, this specification and accompanying drawings should be viewed in an illustrative rather than restrictive sense.

[0130] The terms “comprising” or “including” are used to indicate the presence of a subsequently stated feature, but do not preclude the addition of other features. Conditional language, such as “may,” “can,” or “might,” unless specifically stated or otherwise understood in the context of use, is generally intended to express that certain embodiments include certain features, elements, and/or steps, while other embodiments do not. Therefore, such conditional language generally does not imply that a feature, element, and/or step is necessary in any way for one or more embodiments, or that one or more embodiments must include logic that, with or without user input or prompting, determines whether such features, elements, and/or steps are included in any particular embodiment, or whether they are to be performed in any particular embodiment.

Claims

1. A method for data routing in an on-chip network, characterized in that the method includes: generating, at a source node, a target address code based on multiple destination nodes of a data routing task, wherein the target address code includes reference coordinates and a mask, and the value of each bit of the mask indicates whether the bit is a wildcard bit; determining a coverage area for the data routing task based on the reference coordinates and the mask; in response to determining based on the mask that the current node is not within the coverage area, performing unicast routing with the reference coordinates as the target node to route the data packets of the data routing task to the next node; and in response to determining based on the mask that the current node is within the coverage area, determining a replication branch for routing the data packets, so as to route the data packets of the data routing task to the corresponding target neighboring nodes from at least two output ports of the current node that correspond to at least two target neighboring nodes.
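
The address-generation step of claim 1 can be illustrated with a minimal sketch. It assumes that each node coordinate is packed into a single non-negative integer, that a mask bit of 1 marks a wildcard bit (the polarity implied by the AND and OR operations of claim 2), and that the mask is built by marking every bit position in which any destination differs from the reference. The function name `encode_targets` and the XOR-based construction are illustrative assumptions, not taken from the claims.

```python
def encode_targets(destinations):
    """Return (reference, mask) for a list of packed integer coordinates.

    Assumption: the reference is the first destination, and a mask bit
    of 1 means "wildcard" (that bit may take either value).
    """
    if not destinations:
        raise ValueError("at least one destination node is required")
    reference = destinations[0]
    mask = 0
    for dest in destinations:
        # Any bit where a destination differs from the reference
        # must be a wildcard for the code to cover that destination.
        mask |= dest ^ reference
    return reference, mask


reference, mask = encode_targets([0b0101, 0b0111, 0b0100])
print(bin(reference), bin(mask))
```

With these three destinations the mask marks the two low bits as wildcards, so the code also covers `0b0110`, which is not a destination. Covered nodes that are not destinations are the case addressed by the discard step of claim 6.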

2. The method as described in claim 1, characterized in that determining the coverage area for the data routing task based on the reference coordinates and the mask includes: obtaining the minimum value of the coverage area by performing an AND operation between the reference coordinates and the inverse mask of the mask; and obtaining the maximum value of the coverage area by performing an OR operation between the reference coordinates and the mask.
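
The two bitwise operations of claim 2 can be sketched as follows. The sketch assumes a single coordinate field of `width` bits held in an integer; in a multi-dimensional mesh the same computation would presumably be applied to each coordinate field separately. The function name `coverage_bounds` and the `width` parameter are hypothetical.

```python
def coverage_bounds(reference, mask, width):
    """Return (minimum, maximum) of the coverage area for one coordinate field."""
    full = (1 << width) - 1
    inverse_mask = ~mask & full        # inverse mask, limited to the field width
    minimum = reference & inverse_mask  # wildcard bits forced to 0
    maximum = reference | mask          # wildcard bits forced to 1
    return minimum, maximum


print(coverage_bounds(0b0101, 0b0011, 4))  # (4, 7)
```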

3. The method as described in claim 1, characterized in that whether the current node is within the coverage area is determined by the following step: determining whether the current node is located within the coverage area based on the result of an AND operation between the coordinates of the current node and the inverse mask of the mask and the result of an AND operation between the reference coordinates and the inverse mask of the mask.
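
A minimal sketch of the membership test of claim 3 follows. The claim states that the determination is based on the two AND results; treating the node as covered when the two results are equal is an assumption of this sketch, as are the function name `in_coverage` and the `width` parameter.

```python
def in_coverage(node, reference, mask, width):
    """Return True when node matches reference on every non-wildcard bit."""
    inverse_mask = ~mask & ((1 << width) - 1)
    # Assumption: equality of the two AND results means "within the coverage area".
    return (node & inverse_mask) == (reference & inverse_mask)


print(in_coverage(0b0110, 0b0101, 0b0011, 4))  # True
print(in_coverage(0b1001, 0b0101, 0b0011, 4))  # False
```

Under this reading a router needs only one AND per operand and one comparison per hop, with no per-destination list lookup.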

4. The method as described in claim 1, characterized in that the reference coordinates are the coordinates of a specific destination node among the plurality of destination nodes.

5. The method as described in claim 1, characterized in that the target neighboring nodes are determined based on the following steps: in response to determining based on the mask that the current node is located within the coverage area, determining whether a first candidate neighboring node in a first candidate neighboring node set is located within the coverage area, wherein the first candidate neighboring node includes the neighboring node of the current node in a first direction and does not include the previous node that routed the data packet to the current node; in response to determining that at least one first candidate neighboring node is located within the coverage area, determining the at least one first candidate neighboring node as the target neighboring node; in response to determining that none of the first candidate neighboring nodes in the first candidate neighboring node set is located within the coverage area, determining whether a second candidate neighboring node in a second candidate neighboring node set is located within the coverage area, wherein the second candidate neighboring node includes the neighboring node of the current node in a second direction and does not include the previous node that routed the data packet to the current node, and the second direction is different from the first direction; and in response to determining that at least one second candidate neighboring node is located within the coverage area, determining the at least one second candidate neighboring node as the target neighboring node.

6. The method according to any one of claims 1-5, characterized in that the method further includes: in response to determining that the current node is within the coverage area, delivering the data packets of the data routing task to the local network interface unit of the current node; determining, by the local network interface unit, whether the current node belongs to the multiple destination nodes; in response to determining that the current node does not belong to the multiple destination nodes, discarding the data packet; and in response to determining that the current node belongs to the multiple destination nodes, obtaining the payload of the data packet using the processing unit of the current node.
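
The local filtering step of claim 6 can be sketched as below. The sketch assumes that the packet carries an explicit set of destination nodes (claim 8 states only that the packet includes information indicating the destinations; the set representation is an assumption) and that discarding is modeled by returning `None`. The `Packet` class and `niu_receive` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    destinations: frozenset  # assumed encoding of the destination information
    payload: bytes


def niu_receive(packet, node):
    """Model of the local network interface unit of a covered node."""
    if node not in packet.destinations:
        return None          # covered but not a destination: discard
    return packet.payload    # destination: hand the payload to the processing unit


pkt = Packet(frozenset({0b0101, 0b0111, 0b0100}), b"data")
print(niu_receive(pkt, 0b0111))  # b'data'
print(niu_receive(pkt, 0b0110))  # None
```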

7. The method as described in claim 6, characterized in that the header of the data packet includes a type field that indicates whether the data routing task is a unicast task or a multicast task.

8. The method as described in claim 7, characterized in that the data packet includes information indicating the plurality of destination nodes.

9. The method according to any one of claims 1-5, characterized in that the data packets corresponding to each replication branch logically point to the same data payload storage area.

10. The method according to any one of claims 1-5, characterized in that the multiple destination nodes include multiple nodes to which multiple operation requests collected at the source node during a predetermined time window are directed.
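
The collection step of claim 10 can be sketched as follows. The sketch assumes that each operation request is a `(timestamp, destination)` pair and that the predetermined time window is the half-open interval from `window_start` to `window_start + window_length`. The function name `collect_destinations` and both parameter names are hypothetical.

```python
def collect_destinations(requests, window_start, window_length):
    """Return the distinct destinations of requests that fall inside the window."""
    window_end = window_start + window_length
    return sorted({
        dest for timestamp, dest in requests
        if window_start <= timestamp < window_end  # assumed half-open window
    })


requests = [(0, 5), (3, 7), (4, 5), (12, 9)]
print(collect_destinations(requests, 0, 10))  # [5, 7]
```

The resulting list can then serve as the set of destination nodes from which a single target address code is generated, so that several requests share one routing task.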

11. An apparatus for data routing in an on-chip network, characterized in that the apparatus includes: an address encoding unit configured to generate a target address code at the source node based on multiple destination nodes of the data routing task, wherein the target address code includes reference coordinates and a mask, and the value of each bit of the mask indicates whether the bit is a wildcard bit; a coverage determination unit configured to determine the coverage area for the data routing task based on the reference coordinates and the mask; a first routing unit configured to, in response to determining based on the mask that the current node is not within the coverage area, perform unicast routing with the reference coordinates as the target node to route the data packets of the data routing task to the next node; and a second routing unit configured to, in response to determining based on the mask that the current node is within the coverage area, determine a replication branch for routing the data packets, so as to route the data packets of the data routing task to the corresponding target neighboring nodes from at least two output ports of the current node that correspond to at least two target neighboring nodes.

12. A computer device, characterized in that the computer device includes: at least one processor; and a memory having a computer program stored thereon, wherein, when executed by the at least one processor, the computer program causes the at least one processor to perform the method of any one of claims 1-10.

13. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, causes the processor to perform the method according to any one of claims 1-10.

14. A computer program product, characterized in that the computer program product includes a computer program that, when executed by a processor, causes the processor to perform the method of any one of claims 1-10.