Method, network, medium and product for accessing a target address

By determining the access type and next-hop node in the on-chip bus interconnect network, the target address can be directly accessed, solving the packet loss problem caused by the increase in network size and improving transmission efficiency.

CN121597633B · Active · Publication Date: 2026-06-12 · SANECHIPS TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SANECHIPS TECH CO LTD
Filing Date
2026-01-12
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

In on-chip bus interconnect networks, as the number of CPUs and other devices increases, the network size increases, the probability of packet loss increases during the transmission of debug and trace data, and the transmission efficiency decreases.

Method used

The access node determines whether the access type is local access or cross-chip access based on the attribute information of the NUMA node to which the preset storage space belongs, determines the next-hop node accordingly, and sends the access request directly to it. This eliminates the relay through the low-speed bus and the consistency node and simplifies the routing path.

Benefits of technology

It reduces the probability of packet loss, improves transmission efficiency, and enables simple and efficient access.

✦ Generated by Eureka AI based on patent content.


Abstract

The present disclosure provides a method, network, medium and product for accessing a target address. An on-chip bus interconnection network includes a plurality of nodes located on one or more chips, the plurality of nodes including one or more slave nodes, the slave nodes being configured to connect one or more storage devices and receive an access request corresponding to a storage address, the storage space provided by the one or more storage devices including a preset storage space configured to store debugging trace data of the on-chip bus interconnection network, the target address being an address in the preset storage space, and any one of the plurality of nodes being configured as an access node configured to initiate an access to the preset storage space. The method of the present disclosure includes determining, by the access node, an access type according to attribute information of a NUMA node to which the preset storage space belongs; determining a next-hop node based on the access type and the target address; and sending an access request for accessing the target address to the next-hop node.

Description

Technical Field

[0001] This disclosure relates to the field of communication technology, and in particular to a method, network, medium, and product for accessing a target address.

Background Technology

[0002] A debug trace component is used to monitor and debug distributed systems. It assists developers in quickly locating the source of failures by recording key information and visualizing the flow of events. As the number of central processing units (CPUs) and other devices integrated into on-chip bus interconnect networks increases, the network size grows; during the transmission of debug trace data, the probability of packet loss increases and the transmission efficiency decreases.

Summary of the Invention

[0003] This disclosure provides a method, network, medium, and product for accessing a target address.

[0004] In a first aspect, embodiments of this disclosure provide a method for accessing a target address in an on-chip bus interconnect network. The on-chip bus interconnect network includes multiple nodes located on one or more chips. The multiple nodes include one or more slave nodes. Each slave node is used to connect to one or more storage devices and receive access requests for corresponding storage addresses. The storage space provided by the one or more storage devices includes a preset storage space for storing debug trace data of the on-chip bus interconnect network. The target address is an address in the preset storage space. Any one of the multiple nodes acts as an access node, which initiates access to the preset storage space. The method includes:

[0005] The access node determines the access type based on the attribute information of the non-uniform memory access node (NUMA node) to which the preset storage space belongs. The access type is either local access or cross-chip access;

[0006] The access node determines the next-hop node based on the access type and the target address;

[0007] The access node sends the access request for accessing the target address to the next-hop node.

[0008] In one possible implementation, the access node determines the access type based on the attribute information of the non-uniform memory access node (NUMA node) to which the preset storage space belongs, including:

[0009] The access node obtains the attribute information of the NUMA node to which the preset storage space belongs, wherein the attribute information includes information indicating whether that NUMA node is a cross-chip NUMA node.

[0010] The access node determines the access type based on whether the NUMA node to which the preset storage space belongs is a cross-chip NUMA node.

[0011] In one possible implementation, the access node determines the access type based on whether the NUMA node to which the preset storage space belongs is a cross-chip NUMA node, including:

[0012] If it is determined that the NUMA node to which the preset storage space belongs is a cross-chip NUMA node, the access node determines whether the slave node corresponding to the target address is located on the chip where the access node is located based on the target address;

[0013] If it is determined that the slave node corresponding to the target address is located on the chip where the access node is located, the access node determines that the access type is local access;

[0014] If it is determined that the slave node corresponding to the target address is not located on the chip where the access node is located, the access node determines that the access type is cross-chip access.

[0015] In one possible implementation, the access node determines the access type based on whether the NUMA node to which the preset storage space belongs is a cross-chip NUMA node, including:

[0016] If it is determined that the NUMA node to which the preset storage space belongs is not a cross-chip NUMA node, the access node determines whether the NUMA node is on the chip where the access node is located;

[0017] If it is determined that the NUMA node is located on the chip where the access node is located, the access node determines that the access type is local access;

[0018] If it is determined that the NUMA node is not on the chip where the access node is located, the access node determines that the access type is cross-chip access.

[0019] In one possible implementation, if the NUMA node is not a cross-chip NUMA node, the access node pre-stores NUMA node type information indicating whether the NUMA node is on the chip where the access node is located, wherein the attribute information of the NUMA node includes the NUMA node type information.
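As an illustration only, the access-type decision described in [0008] to [0019] could be sketched in Python as follows. The names `NumaAttributes`, `determine_access_type`, and the `slave_on_access_chip` callback are hypothetical and do not appear in this disclosure; how a node actually stores or looks up this information is not specified here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class AccessType(Enum):
    LOCAL = "local access"
    CROSS_CHIP = "cross-chip access"


@dataclass(frozen=True)
class NumaAttributes:
    """Hypothetical view of the NUMA attribute information held by an access node."""
    is_cross_chip: bool              # True if the NUMA node spans more than one chip
    is_on_access_chip: bool = False  # pre-stored type info; only meaningful when not cross-chip


def determine_access_type(
    attrs: NumaAttributes,
    target_address: int,
    slave_on_access_chip: Callable[[int], bool],
) -> AccessType:
    """Classify an access to the preset storage space as local or cross-chip."""
    if attrs.is_cross_chip:
        # Cross-chip NUMA node: decide per address, from where its slave node sits.
        local = slave_on_access_chip(target_address)
    else:
        # Single-chip NUMA node: the pre-stored type information is enough.
        local = attrs.is_on_access_chip
    return AccessType.LOCAL if local else AccessType.CROSS_CHIP
```

In this sketch the per-address lookup is only consulted for a cross-chip NUMA node, which mirrors the split between [0012] to [0014] and [0016] to [0018].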

[0020] In one possible implementation, the access node determines the next-hop node based on the access type and the target address by:

[0021] When the access type is local access, the access node determines the slave node corresponding to the target address from among the slave nodes on the chip where the access node is located, and uses that slave node as the next-hop node.

[0022] In one possible implementation, at least one node in each chip's nodes is a cross-chip node, and each of the cross-chip nodes is connected to cross-chip nodes on other chips. The access node determines the next-hop node based on the access type and the target address, including:

[0023] When the access type is cross-chip access, the access node determines, based on the target address and from among the cross-chip nodes on the chip where the access node is located, the cross-chip node that must be passed through to reach the target chip, and uses it as the next-hop node. The target chip is the chip where the slave node corresponding to the target address is located.

[0024] In one possible implementation, the method further includes:

[0025] The cross-chip node receives the access request and sends the access request to the cross-chip node on the target chip connected to it;

[0026] The cross-chip node on the target chip determines the slave node corresponding to the target address and sends the access request to the slave node.
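A minimal sketch of the next-hop choice and cross-chip forwarding in [0020] to [0026] is given below, under stated assumptions. The function names, the node labels, the 64-byte granularity, and the modulo-based `placeholder_index` selector are all hypothetical stand-ins; the hash calculations this disclosure actually uses are described in the implementations that follow.

```python
from typing import Callable, List, Sequence

LINE_BITS = 6  # assumption: 64-byte interleave granularity (not stated in the source)


def placeholder_index(address: int, count: int) -> int:
    """Stand-in selector used only to make the sketch runnable."""
    return (address >> LINE_BITS) % count


def determine_next_hop(
    access_type: str,
    target_address: int,
    local_slave_nodes: Sequence[str],
    local_cross_chip_nodes: Sequence[str],
    select: Callable[[int, int], int] = placeholder_index,
) -> str:
    """Local access goes straight to a slave node; cross-chip access goes to a cross-chip node."""
    if access_type == "local":
        candidates = local_slave_nodes
    elif access_type == "cross-chip":
        candidates = local_cross_chip_nodes
    else:
        raise ValueError(f"unknown access type: {access_type!r}")
    return candidates[select(target_address, len(candidates))]


def route(
    access_type: str,
    target_address: int,
    local_slave_nodes: Sequence[str],
    local_cross_chip_nodes: Sequence[str],
    remote_slave_nodes: Sequence[str],
) -> List[str]:
    """Return the hops visited after the access node."""
    first = determine_next_hop(
        access_type, target_address, local_slave_nodes, local_cross_chip_nodes
    )
    if access_type == "local":
        return [first]
    # The local cross-chip node forwards to its connected peer on the target chip,
    # and that peer selects the slave node for the target address.
    peer = f"peer of {first}"
    slave = remote_slave_nodes[placeholder_index(target_address, len(remote_slave_nodes))]
    return [first, peer, slave]
```

Note that neither path in this sketch passes through a consistency node, which is the simplification the disclosure describes.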

[0027] In one possible implementation, the attribute information of the NUMA node to which the preset storage space belongs includes the hash pattern information of the NUMA node, and the access node determines the slave node corresponding to the target address from among the slave nodes on the chip where the access node is located by:

[0028] The access node uses a calculation method corresponding to the hash pattern information to determine the slave node corresponding to the target address;

[0029] The hash pattern information includes at least one of the following:

[0030] The number of slave nodes included in the NUMA node;

[0031] Whether the slave nodes included in the NUMA node are grouped;

[0032] The number of slave node groups contained in the NUMA node;

[0033] The number of consistency node groups contained in the NUMA node.

[0034] In one possible implementation, the access node determines the slave node corresponding to the target address using a calculation method corresponding to the hash pattern information, including:

[0035] The access node determines the calculation method corresponding to the hash pattern information based on whether the number of slave nodes contained in the NUMA node is an integer power of 2 and whether the slave nodes contained in the NUMA node are grouped.

[0036] The access node uses the calculation method described above to determine the slave node corresponding to the target address.
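The selection in [0035] can be pictured as a small dispatch on two properties of the NUMA node. The sketch below is illustrative only: the function name and the returned labels are hypothetical, and the combination of a power-of-2 slave node count with grouped slave nodes is left unhandled because this text does not describe a calculation method for it.

```python
def is_power_of_two(n: int) -> bool:
    """True when n is 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


def choose_calculation_method(num_slave_nodes: int, slaves_grouped: bool) -> str:
    """Pick a calculation method label from the hash pattern information."""
    pow2 = is_power_of_two(num_slave_nodes)
    if pow2 and not slaves_grouped:
        # Hash to a consistency node group, then map the group to a slave node.
        return "consistency-group-to-slave"
    if not pow2 and not slaves_grouped:
        # Hash directly over the slave nodes with a non-power-of-2 hash.
        return "non-power-of-2-over-slaves"
    if not pow2 and slaves_grouped:
        # Pick a slave node group first, then a slave node inside it.
        return "group-then-slave"
    raise NotImplementedError("combination not described in the source text")
```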

[0037] In one possible implementation, when the number of slave nodes in the NUMA node is an integer power of 2, the slave nodes in the NUMA node are not grouped, the consistency nodes in the NUMA node are grouped, and the number of consistency node groups is an integer-power-of-2 multiple of the number of slave nodes in the NUMA node, the calculation method is as follows:

[0038] Based on the multiple relationship between the number of consistency node groups and the number of slave nodes, a correspondence between consistency node groups and slave nodes is established, such that an integer-power-of-2 number of consistency node groups corresponds to one slave node;

[0039] Based on the number of consistency node groups, a hash operation is performed on the target address to obtain the index value of the consistency node group;

[0040] The slave node corresponding to the consistency node group represented by that index value is determined as the slave node corresponding to the target address.
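The three steps in [0038] to [0040] might look like the following sketch. Several details are assumptions made purely for illustration: the hash simply keeps the low bits of a 64-byte line index, and consecutive consistency node groups are assumed to share a slave node. The disclosure fixes neither the hash function nor the exact correspondence, and all names are hypothetical.

```python
LINE_BITS = 6  # assumption: addresses interleave at 64-byte granularity


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def pow2_hash(address: int, buckets: int) -> int:
    """Placeholder power-of-2 hash: keep the low bits of the line index."""
    if not is_power_of_two(buckets):
        raise ValueError("buckets must be an integer power of 2")
    return (address >> LINE_BITS) & (buckets - 1)


def slave_for_address(address: int, num_slaves: int, num_groups: int) -> int:
    """Map an address to a slave node index via its consistency node group."""
    if not is_power_of_two(num_slaves):
        raise ValueError("this method applies to a power-of-2 number of slave nodes")
    ratio, remainder = divmod(num_groups, num_slaves)
    if remainder or not is_power_of_two(ratio):
        raise ValueError("group count must be a power-of-2 multiple of the slave count")
    group_index = pow2_hash(address, num_groups)
    # Assumed correspondence: each run of `ratio` consecutive groups shares one slave node.
    return group_index // ratio
```

With 2 slave nodes and 8 consistency node groups, groups 0 to 3 map to slave node 0 and groups 4 to 7 map to slave node 1 in this sketch.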

[0041] In one possible implementation, when the number of slave nodes included in the NUMA node is not an integer power of 2 and the slave nodes included in the NUMA node are not grouped, the calculation method is as follows:

[0042] Based on the number of slave nodes contained in the NUMA node, a non-power-of-2 hash operation is performed on the target address to obtain the index value of the slave node, and the slave node corresponding to that index value is determined as the slave node corresponding to the target address.

[0043] In one possible implementation, when the number of slave nodes included in the NUMA node is not an integer power of 2, the slave nodes included in the NUMA node are grouped, the consistency nodes included in the NUMA node are grouped, and the number of consistency node groups included in the NUMA node is an integer-power-of-2 multiple of the number of slave node groups included in the NUMA node, the calculation method is as follows:

[0044] Based on the multiple relationship between the number of consistency node groups and the number of slave node groups, a correspondence between consistency node groups and slave node groups is established, such that an integer-power-of-2 number of consistency node groups corresponds to one slave node group;

[0045] Based on the number of consistency node groups contained in the NUMA node, a hash operation is performed on the target address to obtain the index value of the consistency node group;

[0046] The slave node group corresponding to the consistency node group represented by that index value is determined as the target slave node group;

[0047] Based on the number of slave nodes in a slave node group, a non-power-of-2 hash operation is performed on the target address to obtain the index value within the slave node group;

[0048] The slave node in the target slave node group represented by the index value within the slave node group is determined as the slave node corresponding to the target address.
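The two-level lookup in [0044] to [0048] can be sketched as below. This is a sketch under assumptions rather than the disclosed calculation: the power-of-2 hash keeps the low bits of a 64-byte line index, the non-power-of-2 hash is a plain modulo, and consecutive consistency node groups are assumed to share a slave node group. All function names are hypothetical.

```python
from typing import Tuple

LINE_BITS = 6  # assumption: addresses interleave at 64-byte granularity


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def pow2_hash(address: int, buckets: int) -> int:
    """Placeholder power-of-2 hash: low bits of the line index."""
    if not is_power_of_two(buckets):
        raise ValueError("buckets must be an integer power of 2")
    return (address >> LINE_BITS) & (buckets - 1)


def non_pow2_hash(address: int, buckets: int) -> int:
    """Placeholder non-power-of-2 hash: line index modulo the bucket count."""
    return (address >> LINE_BITS) % buckets


def grouped_slave_for_address(
    address: int, num_slave_groups: int, slaves_per_group: int, num_groups: int
) -> Tuple[int, int]:
    """Return (slave node group index, index within that group)."""
    ratio, remainder = divmod(num_groups, num_slave_groups)
    if remainder or not is_power_of_two(ratio):
        raise ValueError("group count must be a power-of-2 multiple of the slave group count")
    consistency_group = pow2_hash(address, num_groups)
    slave_group = consistency_group // ratio                 # target slave node group
    intra_index = non_pow2_hash(address, slaves_per_group)   # slave node inside the group
    return slave_group, intra_index
```

For example, 6 slave nodes arranged as 2 groups of 3, with 4 consistency node groups, gives a ratio of 2, so every address resolves to one of the 6 (group, index) pairs.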

[0049] In one possible implementation, the method further includes:

[0050] The access node performs a power-of-2 hash operation on the target address based on the number of consistency node groups contained in the NUMA node to obtain the group index value.

[0051] The access node performs a non-power-of-2 hash operation on the target address based on the number of consistency nodes in a consistency node group to obtain the intra-group index value.

[0052] The access node determines the consistency node corresponding to the target address based on the group index value and the intra-group index value.

[0053] The consistency node corresponding to the target address manages the cache consistency of the target address.
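To make the roles of the group index value and the intra-group index value in [0050] to [0052] concrete, a hedged sketch follows. It assumes the group count is an integer power of 2, uses a low-bit mask and a modulo over a 64-byte line index as placeholder hashes, and returns the two indices as a pair; none of these choices, nor the function name, comes from the disclosure.

```python
from typing import Tuple

LINE_BITS = 6  # assumption: addresses interleave at 64-byte granularity


def consistency_node_for_address(
    address: int, num_groups: int, nodes_per_group: int
) -> Tuple[int, int]:
    """Return (group index value, intra-group index value) for an address."""
    if num_groups <= 0 or num_groups & (num_groups - 1):
        raise ValueError("the power-of-2 hash needs a power-of-2 group count")
    if nodes_per_group <= 0:
        raise ValueError("each group needs at least one consistency node")
    line = address >> LINE_BITS
    group_index = line & (num_groups - 1)   # placeholder power-of-2 hash
    intra_index = line % nodes_per_group    # placeholder non-power-of-2 hash
    return group_index, intra_index
```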

[0054] In one possible implementation, the access node determines the cross-chip nodes required to reach the target chip from the cross-chip nodes on the chip where the access node is located, based on the target address, including:

[0055] Obtain the number of cross-chip nodes on the chip where the access node is located;

[0056] When the number of cross-chip nodes is an integer power of 2 and is not 1, a power-of-2 hash operation is performed on the target address based on the number of cross-chip nodes to obtain the index value of the cross-chip node that must be passed through to reach the target chip;

[0057] And/or,

[0058] When the number of cross-chip nodes is not an integer power of 2, a non-power-of-2 hash operation is performed on the target address based on the number of cross-chip nodes to obtain the index value of the cross-chip node that must be passed through to reach the target chip;

[0059] And/or,

[0060] When the number of cross-chip nodes is 1, that cross-chip node is determined as the cross-chip node that must be passed through to reach the target chip.
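The three cases for selecting a cross-chip node can be sketched as a single function. The sketch assumes that a count that is not a power of 2 calls for a non-power-of-2 hash, in line with the slave node case, and it uses a low-bit mask and a modulo over a 64-byte line index as placeholder hashes. The function name and these hash choices are hypothetical.

```python
LINE_BITS = 6  # assumption: addresses interleave at 64-byte granularity


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def select_cross_chip_node(address: int, num_cross_chip_nodes: int) -> int:
    """Return the index of the cross-chip node to use as the next hop."""
    if num_cross_chip_nodes < 1:
        raise ValueError("cross-chip access needs at least one cross-chip node")
    if num_cross_chip_nodes == 1:
        return 0  # only one candidate, so no hash is needed
    line = address >> LINE_BITS
    if is_power_of_two(num_cross_chip_nodes):
        return line & (num_cross_chip_nodes - 1)  # placeholder power-of-2 hash
    return line % num_cross_chip_nodes            # placeholder non-power-of-2 hash
```

For a power-of-2 count the mask and the modulo give the same index; the distinction matters mainly in hardware, where a bit mask is far cheaper than a general remainder.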

[0061] In a second aspect, embodiments of this disclosure provide an on-chip bus interconnect network. The on-chip bus interconnect network includes multiple nodes located on one or more chips. The multiple nodes include one or more slave nodes, each of which is used to connect to one or more storage devices and receive access requests for corresponding storage addresses. The storage space provided by the one or more storage devices includes a preset storage space for storing debug trace data of the on-chip bus interconnect network. The target address is an address in the preset storage space. Any one of the multiple nodes acts as an access node, which is used to initiate access to the preset storage space;

[0062] The access node is used to determine the access type based on the attribute information of the non-uniform memory access node (NUMA node) to which the preset storage space belongs. The access type is either local access or cross-chip access;

[0063] The access node is used to determine the next-hop node based on the access type and the target address;

[0064] The access node is used to send the access request for accessing the target address to the next-hop node.

[0065] In a third aspect, embodiments of this disclosure provide a computer-readable medium having a computer program stored thereon, which, when executed by a processor, implements the method for accessing a target address in an on-chip bus interconnect network described in the first aspect.

[0066] In a fourth aspect, embodiments of this disclosure provide a computer program product including a computer program that, when executed by a processor, implements the method for accessing a target address in an on-chip bus interconnect network described in the first aspect.

[0067] In this embodiment, the on-chip bus interconnect network includes multiple nodes located on one or more chips. These nodes include one or more slave nodes, which connect to one or more storage devices and receive access requests for corresponding storage addresses. The storage space provided by the one or more storage devices includes a preset storage space for storing debug trace data of the on-chip bus interconnect network. The target address is an address within the preset storage space. Any one of the multiple nodes acts as an access node. This access node first determines whether the access type for the target address is local access or cross-chip access based on the attribute information of the non-uniform memory access (NUMA) node to which the preset storage space belongs. Then, it determines the next-hop node based on the access type and the target address, and finally sends the access request for accessing the target address to the next-hop node. In this way, any node in the on-chip bus interconnect network can directly initiate an access request to the target address and directly reach the node connected to the storage device. This eliminates the need for aggregation using a low-speed bus and a dedicated trace data bus, and also eliminates the need for a consistency node as an intermediary. This simplifies the routing path of the target address, achieves simple and efficient access, reduces packet loss probability, and improves transmission efficiency.

Attached Figure Description

[0068] In the accompanying drawings of the embodiments disclosed herein:

[0069] Figure 1 is a schematic diagram of the routing path for debugging and tracing in related technologies.

[0070] Figure 2 is a schematic diagram of a consistency transfer process in related technologies.

[0071] Figure 3 is a schematic diagram of a consistency transfer process in related technologies.

[0072] Figure 4 is a schematic diagram of a consistency transfer process in related technologies.

[0073] Figure 5 is a schematic diagram of the architecture of the on-chip bus interconnect network provided in the embodiments of this disclosure.

[0074] Figure 6 is a flowchart illustrating a method for accessing a target address in an on-chip bus interconnect network, as provided in an embodiment of this disclosure.

[0075] Figure 7 is a schematic diagram of the path for debugging tracing access provided in the embodiments of this disclosure.

[0076] Figure 8 is a schematic diagram of the path for debugging tracing access provided in the embodiments of this disclosure.

[0077] Figure 9 is a schematic diagram of TART processing provided in an embodiment of this disclosure.

Detailed Implementation

[0078] To enable those skilled in the art to better understand the technical solutions of this disclosure, the embodiments of this disclosure will be described in detail below with reference to the accompanying drawings.

[0079] The present disclosure will be described more fully below with reference to the accompanying drawings; however, the embodiments shown may be embodied in different forms, and the present disclosure should not be construed as limited to the embodiments set forth below. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will enable those skilled in the art to fully understand the scope of the disclosure.

[0080] The accompanying drawings are provided to further illustrate this disclosure and form part of the specification. They are used together with the detailed embodiments to explain this disclosure and do not constitute a limitation thereof. These and other features and advantages will become more apparent to those skilled in the art from the description of detailed embodiments with reference to the accompanying drawings.

[0081] Unless otherwise specified, each embodiment and feature of this disclosure may be used individually or in combination with other embodiments and features thereof.

[0082] Those skilled in the art will understand that various changes in form and detail may be made to the embodiments of this disclosure without departing from the scope of this disclosure as set forth by the appended claims.

[0083] The terminology used in this disclosure is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. The term "and/or" as used in this disclosure includes any and all combinations of one or more of the associated enumerated entries. The singular forms "a" and "the" as used in this disclosure are also intended to include the plural forms, unless the context clearly indicates otherwise. The terms "comprising," "made of," etc., as used in this disclosure specify the presence of the stated feature, integer, step, operation, element, and/or component, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

[0084] Unless otherwise specified, all terms used in this disclosure (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art. It will also be understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art and this disclosure, and will not be interpreted as having an idealized or overly formal meaning, unless expressly so defined in this disclosure.

[0085] This disclosure is not limited to the embodiments shown in the accompanying drawings, but includes modifications to the configuration based on the manufacturing process. Therefore, the areas illustrated in the drawings are schematic, and the shapes of the areas shown illustrate specific shapes of the areas of an element, but are not intended to be limiting.

[0086] For ease of understanding and description, some of the terms used in the embodiments of this disclosure will be explained below.

[0087] Request nodes (RNs) can be used to connect to processors or other devices and can initiate different types of access requests, such as read and write, to a consistent memory space.

[0088] A Coherent Home Node (CHN) can be used to cache cache lines in the memory address space it manages, maintaining cache coherency between the processor and storage devices. A cache line is the smallest data block in the cache.

[0089] Chip-to-Chip Protocol Gateway (C2C) can be used for cross-chip communication. Cross-chip communication generally uses a different transmission protocol than on-chip communication. For example, cross-chip communication can use the Chip-to-Chip (C2C) protocol, the Peripheral Component Interconnect Express (PCIE) bus protocol, etc.

[0090] Slave nodes (SNs) can be used to connect to Double Data Rate (DDR) or other types of storage devices to receive requests for access to memory space from the requesting device.

[0091] The Request Node Address Route Table (RART) is located in both the RN and C2C. It can be used to process the target address in the RN or C2C and find the destination node of the next level of routing, i.e., the next-hop node.

[0092] The Home Address Route Table (HART) is located in the CHN and can be used to process the destination address in the CHN and find the target node of the next level of routing, i.e., the next-hop node.

[0093] The debug trace component is a component for monitoring and debugging distributed systems. It can be used to help developers quickly locate the source of failure by recording key information and visualizing the transaction flow.

[0094] Network on Chip (NoC) is a communication network architecture implemented on an integrated circuit chip, used to connect various functional modules, processors, storage devices and other important components on the chip.

[0095] Interconnect buses can be used to connect all processors and storage devices on an on-chip network.

[0096] On-chip bus interconnect network refers to an on-chip network that connects the processor and storage devices through an interconnect bus.

[0097] In related technologies, debug trace components are distributed across the nodes of the on-chip bus interconnect network to monitor transaction routing information and performance information. Figure 1 is a schematic diagram of the routing path used for debugging and tracing in related technologies. Referring to Figure 1, when any node initiates a debug trace access request, the initiating node first aggregates the request, via a low-speed bus, to the input/output (I/O) consistency node containing the debug trace component. That I/O consistency node then aggregates the request to the I/O consistency request node via a dedicated trace data bus. Finally, the I/O consistency request node converts the debug trace access request into a write event on the consistency bus and sends that write event to the slave node through a consistency relay, so that the slave node can initiate access to the memory space.

[0098] The consistency relay process for write events on the consistency bus can follow one of three paths, determined mainly by the location of the accessed memory space. Each path is explained below with reference to Figure 2, Figure 3, and Figure 4.

[0099] Figure 2 is a schematic diagram of a consistency transfer process in related technologies. Referring to Figure 2, if the I/O consistency request node and the accessed memory space are located in the same non-uniform memory access node (NUMA node), the consistency relay path passes through, in sequence, the I/O consistency request node, the CHN of the local NUMA node, and the SN.

[0100] Figure 3 is a schematic diagram of a consistency transfer process in related technologies. Referring to Figure 3, if the I/O consistency request node and the accessed memory space are located in different NUMA nodes within the same chip, the consistency relay path passes through, in sequence, the I/O consistency request node, the CHN of the local NUMA node, the CHN of the remote NUMA node, and the SN.

[0101] Figure 4 is a schematic diagram of a consistency transfer process in related technologies. Referring to Figure 4, if the I/O consistency request node and the accessed memory space are located in different chips, the consistency relay path passes through, in sequence, the I/O consistency request node, the CHN of the local chip, the C2C, the CHN of the remote chip, and the SN.

[0102] In all three paths described above, whether or not the access crosses NUMA nodes or chips, the CHN that manages coherency for the address within the local NUMA node must be accessed first. If that CHN holds no valid data for the accessed memory address, the next level is determined by the location of the accessed memory space: the SN, the CHN of a different NUMA node within the same chip, or the C2C for access across chips.
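For comparison purposes, the three related-technology relay paths can be written down as a small lookup table. The dictionary keys and the function name are hypothetical labels chosen for this sketch; the hop sequences themselves follow the descriptions of Figure 2, Figure 3, and Figure 4.

```python
from typing import Dict, List

# Hop sequences of the related-technology consistency relay, keyed by where the
# accessed memory space sits relative to the I/O consistency request node.
RELAY_PATHS: Dict[str, List[str]] = {
    "same_numa_node": [
        "I/O consistency request node",
        "CHN (local NUMA node)",
        "SN",
    ],
    "other_numa_node_same_chip": [
        "I/O consistency request node",
        "CHN (local NUMA node)",
        "CHN (remote NUMA node)",
        "SN",
    ],
    "other_chip": [
        "I/O consistency request node",
        "CHN (local chip)",
        "C2C",
        "CHN (remote chip)",
        "SN",
    ],
}


def consistency_relay_path(location: str) -> List[str]:
    """Return a copy of the hop list for the given memory location label."""
    if location not in RELAY_PATHS:
        raise ValueError(f"unknown location: {location!r}")
    return list(RELAY_PATHS[location])
```

Every path visits a local CHN as its second hop, which is the common first step that the paragraph above points out.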

[0103] Similarly, processors and other host devices in the on-chip bus interconnect network can also initiate regular read and write transactions to the same memory space. In related technologies, these transactions are also accessed based on the consistency bus, and their path logic is consistent with the three consistency transfer paths mentioned above that start from the I / O consistency request node.

[0104] In related technologies, the operation of aggregation via low-speed bus and the operation of aggregation via dedicated tracking data bus have bandwidth limitations. As the number of processors and other devices integrated in the on-chip bus interconnect network increases, the network size increases, the probability of packet loss increases and the transmission efficiency decreases during the transmission of debugging and tracking data.

[0105] This disclosure provides an on-chip bus interconnect network, which includes multiple nodes located on one or more chips. The multiple nodes include one or more slave nodes, which are used to connect to one or more storage devices and receive access requests for corresponding storage addresses. The storage space provided by the one or more storage devices includes a preset storage space for storing debug trace data of the on-chip bus interconnect network. The target address is an address in the preset storage space. Any one of the multiple nodes acts as an access node, which is used to initiate access to the preset storage space.

[0106] In this embodiment of the disclosure, the on-chip bus interconnect network may include one or more chips, each chip may include multiple dies, and each die may include multiple nodes, which may include one or more request nodes (RNs), one or more coherent home nodes (CHNs), one or more slave nodes (SNs), and one or more cross-chip nodes (C2C).

[0107] The slave nodes of the on-chip bus interconnect network can be used to connect to one or more storage devices, which provide a preset storage space for storing debug trace data. Each node in the on-chip bus interconnect network can contain a debug trace component to access the preset storage space; that is, each node in the on-chip bus interconnect network can act as an access node. When the target address accessed by the access node is an address in the preset storage space, the access node can access the debug trace data.

[0108] In some embodiments, the on-chip bus interconnect network of this disclosure may adopt a symmetric multi-processing (SMP) architecture, on which all processors are equal and share system resources.

[0109] Figure 5 is a schematic diagram of the architecture of the on-chip bus interconnect network provided in an embodiment of this disclosure. Referring to Figure 5, the on-chip bus interconnect network includes Chip0 and Chip1, which are connected via C2C. Taking Chip0 as an example, it includes Die0 and Die1. Die0 includes two RNs, two CHNs, and two SNs; Die1 likewise includes two RNs, two CHNs, and two SNs. The on-chip bus interconnect network provides a memory space that includes a preset memory space for storing debug trace data. In Figure 5, the RN, CHN, SN, and C2C nodes each contain a debug trace component, which can access addresses in the preset storage space to access debug trace data.

[0110] It should be understood that Figure 5 is merely an example of an on-chip bus interconnect network according to embodiments of this disclosure and is not intended to limit the scope of on-chip bus interconnect networks. In embodiments of this disclosure, there are no limitations on the number of chips in the on-chip bus interconnect network, the number of C2C connections between any two chips, the number of dies contained in any chip, or the number of RNs, CHNs, and SNs contained in any die.

[0111] In this embodiment, the access node first determines whether the access type for the target address is local access or cross-chip access based on the attribute information of the non-uniform memory access (NUMA) node to which the preset storage space belongs. Then, it determines the next-hop node based on the access type and the target address, and finally sends the access request for accessing the target address to the next-hop node. In this way, any node in the on-chip bus interconnect network can directly initiate an access request to the target address and directly reach the node connected to the storage device. This eliminates the process of aggregation using a low-speed bus and a dedicated trace data bus, and also eliminates the relay process using a consistency node. This simplifies the routing path of the target address, achieves simple and efficient access, reduces the probability of packet loss, and improves transmission efficiency.

[0112] In a first aspect, embodiments of this disclosure provide a method for accessing a target address in an on-chip bus interconnect network. This method can be applied to the aforementioned on-chip bus interconnect network. Referring to Figure 6, the method for accessing a target address in an on-chip bus interconnect network provided in this embodiment of the disclosure may include:

[0113] S101, the access node determines the access type based on the attribute information of the NUMA node to which the preset storage space belongs.

[0114] The on-chip bus interconnect network includes one or more chips, each chip includes one or more dies, and each die's slave node can be connected to DDR or other types of storage devices as the storage space for the entire system, used to store the instructions and working data of multiple processors.

[0115] The dies or chips in the on-chip bus interconnect network can be flexibly grouped into NUMA node groups. Once the NUMA node group of each die or chip has been determined, it does not change for as long as the system remains powered on. In one example, dies on the same chip can be assigned to the same NUMA node group or to different NUMA node groups. Dies on different chips can also be assigned to the same NUMA node group.

[0116] In this embodiment of the disclosure, a contiguous address space is defined within the storage space provided by the on-chip interconnect network as a preset storage space for storing debug trace data. This address space is determined synchronously when dividing NUMA node groups and is fixed within a specific group of NUMA nodes. Since the routing attributes of the storage space within this specific NUMA node remain unchanged throughout the entire power-on period, the preset storage space embedded within the NUMA node's storage space also possesses the same routing attributes throughout the entire power-on period.

[0117] An access node is a node used to initiate access to the preset storage space. An access node can be any node in the on-chip bus interconnect network, such as any RN, CHN, SN, or C2C node of any die on any chip. This disclosure does not limit the access node.

[0118] The access node can determine the access type based on the attribute information of the NUMA node to which the preset storage space belongs. The access type is either local access or cross-chip access. Local access means the target address and the access node reside on the same chip. Cross-chip access means the target address and the access node reside on different chips.

[0119] The attribute information of the NUMA node to which the preset storage space belongs can be used to indicate the chip location of the NUMA node. In one example, the attribute information of the NUMA node to which the preset storage space belongs can be used to indicate whether the NUMA node to which the preset storage space belongs is cross-chip, that is, whether all the nodes contained in the NUMA node to which the preset storage space belongs are on one chip, or can be distributed across different chips. In another example, the attribute information of the NUMA node to which the preset storage space belongs can also be used to indicate the type of the NUMA node to which the preset storage space belongs, that is, whether the NUMA node to which the preset storage space belongs is on a local chip or on a remote chip. The process of determining the access type based on the attribute information of the NUMA node to which the preset storage space belongs will be explained in detail later, and will not be repeated here.

[0120] S102, the access node determines the next-hop node based on the access type and the target address.

[0121] When the access type is local access, it indicates that the target address belongs to the local chip's storage space, and the accessing node can find the target address within the local chip. In this case, the next-hop node is a slave node within the local chip. The accessing node can send the access request to this slave node, which can then directly access the target address, thereby enabling access to debug trace data. Since a chip can include one or more slave nodes, the accessing node also needs to determine the next-hop node from among these slave nodes based on the target address.

[0122] When the access type is cross-chip access, it indicates that the target address belongs to the storage space of another chip, and the accessing node cannot find the target address on its local chip. In this case, the next-hop node is the cross-chip node of the local chip. The accessing node can forward the access request to the cross-chip node of the other chip through the local chip's cross-chip node, and the cross-chip node of the other chip can directly access the target address, thereby enabling access to debug trace data. Since chips can be connected through one or more cross-chip nodes, the accessing node also needs to determine the next-hop node from one or more cross-chip nodes based on the target address.

[0123] S103, the access node sends the access request for accessing the target address to the next-hop node.

[0124] The access request can be used to access a target address. When the access type is local access, the accessing node sends the access request to the slave node of the local chip, which then directly accesses the target address. When the access type is cross-chip access, the accessing node can send the access request to the cross-chip node of the local chip, which then forwards it to the cross-chip nodes of other chips. The cross-chip nodes of other chips then access the target address through the slave node on the same chip.

[0125] In this embodiment, the access node first determines whether the access type for the target address is local access or cross-chip access, based on the attribute information of the non-uniform memory access (NUMA) node to which the preset storage space belongs. It then determines the next-hop node based on the access type and the target address, and finally sends the access request for the target address to the next-hop node. In this way, any node in the on-chip bus interconnect network can directly initiate an access request to the target address and reach the node connected to the storage device directly. This eliminates aggregation over a low-speed bus and a dedicated trace data bus, and also eliminates relaying through a consistency node. The routing path to the target address is thereby simplified, access is simple and efficient, the probability of packet loss is reduced, and transmission efficiency is improved.
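Steps S101 to S103 can be sketched as a small dispatch routine. This is only an illustration under stated assumptions, not the disclosed implementation: the `AccessType` enum, the `Node` record, the callback parameters, and the toy policies at the bottom are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class AccessType(Enum):
    LOCAL = "local"
    CROSS_CHIP = "cross-chip"


@dataclass(frozen=True)
class Node:
    """Hypothetical node record: a name and the chip it sits on."""
    name: str
    chip: int


def access_target(
    target_addr: int,
    classify: Callable[[int], AccessType],    # S101: determine the access type
    pick_slave: Callable[[int], Node],        # S102, local access case
    pick_cross_chip: Callable[[int], Node],   # S102, cross-chip access case
    send: Callable[[Node, int], None],        # S103: send the access request
) -> Node:
    access_type = classify(target_addr)
    if access_type is AccessType.LOCAL:
        next_hop = pick_slave(target_addr)
    else:
        next_hop = pick_cross_chip(target_addr)
    send(next_hop, target_addr)
    return next_hop


# Toy policies (assumptions): addresses below 0x1000 belong to the local chip.
sent: List[Tuple[str, int]] = []
hop = access_target(
    0x2040,
    classify=lambda a: AccessType.LOCAL if a < 0x1000 else AccessType.CROSS_CHIP,
    pick_slave=lambda a: Node("SN0", chip=0),
    pick_cross_chip=lambda a: Node("C2C0", chip=0),
    send=lambda node, a: sent.append((node.name, a)),
)
```

In this toy run the address lies outside the local range, so the request is handed to the local cross-chip node rather than to a slave node.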

[0126] The process for determining the access type is explained below.

[0127] In some embodiments, the process of determining the access type by the access node in step S101 based on the attribute information of the NUMA node to which the preset storage space belongs may include:

[0128] S1011, the access node obtains the attribute information of the NUMA node to which the preset storage space belongs, wherein the attribute information includes information indicating whether the NUMA node to which the preset storage space belongs is a cross-chip NUMA node.

[0129] The attribute information of the NUMA node to which the preset storage space belongs can be used to indicate whether the NUMA node to which the preset storage space belongs is cross-chip, that is, whether the nodes in the NUMA node to which the preset storage space belongs all belong to one chip or are distributed in different chips.

[0130] S1012, the access node determines the access type based on whether the NUMA node to which the preset storage space belongs is a cross-chip NUMA node.

[0131] In one possible implementation, step S1012, where the access node determines the access type based on whether the NUMA node to which the preset storage space belongs is a cross-chip NUMA node, may include:

[0132] S10121, if it is determined that the NUMA node to which the preset storage space belongs is a cross-chip NUMA node, the access node determines whether the slave node corresponding to the target address is located on the chip where the access node is located based on the target address.

[0133] S10122, if it is determined that the slave node corresponding to the target address is located on the chip where the access node is located, the access node determines that the access type is local access.

[0134] S10123, if it is determined that the slave node corresponding to the target address is not located on the chip where the access node is located, the access node determines that the access type is cross-chip access.

[0135] In this context, the slave node corresponding to the target address refers to the slave node that can directly access the target address. When the target address is located in the storage space of a storage device connected to a slave node, that slave node can be called the slave node corresponding to the target address.

[0136] When the NUMA node belonging to the preset storage space is a cross-chip NUMA node, it indicates that the NUMA nodes belonging to the preset storage space are distributed across different chips. In this case, we can first determine whether the slave node corresponding to the target address is located on the chip where the access node is located; that is, whether the chip to which the slave node corresponding to the target address belongs is the same chip as the chip to which the access node belongs. If they are the same chip, the target address can be accessed locally, so the access node can determine that the access type is local access. If they are not the same chip, the target address needs to be accessed from other chips, so the access node can determine that the access type is cross-chip access.

[0137] As one embodiment of this disclosure, when the NUMA node to which the preset storage space belongs is a cross-chip NUMA node, the access type is determined by further determining whether the chip of the slave node corresponding to the target address and the chip of the access node are the same chip.

[0138] In one possible implementation, step S1012, where the access node determines the access type based on whether the NUMA node to which the preset storage space belongs is a cross-chip NUMA node, may include:

[0139] S10124, if it is determined that the NUMA node to which the preset storage space belongs is not a cross-chip NUMA node, the access node determines whether the NUMA node is on the chip where the access node is located.

[0140] S10125, if it is determined that the NUMA node is on the chip where the access node is located, the access node determines that the access type is local access.

[0141] S10126, if it is determined that the NUMA node is not on the chip where the access node is located, the access node determines that the access type is cross-chip access.

[0142] In one example, if the NUMA node to which the preset storage space belongs is not a cross-chip NUMA node, the access node pre-stores NUMA node type information indicating whether the NUMA node is on the chip where the access node is located, wherein the attribute information of the NUMA node includes the NUMA node type information.

[0143] If the NUMA node to which the preset storage space belongs is not a cross-chip NUMA node, all nodes of that NUMA node are on a single chip, and the specific chip on which they are located is already determined and does not change during the entire power-on period. For a given access node, whether the chip of the slave node corresponding to the target address is the same chip as the chip of the access node is therefore also fixed. Consequently, the access node can pre-store NUMA node type information indicating whether the NUMA node is on the chip where the access node is located.

[0144] If it is determined that the NUMA node to which the preset storage space belongs is not a cross-chip NUMA node, the access node can determine whether the NUMA node to which the preset storage space belongs is on the chip where the access node is located, based on the type information stored on it in advance.

[0145] If the NUMA node to which the preset storage space belongs is located on the chip where the access node is located, the access node can determine that the access type is local access. If the NUMA node to which the preset storage space belongs is not located on the chip where the access node is located, the access node can determine that the access type is cross-chip access.

[0146] As one embodiment of this disclosure, when the NUMA node to which the preset storage space belongs is not a cross-chip NUMA node, it is possible to directly determine whether the NUMA node to which the preset storage space belongs is on the chip where the access node is located, thereby determining the access type.

[0147] This completes the determination of whether the access type is local access or cross-chip access.
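The two branches of step S1012 (S10121 to S10123 for a cross-chip NUMA node, S10124 to S10126 otherwise) can be sketched as follows. The `NumaAttr` record, its field names, and the `chip_of_slave` lookup are assumptions made for illustration; the disclosure does not specify how the attribute information is encoded.

```python
from dataclasses import dataclass
from typing import Callable, Optional

LOCAL = "local"
CROSS_CHIP = "cross-chip"


@dataclass(frozen=True)
class NumaAttr:
    """Hypothetical encoding of the NUMA node attribute information."""
    is_cross_chip: bool                    # obtained in S1011
    on_local_chip: Optional[bool] = None   # pre-stored type info, used when not cross-chip


def determine_access_type(
    attr: NumaAttr,
    target_addr: int,
    local_chip: int,
    chip_of_slave: Callable[[int], int],
) -> str:
    if attr.is_cross_chip:
        # S10121: locate the chip of the slave node that serves the address.
        same_chip = chip_of_slave(target_addr) == local_chip
        return LOCAL if same_chip else CROSS_CHIP   # S10122 / S10123
    # S10124: rely on the pre-stored type information.
    if attr.on_local_chip is None:
        raise ValueError("type information is required for a non-cross-chip NUMA node")
    return LOCAL if attr.on_local_chip else CROSS_CHIP   # S10125 / S10126


# Toy address map (assumption): address bit 12 selects the chip.
def toy_chip_of_slave(addr: int) -> int:
    return (addr >> 12) & 1
```

For a NUMA node that is not cross-chip, the address is never inspected: the answer comes entirely from the pre-stored type information.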

[0148] The following explains the process of determining the next-hop node when the access type is local access.

[0149] In some embodiments, step S102, where the access node determines the next-hop node based on the access type and the target address, may include:

[0150] S1021, when the access type is local access, the access node determines the slave node corresponding to the target address from the slave nodes on the chip where the access node is located, and uses it as the next hop node.

[0151] When the access type is local access, it indicates that the slave node corresponding to the target address is located on the chip where the access node is located. Therefore, the access node can determine the slave node corresponding to the target address from the slave nodes of the chip where it is located, and use it as the next hop node.

[0152] In some embodiments, the process of the access node determining the slave node corresponding to the target address from the slave nodes on the chip where the access node is located in step S1021 may include:

[0153] S10211, the access node uses a calculation method corresponding to the hash pattern information to determine the slave node corresponding to the target address.

[0154] The attribute information of the NUMA node to which the preset storage space belongs includes the hash pattern information of the NUMA node. The hash pattern information identifies the calculation method subsequently used to determine the slave node corresponding to the target address.

[0155] In one example, the hash pattern information includes at least one of the following: the number of slave nodes included in the NUMA node; whether the slave nodes included in the NUMA node are grouped; the number of slave node groups included in the NUMA node; and the number of consistency node groups included in the NUMA node. Different NUMA nodes can have different hash pattern information: the number of slave nodes may differ, the slave nodes may or may not be grouped, and if the slave nodes are grouped, the number of groups may also differ.

[0156] The access node can determine the calculation method based on the hash pattern information of the NUMA node to which the preset storage space belongs, and then use the calculation method to determine the slave node corresponding to the target address.

[0157] In some embodiments, step S10211, in which the access node uses a calculation method corresponding to the hash pattern information to determine the slave node corresponding to the target address, may include: the access node determining the calculation method corresponding to the hash pattern information based on whether the number of slave nodes contained in the NUMA node is an integer power of 2 and whether the slave nodes contained in the NUMA node are grouped; and the access node using the calculation method to determine the slave node corresponding to the target address.

[0158] In one possible implementation, when the number of slave nodes in the NUMA node is an integer power of 2, the slave nodes in the NUMA node are not grouped, the consistency nodes in the NUMA node are grouped, and the number of consistency node groups is a power-of-2 multiple of the number of slave nodes in the NUMA node, the calculation method is as follows:

[0159] S201, based on the multiple relationship between the number of consistency node groups and the number of slave nodes, establish a correspondence between consistency node groups and slave nodes, such that a power-of-2 number of consistency node groups corresponds to one slave node.

[0160] For example, suppose a NUMA node contains 4 slave nodes (2 to the power of 2) and 16 consistency node groups (2 to the power of 4). The number of consistency node groups is then 4 times the number of slave nodes, which is itself a power of 2. Therefore, 4 consistency node groups can correspond to one slave node.

[0161] It should be understood that in the embodiments of this disclosure, the integer powers of 2 include 2 to the power of 0, 2 to the power of 1, 2 to the power of 2, etc., which will not be elaborated here.

[0162] S202, based on the number of consistency node groups, perform a hash operation on the target address to obtain the index value of the consistency node group.

[0163] Taking 16 consistency node groups as an example, which is an integer power of 2, the target address can be processed by a conventional hash operation (such as modulo). The resulting hash value ranges from 0 to 15. This hash value is the index value of the consistency node group, and each hash value corresponds to one of the 16 consistency node groups.

[0164] S203, the slave node corresponding to the consistency node group represented by the index value of the consistency node group is determined as the slave node corresponding to the target address.

[0165] Based on the index value obtained in step S202, a consistency node group can be found. Based on the correspondence established in step S201, the slave node corresponding to that consistency node group can be found. This slave node is the slave node corresponding to the target address.

[0166] For example, if the index value of the consistency node group obtained in step S202 is 5, and the correspondence established in step S201 is that the consistency node groups with index values 0-3 correspond to slave node 1 and the consistency node groups with index values 4-7 correspond to slave node 2, then step S203 determines slave node 2, which corresponds to the consistency node group with index value 5, as the slave node corresponding to the target address.
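Steps S201 to S203 can be sketched in a few lines. The sketch assumes that the hash is a plain modulo of the address as given and that consecutive consistency node groups map to the same slave node, as in the example above; both are illustrative choices, and the function name is hypothetical. Indices are 0-based, so the text's "slave node 2" is index 1 here.

```python
from typing import Tuple


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def slave_via_consistency_groups(
    target_addr: int, num_groups: int, num_slaves: int
) -> Tuple[int, int]:
    """Return (consistency node group index, slave node index), both 0-based."""
    if not (is_power_of_two(num_groups) and is_power_of_two(num_slaves)):
        raise ValueError("both counts must be integer powers of 2")
    if num_groups % num_slaves != 0:
        raise ValueError("group count must be a multiple of the slave count")

    # S201: a power-of-2 number of groups corresponds to each slave node.
    groups_per_slave = num_groups // num_slaves

    # S202: for a power-of-2 count, modulo reduces to masking the low bits.
    group_index = target_addr & (num_groups - 1)

    # S203: look up the slave node that owns this group.
    slave_index = group_index // groups_per_slave
    return group_index, slave_index


# 16 groups, 4 slave nodes: address 0x35 falls in group 5, owned by slave index 1.
result = slave_via_consistency_groups(0x35, 16, 4)
```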

[0167] In one possible implementation, when the number of slave nodes included in the NUMA node is not an integer power of 2 and the slave nodes included in the NUMA node are not grouped, the calculation method is as follows:

[0168] S301, based on the number of slave nodes contained in the NUMA node, perform a non-power-of-2 hash operation on the target address to obtain the index value of the slave node, and determine the slave node corresponding to that index value as the slave node corresponding to the target address.

[0169] Since the number of slave nodes is not an integer power of 2 (such as 5 or 7), using a conventional modulo hash operation may result in an uneven distribution of index values, affecting access efficiency. Therefore, in this embodiment, when the number of slave nodes is not an integer power of 2, a non-power-of-2 hash operation (such as a multiply-add-shift hash algorithm or a Fibonacci hash algorithm) is used, so that the resulting slave node index values are evenly distributed over 0 to n-1, where n is the number of slave nodes, thereby accurately determining the slave node corresponding to the target address.

[0170] For example, if the number of slave nodes is 5, an index value of 3 can be obtained through a hash operation that is not an integer power of 2. The slave node represented by index value 3 is the slave node corresponding to the target address.
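One way to realize a non-power-of-2 hash of the kind named above is Fibonacci hashing followed by a multiply-shift range reduction. The sketch below is an assumption about how such a hash could look, not the hash used in the disclosure; the constant is the usual 64-bit golden-ratio multiplier, and the address is hashed as given, whereas a hardware design would likely drop the cache-line offset bits first.

```python
MASK64 = (1 << 64) - 1
GOLDEN64 = 0x9E3779B97F4A7C15  # floor(2**64 / golden ratio), an odd constant


def non_pow2_index(target_addr: int, n: int) -> int:
    """Map an address to an index in 0..n-1 for any n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Fibonacci hashing: multiply by the golden-ratio constant, keep 64 bits.
    h = (target_addr * GOLDEN64) & MASK64
    # Multiply-shift range reduction: scale the 64-bit hash into [0, n).
    return (h * n) >> 64


# Spread 5000 consecutive addresses over 5 slave nodes and count the hits.
counts = [0] * 5
for addr in range(5000):
    counts[non_pow2_index(addr, 5)] += 1
```

Each of the five counters ends up close to 1000, which is the even spread over 0 to n-1 that the paragraph above asks for.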

[0171] In one possible implementation, when the number of slave nodes included in the NUMA node is not an integer power of 2, the slave nodes included in the NUMA node are grouped, the consistency nodes included in the NUMA node are grouped, and the number of consistency node groups included in the NUMA node is a power-of-2 multiple of the number of slave node groups included in the NUMA node, the calculation method is as follows:

[0172] S401, based on the multiple relationship between the number of consistency node groups and the number of slave node groups, establish a correspondence between the consistency node groups and the slave node groups, such that a power-of-2 number of consistency node groups corresponds to one slave node group.

[0173] For example, suppose a NUMA node contains 6 slave nodes (not a power of 2), divided into 2 slave node groups of 3 slave nodes each. The number of consistency node groups is 8, which is 4 times the number of slave node groups (a power-of-2 multiple). The ratio between the number of consistency node groups and the number of slave node groups is therefore 4:1, and the established correspondence is 4 consistency node groups for every 1 slave node group.

[0174] S402, perform a hash operation on the target address based on the number of consistency node groups contained in the NUMA node to obtain the index value of the consistency node group.

[0175] Assuming there are 8 consistency node groups, the target address is processed using a conventional hash algorithm, and the resulting hash value ranges from 0 to 7. This hash value is the index value of the consistency node group.

[0176] S403, the slave node group corresponding to the consistency node group represented by the index value of the consistency node group is determined as the target slave node group.

[0177] The consistency node group can be found based on the index value obtained in step S402. Based on the correspondence established in step S401, the corresponding slave node group, i.e., the target slave node group, can be found. For example, if the index value of the consistency node group obtained in step S402 is 5, and the correspondence established in step S401 is that index values 0-3 correspond to slave node group 1 and index values 4-7 correspond to slave node group 2, then the target slave node group is slave node group 2.

[0178] S404, based on the number of slave nodes in a slave node group, perform a non-power-of-2 hash operation on the target address to obtain the intra-group index value within the slave node group.

[0179] Assuming that the number of slave nodes in each slave node group is 3 (not an integer power of 2), the target address is processed using a non-power-of-2 hash operation, resulting in an index value ranging from 0 to 2. This index value is the intra-group index value within the slave node group.

[0180] S405, within the slave node group corresponding to the consistency node group represented by the index value of the consistency node group, the slave node represented by the intra-group index value is determined as the slave node corresponding to the target address.

[0181] By combining the target slave node group determined in step S403 and the intra-group index value determined in step S404, the slave node corresponding to the target address can be found in the target slave node group. For example, if the target slave node group is slave node group 2 (including slave node 4, slave node 5, and slave node 6) and the intra-group index value is 0, then the slave node corresponding to the target address is slave node 4.
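Steps S401 to S405 combine a power-of-2 hash for the group with a non-power-of-2 hash inside the group. The sketch below reuses a modulo mask for the first and a Fibonacci hash with multiply-shift reduction for the second; both hash choices, the contiguous group mapping, and the function names are assumptions for illustration. Indices are 0-based, so the text's "slave node group 2" is group 1 and "slave node 4" is flat index 3.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1
GOLDEN64 = 0x9E3779B97F4A7C15  # 64-bit golden-ratio multiplier


def non_pow2_index(target_addr: int, n: int) -> int:
    """Fibonacci hash reduced to 0..n-1 by multiply-shift."""
    return (((target_addr * GOLDEN64) & MASK64) * n) >> 64


def slave_in_grouped_numa(
    target_addr: int,
    num_cn_groups: int,      # consistency node groups, a power of 2
    num_slave_groups: int,   # slave node groups
    slaves_per_group: int,   # may be any positive count, e.g. 3
) -> Tuple[int, int, int]:
    """Return (slave group index, intra-group index, flat slave index)."""
    if num_cn_groups < 1 or num_cn_groups & (num_cn_groups - 1):
        raise ValueError("consistency node group count must be a power of 2")
    if num_cn_groups % num_slave_groups != 0:
        raise ValueError("group counts must divide evenly")

    cn_groups_per_slave_group = num_cn_groups // num_slave_groups   # S401
    cn_group = target_addr & (num_cn_groups - 1)                    # S402
    slave_group = cn_group // cn_groups_per_slave_group             # S403
    intra = non_pow2_index(target_addr, slaves_per_group)           # S404
    return slave_group, intra, slave_group * slaves_per_group + intra  # S405


# 8 consistency node groups, 2 slave node groups of 3 slave nodes each.
picked = slave_in_grouped_numa(5, 8, 2, 3)
```

With these toy hashes, address 5 lands in consistency node group 5, hence slave group index 1 with intra-group index 0, which mirrors the worked example in the preceding paragraphs.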

[0182] The following explains the process of determining the next-hop node when the access type is cross-chip access.

[0183] In some embodiments, at least one node in each chip's nodes is a cross-chip node, and each cross-chip node is connected to cross-chip nodes on other chips. In some embodiments, step S102, in which the access node determines the next-hop node based on the access type and the target address, may include:

[0184] S1022, when the access type is cross-chip access, the access node determines the cross-chip nodes that need to be passed through to reach the target chip from the cross-chip nodes on the chip where the access node is located according to the target address, and uses them as the next hop nodes. The target chip is the chip where the slave node corresponding to the target address is located.

[0185] In the case of cross-chip access, it indicates that the slave node corresponding to the target address is not located on the chip where the accessing node is located. The accessing node may need to forward the access request to the target chip through a cross-chip node. The target chip is the chip where the slave node corresponding to the target address is located. Therefore, the accessing node can determine the cross-chip node required to reach the target chip from the cross-chip nodes of the chip where the accessing node is located, based on the target address.

[0186] In some embodiments, the process in step S1022 where the access node determines the cross-chip nodes required to reach the target chip from the cross-chip nodes on the chip where the access node is located, based on the target address, may include:

[0187] S10221, Obtain the number of cross-chip nodes on the chip where the access node is located;

[0188] S10222, when the number of cross-chip nodes is an integer power of 2 other than 1, perform a power-of-2 hash operation on the target address based on the number of cross-chip nodes to obtain the index value of the cross-chip node required to reach the target chip; and/or, when the number of cross-chip nodes is not an integer power of 2, perform a non-power-of-2 hash operation on the target address based on the number of cross-chip nodes to obtain the index value of the cross-chip node required to reach the target chip; and/or, when the number of cross-chip nodes is 1, determine that cross-chip node as the cross-chip node required to reach the target chip.

[0189] If the number of cross-chip nodes is an integer power of 2 other than 1, such as 2, 4, or 8, a power-of-2 hash operation (such as a modulo operation) can be used to process the target address. The resulting index value maps exactly onto one of the cross-chip nodes, and the computation is efficient and inexpensive.

[0190] If the number of cross-chip nodes is not an integer power of 2, such as 3, 5, or 7, a non-power-of-2 hash operation (such as a multiply-add-shift hash algorithm) can be used to avoid an uneven distribution of index values and to keep the selection of cross-chip nodes balanced.

[0191] If the number of cross-chip nodes is 1, no hash calculation is needed. The only cross-chip node can be directly determined as the next-hop node, which simplifies the process of determining the next-hop node.
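The three cases of step S10222 can be sketched as a single selection function. The concrete hashes (a low-bit mask for power-of-2 counts, a Fibonacci hash with multiply-shift reduction otherwise) are assumptions chosen for illustration, as is the function name.

```python
MASK64 = (1 << 64) - 1
GOLDEN64 = 0x9E3779B97F4A7C15  # 64-bit golden-ratio multiplier


def pick_cross_chip_node(target_addr: int, num_cross_chip_nodes: int) -> int:
    """Return the 0-based index of the local cross-chip node to use."""
    n = num_cross_chip_nodes          # S10221: count of local cross-chip nodes
    if n < 1:
        raise ValueError("a chip needs at least one cross-chip node")
    if n == 1:
        return 0                      # single node: no hashing needed
    if n & (n - 1) == 0:
        return target_addr & (n - 1)  # power of 2: modulo via bit mask
    # Not a power of 2: Fibonacci hash, then multiply-shift into 0..n-1.
    h = (target_addr * GOLDEN64) & MASK64
    return (h * n) >> 64


single = pick_cross_chip_node(0x1234, 1)
pow2 = pick_cross_chip_node(6, 4)
odd = pick_cross_chip_node(6, 3)
```

Because the index depends only on the target address and the node count, every access to the same address leaves the chip through the same cross-chip node.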

[0192] The above completes the process of determining the next-hop node under different access types.

[0193] After the access node sends the access request to the cross-chip node, the access to the target address has not yet been completed, therefore further processing by the cross-chip node is required. In some embodiments, the method for accessing a target address in an on-chip bus interconnect network provided in this disclosure may further include:

[0194] S104, the cross-chip node receives the access request and sends the access request to the cross-chip node on the target chip connected to it.

[0195] After receiving the access request from the access node, the cross-chip node on the chip where the access node is located sends it to the cross-chip node of the target chip.

[0196] S105, the cross-chip node on the target chip determines the slave node corresponding to the target address and sends the access request to the slave node.

[0197] Since the cross-chip node of the target chip and the slave node corresponding to the target address are on the same chip, the cross-chip node of the target chip can directly send the access request to the slave node. For this step, refer to step S1021; the details are not repeated here.
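The resulting path for both access types can be written out as a hop list. This sketch assumes the access chip and the target chip are directly connected and uses made-up node labels; the `slave_on` lookup stands in for the slave node selection of step S1021 and is hypothetical.

```python
from typing import Callable, List


def hop_trace(
    access_chip: int,
    target_chip: int,
    target_addr: int,
    slave_on: Callable[[int, int], int],
) -> List[str]:
    """List the nodes an access request visits after leaving the access node."""
    slave = slave_on(target_chip, target_addr)
    if access_chip == target_chip:
        # Local access: the slave node is the next hop (S103).
        return [f"chip{access_chip}/SN{slave}"]
    return [
        f"chip{access_chip}/C2C",        # S103: next hop on the local chip
        f"chip{target_chip}/C2C",        # S104: forwarded across chips
        f"chip{target_chip}/SN{slave}",  # S105: slave node on the target chip
    ]


# Toy slave selection (assumption): the lowest address bit picks the slave node.
def toy_slave_on(chip: int, addr: int) -> int:
    return addr & 1


local_path = hop_trace(0, 0, 0x40, toy_slave_on)
remote_path = hop_trace(0, 1, 0x41, toy_slave_on)
```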

[0198] In addition to meeting the aforementioned address range definitions and routing paths, it is also necessary to maintain the independence of consistency routing across the bus network. This means that, for the entire on-chip bus interconnect network, an access to a given address must arrive at the same destination node regardless of the path taken. Besides the debug trace component accessing addresses in the preset storage space, processors and other host devices in the system also initiate regular read/write transactions to the same address range. If the target addresses of both types of access are the same, the debug trace access and the regular read/write access must hit the same storage device.

[0199] In this embodiment of the disclosure, while implementing debug trace routes, it is also necessary to maintain the independence of consistency routes on the original bus network, ensuring that debug trace routes and conventional storage device routes reach the same destination nodes. In some embodiments, the method for accessing a target address in an on-chip bus interconnect network provided in this embodiment of the disclosure may further include:

[0200] S106, the access node performs a power-of-2 hash operation on the target address based on the number of consistency node groups contained in the NUMA node, and obtains the group index value.

[0201] The access node first obtains the number of consistency node groups within the NUMA node. In this embodiment, the number of groups is configured as an integer power of 2, such as 4, 8, or 16. Therefore, a power-of-2 hash operation can efficiently and accurately obtain the group index value. For example, if the number of consistency node groups is 8, the hash value obtained after hashing the target address will be in the range of 0-7. This hash value is the group index value, and each hash value corresponds to one consistency node group.

[0202] S107, the access node performs a non-power-of-2 hash operation on the target address based on the number of consistency nodes in a consistency node group to obtain the intra-group index value.

[0203] Considering that the number of consistency nodes within a single consistency node group may not be an integer power of 2, such as 3 or 5, a non-power-of-2 hash operation can be used. Taking a consistency node group with 3 consistency nodes as an example, a non-power-of-2 hash operation can yield a uniformly distributed intra-group index value in the range of 0-2, ensuring accurate matching of the specific consistency node within the group.

[0204] S108, the access node determines the consistency node corresponding to the target address based on the group index value and the intra-group index value.

[0205] The access node first locates the consistency node group through the group index value, and then finds the specific consistency node within that group through the intra-group index value. The combination of the two uniquely determines the consistency node corresponding to the target address.

[0206] S109, the consistency node corresponding to the target address manages the cache consistency of the target address.

[0207] Consistency nodes manage the copies of the data corresponding to a target address that are held in the various caches, ensuring that different nodes accessing the data retrieve the latest and consistent content. By maintaining the independence of consistency routes, interference with regular storage device access routes during debugging and tracing can be avoided. This ensures that the target nodes (slave nodes and consistency nodes) of both types of routes remain the same, satisfying debugging and tracing requirements without affecting the efficiency and consistency of normal data access.
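Steps S106 to S108 can be sketched as a two-level lookup that depends only on the target address and the NUMA node's configuration, which is what lets a debug trace access and a regular access to the same address resolve to the same consistency node. The hash functions and the function name are assumptions for illustration.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1
GOLDEN64 = 0x9E3779B97F4A7C15  # 64-bit golden-ratio multiplier


def consistency_node_for(
    target_addr: int, num_groups: int, nodes_per_group: int
) -> Tuple[int, int]:
    """Return (group index, intra-group index) of the consistency node."""
    if num_groups < 1 or num_groups & (num_groups - 1):
        raise ValueError("the number of groups must be an integer power of 2")
    if nodes_per_group < 1:
        raise ValueError("each group needs at least one consistency node")

    # S106: power-of-2 hash (modulo via mask) selects the group.
    group_index = target_addr & (num_groups - 1)

    # S107: non-power-of-2 hash (Fibonacci hash, multiply-shift reduction)
    # selects the node inside the group.
    h = (target_addr * GOLDEN64) & MASK64
    intra_index = (h * nodes_per_group) >> 64

    # S108: the pair uniquely identifies one consistency node.
    return group_index, intra_index


# Both kinds of access use the same pure function of the address.
from_debug_trace = consistency_node_for(6, 8, 3)
from_regular_access = consistency_node_for(6, 8, 3)
```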

[0208] In this embodiment of the disclosure, the on-chip bus interconnect network uses a coherent bus to connect multiple processors and connects to a storage device. The on-chip bus interconnect network has distributed content and involves multi-core coherence.

[0209] The method for accessing a target address in an on-chip bus interconnect network provided in this disclosure can be applied to server chips or chip systems. Server chips have high computational demands, requiring the connection of multiple processor vectors via a coherence bus and support for services across multiple debug and tracing address spaces. Chip systems consist of multiple chips and need to support debug and tracing services across these multiple chips.

[0210] In this embodiment, the debug tracing route omits the data aggregation step and, unlike the consistency route, bypasses the consistency node, so that the slave node connected to the storage device is accessed directly, achieving direct access to the target address. At the same time, the independence of the consistency route on the original bus network is maintained, ensuring that the debug tracing route and the regular data access route reach the same destination node.

[0211] Secondly, embodiments of this disclosure provide an on-chip bus interconnect network, wherein the on-chip bus interconnect network includes multiple nodes located on one or more chips, the multiple nodes include one or more slave nodes, the slave nodes are used to connect to one or more storage devices and receive access requests for corresponding storage addresses, the storage space provided by the one or more storage devices includes a preset storage space for storing debug trace data of the on-chip bus interconnect network, the target address is an address in the preset storage space, any one of the multiple nodes is used as an access node, the access node is used to initiate access to the preset storage space;

[0212] The access node is used to determine the access type based on the attribute information of the non-uniform memory access node (NUMA node) to which the preset storage space belongs. The access type is either local access or cross-chip access.

[0213] The access node is used to determine the next-hop node based on the access type and the target address;

[0214] The access node is used to send the access request for accessing the target address to the next-hop node.

[0215] Thirdly, embodiments of this disclosure provide a computer-readable medium storing a computer program that, when executed by a processor, implements any of the methods for accessing a target address in an on-chip bus interconnect network according to embodiments of this disclosure.

[0216] Fourthly, embodiments of this disclosure provide a computer program product, which includes a computer program that, when executed by a processor, implements any of the methods for accessing a target address in an on-chip bus interconnect network according to embodiments of this disclosure.

[0217] For ease of understanding, the preset storage space used for storing debugging and tracking data in the embodiments of this disclosure will be described below.

[0218] In this embodiment, the on-chip bus interconnect network includes multiple chips, each chip includes multiple dies, and each die can be equipped with DDR or other types of storage devices. These storage devices together serve as the storage space for the entire system. In a distributed storage architecture, the dies or chips in the on-chip bus interconnect network can be flexibly grouped into NUMA nodes according to application requirements. Once the grouping is confirmed, it will not be changed during power-on.

[0219] In this embodiment, a contiguous address space is defined as a preset storage space for storing debug trace data. This address space is determined synchronously during NUMA node grouping and is fixed within a specific group of NUMA nodes. Throughout the power-on period, the routing attributes of this specific NUMA node remain unchanged. Therefore, the preset storage space embedded within this NUMA node also has the same routing attributes. This allows the address matching unit used for NUMA node location to be removed from the TART component, greatly simplifying the component's complexity.

[0220] To distinguish debug trace data between different devices, each device in the on-chip interconnect bus network is assigned a non-overlapping continuous access address. These addresses all fall within the address range of a preset memory space. Finally, based on the address where the debug trace data is stored, it can be accurately determined which device the debug trace data comes from.
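As an illustration of how a stored address identifies its source device, the following sketch looks an address up in a table of non-overlapping contiguous ranges. The base addresses, sizes, and device names are hypothetical and are not taken from the source.

```python
from bisect import bisect_right

# Hypothetical layout inside the preset storage space:
# (base address, size in bytes, device name), sorted by base address.
TRACE_REGIONS = [
    (0x8000_0000, 0x1000, "RN0"),
    (0x8000_1000, 0x1000, "RN1"),
    (0x8000_2000, 0x2000, "CHN0"),
]
BASES = [base for base, _, _ in TRACE_REGIONS]


def device_for_address(addr: int):
    """Return the device whose trace range contains addr, or None."""
    i = bisect_right(BASES, addr) - 1  # last region starting at or below addr
    if i < 0:
        return None
    base, size, name = TRACE_REGIONS[i]
    return name if addr < base + size else None


print(device_for_address(0x8000_1FFF))
```

Because the ranges do not overlap, each address resolves to at most one device.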

[0221] In this embodiment, the process of aggregating trace data via a low-speed bus and a dedicated bus is first eliminated. However, if a route using the consistency bus is initiated directly from any node, RART needs to be built into all nodes to address the CHN, resulting in excessive area overhead. Therefore, in this embodiment, based on the characteristic that debug trace data does not need to maintain consistency, the consistency relay process via the CHN node is further omitted. Since the CHN node relay is no longer needed, the routing when any node accesses debug trace data is simplified. In this embodiment, the routing when any node accesses debug trace data is divided into two paths according to the access type: one corresponding to cross-chip access (see Figure 7) and the other corresponding to local access (see Figure 8).

[0222] Figure 7 is a schematic diagram of the debug tracing access path for cross-chip access provided in an embodiment of this disclosure. Referring to Figure 7, if the NUMA node to which the preset storage space belongs and the access node are not on the same chip, the path passes through, in sequence: any node, C2C, and SN. Here, SN is the SN corresponding to the target address, and C2C includes the C2C on the chip where that node is located and the C2C on the chip where the SN corresponding to the target address is located. Any access node can directly address using its built-in TART. Since the target address is managed by the SN of the remote chip, after TART processing, the access request is directly routed to the C2C node within the local chip (i.e., the chip where the access node is located) for cross-chip access. Upon reaching the target chip (i.e., the chip where the SN corresponding to the target address is located), the C2C of the target chip also uses its built-in TART for further processing, thus directly routing to the SN corresponding to the target address within the target chip.

[0223] Figure 8 is a schematic diagram of the debug tracing access path for local access provided in an embodiment of this disclosure. Referring to Figure 8, if the NUMA node to which the preset storage space belongs and the access node are located on the same chip, the path passes through, in sequence: any node and the SN. Here, the SN is the SN on the chip where the access node resides. The access node can directly address the target address using the built-in TART component. Since the target address is managed by the SN of the local chip, TART processing can directly route the request to the SN managing that address within the local chip.

[0224] This disclosure provides a method for accessing a target address in an on-chip bus interconnect network. Each node directly initiates write transactions based on the consistency bus and directly reaches the storage device. Compared with related technologies, this method eliminates the steps of low-speed bus and trace data dedicated bus aggregation and conversion. At the same time, when initiating write transactions using the consistency bus, the relay process of consistency nodes is eliminated, achieving simple and efficient access, saving bandwidth, reducing packet loss, and improving efficiency.

[0225] The processing procedure of TART in this embodiment of the present disclosure will be described below with reference to examples.

[0226] Example

[0227] This disclosure provides a TART routing component distributed across various nodes in an on-chip bus interconnect network, including but not limited to RN, CHN, SN, and C2C. Any node can initiate access to a target address via its built-in TART, thereby increasing bandwidth, reducing packet loss, and improving efficiency.

[0228] TART is distributed across various nodes, enabling it to bypass the data aggregation steps via the low-speed bus and dedicated trace bus, directly initiating write transactions to the storage device via the consistent bus. Furthermore, because debug traces do not require consistency maintenance, it can bypass consistent nodes and reach the storage device directly, further saving routing overhead. Simultaneously, TART can maintain the independence of consistent routes on the original bus network, ensuring consistency between the destination nodes for debug trace data and regular data routes.

[0229] First, a preset storage space for storing debug trace data is defined to exist only in a specific NUMA node during the entire power-on period. After grouping NUMA nodes, the routing attributes within all NUMA nodes will not change during power-on, such as the number of CHN nodes and the number of SN nodes.

[0230] Based on the above definition, in this embodiment of the disclosure, any access request for debug trace data initiated by any node will have a target address that falls within a specific NUMA node. Therefore, TART no longer needs address matching modules for multiple different address ranges. Any access request for debug trace data initiated on any node only needs one set of hash calculation units, eliminating the need for multiple sets of hash calculation units for multiple different address ranges. Thus, the structure of TART is greatly simplified.

[0231] TART can determine the access type based on the attribute information of the NUMA node to which the preset storage space belongs, and then process different access types differently. In this embodiment of the disclosure, the access types are divided into local access and cross-chip access, and TART can process local access and cross-chip access separately.

[0232] Figure 9 is a schematic diagram of TART processing provided in an embodiment of this disclosure. Referring to Figure 9, the TART can include a hash attribute arbitration processing unit, an SN hash calculation unit, a C2C hash calculation unit, an SN identifier lookup table, and a C2C identifier lookup table. After receiving the target address, the TART can calculate and output either the SN target identifier or the C2C target identifier. The following sections describe each part of the TART.

[0233] The hash attribute arbitration processing unit can perform the following arbitration processing based on the access type determined by the attribute information of the NUMA node to which the preset storage space belongs:

[0234] If multiple chips are assigned to a single NUMA node, the target address is first hashed using a power of 2 based on the number of chips. For example, if two chips are assigned to a NUMA node, the target address is hashed using a power of 2, resulting in a hash value of 0 or 1. Then, this hash value is used as an index value, and based on whether the index value is 0 or 1, the target address is assigned to the local or remote SN management module.

[0235] If multiple chips are not grouped into a single NUMA node, whether the preset storage space is local or remote relative to the current node is already determined when the overall architecture is configured. In this case, it can be directly configured whether the preset storage space belongs to the local chip or to a remote chip relative to the current node.

[0236] After the hash attribute arbitration processing unit is configured, when a target address enters TART, its access type can be determined as either local access or cross-chip access. Then, based on the access type, the target address is input to the corresponding hash calculation unit. For example, if the access type is local access, it is input to the SN hash calculation unit; if the access type is cross-chip access, it is input to the C2C hash calculation unit. The SN and C2C hash calculation units are explained below.
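The arbitration described above can be sketched as follows. The field names, the use of low line-address bits as the chip hash, and the convention that one hash value denotes the local chip are all assumptions made for illustration; the source only states that a power-of-2 hash over the number of chips is used when several chips share a NUMA node, and that the attribute is preconfigured otherwise.

```python
from dataclasses import dataclass
from enum import Enum


class AccessType(Enum):
    LOCAL = "local"
    CROSS_CHIP = "cross-chip"


@dataclass(frozen=True)
class NumaAttr:
    spans_chips: bool              # several chips grouped into this NUMA node?
    num_chips: int = 1             # power of 2 when spans_chips is True
    local_chip_index: int = 0      # hypothetical: hash value meaning "this chip"
    configured_local: bool = True  # used only when spans_chips is False


def chip_hash(addr: int, num_chips: int) -> int:
    """Assumed power-of-2 hash: low bits of the 64-byte line address."""
    return (addr >> 6) & (num_chips - 1)


def arbitrate(addr: int, attr: NumaAttr) -> AccessType:
    if attr.spans_chips:
        # The hash over the chip count decides local versus remote per address.
        hit_local = chip_hash(addr, attr.num_chips) == attr.local_chip_index
        return AccessType.LOCAL if hit_local else AccessType.CROSS_CHIP
    # Otherwise the attribute was fixed when the architecture was configured.
    return AccessType.LOCAL if attr.configured_local else AccessType.CROSS_CHIP


two_chip_node = NumaAttr(spans_chips=True, num_chips=2, local_chip_index=0)
print(arbitrate(0x00, two_chip_node), arbitrate(0x40, two_chip_node))
```

A LOCAL result would feed the SN hash calculation unit and a CROSS_CHIP result the C2C hash calculation unit.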

[0237] When the slave node corresponding to the target address is located on the chip where the access node is located, in order to ensure that the independence of the consistent route is maintained even though the CHN node is bypassed, one of three calculation methods is used, selected according to the hash pattern information of the NUMA node to which the preset storage space belongs (also known as the hit NUMA node).

[0238] The first scenario involves a NUMA node whose number of slave nodes is a power of 2, whose slave nodes are not grouped, and whose consistent nodes are grouped, with the number of such groups being a power of 2 multiple of the number of slave nodes in the NUMA node. In this case, TART can establish a correspondence between consistent node groups and slave nodes based on the multiple relationship between the number of consistent node groups and the number of slave nodes, such that a power of 2 consistent node groups correspond to one slave node. Based on the number of consistent node groups, a hash operation is performed on the target address to obtain the index value of the consistent node group. The slave node corresponding to the consistent node group represented by the index value of the consistent node group is then determined as the slave node corresponding to the target address.

[0239] For regular consistent routing access that requires a consistent node as an intermediary, the access node performs a power-of-2 hash operation on the target address based on the number of consistent node groups in the NUMA node to obtain the group index value. The access node then performs a power-of-2 hash operation based on the number of consistent nodes in a consistent node group to obtain the intra-group index value. The access node determines the consistent node corresponding to the target address based on the group index value and the intra-group index value. The consistent node corresponding to the target address manages the cache consistency of the target address.

[0240] Since all consistent nodes in a set of consistent nodes correspond to a single, specific slave node, the independence of consistent routing is achieved.
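The first scenario can be illustrated with a small sketch that compares the direct TART route with the regular route through a consistent node. The concrete hash (low line-address bits) and the contiguous assignment of consistent node groups to slave nodes are assumptions; the source only requires that a power-of-2 number of groups corresponds to one slave node.

```python
LINE_BITS = 6  # assumption: 64-byte lines


def chn_group_index(addr: int, num_chn_groups: int) -> int:
    """Assumed power-of-2 hash over the number of consistent node groups."""
    return (addr >> LINE_BITS) & (num_chn_groups - 1)


def sn_for_chn_group(group_idx: int, num_chn_groups: int, num_sn: int) -> int:
    """Assumed correspondence: `ratio` consecutive groups share one slave node."""
    ratio = num_chn_groups // num_sn
    return group_idx // ratio


def tart_direct_sn(addr: int, num_chn_groups: int, num_sn: int) -> int:
    """Direct route: hash to a group index, then map the group to its SN."""
    for n in (num_chn_groups, num_sn):
        if n <= 0 or n & (n - 1):
            raise ValueError("counts must be integer powers of 2")
    if num_chn_groups < num_sn:
        raise ValueError("need at least one consistent node group per SN")
    group_idx = chn_group_index(addr, num_chn_groups)
    return sn_for_chn_group(group_idx, num_chn_groups, num_sn)


def regular_route_sn(addr: int, num_chn_groups: int, chn_per_group: int, num_sn: int):
    """Regular route: pick a consistent node, which forwards to its group's SN."""
    group_idx = chn_group_index(addr, num_chn_groups)
    intra_idx = ((addr >> LINE_BITS) // num_chn_groups) % chn_per_group
    return (group_idx, intra_idx), sn_for_chn_group(group_idx, num_chn_groups, num_sn)


# 8 consistent node groups, 4 nodes per group, 2 slave nodes
addr = 0x140
print(tart_direct_sn(addr, 8, 2), regular_route_sn(addr, 8, 4, 2))
```

Whichever consistent node in a group is selected, the slave node depends only on the group index, so both routes end at the same slave node.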

[0241] The second scenario involves a NUMA node whose number of slave nodes is not a power of 2 and whose slave nodes are not grouped. In this case, the CHN grouping method is no longer necessary. For TART, based on the number of slave nodes in the NUMA node, a hash operation is performed on the target address to obtain the index value of the slave node. The slave node corresponding to the index value is then determined as the slave node corresponding to the target address, thus evenly distributing the address among the slave nodes.

[0242] At this point, for regular consistent routing access that requires a consistent node as an intermediary, the hashing algorithm in the CHN is no longer important. It is sufficient to pre-determine that all CHNs within the storage space are hashed using a non-power of 2 integer based on the number of SNs, thus evenly distributing the addresses of all CHNs across these SNs. This achieves the independence of consistent routing.
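For the second scenario, a minimal sketch of a non-power-of-2 hash is shown below, using modulo on the line address. Modulo is an assumed stand-in, since the source does not name the hash function. The short check counts how many consecutive lines land on each slave node.

```python
from collections import Counter

LINE_BITS = 6  # assumption: 64-byte lines


def sn_index(addr: int, num_sn: int) -> int:
    """Assumed non-power-of-2 hash: line address modulo the number of SNs."""
    return (addr >> LINE_BITS) % num_sn


# 300 consecutive lines spread over 3 slave nodes
counts = Counter(sn_index(line << LINE_BITS, 3) for line in range(300))
print(sorted(counts.items()))
```

With this choice, consecutive lines rotate through the slave nodes, so a contiguous trace region is spread evenly.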

[0243] The third scenario is when the number of slave nodes contained in the hit NUMA node is not a power of 2, the hit NUMA node contains slave node groups and consistent node groups, and the number of consistent node groups in the hit NUMA node is a power-of-2 multiple of the number of slave node groups in the hit NUMA node. In this case, TART establishes a correspondence between consistent node groups and slave node groups based on the multiple relationship between the number of consistent node groups and the number of slave node groups, such that an integer power of 2 consistent node groups correspond to one slave node group. A hash operation is performed on the target address based on the number of consistent node groups contained in the NUMA node to obtain the index value of the consistent node group. The slave node group corresponding to the consistent node group represented by that index value is determined as the target slave node group. Based on the number of slave nodes in a slave node group, a non-power-of-2 hash operation is performed on the target address to obtain the index value within the slave node group. The slave node represented by that intra-group index value within the target slave node group is determined as the slave node corresponding to the target address. In this way, the address is evenly distributed among the slave nodes.

[0244] For regular consistent routing access requiring a consistent node as an intermediary, the access node performs a power-of-2 hash operation on the target address based on the number of consistent node groups in the NUMA node to obtain the group index value. The access node then performs a non-power-of-2 hash operation based on the number of consistent nodes within a group to obtain the intra-group index value. The access node determines the consistent node corresponding to the target address based on the group index value and the intra-group index value. The consistent node corresponding to the target address manages the cache consistency of the target address. The difference from the first method lies in the final step: a non-power-of-2 hash operation is configured based on the number of SNs corresponding to each consistent node group, further distributing the CHNs within the group across different SNs. This achieves the independence of consistent routing.
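The third scenario combines both hash types. The sketch below assumes low line-address bits for the power-of-2 step, modulo for the non-power-of-2 step, and a contiguous assignment of consistent node groups to slave node groups; none of these choices is specified in the source.

```python
from collections import Counter

LINE_BITS = 6  # assumption: 64-byte lines


def scenario3_sn(addr: int, num_chn_groups: int, num_sn_groups: int, sn_per_group: int):
    """Return (slave node group index, index within that group)."""
    if num_chn_groups <= 0 or num_chn_groups & (num_chn_groups - 1):
        raise ValueError("number of consistent node groups must be a power of 2")
    if num_chn_groups % num_sn_groups:
        raise ValueError("consistent node groups must be a multiple of SN groups")
    ratio = num_chn_groups // num_sn_groups
    if ratio & (ratio - 1):
        raise ValueError("the multiple must be an integer power of 2")
    line = addr >> LINE_BITS
    chn_group = line & (num_chn_groups - 1)  # power-of-2 hash
    sn_group = chn_group // ratio            # group-to-group correspondence
    intra = line % sn_per_group              # non-power-of-2 hash inside group
    return sn_group, intra


# 4 consistent node groups, 2 slave node groups of 3 slave nodes each (6 SNs)
print(scenario3_sn(0x1C0, 4, 2, 3))
spread = Counter(scenario3_sn(line << LINE_BITS, 4, 2, 3) for line in range(12))
print(sorted(spread.items()))
```

Over 12 consecutive lines each of the 6 slave nodes is selected the same number of times, which illustrates the even distribution the paragraph describes.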

[0245] When the slave node corresponding to the target address is not located on the chip where the access node is located, TART can obtain the number of cross-chip nodes on the chip where the access node is located; if the number of cross-chip nodes is an integer power of 2 and not 1, a hash operation of an integer power of 2 is performed on the target address based on the number of cross-chip nodes to obtain the index value of the cross-chip nodes that need to be passed to reach the target chip; and / or if the number of cross-chip nodes is not an integer power of 2, a hash operation of a non-integer power of 2 is performed on the target address based on the number of cross-chip nodes to obtain the index value of the cross-chip nodes that need to be passed to reach the target chip; and / or if the number of cross-chip nodes is 1, the cross-chip node is determined as the cross-chip node that needs to be passed to reach the target chip.
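The three cases for choosing a cross-chip node can be sketched as a single selection function. The masking and modulo operations are assumed stand-ins for the unspecified power-of-2 and non-power-of-2 hash operations.

```python
LINE_BITS = 6  # assumption: 64-byte lines


def select_c2c_index(addr: int, num_c2c: int) -> int:
    """Pick the index of the cross-chip node on the local chip to route through."""
    if num_c2c < 1:
        raise ValueError("a chip needs at least one cross-chip node")
    if num_c2c == 1:
        return 0                       # single C2C: no hash needed
    line = addr >> LINE_BITS
    if num_c2c & (num_c2c - 1) == 0:
        return line & (num_c2c - 1)    # power-of-2 count: mask the low bits
    return line % num_c2c              # any other count: modulo


for n in (1, 4, 3):
    print(n, select_c2c_index(0x1C0, n))
```

For a power-of-2 count the mask and the modulo give the same value; the mask is shown separately only to mirror the distinction made in the text.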

[0246] Since cross-chip access does not affect the final result as long as the target chip can be accessed correctly, C2C does not require special restrictions and can be handled according to the actual situation.

[0247] Referring to Figure 9, after the SN hash calculation unit outputs the index value of the SN, TART can query the SN identifier lookup table based on the index value obtained from the hash calculation to find the SN target identifier. In one example, the SN target identifier can represent the identifier of the next hop when the next hop is an SN. The SN target identifier can be represented by its position coordinates (x, y) on the on-chip bus interconnect network. SN(0)tgtid represents the SN target identifier when the index value is 0, and SN(m)tgtid represents the SN target identifier when the index value is m.

[0248] After the C2C hash calculation unit outputs the C2C index value, TART can query the C2C identifier lookup table based on the hash-calculated index value to find the C2C target identifier. In one example, the C2C target identifier can represent the identifier of the next hop when the next hop is a C2C. The C2C target identifier can be represented by its position coordinates (x, y) on the on-chip bus interconnect network. C2C(0)tgtid represents the C2C target identifier when the index value is 0, and C2C(k)tgtid represents the C2C target identifier when the index value is k.
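The final lookup stage can be sketched as two small tables indexed by the hash result. The coordinates and table sizes below are hypothetical, and the modulo hash is an assumption standing in for the SN and C2C hash calculation units.

```python
LINE_BITS = 6  # assumption: 64-byte lines

# Hypothetical lookup tables: index value -> target identifier as (x, y)
SN_TGTID = {0: (0, 3), 1: (2, 3), 2: (4, 3)}
C2C_TGTID = {0: (5, 0), 1: (5, 1)}


def tart_next_hop(addr: int, is_local: bool):
    """Return (node kind, target identifier) of the next hop for addr."""
    line = addr >> LINE_BITS
    if is_local:
        # Local access: SN hash unit, then SN identifier lookup table.
        return "SN", SN_TGTID[line % len(SN_TGTID)]
    # Cross-chip access: C2C hash unit, then C2C identifier lookup table.
    return "C2C", C2C_TGTID[line % len(C2C_TGTID)]


print(tart_next_hop(0x1C0, True))
print(tart_next_hop(0x1C0, False))
```

The access type is taken as an input here; in the described component it would come from the hash attribute arbitration processing unit.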

[0249] In this embodiment, compared to related technologies, first, the data aggregation steps via a low-speed bus and a dedicated trace data bus are omitted, reducing packet loss and improving transmission efficiency; second, because debug trace data does not require consistency maintenance, the CHN relay in the consistency bus routing is dropped, which also greatly improves transmission efficiency; finally, because the location of the preset storage space is restricted, an extremely streamlined and efficient routing component (i.e., TART) is provided that maps directly to the SN while maintaining the independence of consistent routing. According to comprehensive simulation, compared to the RART component used in the original I/O consistency bus, the area of the current component is reduced from 9150 square nanometers to 384 square nanometers, greatly reducing the component complexity.

[0250] This disclosure implements routing support for direct write operations to storage devices. In on-chip interconnect bus networks composed of multiple chips and dies, storage device access initiated by a requesting device typically requires caching the managed memory space address cache lines through the CHN to maintain cache consistency between the processor and the storage device. Each critical node in the routing includes a Node Address Routing Table (ART) to find the destination node for the next level of routing. Distributed debug tracing access does not require maintaining cache consistency, thus bypassing the CHN. The routing scheme that directly maps devices to storage devices can reduce path latency and improve system performance.

[0251] At the same time, the independence of consistent routing on the original bus network is maintained. The preset address space is a contiguous block of memory. For debug trace access and regular DDR access to the same address, the final destination node of the route must be the same in order to maintain the independence of consistent routing on the bus network. Therefore, the same destination node must be reached even when no consistency node (CHN) acts as an intermediary.

[0252] Those skilled in the art will understand that all or some of the steps, systems, and devices disclosed above, as functional modules / units, can be implemented as software, firmware, hardware, or suitable combinations thereof.

[0253] In hardware implementations, the division between functional modules / units mentioned in the above description does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be executed by several physical components working together.

[0254] Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit (CPU), digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit (ASIC). Such software may be distributed on a computer-readable medium, which may include computer storage media and communication media. In embodiments of this disclosure, computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information, and any other media that can be used to store desired information and can be accessed by a computer. In embodiments of this disclosure, communication media typically contain computer-readable instructions, data structures, program modules, or other data in modulated data signals such as carrier waves or other transmission mechanisms, and may include any information delivery medium.

Claims

1. A method for accessing a target address in an on-chip bus interconnect network, characterized in that, The on-chip bus interconnect network includes multiple nodes located on one or more chips. The multiple nodes include one or more slave nodes. The slave nodes are used to connect to one or more storage devices and receive access requests for corresponding storage addresses. The storage space provided by the one or more storage devices includes a preset storage space for storing debug trace data of the on-chip bus interconnect network. The target address is an address within the preset storage space. Any one of the multiple nodes acts as an access node, which initiates access to the preset storage space. The method includes: The access node determines the access type based on the attribute information of the non-uniform memory access node (NUMA node) to which the preset storage space belongs. The access type is either local access or cross-chip access. The access node determines the next-hop node based on the access type and the target address; The access node sends the access request for accessing the target address to the next-hop node; Wherein, the access node determines the next-hop node based on the access type and the target address, including: When the access type is local access, the access node determines the slave node corresponding to the target address from the slave nodes on the chip where the access node is located, and uses it as the next-hop node; The attribute information of the NUMA node to which the preset storage space belongs includes the hash pattern information of the NUMA node. The access node determines the slave node corresponding to the target address from the slave nodes on the chip where the access node is located, including: The access node uses a calculation method corresponding to the hash pattern information to determine the slave node corresponding to the target address; The hash pattern information includes at least one of the following: The number of slave nodes included in the NUMA node; Whether the slave nodes included in the NUMA node are grouped; The number of slave node groups contained in the NUMA node; The number of consistent node groups contained in the NUMA node.

2. The method of claim 1, wherein, The access node determines the access type based on the attribute information of the non-uniform memory access node (NUMA node) to which the preset storage space belongs, including: The access node obtains the attribute information of the NUMA node to which the preset storage space belongs, wherein the attribute information of the NUMA node to which the preset storage space belongs includes information used to indicate whether the NUMA node to which the preset storage space belongs is a cross-chip NUMA node. The access node determines the access type based on whether the NUMA node to which the preset storage space belongs is a cross-chip NUMA node.

3. The method of claim 2, wherein, The access node determines the access type based on whether the NUMA node to which the preset storage space belongs is a cross-chip NUMA node, including: If it is determined that the NUMA node to which the preset storage space belongs is a cross-chip NUMA node, the access node determines, based on the target address, whether the slave node corresponding to the target address is located on the chip where the access node is located; If it is determined that the slave node corresponding to the target address is located on the chip where the access node is located, the access node determines that the access type is local access; If it is determined that the slave node corresponding to the target address is not located on the chip where the access node is located, the access node determines that the access type is cross-chip access.

4. The method of claim 2, wherein, The access node determines the access type based on whether the NUMA node to which the preset storage space belongs is a cross-chip NUMA node, including: If it is determined that the NUMA node to which the preset storage space belongs is not a cross-chip NUMA node, the access node determines whether the NUMA node is on the chip where the access node is located; If it is determined that the NUMA node is located on the chip where the access node is located, the access node determines that the access type is local access; If it is determined that the NUMA node is not on the chip where the access node is located, the access node determines that the access type is cross-chip access.

5. The method of claim 4, wherein, When the NUMA node is not a cross-chip NUMA node, the access node pre-stores NUMA node type information indicating whether the NUMA node is on the chip where the access node is located, wherein the attribute information of the NUMA node includes the NUMA node type information.

6. The method of claim 1, wherein, At least one node among the nodes of each chip is a cross-chip node, and each of the cross-chip nodes is connected to cross-chip nodes on other chips. The access node determines the next-hop node based on the access type and the target address, including: When the access type is cross-chip access, the access node determines, based on the target address, the cross-chip node that needs to be passed through to reach the target chip from the cross-chip nodes on the chip where the access node is located, and uses it as the next-hop node. The target chip is the chip where the slave node corresponding to the target address is located.

7. The method according to claim 6, characterized in that, The method further includes: The cross-chip node receives the access request and sends the access request to the cross-chip node on the target chip connected to it; The cross-chip node on the target chip determines the slave node corresponding to the target address and sends the access request to the slave node.

8. The method according to claim 1, characterized in that, The access node uses a calculation method corresponding to the hash pattern information to determine the slave node corresponding to the target address, including: The access node determines the calculation method corresponding to the hash pattern information based on whether the number of slave nodes contained in the NUMA node is an integer power of 2 and whether the slave nodes contained in the NUMA node are grouped. The access node uses the calculation method described above to determine the slave node corresponding to the target address.

9. The method according to claim 8, characterized in that, When the number of slave nodes in the NUMA node is a power of 2, the slave nodes in the NUMA node are not grouped, and the consistent nodes in the NUMA node are grouped and the number of consistent node groups is a power-of-2 multiple of the number of slave nodes in the NUMA node, the calculation method is as follows: Based on the multiple relationship between the number of consistent node groups and the number of slave nodes, a correspondence between consistent node groups and slave nodes is established, such that an integer power of 2 consistent node groups correspond to one slave node. Based on the number of consistent node groups, a hash operation is performed on the target address to obtain the index value of the consistent node group; The slave node corresponding to the consistent node group represented by the index value of the consistent node group is determined as the slave node corresponding to the target address.

10. The method according to claim 8, characterized in that, When the number of slave nodes contained in the NUMA node is not a power of 2 and the slave nodes contained in the NUMA node are not grouped, the calculation method is as follows: Based on the number of slave nodes contained in the NUMA node, a non-power-of-2 hash operation is performed on the target address to obtain the index value of the slave node, and the slave node corresponding to the index value of the slave node is determined as the slave node corresponding to the target address.

11. The method according to claim 8, characterized in that, when the number of slave nodes contained in the NUMA node is not an integer power of 2, the slave nodes contained in the NUMA node are grouped, the consistency nodes contained in the NUMA node are grouped, and the number of consistency node groups contained in the NUMA node is an integer-power-of-2 multiple of the number of slave node groups contained in the NUMA node, the calculation method is as follows: based on the multiple relationship between the number of consistency node groups and the number of slave node groups, a correspondence between consistency node groups and slave node groups is established, such that an integer-power-of-2 number of consistency node groups corresponds to one slave node group; a hash operation is performed on the target address based on the number of consistency node groups contained in the NUMA node to obtain the index value of the consistency node group; the slave node group corresponding to the consistency node group represented by that index value is determined as the target slave node group; based on the number of slave nodes in a slave node group, a non-power-of-2 hash operation is performed on the target address to obtain the intra-group index value within the target slave node group; and the slave node represented by the intra-group index value within the target slave node group is determined as the slave node corresponding to the target address.
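
The two-level selection of claim 11 can be sketched as a group-level hash followed by an intra-group hash. As before, the concrete hash functions are assumptions (XOR-folding for the power-of-2 step, modulo for the non-power-of-2 step), as are the contiguous group assignment, the 64-byte line size and all function names.

```python
from typing import Tuple


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def pow2_hash(addr: int, buckets: int, line_bits: int = 6) -> int:
    """Assumed power-of-2 hash: XOR-fold the line address."""
    if not is_power_of_two(buckets):
        raise ValueError("buckets must be an integer power of 2")
    width = buckets.bit_length() - 1
    if width == 0:
        return 0
    value, index = addr >> line_bits, 0
    while value:
        index ^= value & (buckets - 1)
        value >>= width
    return index


def non_pow2_hash(addr: int, buckets: int, line_bits: int = 6) -> int:
    """Assumed non-power-of-2 hash: line address modulo the bucket count."""
    return (addr >> line_bits) % buckets


def slave_in_grouped_numa(addr: int, num_slave_groups: int,
                          slaves_per_group: int, num_cn_groups: int) -> Tuple[int, int]:
    """Return (target slave node group, intra-group index) for a target address."""
    if num_slave_groups <= 0 or slaves_per_group <= 0:
        raise ValueError("group count and group size must be positive")
    if num_cn_groups % num_slave_groups:
        raise ValueError("consistency node groups must be a multiple of slave node groups")
    cn_groups_per_slave_group = num_cn_groups // num_slave_groups
    cn_group = pow2_hash(addr, num_cn_groups)
    # Assumed correspondence: consecutive consistency node groups share a slave node group.
    target_group = cn_group // cn_groups_per_slave_group
    intra_index = non_pow2_hash(addr, slaves_per_group)
    return target_group, intra_index
```

For 6 slave nodes arranged as 2 groups of 3, with 4 consistency node groups, `slave_in_grouped_numa(0x1C0, 2, 3, 4)` selects slave node group 1 and intra-group index 1.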

12. The method according to any one of claims 9 to 11, characterized in that the method further includes: the access node performs a power-of-2 hash operation on the target address based on the number of consistency node groups contained in the NUMA node to obtain a group index value; the access node performs a non-power-of-2 hash operation on the target address based on the number of consistency nodes in a consistency node group to obtain an intra-group index value; and the access node determines the consistency node corresponding to the target address based on the group index value and the intra-group index value, wherein the consistency node corresponding to the target address manages the cache consistency of the target address.

13. The method according to claim 6, characterized in that the access node determining, based on the target address and from the cross-chip nodes on the chip where the access node is located, the cross-chip node that needs to be traversed to reach the target chip includes: obtaining the number of cross-chip nodes on the chip where the access node is located; when the number of cross-chip nodes is an integer power of 2 and is not 1, performing a power-of-2 hash operation on the target address based on the number of cross-chip nodes to obtain the index value of the cross-chip node that needs to be traversed to reach the target chip; or, when the number of cross-chip nodes is not an integer power of 2, performing a non-power-of-2 hash operation on the target address based on the number of cross-chip nodes to obtain the index value of the cross-chip node that needs to be traversed to reach the target chip; or, when the number of cross-chip nodes is 1, determining that cross-chip node as the cross-chip node that needs to be traversed to reach the target chip.
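
The three cases of claim 13 can be sketched as a single selection function. The hash functions below are assumptions (XOR-folding for a power-of-2 count, modulo otherwise), and the function names and 64-byte line size are hypothetical; the patent text in this section does not specify them.

```python
def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def pow2_hash(addr: int, buckets: int, line_bits: int = 6) -> int:
    """Assumed power-of-2 hash: XOR-fold the line address."""
    width = buckets.bit_length() - 1
    if width == 0:
        return 0
    value, index = addr >> line_bits, 0
    while value:
        index ^= value & (buckets - 1)
        value >>= width
    return index


def non_pow2_hash(addr: int, buckets: int, line_bits: int = 6) -> int:
    """Assumed non-power-of-2 hash: line address modulo the bucket count."""
    return (addr >> line_bits) % buckets


def select_cross_chip_node(addr: int, num_cross_chip_nodes: int) -> int:
    """Return the index of the cross-chip node used to reach the target chip."""
    if num_cross_chip_nodes < 1:
        raise ValueError("the chip must have at least one cross-chip node")
    if num_cross_chip_nodes == 1:
        return 0  # the only cross-chip node is used directly
    if is_power_of_two(num_cross_chip_nodes):
        return pow2_hash(addr, num_cross_chip_nodes)
    return non_pow2_hash(addr, num_cross_chip_nodes)
```

Under these assumptions, the same target address `0x1C0` maps to index 0 with one cross-chip node, index 2 with four, and index 1 with three.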

14. An on-chip bus interconnection network, characterized in that the on-chip bus interconnection network includes multiple nodes located on one or more chips; the multiple nodes include one or more slave nodes; the slave nodes are used to connect to one or more storage devices and to receive access requests for corresponding storage addresses; the storage space provided by the one or more storage devices includes a preset storage space for storing debug trace data of the on-chip bus interconnection network; the target address is an address in the preset storage space; any one of the multiple nodes is used as an access node, which is used to initiate access to the preset storage space; the access node is used to determine the access type based on the attribute information of the non-uniform memory access node (NUMA node) to which the preset storage space belongs, the access type being either local access or cross-slice access; the access node is used to determine the next-hop node based on the access type and the target address; the access node is used to send the access request for accessing the target address to the next-hop node; determining the next-hop node based on the access type and the target address includes: when the access type is local access, determining the slave node corresponding to the target address from the slave nodes on the chip where the access node is located, and using it as the next-hop node; the attribute information of the NUMA node to which the preset storage space belongs includes the hash mode information of the NUMA node, and determining the slave node corresponding to the target address from the slave nodes on the chip where the access node is located includes: determining the slave node corresponding to the target address by using a calculation method corresponding to the hash mode information; the hash mode information includes at least one of the following: the number of slave nodes contained in the NUMA node; whether the slave nodes contained in the NUMA node are grouped; the number of slave node groups contained in the NUMA node; the number of consistency node groups contained in the NUMA node.

15. A computer-readable medium, characterized in that it stores a computer program that, when executed by a processor, implements the method for accessing a target address in an on-chip bus interconnection network as described in any one of claims 1 to 13.

16. A computer program product, characterized in that it includes a computer program that, when executed by a processor, implements the method for accessing a target address in an on-chip bus interconnection network as described in any one of claims 1 to 13.