Apparatus and method for processing in-order transactions
By introducing an ordered channel indicator into the interconnect, the requester element can adjust the signal release timing according to the channel attributes, which solves the problem that the requester element cannot recognize the ordered channel, realizes more efficient ordered transaction processing, and improves system performance.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- ARM LTD
- Filing Date
- 2020-09-21
- Publication Date
- 2026-06-12
AI Technical Summary
In the interconnect, the requester element cannot effectively recognize the existence of ordered channels. It therefore adopts an inefficient ordering process when handling ordered transactions and fails to realize the potential performance benefits.
By introducing an ordered channel indication into the communication channel between the completer element and the requester element, the requester element can adjust the timing at which it issues signals according to the ordered channel indication, and can dynamically switch between ordering processes to improve efficiency.
Where an ordered channel is present, the requester element can employ a more optimized ordering process, improving system performance; the gain in processing efficiency is especially significant when the completer element is not a point of serialization.

Figure CN114651242B_ABST
Abstract
Description
Background Technology
[0001] This technology relates to apparatus and methods for processing ordered transactions.
[0002] Interconnects can be used to provide connections between multiple components within an apparatus. Some of these components can be requester elements that issue transactions, while others can be completer elements that process those transactions.
[0003] In some cases, a sequence of transactions issued by a particular requester element needs to be processed in order; such transactions are referred to herein as ordered transactions. An interconnect can be arranged to provide a communication channel between each requester element and each completer element, but whether to impose any ordering constraints on a particular communication channel is typically a decision for the interconnect designer. Adding ordering constraints to all channels is generally expensive in terms of hardware cost and overall system performance, so ordered channels are typically provided only where deemed necessary. If an ordered channel is provided between a particular requester element and a particular completer element, an efficient ordering process can be employed to handle a series of ordered transactions routed through that ordered communication channel. However, if no ordered communication channel is available, a less efficient ordering process needs to be deployed to ensure that the ordered transactions are actually processed in order.
[0004] However, typically, when issuing a particular transaction within a sequence of ordered transactions, the requester element does not know which completer element will handle the transaction, or whether an ordered channel exists in the interconnect between the requester element and that completer element. Therefore, in general, the requester element employs a less efficient ordering process when handling ordered transactions to ensure that the ordering constraints are met, and thus fails to realize the potential performance benefits that could be achieved when an ordered channel does in fact exist between the requester element and the completer element.
[0005] Therefore, it would be desirable to provide a technique that improves the handling of ordered transactions.
Summary of the Invention
[0006] In one exemplary arrangement, an apparatus is provided comprising: a plurality of completer elements to process transactions; a requester element to issue a sequence of ordered transactions; and an interconnect to provide, for each completer element, a communication channel between that completer element and the requester element for the transfer of signals between the completer element and the requester element; wherein: a given completer element processing a given transaction in the sequence is arranged to issue a response signal to the requester element via its associated communication channel, the response signal including an ordered channel indication identifying whether the associated communication channel has an ordered channel property, the ordered channel property guaranteeing that processing of transactions issued in a given order by the requester element via the associated communication channel will be completed by the given completer element in that same given order; and the requester element is arranged to control, in response to the ordered channel indication, the timing of issuance of at least one signal from the requester element, the at least one signal being associated with one or more transactions following the given transaction in the sequence.
[0007] In another exemplary arrangement, a method of handling ordered transactions is provided, comprising: employing a plurality of completer elements to process transactions; employing a requester element to issue a sequence of ordered transactions; providing, for each completer element, a communication channel between that completer element and the requester element for the transfer of signals between the completer element and the requester element; causing a given completer element processing a given transaction in the sequence to issue a response signal to the requester element via its associated communication channel, the response signal including an ordered channel indication identifying whether the associated communication channel has an ordered channel property, the ordered channel property guaranteeing that processing of transactions issued in a given order by the requester element via the associated communication channel will be completed by the given completer element in that same given order; and arranging the requester element, in response to the ordered channel indication, to control the timing of issuance of at least one signal from the requester element, the at least one signal being associated with one or more transactions following the given transaction in the sequence.
[0008] In yet another exemplary arrangement, an apparatus is provided comprising: a plurality of completer element devices for processing transactions; a requester element device for issuing a sequence of ordered transactions; and an interconnect device for providing, for each completer element device, a communication channel between that completer element device and the requester element device for the transfer of signals between them; wherein: a given completer element device processing a given transaction in the sequence is arranged to issue a response signal to the requester element device via its associated communication channel, the response signal including an ordered channel indication identifying whether the associated communication channel has an ordered channel property, the ordered channel property guaranteeing that processing of transactions issued in a given order by the requester element device via the associated communication channel will be completed by the given completer element device in that same given order; and the requester element device is configured, in response to the ordered channel indication, to control the timing of issuance of at least one signal from the requester element device, the at least one signal being associated with one or more transactions following the given transaction in the sequence.
Attached Figure Description
[0009] The present technology will be described further, by way of illustration only, with reference to examples thereof as illustrated in the accompanying drawings, in which:
[0010] Figure 1 is a block diagram of an apparatus in accordance with one exemplary implementation;
[0011] Figure 2 is a block diagram illustrating an apparatus in accordance with another exemplary implementation;
[0012] Figure 3 is a timing diagram illustrating a write stream flow that can be employed in one exemplary implementation;
[0013] Figure 4 is a timing diagram showing some inefficiencies that may arise when the write stream flow is used and the completer element is not a point of serialization;
[0014] Figure 5 is a timing diagram illustrating a write tunneling flow, which can be used to improve the efficiency of handling a series of ordered write transactions when the completer element is not a point of serialization but there is an ordered channel between the requester element and the completer element;
[0015] Figure 6 is a flowchart illustrating how, in one exemplary implementation, the ordered channel indication provided by a completer element is used to enable dynamic switching between the write stream flow and the write tunneling flow;
[0016] Figures 7A to 7D illustrate specific exemplary use cases within the apparatus of Figure 2 that can be supported when using the techniques of Figures 3 to 6;
[0017] Figure 8 is a timing diagram illustrating how multiple transactions to the same address would typically be handled without using the techniques described herein;
[0018] Figure 9 is an equivalent timing diagram illustrating the performance improvements that can be achieved by employing the techniques described herein when handling multiple transactions to the same address; and
[0019] Figure 10 is a flowchart illustrating the operation of the requester element in one exemplary implementation, in order to make use of the technique illustrated in Figure 9.
Detailed Implementation
[0020] In many interconnect designs, transactions may be reordered between requester and completer elements, since such reordering can help improve overall system performance. However, in some cases the requester element may need to issue an ordered sequence of transactions, and it is then necessary to guarantee that the transactions will be processed in the order in which they are issued. Typically, in such cases, the requester element needs to employ an ordering process for the issuance of transactions which ensures that an ordered transaction is only released after some form of acknowledgement has been received for an older transaction.
[0021] However, some interconnect designs provide ordered channels between at least some of the requester-completer pairs connected to the interconnect. Where an ordered channel is provided, it guarantees that processing of transactions issued by the requester element in a particular order through that channel will be completed in that same order.
[0022] However, when a requester element issues a transaction, it typically does not know which completer element will handle the transaction, and therefore does not know whether that completer element is the same completer element that handled the previous ordered transactions in the sequence. Furthermore, since the ordering characteristics of any particular communication channel within an interconnect depend on the interconnect's microarchitecture, the requester element typically does not know whether an ordered channel will be used. Therefore, although a more efficient ordering process could be employed if it were known that multiple of these ordered transactions target the same completer element via an ordered channel, the requester element typically cannot determine this and so usually employs a standard, less efficient ordering process.
[0023] It would be possible to perform certain platform-specific programming within the requester element to capture information about which completer elements are associated with particular address ranges (so that the requester element can identify which completer element will handle a particular transaction) and whether an ordered channel exists between the requester element and each of those completer elements. However, this requires additional logic within the requester element to maintain and process such information, and ties the requester element design to a specific interconnect design. Typically, it is desirable to develop interconnect-agnostic requester element designs that can be used with a variety of different interconnect designs. The techniques described herein enable such interconnect-agnostic requester element designs to be developed while still supporting efficient ordering processes for ordered transactions where ordered channels exist within the interconnect.
[0024] Specifically, in accordance with the techniques described herein, an apparatus is provided having a plurality of completer elements for processing transactions and one or more requester elements for issuing transactions, including a requester element capable of issuing an ordered sequence of transactions. The apparatus also provides an interconnect for connecting the various requester elements and completer elements together. For each completer element, the interconnect provides a communication channel between that completer element and the requester element for the transfer of signals between them.
[0025] Furthermore, a given completer element processing a given transaction in the sequence of ordered transactions is arranged to issue a response signal to the requester element via its associated communication channel. This response signal includes an ordered channel indication identifying whether the associated communication channel has the ordered channel property. When a completer element receives a transaction for processing, it knows which requester element the transaction originated from, and therefore which communication channel within the interconnect is being used. Information about whether that communication channel is an ordered channel can be made available to the completer element (e.g. stored in a configuration register accessible to the completer element), so that the ordered channel indication can be set appropriately.
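Purely as an illustrative sketch (not the patented implementation), the completer-side behaviour of [0025] can be modelled in Python. The names `Completer`, `Response` and `ordered_requesters` are hypothetical, a frozen set of requester IDs stands in for the configuration register, and `"DataPull"` is used as the response kind only because [0034] names the data pull signal as one possible carrier of the indication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """A response signal sent from a completer to a requester (hypothetical model)."""
    kind: str              # e.g. "DataPull"
    txn_id: int            # transaction this response belongs to
    src_id: int            # source indication: which completer sent it
    ordered_channel: bool  # the ordered channel indication


class Completer:
    def __init__(self, node_id: int, ordered_requesters: set[int]):
        self.node_id = node_id
        # Stand-in for a configuration register: the requester IDs whose
        # channel to this completer has the ordered channel property.
        self._ordered_requesters = frozenset(ordered_requesters)

    def data_pull(self, requester_id: int, txn_id: int) -> Response:
        # The originating requester identifies the channel in use, so the
        # completer can look up that channel's ordering property.
        ordered = requester_id in self._ordered_requesters
        return Response("DataPull", txn_id, self.node_id, ordered)
```

For example, `Completer(7, {1}).data_pull(1, 0)` carries `ordered_channel=True`, while a data pull for requester 2 from the same completer carries `False`, because only requester 1's channel is marked as ordered.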
[0026] If the communication channel has the ordered channel property, this guarantees that processing of transactions issued by the requester element in a given order through the associated communication channel will be completed by the given completer element in that same order. The requester element can therefore be arranged to control, in response to the ordered channel indication, the timing of issuance of at least one signal associated with one or more transactions following the given transaction in the sequence. Thus, if the ordered channel indication indicates that the associated communication channel is not an ordered channel, the requester element can employ a standard ordering process, in which some form of acknowledgement for older transactions is required before subsequent transactions are released. However, if the ordered channel indication indicates that the communication channel is an ordered channel, a more optimized ordering process can be employed, allowing certain signals associated with one or more subsequent transactions to be issued earlier than would be possible if no ordered channel existed.
[0027] In one exemplary implementation, the requester element is arranged to select, in dependence on the ordered channel indication, a signal timing scheme from a plurality of signal timing schemes to be used for one or more transactions following the given transaction in the sequence.
[0028] Furthermore, in one exemplary embodiment, the requester element is arranged, when selecting the signal timing scheme, to also consider whether the given transaction and the one or more subsequent transactions are to be processed by the same completer element. Hence, when multiple transactions within the ordered sequence target the same completer element and an ordered channel exists between the requester element and that completer element, a signal timing scheme that enables a more optimized ordering process for those transactions can be used, improving the performance of the apparatus.
[0029] Which signal issued by the requester element has its timing varied in dependence on the ordered channel indication may differ between implementations. However, in one exemplary implementation, it is the timing of a release indication signal issued from the requester element to the completer element that varies depending on the presence of an ordered channel. Specifically, the requester element may be arranged to issue a release indication signal to the completer element processing a transaction once the requester element determines that the data processed by all previous transactions in the sequence is observable. The release indication signal authorizes the completer element to make the data processed by the transaction available to other requester elements. The requester element determines that the data processed by all previous transactions in the sequence is observable when it knows that the interconnect will make the data processed by each of those previous transactions available in response to any subsequent request to access that data (e.g. a request issued by a different requester element in the system). The release indication signal is therefore a mechanism to ensure that data processed by a later transaction does not become available before data processed by earlier transactions.
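The ordering guarantee that the release indication signal provides can be phrased as a property of a recorded trace. The sketch below is an illustration under assumptions: the function name `release_order_respected` and the per-transaction timestamp lists are hypothetical, and time is treated as a plain number.

```python
def release_order_respected(observable_at: list[float], released_at: list[float]) -> bool:
    """Check a trace of one ordered sequence against the release rule.

    observable_at[i]: time at which the requester knows the data of
                      transaction i is observable.
    released_at[i]:   time at which the requester sends the release
                      indication for transaction i.
    Returns True if every transaction was released only after the data of
    all earlier transactions in the sequence had become observable.
    """
    for i, t_rel in enumerate(released_at):
        # Any earlier transaction still unobservable at release time is a violation.
        if any(observable_at[j] > t_rel for j in range(i)):
            return False
    return True
```

For instance, with `observable_at = [5, 9, 12]`, releasing at `[0, 9, 12]` satisfies the rule, whereas releasing transaction 1 at time 4 (before transaction 0 became observable at time 5) violates it.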
[0030] In such systems, the plurality of signal timing schemes can employ different criteria to determine when to issue the release indication signal, so the criterion used to decide when to issue the release indication signal can be varied depending on whether the ordered channel indication indicates the existence of an ordered channel.
[0031] In one exemplary embodiment, each completer element is arranged, when processing a transaction specifying a memory address, to issue a completion signal to the requester element. This indicates that the completer element has taken sufficient steps to ensure that the result of the operation requested by the transaction will be observed by any other requester element that subsequently issues a transaction specifying that memory address to the completer element. In accordance with a first signal timing scheme of the plurality of signal timing schemes, the requester element can then be arranged to issue the release indication signal to the completer element processing a current transaction once it has received the completion signal for all previous transactions in the sequence preceding the current transaction. In one exemplary embodiment, an ordering process employing this first signal timing scheme is referred to as a write stream flow. Such an approach can achieve high performance when targeting completer elements that serve as points of serialization (PoS) within the system, because such completer elements can issue completion signals directly, without involving any downstream elements (i.e. elements closer to memory than the completer element in question).
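A minimal requester-side sketch of the first signal timing scheme (the write stream flow) follows. The class and method names are hypothetical, and one rule is an added assumption not stated in the source: a transaction is not released until its own first response has arrived, so that the requester knows which completer to send the release indication to.

```python
class WriteStreamTracker:
    """Requester-side bookkeeping for one ordered sequence of n transactions.

    Release rule (first signal timing scheme): the release indication for
    transaction i may be sent once a completion signal has been received
    for every earlier transaction in the sequence.
    Assumption: transaction i must also have received its own first
    response (e.g. a data pull) before it can be released.
    """

    def __init__(self, n: int):
        self.responded = [False] * n  # own first response received
        self.completed = [False] * n  # completion signal received
        self.released = [False] * n   # release indication already sent

    def on_response(self, i: int) -> list[int]:
        self.responded[i] = True
        return self._drain()

    def on_complete(self, i: int) -> list[int]:
        self.completed[i] = True
        return self._drain()

    def _drain(self) -> list[int]:
        """Mark and return the transactions whose release can now be sent."""
        newly_released = []
        for i in range(len(self.released)):
            if self.released[i] or not self.responded[i]:
                continue
            if all(self.completed[:i]):  # completion seen for all older ones
                self.released[i] = True
                newly_released.append(i)
        return newly_released
```

With three writes, `on_response(0)` releases transaction 0 at once (it has no predecessors), transactions 1 and 2 stay held after their own responses arrive, and they are both released only when `on_complete(0)` reports the completion signal for transaction 0.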
[0032] However, alternative signal timing schemes can also be supported. For example, in one exemplary implementation, each completer element is arranged, when processing a transaction, to issue a data pull signal to the requester element to trigger the requester element to transfer the data item to be processed by the transaction to the completer element. In accordance with a second signal timing scheme of the plurality of signal timing schemes, the requester element is then arranged to issue the release indication signal to the completer element processing a current transaction once it has received the data pull signal for all previous transactions in the sequence preceding the current transaction. Hence, in this example, the requester element does not need to wait for the completion signals for the previous transactions before issuing the release indication signal for the current transaction; instead, it can issue that release indication signal as soon as the data pull signals for the previous transactions have been received. Such an approach only guarantees the ordering of the transactions if an ordered channel exists between the requester element and the completer element. However, since the ordered channel indication provided by the completer element supplies exactly that information, the requester element can switch to the second signal timing scheme in situations where an ordered channel exists. In one exemplary implementation, an ordering process employing this second signal timing scheme is referred to as a write tunneling flow. It can be used to improve performance when a single target completer element processes a series of ordered transactions, provided there is an ordered channel between the requester element and that completer element. In particular, it can significantly improve performance when the completer element is not a point of serialization, since such a completer element must communicate downstream with one or more other elements before it can issue a completion signal.
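To make the performance argument concrete, the sketch below computes the earliest release time of each transaction under either scheme from hypothetical cycle counts. It assumes that the arrival times of the data pull and completion signals are fixed inputs that do not depend on the scheme in use, and that each release also waits for the transaction's own data pull; both are simplifications for illustration only.

```python
def release_times(own_response: list[int], gate: list[int]) -> list[int]:
    """Earliest cycle at which each transaction's release indication can be sent.

    own_response[i]: cycle at which transaction i's own data pull arrives.
    gate[i]:         cycle at which transaction i's gating signal arrives:
                     completion signals for the write stream flow, data
                     pull signals for the write tunneling flow.
    Transaction i must wait for its own response and for the gating
    signal of every earlier transaction in the sequence.
    """
    out = []
    latest_gate = 0  # latest gating signal among earlier transactions
    for resp, g in zip(own_response, gate):
        out.append(max(resp, latest_gate))
        latest_gate = max(latest_gate, g)
    return out
```

With four writes to a completer that is not a point of serialization, data pulls at cycles 10 to 13 and completion signals at cycles 60 to 63 (invented numbers), `release_times(pulls, comps)` gives `[10, 60, 61, 62]` for the write stream flow, while `release_times(pulls, pulls)` gives `[10, 11, 12, 13]` for the write tunneling flow, because the later releases no longer wait for the slow downstream round trip.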
[0033] In one exemplary embodiment, the requester element is arranged, when considering which signal timing scheme to use for the current transaction, to detect when response signals providing the ordered channel indication have been received for both the current transaction and the previous transaction, and to determine from a source indication field provided in the two response signals whether they were issued by the same completer element. The second signal timing scheme may be used if the response signals were issued by the same completer element and the ordered channel indication indicates that the associated communication channel between the requester element and that completer element has the ordered channel property; otherwise, the first signal timing scheme may be used. Such an approach has been found to significantly improve performance, by enabling the requester element to switch dynamically between the first and second signal timing schemes depending on whether an ordered channel is being used for a series of ordered transactions.
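The selection rule of [0033] can be sketched as a small pure function. This is a hypothetical model: the names `Scheme`, `Response` and `select_scheme` are illustrative, `None` stands for a response that has not yet arrived, and requiring the indication to be set in both responses is a conservative choice rather than something the source mandates.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scheme(Enum):
    WRITE_STREAM = 1  # first scheme: gate release on completion signals
    WRITE_TUNNEL = 2  # second scheme: gate release on data pull signals


@dataclass(frozen=True)
class Response:
    src_id: int            # source indication field: which completer responded
    ordered_channel: bool  # ordered channel indication


def select_scheme(prev: Optional[Response], cur: Optional[Response]) -> Scheme:
    """Choose the signal timing scheme for the current transaction."""
    if prev is None or cur is None:
        # Indications not yet available for both transactions: stay safe.
        return Scheme.WRITE_STREAM
    same_completer = prev.src_id == cur.src_id
    if same_completer and prev.ordered_channel and cur.ordered_channel:
        return Scheme.WRITE_TUNNEL
    return Scheme.WRITE_STREAM
```

Two responses from completer 3 that both report an ordered channel select the write tunneling flow; responses from different completers, or from a completer reached over an unordered channel, fall back to the write stream flow.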
[0034] The response signal that provides an ordered channel indication can take various forms, but in one exemplary implementation, a data pull signal is used as the response signal that provides an ordered channel indication.
[0035] In one exemplary implementation of the approach described above, the sequence of ordered transactions comprises a sequence of ordered write transactions. The write transactions may need to be ordered regardless of whether they are processed by the same completer element, and the mechanism described above allows the requester element to employ an efficient ordering process by switching dynamically between signal timing schemes depending on whether the same completer element is processing multiple of the transactions and, if so, whether an ordered channel exists to that completer element.
[0036] However, the techniques described herein are not limited to the above scenarios, and there are other cases in which providing an ordered channel indication in the response signal from the completer element can be used to improve the ordering process of ordered transactions, thereby improving performance.
[0037] For example, in one implementation the requester element may know that a sequence of ordered transactions (whether ordered write transactions or ordered read transactions) will be processed by the same completer element. In this case, the requester element may be arranged, when considering which signal timing scheme to use for each subsequent transaction in the sequence following a current transaction, to detect when a response signal providing the ordered channel indication has been received for the current transaction. When the ordered channel indication indicates that the associated communication channel between the requester element and the completer element has the ordered channel property, the requester element may issue a request transmission signal for each subsequent transaction as soon as request transmission signals have been issued for all transactions in the sequence preceding that subsequent transaction. Otherwise, issuance of the request transmission signal for each subsequent transaction may be deferred until a predetermined signal has been received from the completer element for all transactions in the sequence preceding that subsequent transaction.
[0038] Hence, where an ordered channel exists, the requester element can issue requests for each of the remaining ordered transactions in the sequence back-to-back, without waiting to receive the predetermined signal from the completer element for all previous transactions in the sequence before issuing the request for the next transaction.
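The effect on request issue times can be illustrated with a deliberately simple timing model. Everything in it is an assumption for illustration: `issue_times` is a hypothetical name, every predetermined signal arrives a fixed `round_trip` cycles after its request, at most one request is issued per cycle, and the ordered channel indication arrives with the response to the first transaction.

```python
def issue_times(n: int, round_trip: int, ordered: bool) -> list[int]:
    """Cycle at which the request for each of n same-completer ordered
    transactions is issued, under a fixed-latency toy model."""
    if n == 0:
        return []
    times = [0]  # the first request goes out at cycle 0
    for i in range(1, n):
        if ordered and i > 1:
            # Indication already received with transaction 0's response:
            # issue back-to-back, one request per cycle.
            times.append(times[-1] + 1)
        else:
            # Wait for the predetermined signal of the preceding transaction.
            # Responses return in issue order in this model, so that also
            # covers every earlier transaction. Transaction 1 always waits,
            # because the indication arrives with transaction 0's response.
            times.append(times[-1] + round_trip)
    return times
```

Under this model, `issue_times(4, 10, ordered=True)` returns `[0, 10, 11, 12]`, whereas `issue_times(4, 10, ordered=False)` returns `[0, 10, 20, 30]`: once the ordered channel is confirmed, the remaining requests no longer pay a full round trip each.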
[0039] Where the sequence of ordered transactions is a sequence of ordered write transactions, the predetermined signal can be a data pull signal, issued by the completer element to the requester element during processing of a transaction to trigger the requester element to transfer the data item to be processed by the transaction to the completer element. The response signal providing the ordered channel indication can take various forms, but in one example the data pull signal for the current transaction is used as that response signal.
[0040] Where the sequence of ordered transactions is a sequence of ordered read transactions, the predetermined signal can be a read receipt signal issued by the completer element to the requester element during processing of a transaction. The read receipt signal confirms to the requester element that the completer element has accepted the read transaction for processing. In such embodiments, the read receipt signal for the current transaction can be used as the response signal providing the ordered channel indication.
[0041] The requester element can know in a variety of ways that the sequence of ordered transactions will be processed by the same completer element, but in one particular implementation the requester element determines this when every transaction in the sequence specifies the same address.
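The choices in [0039] to [0041] can be summarized in a short helper. This is a hypothetical sketch: `GATING_SIGNAL`, `same_completer_sequence` and `gating_signal` are illustrative names, the signal names are the descriptive ones used in this document rather than protocol opcodes, and requiring the sequence to be all writes or all reads is an added assumption.

```python
from collections.abc import Sequence

# Predetermined signal that gates the next request when no ordered channel
# is known to exist, keyed by transaction kind.
GATING_SIGNAL = {"write": "data pull", "read": "read receipt"}


def same_completer_sequence(kinds: Sequence[str], addresses: Sequence[int]) -> bool:
    """True if the sequence can be treated as targeting a single completer:
    every transaction specifies the same address and all transactions are
    of one supported kind (all writes or all reads)."""
    if not addresses or len(kinds) != len(addresses):
        return False
    return (len(set(addresses)) == 1
            and len(set(kinds)) == 1
            and kinds[0] in GATING_SIGNAL)


def gating_signal(kind: str) -> str:
    return GATING_SIGNAL[kind]
```

A run of three writes to address `0x80` qualifies, and its requests are gated on data pull signals until an ordered channel is confirmed; a run that mixes addresses does not qualify under this criterion.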
[0042] The communication channels can be constructed in various ways, and in some implementations multiple distinct layers are provided within a communication channel. For example, a transport layer (also referred to as a link layer) can be used to route transactions from requester elements to completer elements, and the protocol used by the interconnect may not mandate whether the transport layer is inherently ordered or unordered; instead, this may be left entirely to the microarchitectural decisions of the interconnect designer. Similarly, a protocol layer can be provided that is responsible for the protocol flows used to complete transactions, and the interconnect protocol may use a retry-based mechanism for resource allocation between requesters and completers. Such a retry-based mechanism can, for example, allow a newer transaction to be allocated an entry in the completer's tracker ahead of an older transaction. In such implementations, the ordered channel indication can be arranged to indicate the ordered channel property only when both the transport layer and the protocol layer of the communication channel are constrained to process transactions in order.
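The two-layer condition can be sketched as follows, together with a toy model of how retry-based allocation lets a younger transaction overtake an older one even over an in-order transport layer. Both `ChannelLayers` and `retry_allocation_order` are hypothetical names, and the retry model is a simplification: a retried transaction is assumed to obtain a tracker entry only after every transaction that was accepted first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelLayers:
    transport_in_order: bool  # transport (link) layer delivers in issue order
    protocol_in_order: bool   # protocol layer never lets a younger transaction
                              # overtake an older one (e.g. via retries)

    @property
    def ordered_channel(self) -> bool:
        # Both layers must be constrained to in-order processing.
        return self.transport_in_order and self.protocol_in_order


def retry_allocation_order(arrival_order: list[int], retried: set[int]) -> list[int]:
    """Toy retry-based protocol layer: transactions told to retry obtain a
    tracker entry only after those accepted on first arrival."""
    accepted = [t for t in arrival_order if t not in retried]
    return accepted + [t for t in arrival_order if t in retried]
```

If transaction 0 is told to retry while transactions 1 and 2 are accepted, `retry_allocation_order([0, 1, 2], {0})` returns `[1, 2, 0]`, so such a channel would report `ChannelLayers(True, False)` and its ordered channel indication would be clear.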
[0043] A specific example will now be described with reference to the accompanying drawings.
[0044] Figure 1 is a block diagram of an apparatus in accordance with one exemplary embodiment. The apparatus includes a plurality of master devices 10, 20, 30, which are coupled to a plurality of slave devices 60, 65, 70 via an interconnect 40. In the case of element 70, this could be a downstream network that itself provides connectivity to a plurality of additional slave devices.
[0045] Within interconnect 40, multiple internal nodes 45, 50, 55 may be provided; these internal nodes are referred to herein as master nodes. The master nodes are arranged to receive and process transactions issued by the master devices, which may result in downstream communication with the connected slave devices. When considering transactions issued by master devices 10, 20, 30, the master devices can be considered requester elements and master nodes 45, 50, 55 can be considered completer elements. While processing those transactions, the master nodes may in turn issue further transactions downstream to the connected slave devices; for those transactions, master nodes 45, 50, 55 can be considered requester elements and the slave devices completer elements.
[0046] Although in the system of Figure 1 each master node is connected to a single slave device or network, in some implementations multiple master nodes may be connected to the same slave device, or a master node may be connected to multiple slave devices.
[0047] As shown in Figure 1, interconnect 40 can establish multiple communication channels 75 interconnecting the various master devices 10, 20, 30 with master nodes 45, 50, 55. In some embodiments, interconnect 40 may also provide communication channels between the master nodes and the various slave devices. Although the techniques described herein may be employed in relation to transactions between master nodes and slave devices as well as transactions between master devices and master nodes, for the purposes of the following discussion it is assumed that they are applied to transactions issued between master devices and master nodes.
[0048] As shown in Figure 1, the master devices can be of various types. In the example shown, master devices 10 and 20 are assumed to be fully coherent master devices (e.g. central processing units (CPUs)) whose internal caches are kept coherent with caches at lower levels of the cache hierarchy, such as system caches that may reside within interconnect 40 (for simplicity, such caches are not shown in Figure 1). To maintain this coherency, some of the master nodes, such as master nodes 45, 50, may be fully coherent master nodes that include a point of coherency. The point of coherency manages coherency by snooping the required fully coherent masters, consolidating the snoop responses for a transaction, and sending a single response to the requesting fully coherent master. Such master nodes also typically serve as points of serialization (PoS) to manage the ordering between memory requests.
[0049] Although this technology can be adopted for various types of interconnect, for the purposes of the following discussion it is assumed that the interconnect uses the Advanced Microcontroller Bus Architecture (AMBA) developed by Arm Limited, Cambridge, United Kingdom, and specifically the AMBA 5 CHI (Coherent Hub Interface) architecture specification. According to this specification, fully coherent masters are referred to as RN-F (Fully coherent Request Nodes) and fully coherent master nodes are referred to as HN-F (Fully coherent Home Nodes).
[0050] However, as shown in Figure 1, not all master devices or master nodes need to be fully coherent. For example, according to the AMBA 5 CHI architecture specification, an input/output (I/O) coherent master device 30 can be provided; such an I/O coherent master device is referred to as an RN-I (I/O coherent Request Node). An I/O coherent master device 30 of this type can be arranged to generate only a subset of the transactions defined by the interconnect protocol, and does not require snoop functionality.
[0051] Similarly, one or more of the master nodes, such as master node 55, can be non-coherent master nodes (referred to as HN-I in the AMBA 5 CHI architecture specification). Such master nodes can be arranged to handle a restricted subset of the transactions defined by the protocol; they do not include a point of coherency and cannot handle snoopable requests.
[0052] The I/O coherent master 30 can be used to couple the device illustrated in Figure 1 to an upstream network or upstream component 35. Purely as a concrete example, the I/O coherent master 30 can provide an interface connecting the device illustrated in Figure 1 to a PCIe network forming the upstream network 35 (the RN-I then serving as a bridge to PCIe endpoints). Similarly, the non-coherent master node 55 can, for example, be used to connect the device to a downstream I/O device or network 70, which, again as a concrete example, can be a PCIe network (e.g., the HN-I connects to a PCIe root complex that may have multiple PCIe endpoints behind it). In such an example, it will be appreciated that, by using the I/O coherent master device 30 and the non-coherent master node 55, the CHI interconnect 40 can provide a communication path between two separate PCIe networks 35, 70.
[0053] As previously discussed, the communication channels 75 provided by interconnect 40 can be arranged in various ways, depending on the microarchitectural decisions made when designing the interconnect. At least some of the channels can be provided as ordered channels, having an ordered channel property that guarantees that transactions issued in a given order by a requester element over such a channel will be processed by the completer element in that same order. When a particular requester element wishes to issue a sequence of ordered transactions that must be completed in the order in which they were issued, a more efficient ordering process can be employed for those transactions if an ordered channel is known to exist between the requester element and the completer element that will process them.
[0054] However, it is generally desirable for master devices to be portable across interconnects, and thus independent of the specific interconnect design used. When such a master device issues a transaction, it may not know which completer element will handle the transaction because, for example, the system address map 80 within interconnect 40 may be used to map the address specified by the transaction to the specific completer node that will handle it. Furthermore, the requester node does not know whether the communication channel to be used will be an ordered channel. Therefore, unless detailed platform-specific programming is performed within the requester element to capture information about which completer elements are used for specific address ranges, and about the capabilities of the communication channels between the requester element and those completer elements (specifically, whether ordered channels are provided), the requester element will typically have to fall back on a standard ordering process that guarantees the ordering of the ordered transactions even in the absence of ordered channels.
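To make the point about address-based routing concrete, the following is a minimal Python sketch assuming a simple range-based system address map. The `Region` and `SystemAddressMap` names, the address ranges, and the completer labels are hypothetical illustrations, not taken from any real interconnect. The requester supplies only an address; which completer, and therefore which channel, is used is decided inside the interconnect.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    base: int       # first address in the range (inclusive)
    limit: int      # end of the range (exclusive)
    completer: str  # completer element that handles this range


class SystemAddressMap:
    """Hypothetical range-based address map held inside the interconnect."""

    def __init__(self, regions):
        self._regions = sorted(regions, key=lambda r: r.base)
        self._bases = [r.base for r in self._regions]

    def route(self, addr: int) -> str:
        # Locate the last region starting at or below addr, then check its limit.
        i = bisect.bisect_right(self._bases, addr) - 1
        if i >= 0 and addr < self._regions[i].limit:
            return self._regions[i].completer
        raise LookupError(f"address {addr:#x} is not mapped")


# Example map: the requester never sees this table, only the addresses it issues.
amap = SystemAddressMap([
    Region(0x0000_0000, 0x8000_0000, "HN-F 45"),
    Region(0x8000_0000, 0xC000_0000, "HN-F 50"),
    Region(0xC000_0000, 0x1_0000_0000, "HN-I 55"),
])
```

Because the table lives in the interconnect, changing the map (or the ordering properties of the channels behind it) requires no change to the requester, which is exactly why the requester cannot assume an ordered channel on its own.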
[0055] This means that, under normal circumstances, the performance benefits achievable by using ordered channels are not realized, because the requester element assumes the channel is unordered in order to ensure that ordered transactions are processed in order. The technique described herein, however, provides a mechanism that enables the requester element to employ a more efficient ordering process when ordered channels are present, without platform-specific programming of the requester element, thereby preserving the portability of the requester element design across interconnects.
[0056] As shown in Figure 1, when considering a pair of components within the device, the component closer to the slave devices/main memory can be referred to as the downstream component, and the component further from the slave devices/main memory as the upstream component. Therefore, the master nodes 45, 50, 55 are downstream components relative to the master devices 10, 20, 30, but upstream components relative to the slave devices 60, 65, 70.
[0057] It will be appreciated that Figure 1 shows a relatively simple system with only a few master and slave devices and a single interconnect 40 interconnecting them. However, the technique can also be employed in more complex systems. Figure 2 shows an illustrative example in which there are two interconnected interconnects 110 and 135, each of which can be a CHI-based interconnect. Considering a particular master device 105, that master device issues its transactions to interconnect 110, where the system address map is used to identify the appropriate completer element to which each transaction should be directed. Although in some cases the completer element may be one of the master nodes 115, 120, 125 within interconnect 110, it may instead be determined that the completer element is the connection element 130 used to couple interconnect 110 to the other interconnect 135. In the illustrated embodiment, connection element 130 takes the form of a CXRA element, a bridging element used to connect CHI to a CCIX link for chip-to-chip communication. A corresponding connection element 140 is provided in the other interconnect 135 (in the example of Figure 2, a CXHA element), and connection elements 130 and 140 can be connected via a suitable high-speed link (in the example of Figure 2, a PCIe-based transport link, referred to as a CCIX link). Transactions received by connection element 140 in the second interconnect can then be mapped using the system address map of the second interconnect, so that they are routed to the appropriate completer element, such as one of the master nodes 145, 150, 155 shown in Figure 2.
[0058] As previously discussed, each communication channel may or may not be configured as an ordered channel, depending on the microarchitectural decisions made during design. According to the techniques described herein, when a response is issued from a particular completer element back to a requester element, the response signal includes an ordered channel indication that identifies whether the associated communication channel has the aforementioned ordered channel property, and hence whether or not it is to be treated as an ordered channel. In the following description, this ordered channel indication is also referred to as the OCE (Ordered Channel Enable) indication; in one example, it is implemented as a 1-bit field whose value indicates whether the communication channel between the particular requester element and the particular completer element is an ordered channel.
[0059] As previously mentioned, multiple layers, such as a transport layer and a protocol layer, can be provided within a communication channel. For a communication channel to function as an ordered channel, each of these layers must have an ordering constraint ensuring that transactions issued in a given order over the communication channel are processed in that same order. As used herein, an OCE encoding of 1 indicates that an ordered channel exists between the requester element and the completer element communicating over the channel, while an OCE encoding of 0 indicates that no ordered channel exists between them. As discussed later, based on the value of the OCE field provided in the response issued for the current transaction in the ordered sequence, and on whether the transactions in the sequence target the same completer element, the requester element can dynamically tune the ordering process it uses for a sequence of ordered transactions to maximize performance.
[0060] Information about the communication channel can be provided to the completer element in several ways. Unlike a requester element, which, when issuing a transaction, may not know which completer element will process it and therefore which communication channel will be used, a completer element receives a request signal for the transaction that identifies the source requester element, so the completer element knows which communication channel is being used. The completer element can then access information indicating whether the communication channel between itself and the identified requester element is an ordered channel, and set the OCE value in the response accordingly. For example, a configuration register can be associated with each master node to capture the OCE value for each requester element that can communicate with that completer element; the configuration register can be hardwired at build time or be software-writable.
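The completer-side behaviour can be sketched as follows, assuming the configuration register is modelled as a set of requester IDs that reach this completer over an ordered channel. `OceConfigRegister`, `DataPull`, `issue_data_pull`, and the node ID 201 are hypothetical names chosen for illustration; real CHI response fields and encodings are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OceConfigRegister:
    """Hypothetical per-completer register: requesters reached over an ordered channel."""
    ordered_requesters: frozenset

    def oce_for(self, requester_id: int) -> int:
        # 1-bit OCE value: 1 only if every layer of this channel is ordered.
        return 1 if requester_id in self.ordered_requesters else 0


@dataclass(frozen=True)
class DataPull:
    txn_id: int
    src_id: int  # completer element issuing the response
    tgt_id: int  # requester element receiving the response
    oce: int     # ordered channel indication


def issue_data_pull(completer_id: int, cfg: OceConfigRegister,
                    txn_id: int, requester_id: int) -> DataPull:
    # The request named its source, so the completer knows which channel is in use.
    return DataPull(txn_id, completer_id, requester_id, cfg.oce_for(requester_id))


# Example: HN-I 300 has an ordered channel to requester 200 but not to requester 201.
cfg_300 = OceConfigRegister(frozenset({200}))
```

Keeping the per-requester bit on the completer side means only the completer needs platform knowledge; the requester simply reads the bit from the response.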
[0061] During transaction processing, a series of signals is typically exchanged in both directions between the requester element and the completer element. For example, when a transaction is initiated, a request signal is usually issued from the requester element to the completer element. One or more signals can then be returned from the completer element to the requester element to indicate the progress of the transaction. For write transactions, a response signal (also known as a data pull signal) is issued to indicate that the completer element has a buffer available to receive the write data, and on receipt of such a data pull signal, the requester element can issue the write data. For certain types of transaction, the completer element is also arranged to issue a completion signal to the requester element while processing the transaction, indicating that the completer element has taken sufficient steps to ensure that the result of the operation requested by the transaction will be observed by any other requester element that subsequently issues a transaction to the completer element specifying the same memory address.
[0062] Furthermore, for certain transactions, the requester element may be arranged to issue a release indication signal (also referred to herein as a completion acknowledgment signal, or ACK signal) to the completer element processing the transaction once the requester element determines that the data processed by all previous transactions in the sequence is observable. The requester element may determine the observability of all previous transactions in various ways, but in one example embodiment the completion signal mentioned above is used for this purpose. The release indication signal authorizes the completer element to make the data processed by the associated transaction available to other requester elements.
[0063] Therefore, in the absence of an ordered channel between the requester element and the completer element, the timing with which completion acknowledgment signals are issued can be used to enforce the ordering of a sequence of ordered transactions. This is because each completer element processing a transaction within the sequence can be constrained to make the data processed by that transaction available to other requester elements only once it has received a completion acknowledgment signal from the requester element. This allows the requester element to control the ordering across multiple completer elements that may be processing individual transactions in the sequence. However, as discussed in more detail below, if an ordered channel exists between the requester element and the completer element, and multiple transactions in the ordered sequence are being issued to the same completer element, the requester element may be able to determine that it does not need to wait for completion signals for previous transactions before issuing the completion acknowledgment signal for the current transaction, thereby improving performance. This is discussed in more detail below with reference to the example timing diagrams of Figures 3 to 5.
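The completer-side constraint that makes this ordering enforceable can be sketched as below. This is a hypothetical model: the `Completer` class and its method names are illustrative, write data and the completion acknowledgment are assumed to be able to arrive in either order, and "available to other requester elements" is represented by a plain dictionary.

```python
class Completer:
    """Hypothetical completer model: write data stays private until the ACK arrives."""

    def __init__(self):
        self.memory = {}     # data observable by all requester elements
        self._pending = {}   # txn_id -> (addr, data) received but not yet released
        self._acked = set()  # txn_ids whose completion acknowledgment has arrived

    def write_data(self, txn_id, addr, data):
        self._pending[txn_id] = (addr, data)
        self._try_release(txn_id)

    def comp_ack(self, txn_id):
        self._acked.add(txn_id)
        self._try_release(txn_id)

    def _try_release(self, txn_id):
        # Data becomes observable only once both the data and the ACK are present.
        if txn_id in self._pending and txn_id in self._acked:
            addr, data = self._pending.pop(txn_id)
            self._acked.discard(txn_id)
            self.memory[addr] = data
```

Since every completer applies the same rule, the requester alone decides when each transaction's data may become visible, regardless of how many completers are involved.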
[0064] Figure 3 is a timing diagram illustrating the write streaming flow that can be used when the completer element is a serialization point within the system (for example, when, under the AMBA 5 CHI protocol mentioned above, the completer element is a fully coherent master node (HN-F)). The write streaming flow can be deployed to support the Ordered Write Observation (OWO) ordering model supported by PCIe endpoint requesters, and because an RN-I can be used as a bridge to PCIe endpoints, the RN-I needs to support this flow. As shown in Figure 3, the RN-I master device 200 may therefore wish to issue a sequence of ordered write transactions, which need not all be processed by the same completer element. In this example, two ordered write transactions are processed by different completer elements 205 and 210, each of which is a fully coherent master node (HN-F). Each completer element is therefore a serialization point within the system, and can thus take sufficient steps to ensure that the result of the operation requested by a write transaction will be observed by any other requester element that subsequently issues a transaction to that completer element specifying the same memory address, without needing to communicate with any downstream component. Such a fully coherent master node may, for example, include its own cache storage in which to cache the write data for access by subsequent requests. Therefore, the completer element may be able to issue a completion signal for the write relatively quickly, in some cases even before it has received the write data; specifically, once it has performed any required hazard checks to ensure that a subsequently received request specifying the same address as the write will be arranged to access the write data associated with that write operation.
[0065] Therefore, when the write streaming flow of Figure 3 is employed, the requester element 200 is allowed to issue the write requests 215 and 220 for the two ordered write transactions back-to-back, without waiting for any response from completer element 205 to the first write transaction before issuing the write request 220 for the second write transaction. In the write request signals, the control values ReqOrder (RO) and ExpCompAck (ECA) are set to 1, to identify that a write streaming process (or the write tunneling process described later) is being employed, and to identify that the master node should expect a completion acknowledgment signal. Considering the first write request 215, once master node 205 has determined that it has sufficient buffer space to receive the write data from the requester element, it issues a data buffer ID response signal 225 (also referred to herein as a data pull signal), in response to which the requester element 200 can subsequently issue the write data to completer element 205. Additionally, once completer element 205 has performed the hazard checks mentioned above, it can issue a completion signal 230.
[0066] Under the write streaming process, write data for the current transaction can be issued once data pull signals have been received for the current transaction and for all older ordered transactions in the sequence. Since transaction A is the first transaction, its write data can be issued as soon as data pull signal 225 for transaction A is received.
[0067] Furthermore, under the write streaming process, the requester element can send the completion acknowledgment signal (i.e., the release indication mentioned above) for the current transaction once it has received completion signals for all older transactions in the sequence. Since transaction A is the first transaction in the sequence, its completion acknowledgment signal is not bound by the completion signals of any previous transactions, and can therefore be issued without waiting for any completion signal. In the example shown in Figure 3, the write data signal and the completion acknowledgment signal are assumed to be combined into a single signal, as indicated by signal line 235. On receiving the completion acknowledgment as part of the combined signal issued via path 235, master node 205 can then make the write data for transaction A available to other requester elements.
[0068] For the second transaction (transaction B), master node 210 processes the write request received via path 220 in much the same way as master node 205 handled transaction A. Thus, once master node 210 has space to receive the write data, it issues a data pull signal via path 240, and once it has performed any necessary hazard checks, it issues a completion signal via path 245. Once data pull signal 240 has been received, write data can be issued from requester element 200 to master node 210, since the data pull signal for the earlier transaction (transaction A) has also been received by then. If that were not the case, and master node 210 instead issued the data pull signal for transaction B before master node 205 had issued the data pull signal for transaction A, the requester element would need to wait until both data pull signals had been received before issuing the write data for transaction B to master node 210.
[0069] Regarding the completion acknowledgment signal for transaction B, under the write streaming flow the requester element 200 must wait to receive completion signals for all previous transactions (in this example, only the completion signal for transaction A, issued via path 230) before the completion acknowledgment signal for transaction B may be issued. As shown in Figure 3, the requester element can then issue the combined write data and completion acknowledgment signal to master node 210, as indicated by path 250. On receiving the completion acknowledgment signal via path 250, master node 210 can make the data processed by transaction B available to other requester elements.
[0070] It can therefore be seen that, with the write streaming process discussed above, the requester element can issue the requests for a series of ordered transactions back-to-back, without waiting for any response to an earlier transaction in the sequence before issuing the next one. Instead, the ordering constraint is enforced through the timing with which completion acknowledgment signals are issued; the write streaming process thus uses a first signal timing scheme for the completion acknowledgment signal.
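The two requester-side rules of the write streaming process can be sketched as predicates over per-transaction status. This is a minimal illustration assuming the sequence is held as a list ordered oldest first; `TxnStatus` and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TxnStatus:
    data_pull: bool = False  # data pull (data buffer ID) response received
    comp: bool = False       # completion signal received


def can_send_write_data(seq, i):
    # Write data for transaction i needs data pulls for it and all older transactions.
    return all(t.data_pull for t in seq[: i + 1])


def can_send_compack_streaming(seq, i):
    # First signal timing scheme: completions for all older transactions are needed.
    return all(t.comp for t in seq[:i])


# Figure 4 situation: data pulls for A (index 0) and B (index 1), no completion yet.
seq = [TxnStatus(data_pull=True), TxnStatus(data_pull=True)]
```

Here write data for B may go out, but its acknowledgment may not until `seq[0].comp` becomes true; and if B's data pull arrives before A's, B's write data is also held back, matching the case described for Figure 3.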
[0071] While write streaming can be very efficient when the master node is a serialization point, it essentially serializes the transactions when the completer element is not a serialization point. This is illustrated schematically by the example of Figure 4, in which the first write transaction A and the second write transaction B are both directed to the same completer element (in this case, the non-coherent master node 300). As in the example of Figure 3, the requests for these two transactions can be issued back-to-back via paths 310 and 315, without the requester element waiting for any response to transaction A before issuing the write request for transaction B. As shown in Figure 4, data pull signals for the two transactions can be issued via paths 320 and 325. Once the data pull response for transaction A has been received via path 320, the requester element 200 can issue the write data for transaction A. It can also issue the completion acknowledgment signal for transaction A, since there are no previous transactions in the sequence for whose completion signals the requester element must wait. A combined write data and completion acknowledgment signal can therefore be issued to completer element 300 via path 330.
[0072] Once the data pull signal for transaction B has been received via path 325, the requester element 200 can issue the write data for transaction B via path 335, since data pull signals for both transactions A and B have now been received. However, under the write streaming flow, the requester element 200 cannot yet issue the completion acknowledgment signal for transaction B, because the completion signal for the first transaction (transaction A) has not yet been received from completer element 300.
[0073] Once completer element 300 has received the completion acknowledgment signal for transaction A, it can issue the transaction downstream, at which point the data becomes observable elsewhere in the system. In this particular example, the slave device 305 is assumed to be an AXI slave device conforming to the Advanced eXtensible Interface (AXI) protocol, which forms part of the AMBA specification developed by Arm Limited mentioned above. The address transfer is therefore issued over the write address channel and the write data over the write data channel, as indicated by signal line 340. In due course, a response for transaction A is issued by AXI slave device 305, as indicated by signal line 345, and on receiving it, completer element 300 can issue the completion signal for transaction A, as indicated by signal line 350. Only at this point can the requester element 200 issue the completion acknowledgment signal for transaction B, as indicated by signal line 355.
[0074] Therefore, even though completer element 300 has had the write data for transaction B since receiving it via path 335, it cannot issue that data downstream until it receives the completion acknowledgment signal via signal path 355. At that point, the write transaction can be propagated downstream via path 360, and in due course the AXI slave device 305 returns a response via path 365. This then allows completer element 300 to issue the completion signal for transaction B, as indicated by signal line 370.
[0075] A comparison of Figure 3 and Figure 4 therefore shows that write streaming provides significant performance benefits when the completer element is a serialization point, but fails to deliver the same benefits when the completer element is not a serialization point. This can be particularly problematic because, as previously discussed, such non-coherent master nodes can be used when interconnect 40 provides a bridging connection between two high-speed networks, such as PCIe networks.
[0076] In the examples of Figure 3 and Figure 4, it is assumed that no ordered channel is available, so the relevant response signal issued by the completer element has its OCE indication set to 0. In the examples considered herein, the response signal providing the ordered channel indication is the data pull signal, as shown in Figure 3 and Figure 4.
[0077] However, where an ordered channel exists between the requester element and the completer element, an alternative ordering process can be used, as shown in Figure 5; this ordering process is referred to herein as the write tunneling process. The signal lines in Figure 5 are labelled consistently with those in Figure 4, so it can be seen that, as before, the write requests are issued via paths 310 and 315. Under the write tunneling process, the requests are constrained to be issued in order. As in Figure 4, data pull signals are received, but this time the OCE flag is set to 1, so these signals are labelled 320' and 325' to distinguish them from the signals 320 and 325 of Figure 4. The constraints on issuing write data are the same as those discussed previously for the write streaming process: the requester element must wait not only for the data pull signal for the current transaction but also for the data pull signals for any previous transactions. Thus, once the data pull signal has been received via path 320', the write data for transaction A can be issued, and once the data pull signals for transactions A and B have been received via paths 320' and 325', the write data for transaction B can be issued. Furthermore, as with the write streaming process, the requester element 200 can issue the completion acknowledgment signal for transaction A without any constraint, because there are no previous transactions in the ordered sequence before transaction A. The combined write data and completion acknowledgment signal for transaction A is therefore issued via path 330, exactly as in the write streaming example of Figure 4.
[0078] However, under the write tunneling process, the requester element is not constrained to wait for the completion signals of all previous transactions before issuing the completion acknowledgment signal for the current transaction. Instead, the completion acknowledgment signal for the current transaction can be sent once the data pull signals for all previous transactions have been received. Therefore, once the data pull signal for transaction A has been received via path 320', the requester element 200 can issue a completion acknowledgment signal for transaction B. Consequently, compared with the write streaming approach of Figure 4, the requester element 200 can issue combined write data and completion acknowledgment for transaction B, as indicated by signal line 335', instead of being limited to issuing only the write data, as indicated by signal line 335 in Figure 4.
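The difference between the two signal timing schemes reduces to which older-transaction event gates the acknowledgment. A minimal sketch, with hypothetical names and the sequence held as a list ordered oldest first:

```python
from dataclasses import dataclass


@dataclass
class TxnStatus:
    data_pull: bool = False  # data pull response received
    comp: bool = False       # completion signal received


def can_send_compack_streaming(seq, i):
    # First scheme (write streaming): wait for completions of all older transactions.
    return all(t.comp for t in seq[:i])


def can_send_compack_tunneling(seq, i):
    # Second scheme (write tunneling): wait only for data pulls of older transactions.
    return all(t.data_pull for t in seq[:i])


# Figure 5 situation: data pulls 320' and 325' received, completion 350 not yet.
seq = [TxnStatus(data_pull=True), TxnStatus(data_pull=True)]
```

In this state the tunneling rule already permits the acknowledgment for B while the streaming rule does not. Tunneling is only safe because the ordered channel itself preserves the order in which A's and B's acknowledgments reach the completer.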
[0079] Since completer element 300 now has completion acknowledgment signals for both transactions A and B, it can process these transactions in parallel with respect to the downstream AXI slave device 305, issuing the address and data transfers 340 and 360 without any dependency between them. When the response signals are received via paths 345 and 365, the corresponding completion signals can then be issued back to requester element 200 via paths 350 and 370.
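The effect on latency can be illustrated with a small worked calculation. This is a hypothetical timing model, not taken from the figures: `p_a` and `p_b` are the times the data pulls for A and B reach the requester, `hop` is a one-way requester/completer latency, and `downstream` is the time from the completer receiving an acknowledgment to issuing the completion (the AXI round trip), with A and B able to proceed downstream concurrently. The acknowledgment is assumed to travel with the write data, as in Figures 4 and 5.

```python
def comp_b_arrival(p_a, p_b, hop, downstream, tunneling):
    """Return the time at which the requester receives the completion for B.

    Hypothetical model; all latencies are in arbitrary time units.
    """
    comp_a_arrival = p_a + hop + downstream + hop  # ACK for A sent at p_a
    if tunneling:
        # Write tunneling: only the data pulls for A and B gate B's ACK.
        compack_b_sent = max(p_a, p_b)
    else:
        # Write streaming: B's ACK must also wait for A's completion.
        compack_b_sent = max(p_a, p_b, comp_a_arrival)
    return compack_b_sent + hop + downstream + hop


# Hypothetical latencies in arbitrary time units.
streaming = comp_b_arrival(10, 12, hop=5, downstream=40, tunneling=False)
tunneling = comp_b_arrival(10, 12, hop=5, downstream=40, tunneling=True)
```

With these numbers, the completion for B arrives at time 110 under write streaming and at 62 under write tunneling: streaming adds a full downstream round trip for A to B's latency, whereas tunneling overlaps the two.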
[0080] Although in the example of Figure 5 the master node is the non-coherent master node 300, the write tunneling process can also be used with fully coherent master nodes (HN-F); the performance benefits are, however, more pronounced for non-coherent master nodes, because they are not serialization points.
[0081] Before this technique, it was difficult for the requester element to assess when the write streaming process should be used and when the write tunneling process could be used instead. However, by using the OCE flag in the data pull signal provided by the completer element, the requester element can perform a simple check to determine the appropriate timing for issuing the completion acknowledgment signal, that is, whether the first signal timing scheme corresponding to the write streaming process or the second signal timing scheme corresponding to the write tunneling process should be used. Figure 6 is a flow diagram illustrating the steps taken by the requester element. In this example, two transactions are assumed to be executing: the first transaction A (referred to in Figure 6 as the "parent" transaction) and the second transaction B (referred to in Figure 6 as the "self" transaction, i.e., the current transaction). At step 400, the write requests for both transactions are issued in order. At step 405, it is determined whether the data pull signals for both transactions have been received, and if so, the process proceeds to step 410. There, it is determined whether the parent transaction and the self transaction target the same completer element. This can be determined by examining the source identification information in the two data pull signals: each data pull signal identifies the source of the transfer, which in this case is the completer element processing the transaction, as well as the requester element to which it is sent. The requester element also samples the OCE value. In principle, the OCE value in either data pull signal can be sampled, because it is only relevant if the same completer element is being used, in which case the OCE value will be the same in both data pull signals. As shown in Figure 6, in this example the OCE value for transaction B (the self transaction) is sampled.
[0082] At step 415, the write data is driven onto the communication channel according to the rules discussed above; at this point both data pull responses have been received, so write data can be issued for both the parent transaction and the current transaction. However, as indicated at step 420, the timing with which the completion acknowledgment signal is output depends on the evaluation performed at step 410. Specifically, as will be understood from the earlier description of Figure 5, when the same completer element is being used and the OCE flag is set to 1 to indicate the presence of an ordered channel, the write tunneling process can be used.
[0083] The signals evaluated at step 410 are used as follows to determine the timing with which the completion acknowledgment signal for transaction B is output (i.e., whether write streaming or write tunneling is used), as shown in Table 1 below:
[0084]
[0085] Table 1
[0086] Specifically, as shown in Table 1, the timing with which the completion acknowledgment signal for the second transaction (the "self" transaction) is issued depends on the sampled OCE value and on the determined ST value, which is set to 1 when the two transactions target the same completer element.
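The decision at steps 410 and 420 can be sketched as a small function. It follows the prose of paragraphs [0081] to [0086] rather than the (unreproduced) contents of Table 1; the `DataPull` fields and the returned labels are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPull:
    src_id: int  # completer element that issued the data pull
    oce: int     # ordered channel indication: 1 = ordered channel


def choose_compack_scheme(parent: DataPull, current: DataPull) -> str:
    # ST = 1 when parent and current (self) transactions target the same completer.
    st = 1 if parent.src_id == current.src_id else 0
    # Following Figure 6, the OCE value is sampled from the current transaction.
    oce = current.oce
    if st == 1 and oce == 1:
        return "write_tunneling"  # ACK once older data pulls are received
    return "write_streaming"      # ACK once older completions are received
```

For example, two data pulls from node 300 with OCE set to 1 select write tunneling, while data pulls from different completers (such as 205 and 210 in Figure 3) fall back to write streaming even if each reports an ordered channel, because the ordered channel only orders traffic to a single completer.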
[0087] For completeness, the differences between the write stream process and the write tunnel process can be seen in Table 2 below:
[0088]
[0089] Table 2
[0090] Figures 7A to 7D illustrate various applications of the dynamic switching process discussed above with reference to Figure 6, in the example implementation of Figure 2. Figure 7A illustrates how high-throughput processing of ordered write transactions can be achieved for the local fully coherent master nodes 115 and 120, as indicated by lines 450 and 455. In this case, the write streaming process can be used for multiple fully coherent master nodes even if no ordered channel exists, and high throughput is achieved because the fully coherent master nodes are serialization points.
[0091] Figure 7B illustrates how high-throughput processing of ordered writes can be achieved for a local non-coherent master node (such as master node 125), as indicated by line 460 in Figure 7B. Since this series of transactions all targets the same single destination (i.e., HN-I 125), and an ordered channel exists between requester element 105 and completer element 125, the write tunneling process can be used, thereby maintaining high throughput. This can be used, for example, where interconnect 110 provides a peer-to-peer path between two PCIe networks and high performance through interconnect 110 is required.
[0092] Figure 7C illustrates how high-throughput processing of ordered write transactions can be achieved for remote fully coherent master nodes (such as master nodes 145, 150 residing on interconnect 135). Since an ordered channel exists between requester element 105 and link element 130, which serves as a completer element within the coherent interconnect 110, and all transactions destined for the remote master nodes 145, 150 are delivered through element 130, write tunneling can be used for communication between requester element 105 and link element 130, as indicated by path 470. A fully ordered PCIe transport layer can then be provided on the CCIX link between element 130 and element 140, as indicated by path 475, which already provides a high-throughput link. Within the second interconnect 135, write streaming can be used between CXHA element 140 and the individual master nodes 145, 150, as indicated by lines 477 and 479; high throughput is achieved because these master nodes are serialization points.
[0093] Figure 7D illustrates how high-throughput processing of ordered write transactions can be achieved for a remote non-coherent home node (such as HN-I 155 connected to interconnect 135). As with Figure 7C discussed previously, write tunneling can be used between requester element 105 and component 130, as indicated by line 480, and a fully ordered PCIe transport layer between components 130 and 140 connects the two interconnects, as indicated by line 485. Here, it is further assumed that the entire series of transactions targets the same single home node 155, and that an ordered channel exists between CXHA component 140 and home node 155. Therefore, write tunneling can also be used between component 140 and home node 155, as indicated by line 490.
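The per-hop choices in Figures 7A to 7D can be summarized as a small decision rule. The following is a minimal sketch under stated assumptions, not the patented implementation: the names `Hop`, `choose_process` and `high_throughput` are hypothetical, and the rule (tunnel only when the whole series targets one completer over an ordered channel, otherwise stream, with streaming staying fast only at a serialization point) is inferred from the figure descriptions above and from claim 7. The fully ordered PCIe/CCIX link hop between elements 130 and 140 is not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class OrderingProcess(Enum):
    WRITE_STREAMING = "write streaming"
    WRITE_TUNNELING = "write tunneling"


@dataclass(frozen=True)
class Hop:
    """One requester-to-completer leg of a path (hypothetical model)."""
    name: str
    ordered_channel: bool      # OCE indication returned for this channel
    same_completer: bool       # whole transaction series targets one completer
    serialization_point: bool  # e.g. an HN-F, but not an HN-I or link element


def choose_process(hop: Hop) -> OrderingProcess:
    # Tunneling is assumed safe only when every transaction in the series
    # goes to the same completer over a channel that preserves order.
    if hop.ordered_channel and hop.same_completer:
        return OrderingProcess.WRITE_TUNNELING
    # Otherwise fall back to write streaming.
    return OrderingProcess.WRITE_STREAMING


def high_throughput(hop: Hop) -> bool:
    # Streaming keeps throughput high only when the completer is itself a
    # serialization point; tunneling keeps it high in either case.
    if choose_process(hop) is OrderingProcess.WRITE_TUNNELING:
        return True
    return hop.serialization_point
```

For example, the Figure 7A hop to HN-F 115 (no ordered channel, serialization point) selects write streaming and still reports high throughput, while the Figure 7B hop to HN-I 125 (ordered channel, single target) selects write tunneling.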
[0094] Although the examples of Figures 3 to 7D above describe the use of ordered channel indications to switch dynamically between write streaming and write tunneling processes, ordered channel indications can also be used in other scenarios to improve the performance of a series of ordered transactions. For example, significant performance improvements can be achieved when multiple ordered transactions to the same address are to be executed, whether these are read transactions, write transactions, or a mixture of both. Without this technique, in an example scenario in which four atomic memory operations to the same address must be executed in sequence, the timing can be as shown in Figure 8. Such memory operations might increment a counter, or perform arithmetic or logical operations on the data at a specific address, so that the data is updated multiple times, and they may need to be performed in a specific order. In the example of Figure 8, the operations are assumed to be atomic operations, since atomic operations typically target the same address. However, the techniques described herein are not limited to use with atomic operations.
[0095] As shown in Figure 8, within the requester element, four atomic store requests can be forwarded from internal interface 500 to interconnect interface 505 (in this example, a CHI interface), as indicated by the four signal lines 515, 520, 525 and 530. However, without this technique, the requester node does not know whether an ordered channel is provided to the completer element, and therefore will not issue the request for a subsequent transaction until an acknowledgment has been received from the completer element for the previous transaction. Accordingly, the request for the first atomic store transaction is issued via path 535, and in due course the completer element issues a data pull signal via path 540. In this example, the data pull signal and the completion signal are assumed to be combined into a single response signal issued via path 540. Upon receiving the data pull signal (or, for a read transaction rather than a write transaction, the read receipt signal), the requester element interface 505 can then issue the request for the next transaction, as indicated by signal line 545.
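To make the cost of the Figure 8 scheme concrete, here is a minimal timing sketch. It assumes, purely for illustration, that every combined data pull and completion response arrives a fixed `rtt` cycles after its request and that the requester issues each request as soon as it is allowed to; `baseline_issue_times` is a hypothetical name and the cycle counts are not taken from the source.

```python
def baseline_issue_times(n: int, rtt: int) -> list[int]:
    """Request issue cycles when each request waits for the previous data pull.

    Hypothetical model: every data pull arrives exactly `rtt` cycles after
    its request is issued, and a request goes out as soon as it is allowed.
    """
    times = []
    t = 0
    for _ in range(n):
        times.append(t)
        t += rtt  # the next request must wait for this request's data pull
    return times
```

With four requests and `rtt=10`, requests go out at cycles 0, 10, 20 and 30, so the last response arrives at cycle 40: the total time grows linearly with the number of same-address transactions times the round-trip latency.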
[0096] As shown in Figure 8, each subsequent atomic memory transaction can only be issued after the data pull signal for the previous atomic memory transaction has been received, as indicated by the sequence of signal lines 550, 555, 560, 565 and 570. This can significantly impact the performance of atomic memory operations. However, using the techniques described herein, performance can be significantly improved if the transactions target a completer reached via an ordered channel. As shown in Figure 9, the four atomic store operations are routed from internal interface 500 to interconnect interface 505 in the same way as in Figure 8, and the first atomic store request is issued in the same manner as in Figure 8, as indicated by signal line 535. In this case, however, the combined data pull and completion signal also provides an ordered channel indication, which here identifies the presence of an ordered channel (the OCE flag is set to 1), as indicated by signal path 540'.
[0097] At this point, since the requester element knows that all the transactions target the same address and will therefore be processed by the same completer element, and knows that an ordered channel exists to that completer element, it can immediately issue all the subsequent atomic store requests without waiting for any further acknowledgment signals from the completer element, as indicated by the series of transmissions 545, 555 and 565 in Figure 9. In due course, a combined completion and data pull signal is provided for each of these transactions, as indicated by signal lines 550', 560' and 570'.
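The Figure 9 timing can be sketched with the same simplified model as a comparison point. This is an illustration only: it assumes a fixed round-trip latency `rtt` for every response and a hypothetical minimum spacing `gap` between back-to-back requests; `ordered_issue_times` is a hypothetical helper. The first request still waits for nothing, but the second must wait for the first response because that response carries the OCE flag; after that, the remaining requests are issued back to back.

```python
def ordered_issue_times(n: int, rtt: int, gap: int = 1) -> list[int]:
    """Request issue cycles when the first response reports OCE = 1.

    Hypothetical model: responses arrive `rtt` cycles after their request,
    and consecutive requests must be at least `gap` cycles apart.
    """
    if n <= 0:
        return []
    times = [0]  # first request goes out before the OCE value is known
    for i in range(1, n):
        # Once the first response (with OCE = 1) arrives at cycle `rtt`,
        # all remaining requests can be issued without further waiting.
        times.append(rtt + (i - 1) * gap)
    return times
```

With four requests, `rtt=10` and `gap=1`, requests go out at cycles 0, 10, 11 and 12, so under this model the last combined response arrives at cycle 22, compared with cycle 40 when every request waits for the previous data pull as in Figure 8.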
[0098] Although in Figure 9 the requester element is assumed to be a fully coherent request node (RN-F), the same technique can also be used for I/O coherent request nodes (RN-I). Similarly, the completer element can be a non-coherent home node (HN-I) rather than a fully coherent home node (HN-F).
[0099] Figure 10 is a flowchart illustrating steps performed by the interconnect interface of a requester element according to an exemplary embodiment. At step 600, when a new transaction is considered, it is determined whether a previous transaction involved the same address. This could be the immediately preceding transaction or any earlier transaction in the sequence. If the new transaction does not involve the same address as a previous transaction, the process proceeds to step 605, where the request is sent; in particular, it has been determined at this stage that there is no address hazard. However, if a previous transaction involved the same address, the process proceeds to step 610, where it is determined whether OCE information is yet available for that previous transaction to the same address. As will be apparent from the earlier discussion of Figure 9, this information can be provided as part of a response signal issued by the completer element that processes transactions for that address. Although at step 610 the requester may wait to receive that response signal in order to determine the OCE information, in an alternative implementation the requester element may maintain a storage in which it captures addresses and the associated OCE values provided for previous transactions involving those addresses, and may therefore consult that storage to determine whether the OCE information is available.
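The alternative implementation mentioned above, in which the requester records addresses and their OCE values, can be sketched as a simple table. This is a minimal illustration, not the described hardware: `OceStore` and its methods are hypothetical names, and a real design would also need to bound the table's size and decide when entries expire, which the source does not specify.

```python
from typing import Optional


class OceStore:
    """Hypothetical requester-side table of addresses and OCE values."""

    def __init__(self) -> None:
        self._seen: set[int] = set()     # addresses of earlier transactions
        self._oce: dict[int, bool] = {}  # OCE value captured per address

    def note_request(self, address: int) -> None:
        # Remember that a transaction to this address has been issued.
        self._seen.add(address)

    def record_response(self, address: int, oce: bool) -> None:
        # Capture the OCE flag carried by a response for this address.
        self._oce[address] = oce

    def has_hazard(self, address: int) -> bool:
        # Step 600: did an earlier transaction in the sequence use this address?
        return address in self._seen

    def oce(self, address: int) -> Optional[bool]:
        # Step 610: None means the OCE information is not yet available.
        return self._oce.get(address)
```

A new transaction to an address with no hazard goes straight to step 605; otherwise `oce()` either returns `None` (keep waiting at step 610) or the recorded flag, which feeds step 615.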
[0100] Then, once the OCE information is available for the address, it is determined at step 615 whether the OCE indication is set to 1 to indicate an ordered channel. If not, the process proceeds to step 620, where the request for the new transaction is sent only once the requester element has received data pull responses for all older transactions to the same address. This corresponds to the signal timing scheme of Figure 8.
[0101] However, if the OCE flag is set to 1, the process proceeds to step 625, where the request can be sent as soon as the requester element has sent the requests for all older transactions to the same address. This corresponds to the improved signal timing scheme of Figure 9 and results in significant performance improvements.
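Steps 600 to 625 of Figure 10 reduce to a single decision about whether the request for a new transaction may be sent now. The sketch below is one possible reading of the flowchart, assuming the requester tracks, for each older transaction to the same address, whether its request has been sent and whether its data pull (or, for reads, read receipt) has arrived. `Older` and `may_send_request` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Older:
    """State of an older transaction to the same address (hypothetical)."""
    request_sent: bool
    response_received: bool  # data pull (writes) or read receipt (reads)


def may_send_request(same_addr_older: Sequence[Older], oce: Optional[bool]) -> bool:
    # Step 600 -> 605: no older transaction to this address, no hazard.
    if not same_addr_older:
        return True
    # Step 610: OCE information not yet available, so keep waiting.
    if oce is None:
        return False
    # Step 615 -> 625: ordered channel, only the older requests need to be out.
    if oce:
        return all(o.request_sent for o in same_addr_older)
    # Step 615 -> 620: no ordered channel, wait for every older response.
    return all(o.response_received for o in same_addr_older)
```

The design keeps the ordered-channel path as cheap as possible (a check on already-local state), while the fallback path preserves the conservative Figure 8 behavior.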
[0102] It should be understood that the techniques described herein enable significant performance improvements when processing sequences of ordered transactions, allowing requester element designs to be used across interconnects while still optimizing the ordering process. For example, the technique enables interconnects such as CHI-based interconnects to use PCIe root port designs that are fully optimized for write ordering, regardless of whether the completer element is a serialization point (e.g. an HN-F) or is not a serialization point (e.g. an HN-I or CXRA component) but has an ordered channel. If transactions target completers with ordered channels, the technique further enables more efficient CPU designs for address-ordered transactions.
[0103] While this technology can be used in a variety of applications, some non-limiting examples in which it can help achieve high throughput include atomic processing for local fully coherent home nodes, atomic processing for remote fully coherent home nodes (in CCIX-based systems), ordered write processing for non-coherent home nodes (to support peer-to-peer PCIe writes), and ordered write processing for remote fully coherent and remote non-coherent home nodes (in CCIX-based systems).
[0104] In the present application, the words "configured to..." are used to mean that an element of an apparatus has a configuration able to carry out the defined operation. In this context, a "configuration" means an arrangement or manner of interconnection of hardware or software. For example, the apparatus may have dedicated hardware which provides the defined operation, or a processor or other processing device may be programmed to perform the function. "Configured to" does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.
[0105] While exemplary embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it should be understood that the invention is not limited to those precise embodiments, and various changes, additions, and modifications can be made therein by those skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims. For example, features of the dependent claims may be combined with features of the independent claims in various ways without departing from the scope of the invention.
Claims
1. An apparatus for processing ordered transactions, the apparatus comprising: multiple completer elements to process transactions; a requester element to issue a sequence of ordered transactions; and an interconnect providing, for each completer element, a communication channel between that completer element and the requester element for transmitting signals between them; wherein: a given completer element processing a given transaction in the sequence is arranged to issue a response signal to the requester element via its associated communication channel, the response signal including an ordered channel indication to identify whether the associated communication channel has an ordered channel attribute, wherein the ordered channel attribute guarantees that the processing of transactions issued by the requester element in a given order via the associated communication channel will be completed by the given completer element in the same given order; and the requester element is responsive to the ordered channel indication to control the timing of issuance of at least one signal from the requester element, the at least one signal being associated with one or more transactions following the given transaction in the sequence.
2. The apparatus of claim 1, wherein the requester element is arranged to select a signal timing scheme from a plurality of signal timing schemes for use in the one or more transactions following the given transaction in the sequence, according to the ordered channel indication.
3. The apparatus of claim 2, wherein the requester element is arranged, when selecting the signal timing scheme, to further consider whether the given transaction and the one or more transactions following the given transaction are to be processed by the same completer element.
4. The apparatus according to claim 2 or claim 3, wherein: during processing of the transactions in the sequence, the requester element is arranged to issue a release indication signal to the completer element processing a transaction when the requester element determines that the data processed by all previous transactions in the sequence is observable, the release indication signal authorizing that completer element to make the data processed by the transaction available to other requester elements; and the plurality of signal timing schemes use different criteria to determine when to issue the release indication signal.
5. The apparatus according to claim 4, wherein: each completer element is configured, when processing a transaction to a specified memory address, to issue a completion signal to the requester element indicating that the completer element has taken sufficient steps to ensure that the result of the operation required by the transaction will be observed by another requester element that issues a further transaction to the specified memory address to that completer element; and according to a first signal timing scheme of the plurality of signal timing schemes, the requester element is arranged to issue the release indication signal to the completer element processing the current transaction when the requester element has received a completion signal for all previous transactions in the sequence preceding the current transaction.
6. The apparatus according to claim 5, wherein: each completer element is configured, when processing a transaction, to issue a data pull signal to the requester element to trigger the requester element to transmit to the completer element the data item to be processed by the transaction; and according to a second signal timing scheme of the plurality of signal timing schemes, the requester element is arranged to issue the release indication signal to the completer element processing the current transaction when the requester element has received a data pull signal for all previous transactions in the sequence preceding the current transaction.
7. The apparatus according to claim 6, wherein, in order to determine which signal timing scheme to use for the current transaction, the requester element is configured to: detect when the response signal providing the ordered channel indication has been provided for both the current transaction and the previous transaction; determine, from a source indication field provided by the two response signals, whether those response signals have been issued by the same completer element; employ the second signal timing scheme when the response signals have been issued by the same completer element and the ordered channel indication indicates the presence of the ordered channel attribute in the associated communication channel between the requester element and that same completer element; and otherwise employ the first signal timing scheme.
8. The apparatus according to claim 7, wherein the data pull signal is used as the response signal providing the ordered channel indication.
9. The apparatus of claim 4, wherein the sequence of ordered transactions includes a sequence of ordered write transactions.
10. The apparatus according to claim 2, wherein: the requester element knows that the sequence of ordered transactions will be processed by the same completer element; and, in order to determine which signal timing scheme to use for each subsequent transaction in the sequence following the current transaction, the requester element is arranged to: detect when the response signal providing the ordered channel indication has been provided for the current transaction; when the ordered channel indication indicates that the ordered channel attribute exists in the associated communication channel between the requester element and the completer element, issue the request transmission signal for each subsequent transaction once a request transmission signal has been issued for all transactions in the sequence preceding that subsequent transaction; and otherwise, constrain the timing of issuance of the request transmission signal for each subsequent transaction until a predetermined signal has been received from the completer element for all transactions in the sequence preceding that subsequent transaction.
11. The apparatus according to claim 10, wherein the sequence of ordered transactions includes a sequence of ordered write transactions, and the predetermined signal is a data pull signal issued by the completer element to the requester element when processing a transaction, to trigger the requester element to transmit to the completer element the data item to be processed by the transaction.
12. The apparatus according to claim 11, wherein the data pull signal for the current transaction is used as the response signal providing the ordered channel indication.
13. The apparatus according to any one of claims 10 to 12, wherein the sequence of ordered transactions includes a sequence of ordered read transactions, and the predetermined signal is a read receipt signal issued by the completer element to the requester element when processing a transaction.
14. The apparatus according to claim 13, wherein the read receipt signal for the current transaction is used as the response signal providing the ordered channel indication.
15. The apparatus according to any one of claims 10 to 12, wherein the requester element knows that the sequence of ordered transactions will be processed by the same completer element because each transaction in the sequence specifies the same address.
16. The apparatus according to any one of claims 1 to 3, wherein the ordered channel indication is arranged to indicate the ordered channel attribute when both the transport layer and the protocol layer of the communication channel are constrained to process transactions sequentially.
17. A method for processing ordered transactions, the method comprising: employing multiple completer elements to process transactions; employing a requester element to issue a sequence of ordered transactions; providing, for each completer element, a communication channel between that completer element and the requester element for transmitting signals between them; issuing, from a given completer element processing a given transaction in the sequence, a response signal to the requester element via its associated communication channel, the response signal including an ordered channel indication to identify whether the associated communication channel has an ordered channel attribute, wherein the ordered channel attribute guarantees that the processing of transactions issued by the requester element via the associated communication channel in a given order will be completed by the given completer element in the same given order; and arranging the requester element to control, in response to the ordered channel indication, the timing of issuance of at least one signal from the requester element, the at least one signal being associated with one or more transactions following the given transaction in the sequence.
18. An apparatus for processing ordered transactions, the apparatus comprising: means for employing multiple completer elements to process transactions; means for employing a requester element to issue a sequence of ordered transactions; means for providing, for each completer element, a communication channel between that completer element and the requester element for transmitting signals between them; means for causing a given completer element processing a given transaction in the sequence to issue a response signal to the requester element via its associated communication channel, the response signal including an ordered channel indication to identify whether the associated communication channel has an ordered channel attribute, wherein the ordered channel attribute guarantees that the processing of transactions issued by the requester element in a given order via the associated communication channel will be completed by the given completer element in the same given order; and means for arranging the requester element to control, in response to the ordered channel indication, the timing of issuance of at least one signal from the requester element, the at least one signal being associated with one or more transactions following the given transaction in the sequence.
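The release-indication timing of claims 5 to 7 can be illustrated with a short sketch. It is one possible reading of the claims under stated assumptions, not a definitive implementation: `Resp`, `use_second_scheme` and `may_issue_release` are hypothetical names, and treating "OCE not yet received" as a fall-back to the first (completion-based) scheme is an assumption drawn from the "otherwise" branch of claim 7.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Resp:
    """What the requester has seen for one transaction (hypothetical model)."""
    data_pull: bool           # data pull signal received
    completion: bool          # completion signal received
    source_id: Optional[int]  # source indication field of the OCE-carrying response
    oce: Optional[bool]       # ordered channel indication, None if not yet seen


def use_second_scheme(current: Resp, previous: Resp) -> bool:
    # Claim 7: both OCE-carrying responses must have arrived, come from the
    # same completer, and indicate an ordered channel.
    if current.oce is None or previous.oce is None:
        return False  # assumption: not-yet-known OCE falls back to scheme 1
    same = current.source_id is not None and current.source_id == previous.source_id
    return same and current.oce and previous.oce


def may_issue_release(current: Resp, earlier: Sequence[Resp]) -> bool:
    # With no earlier transactions, nothing older needs to be observable.
    if not earlier:
        return True
    if use_second_scheme(current, earlier[-1]):
        # Second scheme (claim 6): data pull received for all earlier transactions.
        return all(r.data_pull for r in earlier)
    # First scheme (claim 5): completion received for all earlier transactions.
    return all(r.completion for r in earlier)
```

Under this model, two responses from the same completer with OCE set allow the release indication once the earlier data pull has arrived, whereas a response from a different completer forces the requester to wait for the earlier completion.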