System and method for service redirection of memory access requests

By combining a DMA controller and a buffer, efficient signal fusion from multiple data sources is achieved, addressing the low signal processing efficiency in self-driving and autonomous driving applications and improving the processing capability of computing devices.

CN120660079B · Active · Publication Date: 2026-06-19 · MERCEDES BENZ GRP

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: MERCEDES BENZ GRP
Filing Date: 2023-12-19
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing technologies struggle to efficiently process signal fusion from multiple data sources, especially in autonomous driving and self-driving applications, where sensor-generated signals are processed inefficiently and fail to effectively utilize the potential of commercially available processing units.

Method used

A direct memory access (DMA) controller receives redirected write and read requests and performs aggregation (gather) and scatter operations in response; combined with a buffer and a pool of compute nodes, this enables efficient processing of signal streams.

Benefits of technology

It improves the efficiency of signal processing and the utilization of computing resources, reduces computing latency, and enhances data processing capabilities in autonomous driving and self-driving applications.

Generated by Eureka AI based on patent content.


Abstract

Example methods, apparatuses, and / or articles of manufacture are disclosed that can be implemented, in whole or in part, in connection with processing signal streams. In one implementation, a direct memory access (DMA) controller may perform an aggregation (gather) operation, based at least in part on one or more redirected write requests, to obtain one or more data items from memory; translate the obtained data items to one or more addresses in the memory; and perform a scatter operation based on the one or more addresses.

Description

Technical Field

[0001] The subject matter disclosed herein relates to processing signals received in streams from multiple data sources.

Background Technology

[0002] Self-driving and / or autonomous driving applications, as well as other automotive and robotic applications, may rely on the fusion of signals, measurements, and / or observations generated by multiple sensors. Processing for such applications may involve manipulating arrays of data elements. These applications can be executed and / or implemented by commercially available central processing units (CPUs) and / or graphics processing units (GPUs). Such commercially available processing units can be configured to manipulate elements of an input array to generate elements of an output array.

Summary of the Invention

[0003] One embodiment disclosed herein relates to a system comprising: a memory including one or more memory devices; and a direct memory access (DMA) controller coupled to the memory via a bus, the DMA controller being configured to: receive one or more redirected write requests; perform an aggregation operation, at least in part based on the received one or more redirected write requests, to obtain one or more words from the memory; translate the one or more words obtained by performing the aggregation operation to one or more addresses in the memory; and perform a scatter operation to write a data item to the one or more addresses in the memory. In a particular embodiment, the memory is a first memory, and the system further includes a second memory operable as a buffer, wherein the buffer is configured to store values and / or states retrieved from the first memory. The DMA controller may also be configured to parse the values and / or states in the buffer to determine at least one address among the one or more addresses in the first memory. In one example, the DMA controller is further configured to apply one or more arithmetic operations to at least one of the parsed values and / or states to determine the at least one address.

[0004] In another specific embodiment, the DMA controller is further configured to interpret the one or more redirected write requests as requests for an aggregation operation followed by a scatter operation. In yet another specific embodiment, the DMA controller is further configured to form a request for the scatter operation for the redirected write request based at least in part on the one or more addresses in the memory to which the data from the aggregation operation is translated. In yet another specific embodiment, the system further includes an initiator configured to initiate the one or more redirected write requests, wherein the DMA controller is configured to perform the aggregation operation in response to a signal from the initiator. The initiator of the one or more redirected read requests may include one or more registers of local memory or a computing node configured to receive sensor measurements and / or observations as data items in a signal stream.

[0005] Another specific embodiment disclosed herein relates to a method at a direct memory access (DMA) controller, the method comprising: performing a first aggregation operation based at least in part on one or more redirected read requests; transforming one or more words obtained by performing the first aggregation operation to one or more addresses; and performing a second aggregation operation to forward a data item located at the one or more addresses to a destination determined at least in part based on the one or more redirected read requests. In one example, transforming the one or more words obtained by performing the first aggregation operation to the one or more addresses further comprises parsing values and / or states in a buffer to determine a memory address. In another example, the method further comprises applying one or more arithmetic operations to the parsed values and / or states to determine the memory address.

[0006] In one particular embodiment, the method further includes interpreting the one or more redirected read requests as a first aggregation operation request among two aggregation operation requests. In another particular embodiment, the method further includes forming a second request among the two aggregation operation requests based at least in part on the one or more addresses determined from the result of the first aggregation operation. In yet another particular embodiment, the method further includes performing the first aggregation operation in response to a signal from an initiator of the one or more redirected read requests. In one example, the initiator of the one or more redirected read requests includes a process performed by a computing node to perform one or more sensor fusion operations.

[0007] Another specific embodiment disclosed herein relates to a system comprising: a memory including one or more memory devices; and a direct memory access (DMA) controller coupled to the memory via a bus. The DMA controller may be configured to: receive one or more redirected read requests; perform a first aggregation operation, at least in part based on the received one or more redirected read requests, to obtain one or more words from the memory; translate the one or more words obtained by performing the first aggregation operation to one or more addresses; and perform a second aggregation operation to forward data items located at the one or more addresses to a destination, the destination being determined at least in part based on the one or more redirected read requests.

Attached Figure Description

[0008] The claimed subject matter is particularly pointed out and distinctly claimed in the concluding portion of the specification. However, both as to organization and / or method of operation, together with objects, features, and / or advantages thereof, it may best be understood by reference to the following detailed description if read with the accompanying drawings, in which:

[0009] Figure 1 is a schematic diagram of a computing device according to an embodiment;

[0010] Figure 2A and Figure 2B are schematic diagrams of a computing device including a direct memory access (DMA) controller and / or engine according to one embodiment, the DMA controller and / or engine including buffers;

[0011] Figure 3A and Figure 3E are schematic diagrams of a computing device including a DMA controller and / or engine according to one embodiment, the DMA controller and / or engine including buffers for facilitating scatter and aggregation operations;

[0012] Figure 3C is a schematic diagram illustrating the non-addressable portion of an addressable line according to one embodiment;

[0013] Figure 3B and Figure 3D are flowcharts of processes for facilitating scatter and aggregation operations according to an embodiment;

[0014] Figure 4A and Figure 4D are schematic diagrams of a computing device including a DMA controller according to one embodiment, the DMA controller including a buffer for facilitating the redirection of DMA transactions;

[0015] Figure 4B and Figure 4C are flowcharts of processes for facilitating the redirection of DMA transactions according to an embodiment;

[0016] Figure 5A and Figure 5C are schematic diagrams of a computing device according to one embodiment for facilitating the processing of multiple signal streams from multiple associated sources;

[0017] Figure 5B is a flowchart of a method for processing multiple data streams from multiple associated sources according to one embodiment;

[0018] Figure 6 is an illustration depicting an example sensor signal aggregation for an example vehicle according to one embodiment; and

[0019] Figure 7 is an example schematic block diagram depicting features of an example vehicle in a self-driving / autonomous driving application according to one embodiment.

[0020] Reference is made in the following detailed description to the accompanying drawings, which form part of this description, wherein similar reference numerals may refer to corresponding and / or similar parts throughout the text. It should be understood that the drawings are not necessarily drawn to scale, such as for simplicity and / or clarity of illustration. For example, the dimensions of some aspects may be exaggerated relative to others. Furthermore, it should be understood that other embodiments may be utilized. Additionally, structural and / or other changes may be made without departing from the claimed subject matter. Throughout this specification, references to “claimed subject matter” refer to subject matter intended to be covered by one or more claims or any part thereof, and are not necessarily intended to refer to the entire set of claims, a particular combination of claims (e.g., method claims, apparatus claims, etc.), or a particular claim. It should also be noted that directions and / or references (e.g., such as up, down, top, bottom, etc.) may be used to facilitate discussion of the drawings and are not intended to limit the application of the claimed subject matter. Therefore, the following detailed description should not be considered as limiting the claimed subject matter and / or its equivalents.

Detailed Implementation

[0021] Throughout this specification, references to an embodiment, implementation, scheme, and / or the like mean that a particular feature, structure, characteristic, and / or the like described in relation to a particular embodiment and / or scheme is included in at least one embodiment and / or scheme of the claimed subject matter. Therefore, the appearance of such phrases in various places throughout this specification is not necessarily intended to refer to the same embodiment and / or scheme or to any one particular embodiment and / or scheme. Furthermore, it should be understood that the particular features, structures, characteristics, and / or the like described can be combined in various ways in one or more embodiments and / or schemes, and are therefore within the intended scope of the claims. In general, of course, as is always the case for the specification of a patent application, these and other issues may vary in a particular context of use. In other words, throughout this disclosure, the particular context of description and / or usage provides helpful guidance regarding reasonable inferences to be drawn; however, likewise, "in this context," without further qualification, generally refers at least to the context of the present patent application.

[0022] To facilitate efficient processing of data items in one or more signal streams, a Direct Memory Access (DMA) controller coupled to memory via a bus can be configured to receive one or more redirected write requests. The DMA controller can perform an aggregation operation, at least in part, based on the received one or more redirected write requests, to obtain one or more words from the memory; translate the one or more words obtained from the aggregation operation into one or more addresses in the memory; and perform a scatter operation to write the data item at the one or more addresses in the memory determined by the translation. In another embodiment, the DMA controller can process one or more redirected read requests by performing a first aggregation operation, at least in part, based on one or more redirected read requests; translate the one or more words obtained from the execution of the first aggregation operation into one or more addresses; and perform a second aggregation operation to forward the data item located at the one or more addresses determined by the translation to a destination, which is determined at least in part based on the one or more redirected read requests.
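The two request types described above can be illustrated with a minimal software model. This is only a sketch under stated assumptions: memory is modeled as a flat list of words, the translation step is assumed to be a simple base-plus-stride calculation, and all function and parameter names (`translate`, `redirected_write`, `redirected_read`, `base`, `stride`) are hypothetical and do not come from the specification.

```python
# Illustrative model only: "memory" is a flat list of words, and every name
# below is hypothetical rather than taken from the specification.

def translate(word, base=0, stride=1):
    """Turn a gathered word into a target address (assumed: base + word * stride)."""
    return base + word * stride

def redirected_write(memory, pointer_addrs, data_item, base=0, stride=1):
    """Gather words, translate them to addresses, then scatter data_item there."""
    gathered = [memory[a] for a in pointer_addrs]              # aggregation (gather)
    targets = [translate(w, base, stride) for w in gathered]   # translation
    for t in targets:                                          # scatter
        memory[t] = data_item
    return targets

def redirected_read(memory, pointer_addrs, base=0, stride=1):
    """First gather yields words; second gather fetches the items they point to."""
    gathered = [memory[a] for a in pointer_addrs]              # first aggregation
    targets = [translate(w, base, stride) for w in gathered]   # translation
    return [memory[t] for t in targets]                        # second aggregation

memory = [0] * 16
memory[0], memory[1] = 2, 3          # words that will be translated to addresses
memory[12], memory[14] = 111, 222    # data items reachable through those words

# Words 2 and 3 translate to addresses 8 + 2*2 = 12 and 8 + 3*2 = 14.
read_result = redirected_read(memory, [0, 1], base=8, stride=2)
# Words 2 and 3 translate to addresses 4 + 2 = 6 and 4 + 3 = 7.
write_targets = redirected_write(memory, [0, 1], 99, base=4, stride=1)
```

In this model the requester never supplies the final addresses itself; it only names where the address-bearing words live, which is the sense in which the request is "redirected".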

[0023] In some cases, the host central processing unit (CPU) can be part of the computing device in the vehicle and can process data items in memory for automotive applications. As an example, automotive applications such as self-driving / autonomous driving applications (e.g., fully autonomous, semi-autonomous, driver assistance systems, etc.) can employ particle filters to fuse sensor signals and / or observed signal streams to, for example, update the particle filter state. Such applications can be implemented, for example, in systems such as automated machines (cars, trucks, etc.). In this context, as referred to herein, a “signal stream” means the time-varying progression of a sequence of encoded data items to be delivered to a receiving device via a signal transmission medium. The encoded data items delivered in the signal stream (also referred to as data values) can express attributes indicating conditions and / or events, subject identifiers, timestamps indicating the time of the event, and metadata, to name just a few examples of attributes that can be expressed by encoded data items delivered in a signal stream. In a particular implementation, the signal stream may deliver sensor measurements and / or observations along with associated timestamps to indicate the time when such measurements and / or observations were obtained.

[0024] In one aspect of an implementation, information originating from different sources and arriving as corresponding signal streams can be processed as an "information confluence" to determine computational results. Such information confluences can be processed by associating and / or correlating information items from different sources according to specific attributes (e.g., time, space, reliability, confidence, etc.). The processing of this information confluence can then include performing one or more operations on the items based on one or more attributes to produce results. In a particular implementation, data items in the information confluence can be processed by updating one or more states of a particle filter. For example, such a particle filter can enable the processing of a confluence of arrays of sensor signals / observations (e.g., received from a signal stream) to update the states of measurement particles, filter particles, static particles, and / or dynamic particles. In one implementation, measurements and / or observations in the confluence of arrays of measurement results can be generated by different sensors. Such measurements and / or observations generated by different sensors can still be associated and / or correlated in time and space. According to one implementation, associating data items from different sources in the confluence of arrays can be achieved, at least in part, using a radix sort applied to an array of keys. In a specific implementation, sensor signals and / or observations can be formatted into arrays to generate a confluence of arrays. An example procedure for generating such a confluence of arrays can be executed according to the following pseudocode:

[0025]

[0026] According to one implementation, the confluence of multiple signal streams may include a mapping of data items in different input signal streams to data items in one or more output signal streams. For example, data items in such input signal streams may include sensor measurements and / or observations from sensors associated with those input signal streams. Therefore, the confluence of such input signal streams may include a mapping of sensor measurements and / or observations (e.g., from different sensors associated with those input signal streams) to data items in the output signal stream. Such data items in the output signal stream may include values inferred / calculated based on those sensor measurements and / or observations. In the pseudocode example provided above, a number of input signal streams may be defined as S_in[] (containing data items value_in[]), and a number of output signal streams may be defined as S_out[] (containing data items value_out[]). Here, the expression “(value_out[], S_out_enable[]) = f(value_in[],parameters) for(i: 0..(l-1))” can map the data items value_in[] in the input signal streams S_in[] to the data items value_out[] in the output signal streams S_out[] according to the function f().
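The mapping from input streams to output streams can be sketched in software as follows. This is not the pseudocode referred to in the specification; it is a hypothetical illustration that reuses the names `S_in`, `S_out`, `value_in`, `value_out`, `S_out_enable`, `f`, `parameters`, and `read_next` from the text, while the weighted-average fusion rule, the `steps` argument, and the `weights` parameter are assumptions chosen only to make the example concrete.

```python
# Hypothetical sketch: the fusion rule inside f() and the "weights" parameter
# are assumptions, not the pseudocode referenced in the specification.

def read_next(stream):
    """Return the next data item of an input stream, or None when exhausted."""
    return next(stream, None)

def f(value_in, parameters):
    """Example mapping onto one output stream: a weighted average of the inputs.
    Returns (value_out, S_out_enable)."""
    present = [(v, w) for v, w in zip(value_in, parameters["weights"]) if v is not None]
    if not present:
        return [0.0], [False]          # nothing to emit on this step
    total_weight = sum(w for _, w in present)
    fused = sum(v * w for v, w in present) / total_weight
    return [fused], [True]

def confluence(S_in, parameters, steps):
    """Map items of the input streams to items of the output streams via f()."""
    S_out = [[]]                       # a single output signal stream in this sketch
    for _ in range(steps):
        value_in = [read_next(s) for s in S_in]
        value_out, S_out_enable = f(value_in, parameters)
        for j, enabled in enumerate(S_out_enable):
            if enabled:                # only enabled outputs receive a data item
                S_out[j].append(value_out[j])
    return S_out

radar = iter([10.0, 12.0])
lidar = iter([14.0, 16.0, 18.0])
result = confluence([radar, lidar], {"weights": [1.0, 3.0]}, steps=4)
```

Here `S_out_enable` lets `f()` decline to produce an output item on a given step, which is one plausible reading of why the expression returns an enable flag alongside `value_out[]`.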

[0027] According to one implementation, computing devices, circuitry, and / or logic may form a “convergence engine” (CE), also referred to as a “convergence” and / or a “convergence processor” (CP), to process a confluence of data items as discussed above. Such a confluence of data items may include, for example, a confluence of signal streams and / or a sequence of confluences of signal streams, processed with reduced latency and / or reduced computational resources (e.g., power, memory, etc.). In a particular implementation, the output of a confluence operation of such a CE may be provided as all or part of the input to subsequent confluence operations. Confluence characteristics, such as k and functions such as f() and read_next() as shown in the pseudocode example above, may be part of the runtime programming of the CE.

[0028] According to one implementation, the CE may employ a Direct Memory Access (DMA) subsystem, which may include a DMA controller (also referred to as a DMA "engine") capable of being configured to initiate read and write operations, for example, between row-accessible memory and transfer memory. The specific procedures by which the CE accesses memory can be determined parametrically at compile time and physically at runtime. Therefore, in a particular implementation, the events that trigger the execution of the DMA controller may not be limited to events occurring at the Arithmetic Logic Unit (ALU) (e.g., loads and stores from the ALU). The DMA controller can be triggered to execute a DMA transaction by the start of a confluence operation. The DMA controller can then execute such DMA transactions independently of the ALU (e.g., depending only on the availability of read and write access at the valid endpoints of such DMA transactions).

[0029] Figure 1 is a schematic diagram of a system 100 for performing DMA transactions. System 100 includes multiple components that communicate via bus 101. Components include a host CPU 102, a memory controller 106, RAM 108 (which may form main system memory), peripheral devices 114, and a DMA controller 112. In this context, "direct memory access" as used herein refers to a process performed by one or more hardware subsystems and / or circuitry to access a specific memory independently of a specific processing unit and / or central processing unit (e.g., independent of host CPU 102). According to one embodiment, DMA controller 112 may initiate operations to access (e.g., read or write access) random access memory (RAM) 108 via bus 101 independently of host CPU 102. For example, DMA controller and / or engine 112 may control and / or execute transactions to transfer data items (also referred to as data values) between peripheral devices 114 and RAM 108 independently of the actions of host CPU 102 (via memory controller 106). For example, such DMA transactions can be triggered by signals, conditions, and / or events (e.g., interrupt signals).

[0030] As discussed above, the processing of data items for automotive or other computing applications can be enhanced through the Convergence Engine (CE). Figure 2A and Figure 2B are schematic diagrams of a computing device 200 according to one embodiment, which includes a direct memory access (DMA) controller 212 (also referred to as a DMA engine 212) and a buffer 216 to implement one or more aspects of a CE. The computing device 200 may be, or can be formed into, for example, a system-on-a-chip (SoC), a microchip, control circuitry, or some other computing device. In some specific embodiments, the computing device 200 may form a vehicle controller, such as an advanced driver assistance system (ADAS) device, a telematics control unit (TCU), an electronic control unit (ECU), a centralized vehicle computer, or some other vehicle controller. As illustrated in Figure 2A and Figure 2B, computing device 200 may also include CPU 202 (also referred to as host CPU), transfer memory 218, row-accessible memory 208, buffer 216, and compute node (CN) pool 220 (also referred to as CN pool 220).

[0031] In one embodiment, the CN may include a single processing circuit core capable of performing operations to map input operands to output computation results. In another embodiment, the CN may include multiple differentiated processing cores for performing operations to map input operands to output computation results. In yet another embodiment, two or more non-concurrently executing CNs may be implemented on the same processing circuit core. For example, the processing circuitry may use a first CN to generate an output result (e.g., stored in transfer memory 218) that serves as input to a subsequently executed second CN implemented on the same processing circuitry.

[0032] The CNs in CN pool 220 may include dedicated local memory (e.g., static random access memory (SRAM)) and general-purpose registers to receive operands for operations to be performed and / or provide the results of operations. In this example, host CPU 202 may use transfer memory 218 to store data items and may control the CNs in CN pool 220 to perform operations on these data items. Transfer memory 218 may be physically closer to the CNs and / or may operate with lower access latency, and therefore may be used as a (high-speed) cache to store data items. Row-accessible memory 208 may be external to transfer memory 218 and may provide a larger amount of memory space relative to transfer memory 218, but may be physically farther away from CN pool 220 and may operate with longer access latency. According to one embodiment, transfer memory 218 may include one or more synchronization mechanisms to facilitate communication between and / or among CNs in CN pool 220 (e.g., synchronization for communication between CNs with different execution latencies). In one particular implementation, row-accessible memory 208 can be separated from transfer memory 218, CPU 202, and buffer 216 via a bus (not shown). According to one embodiment, buffer 216 may be formed within the core circuitry for implementing the DMA controller and / or engine 212, such that buffer 216 is distinct and separate from the circuitry forming transfer memory 218. Forming buffer 216 within the core circuitry of the DMA controller and / or engine 212 in this way can reduce and / or minimize the latency associated with loading data items into buffer 216 and storing data items from buffer 216 during DMA operations.

[0033] In one embodiment, the DMA controller and / or engine 212 may be configured to interface with transfer memory 218 and row-accessible memory 208. Transfer memory 218 and / or row-accessible memory 208 may also be cache line addressable. In other words, in this example, row-accessible memory 208 may be cache line addressable memory. Transfer memory 218 and / or row-accessible memory 208 can provide data items (also referred to as data values), which can be manipulated, operated on, or otherwise processed by CNs in CN pool 220. According to one embodiment, all or a portion of transfer memory 218 may be organized as a cache that can be integrated with commercially available components. It should also be noted that caching in transfer memory 218 or in buffer 216 can mitigate manufacturing defects and / or enable embodiments to be adapted to applications larger than originally intended.

[0034] In one implementation, the DMA controller and / or engine 212 may be configured to handle cache line-sized data items (e.g., 64 bytes or 128 bytes), where such data items can be located and accessed by cache line addresses even during scatter-gather operations. Such cache line addresses for 64-byte cache lines can be expressed using binary notation ending with six zeros. Similarly, cache line addresses for 128-byte cache lines can be expressed using binary notation ending with seven zeros. According to one implementation, the DMA controller and / or engine 212 may be configured to handle word-sized data items even when row-accessible memory 208 remains addressable only at cache line granularity. Transfer memory 218 may be organized as a (high-speed) cache, a word-addressable register, or a collection of transfer memory buffers such as a word-width first-in-first-out (FIFO) buffer, or a combination thereof. In one implementation, such transfer memory buffers in transfer memory 218 may include circuitry and / or devices specifically designed to act as FIFO buffers. In other implementations, such transfer memory buffers in transfer memory 218 may include static random access memory (SRAM) devices or network-on-chip (NOC) devices specifically configured to act as FIFO buffers. To facilitate scatter or aggregation (gather) operations that transfer word-sized data items to and from row-accessible memory 208, the DMA controller and / or engine 212 may implement buffer 216 (e.g., located between row-accessible memory 208 and transfer memory 218). Buffer 216 may be configured to store bytes from multiple cache lines together for placement into a destination word in transfer memory 218. The DMA controller and / or engine 212 may also be capable of performing multicast write operations in either direction (e.g., from transfer memory 218 to row-accessible memory 208, or from row-accessible memory 208 to transfer memory 218). Buffer 216 may be distinct from transfer memory 218 and the memory DMA controller and / or engine 212. As discussed herein, buffer 216 may be formed in the core circuitry that implements DMA controller and / or engine 212.
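The cache line arithmetic and the role of a staging buffer when gathering word-sized items from line-granular memory can be illustrated with a short sketch. It assumes a 64-byte cache line and a 4-byte word; the names `line_address`, `gather_words`, `staging`, and `line_memory` are hypothetical, and the dictionary-based memory is only a stand-in for hardware that is addressable per cache line.

```python
LINE_BYTES = 64   # assumed cache line size; 128 would work the same way
WORD_BYTES = 4    # assumed word size

def line_address(byte_addr, line_bytes=LINE_BYTES):
    """Clear the low log2(line_bytes) bits to get the enclosing cache line address."""
    return byte_addr & ~(line_bytes - 1)

def gather_words(line_memory, byte_addrs):
    """Fetch whole cache lines into a staging buffer, then extract single words."""
    staging = {}   # stand-in for a buffer: line address -> bytes of that line
    words = []
    for addr in byte_addrs:
        base = line_address(addr)
        if base not in staging:
            staging[base] = line_memory[base]   # one line-sized access per line
        offset = addr - base
        words.append(staging[base][offset:offset + WORD_BYTES])
    return words, len(staging)

# Line-addressable memory modeled as a dict keyed by cache line address.
line_memory = {base: bytes((base + i) % 256 for i in range(LINE_BYTES))
               for base in range(0, 256, LINE_BYTES)}

# Byte addresses 72 and 76 share the line at address 64, so only three
# distinct lines are fetched for the four requested words.
words, lines_fetched = gather_words(line_memory, [4, 72, 76, 200])
```

With 64-byte lines, `line_address(200)` is 192, whose binary form `11000000` ends in six zeros, matching the description above; passing `line_bytes=128` yields addresses ending in seven zeros.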

[0035] In this context, as used herein, "transfer memory" means circuitry used to facilitate the communication of data items between and / or among CNs, such as CNs in CN pool 220. In one particular embodiment, such transfer memory may transfer the result of an execution of a first operation at a first CN as an input operand for a second operation to be performed at a second CN (e.g., in a computational pipeline). In one particular embodiment, transfer memory 218 may be organized as a static random access memory (SRAM) device serving as an access-controlled memory or as a shared memory, as a word-addressable or SIMD vector-addressable register file, or as circuitry and / or devices specifically configured to act as a first-in-first-out (FIFO) buffer. Such circuitry and / or devices specifically configured to act as FIFO buffers may have a word width or a single-instruction, multiple-data (SIMD) vector width, or may be circuitry or a network-on-chip (NOC) device coupled between endpoints, or combinations thereof, to name just a few examples.

[0036] As described above, the CNs in CN pool 220 can operate on data items read from memory. In this context, as referred to herein, a “compute node” means an identifiable and distinct set of computing resources (e.g., hardware and executable instructions) that can be configured to perform operations to process input values to provide output values. The CNs in CN pool 220 may include scalar CNs and / or processing circuitry cores to implement arithmetic logic units (ALUs), digital signal processors (DSPs), vector CNs, VLIW engines, or field-programmable gate arrays (FPGAs) or combinations thereof, to provide only a few examples of specific circuitry cores that can be used to implement CNs in CN pool 220. CN pool 220 can be implemented according to various architectures. For example, CN pool 220 may include one or more CNs implemented in full-featured or simplified forms according to a Reduced Instruction Set Computing (RISC) architecture, a Complex Instruction Set Computing (CISC) architecture, a Very Long Instruction Word (VLIW) architecture, or some combination of these types. CNs in CN pool 220 may also include combinations of scalar, SIMD, or Multiple Instruction, Single Data (MISD) ALUs.

[0037] According to one implementation, the features of computing device 200 may include commercially available features such as, for example, transfer rings, Clos networks, and shuffle circuits. CN pool 220 may facilitate multithreading in CNs, CN clustering, mailboxes, interrupts, inter-CN synchronization features, atomic operations at locations in transfer memory 218, and features added to meet security and safety requirements, to provide only a few examples.

[0038] In a particular implementation, CN clusters within CN pool 220 may be formed at least in part based on resource trade-offs and may be permanently defined within the integrated circuit (IC) device. In some implementations, depending on how CN pool 220 will be configured, two devices within the same IC may implement different clusters of associated CNs. For example, the processor configuration may define the processing of multiple flows with CN clusters at least in part based on the associated flows to be executed. Each associated cluster may, for example, process the flows of an array such that the output of one cluster is provided to the input of one or more other clusters.

[0039] According to one embodiment, the transfer memory 218 may form one or more buffers, including one or more First-In-First-Out (FIFO) buffers. These one or more FIFO buffers may include “vertical” FIFO buffers. Such a vertical FIFO buffer may be a buffer having an endpoint that interfaces with the DMA controller and / or engine 212 (such endpoints may be referred to as “external” endpoints) and another endpoint that forms a register (such endpoints may be referred to as “internal” endpoints), which provides data items that can be used as operands for CNs in CN pool 220. The internal endpoint of the vertical FIFO buffer may be shared by multiple CNs in CN pool 220. If such an internal endpoint includes a FIFO output (i.e., the outgoing end of the FIFO buffer), broadcasting to multiple CNs in CN pool 220 can be implemented, for example. If such an internal endpoint includes a FIFO input (i.e., the incoming end of the FIFO buffer), hardware locks and / or instructions executed on CNs in CN pool 220 can prevent race conditions. In some specific embodiments, the FIFO buffers formed in the transfer memory 218 may also include “horizontal” FIFO buffers. The horizontal FIFO buffer can be a buffer with two endpoints that provide data items as operands for different CNs in CN pool 220, or simply provide data items to locations in transfer memory 218. It should be noted that the FIFO buffer formed in transfer memory 218 can provide transparent blocking and releasing mechanisms, for example, when input and output speeds differ. If the external endpoints of the FIFO buffer in transfer memory 218 are registers or operands that some CNs in CN pool 220 need to consume / process, and if CN processing is slow, the DMA controller and / or engine 212 can eventually implement a blocking mechanism. Conversely, if the DMA controller and / or engine 212 writes slowly to such a FIFO buffer in transfer memory 218, the CNs in CN pool 220 can eventually implement a blocking mechanism. 
A similar transparent blocking and releasing mechanism can exist if the FIFO buffer is implemented between CNs in CN pool 220 and / or between locations in transfer memory 218.
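The transparent blocking behaviour described above can be illustrated with a minimal software sketch. It is not the hardware mechanism of the disclosure: a bounded `queue.Queue` stands in for a FIFO buffer in the transfer memory, one thread stands in for the DMA side, and another for a CN. The function name `run_fifo_demo` and the `depth` parameter are hypothetical.

```python
import queue
import threading


def run_fifo_demo(items, depth=2):
    """Move items through a bounded FIFO from a producer to a consumer."""
    fifo = queue.Queue(maxsize=depth)  # depth: assumed, configurable FIFO depth
    received = []
    sentinel = object()  # marks the end of the stream

    def producer():
        # Stands in for the DMA side writing into the FIFO.
        for item in items:
            fifo.put(item)  # blocks while the FIFO is full (slow consumer)
        fifo.put(sentinel)

    def consumer():
        # Stands in for a CN consuming operands from the FIFO.
        while True:
            item = fifo.get()  # blocks while the FIFO is empty (slow producer)
            if item is sentinel:
                break
            received.append(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Neither side uses an explicit lock in its own code; whichever side is faster simply waits on the FIFO, which is the sense in which the synchronization is transparent.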

[0040] In an IC device, if the transfer memory 218 forms a FIFO buffer, the circuitry at the endpoints of the FIFO buffer can be permanent or configurable (e.g., via internal FPGA circuitry). Specific implementations may include segments of vertical and horizontal FIFO buffers formed in the IC device circuitry. Such buffer circuitry may have endpoints that can be configured at runtime to originate from at least one of: operands or registers of CNs in CN pool 220, an interface to buffer 216, a location in transfer memory 218, and endpoints of other FIFO buffers in transfer memory 218. If the endpoints of the FIFO buffers in transfer memory 218 are operands or registers of CNs in CN pool 220, such FIFO buffer endpoints may be shared among multiple CNs in CN pool 220.

[0041] According to one implementation, CNs in CN pool 220 may receive operands for computational operations, for example, from registers in a register file, from a FIFO buffer in transfer memory 218, and / or from special constant and parameter registers (which may include shared registers). In some cases, a portion of the register file may be shared among multiple CNs in CN pool 220. Within computing device 200, host CPU 202 may also be able to access the constant and parameter registers. Host CPU 202 can provide host functions such as, for example, initiating a confluence operation to be performed after configuration tasks have been completed. Such configuration tasks may include defining CN clusters in CN pool 220, configuring inter-CN communication, configuring the width and depth of the FIFO buffer in transfer memory 218, configuring the endpoints of the FIFO buffer in transfer memory 218, defining manager CNs in CN pool 220 and CN clusters in CN pool 220 to be managed by the manager CNs, establishing a DMA controller and / or engine 212, establishing atoms, establishing communication and synchronization resources, and monitoring the termination of confluences, to name a few examples.

[0042] In one implementation, buffer 216 can facilitate confluence processing in several ways. In one such non-limiting aspect, an input signal stream in the confluence may include an “indirect stream” (e.g., a set of addresses in row-accessible memory 208 to be read). In this case, such an indirect signal stream may not be fed directly to a CN in CN pool 220, but may be provided to the DMA controller and / or engine 212, which may then transmit a signal stream of the read data items to that CN. In other words, the latency of access via random read operations (random reads of rows or words, random indirect reads of rows or words, or combinations thereof) can be hidden by extracting the pattern of random access, building a significantly long access list, and performing the necessary extraction and indirection at the DMA controller and / or engine 212 itself. In other cases, the input signal stream may include a “double indirect stream”, in which the signal stream contains addresses at which data items, after some processing, provide a further level of indirection. In a particular implementation, the first level of indirection in a double indirect stream may be read and used multiple times. In sensor fusion applications, multiple confluences can be configured, where the initial confluence can be constructed using lookup tables (LUTs) for data items at the first and second levels of indirection (e.g., within the transfer memory 218 itself). However, it should be understood that these examples are not limiting.
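As a rough illustration of the indirect and double indirect streams described above, the following sketch models memory as a Python list and resolves the indirection in one place, as the DMA engine is described as doing. The function names and the `to_address` parameter (the "some processing" step that turns a first-level data item into an address) are hypothetical and are not taken from the disclosure.

```python
def indirect_stream(memory, addresses):
    """Turn a stream of addresses into the stream of data items they name."""
    return [memory[a] for a in addresses]


def double_indirect_stream(memory, addresses, to_address=lambda item: item):
    """Resolve two levels of indirection.

    The first-level items are fetched, converted to addresses by the
    assumed `to_address` step, and then used for a second fetch.
    """
    first_level = indirect_stream(memory, addresses)
    second_addresses = [to_address(item) for item in first_level]
    return indirect_stream(memory, second_addresses)


# Usage: cells 0 and 1 hold the addresses 3 and 4 of the final data items.
memory = [3, 4, 0, "x", "y"]
first = indirect_stream(memory, [0, 1])          # data items at cells 0 and 1
final = double_indirect_stream(memory, [0, 1])   # data items at cells 3 and 4
```

The consumer only ever sees the resolved data items, which is how the latency of the random reads can be kept away from the CNs.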

[0043] According to one implementation, some processing within the confluence of signal streams (e.g., within f() and read_next() in the pseudocode example above) can be viewed as extracting information from data items distributed across CN pool 220. An example of such extraction is provided in Table 1 below.

[0044]

[0045] Table 1

[0046] According to one implementation, results can be provided by CNs in CN pool 220, which operate collaboratively using operations such as shifting, shuffling, broadcasting, and multicasting of operands between participating CNs, to name just a few. Such features can exist within a CN, for example, as word operands of a SIMD vector CN within CN pool 220. CN pool 220 can be configured to have such features for inter-CN communication via configurable bridge circuitry between CNs. Such circuitry can be hardwired or configurable at runtime.

[0047] According to one implementation, CNs in CN pool 220 (e.g., configured in CN clusters) can be configured for specialized processing functions. In a particular implementation, such specialized CNs in CN pool 220 can facilitate the management of an application's processing flow. For example, CN pool 220 may include one or more processing CNs 224 and one or more manager CNs 222. Processing CNs 224 may perform processing operations on, for example, sensor observations, measurements, and / or other signals. In this example, manager CNs 222 may manage different sets of processing CNs 224. Processing CNs 224 may communicate with one or more manager CNs 222. In some cases, manager CNs 222 may provide information to processing CNs 224 based on communications from processing CNs 224. Processing CNs 224 may continue their processing as defined by the information provided by manager CNs 222. In some implementations, manager CNs 222 may be physically separate from processing CNs 224, or may consist only of specialized circuitry formed within processing CNs 224.

[0048] It should be noted that the rates at which the different individual signal streams of the confluence (e.g., from different sources such as different sensors) are generated and consumed (e.g., processed) may not necessarily be equal. The rate at which such signal streams are consumed or generated can be determined, for example, by operational characteristics such as a function f() (as shown in the pseudocode example above). For example, the rate at which signal streams of measurements and / or observations from sensors used for sensor fusion operations are consumed or generated can be determined by the function f(). Such sensor fusion operations may involve an inverse sensor model that results in an output signal stream that is longer than the input signal stream. To handle the confluence of longer signal streams, the output rate can, for example, be matched to the bandwidth / throughput of the row-accessible memory 208. For example, configuring the output of one CN to be the input of another CN can assist in load balancing.

[0049] According to one implementation, computing device 200 enables the deployment of advanced sensor fusion operations to update particle filter states (e.g., in autonomous driving or other motor vehicle applications) while consuming very little power. In one application, instances of computing device 200 can be implemented as, for example, a sequence of pipeline stages. For such applications, the characteristics of computing device 200 can be configured at use to have different amounts of resources allocated to different pipeline stages. By using FIFO buffers (e.g., FIFO buffers formed in transfer memory 218), the exchange of data items between and / or within the pipeline stages and row-accessible memory 208 can be synchronized transparently (e.g., without mutexes, spinlocks, etc.).

[0050] In another embodiment, computing device 200 may be configured to be optimized for power, space, and / or performance (e.g., accuracy and / or latency). While the features of computing device 200 may be suitable for implementing CE, they may also be suitable for other applications, including, for example, applications that rely on random access to row-accessible memory 208. The features of computing device 200 may also be implemented in so-called “supercomputers.” In the context of supercomputers, the low-power characteristics of such computing devices can help overcome power constraints that may prevent the implementation of, for example, exascale supercomputers. Additionally, the circuitry used to implement computing device 200 can combine security and safety features in a manner that meets the requirements of embedded computing devices. Given the small physical size and low power consumption of computing device 200, its use need not be limited to an external accelerator integrated circuit (IC) device; it may also be incorporated into subsystems within automotive-grade system-on-chip (SoC) IC devices.

[0051] In one aspect, for example, computing device 200 may include a specific arrangement and / or configuration of CNs (e.g., CN pool 220), transfer memory 218, and / or DMA controllers and / or engines 212. Computing device 200 may be configured to provide a network of CNs and memory elements suitable for a particular type of computing, such as for specific applications processing signal streams (e.g., carrying measurement and / or observation results from sensors). In one implementation, such a network of CNs and memory elements enables the simultaneous processing of multiple signal streams with high throughput and low latency. Such a network of CNs and memory elements may be implemented, at least in part, using an implementation of an in-device communication protocol (e.g., AXI) via physical connections and FIFO buffers. As noted above, endpoints of the FIFO buffer may include addressable memory locations or registers to receive, for example, operands (e.g., general-purpose registers configured as operands for computation operations in the ALU of a computing node) or results computed by the CNs. For example, a FIFO buffer pool may include endpoints that can be configured to be associated with various CNs or memories.

[0052] In one aspect, specific embodiments disclosed herein relate to so-called vectorized input / output (I / O) operations, including “scatter” and “aggregate” operations. Such vectorized operations can, for example, enable high-throughput transfer of large amounts of data to or from physical memory (e.g., multiple addressable rows of memory in row-accessible memory 208) using a single request or command, enhancing efficiency and convenience. For example, an aggregation operation might require sequentially reading data from multiple memory locations (e.g., buffers) in a single transaction, and writing the read data to a signal stream or a contiguous portion of memory. In one embodiment, the DMA controller and / or engine 212 may perform an aggregation operation to serve aggregation requests (e.g., originating from an application) specifying multiple memory locations, not necessarily row-aligned, from which data items are to be read, and a destination (e.g., a memory address) for storing the read items. A scatter operation, on the other hand, might require reading data items from a signal stream or contiguous memory, and writing the read data items to multiple different memory locations that are not necessarily row-aligned. In one implementation, the DMA controller and / or engine 312 may perform a scatter operation to serve a scatter request (e.g., originating from an application) that specifies the location of data items to be read (e.g., a contiguous memory address) and may also specify, indirectly, the locations to which the read data items are to be written (by requesting to read certain locations, it becomes possible to determine the locations to which the read data items are to be written). The DMA controller and / or engine 312 may also perform an indirectly specified aggregation operation.
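The two vectorized operations can be sketched in a few lines, assuming a flat, byte-addressable `bytearray` as the memory and `(offset, length)` pairs as the request format. Both are illustrative assumptions, and the function names `gather` and `scatter` are not taken from the disclosure, where the first operation is called aggregation.

```python
def gather(memory, locations):
    """Aggregation: read (offset, length) pieces into one contiguous result."""
    return b"".join(bytes(memory[off:off + n]) for off, n in locations)


def scatter(memory, data, locations):
    """Scatter: write a contiguous byte string out to (offset, length) pieces."""
    if sum(n for _, n in locations) != len(data):
        raise ValueError("locations must cover the data exactly")
    pos = 0
    for off, n in locations:
        memory[off:off + n] = data[pos:pos + n]  # same length, so no resize
        pos += n


# Usage: one call moves several non-contiguous pieces.
mem = bytearray(b"abcdefghij")
picked = gather(mem, [(8, 2), (0, 3)])     # pieces "ij" and "abc", in order
scatter(mem, b"XYZ", [(1, 1), (5, 2)])     # "X" to offset 1, "YZ" to offset 5
```

In both directions a single request carries the whole list of locations, which is what allows one command to move data that is not contiguous in memory.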

[0053] According to one implementation, the use of DMA transactions to assist in processing data items in a signal stream can be enhanced by using scatter and aggregate operations. In a particular implementation, the contents loaded into a buffer by an aggregate operation can be used to determine one or more addresses for subsequent aggregate or scatter operations. Figure 3A and Figure 3E are schematic diagrams of a computing device 300 according to one embodiment, including a DMA controller and / or engine 312, which includes, or communicates with, a buffer 316 for facilitating scatter and / or aggregation operations. In a particular embodiment, computing device 300 may include one or more features of computing device 200 (Figure 2A and Figure 2B). The DMA controller and / or engine 312 can be configured as a scatter-gather multicast DMA engine (SGM-DMA), but with, for example, the ability to address words within addressable lines.

[0054] According to one implementation, the DMA controller and / or engine 312 may receive input signal streams and / or feed such input signal streams as blocks on a virtual channel via a FIFO buffer to an initial cluster of CNs. Subsequently, the output signal stream from the initial cluster of CNs may be fed, as blocks on the virtual channel or as data via a FIFO buffer, to subsequent downstream clusters of CNs. For example, transfers between clusters of CNs may be multicast transfers, as determined by the specific application. At any stage, some or all of the output signal streams from some CN clusters may be returned to the DMA controller and / or engine 312 (e.g., during DMA scatter or aggregation operations).

[0055] According to one embodiment, the DMA controller and / or engine 312 may perform certain scatter and / or aggregation operations to transfer data items from one non-contiguous block of memory to another non-contiguous block of memory using a series of smaller contiguous block transfers. Here, obtaining such data items from a non-contiguous block of source memory can be performed in an aggregation operation. Similarly, writing data items to a non-contiguous block of destination memory can be performed in a scatter operation. In one specific embodiment, the smallest memory unit that can be accessed in such source or destination memory may be a single addressable row of values and / or states (e.g., a single cached row or word in row-accessible memory). For example, the DMA controller and / or engine 312 may communicate with row-accessible memory (LAM) 308, which may be accessible on a row-by-row basis.

[0056] According to one embodiment, physical memory (such as LAM 308) may include bit cells for defining values and / or states to express information such as a one or a zero. Such physical memory may also organize bit cells into words containing an integer number of 8-bit bytes (e.g., a four-byte word of 32 bits or an eight-byte word of 64 bits). Additionally, such physical memory may define a row address (e.g., a word row address) associated with the consecutive bits of an “addressable row” defining values and / or states. For example, in response to a read or write request (e.g., originating from a host processor), a memory controller may access a portion of the memory in a read or write transaction anchored at the word row address specified in the request. To service a read request, for example, the memory controller may retrieve the values and / or states of all bytes of the row associated with the row address specified in the read request. Similarly, to service a write request, the memory controller may write the values and / or states of all bytes of the addressable row associated with the row address specified in the write request. While a row address can specify the memory location containing all consecutive bytes of an addressable row, such a row address does not specify the location of individual sub-sections of such an addressable row, such as individual bytes, runs of consecutive bytes smaller than the whole of the addressable row, or bytes spanning multiple addressable rows. Such sub-sections of addressable rows are referred to herein as “non-addressable portions”.

[0057] According to one implementation, an addressable row may define the smallest unit of memory that can be located and / or accessed according to a memory addressing scheme. In the particular example implementation of Figure 3C, such an addressable line 360 may consist of smaller memory units such as bits, bytes, or words. In this particular example implementation, the addressable line 360 contains n+1 bytes 362₀ to 362ₙ. As noted above, certain specific implementations may involve updating non-addressable portions and / or portions smaller than the entire addressable line and / or non-addressable portions spanning multiple lines (whether including or excluding addressable lines). In the illustrated example implementation, bytes 362₂ and 362₃ may define a non-addressable portion of addressable line 360. Here, while the entire addressable line may be locatable and / or accessible via a unique address according to a memory addressing scheme, bytes 362₂ and 362₃ (smaller than the entire line) are not addressable according to that memory addressing scheme.
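The relationship between a line address and a non-addressable portion can be made concrete with a little arithmetic. The sketch below assumes an 8-byte line (`LINE_BYTES`) and flat byte numbering across lines; both are illustrative assumptions rather than values from the disclosure, and the helper names are hypothetical.

```python
LINE_BYTES = 8  # assumed line width in bytes (hypothetical)


def split_address(byte_address):
    """Return (line_address, byte_offset_within_line) for a byte address."""
    return divmod(byte_address, LINE_BYTES)


def lines_touched(byte_address, length):
    """List the line addresses that a run of `length` bytes overlaps."""
    first = byte_address // LINE_BYTES
    last = (byte_address + length - 1) // LINE_BYTES
    return list(range(first, last + 1))


def is_whole_line(byte_address, length):
    """True only if the run is exactly one aligned addressable line."""
    return byte_address % LINE_BYTES == 0 and length == LINE_BYTES


# Usage: two bytes at offsets 2 and 3 of line 2 (bytes 18 and 19 overall)
# sit inside one line but cannot be named by a line address alone.
line, offset = split_address(18)
```

A memory that accepts only the line address can deliver the whole line; the offset and length have to be applied afterwards, for example in a buffer next to the DMA engine.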

[0058] In one implementation, LAM 308 may have LAM controller 306 configured to receive requests specifying the address of a data item line stored in LAM 308. Such a data item line may include multiple words or bytes and may be the smallest unit of data item that LAM 308 can retrieve and return to another device. According to one embodiment, computing device 300 may use buffer 316 to enable access to smaller, non-addressable portions of the data item line (e.g., single or multiple bytes in a word or group of words). According to one embodiment, the circuitry forming buffer 316 may be integrated with the circuitry forming a DMA controller and / or engine 312, thereby minimizing latency for accesses to buffer 316 initiated by the DMA controller and / or engine 312. For example, buffer 316 may be configured as a static random access memory (SRAM) device that can be accessed by the DMA controller and / or engine 312 without initiating requests and / or transactions on the main memory bus (e.g., the bus coupled to LAM 308 or the host computer / processor).

[0059] According to one implementation, the DMA controller and / or engine 312 may communicate with the initiator 322. The initiator 322 may include a device (e.g., at least partially implemented by circuitry and / or logic) that implements specific states to trigger one or more DMA transactions to be executed by the DMA controller. For example, the initiator 322 may include an ALU output register, a buffer, or a hardware interrupt handler; these are just a few examples of devices that can initiate DMA transactions.

[0060] According to one implementation, the DMA controller and / or engine 312 may obtain a list of aggregation requests in response to a signal from initiator 322. In a particular implementation, initiator 322 may trigger a DMA transaction in response to an event or condition during the execution of a particle filtering process. For example, a particle filtering process may identify data items in memory that are expected to be retrieved for processing in a future execution loop. Once a large number of such data items have been identified, a list of aggregation requests identifying such data items (e.g., as redirected aggregation requests) may be forwarded to the DMA controller and / or engine 312. In one implementation, such a list of aggregation requests may be provided to the DMA controller and / or engine 312 via shared memory or a network-on-chip (NoC), to name just a few examples. Once such a list is known to be available for processing by the DMA controller and / or engine 312, the process that generated the list (e.g., by execution of computer-readable instructions) may trigger the DMA controller and / or engine 312 via an interrupt or a posted message. Such a trigger may cause the DMA controller and / or engine 312 to initiate one or more aggregation operations (e.g., redirected aggregation operations).

[0061] In response to a signal from initiator 322, the DMA controller and / or engine 312 may acquire a list of aggregation requests in the form of a linked list. For example, such a linked list may be locatable in memory (e.g., a reorganization buffer 316 or row accessible memory (LAM) 318) based on an address provided by initiator 322. According to one embodiment, such a list of aggregation requests may include individual aggregation requests that can be treated as independent aggregation requests, separate from the other aggregation requests in the list. The DMA controller and / or engine 312 may combine the addresses in such aggregation requests into a (potentially smaller) list of row read requests to be performed by a memory controller (e.g., memory controller 106 of Figure 1). Each row read request in this list may indicate a physical memory address (e.g., in LAM 308) specifying the memory location that the memory controller is to read to service that row read request. The memory controller can service such row read requests by loading the requested rows into buffer 316. When a requested row read by the memory controller arrives at buffer 316, the DMA controller and / or engine 312 may refer to the original list of aggregation requests (used to form the list of row read requests) to extract the requested data items from the read row arriving at buffer 316. The DMA controller and / or engine 312 may form packets from the extracted items to be forwarded to one or more requesting entities (e.g., processes executing on host CPU 202 and / or on CNs of CN pool 220). In one aspect, the data items retrieved from a row stored in buffer 316 may include two or more non-addressable portions of the row (e.g., selected bytes and / or fields within an addressable row / word, such as an addressable row containing data items in memory).

[0062] Figure 3B is a flowchart illustrating a process 350 for an aggregation operation according to one aspect of this disclosure. In one embodiment, process 350 may include operations 352, 354, 356, and 358, which may be performed by one or more circuits such as the DMA controller and / or engine 312 and / or buffer 316. Operation 352 may include processing one or more aggregation requests to determine one or more addressable lines of data items to be retrieved from memory. For example, operation 352 may map parameters in a received request to addresses in LAM 308. Operation 354 may be implemented by circuitry for loading values and / or states, and may include loading signals and / or states (e.g., expressing data items) stored in one or more addressable lines of memory such as LAM 308 into buffer 316. As illustrated in Figure 3B, operation 356 may include parsing the line loaded into buffer 316 at operation 354 into one or more non-addressable portions such as, for example, single or multiple bytes in a word or a group of words. Operation 358 may include serving two or more aggregation requests by, for example, returning the non-addressable portions parsed at operation 356.

[0063] As illustrated in Figure 3B, operation 358 may include responding to multiple aggregation requests presented to the DMA controller and / or engine 312 in the form of an aggregation request list. Loading one or more addressable lines into buffer 316 at operation 354 may occur in response to the aggregation request list. Operation 356 may parse two or more non-addressable portions based on data items specified in the aggregation request list. For example, operation 356 may decode parameters in an aggregation request and map them to byte offsets from the line address to the corresponding non-addressable portion. The parsed portions may then be forwarded to the initiator of the aggregation request (e.g., initiator 322). Here, for example, if multiple aggregation requests specify parsed portions within a single addressable line, process 350 may enable the multiple aggregation requests to be served by accessing the same single addressable line loaded into buffer 316. This eliminates the need for the DMA controller and / or engine 312 to access the same addressable line (e.g., in LAM 308) multiple times to serve individual aggregation requests for data items in that line.
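The flow of operations 352 to 358 can be sketched in software as follows. This is a simplified model under stated assumptions, not the circuit described: `LineMemory` is a toy stand-in for the LAM that counts line reads, a dictionary stands in for buffer 316, lines are 8 bytes wide, and each request is an assumed `(byte_address, length)` pair that stays inside one line.

```python
LINE_BYTES = 8  # assumed line width in bytes (hypothetical)


class LineMemory:
    """Toy line-accessible memory: only whole lines can be read."""

    def __init__(self, data):
        assert len(data) % LINE_BYTES == 0
        self.data = bytes(data)
        self.line_reads = 0  # counts accesses to the memory

    def read_line(self, line_address):
        self.line_reads += 1
        start = line_address * LINE_BYTES
        return self.data[start:start + LINE_BYTES]


def serve_gather_list(lam, requests):
    """Serve a list of (byte_address, length) aggregation requests."""
    # Operation 352: map the requests to the set of lines that must be read.
    needed = sorted({addr // LINE_BYTES for addr, _ in requests})
    # Operation 354: load each needed line into the buffer exactly once.
    buffer = {line: lam.read_line(line) for line in needed}
    # Operations 356 and 358: cut out the requested portions and return them.
    results = []
    for addr, length in requests:
        line, offset = divmod(addr, LINE_BYTES)
        assert offset + length <= LINE_BYTES, "sketch assumes one line per request"
        results.append(buffer[line][offset:offset + length])
    return results


# Usage: three requests, two of them in the same line, cost two line reads.
lam = LineMemory(bytes(range(32)))
parts = serve_gather_list(lam, [(2, 2), (5, 1), (17, 3)])
```

Deduplicating the line addresses before reading is what lets several requests share one access to the same line.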

[0064] To serve one or more aggregation requests, process 350 may aggregate portions less than the entire addressable line in memory by loading the addressable line into a buffer and resolving non-addressable portions to be provided to the requester. While some aggregation requests may require aggregation less than the entire addressable line, one or more received aggregation requests may require aggregation of the entire addressable line and / or multiple lines and / or bytes spanning multiple lines. For aggregation requests requiring aggregation less than the entire addressable line, the DMA controller and / or engine 312 may execute process 350. According to one embodiment, for aggregation requests requiring aggregation of the entire addressable line, the DMA controller and / or engine 312 may bypass operations 354, 356, and 358 and perform an aggregation operation that does not load the addressable line into buffer 316.

[0065] In another implementation, the DMA controller and / or engine 312 may obtain a list of scatter requests in response to a signal from initiator 322. The DMA controller and / or engine 312 may obtain the list of scatter requests in the form of a linked list, which can be located in memory according to addresses provided by initiator 322. The DMA controller and / or engine 312 may then combine the addresses to be accessed by such scatter requests into a potentially smaller list of row read requests to be performed by a memory controller (e.g., memory controller 106). For example, scatter requests in the list referencing data items in the same addressable row of memory may be combined such that only a single row read is required (serving multiple scatter requests that access data items in that row). The requested row read by the memory controller may then be loaded into buffer 316.

[0066] According to one embodiment, the obtained list of scatter requests may indicate specific non-addressable portions (e.g., individual bytes or fields) of addressable lines to be read and loaded into buffer 316. These non-addressable portions in the addressable lines loaded into buffer 316 can then be modified and / or overwritten. When a requested line read by the memory controller arrives at buffer 316, the DMA controller and / or engine 312 may refer to the original list of scatter requests to determine the specific non-addressable portions in the read lines arriving at buffer 316 to be modified and / or overwritten. The DMA controller and / or engine 312 may form packets from the modified lines in buffer 316 for writing back to memory via the memory controller.

[0067] Figure 3D is a flowchart illustrating a process 370 for a scatter operation according to one embodiment. Process 370 may include operations 372, 374, 376, and 378. Operation 372 may include, for example, receiving one or more scatter requests from initiator 322. Operation 374 may include loading signals and / or states (e.g., expressing data items) of one or more addressable rows in a memory such as LAM 308 into buffer 316. In a particular embodiment, the DMA controller and / or engine 312 may determine the addressable rows to be retrieved from LAM 308 at operation 374 by processing the one or more scatter requests. Operation 376 may include writing values and / or states to at least one non-addressable portion of a row loaded into buffer 316 at operation 374 to at least partially modify one or more addressable rows stored in buffer 316. For example, operation 376 may write values and / or states to two or more non-addressable portions based on multiple scatter requests received at operation 372. Operation 378 can fulfill such scatter requests by initiating a write operation to write at least one of the addressable lines modified at operation 376 back to memory (e.g., LAM 308). For example, operation 378 can initiate an operation to write the addressable line loaded at operation 374 and modified at operation 376 back to its line address in LAM 308.

[0068] In a particular implementation, operation 374 may be initiated by a scatter request received at operation 372. Here, one or more addressable lines whose values and / or states are loaded into buffer 316 at operation 374 may be obtained by servicing one or more line read requests performed by a memory controller (e.g., memory controller 106). To write a modified addressable line, operation 378 may include causing the memory controller to perform one or more operations to write one or more modified addressable lines. Here, process 370 may enable the servicing of multiple scatter requests by accessing a single addressable line loaded into buffer 316. For example, multiple scatter requests received at operation 372 may specify words, bytes, fields, etc., within the same single addressable line. This eliminates the need for the DMA controller and / or engine 312 to access the same addressable line (e.g., in LAM 308) multiple times to serve individual scatter requests for data items in that line. Process 370 may further include converting the plurality of scatter requests received at operation 372 into a list of row read requests to be issued to the memory controller (i.e., a list of row read requests specifying one or more addressable rows of values and / or states). This conversion of the plurality of scatter requests into a list of row read requests may further include constructing at least one single row read request for an addressable row in memory containing data items requested by at least two of the scatter requests received at operation 372.
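Process 370 amounts to a coalesced read-modify-write, which the following sketch models in software. As before, this is an illustration under assumptions rather than the disclosed circuit: `LineMemory` is a toy line-accessible memory with access counters, lines are 8 bytes wide, and each scatter request is an assumed `(byte_address, payload)` pair that stays inside one line.

```python
LINE_BYTES = 8  # assumed line width in bytes (hypothetical)


class LineMemory:
    """Toy line-accessible memory: whole lines only, with access counters."""

    def __init__(self, num_lines):
        self.data = bytearray(num_lines * LINE_BYTES)
        self.line_reads = 0
        self.line_writes = 0

    def read_line(self, line):
        self.line_reads += 1
        start = line * LINE_BYTES
        return bytearray(self.data[start:start + LINE_BYTES])

    def write_line(self, line, content):
        assert len(content) == LINE_BYTES
        self.line_writes += 1
        start = line * LINE_BYTES
        self.data[start:start + LINE_BYTES] = content


def serve_scatter_list(lam, requests):
    """Serve a list of (byte_address, payload) scatter requests."""
    # Operation 372 plus conversion: group the requests by target line.
    by_line = {}
    for addr, payload in requests:
        line, offset = divmod(addr, LINE_BYTES)
        assert offset + len(payload) <= LINE_BYTES, "one line per request"
        by_line.setdefault(line, []).append((offset, payload))
    for line, writes in by_line.items():
        buffered = lam.read_line(line)               # operation 374: load line
        for offset, payload in writes:               # operation 376: modify
            buffered[offset:offset + len(payload)] = payload
        lam.write_line(line, buffered)               # operation 378: write back


# Usage: two requests fall in line 0 and one in line 1.
lam = LineMemory(2)
serve_scatter_list(lam, [(1, b"\x01"), (3, b"\x02\x03"), (9, b"\x09")])
```

Three requests cost two line reads and two line writes here, because the two requests that target line 0 share one read-modify-write cycle.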

[0069] To serve one or more scatter requests, process 370 may update a portion of the addressable line in memory less than the entire addressable line by loading the addressable line into buffer 316 and updating some portions of the loaded addressable line while leaving others unchanged. While some scatter requests may require updates less than the entire addressable line, one or more received scatter requests may require an update of the entire addressable line, and the DMA will only write to that addressable line instead of performing a read-modify-write operation. For scatter requests requiring updates less than the entire addressable line, the DMA controller and / or engine 312 may execute process 370. The DMA controller and / or engine 312 may also be configured to serve scatter requests requiring an update of the entire addressable line by bypassing loading the addressable line into buffer 316. Here, to complete such an update of the entire addressable line, the DMA controller and / or engine 312 may initiate a write operation to the addressable line in LAM 308 without loading the addressable line into buffer 316.

[0070] According to one embodiment, the DMA controller and / or engine 312 may receive multiple scatter requests that collectively request an update to the same overlapping portion of an addressable line in LAM 308. This can create conflicts regarding how the overlapping portion will be updated to serve the multiple scatter requests. For example, such multiple scatter requests may be ordered based on creation time or reception time. According to one embodiment, for example, a conflict concerning the updating of a portion of an addressable line by multiple scatter requests may be resolved based on the most recently created or received scatter request.
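One way to picture the ordering-based resolution described above is to apply the conflicting updates in order, so that the most recent one determines the overlapping bytes. The sketch below assumes each request carries a sequence number (creation or reception order); that field, the tuple layout, and the function name are hypothetical.

```python
def resolve_overlaps(line, requests):
    """Apply (sequence_number, offset, payload) updates to one buffered line.

    Updates are applied in ascending sequence order, so where two updates
    overlap, the one with the higher sequence number wins.
    """
    buffered = bytearray(line)
    for _, offset, payload in sorted(requests, key=lambda r: r[0]):
        assert offset + len(payload) <= len(buffered)
        buffered[offset:offset + len(payload)] = payload
    return bytes(buffered)


# Usage: both updates touch byte 1; the update with sequence number 2 wins.
result = resolve_overlaps(
    bytes(4),
    [(2, 1, b"\xbb\xbb"), (1, 0, b"\xaa\xaa")],
)
```

Other policies are conceivable; this sketch only shows the "most recent request wins" case mentioned in the text.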

[0071] In specific implementations of processes 350 and 370, the non-addressable portion of a row stored in buffer 316 can be a byte, a set of bytes, a field, etc. While specific actions in the above-described scatter and aggregate operations are described as occurring in a specific order, whether particular actions are executed concurrently or sequentially is an engineering choice. Furthermore, physical optimizations, such as the number and type of processing cores used to implement the features of the DMA controller and / or engine 312 and interface engine, the number and type of memory blocks for associated memory elements and buffer 316, the number of ports of the memory forming LAM 308, and addressability characteristics, are likewise engineering choices. For example, buffer 316 may or may not be byte-addressable, and the DMA controller and / or engine 312 may include, for example, a scalar or vector engine.

[0072] Figures 4A and 4D are schematic diagrams of a computing device 400 according to one embodiment, including a DMA controller and / or engine 412 that communicates with a buffer 416 and an initiator 422 to facilitate the redirection of DMA transactions. In a particular embodiment, computing device 400 may include one or more features of computing device 200 (Figures 2A and 2B). According to one embodiment, the DMA controller and / or engine 412 may obtain a list of redirected read requests in response to a signal from initiator 422. According to one embodiment, a read request may include a message and / or signal specifying one or more target memory addresses to be accessed in a read transaction to serve the read request. For example, a read request may specify one or more target addresses as word row addresses in memory containing content to be retrieved in a memory read transaction to serve the read request. As used herein, a “redirected read request” means a read request that has been transformed or altered such that the content at the original target memory address is converted and / or transformed into a different memory address. Here, the different memory address specifies a location in memory containing content to be retrieved in a read transaction to serve the redirected read request. For example, the DMA controller and / or engine 412 may interpret the address specified in such a redirected read request as a word aggregation request to store the aggregated words in buffer 416. The size of the word to be read first, for the word aggregation portion of serving the redirected read request, may depend at least in part on how that word will be interpreted to form the target address for redirection. In one implementation, the word to be read may include, for example, an address or an index of an array that can be converted to an address. In one implementation, the DMA controller and / or engine 412 may then convert the aggregated word stored in buffer 416 into an address and merge that address (i.e., the converted aggregated word) with the redirected read request to form an aggregation request. The DMA controller and / or engine 412 may then service the formed aggregation request to send the resulting data item to the destination described in the redirected read request obtained in response to a signal from initiator 422.

[0073] Figure 4B is a flowchart of a process 450 for facilitating DMA transaction redirection according to one embodiment. Process 450 may include operations 452, 454, 456, and 458. Operation 454 may include performing an aggregation operation (e.g., a word aggregation operation) based at least in part on one or more redirected read requests received at operation 452. In this operation, for example, the DMA controller and / or engine 412 may obtain or otherwise receive such a redirected read request in response to a signal from initiator 422. In a particular embodiment, the DMA controller and / or engine 412 may interpret the one or more redirected read requests as a request for a first aggregation operation of two aggregation operations to be performed. During the first aggregation operation performed at operation 454, "aggregated" data items may be loaded into buffer 416.

[0074] Operation 456 may include transforming one or more aggregated data items stored in buffer 416 to one or more addresses to specify a subsequent aggregation operation. For example, operation 456 may include transforming one or more words obtained by performing a first word aggregation operation (performed at operation 454) to one or more addresses (also referred to as one or more address values). In a particular implementation, operation 456 may include parsing values and / or states (from the aggregation operation) in buffer 416 to determine one or more memory addresses in LAM 408. Operation 456 may also include applying one or more arithmetic operations to the parsed values and / or states to determine the one or more memory addresses in LAM 408. For example, operation 456 may apply one or more arithmetic operations to the parsed values and / or states stored in the buffer to form memory addresses of memory locations in LAM 408. Such formed addresses of memory locations in LAM 408 may form the basis for subsequent aggregation operations.

[0075] According to one implementation, the arithmetic operations applied at operation 456 may be limited to the form of the following expression (1):

[0076] address = base + x × element_size, (1)

[0077] where:

[0078] address is the target address (e.g., for an aggregation operation determined at operation 456 or for a scatter operation determined at operation 476);

[0079] x is the value obtained from an aggregation operation (e.g., at operation 454 or 474); and

[0080] base and element_size are parameters provided in the redirected request (e.g., a redirected read request received at operation 452 or a redirected write request received at operation 472).
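As a worked illustration of expression (1), the sketch below translates a few gathered values `x` into target addresses. The function name `translate` and the numeric values (a table of 4-byte elements starting at address 0x1000) are hypothetical and chosen only for the example.

```python
def translate(x, base, element_size):
    """Expression (1): address = base + x * element_size."""
    return base + x * element_size

# Hypothetical values: a table of 4-byte elements starting at 0x1000.
indices = [0, 3, 7]
addresses = [translate(x, base=0x1000, element_size=4) for x in indices]
print([hex(a) for a in addresses])  # ['0x1000', '0x100c', '0x101c']
```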

[0081] Operation 458 may include performing a second (e.g., subsequent) aggregation operation to forward data items located at the one or more determined addresses to a destination. Such a destination may be determined at least in part based on the one or more redirected read requests. In another embodiment, operation 458 may perform two or more aggregation operations based on one or more addresses obtained at operation 456. According to one embodiment, when performing aggregation operations, operation 458 may interpret one or more redirected read requests as requests for two aggregation operations.
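Process 450 as a whole can be sketched as two chained gathers with an address translation in between. The sketch assumes memory is modeled as a dictionary from address to stored value and that the redirected read request supplies `index_addresses`, `base`, and `element_size`; these names and the function `redirected_read` are illustrative and not part of the specification.

```python
def redirected_read(memory, index_addresses, base, element_size):
    """Serve a redirected read request as two chained gather operations."""
    # Operation 454: first gather, loading index words into a buffer.
    buffer = [memory[a] for a in index_addresses]
    # Operation 456: translate each buffered word to an address, per expression (1).
    targets = [base + x * element_size for x in buffer]
    # Operation 458: second gather, fetching the data items at the translated addresses.
    return [memory[a] for a in targets]

# Hypothetical memory contents: indices at addresses 0 and 1, data from address 100.
memory = {0: 2, 1: 0, 100: "a", 104: "b", 108: "c"}
print(redirected_read(memory, [0, 1], base=100, element_size=4))  # ['c', 'a']
```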

[0082] According to one embodiment, a write request may include messages and / or signals specifying one or more target memory addresses to be accessed in a memory write transaction to serve the write request. For example, a write request may specify one or more target addresses as word row addresses of locations in memory to be written to in a memory write transaction to serve the write request. As used herein, a “redirected write request” means a write request that has been transformed or altered such that the original target memory address is modified and / or redirected to a different target memory address. Here, the different target memory address specifies a location in memory to be written to in a write transaction to serve the redirected write request.

[0083] In another specific embodiment, the DMA controller and / or engine 412 may obtain a list of redirected write requests in response to a signal from initiator 422. The DMA controller and / or engine 412 may interpret addresses specified in such redirected write requests as aggregation requests. For example, such aggregation requests may aim to load aggregated words into buffer 416. For example, the associated word to be read may have a size that depends at least in part on how the associated word will be interpreted to form the target address for redirection. One or more such words loaded into buffer 416 may be interpreted to form the target address for redirection. In one embodiment, such words loaded into buffer 416 may include, for example, addresses or indices of an array that can be converted to addresses. In one embodiment, the DMA controller and / or engine 412 may then convert the aggregated words stored in buffer 416 into addresses and merge those addresses with the redirected write requests to form a scatter request. The DMA controller and / or engine 412 may then service the formed scatter request, resulting in line updates based on the redirected write requests obtained in response to a signal from initiator 422.

[0084] Figure 4C is a flowchart of a process 470 for facilitating DMA transaction redirection according to one embodiment. Operation 474 may include performing a word aggregation operation based at least in part on one or more redirected write requests received at operation 472. For example, the DMA controller and / or engine 412 may receive such a redirected write request at operation 472 in response to a signal from initiator 422. In one particular embodiment, the DMA controller and / or engine 412 may interpret the one or more redirected write requests as requests for a word aggregation operation to be followed by a word scatter operation. During the word aggregation operation performed at operation 474, "aggregated" data items may be loaded into buffer 416. Operation 476 may include transforming one or more aggregated data items (from the aggregation operation) stored in buffer 416 to one or more addresses to specify a subsequent word scatter operation. In one particular embodiment, operation 476 may include parsing values and / or states in buffer 416 to determine one or more memory addresses in LAM 408. For example, operation 476 may be performed at least in part by circuitry (e.g., a DMA controller and / or engine adapted to parse values and / or states in a buffer) to determine at least one of the one or more addresses. For example, operation 476 may apply one or more arithmetic operations to the parsed values and / or states stored in the buffer to form memory addresses in LAM 408. Operation 476 may also include applying one or more arithmetic operations to the parsed values and / or states to determine the one or more memory addresses of memory locations in LAM 408.

[0085] Operation 478 may include performing a scatter operation to write a specific data item to one or more addresses obtained at operation 476, based at least in part on the one or more redirected write requests. According to one embodiment, operation 476 may apply arithmetic operations to calculate the target address for the scatter operation to be performed at operation 478 according to expression (1). For example, the DMA controller and / or engine 412 may form such a scatter request at least in part based on the content at one or more addresses determined at operation 476. Such a specific data item to be written by such a scatter request may, for example, be specified in one or more redirected write requests received at operation 472 in response to a signal from initiator 422. In a particular embodiment, operation 478 may include interpreting the content at the address determined at operation 476 as the address for the scatter operation.
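Process 470 can be sketched in the same style as a gather followed by a scatter. As before, the dictionary model of memory and the names `redirected_write`, `index_addresses`, and `data_items` are assumptions made for illustration; the specification does not prescribe this interface.

```python
def redirected_write(memory, index_addresses, base, element_size, data_items):
    """Serve a redirected write request as a gather followed by a scatter."""
    # Operation 474: gather index words into a buffer.
    buffer = [memory[a] for a in index_addresses]
    # Operation 476: translate buffered words to target addresses, per expression (1).
    targets = [base + x * element_size for x in buffer]
    # Operation 478: scatter the data items to the translated addresses.
    for address, item in zip(targets, data_items):
        memory[address] = item
    return targets

# Hypothetical memory contents: indices 1 and 3 stored at addresses 0 and 1.
memory = {0: 1, 1: 3}
print(redirected_write(memory, [0, 1], base=200, element_size=8, data_items=["p", "q"]))
# [208, 224]
```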

[0086] A specific implementation of a computing device for processing multiple signal streams from multiple associated sources is shown as computing device 500 in Figures 5A and 5C. In a particular embodiment, computing device 500 may include one or more features of computing device 200 (Figures 2A and 2B). Computing device 500 may include a computing node (CN) pool 520, a block memory pool, and FIFO buffers, as well as a NoC that interconnects the various components, with associated register files and local memory. As noted above, the CNs in the CN pool 520 may include scalar CNs and / or processing circuitry cores to implement arithmetic logic units (ALUs), digital signal processors (DSPs), vector CNs, VLIW engines, or field-programmable gate arrays (FPGAs) or combinations thereof. These are only a few examples of specific circuitry cores that can be used to implement the CNs in the CN pool 520. In one embodiment, a CN may include a single processing circuitry core capable of performing operations to map input operands to output computation results. In another embodiment, a CN may include multiple differentiated processing cores for performing operations to map input operands to output computation results. The CNs in the CN pool 520 may include dedicated local memory (e.g., SRAM) and general-purpose registers to receive operands for the operations to be performed and / or provide results from the execution of the operations. In one specific implementation, the CNs in CN pool 520 may be formed by one or more processing circuitry cores integrated / configured with elements of memory pool 508, such as a FIFO buffer (e.g., transfer memory 218). In one implementation, memory pool 508 may provide memory resources to be shared among the CNs in CN pool 520 and may facilitate communication between and / or among the CNs in CN pool 520. 
For example, the endpoints of a FIFO buffer integrated / configured with CN pool 520 may include memory blocks local to or isolated from the register files of the CNs in CN pool 520 and / or the CNs. The functionality of the CNs in CN pool 520 may be that of a host processor, a processing manager, or a signal processor, or a combination thereof, to name just a few examples. According to one implementation, the CNs in CN pool 520 may support atomic operations, interrupts, and other inter-processor communication (IPC) features to facilitate signal communication between and / or among the CNs in CN pool 520.

[0087] According to one embodiment, computing device 500 can receive a signal stream containing data items (e.g., sensor signals, observations and / or measurement results, timestamps, metadata, etc.) fed from external sources such as sensors and / or memory. The external memory (not shown) may be coupled to a DMA controller and / or engine (e.g., DMA controller and / or engine 312 and / or 412). CNs in CN pool 520 can also provide a source for the signal stream containing data items. For example, CNs in CN pool 520 can feed data items from the signal stream to be loaded into a FIFO buffer. CNs in CN pool 520 can also feed data items from the signal stream by transmitting blocks of output parameters as data items within an IC device in the NOC. According to one embodiment, a single logical signal stream can be multicast to multiple CNs in CN pool 520. Alternatively, a single logical signal stream can be segmented into sub-signal streams to be fed to a subset of CNs in CN pool 520. In another embodiment, some CNs in CN pool 520 can process data items from various signal streams to provide output signal streams as input streams to other CNs in CN pool 520. The final output of the CNs in CN pool 520 may include signal streams output from computing device 500 to a destination such as an actuator, memory, storage device, or display device, to name just a few examples. Processing between and / or among the CNs in CN pool 520 may be controlled and / or orchestrated via a combination of interrupts, polling of status flags, and periodic detection of work, to name just a few examples.

[0088] Figure 5B is a flowchart of a method 550 (also referred to as process 550) for processing multiple signal streams from multiple associated sources according to one embodiment. According to one embodiment, process 550 may combine data items received from the signal streams from multiple sources to update the state of a particle filter (e.g., to support automated operation of a motor vehicle). Such multiple sources may include, for example, multiple sensor devices (e.g., deployed in a motor vehicle), such as cameras, speedometers, active sensing devices (e.g., radar / LiDAR), environmental sensors (e.g., thermometers, light sensors, altimeters, etc.), and microphones, to name just a few examples. Such signal streams from multiple sources may be received in signal packets from a communication network (e.g., signal packets including sensor measurements and / or observations obtained from remote vehicles). In a particular embodiment, data items received from the signal streams from multiple sources may include sensor measurements and / or observations having common attributes that can be identified by process 550. In a particular embodiment, such common attributes may be identified by metadata that is co-located with the measurements and / or observations received from the multiple sources. For example, such common attributes can be associated with time (e.g., timestamps indicating the time of measurement and / or observation), space (e.g., location referencing an origin such as a point on a motor vehicle), and source (e.g., which particular physical sensor on the motor vehicle or which particular type of sensor among the sensors mounted on the motor vehicle).

[0089] Operation 552 may include associating data items based at least in part on common attributes of data items received from multiple data streams. Such multiple data streams may be provided as the output of read / aggregation operations such as CN and / or DMA aggregation, word aggregation, or redirection. For example, operation 552 may sort and / or correlate (e.g., “bucket”) measurements and / or observations received from different signal streams by time (e.g., according to timestamps) and / or space (e.g., the location of the observed object relative to a reference point). In a particular embodiment, operation 552 may correlate measurements and / or observations acquired at approximately the same time and received from different sources (e.g., different sensors) with the location of a particle defined in the current state of the particle filter. In a particular embodiment, operation 552 may correlate measurements and / or observations from different sources based on the location of a specific object observed and / or measured by such correlated observations and / or measurements. In another specific implementation, data items from each of a plurality of signal streams may be loaded into a buffer (e.g., buffer 216) associated with a direct memory access (DMA) controller associated with the signal stream. Operation 552 may then identify at least one of the common attributes associated with the data items loaded into the buffer, based at least in part on the content of the data items loaded into the buffer.
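The time-based "bucketing" mentioned for operation 552 might look like the following minimal sketch. It assumes integer timestamps (for example, milliseconds) and a fixed window width, neither of which is specified in the source; the function name `bucket_by_time` and the sample sensor names are likewise illustrative.

```python
from collections import defaultdict

def bucket_by_time(streams, window):
    """Group data items from several streams into shared time buckets.

    streams: dict mapping source name -> list of (timestamp, value),
             with integer timestamps (e.g., milliseconds).
    window:  bucket width in the same integer units.
    Returns a dict: bucket index -> {source name: [values]}.
    """
    buckets = defaultdict(dict)
    for source, items in streams.items():
        for timestamp, value in items:
            key = timestamp // window  # items in the same window share a key
            buckets[key].setdefault(source, []).append(value)
    return dict(buckets)

# Hypothetical camera and radar streams with millisecond timestamps.
streams = {
    "camera": [(10, "c0"), (110, "c1")],
    "radar": [(40, "r0"), (120, "r1")],
}
print(bucket_by_time(streams, window=100))
# {0: {'camera': ['c0'], 'radar': ['r0']}, 1: {'camera': ['c1'], 'radar': ['r1']}}
```

Spatial bucketing could follow the same pattern with a location-derived key in place of the time window.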

[0090] Operation 554 may include simultaneously loading the data items associated at operation 552 into one or more registers of the compute node (e.g., ALU or general-purpose registers of other processing cores forming the compute node) without storing the data items in line-accessible memory. According to one embodiment, the compute node's registers may be loaded with data items to be retrieved on an execution cycle of the compute node. For example, data items loaded into the compute node's registers in an execution cycle (e.g., at an endpoint of a FIFO buffer) can provide operands for computational operations to be performed by the compute node in the next execution cycle. In a particular embodiment, one or more registers of the compute node may include endpoints of an associated FIFO buffer formed by internal memory (e.g., memory pool 508). Multiple such FIFO buffers having endpoints at the compute node's registers may be synchronized to apply data items from multiple sources (e.g., loaded data items from different sensors) having common attributes (e.g., temporal and spatial attributes) as operands of the compute node. Operation 556 may include execution of a computing node to process the data items loaded simultaneously at operation 554 as operands of one or more computing operations (e.g., to perform one or more functions, such as updating the state of a particle filter). The data items output from the execution of one or more computing operations at operation 556 may form data items for additional signal streams to be processed by additional computing nodes and / or for storage in memory.

[0091] In one implementation, operation 552 may be performed by a first computing node that sorts the sensor observations and / or measurements at least in part based on the associated timestamps and locations of the objects observed and / or measured by the sensor observations and / or measurements received from multiple signal streams. At block 554, the sorted sensor observations and / or measurements may then be loaded into one or more registers of a second computing node. At block 556, the execution of the second node may then combine the sorted sensor observations and / or measurements.

[0092] In another embodiment, the data item of the additional signal stream, which is the output of operation 556, may be loaded as an operand for one or more additional computational operations into one or more registers of a subsequent computational node. For example, one or more direct memory access transactions may be performed to store the data item of the additional signal stream into external memory, a word scatter operation may be performed to write the data item of the additional signal stream, a redirected write operation may be performed to write the data item of the additional signal stream, or the data item of the additional signal stream may be provided as a control signal to one or more actuators, or a combination thereof.

[0093] In this context, "simultaneous loading," as used herein, refers to the loading of data items to be processed by a compute node within the same execution cycle. If data items from synchronous signal streams are to be loaded simultaneously into a compute node's registers, such data items may be loaded into those registers within the same execution cycle of the compute node (e.g., as operands for computational operations performed in the next execution cycle). If data items from corresponding asynchronous signal streams are to be loaded simultaneously into a compute node's registers, such data items may be loaded into those registers in different (e.g., adjacent) execution cycles of the compute node. For example, the execution of the compute node may be paused for one or more execution cycles to allow multiple data items from different asynchronous signal streams to be loaded into those registers as operands for computational operations within a single execution cycle of the compute node. In another embodiment, transparent blocking and releasing mechanisms may be applied to the compute node to facilitate the simultaneous loading of data items from asynchronous signal streams into the compute node's registers.
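The pausing behavior described for asynchronous streams can be sketched as a cycle-by-cycle simulation in which the compute node executes only when every input FIFO holds an operand and otherwise stalls. The function `run_node`, the `arrivals` schedule, and the choice of addition as the operation are assumptions for illustration, not the patented mechanism.

```python
from collections import deque
from operator import add

def run_node(fifos, arrivals, op, cycles):
    """Simulate a compute node that stalls until all operands are present.

    fifos:    list of deques, one per input stream.
    arrivals: dict mapping cycle -> list of (fifo_index, item) arriving that cycle.
    op:       function applied to one operand from each FIFO.
    Returns a list of (cycle, result) for the cycles in which the node executed.
    """
    results = []
    for cycle in range(cycles):
        for index, item in arrivals.get(cycle, []):
            fifos[index].append(item)
        if all(fifos):  # an empty deque is falsy, so this checks every FIFO has data
            operands = [f.popleft() for f in fifos]
            results.append((cycle, op(*operands)))
        # Otherwise the node is paused (stalled) for this cycle.
    return results

# Stream 0 delivers in cycle 0, stream 1 one cycle later; the node stalls in cycle 0.
arrivals = {0: [(0, 1)], 1: [(1, 10)], 2: [(0, 2), (1, 20)]}
print(run_node([deque(), deque()], arrivals, add, cycles=3))  # [(1, 11), (2, 22)]
```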

[0094] According to one embodiment, data items in at least two associated signal streams at operation 552 may include sensor observations and / or measurements associated at operation 554. Data items loaded simultaneously at operation 554 may then include associated sensor observations from multiple signal streams (e.g., from multiple different sources). In a particular embodiment, operation 552 may include combining the sensor observations and / or measurements from the at least two signal streams to provide combined sensor observations and / or measurements. Operation 552 may then sort and / or correlate (e.g., bucket) the combined sensor observations and / or measurements based at least in part on the associated timestamps and locations of the objects observed and / or measured by the sensor observations and / or measurements.

[0095] In a particular implementation, operation 554 may be implemented at least in part using process 350 (Figure 3B) and / or process 450 (Figure 4B). For example, the DMA controller and / or engine can perform aggregation operations as described in processes 350 and / or 450 to populate queues held by FIFO buffers (e.g., using associated measurements and / or observations) to load registers of the compute node. Such execution by the DMA controller and / or engine can selectively load data items into one or more registers based at least in part on indications of common attributes in the contents of the data items loaded into the FIFO buffers. In one specific implementation, multiple FIFO buffers may have endpoints at corresponding registers of the compute node, wherein queues of different FIFO buffers are to be populated with data items from different signal streams (e.g., measurements and / or observations from different sensors). Queues of multiple FIFO buffers can be populated such that data items from different signal streams associated by specific attributes (e.g., temporal and spatial attributes) are simultaneously loaded into endpoints of the FIFO buffers (e.g., at general-purpose registers of the compute node). According to one embodiment, operations 552 and 554 can be performed by a first compute node to simultaneously load associated data items into one or more registers of a second compute node in an execution cycle of the second compute node. In subsequent execution cycles of the second compute node, the second compute node can use the associated data items simultaneously loaded at block 554 as operands to perform computational operations. In one embodiment, the simultaneous loading of the data items at block 554 can be facilitated by a FIFO buffer having an endpoint at a register of the second compute node. Here, the first compute node can fill the queue of the FIFO buffer with associated data items from different signal streams, such that the corresponding associated data items arrive at the endpoint of the FIFO buffer in the same or nearby execution cycles. Such filling of the queue of the FIFO buffer with associated data items eliminates any need for the first compute node to store the associated data items in row-accessible memory (e.g., row-accessible memory 208 or RAM 108).
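A rough sketch of how a first compute node might fill FIFO queues so that associated items reach the endpoints together is given below. It assumes two streams of (timestamp, value) pairs, association by exactly equal timestamps, and unique timestamps within the second stream; the specification leaves the association criteria open, and the name `fill_fifos` is hypothetical.

```python
from collections import deque

def fill_fifos(stream_a, stream_b):
    """Queue only associated items, in matching order, into two FIFOs.

    stream_a, stream_b: lists of (timestamp, value) pairs.
    Items are associated when their timestamps are equal (an assumption);
    stream_b timestamps are assumed unique. Unmatched items are dropped
    in this sketch.
    """
    fifo_a, fifo_b = deque(), deque()
    b_by_time = {t: v for t, v in stream_b}
    for t, v in stream_a:
        if t in b_by_time:
            # Push both halves of the pair so they reach the FIFO heads together.
            fifo_a.append(v)
            fifo_b.append(b_by_time[t])
    return fifo_a, fifo_b
```

Because both queues hold paired items at the same positions, a consumer popping one item from each FIFO per cycle receives associated operands without any intermediate store to memory.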

[0096] In another specific implementation, the result of the execution of the compute node at operation 556 can provide data items for one or more additional signal streams. In one implementation, such data items for additional signal streams can be loaded into one or more registers of subsequent compute nodes and / or downstream compute nodes. In one example, data items at the output register of a compute node (e.g., loaded from the execution of the compute operation) can be transferred to row-accessible memory via DMA write transactions, word scatter DMA transactions, and / or redirected scatter DMA transactions (e.g., according to process 470 of Figure 4C). For example, such a DMA transaction may store data items of an additional signal stream into that row-accessible memory. In another example, such data items provided at the output register of a compute node may be applied as data items of an input signal stream to one or more other compute nodes. In a particular implementation, the result of the execution of the first compute node at operation 556 may be loaded into an output register defined as a first endpoint of an associated FIFO buffer (e.g., formed from memory pool 508). The associated FIFO buffer may then have a second endpoint defined as an input register of a second compute node to receive the result determined by the execution of the first compute node as an operand.

[0097] According to one embodiment, operations 552 and 554 can be performed by a first CN in CN pool 520, while operation 556 can be performed by a second CN in CN pool 520. At operation 554, the first compute node can simultaneously load associated data items (e.g., data items containing sensor observations and / or measurement results associated at least partially based on spatial and temporal attributes, and simultaneously loaded data items including the associated sensor observations and / or measurement results) into one or more registers of the second CN in CN pool 520. At operation 556, the second CN can then process the data items simultaneously loaded by the first CN as operands into one or more compute operations. In one specific embodiment, a first FIFO buffer can define a first endpoint as a register of the first CN (e.g., the output register of the first CN). A second endpoint of the first FIFO buffer can define a first register (e.g., the input register of the second CN) in one or more registers of the second CN. As can be observed, at operation 554, the first FIFO allows data items to be simultaneously loaded into one or more registers of the second CN without storing the associated data items in row-accessible memory, as noted above. In another specific implementation, the FIFO buffer may define the first endpoint as a register of the third CN in CN pool 520, and the second endpoint as at least the second register of one or more registers of the second CN. Here, at operation 554, both the first CN and the third CN may simultaneously load associated data items into one or more registers of the second CN node without storing the associated data items in row-accessible memory.

[0098] According to one embodiment, all or part of computing devices 200, 300 (e.g., including features implementing processes 350 and / or 370, such as circuitry for forming a DMA controller), 400 (e.g., including features implementing processes 450 and / or 470, such as circuitry for forming a DMA controller), and / or 500 (e.g., including features implementing process 550) may be formed and / or expressed in transistors and / or lower metal interconnects (not shown) in processes such as those for forming complementary metal-oxide-semiconductor (CMOS) circuitry (e.g., frontend-of-line and / or backend-of-line processes), by way of example only. However, it should be understood that this is merely an example of how circuitry can be formed in a device during a frontend-of-line process, and the claimed subject matter is not limited in this respect.

[0099] It should be noted that the various circuits disclosed herein can be described using computer-aided design tools and are expressed (or represented) in terms of their behavior, register transfers, logic components, transistors, layout geometry, and / or other characteristics as data and / or computer-readable instructions embodied in various computer-readable media (e.g., non-transitory storage media). The formats of files and other objects in which such circuit expressions can be implemented (e.g., in circuit devices) include, but are not limited to, formats supporting behavioral languages such as C, Verilog, and the Very High Speed Integrated Circuit Hardware Description Language (VHDL); formats supporting register-level description languages such as Register Transfer Language (RTL); formats supporting geometric description languages such as Graphical Design System II (GDSII), Graphical Design System III (GDSIII), Graphical Design System IV (GDSIV), Caltech Intermediate Format (CIF), Manufacturing Electron Beam Exposure Systems (MEBES), and any other suitable formats and languages. Such formatted data and / or instructions can be embodied in storage media including, but not limited to, various forms of non-volatile storage media (e.g., optical, magnetic, or semiconductor storage media) and carrier waves that can be used to transmit such formatted data and / or instructions via wireless, optical, or wired signaling media or any combination thereof. Examples of transmission of such formatted data and / or instructions by carrier waves include, but are not limited to, transmission (upload, download, email, etc.) via the Internet and / or other computer networks via one or more electronic communication protocols (e.g., Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), Simple Mail Transfer Protocol (SMTP), etc.).

[0100] If such a data- and / or instruction-based expression of the circuit described above is received within a computer system via one or more machine-readable media, it can be processed by a processing entity within that computer system (e.g., one or more processors) in conjunction with the execution of one or more other computer programs to generate a representation or image of the physical representation of such circuit. These one or more other computer programs include, but are not limited to, netlist generation programs, placement and routing programs, etc. Such a representation or image can then be used in device manufacturing, for example, by implementing the generation of one or more masks for forming various components of the circuit in a device manufacturing process (e.g., a wafer fabrication process).

[0101] In the context of this patent application, the terms "between" and / or similar terms are understood to include "among" (if appropriate for a particular purpose), and vice versa. Similarly, in the context of this patent application, the terms "compatible with," "compliant with," and / or similar terms are understood to include basic compatibility and / or basic compliance, respectively.

[0102] Unless otherwise stated, in the context of this patent application, the term "or", if used to associate a list such as A, B, or C, is intended to mean A, B, and C (used herein in an inclusive sense) as well as A, B, or C (used herein in an exclusive sense). Under this understanding, "and" is used in an inclusive sense and is intended to mean A, B, and C; while "and / or" may be used out of an abundance of caution to make clear that all the foregoing meanings are contemplated, although such use is not required. Furthermore, the term "one or more" and / or similar terms are used to describe any feature, structure, characteristic, etc., in the singular, and "and / or" is also used to describe a plurality and / or some other combination of features, structures, characteristics, and / or similar items. Similarly, the term "based on" and / or similar terms are understood to not necessarily convey an exhaustive list of factors, but rather to allow for the existence of additional factors that are not necessarily explicitly described.

[0103] Algorithm descriptions and / or symbolic representations are examples of techniques used by those skilled in the art of signal processing and / or related fields to convey the essence of their work to others skilled in the art. In the context of this patent application, an algorithm is generally considered a self-consistent sequence of operations and / or similar signal processing that leads to a desired result. In the context of this patent application, operations and / or processing involve the physical manipulation of physical quantities. Typically, although not strictly necessary, such quantities may take the form of electrical and / or magnetic signals and / or states that can be stored, transmitted, combined, compared, processed, and / or otherwise manipulated, such as electronic signals and / or states that constitute components of various forms of digital content, such as signal measurements, text, images, video, audio, etc.

[0104] Principally for reasons of common usage, it has proven convenient at times to refer to such physical signals and / or physical states as bits, values, elements, parameters, symbols, characters, items, samples, observations, weights, numbers, numerical values, measurements, contents, etc. However, it should be understood that all these and / or similar terms are associated with appropriate physical quantities and are merely convenient labels. Unless otherwise specifically stated, as is apparent from the foregoing discussion, it should be understood that throughout this specification, discussions using terms such as “processing,” “calculating,” “deriving,” “establishing,” “obtaining,” “identifying,” “selecting,” “generating,” etc., can refer to the actions and / or processes of a specific device (such as a dedicated computer and / or similar dedicated computing and / or network equipment). Therefore, in the context of this specification, a dedicated computer and / or similar dedicated computing and / or network equipment is capable of processing, manipulating, and / or transforming signals and / or states, typically in the form of physical electronic and / or magnetic quantities, within the memory, registers, and / or other storage devices, processing devices, and / or display devices of the dedicated computer and / or similar dedicated computing and / or network equipment. In the context of this particular patent application, as mentioned, the term "specific device" therefore includes general computing and / or networking devices, such as general-purpose computers, once they are programmed to perform specific functions, such as according to program software instructions.

[0105] In some cases, the operation of a memory device (such as a state change from binary one to binary zero or from binary zero to binary one) may include a transition, such as a physical transition. For certain types of memory devices, such a physical transition may include a physical transition of an article to a different state or thing. For example, and without limitation, for some types of memory devices, a state change may involve the accumulation and / or storage of charge or the release of stored charge. Similarly, in other memory devices, a state change may include a physical change, such as a change in magnetic orientation. Likewise, a physical change may include a change in molecular structure, such as a change from a crystalline form to an amorphous form or from an amorphous form to a crystalline form. In still other memory devices, a change in physical state may involve quantum mechanical phenomena, such as superposition, entanglement, and / or the like, for example, quantum mechanical phenomena that may involve qubits. The foregoing is not intended to be an exhaustive list of all examples in which a state change from binary one to binary zero or from binary zero to binary one in a memory device may include a transition, such as a physical, but non-transitory, transition. Rather, the foregoing is intended as illustrative examples.

[0106] Figure 6 is an example illustration of a sensor signal aggregation pattern, depicting an embodiment of a vehicle 2200 capable of operating in one or more autonomous driving modes (e.g., fully autonomous mode, semi-autonomous mode, driver-assisted mode). As depicted, an automated vehicle such as vehicle 2200 may include multiple sensors to provide measurements and / or observations to be processed, such as according to processes 350, 370, 450, 470 and / or 550. Although Figure 6 depicts a particular pattern and / or a particular number of sensors, the scope of the subject matter is not limited in these respects. For example, a system or device such as vehicle 2200 may include any number of sensors in any of a wide range of arrangements and / or configurations. Furthermore, although Figure 6 is depicted as a two-dimensional representation, the sensors of a vehicle capable of operating in one or more automated modes may generate signals and / or signal packets representing conditions around the vehicle 2200 in three-dimensional space. In specific implementations, one-dimensional and / or two-dimensional sensor measurements may be combined and / or otherwise processed to produce, for example, a three-dimensional representation of the conditions around the vehicle 2200.

[0107] In a specific implementation, various sensors may be installed in vehicle 2200, for example, to capture observations and / or measurements of different parts of the environment surrounding and / or near the vehicle. In a specific implementation, vehicle 2200 may include multiple different sensors capable of detecting incoming signals such as, for example, light signals, electromagnetic signals, and / or sound signals. Each sensor may have a different field of view of the environment surrounding vehicle 2200. Although example fields of view 2210a to 2210h are depicted, the scope of the subject matter is, of course, not limited in these respects.

[0108] In a specific implementation, sensor signals and / or signal packets may be utilized by at least one processor of vehicle 2200, for example, to identify objects and / or other environmental conditions near vehicle 2200, which may in turn be utilized by the processing system of vehicle 2200, for example, to autonomously guide the vehicle through the environment. Examples of objects detectable in the environment surrounding a vehicle, such as vehicle 2200, may include other vehicles, trucks, cyclists, pedestrians, animals, rocks, trees, lampposts, guardrails, painted lines, traffic lights, buildings, road signs, etc. Some objects may be stationary, and other objects, such as pedestrians, may move through the environment.

[0109] In one embodiment, one or more sensors of the example vehicle 2200 may generate signals and / or signal packets that represent at least a portion of the environment surrounding and / or near the vehicle 2200. Other sensors may provide signals and / or signal packets representing the speed, acceleration, orientation, position (e.g., via a Global Navigation Satellite System (GNSS)), etc., of the vehicle 2200. As described more fully below, the sensor signals and / or signal packets may be processed, for example, via a particle filter to generate multiple particles. Such particles may be at least partially utilized to influence the operation of the vehicle 2200. In one embodiment, as the vehicle 2200 moves forward through the environment, for example, the sensor signals and / or signal states may be at least partially utilized to update the particle filter, for example, to further influence the operation of the vehicle. As discussed more fully below, for a particle filter or the like that utilizes a relatively broad range of signals and / or signal packets generated by various sensors, any of a plurality of sorting operations may be performed on the sensor signals and / or signal packets.

[0110] In this context, a "particle" refers to a digital representation, at least partially derived from sensor signals and / or signal packets, of the environmental conditions at a specific point in a specific coordinate system at a specific point in time. For example, a specific particle may include an array of parameters describing a specific point within the environment surrounding vehicle 2200 at a specific point in time. In specific implementations, "particle filters" and the like may be used to process sensor signals and / or signal packets to generate multiple particles describing an environment such as the environment surrounding vehicle 2200. Of course, particle filters are merely an example type of processing that can be performed on sensor signals and / or signal packets, and the scope of the subject matter is not limited thereto.
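
As a rough illustration of the particle concept described above, the following Python sketch represents a particle as a small record of parameters and shows one reweight-and-resample step of a basic bootstrap particle filter. The field names, the Gaussian likelihood, and the `sigma` parameter are assumptions made purely for illustration; this application does not specify a particle layout or a particular filter variant.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Particle:
    """Hypothetical particle: parameters for one point at one point in time."""
    x: float             # position in an assumed X/Y/Z coordinate system
    y: float
    z: float
    t: float             # time of the underlying sensor observation
    weight: float = 1.0  # relative plausibility of this particle


def reweight(particles, measured_xyz, sigma=1.0):
    """Weight each particle by an assumed Gaussian likelihood of a measurement."""
    mx, my, mz = measured_xyz
    for p in particles:
        d2 = (p.x - mx) ** 2 + (p.y - my) ** 2 + (p.z - mz) ** 2
        p.weight = math.exp(-d2 / (2.0 * sigma ** 2))
    total = sum(p.weight for p in particles)
    for p in particles:
        # Fall back to uniform weights if every likelihood underflowed to zero.
        p.weight = p.weight / total if total > 0.0 else 1.0 / len(particles)
    return particles


def resample(particles, rng):
    """Draw a new particle set in proportion to the normalized weights."""
    n = len(particles)
    chosen = rng.choices(particles, weights=[p.weight for p in particles], k=n)
    return [Particle(p.x, p.y, p.z, p.t, 1.0 / n) for p in chosen]
```

In such a sketch, each new batch of sensor signals and / or signal packets would trigger another `reweight` and `resample` pass, which is one way the particle set could be updated as the vehicle moves.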

[0111] In specific implementations, a particular coordinate system may be specified, although the scope of the subject matter is not limited to any particular coordinate system. Such a coordinate system may include a three-dimensional parametric space, although other implementations may specify other numbers of dimensions. In specific implementations, individual particles may belong to a specific location within a particular three-dimensional space (e.g., the X, Y, and Z axes).

[0112] Figure 7 depicts an example schematic block diagram of an example vehicle 2200. As mentioned, in specific implementations, vehicle 2200 may include multiple sensors 2210. Such sensors may include, for example, image capture sensors (e.g., cameras), radar, lidar, and / or ultrasonic sensors, to name just a few non-limiting examples. In specific implementations, sensors 2210 may generate sensor signals and / or signal packets 2215, which may be provided to and / or otherwise obtained by a control system such as control system 2220.

[0113] In specific implementations, the control system 2220 may include, for example, at least one processor, at least one memory device, and / or at least one communication interface. In specific implementations, the control system 2220 may include, for example, one or more central processing units (CPUs), neural network processors (NNPs), and / or graphics processing units (GPUs). In specific implementations, the control system 2220 may process sensor signals and / or signal packets to generate signals and / or signal packets that may affect the operation of the vehicle 2200. For example, signals and / or signal packets may be generated by the control system 2220 and may be provided to the drive system 2230 and / or otherwise obtained by the drive system. In specific implementations, the processing of sensor signals and / or signal packets by the control system 2220 may include, for example, the use of particle filters, although other implementations may utilize other signal processing algorithms, techniques, methods, etc., and the scope of the subject matter is not limited in this regard. In specific implementations, the drive system 2230 may include, for example, devices, mechanisms, systems, etc., for influencing the operation of the vehicle 2200. As mentioned, when the vehicle 2200 is traveling, additional sensor signals and / or signal packets can be acquired and processed, allowing the operation of the vehicle 2200 to be updated over time.
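
The flow from sensors 2210 through control system 2220 to drive system 2230 can be pictured with a short Python sketch. Everything in it is hypothetical: the `SignalPacket` and `DriveCommand` types, the range-based speed rule, and the numeric thresholds are illustrative stand-ins, not processing described in this application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SignalPacket:
    """Stand-in for a sensor signal packet 2215 (fields are assumptions)."""
    sensor_id: str   # illustrative label such as "radar" or "lidar"
    range_m: float   # assumed distance to the nearest detected object


@dataclass
class DriveCommand:
    """Stand-in for a signal provided to the drive system 2230."""
    target_speed: float


def control_step(packets: List[SignalPacket], max_speed: float = 20.0,
                 stop_range: float = 5.0, full_range: float = 50.0) -> DriveCommand:
    """Fuse one frame of packets into a command using an illustrative rule."""
    if not packets:
        return DriveCommand(0.0)  # no observations: command a stop
    nearest = min(p.range_m for p in packets)
    # Scale speed linearly between stop_range and full_range, clamped to [0, 1].
    frac = (nearest - stop_range) / (full_range - stop_range)
    frac = max(0.0, min(1.0, frac))
    return DriveCommand(max_speed * frac)


def run(frames: List[List[SignalPacket]]) -> List[DriveCommand]:
    """Process successive frames so the command is updated over time."""
    return [control_step(frame) for frame in frames]
```

Each call to `control_step` corresponds to one pass of acquiring and processing sensor signals; calling it repeatedly in `run` mirrors how the operation of the vehicle may be updated as additional signals arrive.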

[0114] In the foregoing description, various aspects of the claimed subject matter have been described. For purposes of explanation, details such as quantities, systems, and / or configurations have been set forth as examples. In other instances, well-known features have been omitted and / or simplified so as not to obscure the claimed subject matter. While certain features have been illustrated and / or described herein, many modifications, substitutions, alterations, and / or equivalents will now occur to those skilled in the art. Therefore, it should be understood that the appended claims are intended to cover all modifications and / or alterations falling within the claimed subject matter.

Claims

1. A system comprising: a memory including one or more memory devices; and a direct memory access (DMA) controller, coupled to the memory via a bus, configured to: receive one or more redirected write requests; perform an aggregation operation, based at least in part on the one or more received redirected write requests, to obtain one or more words from the memory; convert the one or more words obtained through the aggregation operation into one or more addresses in the memory; and perform a scatter operation to write data items to the one or more addresses in the memory.

2. The system of claim 1, wherein the memory is a first memory, wherein the system further comprises a second memory operable as a buffer, wherein the buffer is configured to store values and / or states retrieved from the first memory, and wherein the DMA controller is further configured to parse the values and / or states in the buffer to determine at least one of the one or more addresses in the memory.

3. The system of claim 2, wherein the DMA controller is further configured to apply one or more arithmetic operations to at least one of the parsed values and / or states to determine the at least one address.

4. The system of claim 1, wherein the DMA controller is further configured to interpret the one or more redirected write requests as requests for the aggregation operation.

5. The system of claim 1, wherein the DMA controller is further configured to form a request for the scatter operation based at least in part on the one or more addresses in the memory.

6. The system of claim 1, further comprising an initiator configured to initiate the one or more redirected write requests, wherein the DMA controller is configured to perform the aggregation operation in response to a signal from the initiator.

7. The system of claim 6, wherein the initiator of the one or more redirected write requests includes a local memory or one or more registers of a computing node, the local memory or the one or more registers of the computing node being configured to receive sensor measurements and / or observations as data items in a signal stream.

8. A method executed by a direct memory access (DMA) controller, the method comprising: performing a first aggregation operation based at least in part on one or more redirected read requests; transforming one or more words obtained by performing the first aggregation operation into one or more addresses; and performing a second aggregation operation to forward data items located at the one or more addresses to a destination, the destination being determined based at least in part on the one or more redirected read requests.

9. The method of claim 8, wherein transforming the one or more words obtained by performing the first aggregation operation into the one or more addresses further comprises: parsing values and / or states in a buffer to determine a memory address.

10. The method according to claim 9, further comprising: applying one or more arithmetic operations to the parsed values and / or states to determine the memory address.

11. The method according to claim 8, further comprising: interpreting the one or more redirected read requests as a first request for the first aggregation operation.

12. The method according to claim 11, further comprising: forming a second request for the second aggregation operation based at least in part on the one or more addresses.

13. The method according to claim 8, further comprising: performing the first aggregation operation in response to a signal from an initiator of the one or more redirected read requests.

14. The method of claim 13, wherein the initiator of the one or more redirected read requests includes a process executed by an ALU to perform one or more sensor fusion operations.

15. A system comprising: a memory including one or more memory devices; and a direct memory access (DMA) controller, coupled to the memory via a bus, configured to: receive one or more redirected read requests; perform a first aggregation operation, based at least in part on the one or more received redirected read requests, to obtain one or more words from the memory; convert the one or more words obtained by performing the first aggregation operation into one or more addresses; and perform a second aggregation operation to forward data items located at the one or more addresses to a destination, the destination being determined based at least in part on the one or more redirected read requests.

16. The system of claim 15, wherein the DMA controller is further configured to: parse values and / or states in a buffer to determine the one or more addresses.

17. The system of claim 15, wherein the DMA controller is further configured to: parse values and / or states in a buffer; and apply one or more arithmetic operations to the parsed values and / or states to determine the one or more addresses.

18. The system of claim 15, wherein the DMA controller is further configured to: interpret the one or more redirected read requests as requests for the first aggregation operation.

19. The system of claim 15, wherein the DMA controller is further configured to: form a request for the second aggregation operation based at least in part on the one or more addresses.

20. The system of claim 15, wherein the DMA controller is further configured to perform the first aggregation operation in response to a signal from an initiator of the one or more redirected read requests.
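
The write-side sequence of claim 1 (aggregate, convert to addresses, scatter) and the read-side sequence of claims 8 and 15 (aggregate, convert to addresses, aggregate again) can be pictured with a small software model. The Python sketch below is an illustration under stated assumptions, not the claimed hardware: the `RedirectedRequest` fields, the `base + word * stride` address arithmetic, and the list-backed word-addressable memory are all hypothetical choices that the claims do not define.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RedirectedRequest:
    """Hypothetical request layout; the claims do not specify one."""
    index_addrs: List[int]  # memory addresses holding the words to aggregate
    base: int               # assumed base address of the target region
    stride: int             # assumed number of words per target entry


class DmaModel:
    """Software stand-in for the DMA controller (not hardware-accurate)."""

    def __init__(self, memory: List[int]):
        self.memory = memory         # word-addressable memory
        self.buffer: List[int] = []  # buffer in the sense of claims 2 and 16

    def _aggregate(self, addrs: List[int]) -> List[int]:
        # Aggregation: collect words from scattered memory locations.
        return [self.memory[a] for a in addrs]

    def _to_addresses(self, req: RedirectedRequest) -> List[int]:
        # Parse buffered words and apply arithmetic (claims 3, 10, 17).
        return [req.base + word * req.stride for word in self.buffer]

    def redirected_write(self, req: RedirectedRequest, data: List[int]) -> List[int]:
        self.buffer = self._aggregate(req.index_addrs)
        addrs = self._to_addresses(req)
        if len(addrs) != len(data):
            raise ValueError("one data item is expected per computed address")
        for addr, item in zip(addrs, data):  # scatter operation
            self.memory[addr] = item
        return addrs

    def redirected_read(self, req: RedirectedRequest, destination: List[int]) -> List[int]:
        self.buffer = self._aggregate(req.index_addrs)  # first aggregation
        addrs = self._to_addresses(req)
        destination.extend(self._aggregate(addrs))      # second aggregation
        return addrs
```

For instance, with index words 2, 0 and 3 stored at addresses 0 to 2, a base of 8 and a stride of 2, the model computes target addresses 12, 8 and 14. In both paths the initiator supplies only the request, while the address lookup and the data movement stay inside the modeled controller.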

Citation Information

Patent Citations

  • Multidimensional address generation for direct memory access

    US20200371978A1

  • Logical address direct memory access with multiple concurrent physical ports and internal switching

    US7877524B1