Extended memory neuromorphic component

By using an extended memory neuromorphic component to intercept data streams and perform machine learning operations on them, the disclosure addresses the transmission and processing inefficiency of memory devices that handle large amounts of data, enabling more efficient data recognition and transfer and improving the performance of the computing system.

CN115668223B · Active · Publication Date: 2026-06-16 · MICRON TECHNOLOGY INC

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: MICRON TECHNOLOGY INC
Filing Date: 2021-06-01
Publication Date: 2026-06-16

Abstract

Systems, devices, and methods related to extended memory neuromorphic components for performing operations in memory are described. An example device can include a plurality of compute devices. Each of the compute devices can include a processing unit and a memory array. The example device can further include a communication subsystem coupled to at least one of the plurality of compute devices and to a neuromorphic component. At least one of the plurality of compute devices can receive a request from a host to perform an operation, receive an indication of data to be accessed in a memory device to perform the operation, and send the indication to the neuromorphic component to monitor the data to be accessed. The neuromorphic component can intercept the data and determine that a portion of the data should be tagged.

Description

Technical Field

[0001] This disclosure generally relates to semiconductor memories and methods, and more specifically to apparatuses, systems, and methods for extended memory neuromorphic components.

Background Technology

[0002] Memory devices are typically provided as internal semiconductor integrated circuits in computers or other electronic systems. Many different types of memory exist, including volatile and non-volatile memory. Volatile memory may require power to maintain its data (e.g., host data, error data, etc.) and includes Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Synchronous Dynamic Random Access Memory (SDRAM), and Thyristor Random Access Memory (TRAM), among others. Non-volatile memory provides persistent data by retaining stored data when no power is supplied and can include NAND flash memory, NOR flash memory, and variable resistance memory such as Phase Change Random Access Memory (PCRAM), Resistive Random Access Memory (RRAM), and Magnetoresistive Random Access Memory (MRAM) such as Spin Torque Transfer Random Access Memory (STT RAM), among others.

[0003] A memory device may be coupled to a host computer (e.g., a host computing device) to store data, commands, and / or instructions for use by the host computer or electronic system during operation. For example, data, commands, and / or instructions may be transferred between the host computer and the memory device during operation of the computing system or other electronic system.

Description of the Drawings

[0004] Figures 1A to 1B are each a functional block diagram of a computing system including a device according to several embodiments of the present disclosure, the device including a first communication subsystem, a second plurality of communication subsystems, a neuromorphic component, and a plurality of memory devices.

[0005] Figure 2A is yet another functional block diagram of a computing system including a device according to several embodiments of the present disclosure, the device including a first plurality of communication subsystems, a second plurality of communication subsystems, a neuromorphic component, and a plurality of memory devices.

[0006] Figure 2B is a functional block diagram of a computing system including a device according to several embodiments of the present disclosure, the device including a communication subsystem, multiple computing devices, a mailbox component, and a neuromorphic component.

[0007] Figure 3 is yet another functional block diagram of a computing system including a device according to several embodiments of the present disclosure, the device including a computing core, multiple communication subsystems, and multiple memory devices.

[0008] Figure 4 is a functional block diagram of a device including a computing core with several ports, according to several embodiments of the present disclosure.

[0009] Figure 5 is a flowchart representing an example method corresponding to an extended memory architecture according to several embodiments of this disclosure.

Detailed Description

[0010] Systems, apparatuses, and methods related to extended memory neuromorphic components for performing extended memory operations are described. An example apparatus may include multiple computing devices. Each computing device may include a processing unit configured to perform operations on blocks of data, and a memory array configured as a cache memory for the respective processing unit. The example apparatus may further include a communication subsystem (“IF”) coupled to at least one of the multiple computing devices and coupled to a neuromorphic component. At least one of the multiple computing devices may receive a request from a host to perform an operation and receive an indication of data to be accessed in a memory device to perform the operation. The at least one computing device may send the indication to the neuromorphic component so that it monitors the data to be accessed in the memory device. The neuromorphic component may intercept the data and determine that a portion of the data should be tagged.

[0011] Extended memory architectures can transmit instructions to perform operations that are each specified by a single address and an operand, and those operations can be executed by a computing device containing a processing unit and memory resources. The computing device can perform extended memory operations on data streamed through the computing device without receiving intermediate commands. In one example, the computing device is configured to receive a command to perform an operation on data using the computing device's processing unit and to determine that an operand corresponding to the operation is stored in the memory resources.

[0012] As will be further described below, data accessed by a computing device in a memory device can be intercepted by a neuromorphic component, and the intercepted data can be processed by the neuromorphic component. The neuromorphic component can process the data by performing several machine learning operations and / or neuromorphic operations on the data to determine whether the data contains a specific pattern or whether the data indicates that a specific event has occurred. For example, a data stream accessed in a memory device may contain a portion of data whose pattern indicates that a medical event has occurred in a patient. The portion of the data may also indicate a security breach, an acute illness, an error occurring at a specific frequency, etc. Additionally, the data may contain unusual similarities that similarity detection can use to recognize patterns identifying known viruses and possible break-ins. The data can be searched across all records for specific similarities, and can be analyzed to identify time-sensitive similarities, etc., in the records. The data can also be analyzed to perform pattern recognition, to identify and eliminate "false positives," or to identify matching keywords or perform database searches. As an example, a search might return the waterfalls in the United States whose names start with the letter "A," etc. The portion of the data identified as containing a pattern or an event occurrence can be tagged (e.g., labeled, marked, etc.). A message indicating that the portion of the data has been tagged may be sent to a computing device and / or to a host.
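
The interception-and-tagging flow described above can be sketched roughly as follows. This is a minimal illustration, not the implementation claimed by the patent: the `TagMessage` type, the `intercept_and_tag` and `read_stream` functions, the fixed chunk size, and the byte-signature test that stands in for a machine learning or neuromorphic operation are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class TagMessage:
    """Hypothetical message reporting a tagged portion of the data."""
    start_addr: int  # first address of the tagged portion
    end_addr: int    # one past the last address of the tagged portion


def intercept_and_tag(
    stream: Iterable[Tuple[int, bytes]],
    looks_interesting: Callable[[bytes], bool],
) -> Iterator[TagMessage]:
    """Yield a TagMessage for each (address, chunk) pair the detector flags.

    `looks_interesting` stands in for the machine learning / neuromorphic
    operation; here it is simply any predicate over a chunk of bytes.
    """
    for addr, chunk in stream:
        if looks_interesting(chunk):
            yield TagMessage(addr, addr + len(chunk))


def read_stream(memory: bytes, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Model a data stream read from a memory device in fixed-size chunks."""
    for addr in range(0, len(memory), chunk_size):
        yield addr, memory[addr:addr + chunk_size]


# Toy memory image: 32 bytes with an assumed "event" signature at offset 16.
memory = bytes(16) + b"\xde\xad\xbe\xef" + bytes(12)
messages = list(
    intercept_and_tag(read_stream(memory, 8), lambda chunk: b"\xde\xad" in chunk)
)
```

Only the address range of the flagged chunk is reported, so a computing device or host receiving the message can fetch and analyze that portion alone.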

[0013] Upon receiving a message indicating that a portion of the data has been tagged, the computing device may forward the message to a host, or may process the data itself, to perform additional analysis and determine further details about that portion of the data. Alternatively, the host may receive the message and process the data to perform the additional analysis. In this way, the neuromorphic component can determine which portions of the data need to be analyzed and send this information to additional components (e.g., a computing device, a host, additional devices, etc.). Because it is physically closer to the memory device, meaning that data crosses fewer interconnects or buses to reach it, the neuromorphic component can process data more efficiently, identify the relevant portions of the data more quickly, and send them downstream for further processing.

[0014] Furthermore, the computing device may be a RISC-V application processor core capable of supporting a full-featured operating system such as Linux. This particular core can be associated with applications such as Internet of Things (IoT) nodes and gateways, storage devices, and / or networking. The core may be coupled to several ports, such as memory ports, system ports, peripheral ports, and / or front-end ports. For example, the memory port may communicate with a memory device, the system port may communicate with an on-chip accelerator or "fast" SRAM, the peripheral port may communicate with an off-chip serial port, and / or the front-end port may communicate with a host interface, as will be further described below in connection with Figure 4.

[0015] In this manner, data from a particular port (e.g., a memory port of a computing device) can be directed via a first communication subsystem (e.g., a multiplexer that selects the particular memory port) and transmitted via an additional communication subsystem (e.g., an interface such as an AXI interconnect interface) to a memory controller that can transfer the data to memory devices (e.g., DDR memory, 3D cross-point memory, NAND memory, etc.). In one example, the AXI interconnect interface may conform to the AMBA® AXI version 4 specification from ARM®, including the AXI4-Lite control register interface subset.

[0016] As used herein, an "extended memory operation" refers to a memory operation that can be specified by a single address (e.g., a memory address) and an operand (e.g., a 64-bit operand). The operand may be represented as multiple bits (e.g., a bit string or a string of bits). However, embodiments are not limited to operations specified by 64-bit operands, and the operation may be specified by operands larger than 64 bits (e.g., 128 bits, etc.) or smaller than 64 bits (e.g., 32 bits). As described herein, the accessible effective address space for performing the extended memory operation is the size of the memory device or file system accessible to the host computing system or storage controller.

[0017] Extended memory operations may include instructions and / or operations that can be executed by a processing device (e.g., by a processing device such as a core or, specifically, the computing core shown as 410 in Figure 4). Examples of the core may include reduced instruction set computing devices or other hardware processing devices that can execute instructions to perform various computing tasks. In some embodiments, performing extended memory operations may include: retrieving data and / or instructions stored in memory resources of the computing device and / or microcode instructions stored in microcode components; performing operations within the computing device 110 (e.g., without transferring data or instructions to circuitry outside the computing device); and storing the results of the extended memory operations in memory resources of the computing device 110 or in auxiliary storage (e.g., in memory devices such as memory devices 116-1, 116-2, etc., shown in Figures 1A to 1B and described below). In some embodiments, a particular computing device may have access to only a subset of the microcode components. In this example, only a subset of the microcode instructions may be accessible by the corresponding computing device. Access to microcode components may be based on fee or payment structures, data limitations or constraints, threshold parameters, and / or additional restrictions.

[0018] Non-limiting examples of extended memory operations may include floating-point addition, 32-bit complex operations, square root address (SQRT(addr)) operations, conversion operations (e.g., conversion between floating-point and integer formats, and / or conversion between floating-point and posit formats), normalization of data to a fixed format, absolute value operations, etc. In some embodiments, extended memory operations may include in-place update operations performed by a computing device (e.g., where the result of the extended memory operation is stored at the address where the operand for the extended memory operation was stored before the operation was performed), and operations in which previously stored data is used to determine new data (e.g., where an operand stored at a specific address is used to generate new data that overwrites the operand at that address).
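
As a rough sketch of the in-place behavior described above, the following models an extended memory operation as a single call that names an operation, one address, and one operand. The `OPERATIONS` table, the function names, the use of a `bytearray` as the memory resource, and the little-endian 64-bit float layout are assumptions made for illustration only.

```python
import math
import struct

# Hypothetical operation table: name -> function of (stored value, operand).
OPERATIONS = {
    "fadd": lambda stored, operand: stored + operand,  # floating-point addition
    "sqrt": lambda stored, operand: math.sqrt(stored),  # SQRT(addr)
    "abs": lambda stored, operand: abs(stored),         # absolute value
}


def read_f64(memory: bytearray, addr: int) -> float:
    """Read the 64-bit float stored at byte offset `addr`."""
    (value,) = struct.unpack_from("<d", memory, addr)
    return value


def extended_memory_op(memory: bytearray, op: str, addr: int,
                       operand: float = 0.0) -> None:
    """Apply `op` to the value at `addr` and store the result at `addr`."""
    result = OPERATIONS[op](read_f64(memory, addr), operand)
    # In-place update: the result overwrites the operand it was computed from.
    struct.pack_into("<d", memory, addr, result)


memory = bytearray(64)
struct.pack_into("<d", memory, 8, 16.0)
extended_memory_op(memory, "sqrt", 8)        # value at offset 8 becomes 4.0
extended_memory_op(memory, "fadd", 8, 1.5)   # value at offset 8 becomes 5.5
```

Each call stands for one command; the caller never issues a separate load or store around it.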

[0019] Therefore, in some embodiments, the execution of extended memory operations can mitigate or eliminate locking or mutual exclusion operations because the extended memory operations can be performed within a computing device, which reduces contention between multiple threads executing. Reducing or eliminating locking or mutual exclusion operations on threads during the execution of extended memory operations can improve the performance of the computing system, for example, because two or more extended memory operations can be executed in parallel within the same computing device or across computing devices that communicate with each other. Additionally, in some embodiments, the extended memory operations described herein can reduce or eliminate locking or mutual exclusion operations when transferring the results of the extended memory operations from the computing device performing the operation to the host.

[0020] Memory devices can be used to store important or critical data in computing devices and can transfer such data between hosts associated with the computing device via at least one extended memory architecture. However, as the size and amount of data stored in the memory device increase, transferring data to and from the host can become time-consuming and resource-intensive. For example, when a host requests to perform a memory operation using a large block of data, the amount of time and / or resources consumed by the request may increase in proportion to the size and / or amount of data associated with the block.

[0021] These effects may become more pronounced as the storage capacity of memory devices increases, because more and more data can be stored in memory devices and thus made available for memory operations. Furthermore, because data can be processed (e.g., memory operations can be performed on the data), the amount of data that can be processed also increases as the amount of data that can be stored in memory devices increases. This can lead to increased processing time and / or increased consumption of processing resources, potentially complicating the execution of certain types of memory operations. Additionally, while processing these larger volumes of data to perform extended memory operations, it may be difficult to determine whether additional processing of the data should be performed. For example, because so much data is being accessed in memory devices and processed by computing devices to perform extended memory operations, it may be difficult to identify which parts of the data should be further processed.

[0022] To mitigate these and other issues, the embodiments described herein allow the use of memory devices, one or more computing devices and / or memory arrays, a first plurality of communication subsystems (e.g., PCIe interfaces, PCIe XDMA interfaces, AXI interconnect interfaces, etc.), and a second plurality of communication subsystems (e.g., interfaces such as AXI interconnects) to perform extended memory operations, thereby enabling more efficient transfer of data from computing devices to memory devices and / or from computing devices to the host, and vice versa. Furthermore, by intercepting data being transferred to computing devices to perform such extended memory operations, the neuromorphic component can analyze the intercepted data and determine whether a portion of the data contains a specific data pattern or indicates that a specific event has occurred.

[0023] In some embodiments, data can be transmitted to multiple memory devices via these communication subsystems by bypassing multiple computing devices. In some embodiments, data can be transmitted via these communication subsystems by passing through at least one of the multiple computing devices. Depending on the route of data transmission, each of the interfaces may have a unique speed. As will be further described below, data can be transmitted at a higher rate when bypassing multiple computing devices than when data passes through at least one of the multiple computing devices. Furthermore, the neuromorphic component may send a message instructing further analysis of the data to at least one of the multiple computing devices or a host in response to determining that a portion of the data contains a pattern or indication of events. The message may contain an identification of the data, such as the data residing in an address range within a memory device.

[0024] In some methods, performing memory operations may require multiple clock cycles and / or multiple function calls to the memory of a computing system (e.g., memory devices and / or memory arrays). In contrast, embodiments herein allow extended memory operations to be performed with a single function call or command. For example, compared to methods that use at least one command and / or function call to load the data to be operated on and then at least one subsequent function call or command to store the resulting data, embodiments herein allow memory operations to be performed with fewer function calls or commands. Furthermore, the computing device of the computing system may receive requests to perform memory operations via a first communication subsystem (e.g., a PCIe interface, a multiplexer, a control network-on-chip, etc.) and / or a second communication subsystem (e.g., an interface, an interconnect such as an AXI interconnect, etc.), and may receive data blocks from the memory device for performing the requested memory operation via both the first and second communication subsystems. Although the first and second communication subsystems are described as being used in series, the embodiments are not limited thereto. As an example, requests for data and / or receipt of data blocks may occur solely via the first communication subsystem or solely via the second communication subsystem.

[0025] By reducing the number of function calls and / or commands used to perform memory operations, including determining whether data should be further processed, the amount of time consumed and / or the amount of computational resources consumed in performing such operations can be reduced compared to methods that require multiple function calls and / or commands to perform memory operations. Furthermore, the embodiments described herein can reduce the movement of data within memory devices and / or memory arrays because it may not be necessary to load data into a specific location before performing memory operations. This can reduce processing time compared to some methods, especially in scenarios where large amounts of data undergo memory operations.

[0026] Furthermore, the extended memory operations described herein allow for a much larger set of type fields than some other methods. For example, an instruction executed by the host to request an operation using data in a memory device (e.g., a memory subsystem) may include type, address, and data fields. The instruction may be sent to at least one of a plurality of computing devices via a first communication subsystem (e.g., a multiplexer) and a second communication subsystem (e.g., an interface), and data may be transferred from the memory device via the first and / or second communication subsystems. In response to data being transferred to a computing device, a neuromorphic component may be notified that the transfer is starting. The neuromorphic component may intercept the data as it is being transferred to the computing device and analyze it. In response to data being tagged by the neuromorphic component, the location of the tagged data may be sent to the computing device and / or the host.

[0027] The type field may correspond to the specific operation requested, the address field may correspond to the address at which the data to be used in the operation is stored, and the data field may correspond to the data (e.g., operands) to be used for the operation. In some methods, the type field may be limited to reads and / or writes of varying sizes and some simple integer accumulation operations. In contrast, the embodiments herein allow for a wider range of type fields because the effective address space available when performing extended memory operations can correspond to the size of the memory device. By expanding the address space available for performing operations, the embodiments herein allow a wider range of memory operations to be performed than methods that do not provide an effective address space corresponding to the size of the memory device.
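
To make the three fields concrete, here is a small sketch of a request record and a check that its address falls within the effective address space, taken here to be the size of the memory device. The `OpType` values, the `Request` layout, and the `validate` helper are hypothetical; the source does not define an encoding for any of these fields.

```python
from dataclasses import dataclass
from enum import IntEnum


class OpType(IntEnum):
    """Hypothetical type-field values (no encoding is given in the source)."""
    READ = 0
    WRITE = 1
    FLOAT_ADD = 2
    SQRT = 3
    ABS = 4


@dataclass(frozen=True)
class Request:
    op_type: OpType  # which operation is requested
    address: int     # where the data to operate on is stored
    data: int        # operand, e.g. a 64-bit value


def validate(request: Request, device_size: int, operand_bits: int = 64) -> bool:
    """Accept a request only if its address lies within the memory device
    and its operand fits in the assumed operand width."""
    in_range = 0 <= request.address < device_size
    fits = 0 <= request.data < (1 << operand_bits)
    return in_range and fits


ONE_MIB = 1 << 20
ok = validate(Request(OpType.SQRT, 0x1000, 7), device_size=ONE_MIB)
out_of_range = validate(Request(OpType.ABS, ONE_MIB, 0), device_size=ONE_MIB)
```

Under these assumptions, supporting more operations only means adding values to the type field; the address check depends on the device size alone.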

[0028] In the following detailed description of this disclosure, reference is made to the accompanying drawings, which form a part of this disclosure and which show, by way of illustration, how one or more embodiments of this disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice them, and it should be understood that other embodiments may be utilized and that process, electrical, and structural changes may be made without departing from the scope of this disclosure.

[0029] As used herein, designators such as “X,” “Y,” “N,” “M,” “A,” “B,” “C,” and “D,” particularly with respect to reference numerals in the drawings, indicate that a number of the particular feature so designated may be included. It should also be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used herein, unless the context clearly dictates otherwise, the singular forms “a / an” and “the” may include both singular and plural referents. Additionally, “a number of,” “at least one,” and “one or more” (e.g., a number of memory banks) may refer to one or more memory banks, while “a plurality of” is intended to refer to more than one of such things. Furthermore, the word “can / may” is used throughout this application in a permissive sense (i.e., possible, able) rather than a mandatory sense (i.e., must). The term “comprising” and its derivatives mean “including but not limited to.” Depending on the context, the term “coupled / coupling” means physically connected, directly or indirectly, or used for accessing and moving (transmitting) commands and / or data. Depending on the context, the terms “data” and “data value” are used interchangeably and may have the same meaning in this document.

[0030] The figures in this document follow a numbering convention in which the first digit or digits correspond to the figure number and the remaining digits identify an element or component in the figure. Similar elements or components in different figures can be identified by the use of similar digits. For example, 104 may reference element "04" in Figure 1A, and a similar element may be referenced as 204 in Figure 2A. A single element number may generally refer to a group or number of similar elements or components. For example, multiple elements 106-1, 106-2, 106-3 may be referred to generally as 106. As will be understood, elements shown in the various embodiments herein may be added, interchanged, and / or removed to provide multiple additional embodiments of this disclosure. Furthermore, the scale and / or relative dimensions of the elements provided in the figures are intended to illustrate certain embodiments of this disclosure and should not be considered limiting.
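
The numbering convention can be expressed as a short parser. This sketch assumes the element identifier is always the last two digits before any hyphen, which holds for the numerals used in this document (e.g., 104, 204, 106-1, 116-N) but is not stated as a general rule; the `Numeral` type and `parse_numeral` function are illustrative names.

```python
import re
from typing import NamedTuple, Optional


class Numeral(NamedTuple):
    figure: int              # leading digit(s): figure number
    element: str             # last two digits: element within the figure
    instance: Optional[str]  # suffix after the hyphen, e.g. "1" or "N"


def parse_numeral(text: str) -> Numeral:
    """Split a reference numeral such as "204" or "106-1" into its parts."""
    match = re.fullmatch(r"(\d+)(\d{2})(?:-(\w+))?", text)
    if match is None:
        raise ValueError(f"not a reference numeral: {text!r}")
    figure, element, instance = match.groups()
    return Numeral(int(figure), element, instance)


def same_element(a: str, b: str) -> bool:
    """True when two numerals name similar elements in different figures."""
    return parse_numeral(a).element == parse_numeral(b).element
```

For instance, `same_element("104", "204")` compares element "04" of Figure 1A with element "04" of Figure 2A. The letter in a figure name (1A versus 1B) is not encoded in the numeral.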

[0031] Figures 1A and 1B are each a functional block diagram of a computing system 100 including a device 104, according to several embodiments of the present disclosure. The device includes a first IF (“interface”) 108, a second plurality of IFs (“interfaces”) including second IF A 106-1, second IF B 106-2, and second IF C 106-3 (hereinafter collectively referred to as the second plurality of IFs 106), and a plurality of memory devices 116-1, ..., 116-N. As used herein, “device” may refer to, but is not limited to, any one or a combination of various structures, such as a circuit or circuit system, one or more dies, one or more modules, one or more devices, or one or more systems. In the embodiments shown in Figures 1A to 1B, memory devices 116-1, ..., 116-N may include one or more memory modules (e.g., double data rate (DDR) memory, three-dimensional (3D) cross-point memory, NAND memory, single in-line memory modules, dual in-line memory modules, etc.). Memory devices 116-1, ..., 116-N may include volatile memory and / or non-volatile memory. In several embodiments, memory devices 116-1, ..., 116-N may include multi-chip devices. Multi-chip devices may include several different memory types and / or memory modules. For example, a memory system may contain non-volatile or volatile memory on any type of module.

[0032] Memory devices 116-1, ..., 116-N may provide main memory for computing system 100 or may be used as additional memory or storage devices throughout computing system 100. Each memory device 116-1, ..., 116-N may include one or more arrays of memory cells, such as volatile and / or non-volatile memory cells. For example, the array may be a flash array with a NAND architecture. Embodiments are not limited to a particular type of memory device. For example, memory devices may include RAM, ROM, DRAM, SDRAM, PCRAM, RRAM, and flash memory, etc.

[0033] In embodiments where memory devices 116-1, ..., 116-N include non-volatile memory, memory devices 116-1, ..., 116-N may be flash memory devices such as NAND or NOR flash memory devices. However, embodiments are not limited thereto, and memory devices 116-1, ..., 116-N may include other non-volatile memory devices such as non-volatile random access memory devices (e.g., NVRAM, ReRAM, FeRAM, MRAM, PCM), "emerging" memory devices such as 3D cross-point (3D XP) memory devices, or combinations thereof. A 3D XP array of non-volatile memory can perform bit storage based on a change in bulk resistance, in conjunction with a stackable cross-gridded data access array. Furthermore, in contrast to many flash-based memories, 3D XP non-volatile memory can perform write-in-place operations, in which non-volatile memory cells can be programmed without first being erased.

[0034] As illustrated in Figures 1A to 1B, multiple cores (“COREs”) 110-1, 110-2 (collectively referred to below as multiple computing devices 110), which may be referred to as “computing devices” in alternative embodiments, may be coupled to a first IF 108 (e.g., a Peripheral Component Interconnect Express (PCIe) interface, a PCIe XDMA interface, etc.). The first IF 108 may include circuitry and / or logic configured to allocate and deallocate resources to the host 102 or the computing devices 110 during the execution of the operations described herein. For example, the circuitry and / or logic may communicate data requests or allocate and / or deallocate resources to the computing devices 110 during the execution of the extended memory operations described herein.

[0035] The first IF 108 may be directly coupled to at least the second IF A 106-1 (e.g., an interface such as an interconnect interface) of the second plurality of IFs 106. Each of the second plurality of IFs 106 may be coupled to a corresponding one of the controller 112, accelerator 114, neuromorphic component 118, and peripheral component 120. In one example, the second IF A 106-1 of the second plurality of IFs 106 may be coupled to the controller 112. In this example, the second IF A 106-1 may be a memory interface. The controller 112 may be coupled to a plurality of memory devices 116-1, ..., 116-N via a plurality of channels 107-1, ..., 107-N.

[0036] Second, in this example, and as illustrated in Figure 1A, the second IF B 106-2 of the second plurality of IFs 106 can be coupled to the accelerator 114 and the neuromorphic component (“NM component”) 118. The on-chip accelerator 114 can be used to perform several posit operations and / or to communicate with internal SRAM on a field-programmable gate array (FPGA) containing the described components. As an example, the components of device 104 may be on an FPGA. The neuromorphic component 118 can be used to perform several neuromorphic or machine learning operations on data intercepted during transfer from memory device 116.

[0037] Alternatively, as illustrated in Figure 1B, the second IF A 106-1 of the second plurality of IFs 106 can be coupled to the neuromorphic component 118. In this way, the neuromorphic component 118 can be located physically closer to the data stream it intercepts and can communicate more directly with the memory controller 112. A unidirectional arrow coupling the neuromorphic component 118 to the second IF A 106-1 indicates an AXI interconnect or bus on which the neuromorphic component 118 is in control, or is the master device, for data transmission, and which, in some instances, can provide DDR memory access. In this way, the neuromorphic component 118 can send messages or indications notifying the computing device 110 that data has been tagged, and can intercept the data stream as the data is transmitted to the computing device 110. A bidirectional arrow coupling the neuromorphic component 118 and the second IF A 106-1 indicates an AXI interconnect or bus on which the second IF A 106-1 is in control, or is the master device, for data transmission.

[0038] In embodiments where the neuromorphic component 118 is closer to the memory controller 112 and coupled to the second IF A 106-1, as illustrated in Figure 1B, the data width can be greater than when the neuromorphic component 118 is coupled to the second IF B 106-2, as illustrated in Figure 1A. As an example, the Figure 1B embodiment may have a data width of 128 bits, while the Figure 1A embodiment may have a data width of 64 bits. Furthermore, the Figure 1B embodiment can have more write strobe bits than the Figure 1A embodiment. As an example, the Figure 1B write strobe can be 16 bits, while the Figure 1A write strobe can be 8 bits. As another example, the Figure 1B embodiment may have a 38-bit address space, an 8-bit ID name space, and a 4-bit Quality of Service (QoS) field, while the Figure 1A embodiment may have a 32-bit address space, a 4-bit ID name space, and a 4-bit QoS field.
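
The write strobe widths quoted above are consistent with the AXI convention of one strobe bit per byte of write data. The sketch below records the two sets of widths from the text and derives the strobe width and the byte-addressable range from them; the `AxiPortConfig` class and the constant names are illustrative and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxiPortConfig:
    """Signal widths, in bits, quoted in the text for each placement."""
    data_bits: int
    addr_bits: int
    id_bits: int
    qos_bits: int

    @property
    def strobe_bits(self) -> int:
        # AXI carries one write-strobe bit per byte lane of the data bus.
        return self.data_bits // 8

    @property
    def addressable_bytes(self) -> int:
        # AXI addresses are byte addresses, so n address bits span 2**n bytes.
        return 1 << self.addr_bits


# Neuromorphic component on second IF B 106-2 (Figure 1A).
FIG_1A = AxiPortConfig(data_bits=64, addr_bits=32, id_bits=4, qos_bits=4)
# Neuromorphic component on second IF A 106-1 (Figure 1B).
FIG_1B = AxiPortConfig(data_bits=128, addr_bits=38, id_bits=8, qos_bits=4)

GIB = 1 << 30
ranges_in_gib = (FIG_1A.addressable_bytes // GIB, FIG_1B.addressable_bytes // GIB)
```

Under this reading, the 32-bit address space spans 4 GiB and the 38-bit address space spans 256 GiB.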

[0039] Neuromorphic component 118 can perform neuromorphic operations using a neural network. Some neuromorphic systems may use variable resistance memory, such as a PCM device or a self-selecting memory device, to store synaptic values (e.g., synaptic weights). Such variable resistance memory may contain memory cells configured to store multiple levels and / or may have wide sensing windows. Such memory may be configured to perform training operations controlled by pulses (e.g., spike pulses). Such training operations may include spike-timing-dependent plasticity (STDP). STDP may be a form of Hebbian learning induced by the correlation between spikes transmitted between nodes (e.g., neurons). STDP may be an example of a process that adjusts the strength of connections between nodes (e.g., neurons).
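
For readers unfamiliar with STDP, the following is a sketch of the common textbook pair-based rule, in which the weight change depends exponentially on the time difference between a presynaptic and a postsynaptic spike. It is offered as general background, not as the training procedure used by neuromorphic component 118; the function names and all constants are assumptions.

```python
import math


def stdp_delta(dt: float, a_plus: float = 0.01, a_minus: float = 0.012,
               tau_plus: float = 20.0, tau_minus: float = 20.0) -> float:
    """Weight change for one pre/post spike pair.

    dt = t_post - t_pre (e.g., in milliseconds). Pre-before-post (dt > 0)
    strengthens the connection; post-before-pre (dt < 0) weakens it.
    All constants are illustrative defaults, not values from the source.
    """
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


def apply_stdp(weight: float, dt: float,
               w_min: float = 0.0, w_max: float = 1.0) -> float:
    """Update a synaptic weight and clamp it to [w_min, w_max]."""
    return min(w_max, max(w_min, weight + stdp_delta(dt)))
```

Closely timed spike pairs change the weight more than widely separated ones, which is how the rule captures correlation between spikes.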

[0040] In neural networks, synaptic weights refer to the strength or magnitude of the connection between two nodes (e.g., neurons). The nature and content of the information transmitted through a neural network can be based in part on the characteristics of the connections representing synapses formed between nodes. For example, the characteristics of a connection can be synaptic weights. Neuromorphic systems and devices can be designed, among other things, to achieve results that may be impossible with conventional computer architectures. For example, neuromorphic systems can be used to obtain results more commonly associated with biological systems, such as learning, visual processing, auditory processing, advanced computing, or other processes or combinations thereof. As an example, synaptic weights and / or connections between at least two memory cells can represent a synapse, or the strength or degree of a synaptic connection, and are associated with corresponding short-term or long-term connections that correspond to the biological occurrence of short-term and long-term memory. A series of neural network operations can be performed to increase the synaptic weights between at least two memory cells in a short-term or long-term manner, depending on the type of memory cells used, as described below.

[0041] Learning events in neural network operations can represent the causal propagation of spikes between neurons, thereby increasing the weights of the connecting synapses. This increase in synaptic weights can be represented by an increase in the conductivity of memory cells. Variable resistive memory arrays (e.g., 3D crosspoint or self-selecting memory (SSM) arrays) can mimic synaptic arrays, each synapse being characterized by a weight, or memory cell conductivity. Higher conductivity corresponds to a larger synaptic weight and a greater degree of memory learning. Short-term memory learning can be rapid and / or reversible, where the simulated weights of synapses are enhanced, i.e., their conductivity increases through a reversible mechanism. Long-term memory learning can be slow and / or irreversible, where, for a specific state (e.g., SET or RESET), cell conductivity increases irreversibly, leading to unforgettable memories from longer, experience-dependent learning.
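As a toy illustration of the distinction drawn above, the sketch below models a cell's conductance as the sum of a reversible short-term part and an irreversible long-term part. The class `SynapseCell`, its fields, and the decay parameter are hypothetical and do not describe any actual cell physics.

```python
from dataclasses import dataclass


@dataclass
class SynapseCell:
    """Toy model: conductance stands in for the synaptic weight."""
    long_term: float = 0.0   # irreversible contribution
    short_term: float = 0.0  # reversible contribution

    @property
    def conductance(self) -> float:
        return self.long_term + self.short_term

    def learn_short(self, amount: float) -> None:
        """Fast, reversible increase."""
        self.short_term += amount

    def learn_long(self, amount: float) -> None:
        """Slow, irreversible increase; the long-term part never decreases."""
        if amount < 0:
            raise ValueError("long-term learning cannot reduce conductance")
        self.long_term += amount

    def forget(self, decay: float = 0.5) -> None:
        """Only the reversible part decays (decay is an assumed fraction in [0, 1])."""
        self.short_term *= (1.0 - decay)
```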

[0042] This document describes neuromorphic operations that can be used to simulate neurobiological architectures that may exist in the nervous system and / or store synaptic weights associated with long-term and short-term learning or relationships. A memory device may include a memory array comprising a first portion and a second portion. The first portion of the memory array may include a first plurality of variable-resistance memory cells, and the second portion may include a second plurality of variable-resistance memory cells. The second portion can be degraded by forced write cycles. The degradation mechanism may include damage to a chalcogenide material. In some embodiments comprising memory cells composed of materials other than chalcogenide materials, the degradation mechanism may include thermal relationships between memory cells, control via control gate coupling between memory cells, charge loss corresponding to memory cells, temperature-induced signal or threshold loss, etc.

[0043] These neuromorphic operations can be performed by the neuromorphic component 118 on data intercepted via the second IF B 106-2 while the computing device 110-1 accesses data in the memory device 116-1 (e.g., as illustrated in Figure 1A), or on data intercepted via the second IF A 106-1 (e.g., as illustrated in Figure 1B). When the neuromorphic component 118 is intended to be used to detect a specific event represented by data or a pattern in the data, the neuromorphic component 118 may receive a large amount of data for training. Using the neural network processing described above, the large amount of data can train the neuromorphic component 118 to detect events or patterns, so that it becomes more efficient and effective at doing so.

[0044] In addition, as illustrated in Figures 1A and 1B, the second IF C 106-3 of the second plurality of IFs 106 can be coupled to peripheral component 120. Peripheral component 120 can be either a general purpose input / output (GPIO) LED or a universal asynchronous receiver / transmitter (UART). The GPIO LED can be further coupled to additional LEDs, and the UART can be further coupled to a serial port. The second plurality of IFs 106 can be coupled to each corresponding component via several AXI buses. The second IF C 106-3 of the second plurality of IFs 106 can be used to transmit data off-chip via peripheral component 120 or an off-chip serial port.

[0045] The host 102 may be a host system, such as a personal laptop computer, desktop computer, digital camera, smartphone, memory card reader, and / or device with Internet of Things (IoT) capabilities, as well as various other types of host systems, and may include memory access devices, such as processors (or processing devices). Those skilled in the art will understand that "processor" can mean one or more processors, such as a parallel processing system, several coprocessors, etc. The host 102 may include a system motherboard and / or backplane, and may include several processing resources (e.g., one or more processors, microprocessors, or some other type of control circuitry). In some embodiments, the host may include a host controller 101, which may be configured to control at least some operations of the host 102 by, for example, generating commands and transmitting commands to the host controller to cause operations such as extended memory operations to be performed. The host controller 101 may include circuitry (e.g., hardware) configurable to control at least some operations of the host 102. For example, the host controller 101 may be an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other combination of circuitry and / or logic configured to control at least some operations of the host 102. The host 102 may communicate with the first IF 108 via communication paths 103 / 105.

[0046] System 100 may include separate integrated circuits, or the host 102, the first IF 108, the second plurality of IFs 106, the controller 112, the on-chip accelerator 114, SRAM 233, peripheral components 120, and / or memory devices 116-1, ..., 116-N may be on the same integrated circuit. System 100 may be, for example, a server system and / or a high-performance computing (HPC) system and / or a portion thereof. Although the examples shown in Figures 1A and 1B illustrate systems with a von Neumann architecture, embodiments of this disclosure can be implemented in non-von Neumann architectures that may not include one or more components typically associated with a von Neumann architecture (e.g., CPU, ALU, etc.).

[0047] Controller 112 may be configured to request data blocks from one or more of memory devices 116-1, ..., 116-N, and cause multiple computing devices 110 to perform operations (e.g., extended memory operations) on the data blocks. The operations may be performed to evaluate a function that can be specified by a single address and one or more operands associated with the data block. Controller 112 may be further configured to store the results of the extended memory operations in one or more of the computing devices 110-1, ..., 110-N via the second plurality of IFs 106 and / or to transmit the results to a channel (e.g., communication paths 103 and / or 105) and / or host 102.

[0048] In some embodiments, the second plurality of IFs 106 may request remote commands, initiate DMA commands, send read / write locations, and / or send a start function execution command to one of the plurality of computing devices 110. In some embodiments, the second plurality of IFs 106 may request the copying of data blocks from a buffer of computing device 110 to a buffer of memory controller 112 or memory device 116. Conversely, one of the second plurality of IFs 106 may request the copying of data blocks from a buffer of memory controller 112 or memory device 116 to a buffer of computing device 110. The second plurality of IFs 106 may request the copying of data blocks from a buffer of host 102 to computing device 110, or vice versa, request the copying of data blocks from computing device 110 to host 102. The second plurality of IFs 106 may request the copying of data blocks from a buffer of memory controller 112 or memory device 116 to a buffer of host 102. Conversely, the second plurality of IFs 106 may request the copying of data blocks from a buffer of host 102 to a buffer of memory controller 112 or memory device 116. Furthermore, in some embodiments, the second plurality of IFs 106 may request the execution of commands from a host on computing device 110. The second plurality of IFs 106 may request the execution of commands from computing device 110 on an additional computing device 110. The second plurality of IFs 106 may request the execution of commands from memory controller 112 on computing device 110. In some embodiments, the second plurality of IFs 106 may include at least a portion of a controller (not illustrated).

[0049] In some embodiments, the second plurality of IFs 106 may transfer data blocks (e.g., direct memory access (DMA) data blocks) from computing device 110 to memory device 116 (via memory controller 112), or vice versa. The second plurality of IFs 106 may also transfer data blocks (e.g., DMA blocks) from computing device 110 to host 102, or vice versa. Furthermore, the second plurality of IFs 106 may transfer data blocks (e.g., DMA blocks) from host 102 to memory device 116, or vice versa.

[0050] In some embodiments, the second plurality of IFs 106 may receive outputs (e.g., data on which extended memory operations have been performed, indications of tagged data received from neuromorphic component 118, indications of what type of data is within the tagged data, etc.) from computing devices 110-1, ..., 110-N and transfer the outputs to controller 115 of device 104 and / or host 102, and vice versa. For example, the second plurality of IFs 106 may be configured to receive data that has undergone extended memory operations by computing devices 110-1, ..., 110-N, and to transmit data corresponding to the results of the extended memory operations to controller 115 and / or host 102. Furthermore, for example, the second plurality of IFs 106 may be configured to receive data that has been tagged by neuromorphic component 118 as corresponding to an event or pattern and that is being sent for further processing.

[0051] In some embodiments, the second plurality of IFs 106 may include at least a portion of the controller 115. For example, the second plurality of IFs 106 may include circuitry that includes the controller 115 or a portion thereof. As an example, the controller 115 may manage additional operations and communications within a plurality of computing devices 210 and / or controllable devices 104. In some instances, the controller 115 may manage communication between computing devices 110-1 and 110-2.

[0052] The memory controller 112 may be a "standard" or "dumb" memory controller. For example, the memory controller 112 may be configured to perform simple operations on memory devices 116-1, ..., 116-N, such as copying, writing, reading, error correction, etc. However, in some embodiments, the memory controller 112 does not perform processing (e.g., operations to manipulate data) on the data associated with the memory devices 116-1, ..., 116-N. For example, the memory controller 112 may cause read and / or write operations to be performed to read data from or write data to the memory devices 116-1, ..., 116-N via communication paths 107-1, ..., 107-N, but the memory controller 112 may not perform processing on the data read from or written to the memory devices 116-1, ..., 116-N. In some embodiments, the memory controller 112 may be a non-volatile memory controller, but embodiments are not limited thereto.

[0053] In some embodiments, a first AXI bus that couples the first IF 108 to one of the second plurality of IFs 106 (e.g., the second IF A 106-1) is capable of transmitting data faster than a second AXI bus that couples the first IF 108 to the computing device 110-1. For example, the first AXI bus can operate at 300 MHz, while the second AXI bus can operate at 100 MHz. Furthermore, the first AXI bus can be capable of transmitting data faster than a third AXI bus that couples the computing device 110-1 to one of the second plurality of IFs 106.

[0054] The embodiments of Figures 1A and 1B may include additional circuitry that is not illustrated so as not to obscure the embodiments of this disclosure. For example, device 104 may include address circuitry to latch address signals provided on I / O connections via I / O circuitry. Address signals can be received and decoded by row decoders and column decoders to access memory devices 116-1, ..., 116-N. Those skilled in the art will understand that the number of address input connections may depend on the density and architecture of memory devices 116-1, ..., 116-N.

[0055] In some embodiments, extended memory operations can be performed using the computing system 100 shown in Figures 1A and 1B by selectively storing or mapping data (e.g., files) into the computing device 110. Data may be selectively stored in the address space of the computing memory. In some embodiments, data may be selectively stored or mapped in computing device 110 in response to a command received from host 102. In embodiments where a command is received from host 102, the command may be transmitted to computing device 110 via an interface associated with host 102 (e.g., communication paths 103 and / or 105) and via the first IF 108 and the second plurality of IFs 106, respectively. Communication paths 103 / 105, the first IF 108, and the second plurality of IFs 106 may be a peripheral component interconnect express (PCIe) bus, a double data rate (DDR) interface, an interconnect interface (e.g., an AXI interconnect interface), a multiplexer (mux), or other suitable interface or bus. However, the embodiments are not limited thereto.

[0056] In a non-limiting instance where data (e.g., data intended for performing extended memory operations) is mapped to computing device 110, host controller 101 may transmit commands to computing device 110 to initiate the execution of extended memory operations using the data mapped to computing device 110. In some embodiments, host controller 101 may look up an address (e.g., a physical address) corresponding to the data mapped to computing device 110 and determine, based on the address, which computing device (e.g., computing device 110-1) the address (and therefore the data) is mapped to. Commands may then be transmitted to the computing device (e.g., computing device 110-1) containing the address (and therefore the data).

[0057] In some embodiments, the data may be 64-bit operands, but embodiments are not limited to operands with a specific size or length. In embodiments where the data is a 64-bit operand, once the host controller 101 transmits a command to initiate the execution of an extended memory operation to the correct computing device (e.g., computing device 110-1) based on the address of the stored data, the computing device (e.g., computing device 110-1) can use the data to perform the extended memory operation.

[0058] In some embodiments, computing devices 110 may be individually addressable across a contiguous address space, which facilitates the execution of extended memory operations as described herein. That is, the address at which data is stored, or to which data is mapped, may be unique across all computing devices 110, such that when host controller 101 looks up an address, the address corresponds to a location in a particular computing device (e.g., computing device 110-1).

[0059] For example, a first computing device 110-1 may have a first set of addresses associated with it, a second computing device 110-2 may have a second set of addresses associated with it, a third computing device 110-3 may have a third set of addresses associated with it, and so on up to the nth computing device (e.g., computing device 110-N), which may have an nth set of addresses associated with it. That is, the first computing device 110-1 may have an address set from 0000000 to 0999999, the second computing device 110-2 may have an address set from 1000000 to 1999999, the third computing device 110-3 may have an address set from 2000000 to 2999999, and so on. It should be understood that these numbers of addresses are illustrative only and not limiting, and may depend on the architecture and / or size (e.g., storage capacity) of the computing device 110.
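A lookup over the illustrative address sets above can be sketched as follows. The addresses are treated as decimal integers, which is an assumption, and the names `RANGE_STARTS`, `DEVICE_NAMES`, and `device_for_address` are hypothetical.

```python
from bisect import bisect_right

SPAN = 1_000_000                            # addresses per device in the example
RANGE_STARTS = [0, 1_000_000, 2_000_000]    # first address of each device's set
DEVICE_NAMES = ["110-1", "110-2", "110-3"]  # labels follow the reference numerals


def device_for_address(address: int) -> str:
    """Return the computing device whose address set contains `address`."""
    if address < 0 or address >= RANGE_STARTS[-1] + SPAN:
        raise ValueError(f"address {address} is outside the mapped space")
    # bisect_right counts the range starts that are <= address.
    return DEVICE_NAMES[bisect_right(RANGE_STARTS, address) - 1]
```

Because the address sets do not overlap, every valid address resolves to exactly one computing device, which is what allows a single command to be routed directly to the device that holds the data.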

[0060] As a non-limiting example in which the extended memory operation includes a floating-point add-accumulate operation, computing device 110 may treat the destination address as a floating-point number, add the floating-point number to an operand stored at an address on computing device 110, and store the result back to the original address. For example, when host controller 101 (or device controller 115, not shown) initiates the execution of a floating-point add-accumulate extended memory operation, the address of computing device 110 looked up by the host (e.g., the address to which data in the computing device is mapped) may be treated as a floating-point number, and the data stored at the address may be treated as an operand used to perform the extended memory operation. In response to receiving a command to initiate the extended memory operation, the computing device 110 to which the data (e.g., the operand in this example) is mapped may perform an addition operation to add the data to the address (e.g., the value of the address), and store the result of the addition back to the original address on computing device 110.
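The operation described above can be sketched in a few lines. The sketch interprets "treating the address as a floating-point number" as a numeric conversion of the address value, which is an assumption (a bit-level reinterpretation is another possible reading). The class `ComputeDevice` and its methods are hypothetical.

```python
class ComputeDevice:
    """Toy stand-in for a computing device with address-mapped data."""

    def __init__(self) -> None:
        self.memory: dict[int, float] = {}

    def map_data(self, address: int, value: float) -> None:
        """Store (map) an operand at the given address."""
        self.memory[address] = float(value)

    def fp_add_accumulate(self, address: int) -> float:
        """Add the address value to the operand stored there and write it back."""
        if address not in self.memory:
            raise KeyError(f"no data mapped at address {address}")
        result = float(address) + self.memory[address]
        self.memory[address] = result  # result replaces the original operand
        return result


device = ComputeDevice()
device.map_data(1024, 2.5)
device.fp_add_accumulate(1024)  # a single command carries only the address
```

Only the address travels with the command; the operand is already resident in the device, so no separate operand load is needed.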

[0061] As described above, in some embodiments, performing such extended memory operations may require only a single command (e.g., a request command) to be transmitted from host 102 (e.g., from host controller 101) to memory device 116 or from controller 115 to computing device 110. This reduces the amount of time consumed in performing the operation compared to some prior methods, e.g., time spent on multiple commands traversing communication paths 103, 105 and / or on data (e.g., operands) being moved from one address to another within computing device 110. In this way, computing device 110 can instead devote its resources to performing the operation.

[0062] Furthermore, compared to methods that require retrieving and loading operands from different locations before performing an operation, the execution of extended memory operations according to this disclosure can further reduce processing power or processing time because data mapped to the computing device 110 in which the extended memory operation is performed can be used as operands for the extended memory operation, and / or the addresses to which the data is mapped can be used as operands for the extended memory operation. In other words, at least because the embodiments herein allow skipping operand loading, the performance of the computing system 100 can be improved compared to methods that load operands and subsequently store the results of operations performed between operands.

[0063] Furthermore, in some embodiments, because extended memory operations can be performed within computing device 110 using addresses and data stored in those addresses, and in some embodiments, because the results of extended memory operations can be stored back in the original addresses, locking or mutex operations can be relaxed or eliminated during the execution of extended memory operations. Reducing or eliminating locking or mutex operations on threads during the execution of extended memory operations can improve the performance of computing system 100 because extended memory operations can be performed in parallel within the same computing device 110 or across two or more computing devices 110.

[0064] In some embodiments, a valid mapping of data in computing device 110 may include a base address, a segment size, and / or a length. The base address may correspond to an address in computing device 110 where the data mapping is stored. The segment size may correspond to the amount of data that computing system 100 can process (e.g., in bytes), and the length may correspond to the number of bits corresponding to the data. It should be noted that in some embodiments, data stored in computing device 110 may not be cached on host 102. For example, an extended memory operation may be performed entirely within computing device 110 without interfering with or otherwise transferring data to or from host 102 during the execution of the extended memory operation.

[0065] In a non-limiting instance where the base address is 4096, the segment size is 1024, and the length is 16,386, the mapped address 7234 may fall in a third segment, which may correspond to a third computing device (e.g., computing device 210-3 in Figure 2A) among the plurality of computing devices 110. In this instance, host 102 and / or the first IF 108 and the second plurality of IFs 106 can forward commands (e.g., requests) to perform an extended memory operation to the third computing device 210-3. The third computing device 210-3 can determine whether the data is stored at the mapped address in the memory of the third computing device 210-3. If the data is stored at the mapped address (e.g., an address in the third computing device 210-3), then the third computing device 210-3 can use the data to perform the requested extended memory operation and can store the result of the extended memory operation back to the address where the data was originally stored.
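The arithmetic behind this example can be sketched as an integer division of the offset from the base address by the segment size. The function name `segment_index` is hypothetical, and the sketch assumes the length bounds the mapped region in the same units as the addresses.

```python
def segment_index(base: int, segment_size: int, length: int, mapped_address: int) -> int:
    """Return the zero-based segment that contains `mapped_address`."""
    offset = mapped_address - base
    if not 0 <= offset < length:
        raise ValueError("mapped address lies outside the mapped region")
    return offset // segment_size


# Values from the paragraph above: (7234 - 4096) // 1024 == 3138 // 1024 == 3
index = segment_index(base=4096, segment_size=1024, length=16_386, mapped_address=7234)
```

With these values the offset is 3138, giving segment index 3 when segments are counted from zero. Reading that index as the segment the text calls the third is an interpretation of the numbering convention, not something the text states.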

[0066] In some embodiments, the computing device 110 containing the data requested for performing an extended memory operation may be determined by the host controller 101 and / or the first IF 108 and / or the second plurality of IFs 106. For example, a portion of the total address space available to all computing devices 110 may be allocated to each respective computing device. Therefore, the host controller 101 and / or the first IF 108 and / or the second plurality of IFs 106 may have information corresponding to which portions of the total address space correspond to which computing devices 110, and may thus direct the relevant computing device 110 to perform the extended memory operation. In some embodiments, the host controller 101 and / or the second plurality of IFs 106 may store the address (or address range) corresponding to the respective computing device 110 in a data structure such as a table, and direct the execution of the extended memory operation to the computing device 110 based on the address stored in the data structure.

[0067] However, the embodiments are not limited to this, and in some embodiments, the host controller 101 and / or the second plurality of IFs 106 may determine the size of the memory resources (e.g., the amount of data) and, based on the size of the memory resources associated with each computing device 110 and the total address space available to all computing devices 110, determine which computing device 110 stores the data to be used for performing extended memory operations. In embodiments where the host controller 101 and / or the second plurality of IFs 106 determine the computing device 110 storing the data to be used for performing extended memory operations based on the total address space available to all computing devices 110 and the amount of memory resources available to each computing device 110, it is possible to perform extended memory operations across multiple non-overlapping portions of the computing device memory resources.
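When the devices have differently sized memory resources, the owning device can be derived from cumulative sizes instead of a stored table. The following is a minimal sketch under the assumption that the devices occupy consecutive, non-overlapping portions of the total address space starting at address 0. The function name and the sizes are illustrative.

```python
from bisect import bisect_right
from itertools import accumulate


def device_index(address: int, resource_sizes: list[int]) -> int:
    """Return the zero-based device whose portion of the space holds `address`."""
    # Exclusive end address of each device's portion, e.g. [1024, 3072, 3584].
    ends = list(accumulate(resource_sizes))
    if not 0 <= address < ends[-1]:
        raise ValueError("address is outside the total address space")
    # Count the portions that end at or before the address.
    return bisect_right(ends, address)


sizes = [1024, 2048, 512]  # hypothetical per-device memory resource sizes
owner = device_index(3000, sizes)
```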

[0068] Continuing with the above example, if no data is found at the requested address, then the third computing device 210-3 may request the data, as described in more detail in connection with Figure 2A herein, and perform the extended memory operation once the data is loaded into the address of the third computing device 210-3. In some embodiments, once the computing device (e.g., the third computing device 210-3 in this example) completes the extended memory operation, the host 102 may be notified and / or the result of the extended memory operation may be transmitted to the memory device 116 and / or the host 102.

[0069] In some embodiments, memory controller 112 may be configured to retrieve data blocks from memory devices 116-1, ..., 116-N coupled to device 104 in response to a request from a controller of device 104 or from host 102. Memory controller 112 may then cause the data blocks to be transferred to computing devices 110-1, ..., 110-N and / or the device controller. As these data blocks are transferred to computing device 110, neuromorphic component 118 may intercept the data and concurrently analyze it for the occurrence of patterns or events. Similarly, memory controller 112 may be configured to receive data blocks from computing device 110 and / or controller 115. Memory controller 112 may then cause the data blocks to be transferred to memory devices 116 coupled to device 104.
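The interception described above can be sketched as a pass-through stage that inspects each block on its way to the computing device and records which blocks should be tagged. The detector here is a simple byte-pattern check standing in for the neural network, and all names (`intercept`, `contains_marker`, the marker bytes) are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def intercept(blocks: Iterable[bytes],
              detector: Callable[[bytes], bool],
              tagged: List[int]) -> Iterator[bytes]:
    """Forward every block unchanged while recording indices of tagged blocks."""
    for index, block in enumerate(blocks):
        if detector(block):
            tagged.append(index)  # indication that this block was tagged
        yield block               # the data still reaches its destination


def contains_marker(block: bytes) -> bool:
    """Stand-in detector: flags blocks containing an assumed byte pattern."""
    return b"\xde\xad" in block


tagged_indices: List[int] = []
stream = [b"\x00\x01", b"\xde\xad\xbe", b"\x02"]
delivered = list(intercept(stream, contains_marker, tagged_indices))
```

The stage does not alter or delay the data it forwards; it only produces side information about which blocks matched.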

[0070] The size of the data block may be approximately 4 kilobytes (but embodiments are not limited to this specific size), and the data block may be streamed through computing devices 110-1, ..., 110-N in response to one or more commands generated by controller 115 and / or the host and transmitted via the second plurality of IFs 106. In some embodiments, the data block may be a 32-bit, 64-bit, 128-bit, or other data word or data block, and / or the data block may correspond to an operand for performing extended memory operations.

[0071] For example, as described in more detail in connection with Figure 2A herein, because computing device 110 can perform an extended memory operation (e.g., a process) on a second data block in response to the completion of an extended memory operation on a preceding data block, data blocks can be continuously streamed through computing device 110 while data blocks are processed by computing device 110. In some embodiments, data blocks can be processed through computing device 110 in a streaming manner without intervening commands from the controller and / or host 102. That is, in some embodiments, controller 115 (or host 102) can issue commands to cause computing device 110 to process data blocks received therefrom, and data blocks subsequently received by computing device 110 can be processed without additional commands from the controller.

[0072] In some embodiments, processing a data block may include performing extended memory operations using the data block. For example, computing devices 110-1, ..., 110-N may perform extended memory operations on the data block in response to commands from the controller via the second plurality of IFs 106 to evaluate one or more functions, remove unwanted data, extract relevant data, or otherwise use the data block in connection with the execution of extended memory operations.

[0073] In a non-limiting instance where data (e.g., data intended for performing extended memory operations) is mapped to one or more of computing devices 110, the controller may transmit a command to computing device 110 to initiate the execution of an extended memory operation using the data mapped to computing device 110. In some embodiments, the controller 115 may look up an address (e.g., a physical address) corresponding to the data mapped to computing device 110 and determine, based on the address, which computing device (e.g., computing device 110-1) the address (and therefore the data) is mapped to. The command may then be transmitted to the computing device (e.g., computing device 110-1) containing the address (and therefore the data). In some embodiments, the command may be transmitted to the computing device (e.g., computing device 110-1) via the second plurality of IFs 106.

[0074] The controller 115 (or host) may be further configured to send commands to the computing device 110 to allocate and / or deallocate resources available for use by the computing device 110 when performing extended memory operations using data blocks. In some embodiments, allocating and / or deallocating resources available for use by the computing device 110 may involve selectively enabling some computing devices 110 while selectively disabling others. For example, if fewer than the total number of computing devices 110 are required to process the data blocks, the controller 115 may send commands to the computing devices 110 intended for processing the data blocks so that only those computing devices 110 that are deemed suitable are able to process the data blocks.

[0075] In some embodiments, the controller 115 may be further configured to send commands to synchronize the execution of operations performed by the computing device 110, such as extended memory operations. For example, the controller 115 (and / or the host) may send commands to a first computing device 110-1 to cause the first computing device 110-1 to perform a first extended memory operation, and the controller 115 (or the host) may send commands to a second computing device 110-2 to cause the second computing device to perform a second extended memory operation. Synchronizing the execution of operations performed by the computing device 110, such as extended memory operations, via the controller 115 may further include causing the computing device 110 to perform specific operations at a specific time or in a specific order.

[0076] As described above, data generated by performing an extended memory operation may be stored in the computing device 110 at the original address where the data was stored before the extended memory operation was performed. However, in some embodiments, the data block generated by performing the extended memory operation may be converted into a logical record after the extended memory operation is performed. A logical record may comprise a data record that is independent of its physical location. For example, a logical record may be a data record pointing to an address (e.g., location) in at least one of the computing devices 110 that stores physical data corresponding to the execution of the extended memory operation.

[0077] In some embodiments, the result of an extended memory operation may be stored in the same address in the computing device memory as the address where the data was stored before the extended memory operation was performed. However, embodiments are not limited to this, and the result of the extended memory operation may be stored in an address in the computing device memory that is different from the address where the data was stored before the extended memory operation was performed. In some embodiments, logical records may point to these address locations such that the result of the extended memory operation can be accessed from the computing device 110 and transferred to a circuit system outside the computing device 110 (e.g., to a host).
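A logical record of the kind described above can be sketched as a small immutable reference that names a device and an address, so a consumer can fetch a result without knowing where it physically resides. The class `LogicalRecord`, the `resolve` helper, and the dictionary layout are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LogicalRecord:
    """Points at a result without embedding the physical data itself."""
    device_id: str
    address: int


def resolve(record: LogicalRecord, devices: Dict[str, Dict[int, float]]) -> float:
    """Follow the record to the device memory that holds the result."""
    return devices[record.device_id][record.address]


# Hypothetical device memories keyed by device label, then by address.
devices = {"110-1": {4096: 1026.5}, "110-2": {}}
record = LogicalRecord(device_id="110-1", address=4096)
value = resolve(record, devices)
```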

[0078] In some embodiments, controller 115 may receive data blocks directly from memory controller 112 and / or send data blocks to the memory controller. This allows controller 115 to transfer data blocks not processed by computing device 110 (e.g., data blocks not used during the execution of extended memory operations) to and from memory controller 112.

[0079] For example, if controller 115 receives an unprocessed data block from host 102 coupled to device 104 that is to be stored by memory device 116 coupled to device 104, then controller 115 may cause the unprocessed data block to be transferred to memory controller 112, which may in turn cause the unprocessed data block to be transferred to memory device 116 coupled to device 104.

[0080] Similarly, if the host requests an unprocessed (e.g., complete) data block (e.g., a data block not processed by computing device 110), then memory controller 112 may cause the unprocessed data block to be transferred to controller 115, which may then transfer the unprocessed data block to the host.

[0081] Figure 2A is a functional block diagram of a computing system 200 comprising a device 204 according to several embodiments of the present disclosure, the device including a first plurality of IFs 208, a second plurality of IFs 206, a neuromorphic component 218, and a plurality of memory devices 216. As used herein, "device" may refer to, but is not limited to, any one or a combination of various structures, such as a circuit or circuit system, one or more dies, one or more modules, one or more devices, or one or more systems. In the embodiment illustrated in Figure 2A, memory devices 216-1, ..., 216-N may include one or more memory modules (e.g., double data rate (DDR) memory, three-dimensional (3D) cross-point memory, NAND memory, single in-line memory module, dual in-line memory module, etc.). Memory devices 216-1, ..., 216-N may include volatile memory and / or non-volatile memory. In several embodiments, memory devices 216-1, ..., 216-N may include multi-chip devices. Multi-chip devices may include several different memory types and / or memory modules. For example, a memory system may contain non-volatile or volatile memory on any type of module.

[0082] Memory devices 216-1, ..., 216-N may provide main memory for computing system 200 or may be used as additional memory or storage devices throughout computing system 200. Each memory device 216-1, ..., 216-N may include one or more arrays of memory cells, such as volatile and / or non-volatile memory cells. For example, the array may be a flash array with a NAND architecture. Embodiments are not limited to a specific type of memory device. For example, memory devices may include RAM, ROM, DRAM, SDRAM, PCRAM, RRAM, and flash memory, etc.

[0083] In embodiments where memory devices 216-1, ..., 216-N include non-volatile memory, memory devices 216-1, ..., 216-N may be flash memory devices such as NAND or NOR flash memory devices. However, embodiments are not limited thereto, and memory devices 216-1, ..., 216-N may include other non-volatile memory devices such as non-volatile random access memory devices (e.g., NVRAM, ReRAM, FeRAM, MRAM, PCM), "emerging" memory devices such as 3-D crosspoint (3D XP) memory devices, or combinations thereof. A 3D XP array of non-volatile memory can perform bit storage based on changes in bulk resistance, in conjunction with a stackable cross-gridded data access array. Furthermore, in contrast to many flash-based memories, 3D XP non-volatile memory allows for write-in-place operations, where non-volatile memory cells can be programmed without being erased beforehand.

[0084] As illustrated in Figure 2A, host 202 may include host controller 201. Host 202 may communicate with a first IF A 208-1 of a first plurality of IFs 208 via channels 203 / 205. The first IF A 208-1 may be a PCIe interface. The first IF A 208-1 may be coupled to a first IF B 208-2 of the first plurality of IFs 208. The first IF B 208-2 may be a PCIe XDMA interface. The first IF B 208-2 may be coupled to a first IF C 208-3 of the first plurality of IFs 208. The first IF C 208-3 may be coupled to each of a plurality of computing devices 210.

[0085] Furthermore, the first IF B 208-2 may be coupled to a first IF D 208-4 among the first plurality of IFs 208. The first IF D 208-4 may be a message passing interface (MPI). For example, the host 202 may send a message that is received and held by the first IF D 208-4 until the computing device 210 or an additional interface retrieves the message to determine subsequent actions. Possible subsequent actions may include performing a specific function on a particular computing device 210, setting a reset vector for the external interface 231, or reading / modifying a location in SRAM 233. Alternatively, the computing device 210 may write a message that is received by the first IF D 208-4 for access by the host 202. The host controller 201 may read the message from the first IF D 208-4 and transfer data to or from a location in the device (e.g., SRAM 233, registers or memory devices 216 in the computing device 210, and / or host memory (e.g., registers, cache memory, or main memory)). The first IF D 208-4 may also include a host register and / or a reset vector for controlling the selection of an external interface, such as external interface 231.

[0086] In at least one instance, the external interface 231 may be a JTAG interface 231 and the first IF D 208-4 may be used for JTAG selection. In some embodiments, the JTAG interface 231 (or some interface external to device 204) may be coupled to computing device 210. Additional data may be provided to device 204 from a device external to device 204 via the JTAG interface 231.

[0087] As illustrated in Figure 2A, multiple computing devices 210-1, 210-2, 210-3, 210-4, and 210-5 (hereinafter collectively referred to as multiple computing devices 210) may be coupled to SRAM 233. The multiple computing devices 210 may be coupled to SRAM 233 via a bus matrix. Furthermore, the multiple computing devices 210 may be coupled to additional multiple communication subsystems (e.g., multiplexers) 235-1, 235-2, and 235-3. The first multiple IFs 208 and / or the additional multiple communication subsystems 235 may include circuitry and / or logic configured to allocate and deallocate resources to computing devices 210 during the execution of the operations described herein. For example, the circuitry and / or logic may allocate and / or deallocate resources to computing devices 210 during the execution of the extended memory operations described herein. In one embodiment, SRAM 233 may be coupled to host 202 via a first IF C 208-3 (or via other IFs such as first IF A 208-2 and first IF D 208-4, connections not shown for ease of illustration). In this way, host 202 can provide instructions to perform specific operations (e.g., search, sort, etc.) via SRAM 233.

[0088] In addition, as illustrated in Figure 2A, multiple computing devices 210 may (via SRAM 233) each be coupled to additional communication subsystems (e.g., multiplexers) 235-1, 235-2, and 235-3. The additional communication subsystems 235 may include circuitry and / or logic configured to allocate and deallocate resources to computing devices 210 during the execution of the operations described herein. For example, the circuitry and / or logic may allocate and / or deallocate resources to computing devices 210 during the execution of the extended memory operations described herein. While the examples described above include an SRAM coupled to each of the computing devices (as in Figure 2A), embodiments are not so limited, and the cache memory (such as SRAM) may be located in multiple locations, such as outside of device 204, inside device 204, etc.

[0089] Additional communication subsystems 235 may be coupled to a second plurality of IFs 206 (e.g., interfaces such as interconnect interfaces), namely second IF A 206-1, second IF B 206-2, and second IF C 206-3. Each of the second plurality of IFs 206 may be coupled to a corresponding one of the controller 212, accelerator 214, neuromorphic component 218, SRAM 217, and peripheral component 221. In one example, the second plurality of IFs 206 may be coupled to the corresponding controller 212, accelerator 214, neuromorphic component 218, SRAM 217, and / or peripheral component 221 via several AXI buses.

[0090] As described, a second IF A (206-1) of the second plurality of IFs 206 may be coupled to a controller (e.g., a memory controller) 212. The controller 212 may be coupled to several memory devices 216-1, ..., 216-N via several channels 207-1, ..., 207-N. A second IF B (206-2) of the second plurality of IFs 206 may be coupled to an accelerator 214, a neuromorphic component 218, and an SRAM 217. The accelerator 214 may be coupled to a logic circuit system 213. The logic circuit system 213 may be on the same field-programmable gate array (FPGA) as the computing device 210, the first plurality of IFs 208, the second plurality of IFs 206, etc. The logic circuit system 213 may include an on-chip accelerator for performing several positional operations and / or for communicating with the internal SRAM 217 on the FPGA. A second IF C (206-3) of the second plurality of IFs 206 may be used to transfer data off-chip via a peripheral component 221.

[0091] In some embodiments, a first plurality of AXI buses can use a faster AXI bus transmission speed than a second plurality of AXI buses. The first plurality of AXI buses couple the first IF C 208-3 to the plurality of computing devices 210, couple the plurality of computing devices 210 to the additional plurality of communication subsystems 235, and couple the second plurality of IFs 206 to the controller 212, the accelerator 214, the neuromorphic component 218, the SRAM 217, or the peripheral component 221. The second plurality of AXI buses couple the first IF B 208-2 to the first IF C 208-3 and to the first IF D 208-4. As an example, the first plurality of AXI buses may have transmission rates in the range of 50 to 150 MHz (e.g., 100 MHz), and the second plurality of AXI buses may have transmission rates in the range of 150 to 275 MHz (e.g., 250 MHz). A third AXI bus may couple the first IF C 208-3 to the second IF A 206-1 and may have a faster transmission rate than the first or second plurality of AXI buses. As an example, the third AXI bus can have a transmission rate in the range of 250 to 350 MHz (e.g., 300 MHz).

[0092] Figure 2B is a functional block diagram of device 204, which includes a first IF B 208-2, computing devices 210-1 and 210-2, and a neuromorphic component 218. The first IF B 208-2 may be similar to the first IF 108 in Figure 1A, the first IF 108 in Figure 1B, and the first IF B 208-2 in Figure 2A. The first IF B 208-2 can be a PCIe XDMA interface. The first IF B 208-2 can be coupled to a host, for example the host 202 in Figure 2A. The first IF B 208-2 may be coupled to each of computing devices 210-1, 210-2 and to a second plurality of IFs 206. Communication between the host and the first IF B 208-2 may include commands to perform operations on data in a memory device (e.g., the memory device 216 in Figure 2A).

[0093] Computing device 210 may include multiple sub-cores 219. As an example, computing device 210-1 may include multiple sub-cores 219-1, 219-2, 219-3, and 219-4. Similarly, computing device 210-2 may include multiple sub-cores 219-5, 219-6, 219-7, and 219-8. Each of the sub-cores 219 may include an MMU, a PMP, and / or a cache memory, as will be further described below in connection with Figure 4.

[0094] Each of the sub-cores 219 can perform several extended memory operations, as described above. In some embodiments, a sub-core 219 can perform at least a portion of an operation and cooperate with additional sub-cores 219 to complete the operation. For example, a first sub-core 219-1 of computing device 210-1 can perform a first portion of an operation, and a second sub-core 219-5 of computing device 210-2 can perform a second portion of the operation. In some embodiments, computing device 210 can receive a command to perform an operation, where the command contains an address location within a memory device at which to access data. In this way, the host can more efficiently perform operations on multiple portions of data by simultaneously using different sub-cores 219 of the computing devices 210 to process data.

[0095] For communication between the first sub-core 219-1 and the second sub-core 219-5, messages and / or commands can be transmitted to a mailbox component (“MB”) 223 that can be periodically accessed by each of the sub-cores 219. In this manner, the first sub-core 219-1 can transmit a message indicating that a first part of an operation is being executed by the first sub-core 219-1 to the mailbox component 223, and the second sub-core 219-5 can likewise transmit a message indicating that a second part is being executed by the second sub-core 219-5. In response to the first sub-core 219-1 completing the first part, the first sub-core 219-1 can transmit a message indicating that the first part is complete to the mailbox component 223. Similarly, in response to the second sub-core 219-5 completing the second part, the second sub-core 219-5 can transmit a message indicating that the second part is complete to the mailbox component 223. The first sub-core 219-1 can retrieve the result of executing the second part and combine it with the result of the first part in the first computing device 210-1. In another instance, the first sub-core 219-1 can send the results of the first part to an additional computing device (e.g., computing device 210-3), and the additional computing device can retrieve the results of the second part and combine the results to complete the operation. Furthermore, the results can be stored separately in a memory device (e.g., the memory device 216 in Figure 2A) for subsequent retrieval and processing.
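
As a non-limiting illustration, the mailbox exchange described above can be sketched in Python. The sketch is sequential rather than concurrent, the operation being split (a sum) is chosen only for illustration, and the names `Mailbox`, `post`, `poll`, and `run_split_sum` as well as the message fields are assumptions that do not appear in the disclosure.

```python
from collections import deque


class Mailbox:
    """Hypothetical stand-in for mailbox component 223."""

    def __init__(self):
        self._messages = deque()

    def post(self, sender, kind, payload=None):
        # A sub-core leaves a message for the other sub-cores to find.
        self._messages.append({"sender": sender, "kind": kind, "payload": payload})

    def poll(self, kind):
        # A sub-core periodically reads all messages of one kind, oldest first.
        return [m for m in self._messages if m["kind"] == kind]


def run_split_sum(data, mailbox):
    """Split a sum across two sub-cores and combine the parts via the mailbox."""
    middle = len(data) // 2
    parts = {"219-1": data[:middle], "219-5": data[middle:]}
    for subcore in parts:
        mailbox.post(subcore, "in_progress")           # each part is announced
    for subcore, chunk in parts.items():
        mailbox.post(subcore, "complete", sum(chunk))  # each part reports its result
    # The first sub-core retrieves the completed parts and combines them.
    return sum(m["payload"] for m in mailbox.poll("complete"))


mailbox = Mailbox()
total = run_split_sum([1, 2, 3, 4, 5, 6], mailbox)
```

In this sketch the mailbox is the only shared state, so neither sub-core needs to address the other directly.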

[0096] In some embodiments, neuromorphic component 218 may send messages to mailbox component 223 via the second plurality of IFs 206. The messages may contain indications that data intercepted by neuromorphic component 218 has been flagged because at least a portion of the data contains a pattern or indicates an event. Sub-core 219 may periodically access mailbox component 223 and receive messages from neuromorphic component 218. Messages may be sent from computing device 210 to the host via the first IF B 208-2.

[0097] Figure 3 is a functional block diagram of a computing system 300 including device 304, according to several embodiments of the present disclosure, wherein the device includes a third plurality of IFs 306 and a plurality of memory devices 316. As used herein, "device" may refer to, but is not limited to, any one or a combination of various structures, such as a circuit or circuit system, one or more dies, one or more modules, one or more devices, or one or more systems. In the embodiment illustrated in Figure 3, memory devices 316-1, ..., 316-N may include one or more memory modules (e.g., double data rate (DDR) memory, three-dimensional (3D) cross-point memory, NAND memory, single in-line memory module, dual in-line memory module, etc.). Memory devices 316-1, ..., 316-N may include volatile memory and / or non-volatile memory. In several embodiments, memory devices 316-1, ..., 316-N may include multi-chip devices. Multi-chip devices may include several different memory types and / or memory modules. For example, a memory system may contain non-volatile or volatile memory on any type of module.

[0098] As illustrated in Figure 3, device 304 may include a computing device 310 (e.g., a computing core). In some embodiments, device 304 may be an FPGA. In contrast to Figures 1A to 1B and 2A, each port of computing device 310 can be directly coupled to a third plurality of IFs 306 (as an example, without coupling via another set of IFs, such as the first IF 108 and the first plurality of IFs 208, which may be multiplexers). Computing device 310 can be coupled to the third plurality of IFs 306 via corresponding port connections, which include a memory port (“MemPort”) 311-1, a system port (“SystemPort”) 311-2, a peripheral port (“PeriphPort”) 311-3, and a front port (“FrontPort”) 311-4.

[0099] Memory port 311-1 is directly coupled to third IF A 306-1, which is specifically designated to receive data from memory port 311-1 and transmit the data to memory controller 312. System port 311-2 is directly coupled to third IF B 306-2, which is specifically designated to receive data from system port 311-2 and transmit the data to neuromorphic component 318. Peripheral port 311-3 is directly coupled to third IF C 306-3, which is specifically designated to receive data from peripheral port 311-3 and transmit the data to serial port 325. Front port 311-4 is directly coupled to third IF D 306-4, which is specifically designated to receive data from front port 311-4 and transmit the data to host interface 320, and subsequently to host 302 via channels 303 and / or 305. In this embodiment, a multiplexer may not be used between the ports and the communication subsystems; instead, each port may be directly connected to its communication subsystem for data transmission.
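
As a non-limiting illustration, the one-to-one coupling between ports and interfaces described above can be written as a fixed lookup table in Python. The reference numerals are taken from the description; the names `PORT_TO_IF` and `route` are assumptions introduced for this sketch.

```python
# Each port has exactly one dedicated interface and one destination,
# so no multiplexer (selection step) is needed between them.
PORT_TO_IF = {
    "MemPort 311-1": ("IF A 306-1", "memory controller 312"),
    "SystemPort 311-2": ("IF B 306-2", "neuromorphic component 318"),
    "PeriphPort 311-3": ("IF C 306-3", "serial port 325"),
    "FrontPort 311-4": ("IF D 306-4", "host interface 320"),
}


def route(port):
    """Return the fixed path taken by data leaving the given port."""
    interface, destination = PORT_TO_IF[port]
    return [port, interface, destination]


system_path = route("SystemPort 311-2")
```

The table makes the design choice visible: the path is determined entirely by the port, with no run-time arbitration.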

[0100] In some embodiments, the third plurality of IFs 306 may facilitate visibility between corresponding address spaces of computing device 310. For example, computing device 310 may store data in its memory resources in response to receiving data and / or files. The computing device may associate addresses (e.g., physical addresses) corresponding to locations in its memory resources where the data is stored. Furthermore, computing device 310 may parse (e.g., partition) the addresses associated with the data into logical blocks.

[0101] In some embodiments, the zeroth logical block associated with the data may be transferred to a processing device (e.g., a Reduced Instruction Set Computing (RISC) device). A specific computing device (e.g., one of computing devices 110, 210, 310, such as computing device 210-2) may be configured to identify a specific set of logical addresses that it can access, while other computing devices (e.g., computing devices 210-3, 210-4, etc.) may be configured to identify different sets of logical addresses that those computing devices can access. In other words, a first computing device (e.g., computing device 210-2) may be able to access a first set of logical addresses associated with that computing device (210-2), and a second computing device (e.g., computing device 210-3) may be able to access a second set of logical addresses associated with it, and so on.

[0102] If a request for data corresponding to a second set of logical addresses (e.g., logical addresses accessible by the second computing device 210-3) is made at a first computing device (e.g., computing device 210-2), then the third plurality of IFs 306 can facilitate communication between the first computing device (e.g., computing device 210-2) and the second computing device (e.g., computing device 210-3) to allow the first computing device (e.g., computing device 210-2) to access the data corresponding to the second set of logical addresses (e.g., a set of logical addresses accessible by the second computing device 210-3). In other words, the communication subsystem can facilitate communication between computing device 310 (e.g., 210-1) and additional computing devices (e.g., computing devices 210-2, 210-3, 210-4) to make the address spaces of the computing devices visible to each other.

[0103] In some embodiments, communication between computing devices 110, 210, and 310 to facilitate address visibility may include receiving a message requesting access to data corresponding to a second set of logical addresses via an event queue of a first computing device (e.g., computing device 210-1), loading the requested data into the memory resources of the first computing device, and transmitting the requested data to a message buffer. Once the data has been buffered by the message buffer, the data can be transmitted to a second computing device (e.g., computing device 210-2) via a communication subsystem.
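
As a non-limiting illustration, the address-visibility flow described above can be sketched in Python: a request for an address owned by another computing device is placed on the owner's event queue, the owner stages the data in a message buffer, and the communication subsystem delivers it to the requester. The class and method names, the use of integer ranges for the logical address sets, and the example contents are assumptions introduced for this sketch.

```python
from collections import deque


class ComputingDevice:
    """Hypothetical device that can access only its own set of logical addresses."""

    def __init__(self, name, logical_range, contents):
        self.name = name
        self.logical_range = logical_range
        self.memory = dict(contents)
        self.event_queue = deque()
        self.message_buffer = deque()

    def owns(self, address):
        return address in self.logical_range

    def service_events(self):
        # Load each requested item and stage it in the message buffer.
        while self.event_queue:
            address = self.event_queue.popleft()
            self.message_buffer.append((address, self.memory[address]))


class CommunicationSubsystem:
    """Hypothetical interconnect that makes the address spaces mutually visible."""

    def __init__(self, devices):
        self.devices = devices

    def read(self, requester, address):
        if requester.owns(address):
            return requester.memory[address]
        owner = next(d for d in self.devices if d.owns(address))
        owner.event_queue.append(address)   # request arrives via the event queue
        owner.service_events()
        _, value = owner.message_buffer.popleft()
        return value                        # buffered data is sent to the requester


device_a = ComputingDevice("210-2", range(0, 100), {5: "a"})
device_b = ComputingDevice("210-3", range(100, 200), {150: "b"})
subsystem = CommunicationSubsystem([device_a, device_b])
remote_value = subsystem.read(device_a, 150)
```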

[0104] For example, during the execution of an extended memory operation, controllers 115, 215, 315 and / or a first computing device (e.g., computing device 210-1) may determine that the address specified by a host command (e.g., a command generated by a host, such as host 102 illustrated in FIG. 1, to initiate the execution of the extended memory operation) corresponds to a location in the memory resources of a second computing device (e.g., computing device 210-2) among a plurality of computing devices (110, 210). In this case, a computing device command may be generated and sent from controllers 115, 215, 315 and / or the first computing device (210-1) to the second computing device (210-2) to initiate the extended memory operation at the address specified by the computing device command using operands stored in the memory resources of the second computing device (210-2).

[0105] In response to receiving the computing device command, the second computing device (210-2) can perform the extended memory operation at the address specified by the computing device command using operands stored in the memory resources of the second computing device (210-2). This reduces command traffic between the host and the memory controller and / or computing devices (210, 310) because the host does not need to generate additional commands to perform the extended memory operation, which can improve the overall performance of the computing system, for example, by reducing the time associated with transmitting commands to and from the host.
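
As a non-limiting illustration, the forwarding of a host command to the computing device that holds the operand can be sketched in Python. The dictionary layout, the `dispatch` function, the `traffic` log, and the squaring operation are assumptions introduced for this sketch; the point shown is only that the host issues a single command regardless of which device holds the operand.

```python
def dispatch(host_command, devices, traffic):
    """Apply host_command at whichever device holds the target address."""
    address = host_command["address"]
    traffic.append("host command")        # the only command the host issues
    first = devices[0]                    # device that receives the host command
    target = first
    if address not in first["memory"]:
        # The address belongs to another device: forward a computing device command.
        target = next(d for d in devices if address in d["memory"])
        traffic.append("computing device command")
    # The target device uses the operand already stored in its own memory.
    operand = target["memory"][address]
    target["memory"][address] = host_command["operation"](operand)
    return target["name"]


devices = [
    {"name": "210-1", "memory": {0x10: 4}},
    {"name": "210-2", "memory": {0x20: 7}},
]
traffic = []
handled_by = dispatch({"address": 0x20, "operation": lambda x: x * x}, devices, traffic)
```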

[0106] In some embodiments, controllers 115, 215, and 315 may determine that performing an extended memory operation may involve performing multiple sub-operations. For example, an extended memory operation may be parsed or divided into two or more sub-operations, which may be performed as part of performing the overall extended memory operation. In this case, controllers 115, 215, and 315 and / or the first IF 108, the first plurality of IFs 208, and the second plurality of IFs 106 and 206 may utilize the address visibility described above to facilitate the execution of the sub-operations by the various computing devices 110, 210, and 310. In response to completion of the sub-operations, controllers 115, 215, and 315 may cause the results of the sub-operations to be merged into a single result corresponding to the result of the extended memory operation.

[0107] In other embodiments, an application requesting data stored in computing devices 110, 210, 310 can know which computing devices 110, 210, 310 contain the requested data (e.g., may have corresponding information). In this example, the application may request data from the relevant computing devices 110, 210, 310, and / or the address may be loaded into multiple computing devices 110, 210, 310 and accessed by the application requesting data via a first IF 108, a first plurality of IFs 208, and a second plurality of IFs 106, 206.

[0108] Controllers 115, 215, and 315 may be discrete circuit systems physically separate from the first IF 108, the first plurality of IFs 208, and the second plurality of IFs 106 and 206, and may each be provided as one or more integrated circuits allowing communication between computing devices 110, 210, 310, memory controllers 112, 212, 312, and / or controllers 115, 215, and 315. Non-limiting examples of the first IF 108, the first plurality of IFs 208, and the second plurality of IFs 106 and 206 may include XBAR or other communication subsystems that allow interconnection and / or interoperability between controllers 115, 215, 315, computing devices 110, 210, 310, and / or memory controllers 112, 212, 312.

[0109] As described above, in response to receiving commands generated by controllers 115, 215, 315, first IF 108, first plurality of IF 208, second plurality of IF 106, 206 and / or host (e.g., host 102 illustrated in FIG. 1), extended memory operations can be performed using data stored in computing devices 110, 210, 310 and / or from data blocks streamed via computing devices 110, 210, 310.

[0110] Figure 4 is a functional block diagram of a computing core 410 comprising several ports 411-1, 411-2, 411-3, and 411-4, according to several embodiments of the present disclosure. The computing core 410 may include a memory management unit (MMU) 420, a physical memory protection (PMP) unit 422, and a cache memory 424.

[0111] The MMU 420 refers to a computer hardware component used for memory and cache operations associated with the processor. The MMU 420 may be responsible for memory management and integrated into the processor, or in some instances, it may reside on a separate integrated circuit (IC) chip. The MMU 420 can be used for hardware memory management, which may include monitoring and regulating the processor's use of random access memory (RAM) and cache memory. The MMU 420 can be used for operating system (OS) memory management, ensuring sufficient memory resources are available for the objects and data structures of each running program. The MMU 420 can be used for application memory management, allocating the required or used memory for each individual program and then reclaiming the freed memory space when operations are complete or space becomes available.

[0112] In one embodiment, PMP unit 422 can be used to protect physical memory by restricting memory access and isolating processes from each other. PMP unit 422 can be used to set memory access permissions (read, write, execute) for specified memory regions. PMP unit 422 can support eight regions with a minimum region size of 4 bytes. In some instances, PMP unit 422 can be programmed only in a privilege mode called machine mode (or M mode). PMP unit 422 can enforce permissions on U mode accesses. However, locked regions can additionally enforce their permissions on M mode. Cache memory 424 can be SRAM cache memory, 3D crosspoint cache memory, etc. Cache memory 424 can contain 8 KB, 16 KB, 32 KB, etc., and may include error correction coding (ECC).
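
As a non-limiting illustration, the permission behavior described above can be modeled in Python. The limits of eight regions and a 4-byte minimum size, and the treatment of U mode and locked regions, follow the description. The rule that the first matching region decides, and the rule that an access matching no region succeeds in M mode and fails in U mode, are assumptions of this sketch modeled on common RISC-V PMP behavior, and the names `Region`, `PMP`, and `allows` are hypothetical.

```python
from dataclasses import dataclass

MAX_REGIONS = 8
MIN_REGION_BYTES = 4


@dataclass(frozen=True)
class Region:
    base: int
    size: int
    perms: str            # any subset of "rwx"
    locked: bool = False  # locked regions also bind M mode

    def contains(self, address):
        return self.base <= address < self.base + self.size


class PMP:
    def __init__(self, regions):
        if len(regions) > MAX_REGIONS:
            raise ValueError("at most eight regions are supported")
        if any(region.size < MIN_REGION_BYTES for region in regions):
            raise ValueError("a region must cover at least 4 bytes")
        self.regions = list(regions)

    def allows(self, mode, address, access):
        """mode is "M" or "U"; access is one of "r", "w", "x"."""
        for region in self.regions:
            if region.contains(address):   # assumption: first match decides
                if mode == "M" and not region.locked:
                    return True            # unlocked regions do not bind M mode
                return access in region.perms
        return mode == "M"                 # assumption: no match passes only in M mode


pmp = PMP([
    Region(base=0x1000, size=0x100, perms="r"),
    Region(base=0x2000, size=0x100, perms="rw", locked=True),
])
```

For example, `pmp.allows("U", 0x1000, "w")` is rejected because the first region is read-only, while the same write in M mode is permitted because that region is not locked.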

[0113] In one embodiment, the computing core 410 may further include multiple ports, including a memory port 411-1, a system port 411-2, a peripheral device port 411-3, and a front-end port 411-4. The memory port 411-1 can be directly coupled to a communication subsystem (as illustrated in Figure 3) that is specifically designated to receive data from memory port 411-1. System port 411-2 is directly coupled to a communication subsystem that is specifically designated to receive data from system port 411-2. Data via system port 411-2 can be transmitted to an accelerator (e.g., an on-chip accelerator). Peripheral port 411-3 is directly coupled to a communication subsystem that is specifically designated to receive data from peripheral port 411-3, and this data can ultimately be transmitted to the serial port. Front port 411-4 is directly coupled to a communication subsystem that is specifically designated to receive data from front port 411-4, and this data can ultimately be transmitted to the host interface and subsequently to the host.

[0114] The computing core 410 may be a cache-coherent 64-bit RISC-V processor with full Linux capability. In some instances, memory port 411-1, system port 411-2, and peripheral port 411-3 may be outgoing ports, and front-end port 411-4 may be an incoming port. Examples of computing core 410 may include the U54-MC computing core. The computing core 410 may include an instruction memory system, an instruction fetch unit, an execution pipeline unit, a data memory system, and support for global, software, and timer interrupts. The instruction memory system may include a 16 KiB 2-way set-associative instruction cache. The access latency for all blocks in the instruction memory system may be one clock cycle. The instruction cache may not remain coherent with the rest of the platform memory system. Writes to the instruction memory may be synchronized with the instruction fetch stream by executing the FENCE.I instruction. The instruction cache may have a line size of 64 bytes, and cache line fills may trigger burst accesses outside the computing core 410.
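
As a worked example, the geometry implied by a 16 KiB, 2-way set-associative instruction cache with 64-byte lines can be computed in Python. The capacity, associativity, and line size come from the description; the split of an address into tag, set index, and line offset is the conventional one and is an assumption of this sketch, as are the names used.

```python
CACHE_BYTES = 16 * 1024   # 16 KiB capacity
WAYS = 2                  # 2-way set-associative
LINE_BYTES = 64           # 64-byte cache lines

LINES = CACHE_BYTES // LINE_BYTES          # 256 lines in total
SETS = LINES // WAYS                       # 128 sets of 2 lines each
OFFSET_BITS = LINE_BYTES.bit_length() - 1  # 6 bits select a byte within a line
INDEX_BITS = SETS.bit_length() - 1         # 7 bits select a set


def decompose(address):
    """Split an address into (tag, set index, line offset)."""
    offset = address & (LINE_BYTES - 1)
    index = (address >> OFFSET_BITS) & (SETS - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = decompose(0x80001234)
```

Under these assumptions, the address 0x80001234 falls in set 0x48 at byte offset 0x34 of its line, with tag 0x40000.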

[0115] The instruction fetch unit may include branch prediction hardware to improve processor core performance. The branch predictor may include a 28-entry Branch Target Buffer (BTB) that predicts the targets of taken branches; a 512-entry Branch History Table (BHT) that predicts the direction of conditional branches; and a 6-entry Return Address Stack (RAS) that predicts the targets of procedure returns. The branch predictor may have a one-cycle latency so that correctly predicted control flow instructions do not incur penalties. Incorrectly predicted control flow instructions can incur a three-cycle penalty.

[0116] The execution pipeline unit can be a single-issue, in-order pipeline. The pipeline can contain five stages: instruction fetch, instruction decode and register fetch, execution, data memory access, and register write-back. The pipeline can have a peak execution rate of one instruction per clock cycle and can be fully bypassed, allowing most instructions to have a one-cycle result latency. The pipeline can interlock on read-after-write and write-after-write hazards, so instructions can be scheduled to avoid stalls.
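
As a worked example, the peak rate of one instruction per clock cycle and the three-cycle misprediction penalty can be combined into a first-order estimate of cycles per instruction. The formula counts only misprediction stalls and ignores cache misses and interlocks; the workload figures (20% control flow instructions, 10% of them mispredicted) and the function name are assumptions chosen only for illustration.

```python
PEAK_CPI = 1.0           # one instruction per clock cycle at peak
MISPREDICT_PENALTY = 3   # cycles lost per mispredicted control flow instruction


def effective_cpi(branch_fraction, mispredict_rate):
    """First-order cycles per instruction, counting only misprediction stalls."""
    return PEAK_CPI + branch_fraction * mispredict_rate * MISPREDICT_PENALTY


# Hypothetical workload: 20% control flow instructions, 10% of them mispredicted.
cpi = effective_cpi(branch_fraction=0.20, mispredict_rate=0.10)
```

With these hypothetical figures the estimate is about 1.06 cycles per instruction, that is, roughly a 6% slowdown relative to the peak rate.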

[0117] In one embodiment, the data memory system may include a DTIM interface that can support up to 8 KiB. The access latency from a core to its own DTIM can be two clock cycles for full words and three clock cycles for smaller quantities. Memory requests from one core to any other core's DTIM may not be as performant as memory requests from a core to its own DTIM. Misaligned accesses are not supported in hardware and can result in a trap to allow software emulation.

[0118] In some embodiments, the computing core 410 may include a floating-point unit (FPU) that provides full hardware support for the IEEE 754-2008 floating-point standard for 32-bit single-precision and 64-bit double-precision arithmetic. The FPU may include a fully pipelined fused multiply-add unit, an iterative divide and square-root unit, a magnitude comparator, and a floating-point to integer conversion unit, with full hardware support for subnormal values and IEEE default values.

[0119] Figure 5 is a flowchart illustrating an example method 528 corresponding to an extended memory architecture according to several embodiments of the present disclosure. At block 530, method 528 may include receiving, at at least one of a plurality of computing devices and via a first communication subsystem, a command from a host. The command indicates a portion of data to be accessed in a non-volatile memory device to perform an operation. As an example, the command may be sent from the host to the computing device and may instruct the computing device to perform an extended memory operation on said portion of the data. The portion of the data will be accessed before the extended memory operation on said portion of the data is performed. The first communication subsystem may be coupled to the host. The transmission of the command may be in response to receiving a request to transmit a block of data to perform an operation associated with the command. In some embodiments, receiving a command to initiate execution of an operation may include receiving an address corresponding to a memory location in a particular computing device storing operands corresponding to the execution of the operation. For example, as described above, the address may be an address in a memory portion where data to be used as operands during the execution of the operation is stored. Alternatively, receiving the command may include receiving an address within a microcode component rather than storing microcode instructions.

[0120] At block 532, method 528 may include transmitting, from at least one of the plurality of computing devices and via a second communication subsystem, an indication of the portion of data to be accessed to the neuromorphic component. This indication may be sent to notify the neuromorphic component prior to actual access to the portion of the data. In this way, the neuromorphic component can be initiated and prepared to intercept data subsequently accessed by the computing devices. The second communication subsystem may be the second plurality of IFs 206 in Figure 2A or the third plurality of IFs 306 in Figure 3, and the memory device can be memory devices 216 and 316, as illustrated in Figures 2A and 3.

[0121] At block 534, method 528 may include determining, at the neuromorphic component, that the portion of data indicates a specific event. This determination may be performed by several neuromorphic operations, machine learning operations, etc. The neuromorphic or machine learning operations may be performed using a neural network of the neuromorphic component or an additional memory array configuration capable of performing such operations.

[0122] At block 536, method 528 may include writing one or more bits at a location, within the portion of data, that indicates the specific event. In some instances, method 528 may further include transmitting the one or more bits written at the location within the portion of data from the neuromorphic component, via the second communication subsystem, to the at least one of the plurality of computing devices. In some embodiments, method 528 may further include transmitting the location within the portion of data from the at least one of the plurality of computing devices, via the first communication subsystem, to the host. The method may further include performing, at the host, additional operations on the portion of data at the marked location. The method may further include performing, at the at least one of the plurality of computing devices, additional operations on the portion of data at the marked location.
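As a non-limiting illustration, writing one or more bits at a location within the portion of data might be sketched in Python as follows. The fixed record size, the reserved flag bit in the first byte of each record, and the helper names `mark_record` and `marked_records` are hypothetical; the disclosure does not prescribe where or how the bits are laid out.

```python
from typing import List

RECORD_SIZE = 4     # hypothetical fixed record size in bytes
EVENT_FLAG = 0x01   # hypothetical bit reserved in the first byte of each record


def mark_record(data: bytearray, record_index: int) -> int:
    """Set the event flag bit of one record in place; return the byte offset written."""
    offset = record_index * RECORD_SIZE
    if offset < 0 or offset >= len(data):
        raise IndexError("record index is outside the portion of data")
    data[offset] |= EVENT_FLAG
    return offset


def marked_records(data: bytes) -> List[int]:
    """Return the indices of records whose event flag bit is set."""
    return [
        offset // RECORD_SIZE
        for offset in range(0, len(data), RECORD_SIZE)
        if data[offset] & EVENT_FLAG
    ]


portion = bytearray(12)               # three zeroed records
written_at = mark_record(portion, 1)  # mark the second record
```

A computing device or host receiving `written_at`, or scanning with `marked_records`, could then restrict additional operations to the marked location rather than processing the entire portion of data.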

[0123] Method 528 may further include transferring a data block associated with the command from the non-volatile memory device to the at least one of the plurality of computing devices via the second communication subsystem. The first communication subsystem may be coupled to the host and to the at least one of the plurality of computing devices. The second communication subsystem may be coupled to the at least one of the plurality of computing devices and to the memory device. Method 528 may further include performing, by the at least one of the plurality of computing devices and in response to receiving the command and the data block, an operation using the data block to reduce the data size from a first size to a second size. Method 528 may further include transferring the reduced-size data block to the host via the first communication subsystem. The reduced-size data block may be transferred to the host via a PCIe interface coupled to the first communication subsystem. Method 528 may further include using a memory controller to transfer the reduced-size data block to the memory device.
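As a non-limiting illustration, an operation that reduces a data block from a first size to a second size might be sketched in Python as a comma-separated value filter. The function `reduce_block`, the column names, and the sample data are hypothetical; the sketch uses only the Python standard library `csv` and `io` modules and assumes the data block holds UTF-8 encoded CSV text.

```python
import csv
import io


def reduce_block(block: bytes, keep_column: str, keep_value: str) -> bytes:
    """Parse a CSV data block and keep only rows whose column equals keep_value."""
    reader = csv.DictReader(io.StringIO(block.decode("utf-8")))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[keep_column] == keep_value:
            writer.writerow(row)  # rows that do not match are discarded
    return out.getvalue().encode("utf-8")


block = b"id,status\n1,ok\n2,error\n3,ok\n4,error\n"  # first size: 36 bytes
reduced = reduce_block(block, "status", "error")       # second size: 26 bytes
```

Only the reduced block would then need to cross the first communication subsystem to the host, which is the source of the transfer savings described above.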

[0124] Although specific embodiments have been illustrated and described herein, those skilled in the art will understand that an arrangement calculated to achieve the same results may be substituted for the specific embodiments shown. This disclosure is intended to cover modifications or variations of one or more embodiments of this disclosure. It should be understood that the above description has been made in an illustrative rather than a restrictive manner. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those skilled in the art upon reviewing the above description. The scope of one or more embodiments of this disclosure includes other applications in which the above structures and processes are used. Therefore, the scope of one or more embodiments of this disclosure should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

[0125] In the foregoing detailed embodiments, some features are grouped together in a single embodiment for the purpose of streamlining this disclosure. This manner of disclosure should not be construed as reflecting an intention that the disclosed embodiments must use more features than are expressly recited in each claim. Rather, as reflected in the appended claims, the subject matter of the invention lies in less than all the features of a single disclosed embodiment. Therefore, the appended claims are hereby incorporated into the detailed embodiments, with each claim standing on its own as a separate embodiment.

Claims

1. An extended memory device, comprising:
  a plurality of computing devices, each comprising:
    a processing unit configured to perform operations on data blocks; and
    a memory array configured as a cache memory for the corresponding processing unit;
  a communication subsystem coupled to at least one of the plurality of computing devices and coupled to a neuromorphic component; and
  the neuromorphic component;
  wherein the at least one of the plurality of computing devices is configured to:
    receive:
      a request from a host to perform an operation; and
      an indication of data to be accessed in a memory device to perform the operation; and
    send the indication to the neuromorphic component to monitor the data to be accessed in the memory device; and
  wherein the neuromorphic component is configured to:
    intercept the data simultaneously with the at least one of the plurality of computing devices accessing the data; and
    identify a portion of the data as indicating that a specific event has occurred, and mark that portion of the data.

2. The device according to claim 1, wherein: the neuromorphic component is configured to mark the portion of the data by causing one or more bits to be written within the portion of the data at locations indicating that the specific event has occurred; and an indication that the portion of the data is marked is sent to the at least one of the plurality of computing devices.

3. The device of claim 2, wherein the at least one of the plurality of computing devices is configured to send the indication to the host in response to receiving the indication that the portion of the data is marked.

4. The device of claim 3, wherein the at least one of the plurality of computing devices is configured to perform a specific operation on the portion of the data in response to receiving the indication that the portion of the data is marked.

5. The device according to any one of claims 1 to 4, wherein the neuromorphic component is configured to analyze the data to detect specific patterns within the data in order to mark the portion of the data.

6. The device according to any one of claims 1 to 4, wherein: the neuromorphic component is configured to perform machine learning operations on the data; and the neuromorphic component is configured to: receive training data before intercepting the data; and analyze the intercepted data based on the received training data.

7. An extended memory device, comprising:
  a plurality of computing devices, each comprising:
    a processing unit configured to perform operations on data blocks; and
    a memory array configured as a cache memory for the corresponding processing unit;
  a first communication subsystem coupled to a host and to each of the plurality of computing devices;
  a second communication subsystem coupled to each of the plurality of computing devices; and
  a neuromorphic component coupled to the second communication subsystem;
  wherein at least one of the plurality of computing devices is configured to:
    receive a request from the host to perform an operation on a portion of data stored in a memory device;
    send, to the neuromorphic component via the second communication subsystem, an indication to intercept the portion of data when the portion of data is accessed by the at least one of the plurality of computing devices; and
    access the portion of the data; and
  wherein the neuromorphic component is configured to:
    intercept the portion of data simultaneously with the portion of data being accessed by the at least one of the plurality of computing devices;
    read the portion of the data;
    analyze the portion of the data;
    determine, based on the analysis, that a specific event has occurred; and
    based on the determination, write one or more bits into the portion of the data at a particular location.

8. The device of claim 7, wherein the first communication subsystem is a Peripheral Component Interconnect Express (PCIe) interface.

9. The device according to claim 7, wherein: the communication subsystem is coupled to a memory controller, and the memory controller is coupled to the memory device; and the memory controller is a DDR4 memory controller.

10. The device according to claim 7, wherein: the second communication subsystem is coupled to the neuromorphic component via a first interconnect controlled by the second communication subsystem; and the neuromorphic component is coupled to the second communication subsystem via a second interconnect controlled by the neuromorphic component.

11. The device of claim 7, wherein the memory device comprises at least one of: double data rate (DDR) memory, three-dimensional (3D) cross-point memory, NAND memory, or any combination thereof.

12. The device of claim 7, wherein the processing unit of each of the plurality of computing devices is configured with a reduced instruction set architecture.

13. The device according to any one of claims 7 to 12, wherein the operation performed on the data block includes operations in which at least some of the data are sorted, reordered, removed or discarded, comma-separated value parsing operations, or both.

14. The device according to any one of claims 7 to 12, wherein each of the plurality of computing devices is configured as a Reduced Instruction Set Computer (RISC)-V device.

15. The device according to any one of claims 7 to 12, wherein the plurality of computing devices, the first communication subsystem, the second communication subsystem, and the neuromorphic component are configured on a field-programmable gate array (FPGA) and the memory device is external to the FPGA.

16. The device according to any one of claims 7 to 12, wherein the neuromorphic component is coupled to the host via the first communication subsystem and the second communication subsystem without passing through the plurality of computing devices.

17. An extended memory method, comprising:
  receiving, at at least one of a plurality of computing devices and via a first communication subsystem, a command from a host, wherein the command indicates a portion of data to be accessed in a non-volatile memory device to perform an operation;
  transmitting, from the at least one of the plurality of computing devices and via a second communication subsystem, an indication of the portion of data to be accessed to a neuromorphic component;
  determining, at the neuromorphic component, that the portion of data indicates a specific event; and
  writing one or more bits at a location, within the portion of the data, that indicates the specific event;
  wherein the neuromorphic component is configured to intercept the portion of data simultaneously with the portion of data being accessed by the at least one of the plurality of computing devices.

18. The method of claim 17, further comprising transmitting the location within the portion of the data from the neuromorphic component and via the second communication subsystem to at least one of the plurality of computing devices.

19. The method according to any one of claims 17 to 18, further comprising:
  transferring a data block associated with the command from the non-volatile memory device to the at least one of the plurality of computing devices via the second communication subsystem, wherein:
    the first communication subsystem is coupled to the host and to the at least one of the plurality of computing devices; and
    the second communication subsystem is coupled to the at least one of the plurality of computing devices and to the non-volatile memory device;
  performing, by the at least one of the plurality of computing devices and in response to receiving the command and the data block, an operation using the data block to reduce the data size from a first size to a second size; and
  transferring the reduced-size data block to the host via the first communication subsystem.