High-density in-memory computing device, neural network accelerator, and electronic device
By using a high-density in-memory computing device based on read-only memory, multi-bit storage and computation are realized, solving the data migration problem caused by the low storage density of SRAM, improving the area efficiency and energy efficiency of in-memory computing, and making it suitable for neural network operations.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TSINGHUA UNIVERSITY
- Filing Date
- 2023-03-09
- Publication Date
- 2026-06-19
AI Technical Summary
Due to the low storage density of SRAM, existing in-memory computing devices cannot fully deploy large-scale neural networks on the chip, resulting in frequent off-chip memory reads, which generate a lot of data movement energy consumption and latency.
A high-density in-memory computing device based on read-only memory is adopted. Through the design of multiple computing modules and selection devices, each computing module realizes multi-bit storage. The control module selects and controls word lines and data selection control lines to perform target operations, thereby realizing high-density storage and computing.
It improves the area efficiency of in-memory computing, reduces or eliminates the need for off-chip memory access, reduces data movement energy consumption and latency, is suitable for neural network operations, and reduces energy consumption and latency during inference operations.
Figure CN116246669B_ABST
Description
Technical Field
[0001] This disclosure relates to the field of integrated circuit technology, and in particular to a high-density in-memory computing device, a neural network accelerator, and an electronic device.
Background Technology
[0002] With the widespread application of technologies such as artificial intelligence and the Internet of Things across various industries, the demand for computing power and storage in AI applications is increasing daily. The traditional von Neumann architecture has physically separate storage and computing units, which requires data to be frequently moved between memory and processor during computation, resulting in significant energy consumption and latency in data movement.
[0003] To improve the overall throughput and energy efficiency of the system, an in-memory computing architecture is proposed to eliminate the "memory wall" bottleneck in the von Neumann architecture. By integrating storage units with computing units, in-memory computing performs the required computations within the storage units, reducing the additional overhead during data migration and giving it great potential for deploying machine learning models on edge devices.
[0004] Although in-memory computing follows similar computational principles across different memory devices, implementations differ with the memory itself, for example Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Flash memory, Resistive Random Access Memory (ReRAM), and other emerging memory devices. SRAM, with its mature manufacturing process, high endurance, flexible read / write capability, and fast read / write speed, is currently the dominant choice for in-memory computing. However, the limited bit-line voltage swing during computation in a 6T SRAM cell leads to read-disturb issues when 6T SRAM is used for in-memory computing. Therefore, to improve performance, some SRAM-based in-memory computing circuits use cells with more transistors, such as 8T SRAM, 10T SRAM, and 12T SRAM.
[0005] To enable the widespread adoption of artificial intelligence applications in edge devices, the area efficiency of the storage cells in an in-memory computing device is a key design consideration. However, because of the low storage density of SRAM, large-scale neural networks cannot be fully deployed on chip even when the smallest cell, the 6T SRAM cell, is used. Weight values must therefore be read frequently from off-chip memory during computation, resulting in significant data movement energy consumption and latency. Further improving the area efficiency of in-memory computing devices has thus become a core optimization objective.
Summary of the Invention
[0006] In view of this, the present disclosure proposes a high-density in-memory computing device based on a read-only memory device, the device comprising:
[0007] Multiple computing modules are provided, each including at least one read-only memory device, multiple selection devices, an excitation source, a storage state data line, a control word line, a computing bit line, and a data selection control line. The control terminal of the read-only memory device is connected to the corresponding control word line to receive control signals. The two data terminals of the read-only memory device are respectively connected to different storage state data lines to realize data storage. The storage state data lines are connected to the excitation source and the computing bit line through the selection devices. The data state of each storage state data line is characterized by current or voltage level according to the type of the excitation source. The control terminal of the selection device is connected to the corresponding data selection control line to receive data selection control signals, which are used to select the corresponding selection device.
[0008] The control module is connected to the control word lines and data selection control lines of each computing module. It is used to select the corresponding read-only memory device and selection device for target operation through the control word lines and data selection control lines, and output the result data through the computing bit lines.
[0009] In one possible implementation, the selection device includes a data selection device, which comprises an upper-level data selection device and a lower-level data selection device. The upper-level data selection device includes a first upper-level data selection device and a second upper-level data selection device. The lower-level data selection device includes a first lower-level data selection device and a second lower-level data selection device. The data selection control line includes a first data selection control line and a second data selection control line. The storage state data line includes a first storage state data line and a second storage state data line.
[0010] The input terminals of both the first lower-level data selection device and the second lower-level data selection device are connected to the excitation source.
[0011] The output terminal of the first lower-level data selection device is connected to the input terminal of the second upper-level data selection device through the first storage state data line.
[0012] The output terminal of the second lower-level data selection device is connected to the input terminal of the first upper-level data selection device through the second storage state data line.
[0013] The control terminals of the first lower-level data selection device and the first upper-level data selection device are both connected to the first data selection control line.
[0014] The control terminals of the second lower-level data selection device and the second upper-level data selection device are both connected to the second data selection control line.
[0015] The output terminals of the first and second upper-level data selection devices are connected to the computing bit line.
[0016] The first data terminal and the second data terminal of each read-only memory device are respectively connected to the corresponding first storage state data line and the corresponding second storage state data line.
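The wiring listed in this implementation can be checked with a small switch-level sketch. It is an illustration only, not the disclosed circuit: every device is treated as an ideal switch, the node names `SRC`, `SL1`, `SL2`, and `CBL` are hypothetical labels for the excitation source, the two storage state data lines, and the computing bit line, and the `rom_conducts` flag is an assumed stand-in for whether the read-only memory device provides a conducting channel when its word line is asserted.

```python
from collections import deque


def source_reaches_bit_line(first_ctrl, second_ctrl, word_line, rom_conducts=True):
    """Return True if closed switches link the excitation source to the bit line.

    All node names and the rom_conducts flag are hypothetical illustration aids.
    """
    # One (node_a, node_b, closed) entry per switch named in the text.
    switches = [
        ("SRC", "SL1", first_ctrl),    # first lower-level data selection device
        ("SRC", "SL2", second_ctrl),   # second lower-level data selection device
        ("SL2", "CBL", first_ctrl),    # first upper-level data selection device
        ("SL1", "CBL", second_ctrl),   # second upper-level data selection device
        ("SL1", "SL2", word_line and rom_conducts),  # read-only memory device
    ]

    # Build an undirected graph from the closed switches only.
    adjacency = {}
    for node_a, node_b, closed in switches:
        if closed:
            adjacency.setdefault(node_a, set()).add(node_b)
            adjacency.setdefault(node_b, set()).add(node_a)

    # Breadth-first search from the source towards the computing bit line.
    seen = {"SRC"}
    queue = deque(["SRC"])
    while queue:
        node = queue.popleft()
        if node == "CBL":
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False
```

In this model, when exactly one data selection control line is asserted, the only route from the source to the bit line runs through the read-only memory device, so the word line gates the result.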
[0017] In one possible implementation, the selection device further includes column selection devices, the control terminals of each column selection device are connected to a control bit line for receiving column selection signals, the input terminals of the series assembly formed by each column selection device and the lower-level data selection device are connected to the excitation source, and the output terminals of the series assembly formed by each column selection device and the lower-level data selection device are connected to the corresponding storage state data line.
[0018] In one possible implementation, the excitation source includes multiple current sources or multiple voltage sources. If the excitation source is multiple voltage sources, the computing module further includes a reset switch and a capacitor. The control terminal of the reset switch is connected to a reset control word line for receiving a reset signal. The reset terminal of the reset switch is connected to a reset level line. The reset detection terminal of the reset switch and the first terminal of the capacitor are connected to the output terminal of the upper-level data selection device. The second terminal of the capacitor is connected to the computing bit line.
[0019] In one possible implementation, the read-only memory devices in the computing module are arranged in a layout of several rows and several columns. The control terminal of each row of read-only memory devices is connected to the same control word line. The two data terminals of each read-only memory device in each column are respectively connected to the corresponding first storage state data line and the second storage state data line. The output terminal of each upper-level data selection device is connected to the corresponding computing bit line through a capacitor or directly, and the result data is output through the computing bit line.
[0020] In one possible implementation, the first storage state data line connecting each ROM device in the (K+1)th column is the second storage state data line connecting each ROM device in the Kth column, where K is an integer.
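The line sharing between adjacent columns can be sketched as a simple index mapping. This is an illustrative reading only: it assumes 0-based column indices and that every adjacent pair of columns shares a line, and both function names are hypothetical.

```python
from typing import Tuple


def storage_lines_for_column(k: int) -> Tuple[int, int]:
    """Return (first_line, second_line) indices for column k (0-based, assumed)."""
    if k < 0:
        raise ValueError("column index must be non-negative")
    # The second line of column k doubles as the first line of column k + 1.
    return (k, k + 1)


def total_storage_lines(num_columns: int) -> int:
    """Number of distinct storage state data lines needed under full sharing."""
    if num_columns < 0:
        raise ValueError("number of columns must be non-negative")
    return num_columns + 1 if num_columns > 0 else 0
```

Under these assumptions, N columns need N + 1 storage state data lines rather than 2N.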
[0021] In one possible implementation, the selection device includes a data selection device and a column selection device. The data selection device includes an upper-level data selection device and a lower-level data selection device. The upper-level data selection device includes a first upper-level data selection device and a second upper-level data selection device. The lower-level data selection device includes a first lower-level data selection device and a second lower-level data selection device. The data selection control line includes a first data selection control line and a second data selection control line. The storage state data line includes a first storage state data line and a second storage state data line.
[0022] The input terminals of the first lower-level data selection device and the second lower-level data selection device are connected to the excitation source.
[0023] The output terminal of the first lower-level data selection device is connected to the input terminal of the second upper-level data selection device, and is also connected to the input terminal of the corresponding column selection device.
[0024] The output terminal of the second lower-level data selection device is connected to the input terminal of the first upper-level data selection device, and is also connected to the input terminal of another column selection device. The output terminal of each column selection device is connected to the corresponding storage state data line.
[0025] The output terminals of the first and second upper-level data selection devices are connected to the computing bit line.
[0026] The control terminals of the first lower-level data selection device and the first upper-level data selection device are both connected to the first data selection control line.
[0027] The control terminals of the second lower-level data selection device and the second upper-level data selection device are both connected to the second data selection control line.
[0028] The first data terminal and the second data terminal of each read-only memory device are respectively connected to the corresponding first storage state data line and the corresponding second storage state data line.
[0029] In one possible implementation, the read-only memory devices in the computing module are arranged in a layout of several rows and several columns. The control terminal of each row of read-only memory devices is connected to the same control word line. The two data terminals of each read-only memory device in each column are respectively connected to the corresponding first storage state data line and the second storage state data line. The output terminal of each upper-level data selection device is connected to the corresponding computing bit line through a capacitor or directly, and the result data is output through the computing bit line.
[0030] In one possible implementation, the first storage state data line connected to each read-only memory device in the (K+1)th column is the second storage state data line connected to each read-only memory device in the Kth column, where K is an integer.
[0031] The multiple column selection devices corresponding to the columns of read-only memory devices share the same set of data selection devices. The output terminals of each first upper-level data selection device and each second upper-level data selection device in this set are connected to the same computing bit line, either through capacitors or directly.
[0032] In one possible implementation, multiple computing modules are electrically connected to form a layout of several rows and several columns, wherein the control word lines of computing modules in the same row are electrically connected, the control bit lines of computing modules in the same column are electrically connected, and the computing bit lines of computing modules in the same column are electrically connected.
[0033] In one possible implementation, the target operation includes a multiplication operation, a data read operation, and a multiply-accumulate operation. For a multiplication operation, the result data is the product of the stored data corresponding to the read-only memory device at the corresponding time and the control word line input data; for a data read operation, the result data is the stored data corresponding to the read-only memory device at the corresponding time.
[0034] The result data includes a multiply-accumulate result, and the control module is further used for:
[0035] Controlling one or more columns of computing modules to perform multiply-accumulate operations, and / or controlling some or all of the computing modules connected to the same computing bit line to perform multiply-accumulate operations;
[0036] Inputting a set of data through the control word lines and selecting the computing modules that participate in the multiply-accumulate operation. Within each computing module, the read-only memory device that participates in the computation is selected using the control word line, and the stored data that participates in the computation is selected using the data selection control line. Each computing module completes the multiplication of its input data by its stored data. The products obtained from the computing modules are accumulated on the computing bit line and output as the multiply-accumulate result.
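The multiplication and multiply-accumulate operations described above can be pictured with an idealized numeric model. This sketch ignores the analog current-domain or charge-domain behavior entirely; the function names and the `active` mask (an assumed way to represent which computing modules are selected to participate) are hypothetical.

```python
from typing import Optional, Sequence


def multiply(stored_value: int, word_line_input: int) -> int:
    """Product of one module's selected stored data and its word-line input."""
    return stored_value * word_line_input


def multiply_accumulate(
    word_line_inputs: Sequence[int],
    stored_values: Sequence[int],
    active: Optional[Sequence[bool]] = None,
) -> int:
    """Sum the per-module products, as if accumulated on one computing bit line."""
    if len(word_line_inputs) != len(stored_values):
        raise ValueError("inputs and stored values must have the same length")
    if active is None:
        # Assumption: by default every module on the bit line participates.
        active = [True] * len(stored_values)
    if len(active) != len(stored_values):
        raise ValueError("active mask must match the number of modules")
    return sum(
        multiply(weight, x)
        for x, weight, enabled in zip(word_line_inputs, stored_values, active)
        if enabled
    )
```

For example, inputs `[1, 0, 1]` against stored values `[3, 2, 1]` accumulate to 4, and masking out the last module leaves 3.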
[0037] In one possible implementation, the control module is further configured to:
[0038] Adjusting the signal timing of the control word lines and control bit lines to enable pipelined operation between different computing bit lines.
[0039] In one possible implementation, the control module is further configured to:
[0040] Each computing module is controlled to enter either a working mode or an idle mode, and in the working mode, each computing module performs the target operation.
[0041] According to another aspect of this disclosure, a neural network accelerator is provided, the neural network accelerator including a high-density in-memory computing device based on a read-only memory device as described above.
[0042] According to another aspect of this disclosure, an electronic device is provided, the electronic device including the high-density in-memory computing device based on the read-only memory device, or including the neural network accelerator.
[0043] The high-density in-memory computing device based on read-only memory devices in this disclosure realizes multi-bit storage in a single read-only memory device, so that each computing module can store and compute on multi-bit data. This raises the storage density of the device and the area efficiency of in-memory computing, thereby reducing or even eliminating the device's accesses to off-chip memory.
[0044] Other features and aspects of this disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Description of the Drawings
[0045] The accompanying drawings, which are included in and form part of this specification, illustrate exemplary embodiments, features, and aspects of this disclosure together with the specification and serve to explain the principles of this disclosure.
[0046] Figure 1 shows a schematic diagram of a high-density in-memory computing device based on a read-only memory device according to an embodiment of the present disclosure.
[0047] Figure 2 shows a schematic diagram of the circuit symbol of a metal-oxide-semiconductor field-effect transistor according to an embodiment of the present disclosure. Figure 3 shows a schematic diagram of a typical relationship between drain-source current and gate-source voltage in a metal-oxide-semiconductor field-effect transistor according to an embodiment of the present disclosure.
[0048] Figures 4a and 4b show schematic diagrams of the computing module in the current domain according to an embodiment of the present disclosure.
[0049] Figures 5a and 5b show schematic diagrams of the computing module in the charge domain according to an embodiment of the present disclosure.
[0050] Figures 6a and 6b show schematic diagrams of a first way of multiplexing the selection devices of a computing module in the current domain and the charge domain according to an embodiment of the present disclosure.
[0051] Figures 7a and 7b show schematic diagrams of a second way of multiplexing the selection devices of a computing module in the current domain and the charge domain according to an embodiment of the present disclosure.
[0052] Figure 8 shows a schematic diagram of the structure of an in-memory computing device according to an embodiment of the present disclosure.
[0053] Figures 9a and 9b show schematic diagrams of a computing module of an in-memory computing device performing a multiplication operation in the current domain according to an embodiment of the present disclosure.
[0054] Figures 10a and 10b show schematic diagrams of a computing module of an in-memory computing device performing a multiplication operation in the charge domain according to an embodiment of the present disclosure.
[0055] Figures 11a and 11b show schematic diagrams of a computing module of an in-memory computing device performing a multiply-accumulate operation in the current domain according to an embodiment of the present disclosure.
[0056] Figures 12a and 12b show schematic diagrams of a computing module of an in-memory computing device performing a multiply-accumulate operation in the charge domain according to an embodiment of the present disclosure.
[0057] Figures 13a and 13b show schematic diagrams of a computing module of an in-memory computing device set to idle mode in the current domain and the charge domain according to an embodiment of the present disclosure.
[0058] Figures 14a, 14b, 14c, and 14d show schematic diagrams of pipelined operation of an in-memory computing device according to an embodiment of the present disclosure.
[0059] Figure 15 shows a schematic diagram of a neural network accelerator according to an embodiment of the present disclosure.
Detailed Implementation
[0060] Various exemplary embodiments, features, and aspects of this disclosure will now be described in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote elements that have the same or similar functions. Although various aspects of the embodiments are shown in the drawings, they are not necessarily drawn to scale unless specifically indicated otherwise.
[0061] In the description of this disclosure, it should be understood that the terms "length", "width", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings, and are only for the convenience of describing this disclosure and simplifying the description, and are not intended to indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of this disclosure.
[0062] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this disclosure, "a plurality of" means two or more, unless otherwise expressly specified.
[0063] In this disclosure, unless otherwise expressly specified and limited, the terms "installation," "connection," "linking," "fixing," etc., should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral part; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; they can refer to the internal communication of two components or the interaction between two components. Those skilled in the art can understand the specific meaning of the above terms in this disclosure according to the specific circumstances.
[0064] In this document, the term "and / or" merely describes an association between related objects and indicates that three relationships can exist. For example, A and / or B can represent three cases: A alone, both A and B, and B alone. Furthermore, the term "at least one" in this document means any one of a plurality of elements or any combination of at least two of them. For example, including at least one of A, B, and C can mean including any one or more elements selected from the set consisting of A, B, and C.
[0065] Referring to Figure 1, Figure 1 shows a schematic diagram of a high-density in-memory computing device based on a read-only memory device according to an embodiment of the present disclosure.
[0066] As shown in Figure 1, the device includes:
[0067] Multiple computing modules 10, each computing module 10 including at least one read-only memory device (Q1), multiple selection devices (such as Q11, Q12, Q21, Q22), an excitation source, storage state data lines, a control word line (WL<1>), a computing bit line (CBL), and data selection control lines (Ctrl, CtrlB), wherein: the control terminal of the read-only memory device (Q1) is connected to the corresponding control word line (WL<1>) to receive control signals. The two data terminals of the read-only memory device (Q1) are respectively connected to different storage state data lines to realize data storage. The storage state data lines are connected to the excitation source and the computing bit line CBL through the selection devices. The data state of each storage state data line is represented as a current or a voltage level according to the type of the excitation source. The control terminals of the selection devices (such as Q11, Q12, Q21, Q22) are connected to the corresponding data selection control lines to receive data selection control signals, which are used to select the corresponding selection devices.
[0068] The control module is connected to the control word lines and data selection control lines of each computing module 10. It is used to select the corresponding read-only memory device and selection device for target operation through the control word lines and data selection control lines, and output the result data through the computing bit lines.
[0069] The high-density in-memory computing device based on read-only memory devices in this disclosure realizes multi-bit storage in a single read-only memory device, so that each computing module can store and compute on multi-bit data. This raises the storage density of the device and the area efficiency of in-memory computing, thereby reducing or even eliminating the device's accesses to off-chip memory.
[0070] The high-density in-memory computing device based on read-only memory devices in this disclosure can be applied to neural network operations. Since each computing module has multi-bit data storage capability, it has the potential to store all parameters of a large-scale neural network on-chip, thereby reducing or even eliminating the additional power consumption and latency caused by data movement on and off-chip, reducing energy consumption and latency during inference operations, and enabling artificial intelligence algorithms to be efficiently deployed to edge or terminal devices. Of course, when the high-density in-memory computing device based on read-only memory devices is applied to other scenarios, it can also achieve the effect of reducing power consumption and latency due to its high-density storage characteristics.
[0071] This disclosure does not limit the specific implementation of the read-only memory (ROM) device. Those skilled in the art can choose an appropriate method to implement it according to the actual situation and needs. For example, the read-only memory device can be implemented by means of switches, diodes, bipolar transistors, metal-oxide-semiconductor field-effect transistors, etc.
[0072] This disclosure does not limit the specific implementation of the control module. Those skilled in the art can choose a suitable implementation method according to actual conditions and needs. In one example, the control module may include a processing component. For example, the processing component includes, but is not limited to, a separate processor, discrete components, or a combination of a processor and discrete components. The processor may include a controller in an electronic device that has the function of executing instructions. The processor can be implemented in any suitable manner, for example, by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components. Inside the processor, the executable instructions can be executed by hardware circuits such as logic gates, switches, ASICs, programmable logic controllers, and embedded microcontrollers.
[0073] This disclosure does not limit the number, type, or arrangement of read-only memory devices and selection devices in the in-memory computing device, nor does it limit the number of storage state data lines, control word lines, computing bit lines, or data selection control lines. It also does not limit the specific storage state data lines connected to the data terminals of the read-only memory devices, nor does it limit the representation form (voltage or current) of the data state, nor does it limit the type of excitation source. For example, the excitation source may include multiple voltage sources or multiple current sources. This disclosure does not limit the specific parameters of the current and voltage sources; those skilled in the art can set them according to actual conditions and needs.
[0074] This disclosure does not limit the arrangement of read-only memory devices (ROMs) in the computing module 10. For example, the storage array formed by the ROMs in the computing module 10 can have multiple rows and a single column: each ROM in the computing module 10 is controlled by an independent control word line and outputs its computation result through the same computing bit line. Alternatively, the storage array can have multiple columns and a single row: multiple ROMs in the computing module 10 are controlled by the same control word line and output their computation results through computing bit lines corresponding to different columns. The storage array can also have multiple rows and multiple columns: each row of ROMs in the computing module is controlled by an independent control word line and outputs its computation results through the same computing bit line, or through computing bit lines corresponding to different columns. This disclosure does not limit the arrangement of ROMs in the in-memory computing device; those skilled in the art can choose an appropriate method according to actual conditions and needs.
[0075] As described above, the in-memory computing device of the embodiments of this disclosure may include a plurality of computing modules 10, each computing module 10 may include at least one read-only memory device, and each read-only memory device may store multiple bits of data. Compared with other related in-memory computing technologies, the embodiments of this disclosure can significantly improve the on-chip storage density of the processor.
[0076] This disclosure does not limit the specific implementation of read-only memory devices and selection devices. Those skilled in the art can choose appropriate devices to implement them according to actual conditions and needs. For example, this disclosure can use metal-oxide-semiconductor field-effect transistors (MOSFETs) to implement the switching of read-only memory devices, selection devices, etc.
[0077] Referring to Figures 2 and 3, Figure 2 shows a schematic diagram of the circuit symbol of a metal-oxide-semiconductor field-effect transistor according to an embodiment of the present disclosure, and Figure 3 shows a schematic diagram of a typical relationship between drain-source current and gate-source voltage in a metal-oxide-semiconductor field-effect transistor according to an embodiment of the present disclosure.
[0078] As shown in Figure 2, a metal-oxide-semiconductor field-effect transistor (MOSFET) is a three-port device whose impedance characteristics between the drain and source are controlled by the gate potential. The MOSFET has the switching characteristics shown in Figure 3.
[0079] The embodiments of this disclosure can use metal-oxide-semiconductor field-effect transistors to implement read-only memory devices, data select devices, column select devices, and reset switches. The impedance characteristics between the drain and source of the device are used to form a switch, and the gate of the device is used as the control terminal, so that the device can realize the functions of information storage and switching.
[0080] The embodiments disclosed herein utilize metal-oxide-semiconductor field-effect transistors (MOSFETs), which are characterized by highly mature technology and wide application. Taking an N-type MOSFET as an example, a single MOSFET can change its impedance characteristics by controlling the potential difference between the gate voltage and the source voltage. When the drain and source are in a low-resistance state, the circuit is connected; when the drain and source are in a high-resistance state, the circuit is disconnected.
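The switching behavior described above can be illustrated with an idealized model. The following is a minimal sketch, assuming an N-type device and a hypothetical threshold voltage V_TH of 0.4 V; the threshold value, the supply levels, and the function name are illustrative assumptions and do not come from this disclosure.

```python
V_TH = 0.4  # hypothetical threshold voltage in volts (assumption)
VDD, VSS = 1.0, 0.0  # illustrative supply levels (assumption)


def nmos_conducts(v_gate: float, v_source: float, v_th: float = V_TH) -> bool:
    """Idealized N-type MOSFET used as a switch.

    Returns True (low-resistance state, circuit connected) when the
    gate-source potential difference exceeds the threshold, and False
    (high-resistance state, circuit disconnected) otherwise.
    """
    return (v_gate - v_source) > v_th


if __name__ == "__main__":
    print(nmos_conducts(VDD, VSS))  # gate driven high: path connected
    print(nmos_conducts(VSS, VSS))  # gate driven low: path disconnected
```

A real device follows the continuous current-voltage curve of Figure 3; the hard threshold here only captures the on/off abstraction used in the rest of the description.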
[0081] It should be noted that, in the embodiments of this disclosure, the metal-oxide-semiconductor field-effect transistor is only an example, and all devices with switching characteristics can theoretically be used to build the in-memory computing device proposed in the embodiments of this disclosure.
[0082] The following provides an example of how each computing module can be implemented.
[0083] In one possible implementation, as shown in Figure 1, the selection device may include data selection devices (such as Q11, Q12, Q21, Q22), which include upper-level data selection devices (such as Q11, Q12) and lower-level data selection devices (such as Q21, Q22). The upper-level data selection devices include a first upper-level data selection device Q11 and a second upper-level data selection device Q12. The lower-level data selection devices include a first lower-level data selection device Q21 and a second lower-level data selection device Q22. The data selection control lines include a first data selection control line CtrlB and a second data selection control line Ctrl. The storage status data lines include a first storage status data line and a second storage status data line.
[0084] The input terminals of the first lower-level data selection device Q21 and the second lower-level data selection device Q22 are both connected to the excitation source.
[0085] The output of the first lower-level data selection device Q21 is connected to the input of the second upper-level data selection device Q12 via the first storage status data line.
[0086] The output of the second lower-level data selection device Q22 is connected to the input of the first upper-level data selection device Q11 via the second storage status data line.
[0087] The control terminals of the first lower-level data selection device Q21 and the first upper-level data selection device Q11 are both connected to the first data selection control line CtrlB.
[0088] The control terminals of the second lower-level data selection device Q22 and the second upper-level data selection device Q12 are both connected to the second data selection control line Ctrl.
[0089] The output terminals of the first upper-level data selection device Q11 and the second upper-level data selection device Q12 are connected to the calculation bit line.
[0090] The first data terminal and the second data terminal of each read-only memory device (Q1) are respectively connected to the corresponding first storage state data line and the corresponding second storage state data line.
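The cross-coupled connections listed in paragraphs [0084] to [0090] imply that the signal passes through the read-only memory device in opposite directions depending on which data selection control line is high. The following is a minimal sketch of that path tracing. It assumes the read-only memory device is switched on by its word line, and the node labels (such as "first_line") are hypothetical names introduced only for illustration.

```python
from typing import List


def conduction_path(ctrl: bool, ctrlb: bool) -> List[str]:
    """Trace the nodes from the excitation source to the calculation bit line.

    ctrl  -- level of the second data selection control line Ctrl
    ctrlb -- level of the first data selection control line CtrlB
    Returns an empty list when no path conducts.
    """
    if ctrl == ctrlb:
        # Both low: no data selection device conducts.
        # Both high is not a working-mode state and is treated as invalid here.
        return []
    if ctrlb:
        # CtrlB high: Q21 (lower) and Q11 (upper) conduct.
        return ["excitation", "Q21", "first_line", "ROM",
                "second_line", "Q11", "CBL"]
    # Ctrl high: Q22 (lower) and Q12 (upper) conduct.
    return ["excitation", "Q22", "second_line", "ROM",
            "first_line", "Q12", "CBL"]


if __name__ == "__main__":
    print(" -> ".join(conduction_path(ctrl=True, ctrlb=False)))
    print(" -> ".join(conduction_path(ctrl=False, ctrlb=True)))
```

In this sketch the two working-mode paths visit the two storage status data lines in opposite order, which is how one device can expose the data associated with either of its data terminals.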
[0091] In one possible implementation, the selection device may further include column selection devices. The control terminal of each column selection device is connected to a control bit line for receiving a column selection signal. Each column selection device forms a series assembly with a lower-level data selection device; the input terminal of the series assembly is connected to the excitation source, and the output terminal of the series assembly is connected to the corresponding storage state data line. It should be understood that this is one embodiment of the present disclosure.
[0092] The following provides an exemplary description of various possible implementations of the computing module.
[0093] Refer to Figures 4a and 4b, which show schematic diagrams of the calculation module in the current domain according to an embodiment of the present disclosure.
[0094] In one possible implementation, as shown in Figure 4a, the excitation source can be a current source, and the current source module can include multiple current sources. Each calculation module can include two or more read-only memory devices and data selection devices. The data selection devices include lower-level data selection devices and upper-level data selection devices. The control terminal of the lower-level data selection device is connected to the corresponding data selection control line (CtrlB or Ctrl). The input terminal of the lower-level data selection device is connected to the current source in the current source module. The output terminal of the lower-level data selection device is connected to the data terminal of each read-only memory device connected to the same storage state data line. The control terminal of the upper-level data selection device is connected to the corresponding data selection control line (CtrlB or Ctrl), and the input terminal of the upper-level data selection device is connected to the data terminals of each read-only memory device connected to the same storage state data line. The output terminal of the upper-level data selection device is connected to the corresponding computing bit line. For example, in the working mode, the states of the first data selection control line CtrlB and the second data selection control line Ctrl are opposite to each other, that is, when one is high, the other is low. The control terminals of the read-only memory devices are respectively connected to the corresponding control word lines. In the in-memory computing device, each computing module can contain any number of read-only memory devices; this disclosure does not limit this.
[0095] For example, the number of data selection devices can be positively correlated with the number of bits stored in the read-only memory device; the more bits the read-only memory device stores, the more data selection devices are required. In the configuration illustrated in Figure 4a, each read-only memory device stores 4 bits of information, 2 bits on the left and 2 bits on the right. Therefore, in this example, four storage status data lines are used on each side. Each storage status data line has an upper-level data selection device and a lower-level data selection device. As shown in Figure 4a, the computing module has a total of 16 data selection devices, 8 on the left and 8 on the right.
[0096] It should be noted that the numbers of read-only memory devices and selection devices shown in Figure 4a and the other accompanying drawings are exemplary and should not be regarded as limitations on the embodiments of this disclosure. In practical applications, those skilled in the art can set a larger number of read-only memory devices to further improve storage density.
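The count of 16 data selection devices follows from simple arithmetic. The sketch below reproduces it under the assumption that each data terminal needs one storage status data line per storable state (2 bits give four states) and that each line carries one upper-level and one lower-level data selection device. The generalization to other bit widths is an extrapolation for illustration, not a statement from this disclosure.

```python
def data_selection_device_count(bits_per_terminal: int) -> int:
    """Number of data selection devices for one column of ROM devices.

    Assumes one storage status data line per storable state on each side,
    and one upper-level plus one lower-level device per line.
    """
    if bits_per_terminal < 1:
        raise ValueError("bits_per_terminal must be at least 1")
    lines_per_side = 2 ** bits_per_terminal  # e.g. 2 bits -> 4 lines
    sides = 2                                # left and right data terminals
    devices_per_line = 2                     # upper-level and lower-level
    return lines_per_side * sides * devices_per_line


if __name__ == "__main__":
    # The Figure 4a example: 2 bits on each side.
    print(data_selection_device_count(2))
```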
[0097] In one possible implementation, as shown in Figure 4b, the in-memory computing device may include multiple computing modules. Each computing module may include two or more read-only memory devices, data selection devices, and column selection devices. The data selection devices may include lower-level data selection devices and upper-level data selection devices. The read-only memory devices in the computing modules are arranged in a layout of several rows and several columns. The control terminals of the column selection devices connected to the read-only memory devices in the same column are connected to the same control bit line. The control terminals of the read-only memory devices in the same row are connected to the same control word line. The control terminal of the lower-level data selection device is connected to the corresponding data selection control line CtrlB. The column selection device and the lower-level data selection device are connected in series to form a series component. The input terminal of the series component is connected to a current source. The output terminal of the series component is connected to the data terminal of each read-only memory device connected to the same storage state data line. The control terminal of the upper-level data selection device is connected to the corresponding data selection control line Ctrl. The input terminal of the upper-level data selection device is connected to the data terminal of each read-only memory device connected to the same storage state data line. The output terminal of the upper-level data selection device is connected to the corresponding calculation bit line. In the working mode, the states of the data selection control lines Ctrl and CtrlB are opposite to each other, that is, when one is high, the other is low.
[0098] In one example, as shown in Figure 4b, the computing module may include read-only memory devices Q5 to Q8, data selection devices Qs17 to Qs48, and column selection devices Qs49 to Qs64. The control terminals of read-only memory devices Q5 and Q6 are both connected to the control word line WL<1>. The control terminals of read-only memory devices Q7 and Q8 are both connected to the control word line WL<2>. The control terminals of data selection devices Qs17-Qs20, Qs25-Qs28, Qs37-Qs40, and Qs45-Qs48 are all connected to the data selection control line Ctrl. The control terminals of data selection devices Qs21-Qs24, Qs29-Qs32, Qs33-Qs36, and Qs41-Qs44 are all connected to the data selection control line CtrlB. The control terminals of column selection devices Qs49-Qs56 are all connected to the control bit line BL<1>. The control terminals of column selection devices Qs57-Qs64 are all connected to the control bit line BL<2>.
[0099] It should be noted that the embodiments disclosed herein do not limit the specific connection method of the series components. For example, as shown in Figure 4b, the input terminal of the series component can be the input terminal of the column select device, and the output terminal of the series component can be the output terminal of the lower-level data select device. Specifically, the input terminal of the column select device is connected to the current source module, the output terminal of the lower-level data select device is connected to the storage status data line, and the output terminal of the column select device is connected to the input terminal of the lower-level data select device. Of course, the series component can also be in other series configurations (i.e., the positions of the column select device and the lower-level data select device can be interchanged). For example, the input terminal of the series component can be the input terminal of the lower-level data select device, and the output terminal of the series component can be the output terminal of the column select device. Specifically, the input terminal of the lower-level data select device is connected to the current source module, the output terminal of the column select device is connected to the storage status data line, and the input terminal of the column select device is connected to the output terminal of the lower-level data select device.
[0100] In one possible implementation, if the excitation source is multiple voltage sources, the calculation module further includes a reset switch and a capacitor. The control terminal of the reset switch is connected to the reset control word line for receiving a reset signal. The reset terminal of the reset switch is connected to a reset level line. The reset detection terminal of the reset switch and the first terminal of the capacitor are connected to the output terminal of the upper-level data selection device, and the second terminal of the capacitor is connected to the calculation bit line.
[0101] Refer to Figures 5a and 5b, which show schematic diagrams of the calculation module in the charge domain according to an embodiment of the present disclosure.
[0102] In one possible implementation, as shown in Figure 5a, each computing module may include two or more read-only memory devices, data selection devices, a reset switch Qf1, and a capacitor C1. The data selection devices include lower-level data selection devices and upper-level data selection devices. The control terminal of the lower-level data selection device is connected to the corresponding data selection control line, and the input terminal of the lower-level data selection device is connected to the power supply level. The output terminal of the lower-level data selection device is connected to the data terminal of each read-only memory device connected to the same storage state data line. The control terminal of the upper-level data selection device is connected to the corresponding data selection control line, and the input terminal of the upper-level data selection device is connected to the data terminal of each read-only memory device connected to the same storage state data line. The output terminal of the upper-level data selection device is connected to the corresponding computing bit line through a capacitor.
[0103] For example, as shown in Figure 5a, the reset switch Qf1 may include a reset control terminal, a reset detection terminal, and a reset terminal. The reset control terminal is connected to the control word line WL<1> and is used to adjust the impedance characteristics between the reset detection terminal and the reset terminal. The reset terminal is connected to a reset state level line to receive the reset state level Vpre. The reset detection terminal and the first end of capacitor C1 are connected to the data terminal of each read-only memory device through an upper-level data selection device. The second end of capacitor C1 is connected to the calculation bit line CBL, and the result data is output through the calculation bit line. In the working mode, the input states of the data selection control line Ctrl and the data selection control line CtrlB are opposite to each other, that is, when one is high, the other is low. The control terminal of each read-only memory device is connected to the corresponding control word line. In the in-memory computing device, each computing module can contain any number of read-only memory devices; this disclosure does not limit the number of read-only memory devices.
[0104] In one possible implementation, as shown in Figure 5b, the in-memory computing device may include multiple computing modules. Each computing module may include two or more read-only memory devices, data selection devices, column selection devices, a reset switch Qf2, and a capacitor C2. The data selection devices may include lower-level data selection devices and upper-level data selection devices. The read-only memory devices in the computing modules are arranged in a layout of several rows and several columns. The control terminals of the column selection devices connected to the read-only memory devices in the same column are connected to the same control bit line. The control terminals of the read-only memory devices in the same row are connected to the same control word line. The control terminals of the lower-level data selection devices are connected to the corresponding data selection control lines. The column selection devices and the lower-level data selection devices are connected in series to form a series assembly. The input terminal of the series assembly is connected to the power supply level. The output terminal of the series assembly is connected to the data terminals of the read-only memory devices connected to the same storage state data lines. The control terminal of the upper-level data selection device is connected to the corresponding data selection control lines. The input terminal of the upper-level data selection device is connected to the data terminals of the read-only memory devices connected to the same storage state data lines. The output terminal of the upper-level data selection device is connected to the corresponding computing bit line CBL through a capacitor.
[0105] For example, as shown in Figure 5b, the reset switch Qf2 includes a reset control terminal, a reset detection terminal, and a reset terminal. The reset control terminal is connected to the control word line WL<1> and is used to adjust the impedance characteristics between the reset detection terminal and the reset terminal. The reset terminal is connected to a reset state level line to receive the reset state level Vpre. The reset detection terminal and the first end of capacitor C2 are connected to the data terminals of each read-only memory device through an upper-level data selection device. The second end of capacitor C2 is connected to the calculation bit line CBL, and the result data is output through the calculation bit line. In the working mode, the states of the data selection control line Ctrl and the data selection control line CtrlB are opposite to each other, that is, when one is high, the other is low.
[0106] For example, as shown in Figure 5b, the computing module may include read-only memory devices Q9 to Q12, data selection devices Qc17 to Qc48, and column selection devices Qc49 to Qc64. The control terminals of read-only memory devices Q9 and Q10 are both connected to the control word line WL<2>. The control terminals of read-only memory devices Q11 and Q12 are both connected to the control word line WL<3>. Data selection devices Qc17-Qc20, Qc25-Qc28, Qc37-Qc40, and Qc45-Qc48 are all connected to the data selection control line Ctrl. Data selection devices Qc21-Qc24, Qc29-Qc32, Qc33-Qc36, and Qc41-Qc44 are all connected to the data selection control line CtrlB. Column selection devices Qc49-Qc56 are all connected to the control bit line BL<1>. Column selection devices Qc57-Qc64 are all connected to the control bit line BL<2>.
[0107] For example, each computing module can be connected to one or more power supplies of different levels to achieve multiple current sources or power supply levels with different operating values. In current domain computing mode, the current source can be connected to one or more read-only memory devices (ROMs), and in charge domain computing mode, the power supply level can be connected to one or more ROMs. This disclosure embodiment further leverages the high-density characteristics of ROMs by multiplexing current sources and power supply levels, thereby improving overall area efficiency.
[0108] For example, in the computing modules shown in Figures 4a, 4b, 5a, and 5b, each read-only memory (ROM) device can store 4 bits of information. Each data terminal of the ROM device is connected to one of the four storage state data lines (2 bits of data have four states) on its side, so that each data terminal of the ROM device stores 2 bits of information. This embodiment of the present disclosure does not limit the number of storage state data lines on each side of the data terminal of each ROM device in the computing module of the in-memory computing device. Those skilled in the art can choose an appropriate method to implement this according to the actual integrated circuit process, layout drawing method, and needs.
[0109] In one possible implementation, embodiments of this disclosure can multiplex the selection devices and the storage state data lines, further improving the storage density of the in-memory computing device.
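The storage scheme described above, in which the choice of one line out of four encodes 2 bits at each data terminal, can be sketched as follows. The assignment of the upper 2 bits to the left terminal and the lower 2 bits to the right terminal, the 0-based line indices, and the function names are assumptions made only for this illustration.

```python
from typing import Tuple


def encode_device(value: int) -> Tuple[int, int]:
    """Map a 4-bit value to (left_line, right_line), each an index in 0..3.

    Each index names which of the four storage state data lines the
    corresponding data terminal is wired to at manufacturing time.
    """
    if not 0 <= value < 16:
        raise ValueError("value must fit in 4 bits")
    left_line = (value >> 2) & 0b11   # upper 2 bits (assumed left terminal)
    right_line = value & 0b11         # lower 2 bits (assumed right terminal)
    return left_line, right_line


def decode_device(left_line: int, right_line: int) -> int:
    """Recover the 4-bit value from the two line indices."""
    return (left_line << 2) | right_line


if __name__ == "__main__":
    left, right = encode_device(0b1101)
    print(left, right, decode_device(left, right))
```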
[0110] In one possible implementation, the read-only memory devices in the computing module are arranged in a layout of several rows and several columns. The control terminal of each row of read-only memory devices is connected to the same control word line. The two data terminals of each read-only memory device in each column are respectively connected to the corresponding first storage state data line and the second storage state data line. The output terminal of each upper-level data selection device is connected to the corresponding computing bit line through a capacitor or directly, and the result data is output through the computing bit line.
[0111] In one possible implementation, the first storage state data line connecting each ROM device in the (K+1)th column is the second storage state data line connecting each ROM device in the Kth column, where K is an integer. An exemplary description follows.
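The line sharing between adjacent columns can be expressed as an indexing rule. The sketch below assumes 0-based column indices and treats each set of storage state data lines on one side of a column as a numbered "line group"; both conventions are hypothetical and introduced only for illustration.

```python
from typing import Tuple


def line_groups_for_column(k: int) -> Tuple[int, int]:
    """Return (first_group, second_group) used by column k.

    With sharing, the second group of column k is also the first
    group of column k + 1.
    """
    return k, k + 1


def total_line_groups(num_columns: int, shared: bool) -> int:
    """Line groups needed for num_columns columns, with or without sharing."""
    if shared:
        return num_columns + 1
    return 2 * num_columns


if __name__ == "__main__":
    print(line_groups_for_column(0), line_groups_for_column(1))
    print(total_line_groups(4, shared=False), total_line_groups(4, shared=True))
```

Under these assumptions, sharing reduces the number of line groups from 2N to N + 1 for N columns, which is the source of the density gain described above.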
[0112] Refer to Figures 6a and 6b, which show schematic diagrams of a first multiplexing method for the selection devices of the calculation module in the current domain and the charge domain according to an embodiment of the present disclosure.
[0113] In one possible implementation, as shown in Figures 6a and 6b, the data terminals of adjacent columns of read-only memory devices within the computing module of the in-memory computing device can multiplex the intermediate storage status data lines and selection devices (including lower-level data selection devices, upper-level data selection devices, and column selection devices). In the first multiplexing method, the two data terminals of the read-only memory devices in the same column are each connected to the corresponding series assembly formed by a column selection device and a lower-level data selection device. The column selection devices in the two series assemblies connected to the two data terminals of the read-only memory devices in the same column are controlled by two different column selection signals (Column Select Line, CSL). When an operation is performed on a certain column of read-only memory devices, the column selection signals on both sides of the corresponding column must be activated simultaneously.
[0114] For example, as shown in Figure 6a, in the current domain, to perform a read or multiplication operation on the read-only memory device R1, the column select signals CSL<0> and CSL<1> can be activated simultaneously, that is, set to high level VDD, while the other column select signals are set to low level VSS. The corresponding control word line WL<1> is activated, and the remaining control word lines WL are set to low level VSS. According to the data to be operated on, one of the data selection control lines Ctrl and CtrlB is set to high level VDD and the other to low level VSS, and the result is output to the calculation bit line CBL in the form of current.
[0115] For example, as shown in Figure 6b, in the charge domain, to perform a read or multiplication operation on the read-only memory device R5, the column select signals CSL<0> and CSL<1> can be activated simultaneously, that is, set to high level VDD, while the column select signals for the remaining columns are set to low level VSS. The corresponding control word line WL<2> is activated, and the remaining control word lines WL are set to low level VSS. According to the data to be operated on, one of the data selection control lines Ctrl and CtrlB is set to high level VDD and the other is set to low level VSS. The result is output to the calculation bit line CBL in the form of charge through the capacitor.
[0116] The above describes some possible implementations of the computing module in the current domain and the charge domain, as well as the first multiplexing method for the selection devices in each possible implementation. Of course, the embodiments disclosed herein are not limited to this. The computing module can also have other implementations in the current domain and the charge domain, as well as other multiplexing methods, which will be described by example below.
[0117] Refer to Figures 7a and 7b, which show schematic diagrams of a second multiplexing method for the selection devices in the current domain and the charge domain according to an embodiment of the present disclosure.
[0118] In one possible implementation, as shown in Figure 7a, the selection device may include a data selection device and a column selection device. The data selection device includes an upper-level data selection device and a lower-level data selection device. The upper-level data selection device may include a first upper-level data selection device Q11 and a second upper-level data selection device Q12. The lower-level data selection device includes a first lower-level data selection device Q21 and a second lower-level data selection device Q22. The data selection control line includes a first data selection control line CtrlB and a second data selection control line Ctrl. The storage status data line includes a first storage status data line and a second storage status data line.
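The control sequence in the two examples above can be summarized as a function that produces the signal levels for one selected device. This is a minimal sketch under several assumptions not stated in this disclosure: column k (0-based) is flanked by CSL<k> and CSL<k+1>, word lines are numbered from 1, and the function and parameter names are hypothetical.

```python
from typing import Dict

VDD, VSS = 1, 0  # logic levels used as stand-ins for the supply rails


def first_method_controls(column: int, row: int, num_columns: int,
                          num_rows: int, select_ctrl: bool) -> Dict[str, object]:
    """Signal levels for a read or multiplication on one ROM device.

    column      -- 0-based column index of the selected device (assumption)
    row         -- 1-based word line index of the selected device
    select_ctrl -- True drives Ctrl high, False drives CtrlB high
    """
    # Both column select signals flanking the selected column go high.
    csl = [VDD if i in (column, column + 1) else VSS
           for i in range(num_columns + 1)]
    # Only the word line of the selected row is activated.
    wl = {r: (VDD if r == row else VSS) for r in range(1, num_rows + 1)}
    # Ctrl and CtrlB are complementary in the working mode.
    ctrl = VDD if select_ctrl else VSS
    ctrlb = VSS if select_ctrl else VDD
    return {"CSL": csl, "WL": wl, "Ctrl": ctrl, "CtrlB": ctrlb}


if __name__ == "__main__":
    # Mirrors the Figure 6a example: CSL<0> and CSL<1> high, WL<1> high.
    print(first_method_controls(column=0, row=1, num_columns=2,
                                num_rows=2, select_ctrl=True))
```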
[0119] The input terminals of the first lower-level data selection device Q21 and the second lower-level data selection device Q22 are connected to the excitation source (such as a current source module or power supply).
[0120] The output terminal of the first lower-level data selection device Q21 is connected to the input terminal of the second upper-level data selection device Q12, and is also connected to the input terminal of the corresponding column selection device.
[0121] The output of the second lower-level data selection device Q22 is connected to the input of the first upper-level data selection device Q11, and is also connected to the input of another column selection device. The output of each column selection device is connected to the corresponding storage status data line.
[0122] The output terminals of the first upper-level data selection device Q11 and the second upper-level data selection device Q12 are connected to the calculation bit line CBL.
[0123] The control terminals of the first lower-level data selection device Q21 and the first upper-level data selection device Q11 are both connected to the first data selection control line CtrlB.
[0124] The control terminals of the second lower-level data selection device Q22 and the second upper-level data selection device Q12 are both connected to the second data selection control line Ctrl.
[0125] The first data terminal and the second data terminal of each read-only memory device are respectively connected to the corresponding first storage state data line and the corresponding second storage state data line.
[0126] In one possible implementation, the read-only memory devices in the computing module are arranged in a layout of several rows and several columns. The control terminal of each row of read-only memory devices is connected to the same control word line. The two data terminals of each read-only memory device in each column are respectively connected to the corresponding first storage state data line and the second storage state data line. The output terminal of each upper-level data selection device is connected to the corresponding computing bit line through a capacitor or directly, and the result data is output through the computing bit line.
[0127] In one possible implementation, the first storage state data line connected to each read-only memory device in the (K+1)th column is the second storage state data line connected to each read-only memory device in the Kth column, where K is an integer;
[0128] Here, the multiple column selection devices corresponding to the columns of read-only memory devices multiplex the same set of data selection devices (for example, the multiple data selection devices that generate the signals S<1> to S<8> shown in Figures 7a and 7b), and the output terminals of each first upper-level data selection device Q11 and each second upper-level data selection device Q12 are connected to the same calculation bit line through capacitors or directly.
[0129] In one possible implementation, as shown in Figures 7a and 7b, the data terminals of adjacent columns of read-only memory devices within the computing module of the in-memory computing device can multiplex the selection devices (lower-level data selection device, upper-level data selection device, and column selection device). In the second multiplexing method, the two data terminals of the read-only memory devices in the same column are connected to the corresponding column selection devices. The column selection devices connected to the two data terminals of the read-only memory devices in the same column are controlled by two different column selection signals (Column Select Line, CSL). When performing an operation on a certain column of read-only memory devices, the column selection signals on both sides of the corresponding column must be activated simultaneously.
[0130] In addition, as shown in Figures 7a and 7b, in the second multiplexing method, the data selection devices (including each lower-level data selection device and each upper-level data selection device) can be multiplexed by all read-only memory devices within the computing module. The current source module or power supply is connected directly to the symmetrical series circuit structure composed of the data selection devices (lower-level data selection devices and upper-level data selection devices), and the corresponding column selection device is connected at each connection point between a lower-level and an upper-level data selection device. This further reduces the number of selection devices required in the computing module and significantly improves the area efficiency of the computing module.
[0131] For example, as shown in Figures 7a and 7b, each column of read-only memory devices, together with its connected storage status data lines and column select devices, can be regarded as a unit. The signals S<1> to S<8>, generated by a group of (16) data select devices connected directly to the excitation source, are connected to the input terminals of the column selection devices in each unit, thereby multiplexing the data selection devices. Furthermore, column selection devices can also be multiplexed; for example, as shown in Figures 7a and 7b, two adjacent columns of read-only memory devices can share a set of storage status data lines and column select devices.
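To give a rough sense of why the second multiplexing method reduces the number of selection devices, the following sketch compares device counts for three arrangements. The counts are extrapolated from the examples in this description (four storage status data lines per side, one column selection device per line, 16 shared data selection devices) and are assumptions for illustration only; the actual circuits in the figures may differ, and reset switches and capacitors are not counted.

```python
LINES_PER_GROUP = 4         # 2 bits per data terminal -> four lines per side
SHARED_DATA_SELECTORS = 16  # the group generating S<1> to S<8> (from the text)


def selection_devices(num_columns: int, method: str) -> int:
    """Estimated selection-device count for one computing module.

    method -- "none":   no sharing, every column has its own lines and devices
              "first":  adjacent columns share the lines and devices between them
              "second": as "first", plus one data-selection group for all columns
    """
    if method == "none":
        groups = 2 * num_columns
        # Each line has a lower-level, an upper-level, and a column selection device.
        return groups * LINES_PER_GROUP * 3
    groups = num_columns + 1  # adjacent columns share the group between them
    if method == "first":
        return groups * LINES_PER_GROUP * 3
    if method == "second":
        # Only column selection devices remain per line.
        return SHARED_DATA_SELECTORS + groups * LINES_PER_GROUP
    raise ValueError("unknown method: " + method)


if __name__ == "__main__":
    for m in ("none", "first", "second"):
        print(m, selection_devices(2, m))
```

For two columns, the "none" case gives 48 devices, which agrees with the 32 data selection devices and 16 column selection devices listed for the Figure 4b example; the other two figures are estimates under the stated assumptions.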
[0132] In the current and charge domains, the control method for performing read or multiplication operations on read-only memory devices in the second multiplexing method is the same as that in the first multiplexing method, and will not be repeated here.
[0133] Refer to Figure 8, which shows a schematic diagram of the structure of an in-memory computing device according to an embodiment of the present disclosure.
[0134] In one possible implementation, as shown in Figure 8, multiple computing modules can be electrically connected to form a layout of several rows and columns. The control word lines of computing modules in the same row are electrically connected, the control bit lines of computing modules in the same column are electrically connected, and the computing bit lines of computing modules in the same column are electrically connected.
[0135] In one possible implementation, as shown in Figure 8, multiple computing modules are arranged in a row and column layout to form a computing array. The control bit lines and computing bit lines of some or all computing modules in the same column are electrically connected, and the control word lines of some or all computing modules in the same row are electrically connected. Each control word line is driven by a word line driver, and each control bit line is driven by a bit line driver. Each computing module is connected to a power supply and data selection driver. The computing bit lines of each computing module are connected to an output detection interface. Each computing module supports multiplication and accumulation operations on the computing bit lines. The multiplication and accumulation operations support parallel computing on multiple computing bit lines.
[0136] In one possible implementation, the target operation may include a multiplication operation, a data read operation, and a multiply-accumulate operation. If it is a multiplication operation, the result data is the product of the stored data corresponding to the read-only memory device at the corresponding time and the control word line input data; if it is a data read operation, the result data is the stored data corresponding to the read-only memory at the corresponding time.
[0137] The result data includes multiplication and accumulation results, and the control module can also be used for:
[0138] Control one or more columns of calculation modules to perform multiply-accumulate operations, and / or control some or all of the calculation modules connected to the same calculation bit line to perform multiply-accumulate operations;
[0139] A set of data is input through the control word line, and the calculation module to participate in the multiplication and accumulation operation is selected. The read-only storage device to participate in the calculation within each calculation module is selected using the control word line, and the stored data to participate in the calculation is selected using the data selection control line. The multiplication operation between the input data and the stored data is completed in each calculation module. The multiplication results obtained from each calculation module are accumulated through the calculation bit line and output to obtain the multiplication and accumulation result.
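The multiply-accumulate flow described above can be illustrated with a minimal behavioral sketch in Python. It is not the disclosed circuit: the names `ComputeModule`, `stored`, and `mac_on_bit_line`, and the modeling of stored data as small integers indexed by word-line row and by the active data selection control line, are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ComputeModule:
    # Hypothetical model: stored[(row, select)] is the value read from the
    # read-only memory device on word-line row `row` when the data selection
    # control line `select` ("Ctrl" or "CtrlB") is active.
    stored: Dict[Tuple[int, str], int]

    def multiply(self, row: int, select: str, x: int) -> int:
        # Product of the word-line input and the selected stored data.
        return x * self.stored[(row, select)]


def mac_on_bit_line(modules: List[ComputeModule], rows: List[int],
                    select: str, inputs: List[int]) -> int:
    # Modules sharing one computing bit line each contribute one product;
    # the bit line accumulates the products.
    return sum(m.multiply(r, select, x)
               for m, r, x in zip(modules, rows, inputs))


m1 = ComputeModule({(0, "Ctrl"): 3, (0, "CtrlB"): 1})
m2 = ComputeModule({(0, "Ctrl"): 2, (0, "CtrlB"): 0})
result_ctrl = mac_on_bit_line([m1, m2], [0, 0], "Ctrl", [1, 1])
result_ctrlb = mac_on_bit_line([m1, m2], [0, 0], "CtrlB", [2, 5])
```

Switching `select` from "Ctrl" to "CtrlB" reuses the same modeled devices with a different set of stored values, which loosely mirrors how the data selection control line picks the stored data that participates in the calculation.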
[0140] The following provides examples of possible ways to achieve the target operation.
[0141] Embodiments of this disclosure can control each calculation module to perform multiplication operations. The following description uses the structures of the current-domain and charge-domain calculation modules shown in Figure 4b and Figure 5b as examples.
[0142] Referring to Figures 9a and 9b, Figures 9a and 9b show schematic diagrams of a computing module of an in-memory computing device according to an embodiment of the present disclosure performing a multiplication operation in the current domain.
[0143] In one example, as shown in Figure 9a, the data selection control lines Ctrl and CtrlB are global control signals for the computing array and can be connected to other computing modules. Therefore, Ctrl and CtrlB can both be set to low level VSS to put all computing modules into idle mode, or one can be held at high level VDD and the other at low level VSS to keep other computing modules in the working state. Regardless of the signal configuration of Ctrl and CtrlB, setting all control word lines WL<1:2> and all control bit lines BL<1:2> to low level VSS disconnects the computing module from the computing bit line CBL and puts it into idle mode, so that no current flows inside the computing module.
[0144] In one example, as shown in Figure 9b, the control bit line BL<1> and the data selection control line Ctrl can be set to high level VDD, and the control bit line BL<2>, the control word line WL<1>, and the data selection control line CtrlB can be set to low level VSS. Setting the control word line WL<2> to VIN1 then enables a multiplication operation between the input data VIN1 and the data stored in the read-only memory device Q7 when the data selection control line Ctrl is high (VDD). The output detection interface can detect the current on the calculation bit line CBL to obtain the corresponding multiplication result.
[0145] Referring to Figures 10a and 10b, Figures 10a and 10b show schematic diagrams of a computing module of an in-memory computing device according to an embodiment of the present disclosure performing a multiplication operation in the charge domain.
[0146] In one example, as shown in Figure 10a, the data selection control lines Ctrl and CtrlB are global control signals for the computing array and can be connected to other computing modules. Therefore, Ctrl and CtrlB can both be set to low level VSS to put all computing modules into idle mode, or one can be held at high level VDD and the other at low level VSS to keep other computing modules active. Regardless of the configuration of the data selection control lines Ctrl and CtrlB, the charge on the lower plate of capacitor C2 and on the calculation bit line CBL can be cleared by first asserting the control word line WL<1> and setting the control word lines WL<2:3> and all control bit lines BL<1:2> to low level VSS. The control word line WL<1> is then set to low level VSS and the calculation bit line CBL is kept floating.
[0147] Then, as shown in Figure 10b, the control bit line BL<1> and the data selection control line Ctrl can be set to high level VDD, and the control bit line BL<2> and the data selection control line CtrlB can be set to low level VSS. The control word line WL<2> is set to VIN2, and the control word line WL<1> is driven with a voltage complementary to that of the control word line WL<2>. This enables the multiplication operation between the input data VIN2 and the data stored in the read-only memory device Q9 when the data selection control line Ctrl is high (VDD). The output detection interface can detect the corresponding charge on the calculation bit line CBL to obtain the corresponding multiplication result.
[0148] In one possible implementation, embodiments of the present disclosure may further perform multiply-accumulate operations. The control module controls the corresponding control word lines, control bit lines, and data selection control lines to select, within each calculation module 10 of the calculation array, the stored data of the read-only memory devices that participate in the calculation. One or more calculation bit lines then simultaneously carry out multiply-accumulate operations across the calculation modules connected to each of those calculation bit lines.
[0149] Referring to Figures 11a and 11b, Figures 11a and 11b show schematic diagrams of a computing module of an in-memory computing device according to an embodiment of the present disclosure performing a multiply-accumulate operation in the current domain.
[0150] In one example, as shown in Figure 11a, the data selection control lines Ctrl and CtrlB are global control signals for the computing array and can be connected to other computing modules. Therefore, Ctrl and CtrlB can both be set to low level VSS to put all computing modules into idle mode, or one can be held at high level VDD and the other at low level VSS to keep other computing modules in the working state. Regardless of the signal configuration of Ctrl and CtrlB, setting all control word lines WL1<1:2> and WL2<1:2> and all control bit lines BLm<1:2> to low level VSS disconnects the computing module from the computing bit line CBLm and puts it into idle mode, so that no current flows inside the computing module.
[0151] In one example, as shown in Figure 11b, the control bit line BLm<1> and the data selection control line Ctrl can be set to high level VDD, and the control bit line BLm<2>, the control word lines WL1<2> and WL2<2>, and the data selection control line CtrlB can be set to low level VSS. Setting the control word lines WL1<1> and WL2<1> to VIN3 and VIN4 respectively then enables the multiplication operation between input data VIN3 and the data stored in read-only memory device M1 when the data selection control line Ctrl is high (VDD), and the multiplication operation between input data VIN4 and the data stored in read-only memory device M5. The output detection interface can detect the current on the calculation bit line CBLm, where the multiplication result currents are accumulated according to Kirchhoff's current law, to obtain the corresponding multiply-accumulate result.
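The current-domain accumulation by Kirchhoff's current law can be sketched numerically. The sketch assumes, purely for illustration, that each selected read-only memory device contributes a current equal to a unit current multiplied by a binary word-line input and a stored level. `I_UNIT`, the binary encoding of the inputs, and the stored levels are hypothetical and are not taken from the disclosure.

```python
I_UNIT = 1e-6  # assumed unit branch current in amperes (hypothetical value)


def branch_current(vin_bit: int, stored_level: int) -> float:
    # A branch conducts only when its word-line input is active (vin_bit == 1).
    return I_UNIT * vin_bit * stored_level


def bit_line_current(inputs, stored_levels) -> float:
    # Kirchhoff's current law: the current on the shared calculation bit line
    # is the sum of the branch currents flowing into it.
    return sum(branch_current(x, w) for x, w in zip(inputs, stored_levels))


# Two modules on one bit line, with assumed stored levels 2 and 3.
total = bit_line_current([1, 1], [2, 3])
mac_result = round(total / I_UNIT)  # convert the current back to a count
```

With both inputs active, the bit-line current corresponds to the sum of the two stored levels. With one input inactive, only the other branch contributes.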
[0152] Referring to Figures 12a and 12b, Figures 12a and 12b show schematic diagrams of a computing module of an in-memory computing device according to an embodiment of the present disclosure performing a multiply-accumulate operation in the charge domain.
[0153] In one example, as shown in Figure 12a, the data selection control lines Ctrl and CtrlB are global control signals for the computing array and can be connected to other computing modules. Therefore, Ctrl and CtrlB can both be set to low level VSS to put all computing modules into idle mode, or one can be held at high level VDD and the other at low level VSS to keep other computing modules in the working state. Regardless of the signal configuration of the data selection control lines Ctrl and CtrlB, the charge on the lower plates of capacitors C3 and C4 and on the calculation bit line CBLm can be cleared by first asserting the control word lines WL1<1> and WL2<1> and setting the control word lines WL1<2:3>, WL2<2:3>, and all control bit lines BLm<1:2> to low level VSS. The control word lines WL1<1> and WL2<1> are then set to low level VSS and the calculation bit line CBLm is kept floating.
[0154] In one example, as shown in Figure 12b, the control bit line BLm<1> and the data selection control line Ctrl can be set to high level VDD, and the control bit line BLm<2> and the data selection control line CtrlB can be set to low level VSS. The control word lines WL1<2> and WL2<2> are set to VIN5 and VIN6 respectively, the control word line WL1<1> is driven with a voltage complementary to that of WL1<2>, and the control word line WL2<1> is driven with a voltage complementary to that of WL2<2>. This enables the multiplication operation between input data VIN5 and the data stored in read-only memory device M9 when the data selection control line Ctrl is high (VDD), and the multiplication operation between input data VIN6 and the data stored in read-only memory device M13. The output detection interface can detect the charge on the calculation bit line CBLm, where the multiplication results are accumulated through charge redistribution, to obtain the corresponding multiply-accumulate result in the form of a charge amount.
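The charge-redistribution step can be illustrated with an idealized model. The sketch below assumes that every capacitor and the calculation bit line are reset to 0 V, that the bit line is then left floating so its net charge stays zero, and that each module drives the first terminal of its capacitor to a voltage representing its multiplication result. The function name `cbl_voltage`, the optional parasitic capacitance, and all numeric values are assumptions for illustration only.

```python
def cbl_voltage(node_voltages, capacitances, c_parasitic=0.0):
    """Voltage of a floating bit line after charge redistribution.

    Charge conservation on the floating node (reset to 0 V) gives
        sum_i C_i * (V_cbl - V_i) + C_p * V_cbl = 0,
    so V_cbl is the capacitance-weighted sum of the node voltages divided by
    the total capacitance attached to the bit line.
    """
    if len(node_voltages) != len(capacitances):
        raise ValueError("one capacitance is required per node voltage")
    total_c = sum(capacitances) + c_parasitic
    weighted = sum(c * v for c, v in zip(capacitances, node_voltages))
    return weighted / total_c


VDD = 1.0      # assumed supply voltage in volts
C_UNIT = 1e-15  # assumed unit capacitor of 1 fF

# Two modules: the first product is 1 (node driven to VDD), the second is 0.
v_two = cbl_voltage([VDD, 0.0], [C_UNIT, C_UNIT])
# Both products are 1: the bit line settles at VDD in this ideal model.
v_full = cbl_voltage([VDD, VDD], [C_UNIT, C_UNIT])
```

In this ideal model, the bit-line voltage is proportional to the number of modules whose product is 1, which is one way the accumulated result could be read out as a charge or voltage.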
[0155] In one possible implementation, the control module can also be used for:
[0156] Each computing module is controlled to enter either a working mode or an idle mode, and in the working mode, each computing module performs the target operation.
[0157] For example, the computing module of the in-memory computing device in this embodiment supports a working mode and an idle mode. In the working mode, read operations, multiplication operations, or multiply-accumulate operations are performed. The computing module can enter the idle mode through the voltage settings of the word lines and bit lines, thereby reducing power consumption. Furthermore, the read-only memory device stores data through hard-wired connections between its data terminals and the corresponding storage state data lines, so the stored data is non-volatile.
[0158] Referring to Figures 13a and 13b, Figures 13a and 13b show schematic diagrams of the idle-mode settings of a computing module of an in-memory computing device according to an embodiment of the present disclosure in the current domain and the charge domain.
[0159] In one example, as shown in Figures 13a and 13b, by setting all control word lines and control bit lines to low level VSS, the computing module of the in-memory computing device can enter an idle mode, thereby reducing array power consumption.
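As a small illustration of the idle-mode setting, the following sketch builds a signal table in which every control word line and control bit line of one computing module is held at VSS. The helper names `idle_settings` and `is_idle` and the line-naming scheme are hypothetical. Only the rule that all control word lines and control bit lines are set to VSS comes from the text.

```python
def idle_settings(num_word_lines: int, num_bit_lines: int) -> dict:
    # Every control word line and control bit line is driven to VSS.
    settings = {f"WL<{i}>": "VSS" for i in range(1, num_word_lines + 1)}
    settings.update({f"BL<{j}>": "VSS" for j in range(1, num_bit_lines + 1)})
    return settings


def is_idle(settings: dict) -> bool:
    # The module is treated as idle only if no listed line is above VSS.
    return all(level == "VSS" for level in settings.values())


current_domain_idle = idle_settings(num_word_lines=2, num_bit_lines=2)
```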
[0160] In one possible implementation, the control module can also be used for:
[0161] Adjusting the signal timing of the control word line and control bit line enables pipelined operation between different computation bit lines.
[0162] Referring to Figures 14a, 14b, 14c, and 14d, Figures 14a to 14d show schematic diagrams of pipelined operation of an in-memory computing device according to an embodiment of the present disclosure.
[0163] In one example, the multiply-accumulate operation can be performed by the aforementioned calculation modules. With the help of additional circuits, for example one or more switching devices placed between the calculation modules (MOSFETs are used as the switching devices in this embodiment), and by adjusting the timing of signals such as the control word lines and control bit lines, pipelined operation can be realized between the calculation of the calculation modules connected to different calculation bit lines and the output detection interface.
[0164] For example, pipelined operation can refer to transmitting the calculation result of the previous calculation module to the current calculation module through the computation bit line (CBL) to participate in the calculation operation of the current calculation module, and transmitting the result of the current calculation module to the next calculation module through the computation bit line (CBL) for calculation operation.
[0165] In one example, as shown in Figures 14a and 14b, pipelined operation can be implemented according to the configuration of the current domain calculation module.
[0166] In one example, as shown in Figures 14c and 14d, pipelined operation can be implemented according to the configuration of the charge domain calculation module.
[0167] By adjusting the signal timing of the control word lines and control bit lines, this embodiment of the disclosure enables pipelined operation between the multiply-accumulate calculation of the computing modules connected to different computing bit lines and the output detection interface, which improves system throughput and the utilization rate of the output detection interface.
[0168] The embodiments disclosed herein do not limit the specific timing of the control word lines and control bit lines during pipeline operation. Those skilled in the art can choose an appropriate method to implement it according to the actual situation and needs.
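Because the specific timing is left open, the following is only one hypothetical schedule, intended to show why overlapping calculation and detection raises the utilization of the output detection interface. It assumes a single output detection interface shared by all calculation bit lines, fixed stage durations in abstract cycles, and calculations on different bit lines that start one after another. None of these assumptions is stated in the disclosure.

```python
from typing import List, Tuple


def pipelined_schedule(num_bit_lines: int, t_compute: int = 1,
                       t_detect: int = 1) -> List[Tuple[int, int, int, int, int]]:
    """Return (bit_line, compute_start, compute_end, detect_start, detect_end)."""
    schedule = []
    compute_start = 0
    detect_free = 0  # time at which the shared detection interface is next free
    for k in range(num_bit_lines):
        compute_end = compute_start + t_compute
        # Detection waits for both the result and the shared interface.
        detect_start = max(compute_end, detect_free)
        detect_end = detect_start + t_detect
        schedule.append((k, compute_start, compute_end, detect_start, detect_end))
        detect_free = detect_end
        # The next bit line starts computing while this one is being detected.
        compute_start = compute_end
    return schedule


def sequential_cycles(num_bit_lines: int, t_compute: int = 1,
                      t_detect: int = 1) -> int:
    # Without pipelining, each bit line computes and is detected in turn.
    return num_bit_lines * (t_compute + t_detect)


schedule = pipelined_schedule(4)
pipelined_total = schedule[-1][4]   # finish time of the last detection
sequential_total = sequential_cycles(4)
```

Under these assumptions, four bit lines finish in 5 cycles when pipelined and in 8 cycles when processed strictly one after another, and the detection interface is busy in every cycle after the first.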
[0169] According to one aspect of this disclosure, a neural network accelerator is provided, the accelerator including the aforementioned in-memory computing device.
[0170] Referring to Figure 15, Figure 15 shows a schematic diagram of a neural network accelerator according to an embodiment of the present disclosure.
[0171] As shown in Figure 15, the neural network accelerator consists of peripheral modules, such as word line drivers and bit line drivers, and a computing array composed of computing modules. The weight values of the neural network model can be stored through the electrical connections between the read-only memory (ROM) devices and their corresponding storage state data lines. Input data, such as feature maps, is input through the word lines so that multiply-accumulate operations on the input values and weight values are performed within the computing array of the neural network accelerator. The electrical characteristics of the computing bit lines are detected through an output detection interface, and the detected analog output signal is converted into a corresponding digital signal for output.
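The data path described above (weights fixed in the array, inputs applied on word lines, analog accumulation on bit lines, then conversion to a digital code) can be sketched as a simple fixed-point model. The ideal uniform converter `ideal_adc`, the function `accelerator_layer`, the unsigned 4-bit weights, and the binary inputs are assumptions for illustration and do not describe the disclosed output detection interface.

```python
from typing import List


def ideal_adc(analog: float, full_scale: float, bits: int) -> int:
    # Assumed ideal uniform converter: clip to [0, full_scale], then round
    # to the nearest of 2**bits - 1 steps.
    levels = (1 << bits) - 1
    clipped = min(max(analog, 0.0), full_scale)
    return int(clipped / full_scale * levels + 0.5)


def accelerator_layer(weights: List[List[int]], inputs: List[int],
                      full_scale: float, adc_bits: int) -> List[int]:
    outputs = []
    for column in weights:  # one column of weights per computing bit line
        # Multiply-accumulate on the bit line, modeled as an exact sum.
        analog = float(sum(w * x for w, x in zip(column, inputs)))
        outputs.append(ideal_adc(analog, full_scale, adc_bits))
    return outputs


# Two bit lines, three word lines, unsigned 4-bit weights, binary inputs.
weights = [[3, 1, 0], [15, 15, 15]]
inputs = [1, 1, 1]
full_scale = 3 * 15 * 1.0  # largest possible sum for this example
codes = accelerator_layer(weights, inputs, full_scale, adc_bits=4)
```

The first bit line accumulates 4 out of a possible 45 and maps to code 1, while the second reaches full scale and maps to code 15. The coarse first code shows how converter resolution limits the precision of the digital output in such a model.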
[0172] For example, embodiments of this disclosure can be used to accelerate computation in fixed-point neural network inference.
[0173] According to one aspect of this disclosure, an electronic device is provided, the electronic device including the neural network accelerator described above.
[0174] As described above, the embodiments of this disclosure use read-only memory devices to store weights to realize in-memory computing devices. By making full use of both data terminals of the read-only memory device and adopting a time-sharing reading method, the read-only memory device achieves high-density, multi-bit data storage. The traditional 6T SRAM structure uses 6 transistors to store 1 bit of information, while in the embodiments of this disclosure, the read-only memory device only needs one transistor to store multiple bits of information. In one example, a single read-only memory device can be used to store 4 bits of information, which greatly improves the storage density and gives the chip the potential to store all parameters of a large-scale neural network on-chip. This reduces or even eliminates the additional power consumption and latency caused by moving data between on-chip and off-chip memory, reduces the energy consumption and latency during inference operations, and enables artificial intelligence algorithms to be efficiently deployed to edge or terminal devices.
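The density comparison above reduces to simple arithmetic, sketched below. The calculation counts only the storage transistors named in the text (6 per bit for 6T SRAM, 1 per 4 bits in the read-only memory example) and deliberately ignores selection devices, drivers, and other peripheral circuits, so it is an upper-bound illustration and not an area estimate. The weight count used at the end is hypothetical.

```python
from fractions import Fraction

# Storage transistors per stored bit, taken from the figures in the text.
sram_transistors_per_bit = Fraction(6, 1)  # 6T SRAM: 6 transistors per bit
rom_transistors_per_bit = Fraction(1, 4)   # 1 transistor holds 4 bits

# Ratio of storage transistors needed for the same number of bits.
density_ratio = sram_transistors_per_bit / rom_transistors_per_bit

# Hypothetical model size: one million 4-bit weights.
num_weights = 1_000_000
bits = num_weights * 4
sram_transistors = int(bits * sram_transistors_per_bit)
rom_transistors = int(bits * rom_transistors_per_bit)
```

Under these counting assumptions the read-only memory example needs 24 times fewer storage transistors than 6T SRAM for the same number of stored bits.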
[0175] The various embodiments of this disclosure have been described above. These descriptions are exemplary and not exhaustive, nor are they limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles, practical application, or improvement of the technology in the market, or to enable others skilled in the art to understand the embodiments disclosed herein.
Claims
1. A high-density in-memory computing device based on read-only memory devices, characterized by, The device includes: Multiple computing modules are provided, each including at least one read-only memory device, multiple selection devices, an excitation source, a storage state data line, a control word line, a computing bit line, and a data selection control line. The control terminal of the read-only memory device is connected to the corresponding control word line to receive control signals. The two data terminals of the read-only memory device are respectively connected to different storage state data lines to realize data storage. The storage state data lines are connected to the excitation source and the computing bit line through the selection devices. The data state of each storage state data line is characterized by current or voltage level according to the type of the excitation source. The control terminal of the selection device is connected to the corresponding data selection control line to receive data selection control signals, which are used to select the corresponding selection device. The control module is connected to the control word lines and data selection control lines of each computing module. It is used to select the corresponding read-only memory device and selection device for target operation through the control word lines and data selection control lines, and output the result data through the computing bit lines.
2. The apparatus of claim 1, wherein, The selection device includes a data selection device, which comprises a higher-level data selection device and a lower-level data selection device. The higher-level data selection device includes a first higher-level data selection device and a second higher-level data selection device. The lower-level data selection device includes a first lower-level data selection device and a second lower-level data selection device. The data selection control line includes a first data selection control line and a second data selection control line. The storage status data line includes a first storage status data line and a second storage status data line. The input terminals of both the first lower-level data selection device and the second lower-level data selection device are connected to the excitation source. The output terminal of the first lower-level data selection device is connected to the input terminal of the second upper-level data selection device through the first storage status data line. The output terminal of the second lower-level data selection device is connected to the input terminal of the first upper-level data selection device via the second storage status data line. The control terminals of the first lower-level data selection device and the first upper-level data selection device are both connected to the first data selection control line. The control terminals of the second lower-level data selection device and the second upper-level data selection device are both connected to the second data selection control line. The output terminals of the first and second upper-level data selection devices are connected to the calculation bit lines. The first data terminal and the second data terminal of each read-only memory device are respectively connected to the corresponding first storage state data line and the corresponding second storage state data line.
3. The apparatus of claim 2, wherein, The selection device also includes column selection devices. The control terminal of each column selection device is connected to the control bit line to receive column selection signals. The input terminal of the series component formed by each column selection device and the lower-level data selection device is connected to the excitation source. The output terminal of the series component formed by each column selection device and the lower-level data selection device is connected to the corresponding storage status data line.
4. The apparatus of claim 3, wherein, The excitation source includes multiple current sources or multiple voltage sources. If the excitation source is multiple voltage sources, the calculation module also includes a reset switch and a capacitor. The control terminal of the reset switch is connected to the reset control word line for receiving a reset signal. The reset terminal of the reset switch is connected to a reset level line. The reset detection terminal of the reset switch and the first terminal of the capacitor are connected to the output terminal of the upper-level data selection device. The second terminal of the capacitor is connected to the calculation bit line.
5. The apparatus of claim 4, wherein, The read-only memory devices in the computing module are arranged in a layout of several rows and several columns. The control terminal of each row of read-only memory devices is connected to the same control word line. The two data terminals of each read-only memory device in each column are respectively connected to the corresponding first storage state data line and second storage state data line. The output terminal of each upper-level data selection device is connected to the corresponding computing bit line through a capacitor or directly, and the result data is output through the computing bit line.
6. The apparatus of claim 5, wherein, The first storage state data line connecting each ROM device in the (K+1)th column is the second storage state data line connecting each ROM device in the Kth column, where K is an integer.
7. The apparatus of claim 1, wherein, The selection device includes a data selection device and a column selection device. The data selection device includes a higher-level data selection device and a lower-level data selection device. The higher-level data selection device includes a first higher-level data selection device and a second higher-level data selection device. The lower-level data selection device includes a first lower-level data selection device and a second lower-level data selection device. The data selection control line includes a first data selection control line and a second data selection control line. The storage status data line includes a first storage status data line and a second storage status data line. The input terminals of the first lower-level data selection device and the second lower-level data selection device are connected to the excitation source. The output terminal of the first lower-level data selection device is connected to the input terminal of the second upper-level data selection device, and is also connected to the input terminal of the corresponding column selection device. The output of the second lower-level data selection device is connected to the input of the first upper-level data selection device, and is also connected to the input of another column selection device. The output of each column selection device is connected to the corresponding storage status data line. The output terminals of the first and second upper-level data selection devices are connected to the calculation bit lines. The control terminals of the first lower-level data selection device and the first upper-level data selection device are both connected to the first data selection control line. The control terminals of the second lower-level data selection device and the second upper-level data selection device are both connected to the second data selection control line. 
The first data terminal and the second data terminal of each read-only memory device are respectively connected to the corresponding first storage state data line and the corresponding second storage state data line.
8. The apparatus according to claim 7, characterized in that, The read-only memory devices in the computing module are arranged in a layout of several rows and several columns. The control terminal of each row of read-only memory devices is connected to the same control word line. The two data terminals of each read-only memory device in each column are respectively connected to the corresponding first storage state data line and second storage state data line. The output terminal of each upper-level data selection device is connected to the corresponding computing bit line through a capacitor or directly, and the result data is output through the computing bit line.
9. The apparatus of claim 8, wherein, The first storage state data line connected to each read-only memory device in the (K+1)th column is the second storage state data line connected to each read-only memory device in the Kth column, where K is an integer; In this group of data selection devices, multiple column selection devices corresponding to each column read-only memory device reuse the same set of data selection devices. The output terminals of each first upper-level data selection device and each second upper-level data selection device in this set of data selection devices are connected to the same calculation bit line through capacitors or directly.
10. The device according to any one of claims 1 to 9, characterized in that Multiple computing modules are electrically connected to form a layout of several rows and columns. The control word lines of computing modules in the same row are electrically connected, the control bit lines of computing modules in the same column are electrically connected, and the computing bit lines of computing modules in the same column are electrically connected.
11. The apparatus of claim 10, wherein, The target operation includes multiplication, data read, and multiply-accumulate operations. If it is a multiplication operation, the result data is the product of the stored data in the read-only memory at the corresponding time and the control word line input data. If it is a data read operation, the result data is the stored data in the read-only memory at the corresponding time. The result data includes multiplication and accumulation results, and the control module is further used for: Control one or more columns of calculation modules to perform multiply-accumulate operations, and / or control some or all of the calculation modules connected to the same calculation bit line to perform multiply-accumulate operations; A set of data is input through the control word line, and the calculation module to participate in the multiplication and accumulation operation is selected. The read-only storage device to participate in the calculation within each calculation module is selected using the control word line, and the stored data to participate in the calculation is selected using the data selection control line. The multiplication operation between the input data and the stored data is completed in each calculation module. The multiplication results obtained from each calculation module are accumulated through the calculation bit line and output to obtain the multiplication and accumulation result.
12. The apparatus of claim 11, wherein, The control module is also used for: Adjusting the signal timing of the control word line and control bit line enables pipelined operation between different computation bit lines.
13. The apparatus of claim 11, wherein, The control module is also used for: Each computing module is controlled to enter either a working mode or an idle mode, and in the working mode, each computing module performs the target operation.
14. A neural network accelerator, comprising the high-density in-memory computing device based on read-only memory devices according to any one of claims 1 to 13.
15. An electronic device, comprising the high-density in-memory computing device based on read-only memory devices according to any one of claims 1 to 13, or the neural network accelerator according to claim 14.