Low power charge domain computing unit with 2T1R1C and its readout circuit
A 2T1R1C charge domain in-memory computing unit and its readout circuit use the charging and discharging of a capacitor, together with the threshold voltage of an NMOS transistor, to address the high power consumption of RRAM-based units and to achieve low-power, high-precision multiplication.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- FUZHOU UNIV
- Filing Date
- 2026-03-16
- Publication Date
- 2026-06-19
AI Technical Summary
In the prior art, the power consumption of resistive random access memory (RRAM) storage units is too high, and traditional readout circuits require additional reference voltage generation circuits, resulting in wasted power consumption.
The low-power charge domain in-memory computing unit and its readout circuit use a 2T1R1C structure, comprising an M-row × N-column array of 2T1R1C charge domain units and a readout circuit. Multiplication is achieved by charging and discharging the capacitor, and the capacitor voltage state is detected using the threshold voltage of an NMOS transistor together with a counter, eliminating the need for an additional reference voltage generation circuit.
It effectively reduces the power consumption of the memory computing unit and its readout circuit, improves the calculation accuracy, and ensures the accuracy of the multiplication results and low power consumption characteristics.
Figure CN122245373A_ABST
Description
Technical Field
[0001] This invention belongs to the field of integrated circuit design technology, specifically relating to a low-power charge domain in-memory computing unit using 2T1R1C and its readout circuit, which is particularly suitable for in-memory computing architectures and artificial intelligence hardware acceleration chips based on resistive random access memory (RRAM).
Background Technology
[0002] The Compute-in-Memory (CIM) architecture, with Resistive Random Access Memory (RRAM) at its core, has become a key technology for breaking through the "memory wall" and "power wall" bottlenecks of traditional computing architectures due to its high storage density. It is applied in neural networks to core tasks such as feature extraction and convolution operations, providing efficient support for accelerating artificial intelligence hardware.
[0003] In practical applications, it is necessary to implement the multiplication operation between input data and weights through the RRAM memory unit, and read the multiplication result of the RRAM memory unit through the readout circuit. There are two main implementation methods in the existing technology: (1) Applying the input voltage to the RRAM to obtain the current, which represents the product of the input voltage and the RRAM conductance. Although this method can accurately reflect the weight information, the current generation requires a large amount of static power consumption. (2) Using a comparator, the result of the memory unit operation is compared with the reference voltage to obtain the quantization result. This method is more flexible in quantization, but it requires the introduction of an additional reference voltage generation circuit, which wastes some power consumption.
[0004] Therefore, how to effectively reduce the power consumption of the in-memory computing unit and its readout circuit while ensuring computational accuracy is a technical problem that urgently needs to be solved by those skilled in the art.
Summary of the Invention
[0005] The purpose of this invention is to solve the problems of excessive power consumption in the current readout mode and the need for an additional reference voltage generation circuit in the traditional readout circuit, and to provide a low-power charge domain memory cell using 2T1R1C and its readout circuit.
[0006] To achieve the above objectives, the present invention adopts the following technical solution: a low-power charge domain memory cell using 2T1R1C and its readout circuit, comprising an M-row × N-column 2T1R1C charge domain memory cell array and a readout circuit, wherein M is a perfect square and N is a natural number;
[0007] The in-memory computing unit includes a first transistor (T1), a second transistor (T2), a memristor (RRAM), and a capacitor (C1). The gate of the first transistor (T1) receives a 1-bit input signal, and the source of the first transistor (T1) is connected to the power supply. The first terminal of the memristor (RRAM) is connected to the drain of the first transistor (T1), and the second terminal of the memristor (RRAM) is connected to the drain of the second transistor (T2). The resistance value of the memristor (RRAM) represents 1-bit weight information. The gate of the second transistor (T2) receives a reset signal, and the source of the second transistor (T2) is grounded. The first plate of the capacitor (C1) is connected to the common node between the second terminal of the memristor (RRAM) and the drain of the second transistor (T2), and serves as the output terminal of the in-memory computing unit. The second plate of the capacitor (C1) is grounded.
[0008] The readout circuit includes N column accumulation sub-modules connected to each column of the memory cell array, and a shift-add circuit connected to all N column accumulation sub-modules. Each column accumulation sub-module includes an M-to-1 data selector, a third transistor, and a counter. For each column of memory cells, the outputs of all M memory cells are connected to the input of an M-to-1 data selector for the corresponding column, and the output of the M-to-1 data selector is connected to the gate of the third transistor. The drain of the third transistor is connected to the input of the counter, and the source of the third transistor is grounded. The output of the counter outputs the accumulation result of the column of memory cells. The inherent threshold voltage of the third transistor is used as a reference voltage to distinguish the voltage state on the capacitor (C1) in the memory cell, thereby controlling the conduction or cutoff of the third transistor and triggering the counter to count.
[0009] Furthermore, the workflow of the storage unit includes a reset phase and a multiplication phase;
[0010] During the reset phase, the second transistor is turned on by the reset signal to discharge the charge on the capacitor to ground;
[0011] During the multiplication phase, the result of multiplying a 1-bit input signal by 1-bit weight information is represented by the voltage across the capacitor: if and only if the input signal is logic 1 and the weight information is 1, the first transistor is turned on and the memristor is in the low-resistance state, so the power supply charges the capacitor through the first transistor and the memristor; under all other combinations of input and weight, the capacitor remains at a low level and no charging takes place.
[0012] Furthermore, the first transistor is a PMOS transistor, the second transistor is an NMOS transistor, and the third transistor is an NMOS transistor.
[0013] Furthermore, the memristor and capacitor form an RC low-pass filter to attenuate the high-frequency glitches caused by the clock feedthrough introduced by the parasitic capacitance of the first transistor, so as to stabilize the voltage on the capacitor and ensure the accuracy of the multiplication result.
[0014] Furthermore, the capacitance value $C_1$ is determined based on a charging speed constraint and a noise margin constraint; the charging speed constraint is:
[0015] $$\tau = R_{LRS} C_1 \le \frac{T_{charge}}{n}$$
[0016] where $\tau$ is the charging time constant, $R_{LRS}$ is the low-resistance-state resistance value of the memristor, $T_{charge}$ is the preset charging time, and $n$ is the charging completion coefficient;
[0017] The noise margin constraint is:
[0018] $$\sqrt{\frac{kT}{C_1}} \le \Delta V$$
[0019] where $C_1$ is the capacitance value, $k$ is Boltzmann's constant, $T$ is the temperature, and $\Delta V$ is the allowable voltage fluctuation;
[0020] Under the premise of satisfying the charging speed constraint and the noise margin constraint, the minimum capacitance value $C_1$ is selected to reduce power consumption.
[0021] Furthermore, the counter is a falling-edge-triggered counter. When the capacitor in the in-memory computing unit is fully charged, its output voltage is passed to the gate of the third transistor via the M-to-1 data selector. When this voltage is greater than the threshold voltage of the third transistor, the third transistor is turned on, generating a falling edge at its drain that triggers the counter to increment by one, thus accumulating the multiplication result. When the capacitor (C1) is discharged, the third transistor is turned off and the counter does not count. Finally, the counter outputs the accumulated value of all multiplication results in the corresponding column.
[0022] Furthermore, the input terminals of the shift-add circuit are respectively connected to the output terminals of the counters in each column accumulation submodule, and are used to perform weighted fusion of the accumulation results of each column storage unit, outputting the final multiply-accumulate operation result MAC_OUT, whose bit width is... .
[0023] Furthermore, in the memory cell array, the gate of the first transistor of each row of memory cells is connected to the same 1-bit input signal, and each column of memory cells corresponds to the same bit in the N-bit weight, which is used to realize the parallel multiplication operation of the 1-bit input with all the corresponding bits of the weight in the column.
[0024] Furthermore, the high-resistance state (HRS) of the memristor corresponds to a weight of 0, and the low-resistance state (LRS) corresponds to a weight of 1.
[0025] Furthermore, the charging frequency f of the capacitor in the in-memory computing unit ranges from 10 kHz to 1 GHz.
[0026] Compared with the prior art, the present invention has the following beneficial effects:
[0027] (1) Low power consumption: First, traditional memory computing units represent the multiplication result through current, which requires a continuous current path and generates a large amount of static power consumption. The present invention realizes the "multiplication" process by charging and discharging the capacitor in the proposed 2T1R1C unit, which transforms the static power consumption in the traditional current domain into the dynamic power consumption in the charge domain. The static power consumption is negligible. Second, traditional memory computing units need to initialize the memory computing unit through a pre-charging process, which adds extra dynamic power consumption. The present invention eliminates this process and only charges the capacitor in the "multiplication: 1×1" scenario. It does not charge in the other three scenarios, reducing the number of capacitor charging times and further reducing dynamic power consumption. Finally, the readout circuit judges the voltage on the memory computing unit capacitor through the threshold voltage of NMOS. No additional reference source generation circuit is required, which can reduce the overhead of the external circuit and thus reduce the overall power consumption.
[0028] (2) High precision: First, the in-memory computing unit in this invention operates in the charge domain, so a change in RRAM resistance only affects the charging and discharging speed of the capacitor without changing the multiplication result. In addition, the low-pass filter formed by the RRAM and the capacitor effectively attenuates high-frequency interference and avoids read errors caused by incomplete charging or discharging of the capacitor. Second, the readout circuit directly reads the high-precision output of the in-memory computing unit without introducing errors from additional analog signal processing stages, ensuring the overall output accuracy of the multiply-accumulate operation.
Attached Figure Description
[0029] Figure 1 is a schematic diagram of the overall structure of the low-power charge domain in-memory computing unit and its readout circuit provided in an embodiment of the present invention.
[0030] Figure 2 is a schematic diagram of the structure of the 2T1R1C charge domain in-memory computing unit in an embodiment of the present invention.
[0031] Figure 3 is a schematic diagram of the 2T1R1C charge domain in-memory computing unit performing a "reset" operation in an embodiment of the present invention.
[0032] Figure 4 is a schematic diagram (four operating conditions) of the 2T1R1C charge domain in-memory computing unit performing 1-bit "multiplication" in an embodiment of the present invention.
[0033] Figure 5 is a power consumption comparison diagram between the 2T1R1C charge domain in-memory computing unit and the control group unit (which uses current to obtain the multiplication result) in an embodiment of the present invention.
Detailed Implementation
[0034] The present invention will be further described below with reference to the accompanying drawings and embodiments.
[0035] It should be noted that the following detailed descriptions are exemplary and intended to provide further explanation of this application. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains.
[0036] It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to this application. As used herein, the singular form is intended to include the plural form as well, unless the context clearly indicates otherwise. Furthermore, it should be understood that when the terms "comprising" and / or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and / or combinations thereof.
[0037] As shown in Figure 1, this embodiment provides a low-power charge domain in-memory computing unit using 2T1R1C and its readout circuit, including an M-row × N-column 2T1R1C charge domain in-memory computing unit array (M is a perfect square and N is a natural number) and a readout circuit.
[0038] As shown in Figure 2, the in-memory computing unit includes a first transistor T1, a second transistor T2, a memristor RRAM, and a capacitor C1. The gate of the first transistor T1 receives a 1-bit input signal, and the source of the first transistor T1 is connected to the power supply. The first terminal of the memristor RRAM is connected to the drain of the first transistor T1, and the second terminal of the memristor RRAM is connected to the drain of the second transistor T2. The resistance value of the memristor RRAM represents 1-bit weight information. The gate of the second transistor T2 receives a reset signal, and the source of the second transistor T2 is grounded. The first plate of the capacitor C1 is connected to the common node between the second terminal of the memristor RRAM and the drain of the second transistor T2, and serves as the output terminal of the in-memory computing unit. The second plate of the capacitor C1 is grounded. The first transistor T1 is a PMOS transistor, and the second transistor T2 is an NMOS transistor.
[0039] The memristor RRAM and capacitor C1 form an RC low-pass filter that attenuates high-frequency glitches caused by clock feedthrough through the parasitic capacitance of the first transistor T1, thereby stabilizing the voltage across capacitor C1 and ensuring the accuracy of the multiplication result. The high-resistance state (HRS) of the memristor RRAM corresponds to a weight of 0, and the low-resistance state (LRS) corresponds to a weight of 1.
[0040] The capacitance value of capacitor C1 is determined based on a charging speed constraint and a noise margin constraint.
[0041] The charging speed constraint is:
[0042] $$\tau = R_{LRS} C_1 \le \frac{T_{charge}}{n}$$
[0043] where $\tau$ is the charging time constant, $R_{LRS}$ is the low-resistance-state resistance value of the memristor, $T_{charge}$ is the preset charging time, and $n$ is the charging completion coefficient.
[0044] The noise margin constraint is:
[0045] $$\sqrt{\frac{kT}{C_1}} \le \Delta V$$
[0046] where $C_1$ is the capacitance value, $k$ is Boltzmann's constant, $T$ is the temperature, and $\Delta V$ is the allowable voltage fluctuation.
[0047] Under the premise of satisfying the charging speed constraint and the noise margin constraint, the minimum capacitance value $C_1$ is selected to reduce power consumption.
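As an illustration of how the two constraints bound C1, the following Python sketch computes the feasible window and picks its lower end. It assumes the speed constraint has the form n·R_LRS·C1 ≤ T_charge and the noise constraint is the thermal kT/C limit sqrt(kT/C1) ≤ ΔV. The function name `c1_window` and all numeric values (10 kΩ LRS, 10 ns charging time, n = 3, 5 mV allowable fluctuation, 300 K) are illustrative assumptions, not values fixed by this embodiment.

```python
K_BOLTZMANN = 1.380649e-23  # Boltzmann's constant in J/K


def c1_window(r_lrs, t_charge, n, delta_v, temp_k=300.0):
    """Return (c_min, c_max) for C1 in farads, or None if no value fits.

    c_min follows from the assumed noise constraint sqrt(kT/C1) <= delta_v.
    c_max follows from the assumed speed constraint n * r_lrs * C1 <= t_charge.
    """
    c_min = K_BOLTZMANN * temp_k / delta_v ** 2
    c_max = t_charge / (n * r_lrs)
    return (c_min, c_max) if c_min <= c_max else None


# Hypothetical operating point: all numbers are placeholders.
window = c1_window(r_lrs=10e3, t_charge=10e-9, n=3, delta_v=5e-3)
if window is None:
    print("No C1 satisfies both constraints")
else:
    c_min, c_max = window
    print(f"C1 may lie in [{c_min * 1e15:.3f} fF, {c_max * 1e15:.1f} fF]")
    print("Lowest-power choice is the lower bound c_min")
```

In practice parasitic capacitance and process variation would push the chosen value above the thermal lower bound, so the window is a starting point rather than a final design value.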
[0048] The workflow of the in-memory computing unit includes a reset phase and a multiplication phase. During the reset phase, a reset signal turns on the second transistor T2, discharging the charge on capacitor C1 to ground. During the multiplication phase, the result of multiplying a 1-bit input signal by 1-bit weight information is represented by the voltage across capacitor C1: if and only if the input signal is logic 1 and the weight information is 1, the first transistor T1 is turned on and the memristor RRAM is in the low-resistance state, so the power supply charges capacitor C1 through the first transistor T1 and the memristor RRAM; under all other combinations of input and weight, capacitor C1 remains at a low level and no charging takes place.
[0049] In the array of storage units, the gate of the first transistor T1 of each row of storage units is connected to the same 1-bit input signal, and each column of storage units corresponds to the same bit in the N-bit weight, which is used to realize the parallel multiplication operation of the 1-bit input and all the corresponding bits of the weight in the column.
[0050] The readout circuit includes N column accumulation sub-modules connected to each column of the memory unit array, and a shift-add circuit connected to the N column accumulation sub-modules simultaneously.
[0051] The column accumulation submodule includes an M-to-1 data selector, a third transistor, and a counter. For each column of memory-based units, the outputs of all M memory-based units are connected to the input of an M-to-1 data selector for the corresponding column. The output of the M-to-1 data selector is connected to the gate of the third transistor. The drain of the third transistor is connected to the input of the counter, and the source of the third transistor is grounded. The output of the counter outputs the accumulation result of the memory-based units in that column. The inherent threshold voltage of the third transistor is used as a reference voltage to distinguish the voltage state on capacitor C1 in the memory-based unit, thereby controlling the conduction or cutoff of the third transistor and triggering the counter to count.
[0052] The counter is a falling-edge-triggered counter. When the capacitor C1 in the in-memory computing unit is fully charged, its output voltage is passed to the gate of the third transistor through the M-to-1 data selector. When this voltage is greater than the threshold voltage of the third transistor, the third transistor is turned on and a falling edge is generated at its drain, triggering the counter to increment by one and thus accumulating the multiplication result. When the capacitor C1 is discharged, the third transistor is turned off and the counter does not count. Finally, the counter outputs the accumulated value of all multiplication results in the corresponding column.
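The column accumulation described above can be sketched behaviourally in Python. The sketch treats the third transistor as an ideal comparator against its threshold voltage and the counter as an integer. The function name `column_count`, the 0.4 V threshold, and the 1.0 V supply are assumptions made for illustration only.

```python
def column_count(cap_voltages, v_th=0.4):
    """Count how many capacitor voltages in one column exceed the NMOS threshold.

    Models the M-to-1 selector scanning the column one unit at a time: each
    voltage above v_th turns the third transistor on, which produces one
    falling edge at its drain and therefore one counter increment.
    """
    count = 0
    for v in cap_voltages:   # selector picks unit 0, 1, ..., M-1 in turn
        if v > v_th:         # third transistor conducts
            count += 1       # falling edge at the drain, counter += 1
    return count


VDD = 1.0  # assumed supply voltage
# Four units in one column: two charged (1x1 cases), two left at ground.
print(column_count([VDD, 0.0, 0.95 * VDD, 0.0]))  # prints 2
```

Because the decision is a comparison against the transistor threshold, a capacitor charged to 95% of the supply is counted the same as a fully charged one, which is why no separate reference voltage is modelled.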
[0053] The input terminals of the shift-add circuit are connected to the output terminals of each counter, and are used to perform weighted fusion of the accumulated results of each column of storage units, outputting the final multiply-accumulate result MAC_OUT, whose bit width is... .
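The weighted fusion performed by the shift-add circuit can be illustrated with a short Python sketch. It assumes that column j stores bit j of each unsigned weight with column 0 as the least significant bit; this ordering, and the helper names `column_counts` and `shift_add`, are assumptions and not details stated in this embodiment.

```python
def column_counts(inputs, weights, n_bits):
    """Per-column counter values: column j counts the rows where input AND bit j of the weight are 1."""
    return [
        sum(x & ((w >> j) & 1) for x, w in zip(inputs, weights))
        for j in range(n_bits)
    ]


def shift_add(counts):
    """Weighted fusion: shift the count of column j left by j bits and accumulate."""
    mac_out = 0
    for j, count in enumerate(counts):
        mac_out += count << j
    return mac_out


inputs = [1, 0, 1, 1]    # M = 4 one-bit inputs
weights = [5, 7, 2, 4]   # M = 4 unsigned weights, N = 3 bits each
counts = column_counts(inputs, weights, n_bits=3)
print(counts)            # [1, 1, 2]
print(shift_add(counts)) # 11, equal to 5 + 2 + 4
```

The final value matches the direct sum of input times weight, which is the property the column-wise 1-bit multiplication followed by shift-add relies on.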
[0054] The relevant technical content involved in this invention will be further explained below.
[0055] In this embodiment, the 2T1R1C charge domain in-memory computing array has a size of M rows × N columns (containing M × N in-memory computing units, where M is a perfect square and N is a natural number). The high-resistance (HRS) and low-resistance (LRS) states of the RRAM in the units correspond to weights 0 and 1, respectively. The M-row × N-column array corresponds to a √M × √M convolution kernel in a neural network, which contains M N-bit weights. Each column of units stores and operates on one bit of the weights, and each row of units is connected to a 1-bit input signal, so the M rows correspond to M parallel 1-bit input signals. After the array receives the 1-bit voltage signals, each is multiplied by the weights of the units in its row. The multiplication result is represented by the capacitor voltage in each unit. After selection by a data selector, the capacitor voltage is read using an NMOS transistor in the readout circuit. The counter and the shift-add circuit in the readout circuit then perform accumulation and weighting in sequence, finally outputting the accumulated result of the 1-bit inputs multiplied by the convolution kernel. The connection method and working principle of the in-memory computing units and the readout circuit are explained in detail below.
[0056] (1) 2T1R1C charge domain storage unit
[0057] 1) Circuit connection method. As shown in Figure 2, the 2T1R1C charge domain in-memory computing unit consists of transistors T1 and T2, a memristor RRAM, and capacitor C1. T1, the RRAM, and T2 are connected in series, while C1 is connected in parallel with T2. The gate of T1 is connected to the input signal. An input of 1 is represented by a low level, which turns T1 on; an input of 0 is represented by a high level, which turns T1 off. The resistance of the memristor RRAM can be switched between the high-resistance state (HRS) and the low-resistance state (LRS), corresponding to weights 0 and 1, respectively. The gate of T2 is connected to the reset signal, which is used to "reset" the unit. The upper plate of capacitor C1 is the output of the unit and is connected to the subsequent readout circuit, which reads the voltage across the capacitor.
[0058] 2) Workflow. The workflow of the 2T1R1C charge domain memory cell consists of two parts: "reset" and "multiplication" operations.
[0059] When performing a "reset" operation, as shown in Figure 3, the arrival of the reset signal turns on transistor T2 and the SL terminal is grounded, completely discharging the charge on C1 to ground and ensuring that no residual charge remains on C1.
[0060] When performing a "multiplication" operation, as shown in Figure 4, the 1-bit multiplication of the in-memory computing unit covers four operating conditions. For "multiplication: 1×1", a low-level signal ("input 1") is applied to the gate of T1, so T1 is turned on, and the RRAM is in the low-resistance state (LRS), representing a weight of 1. The power supply charges capacitor C1 through the low-resistance path formed by T1 and the RRAM, and charge from the power supply is transferred to and stored on C1. For "multiplication: 0×1", a high-level signal ("input 0") is applied to the gate of T1, so T1 is turned off, and the RRAM is in the LRS, representing a weight of 1. The power supply cannot charge C1 through the high-resistance path formed by T1 and the RRAM, so C1 remains at a low level and no charge is transferred from the power supply to C1. For "multiplication: 0×0", a high-level signal ("input 0") is applied to the gate of T1, so T1 is turned off, and the RRAM is in the high-resistance state (HRS), representing a weight of 0. The power supply cannot charge C1 through the high-resistance path formed by T1 and the RRAM, so C1 remains at a low level and no charge is transferred from the power supply to C1. For "multiplication: 1×0", a low-level signal ("input 1") is applied to the gate of T1, so T1 is turned on, and the RRAM is in the HRS, representing a weight of 0. The power supply cannot charge C1 through the high-resistance path formed by T1 and the RRAM, so C1 remains at a low level and no charge is transferred from the power supply to C1.
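The four operating conditions amount to a logical AND of the input bit and the weight bit, which the following Python sketch tabulates. It is an idealized steady-state model: the HRS path is treated as delivering no charge within the charging window, and the function name `cell_output` and the 1.0 V supply are assumptions for illustration.

```python
def cell_output(input_bit, weight_bit, vdd=1.0):
    """Ideal voltage on C1 after a reset followed by one multiplication phase."""
    t1_on = input_bit == 1       # input 1 is a low gate level, so PMOS T1 conducts
    rram_lrs = weight_bit == 1   # weight 1 is stored as the low-resistance state
    return vdd if (t1_on and rram_lrs) else 0.0


for x in (0, 1):
    for w in (0, 1):
        print(f"multiplication: {x}x{w} -> V(C1) = {cell_output(x, w):.1f} V")
```

Only the 1×1 row yields a non-zero capacitor voltage, matching the single charging case described above.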
[0061] 3) The low power consumption characteristics of the 2T1R1C charge domain memory unit of the present invention are explained from two perspectives as follows.
[0062] ① The voltage across capacitor C1 is used as the result of the multiplication operation. The voltage and charge on the capacitor satisfy the physical relationship $Q = C_1 V$, so the multiplication operation is based on charge storage and transfer on the capacitor, whereas the traditional current domain multiplication operation involves no such charge storage and transfer. The dynamic power consumption generated across the above four operating conditions is therefore called the charge domain power consumption, and can be uniformly expressed by formula (1):
[0063] $$P_{charge} = \alpha C_1 V_{dd}^2 f \tag{1}$$
[0064] where $C_1$ is the capacitance of the multiplication capacitor in the 2T1R1C charge domain in-memory computing unit, $V_{dd}$ is the power supply voltage, $f$ is the frequency at which the capacitor is charged, and $\alpha$ is the activity factor for capacitor charging. The value of $C_1$ is determined below. The value of $f$ must be adapted to the specific needs of the low-power in-memory computing and artificial intelligence hardware acceleration chips targeted by this patent: in high-speed computing scenarios (such as high-throughput data processing), $f$ is set relatively high to meet computing power requirements; in low-speed, low-power scenarios (such as lightweight computing on edge devices), $f$ is set relatively low to control power consumption. Combined with the timing and hardware constraints of the RRAM in-memory computing architecture, $f$ typically ranges from 10 kHz to 1 GHz, with 100 MHz being a common choice in typical applications to balance computational efficiency and power consumption. Regarding $\alpha$: since the unit has no pre-charging process, dynamic power consumption is triggered only when performing a "multiplication: 1×1" operation, in which the power supply charges C1 through T1 and the RRAM. In the other three operating conditions the power supply does not charge C1 and no dynamic power consumption is triggered. Of the four operating conditions of 1-bit multiplication, only "multiplication: 1×1" therefore requires charging C1, which reduces the number of times C1 is charged and optimizes the dynamic power consumption triggering mechanism. Furthermore, if the four operating conditions are independent and equally probable, $\alpha = 1/4$, and power consumption is reduced further in sparse-input or sparse-weight scenarios.
When no charging event occurs, or once C1 has been charged to a steady state, there is no DC conduction path between the power supply and ground in the unit. The static power consumption is then determined mainly by device leakage current and can be approximately ignored, so the static power consumption path is eliminated. Therefore, the 2T1R1C charge domain in-memory computing unit of the present invention has only dynamic power consumption in the charge domain and no static power consumption.
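A small numeric sketch can make the role of the activity factor concrete. It assumes that formula (1) is the standard dynamic power expression P = α·C1·Vdd²·f and that α equals the probability of the 1×1 condition when input and weight bits are independent. The function names, the 10 fF capacitance, and the 1.0 V supply are illustrative assumptions; 100 MHz is the typical frequency mentioned above.

```python
def activity_factor(p_input_one, p_weight_one):
    """Probability of the 1x1 condition, assuming independent input and weight bits."""
    return p_input_one * p_weight_one


def charge_domain_power(c1, vdd, f, alpha):
    """Dynamic power alpha * C1 * Vdd^2 * f of one unit, in watts."""
    return alpha * c1 * vdd ** 2 * f


C1, VDD, F = 10e-15, 1.0, 100e6  # assumed 10 fF and 1.0 V; 100 MHz typical

dense = activity_factor(0.5, 0.5)    # four equally likely conditions -> 0.25
sparse = activity_factor(0.1, 0.5)   # hypothetical case: 10% of inputs are 1

print(f"dense : {charge_domain_power(C1, VDD, F, dense) * 1e9:.1f} nW per unit")
print(f"sparse: {charge_domain_power(C1, VDD, F, sparse) * 1e9:.1f} nW per unit")
```

The sparse case draws one fifth of the dense power in this model, since power scales linearly with α.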
[0065] Traditional current-domain in-memory computing obtains a current by applying the input voltage to the RRAM and uses that current as the result of the multiplication operation. Its static power consumption is expressed by formula (2):
[0066] $$P_{current} = \frac{V_{dd}^2}{R_{eq}} \tag{2}$$
[0067] where $R_{eq}$ is the equivalent resistance under the corresponding weight state, and $V_{dd}$ is the power supply voltage.
[0068] The ratio of charge-domain to current-domain power consumption can be obtained from formulas (1) and (2):
[0069] $$\frac{P_{charge}}{P_{current}} = \alpha C_1 f R_{eq} \tag{3}$$
[0070] Combining formulas (1) to (3), the value of $C_1$ can be derived from the "speed - noise margin - energy consumption" constraints. To ensure that charging completes within the preset charging time $T_{charge}$, $C_1$ must satisfy:
[0071] $$R_{LRS} C_1 \le \frac{T_{charge}}{n}$$
[0072] where $n$ is the charging completion coefficient, typically ranging from 3 to 5, which ensures that the output voltage of the unit reaches more than 95% of the preset voltage. To ensure that the margin of the unit's read voltage meets noise and process variation requirements, one can set $\sqrt{kT/C_1} \le \Delta V$ (where $k$ is the Boltzmann constant and $T$ is the temperature). By selecting the smallest possible $C_1$ that satisfies the above constraints, $P_{charge}$ can be reduced linearly.
[0073] As can be seen from formula (3), power consumption in the current domain is proportional to $1/R_{eq}$, so it increases significantly when the RRAM is in the LRS (smaller $R_{eq}$); power consumption in the charge domain, by contrast, is determined mainly by $C_1$ and the activity factor $\alpha$. $R_{eq}$ enters only through the charging time constant: it affects the time required to complete charging, but not the energy delivered per charge. By eliminating static power consumption paths and optimizing the dynamic power consumption triggering mechanism, power consumption optimization for the "multiplication" operation is achieved.
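The claim that resistance changes the charging time but not the energy per charge can be checked with the ideal RC charging equations. The sketch below assumes an ideal step supply and a fixed series resistance; the function names, the 10 fF capacitance, the 1.0 V supply, and the two resistance values are illustrative assumptions.

```python
import math


def charge_time(r, c1, fraction=0.95):
    """Time for C1 to reach `fraction` of Vdd when charged through resistance r."""
    return -r * c1 * math.log(1.0 - fraction)


def supply_energy(c1, vdd, fraction=0.95):
    """Energy drawn from the supply to raise C1 from 0 V to fraction * Vdd.

    Equals Vdd times the delivered charge C1 * fraction * Vdd; r does not appear.
    """
    return c1 * vdd * (fraction * vdd)


C1, VDD = 10e-15, 1.0  # assumed values
for r_lrs in (1e3, 10e3):
    t95 = charge_time(r_lrs, C1)
    energy = supply_energy(C1, VDD)
    print(f"R = {r_lrs:.0f} ohm: t95 = {t95 * 1e12:.1f} ps, E = {energy * 1e15:.2f} fJ")
```

A tenfold increase in resistance lengthens the 95% charging time tenfold while the energy column stays the same, which is the behaviour the paragraph above relies on.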
[0074] ② There is parasitic capacitance between the gate and drain of T1. A signal arriving at the input terminal is equivalent to an input step, and the clock feedthrough caused by the parasitic capacitance couples this step into the subsequent circuit, generating a glitch voltage at the node. The transfer function of the low-pass filter composed of the RRAM and C1 can be expressed as:
[0075] $$H(s) = \frac{1}{1 + s R_{RRAM} C_1}$$
[0076] From a power consumption perspective, the node voltage glitches caused by clock feedthrough are equivalent to an undesired charge / discharge of capacitor C1. If the glitch amplitude is $\Delta V_g$, the additional energy introduced by each glitch scales as $E_g \propto C_1 \Delta V_g^2$. The RC low-pass network formed by the RRAM and C1 attenuates high-frequency components according to its transfer function, so $\Delta V_g$ shrinks in proportion to $|H(j\omega)|$ and $E_g$ shrinks in proportion to $|H(j\omega)|^2$, reducing the ineffective dynamic power consumption introduced by input transitions.
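To give a feel for the attenuation, the following Python sketch evaluates the magnitude of a first-order RC low-pass response at a few frequencies. It assumes the transfer function is H(s) = 1 / (1 + s·R·C1) and that glitch amplitude scales with |H| and glitch energy with |H|². The function name `lowpass_gain`, the 10 kΩ resistance, and the 10 fF capacitance are illustrative assumptions, and the model ignores the capacitive divider that sets the initial glitch size.

```python
import math


def lowpass_gain(freq_hz, r, c1):
    """Magnitude |H(j*2*pi*f)| of H(s) = 1 / (1 + s*R*C1)."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * freq_hz * r * c1) ** 2)


R_RRAM, C1 = 10e3, 10e-15   # assumed values; corner frequency is about 1.59 GHz
corner = 1.0 / (2.0 * math.pi * R_RRAM * C1)

for f in (100e6, corner, 20e9):
    g = lowpass_gain(f, R_RRAM, C1)
    print(f"f = {f / 1e9:6.2f} GHz: amplitude x{g:.3f}, energy x{g * g:.3f}")
```

Components well above the corner frequency are strongly suppressed, while a 100 MHz signal passes almost unchanged.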
[0077] From an accuracy perspective, the step signal at the input is attenuated by the RC filter, which suppresses high-frequency components at the output node and makes the voltage more stable. This avoids incomplete charging or discharging of the capacitor within the preset charge / discharge cycle, which would cause errors in the subsequent readout circuit. The low-pass filter significantly reduces the interference of clock feedthrough with the charging process of capacitor C1, ensuring the accuracy of the multiplication result output by the in-memory computing unit. Furthermore, when RRAM resistance fluctuates because of process corner variations, traditional current-domain operations suffer from error accumulation. This invention moves the operation of the in-memory computing unit from the current domain to the charge domain, so that RRAM resistance changes only affect the charging and discharging speed of the capacitor without affecting the final calculation result, thus guaranteeing its accuracy.
[0078] Figure 5 shows a power consumption comparison experiment between the charge domain in-memory computing unit of this invention and a traditional current domain unit, for different C1 capacitance values, RRAM types, and capacitor charging frequencies f. Common RRAM types include ECM (Electrochemical Metallization) and VCM (Valence Change Mechanism). The HRS range of ECM is 1 GΩ to 500 kΩ, and the LRS range is 100 kΩ to 1 kΩ; the HRS range of VCM is 100 kΩ to 1 kΩ, and the LRS range is 10 kΩ to 10 Ω. In the Figure 5 experiment, which compares the 2T1R1C charge domain in-memory computing unit with a control group unit (using current to obtain multiplication results), the typical HRS value of the ECM type was 1 MΩ and the typical LRS value was 10 kΩ; the typical HRS value of the VCM type was 5 kΩ and the typical LRS value was 1 kΩ. The experimental results in Figure 5 show that charge domain operation effectively saves power in multiplication operations compared with current domain operation.
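For orientation only, the power ratio between the two domains can be evaluated at the typical resistance values quoted above. The sketch assumes the ratio takes the form α·C1·f·R (charge-domain power α·C1·Vdd²·f divided by current-domain power Vdd²/R). The 10 fF capacitance and α = 0.25 are assumptions, 100 MHz is the typical frequency mentioned earlier, and the printed numbers are not the Figure 5 results.

```python
def power_ratio(alpha, c1, f, r_eq):
    """Charge-domain power divided by current-domain power: alpha * C1 * f * R_eq."""
    return alpha * c1 * f * r_eq


ALPHA, C1, F = 0.25, 10e-15, 100e6  # assumed activity factor and capacitance

# Typical resistance values quoted for the Figure 5 experiment, in ohms.
typical = {
    "ECM": {"HRS": 1e6, "LRS": 10e3},
    "VCM": {"HRS": 5e3, "LRS": 1e3},
}

for rram_type, states in typical.items():
    for state, r_eq in states.items():
        ratio = power_ratio(ALPHA, C1, F, r_eq)
        print(f"{rram_type} {state} ({r_eq:.0f} ohm): P_charge / P_current = {ratio:.5f}")
```

Under these assumptions the ratio stays below one for every listed state and is smallest for the low-resistance states, where the current-domain unit draws the most static power.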
[0079] (2) Readout circuit
[0080] As shown in Figure 1, the readout circuit consists of N column accumulation submodules and a shift-add circuit. Each column accumulation submodule corresponds to one column of the memory cell array and includes an M-to-1 data selector, an NMOS transistor (i.e., the third transistor), and a counter. The capacitor voltage of each memory cell serves as that cell's output and is connected to an input of the data selector in the corresponding column accumulation submodule; the output of the data selector is connected to the gate of the NMOS transistor; the drain of the NMOS transistor is connected to the input of the counter, and its source is grounded; the output of the counter is the accumulation result of the memory cells in that column and is connected to the shift-add circuit. The shift-add circuit performs weighted fusion of the accumulation results of all columns and outputs the final multiply-accumulate result. The array size can be flexibly configured as M rows × N columns, containing M × N memory cells. Here, M is a perfect square, adaptable to a … convolution kernel, which requires M weights to be stored. A convolution kernel is a fixed-size weight matrix used for feature extraction in neural network algorithms, and it is applied by sliding convolution over the input data. N is a natural number representing the number of bits of each weight in the convolution kernel. Each column of memory cells stores and operates on 1 bit of the weights, and the N columns together store the N-bit weights in parallel. Each row of memory cells receives 1 bit of the input signal. The M-row × N-column memory cell array thus corresponds to a ... convolution kernel containing M N-bit weights.
[0081] The input signal of row M is applied to every 2T1R1C charge domain memory cell in that row. After the memory cells complete their operation, the output of the memory cell in row M and column N is OUT_{M,N}. The outputs OUT_{1~M,N} of the M memory cells in column N are sent to the subsequent M-to-1 data selector, whose output is connected to the gate of an NMOS transistor. The drain of the NMOS transistor is connected to the input of a counter, and the counter outputs the accumulated result DOUT_N of the Nth column. The accumulated results DOUT_{1~N} of the N columns are fed into the shift-add circuit to obtain the final quantization result MAC_OUT<0:K-1>, where the relationship between K, M, and N is given by formula (6):
[0082]
[0083] The output of the memory cell has only two states. When the capacitor in the memory cell is fully charged, its output is selected by the data selector and transmitted to the gate of the NMOS in the readout circuit; the NMOS gate voltage is then greater than the NMOS threshold voltage, and the NMOS is turned on. When the capacitor in the memory cell is fully discharged, its output is likewise selected by the data selector and transmitted to the gate of the NMOS; the NMOS gate voltage is then less than the NMOS threshold voltage, and the NMOS is turned off.
[0084] The counter circuit is triggered by a falling edge. When the NMOS in the readout circuit is turned on, a falling edge is generated, triggering the counter and incrementing the count by 1. The counter counts the number of fully charged capacitors in the M rows of memory cells. The voltage of the capacitors in the memory cells represents the result of multiplying 1 bit of input by 1 bit of weight, and the counter's count represents the accumulated value of the multiplication result of 1 bit of input and 1 bit of weight.
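The column accumulation described above can be sketched behaviorally as follows. This is a minimal model under stated assumptions: the function name `column_count` is hypothetical, and the model assumes that the NMOS drain returns to a high level between successive data selector selections, so that each fully charged capacitor produces its own falling edge. The source text does not specify how the drain is restored.

```python
def column_count(cap_voltages, v_th):
    """Count the cells of one column whose capacitor voltage exceeds the NMOS threshold.

    cap_voltages: the M capacitor voltages of one column, in data selector order.
    v_th: threshold voltage of the readout NMOS, used as the implicit reference.
    """
    count = 0
    for v_cap in cap_voltages:            # M-to-1 data selector picks one cell at a time
        drain_before = 1                  # assumed: drain is high before each selection
        nmos_on = v_cap > v_th            # threshold voltage distinguishes the two states
        drain_after = 0 if nmos_on else 1 # a conducting NMOS pulls its drain to ground
        if drain_before == 1 and drain_after == 0:
            count += 1                    # falling edge triggers the counter
    return count


# Four cells in a column: three charged to about VDD, one discharged.
print(column_count([1.0, 0.0, 1.0, 1.0], v_th=0.4))
```

The returned value corresponds to the accumulated result of the 1-bit products in one column.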
[0085] Through its column accumulation submodule, each single-column readout circuit accumulates the products of the 1-bit inputs and the weight bits of the convolution kernel stored in that column. The results of the N column readout circuits are input synchronously to the shift-add circuit for weighted fusion, which finally outputs the result of the convolution operation between the 1-bit input and the convolution kernel with N-bit weights.
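The weighted fusion performed by the shift-add circuit can be sketched as follows. The bit ordering (column 0 holding the least significant weight bit) and the function names are assumptions made for illustration. The helper `sufficient_bits` only computes an output width that is large enough under that assumption; it is not a reproduction of formula (6).

```python
def shift_add(column_counts):
    """Weighted fusion: the count of column n is shifted left by n bits and summed.

    column_counts[0] is assumed to belong to the least significant weight bit.
    """
    total = 0
    for n, dout in enumerate(column_counts):
        total += dout << n
    return total


def sufficient_bits(m_rows, n_cols):
    """Width that can hold the largest result (all M cells charged in every column)."""
    return shift_add([m_rows] * n_cols).bit_length()


# Column counts 1, 1, 2 for weight bits 0, 1, 2 give 1*1 + 1*2 + 2*4.
print(shift_add([1, 1, 2]))
print(sufficient_bits(9, 4))
```

For example, with 3-bit weights 5, 3, and 6 and 1-bit inputs 1, 0, and 1, the per-column counts are 1, 1, and 2, and the fused result equals 5 + 6.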
[0086] The low-power analysis of the readout circuit can be summarized as follows. The threshold voltage of the NMOS transistor is used to distinguish the voltage state of the capacitor in the memory cell, eliminating additional energy-consuming circuits such as a reference voltage generator and voltage divider resistor arrays. At the hardware architecture level, this removes units that consume power continuously: dynamic power is consumed only while the capacitors charge and discharge, and static power consumption is close to zero. At the same time, the NMOS transistor converts the capacitor voltage directly into a turn-on/turn-off signal, replacing the traditional approach in which a comparator compares the computation result with a reference voltage to obtain the readout result. This prevents residual charge on the capacitor from causing counting errors in the subsequent counter stage and removes energy-consuming steps such as analog signal processing, further lowering dynamic power consumption.
[0087] In summary, the low-power charge-domain memory cell and its readout circuit proposed in this invention, employing a 2T1R1C architecture, exhibit low-power characteristics: the memory cell abandons traditional current-domain operations and pre-charging processes, using the capacitor voltage to represent the 1-bit multiplication result, and charging only during "multiplication: 1×1," thus converting static power consumption into dynamic power consumption; the readout circuit replaces the additional reference source generation circuit with the inherent threshold voltage of the NMOS transistor. Simultaneously, the counter is triggered by a falling edge only when the "multiplication: 1×1" result is generated, and in conjunction with the data selector's selection operation, system power consumption is reduced, adapting to the requirements of low-power in-memory computing scenarios.
[0088] The low-power charge domain memory cell and its readout circuit proposed in this invention, employing a 2T1R1C architecture, also offer high precision. When the process corner changes, variations in the RRAM resistance value lead to accumulated errors in traditional current-domain operations. This invention instead performs the operation in the charge domain, so that variations in the RRAM resistance value affect only the capacitor's charging and discharging speed and have no impact on the final calculation result, thus ensuring the accuracy of the calculation. Furthermore, the low-pass filter formed by the RRAM and the capacitor in the proposed cell effectively attenuates high-frequency glitches and clock feedthrough interference in the input signal, avoiding read errors caused by incomplete capacitor charging or discharging and ensuring stable multiplication results. The readout circuit reads the output of the memory cell directly, without introducing errors from additional analog signal processing stages. The readout result is therefore also highly precise, ensuring the output accuracy of the multiply-accumulate operation.
[0089] This invention serves as the core circuit for RRAM in-memory computation. Before a multiply-accumulate operation, the computational unit is first reset by a reset signal. The 2T1R1C charge domain computational unit then multiplies a 1-bit input by a 1-bit weight. Next, the M-to-1 data selector of the readout circuit sequentially selects the M computational units in the corresponding column, and the threshold voltage of the NMOS transistor is used to distinguish the capacitor voltage, triggering a counter that accumulates the results of the "multiplication: 1×1" operation. Finally, the accumulated results of the columns are weighted by the shift-add circuit and output as the result of the convolution kernel operation. This invention is compatible with mainstream integrated circuit design platforms such as Cadence, ADS, Hspice, and PSpice, and can be flexibly integrated into in-memory computing macro circuits, adapting to the design and application needs of low-power, high-precision in-memory computing scenarios.
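The workflow above (reset, 1-bit multiplication, column counting, shift-add) can be checked against an ordinary integer multiply-accumulate with a logic-level model. This is a minimal sketch: the class and function names are hypothetical, voltages are abstracted to 0 and 1, and bit 0 of each weight is assumed to be stored in column 0.

```python
class Cell2T1R1C:
    """Logic-level model of one 2T1R1C cell (capacitor state abstracted to 0/1)."""

    def __init__(self, weight_bit):
        self.weight_bit = weight_bit  # 1 = LRS, 0 = HRS
        self.charged = 0              # state of C1

    def reset(self):
        self.charged = 0              # T2 conducts: C1 is discharged to ground

    def multiply(self, input_bit):
        # C1 charges only when the input is 1 and the RRAM is in LRS (weight 1).
        self.charged = input_bit & self.weight_bit


def mac(weights, inputs, n_bits):
    """Multiply-accumulate of 1-bit inputs with n_bits-wide weights."""
    # array[m][n] stores bit n of weight m (bit 0 assumed least significant).
    array = [[Cell2T1R1C((w >> n) & 1) for n in range(n_bits)] for w in weights]
    for row in array:                 # reset phase
        for cell in row:
            cell.reset()
    for row, x in zip(array, inputs): # multiplication phase, one input bit per row
        for cell in row:
            cell.multiply(x)
    # Column counters: number of charged capacitors in each column.
    dout = [sum(row[n].charged for row in array) for n in range(n_bits)]
    # Shift-add: weight each column count by its bit significance.
    return sum(d << n for n, d in enumerate(dout))


weights = [5, 3, 6, 7]   # M = 4 weights, each 3 bits wide
inputs = [1, 0, 1, 1]    # one 1-bit input per row
print(mac(weights, inputs, n_bits=3), sum(w * x for w, x in zip(weights, inputs)))
```

Both printed values agree, which illustrates that counting charged capacitors per column and weighting the counts by bit position reproduces the integer multiply-accumulate result.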
[0090] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention in any other way. Any person skilled in the art may make changes or modifications to the above-disclosed technical content to create equivalent embodiments. However, any simple modifications, equivalent changes, and modifications made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention shall still fall within the protection scope of the present invention.
Claims
1. A low-power charge domain memory cell employing a 2T1R1C and its readout circuit, characterized in that it includes an M-row × N-column 2T1R1C charge domain memory cell array and a readout circuit, where M is a perfect square and N is a natural number; each memory cell includes a first transistor (T1), a second transistor (T2), a memristor (RRAM), and a capacitor (C1). The gate of the first transistor (T1) is used to receive a 1-bit input signal, and the source of the first transistor (T1) is connected to the power supply. The first terminal of the memristor (RRAM) is connected to the drain of the first transistor (T1), and the second terminal of the memristor (RRAM) is connected to the drain of the second transistor (T2). The resistance value of the memristor (RRAM) represents 1-bit weight information. The gate of the second transistor (T2) is used to receive a reset signal, and the source of the second transistor (T2) is grounded. The first plate of the capacitor (C1) is connected to the common node between the second terminal of the memristor (RRAM) and the drain of the second transistor (T2), serving as the output terminal of the memory cell. The second plate of the capacitor (C1) is grounded. The readout circuit includes N column accumulation sub-modules, one connected to each column of the memory cell array, and a shift-add circuit connected to all N column accumulation sub-modules. Each column accumulation sub-module includes an M-to-1 data selector, a third transistor, and a counter. For each column of memory cells, the outputs of all M memory cells are connected to the inputs of the M-to-1 data selector for the corresponding column, and the output of the M-to-1 data selector is connected to the gate of the third transistor. The drain of the third transistor is connected to the input of the counter, and the source of the third transistor is grounded. The counter outputs the accumulation result of the column of memory cells.
The inherent threshold voltage of the third transistor is used as a reference voltage to distinguish the voltage state on the capacitor (C1) in the memory cell, thereby controlling the conduction or cutoff of the third transistor and triggering the counter to count.
2. The low-power charge domain memory unit and its readout circuit using 2T1R1C as described in claim 1, characterized in that the workflow of the in-memory computing unit includes a reset phase and a multiplication phase; during the reset phase, the second transistor (T2) is turned on by the reset signal, which discharges the charge on the capacitor (C1) to ground; during the multiplication phase, the result of multiplying the 1-bit input signal by the 1-bit weight information is represented by the voltage across the capacitor (C1): if and only if the input signal is logic 1 and the weight information is 1, the first transistor (T1) is turned on, the memristor (RRAM) is in the low-resistance state, and the power supply charges the capacitor (C1) through the first transistor (T1) and the memristor (RRAM); under all other combinations of input and weight, the capacitor (C1) remains at a low level and no charging takes place.
3. The low-power charge domain memory unit and its readout circuit using 2T1R1C as described in claim 1, characterized in that, The first transistor (T1) is a PMOS transistor, the second transistor (T2) is an NMOS transistor, and the third transistor is an NMOS transistor.
4. The low-power charge domain memory unit and its readout circuit using 2T1R1C as described in claim 1, characterized in that, The memristor (RRAM) and capacitor (C1) form an RC low-pass filter to attenuate the high-frequency glitches caused by the clock feedthrough introduced by the parasitic capacitance of the first transistor (T1), so as to stabilize the voltage on capacitor (C1) and ensure the accuracy of the multiplication result.
5. The low-power charge domain memory unit and its readout circuit using 2T1R1C as described in claim 1, characterized in that the capacitance value of the capacitor (C1) is determined based on a charging speed constraint and a noise margin constraint; the charging speed constraint is: [missing information], where the symbols denote, in order, the charging time constant, the low-resistance state resistance value of the memristor, the preset charging time, and the charging completion rate coefficient; the noise margin constraint is: [missing information], where the symbols denote, in order, the capacitance value of capacitor (C1), Boltzmann's constant, the temperature, and the allowable voltage fluctuation; on the premise that both the charging speed constraint and the noise margin constraint are satisfied, the minimum capacitance value is selected to reduce power consumption.
6. The low-power charge domain memory unit and its readout circuit using 2T1R1C as described in claim 1, characterized in that the counter is a falling-edge-triggered counter. When the capacitor (C1) in the storage unit is fully charged, its output voltage is passed to the gate of the third transistor through the M-to-1 data selector; because this voltage is greater than the threshold voltage of the third transistor, the third transistor is turned on and a falling edge is generated at its drain, triggering the counter to increment by one and thus accumulating the multiplication result. When the capacitor (C1) is discharged, the third transistor is cut off and the counter does not count; finally, the counter outputs the cumulative value of all multiplication results in the corresponding column.
7. The low-power charge domain memory unit and its readout circuit using 2T1R1C as described in claim 1, characterized in that the input terminals of the shift-add circuit are respectively connected to the output terminals of the counters in each column accumulation submodule. This circuit performs weighted fusion of the accumulation results from each column's storage units, outputting the final multiply-accumulate result MAC_OUT, with a bit width of [missing information].
8. The low-power charge domain memory unit and its readout circuit using 2T1R1C as described in claim 1, characterized in that, In the array of storage units, the gate of the first transistor (T1) of each row of storage units is connected to the same 1-bit input signal, and each column of storage units corresponds to the same bit in the N-bit weight, which is used to realize the parallel multiplication operation of the 1-bit input and all the corresponding bits of the weight in the column.
9. The low-power charge domain memory unit and its readout circuit using 2T1R1C as described in claim 1, characterized in that, The high-resistance state HRS of the memristor (RRAM) corresponds to a weight of 0, and the low-resistance state LRS corresponds to a weight of 1.
10. The low-power charge domain memory unit and its readout circuit using 2T1R1C as described in claim 1, characterized in that, The charging frequency f of the capacitor (C1) in the memory unit ranges from 10kHz to 1GHz.
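The capacitance selection of claim 5 can be illustrated with the following sketch. The constraint formulas themselves are not reproduced in this text, so the forms used here are assumptions chosen to match the listed quantities: the charging speed constraint is taken as k · R_LRS · C1 ≤ T_charge (the charging window spans at least k time constants), and the noise margin constraint is taken as the thermal noise limit sqrt(k_B · T / C1) ≤ ΔV. The function name, parameter names, and numerical values are hypothetical.

```python
K_B = 1.380649e-23  # Boltzmann constant in J/K


def select_c1(r_lrs, t_charge, k_complete, temperature, delta_v):
    """Return the smallest C1 (in farads) meeting both assumed constraints.

    Assumed speed constraint: k_complete * r_lrs * C1 <= t_charge
    Assumed noise constraint: sqrt(K_B * temperature / C1) <= delta_v
    """
    c_max = t_charge / (k_complete * r_lrs)   # upper bound from charging speed
    c_min = K_B * temperature / delta_v ** 2  # lower bound from thermal noise
    if c_min > c_max:
        raise ValueError("no capacitance satisfies both constraints")
    return c_min                              # minimum value, to reduce power


# Illustrative values only: 10 kOhm LRS, 10 ns window, 5 time constants,
# 300 K, and 1 mV of allowable voltage fluctuation.
c1 = select_c1(r_lrs=10e3, t_charge=10e-9, k_complete=5, temperature=300.0, delta_v=1e-3)
print(f"C1 = {c1 * 1e15:.2f} fF")
```

Choosing the lower bound reflects the statement in claim 5 that the minimum capacitance satisfying both constraints is selected, since the energy drawn per charging event grows with C1.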