Analog domain full precision in-memory computing circuit and method based on magnetic tunnel junction computing cells

By using a 3T2M magnetic tunnel junction computing unit and dynamically adjusting the number of ADC comparator enable units, the problem of high power consumption in sparse vector matrix multiplication and accumulation operations of the MRAM in-memory computing architecture is solved, and efficient analog domain in-memory computing is achieved.

CN115390789B · Active · Publication Date: 2026-06-12 · SOUTHEAST UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SOUTHEAST UNIV
Filing Date
2022-08-26
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing MRAM-based in-memory computing architectures consume a lot of power in sparse vector matrix multiplication and accumulation operations, making it difficult to effectively reduce energy consumption and hardware resource consumption.

Method used

The system employs a magnetic random access memory (MRAM) computing array built from computing units with 3 transistors and 2 magnetic tunnel junctions (3T2M), a pulse generation circuit, a timing control circuit, an accumulation circuit, a multiplexer, an input-sensitive parallel analog-to-digital converter (Flash ADC), and an enable signal generation circuit. Through built-in multiplication and dynamic adjustment of the number of enabled ADC comparators, it achieves full-precision in-memory computation in the analog domain.

Benefits of technology

It improves computational energy efficiency and reduces power consumption, especially significantly improving computational efficiency and accuracy in sparse vector matrix multiplication and accumulation operations.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses an analog domain full-precision in-memory computing circuit and method based on a magnetic tunnel junction computing unit, comprising a 3-transistor 2-magnetic-tunnel-junction (3T2M) magnetic random access memory (MRAM) computing array, a pulse generation circuit, a timing control circuit, an accumulation circuit, a multiplexer, an input-sensitive parallel analog-to-digital converter (Flash ADC), an enable signal generation circuit, and a digital shift accumulator. The invention realizes built-in multiplication in in-memory computing mode using the 3T2M computing unit, improves computing unit yield through two complementary magnetic tunnel junctions (MTJs), and, based on Kirchhoff's current law, uses parallel transistors and a capacitor to realize the accumulation operation. Compared with traditional von Neumann architecture accelerators and existing MRAM analog domain in-memory computing architectures, the application can effectively adapt to sparse vector matrix multiplication and accumulation operations, reduce power consumption overhead, and improve circuit energy efficiency.

Description

Technical Field

[0001] This invention belongs to the field of integrated circuit design, and particularly relates to an analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit, as well as a circuit design method adapted to sparse vector matrix multiplication and accumulation operations.

Background Technology

[0002] In recent years, the rapid development of artificial intelligence and deep neural networks has greatly promoted the development of IoT applications. With the increase in computational complexity, massive amounts of data need to be transferred between central processing units and storage units. The von Neumann architecture, which separates memory and processor, can lead to a mismatch between the speed of computational operations and data transfer operations. On the other hand, it also suffers from the "memory wall" problem, which causes data transfer power consumption to be much greater than computational power consumption, thus becoming a major bottleneck in the development of low-power IoT edge devices.

[0003] In-memory computing architecture is the most promising approach to breaking through the bottlenecks of the von Neumann architecture. This architecture retains the storage and read / write functions of the memory circuitry itself, and can also perform operations such as multiplication and accumulation, reducing memory access power consumption and data transfer frequency. Spin Transfer Torque-Magnetic RAM (STT-MRAM) offers high write endurance, non-volatility, and compatibility with CMOS processes, making it suitable as an implementation medium for in-memory computing. Furthermore, software algorithms applied to IoT devices require careful design, leveraging data sparsity to reduce computational power consumption and hardware resource overhead.

[0004] Currently, in-memory computing circuits based on MRAM can implement Boolean logic and multiply-accumulate operations. In multiply-accumulate operations, the weight data needs to be read out and then multiplied and accumulated with the input stimulus. This external multiplication design increases the power consumption of the STT-MRAM memory array in sparse vector matrix multiply-accumulate scenarios. Furthermore, low-bit analog domain in-memory computing circuits often use Flash ADCs as their analog-to-digital converters (ADCs). While these ADCs are fast and flexible in design, they have higher power consumption and area overhead, accounting for a major portion of the power consumption in in-memory computing architectures.

Summary of the Invention

[0005] The purpose of this invention is to provide an analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit, in order to solve the technical problem of high power consumption in sparse vector matrix multiplication and accumulation operations of the in-memory computing architecture based on MRAM, and improve the energy efficiency of the MRAM-based analog domain in-memory computing circuit.

[0006] To solve the above-mentioned technical problems, the specific technical solution of the present invention is as follows:

[0007] A full-precision in-memory computing circuit for analog domain based on magnetic tunnel junction computing units is characterized by comprising a magnetic random access memory (MRAM) computing array with 3 transistors and 2 magnetic tunnel junctions (3T2M), a pulse generation circuit, a timing control circuit, an accumulation circuit, a multiplexer, an input-sensitive parallel analog-to-digital converter (Flash ADC), an enable signal generation circuit, and a digital shift accumulator.

[0008] The 3-transistor 2-magnetic-tunnel junction 3T2M computing array consists of computing units arranged in a matrix (K rows, L columns), each containing two cross-connect transistors, one series access transistor, and two complementary magnetic tunnel junctions. In in-memory computing mode, the complementary magnetic tunnel junctions perform multiplication operations through the cross-connect transistors and series access transistors.

[0009] The pulse generation circuit completes the conversion of the digital domain input vector into an analog pulse signal in the in-memory computing mode, and feeds the pulse into the computing array.

[0010] The timing control circuit generates control signals for the in-memory computing circuit.

[0011] The accumulation circuit includes N groups of transistor-capacitor integration modules, each group including M parallel PMOS transistors and 1 capacitor. In the in-memory calculation mode, the sum of currents is related to the number of PMOS transistors turned on. The sum of currents is converted into a voltage signal by charging the capacitor.

[0012] The multiplexer is used in in-memory computing mode to implement column selection and realize the multiplexing of the ADC.

[0013] The input-sensitive parallel analog-to-digital converter quantizes the voltage signal output by the accumulator circuit in the in-memory calculation mode to obtain the digital result of the multiply-accumulate operation.

[0014] The enable signal generation circuit, in the in-memory calculation mode, counts the number of bits equal to "1" in the input vector and generates an enable signal to control the number of comparators enabled in the Flash ADC.

[0015] The digital shift accumulator weights the multi-bit weights and the partial sum of the input excitations, and then adds them together to obtain the final multi-bit multiply-accumulate calculation result.

[0016] Furthermore, the 3-transistor 2-magnetic tunnel junction 3T2M computing array comprises K rows and L columns of 3T2M computing units. Computing units in the same row share a word line WL, connected to the input vector excitation; computing units in the same column share a bit line BL and a source line SL. After the results of the 3T2M computing units stabilize, they are stored through latches and sampled by the subsequent accumulation circuit.

[0017] Furthermore, the 3-transistor 2-magnetic tunnel junction 3T2M computing unit includes:

[0018] The first NMOS transistor N1 has its gate connected to node VD, its drain connected to the reference magnetic tunnel junction unit, and its source connected to the drain of transistor N3.

[0019] The second NMOS transistor N2 has its gate connected to node VDB, its drain connected to the data magnetic tunnel junction unit, and its source connected to the drain of transistor N3.

[0020] The third NMOS transistor N3 has its gate connected to the word line WL, its drain connected to the source of transistors N1 and N2, and its source connected to the source line SL.

[0021] The first magnetic tunnel junction reference cell is connected to the bit line BL at one end and to the drain of N1 transistor and the gate of N2 transistor at the other end.

[0022] The second magnetic tunnel junction data unit is connected to the bit line BL at one end and to the gate of N1 transistor and the drain of N2 transistor at the other end.

[0023] Furthermore, the accumulator circuit includes one power gate switch S1, one capacitor reset switch S2, and N groups of long-channel PMOS transistor-capacitor modules, each group containing M parallel-connected long-channel PMOS transistors and one summing capacitor CSUM. The circuit structure includes:

[0024] One end of the power gate switch S1 is connected to the power supply, and the other end is connected to the source of the PMOS transistor.

[0025] Capacitor reset switch S2 is connected in parallel with CSUM.

[0026] One end of the summing capacitor CSUM is grounded, and the other end is connected to the drains of the M PMOS transistors.

[0027] The drains of M PMOS transistors are connected together, and their sources are connected together. The drains serve as the data line DL.

[0028] Furthermore, the input-sensitive parallel analog-to-digital converter consists of one reference resistor chain, M comparators, and one encoder. Each comparator includes one preamplifier and one latching comparator.

[0029] Furthermore, the preamplifier includes:

[0030] The first PMOS transistor P1 has its gate connected to the bias voltage Vb, its source connected to the power supply, and its drain connected to the source of transistors P2 and P3.

[0031] The second PMOS transistor P2 has its gate connected to the positive input voltage, its source connected to the drain of transistor P1, and its drain connected to node AOUT-.

[0032] The third PMOS transistor P3 has its gate connected to the negative input voltage, its source connected to the drain of transistor P1, and its drain connected to node AOUT+.

[0033] The fourth PMOS transistor P4 has its gate connected to the enable signal AEN, its source connected to node AOUT-, and its drain connected to node AOUT+.

[0034] The first NMOS transistor N1 has its gate connected to node AOUT-, its source connected to one end of switch S1, and its drain connected to node AOUT-.

[0035] The second NMOS transistor N2 has its gate connected to node AOUT+, its source connected to one end of switch S1, and its drain connected to node AOUT-.

[0036] The third NMOS transistor N3 has its gate connected to node AOUT-, its source connected to one end of switch S1, and its drain connected to node AOUT+.

[0037] The fourth NMOS transistor N4 has its gate connected to node AOUT+, its source connected to one end of switch S1, and its drain connected to node AOUT+.

[0038] The first switch S1 has one end connected to the source of transistors N1, N2, N3, and N4, and the other end grounded.

[0039] Furthermore, the latch-type comparator includes:

[0040] The first PMOS transistor P5 has its gate connected to the control signal SEN, its source connected to the power supply, and its drain connected to the source of transistors P6 and P7.

[0041] The second PMOS transistor P6 has its gate connected to node AOUT+, its source connected to the drain of the first PMOS transistor P5, and its drain connected to the source of P8.

[0042] The third PMOS transistor P7 has its gate connected to node AOUT-, its source connected to the drain of the first PMOS transistor P5, and its drain connected to the source of P9.

[0043] The fourth PMOS transistor P8 has its gate connected to node DOUT, its source connected to the drain of the second PMOS transistor P6, and its drain connected to node DOUTB.

[0044] The fifth PMOS transistor P9 has its gate connected to node DOUTB, its source connected to the drain of the third PMOS transistor P7, and its drain connected to node DOUT.

[0045] The first NMOS transistor N5 has its gate connected to the control signal SEN, its source grounded, and its drain connected to node DOUTB.

[0046] The second NMOS transistor N6 has its gate connected to node DOUT, its source grounded, and its drain connected to node DOUTB.

[0047] The third NMOS transistor N7 has its gate connected to node DOUTB, its source grounded, and its drain connected to node DOUT.

[0048] The fourth NMOS transistor, N8, has its gate connected to the control signal SEN, its source grounded, and its drain connected to node DOUT.

[0049] Furthermore, the enable signal generation circuit includes a full adder circuit and a logic gate circuit.

[0050] The analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit of the present invention has the following advantages:

[0051] (1) The 3T2M computing unit of the present invention utilizes complementary magnetic tunnel junctions and series transistors to realize built-in multiplication operations, which can effectively improve yield and reduce power consumption. Compared with the traditional 1T1M and 2T2M structures, the 3T2M structure can adapt to sparse vector matrix multiplication operations and improve computing energy efficiency.

[0052] (2) The PMOS transistor in the accumulation circuit of the present invention adopts a long channel size, which reduces the influence of the transistor channel modulation effect on the current, thereby reducing the problem of nonlinearity in the analog domain and improving the calculation accuracy.

[0053] (3) The input-sensitive parallel analog-to-digital converter and enable signal generation circuit of the present invention can dynamically adjust the number of comparators enabled in the Flash ADC according to the input vector, reduce unnecessary power consumption, reduce the power consumption ratio of the ADC circuit in the entire in-memory computing architecture, and improve computing energy efficiency.

Attached Figure Description

[0054] Figure 1 is a block diagram of the analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit provided by an embodiment of the present invention.

[0055] Figure 2 is a circuit diagram of the 3T2M computing unit in the analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit provided by an embodiment of the present invention.

[0056] Figure 3 is a schematic diagram of the 3T2M multiplication calculation logic in the analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit provided by an embodiment of the present invention.

[0057] Figure 4 is a structure diagram of the computing array in the analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit provided by an embodiment of the present invention.

[0058] Figure 5 is a circuit diagram of the input-sensitive parallel analog-to-digital converter in the analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit provided by an embodiment of the present invention.

[0059] Figure 6 is a circuit diagram of the comparator of the input-sensitive parallel analog-to-digital converter in the analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit provided by an embodiment of the present invention.

[0060] Figure 7 is a simulation diagram of the output voltage of the accumulation circuit of the analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit provided by an embodiment of the present invention.

[0061] Figure 8 is a multi-bit computation timing diagram of the analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit provided by an embodiment of the present invention.

[0062] Figure 9 shows the energy efficiency results of the analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit provided by an embodiment of the present invention.

Detailed Implementation

[0063] To better understand the purpose, structure, and function of this invention, the following detailed description of an analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit is provided in conjunction with the accompanying drawings.

[0064] A full-precision in-memory computing circuit for analog domain based on magnetic tunnel junction computing units includes a magnetic random access memory (MRAM) computing array of 3 transistors and 2 magnetic tunnel junctions (3T2M), a pulse generation circuit, a timing control circuit, an accumulation circuit, a multiplexer, an input-sensitive parallel analog-to-digital converter (Flash ADC), an enable signal generation circuit, and a digital shift accumulator.

[0065] The 3-transistor 2-magnetic-tunnel junction 3T2M computing array consists of computing units arranged in a matrix (K rows, L columns), each containing two cross-connect transistors, one series access transistor, and two complementary magnetic tunnel junctions. In in-memory computing mode, the complementary magnetic tunnel junctions perform multiplication operations through the cross-connect transistors and the series access transistors.

[0066] The pulse generation circuit completes the conversion of the digital domain input vector into an analog pulse signal in the in-memory computing mode, and feeds the pulse into the computing array.

[0067] The timing control circuit generates control signals for the in-memory computing circuit.

[0068] The accumulation circuit includes N groups of transistor-capacitor integration modules, each group including M parallel PMOS transistors and 1 capacitor. In the in-memory calculation mode, the sum of currents is related to the number of PMOS transistors turned on. The sum of currents is converted into a voltage signal by charging the capacitor.

[0069] The multiplexer is used in in-memory computing mode to implement column selection and realize the multiplexing of the ADC.

[0070] The input-sensitive parallel analog-to-digital converter quantizes the voltage signal output by the accumulator circuit in the in-memory calculation mode to obtain the digital result of the multiply-accumulate operation.

[0071] The enable signal generation circuit, in the in-memory calculation mode, counts the number of bits equal to "1" in the input vector and generates an enable signal to control the number of comparators enabled in the Flash ADC.

[0072] The digital shift accumulator weights the multi-bit weights and the partial sum of the input excitations, and then adds them together to obtain the final multi-bit multiply-accumulate calculation result.

[0073] The 3-transistor 2-magnetic tunnel junction 3T2M computing array comprises K rows and L columns of 3T2M computing units. Computing units in the same row share a word line WL, connected to the input vector excitation; computing units in the same column share a bit line BL and a source line SL. After the results of the 3T2M computing units stabilize, they are stored through latches and sampled by subsequent accumulation circuits.

[0074] The 3-transistor 2-magnetic tunnel junction 3T2M computing unit includes:

[0075] The first NMOS transistor N1 has its gate connected to node VD, its drain connected to the reference magnetic tunnel junction unit, and its source connected to the drain of transistor N3.

[0076] The second NMOS transistor N2 has its gate connected to node VDB, its drain connected to the data magnetic tunnel junction unit, and its source connected to the drain of transistor N3.

[0077] The third NMOS transistor N3 has its gate connected to the word line WL, its drain connected to the source of transistors N1 and N2, and its source connected to the source line SL.

[0078] The first magnetic tunnel junction reference cell is connected to the bit line BL at one end and to the drain of N1 transistor and the gate of N2 transistor at the other end.

[0079] The second magnetic tunnel junction data unit is connected to the bit line BL at one end and to the gate of N1 transistor and the drain of N2 transistor at the other end.

[0080] The accumulator circuit includes one power gate switch S1, one capacitor reset switch S2, and N groups of long-channel PMOS transistor-capacitor modules. Each group contains M parallel-connected long-channel PMOS transistors and one summing capacitor CSUM. The circuit structure includes:

[0081] One end of the power gate switch S1 is connected to the power supply, and the other end is connected to the source of the PMOS transistor.

[0082] Capacitor reset switch S2 is connected in parallel with the summing capacitor CSUM.

[0083] One end of the summing capacitor CSUM is grounded, and the other end is connected to the drains of the M PMOS transistors.

[0084] The drains of M PMOS transistors are connected together, and their sources are connected together. The drains serve as the data line DL.
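The choice of long-channel PMOS transistors in this structure can be motivated with the standard first-order saturation-current model. The expression below is the usual textbook form, not a formula from the patent, and all of its symbols (the mobility, oxide capacitance, width W, channel length L, threshold voltage, and channel-length modulation coefficient λ) are assumptions introduced here for illustration.

```latex
I_{SD} \approx \frac{1}{2}\,\mu_p C_{ox}\,\frac{W}{L}\,
\left(V_{SG} - |V_{TH}|\right)^{2}\left(1 + \lambda V_{SD}\right),
\qquad \lambda \propto \frac{1}{L}
```

As CSUM charges, the source-drain voltage of each conducting PMOS transistor falls. A longer channel gives a smaller λ, so the current of each conducting transistor depends less on that changing voltage and the summed current stays closer to proportional to the number of conducting transistors.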

[0085] The input-sensitive parallel analog-to-digital converter consists of one reference resistor chain, M comparators, and one encoder. Each comparator includes one preamplifier and one latching comparator.

[0086] The preamplifier includes:

[0087] The first PMOS transistor P1 has its gate connected to the bias voltage Vb, its source connected to the power supply, and its drain connected to the source of transistors P2 and P3.

[0088] The second PMOS transistor P2 has its gate connected to the positive input voltage, its source connected to the drain of transistor P1, and its drain connected to node AOUT-.

[0089] The third PMOS transistor P3 has its gate connected to the negative input voltage, its source connected to the drain of transistor P1, and its drain connected to node AOUT+.

[0090] The fourth PMOS transistor P4 has its gate connected to the enable signal AEN, its source connected to node AOUT-, and its drain connected to node AOUT+.

[0091] The first NMOS transistor N1 has its gate connected to node AOUT-, its source connected to one end of switch S1, and its drain connected to node AOUT-.

[0092] The second NMOS transistor N2 has its gate connected to node AOUT+, its source connected to one end of switch S1, and its drain connected to node AOUT-.

[0093] The third NMOS transistor N3 has its gate connected to node AOUT-, its source connected to one end of switch S1, and its drain connected to node AOUT+.

[0094] The fourth NMOS transistor N4 has its gate connected to node AOUT+, its source connected to one end of switch S1, and its drain connected to node AOUT+.

[0095] The first switch S1 has one end connected to the source of transistors N1, N2, N3, and N4, and the other end grounded.

[0096] The latch-type comparator includes:

[0097] The first PMOS transistor P5 has its gate connected to the control signal SEN, its source connected to the power supply, and its drain connected to the source of transistors P6 and P7.

[0098] The second PMOS transistor P6 has its gate connected to node AOUT+, its source connected to the drain of transistor P5, and its drain connected to the source of transistor P8.

[0099] The third PMOS transistor P7 has its gate connected to node AOUT-, its source connected to the drain of transistor P5, and its drain connected to the source of transistor P9.

[0100] The fourth PMOS transistor, P8, has its gate connected to node DOUT, its source connected to the drain of transistor P6, and its drain connected to node DOUTB.

[0101] The fifth PMOS transistor, P9, has its gate connected to node DOUTB, its source connected to the drain of transistor P7, and its drain connected to node DOUT.

[0102] The first NMOS transistor N5 has its gate connected to the control signal SEN, its source grounded, and its drain connected to node DOUTB.

[0103] The second NMOS transistor N6 has its gate connected to node DOUT, its source grounded, and its drain connected to node DOUTB.

[0104] The third NMOS transistor N7 has its gate connected to node DOUTB, its source grounded, and its drain connected to node DOUT.

[0105] The fourth NMOS transistor, N8, has its gate connected to the control signal SEN, its source grounded, and its drain connected to node DOUT.

[0106] The enable signal generation circuit includes a full adder circuit and a logic gate circuit.
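A behavioral sketch of how a full adder circuit plus simple gating logic could count the "1" bits of the input vector and derive per-comparator enable signals is given below. The text only states that the circuit contains a full adder circuit and a logic gate circuit, so the ripple-style arrangement, the thermometer-style enable pattern, and the names `full_adder`, `count_ones`, and `enable_signals` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Gate-level 1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def count_ones(bits: Sequence[int], width: int = 4) -> int:
    """Count the 1-bits by rippling each input bit through a chain of full adders."""
    count = [0] * width                      # running count, least significant bit first
    for bit in bits:
        carry = bit                          # the input bit enters as the carry-in
        for i in range(width):
            count[i], carry = full_adder(count[i], 0, carry)
    return sum(b << i for i, b in enumerate(count))


def enable_signals(bits: Sequence[int], n_comp: int = 8) -> List[int]:
    """Assumed mapping: enable the lowest n comparators, n = number of 1-bits."""
    n = count_ones(bits)
    return [1 if k < n else 0 for k in range(n_comp)]


enables = enable_signals([1, 0, 1, 1, 0, 0, 0, 1])
```

The mapping rests on the observation that, with 1-bit weights, the multiply-accumulate result can never exceed the number of "1" bits in the input vector, so comparators with higher thresholds cannot trip and need not be powered.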

[0107] Example

[0108] The present invention discloses an analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit, comprising a magnetic random access memory (MRAM) computing array with 3 transistors and 2 magnetic tunnel junctions (3T2M), a pulse generation circuit, a timing control circuit, an accumulation circuit, a multiplexer, an input-sensitive parallel analog-to-digital converter (Flash ADC), an enable signal generation circuit, and a digital shift accumulator.

[0109] As shown in Figure 1, the in-memory computing architecture includes the following modules. The 3T2M computing array consists of computing units arranged in a matrix (K rows and L columns), each containing 2 cross-connected transistors, 1 series access transistor, and 2 complementary magnetic tunnel junctions; in in-memory computing mode, the complementary magnetic tunnel junctions implement multiplication through the cross-connected transistors and the series access transistor. The pulse generation circuit converts the input vector into fixed-width pulse signals, which are fed into the computing array. The timing control circuit generates control signals that set the enable time of each module. The accumulation circuit, based on Kirchhoff's current law, connects multiple transistors in parallel; their current charges the summing capacitor, converting the current signal into a voltage signal. The multiplexer implements column selection in in-memory computing mode and realizes the multiplexing of the ADC. The input-sensitive parallel analog-to-digital converter quantizes the voltage signal output by the accumulation circuit to obtain the digital result of the multiply-accumulate operation. The enable signal generation circuit counts the number of bits equal to "1" in the input vector and generates an enable signal to control the number of comparators enabled in the Flash ADC. The digital shift accumulator weights the partial sums obtained for the multi-bit weights and input excitations and then adds them together to obtain the final multi-bit multiply-accumulate result.

[0110] This embodiment uses an 8×1 weight matrix (K=8, L=1) to implement the multiplication and accumulation operation of eight 2-bit input values IN and eight 1-bit weight values W. The formula is as follows:

[0111]

[0112] The input value IN in formula (1) is mapped as follows in the 3T2M computing array disclosed in this invention:

[0113]

[0114] In formula (2), IN_{i,0} and IN_{i,1} represent the high-order bit and the low-order bit of the input value, respectively. This embodiment of the invention employs a serial input strategy. During computation, the high-order bits are first fed into the computing array to obtain a partial sum, and then the low-order bits are fed into the computing array to obtain a second partial sum. Finally, the digital shift accumulator weights the two partial sums and adds them together to obtain the final multiply-accumulate result.
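The serial input strategy can be sketched in Python as follows. This is a minimal behavioral sketch under stated assumptions, not the circuit itself: the function names `partial_sum` and `bit_serial_mac` are hypothetical, the input and weight values are made up for illustration, and the high-order partial sum is assumed to be weighted by 2 (a left shift by one bit), which follows from the 2-bit input format.

```python
from typing import Sequence


def partial_sum(input_bits: Sequence[int], weights: Sequence[int]) -> int:
    """One pass through the array: AND each 1-bit input with its 1-bit weight, then count."""
    return sum(b & w for b, w in zip(input_bits, weights))


def bit_serial_mac(inputs: Sequence[int], weights: Sequence[int], n_bits: int = 2) -> int:
    """Feed the input bit-planes high-order bit first and shift-accumulate the partial sums."""
    acc = 0
    for bit in range(n_bits - 1, -1, -1):           # high-order bit first
        plane = [(x >> bit) & 1 for x in inputs]    # one bit-plane of the input vector
        acc = (acc << 1) + partial_sum(plane, weights)
    return acc


inputs = [3, 0, 2, 1, 0, 0, 3, 1]    # eight 2-bit inputs (illustrative values)
weights = [1, 0, 1, 1, 0, 1, 1, 0]   # eight 1-bit weights (illustrative values)
result = bit_serial_mac(inputs, weights)
reference = sum(x * w for x, w in zip(inputs, weights))  # direct multiply-accumulate
```

Here `result` and `reference` agree, which is the property the shift-and-add step has to preserve: each pass only ever sees 1-bit inputs, and the bit significance is restored digitally afterwards.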

[0115] In formula (1), the weight value W is mapped in the 3T2M computing array disclosed in this invention as follows:

[0116]

[0117] The weight values in formula (3) are stored in matrix format in the 3T2M computing units of the computing array disclosed in this invention.

[0118] Figure 2 shows the circuit diagram of the 3T2M computing unit in the analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit according to an embodiment of the present invention. The right magnetic tunnel junction is the data unit storing the weight information, and the left magnetic tunnel junction is the reference unit, whose state is opposite to that of the right magnetic tunnel junction. Two cross-connected transistors and one series-connected transistor implement the multiplication operation and output the multiplication result at node VD. When the input excitation is "1" (WL = "1") and the data unit magnetic tunnel junction is in the antiparallel state (high resistance state), the voltage drop across that magnetic tunnel junction is large and the VD node voltage is low; transistor N1 is cut off, the feedback increases the conduction of transistor N2, and the VD node voltage is pulled close to 0. The other cases can be analyzed in the same way.

[0119] Figure 3 illustrates the multiplication logic of the 3T2M computing unit in the analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit, as provided in an embodiment of the present invention. Multiplication is implemented as AND logic over the different input and weight configurations. When the input is "0" (the gate of transistor N3 is "0"), transistor N3 is turned off and, regardless of the stored weight, the VD node is at the high level VH. When the input is "1" (the gate of N3 is "1"), N3 is turned on, and the VD node shows the high level (VH) or the low level (VL) depending on whether the weight stored in the data unit is "0" or "1".
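The AND-style multiplication logic can be written out as a small truth table. The sketch below is behavioral only and makes one assumption that the text does not state explicitly: a stored weight of "1" is taken to produce the low level VL when the input is "1", so that a low VD stands for a product of 1. The names `vd_level` and `product` are hypothetical.

```python
V_H, V_L = "VH", "VL"  # symbolic node levels; the text gives no voltage values


def vd_level(input_bit: int, weight_bit: int) -> str:
    """Level of node VD for one 3T2M unit (assumed active-low product)."""
    if input_bit == 0:
        return V_H                  # N3 is off, so VD stays high whatever the weight is
    return V_L if weight_bit == 1 else V_H


def product(input_bit: int, weight_bit: int) -> int:
    """Logical product recovered from VD: a low level is read as 1."""
    return 1 if vd_level(input_bit, weight_bit) == V_L else 0


truth_table = {(i, w): product(i, w) for i in (0, 1) for w in (0, 1)}
```

Under this assumption, a unit contributes to the accumulation only when both its input bit and its stored weight are "1", which is why zero-valued inputs in a sparse vector leave the corresponding units idle.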

[0120] Figure 4 illustrates the 3T2M computing array and accumulation circuit in the analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit, as provided in an embodiment of the present invention. The white background marks the computing unit array, and the dark background marks the accumulation circuit. Each column of computing units shares a source line SL, a bit line BL, and a data line DL, while each row of computing units shares a word line WL. In this embodiment, only the operation on one column of computing units, i.e., 1 bit of weight data, is considered. The accumulation circuit consists of 8 parallel PMOS transistors, one summing capacitor CSUM, and two switches, S1 and S2. Before the summation begins, S2 is closed to reset the summing capacitor, and the VSUM node is reset to zero potential. After the multiplication operation is completed, S1 is closed and S2 is opened. The multiplication results of the eight units are applied to the gates of the eight PMOS transistors, controlling their conduction states respectively. The charging current is determined by the multiply-accumulate result: the larger the multiply-accumulate value, the more PMOS transistors are turned on, the larger the charging current, and the higher the voltage at the VSUM node.
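The relation between the number of conducting PMOS transistors and the VSUM voltage can be sketched with an idealized constant-current model. Everything numeric here is an assumption for illustration: the per-transistor current `i_unit`, the charging time `t_charge`, and the capacitance `c_sum` are hypothetical values not taken from the text, and each conducting transistor is treated as an ideal current source.

```python
from typing import Sequence


def v_sum_after_charge(products: Sequence[int], i_unit: float = 1e-6,
                       t_charge: float = 1e-9, c_sum: float = 100e-15) -> float:
    """Ideal VSUM after S1 has been closed for t_charge, starting from a reset capacitor."""
    n_on = sum(products)                 # PMOS transistors turned on by a product of 1
    i_total = n_on * i_unit              # currents add on the shared data line DL
    return i_total * t_charge / c_sum    # dV = I * t / C for a constant current


v3 = v_sum_after_charge([1, 0, 1, 0, 0, 0, 1, 0])   # three units produced a 1
v8 = v_sum_after_charge([1] * 8)                    # all eight units produced a 1
```

In this idealized model VSUM grows linearly with the multiply-accumulate value, so equal voltage steps separate adjacent results; real transistors deviate from this, which is the nonlinearity the long-channel sizing is meant to limit.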

[0121] Figure 5 illustrates an input-sensitive parallel analog-to-digital converter circuit in the analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit, as provided in an embodiment of the present invention. It includes a series resistor chain, eight comparators, one encoder, and one enable signal generator. The series resistor chain generates linearly spaced reference voltages, which serve as the negative inputs of the comparators. The output voltage V_SUM of the accumulation circuit is connected to the positive inputs of the eight comparators, and the comparator outputs are encoded and output as the digital value of the multiply-accumulate result. Each comparator consists of a preamplifier and a latching comparator, controlled respectively by two signals: the preamplifier enable signal AEN and the latching comparator enable signal SEN. The enable signal generation circuit determines the number of comparators to enable based on the number of "1" bits in the input vector WL, thus reducing the number of operating comparators and lowering power consumption when the input sparsity is high.
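The input-sensitive enabling described above can be sketched behaviorally in Python. The sketch rests on assumptions that the text does not spell out: the number of enabled comparators is taken to equal the number of "1" bits in the input vector (the multiply-accumulate value can never exceed that count), the reference voltages are placed at the midpoints between ideal V_SUM levels, and the step size `V_LSB` is a hypothetical value.

```python
# Behavioral sketch of an input-sensitive 8-comparator Flash ADC.
# Assumptions: comparators are enabled from the lowest reference upward, the
# enabled count equals the number of 1 bits in the input vector, and the
# references sit at (k + 0.5) * V_LSB. V_LSB is a hypothetical value.

N_COMP = 8
V_LSB = 0.1  # assumed V_SUM step per unit of multiply-accumulate value, volts


def flash_adc(v_sum, input_bits, v_lsb=V_LSB, n_comp=N_COMP):
    """Return (digital code, number of comparators enabled)."""
    # The MAC value cannot exceed the number of 1 bits in the input,
    # so comparators above that count never need to fire.
    n_enabled = min(sum(input_bits), n_comp)
    refs = [(k + 0.5) * v_lsb for k in range(n_comp)]
    # A disabled comparator is modeled as outputting 0 (thermometer code).
    thermometer = [
        1 if k < n_enabled and v_sum > refs[k] else 0 for k in range(n_comp)
    ]
    # Encoder: count the ones in the thermometer code.
    return sum(thermometer), n_enabled


code, enabled = flash_adc(0.2, [1, 0, 1, 0, 0, 0, 1, 0])
print(code, enabled)
```

In this model a sparse input with three "1" bits enables only three of the eight comparators, while the decoded value is unchanged because the disabled comparators could not have fired anyway.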

[0122] Figure 6 shows a comparator circuit of the input-sensitive parallel analog-to-digital converter in the analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit, according to an embodiment of the present invention. The left side shows the preamplifier circuit, and the right side shows the latching comparator circuit. The preamplifier amplifies the voltage difference between the positive and negative input signals through the feedback of transistors N2 and N3, and its outputs are connected to the gates of P6 and P7 of the latching comparator. When SEN = 1, the DOUT and DOUTB nodes of the latching comparator are reset to low level; when SEN = 0, the two nodes are charged through transistors P7 and P6 respectively. The relative magnitudes of the AOUT+ and AOUT- voltages determine the charging speed. If AOUT+ > AOUT-, DOUT charges faster, transistor P8 is turned off, transistor N6 is turned on, and DOUTB is pulled down to low level; DOUT = 1 is latched, and this value is the comparator output.

[0123] Figure 7 shows the simulated output voltage of the accumulation circuit in the analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit provided in an embodiment of the present invention. The S1 switch of the accumulation circuit closes at 0.3 ns, charging the summing capacitor through the parallel PMOS transistors. After 0.8 ns, the S1 switch opens, and the voltage on the summing capacitor is held, facilitating sampling by the subsequent Flash ADC circuit. Long-channel devices are used for the parallel PMOS transistors to reduce the current nonlinearity caused by the channel-length modulation effect.

[0124] Figure 8 shows the timing diagram of a 2-bit input, 1-bit weight calculation for the analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit, according to an embodiment of the present invention. In the first clock cycle, the high-order bits of the input excitation are fed into the computing array to obtain the high-order partial sum. In the second clock cycle, the low-order bits of the input excitation are fed into the computing array to obtain the low-order partial sum. Simultaneously, the high-order partial sum is shifted left by 1 bit to achieve weighting. The final multiply-accumulate result is output at the rising edge of the third clock cycle.
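The bit-serial schedule described above can be checked with a short Python sketch of the shift-and-add arithmetic. It models only the digital behavior (one partial sum per cycle, left shift of the earlier partial sum), not the analog array or the clocking; the function name and the example vectors are hypothetical.

```python
# Sketch of the 2-bit input, 1-bit weight shift-and-accumulate schedule.
# Cycle 1 processes the high-order input bits, cycle 2 the low-order bits;
# the high-order partial sum is shifted left by 1 bit before the addition.


def mac_2bit_inputs(inputs, weights):
    """Multiply-accumulate of 2-bit inputs with 1-bit weights, bit-serially."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    acc = 0
    for bit_pos in (1, 0):  # high-order bit first, then low-order bit
        # Partial sum produced by the array for this input bit plane.
        partial = sum(((x >> bit_pos) & 1) & w for x, w in zip(inputs, weights))
        # Shift the running sum left by 1 bit, then add the new partial sum.
        acc = (acc << 1) + partial
    return acc


inputs = [3, 2, 1, 0, 3, 1, 2, 0]    # 2-bit input excitations
weights = [1, 0, 1, 1, 0, 1, 1, 0]   # 1-bit weights of one column
result = mac_2bit_inputs(inputs, weights)
print(result, sum(x * w for x, w in zip(inputs, weights)))
```

For these vectors the high-order partial sum is 2 and the low-order partial sum is 3, so the shifted sum 2 × 2 + 3 = 7 matches the direct dot product.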

[0125] Figure 9 shows the energy efficiency results of the analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit, as provided in an embodiment of the present invention. It can be seen that the in-memory computing circuit proposed in this invention achieves a significant energy efficiency improvement over traditional digital domain in-memory computing circuits, and shows a clear advantage in high input sparsity scenarios.

[0126] It is understood that the present invention has been described through some embodiments, and those skilled in the art will recognize that various changes or equivalent substitutions can be made to these features and embodiments without departing from the spirit and scope of the invention. Furthermore, under the teachings of the present invention, these features and embodiments can be modified to adapt to specific situations and materials without departing from the spirit and scope of the invention. Therefore, the present invention is not limited to the specific embodiments disclosed herein, and all embodiments falling within the scope of the claims of this application are within the protection scope of the present invention.

Claims

1. An analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit, characterized in that it includes a 3-transistor 2-magnetic-tunnel-junction (3T2M) computing array, a pulse generation circuit, a timing control circuit, an accumulation circuit, a multiplexer, an input-sensitive parallel analog-to-digital converter, an enable signal generation circuit, and a digital shift accumulator; the 3T2M computing array is a K-row, L-column matrix composed of computing units each containing 2 cross-connected transistors, 1 series access transistor, and 2 complementary magnetic tunnel junctions; in in-memory computing mode, the complementary magnetic tunnel junctions perform multiplication operations through the cross-connected transistors and the series access transistor. The 3T2M computing unit includes: a first NMOS transistor N1, whose gate is connected to node V_D, whose drain is connected to the first magnetic tunnel junction reference cell, and whose source is connected to the drain of the third NMOS transistor N3; a second NMOS transistor N2, whose gate is connected to node V_DB, whose drain is connected to the second magnetic tunnel junction data unit, and whose source is connected to the drain of the third NMOS transistor N3; a third NMOS transistor N3, whose gate is connected to the word line WL, whose drain is connected to the sources of the first NMOS transistor N1 and the second NMOS transistor N2, and whose source is connected to the source line SL; the first magnetic tunnel junction reference cell, which is connected to the bit line BL at one end and to the drain of the first NMOS transistor N1 and the gate of the second NMOS transistor N2 at the other end; and the second magnetic tunnel junction data unit, which is connected to the bit line BL at one end and to the gate of the first NMOS transistor N1 and the drain of the second NMOS transistor N2 at the other end.
The accumulation circuit includes one power gate switch S1, one capacitor reset switch S2, and N groups of long-channel PMOS transistor-capacitor modules, each group containing M parallel-connected long-channel PMOS transistors and one summing capacitor C_SUM; in in-memory computing mode, the sum of currents is related to the number of PMOS transistors that are turned on, and the sum of currents is converted into a voltage signal by charging the capacitor. The pulse generation circuit converts the digital domain input vector into an analog pulse signal in in-memory computing mode and feeds the pulse into the computing array. The timing control circuit generates control signals for the in-memory computing circuit. The multiplexer is used in in-memory computing mode to implement column selection and realize the multiplexing of the ADC. The input-sensitive parallel analog-to-digital converter quantizes the voltage signal output by the accumulation circuit in in-memory computing mode to obtain the digital result of the multiply-accumulate operation. The enable signal generation circuit, in in-memory computing mode, counts the number of bits equal to "1" in the input vector and generates an enable signal to control the number of comparators enabled in the input-sensitive parallel analog-to-digital converter. The digital shift accumulator weights the partial sums of the multi-bit weights and input excitations and then adds them together to obtain the final multi-bit multiply-accumulate result.

2. The analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit according to claim 1, characterized in that the computing units in the same row share a word line WL, which is connected to the input vector excitation; the computing units in the same column share a bit line BL and a source line SL. After the result of a computing unit is stable, it is stored in a latch and sampled by the subsequent accumulation circuit.

3. The analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit according to claim 1, characterized in that the circuit structure of the accumulation circuit is as follows: one end of the power gate switch S1 is connected to the power supply, and the other end is connected to the sources of the PMOS transistors; the capacitor reset switch S2 is connected in parallel with the summing capacitor C_SUM; one end of the summing capacitor C_SUM is grounded, and the other end is connected to the drains of the M PMOS transistors; the drains of the M PMOS transistors are connected together, and their sources are connected together; the drains are used as the data line DL.

4. The analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit according to claim 1, characterized in that the input-sensitive parallel analog-to-digital converter consists of a reference resistor chain, M comparators, and an encoder; each comparator includes a preamplifier and a latching comparator.

5. The analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit according to claim 4, characterized in that the preamplifier includes: a first PMOS transistor P1, whose gate is connected to a bias voltage V_b, whose source is connected to the power supply, and whose drain is connected to the sources of the second PMOS transistor P2 and the third PMOS transistor P3; a second PMOS transistor P2, whose gate is connected to the positive input voltage, whose source is connected to the drain of the first PMOS transistor P1, and whose drain is connected to node AOUT-; a third PMOS transistor P3, whose gate is connected to the negative input voltage, whose source is connected to the drain of the first PMOS transistor P1, and whose drain is connected to node AOUT+; a fourth PMOS transistor P4, whose gate is connected to the enable signal AEN, whose source is connected to node AOUT-, and whose drain is connected to node AOUT+; a first NMOS transistor N1, whose gate is connected to node AOUT-, whose source is connected to one end of switch S1, and whose drain is connected to node AOUT-; a second NMOS transistor N2, whose gate is connected to node AOUT+, whose source is connected to one end of switch S1, and whose drain is connected to node AOUT-; a third NMOS transistor N3, whose gate is connected to node AOUT-, whose source is connected to one end of switch S1, and whose drain is connected to node AOUT+; a fourth NMOS transistor N4, whose gate is connected to node AOUT+, whose source is connected to one end of switch S1, and whose drain is connected to node AOUT+; and a first switch S1, one end of which is connected to the sources of the first NMOS transistor N1, the second NMOS transistor N2, the third NMOS transistor N3, and the fourth NMOS transistor N4, and the other end of which is grounded, its on/off state being controlled by the enable signal AEN.

6. The analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit according to claim 4, characterized in that the latching comparator includes: a first PMOS transistor P5, whose gate is connected to the control signal SEN, whose source is connected to the power supply, and whose drain is connected to the sources of transistors P6 and P7; a second PMOS transistor P6, whose gate is connected to node AOUT+, whose source is connected to the drain of the first PMOS transistor P5, and whose drain is connected to the source of P8; a third PMOS transistor P7, whose gate is connected to node AOUT-, whose source is connected to the drain of the first PMOS transistor P5, and whose drain is connected to the source of P9; a fourth PMOS transistor P8, whose gate is connected to node DOUT, whose source is connected to the drain of the second PMOS transistor P6, and whose drain is connected to node DOUTB; a fifth PMOS transistor P9, whose gate is connected to node DOUTB, whose source is connected to the drain of the third PMOS transistor P7, and whose drain is connected to node DOUT; a first NMOS transistor N5, whose gate is connected to the control signal SEN, whose source is grounded, and whose drain is connected to node DOUTB; a second NMOS transistor N6, whose gate is connected to node DOUT, whose source is grounded, and whose drain is connected to node DOUTB; a third NMOS transistor N7, whose gate is connected to node DOUTB, whose source is grounded, and whose drain is connected to node DOUT; and a fourth NMOS transistor N8, whose gate is connected to the control signal SEN, whose source is grounded, and whose drain is connected to node DOUT.

7. The analog domain full-precision in-memory computing circuit based on a magnetic tunnel junction computing unit according to claim 1, characterized in that the enable signal generation circuit includes a full adder circuit and a logic gate circuit.

8. An analog domain full-precision in-memory computing method based on a magnetic tunnel junction computing unit, using the circuit described in any one of claims 1-7, characterized in that, in in-memory computing mode, the complementary magnetic tunnel junctions implement multiplication operations through the cross-connected transistors and the series access transistor; the pulse generation circuit converts the input vector into a pulse signal with a fixed pulse width, which is then input into the computing array; the timing control circuit generates control signals to control the enable time of each module; the accumulation circuit, based on Kirchhoff's current law, connects multiple transistors in parallel so that their summed current charges the summing capacitor, converting the current signal into a voltage signal; the multiplexer is used in in-memory computing mode to implement column selection and realize the multiplexing of the ADC; the input-sensitive parallel analog-to-digital converter quantizes the voltage signal output by the accumulation circuit to obtain the digital result of the multiply-accumulate operation; the enable signal generation circuit counts the number of bits equal to "1" in the input vector and generates an enable signal to control the number of comparators enabled in the Flash ADC; the digital shift accumulator weights the partial sums of the multi-bit weights and input excitations and then adds them together to obtain the final multi-bit multiply-accumulate result.