High-energy-efficiency analog CAM based on FeFET structure and operating method thereof
By designing an analog CAM cell based on a FeFET structure and employing an adaptive search scheme, the problems of large area overhead and high energy consumption in existing analog CAM designs are solved, achieving higher search energy efficiency and a more compact design suitable for edge intelligent tasks.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG UNIV
- Filing Date
- 2025-08-19
- Publication Date
- 2026-06-30
Figure CN121237156B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of memory, and more particularly to a high-energy-efficiency analog CAM based on a FeFET structure and its operating method. It uses FeFETs as the devices for designing low-power, high-performance, non-volatile analog CAMs.
Background Technology
[0002] In-memory computing (IMC) is a novel computing paradigm that avoids the memory wall problem caused by frequent data transfers between memory and the processor. Content-addressable memory (CAM) is a specific and promising hardware solution within IMC that executes search operations in parallel across all stored vectors. CAM is therefore well suited to high-speed associative search tasks and holds great potential in today's data-intensive applications, including machine learning and neuromorphic computing.
[0003] Traditional CMOS CAMs are difficult to deploy in compact, energy-efficient edge devices because of their high power consumption and low area density. To address these issues, researchers are using emerging non-volatile memory (NVM) technologies, such as resistive RAM (ReRAM), spin-transfer torque magnetoresistive RAM (STT-MRAM), and ferroelectric field-effect transistors (FeFETs), to build compact and efficient CAMs. ReRAM and STT-MRAM are non-volatile variable-resistance devices that encode the low / high resistance states (LRS / HRS) as logic values "1" / "0". However, because of the low resistance ratio and current-driven write mechanism, ReRAM-based CAM designs suffer from high write and search power consumption, while STT-MRAM-based CAM cells, because of their limited tunnel magnetoresistance (TMR) ratio, require additional transistors for reliable write and search operations, which increases area overhead. FeFETs, as three-terminal devices, can act as 1T non-volatile memories and switches rather than variable resistors. Thanks to their hysteretic I-V characteristic, high on/off current ratio, high off-state resistance, and electric-field-driven write mechanism, FeFETs are better suited to building compact and energy-efficient CAMs, with better area efficiency than traditional CMOS CAMs.
[0004] While common binary CAM (BCAM) designs are easy to implement with emerging NVM, their storage density is limited because each binary bit must be stored and processed separately. Current state-of-the-art multi-bit CAM (MCAM) and analog CAM (ACAM) designs therefore exploit the multilevel cell (MLC) and analog programming features of NVM to improve storage density. Unlike MCAM, which stores fixed discrete values, ACAM can program continuous matching ranges into memory cells and supports analog query search. ACAM designs can therefore cover the functionality of MCAM and provide more flexible search capabilities for associative search applications that need to search analog values, such as deep random forest (DRF) accelerators, and they perform exceptionally well in edge intelligence tasks. Among state-of-the-art ACAM designs, the representative 6T-2R ACAM design achieves analog search by introducing two voltage divider circuits, but sacrifices area density and energy efficiency because of its additional transistors and current-driven sensing. Another design, the 2FeFET ACAM, employs complementary search encoding, but its search drive circuit requires analog linear inverters to generate the complementary analog input voltages, resulting in additional energy consumption. Existing ACAM designs therefore still leave room for optimization in area density and energy efficiency.
Summary of the Invention
[0005] The purpose of this invention is to address the large area overhead, high energy consumption, and poor performance of existing ACAMs by providing a high-energy-efficiency ACAM cell design based on a FeFET structure. Furthermore, at the array level, it moves the pre-charging of the match line (ML) into the search operation within the cell and proposes an adaptive scheme for selectively terminating the search phase early, thereby achieving higher search energy efficiency.
[0006] The objective of this invention is achieved through the following technical solution:
[0007] A high-energy-efficiency analog CAM based on a FeFET structure is disclosed. The analog CAM unit consists of one NMOS transistor, T0, and two FeFETs, F1 and F2. In the analog CAM unit, T0 and F1 are connected in series to form an equivalent voltage divider circuit. The source of T0 is connected to the word line WL, and the source of F1 is connected to the complementary word line $\overline{WL}$. The gate of T0 is connected to the control line CL, the gate of F1 is connected to the search line SL, the drain of T0 is connected to the drain of F1 and the gate of F2 at node D, the drain of F2 is connected to the match line ML, and ML discharges through F2.
[0008] Furthermore, by applying write voltages to the gate of F1 and the drain of F2, respectively, while keeping node D grounded, sufficiently large gate-source voltage drops are generated across F1 and F2 to store analog values in the two FeFETs; the analog values stored in F1 and F2 set the position and width of the continuous matching range, respectively.
[0009] Furthermore, the analog CAM search operation is performed in two phases, which search against the upper and lower bounds of the continuous matching range, respectively.
[0010] Furthermore, the query input is connected to the gate of F1 via a single search line SL.
[0011] Furthermore, in the crossbar array composed of several analog CAM units, each column shares the same vertical search line SL, and each row shares the same horizontal word line WL. In each row, all analog CAM units are connected to one another via ML, and each ML is also connected to a sense amplifier SA, which provides the row output.
[0012] Furthermore, in the crossbar array composed of several analog CAM units, ML, which is connected to the drain of F2 in the stand-alone analog CAM unit design, is instead connected to the source of F2, and the drain of F2 is connected to the charging line ScL. In this way, the pre-charging of ML is moved into the search operation within the analog CAM unit, thereby reducing the voltage swing of ML.
[0013] The present invention also provides an operating method for the analog CAM described above, comprising:
[0014] Before the crossbar array composed of several analog CAM units starts working, data is stored in each analog CAM unit. That is, after the analog values for position and width are calculated from the upper and lower bounds of the continuous matching range, F1 and F2 are written through SL and ScL, respectively.
[0015] Each search cycle is divided into search phase one and search phase two. Each phase has independent clock signals CLK1 and CLK2, which are used to search for the upper and lower bounds of the continuous matching range, respectively.
[0016] Search Phase One: When clock signal CLK1 is high, it discharges ML to ground by driving the gate of the pull-down NMOS transistor connected to ML. When CLK1 is low, the discharge of ML is turned off, WL and $\overline{WL}$ are set to 0 and 1, respectively, SL is set to the search input voltage, and ScL is connected to the power supply. At this point, for matched analog CAM units, ScL will not charge ML via F2; for mismatched analog CAM units, ScL will charge ML via F2. After waiting for a period of time, the output of the sense amplifier SA of each row is observed. If it is 1, ML in this row has been charged and the row is mismatched; all search configurations are immediately cut off to stop ML from being charged further, which reduces the voltage swing of ML. If it is 0, ML in this row has not been charged and the row is currently matched.
[0017] Search Phase Two: An adaptive scheme for selectively terminating the search phase is introduced. If a mismatch is observed on ML during Search Phase One, Search Phase Two is terminated early and the mismatch result is output directly. If a match is observed on ML during Search Phase One, Search Phase Two proceeds. First, when clock signal CLK2 is high, it discharges ML to ground by driving the gate of another pull-down NMOS transistor connected to ML. When CLK2 is low, the discharge of ML is turned off, and the values of WL and $\overline{WL}$ are reversed to 1 and 0, while the voltages of SL and ScL are kept unchanged. At this time, for matched analog CAM units, ScL will not charge ML through F2, while for mismatched analog CAM units, ScL will charge ML through F2. After waiting for a period of time, the output of the sense amplifier of each row is observed. If it is 1, ML in this row has been charged, the row is finally mismatched, and all search configurations are immediately cut off. If it is 0, ML in this row has not been charged, and the row is a complete match across the two search phases.
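The two-phase method above can be sketched as a purely behavioral model. This is a minimal illustration, not the circuit itself: each cell is reduced to a stored (lower, upper) range, the bounds are assumed to be inclusive, and the names `search_row` and `search_array` are hypothetical. Phase one checks every cell of a row against its upper bound, and phase two (lower bound) runs only if phase one reported a match.

```python
from typing import List, Tuple

Range = Tuple[float, float]  # (lower bound, upper bound) in volts


def search_row(row: List[Range], query: List[float]) -> Tuple[bool, int]:
    """Return (row_matches, phases_executed) for one stored word."""
    # Phase one: any input above its cell's upper bound charges ML (mismatch).
    if any(q > hi for (_, hi), q in zip(row, query)):
        return False, 1  # mismatch already known, phase two is skipped
    # Phase two: any input below its cell's lower bound charges ML (mismatch).
    if any(q < lo for (lo, _), q in zip(row, query)):
        return False, 2
    return True, 2


def search_array(array: List[List[Range]], query: List[float]) -> List[Tuple[bool, int]]:
    """Search all rows; in hardware the rows are evaluated in parallel."""
    return [search_row(row, query) for row in array]


if __name__ == "__main__":
    # Hypothetical stored ranges, two cells per word.
    array = [
        [(0.4, 0.6), (0.1, 0.2)],  # second cell violates its upper bound
        [(0.4, 0.6), (0.3, 0.9)],  # both cells match
        [(0.7, 0.9), (0.3, 0.9)],  # first cell violates its lower bound
    ]
    for result in search_array(array, [0.5, 0.5]):
        print(result)
```

The second element of each result shows how many phases were actually executed, which is where the early-termination saving comes from.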
[0018] The beneficial effects of this invention are as follows:
[0019] The ACAM design in this invention can improve search energy efficiency.
[0020] (1) For the 2FeFET-1T ACAM cell design, each ACAM cell has only one FeFET connected to the match line ML, which reduces the ML capacitance and thus the pre-charge power consumption. Furthermore, because this design uses a single search input, it eliminates the analog linear inverter required by the 2FeFET ACAM design. In addition, because it uses fewer devices, its area overhead is smaller than that of the 6T-2R ACAM design, which reduces production cost.
[0021] (2) For the 2FeFET-1T ACAM array design, unlike the traditional array design, this design eliminates the traditional pre-charge stage and moves the pre-charging of ML into the search operation within the cell, thereby reducing the voltage swing of ML. It also proposes an adaptive scheme for selectively terminating the search stage early: if the sense amplifier SA observes a row mismatch in the first search stage, the second search stage is terminated early; otherwise, the second search stage continues. This shortens the search time and achieves higher search energy efficiency.
Brief Description of the Drawings
[0022] Figure 1(a) is the structure diagram of the 2FeFET-1T ACAM cell, and (b) is the equivalent voltage divider circuit diagram.
[0023] Figure 2(a) is a schematic diagram of the operating principle of the first search phase of the 2FeFET-1T ACAM cell, which searches for the upper bound of the continuous matching range; (b) is a schematic diagram of the operating principle of the second search phase, which searches for the lower bound of the continuous matching range.
[0024] Figure 3 is a schematic diagram of setting the continuous matching range of the 2FeFET-1T ACAM cell. The analog value stored in F1 sets the position of the matching range, and the analog value stored in F2 sets the width of the matching range.
[0025] Figure 4(a) is a structural diagram of the 2FeFET-1T ACAM array, and (b) is a partial circuit diagram including the ACAM cell and the sense amplifier SA.
[0026] Figure 5 is a diagram of the adaptive scheme that selectively terminates the search phase early.
[0027] Figure 6(a) is a simulation diagram of adjusting the width of the continuous matching range through F2; (b) is a simulation diagram of adjusting the position of the continuous matching range through F1; (c) is a simulation diagram of the continuous matching range changing only at the upper bound, from [0.1V, 0.2V] to [0.1V, 1.1V]; and (d) is a simulation diagram of the continuous matching range changing only at the lower bound, from [0.1V, 1.1V] to [1.0V, 1.1V].
[0028] Figure 7(a) is a transient simulation waveform of the 2FeFET-1T ACAM cell, (b) is a simulation diagram showing the corresponding continuous matching range [0.4V, 0.6V], (c) is a 3-D transient simulation waveform of the SA output as the SL input voltage changes, and (d) is a simulation diagram showing 8 (3-bit) bounded, non-overlapping continuous matching ranges.
[0029] Figure 8 shows the transient simulation waveforms of the 2FeFET-1T ACAM array for three search cases. Case 1: search phase one mismatches, and phase two is terminated early. Case 2: search phase one matches, and search phase two matches. Case 3: search phase one matches, and search phase two mismatches.
[0030] Figure 9(a) is a graph of the search energy consumption and delay of the 2FeFET-1T ACAM array for different numbers of rows, and (b) is a graph of the search energy consumption and delay of the 2FeFET-1T ACAM array for different word lengths.
[0031] Figure 10(a) shows the transient simulation waveforms and statistical analysis of a 2FeFET-1T ACAM array with a word length of 16 cells under device process variations; (b) shows the same for a word length of 32 cells; and (c) shows the same for a word length of 64 cells.
Detailed Implementation
[0032] The present invention will now be described in further detail with reference to the accompanying drawings and specific embodiments.
[0033] Circuit structure and operating principle of the 2FeFET-1T ACAM cell:
[0034] As shown in Figure 1(a), a high-energy-efficiency 2FeFET-1T ACAM cell based on a FeFET structure includes two FeFETs (F1 and F2) and one NMOS transistor (T0), where T0 and F1 are connected in series to form an equivalent voltage divider circuit, as shown in Figure 1(b). The sources of T0 and F1 are connected to the word lines WL and $\overline{WL}$, respectively. The gate of T0 is connected to the control line CL, the gate of F1 is connected to the search line SL, the drain of T0 is connected to the drain of F1 and the gate of F2 at node D, the drain of F2 is connected to the match line ML, and the source of F2 is grounded. The word lines WL / $\overline{WL}$ power the voltage divider structure, the search line SL inputs the query data, the control line CL adjusts the equivalent resistance of T0, and ML discharges through F2. Note that in the crossbar array composed of 2FeFET-1T ACAM cells, ML is instead connected to the source of F2, while the drain of F2 is connected to the power supply line ScL. This is because, in the array design of this invention, the pre-charging of ML is moved into the search operation within the cell, thereby reducing the voltage swing of ML.
[0035] In the write operation of the 2FeFET-1T ACAM cell, analog values are written to the two FeFETs of the cell in two steps. Whether F1 or F2 is being written, a turn-on voltage must be applied to CL to activate the access transistor T0, and WL / $\overline{WL}$ must be grounded so that node D remains grounded throughout the write. In the first step, a write pulse is applied to SL while ML is grounded; in the second step, a write pulse is applied to ML while SL is cut off. These two write steps thus apply write voltages to the gate of F1 and the drain of F2, respectively, generating sufficiently large gate-source voltage drops across F1 and F2 to change the polarization of their ferroelectric (FE) layers and write the required analog values into the two FeFETs.
[0036] The operating principle of the 2FeFET-1T ACAM cell is shown in Figure 2. The search operation is performed in two phases, which search for the upper and lower bounds of the continuous matching range stored in the ACAM cell, respectively. As shown in Figure 2(a), in search phase one, WL and $\overline{WL}$ are set to low and high levels, respectively. Therefore, the voltage of node D in search phase one is:
[0037] $$V_{D,1} = V_{\overline{WL}} \cdot \frac{R_{T0,1}}{R_{T0,1} + R_{F1}}$$
[0038] where $V_{\overline{WL}}$ is the $\overline{WL}$ voltage and $R_{T0,1}$ is the equivalent resistance of T0 in phase one. By selecting an appropriate CL bias, $R_{T0,1}$ is set to a value between $R_{on}$ and $R_{off}$, the on-state and off-state resistances of F1. $R_{F1}$ is the equivalent resistance of F1; it varies from $R_{on}$ to $R_{off}$ depending on the search input voltage $V_{SL}$ and the threshold voltage stored in F1 ($V_{TH,F1}$). When the search input voltage is lower than $V_{TH,F1}$, $R_{F1}$ is $R_{off}$. At this time, the voltage at node D is:
[0039] $$V_{D,1} = V_{\overline{WL}} \cdot \frac{R_{T0,1}}{R_{T0,1} + R_{off}}$$
[0040] This is clearly below the threshold voltage of F2 ($V_{TH,F2}$). With F2 kept off, ML will not discharge, which corresponds to a match.
[0041] Similarly, when the search input voltage is higher than $V_{TH,F1}$, $R_{F1}$ is $R_{on}$. At this time, the voltage at node D is:
[0042] $$V_{D,1} = V_{\overline{WL}} \cdot \frac{R_{T0,1}}{R_{T0,1} + R_{on}}$$
[0043] When $V_{D,1}$ is lower than $V_{TH,F2}$, the cell still matches, while a $V_{D,1}$ higher than $V_{TH,F2}$ indicates a mismatch; the SL input voltage at this transition point is the upper bound of the continuous matching range. However, searching only for the upper bound in search phase one is insufficient to determine whether the input query matches the stored continuous matching range. A further search for the lower bound is required in search phase two to complete the full search operation.
[0044] As shown in Figure 2(b), in search phase two, the voltages on WL and $\overline{WL}$ are reversed; therefore, the voltage at node D becomes:
[0045] $$V_{D,2} = V_{WL} \cdot \frac{R_{F1}}{R_{T0,2} + R_{F1}}$$
[0046] where $V_{WL}$ is the WL voltage, $R_{T0,2}$ is the equivalent resistance of T0 in phase two, and $R_{F1}$ is the equivalent resistance of F1, which lies between its on-state value $R_{on}$ and its off-state value $R_{off}$. When the search input voltage is higher than the threshold voltage stored in F1 ($V_{TH,F1}$), $R_{F1}$ is $R_{on}$. At this time, the voltage at node D is:
[0047] $$V_{D,2} = V_{WL} \cdot \frac{R_{on}}{R_{T0,2} + R_{on}}$$
[0048] This value is significantly lower than the threshold voltage of F2 ($V_{TH,F2}$), indicating a match. When the search input voltage is lower than $V_{TH,F1}$, $R_{F1}$ is $R_{off}$. At this time, the voltage at node D is:
[0049] $$V_{D,2} = V_{WL} \cdot \frac{R_{off}}{R_{T0,2} + R_{off}}$$
[0050] Similarly, when the search input voltage is below / above the lower bound of the continuous matching range, the node D voltage is above / below $V_{TH,F2}$, indicating a mismatch / match, and the SL voltage at which the search result changes is the lower bound of the matching range. In summary, when the search input voltage falls outside the upper or lower bound of the continuous matching range, the node D voltage is higher than $V_{TH,F2}$ in the corresponding search phase, which leads to ML discharge; the search input voltage lies within the matching range only when both search phases report a match.
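As a rough numerical illustration of the two search phases, the sketch below treats node D as the midpoint of an ideal resistive divider formed by T0 and F1, with T0 on the low rail in phase one and F1 on the low rail in phase two. The divider expressions, the function names, and every numeric value (rail voltage, resistances, F2 threshold) are assumptions chosen only to show the trend; they are not taken from the patent's simulations.

```python
def node_d_voltage(phase: int, v_rail: float, r_t0: float, r_f1: float) -> float:
    """Midpoint voltage of the assumed T0/F1 resistive divider."""
    if phase == 1:
        # Phase one: T0 ties node D to the low rail, F1 to the high rail.
        return v_rail * r_t0 / (r_t0 + r_f1)
    if phase == 2:
        # Phase two: the rails are swapped, so F1 now ties node D low.
        return v_rail * r_f1 / (r_t0 + r_f1)
    raise ValueError("phase must be 1 or 2")


def cell_matches(phase: int, f1_on: bool, v_rail: float = 1.0,
                 r_t0: float = 3e5, r_on: float = 1e4, r_off: float = 1e7,
                 vth_f2: float = 0.5) -> bool:
    """A cell matches in a phase when node D stays below F2's threshold."""
    r_f1 = r_on if f1_on else r_off  # hypothetical on/off resistances of F1
    return node_d_voltage(phase, v_rail, r_t0, r_f1) < vth_f2


if __name__ == "__main__":
    for phase in (1, 2):
        for f1_on in (False, True):
            state = "on" if f1_on else "off"
            result = "match" if cell_matches(phase, f1_on) else "mismatch"
            print(f"phase {phase}, F1 {state}: {result}")
```

With these assumed values, F1 being off gives a match in phase one and a mismatch in phase two, and F1 being on gives the opposite, which mirrors the upper-bound and lower-bound roles of the two phases.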
[0051] The truth table for the write and search operations of the 2FeFET-1T ACAM cell is shown below, in which the two write voltages denote write pulses with different durations, the SL voltage during search denotes the search input voltage, and D1 / D2 denote the analog values stored in F1 / F2.
[0052]
[0053] Only when the search input voltage lies within the continuous matching range are the node D voltages in both search phases lower than the threshold voltage of F2. Therefore, the analog values to be programmed into F1 and F2 (their threshold voltages) can be derived from the node D voltages in the two phases. Because these node D voltages are the voltages at the intermediate node of the equivalent voltage divider circuit, and the series-connected transistors in this circuit all operate in the linear region, the relationship between the search line (SL) voltage and the node D voltage in each phase can be derived:
[0054]
[0055] Here, the two constant coefficients come from the transfer functions of transistors T0 and F1, and the three threshold voltages are those of transistors T0, F1, and F2, respectively. From the above two equations, the upper and lower bounds of the continuous matching range can be further derived:
[0056]
[0057] It is clear from the above two equations that the threshold voltage of F1 controls the position of the continuous matching range, and the threshold voltage of F2 controls the width of the continuous matching range, as shown in Figure 3. Therefore, once the threshold voltages of F1 and F2 have been programmed, the above two formulas give the upper and lower bounds of the resulting continuous matching range.
[0058] Overall structure and operation process of the 2FeFET-1T ACAM array:
[0059] As shown in Figure 4(a), the 2FeFET-1T ACAM array includes an ACAM core, a write / search buffer, a word line driver, output sense amplifiers (SA), and peripheral circuitry. The words stored in the array are arranged in rows; the word lines WL / $\overline{WL}$ are connected to the supply rails of the voltage divider structures in each row, while SL and CL are shared by the gates of F1 and T0, respectively, of the cells in the same column.
[0060] In a typical CAM array, each ML is connected to the power supply through a PMOS transistor, and the power supply pre-charges ML before the search operation. In the 2FeFET-1T ACAM array of this invention, the pre-charging of ML is moved into the search operation within the cell: ML is connected to the source of F2 of each cell in the row, while ScL is connected to the drain of F2, as shown in Figure 4(b). In Figure 4(b), ML is also connected to the sense amplifier SA, which provides the row output. The two inverters inside the SA detect the ML voltage and output the match / mismatch result, denoted SAO. In the matched case, because there is no strong pre-charge path, the ML voltage stays well below the inverter's threshold voltage, so SAO is low, indicating a match. In the mismatched case, ML is charged by ScL through the conducting F2 of the mismatched cell. When the ML voltage is pulled up above the inverter's threshold voltage, SAO becomes high. Once SAO goes high, the search result is confirmed as a mismatch, so all search configurations are immediately cut off and no further charge needs to be supplied to ML. This prevents ML from being charged all the way to the supply voltage, which reduces the voltage swing and saves pre-charge time and energy.
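The benefit of stopping the ML charge early can be estimated with the standard relation for charging a capacitance from a fixed supply: the energy drawn from the supply is C · V_supply · ΔV, where ΔV is the voltage swing actually reached. The sketch below applies that relation with hypothetical numbers (10 fF ML capacitance, 1.0 V supply, charging cut off at an assumed 0.5 V inverter threshold); none of these values come from the patent.

```python
def ml_charge_energy(c_ml: float, v_supply: float, v_swing: float) -> float:
    """Energy drawn from the supply to raise ML by v_swing (joules)."""
    if not 0.0 <= v_swing <= v_supply:
        raise ValueError("swing must lie between 0 and the supply voltage")
    # Charge delivered is C * dV, and all of it is drawn at v_supply.
    return c_ml * v_supply * v_swing


if __name__ == "__main__":
    c_ml = 10e-15      # hypothetical ML capacitance: 10 fF
    v_supply = 1.0     # hypothetical supply voltage: 1.0 V
    v_cutoff = 0.5     # assumed ML level at which the SA trips and charging stops

    full = ml_charge_energy(c_ml, v_supply, v_supply)
    reduced = ml_charge_energy(c_ml, v_supply, v_cutoff)
    print(f"full swing:    {full:.3e} J")
    print(f"reduced swing: {reduced:.3e} J")
    print(f"ratio:         {reduced / full:.2f}")
```

Under these assumptions the energy scales linearly with the swing, so cutting the charge off at half the supply voltage halves the energy drawn per mismatched row.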
[0061] The entire operation process of the 2FeFET-1T ACAM array is as follows:
[0062] (1) Before the array composed of 2FeFET-1T ACAM cells starts working, data is stored in each cell: after the analog values for position and width are calculated from the upper and lower bounds of the continuous matching range, F1 and F2 are written through SL and ScL, respectively. In addition, during the write operation, an inhibit-bias scheme must be applied to the word lines (WL) of all unselected rows to avoid write disturbance.
[0063] (2) Each search cycle will be divided into search phase one and search phase two:
[0064] To ensure the accuracy of the search operation, ML is reset before the start of each search phase through a pull-down NMOS transistor. The two pull-down transistors are controlled by two independent clock signals, CLK1 and CLK2, respectively, as shown in Figure 4(b); the phase relationship between the two clock signals is fixed but can be adjusted.
[0065] Furthermore, this invention proposes an adaptive scheme for selectively terminating the search phase early, which is implemented using only one additional AND gate, as shown in Figure 4(b). The two inputs of this AND gate are CLK2 and the intermediate signal between the two cascaded inverters, and its output is a new pull-down control signal, CLK2A, which replaces CLK2. With CLK2A, the 2FeFET-1T ACAM can adaptively and selectively terminate the search operation early, as shown in Figure 5. If a match occurs in search phase one, then when CLK2 is high, CLK2A is also high, which resets ML and lets search phase two proceed. Conversely, if a mismatch occurs in search phase one, then when CLK2 is high, CLK2A stays low. In this case, ML is not reset and SAO is already high, indicating the final mismatch result, so search phase two is terminated early. In short, search phase two proceeds only if search phase one reports a match, because a mismatch in search phase one is sufficient to determine the final search result.
[0066] Search Phase One: When clock signal CLK1 is high, it drives the gate of the pull-down NMOS transistor connected to ML, discharging ML to ground and resetting it. When CLK1 is low, the discharge of ML is turned off, WL and $\overline{WL}$ are set to 0 and 1, respectively, SL is set to the search input voltage, and ScL is connected to the power supply. At this point, for matched analog CAM cells, ScL will not charge ML via F2; for mismatched analog CAM cells, ScL will charge ML via F2. After waiting for a period of time, the SAO output of each row is observed. If it is 1, ML in this row has been charged, the row is mismatched, and all search configurations are immediately cut off. If it is 0, ML in this row has not been charged and the row is currently matched.
[0067] Search Phase Two: If a match is observed on ML in Search Phase One, Search Phase Two proceeds. First, when clock signal CLK2 is high, CLK2A is also high, and CLK2A discharges ML to ground by driving the gate of another pull-down NMOS transistor connected to ML. When CLK2 is low, CLK2A is also low, which turns off the discharge of ML; the values of WL and $\overline{WL}$ are then reversed to 1 and 0, while the voltages of SL and ScL are kept unchanged. At this time, for matched analog CAM cells, ScL will not charge ML through F2, while for mismatched analog CAM cells, ScL will charge ML through F2. After waiting for a period of time, the SAO output of each row is observed. If it is 1, ML in this row has been charged and the row is finally mismatched. If it is 0, ML in this row has not been charged, and the row is determined to be a complete match across the two search phases.
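The gating behavior of CLK2A can be written out as a small logic model. In this sketch the ML voltage is reduced to a Boolean (above or below the inverter threshold), the sense amplifier is modeled as two ideal cascaded inverters, and the intermediate signal is assumed to be the output of the first inverter. The function names are illustrative only.

```python
from typing import Tuple


def sense_amplifier(ml_high: bool) -> Tuple[bool, bool]:
    """Two cascaded ideal inverters: returns (intermediate signal, SAO)."""
    intermediate = not ml_high   # output of the first inverter
    sao = not intermediate       # output of the second inverter
    return intermediate, sao


def clk2a(clk2: bool, ml_high: bool) -> bool:
    """AND gate combining CLK2 with the SA's intermediate signal."""
    intermediate, _ = sense_amplifier(ml_high)
    return clk2 and intermediate


if __name__ == "__main__":
    print("CLK2  ML_high  SAO  CLK2A")
    for clk2 in (False, True):
        for ml_high in (False, True):
            _, sao = sense_amplifier(ml_high)
            print(int(clk2), int(ml_high), int(sao), int(clk2a(clk2, ml_high)), sep="      ")
```

When phase one ends in a match (ML low), CLK2A simply follows CLK2 and ML is reset for phase two. When phase one ends in a mismatch (ML high), CLK2A is held low, so ML keeps its charged state and SAO continues to report the mismatch.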
[0068] The functions and effects of this invention are further illustrated and demonstrated through the following simulation experiments:
[0069] 1. Simulation conditions
[0070] The experiments used a physics-based, circuit-compatible FeFET model for SPECTRE and SPICE, built on the Preisach model. This model enables efficient design and analysis and has been widely used in FeFET circuit design. The baseline transistors in the simulation use the UMC 40 nm model.
[0071] The applicant simulated the 2FeFET-1T ACAM design using SPECTRE and compared the results with nine CAM designs from the following non-patent documents: Non-Patent Document 1 (A. T. Do et al., "Design of a power efficient CAM using automated background checking scheme for small match line swing," in 2013 Proceedings of the ESSCIRC, pp. 209–212, IEEE, 2013.), Non-Patent Document 2 (J. Li et al., "1 Mb 0.41 µm² 2T-2R cell nonvolatile TCAM with two-bit encoding and clocked self-referenced sensing," JSSC, vol. 49, no. 4, pp. 896–907, 2014.), Non-Patent Document 3 (C. Wang et al., "Design of magnetic nonvolatile TCAM with priority-decision in-memory technology for high speed, low power, and high reliability," IEEE TCAS-I, vol. 67, no. 2, pp. 464–474, 2019.), Non-Patent Document 4 (J. Cai et al., "Energy efficient data search design and optimization based on a compact ferroelectric FET content addressable memory," in Proceedings of the 59th ACM / IEEE DAC, pp. 751–756, 2022.), Non-Patent Document 5 (C. Li et al., "A scalable design of multi-bit ferroelectric content addressable memory for data-centric computing," in 2020 IEEE IEDM, pp. 29–3, IEEE, 2020.), Non-Patent Document 6 (R. Rajaei et al., "Compact single-phase-search multistate content-addressable memory design using one FeFET / cell," IEEE TED, vol. 68, no. 1, pp. 109–117, 2020.), Non-Patent Document 7 (X. Yin et al.), Non-Patent Document 8 (C. Li et al., "An ultracompact single ferroelectric field-effect transistor binary and multibit associative search engine," Advanced Intelligent Systems, vol. 5, no. 7, p. 2200428, 2023.), and Non-Patent Document 9 (X. Yin et al., "FeCAM: A universal compact digital and analog content addressable memory using ferroelectric," IEEE TED, vol. 67, no. 7, pp. 2785–2792, 2020.).
[0072] The main metrics for comparison include the number of transistors per cell, the area of each cell, the search latency, and the search energy consumption per cell per search. For the ACAM design in this invention, the search latency is measured using the worst-case latency, i.e., the case where only one cell is mismatched; the energy consumption for each search phase is measured using the average energy consumption, i.e., the case where half of the cells in a row are matched and half are mismatched.
[0073] 2. Simulation Results
[0074] 1) Verification of the correctness of ACAM unit operation functions
[0075] The applicant first evaluated how the analog values stored in F1 and F2 of a single ACAM cell affect the programming of the continuous matching range, as shown in Figures 6(a) and 6(b), by fixing the value stored in F1 while adjusting the value stored in F2, and by fixing the value stored in F2 while adjusting the value stored in F1, respectively. In Figure 6(a), the value stored in F1 is fixed; as the value stored in F2 increases, the overlap range of the ML SA output curves widens, showing that the width of the continuous matching range increases accordingly. This verifies that the value stored in F2 correctly sets the width of the continuous matching range. In Figure 6(b), when the value stored in F1 is shifted, the curves shift accordingly, but because the value stored in F2 remains fixed, the overlap range of the ML SA output curves remains constant. This verifies that the value stored in F1 correctly sets the position of the continuous matching range. Furthermore, Figures 6(c) and 6(d) demonstrate the ability of the ACAM cell in this invention to set multiple continuous matching ranges, with a maximum range size of 1V. In Figure 6(c), the range changes from [0.1V, 0.2V] to [0.1V, 1.1V], with only the upper bound increasing in steps of 0.1V, and in Figure 6(d), the range changes from [0.1V, 1.1V] to [1.0V, 1.1V], with only the lower bound increasing in steps of 0.1V. Figures 6(c) and 6(d) also show the corresponding stored-value settings for F1 and F2, further validating the correctness of the continuous matching range programming.
[0076] As shown in Figure 7(a), the applicant performed transient simulations on the ACAM cell to verify the functional correctness of the cell search operation. In this case, the ACAM cell was pre-programmed with a continuous matching range bounded by 0.4V and 0.6V, as shown in Figure 7(b). When the search input voltage is 0.5V, the SA output remains high throughout both search phases, indicating a match, while when the search input voltage is 0.7V / 0.3V, the SA output drops to low in search phase one / two, indicating a mismatch. Furthermore, Figure 7(c) shows a 3-D transient simulation waveform of the SA output as a function of the search input voltage. It can be seen that only search input voltages within the matching range [0.4V, 0.6V] yield a matching search result. This case verifies the correctness of the search operation of the ACAM cell of this invention. Finally, Figure 7(d) demonstrates the ability of the 2FeFET-1T ACAM cell to store eight non-overlapping continuous matching ranges, indicating that it has 3-bit MLC storage characteristics and verifying that ACAM can cover MCAM functionality.
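The way eight non-overlapping ranges give a cell 3-bit MCAM behavior can be illustrated with a short sketch. It assumes, purely for illustration, that the ranges evenly partition a 0.1 V to 1.1 V window and that each range is half-open so that neighbors do not overlap; the actual boundaries used in Figure 7(d) are not given in the text, and `make_ranges` and `lookup` are hypothetical helper names.

```python
from typing import List, Optional, Tuple


def make_ranges(v_min: float, v_max: float, bits: int) -> List[Tuple[float, float]]:
    """Split [v_min, v_max) into 2**bits equal, non-overlapping ranges."""
    count = 2 ** bits
    step = (v_max - v_min) / count
    return [(v_min + i * step, v_min + (i + 1) * step) for i in range(count)]


def lookup(ranges: List[Tuple[float, float]], v_in: float) -> Optional[int]:
    """Return the code whose range contains v_in, or None if none matches."""
    for code, (lo, hi) in enumerate(ranges):
        if lo <= v_in < hi:  # half-open so adjacent ranges never both match
            return code
    return None


if __name__ == "__main__":
    ranges = make_ranges(0.1, 1.1, bits=3)  # assumed window, 8 ranges
    for code, (lo, hi) in enumerate(ranges):
        print(f"code {code:03b}: [{lo:.3f} V, {hi:.3f} V)")
    print("0.5 V maps to code", lookup(ranges, 0.5))
```

A cell programmed with one of these ranges matches only inputs that encode the same 3-bit value, which is how an ACAM cell can stand in for a multi-bit CAM cell.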
[0077] 2) Verification of the correctness of ACAM array operation function
[0078] Figure 8 shows the transient simulation waveforms of the ACAM array of this invention for three search cases, which verify the array operation and the correctness of the proposed adaptive scheme for selectively terminating the search phase early. In the first search case, SAO is high in search phase one, indicating that the search result is already a mismatch. Therefore, CLK2A is low in search phase two, ML is not reset, and the search configuration is cut off, so that search phase two is terminated early. In the second and third search cases, SAO is low in search phase one, indicating that the upper-bound search of the matching range is a match. In this case, search phase two is still needed to determine the final search result. In search phase two, SAO is low in the second search case and high in the third, indicating that the final search result is a match and a mismatch, respectively. These three search cases verify the functional correctness of the ACAM array operation of this invention.
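The adaptive early-termination behavior of a single array row can be sketched at the behavioral level as follows. This is a simplified model under stated assumptions, not the circuit itself: each cell is reduced to an inclusive (lower, upper) voltage range, phase one checks all upper bounds in parallel, phase two checks all lower bounds, and the function name `search_row` and the example voltages are hypothetical.

```python
from typing import List, Sequence, Tuple


def search_row(stored_ranges: Sequence[Tuple[float, float]],
               inputs: Sequence[float]) -> Tuple[bool, int]:
    """Return (row_matches, phases_executed) for one row of ACAM cells."""
    if len(stored_ranges) != len(inputs):
        raise ValueError("one input voltage is required per cell")

    # Search phase one: any input above its cell's upper bound is a mismatch.
    if any(v > upper for (_, upper), v in zip(stored_ranges, inputs)):
        return False, 1  # phase two is skipped (early termination)

    # Search phase two: any input below its cell's lower bound is a mismatch.
    phase_two_mismatch = any(
        v < lower for (lower, _), v in zip(stored_ranges, inputs))
    return (not phase_two_mismatch), 2


if __name__ == "__main__":
    row: List[Tuple[float, float]] = [(0.4, 0.6), (0.2, 0.3)]
    print(search_row(row, [0.7, 0.25]))  # mismatch found in phase one
    print(search_row(row, [0.5, 0.25]))  # match after both phases
    print(search_row(row, [0.5, 0.10]))  # mismatch found in phase two
```

The three calls mirror the three search cases: a phase-one mismatch that skips phase two, a full match, and a mismatch that is only detected in phase two.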
[0079] 3) Search latency and energy consumption analysis of ACAM arrays with different row numbers and word lengths
[0080] In each search phase, the search energy consumption of the ACAM array consists mainly of two parts: (i) pre-charge energy, determined by the ML-associated capacitance and the voltage swing; and (ii) SA energy, dominated by the two cascaded inverters. Furthermore, on a statistical basis, the applicant assumes that each ACAM cell matches the input analog value with probability 1/2. Therefore, the probability that a given row matches in search phase one is:
[0081] $P_{\mathrm{match}} = \left(\frac{1}{2}\right)^{n}$
[0082] where n represents the word length. Considering the adaptive scheme for selectively terminating the search phase proposed in this invention, when a mismatch occurs in search phase one, search phase two is terminated early and consumes no energy. Therefore, the average search energy consumption per row can be calculated as follows:
[0083]
[0084] Here, the first two energy terms represent the energy consumption of search phases one and two, respectively, under the average condition in which half of the cells match, and the third energy term represents the energy consumption of search phase one under the condition in which all cells match. The applicant measured the search delay of the ACAM array under the worst-case scenario.
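The formula itself is not reproduced in this text, so the following sketch shows only one plausible way to combine the quantities described above. It assumes that a row that mismatches in search phase one consumes the average phase-one energy, and that a row that matches in phase one consumes the all-match phase-one energy plus the phase-two energy. This weighting, the function names, and the energy values in the example are all assumptions made for illustration and are not taken from the patent.

```python
def row_match_probability(word_length: int, p_cell: float = 0.5) -> float:
    """Probability that all cells of a row match, assuming independent cells."""
    return p_cell ** word_length


def average_row_search_energy(word_length: int,
                              e_phase1_half: float,
                              e_phase2_half: float,
                              e_phase1_all: float,
                              p_cell: float = 0.5) -> float:
    """Expected search energy per row under the ASSUMED weighting:

        E = (1 - P) * e_phase1_half + P * (e_phase1_all + e_phase2_half)

    where P is the phase-one row match probability. Rows that mismatch in
    phase one skip phase two and therefore add no phase-two energy.
    """
    p = row_match_probability(word_length, p_cell)
    return (1.0 - p) * e_phase1_half + p * (e_phase1_all + e_phase2_half)


if __name__ == "__main__":
    # Placeholder energies in arbitrary units, chosen only for illustration.
    for n in (1, 2, 4, 8, 16):
        e = average_row_search_energy(n, 2.0, 4.0, 6.0)
        print(f"word length {n:2d}: P = {row_match_probability(n):.5f}, "
              f"E = {e:.4f}")
```

In this sketch the phase-two contribution is weighted by the row match probability, so it shrinks rapidly as the word length grows.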
[0085] Based on the above analysis, Figure 9(a) and (b) illustrate the search latency and energy consumption of the ACAM arrays of this invention with different numbers of rows and different word lengths. As shown in Figure 9(a), since each row in the ACAM array is independent, the search energy consumption of the ACAM array increases linearly with the number of rows, while the change in search latency is negligible. On the other hand, as shown in Figure 9(b), as the word length increases, the capacitance associated with ML increases, which slows both the pre-charge and the signal propagation on ML and therefore increases search latency. According to the formula for the average search energy consumption per row, when the word length is short, the influence of the second term, which is contributed by search phase two, gradually weakens as the word length increases, so the average search energy per row trends downward overall. When the word length is long, the second term can be ignored; in this case, as the word length increases, the increased ML-associated capacitance raises the pre-charge energy of each word, so the average search energy per row trends upward again. Furthermore, since SA energy consumption depends mainly on the two cascaded inverters, increasing the word length has a negligible impact on SA energy consumption. From this perspective, the average search energy consumption per cell continues to decrease.
[0086] 4) Robustness verification against process variations
[0087] The applicant also conducted robustness verification tests on the ACAM array design of this invention. The FeFET device is assumed to exhibit an experimental variation of σ = 40 mV at each threshold voltage state, while the CMOS devices are configured with Monte Carlo process corners. As shown in Figure 10, as the word length per row increases, 100 Monte Carlo simulations were performed under the above process variations for both the worst-case mismatch and the all-cell matching cases. Figure 10 shows the transient simulation waveforms and the statistical behavior of SAO during the search operation. These 100 Monte Carlo simulation results show that as the array size increases, the sensing margin between matching and mismatching cases gradually decreases, but the ACAM array can still achieve successful search operations, which demonstrates the reliability and robustness of the ACAM design proposed in this invention.
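The idea of a variation study can be illustrated with a very coarse behavioral Monte Carlo model. The sketch below is not the circuit-level simulation described above: it assumes that threshold-voltage variation can be modeled as an independent Gaussian shift (σ = 40 mV) applied directly to each bound of the matching range, it ignores CMOS variation and word-length effects, and the function name, seed, and example voltages are hypothetical.

```python
import random


def monte_carlo_pass_rate(lower_v: float, upper_v: float, v_in: float,
                          expected_match: bool, sigma_v: float = 0.040,
                          trials: int = 100, seed: int = 0) -> float:
    """Fraction of trials in which the perturbed cell gives the expected result."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    passes = 0
    for _ in range(trials):
        # ASSUMPTION: device variation shifts each bound independently.
        lo = lower_v + rng.gauss(0.0, sigma_v)
        hi = upper_v + rng.gauss(0.0, sigma_v)
        is_match = lo <= v_in <= hi
        if is_match == expected_match:
            passes += 1
    return passes / trials


if __name__ == "__main__":
    # Range [0.4 V, 0.6 V]; one input inside the range and one outside it.
    print("inside :", monte_carlo_pass_rate(0.4, 0.6, 0.5, True))
    print("outside:", monte_carlo_pass_rate(0.4, 0.6, 0.8, False))
```

A model of this kind only indicates how much voltage margin a given range leaves relative to σ; the sensing margin reported for Figure 10 comes from the transient circuit simulations.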
[0088] 5) Performance Comparison
[0089] The table below compares the performance indicators of the FeFET-based ACAM design in this invention with those of other CAM designs.
[0090]
[0091] The table above summarizes the technical specifications of the 2FeFET-1T ACAM and other CAMs, where the cell size is estimated from the layout of a 2×2 2FeFET-1T ACAM array. As can be seen from the table, the 2FeFET-1T ACAM design proposed in this invention achieves the best energy efficiency among all B/M/ACAMs. This is mainly due to the reduction in ML-associated capacitance and voltage swing, as well as the adoption of an adaptive scheme that terminates the search operation early. Although the 2FeFET-1T ACAM has a longer search latency because of its inherent two-phase search scheme, the latency remains acceptable among all CAM designs. Among state-of-the-art ACAM designs, the 6T-2R ACAM design (Non-Patent Document 8) has a lower search delay due to its current-based sensing. However, its search energy consumption is 8.39 times higher than that of the 2FeFET-1T ACAM design proposed in this invention. Moreover, the design of this invention has a more compact structure and reduces area overhead. In addition, by adopting a single search line input, the design of this invention eliminates the analog voltage conversion required in the 2FeFET ACAM design (Non-Patent Document 9). Furthermore, by further optimizing the array architecture, the design of this invention achieves an energy efficiency 2.94 times higher than that of the 2FeFET ACAM design.
[0092] The results above demonstrate that this invention not only possesses non-volatility, which is difficult to achieve in CMOS design, and robustness to process variations, but also features compact design and high search efficiency, showing its potential in data-intensive search applications. Furthermore, the results also verify the functional correctness of the ACAM cells and the operations within the array in this invention.
[0093] The above embodiments are used to explain and illustrate the present invention, but not to limit the present invention. Any modifications and changes made to the present invention within the spirit and scope of the claims shall fall within the protection scope of the present invention.
Claims
1. An operation method of a high-efficiency analog CAM based on a FeFET structure, characterized in that the analog CAM unit consists of one NMOS transistor and two FeFETs, the NMOS transistor being T0 and the two FeFETs being F1 and F2, respectively. In this analog CAM unit structure, T0 and F1 are connected in series to form an equivalent voltage divider circuit. The source of T0 is connected to the word line WL, and the source of F1 is connected to the word line WL. The gate of T0 is connected to the control line CL, the gate of F1 is connected to the search line SL, the drain of T0 is connected to the drain of F1 and to the gate of F2 at node D, and the drain of F2 is connected to the matching line ML. ML discharges through F2. The operation method includes:
Data storage: before the cross array composed of several analog CAM units starts working, data is stored in each analog CAM unit; that is, after the analog values of position and width are calculated from the upper and lower bounds of the continuous matching range, F1 and F2 are written through SL and the charging line ScL, respectively. Each search cycle is divided into search phase one and search phase two, which have independent clock signals CLK1 and CLK2 and are used to search the upper and lower bounds of the continuous matching range, respectively.
Search phase one: when clock signal CLK1 is high, CLK1 discharges ML to ground by controlling the gate of the pull-down NMOS transistor connected to ML; when CLK1 is low, the discharge of ML is turned off, WL and its complementary line are set to 0 and 1, respectively, SL is set to the search input voltage, and ScL is connected to the power supply VDD. At this point, for matched analog CAM units, ScL does not charge ML through F2; for mismatched analog CAM units, ScL charges ML through F2. After waiting for a period of time, the output of the sense amplifier SA of each row is observed. If it is 1, ML in this row has been charged, the row is a mismatch, and all search configurations are immediately cut off so that ML is not charged to VDD, which reduces the voltage swing of ML; if it is 0, ML in this row has not been charged and the row is currently a match.
Search phase two: an adaptive scheme for selectively terminating the search phase is introduced. If a mismatch is observed on ML in search phase one, search phase two is terminated early and the mismatch result is output directly. If a match is observed on ML in search phase one, search phase two proceeds. First, when clock signal CLK2 is high, CLK2 discharges ML to ground by controlling the gate of another pull-down NMOS transistor connected to ML. When CLK2 is low, the discharge of ML is turned off, and WL and its complementary line are reversed to 1 and 0, respectively, while the voltages of SL and ScL are kept unchanged. At this time, for matched analog CAM units, ScL does not charge ML through F2, while for mismatched analog CAM units, ScL charges ML through F2. After waiting for a period of time, the output of the sense amplifier of each row is observed. If it is 1, ML in this row has been charged, the row is ultimately a mismatch, and all search configurations are immediately cut off. If it is 0, ML in this row has not been charged, and the row is a complete match after the two-phase search.
2. The operation method of a high-efficiency analog CAM based on a FeFET structure according to claim 1, characterized in that, by operating the gate of F1 and the drain of F2 respectively while keeping D always grounded, sufficient voltage drops are generated across the gate and source terminals of F1 and F2 to store the analog values of the two FeFETs; the analog values stored in F1 and F2 respectively set the position and width of the continuous matching range.
3. The operation method of a high-efficiency analog CAM based on a FeFET structure according to claim 1, characterized in that the query input is connected to the gate of F1 via a single search line SL.
4. The operation method of a high-efficiency analog CAM based on a FeFET structure according to claim 1, characterized in that, in a cross array composed of several analog CAM units, the units in each column share the same vertical line SL and the units in each row share the same horizontal line WL; in each row, all the analog CAM units are connected to one another via ML, and each ML is connected to a sense amplifier SA that provides its output.
5. The operation method of a high-efficiency analog CAM based on a FeFET structure according to claim 4, characterized in that, in a cross array composed of several analog CAM units, ML, which is connected to the drain of F2 in the analog CAM unit design, is instead connected to the source of F2, and the drain of F2 is connected to the charging line ScL. This moves the pre-charging of ML into the search operation within the analog CAM unit, thereby reducing the voltage swing of ML.