Adaptive bias decoder for analog neural memory arrays in artificial neural networks

A non-volatile memory array is used as the synapses and combined with a differential summer and an activation function circuit. This addresses the low energy efficiency and high computational complexity of existing artificial neural network hardware, enabling efficient analog computation and fine tuning.

CN115968495B · Active · Publication Date: 2026-06-19 · SILICON STORAGE TECHNOLOGY INC


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SILICON STORAGE TECHNOLOGY INC
Filing Date: 2021-01-05
Publication Date: 2026-06-19

IPC: G11C16/04; G11C16/08; G11C16/24; G11C16/30

AI Tagging

Application Domain

Read-only memories

AI Technical Summary

Technical Problem

Existing artificial neural network hardware technologies suffer from low energy efficiency and high computational complexity. In particular, CMOS-implemented synapses are too large, making it difficult to achieve efficient analog computation.

Method used

A non-volatile memory array is used as the synapses, and vector matrix multiplication is achieved by configuring the memory cell array. Combined with a differential summer and an activation function circuit, the array provides both analog computation and weight storage, reducing the need for separate multiplication and addition logic circuits.

Benefits of technology

It achieves efficient analog computing, reduces hardware complexity and energy consumption, and is suitable for fine-tuning of artificial neural networks and high-performance information processing.

✦ Generated by Eureka AI based on patent content.


Abstract

Numerous embodiments of analog neural memory arrays are disclosed. Some embodiments include an adaptive bias decoder that provides additional bias to the array input lines to compensate for ground floating above 0V. This is useful, for example, for minimizing voltage drops during read, program, or erase operations while maintaining the accuracy of the operation.


Description

[0001] Priority Statement

[0002] This application claims priority to U.S. Provisional Patent Application No. 63/048,470, filed July 6, 2020, entitled “Adaptive Bias Decoder for Analog Neural Memory Array in Artificial Neural Network With Source Line Pulldown Mechanism,” and U.S. Patent Application No. 17/140,924, filed January 4, 2021, entitled “Adaptive Bias Decoder for Analog Neural Memory Array in Artificial Neural Network.”

Technical Field

[0003] Numerous embodiments of analog neural memory arrays are disclosed.

Background Technology

[0004] Artificial neural networks mimic biological neural networks (the central nervous system of animals, especially the brain) and are used to estimate or approximate functions that can depend on a large number of inputs and are often unknown. Artificial neural networks typically consist of interconnected layers of "neurons" that exchange messages with each other.

[0005] Figure 1 illustrates an artificial neural network, where the circles represent the inputs or layers of neurons. Connections (called synapses) are indicated by arrows and have numerical weights that can be adjusted based on experience. This allows the artificial neural network to adapt to its inputs and learn. Typically, an artificial neural network includes a layer of multiple inputs. There are usually one or more intermediate layers of neurons, and an output layer of neurons that provides the output of the neural network. Neurons at each level make decisions individually or collectively based on the data received from the synapses.

[0006] One of the major challenges in developing artificial neural networks for high-performance information processing is the lack of adequate hardware technology. Practical artificial neural networks rely on a very large number of synapses to achieve high connectivity between neurons, i.e., very high computational parallelism. In principle, such complexity can be achieved using digital supercomputers or dedicated clusters of graphics processing units. However, in addition to being costly, these approaches are generally energy inefficient compared to biological networks, which consume far less energy primarily because they perform low-precision analog computation. CMOS analog circuits have been used in artificial neural networks, but given the large number of neurons and synapses, most CMOS-implemented synapses are too large.

[0007] The applicant previously disclosed an artificial (analog) neural network utilizing one or more non-volatile memory arrays as synapses in U.S. Patent Application No. 15/594,439 (published as U.S. Patent Publication 2017/0337466), which is incorporated herein by reference. The non-volatile memory array operates as an analog neuromorphic memory. As used herein, the term "neuromorphic" refers to a circuit that implements a model of a nervous system. The analog neuromorphic memory includes a first plurality of synapses configured to receive a first plurality of inputs and generate a first plurality of outputs therefrom, and a first plurality of neurons configured to receive the first plurality of outputs. The first plurality of synapses includes a plurality of memory cells, wherein each memory cell includes: spaced-apart source and drain regions formed in a semiconductor substrate, wherein a channel region extends between the source and drain regions; a floating gate disposed over and insulated from a first portion of the channel region; and a non-floating gate disposed over and insulated from a second portion of the channel region. Each memory cell is configured to store a weight value corresponding to a number of electrons on the floating gate. The plurality of memory cells is configured to multiply the first plurality of inputs by the stored weight values to generate the first plurality of outputs. An array of memory cells arranged in this manner may be called a vector matrix multiplication (VMM) array.

[0008] Examples of different non-volatile memory cells that can be used in a VMM will now be discussed.

[0009] Non-volatile memory cells

[0010] Various types of known non-volatile memory cells can be used in VMM arrays. For example, U.S. Patent 5,029,130 (the “'130 patent”), which is incorporated herein by reference, discloses an array of split-gate non-volatile memory cells, which are a type of flash memory cell. Such a memory cell 210 is shown in Figure 2. Each memory cell 210 includes a source region 14 and a drain region 16 formed in a semiconductor substrate 12, with a channel region 18 therebetween. A floating gate 20 is formed over and insulated from (and controls the conductivity of) a first portion of the channel region 18, and is formed over a portion of the source region 14. A word line terminal 22 (which is typically coupled to a word line) has a first portion disposed over and insulated from (and controlling the conductivity of) a second portion of the channel region 18, and a second portion extending upward and over the floating gate 20. The floating gate 20 and the word line terminal 22 are insulated from the substrate 12 by a gate oxide. A bit line terminal 24 is coupled to the drain region 16.

[0011] Memory cell 210 is erased (i.e., electrons are removed from the floating gate) by applying a high positive voltage to the word line terminal 22, which causes electrons on the floating gate 20 to tunnel through the intermediate insulator from the floating gate 20 to the word line terminal 22 via Fowler-Nordheim tunneling.

[0012] Memory cell 210 is programmed (i.e., electrons are placed on the floating gate) by applying a positive voltage to the word line terminal 22 and a positive voltage to the source region 14. Electron current flows from the drain region 16 toward the source region 14 (source line terminal). When the electrons reach the gap between the word line terminal 22 and the floating gate 20, they are accelerated and become excited (heated). Due to electrostatic attraction from the floating gate 20, some of the heated electrons are injected through the gate oxide onto the floating gate 20.

[0013] Memory cell 210 is read by applying positive read voltages to the drain region 16 and the word line terminal 22 (which turns on the portion of the channel region 18 under the word line terminal). If the floating gate 20 is positively charged (i.e., erased of electrons), the portion of the channel region 18 under the floating gate 20 is also turned on, and current flows through the channel region 18, which is sensed as the erased or "1" state. If the floating gate 20 is negatively charged (i.e., programmed with electrons), the portion of the channel region under the floating gate 20 is mostly or entirely turned off, and no current (or very little current) flows through the channel region 18, which is sensed as the programmed or "0" state.

[0014] Table 1 shows the typical voltage ranges that can be applied to the terminals of memory cell 210 for performing read, erase, and program operations:

[0015] Table 1: Operation of flash memory cell 210 of Figure 2

[0016]
Operation | WL | BL | SL
Read 1 | 0.5V-3V | 0.1V-2V | 0V
Read 2 | 0.5V-3V | 0V-2V | 2V-0.1V
Erase | approximately 11V-13V | 0V | 0V
Program | 1V-2V | 1μA-3μA | 9V-10V

[0017] "Read 1" is the read mode where the cell current is output on the bit line. "Read 2" is the read mode where the cell current is output on the source line terminal.
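As an illustration only, the voltage ranges of Table 1 can be held in a small lookup structure and used to check a proposed bias. This is a minimal sketch: the dictionary layout and the helper name `bias_in_range` are hypothetical and do not come from the patent. The program-mode bit line is omitted because Table 1 specifies it as a current (1μA-3μA), not a voltage.

```python
# Voltage ranges transcribed from Table 1 (memory cell 210), in volts.
# The data layout and helper name are illustrative assumptions.
TABLE_1_VOLTS = {
    "read_1": {"WL": (0.5, 3.0), "BL": (0.1, 2.0), "SL": (0.0, 0.0)},
    "read_2": {"WL": (0.5, 3.0), "BL": (0.0, 2.0), "SL": (0.1, 2.0)},
    "erase": {"WL": (11.0, 13.0), "BL": (0.0, 0.0), "SL": (0.0, 0.0)},
    # The program-mode bit line is current-driven, so it is not listed here.
    "program": {"WL": (1.0, 2.0), "SL": (9.0, 10.0)},
}


def bias_in_range(operation, terminal, volts):
    """Return True if `volts` lies inside the Table 1 range for the terminal."""
    low, high = TABLE_1_VOLTS[operation][terminal]
    return low <= volts <= high


print(bias_in_range("read_1", "WL", 1.2))
print(bias_in_range("erase", "WL", 5.0))
```

The same pattern would extend to Tables 2 through 4 by adding the CG, EG, or substrate terminals.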

[0018] Figure 3 shows memory cell 310, which is similar to memory cell 210 of Figure 2 but with the addition of a control gate (CG) terminal 28. The control gate terminal 28 is biased at a high voltage (e.g., 10V) during programming, at a low or negative voltage (e.g., 0V / -8V) during erasure, and at a low or medium voltage (e.g., 0V / 2.5V) during reading. The other terminals are biased similarly to those of Figure 2.

[0019] Figure 4 shows a four-gate memory cell 410, comprising a source region 14, a drain region 16, a floating gate 20 over a first portion of a channel region 18, a select gate 22 (typically coupled to a word line WL) over a second portion of the channel region 18, a control gate 28 over the floating gate 20, and an erase gate 30 over the source region 14. This configuration is described in U.S. Patent 6,747,310, which is incorporated herein by reference for all purposes. Here, all gates except the floating gate 20 are non-floating gates, meaning they are electrically connected to or capable of being electrically connected to a voltage source. Programming is performed by heated electrons from the channel region 18 injecting themselves onto the floating gate 20. Erasing is performed by electrons tunneling from the floating gate 20 to the erase gate 30.

[0020] Table 2 shows the typical voltage ranges that can be applied to the terminals of memory cell 410 for performing read, erase, and program operations:

[0021] Table 2: Operation of flash memory cell 410 of Figure 4

[0022]
Operation | WL / SG | BL | CG | EG | SL
Read 1 | 0.5V-2V | 0.1V-2V | 0V-2.6V | 0V-2.6V | 0V
Read 2 | 0.5V-2V | 0V-2V | 0V-2.6V | 0V-2.6V | 2V-0.1V
Erase | -0.5V / 0V | 0V | 0V / -8V | 8V-12V | 0V
Program | 1V | 1μA | 8V-11V | 4.5V-9V | 4.5V-5V

[0023] "Read 1" is the read mode where the cell current is output on the bit line. "Read 2" is the read mode where the cell current is output on the source line terminal.

[0024] Figure 5 shows memory cell 510, which is similar to memory cell 410 of Figure 4 except that memory cell 510 does not have an erase gate EG terminal. Erasure is performed by biasing the substrate 18 to a high voltage and the control gate CG terminal 28 to a low voltage or a negative voltage. Alternatively, erasure is performed by biasing the word line terminal 22 to a positive voltage and the control gate terminal 28 to a negative voltage. Programming and reading are similar to those of Figure 4.

[0025] Figure 6 shows a three-gate memory cell 610, which is another type of flash memory cell. Memory cell 610 is identical to memory cell 410 of Figure 4, except that memory cell 610 does not have a separate control gate terminal. The erase operation (erasing via the erase gate terminal) and the read operation are similar to those of Figure 4, except that no control gate bias is applied. The programming operation is also performed without a control gate bias, and as a result, a higher voltage must be applied to the source line terminal during programming to compensate for the lack of a control gate bias.

[0026] Table 3 shows the typical voltage ranges that can be applied to the terminals of memory cell 610 for performing read, erase, and program operations:

[0027] Table 3: Operation of flash memory cell 610 of Figure 6

[0028]
Operation | WL / SG | BL | EG | SL
Read 1 | 0.5V-2.2V | 0.1V-2V | 0V-2.6V | 0V
Read 2 | 0.5V-2.2V | 0V-2V | 0V-2.6V | 2V-0.1V
Erase | -0.5V / 0V | 0V | 11.5V | 0V
Program | 1V | 2μA-3μA | 4.5V | 7V-9V

[0029] "Read 1" is the read mode where the cell current is output on the bit line. "Read 2" is the read mode where the cell current is output on the source line terminal.

[0030] Figure 7 shows a stacked-gate memory cell 710, which is another type of flash memory cell. Memory cell 710 is similar to memory cell 210 of Figure 2, except that the floating gate 20 extends over the entire channel region 18, and the control gate terminal 22 (which here will be coupled to a word line) extends over the floating gate 20, separated by an insulating layer (not shown). Programming is performed using hot electron injection from the channel 18 to the floating gate 20 in the channel region near the drain region 16, and erasure is performed using Fowler-Nordheim electron tunneling from the floating gate 20 to the substrate 12. Read operations are performed in a manner similar to that previously described for memory cell 210.

[0031] Table 4 shows the typical voltage ranges that can be applied to the terminals of memory cell 710 and substrate 12 to perform read, erase, and program operations:

[0032] Table 4: Operation of flash memory cell 710 of Figure 7

[0033]
Operation | CG | BL | SL | Substrate
Read 1 | 0V-5V | 0.1V-2V | 0V-2V | 0V
Read 2 | 0.5V-2V | 0V-2V | 2V-0.1V | 0V
Erase | -8V to -10V / 0V | FLT | FLT | 8V-10V / 15V-20V
Program | 8V-12V | 3V-5V / 0V | 0V / 3V-5V | 0V

[0034] "Read 1" is a read mode in which the cell current is output on the bit line. "Read 2" is a read mode in which the cell current is output at the source line terminal. Optionally, in an array comprising rows and columns of memory cells 210, 310, 410, 510, 610, or 710, the source line can be coupled to a row of memory cells or two adjacent rows of memory cells. That is, the source line terminal can be shared by memory cells in adjacent rows.

[0035] Figure 8 shows a twin split-gate memory cell 810. The memory cell 810 includes: a floating gate (FG) 20 disposed over and insulated from a substrate 12; a control gate 28 (CG) disposed over and insulated from the floating gate 20; an erase gate 30 (EG) disposed adjacent to and insulated from the floating gate 20 and the control gate 28, and disposed over and insulated from the substrate 12, wherein the erase gate is T-shaped such that a top corner of the control gate CG faces the inside corner of the T-shaped erase gate to improve erase efficiency; and a drain region 16 (DR) in the substrate adjacent to the floating gate 20 (with a bit line contact 24 (BL) connected to the drain diffusion region 16 (DR)). The memory cells are formed as pairs of memory cells (A on the left and B on the right) sharing a common erase gate 30. This cell design differs from the memory cells discussed above with reference to Figures 2-7 at least in that it lacks a source region under the erase gate EG, lacks a select gate (also referred to as a word line), and lacks a channel region for each memory cell. Instead, a single continuous channel region 18 extends under both memory cells (i.e., from the drain region 16 of one memory cell to the drain region 16 of the other memory cell). To read or program one memory cell, the control gate 28 of the other memory cell is raised to a sufficient voltage to turn on the underlying channel region portion via voltage coupling to the floating gate 20 between them (e.g., to read or program cell A, the voltage on FGB is raised via voltage coupling from CGB to turn on the channel region portion under FGB). Erasure is performed using Fowler-Nordheim electron tunneling from the floating gate 20 to the erase gate 30. Programming is performed using hot electron injection from the channel 18 to the floating gate 20, denoted as Programming 1 in Table 5. Alternatively, programming is performed using Fowler-Nordheim electron tunneling from the erase gate 30 to the floating gate 20, denoted as Programming 2 in Table 5. As a further alternative, programming is performed using Fowler-Nordheim electron tunneling from the channel 18 to the floating gate 20, in which case the conditions are similar to Programming 2, except that the substrate is biased at a low or negative voltage while the erase gate is biased at a low positive voltage.

[0036] Table 5 shows the typical voltage ranges that can be applied to the terminals of memory cell 810 for performing read, erase, and program operations:

[0037] Table 5: Operation of flash memory cell 810 of Figure 8

[0038]

[0039] To utilize memory arrays comprising one of the aforementioned types of non-volatile memory cells in an artificial neural network, two modifications are made. First, the lines are configured so that each memory cell can be individually programmed, erased, and read without adversely affecting the memory state of other memory cells in the array, as explained further below. Second, continuous (analog) programming of the memory cells is provided.

[0040] Specifically, the memory state (i.e., the charge on the floating gate) of each memory cell in the array can be continuously changed from a fully erased state to a fully programmed state independently with minimal interference to other memory cells. In another embodiment, the memory state (i.e., the charge on the floating gate) of each memory cell in the array can be continuously changed from a fully programmed state to a fully erased state and vice versa, independently with minimal interference to other memory cells. This means that the cell storage device is analog, or at least can store one discrete value from many discrete values (such as 16 or 64 different values), which allows for very precise and individual tuning of all cells in the memory array, and makes the memory array ideal for storage and fine-tuning of synaptic weights in neural networks.
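The idea of storing one of many discrete values per cell can be sketched as a simple quantizer. This is a minimal illustration, not the tuning method of the patent: it assumes a weight already normalized to the range 0 to 1 and evenly spaced levels, and the function name `quantize_weight` is hypothetical.

```python
def quantize_weight(weight, levels=16):
    """Map a normalized weight in [0, 1] to the nearest of `levels` states.

    Evenly spaced levels are an assumption made for illustration; the text
    only states that a cell can hold one of many values (such as 16 or 64).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be normalized to the range [0, 1]")
    step = 1.0 / (levels - 1)
    index = round(weight / step)
    return index, index * step


index, stored = quantize_weight(0.4, levels=16)
print(index, stored)
```

With more levels per cell (for example 64 instead of 16), the gap between the requested and stored weight shrinks, which is why finer analog tuning helps synaptic weight storage.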

[0041] The methods and apparatus described herein can be applied to other non-volatile memory technologies, such as, but not limited to, FINFET split-gate flash or stacked-gate flash memory, NAND flash memory, SONOS (silicon-oxide-nitride-oxide-silicon, charge trap in nitride), MONOS (metal-oxide-nitride-oxide-silicon, metal charge trap in nitride), ReRAM (resistive RAM), PCM (phase-change memory), MRAM (magnetic RAM), FeRAM (ferroelectric RAM), OTP (bi-level or multi-level one-time programmable), and CeRAM (correlated electron RAM). The methods and apparatus described herein can also be applied to volatile memory technologies used for neural networks, such as, but not limited to, SRAM, DRAM, and/or volatile synapse cells.

[0042] Neural networks using non-volatile memory cell arrays

[0043] Figure 9 conceptually illustrates a non-limiting example of a neural network utilizing a non-volatile memory array of the present embodiments. This example uses the non-volatile memory array neural network for a facial recognition application, but any other suitable application can also be implemented using a neural network based on a non-volatile memory array.

[0044] In this example, S0 is the input layer, which is a 32x32 pixel RGB image with 5-bit precision (i.e., three 32x32 pixel arrays, one for each color R, G, and B, with 5-bit precision per pixel). The synapse CB1 from the input layer S0 to layer C1 applies different sets of weights in some cases and shared weights in others, and scans the input image with a 3x3 pixel overlapping filter (kernel), shifting the filter by one pixel (or more than one pixel depending on the model). Specifically, the values of nine pixels in a 3x3 portion of the image (i.e., called the filter or kernel) are provided to synapse CB1, where these nine input values are multiplied by appropriate weights, and after summing the output of this multiplication, a single output value is determined by the first synapse of CB1 to generate the pixels of one of the feature maps in layer C1. The 3x3 filter is then shifted one pixel to the right within the input layer S0 (i.e., adding a column of three pixels to the right and releasing a column of three pixels to the left), thereby providing the nine pixel values from this newly positioned filter to synapse CB1, where they are multiplied by the same weights and the second single output value is determined by the associated synapse. This process continues until the 3x3 filter scans all three colors and all bits (precision values) across the entire 32x32 pixel image of the input layer S0. This process is then repeated using different sets of weights to generate different feature maps for C1 until all feature maps for layer C1 are computed.
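The filter scan described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: a single color plane (the text scans all three), a square image, and a uniform placeholder kernel. The function name `scan_filter` is hypothetical.

```python
def scan_filter(image, kernel, stride=1):
    """Slide a square kernel over a square image and return one feature map."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    feature_map = []
    for row in range(0, out_size * stride, stride):
        out_row = []
        for col in range(0, out_size * stride, stride):
            # Multiply the k x k window by the weights and sum the products.
            total = 0.0
            for i in range(k):
                for j in range(k):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        feature_map.append(out_row)
    return feature_map


# Placeholder data: one 32x32 color plane and a 3x3 kernel of ones.
image = [[1.0] * 32 for _ in range(32)]
kernel = [[1.0] * 3 for _ in range(3)]
fmap = scan_filter(image, kernel)
print(len(fmap), len(fmap[0]))  # 30 30
```

A 3x3 filter shifted by one pixel fits in 32 - 3 + 1 = 30 positions per axis, which gives a 30x30 feature map. In the VMM array the inner multiply-and-sum is performed by the memory cells themselves.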

[0045] At layer C1, in this example, there are 16 feature maps, each with 30x30 pixels. Each pixel is a new feature pixel extracted from the product of the inputs and the kernel, so each feature map is a two-dimensional array. Therefore, in this example, layer C1 consists of 16 layers of two-dimensional arrays (bearing in mind that the layers and arrays referred to herein are logical relationships, not necessarily physical relationships; that is, the arrays do not have to be oriented as physical two-dimensional arrays). Each of the 16 feature maps in layer C1 is generated by one of sixteen different sets of synaptic weights applied to the filter scans. The C1 feature maps can all relate to different aspects of the same image feature, such as boundary recognition. For example, the first map (generated using a first weight set, shared for all scans used to generate the first map) can identify circular edges, the second map (generated using a second weight set different from the first weight set) can identify rectangular edges, or the aspect ratio of certain features, and so on.

[0046] Before transitioning from layer C1 to layer S1, activation function P1 (pooling) is applied, which pools the values from consecutive non-overlapping 2x2 regions in each feature map. The purpose of the pooling function is to average the neighboring locations (or, alternatively, use a max function) to, for example, reduce the dependence on edge locations and reduce the data size before moving to the next stage. At layer S1, there are 16 15x15 feature maps (i.e., sixteen different arrays, each 15x15 pixels). The synapses CB2 from layer S1 to layer C2 scan the maps in S1 using a 4x4 filter, where the filter is shifted by 1 pixel. At layer C2, there are 22 12x12 feature maps. Before transitioning from layer C2 to layer S2, activation function P2 (pooling) is applied, which pools the values from consecutive non-overlapping 2x2 regions in each feature map. At layer S2, there are 22 6x6 feature maps. An activation function (pooling) is applied at the synapses CB3 from layer S2 to layer C3, where each neuron in layer C3 is connected to every map in layer S2 via a corresponding synapse in CB3. There are 64 neurons in layer C3. The synapses CB4 from layer C3 to the output layer S3 fully connect C3 to S3, meaning each neuron in layer C3 is connected to every neuron in layer S3. The output at S3 comprises 10 neurons, with the highest-output neuron determining the class. For example, this output could indicate the recognition or classification of the content of the original image.
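The map sizes quoted for layers C1, S1, C2, and S2 follow from the filter and pooling sizes. The sketch below recomputes them and shows average pooling over non-overlapping 2x2 regions. The helper names are hypothetical, and average pooling is only one of the two options the text mentions (the other being a max function).

```python
def conv_out(size, kernel, stride=1):
    """Output width of a filter scan with no padding."""
    return (size - kernel) // stride + 1


def average_pool(feature_map, window=2):
    """Average consecutive non-overlapping window x window regions."""
    size = len(feature_map) // window
    pooled = []
    for r in range(size):
        row = []
        for c in range(size):
            block = [
                feature_map[r * window + i][c * window + j]
                for i in range(window)
                for j in range(window)
            ]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


c1 = conv_out(32, 3)  # 3x3 filter on the 32x32 input
s1 = c1 // 2          # 2x2 pooling P1
c2 = conv_out(s1, 4)  # 4x4 filter on S1
s2 = c2 // 2          # 2x2 pooling P2
print(c1, s1, c2, s2)  # 30 15 12 6
print(average_pool([[1, 2], [3, 4]]))  # [[2.5]]
```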

[0047] Synapses for each layer are implemented using an array or a portion of an array of non-volatile memory cells.

[0048] Figure 10 is a block diagram of a system that can be used for this purpose. The VMM system 32 includes non-volatile memory cells and serves as the synapses between one layer and the next (such as CB1, CB2, CB3, and CB4 in Figure 9). Specifically, the VMM system 32 includes a VMM array 33 (comprising non-volatile memory cells arranged in rows and columns), an erase gate and word line gate decoder 34, a control gate decoder 35, a bit line decoder 36, and a source line decoder 37, which decode the corresponding inputs to the non-volatile memory cell array 33. The inputs to the VMM array 33 may come from the erase gate and word line gate decoder 34 or from the control gate decoder 35. In this example, the source line decoder 37 also decodes the outputs of the VMM array 33. Alternatively, the bit line decoder 36 may decode the outputs of the VMM array 33.

[0049] The VMM array 33 serves two purposes. First, it stores weights that will be used by the VMM system 32. Second, the VMM array 33 efficiently multiplies the inputs with the weights stored in the VMM array 33 and adds them together at each output line (source line or bit line) to produce an output that will serve as the input to the next layer or the final layer. By performing multiplication and addition functions, the VMM array 33 eliminates the need for separate multiplication and addition logic circuits and is also highly efficient due to its in-situ memory computation.

[0050] The output of VMM array 33 is provided to a differential summer (such as a summing operational amplifier or a summing current mirror) 38, which sums the output of VMM array 33 to create a single value for the convolution. Differential summer 38 is arranged to perform the summation of both the positive and negative weight inputs to output a single value.

[0051] The summed output values of the differential summer 38 are then provided to the activation function circuit 39, which rectifies the output. The activation function circuit 39 can provide a sigmoid, tanh, ReLU function, or any other nonlinear function. The rectified output values of the activation function circuit 39 become elements of a feature map of the next layer (e.g., layer C1 in Figure 9) and are then applied to the next synapse to produce the next feature map layer or the final layer. Thus, in this example, the VMM array 33 constitutes a plurality of synapses (which receive their inputs from the prior layer of neurons or from an input layer such as an image database), and the summer 38 and activation function circuit 39 constitute a plurality of neurons.
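The signal path from array to neuron output can be sketched numerically. This is an idealized model, not the circuit itself: it assumes that a signed weight is represented as the difference between a "positive" and a "negative" array of non-negative cell weights, and it uses ReLU as one of the activation functions the text lists. All names and values are illustrative.

```python
def vmm_output_currents(inputs, weights):
    """Multiply inputs by stored weights and sum per output line.

    weights[i][j] is the weight of the cell on input line i and output line j.
    """
    n_out = len(weights[0])
    return [
        sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        for j in range(n_out)
    ]


def differential_sum(pos_currents, neg_currents):
    """Combine positive-weight and negative-weight line currents."""
    return [p - n for p, n in zip(pos_currents, neg_currents)]


def relu(value):
    return max(0.0, value)


inputs = [1.0, 2.0, 3.0]
w_pos = [[0.5, 0.0], [0.0, 1.0], [1.0, 0.0]]  # assumed positive-weight cells
w_neg = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.5]]  # assumed negative-weight cells

pos = vmm_output_currents(inputs, w_pos)
neg = vmm_output_currents(inputs, w_neg)
neuron_outputs = [relu(v) for v in differential_sum(pos, neg)]
print(neuron_outputs)  # [2.5, 0.0]
```

In the hardware the multiply and the per-line sum happen inside the array as currents, so only the subtraction and the activation require separate circuits.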

[0052] The inputs to the VMM system 32 in Figure 10 (WLx, EGx, CGx, and optionally BLx and SLx) can be analog levels (e.g., current, voltage, or charge), binary levels, digital pulses (in which case a pulse-to-analog converter PAC may be required to convert the pulses to the appropriate input analog level), or digital bits (in which case a DAC is provided to convert the digital bits to the appropriate input analog level); the outputs can be analog levels, binary levels, digital pulses, or digital bits (in which case an output ADC is provided to convert the output analog level to digital bits).

[0053] Figure 11 is a block diagram illustrating the use of multiple layers of VMM systems 32, here labeled VMM systems 32a, 32b, 32c, 32d, and 32e. As shown in Figure 11, the input (denoted Inputx) is converted from digital to analog by a digital-to-analog converter 31 and provided to the input VMM system 32a. The converted analog input can be a voltage or a current. The input D/A conversion for the first layer can be performed using a function or a LUT (lookup table) that maps the inputs Inputx to appropriate analog levels for the matrix multiplier of the input VMM system 32a. Input conversion can also be performed by an analog-to-analog (A/A) converter that converts an external analog input to a mapped analog input to the input VMM system 32a. Input conversion can also be performed by a digital-to-digital pulse (D/P) converter that converts an external digital input to one or more digital pulses mapped to the input VMM system 32a.
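A lookup-table input conversion might look like the sketch below. It is only an illustration: the linear mapping, the 0 to 1 V span, and the name `build_input_lut` are assumptions, and a real table could be nonlinear to suit the cells' operating region. The 5-bit width matches the pixel precision of the earlier example.

```python
def build_input_lut(bits, v_min, v_max):
    """Build a table mapping each digital code to an analog level.

    A linear mapping is assumed here purely for illustration.
    """
    steps = (1 << bits) - 1
    return [v_min + (v_max - v_min) * code / steps for code in range(1 << bits)]


# Hypothetical 5-bit input mapped onto a 0 V to 1 V span.
lut = build_input_lut(5, 0.0, 1.0)
print(len(lut), lut[0], lut[31])  # 32 0.0 1.0
```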

[0054] The output generated by the input VMM system 32a is provided as input to the next VMM system (hidden level 1) 32b, which in turn generates an output that is provided as input to the next VMM system (hidden level 2) 32c, and so on. The layers of VMM systems 32 serve as different layers of synapses and neurons in a convolutional neural network (CNN). Each VMM system 32a, 32b, 32c, 32d, and 32e can be an independent physical system comprising a corresponding non-volatile memory array, or multiple VMM systems can utilize different portions of the same physical non-volatile memory array, or multiple VMM systems can utilize overlapping portions of the same physical non-volatile memory array. Each VMM system 32a, 32b, 32c, 32d, and 32e can also be time-division multiplexed for different portions of its array or neurons. The example shown in Figure 11 contains five layers (32a, 32b, 32c, 32d, 32e): one input layer (32a), two hidden layers (32b, 32c), and two fully connected layers (32d, 32e). Those skilled in the art will recognize that this is merely exemplary, and that a system may instead include more than two hidden layers and more than two fully connected layers.

[0055] VMM array

[0056] Figure 12 illustrates a neuron VMM array 1200, which is particularly suited for memory cells 310 as shown in Figure 3 and is used as the synapses and parts of neurons between an input layer and the next layer. The VMM array 1200 includes a memory array 1201 of non-volatile memory cells and a reference array 1202 of non-volatile reference memory cells (at the top of the array). Alternatively, another reference array may be placed at the bottom.

[0057] In VMM array 1200, control gate lines (such as control gate line 1203) extend vertically (hence reference array 1202, in the row direction, is orthogonal to control gate line 1203), and erase gate lines (such as erase gate line 1204) extend horizontally. Here, the inputs to VMM array 1200 are provided on the control gate lines (CG0, CG1, CG2, CG3), and the outputs of VMM array 1200 appear on the source lines (SL0, SL1). In one embodiment, only even rows are used, and in another embodiment, only odd rows are used. The current placed on each source line (SL0, SL1, respectively) performs a summation function of all the currents from the memory cells connected to that particular source line.

[0058] As described herein with respect to neural networks, the non-volatile memory cells of the VMM array 1200 (i.e., the flash memory of the VMM array 1200) are preferably configured to operate in the subthreshold region.

[0059] The non-volatile reference memory cells and the non-volatile memory cells described herein are biased in weak inversion:

[0060] $I_{ds} = I_o \cdot e^{(V_g - V_{th})/(n V_t)} = w \cdot I_o \cdot e^{V_g/(n V_t)}$,

[0061] where $w = e^{-V_{th}/(n V_t)}$.

[0062] Here, $I_{ds}$ (Ids) is the drain-to-source current; $V_g$ (Vg) is the gate voltage on the memory cell; $V_{th}$ (Vth) is the threshold voltage of the memory cell; $V_t$ (Vt) is the thermal voltage, $V_t = k \cdot T / q$, where k is the Boltzmann constant, T is the temperature in Kelvin, and q is the electron charge; n is the slope factor, $n = 1 + (C_{dep}/C_{ox})$, where $C_{dep}$ is the capacitance of the depletion layer and $C_{ox}$ is the capacitance of the gate oxide layer; $I_o$ (Io) is the memory cell current at a gate voltage equal to the threshold voltage, and $I_o$ is proportional to $(W_t/L) \cdot u \cdot C_{ox} \cdot (n-1) \cdot V_t^2$, where u is the carrier mobility, and $W_t$ and L are the width and length of the memory cell, respectively.

[0063] For an I-to-V logarithmic converter that uses a memory cell (such as a reference memory cell or a peripheral memory cell) or a transistor to convert an input current Ids into an input voltage Vg:

[0064] $V_g = n \, V_t \, \log[I_{ds} / (w_p \, I_o)]$

[0065] Here, $w_p$ is the w of the reference memory cell or the peripheral memory cell, and log denotes the natural logarithm.

[0069] For a memory array used as a vector-matrix multiplier (VMM) array, the output current is:

[0070] $I_{out} = w_a \, I_o \, e^{V_g/(n V_t)}$, namely

[0071] $I_{out} = (w_a / w_p) \, I_{in} = W \, I_{in}$

[0072] $W = e^{(V_{thp} - V_{tha})/(n V_t)}$

[0073] $I_{in} = w_p \, I_o \, e^{V_g/(n V_t)}$

[0074] Here, $w_a$ is the w of each memory cell in the memory array.
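
The relations above imply that a logarithmic input converter followed by an exponential array cell yields a linear multiplication of the input current. The following Python sketch checks this numerically; it is illustrative only, and the function names and numeric values are assumptions rather than part of the disclosure. The product n*Vt is passed as a single parameter `n_vt`.

```python
import math


def log_converter_vg(i_in, vth_p, io, n_vt):
    """Vg = n*Vt * ln(Iin / (wp * Io)), with wp = exp(-Vthp / (n*Vt))."""
    w_p = math.exp(-vth_p / n_vt)
    return n_vt * math.log(i_in / (w_p * io))


def array_cell_current(vg, vth_a, io, n_vt):
    """Iout = wa * Io * exp(Vg / (n*Vt)), with wa = exp(-Vtha / (n*Vt))."""
    w_a = math.exp(-vth_a / n_vt)
    return w_a * io * math.exp(vg / n_vt)


def effective_weight(vth_p, vth_a, n_vt):
    """W = exp((Vthp - Vtha) / (n*Vt))."""
    return math.exp((vth_p - vth_a) / n_vt)


# Hypothetical values: 1 nA input, Vthp = 0.70 V, Vtha = 0.75 V, n*Vt = 39 mV.
vg = log_converter_vg(i_in=1e-9, vth_p=0.70, io=1e-7, n_vt=0.039)
i_out = array_cell_current(vg, vth_a=0.75, io=1e-7, n_vt=0.039)
```

In this sketch, `i_out` agrees with `effective_weight(0.70, 0.75, 0.039) * 1e-9` to within floating-point error, so the stored weight depends only on the difference between the two threshold voltages.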

[0075] Word lines or control gates can be used as inputs to memory cells that accept input voltages.

[0076] Alternatively, the non-volatile memory cells of the VMM array described herein can be configured to operate in a linear region:

[0077] $I_{ds} = \beta \, (V_{gs} - V_{th}) \, V_{ds}$; $\beta = u \, C_{ox} \, W_t / L$,

[0078] $W \propto (V_{gs} - V_{th})$,

[0079] meaning that the weight W in the linear region is proportional to (Vgs-Vth).

[0080] Word lines, control gates, bit lines, or source lines can be used as inputs to memory cells operating in a linear region. Bit lines or source lines can be used as outputs to memory cells.

[0081] For an I-to-V linear converter, a memory cell (such as a reference memory cell or a peripheral memory cell) or a transistor or resistor operating in the linear region can be used to linearly convert the input / output current into the input / output voltage.

[0082] Alternatively, the memory cells of the VMM array described herein can be configured to operate in a saturation region:

[0083] $I_{ds} = \tfrac{1}{2} \, \beta \, (V_{gs} - V_{th})^2$; $\beta = u \, C_{ox} \, W_t / L$

[0084] $W \propto (V_{gs} - V_{th})^2$, meaning that the weight W is proportional to $(V_{gs} - V_{th})^2$.

[0085] Word lines, control gates, or erase gates can be used as inputs to memory cells operating in saturation regions. Bit lines or source lines can be used as outputs of output neurons.
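
For comparison, the linear-region and saturation-region expressions given above can be written as a short Python sketch. This is an illustration under simplified long-channel assumptions; the function names and the example values are not taken from the disclosure.

```python
def beta(mobility, cox, width, length):
    """beta = u * Cox * Wt / L."""
    return mobility * cox * width / length


def ids_linear(b, vgs, vth, vds):
    """Linear region: Ids = beta * (Vgs - Vth) * Vds."""
    return b * (vgs - vth) * vds


def ids_saturation(b, vgs, vth):
    """Saturation region: Ids = 1/2 * beta * (Vgs - Vth)^2."""
    return 0.5 * b * (vgs - vth) ** 2


# Hypothetical beta of 1e-4 A/V^2; doubling the overdrive (Vgs - Vth)
# doubles the linear-region current and quadruples the saturation current.
lin_ratio = ids_linear(1e-4, 1.2, 0.8, 0.1) / ids_linear(1e-4, 1.0, 0.8, 0.1)
sat_ratio = ids_saturation(1e-4, 1.2, 0.8) / ids_saturation(1e-4, 1.0, 0.8)
```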

[0086] Alternatively, the memory cells of the VMM arrays described herein can be used in any of the regions or a combination thereof (subthreshold, linear, or saturation) for each layer or for multiple layers of a neural network.

[0087] Figure 13 illustrates neuron VMM array 1300, which is particularly suited for memory cells 210 as shown in Figure 2 and is utilized as the synapses between an input layer and the next layer. VMM array 1300 includes memory array 1303 of non-volatile memory cells, reference array 1301 of first non-volatile reference memory cells, and reference array 1302 of second non-volatile reference memory cells. Reference arrays 1301 and 1302, arranged in the column direction of the array, serve to convert current inputs flowing into terminals BLR0, BLR1, BLR2, and BLR3 into voltage inputs WL0, WL1, WL2, and WL3. In effect, the first and second non-volatile reference memory cells are diode-connected through multiplexors 1314 (only partially shown), with the current inputs flowing into them. The reference cells are tuned (e.g., programmed) to target reference levels. The target reference levels are provided by a reference mini-array matrix (not shown).

[0088] Memory array 1303 serves two purposes. First, it stores the weights used by VMM array 1300 on their respective memory cells. Second, memory array 1303 efficiently multiplies the inputs (i.e., the current inputs provided in terminals BLR0, BLR1, BLR2, and BLR3, which reference arrays 1301 and 1302 convert into input voltages to provide to word lines WL0, WL1, WL2, and WL3) by the weights stored in memory array 1303, and then sums all the results (memory cell currents) to produce an output on the corresponding bit lines (BL0-BLN), which will be the input to the next layer or the final layer. By performing multiplication and addition functions, memory array 1303 eliminates the need for separate multiplication and addition logic circuits and is also highly efficient. Here, voltage inputs are provided on word lines (WL0, WL1, WL2, and WL3), and the output appears on the corresponding bit lines (BL0-BLN) during read (inference) operations. The current placed on each bit line in the bit lines BL0-BLN performs a summation function of the currents from all non-volatile memory cells connected to that particular bit line.
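
The multiply-and-sum behavior described above corresponds to a vector-matrix multiplication in which each bit line accumulates the contributions of the cells in its column. The following Python sketch models that behavior with plain numbers; it is illustrative only, and the function name and the idealized cell model (cell current equals input times weight) are assumptions.

```python
def vmm_bitline_currents(inputs, weights):
    """Return one summed current per bit line (column).

    inputs[r] is the input applied on word line r; weights[r][c] is the
    weight stored in the cell at word line r, bit line c. Each cell is
    idealized as contributing inputs[r] * weights[r][c] to bit line c.
    """
    n_cols = len(weights[0])
    return [
        sum(inputs[r] * weights[r][c] for r in range(len(inputs)))
        for c in range(n_cols)
    ]


# Two word lines, two bit lines (hypothetical values).
outputs = vmm_bitline_currents([1.0, 2.0], [[0.5, 1.0], [0.25, 0.0]])
```

Here each entry of `outputs` plays the role of the current on one of the bit lines BL0-BLN, which would then feed the next layer.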

[0089] Table 6 shows the operating voltages used for the VMM array 1300. The columns in the table indicate the voltage applied to the word line for the selected cell, the word line for the unselected cell, the bit line for the selected cell, the bit line for the unselected cell, the source line for the selected cell, and the source line for the unselected cell, where FLT indicates floating, i.e., no voltage applied. The rows indicate read, erase, and program operations.

[0090] Table 6: Operation of VMM array 1300 of Figure 13

[0091]

[0092] Figure 14 illustrates neuron VMM array 1400, which is particularly suited for memory cells 210 as shown in Figure 2 and is utilized as the synapses and parts of neurons between an input layer and the next layer. VMM array 1400 includes memory array 1403 of non-volatile memory cells, reference array 1401 of first non-volatile reference memory cells, and reference array 1402 of second non-volatile reference memory cells. Reference arrays 1401 and 1402 run in the row direction of VMM array 1400. VMM array 1400 is similar to VMM array 1300, except that in VMM array 1400 the word lines run in the vertical direction. Here, the inputs are provided on the word lines (WLA0, WLB0, WLA1, WLB1, WLA2, WLB2, WLA3, WLB3), and the outputs emerge on the source lines (SL0, SL1) during a read operation. The current placed on each source line performs a summing function of all the currents from the memory cells connected to that particular source line.

[0093] Table 7 shows the operating voltages used for the VMM array 1400. The columns in the table indicate the voltage applied to the word line for the selected cell, the word line for the unselected cell, the bit line for the selected cell, the bit line for the unselected cell, the source line for the selected cell, and the source line for the unselected cell. The rows indicate read, erase, and program operations.

[0094] Table 7: Operation of VMM array 1400 of Figure 14

[0095]

[0096] Figure 15 illustrates neuron VMM array 1500, which is particularly suited for memory cells 310 as shown in Figure 3 and is utilized as the synapses and parts of neurons between an input layer and the next layer. VMM array 1500 includes memory array 1503 of non-volatile memory cells, reference array 1501 of first non-volatile reference memory cells, and reference array 1502 of second non-volatile reference memory cells. Reference arrays 1501 and 1502 serve to convert current inputs flowing into terminals BLR0, BLR1, BLR2, and BLR3 into voltage inputs CG0, CG1, CG2, and CG3. In effect, the first and second non-volatile reference memory cells are diode-connected through multiplexors 1512 (only partially shown), with the current inputs flowing into them through BLR0, BLR1, BLR2, and BLR3. Multiplexors 1512 each include a respective multiplexor 1505 and a cascoding transistor 1504 to ensure a constant voltage on the bit line (such as BLR0) of each of the first and second non-volatile reference memory cells during a read operation. The reference cells are tuned to target reference levels.

[0097] Memory array 1503 serves two purposes. First, it stores weights that will be used by VMM array 1500. Second, memory array 1503 efficiently multiplies the inputs (current inputs provided to terminals BLR0, BLR1, BLR2, and BLR3, which reference arrays 1501 and 1502 convert into input voltages to be provided to control gates CG0, CG1, CG2, and CG3) by the weights stored in the memory array, and then sums all the results (cell currents) to produce an output that appears on BL0-BLN and will be the input to the next layer or the final layer. By performing multiplication and addition functions, the memory array eliminates the need for separate multiplication and addition logic circuits and is also highly efficient. Here, the inputs are provided on control gate lines (CG0, CG1, CG2, and CG3), and the outputs appear on bit lines (BL0-BLN) during read operations. The currents placed on each bit line perform a summation function of all currents from the memory cells connected to that particular bit line.

[0098] The VMM array 1500 implements unidirectional tuning for the non-volatile memory cells in the memory array 1503. That is, each non-volatile memory cell is erased and then partially programmed until the desired charge is reached on the floating gate. This can be performed, for example, using the precise programming techniques described below. If too much charge is placed on the floating gate (causing an incorrect value to be stored in the cell), the cell must be erased, and the sequence of partial programming operations must restart. As shown, two rows sharing the same erase gate (such as EG0 or EG1) need to be erased together (this is called page erasure), and thereafter, each cell is partially programmed until the desired charge is reached on the floating gate.
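
The unidirectional tuning sequence described above (erase, then partially program until the target is reached, and erase and restart on overshoot) can be sketched as a simple loop. The Python model below is a toy illustration, not the precise programming technique of the disclosure: the `SimulatedCell` class, its numeric charge, and the choice to halve the pulse size after an overshoot are all assumptions made so the example terminates.

```python
class SimulatedCell:
    """Toy floating-gate cell: charge is a plain number (hypothetical model)."""

    def __init__(self, step):
        self.charge = 0.0
        self.step = step  # charge added by one partial-program pulse

    def erase(self):
        self.charge = 0.0

    def program_pulse(self):
        self.charge += self.step


def tune_unidirectional(cell, target, tolerance, max_pulses=10_000):
    """Erase, then partially program until within tolerance of target.

    On overshoot the cell is fully erased and the sequence restarts,
    here with a smaller pulse (an assumption of this sketch).
    Returns True if the target was reached within max_pulses iterations.
    """
    cell.erase()
    for _ in range(max_pulses):
        if abs(cell.charge - target) <= tolerance:
            return True
        if cell.charge > target:
            cell.erase()        # too much charge: erase and start over
            cell.step /= 2.0
        else:
            cell.program_pulse()
    return False


cell = SimulatedCell(step=0.3)
reached = tune_unidirectional(cell, target=1.0, tolerance=0.06)
```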

[0099] Table 8 shows the operating voltages used for the VMM array 1500. The columns in the table indicate the voltage applied to the word lines for the selected cell, the word lines for the unselected cell, the bit lines for the selected cell, the bit lines for the unselected cell, the control gate for the selected cell, the control gate for the unselected cell in the same sector as the selected cell, the control gate for the unselected cell in a different sector from the selected cell, the erase gate for the selected cell, the erase gate for the unselected cell, the source line for the selected cell, and the source line for the unselected cell. The rows indicate read, erase, and program operations.

[0100] Table 8: Operation of VMM array 1500 of Figure 15

[0101]

[0102] Figure 16 illustrates neuron VMM array 1600, which is particularly suited for memory cells 310 as shown in Figure 3 and is utilized as the synapses and parts of neurons between an input layer and the next layer. VMM array 1600 includes memory array 1603 of non-volatile memory cells, reference array 1601 of first non-volatile reference memory cells, and reference array 1602 of second non-volatile reference memory cells. EG lines EGR0, EG0, EG1, and EGR1 run vertically, while CG lines CG0, CG1, CG2, and CG3 and SL lines WL0, WL1, WL2, and WL3 run horizontally. Unlike VMM array 1500, which implements unidirectional tuning, VMM array 1600 implements bidirectional tuning: because separate EG lines are used, each individual cell can be completely erased, partially programmed, and partially erased as needed to reach the desired amount of charge on the floating gate. As shown, reference arrays 1601 and 1602 convert the input currents in terminals BLR0, BLR1, BLR2, and BLR3 into control gate voltages CG0, CG1, CG2, and CG3 (through the action of reference cells that are diode-connected through multiplexors 1614) to be applied to the memory cells in the row direction. The current outputs (neurons) are in bit lines BL0-BLN, where each bit line sums all currents from the non-volatile memory cells connected to that particular bit line.
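
Bidirectional tuning, in which a cell can be both partially programmed and partially erased, can be sketched as a loop that steps the stored charge toward the target from either side. The Python code below is a toy model only; the function name, the numeric charge, and the fixed step sizes are assumptions, not parameters from the disclosure.

```python
def tune_bidirectional(charge, target, tolerance,
                       program_step, erase_step, max_pulses=1000):
    """Step charge toward target using partial program and partial erase.

    A partial-program pulse adds program_step; a partial-erase pulse
    removes erase_step. Returns the final charge, or raises if the target
    is not reached within max_pulses (for example, if both steps are too
    coarse for the requested tolerance).
    """
    for _ in range(max_pulses):
        error = target - charge
        if abs(error) <= tolerance:
            return charge
        if error > 0:
            charge += program_step   # too little charge: partial program
        else:
            charge -= erase_step     # too much charge: partial erase
    raise RuntimeError("target not reached within max_pulses")


final_charge = tune_bidirectional(
    charge=0.0, target=0.55, tolerance=0.02,
    program_step=0.1, erase_step=0.03,
)
```

In contrast to the erase-and-restart flow of unidirectional tuning, an overshoot here is corrected with small erase pulses, so no full erase is needed.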

[0103] Table 9 shows the operating voltages used for the VMM array 1600. The columns in the table indicate the voltage applied to the word lines for the selected cell, the word lines for the unselected cell, the bit lines for the selected cell, the bit lines for the unselected cell, the control gate for the selected cell, the control gate for the unselected cell in the same sector as the selected cell, the control gate for the unselected cell in a different sector from the selected cell, the erase gate for the selected cell, the erase gate for the unselected cell, the source line for the selected cell, and the source line for the unselected cell. The rows indicate read, erase, and program operations.

[0104] Table 9: Operation of VMM array 1600 of Figure 16

[0105]

[0106] The inputs of a VMM array can be analog levels, binary levels, timing pulses, or digital bits, and the outputs can be analog levels, binary levels, timing pulses, or digital bits (in which case an output ADC is needed to convert the output analog level current or voltage into digital bits).

[0107] For each memory cell in a VMM array, each weight w can be implemented by a single memory cell, by a differential cell, or by blended memory cells (the average of two or more cells). In the differential cell case, two memory cells are needed to implement a weight w as a differential weight (w = w+ - w-). In the case of two blended memory cells, two memory cells are needed to implement a weight w as the average of the two cells.
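
The two multi-cell weight schemes can be expressed in a few lines. This Python sketch is illustrative only, and the function names are assumptions.

```python
def differential_weight(w_plus, w_minus):
    """Differential cell: w = w+ - w-, which allows negative weights."""
    return w_plus - w_minus


def blended_weight(cell_weights):
    """Blended cells: w is the average of two or more cell weights."""
    if len(cell_weights) < 2:
        raise ValueError("a blended weight needs at least two cells")
    return sum(cell_weights) / len(cell_weights)


w_diff = differential_weight(0.25, 0.75)   # a negative weight from two cells
w_blend = blended_weight([0.25, 0.75])     # the average of two cells
```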

[0108] One drawback of existing arrays of non-volatile memory cells is that a relatively large amount of time is required to pull the source line to ground to perform a read or erase operation.

[0109] An improved VMM system is needed that can pull the source line to ground more accurately than existing systems. An adaptive bias circuit is also needed that can change the bias applied to the input and / or output lines of the VMM system in cases where the source line is pulled down to a voltage greater than 0V (i.e., when the "ground" line is not actually at 0V).

Summary of the Invention

[0110] Numerous embodiments of analog neural memory arrays are disclosed. Some embodiments include improved mechanisms for accurately pulling the source lines low to ground. This is useful, for example, to minimize voltage drops during read, program, or erase operations. Other embodiments include specific implementations for negative and positive inputs with negative and positive weights. Still other embodiments include adaptive biasing circuitry capable of altering the bias of the input or output lines of the VMM system in those cases where the source lines are pulled down to voltages greater than 0V (i.e., where the "ground" line is not actually at 0V).

Brief Description of the Drawings

[0111] Figure 1 illustrates a prior art artificial neural network.

[0112] Figure 2 illustrates a prior art split-gate flash memory cell.

[0113] Figure 3 illustrates another prior art split-gate flash memory cell.

[0114] Figure 4 illustrates another prior art split-gate flash memory cell.

[0115] Figure 5 illustrates another prior art split-gate flash memory cell.

[0116] Figure 6 illustrates another prior art split-gate flash memory cell.

[0117] Figure 7 illustrates a prior art stacked-gate flash memory cell.

[0118] Figure 8 illustrates a dual split-gate memory cell.

[0119] Figure 9 illustrates the different layers of an exemplary artificial neural network that uses one or more VMM arrays.

[0120] Figure 10 illustrates a VMM system that includes a VMM array and other circuitry.

[0121] Figure 11 illustrates an exemplary artificial neural network that uses one or more VMM systems.

[0122] Figure 12 illustrates an embodiment of a VMM array.

[0123] Figure 13 illustrates another embodiment of a VMM array.

[0124] Figure 14 illustrates another embodiment of a VMM array.

[0125] Figure 15 illustrates another embodiment of a VMM array.

[0126] Figure 16 illustrates another embodiment of a VMM array.

[0127] Figure 17 illustrates a VMM system.

[0128] Figures 18A, 18B, and 18C illustrate a prior art VMM array.

[0129] Figures 19A, 19B, and 19C illustrate an improved VMM array.

[0130] Figure 20 illustrates another improved VMM array.

[0131] Figure 21 illustrates a VMM system with an improved source line pull-down mechanism.

[0132] Figure 22 illustrates another VMM system with an improved source line pull-down mechanism.

[0133] Figure 23 illustrates another VMM system with an improved source line pull-down mechanism.

[0134] Figure 24 illustrates another VMM system with an improved source line pull-down mechanism.

[0135] Figure 25 illustrates an exemplary layout of a VMM system with an improved source line pull-down mechanism.

[0136] Figure 26 illustrates another exemplary layout of a VMM system with an improved source line pull-down mechanism.

[0137] Figures 27A, 27B, and 27C illustrate other improved VMM arrays.

[0138] Figure 28 illustrates another improved VMM array, which includes a redundant array.

[0139] Figure 29 illustrates another improved VMM system, which includes two VMM arrays and shared dummy bit line switching circuitry.

[0140] Figure 30 illustrates another improved VMM system.

[0141] Figure 31 illustrates an embodiment of a summer circuit.

[0142] Figure 32 illustrates another embodiment of a summer circuit.

[0143] Figures 33A and 33B illustrate other embodiments of a summer circuit.

[0144] Figures 34A, 34B, and 34C illustrate embodiments of an output circuit.

[0145] Figure 35 illustrates a neuron output circuit.

[0146] Figure 36 illustrates an embodiment of an analog-to-digital converter.

[0147] Figure 37 illustrates another embodiment of an analog-to-digital converter.

[0148] Figure 38 illustrates another embodiment of an analog-to-digital converter.

[0149] Figure 39 illustrates another embodiment of an analog-to-digital converter.

[0150] Figure 40 illustrates the desired voltage difference between a control gate line and a source line in a VMM array.

[0151] Figure 41 illustrates the desired voltage difference between a word line and a source line in a VMM array.

[0152] Figure 42 illustrates the desired voltage difference between a bit line and a source line in a VMM array.

[0153] Figure 43 illustrates typical voltage changes at the terminals of a VMM array in response to temperature variations.

[0154] Figures 44A and 44B illustrate an adaptive bias circuit.

[0155] Figures 45A and 45B illustrate a regulator circuit.

[0156] Figure 46A illustrates a digital-to-analog converter.

[0157] Figure 46B illustrates an analog-to-digital converter.

[0158] Figure 47 illustrates a decoder.

[0159] Figure 48 illustrates a current-to-voltage summer.

[0160] Figure 49 illustrates another current-to-voltage summer.

[0161] Figure 50 illustrates another current-to-voltage summer.

[0162] Figure 51 illustrates another current-to-voltage summer.

[0163] Figure 52 illustrates another current-to-voltage summer.

[0164] Figure 53 illustrates high-voltage decoding circuitry for an erase gate decoder, a control gate decoder, and a source line decoder.

Detailed Description

[0165] The artificial neural network of this invention utilizes a combination of CMOS technology and non-volatile memory arrays.

[0166] Embodiments of Improved VMM Systems

[0167] Figure 17 shows a block diagram of VMM system 1700. VMM system 1700 includes VMM array 1701, row decoder 1702, high-voltage decoder 1703, column decoder 1704, bit line driver 1705, input circuit 1706, output circuit 1707, control logic 1708, and bias generator 1709. VMM system 1700 further includes high-voltage generation block 1710, which includes charge pump 1711, charge pump regulator 1712, and high-voltage level generator 1713. VMM system 1700 also includes algorithm controller 1714, analog circuitry 1715, control logic 1716, and test control logic 1717. The systems and methods described below can be implemented in VMM system 1700.

[0168] Input circuit 1706 may include circuitry such as a DAC (digital-to-analog converter), DPC (digital-to-pulse converter), AAC (analog-to-analog converter, such as a current-to-voltage converter), PAC (pulse-to-analog level converter), or any other type of converter. Input circuit 1706 may implement normalization, scaling functions, or arithmetic functions. Input circuit 1706 may implement a temperature compensation function for the input. Input circuit 1706 may implement activation functions such as ReLU or sigmoid functions. Output circuit 1707 may include circuitry such as an ADC (analog-to-digital converter for converting the neuron's analog output to digital bits), AAC (analog-to-analog converter, such as a current-to-voltage converter), APC (analog-to-pulse converter), or any other type of converter. Output circuit 1707 may implement activation functions such as ReLU or sigmoid functions. Output circuit 1707 may implement statistical normalization, regularization, up / down scaling functions, statistical rounding, or arithmetic functions (e.g., addition, subtraction, division, multiplication, shifting, logarithms) on the neuron's output. The output circuit 1707 can implement a temperature compensation function on the neuron output or array output (such as bitline output) to keep the array power consumption approximately constant or to improve the accuracy of the array (neuron) output, such as by keeping the IV slope approximately the same.
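
For reference, the two activation functions named above have the following standard mathematical definitions. The Python sketch below shows only the functions themselves, computed digitally; it does not represent how input circuit 1706 or output circuit 1707 would realize them in hardware.

```python
import math


def relu(x):
    """Rectified linear unit: max(0, x)."""
    return x if x > 0.0 else 0.0


def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)
```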

[0169] Figure 18A illustrates prior art VMM system 1800. VMM system 1800 includes exemplary cells 1801 and 1802, exemplary bit line switch 1803 (which connects a bit line to sensing circuitry), exemplary dummy bit line switch 1804 (which is coupled to a low level, such as ground, during a read), and exemplary dummy cells 1805 and 1806 (source line pull-down cells). Bit line switch 1803 is coupled to a column of cells, including cells 1801 and 1802, that are used to store data in VMM system 1800. Dummy bit line switch 1804 is coupled to a column (bit line) of cells that are dummy cells not used to store data in VMM system 1800. This dummy bit line (also referred to as a source line pull-down bit line) is used as a source line pull-down during a read, meaning that it is used to pull the source line SL to a low level (such as ground) through the memory cells in the dummy bit line.

[0170] One drawback of VMM system 1800 is that the input impedance seen by each cell varies because of the length of the electrical path through the associated bit line switch, the cell itself, and the associated dummy bit line switch. For example, Figure 18B shows the electrical path through bit line switch 1803, cell 1801, dummy cell 1805, and dummy bit line switch 1804. Similarly, Figure 18C shows the electrical path through bit line switch 1803, vertical metal bit line 1807, cell 1802, dummy cell 1806, vertical metal bit line 1808, and dummy bit line switch 1804. As can be seen, the path through cell 1802 traverses a much longer bit line and dummy bit line, which is associated with higher capacitance and higher resistance. As a result, cell 1802 has a greater parasitic impedance in its bit line or source line than cell 1801. This variability is a drawback because, for example, it causes the precision of the cell output, as applied when reading or verifying a cell (for program / erase tuning cycles), to vary with the position of the cell in the array.

[0171] Figure 19A illustrates improved VMM system 1900. VMM system 1900 includes exemplary cells 1901 and 1902, exemplary bit line switch 1903 (which connects a bit line to sensing circuitry), exemplary dummy cells 1905 and 1906 (source line pull-down cells), and exemplary dummy bit line switch 1904 (which is coupled to a low level, such as ground, during a read; this switch connects to a dummy bit line, which connects to the dummy cells used as source line pull-downs). As can be seen, exemplary dummy bit line switch 1904 and the other dummy bit line switches are located at the opposite end of the array from bit line switch 1903 and the other bit line switches.

[0172] The benefits of this design can be seen in Figures 19B and 19C. Figure 19B shows the electrical path through bit line switch 1903, cell 1901, dummy cell 1905 (source line pull-down cell), vertical metal bit line 1908, and dummy bit line switch 1904 (which is coupled to a low level, such as ground, during a read). Figure 19C shows the electrical path through bit line switch 1903, vertical metal line 1907, cell 1902, dummy cell 1906 (source line pull-down cell), and dummy bit line switch 1904. These paths are substantially the same (cells, interconnect length), and this holds for all cells in VMM system 1900. Therefore, the bit line impedance plus the source line impedance is substantially the same for each cell, meaning that the parasitic voltage drop incurred during a read or verify operation is relatively uniform across the cells in the array.
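
The effect of placing the two sets of switches at opposite ends of the array can be illustrated with a simple resistance model. The Python sketch below assumes, purely for illustration, that each row-to-row segment of a vertical metal line contributes the same resistance and that cell and switch resistances are identical for every path; neither assumption comes from the disclosure, and the function names are hypothetical.

```python
def same_end_path_resistance(row, n_rows, r_segment):
    """Switches at the same end (as in Figures 18A-18C).

    The current travels `row` segments down the bit line and `row`
    segments back along the dummy bit line. n_rows is unused but kept
    so both functions share a signature.
    """
    return 2 * row * r_segment


def opposite_end_path_resistance(row, n_rows, r_segment):
    """Switches at opposite ends (as in Figures 19A-19C).

    The current travels `row` segments on the bit line and the remaining
    `n_rows - row` segments on the dummy bit line.
    """
    return row * r_segment + (n_rows - row) * r_segment


N_ROWS = 8
same_end = [same_end_path_resistance(r, N_ROWS, 1) for r in range(N_ROWS + 1)]
opposite = [opposite_end_path_resistance(r, N_ROWS, 1) for r in range(N_ROWS + 1)]
```

Under these assumptions the same-end wiring resistance ranges from 0 to 2 * N_ROWS segments depending on the row, while the opposite-end wiring resistance is N_ROWS segments for every row.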

[0173] Figure 20 illustrates VMM system 2000 with global source line pull-down lines. VMM system 2000 is similar to VMM system 1900, except that: dummy bit lines 2005a-2005n or 2007a-2007n are connected together (to act as global source line pull-down lines that pull the memory cell source lines to ground during read or verify operations); dummy bit line switches, such as dummy bit line switches 2001 and 2002, are connected or coupled to a common ground, denoted ARYGND; and the source lines are coupled together to source line switch 2003, which selectively pulls the source lines to ground. These changes further reduce the cell-to-cell variation in (array) parasitic impedance during read or verify operations. The source lines are connected together as SLARY 2888.

[0174] Figure 21 illustrates VMM system 2100. VMM system 2100 includes bit line switch 2101, pull-down bit line switch 2102, pull-down bit line switch 2103, bit line switch 2104, data cell 2105 (here, a "data cell" is a memory cell used to store a weight value of a neural network), pull-down cell 2106, pull-down cell 2107, and data cell 2108. Note that pull-down cells 2106 and 2107 are adjacent to each other. This allows the vertical metal lines BLpdx of the two pull-down cells 2106 and 2107 to be connected together (line 2111), and the resulting wider metal line reduces parasitic resistance. During a read or verify operation (for program / erase tuning cycles) of data cell 2105, current flows through bit line switch 2101 into the bit line terminal of cell 2105 and out to the source line terminal of cell 2105. At the source line terminal, the current then flows into source line 2110, where it flows into the source line terminals of pull-down cells 2106 and 2107 and through pull-down bit line switches 2102 and 2103. During a read or verify operation (for program / erase tuning cycles) of data cell 2108, current flows through bit line switch 2104 into the bit line terminal of data cell 2108 and out to the source line terminal of cell 2108. At the source line terminal, the current then flows into source line 2110, where it flows into the source line terminals of pull-down cells 2106 and 2107 and through pull-down bit line switches 2102 and 2103. This column pattern repeats throughout the array, with every four columns containing two columns of data cells and two adjacent array columns used for the pull-down operation. In another embodiment, the diffusions of the two pull-down cells in two adjacent columns can be merged into one larger diffusion to increase pull-down capability. In another embodiment, the diffusion of a pull-down cell can be made larger than the diffusion of a data cell to increase pull-down capability. In yet another embodiment, each pull-down cell has a bias condition different from the bias condition of the selected data cell.
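
The repeating four-column pattern can be sketched as a mapping from column index to role. The Python code below is illustrative only; the function name is hypothetical, and the ordering within each group of four (data, pull-down, pull-down, data) is an assumption inferred from the order in which elements 2105, 2106, 2107, and 2108 are described.

```python
def column_role(col):
    """Return the role of an array column in the repeating 4-column pattern.

    Assumed ordering per group of four columns:
    data, pull-down, pull-down, data.
    """
    return "data" if col % 4 in (0, 3) else "pull-down"


roles = [column_role(c) for c in range(8)]
```

With this ordering, the two pull-down columns in each group are always adjacent, which is what allows their vertical metal lines to be joined into one wider line.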

[0175] In one embodiment, the pull-down cell has the same physical structure as a regular data memory cell. In another embodiment, the pull-down cell has a different physical structure than a regular data memory cell; for example, the pull-down cell may be a modified version of a regular data memory cell, such as one with one or more modified physical dimensions (width, length, etc.) or electrical parameters (layer thickness, implant, etc.). In yet another embodiment, the pull-down cell is a regular transistor (without a floating gate), such as an I / O or high-voltage transistor.

[0176] Figure 22 illustrates VMM system 2200. VMM system 2200 includes bit line 2201, pull-down bit line 2202, data cells 2203 and 2206, pull-down cells 2204 and 2205, and source line 2210. During a read or verify operation of cell 2203, current flows through bit line switch 2201 into the bit line terminal of cell 2203 and out to the source line terminal of cell 2203. At this source line terminal, the current then flows into source line 2210 and into the source line terminal of pull-down cell 2204, and through pull-down bit line BLpd 2202. This design is repeated for each column, and the end result is that the row containing pull-down cell 2204 is a row of pull-down cells.

[0177] During a read or verify operation (for program / erase tuning cycles) of cell 2206, current flows through bit line switch 2201 into the bit line terminal of cell 2206 and out to the source line terminal of cell 2206. At this source line terminal, the current then flows into source line 2211 and into the source line terminal of pull-down cell 2205, and through pull-down bit line 2202. This design is repeated for each column, and the end result is that the row containing pull-down cell 2205 is a row of pull-down cells. As shown in Figure 22, there are four rows: the two adjacent middle rows are used for pull-down cells, and the top and bottom rows are data cells.

[0178] Table 10 shows the operating voltages for VMM system 2200. The columns in the table indicate the voltages placed on the bit line, the pull-down bit line, the word line, the control gate, the word line (WLS) for the pull-down cells, the control gate (CGS) for the pull-down cells, the erase gate (for all cells), and the source line (for all cells). The rows indicate read, erase, and program operations. Note that during read operations the bias voltages on CGS and WLS are higher than the normal WL and CG biases, to enhance the drive capability of the pull-down cells. During programming, the bias voltages on WLS and CGS can be negative to reduce disturb.

[0179] Table 10: Operation of VMM system 2200 of Figure 22

[0180]

[0181] Figure 23 illustrates VMM system 2300. VMM system 2300 includes bit lines 2301 and 2302, data cells 2303 and 2306, and pull-down cells 2304 and 2305. During a read or verify operation (for program / erase tuning cycles) of cell 2303, current flows through bit line switch 2301 into the bit line terminal of cell 2303 and out to the source line terminal of cell 2303, where the current then flows into the source line terminal of pull-down cell 2304 and through bit line 2302 (which in this case is used as a pull-down bit line). This design is repeated for each column, and the end result is that, in the first mode, the row containing pull-down cell 2304 is a row of pull-down cells. During a read or verify operation (for program / erase tuning cycles) of data cell 2306, current flows through bit line switch 2301 into the bit line terminal of cell 2306 and out to the source line terminal of cell 2306. At the source line terminal, the current then flows into the source line terminal of pull-down cell 2305 and through bit line 2302 (which in this case is used as a pull-down bit line). This design is repeated for each column, and the end result is that, in the second mode, the row containing pull-down cell 2305 is a row of pull-down cells. As shown in Figure 23, there are four rows, with alternating odd (or even) rows used for pull-down cells and alternating even (or odd) rows used for data cells.

[0182] Notably, during the second mode, cells 2305 and 2306 are active in the read or verify operation, and cells 2303 and 2304 are used for the pull-down process, with the roles of bit lines 2301 and 2302 reversed.

[0183] Table 11 shows the operating voltages for the VMM system 2300. The columns in the table indicate the voltages set on the bit lines for the selected data cell, the bit lines for the selected pull-down cell, the word lines for the selected data cell, the control gates for the selected data cell, the word lines (WLS) for the selected pull-down cells, the control gates (CGS) for the selected pull-down cells, the erase gates for all cells, and the source lines for all cells. The rows indicate read, erase, and program operations.

[0184] Table 11: Operation of VMM system 2300 of Figure 23

[0185]

[0186] Figure 24 shows VMM system 2400. VMM system 2400 includes bit line 2401, pull-down bit line 2402, (data) cell 2403, source line 2411, and pull-down cells 2404, 2405, and 2406. During a read or verify operation of cell 2403, current flows through bit line 2401 into the bit line terminal of cell 2403 and out to the source line terminal of cell 2403. From there, the current flows into source line 2411 and then into the source line terminals of pull-down cells 2404, 2405, and 2406, from which it flows through pull-down bit line 2402. This design is repeated for each column, with the result that the rows containing pull-down cells 2404, 2405, and 2406 are each pull-down cell rows. This maximizes the pull-down applied to the source line terminal of cell 2403, as current is drawn into pull-down bit line 2402 through three cells. Note that the source lines of the four rows are connected together.

[0187] Table 12 shows the operating voltages for the VMM system 2400. The columns in the table indicate the voltages set for the bit line, bit line pull-down, word line, control gate, erase gate, word line (WLS), control gate (CGS), erase gate, and source line for all cells. The rows indicate read, erase, and program operations.

[0188] Table 12: Operation of VMM system 2400 of Figure 24

[0189]

[0190] Figure 25 shows an exemplary layout 2500 of the VMM system 2200 of Figure 22. Light-colored squares indicate metal contacts to bit lines such as bit line 2201 and to pull-down bit lines such as pull-down bit line 2202.

[0191] Figure 26 shows an alternative layout 2600 for a VMM system similar to the VMM system 2200 of Figure 22, except that pull-down bit line 2602 is very wide and spans two columns of pull-down cells. That is, the diffusion region for pull-down bit line 2602 is wider than the diffusion region for bit line 2601. Layout 2600 further shows cells 2603 and 2604 (pull-down cells), source line 2610, and bit line 2601. In another embodiment, the diffusion regions of the two pull-down cells (left and right) can be merged into a single larger diffusion region.

[0192] Figure 27A shows VMM system 2700. To implement negative and positive weights in the neural network, half of the bit lines are designated as w+ lines (connected to the bit lines of the memory cells implementing positive weights), and the other half are designated as w- lines (connected to the bit lines of the memory cells implementing negative weights) and are interleaved with the w+ lines in an alternating fashion. The negation is performed at the output (neuron output) of the w- bit lines by summing circuits (such as summing circuits 2701 and 2702). The outputs of the w+ and w- lines are combined to effectively give w = w+ - w- for each (w+, w-) cell pair of all (w+, w-) line pairs. Virtual bit lines or source line pull-down bit lines, used to avoid FG-FG coupling and/or to reduce the IR voltage drop in the source lines during reads, are not shown in the figure. The inputs of system 2700 (such as to CG or WL) can have positive or negative values. When an input is negative, the actual input to the array is still positive (such as the voltage level on CG or WL), so the array output (bit line output) is inverted before being output to achieve the equivalent function of a negative input.
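As an illustration only, the w = w+ - w- combination described for Figure 27A can be sketched numerically. The function name, the matrix layout, and the example values below are assumptions made for this sketch and do not come from the specification; cell currents are modeled simply as input times weight.

```python
def differential_column_outputs(inputs, w_plus, w_minus):
    """Return one signed neuron output per (w+, w-) bit line pair.

    inputs:  non-negative row inputs (e.g. CG or WL levels).
    w_plus:  rows x pairs matrix of non-negative w+ cell weights.
    w_minus: rows x pairs matrix of non-negative w- cell weights.
    """
    n_pairs = len(w_plus[0])
    outputs = []
    for j in range(n_pairs):
        # Each physical bit line only sums non-negative cell contributions.
        i_plus = sum(x * row[j] for x, row in zip(inputs, w_plus))
        i_minus = sum(x * row[j] for x, row in zip(inputs, w_minus))
        # The summing circuit negates the w- line, giving w = w+ - w-.
        outputs.append(i_plus - i_minus)
    return outputs


# Hypothetical example values (not taken from the specification).
inputs = [1.0, 2.0]
w_plus = [[0.5, 0.0], [0.0, 0.25]]
w_minus = [[0.0, 0.75], [0.5, 0.0]]
signed = differential_column_outputs(inputs, w_plus, w_minus)
```

In this sketch every stored weight stays non-negative, and the sign of the effective weight comes only from the subtraction at the summing stage.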

[0193] Alternatively, with reference to Figure 27B, positive weights can be implemented in a first array 2711 and negative weights can be implemented in a second array 2712 that is separate from the first array, and the resulting weights are appropriately combined by summing circuit 2713. Similarly, virtual bit lines (not shown) or source line pull-down bit lines (not shown) are used to avoid FG-FG coupling and/or to reduce the IR voltage drop in the source lines during read operations.

[0194] Alternatively, Figure 27C shows VMM system 2750, which implements negative and positive weights for a neural network with positive or negative inputs. First array 2751 implements positive inputs with both negative and positive weights, and second array 2752 implements negative inputs with both negative and positive weights. Because any input to either array has only positive values (such as analog voltage levels on CG or WL), the output of the second array is inverted before being added to the output of the first array by summer 2755.
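The two-array handling of signed inputs described for Figure 27C can be sketched as follows. This is a minimal model under stated assumptions: both arrays are assumed to hold the same signed weight matrix (each signed weight standing in for a (w+, w-) pair), and the function name and example values are hypothetical.

```python
def signed_vmm(inputs, weights):
    """Multiply signed inputs by signed weights using only non-negative array inputs."""
    pos_in = [max(x, 0.0) for x in inputs]   # magnitudes driven into the first array
    neg_in = [max(-x, 0.0) for x in inputs]  # magnitudes driven into the second array
    n_cols = len(weights[0])
    out = []
    for j in range(n_cols):
        first = sum(p * row[j] for p, row in zip(pos_in, weights))
        second = sum(n * row[j] for n, row in zip(neg_in, weights))
        # The second array's output is inverted before the summer adds both.
        out.append(first - second)
    return out


# Hypothetical example: one positive and one negative input.
result = signed_vmm([1.0, -2.0], [[0.5, -1.0], [0.25, 2.0]])
```

The result matches an ordinary signed matrix-vector product, although neither array ever receives a negative input level.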

[0195] Table 10A shows an exemplary physical array arrangement of (w+, w-) bit line pairs BL0/1 and BL2/3, where four rows are coupled to source line pull-down bit lines BLPWDN. The (BL0, BL1) bit line pair is used to implement a (w+, w-) line pair. Between the (w+, w-) line pairs there is a source line pull-down bit line (BLPWDN). This prevents coupling (e.g., FG-to-FG coupling) from an adjacent (w+, w-) line pair into the current (w+, w-) line pair. Essentially, the source line pull-down bit lines (BLPWDN) act as a physical barrier between the (w+, w-) line pairs.

[0196] For further details regarding the FG-FG coupling phenomenon and the mechanisms used to counteract it, please refer to U.S. Provisional Patent Application No. 62/981,757, filed February 26, 2020, entitled “Ultra-Precise Tuning of Analog Neural Memory Cells in a Deep Learning Artificial Neural Network,” which is incorporated herein by reference.

[0197] Table 10B shows different exemplary weight combinations. '1' means that a cell is used and has a real output value, while '0' means that a cell is not used and has no value or no significant output value.

[0198] In another implementation, a virtual bit line can be used instead of a source line pull-down bit line.

[0199] In another implementation, virtual rows can also be used as physical barriers to avoid coupling between rows.

[0200] Table 10A: Exemplary Layout

[0201]
     | BLPWDN | BL0  | BL1  | BLPWDN | BL2  | BL3  | BLPWDN
row0 |        | w01+ | w01- |        | w02+ | w02- |
row1 |        | w11+ | w11- |        | w12+ | w12- |
row2 |        | w21+ | w21- |        | w22+ | w22- |
row3 |        | w31+ | w31- |        | w32+ | w32- |

[0202] Table 10B: Exemplary Weight Combinations

[0203]
     | BLPWDN | BL0 | BL1 | BLPWDN | BL2 | BL3 | BLPWDN
row0 |        | 1   | 0   |        | 1   | 0   |
row1 |        | 0   | 1   |        | 0   | 1   |
row2 |        | 0   | 1   |        | 1   | 0   |
row3 |        | 1   | 1   |        | 1   | 1   |

[0204] Table 11A shows another array implementation of the physical arrangement of (w+, w-) line pairs BL0/1 and BL2/3, with redundant lines BL01 and BL23 and source line pull-down bit lines BLPWDN. BL01 is used for weight remapping of the BL0/1 pair, and BL23 is used for weight remapping of the BL2/3 pair.

[0205] Table 11B shows a case where the distributed weights do not require remapping; that is, there are no adjacent '1's between BL1 and BL3 that would cause coupling between adjacent bit lines. Table 11C shows a case where the distributed weights require remapping; that is, there are adjacent '1's between BL1 and BL3, which cause coupling between adjacent bit lines. The remapping is shown in Table 11D and results in no '1' values between any adjacent bit lines. Furthermore, by remapping (that is, redistributing) the actual '1' value weights among the bit lines, the total current along each bit line is reduced, resulting in more accurate values on the bit lines (output neurons). In this case, additional columns (bit lines BL01 and BL23) are needed as redundant columns.

[0206] Tables 11E and 11F illustrate another implementation, in which noisy cells (or defective cells) are remapped to redundant (spare) columns (such as BL01 and BL23 in Table 11E or BL0B and BL1B in Table 11F). A summer is used to sum the bit line outputs appropriately according to the mapping.

[0207] Table 11A: Exemplary Layout

[0208]
     | BLPWDN | BL01 | BL0  | BL1  | BL2  | BL3  | BL23 | BLPWDN
row0 |        |      | w01+ | w01- | w02+ | w02- |      |
row1 |        |      | w11+ | w11- | w12+ | w12- |      |
row2 |        |      | w21+ | w21- | w22+ | w22- |      |
row3 |        |      | w31+ | w31- | w32+ | w32- |      |

[0209] Table 11B: Exemplary Weight Combinations

[0210]
     | BLPWDN | BL01 | BL0 | BL1 | BL2 | BL3 | BL23 | BLPWDN
row0 |        |      | 1   | 0   | 1   | 0   |      |
row1 |        |      | 0   | 1   | 0   | 1   |      |
row2 |        |      | 1   | 0   | 1   | 0   |      |
row3 |        |      | 0   | 1   | 0   | 1   |      |

[0211] Table 11C: Exemplary Weight Combinations

[0212]
     | BLPWDN | BL01 | BL0 | BL1 | BL2 | BL3 | BL23 | BLPWDN
row0 |        |      | 0   | 1   | 1   | 0   |      |
row1 |        |      | 0   | 1   | 1   | 0   |      |
row2 |        |      | 0   | 1   | 1   | 0   |      |
row3 |        |      | 0   | 1   | 1   | 0   |      |

[0213] Table 11D: Weight Combinations for Remapping

[0214]
     | BLPWDN | BL01 | BL0 | BL1 | BL2 | BL3 | BL23 | BLPWDN
row0 |        | 0    | 0   | 1   | 0   | 0   | 1    |
row1 |        | 1    | 0   | 0   | 1   | 0   | 0    |
row2 |        | 0    | 0   | 1   | 0   | 0   | 1    |
row3 |        | 1    | 0   | 0   | 1   | 0   | 0    |
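The effect of the remapping from Table 11C to Table 11D can be checked with a small sketch. The row data below are transcribed from those two tables, with blank redundant-column entries taken as 0; the helper names, and the reading of "adjacent" as neighboring columns in table order with the BLPWDN columns omitted, are assumptions made for this illustration.

```python
COLUMNS = ["BL01", "BL0", "BL1", "BL2", "BL3", "BL23"]

# Transcribed from Table 11C (before remapping); blank redundant entries taken as 0.
TABLE_11C = [[0, 0, 1, 1, 0, 0] for _ in range(4)]

# Transcribed from Table 11D (after remapping).
TABLE_11D = [
    [0, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0],
]


def adjacent_ones(table):
    """List (row, left column, right column) for each pair of neighboring '1' cells."""
    hits = []
    for r, row in enumerate(table):
        for c in range(len(row) - 1):
            if row[c] == 1 and row[c + 1] == 1:
                hits.append((r, COLUMNS[c], COLUMNS[c + 1]))
    return hits


def column_totals(table):
    """Count the '1' cells on each bit line, a rough proxy for bit line current."""
    return [sum(col) for col in zip(*table)]


before = adjacent_ones(TABLE_11C)
after = adjacent_ones(TABLE_11D)
peak_before = max(column_totals(TABLE_11C))
peak_after = max(column_totals(TABLE_11D))
```

Under these assumptions, the remapped table has no neighboring '1' cells in any row, and the largest per-bit-line count of '1' cells drops from four to two, which is consistent with the reduced bit line current described above.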

[0215] Table 11E: Weight Combinations for Remapping

[0216]

[0217] Table 11F: Weight Combinations for Remapping

[0218]

[0219] Table 11G shows an implementation of the physical arrangement of the array that is applicable to Figure 27B. Since each array has either positive or negative weights, a virtual bit line is needed for each bit line to act as a source line pull-down and as a physical barrier to avoid FG-FG coupling.

[0220] Table 11G: Exemplary Layout

[0221]
     | BLPWDN | BL0    | BLPWDN | BL1    | BLPWDN
row0 |        | w01+/- |        | w02+/- |
row1 |        | w11+/- |        | w12+/- |
row2 |        | w21+/- |        | w22+/- |
row3 |        | w31+/- |        | w32+/- |

[0222] Another implementation has a tuned bit line as an adjacent bit line to the target bit line, so as to tune the target bit line to the final target by means of FG-FG coupling. In this case, the source pull-down bit line (BLPWDN) is inserted on the side of the target bit line that is not adjacent to the tuned bit line.

[0223] An alternative implementation for handling noisy or defective cells designates these cells as unused cells (after they have been identified as noisy or defective by the sensing circuitry), meaning they are (deeply) programmed so that they do not contribute any value to the neuron output.

[0224] Implementations for processing fast cells first identify these cells, and then apply more precise algorithms to these cells, such as those with small or no voltage increment pulses or those using floating gate coupling algorithms.

[0225] Figure 28 shows an optional redundant array 2801, which can be included in any of the VMM arrays discussed so far. If any column attached to a bit line switch is considered defective, the redundant array 2801 can be used as redundancy to replace the defective column. The redundant array can have its own redundant neuron outputs (e.g., bit lines) and ADC circuitry for redundancy purposes. Where redundancy is required, the output of the redundant ADC is used to replace the output of the ADC for the defective bit line. The redundant array 2801 can also be used for weight mapping for power distribution across bit lines, as described in Table 10x.

[0226] Figure 29 shows VMM system 2900, which comprises arrays 2901 and 2902, column multiplexer 2903, local bit lines LBL 2905a-d, global bit lines GBL 2908 and 2909, and virtual bit line switch 2905. Column multiplexer 2903 is used to select the top local bit line 2905 of array 2901 or the bottom local bit line 2905 of array 2902 for connection to global bit line 2908. In one embodiment, the (metal) global bit lines 2908 have the same number of lines as the local bit lines, for example, 8 or 16. In another embodiment, there is only one (metal) global bit line 2908 for every N local bit lines, for example, one global bit line for every 8 or 16 local bit lines. Column multiplexer 2903 can also multiplex an adjacent global bit line (such as GBL 2909) into the current global bit line (such as GBL 2908) to effectively increase the width of the current global bit line. This reduces the voltage drop on the global bit line.

[0227] Figure 30 shows VMM system 3000. VMM system 3000 includes an array 3010, a shift register (SR) 3001, a digital-to-analog converter 3002 (which receives input from SR 3001 and outputs equivalent (analog or pseudo-analog) levels or information to the corresponding control gate lines CG), a summer circuit 3003, an analog-to-digital converter 3004, and a bit line switch 3005. Virtual bit lines and virtual bit line switches are present but not shown. As shown, the ADC circuits can be combined to produce a single ADC with greater precision (i.e., a larger number of bits). The source lines are connected together as SLARY 3888.

[0228] The summer circuit 3003 may include the circuits shown in Figures 31-33. It may include circuitry for normalization, scaling, arithmetic operations, activation, statistical rounding, etc.

[0229] Figure 31 shows a current-to-voltage summer circuit 3100 that is adjustable by a variable resistor. The circuit includes current sources 3101-1, ..., 3101-n drawing currents Ineu(1), ..., Ineu(n) (which are the currents received from the respective bit lines of the VMM array), an operational amplifier 3102, a variable holding capacitor 3104, and a variable resistor 3103. The operational amplifier 3102 outputs a voltage, Vneuout = R3103 * (Ineu1 + Ineu0), which is proportional to the current Ineux. When switch 3106 is open, the holding capacitor 3104 holds the output voltage. This held output voltage is used, for example, for conversion to digital bits by an ADC circuit. VREF is a reference voltage of, for example, 0.1 V to 1.0 V. This is the voltage that can be applied to the bit lines of the array being read.

[0230] Figure 32 shows a current-to-voltage summer circuit 3200 that is adjustable by a variable capacitor (essentially an integrator). The circuit includes current sources 3201-1, ..., 3201-n drawing currents Ineu(1), ..., Ineu(n) (which are the currents received from the respective bit lines of the VMM array), an operational amplifier 3202, a variable capacitor 3203, and a switch 3204. The operational amplifier 3202 outputs a voltage, Vneuout = Ineu * integration time / C3203, which is proportional to the current Ineu.
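The two output relations given for Figures 31 and 32 can be compared in a short numerical sketch. The function names and the component values are hypothetical, and the model deliberately ignores the VREF offset and the sign of the amplifier output.

```python
def resistor_summer_vout(currents_a, r_ohms):
    """Figure 31 style relation: Vneuout = R * (sum of neuron currents)."""
    return r_ohms * sum(currents_a)


def integrator_summer_vout(currents_a, t_int_s, c_farads):
    """Figure 32 style relation: Vneuout = Ineu * integration time / C."""
    return sum(currents_a) * t_int_s / c_farads


# Hypothetical values: two bit line currents of 1 uA and 2 uA.
bitline_currents = [1e-6, 2e-6]
v_resistor = resistor_summer_vout(bitline_currents, 100e3)           # about 0.3 V
v_integrator = integrator_summer_vout(bitline_currents, 1e-6, 10e-12)  # about 0.3 V
```

In both cases the output scales linearly with the summed current, so the resistance, or the ratio of integration time to capacitance, acts as the adjustable scaling factor.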

[0231] Figure 33A shows a voltage summer 3300 that is adjustable by a variable capacitor (i.e., a switched capacitor (SC) circuit). The voltage summer includes switches 3301 and 3302, variable capacitors 3303 and 3304, an operational amplifier 3305, a variable capacitor 3306, and a switch 3306. When switch 3301 is closed, input Vin0 is provided to operational amplifier 3305. When switch 3302 is closed, input Vin1 is provided to operational amplifier 3305. Optionally, switches 3301 and 3302 are not closed simultaneously. Operational amplifier 3305 generates an output Vout, which is an amplified version of the input (Vin0 and/or Vin1, depending on which of switches 3301 and 3302 is closed). That is, Vout = Cin / Cout * (Vin), where Cin is C3303 or C3304 and Cout is C3306. For example, Vout = Cin / Cout * ∑(Vinx), where Cin = C3303 = C3304. In one implementation, Vin0 is a positive voltage and Vin1 is a negative voltage, and the voltage summer 3300 adds them together to produce the output voltage Vout.

[0232] Figure 33B shows a voltage summer 3350, which includes switches 3351, 3352, 3353, and 3354, a variable input capacitor 3358, an operational amplifier 3355, a variable feedback capacitor 3356, and a switch 3357. In one embodiment, Vin0 is a positive voltage and Vin1 is a negative voltage, and the voltage summer 3350 adds them together to produce an output voltage Vout.

[0233] For Input = Vin0: When switches 3354 and 3351 are closed, input Vin0 is supplied to the top terminal of capacitor 3358. Then, switch 3351 is opened and switch 3353 is closed to transfer charge from capacitor 3358 to feedback capacitor 3356. Basically, the output VOUT = (C3358 / C3356) * Vin0 (for cases with, for example, VREF = 0).

[0234] For Input = Vin1: When switches 3353 and 3354 are closed, both terminals of capacitor 3358 discharge to VREF. Then, by opening switch 3354 and closing switch 3352, the bottom terminal of capacitor 3358 is charged to Vin1, which in turn charges feedback capacitor 3356 to VOUT = -(C3358 / C3356)*Vin1 (for the case where VREF = 0).

[0235] Therefore, if the Vin1 input is enabled after the Vin0 input is enabled, then for the example case of VREF = 0, VOUT = (C3358 / C3356) * (Vin0 - Vin1). This is used, for example, to implement w = w+ - w-.
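The two-phase behavior of voltage summer 3350 can be followed step by step in a small sketch, assuming VREF = 0 and ideal charge transfer. The function name and the capacitor values are hypothetical.

```python
def sc_summer_vout(vin0, vin1, c_in, c_fb):
    """Two-phase switched-capacitor difference, for the VREF = 0 case.

    Phase 1 adds  (c_in / c_fb) * vin0 to the output.
    Phase 2 adds -(c_in / c_fb) * vin1 to the output.
    """
    gain = c_in / c_fb
    vout = gain * vin0      # Vin0 phase
    vout += -gain * vin1    # Vin1 phase
    return vout


# Hypothetical values: input capacitor 2 pF, feedback capacitor 1 pF.
vout = sc_summer_vout(vin0=0.6, vin1=0.2, c_in=2e-12, c_fb=1e-12)
```

With these values the output is about 0.8 V, that is, a gain of 2 applied to the 0.4 V difference between the two inputs.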

[0236] For the VMM arrays discussed above, the input and output operations can be performed in digital or analog form. Methods include:

[0237] • Sequential inputs IN[0:q] to the DAC:

[0238] • Operate sequentially on IN0, then IN1, ..., then INq; all input bits have the same VCGin; all bit line (neuron) outputs are summed using an adjusting binary index multiplier, either before or after the ADC.

[0239] • Method of adjusting the neuron (bit line) binary index multiplier: as shown in Figure 20, the example summer has two bit lines, BL0 and BLn. Weights are distributed across multiple bit lines BL0 to BLn. For example, there are four bit lines: BL0, BL1, BL2, and BL3. The output of bit line BL0 is multiplied by 2^0 = 1. The output of bit line BLn, which represents the nth binary bit position, is multiplied by 2^n; for example, for n = 3, 2^3 = 8. The outputs of all bit lines, after being multiplied by the appropriate binary bit position factor 2^n, are then summed. The sum is then digitized by the ADC. This method means that all cells have only a binary range; the multi-level range (n bits) is handled by peripheral circuitry (i.e., the summer circuitry). Therefore, for the highest bias level of the memory cells, the voltage drop across all bit lines is approximately the same.

[0240] • Operate sequentially on IN0, then IN1, ..., then INq; each input bit has a corresponding analog value VCGin; all neuron outputs are summed for all input bit evaluations, either before or after the ADC.

[0241] • Parallel inputs to the DAC:

[0242] • Each input IN[0:q] has a corresponding analog value VCGin; all neuron outputs are summed using the adjusting binary index multiplier method, either before or after the ADC.

[0243] By operating on the array sequentially, the power is distributed more evenly. The neuron (bit line) binary index method also reduces the power in the array because each cell on a bit line has only binary levels, with the multi-level range handled by the summer circuit 2603.
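The binary index multiplier method described above can be illustrated with a minimal sketch: each bit line output is scaled by 2^n for its bit position n and the scaled outputs are summed. The function name and the example weight are assumptions for this illustration.

```python
def binary_weighted_sum(bitline_outputs):
    """Scale the output of bit line BLn by 2**n and sum across all bit lines."""
    return sum(out * (2 ** n) for n, out in enumerate(bitline_outputs))


# Hypothetical example: a 4-bit weight of 11 (binary 1011) stored as binary cells
# on BL0..BL3, least significant bit first, read with a unit input.
outputs_bl0_to_bl3 = [1, 1, 0, 1]
total = binary_weighted_sum(outputs_bl0_to_bl3)
```

Here the cells themselves hold only 0 or 1, and the multi-bit value 11 is recovered entirely by the weighting applied in the summing stage.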

[0244] As shown in Figure 33, each ADC can be configured to be combined with the next ADC for a higher-level implementation with a suitable ADC design.

[0245] Figures 34A, 34B, and 34C show output circuits that can be used for the summer circuit 3003 and the analog-to-digital converter 3004 of Figure 30.

[0246] Figure 34A shows output circuit 3400, which includes an analog-to-digital converter 3402 that receives neuron output 3401 and outputs digital bits 3403.

[0247] Figure 34B shows output circuit 3410, which includes a neuron output circuit 3411 and an analog-to-digital converter 3412, which together receive neuron output 3401 and generate output 3413.

[0248] Figure 34C shows output circuit 3420, which includes a neuron output circuit 3421 and a converter 3422, which together receive neuron output 3401 and generate output 3423.

[0249] The neuron output circuit 3411 or 3421 can perform operations such as summation, scaling, normalization, and arithmetic. The converter 3422 can perform operations such as ADC, PDC, AAC, and APC.

[0250] Figure 35 shows a neuron output circuit 3500, which includes adjustable (scalable) current sources 3501 and 3502. These two current sources together generate the output iOUT, i.e., the neuron output. This circuit can perform the summation of positive and negative weights, i.e., w = w+ - w-, and simultaneously scale the output neuron current up or down.

[0251] Figure 36 shows a configurable neuron serial analog-to-digital converter 3600. It includes an integrator 3670 that integrates the neuron output current into an integrating capacitor 3602. One embodiment generates a digital output (count output) 3621 by timing a rising ramp VRAMP 3650 until the comparator 3604 switches polarity; another embodiment ramps down node VC 3610 by a ramp current 3651 until VOUT 3603 reaches VREF 3650, at which point the EC 3605 signal disables the counter 3620. This (n-bit) ADC can be configured to have a lower precision of fewer than n bits or a higher precision of more than n bits, depending on the target application. Configurability is achieved, for example, by configuring capacitor 3602, current 3651, or the ramp rate of VRAMP 3650, timing 3641, etc. In another embodiment, the ADC circuitry of one VMM array is configured to have a lower precision of fewer than n bits and the ADC circuitry of another VMM array is configured to have a higher precision of more than n bits. Furthermore, the ADC circuit of one neuron circuit can be configured to be combined with the next ADC of the next neuron circuit to produce a precision higher than n bits, such as by combining the integrating capacitors 3602 of the two ADC circuits.

[0252] Figure 37 shows a configurable neuron SAR (successive approximation register) analog-to-digital converter 3700. The circuit is a successive approximation converter based on charge redistribution using binary capacitors. The circuit includes a binary CDAC (capacitor-based DAC) 3701, an operational amplifier/comparator 3702, and SAR logic 3703. As shown, GndV 3704 is a low-voltage reference level, such as ground.
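The successive approximation principle used by this type of converter can be sketched in a few lines. This is a generic behavioral model of a SAR search, not the circuit of Figure 37: the CDAC is modeled as an ideal DAC, the comparator as an ideal comparison, and the function name and parameters are assumptions.

```python
def sar_convert(vin, vref, n_bits):
    """Ideal SAR conversion: decide one bit per step, from MSB down to LSB."""
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)              # tentatively set this bit
        v_dac = vref * trial / (1 << n_bits)   # ideal DAC output for the trial code
        if vin >= v_dac:                       # comparator decision
            code = trial                       # keep the bit
    return code


# Hypothetical example: 4-bit conversions with a 1.0 V reference.
code_half = sar_convert(0.5, 1.0, 4)   # binary 1000
code_low = sar_convert(0.3, 1.0, 4)    # binary 0100
```

Each additional bit costs one more comparison step, which is why combining converters to obtain more bits, as described for the configurable ADCs, trades conversion time or hardware for precision.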

[0253] Figure 38 shows a configurable neuron combined SAR analog-to-digital converter 3800. This circuit combines two ADCs from two neuron circuits into one to achieve a higher precision than n bits. For example, where one neuron circuit has a 4-bit ADC, this circuit can achieve more than 4-bit precision, such as 8-bit ADC precision, by combining two 4-bit ADCs. The combined circuit topology is equivalent to a split capacitor (bridge capacitor or attenuation capacitor) SAR ADC circuit, such as an 8-bit 4C-4C SAR ADC produced by combining two adjacent 4-bit 4C SAR ADC circuits. This is achieved by a bridging circuit 3804, whose capacitance is equal to (total number of CDAC capacitor units / (total number of CDAC capacitor units - 1)).

[0254] Figure 39 shows a configurable neuron pipelined SAR CDAC ADC circuit 3900, which can be combined with a next SAR ADC to increase the number of bits in a pipelined manner. A residual voltage 3906 is generated by capacitor 3930 Cf and is provided as the input to the next stage of the pipelined ADC, that is, to the next SAR CDAC ADC (e.g., with a gain of 2, set by the ratio of Cf to C of all capacitors in DAC 3901).

[0255] For additional specific implementation details of configurable output neuron (such as configurable neuron ADC) circuitry, please refer to U.S. Patent Application No. 16/449,201, filed June 21, 2019, entitled “Configurable Input Blocks and Output Blocks and Physical Layout for Analog Neural Memory in a Deep Learning Artificial Neural Network,” which is incorporated herein by reference.

[0256] Adaptive bias circuit

[0257] Referring again to Figure 20, the applicant has determined that during operation, node "ARYGND" (array ground) will not always remain at 0 V. Specifically, when current is injected into ARYGND from the various source lines, the voltage of ARYGND will fluctuate above 0 V, for example, between 0.1 V and 0.5 V. This adversely affects the accuracy of the VMM system 2000 because the reading and programming of the cells depend on the voltage differences between the input lines (WL, CG, EG, and BL) and the source lines pulled down to ARYGND.

[0258] In one implementation, the entire array has only one source line (SLARY), such as SLARY 2888 in Figure 20 or SLARY 3888 in Figure 30. As shown in Figure 45A (a buffer regulator) or Figure 45B (a force/sense regulator), a voltage bias can be applied directly to SLARY 2888 or 3888 to maintain a fixed bias, such as 20 mV. In another embodiment, there are multiple SLARY lines and multiple regulators, such as regulators 4510 or 4520, to maintain a fixed bias on these SLARY lines.

[0259] Figures 40 to 42 show the adaptive voltage difference between the control gate line, the word line, and the bit line, respectively, on the one hand, and the source line on the other.

[0260] In Figure 40, graph 4000 indicates the difference between the control gate line and the source line of the selected cell (labeled d(CG-SL)) that is desired in order to keep the current from the selected cell or bit line constant as the source line voltage increases. Here, the x-axis tracks the SL voltage, while the y-axis tracks the difference between the CG voltage and the SL voltage. For example, when ARYGND rises above 0 V (meaning the source line will also rise above 0 V), the difference between the control gate line and the source line is expected to change adaptively to maintain the cell or bit line current at a constant level. In this example, the voltage on CG will adaptively increase as a function of the source line voltage to compensate for the different effects of the source voltage variation. The compensated effects include the effective decrease in gate-source voltage due to the increased source voltage and the body effect, that is, the increase in threshold voltage due to the increased source voltage. The adaptation also compensates for the effect of the decrease in drain-source voltage due to the increased source voltage. Furthermore, it compensates for the effect of coupling from the control gate CG and the source voltage to the floating gate FG. An appropriate compensation function or lookup table can be derived from silicon characterization data.

[0261] In Figure 41, graph 4100 indicates the difference between the word line and the source line of the selected cell (labeled d(WL-SL)) that is desired in order to keep the current from the selected cell or bit line constant as the source line voltage increases. For example, when ARYGND rises above 0 V (meaning the source line will also rise above 0 V), the difference between the word line and the source line is expected to increase adaptively to maintain the current of the cell or bit line at a constant level. For example, the voltage on word line WL will adaptively change as a function of the source line voltage to compensate for the effects of source voltage variation, similar to those described above for Figure 40. The word line voltage adaptive design preferably also compensates for voltage coupling between the WL and FG terminals.

[0262] In Figure 42, graph 4200 indicates the difference between the bit line and the source line of the selected cell (labeled d(BL-SL)) that is desired in order to keep the current from the selected cell or bit line constant as the source line voltage increases. For example, when ARYGND rises above 0 V (meaning the source line will also rise above 0 V), the difference between the bit line and the source line is expected to change adaptively to maintain the current from the selected cell or bit line at a constant level. For example, the voltage on bit line BL will adaptively increase as a function of the source line voltage to compensate for various effects, such as those from a reduced drain-source voltage, body effect modulation of the threshold voltage, voltage coupling from the source to FG, or other electrical effects.
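A lookup-table form of such a compensation function might be sketched as below. The table entries are placeholders invented for this illustration and are not characterization data; the names CG_SL_LUT, lut_interpolate, and compensated_cg_voltage are likewise hypothetical, and linear interpolation is only one possible choice.

```python
from bisect import bisect_right

# Hypothetical (SL voltage, desired d(CG-SL)) points, in volts.
# Real entries would come from silicon characterization.
CG_SL_LUT = [(0.0, 1.50), (0.1, 1.52), (0.3, 1.58), (0.5, 1.66)]


def lut_interpolate(lut, x):
    """Linearly interpolate a sorted (x, y) table, clamping at both ends."""
    xs = [point[0] for point in lut]
    if x <= xs[0]:
        return lut[0][1]
    if x >= xs[-1]:
        return lut[-1][1]
    i = bisect_right(xs, x)
    (x0, y0), (x1, y1) = lut[i - 1], lut[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def compensated_cg_voltage(v_sl):
    """CG voltage = source line voltage + desired CG-to-SL difference."""
    return v_sl + lut_interpolate(CG_SL_LUT, v_sl)


v_cg_nominal = compensated_cg_voltage(0.0)   # source line at 0 V
v_cg_raised = compensated_cg_voltage(0.2)    # source line lifted by 0.2 V
```

The same structure could be reused for the word line or bit line by swapping in a different table, and a temperature-dependent table could be indexed in the same way.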

[0263] In Figure 43, graph 4300 shows that as the temperature of the VMM array increases (either naturally during VMM array operation or due to environmental changes), the bias voltage that should be applied to the control gate lines and/or erase gate lines and/or word lines to achieve the same operation will decrease as a function of the temperature increase in the subthreshold operating region. In the linear or saturation operating region, the bias voltage can increase as a function of the temperature increase. An appropriate compensation function or lookup table can be derived from silicon characterization data. That is, in addition to voltage variations at ARYGND, temperature variations can also affect the operation and accuracy of the VMM array. In one embodiment, an on-chip temperature sensor is implemented to detect temperature changes and apply a compensation function to the bias voltage accordingly.

[0264] Figure 44A shows an adaptive bias circuit 4410, which includes an adjustable current source 4401 and an adjustable resistor 4402. One end of resistor 4402 is coupled to adjustable current source 4401 at node VREF_AB 4403, and the other end of resistor 4402 is connected to node ARYGND from Figure 20 or Figure 30. When the voltage at ARYGND changes, the voltage at node VREF_AB will change by approximately the same amount because the voltage drop across resistor 4402 remains constant for the constant current generated by adjustable current source 4401. Resistor 4402 can be adjusted according to a defined compensation function or lookup table so that the voltage VREF_AB changes as a function of the ARYGND voltage. Similarly, the resistance of resistor 4402 can be changed as a function of temperature. In addition to, or in place of, any adjustment to the resistance of resistor 4402, the amount of current provided by adjustable current source 4401 can be adjusted according to a defined compensation function or lookup table to change as a function of temperature or the ARYGND voltage. Therefore, VREF_AB will reflect any real-time changes in the ARYGND voltage and temperature and can be used by various circuits during read or programming operations.

[0265] Figure 44B shows an adaptive bias circuit 4420, which includes adjustable resistors 4421 and 4422 and an operational amplifier 4423. The resistances of resistors 4421 and/or 4422 can be adjusted according to a defined compensation function or lookup table to vary as a function of the ARYGND voltage, thereby providing the desired voltage VREF_AB at node 4424. The resistances of resistors 4421 and/or 4422 can also be adjusted according to a defined compensation function or lookup table to vary as a function of temperature.
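The tracking behavior of adaptive bias circuit 4410 follows from Ohm's law and can be shown with a one-line model. The function name and the component values are hypothetical; the current source is treated as ideal.

```python
def vref_ab(v_arygnd, i_bias_a, r_ohms):
    """VREF_AB sits one resistor drop (I * R) above ARYGND."""
    return v_arygnd + i_bias_a * r_ohms


# Hypothetical values: 10 uA through 100 kOhm gives a 1.0 V drop.
v_nominal = vref_ab(0.0, 10e-6, 100e3)   # ARYGND at 0 V
v_lifted = vref_ab(0.3, 10e-6, 100e3)    # ARYGND lifted to 0.3 V
shift = v_lifted - v_nominal
```

With constant current and resistance, the shift in VREF_AB equals the shift in ARYGND; adjusting the current or the resistance changes the drop itself, which is the handle used for additional compensation.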

[0266] Figure 45A and Figure 45B Regulator 4500 (including regulator 4510 or regulator 4520 respectively) is shown, which can optionally receive voltage VREF_AB from adaptive bias circuit 4400 at node IN to provide a regulated voltage output at node OUT. Regulator 4520 is used in a force / sensing configuration to compensate for interconnect line voltage drops, such as those in power bus wiring, where the voltage at output node OUT is switchably connected to feedback voltage FB to force the voltage at output node OUT to follow feedback voltage FB. When applied to VMM system 2000, voltage OUT can be applied to node SLARY 2888, as referenced above. Figure 20 The subject of discussion.

[0267] The voltage VREF_AB at node IN is determined based on a lookup table or a function based on characterization data. The output voltage at node OUT can be used to provide bias to ARYGND or SLARY. For example, node OUT can be used as a power supply for the SL bias of the entire array, such as SLARY, for example, 0V or 15mV. Node OUT can also be used as a bias applied to one or more bit lines, for example to compensate for coupling mismatch, bulk effect mismatch, or PVT mismatch, but is not limited thereto. Node OUT can also be used as a bias voltage for Vcg, for example to compensate for coupling mismatch, bulk effect mismatch, or PVT mismatch, but is not limited thereto.

[0268] Figure 46A shows an adaptive bias circuit that includes an input digital-to-analog converter 4600. The input digital-to-analog converter 4600 receives digital input signals DIN[7:0] and generates analog signals that can be applied to the input lines of the VMM array (such as control gate lines, word lines, erase gate lines, or source lines) to program one or more selected cells in the VMM array. Notably, the input digital-to-analog converter 4600 uses the voltage source VREF_AB from Figures 44 and 45, and it uses ARYGND from Figure 20 and Figure 44A as its ground. The input digital-to-analog converter 4600 therefore automatically compensates for variations in ARYGND by using the VREF_AB voltage.
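
To illustrate why referencing the converter to VREF_AB and ARYGND compensates for ground shifts, the sketch below models an idealized linear 8-bit DAC. The linear transfer function, the function name `input_dac`, and the 0.5 V headroom are assumptions for illustration; the source does not specify the converter's internal architecture.

```python
def input_dac(din: int, vref_ab_v: float, arygnd_v: float, bits: int = 8) -> float:
    """Assumed linear DAC: output spans ARYGND (code 0) to VREF_AB (full scale)."""
    full_scale = (1 << bits) - 1
    if not 0 <= din <= full_scale:
        raise ValueError(f"DIN must be in 0..{full_scale}")
    return arygnd_v + (vref_ab_v - arygnd_v) * din / full_scale


if __name__ == "__main__":
    headroom = 0.5  # assumed constant drop that keeps VREF_AB above ARYGND
    for arygnd in (0.0, 0.020):
        out = input_dac(128, arygnd + headroom, arygnd)
        print(f"ARYGND={arygnd:.3f} V  OUT={out:.4f} V  OUT-ARYGND={out - arygnd:.4f} V")
```

Under these assumptions the output measured relative to ARYGND is the same for a given DIN code regardless of how far ARYGND has drifted, because VREF_AB drifts with it.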

[0269] Figure 46B shows an adaptive bias circuit that includes an output analog-to-digital converter 4650. The output analog-to-digital converter 4650 receives an array output, such as a current or a voltage, and generates digital output bits. Notably, the output analog-to-digital converter 4650 uses the voltage source VREF_AB from Figure 44A and/or Figure 44B, and it uses ARYGND from Figure 20 and Figure 44A as its ground. The output analog-to-digital converter 4650 therefore compensates for changes in ARYGND by using the VREF_AB voltage.
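
The same idea can be sketched for the output side. This is a hypothetical ideal quantizer, assuming a voltage input and an 8-bit result; the name `output_adc`, the resolution, and the clamping behavior are illustrative choices, not details from the source.

```python
def output_adc(vin_v: float, vref_ab_v: float, arygnd_v: float, bits: int = 8) -> int:
    """Assumed ideal ADC: quantize (VIN - ARYGND) against the span (VREF_AB - ARYGND)."""
    full_scale = (1 << bits) - 1
    span = vref_ab_v - arygnd_v
    if span <= 0:
        raise ValueError("VREF_AB must be above ARYGND")
    code = round((vin_v - arygnd_v) / span * full_scale)
    return max(0, min(full_scale, code))  # clamp inputs outside the reference span


if __name__ == "__main__":
    # The same array output, measured with and without a 20 mV shift on ARYGND.
    print(output_adc(0.10, 0.50, 0.00))
    print(output_adc(0.12, 0.52, 0.02))
```

Because both the input and the reference are measured from ARYGND, a common shift on all three nodes leaves the digital code unchanged in this model.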

[0270] Figure 47 shows an adjustable bias line decoder 4700, which includes a word line decoder 4701 coupled to a control gate decoder 4702. In this example, the adjustable bias line decoder 4700 is used for row 0 in the VMM array. Every other row in the array has a similar adjustable bias line decoder assigned to it.

[0271] The word line decoder 4701 includes a PMOS transistor 4703 and an NMOS transistor 4704 arranged as an inverter, and a NAND gate 4705, configured as shown. The control gate decoder 4702 includes the input digital-to-analog converter 4600 from Figure 46A, an inverter 4706, switches 4707 and 4708, and an NMOS transistor 4709 used as a transmission gate, configured as shown. Here, the word line WL0 and the control gate line CG0 are activated when the NAND gate 4705 receives an address signal corresponding to the row (here, row 0) allocated to the adjustable bias line decoder 4700. With WL0 and CG0 activated, the analog output voltage of the input digital-to-analog converter 4600 is applied to the control gate line CG0, so that CG0 receives a compensation bias voltage provided by the input digital-to-analog converter 4600. Similar arrangements can also be used for EG or WL compensation.
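
The selection behavior can be summarized with a small behavioral model. It abstracts away the transistors and switches entirely; the names `RowLines` and `decode_row`, and the assumption that a deselected control gate line sits at a fixed `deselect_v`, are hypothetical and not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowLines:
    wl_active: bool   # True when the word line for this row is asserted
    cg_volts: float   # voltage driven onto this row's control gate line


def decode_row(row: int, address: int, dac_out_v: float,
               deselect_v: float = 0.0) -> RowLines:
    """Behavioral model of the decoder assigned to `row`.

    Assumption: the row is selected only when `address` matches `row`;
    a deselected control gate line is held at `deselect_v`.
    """
    selected = address == row
    return RowLines(wl_active=selected,
                    cg_volts=dac_out_v if selected else deselect_v)


if __name__ == "__main__":
    # Address 0 selects row 0; the other rows stay deselected.
    for row in range(4):
        print(row, decode_row(row, address=0, dac_out_v=1.2))
```

Only the addressed row passes the converter's compensated output to its control gate line; every other row's decoder sees a non-matching address and leaves its lines inactive.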

[0272] Figure 48 shows a current-to-voltage summer circuit 4800, which converts the array output (BL) current into a voltage. It is the same as the current-to-voltage summer circuit 3100 in Figure 31, except that the operational amplifier 3102 receives VREF_AB (from Figures 44 and 45) instead of VREF at its non-inverting input, and the array output current is referenced to the virtual ground node ARYGND.

[0273] Figure 49 shows a current-to-voltage summer circuit 4900, which includes an operational amplifier 4901, a variable resistor 4902, and a current source 4903 that draws current Ineu (the current received from the bit lines of the VMM array). The operational amplifier 4901 receives VREF_AB (from Figure 44A and/or Figure 45B) at its non-inverting input and outputs voltage Vneuout. The array output current is referenced to the virtual ground node ARYGND.
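
As a rough numerical illustration of the conversion, the sketch below assumes an ideal operational amplifier with the variable resistor in its feedback path and the current Ineu drawn out of the inverting node, which gives Vneuout = VREF_AB + Ineu × R. That topology and sign convention are assumptions, since the text does not spell out the connections, and the function name `vneuout` is hypothetical.

```python
from typing import Iterable


def vneuout(bitline_currents_a: Iterable[float], r_ohm: float,
            vref_ab_v: float) -> float:
    """Ideal op-amp model (assumed topology): the inverting input is held at
    VREF_AB and the summed bit line current flows through the feedback resistor."""
    ineu = sum(bitline_currents_a)  # Ineu: total current from the bit lines
    return vref_ab_v + ineu * r_ohm


if __name__ == "__main__":
    # Two illustrative bit line currents of 1 uA and 2 uA through 100 kOhm.
    print(vneuout([1e-6, 2e-6], 100_000.0, 0.5))
```

Since the output is offset by VREF_AB rather than a fixed reference, a shift in ARYGND that VREF_AB tracks moves the output by the same amount in this model.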

[0274] Figure 50 shows a current-to-voltage summer 5000, which is the same as the current-to-voltage summer 3200 in Figure 32, except that the operational amplifier 3202 receives VREF_AB (from Figure 44A and/or Figure 44B) instead of VREF at its non-inverting input. The array output current is referenced to the virtual ground node ARYGND.

[0275] Figure 51 shows a voltage summer 5100, which is the same as the voltage summer 3300 in Figure 33A, except that it uses voltage VREF_AB (from Figure 44A and/or Figure 44B) instead of VREF. The array output voltage Vinx is referenced to the virtual ground node ARYGND.

[0276] Figure 52 shows a voltage summer 5200, which is the same as the voltage summer 3350 in Figure 33B, except that it uses voltage VREF_AB (from Figure 44A and/or Figure 44B) instead of VREF. The array output voltage Vinx is referenced to the virtual ground node ARYGND.

[0277] Figure 53 shows VMM high-voltage decoding circuits, including an erase gate decoder circuit 5301, a control gate decoder circuit 5304, a source line decoder circuit 5307, and a high-voltage level shifter 5311, which are suitable for use with memory cells of the type shown in Figure 3.

[0278] The erase gate decoder circuit 5301 includes a PMOS select transistor 5302 (controlled by the signal HVO_B) and an NMOS deselect transistor 5303 (controlled by the signal HVO_B) configured as shown.

[0279] The control gate decoder circuit 5304 includes a PMOS select transistor 5305 (controlled by the signal HVO_B) and an NMOS deselect transistor 5306 (controlled by the signal HVO_B) configured as shown.

[0280] The source line decoder circuit 5307 includes an NMOS monitoring transistor 5308 (controlled by the signal SL_MON), a drive transistor 5309 (controlled by the signal HVO), and a deselect transistor 5310 (controlled by the signal HVO_B) configured as shown.

[0281] The high-voltage level shifter 5311 receives the enable signal EN and outputs a high-voltage signal HVO and its complement HVO_B, and receives HVSUP (high voltage) and HVSUP_LOW as its voltage rails.
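
The level shifter's role can be sketched as a simple behavioral model. It assumes that asserting EN drives HVO to the HVSUP rail and HVO_B to the HVSUP_LOW rail, and the reverse when EN is deasserted; that polarity, the function name `level_shift`, and the rail voltages in the example are assumptions for illustration.

```python
from typing import Tuple


def level_shift(en: bool, hvsup_v: float, hvsup_low_v: float) -> Tuple[float, float]:
    """Return (HVO, HVO_B), swinging between the HVSUP_LOW and HVSUP rails.

    Assumed polarity: EN asserted drives HVO high and HVO_B low.
    """
    if hvsup_v <= hvsup_low_v:
        raise ValueError("HVSUP must be above HVSUP_LOW")
    hvo = hvsup_v if en else hvsup_low_v
    hvo_b = hvsup_low_v if en else hvsup_v
    return hvo, hvo_b


if __name__ == "__main__":
    # Illustrative rail voltages only.
    print(level_shift(True, 10.0, 2.0))
    print(level_shift(False, 10.0, 2.0))
```

In this model the select and deselect transistors of the decoder circuits, which are gated by HVO and HVO_B, always see complementary levels bounded by the two rails.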

[0282] It should be noted that, as used herein, the terms “above” and “on” both encompass “directly on” (without intermediate material, elements, or space between) and “indirectly on” (with intermediate material, elements, or space between). Similarly, the term “adjacent” includes “directly adjacent” (without intermediate material, elements, or space between) and “indirectly adjacent” (with intermediate material, elements, or space between), “mounted to” includes “directly mounted to” (without intermediate material, elements, or space between) and “indirectly mounted to” (with intermediate material, elements, or space between), and “electrically coupled to” includes “directly electrically coupled to” (without intermediate material or elements electrically connecting the elements together) and “indirectly electrically coupled to” (with intermediate material or elements electrically connecting the elements together). For example, forming an element “above the substrate” can include forming an element directly on the substrate without intermediate material / elements between them, and forming an element indirectly on the substrate with one or more intermediate materials / elements between them.

Claims

1. A non-volatile memory system, comprising: a non-volatile memory cell array arranged in rows and columns, each non-volatile memory cell including a source and a drain; multiple bit lines, each of which is coupled to the drain of each non-volatile memory cell in a column of non-volatile memory cells; multiple pull-down bit lines, each of which couples a row of non-volatile memory cells to a pull-down node; an adaptive bias circuit for generating a regulated voltage in response to changes in the voltage of the pull-down node; source lines, the source lines being coupled to the source of each non-volatile memory cell; and an adjustable bias row decoder for receiving a row address and providing a regulated voltage to the control gate line of the row corresponding to the row address in the array during operation.

2. A non-volatile memory system, comprising: a non-volatile memory cell array arranged in rows and columns, each non-volatile memory cell including a source and a drain; multiple bit lines, each of which is coupled to the drain of each non-volatile memory cell in a column of non-volatile memory cells; multiple pull-down bit lines, each of which couples a row of non-volatile memory cells to a pull-down node; an adaptive bias circuit for generating a regulated voltage in response to changes in the voltage of the pull-down node; source lines, the source lines being coupled to the source of each non-volatile memory cell; and an adjustable bias row decoder for receiving a row address and providing a regulated voltage to the word line of the row corresponding to the row address in the array during operation.

3. A non-volatile memory system, comprising: a non-volatile memory cell array arranged in rows and columns, each non-volatile memory cell including a source and a drain; multiple bit lines, each of which is coupled to the drain of each non-volatile memory cell in a column of non-volatile memory cells; multiple pull-down bit lines, each of which couples a row of non-volatile memory cells to a pull-down node; an adaptive bias circuit for generating a regulated voltage in response to changes in the voltage of the pull-down node; source lines, the source lines being coupled to the source of each non-volatile memory cell; and an adjustable bias row decoder for receiving a row address and providing a regulated voltage to the erase gate line of the row corresponding to the row address in the array during operation.

4. The non-volatile memory system according to claim 1, 2 or 3, wherein the operation is a read operation.

5. The non-volatile memory system according to claim 1, 2 or 3, wherein the operation is a programming operation.

6. The non-volatile memory system according to claim 1, 2 or 3, wherein the adaptive bias circuit includes a variable resistor having a first end and a second end, the first end being coupled to an adjustable current source and the second end being coupled to the pull-down node.

7. The non-volatile memory system of claim 6, wherein the variable resistor is adjusted based on a compensation function.

8. The non-volatile memory system of claim 6, wherein the variable resistor is adjusted based on a lookup table.

9. The non-volatile memory system of claim 6, wherein the variable resistor adjusts in response to temperature changes.

10. The non-volatile memory system according to claim 1, 2, or 3, wherein the adaptive bias circuit comprises: an operational amplifier including an inverting input, a non-inverting input, and an output; a first variable resistor coupled to the inverting input; and a second variable resistor coupled between the inverting input and the output; wherein the resistances of the first variable resistor and the second variable resistor are determined as a function of the voltage of the pull-down node.