Digital output mechanism for analog neural memory in deep learning artificial neural networks
By using non-volatile memory arrays together with current-to-voltage converters and analog-to-digital converters in artificial neural networks, the problems of low energy efficiency and high computational complexity in hardware implementations are addressed. Weight values are stored and operated on efficiently, which makes the approach suitable for vector-matrix multiplication operations.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patent (China)
- Current Assignee / Owner
- SILICON STORAGE TECHNOLOGY INC
- Filing Date
- 2021-04-07
- Publication Date
- 2026-06-19
AI Technical Summary
In existing technologies, hardware implementations of artificial neural networks suffer from low energy efficiency and high computational complexity. This is especially true when there are a large number of synaptic connections. Efficient input and output mechanisms for non-volatile memory cells are lacking, which makes efficient vector-matrix multiplication difficult to achieve.
A non-volatile memory array is used as the synapses, and an output block generates the output of the non-volatile memory cell array. The output block includes a current-to-voltage converter and an analog-to-digital converter. Together they convert a sequence of currents from the array into multiple output bits, enabling accurate storage and computation of weight values.
Because the non-volatile memory array both stores the weights and performs the computation, separate multiplication and addition logic circuits are no longer needed. This improves computational and energy efficiency and makes the approach suitable for emulating the synapses of artificial neural networks.
Figure CN116635869B_ABST
Abstract
Description
[0001] Priority Statement
[0002] This application claims priority to U.S. Provisional Patent Application No. 63/133,270, filed January 1, 2021, entitled “Input and Digital Output Mechanisms for Analog Neural Memory in a Deep Learning Artificial Neural Network”, and to U.S. Patent Application No. 17/219,352, filed March 31, 2021, entitled “Digital Output Mechanisms for Analog Neural Memory in a Deep Learning Artificial Neural Network”.
Technical Field
[0003] Several implementations of output mechanisms for reading or verifying the outputs of non-volatile memory cells within a vector-matrix multiplication (VMM) array in an artificial neural network are disclosed.
Background
[0004] Artificial neural networks mimic biological neural networks (the central nervous system of animals, especially the brain) and are used to estimate or approximate functions that can depend on a large number of inputs and are often unknown. Artificial neural networks typically consist of interconnected layers of "neurons" that exchange messages with each other.
[0005] Figure 1 illustrates an artificial neural network, where the circles represent inputs or layers of neurons. The connections (called synapses) are represented by arrows and have numerical weights that can be tuned based on experience. This makes the neural network adaptive to inputs and capable of learning. Typically, a neural network includes a layer of multiple inputs. There are typically one or more intermediate layers of neurons, and an output layer of neurons that provides the output of the neural network. The neurons at each level individually or collectively make a decision based on the data received from the synapses.
[0006] One of the major challenges in developing artificial neural networks for high-performance information processing is the lack of adequate hardware technology. Practical neural networks rely on a very large number of synapses to achieve high connectivity between neurons, i.e., very high computational parallelism. In principle, such complexity can be achieved with digital supercomputers or specialized clusters of graphics processing units. However, these approaches are costly. They also have poor energy efficiency compared to biological networks, which consume less energy primarily because they perform low-precision analog computation. CMOS analog circuits have been used for artificial neural networks. However, most CMOS-implemented synapses have been too bulky given the large number of neurons and synapses required.
[0007] The applicant previously disclosed, in U.S. Patent Application No. 15/594,439 (incorporated herein by reference), an artificial (analog) neural network that utilizes one or more non-volatile memory arrays as synapses. The non-volatile memory arrays operate as an analog neuromorphic memory. The neural network device includes a first plurality of synapses configured to receive a first plurality of inputs and to generate a first plurality of outputs therefrom. It also includes a first plurality of neurons configured to receive the first plurality of outputs. The first plurality of synapses includes a plurality of memory cells, wherein each of the memory cells includes:
- spaced-apart source and drain regions formed in a semiconductor substrate, with a channel region extending between them;
- a floating gate disposed over and insulated from a first portion of the channel region; and
- a non-floating gate disposed over and insulated from a second portion of the channel region.

Each memory cell is configured to store a weight value corresponding to a number of electrons on its floating gate. The plurality of memory cells is configured to multiply the first plurality of inputs by the stored weight values to generate the first plurality of outputs.
[0008] Every non-volatile memory cell used in an analog neuromorphic memory system must be erased and programmed to maintain a very specific and precise amount of charge (i.e., the number of electrons) in the floating gate. For example, each floating gate must hold one of N distinct values, where N is the number of different weights that can be indicated by each cell. Examples of N include 16, 32, 64, 128, and 256.
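As a rough numerical illustration of what "one of N distinct values" means, the following Python sketch maps a real-valued weight onto one of N evenly spaced levels. It also reports how many bits such a cell encodes. The uniform level spacing, the [-1, 1] weight range, and the function names are assumptions for illustration only. The text does not specify how weights are mapped to charge levels.

```python
import math


def bits_per_cell(n_levels: int) -> int:
    """Number of bits encoded by a cell that holds one of n_levels distinct values."""
    return math.ceil(math.log2(n_levels))


def quantize_weight(w: float, n_levels: int, w_min: float = -1.0, w_max: float = 1.0) -> int:
    """Map a real-valued weight to the index (0..n_levels-1) of the nearest level.

    Assumption (hypothetical): levels are spaced uniformly between w_min and w_max.
    """
    w = min(max(w, w_min), w_max)                  # clamp to the representable range
    step = (w_max - w_min) / (n_levels - 1)
    return int(round((w - w_min) / step))


def level_value(index: int, n_levels: int, w_min: float = -1.0, w_max: float = 1.0) -> float:
    """Weight value represented by level `index` under the same uniform spacing."""
    step = (w_max - w_min) / (n_levels - 1)
    return w_min + index * step


# The example values of N from the text correspond to 4 through 8 bits per cell.
for n in (16, 32, 64, 128, 256):
    print(n, bits_per_cell(n))
```

Any quantized weight lies within half a level spacing of the original value. This is why a larger N allows finer tuning of the synaptic weights.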
[0009] The output of one VMM often needs to be applied to another VMM. It is therefore desirable in a VMM system to be able to convert the output of one VMM into bits and to apply those bits as inputs to another VMM. This raises the question of how best to implement a bit-encoding mechanism for a VMM system.
[0010] What is needed are improved input and output blocks for a VMM to perform programming, verification, and reading.
Summary of the Invention
[0011] Several implementations for reading or verifying values stored in selected memory cells of a vector-matrix multiplication (VMM) array in an artificial neural network are disclosed. The implementations include various designs for input and output blocks used with the VMM array.
[0012] In one embodiment, an output block for generating an output from an array of non-volatile memory cells includes:
- a current-to-voltage converter that receives a sequence of currents from one or more selected non-volatile memory cells in the array, generated in response to a sequence of inputs applied to the array, and that generates a voltage or a sequence of voltages in response to the sequence of currents; and
- an analog-to-digital converter that converts the voltage or sequence of voltages into a plurality of output bits.

The plurality of output bits reflects a weighting function applied to one or more of the sequence of currents, the voltage, or the sequence of voltages.
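One way to read "a weighting function performed on the current sequence" is bit-serial input. Each bit of a multi-bit input is applied in turn, each resulting column current is digitized, and the digitized results are combined with binary weights 2^k. The Python sketch below models this under idealized assumptions: linear cells, integer weights in arbitrary current units, and an ideal ADC. The function names are hypothetical, and this is not presented as the disclosed circuit.

```python
from typing import Sequence


def column_current(input_bits: Sequence[int], weights: Sequence[float]) -> float:
    """Idealized summed current on one output line: each cell contributes its
    weight (arbitrary current units) when its input bit is 1."""
    return sum(b * w for b, w in zip(input_bits, weights))


def ideal_adc(value: float, lsb: float = 1.0) -> int:
    """Hypothetical ideal ADC: quantize an already current-to-voltage converted value."""
    return round(value / lsb)


def bit_serial_vmm(inputs: Sequence[int], weights: Sequence[float], n_bits: int) -> int:
    """Apply each input bit-plane in sequence (LSB first), digitize each column
    result, and combine the digitized results with binary weights 2**k."""
    acc = 0
    for k in range(n_bits):
        plane = [(x >> k) & 1 for x in inputs]       # k-th bit of every input
        code = ideal_adc(column_current(plane, weights))
        acc += code << k                             # weighting function: multiply by 2**k
    return acc


print(bit_serial_vmm([3, 5, 7], [1, 2, 3], n_bits=3))
```

With an ideal ADC and integer weights, the result equals the exact dot product (3*1 + 5*2 + 7*3 = 34). A real converter would add quantization error at each step of the sequence.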
[0013] In another embodiment, an output block for generating an output from an array of non-volatile memory cells includes a current-to-voltage converter. The converter receives a current from one or more selected non-volatile memory cells in the array in response to an input applied to the array, and converts the current into a voltage. The current-to-voltage converter includes a sample-and-hold circuit for holding the voltage.
[0014] In another embodiment, an output block generates an output from a sequence of currents. The currents are received from an array of non-volatile memory cells in response to a sequence of inputs received by the array. The output block includes an analog-to-digital converter that receives the sequence of currents and converts it into an output comprising a plurality of output bits.
Brief Description of the Drawings
[0015] Figure 1 is a schematic diagram of an artificial neural network.
[0016] Figure 2 illustrates a prior art split-gate flash memory cell.
[0017] Figure 3 illustrates another prior art split-gate flash memory cell.
[0018] Figure 4 illustrates another prior art split-gate flash memory cell.
[0019] Figure 5 illustrates another prior art split-gate flash memory cell.
[0020] Figure 6 is a schematic diagram illustrating different layers of an exemplary artificial neural network utilizing one or more non-volatile memory arrays.
[0021] Figure 7 is a block diagram illustrating a vector-matrix multiplication system.
[0022] Figure 8 is a block diagram illustrating an exemplary artificial neural network utilizing one or more vector-matrix multiplication systems.
[0023] Figure 9 shows another embodiment of a vector-matrix multiplication system.
[0024] Figure 10 shows another embodiment of a vector-matrix multiplication system.
[0025] Figure 11 shows another embodiment of a vector-matrix multiplication system.
[0026] Figure 12 shows another embodiment of a vector-matrix multiplication system.
[0027] Figure 13 shows another embodiment of a vector-matrix multiplication system.
[0028] Figure 14 illustrates a prior art long short-term memory system.
[0029] Figure 15 shows an exemplary cell for use in a long short-term memory system.
[0030] Figure 16 shows an embodiment of the exemplary cell of Figure 15.
[0031] Figure 17 shows another embodiment of the exemplary cell of Figure 15.
[0032] Figure 18 illustrates a prior art gated recurrent unit system.
[0033] Figure 19 shows an exemplary cell for use in a gated recurrent unit system.
[0034] Figure 20 shows an embodiment of the exemplary cell of Figure 19.
[0035] Figure 21 shows another embodiment of the exemplary cell of Figure 19.
[0036] Figure 22A shows an embodiment of a method of programming a non-volatile memory cell.
[0037] Figure 22B shows another embodiment of a method of programming a non-volatile memory cell.
[0038] Figure 23 shows an embodiment of a coarse programming method.
[0039] Figure 24 shows exemplary pulses used in programming a non-volatile memory cell.
[0040] Figure 25 shows exemplary pulses used in programming a non-volatile memory cell.
[0041] Figures 26A and 26B show a calibration algorithm for programming a non-volatile memory cell that adjusts programming parameters based on the slope characteristics of the cell.
[0042] Figure 27 illustrates circuitry used in the calibration algorithm of Figure 26.
[0043] Figure 28 shows a calibration algorithm for programming a non-volatile memory cell.
[0044] Figure 29 shows circuitry used in the calibration algorithm of Figure 28.
[0045] Figure 30 shows an exemplary progression of voltages applied to the control gate of a non-volatile memory cell during a programming operation.
[0046] Figure 31 shows an exemplary progression of voltages applied to the control gate of a non-volatile memory cell during a programming operation.
[0047] Figure 32 shows a system for applying programming voltages during programming of a non-volatile memory cell within a vector-matrix multiplication system.
[0048] Figure 33 shows a charge summer circuit.
[0049] Figure 34 shows a current summer circuit.
[0050] Figure 35 shows a digital summer circuit.
[0051] Figure 36A shows an embodiment of an integrating analog-to-digital converter for neuron output.
[0052] Figure 36B is a graph of the voltage output of the integrating analog-to-digital converter of Figure 36A over time.
[0053] Figure 36C shows another embodiment of an integrating analog-to-digital converter for neuron output.
[0054] Figure 36D is a graph of the voltage output of the integrating analog-to-digital converter of Figure 36C over time.
[0055] Figure 36E shows another embodiment of an integrating analog-to-digital converter for neuron output.
[0056] Figure 36F shows another embodiment of an integrating analog-to-digital converter for neuron output.
[0057] Figures 37A and 37B illustrate a successive approximation analog-to-digital converter for neuron output.
[0058] Figure 38 shows an embodiment of a Σ-Δ analog-to-digital converter.
[0059] Figure 39 shows an output block.
[0060] Figure 40 shows another embodiment of a vector-matrix multiplication system.
[0061] Figure 41 shows another embodiment of a vector-matrix multiplication system.
[0062] Figure 42 shows another embodiment of a vector-matrix multiplication system.
[0063] Figure 43 shows a block diagram of a vector-matrix multiplication system.
[0064] Figure 44 shows a digital summer.
[0065] Figure 45 shows an output block.
[0066] Figure 46 shows an embodiment of a current-to-voltage converter.
[0067] Figure 47 shows another embodiment of a current-to-voltage converter.
[0068] Figure 48 shows another embodiment of a current-to-voltage converter.
[0069] Figure 49A shows another embodiment of a current-to-voltage converter.
[0070] Figure 49B shows an embodiment of a lossless variable resistor.
[0071] Figure 50 shows another embodiment of a current-to-voltage converter.
[0072] Figure 51 shows another embodiment of a current-to-voltage converter.
[0073] Figures 52A, 52B, and 52C show embodiments of a hybrid serial converter.
Detailed Description
[0074] The artificial neural network of this invention utilizes a combination of CMOS technology and non-volatile memory arrays.
[0075] Non-volatile memory cells
[0076] Digital non-volatile memories are well known. For example, U.S. Patent 5,029,130 (“the '130 patent”), which is incorporated herein by reference, discloses an array of split-gate non-volatile memory cells, which are a type of flash memory cell. Such a memory cell 210 is shown in Figure 2. Each memory cell 210 includes a source region 14 and a drain region 16 formed in a semiconductor substrate 12, with a channel region 18 between them. A floating gate 20 is formed over and insulated from (and controls the conductivity of) a first portion of the channel region 18, and over a portion of the source region 14. A word line terminal 22 is typically coupled to a word line. Its first portion is disposed over and insulated from (and controls the conductivity of) a second portion of the channel region 18, and its second portion extends up and over the floating gate 20. The floating gate 20 and the word line terminal 22 are insulated from the substrate 12 by a gate oxide. A bit line 24 is coupled to the drain region 16.
[0077] Memory cell 210 is erased (electrons are removed from the floating gate) by placing a high positive voltage on the word line terminal 22. This causes electrons on the floating gate 20 to tunnel through the intermediate insulator from the floating gate 20 to the word line terminal 22 via Fowler-Nordheim (FN) tunneling.
[0078] Memory cell 210 is programmed (electrons are placed on the floating gate) by source-side injection (SSI) of hot electrons. This is done by placing a positive voltage on both the word line terminal 22 and the source region 14. Electron current flows from the drain region 16 toward the source region 14. The electrons accelerate and become heated when they reach the gap between the word line terminal 22 and the floating gate 20. Some of the heated electrons are then injected through the gate oxide onto the floating gate 20 by the electrostatic attraction of the floating gate 20.
[0079] Memory cell 210 is read by placing positive read voltages on the drain region 16 and the word line terminal 22, which turns on the portion of the channel region 18 under the word line terminal.
- If the floating gate 20 is positively charged (i.e., erased of electrons), the portion of the channel region 18 under the floating gate 20 is also turned on. Current then flows across the channel region 18, which is sensed as the erased or "1" state.
- If the floating gate 20 is negatively charged (i.e., programmed with electrons), the portion of the channel region under the floating gate 20 is mostly or entirely turned off. No current (or very little current) then flows across the channel region 18, which is sensed as the programmed or "0" state.
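For a digital memory, the read operation described above reduces to comparing the cell current against a reference. The Python sketch below shows that comparison. The reference current, the sample cell currents, and the function name are hypothetical values chosen for illustration, not figures from the source. The analog use described later instead relies on intermediate current levels, not just on which side of a threshold the current falls.

```python
def sense_cell(cell_current_ua: float, ref_current_ua: float = 1.0) -> int:
    """Return 1 (erased: floating gate positively charged, channel conducts)
    if the cell current exceeds the reference, else 0 (programmed)."""
    return 1 if cell_current_ua > ref_current_ua else 0


# Hypothetical read currents, in microamps, for an erased and a programmed cell.
readings = {"erased_cell": 12.0, "programmed_cell": 0.05}
states = {name: sense_cell(i) for name, i in readings.items()}
print(states)
```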
[0080] Table 1 shows typical voltage and current ranges that can be applied to the terminals of memory cell 210 to perform read, erase, and program operations:
[0081] Table 1: Operation of flash memory cell 210 of Figure 2
[0082] Read: WL 2-3V; BL 0.6-2V; SL 0V
Erase: WL approximately 11-13V; BL 0V; SL 0V
Program: WL 1-2V; BL 10.5-3μA; SL 9-10V
[0083] Other split-gate memory cell configurations, which are other types of flash memory cells, are known. For example, Figure 3 shows a four-gate memory cell 310 comprising:
- a source region 14;
- a drain region 16;
- a floating gate 20 over a first portion of a channel region 18;
- a select gate 22 (typically coupled to a word line, WL) over a second portion of the channel region 18;
- a control gate 28 over the floating gate 20; and
- an erase gate 30 over the source region 14.

This configuration is described in U.S. Patent 6,747,310, which is incorporated herein by reference for all purposes. Here, all gates except the floating gate 20 are non-floating gates, meaning that they are or can be electrically connected to a voltage source. Programming is performed by heated electrons from the channel region 18 injecting themselves onto the floating gate 20. Erasing is performed by electrons tunneling from the floating gate 20 to the erase gate 30.
[0084] Table 2 shows the typical voltage and current ranges that can be applied to the terminals of memory cell 310 for performing read, erase, and program operations:
[0085] Table 2: Operation of flash memory cell 310 of Figure 3
[0086] Read: WL/SG 1.0-2V; BL 0.6-2V; CG 0-2.6V; EG 0-2.6V; SL 0V
Erase: WL/SG -0.5V/0V; BL 0V; CG 0V/-8V; EG 8-12V; SL 0V
Program: WL/SG 1V; BL 0.1-1μA; CG 8-11V; EG 4.5-9V; SL 4.5-5V
[0087] Figure 4 shows a three-gate memory cell 410, which is another type of flash memory cell. Memory cell 410 is identical to memory cell 310 of Figure 3, except that memory cell 410 does not have a separate control gate. The erase operation (erasing occurs through the erase gate) and the read operation are similar to those of Figure 3, except that no control gate bias is applied. Programming is also performed without a control gate bias. As a result, a higher voltage must be applied to the source line during programming to compensate for the lack of control gate bias.
[0088] Table 3 shows the typical voltage and current ranges that can be applied to the terminals of memory cell 410 for performing read, erase, and program operations:
[0089] Table 3: Operation of flash memory cell 410 of Figure 4
[0090] Read: WL/SG 0.7-2.2V; BL 0.6-2V; EG 0-2.6V; SL 0V
Erase: WL/SG -0.5V/0V; BL 0V; EG 11.5V; SL 0V
Program: WL/SG 1V; BL 0.2-3μA; EG 4.5V; SL 7-9V
[0091] Figure 5 shows a stacked-gate memory cell 510, which is another type of flash memory cell. Memory cell 510 is similar to memory cell 210 of Figure 2, except that the floating gate 20 extends over the entire channel region 18. The control gate 22 (which here will be coupled to a word line) extends over the floating gate 20, separated by an insulating layer (not shown). The operations are as follows:
- Erasing is performed by FN tunneling of electrons from the floating gate to the substrate.
- Programming is performed by channel hot electron (CHE) injection at the region between the channel region 18 and the drain region 16, with electrons flowing from the source region 14 toward the drain region 16.
- Reading is similar to that of memory cell 210, but with a higher control gate voltage.
[0092] Table 4 shows the typical voltage ranges that can be applied to the terminals of memory cell 510 and substrate 12 to perform read, erase, and program operations:
[0093] Table 4: Operation of flash memory cell 510 of Figure 5
[0094] Read: CG 2-5V; BL 0.6-2V; SL 0V; substrate 0V
Erase: CG -8 to -10V/0V; BL FLT; SL FLT; substrate 8-10V/15-20V
Program: CG 8-12V; BL 3-5V; SL 0V; substrate 0V
(FLT = floating)
[0095] The methods and apparatus described herein can be applied to other non-volatile memory technologies, such as, but not limited to:
- FinFET split-gate flash or stacked-gate flash memory;
- NAND flash;
- SONOS (silicon-oxide-nitride-oxide-silicon, with charge trapped in nitride);
- MONOS (metal-oxide-nitride-oxide-silicon, with charge trapped in nitride);
- ReRAM (resistive RAM);
- PCM (phase-change memory);
- MRAM (magnetic RAM);
- FeRAM (ferroelectric RAM);
- CT (charge-trap) memory;
- CN (carbon nanotube) memory;
- OTP (bi-level or multi-level one-time programmable); and
- CeRAM (correlated electron RAM).
[0096] Two modifications are made so that memory arrays comprising one of the non-volatile memory cell types described above can be used in an artificial neural network. First, the circuitry is configured so that each memory cell can be individually programmed, erased, and read without adversely affecting the memory state of other memory cells in the array, as further explained below. Second, continuous (analog) programming of the memory cells is provided.
[0097] Specifically, the memory state (i.e., the charge on the floating gate) of each memory cell in the array can be continuously changed from a fully erased state to a fully programmed state. This is done independently and with minimal disturbance to other memory cells. In another embodiment, the memory state of each memory cell in the array can be continuously changed from a fully programmed state to a fully erased state, and vice versa, again independently and with minimal disturbance to other memory cells. This means the cell storage is analog, or at least can store one of many discrete values (such as 16 or 64 different values). This allows very precise and individual tuning of all the cells in the memory array, and makes the memory array ideal for storing and fine-tuning the synaptic weights of a neural network.
[0098] Neural networks using non-volatile memory cell arrays
[0099] Figure 6 conceptually illustrates a non-limiting example of a neural network utilizing a non-volatile memory array of the present embodiments. This example uses the non-volatile memory array neural network for a facial recognition application. Any other appropriate application can also be implemented using a neural network based on a non-volatile memory array.
[0100] In this example, S0 is the input layer, which is a 32x32 pixel RGB image with 5-bit precision (i.e., three 32x32 pixel arrays, one for each color R, G, and B, each pixel having 5-bit precision). The synapses CB1 going from input layer S0 to layer C1 apply different sets of weights in some instances and shared weights in others. They scan the input image with 3x3 pixel overlapping filters (kernels), shifting the filter by 1 pixel (or by more than 1 pixel, as dictated by the model).

Specifically, the values of 9 pixels in a 3x3 portion of the image (referred to as a filter or kernel) are provided to the synapses CB1, where these 9 input values are multiplied by the appropriate weights. After the outputs of that multiplication are summed, a single output value is determined and provided by a first synapse of CB1 for generating a pixel of one of the feature maps of layer C1. The 3x3 filter is then shifted one pixel to the right within input layer S0 (i.e., adding the column of three pixels on the right and dropping the column of three pixels on the left). The 9 pixel values in this newly positioned filter are provided to the synapses CB1, where they are multiplied by the same weights, and a second single output value is determined by the associated synapse.

This process continues until the 3x3 filter has scanned across the entire 32x32 pixel image of input layer S0, for all three colors and for all bits (precision values). The process is then repeated using different sets of weights to generate a different feature map of layer C1, until all the feature maps of layer C1 have been calculated.
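The filter scan described above is a standard convolution with a stride of 1 pixel. The Python sketch below computes one feature map. At each filter position, it multiplies the 3x3 window (over all three color channels) by the kernel weights and sums the products, which is the multiply-and-sum the synapses CB1 perform. Summing the three color channels into a single output pixel, and the function name, are assumptions of this sketch. Bit-level precision handling is omitted.

```python
from typing import List

Image = List[List[List[float]]]   # indexed as [channel][row][col]


def conv_feature_map(image: Image, kernel: Image, stride: int = 1) -> List[List[float]]:
    """Scan a kernel over a multi-channel image; each output pixel is the sum of
    the products of the kernel weights and the pixels under the filter."""
    channels = len(image)
    h, w = len(image[0]), len(image[0][0])
    kh, kw = len(kernel[0]), len(kernel[0][0])
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    fmap = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            for ch in range(channels):
                for i in range(kh):
                    for j in range(kw):
                        total += image[ch][r * stride + i][c * stride + j] * kernel[ch][i][j]
            row.append(total)
        fmap.append(row)
    return fmap


# A 3-channel 32x32 input scanned by a 3x3 kernel gives a 30x30 feature map.
img = [[[1.0] * 32 for _ in range(32)] for _ in range(3)]
k = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
fm = conv_feature_map(img, k)
print(len(fm), len(fm[0]))
```

Repeating the call with each of the sixteen kernels would yield the sixteen C1 feature maps.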
[0101] At layer C1, in the present example, there are 16 feature maps, each with 30x30 pixels. Each pixel is a new feature pixel extracted from multiplying the inputs by the kernel, and therefore each feature map is a two-dimensional array. Thus, in this example, layer C1 consists of 16 layers of two-dimensional arrays. The layers and arrays referenced herein are logical relationships, not necessarily physical ones; i.e., the arrays are not necessarily oriented as physical two-dimensional arrays.

Each of the 16 feature maps in layer C1 is generated by one of sixteen different sets of synaptic weights applied to the filter scans. The C1 feature maps could all be directed to different aspects of the same image feature, such as boundary identification. For example:
- the first map (generated using a first weight set, shared for all scans used to generate this first map) could identify circular edges;
- the second map (generated using a second weight set different from the first) could identify rectangular edges or the aspect ratio of certain features;
- and so on.
[0102] An activation function P1 (pooling) is applied before going from layer C1 to layer S1. It pools values from consecutive, non-overlapping 2x2 regions in each feature map. The purpose of the pooling function P1 is to average out nearby locations (a max function can be used instead). This reduces, for example, the dependence on edge location, and it reduces the data size before going to the next stage.

At layer S1, there are 16 15x15 feature maps (i.e., sixteen different arrays of 15x15 pixels each). The synapses CB2 going from layer S1 to layer C2 scan the maps in layer S1 with 4x4 filters, with a filter shift of 1 pixel. At layer C2, there are 22 12x12 feature maps. An activation function P2 (pooling) is applied before going from layer C2 to layer S2; it pools values from consecutive, non-overlapping 2x2 regions in each feature map. At layer S2, there are 22 6x6 feature maps.

An activation function (pooling) is applied at the synapses CB3 going from layer S2 to layer C3, where every neuron in layer C3 connects to every map in layer S2 via a respective synapse of CB3. At layer C3, there are 64 neurons. The synapses CB4 going from layer C3 to the output layer S3 fully connect C3 to S3, i.e., every neuron in layer C3 is connected to every neuron in layer S3. The output at S3 includes 10 neurons, where the neuron with the highest output determines the class. This output could, for example, indicate an identification or classification of the contents of the original image.
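The layer sizes above follow from simple arithmetic. A valid (unpadded) k x k convolution with stride 1 maps an s x s map to (s - k + 1) x (s - k + 1), and non-overlapping 2x2 pooling halves each dimension. The Python sketch below reproduces the 32, 30, 15, 12, 6 progression. It also includes a 2x2 pooling helper that can average or take the maximum, matching the two pooling options mentioned. The helper names are illustrative only.

```python
from typing import List


def conv_out(size: int, kernel: int, stride: int = 1) -> int:
    """Output width of a valid (unpadded) convolution."""
    return (size - kernel) // stride + 1


def pool2x2(fmap: List[List[float]], mode: str = "avg") -> List[List[float]]:
    """Pool consecutive, non-overlapping 2x2 regions (average or max)."""
    out = []
    for r in range(0, len(fmap) - 1, 2):
        row = []
        for c in range(0, len(fmap[0]) - 1, 2):
            window = [fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1]]
            row.append(max(window) if mode == "max" else sum(window) / 4)
        out.append(row)
    return out


s0 = 32
c1 = conv_out(s0, 3)   # 3x3 filters on S0
s1 = c1 // 2           # P1 pooling
c2 = conv_out(s1, 4)   # 4x4 filters on S1
s2 = c2 // 2           # P2 pooling
print(c1, s1, c2, s2)  # 30 15 12 6
```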
[0103] Synapses for each layer are implemented using an array or a portion of an array of non-volatile memory cells.
[0104] Figure 7 is a block diagram of an array that can be used for that purpose. Vector-matrix multiplication (VMM) array 32 includes non-volatile memory cells and is utilized as the synapses between one layer and the next (such as CB1, CB2, CB3, and CB4 in Figure 6). Specifically, VMM array 32 includes:
- a non-volatile memory cell array 33;
- an erase gate and word line gate decoder 34;
- a control gate decoder 35;
- a bit line decoder 36; and
- a source line decoder 37.

The decoders decode the respective inputs for the non-volatile memory cell array 33. Inputs to VMM array 32 can come from the erase gate and word line gate decoder 34 or from the control gate decoder 35. In this example, the source line decoder 37 also decodes the outputs of the non-volatile memory cell array 33. Alternatively, the bit line decoder 36 can decode the outputs of the non-volatile memory cell array 33.
[0105] The non-volatile memory cell array 33 serves two purposes. First, it stores the weights that will be used by VMM array 32. Second, it effectively multiplies the inputs by the weights stored in it and adds the results up per output line (source line or bit line). The result is an output that will be the input to the next layer or to the final layer. Because it performs the multiplication and addition functions itself, the non-volatile memory cell array 33 eliminates the need for separate multiplication and addition logic circuits. It is also highly efficient due to its in-memory computation.
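In idealized form, the multiply-and-add performed in the array means that each output line collects the sum of input-times-weight contributions from the cells attached to it. The Python sketch below models this with plain numbers standing in for cell currents. Treating each cell as an exact multiplier, and the names used, are simplifying assumptions. Real cells are nonlinear and noisy.

```python
from typing import List, Sequence


def vmm_outputs(inputs: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """weights[i][j] is the weight stored in the cell at input row i, output line j.
    Each output line sums the products of its cells' weights and the row inputs."""
    n_out = len(weights[0])
    return [sum(x * row[j] for x, row in zip(inputs, weights)) for j in range(n_out)]


# Two input rows driving three output lines (e.g., source lines).
print(vmm_outputs([1, 2], [[1, 0, 2], [3, 1, 0]]))
```

Each output value (here 1*1 + 2*3 = 7 on the first line) corresponds to the summed current on one source line or bit line.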
[0106] The output of the non-volatile memory cell array 33 is supplied to a differential summer 38, such as a summing operational amplifier or a summing current mirror. The differential summer 38 sums up the outputs of the non-volatile memory cell array 33 to create a single value for that convolution. It is arranged to perform summation of both positive and negative weights.
[0107] The summed output values of the differential summer 38 are then supplied to the activation function circuit 39, which rectifies the output. The activation function circuit 39 can provide a sigmoid, tanh, or ReLU function. The rectified output values of the activation function circuit 39 become elements of a feature map of the next layer (e.g., layer C1 in Figure 6). They are then applied to the next synapse to produce the next feature map layer or the final layer. Therefore, in this example, the non-volatile memory cell array 33 constitutes a plurality of synapses, which receive their inputs from the prior layer of neurons or from an input layer such as an image database. The summing operational amplifier 38 and the activation function circuit 39 constitute a plurality of neurons.
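The differential summer and the activation function together form the neuron. In the Python sketch below, positive and negative weights are assumed to be carried on two separate output lines whose results are subtracted. This pairing, the current units, and the function names are assumptions for illustration. The sigmoid, tanh, and ReLU options come from the text.

```python
import math

# Activation functions named in the text.
ACTIVATIONS = {
    "sigmoid": lambda v: 1.0 / (1.0 + math.exp(-v)),
    "tanh": math.tanh,
    "relu": lambda v: max(0.0, v),
}


def differential_sum(i_pos: float, i_neg: float) -> float:
    """Signed result from a positive-weight line and a negative-weight line (assumed pairing)."""
    return i_pos - i_neg


def neuron(i_pos: float, i_neg: float, activation: str = "relu") -> float:
    """Differential summation followed by the selected activation function."""
    return ACTIVATIONS[activation](differential_sum(i_pos, i_neg))


print(neuron(3.0, 1.0), neuron(1.0, 3.0), neuron(2.0, 2.0, "sigmoid"))
```

With ReLU, a net negative sum is rectified to zero. The sigmoid maps a balanced pair to 0.5.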
[0108] The inputs to VMM array 32 in Figure 7 (WLx, EGx, CGx, and optionally BLx and SLx) can be analog, binary, or digital. In the digital case, a DAC is provided to convert the digital bits into the appropriate analog input levels. Likewise, the output can be analog, binary, or digital. In the digital case, an output ADC is provided to convert the analog output levels into digital bits.
[0109] Figure 8 is a block diagram depicting the use of multiple layers of VMM arrays 32, here labeled VMM arrays 32a, 32b, 32c, 32d, and 32e. As shown in Figure 8, the input, denoted Inputx, is converted from digital to analog by a digital-to-analog converter 31 and provided to input VMM array 32a. The converted analog input can be a voltage or a current. The input D/A conversion for the first layer can be done by using a function or a LUT (lookup table) that maps the inputs Inputx to appropriate analog levels for input VMM array 32a. The input conversion can also be done by an analog-to-analog (A/A) converter that converts an external analog input into a mapped analog input to input VMM array 32a.
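The following Python sketch shows the LUT option for the first-layer D/A conversion: each digital input code indexes a table of analog levels. The 0 to 0.9 V range, the linear spacing, and the function names are hypothetical. A LUT can hold whatever mapping suits the input VMM array; the linear one is chosen here only for simplicity.

```python
from typing import Dict, List, Sequence


def build_linear_lut(n_bits: int, v_min: float, v_max: float) -> Dict[int, float]:
    """Hypothetical LUT mapping every n_bits digital code to an evenly spaced analog level (volts)."""
    n_codes = 1 << n_bits
    step = (v_max - v_min) / (n_codes - 1)
    return {code: v_min + code * step for code in range(n_codes)}


def dac_inputs(codes: Sequence[int], lut: Dict[int, float]) -> List[float]:
    """Convert a vector of digital inputs (Inputx) into analog levels for the input VMM array."""
    return [lut[code] for code in codes]


lut = build_linear_lut(n_bits=2, v_min=0.0, v_max=0.9)
print(dac_inputs([0, 3, 1], lut))
```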
[0110] The output generated by input VMM array 32a is provided as the input to the next VMM array (hidden level 1) 32b. That array in turn generates an output that is provided as the input to the next VMM array (hidden level 2) 32c, and so on. The various layers of VMM arrays 32 function as different layers of synapses and neurons of a convolutional neural network (CNN). Each VMM array 32a, 32b, 32c, 32d, and 32e can be a stand-alone physical non-volatile memory array. Alternatively, multiple VMM arrays can utilize different portions, or overlapping portions, of the same physical non-volatile memory array. The example shown in Figure 8 contains five layers: one input layer (32a), two hidden layers (32b, 32c), and two fully connected layers (32d, 32e). One of ordinary skill in the art will appreciate that this is merely exemplary and that a system could instead comprise more than two hidden layers and more than two fully connected layers.
[0111] Vector-Matrix Multiplication (VMM) Array
[0112] Figure 9 illustrates neuron VMM array 900, which is particularly suited for memory cells 310 as shown in Figure 3 and is used as the synapses and parts of neurons between an input layer and the next layer. VMM array 900 comprises memory array 901 of non-volatile memory cells and reference array 902 of non-volatile reference memory cells (at the top of the array). Alternatively, another reference array can be placed at the bottom.
[0113] In VMM array 900, control gate lines, such as control gate line 903, run in a vertical direction (hence reference array 902, in the row direction, is orthogonal to control gate line 903), and erase gate lines, such as erase gate line 904, run in a horizontal direction. Here, the inputs to VMM array 900 are provided on the control gate lines (CG0, CG1, CG2, CG3), and the outputs of VMM array 900 emerge on the source lines (SL0, SL1). In one embodiment, only even rows are used, and in another embodiment, only odd rows are used. The current placed on each source line (SL0, SL1, respectively) performs a summing function of all the currents from the memory cells connected to that particular source line.
[0114] As described herein with respect to neural networks, the non-volatile memory cells of the VMM array 900 (i.e., memory cells 310 of the VMM array 900) are preferably configured to operate in the subthreshold region.
[0115] The non-volatile reference memory cells and the non-volatile memory cells described herein are biased in weak inversion (the subthreshold region):
[0116] $I_{ds} = I_o \cdot e^{(V_g - V_{th})/(nV_t)} = w \cdot I_o \cdot e^{V_g/(nV_t)}$
[0117] where $w = e^{-V_{th}/(nV_t)}$
[0118] where Ids is the drain-to-source current; Vg is the gate voltage on the memory cell; Vth is the threshold voltage of the memory cell; Vt is the thermal voltage, $V_t = kT/q$, where k is the Boltzmann constant, T is the temperature in kelvin, and q is the electron charge; n is the slope factor, $n = 1 + C_{dep}/C_{ox}$, where Cdep is the capacitance of the depletion layer and Cox is the capacitance of the gate oxide layer; and Io is the memory cell current at a gate voltage equal to the threshold voltage, with Io proportional to $(W_t/L) \cdot u \cdot C_{ox} \cdot (n-1) \cdot V_t^2$, where u is the carrier mobility and Wt and L are the width and length of the memory cell, respectively.
[0119] For I-to-V logarithmic converters that use memory cells (such as reference memory cells or peripheral memory cells) or transistors to convert input current to input voltage:
[0120] $V_g = n \cdot V_t \cdot \ln\left(\frac{I_{ds}}{w_p \cdot I_o}\right)$
[0121] where wp is w of the reference memory cell or the peripheral memory cell.
[0122] For a memory array used as a vector matrix multiplier (VMM) array with current input, the output current is:
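[0123] $I_{out} = w_a \cdot I_o \cdot e^{V_g/(nV_t)}$; substituting Vg from the logarithmic converter, with Ids equal to the input current Iin, gives
[0124] $I_{out} = (w_a/w_p) \cdot I_{in} = W \cdot I_{in}$
[0125] $W = e^{(V_{thp} - V_{tha})/(nV_t)}$
[0126] where wa is w of each memory cell in the memory array.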
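The derivation can be checked numerically. The sketch converts an input current into a gate voltage with the logarithmic converter equation Vg = n·Vt·ln(Iin/(wp·Io)), applies that voltage to an array cell using Iout = wa·Io·e^(Vg/(nVt)), and compares the result with W·Iin, where W = e^((Vthp − Vtha)/(nVt)). The parameter values (n = 1.5, T = 300 K, Io = 1 µA, and the two threshold voltages) are illustrative assumptions, not values taken from this description.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def thermal_voltage(temp_k: float) -> float:
    """Vt = k*T/q."""
    return K_B * temp_k / Q_E


def w_factor(vth: float, n: float, vt: float) -> float:
    """w = e^(-Vth/(n*Vt))."""
    return math.exp(-vth / (n * vt))


def log_converter(i_in: float, wp: float, io: float, n: float, vt: float) -> float:
    """I-to-V logarithmic converter: Vg = n*Vt*ln(Iin/(wp*Io))."""
    return n * vt * math.log(i_in / (wp * io))


def cell_current(vg: float, wa: float, io: float, n: float, vt: float) -> float:
    """Subthreshold array cell: Iout = wa*Io*e^(Vg/(n*Vt))."""
    return wa * io * math.exp(vg / (n * vt))


# Illustrative (assumed) parameters.
n, io = 1.5, 1e-6
vt = thermal_voltage(300.0)
vthp, vtha = 0.8, 0.9  # peripheral and array-cell effective thresholds, V
wp, wa = w_factor(vthp, n, vt), w_factor(vtha, n, vt)

i_in = 50e-9  # 50 nA input current
vg = log_converter(i_in, wp, io, n, vt)
i_out = cell_current(vg, wa, io, n, vt)
weight = math.exp((vthp - vtha) / (n * vt))
print(f"Vt={vt * 1e3:.2f} mV, Vg={vg:.3f} V, W={weight:.4f}, Iout={i_out * 1e9:.3f} nA")
```

Because the exponential in the array cell undoes the logarithm in the converter, Io and Vg cancel, and the output depends only on Iin and the threshold-voltage difference.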
[0127] Vthp is the effective threshold voltage of the peripheral memory cell, and Vtha is the effective threshold voltage of the main (data) memory cell. Note that the threshold voltage of a transistor is a function of the substrate body bias voltage, denoted Vsb, which can be modulated to compensate for various conditions, such as temperature. The threshold voltage Vth can be expressed as:
[0128] $V_{th} = V_{th0} + \gamma \left( \sqrt{|2\phi_F + V_{sb}|} - \sqrt{|2\phi_F|} \right)$
[0129] where Vth0 is the threshold voltage with zero substrate bias, φF is the surface potential, and γ is the body effect parameter.
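The dependence of the threshold voltage on Vsb is commonly modeled with the standard body-effect expression Vth = Vth0 + γ(√|2φF + Vsb| − √|2φF|). The sketch below evaluates that expression for a few Vsb values to show how modulating the body bias shifts Vth and therefore, through the subthreshold equations, the effective weight. The values of Vth0, γ, and φF are illustrative assumptions only.

```python
import math


def threshold_voltage(vsb: float, vth0: float, gamma: float, phi_f: float) -> float:
    """Body-effect model: Vth = Vth0 + gamma*(sqrt(|2*phiF + Vsb|) - sqrt(|2*phiF|))."""
    return vth0 + gamma * (math.sqrt(abs(2 * phi_f + vsb)) - math.sqrt(abs(2 * phi_f)))


# Assumed, illustrative device parameters (volts, and sqrt(V) for gamma).
VTH0, GAMMA, PHI_F = 0.7, 0.4, 0.35

for vsb in (0.0, 0.2, 0.5):
    print(f"Vsb={vsb:.1f} V -> Vth={threshold_voltage(vsb, VTH0, GAMMA, PHI_F):.4f} V")
```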
[0130] Word lines or control gates can be used as inputs to memory cells that accept input voltages.
[0131] Alternatively, the flash memory cells of the VMM array described herein can be configured to operate in a linear region:
[0132] $I_{ds} = \beta \cdot (V_{gs} - V_{th}) \cdot V_{ds}$, where $\beta = u \cdot C_{ox} \cdot W_t / L$
[0133] $W \propto (V_{gs} - V_{th})$
[0134] This means that the weight W in the linear region is proportional to (Vgs - Vth).
[0135] A word line, control gate, bit line, or source line can be used as the input of a memory cell operating in the linear region. A bit line or source line can be used as the output of the memory cell.
[0136] For an I-to-V linear converter, a memory cell (such as a reference memory cell or a peripheral memory cell) or a transistor operating in the linear region can be used to linearly convert an input/output current into an input/output voltage.
[0137] Alternatively, the memory cells of the VMM array described herein can be configured to operate in a saturation region:
[0138] $I_{ds} = \frac{1}{2} \cdot \beta \cdot (V_{gs} - V_{th})^2$, where $\beta = u \cdot C_{ox} \cdot W_t / L$
[0139] $W \propto (V_{gs} - V_{th})^2$, meaning that the weight W is proportional to (Vgs - Vth)².
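The linear-region expression Ids = β(Vgs − Vth)Vds and the saturation-region expression Ids = ½β(Vgs − Vth)² give different relationships between the programmed threshold and the cell current. The sketch below evaluates both at a fixed Vgs for two threshold voltages, showing that halving (Vgs − Vth) halves the linear-region current but quarters the saturation-region current. The values of β, Vgs, Vds, and Vth are illustrative assumptions.

```python
def ids_linear(vgs: float, vth: float, vds: float, beta: float) -> float:
    """Linear region: Ids = beta*(Vgs - Vth)*Vds."""
    return beta * (vgs - vth) * vds


def ids_saturation(vgs: float, vth: float, beta: float) -> float:
    """Saturation region: Ids = 1/2*beta*(Vgs - Vth)^2."""
    return 0.5 * beta * (vgs - vth) ** 2


# Assumed values: beta in A/V^2, voltages in volts.
BETA, VGS, VDS = 1e-4, 2.0, 0.1

for vth in (1.0, 1.5):
    lin = ids_linear(VGS, vth, VDS, BETA)
    sat = ids_saturation(VGS, vth, BETA)
    print(f"Vth={vth} V: linear Ids={lin:.3e} A, saturation Ids={sat:.3e} A")
```

In the linear region Ids is also proportional to Vds, which is why a bit line or source line voltage can serve as the input multiplied by the stored weight.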
[0140] Word lines, control gates, or erase gates can be used as inputs to memory cells operating in saturation regions. Bit lines or source lines can be used as outputs of output neurons.
[0141] Alternatively, the memory cells of the VMM arrays described herein can operate in any region or combination of regions (subthreshold, linear, or saturation) for each layer or for multiple layers of a neural network.
[0142] Other embodiments of VMM array 32 of Figure 7 are described in U.S. Patent Application No. 15/826,345, which is incorporated herein by reference. As described herein, source lines or bit lines can be used as the neuron outputs (current summation outputs).
[0143] Figure 10 illustrates neuron VMM array 1000, which is particularly suited for memory cells 210 as shown in Figure 2 and is used as the synapses between an input layer and the next layer. VMM array 1000 comprises memory array 1003 of non-volatile memory cells, reference array 1001 of first non-volatile reference memory cells, and reference array 1002 of second non-volatile reference memory cells. Reference arrays 1001 and 1002, arranged in the column direction of the array, serve to convert current inputs flowing into terminals BLR0, BLR1, BLR2, and BLR3 into voltage inputs WL0, WL1, WL2, and WL3. In effect, the first and second non-volatile reference memory cells are diode-connected through multiplexers 1014 (only partially shown), with the current inputs flowing into them. The reference cells are tuned (e.g., programmed) to target reference levels. The target reference levels are provided by a reference microarray matrix (not shown).
[0144] Memory array 1003 serves two purposes. First, it stores the weights that will be used by VMM array 1000 in its respective memory cells. Second, memory array 1003 effectively multiplies the inputs (i.e., the current inputs provided in terminals BLR0, BLR1, BLR2, and BLR3, which reference arrays 1001 and 1002 convert into input voltages supplied to word lines WL0, WL1, WL2, and WL3) by the weights stored in memory array 1003 and then adds all the results (memory cell currents) to produce an output on the respective bit lines (BL0-BLN), which will be the input to the next layer or the input to the final layer. By performing the multiplication and addition functions, memory array 1003 eliminates the need for separate multiplication and addition logic circuits and is also computationally and energy efficient. Here, the voltage inputs are provided on the word lines (WL0, WL1, WL2, and WL3), and the outputs emerge on the respective bit lines (BL0-BLN) during a read (inference) operation. The current placed on each of bit lines BL0-BLN performs a summing function of the currents from all non-volatile memory cells connected to that particular bit line.
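At the level of currents, the multiply-and-accumulate behavior of memory array 1003 can be sketched as follows, assuming ideal subthreshold cells so that each cell contributes W·Iin to its bit line (the relationship derived in [0124]). Rows stand for word lines WL0-WL3 driven from BLR0-BLR3, and columns stand for bit lines. The weight matrix, input currents, and the function name are hypothetical, and the model ignores non-idealities such as leakage and bit line voltage variation.

```python
def bitline_currents(i_in: list[float], weights: list[list[float]]) -> list[float]:
    """Ideal VMM read: rows = word lines (inputs), columns = bit lines (outputs).

    Each cell contributes weights[row][col] * i_in[row]; each bit line sums
    the currents of all cells connected to it.
    """
    n_bl = len(weights[0])
    totals = [0.0] * n_bl
    for row, current in enumerate(i_in):
        for col in range(n_bl):
            totals[col] += weights[row][col] * current
    return totals


# Hypothetical input currents (A) on BLR0-BLR3 and a 4x2 weight matrix.
i_in = [10e-9, 20e-9, 0.0, 5e-9]
weights = [
    [1.0, 0.5],
    [0.25, 0.0],
    [2.0, 1.0],
    [0.0, 4.0],
]
print([f"{i * 1e9:.2f} nA" for i in bitline_currents(i_in, weights)])
```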
[0145] Table 5 shows the operating voltages and currents for the VMM array 1000. The columns in the table indicate the voltages applied to the word lines for selected cells, word lines for unselected cells, bit lines for selected cells, bit lines for unselected cells, source lines for selected cells, and source lines for unselected cells. The rows indicate read, erase, and program operations.
[0146] Table 5: Operation of VMM array 1000 of Figure 10
[0147]
| Operation | WL | WL (unselected) | BL | BL (unselected) | SL | SL (unselected) |
|---|---|---|---|---|---|---|
| Read | 1-3.5 V | -0.5 V/0 V | 0.6-2 V (Ineuron) | 0.6-2 V/0 V | 0 V | 0 V |
| Erase | Approx. 5-13 V | 0 V | 0 V | 0 V | 0 V | 0 V |
| Program | 1-2 V | -0.5 V/0 V | 0.1-3 µA | Vinh approx. 2.5 V | 4-10 V | 0-1 V/FLT |
[0148] Figure 11 illustrates neuron VMM array 1100, which is particularly suited for memory cells 210 as shown in Figure 2 and is used as the synapses and parts of neurons between an input layer and the next layer. VMM array 1100 comprises memory array 1103 of non-volatile memory cells, reference array 1101 of first non-volatile reference memory cells, and reference array 1102 of second non-volatile reference memory cells. Reference arrays 1101 and 1102 run in the row direction of VMM array 1100. VMM array 1100 is similar to VMM array 1000, except that in VMM array 1100 the word lines run in the vertical direction. Here, the inputs are provided on the word lines (WLA0, WLB0, WLA1, WLB1, WLA2, WLB2, WLA3, WLB3), and the outputs emerge on the source lines (SL0, SL1) during a read operation. The current placed on each source line performs a summing function of all the currents from the memory cells connected to that particular source line.
[0149] Table 6 shows the operating voltages and currents for the VMM array 1100. The columns in the table indicate the voltages applied to the word lines for selected cells, word lines for unselected cells, bit lines for selected cells, bit lines for unselected cells, source lines for selected cells, and source lines for unselected cells. The rows indicate read, erase, and program operations.
[0150] Table 6: Operation of VMM array 1100 of Figure 11
[0151]
[0152] Figure 12 illustrates neuron VMM array 1200, which is particularly suited for memory cells 310 as shown in Figure 3 and is used as the synapses and parts of neurons between an input layer and the next layer. VMM array 1200 comprises memory array 1203 of non-volatile memory cells, reference array 1201 of first non-volatile reference memory cells, and reference array 1202 of second non-volatile reference memory cells. Reference arrays 1201 and 1202 serve to convert current inputs flowing into terminals BLR0, BLR1, BLR2, and BLR3 into voltage inputs CG0, CG1, CG2, and CG3. In effect, the first and second non-volatile reference memory cells are diode-connected through multiplexers 1212 (only partially shown), with the current inputs flowing into them through BLR0, BLR1, BLR2, and BLR3. Multiplexers 1212 each include a respective multiplexer 1205 and a cascode transistor 1204 to ensure a constant voltage on the bit line (such as BLR0) of each of the first and second non-volatile reference memory cells during a read operation. The reference cells are tuned to target reference levels.
[0153] Memory array 1203 serves two purposes. First, it stores the weights that will be used by VMM array 1200. Second, memory array 1203 effectively multiplies the inputs (the current inputs provided to terminals BLR0, BLR1, BLR2, and BLR3, which reference arrays 1201 and 1202 convert into input voltages supplied to the control gates CG0, CG1, CG2, and CG3) by the weights stored in the memory array and then adds all the results (cell currents) to produce an output, which appears on BL0-BLN and will be the input to the next layer or the input to the final layer. By performing the multiplication and addition functions, the memory array eliminates the need for separate multiplication and addition logic circuits and is also computationally and energy efficient. Here, the inputs are provided on the control gate lines (CG0, CG1, CG2, and CG3), and the outputs emerge on the bit lines (BL0-BLN) during a read operation. The current placed on each bit line performs a summing function of all the currents from the memory cells connected to that particular bit line.
[0154] VMM array 1200 implements unidirectional tuning for the non-volatile memory cells in memory array 1203. That is, each non-volatile memory cell is erased and then partially programmed until the desired charge is reached on the floating gate. If too much charge is placed on the floating gate (causing an incorrect value to be stored in the cell), the cell is erased, and the sequence of partial programming operations restarts. As shown, two rows sharing the same erase gate (such as EG0 or EG1) are erased together (this is called page erase), and thereafter, each cell is partially programmed until the desired charge is reached on the floating gate.
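The unidirectional tuning sequence can be sketched on a single simulated cell. In the sketch, the read current starts at an erased level, each partial-programming pulse lowers it by a variable amount, and an overshoot (current below the target minus a tolerance) triggers an erase and a restart of the partial programming sequence. The cell model, pulse variation, tolerance, and the names `SimCell` and `tune_unidirectional` are hypothetical; page erase of two rows sharing an erase gate is not modeled.

```python
import random


class SimCell:
    """Hypothetical cell model: each partial-program pulse lowers the read current."""

    def __init__(self, erased_current: float, rng: random.Random) -> None:
        self.erased_current = erased_current
        self.current = erased_current
        self.rng = rng

    def erase(self) -> None:
        self.current = self.erased_current

    def partial_program(self, nominal_step: float) -> None:
        # Pulse-to-pulse variation: the actual step is 50-150% of nominal.
        self.current -= nominal_step * self.rng.uniform(0.5, 1.5)


def tune_unidirectional(cell: SimCell, target: float, tol: float,
                        step: float, max_restarts: int = 100) -> int:
    """Erase, then partially program until the current is within target +/- tol.

    If a pulse overshoots (current below target - tol), the cell is erased and
    the sequence of partial programming operations restarts. Returns the
    number of restarts that were needed.
    """
    for restarts in range(max_restarts + 1):
        cell.erase()
        while cell.current > target + tol:
            cell.partial_program(step)
        if cell.current >= target - tol:
            return restarts
    raise RuntimeError("target not reached within max_restarts")


cell = SimCell(erased_current=30e-6, rng=random.Random(0))
restarts = tune_unidirectional(cell, target=5e-6, tol=0.5e-6, step=1e-6)
print(f"I = {cell.current * 1e6:.3f} uA after {restarts} restart(s)")
```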
[0155] Table 7 shows the operating voltages and currents for VMM array 1200. The columns in the table indicate the voltages placed on the word line for the selected cell, the word lines for unselected cells, the bit line for the selected cell, the bit lines for unselected cells, the control gate for the selected cell, the control gate for unselected cells in the same sector as the selected cell, the control gate for unselected cells in a different sector than the selected cell, the erase gate for the selected cell, the erase gate for unselected cells, the source line for the selected cell, and the source line for unselected cells. The rows indicate read, erase, and program operations.
[0156] Table 7: Operation of VMM array 1200 of Figure 12
[0157]
[0158] Figure 13 illustrates neuron VMM array 1300, which is particularly suited for memory cells 310 as shown in Figure 3 and is used as the synapses and parts of neurons between an input layer and the next layer. VMM array 1300 comprises memory array 1303 of non-volatile memory cells, reference array 1301 of first non-volatile reference memory cells, and reference array 1302 of second non-volatile reference memory cells. EG lines EGR0, EG0, EG1, and EGR1 run vertically, while CG lines CG0, CG1, CG2, and CG3 and SL lines WL0, WL1, WL2, and WL3 run horizontally. VMM array 1300 is similar to VMM array 1200, except that VMM array 1300 implements bidirectional tuning: because separate EG lines are used, each individual cell can be completely erased, partially programmed, and partially erased as needed to reach the desired amount of charge on the floating gate. As shown, reference arrays 1301 and 1302 convert the input currents in terminals BLR0, BLR1, BLR2, and BLR3 into control gate voltages CG0, CG1, CG2, and CG3 (through the action of diode-connected reference cells via multiplexers 1314) to be applied to the memory cells in the row direction. The current outputs (neurons) are on bit lines BL0-BLN, where each bit line sums all currents from the non-volatile memory cells connected to that particular bit line.
[0159] Table 8 shows the operating voltages and currents for VMM array 1300. The columns in the table indicate the voltages placed on the word line for the selected cell, the word lines for unselected cells, the bit line for the selected cell, the bit lines for unselected cells, the control gate for the selected cell, the control gate for unselected cells in the same sector as the selected cell, the control gate for unselected cells in a different sector than the selected cell, the erase gate for the selected cell, the erase gate for unselected cells, the source line for the selected cell, and the source line for unselected cells. The rows indicate read, erase, and program operations.
[0160] Table 8: Operation of VMM array 1300 of Figure 13
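With individual EG lines, a cell that overshoots its target can be partially erased instead of fully erased and restarted. The sketch below tunes a simulated read current with partial-program pulses (which lower it) and partial-erase pulses (which raise it), starting from a completely erased cell. The pulse model, the rule of halving the pulse size whenever the direction reverses, and the function name are assumptions made so that this illustration converges; they are not the tuning algorithm of this application.

```python
import random


def tune_bidirectional(current: float, target: float, tol: float, step: float,
                       rng: random.Random, max_pulses: int = 1000) -> tuple[float, int]:
    """Tune a simulated cell current with partial program and partial erase pulses.

    A program pulse lowers the read current and an erase pulse raises it.
    Assumption for this sketch: the nominal pulse size is halved whenever the
    tuning direction reverses, so the loop settles instead of oscillating.
    Returns (final_current, pulses_used).
    """
    direction = 0  # -1 = partial program, +1 = partial erase, 0 = not started
    for pulses in range(max_pulses + 1):
        if abs(current - target) <= tol:
            return current, pulses
        new_direction = -1 if current > target else 1
        if direction != 0 and new_direction != direction:
            step /= 2  # direction reversed: use a finer pulse
        direction = new_direction
        current += direction * step * rng.uniform(0.8, 1.2)
    raise RuntimeError("target not reached within max_pulses")


# Start from a completely erased cell (assumed 30 uA) and tune toward 5 uA.
final, pulses = tune_bidirectional(30e-6, target=5e-6, tol=0.1e-6,
                                   step=1e-6, rng=random.Random(1))
print(f"I = {final * 1e6:.3f} uA after {pulses} pulses")
```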
[0161]
[0162] Improved VMM system with page- or word-based tuning
[0163] Figure 40 shows VMM array 4000. VMM array 4000 enables unidirectional or bidirectional tuning of pages of non-volatile memory cells. Here, exemplary page 4001 comprises two words, each in a different row. A word comprises multiple memory cells (e.g., 8-64), although a particular word can include only one cell or a few cells. Pairs of adjacent rows share a source line, such as SL0 or SL1. All cells in page 4001 share a common erase gate line controlled by erase gate enable transistor 4002, which controls the supply of voltage to erase gate terminal EGW of all cells in page 4001. Here, all cells in page 4001 can be erased at the same time. Thereafter, cells in page 4001 can be tuned unidirectionally or bidirectionally by programming (cellwise, meaning each cell in the word can be tuned one at a time, or wordwise, meaning all cells in the word can be tuned simultaneously) and by erasing (wordwise, meaning all cells in the word can be tuned simultaneously). Some cells in page 4001 can also be tuned unidirectionally via programming. The programming operations can include the precise programming techniques described below with respect to Figures 24 to 26. If too much electron charge is placed on a floating gate (causing an incorrect current value to be stored in the cell, i.e., a current value lower than the intended current value), the cell must be erased and the sequence of partial programming operations must be restarted.
[0164] Figure 41 shows VMM array 4100. VMM array 4100 enables unidirectional or bidirectional tuning of words of non-volatile memory cells. Here, exemplary word 4101 comprises multiple cells in a row. All cells in word 4101 share a common erase gate line controlled by erase gate enable transistor 4102, which controls the supply of voltage to the erase gate terminals of all cells in word 4101. Here, all cells in word 4101 can be erased at the same time. Thereafter, cells in word 4101 can be tuned unidirectionally or bidirectionally by programming (cellwise, meaning each cell in the word can be tuned one at a time, or wordwise, meaning all cells in the word can be tuned simultaneously) and by erasing (wordwise, meaning all cells in the word can be tuned simultaneously). The programming operations can include the precise programming techniques described below. If too much electron charge is placed on a floating gate (causing an incorrect current value to be stored in the cell, i.e., a current value lower than the intended current value), the cell must be erased and the sequence of partial programming operations must be restarted.
[0165] Figure 42 shows VMM array 4200. VMM array 4200 enables unidirectional or bidirectional tuning of words of non-volatile memory cells. Here, exemplary word 4201 comprises two half-words of cells. Each half-word belongs to a row sharing an erase gate. All cells in word 4201 share a common erase gate line connected to erase gate terminal EGW. Unlike VMM arrays 4000 and 4100, VMM array 4200 has no erase gate enable transistor. Here, all cells in word 4201 can be erased at the same time. Thereafter, cells in word 4201 can be tuned unidirectionally or bidirectionally by programming (cellwise, meaning each cell in the word can be tuned one at a time, or wordwise, meaning all cells in the word can be tuned simultaneously) and by erasing (wordwise, meaning all cells in the word can be tuned simultaneously). The programming operations can include the precise programming techniques described below. If too much electron charge is placed on a floating gate (causing an incorrect current value to be stored in the cell, i.e., a current value lower than the intended current value), the cell must be erased and the sequence of partial programming operations must be restarted.
[0166] Long Short-Term Memory
[0167] The prior art includes a concept known as long short-term memory (LSTM). LSTM units are often used in neural networks. LSTM allows a neural network to remember information over predetermined arbitrary time intervals and to use that information in subsequent operations. A conventional LSTM unit comprises a cell, an input gate, an output gate, and a forget gate. The three gates regulate the flow of information into and out of the cell and the time interval over which the information is remembered in the LSTM. VMMs are particularly useful in LSTM units.
[0168] Figure 14 depicts exemplary LSTM 1400. LSTM 1400 in this example comprises cells 1401, 1402, 1403, and 1404. Cell 1401 receives input vector x0 and generates output vector h0 and cell state vector c0. Cell 1402 receives input vector x1, the output vector (hidden state) h0 from cell 1401, and cell state c0 from cell 1401, and generates output vector h1 and cell state vector c1. Cell 1403 receives input vector x2, the output vector (hidden state) h1 from cell 1402, and cell state c1 from cell 1402, and generates output vector h2 and cell state vector c2. Cell 1404 receives input vector x3, the output vector (hidden state) h2 from cell 1403, and cell state c2 from cell 1403, and generates output vector h3. Additional cells can be used; an LSTM with four cells is merely an example.
[0169] Figure 15 depicts an exemplary implementation of LSTM unit 1500, which can be used for cells 1401, 1402, 1403, and 1404 in Figure 14. LSTM unit 1500 receives input vector x(t), cell state vector c(t-1) from a preceding cell, and output vector h(t-1) from a preceding cell, and generates cell state vector c(t) and output vector h(t).
[0170] LSTM unit 1500 includes sigmoid function devices 1501, 1502, and 1503, each applying a number between 0 and 1 to control the amount of each component in the input vector allowed to pass through to the output vector. LSTM unit 1500 also includes tanh devices 1504 and 1505 for applying a hyperbolic tangent function to the input vector, multiplier devices 1506, 1507, and 1508 for multiplying two vectors together, and adder device 1509 for adding two vectors together. The output vector h(t) can be provided to the next LSTM unit in the system, or it can be accessed for other purposes.
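The data flow through the sigmoid devices, tanh devices, multipliers, and adder corresponds to the standard LSTM update c(t) = f(t)·c(t-1) + i(t)·u(t) and h(t) = o(t)·tanh(c(t)), matching the intermediate values f(t)*c(t-1), i(t)*u(t), and o(t)*c~(t) named for LSTM unit 1700. The sketch below uses that conventional formulation, with each gate's matrix product standing in for a VMM array. Which sigmoid device computes which gate, the absence of bias terms, and the weights are assumptions of this illustration.

```python
import math

Vec = list[float]
Mat = list[Vec]


def matvec(w: Mat, v: Vec) -> Vec:
    """Stand-in for one VMM array: returns the matrix-vector product w @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def sigmoid(v: Vec) -> Vec:
    return [1.0 / (1.0 + math.exp(-e)) for e in v]


def tanh(v: Vec) -> Vec:
    return [math.tanh(e) for e in v]


def lstm_step(x: Vec, h_prev: Vec, c_prev: Vec,
              wf: Mat, wi: Mat, wu: Mat, wo: Mat) -> tuple[Vec, Vec]:
    """One LSTM update; each gate's weights act on [h(t-1), x(t)] (no biases)."""
    z = h_prev + x                # concatenation of h(t-1) and x(t)
    f = sigmoid(matvec(wf, z))    # forget gate f(t)
    i = sigmoid(matvec(wi, z))    # input gate i(t)
    u = tanh(matvec(wu, z))       # candidate u(t)
    o = sigmoid(matvec(wo, z))    # output gate o(t)
    c = [fk * ck + ik * uk for fk, ck, ik, uk in zip(f, c_prev, i, u)]  # f*c(t-1) + i*u
    h = [ok * tk for ok, tk in zip(o, tanh(c))]                         # o * tanh(c(t))
    return h, c


w = [[0.5, 0.5]]  # hypothetical 1x2 weights, shared by all gates for brevity
h, c = [0.0], [0.0]
for x_t in ([1.0], [0.5], [-1.0]):
    h, c = lstm_step(x_t, h, c, w, w, w, w)
    print(f"h={h[0]:.4f}  c={c[0]:.4f}")
```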
[0171] Figure 16 depicts LSTM unit 1600, which is an example of an implementation of LSTM unit 1500. For the reader's convenience, the same numbering from LSTM unit 1500 is used in LSTM unit 1600. Sigmoid function devices 1501, 1502, and 1503 and tanh device 1504 each comprise multiple VMM arrays 1601 and activation circuit blocks 1602. Thus, it can be seen that VMM arrays are particularly useful in LSTM units used in certain neural network systems.
[0172] An alternative to LSTM unit 1600 (and another example of an implementation of LSTM unit 1500) is shown in Figure 17. In Figure 17, sigmoid function devices 1501, 1502, and 1503 and tanh device 1504 share the same physical hardware (VMM arrays 1701 and activation function block 1702) in a time-multiplexed fashion. LSTM unit 1700 also comprises multiplier device 1703 to multiply two vectors together, adder device 1708 to add two vectors together, tanh device 1505 (which comprises activation circuit block 1702), register 1707 to store the value i(t) when it is output from sigmoid function block 1702, register 1704 to store the value f(t)*c(t-1) when that value is output from multiplier device 1703 through multiplexer 1710, register 1705 to store the value i(t)*u(t) when that value is output from multiplier device 1703 through multiplexer 1710, register 1706 to store the value o(t)*c~(t) when that value is output from multiplier device 1703 through multiplexer 1710, and multiplexer 1709.
[0173] Whereas LSTM unit 1600 contains multiple sets of VMM arrays 1601 and respective activation function blocks 1602, LSTM unit 1700 contains only one set of VMM arrays 1701 and activation function block 1702, which are used to represent multiple layers in the embodiment of LSTM unit 1700. LSTM unit 1700 requires less space than LSTM unit 1600, because LSTM unit 1700 needs only one quarter as much space for VMMs and activation function blocks as LSTM unit 1600.
[0174] It can further be appreciated that LSTM units typically comprise multiple VMM arrays, each of which requires functionality provided by certain circuit blocks outside of the VMM arrays themselves, such as a summer and activation circuit block and high-voltage generation blocks. Providing separate circuit blocks for each VMM array would require a significant amount of space within the semiconductor device and would be somewhat inefficient. The embodiments described below therefore attempt to minimize the circuitry required outside of the VMM arrays themselves.
[0175] Gated Recurrent Unit
[0176] An analog VMM implementation can be used in a gated recurrent unit (GRU) system. A GRU is a gating mechanism in recurrent neural networks. A GRU is similar to an LSTM, except that GRU units generally contain fewer components than an LSTM unit.
[0177] Figure 18 depicts exemplary GRU 1800. GRU 1800 in this example comprises units 1801, 1802, 1803, and 1804. Unit 1801 receives input vector x0 and generates output vector h0. Unit 1802 receives input vector x1 and output vector h0 from unit 1801 and generates output vector h1. Unit 1803 receives input vector x2 and the output vector (hidden state) h1 from unit 1802 and generates output vector h2. Unit 1804 receives input vector x3 and the output vector (hidden state) h2 from unit 1803 and generates output vector h3. Additional units can be used; a GRU with four units is merely an example.
[0178] Figure 19 depicts an exemplary implementation of GRU unit 1900, which can be used for units 1801, 1802, 1803, and 1804 of Figure 18. GRU unit 1900 receives input vector x(t) and output vector h(t-1) from a preceding GRU unit and generates output vector h(t). GRU unit 1900 comprises sigmoid function devices 1901 and 1902, each of which applies a number between 0 and 1 to components from output vector h(t-1) and input vector x(t). GRU unit 1900 also comprises tanh device 1903 to apply a hyperbolic tangent function to an input vector, multiple multiplier devices 1904, 1905, and 1906 to multiply two vectors together, adder device 1907 to add two vectors together, and complementary device 1908 to subtract an input from 1 to generate an output.
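The GRU data flow can be sketched with the conventional update h(t) = z(t)·h(t-1) + (1 − z(t))·h^(t), which matches the intermediate values h(t-1)*r(t), h(t-1)*z(t), and h^(t)*(1-z(t)) named for GRU unit 2100, with the complementary device supplying 1 − z(t). Each matrix product stands in for a VMM array. The gate names, the absence of biases, and the weights are assumptions of this illustration rather than the circuit's exact arithmetic.

```python
import math

Vec = list[float]
Mat = list[Vec]


def matvec(w: Mat, v: Vec) -> Vec:
    """Stand-in for one VMM array: returns w @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def sigmoid(v: Vec) -> Vec:
    return [1.0 / (1.0 + math.exp(-e)) for e in v]


def gru_step(x: Vec, h_prev: Vec, wr: Mat, wz: Mat, wh: Mat) -> Vec:
    """One GRU update, h(t) = z*h(t-1) + (1-z)*h^(t); weights act on [h, x]."""
    r = sigmoid(matvec(wr, h_prev + x))                  # reset gate r(t)
    z = sigmoid(matvec(wz, h_prev + x))                  # update gate z(t)
    rh = [rk * hk for rk, hk in zip(r, h_prev)]          # h(t-1)*r(t)
    h_cand = [math.tanh(e) for e in matvec(wh, rh + x)]  # candidate h^(t)
    keep = [zk * hk for zk, hk in zip(z, h_prev)]        # h(t-1)*z(t)
    new = [(1 - zk) * ck for zk, ck in zip(z, h_cand)]   # h^(t)*(1-z(t)), via complement
    return [a + b for a, b in zip(keep, new)]


w = [[0.5, 0.5]]  # hypothetical 1x2 weights, reused for all gates
h = [0.0]
for x_t in ([1.0], [0.5], [-1.0]):
    h = gru_step(x_t, h, w, w, w)
    print(f"h={h[0]:.4f}")
```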
[0179] Figure 20 depicts GRU unit 2000, which is an example of an implementation of GRU unit 1900. For the reader's convenience, the same numbering from GRU unit 1900 is used in GRU unit 2000. As shown in Figure 20, sigmoid function devices 1901 and 1902 and tanh device 1903 each comprise multiple VMM arrays 2001 and activation function blocks 2002. Thus, it can be seen that VMM arrays are particularly useful in GRU units used in certain neural network systems.
[0180] An alternative to GRU unit 2000 (and another example of an implementation of GRU unit 1900) is shown in Figure 21. In Figure 21, GRU unit 2100 uses VMM array 2101 and activation function block 2102, which, when configured as a sigmoid function, applies a number between 0 and 1 to control how much of each component in the input vector is allowed through to the output vector. In Figure 21, sigmoid function devices 1901 and 1902 and tanh device 1903 share the same physical hardware (VMM array 2101 and activation function block 2102) in a time-multiplexed fashion. GRU unit 2100 also comprises multiplier device 2103 to multiply two vectors together, adder device 2105 to add two vectors together, complementary device 2109 to subtract an input from 1 to generate an output, multiplexer 2104, register 2106 to hold the value h(t-1)*r(t) when that value is output from multiplier device 2103 through multiplexer 2104, register 2107 to hold the value h(t-1)*z(t) when that value is output from multiplier device 2103 through multiplexer 2104, and register 2108 to hold the value h^(t)*(1-z(t)) when that value is output from multiplier device 2103 through multiplexer 2104.
[0181] Whereas GRU unit 2000 contains multiple sets of VMM arrays 2001 and activation function blocks 2002, GRU unit 2100 contains only one set of VMM arrays 2101 and activation function block 2102, which are used to represent multiple layers in the embodiment of GRU unit 2100. GRU unit 2100 requires less space than GRU unit 2000, because GRU unit 2100 needs only one third as much space for VMMs and activation function blocks as GRU unit 2000.
[0182] It can further be appreciated that GRU systems typically comprise multiple VMM arrays, each of which requires functionality provided by certain circuit blocks outside of the VMM arrays themselves, such as a summer and activation circuit block and high-voltage generation blocks. Providing separate circuit blocks for each VMM array would require a significant amount of space within the semiconductor device and would be somewhat inefficient. The embodiments described below therefore attempt to minimize the circuitry required outside of the VMM arrays themselves.
[0183] The inputs to a VMM array can be analog levels, binary levels, pulses, time-modulated pulses, or digital bits (in which case a DAC is needed to convert the digital bits into the appropriate input analog level), and the outputs can be analog levels, binary levels, timed pulses, pulses, or digital bits (in which case an output ADC is needed to convert the output analog level into digital bits).
[0184] For each memory cell in a VMM array, each weight W can be implemented by a single memory cell, by a differential cell, or by a hybrid memory cell (the average of two cells). In the differential case, two memory cells are needed to implement weight W as a differential weight (W = W+ - W-). In the hybrid case, two memory cells are needed to implement weight W as the average of the two cells.
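The two-cell representations can be illustrated with plain arithmetic. The sketch splits a signed weight into two non-negative cell values whose difference is W, and forms a hybrid weight as the mean of two cells tuned to the same target. The particular split (positive part to W+, magnitude of the negative part to W-) and the function names are illustrative assumptions, not necessarily the mapping used by the arrays described here.

```python
def to_differential(w: float) -> tuple[float, float]:
    """Split a signed weight into non-negative (W+, W-) with W = W+ - W-."""
    return (w, 0.0) if w >= 0 else (0.0, -w)


def differential_value(w_plus: float, w_minus: float) -> float:
    """Weight represented by a differential cell pair."""
    return w_plus - w_minus


def averaged_value(w_cell1: float, w_cell2: float) -> float:
    """Hybrid weight: the mean of two cells tuned to the same target."""
    return (w_cell1 + w_cell2) / 2


for w in (0.75, -0.5):
    wp, wm = to_differential(w)
    print(w, (wp, wm), differential_value(wp, wm))
print(averaged_value(0.98, 1.02))  # two cells with small, opposite tuning errors
```

Averaging two cells lets independent tuning errors partially cancel, while the differential form lets non-negative cell currents represent negative weights.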
[0185] Overview of VMM Systems
[0186] Figure 43 depicts a block diagram of VMM system 4300. VMM system 4300 comprises VMM array 4301, row decoder 4302, high-voltage decoder 4303, column decoder 4304, bit line driver 4305, input circuitry 4306, output circuitry 4307, control logic 4308, and bias generator 4309. VMM system 4300 further comprises high-voltage generation block 4310, which comprises charge pump 4311, charge pump regulator 4312, and high-voltage level generator 4313. VMM system 4300 further comprises (program/erase, or weight tuning) algorithm controller 4314, analog circuitry 4315, control engine 4316 (which may include special functions such as arithmetic functions, activation functions, embedded microcontroller logic, etc.), and test control logic 4317. The systems and methods described below can be implemented in VMM system 4300.
[0187] Input circuitry 4306 may include circuitry such as a DAC (digital-to-analog converter), DPC (digital-to-pulse converter, digital-to-time-modulated pulse converter), AAC (analog-to-analog converter, such as current-to-voltage converter, logarithmic converter), PAC (pulse-to-analog level converter), or any other type of converter. Input circuitry 4306 can implement normalization, linear, or nonlinear up / down scaling functions or arithmetic functions. Input circuitry 4306 can implement a temperature compensation function for the input level. Input circuitry 4306 can implement activation functions such as ReLU or sigmoid. Output circuitry 4307 may include circuitry such as an ADC (analog-to-digital converter for converting the analog output of a neuron into digital bits), AAC (analog-to-analog converter, such as current-to-voltage converter, logarithmic converter), APC (analog-to-pulse converter, analog-to-time-modulated pulse converter), or any other type of converter. Output circuitry 4307 can implement activation functions such as ReLU or sigmoid. The output circuit 4307 can implement statistical normalization, regularization, up / down scaling / gain functions, statistical rounding, or arithmetic functions (e.g., addition, subtraction, division, multiplication, shifting, logarithms) on the neuron output. The output circuit 4307 can also implement temperature compensation functions on the neuron output or array output (such as bitline output) to keep the array's power consumption approximately constant or to improve the accuracy of the array (neuron) output, such as by keeping the IV slope approximately the same.
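Because the output circuitry can include a current-to-voltage converter followed by an ADC, the conversion of a summed neuron current into output bits can be sketched with an idealized resistive I-to-V stage and an ideal uniform N-bit quantizer. The resistor value, reference voltage, bit width, clipping behavior, and function names are all assumptions for illustration and do not describe the converters of this application.

```python
def current_to_voltage(i_neuron: float, r_ohms: float) -> float:
    """Idealized linear I-to-V converter: V = I * R."""
    return i_neuron * r_ohms


def adc(v: float, v_ref: float, bits: int) -> int:
    """Ideal uniform ADC: maps [0, v_ref] onto codes 0..2**bits - 1, clipping outside."""
    max_code = (1 << bits) - 1
    code = round(v / v_ref * max_code)
    return min(max(code, 0), max_code)


def neuron_output_bits(i_neuron: float, r_ohms: float = 100e3,
                       v_ref: float = 1.0, bits: int = 8) -> list[int]:
    """Convert a summed neuron (bit line) current to output bits, MSB first."""
    code = adc(current_to_voltage(i_neuron, r_ohms), v_ref, bits)
    return [(code >> k) & 1 for k in reversed(range(bits))]


print(neuron_output_bits(3e-6))  # 3 uA * 100 kOhm = 0.3 V before quantization
```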
[0188] Embodiments for precise programming of cells in a VMM
[0189] Figure 22A illustrates programming method 2200. First, the method starts (step 2201), which typically occurs in response to receiving a program command. Next, a mass program operation programs all cells to a "0" state (step 2202). Then a soft erase operation erases all cells to an intermediate, weakly erased level (i.e., an incomplete erase) such that each cell draws approximately 3-5 μA of current during a read operation (step 2203). This contrasts with a deeply erased level, at which each cell would draw approximately 20-30 μA during a read operation. Next, hard programming to a very deep programmed state is performed on all unselected cells and zero-weight cells (i.e., cells whose weight is 0 or negligible, meaning within a negligible threshold) to add electrons to the floating gates of those cells and remove any positive charge (step 2204). This ensures that those cells are truly "off", meaning they draw a negligible amount of current during read operations.
[0190] Then, a coarse programming method is performed on the selected cells (step 2205), followed by a precise programming method (step 2206) to program the required precise values for each selected cell. Here, the selected cells are those identified as the subject of programming method 2200 and selected by asserting appropriate word lines and bit lines or by some other mechanism.
[0191] Figure 22B shows another programming method 2210, which is similar to programming method 2200. However, instead of the program operation of step 2202 in Figure 22A, which programs all cells to the "0" state, after the method starts (step 2201) an erase operation erases all cells to the "1" state (step 2212). Then a soft program operation (step 2213) programs all cells to an intermediate level (i.e., an incomplete program) such that each cell draws approximately 3-5 μA of current during a read operation. Afterwards, the unselected cells are hard programmed (step 2204), and the coarse and precise programming methods are performed (steps 2205 and 2206), as described above for Figure 22A. A variant of the embodiment of Figure 22B omits soft programming step 2213 entirely.
[0192] Figure 23 shows a first embodiment of coarse programming method 2205, which is search and execute method 2300. First, a lookup table search or a predetermined function is performed to determine a coarse target current value I_CT for each selected cell based on the value that is intended to be stored in that cell (step 2301). The selected cell can be programmed to store one of N possible values (e.g., 128, 64, or 32, without limitation), and each of the N values corresponds to a different desired current value I_D that the selected cell draws during a read operation. In one embodiment, the lookup table or function (e.g., a function derived from curve fitting to data, or a physical function based on memory behavior, which operates on variables such as the final target value and the existing value and calculates the expected or desired target for the next operation) contains M possible current values to be used as the coarse target current value I_CT for the selected cell during search and execute method 2300, where M is an integer less than N. For example, if N is 8, M can be 4: the selected cell can store 8 possible values, and one of 4 coarse target current values is selected as I_CT for search and execute method 2300. In other words, search and execute method 2300 quickly programs the selected cell to a current (the coarse target current value I_CT) that is somewhat close to the desired current value I_D, and precise programming method 2206 then programs the selected cell more precisely so that it is extremely close to I_D.
[0193] For the simple example of N = 8 and M = 4, example cell values, desired current values, and coarse target current values are shown in Tables 9 and 10:
[0194] Table 9: Examples of N desired current values when N=8
[0195]

| Value stored in the selected cell | Desired current value (I_D) |
|---|---|
| 000 | 100 pA |
| 001 | 200 pA |
| 010 | 300 pA |
| 011 | 400 pA |
| 100 | 500 pA |
| 101 | 600 pA |
| 110 | 700 pA |
| 111 | 800 pA |
[0196] Table 10: Examples of M target current values when M=4
[0197]

| Coarse target current value (I_CT) | Associated cell values |
|---|---|
| 200 pA + I_CTOFFSET1 | 000, 001 |
| 400 pA + I_CTOFFSET2 | 010, 011 |
| 600 pA + I_CTOFFSET3 | 100, 101 |
| 800 pA + I_CTOFFSET4 | 110, 111 |
[0198] The offset values I_CTOFFSETx are used to prevent the desired current value from being overshot during coarse programming.
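Tables 9 and 10 follow a regular pattern, so they can be written as two small lookup functions. The sketch below is illustrative only: it assumes the 100 pA spacing of Table 9, and the offsets I_CTOFFSET1 to I_CTOFFSET4 are passed in as hypothetical parameters because the text gives no values for them.

```python
from typing import Sequence


def desired_current_pa(value: int) -> int:
    """Table 9: 3-bit cell value (N = 8) -> desired read current I_D in pA."""
    if not 0 <= value <= 7:
        raise ValueError("cell value must be in 0..7")
    return (value + 1) * 100


def coarse_target_pa(value: int, offsets_pa: Sequence[float]) -> float:
    """Table 10: adjacent value pairs share one of M = 4 coarse targets I_CT.

    offsets_pa holds hypothetical I_CTOFFSET1..I_CTOFFSET4 values in pA.
    """
    if not 0 <= value <= 7:
        raise ValueError("cell value must be in 0..7")
    group = value // 2  # 000,001 -> 0; 010,011 -> 1; 100,101 -> 2; 110,111 -> 3
    return (group + 1) * 200 + offsets_pa[group]


offsets = [20.0, 20.0, 20.0, 20.0]  # hypothetical offsets in pA
for v in range(8):
    print(f"{v:03b}: I_D = {desired_current_pa(v)} pA, I_CT = {coarse_target_pa(v, offsets)} pA")
```

Each coarse target sits at or above the desired currents it serves, which is consistent with coarse programming lowering I_cell and stopping before I_D is reached.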
[0199] Once the coarse target current value I_CT is selected, the selected cell is programmed by applying voltage v0 to the appropriate terminal of the selected cell, depending on the cell architecture type of the selected cell (e.g., memory cell 210, 310, 410, or 510) (step 2302). If the selected cell is memory cell 310 of Figure 3, voltage v0 is applied to control gate terminal 28, and v0 can be 5-7 V depending on the coarse target current value I_CT. The value of v0 can optionally be determined from a voltage lookup table that stores v0 versus coarse target current value I_CT.
[0200] Next, the selected cell is programmed by applying voltage v_i = v_(i-1) + v_increment, where i starts at 1 and is incremented each time this step is repeated (step 2303), and v_increment is a small voltage chosen to provide the desired programming granularity. Thus, the first time step 2303 is performed, i = 1 and v_1 = v0 + v_increment. A verify operation then follows (step 2304), in which a read operation is performed on the selected cell and the current drawn through the selected cell (I_cell) is measured. If I_cell is less than or equal to I_CT (here, the first threshold value), search and execute method 2300 is complete and precise programming method 2206 can begin. If I_cell is not less than or equal to I_CT, step 2303 is repeated and i is incremented.
[0201] Thus, when coarse programming method 2205 ends and precise programming method 2206 begins, voltage v_i is the last voltage used to program the selected cell, and the selected cell stores a value associated with coarse target current value I_CT. Precise programming method 2206 then programs the selected cell so that, during a read operation, it draws the desired current value I_D associated with the value intended to be stored in the cell, plus or minus an acceptable deviation such as 50 pA or less.
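The program-and-verify loop of steps 2302 to 2304 can be sketched as follows. This is a behavioral illustration, not the patented circuit: `program_and_read` is a hypothetical hook standing in for one program pulse plus a verify read, and `toy_cell` is an assumed model in which the read current falls one decade per 0.5 V of programming voltage.

```python
from typing import Callable, Tuple


def search_and_execute(
    program_and_read: Callable[[float], float],
    v0: float,
    v_increment: float,
    i_ct: float,
    max_pulses: int = 1000,
) -> Tuple[float, float]:
    """Coarse programming loop of method 2300; returns (final v_i, I_cell)."""
    program_and_read(v0)                  # step 2302: initial pulse at v0
    v = v0
    for _ in range(max_pulses):
        v += v_increment                  # step 2303: v_i = v_(i-1) + v_increment
        i_cell = program_and_read(v)      # step 2304: verify read
        if i_cell <= i_ct:                # first threshold reached
            return v, i_cell
    raise RuntimeError("I_CT not reached within max_pulses")


def toy_cell(v: float) -> float:
    # Hypothetical model: read current in pA after programming up to v volts.
    return 10_000.0 * 10 ** (-(v - 5.0) / 0.5)


v_final, i_cell = search_and_execute(toy_cell, v0=5.0, v_increment=0.05, i_ct=400.0)
```

The loop stops on the first pulse that brings I_cell to or below I_CT, so the previous pulse (one v_increment lower) still reads above it.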
[0202] Figure 24 shows examples of different voltage progressions that can be applied to the control gate of a selected memory cell during precise programming method 2206.
[0203] In a first embodiment, increasing voltages are applied to the control gate in a progression to further program the selected memory cell. The starting point is v_i, the final voltage applied during coarse programming method 2205. An increment v_p1 is added to v_i, and voltage v_i + v_p1 is then used to program the selected cell (indicated by the second pulse from the left in progression 2401). v_p1 is an increment that is smaller than v_increment (the voltage increment used during coarse programming method 2205). After each programming voltage is applied, a verify step (similar to step 2304) is performed to determine whether I_cell is less than or equal to I_PT1, the first precise target current value (here, the second threshold value), where I_PT1 = I_D + I_PT1OFFSET and I_PT1OFFSET is an offset value added to prevent program overshoot. If it is not, another increment v_p1 is added to the previously applied programming voltage and the process is repeated. When I_cell is less than or equal to I_PT1, this portion of the programming sequence stops. Optionally, if I_PT1 is equal to I_D, or approximately equal to I_D with sufficient precision (i.e., within an acceptable deviation), the selected memory cell has been successfully programmed.
[0204] If I_PT1 is not equal to I_D, or not close enough to I_D, further programming with finer granularity is performed using progression 2402. The starting point of progression 2402 is the final voltage used for programming under progression 2401. An increment V_p2 (which is smaller than v_p1) is added to that voltage, and the combined voltage is applied to program the selected memory cell. After each programming voltage is applied, a verify step (similar to step 2304) is performed to determine whether I_cell is less than or equal to I_PT2, the second precise target current value (here, the third threshold value), where I_PT2 = I_D + I_PT2OFFSET and I_PT2OFFSET is an offset value added to prevent program overshoot. If it is not, another increment V_p2 is added to the previously applied programming voltage and the process is repeated. When I_cell is less than or equal to I_PT2, this portion of the programming sequence stops. Here, it is assumed that I_PT2 is equal to I_D, or close enough to I_D that programming can stop, because the target value has been achieved with sufficient precision. One of ordinary skill in the art will appreciate that if I_PT2 is not close enough to I_D for programming to stop, additional progressions with progressively smaller programming increments can be applied. For example, in Figure 25, three progressions (2501, 2502, and 2503) are applied instead of just two.
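Progressions 2401 and 2402 (or 2501 to 2503) amount to repeating the same verify loop with a shrinking voltage step and a tighter target. A minimal sketch, assuming the same hypothetical `toy_cell` model as a stand-in for the array and an illustrative I_D of 300 pA with made-up offsets:

```python
from typing import Callable, Sequence, Tuple


def precise_program(
    program_and_read: Callable[[float], float],
    v_start: float,
    i_start: float,
    progressions: Sequence[Tuple[float, float]],
    max_pulses: int = 1000,
) -> Tuple[float, float]:
    """Apply (voltage step, target) progressions such as [(v_p1, I_PT1), (v_p2, I_PT2)].

    Steps should shrink and targets I_PTk = I_D + I_PTkOFFSET should approach I_D.
    """
    v, i_cell, pulses = v_start, i_start, 0
    for v_step, i_target in progressions:
        while i_cell > i_target:            # verify, similar to step 2304
            if pulses == max_pulses:
                raise RuntimeError("target not reached within max_pulses")
            v += v_step                     # next pulse of this progression
            i_cell = program_and_read(v)
            pulses += 1
    return v, i_cell


def toy_cell(v: float) -> float:
    # Hypothetical model: read current in pA after programming up to v volts.
    return 10_000.0 * 10 ** (-(v - 5.0) / 0.5)


# Start where a coarse pass left off; hypothetical I_D = 300 pA,
# I_PT1 = I_D + 40 pA, I_PT2 = I_D + 5 pA.
v_end, i_end = precise_program(toy_cell, 5.7, toy_cell(5.7), [(0.02, 340.0), (0.005, 305.0)])
```

The finer second step limits how far the last pulse can undershoot I_PT2, which is why the result lands close to I_D.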
[0205] A second embodiment is shown in progression 2403. Here, rather than increasing the programming voltage applied to the selected memory cell, the same programming voltage is applied for pulses of increasing duration. Instead of adding incremental voltages such as v_p1 in progression 2401 and v_p2 in progression 2402, an additional time increment t_p1 is added to the programming pulse, so that each applied pulse is longer than the previously applied pulse by t_p1. In the example shown, the first pulse has duration t_p0 and the second pulse has duration t_p0 + t_p1. After each programming pulse is applied, the same verify step described above for progression 2401 is performed. Optionally, additional progressions can be applied in which the time increment added to the programming pulse is shorter than in the preceding progression. Although only one temporal progression is shown, one of ordinary skill in the art will appreciate that any number of different temporal progressions can be applied.
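The temporal progression can be sketched the same way, with the pulse width growing by t_p1 at a fixed voltage. The `ToyCell` below is hypothetical: it assumes the read current falls 20 pA per microsecond of accumulated programming time, which is only a convenient stand-in for real cell behavior.

```python
from typing import Callable, List, Tuple


def time_progression(
    pulse_and_read: Callable[[float], float],
    t_p0: float,
    t_p1: float,
    i_target: float,
    max_pulses: int = 100,
) -> Tuple[List[float], float]:
    """Fixed programming voltage; each pulse is t_p1 longer than the previous one."""
    widths: List[float] = []
    width = t_p0
    for _ in range(max_pulses):
        i_cell = pulse_and_read(width)      # one pulse, then verify
        widths.append(width)
        if i_cell <= i_target:
            return widths, i_cell
        width += t_p1
    raise RuntimeError("target not reached within max_pulses")


class ToyCell:
    """Hypothetical cell: current (pA) drops 20 pA per microsecond of programming."""

    def __init__(self, i_start: float = 400.0) -> None:
        self.i_start = i_start
        self.total_time_us = 0.0

    def pulse_and_read(self, width_us: float) -> float:
        self.total_time_us += width_us
        return self.i_start - 20.0 * self.total_time_us


cell = ToyCell()
widths, i_cell = time_progression(cell.pulse_and_read, t_p0=1.0, t_p1=1.0, i_target=300.0)
```

With these assumed numbers the pulses are 1, 2, and 3 µs long, and the cell reads 280 pA after the third pulse.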
[0206] Additional details will now be provided for two additional implementations of the coarse programming method 2205.
[0207] Figure 26A shows a second embodiment of coarse programming method 2205 (used in the methods of Figures 22A and 22B), which is adaptive calibration method 2600. The method starts (step 2601). The selected cell is programmed with a default starting voltage v0 (step 2602). Unlike in search and execute method 2300, v0 here is not derived from a lookup table or function, but is a relatively small initial value. The control gate voltage (Vcg) of the cell is measured at a first current value IR1 (e.g., 100 nA) and at a second current value IR2 (e.g., 10 nA), and a slope is determined from those measurements and stored (e.g., 360 mV/decade) (step 2603).
[0208] A new programming voltage v_i is then determined (step 2604). The first time this step is performed, i = 1, and v1 is determined from the stored slope value and a current target value plus an offset value (such as coarse target current value I_CT) using a subthreshold equation such as the following:
[0209] $$V_i = V_{i-1} + V_{\text{increment}},$$
[0210] where $V_{\text{increment}}$ is proportional to the slope of $V_{cg}$ versus $\log[I_{ds}/(w_a I_o)]$, with
[0211] $$V_{cg} = n \, V_t \, \log\left[\frac{I_{ds}}{w_a I_o}\right]$$
[0212] Here, $V_{cg}$ is the control gate voltage, $w_a$ is the w of the memory cell, and $I_{ds}$ is the target current value plus the offset value.
[0213] If the stored slope value is relatively steep, a relatively small current offset value can be used. If the stored slope value is relatively flat, a relatively large current offset value can be used. Determining the slope information therefore allows a current offset value to be selected that is customized for the particular cell in question, which ultimately shortens the programming process. Each time this step is repeated, i is incremented, v_i = v_(i-1) + v_increment, and the cell is programmed using v_i. v_increment can be determined from a lookup table that stores values of v_increment versus target current value (such as coarse target current value I_CT).
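One way to use the stored slope with the subthreshold equation above: because $V_{cg}$ varies with the logarithm of $I_{ds}$, the gate voltage needed for another current is the reference voltage plus the slope times the number of decades between the two currents. The sketch assumes the cell stays in subthreshold, that the slope is expressed in volts per decade (so it equals $n V_t \ln 10$), and that the reference point (1.5 V at 100 nA) is hypothetical.

```python
import math


def vcg_for_current(vcg_ref: float, i_ref: float, i_target: float,
                    slope_v_per_decade: float) -> float:
    """Estimate the control gate voltage at which a subthreshold cell reads i_target.

    From Vcg = n*Vt*log[Ids/(wa*Io)], Vcg moves by slope * log10(i_target / i_ref),
    where slope = n*Vt*ln(10) volts per decade of current.
    """
    return vcg_ref + slope_v_per_decade * math.log10(i_target / i_ref)


# 360 mV/decade from the text; the reference point is a hypothetical measurement.
v_est = vcg_for_current(vcg_ref=1.5, i_ref=100e-9, i_target=10e-9, slope_v_per_decade=0.36)
```

A steeper slope produces a larger voltage shift per decade, which is why the slope informs the choice of offset and increment for each cell.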
[0214] Next, a verify operation occurs, in which a read operation is performed on the selected cell and the current drawn through the selected cell (I_cell) is measured and compared with coarse target current value I_CT (step 2605). If I_cell is less than or equal to I_CT, where I_CT = I_D + I_CTOFFSET and I_CTOFFSET is an offset value added to prevent program overshoot, adaptive calibration method 2600 is complete and precise programming method 2206 can begin. If I_cell is not less than or equal to I_CT, steps 2604 and 2605 are repeated and i is incremented.
[0215] Figure 26B shows a second embodiment of coarse programming method 2205, which is adaptive calibration method 2650. The method starts (step 2651). The cell is programmed with a default starting value v0 (step 2652). v0 is derived from a lookup table created from silicon characterization, and the table values further include an offset value I_CTOFFSET so that the programmed target is not overshot.
[0216] In step 2653, an I-V slope parameter is created, which is used to determine the next programming voltage. A first control gate read voltage V_CGR1 is applied to the selected cell, and the resulting cell current IR1 is measured. Then a second control gate read voltage V_CGR2 is applied to the selected cell, and the resulting cell current IR2 is measured. A slope is determined from these measurements and stored, for example according to the following equation for the subthreshold region (cell operating in subthreshold):
[0217] $$\text{Slope} = \frac{V_{CGR1} - V_{CGR2}}{\log(IR1) - \log(IR2)}$$
[0218] (step 2653). Example values of V_CGR1 and V_CGR2 are 1.5 V and 1.3 V, respectively.
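The slope of step 2653 is a two-point estimate in volts per decade. The sketch uses the example read voltages from the text; the measured currents IR1 and IR2 are hypothetical, and base-10 logarithms are assumed.

```python
import math


def iv_slope(v_cgr1: float, v_cgr2: float, ir1: float, ir2: float) -> float:
    """Step 2653: Slope = (V_CGR1 - V_CGR2) / (log(IR1) - log(IR2)), in V per decade."""
    return (v_cgr1 - v_cgr2) / (math.log10(ir1) - math.log10(ir2))


# V_CGR1 = 1.5 V and V_CGR2 = 1.3 V from the text; the currents are hypothetical.
slope = iv_slope(1.5, 1.3, ir1=100e-9, ir2=10e-9)  # about 0.2 V per decade
```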
[0219] Determining the slope allows a v_increment value to be selected that is customized for each of the selected cells, which shortens the programming process.
[0220] Each time step 2654 is performed, i is incremented, and a new programming voltage V_i is determined from the stored slope value, the coarse target current value I_CT, and an offset value, using an equation such as:
[0221] $$V_i = V_{i-1} + V_{\text{increment}},$$
[0222] where $V_{\text{increment}} = \alpha \cdot \text{Slope} \cdot \left(\log(IR1) - \log(I_{CT})\right)$,
[0223] and $\alpha$ is a predetermined constant less than 1 (a programming offset value) that prevents overshoot, for example 0.9.
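The update of step 2654 can be written directly from the equation. In this sketch the slope, IR1, and I_CT values are hypothetical; base-10 logarithms are assumed, although the result does not depend on the base as long as the slope was computed with the same base.

```python
import math


def next_program_voltage(v_prev: float, slope_v_per_decade: float,
                         ir1: float, i_ct: float, alpha: float = 0.9) -> float:
    """Step 2654: V_i = V_(i-1) + alpha * Slope * (log(IR1) - log(I_CT))."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be a constant below 1 to prevent overshoot")
    v_increment = alpha * slope_v_per_decade * (math.log10(ir1) - math.log10(i_ct))
    return v_prev + v_increment


# Hypothetical numbers: 0.2 V/decade slope, IR1 = 100 nA, I_CT = 10 nA, alpha = 0.9.
v_next = next_program_voltage(5.0, 0.2, ir1=100e-9, i_ct=10e-9)  # about 5.18 V
```

Choosing alpha below 1 deliberately undershoots the full one-decade shift, leaving margin for the verify loop rather than risking an overshoot.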
[0224] The cell is then programmed using programming voltage V_i (step 2655). Depending on the programming scheme used, V_i can be applied to the source line terminal, control gate terminal, or erase gate terminal of the selected cell.
[0225] Next, a verify operation occurs, in which a read operation is performed on the selected cell and the current drawn through the selected cell (I_cell) is measured and compared with coarse target current value I_CT (step 2656). If I_cell is less than or equal to I_CT, where the coarse target threshold I_CT = I_D + I_CTOFFSET and I_CTOFFSET is an offset value added to prevent program overshoot, the process proceeds to step 2657. Otherwise, the process returns to step 2654 and i is incremented.
[0226] In step 2657, I_cell is compared with a threshold value I_CT2 that is lower than coarse target current value I_CT, to determine whether an overshoot has occurred. That is, although steps 2654 to 2656 ensure that I_cell is below I_CT, I_cell may have fallen too far below I_CT; in that case an overshoot has occurred, and I_cell may represent a stored value that corresponds to an incorrect value. If I_cell is not less than or equal to I_CT2, no overshoot has occurred, adaptive calibration method 2650 is complete, and the process proceeds to precise programming method 2206, starting from the voltage v_i at which the cell was programmed to or near the coarse target threshold I_CT. If I_cell is less than or equal to I_CT2, an overshoot has occurred: the selected cell is erased (step 2658), and programming restarts at step 2652, this time with a smaller V_increment to avoid overshooting again. Optionally, if step 2658 is performed more than a predetermined number of times, the selected cell can be deemed a bad cell that should not be used.
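The overshoot check and erase-and-retry of steps 2652 to 2658 can be sketched as below. Several details are assumptions rather than the patent's method: the slope-derived increment from step 2653 is passed in as a fixed `v_increment`, the increment is halved on each retry, and `ToyCell` is a hypothetical cell model whose current depends only on the highest voltage applied since the last erase.

```python
from typing import Optional


class ToyCell:
    """Hypothetical cell: read current (pA) drops one decade per 0.5 V above 5.0 V."""

    def __init__(self) -> None:
        self.v_max: Optional[float] = None
        self.erase_count = 0

    def program(self, v: float) -> None:
        self.v_max = v if self.v_max is None else max(self.v_max, v)

    def read(self) -> float:
        v = 5.0 if self.v_max is None else max(self.v_max, 5.0)
        return 10_000.0 * 10 ** (-(v - 5.0) / 0.5)

    def erase(self) -> None:
        self.v_max = None
        self.erase_count += 1


def adaptive_calibrate(cell: ToyCell, v0: float, v_increment: float,
                       i_ct: float, i_ct2: float,
                       max_erases: int = 3, max_pulses: int = 200) -> float:
    """Program down to I_CT; on overshoot below I_CT2, erase and retry with a smaller step."""
    for _ in range(max_erases + 1):
        cell.program(v0)                    # step 2652: default starting value v0
        v = v0
        for _ in range(max_pulses):
            v += v_increment                # step 2654: next programming voltage
            cell.program(v)                 # step 2655
            if cell.read() <= i_ct:         # step 2656: verify against I_CT
                break
        else:
            raise RuntimeError("I_CT not reached")
        if cell.read() > i_ct2:             # step 2657: no overshoot
            return v                        # v_i handed to precise programming 2206
        cell.erase()                        # step 2658: overshoot, erase the cell
        v_increment /= 2                    # assumed retry policy: halve the step
    raise RuntimeError("cell treated as bad after repeated overshoot")


cell = ToyCell()
v_i = adaptive_calibrate(cell, v0=5.0, v_increment=0.4, i_ct=400.0, i_ct2=300.0)
```

With these numbers the 0.4 V and 0.2 V steps both overshoot below I_CT2, and the 0.1 V step lands the cell between I_CT2 and I_CT after two erases.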
[0227] Precise programming method 2206 consists of multiple verify and program cycles, in which either the programming voltage is incremented by a constant fine voltage with a fixed pulse width, or the programming voltage is fixed and the programming pulse width is varied or kept constant for subsequent pulses, as described above with reference to Figures 24 and 25.
[0228] Optionally, the step of determining whether the current through the selected non-volatile memory cell during a read or verify operation is less than or equal to coarse target current value I_CT (step 2656) can be performed by applying a fixed bias to the terminals of the non-volatile memory cell, measuring and digitizing the current drawn by the selected non-volatile memory cell to generate digital output bits, and comparing the digital output bits with digital bits representing the first threshold current value I_CT.
[0229] Optionally, the step of determining whether the current through the selected non-volatile memory cell during a read or verify operation is less than or equal to coarse target current value I_CT can be performed by applying an input to a terminal of the non-volatile memory cell, modulating the current drawn by the non-volatile memory cell with an input pulse to generate a modulated output, digitizing the modulated output to generate digital output bits, and comparing the digital output bits with digital bits representing the first threshold current I_CT.
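Both digital verify variants compare codes rather than analog currents. The sketch assumes an ideal, linear ADC with a hypothetical full-scale range and resolution; in the pulse-modulated variant, the digitized quantity is taken to be the charge I × t_pulse.

```python
def adc_code(value: float, full_scale: float, bits: int) -> int:
    """Ideal ADC (assumed): map value in [0, full_scale] to an unsigned code."""
    levels = (1 << bits) - 1
    clamped = min(max(value, 0.0), full_scale)
    return round(clamped / full_scale * levels)


def digital_verify(i_cell: float, i_ct: float, full_scale: float, bits: int = 8) -> bool:
    """Fixed-bias variant: digitized I_cell <= digital code for I_CT."""
    return adc_code(i_cell, full_scale, bits) <= adc_code(i_ct, full_scale, bits)


def modulated_verify(i_cell: float, i_ct: float, pulse_width: float,
                     full_scale_charge: float, bits: int = 8) -> bool:
    """Pulse-modulated variant: compare digitized I_cell * t with I_CT * t."""
    return (adc_code(i_cell * pulse_width, full_scale_charge, bits)
            <= adc_code(i_ct * pulse_width, full_scale_charge, bits))
```

The ADC resolution limits how finely I_cell can be distinguished from I_CT; currents within one code of each other compare as equal.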
[0230] Figure 27 shows an exemplary circuit for implementing part of adaptive calibration method 2600. During step 2603, current source 2701 applies the exemplary current values IR1 and IR2 to the selected cell (here, memory cell 2702), and the voltage at the control gate of memory cell 2702 is measured (V_CGR1 for IR1 and V_CGR2 for IR2). As described above, the slope is (V_CGR1 − V_CGR2) / (log(IR1) − log(IR2)).
[0231] Figure 28 shows another embodiment of coarse programming method 2205, which is absolute calibration method 2800. The method starts (step 2801). The cell is programmed with a default starting value v0 (step 2802). The control gate voltage of the cell (V_CGRx) is measured at current value Itarget (i.e., the final desired value of the cell current) and stored (step 2803). A programming voltage v1 is determined based on the stored control gate voltage and on the current value Itarget plus an offset value, Itarget + Ioffset (step 2804). For example, the new programming voltage v1 can be calculated as v1 = v0 + (V_CGBIAS − stored V_CGR), where V_CGBIAS is the default read control gate voltage at the maximum target current (approximately 1.5 V in one embodiment), and stored V_CGR is the read control gate voltage measured in step 2803.
[0232] The cell is then programmed using programming voltage v_i (step 2805). When i = 1, voltage v1 from step 2804 is used. When i ≥ 2, voltage v_i = v_(i-1) + v_increment is used, where v_increment can be determined from a lookup table that stores values of v_increment versus current value Itarget. Next, a verify operation occurs, in which a read operation is performed on the selected cell and the current drawn through the selected cell (I_cell) is measured and compared with coarse target current value I_CT (step 2806). If I_cell is less than or equal to I_CT, absolute calibration method 2800 is complete and precise programming method 2206 can begin. If I_cell is not less than or equal to I_CT, steps 2805 and 2806 are repeated and i is incremented.
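The voltage sequence of method 2800 is plain arithmetic: one calibrated jump to v1, then fixed increments. In this sketch V_CGBIAS = 1.5 V follows the text, while v0, the stored V_CGR, and v_increment are hypothetical values.

```python
from typing import List


def absolute_calibration_voltages(v0: float, v_cgbias: float, v_cgr_stored: float,
                                  v_increment: float, count: int) -> List[float]:
    """First `count` programming voltages of method 2800.

    v1 = v0 + (V_CGBIAS - stored V_CGR) (step 2804); afterwards
    v_i = v_(i-1) + v_increment (step 2805, i >= 2).
    """
    voltages = [v0 + (v_cgbias - v_cgr_stored)]
    while len(voltages) < count:
        voltages.append(voltages[-1] + v_increment)
    return voltages


# Hypothetical: v0 = 5.0 V, stored V_CGR = 1.2 V, v_increment = 0.05 V.
seq = absolute_calibration_voltages(5.0, 1.5, 1.2, 0.05, 3)  # about [5.3, 5.35, 5.4]
```

A cell that already reaches Itarget at a low read voltage (here 1.2 V instead of 1.5 V) receives a larger first jump, which is the intent of calibrating against V_CGBIAS.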
[0233] Figure 29 shows circuit 2900 for implementing step 2803 of absolute calibration method 2800. A voltage source (not shown) generates V_CGR, which starts at an initial voltage and ramps upward. Here, n + 1 different current sources 2901 (2901-0, 2901-1, 2901-2, ..., 2901-n) generate currents IO0, IO1, IO2, ..., IOn of increasing magnitude. Each current source 2901 is connected to a respective inverter 2902 (2902-0, 2902-1, 2902-2, ..., 2902-n) and memory cell 2903 (2903-0, 2903-1, 2903-2, ..., 2903-n). As V_CGR ramps up, each memory cell 2903 draws an increasing amount of current, and the input voltage to each inverter 2902 decreases. Because IO0 < IO1 < IO2 < ... < IOn, the output of inverter 2902-0 switches from low to high first as V_CGR increases. Next, the output of inverter 2902-1 switches from low to high, then the output of inverter 2902-2, and so on, until the output of inverter 2902-n switches from low to high. Each inverter 2902 controls a respective switch 2904 (2904-0, 2904-1, 2904-2, ..., 2904-n) such that switch 2904 is closed when the output of inverter 2902 is low and open when it is high. Thus, when an inverter 2902 switches from low to high, its switch 2904 opens, and the V_CGR value sampled while the switch was closed is held on the respective capacitor 2905 (2905-0, 2905-1, 2905-2, ..., 2905-n). Each switch 2904 and capacitor 2905 therefore forms a sample-and-hold circuit. In absolute calibration method 2800 of Figure 28, the values IO0, IO1, IO2, ..., IOn are used as possible values of Itarget, and the respective sampled voltages are used as the associated values V_CGRx. Graph 2906 shows V_CGR ramping up over time and the outputs of inverters 2902-0, 2902-1, and 2902-n switching from low to high at different times.
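A behavioral model of circuit 2900 helps show why the branches trip in order. The sketch is hypothetical throughout: it assumes all memory cells 2903 share one subthreshold current characteristic, treats V_CGR as a discrete staircase, and replaces each inverter and switch pair with an "I_cell exceeds IOk" test.

```python
from typing import Callable, List, Optional


def ramp_sample_and_hold(
    cell_current: Callable[[float], float],
    i_refs: List[float],
    v_start: float,
    v_step: float,
    n_steps: int,
) -> List[Optional[float]]:
    """Return the V_CGR held by each branch k (None if branch k never trips)."""
    held: List[Optional[float]] = [None] * len(i_refs)
    for step in range(n_steps + 1):
        v_cgr = v_start + step * v_step      # computed per step to avoid drift
        i_cell = cell_current(v_cgr)
        for k, i_ref in enumerate(i_refs):
            # Branch k trips once the cell sinks more than IOk: the inverter
            # output goes high, switch 2904-k opens, capacitor 2905-k holds V_CGR.
            if held[k] is None and i_cell > i_ref:
                held[k] = v_cgr
    return held


# Hypothetical cell: 1 nA at 1.0 V, one decade per 100 mV.
held = ramp_sample_and_hold(lambda v: 1e-9 * 10 ** ((v - 1.0) / 0.1),
                            [1e-9, 1e-8, 1e-7], v_start=0.9, v_step=0.01, n_steps=50)
```

Smaller reference currents trip at lower V_CGR, so the held voltages come out in increasing order, matching graph 2906.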
[0234] Figure 30 shows an exemplary progression 3000 for programming selected cells during adaptive calibration method 2600 or absolute calibration method 2800. In one embodiment, voltage v_cgp is applied to the control gates of the memory cells in a selected row. The number of selected memory cells in the selected row is, for example, 32, so up to 32 memory cells in the selected row can be programmed in parallel. Each memory cell is enabled to be coupled to programming current Iprog by a bit line enable signal. If the bit line enable signal is inactive (meaning a positive voltage is applied to the selected bit line), the memory cell is inhibited (not programmed). As shown in Figure 30, bit line enable signal En_blx (where x varies from 1 to n, and n is the number of bit lines) is enabled at different times, at the v_cgp voltage level desired for that bit line (and thus for the selected memory cell on that bit line). In another embodiment, the enable signals on the bit lines can be used to control the voltage applied to the control gate of the selected cell. Each bit line enable signal causes the desired voltage corresponding to that bit line (such as v_i described with reference to Figure 28) to be applied as v_cgp. The bit line enable signal can also control the programming current flowing into the bit line. In this example, each subsequent control gate voltage v_cgp is higher than the previous voltage. Alternatively, each subsequent control gate voltage can be lower or higher than the previous voltage, and each subsequent increment in v_cgp may or may not be equal to the previous increment.
[0235] Figure 31 shows an exemplary progression 3100 for programming selected cells during adaptive calibration method 2600 or absolute calibration method 2800. In one embodiment, a bit line enable signal (e.g., EN_bl1, EN_bl5, EN_bln) enables the selected bit line (i.e., the bit line coupled to the selected memory cell) to be programmed at the corresponding v_cgp voltage level. In another embodiment, the bit line enable signals can be used to control the incrementally ramped voltage applied to the control gate of the selected cell. Each bit line enable signal causes the desired voltage corresponding to that bit line (such as v_i described with reference to Figure 28) to be applied as the control gate voltage. In this example, each subsequent increment is equal to the previous increment.
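The bit-line enable scheme of Figures 30 and 31 can be viewed as a schedule over a shared v_cgp staircase. This sketch is a simplified, hypothetical reading: each bit line is enabled only during the first step whose level reaches its own target voltage (such as its v_i), and stays inhibited otherwise. The staircase levels and per-bit-line targets in the example are made up.

```python
from typing import Dict, List, Sequence


def bitline_enable_schedule(vcgp_levels: Sequence[float],
                            targets: Dict[int, float]) -> Dict[int, List[int]]:
    """Map each v_cgp step index to the bit lines whose En_blx is active during it."""
    schedule: Dict[int, List[int]] = {step: [] for step in range(len(vcgp_levels))}
    for bl, v_target in sorted(targets.items()):
        for step, level in enumerate(vcgp_levels):
            if level >= v_target - 1e-9:     # small tolerance for float levels
                schedule[step].append(bl)
                break
        else:
            raise ValueError(f"bit line {bl}: target {v_target} V exceeds the ramp")
    return schedule


# Hypothetical equal-increment staircase (as in Figure 31) and per-bit-line targets.
sched = bitline_enable_schedule([5.0, 5.1, 5.2, 5.3], {1: 5.1, 5: 5.3, 7: 5.1, 2: 5.0})
```

Because all bit lines in the row share one control gate ramp, enabling each bit line at a different time is what lets up to 32 cells receive different effective programming voltages in parallel.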
[0236] Figure 32 shows a system for implementing input and output methods for reading or verifying with a VMM array. Input function circuit 3201 receives digital bit values and converts them into analog signals, which are then used to apply a voltage to the control gates of selected cells in array 3204, the cells being selected by control gate decoder 3202, word line decoder 3203, and a bit line (not shown). In the embodiments described below, an input is applied to a selected memory cell, which then generates an output current representing the product of the received input and the weight W stored in the selected cell. Output neuron circuit block 3205 performs the output action for each column (neuron) of cells in VMM array 3204. Output circuit block 3205 can be implemented using an integrating analog-to-digital converter (ADC), a successive approximation register (SAR) ADC, or a sigma-delta (Σ-Δ) ADC.
[0237] In one embodiment, the digital value provided to input function circuit 3201 comprises four bits (DIN3, DIN2, DIN1, and DIN0), so the input can take one of 16 different values. Each of the 16 bit combinations corresponds to a different number of input pulses applied to the control gate of the selected cell, which then generates an output current representing the product of the input value and the weight W stored in that cell. A larger number of pulses results in a larger output value (current) for that cell. Table 11 shows examples of input bit values DIN[3:0] and the corresponding number of pulses applied to the control gate:
[0238] Table 11: Digital Bit Input and Generated Pulses
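[0239]

| DIN3 | DIN2 | DIN1 | DIN0 | Generated pulses |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | 1 |
| 0 | 0 | 1 | 0 | 2 |
| 0 | 0 | 1 | 1 | 3 |
| 0 | 1 | 0 | 0 | 4 |
| 0 | 1 | 0 | 1 | 5 |
| 0 | 1 | 1 | 0 | 6 |
| 0 | 1 | 1 | 1 | 7 |
| 1 | 0 | 0 | 0 | 8 |
| 1 | 0 | 0 | 1 | 9 |
| 1 | 0 | 1 | 0 | 10 |
| 1 | 0 | 1 | 1 | 11 |
| 1 | 1 | 0 | 0 | 12 |
| 1 | 1 | 0 | 1 | 13 |
| 1 | 1 | 1 | 0 | 14 |
| 1 | 1 | 1 | 1 | 15 |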
[0240] In the example above, reading the cell value for a 4-bit input takes up to 15 pulses (for 16 possible values, including zero). Each pulse corresponds to one unit of cell value (current). For example, if the Icell unit = 1 nA, then for DIN[3:0] = 0001, Icell = 1 × 1 nA = 1 nA, and for DIN[3:0] = 1111, Icell = 15 × 1 nA = 15 nA.
[0241] In another embodiment, the digital bit input uses bit-position summation to read the cell value, as shown in Table 12. Here, only four pulses are needed to evaluate a 4-bit digital value: the first pulse evaluates DIN0, the second pulse evaluates DIN1, the third pulse evaluates DIN2, and the fourth pulse evaluates DIN3. The results of the four pulses are then summed according to bit position, using the bit-summation equation Output = (2^0 × DIN0 + 2^1 × DIN1 + 2^2 × DIN2 + 2^3 × DIN3) × Icell unit.
[0242] For example, if the Icell unit = 1 nA, then for DIN[3:0] = 0001, total Icell = 0 + 0 + 0 + 1 × 1 nA = 1 nA, and for DIN[3:0] = 1111, total Icell = 8 × 1 nA + 4 × 1 nA + 2 × 1 nA + 1 × 1 nA = 15 nA.
[0243] Table 12: Summation of Digital Inputs
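[0244]

| 2^3 × DIN3 | 2^2 × DIN2 | 2^1 × DIN1 | 2^0 × DIN0 | Total value |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | 1 |
| 0 | 0 | 2 | 0 | 2 |
| 0 | 0 | 2 | 1 | 3 |
| 0 | 4 | 0 | 0 | 4 |
| 0 | 4 | 0 | 1 | 5 |
| 0 | 4 | 2 | 0 | 6 |
| 0 | 4 | 2 | 1 | 7 |
| 8 | 0 | 0 | 0 | 8 |
| 8 | 0 | 0 | 1 | 9 |
| 8 | 0 | 2 | 0 | 10 |
| 8 | 0 | 2 | 1 | 11 |
| 8 | 4 | 0 | 0 | 12 |
| 8 | 4 | 0 | 1 | 13 |
| 8 | 4 | 2 | 0 | 14 |
| 8 | 4 | 2 | 1 | 15 |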
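The two input encodings in Tables 11 and 12 give the same total but need different numbers of pulses. The sketch compares them behaviorally, assuming an ideal linear cell and expressing current in units of the Icell unit (1 nA in the text's example) so the arithmetic stays exact.

```python
def unary_pulses(din: int) -> int:
    """Table 11: number of unit pulses for a 4-bit input DIN[3:0]."""
    if not 0 <= din <= 15:
        raise ValueError("DIN[3:0] must be in 0..15")
    return din


def output_unary(din: int, icell_unit: float) -> float:
    # Each pulse contributes one unit of cell current.
    return unary_pulses(din) * icell_unit


def output_bit_weighted(din: int, icell_unit: float) -> float:
    """Table 12: four evaluation pulses, one per bit, summed with weights 2^k."""
    if not 0 <= din <= 15:
        raise ValueError("DIN[3:0] must be in 0..15")
    total = 0.0
    for k in range(4):                      # pulse k evaluates bit DINk
        din_k = (din >> k) & 1
        total += (2 ** k) * din_k * icell_unit
    return total
```

For DIN[3:0] = 1111 the unary scheme needs 15 pulses while the bit-weighted scheme needs 4, and both yield 15 units (15 nA at 1 nA per unit).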
[0245] Figure 33 shows an example charge summer 3300 that can be used to sum the output of a VMM during a verify or read operation to obtain a single analog value representing the output, which can then optionally be converted into digital bit values. Charge summer 3300 comprises current source 3301 and an array of sample-and-hold circuits, each comprising switch 3302 and sample-and-hold (S/H) capacitor 3303. As shown for a 4-bit digital value, there are four S/H circuits to hold the values from the four evaluation pulses, and these values are summed at the end of the process. Each S/H capacitor 3303 is sized according to the 2^n × DINn bit position it holds; for example, C_DIN3 = 8 × Cu for bit DIN3 (where Cu is a unit capacitor), C_DIN2 = 4 × Cu for bit DIN2, C_DIN1 = 2 × Cu for bit DIN1, and C_DIN0 = 1 × Cu for bit DIN0. Current source 3301 is scaled accordingly.
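The text does not say exactly how the held values are combined at the end, so the sketch below assumes charge sharing: the binary-weighted S/H capacitors are connected together, and charge conservation gives the shared voltage. It also assumes each capacitor holds the same voltage per unit of cell output (the scaled current source charging a proportionally scaled capacitor), so each bit's charge carries weight 2^k.

```python
from typing import Sequence


def charge_summer_output(v_held: Sequence[float], cu: float = 1.0) -> float:
    """Shared voltage after connecting capacitors C_DINk = 2^k * Cu together.

    v_held[k] is the voltage held for evaluation pulse k (bit DINk). Charge
    conservation gives sum(C_k * V_k) / sum(C_k), which is proportional to the
    bit-position-weighted sum of the per-bit outputs.
    """
    caps = [(2 ** k) * cu for k in range(len(v_held))]
    total_charge = sum(c * v for c, v in zip(caps, v_held))
    return total_charge / sum(caps)
```

For four bits the total capacitance is 15 Cu, so an input with only DIN3 active produces 8/15 of the all-ones output, mirroring the weights in Table 12.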
[0246] Figure 34 shows a current summer 3400 that can be used to sum the output of a VMM during a verify or read operation. Current summer 3400 comprises current source 3401 (which is the output Icell from the VMM array), transistor 3402, switches 3403, node 3404, and transistor 3405. In this example, current summer 3400 outputs the values for DIN0, DIN1, DIN2, and DIN3 serially at node 3404. Four evaluation pulses are input to the VMM array in sequence. During the first pulse, for time period t_DIN0, the switch 3403 corresponding to DIN0 is closed and the other switches 3403 are open. During the second pulse, for time period t_DIN1, the switch 3403 corresponding to DIN1 is closed and the other switches are open. During the third pulse, for time period t_DIN2, the switch 3403 corresponding to DIN2 is closed and the other switches are open. During the fourth pulse, for time period t_DIN3, the switch 3403 corresponding to DIN3 is closed and the other switches are open. At the end of the process, the values are summed to generate a digital output, with each DIN value weighted according to its relative bit position. For example, DOUT = 8 × I_DIN3 + 4 × I_DIN2 + 2 × I_DIN1 + 1 × I_DIN0.
[0247] Figure 39 shows output block 3900, which is an embodiment of output block 3205 of the VMM array in Figure 32. Output block 3900 receives an output current from the VMM array (such as array 3204 in Figure 32), shown here as ICELL 3901. Output block 3900 comprises analog-to-digital converter 3902, shifter 3903, adder 3904, and output register 3905.
[0248] Here, it is assumed that the input to the input block of the VMM (such as input block 3201 of VMM 3204 in Figure 32) is DIN[n:0], where n is the binary index of the input bits, and there are a total of i bits, where i can range from 1 to n + 1. For example, if i = 4, the input consists of four input bits DIN3, DIN2, DIN1, and DIN0. Each input bit DINx is applied to input block 3201 of VMM 3204 one at a time.
[0249] Input block 3201 converts DINx into an input signal (using one of the embodiments described herein or another known technique), which is applied to a terminal of a selected cell in array 3204 (the cell being selected by word line decoder 3203 and a selected bit line, not shown). In one embodiment, the input signal is a single pulse of variable duration, as shown in Table 13 for an exemplary 4-bit input. The input pulse TPULSE (for a row input of the VMM array) has a width proportional to the decimal value (0 to 15) of data input DIN[3:0].
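The shift-and-add data path of output block 3900 can be sketched for bit-serial inputs. This is a behavioral illustration under stated assumptions: converter 3902 is modeled as an ideal ADC that rounds ICELL to an integer number of LSBs, `evaluate` is a hypothetical hook returning ICELL for one input bit, and the example cell contributes 3 LSBs per active input bit.

```python
from typing import Callable, Sequence


def output_block_3900(din_bits: Sequence[int],
                      evaluate: Callable[[int], float],
                      lsb_current: float, adc_bits: int = 8) -> int:
    """Bit-serial evaluation with shift-and-add; din_bits[x] is DINx (LSB first)."""
    register = 0                                          # output register 3905
    max_code = (1 << adc_bits) - 1
    for x, bit in enumerate(din_bits):
        icell = evaluate(bit)                             # ICELL 3901 for input bit DINx
        code = min(max_code, round(icell / lsb_current))  # converter 3902 (ideal, assumed)
        register += code << x                             # shifter 3903 and adder 3904
    return register


# Hypothetical cell: 3 LSBs of current per active input bit (weight W = 3).
dout = output_block_3900([1, 1, 0, 1], lambda bit: 3.0 * bit, lsb_current=1.0)
```

Shifting each per-bit result by its bit position x reproduces the weighting DOUT = 8 × I_DIN3 + 4 × I_DIN2 + 2 × I_DIN1 + I_DIN0 in the digital domain: for DIN[3:0] = 1011 and W = 3, the register holds 3 × 11 = 33.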
[0250] Table 13: Example table of 4-bit inputs with pulses
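[0251]
| DIN3 | DIN2 | DIN1 | DIN0 | TPULSE (pulse width) |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | 1X |
| 0 | 0 | 1 | 0 | 2X |
| 0 | 0 | 1 | 1 | 3X |
| 0 | 1 | 0 | 0 | 4X |
| 0 | 1 | 0 | 1 | 5X |
| 0 | 1 | 1 | 0 | 6X |
| 0 | 1 | 1 | 1 | 7X |
| 1 | 0 | 0 | 0 | 8X |
| 1 | 0 | 0 | 1 | 9X |
| 1 | 0 | 1 | 0 | 10X |
| 1 | 0 | 1 | 1 | 11X |
| 1 | 1 | 0 | 0 | 12X |
| 1 | 1 | 0 | 1 | 13X |
| 1 | 1 | 1 | 0 | 14X |
| 1 | 1 | 1 | 1 | 15X |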
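The mapping in Table 13 amounts to reading DIN[3:0] as an unsigned integer and scaling a unit pulse width. A minimal Python sketch, assuming a hypothetical din_to_pulse_width helper and a unit_width parameter that stands in for the width of a 1X pulse:

```python
def din_to_pulse_width(din_bits, unit_width=1.0):
    """Map DIN[3:0], given as (DIN3, DIN2, DIN1, DIN0), to a pulse width.

    The width is the decimal value of DIN[3:0] times unit_width, the
    (assumed) width of a 1X pulse, as in Table 13.
    """
    value = 0
    for bit in din_bits:                  # MSB first
        value = (value << 1) | (bit & 1)
    return value * unit_width


print(din_to_pulse_width((1, 0, 1, 1)))  # 11.0, i.e. an 11X pulse
```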
[0252] In another implementation, the input signal is an analog bias voltage, as shown in Table 14A for an exemplary 4-bit input. The input signal can have, for example, 16 voltage levels linearly spaced for a cell operating in the linear region. Alternatively, for a cell operating in the subthreshold region, the voltage levels can be logarithmically spaced (meaning that the voltage value is proportional to the logarithm of the cell current); for example, for a binary current value, VCGINk = VCGIN(k-1) – (1 / n*Vt)*ln 2, where VCGIN is the voltage at the corresponding CG terminal.
[0253] Table 14A: Example table of 4-bit inputs with analog bias levels
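[0254]
| DIN3 | DIN2 | DIN1 | DIN0 | VCGIN |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | VCGIN0 |
| 0 | 0 | 0 | 1 | VCGIN1 |
| 0 | 0 | 1 | 0 | VCGIN2 |
| 0 | 0 | 1 | 1 | VCGIN3 |
| 0 | 1 | 0 | 0 | VCGIN4 |
| 0 | 1 | 0 | 1 | VCGIN5 |
| 0 | 1 | 1 | 0 | VCGIN6 |
| 0 | 1 | 1 | 1 | VCGIN7 |
| 1 | 0 | 0 | 0 | VCGIN8 |
| 1 | 0 | 0 | 1 | VCGIN9 |
| 1 | 0 | 1 | 0 | VCGIN10 |
| 1 | 0 | 1 | 1 | VCGIN11 |
| 1 | 1 | 0 | 0 | VCGIN12 |
| 1 | 1 | 0 | 1 | VCGIN13 |
| 1 | 1 | 1 | 0 | VCGIN14 |
| 1 | 1 | 1 | 1 | VCGIN15 |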
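Level selection for Table 14A can be sketched in Python as a 16-entry lookup indexed by DIN[3:0]. The linear case follows the text directly. For the subthreshold case, the sketch assumes the textbook model in which cell current is proportional to exp(V / (n * Vt)), with example values n = 1.5 and Vt = 0.026 V; under that model a current proportional to k requires V_k = V_1 + n * Vt * ln(k). That model, the parameter values, and all function names are assumptions rather than values from this document, and VCGIN0 is given a hypothetical "off" voltage because ln(0) is undefined.

```python
import math


def linear_levels(v_start, v_step, count=16):
    """Linearly spaced CG voltages VCGIN0..VCGIN15 for linear-region cells."""
    return [v_start + k * v_step for k in range(count)]


def log_levels(v_one, n=1.5, vt=0.026, v_off=0.0, count=16):
    """Logarithmically spaced CG voltages for subthreshold-region cells.

    Assumes I ~ exp(V / (n * vt)), so a cell current proportional to k needs
    V_k = V_1 + n * vt * ln(k). VCGIN0 gets the assumed 'off' voltage v_off.
    """
    return [v_off] + [v_one + n * vt * math.log(k) for k in range(1, count)]


def select_level(levels, din_bits):
    """Return the level indexed by DIN[3:0], given as (DIN3, DIN2, DIN1, DIN0)."""
    index = int("".join(str(b) for b in din_bits), 2)
    return levels[index]


lin = linear_levels(0.0, 0.1)
sub = log_levels(0.5)
print(select_level(lin, (0, 1, 0, 1)))   # VCGIN5 of the linear set
print(sub[2] - sub[1])                   # step for a doubling of current
```

Under this assumed model, the levels for currents k and 2k differ by n * Vt * ln 2 (about 27 mV with the example values), which is how logarithmic voltage spacing produces linearly scaled cell currents.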
[0255] The 4-bit input DIN[3:0] of a specific row selects one of 16 voltage levels (VCGIN0 through VCGIN15) and applies it to that row of the VMM array. In one implementation, this is done for all four input data bits simultaneously, meaning that the four input data bits are converted to one of 16 possible voltage levels and applied to the row. In an alternative implementation, one data input bit is applied at a time in a sequential manner (bit-by-bit input operation), and the results for the individual data input bits are then summed together in the analog domain (as in Figure 33 or Figure 34) or in the digital domain (as in Figure 35 or Figure 39). Optionally, each data input bit can be weighted based on its bit position (for example, by using output block 3900 in Figure 39). For example, voltage VCGIN1 may be applied as the input to the row of the VMM array if the least significant bit is "1", and voltage VCGIN8 may be applied as the input to the row if the most significant bit is "1".
[0256] In another embodiment, the input signals to the input block of the array are as shown in Table 14B for an exemplary 4-bit input, using bit-by-bit input operation (e.g., applying the DIN0, DIN1, DIN2, and DIN3 inputs in sequence) at a single constant analog bias voltage, with the cells operating in the linear region, the subthreshold region, or any other region.
[0257] Table 14B: Example table of 4-bit inputs with a single analog bias level for bit-by-bit operation
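[0258]
| DIN3 | DIN2 | DIN1 | DIN0 | VCGIN |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | VCGIN1 |
| 0 | 0 | 0 | 1 | VCGIN1 |
| 0 | 0 | 1 | 0 | VCGIN1 |
| 0 | 0 | 1 | 1 | VCGIN1 |
| 0 | 1 | 0 | 0 | VCGIN1 |
| 0 | 1 | 0 | 1 | VCGIN1 |
| 0 | 1 | 1 | 0 | VCGIN1 |
| 0 | 1 | 1 | 1 | VCGIN1 |
| 1 | 0 | 0 | 0 | VCGIN1 |
| 1 | 0 | 0 | 1 | VCGIN1 |
| 1 | 0 | 1 | 0 | VCGIN1 |
| 1 | 0 | 1 | 1 | VCGIN1 |
| 1 | 1 | 0 | 0 | VCGIN1 |
| 1 | 1 | 0 | 1 | VCGIN1 |
| 1 | 1 | 1 | 0 | VCGIN1 |
| 1 | 1 | 1 | 1 | VCGIN1 |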
[0259] The binary-weighted results for each input bit DIN are summed together in the analog domain (for example, by using the current summer shown in Figure 34) or in the digital domain (for example, by using the implementation of Figure 35 or Figure 44). Figure 44 shows digital summer 4400, which is the same as digital summer 3500 of Figure 35, except that a specific weight is assigned to each output stream generated in response to an input bit.
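Why bit-serial operation at a single bias level (Table 14B) followed by binary-weighted summation reproduces a full multiply-accumulate can be shown with an idealized Python model. The sketch assumes each cell behaves as an ideal current source whose current equals its stored weight whenever its row receives VCGIN1, and zero otherwise; bit_serial_dot and the example values are hypothetical.

```python
def bit_serial_dot(inputs, weights, num_bits=4):
    """Bit-serial multiply-accumulate on one bit line (idealized).

    inputs: integer DIN values (0 .. 2**num_bits - 1), one per row.
    weights: current each cell conducts when its row receives VCGIN1.
    For bit position k, only rows whose bit k is 1 are driven; the bit-line
    current is the sum of those cells' weights and is weighted by 2**k.
    """
    total = 0
    for k in range(num_bits):
        bitline_current = sum(w for x, w in zip(inputs, weights) if (x >> k) & 1)
        total += (2 ** k) * bitline_current
    return total


inputs, weights = [5, 12, 3], [2, 1, 4]
print(bit_serial_dot(inputs, weights))              # 34
print(sum(x * w for x, w in zip(inputs, weights)))  # 34
```

In this example the per-bit bit-line currents for DIN0 through DIN3 are 6, 4, 3, and 1, and 6 + 2*4 + 4*3 + 8*1 gives the same 34 as the direct dot product.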
[0260] In another embodiment, the input signals to the input block of the array are as shown in Table 14C for an exemplary 4-bit input, using multi-bit input operation (e.g., DIN3 together with DIN2, and DIN1 together with DIN0) with four analog bias levels. In one embodiment, for cells operating in the linear region, the four analog levels (e.g., 0V, 0.25V, 0.5V, 1.0V) are linearly spaced to ensure linearly equal scaling of the output cell current. In another embodiment, for cells operating in the subthreshold region, the levels are logarithmically spaced to ensure linear scaling of the output cell current, meaning that the voltage value is proportional to the logarithm of the current of the cell operating in the subthreshold region; for example, for a binary current value, VCGINk = VCGIN(k-1) – (1 / n*Vt)*ln 2.
[0261] Table 14C: Example table of 4-bit inputs with analog bias levels for bit-by-bit operation
[0262]
[0263] The binary-weighted results for the multi-bit inputs DIN[1:0] and DIN[3:2] are added together in the analog (current) domain (for example, in the current summer of Figure 34) or in the digital domain (for example, as in Figure 35 or Figure 39).
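The same idealized model extends to the 2-bit grouping of Table 14C. This hypothetical two_bit_group_dot assumes the four bias levels are chosen so that each cell's current is exactly (2-bit group value) x (weight); the result for DIN[3:2] is then weighted by 4 relative to the result for DIN[1:0].

```python
def two_bit_group_dot(inputs, weights):
    """Multi-bit input sketch: DIN[1:0] and DIN[3:2] applied in two steps.

    Each 2-bit group selects one of four bias levels; the idealized cell is
    assumed to conduct (group value) * weight. The DIN[3:2] result carries a
    binary weight of 4 (2**2) relative to the DIN[1:0] result.
    """
    low = sum((x & 0b11) * w for x, w in zip(inputs, weights))
    high = sum(((x >> 2) & 0b11) * w for x, w in zip(inputs, weights))
    return low + 4 * high


print(two_bit_group_dot([5, 12, 3], [2, 1, 4]))  # 34
```

Compared with fully bit-serial operation, this needs two evaluation steps instead of four, at the cost of four distinct bias levels per row.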
[0264] In another implementation, the input signal is a mixed signal consisting of an analog bias voltage component with an added pulse component (that is, a pulse modulated on an analog bias supply), as shown in Table 15 for an exemplary 4-bit input. The pulse can be modulated by its length (TPULSE) or by the number of pulses (PULSES) within a predetermined time period.
[0265] Table 15: Example of a mixed input with 4-bit inputs having analog bias levels and pulses.
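[0266]
| DIN3 | DIN2 | DIN1 | DIN0 | VCGIN | TPULSE or pulses |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | VCGIN1 | 0X |
| 0 | 0 | 0 | 1 | VCGIN1 | 1X |
| 0 | 0 | 1 | 0 | VCGIN1 | 2X |
| 0 | 0 | 1 | 1 | VCGIN1 | 3X |
| 0 | 1 | 0 | 0 | VCGIN1 | 4X |
| 0 | 1 | 0 | 1 | VCGIN1 | 5X |
| 0 | 1 | 1 | 0 | VCGIN1 | 6X |
| 0 | 1 | 1 | 1 | VCGIN1 | 7X |
| 1 | 0 | 0 | 0 | VCGIN2 | 4X |
| 1 | 0 | 0 | 1 | VCGIN2 | 4.5X |
| 1 | 0 | 1 | 0 | VCGIN2 | 5X |
| 1 | 0 | 1 | 1 | VCGIN2 | 5.5X |
| 1 | 1 | 0 | 0 | VCGIN2 | 6X |
| 1 | 1 | 0 | 1 | VCGIN2 | 6.5X |
| 1 | 1 | 1 | 0 | VCGIN2 | 7X |
| 1 | 1 | 1 | 1 | VCGIN2 | 7.5X |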
[0267] In the table above, the value "4.5X" means a pulse with a width equal to 4.5 times the width of a 1X pulse, or four 1X pulses plus a pulse with a width of half a 1X pulse.
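Table 15 can be read as a compressed pulse code: when DIN3 is 1, the higher bias level VCGIN2 is used and the pulse shrinks to 4X + 0.5X * DIN[2:0]. The Python sketch assumes, hypothetically, that VCGIN2 drives twice the cell current of VCGIN1; under that assumption, bias gain times pulse width equals the decimal value of DIN[3:0] for every row of the table. The names mixed_input, effective_drive, and LEVEL_GAIN are illustrative.

```python
# Assumed relative cell-current gain of each bias level (hypothetical:
# VCGIN2 is taken to drive twice the current of VCGIN1).
LEVEL_GAIN = {"VCGIN1": 1.0, "VCGIN2": 2.0}


def mixed_input(din):
    """Encode a 4-bit DIN per Table 15, returning (bias level, width in X)."""
    low = din & 0b111          # DIN[2:0]
    if din & 0b1000:           # DIN3 = 1: higher bias, compressed pulse
        return "VCGIN2", 4 + 0.5 * low
    return "VCGIN1", float(low)


def effective_drive(din):
    """Relative charge delivered to the cell: level gain times pulse width."""
    level, width = mixed_input(din)
    return LEVEL_GAIN[level] * width


for din in (3, 8, 13):
    print(din, mixed_input(din), effective_drive(din))
```

The maximum pulse length drops from 15X to 7.5X in this encoding, which is the kind of saving in pulses or cycles that the mixed input is meant to provide.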
[0268] The input data is divided into multiple input data sets, each of which is assigned a specific voltage bias level. For example, for an 8-bit input DIN[7:0], a first row bias supply VCGIN1 is applied to the input bits in set DIN[3:0], and a second row bias supply VCGIN2, different from VCGIN1, is applied to the input bits in set DIN[7:4]. In this exemplary implementation, in which the input is partitioned into two binary input sets, the analog bias supply VCGIN2 (for the second data input set DIN[7:4]) generates a cell current that is 2x the cell current generated by the analog bias supply VCGIN1 (for the first data input set DIN[3:0]). For example, for cells operating in the linear region, the ratio VCGIN2 / VCGIN1 could be 2x. Because a different VCGIN voltage is applied to each data set, the same number of pulses with the same period can be applied to members of data input set DIN[7:4] and members of data input set DIN[3:0], since the difference in VCGIN distinguishes between the two sets.
[0269] In a variation of this implementation, two partitions can be used for each input dataset, with each partition corresponding to a different analog bias voltage. This means using four different voltages: VCGIN1, VCGIN2, VCGIN3, and VCGIN4. This further reduces the number of pulses / cycles required. In other words, the four different data input values can use the same number of pulses / cycles because the differences in VCGIN will distinguish the four different values.
[0270] Referring again to Figure 39, output block 3900 receives the output current ICELL from the VMM in response to input DINx. A / D converter 3902 converts ICELL into digital form DOUT_n[m:0], which represents the digital value of the output generated in response to DINn, where each DOUT_n is a set of one or more output bits.
[0271] Shifter 3903, adder 3904, and register 3905 operate to apply a different weight to each output DOUT_n[m:0] generated in response to each input bit DINn. In the simple example of four input bits (DIN0 through DIN3), shifter 3903, adder 3904, and register 3905 perform the following actions:
[0272] (1) In response to DIN0, shifter 3903 receives DOUT_0[m:0] and does not shift it, producing the result of (1);
[0273] (2) In response to DIN1, shifter 3903 receives DOUT_1[m:0] and shifts it one bit to the left, and adder 3904 adds the shift result to the result of (1) to produce the result of (2);
[0274] (3) In response to DIN2, shifter 3903 receives DOUT_2[m:0] and shifts it two bits to the left, and adder 3904 adds the shift result to the result of (2) to produce the result of (3);
[0275] (4) In response to DIN3, shifter 3903 receives DOUT_3[m:0] and shifts it three bits to the left, and adder 3904 adds the shifted result to the result of (3) to produce the result of (4), which is the final result DOUT[m:0].
[0276] When the DIN[n:0] inputs are combined with analog voltage levels that represent the binary weight of each data input bit, only addition is required for such mixed inputs, without bit-by-bit shifting. Output register 3905 stores and outputs the result of (4) as DOUT.
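The shift-and-add sequence in steps (1) through (4) is equivalent to accumulating each DOUT_n shifted left by n bits. A minimal Python sketch, where shift_and_add is a hypothetical name and the per-bit ADC results are assumed to be unsigned integers:

```python
def shift_and_add(dout_per_bit):
    """Accumulate per-bit ADC results as shifter 3903 and adder 3904 do.

    dout_per_bit[n] is the unsigned ADC result DOUT_n produced in response
    to DINn. Each is shifted left by n bits and added to the running total,
    which models the value finally held in output register 3905.
    """
    register = 0
    for n, dout_n in enumerate(dout_per_bit):
        register += dout_n << n   # n = 0: no shift; n = 1: one bit left; ...
    return register


print(shift_and_add([6, 4, 3, 1]))  # 34
```

With per-bit results 6, 4, 3, and 1, the register ends at 6 + 8 + 12 + 8 = 34. For mixed inputs whose bias levels already carry the binary weight, the shift would be omitted and only the additions kept.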
[0277] Additional input and output circuits
[0278] Figure 35 shows digital summer 3500, which receives multiple digital values, adds them together, and generates an output DOUT representing the sum of the inputs. Digital summer 3500 can be used during verify or read operations. Figure 35 shows an example of a 4-bit digital value comprising bits DOUT0, DOUT1, DOUT2, and DOUT3, each generated from an evaluation input pulse. Each bit can be weighted based on its bit position, where a weight of 2^n is applied to the bit generated in response to input DINn. For example, DOUT3 can be multiplied by 2^3 (=8), DOUT2 by 2^2 (=4), DOUT1 by 2^1 (=2), and DOUT0 by 2^0 (=1).
[0279] Figure 36A shows integrating dual-slope ADC 3600 applied to an output neuron to convert the array cell current into digital output bits DOUTx. An integrator, consisting of integrating operational amplifier 3601 and integrating capacitor 3602, integrates the cell current ICELL and the reference current IREF. As shown in Figure 36B, during a fixed time t1 (integration time), the cell current is integrated up (VOUT rises), and then the reference current is integrated down during time t2 (VOUT falls, de-integration time). The cell current is then Icell = t2 / t1 * IREF. For example, t1 uses 1024 cycles for a 10-bit digital resolution, and t2 varies from 0 to 1024 cycles depending on the Icell value. Digital counter 3630, enabled by signal EC, generates the digital output bits DOUTx during period t2.
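The dual-slope relation Icell = t2 / t1 * IREF can be illustrated with a discrete-time Python model in which each loop iteration stands for one clock cycle. The function name, the unit capacitance and clock period, and the clamp at full scale are assumptions of this sketch, not details from the text.

```python
def dual_slope_count(i_cell, i_ref, t1_cycles=1024, c_int=1.0, t_clk=1.0):
    """Discrete-time sketch of an integrating dual-slope conversion.

    Phase 1 (t1): integrate i_cell for a fixed number of clock cycles.
    Phase 2 (t2): integrate -i_ref and count cycles until VOUT returns to 0.
    The count t2 satisfies i_cell ~= t2 / t1 * i_ref.
    """
    v_out = 0.0
    for _ in range(t1_cycles):            # fixed integration time t1
        v_out += i_cell * t_clk / c_int
    t2 = 0
    while v_out > 0 and t2 < t1_cycles:   # de-integration, clamped at full scale
        v_out -= i_ref * t_clk / c_int
        t2 += 1
    return t2


t1 = 1024
count = dual_slope_count(0.25, 1.0, t1_cycles=t1)
print(count, count / t1 * 1.0)
```

Here i_cell = 0.25 and i_ref = 1.0 give a count of 256, and 256 / 1024 * 1.0 recovers 0.25. Because the capacitance and clock period cancel out of the ratio, only the ratio of the two currents matters, which is the usual motivation for a dual-slope design.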
[0280] Figure 36C shows integrating single-slope ADC 3660 applied to an output neuron to convert the array cell current into digital output bits. The integrator circuit, consisting of integrating operational amplifier 3661, integrating capacitor 3662, switch S3, and comparator 3664, integrates the array cell current ICELL 3666 and generates output signal EC.
[0281] As shown in diagram 3670 of Figure 36D, cell current ICELL1 is integrated up during time t1 (VOUT rises until it reaches VREF2, at which point the value of EC in Figure 36C changes), and another cell current ICELL2 is integrated during time t2. The cell current is ICELL = Cint * VREF2 / t, where t is the time elapsed before EC changes value. Pulse counter 3668, enabled by signal EC, counts the number of pulses during integration time t, and the number of pulses represents the digital output value DOUTx.
[0282] In the example shown, the digital output for t1 is smaller than the digital output for t2 because the count during t1 is smaller than the count during t2. This also means that cell current ICELL1 during time period t1 is greater than cell current ICELL2 during time period t2. An initial calibration is performed to calibrate the value of integrating capacitor 3662 using reference current Iref and fixed time Tref: Cint = Tref * Iref / VREF2.
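The single-slope relations ICELL = Cint * VREF2 / t and Cint = Tref * Iref / VREF2 can be exercised with a similar discrete-time Python sketch, one loop iteration per clock period. The helper names, the unitless values, and the max_count limit are hypothetical.

```python
def calibrate_cint(t_ref, i_ref, v_ref2):
    """Initial calibration of the integrating capacitor: Cint = Tref * Iref / VREF2."""
    return t_ref * i_ref / v_ref2


def single_slope_count(i_cell, c_int, v_ref2, t_clk, max_count=4096):
    """Count clock periods until the integrated VOUT reaches VREF2 (EC flips)."""
    v_out, count = 0.0, 0
    while v_out < v_ref2 and count < max_count:
        v_out += i_cell * t_clk / c_int
        count += 1
    return count


def cell_current(count, t_clk, c_int, v_ref2):
    """ICELL = Cint * VREF2 / t, where t = count * t_clk."""
    return c_int * v_ref2 / (count * t_clk)


c_int = calibrate_cint(t_ref=128.0, i_ref=1.0, v_ref2=1.0)
for i_cell in (2.0, 4.0):
    n = single_slope_count(i_cell, c_int, 1.0, 1.0)
    print(i_cell, n, cell_current(n, 1.0, c_int, 1.0))
```

Calibrating with Tref = 128, Iref = 1.0, and VREF2 = 1.0 gives Cint = 128; a cell current of 2.0 then takes 64 clock periods to reach VREF2, while 4.0 takes only 32, matching the observation that a larger current produces a smaller count.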
[0283] Figure 36E shows integrating dual-slope ADC 3680, comprising ICELL 3684, comparator 3681, switches S1, S2, and S3, capacitor 3682, and reference current source 3683. Integrating dual-slope ADC 3680 receives the output neuron current (ICELL 3684) and generates output EC. Integrating dual-slope ADC 3680 does not use an integrating operational amplifier; the cell current or reference current is integrated directly on capacitor 3682. Pulse counter 3687, enabled by signal EC, counts pulses during the integration time, which ends when EC changes value. The output of the pulse counter is a digital output DOUTx representing ICELL. The current ICELL = t2 / t1 * IREF.
[0284] Figure 36F shows integrating single-slope ADC 3690, comprising ICELL 3694, comparator 3691, switches S2 and S3, and capacitor 3692. Integrating single-slope ADC 3690 receives the output neuron current (ICELL 3694) and generates output EC. Integrating single-slope ADC 3690 does not use an integrating operational amplifier; the cell current is integrated directly on capacitor 3692. Pulse counter 3697, enabled by signal EC, counts the digital output pulses during the integration time, which ends when EC changes value. The output of the pulse counter is the digital output DOUTx representing ICELL. The cell current ICELL = Cint * VREF2 / t.
[0285] Figure 37A shows SAR (successive approximation register) ADC 3700 applied to the output neurons to convert cell (array) current into digital output bits. The cell current can be dropped across a resistor to convert it into a voltage VCELL. Alternatively, the cell current can charge an S / H capacitor to convert it into VCELL. A binary search is used to compute the bits, starting from the MSB (most significant bit). Based on the digital bits from SAR 3701, DAC 3702 sets the appropriate analog reference voltage for comparator 3703. The output of comparator 3703 is fed back to SAR 3701 to select the next analog level. As shown in Figure 37B, for the example of a 4-bit digital output, there are 4 evaluation cycles: the first pulse evaluates DOUT3 by setting the analog level to the middle of the range, and the second pulse evaluates DOUT2 by setting the analog level to the middle of either the upper or the lower half. DOUT1 and DOUT0 similarly divide the remaining range in half. Another implementation can use a SAR ADC with a charge-redistribution CDAC to convert the neuron current into digital output bits.
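The binary search performed by SAR 3701, DAC 3702, and comparator 3703 can be sketched in Python as follows. The sketch assumes an ideal DAC whose output is code * v_full_scale / 2**num_bits and an ideal comparator; sar_convert and its parameters are hypothetical.

```python
def sar_convert(v_cell, v_full_scale, num_bits=4):
    """Successive-approximation sketch: resolve DOUT bits from the MSB down.

    Each cycle tries the current code with the next bit set, converts it to a
    DAC level, and keeps the bit if the comparator finds v_cell at or above
    that level. Each step halves the remaining search range.
    """
    code = 0
    for bit in reversed(range(num_bits)):
        trial = code | (1 << bit)
        v_dac = trial * v_full_scale / (1 << num_bits)
        if v_cell >= v_dac:
            code = trial
    return code


print(format(sar_convert(0.6, 1.0), "04b"))  # 1001
```

For v_cell = 0.6 and a full scale of 1.0, the trial codes are 8 (kept), 12 (rejected), 10 (rejected), and 9 (kept), so DOUT[3:0] = 1001 after exactly four evaluation cycles.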
[0286] Figure 38 shows Σ-Δ ADC 3800 applied to an output neuron to convert cell current into digital output bits. An integrator, consisting of operational amplifier 3801 and capacitor 3805, integrates the sum of the selected cell current and the reference current from 1-bit current DAC 3804. Comparator 3802 compares the integrated output voltage with a reference voltage. Clocked DFF 3803 provides a digital output stream based on the output of comparator 3802. The digital output stream is typically passed through a digital filter before being output as digital output bits.
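A first-order Σ-Δ loop of the kind described can be modeled in a few lines of Python. This is an idealized, discrete-time sketch: the cell current and the DAC feedback are assumed to enter the integrator with opposite signs, the comparator threshold is zero, and the digital filter is replaced by simple averaging. The function names are hypothetical.

```python
def sigma_delta_bits(i_cell, i_ref, num_cycles):
    """First-order sigma-delta sketch with a 1-bit current DAC.

    Each clock, the integrator adds i_cell and subtracts the DAC feedback
    (i_ref if the previous output bit was 1, else 0). The comparator result,
    latched by the DFF, becomes the next output bit.
    """
    integ, bit, bits = 0.0, 0, []
    for _ in range(num_cycles):
        integ += i_cell - (i_ref if bit else 0.0)
        bit = 1 if integ > 0 else 0
        bits.append(bit)
    return bits


def average_filter(bits):
    """Simplest stand-in for the digital filter: the density of ones."""
    return sum(bits) / len(bits)


stream = sigma_delta_bits(0.25, 1.0, 64)
print(stream[:8], average_filter(stream))
```

With i_cell = 0.25 and i_ref = 1.0, the bit stream settles into the repeating pattern 1, 0, 0, 0, so averaging 64 bits yields 0.25. The model is only meaningful for 0 <= i_cell <= i_ref.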
[0287] Figure 45 shows output block 4500. Output block 4500 includes current-to-voltage converter 4501 and analog-to-digital converter 4502. Output block 4500 receives an output current, shown here as Ineu, from a VMM array, where the output current represents the output value from the VMM array during a read or verify operation. Current-to-voltage converter 4501 converts output current Ineu into a voltage signal, shown here as VOUT, such that voltage VOUT represents the output current Ineu from the VMM. A / D converter 4502 converts voltage VOUT into digital form and outputs a digital output, shown here as DOUT.
[0288] In one specific implementation of output block 4500, current-to-voltage converter 4501 receives a current sequence from one or more selected non-volatile memory cells in an array in response to an input sequence, and converts the current sequence into a voltage sequence. A / D converter 4502 then converts the voltage sequence received from current-to-voltage converter 4501 into multiple output bits, wherein the multiple output bits are generated based on a weighted sum of the voltage sequence.
[0289] Figure 46 shows lossless (no I*Rmux drop) current-to-voltage converter 4600, which is an implementation of current-to-voltage converter 4501 in Figure 45. Current-to-voltage converter 4600 includes operational amplifier 4601; resistors 4602, 4603, 4604, and 4605; and switches 4606, 4607, 4608, 4609, 4610, 4611, 4612, and 4613. A lossless variable resistor unit consists of a resistor and two switches (muxes), such as resistor 4602 and switches 4610 and 4606, one of which carries current (switch 4610) while the other does not (switch 4606); the output is taken from the switch that does not carry current.
[0290] Current-to-voltage converter 4600 receives a current Ineu and outputs a voltage VOUT. Notably, because of the multiplexing technique used for the output voltage VOUT, VOUT can be measured without experiencing a voltage drop (the I*R drop across the switches); that is, the output voltage is sampled outside the feedback loop or current loop. For example, when switches 4613 and 4609 are closed (on) and the other switches are open (off), VOUT equals VREF + (R4602 + R4603 + R4604 + R4605) * Ineu. Similarly, when switches 4610 and 4606 are closed (on) and the other switches are open (off), VOUT equals VREF + R4602 * Ineu. After current Ineu is converted to voltage VOUT, VOUT can be sampled and held by opening all switches. In this case, voltage VOUT is referenced to the reference level VREF.
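The two VOUT expressions above generalize to any number of series segments in the current path. The Python sketch models the ideal lossless case and, for contrast, what would be sensed on the current-carrying side of a switch with on-resistance r_switch. The function names, the segment values, and r_switch are hypothetical, and the mapping of intermediate tap counts to particular switch pairs is not modeled.

```python
def lossless_iv_vout(i_neu, v_ref, segments, taps):
    """Ideal VOUT of a tapped-resistor current-to-voltage converter.

    segments: series resistor values, e.g. [R4602, R4603, R4604, R4605].
    taps: how many segments carry Ineu (1 = R4602 only, 4 = all four).
    VOUT is sensed through a switch that carries no current, so no
    I*Rmux drop appears: VOUT = VREF + (sum of selected R) * Ineu.
    """
    if not 1 <= taps <= len(segments):
        raise ValueError("taps must be between 1 and len(segments)")
    return v_ref + sum(segments[:taps]) * i_neu


def sensed_through_switch(i_neu, v_ref, segments, taps, r_switch):
    """Contrast case: sensing on the current-carrying side adds I * Rswitch."""
    return lossless_iv_vout(i_neu, v_ref, segments, taps) + r_switch * i_neu


segments = [1000.0, 1000.0, 2000.0, 4000.0]   # hypothetical ohms
print(lossless_iv_vout(1e-4, 0.5, segments, 1))   # about 0.6 V
print(lossless_iv_vout(1e-4, 0.5, segments, 4))   # about 1.3 V
```

The difference between the two functions, r_switch * Ineu, is the error term that the lossless arrangement avoids by taking VOUT from the switch that carries no current.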
[0291] Figure 47 shows current-to-voltage converter 4700, which is an implementation of current-to-voltage converter 4501 in Figure 45. Current-to-voltage converter 4700 includes comparator 4701; switches 4702, 4705, 4706, and 4707; S / H (sample and hold) capacitor 4703; and variable resistor 4704. Current-to-voltage converter 4700 receives a current Ineu and outputs an S / H voltage VOUT. Lossless variable resistor 4704 is similar to the lossless variable resistor unit used in Figure 46. During the current-to-voltage conversion, current Ineu flows through resistor 4704 to generate an output voltage R4704 * Ineu; S1 (4702), S2 (4705), and S3 (4707) are closed (on), while S4 (4706) is open (off). Since S3 does not carry current, the output VOUT = R4704 * Ineu. During the hold period, S4 is closed (on), S1, S2, and S3 are open (off), and VOUT remains across capacitor 4703. Notably, because VOUT is taken outside the current-carrying switch, VOUT can be measured without experiencing a voltage drop.
[0292] Figure 48 shows current-to-voltage converter 4800, which is an implementation of current-to-voltage converter 4501 in Figure 45. Current-to-voltage converter 4800 includes operational amplifier 4801; switches 4802 and 4805; S / H capacitor 4803; and variable resistor 4804. Current-to-voltage converter 4800 receives a current Ineu and outputs a voltage VOUT. Notably, since VOUT is taken outside the current-carrying switches (multiplexers), VOUT can be measured without a voltage drop. During the current-to-voltage conversion, current Ineu flows through resistor 4804 to generate an output voltage R4804 * Ineu, and S1 (4802) and S2 (4805) are closed (on). Since S2 does not carry current, the output VOUT = R4804 * Ineu. During the hold period, S1 and S2 are open (off) and VOUT remains across capacitor 4803; in this case, the S / H voltage VOUT is referenced to ground level.
[0293] Alternatively, current-to-voltage converters 4700 and 4800 need not include variable resistor 4704 or 4804. In this case, S / H capacitor 4703 or 4803 is charged by Ineu through switch 4702 or 4802, which is enabled by a variable signal pulse with a predetermined, finely adjustable timing value, where the timing value is selected based on the dynamic current range of Ineu. In this case, the S / H capacitor can be a variable capacitor with a finely adjustable capacitance value.
[0294] Figure 49A shows current-to-voltage converter 4900, which is an implementation of current-to-voltage converter 4501 in Figure 45. Current-to-voltage converter 4900 includes switches 4901 and 4902; variable resistor 4903; and capacitor 4904. Current-to-voltage converter 4900 receives a current Ineu and outputs a voltage VOUT. During the current-to-voltage conversion, current Ineu flows through variable resistor 4903 to generate an output voltage R4903 * Ineu, S1 (4901) is closed (on), and the output VOUT = R4903 * Ineu. During the hold period, S1 is open (off), VOUT remains across capacitor 4904, and the switches (multiplexers) inside variable resistor 4903 (e.g., S1a / S2a / S3a / S4a in variable resistor 4950 of Figure 49B) are also open (off). Notably, since VOUT is taken outside the current-carrying switches (multiplexers), VOUT can be measured without suffering a voltage drop.
[0295] Figure 49B shows variable resistor 4950, which can be used as variable resistor 4903. Variable resistor 4950 includes switches S1a, S2a, S3a, S4a, S1b, S2b, S3b, and S4b.
[0296] Figure 50 shows current-to-voltage converter 5000, which is an implementation of current-to-voltage converter 4501 in Figure 45. Current-to-voltage converter 5000 includes operational amplifier (op amp) 5001; level shifter 5002 (gate of transistor 5004 = drain of transistor 5004 - Voffset); NMOS transistor 5003; PMOS transistors 5004 and 5005; switches 5006 and 5007; and variable resistor 5008 (which can be configured as described above for Figure 49B). Current-to-voltage converter 5000 receives a current Ineu and outputs a voltage VOUT. Notably, as in Figure 49A, VOUT can be measured without experiencing a voltage drop. In this case, the S / H voltage VOUT is referenced to ground level. Op amp 5001 and transistor 5003 impose a fixed bias voltage VREF 5010 on the bit lines of the array during read operations. PMOS transistors 5004 and 5005 act as a variable-ratio current mirror that mirrors the array output current (Ineu) into variable resistor 5008 and S / H capacitor 5009. Alternatively, current-to-voltage converter 5000 may not include variable resistor 5008, in which case the mirrored Ineu charges S / H capacitor 5009 through switch 5006, which is enabled by a variable signal pulse with a predetermined, finely tuned timing value, where the timing value is selected based on the dynamic current range of Ineu. In this case, the S / H capacitor may be a variable capacitor with a finely tuned capacitance value.
[0297] Figure 51 shows current-to-voltage converter 5100, which is an implementation of current-to-voltage converter 4501 in Figure 45. Current-to-voltage converter 5100 includes operational amplifier 5101; NMOS transistor 5102; variable resistor 5103 (which can be configured as described above for Figure 49B); switches 5104 and 5105; capacitor 5106; and voltage source VH. Current-to-voltage converter 5100 receives a current Ineu and outputs a voltage VOUT. Notably, as in Figure 50, VOUT can be measured without experiencing a voltage drop; in this case, the S / H voltage VOUT is referenced to the high-voltage supply VH. Op amp 5101 and transistor 5102 impose a fixed bias VREF 5110 on the bit line during read operations. Alternatively, current-to-voltage converter 5100 does not include variable resistor 5103; in this case, S / H capacitor 5106 is discharged by Ineu through a switch enabled by a variable signal pulse with a predetermined, finely tuned timing value, where the timing value is selected based on the dynamic current range of Ineu. In this case, the S / H capacitor can be a variable capacitor with a finely tuned capacitance value.
[0298] Figure 52A shows hybrid serial analog-to-digital converter 5200, which uses lossless current-to-voltage converter 4800 described above with reference to Figure 48. Hybrid serial ADC 5200 includes current-to-voltage converter 5220, comparator 5209, current sources 5206 and 5207, and switches S1 and S2. Current-to-voltage converter 5220 can alternatively be implemented using any of the current-to-voltage converters described above with reference to Figures 46, 47, 49, 50, and 51. Current-to-voltage converter 5220 includes operational amplifier 5201, switch 5205, variable resistor 5204, and sample-and-hold capacitor 5203.
[0299] Figure 52B shows timing diagram 5250, which illustrates the operation of hybrid serial ADC 5200. During time period t1, switch 5202 (S2) is closed while switch 5208 (S1) remains open, converting current ICELL 5206 into voltage VOUT, which is then held on capacitor 5203 by opening switch 5202 (S2). During period t2, IREF 5207 is enabled by closing switch 5208 (S1) to begin the de-integration period, which is the counting period. During this counting period, clock pulses (not shown) are counted by pulse counter 5210 for as long as EC is high, and the number of counts is the digital output DOUT. The digital counter and clock used to convert the comparator output EC into digital bits, as well as the control logic components, are not shown.
[0300] Figure 52C shows another timing diagram 5250 illustrating the operation of hybrid serial ADC 5200, in which period t1 is the same as in Figure 52B. During time period t2, voltage VOUT is converted into digital bits DOUT by ramping the reference voltage VREF2 from a reference level such as VREF1 up to its maximum level. During time period t2, a digital counter (not shown) counts pulses, and the output of the digital counter is the output DOUT. Time period t2 ends when VREF2 exceeds VOUT, which causes the value of EC in Figure 52A to change.
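Both counting schemes for hybrid serial ADC 5200 can be sketched in Python, with each loop iteration standing for one clock pulse: de-integration of the held VOUT by IREF (Figure 52B) and a ramped VREF2 (Figure 52C). The function names, the unitless values, and the max_count limit are assumptions for illustration only.

```python
def ramp_count(v_out, v_ref1, ramp_step, max_count=1024):
    """Figure 52C style: ramp VREF2 up from VREF1 and count clock pulses.

    Counting stops once VREF2 is no longer below the held VOUT, i.e. when
    the comparator output EC would change.
    """
    v_ref2, count = v_ref1, 0
    while v_ref2 < v_out and count < max_count:
        v_ref2 += ramp_step
        count += 1
    return count


def deintegration_count(v_out, i_ref, c_hold, t_clk, max_count=1024):
    """Figure 52B style: discharge the held VOUT with IREF; count while EC is high."""
    v, count = v_out, 0
    while v > 0 and count < max_count:
        v -= i_ref * t_clk / c_hold
        count += 1
    return count


print(ramp_count(0.5, 0.0, 1 / 64))
print(deintegration_count(0.5, 1.0, 64.0, 1.0))
```

With VOUT = 0.5, either a ramp step of 1/64 starting from VREF1 = 0 or a discharge of 1/64 per clock gives a count of 32, so both schemes yield the same digital code for the same held voltage.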
[0301] It should be noted that, as used herein, the terms “above” and “on” both encompass “directly on” (without intermediate material, elements, or space between) and “indirectly on” (with intermediate material, elements, or space between). Similarly, the term “adjacent” includes “directly adjacent” (without intermediate material, elements, or space between) and “indirectly adjacent” (with intermediate material, elements, or space between), “mounted to” includes “directly mounted to” (without intermediate material, elements, or space between) and “indirectly mounted to” (with intermediate material, elements, or space between), and “electrically coupled to” includes “directly electrically coupled to” (without intermediate material or elements electrically connecting the elements together) and “indirectly electrically coupled to” (with intermediate material or elements electrically connecting the elements together). For example, forming an element “above the substrate” can include forming an element directly on the substrate without intermediate material / elements between them, and forming an element indirectly on the substrate with one or more intermediate materials / elements between them.
Claims
1. A system for a non-volatile memory cell array, the system comprising: a vector-matrix multiplication array in an artificial neural network, the array comprising multiple non-volatile memory cells; an input block configured to receive a digital input DIN[n:0] comprising n+1 bits and to apply the digital input to the array one bit at a time, the application of the digital input to the array one bit at a time being performed by converting the bit into an analog input signal and applying the analog input signal to a row of non-volatile memory cells in the array, wherein a "1" of the bit is converted into an analog input signal, the magnitude of the voltage of the analog input signal depending on the bit position in the digital input; and an output block configured to receive an output current from the array in response to each analog input signal applied to the array, and to combine the output currents to form an output DOUT[m:0], the output block comprising: an analog-to-digital converter for generating a digital value DOUT_n in response to each analog input signal applied to the array; an adder that adds the digital values from the analog-to-digital converter to generate the output DOUT[m:0]; and a register that stores the output DOUT[m:0].
2. The system according to claim 1, wherein the non-volatile memory cell is a split-gate flash memory cell.
3. The system according to claim 1, wherein the non-volatile memory cell is a stacked gate flash memory cell.
4. The system according to claim 1, wherein the non-volatile memory cell is a split-gate flash memory cell.
5. The system of claim 1, wherein the non-volatile memory cell is a stacked gate flash memory cell.
6. A system for a non-volatile memory cell array, the system comprising: a vector-matrix multiplication array in an artificial neural network, the array comprising multiple non-volatile memory cells; an input block configured to receive a digital input DIN[n:0] comprising n+1 bits and to apply the digital input to the array one bit at a time, the application of the digital input to the array one bit at a time being performed by converting the bit into an analog input signal and applying the analog input signal to a row of non-volatile memory cells in the array, wherein a "1" of the bit is converted into an analog input signal, the magnitude of the voltage of the analog input signal depending on the bit position in the digital input; and an output block configured to receive an output current from the array in response to each analog input signal applied to the array, and to combine the output currents to form an output DOUT[m:0], the output block comprising: a current-to-voltage converter for receiving current from one or more selected non-volatile memory cells in the array in response to an input applied to the array and converting the current into a voltage, the current-to-voltage converter including sample-and-hold circuitry for holding the voltage; and an analog-to-digital converter (ADC) for converting the voltage into multiple output bits.
7. The system of claim 6, wherein the current-to-voltage converter includes a lossless variable resistor unit that provides the voltage.
8. The system of claim 6, wherein the analog-to-digital converter is a hybrid serial analog-to-digital converter.
9. The system of claim 6, wherein the analog-to-digital converter performs counting to convert the voltage into digital bits.
10. The system of claim 9, wherein the period of the counting is determined by a reference current that discharges the holding capacitor.
11. The system of claim 9, wherein the period of the counting is determined by a reference voltage ramp until it crosses a threshold voltage.
12. The system of claim 6, wherein the non-volatile memory cell is a split-gate flash memory cell.
13. The system of claim 6, wherein the non-volatile memory cell is a stacked gate flash memory cell.
14. The system of claim 6, wherein the current-to-voltage converter converts the current sequence into a voltage sequence without a voltage drop from the output current of the output block.
15. The system of claim 6, wherein the current-to-voltage converter includes an operational amplifier that provides the voltage.
16. The system of claim 6, wherein the current-to-voltage converter includes a capacitor that is charged, through a control switch, by current from one or more selected non-volatile memory cells.
17. The system of claim 16, wherein the capacitor is a variable capacitor.
18. The system of claim 17, wherein the variable capacitor is adjustable.
19. The system of claim 16, wherein the capacitor is charged during a first time period that ends when the voltage exceeds a reference voltage.
20. The system of claim 19, wherein the current-to-voltage converter includes an operational amplifier for comparing the voltage with the reference voltage.
21. The system of claim 19, wherein the capacitor is discharged during a second time period that ends when the voltage provided by the current-to-voltage converter reaches ground.
22. The system of claim 21, wherein a counter counts clock pulses during the second time period to output a count, wherein the count is a digital version of the voltage.