Vector matrix multiplication array using analog input

Non-volatile memory arrays are used in artificial neural networks to address the inefficiencies of existing hardware, enhancing energy efficiency and reducing complexity by performing calculations directly in memory, thus improving neural network performance.

JP7875985B2 · Active · Publication Date: 2026-06-18 · SILICON STORAGE TECHNOLOGY INC


Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Patents
Current Assignee / Owner: SILICON STORAGE TECHNOLOGY INC
Filing Date: 2022-07-15
Publication Date: 2026-06-18

Application Information

IPC: G06N3/065; G11C11/54; G06G7/60; G06G7/16; G11C16/08; G06G7/184

CPC: G06F17/16; G06N3/0464; G06N3/048; G06N3/0442; G06N3/065; G11C7/1006; G11C11/54; G06G7/16

AI Tagging

Application Domain

Read-only memories Biological models


AI Technical Summary

Technical Problem

Existing artificial neural networks face challenges in high-performance information processing due to the lack of suitable hardware technology, particularly in terms of energy efficiency and scalability, as they rely on bulky CMOS-implemented synapses and digital supercomputers.

Method used

Utilizing non-volatile memory arrays as synapses in artificial neural networks, allowing for precise tuning of memory cells independently and continuously, enabling efficient vector-matrix multiplication without separate multiplication and addition logic circuits.

Benefits of technology

This approach enhances energy efficiency and reduces hardware complexity by performing calculations directly in memory, eliminating the need for additional logic circuits and enabling precise weight tuning for neural network synapses.

✦ Generated by Eureka AI based on patent content.


Patent Text Reader

Abstract

Numerous examples of artificial neural networks comprising vector matrix multiplication arrays utilizing analog inputs are disclosed. In one example, a system comprises a vector matrix multiplication array comprising an array including a plurality of non-volatile memory cells arranged in rows and columns, a capacitor having a first terminal and a second terminal coupled to a common potential, a row decoder for enabling application of an input signal to the first terminal of the capacitor in response to an address, and a buffer coupled to the first terminal of the capacitor for generating an output voltage for each row of the vector matrix multiplication array.


Description

Technical Field

[0001] (Claim of Priority) This application claims priority to U.S. Provisional Patent Application No. 63/328,473, filed Apr. 7, 2022, entitled “Artificial Neural Network Comprising Vector-By-Matrix Multiplication Arrays Utilizing Analog Inputs and Analog Outputs,” and U.S. Patent Application No. 17/847,486, filed Jun. 23, 2022, entitled “Vector-By-Matrix-Multiplication Array Utilizing Analog Inputs.”

[0002] (Field of the Invention) Numerous examples of artificial neural networks comprising vector-matrix multiplication arrays utilizing analog inputs are disclosed.

Background Art

[0003] An artificial neural network mimics a biological neural network (the central nervous system of an animal, particularly the brain) and is used to estimate or approximate a function that depends on a number of inputs and is generally unknown. An artificial neural network typically includes layers of interconnected “neurons” that exchange messages with each other.

[0004] FIG. 1 shows an artificial neural network, in which the circles represent the inputs or layers of neurons. Connections (referred to as synapses) are represented by arrows and have numerical weights that can be tuned based on experience. This makes the neural network adaptive to its inputs and capable of learning. Typically, a neural network includes a layer of multiple inputs. There are typically one or more intermediate layers of neurons and an output layer of neurons that provides the output of the neural network. The neurons at each level make decisions, individually or collectively, based on the data received from the synapses.

[0005] One of the main challenges in the development of artificial neural networks for high-performance information processing is the lack of suitable hardware technology. In fact, practical neural networks rely on a very large number of synapses, which enables high connectivity between neurons, i.e., a very high degree of parallelization of computational processing. In principle, such complexity can be realized by a digital supercomputer or a dedicated graphics processing unit cluster. However, in addition to high costs, these approaches also suffer from poor energy efficiency compared to biological networks, which mainly perform low-precision analog calculations and consume far less energy. CMOS analog circuits have been used in artificial neural networks, but most CMOS-implemented synapses have been too bulky when assuming a large number of neurons and synapses.

[0006] The applicant has previously disclosed in U.S. Patent Application Publication No. 2017/0337466 (A1), incorporated herein by reference, an artificial (analog) neural network that utilizes one or more non-volatile memory arrays as synapses. The non-volatile memory array operates as an analog neural memory and includes non-volatile memory cells arranged in rows and columns. The neural network includes a first plurality of synapses configured to receive a first plurality of inputs and generate therefrom a first plurality of outputs, and a first plurality of neurons configured to receive the first plurality of outputs. The first plurality of synapses includes a plurality of memory cells, each of the memory cells including spaced-apart source and drain regions formed in a semiconductor substrate with a channel region extending therebetween, a floating gate disposed over and insulated from a first portion of the channel region, and a non-floating gate disposed over and insulated from a second portion of the channel region. Each of the plurality of memory cells stores a weight value corresponding to the number of electrons on the floating gate. The plurality of memory cells multiply the first plurality of inputs by the stored weight values to generate the first plurality of outputs.

<Non-volatile memory cell>

[0007] Non-volatile memory is well known. For example, U.S. Patent No. 5,029,130 ("'130"), incorporated herein by reference, discloses an array of split-gate non-volatile memory cells, a type of flash memory cell. Such a memory cell 210 is shown in Figure 2. Each memory cell 210 includes a source region 14 and a drain region 16 formed in a semiconductor substrate 12, with a channel region 18 between the source region 14 and the drain region 16. A floating gate 20 is formed insulated above a first portion of the channel region 18 (and controlling the conductivity of the first portion of the channel region 18) and extends above a portion of the source region 14. A word line terminal 22 (typically coupled to a word line) has a first portion disposed insulated above a second portion of the channel region 18 (and controlling the conductivity of the second portion of the channel region 18) and a second portion extending upward above the floating gate 20. The floating gate 20 and word line terminal 22 are insulated from the substrate 12 by the gate oxide. The bit line 24 is coupled to the drain region 16.

[0008] The memory cell 210 is erased (electrons are removed from the floating gate) by applying a high positive voltage to the word line terminal 22, which causes the electrons on the floating gate 20 to tunnel through the intermediate insulation from the floating gate 20 to the word line terminal 22 via Fowler-Nordheim (FN) tunneling.

[0009] The memory cell 210 is programmed (electrons are added to the floating gate) by source-side injection (SSI) with hot electrons, by applying a positive voltage to the word line terminal 22 and a positive voltage to the source region 14. Electron current flows from the drain region 16 towards the source region 14. The electrons accelerate and become heated (energized) when they reach the gap between the word line terminal 22 and the floating gate 20. Some of the heated electrons are injected through the gate oxide onto the floating gate 20 due to the electrostatic attraction from the floating gate 20.

[0010] The memory cell 210 is read by applying positive read voltages to the drain region 16 and the word line terminal 22 (which turns on the portion of the channel region 18 below the word line terminal). When the floating gate 20 is positively charged (i.e., erased of electrons), the portion of the channel region 18 below the floating gate 20 is also turned on, and current flows through the channel region 18, which is sensed as the erased state, or the "1" state. When the floating gate 20 is negatively charged (i.e., programmed with electrons), the portion of the channel region 18 below the floating gate 20 is mostly or entirely turned off, and no (or very little) current flows through the channel region 18, which is sensed as the programmed state, or the "0" state.

[0011] Table 1 shows typical voltage/current ranges that may be applied to the terminals of the memory cell 210 to perform read, erase, and program operations.

Table 1: Operation of flash memory cell 210 in Figure 2
[Table 1]

[0012] Other split-gate memory cell configurations, which are other types of flash memory cells, are also known. For example, Figure 3 illustrates a four-gate memory cell 310 comprising a source region 14, a drain region 16, a floating gate 20 above a first portion of the channel region 18, a select gate 22 (typically coupled to a word line (WL)) above a second portion of the channel region 18, a control gate 28 above the floating gate 20, and an erase gate 30 above the source region 14. This configuration is described in U.S. Patent No. 6,747,310, which is incorporated herein by reference for all purposes. Here, all gates except the floating gate 20 are non-floating gates; that is, they are electrically connected or connectable to a voltage source. Programming is performed by heated electrons from the channel region 18 injecting themselves onto the floating gate 20. Erasing is performed by electrons tunneling from the floating gate 20 to the erase gate 30.

[0013] Table 2 shows typical voltage/current ranges that may be applied to the terminals of the memory cell 310 to perform read, erase, and program operations.

Table 2: Operation of the flash memory cell 310 in Figure 3
[Table 2]

[0014] Figure 4 illustrates a different type of flash memory cell, a 3-gate memory cell 410. Memory cell 410 is identical to memory cell 310 in Figure 3, except that memory cell 410 does not have a separate control gate. The erase and read operations (erasure occurs through the use of an erase gate) are the same as those in Figure 3, except that no control gate bias is applied. The programming operation is also performed without a control gate bias; as a result, a higher voltage is applied to the source line during programming to compensate for the lack of control gate bias.

[0015] Table 3 shows typical voltage/current ranges that may be applied to the terminals of the memory cell 410 to perform read, erase, and program operations.

Table 3: Operation of flash memory cell 410 in Figure 4
[Table 3]

[0016] Figure 5 illustrates a different type of flash memory cell, a stacked gate memory cell 510. The memory cell 510 is similar to the memory cell 210 in Figure 2, except that the floating gate 20 extends above the entire channel region 18, and the control gate 22 (which here is coupled to a word line) extends above the floating gate 20, separated by an insulating layer (not shown). Erasing is performed by FN tunneling of electrons from the floating gate to the substrate. Programming is performed by channel hot electron (CHE) injection in the region between the channel 18 and the drain region 16, with electrons flowing from the source region 14 towards the drain region 16. The read operation is similar to that of the memory cell 210, but with a higher control gate voltage.

[0017] Table 4 shows typical voltage ranges that can be applied to the terminals of the memory cell 510 and the substrate 12 for performing read, erase, and program operations.

Table 4: Operation of flash memory cell 510 in Figure 5
[Table 4]

[0018] The methods and means described herein, without limitation, may be applied to other non-volatile memory technologies such as FINFET split-gate flash or stack-gate flash memory, NAND flash, SONOS (silicon-oxide-nitride-oxide-silicon, charge trap in nitride), MONOS (metal-oxide-nitride-oxide-silicon, metal charge trap in nitride), ReRAM (resistive random-access memory), PCM (phase change memory), MRAM (magnetoresistive random-access memory), FeRAM (ferroelectric random-access memory), CT (charge trap) memory, CN (carbon-tube) memory, OTP (one-time programmable, bi-level or multi-level), and CeRAM (correlated electron random-access memory).

[0019] Two modifications are made to utilize a memory array containing one of the non-volatile memory cell types described above in an artificial neural network. First, the lines are configured so that each memory cell can be individually programmed, erased, and read without adversely affecting the memory state of other memory cells in the array, as will be further described below. Second, continuous (analog) programming of the memory cells is provided.

[0020] Specifically, the memory state of each memory cell in the array (i.e., the charge on the floating gate) can be changed continuously from a fully erased state to a fully programmed state, and vice versa, independently and with minimal disturbance to other memory cells. This means that the cell storage is effectively analog, or at the very least can store one of many discrete values (such as 16 or 64 different values), which allows every memory cell in the memory array to be tuned very precisely and individually, and makes the memory array ideal for storing and fine-tuning the synaptic weights of a neural network.

<Neural networks using non-volatile memory cell arrays>

[0021] Figure 6 conceptually illustrates a non-limiting example of a neural network utilizing a non-volatile memory array. While this example uses the non-volatile memory array neural network for a facial recognition application, any other suitable application can be implemented using a non-volatile memory array-based neural network.

[0022] S0 is the input layer, which in this example is a 32x32 pixel RGB image with 5-bit precision (i.e., three 32x32 pixel arrays, one for each color R, G, and B, with each pixel having 5-bit precision). The synapses CB1 going from input layer S0 to layer C1 apply different sets of weights in some instances and shared weights in others, and scan the input image with 3x3 pixel overlapping filters (kernels), shifting the filter by one pixel (or by two or more pixels, depending on the model). Specifically, the values of the nine pixels in a 3x3 portion of the image (referred to as the filter or kernel) are provided to the synapses CB1, where these nine input values are multiplied by the appropriate weights; the products are summed to determine a single output value, which the first synapse of CB1 provides as one pixel of one of the feature maps of layer C1. The 3x3 filter is then shifted one pixel to the right within the input layer S0 (i.e., a column of 3 pixels is added on the right and a column of 3 pixels is dropped on the left), so that the 9 pixel values of the newly positioned filter are provided to the synapses CB1, where they are multiplied by the same weights as above and a second single output value is determined by the associated synapse. This process continues until the 3x3 filter has scanned the entire 32x32 pixel image of the input layer S0 for all three colors and all bits (precision values). The process is then repeated with different weight sets to generate different feature maps of layer C1 until all feature maps of layer C1 have been computed.
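The scan in paragraph [0022] behaves like a stride-1 sliding-window multiply-and-sum. The following is a minimal sketch of that idea only, assuming a single color channel and one shared weight set; the function name `scan_filter` and the random test data are illustrative and do not come from the patent.

```python
import random

def scan_filter(image, kernel, stride=1):
    """Slide `kernel` over `image`; each position yields one multiply-and-sum value."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # One synapse: input pixel times stored weight.
                    acc += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map

random.seed(0)
# Hypothetical data: one 32x32 channel with 5-bit pixel values (0..31).
image = [[random.randrange(32) for _ in range(32)] for _ in range(32)]
kernel = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
fmap = scan_filter(image, kernel)
print(len(fmap), len(fmap[0]))  # 30 30
```

A 3x3 filter shifted by one pixel over a 32x32 input fits in 30 positions per axis, which matches the 30x30 feature maps of layer C1.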

[0023] In this example, layer C1 contains 16 feature maps, each with 30x30 pixels. Each pixel is a new feature pixel extracted from the multiplication of the input and the kernel; therefore, each feature map is a two-dimensional array, and thus in this example, layer C1 constitutes 16 layers of two-dimensional arrays (note that the layers and arrays referred to herein are logical relationships, not necessarily physical relationships; i.e., the arrays are not necessarily oriented as physical two-dimensional arrays). Each of the 16 feature maps in layer C1 is generated by one of 16 different synaptic weight sets applied to the filter scans. The C1 feature maps can all be directed to different aspects of the same image feature, such as boundary identification. For example, a first map (generated using a first weight set shared across all scans used to generate this first map) can identify circular edges, and a second map (generated using a second weight set different from the first) can identify rectangular edges or the aspect ratio of a particular feature, and so on.

[0024] Before moving from layer C1 to layer S1, an activation function P1 (pooling) is applied that pools values from non-overlapping, consecutive 2x2 regions within each feature map. The purpose of the pooling function P1 is to average out nearby positions (or to apply a max function instead), to reduce the dependence on edge position, and to reduce the data size before moving to the next stage. In layer S1, there are 16 15x15 feature maps (i.e., 16 different arrays of 15x15 pixels each). The synapses CB2 going from layer S1 to layer C2 scan the maps in layer S1 with a 4x4 filter, shifting by 1 pixel. In layer C2, there are 22 12x12 feature maps. Before moving from layer C2 to layer S2, an activation function P2 (pooling) is applied that pools values from non-overlapping, consecutive 2x2 regions within each feature map. In layer S2, there are 22 6x6 feature maps. An activation function (pooling) is applied at the synapses CB3 going from layer S2 to layer C3, where every neuron in layer C3 connects to every map in layer S2 via a respective synapse of CB3. There are 64 neurons in layer C3. The synapses CB4 going from layer C3 to output layer S3 fully connect C3 to S3; that is, every neuron in layer C3 is connected to every neuron in layer S3. The output at S3 contains 10 neurons, where the neuron with the highest output determines the class. This output can, for example, indicate an identification or classification of the content of the original image.
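The feature-map sizes quoted in paragraphs [0022] to [0024] follow from simple arithmetic on filter size, shift, and pooling window. A short sketch that reproduces them, assuming no padding and non-overlapping pooling; the helper names `conv_out` and `pool_out` are illustrative:

```python
def conv_out(size, kernel, stride=1):
    """Positions a kernel of width `kernel` fits along one axis of length `size`."""
    return (size - kernel) // stride + 1

def pool_out(size, window=2):
    """Output length after pooling non-overlapping windows of width `window`."""
    return size // window

s0 = 32                 # input layer S0: 32x32
c1 = conv_out(s0, 3)    # CB1: 3x3 filter, shift 1
s1 = pool_out(c1)       # P1: 2x2 pooling
c2 = conv_out(s1, 4)    # CB2: 4x4 filter, shift 1
s2 = pool_out(c2)       # P2: 2x2 pooling
print(c1, s1, c2, s2)   # 30 15 12 6
```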

[0025] Each layer of synapses is implemented using an array, or a portion of an array, of non-volatile memory cells.

[0026] Figure 7 is a block diagram of an array that can be used for that purpose. The vector-by-matrix multiplication (VMM) array 32 contains non-volatile memory cells and is used as the synapses between one layer and the next (e.g., CB1, CB2, CB3, and CB4 in Figure 6). Specifically, the VMM array 32 includes a non-volatile memory cell array 33, an erase gate and word line gate decoder 34, a control gate decoder 35, a bit line decoder 36, and a source line decoder 37, each of which decodes its respective inputs for the non-volatile memory cell array 33. Input to the VMM array 32 can come from the erase gate and word line gate decoder 34 or from the control gate decoder 35. In this example, the source line decoder 37 also decodes the output of the non-volatile memory cell array 33. Alternatively, the bit line decoder 36 can decode the output of the non-volatile memory cell array 33.

[0027] The non-volatile memory cell array 33 serves two purposes. First, it stores the weights used by the VMM array 32. Second, the non-volatile memory cell array 33 effectively multiplies the weights stored in it by the inputs, adds them up for each output line (source line or bit line) to generate an output, which becomes the input to the next layer or the last layer. By having the non-volatile memory cell array 33 perform the multiplication and addition functions, the need for separate multiplication and addition logic circuits is eliminated, and the calculations are more power-efficient due to being performed in memory.
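The multiply-and-add behavior in paragraph [0027] can be modeled numerically as a vector-by-matrix product, with each output line summing the contributions of the cells attached to it. This is a behavioral sketch only, not a circuit model; the function name `vmm` and the example values are hypothetical.

```python
def vmm(inputs, weights):
    """Behavioral model of a VMM array.

    weights[i][j] is the weight stored in the cell at input line i and
    output line j. Each output line sums input * weight over its cells.
    """
    n_out = len(weights[0])
    return [
        sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        for j in range(n_out)
    ]

inputs = [1.0, 0.5, 2.0]      # hypothetical input levels
weights = [[0.2, 0.1],        # hypothetical stored weights, 3 inputs x 2 outputs
           [0.4, 0.0],
           [0.1, 0.3]]
outputs = vmm(inputs, weights)
print(outputs)                # roughly [0.6, 0.7]
```

In the array itself no separate multiplier or adder exists; the sum on each line arises from the cell currents combining on a shared source line or bit line.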

[0028] The outputs of the non-volatile memory cell array 33 are fed to a differential summer 38 (such as a summing operational amplifier or a summing current mirror), which sums the outputs of the non-volatile memory cell array 33 to create a single value for the convolution. The differential summer 38 is configured to sum both positive and negative weights.

[0029] The summed output values of the differential summer 38 are then fed to an activation function block 39, which rectifies the output. The activation function block 39 may provide a sigmoid, tanh, or ReLU function. The rectified output values of the activation function block 39 become elements of a feature map of the next layer (e.g., C1 in Figure 6), and are then applied to the next synapse to generate the next feature map layer or the final layer. Thus, in this example, the non-volatile memory cell array 33 constitutes multiple synapses (which receive their inputs from the previous layer of neurons or from an input layer such as an image database), and the differential summer 38 and activation function block 39 constitute multiple neurons.
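A rough behavioral sketch of the summer and activation stages follows. It assumes, purely for illustration, that positive and negative weights are carried on separate output lines and that the summer subtracts one group from the other; the patent text here does not specify that arrangement. All names are hypothetical.

```python
import math

def differential_sum(pos_currents, neg_currents):
    """Assumed scheme: positive-weight lines minus negative-weight lines."""
    return sum(pos_currents) - sum(neg_currents)

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# The three activation options named in the text.
ACTIVATIONS = {"relu": relu, "sigmoid": sigmoid, "tanh": math.tanh}

def neuron(pos_currents, neg_currents, activation="relu"):
    """Differential summer followed by an activation function block."""
    return ACTIVATIONS[activation](differential_sum(pos_currents, neg_currents))

print(neuron([1.0, 2.0], [0.5]))             # 2.5
print(neuron([0.5], [2.0]))                  # 0.0
print(neuron([1.0], [1.0], "sigmoid"))       # 0.5
```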

[0030] The inputs to the VMM array 32 in Figure 7 (WLx, EGx, CGx, and optionally BLx and SLx) can be analog level, binary level, or digital bits (in which case a DAC is provided to convert the digital bits to the appropriate input analog level), and the outputs can be analog level, binary level, or digital bits (in which case an output ADC is provided to convert the output analog level to the digital bits).

[0031] Figure 8 is a block diagram illustrating the use of multiple layers of the VMM array 32, labeled in the figure as VMM arrays 32a, 32b, 32c, 32d, and 32e. As shown in Figure 8, the input (indicated as Inputx) is converted from digital to analog by the digital-to-analog converter 31 and provided to the input VMM array 32a. The converted analog input can be a voltage or a current. The input D/A conversion for the first layer can be performed by using a function or LUT (look-up table) that maps the input Inputx to the appropriate analog levels for the matrix multiplier of the input VMM array 32a. Input conversion can also be performed by an analog-to-analog (A/A) converter that converts an external analog input into a mapped analog input for the input VMM array 32a.

[0032] The output generated by input VMM array 32a is then provided as input to the next VMM array (hidden level 1) 32b, which generates an output that is then provided as input to the next VMM array (hidden level 2) 32c, and so on. The various layers of VMM array 32 function as the synapses and neurons of a convolutional neural network (CNN). Each VMM array 32a, 32b, 32c, 32d, and 32e can be a standalone physical non-volatile memory array, or multiple VMM arrays can utilize different parts of the same physical non-volatile memory array, or multiple VMM arrays can utilize overlapping parts of the same physical non-volatile memory array. The example shown in Figure 8 includes five layers (32a, 32b, 32c, 32d, 32e), i.e., one input layer (32a), two hidden layers (32b, 32c), and two fully connected layers (32d, 32e). Those skilled in the art will understand that this is merely an example, and that the system could instead include more than two hidden layers and more than two fully connected layers.

<Vector Matrix Multiplication (VMM) Array>
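As one way to picture the LUT-based input conversion mentioned in paragraph [0031], the sketch below maps 5-bit input codes to evenly spaced analog levels. The linear spacing, the voltage range `v_min`/`v_max`, and the function names are assumptions made for illustration; the patent does not specify the mapping.

```python
def build_input_lut(bits=5, v_min=0.0, v_max=1.0):
    """Hypothetical LUT: one analog level per digital code, linearly spaced."""
    levels = 2 ** bits
    step = (v_max - v_min) / (levels - 1)
    return [v_min + code * step for code in range(levels)]

def to_analog(code, lut):
    """Look up the analog level for a digital input code."""
    return lut[code]

lut = build_input_lut()
print(len(lut))             # 32
print(to_analog(0, lut))    # 0.0
```

A non-linear table (for example, one matched to the cell's operating region) could be substituted without changing the lookup step.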

[0033] Figure 9 illustrates a neuron VMM array 900, particularly suitable for the memory cell 310 shown in Figure 3, and used as part of a synapse and neuron between the input layer and the next layer. The VMM array 900 includes a memory array 901 of non-volatile memory cells and a reference array 902 of non-volatile reference memory cells (located at the top of the array). Alternatively, another reference array may be located at the bottom.

[0034] In the VMM array 900, control gate lines such as the control gate line 903 extend in the vertical direction (thus, the reference array 902 in the row direction is orthogonal to the control gate line 903), and erase gate lines such as the erase gate line 904 extend in the horizontal direction. Here, the input to the VMM array 900 is provided to the control gate lines (CG0, CG1, CG2, CG3), and the output of the VMM array 900 appears on the source lines (SL0, SL1). In one example, only even rows are used, and in another example, only odd rows are used. The current of each source line (SL0 and SL1 respectively) performs a summation function of all the currents from the memory cells connected to that particular source line.

[0035] As described herein for neural networks, the non-volatile memory cells of the VMM array 900, i.e., the memory cells 310 of the VMM array 900, can be configured to operate in the subthreshold region.

[0036] The non-volatile reference memory cells and non-volatile memory cells described herein are biased in weak inversion (subthreshold region) as follows:

$$I_{ds} = I_o \, e^{(V_g - V_{th})/(n V_t)} = w \, I_o \, e^{V_g/(n V_t)}, \qquad w = e^{-V_{th}/(n V_t)}$$

where $I_{ds}$ is the drain-source current; $V_g$ is the gate voltage of the memory cell; $V_{th}$ is the threshold voltage of the memory cell; $V_t$ is the thermal voltage $= kT/q$, where $k$ is the Boltzmann constant, $T$ is the temperature in Kelvin, and $q$ is the electronic charge; $n$ is the slope factor $= 1 + (C_{dep}/C_{ox})$, where $C_{dep}$ is the capacitance of the depletion layer and $C_{ox}$ is the capacitance of the gate oxide layer; and $I_o$ is the memory cell current at a gate voltage equal to the threshold voltage. $I_o$ is proportional to $(W_t/L) \cdot u \cdot C_{ox} \cdot (n-1) \cdot V_t^2$, where $u$ is the carrier mobility, and $W_t$ and $L$ are the width and length of the memory cell, respectively.

[0037] For an I-to-V logarithmic converter that uses a memory cell (such as a reference memory cell or peripheral memory cell) or a transistor to convert an input current into an input voltage:

$$V_g = n \, V_t \, \log\!\left[\frac{I_{ds}}{w_p \, I_o}\right]$$

where $w_p$ is the $w$ of the reference or peripheral memory cell.

[0038] For a memory array used as a vector matrix multiplier (VMM) array with current input, the output current is:

$$I_{out} = w_a \, I_o \, e^{V_g/(n V_t)}$$

that is,

$$I_{out} = \frac{w_a}{w_p} \, I_{in} = W \, I_{in}, \qquad W = e^{(V_{thp} - V_{tha})/(n V_t)}$$

Here, $w_a$ is the $w$ of each memory cell in the memory array, $V_{thp}$ is the effective threshold voltage of the peripheral memory cell, and $V_{tha}$ is the effective threshold voltage of the main (data) memory cell. Note that the threshold voltage of a transistor is a function of the substrate body bias voltage, and the substrate body bias voltage, denoted $V_{sb}$, can be modulated to compensate for various conditions, such as temperature. The threshold voltage $V_{th}$ can be expressed as:

$$V_{th} = V_{th0} + \gamma \left( \sqrt{|V_{sb} - 2\varphi_F|} - \sqrt{|2\varphi_F|} \right)$$

where $V_{th0}$ is the threshold voltage with zero substrate bias, $\varphi_F$ is the surface potential, and $\gamma$ is the body effect parameter.
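The relationship between the log converter and the array cell can be checked numerically. The sketch below converts an input current to a gate voltage with a reference cell, applies that voltage to a data cell, and compares the current ratio with W = e^((Vthp − Vtha)/(n·Vt)). The logarithm is taken as the natural logarithm, as the exponential model implies. The numeric values for n, Io, and the threshold voltages are arbitrary assumptions, and the function names are illustrative.

```python
import math

K_BOLTZMANN = 1.380649e-23      # J/K
Q_ELECTRON = 1.602176634e-19    # C

def thermal_voltage(temp_kelvin=300.0):
    """Vt = k*T/q."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON

def ids_subthreshold(vg, vth, io, n, vt):
    """Ids = Io * exp((Vg - Vth) / (n*Vt))."""
    return io * math.exp((vg - vth) / (n * vt))

def log_converter(iin, vthp, io, n, vt):
    """Vg = n*Vt*ln(Iin / (wp*Io)), with wp = exp(-Vthp / (n*Vt))."""
    wp = math.exp(-vthp / (n * vt))
    return n * vt * math.log(iin / (wp * io))

# Assumed illustrative values, not taken from the patent.
vt = thermal_voltage()
n, io = 1.5, 1e-7
vthp, vtha = 0.70, 0.65
iin = 10e-9

vg = log_converter(iin, vthp, io, n, vt)
iout = ids_subthreshold(vg, vtha, io, n, vt)
w = math.exp((vthp - vtha) / (n * vt))
print(math.isclose(iout / iin, w, rel_tol=1e-9))  # True
```

Under this model the weight depends only on the difference between the two threshold voltages, so tuning the data cell's threshold sets W.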

[0039] Word lines or control gates can be used as inputs to memory cells for input voltage.

[0040] Alternatively, the flash memory cells of the VMM array described herein can be configured to operate in the linear region:

$$I_{ds} = \beta \, (V_{gs} - V_{th}) \, V_{ds}, \qquad \beta = u \, C_{ox} \, W_t / L$$

$$W \propto (V_{gs} - V_{th})$$

In other words, the weight W in the linear region is proportional to $(V_{gs} - V_{th})$.

[0041] Word lines, control gate lines, bit lines, or source lines can be used as inputs to memory cells operating within the linear region. Bit lines or source lines can be used as outputs to memory cells.

[0042] For an I-to-V linear converter, a memory cell (such as a reference memory cell or peripheral memory cell) or a transistor operating in the linear region can be used to linearly convert input/output currents into input/output voltages.

[0043] Alternatively, the memory cells of the VMM array described herein can be configured to operate in the saturation region:

$$I_{ds} = \tfrac{1}{2} \, \beta \, (V_{gs} - V_{th})^2, \qquad \beta = u \, C_{ox} \, W_t / L$$

$$W \propto (V_{gs} - V_{th})^2$$

That is, the weight W is proportional to $(V_{gs} - V_{th})^2$.
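To contrast the linear-region and saturation-region models, the sketch below evaluates both and shows how the current scales when the overdrive (Vgs − Vth) doubles. The value of beta and the bias voltages are arbitrary assumptions chosen for illustration; the function names are hypothetical.

```python
import math

def ids_linear(vgs, vth, vds, beta):
    """Linear region: Ids = beta * (Vgs - Vth) * Vds."""
    return beta * (vgs - vth) * vds

def ids_saturation(vgs, vth, beta):
    """Saturation region: Ids = 1/2 * beta * (Vgs - Vth)^2."""
    return 0.5 * beta * (vgs - vth) ** 2

beta = 1e-4          # assumed u * Cox * Wt / L, in A/V^2
vth, vds = 0.5, 0.1  # assumed bias point, in volts

# Overdrive of 0.5 V versus 1.0 V.
lin_ratio = ids_linear(1.5, vth, vds, beta) / ids_linear(1.0, vth, vds, beta)
sat_ratio = ids_saturation(1.5, vth, beta) / ids_saturation(1.0, vth, beta)
print(math.isclose(lin_ratio, 2.0), math.isclose(sat_ratio, 4.0))  # True True
```

Doubling the overdrive doubles the linear-region current and quadruples the saturation-region current, consistent with W being proportional to the overdrive and to its square, respectively.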

[0044] Word lines, control gates, or erase gates can be used as inputs to memory cells operating within a saturation region. Bit lines or source lines can be used as outputs to output neurons.

[0045] Alternatively, the memory cells of the VMM array described herein may be used in all regions or combinations thereof (sub-threshold, linear, or saturated) for each layer or multilayer of a neural network.

[0046] Another example for the VMM array 32 in Figure 7 is described in U.S. Patent No. 10,748,630, which is incorporated herein by reference. As described in that patent, source lines or bit lines can be used as neuron outputs (current sum outputs).

[0047] Figure 10 illustrates a neuron VMM array 1000, particularly suited to the memory cell 210 shown in Figure 2 and used as the synapses between the input layer and the next layer. The VMM array 1000 includes a memory array 1003 of non-volatile memory cells, a reference array 1001 of first non-volatile reference memory cells, and a reference array 1002 of second non-volatile reference memory cells. The reference arrays 1001 and 1002, arranged in the column direction of the array, function to convert current inputs flowing into terminals BLR0, BLR1, BLR2, and BLR3 into voltage inputs WL0, WL1, WL2, and WL3. In effect, the first and second non-volatile reference memory cells are diode-connected through a multiplexer 1014 (only partially illustrated) with the current inputs flowing into them. The reference cells are tuned (e.g., programmed) to a target reference level. The target reference level is provided by a reference mini-array matrix (not shown).

[0048] The memory array 1003 serves two purposes. First, it stores the weights used by the VMM array 1000 in its memory cells. Second, the memory array 1003 effectively multiplies the inputs (i.e., the current inputs provided to terminals BLR0, BLR1, BLR2, and BLR3, which are converted into input voltages by the reference arrays 1001 and 1002 and supplied to the word lines WL0, WL1, WL2, and WL3) by the weights stored in it, and then adds all the results (memory cell currents) to generate the output on each bit line (BL0 to BLN), which becomes the input to the next layer or to the final layer. By performing the multiplication and addition functions, the memory array 1003 eliminates the need for separate multiplication and addition logic circuits and is also power efficient. Here, voltage inputs are supplied to word lines WL0, WL1, WL2, and WL3, and outputs appear on the respective bit lines BL0 to BLN during the read (inference) operation. The current in each bit line BL0 to BLN is a function of the sum of the currents from all non-volatile memory cells connected to that particular bit line.
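The data flow of paragraphs [0047] and [0048] can be sketched behaviorally: each input current is scaled by a per-cell weight and the results on each bit line are summed. The sketch assumes the subthreshold weight W = e^((Vthp − Vtha)/(n·Vt)) given earlier in the description, one reference threshold for all rows, and an arbitrary value for n·Vt; the names are hypothetical.

```python
import math

N_VT = 1.5 * 0.02585  # assumed n * Vt, in volts

def cell_weight(vth_ref, vth_cell):
    """W = exp((Vthp - Vtha) / (n*Vt)) for one data cell."""
    return math.exp((vth_ref - vth_cell) / N_VT)

def bit_line_currents(input_currents, vth_ref, vth_cells):
    """Sum weighted input currents per bit line.

    vth_cells[row][col] is the tuned threshold of the cell on word line
    `row` and bit line `col`; one summed current is returned per bit line.
    """
    n_cols = len(vth_cells[0])
    return [
        sum(i_in * cell_weight(vth_ref, vth_cells[row][col])
            for row, i_in in enumerate(input_currents))
        for col in range(n_cols)
    ]

# Cells tuned to the reference threshold have weight 1, so each bit line
# carries the plain sum of the input currents.
currents = bit_line_currents([1e-9, 2e-9], 0.7, [[0.7, 0.7], [0.7, 0.7]])
print(all(math.isclose(i, 3e-9) for i in currents))  # True
```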

[0049] Table 5 shows the operating voltages and currents of the VMM array 1000. The columns in the table show the voltages applied to the word lines of selected cells, word lines of unselected cells, bit lines of selected cells, bit lines of unselected cells, source lines of selected cells, and source lines of unselected cells. The rows show the read, erase, and program operations. Table 5: Operation of VMM Array 1000 in Figure 10 [Table 5]

[0050] Figure 11 depicts a neuron VMM array 1100, which is particularly suitable for the memory cell 210 shown in Figure 2 and is used as part of a synapse and neuron between the input layer and the next layer. The VMM array 1100 includes a memory array 1103 of non-volatile memory cells, a reference array 1101 of first non-volatile reference memory cells, and a reference array 1102 of second non-volatile reference memory cells. The reference arrays 1101 and 1102 extend in the row direction of the VMM array 1100. The VMM array 1100 is similar to the VMM array 1000, except that the word lines in the VMM array 1100 extend vertically. Here, inputs are provided to the word lines (WLA0, WLB0, WLA1, WLB1, WLA2, WLB2, WLA3, WLB3), and outputs appear on the source lines (SL0, SL1) during read operations. The current on each source line is the sum of all currents from the memory cells connected to that particular source line.

[0051] Table 6 shows the operating voltages and currents of the VMM array 1100. The columns in the table show the voltages applied to the word lines of selected cells, word lines of unselected cells, bit lines of selected cells, bit lines of unselected cells, source lines of selected cells, and source lines of unselected cells. The rows show the read, erase, and program operations. Table 6: Operation of VMM Array 1100 in Figure 11 [Table 6]

[0052] Figure 12 illustrates a neuron VMM array 1200, which is particularly suitable for the memory cell 310 shown in Figure 3 and is used as part of a synapse and neuron between the input layer and the next layer. The VMM array 1200 includes a memory array 1203 of non-volatile memory cells, a reference array 1201 of first non-volatile reference memory cells, and a reference array 1202 of second non-volatile reference memory cells. The reference arrays 1201 and 1202 function to convert the current inputs flowing into terminals BLR0, BLR1, BLR2, and BLR3 into voltage inputs CG0, CG1, CG2, and CG3. In practice, the first and second non-volatile reference memory cells are diode-connected through a multiplexer 1212 (partially shown) with current inputs flowing into them through BLR0, BLR1, BLR2, and BLR3. The multiplexer 1212 includes a corresponding multiplexer 1205 and a cascoding transistor 1204 to ensure a constant voltage on the respective bit lines (such as BLR0) of the first and second non-volatile reference memory cells during read operations. The reference cells are tuned to a target reference level.

[0053] The memory array 1203 serves two purposes. First, it stores the weights used by the VMM array 1200. Second, the memory array 1203 effectively multiplies the weights stored in the memory array by the inputs (current inputs provided to terminals BLR0, BLR1, BLR2, and BLR3, which are converted into input voltages by the reference arrays 1201 and 1202 and supplied to the control gates (CG0, CG1, CG2, and CG3)), then adds all the results (cell currents) to produce an output, which appears in BL0~BLN and becomes the input to the next layer or the last layer. By having the memory array perform the multiplication and addition functions, the need for separate multiplication and addition logic circuits is eliminated, and power efficiency is also improved. Here, the inputs are provided to the control gate lines (CG0, CG1, CG2, and CG3), and the output appears in the bit lines (BL0~BLN) during read operations. The current in each bit line is a function of the sum of all currents from the memory cells connected to that particular bit line.

[0054] The VMM array 1200 implements one-way tuning of non-volatile memory cells within the memory array 1203. That is, each non-volatile memory cell is erased and then partially programmed until the desired charge on the floating gate is reached. If too much charge is applied to the floating gate (e.g., an incorrect value is stored in the cell), the cell is erased and the series of partial programming operations is restarted from the beginning. As shown, two rows sharing the same erase gate (e.g., EG0 or EG1) are erased together (known as page erase), and then each cell is partially programmed until the desired charge on the floating gate is reached.
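The one-way tuning loop described above (erase, partially program until the target is reached, erase and start over on overshoot) can be sketched as follows. This is an illustrative model under stated assumptions, not the patented procedure: charge is a unitless number, every partial-programming pulse adds a fixed `step`, and the step is halved after each restart so that the loop terminates. The function name and all parameters are hypothetical.

```python
def tune_one_way(target, step, tolerance, max_restarts=10):
    """Erase, then partially program until the charge is within tolerance.

    Returns (final_charge, restarts). Halving the step after an overshoot
    is an assumption of this sketch, not something stated in the source.
    """
    restarts = 0
    while restarts <= max_restarts:
        charge = 0.0                      # erase: back to the erased state
        while charge < target - tolerance:
            charge += step                # one partial-programming pulse
        if charge <= target + tolerance:
            return charge, restarts       # landed inside the target window
        restarts += 1                     # too much charge: erase and redo
        step /= 2
    raise RuntimeError("target not reached within max_restarts")


print(tune_one_way(target=1.0, step=0.375, tolerance=0.0625))  # (0.9375, 1)
```

In this example the first pass overshoots to 1.125, so the cell is erased once and reprogrammed with smaller pulses.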

[0055] Table 7 shows the operating voltages and currents of the VMM array 1200. The columns in the table show the voltages applied to the word lines of selected cells, word lines of unselected cells, bit lines of selected cells, bit lines of unselected cells, control gates of selected cells, control gates of unselected cells in the same sector as the selected cell, control gates of unselected cells in a different sector than the selected cell, erase gates of selected cells, erase gates of unselected cells, source lines of selected cells, and source lines of unselected cells. The rows show the read, erase, and program operations. Table 7: Operation of VMM Array 1200 in Figure 12 [Table 7]

[0056] Figure 13 depicts a neuron VMM array 1300, which is particularly suitable for the memory cell 310 shown in Figure 3 and is used as part of a synapse and neuron between the input layer and the next layer. The VMM array 1300 comprises a memory array 1303 of non-volatile memory cells, a reference array 1301 of first non-volatile reference memory cells, and a reference array 1302 of second non-volatile reference memory cells. The EG lines EGR0, EG0, EG1, and EGR1 extend vertically, and the CG lines CG0, CG1, CG2, and CG3 and the SL lines WL0, WL1, WL2, and WL3 extend horizontally. The VMM array 1300 is similar to the VMM array 1200, except that the VMM array 1300 implements bidirectional tuning: by using individual EG lines, each individual cell can be completely erased, partially programmed, and partially erased as needed to reach the desired amount of charge on the floating gate. As shown, reference arrays 1301 and 1302 convert the input currents at terminals BLR0, BLR1, BLR2, and BLR3 into control gate voltages CG0, CG1, CG2, and CG3 (through the action of diode-connected reference cells via multiplexer 1314), and these voltages are applied to memory cells in the row direction. The current outputs (neurons) are on the bit lines BL0~BLN, and each bit line sums all the currents from the non-volatile memory cells connected to that particular bit line.
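Bidirectional tuning can be contrasted with an erase-and-restart approach using a small model. The sketch below is illustrative only and rests on assumptions not found in the patent: charge is unitless, a partial-program pulse adds `step`, a partial-erase pulse subtracts `step`, and the pulse size is halved whenever the direction reverses. The function name and parameters are hypothetical.

```python
def tune_bidirectional(target, start, step, tolerance, max_pulses=100):
    """Nudge the charge up (partial program) or down (partial erase).

    Returns (final_charge, pulses_used). Halving the step on a direction
    reversal is an assumption made so that the loop converges.
    """
    charge = start
    last_direction = 0
    for pulses in range(max_pulses + 1):
        error = target - charge
        if abs(error) <= tolerance:
            return charge, pulses
        direction = 1 if error > 0 else -1   # +1 program, -1 partial erase
        if last_direction and direction != last_direction:
            step /= 2                        # overshoot: use a smaller pulse
        charge += direction * step
        last_direction = direction
    raise RuntimeError("target not reached within max_pulses")


print(tune_bidirectional(target=1.0, start=0.0, step=0.375, tolerance=0.0625))
# (0.9375, 4)
```

Here the overshoot to 1.125 is corrected with a single partial-erase pulse rather than a full erase and reprogram.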

[0057] Table 8 shows the operating voltages and currents of the VMM array 1300. The columns in the table show the voltages applied to the word lines of selected cells, word lines of unselected cells, bit lines of selected cells, bit lines of unselected cells, control gates of selected cells, control gates of unselected cells in the same sector as the selected cell, control gates of unselected cells in a different sector than the selected cell, erase gates of selected cells, erase gates of unselected cells, source lines of selected cells, and source lines of unselected cells. The rows show the read, erase, and program operations. Table 8: Operation of VMM Array 1300 in Figure 13 [Table 8]

[0058] Figure 22 illustrates a neuron VMM array 2200 that is particularly suitable for the memory cell 210 shown in Figure 2 and is used as part of the synapse and neuron between the input layer and the next layer. In the VMM array 2200, inputs INPUT0, ..., INPUTN are received on bit lines BL0, ..., BLN, respectively, and outputs OUTPUT1, OUTPUT2, OUTPUT3, and OUTPUT4 are generated on source lines SL0, SL1, SL2, and SL3, respectively.

[0059] Figure 23 illustrates a neuron VMM array 2300, which is particularly suitable for the memory cell 210 shown in Figure 2 and is used as part of synapses and neurons between the input layer and the next layer. In this example, inputs INPUT0, INPUT1, INPUT2, and INPUT3 are received on source lines SL0, SL1, SL2, and SL3, respectively, and outputs OUTPUT0, ..., OUTPUTN are generated on bit lines BL0, ..., BLN.

[0060] Figure 24 illustrates a neuron VMM array 2400, which is particularly suitable for the memory cell 210 shown in Figure 2 and is used as part of synapses and neurons between the input layer and the next layer. In this example, inputs INPUT0, ..., INPUTM are received on word lines WL0, ..., WLM, respectively, and outputs OUTPUT0, ..., OUTPUTN are generated on bit lines BL0, ..., BLN.

[0061] Figure 25 illustrates a neuron VMM array 2500, which is particularly suitable for the memory cell 310 shown in Figure 3 and is used as part of synapses and neurons between the input layer and the next layer. In this example, inputs INPUT0, ..., INPUTM are received on word lines WL0, ..., WLM, respectively, and outputs OUTPUT0, ..., OUTPUTN are generated on bit lines BL0, ..., BLN.

[0062] Figure 26 illustrates a neuron VMM array 2600, which is particularly suitable for the memory cell 410 shown in Figure 4 and is used as part of the synapses and neurons between the input layer and the next layer. In this example, inputs INPUT0, ..., INPUTN are received on vertical control gate lines CG0, ..., CGN, respectively, and outputs OUTPUT1 and OUTPUT2 are generated on source lines SL0 and SL1.

[0063] Figure 27 illustrates a neuron VMM array 2700, which is particularly suitable for the memory cell 410 shown in Figure 4 and is used as part of synapses and neurons between the input layer and the next layer. In this example, inputs INPUT0, ..., INPUTN are received on the gates of the bit line control gates 2701-1, 2701-2, ..., 2701-(N-1), and 2701-N, respectively, which are coupled to bit lines BL0, ..., BLN, respectively. The example outputs OUTPUT1 and OUTPUT2 are generated on source lines SL0 and SL1.

[0064] Figure 28 illustrates a neuron VMM array 2800, which is particularly suitable for the memory cell 310 shown in Figure 3, the memory cell 510 shown in Figure 5, and the memory cell 710 shown in Figure 7, and is used as part of synapses and neurons between the input layer and the next layer. In this example, inputs INPUT0, ..., INPUTM are received on word lines WL0, ..., WLM, respectively, and outputs OUTPUT0, ..., OUTPUTN are generated on bit lines BL0, ..., BLN, respectively.

[0065] Figure 29 illustrates a neuron VMM array 2900, which is particularly suitable for the memory cell 310 shown in Figure 3, the memory cell 510 shown in Figure 5, and the memory cell 710 shown in Figure 7, and is used as part of synapses and neurons between the input layer and the next layer. In this example, inputs INPUT0, ..., INPUTM are received on control gate lines CG0, ..., CGM, respectively. Outputs OUTPUT0, ..., OUTPUTN are generated on vertical source lines SL0, ..., SLN, respectively, where each source line SLi is coupled to the source lines of all memory cells in column i.

[0066] Figure 30 illustrates a neuron VMM array 3000, which is particularly suitable for the memory cell 310 shown in Figure 3, the memory cell 510 shown in Figure 5, and the memory cell 710 shown in Figure 7, and is used as part of synapses and neurons between the input layer and the next layer. In this example, inputs INPUT0, ..., INPUTM are received on control gate lines CG0, ..., CGM, respectively. Outputs OUTPUT0, ..., OUTPUTN are generated on vertical bit lines BL0, ..., BLN, respectively, where each bit line BLi is coupled to the bit lines of all memory cells in column i.

<Long Short-Term Memory>

[0067] Prior art includes the concept known as long short-term memory (LSTM). LSTM units are often used within neural networks. LSTMs allow neural networks to store information for a predetermined period and use that information in subsequent operations. A conventional LSTM unit includes a cell, an input gate, an output gate, and a forget gate. The three gates regulate the flow of information into and out of the cell, and the duration for which information is stored within the LSTM. VMMs are particularly useful in LSTM units.

[0068] Figure 14 illustrates an example LSTM 1400. In this example, the LSTM 1400 includes cells 1401, 1402, 1403, and 1404. Cell 1401 receives the input vector x0 and generates the output vector h0 and the cell state vector c0. Cell 1402 receives the input vector x1, the output vector (hidden state) h0 from cell 1401, and the cell state c0 from cell 1401, and generates the output vector h1 and the cell state vector c1. Cell 1403 receives the input vector x2, the output vector (hidden state) h1 from cell 1402, and the cell state c1 from cell 1402, and generates the output vector h2 and the cell state vector c2. Cell 1404 receives the input vector x3, the output vector (hidden state) h2 from cell 1403, and the cell state c2 from cell 1403, and generates the output vector h3. Additional cells can also be used; an LSTM with four cells is just one example.

[0069] Figure 15 illustrates an example implementation of LSTM cell 1500 usable for cells 1401, 1402, 1403, and 1404 in Figure 14. LSTM cell 1500 receives an input vector x(t), a cell state vector c(t-1) from the preceding cell, and an output vector h(t-1) from the preceding cell, and generates the cell state vector c(t) and output vector h(t).

[0070] LSTM cell 1500 includes sigmoid function devices 1501, 1502, and 1503, each of which applies a number between 0 and 1 to control the extent to which each component of the input vector contributes to the output vector. LSTM cell 1500 also includes tanh devices 1504 and 1505 for applying a hyperbolic tangent function to the input vector, multiplier devices 1506, 1507, and 1508 for multiplying two vectors, and an adder device 1509 for adding two vectors. The output vector h(t) can be provided to the next LSTM cell in the system or accessed for other purposes.
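The data flow through such a cell can be sketched in a few lines. The following is a minimal illustration using the conventional LSTM equations, which is an assumption: the patent text here does not give the equations or say which sigmoid device produces which gate. The gate names f, i, o, u, the weight matrices `w_f`, `w_i`, `w_u`, `w_o`, and the omission of bias terms are all choices made for this sketch.

```python
import math


def matvec(matrix, vector):
    """Matrix-vector product; the operation a VMM array would perform."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def lstm_cell(x, h_prev, c_prev, w_f, w_i, w_u, w_o):
    """One conventional LSTM step (biases omitted for brevity)."""
    hx = h_prev + x                              # concatenate h(t-1) and x(t)
    f = sigmoid(matvec(w_f, hx))                 # sigmoid device: forget gate
    i = sigmoid(matvec(w_i, hx))                 # sigmoid device: input gate
    o = sigmoid(matvec(w_o, hx))                 # sigmoid device: output gate
    u = [math.tanh(v) for v in matvec(w_u, hx)]  # tanh device: candidate
    # Two multipliers and the adder: c(t) = f * c(t-1) + i * u
    c = [fk * ck + ik * uk for fk, ck, ik, uk in zip(f, c_prev, i, u)]
    # Second tanh device and third multiplier: h(t) = o * tanh(c(t))
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


zero = [[0.0, 0.0]]                              # 1 hidden unit, 1 input
h, c = lstm_cell([1.0], [0.0], [2.0], zero, zero, zero, zero)
print(c)  # [1.0]
```

With all-zero weights every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved.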

[0071] Figure 16 shows an example of an LSTM cell 1600 implementation of LSTM cell 1500. For the reader's convenience, the same numbering method used in LSTM cell 1500 is used in LSTM cell 1600. Sigmoid function devices 1501, 1502, and 1503, and tanh device 1504 each contain multiple VMM arrays 1601 and activation function blocks 1602. Thus, VMM arrays are particularly useful in LSTM cells used in certain neural network systems. Multiplier devices 1506, 1507, and 1508, and adder device 1509 are implemented in a digital or analog manner. Activation function block 1602 can be implemented in a digital or analog manner.

[0072] Figure 17 shows LSTM cell 1700, an alternative to LSTM cell 1600 (and another example of an implementation of LSTM cell 1500). In Figure 17, sigmoid function devices 1501, 1502, and 1503, and tanh device 1504 share the same physical hardware (VMM array 1701 and activation function block 1702) in a time-division multiplexed manner. LSTM cell 1700 also includes a multiplier device 1703 for multiplying two vectors; an adder device 1708 for adding two vectors; a tanh device 1505 (which includes the activation function block 1702); a register 1707 for storing the value i(t) when i(t) is output from the sigmoid function block 1702; a register 1704 for storing the value f(t) * c(t-1) when that value is output from the multiplier device 1703 via the multiplexer 1710; a register 1705 for storing the value i(t) * u(t) when that value is output from the multiplier device 1703 via the multiplexer 1710; a register 1706 for storing the value o(t) * c~(t) when that value is output from the multiplier device 1703 via the multiplexer 1710; and a multiplexer 1709.

[0073] While LSTM cell 1600 contains multiple sets of VMM arrays 1601 and respective activation function blocks 1602, LSTM cell 1700 contains only one set of VMM array 1701 and activation function block 1702, which is used to represent multiple layers in the example of LSTM cell 1700. LSTM cell 1700 therefore requires less space than LSTM cell 1600: one-quarter as much for the VMMs and activation function blocks.

[0074] It can be further understood that an LSTM unit typically includes multiple VMM arrays, each of which requires functionality provided by specific circuit blocks outside the VMM array, such as adder and activation function blocks and high-voltage generation blocks. Providing separate circuit blocks for each VMM array would require a considerable amount of space within the semiconductor device and would be somewhat inefficient. Therefore, the examples described below reduce the circuitry required outside the VMM array itself.

<Gated Recurrent Unit>

[0075] Analog VMM implementations can be used in GRU (gated recurrent unit) systems. A GRU is a gating mechanism within a recurrent neural network. GRUs are similar to LSTMs, except that GRU cells generally contain fewer components than LSTM cells.

[0076] Figure 18 illustrates an example GRU 1800. In this example, GRU 1800 includes cells 1801, 1802, 1803, and 1804. Cell 1801 receives input vector x0 and generates output vector h0. Cell 1802 receives input vector x1 and output vector (hidden state) h0 from cell 1801 and generates output vector h1. Cell 1803 receives input vector x2 and output vector (hidden state) h1 from cell 1802 and generates output vector h2. Cell 1804 receives input vector x3 and output vector (hidden state) h2 from cell 1803 and generates output vector h3. Additional cells can also be used; a GRU with four cells is just one example.

[0077] Figure 19 illustrates an exemplary implementation of GRU cell 1900, which may be used in cells 1801, 1802, 1803, and 1804 of Figure 18. GRU cell 1900 receives an input vector x(t) and an output vector h(t-1) from a preceding GRU cell and generates an output vector h(t). GRU cell 1900 includes sigmoid function devices 1901 and 1902, each of which applies a number between 0 and 1 to the components from the output vector h(t-1) and the input vector x(t). GRU cell 1900 also includes a tanh device 1903 for applying a hyperbolic tangent function to the input vector, multiple multiplier devices 1904, 1905, and 1906 for multiplying two vectors, an adder device 1907 for adding two vectors, and a complementary device 1908 for subtracting the input from 1 to generate an output.
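The GRU cell's data flow can be sketched in the same spirit. This is a minimal illustration under assumptions: the patent text here gives no equations, so the conventional GRU form is used, with the output formed as h(t-1) * z(t) + h^(t) * (1 - z(t)). The names r, z, h_hat and the weight matrices `w_r`, `w_z`, `w_h` are hypothetical, and bias terms are omitted.

```python
import math


def matvec(matrix, vector):
    """Matrix-vector product; the operation a VMM array would perform."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def gru_cell(x, h_prev, w_r, w_z, w_h):
    """One conventional GRU step (biases omitted for brevity)."""
    hx = h_prev + x                                # concatenate h(t-1), x(t)
    r = sigmoid(matvec(w_r, hx))                   # sigmoid device: reset gate
    z = sigmoid(matvec(w_z, hx))                   # sigmoid device: update gate
    rh = [rk * hk for rk, hk in zip(r, h_prev)]    # multiplier: h(t-1) * r(t)
    h_hat = [math.tanh(v) for v in matvec(w_h, rh + x)]  # tanh device
    # Complement (1 - z), two multipliers, and the adder form the output.
    return [hk * zk + hh * (1.0 - zk) for hk, zk, hh in zip(h_prev, z, h_hat)]


zero = [[0.0, 0.0]]                                # 1 hidden unit, 1 input
print(gru_cell([1.0], [2.0], zero, zero, zero))    # [1.0]
```

With all-zero weights both gates evaluate to 0.5 and the candidate to 0, so the output is half of the previous hidden state.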

[0078] Figure 20 illustrates GRU cell 2000, an example of an implementation of GRU cell 1900. For the reader's convenience, the same numbering used in GRU cell 1900 is used in GRU cell 2000. As can be seen from Figure 20, the sigmoid function devices 1901 and 1902, and the tanh device 1903, each contain multiple VMM arrays 2001 and activation function blocks 2002. Thus, VMM arrays are particularly useful in GRU cells used in certain neural network systems. The multiplier devices 1904, 1905, and 1906, the adder device 1907, and the complementary device 1908 are implemented in either a digital or analog manner. The activation function block 2002 can be implemented in either a digital or analog manner.

[0079] Figure 21 shows GRU cell 2100, an alternative to GRU cell 2000 (and another example of an implementation of GRU cell 1900). In Figure 21, GRU cell 2100 utilizes VMM array 2101 and activation function block 2102, which, when configured as a sigmoid function, applies a number between 0 and 1 to control the degree to which each component of the input vector contributes to the output vector. In Figure 21, sigmoid function devices 1901 and 1902, and tanh device 1903, share the same physical hardware (VMM array 2101 and activation function block 2102) in a time-division multiplexed manner. GRU cell 2100 also includes a multiplier device 2103 for multiplying two vectors; an adder device 2105 for adding two vectors; a complementary device 2109 for subtracting the input from 1 to generate an output; a multiplexer 2104; a register 2106 for holding the value h(t-1) * r(t) when that value is output from the multiplier device 2103 via the multiplexer 2104; a register 2107 for holding the value h(t-1) * z(t) when that value is output from the multiplier device 2103 via the multiplexer 2104; and a register 2108 for holding the value h^(t) * (1 - z(t)) when that value is output from the multiplier device 2103 via the multiplexer 2104.

[0080] While GRU cell 2000 contains multiple sets of VMM arrays 2001 and activation function blocks 2002, GRU cell 2100 contains only one set of VMM array 2101 and activation function block 2102, which is used to represent multiple layers in the example of GRU cell 2100. GRU cell 2100 therefore requires less space than GRU cell 2000: one-third as much for the VMM and activation function block.

[0081] It can be further understood that a GRU system typically includes multiple VMM arrays, each of which requires functionality provided by specific circuit blocks outside the VMM array, such as adder and activation function blocks and high-voltage generation blocks. Providing separate circuit blocks for each VMM array would require a considerable amount of space within the semiconductor device and would be somewhat inefficient. Therefore, the examples described below reduce the circuitry required outside the VMM array itself.

[0082] The input to the VMM array may be analog level, binary level, pulse, time-modulated pulse, or digital bit (in which case a DAC is required to convert the digital bit to an appropriate input analog level), and the output may be analog level, binary level, timing pulse, pulse, or digital bit (in which case an output ADC is required to convert the output analog level to a digital bit).

[0083] For each memory cell in a VMM array, each weight W can be implemented by a single memory cell, a differential cell, or two blended memory cells (the average of two cells). In the case of a differential cell, two memory cells are required to implement the weight W as a differential weight (W = W+ - W-). In the case of two blended memory cells, two memory cells are required to implement the weight W as the average of two cells.

[0084] Figure 31 illustrates the VMM system 3100. In some examples, the weights W stored in the VMM array are stored as differential pairs, W+ (positive weight) and W- (negative weight), where W = (W+) - (W-). In the VMM system 3100, half of the bit lines are designated as W+ lines, i.e., bit lines connected to memory cells that store the positive weights W+, and the other half of the bit lines are designated as W- lines, i.e., bit lines connected to memory cells that implement the negative weights W-. The W- lines are interspersed alternately among the W+ lines. The subtraction is performed by adders, such as adders 3101 and 3102, which receive current from a W+ line and a W- line. The outputs of the W+ line and the W- line are combined to effectively give W = W+ - W- for each pair of (W+, W-) cells in all pairs of (W+, W-) lines. Although the W- lines have been described as interspersed alternately among the W+ lines, in other examples the W+ and W- lines can be placed anywhere in the array.
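The differential scheme W = W+ - W- can be illustrated with a short sketch. It is only a model under assumptions: a signed weight is split so that one cell of the pair holds its magnitude and the other holds zero (the patent does not specify how the pair is chosen), and each line current is modeled as the sum of weight times input. All names are hypothetical.

```python
def split_weight(w):
    """Map a signed weight to a non-negative (w_plus, w_minus) pair.

    Assumption: one cell of the pair holds the magnitude, the other holds 0.
    """
    return (w, 0) if w >= 0 else (0, -w)


def differential_output(weights, inputs):
    """Model one (W+, W-) line pair followed by a subtracting adder."""
    plus_line = 0    # summed current on the W+ bit line
    minus_line = 0   # summed current on the W- bit line
    for w, x in zip(weights, inputs):
        w_plus, w_minus = split_weight(w)
        plus_line += w_plus * x
        minus_line += w_minus * x
    return plus_line - minus_line   # the adder performs the subtraction


print(differential_output([2, -3, 1], [1, 1, 4]))  # 3
```

The result equals the signed dot product 2*1 + (-3)*1 + 1*4, even though every stored value is non-negative.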

[0085] Figure 32 illustrates another example. In the VMM system 3210, positive weights W+ are implemented in the first array 3211, and negative weights W- are implemented in the second array 3212, which is separate from the first array, and the resulting weights are appropriately combined by the adder circuit 3213.

[0086] Figure 33 illustrates the VMM system 3300. The weights W stored in the VMM array are stored as differential pairs, W+ (positive weight) and W- (negative weight), where W = (W+) - (W-). The VMM system 3300 comprises arrays 3301 and 3302. Half of the bit lines in each of arrays 3301 and 3302 are designated as W+ lines, i.e., bit lines connected to memory cells that store the positive weights W+, and the other half of the bit lines in each of arrays 3301 and 3302 are designated as W- lines, i.e., bit lines connected to memory cells that implement the negative weights W-. The W- lines are interspersed alternately among the W+ lines. The subtraction is performed by adders that receive current from a W+ line and a W- line, such as adders 3303, 3304, 3305, and 3306. The outputs of the W+ line and the W- line from each array 3301 and 3302 are combined to effectively give W = W+ - W- for each pair of (W+, W-) cells in all pairs of (W+, W-) lines. In addition, the W values from each array 3301 and 3302 can be further combined via adders 3307 and 3308, such that each W value is the result of subtracting the W value from array 3302 from the W value from array 3301, meaning that the final result from adders 3307 and 3308 is a difference of two differential values.

[0087] Each non-volatile memory cell used in an analog neural memory system is erased and programmed to hold a very specific and precise amount of charge, i.e., number of electrons, in its floating gate. For example, each floating gate should hold one of N different values, where N is the number of different weights that can be represented by each cell. Examples of N include 16, 32, 64, 128, and 256.
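To make the idea of N discrete values per cell concrete, the sketch below maps a continuous weight to the nearest of N levels. Uniform spacing between `w_min` and `w_max` is an assumption made for illustration; the patent only states that each floating gate holds one of N values. The function name and parameters are hypothetical.

```python
def quantize_weight(w, w_min, w_max, n_levels):
    """Return (level_index, quantized_value) for the nearest of N levels.

    Assumption: levels are uniformly spaced from w_min to w_max inclusive.
    """
    if n_levels < 2:
        raise ValueError("need at least two levels")
    w = min(max(w, w_min), w_max)               # clamp into the valid range
    step = (w_max - w_min) / (n_levels - 1)     # spacing between levels
    level = round((w - w_min) / step)
    return level, w_min + level * step


print(quantize_weight(3.2, 0.0, 7.5, 16))  # (6, 3.0)
```

With N = 16 over the range 0.0 to 7.5 the spacing is 0.5, so 3.2 is stored as level 6.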

[0088] As the applications of artificial neural networks become more complex, the need to increase speed while maintaining accuracy is growing. Prior art VMM systems utilize digital inputs and digital outputs, which require analog-to-digital and digital-to-analog conversions at various stages.

[0089] What is needed is a VMM system architecture that operates in the analog domain, where the input is in analog format.

[Summary of the Invention]

[0090] Numerous examples are described for providing artificial neural network systems that utilize vector matrix multiplication arrays with analog inputs.

[0144] [Brief Description of the Drawings]

[0145]
[Figure 1] Illustrates an artificial neural network.
[Figure 2] Illustrates a prior art split-gate flash memory cell.
[Figure 3] Illustrates another prior art split-gate flash memory cell.
[Figure 4] Illustrates another prior art split-gate flash memory cell.
[Figure 5] Illustrates another prior art split-gate flash memory cell.
[Figure 6] Illustrates the different levels of an exemplary artificial neural network that utilizes one or more non-volatile memory arrays.
[Figure 7] A block diagram illustrating a VMM system.
[Figure 8] A block diagram illustrating an exemplary artificial neural network that utilizes one or more VMM systems.
[Figure 9] Illustrates another example of a VMM system.
[Figure 10] Illustrates another example of a VMM system.
[Figure 11] Illustrates another example of a VMM system.
[Figure 12] Illustrates another example of a VMM system.
[Figure 13] Illustrates another example of a VMM system.
[Figure 14] Illustrates a prior art long short-term memory system.
[Figure 15] Illustrates an example cell for use in a long short-term memory system.
[Figure 16] Illustrates an example implementation of the cell of Figure 15.
[Figure 17] Illustrates another example implementation of the cell of Figure 15.
[Figure 18] Illustrates a prior art gated recurrent unit system.
[Figure 19] Illustrates an example cell for use in a gated recurrent unit system.
[Figure 20] Illustrates an example implementation of the cell of Figure 19.
[Figure 21] Illustrates another example implementation of the cell of Figure 19.
[Figure 22] Illustrates another example of a VMM system.
[Figure 23] Illustrates another example of a VMM system.
[Figure 24] Illustrates another example of a VMM system.
[Figure 25] Illustrates another example of a VMM system.
[Figure 26] Illustrates another example of a VMM system.
[Figure 27] Illustrates another example of a VMM system.
[Figure 28] Illustrates another example of a VMM system.
[Figure 29] Illustrates another example of a VMM system.
[Figure 30] Illustrates another example of a VMM system.
[Figure 31] Illustrates another example of a VMM system.
[Figure 32] Illustrates another example of a VMM system.
[Figure 33] Illustrates another example of a VMM system.
[Figure 34] Illustrates another example of a VMM system.
[Figure 35A] Illustrates an analog voltage input circuit.
[Figure 35B] Illustrates an analog voltage input circuit.
[Figure 35C] Illustrates an analog voltage input circuit.
[Figure 36] Illustrates an analog voltage input circuit.
[Figure 37] Illustrates an analog voltage input circuit.
[Figure 38] Illustrates another example of a VMM system.
[Figure 39] Illustrates another example of a VMM system.
[Figure 40] Illustrates another example of a VMM system.
[Figure 41] Illustrates another example of a VMM system.
[Figure 42A] Illustrates an example of a VMM system.
[Figure 42B] Illustrates an example of a VMM system.
[Figure 43] Illustrates another example of a VMM system.
[Figure 44] Illustrates another example of a VMM system.
[Figure 45A] Illustrates an example current-to-voltage converter.
[Figure 45B] Illustrates another example of a current-to-voltage converter.
[Figure 46A] Illustrates another example of a current-to-voltage converter.
[Figure 46B] Illustrates another example of a current-to-voltage converter.
[Figure 47] Illustrates another example of a current-to-voltage converter.
[Figure 48] Illustrates an example of a current-to-pulse converter.
[Figure 49] Illustrates an example activation circuit.
[Figure 50] Illustrates another example of an activation circuit.
[Figure 51] Illustrates an example average current pooling circuit.
[Figure 52] Illustrates an example maximum voltage pooling circuit.
[Figure 53] Illustrates an example minimum voltage pooling circuit.

[Modes for Carrying Out the Invention]

<VMM System Architecture>

[0146] Figure 34 shows a block diagram of the VMM system 3400. The VMM system 3400 comprises a VMM array 3401, a row decoder 3402, a high-voltage decoder 3403, a column decoder 3404, a bit line driver 3405, an input circuit 3406, an output circuit 3407, a control logic 3408, and a bias generator 3409. The VMM system 3400 further comprises a high-voltage generation block 3410, which includes a charge pump 3411, a charge pump regulator 3412, and a high-voltage analog precision level generator 3413. The VMM system 3400 further comprises an algorithm controller 3414 (for program / erase or weight adjustment), an analog circuit 3415, a control engine 3416 (which may include, but is not limited to, special functions such as arithmetic functions, activation functions, and embedded microcontroller logic), and a test control logic 3417.

[0147] As will be described in more detail below, the input circuit 3406 may include circuits such as AAC (analog-to-analog converter, e.g., current-to-voltage converter or logarithmic converter), PAC (pulse-to-analog level converter), or any other type of converter. The input circuit 3406 may implement one or more of the following: normalization, linear or nonlinear up / downscaling functions, or arithmetic functions. The input circuit 3406 may implement a temperature compensation function for the input level. The input circuit 3406 may perform activation functions such as a rectified linear activation function (ReLU) or a sigmoid.

[0148] As will be described in more detail below, the output circuit 3407 may include circuits such as AAC (analog-to-analog converter, e.g., current-to-voltage converter or logarithmic converter), APC (analog-to-pulse(s) converter or analog-to-time-modulated pulse converter), or any other type of converter. The output circuit 3407 may implement activation functions such as ReLU or sigmoid. The output circuit 3407 may implement one or more of the following for the neuron output: statistical normalization, regularization, up / down scaling / gain functions, statistical rounding, or arithmetic functions (e.g., addition, subtraction, division, multiplication, shift, logarithm). The output circuit 3407 may implement a temperature compensation function for the neuron output or array output (such as a bit line output) to keep the array's power consumption approximately constant with respect to temperature changes, or to improve the accuracy of the array (neuron) output by keeping the IV slope approximately the same with respect to temperature changes. The output circuit 3407 may include an output temperature compensation circuit for an output circuit, such as an ADC circuit, which maintains a nearly constant full-scale input range for the ADC circuit across different array output current ranges.

[0149] Next, we will provide further details regarding the example of input circuit 3406.

[0150] Table 8 shows the various types of functions that can be performed by the input circuit 3406 in the analog domain. Table 8: Exemplary functions performed by input circuit 3406 [Table 9]

[0151] Figure 35A shows an analog voltage input circuit 3500 that can be used to perform Example 1 (from neuron current to analog voltage) in Table 8. The input circuit 3500 receives n+1 neuron input currents Ineu[n:0], scaled or unscaled, and converts them into respective analog voltages in a linear manner. The input circuit 3500 comprises blocks 3501-0, 3501-1, ..., 3501-(n-1), 3501-n, each block being coupled to one of n+1 rows in a VMM array (such as VMM array 3401 in Figure 34). Block 3501-0 includes a row decoder 3502-0, a switch 3503-0, a capacitor 3504-0, and a buffer 3505-0. The row decoders 3502-0 to 3502-n receive respective row addresses, and the output of each row decoder is asserted if the received row address is the address of its corresponding row. For example, an individual row such as row 0 may be asserted, or multiple rows such as rows 0 through 512 may be asserted. Taking block 3501-0 as an example, the asserted output of row decoder 3502-0, in response to a received pulse of pulse width tp, closes switch 3503-0 for a predetermined time tp. While closed, switch 3503-0 passes current Ineu_0, which charges one terminal of capacitor 3504-0 and generates a voltage VCGSH_0 that is effectively a sampled and held representation of current Ineu_0. Voltage VCGSH_0 is supplied to buffer 3505-0, a voltage buffer, which maintains voltage VCG0 at its output even after switch 3503-0 is opened at the end of the predetermined time. The other terminal of capacitor 3504-0 is connected to a common potential such as ground. Thus, row decoder 3502-0 enables the application of current Ineu_0 to capacitor 3504-0 for the predetermined time tp, and block 3501-0 performs a sample-and-hold function. The voltage VCG0 is then applied to the control gate line of row 0 in the VMM array. Each block 3501-1, ..., 3501-(n-1), 3501-n contains the same components as block 3501-0 and operates in the same manner. In the illustrated example, switch 3503-1 also receives a pulse with pulse width tp, but switches 3503-(n-1) and 3503-n do not receive a pulse.
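The sample-and-hold behavior of blocks 3501-0 to 3501-n can be sketched numerically as follows. This is a minimal illustration, not the circuit of Figure 35A itself: it assumes an ideal capacitor that starts discharged, a constant input current during tp, and a unity-gain buffer, so that the held voltage is V = I * tp / C. The function names and the component values are hypothetical.

```python
def sample_and_hold_current(i_neu_amps, tp_seconds, c_farads, v_initial=0.0):
    """Voltage on an ideal capacitor charged by a constant current for tp.

    Assumption: ideal switch and capacitor, so V = V0 + I * tp / C,
    which is linear in the input current.
    """
    if c_farads <= 0:
        raise ValueError("capacitance must be positive")
    return v_initial + i_neu_amps * tp_seconds / c_farads


def input_block_voltages(i_neu, selected_rows, tp_seconds, c_farads):
    """Return the buffered voltage VCG for each row.

    Rows whose row decoder output is not asserted never close their switch,
    so (assumption) their capacitor stays at 0 V.
    """
    return [
        sample_and_hold_current(i, tp_seconds, c_farads) if row in selected_rows else 0.0
        for row, i in enumerate(i_neu)
    ]


# Hypothetical values: four rows, rows 0 and 1 selected, tp = 100 ns, C = 1 pF.
vcg = input_block_voltages([1e-6, 2e-6, 3e-6, 4e-6], {0, 1}, 100e-9, 1e-12)
```

With these assumed values, the two selected rows hold voltages in the ratio of their input currents, while the unselected rows remain at the common potential, mirroring the illustrated case in which only switches 3503-0 and 3503-1 receive a pulse.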

[0152] Figure 35B illustrates an analog voltage input circuit 3550 that can be used to perform Example 2 (analog current to analog voltage) in Table 8. The input circuit 3550 receives neuron input voltages Vneu[n:0] and converts them to analog voltages in a linear manner. The input circuit 3550 comprises blocks 3551-0, 3551-1, ..., 3551-(n-1), 3551-n, each block being coupled to one of the n+1 rows in a VMM array (such as VMM array 3401 in Figure 34). Block 3551-0 includes a row decoder 3552-0, a switch 3553-0, a capacitor 3554-0, and a buffer 3555-0. The row decoders 3552-0 to 3552-n receive a row address, and the output of each row decoder is asserted if the row address is the address of its corresponding row. For example, an individual row such as row 0 may be asserted, or multiple rows such as row 0 through row 512 may be asserted. The asserted output signal closes switch 3553-0 for a predetermined time tp (in response to a received pulse (not shown) of pulse width tp). While closed, switch 3553-0 passes voltage Vneu_0, charging one terminal of capacitor 3554-0 to a sampled and held voltage VCGSH_0. Voltage VCGSH_0 is supplied to buffer 3555-0, which functions as a voltage buffer and maintains voltage VCG0 at its output even after switch 3553-0 is opened at the end of the predetermined time. The other terminal of capacitor 3554-0 is connected to a common potential such as ground. Thus, row decoder 3552-0 enables the application of Vneu_0 to capacitor 3554-0. In this way, block 3551-0 performs a sample-and-hold function. Next, the voltage VCG0 is applied to the control gate line of row 0 in the VMM array. Each block 3551-1, ..., 3551-(n-1), 3551-n contains the same components as block 3551-0 and operates in the same manner.

[0153] Figure 35C illustrates an analog voltage input circuit 3580 that can be used to perform Example 2 (analog current to analog voltage) in Table 8. In this figure, the input neuron voltage is directly enabled (passed through) by row decoders 3552-0, 3552-1, ..., 3552-(n-1), 3552-n, which receive the appropriate addresses, and is applied as the voltage VCG0 of the corresponding row by switches 3553-0, 3553-1, ..., 3553-(n-1), 3553-n. VCG0 is applied to the control gate line of the VMM array.

[0154] Figure 36 illustrates the analog voltage input circuit 3600, which can be used in the input circuit 3406 to perform Examples 3 and 4 in Table 8, converting an input containing one or more pulses into a voltage in a linear manner. The input circuit 3600 comprises blocks 3601-0, 3601-1, ..., 3601-(n-1), 3601-n, each of which is coupled to one of n+1 rows in a VMM array (such as the VMM array 3401 in Figure 34). Block 3601-0 includes a row decoder 3602-0, a switch 3603-0, a capacitor 3604-0, a switch 3607-0, an input signal 3608-0, a current source 3609-0, and a buffer 3605-0. The row decoder 3602-0 receives a row address, and its output is asserted if the row address is the address of the corresponding row. For example, an individual row such as row 0 may be asserted, or multiple rows such as rows 0 through 512 may be asserted. The asserted output signal of row decoder 3602-0 closes switch 3607-0, which allows input signal 3608-0, a pulse with pulse width tp0, to pass through and close switch 3603-0 for the duration of pulse width tp0. The closed switch 3603-0 allows current from the respective current source 3609-0 to pass through, generating a pulsed current. The pulsed current charges one terminal of capacitor 3604-0, generating voltage VCGSH_0. Voltage VCGSH_0 is supplied to buffer 3605-0, which functions as a voltage buffer and maintains voltage VCG0 at its output even after switch 3603-0 is opened at the end of pulse width tp0. The other terminal of capacitor 3604-0 is connected to a common potential, such as ground. Each pulse input 3608-0 can be a single pulse with a variable pulse width tp, as indicated by tp0 for row 0, or one or more pulses of constant width whose number is variable, as indicated by the two pulses of constant width tp1 for row 1. Here, the variability of pulse width or pulse count reflects the activation value applied to a particular row. For example, the activation value may vary from 0 to 255 for an 8-bit activation value. Therefore, block 3601-0 converts the pulse input signal into a sampled and held voltage VCG0. The voltage VCG0 is then applied to the control gate line of row 0 in the VMM array. Each block 3601-1, ..., 3601-(n-1), 3601-n contains the same components as block 3601-0 and operates in the same manner.
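The two pulse encodings described above (one pulse of variable width, or a variable number of constant-width pulses) can be compared with a short sketch. It assumes an ideal current source and an ideal, initially discharged capacitor, so the held voltage depends only on the total time the switch is closed. The helper names, the unit pulse width, and the component values are hypothetical.

```python
def pulses_to_voltage(pulse_widths_s, i_source_amps, c_farads):
    """Held voltage after a train of pulses gates a constant current onto a capacitor.

    Assumption: ideal components, so V = I_source * (total on-time) / C.
    """
    total_on_time = sum(pulse_widths_s)
    return i_source_amps * total_on_time / c_farads


def encode_as_pulse_width(activation, unit_width_s):
    """One pulse whose width scales with the activation value."""
    return [activation * unit_width_s]


def encode_as_pulse_count(activation, unit_width_s):
    """`activation` pulses, each of the same constant width."""
    return [unit_width_s] * activation


# Hypothetical values: activation 5, 1 ns unit width, 1 uA source, 1 pF capacitor.
v_width = pulses_to_voltage(encode_as_pulse_width(5, 1e-9), 1e-6, 1e-12)
v_count = pulses_to_voltage(encode_as_pulse_count(5, 1e-9), 1e-6, 1e-12)
```

Under these idealized assumptions both encodings produce the same held voltage for the same activation value, which is why either form of pulse input can drive the same block.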

[0155] Figure 37 depicts an analog voltage input circuit 3700, which can be used in input circuit 3406 to perform Example 5 in Table 8, converting the scaled neuron input current Ineu_scaled into a pulse signal whose width is proportional to the magnitude of Ineu_scaled. Input circuit 3700 includes blocks 3701-0, 3701-1, ..., 3701-(n-1), 3701-n, and each block is coupled to one of n+1 rows within a VMM array (such as VMM array 3401 in Figure 34). Block 3701-0 includes a row decoder 3702-0, a switch 3703-0, a capacitor 3704-0, and a voltage-to-pulse (VtP) converter 3705-0. The row decoder 3702-0 receives a row address, and EN, which is the output of the row decoder 3702-0, is asserted for a predetermined time tp when the row address is the address of row 0, generating a pulse with a pulse width tp. The asserted output signal closes the switch 3703-0. While the switch 3703-0 is closed, it allows the signal Ineu_scaled to pass through, charging one terminal of the capacitor 3704-0 for a period of tp to generate a voltage VCGSH_0, and the voltage VCGSH_0 is provided to the voltage-to-pulse converter 3705-0. The other terminal of the capacitor 3704-0 is connected to a common potential such as ground. The voltage-to-pulse converter 3705-0 includes a comparator 3706-0, which compares the generated voltage VCGSH_0 with a reference voltage VRAMP that ramps upward as shown in the graph. Control_0, which is the output of the comparator, is high when VCGSH_0 > VRAMP. When Control_0 is high, it closes the switch 3707-0 to generate a voltage VCG0 equal to a voltage Vsource, for example, 1.5V. When VCGSH_0 < VRAMP, Control_0 switches to low, opens the switch 3707-0, sets VCG0 to low, and effectively terminates the pulse. Thus, block 3701-0 converts the input current Ineu_scaled into a pulse VCG0 of a constant voltage. Note that the width of the pulse is proportional to the magnitude of the input current. Next, pulse VCG0 is applied to the control gate line of row 0 in the VMM array. Each block 3701-1, ..., 3701-(n-1), 3701-n contains the same components as block 3701-0 and operates in the same manner.
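The ramp comparison performed by the voltage-to-pulse converter can be sketched as follows. The sketch assumes an ideal comparator and a linear ramp VRAMP(t) = slope * t starting at 0 V, neither of which is specified in the text; the function names and the numeric values other than the 1.5 V example for Vsource are hypothetical.

```python
def vtp_pulse_width(v_cgsh, ramp_slope_v_per_s):
    """Width of the output pulse for a held voltage v_cgsh.

    Assumption: VRAMP(t) = slope * t, so Control_0 is high while
    v_cgsh > slope * t and the pulse ends at t = v_cgsh / slope.
    """
    if ramp_slope_v_per_s <= 0:
        raise ValueError("ramp slope must be positive")
    return max(v_cgsh, 0.0) / ramp_slope_v_per_s


def vcg_waveform(v_cgsh, ramp_slope_v_per_s, v_source, times_s):
    """VCG0 sampled at the given times: Vsource while Control_0 is high, else 0 V."""
    return [v_source if v_cgsh > ramp_slope_v_per_s * t else 0.0 for t in times_s]


# Hypothetical values: held voltage 0.1 V, ramp of 1 V per microsecond.
width = vtp_pulse_width(0.1, 1e6)
samples = vcg_waveform(0.1, 1e6, 1.5, [0.0, 50e-9, 150e-9])
```

Because the held voltage is itself linear in the input current under the ideal sample-and-hold model, the resulting pulse width in this sketch is linear in the input current as well.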

[0156] VtP block 3705-0 is equally applicable to Figures 35A / 35B / 35C and can convert the sampled and held voltage into pulses to be applied to the VMM array to perform Example 7 in Table 8.

[0157] Next, we will provide further details regarding the example of the output circuit 3407 in Figure 34.

[0158] Table 9 shows the various types of functions that can be performed by the output circuit 3407 in the analog domain. Table 9: Exemplary functions performed by output circuit 3407 [Table 10]

[0159] Figure 38 shows a VMM system 3800 comprising a VMM array 3401 and an output circuit 3407, the output circuit 3407 comprising an Ineuron scaler 3801 for performing Example 1 in Table 9. The Ineuron scaler 3801 receives output from the VMM array 3401 in the form of neuron current Ineu. The Ineuron scaler 3801 converts the neuron current Ineu to a scaled neuron current Ineu_scaled. The Ineuron scaler 3801 can scale the current, for example, using a current mirror ratio circuit.

[0160] Figure 39 shows a VMM system 3900 comprising a VMM array 3401 and an output circuit 3407, the output circuit 3407 comprising an Ineuron scaler 3801 and a current-to-voltage converter (ItV) 3901 for performing Example 2 in Table 9. The Ineuron scaler 3801 receives the output from the VMM array 3401 in the form of neuron current Ineu. The Ineuron scaler 3801 converts the neuron current Ineu to a scaled neuron current Ineu_scaled. The current-to-voltage converter 3901 receives the scaled neuron current Ineu_scaled and converts the current to a voltage Vout according to a linear or logarithmic function.

[0161] Figure 40 shows a VMM system 4000 comprising a VMM array 3401 and an output circuit 3407, the output circuit 3407 comprising a current-to-pulse width (ItPW) converter 4001 for performing Example 3 in Table 9. The current-to-pulse width converter 4001 receives the output from the column of the VMM array 3401 in the form of neuron currents Ineu.

[0162] The current-to-pulse width converter 4001 converts the neuron current Ineu into a signal Pulse_width, which is a signal containing a single pulse whose width is proportional to the magnitude of Ineu.

[0163] Figure 41 shows a VMM system 4100 comprising a VMM array 3401 and an output circuit 3407, the output circuit 3407 comprising a current-to-pulse count (ItPC) converter 4101 for performing Example 4 in Table 9. The current-to-pulse count converter 4101 receives output from a column of the VMM array 3401 in the form of neuron currents Ineu. The current-to-pulse count converter 4101 converts the neuron currents Ineu into a signal Pulse_count, which is a signal containing one or more pulses of uniform width, with the pulse count being proportional to the magnitude of Ineu.

[0164] Figures 42A, 42B, 43, and 44 show VMM systems 4200, 4250, 4300, and 4400, respectively, which are similar to the VMM systems in Figures 39 to 41, except that an activation circuit 4201 is added to the output circuit 3407. The activation circuit 4201 performs an activation function such as ReLU, sigmoid, or tanh, but is not limited to these.

[0165] In Figure 42A, the output circuit 3407 comprises an Ineuron scaler 3801, an activation circuit 4201, and a current-to-voltage converter 3901. The Ineuron scaler 3801 receives the output from the VMM array 3401 in the form of a neuron current Ineu. The Ineuron scaler 3801 converts the neuron current Ineu into a scaled neuron current Ineu_scaled. The activation circuit 4201 receives the scaled neuron current Ineu_scaled and performs a function on it to generate I_active. The current-to-voltage converter 3901 receives I_active and converts the current into a voltage Vout according to a linear or logarithmic function.

[0166] Alternatively, the activation can be placed after the current-voltage converter, as shown in Figure 42B.
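The two orderings of Figures 42A and 42B can be sketched as a simple function chain. This is only an illustration under stated assumptions: the scaler is modeled as a fixed ratio, the activation as ReLU, and the current-to-voltage converter as a linear transresistance (the text also allows a logarithmic converter). All function names and values are hypothetical.

```python
def ineuron_scaler(i_neu, ratio):
    """Scale the neuron current by a fixed ratio (e.g. a current mirror ratio)."""
    return i_neu * ratio


def itv_linear(i_in, r_ohms):
    """Assumed linear current-to-voltage conversion: Vout = I * R."""
    return i_in * r_ohms


def output_circuit_42a(i_neu, ratio, r_ohms):
    """Figure 42A ordering: scaler, then activation on the current, then ItV."""
    i_active = max(ineuron_scaler(i_neu, ratio), 0.0)  # ReLU in the current domain
    return itv_linear(i_active, r_ohms)


def output_circuit_42b(i_neu, ratio, r_ohms):
    """Figure 42B ordering: scaler, then ItV, then activation on the voltage."""
    v = itv_linear(ineuron_scaler(i_neu, ratio), r_ohms)
    return max(v, 0.0)  # ReLU in the voltage domain


# Hypothetical values: 4 uA neuron current, 1:2 down-scaling, 100 kOhm gain.
vout_a = output_circuit_42a(4e-6, 0.5, 1e5)
vout_b = output_circuit_42b(4e-6, 0.5, 1e5)
```

For ReLU combined with a positive linear gain the two orderings give the same result; with a logarithmic converter or a different activation function they generally would not, so the choice of ordering matters in those cases.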

[0167] In Figure 43, the output circuit 3407 comprises an Ineuron scaler 3801, an activation circuit 4201, and a current-to-pulse width converter 4001. The Ineuron scaler 3801 receives the output from the VMM array 3401 in the form of a neuron current Ineu. The Ineuron scaler 3801 converts the neuron current Ineu into a scaled neuron current Ineu_scaled. The activation circuit 4201 receives the scaled neuron current Ineu_scaled and executes a function on it to generate I_active. The current-to-pulse width converter 4001 converts I_active into a signal Pulse_width, which is a signal containing a single pulse whose width is proportional to the magnitude of I_active.

[0168] In Figure 44, the output circuit 3407 comprises an Ineuron scaler 3801, an activation circuit 4201, and a current-to-pulse count converter 4101. The Ineuron scaler 3801 receives the output from the VMM array 3401 in the form of neuron current Ineu. The Ineuron scaler 3801 converts the neuron current Ineu into a scaled neuron current Ineu_scaled. The activation circuit 4201 receives the scaled neuron current Ineu_scaled and performs a function on it to generate I_active. The current-to-pulse count converter 4101 converts I_active into a signal Pulse_count, which is a signal containing one or more pulses of uniform width, with the number of pulses proportional to the magnitude of I_active.

[0169] Figures 45 to 51 illustrate exemplary circuits for implementing the functions of the output circuit 3407 described in Figures 38 to 44.

[0170] Figure 45A illustrates a current-to-voltage converter 4500 that may be used for the current-to-voltage converter (logarithmic) 3901. The current-to-voltage converter 4500 comprises an exemplary block 4501-0 coupled to bit line BLR0 and identical blocks for other bit lines. The current-to-voltage converter 4500 also comprises switches 4506, 4507, 4508, and 4509 and a controller 4510. Block 4501-0 includes a reference cell 4502-0, an operational amplifier 4504-0, and a switch 4505-0. The controller 4510 controls the operation of switch 4505-0, as well as switches 4506, 4507, 4508, and 4509.

[0171] During the operation of the current-voltage converter, controller 4510 closes switch 4505-0 and opens switches 4506, 4507, 4508, and 4509. Block 4501-0 receives the input current I0 on bit line BLR0, which may be the current from VMM array 3401, which is the contribution to Ineu from column 0 in the array. Op-amp 4504-0 forces the voltage at its input to be equal through feedback from the output of op-amp 4504-0 to the control gate of reference cell 4502-0, which forces a constant voltage VREF on bit line BLR0. The current in reference cell 4502-0 is regulated by its control gate (output of op-amp 4504-0) so that the current is equal to the input current I0. The output of op-amp 4504-0 is the same as the control gate voltage Vout-0 of reference cell 4502-0, and is a voltage signal, which is the logarithmic function of the input current I0 received at BLR0 for a reference cell operating in the subthreshold region. For a cell operating in the subthreshold region, VCG is the logarithmic function of the cell current Icell. Blocks identical to block 4501-0 are coupled to their respective array outputs from VMM array 3401.
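The logarithmic relationship between the input current and the control gate voltage can be illustrated with a textbook subthreshold model, Icell = I0 * exp((VCG - Vth) / (n * Vt)). This model and every parameter value below (I0, Vth, n, the thermal voltage) are assumptions chosen for illustration, not values from the specification. The bisection loop stands in for the op-amp feedback that adjusts the control gate until the reference cell current equals the input current.

```python
import math

VT = 0.0259  # assumed thermal voltage near room temperature, in volts


def cell_current(v_cg, i0=1e-12, v_th=0.7, n=1.5):
    """Assumed subthreshold model of the reference cell current."""
    return i0 * math.exp((v_cg - v_th) / (n * VT))


def log_itv(i_in, i0=1e-12, v_th=0.7, n=1.5):
    """Closed-form inverse of the model: VCG = Vth + n * Vt * ln(I / I0)."""
    return v_th + n * VT * math.log(i_in / i0)


def feedback_settle(i_in, lo=0.0, hi=3.0, iters=60):
    """Find the VCG at which the reference cell conducts i_in.

    Bisection is used here only as a numerical stand-in for the analog
    feedback loop; it relies on cell_current being monotonic in v_cg.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if cell_current(mid) < i_in:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


v_out = feedback_settle(1e-9)
```

In this model each tenfold increase in input current raises the output voltage by the same increment, n * Vt * ln(10), which is the defining property of a logarithmic converter.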

[0172] Figure 45B illustrates a current-to-voltage converter 4550 that may be used for the current-to-voltage converter (logarithmic) 3901. The current-to-voltage converter 4550 comprises an exemplary block 4551-0 coupled to bit line BLR0 and identical blocks for other bit lines. The current-to-voltage converter 4550 also comprises switches 4558, 4559, 4560, and 4561 and a controller 4562. Block 4551-0 comprises a reference cell 4552-0, an operational amplifier 4554-0, a switch 4555-0, a capacitor 4556-0, and a buffer 4557-0. The controller 4562 controls switches 4558, 4559, 4560, and 4561.

[0173] During operation, block 4551-0 receives current I0 from the output of VMM array 3401, which is an inverted version of the current from bit line BLR0. The bit line current is inverted so that the current flows from Vdd (high power supply) to low (into this circuit). Optionally, the current may be scaled before being supplied to this circuit.

[0174] The (array output) current is supplied to reference cell 4552-0, which also receives a voltage Vsweep at its control gate terminal when the respective switch is closed. Vsweep is a varying voltage (such as a ramp signal) that charges capacitor 4556-0 during the sweep operation while switch 4555-0 is closed. When the output of operational amplifier 4554-0, which acts as a comparator, changes state as Vsweep changes, switch 4555-0 opens, sampling the instantaneous Vsweep voltage onto capacitor 4556-0, where it is held. The held voltage represents the control gate voltage at which the reference cell conducts the same current as the array output. This voltage is supplied to buffer 4557-0 and output as voltage Vout-0. Vout-0 is a voltage signal that is a logarithmic function of the current I0 received on BLR0, because the cell operates in the subthreshold region, where VCG is a logarithmic function of Icell. Blocks identical to block 4551-0 are coupled to the respective array current outputs of the VMM array 3401.
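The sweep-and-sample operation can be sketched as a stepped ramp that stops at the first voltage where the reference cell current reaches the array current. The subthreshold cell model, its parameters, and the sweep range and step size are all assumptions made for illustration; the function names are hypothetical.

```python
import math

VT = 0.0259  # assumed thermal voltage near room temperature, in volts


def cell_current(v_cg, i0=1e-12, v_th=0.7, n=1.5):
    """Assumed subthreshold model of the reference cell current."""
    return i0 * math.exp((v_cg - v_th) / (n * VT))


def sweep_convert(i_array, v_start=0.0, v_stop=3.0, v_step=1e-3):
    """Step Vsweep upward and return the value held when the comparator trips.

    Returns None if the comparator never trips within the sweep range.
    """
    steps = int(round((v_stop - v_start) / v_step))
    for k in range(steps + 1):
        v_sweep = v_start + k * v_step
        if cell_current(v_sweep) >= i_array:
            # Comparator output changes: the switch opens and the
            # capacitor holds the instantaneous sweep voltage.
            return v_sweep
    return None


v_held = sweep_convert(1e-9)
```

In this sketch the held voltage lies within one sweep step above the exact logarithmic value, so a finer step (or a slower ramp in the analog circuit) trades conversion time for resolution.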

[0175] Figure 46A illustrates a current-to-voltage converter 4600 that can be used for the current-to-voltage converter (logarithmic) 3901. The current-to-voltage converter 4600 includes a reference memory cell 4601, a switch 4602, an operational amplifier 4603, and a controller 4604, arranged as shown in the figure. The reference memory cell 4601 receives the current BLR from the bit line in the VMM array, which is the contribution to the Ineuron for that particular column in the VMM array. The operational amplifier 4603 outputs a voltage VNEUOUT, and the switch 4602 is closed by the controller 4604, applying its voltage to the control gate terminal of the reference memory cell 4601. This feedback loop moves VNEUOUT to a value that makes the voltage applied to the inverting terminal of the operational amplifier 4603 equal to the voltage VREF applied to the non-inverting terminal of the operational amplifier 4603. In this way, the current-to-voltage converter 4600 converts the received current BLR to the voltage VNEUOUT according to a logarithmic function for cells in the subthreshold region and according to a linear function for cells in the linear region.

[0176] Figure 46B illustrates a current-to-voltage converter 4650 that may be used for the current-to-voltage converter (logarithmic) 3901 in Figure 39. The current-to-voltage converter 4650 comprises a reference memory cell 4651, a switch 4652, a comparator 4653, a switch 4654, a capacitor 4655, a buffer 4656, and a controller 4657, arranged as shown. The reference memory cell 4651 receives the current IBL from the bit line in the VMM array, which is the contribution to the Ineuron for its particular column in the VMM array. The comparator 4653 outputs the voltage COMPOUT, and switches 4654 and 4652 are closed by the controller 4657, applying that voltage to the control gate terminal of the reference memory cell 4651. This feedback loop moves COMPOUT to a value that makes the voltage applied to the inverting terminal of the comparator 4653 equal to the voltage VREF applied to the non-inverting terminal of the comparator 4653. The voltage across capacitor 4655 is also COMPOUT, and this voltage is held. This voltage is input to buffer 4656 and output as VNEUOUT. In this way, the current-to-voltage converter 4650 converts the received current IBL into the voltage VNEUOUT according to a logarithmic function.

[0177] Figure 47 illustrates a current-to-voltage converter 4700 that can be used by the current-to-voltage converter 3901 to linearly convert current to voltage. Specifically, the current-to-voltage converter 3901 includes an instance of the current-to-voltage converter 4700 for each bit line output in the VMM array 3401. The current-to-voltage converter 4700 comprises a PMOS transistor 4701 and an operational amplifier 4702 arranged as shown in the figure. One terminal of the PMOS transistor 4701 is connected to the voltage source. The other terminal of the PMOS transistor 4701 is connected to the gate of the PMOS transistor 4701 and coupled to the bit line in the VMM array 3401 and the non-inverting terminal of the operational amplifier 4702. The inverting terminal of the operational amplifier 4702 is connected to the output of the operational amplifier 4702. The current I_BL drawn by the bit line results in a voltage V_IBL output from the operational amplifier 4702. The operational amplifier 4702 functions as a buffer, maintaining the level V_IBL, which reflects the current I_BL drawn by the bit line, regardless of the load connected to its output. Optionally, the current-to-voltage converter 4700 can also be used in a bit line current mirror buffer. In such an example, the voltage V_IBL is supplied to the gate of a similar PMOS transistor (not shown), and the current in that PMOS transistor is the mirror current of the PMOS transistor 4701.

Figure 48 depicts a current-to-pulse converter 4800, which can be used for the current-to-pulse width converter 4001 or the current-to-pulse count converter 4101 to convert a current into one or more pulses. The current-to-pulse converter 4800 receives the neuron current I_BL and the enable signal EN, and comprises a capacitor 4801, a comparator 4802, and an AND gate 4806. During operation, the capacitor 4801 is charged by I_BL. Initially, the voltage across the capacitor 4801 is lower than VREF, and the output COMPOUT of comparator 4802 is high. When the voltage across capacitor 4801 exceeds VREF, the output COMPOUT of comparator 4802 changes from high to low. COMPOUT, along with the enable signal EN, is input to AND gate 4806, and the output of AND gate 4806 becomes a pulse VNEU_PW whose pulse width is proportional to the magnitude of I_BL. This is shown in graphs 4803 and 4804.

[0178] Optionally, an AND gate 4807 can be used instead of an AND gate 4806. The AND gate 4807 receives COMPOUT, EN, and the clock signal as inputs and outputs VNEU_PC. VNEU_PC contains a series of pulses having the frequency and phase of CLK, where the pulses begin when COMPOUT and EN are high and end when COMPOUT or EN is low. This converts the current I_BL into a series of uniform pulses, where the number of pulses is proportional to the magnitude of I_BL.

[0179] Figure 49 shows an activation circuit 4900, which is an exemplary implementation of the activation circuit 4201 for a tanh function. The activation circuit 4900 comprises a current-to-voltage converter 4901, PMOS transistors 4902 and 4903 (forming a current mirror), NMOS transistors 4904 and 4905, and an NMOS transistor 4906, arranged as shown in the figure. The activation circuit 4900 receives a current input I_input and generates a current output Iout according to a sigmoid-shaped (tanh) function performed using a differential pair, as follows: Iout = I1 - I2 = Ibias * tanh(K * (V1-V2) / 2). Thus, the activation circuit 4900 converts I_input to Iout according to this function. Graph 4907 shows Iout as a function of I_input.
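The differential pair relation Iout = Ibias * tanh(K * (V1-V2) / 2) can be evaluated directly, as in the sketch below. The function names and the example values of Ibias and K are hypothetical. The sketch also checks the identity tanh(x / 2) = 2 * sigmoid(x) - 1, which shows that the tanh output is a scaled and shifted logistic sigmoid.

```python
import math


def diff_pair_iout(v1, v2, i_bias, k):
    """Iout = I1 - I2 = Ibias * tanh(K * (V1 - V2) / 2)."""
    return i_bias * math.tanh(k * (v1 - v2) / 2)


def logistic(x):
    """Standard logistic sigmoid, used only to illustrate the tanh identity."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical values: 1 uA bias current, K = 10 per volt.
i_zero = diff_pair_iout(0.5, 0.5, 1e-6, 10.0)  # balanced inputs
i_pos = diff_pair_iout(0.6, 0.5, 1e-6, 10.0)   # V1 slightly above V2
i_sat = diff_pair_iout(2.0, 0.0, 1e-6, 10.0)   # large difference, near +Ibias
```

The output is zero for balanced inputs, odd-symmetric in V1 - V2, and bounded by plus or minus Ibias, which matches the saturating shape expected of this kind of activation.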

[0180] Figure 50 shows an activation circuit 5000, which is an exemplary implementation of the activation circuit 4201. The activation circuit 5000 includes an NMOS transistor 5001, an operational amplifier 5002, and an NMOS transistor 5003, arranged as shown in the figure. The activation circuit 5000 receives a current input I_In and generates a voltage output OUT according to the ReLU function, as shown by graph 5004.

[0181] Figure 51 shows an average current pooling circuit 5100, which may optionally be part of the output circuit 3407 to perform the averaging function. The average current pooling circuit 5100 comprises N current sources 5101-1, ..., 5101-N (each representing current from a bit line in the VMM array), an NMOS transistor 5102, and an NMOS transistor 5103. The NMOS transistor 5102 sums all the currents received from current sources 5101-1, ..., 5101-N. The NMOS transistors 5102 and 5103 are arranged in a current mirror configuration. However, the widths of the NMOS transistors 5102 and 5103 differ by a factor of N, and as a result, the current Iout drawn through the NMOS transistor 5103 is 1 / N of the current drawn by the NMOS transistor 5102, which effectively produces the average of the currents received from all N bit lines.
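The averaging performed by the summing transistor and the 1 / N mirror ratio reduces to a short calculation, sketched below under the assumption of an ideal current mirror. The function name and the example currents are hypothetical.

```python
def average_current_pooling(bit_line_currents):
    """Average of N bit line currents via summation and a 1/N mirror ratio.

    Assumption: ideal current mirror, so Iout = (sum of inputs) / N exactly.
    """
    n = len(bit_line_currents)
    if n == 0:
        raise ValueError("at least one bit line current is required")
    i_sum = sum(bit_line_currents)  # currents summed at the first transistor
    return i_sum / n                # mirrored with a width ratio of 1/N


# Hypothetical bit line currents in amperes.
i_out = average_current_pooling([1e-6, 2e-6, 3e-6])
```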

[0182] Figure 52 shows a maximum voltage pooling circuit 5200, which may optionally be part of the output circuit 3407. The maximum voltage pooling circuit 5200 receives n voltages (VIN1, ..., VINn) and outputs the largest of these voltages as VOUT. This is done by comparing pairs of voltages (VIN1 and VIN2, ..., VINn-1 and VINn), outputting the larger of the two, then comparing the resulting pairs, and continuing this process until only one voltage VOUT remains. The comparison is performed using circuit 5201, which comprises a comparator 5202, an NMOS transistor 5203, an inverter 5204, and an NMOS transistor 5205. Circuit 5201 receives two voltages, such as VIN1 and VIN2, and outputs the larger of the two voltages as OUT. In detail, comparator 5202 outputs a high signal when VIN1 is greater than VIN2, and this high signal turns on NMOS transistor 5203 to pass VIN1 to OUT, and turns off NMOS transistor 5205 via inverter 5204. Similarly, comparator 5202 outputs a low signal when VIN2 is greater than VIN1, and this low signal turns off NMOS transistor 5203, and turns on NMOS transistor 5205 via inverter 5204 to pass VIN2 to OUT.

[0183] Figure 53 shows a minimum voltage pooling circuit 5300, which may optionally be part of the output circuit 3407. The minimum voltage pooling circuit 5300 receives n voltages (VIN1, ..., VINn) and outputs the smallest of these voltages as VOUT. This is done by comparing pairs of voltages (VIN1 and VIN2, ..., VINn-1 and VINn), outputting the smaller of the two, then comparing the resulting pairs, and continuing this process until only one voltage VOUT remains. The comparison is performed using circuit 5301, which comprises a comparator 5302, an NMOS transistor 5303, an inverter 5304, and an NMOS transistor 5305. Circuit 5301 receives two voltages, such as VIN1 and VIN2, and outputs the smaller of the two voltages as OUT.

[0184] In detail, comparator 5302 outputs a high signal when VIN1 is less than VIN2, and this high signal turns on NMOS transistor 5303 to pass VIN1 to OUT, and turns off NMOS transistor 5305 via inverter 5304. Similarly, comparator 5302 outputs a low signal when VIN2 is less than VIN1, and this low signal turns off NMOS transistor 5303, and turns on NMOS transistor 5305 via inverter 5304 to pass VIN2 to OUT.
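The pairwise comparison scheme used by the maximum and minimum voltage pooling circuits can be sketched as a tournament reduction. This is an illustration only: the function names are hypothetical, the comparator is treated as ideal, and the handling of an odd number of inputs (the unpaired voltage passes unchanged to the next round) is an assumption not stated in the text.

```python
def compare_select(v1, v2, comparator):
    """Model of one compare-and-select stage.

    When the comparator output is high, the first pass transistor passes v1;
    otherwise the inverter turns on the second pass transistor and v2 passes.
    """
    return v1 if comparator(v1, v2) else v2


def pool(voltages, comparator):
    """Reduce the inputs pairwise until a single voltage VOUT remains."""
    level = list(voltages)
    if not level:
        raise ValueError("at least one input voltage is required")
    while len(level) > 1:
        nxt = [
            compare_select(level[i], level[i + 1], comparator)
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])  # assumption: unpaired input passes through
        level = nxt
    return level[0]


def max_voltage_pooling(voltages):
    return pool(voltages, lambda a, b: a > b)  # high when VIN1 > VIN2


def min_voltage_pooling(voltages):
    return pool(voltages, lambda a, b: a < b)  # high when VIN1 < VIN2


# Hypothetical input voltages.
vins = [0.3, 0.9, 0.1, 0.5, 0.7]
v_max = max_voltage_pooling(vins)
v_min = min_voltage_pooling(vins)
```

The only difference between the two pooling functions in this sketch is the polarity of the comparison, which mirrors how the two circuits share the same compare-and-select structure.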

[0185] It should be noted that, as used herein, the terms “over” and “on” both encompass “directly” (without intermediate material, element, or gap between them) and “indirectly to” (with intermediate material, element, or gap between them). Similarly, the term “adjacent” includes “directly adjacent” (without intermediate material, element, or gap between them) and “indirectly adjacent” (with intermediate material, element, or gap between them); “attached” includes “directly attached” (without intermediate material, element, or gap between them) and “indirectly attached to” (with intermediate material, element, or gap between them); and “electrically coupled” includes “directly electrically coupled” (without intermediate material or element between them electrically connecting the elements together) and “indirectly electrically coupled to” (with intermediate material or element between them electrically connecting the elements together). For example, forming an element "on top of a substrate" may include forming the element directly on the substrate without any intermediate materials / elements between them, and forming the element indirectly on the substrate with one or more intermediate materials / elements between them.

Claims

1. A system comprising: a vector matrix multiplication array comprising an array of a plurality of non-volatile memory cells arranged in rows and columns; a capacitor having a first terminal and a second terminal, the second terminal being coupled to a common potential; a row decoder for enabling the application of an input signal to the first terminal of the capacitor in response to an address; and a buffer coupled to the first terminal of the capacitor for generating an output voltage for a respective row of the vector matrix multiplication array.

2. The system according to claim 1, wherein the row decoder enables the input signal by closing a switch at the output of the row decoder, the switch in the closed position connecting an input neuron current, which is the input signal, to the first terminal of the capacitor, and wherein the row decoder disables the input signal by opening the switch at the output of the row decoder, the switch in the open position disconnecting the input neuron current from the first terminal of the capacitor.

3. The system according to claim 2, wherein the input neuron current is received from a neural network array.

4. The system according to claim 2, wherein the input neuron current is a current scaled based on the current received from the neural network array.

5. The system according to claim 2, wherein the output voltage is generated with respect to the input neuron current according to a linear function performed by the switch and the capacitor.

6. The system according to claim 1, wherein the plurality of nonvolatile memory cells include stacked gate flash memory cells.

7. The system according to claim 1, wherein the plurality of non-volatile memory cells include split-gate flash memory cells.

8. A system comprising: a vector matrix multiplication array comprising an array of a plurality of non-volatile memory cells arranged in rows and columns; switches for switchably connecting respective inputs to respective rows of the vector matrix multiplication array; and a row decoder for activating the switches in response to an address to couple the respective inputs to the respective rows of the vector matrix multiplication array.

9. The system according to claim 8, wherein each of the inputs is received from a neural network array.

10. The system according to claim 8, wherein each of the inputs is applied to the control gate terminals of the non-volatile memory cells in the respective row.

11. The system according to claim 8, wherein the plurality of nonvolatile memory cells include stacked gate flash memory cells.

12. The system according to claim 8, wherein the plurality of non-volatile memory cells include split-gate flash memory cells.

13. A method comprising: enabling, by a row decoder, the application of an input signal to a capacitor in response to an address; generating, by a buffer, an output voltage using the voltage stored on the capacitor by the applied input signal; and providing the output voltage to a row of non-volatile memory cells in a vector matrix multiplication array.

14. The method according to claim 13, wherein the input signal is received from a neural network array.

15. The method according to claim 13, wherein the output voltage is generated according to a linear function performed on the input signal.

16. The method according to claim 13, wherein the non-volatile memory cell includes a stacked gate flash memory cell.

17. The method according to claim 13, wherein the non-volatile memory cell includes a split-gate flash memory cell.