Bayesian network in memory
By implementing a Bayesian neural network in memory using memory cells and a random pulse generator, the disclosure addresses the inefficiency of conventional schemes in generating Gaussian random variables and computing high-dimensional integrals, enabling efficient and parallel computation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- MICRON TECHNOLOGY INC
- Filing Date
- 2021-08-25
- Publication Date
- 2026-06-30
Abstract
Description
Technical Field
[0001] This disclosure relates generally to memory, and more specifically to apparatus and methods associated with implementing a Bayesian neural network in memory.
Background Technology
[0002] Memory devices are typically provided as internal semiconductor integrated circuits in computers or other electronic devices. Many different types of memory exist, including volatile memory and non-volatile memory. Volatile memory may require power to maintain its data and includes random access memory (RAM), dynamic random access memory (DRAM), and synchronous dynamic random access memory (SDRAM). Non-volatile memory provides permanent data by retaining the stored data when no power is supplied and includes NAND flash memory, NOR flash memory, read-only memory (ROM), electrically erasable programmable ROM (EEPROM), erasable programmable ROM (EPROM), and resistive variable memory, such as phase-change random access memory (PCRAM), resistive random access memory (RRAM), and magnetoresistive random access memory (MRAM).
[0003] Memory is also used as volatile and non-volatile data storage in a wide range of electronic applications, including (but not limited to) personal computers, memory sticks, digital cameras, cellular phones, portable music players (e.g., MP3 players), movie players, and other electronic devices. Memory cells can be arranged in arrays, and the arrays are used in memory devices.
Summary of the Invention
[0004] One aspect of this application relates to an apparatus for implementing a Bayesian neural network in memory, comprising: a memory array; a controller coupled to the memory array and configured to: read data from a first plurality of memory cells of the memory array; generate a first plurality of weight values and a first plurality of bias values using a second plurality of memory cells of the memory array based on a plurality of deterministic values read from the first plurality of memory cells; generate a second plurality of weight values and a second plurality of bias values using a third plurality of memory cells of the memory array based on the first plurality of weight values and the second plurality of bias values; and transmit output data from a fourth plurality of memory cells of the memory array, the output data including results based at least on: an input provided by a host, the second plurality of weight values, and the second plurality of bias values of the Bayesian neural network.
[0005] Another aspect of this application relates to a method for implementing a Bayesian neural network in memory, comprising: providing first data including a training set and first partial derivatives of a first loss function with multiple parameters to a first storage bank of a memory device, wherein the first loss function corresponds to a layer of the Bayesian neural network; in response to receiving the data including the first partial derivatives, generating second data using the first storage bank including second partial derivatives of a second loss function with multiple weights and the multiple parameters; generating third data using a second storage bank including third partial derivatives of the second loss function, wherein the third data is generated using the second data; generating fourth data using the second storage bank including fourth partial derivatives of the second loss function, wherein the fourth data is generated using the second data; and writing updated weight values of the layer of the Bayesian neural network to another portion of the memory device, at least in part based on the third data including the third partial derivatives and the fourth data including the fourth partial derivatives.
[0006] Another aspect of this application relates to a system for implementing a Bayesian neural network in memory, comprising: a memory device; a plurality of registers; and a controller coupled to the memory device and the plurality of registers and configured to: access first data representing a first partial derivative of a loss function of the Bayesian neural network from the plurality of registers; access second data representing a second partial derivative of the loss function from the plurality of registers; access third data representing a third partial derivative of the loss function from the plurality of registers; and update the resistance of the plurality of memory cells using the first data, the second data, and the third data to store a plurality of updated deterministic values.
Attached Figure Description
[0007] Figure 1 is a block diagram of an apparatus in the form of a computing system including a memory device, according to several embodiments of the present disclosure.
[0008] Figure 2 illustrates an example Bayesian neural network according to several embodiments of the present disclosure.
[0009] Figure 3 illustrates an example memory array according to several embodiments of the present disclosure.
[0010] Figure 4A illustrates a first portion of an example flow for performing forward propagation according to several embodiments of the present disclosure.
[0011] Figure 4B illustrates a second portion of an example flow for performing forward propagation according to several embodiments of the present disclosure.
[0012] Figure 5 illustrates an example flow for performing backpropagation according to several embodiments of the present disclosure.
[0013] Figure 6 illustrates an example flow for updating weights according to several embodiments of the present disclosure.
[0014] Figure 7 illustrates an example flow diagram of a method for implementing a Bayesian neural network in memory according to several embodiments of the present disclosure.
[0015] Figure 8 illustrates an example machine in the form of a computer system within which a set of instructions can be executed to cause the machine to perform the various methods discussed herein.
Detailed Implementation
[0016] This disclosure includes apparatus and methods relating to implementing a Bayesian neural network in memory. The Bayesian neural network can be implemented (e.g., trained and used for inference) in memory using a memory array.
[0017] As used herein, a probabilistic neural network is a feedforward neural network architecture that estimates uncertainty. A Bayesian neural network is an instance of a probabilistic neural network: it is a neural network architecture with posterior inference. A Bayesian neural network can also be a stochastic neural network. As used herein, a stochastic neural network is a neural network architecture that utilizes random variations (e.g., stochastic transfer functions, stochastic weights, and/or stochastic biases).
[0018] The examples described herein describe neural networks that implement random weights and biases and provide posterior inference. These examples can be implemented using Bayesian neural networks.
[0019] Mission-critical systems (such as medical or automotive systems) utilize uncertainty estimation in neural network models. Uncertainty estimates can be used to assess the level of confidence in predictions generated by neural network models. For example, in healthcare, reliable uncertainty estimates help prevent overconfident decisions regarding rare or novel patient conditions. In autonomous agents actively exploring their environment, uncertainty estimates can be used to identify which data points are most informative. Bayesian neural networks can also be used to identify actions to be taken regarding vehicle steering. For example, a Bayesian neural network can receive an image of an intersection. The image can be provided as an input vector to the Bayesian neural network. The Bayesian neural network can then generate a position of the steering wheel and a certainty associated with that position.
[0020] However, traditional implementations of Bayesian neural networks may not be efficient at providing true Gaussian random variables. Traditional implementations of Bayesian neural networks may also be inefficient at performing calculations of high-dimensional integrals.
[0021] Aspects of this disclosure address the above and other deficiencies. For example, compared to conventional implementations of Bayesian neural networks, several embodiments employ the hardware of a memory device to create Gaussian random variables efficiently. Several embodiments implement Bayesian neural networks in a memory device so that the hardware of the memory device can be leveraged to evaluate high-dimensional integrals efficiently.
[0022] Bayesian neural networks can be implemented in a memory device using several memory cells and one or more random pulse generators. Using memory cells concurrently to implement a neural network can increase parallelism.
[0023] The figures herein follow a numbering convention in which the first digit or digits correspond to the figure number and the remaining digits identify an element or component in the figure. Similar elements or components in different figures may be identified by the use of similar digits. For example, 110 may reference element "10" in Figure 1, and a similar element may be referenced as 310 in Figure 3. Similar elements within a figure may be referenced with a hyphen followed by an additional number or letter. See, for example, elements 442-1, 442-2, 442-3, and 442-4 in Figures 4A and 4B. It will be understood that elements illustrated in the various embodiments herein may be added, interchanged, and/or eliminated to provide several additional embodiments of this disclosure. Furthermore, it should be understood that the scale and relative dimensions of the elements provided in the figures are intended to illustrate certain embodiments of this disclosure and should not be considered limiting.
[0024] Figure 1 is a block diagram of an apparatus in the form of a computing system 100 including a memory device 103, according to several embodiments of the present disclosure. As used herein, the memory device 103, the memory array 110, and/or the host 102, for example, may also each be separately considered an “apparatus”.
[0025] In this example, computing system 100 includes a host 102 coupled to memory device 103 via interface 104. Computing system 100 can be a personal laptop computer, a desktop computer, a digital camera, a mobile phone, a memory card reader, or an Internet of Things (IoT) enabled device, among various other types of systems. Host 102 may include several processing resources (e.g., one or more processors, microprocessors, or other types of control circuitry) capable of accessing memory device 103. Computing system 100 may include separate integrated circuits, or host 102 and memory device 103 may be on the same integrated circuit. For example, host 102 may be a system controller of a memory system including multiple memory devices 103, wherein the system controller provides access to the respective memory devices 103 by another processing resource, such as a central processing unit (CPU).
[0026] For clarity, computing system 100 has been simplified to focus on features particularly relevant to this disclosure. Memory array 110 may be, for example, a DRAM array, SRAM array, STT RAM array, PCRAM array, TRAM array, RRAM array, NAND flash array, NOR flash array, and/or 3D cross-point array. Array 110 may include memory cells arranged in rows coupled by access lines (which may be referred to herein as word lines or select lines) and columns coupled by sensing lines (which may be referred to herein as digit lines or data lines). Although memory array 110 is shown as a single memory array, it may represent multiple memory arrays arranged in banks of memory device 103.
[0027] Memory device 103 includes address circuitry 106 to latch address signals provided via interface 104. For example, the interface may include a physical interface employing a suitable protocol (e.g., a data bus, an address bus, and a command bus, or a combined data/address/command bus). This protocol may be custom or proprietary, or interface 104 may employ a standardized protocol, such as Peripheral Component Interconnect Express (PCIe), Gen-Z interconnect, Cache Coherent Interconnect for Accelerators (CCIX), or the like. Address signals are received and decoded by row decoder 108 and column decoder 112 to access memory array 110. Data can be read from memory array 110 by sensing voltage and/or current changes on the sensing lines using sensing circuitry 111. Sensing circuitry 111 may be coupled to memory array 110. Each memory array and its corresponding sensing circuitry may constitute a bank of memory device 103. For example, sensing circuitry 111 may include sense amplifiers capable of reading and latching a page (e.g., a row) of data from memory array 110. I/O circuitry 107 can be used for bidirectional data communication with host 102 via interface 104. Read/write circuitry 113 is used to write data to or read data from memory array 110. For example, circuitry 113 may include various drivers, latching circuitry, etc.
[0028] Control circuitry 105 decodes signals provided by host 102. These signals may be commands provided by host 102. These signals may include chip enable signals, write enable signals, and address latch signals for controlling operations performed on memory array 110 (including data read operations, data write operations, and data erase operations). In various embodiments, control circuitry 105 is responsible for executing instructions from host 102. Control circuitry 105 may include state machines, sequencers, and/or other types of control circuitry, which may be implemented in hardware, firmware, or software, or any combination thereof. In some instances, host 102 may be a controller external to memory device 103. For example, host 102 may be a memory controller coupled to the processing resources of a computing device. Data may be provided to and/or from memory array 110 via data lines that couple memory array 110 to I/O circuitry 107.
[0029] In various examples, memory array 110 may be a resistive memory array. A resistive memory array may be a resistance-programmable device. That is, memory array 110 can be programmed by modifying the resistance of the memory cells that make up memory array 110. Memory cells can be programmed to have a specific resistance (or conductance). The terms resistance and conductance are used interchangeably herein with regard to programming memory cells and/or representing values with memory cells, because any change in resistance is accompanied by a corresponding change in conductance (conductance being the reciprocal of resistance). The resistance of a memory cell can represent a value that can be used to perform operations. For example, the resistance of a memory cell can be used to perform multiplication and other types of operations.
[0030] In various instances, the resistance of the memory cells can be programmed to represent the weight values and bias values of a neural network. The ability to program the resistance of the memory cells facilitates performing forward propagation, backward propagation, and weight updates using a limited number of memory banks of the memory array 110. The weights and biases can be selected at random using the pulse generator 114, further described with respect to Figure 3. The results of operations performed at the layers of a Bayesian neural network can be converted into voltage signals using the analog-to-digital converter (ADC) 115, also further described with respect to Figure 3. Although the pulse generator 114 and ADC 115 are described as being directly coupled to the memory array 110, in some embodiments, the pulse generator 114 and/or ADC 115 may be coupled to the memory array 110 via the sensing circuitry 111, the row decoder 108, or the column decoder 112.
[0031] Figure 2 illustrates an example Bayesian neural network 220 according to several embodiments of the present disclosure. The Bayesian neural network 220 may include an input layer 221, a hidden layer 222-1, a hidden layer 222-2, and an output layer 223. Hidden layers 222-1 and 222-2 are referred to collectively as hidden layers 222.
[0032] Input layer 221 can receive multiple input values and provide them to hidden layer 222-1. Hidden layers 222 can perform multiple calculations on the input values and intermediate values using weights 224 (e.g., W1, W2) and biases 225 (e.g., b1, b2).
[0033] Each of the hidden layers 222 may include a different set of weights that can be used to perform the computations of the Bayesian neural network 220. For example, weight W1 represents multiple weight values. The weight values corresponding to weight W1 may include, for example, weight values $W_{11}, W_{12}, \ldots, W_{1N}$. Bias b1 represents multiple bias values. The bias values corresponding to bias b1 may include, for example, bias values $b_{11}, b_{12}, \ldots, b_{1N}$. Figure 2 shows the Bayesian neural network 220 as including two hidden layers 222. However, a Bayesian neural network may include more than two hidden layers 222. The examples described herein are illustrative and can be extended to cover different implementations of Bayesian neural networks.
[0034] The Bayesian neural network 220 shown in Figure 2 is referred to as a fully connected Bayesian neural network 220 because each node in each layer is connected to each node in the neighboring layers. For example, the input layer 221 can be described as including multiple input nodes, and the hidden layer 222-1 is shown as including multiple hidden nodes. Each input node can be connected to each hidden node in the hidden layer 222-1. Similarly, each node in the hidden layer 222-1 is connected to each node in the hidden layer 222-2.
[0035] Each connection between nodes can be assigned a corresponding weight value and bias value. The output of each node in hidden layer 222-1 can be provided to each node in hidden layer 222-2. Several operations can be performed using the outputs of the nodes in hidden layer 222-1, the corresponding weight values from weights 224, and the corresponding bias values from biases 225. The nodes of hidden layer 222-2 can produce outputs, which can be provided as inputs to output layer 223. The output of output layer 223 can be the output of Bayesian neural network 220.
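The fully connected structure described above can be sketched in a few lines of Python. This is an illustrative sketch only: the layer sizes, the numeric values, the helper name `dense`, and the tanh activation are assumptions, since the text does not name a transfer function.

```python
import math

def dense(x, W, b):
    """One fully connected layer: out[j] = tanh(sum_i W[j][i] * x[i] + b[j])."""
    return [math.tanh(sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j)
            for row, b_j in zip(W, b)]

# Hypothetical sizes and values: 2 inputs, two hidden layers of 2 nodes each.
x = [1.0, -1.0]
W1, b1 = [[0.5, 0.5], [1.0, 0.0]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0], [0.0, 2.0]], [0.0, 0.0]

h1 = dense(x, W1, b1)   # outputs of the nodes in hidden layer 222-1
h2 = dense(h1, W2, b2)  # outputs of hidden layer 222-2, passed on to the output layer
```

Every node in `h1` feeds every node in `h2`, which is what makes the network fully connected.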
[0036] In various instances, weights 224 and biases 225 are random variables. Each of weights 224 and each of biases 225 is assigned a Gaussian distribution with a mean and a standard deviation. For example, the weights 224 of a layer i have a Gaussian distribution with a mean and a standard deviation, and the biases 225 of layer i have a Gaussian distribution with a mean and a standard deviation, where i represents a layer of the Bayesian neural network, for example, hidden layer 222-1. Although the weights 224 are described as having a Gaussian distribution with a mean and a standard deviation, each of the weights $W_{11}, W_{12}, \ldots, W_{1N}$ and the weights $W_{21}, W_{22}, \ldots, W_{2N}$ can have a separate Gaussian distribution with its own mean and standard deviation. Furthermore, each of the biases $b_{11}, b_{12}, \ldots, b_{1N}$ and the biases $b_{21}, b_{22}, \ldots, b_{2N}$ can have a separate Gaussian distribution with its own mean and standard deviation.
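As a software analogy for weights that are random variables, the sketch below draws one value per weight from that weight's own Gaussian. The means, standard deviations, and the helper name `sample_layer` are hypothetical; in the disclosure the sampling is performed by hardware, not by a software random number generator.

```python
import random

def sample_layer(means, stds, rng):
    """Draw one value per weight from its own Gaussian N(mean, std^2)."""
    return [rng.gauss(m, s) for m, s in zip(means, stds)]

rng = random.Random(0)        # seeded only to make the sketch repeatable
mu_w = [0.1, -0.3, 0.7]       # hypothetical per-weight means
sigma_w = [0.05, 0.05, 0.0]   # a zero standard deviation yields the mean itself
w_sample = sample_layer(mu_w, sigma_w, rng)
```

Each call to `sample_layer` produces a different weight vector, so repeated forward passes give a spread of outputs rather than a single value.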
[0037] Each of the input layer 221, the hidden layer 222, and the output layer 223 can be implemented in one or more memory banks of the memory array. Furthermore, one or more memory banks of the memory array can be used to implement multiple operations performed in layers 221, 222, and 223.
[0038] The output of the Bayesian neural network 220 can be represented as a conditional distribution $P(y \mid X_{new}, w, D)$. In this conditional distribution, w represents multiple weights, D represents the training data, $X_{new}$ is the input to the Bayesian neural network 220, and y is the result generated by the Bayesian neural network 220. $P(y \mid X_{new}, w, D)$ can be described as the estimated uncertainty of the result y given the input $X_{new}$, the weights w, and the training data D.
[0039] This conditional distribution can be expressed as an integral over the weights that involves the posterior P(w|D). However, due to computational limitations, calculating P(w|D) may be infeasible. P(w|D) can be approximated by q(w|θ), where θ represents the parameters {μ,ρ}, such that θ={μ,ρ}.
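One common way to use such an approximation, sketched here under stated assumptions, is to replace the integral over the weights with an average over samples drawn from q(w|θ). The toy one-weight model y = w·x, the softplus mapping σ = log(1 + exp(ρ)), and the names `softplus` and `predict` are assumptions borrowed from standard variational practice; the text above only states that θ = {μ, ρ}.

```python
import math
import random
import statistics

def softplus(rho):
    """Map an unconstrained rho to a positive standard deviation (assumed mapping)."""
    return math.log1p(math.exp(rho))

def predict(x_new, mu, rho, n_samples, rng):
    """Monte Carlo estimate of the predictive mean and spread for the toy model y = w * x."""
    sigma = softplus(rho)
    ys = [rng.gauss(mu, sigma) * x_new for _ in range(n_samples)]
    return statistics.fmean(ys), statistics.stdev(ys)

rng = random.Random(0)
mean_y, std_y = predict(x_new=2.0, mu=1.5, rho=-2.0, n_samples=2000, rng=rng)
```

The returned spread plays the role of the uncertainty estimate: a wider spread means the sampled networks disagree more about the result for this input.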
[0040] The parameter θ can be chosen to minimize F(D,θ), which is called the cost function. The cost function F(D,θ) can also be called the KL divergence. KL divergence is a measure of how much a probability distribution differs from a second probability distribution. The cost function F(D,θ) provides a measure of how much a distribution P(w|D) differs from a distribution q(w|θ), such that F(D,θ) provides the distance between P(w|D) and q(w|θ).
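To make the notion of KL divergence concrete, the sketch below evaluates the standard closed-form expression for two univariate Gaussians. This formula is a textbook result and is not taken from the disclosure; the function name `kl_gauss` is hypothetical.

```python
import math

def kl_gauss(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) for univariate Gaussians q = N(mu_q, sigma_q^2), p = N(mu_p, sigma_p^2)."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
            - 0.5)

same = kl_gauss(0.0, 1.0, 0.0, 1.0)      # identical distributions
shifted = kl_gauss(1.0, 1.0, 0.0, 1.0)   # means differ by one standard deviation
```

The divergence is zero only when the two distributions coincide, and it is not symmetric: swapping q and p generally changes the value.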
[0041] The cost function F(D,θ) can be defined as $F(D,\theta) \approx \sum_{i \in \text{layers}} [\log q(w_i \mid \theta) - \log P(w_i) - \log P(D \mid w_i)]$, where $q(w_i \mid \theta)$ is referred to as the variational posterior, $P(w_i)$ is referred to as the prior, and $w_i$ denotes the i-th Monte Carlo sample drawn from the variational posterior $q(w_i \mid \theta)$. $w_i$ can be selected based on a random number. $P(D \mid w_i)$ is the likelihood, a measure of how well the Bayesian neural network 220 with weights w fits the training data D. The variational posterior is defined as a Gaussian distribution. The prior is defined as a Gaussian mixture model that depends on the weights w, where σ1 and σ2 are constants representing standard deviations.
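The bracketed term of the cost function can be evaluated numerically as sketched below. Several pieces are assumptions made for illustration: the toy model y = w·x, the Gaussian likelihood with unit noise, the zero-mean two-component mixture with mixing weight `pi`, and the use of an average (rather than a sum) over samples drawn from q.

```python
import math
import random

def log_normal(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)

def log_mixture_prior(w, pi, sigma1, sigma2):
    """Log of a two-component, zero-mean Gaussian mixture prior (assumed form)."""
    p1 = pi * math.exp(log_normal(w, 0.0, sigma1))
    p2 = (1.0 - pi) * math.exp(log_normal(w, 0.0, sigma2))
    return math.log(p1 + p2)

def cost_term(w, mu, sigma, data, noise_std=1.0, pi=0.5, sigma1=1.0, sigma2=0.1):
    """log q(w|theta) - log P(w) - log P(D|w) for the toy model y = w * x."""
    log_q = log_normal(w, mu, sigma)
    log_prior = log_mixture_prior(w, pi, sigma1, sigma2)
    log_lik = sum(log_normal(y, w * x, noise_std) for x, y in data)
    return log_q - log_prior - log_lik

def estimate_cost(mu, sigma, data, n_samples, rng):
    """Average the cost term over Monte Carlo samples w_i drawn from q(w|theta)."""
    total = 0.0
    for _ in range(n_samples):
        total += cost_term(rng.gauss(mu, sigma), mu, sigma, data)
    return total / n_samples

data = [(1.0, 2.0), (2.0, 4.0)]   # hypothetical training pairs (x, y) generated by w = 2
rng = random.Random(0)
f_good = estimate_cost(2.0, 0.5, data, 500, rng)   # posterior centred on the true weight
f_bad = estimate_cost(0.0, 0.5, data, 500, rng)    # posterior centred far from it
```

With these values the estimate for the well-centred posterior comes out lower, which is the direction in which minimizing the cost pushes θ.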
[0042] Figure 3 illustrates an example memory array 310 according to several embodiments of the present disclosure. The memory array 310 includes a plurality of memory cells 333. The memory cells 333 are coupled to sensing lines 335 and access lines 336.
[0043] Memory cells 333 may be resistive memory cells. A resistive memory cell 333 may include terminals that couple the memory cell 333 to a sensing line 335 and an access line 336. The terminals of the memory cell 333 may be coupled to each other via a resistive element 334. The resistive element 334 may be a variable-resistance material (e.g., a material programmable to multiple different resistance states, which may represent multiple different data states), such as a transition metal oxide material or a perovskite containing two or more metals (e.g., transition metals, alkaline earth metals, and/or rare earth metals). Other examples of variable-resistance materials that may be included in the storage element of the resistive memory cells 333 include various materials that use trapped charge to modify or change conductivity, chalcogenides formed from various doped or undoped materials, binary metal oxide materials, giant magnetoresistive materials, and/or various polymer-based variable-resistance materials, among others. Embodiments are not limited to specific variable-resistance materials. In various examples, the conductance of a memory cell 333 can be programmed by programming its resistive element 334. For example, the control circuitry of the memory device can program the resistive element 334. Actions performed by the memory device, the memory array 310, the memory cells 333, the pulse generators (e.g., deterministic pulse generators 331-1, 331-2 and random pulse generator 332), and/or the analog-to-digital converters 315-1, 315-2 can be described as being performed by, or caused by, the control circuitry of the memory device.
[0044] Electrical conductance can represent the weight values of a Bayesian neural network. For example, the conductance of memory cell 333 can represent the weight values of a layer in a Bayesian neural network. As used herein, the terms weight and weight value are used interchangeably.
[0045] Memory cells 333 can be used to perform operations. The memory cells 333 can be controlled to perform matrix multiplication in parallel and locally to the memory device hosting the memory array 310. Matrix multiplication can be performed using an input and multiple weight values. The input can be provided as an input vector. The weight values, represented by the conductances of the memory cells 333, can be provided as a weight matrix. The input is represented in Figure 3 as $V_{in}$. $V_{in}$ may include the input vector x. Each of the inputs (e.g., $x_0 \ldots x_n$) can be provided to the memory array 310 via a signal line, such as a sensing line 335 or an access line 336. Figure 3 shows the input as being provided via the access lines 336 and/or the sensing lines 335. Each of the access lines 336 can provide a portion of the input (one of the input values). For example, the first access line can provide the input value $x_0$, ..., and the last access line can provide the input value $x_n$, where n is equal to or less than the number of access lines 336. The input is provided by the pulse generators 331 and the pulse generator 332.
[0046] The input can be multiplied by a weight matrix, which includes the weight values stored in memory array 310 and represented by the conductances of the memory cells 333. The weight matrix is represented as W. Each of the memory cells 333 can store a different weight value, represented by its conductance.
[0047] The output of the matrix multiplication can be provided as an output vector h. Each of the outputs (e.g., $h_0 \ldots h_n$) can be provided via a different one of the signal lines (e.g., the sensing lines 335 or the access lines 336). The matrix multiplication is represented as h = Wx. In various instances, multiple instances of matrix multiplication can be performed in memory array 310. A single instance of matrix multiplication can also be performed in memory array 310.
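The in-array multiplication can be pictured as each cell contributing a current equal to its conductance times the applied voltage, with the currents on a shared line adding up. The sketch below models that behaviour in software; the function name `crossbar_matvec`, the array size, and the unitless values are hypothetical.

```python
def crossbar_matvec(conductances, voltages):
    """Return one output current per sensing line: h[j] = sum_i G[i][j] * V[i].

    conductances[i][j] is the cell joining access line i to sensing line j.
    """
    n_sense = len(conductances[0])
    return [sum(conductances[i][j] * voltages[i] for i in range(len(voltages)))
            for j in range(n_sense)]

# Hypothetical 2x3 array (2 access lines, 3 sensing lines), in arbitrary units.
G = [[1.0, 0.0, 2.0],
     [0.5, 1.0, 0.0]]
V_in = [2.0, 4.0]
h = crossbar_matvec(G, V_in)
```

In this layout the weight matrix W of h = Wx corresponds to the transpose of `G`, because each output is collected along one column of cells. All columns are evaluated at once in the physical array, which is where the parallelism comes from.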
[0048] Therefore, memory array 310 and / or the storage bank of memory array 310 can be described as processing data. As used herein, processing includes using memory (e.g., memory array and / or the storage bank of memory) to generate an output in response to the receipt of an input. The output can be generated using the resistance of the memory cell of the memory and the input to the memory.
[0049] The input may be provided by pulse generators 331-1, 331-2, 332 (e.g., voltage pulse generators). Pulse generators 331-1, 331-2, 332 may include hardware to generate voltage pulses. In various instances, pulse generators 331-1, 331-2, 332 may receive one or more voltage inputs and may generate multiple voltage pulses. In some instances, pulse generator 332 may be a random number generator. Pulse generator 332 may implement a dropout scheme, which may be used in conjunction with random number generation for sampling. Pulse generators 331-1, 331-2 may be deterministic pulse generators.
[0050] Outputs can be provided via sensing line 335 or access line 336. The outputs can be interpreted as current signals. The outputs can be provided to analog-to-digital converters (ADCs) 315-1 and 315-2. ADCs 315-1 and 315-2 can receive current and output voltage. ADC 315-1 measures the current supplied by access line 336. ADC 315-2 measures the current supplied by sensing line 335. The outputs of ADCs 315-1 and 315-2 can be voltage signals, which can be stored in the registers of the memory device, or can be directly provided to a voltage pulse generator coupled to a different memory array or the same memory array, awaiting reprogramming of memory array 310.
[0051] For example, memory array 310 can be used to generate an output that can be converted into a voltage signal by ADCs 315-1 and 315-2. The voltage signal can be stored in a register of the memory device. Memory array 310 can then be reprogrammed by resetting the conductance of memory cell 333. Resetting the conductance of memory cell 333 can reprogram memory array 310 to be used as different layers of a Bayesian neural network. The output stored in the register can be provided as input to pulse generators 331-1, 331-2, and 332, which can provide input to memory array 310.
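The reuse of a single array for successive layers can be sketched as a loop that reprograms the array, applies the stored value, and writes the result back to a register. The class `Array`, the function `run_layers`, and the weight values are hypothetical stand-ins; the sketch leaves out the ADC conversion and any activation function.

```python
class Array:
    """Toy stand-in for a reprogrammable memory array (hypothetical helper)."""

    def __init__(self):
        self.G = None

    def program(self, conductances):
        # Reset the cell conductances so the array acts as a different layer.
        self.G = [row[:] for row in conductances]

    def apply(self, v_in):
        # One output per row of stored weights: out[j] = sum_i G[j][i] * v_in[i].
        return [sum(g * v for g, v in zip(row, v_in)) for row in self.G]

def run_layers(array, layer_weights, x):
    register = list(x)                     # register holds the value fed to the next layer
    for W in layer_weights:
        array.program(W)                   # reprogram the same array as the next layer
        register = array.apply(register)   # the output is stored back in the register
    return register

out = run_layers(Array(), [[[1.0, 1.0], [1.0, -1.0]], [[2.0, 0.0]]], [3.0, 1.0])
```

Only one array is needed because the intermediate result survives in the register while the conductances are rewritten.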
[0052] Figure 3 illustrates the operation of memory array 310 to implement a layer of a Bayesian neural network. Multiple layers of the Bayesian neural network can be implemented to forward propagate through the Bayesian neural network, which can produce an inference. Forward propagation can also be performed in preparation for backward propagation, as described below.
[0053] The resistive elements 334 can be programmed by providing inputs via both the sensing lines 335 and the access lines 336. An operation can be performed by providing inputs via one of the sensing lines 335 or the access lines 336. Forward propagation can be performed by providing inputs via one of the sensing lines 335 or the access lines 336, and backward propagation can be performed by providing inputs via the other. For example, with respect to Figure 3, the input (vector x) can be provided via the access lines 336 to the memory cells 333 storing the weight matrix W. The vector x can be the output of deterministic pulse generator 331-1. Deterministic pulse generator 331-1 can be operated to produce a known, defined output. Providing the input to the memory cells 333 effectively multiplies the vector x by the weight matrix W and results in the generation of an output (vector h). In various instances, the vector x can be the output of random pulse generator 332. The vector x can include random variables generated by random pulse generator 332. The weight matrix W can be stored in the memory cells 333 using both deterministic pulse generator 331-1 and deterministic pulse generator 331-2. Deterministic pulse generators 331-1 and 331-2 can provide inputs concurrently to store the weight matrix W in the memory cells 333. Deterministic pulse generator 331-1 can provide inputs via the access lines 336, while deterministic pulse generator 331-2 provides inputs via the sensing lines 335.
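Driving the same cells from the opposite set of lines amounts, in matrix terms, to multiplying by the transpose of the stored matrix, which is the operation backward propagation needs. The sketch below illustrates this reading; the function names and values are hypothetical, and the mapping of directions to specific lines is an assumption consistent with the example above.

```python
def drive_access_lines(W, x):
    """Forward direction: inputs on access lines, outputs on sensing lines (h = W x)."""
    return [sum(w * x_i for w, x_i in zip(row, x)) for row in W]

def drive_sensing_lines(W, delta):
    """Backward direction: inputs on sensing lines, outputs on access lines (W^T delta)."""
    return [sum(W[j][i] * delta[j] for j in range(len(W))) for i in range(len(W[0]))]

W = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]   # hypothetical 3x2 weight matrix stored as conductances

h = drive_access_lines(W, [1.0, 1.0])          # forward pass
g = drive_sensing_lines(W, [1.0, 0.0, -1.0])   # backward pass through the same cells
```

No second copy of the weights is required: both directions read the same stored conductances.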
[0054] Figure 4A and 4B This section describes the first and second portions of an example flow for performing forward propagation according to several embodiments of this disclosure. Due to the size of the figures, Figure 4 is divided into two pages, as shown below. Figure 4A and Figure 4B Forward propagation can be performed using multiple memory banks of the memory array. For example, forward propagation is shown to be performed using memory banks 442-1, 442-2, 442-3, and 442-4. Figure 4 illustrates the forward propagation of a layer in a Bayesian neural network.
[0055] Forward propagation may include generating a display using storage units 442-1, 442-2, 442-3 and random pulse generators 432-1, 432-2. Multiple weight values. The memory 442-4 can use these multiple weight values to generate the output of the layers of the Bayesian neural network.
[0056] Memory 442-1 can store parameters for a given layer of a Bayesian neural network. The parameter θ can be provided to the random pulse generator 432-1. The random pulse generator 432-1 can utilize the parameter θ to generate q(w) i |θ) generates multiple samples. From q(w) i The multiple weight values sampled by |θ) can be stored using the memory cells of storage bank 442-2, so that the weight values The weight value is stored in the memory cell of memory bank 442-2. The random pulse generator 432-2 can be supplied from the memory unit of the random pulse generator 432-1. The random pulse generator 432-2 can utilize weight values. Sampled from P(w). The sampled weighted values are provided by the random pulse generator 432-2. It can be stored in the memory cells of memory bank 442-3. Memory banks 442-1, 442-2, and 442-3 can be updated in each training period, where the training period is defined as a forward and backward loop.
[0057] The weight values sampled from q(w_i|θ), the weight values sampled from P(w), and the input X can be stored in register 441 of the memory device that hosts memory banks 442-1, 442-2, 442-3, and 442-4. Memory bank 442-4 can receive the input X provided by pulse generator 431. In various instances, the input X can be provided by the host along with instructions to generate an inference using the Bayesian neural network.
[0058] The weight values w_i retrieved from register 441 can be programmed into memory bank 442-4. The input X and the weight values w_i can be used to generate the output of a layer of the Bayesian neural network. The output of a hidden layer of the Bayesian neural network can be represented as h, while the output of the Bayesian neural network can be represented as y. The pulse generator 431 can be a deterministic pulse generator or a random pulse generator.
[0059] The control circuitry 405 can implement the forward propagation of a layer of the Bayesian neural network using memory banks 442-1, 442-2, 442-3, and 442-4. The control circuitry 405 can program the memory cells of memory banks 442-1, 442-2, 442-3, and 442-4 and control pulse generators 432-1, 432-2, and 431 to provide input to memory banks 442-1, 442-2, 442-3, and 442-4.
[0060] Figure 5 illustrates an example flow for performing backpropagation according to several embodiments of the present disclosure. Control circuitry 505 may be configured to perform backpropagation using memory banks 542-1, 542-2 and registers 541-1, 541-2.
[0061] Backpropagation in a Bayesian neural network involves calculating multiple partial derivatives. For example, the partial derivative of the loss function with respect to the parameters θ can be provided to memory bank 542-2 of the memory array via a deterministic pulse generator. This partial derivative can be provided as a vector of values, such that each value of the vector is provided via one of the access lines to memory bank 542-2. The input to memory bank 542-2 causes the generation of an output that is the partial derivative of F(w,θ) with respect to w (i.e., ∂F(w,θ)/∂w). The partial derivative ∂F(w,θ)/∂w can be stored in register 541-2.
[0062] The controller can provide ∂F(w,θ)/∂w to memory bank 542-1 (for example, from register 541-2). Memory bank 542-1 can be programmed to store the parameters θ. Using the parameters θ stored in memory bank 542-1, this input causes memory bank 542-1 to output the partial derivative of F(w,θ) with respect to the mean (∂F(w,θ)/∂μ) and the partial derivative of F(w,θ) with respect to the standard deviation (∂F(w,θ)/∂ρ). In various examples, ∂F(w,θ)/∂w can be provided to memory bank 542-1 a first time to generate one of these partial derivatives in a first number of operations, and a second time to generate the other. In other instances, ∂F(w,θ)/∂μ and ∂F(w,θ)/∂ρ can be generated simultaneously using the same number of operations.
[0063] The partial derivative of the loss function with respect to the parameters θ can be generated by memory banks 542-1, 542-2 or by memory banks other than those shown here. In some instances, it can also be generated by control circuitry 505 or by the host. In various instances, control circuitry 505 can cause the deterministic pulse generator to provide this partial derivative to memory bank 542-2.
[0064] The values of ∂F(w,θ)/∂μ and ∂F(w,θ)/∂ρ can be stored in register 541-1. Although registers 541-1 and 541-2 are shown as different registers, they can be the same plurality of registers or different pluralities of registers.
[0065] In various instances, the amount of memory used in the backpropagation of a layer in a Bayesian neural network can be less than the amount of memory used in the forward propagation of a layer in a Bayesian neural network. For example, the amount of memory used to generate partial derivatives stored in register 541-1 for a given layer of a Bayesian neural network can be less than the amount of memory used to generate inference for a layer of a Bayesian neural network.
[0066] Figure 6 illustrates an example flow for updating weights according to several embodiments of this disclosure. The amount of memory used to update the weights of a layer of a Bayesian neural network may be less than the amount of memory used to perform backpropagation, which in turn may be less than the amount of memory used to perform forward propagation.
[0067] Register 641-2 can provide the partial derivative ∂F(w,θ)/∂w, and register 641-1 can provide the partial derivative ∂F(w,θ)/∂μ, as input to memory bank 642-1 to update the mean μ of the weights w of a specific layer of the Bayesian neural network, in an update that involves a constant α. Register 641-1 can likewise provide partial derivatives as input to memory bank 642-1 to update the standard deviation ρ, in an update that involves a random pulse ε. ρ is stored in memory bank 642-1 along with the other parameters. Part of the calculation can be performed outside memory banks 642-1 and 642-2, for example, by an arithmetic unit, which can be located either inside or outside the controller 605. Memory bank 642-1 can receive input via sensing lines and access lines to update the values stored in its memory cells. The sum of the partial derivatives ∂F(w,θ)/∂w and ∂F(w,θ)/∂μ represents the gradient with respect to the mean. The gradient with respect to the standard deviation is expressed in terms of ε. In various examples, μ and ρ can be updated simultaneously or individually.
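As an illustration of the updates described above, the sketch below assumes the update form used in the algorithm commonly known as Bayes by Backprop, in which the standard deviation is derived from ρ as σ = log(1 + exp(ρ)). This is a swapped-in, assumed form; the expressions actually used in this disclosure may differ. The names `update_mu_rho`, `alpha`, `dF_dw`, `dF_dmu`, and `dF_drho` are hypothetical.

```python
import math


def update_mu_rho(mu, rho, eps, dF_dw, dF_dmu, dF_drho, alpha):
    """One gradient step on (mu, rho) for a single weight.

    Assumed form (Bayes by Backprop), not confirmed by this disclosure:
      grad_mu  = dF/dw + dF/dmu
      grad_rho = dF/dw * eps / (1 + exp(-rho)) + dF/drho
    """
    grad_mu = dF_dw + dF_dmu
    grad_rho = dF_dw * eps / (1.0 + math.exp(-rho)) + dF_drho
    return mu - alpha * grad_mu, rho - alpha * grad_rho


# With rho = 0 the factor 1 / (1 + exp(-rho)) equals 0.5.
new_mu, new_rho = update_mu_rho(
    mu=1.0, rho=0.0, eps=2.0, dF_dw=0.5, dF_dmu=0.5, dF_drho=0.0, alpha=0.5
)
print(new_mu, new_rho)  # 0.5 -0.25
```

The gradient for the mean is the sum of two stored partial derivatives, which matches the statement above; the gradient for ρ reuses the same ∂F/∂w term, scaled by the random value ε.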
[0068] The updated parameters θ can be used to perform the forward propagation of the corresponding layer using memory bank 642-1. Figure 6 shows memory bank 642-2 to illustrate that a single memory bank 642-1 is used, and that the memory bank storing the parameters θ of a layer of the Bayesian neural network can also be used to update the parameters θ. Although the operations performed in the memory banks of the memory array are described as being performed using four memory banks, the operations corresponding to a layer of a Bayesian neural network can be performed using fewer or more memory banks. For example, forward propagation, backward propagation, and weight updates for a specific layer of a Bayesian neural network can be performed using one, two, three, four, and/or five or more memory banks.
[0069] Figure 7 illustrates an example flowchart of a method 780 for implementing a Bayesian neural network in memory according to several embodiments of the present disclosure. Method 780 can be performed by processing logic, which may include hardware (e.g., a processing device, circuitry, dedicated logic, programmable logic, microcode, device hardware, integrated circuits, etc.), software (e.g., instructions run or executed on a processing device), or a combination thereof. In some embodiments, method 780 is performed by the control circuitry (e.g., controller) 105 of Figure 1. Although shown in a particular sequence or order, the order of the processes can be modified unless otherwise specified. The illustrated embodiments should therefore be understood only as examples: the illustrated processes can be performed in a different order, some processes can be performed in parallel, and one or more processes can be omitted in various embodiments. Not all processes are required in every embodiment, and other process flows are possible.
[0070] At 781, first data comprising a first partial derivative of a first loss function with respect to a training set and multiple parameters can be provided to a first memory bank of the memory device, wherein the first loss function corresponds to a layer of the Bayesian neural network. The first partial derivative can be taken with respect to the parameters θ. At 782, in response to receiving the first data comprising the first partial derivative, second data can be generated using the first memory bank, the second data comprising a second partial derivative of a second loss function F(w,θ) with respect to multiple weights w and the multiple parameters θ.
[0071] The second partial derivative can be provided as ∂F(w,θ)/∂w.
[0072] At 783, third data comprising a third partial derivative of the second loss function can be generated using a second memory bank, wherein the third data is generated using the second data.
[0073] At 784, fourth data comprising a fourth partial derivative of the second loss function can be generated using the second memory bank, wherein the fourth data is generated using the second data. The third partial derivative can be provided as ∂F(w,θ)/∂μ (with respect to the mean), and the fourth partial derivative as ∂F(w,θ)/∂ρ (with respect to the standard deviation). At 785, updated weight values of the layer of the Bayesian neural network can be written to another portion of the memory device, based at least in part on the third data comprising the third partial derivative and the fourth data comprising the fourth partial derivative.
[0074] The first loss function can be generated at either the controller of the memory device or a third memory bank of the memory device. The first loss function can provide a divergence measure between the variational posterior (e.g., q(w_i|θ)) and the prior (e.g., P(w_i)) of the Bayesian neural network, together with a fit measure between the multiple weights and the training data (e.g., log P(D|w_i)). The variational posterior and the prior can be generated using different memory banks of the memory device. For example, the variational posterior and the prior can be generated in memory banks other than the first memory bank and the second memory bank.
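One way to read the loss described above, offered as an assumption rather than as the formula of this disclosure, is the single-sample variational objective F(w,θ) = log q(w|θ) - log P(w) - log P(D|w). The sketch assumes Gaussian forms for the posterior and for a zero-mean prior, and takes the data log-likelihood as a given number. All function names and parameters are hypothetical.

```python
import math


def log_gaussian(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def loss(weights, mu, sigma, prior_std, log_likelihood):
    """F(w, theta) = sum_i [log q(w_i|theta) - log P(w_i)] - log P(D|w)."""
    divergence = 0.0
    for w, m, s in zip(weights, mu, sigma):
        divergence += log_gaussian(w, m, s) - log_gaussian(w, 0.0, prior_std)
    return divergence - log_likelihood


# When posterior and prior coincide, the divergence term vanishes and the
# loss reduces to the negative log-likelihood.
value = loss([0.3, -0.7], mu=[0.0, 0.0], sigma=[1.0, 1.0],
             prior_std=1.0, log_likelihood=-2.0)
print(value)  # 2.0
```

The first term plays the role of the divergence measure between posterior and prior, and the last term plays the role of the fit measure between the weights and the training data.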
[0075] The second partial derivative may be stored in a first plurality of registers in the memory device so that it is available in a second memory bank and can be used to update a plurality of deterministic values (e.g., θ) of the Bayesian neural network. The third and fourth partial derivatives may be stored in a second plurality of registers so that they are available to update a plurality of deterministic values of the Bayesian neural network.
[0076] In various instances, multiple deterministic values can be accessed from a first plurality of memory cells of the memory array. A first plurality of weight values and a first plurality of bias values can be generated based on these deterministic values, wherein the first plurality of weight values and the first plurality of bias values are generated using a second plurality of memory cells of the memory array. A second plurality of weight values and a second plurality of bias values can be generated based on the first plurality of weight values and the first plurality of bias values, wherein the second plurality of weight values and the second plurality of bias values are generated using a third plurality of memory cells of the memory array. A result and a confidence level of the result can be determined using input provided by the host, the second plurality of weight values, and the second plurality of bias values of the Bayesian neural network, wherein the result and the confidence level are outputs of a fourth plurality of memory cells of the memory array.
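A result and its confidence level can be illustrated by repeating the forward pass with freshly sampled weights and biases and then summarizing the outputs. The sketch assumes a single linear neuron, Gaussian sampling, and the standard deviation of the outputs as a stand-in for confidence. None of these choices is specified by this disclosure, and `predict_with_confidence` is a hypothetical name.

```python
import random
import statistics


def predict_with_confidence(x, mu, sigma, bias_mu, bias_sigma, n_samples, rng):
    """Return (mean output, standard deviation of outputs) over sampled passes."""
    outputs = []
    for _ in range(n_samples):
        # Sample one weight per input and one bias for this pass.
        w = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        b = bias_mu + bias_sigma * rng.gauss(0.0, 1.0)
        outputs.append(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return statistics.fmean(outputs), statistics.pstdev(outputs)


rng = random.Random(42)
result, spread = predict_with_confidence(
    x=[1.0, 2.0], mu=[0.5, -0.25], sigma=[0.1, 0.1],
    bias_mu=0.0, bias_sigma=0.05, n_samples=1000, rng=rng,
)
print(result, spread)  # result is near 0.0; a smaller spread means higher confidence
```

With all standard deviations set to zero the spread collapses to zero, which corresponds to an ordinary deterministic network with no uncertainty estimate.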
[0077] The deterministic values may include the average weight, standard deviation of the weights, average bias, and standard deviation of the biases of multiple layers of a Bayesian neural network implemented in a memory array. The controller of the memory device may access multiple deterministic values using multiple memory banks, generate a first plurality of weight values and a first plurality of bias values, generate a second plurality of weight values and a second plurality of bias values, and determine the results and the confidence level of the results. The multiple memory banks may include a first plurality of memory cells, a second plurality of memory cells, a third plurality of memory cells, and a fourth plurality of memory cells. For example, the first plurality of memory cells may include a first memory bank, the second plurality of memory cells may include a second memory bank, the third plurality of memory cells may include a third memory bank, and the fourth plurality of memory cells may include a fourth memory bank. The first, second, third, and fourth memory banks may include layers of a Bayesian neural network.
[0078] In various instances, the first plurality of memory cells, the second plurality of memory cells, and the third plurality of memory cells may include a first memory bank, and the fourth plurality of memory cells may include a second memory bank. The first memory bank and the second memory bank may include layers of a Bayesian neural network.
[0079] In various examples, the first plurality of weight values, the first plurality of bias values, the second plurality of weight values, and the second plurality of bias values of the Bayesian neural network can be sampled using several random pulse generators coupled to the memory array. The first plurality of weight values and the first plurality of bias values can be sampled using a first random pulse generator of the several random pulse generators. The second plurality of weight values can be sampled using a second random pulse generator of the several random pulse generators.
[0080] The several random pulse generators can be controlled to provide a first plurality of voltage pulses to a first plurality of signal lines to sample the first plurality of weight values and the first plurality of bias values, wherein the second plurality of memory cells is coupled to the first plurality of signal lines. The random pulse generators can also be controlled to provide a second plurality of voltage pulses to a second plurality of signal lines to sample the second plurality of weight values and the second plurality of bias values, wherein the third plurality of memory cells is coupled to the second plurality of signal lines.
[0081] The first plurality of voltage pulses can cause a first plurality of currents to be emitted from the second plurality of memory cells based on the resistance of the second plurality of memory cells. The second plurality of voltage pulses can cause a second plurality of currents to be emitted from the third plurality of memory cells based on the resistance of the third plurality of memory cells.
[0082] A second plurality of memory cells may be coupled to a first plurality of different signal lines, wherein a third plurality of memory cells are coupled to a second plurality of different signal lines. A first plurality of currents may be provided to an analog-to-digital converter via the first plurality of different signal lines. The first plurality of currents may represent a first plurality of weight values and a first plurality of bias values. A second plurality of currents may be provided to different analog-to-digital converters via a second plurality of different signal lines, wherein the second plurality of currents represent a second plurality of weight values and a second plurality of bias values of a Bayesian neural network.
[0083] An analog-to-digital converter (ADC) can generate a first plurality of output voltages corresponding to a first plurality of currents. Using different ADCs, a second plurality of output voltages corresponding to a second plurality of currents can be generated. These second plurality of output voltages represent a second plurality of sampled weight values and a second plurality of bias values.
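The conversion step can be sketched as a uniform quantizer. The sketch below assumes an ideal converter with a hypothetical full-scale current and bit width, and it returns an integer code of the kind that could later be held in a register, rather than modeling an output voltage directly. The converters of this disclosure may behave differently, and `adc_convert` is a hypothetical name.

```python
def adc_convert(current, full_scale, bits):
    """Map a current in [0, full_scale] to an integer code of `bits` bits."""
    if full_scale <= 0 or bits <= 0:
        raise ValueError("full_scale and bits must be positive")
    levels = (1 << bits) - 1
    # Clamp to the converter's input range before quantizing.
    clamped = min(max(current, 0.0), full_scale)
    return round(clamped / full_scale * levels)


# Hypothetical 8-bit converter with a 1.0 (arbitrary units) full scale.
currents = [0.0, 0.5, 1.0, 1.7]
codes = [adc_convert(i, full_scale=1.0, bits=8) for i in currents]
print(codes)  # [0, 128, 255, 255]
```

The last input exceeds the full-scale value and saturates at the maximum code, which is one reason the sampled currents would need to stay within the converter's range.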
[0084] Multiple registers can store a first plurality of values corresponding to a first plurality of output voltages and a second plurality of values corresponding to a second plurality of output voltages, so that the first plurality of values and the second plurality of values can be used by different random pulse generators and a fourth plurality of memory units.
[0085] Various examples can implement systems including a controller coupled to memory devices and multiple registers. The controller can be configured to access, from multiple registers, a first partial derivative of the loss function with respect to the average of multiple deterministic values of the Bayesian neural network, a second partial derivative of the loss function with respect to the standard deviation of the multiple deterministic values, a third partial derivative of the loss function with respect to multiple weight values of the Bayesian network, and to update the resistance of multiple memory cells to store the updated deterministic values. The resistance of the multiple memory cells can be updated using the first, second, and third partial derivatives.
[0086] A first plurality of voltages corresponding to the first and second partial derivatives can be applied to a first plurality of signal lines. A second plurality of voltages corresponding to the third partial derivative can be applied to a second plurality of signal lines, wherein each of the plurality of memory cells is coupled to a corresponding one of the first plurality of signal lines and the second plurality of signal lines. Applying the first plurality of voltages and the second plurality of voltages can update the resistance of the plurality of memory cells. The resistance of the plurality of memory cells can represent multiple deterministic values of the Bayesian neural network.
[0087] Figure 8 illustrates an example machine of a computer system 890 within which a set of instructions can be executed to cause the machine to perform the various methods discussed herein. In various embodiments, computer system 890 may correspond to a system (e.g., the computing system 100 of Figure 1) that includes, is coupled to, or utilizes a memory subsystem (e.g., the memory device 103 of Figure 1), or may be used to perform the operations of a controller (e.g., the control circuitry 105 of Figure 1). In alternative embodiments, the machine may be connected (e.g., networked) to other machines in a LAN, an intranet, an extranet, and/or the Internet. The machine may operate as a server or client machine in a client-server network environment, as a peer machine in a peer-to-peer (or distributed) network environment, or as a server or client machine in a cloud computing infrastructure or environment.
[0088] The machine may be a personal computer (PC), tablet PC, set-top box (STB), personal digital assistant (PDA), cellular phone, network device, server, network router, switch, or bridge, or any machine capable of executing a set of instructions (sequentially or otherwise) that specify actions to be taken by that machine. Furthermore, although a single machine is described, the term "machine" should also be understood to include any collection of machines that individually or collectively execute one or more sets of instructions to perform any one or more of the methods discussed herein.
[0089] The example computer system 890 includes a processing device 891, a main memory 893 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM), such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), a static memory 897 (e.g., flash memory, static random access memory (SRAM), etc.), and a data storage system 899, which communicate with each other via a bus 897.
[0090] Processing device 891 represents one or more general-purpose processing devices, such as microprocessors, central processing units, or the like. More specifically, the processing device may be a Complex Instruction Set Computing (CISC) microprocessor, a Reduced Instruction Set Computing (RISC) microprocessor, a Very Long Instruction Word (VLIW) microprocessor, or a processor implementing other instruction sets, or a processor implementing a combination of instruction sets. Processing device 891 may also be one or more special-purpose processing devices, such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), network processors, or the like. Processing device 891 is configured to execute instructions 892 for performing the operations and steps discussed herein. Computer system 890 may further include a network interface device 895 for communicating via network 896.
[0091] Data storage system 899 may include a machine-readable storage medium 889 (also referred to as a computer-readable medium) on which are stored one or more sets of instructions 892 or software embodying any one or more of the methods or functions described herein. During execution of instructions 892 by computer system 890, instructions 892 may also reside wholly or at least partially in main memory 893 and/or processing device 891, which also constitute machine-readable storage media.
[0092] In one embodiment, instructions 892 include instructions for implementing functionality corresponding to Figure 1. Although the machine-readable storage medium 889 is shown as a single medium in the exemplary embodiment, the term "machine-readable storage medium" should be understood to include a single medium or multiple media storing one or more sets of instructions. The term "machine-readable storage medium" should also be understood to include any medium capable of storing or encoding a set of instructions for execution by the machine and causing the machine to perform any one or more of the methods of this disclosure. Therefore, the term "machine-readable storage medium" should be understood to include (but not be limited to) solid-state memory, optical media, and magnetic media.
[0093] As used herein, “several things” can refer to one or more of such things. For example, “several memory devices” can refer to one or more memory devices. “Multiple things” means two or more. Additionally, the indicator “N” used herein (especially with respect to reference numerals in the figures) indicates that several specific features so indicated may be included in several embodiments of this disclosure.
[0094] The figures in this document follow a numbering convention, where the first one or the first few digits correspond to the figure number, and the remaining digits identify elements or components in the figure. Similar elements or components between different figures can be identified by using similar digits. It will be understood that elements shown in the various embodiments herein may be added, interchanged, and / or eliminated to provide several additional embodiments of this disclosure. Furthermore, the scale and relative dimensions of the elements provided in the figures are intended to illustrate various embodiments of this disclosure and are not intended to be limiting.
[0095] Although specific embodiments have been described and illustrated herein, those skilled in the art will understand that arrangements calculated to achieve the same results may be substituted for the specific embodiments shown. This disclosure is intended to cover adaptations or variations of various embodiments of this disclosure. It should be understood that the foregoing description has been carried out in an illustrative rather than restrictive manner. Those skilled in the art will understand, upon reviewing the foregoing description, combinations of the foregoing embodiments and other embodiments not explicitly described herein. The scope of the various embodiments of this disclosure includes other applications using the above structures and methods. Therefore, the scope of the various embodiments of this disclosure should be determined with reference to the appended claims, together with the full scope of the equivalents granted thereto.
[0096] In the foregoing detailed description, various features are grouped in a single embodiment for the purpose of simplifying this disclosure. This approach of the disclosure should not be construed as reflecting an intention that the disclosed embodiments of the disclosure must use more features than expressly stated in each claim. Rather, as reflected in the appended claims, the inventive subject matter exists in fewer than all the features of a single disclosed embodiment. Therefore, the appended claims are hereby incorporated into the detailed description, wherein each claim is an independent, separate embodiment.
Claims
1. An apparatus for implementing a Bayesian neural network in memory, comprising: a memory array (110, 310); and a controller (105, 405, 505, 605) coupled to the memory array and configured to: read data from a first plurality of memory cells (333) of the memory array; generate, based on a plurality of deterministic values read from the first plurality of memory cells, a first plurality of weight values (224) and a first plurality of bias values (225) using a second plurality of memory cells (333) of the memory array; generate, based on the first plurality of weight values and the first plurality of bias values, a second plurality of weight values (224) and a second plurality of bias values (225) using a third plurality of memory cells (333) of the memory array; and transfer output data from a fourth plurality of memory cells (333) of the memory array, the output data including a result and a confidence level of the result based on at least: input provided by a host (102), the second plurality of weight values, and the second plurality of bias values of the Bayesian neural network (220).
2. The device according to claim 1, wherein the deterministic values include the average weight, standard deviation of weight, average bias, and standard deviation of bias of the plurality of layers (221, 222-1, 222-2, 223) of the Bayesian neural network implemented in the memory array.
3. The device of claim 2, wherein the controller is further configured to: use a plurality of memory banks (442-1, 442-2, 442-3, 442-4, 542-1, 542-2, 642-1, 642-2) to access the plurality of deterministic values, generate the first plurality of weight values and the first plurality of bias values, generate the second plurality of weight values and the second plurality of bias values, and determine the result and the confidence level of the result; wherein the plurality of memory banks include the first plurality of memory cells, the second plurality of memory cells, the third plurality of memory cells, and the fourth plurality of memory cells.
4. The device according to claim 3, wherein: The first plurality of memory units include a first memory bank (442-1, 442-2, 442-3, 442-4, 542-1, 542-2, 642-1, 642-2), the second plurality of memory units include a second memory bank (442-1, 442-2, 442-3, 442-4, 542-1, 542-2, 642-1, 642-2), the third plurality of memory units include a third memory bank (442-1, 442-2, 442-3, 442-4, 542-1, 542-2, 642-1, 642-2), and the fourth plurality of memory units include a fourth memory bank (442-1, 442-2, 442-3, 442-4, 542-1, 542-2, 642-1, 642-2); and The first storage, the second storage, the third storage, and the fourth storage include layers (221, 222-1, 222-2, 223) of the Bayesian neural network.
5. The device according to claim 3, wherein: The first plurality of memory units, the second plurality of memory units, and the third plurality of memory units each include a first memory bank (442-1, 442-2, 442-3, 442-4, 542-1, 542-2, 642-1, 642-2), and the fourth plurality of memory units each include a second memory bank (442-1, 442-2, 442-3, 442-4, 542-1, 542-2, 642-1, 642-2); and The first storage and the second storage include layers of the Bayesian neural network.
6. The device according to any one of claims 1 to 5, wherein the controller is further configured to sample the first plurality of weight values, the first plurality of bias values, the second plurality of weight values and the second plurality of bias values of the Bayesian neural network using a plurality of random pulse generators (332, 432-1, 432-2) coupled to the memory array.
7. The device of claim 6, wherein the controller is further configured to sample the first plurality of weight values and the first plurality of bias values of the Bayesian neural network using a first random pulse generator (332, 432-1, 432-2) of the plurality of random pulse generators (332, 432-1, 432-2), and to sample the second plurality of weight values (224) of the Bayesian neural network using a second random pulse generator (332, 432-1, 432-2) of the plurality of random pulse generators.
8. The device of claim 6, wherein the controller is configured to control the plurality of random pulse generators to: A first plurality of voltage pulses are provided to a first plurality of signal lines (335, 336) to sample the first plurality of weight values and the first plurality of bias values, and a second plurality of memory cells are coupled to the first plurality of signal lines (335, 336); and A second plurality of voltage pulses are provided to a second plurality of signal lines (335, 336) to sample the second plurality of weight values and the second plurality of bias values, wherein the third plurality of memory cells are coupled to the second plurality of signal lines (335, 336).
9. The device according to claim 8, wherein: The first plurality of voltage pulses cause a first plurality of currents to be emitted from the second plurality of memory cells based on the resistance of the second plurality of memory cells; and The second plurality of voltage pulses cause a second plurality of currents to be emitted from the third plurality of memory cells based on the resistance of the third plurality of memory cells.
10. The device of claim 9, wherein the second plurality of memory cells are coupled to the first plurality of different signal lines; The third plurality of memory cells are coupled to a second plurality of different signal lines (335, 336); and The controller is further configured to control the memory array to: The first plurality of currents are provided to the analog-to-digital converters (315-1, 315-2) via the first plurality of different signal lines, wherein the first plurality of currents represent the first plurality of weight values and the first plurality of bias values; and The second plurality of currents are provided to different analog-to-digital converters (315-1, 315-2) via the second plurality of different signal lines, wherein the second plurality of currents represent the second plurality of weight values and the second plurality of bias values of the Bayesian neural network.
11. The device of claim 10, wherein the controller is further configured to: The analog-to-digital converter is used to generate a first plurality of output voltages corresponding to the first plurality of currents; Using the different analog-to-digital converters, a second plurality of output voltages corresponding to the second plurality of currents are generated, wherein the second plurality of output voltages represent the sampled second plurality of weight values and the second plurality of bias values.
12. The device of claim 11, further comprising a plurality of registers (442, 541, 641-1, 641-2), and wherein the controller is further configured to: The first plurality of values corresponding to the first plurality of output voltages and the second plurality of values corresponding to the second plurality of output voltages are stored in the plurality of registers so that the first plurality of values and the second plurality of values can be used by different random pulse generators and the fourth plurality of memory units.
13. A method for implementing a Bayesian neural network in memory, comprising: providing first data, including a first partial derivative of a first loss function with respect to training data and a plurality of parameters, to a first memory bank (442-1, 442-2, 442-3, 442-4, 542-1, 542-2, 642-1, 642-2) of a memory device (103), wherein the first loss function corresponds to a layer (221, 222-1, 222-2, 223) of a Bayesian neural network (220); in response to receiving the first data including the first partial derivative, generating second data using the first memory bank, the second data including a second partial derivative of a second loss function with respect to a plurality of weights (224) and the plurality of parameters (225); generating third data including a third partial derivative of the second loss function using a second memory bank (442-1, 442-2, 442-3, 442-4, 542-1, 542-2, 642-1, 642-2), wherein the third data is generated using the second data; generating fourth data including a fourth partial derivative of the second loss function using the second memory bank, wherein the fourth data is generated using the second data; and writing updated weight values (224) of the layer of the Bayesian neural network to another portion of the memory device based at least in part on the third data including the third partial derivative and the fourth data including the fourth partial derivative.
14. The method of claim 13, further comprising: generating the first loss function at one of the controller (105, 405, 505, 605) of the memory device and a third memory bank (442-1, 442-2, 442-3, 442-4, 542-1, 542-2, 642-1, 642-2) of the memory device, wherein the first loss function provides a divergence measure between a variational posterior and a prior of the Bayesian neural network and a measure of fit between the plurality of weights and the training data.
15. The method of claim 14, further comprising generating the variational posterior and the prior in different banks of the memory device.
16. The method according to any one of claims 13 to 15, further comprising storing the second data including the second partial derivative in a first plurality of registers of the memory device, such that the second partial derivative is available for use by the second memory bank and for updating a plurality of deterministic values of the Bayesian neural network.
17. The method of claim 16, further comprising storing the third data including the third partial derivative and the fourth data including the fourth partial derivative in a second plurality of registers (442, 541, 641-1, 641-2) so that the third partial derivative and the fourth partial derivative can be used to update the plurality of deterministic values of the Bayesian neural network.
18. A system for implementing a Bayesian neural network in memory, comprising: a memory device (103); a plurality of registers (442, 541, 641-1, 641-2); and a controller (105, 405, 505, 605) coupled to the memory device and the plurality of registers and configured to: access, from the plurality of registers, first data representing a first partial derivative of a loss function of the Bayesian neural network (220); access, from the plurality of registers, second data representing a second partial derivative of the loss function; access, from the plurality of registers, third data representing a third partial derivative of the loss function; and update resistances of a plurality of memory cells of the memory device using the first data, the second data, and the third data to store a plurality of updated deterministic values.
19. The system of claim 18, wherein the controller configured to update the resistances of the plurality of memory cells is further configured to: apply a first plurality of voltages corresponding to the first partial derivative and the second partial derivative to a first plurality of signal lines (335, 336); and apply a second plurality of voltages corresponding to the third partial derivative to a second plurality of signal lines (335, 336), wherein each of the plurality of memory cells is coupled to a corresponding one of the first plurality of signal lines and the second plurality of signal lines, and wherein applying the first plurality of voltages and the second plurality of voltages updates the resistances of the plurality of memory cells.
20. The system according to claim 18 or 19, wherein the resistances of the plurality of memory cells represent a plurality of deterministic values of the Bayesian neural network.
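
For illustration only, the loss function recited in claim 14 (a divergence measure between the variational posterior and the prior, plus a measure of fit between the weights and the training data) and the chain of partial derivatives recited in claims 13 to 18 can be sketched in software. The sketch below is not the claimed in-memory implementation. It assumes, hypothetically, a single weight with a Gaussian variational posterior whose deterministic values are a mean `mu` and a parameter `rho` (standard deviation `softplus(rho)`), a zero-mean Gaussian prior, and a squared-error fit term; all function and variable names are illustrative and do not appear in the claims.

```python
import math
import random


def softplus(rho):
    # Maps an unconstrained parameter to a positive standard deviation.
    return math.log1p(math.exp(rho))


def sigmoid(rho):
    # Derivative of softplus with respect to rho.
    return 1.0 / (1.0 + math.exp(-rho))


def loss(mu, rho, eps, data, prior_sigma=1.0):
    """Divergence term plus fit term for one sampled weight (assumed form)."""
    sigma = softplus(rho)
    w = mu + sigma * eps  # sampled weight
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, prior_sigma^2).
    kl = (math.log(prior_sigma / sigma)
          + (sigma ** 2 + mu ** 2) / (2 * prior_sigma ** 2) - 0.5)
    # Fit between the sampled weight and the training data (squared error).
    fit = 0.5 * sum((y - w * x) ** 2 for x, y in data)
    return kl + fit


def gradients(mu, rho, eps, data, prior_sigma=1.0):
    """Partial derivatives with respect to the weight, mu, and rho."""
    sigma = softplus(rho)
    w = mu + sigma * eps
    # Derivative of the fit term with respect to the sampled weight.
    d_w = -sum((y - w * x) * x for x, y in data)
    # The derivatives for the deterministic values reuse d_w.
    d_mu = d_w + mu / prior_sigma ** 2
    d_sigma = d_w * eps + (sigma / prior_sigma ** 2 - 1.0 / sigma)
    d_rho = d_sigma * sigmoid(rho)
    return d_w, d_mu, d_rho


def train(data, steps=2000, lr=0.01, seed=0):
    """Plain stochastic gradient descent on mu and rho (assumed update rule)."""
    rng = random.Random(seed)
    mu, rho = 0.0, -3.0
    for _ in range(steps):
        eps = rng.gauss(0.0, 1.0)  # software stand-in for a random source
        _, d_mu, d_rho = gradients(mu, rho, eps, data)
        mu -= lr * d_mu
        rho -= lr * d_rho
    return mu, softplus(rho)


if __name__ == "__main__":
    samples = [(x, 2.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]
    mean, std = train(samples)
    print(f"posterior mean ~ {mean:.3f}, posterior std ~ {std:.3f}")
```

In this sketch the derivative with respect to the sampled weight is computed once and then reused to form the derivatives for `mu` and `rho`, which is one possible reading of how the second data feeds the third and fourth data in claim 13. The mapping of specific claim elements to these quantities is an assumption, not something the claims state.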