Worst-case noise and bound management for RPU crossbar arrays
By managing noise and bounds with a worst-case scaling factor on an RPU crossbar array, the variable-runtime problem of conventional iterative methods is avoided, enabling efficient DNN training.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- INTERNATIONAL BUSINESS MACHINES CORPORATION
- Filing Date
- 2021-11-04
- Publication Date
- 2026-06-23

Figure CN116583854B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the training of deep neural networks (DNNs) on analog crossbar arrays of resistive processing unit (RPU) devices, and more specifically, to techniques for noise and bound management of DNN training on RPU crossbar arrays using a worst-case scaling factor, which provide an effective runtime improvement.
Background Technology
[0002] Deep neural networks (DNNs) can be embodied in analog crossbar arrays of memory devices such as resistive processing units (RPUs). DNN-based models have been used for a variety of cognitive tasks, such as object and speech recognition and natural language processing. Performing these tasks with a high level of accuracy requires neural network training. However, the operations performed on an RPU array are inherently analog and therefore susceptible to various noise sources. When the input values to the RPU array are small (e.g., in backward loop iterations), the output signal can be buried in noise, producing incorrect results.
[0003] Furthermore, digital-to-analog converters (DACs) and analog-to-digital converters (ADCs) are used to convert the digital inputs of the RPU array into analog signals and the outputs of the RPU array back into digital signals, respectively. Therefore, the training process is also limited by the bounded range of the DACs and ADCs used in the array.
[0004] Bound management becomes crucial for DNN training on RPU arrays, especially when the weights are set according to automatic weight scaling. With automatic weight scaling, the available resistive-state resources of the RPU devices in the array are optimally mapped to a weight range (resistance values); this benefits DNN training by scaling the bounded weight range of the RPU devices with the array size.
[0005] The conventional approach is to identify the maximum value (m) in the input vector and to scale the input values of the RPU array by that maximum value (m) to obtain optimal analog noise performance (noise management). To manage the bounds (bound management), saturation at the output of the RPU array is eliminated by reducing the values from which the RPU array input signals are formed.
[0006] However, whenever the output bound is exceeded, the input must be rescaled and the computation repeated until the output falls below the bound. While this iterative scaling approach is highly effective in addressing the increased test error caused by automatic weight scaling, it incurs an undesirable cost: variable runtime.
[0007] Therefore, effective noise and bound management techniques with improved runtime would be desirable.
Summary of the Invention
[0008] This invention provides techniques for noise and bound management during training of a deep neural network (DNN) on a resistive processing unit (RPU) crossbar array using a worst-case scaling factor, which provide an effective runtime improvement. In one aspect of the invention, a method for noise and bound management is provided. The method includes: obtaining input vector values x for an analog crossbar array of RPU devices, wherein a weight matrix is mapped to the analog crossbar array of RPU devices; and scaling the input vector values x based on a worst-case scenario to provide scaled input vector values x' as input to the analog crossbar array of RPU devices, wherein the worst-case scenario comprises an assumed maximum weight of the weight matrix multiplied by the sum of the absolute values of the input vector values x.
[0009] For example, an absolute maximum input value $x_{mx}$ can be calculated from the input vector values x, and a proposed scaling factor σ can be calculated as $\sigma = \omega s / b$, where ω is the assumed maximum weight of the weight matrix, s is the total input variable, and b is the output bound of the analog crossbar array of RPU devices. The noise and bound management scaling factor α can then be set to the absolute maximum input value $x_{mx}$ or the proposed scaling factor σ, whichever is larger, and the noise and bound management scaling factor α can be used to scale the input vector values x.
[0010] A more complete understanding of the invention, as well as its further features and advantages, will be obtained by referring to the following detailed description and accompanying drawings.
Attached Figure Description
[0011] Figure 1 is a diagram illustrating a deep neural network (DNN) embodied in an analog crossbar array of resistive processing unit (RPU) devices according to an embodiment of the present invention;
[0012] Figure 2 is a diagram illustrating an exemplary method for noise and bound management according to an embodiment of the present invention;
[0013] Figure 3 is a diagram illustrating an exemplary implementation of the noise and bound management technique in a forward loop operation according to an embodiment of the present invention;
[0014] Figure 4 is a diagram illustrating an exemplary implementation of the noise and bound management technique in a backward loop operation according to an embodiment of the present invention;
[0015] Figure 5 is a diagram illustrating an alternative exemplary method for noise and bound management according to an embodiment of the present invention;
[0016] Figure 6 is a diagram illustrating an exemplary apparatus for performing one or more of the methodologies presented herein according to an embodiment of the present invention;
[0017] Figure 7 depicts a cloud computing environment according to an embodiment of the present invention; and
[0018] Figure 8 depicts abstraction model layers according to an embodiment of the present invention.
Detailed Implementation
[0019] As mentioned above, existing noise and bound management techniques scale the input values of the RPU array by the maximum value (m) in the input vector to obtain optimal analog noise performance (noise management). To manage bounds, saturation at the output of the RPU array is eliminated by iteratively reducing the values from which the RPU array input signals are formed until the output falls within the output threshold (bound management). However, this iteration undesirably makes the runtime variable.
[0020] Advantageously, provided herein are techniques for noise and bound management of deep neural network (DNN) training on an analog RPU crossbar array when the dynamic range of the (noisy) analog system is limited and the runtime must be minimized. Specifically, as will be described in detail below, the present techniques scale the input signal of the RPU array relative to a worst-case scenario (i.e., the maximum possible output of any weight matrix given a particular input), which is used to estimate the scaling factor needed to bring the input signal into the limited dynamic range of the analog crossbar system.
[0021] As shown in Figure 1, a DNN can be embodied in an analog crossbar array of RPU devices, where each parameter (weight Wij) of the algorithmic (abstract) weight matrix 102 is mapped to a single RPU device (RPUij) on the hardware, namely the physical crossbar array 104 of RPU devices 110. The crossbar array 104 has a (first) set of conductive row lines 106 and a (second) set of conductive column lines 108 oriented orthogonal to, and intersecting, the set of conductive row lines 106. See also Figure 1. The intersections of the sets of conductive row lines 106 and conductive column lines 108 are separated by the RPU devices 110, forming the crossbar array 104 of RPU devices 110. Each RPU device 110 can include an active region between two electrodes (i.e., a two-terminal device). The conduction state of the active region identifies the weight value of the RPU device 110, which can be updated/adjusted by applying a programming signal to the electrodes.
[0022] Each RPU device 110 (RPUij) is uniquely identified by its location (i.e., in the i-th row and j-th column of the crossbar array 104). For example, going from top to bottom and left to right in the crossbar array 104, the RPU device 110 at the intersection of the first conductive row line 106a and the first conductive column line 108a is designated RPU11, the RPU device 110 at the intersection of the first conductive row line 106a and the second conductive column line 108b is designated RPU12, and so on. The mapping of the weight parameters of the weight matrix 102 to the RPU devices 110 of the crossbar array 104 follows the same convention. For example, weight Wi1 of the weight matrix 102 is mapped to RPUi1 of the crossbar array 104, weight Wi2 of the weight matrix 102 is mapped to RPUi2 of the crossbar array 104, and so on.
[0023] The RPU devices 110 of the crossbar array 104 function as the weighted connections between neurons in the DNN. The resistance of an RPU device 110 can be altered by controlling the voltages applied between individual lines of the set of conductive row lines 106 and the set of conductive column lines 108. Data is stored in the crossbar array 104 by altering the resistance of the RPU devices 110, for example, into a high resistance state or a low resistance state. The resistance state (high or low) of an RPU device 110 is read by applying a (read) voltage to the corresponding lines of the set of conductive row lines 106 and the set of conductive column lines 108 and measuring the current that passes through the target RPU device 110. All of the operations involving weights are performed fully in parallel by the RPU devices 110.
[0024] In machine learning and cognitive science, DNN-based models are a family of statistical learning models inspired by the biological neural networks of animals, in particular the brain. These models can be used to estimate or approximate systems and cognitive functions that depend on a large number of inputs and on connection weights that are generally unknown. DNNs are often embodied as so-called "neuromorphic" systems of interconnected processor elements that act as simulated "neurons" and exchange "messages" with each other in the form of electronic signals. The connections that carry electronic messages between simulated neurons in a DNN are provided with numeric weights corresponding to the strength of a given connection. These numeric weights can be adjusted and tuned based on experience, making DNNs adaptive to inputs and capable of learning. For example, a DNN for image classification is defined by a set of input neurons that can be activated by the pixels of an input image. After being weighted and transformed by a function determined by the network's designer, the activations of these input neurons are passed on to other downstream neurons. This process is repeated until an output neuron is activated. The activated output neuron determines which class the image belongs to.
[0025] DNN training can be performed using a process such as stochastic gradient descent (SGD), where backpropagation is used to compute the error gradient for each parameter (weight Wij). Backpropagation is performed in three loops: a forward loop, a backward loop, and a weight update loop. These three loops are repeated multiple times until a convergence criterion is met.
[0026] A DNN-based model is made up of multiple processing layers that learn representations of data with multiple levels of abstraction. For a single processing layer where N input neurons are connected to M output neurons, the forward loop involves computing a vector-matrix multiplication (y = Wx), where the vector x of length N represents the activities of the input neurons, and the matrix W of size M×N stores the weight values between each pair of input and output neurons. The resulting vector y of length M is further processed by performing a nonlinear activation on each of its elements before being passed to the next layer.
[0027] Once the information reaches the final output layer, the backward loop involves calculating the error signal and backpropagating it through the DNN. The backward loop on a single layer also involves a vector-matrix multiplication on the transpose of the weight matrix (obtained by swapping each row with its corresponding column), $z = W^{T}\delta$, where the vector δ of length M represents the error calculated by the output neurons, and the vector z of length N is further processed using the derivative of the neuron nonlinearity and then passed down to the previous layer.
[0028] Finally, in the weight update loop, the weight matrix W is updated by performing an outer product of the two vectors used in the forward and backward loops. This outer product is usually expressed as $W \leftarrow W + \eta(\delta x^{T})$, where η is a global learning rate. All of these operations performed on the weight matrix 102 during backpropagation can be implemented with the crossbar array 104 of RPU devices 110.
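The three loops for a single layer can be illustrated with a minimal pure-Python sketch. It assumes a dense M×N weight matrix stored as a list of rows, omits the nonlinear activation and its derivative, and uses hypothetical function names; on the RPU hardware these operations would instead be carried out by the analog crossbar array.

```python
def forward(W, x):
    """Forward loop: y = W x, with W an M x N list of rows and x of length N."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def backward(W, delta):
    """Backward loop: z = W^T delta, i.e. the same array read with rows and columns swapped."""
    n_cols = len(W[0])
    return [sum(W[i][j] * delta[i] for i in range(len(W))) for j in range(n_cols)]


def update(W, delta, x, eta):
    """Weight update loop: W <- W + eta * outer(delta, x)."""
    return [[w_ij + eta * d_i * x_j for w_ij, x_j in zip(row, x)]
            for row, d_i in zip(W, delta)]


W = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]          # M = 2 output neurons, N = 3 input neurons
x = [1.0, 0.5, -1.0]            # input activities
delta = [0.1, -0.2]             # errors at the output neurons

y = forward(W, x)               # [2.0, -3.5]
z = backward(W, delta)          # approximately [0.1, 0.4, -0.6]
W_new = update(W, delta, x, eta=0.5)
```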
[0029] As described above, digital-to-analog converters (DACs) and analog-to-digital converters (ADCs) are used to convert the digital inputs to the RPU devices 110 of the crossbar array 104 into analog signals, and to convert the outputs of the RPU devices 110 of the crossbar array 104 back into digital signals, respectively. For noise and bound management of DNN training on an analog RPU crossbar array,
[0030] $\tilde{x} = f_{\mathrm{DAC}}(x/\alpha)$ (1)
[0031] is the input vector in analog space, and
[0032] $y = \alpha\, f_{\mathrm{ADC}}(W\tilde{x})$ (2)
[0033] is the digital output vector, where $f_{\mathrm{DAC}}$ and $f_{\mathrm{ADC}}$ represent the DAC and ADC conversions, respectively, and α is the noise and bound management scaling factor. Operations through the crossbar array 104 of RPU devices 110 have bounded output values due to ADC saturation. That is, the ADC is limited to the range −b, ..., b, and values below the output bound −b or above the output bound b are saturated to the corresponding bound. Relative information beyond the bound values is lost due to this clipping. Conventionally, if the output of the analog computation exceeds the bound, i.e., $|(W\tilde{x})_i| > b$ for some i, the scaling factor is doubled (α ← 2α) and the computation is repeated until the output is below the bound (bound management). However, this iterative computation has a negative impact on runtime.
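For comparison, the conventional iterative bound management can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the DAC model (clip to (−1, ..., 1), then round to a bin width of 2/n), the ADC model (pure saturation to −b, ..., b with no output quantization or noise), the all-zero guard, and all function names are hypothetical and not taken from the patent. The point it illustrates is that the number of analog passes depends on the data.

```python
def dac(v, n_steps=256):
    """Hypothetical DAC model: clip to [-1, 1], then round to bin width 2/n_steps."""
    r = 2.0 / n_steps
    return [round(max(-1.0, min(1.0, vi)) / r) * r for vi in v]


def adc(v, b):
    """Hypothetical ADC model: saturate each value to [-b, b]."""
    return [max(-b, min(b, vi)) for vi in v]


def matvec(W, x):
    """Ideal analog multiply-accumulate: y_i = sum_j W[i][j] * x[j]."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def iterative_bound_management(W, x, b, max_iter=20):
    """Conventional scheme: alpha starts at max|x_i| (noise management) and is
    doubled, with the analog pass repeated, while any output saturates."""
    alpha = max(abs(xi) for xi in x) or 1.0   # guard for an all-zero input (added)
    for n_passes in range(1, max_iter + 1):
        y_sat = adc(matvec(W, dac([xi / alpha for xi in x])), b)
        if max(abs(v) for v in y_sat) < b:    # no output reached the bound
            return [alpha * v for v in y_sat], alpha, n_passes
        alpha *= 2.0                          # alpha <- 2 * alpha, then retry
    raise RuntimeError("output still saturated after max_iter passes")


W = [[0.6] * 4, [-0.6] * 4]
y, alpha, n_passes = iterative_bound_management(W, [1.0, 1.0, 1.0, 1.0], b=1.0)
# Three analog passes are needed (alpha = 1, 2, 4); y is approximately [2.4, -2.4].
```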
[0034] In contrast, according to the present techniques, the inputs to the RPU devices 110 of the crossbar array 104 are scaled based on a worst-case scenario to mitigate the risk of clipped output values due to the limited dynamic range of the analog crossbar array 104 of RPU devices 110. The term "worst-case scenario" as used herein refers to the maximum possible output from a weight matrix (such as weight matrix 102) given a particular input vector, i.e., the output obtained when the weights take their maximum values. The physical conductances representing the weights in the RPU devices are physically limited. Therefore, it is assumed herein that the weights in the weight matrix lie in the range −w_max to w_max, where w_max corresponds to g_max (i.e., the maximum conductance achievable by the RPU devices). As will be described in detail below, with the present techniques, the absolute sum of the input signals and the assumption of a constant maximum weight are used to calculate the noise and bound management scaling factor (α) for the inputs and outputs in the digital periphery (before the DAC or after the ADC) so as to bring the input into the dynamic range of the RPU devices 110 of the crossbar array 104. Advantageously, this worst-case management process does not add variable runtime, because results never need to be recomputed after the outputs are clipped: with the worst case used as the reference for determining the scaling factor, the output bounds are never clipped.
[0035] Figure 2 is a diagram illustrating an exemplary method 200 for noise and bound management that scales the inputs to the RPU devices 110 of the crossbar array 104 relative to the worst case. As described above, the forward and backward loops performed on the weight matrix 102 each involve a vector-matrix multiplication operation. In the analog crossbar array 104 of RPU devices 110, such a vector-matrix multiplication involves multiplying each input vector value (see below) by the corresponding weight value Wij (in the corresponding row) of the weight matrix 102 and summing the results. This process is also referred to herein as a "multiply-accumulate" operation of the analog crossbar array 104 of RPU devices 110. For each multiply-accumulate cycle, the steps of method 200 are performed as a digital pre-computation to determine the scaling factor α for the operation on the analog crossbar array 104 of RPU devices 110. Notably, according to an exemplary embodiment, one or more steps of method 200, including calculating the noise and bound management scaling factor α and scaling/rescaling the input/output values of the crossbar array 104 of RPU devices 110 by the factor α (see below), are performed outside of the RPU array hardware, for example by an apparatus such as apparatus 600 described in conjunction with the description of Figure 6 below. Additionally, one or more elements of the present techniques can optionally be provided as a service in a cloud environment. For instance, by way of example only, the training data for the input vector values (see below) can reside remotely on a cloud server. Furthermore, any of the steps of method 200 can be performed on a dedicated cloud server to take advantage of high-performance CPUs and GPUs, after which the results are sent back to the local device.
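The worst-case reference rests on the inequality $|\sum_j w_{ij} x_j| \le w_{max} \sum_j |x_j|$, which holds for any row of weights in the range −w_max to w_max and is attained when each weight takes the sign of its input. The following Python sketch checks this numerically; the random inputs and weights, the seed, and the function names are purely illustrative assumptions.

```python
import random


def worst_case_output(x, w_max):
    """Largest |output| any weight row in [-w_max, w_max] can produce for input x."""
    return w_max * sum(abs(xi) for xi in x)


def adversarial_row(x, w_max):
    """Weight row that attains the worst case: each weight matches its input's sign."""
    return [w_max if xi > 0 else -w_max if xi < 0 else 0.0 for xi in x]


random.seed(0)
w_max = 0.6
x = [random.uniform(-1.0, 1.0) for _ in range(16)]
bound = worst_case_output(x, w_max)

# Random conductance-limited rows never exceed the bound...
rows = [[random.uniform(-w_max, w_max) for _ in x] for _ in range(1000)]
largest = max(abs(sum(w * xi for w, xi in zip(row, x))) for row in rows)

# ...while the sign-matched row reaches it (up to floating-point rounding).
attained = sum(w * xi for w, xi in zip(adversarial_row(x, w_max), x))
```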
[0036] In step 202, an input vector is obtained. The input vector includes a numeric value x. The numeric value x in the input vector is also referred to herein as "input vector value x". According to an exemplary embodiment, the input vector value x includes data from the training dataset. By way of example only, the training dataset can be obtained from a database of DNN training data or other storage.
[0037] In step 204, the absolute maximum value $x_{mx}$ of the input vector is calculated as follows:
[0038] $x_{mx} = \max_i |x_i|$ (3)
[0039] Accordingly, $x_{mx}$ may also be referred to herein as the absolute maximum input value. In step 206, the absolute maximum input value $x_{mx}$ is assigned to the scaling factor α.
[0040] At this stage of the process, the weight values (see above) used in the vector-matrix multiplication performed on the analog crossbar array 104 of RPU devices 110 are not known a priori. However, as noted above, the worst-case assumption is that all analog weights are maximally positive for all positive input vector values and maximally negative for all negative input vector values, so that every product adds to the output with the same sign. In that case, in step 208, the sum of the absolute values of all input vector values x is calculated, and in step 210 this sum is assigned to the total input variable s, i.e., $s = \sum_i |x_i|$.
[0041] In the preceding example, the (negative and positive) input vector values x are fed as input to the analog crossbar array 104 of RPU devices 110 in a single pass. Alternatively, in another exemplary embodiment, the negative and positive input values of the input vector x are fed to the analog crossbar array 104 of RPU devices 110 in two separate passes, with the input values of the other sign (negative or positive) set to zero in each pass, and the outputs of the two passes are sign-corrected and added to obtain the final result. For example, if the negative input values are fed to the analog crossbar array 104 of RPU devices 110 in the first pass (with the positive input values set to zero), then the positive input values are fed in the second pass (with the negative input values set to zero), and vice versa. In this case, a corresponding worst-case scenario applies: it is assumed that all positive (or negative) input vector values meet their maximal positive (or negative) weights, while all other weights contribute nothing to the output (and can therefore be assumed to be zero), since the corresponding input values are set to zero for that pass. The larger of the sum over the positive input values and the sum over the negative input values is then assigned to the total input variable s.
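Steps 204 through 210 amount to two reductions over the input vector. A minimal Python sketch (the function name is hypothetical):

```python
def absmax_and_total_input(x):
    """Steps 204-210: x_mx = max_i |x_i| (Eq. 3) and s = sum_i |x_i|."""
    x_mx = max(abs(xi) for xi in x)    # absolute maximum input value
    s = sum(abs(xi) for xi in x)       # total input variable (one-pass case)
    return x_mx, s


x_mx, s = absmax_and_total_input([0.5, -2.0, 1.0, -0.25])
print(x_mx, s)  # 2.0 3.75
```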
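The two-pass variant can be illustrated with an ideal (noise-free, unbounded) multiply-accumulate in Python. The text only states that the two outputs are "sign-corrected and added"; this sketch assumes one way of doing so, namely feeding the magnitudes of the negative inputs in the second pass and subtracting that pass's output. The function names are hypothetical.

```python
def matvec(W, x):
    """Ideal multiply-accumulate: y_i = sum_j W[i][j] * x[j]."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def two_pass_matvec(W, x):
    """Positive and negative inputs in separate passes, other sign zeroed."""
    x_pos = [xi if xi > 0 else 0.0 for xi in x]    # pass 1: negative inputs set to 0
    x_neg = [-xi if xi < 0 else 0.0 for xi in x]   # pass 2: |negative inputs|, positives set to 0
    y_pos = matvec(W, x_pos)
    y_neg = matvec(W, x_neg)
    return [p - q for p, q in zip(y_pos, y_neg)]   # sign correction, then add


W = [[1.0, 2.0, 3.0], [-1.0, 0.0, 1.0]]
x = [1.0, -2.0, 0.5]
print(two_pass_matvec(W, x), matvec(W, x))  # [-1.5, -0.5] [-1.5, -0.5]
```

Each pass sees inputs of only one sign, which is why the per-pass worst case only needs the sum over that sign.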
[0042] That is, according to this alternative embodiment, in step 212 only the sum of the absolute values of the positive input vector values ($s_p$) is calculated as:
[0043] $s_p = \sum_i I(x_i > 0)\,|x_i|$ (4)
[0044] where I(true) = 1 and I(false) = 0 indicates whether the condition is true. In step 214, only the sum of the absolute values of the negative input vector values ($s_n$) is calculated as:
[0045] $s_n = \sum_i I(x_i < 0)\,|x_i|$ (5)
[0046] where I(true) = 1 and I(false) = 0 indicates whether the condition is true.
[0047] In step 216, the larger of the two quantities $s_p$ and $s_n$ is assigned to the total input variable s. That is, in this exemplary embodiment, the total input variable s is set as follows:
[0048] $s = \max(s_n, s_p)$ (6)
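Equations (4) to (6) can be sketched in Python as follows (the function name is hypothetical). For the sample vector, the two-pass total input is 2.25 rather than the one-pass value 3.75, so the two-pass worst case is less pessimistic and leads to a smaller scaling factor.

```python
def total_input_two_pass(x):
    """Eqs. (4)-(6): s_p, s_n and s = max(s_n, s_p)."""
    s_p = sum(abs(xi) for xi in x if xi > 0)   # I(x_i > 0) * |x_i|
    s_n = sum(abs(xi) for xi in x if xi < 0)   # I(x_i < 0) * |x_i|
    return max(s_n, s_p), s_p, s_n


s, s_p, s_n = total_input_two_pass([0.5, -2.0, 1.0, -0.25])
print(s_p, s_n, s)  # 1.5 2.25 2.25
```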
[0049] Let ω be the maximum weight of the weight matrix 102, i.e., ω is the assumed maximum weight. However, as will be described in detail below, ω can be reduced (e.g., to 50% of the assumed maximum weight) because the worst-case scenario is highly unlikely. That is, for some improbable input vectors the output may indeed be clipped by the output bound, but in most cases this will not happen, so the DNN training or inference results are unlikely to change significantly.
[0050] Given ω as the assumed maximum weight, the expected worst-case total output is ω multiplied by the total input s, i.e., ωs. As described above, this expected worst-case total output must be brought within the output bound b. Therefore, in step 218, the proposed scaling factor σ for the input vector values (in the worst case) is calculated as ω multiplied by the total input s and divided by the output bound b, i.e.,
[0051] $\sigma = \frac{\omega s}{b}$ (7)
[0052] As described above, ω is the assumed maximum weight. According to an exemplary embodiment, ω is a user-defined value that maps the maximum conductance of the RPU devices 110 to the "mathematical" weight values of the DNN. For instance, by way of example only, a suitable value for the maximum weight ω might be 0.6. Dividing the input vector values by σ reduces the expected worst-case total output from ωs to ωs/σ = b; in particular, if ωs already equals b, then σ = 1 and the inputs do not need to be scaled. Thus, even if the actual total output from the analog crossbar array 104 of RPU devices 110 is as large as the expected worst-case total output, the scaling ensures that the output bound b is never exceeded.
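Equation (7) and the property that the scaled worst case lands exactly on the bound can be checked with a short Python sketch. The function name and the sample values (ω = 0.6 as in the text, b = 1) are illustrative assumptions.

```python
def proposed_scaling_factor(x, omega, b):
    """Step 218, Eq. (7): sigma = omega * s / b, with s = sum_i |x_i|."""
    s = sum(abs(xi) for xi in x)
    return omega * s / b


x = [0.5, -2.0, 1.0, -0.25]
omega, b = 0.6, 1.0
sigma = proposed_scaling_factor(x, omega, b)    # about 2.25

# Worst-case total output after dividing every input by sigma: omega * s / sigma = b.
worst_scaled = omega * sum(abs(xi) / sigma for xi in x)
```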
[0053] However, since the worst-case scenario is highly unlikely (i.e., the actual total output from the analog crossbar array 104 of RPU devices 110 is unlikely to be as large as the expected worst-case total output), it may be desirable to adjust the proposed scaling factor σ by reducing the assumed maximum weight ω, for example to 50% of the assumed maximum weight, and then recalculating the proposed scaling factor in step 218 (see Figure 2, i.e., adjusting ω). In that case, although the output bound would be exceeded if the worst case actually occurred, the output remains below the output bound b in most cases. Adjusting σ in this way generally yields a desirably larger signal-to-noise ratio (SNR).
[0054] As described above, a digital-to-analog converter (DAC) is used to convert the scaled digital input vector values (according to method 200) into analog signals for performing the vector-matrix multiplication on the analog crossbar array 104 of RPU devices 110. Specifically, the DAC converts the scaled digital input vector values into analog pulse widths. However, the DAC resolution can be limited. Notably, the proposed scaling factor σ calculated according to Equation 7 above may in fact be so large that every input vector value divided by σ yields a value smaller than the smallest bin of the digital-to-analog conversion. In that case, the inputs to the analog crossbar array 104 of RPU devices 110 would all be zero after DAC conversion.
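The claim that reducing ω rarely causes clipping can be explored with a small Monte Carlo sketch in Python. The statistical setup here is a hypothetical assumption, not taken from the patent: weights uniform in [−w_max, w_max], standard normal inputs, α taken equal to σ, and no analog noise or DAC quantization. Under these assumptions, ω = w_max never clips, and ω = 0.5·w_max clips only in very rare cases.

```python
import random


def clip_fraction(omega_frac, w_max=0.6, b=1.0, n_in=64, n_out=32, trials=100, seed=0):
    """Fraction of outputs exceeding b when sigma = omega * s / b uses
    omega = omega_frac * w_max (illustrative statistical assumptions only)."""
    rng = random.Random(seed)
    omega = omega_frac * w_max
    clipped = total = 0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
        sigma = omega * sum(abs(xi) for xi in x) / b      # Eq. (7)
        x_scaled = [xi / sigma for xi in x]
        for _ in range(n_out):
            row = [rng.uniform(-w_max, w_max) for _ in range(n_in)]
            y = sum(w * xi for w, xi in zip(row, x_scaled))
            clipped += abs(y) > b                         # would saturate the ADC
            total += 1
    return clipped / total


print(clip_fraction(1.0), clip_fraction(0.5))
```

Halving ω halves σ, so the scaled inputs, and therefore the signal relative to additive analog noise, become twice as large.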
[0055] To avoid this situation, in step 220 it may be desirable to set an upper limit (cap) on the proposed scaling factor σ by using the smaller of the value calculated from Equation 7 above and an alternative value for σ, calculated as the absolute maximum input value $x_{mx}$ multiplied by a variable ρ and divided by the quantization bin width $r_{\mathrm{DAC}}$ of the DAC, i.e.,
[0056] $\sigma \leftarrow \min\left(\frac{\omega s}{b},\ \frac{\rho\, x_{mx}}{r_{\mathrm{DAC}}}\right)$ (8)
[0057] With an input range of (−1, ..., 1), the total range is 2. The DAC divides this total range into n steps, where n is the number of quantization steps (e.g., n = 256 for an 8-bit DAC), which gives the quantization bin width (or simply "bin width"). Thus, in this example, the quantization bin width is $r_{\mathrm{DAC}} = 2/n$. According to an exemplary embodiment, the variable ρ = 0.25. The variable ρ is essentially the inverse of the minimum effective resolution, in quantization steps. Thus, for a value of ρ = 0.25, the σ-fold scaling can reduce the inputs to as few as 4 distinct values in the input range (instead of 256), but not fewer.
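The cap of Equation (8) can be sketched in Python as follows; the function name and the example vector are illustrative assumptions. With 100 equally large inputs, Equation (7) alone would give σ = 60, but the cap limits σ to 32, so the largest scaled input still spans 1/ρ = 4 quantization steps.

```python
def capped_sigma(x, omega, b, rho=0.25, n_steps=256):
    """Steps 218-220, Eqs. (7)-(8): sigma = min(omega*s/b, rho*x_mx/r_DAC)."""
    r_dac = 2.0 / n_steps                  # bin width for the input range (-1, ..., 1)
    x_mx = max(abs(xi) for xi in x)
    s = sum(abs(xi) for xi in x)
    return min(omega * s / b, rho * x_mx / r_dac)


x = [1.0] * 100                            # many equally large inputs: omega*s/b = 60
sigma = capped_sigma(x, omega=0.6, b=1.0)  # capped at 0.25 * 1.0 / (2/256) = 32
bins_used = (max(abs(xi) for xi in x) / sigma) / (2.0 / 256)
print(sigma, bins_used)                    # 32.0 4.0
```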
[0058] In step 222, the noise and bound management scaling factor α is set to $x_{mx}$ (from Equation 3 above) or the proposed scaling factor σ (from Equation 8 above), whichever is larger, i.e.,
[0059] $\alpha = \max(x_{mx}, \sigma)$ (9)
[0060] This prevents the maximum value of the scaled input vector (see step 224 described below) from exceeding 1, which is assumed to be the maximum of the DAC input range, arbitrarily set to (−1, ..., 1), because the input values should not be clipped.
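Steps 204 through 222 combine into a single pre-computation of α. A minimal Python sketch for the one-pass case (the function name and default parameters are illustrative assumptions):

```python
def nbm_scaling_factor(x, omega, b, rho=0.25, n_steps=256):
    """Steps 204-222: alpha = max(x_mx, sigma), Eqs. (3), (7), (8) and (9)."""
    r_dac = 2.0 / n_steps
    x_mx = max(abs(xi) for xi in x)                   # Eq. (3)
    s = sum(abs(xi) for xi in x)                      # one-pass total input
    sigma = min(omega * s / b, rho * x_mx / r_dac)    # Eqs. (7)-(8)
    return max(x_mx, sigma)                           # Eq. (9)


for x in ([0.5, -2.0, 1.0, -0.25], [0.9]):
    alpha = nbm_scaling_factor(x, omega=0.6, b=1.0)
    # Because alpha >= x_mx, no scaled input leaves the DAC range (-1, ..., 1).
    print(alpha, max(abs(xi) / alpha for xi in x))
```

In the first case σ dominates (α = 2.25); in the second, a single small input makes x_mx dominate (α = 0.9).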
[0061] As described above, the process above is used to pre-calculate the noise and bound management scaling factor α for each multiply-accumulate cycle performed on the analog crossbar array 104 of RPU devices 110. Accordingly, in step 224, each digital input vector value x is scaled by the noise and bound management scaling factor α (calculated in step 222), i.e.,
[0062] $x' \leftarrow x/\alpha$ (10)
[0063] to provide a scaled digital input vector value x', which is converted into an analog signal via a digital-to-analog converter (DAC). In step 226, analog computation is then performed on the analog crossbar array 104 of RPU devices 110. As described above, the analog computation involves performing a vector-matrix multiplication on the analog crossbar array 104 of RPU devices 110 by multiplying each scaled input vector value x' by the corresponding weight value in the weight matrix 102.
[0064] Similarly, in step 228, each analog output vector value obtained from the analog crossbar array 104 of RPU devices 110 is converted into a digital signal via an analog-to-digital converter (ADC) to provide a digital output vector value y'. In step 230, each digital output vector value y' is rescaled by the noise and bound management scaling factor α (calculated in step 222), i.e.,
[0065] $y \leftarrow \alpha\, y'$ (11)
[0066] This provides a rescaled digital output vector value y.
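Putting steps 202 through 230 together, one analog pass with worst-case noise and bound management can be sketched in Python as below. The DAC and ADC models (clipping and rounding to bin width 2/n; pure saturation), the function names, and the all-zero guard are illustrative assumptions, and analog noise is not modeled. Because α is fixed before the analog computation, exactly one pass is made regardless of the data.

```python
def dac(v, n_steps=256):
    """Hypothetical DAC model: clip to [-1, 1], then round to bin width 2/n_steps."""
    r = 2.0 / n_steps
    return [round(max(-1.0, min(1.0, vi)) / r) * r for vi in v]


def adc(v, b):
    """Hypothetical ADC model: saturate each value to [-b, b]."""
    return [max(-b, min(b, vi)) for vi in v]


def worst_case_mvm(W, x, omega, b, rho=0.25, n_steps=256):
    """One analog pass with worst-case noise and bound management (method 200)."""
    x_mx = max(abs(xi) for xi in x)
    s = sum(abs(xi) for xi in x)
    sigma = min(omega * s / b, rho * x_mx / (2.0 / n_steps))    # Eqs. (7)-(8)
    alpha = max(x_mx, sigma) or 1.0       # Eq. (9); "or 1.0" guards an all-zero x (added)
    x_analog = dac([xi / alpha for xi in x], n_steps)            # step 224, Eq. (10)
    y_analog = [sum(w * xj for w, xj in zip(row, x_analog)) for row in W]  # step 226
    y_digital = adc(y_analog, b)                                 # step 228
    return [alpha * yi for yi in y_digital]                      # step 230, Eq. (11)


W = [[0.6] * 4, [-0.6] * 4]               # weights within the assumed range |w| <= omega
y = worst_case_mvm(W, [1.0, 1.0, 1.0, 1.0], omega=0.6, b=1.0)
print(y)  # approximately [2.385, -2.385]; the ideal result is [2.4, -2.4]
```

The small deviation from the ideal result comes from DAC rounding, not from clipping. For a backward loop, the same function could be applied to the transposed weights and the error vector.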
[0067] Now, through reference Figure 3 and Figure 4 To describe exemplary implementations of this technology. That is, Figure 3 This is a schematic diagram illustrating a forward loop operation being performed on the analog crossbar switch array 304 of the RPU device 310. (See diagram for example.) Figure 3 As shown, the digital input vector value x (see “Digital Input x”) is provided as input to the analog crossbar switch array 304 of the RPU device 310. However, firstly, as in conjunction with the above... Figure 2 As described in Method 200, a noise and boundary management scaling factor is calculated (see “Noise / Boundary Management Calculation α”). Each digital input vector value x is then scaled by the noise and boundary management scaling factor (α), i.e., x←x / α, to provide a scaled digital input vector value x' (see “Scaled Digital RPU Input x'”).
[0068] The scaled digital input vector value x' is converted into an analog signal via a digital-to-analog converter (see "DA converter"). The analog signal, as an analog pulse width 320, is provided to the analog crossbar array 304 of the RPU device 310, where analog computations are performed. As described above, this analog computation involves performing vector-matrix multiplication on the analog crossbar array 304 of the RPU device 310 by multiplying each scaled input vector value x' by the corresponding weight value in the corresponding weight matrix (not shown). (Combined with the above...) Figure 1 The description is used to describe the mapping of the weight matrix to the analog cross switch array of the RPU device.
[0069] like Figure 3 As shown, the analog output vector value obtained from the operation performed on the analog crossbar array 304 of the RPU device 310 is provided to an integrated circuit 322 including an operational amplifier 324, which has an inverting input terminal connected (bridged) to the operational amplifier 324 and an output terminal (V) of the operational amplifier 324. out ) capacitor (C int The non-inverting input of operational amplifier 324 is grounded. The output of operational amplifier 324 (V...) out It is also connected to the input of the analog-to-digital converter (see “AD converter”).
[0070] An analog-to-digital converter (AD converter) converts each analog output vector value obtained from the analog crossbar array 304 of the RPU device 310 into a digital signal to provide a digital output vector value y' (see “Digital RPU Output y'”). Each digital output vector value y' is then rescaled by a noise and boundary management scaling factor (α) (see “Noise / Boundary Management Using α”), i.e., y←yα, to provide a rescaled digital output vector value y (see “Rescaled Digital Output y”). As described above, processes such as calculating the noise and boundary management scaling factor α, scaling / rescaled the input / output values from the crossbar array of the RPU device by a factor of α, can be performed externally to the RPU array hardware, for example, by means of, as described below... Figure 6 The device described herein is a device such as 600 that performs the operation.
[0071] Figure 4 This is a schematic diagram illustrating the backward loop operation performed on the analog crossbar switch array 404 of the RPU device 410. The process is generally combined with the above. Figure 3The description is the same as the forward loop operation, except that the transposed analog RPU array 404 is used for the backward loop iterations. "Transpose" means exchanging the input and output; that is, the previous output is now the input, and the previous input is the output. This is to (essentially) compute x = W′y, where W′ is the transpose of matrix W. Figure 4 As shown, the digital input vector value x (see “Digital Input x”) is provided as input to the analog crossbar switch array 404 of the RPU device 410. However, firstly, as in conjunction with the above... Figure 2 As described in the description of Method 200, a noise and boundary management scaling factor is calculated (see “Noise / Boundary Management Calculation α”). That is, as stated above, for each forward and backward loop, the steps of Method 200 are performed as a digital pre-calculation to determine a scaling factor α for operation on the analog crossbar switch array for the RPU device. Each digital input vector value x is then scaled by the noise and boundary management scaling factor (α), i.e., x←x / α, to provide a scaled digital input vector value x' (see “Scaled Digital RPU Input x'”).
[0072] The scaled digital input vector value x' is then converted into an analog signal via a digital-to-analog converter (see "DA converter"). The analog signal is provided as an analog pulse width 420 to the analog crossbar array 404 of the RPU device 410, where analog computation is performed. As described above, this analog computation involves performing a vector-matrix multiplication on the analog crossbar array 404 of the RPU device 410 by multiplying each scaled input vector value x' by the corresponding weight value in the corresponding weight matrix (not shown). The mapping of the weight matrix onto the analog crossbar array of the RPU device is described in conjunction with Figure 1 above.
[0073] As shown in Figure 4, the analog output vector values obtained from the operation performed on the analog crossbar array 404 of the RPU device 410 are provided to an integrated circuit 422 including an operational amplifier 424 with a capacitor ($C_{\mathrm{int,BACKWARD}}$) connected (bridged) between the inverting input of the operational amplifier 424 and the output ($V_{\mathrm{out}}$) of the operational amplifier 424. The non-inverting input of the operational amplifier 424 is grounded. The output ($V_{\mathrm{out}}$) of the operational amplifier 424 is also connected to the input of the analog-to-digital converter (see “AD converter”).
[0074] An analog-to-digital converter (AD converter) converts each analog output vector value obtained from the analog crossbar array 404 of the RPU device 410 into a digital signal to provide a digital output vector value y' (see “Digital RPU Output y'”). Each digital output vector value y' is then rescaled by the noise and bound management scaling factor α (see “Noise / Boundary Management Using α”), i.e., y ← y'α, to provide a rescaled digital output vector value y (see “Rescaled Digital Output y”). As described above, steps such as calculating the noise and bound management scaling factor α and scaling/rescaling the input/output values of the crossbar array of the RPU device by the factor α can be performed outside of the RPU array hardware, for example by an apparatus such as apparatus 600, described below in conjunction with Figure 6.
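The backward cycle of [0071]-[0074] reuses the same scaling and rescaling and only swaps the roles of the array's inputs and outputs, so the array effectively computes W′y instead of Wx. The sketch below shows this with an ideal (noise-free, unquantized) array model; `transpose`, `mvm`, and `backward_cycle` are hypothetical helper names, and α is passed in rather than computed by method 200.

```python
def transpose(W):
    """Swap rows and columns: the backward cycle drives the array from the other side."""
    return [list(col) for col in zip(*W)]

def mvm(W, v):
    """Ideal (noise-free, unquantized) crossbar model: returns W v."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]

def backward_cycle(W, x, alpha):
    """Scale by 1/alpha, apply the transposed array, then rescale by alpha."""
    x_scaled = [xi / alpha for xi in x]                 # x' <- x / alpha
    return [yi * alpha for yi in mvm(transpose(W), x_scaled)]

W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]      # 2 outputs x 3 inputs in the forward direction
x = [1.0, -1.0]            # backward input: one entry per forward output
print(backward_cycle(W, x, alpha=max(abs(v) for v in x)))   # [-3.0, -3.0, -3.0]
```

Because the scaling by 1/α and the rescaling by α cancel for an ideal array, the result equals W′x for any positive α; in hardware the choice of α matters only through noise and the output bound.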
[0075] As described above, this technique minimizes runtime by scaling the input vector values x of the analog crossbar array of the RPU device based on a worst-case scenario. In the above embodiments, using the worst case as the reference for determining the scaling factor ensures that the output bound is never exceeded, so the outputs are never clipped.
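Why the worst case guarantees this can be made explicit with a short derivation. It is a sketch under stated assumptions: every weight satisfies $|w_{ji}| \le \omega$ (the assumed maximum weight), $b$ denotes the output bound of the array, and $s = \sum_i |x_i|$ is the total input variable (the form used in claim 4). The labels $b$ and $s$ are chosen here for illustration.

```latex
|y'_j| = \Bigl|\sum_i w_{ji}\,\frac{x_i}{\alpha}\Bigr|
\le \frac{1}{\alpha}\sum_i |w_{ji}|\,|x_i|
\le \frac{\omega}{\alpha}\sum_i |x_i| = \frac{\omega s}{\alpha},
\qquad\text{hence}\qquad
\alpha \ge \sigma := \frac{\omega s}{b} \;\Longrightarrow\; |y'_j| \le b .
```

Choosing $\alpha = \max(x_{mx}, \sigma)$ satisfies this condition while also keeping every scaled input within $[-1, 1]$. If the actual weights exceed the assumed $\omega$, the inequality no longer holds, which is why clipping remains possible when $\omega$ is reduced.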
[0076] According to an alternative embodiment, method 200 described above (i.e., calculating the worst-case noise and bound management scaling factor α, scaling/rescaling the input/output values of the crossbar array of the RPU device by the factor α, etc.) is performed only when the output bound is exceeded. See, for example, exemplary method 500 of Figure 5.
[0077] In step 502, an input vector of digital values x (the "input vector values x") is obtained. According to an exemplary embodiment, the input vector values x include data from a training dataset. By way of example only, the training dataset can be obtained from a database or other storage of DNN training data.
[0078] In step 504, the absolute maximum value $x_{mx}$ of the input vector is calculated according to Equation 3 above ($x_{mx}$ is also referred to herein as the absolute maximum input value). In step 506, the absolute maximum input value $x_{mx}$ is assigned to the noise and bound management scaling factor α, i.e., $\alpha = x_{mx}$.
[0079] In step 508, each digital input vector value x is scaled by the noise and bound management scaling factor $\alpha = x_{mx}$, i.e.,
[0080] $$x'_{\text{initial}} \leftarrow x / \alpha \tag{12}$$
[0081] This provides a scaled digital input vector value $x'_{\text{initial}}$, which is converted into an analog signal via a digital-to-analog converter (DAC). In step 510, an analog computation is then performed on the analog crossbar array 104 of the RPU device 110. As described above, the analog computation involves performing a vector-matrix multiplication on the analog crossbar array 104 of the RPU device 110 by multiplying each scaled input vector value $x'_{\text{initial}}$ by the corresponding weight value in the weight matrix 102. Similarly, in step 512, each analog output vector value obtained from the analog crossbar array 104 of the RPU device 110 is converted into a digital signal via an analog-to-digital converter (ADC) to provide a digital output vector value $y'_{\text{initial}}$.
[0082] In step 514, it is determined whether any of the digital output vector values $y'_{\text{initial}}$ has been clipped (bounded). For example, clipping of the digital output vector values $y'_{\text{initial}}$ can be detected by sensing saturation at the output of the operational amplifier. There are several circuit approaches for detecting clipping; however, a straightforward approach is simply to use the maximum and minimum output values of the ADC, since the output saturates to these values when the ADC input exceeds its range. Thus, for example, if any digital output is 255 (the highest output value of an 8-bit ADC) or 0 (the lowest output value of an 8-bit ADC), the output is determined to be clipped and the computation is repeated.
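The saturation test of step 514 can be written directly against the ADC codes. A minimal sketch, assuming an unsigned ADC whose codes run from 0 to 2**adc_bits - 1 (0 to 255 for 8 bits, as in the example above); the function name `is_clipped` and the `adc_bits` parameter are hypothetical.

```python
def is_clipped(adc_codes, adc_bits=8):
    """Step 514 check: True if any ADC code sits at either end of its range.

    Assumes an unsigned ADC with codes 0 .. 2**adc_bits - 1; a saturated
    converter reports one of these two extremes.
    """
    lo, hi = 0, 2 ** adc_bits - 1
    return any(code <= lo or code >= hi for code in adc_codes)

print(is_clipped([12, 200, 255]))   # True: 255 is the top code of an 8-bit ADC
print(is_clipped([1, 128, 254]))    # False: all codes lie strictly inside the range
```

A code that legitimately equals an extreme value is also flagged. This is conservative: a false positive only triggers the second, worst-case pass rather than corrupting the result.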
[0083] If the determination in step 514 is "No", i.e., none of the digital output vector values $y'_{\text{initial}}$ has been clipped, then in step 516 each digital output vector value $y'_{\text{initial}}$ is rescaled by the noise and bound management scaling factor $\alpha = x_{mx}$, i.e.,
[0084] $$y \leftarrow y'_{\text{initial}}\,\alpha \tag{13}$$
[0085] This provides a rescaled digital output vector value y, and the process ends. If, on the other hand, the determination in step 514 is "Yes", i.e., at least one of the digital output vector values $y'_{\text{initial}}$ has been clipped, then in step 518 the worst-case noise and bound management scaling factor α is calculated as $\alpha = \max(x_{mx}, \sigma)$ (see Equation 8 above), as described in conjunction with method 200 of Figure 2 above.
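The worst-case factor of step 518 can be sketched as follows. The form σ = ω·s/b is an assumption here, derived from claim 1's definition of the worst case (the assumed maximum weight ω times the sum of the absolute inputs) and an output bound b. The function `worst_case_alpha` and its parameters `w_max`, `out_bound`, and `split_signs` are hypothetical. With `split_signs=True`, the total input s is the larger of the positive-only and negative-only absolute sums, following the alternative in claim 5.

```python
def worst_case_alpha(x, w_max, out_bound, split_signs=False):
    """Return alpha = max(x_mx, sigma), assuming sigma = w_max * s / out_bound."""
    x_mx = max((abs(v) for v in x), default=0.0)       # absolute maximum input
    if split_signs:
        # Claim 5 variant: larger of the positive-only and negative-only sums.
        s = max(sum(v for v in x if v > 0), sum(-v for v in x if v < 0))
    else:
        # Claim 4 variant: sum of all absolute input values.
        s = sum(abs(v) for v in x)
    sigma = w_max * s / out_bound                      # worst-case output / bound
    alpha = max(x_mx, sigma)
    # Hypothetical guard: an all-zero input would otherwise give alpha = 0.
    return alpha if alpha > 0 else 1.0

x = [0.5, -1.0, 0.25]
print(worst_case_alpha(x, w_max=0.6, out_bound=1.0))                    # sigma (about 1.05) wins
print(worst_case_alpha(x, w_max=0.6, out_bound=1.0, split_signs=True))  # x_mx = 1.0 wins
```

For inputs with both signs, `split_signs=True` yields a smaller σ than the full absolute sum, so the inputs are scaled down less aggressively. The all-zero guard is an illustrative choice that avoids a division by zero in the subsequent x/α step.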
[0086] In step 520, each digital input vector value x is then scaled by the worst-case noise and bound management scaling factor $\alpha = \max(x_{mx}, \sigma)$, i.e.,
[0087] $$x' \leftarrow x / \alpha \tag{14}$$
[0088] This provides a scaled digital input vector value x′ that is converted into an analog signal via a DAC. In step 522, analog computation is then performed on the analog crossbar array 104 of the RPU device 110. As described above, the analog computation involves performing vector-matrix multiplication on the analog crossbar array 104 of the RPU device 110 by multiplying each scaled input vector value x′ with the corresponding weight value in the weight matrix 102.
[0089] Similarly, in step 524, each analog output vector value obtained from the analog crossbar array 104 of the RPU device 110 is converted into a digital signal via an ADC to provide a digital output vector value y′. In step 526, each digital output vector value y′ is rescaled by the worst-case noise and bound management scaling factor $\alpha = \max(x_{mx}, \sigma)$, i.e.,
[0090] $$y \leftarrow y'\,\alpha \tag{15}$$
[0091] This provides a rescaled digital output vector value y. In this second iteration, the output is not tested for clipping because, with worst-case scaling, the output bound is typically not reached. Even if the assumed maximum weight ω has been changed (see above) so that some clipping does occur, that clipping is ignored, and the output is not tested again in the second iteration. Therefore, method 500 requires at most two iterations: one to determine initially that the output has been clipped, and a second one using the worst-case noise and bound management scaling factor α. This minimizes the impact on runtime, i.e., the runtime at most doubles.
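The two-pass flow of method 500 (steps 502-526) can be tied together in one simulation. This is a sketch under stated assumptions: an ideal crossbar, a uniform saturating ADC over [-out_bound, out_bound], and σ = ω·s/b with s = Σ|x_i|. All function and parameter names (`method_500`, `analog_pass`, `w_max`, and so on) are hypothetical. The sketch returns the rescaled outputs together with the number of analog passes used.

```python
def mvm(W, x):
    """Ideal crossbar model: exact matrix-vector product W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def adc(v, out_bound, levels):
    """Uniform ADC over [-out_bound, out_bound]; returns (code, digital value)."""
    v_sat = max(-out_bound, min(out_bound, v))          # saturates like a real ADC
    code = round((v_sat + out_bound) / (2 * out_bound) * levels)
    return code, code * 2 * out_bound / levels - out_bound

def analog_pass(W, x, alpha, out_bound, levels):
    """One trip through DAC -> crossbar -> ADC with inputs scaled by 1/alpha."""
    results = [adc(v, out_bound, levels) for v in mvm(W, [xi / alpha for xi in x])]
    return [c for c, _ in results], [val for _, val in results]

def method_500(W, x, w_max, out_bound=1.0, adc_bits=8):
    levels = 2 ** adc_bits - 1
    x_mx = max((abs(v) for v in x), default=0.0)
    if x_mx == 0.0:
        return [0.0] * len(W), 0        # hypothetical guard: nothing to compute
    # Steps 504-512: first pass with alpha = x_mx.
    alpha = x_mx
    codes, y_prime = analog_pass(W, x, alpha, out_bound, levels)
    passes = 1
    # Step 514: clipped if any ADC code sits at either end of its range.
    if any(c in (0, levels) for c in codes):
        # Steps 518-524: worst-case alpha = max(x_mx, sigma), assumed sigma form.
        sigma = w_max * sum(abs(v) for v in x) / out_bound
        alpha = max(x_mx, sigma)
        _, y_prime = analog_pass(W, x, alpha, out_bound, levels)
        passes = 2                      # not re-tested, so never a third pass
    # Step 516 or 526: rescale the digital outputs by alpha.
    return [v * alpha for v in y_prime], passes

print(method_500([[0.1, -0.2], [0.3, 0.05]], [1.0, -2.0], w_max=0.3))  # one pass
print(method_500([[0.5] * 4], [1.0] * 4, w_max=0.5))                   # two passes
print(method_500([[1.0, 1.0]], [1.0, 1.0], w_max=0.25))                # omega too small
```

In the last call the assumed ω is smaller than the real weights, so the worst-case pass still clips; as described in [0091], that clipping is not re-tested, and the method stops after two passes.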
[0092] This invention can be a system, method, and / or computer program product. A computer program product may include a computer-readable storage medium (or media) having computer-readable program instructions thereon for causing a processor to perform aspects of the invention.
[0093] A computer-readable storage medium can be a tangible device capable of retaining and storing instructions for use by an instruction execution device. A computer-readable storage medium can be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of computer-readable storage media includes: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, mechanically encoded devices (such as punch cards or raised structures in a groove having instructions recorded thereon), and any suitable combination of the foregoing. As used herein, a computer-readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
[0094] The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to a suitable computing / processing device, or downloaded via a network (e.g., the Internet, a local area network, a wide area network, and / or a wireless network) to an external computer or external storage device. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and / or edge servers. A network adapter card or network interface in each computing / processing device receives the computer-readable program instructions from the network and forwards them to a computer-readable storage medium within the suitable computing / processing device.
[0095] Computer-readable program instructions used to perform the operations of this invention may be assembly instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and traditional procedural programming languages such as "C" or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partially on the user's computer as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer via any type of network, including local area networks (LANs) or wide area networks (WANs), or may be connected to an external computer (e.g., via the Internet provided by an Internet service provider). In some embodiments, electronic circuits (including, for example, programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs)) can execute computer-readable program instructions by personalizing the electronic circuits with state information of computer-readable program instructions in order to perform aspects of the present invention.
[0096] This document describes aspects of the invention with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer-readable program instructions.
[0097] These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of a flowchart and/or block diagram. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other device to function in a particular manner, such that the computer-readable storage medium having the instructions stored therein comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of a flowchart and/or block diagram.
[0098] Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, such that the instructions that execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in one or more blocks of a flowchart and/or block diagram.
[0099] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved. It will also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
[0100] As described above, according to an exemplary embodiment, one or more steps of method 200, including calculating the noise and bound management scaling factor α and scaling/rescaling the input/output values of the crossbar array of the RPU device, can be performed outside of the RPU array hardware, for example by an apparatus such as apparatus 600 shown in Figure 6. Figure 6 is a block diagram of an apparatus 600 for implementing one or more of the methodologies presented herein. By way of example only, apparatus 600 can be configured to implement one or more steps of method 200 of Figure 2.
[0101] The device 600 includes a computer system 610 and a removable medium 650. The computer system 610 includes a processor device 620, a network interface 625, a memory 630, a media interface 635, and an optional display 640. The network interface 625 allows the computer system 610 to connect to a network, while the media interface 635 allows the computer system 610 to interact with media such as a hard disk drive or the removable medium 650.
[0102] Processor device 620 can be configured to implement the methods, steps, and functions disclosed herein. Memory 630 can be distributed or local, and processor device 620 can be distributed or singular. Memory 630 can be implemented as electrical, magnetic, or optical memory, or any combination of these or other types of storage devices. Furthermore, the term "memory" should be construed broadly enough to encompass any information that can be read from or written to an address in the addressable space accessed by processor device 620. With this definition, information on a network, accessible through network interface 625, is still within memory 630 because processor device 620 can retrieve the information from the network. It should be noted that each distributed processor that makes up processor device 620 generally contains its own addressable memory space. It should also be noted that some or all of computer system 610 can be incorporated into an application-specific or general-use integrated circuit.
[0103] The optional display 640 is any type of display suitable for human user interaction with the device 600. Typically, the display 640 is a computer monitor or other similar display.
[0104] Referring to Figure 7 and Figure 8, it should be understood that although this disclosure includes a detailed description of cloud computing, implementation of the teachings recited herein is not limited to a cloud computing environment. Rather, embodiments of the invention can be implemented in conjunction with any other type of computing environment now known or later developed.
[0105] Cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service. This cloud model may include at least five features, at least three service models, and at least four deployment models.
[0106] The features are as follows:
[0107] On-demand self-service: A cloud consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed, automatically and without requiring human interaction with the service's provider.
[0108] Broad network access: Capabilities are available over a network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs).
[0109] Resource pooling: Using a multi-tenant model, a provider's computing resources are pooled to serve multiple consumers, where different physical and virtual resources are dynamically allocated and reallocated based on demand. There is a sense of location independence because consumers typically cannot control or know the exact location of the resources provided, but can specify the location at a higher level of abstraction (e.g., country, state, or data center).
[0110] Rapid elasticity: Capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out, and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
[0111] Measured service: Cloud systems automatically control and optimize resource usage by leveraging a metering capability at a level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and the consumer of the utilized service.
[0112] The service model is as follows:
[0113] Software as a Service (SaaS): The capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based email). The consumer does not manage or control the underlying cloud infrastructure, including networks, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
[0114] Platform as a Service (PaaS): This provides consumers with the ability to deploy consumer-created or acquired applications, built using provider-supported programming languages and tools, onto cloud infrastructure. Consumers do not manage or control the underlying cloud infrastructure, including networks, servers, operating systems, or storage, but they can control the deployed applications and, if necessary, the configuration of the application hosting environment.
[0115] Infrastructure as a Service (IaaS) provides consumers with the ability to provision processing, storage, networking, and other basic computing resources on which consumers can deploy and run arbitrary software, including operating systems and applications. Consumers do not manage or control the underlying cloud infrastructure, but they can control the operating system, storage, deployed applications, and may have limited control over selected networking components (e.g., host firewalls).
[0116] The deployment model is as follows:
[0117] Private cloud: The cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on-premises or off-premises.
[0118] Community cloud: The cloud infrastructure is shared by several organizations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be managed by the organizations or a third party and may exist on-premises or off-premises.
[0119] Public cloud: This cloud infrastructure is available to the general public or large industry groups and is owned by organizations that sell cloud services.
[0120] Hybrid cloud: The cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
[0121] Cloud computing environments are service-oriented, focusing on statelessness, loose coupling, modularity, and semantic interoperability. At the heart of cloud computing is the infrastructure comprising a network of interconnected nodes.
[0122] Referring now to Figure 7, an illustrative cloud computing environment 50 is depicted. As shown, cloud computing environment 50 includes one or more cloud computing nodes 10 with which local computing devices used by cloud consumers, such as a personal digital assistant (PDA) or cellular telephone 54A, desktop computer 54B, laptop computer 54C, and/or automobile computer system 54N, can communicate. Nodes 10 can communicate with one another. They can be grouped (not shown) physically or virtually in one or more networks, such as private, community, public, or hybrid clouds as described above, or a combination thereof. This allows cloud computing environment 50 to offer infrastructure, platforms, and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device. It should be understood that the types of computing devices 54A-54N shown in Figure 7 are intended to be illustrative only and that computing nodes 10 and cloud computing environment 50 can communicate with any type of computerized device over any type of network and/or network-addressable connection (e.g., using a web browser).
[0123] Referring now to Figure 8, a set of functional abstraction layers provided by cloud computing environment 50 (Figure 7) is shown. It should be understood in advance that the components, layers, and functions shown in Figure 8 are intended to be illustrative only, and embodiments of the invention are not limited thereto. As depicted, the following layers and corresponding functions are provided:
[0124] The hardware and software layer 60 includes hardware components and software components. Examples of hardware components include: a mainframe 61; a RISC (Reduced Instruction Set Computer) based server 62; a server 63; a blade server 64; a storage device 65; and network and networking components 66. In some embodiments, software components include network application server software 67 and database software 68.
[0125] The virtualization layer 70 provides an abstraction layer from which the following examples of virtual entities can be provided: virtual servers 71; virtual storage 72; virtual networks 73, including virtual private networks; virtual applications and operating systems 74; and virtual clients 75.
[0126] In one example, management layer 80 can provide the functions described below. Resource provisioning 81 provides dynamic procurement of computing resources and other resources that are used to perform tasks within the cloud computing environment. Metering and pricing 82 provides cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for the consumption of these resources. In one example, these resources may include application software licenses. Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources. User portal 83 provides access to the cloud computing environment for consumers and system administrators. Service level management 84 provides cloud computing resource allocation and management such that required service levels are met. Service Level Agreement (SLA) planning and fulfillment 85 provides pre-arrangement for, and procurement of, cloud computing resources for which a future requirement is anticipated in accordance with an SLA.
[0127] Workload layer 90 provides examples of functionalities that can leverage a cloud computing environment. Examples of workloads and functionalities that can be provided from this workload layer include: mapping and navigation 91; software development and lifecycle management 92; virtual classroom education delivery 93; data analytics and processing 94; transaction processing 95; and scaling factor calculation and input / output scaling / re-scaling 96.
[0128] Although illustrative embodiments of the invention have been described herein, it should be understood that the invention is not limited to those precise embodiments, and that various other changes and modifications can be made by those skilled in the art without departing from the scope of the invention.
Claims
1. A method for noise and bound management, the method comprising: obtaining input vector values x for an analog crossbar array of a resistive processing unit (RPU) device, wherein a weight matrix is mapped to the analog crossbar array of the RPU device; and scaling the input vector values x based on a worst case to provide scaled input vector values x' for use as input to the analog crossbar array of the RPU device, wherein the worst case represents a maximum possible output of the weight matrix, given as the sum of an assumed maximum weight of the weight matrix multiplied by the absolute values of the input vector values x.
2. The method of claim 1, further comprising: calculating an absolute maximum input value $x_{mx}$ from the input vector values x; calculating a suggested scaling factor σ as the assumed maximum weight ω of the weight matrix multiplied by a total input variable and divided by an output bound of the analog crossbar array of the RPU device; setting a noise and bound management scaling factor α to the larger of the absolute maximum input value $x_{mx}$ and the suggested scaling factor σ; and scaling the input vector values x using the noise and bound management scaling factor α.
3. The method of claim 2, wherein the absolute maximum input value is calculated as $x_{mx} = \max_i |x_i|$.
4. The method of claim 2, further comprising: calculating the sum of the absolute values of all of the input vector values x; and assigning the sum of the absolute values of all of the input vector values x to the total input variable.
5. The method of claim 2, further comprising: calculating the sum of the absolute values of only the positive input vector values; calculating the sum of the absolute values of only the negative input vector values; and assigning the larger of these two sums to the total input variable.
6. The method of claim 2, further comprising: capping the suggested scaling factor σ at the smaller of a value calculated as […] and an alternative value calculated as […], where […] is a variable and […] is the range width of the digital-to-analog quantization.
7. The method of claim 2, further comprising: reducing the assumed maximum weight ω of the weight matrix; and recalculating the suggested scaling factor σ.
8. The method of claim 2, further comprising: converting the scaled input vector values x' into analog signals; and performing a vector-matrix multiplication on the analog crossbar array of the RPU device.
9. The method of claim 8, wherein performing the vector-matrix multiplication comprises: multiplying each scaled input vector value x' by the corresponding weight value in the weight matrix.
10. The method of claim 8, further comprising: converting analog output vector values obtained from the analog crossbar array of the RPU device into digital signals to provide digital output vector values y'; and rescaling the digital output vector values y' using the noise and bound management scaling factor α to provide rescaled digital output vector values y.
11. The method of claim 1, further comprising: calculating an absolute maximum input value $x_{mx}$ from the input vector values x; assigning the absolute maximum input value $x_{mx}$ to a scaling factor α; scaling the input vector values x using the scaling factor α to provide scaled input vector values $x'_{\text{initial}}$ for use as input to the analog crossbar array of the RPU device; converting the scaled input vector values $x'_{\text{initial}}$ into analog signals; performing a vector-matrix multiplication on the analog crossbar array of the RPU device; converting analog output vector values obtained from the analog crossbar array of the RPU device into digital signals to provide digital output vector values $y'_{\text{initial}}$; determining whether any of the digital output vector values $y'_{\text{initial}}$ has been clipped; and scaling the input vector values x based on the worst case when at least one of the digital output vector values $y'_{\text{initial}}$ has been clipped.
12. An apparatus for noise and bound management, comprising a processor connected to a memory, the processor being configured to: obtain input vector values x for an analog crossbar array of an RPU device, wherein a weight matrix is mapped to the analog crossbar array of the RPU device; and scale the input vector values x based on a worst case to provide scaled input vector values x' for use as input to the analog crossbar array of the RPU device, wherein the worst case represents a maximum possible output of the weight matrix, given as the sum of an assumed maximum weight of the weight matrix multiplied by the absolute values of the input vector values x.
13. The apparatus of claim 12, wherein the processor is further configured to: calculate an absolute maximum input value $x_{mx}$ from the input vector values x; calculate a suggested scaling factor σ as the assumed maximum weight ω of the weight matrix multiplied by a total input variable and divided by an output bound of the analog crossbar array of the RPU device; set a noise and bound management scaling factor α to the larger of the absolute maximum input value $x_{mx}$ and the suggested scaling factor σ; and scale the input vector values x using the noise and bound management scaling factor α.
14. The apparatus according to claim 13, wherein the processor is further configured to: calculate the sum of the absolute values of all input vector values x; and assign that sum to the total input variable $s$.
15. The apparatus according to claim 13, wherein the processor is further configured to: calculate the sum $s_p$ of the absolute values of only the positive input vector values; calculate the sum $s_n$ of the absolute values of only the negative input vector values; and assign the larger of $s_p$ and $s_n$ to the total input variable $s$.
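Claims 14 and 15 give two ways to compute the total input variable s. The Python sketch below (function names hypothetical) computes both. Because s_p + s_n equals Σ|x|, the claim-15 value max(s_p, s_n) is never larger than the claim-14 value, so it yields a smaller, less conservative suggested scaling factor; whether that smaller factor still rules out clipping depends on how positive and negative inputs are applied to the hardware, which these claims do not specify.

```python
from typing import List


def total_input_abs_sum(x: List[float]) -> float:
    """Claim-14 variant: s is the sum of |x_i| over all inputs."""
    return sum(abs(v) for v in x)


def total_input_np_max(x: List[float]) -> float:
    """Claim-15 variant: s = max(s_p, s_n), where s_p sums |x_i| over positive
    inputs and s_n sums |x_i| over negative inputs."""
    s_p = sum(v for v in x if v > 0)
    s_n = sum(-v for v in x if v < 0)
    return max(s_p, s_n)
```

For x = [0.5, -0.25, 0.75], the claim-14 value is 1.5, while s_p = 1.25 and s_n = 0.25, so the claim-15 value is 1.25.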
16. The apparatus according to claim 13, wherein the processor is further configured to: cap the suggested scaling factor $\sigma$ at the smaller of the value calculated as $\omega s / b$ and an alternative value calculated from $\rho$ and $r$, where $\rho$ is a variable and $r$ is the range width of the digital-to-analog quantization.
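Claims 13 and 16 together can be sketched as α = max(x_mx, min(ωs/b, c)), where c is the claim-16 alternative value. The claim derives c from a variable ρ and the quantization range width r, but its formula is not reproduced here, so this hypothetical `capped_nbm_scale` takes c as a caller-supplied argument and assumes s = Σ|x|.

```python
from typing import List, Tuple


def capped_nbm_scale(x: List[float], w_max: float, out_bound: float,
                     alt_cap: float) -> Tuple[List[float], float]:
    """Return (x / alpha, alpha) with alpha = max(x_mx, min(w_max * s / out_bound, alt_cap)).

    alt_cap stands in for the claim-16 alternative value (derived from rho and r);
    it must be supplied by the caller because its formula is not given here.
    """
    x_mx = max(abs(v) for v in x)                     # absolute maximum input value
    s = sum(abs(v) for v in x)                        # total input variable (assumed sum |x|)
    sigma = min(w_max * s / out_bound, alt_cap)       # cap the suggested scaling factor
    alpha = max(x_mx, sigma)                          # larger of x_mx and capped sigma
    if alpha == 0.0:                                  # all-zero input: nothing to scale
        return list(x), 1.0
    return [v / alpha for v in x], alpha
```

Capping trades the output guarantee for input magnitude: whenever c < ωs/b, α may fall below ωs/b and outputs can clip again, but the scaled inputs are not shrunk as far, which presumably keeps them from becoming small relative to the quantization width r.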
17. A computer program product for noise and bound management, the computer program product comprising program instructions executable by a computer to cause the computer to: obtain input vector values x for an analog crossbar array of RPU devices, wherein a weight matrix is mapped to the analog crossbar array of RPU devices; and scale the input vector values x based on a worst-case scenario to provide scaled input vector values $x'$ as the input to the analog crossbar array of RPU devices, wherein the worst-case scenario represents the maximum possible output of the weight matrix, namely the assumed maximum weight of the weight matrix multiplied by the sum of the absolute values of the input vector values x.
18. The computer program product according to claim 17, wherein the program instructions further cause the computer to: calculate an absolute maximum input value $x_{mx}$ from the input vector values x; calculate a suggested scaling factor $\sigma$ as $\sigma = \omega s / b$, where $\omega$ is the assumed maximum weight of the weight matrix, $s$ is a total input variable, and $b$ is the output bound of the analog crossbar array of RPU devices; set a noise and bound management scaling factor $\alpha$ to the absolute maximum input value $x_{mx}$ or the suggested scaling factor $\sigma$, whichever is larger; and scale the input vector values x using the noise and bound management scaling factor $\alpha$.
19. The computer program product according to claim 18, wherein the program instructions further cause the computer to: calculate the sum of the absolute values of all input vector values x; and assign that sum to the total input variable $s$.
20. The computer program product according to claim 18, wherein the program instructions further cause the computer to: calculate the sum $s_p$ of the absolute values of only the positive input vector values; calculate the sum $s_n$ of the absolute values of only the negative input vector values; and assign the larger of $s_p$ and $s_n$ to the total input variable $s$.