Memory device for performing convolution operations
By performing convolution operations on a memristor-based memory device that uses analog MAC operations, and by placing an ADC and a shift adder at the input of each PE accumulator, the loss of spatial information that artificial neural networks suffer when processing 3D image data is addressed, and the cost and size of the memory device are reduced.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SK HYNIX INC
- Filing Date
- 2022-03-01
- Publication Date
- 2026-06-26
AI Technical Summary
Existing artificial neural networks suffer from low efficiency in feature extraction and learning when processing 3D image data due to the loss of spatial information in the images. Furthermore, existing memory devices are costly and bulky when performing convolution operations.
A memristor-based memory device performs analog MAC operations using multiple processing elements (PEs) and digital arithmetic circuits. The weighted feature map is used to perform a partial convolution operation on a portion of the input feature map, and an ADC and a shift adder are placed at the input of the PE accumulator to reduce the number of circuit elements.
It improves learning efficiency while preserving image spatial information, and reduces the manufacturing cost and size of memory devices.
Figure CN115705488B_ABST
Description
[0001] Cross-references to related applications
[0002] This application claims priority to Korean Patent Application No. 10-2021-0102202, filed on August 3, 2021, which is incorporated herein by reference in its entirety.
Technical Field
[0003] This disclosure relates to an electronic device, and more particularly, to a memory device for performing convolution operations.
Background Technology
[0004] Artificial neural networks configured solely with fully connected layers are limited to one-dimensional (array) input data. A single color image, on the other hand, is three-dimensional data, and the several images used in batch processing constitute four-dimensional data. To train a fully connected (FC) neural network on image data, the three-dimensional image data must be flattened into one dimension, and spatial information is lost in this flattening process. As a result, such networks extract features and learn inefficiently, and the achievable accuracy improvement is limited. The convolutional neural network (CNN) is a model capable of learning while preserving the spatial information of images.
Summary of the Invention
[0005] Embodiments of this disclosure provide a memory device capable of performing convolution operations that can reduce manufacturing costs.
[0006] According to embodiments of this disclosure, a memory device performs a convolution operation. The memory device includes first to Nth processing elements (PEs), a first analog-to-digital converter (ADC), a first shift adder, and a first accumulator. The first to Nth PEs are each associated with at least one weighted data segment included in a weighted feature map and are configured to perform a partial convolution operation using at least one input data segment included in the input feature map. The first ADC is configured to receive a first result of the partial convolution operation from the first to Nth PEs. The first shift adder shifts and adds the output of the first ADC. The first accumulator accumulates the output from the first shift adder. Here, N can be a natural number equal to or greater than 2.
[0007] In embodiments of this disclosure, each of the first to Nth PEs may include the first to kth synaptic arrays. Here, k may be a natural number equal to or greater than 2.
[0008] In embodiments of this disclosure, the first ADC may receive the output of the first synaptic array of each of the first to Nth PEs as a first result.
[0009] In embodiments of this disclosure, the first ADC may receive the sum of the output currents of the first synaptic arrays of each of the first to Nth PEs.
[0010] In embodiments of this disclosure, the memory device may further include: a second ADC configured to receive a second result of the partial convolution operation from the first to Nth PEs; a second shift adder configured to shift and add the output of the second ADC; and a second accumulator configured to accumulate the output from the second shift adder.
[0011] In embodiments of this disclosure, the second ADC may receive the output of the second synaptic array of each of the first to Nth PEs as a second result.
[0012] In embodiments of this disclosure, the second ADC may receive the sum of the output currents of the second synaptic arrays of each of the first to Nth PEs.
[0013] In embodiments of this disclosure, each of the first to kth synaptic arrays may include a plurality of memristors.
[0014] According to another embodiment of this disclosure, a memristor-based deep learning accelerator includes a convolution computing device comprising multiple processing elements (PEs) and digital computing circuitry. The PEs are configured to apply Equation 2 to a portion of an input feature map using a weighted feature map, generating their respective currents through analog MAC operations. The digital computing circuitry is configured to convert the currents into their respective binary values and perform the operation of Equation 1.
[0015] [Equation 1]
[0016]
[0017] [Equation 2]
[0018] $C_{LK} = V_{IL(2)} \cdot V_{WLK(2)}$
[0019] Here, "$PR_{RiCj}$" is the result of the shift adder operation; the other operand symbols in Equation 1 denote a partial map (a portion of the input feature map), the weighted feature map, and the convolution operator. "$I_L$" is an element of the partial map; "$W_L$" is an element of the weighted feature map and corresponds to a PE; "$V_{IL(2)}$" is the binary value of an element of the partial map; "$V_{WLK(2)}$" is the binary value of the Kth bit within an element of the weighted feature map; "$C_{LK}$" represents the current; "N" represents the number of rows in each of the partial map and the weighted feature map; "M" represents the number of columns in each of the partial map and the weighted feature map; and "P" represents the number of bits in each element of the partial map and the weighted feature map. This technology can provide a memory device that performs convolution operations and that can be manufactured at reduced cost.
Attached Figure Description
[0020] Figure 1 is a chip-level diagram of a memory device according to an embodiment of the present disclosure.
[0021] Figure 2 is a block diagram illustrating the structure of a block included in Figure 1 according to an embodiment of the present disclosure.
[0022] Figure 3 is a diagram illustrating the structure of a processing element (PE) shown in Figure 2 according to an embodiment of the present disclosure.
[0023] Figure 4 is a diagram illustrating the structure of a synaptic array shown in Figure 3 according to an embodiment of the present disclosure.
[0024] Figure 5 is a diagram illustrating a convolution operation of an input feature map and a weighted feature map according to an embodiment of the present disclosure.
[0025] Figures 6A to 6C are diagrams illustrating the convolution operation of an input feature map IFM and a weighted feature map WFM according to an embodiment of the present disclosure.
[0026] Figures 7A to 7D are diagrams illustrating the convolution operation of a memory device according to an embodiment of the present disclosure.
[0027] Figure 8 is a block diagram illustrating the PEs and PE accumulators of a memory device according to an embodiment of the present disclosure.
[0028] Figure 9 is a block diagram illustrating the PEs and PE accumulators of a memory device according to another embodiment of the present disclosure.
[0029] Figure 10 is a diagram illustrating the convolution operation of the memory device shown in Figure 9 according to an embodiment of the present disclosure.
Detailed Implementation
[0030] The specific structural or functional descriptions illustrating embodiments of the concepts disclosed in this specification are merely for the purpose of describing embodiments of the concepts disclosed in this specification. Embodiments of the concepts disclosed in this specification may be implemented in various forms and should not be construed as limited to the embodiments described herein.
[0031] Figure 1 is a chip-level diagram of a memory device according to an embodiment of the present disclosure.
[0032] Referring to Figure 1, the memory device 100 may include multiple blocks 110 and peripheral circuitry 101. The peripheral circuitry 101 may include a pooling component 130, an accumulator 150, an activator 170, and a global buffer 190.
[0033] In a convolutional neural network, input data can be transmitted to multiple blocks 110. At this point, segments of data generated by applying a sliding window to the original form of input data can be transmitted to the respective blocks.
[0034] In one embodiment, pooling component 130 generates a new layer by adjusting the size of a convolutional layer configured with an activation map generated by a convolution operation. In another embodiment, pooling component 130 can generate a set of feature values for pixel data within a predetermined range. For example, pooling component 130 can perform max pooling, using the maximum value among the pixel data within the predetermined range as the feature value. In another example, pooling component 130 can perform average pooling, using the average value of the pixel data within the predetermined range as the feature value. Additionally, pooling component 130 can perform random pooling or cross-channel pooling. Pooling operations can reduce parameters, thereby suppressing overfitting of the corresponding network. Furthermore, performing pooling operations reduces the burden on subsequent operations, saves hardware resources, and improves the speed of convolution operations.
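The max and average pooling described above can be illustrated with a minimal Python sketch. The function name `pool2d`, the 2x2 non-overlapping window, and the sample activation map are assumptions chosen for illustration; the disclosure does not specify a window size or stride.

```python
from statistics import mean

def pool2d(data, size, reduce_fn):
    """Apply reduce_fn to each non-overlapping size x size window of a 2D list."""
    rows, cols = len(data), len(data[0])
    return [
        [
            reduce_fn([data[r + dr][c + dc]
                       for dr in range(size) for dc in range(size)])
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]

# Hypothetical 4x4 activation map (values are illustrative only).
activation_map = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]

max_pooled = pool2d(activation_map, 2, max)   # maximum of each window
avg_pooled = pool2d(activation_map, 2, mean)  # average of each window
```

Either way, the 4x4 map shrinks to 2x2, which is the reduction in downstream work that the paragraph above refers to.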
[0035] The activator 170 can activate the operation of each of the blocks 110, and the global buffer 190 can buffer input and output data.
[0036] Referring to Figure 1, the line structure 105 connecting the activator 170 to each of the blocks 110 can be implemented as an H-tree structure. Owing to the H-tree structure, the delay of the data input/output lines from the activator 170 to each of the blocks 110 can be kept uniform.
[0037] Figure 2 is a block diagram illustrating the structure of a block included in Figure 1 according to an embodiment of the present disclosure.
[0038] Referring to Figure 2, the block 110 may include multiple processing elements (PEs) 111, a block buffer 113, and an accumulation output buffer 115. The line structure connecting the block buffer 113 to each of the PEs 111 can be implemented as an H-tree structure. Therefore, as described above, the delay of the data input/output lines from the block buffer 113 to each of the PEs 111 can be kept uniform. Furthermore, data input to one PE through the block buffer 113 can be transferred from that PE to another PE under predetermined conditions. This reduces the required capacity of the block buffer 113. The accumulation output buffer 115 can accumulate the data processing results calculated by each PE and output the accumulated data.
[0039] Figure 3 is a diagram illustrating the structure of a PE shown in Figure 2 according to an embodiment of the present disclosure. Referring to Figure 3, the PE 200 may include multiple synaptic arrays 210, a PE buffer 230, and an accumulation output buffer 250. Each synaptic array 210 can receive data from the PE buffer 230 and perform synaptic processing. The result of the synaptic processing performed by each synaptic array 210 can be transmitted to the accumulation output buffer 250. The accumulation output buffer 250 can accumulate the synaptic processing results calculated by each synaptic array 210 and output the accumulated data.
[0040] Figure 4 is a diagram illustrating the structure of a synaptic array shown in Figure 3 according to an embodiment of the present disclosure. Referring to Figure 4, the synaptic array 300 may include a source line switch matrix 310, a word line switch matrix 320, a MUX (multiplexer) 330, a decoder 340, multiple cells 350, multiple ADCs 360, multiple adders 370, and multiple shift registers 380.
[0041] The cells 350 can be implemented in various ways. In an embodiment, the cells included in the synaptic array 300 can be implemented as a one-transistor-one-resistor (1T1R) structure.
[0042] Data can be transmitted to the target cell via the source line switch matrix 310. Meanwhile, the word line switch matrix 320 can activate the target cell based on an address. Furthermore, the MUX 330 can multiplex the outputs from the cells and transmit the outputs to each of the ADCs. The decoder 340 can control the operation of the MUX 330.
[0043] The memory device illustrated in Figures 1 to 4 can be implemented as an example of a deep learning accelerator architecture based on memristors (or 1T1R array cells). As described above, multiple PEs and multiple synaptic arrays exist within the memory device, and the synaptic arrays can perform analog MAC operations. The synaptic array 300, including cells with a 1T1R structure, performs analog MAC operations according to Ohm's law and Kirchhoff's current law.
[0044] As shown in Figure 4, the synaptic array included in the memory device comprises multiple ADCs. Specifically, the memory elements of the synaptic array are arranged in a crossbar configuration, and an ADC 360 that senses the output current is present in each column. The output of the ADC 360 can be transmitted to an adder 370 and a shift register 380. The adder 370 and shift register 380 can perform multiplication on the digital values sequentially output from the ADC 360 through addition and shift operations.
[0045] In Figure 4, as many ADCs 360 as there are columns are included; however, the present disclosure is not limited thereto. In another embodiment, the number of ADCs 360 may be less than the number of columns of the memory cells. In this case, the output from each column can be transmitted to the appropriate ADC via a MUX. When the number of ADCs 360 is less than the number of columns of the memory cells, the implementation area of the circuit can be reduced, but the data processing speed may be slower.
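To make the column-level operation concrete, the following Python sketch models one crossbar column under simplifying assumptions that are not stated in the disclosure: inputs are applied as ideal voltages, each cell stores a single weight bit as a conductance of 0 or 1, the ADC is ideal, and the weight bits are read out one bit position at a time. All function names and sample values are hypothetical.

```python
def column_current(voltages, conductances):
    """Kirchhoff's current law: the column current is the sum of the
    per-cell currents, each given by Ohm's law (I = G * V)."""
    return sum(g * v for v, g in zip(voltages, conductances))

def adc(current):
    """Idealized ADC (assumption): round the current to an integer code."""
    return int(round(current))

def shift_add(codes):
    """Adder plus shift register: the code read for weight bit K is
    shifted left by K positions and added to the running total."""
    total = 0
    for k, code in enumerate(codes):
        total += code << k
    return total

def crossbar_mac(inputs, weights, bits):
    """Multiply-accumulate using one weight bit plane per read."""
    codes = []
    for k in range(bits):
        bit_plane = [(w >> k) & 1 for w in weights]  # conductance 1 or 0
        codes.append(adc(column_current(inputs, bit_plane)))
    return shift_add(codes)

inputs = [2, 5, 1]   # hypothetical input values applied as voltages
weights = [3, 1, 6]  # hypothetical 3-bit weights
result = crossbar_mac(inputs, weights, bits=3)
```

The result equals the ordinary dot product of `inputs` and `weights`, which is the sense in which addition and shifting stand in for multiplication here.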
[0046] Figure 5 is a diagram illustrating a convolution operation of an input feature map and a weighted feature map according to an embodiment of the present disclosure.
[0047] Referring to Figure 5, the input feature map IFM consists of data with a horizontal and vertical dimension of 5, and the weighted feature map WFM consists of data with a horizontal and vertical dimension of 2. The input feature map IFM includes data segments I1 to I25, and the weighted feature map WFM includes data segments W1 to W4. More specifically, the first row of the input feature map IFM includes data segments I1, I3, I5, I7, and I9; the second row includes data segments I2, I4, I6, I8, and I10; the third row includes data segments I11, I13, I15, I17, and I19; the fourth row includes data segments I12, I14, I16, I18, and I20; and the fifth row includes data segments I21, I22, I23, I24, and I25. Similarly, the first row of the weighted feature map WFM includes data segments W1 and W2, and the second row includes data segments W3 and W4. However, this is just an example; input feature maps IFM and weighted feature maps WFM of various sizes can be applied.
[0048] Figures 6A to 6C are diagrams illustrating the convolution operation of an input feature map (IFM) and a weighted feature map (WFM) according to an embodiment of the present disclosure. Referring to Figure 6A, to perform the convolution operation between the input feature map IFM and the weighted feature map WFM, the weighted feature map WFM can first be placed at position (R1, C1). When the weighted feature map WFM is located at position (R1, C1), the data segments W1, W2, W3, and W4 of the weighted feature map WFM correspond to the data segments I1, I3, I2, and I4 of the input feature map IFM, respectively. In other words, when the weighted feature map WFM is located at position (R1, C1), the partial result $PR_{R1,C1}$ of the convolution operation can be expressed as Equation 1 below.
[0049] [Equation 1]
[0050] $PR_{R1,C1} = I1 \cdot W1 + I3 \cdot W2 + I2 \cdot W3 + I4 \cdot W4$
[0051] After the calculation according to Equation 1 is performed, the weighted feature map WFM can be placed at position (R1, C2), as shown in Figure 6B. When the weighted feature map WFM is located at position (R1, C2), the data segments W1, W2, W3, and W4 of the weighted feature map WFM correspond to the data segments I3, I5, I4, and I6 of the input feature map IFM, respectively. That is, when the weighted feature map WFM is located at position (R1, C2), the partial result $PR_{R1,C2}$ of the convolution operation can be expressed as Equation 2 below.
[0052] [Equation 2]
[0053] $PR_{R1,C2} = I3 \cdot W1 + I5 \cdot W2 + I4 \cdot W3 + I6 \cdot W4$
[0054] In the same manner, convolution operations can be performed for the cases where the weighted feature map WFM is located at positions (R1, C3) and (R1, C4). After the convolution operation is performed with the weighted feature map WFM at position (R1, C4), the weighted feature map WFM can no longer move to the right within the input feature map IFM. Therefore, the weighted feature map WFM can be placed at position (R2, C1) by changing the row, as shown in Figure 6C. When the weighted feature map WFM is located at position (R2, C1), the data segments W1, W2, W3, and W4 of the weighted feature map WFM correspond to the data segments I2, I4, I11, and I13 of the input feature map IFM, respectively. That is, when the weighted feature map WFM is located at position (R2, C1), the partial result $PR_{R2,C1}$ of the convolution operation can be expressed as Equation 3 below.
[0055] [Equation 3]
[0056] $PR_{R2,C1} = I2 \cdot W1 + I4 \cdot W2 + I11 \cdot W3 + I13 \cdot W4$
[0057] In this way, the above operation can be repeated until the weight feature map WFM is located at position (R4, C4). In summary, the CNV result of the convolution operation of the input feature map IFM and the weight feature map WFM can be expressed as Equation 4 below.
[0058] [Equation 4]
[0059]
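The sliding-window procedure of Figures 6A to 6C can be sketched in Python as follows. The label layout follows the row-by-row description of Figure 5; arranging the sixteen partial results as a 4x4 grid, and the placeholder values In = n and Wn = n, are assumptions made only so the sketch can run. The function names are hypothetical.

```python
# Label layout of I1..I25, row by row, as described for Figure 5.
IFM_LABELS = [
    [1, 3, 5, 7, 9],
    [2, 4, 6, 8, 10],
    [11, 13, 15, 17, 19],
    [12, 14, 16, 18, 20],
    [21, 22, 23, 24, 25],
]
WFM_LABELS = [[1, 2], [3, 4]]  # W1, W2 on the first row; W3, W4 on the second

def partial_result(ifm, wfm, row, col):
    """Partial result for the WFM placed with its top-left corner at
    (row, col), using 0-based indices."""
    return sum(
        ifm[row + dr][col + dc] * wfm[dr][dc]
        for dr in range(len(wfm)) for dc in range(len(wfm[0]))
    )

def convolve(ifm, wfm):
    """Slide the WFM over every valid position (stride 1, no padding)."""
    out_rows = len(ifm) - len(wfm) + 1
    out_cols = len(ifm[0]) - len(wfm[0]) + 1
    return [[partial_result(ifm, wfm, r, c) for c in range(out_cols)]
            for r in range(out_rows)]

# Placeholder values (assumption): each segment's value equals its label.
cnv = convolve(IFM_LABELS, WFM_LABELS)
# cnv[0][0] is the partial result at (R1, C1): I1*W1 + I3*W2 + I2*W3 + I4*W4
```

With a 5x5 input and a 2x2 weighted feature map, the window visits positions (R1, C1) through (R4, C4), giving sixteen partial results.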
[0060] Figures 7A to 7D are diagrams illustrating a convolution operation of a memory device according to an embodiment of the present disclosure. Hereinafter, the convolution operation of the memory device according to embodiments of the present disclosure is described with reference to Figures 7A to 7D together. Figures 7A to 7D show the first to fourth PEs PE#1 to PE#4 or 410 to 440 among the PEs shown in Figure 2. The first PE PE#1 or 410 performs a convolution operation related to data segment W1 in the weighted feature map WFM. The second PE PE#2 or 420 performs a convolution operation related to data segment W2 in the weighted feature map WFM. The third PE PE#3 or 430 performs a convolution operation related to data segment W3 in the weighted feature map WFM. The fourth PE PE#4 or 440 performs a convolution operation related to data segment W4 in the weighted feature map WFM. For ease of discussion, the ADC, adder, and shift register connected to each synaptic array are omitted from Figures 7A to 7D.
[0061] Referring to Figure 7A, an embodiment is shown in which a convolution operation related to data segment W1 in the weighted feature map WFM is performed by the first PE PE#1 or 410. With the weighted feature map WFM at position (R1, C1), the data segment I1 is input to the first synaptic array SA#1 or 411 of the first PE PE#1, and the "I1·W1" operation is performed.
[0062] Referring to Figure 7B, an embodiment is shown in which a convolution operation related to data segment W2 in the weighted feature map WFM is performed by the second PE PE#2 or 420. With the weighted feature map WFM at position (R1, C1), the data segment I3 is input to the first synaptic array SA#1 or 421 of the second PE PE#2, and the "I3·W2" operation is performed.
[0063] Referring to Figure 7C, an embodiment is shown in which a convolution operation related to data segment W3 in the weighted feature map WFM is performed by the third PE PE#3 or 430. With the weighted feature map WFM at position (R1, C1), the data segment I2 is input to the first synaptic array SA#1 or 431 of the third PE PE#3, and the "I2·W3" operation is performed.
[0064] Referring to Figure 7D, an embodiment is shown in which a convolution operation related to data segment W4 in the weighted feature map WFM is performed by the fourth PE PE#4 or 440. With the weighted feature map WFM at position (R1, C1), the data segment I4 is input to the first synaptic array SA#1 or 441 of the fourth PE PE#4, and the "I4·W4" operation is performed.
[0065] In other words, referring to Figures 7A to 7D, with the weighted feature map WFM located at position (R1, C1), the first synaptic arrays 411, 421, 431, and 441 of the first to fourth PEs PE#1 to PE#4 or 410 to 440 respectively perform the operations "I1·W1", "I3·W2", "I2·W3", and "I4·W4". As described in Equation 1, with the weighted feature map WFM located at position (R1, C1), adding "I1·W1", "I3·W2", "I2·W3", and "I4·W4" together yields the partial result $PR_{R1,C1}$ of the convolution operation. That is, the outputs of the first synaptic arrays 411, 421, 431, and 441 of the first to fourth PEs (PE#1 to PE#4 or 410 to 440) are transferred to the accumulation output buffer. The accumulation output buffer adds the values "I1·W1", "I3·W2", "I2·W3", and "I4·W4" output by the first synaptic arrays 411, 421, 431, and 441 of the first to fourth PEs (PE#1 to PE#4 or 410 to 440) and stores the resulting sum.
[0066] As shown in Figures 7A to 7D, with the weighted feature map WFM located at position (R1, C2), the second synaptic arrays 412, 422, 432, and 442 of the first to fourth PEs PE#1 to PE#4 or 410 to 440 respectively perform the operations "I3·W1", "I5·W2", "I4·W3", and "I6·W4". As described in Equation 2, with the weighted feature map WFM located at position (R1, C2), adding "I3·W1", "I5·W2", "I4·W3", and "I6·W4" together yields the partial result $PR_{R1,C2}$ of the convolution operation.
[0067] Similarly, when the weighted feature map WFM is located at position (R1, C3), the third synaptic arrays 413, 423, 433, and 443 of the first to fourth PEs PE#1 to PE#4 or 410 to 440 perform the operations “I5·W1”, “I7·W2”, “I6·W3”, and “I8·W4”, respectively. Furthermore, when the weighted feature map WFM is located at position (R1, C4), the fourth synaptic arrays 414, 424, 434, and 444 of the first to fourth PEs PE#1 to PE#4 or 410 to 440 perform the operations “I7·W1”, “I9·W2”, “I8·W3”, and “I10·W4”, respectively.
[0068] In other words, as shown in Figures 7A to 7D, when each of the first to fourth PEs (PE#1 to PE#4 or 410 to 440) includes four synaptic arrays, the partial results of the convolution operation corresponding to the four columns C1 to C4 can be calculated simultaneously for the first row R1. After the partial results corresponding to the four columns C1 to C4 are calculated for the first row R1, the row can be changed to calculate the partial results of the four columns C1 to C4 for the second row R2. When this process has been carried out through the fourth row R4, the convolution result CNV of the input feature map IFM and the weighted feature map WFM described in Equation 4 can be obtained.
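The assignment of work to PEs and synaptic arrays described for Figures 7A to 7D can be tabulated with a short Python sketch. It assumes, as in the text, that PE#n handles weight Wn and that SA#j handles column position Cj of the current row; the function name `schedule_row` and the dictionary layout are hypothetical.

```python
# Label layout of I1..I25, row by row, as described for Figure 5.
IFM_LABELS = [
    [1, 3, 5, 7, 9],
    [2, 4, 6, 8, 10],
    [11, 13, 15, 17, 19],
    [12, 14, 16, 18, 20],
    [21, 22, 23, 24, 25],
]

def schedule_row(row):
    """For output row `row` (0-based), map (PE number, SA number) to the
    (input segment, weight segment) pair multiplied in that array."""
    plan = {}
    for sa in range(1, 5):              # SA#j handles column position Cj
        for pe in range(1, 5):          # PE#n handles weight Wn
            dr, dc = divmod(pe - 1, 2)  # offset of Wn inside the 2x2 WFM
            label = IFM_LABELS[row + dr][(sa - 1) + dc]
            plan[(pe, sa)] = (f"I{label}", f"W{pe}")
    return plan

plan_r1 = schedule_row(0)  # all four column positions of row R1
plan_r2 = schedule_row(1)  # row R2, computed after the row change
```

Each call covers one output row, so four calls (rows R1 to R4) enumerate every multiplication needed for the full convolution result.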
[0069] Figure 8 is a block diagram illustrating the PEs and PE accumulators of a memory device according to an embodiment of the present disclosure. Referring to Figure 8, the first to fourth PEs 410 to 440 shown in Figures 7A to 7D are illustrated in more detail, together with the first to fourth PE accumulators 450 to 480. For ease of description, other components are omitted from the illustration.
[0070] As described above with reference to Figure 7A, the first PE 410 includes first to fourth synaptic arrays SA#1 to SA#4. In the embodiment of Figure 8, the first PE 410 further includes a plurality of ADCs and shift adders respectively connected to the first to fourth synaptic arrays SA#1 to SA#4. That is, the first PE PE#1 or 410 includes the first to fourth synaptic arrays SA#1 to SA#4 or 411, 412, 413, and 414, together with first to fourth ADCs 411a, 412a, 413a, and 414a and shift adders 411b, 412b, 413b, and 414b respectively corresponding to those synaptic arrays. The second PE PE#2 or 420 includes first to fourth synaptic arrays SA#1 to SA#4 or 421, 422, 423, and 424, together with first to fourth ADCs 421a, 422a, 423a, and 424a and shift adders 421b, 422b, 423b, and 424b respectively corresponding to those synaptic arrays. The third PE PE#3 or 430 includes first to fourth synaptic arrays SA#1 to SA#4 or 431, 432, 433, and 434, together with first to fourth ADCs 431a, 432a, 433a, and 434a and shift adders 431b, 432b, 433b, and 434b respectively corresponding to those synaptic arrays. Finally, the fourth PE PE#4 or 440 includes first to fourth synaptic arrays SA#1 to SA#4 or 441, 442, 443, and 444, together with first to fourth ADCs 441a, 442a, 443a, and 444a and shift adders 441b, 442b, 443b, and 444b respectively corresponding to those synaptic arrays.
[0071] In this specification, a shift adder includes the adder 370 and the shift register 380 shown in Figure 4.
[0072] Referring to the first PE 410, data segment I1 is applied to the synaptic array SA#1 or 411 of the first PE 410. The result I1·W1, obtained by multiplying data segment I1 by data segment W1 through the ADC 411a and the shift adder 411b, is then transmitted to the first PE accumulator 450.
[0073] Referring to the second PE 420, data segment I3 is applied to the synaptic array SA#1 or 421 of the second PE 420. The result I3·W2, obtained by multiplying data segment I3 by data segment W2 through the ADC 421a and the shift adder 421b, is then transmitted to the first PE accumulator 450.
[0074] Referring to the third PE 430, data segment I2 is applied to the synaptic array SA#1 or 431 of the third PE 430. The result I2·W3, obtained by multiplying data segment I2 by data segment W3 through the ADC 431a and the shift adder 431b, is then transmitted to the first PE accumulator 450.
[0075] Referring to the fourth PE 440, data segment I4 is applied to the synaptic array SA#1 or 441 of the fourth PE 440. The result I4·W4, obtained by multiplying data segment I4 by data segment W4 through the ADC 441a and the shift adder 441b, is then transmitted to the first PE accumulator 450.
[0076] The first PE accumulator 450 can receive the operation results produced by the first synaptic arrays SA#1 411, 421, 431, and 441, the ADCs 411a, 421a, 431a, and 441a, and the shift adders 411b, 421b, 431b, and 441b respectively included in the first to fourth PEs 410 to 440, and add the operation results together. Thus, the partial result $PR_{R1,C1}$ of the convolution operation described by Equation 1 can be generated.
[0077] When the weighted feature map is shifted one step to the right, i.e., when the weighted feature map WFM is at position (R1, C2), the partial result $PR_{R1,C2}$ of the convolution operation can be calculated by the second synaptic arrays SA#2 412, 422, 432, and 442 respectively included in the first to fourth PEs 410 to 440, the ADCs 412a, 422a, 432a, and 442a, the shift adders 412b, 422b, 432b, and 442b, and the second PE accumulator 460. When the weighted feature map is shifted two steps to the right, i.e., when the weighted feature map WFM is located at position (R1, C3), the partial result $PR_{R1,C3}$ of the convolution operation can be calculated by the third synaptic arrays SA#3 413, 423, 433, and 443 respectively included in the first to fourth PEs 410 to 440, the ADCs 413a, 423a, 433a, and 443a, the shift adders 413b, 423b, 433b, and 443b, and the third PE accumulator 470. When the weighted feature map is shifted three steps to the right, i.e., when the weighted feature map WFM is located at position (R1, C4), the partial result $PR_{R1,C4}$ of the convolution operation can be calculated by the fourth synaptic arrays SA#4 414, 424, 434, and 444 respectively included in the first to fourth PEs 410 to 440, the ADCs 414a, 424a, 434a, and 444a, the shift adders 414b, 424b, 434b, and 444b, and the fourth PE accumulator 480.
[0078] In the embodiment illustrated in Figure 8, each of the first to fourth PEs PE#1 to PE#4 includes ADCs for converting the currents output from the synaptic arrays into digital values, and shift adders for performing multiplication based on those digital values. When each of the first to fourth PEs PE#1 to PE#4 includes ADCs and shift adders as in Figure 8, a larger substrate area may be required to implement the memory device, because many circuit elements are needed to implement the ADCs and shift adders. This increases the size of the memory device implemented to perform convolution operations, and it also increases the manufacturing cost of the memory device.
[0079] According to an embodiment of the memory device of this disclosure, instead of implementing an ADC and a shift adder for each synaptic array, the ADC and shift adder are provided at the input of the PE accumulator. Therefore, the number of ADCs and shift adders required to perform convolution operations can be reduced, thereby reducing the manufacturing cost of the memory device. Hereinafter, a memory device with an ADC and a shift adder arranged at the input of the PE accumulator is described with reference to Figure 9 and Figure 10.
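The reduction can be illustrated with a simple count. The sketch below assumes the four-PE, four-array configuration used in the figures and counts ADC and shift-adder pairs only; the function names are hypothetical, and other circuit elements are ignored.

```python
def pairs_per_synaptic_array(num_pes, arrays_per_pe):
    """Figure 8 style: one ADC and one shift adder for every synaptic array."""
    return num_pes * arrays_per_pe

def pairs_per_pe_accumulator(arrays_per_pe):
    """Figure 9 style: one ADC and one shift adder at the input of each
    PE accumulator, with one accumulator per synaptic-array index."""
    return arrays_per_pe

fig8_pairs = pairs_per_synaptic_array(num_pes=4, arrays_per_pe=4)
fig9_pairs = pairs_per_pe_accumulator(arrays_per_pe=4)
```

Under these assumptions the count of ADC and shift-adder pairs drops from sixteen to four, and in general by a factor equal to the number of PEs sharing each accumulator.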
[0080] Figure 9 is a block diagram illustrating the PEs and PE accumulators of a memory device according to another embodiment of the present disclosure.
[0081] Referring to Figure 9, the first to fourth PEs 415, 425, 435, and 445 and the first to fourth PE accumulators 453, 463, 473, and 483 of the memory device are shown. Figure 9 also shows the first to fourth ADCs 451, 461, 471, and 481 and the first to fourth shift adders 452, 462, 472, and 482, which respectively correspond to the first to fourth PE accumulators 453, 463, 473, and 483. For ease of description, other components are omitted from the diagram.
[0082] As described above with reference to Figure 7A, the first PE 415 includes first to fourth synaptic arrays SA#1 to SA#4 or 416 to 419. Unlike the embodiment of Figure 8, the first PE 415 of Figure 9 does not include an ADC or a shift adder.
[0083] The second PE 425 includes first to fourth synaptic arrays SA#1 to SA#4 or 426 to 429. Unlike the embodiment of Figure 8, the second PE 425 of Figure 9 does not include an ADC or a shift adder.
[0084] Similarly, the third PE 435 includes first to fourth synaptic arrays SA#1 to SA#4 or 436 to 439. The third PE 435 does not include an ADC or a shift adder.
[0085] In addition, the fourth PE 445 includes first to fourth synaptic arrays SA#1 to SA#4 or 446 to 449. The fourth PE 445 does not include an ADC or a shift adder.
[0086] Referring to the first PE 415, the data segment I1 is applied to the first synaptic array SA#1 or 416 of the first PE 415. The first current I as the synaptic processing result of the first synaptic array SA#1 or 416 V1 is transmitted to the first ADC 451 outside the first PE 415. Similarly, the second current I as the synaptic processing result of the first synaptic array SA#1 or 426 of the second PE 425 V2 is transmitted to the first ADC 451 outside the second PE 425. Similarly, the third current I as the synaptic processing result of the first synaptic array SA#1 or 436 of the third PE 435 V3 is transmitted to the first ADC 451 outside the third PE 435. Finally, the fourth current I as the synaptic processing result of the first synaptic array SA#1 or 446 of the fourth PE 445 V4 is transmitted to the first ADC 451 outside the fourth PE 445.
[0087] The input terminal of the first ADC 451 receives the first to fourth currents $I_{V1}$, $I_{V2}$, $I_{V3}$, and $I_{V4}$ output from the first synaptic arrays SA#1 (416, 426, 436, and 446) of the first to fourth PEs 415 to 445. According to Kirchhoff's current law, these four currents are added together and input to the first ADC 451. The first ADC 451 converts the received current into a digital value and transmits the digital value to the first shift adder 452. The first shift adder 452 performs multiplication by shifting and adding the digital values sequentially output from the first ADC 451. The convolution operation of the memory device shown in Figure 9 is described in more detail below with reference to Figure 10.
[0088] Figure 10 is a diagram illustrating the convolution operation of the memory device shown in Figure 9 according to an embodiment of the present disclosure.
[0089] Referring to Figure 10, a method of calculating the partial result $PR_{R1,C1}$ of the convolution operation when the weighted feature map WFM is located at (R1, C1) is described. For ease of discussion, both "I3·W2" and "I4·W4" are assumed to be 0. That is, Figure 10 shows a method of obtaining the partial result $PR_{R1,C1}$ by adding the terms "I1·W1" and "I2·W3". In the example of Figure 10, data segment I1 has a value of 7, data segment W1 has a value of 3, data segment I2 has a value of 6, and data segment W3 has a value of 5.
[0090] In the embodiment shown in Figure 8, the value of "I1·W1" is calculated by the first synaptic array 411, the first ADC 411a, and the first shift adder 411b of the first PE 410. That is, the first PE 410 directly outputs the digital value of "I1·W1" and transmits it to the first PE accumulator 450.
[0091] As shown in Figure 10, "I1·W1" is the product of 7 and 3, and the first ADC 411a and the first shift adder 411b calculate the value of "I1·W1" as described in Equation 5 below.
[0092] [Equation 5]
[0093] $I1 \cdot W1 = 111_{(2)} \times 011_{(2)} = 111_{(2)} \cdot 1 \cdot 2^{0} + 111_{(2)} \cdot 1 \cdot 2^{1} + 111_{(2)} \cdot 0 \cdot 2^{2} = 21_{(10)}$
[0094] Furthermore, "I2·W3" is the product of 6 and 5, and the first ADC 431a and the first shift adder 431b of the third PE 430 calculate the value of "I2·W3" as described in Equation 6 below.
[0095] [Equation 6]
[0096] $I2 \cdot W3 = 110_{(2)} \times 101_{(2)} = 110_{(2)} \cdot 1 \cdot 2^{0} + 110_{(2)} \cdot 0 \cdot 2^{1} + 110_{(2)} \cdot 1 \cdot 2^{2} = 30_{(10)}$
[0097] In Figure 8, the first PE accumulator 450 receives the value 21 from the first PE 410 and the value 30 from the third PE 430. The first PE accumulator 450 can add the received values 21 and 30 and store the resulting value 51.
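The per-PE calculation of the Figure 8 embodiment (Equations 5 and 6, followed by accumulation) can be sketched in Python as follows. This is only an illustrative model and not the circuit itself: the function name `shift_add_multiply` and the parameter `weight_bits` are hypothetical, and each ADC output is assumed to be an ideal integer.

```python
def shift_add_multiply(input_value: int, weight: int, weight_bits: int = 3) -> int:
    """Model of one PE in Figure 8: sum input * weight_bit * 2**k over the weight bits.

    Assumption: the ADC is ideal, so each digitized term is an exact integer.
    """
    total = 0
    for k in range(weight_bits):
        weight_bit = (weight >> k) & 1             # k-th bit of the weight, LSB first
        total += input_value * weight_bit * (1 << k)
    return total


pe1_output = shift_add_multiply(7, 3)   # I1·W1, as in Equation 5
pe3_output = shift_add_multiply(6, 5)   # I2·W3, as in Equation 6
accumulated = pe1_output + pe3_output   # role of the first PE accumulator 450

print(pe1_output, pe3_output, accumulated)  # 21 30 51
```

In this model each PE needs its own conversion and shift-add step before the accumulator sees a value.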
[0098] In the memory device shown in Figure 9, first, at time t1, the first current $I_{V1}(t1)$ output from the first synaptic array 416 of the first PE 415 and the third current $I_{V3}(t1)$ output from the first synaptic array 436 of the third PE 435 are input to the first ADC 451. According to Kirchhoff's current law, the first current $I_{V1}(t1)$ and the third current $I_{V3}(t1)$ are summed and input to the first ADC 451 (①). As shown in Figure 10, at time t1, the sum of the first current $I_{V1}(t1)$ and the third current $I_{V3}(t1)$ is the value calculated by Equation 7 below.
[0099] [Equation 7]
[0100] $I_{V1}(t1) + I_{V3}(t1) = 111_{(2)} \cdot 0 + 110_{(2)} \cdot 1 = 6_{(10)}$
[0101] The sum of the first current $I_{V1}(t1)$ and the third current $I_{V3}(t1)$ is input to the shift register (②). At this time, the output value of the shift register can be calculated by Equation 8 below.
[0102] [Equation 8]
[0103] $(I_{V1}(t1) + I_{V3}(t1)) \cdot 2^{1} = 12_{(10)}$
[0104] Furthermore, at time t2, the first current $I_{V1}(t2)$ output from the first synaptic array 416 of the first PE 415 and the third current $I_{V3}(t2)$ output from the first synaptic array 436 of the third PE 435 are input to the first ADC 451. According to Kirchhoff's current law, the first current $I_{V1}(t2)$ and the third current $I_{V3}(t2)$ are summed and input to the first ADC 451 (③). As shown in Figure 10, at time t2, the sum of the first current $I_{V1}(t2)$ and the third current $I_{V3}(t2)$ is the value calculated by Equation 9 below.
[0105] [Equation 9]
[0106] $I_{V1}(t2) + I_{V3}(t2) = 111_{(2)} \cdot 1 + 110_{(2)} \cdot 0 = 7_{(10)}$
[0107] The results of Equation 8 and Equation 9 are input to the adder (④). Therefore, the output value of the adder can be calculated by Equation 10 below.
[0108] [Equation 10]
[0109] $(I_{V1}(t1) + I_{V3}(t1)) \cdot 2^{1} + I_{V1}(t2) + I_{V3}(t2) = 12_{(10)} + 7_{(10)} = 19_{(10)}$
[0110] The result of Equation 10 is input back to the shift register (⑤). At this time, the output value of the shift register can be calculated by Equation 11 below.
[0111] [Equation 11]
[0112] $\left((I_{V1}(t1) + I_{V3}(t1)) \cdot 2^{1} + I_{V1}(t2) + I_{V3}(t2)\right) \cdot 2^{1} = 38_{(10)}$
[0113] At time t3, the first current $I_{V1}(t3)$ output from the first synaptic array 416 of the first PE 415 and the third current $I_{V3}(t3)$ output from the first synaptic array 436 of the third PE 435 are input to the first ADC 451. According to Kirchhoff's current law, the first current $I_{V1}(t3)$ and the third current $I_{V3}(t3)$ are summed and input to the first ADC 451 (⑥). As shown in Figure 10, at time t3, the sum of the first current $I_{V1}(t3)$ and the third current $I_{V3}(t3)$ is the value calculated by Equation 12 below.
[0114] [Equation 12]
[0115] $I_{V1}(t3) + I_{V3}(t3) = 111_{(2)} \cdot 1 + 110_{(2)} \cdot 1 = 13_{(10)}$
[0116] The results of Equation 11 and Equation 12 are input to the adder (⑦). Therefore, the output value of the adder can be calculated by Equation 13 below.
[0117] [Equation 13]
[0118] $\left((I_{V1}(t1) + I_{V3}(t1)) \cdot 2^{1} + I_{V1}(t2) + I_{V3}(t2)\right) \cdot 2^{1} + I_{V1}(t3) + I_{V3}(t3) = 38_{(10)} + 13_{(10)} = 51_{(10)}$
[0119] As a result of Equation 13, after time t3, the first shift adder 452 can output the result value 51 to the first PE accumulator 453. As described above, the operation result of the memory device shown in Figure 9 is the same as the operation result of the memory device shown in Figure 8.
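The sequence of steps ① to ⑦ for the Figure 9 embodiment can be traced with a short Python sketch. It is an illustrative model under stated assumptions: the function name `shared_adc_shift_add` is hypothetical, each current is modeled as an exact integer (input value times one weight bit), the ADC is assumed ideal, and the weight bits are assumed to be applied most significant bit first, as the values in Equations 7, 9, and 12 suggest.

```python
def shared_adc_shift_add(inputs, weights, weight_bits=3):
    """Model of one shared ADC and shift adder serving several PEs (Figure 9).

    Each cycle, the per-PE contributions are summed before conversion
    (standing in for current summation at the ADC input), then the running
    value is shifted left by one bit and the new sum is added.
    """
    register = 0
    trace = []
    for k in reversed(range(weight_bits)):          # MSB first: t1, t2, t3
        summed = sum(x * ((w >> k) & 1) for x, w in zip(inputs, weights))
        register = (register << 1) + summed         # shift, then add
        trace.append((summed, register))
    return register, trace


result, trace = shared_adc_shift_add([7, 6], [3, 5])  # I1, I2 and W1, W3
print(trace)   # [(6, 6), (7, 19), (13, 51)]
print(result)  # 51
```

The summed values 6, 7, and 13 correspond to Equations 7, 9, and 12, and the register values 19 and 51 correspond to Equations 10 and 13. The shifted intermediate values 12 and 38 of Equations 8 and 11 are the register contents after the left shift and before the addition.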
[0120] Comparing Figure 8 and Figure 9, in Figure 8 an ADC is included in each of the PEs 410, 420, 430, and 440, whereas in Figure 9 the ADCs are provided outside the PEs 415, 425, 435, and 445. In Figure 8, each ADC receives only the output of the synaptic array in its own PE, whereas in Figure 9 each ADC simultaneously receives the outputs of synaptic arrays included in multiple PEs. The outputs of the multiple synaptic arrays are summed in the form of currents and input to the ADC. Therefore, the embodiment of Figure 9 can output the same result as the embodiment of Figure 8, while the total number of required ADCs can be reduced.
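The equality of the two results can be seen by exchanging the order of two summations. The following is a sketch in assumed notation that does not appear in the original text: $I_n$ is the input data segment applied to the n-th PE, $W_n$ is the corresponding weight, $w_{n,k}$ is the k-th bit of $W_n$, and $P$ is the number of weight bits (3 in the example of Figure 10). The second line substitutes the example values.

```latex
\begin{aligned}
\underbrace{\sum_{n} I_n W_n}_{\text{Figure 8: one ADC per synaptic array}}
  &= \sum_{n} I_n \sum_{k=0}^{P-1} w_{n,k}\, 2^{k}
   = \underbrace{\sum_{k=0}^{P-1} 2^{k} \sum_{n} I_n\, w_{n,k}}_{\text{Figure 9: shared ADC}} \\
7 \cdot 3 + 6 \cdot 5
  &= 2^{2} \cdot 6 + 2^{1} \cdot 7 + 2^{0} \cdot 13 = 51
\end{aligned}
```

The inner sum over $n$ on the right-hand side is the quantity that the current summation at the ADC input provides in each cycle, so a single conversion per weight bit serves all PEs that share the ADC.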
[0121] Although various embodiments of the disclosed technology have been described with particular and varying details for illustrative purposes, those skilled in the art will understand that various modifications, additions, and substitutions can be made based on what has been disclosed or shown in this disclosure without departing from the spirit and scope of the invention as defined by the appended claims. Furthermore, embodiments can be combined to form other embodiments.
Claims
1. A memory device that performs a convolution operation, the memory device comprising: first to Nth processing elements (PEs), where N is a natural number greater than or equal to 2, the first to Nth PEs being respectively associated with at least one weighted data segment included in a weighted feature map and performing partial convolution operations by using at least one input data segment included in an input feature map; a first analog-to-digital converter (ADC) that receives a first result of the partial convolution operation from each of the first to Nth PEs; a first shift adder that shifts and adds the output of the first ADC; and a first accumulator that accumulates the output from the first shift adder.
2. The memory device of claim 1, wherein each of the first to Nth PEs comprises first to kth synaptic arrays, wherein k is a natural number equal to or greater than 2.
3. The memory device of claim 2, wherein the first ADC receives the output of the first synaptic array of each of the first to Nth PEs as the first result.
4. The memory device of claim 3, wherein the first ADC receives the sum of the output currents of the first synaptic array of each of the first to Nth PEs.
5. The memory device of claim 2, further comprising: a second ADC that receives a second result of the partial convolution operation from the first to Nth PEs; a second shift adder that shifts the output of the second ADC; and a second accumulator that accumulates the output from the second shift adder.
6. The memory device of claim 5, wherein the second ADC receives the output of the second synaptic array of each of the first to Nth PEs as the second result.
7. The memory device of claim 6, wherein the second ADC receives the sum of the output currents of the second synaptic array of each of the first to Nth PEs.
8. The memory device of claim 2, wherein each of the first to kth synaptic arrays comprises a plurality of memristors.
9. A convolution operation device included in a memristor-based deep learning accelerator, the convolution operation device comprising: a plurality of processing elements (PEs) that generate respective currents by performing Equation 2 on a partial map of an input feature map with a weighted feature map through an analog MAC operation; and a digital arithmetic circuit that converts the currents into respective binary values and performs the operation of Equation 1, wherein [Equation 1] and [Equation 2], in which: " " is the result of the shift adder operation; " " is the partial map; " " is the weighted feature map; " " is a convolution operator; " " is an element of the partial map; " " is an element of the weighted feature map and corresponds to one of the PEs; " " is the binary value of an element of the partial map; " " is the binary value of the Kth bit within the element of the weighted feature map; " " is the current; "N" is the number of rows in each of the partial map and the weighted feature map; "M" is the number of columns in the partial map or the weighted feature map; and "P" is the number of bits in each element of the partial map and the weighted feature map, and wherein the digital arithmetic circuit includes: a first analog-to-digital converter (ADC) that receives a first result of a partial convolution operation from each of the plurality of PEs.