Analog-domain compute-in-memory convolution-pooling circuit and method based on staggered weight deployment
By using an analog-domain in-memory convolution-pooling circuit with staggered weight deployment, convolution and pooling are performed in parallel within the analog domain. This addresses the insufficient parallelism and high power consumption of existing technologies and improves the computing efficiency and resource utilization of edge AI chips.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- EAST CHINA NORMAL UNIV
- Filing Date
- 2026-03-19
- Publication Date
- 2026-06-19
AI Technical Summary
Existing analog in-memory computing architectures suffer from insufficient parallelism, high power consumption, and high hardware resource consumption in convolution operations, failing to meet the requirements of edge AI chips for high frame rate inference and hardware compactness.
An analog-domain in-memory convolution-pooling circuit based on staggered weight deployment is adopted. It comprises a digital-to-analog conversion parallel input module, an in-memory computing array module, a current-to-voltage conversion module, and a parallel analog maximum value comparison module, and it completes both the convolution and the pooling operations in parallel in the analog domain, eliminating the analog-to-digital conversion stage between them.
It significantly improves the parallelism and overall efficiency of convolution operations, reduces power consumption and hardware resource consumption, and meets the low-power, high-parallelism, and high-compactness requirements of edge AI chips.

Figure CN122242601A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of analog integrated circuits and compute-in-memory AI chips, and in particular to an analog-domain in-memory convolution-pooling circuit and method based on staggered weight deployment.
Background Technology
[0002] With the rapid development of edge intelligent devices, AI inference chips are required to combine low power consumption, high parallelism, and low latency. Convolutional neural networks (CNNs), the core network architecture for AI inference, rely primarily on convolution and pooling operations, which are typically executed serially in cascade. In-memory computing architectures perform operations directly within memory cells and thereby break through the memory-wall bottleneck of traditional von Neumann architectures. Analog-domain in-memory computing architectures exploit the properties of analog circuits, implementing multiplication with Ohm's law and addition with Kirchhoff's current law, which significantly reduces the hardware overhead and power consumption of convolution operations. This has become the mainstream technical direction for low-power edge AI chips.
[0003] In current mainstream in-memory computing architectures, the weights of a 3×3 convolution kernel are typically fixed in a 3×3 in-memory array, and the full-image convolution is completed by sliding the input feature map over time. As shown in Figure 1, for a 3×3 standard convolution with a stride of 1 and no padding on a 4×4 input feature map, the traditional approach slides the 3×3 kernel sequentially across the 4×4 input feature map and evaluates each of the four 3×3 sliding sub-windows in its own time period (T1, T2, T3, T4). Within each time period, only the pixel voltage signals of the corresponding sub-window can be applied to the inputs of the 3×3 memory array, which computes and reads out a single-channel convolution result current. After the convolution operations are complete, the serially output convolution result currents must be converted into digital signals through IV conversion and an analog-to-digital converter (ADC) before being sent to an independent digital processing unit for pooling. These existing solutions have the following technical defects in practical applications. First, convolution suffers from insufficient parallelism and low computational efficiency. Because of the fixed-size storage array and the serial sliding operation mode, traditional solutions need multiple time periods to complete the full convolution of a single feature map; for example, a 3×3 convolution on a 4×4 input feature map requires four time periods. Multi-window convolutions therefore cannot be executed in parallel, which falls short of the high-frame-rate inference requirements of edge intelligent devices.
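The serial schedule described above can be modelled behaviourally in Python as a point of comparison. This is a sketch under stated assumptions: ideal multiply-accumulate arithmetic, a stride of 1, and no padding. The function name `sliding_conv_serial` and the feature-map and kernel values are hypothetical illustrations, not taken from the patent.

```python
# Behavioural sketch (assumption: ideal arithmetic, no device non-idealities)
# of the conventional scheme: a fixed 3x3 array evaluates one sliding
# sub-window per time period, and each result current is read out separately.

def sliding_conv_serial(fmap, kernel, stride=1):
    """Return (results, periods): one convolution result per time period."""
    h, w = len(fmap), len(fmap[0])
    k = len(kernel)
    results = []
    periods = 0
    for r in range(0, h - k + 1, stride):
        for c in range(0, w - k + 1, stride):
            periods += 1  # one time period (T1, T2, ...) per sub-window
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += fmap[r + i][c + j] * kernel[i][j]
            results.append(acc)  # one separate current read per period
    return results, periods


# Hypothetical 4x4 feature map and 3x3 kernel
fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
kernel = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]
results, periods = sliding_conv_serial(fmap, kernel)
print(periods, results)
```

With this hypothetical diagonal kernel the four windows give 18, 21, 30, and 33, and the loop needs one time period, and one separate current read, per window.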
[0004] Second, frequent reading of convolution results causes high data-transfer overhead and bandwidth usage. In traditional solutions, the convolution result current must be read separately after the convolution of each sliding window is completed. A 4×4 input feature map therefore requires four current read operations, which greatly increases data-transfer power consumption and on-chip bus bandwidth usage and prevents the low-power advantage of the in-memory computing architecture from being fully exploited.
[0005] Third, in traditional solutions convolution is performed in the analog domain and pooling in the digital domain, so an analog-to-digital conversion step is required between the two, and the digital-to-analog and analog-to-digital conversion processes add significant power consumption and computational latency. In addition, pooling relies on a separate digital processing unit, which further increases the chip's hardware resource consumption. This runs counter to the efficiency and low-power goals of analog in-memory computing and fails the core requirement of edge AI chips for hardware compactness.
[0006] It is therefore necessary to improve the existing technologies to overcome these shortcomings.
Summary of the Invention
[0007] The problem to be solved by the present invention is to provide an analog in-memory computing convolution-pooling circuit and method based on staggered weight deployment, so as to overcome the low computing efficiency, high power consumption and latency, and high hardware resource consumption of existing analog in-memory computing architectures.
[0008] The technical solution adopted by this invention to solve its technical problem is an analog-domain in-memory convolution-pooling circuit based on staggered weight deployment, comprising a digital-to-analog conversion parallel input module, an in-memory computing array module, a current-to-voltage conversion module, and a parallel analog maximum value comparison module. The output of the digital-to-analog conversion parallel input module is electrically connected to the input of the in-memory computing array module; it converts the digital signals of the input feature map into multiple analog input voltage signals and outputs them to the in-memory computing array module. The in-memory computing array module is adapted to a convolution kernel of a preset size and includes multiple columns of parallel in-memory computing units. The convolution kernel slides over the feature space of the input feature map, forming multiple sliding sub-windows; each column of in-memory computing units corresponds to one sliding sub-window, and the weights of the convolution kernel are loaded into it in a staggered manner that matches that sub-window. The module thereby completes multiple convolution operations in parallel within a single time period and outputs multiple convolution result currents. The number of channels of the current-to-voltage conversion module matches the number of in-memory computing unit columns one-to-one. Its inputs are electrically connected to the outputs of the in-memory computing unit columns, and it synchronously converts the multiple convolution result currents output by the in-memory computing array module into multiple analog voltage signals.
The input of the parallel analog maximum value comparison module is electrically connected to the output of the current-to-voltage conversion module; it compares the multiple analog voltage signals output by the current-to-voltage conversion module in parallel in the analog domain and outputs the max-pooling result.
[0009] As a further improvement of the present invention, each column of the in-memory computing array module contains as many in-memory computing units as the digital-to-analog conversion parallel input module has output channels. The in-memory computing units with the same index in every column together form one row of the in-memory computing array module, and each row of the array receives one of the analog input voltage signals output by the digital-to-analog conversion parallel input module.
[0010] As a further improvement of the present invention, the number of in-memory computing unit columns in the in-memory computing array module equals the number of valid sliding sub-windows formed by the convolution kernel on the input feature map. The weight positions in each column of in-memory computing units match the position offset of the corresponding sliding sub-window on the input feature map, and the in-memory computing units that do not participate in the convolution of that sub-window are set as placeholders with infinitesimal conductance.
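The sizing and offset rule above can be sketched as follows. The helper `staggered_rows` is hypothetical, and it assumes that the feature map is flattened row-major onto the array rows (row index = r × W + c + 1 for pixel (r, c), with 0-based r and c), which is consistent with the input-node lists given for the 4×4 example in [0040] to [0042]. For each valid sliding sub-window it lists the array rows that hold W1, W2, and so on; every other row in that column would be an infinitesimal-conductance placeholder.

```python
# Hypothetical helper: for an H x W input flattened row-major onto array rows
# 1..H*W, list which row holds each kernel weight W1..W(K*K) in each column.
# One column is produced per valid sliding sub-window (no padding).

def staggered_rows(h, w, k, stride=1):
    columns = []
    for r in range(0, h - k + 1, stride):        # sub-window top row
        for c in range(0, w - k + 1, stride):    # sub-window left column
            rows = []
            for i in range(k):
                for j in range(k):
                    # 1-based array row of input pixel (r + i, c + j)
                    rows.append((r + i) * w + (c + j) + 1)
            columns.append(rows)
    return columns


cols = staggered_rows(4, 4, 3)
print(len(cols))  # number of columns = number of valid sub-windows
for n, rows in enumerate(cols, start=1):
    print(n, rows)
```

For H = W = 4, K = 3, and a stride of 1 the helper returns four columns whose row lists reproduce the assignments in [0012]; for a 5×5 map it would return nine columns.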
[0011] As a further improvement of the present invention, the digital-to-analog conversion parallel input module is a 16-channel parallel digital-to-analog converter that converts the digital signals of a 4×4 input feature map into 16 analog input voltage signals; the in-memory computing array module is a 16-row × 4-column array adapted to a 3×3 convolution kernel, and its four columns correspond one-to-one with the four 3×3 sliding sub-windows of the 4×4 input feature map.
[0012] As a further improvement of the present invention, the nine independent weights of the convolution kernel are denoted W1~W9, where: the first in-memory computing unit column matches the upper-left 3×3 sliding sub-window of the 4×4 input feature map; in this column, weights W1~W3 are deployed in rows 1~3, W4~W6 in rows 5~7, and W7~W9 in rows 9~11, and the remaining rows are set as placeholders with infinitesimal conductance. The second column matches the upper-right 3×3 sliding sub-window; in this column, W1~W3 are deployed in rows 2~4, W4~W6 in rows 6~8, and W7~W9 in rows 10~12, and the remaining rows are placeholders with infinitesimal conductance. The third column matches the lower-left 3×3 sliding sub-window; in this column, W1~W3 are deployed in rows 5~7, W4~W6 in rows 9~11, and W7~W9 in rows 13~15, and the remaining rows are placeholders with infinitesimal conductance. The fourth column matches the lower-right 3×3 sliding sub-window; in this column, W1~W3 are deployed in rows 6~8, W4~W6 in rows 10~12, and W7~W9 in rows 14~16, and the remaining rows are placeholders with infinitesimal conductance.
[0013] As a further improvement of the present invention, the current-to-voltage conversion module includes multiple first operational amplifiers, each with a feedback resistor in series in its feedback loop, and the input of each first operational amplifier is electrically connected to the output of one in-memory computing unit column of the in-memory computing array module.
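One channel of this current-to-voltage conversion module can be modelled behaviourally as follows. The sketch assumes the textbook inverting transimpedance configuration with an ideal op-amp, so that V_out = -I_in × R_f; the feedback value `R_F`, the function name `tia`, and the column currents are hypothetical.

```python
# Ideal transimpedance (current-to-voltage) stage, assuming an ideal op-amp
# with the column current injected into the virtual-ground inverting input:
# V_out = -I_in * R_f. R_F and the currents below are hypothetical values.

R_F = 10e3  # feedback resistance in ohms (illustrative only)


def tia(currents, r_f=R_F):
    """Convert each column current (A) into an output voltage (V)."""
    return [-i * r_f for i in currents]


column_currents = [18e-6, 21e-6, 30e-6, 33e-6]  # amperes, hypothetical
print(tia(column_currents))
```

The sign inversion belongs to this assumed configuration; the text does not state how signal polarity is handled before the maximum value comparison, so that choice is left open here.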
[0014] As a further improvement of the present invention, the parallel analog maximum value comparison module includes multiple second operational amplifiers, each with a diode in series in its feedback loop. The input of each second operational amplifier is electrically connected to the output of the corresponding first operational amplifier in the current-to-voltage conversion module, so that the analog voltage signals output by the current-to-voltage conversion module are compared in parallel. Through the clamping effect of the diodes, only the largest of these analog voltage signals passes to the output, which yields the max-pooling result.
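The maximum selection described above can be modelled behaviourally. The sketch assumes a common textbook form of this circuit: ideal op-amps whose feedback diodes all drive one shared output node, with each diode drop compensated by its op-amp's loop. Under that assumption the node settles at the largest input and only the winning channel's diode conducts. The function name `analog_max` and the input voltages are hypothetical.

```python
# Behavioural model of a parallel diode-feedback maximum selector, assuming
# ideal op-amps and compensated diode drops: the shared output node follows
# the largest input; diodes of the other channels are reverse-biased.

def analog_max(voltages):
    """Return the pooled output and which channel diodes conduct."""
    v_out = max(voltages)
    conducting = [v == v_out for v in voltages]
    return v_out, conducting


v_pool, diodes_on = analog_max([0.18, 0.21, 0.30, 0.33])
print(v_pool, diodes_on)  # 0.33 [False, False, False, True]
```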
[0015] As a further improvement of the present invention, the analog-domain in-memory convolution-pooling circuit based on staggered weight deployment further includes a convolution-pooling collaborative timing control module. Its timing signal output is electrically connected to the timing control inputs of the digital-to-analog conversion parallel input module, the in-memory computing array module, the current-to-voltage conversion module, and the parallel analog maximum value comparison module, and it outputs synchronous timing trigger signals.
[0016] This invention also provides an analog-domain in-memory convolution-pooling method based on staggered weight deployment, implemented with the circuit described above and comprising the following steps:
S1. According to the size of the input feature map, preset the convolution kernel size and sliding stride, and load the convolution kernel weights into each column of the in-memory computing array module in a staggered manner that matches the corresponding sliding sub-window.
S2. The digital-to-analog conversion parallel input module converts the digital signals of the input feature map into multiple analog input voltage signals and applies them in parallel to the corresponding rows of the in-memory computing array module.
S3. Each column of the in-memory computing array module synchronously multiplies the corresponding analog input voltage signals by the weights and accumulates the products within the column, so that multiple convolution result currents are output in parallel within a single time period.
S4. The current-to-voltage conversion module synchronously converts the multiple convolution result currents into multiple analog voltage signals and outputs them to the parallel analog maximum value comparison module.
S5. The parallel analog maximum value comparison module compares the multiple analog voltage signals in parallel in the analog domain and outputs the max-pooling result.
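Steps S1 to S5 can be simulated end to end for the 4×4 input / 3×3 kernel embodiment. This is a hedged behavioural sketch, not the patented circuit: it assumes ideal devices, weights stored directly as conductances (0.0 for placeholders), digital codes mapped directly to volts, and a unit-gain, non-inverting current-to-voltage mapping. Signed weights, which a single physical conductance cannot represent, are outside its scope. The function names and input values are hypothetical.

```python
# Behavioural sketch of S1-S5, assuming ideal devices and unit scaling.

def build_array(kernel, h=4, w=4, stride=1):
    """S1: (h*w) x n_windows conductance matrix; 0.0 marks a placeholder."""
    k = len(kernel)
    windows = [(r, c)
               for r in range(0, h - k + 1, stride)
               for c in range(0, w - k + 1, stride)]
    g = [[0.0] * len(windows) for _ in range(h * w)]
    for col, (r, c) in enumerate(windows):
        for i in range(k):
            for j in range(k):
                g[(r + i) * w + (c + j)][col] = kernel[i][j]
    return g


def conv_pool_one_period(fmap, kernel):
    g = build_array(kernel, h=len(fmap), w=len(fmap[0]))
    v_in = [px for row in fmap for px in row]   # S2: row-major V1..V16
    n_rows, n_cols = len(g), len(g[0])
    # S3: per column, Ohm's-law products summed by Kirchhoff's current law
    currents = [sum(v_in[i] * g[i][j] for i in range(n_rows))
                for j in range(n_cols)]
    voltages = list(currents)                   # S4: unit-gain I-V (assumed)
    return currents, max(voltages)              # S5: max pooling


fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
currents, pooled = conv_pool_one_period(fmap, kernel)
print(currents, pooled)
```

For this hypothetical diagonal kernel the four column currents are 18, 21, 30, and 33, the same values a serial sliding-window convolution produces, but here they come from one pass over the array, and the pooled output is 33.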
[0017] As a further improvement of the present invention, in step S3 each in-memory computing unit in a column multiplies its analog input voltage signal by its weight according to Ohm's law, and the resulting currents of all units in the same column are summed at the column output according to Kirchhoff's current law, completing the multiply-accumulate operation.
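Written out as equations, assuming each weight is represented by the conductance G_ij of the unit at row i, column j, and each placeholder has G_ij ≈ 0, step S3 reads as follows; the last line expands the first column of the 4×4 example using the row assignments in [0012].

```latex
\begin{aligned}
I_{ij} &= G_{ij}\,V_i && \text{(Ohm's law, unit at row } i\text{, column } j)\\
I_j &= \sum_{i=1}^{16} I_{ij} = \sum_{i=1}^{16} G_{ij}\,V_i && \text{(Kirchhoff's current law at column } j)\\
I_1 &\approx W_1V_1 + W_2V_2 + W_3V_3 + W_4V_5 + W_5V_6 + W_6V_7 + W_7V_9 + W_8V_{10} + W_9V_{11}
\end{aligned}
```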
[0018] The beneficial effects of this invention are as follows. The invention provides an analog-domain in-memory convolution-pooling circuit and method based on staggered weight deployment. It builds an end-to-end analog-domain in-memory convolution-pooling fusion architecture that integrates a digital-to-analog conversion parallel input module, an in-memory computing array module, a current-to-voltage conversion module, and a parallel analog maximum value comparison module, and it abandons both the serial sliding convolution mode and the fragmented "analog convolution + digital pooling" architecture of traditional technologies. The digital-to-analog conversion parallel input module converts the feature-map digital signals synchronously and loads them in parallel. The in-memory computing array module, through the one-to-one correspondence between its parallel in-memory computing unit columns and the sliding sub-windows of the convolution kernel combined with staggered weight deployment, completes multiple convolution operations in parallel within a single time period, significantly improving the parallelism and overall computational efficiency of convolution. Because the number of channels of the current-to-voltage conversion module matches the number of in-memory computing unit columns, the currents of all convolution results are converted synchronously. The parallel analog maximum value comparison module completes the max-pooling operation, so the whole "convolution → current → voltage → maximum value comparison" chain runs in the analog domain. This eliminates the analog-to-digital conversion between convolution and pooling, together with the additional power consumption and computational delay it causes.
In addition, no separate digital pooling unit is needed, which reduces hardware resource usage and makes the circuit architecture more compact, suiting the low-power, high-parallelism, and high-compactness requirements of edge AI inference chips.
Attached Figure Description
[0019] To more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0020] Figure 1 shows the timing and circuit of traditional analog in-memory convolution, where (a) is a schematic of the sliding convolution of a 3×3 convolution kernel over a 4×4 input feature map in a traditional analog in-memory architecture, (b) is the timing diagram of traditional analog in-memory convolution, and (c) is the hardware circuit diagram of traditional analog in-memory convolution.
Figure 2 is an architecture diagram of the analog-domain in-memory convolution-pooling circuit based on staggered weight deployment according to the present invention.
Figure 3 is a circuit diagram showing the analog input voltage signals corresponding to the 4×4 feature map, the 3×3 convolution kernel, and the in-memory computing array module of the present invention.
Figure 4 is a circuit schematic of the current-to-voltage conversion module and the parallel analog maximum value comparison module of the present invention.
Figure 5 is the timing diagram of the analog-domain in-memory convolution-pooling circuit based on staggered weight deployment according to the present invention.
Figure 6 is a flowchart of the steps of the analog-domain in-memory convolution-pooling method based on staggered weight deployment according to the present invention.
Detailed Implementation
[0021] The following specific examples illustrate the implementation of this application. Those skilled in the art can easily understand other advantages and effects of this application from the content disclosed in this specification. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. This application can also be implemented or applied through other different specific embodiments, and the details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of this application. It should be noted that, in the absence of conflict, the following embodiments and features in the embodiments can be combined with each other. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0022] It should be noted that various aspects of embodiments within the scope of the appended claims are described below. The aspects described herein can evidently be embodied in a wide variety of forms, and any particular structure and/or function described herein is merely illustrative. Based on this application, those skilled in the art will understand that any aspect described herein can be implemented independently of any other aspect, and that two or more of these aspects can be combined in various ways. For example, any number of the aspects set forth herein can be used to implement the device and/or practice the method. Additionally, the device and/or method can be implemented using structures and/or functionality other than one or more of the aspects set forth herein.
[0023] It should also be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of this application. The illustrations only show the components related to this application and are not drawn according to the number, shape and size of the components in actual implementation. In actual implementation, the form, quantity and proportion of each component can be arbitrarily changed, and the layout of the components may also be more complex.
[0024] Additionally, specific details are provided in the following description to facilitate a thorough understanding of the examples. However, those skilled in the art will understand that the examples may be practiced without these specific details.
[0025] The technical solutions provided by the various embodiments of this application are described below with reference to the accompanying drawings.
[0026] Example 1
[0027] Referring to Figures 2 to 5, the present invention provides an analog-domain in-memory convolution-pooling circuit based on staggered weight deployment, comprising a digital-to-analog conversion parallel input module, an in-memory computing array module, a current-to-voltage conversion module, and a parallel analog maximum value comparison module.
[0028] The output of the digital-to-analog conversion parallel input module is electrically connected to the input of the in-memory computing array module; it converts the digital signals of the input feature map into multiple analog input voltage signals and outputs them to the in-memory computing array module. The in-memory computing array module is adapted to a convolution kernel of preset size and includes multiple columns of in-memory computing units arranged in parallel. The convolution kernel slides over the feature space of the input feature map to form multiple sliding sub-windows; each column of in-memory computing units corresponds to one sliding sub-window, and the kernel weights are loaded in a staggered manner matching that sub-window. This allows multiple convolution operations to be completed in parallel within a single time period and multiple convolution result currents to be output. The number of channels of the current-to-voltage (IV) conversion module matches the number of in-memory computing unit columns. Its inputs are electrically connected to the outputs of the in-memory computing unit columns, and it synchronously converts the multiple convolution result currents into multiple analog voltage signals. The input of the parallel analog maximum value comparison module is electrically connected to the output of the current-to-voltage conversion module; it compares the multiple analog voltage signals in parallel in the analog domain and outputs the max-pooling result.
[0029] This invention builds an end-to-end analog-domain in-memory convolution-pooling fusion architecture that integrates a digital-to-analog conversion parallel input module, an in-memory computing array module, a current-to-voltage conversion module, and a parallel analog maximum value comparison module, abandoning the traditional serial sliding convolution mode and the fragmented "analog convolution + digital pooling" architecture. The digital-to-analog conversion parallel input module converts the feature-map digital signals synchronously and loads them in parallel. The in-memory computing array module, through the one-to-one correspondence between its parallel computing-unit columns and the sliding sub-windows of the convolution kernel combined with staggered weight deployment, completes multiple convolution operations in parallel within a single time period, significantly improving the parallelism and overall computational efficiency of convolution. Because the number of channels of the current-to-voltage conversion module precisely matches the number of in-memory computing unit columns, the currents of all convolution results are converted synchronously. The parallel analog maximum value comparison module completes the max-pooling operation, so the entire "convolution → current → voltage → maximum value comparison" chain is completed in the analog domain. This eliminates the analog-to-digital conversion between convolution and pooling, together with the additional power consumption and computational delay it causes.
In addition, no separate digital pooling unit is needed, which reduces hardware resource usage and makes the circuit architecture more compact, suiting the low-power, high-parallelism, and high-compactness requirements of edge AI inference chips.
[0030] In addition, the analog-domain in-memory convolution-pooling circuit based on staggered weight deployment of this invention further includes a convolution-pooling collaborative timing control module. Its timing signal output is electrically connected to the timing control inputs of the digital-to-analog conversion parallel input module, the in-memory computing array module, the current-to-voltage conversion module, and the parallel analog maximum value comparison module, to which it outputs synchronous timing trigger signals. This precisely coordinates the timing of each stage (parallel digital-to-analog input, in-array convolution, current-to-voltage conversion, and parallel analog maximum value comparison), ensures the synchronization and continuity of the entire convolution-pooling fusion chain, and further improves the stability and accuracy of circuit operation.
[0031] Furthermore, each column of the in-memory computing array module contains as many in-memory computing units as the digital-to-analog conversion parallel input module has output channels. The in-memory computing units with the same index in every column together form one row of the in-memory computing array module, and each row receives one analog input voltage signal from the digital-to-analog conversion parallel input module, so that every analog input voltage signal reaches its corresponding in-memory computing units synchronously and accurately. This provides a stable hardware structure for the columns to complete their convolution operations synchronously, ensuring the synchronization and accuracy of the parallel convolutions. The regular array structure also facilitates hardware integration, manufacturing, and mass production.
[0032] The number of in-memory computing unit columns in the in-memory computing array module equals the number of valid sliding sub-windows formed by the convolution kernel on the input feature map, giving a one-to-one correspondence between columns and sliding sub-windows. The weight positions in each column match the position offset of the corresponding sliding sub-window on the input feature map, which ensures that the staggered weights are placed correctly and that each column computes exactly the convolution of its own sub-window. In-memory computing units that do not participate in the convolution of their column's sub-window are set as placeholders with infinitesimal conductance. This shields the column from signals on unrelated inputs, preserving the accuracy of the convolution results, and avoids the power consumed by inactive units, further reducing the overall power consumption of the circuit and improving its power efficiency.
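Why the placeholders must have near-zero conductance can be seen numerically. In this hypothetical sketch, in normalized units, a placeholder with a finite off-conductance `g_off` leaks g_off × V_i into the column sum. The function name `column_current`, the input values, the unit weights, and `g_off` are illustrative assumptions, not device data from the patent.

```python
# Hypothetical illustration (normalized units): a placeholder with finite
# off-conductance g_off adds g_off * V_i to the column current.

def column_current(v_in, weights_by_row, g_off):
    """weights_by_row maps 1-based row -> weight conductance; other rows use g_off."""
    total = 0.0
    for row, v in enumerate(v_in, start=1):
        total += v * weights_by_row.get(row, g_off)
    return total


v_in = [1.0] * 16  # all 16 inputs at 1 (normalized)
# First column of the 4x4 / 3x3 example, every weight set to 1 (illustrative)
col1 = {r: 1.0 for r in (1, 2, 3, 5, 6, 7, 9, 10, 11)}
ideal = column_current(v_in, col1, g_off=0.0)
leaky = column_current(v_in, col1, g_off=0.01)
print(ideal, leaky - ideal)  # ideal sum, then leakage from 7 placeholder rows
```

With all inputs at 1 and g_off = 0.01, the seven placeholder rows of the first column add about 0.07 to an ideal sum of 9, an error of roughly 0.8 %.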
[0033] As shown in Figures 3 to 5, a 4×4 input feature map is used below as an example to further explain the in-memory computing array module.
[0034] In this embodiment, the digital-to-analog conversion parallel input module is a 16-channel parallel digital-to-analog converter (DAC) that converts the digital signals of the 4×4 feature map into 16 analog input voltage signals (V1~V16), providing synchronous analog inputs for the subsequent parallel convolution operations.
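The 16-channel input stage can be sketched as below. Both the ideal linear n-bit DAC transfer function and the parameters `N_BITS` and `V_REF` are hypothetical assumptions, as is the function name `dac_parallel`; the row-major flattening (pixel (r, c) drives V(4r + c + 1), with 0-based r and c) follows from the input-node lists in [0040] to [0042].

```python
# Sketch of the 16-channel parallel input stage, assuming a hypothetical ideal
# linear n-bit DAC (V = code / (2**n - 1) * V_REF) and row-major flattening.

N_BITS = 8    # hypothetical DAC resolution
V_REF = 0.2   # hypothetical full-scale voltage in volts


def dac_parallel(fmap_codes, n_bits=N_BITS, v_ref=V_REF):
    """Convert a 2-D array of digital codes into V1..V(H*W), row-major."""
    full_scale = 2 ** n_bits - 1
    return [code / full_scale * v_ref for row in fmap_codes for code in row]


codes = [[0, 17, 34, 51],
         [68, 85, 102, 119],
         [136, 153, 170, 187],
         [204, 221, 238, 255]]
volts = dac_parallel(codes)
print(len(volts), volts[0], volts[-1])
```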
[0035] In this embodiment, the convolution kernel has a size of 3×3, and its nine independent weights are denoted W1~W9. The convolution kernel forms four valid sliding sub-windows on the 4×4 input feature map, located at its top-left, top-right, bottom-left, and bottom-right corners (as shown in Figure 1(a)).
[0036] In this embodiment, the in-memory computing array module is a 16-row × 4-column array: it contains four columns of in-memory computing units, each with 16 units, so the four columns together form 16 rows. The four columns correspond one-to-one with the four 3×3 sliding sub-windows of the 4×4 input feature map.
[0037] Referring to Figure 3, the 16 analog input voltage signals (V1~V16) output by the digital-to-analog conversion parallel input module are connected one-to-one and synchronously to the 16 input terminals IN1~IN16 of the in-memory computing array module with staggered weight deployment (V1=IN1, V2=IN2, …, V16=IN16). The four in-memory computing unit columns load the weights of the 3×3 convolution kernel through staggered weight deployment with placeholders, so that four 3×3 convolutions are performed in parallel. The outputs of the four columns directly form the four candidate values for max pooling.
[0038] Table 1
[0039] Table 1 shows the weight distribution of the 16-row × 4-column in-memory computing array. In Table 1, "N" indicates that the conductance of the memory device at that position is set to an infinitesimal value, so the device is equivalent to an open circuit and does not participate in the multiply-accumulate operation. It serves only as a placeholder that realizes the weight staggering and ensures that the weights of each column align exactly with the input nodes of the corresponding 3×3 sliding sub-window.
[0040] The staggered weight deployment of the four in-memory computing unit columns corresponds one-to-one with the four 3×3 sliding sub-windows of the 4×4 input. Each column activates only the nine valid in-memory computing devices of its sliding sub-window; all other positions are N placeholders. The deployment logic is as follows. The first column matches the top-left 3×3 sliding sub-window of the 4×4 input feature map, whose input nodes are IN1~IN3, IN5~IN7, and IN9~IN11. In this column, weights W1~W3 are deployed in rows 1~3, W4~W6 in rows 5~7, and W7~W9 in rows 9~11. The remaining rows (rows 4, 8, and 12~16) are set as infinitesimal-conductance placeholders so that the gap rows between the weight groups and the unrelated rows below the sliding sub-window do not participate in the calculation.
[0041] The second column matches the top-right 3×3 sliding sub-window of the 4×4 input feature map, whose input nodes are IN2~IN4, IN6~IN8, and IN10~IN12. In this column, weights W1~W3 are deployed in rows 2~4, W4~W6 in rows 6~8, and W7~W9 in rows 10~12. The remaining rows (rows 1, 5, 9, and 13~16) are set as infinitesimal-conductance placeholders so that the gap rows to the left of the sliding sub-window and the unrelated rows below it do not participate in the calculation.
[0042] The third column matches the bottom-left 3×3 sliding sub-window of the 4×4 input feature map, whose input nodes are IN5~IN7, IN9~IN11, and IN13~IN15. In this column, weights W1~W3 are deployed in rows 5~7, W4~W6 in rows 9~11, and W7~W9 in rows 13~15. The remaining rows (rows 1~4, 8, 12, and 16) are set as infinitesimal-conductance placeholders so that the unrelated rows above the sliding sub-window and the gap rows between the weight groups do not participate in the calculation.
[0043] The fourth column matches the bottom-right 3×3 sliding sub-window of the 4×4 input feature map, whose input nodes are IN6~IN8, IN10~IN12, and IN14~IN16. In this column, weights W1~W3 are deployed in rows 6~8, W4~W6 in rows 10~12, and W7~W9 in rows 14~16. The remaining rows (rows 1~5, 9, and 13) are set as infinitesimal-conductance placeholders so that the unrelated rows above the sliding sub-window and the gap rows between the weight groups do not participate in the calculation.
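The deployment rules for the four columns can be reproduced mechanically, which also gives one way to regenerate a Table-1-style layout. This is a sketch under the same row-major numbering assumption; `staggered_layout` is a hypothetical helper, and the labels mirror the text ("Wn" for a deployed weight, "N" for an infinitesimal-conductance placeholder).

```python
def staggered_layout(h=4, w=4, k=3, stride=1):
    """Rows = input nodes IN1..IN{h*w}; columns = sliding sub-windows.
    Each cell holds 'W1'..'W{k*k}' where a kernel weight is deployed, else 'N'."""
    origins = [(top, left)
               for top in range(0, h - k + 1, stride)
               for left in range(0, w - k + 1, stride)]
    layout = [["N"] * len(origins) for _ in range(h * w)]
    for col, (top, left) in enumerate(origins):
        for i in range(k):
            for j in range(k):
                row = (top + i) * w + (left + j)         # 0-based array row
                layout[row][col] = f"W{i * k + j + 1}"   # row-major kernel index
    return layout


table = staggered_layout()
print("Row " + " ".join(f"Col{c + 1}" for c in range(len(table[0]))))
for r, cells in enumerate(table, start=1):
    print(f"{r:>3} " + " ".join(f"{cell:>4}" for cell in cells))
```

Each column ends up with nine deployed weights and seven placeholders, at exactly the rows listed in [0040] to [0043].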
[0044] Triggered by the synchronous timing signal of the convolution-pooling collaborative timing control module, the in-memory computing array module with staggered weight deployment starts parallel operation. Each in-memory unit multiplies its input analog voltage by its weight according to Ohm's law (I = V × G, where G is the conductance of the device representing the weight), and the currents generated by the 9 active devices in each column are summed at the column output according to Kirchhoff's current law, realizing the multiply-accumulate operation of a 3×3 convolution. All 4 columns perform this operation simultaneously, so within a single timing cycle (as shown in Figure 5) the array outputs four convolution result currents (I1~I4) in parallel, corresponding respectively to the convolution results of the 3×3 sliding sub-windows at the upper-left, upper-right, lower-left, and lower-right of the 4×4 input feature map.
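The multiply-accumulate behavior can be checked with an idealized model: each cell contributes I = V × G, each column sums its cell currents, and the result should equal a direct 3×3 sliding-window convolution. The sketch assumes ideal devices, treats the "N" placeholders as exactly 0 rather than merely infinitesimal, and uses dimensionless example values; all function names are hypothetical.

```python
def deploy(kernel, h=4, w=4, stride=1):
    """Build the (h*w) x n_windows conductance matrix; 'N' cells are modeled as G = 0."""
    k = len(kernel)
    columns = []
    for top in range(0, h - k + 1, stride):
        for left in range(0, w - k + 1, stride):
            col = [0.0] * (h * w)
            for i in range(k):
                for j in range(k):
                    col[(top + i) * w + (left + j)] = kernel[i][j]
            columns.append(col)
    return [list(row) for row in zip(*columns)]  # transpose: rows = input nodes


def column_currents(voltages, conductances):
    """Ohm's law per cell (I = V * G), Kirchhoff's current law per column (sum)."""
    n_cols = len(conductances[0])
    return [sum(v * row[c] for v, row in zip(voltages, conductances))
            for c in range(n_cols)]


def direct_conv(x, kernel, stride=1):
    """Reference: valid 2-D sliding-window convolution (CNN-style), no padding."""
    k, h, w = len(kernel), len(x), len(x[0])
    return [sum(x[top + i][left + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for top in range(0, h - k + 1, stride)
            for left in range(0, w - k + 1, stride)]


x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
v_in = [value for row in x for value in row]      # IN1..IN16, row-major
currents = column_currents(v_in, deploy(kernel))
print(currents)  # [30.0, 35.0, 50.0, 55.0]
print(currents == direct_conv(x, kernel))  # True
```

The four column currents come out in the order upper-left, upper-right, lower-left, lower-right, matching I1~I4.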
[0045] In this embodiment, the current-to-voltage conversion module includes four first operational amplifiers. Each has a feedback resistor connected in the feedback loop between its inverting input and its output; its inverting input is electrically connected to the output of one in-memory computing unit column of the in-memory computing array module, and its non-inverting input is grounded.
[0046] The four convolution result currents (I1~I4) output by the in-memory computing array module are transmitted synchronously to the current-to-voltage (I-V) conversion module, where each of the four first operational amplifiers, together with its feedback resistor, receives one of I1~I4. Through the current-to-voltage conversion of the first operational amplifiers, the current signals are linearly converted into analog voltage signals (V1~V4). This adapts the signal format and provides matching inputs for the subsequent analog-domain maximum value comparison.
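For an ideal first operational amplifier with its non-inverting input grounded, the inverting input sits at virtual ground and the column current flows through the feedback resistor, so V_out = −I × R_f. The sketch uses a hypothetical 10 kΩ feedback resistor and hypothetical currents. Note that a purely inverting stage reverses the ordering of the signals, so the largest current gives the most negative voltage; the text does not state how polarity is arranged ahead of the maximum comparison, so this is shown only as a property of the ideal inverting configuration.

```python
def tia_output(current_a, r_feedback_ohm):
    """Ideal inverting transimpedance stage: non-inverting input grounded,
    inverting input at virtual ground, so V_out = -I * R_f."""
    return -current_a * r_feedback_ohm


R_F = 10e3                                   # hypothetical 10 kOhm feedback resistor
currents = [30e-6, 35e-6, 50e-6, 55e-6]      # hypothetical column currents I1~I4 (A)
voltages = [tia_output(i, R_F) for i in currents]
print([round(v, 3) for v in voltages])       # linear, but sign-inverted
```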
[0047] In this embodiment, the parallel analog maximum value comparison module includes four second operational amplifiers. Each has a diode connected in series in the feedback loop between its inverting input and its output, and its non-inverting input is electrically connected to the output of the corresponding first operational amplifier in the current-to-voltage conversion module. The module compares the analog voltage signals output by the current-to-voltage conversion module in parallel; through the voltage clamping effect of the diodes, only the largest of these voltages is passed to the output, which yields the max-pooling result. The max-pooling result can then be sampled by an analog-to-digital converter (ADC), converted into a digital signal, and fed to the MCU/FPGA for the next round of neural network computation. At the same time, the convolution-pooling collaborative timing control module triggers the digital-to-analog conversion parallel input module to load the next batch of 4×4 feature map data, starting a new round of fused convolution-pooling operation.
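The max-pooling stage can be summarized behaviorally. Assuming ideal op-amps and ideal diodes, and assuming the four stages share one output node, only the stage with the highest input forward-biases its diode; the op-amp loop absorbs the diode drop, so the shared output equals the largest input, while every other diode is reverse-biased. This is a behavioral sketch rather than a circuit simulation, and `analog_max_pool` is a hypothetical name.

```python
def analog_max_pool(voltages):
    """Behavioral model (ideal op-amps and diodes) of the diode-clamped max stage.
    Returns the output-node voltage, the index of the driving stage, and the
    indices of stages whose diodes are reverse-biased."""
    node = max(voltages)             # only the largest input forward-biases its diode
    winner = voltages.index(node)
    # Every stage whose input is below the node voltage is cut off by its diode.
    reverse_biased = [i for i, v in enumerate(voltages) if v < node]
    return node, winner, reverse_biased


print(analog_max_pool([0.30, 0.35, 0.50, 0.55]))  # (0.55, 3, [0, 1, 2])
```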
[0048] This invention completes the entire chain of fused convolution and pooling operations in the analog domain, performing 4 convolutions and the pooling operation in a single cycle. Compared with the conventional approach of computing the four convolutions successively over 4 cycles, computational efficiency is four times higher. This solves the problem of insufficient parallelism in conventional solutions, significantly reduces the power consumption and latency of the pooling operation, and makes full use of the low-power advantage of the analog in-memory computing architecture.
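The fourfold figure comes from counting sub-windows. A short worked count, assuming no padding, an H×W input, a K×K kernel, stride S, and (as the text describes) a conventional scheme that computes one sub-window per cycle:

```latex
\begin{aligned}
N_{\mathrm{win}} &= \left(\left\lfloor \tfrac{H-K}{S} \right\rfloor + 1\right)\left(\left\lfloor \tfrac{W-K}{S} \right\rfloor + 1\right)
  = \left(\tfrac{4-3}{1} + 1\right)\left(\tfrac{4-3}{1} + 1\right) = 4,\\
\text{array size} &= (H \cdot W)\ \text{rows} \times N_{\mathrm{win}}\ \text{columns} = 16 \times 4,\\
\text{speed-up} &= \frac{N_{\mathrm{win}}\ \text{cycles (one window per cycle)}}{1\ \text{cycle (all windows in parallel)}} = 4.
\end{aligned}
```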
[0049] Embodiment 2
[0050] As shown in Figure 6, the present invention also provides an analog-domain in-memory convolutional pooling method based on staggered weight deployment, which is implemented with the analog-domain in-memory convolutional pooling circuit based on staggered weight deployment described in Embodiment 1 and includes the following steps. S1, Weight Pre-deployment: based on the input feature map size, the convolution kernel size and sliding stride are preset, and in each column of the in-memory computing array module the weights of the convolution kernel are loaded in a staggered arrangement that matches the corresponding sliding sub-window.
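Step S1 generalizes the 4×4 case: for an H×W input, a K×K kernel and stride S, each valid sub-window gets its own column, and each kernel weight is placed on the row of the input node it multiplies. The sketch below is one way to compute that layout. It assumes row-major node numbering and valid (unpadded) windows, and models placeholders as 0; `deploy_weights` is a hypothetical helper, and the stride-2 example goes beyond the 4×4 embodiment purely for illustration.

```python
def deploy_weights(kernel, h, w, stride=1, placeholder=0.0):
    """Return an (h*w) x n_windows matrix. Column c holds the kernel weights at
    the rows (input nodes) covered by sub-window c, numbered row-major; every
    other row holds the placeholder ('N', near-zero conductance)."""
    k = len(kernel)
    origins = [(top, left)
               for top in range(0, h - k + 1, stride)
               for left in range(0, w - k + 1, stride)]
    matrix = [[placeholder] * len(origins) for _ in range(h * w)]
    for c, (top, left) in enumerate(origins):
        for i in range(k):
            for j in range(k):
                matrix[(top + i) * w + (left + j)][c] = kernel[i][j]
    return matrix


k3 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # W1..W9 as hypothetical values
m = deploy_weights(k3, 5, 5, stride=2)
print(len(m), "rows x", len(m[0]), "columns")
print("IN13 carries", m[12][0], "in column 1 and", m[12][3], "in column 4")
```

For a 5×5 input with stride 2 this yields a 25×4 matrix, and the centre node IN13 carries W9 in the first column and W1 in the fourth, showing that one input row can feed several columns with different weights.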
[0051] S2, Synchronous Input Conversion: The digital-to-analog conversion parallel input module converts the input feature map digital signal into multiple analog input voltage signals, which are loaded in parallel onto the corresponding rows of the in-memory computing array module.
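Step S2 can be modeled with an ideal unipolar DAC per channel. The resolution (8 bits) and full-scale voltage (0.2 V) below are hypothetical placeholders, since the text does not specify them; the row-major flattening matches the IN1~IN16 assignment used in the 4×4 embodiment.

```python
def dac_convert(feature_map, bits=8, v_ref=0.2):
    """Ideal unipolar DAC sketch: code n -> n / (2**bits - 1) * v_ref volts.
    Element (r, c) of an H x W map drives input node IN{r*W + c + 1}."""
    full_scale = 2 ** bits - 1
    voltages = []
    for row in feature_map:          # row-major flattening
        for code in row:
            if not 0 <= code <= full_scale:
                raise ValueError(f"code {code} outside 0..{full_scale}")
            voltages.append(code / full_scale * v_ref)
    return voltages


frame = [[0, 64, 128, 255],
         [32, 96, 160, 224],
         [16, 80, 144, 208],
         [48, 112, 176, 240]]        # hypothetical 8-bit 4x4 feature map
v = dac_convert(frame)
print(len(v), v[0], v[3])            # 16 analog voltages for IN1..IN16
```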
[0052] S3, Single-cycle Parallel Convolution: Each column of the in-memory computing array module synchronously multiplies the corresponding analog input voltage signals by the weights and accumulates the products within the column, so that the multiple convolution result currents are output in parallel within a single time cycle.
[0053] In each in-memory computing unit column, each unit multiplies the corresponding analog input voltage signal by its weight according to Ohm's law, and the currents generated by the units in the same column are summed at the column output terminal according to Kirchhoff's current law, completing the multiply-accumulate operation.
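Ohm's law and Kirchhoff's current law only produce the weighted sum if the weights are encoded as conductances. The text does not state the mapping, so the sketch assumes a simple linear one (the largest weight maps to a hypothetical g_max of 100 µS) and non-negative weights. Under that assumption the column current equals the weighted sum of the input voltages scaled by g_max / w_max.

```python
def weights_to_conductance(weights, g_max=100e-6):
    """Hypothetical linear mapping: the largest weight maps to g_max (siemens).
    Assumes non-negative weights; a zero weight maps to 0 S, like an 'N' placeholder."""
    w_max = max(weights)
    if w_max <= 0 or min(weights) < 0:
        raise ValueError("sketch assumes non-negative weights with a positive maximum")
    return [wt / w_max * g_max for wt in weights]


def column_current(voltages, conductances):
    """Ohm's law in each cell, Kirchhoff's current law at the shared column node."""
    cell_currents = [v * g for v, g in zip(voltages, conductances)]  # I = V * G
    return sum(cell_currents)


weights = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # W1..W9 (hypothetical values)
inputs = [0.1] * 9                      # 0.1 V on each active row
g = weights_to_conductance(weights)
i_col = column_current(inputs, g)
weighted_sum = sum(w * v for w, v in zip(weights, inputs))
print(i_col, weighted_sum * 100e-6 / 9)  # equal up to floating-point rounding
```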
[0054] S4, Signal Format Conversion: The current-to-voltage conversion module synchronously converts the multiple convolution result currents into multiple analog voltage signals and outputs them to the parallel analog maximum value comparison module.
[0055] S5, Analog Domain Pooling Output: The parallel analog maximum value comparison module performs parallel comparison of multiple analog voltage signals in the analog domain and outputs the maximum pooling result.
[0056] After step S5 completes the output of the max pooling result, the max pooling result can be acquired by the analog-to-digital converter (ADC) and converted into a digital signal, which is then input into the MCU / FPGA to enter the next round of neural network operation. At the same time, the convolution-pooling collaborative timing control module triggers the digital-to-analog conversion parallel input module to update the next batch of feature map data, starting a new round of convolution-pooling fusion operation.
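Steps S2 to S5 can be chained into one behavioral cycle. Everything here is an assumption for illustration: ideal components, a hypothetical DAC resolution and full-scale voltage, a hypothetical unit conductance (10 µS per weight unit) and feedback resistance (10 kΩ), and an I-V stage whose effective polarity makes larger currents produce larger voltages, which the text does not specify. The ADC read-out and the MCU/FPGA hand-off are not modeled.

```python
def conv_pool_cycle(codes, kernel, bits=8, v_ref=0.2, g_unit=10e-6, r_f=10e3):
    """One behavioral convolution-pooling cycle (S2-S5), valid windows, stride 1.
    All electrical parameters are hypothetical and every component is ideal."""
    h, w, k = len(codes), len(codes[0]), len(kernel)
    full_scale = 2 ** bits - 1
    # S2: ideal DAC, row-major onto IN1..IN{h*w}
    v_in = [code / full_scale * v_ref for row in codes for code in row]
    # S1 (assumed pre-deployed): weight -> weight * g_unit siemens; 'N' -> 0 S
    origins = [(top, left) for top in range(h - k + 1) for left in range(w - k + 1)]
    g = [[0.0] * len(origins) for _ in range(h * w)]
    for c, (top, left) in enumerate(origins):
        for i in range(k):
            for j in range(k):
                g[(top + i) * w + (left + j)][c] = kernel[i][j] * g_unit
    # S3: Ohm's law per cell, Kirchhoff summation per column -> I1..In
    currents = [sum(v_in[r] * g[r][c] for r in range(h * w))
                for c in range(len(origins))]
    # S4: I-V stage, polarity assumed such that V = I * R_f
    volts = [i_col * r_f for i_col in currents]
    # S5: analog max -> pooled voltage and the index of the winning sub-window
    pooled = max(volts)
    return pooled, volts.index(pooled)


codes = [[4 * r + c for c in range(4)] for r in range(4)]  # hypothetical codes 0..15
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
pooled, window = conv_pool_cycle(codes, ones)
print(f"pooled = {pooled:.6f} V from sub-window {window + 1}")
```

With the codes increasing toward the lower-right corner and an all-ones kernel, the pooled output comes from sub-window 4 (lower-right).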
[0057] This embodiment of the analog-domain in-memory convolutional pooling method based on staggered weight deployment is described for a 4×4 input feature map. The method is implemented with the analog-domain in-memory convolutional pooling circuit based on staggered weight deployment described above. The specific hardware structure of the circuit, the electrical connections between the modules, and the staggered weight deployment of the in-memory computing array module have already been described in detail, so the circuit-related content is not repeated in this embodiment.
[0058] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. An analog-domain in-memory convolutional pooling circuit based on staggered weight deployment, characterized in that it comprises a digital-to-analog conversion parallel input module, an in-memory computing array module, a current-to-voltage conversion module, and a parallel analog maximum value comparison module; the output of the digital-to-analog conversion parallel input module is electrically connected to the input of the in-memory computing array module and is used to convert the input feature map digital signal into multiple analog input voltage signals and output them to the in-memory computing array module; the in-memory computing array module is adapted to a convolution kernel of preset size and includes multiple parallel columns of in-memory computing units; the convolution kernel slides over the feature space corresponding to the input feature map to form multiple sliding sub-windows, each column of in-memory computing units corresponds to one sliding sub-window, and the weights of the convolution kernel are loaded in a staggered arrangement that matches the corresponding sliding sub-window, so as to complete multiple convolution operations in parallel within a single time period and output multiple convolution result currents; the number of channels of the current-to-voltage conversion module matches one-to-one the number of in-memory computing unit columns of the in-memory computing array module, and its input terminals are electrically connected to the output terminals of the respective in-memory computing unit columns, so as to synchronously convert the multiple convolution result currents output by the in-memory computing array module into multiple analog voltage signals.
The input terminal of the parallel analog maximum value comparison module is electrically connected to the output terminal of the current-to-voltage conversion module, and is used to perform parallel comparison of the multiple analog voltage signals output by the current-to-voltage conversion module in the analog domain, and output the maximum pooling result.
2. The analog-domain in-memory convolutional pooling circuit based on staggered weight deployment according to claim 1, characterized in that each column of the in-memory computing array module contains as many in-memory computing units as the parallel input digital-to-analog converter module has output channels; the in-memory computing units with the same serial number in each column together form one row of the in-memory computing array module, and each row of the array receives one of the analog input voltage signals output by the parallel input digital-to-analog converter module.
3. The analog-domain in-memory convolutional pooling circuit based on staggered weight deployment according to claim 1, characterized in that the number of in-memory computing unit columns in the in-memory computing array module equals the number of valid sliding sub-windows formed by the convolution kernel on the input feature map; the weight positions in each column of in-memory computing units match the positional offset of the corresponding sliding sub-window on the input feature map, and the in-memory computing units that do not participate in the convolution of the corresponding sliding sub-window are set as infinitesimal-conductance placeholders.
4. The analog-domain in-memory convolutional pooling circuit based on staggered weight deployment according to claim 3, characterized in that the parallel input digital-to-analog converter module is a 16-channel parallel digital-to-analog converter that converts the digital signal of a 4×4 feature map into 16 analog input voltage signals; the in-memory computing array module is a 16-row × 4-column array adapted to a 3×3 convolution kernel, and its 4 in-memory computing unit columns correspond one-to-one with the four 3×3 sliding sub-windows of the 4×4 input feature map.
5. The analog-domain in-memory convolutional pooling circuit based on staggered weight deployment according to claim 1, characterized in that the nine independent weights of the convolution kernel are denoted W1~W9, wherein: the in-memory computing unit column in the first column matches the 3×3 sliding sub-window in the upper-left corner of the 4×4 input feature map, and in this column weights W1~W3 are deployed in rows 1~3, W4~W6 in rows 5~7, and W7~W9 in rows 9~11, with the remaining rows set as infinitesimal-conductance placeholders; the in-memory computing unit column in the second column matches the 3×3 sliding sub-window in the upper-right corner of the 4×4 input feature map, and in this column weights W1~W3 are deployed in rows 2~4, W4~W6 in rows 6~8, and W7~W9 in rows 10~12, with the remaining rows set as infinitesimal-conductance placeholders; the in-memory computing unit column in the third column matches the 3×3 sliding sub-window in the lower-left corner of the 4×4 input feature map, and in this column weights W1~W3 are deployed in rows 5~7, W4~W6 in rows 9~11, and W7~W9 in rows 13~15, with the remaining rows set as infinitesimal-conductance placeholders; the in-memory computing unit column in the fourth column matches the 3×3 sliding sub-window in the lower-right corner of the 4×4 input feature map, and in this column weights W1~W3 are deployed in rows 6~8, W4~W6 in rows 10~12, and W7~W9 in rows 14~16, with the remaining rows set as infinitesimal-conductance placeholders.
6. The analog-domain in-memory convolutional pooling circuit based on staggered weight deployment according to claim 1, characterized in that the current-to-voltage conversion module includes multiple first operational amplifiers, each with a feedback resistor connected in series in its feedback loop, and the input terminal of each first operational amplifier is electrically connected to the output terminal of one in-memory computing unit column of the in-memory computing array module.
7. The analog-domain in-memory convolutional pooling circuit based on staggered weight deployment according to claim 6, characterized in that the parallel analog maximum value comparison module includes multiple second operational amplifiers, each with a diode connected in series in its feedback loop, and the input terminal of each second operational amplifier is electrically connected to the output terminal of the corresponding first operational amplifier in the current-to-voltage conversion module; the module performs parallel comparison of the multiple analog voltage signals output by the current-to-voltage conversion module, and through the voltage clamping effect of the diodes only the largest of these signals is passed to the output, so as to obtain the maximum pooling result.
8. The analog-domain in-memory convolutional pooling circuit based on staggered weight deployment according to claim 1, characterized in that it further includes a convolution-pooling collaborative timing control module, whose timing signal output terminal is electrically connected to the timing control terminals of the digital-to-analog conversion parallel input module, the in-memory computing array module, the current-to-voltage conversion module, and the parallel analog maximum value comparison module, respectively, and is used to output synchronous timing trigger signals.
9. An analog-domain in-memory convolutional pooling method based on staggered weight deployment, characterized in that it is implemented using the analog-domain in-memory convolutional pooling circuit based on staggered weight deployment according to any one of claims 1-8 and comprises the following steps: S1, based on the input feature map size, presetting the convolution kernel size and sliding stride, and loading the weights of the convolution kernel in each column of the in-memory computing array module in a staggered arrangement that matches the corresponding sliding sub-window; S2, converting, by the digital-to-analog conversion parallel input module, the input feature map digital signal into multiple analog input voltage signals and loading them in parallel onto the corresponding rows of the in-memory computing array module; S3, synchronously multiplying, in each column of the in-memory computing array module, the corresponding analog input voltage signals by the weights and accumulating the products within the column, so that multiple convolution result currents are output in parallel within a single time period; S4, synchronously converting, by the current-to-voltage conversion module, the multiple convolution result currents into multiple analog voltage signals and outputting them to the parallel analog maximum value comparison module; S5, comparing, by the parallel analog maximum value comparison module, the multiple analog voltage signals in parallel in the analog domain and outputting the maximum pooling result.
10. The analog-domain in-memory convolutional pooling method based on staggered weight deployment according to claim 9, characterized in that, in step S3, each memory cell in each memory cell column multiplies the corresponding analog input voltage signal by its weight according to Ohm's law, and the currents generated by the memory cells in the same memory cell column are summed at the output terminal according to Kirchhoff's current law, completing the multiply-accumulate operation.