Digital in-memory computing method and system based on cache and lookup table

By separating the high and low bits of the feature map and combining caching with lookup tables, the method addresses the exponential growth in cache area and energy consumption that would otherwise be needed to maintain the cache hit rate as the data bit width grows, and realizes low-power, small-area in-memory multiply-accumulate computation.

CN122245368A · Pending · Publication Date: 2026-06-19 · TSINGHUA UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
TSINGHUA UNIVERSITY
Filing Date
2026-03-13
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing in-memory computing circuits based on static random access memory (SRAM) caches, the distribution of hot data grows exponentially as the data bit width increases linearly. Maintaining the original cache hit rate then requires an exponential increase in SRAM area and energy consumption, which makes the traditional cache-based computing architecture unsuitable.

Method used

A digital in-memory computation method based on caching and lookup tables is adopted. The high and low bits of the feature map are separated; the multiplication result of the high-bit part is obtained from a cache, and the multiplication result of the low-bit part is obtained from a lookup table. Finally, the two results are shifted and added to obtain the target multiplication result.

Benefits of technology

It achieves a high hit rate with a small-area cache and high utilization of the lookup table, reduces computing power consumption and circuit area, and is suitable for energy-efficient in-memory multiply-accumulate computation.

✦ Generated by Eureka AI based on patent content.

Figure CN122245368A_ABST

Abstract

This invention provides a digital in-memory computation method and system based on caching and lookup tables. The method includes: for each feature map to be computed, separating the high and low bits of the feature map to obtain a high-bit portion and a low-bit portion; using a cache, obtaining a first multiplication result of the high-bit portion and the model weights corresponding to the feature map, and using a lookup table, obtaining a second multiplication result of the low-bit portion and the model weights; shifting and adding the first and second multiplication results to obtain the target multiplication result corresponding to the feature map. Based on the unique data characteristics of MSB and LSB, this invention achieves high hit rates with a small cache area and high utilization with lookup tables, thus realizing a low-power in-memory multiplier.

Description

Technical Field

[0001] This invention relates to the field of high-efficiency multiply-accumulate calculation technology, and in particular to a digital in-memory calculation method and system based on caching and lookup tables.

Background Technology

[0002] Related literature provides an in-memory computing circuit that caches multiplication results in static random access memory (SRAM). It exploits the good data locality of activation values in artificial intelligence (AI) vision tasks under an 8-bit data format and caches hot multiplication results to achieve an energy-efficient in-memory computing circuit.

[0003] However, the limitation of the aforementioned in-memory computing circuit is that as the data bit width increases linearly, the distribution of hot data increases exponentially. To maintain the original cache hit rate, the area and power consumption of the static random access memory also increase exponentially, making the original cache-based computing architecture no longer applicable. Therefore, an effective solution is urgently needed to address these issues.

Summary of the Invention

[0004] To address the aforementioned technical problems, this invention provides a digital in-memory computation method and system based on caching and lookup tables.

[0005] This invention provides a digital in-memory computation method based on caching and lookup tables, comprising: for each feature map to be calculated, performing high-low bit separation on the feature map to obtain the high-bit portion and the low-bit portion of the feature map; using a cache to obtain the first multiplication result of the high-bit portion and the model weights corresponding to the feature map, and using a lookup table to obtain the second multiplication result of the low-bit portion and the model weights; and shifting and adding the first multiplication result and the second multiplication result to obtain the target multiplication result corresponding to the feature map.

[0006] According to the present invention, a digital in-memory computation method based on caching and lookup tables is provided, wherein obtaining the first multiplication result of the high-bit portion and the model weights corresponding to the feature map by means of caching includes: searching the cache for the first multiplication result of the high-bit portion and the model weight; if the search succeeds, reading the first multiplication result; if the search fails, calculating the first multiplication result of the high-bit portion and the model weight with a multiplier based on the lookup table, and writing the first multiplication result into the cache.

[0007] According to the present invention, a digital in-memory computation method based on caching and lookup tables is provided, wherein obtaining the second multiplication result of the low-bit portion and the model weights using a lookup table includes: based on the decoder of the lookup table, searching the 11 multiplication results stored in the memory of the lookup table for the second multiplication result of the low-bit portion and the model weight; if the search succeeds, reading the second multiplication result; if the search fails, shifting the 11 multiplication results with the shifter of the lookup table to obtain 5 supplementary shifted multiplication results, and determining the second multiplication result of the low-bit portion and the model weights from these 5 multiplication results.

[0008] According to the present invention, a digital in-memory computation method based on caching and lookup tables is provided, wherein, before performing high-low bit separation on each feature map to be computed to obtain the high-bit portion and low-bit portion of the feature map, the method further includes: determining the feature map index of each feature map; aligning the feature maps based on their respective feature map indices; and, for each aligned feature map, performing the high-low bit separation to obtain the high-bit portion and low-bit portion of the feature map, and then performing the subsequent steps.

[0009] According to the present invention, a digital in-memory computation method based on caching and lookup tables is provided, wherein aligning the feature maps based on the feature map indices includes: finding the maximum feature map index among the feature map indices; and, for each feature map, shifting the feature map based on its feature map index and the maximum feature map index to obtain an aligned feature map.

[0010] According to the present invention, a digital in-memory computation method based on caching and lookup tables, wherein shifting and adding the first multiplication result and the second multiplication result to obtain the target multiplication result corresponding to the feature map includes: The first multiplication result and the second multiplication result are shifted and added together to obtain the initial multiplication result corresponding to the feature map; Based on the weight index of the model weights corresponding to the feature map, the initial multiplication result is aligned to obtain the target multiplication result.

[0011] The present invention also provides a digital in-memory computing system based on caching and lookup tables, comprising N input channels, N control modules, N computing core modules, and N output channels, where N is a positive integer. The i-th input channel is used to receive the i-th feature map to be calculated, where i is a positive integer less than or equal to N. Each of the control modules and each of the computing core modules are jointly used to: perform high-low bit separation on each feature map to obtain the high-bit portion and the low-bit portion of the feature map; obtain, using a cache, the first multiplication result of the high-bit portion and the model weights corresponding to the feature map, and obtain, using a lookup table, the second multiplication result of the low-bit portion and the model weights; and shift and add the first multiplication result and the second multiplication result to obtain the target multiplication result corresponding to the feature map. The i-th output channel is used to output the target multiplication result corresponding to the i-th feature map.

[0012] According to the present invention, a digital in-memory computing system based on caching and lookup tables is provided, wherein the control module includes a cache controller and a decoder for the lookup table, and the computing core module includes N multipliers based on the lookup table. For the i-th feature map: the cache controller in the i-th control module is specifically used to search the cache for the first multiplication result of the high-bit portion and the model weight and, if the search fails, to input the high-bit portion into the decoder; the decoder in the i-th control module is specifically used to generate, based on the input high-bit portion, a first control signal for the i-th multiplier based on the lookup table; the i-th multiplier based on the lookup table in each of the computing core modules is specifically used to calculate the first multiplication result of the high-bit portion and the model weight based on the first control signal, and to write the first multiplication result into the cache; the cache controller in the i-th control module is specifically used to generate a cache read signal corresponding to the high-bit portion after the first multiplication result is written into the cache; the decoder in the i-th control module is specifically used to generate, after the cache read signal is generated, a second control signal for the i-th multiplier based on the input low-bit portion; the i-th multiplier based on the lookup table in each of the computing core modules is specifically used to calculate the second multiplication result of the low-bit portion and the model weight based on the second control signal, to read the first multiplication result from the cache based on the cache read signal, and to shift and add the first multiplication result and the second multiplication result to obtain the target multiplication result corresponding to the i-th feature map.

[0013] According to the present invention, a digital in-memory computing system based on caching and lookup tables is provided, wherein the control module includes a cache controller and a decoder for the lookup table, and the computing core module includes N multipliers based on the lookup table. For the i-th feature map: the cache controller in the i-th control module is specifically used to search the cache for the first multiplication result of the high-bit portion and the model weight and, if the search succeeds, to generate a cache read signal corresponding to the high-bit portion; the decoder in the i-th control module is specifically used to generate, after the cache read signal is generated, a second control signal for the i-th multiplier based on the input low-bit portion; the i-th multiplier based on the lookup table in each of the computing core modules is specifically used to calculate the second multiplication result of the low-bit portion and the model weight based on the second control signal, to read the first multiplication result from the cache based on the cache read signal, and to shift and add the first multiplication result and the second multiplication result to obtain the target multiplication result corresponding to the i-th feature map.

[0014] According to the present invention, a digital in-memory computing system based on caching and lookup tables is provided. Specifically, the M hybrid in-memory multipliers are used, based on the decoder of the lookup table, to search for a second multiplication result of the low-bit portion and the model weight from 11 multiplication results stored in the memory of the lookup table. If the search is successful, the second multiplication result is read. If the search fails, the 11 multiplication results are shifted and supplemented based on the shifter of the lookup table to obtain 5 shifted and supplemented multiplication results. From the 5 multiplication results, the second multiplication result of the low-bit portion and the model weight is determined.

[0015] According to the present invention, the digital in-memory computing system based on caching and lookup tables further includes a first-level alignment module, which is used to receive each of the feature maps to be calculated, determine the feature map index of each feature map, align the feature maps based on the feature map indices, and input each aligned feature map into a different digital in-memory computing unit.

[0016] According to the present invention, a digital in-memory computing system based on caching and lookup tables is provided, wherein the computing core module further includes a second-level alignment module. The i-th multiplier based on the lookup table in each of the computing core modules is specifically used to shift and add the first multiplication result and the second multiplication result to obtain the initial multiplication result corresponding to the i-th feature map. The second-level alignment module in each of the multipliers is specifically used to align the initial multiplication result based on the weight index of the model weight corresponding to the i-th feature map, so as to obtain the target multiplication result corresponding to the i-th feature map.

[0017] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the digital in-memory calculation method based on cache and lookup table as described above.

[0018] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the digital in-memory computation method based on cache and lookup table as described above.

[0019] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the digital in-memory calculation method based on cache and lookup table as described above.

[0020] The present invention provides a digital in-memory computation method and system based on caching and lookup tables. For each feature map to be computed, the high-bit and low-bit portions of the feature map are separated. A caching method is used to obtain the first multiplication result of the high-bit portion and the corresponding model weights of the feature map, and a lookup table is used to obtain the second multiplication result of the low-bit portion and the corresponding model weights. The first and second multiplication results are then shifted and added to obtain the target multiplication result corresponding to the feature map. Based on the unique data characteristics of MSB and LSB, this invention achieves a high hit rate with a small cache area and high utilization with the lookup table, thus realizing a low-power in-memory multiplier.

Attached Figure Description

[0021] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0022] Figure 1 is the first flowchart of the digital in-memory computation method based on caching and lookup tables provided by the present invention.

[0023] Figure 2 is the second flowchart of the digital in-memory computation method based on caching and lookup tables provided by the present invention.

[0024] Figure 3 is a schematic diagram of the lookup table-based multiplier provided by the present invention.

[0025] Figure 4 is a flowchart illustrating the two-level alignment method provided by the present invention.

[0026] Figure 5 is a schematic diagram of the structure of the digital in-memory computing system based on caching and lookup tables provided by the present invention.

[0027] Figure 6 is a schematic diagram of the structure of the electronic device provided by the present invention.

Detailed Implementation

[0028] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.

[0029] The digital in-memory computation method and system based on caching and lookup tables provided by this invention are described below with reference to Figures 1 to 6.

[0030] Figure 1 is the first flowchart of the digital in-memory computation method based on caching and lookup tables provided by the present invention. As shown in Figure 1, the method includes the following steps:
Step 101: For each feature map to be calculated, perform high-low bit separation on the feature map to obtain the high-bit portion and low-bit portion of the feature map.
Step 102: Using a caching method, obtain the first multiplication result of the high-bit portion and the model weights corresponding to the feature map, and using a lookup table method, obtain the second multiplication result of the low-bit portion and the model weights.
Step 103: Shift and add the first multiplication result and the second multiplication result to obtain the target multiplication result corresponding to the feature map.

[0031] The digital in-memory computation method based on caching and lookup tables provided by this invention is used for energy-efficient multiply-accumulate computation and can be applied to high-precision visual AI tasks such as image classification, image recognition, and super-resolution using data formats such as 12-bit signed integers (INT12) and half-precision floating-point numbers (Floating Point 16, FP16).

[0032] The execution subject of the digital in-memory computing method based on caching and lookup tables provided by this invention can be a digital in-memory computing system based on caching and lookup tables, or an electronic device.

[0033] Specifically, the feature map ACT can be the activation value data for a visual AI task. The high-bit portion (MSB) is the set of bits in the feature map ACT whose weights are higher than a set value, and the low-bit portion (LSB) is the set of bits in the feature map ACT whose weights are lower than the set value. The set value is half the number of bits in the feature map ACT, and the weight of a bit is the numerical value it represents; the larger the weight, the larger the value represented by the bit. Both the feature map ACT and the model weights are floating-point numbers, which include a sign bit, exponent bits, and mantissa bits.

[0034] In practical applications, the main operator in neural network algorithms is the matrix multiplication of model weights WT and feature maps ACT. Therefore, this invention mainly focuses on optimizing the multiplication operation.

[0035] Figure 2 is the second flowchart of the digital in-memory computation method based on caching and lookup tables provided by this invention. To address the inefficiency of existing caching schemes for high-bit-width data formats such as FP16 and INT12, this invention leverages the good data locality of the high-bit portion (MSB) and the near-uniform distribution of the low-bit portion (LSB). During each multiplication operation, each input feature map ACT is split into its high-bit portion (MSB) and low-bit portion (LSB). The high-bit portion (MSB) is multiplied using cache-based multiplication to obtain the first multiplication result (MSB × WT), while the low-bit portion (LSB) is multiplied using the lookup table to obtain the second multiplication result (LSB × WT). Finally, the two parts are shifted and summed by the adder to obtain the final multiplication result, i.e., the target multiplication result (ACT × WT).

[0036] It should be noted that the adder can be an accumulation circuit employing an adder tree similar to that of traditional in-memory computing circuits. Furthermore, as shown in Figure 2, the lookup table can be a high-density pre-computed weight lookup table.

[0037] The present invention provides a digital in-memory computation method based on caching and lookup tables. For each feature map to be computed, the high-bit and low-bit portions of the feature map are separated. A caching method is used to obtain the first multiplication result of the high-bit portion and the corresponding model weights of the feature map. A lookup table is used to obtain the second multiplication result of the low-bit portion and the corresponding model weights. The first and second multiplication results are then shifted and added to obtain the target multiplication result corresponding to the feature map. Based on the unique data characteristics of MSB and LSB, this invention achieves high hit rates with a small cache area and high utilization of the lookup table, thus realizing a low-power in-memory multiplier.

[0038] Optionally, the step of obtaining the first multiplication result of the high-bit portion and the model weights corresponding to the feature map using a caching method includes: searching the cache for the first multiplication result of the high-bit portion and the model weight; and, if the search succeeds, reading the first multiplication result.

[0039] In practical applications, the first product of MSB and WT is searched in the cache. If the cache is hit, the product result is read directly from the cache.

[0040] As shown in Figure 2, the data flow when the high-bit portion (MSB) hits in the cache is as follows. The first multiplication result of the high-bit portion and the model weight is looked up in the cache. If it is found, that is, on a cache hit, a cache read signal (carrying the hit address) is generated and, at the same time, the selector routes the low-bit portion (LSB) into the lookup table decoder to generate the control signal of the multiplier based on the lookup table. The first multiplication result is then read from the cache through the cache's read interface, and the multiplier based on the lookup table produces the LSB multiplication result, that is, the second multiplication result. Finally, the first multiplication result is shifted (<<n, that is, shifted left by n bits) and added to the second multiplication result to obtain the target multiplication result.

[0041] Exemplarily, Product = ACT × WT = (MSB × 2^n + LSB) × WT = ((MSB × WT) << n) + (LSB × WT), where Product is the target multiplication result and n is the number of bits of the low-bit part, that is, the shift amount.
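The identity in [0041] can be checked numerically. The following is a minimal Python sketch, assuming an unsigned integer activation and an integer weight; the function names `split_high_low` and `shift_add_product` are hypothetical and are not taken from the patent text.

```python
def split_high_low(act: int, n: int) -> tuple[int, int]:
    """Split an unsigned activation into (MSB part, LSB part); n is the LSB width."""
    return act >> n, act & ((1 << n) - 1)


def shift_add_product(act: int, wt: int, n: int) -> int:
    """Recombine the two partial products as ((MSB * WT) << n) + (LSB * WT)."""
    msb, lsb = split_high_low(act, n)
    first = msb * wt   # in the described scheme this would come from the cache
    second = lsb * wt  # in the described scheme this would come from the lookup table
    return (first << n) + second


# Exhaustive check for 8-bit activations split into two 4-bit halves.
assert all(
    shift_add_product(act, wt, 4) == act * wt
    for act in range(256)
    for wt in range(-8, 8)
)
```

The check passes for negative weights as well, because a left shift of a Python integer is an exact multiplication by 2^n.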

[0042] In the embodiments of the present invention, the high and low bits of the high-bit-width feature map data in visual AI tasks have different distribution characteristics; by applying an efficient calculation method suited to each, the calculation power consumption and the circuit area are reduced.

[0043] Optionally, after looking up the first multiplication result of the high-bit part and the model weight in the cache, the method further includes: if the lookup fails, calculating the first multiplication result of the high-bit part and the model weight with the multiplier based on the lookup table, and writing the first multiplication result into the cache.

[0044] Specifically, the multiplier based on the lookup table, the multiplier, and the lookup table multiplier are the same object.

[0045] In practical applications, the first multiplication result of MSB and WT is looked up in the cache. On a miss, that is, when the first multiplication result (MSB × WT) is absent from the cache, the multiplier circuit based on the lookup table calculates the first multiplication result and writes it into the product cache.

[0046] As shown in Figure 2, the data flow when the high-bit portion (MSB) misses in the cache is as follows. The first multiplication result of the high-bit portion (MSB) and the model weight (WT) is looked up in the cache. If it is not found, that is, on a cache miss (MSB × WT missing), the input data is first blocked: the selector is set to 0 so that the low-bit portion (LSB) does not enter the lookup table decoder. At the same time, the high-bit portion (MSB) enters the lookup table decoder to generate the first control signal of the multiplier based on the lookup table. The multiplier based on the lookup table then produces the first multiplication result and updates the cache with it (replacing the entry at the selected address), for example by writing it through the cache's write interface. At this point the cache hit signal is generated for the high-bit portion (MSB), and the low-bit portion (LSB) enters the lookup table decoder to generate the second control signal of the multiplier based on the lookup table. The first multiplication result is then read from the cache through the cache's read interface, and the multiplier based on the lookup table produces the second multiplication result of the low-bit portion (LSB) and the model weight (WT). Finally, the first multiplication result is shifted (<<n, that is, shifted left by n bits) and added to the second multiplication result to obtain the target multiplication result.
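The hit and miss flows described above can be modeled in software. The following is a minimal Python sketch, not the circuit itself: the class `ProductCache`, the function `multiply`, the cache capacity, and the first-in-first-out replacement policy are all assumptions made for illustration (the text does not specify a replacement policy), and a plain integer multiply stands in for the multiplier based on the lookup table.

```python
class ProductCache:
    """Toy model of a small product cache keyed by (MSB, WT)."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.entries: dict[tuple[int, int], int] = {}
        self.hits = 0
        self.misses = 0

    def lookup_or_compute(self, msb: int, wt: int, compute) -> int:
        key = (msb, wt)
        if key in self.entries:        # cache hit: read the stored product
            self.hits += 1
            return self.entries[key]
        self.misses += 1               # cache miss: fall back to the multiplier
        product = compute(msb, wt)
        if len(self.entries) >= self.capacity:
            # Assumed FIFO replacement: evict the oldest inserted entry.
            self.entries.pop(next(iter(self.entries)))
        self.entries[key] = product    # write the new product into the cache
        return product


def multiply(act: int, wt: int, cache: ProductCache, n: int = 4) -> int:
    """ACT * WT via cached MSB product plus LSB product, then shift-add."""
    msb, lsb = act >> n, act & ((1 << n) - 1)

    def lut_multiply(a: int, w: int) -> int:
        return a * w                   # stand-in for the lookup table multiplier

    first = cache.lookup_or_compute(msb, wt, lut_multiply)
    second = lut_multiply(lsb, wt)
    return (first << n) + second


cache = ProductCache()
activations = (0x51, 0x5F, 0x52, 0xA0)  # the first three share MSB = 0x5
results = [multiply(a, 3, cache) for a in activations]
```

In this usage example the first and fourth activations miss and the second and third hit, which illustrates why good locality in the high-bit portion lets a small cache serve most lookups.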

[0047] In the embodiment of the present invention, when the high-bit portion misses in the cache, a method combining the lookup table and the cache, that is, a cache-and-reuse method, is adopted to obtain the first multiplication result of the high-bit portion and the model weight, which ensures that the first multiplication result can always be obtained and improves energy efficiency.

[0048] Optionally, using the lookup table to obtain the second multiplication result of the low-bit portion and the model weight includes: based on the decoder of the lookup table, searching the 11 multiplication results stored in the memory of the lookup table for the second multiplication result of the low-bit portion and the model weight; if the search succeeds, reading the second multiplication result; if the search fails, shifting the 11 multiplication results with the shifter of the lookup table to obtain 5 supplementary shifted multiplication results, and determining the second multiplication result of the low-bit portion and the model weight from these 5 multiplication results. Alternatively, based on the decoder of the lookup table, the 11 multiplication results stored in the memory of the lookup table are read; the shifter of the lookup table shifts the 11 multiplication results to obtain 5 supplementary shifted multiplication results; and the second multiplication result of the low-bit portion and the model weight is determined from the 11 multiplication results and the 5 multiplication results.

[0049] Figure 3 is a schematic diagram of the lookup table-based multiplier provided by the present invention. The lookup table-based multiplier is a high-density lookup table-based multiplier, used to calculate the second multiplication result of the low-bit portion (LSB) and the model weight WT, and, on a cache miss, the first multiplication result of the high-bit portion (MSB) and the model weight WT.

[0050] As shown in Figure 3, the lookup table-based multiplier includes a lookup table decoder (comprising an MSB lookup table decoder and an LSB lookup table decoder), a transmission gate-based shifter, and a lookup table memory (static random access memory, SRAM). The SRAM can be an 11-row memory with one write port and two read ports, that is, 11 storage areas, one write interface, and two read interfaces. Each storage area stores one multiplication result, so the 11 storage areas store 11 multiplication results in total: 0 (ROM), 1WT, 3WT, 4WT, 5WT, 7WT, 9WT, 11WT, 12WT, 13WT, and 15WT, where WT represents the model weight.

[0051] Specifically, when the shift is 0 (<<0), the read data from the lookup table decoder includes 0, 1WT, 3WT, 4WT, 5WT, 7WT, 9WT, 11WT, 12WT, 13WT, and 15WT; when the shift is 1 (<<1), the read data from the lookup table decoder includes 2WT, 6WT, 8WT, 10WT, and 14WT. The shifter can perform shift operations of <<0 / 1, where <<0 / 1 represents a 0-bit shift to the left or a 1-bit shift to the left.

[0052] In practical applications, a lookup table-based multiplier can use a lookup table-based decoder to search for the second multiplication result of the low-bit portion and the model weights from the 11 multiplication results stored in the lookup table's memory. If the search is successful (<<0), the second product result is read. If the search fails, a lookup table-based shifter is needed to shift and supplement the 11 multiplication results (<<1) to obtain 5 shifted and supplemented multiplication results. From these 5 multiplication results, the second multiplication result of the low-bit portion and the model weights is determined.

[0053] Similarly, when the first multiplication result of the high-bit portion and model weights is not found in the cache, the multiplier based on the lookup table can use the decoder based on the lookup table to find the first multiplication result of the high-bit portion and model weights from the 11 multiplication results stored in the memory of the lookup table; if the search is successful (<<0), the first product result is read; if the search fails, the shifter based on the lookup table needs to shift and supplement the 11 multiplication results (<<1) to obtain 5 shifted and supplemented multiplication results; from the 5 multiplication results, the first multiplication result of the high-bit portion and model weights is determined.

[0054] For example, as shown in Figure 3, the feature map ACT contains 8 bits, i.e., <7:0>. Bits 4 to 7 form the high-bit portion <7:4>, and bits 0 to 3 form the low-bit portion <3:0>. When the product for the high-bit portion <7:4> is not found in the cache, the MSB lookup table decoder retrieves the 11 multiplication results from the lookup table. The transmission gate-based shifter then applies a <<0 / 1 shift to these 11 multiplication results, giving 16 multiplication results in total, and the first multiplication result corresponding to <7:4> is selected from these 16 results. Likewise, the LSB lookup table decoder retrieves the 11 multiplication results from the lookup table, the transmission gate-based shifter applies a <<0 / 1 shift to obtain 16 multiplication results in total, and the second multiplication result corresponding to <3:0> is selected from these 16 results. Then, the first multiplication result is shifted left by 4 bits (<<4) and added to the second multiplication result to obtain the target multiplication result Product = ACT[7:0] × WT.

[0055] In this embodiment of the invention, the decoder and shifter are designed to save the area of the lookup table-based multiplier. An 11-row lookup table is used; starting from these 11 rows, a single 1-bit shift is enough to obtain all 16 multiplication results (the 11 stored results plus 5 results obtained by shifting).
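The 11-plus-5 count can be illustrated with a short Python sketch. The patent text does not say which 11 multiples are stored; the set `STORED` below is one hypothetical choice that works with a single 1-bit shift, and the names `build_lut`, `decode`, and `lut_multiply` are likewise assumptions made for illustration.

```python
# Hypothetical choice of the 11 stored multiples (not specified in the source):
# zero, all odd values, plus 4 and 6, so every other 4-bit value is 2x a stored one.
STORED = (0, 1, 3, 4, 5, 6, 7, 9, 11, 13, 15)


def build_lut(wt):
    """Precompute the 11 stored products k * wt for one weight."""
    return {k: k * wt for k in STORED}


def decode(nibble):
    """Return (row, shift) with row in STORED and (row << shift) == nibble."""
    if not 0 <= nibble <= 0xF:
        raise ValueError("nibble must fit in 4 bits")
    if nibble in STORED:
        return nibble, 0      # direct read, <<0
    return nibble >> 1, 1     # read the half value and shift, <<1


def lut_multiply(nibble, lut):
    """Look up the stored product and apply the 0- or 1-bit left shift."""
    row, shift = decode(nibble)
    return lut[row] << shift
```

With this choice the five values 2, 8, 10, 12, and 14 are produced by shifting the rows for 1, 4, 5, 6, and 7, so all 16 products of a 4-bit operand are reachable from 11 rows.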

[0056] Optionally, before performing high-low bit separation on each feature map to obtain the high-bit portion and low-bit portion of the feature map, the method further includes: determining the feature map index of each of the feature maps; aligning the feature maps based on their respective feature map indexes; and, for each aligned feature map, performing the high-low bit separation to obtain the high-bit portion and low-bit portion of the feature map, followed by the subsequent steps.

[0057] Specifically, the feature map index, also known as the activation value index (Ein), can be the exponent bits contained in the feature map (a floating-point number), i.e., its exponent.

[0058] In practical applications, the alignment operations required for floating-point multiplication and accumulation are incompatible with existing in-memory computing circuits that cache multiplication results in static random access memory (SRAM). Aligning before multiplication leads to a significant loss of precision, while aligning after multiplication lowers the cache hit rate. In other words, traditional alignment methods are incompatible with cache-based multiplication. Therefore, this embodiment employs a two-level alignment method. See Figure 4, which is a flowchart illustrating the two-level alignment method provided by the present invention, including first-level alignment (a shift based on the feature map index (Ein)) and second-level alignment (a shift based on the weight index (Ewt)).

[0059] Before segmenting each feature map, a first-level alignment, i.e., the Ein-based shift, can be performed. See Figure 4. The Ein shift process is as follows: The input buffer receives N feature maps, where N is 16, and these N feature maps are ACT0 to ACT15. The feature map indices for each of the N feature maps are determined, i.e., Ein0 to Ein15, where Ein0 is the feature map index of ACT0 and Ein15 is the feature map index of ACT15. Then, the feature maps are aligned according to Ein0 to Ein15. The feature maps after the first-level alignment can enter the control modules (e.g., control module 0 to control module 15) to generate shared control signals for the output channels. In addition, after the first-level alignment, the data locality of the input feature maps is significantly improved compared to before alignment.

[0060] Optionally, aligning the feature maps based on their respective indexes includes: performing a maximum-value search over the feature map indexes to obtain the maximum feature map index; and, for each feature map, shifting the feature map based on its feature map index and the maximum feature map index to obtain an aligned feature map.

[0061] See Figure 4. After determining Ein0 to Ein15, a maximum value search is performed, that is, the largest feature map index Emax among Ein0 to Ein15 is determined. For each feature map, a shifter uses the difference between Emax and the feature map index of that feature map as the first right shift amount (the number of bits to shift to the right). Based on this first right shift amount, the mantissa Min of the feature map is shifted to obtain the aligned feature map corresponding to that feature map. In Figure 4, for example, the shifter calculates the difference between Emax and Ein0 to obtain the first right shift amount. Then, based on the first right shift amount, the mantissa Min0 of ACT0 is shifted to obtain the aligned ACT0. The aligned ACT0 is then input to control module 0.
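The first-level alignment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: mantissas are modelled as unsigned integers, bits shifted out on the right are simply truncated, and the name `first_level_align` is not taken from the patent.

```python
def first_level_align(mantissas, exponents):
    """Right-shift each mantissa Min_i by (Emax - Ein_i).

    Returns the aligned mantissas and the shared maximum exponent Emax.
    """
    if len(mantissas) != len(exponents):
        raise ValueError("one exponent is required per mantissa")
    e_max = max(exponents)  # maximum value search over Ein0..Ein(N-1)
    aligned = [m >> (e_max - e) for m, e in zip(mantissas, exponents)]
    return aligned, e_max


# Three activations with exponents 3, 5 and 4 are aligned to Emax = 5.
aligned, e_max = first_level_align([0b1100, 0b1010, 0b1111], [3, 5, 4])
```

In this small example the right shift amounts are 2, 0, and 1, so the activation with the largest exponent keeps its full mantissa while the others lose their lowest bits.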

[0062] Optionally, the step of shifting and adding the first multiplication result and the second multiplication result to obtain the target multiplication result corresponding to the feature map includes: shifting and adding the first multiplication result and the second multiplication result to obtain the initial multiplication result corresponding to the feature map; and aligning the initial multiplication result based on the weight index of the model weights corresponding to the feature map to obtain the target multiplication result.

[0063] See Figure 4. After the control module segments the aligned feature map into high-bit and low-bit portions, if the high-bit portion misses in the cache, a lookup table-based multiplier is used to calculate the multiplication result of the high-bit portion and the model weights, i.e., the first multiplication result; if the high-bit portion hits in the cache, the first multiplication result is retrieved from the cache. For the low-bit portion, a lookup table-based multiplier calculates the second multiplication result of the low-bit portion and the model weights. Then, the lookup table-based multiplier shifts and adds the first and second multiplication results to obtain the initial multiplication result, and the maximum weight exponent Ewtmax among the weight exponents of the model weights corresponding to the feature maps is determined. The difference between the maximum weight exponent Ewtmax and the weight exponent Ewt of the model weights corresponding to that feature map is taken as the second right shift amount. The initial multiplication result is then right-shifted by the second right shift amount to obtain the target multiplication result.

[0064] For example, the aligned ACT0 is input to control module 0 for processing and then to multiplier 0. Multiplier 0 obtains the first multiplication result and the second multiplication result corresponding to ACT0, and shifts and adds them to obtain the initial multiplication result corresponding to ACT0. Then, the weight exponent Ewt0 corresponding to ACT0 is subtracted from Ewtmax to obtain the second right shift amount (Ewtmax-Ewt0). Based on Ewtmax-Ewt0, the initial multiplication result corresponding to ACT0 is right-shifted to obtain the target multiplication result corresponding to ACT0.

[0065] In this embodiment of the invention, the second-level alignment computes the weight exponents offline and stores only the resulting shift amounts, which reduces the area of the comparator and subtractor. At the same time, the shift reduces the bit width of the data and reduces the overhead of the subsequent addition tree.
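A minimal Python sketch of the second-level alignment follows, assuming integer products and assuming that the shift amounts Ewtmax - Ewt are computed once offline and then only read at run time. The function names are illustrative, and Python's `>>` on negative integers rounds toward negative infinity, which may differ from the rounding used in hardware.

```python
def precompute_weight_shifts(weight_exponents):
    """Offline step: store only the right shift amounts Ewtmax - Ewt_i."""
    e_wt_max = max(weight_exponents)
    return [e_wt_max - e for e in weight_exponents]


def second_level_align(initial_products, shifts):
    """Run-time step: right-shift each initial product by its stored amount."""
    if len(initial_products) != len(shifts):
        raise ValueError("one shift amount is required per product")
    return [p >> s for p, s in zip(initial_products, shifts)]


shifts = precompute_weight_shifts([2, 4, 3])          # Ewtmax = 4
targets = second_level_align([40, 17, 9], shifts)     # shifts are 2, 0, 1
```

Storing the shift amounts rather than the exponents means no comparison or subtraction is needed when the products are aligned, which matches the area saving described above.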

[0066] The present invention provides a digital in-memory computation method based on caching and lookup tables. It exploits the different data distributions of the high-bit and low-bit portions of wide feature map data in visual AI tasks, and uses a computing circuit suited to each, to reduce computing power consumption and circuit area. Specifically, after the activation values are separated into high and low bits, the high bits show excellent data locality, so the products of the high bits of the activation values and the model weights are cached and reused. The low bits are uniformly distributed, so the products of the low bits of the activation values and the model weights are pre-calculated and reused, which improves energy efficiency. A new two-level alignment method, which is highly compatible with cache-based multipliers, provides support for floating-point multiplication and accumulation. A high-density lookup table-based multiplier reduces the circuit area. In other words, to address the mismatch between traditional floating-point alignment operations and caching, a two-level alignment technique is proposed so that the cache is fully reused and high-efficiency in-memory computation is achieved.

[0067] The following describes the in-memory computing system based on caching and lookup tables provided by the present invention. The in-memory computing system based on caching and lookup tables described below can be referred to in correspondence with the in-memory computing method based on caching and lookup tables described above.

[0068] Figure 5 is a schematic diagram of the structure of the digital in-memory computing system based on caching and lookup tables provided by the present invention. As shown in Figure 5, the system includes N input channels, N control modules, N computing core modules, and N output channels, where N is a positive integer. The i-th input channel is used to receive the i-th feature map to be calculated, where i is a positive integer less than or equal to N. The control modules and the computing core modules are jointly used to: perform high-low bit separation on each feature map to obtain the high-bit portion and the low-bit portion of the feature map; obtain, using a cache, the first multiplication result of the high-bit portion and the model weights corresponding to the feature map, and obtain, using a lookup table, the second multiplication result of the low-bit portion and the model weights; and shift and add the first multiplication result and the second multiplication result to obtain the target multiplication result corresponding to the feature map. The i-th output channel is used to output the target multiplication result corresponding to the i-th feature map.

[0069] Specifically, N can be 16.

[0070] For example, see Figure 4. The in-memory computing system based on caching and lookup tables has 16 input channels, 16 output channels, 16 control modules (control modules 0 to 15 in Figure 4), and 16 computing core modules (shown as ×16 OCH in Figure 4). The model weights can form a matrix, and the input feature maps can form a feature map vector (16 feature maps); a matrix-vector multiplication is performed on them to obtain the output target multiplication results.

[0071] Each control module corresponds to one input channel and generates the control signals required for multiplication calculation based on the input feature map, including buffer control and lookup table control signals. All control signals generated by the 16 control modules are sent to each computing core module, and all control signals are shared by the 16 computing core modules.

[0072] In practical applications, the received i-th feature map is transmitted to the i-th control module through the i-th input channel. The i-th control module, in conjunction with each computational core module, performs ACT segmentation on the i-th feature map ACT, i.e., high-low bit separation, to obtain the high-bit portion MSB and low-bit portion LSB of the i-th feature map ACT. Further, the high-bit portion MSB is calculated using cache-based multiplication to obtain the first multiplication result (MSB×WT), and the low-bit portion LSB is calculated using lookup table-based multiplication to obtain the second multiplication result (LSB×WT). Finally, a multiplier is used to sum the two parts to obtain the final multiplication result, i.e., the target multiplication result (ACT×WT) corresponding to the i-th feature map ACT, and the target multiplication result corresponding to the i-th feature map ACT is output through the i-th output channel.

[0073] The in-memory computing system based on caching and lookup tables provided by this invention separates the high-bit and low-bit portions of each feature map to be computed. It then uses a cache to obtain the first multiplication result of the high-bit portion and the corresponding model weights of the feature map, and uses a lookup table to obtain the second multiplication result of the low-bit portion and the model weights. Finally, it shifts and adds the first and second multiplication results to obtain the target multiplication result corresponding to the feature map. Based on the unique data characteristics of MSB and LSB, this invention achieves a high hit rate with a small cache area and high utilization with the lookup table, thus realizing a low-power in-memory multiplier.

[0074] Optionally, the control module includes a cache controller and a decoder for the lookup table, and the computing core module includes N multipliers based on the lookup table. For the i-th feature map: the cache controller in the i-th control module is specifically used to search the cache for the first multiplication result of the high-bit portion and the model weight, and, if the search fails, to input the high-bit portion into the decoder; the decoder in the i-th control module is specifically used to generate, based on the input high-bit portion, a first control signal for the i-th multiplier based on the lookup table; the i-th multiplier based on the lookup table in each of the computing core modules is specifically used to calculate the first multiplication result of the high-bit portion and the model weight based on the first control signal, and to write the first multiplication result into the cache; the cache controller in the i-th control module is specifically used to generate a cache read signal corresponding to the high-bit portion after the first multiplication result is written into the cache; the decoder in the i-th control module is specifically used to generate, after the cache read signal is generated, a second control signal for the i-th multiplier based on the input low-bit portion; and the i-th multiplier based on the lookup table in each of the computing core modules is specifically used to calculate the second multiplication result of the low-bit portion and the model weight based on the second control signal, to read the first multiplication result from the cache based on the cache read signal, and to shift and add the first multiplication result and the second multiplication result to obtain the target multiplication result corresponding to the i-th feature map.

[0075] Specifically, one control module includes one cache controller and one decoder, and one computing core module contains 16 multipliers (multipliers 0 to 15 in Figure 4). Additionally, each computing core module can contain one quantization module.

[0076] In practical applications, see Figure 2 and Figure 4. When the high-order bit part MSB misses in the cache, the data flow is as follows. For the i-th feature map, the cache controller of the i-th control module (the i-th among control modules 0 to 15) looks up the first multiplication result of the high-order bit part MSB and the model weight WT in the cache. If it is not found, that is, the cache misses (MSB×WT is missing), the input data is first stalled. At the same time, the high-order bit part MSB enters the lookup table decoder of the i-th control module to generate the first control signal for the i-th lookup-table-based multiplier. The i-th lookup-table-based multiplier in each computing core module then responds to the first control signal, determines the first multiplication result, and writes it into the cache. The cache controller of the i-th control module then looks up the first multiplication result of the high-order bit part MSB and the model weight WT in the cache again. Now that the high-order bit part MSB hits in the cache, a cache read signal is generated, and the low-order bit part LSB enters the lookup table decoder to generate the second control signal for the i-th lookup-table-based multiplier. Further, the i-th lookup-table-based multiplier in each computing core module (the i-th among multipliers 0 to 15) reads the first multiplication result from the cache through the cache's read interface, and responds to the second control signal to obtain the second multiplication result of the low-order bit part LSB and the model weight WT. Finally, the i-th lookup-table-based multiplier in each computing core module shifts the first multiplication result (<<n, that is, a shift by n bits) and adds it to the second multiplication result it obtained, yielding the target multiplication result for the portion handled by that multiplier.
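The miss-then-hit data flow described here can be modelled with a small Python sketch. It is a behavioural illustration only: the class `MsbProductCache` and the function `multiply` are hypothetical names, the cache is an unbounded dictionary (the source does not state a capacity or replacement policy), ordinary multiplication stands in for the lookup-table-based multiplier, and n = 4 is assumed as in the 8-bit example of Figure 3.

```python
class MsbProductCache:
    """Toy model of a cache holding MSB x WT products for one weight."""

    def __init__(self, wt):
        self.wt = wt
        self.entries = {}   # msb value -> cached product
        self.hits = 0
        self.misses = 0

    def first_product(self, msb):
        if msb in self.entries:
            self.hits += 1
        else:
            # Miss: compute the product (stand-in for the LUT multiplier)
            # and write it into the cache before it is read back.
            self.misses += 1
            self.entries[msb] = msb * self.wt
        return self.entries[msb]


def multiply(act, cache, n=4):
    """Shift-and-add of the cached MSB product and the LSB product."""
    msb, lsb = act >> n, act & ((1 << n) - 1)
    first = cache.first_product(msb)   # from the cache (filled on a miss)
    second = lsb * cache.wt            # stand-in for the LUT result
    return (first << n) + second


cache = MsbProductCache(wt=3)
products = [multiply(a, cache) for a in (0x12, 0x15, 0x1F, 0x2F)]
```

In this run the first three activations share the high nibble 1, so only the first of them misses; the fourth activation has high nibble 2 and misses once more.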

[0077] In the embodiments of the present invention, when the high-order bit part is a hit, a method combining a lookup table and a cache, that is, a cache and reuse method, is adopted to obtain the first multiplication result of the high-order bit part and the model weight, ensuring the smooth acquisition of the first multiplication result and improving energy efficiency.

[0078] Optionally, the control module includes a cache controller and a decoder of the lookup table, and the computing core module includes N lookup-table-based multipliers. For the i-th feature map: the cache controller in the i-th control module is specifically configured to look up the first multiplication result of the high-order bit part and the model weight in the cache, and, if the lookup is successful, to generate a cache read signal corresponding to the high-order bit part; the decoder in the i-th control module is specifically configured to generate, after the cache read signal is generated, the second control signal for the i-th lookup-table-based multiplier based on the input low-order bit part; and the i-th lookup-table-based multiplier in each of the computing core modules is specifically configured to calculate the second multiplication result of the low-order bit part and the model weight based on the second control signal, to read the first multiplication result from the cache based on the cache read signal, and to perform a shift-and-add operation on the first multiplication result and the second multiplication result to obtain the target multiplication result corresponding to the i-th feature map.

[0079] See Figure 2 and Figure 4. When the high-bit part MSB hits in the cache, the data flow is as follows. For the i-th feature map, the cache controller of the i-th control module looks up the first multiplication result of the high-bit part MSB and the model weight WT in the cache. When the high-bit part MSB hits in the cache, a cache read signal is generated, and the low-bit part LSB enters the lookup table decoder to generate the second control signal for the i-th multiplier based on the lookup table. Subsequently, the i-th lookup table multiplier in each computing core module reads the first multiplication result from the cache through the cache read interface, and obtains the second multiplication result of the low-bit part LSB and the model weight WT in response to the second control signal. Finally, the i-th lookup table multiplier in each computing core module shifts the first multiplication result (<<n, i.e., a shift by n bits) and adds it to the second multiplication result, obtaining the target multiplication result for the portion handled by that multiplier.

[0080] Optionally, the multiplier is specifically configured to look up the second multiplication result of the low-bit part and the model weight from 11 multiplication results stored in the memory of the lookup table based on the decoder of the lookup table; if the lookup is successful, read the second product result; if the lookup fails, perform a shift and supplement operation on the 11 multiplication results based on the shifter of the lookup table to obtain 5 shifted and supplemented multiplication results; and determine the second multiplication result of the low-bit part and the model weight from the 5 multiplication results. Alternatively, the multiplier is specifically configured to read 11 multiplication results stored in the memory of the lookup table based on the decoder of the lookup table; perform a shift and supplement operation on the 11 multiplication results based on the shifter of the lookup table to obtain 5 shifted and supplemented multiplication results; and determine the second multiplication result of the low-bit part and the model weight from the 11 multiplication results and the 5 multiplication results.

[0081] In practical applications, the lookup table-based multiplier can use the lookup table decoder to search the 11 multiplication results stored in the lookup table's memory for the second multiplication result of the low-bit portion and the model weights. If the search succeeds (<<0), the second multiplication result is read directly. If the search fails, the lookup table shifter shifts the 11 multiplication results (<<1) to obtain 5 supplementary shifted multiplication results, and the second multiplication result of the low-bit portion and the model weights is determined from these 5 results.

[0082] Similarly, when the first multiplication result of the high-bit portion and the model weights is not found in the cache, the lookup table-based multiplier can use the lookup table decoder to search the 11 multiplication results stored in the lookup table's memory for the first multiplication result of the high-bit portion and the model weights. If the search succeeds (<<0), the first multiplication result is read directly. If the search fails, the lookup table shifter shifts the 11 multiplication results (<<1) to obtain 5 supplementary shifted multiplication results, and the first multiplication result of the high-bit portion and the model weights is determined from these 5 results.

[0083] For example, see Figure 3. The feature map ACT contains 8 bits, i.e., <7:0>. Bits 4 to 7 form the high-bit portion <7:4>, and bits 0 to 3 form the low-bit portion <3:0>. When the product for the high-bit portion <7:4> is not found in the cache, the MSB lookup table decoder retrieves the 11 multiplication results from the lookup table, and a transmission gate-based shifter applies a <<0 / <<1 shift to them, giving a total of 16 multiplication results, from which the first multiplication result is selected. Likewise, the LSB lookup table decoder retrieves the 11 multiplication results from the lookup table, the transmission gate-based shifter applies a <<0 / <<1 shift to them to give a total of 16 multiplication results, and the second multiplication result corresponding to <3:0> is selected from these 16 results. The first multiplication result is then shifted left by 4 bits (<<4) and added to the second multiplication result to obtain the target multiplication result Product = ACT[7:0] × WT.

[0084] In this embodiment of the invention, the decoder and shifter are designed to save the area of the lookup table-based multiplier. An 11-row lookup table is used; starting from these 11 rows, a single 1-bit shift is enough to obtain all 16 multiplication results (the 11 stored results plus 5 results obtained by shifting).

[0085] Optionally, the in-memory computing system based on caching and lookup tables further includes a first-level alignment module, which is used to receive each of the feature maps to be calculated and determine the feature map index of each feature map; align each feature map based on the feature map index; and input each aligned feature map into a different digital memory computing unit.

[0086] In practical applications, before segmenting each feature map, a first-level alignment module can be used to perform a first-level alignment of the feature maps, i.e., the Ein-based shift. See Figure 4. The Ein shift process is as follows: The input buffer receives N feature maps, where N is 16, and these N feature maps are ACT0 to ACT15. The feature map indices for each of the N feature maps are determined, i.e., Ein0 to Ein15, where Ein0 is the feature map index of ACT0 and Ein15 is the feature map index of ACT15. Then, the feature maps are aligned according to Ein0 to Ein15. The feature maps after the first-level alignment can enter the control modules (e.g., control module 0 to control module 15) to generate shared control signals for the output channels. In addition, after the first-level alignment, the data locality of the input feature maps is significantly improved compared to before alignment.

[0087] Optionally, the computational core module further includes a second-level alignment module; The i-th multiplier based on the lookup table in each of the computing kernel modules is specifically used to shift and add the first multiplication result and the second multiplication result to obtain the initial multiplication result corresponding to the i-th feature map; The second-level alignment module in each of the multipliers is specifically used to align the initial multiplication result based on the weight index of the model weight corresponding to the i-th feature map, so as to obtain the target multiplication result corresponding to the i-th feature map.

[0088] Specifically, each computational core module contains 16 multipliers and one second-level alignment unit.

[0089] See Figure 4. For the i-th feature map, after the i-th control module segments the aligned i-th feature map into high-bit and low-bit parts, if the high-bit part misses in the cache, the multiplication result of the high-bit part and the model weights, i.e., the first multiplication result, is calculated using a lookup table-based multiplier; if the high-bit part hits in the cache, the first multiplication result is retrieved from the cache. For the low-bit part, for each computation kernel module, the i-th multiplier in that computation kernel module calculates the second multiplication result of the low-bit part and the model weights. Then, the i-th multiplier in that computation kernel module shifts and adds the first and second multiplication results to obtain the initial multiplication result, and the maximum weight index Ewtmax among the weight indices of the model weights corresponding to the feature maps is determined. The difference between the maximum weight index Ewtmax and the weight index Ewt of the model weights corresponding to that feature map is taken as the second right shift amount. Furthermore, the second-level alignment module in the computation kernel module right-shifts the initial multiplication result by the second right shift amount to obtain the target multiplication result of the i-th feature map output in the computation kernel module.

[0090] In this embodiment of the invention, the second-level alignment computes the weight exponents offline and stores only the resulting shift amounts, which reduces the area of the comparator and subtractor. At the same time, the shift reduces the bit width of the data and reduces the overhead of the subsequent addition tree.

[0091] In addition, see Figure 4. Each computational kernel module also includes an addition tree. The addition tree sums the target multiplication results of the 16 multipliers in its computational kernel module to obtain the total target multiplication result of the model weights corresponding to that computational kernel module and the feature map vector (16 feature maps).
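The role of the addition tree in the matrix-vector multiplication can be sketched in Python as follows. This is an illustration under assumptions: products are formed with ordinary multiplication instead of the cache and lookup table, a pairwise reduction is used as one possible tree shape, and the names `adder_tree`, `core_output`, and `matvec` are not from the patent.

```python
def adder_tree(values):
    """Sum values by pairwise reduction, level by level."""
    vals = list(values)
    if not vals:
        return 0
    while len(vals) > 1:
        if len(vals) % 2:
            vals.append(0)  # pad odd levels with a zero operand
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0]


def core_output(acts, weight_row):
    """One computing core: multiply each activation by its weight, then sum."""
    return adder_tree(a * w for a, w in zip(acts, weight_row))


def matvec(acts, weight_matrix):
    """One result per output channel, i.e., per computing core module."""
    return [core_output(acts, row) for row in weight_matrix]


acts = list(range(16))                                     # ACT0..ACT15
weights = [[och + 1] * 16 for och in range(16)]            # 16 x 16 weights
outputs = matvec(acts, weights)
```

With 16 inputs the pairwise reduction needs four levels of adders, and each of the 16 cores produces one element of the output vector.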

[0092] Figure 6 is a schematic diagram of the structure of the electronic device provided by the present invention. As shown in Figure 6, the electronic device may include a processor 610, a communications interface 620, a memory 630, and a communication bus 640, wherein the processor 610, the communications interface 620, and the memory 630 communicate with each other through the communication bus 640. The processor 610 can call logical instructions in the memory 630 to execute a digital in-memory computation method based on caching and lookup tables. The method includes: for each feature map to be computed, separating the high and low bits of the feature map to obtain a high bit portion and a low bit portion; using a caching method, obtaining a first multiplication result of the high bit portion and the model weights corresponding to the feature map, and using a lookup table method, obtaining a second multiplication result of the low bit portion and the model weights; shifting and adding the first multiplication result and the second multiplication result to obtain the target multiplication result corresponding to the feature map.

[0093] Furthermore, the logical instructions in the aforementioned memory 630 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0094] On the other hand, the present invention also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer is able to execute the in-memory digital computation method based on caching and lookup tables provided by the above methods. The method includes: for each feature map to be computed, separating the high and low bits of the feature map to obtain a high bit portion and a low bit portion of the feature map; using a caching method, obtaining a first multiplication result of the high bit portion and the model weights corresponding to the feature map, and using a lookup table method, obtaining a second multiplication result of the low bit portion and the model weights; shifting and adding the first multiplication result and the second multiplication result to obtain the target multiplication result corresponding to the feature map.

[0095] In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon. When executed by a processor, the computer program implements a cache- and lookup table-based digital in-memory computation method provided by the above methods. The method includes: for each feature map to be computed, performing high- and low-bit separation on the feature map to obtain a high-bit portion and a low-bit portion of the feature map; using a cache method, obtaining a first multiplication result of the high-bit portion and the model weights corresponding to the feature map, and using a lookup table method, obtaining a second multiplication result of the low-bit portion and the model weights; shifting and adding the first multiplication result and the second multiplication result to obtain the target multiplication result corresponding to the feature map.

[0096] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0097] From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software together with a necessary general-purpose hardware platform, or alternatively by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the various embodiments or in some parts of the embodiments.

[0098] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A digital in-memory computation method based on caching and lookup tables, characterized by comprising:
for each feature map to be computed, performing high-low bit separation on the feature map to obtain a high-bit portion and a low-bit portion of the feature map;
obtaining, by means of a cache, a first multiplication result of the high-bit portion and the model weights corresponding to the feature map, and obtaining, by means of a lookup table, a second multiplication result of the low-bit portion and the model weights; and
shifting and adding the first multiplication result and the second multiplication result to obtain a target multiplication result corresponding to the feature map.
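The arithmetic behind claim 1 can be illustrated with a minimal sketch. It assumes an unsigned 8-bit feature-map value split into two 4-bit halves; the bit widths and the name `split_multiply` are illustrative assumptions and are not specified by the claim. The two partial products are computed with ordinary multiplication here, standing in for the cache path and the lookup-table path.

```python
def split_multiply(x: int, w: int, low_bits: int = 4) -> int:
    """Multiply an unsigned value x by weight w via high/low partial products.

    Assumption: x is unsigned and is split at bit position `low_bits`.
    """
    if x < 0:
        raise ValueError("this sketch only handles unsigned feature-map values")
    high = x >> low_bits               # high-bit portion
    low = x & ((1 << low_bits) - 1)    # low-bit portion
    first = high * w                   # stand-in for the cache path
    second = low * w                   # stand-in for the lookup-table path
    # Shift the high partial product back to its bit position, then add.
    return (first << low_bits) + second


print(split_multiply(0xB7, 13))  # 2379, the same as 0xB7 * 13
```

The identity used is x * w = ((x >> k) * w << k) + ((x mod 2^k) * w), which holds for any non-negative x and split position k.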

2. The digital in-memory computation method based on caching and lookup tables according to claim 1, characterized in that obtaining, by means of a cache, the first multiplication result of the high-bit portion and the model weights corresponding to the feature map comprises:
searching the cache for the first multiplication result of the high-bit portion and the model weight;
if the search succeeds, reading the first multiplication result; and
if the search fails, calculating the first multiplication result of the high-bit portion and the model weight by means of the lookup-table-based multiplier, and writing the first multiplication result into the cache.
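The hit/miss flow of claim 2 can be sketched as follows. This is a software analogy only: the class name `HighBitProductCache`, the capacity, and the first-in-first-out replacement policy are assumptions made for illustration, and `lut_multiply` is a plain multiplication standing in for the lookup-table-based multiplier.

```python
def lut_multiply(value: int, weight: int) -> int:
    """Stand-in for the lookup-table-based multiplier (hypothetical helper)."""
    return value * weight


class HighBitProductCache:
    """Small cache of products keyed by the high-bit portion."""

    def __init__(self, weight: int, capacity: int = 4):
        self.weight = weight
        self.capacity = capacity
        self.entries = {}   # high-bit portion -> first multiplication result
        self.hits = 0
        self.misses = 0

    def lookup(self, high: int) -> int:
        if high in self.entries:
            # Search succeeded: read the cached product.
            self.hits += 1
            return self.entries[high]
        # Search failed: compute with the multiplier, then write back.
        self.misses += 1
        product = lut_multiply(high, self.weight)
        if len(self.entries) >= self.capacity:
            # Assumed FIFO replacement: drop the oldest inserted entry.
            self.entries.pop(next(iter(self.entries)))
        self.entries[high] = product
        return product


cache = HighBitProductCache(weight=3, capacity=2)
results = [cache.lookup(h) for h in (5, 5, 6, 7, 5)]
print(results, cache.hits, cache.misses)
```

With this access pattern the second lookup of 5 hits, while the last one misses because 5 was evicted when 7 was inserted.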

3. The digital in-memory computation method based on caching and lookup tables according to claim 1, characterized in that obtaining, by means of a lookup table, the second multiplication result of the low-bit portion and the model weights comprises:
searching, by means of the decoder of the lookup table, the 11 multiplication results stored in the memory of the lookup table for the second multiplication result of the low-bit portion and the model weight;
if the search succeeds, reading the second multiplication result; and
if the search fails, shifting and supplementing the 11 multiplication results by means of the shifter of the lookup table to obtain 5 shifted and supplemented multiplication results, and determining, from the 5 multiplication results, the second multiplication result of the low-bit portion and the model weight.

4. The digital in-memory computation method based on caching and lookup tables according to claim 1, characterized in that, before performing the high-low bit separation on each feature map to obtain the high-bit portion and the low-bit portion of the feature map, the method further comprises:
determining the feature map index of each feature map;
aligning the feature maps based on their respective feature map indexes; and
for each aligned feature map, performing the step of high-low bit separation on the feature map to obtain the high-bit portion and the low-bit portion of the feature map, together with the subsequent steps.

5. The digital in-memory computation method based on caching and lookup tables according to claim 4, characterized in that aligning the feature maps based on their respective feature map indexes comprises:
searching the feature map indexes to determine the maximum feature map index; and
for each feature map, shifting the feature map based on its feature map index and the maximum feature map index to obtain the aligned feature map.
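One possible reading of the alignment in claim 5 is sketched below. It assumes that each feature-map value is a pair (mantissa, index) representing mantissa × 2^index, and that alignment right-shifts each mantissa by the difference between the maximum index and its own index. Both the representation and the shift direction are assumptions; the claim states only that the shift depends on the two indexes. The name `align_to_max_index` is hypothetical.

```python
def align_to_max_index(mantissas, indexes):
    """Align integer mantissas to the largest index.

    Assumption: value = mantissa * 2**index, so aligning to the maximum
    index means right-shifting by (max_index - index). Bits shifted out
    are truncated.
    """
    if len(mantissas) != len(indexes):
        raise ValueError("mantissas and indexes must have the same length")
    max_index = max(indexes)
    aligned = [m >> (max_index - e) for m, e in zip(mantissas, indexes)]
    return aligned, max_index


aligned, shared_index = align_to_max_index([12, 5, 8], [3, 5, 4])
print(aligned, shared_index)  # [3, 5, 4] 5
```

After alignment, all values share one index, so the integer multiply-accumulate path can operate on the mantissas alone.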

6. The digital in-memory computation method based on caching and lookup tables according to any one of claims 1-5, characterized in that shifting and adding the first multiplication result and the second multiplication result to obtain the target multiplication result corresponding to the feature map comprises:
shifting and adding the first multiplication result and the second multiplication result to obtain an initial multiplication result corresponding to the feature map; and
aligning the initial multiplication result based on the weight index of the model weights corresponding to the feature map to obtain the target multiplication result.
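The two-stage result of claim 6 can be sketched under a stated assumption: the model weight is taken to be an integer mantissa with a non-negative weight index, representing mantissa × 2^index, and "aligning" is taken to mean a left shift by that index. The claim does not define the alignment operation, so this interpretation, the 4-bit split, and the function name are all illustrative.

```python
def multiply_with_weight_alignment(x: int, weight_mantissa: int,
                                   weight_index: int, low_bits: int = 4) -> int:
    """Shift-add the two partial products, then align by the weight index.

    Assumptions: x is unsigned; the weight equals
    weight_mantissa * 2**weight_index with weight_index >= 0.
    """
    if x < 0 or weight_index < 0:
        raise ValueError("sketch handles unsigned x and non-negative index only")
    high = x >> low_bits
    low = x & ((1 << low_bits) - 1)
    first = high * weight_mantissa    # stand-in for the cache path
    second = low * weight_mantissa    # stand-in for the lookup-table path
    initial = (first << low_bits) + second   # initial multiplication result
    return initial << weight_index           # assumed alignment step


print(multiply_with_weight_alignment(0xB7, 5, 3))  # 7320 == 183 * 5 * 2**3
```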

7. A digital in-memory computing system based on caching and lookup tables, characterized by comprising:
N input channels, N control modules, N computing core modules, and N output channels, where N is a positive integer;
wherein the i-th input channel is configured to receive the i-th feature map to be computed, where i is a positive integer less than or equal to N;
the control modules and the computing core modules are jointly configured to: perform high-low bit separation on each feature map to obtain a high-bit portion and a low-bit portion of the feature map; obtain, by means of a cache, a first multiplication result of the high-bit portion and the model weight corresponding to the feature map, and obtain, by means of a lookup table, a second multiplication result of the low-bit portion and the model weight; and shift and add the first multiplication result and the second multiplication result to obtain a target multiplication result corresponding to the feature map; and
the i-th output channel is configured to output the target multiplication result corresponding to the i-th feature map.

8. The digital in-memory computing system based on caching and lookup tables according to claim 7, characterized in that the control module comprises a cache controller and a decoder of the lookup table, and the computing core module comprises N lookup-table-based multipliers;
for the i-th feature map:
the cache controller in the i-th control module is specifically configured to search the cache for the first multiplication result of the high-bit portion and the model weight and, if the search fails, to input the high-bit portion into the decoder;
the decoder in the i-th control module is specifically configured to generate, based on the input high-bit portion, a first control signal for the i-th lookup-table-based multiplier;
the i-th lookup-table-based multiplier in each computing core module is specifically configured to calculate, based on the first control signal, the first multiplication result of the high-bit portion and the model weight, and to write the first multiplication result into the cache;
the cache controller in the i-th control module is specifically configured to generate, after the first multiplication result is written into the cache, a cache read signal corresponding to the high-bit portion in the cache;
the decoder in the i-th control module is specifically configured to generate, after the cache read signal is generated, a second control signal for the i-th lookup-table-based multiplier based on the input low-bit portion;
the i-th lookup-table-based multiplier in each computing core module is specifically configured to calculate, based on the second control signal, the second multiplication result of the low-bit portion and the model weight;
based on the cache read signal, the first multiplication result is read from the cache; and
the first multiplication result and the second multiplication result are shifted and added to obtain the target multiplication result corresponding to the i-th feature map;
or alternatively:
the cache controller in the i-th control module is specifically configured to search the cache for the first multiplication result of the high-bit portion and the model weight and, if the search succeeds, to generate a cache read signal corresponding to the high-bit portion;
the decoder in the i-th control module is specifically configured to generate, after the cache read signal is generated, a second control signal for the i-th lookup-table-based multiplier based on the input low-bit portion;
the i-th lookup-table-based multiplier in each computing core module is specifically configured to calculate, based on the second control signal, the second multiplication result of the low-bit portion and the model weight;
based on the cache read signal, the first multiplication result is read from the cache; and
the first multiplication result and the second multiplication result are shifted and added to obtain the target multiplication result corresponding to the i-th feature map.

9. The digital in-memory computing system based on caching and lookup tables according to claim 7 or 8, characterized by further comprising a first-level alignment module configured to:
receive each feature map to be computed and determine the feature map index of each feature map;
align the feature maps based on their respective feature map indexes; and
input each aligned feature map into a different digital in-memory computing unit.

10. The digital in-memory computing system based on caching and lookup tables according to claim 8, characterized in that the computing core module further comprises a second-level alignment module;
the i-th lookup-table-based multiplier in each computing core module is specifically configured to shift and add the first multiplication result and the second multiplication result to obtain an initial multiplication result corresponding to the i-th feature map; and
the second-level alignment module in each of the multipliers is specifically configured to align the initial multiplication result based on the weight index of the model weight corresponding to the i-th feature map, so as to obtain the target multiplication result corresponding to the i-th feature map.