An image convolution iteration method and computer program product

By determining the effective convolution size and edge padding size, the writing process of image data in the storage unit and the convolution operation process are optimized, solving the problem of low efficiency of convolution iteration in wafer image processing, realizing efficient image convolution iteration, and meeting real-time requirements.

CN121883237B · Active · Publication Date: 2026-06-19 · BEIJING OPTOKO MICROELECTRONICS TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING OPTOKO MICROELECTRONICS TECH CO LTD
Filing Date
2026-03-20
Publication Date
2026-06-19

Figure CN121883237B_ABST

Abstract

This application discloses an image convolution iteration method and computer program product, relating to the field of image processing technology. The image convolution iteration method includes: determining the effective convolution size and edge padding size based on the convolution kernel size and the convolution image size, and then deriving the first storage address after reserving the edge padding position as the initial storage address; acquiring the target image or its intermediate convolution results according to the effective convolution size to obtain multiple blocks; writing each block one-to-one into the corresponding storage unit within a single storage cycle, starting from the initial storage address; performing convolution operations after edge padding of the blocks; and returning to the step of acquiring image data to continue iterative convolution on the current convolution result if a preset termination condition is not met. The execution complexity of the convolution iteration operation is low, reducing timing delays and improving the efficiency of image convolution iteration, meeting the real-time requirements of wafer image processing.

Description

Technical Field

[0001] This application belongs to the field of image processing technology, and in particular relates to an image convolution iteration method and computer program product.

Background Technology

[0002] In the wafer image processing process of chip manufacturing, in order to achieve accurate detection of wafer surface defects, image detail enhancement, noise filtering and other image processing operations, it is necessary to perform multiple rounds of convolution operations on the image data acquired by the image acquisition device, so as to improve image quality through progressively optimized convolution iteration processing.

[0003] Currently, image convolution iteration is generally performed using a block-based recombination method. After performing a single convolution operation on each block of the target image, the intermediate convolution results for each block need to be re-aggregated. Then, the aggregated convolution results are divided into multiple blocks for the next convolution operation. Because the block-based recombination method requires multiple operations such as result aggregation and data scheduling, the image convolution iteration efficiency is low and cannot meet the real-time requirements of wafer image processing.

Summary of the Invention

[0004] This application provides an image convolution iteration method and computer program product, which can improve the efficiency of image convolution iteration to meet the real-time requirements of wafer image processing.

[0005] A first aspect of this application provides an image convolution iteration method applied to an FPGA configured with multiple storage units. The method comprises the following steps:

[0006] Based on the preset convolution kernel size and effective convolution size, determine the convolution image size and the edge padding size of the input for a single convolution operation;

[0007] Based on the convolution image size and the edge padding size, determine the first storage address after reserving the edge padding positions, and use it as the initial storage address;

[0008] Obtain image data according to the effective convolution size to obtain multiple blocks, wherein the image data is the target image or an intermediate convolution result of the target image;

[0009] Starting from the initial storage address, write the image data of each block into the storage units, wherein the storage units and the blocks are in one-to-one correspondence within a single storage cycle;

[0010] Perform edge padding at the edge padding positions according to the edge padding size;

[0011] Perform a convolution operation on the edge-padded blocks to obtain the current convolution result;

[0012] Determine whether the current convolution result meets a preset termination condition;

[0013] If not, return to the step of obtaining image data according to the effective convolution size to obtain multiple blocks, and perform data blocking, data storage, edge padding, and the convolution operation on the current convolution result;

[0014] If so, the current convolution result is used as the output convolution result.
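As a rough illustration of the loop in [0006] to [0014], the Python sketch below runs the steps sequentially on nested lists. It is not the FPGA implementation: the names pad_block, conv2d_valid, and iterate_convolution are hypothetical, the storage-address handling of [0007] and [0009] is omitted, zero-valued padding is assumed (the padding values are not specified here), and the termination condition is assumed to be either a maximum iteration count or a maximum per-pixel change below tol. The kernel is applied without flipping (correlation), which equals true convolution only for symmetric kernels.

```python
from typing import List, Tuple

Image = List[List[float]]


def pad_block(block: Image, p: int) -> Image:
    """Surround a K*K block with p rows/columns of zeros (padding value is an assumption)."""
    width = len(block[0]) + 2 * p
    top = [[0.0] * width for _ in range(p)]
    middle = [[0.0] * p + list(row) + [0.0] * p for row in block]
    bottom = [[0.0] * width for _ in range(p)]
    return top + middle + bottom


def conv2d_valid(x: Image, kernel: Image) -> Image:
    """Slide an N*N kernel over an M*M input, returning the (M-N+1)*(M-N+1) = K*K output."""
    n = len(kernel)
    out_h, out_w = len(x) - n + 1, len(x[0]) - n + 1
    return [[sum(kernel[a][b] * x[i + a][j + b] for a in range(n) for b in range(n))
             for j in range(out_w)]
            for i in range(out_h)]


def iterate_convolution(image: Image, kernel: Image, k: int,
                        max_iters: int = 10, tol: float = 1e-3) -> Tuple[Image, int]:
    """Hypothetical driver: block, pad, convolve, and repeat until the termination test passes."""
    n = len(kernel)
    if n % 2 == 0:
        raise ValueError("kernel side N must be odd")
    p = (n - 1) // 2                                  # edge padding size
    h, w = len(image), len(image[0])
    if h % k or w % k:
        raise ValueError("this sketch assumes image sides are multiples of K")
    current = [[float(v) for v in row] for row in image]
    iters = 0
    for iters in range(1, max_iters + 1):
        result = [[0.0] * w for _ in range(h)]
        for r in range(0, h, k):                      # each K*K block, processed sequentially here
            for c in range(0, w, k):
                block = [row[c:c + k] for row in current[r:r + k]]
                out = conv2d_valid(pad_block(block, p), kernel)
                for i in range(k):
                    result[r + i][c:c + k] = out[i]
        change = max(abs(result[i][j] - current[i][j]) for i in range(h) for j in range(w))
        current = result                              # current result feeds the next round
        if change < tol:                              # assumed termination condition
            break
    return current, iters
```

Because each block is padded independently with zeros in this sketch, pixels near block boundaries can differ from a whole-image convolution; the values actually written into the padding positions determine that behavior.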

[0015] A second aspect of this application provides an image convolution iteration apparatus, comprising:

[0016] The parameter calibration module is used to determine the convolution image size and the edge padding size of the input for a single convolution operation based on the preset convolution kernel size and effective convolution size;

[0017] The initial address module is used to determine, based on the convolution image size and the edge padding size, the first storage address after reserving the edge padding positions, and to use it as the initial storage address;

[0018] The block segmentation module is used to acquire image data according to the effective convolution size to obtain multiple blocks, wherein the image data is the target image or an intermediate convolution result of the target image;

[0019] The storage module is used to write the image data of each block into the storage units starting from the initial storage address, wherein the storage units and the blocks are in one-to-one correspondence within a single storage cycle;

[0020] The edge padding module is used to perform edge padding at the edge padding positions according to the edge padding size;

[0021] The convolution operation module is used to perform a convolution operation on the edge-padded blocks to obtain the current convolution result;

[0022] The iteration determination module is used to determine whether the current convolution result meets the preset termination condition; if not, control returns to the block segmentation module to perform data blocking, data storage, edge padding, and the convolution operation on the current convolution result; if so, control passes to the output module.

[0023] The output module is used to take the current convolution result as the output convolution result.

[0024] A third aspect of the embodiments of this application provides an electronic device comprising a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the image convolution iteration method provided in any aspect of the embodiments of this application described above.

[0025] A fourth aspect of the embodiments of this application provides a readable storage medium storing a program or instructions, which, when executed by a processor, implements an image convolution iteration method as provided in any of the embodiments of this application described above.

[0026] A fifth aspect of the embodiments of this application provides a computer program product in which instructions, when executed by a processor of an electronic device, cause the electronic device to perform an image convolution iteration method as provided in any aspect of the embodiments of this application described above.

[0027] The technical solution provided in this application has at least the following beneficial effects:

[0028] In the image convolution iteration method provided in the embodiments of this application, the effective convolution size and the edge padding size are determined according to the convolution kernel size and the convolution image size, and the first storage address after reserving the edge padding positions is then derived as the initial storage address. The target image or its intermediate convolution result is obtained according to the effective convolution size to obtain multiple blocks; starting from the initial storage address, each block is written into its own storage unit within a single storage cycle; after the blocks are edge-padded, the convolution operation is performed; and if the preset termination condition is not met, the method returns to the step of obtaining image data according to the effective convolution size and continues iterative convolution on the current convolution result. The convolution iteration has low execution complexity, which reduces timing delay, improves the efficiency of image convolution iteration, and meets the real-time requirements of wafer image processing.

Attached Figure Description

[0029] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments of this application will be briefly introduced below. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0030] Figure 1 is a schematic flowchart of an image convolution iteration method provided in one embodiment of this application;

[0031] Figure 2 is a schematic diagram of the data format of the Advanced eXtensible Interface 4 (AXI4) stream transport interface provided in one embodiment of this application;

[0032] Figure 3 is a schematic diagram of the structure of image data and blocks provided in one embodiment of this application;

[0033] Figure 4 is a schematic diagram of the structure of an image convolution iteration apparatus provided in one embodiment of this application;

[0034] Figure 5 is a schematic diagram of an image convolution iteration device provided in one embodiment of this application.

Detailed Implementation

[0035] The features and exemplary embodiments of various aspects of this application will be described in detail below. To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are only intended to explain this application and not to limit it. For those skilled in the art, this application can be implemented without some of these specific details. The following description of the embodiments is merely to provide a better understanding of this application by illustrating examples.

[0036] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes said element.

[0037] It should be noted that the acquisition, storage, use, and processing of data in the technical solution of this application all comply with the relevant provisions of national laws and regulations. In the embodiments of this application, certain existing industry solutions such as software, components, and models may be mentioned. These should be considered exemplary, intended only to illustrate the feasibility of implementing the technical solution of this application, and do not imply that the applicant has already used or necessarily used such solutions.

[0038] First, the terms and concepts involved in one or more embodiments of this application will be explained.

[0039] Convolution iteration refers to the process of filtering, extracting features, or enhancing details of an image by repeatedly performing convolution operations. The output of each round of operation can be used as the input for the next round.

[0040] A Field Programmable Gate Array (FPGA) is a hardware circuit that allows for flexible configuration of logic functions and storage resources.

[0041] The kernel size refers to the size of the kernel used in the convolution operation.

[0042] The convolution image size refers to the width and height of the image data input to a single convolution operation, including the edge padding.

[0043] The effective convolution size refers to the size of the image region that actually participates in the effective convolution calculation in a single convolution operation.

[0044] The edge padding size refers to the number of pixels padded onto the edges of an image block so that the convolution operation can be performed normally at the block edges.

[0045] The target image refers to the original image to be processed by convolution iteration.

[0046] Intermediate convolution results refer to temporary image data obtained from a single convolution operation during the convolution iteration process that has not yet met the termination condition, and can be used as input for the next iteration.

[0047] A single storage cycle refers to the complete time period from when a block of image data is written to a storage unit to when it is read and used. For example, in a timing cycle of an FPGA performing parallel write operations on a batch of blocks, a storage unit stores only one block of image data within this timing cycle, realizing one-to-one parallel storage between blocks and storage units.

[0048] In the wafer image processing process of chip manufacturing, in order to achieve accurate detection of wafer surface defects, image detail enhancement, noise filtering and other image processing operations, it is necessary to perform multiple rounds of convolution operations on the image data acquired by the image acquisition device, so as to improve image quality through progressively optimized convolution iteration processing.

[0049] Currently, image convolution iteration is generally performed using a block-based recombination method. After performing a single convolution operation on each block of the target image, the intermediate convolution results for each block need to be re-aggregated. Then, the aggregated convolution results are divided into multiple blocks for the next convolution operation. Because the block-based recombination method requires multiple operations such as result aggregation and data scheduling, the image convolution iteration efficiency is low and cannot meet the real-time requirements of wafer image processing.

[0050] To address the aforementioned technical problems, this application provides an image convolution iteration method and a computer program product. In the image convolution iteration method provided in this application, firstly, based on a preset convolution kernel size and convolution image size, the effective convolution size and edge padding size required for a single convolution operation are determined. Based on these two, an initial storage address is determined after reserving edge padding positions, laying the foundation for subsequent data storage and convolution operations. Secondly, the target image or its intermediate convolution results are obtained according to the effective convolution size to obtain multiple blocks. Starting from the initial storage address, each block is written one-to-one into the corresponding storage unit of the FPGA within a single storage cycle. After edge padding is completed, convolution operations are performed on each block in parallel to obtain the current convolution result. Finally, it is determined whether the current convolution result meets a preset termination condition. If not, the process returns to the step of obtaining image data according to the effective convolution size to obtain multiple blocks, and convolution operations are continued on the current convolution result. If yes, the final convolution result is output, thereby achieving efficient image convolution iteration processing.

[0051] For example, the image convolution iteration method provided in this application can be applied to the production line of a semiconductor manufacturing company. It is used to perform multiple rounds of convolution iteration processing on the raw wafer surface image acquired by a Time Delay Integration (TDI) camera, achieving wafer surface defect detection, image detail enhancement, and noise filtering. In practical applications, operators can acquire the raw image of the wafer surface through an image acquisition device and input it into an FPGA configured with multiple storage units. The FPGA then executes the block segmentation, storage, edge filling, convolution operation, and iteration process step by step according to the above-described image convolution iteration method to obtain an optimized wafer image. This optimized wafer image is then stored in a data storage device. Based on this optimized wafer image, operators can perform wafer surface defect identification, defect location, yield analysis, and production process adjustments, thereby improving the inspection efficiency and product yield of the semiconductor production line.

[0052] It should be noted that the application scenarios described in the above embodiments of this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided by the embodiments of this application. Those skilled in the art will understand that with the emergence of new application scenarios, the technical solutions provided by the embodiments of this application are also applicable to similar technical problems. The image convolution iteration method provided by the embodiments of this application can be applied to various application scenarios that require convolution iteration operations on images.

[0053] The image convolution iteration method provided in the embodiments of this application is described below. In practical applications, the executing entity of the image convolution iteration method can be a terminal device, such as a desktop or laptop computer, or a remote device such as a server. The embodiments of this application can also adopt a software-form executing entity, such as a client or a software program installed on a terminal device. The specific type of executing entity for the technical solution provided in the embodiments of this application is not strictly limited here and can be selected flexibly according to the actual application scenario and needs.

[0054] The following describes specific embodiments of the image convolution iterative method, apparatus, electronic device, storage medium, and computer program product provided in this application. First, an image convolution iterative method is introduced.

[0055] Figure 1 is a schematic flowchart of an image convolution iteration method provided in an embodiment of this application. As shown in Figure 1, this method is applied to an FPGA configured with multiple storage units, and the image convolution iteration process includes steps S100 to S107, as follows:

[0056] S100: Determine the convolution image size and the edge padding size of the input image for a single convolution operation based on the preset convolution kernel size and effective convolution size.

[0057] In one or more embodiments of this application, in order to ensure the accuracy of the convolution operation results of each block in subsequent steps and to avoid abnormal convolution results due to missing image edge data, this application needs to pre-determine processing parameters that match the convolution kernel size and the effective convolution size in this step.

[0058] Specifically, this application first needs to pre-set the specific sizes of the convolution kernel size and the effective convolution size to ensure that the effective convolution size matches the convolution kernel size, so as to ensure that the effective data within the block can meet the requirements of the complete convolution operation. Second, based on the convolution kernel size and the effective convolution size, the actual image size participating in a single convolution operation is determined, i.e., the convolution image size. Finally, in order to compensate for the impact of missing pixels at the image edges on the convolution operation, the edge padding size is determined based on the convolution image size and the effective convolution size. This edge padding size determines the number of pixel rows / columns that need to be added at the top, bottom, left, and right sides of the input image for the convolution operation, so as to ensure that the convolution kernel can still complete the complete convolution operation at the image edge position, providing a size basis for subsequent edge padding.

[0059] It should be noted that a convolution operation refers to the process in which the convolution kernel slides across the image row by row and column by column, performing a multiply-accumulate operation at each position to obtain one output pixel. In this application, a single convolution operation refers to one round of convolution performed on the image data, corresponding to one iteration of the convolution iteration process, rather than a single multiply-accumulate operation during the kernel's sliding traversal of the image. This application does not limit the specific convolution kernel size or effective convolution size. The two sizes are related, and their specific values can be set jointly based on actual image processing requirements, FPGA hardware resource constraints, and computational efficiency goals. For different image processing objectives, such as wafer surface defect detection, detail enhancement, or noise filtering, an appropriate convolution kernel size and effective convolution size are selected: a larger convolution kernel is more suitable for global feature extraction and large-scale noise suppression, while a smaller convolution kernel is more suitable for extracting subtle defects and local details. Taking into account the number of storage units inside the FPGA, the data processing bandwidth, and the computational latency constraints, a convolution kernel size and effective convolution size are selected that fit the hardware resources and balance processing accuracy against computational efficiency while ensuring image processing performance.
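To make the distinction between one round and one multiply-accumulate concrete, the hypothetical Python sketch below performs a single round on an already padded input and counts the multiply-accumulate (MAC) steps it takes. The function name is illustrative, and the kernel is applied without flipping, which equals convolution for symmetric kernels.

```python
from typing import List, Tuple


def one_convolution_round(x: List[List[float]],
                          kernel: List[List[float]]) -> Tuple[List[List[float]], int]:
    """One 'single convolution operation': slide the kernel over every valid position of x.

    Returns the output image and the number of multiply-accumulate (MAC) steps performed.
    """
    n = len(kernel)
    out_h, out_w = len(x) - n + 1, len(x[0]) - n + 1
    out: List[List[float]] = []
    macs = 0
    for i in range(out_h):              # kernel slides row by row ...
        row = []
        for j in range(out_w):          # ... and column by column
            acc = 0.0
            for a in range(n):
                for b in range(n):
                    acc += kernel[a][b] * x[i + a][j + b]   # one MAC step
                    macs += 1
            row.append(acc)             # one output pixel per position
        out.append(row)
    return out, macs
```

For a 6*6 padded input and a 3*3 kernel, one round yields a 4*4 output and performs 16 * 9 = 144 MAC steps, yet it still counts as a single convolution operation in the sense used here.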

[0060] Taking a two-dimensional square convolution as an example, the relationship between the effective convolution size, the kernel size, and the convolution image size is as follows:

[0061] $$M = K + 2\cdot\frac{N-1}{2} = K + N - 1 \qquad (1)$$

[0062] For ease of description and calculation, all dimensions in this application are expressed as number of rows * number of columns. The effective convolution size is K*K, where K is the side length of the effective convolution region, meaning that each row/column contains K pixels of data before edge padding; the convolution kernel size is N*N, where N is the side length of the convolution kernel; the convolution image size is M*M, where M is the side length of the convolution image, meaning that each row/column contains M pixels of data after edge padding; and the edge padding size is (N-1)/2. The values of K and N are not less than 1, and N is odd. For convolutions other than two-dimensional square convolution, the above size relationship can be adapted to the actual convolution dimensions and shape, for example by calculating the sizes independently from the corresponding side lengths in the horizontal and vertical directions, or by using the size calculated in the horizontal or vertical direction as a unified size in both directions to reduce complexity.
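The size rule of [0062] can be written as a small helper. This is a hypothetical Python sketch: M = K + N - 1 follows from adding (N-1)/2 pixels on each side of a K-pixel row or column, and the per-direction variant illustrates the first non-square option mentioned in the paragraph.

```python
from typing import Tuple


def convolution_sizes(n: int, k: int) -> Tuple[int, int]:
    """Return (M, P) for an N*N kernel and a K*K effective convolution region.

    P = (N - 1) / 2 pixels are added on each of the four sides, so M = K + 2P = K + N - 1.
    """
    if n < 1 or k < 1:
        raise ValueError("N and K must be at least 1")
    if n % 2 == 0:
        raise ValueError("N must be odd")
    p = (n - 1) // 2
    m = k + 2 * p
    return m, p


def convolution_sizes_2d(n_rows: int, n_cols: int,
                         k_rows: int, k_cols: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Non-square case: apply the same rule independently in each direction."""
    m_r, p_r = convolution_sizes(n_rows, k_rows)
    m_c, p_c = convolution_sizes(n_cols, k_cols)
    return (m_r, m_c), (p_r, p_c)
```

For example, a 3*3 kernel with a 4*4 effective region gives a 6*6 convolution image and 1 pixel of padding per side.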

[0063] S101: Based on the convolutional image size and the edge padding size, determine the first storage address after reserving the edge padding position, and use it as the initial storage address.

[0064] In one or more embodiments of this application, in order to reserve sufficient storage space in the storage unit for edge padding data and avoid address overlap or storage confusion between image data and padding data during storage, this step requires determining the first storage address after reserving the edge padding space as the initial storage address.

[0065] Specifically, this application determines the overall required storage area based on the convolution image size and reserves the corresponding edge padding positions in the storage area according to the edge padding size. After the reservation, the first storage address used to store image data is taken as the initial storage address, so that the image data of the blocks in subsequent steps can be written starting from this address, ensuring that edge padding and convolution operations can be executed in an orderly manner.

[0066] It should be noted that this application does not limit the specific type of storage unit, which can be set according to actual needs. For example, Block Random Access Memory (BRAM), or BRAM together with Dynamic Random Access Memory (DRAM), can be used as the storage units. To improve data read and write efficiency, meet the requirement that subsequent edge padding and convolution operations execute in parallel, and avoid the increased processing latency caused by read/write conflicts on a single port, this application can use dual-port BRAM as the storage units. The initial storage address is the starting write address of the block image data in the FPGA storage unit; its function is to pre-plan sufficient storage area for the edge padding data.

[0067] Following the previous example, based on the zero-based addressing rule and combining the parameter settings of the convolutional image size M*M and the edge padding size (N-1) / 2, the initial storage address is determined as follows:

[0068]
$$A(i, j) = i \cdot W + j \qquad (2)$$
$$A_0 = A(P, P) = P \cdot M + P = \frac{N-1}{2}\,(M + 1) \qquad (3)$$

[0069] In equations (2) to (3), equation (2) is the formula for calculating the storage address under the zero-based addressing rule, where $i$ is the row number, $W$ is the total number of addresses in each row, $j$ is the column number, $A(i, j)$ is the storage address corresponding to row $i$ and column $j$, and $A_0$ is the initial storage address. Each row occupies $M$ pixel positions even before edge padding is performed, i.e., $W = M$, consistent with the total number of addresses occupied by each row of data; $P = (N-1)/2$ denotes the edge padding size; and the row/column numbers range from 0 to $(M-1)$. The row and column numbers of the first storage address after reserving the edge padding positions are both $P$, so substituting them into equation (2) gives the initial storage address of equation (3).
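Assuming each storage unit holds the M*M convolution image in zero-based, row-major order with M addresses per row, the initial storage address is the address of row P, column P, where P = (N-1)/2. The hypothetical Python helpers below compute it; the row-major layout and the function names are assumptions for illustration.

```python
def storage_address(row: int, col: int, row_width: int) -> int:
    """Zero-based row-major address: row * (addresses per row) + column."""
    return row * row_width + col


def initial_storage_address(n: int, k: int) -> int:
    """First address after reserving P padding rows above and P padding columns to the left."""
    if n < 1 or k < 1 or n % 2 == 0:
        raise ValueError("need odd N >= 1 and K >= 1")
    p = (n - 1) // 2          # edge padding size
    m = k + n - 1             # convolution image side length, assumed = addresses per row
    return storage_address(p, p, m)
```

For N = 3 and K = 4, M = 6 and P = 1, so writing starts at address 1 * 6 + 1 = 7; addresses 0 to 5 (the top padding row) and address 6 (the left padding position of row 1) remain reserved for padding.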

[0070] S102: Obtain image data according to the effective convolution size to obtain multiple blocks, wherein the image data is the target image or the intermediate convolution result of the target image.

[0071] In one or more embodiments of this application, in order to ensure the smooth execution of block storage, edge padding, and convolution operations in subsequent steps and to guarantee the continuity of the multi-round convolution iteration process, this application needs to obtain the image data to be convolved according to the effective convolution size, and obtain multiple blocks corresponding to the effective convolution size. This image data can be the originally acquired target image, or it can be an intermediate convolution result obtained after performing at least one round of convolution operations on the target image, in order to meet the continuous processing requirements of multi-round iterative convolution.

[0072] It should be noted that this application does not limit the specific method of obtaining the blocks, which can be set according to actual needs. For example, the image data can be segmented directly according to the effective convolution size as it is acquired, so that the acquired data are already blocks of the effective convolution size; alternatively, the image data can be acquired first and then divided according to the effective convolution size to obtain multiple blocks of the corresponding size. For instance, in wafer image processing, data transmission between the TDI camera and the FPGA of the image acquisition card mainly uses a CoaXPress (CXP) coaxial serial interface or a 40G/100G optical port. The data are carried over an AXI4-Stream (Advanced eXtensible Interface 4 stream) interface with a data bit width of 256 bits and are received sequentially in the camera's line-scan order. Since the convolution operation works on pixels, the AXI4-Stream data must be converted into pixel-based image data, and the corresponding image data is then obtained according to the convolution image size.

[0073] Figure 2 is a schematic diagram of the AXI4-Stream interface data format provided in an embodiment of this application. As shown in Figure 2, log_clk is the interface transmission clock, used to synchronize the data stream transmission timing of the AXI4-Stream interface; tvalid is the sender's data valid signal, indicating whether the image data on tdata sent by the sender is valid; tready is the receiver's ready signal, indicating whether the receiver can accept the data on tdata in the current cycle; tdata carries the image data; tuser carries the synchronization code; tkeep is the double word (DW) valid mask used to identify data validity and contains 8 flag bits, each identifying the validity of one DW in tdata, so that 8'hFF indicates that all 8 DWs are valid; D0 to Dn in tdata are the multi-channel image double-word data arranged in transmission order; and ID1, 0, and ID3 in tuser are different synchronization or frame identifier signals, used to distinguish image data streams of different frames, channels, or transmission segments and to ensure the synchronization and integrity of the image data during interface transmission.
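For the second option (acquire first, then divide), a hypothetical Python sketch of the division step is shown below. The number of blocks per direction is the image side divided by K, rounded up; how partial blocks at the right and bottom edges are handled is not specified in the text, so completing them with a fill value is an assumption.

```python
import math
from typing import Dict, List, Tuple


def split_into_blocks(image: List[List[int]], k: int,
                      fill: int = 0) -> Dict[Tuple[int, int], List[List[int]]]:
    """Divide an H*W image into K*K blocks keyed by (block_row, block_col).

    Blocks that extend past the image edge are completed with `fill` (an assumption).
    """
    h, w = len(image), len(image[0])
    blocks = {}
    for bi in range(math.ceil(h / k)):
        for bj in range(math.ceil(w / k)):
            blocks[(bi, bj)] = [
                [image[i][j] if i < h and j < w else fill
                 for j in range(bj * k, bj * k + k)]
                for i in range(bi * k, bi * k + k)
            ]
    return blocks
```

A 5*5 image with K = 2 therefore yields 3 * 3 = 9 blocks, the last row and column of which are partly filled.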
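The receive side of this interface can be modeled in software as below. The sketch is hypothetical: the 32-bit DW width, the placement of D0 in the least significant bits of tdata, and the mapping of tkeep bit i to DW i are assumptions consistent with a 256-bit bus and 8 tkeep flags, while the rule that a beat is transferred only in a cycle where tvalid and tready are both high is the standard AXI4-Stream handshake. The conversion from DWs to pixels depends on the pixel bit depth, which is not given here, so it is omitted.

```python
from typing import Iterable, List, Tuple

DW_BITS = 32                     # assumed double-word width
DWS_PER_BEAT = 256 // DW_BITS    # 256-bit tdata -> 8 DWs, one tkeep flag each


def unpack_beat(tdata: int, tkeep: int) -> List[int]:
    """Return the valid DWs of one 256-bit beat, D0 first.

    Assumes D0 occupies the least significant 32 bits of tdata and that
    tkeep bit i flags DW i (so 0xFF marks all 8 DWs valid).
    """
    mask = (1 << DW_BITS) - 1
    return [(tdata >> (i * DW_BITS)) & mask
            for i in range(DWS_PER_BEAT) if (tkeep >> i) & 1]


def collect_dws(beats: Iterable[Tuple[int, int, int, int]]) -> List[int]:
    """Accumulate DWs from (tvalid, tready, tdata, tkeep) samples, one per clock cycle.

    A beat is transferred only in a cycle where tvalid and tready are both high.
    """
    dws: List[int] = []
    for tvalid, tready, tdata, tkeep in beats:
        if tvalid and tready:
            dws.extend(unpack_beat(tdata, tkeep))
    return dws
```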

[0074] S103: Starting from the initial storage address, the image data of each block is written into the storage unit, and the storage unit and the block have a one-to-one relationship in a single storage cycle.

[0075] In one or more embodiments of this application, to ensure the reliability of the data source for edge padding and convolution operations in subsequent steps and to avoid data overwriting, address misalignment, or storage conflicts, this application uses the initial storage address determined in step S101 as the starting location for writing image data, and writes the image data of each block into the corresponding storage unit.
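A hypothetical Python model of this write step is sketched below. It assumes square K*K blocks, odd N, and a storage unit of M*M addresses laid out row-major with M addresses per row; the function names are illustrative, and the real design writes to FPGA storage units such as BRAM rather than Python lists.

```python
from typing import List, Optional, Sequence


def write_block(block: Sequence[Sequence[int]], n: int) -> List[Optional[int]]:
    """Write one K*K block into a flat M*M storage unit starting at the initial address.

    Positions left as None are the reserved edge padding positions.
    """
    k = len(block)
    p = (n - 1) // 2                 # edge padding size
    m = k + n - 1                    # convolution image side length (addresses per row, assumed)
    unit: List[Optional[int]] = [None] * (m * m)
    start = p * m + p                # initial storage address
    for i, row in enumerate(block):
        for j, value in enumerate(row):
            unit[start + i * m + j] = value   # row stride M skips the padding columns
    return unit


def write_batch(blocks: Sequence[Sequence[Sequence[int]]],
                n: int) -> List[List[Optional[int]]]:
    """One storage unit per block within a storage cycle (one-to-one mapping)."""
    return [write_block(b, n) for b in blocks]
```

With a 2*2 block and N = 3, the unit has 16 addresses, data land at addresses 5, 6, 9, and 10, and the remaining 12 addresses stay free for edge padding.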

[0076] It should be noted that within a single storage cycle, one storage unit stores only one block of image data. Storage units and blocks are therefore in one-to-one correspondence, which improves the parallelism and stability of data reads and writes.

Continuing the previous example, this application acquires image data in line-scan order, so the image data is acquired and written to the storage units line by line. The ratio of the width of the image data to the effective convolution size of each block is therefore calculated and rounded up. The result is used as the number of blocks stored in parallel in a single storage cycle, so that the parallel storage rate matches the line-scan throughput.

This application does not limit the number of storage units; it can be set according to actual needs, for example equal to the total number of blocks. However, the number of storage units must be at least twice the number of blocks stored in a single parallel storage operation. This ensures a smooth transition between continuous image data reception, parallel storage, and convolution operations, and avoids data congestion or waiting.

When the number is exactly twice that number, the storage units form two sets:

- one set receives and writes new block data;
- the other set supplies already written block data to the FPGA, which reads the image data from those storage units and then performs edge padding and convolution operations.

The two sets are used alternately, so that data writing and convolution operations run as a parallel pipeline. The next batch of data is written without waiting for the current batch to finish, which ensures uninterrupted processing of image data in line-scan mode.
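The sizing rule in [0076] is plain arithmetic. The sketch below assumes that the two sets of storage units swap roles with every row of blocks. The helper names and the example of a 1000-pixel line with K = 64 are illustrative assumptions, not values from this application.

```python
def blocks_per_cycle(line_width: int, K: int) -> int:
    """Blocks stored in parallel in one storage cycle: ceil(line_width / K)."""
    return (line_width + K - 1) // K  # integer ceiling division


def min_storage_units(line_width: int, K: int) -> int:
    """Two sets of storage units: one being written, one feeding the convolution."""
    return 2 * blocks_per_cycle(line_width, K)


def set_roles(block_row: int) -> tuple:
    """Ping-pong roles for a row of blocks: (set receiving new blocks, set read by the FPGA)."""
    write_set = block_row % 2
    return write_set, 1 - write_set


# Hypothetical example: a 1000-pixel line and K = 64.
print(blocks_per_cycle(1000, 64), min_storage_units(1000, 64))  # prints 16 32
for row in range(4):
    w, r = set_roles(row)
    print(f"block row {row}: written into set {w}, set {r} supplies the FPGA")
```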

[0077] Figure 3 is a schematic diagram illustrating the structure of the image data and blocks provided in this application. As shown in Figure 3, the width of the image data is i times the effective convolution size and the height is j times the effective convolution size (where i and j are positive integers). Accordingly, a single convolution pass over the image data comprises j rounds. Each round stores i blocks in parallel and performs the convolution operation on them. The blocks are acquired in the order block 1, block 2, ..., block j*i.
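The block order of Figure 3 (j rounds of i blocks, numbered row by row) can be sketched as follows. The sketch assumes an image whose width and height are exact multiples of K, held as a Python list of pixel rows. The function is hypothetical; in the FPGA, the blocks arrive as a stream rather than being cut from a stored image.

```python
from typing import List

Image = List[List[int]]


def split_into_blocks(image: Image, K: int) -> List[Image]:
    """Cut an image of (j*K) x (i*K) pixels into K x K blocks.

    Blocks are returned in acquisition order: round r (0-based) yields
    blocks r*i + 1 ... r*i + i, i.e. one row of i blocks stored in parallel.
    """
    height, width = len(image), len(image[0])
    if height % K or width % K:
        raise ValueError("image size must be a multiple of K")
    j, i = height // K, width // K
    blocks: List[Image] = []
    for r in range(j):
        for c in range(i):
            blocks.append([line[c * K:(c + 1) * K] for line in image[r * K:(r + 1) * K]])
    return blocks
```

For example, take a 4 x 6 image with K = 2, so j = 2 and i = 3. Block 5, the second block of round 2, is the 2 x 2 patch at rows 2 to 3 and columns 2 to 3.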

[0078] In one or more embodiments of this application, each row of image data in a block is written to the storage unit in order according to a preset address rule. This avoids address offset errors and disordered storage. To do so, the starting storage address of each row of data in the block is determined from the initial storage address and the convolution image size. The image data of each block is then written into its corresponding storage unit, and each row of the block is written starting at its own starting storage address.

[0079] Continuing with the previous example, the first row (row number 0) of the block starts at the initial storage address, and its K pixels are stored at consecutive addresses from there. The following rows are stored in the same way. Each row of data occupies M = K + N - 1 addresses in total: K pixels plus (N-1) / 2 reserved edge filling positions on each of the left and right sides. For row number r, the starting storage address is therefore the initial storage address plus r*M, and each row is written starting at its starting storage address.
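The row addressing in [0079] amounts to a fixed stride of M = K + N - 1 addresses per row. The sketch below lists, for each row of one block, its left padding positions, pixel addresses, and right padding positions. The numeric initial storage address in the example is an assumption, because this section does not give the value. It assumes storage starts at address 0, with (N-1) / 2 full padding rows above the first pixel and (N-1) / 2 padding positions to its left. The function name is hypothetical.

```python
def row_layout(K: int, N: int, init_addr: int):
    """Address layout of one block's rows in its storage unit.

    Each stored row spans M = K + N - 1 addresses: (N-1)/2 left padding
    positions, K pixels, (N-1)/2 right padding positions.
    """
    if N % 2 == 0:
        raise ValueError("kernel size N is assumed to be odd")
    P = (N - 1) // 2
    M = K + N - 1
    rows = []
    for r in range(K):
        start = init_addr + r * M  # starting storage address of row r
        rows.append({
            "left_pad": range(start - P, start),
            "pixels": range(start, start + K),
            "right_pad": range(start + K, start + K + P),
        })
    return M, rows


K, N = 4, 3
P = (N - 1) // 2
init_addr = P * (K + N - 1) + P  # assumption: address 0 + P padding rows + P left positions
M, rows = row_layout(K, N, init_addr)
print(M, list(rows[0]["pixels"]), list(rows[1]["pixels"]))  # prints 6 [7, 8, 9, 10] [13, 14, 15, 16]
```

The right padding of one row and the left padding of the next are adjacent. These are the N - 1 reserved addresses between consecutive rows.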

[0080] In this embodiment, the present application determines the storage address range of the image data corresponding to a single block in the storage unit based on the convolution image size, edge padding size, and effective convolution size. By setting this address, edge padding positions that meet the requirements of convolution operation are reserved at the top, bottom, left, and right sides of the block, so that subsequent steps can directly complete the edge padding operation at the edge padding positions without having to perform overall address offset or reconstruction on the image data corresponding to the block, thus ensuring the reliability of performing convolution operation based on the data in the storage unit.

[0081] S104: Perform edge filling at the edge filling position according to the edge filling size.

[0082] In one or more embodiments of this application, in order to ensure that the input image size for convolution operations in subsequent steps meets the requirements of convolution operations and to ensure the continuity of multiple rounds of convolution operations, this step requires that the blocks be edge-filled according to a preset edge-filling size to expand the image size to the convolution image size required for the convolution operation.

[0083] Specifically, this application writes padding data at the corresponding edge padding positions around the image data based on the edge padding size, so that the size of the padded image is consistent with the size of the convolutional image, providing compliant input data for subsequent convolution operations.
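As a software reference for [0083] and [0084], the following sketch pads a K x K block by P = (N-1) / 2 on every side to obtain M x M data. It supports constant (including zero), boundary copy, and mirror padding. It models the result only, not the in-place filling of the storage unit.

Reading "mirror" as a reflection that repeats the edge pixel is an assumption. It was chosen because the mirrored data described later in this application is the block's own first (N-1) / 2 pixels. Mirror mode also requires P to be no larger than the block size. The function name is hypothetical.

```python
from typing import List, Optional

Image = List[List[int]]


def pad_block(block: Image, P: int, mode: str = "constant", value: int = 0) -> Image:
    """Pad a K x K block by P = (N-1)/2 on every side to get M x M (M = K + 2P)."""
    h, w = len(block), len(block[0])

    def source(idx: int, n: int) -> Optional[int]:
        """Map a padded coordinate to a source coordinate (None = use the constant)."""
        if 0 <= idx < n:
            return idx
        if mode == "constant":           # constant or zero padding
            return None
        if mode == "replicate":          # boundary copy padding
            return min(max(idx, 0), n - 1)
        if mode == "mirror":             # reflection that repeats the edge pixel
            return -idx - 1 if idx < 0 else 2 * n - 1 - idx
        raise ValueError(f"unknown mode {mode!r}")

    padded = []
    for y in range(-P, h + P):
        sy = source(y, h)
        row = []
        for x in range(-P, w + P):
            sx = source(x, w)
            row.append(value if sy is None or sx is None else block[sy][sx])
        padded.append(row)
    return padded
```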

[0084] It should be noted that for each convolution operation, the input image size is the convolution image size M*M, and the output image size is the effective convolution size K*K. That is, this application requires edge padding of the target image or its intermediate convolution result K*K to obtain data of size M*M, which serves as the input for the next convolution operation, thereby achieving continuous iterative convolution operations. This application does not limit the specific edge padding method; it can be set according to actual needs, such as constant padding with fixed pixel values, zero-value padding, mirror padding, or boundary copy padding. To fully utilize idle hardware resources, avoid edge padding operations occupying additional clock cycles, improve overall convolution processing efficiency, and ensure seamless integration of edge padding with block storage and convolution operations, this application can synchronously perform edge padding operations during the idle clock period when block data is written to the storage unit, as follows:

[0085] For horizontal edge filling on the left and right sides of a block: for each block, the idle clock cycles between writing one row of the block into the storage unit and starting to write the next row can be used. During these cycles, this application obtains the horizontal filling data of the block according to the edge filling size. It then performs horizontal edge filling at the reserved horizontal edge filling positions according to the horizontal filling data.

[0086] For vertical edge filling at the top and bottom of the blocks: for each row of blocks, the idle clock cycles between writing that row of blocks into the storage units and starting to write the next row of blocks are used. During these cycles, this application obtains the vertical filling data of the row of blocks according to the edge filling size and the convolution image size. It then performs vertical edge filling at the reserved vertical edge filling positions according to the vertical filling data.

[0087] In this embodiment, idle clocks occur in two places:

- after one row of data within a block is written and before the next row begins;
- after one row of blocks is written and before the next row of blocks begins.

Each is a gap between the storage unit completing the current write and the next batch of data becoming ready, such as the timing alignment gap in parallel storage.

Completing edge filling during these idle clocks removes the need for a separate clock cycle for edge filling, so data writing and edge filling are pipelined. It also reduces data movement and address offsets. Edge filling is completed within the storage unit, and the filling data is stored directly in the reserved edge filling positions. The block data therefore does not have to be read out, filled separately, and rewritten. This reduces read / write latency and hardware resource consumption, and keeps the data addresses contiguous after filling.

In essence, this application uses the idle clock cycles corresponding to the reserved edge filling positions to perform edge filling without increasing computational latency or interrupting normal image data writing. This improves the processing efficiency and resource utilization of the FPGA hardware pipeline.

[0088] Continuing with the previous example, after the K data of one row within a block are written and before the next K data are written, there are (N-1) idle clock cycles. That is, the right side of the current row and the left side of the next row each account for (N-1) / 2 idle clock cycles. Similarly, after a row of i blocks is written and before the next i blocks are written, there are [(N-1)*M] idle clock cycles. That is, the bottom of one row of blocks and the top of the next row of blocks each account for [(N-1) / 2*M] idle clock cycles.
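The idle clock counts in [0088] can be checked by walking one padded M x M frame in address order. This sketch assumes one address is written per clock and that the idle slots are exactly the reserved padding positions. The function names are hypothetical.

The gap between two rows within a block is N - 1. Between two rows of blocks, the (N-1) * M slots of bottom and top padding rows come on top of that usual N - 1 row gap.

```python
from typing import List


def frame_slots(K: int, N: int) -> List[str]:
    """Slot types of one padded M x M block frame in address (= clock) order."""
    P = (N - 1) // 2
    M = K + N - 1
    return ["pixel" if P <= y < P + K and P <= x < P + K else "idle"
            for y in range(M) for x in range(M)]


def idle_gaps(slots: List[str]) -> List[int]:
    """Lengths of the idle runs that separate two pixel writes."""
    gaps, run, seen_pixel = [], 0, False
    for slot in slots:
        if slot == "pixel":
            if seen_pixel and run:
                gaps.append(run)
            seen_pixel, run = True, 0
        else:
            run += 1
    return gaps


K, N = 4, 3
M = K + N - 1
gaps = idle_gaps(frame_slots(K, N) * 2)  # two consecutive rows of blocks
print(gaps)  # prints [2, 2, 2, 14, 2, 2, 2]
```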

[0089] Of course, this application does not limit the specific method of obtaining the filling data; it depends on the filling method selected. For example, if the edge filling method is constant filling or zero filling, fixed pixel values or zero values are filled into the (N-1) / 2 empty storage addresses on each of the left and right sides in the horizontal direction.

[0090] S105: Perform a convolution operation on the block that has completed edge filling to obtain the current convolution result.

[0091] S106: Determine whether the current convolution result meets the preset termination condition; if not, return to step S102 to perform data blocking, data storage, edge padding, and convolution operation on the current convolution result; if yes, execute step S107.

[0092] S107: Use the current convolution result as the output convolution result.

[0093] In one or more embodiments of this application, in order to achieve continuous convolution iterative operations on image data and accurately output the final convolution result when a preset termination condition is met, in steps S105 to S107, this application needs to perform convolution operations on the edge-filled blocks and determine whether to continue iterating based on the current convolution result, thereby forming a complete convolution processing closed loop.

[0094] Specifically, this application performs a convolution operation on the blocks that have been edge-filled to obtain the current convolution result; by judging whether the current convolution result meets the preset termination condition, it decides whether to continue the iterative operation; if it does not meet the condition, the current convolution result is used as an intermediate convolution result and returned to the block storage step to continue processing; if it meets the condition, the current convolution result is used as the output convolution result.

[0095] It should be noted that this application does not limit the specific content of the preset termination condition, which can be set according to actual needs, such as reaching a preset number of convolution iterations, or the output image after combining the output convolution results meeting preset image quality requirements.
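The closed loop of S102 to S107 can be summarized on a whole image, leaving out the blocking and storage steps. This is an illustrative sketch under several assumptions:

- zero padding stands in for whichever padding method is chosen;
- the preset termination condition is a fixed iteration count plus an optional user predicate;
- the "convolution" is computed as a sliding-window sum without flipping the kernel, which is identical for symmetric kernels.

Each pass takes M x M padded input and returns K x K output. The function names are hypothetical.

```python
from typing import Callable, List, Optional

Image = List[List[int]]


def zero_pad(img: Image, P: int) -> Image:
    """Zero padding: K x K -> (K + 2P) x (K + 2P)."""
    width = len(img[0]) + 2 * P
    body = [[0] * P + list(row) + [0] * P for row in img]
    return [[0] * width for _ in range(P)] + body + [[0] * width for _ in range(P)]


def conv_valid(x: Image, kernel: Image) -> Image:
    """'Valid' sliding-window sum (no kernel flip): M x M in, (M - N + 1) x (M - N + 1) out."""
    N = len(kernel)
    out_h, out_w = len(x) - N + 1, len(x[0]) - N + 1
    return [[sum(x[y + u][c + v] * kernel[u][v] for u in range(N) for v in range(N))
             for c in range(out_w)] for y in range(out_h)]


def convolve_iteratively(img: Image, kernel: Image, max_iters: int,
                         stop: Optional[Callable[[Image, int], bool]] = None) -> Image:
    P = (len(kernel) - 1) // 2
    result = img
    for it in range(1, max_iters + 1):
        result = conv_valid(zero_pad(result, P), kernel)  # S104 padding + S105 convolution
        if stop is not None and stop(result, it):         # S106: optional custom condition
            break
    return result                                         # S107: output convolution result
```

With a 3 x 3 all-ones kernel, one pass over a single centred impulse yields a 3 x 3 block of ones. A second pass yields the corner, edge, and centre sums 4, 6, and 9.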

[0096] In the above image convolution iterative method, this application can flexibly adapt to single or multiple convolution operation scenarios through an iterative loop mechanism. At the same time, by combining block parallel processing and pipeline architecture, it can effectively improve the convolution processing efficiency in scenarios such as wafer images.

[0097] In step S104, to ensure that the edge filling data is consistent with the image's own features and to improve the processing accuracy of convolution operations in the image edge region, this application can utilize the image data itself for edge filling, that is, obtain filling data from the edge region of the block itself and the edge regions of its adjacent blocks, as follows:

[0098] For filling the horizontal edges on the left and right sides of the block: When the block is at the edge of the image data, this application can use the storage unit corresponding to the block and the storage unit corresponding to the adjacent block in the horizontal direction as the first storage unit; when the block is not at the edge of the image data, the storage unit corresponding to the adjacent block in the horizontal direction is used as the first storage unit; from the first storage unit, the starting area data or ending area data of each row of data is obtained according to the edge filling size, and used as the horizontal filling data.

[0099] It should be noted that this application uses image data from the block itself and its horizontally adjacent blocks for horizontal edge filling, which avoids the abrupt edge problems caused by zero-value filling or constant filling, making the feature transition of the image in the horizontal direction smoother and improving the accuracy of convolution operations. Since blocks located at the edge of image data only have one adjacent block in the horizontal direction, it is impossible to obtain sufficient filling data from the adjacent blocks alone. Therefore, when a block is located at the edge of image data, its own storage unit is included in the first storage unit to obtain horizontal filling data. However, blocks located inside the image have complete adjacent blocks on both sides, and filling data can be directly obtained from the adjacent blocks, ensuring the completeness and effectiveness of horizontal filling data acquisition.

[0100] Similarly, for vertical edge filling at the top and bottom of the blocks: when the row block is at the edge of the image data, this application can use the storage unit corresponding to the row block and the storage unit corresponding to the adjacent row block in the vertical direction as the second storage unit; when the row block is not at the edge of the image data, the storage unit corresponding to the adjacent row block in the vertical direction is used as the second storage unit; from the second storage unit, according to the edge filling size and the convolution image size, the starting area data or ending area data of each column of data is obtained as the vertical filling data.

[0101] It should be noted that, similar to horizontal edge filling, distinguishing whether it is at the edge of the image data is to adapt to the difference in the number of adjacent blocks between the edge blocks and the inner blocks of the image, to ensure that effective and sufficient vertical filling data can be obtained, so that the filling process is more in line with the actual distribution of the image and improves the reliability of the convolution operation.

[0102] In the embodiments described above for obtaining horizontal and vertical fill data, the method of obtaining fill data from the real image data of the block itself and adjacent blocks in this application essentially involves filling with real data of non-zero and non-constant values. The purpose of distinguishing whether a block is at the image edge is to ensure that valid real data can be obtained for filling regardless of whether the block is inside or at the edge of the image, avoiding filling failure or invalid fill data due to the lack of adjacent blocks, and ensuring the continuity and realism of image edge features. This application does not limit the specific method of obtaining fill data from the first and second storage units; it can be set according to actual needs, such as edge copy filling, neighborhood mean filling, mirror filling, etc.

[0103] Furthermore, regarding the horizontal edge padding on the left and right sides of the block, in order to locate the storage position of the horizontal padding data in the storage unit, ensure orderly reading of the padding data, prevent address offset, and improve reading efficiency, in one or more embodiments of this application, this application can directly calculate and obtain the horizontal padding data based on a preset address rule, as follows:

[0104] Based on the initial storage address, the effective convolution size, and the edge padding size, the horizontal address range of the starting region data (left edge data) or ending region data (right edge data) of each row of data in the storage unit is determined. Then, the starting region data or ending region data of each row of data is read from the first storage unit according to the horizontal address range.

[0105] Continuing with the previous example, consider the parallel storage of the first row of blocks (block 1 to block i), taking the horizontal edge filling between block 1 and block 2 as an example. During the idle clock cycles between writing one row of data into the storage unit and starting to write the next row, the BRAM of block 1 and the BRAM of block 2 perform the following operations:

[0106] Take the horizontal edge filling of the first row of image data (row number (N-1) / 2) as an example. During the idle clock period when the first row of data starts being written into the storage unit, block 1's BRAM reads out the starting region data in the corresponding horizontal address range. This starting region data of the first row of image data is used to fill the left side of block 1.

[0107] After block 1 completes its left-side filling, the left side of block 2 and the right side of block 1 itself are filled. Both happen during the idle clock cycles between writing the first row of data into the storage unit and starting to write the second row:

- the BRAM of block 1 reads out the ending region data in the corresponding horizontal address range, which is written into the corresponding edge filling positions of block 2's storage unit to fill the left side of block 2;
- during the same period, the BRAM of block 2 reads out the starting region data of the first row of image data in the corresponding horizontal address range, which is used to fill the right side of block 1.

[0108] Similarly, for horizontal edge filling of the image data in other rows, the horizontal address ranges and edge filling positions are obtained by adding an integer multiple of M to those of the first row of image data. For example, those of the second row are obtained by adding M. By the Kth idle clock period between writing one row of data into the storage unit and starting to write the next, the horizontal edge filling of the first row of i blocks is complete.

[0109] The edge filling between other adjacent blocks follows from that between block 1 and block 2. After the left side of block 2 is filled, two steps follow: the left side of block 3 is filled using the ending region data of each row of block 2, and the right side of block 2 itself is filled. In general, once the left side of the current block is filled, the left side of the next block and the right side of the current block are filled. For the first row of image data in block i, which is located at the right edge of the image data, the BRAM of block i reads out the ending region data in the corresponding horizontal address range. This data is used to fill the right side of block i.
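The address bookkeeping of [0104] to [0109] can be sketched as follows. Every block uses the same layout in its own BRAM, so the ending region of block b maps onto the left padding positions of block b+1 at the same address offsets. Likewise, the starting region of block b+1 maps onto the right padding of block b.

The helper mirror_pairs pairs each starting-region address of a left-edge block with its symmetric left padding position. It reads the mirror as repeating the edge pixel, which is an assumption.

The function names and the 'b' / 'b+1' labels are hypothetical. So is the example initial address of 18 for K = 4 and P = 2, which assumes storage begins at address 0 with two full padding rows of M = 8 addresses and two left padding positions before the first pixel.

```python
from typing import List, Tuple


def horizontal_regions(init_addr: int, K: int, P: int, r: int) -> dict:
    """Per-row address ranges inside one block's storage unit (same layout in every block)."""
    M = K + 2 * P
    start = init_addr + r * M
    return {
        "left_pad": range(start - P, start),
        "start_region": range(start, start + P),          # left-edge data of row r
        "end_region": range(start + K - P, start + K),    # right-edge data of row r
        "right_pad": range(start + K, start + K + P),
    }


def copy_pairs(init_addr: int, K: int, P: int, r: int) -> List[Tuple[str, int, str, int]]:
    """(source BRAM, read addr, destination BRAM, write addr) for blocks b (left) and b+1 (right)."""
    reg = horizontal_regions(init_addr, K, P, r)
    left_of_next = [("b", s, "b+1", d) for s, d in zip(reg["end_region"], reg["left_pad"])]
    right_of_this = [("b+1", s, "b", d) for s, d in zip(reg["start_region"], reg["right_pad"])]
    return left_of_next + right_of_this


def mirror_pairs(init_addr: int, K: int, P: int, r: int) -> List[Tuple[int, int]]:
    """(read addr, write addr) mirroring a left-edge block's start region into its left padding."""
    reg = horizontal_regions(init_addr, K, P, r)
    return list(zip(reg["start_region"], reversed(reg["left_pad"])))
```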

[0110] To ensure the symmetry and continuity of the filling data for image edge blocks and avoid abrupt feature changes at image boundaries, in one or more embodiments of this application, the horizontal filling data obtained from the block's own storage unit can be mirrored and filled to the edge filling position, and the horizontal filling data obtained from adjacent block storage units can be copied and filled to the edge filling position, as follows:

[0111] When the block is at the edge of the image data, the horizontal fill data obtained from the storage unit of the block is mirrored; the mirrored horizontal fill data and the horizontal fill data obtained from the storage unit of the adjacent block are written to the horizontal edge fill position of the block; when the block is not at the edge of the image data, the horizontal fill data is written to the horizontal edge fill position of the block.
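The rule in [0111] works at the data level as follows. A block at the image edge mirrors its own edge pixels, while an interior block copies its neighbour's edge pixels. The sketch below pads one K-pixel row. The function name is hypothetical, mirroring is taken to repeat the edge pixel, and K is assumed to be at least P.

```python
from typing import List, Optional


def pad_row_horizontally(row: List[int], left: Optional[List[int]],
                         right: Optional[List[int]], P: int) -> List[int]:
    """Pad one K-pixel row of a block to K + 2P pixels.

    left/right are the same row of the horizontally adjacent blocks,
    or None when the block lies at that edge of the image data.
    """
    if P == 0:
        return list(row)
    left_pad = row[:P][::-1] if left is None else left[-P:]     # mirror own start / copy neighbour's end
    right_pad = row[-P:][::-1] if right is None else right[:P]  # mirror own end / copy neighbour's start
    return left_pad + list(row) + right_pad
```

Split the line 1 to 6 into two blocks with K = 3 and pad with P = 2. This gives [2, 1, 1, 2, 3, 4, 5] and [2, 3, 4, 5, 6, 6, 5], which are exactly windows of the whole line padded in the same way. That is why filling from neighbouring blocks lets each block be convolved independently without seams.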

[0112] It should be noted that this application does not limit the specific method of mirroring. It can be set according to actual needs, such as reading data in a mirrored manner when obtaining filling data, or writing the data to the edge filling position in a mirrored manner after reading the data.

[0113] Continuing with the previous example, take the horizontal edge filling of the first row of image data (row number (N-1) / 2) as an example. During the idle clock period when the first row of data starts being written into the storage unit, the BRAM of block 1 reads out the starting region data in the corresponding horizontal address range. Block 1 is located at the left edge of the image data, and this data is used for its left-side filling. Mirroring is achieved by reading the starting region data address by address and writing each value, in sequence, into the left edge filling position of the first row of block 1 that is symmetric to it about the left edge.

[0114] Furthermore, regarding the vertical edge filling at the top and bottom of the blocks, in order to locate the storage position of the vertical filling data in the storage unit, ensure orderly reading of the filling data, prevent address offset, and improve reading efficiency, in one or more embodiments of this application, the vertical filling data can be directly calculated and obtained based on a preset address rule, as follows:

[0115] Based on the effective convolution size and the edge padding size, the vertical address range of the starting region data (top edge data) or ending region data (bottom edge data) of each column of data in the storage unit is determined. Then, the starting region data or ending region data of each column of data is read from the second storage unit according to the vertical address range and used as vertical padding data.
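The vertical address ranges in [0115] are whole stored lines of M addresses, because vertical filling runs after horizontal filling and so also covers the corners. The sketch assumes each block's M x M frame starts at address `base`, with the top padding lines first. The function names are hypothetical, and mirroring again repeats the edge line. top_mirror_order reads the starting-region lines from the higher address down and writes the top padding lines from line 0 up.

```python
from typing import Dict, List, Tuple


def vertical_regions(K: int, P: int, base: int = 0) -> Dict[str, List[range]]:
    """Stored lines (each M = K + 2P addresses wide) holding top/bottom edge data.

    Storage line 0 is assumed to be the first top-padding line of the block's
    M x M frame starting at address `base`; image lines are P ... P + K - 1.
    """
    M = K + 2 * P

    def line(y: int) -> range:
        return range(base + y * M, base + (y + 1) * M)

    return {
        "top_pad": [line(t) for t in range(P)],
        "start_region": [line(P + t) for t in range(P)],   # first P image lines
        "end_region": [line(K + t) for t in range(P)],     # last P image lines
        "bottom_pad": [line(P + K + t) for t in range(P)],
    }


def top_mirror_order(K: int, P: int, base: int = 0) -> List[Tuple[range, range]]:
    """(read line, write line) pairs for mirror filling at the top edge of the image."""
    reg = vertical_regions(K, P, base)
    return list(zip(reversed(reg["start_region"]), reg["top_pad"]))
```

For a row of blocks that is not at the top edge, the top_pad lines are instead filled in order from the end_region lines of the previous row of blocks.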

[0116] Similarly, in order to ensure the symmetry and continuity of the filling data for image edge blocks and avoid abrupt feature changes at image boundaries, in one or more embodiments of this application, mirror filling can be performed based on the vertical filling data obtained from the block's own storage unit, and copy filling can be performed based on the vertical filling data obtained from the adjacent block's storage unit, as follows:

[0117] If the row block is located at the edge of the image data, the vertical fill data obtained from the storage unit of that row block is mirrored. The mirrored vertical fill data, along with the vertical fill data obtained from the storage unit of the adjacent row block, is written to the vertical edge fill position of that row block. If the row block is not located at the edge of the image data, the vertical fill data is written to the vertical edge fill position of that row block.
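Vertical filling per [0117] follows the same pattern on whole rows. In this sketch the rows are assumed to be already horizontally filled, so each has length M. An edge row of blocks takes its own first or last P rows in reverse order, which corresponds to reading from the higher row address down. An interior row of blocks copies the adjacent row of blocks. The function name is hypothetical.

```python
from typing import List, Optional

Rows = List[List[int]]


def pad_rows_vertically(rows: Rows, above: Optional[Rows],
                        below: Optional[Rows], P: int) -> Rows:
    """Add P rows at the top and bottom of a row of blocks (rows already horizontally filled).

    above/below are the rows of the vertically adjacent row of blocks, or None
    at the top/bottom edge of the image data.
    """
    if P == 0:
        return [list(r) for r in rows]
    top = rows[:P][::-1] if above is None else above[-P:]       # mirror own top / copy last rows above
    bottom = rows[-P:][::-1] if below is None else below[:P]    # mirror own bottom / copy first rows below
    return [list(r) for r in top + rows + bottom]
```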

[0118] It should be noted that this application does not limit the specific method of mirroring. It can be set according to actual needs, such as obtaining the filling data in a mirrored manner to achieve mirroring, or directly obtaining the data and then writing it in a mirrored manner to achieve mirroring.

[0119] Continuing with the previous example, taking the vertical edge padding between the first row of blocks (block 1 to block i) and the second row of blocks (block i+1 to block 2i) as an example, during the idle clock period from when the first row of blocks is written into the memory cell until the second row of blocks begins to be written, the BRAM of the first row of blocks and the BRAM of the second row of blocks need to perform the following operations:

[0120] The BRAMs of block 1 to block i read out the ending region data in the corresponding vertical address range, i.e., the last (N-1) / 2 data in each column of the blocks. This data is used to fill the top of blocks i+1 to 2i; that is, the ending region data is written into the corresponding edge fill positions of the storage units of blocks i+1 to 2i.

[0121] After the top filling of blocks i+1 to 2i is completed, blocks 1 to i are read out in full. Blocks 1 to i lie at the top edge of the image data, so their top filling uses their own first (N-1) / 2 rows of data. These rows are either read from the storage units of blocks 1 to i in mirrored order, or read directly and then mirrored.

Taking mirror reading as an example, the BRAMs of blocks 1 to i read out the starting region data in the corresponding vertical address range to fill the top of blocks 1 to i. Vertical mirror filling copies data symmetrically about the top edge, so the data is read in reverse, from the highest-address row to the lowest-address row:

- first, the row of image data (after horizontal edge filling) at the highest vertical address range of the starting region is read;
- then the row below it is read, and so on down to the first row.

The rows read in this mirrored order form the vertical fill data for the top. The image data after horizontal edge filling (M*K) is then read.

During this period, the BRAMs of blocks i+1 to 2i read out the starting region data in the corresponding vertical address range. This is the vertical fill data for the bottom filling of block 1 to block i.

[0122] Similarly, vertical edge filling of the other rows of blocks that are not located at the edge of the image data proceeds in two steps:

- Top filling of each block in the current row is performed during the idle clock period before the current row of blocks starts being written into the storage units. Its vertical filling data is the last (N-1) / 2 rows of the horizontally filled image data of the previous row of blocks.
- Bottom filling of each block in the previous row is performed during the idle clock period between writing the current row of blocks and starting to write the next row of blocks. Its vertical filling data is the first (N-1) / 2 rows of the horizontally filled image data of the current row of blocks.

Following the example above, while the first row of blocks is being read out in full, the second row of blocks is received simultaneously and processed in the same way as the first. After the horizontal edge filling of the second row of blocks is completed, the data in the starting area of each column of the second row of blocks is read during its idle clock. This completes the bottom filling of the first row of blocks. The vertical filling data is then read sequentially from the corresponding addresses, while top filling is performed on the third row of blocks at the corresponding write addresses. This process continues until the last row of blocks in the image data.

[0123] Blocks (j-1)*i+1 to j*i are located at the bottom edge of the image data. After their horizontal edge filling and top filling are completed, the entire blocks can be read out directly. Afterwards, the last (N-1) / 2 rows of data are either read in mirrored order, or read directly and then mirrored. Taking mirror reading as an example, the BRAMs of blocks (j-1)*i+1 to j*i read out one row of M data in the corresponding vertical address range at a time. They proceed row by row in mirrored order until the last of these rows has been read.

[0124] Based on an image convolution iterative method, this application also provides specific embodiments of an image convolution iterative apparatus.

[0125] As shown in Figure 4, an image convolution iteration device 400 provided in an embodiment of this application includes a parameter calibration module 401, an initial address module 402, a block segmentation module 403, a storage module 404, an edge filling module 405, a convolution operation module 406, an iteration determination module 407, and an output module 408.

[0126] The parameter calibration module 401 is used to determine the size of the convolution image and the edge padding size of the input for a single convolution operation based on the preset convolution kernel size and the effective convolution size;

[0127] The initial address module 402 is used to determine the first storage address after reserving the edge filling position, based on the convolutional image size and the edge filling size, as the initial storage address;

[0128] The block segmentation module 403 is used to acquire image data according to the effective convolution size to obtain multiple blocks, where the image data is the target image or an intermediate convolution result of the target image;

[0129] Storage module 404 is used to write the image data of each block into the storage unit starting from the initial storage address, wherein the storage unit and the block have a one-to-one relationship in a single storage cycle;

[0130] The edge filling module 405 is used to fill the edge at the edge filling position according to the edge filling size;

[0131] Convolution operation module 406 is used to perform convolution operation on the blocks after edge filling is completed to obtain the current convolution result;

[0132] The iteration determination module 407 is used to determine whether the current convolution result meets the preset termination condition; if not, it returns to the block segmentation module to perform data block segmentation, data storage, edge padding and convolution operation on the current convolution result; if yes, it enters the output module.

[0133] The output module 408 is used to take the current convolution result as the output convolution result.

[0134] In some embodiments, the storage module is specifically used to: determine the starting storage address of each row of data in the block according to the initial storage address and the convolutional image size; and write the image data of each block into the storage unit starting from the starting storage address.

[0135] In some embodiments, the edge filling module is specifically configured to:

- for each block, during the idle clock cycles between writing one row of data of the block into the storage unit and starting to write the next row of the block, obtain the horizontal filling data of the block according to the edge filling size, and perform horizontal edge filling at the reserved horizontal edge filling positions according to the horizontal filling data;
- for each row of blocks, during the idle clock cycles between writing that row of blocks into the storage units and starting to write the next row of blocks, obtain the vertical filling data of the row of blocks according to the edge filling size and the convolution image size, and perform vertical edge filling at the reserved vertical edge filling positions according to the vertical filling data.

[0136] In some embodiments, the edge filling module is specifically configured to: when the block is at the edge of the image data, use the storage unit corresponding to the block and the storage unit corresponding to the adjacent block in the horizontal direction as the first storage unit; when the block is not at the edge of the image data, use the storage unit corresponding to the adjacent block in the horizontal direction as the first storage unit; and obtain the start area data or end area data of each row of data from the first storage unit according to the edge filling size, as the horizontal filling data.
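The source selection in [0136] can be expressed as a small lookup. This sketch assumes that a block's left padding comes from the end region of its left neighbour's rows and its right padding from the start region of its right neighbour's rows, and that at the image edge the block's own start or end region is used instead (to be mirrored, as in [0138]). The column indexing and function names are illustrative.

```python
from typing import Dict, List, Tuple


def horizontal_fill_sources(j: int, n_cols: int) -> Dict[str, Tuple[int, str]]:
    """For the block in column j (0-based) of a row of n_cols blocks, return
    which block's storage unit feeds each side's padding and which region of
    each data row is read ("start" = first p values, "end" = last p)."""
    if not 0 <= j < n_cols:
        raise ValueError("block column out of range")
    # Left side: end region of the left neighbour, or the block's own
    # start region at the left image edge (mirrored before writing).
    left = (j - 1, "end") if j > 0 else (j, "start")
    # Right side: start region of the right neighbour, or the block's own
    # end region at the right image edge (mirrored before writing).
    right = (j + 1, "start") if j < n_cols - 1 else (j, "end")
    return {"left": left, "right": right}


def first_storage_units(j: int, n_cols: int) -> List[int]:
    """The set of storage units acting as the 'first storage unit'."""
    src = horizontal_fill_sources(j, n_cols)
    return sorted({src["left"][0], src["right"][0]})
```

For a row of three blocks, the left-edge block draws on its own unit and unit 1, the middle block on units 0 and 2, and the right-edge block on units 1 and 2, matching the edge and non-edge cases above.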

[0137] In some embodiments, the edge filling module described above can also be used to: determine, according to the initial storage address, the effective convolution size and the edge filling size, the horizontal address range in the storage unit of the start region data or end region data of each row of data; and read the start region data or end region data of each row of data from the first storage unit according to the horizontal address range, as the horizontal filling data.
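Assuming each storage unit holds a c × c tile in row-major order with c = e + 2p (an assumption, not stated in the text), the horizontal address ranges of [0137] follow from the initial storage address, the effective convolution size e and the edge filling size p; the row stride c is also needed. The function names and dictionary keys are hypothetical.

```python
from typing import Dict, List


def horizontal_ranges(init_addr: int, c: int, e: int, p: int,
                      row: int) -> Dict[str, range]:
    """Address ranges for data row `row` of a block (row-major c x c tile)."""
    start = init_addr + row * c                      # starting address of the row
    return {
        "left_pad": range(start - p, start),         # reserved filling slots
        "start": range(start, start + p),            # start region data
        "end": range(start + e - p, start + e),      # end region data
        "right_pad": range(start + e, start + e + p),
    }


def read_region(mem: List[int], addresses: range) -> List[int]:
    """Read the words at the given addresses from one storage unit."""
    return [mem[a] for a in addresses]
```

For c = 6, e = 4, p = 1 and an initial storage address of 7, data row 0 has its start region at address 7, its end region at address 10, and its reserved filling slots at addresses 6 and 11.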

[0138] In some embodiments, the edge filling module described above can also be used to: mirror the horizontal filling data obtained from the storage unit of the block when the block is at the edge of the image data; write the mirrored horizontal filling data and the horizontal filling data obtained from the storage unit of the adjacent block to the horizontal edge filling position of the block; and write the horizontal filling data to the horizontal edge filling position of the block when the block is not at the edge of the image data.
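The mirroring in [0138] can be sketched per data row as below. The text does not say which mirroring convention is used; this sketch assumes symmetric mirroring, in which the border pixel itself is repeated (the convention NumPy calls "symmetric"), and operates on plain Python lists instead of storage units. `fill_row` and its parameters are hypothetical.

```python
from typing import List, Optional


def fill_row(own: List[int], left_nb: Optional[List[int]],
             right_nb: Optional[List[int]], p: int) -> List[int]:
    """Return one data row of a block with p filled values on each side.
    left_nb / right_nb are the same row of the horizontally adjacent blocks,
    or None when the block lies at that image edge."""
    n = len(own)
    if left_nb is not None:
        left = left_nb[len(left_nb) - p:]    # end region of left neighbour
    else:
        left = own[:p][::-1]                 # mirror own start region
    if right_nb is not None:
        right = right_nb[:p]                 # start region of right neighbour
    else:
        right = own[n - p:][::-1]            # mirror own end region
    return left + own + right
```

For example, with p = 2 the row `[1, 2, 3, 4, 5, 6, 7, 8]` split into two blocks yields `[2, 1, 1, 2, 3, 4, 5, 6]` for the left block and `[3, 4, 5, 6, 7, 8, 8, 7]` for the right block, which are the two overlapping windows of the whole row after symmetric padding.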

[0139] In some embodiments, the edge filling module described above can also be used to: when a row of blocks is at the edge of the image data, use the storage units corresponding to that row of blocks and the storage units corresponding to the vertically adjacent row of blocks as the second storage unit; when the row of blocks is not at the edge of the image data, use the storage units corresponding to the vertically adjacent rows of blocks as the second storage unit; and obtain, from the second storage unit, the start region data or end region data of each column of data according to the edge filling size and the convolution image size, as the vertical filling data.
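For [0139], the "second storage unit" is a set of storage units spanning whole rows of blocks. The sketch below assumes one storage unit per block, numbered row-major as `block_row * n_cols + j`; this numbering and the function names are purely illustrative, since the text does not specify how storage units are assigned to blocks.

```python
from typing import List


def unit_ids(block_row: int, n_cols: int) -> List[int]:
    # Illustrative numbering: one storage unit per block, row-major.
    return [block_row * n_cols + j for j in range(n_cols)]


def second_storage_units(i: int, n_rows: int, n_cols: int) -> List[int]:
    """Storage units read for vertical filling of the row of blocks i."""
    if not 0 <= i < n_rows:
        raise ValueError("row of blocks out of range")
    rows = set()
    if i > 0:
        rows.add(i - 1)                      # row above supplies top filling
    if i < n_rows - 1:
        rows.add(i + 1)                      # row below supplies bottom filling
    if i == 0 or i == n_rows - 1:
        rows.add(i)                          # image edge: own rows, mirrored later
    return sorted(u for r in rows for u in unit_ids(r, n_cols))
```

For three rows of two blocks, the top row reads units [0, 1, 2, 3], the middle row reads [0, 1, 4, 5] and the bottom row reads [2, 3, 4, 5].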

[0140] In some embodiments, the edge filling module described above can also be used to: determine the vertical address range of the start region data or the end region data of each column of data in the storage unit according to the convolutional image size and the edge filling size; and read the start region data or the end region data of each column of data from the second storage unit according to the vertical address range as the vertical filling data.
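Assuming again a row-major c × c tile per storage unit, with the block's data in rows p to p + e - 1 (an assumption), the vertical address ranges of [0140] depend only on c and p, as the text states. Because each range spans full padded rows, it also carries the p horizontally filled slots at each end of those rows, which would fill the tile corners if horizontal filling runs first; this is one plausible reason the vertical ranges need no horizontal offsets, not a statement from the text. The function name and keys are hypothetical.

```python
from typing import Dict


def vertical_ranges(c: int, p: int) -> Dict[str, range]:
    """Row-band address ranges in a row-major c x c tile (assumed layout)."""
    e = c - 2 * p                            # data rows are p .. p + e - 1
    if e < p:
        raise ValueError("edge filling size too large for this tile")
    return {
        "top_pad": range(0, p * c),
        "start": range(p * c, 2 * p * c),        # first p data rows
        "end": range(e * c, (e + p) * c),        # last p data rows
        "bottom_pad": range((e + p) * c, c * c),
    }
```

For c = 6 and p = 1, the start region occupies addresses 6 to 11 and the end region addresses 24 to 29.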

[0141] In some embodiments, the edge filling module described above can also be used to: mirror the vertical filling data obtained from the storage unit of the block when the row of blocks is at the edge of the image data; write the mirrored vertical filling data and the vertical filling data obtained from the storage unit of the adjacent block to the vertical edge filling position of the row of blocks; and write the vertical filling data to the vertical edge filling position of the row of blocks when the row of blocks is not at the edge of the image data.
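Vertical filling per [0141] follows the horizontal case one level up, operating on whole rows. The sketch assumes the block's rows have already been horizontally filled, uses the same assumed symmetric mirroring as for rows, and represents each storage unit's contents as a list of rows; `fill_tile` is a hypothetical name.

```python
from typing import List, Optional

Rows = List[List[int]]


def fill_tile(own: Rows, above: Optional[Rows], below: Optional[Rows],
              p: int) -> Rows:
    """Add p rows of vertical filling to a block whose rows are already
    horizontally filled; `above` / `below` are None at the image edge."""
    n = len(own)
    if above is not None:
        top = above[len(above) - p:]         # end region of the block above
    else:
        top = own[:p][::-1]                  # mirror own start region
    if below is not None:
        bottom = below[:p]                   # start region of the block below
    else:
        bottom = own[n - p:][::-1]           # mirror own end region
    return [list(r) for r in top + own + bottom]
```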

[0142] Based on the image convolution iterative method described above, this application also provides a specific embodiment of an image convolution iterative device.

[0143] Figure 5 is a schematic diagram of the hardware structure of an image convolution iterative device provided in an embodiment of this application.

[0144] The image convolution iterative device may include a processor 501 and a memory 502 storing computer program instructions.

[0145] Specifically, the processor 501 may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits that can be configured to implement the embodiments of this application.

[0146] Memory 502 may include mass storage for data or instructions. By way of example and not limitation, memory 502 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, memory 502 may include removable or non-removable (fixed) media. Where appropriate, memory 502 may be internal or external to the image convolution iterative device. In a particular embodiment, memory 502 is non-volatile solid-state memory.

[0147] The processor 501 implements any of the image convolution iterative methods in the above embodiments by reading and executing computer program instructions stored in the memory 502.

[0148] In one example, the electronic device may also include a communication interface 503 and a bus 510. As shown in Figure 5, the processor 501, the memory 502 and the communication interface 503 are connected through the bus 510 and communicate with one another.

[0149] The communication interface 503 is mainly used to implement communication between the modules, apparatuses, units and/or devices in the embodiments of this application.

[0150] Bus 510 includes hardware, software, or both, and couples the components of the electronic device to one another. By way of example and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable buses, or a combination of two or more of these. Where appropriate, bus 510 may include one or more buses. Although specific buses are described and illustrated in the embodiments of this application, this application contemplates any suitable bus or interconnect.

[0151] Furthermore, in conjunction with the image convolution iteration method in the above embodiments, the embodiments of this application may provide a computer storage medium for its implementation. The computer storage medium stores computer program instructions which, when executed by a processor, implement any of the image convolution iteration methods in the above embodiments.

[0152] In addition, in conjunction with the image convolution iteration method in the above embodiments, the embodiments of this application may provide a computer program product for its implementation. When the instructions in the computer program product are executed by the processor of an electronic device, the electronic device performs the image convolution iteration method provided in any aspect of the above embodiments of this application.

[0153] It should be clarified that this application is not limited to the specific configurations and processes described above and shown in the figures. For the sake of brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method process of this application is not limited to the specific steps described and shown. Those skilled in the art can make various changes, modifications, and additions, or change the order of steps, after understanding the spirit of this application.

[0154] The functional blocks shown in the above-described structural diagram can be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they can be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, etc. When implemented in software, the elements of this application are programs or code segments used to perform the required tasks. Programs or code segments can be stored on a machine-readable medium or transmitted over a transmission medium or communication link via data signals carried on a carrier wave. "Machine-readable medium" can include any medium capable of storing or transmitting information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, etc. Code segments can be downloaded via computer networks such as the Internet, intranets, etc.

[0155] It should also be noted that the exemplary embodiments mentioned in this application describe methods or systems based on a series of steps or apparatus. However, this application is not limited to the order of the above steps; that is, the steps can be performed in the order mentioned in the embodiments, or in a different order, or several steps can be performed simultaneously.

[0156] The aspects of this disclosure have been described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It should be understood that each block in the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that these instructions, executed via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/actions specified in one or more blocks of the flowchart illustrations and/or block diagrams. Such a processor can be, but is not limited to, a general-purpose processor, a special-purpose processor, an application-specific processor, or a field-programmable logic circuit. It is also understood that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can also be implemented by special-purpose hardware performing the specified functions or actions, or by a combination of special-purpose hardware and computer instructions.

[0157] The above description is merely a specific implementation of this application. Those skilled in the art will clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the systems, modules and units described above, and they are not repeated here. It should be understood that the protection scope of this application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed in this application, and these modifications or substitutions shall all be covered within the protection scope of this application.

Claims

1. An image convolution iterative method, characterized in that the method is applied to an FPGA configured with multiple storage units, and includes: determining, based on a preset kernel size and an effective convolution size, the convolution image size input to a single convolution operation and the edge padding size; determining, based on the convolution image size and the edge padding size, the first storage address after the edge padding positions are reserved, and using it as the initial storage address; obtaining image data according to the effective convolution size to obtain multiple blocks, the image data being the target image or an intermediate convolution result of the target image; writing the image data of each block into a storage unit starting from the initial storage address, wherein storage units and blocks are in one-to-one correspondence within a single storage cycle, the number of storage units is not less than twice the number of blocks in a single parallel storage, and the number of blocks in a single parallel storage is the ratio of the convolution image size to the effective convolution size, rounded up; performing edge padding at the edge padding positions according to the edge padding size; performing a convolution operation on the edge-padded blocks to obtain the current convolution result; determining whether the current convolution result meets a preset termination condition; if not, returning to the step of obtaining image data according to the effective convolution size to obtain multiple blocks, and performing data blocking, data storage, edge padding and the convolution operation on the current convolution result; if so, using the current convolution result as the output convolution result.

2. The method as described in claim 1, characterized in that writing the image data of each block into the storage unit starting from the initial storage address includes: determining, based on the initial storage address and the convolution image size, the starting storage address of each row of data in the block; and writing the image data of each block into the storage unit starting from the starting storage addresses.

3. The method as described in claim 1, characterized in that performing edge fill at the edge fill positions according to the edge fill size includes: for each block, in the idle clock cycle after one row of data of the block has been written into the storage unit and before the next row of data of the block is written, obtaining the horizontal fill data of the block according to the edge fill size, and performing horizontal edge fill at the reserved edge fill positions in the horizontal direction according to the horizontal fill data; and for each row of blocks, in the idle clock cycle after that row of blocks has been written into the storage units and before the next row of blocks is written, obtaining the vertical fill data of the row of blocks according to the edge fill size and the convolution image size, and performing vertical edge fill at the reserved edge fill positions in the vertical direction according to the vertical fill data.

4. The method as described in claim 3, characterized in that obtaining the horizontal fill data of the block according to the edge fill size includes: when the block is located at the edge of the image data, using the storage unit corresponding to the block and the storage unit corresponding to the horizontally adjacent block as the first storage unit; when the block is not located at the edge of the image data, using the storage units corresponding to the horizontally adjacent blocks as the first storage unit; and obtaining, from the first storage unit according to the edge fill size, the start region data or end region data of each row of data as the horizontal fill data.

5. The method as described in claim 4, characterized in that, From the first storage unit, according to the edge fill size, the starting area data or ending area data of each row of data is obtained as the horizontal fill data, including: Based on the initial storage address, the effective convolution size, and the edge padding size, determine the horizontal address range of the start region data or the end region data of each row of data in the storage unit; From the first storage unit, according to the horizontal address range, the starting area data or the ending area data of each row of data is read as the horizontal fill data.

6. The method as described in claim 4, characterized in that, Based on the horizontal fill data, perform horizontal edge fill, including: When the block is located at the edge of the image data, the horizontal fill data obtained from the storage unit of the block is mirrored; the mirrored horizontal fill data and the horizontal fill data obtained from the storage unit of the adjacent block are written to the edge fill position of the block in the horizontal direction. If the block is not located at the edge of the image data, the horizontal fill data is written to the horizontal edge fill position of the block.

7. The method as described in claim 3, characterized in that obtaining the vertical fill data of the row of blocks according to the edge fill size and the convolution image size includes: when the row of blocks is located at the edge of the image data, using the storage units corresponding to that row of blocks and the storage units corresponding to the vertically adjacent row of blocks as the second storage unit; when the row of blocks is not located at the edge of the image data, using the storage units corresponding to the vertically adjacent rows of blocks as the second storage unit; and obtaining, from the second storage unit according to the edge fill size and the convolution image size, the start region data or end region data of each column of data as the vertical fill data.

8. The method as described in claim 7, characterized in that, From the second storage unit, according to the edge fill size and the convolutional image size, the starting region data or ending region data of each column of data is obtained as the vertical fill data, including: Based on the convolutional image size and the edge padding size, determine the vertical address range of the starting region data or the ending region data of each column of data in the storage unit; From the second storage unit, according to the vertical address range, read the starting area data or the ending area data of each column of data as the vertical fill data.

9. The method as described in claim 7, characterized in that, Based on the vertical fill data, perform vertical edge filling, including: When the block in the row is located at the edge of the image data, the vertical fill data obtained from the storage unit of the block is mirrored; the mirrored vertical fill data and the vertical fill data obtained from the storage unit of the adjacent block are written to the vertical edge fill position of the block in the row. If the block in that row is not at the edge of the image data, the vertical fill data is written to the vertical edge fill position of the block in that row.

10. A computer program product, characterized in that, when the instructions in the computer program product are executed by a processor of an electronic device, the instructions cause the electronic device to perform the image convolution iterative method as described in any one of claims 1-9.
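The storage-unit requirement in claim 1 can be checked with integer arithmetic, as below. The sketch reads the claim literally, taking the number of blocks in a single parallel storage as the rounded-up ratio of the convolution image size to the effective convolution size; the function names are hypothetical.

```python
def blocks_per_parallel_store(conv_image_size: int, effective_size: int) -> int:
    # Rounded-up ratio, computed with integer arithmetic (no float rounding).
    return -(-conv_image_size // effective_size)


def enough_storage_units(n_units: int, conv_image_size: int,
                         effective_size: int) -> bool:
    # Claim 1: storage units >= 2 x blocks in a single parallel storage.
    return n_units >= 2 * blocks_per_parallel_store(conv_image_size,
                                                    effective_size)
```

With a convolution image size of 10 and an effective convolution size of 8, two blocks are stored in parallel under this reading, so at least four storage units are needed.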