Data processing method and device, chip, electronic equipment and medium

By splitting the original convolution kernel into multiple second convolution kernels and performing dilated convolution calculations based on the dilation rate and the preset stride, the method reduces storage overhead and computational load, achieves a better computing energy-efficiency ratio, and lowers the difficulty of implementing hardware accelerator chips.

CN115019054B · Active · Publication Date: 2026-06-30 · VASTAI TECH (SHANGHAI) INC

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
VASTAI TECH (SHANGHAI) INC
Filing Date
2022-06-29
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

When existing dilated convolution techniques are deployed on hardware accelerator chips, they suffer from high storage overhead, increased computational load, and reduced energy efficiency, which increases the difficulty of implementation.

Method used

The original convolutional kernel is split into multiple second convolutional kernels, and dilated convolution is performed on the input feature map based on the dilation rate and preset stride to reduce storage overhead and computational bandwidth and avoid invalid computation.

Benefits of technology

This reduces the implementation difficulty of hardware accelerator chips and improves both the energy efficiency and the overall efficiency of the computation.

✦ Generated by Eureka AI based on patent content.

Figure CN115019054B_ABST

Abstract

This disclosure provides a data processing method, apparatus, chip, electronic device, and medium, relating to the field of computer technology, specifically the field of data processing technology. The implementation scheme includes: acquiring an input feature map; acquiring a first convolution kernel, the first convolution kernel comprising multiple elements; splitting the first convolution kernel into multiple second convolution kernels, each of the multiple second convolution kernels corresponding one-to-one with the multiple elements; storing the multiple second convolution kernels in a memory; acquiring dilated convolution information of the first convolution kernel for dilated convolution calculation, the dilated convolution information including dilation rate and preset stride; and performing dilated convolution calculation on the input feature map based on the dilation rate, preset stride, and the multiple second convolution kernels.

Description

Technical Field

[0001] This disclosure relates to the field of computer technology, specifically to the field of data processing technology, and more particularly to a data processing method, apparatus, chip, electronic device, computer-readable storage medium, and computer program product.

Background

[0002] Dilated convolution, also known as atrous convolution, is a convolution technique that inserts holes (zeros) into a standard convolution kernel to enlarge its receptive field. Dilated convolution is widely used in deep learning; in image processing, for example, it can enlarge the receptive field without losing resolution, enabling the detection or segmentation of larger objects in an image.

[0003] The methods described in this section are not necessarily methods that have been previously conceived or adopted. Unless otherwise indicated, no method described in this section should be assumed to be prior art merely because it is included in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be assumed to have been recognized in any prior art.

Summary of the Invention

[0004] This disclosure provides a data processing method, apparatus, chip, electronic device, computer-readable storage medium, and computer program product.

[0005] According to one aspect of this disclosure, a data processing method is provided. The method includes: acquiring an input feature map; acquiring a first convolutional kernel, the first convolutional kernel comprising multiple elements; splitting the first convolutional kernel into multiple second convolutional kernels, the multiple second convolutional kernels corresponding one-to-one with the multiple elements; storing the multiple second convolutional kernels in a memory; acquiring dilated convolutional information of the first convolutional kernel for dilated convolution calculation, the dilated convolutional information including dilation rate and preset stride; and performing dilated convolution calculation on the input feature map based on the dilation rate, preset stride, and the multiple second convolutional kernels.

[0006] According to another aspect of this disclosure, a data processing apparatus is provided. The apparatus includes: an input feature map acquisition unit configured to acquire an input feature map; a first convolution kernel acquisition unit configured to acquire a first convolution kernel, the first convolution kernel including a plurality of elements; a splitting unit configured to split the first convolution kernel into a plurality of second convolution kernels, the plurality of second convolution kernels corresponding one-to-one with the plurality of elements; a storage unit configured to store the plurality of second convolution kernels in a memory; a dilated convolution parameter acquisition unit configured to acquire dilated convolution information of the first convolution kernel for dilated convolution calculation, the dilated convolution information including a dilation rate and a preset stride; and a calculation unit configured to perform dilated convolution calculation on the input feature map based on the dilation rate, the preset stride, and the plurality of second convolution kernels.

[0007] According to another aspect of this disclosure, a chip is provided, comprising: at least one processor; and a memory having a computer program stored thereon, wherein the computer program, when executed by the processor, causes the processor to perform the methods described above.

[0008] According to another aspect of this disclosure, an electronic device is provided, including the chip described above.

[0009] According to another aspect of this disclosure, a computer-readable storage medium is provided that stores a computer program thereon, which, when executed by a processor, causes the processor to perform the methods described above.

[0010] According to another aspect of this disclosure, a computer program product is provided, including a computer program that, when executed by a processor, causes the processor to perform the methods described above.

[0011] According to one or more embodiments of this disclosure, the original convolution kernel is split into multiple second convolution kernels that correspond one-to-one with the elements of the original convolution kernel, and dilated convolution is performed on the input feature map based on the dilation rate of the original convolution kernel, the preset stride, and the multiple second convolution kernels. This reduces storage overhead and bandwidth usage and reduces or avoids invalid computation, thereby improving the energy efficiency of the computation and reducing the implementation difficulty of the chip.

[0012] These and other aspects of this disclosure will be apparent from, and elucidated with reference to, the embodiments described below.

Brief Description of the Drawings

[0013] The accompanying drawings exemplify embodiments and form part of the specification, serving together with the textual description to explain exemplary implementations of the embodiments. The illustrated embodiments are for illustrative purposes only and do not limit the scope of the claims. Throughout the drawings, the same reference numerals refer to similar but not necessarily identical elements.

[0014] Figure 1 illustrates the dilation of a convolution kernel in the related art;

[0015] Figure 2 is a flowchart illustrating a data processing method according to an exemplary embodiment of the present disclosure;

[0016] Figure 3 is a flowchart illustrating part of a data processing method according to an exemplary embodiment of the present disclosure;

[0017] Figure 4 is a scene diagram illustrating an implementation of a data processing method according to an exemplary embodiment of the present disclosure;

[0018] Figure 5 is a flowchart illustrating part of a data processing method according to an exemplary embodiment of the present disclosure;

[0019] Figures 6A to 6E are scene diagrams illustrating an implementation of a data processing method according to an exemplary embodiment of the present disclosure;

[0020] Figure 7 is a structural block diagram illustrating a data processing apparatus according to an exemplary embodiment of the present disclosure; and

[0021] Figure 8 is a block diagram illustrating an example electronic device according to an exemplary embodiment of the present disclosure.

Detailed Description

[0022] In this disclosure, unless otherwise stated, the use of terms such as "first," "second," etc., to describe various elements is not intended to limit the positional, temporal, or importance relationships of these elements; such terms are merely used to distinguish one element from another. In some examples, the first element and the second element may refer to the same instance of that element, while in other cases, based on the context, they may refer to different instances.

[0023] As mentioned above, dilated convolution adds holes to a standard convolution, thereby enlarging the receptive field without shrinking the feature map. The process of adding holes to a standard convolution kernel is described in detail below with reference to Figure 1, which illustrates the dilation of a convolution kernel in the related art.

[0024] As shown in Figure 1, the original convolution kernel 110 has a size of 3*3 and includes 9 elements, represented by the numbers "1" to "9". The original convolution kernel 110 can be used to perform convolution operations on input feature maps, which can be obtained, for example, by feature extraction from image data, text data, or audio data.

[0025] In the related art, when dilated convolution is to be performed on the input feature map based on the original convolution kernel 110, holes are first added to the original convolution kernel 110. For example, as shown to the right of the arrow in Figure 1, the original convolution kernel 110 can be dilated at a dilation rate of 4 to obtain the dilated convolution kernel 120. The dilated convolution kernel 120 has a size of 9*9, and the nine elements (the numbers "1" to "9") are located at their respective positions within it. All remaining positions of the dilated convolution kernel 120 hold zero. The dilated convolution kernel 120 can then be used to perform a standard convolution operation on the input feature map to obtain the corresponding output feature map.
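The related-art dilation step of Figure 1 can be sketched in plain Python. This is a minimal illustration, assuming a single-channel kernel stored as a list of lists whose elements "1" to "9" are numbered row by row; the function name `dilate_kernel` is hypothetical and does not come from the disclosure.

```python
def dilate_kernel(kernel, rate):
    """Related-art approach: insert (rate - 1) zeros between neighbouring elements."""
    kh, kw = len(kernel), len(kernel[0])
    dkh = (rate - 1) * (kh - 1) + kh  # dilated height
    dkw = (rate - 1) * (kw - 1) + kw  # dilated width
    dilated = [[0] * dkw for _ in range(dkh)]
    for m in range(kh):
        for n in range(kw):
            # Original element (m, n) lands at (m * rate, n * rate); all else stays zero.
            dilated[m * rate][n * rate] = kernel[m][n]
    return dilated


kernel_110 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel_120 = dilate_kernel(kernel_110, rate=4)
print(len(kernel_120), len(kernel_120[0]))             # 9 9
print(sum(v == 0 for row in kernel_120 for v in row))  # 72
```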

[0026] As the example of Figure 1 shows, the kernel size grows from 3*3 to 9*9, and the dilated kernel 120 contains many zero-valued elements. While this enlarges the receptive field, it makes it harder to deploy the convolution on processors or hardware accelerator chips. The dilated kernel increases storage overhead and bandwidth usage and requires more storage space in memory (e.g., Double Data Rate (DDR) memory or Synchronous Static Random Access Memory (SSRAM)), and the internal cache needed for computation on the hardware accelerator chip grows as well. The computational load of a convolution with the dilated kernel also grows accordingly: because the parameters of the dilated kernel are sparse (mostly zeros), much of that computation is unnecessary. The greater demand for storage space and computation therefore lowers energy efficiency and sharply increases the difficulty of implementing the chip.
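The overhead can be made concrete by simple counting. The sketch below assumes one stored weight per kernel position and one multiply-accumulate (MAC) per weight per output element, and it ignores channels, data types, and any zero-skipping logic a given chip might have; `kernel_costs` is a hypothetical helper.

```python
def kernel_costs(kh, kw, rate):
    """Return (useful weights, weights after dilation, multiplications by zero)
    for one output element of a single-channel convolution."""
    dkh = (rate - 1) * (kh - 1) + kh
    dkw = (rate - 1) * (kw - 1) + kw
    useful = kh * kw     # weights the computation actually needs
    dilated = dkh * dkw  # weights stored and multiplied after inserting zeros
    return useful, dilated, dilated - useful


useful, dilated, wasted = kernel_costs(3, 3, rate=4)
print(useful, dilated, wasted)                             # 9 81 72
print(f"{wasted / dilated:.0%} of MACs multiply by zero")  # 89% of MACs multiply by zero
```

For the 3*3 kernel of Figure 1 at dilation rate 4, 72 of the 81 stored weights are zeros under these assumptions, so most of the related-art multiplications contribute nothing to the result.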

[0027] On this basis, this disclosure proposes a data processing method that splits the original convolution kernel into multiple second convolution kernels corresponding one-to-one with the elements of the original convolution kernel. By performing dilated convolution on the input feature map based on the dilation rate and preset stride of the original convolution kernel and on the multiple second convolution kernels, storage overhead and computational bandwidth consumption can be reduced and invalid computation can be reduced or avoided, thereby improving the energy efficiency of the computation and reducing the implementation difficulty of the chip.

[0028] Exemplary embodiments of this disclosure will now be described in detail with reference to the accompanying drawings.

[0029] Referring first to Figure 2, which is a flowchart illustrating a data processing method 200 according to an exemplary embodiment of the present disclosure, method 200 includes steps S210 to S260:

[0030] Step S210: Obtain the input feature map;

[0031] Step S220: Obtain the first convolution kernel, which includes multiple elements;

[0032] Step S230: Split the first convolution kernel into multiple second convolution kernels, with each of the multiple second convolution kernels corresponding to a single element;

[0033] Step S240: Store multiple second convolution kernels in memory;

[0034] Step S250: Obtain the dilated convolution information of the first convolution kernel for dilated convolution calculation. The dilated convolution information includes the dilation rate and the preset stride; and

[0035] Step S260: Perform dilated convolution calculation on the input feature map based on the dilation rate, preset stride, and multiple second convolution kernels.
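Steps S230 and S240 can be sketched as below. The disclosure states only that the second kernels correspond one-to-one with the elements of the first kernel; representing each one as a 1x1 kernel holding that element, keeping the element's (row, col) position alongside it, and modelling "memory" as a Python list are assumptions of this sketch, and `split_kernel` is a hypothetical name.

```python
from typing import List, Tuple

Kernel = List[List[int]]


def split_kernel(kernel: Kernel) -> List[Tuple[int, int, Kernel]]:
    """Step S230 (sketch): one 1x1 second kernel per element of the first kernel.

    The (row, col) position is kept next to each second kernel because the
    later steps need it to locate that kernel's effective region.
    """
    second_kernels = []
    for m, row in enumerate(kernel):
        for n, value in enumerate(row):
            second_kernels.append((m, n, [[value]]))
    return second_kernels


# Step S240 (sketch): only kh * kw values are stored, whatever the dilation rate.
memory = split_kernel([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(len(memory))  # 9
print(memory[0])    # (0, 0, [[1]])
print(memory[5])    # (1, 2, [[6]])
```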

[0036] The first convolutional kernel can be the original convolutional kernel (e.g., the 3*3 convolutional kernel mentioned above), and based on the dilated convolutional information of the first convolutional kernel used for dilated convolutional calculation, dilated convolutional calculation can be performed on the input feature map (the input feature map can be, for example, obtained by feature extraction from image data, text data, or audio data) based on the first convolutional kernel.

[0037] The multiple second convolution kernels obtained by splitting correspond one-to-one with the elements of the first convolution kernel; that is, each element of the first convolution kernel corresponds to one second convolution kernel. Therefore, by storing the second convolution kernels in memory and performing dilated convolution on the input feature map based on the dilation rate of the first convolution kernel, the preset stride, and the second convolution kernels, storage overhead and computational bandwidth consumption can be reduced and invalid computation can be reduced or avoided, thereby improving the energy efficiency ratio of the computation and reducing the implementation difficulty of the chip. Method 200 does not need to dilate the first convolution kernel to implement dilated convolution; that is, it does not add zero-valued elements to the first convolution kernel. This eliminates the need to store a large convolution kernel and reduces the memory space occupied (e.g., in DDR or SSRAM). Furthermore, the result of dilated convolution on the input feature map based on the dilation rate, the preset stride, and the second convolution kernels is the same as the result of dilated convolution using the first convolution kernel, and the receptive field is the same size. Using method 200 therefore reduces storage overhead and improves the energy efficiency ratio while producing the same results.

[0038] According to some embodiments, method 200 can be used with an accelerator chip. Therefore, when performing convolution operations using a hardware accelerator chip, the computational load on the hardware accelerator chip can be reduced, thus lowering the energy consumption ratio. The accelerator chip can be a hardware accelerator chip, such as a tensor processor.

[0039] In some examples, method 200 can also be applied to a CPU or a GPU.

[0040] According to some embodiments, the input feature map can be obtained by feature extraction from any of the following: image data, text data, and audio data.

[0041] Figure 3 is a flowchart illustrating part of the data processing method 200 according to an exemplary embodiment of the present disclosure.

[0042] As shown in Figure 3, according to some embodiments, the dilated convolution calculation in step S260 above may include:

[0043] Step S361: For each of the multiple second convolution kernels, perform convolution calculation on the input feature map using the second convolution kernel to obtain an intermediate feature map;

[0044] Step S362: For each of the multiple intermediate feature maps corresponding to the multiple second convolutional kernels, determine the effective region of each intermediate feature map based on the dilation rate and the preset stride; and

[0045] Step S363: Based on the effective regions of each of the multiple intermediate feature maps, obtain the result of dilated convolution calculation.
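Steps S361 to S363 can be checked end to end with a small single-channel sketch. It assumes no padding, integer data, and that each second kernel is the 1x1 kernel holding one element of the first kernel; reading the effective region with a step equal to the preset stride is our interpretation of determining it "based on the dilation rate and the preset stride". All function names are hypothetical.

```python
def conv2d_valid(x, w, stride=1):
    """Standard (undilated) 2-D cross-correlation, no padding."""
    ih, iw, kh, kw = len(x), len(x[0]), len(w), len(w[0])
    oh, ow = (ih - kh) // stride + 1, (iw - kw) // stride + 1
    return [[sum(w[a][b] * x[r * stride + a][c * stride + b]
                 for a in range(kh) for b in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def dilated_conv_reference(x, k, rate, stride=1):
    """Related-art path: zero-insert the kernel, then convolve normally."""
    kh, kw = len(k), len(k[0])
    dk = [[0] * ((rate - 1) * (kw - 1) + kw) for _ in range((rate - 1) * (kh - 1) + kh)]
    for m in range(kh):
        for n in range(kw):
            dk[m * rate][n * rate] = k[m][n]
    return conv2d_valid(x, dk, stride)


def dilated_conv_split(x, k, rate, stride=1):
    """Split-kernel path following steps S361 to S363 (no padding)."""
    ih, iw, kh, kw = len(x), len(x[0]), len(k), len(k[0])
    oh = (ih - ((rate - 1) * (kh - 1) + kh)) // stride + 1
    ow = (iw - ((rate - 1) * (kw - 1) + kw)) // stride + 1
    out = [[0] * ow for _ in range(oh)]
    for m in range(kh):
        for n in range(kw):
            # S361: the 1x1 second kernel for element (m, n), swept with stride 1.
            inter = conv2d_valid(x, [[k[m][n]]], stride=1)
            # S362 + S363: read the effective region, which starts at
            # (rate * m, rate * n) and advances by the stride, and accumulate it.
            for r in range(oh):
                for c in range(ow):
                    out[r][c] += inter[m * rate + r * stride][n * rate + c * stride]
    return out


x = [[(3 * r + 5 * c) % 7 for c in range(12)] for r in range(12)]
k = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
y = dilated_conv_split(x, k, rate=4)
print(len(y), len(y[0]))                          # 4 4
print(y == dilated_conv_reference(x, k, rate=4))  # True
```

With the 12*12 input and 3*3 kernel at dilation rate 4 used in Figure 4, each effective region is a 4*4 block and the summed result matches the zero-inserted reference. The sketch computes full intermediate maps for clarity and does not model how a chip would schedule or store them.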

[0046] For each second convolution kernel, the effective region of the resulting intermediate feature map is the region corresponding to the area of the input feature map that is scanned by the matching element of the first convolution kernel (according to the preset dilated convolution parameters) when dilated convolution is performed on the input feature map. For example, refer to Figure 4, which is a scene diagram illustrating an implementation of the data processing method 200 according to an exemplary embodiment of the present disclosure. The first convolution kernel 410 may be similar to the first convolution kernel 110 described above. As shown in the upper part of Figure 4, when dilated convolution is to be performed on the input feature map 430 using the first convolution kernel, the first convolution kernel 410 can be dilated into a dilated convolution kernel 420, which is then used to perform convolution on the input feature map 430. In this example, the input feature map 430 has a size of 12*12. As the dilated convolution kernel 420 scans the input feature map 430, the area scanned by the element "1" of the dilated convolution kernel 420 is shown in gray (the 4*4 region 431 in the upper left corner of the input feature map 430).

[0047] When dilated convolution is performed on the input feature map using method 200, the first convolution kernel 410 is first split into multiple second convolution kernels (nine in this example). For simplicity, Figure 4 shows only the first of these, the second convolution kernel 440; the other eight are not shown. Standard convolution is then performed on the input feature map using the second convolution kernel 440 to obtain an intermediate feature map 450. The gray area of the intermediate feature map 450 is the effective region 451, whose position in the intermediate feature map 450 corresponds to the position, in the input feature map 430, of the region 431 scanned by the element "1" of the dilated convolution kernel 420.
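The location of every effective region in the Figure 4 setting can be computed directly. This sketch assumes stride 1, no padding, and that elements "1" to "9" are numbered row by row; `effective_region` is a hypothetical helper that returns inclusive corner coordinates.

```python
def effective_region(m, n, rate, out_h, out_w):
    """Inclusive (top, left) and (bottom, right) corners of the effective region
    for the second kernel built from element (m, n); stride 1, no padding."""
    top, left = rate * m, rate * n
    return (top, left), (top + out_h - 1, left + out_w - 1)


in_size, k_size, rate = 12, 3, 4              # Figure 4: 12*12 input, 3*3 kernel
dilated = (rate - 1) * (k_size - 1) + k_size  # 9, the size of kernel 420
out_size = in_size - dilated + 1              # 4
for m in range(k_size):
    for n in range(k_size):
        label = k_size * m + n + 1            # elements "1" to "9", row by row
        print(label, effective_region(m, n, rate, out_size, out_size))
# First line printed: 1 ((0, 0), (3, 3)) -> the 4*4 upper-left block, as for region 451
```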

[0048] Finally, the final result of the dilated convolution is obtained from the effective regions of the nine intermediate feature maps produced by standard convolution of the input feature map with the nine second convolution kernels.

[0049] Because the result of the dilated convolution is obtained only from the effective regions of the intermediate feature maps, and the regions outside the effective regions are not considered, invalid computation is further reduced, improving computational efficiency while reducing storage usage.

[0050] According to some embodiments, in step S361, performing convolution calculation on the input feature map using the second convolution kernel may include: performing convolution calculation on the input feature map by using the second convolution kernel to traverse each element in the input feature map with a stride of 1.
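Because every second kernel holds a single element, sweeping it over the input with a stride of 1 visits each input element exactly once and simply scales it, so the intermediate feature map has the same height and width as the input feature map. This is why the effective-region coordinates can be expressed in input coordinates. A minimal sketch, assuming single-channel list-of-lists data; `conv2d_1x1` is a hypothetical name.

```python
def conv2d_1x1(x, w, stride=1):
    """Standard convolution with the 1x1 kernel [[w]]: each output is w times one input element."""
    return [[w * x[r][c] for c in range(0, len(x[0]), stride)]
            for r in range(0, len(x), stride)]


x = [[12 * r + c for c in range(12)] for r in range(12)]
inter = conv2d_1x1(x, 5)           # stride 1, as in step S361
print(len(inter), len(inter[0]))   # 12 12
print(inter[2][3] == 5 * x[2][3])  # True
```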

[0051] According to some embodiments, in step S363, obtaining the result of the dilated convolution calculation based on the effective regions of each of the multiple intermediate feature maps may include:

[0052] The corresponding elements in the effective regions of multiple intermediate feature maps are summed to obtain the output feature map; and the result of dilated convolution is obtained based on the output feature map.

[0053] For example, in each of the nine intermediate feature maps obtained, the effective region is a 4*4 region. The corresponding elements of the nine 4*4 regions are then summed (for example, the first elements of all nine effective regions are added together, as are the second elements, and so on), yielding a 4*4 output feature map.

[0054] In some examples, the preset stride of the first convolutional kernel for dilated convolution computation can be 1, and the output feature map can be used as the result of the dilated convolution computation.

[0055] According to some embodiments, step S362 above, which involves determining the effective region of each intermediate feature map based on the dilation rate and a preset stride for each of the multiple intermediate feature maps corresponding to the multiple second convolutional kernels, may include:

[0056] Determining the coordinates of multiple corner points of the effective region based on the dilation rate and the preset stride; and

[0057] Extract the effective region enclosed by the coordinates of multiple corner points from the intermediate feature map.

[0058] For example, continuing with Figure 4, the coordinates of multiple corner points of the effective region 451 (e.g., two opposite corner points, or all four corner points) can be determined based on the dilation rate and the preset stride. Because a feature map is usually rectangular (e.g., square), determining the corner coordinates determines the effective region simply and quickly, reducing the computation needed to extract it.

[0059] According to some embodiments, determining the coordinates of multiple corner points of the effective region based on the dilation rate and the preset stride may include:

[0060] Determining the horizontal and vertical starting corner coordinates of the effective region based on the dilation rate and the preset stride; and

[0061] Determining the horizontal and vertical ending corner coordinates of the effective region based on the dilation rate, the preset stride, and the size of the input feature map.

[0062] For example, the horizontal starting corner coordinate of the effective region can be determined by the following equation (1):

[0063] $$slice_{start} = dW \cdot i \qquad (1)$$

[0064] The horizontal ending corner coordinate of the effective region can be determined using the following equations (2) and (3):

[0065]

[0066] where $slice_{start}$ denotes the horizontal starting corner coordinate of the effective region;

[0067] $slice_{end}$ denotes the horizontal ending corner coordinate of the effective region;

[0068] dW denotes the dilation rate of the dilated convolution in the width direction;

[0069] i denotes the index of the second convolution kernel among the multiple second convolution kernels (i is an integer greater than or equal to zero; for the first second convolution kernel, i = 0);

[0070] iW denotes the width of the input feature map;

[0071] dKW denotes the width of the first convolution kernel after dilation when dilated convolution is performed using the first convolution kernel; and

[0072] sW represents the stride of the dilated convolution in the width direction.

[0073] Therefore, the coordinates of the horizontal starting corner and the horizontal ending corner of the effective area can be determined.

[0074] The vertical starting and ending corner coordinates of the effective region can be determined in the same way.
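The sketch below computes the horizontal slice of an effective region. The start follows Equation (1); the end is derived independently from the standard no-padding output-width relation oW = floor((iW - dKW) / sW) + 1 and is an assumption, not a reproduction of Equations (2) and (3). We also take i to be the column index of the element within the first convolution kernel (0 to kW - 1), which is our reading of the definition of i; the vertical direction is identical with heights in place of widths. All names are hypothetical.

```python
def dilated_size(k, rate):
    """Extent of a size-k kernel after dilation (the Equation (7) form)."""
    return (rate - 1) * (k - 1) + k


def effective_slice(i, rate, stride, in_size, k):
    """(start, end, step) of the effective region along one axis, end exclusive.

    start = rate * i follows Equation (1); the end is our own derivation.
    """
    start = rate * i
    out_size = (in_size - dilated_size(k, rate)) // stride + 1
    end = start + (out_size - 1) * stride + 1  # one past the last position read
    return start, end, stride


print(effective_slice(0, rate=4, stride=1, in_size=12, k=3))  # (0, 4, 1)
print(effective_slice(2, rate=4, stride=3, in_size=12, k=3))  # (8, 12, 3)

# Cross-check: the slice lists exactly the columns the dilated kernel reads.
for stride in (1, 2, 3):
    out = (12 - dilated_size(3, 4)) // stride + 1
    for i in range(3):
        start, end, step = effective_slice(i, 4, stride, 12, 3)
        assert list(range(start, end, step)) == [4 * i + o * stride for o in range(out)]
```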

[0075] According to some embodiments, the dilated convolution information may also include padding information, and the effective region of each intermediate feature map may be determined based on the dilation rate, preset stride and padding information for each of the multiple intermediate feature maps corresponding to the multiple second convolution kernels.

[0076] In some scenarios, the dilated convolution information of the first convolution kernel used for dilated convolution calculation may include padding information. Padding refers to filling specific regions of the input feature map with preset element values during the convolution process. When the dilated convolution information includes padding information, the effective region of the resulting intermediate feature map differs from that obtained when the dilated convolution information does not include padding information. Therefore, determining the effective region of each intermediate feature map based on the dilation rate, the preset stride, and the padding information makes it more accurate.

[0077] For example, the coordinates of the starting horizontal corner of the effective area can be determined by the following equation (3):

[0078] $$slice_{start} = dW \cdot i - padL \qquad (3)$$

[0079] Where padL represents the number of padding elements to the left of the dilated convolution.

[0080] The horizontal ending corner coordinate of the effective region can be determined using the following equations (4) and (5):

[0081]

[0082]

[0083] Where padR represents the number of padding elements to the right of the dilated convolution.

[0084] Therefore, when the dilated convolution information includes padding information, the horizontal starting and ending corner coordinates of the effective region can be determined.

[0085] Similarly, when the dilated convolution information includes padding information, the longitudinal starting and ending corner coordinates of the effective region can be determined.

[0086] The final effective area can be determined based on the coordinates of the horizontal starting corner, the horizontal ending corner, the vertical starting corner, and the vertical ending corner of the effective area.

[0087] When the dilated convolution information of the first convolution kernel includes padding information, the effective regions obtained for the individual second convolution kernels in the above steps may differ in size. To make it easier to obtain the final output feature map from the effective regions, a padding operation can additionally be performed in step S363, as explained below with reference to Figure 5.

[0088] Figure 5 is a flowchart illustrating part of the data processing method 200 according to an exemplary embodiment of the present disclosure. As shown in Figure 5, according to some embodiments, obtaining the result of the dilated convolution calculation based on the effective regions of the multiple intermediate feature maps in step S363 may include:

[0089] Step S510: Based on the padding information, the dilation rate, and the preset stride, determine, among the multiple effective regions, the effective regions to be padded and the effective regions not to be padded, as well as the region to be padded around each effective region to be padded;

[0090] Step S520: Fill the region to be padded around each effective region to be padded with preset element values; and

[0091] Step S530: Obtain the result of the dilated convolution calculation based on the unpadded effective regions and the padded effective regions, which have the same size.

[0092] Filling the region around each effective region to be padded with preset element values (e.g., zeros) makes the padded effective regions the same size as the unpadded ones, which makes it easy to sum their corresponding elements and thereby obtain the result of the dilated convolution.

[0093] In one example, the nine extracted effective regions are not all the same size: six are 4x4, one is 3x4, and two are 3x3. Based on the padding information, the dilation rate, and the preset stride, the effective regions to be padded (the 3x4 region and the two 3x3 regions), the effective regions not to be padded (the six 4x4 regions), and the regions surrounding the effective regions to be padded can be determined. Zeros can be filled into the regions surrounding the 3x4 and 3x3 regions, bringing all three to 4x4, after which the nine 4x4 regions can be summed element-wise to obtain the result of the dilated convolution.
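The padding branch (steps S510 to S530) can be sketched the same way. The sketch assumes zero padding given as (top, bottom, left, right) element counts, single-channel integer data, and 1x1 second kernels; `axis_plan` counts the zeros needed before and after each clipped effective region by direct search rather than by the disclosure's closed-form equations. All names are hypothetical.

```python
def axis_plan(i, rate, stride, pad_lo, in_size, out_size):
    """Along one axis: first in-bounds index, zeros to add before, zeros to add after."""
    start = rate * i - pad_lo  # Equation (3) form; may be negative
    before = 0
    while before < out_size and start + before * stride < 0:
        before += 1
    after = 0
    while (after < out_size - before
           and start + (out_size - 1 - after) * stride >= in_size):
        after += 1
    return start + before * stride, before, after


def dilated_conv_split_padded(x, k, rate, stride, pad):
    """Split-kernel dilated convolution with zero padding pad = (top, bottom, left, right)."""
    top, bottom, left, right = pad
    ih, iw, kh, kw = len(x), len(x[0]), len(k), len(k[0])
    oh = (ih + top + bottom - ((rate - 1) * (kh - 1) + kh)) // stride + 1
    ow = (iw + left + right - ((rate - 1) * (kw - 1) + kw)) // stride + 1
    out = [[0] * ow for _ in range(oh)]
    for m in range(kh):
        for n in range(kw):
            inter = [[k[m][n] * v for v in row] for row in x]  # S361 with a 1x1 kernel
            r0, pr0, pr1 = axis_plan(m, rate, stride, top, ih, oh)
            c0, pc0, pc1 = axis_plan(n, rate, stride, left, iw, ow)
            rows, cols = oh - pr0 - pr1, ow - pc0 - pc1
            # In-bounds part of the effective region (may be smaller than oh x ow).
            region = [[inter[r0 + a * stride][c0 + b * stride] for b in range(cols)]
                      for a in range(rows)]
            # S510/S520: zero-fill around it so every region is oh x ow.
            filled = ([[0] * ow for _ in range(pr0)]
                      + [[0] * pc0 + row + [0] * pc1 for row in region]
                      + [[0] * ow for _ in range(pr1)])
            # S530: element-wise sum of the equally sized regions.
            for r in range(oh):
                for c in range(ow):
                    out[r][c] += filled[r][c]
    return out


def dilated_conv_reference(x, k, rate, stride, pad):
    """Direct definition of dilated convolution on the zero-padded input."""
    top, bottom, left, right = pad
    ih, iw, kh, kw = len(x), len(x[0]), len(k), len(k[0])
    oh = (ih + top + bottom - ((rate - 1) * (kh - 1) + kh)) // stride + 1
    ow = (iw + left + right - ((rate - 1) * (kw - 1) + kw)) // stride + 1

    def at(r, c):
        return x[r][c] if 0 <= r < ih and 0 <= c < iw else 0

    return [[sum(k[m][n] * at(r * stride + m * rate - top, c * stride + n * rate - left)
                 for m in range(kh) for n in range(kw))
             for c in range(ow)]
            for r in range(oh)]


x = [[(2 * r + 3 * c) % 5 - 2 for c in range(10)] for r in range(9)]
k = [[1, -2, 3], [0, 4, -1], [2, 1, 5]]
pad = (2, 2, 2, 2)
print(dilated_conv_split_padded(x, k, 2, 1, pad) == dilated_conv_reference(x, k, 2, 1, pad))  # True
```

With padding, regions near the border come out smaller than oh x ow, mirroring the 3x4 and 3x3 regions in the example above, and are zero-filled back to full size before the element-wise sum. The reference function pads the input directly and serves only as a cross-check.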

[0094] According to some embodiments, determining, in step S510, the region to be padded around an effective region to be padded based on the padding information, the dilation rate, and the preset stride may include:

[0095] Determining the horizontal and vertical starting corner coordinates of the region to be padded based on the padding information, the dilation rate, and the preset stride; and

[0096] Determining the horizontal and vertical ending corner coordinates of the region to be padded based on the padding information, the dilation rate, the preset stride, and the size of the input feature map.

[0097] For example, the horizontal starting corner coordinate of the region to be padded can be determined by the following equation (6):

[0098]

[0099] where $pad_{before}$ denotes the horizontal starting corner coordinate of the region to be padded.

[0100] The horizontal ending corner coordinate of the region to be padded can be determined using the following equations (7), (8), and (9):

[0101] $$dKW = (dW - 1)(kW - 1) + kW \qquad (7)$$

[0102]

[0103]

[0104] where dKW denotes the width of the first convolution kernel after dilation, i.e., the width of the dilated convolution kernel used when the first convolution kernel is applied in the dilated convolution;

[0105] kW denotes the width of the original convolution kernel (i.e., the first convolution kernel), and dW denotes the dilation rate in the width direction; and

[0106] pad_after denotes the horizontal ending corner coordinate of the region to be padded.
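Equation (7) can be checked numerically. The Python sketch below evaluates it for the Figure 6 example (kW = 3, dW = 4). The helper names are hypothetical, and the output-size function uses the standard convolution formula floor((W + pad_l + pad_r - dKW) / stride) + 1, which is assumed here rather than taken from this disclosure.

```python
def dilated_kernel_size(k: int, d: int) -> int:
    """Equation (7): extent of a k-wide kernel after dilation by rate d."""
    return (d - 1) * (k - 1) + k


def output_size(size: int, k: int, d: int, stride: int = 1,
                pad_before: int = 0, pad_after: int = 0) -> int:
    """Standard convolution output length (an assumption, not taken from the patent)."""
    return (size + pad_before + pad_after - dilated_kernel_size(k, d)) // stride + 1


# Figure 6 parameters: 12x12 input, 3x3 kernel, dilation 4, stride 1, no padding
print(dilated_kernel_size(3, 4))   # 9
print(output_size(12, 3, 4))       # 4
print(dilated_kernel_size(3, 1))   # 3 (dilation rate 1 leaves the kernel unchanged)
```

Equation (7) is algebraically the same as dKW = dW*(kW - 1) + 1, the usual expression for the span of a dilated kernel.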

[0107] In this way, the horizontal starting and ending corner coordinates of the region to be padded can be determined.

[0108] The vertical starting and ending corner coordinates of the region to be padded can be determined in the same way.

[0109] The region to be padded is then defined by its horizontal starting, horizontal ending, vertical starting, and vertical ending corner coordinates.
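The sketch below does not reproduce equations (6), (8), and (9). It is an independent derivation of the same kind of per-axis bookkeeping from standard dilated convolution geometry, assuming NumPy, NCHW input tensors, [O, C, kH, kW] kernels, symmetric zero padding, and a bias-free convolution (so a padded position contributes zero after the 1x1 convolution). For kernel element index i, output position t reads input coordinate t*s + i*d - p_before; positions below 0 or beyond the input size make up the region to be padded. The names axis_plan and dilated_conv_split_padded are hypothetical, and the result is checked against a direct dilated convolution.

```python
import numpy as np


def axis_plan(i, d, s, p_before, size, out_size):
    """Hypothetical per-axis bookkeeping for kernel element index i.

    Returns (sl, n_before, n_after): the slice of the intermediate map that lies
    inside the input, and how many zero positions must be filled before and
    after it so that the effective region reaches out_size.
    """
    u = np.arange(out_size) * s + i * d - p_before   # input coordinate read at each output position
    kept = u[(u >= 0) & (u < size)]
    n_before = int(np.count_nonzero(u < 0))          # positions in the leading padding
    n_after = int(np.count_nonzero(u >= size))       # positions in the trailing padding
    sl = slice(int(kept[0]), int(kept[-1]) + 1, s) if kept.size else slice(0, 0)
    return sl, n_before, n_after


def out_len(size, k, d, s, p):
    """Output length with symmetric padding p; the dilated extent follows Equation (7)."""
    return (size + 2 * p - ((d - 1) * (k - 1) + k)) // s + 1


def dilated_conv_split_padded(x, w, dilation, stride, padding):
    """Split-kernel dilated convolution with zero-filled effective regions."""
    n, _, h, wd = x.shape
    o, _, kh, kw = w.shape
    (dh, dw), (sh, sw), (ph, pw) = dilation, stride, padding
    oh, ow = out_len(h, kh, dh, sh, ph), out_len(wd, kw, dw, sw, pw)
    out = np.zeros((n, o, oh, ow))
    for i in range(kh):
        rows, rb, ra = axis_plan(i, dh, sh, ph, h, oh)
        for j in range(kw):
            cols, cb, ca = axis_plan(j, dw, sw, pw, wd, ow)
            inter = np.einsum("nchw,oc->nohw", x, w[:, :, i, j])        # 1x1 convolution, stride 1
            region = inter[:, :, rows, cols]                            # part inside the input
            out += np.pad(region, ((0, 0), (0, 0), (rb, ra), (cb, ca)))  # zero fill to (oh, ow)
    return out


def dilated_conv_naive(x, w, dilation, stride, padding):
    """Direct dilated convolution on the zero-padded input, used as a reference."""
    n, _, h, wd = x.shape
    o, _, kh, kw = w.shape
    (dh, dw), (sh, sw), (ph, pw) = dilation, stride, padding
    xp = np.pad(x, ((0, 0), (0, 0), (ph, ph), (pw, pw)))
    oh, ow = out_len(h, kh, dh, sh, ph), out_len(wd, kw, dw, sw, pw)
    out = np.zeros((n, o, oh, ow))
    for oy in range(oh):
        for ox in range(ow):
            for i in range(kh):
                for j in range(kw):
                    out[:, :, oy, ox] += xp[:, :, oy * sh + i * dh, ox * sw + j * dw] @ w[:, :, i, j].T
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 2, 10, 11))
w = rng.standard_normal((3, 2, 3, 3))
args = ((2, 3), (2, 1), (3, 2))  # dilation, stride, padding (arbitrary test values)
print(axis_plan(0, 2, 2, 3, 10, 6))  # (slice(1, 8, 2), 2, 0)
print(np.allclose(dilated_conv_split_padded(x, w, *args), dilated_conv_naive(x, w, *args)))  # True
```

In this sketch the slice bounds play the role of the effective region's corner coordinates, and n_before / n_after give the extent of the region to be padded on each side.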

[0110] According to some embodiments, the dilated convolution information may further include padding information, and step S362 above (determining, for each of the multiple intermediate feature maps corresponding to the multiple second convolution kernels, the effective region of the intermediate feature map based on the dilation rate and the preset stride) may include:

[0111] for each intermediate feature map, performing a padding operation on the intermediate feature map based on the padding information to obtain a padded intermediate feature map; and

[0112] determining the effective region of each padded intermediate feature map based on the dilation rate and the preset stride. The result of the dilated convolution calculation can then be obtained based on the effective regions of the multiple padded intermediate feature maps.

[0113] For example, based on the padding information, the outer regions of each intermediate feature map that require padding can be filled with preset elements (e.g., elements with a value of zero). The effective region of each padded intermediate feature map is then determined based on the dilation rate and the preset stride, so that the extracted effective regions already contain the padded portions and can be used directly to obtain the result of the dilated convolution calculation. It should be understood that the effective regions extracted in this way all have the same size.
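The pad-first variant of [0111] to [0113] can be sketched in NumPy as follows, assuming NCHW input tensors, [O, C, kH, kW] kernels, symmetric zero padding, and no bias. Each intermediate feature map is padded, and the effective region for kernel element (i, j) then starts at (i*dH, j*dW) in the padded map and is sampled with the preset stride, so every region already has the output size and no per-region fill is needed. The function names are hypothetical, and the result is compared against a direct dilated convolution.

```python
import numpy as np


def out_len(size, k, d, s, p):
    """Output length with symmetric padding p; the dilated extent follows Equation (7)."""
    return (size + 2 * p - ((d - 1) * (k - 1) + k)) // s + 1


def dilated_conv_pad_intermediate(x, w, dilation, stride, padding):
    """Pad each intermediate map first, then extract equally sized effective regions."""
    n, _, h, wd = x.shape
    o, _, kh, kw = w.shape
    (dh, dw), (sh, sw), (ph, pw) = dilation, stride, padding
    oh, ow = out_len(h, kh, dh, sh, ph), out_len(wd, kw, dw, sw, pw)
    out = np.zeros((n, o, oh, ow))
    for i in range(kh):
        for j in range(kw):
            inter = np.einsum("nchw,oc->nohw", x, w[:, :, i, j])          # 1x1 conv for element (i, j)
            inter = np.pad(inter, ((0, 0), (0, 0), (ph, ph), (pw, pw)))  # padding operation
            # Effective region: starts at (i*dh, j*dw), strided by the preset stride
            region = inter[:, :, i * dh:i * dh + sh * (oh - 1) + 1:sh,
                           j * dw:j * dw + sw * (ow - 1) + 1:sw]
            out += region                                               # every region is (oh, ow)
    return out


def dilated_conv_naive(x, w, dilation, stride, padding):
    """Direct dilated convolution on the zero-padded input, used as a reference."""
    n, _, h, wd = x.shape
    o, _, kh, kw = w.shape
    (dh, dw), (sh, sw), (ph, pw) = dilation, stride, padding
    xp = np.pad(x, ((0, 0), (0, 0), (ph, ph), (pw, pw)))
    oh, ow = out_len(h, kh, dh, sh, ph), out_len(wd, kw, dw, sw, pw)
    out = np.zeros((n, o, oh, ow))
    for oy in range(oh):
        for ox in range(ow):
            for i in range(kh):
                for j in range(kw):
                    out[:, :, oy, ox] += xp[:, :, oy * sh + i * dh, ox * sw + j * dw] @ w[:, :, i, j].T
    return out


rng = np.random.default_rng(1)
x = rng.standard_normal((1, 2, 10, 11))
w = rng.standard_normal((3, 2, 3, 3))
args = ((2, 3), (2, 1), (3, 2))  # dilation, stride, padding (arbitrary test values)
y = dilated_conv_pad_intermediate(x, w, *args)
print(y.shape)                                            # (1, 3, 6, 9)
print(np.allclose(y, dilated_conv_naive(x, w, *args)))    # True
```

Compared with zero-filling each effective region separately, this variant trades some extra memory for the padded intermediate maps against simpler, uniform region extraction.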

[0114] The data processing method 200 according to an embodiment of the present disclosure is further described below with reference to Figures 6A to 6E, which are scene diagrams illustrating the implementation of the data processing method 200 according to an exemplary embodiment of the present disclosure.

[0115] First, referring to Figure 6A, the input feature map 610 and the first convolution kernel 620 are obtained. The parameters of the input feature map 610 (also called the input tensor) are [1,4,12,12], that is, the input feature map 610 has a length and width of 12, 4 layers, and 1 channel. The first convolution kernel 620 is the original convolution kernel, with parameters [4,4,3,3], that is, the first convolution kernel 620 has a length and width of 3, 4 layers, and 4 channels. In this example, the preset dilated convolution parameters are a dilation rate of [4,4], a stride of [1,1], and a padding size of [0,0].

[0116] Referring next to Figure 6B, the first convolution kernel 620 is split into multiple second convolution kernels 621, each corresponding to one element of the first convolution kernel 620. Since the first convolution kernel 620 has a width and height of 3, it includes 9 elements, and it is therefore split into 9 second convolution kernels 621, each of which includes the corresponding element of the first convolution kernel 620. Because the split is applied across the leading dimension of size 4 of the parameters [4,4,3,3], 4 × 9 = 36 second convolution kernels are obtained, and the kernel parameters change from [4,4,3,3] to [36,4,1,1].
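The split in [0116] amounts to a reshape. The NumPy sketch below assumes that [4,4,3,3] is ordered as (output kernels, input channels, height, width) and arranges the 36 second kernels element-major; the disclosure does not specify this ordering, and split_kernel is a hypothetical name. The last line compares the number of stored weights with a kernel that materializes the dilation by inserting zeros between taps (9x9 for dilation 4), which illustrates the kind of storage overhead the split avoids.

```python
import numpy as np


def split_kernel(w):
    """Split an [O, C, kH, kW] kernel into kH*kW*O single-element 1x1 kernels.

    Ordering assumption: index (i * kW + j) * O + o holds element (i, j) of output kernel o.
    """
    o, c, kh, kw = w.shape
    # [O, C, kH, kW] -> [kH, kW, O, C] -> [kH*kW*O, C, 1, 1]
    return w.transpose(2, 3, 0, 1).reshape(kh * kw * o, c, 1, 1)


w = np.arange(4 * 4 * 3 * 3, dtype=np.float32).reshape(4, 4, 3, 3)  # first convolution kernel 620
parts = split_kernel(w)                                             # second convolution kernels 621
print(parts.shape)  # (36, 4, 1, 1)

i, j, o = 1, 2, 3
print(np.array_equal(parts[(i * 3 + j) * 4 + o, :, 0, 0], w[o, :, i, j]))  # True

# Stored weights vs. a zero-inserted dilated kernel (dilation 4 turns 3x3 into 9x9)
print(parts.size, 4 * 4 * 9 * 9)  # 144 1296
```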

[0117] Next, multiple second convolution kernels 621 can be stored in memory for later retrieval of the second convolution kernels for dilated convolution calculations.

[0118] Referring to Figure 6C, each second convolution kernel 621 is used to perform a standard convolution on the input feature map 610, yielding multiple intermediate feature maps 630 with parameters [1,36,12,12].

[0119] Referring to Figure 6D, for each of the multiple intermediate feature maps 630 corresponding to the multiple second convolution kernels 621, the effective region of the intermediate feature map can be determined based on the dilation rate, the preset stride, and the padding size using the method described in the above embodiments. In Figure 6D, the effective region of each intermediate feature map is the area marked with an "X". For example, multiple corner coordinates of each effective region can be determined based on the dilation rate, the preset stride, and the padding size, and each effective region 640 enclosed by these corner coordinates can be extracted from the intermediate feature map. In the example shown in Figure 6D, each effective region 640 is 4x4, which matches the output size 12 - 9 + 1 = 4 obtained from the dilated kernel width (4 - 1) × (3 - 1) + 3 = 9 given by Equation (7).

[0120] Referring to Figure 6E, the corresponding elements of the effective regions 640 of the multiple intermediate feature maps are summed to obtain the output feature map 650, which gives the result of the dilated convolution calculation. In the example shown in Figure 6E, each group of 9 effective regions 640 is summed element-wise to obtain the corresponding output feature map 650.
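The walk-through of Figures 6A to 6E can be checked end to end with a short NumPy sketch. It assumes NCHW input, [O, C, kH, kW] kernels, no bias, stride 1 and no padding (as in this example), and an element-major ordering of the 36 second kernels; all helper names are hypothetical. The split-and-sum result is compared against a direct dilated convolution.

```python
import numpy as np


def split_kernel(w):
    """[O, C, kH, kW] -> [kH*kW*O, C, 1, 1]; index (i*kW + j)*O + o holds w[o, :, i, j]."""
    o, c, kh, kw = w.shape
    return w.transpose(2, 3, 0, 1).reshape(kh * kw * o, c, 1, 1)


def conv1x1(x, k):
    """Standard 1x1 convolution with stride 1: [N, C, H, W] x [M, C, 1, 1] -> [N, M, H, W]."""
    return np.einsum("nchw,mc->nmhw", x, k[:, :, 0, 0])


def dilated_conv_by_split(x, w, dilation):
    """Split-kernel dilated convolution for stride 1 and no padding."""
    n, _, h, wd = x.shape
    o, _, kh, kw = w.shape
    dh, dw = dilation
    oh = h - ((dh - 1) * (kh - 1) + kh) + 1   # Equation (7), height direction
    ow = wd - ((dw - 1) * (kw - 1) + kw) + 1  # Equation (7), width direction
    inter = conv1x1(x, split_kernel(w))       # intermediate feature maps 630
    out = np.zeros((n, o, oh, ow))
    for i in range(kh):
        for j in range(kw):
            first = (i * kw + j) * o          # the O maps produced by element (i, j)
            group = inter[:, first:first + o]
            # Effective region 640: rows i*dh .. i*dh+oh-1, columns j*dw .. j*dw+ow-1
            out += group[:, :, i * dh:i * dh + oh, j * dw:j * dw + ow]
    return out


def dilated_conv_naive(x, w, dilation):
    """Direct dilated convolution (stride 1, no padding), used as a reference."""
    n, _, h, wd = x.shape
    o, _, kh, kw = w.shape
    dh, dw = dilation
    oh = h - ((dh - 1) * (kh - 1) + kh) + 1
    ow = wd - ((dw - 1) * (kw - 1) + kw) + 1
    out = np.zeros((n, o, oh, ow))
    for oy in range(oh):
        for ox in range(ow):
            for i in range(kh):
                for j in range(kw):
                    out[:, :, oy, ox] += x[:, :, oy + i * dh, ox + j * dw] @ w[:, :, i, j].T
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4, 12, 12))   # input feature map 610
w = rng.standard_normal((4, 4, 3, 3))     # first convolution kernel 620
y = dilated_conv_by_split(x, w, (4, 4))   # output feature map 650
print(conv1x1(x, split_kernel(w)).shape)  # (1, 36, 12, 12)
print(y.shape)                            # (1, 4, 4, 4)
print(np.allclose(y, dilated_conv_naive(x, w, (4, 4))))  # True
```

Only 1x1 convolutions with stride 1 and contiguous slicing appear in the split path, which is one way to read why the method suits accelerators that handle standard convolutions well.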

[0121] According to another aspect of this disclosure, a data processing apparatus is also provided. Figure 7 is a structural block diagram illustrating a data processing apparatus 700 according to an exemplary embodiment of the present disclosure.

[0122] As shown in Figure 7, the apparatus 700 includes: an input feature map acquisition unit 710 configured to acquire an input feature map; a first convolution kernel acquisition unit 720 configured to acquire a first convolution kernel, the first convolution kernel including multiple elements; a splitting unit 730 configured to split the first convolution kernel into multiple second convolution kernels in one-to-one correspondence with the multiple elements; a storage unit 740 configured to store the multiple second convolution kernels in a memory; a dilated convolution parameter acquisition unit 750 configured to acquire dilated convolution information of the first convolution kernel for dilated convolution calculation, the dilated convolution information including a dilation rate and a preset stride; and a calculation unit 760 configured to perform the dilated convolution calculation on the input feature map based on the dilation rate, the preset stride, and the multiple second convolution kernels.

[0123] It should be understood that the operations and technical effects of units 710 to 760 of the apparatus 700 are similar to those of steps S210 to S260 in Figure 2, respectively, and are not described in detail here.

[0124] According to another aspect of this disclosure, a chip is provided, comprising: at least one processor; and a memory having a computer program stored thereon, wherein the computer program, when executed by the processor, causes the processor to perform the data processing method described above.

[0125] According to another aspect of this disclosure, an electronic device is provided, including the chip described above.

[0126] According to another aspect of this disclosure, a computer-readable storage medium is provided that stores a computer program thereon, which, when executed by a processor, causes the processor to perform the data processing method described above.

[0127] According to another aspect of this disclosure, a computer program product is provided, including a computer program that, when executed by a processor, causes the processor to perform the data processing method described above.

[0128] Figure 8 is a block diagram illustrating an example of an electronic device according to exemplary embodiments of the present disclosure. It should be noted that the structure shown in Figure 8 is merely an example; depending on the specific implementation, the electronic device of the present disclosure may include only one or more of the components shown in Figure 8.

[0129] Electronic device 800 may be, for example, a general-purpose computer (such as a laptop computer, tablet computer, and various other computers), a mobile phone, or a personal digital assistant. According to some embodiments, electronic device 800 may be a cloud computing device or a smart device.

[0130] According to some embodiments, electronic device 800 may be configured to process at least one of images, text, and audio, and transmit the processing results to an output device for provision to a user. The output device may be, for example, a display screen, a device including a display screen, or a sound output device such as headphones, a speaker, or a vibrator. For example, electronic device 800 may be configured to perform object detection on an image and transmit the object detection results to a display device for display; electronic device 800 may also be configured to perform image enhancement processing and transmit the enhancement results to a display device for display. Electronic device 800 may also be configured to recognize text in an image and transmit the recognition results to a display device for display and / or convert the recognition results into sound data and transmit them to a sound output device for playback. Electronic device 800 may also be configured to recognize and process audio, transmit the recognition results to a display device for display and / or convert the processing results into sound data and transmit them to a sound output device for playback.

[0131] Electronic device 800 may include image processing circuit 803, which may be configured to perform various image processing operations on the image. For example, image processing circuit 803 may be configured to perform at least one of the following image processing operations on the image: noise reduction, geometric correction, feature extraction, object detection and / or recognition, image enhancement, and text detection and / or recognition, etc.

[0132] The electronic device 800 may further include a text recognition circuit 804, which is configured to perform text detection and / or recognition (e.g., OCR processing) on text regions in an image to obtain text data. The text recognition circuit 804 may be implemented, for example, using a dedicated chip. The electronic device 800 may also include a voice conversion circuit 805, which is configured to convert the text data into voice data. The voice conversion circuit 805 may also be implemented, for example, using a dedicated chip.

[0133] The electronic device 800 may further include an audio processing circuit 806 configured to convert audio into text, thereby obtaining text data corresponding to the audio. The audio processing circuit 806 may also be configured to process the text data corresponding to the audio, for example, by keyword extraction, intent recognition, intelligent recommendation, and intelligent question answering. The audio processing circuit 806 may be implemented, for example, using a dedicated chip. The voice conversion circuit 805 may also be configured to convert the audio processing result into sound data for application scenarios such as voice assistants or virtual customer service.

[0134] The various circuits described above (e.g., one or more of the image processing circuit 803, text recognition circuit 804, voice conversion circuit 805, and audio processing circuit 806) can be implemented using custom hardware, and / or using hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. For example, one or more of the various circuits described above can be implemented by programming hardware (e.g., programmable logic circuits including field-programmable gate arrays (FPGAs) and / or programmable logic arrays (PLAs)) using the logic and algorithms according to this disclosure in assembly language or hardware programming languages (such as Verilog, VHDL, C++).

[0135] According to some embodiments, the electronic device 800 may also include an output device 807, which may be any type of device for presenting information, including but not limited to a display screen, a terminal with display function, headphones, a speaker, a vibrator and / or a printer.

[0136] According to some embodiments, the electronic device 800 may also include an input device 808, which may be any type of device for inputting information into the electronic device 800, including but not limited to various sensors, mice, keyboards, touch screens, buttons, joysticks, microphones and / or remote controls, etc.

[0137] According to some embodiments, the electronic device 800 may also include a communication device 809, which may be any type of device or system that enables communication with external devices and / or with a network, including but not limited to modems, network cards, infrared communication devices, wireless communication devices and / or chipsets, such as Bluetooth devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices and / or the like.

[0138] According to some embodiments, the electronic device 800 may also include a processor 801. The processor 801 can be any type of processor and may include, but is not limited to, one or more general-purpose processors and / or one or more dedicated processors (e.g., special-purpose processing chips). The processor 801 may be, for example, but not limited to, a central processing unit (CPU), a graphics processing unit (GPU), or various dedicated artificial intelligence (AI) computing chips, etc.

[0139] Electronic device 800 may also include working memory 802 and storage device 811. Processor 801 may be configured to retrieve and execute computer-readable instructions stored in working memory 802, storage device 811, or other computer-readable media, such as program code of operating system 802a, program code of application program 802b, etc. Working memory 802 and storage device 811 are examples of computer-readable storage media for storing instructions that can be executed by processor 801 to perform the various functions described above. Working memory 802 may include both volatile and non-volatile memory (e.g., RAM, ROM, etc.). Storage device 811 may include hard disk drives, solid-state drives, removable media, including external and removable drives, memory cards, flash memory, floppy disks, optical disks (e.g., CDs, DVDs), storage arrays, network-attached storage, storage area networks, etc. Both working memory 802 and storage device 811 can be collectively referred to herein as memory or computer-readable storage medium, and can be non-transitory media capable of storing computer-readable, processor-executable program instructions as computer program code, which can be executed by processor 801 as a specific machine configured to implement the operations and functions described in the examples herein.

[0140] According to some embodiments, processor 801 can control and schedule at least one of the image processing circuit 803, text recognition circuit 804, voice conversion circuit 805, audio processing circuit 806, and the other devices and circuits included in electronic device 800. According to some embodiments, at least some of the components shown in Figure 8 may be interconnected and / or communicate with each other via bus 810.

[0141] Software elements (programs) may be located in the working memory 802, including but not limited to operating system 802a, one or more application programs 802b, drivers and / or other data and code.

[0142] According to some implementations, the instructions for performing the aforementioned control and scheduling may be included in the operating system 802a or one or more application programs 802b.

[0143] According to some embodiments, instructions for performing the steps of the methods described herein may be included in one or more application programs 802b, and the various modules of the electronic device 800 described above may be implemented by the processor 801 reading and executing the instructions of one or more application programs 802b. In other words, the electronic device 800 may include a processor 801 and a memory storing a program (e.g., working memory 802 and / or storage device 811), the program including instructions that, when executed by the processor 801, cause the processor 801 to perform the methods described in various embodiments of the present disclosure.

[0144] According to some implementations, some or all of the operations performed by at least one of the image processing circuit 803, the text recognition circuit 804, the voice conversion circuit 805, and the audio processing circuit 806 can be implemented by the processor 801 reading and executing instructions from one or more application programs 802b.

[0145] The executable code or source code of the instructions of a software element (program) may be stored in a non-transitory computer-readable storage medium (e.g., the storage device 811) and may be stored in working memory 802 during execution (possibly for compilation and / or installation). Therefore, this disclosure provides a computer-readable storage medium storing a program comprising instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the methods described in various embodiments of this disclosure. According to another embodiment, the executable code or source code of the instructions of a software element (program) may also be downloaded from a remote location.

[0146] It should also be understood that various modifications can be made depending on specific requirements. For example, custom hardware can be used, and / or the individual circuits, units, modules, or elements can be implemented using hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. For example, some or all of the circuits, units, modules, or elements included in the disclosed methods and apparatus can be implemented by programming the hardware (e.g., programmable logic circuits including field-programmable gate arrays (FPGAs) and / or programmable logic arrays (PLAs)) using the logic and algorithms according to this disclosure in assembly language or hardware programming languages (such as Verilog, VHDL, C++).

[0147] According to some implementations, the processor 801 in the electronic device 800 can be distributed across a network. For example, one processor can perform some processing, while another processor located remotely can perform other processing. Other modules of the electronic device 800 can also be distributed similarly. Thus, the electronic device 800 can be interpreted as a distributed computing system performing processing in multiple locations. The processor 801 of the electronic device 800 can also be a processor of a cloud computing system, or a processor incorporating blockchain technology.

[0148] While embodiments or examples of this disclosure have been described with reference to the accompanying drawings, it should be understood that the methods, systems, and devices described above are merely exemplary embodiments or examples, and the scope of the invention is not limited by these embodiments or examples, but only by the granted claims and their equivalents. Various elements in the embodiments or examples may be omitted or replaced by their equivalents. Furthermore, the steps may be performed in a different order than that described in this disclosure. Further, various elements in the embodiments or examples may be combined in various ways. Importantly, as the technology evolves, many elements described herein can be replaced by equivalents that appear after this disclosure.

Claims

1. A data processing method, the method comprising: obtaining an input feature map, the input feature map being obtained by feature extraction from any one of the following: image data, text data, and audio data; obtaining a first convolution kernel, the first convolution kernel comprising multiple elements; splitting the first convolution kernel to obtain multiple second convolution kernels in one-to-one correspondence with the multiple elements; storing the multiple second convolution kernels in a memory; obtaining dilated convolution information of the first convolution kernel for dilated convolution calculation, the dilated convolution information including a dilation rate and a preset stride; and performing the dilated convolution calculation on the input feature map based on the dilation rate, the preset stride, and the multiple second convolution kernels, wherein the dilated convolution calculation includes: for each of the multiple second convolution kernels, convolving the input feature map with the second convolution kernel to obtain an intermediate feature map; for each of the multiple intermediate feature maps corresponding to the multiple second convolution kernels, determining an effective region of the intermediate feature map based on the dilation rate and the preset stride, wherein the position of the effective region in the intermediate feature map corresponds to the position of the region of the input feature map scanned by the element of the dilated convolution kernel that corresponds to the second convolution kernel, and the dilated convolution kernel is determined based on the first convolution kernel and the dilation rate; summing the corresponding elements of the multiple intermediate feature maps within their respective effective regions to obtain an output feature map; and obtaining the result of the dilated convolution calculation based on the output feature map.

2. The method of claim 1, wherein convolving the input feature map with the second convolution kernel includes: convolving the input feature map with the second convolution kernel by traversing each element of the input feature map with a stride of 1.

3. The method of claim 1, wherein determining, for each of the multiple intermediate feature maps corresponding to the multiple second convolution kernels, the effective region of the intermediate feature map based on the dilation rate and the preset stride includes: determining coordinates of multiple corner points of the effective region based on the dilation rate and the preset stride; and extracting, from the intermediate feature map, the effective region enclosed by the coordinates of the multiple corner points.

4. The method of claim 3, wherein determining the coordinates of the multiple corner points of the effective region based on the dilation rate and the preset stride includes: determining the horizontal and vertical starting corner coordinates of the effective region based on the dilation rate and the preset stride; and determining the horizontal and vertical ending corner coordinates of the effective region based on the dilation rate, the preset stride, and the size of the input feature map.

5. The method according to claim 1, wherein the dilated convolution information further includes padding information, and for each of the multiple intermediate feature maps corresponding to the multiple second convolution kernels, the effective region of the intermediate feature map is determined based on the dilation rate, the preset stride, and the padding information.

6. The method according to claim 5, wherein determining the effective region of each intermediate feature map based on the dilation rate, the preset stride, and the padding information includes: determining, from the multiple effective regions and based on the padding information, the dilation rate, and the preset stride, the effective regions to be padded, the effective regions not to be padded, and the regions to be padded around the effective regions to be padded; filling the regions to be padded around the effective regions to be padded with a preset element value; and obtaining the result of the dilated convolution calculation based on the unpadded effective regions and the padded effective regions, wherein the unpadded effective regions and the padded effective regions have the same size.

7. The method according to claim 6, wherein determining, from the multiple effective regions and based on the padding information, the dilation rate, and the preset stride, the regions to be padded around the effective regions to be padded includes: determining the horizontal and vertical starting corner coordinates of the region to be padded based on the padding information, the dilation rate, and the preset stride; and determining the horizontal and vertical ending corner coordinates of the region to be padded based on the padding information, the dilation rate, the preset stride, and the size of the input feature map.

8. The method according to claim 5, wherein determining the effective region of each intermediate feature map based on the dilation rate, the preset stride, and the padding information includes: for each intermediate feature map, performing a padding operation on the intermediate feature map based on the padding information to obtain a padded intermediate feature map; and determining the effective region of each padded intermediate feature map based on the dilation rate and the preset stride, wherein the result of the dilated convolution calculation is obtained based on the effective regions of the multiple padded intermediate feature maps.

9. The method according to any one of claims 1 to 8, wherein the method is used for an accelerator chip.

10. A data processing apparatus, the apparatus comprising: an input feature map acquisition unit configured to acquire an input feature map, the input feature map being obtained by feature extraction from any one of the following: image data, text data, and audio data; a first convolution kernel acquisition unit configured to acquire a first convolution kernel, the first convolution kernel comprising multiple elements; a splitting unit configured to split the first convolution kernel to obtain multiple second convolution kernels in one-to-one correspondence with the multiple elements; a storage unit configured to store the multiple second convolution kernels in a memory; a dilated convolution parameter acquisition unit configured to acquire dilated convolution information of the first convolution kernel for dilated convolution calculation, the dilated convolution information including a dilation rate and a preset stride; and a calculation unit configured to perform the dilated convolution calculation on the input feature map based on the dilation rate, the preset stride, and the multiple second convolution kernels, wherein the dilated convolution calculation includes: for each of the multiple second convolution kernels, convolving the input feature map with the second convolution kernel to obtain an intermediate feature map; for each of the multiple intermediate feature maps corresponding to the multiple second convolution kernels, determining an effective region of the intermediate feature map based on the dilation rate and the preset stride, wherein the position of the effective region in the intermediate feature map corresponds to the position of the region of the input feature map scanned by the element of the dilated convolution kernel that corresponds to the second convolution kernel, and the dilated convolution kernel is determined based on the first convolution kernel and the dilation rate; summing the corresponding elements of the multiple intermediate feature maps within their respective effective regions to obtain an output feature map; and obtaining the result of the dilated convolution calculation based on the output feature map.

11. A chip, comprising: at least one processor; and a memory having a computer program stored thereon, wherein the computer program, when executed by the at least one processor, causes the at least one processor to perform the method according to any one of claims 1 to 9.

12. An electronic device comprising the chip as claimed in claim 11.

13. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1 to 9.

14. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 9.