Large convolution kernel hardware implementation method, computer device and storable medium

A hardware implementation and storage medium technology, applied in the field of convolutional neural networks, which solves the problems of increased NPU waiting time and reduced actual NPU processing performance, and achieves the effects of improved processing performance and reduced complexity.

Active Publication Date: 2021-12-10
南京风兴科技有限公司


Problems solved by technology

[0005] To solve the problem that implementing the convolution operation of a large convolution kernel in software increases the waiting time of the NPU and reduces its actual processing performance, this application discloses a large convolution kernel hardware implementation method, computer device and storage medium.



Examples


Example 1

[0055] Example 1: the large convolution kernel hardware implementation method disclosed in this embodiment is applied to the implementation of a 5×5 convolution kernel.

[0056] For an (out_ch, in_ch, 5, 5) large convolution kernel, expand it in the output-channel direction to generate two layers of 3×3 sub-convolution kernels, and configure the convolutional neural network hardware accelerator according to the generated two layers of 3×3 sub-convolution kernels.
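By way of illustration only, the following NumPy sketch performs this output-channel expansion for the 5×5 case. The function name split_5x5_into_3x3 and the ordering of the four sub-kernels along the output-channel axis are assumptions made for the example; the zero-filling of overlapped positions follows the rule described in the next paragraph.

import numpy as np

def split_5x5_into_3x3(k5):
    # Split an (out_ch, in_ch, 5, 5) kernel into an (out_ch*4, in_ch, 3, 3) kernel.
    # The four 3x3 windows are anchored at rows/cols {0, 2} (step 2); weights already
    # covered by an earlier window are replaced with 0 so each weight appears exactly once.
    out_ch, in_ch, _, _ = k5.shape
    anchors = [(0, 0), (0, 2), (2, 0), (2, 2)]
    covered = np.zeros((5, 5), dtype=bool)
    subs = np.zeros((out_ch, 4, in_ch, 3, 3), dtype=k5.dtype)
    for n, (r, c) in enumerate(anchors):
        window = np.zeros((5, 5), dtype=bool)
        window[r:r + 3, c:c + 3] = True
        subs[:, n] = np.where(window & ~covered, k5, 0)[:, :, r:r + 3, c:c + 3]
        covered |= window
    return subs.reshape(out_ch * 4, in_ch, 3, 3)

k5 = np.random.randn(8, 3, 5, 5).astype(np.float32)
print(split_5x5_into_3x3(k5).shape)   # (32, 3, 3, 3), i.e. (out_ch*4, in_ch, 3, 3)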

[0057] Referring to Figure 2, the first layer of sub-convolution kernels is generated. First, in the output-channel direction, four 3×3 sub-convolution kernels Conv1, Conv2, Conv3 and Conv4 are selected from the 5×5 convolution kernel with a step size equal to 2, generating the first layer of sub-convolution kernels (out_ch×4, in_ch, 3, 3). Any position where a newly generated 3×3 sub-convolution kernel overlaps an already generated 3×3 sub-convolution kernel is filled with the value 0; that is, the...
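The excerpt above cuts off before the second layer is described, but the first-layer split can be checked on its own: because every weight of the original 5×5 kernel survives in exactly one zero-filled 3×3 sub-kernel, the 5×5 output is recovered by shifting each sub-kernel's partial output by its anchor offset and summing. The single-channel NumPy sketch below is only a numerical check of that identity, not the patent's accelerator configuration; the naive conv2d helper is assumed for verification.

import numpy as np

def conv2d(x, k, pad):
    # Naive single-channel 2-D cross-correlation with zero padding (for checking only).
    xp = np.pad(x, pad)
    kh, kw = k.shape
    oh, ow = xp.shape[0] - kh + 1, xp.shape[1] - kw + 1
    out = np.empty((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x  = rng.standard_normal((16, 16))
k5 = rng.standard_normal((5, 5))
ref = conv2d(x, k5, pad=2)                    # direct 5x5 convolution, "same"-sized output

anchors = [(0, 0), (0, 2), (2, 0), (2, 2)]    # step-2 anchors of Conv1..Conv4
covered = np.zeros((5, 5), dtype=bool)
recon = np.zeros_like(ref)
for r, c in anchors:
    window = np.zeros((5, 5), dtype=bool)
    window[r:r + 3, c:c + 3] = True
    sub = np.where(window & ~covered, k5, 0.0)[r:r + 3, c:c + 3]   # overlapped positions -> 0
    covered |= window
    z = conv2d(x, sub, pad=2)                 # partial 3x3 result, size (18, 18)
    recon += z[r:r + 16, c:c + 16]            # shift by the anchor offset, then accumulate

assert np.allclose(recon, ref)                # the four partial results rebuild the 5x5 output

One way a second layer of 3×3 sub-kernels could realize this shift-and-add over the out_ch×4 partial-result channels is with one-hot 3×3 weights, which would be consistent with the two-layer structure of paragraph [0056]; the exact construction is not visible in the truncated excerpt.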

Example 2

[0066] Example 2: the large convolution kernel hardware implementation method disclosed in this embodiment is applied to the implementation of a 7×7 convolution kernel.

[0067] For an (out_ch, in_ch, 7, 7) large convolution kernel, expand it in the output-channel direction to generate three layers of 3×3 sub-convolution kernels, and configure the convolutional neural network hardware accelerator according to the generated three layers of 3×3 sub-convolution kernels. For the filling scheme of each sub-convolution kernel, see Figure 5.

[0068] The first layer of sub-convolution kernels is generated. First, in the output-channel direction, nine 3×3 sub-convolution kernels Conv1, Conv2, ..., Conv9 are selected from the 7×7 convolution kernel with a step size equal to 2, generating the first layer of sub-convolution kernels (out_ch×9, in_ch, 3, 3). Any position where a newly generated 3×3 sub-convolution kernel overlaps an already generated 3×3 sub-con...
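The same arithmetic extends to the 7×7 case with nine stride-2 anchors. The NumPy sketch below is an illustrative check of the first-layer split only; the helper names are assumptions, and the second and third layers of the three-layer construction described above are not reproduced here.

import numpy as np

def conv2d(x, k, pad):
    # Naive single-channel 2-D cross-correlation with zero padding (for checking only).
    xp = np.pad(x, pad)
    kh, kw = k.shape
    oh, ow = xp.shape[0] - kh + 1, xp.shape[1] - kw + 1
    return np.array([[np.sum(xp[i:i + kh, j:j + kw] * k) for j in range(ow)]
                     for i in range(oh)])

def first_layer_split(k_big):
    # Split an odd k x k kernel (k >= 5) into 3x3 sub-kernels anchored with step 2,
    # zero-filling positions already covered by an earlier sub-kernel.
    k = k_big.shape[0]
    starts = range(0, k - 2, 2)              # 0, 2 for k = 5; 0, 2, 4 for k = 7
    covered = np.zeros((k, k), dtype=bool)
    subs = []
    for r in starts:
        for c in starts:
            window = np.zeros((k, k), dtype=bool)
            window[r:r + 3, c:c + 3] = True
            subs.append(((r, c), np.where(window & ~covered, k_big, 0.0)[r:r + 3, c:c + 3]))
            covered |= window
    return subs

rng = np.random.default_rng(1)
x  = rng.standard_normal((16, 16))
k7 = rng.standard_normal((7, 7))
ref = conv2d(x, k7, pad=3)                   # direct 7x7 convolution, "same"-sized output

recon = np.zeros_like(ref)
for (r, c), sub in first_layer_split(k7):    # nine 3x3 sub-kernels (Conv1 .. Conv9)
    z = conv2d(x, sub, pad=3)                # partial result, size (20, 20)
    recon += z[r:r + 16, c:c + 16]           # shift by the anchor offset, then accumulate

assert np.allclose(recon, ref)               # the nine partial results rebuild the 7x7 output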



Abstract

The invention provides a large convolution kernel hardware implementation method, a computer device and a computer-readable storage medium. The method comprises the steps of loading a large convolution kernel; expanding the large convolution kernel in the output-channel direction to generate layers of 3×3 sub-convolution kernels; and configuring a convolutional neural network hardware accelerator according to the layered 3×3 sub-convolution kernels. According to the large convolution kernel hardware implementation method provided by the invention, the large convolution kernel can be split into a plurality of 3×3 sub-convolution kernels, and these 3×3 sub-convolution kernels have overlapping parts; the complex large convolution kernel operation is thus deployed directly on the existing simple convolution hardware in an NPU, so that the complexity of the NPU hardware is reduced and the processing performance of the NPU is improved.

Description

Technical field

[0001] The present application relates to the technical field of convolutional neural networks, in particular to a large convolution kernel hardware implementation method, computer device and storage medium.

Background technique

[0002] In the field of convolutional neural network technology, large convolution kernels generally refer to 5×5 and larger convolution kernels. Existing NPUs (Neural-network Processing Units, embedded neural network processors) already have convolution hardware that can directly implement smaller convolution kernels such as 1×1 and 3×3 convolution kernels, but no convolution hardware that can directly and simply implement large convolution kernels such as 5×5 convolution kernels.

[0003] When the NPU encounters convolution operations of large convolution kernels during processing, these convolution operations are generally reloaded into the CPU (central processing unit, central pro...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N3/063, G06N3/04
CPC: G06N3/063, G06N3/045
Inventors: 王丹阳, 杨东天, 陶为, 王中风, 林军
Owner: 南京风兴科技有限公司