
Large convolution kernel hardware implementation method, computer equipment and storage medium

A hardware implementation and storage medium technology, applied in the field of convolutional neural networks, which solves the problems of increased NPU waiting time and reduced actual NPU processing performance, and achieves the effects of improved processing performance and reduced complexity.

Active Publication Date: 2022-04-05
南京风兴科技有限公司

AI Technical Summary

Problems solved by technology

[0005] To solve the problem that implementing the convolution operation of a large convolution kernel in software increases the waiting time of the NPU and reduces the actual processing performance of the NPU, this application discloses a large convolution kernel hardware implementation method, computer equipment, and a storage medium.



Examples


Example 1

[0055] Example 1: the large convolution kernel hardware implementation method disclosed in this embodiment is applied to the implementation of a 5×5 convolution kernel.

[0056] For an (out_ch, in_ch, 5, 5) large convolution kernel, expand it in the direction of the output channel to generate two layers of 3×3 sub-convolution kernels, and configure the convolutional neural network hardware accelerator according to the generated two layers of 3×3 sub-convolution kernels.

[0057] Referring to figure 2, generate the first layer of sub-convolution kernels. First, in the direction of the output channel, four 3×3 sub-convolution kernels Conv1, Conv2, Conv3, and Conv4 are extracted from the expansion of the 5×5 convolution kernel with a step size of 2, generating the first layer of sub-convolution kernels of shape (out_ch×4, in_ch, 3, 3). Any position where a newly generated 3×3 sub-convolution kernel overlaps an already-generated 3×3 sub-convolution kernel is filled with 0; that is, the...
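
As a hypothetical illustration of the extraction described in paragraph [0057] (not part of the patent text), the sketch below splits an (out_ch, in_ch, 5, 5) kernel into stride-2 3×3 windows and zero-fills positions already covered by an earlier window; `split_kernel` is a helper name invented here for illustration.

```python
import numpy as np

def split_kernel(K, sub=3, stride=2):
    """Split a (out_ch, in_ch, k, k) kernel into stride-2 sub x sub sub-kernels.

    Illustrative sketch only: weight positions already covered by an earlier
    sub-kernel are zero-filled, so every weight of the original large kernel
    appears in exactly one sub-kernel.
    """
    out_ch, in_ch, k, _ = K.shape
    covered = np.zeros((k, k), dtype=bool)            # weights already assigned
    subs = []
    for r in range(0, k - sub + 1, stride):           # r, c in {0, 2} for a 5x5 kernel
        for c in range(0, k - sub + 1, stride):
            window = K[:, :, r:r + sub, c:c + sub].copy()
            window[:, :, covered[r:r + sub, c:c + sub]] = 0.0   # fill the overlap with 0
            covered[r:r + sub, c:c + sub] = True
            subs.append(window)
    # Stack along the output-channel direction: (out_ch * n_sub, in_ch, 3, 3).
    return np.concatenate(subs, axis=0), len(subs)

# For a 5x5 kernel this yields the four sub-kernels Conv1..Conv4 of paragraph [0057].
K5 = np.random.randn(8, 16, 5, 5).astype(np.float32)  # (out_ch, in_ch, 5, 5)
sub_kernels, n_sub = split_kernel(K5)
assert n_sub == 4 and sub_kernels.shape == (8 * 4, 16, 3, 3)
```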

Example 2

[0066] Example 2: the large convolution kernel hardware implementation method disclosed in this embodiment is applied to the implementation of a 7×7 convolution kernel.

[0067] For an (out_ch, in_ch, 7, 7) large convolution kernel, expand it in the direction of the output channel to generate three layers of 3×3 sub-convolution kernels, and configure the convolutional neural network hardware accelerator according to the generated three layers of 3×3 sub-convolution kernels. For the filling scheme of each sub-convolution kernel, see Figure 5.

[0068] Generate the first layer of sub-convolution kernels. First, in the direction of the output channel, nine 3×3 sub-convolution kernels Conv1, Conv2, ..., Conv9 are extracted from the expansion of the 7×7 convolution kernel with a step size of 2, generating the first layer of sub-convolution kernels of shape (out_ch×9, in_ch, 3, 3). Any position where a newly generated 3×3 sub-convolution kernel overlaps an already-generated 3×3 sub-convolution kernel is filled with 0...
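
Continuing the hypothetical sketch from Example 1 (the `split_kernel` helper is an invented name, not the patent's), the 7×7 case yields nine sub-kernels, and because overlapping positions are zero-filled, placing each sub-kernel back at its (row, column) offset and summing recovers the original kernel exactly:

```python
# Assumes numpy and the split_kernel helper from the Example 1 sketch.
K7 = np.random.randn(8, 16, 7, 7).astype(np.float32)  # (out_ch, in_ch, 7, 7)
sub_kernels, n_sub = split_kernel(K7)
assert n_sub == 9 and sub_kernels.shape == (8 * 9, 16, 3, 3)

def reconstruct(subs, out_ch, k, sub=3, stride=2):
    """Place each zero-filled sub-kernel back at its offset and sum."""
    K = np.zeros((out_ch, subs.shape[1], k, k), dtype=subs.dtype)
    i = 0
    for r in range(0, k - sub + 1, stride):
        for c in range(0, k - sub + 1, stride):
            K[:, :, r:r + sub, c:c + sub] += subs[i * out_ch:(i + 1) * out_ch]
            i += 1
    return K

# The split is lossless: every original weight appears in exactly one sub-kernel.
assert np.allclose(reconstruct(sub_kernels, 8, 7), K7)
```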



Abstract

The large convolution kernel hardware implementation method, computer equipment, and computer-readable storage medium provided by this application include: loading a large convolution kernel; expanding the large convolution kernel in the direction of the output channel to generate a layer of 3×3 sub-convolution kernels; and configuring the convolutional neural network hardware accelerator according to that layer of 3×3 sub-convolution kernels. The method splits the large convolution kernel into several 3×3 sub-convolution kernels, with overlapping parts between them, so that the complex large convolution kernel operation is deployed directly on the existing simple convolution hardware in the NPU, which reduces the complexity of the NPU hardware and improves the processing performance of the NPU.
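
The excerpt above is truncated and does not show how the patent recombines the sub-convolution outputs, so the following is only a hypothetical demonstration of why the zero-filled split is mathematically equivalent to the direct large-kernel convolution: summing the four 3×3 sub-convolutions of a 5×5 kernel, each applied to the input shifted by that sub-kernel's offset, reproduces the direct 5×5 result. It reuses the invented `split_kernel` helper from the Example 1 sketch and uses PyTorch only as a reference convolution.

```python
import torch
import torch.nn.functional as F

out_ch, in_ch, H, W = 8, 16, 20, 20
x  = torch.randn(1, in_ch, H, W)
K5 = torch.randn(out_ch, in_ch, 5, 5)

ref = F.conv2d(x, K5)                       # direct 5x5 conv, valid padding -> (1, out_ch, 16, 16)

subs, _ = split_kernel(K5.numpy())          # hypothetical helper from the Example 1 sketch
subs = torch.from_numpy(subs)

acc, i = torch.zeros_like(ref), 0
for r in (0, 2):                            # sub-kernel offsets inside the 5x5 window
    for c in (0, 2):
        S = subs[i * out_ch:(i + 1) * out_ch]
        # Crop the input so each 3x3 conv output lines up with the 5x5 output grid.
        acc += F.conv2d(x[:, :, r:r + H - 2, c:c + W - 2], S)
        i += 1

assert torch.allclose(ref, acc, atol=1e-4)  # shift-and-sum equals the direct 5x5 conv
```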

Description

Technical Field

[0001] The present application relates to the technical field of convolutional neural networks, and in particular to a large convolution kernel hardware implementation method, computer equipment, and storage media.

Background Technology

[0002] In the field of convolutional neural network technology, large convolution kernels generally refer to 5×5 and larger convolution kernels. Existing NPUs (Neural-network Processing Units, embedded neural network processors) already contain convolution hardware that can directly implement smaller convolution kernels such as 1×1 and 3×3 convolution kernels, but they have no convolution hardware that can directly and simply implement large convolution kernels such as 5×5 convolution kernels.

[0003] When the NPU encounters convolution operations of large convolution kernels during processing, these convolution operations are generally reloaded into the CPU (central pro...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06N3/063; G06N3/04
CPC: G06N3/063; G06N3/045
Inventors: 王丹阳, 杨东天, 陶为, 王中风, 林军
Owner: 南京风兴科技有限公司