Large convolution kernel hardware implementation method, computer device and storable medium

A hardware implementation and storage medium technology, applied in the field of convolutional neural networks, which solves the problems of increased NPU waiting time and reduced actual NPU processing performance, and achieves the effects of improved processing performance and reduced complexity.

Active Publication Date: 2021-12-10
南京风兴科技有限公司


Problems solved by technology

[0005] To solve the problem that implementing the convolution operation of a large convolution kernel in software increases the waiting time of the NPU and reduces its actual processing performance, this application discloses a large convolution kernel hardware implementation method, computer device and storage medium.



Examples


Example 1

[0055] Example 1: the large convolution kernel hardware implementation method disclosed in this embodiment is applied to the implementation of a 5×5 convolution kernel.

[0056] For an (out_ch, in_ch, 5, 5) large convolution kernel, expand it in the output-channel direction to generate two layers of 3×3 sub-convolution kernels, and configure the convolutional neural network hardware accelerator according to the generated two layers of 3×3 sub-convolution kernels.
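By way of illustration only, the following NumPy sketch performs this output-channel expansion for the 5×5 case. The function name split_5x5_into_3x3 and the ordering of the four sub-kernels along the output-channel axis are assumptions made for the example; the zero-filling of overlapped positions follows the rule described in the next paragraph.

import numpy as np

def split_5x5_into_3x3(k5):
    # Split an (out_ch, in_ch, 5, 5) kernel into an (out_ch*4, in_ch, 3, 3) kernel.
    # The four 3x3 windows are anchored at rows/cols {0, 2} (step 2); weights already
    # covered by an earlier window are replaced with 0 so each weight appears exactly once.
    out_ch, in_ch, _, _ = k5.shape
    anchors = [(0, 0), (0, 2), (2, 0), (2, 2)]
    covered = np.zeros((5, 5), dtype=bool)
    subs = np.zeros((out_ch, 4, in_ch, 3, 3), dtype=k5.dtype)
    for n, (r, c) in enumerate(anchors):
        window = np.zeros((5, 5), dtype=bool)
        window[r:r + 3, c:c + 3] = True
        subs[:, n] = np.where(window & ~covered, k5, 0)[:, :, r:r + 3, c:c + 3]
        covered |= window
    return subs.reshape(out_ch * 4, in_ch, 3, 3)

k5 = np.random.randn(8, 3, 5, 5).astype(np.float32)
print(split_5x5_into_3x3(k5).shape)   # (32, 3, 3, 3), i.e. (out_ch*4, in_ch, 3, 3)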

[0057] Referring to Figure 2, the first layer of sub-convolution kernels is generated. First, in the output-channel direction, four 3×3 sub-convolution kernels Conv1, Conv2, Conv3 and Conv4 are selected from the 5×5 convolution kernel with a step size equal to 2, generating the first layer of sub-convolution kernels (out_ch×4, in_ch, 3, 3). Any position where a newly generated 3×3 sub-convolution kernel overlaps an already generated 3×3 sub-convolution kernel is filled with the value 0; that is, the...
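The excerpt above cuts off before the second layer is described, but the first-layer split can be checked on its own: because every weight of the original 5×5 kernel survives in exactly one zero-filled 3×3 sub-kernel, the 5×5 output is recovered by shifting each sub-kernel's partial output by its anchor offset and summing. The single-channel NumPy sketch below is only a numerical check of that identity, not the patent's accelerator configuration; the naive conv2d helper is assumed for verification.

import numpy as np

def conv2d(x, k, pad):
    # Naive single-channel 2-D cross-correlation with zero padding (for checking only).
    xp = np.pad(x, pad)
    kh, kw = k.shape
    oh, ow = xp.shape[0] - kh + 1, xp.shape[1] - kw + 1
    out = np.empty((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x  = rng.standard_normal((16, 16))
k5 = rng.standard_normal((5, 5))
ref = conv2d(x, k5, pad=2)                    # direct 5x5 convolution, "same"-sized output

anchors = [(0, 0), (0, 2), (2, 0), (2, 2)]    # step-2 anchors of Conv1..Conv4
covered = np.zeros((5, 5), dtype=bool)
recon = np.zeros_like(ref)
for r, c in anchors:
    window = np.zeros((5, 5), dtype=bool)
    window[r:r + 3, c:c + 3] = True
    sub = np.where(window & ~covered, k5, 0.0)[r:r + 3, c:c + 3]   # overlapped positions -> 0
    covered |= window
    z = conv2d(x, sub, pad=2)                 # partial 3x3 result, size (18, 18)
    recon += z[r:r + 16, c:c + 16]            # shift by the anchor offset, then accumulate

assert np.allclose(recon, ref)                # the four partial results rebuild the 5x5 output

One way a second layer of 3×3 sub-kernels could realize this shift-and-add over the out_ch×4 partial-result channels is with one-hot 3×3 weights, which would be consistent with the two-layer structure of paragraph [0056]; the exact construction is not visible in the truncated excerpt.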

Example 2

[0066] Example 2: the large convolution kernel hardware implementation method disclosed in this embodiment is applied to the implementation of a 7×7 convolution kernel.

[0067] For an (out_ch, in_ch, 7, 7) large convolution kernel, expand it in the output-channel direction to generate three layers of 3×3 sub-convolution kernels, and configure the convolutional neural network hardware accelerator according to the generated three layers of 3×3 sub-convolution kernels. For the filling scheme of each sub-convolution kernel, see Figure 5.

[0068] The first layer of sub-convolution kernels is generated. First, in the output-channel direction, nine 3×3 sub-convolution kernels Conv1, Conv2, ..., Conv9 are selected from the 7×7 convolution kernel with a step size equal to 2, generating the first layer of sub-convolution kernels (out_ch×9, in_ch, 3, 3). Any position where a newly generated 3×3 sub-convolution kernel overlaps an already generated 3×3 sub-con...
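The same arithmetic extends to the 7×7 case with nine stride-2 anchors. The NumPy sketch below is an illustrative check of the first-layer split only; the helper names are assumptions, and the second and third layers of the three-layer construction described above are not reproduced here.

import numpy as np

def conv2d(x, k, pad):
    # Naive single-channel 2-D cross-correlation with zero padding (for checking only).
    xp = np.pad(x, pad)
    kh, kw = k.shape
    oh, ow = xp.shape[0] - kh + 1, xp.shape[1] - kw + 1
    return np.array([[np.sum(xp[i:i + kh, j:j + kw] * k) for j in range(ow)]
                     for i in range(oh)])

def first_layer_split(k_big):
    # Split an odd k x k kernel (k >= 5) into 3x3 sub-kernels anchored with step 2,
    # zero-filling positions already covered by an earlier sub-kernel.
    k = k_big.shape[0]
    starts = range(0, k - 2, 2)              # 0, 2 for k = 5; 0, 2, 4 for k = 7
    covered = np.zeros((k, k), dtype=bool)
    subs = []
    for r in starts:
        for c in starts:
            window = np.zeros((k, k), dtype=bool)
            window[r:r + 3, c:c + 3] = True
            subs.append(((r, c), np.where(window & ~covered, k_big, 0.0)[r:r + 3, c:c + 3]))
            covered |= window
    return subs

rng = np.random.default_rng(1)
x  = rng.standard_normal((16, 16))
k7 = rng.standard_normal((7, 7))
ref = conv2d(x, k7, pad=3)                   # direct 7x7 convolution, "same"-sized output

recon = np.zeros_like(ref)
for (r, c), sub in first_layer_split(k7):    # nine 3x3 sub-kernels (Conv1 .. Conv9)
    z = conv2d(x, sub, pad=3)                # partial result, size (20, 20)
    recon += z[r:r + 16, c:c + 16]           # shift by the anchor offset, then accumulate

assert np.allclose(recon, ref)               # the nine partial results rebuild the 7x7 output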



Abstract

The invention provides a large convolution kernel hardware implementation method, a computer device and a computer-readable storage medium. The method comprises the steps of loading a large convolution kernel; expanding the large convolution kernel in the output-channel direction to generate layers of 3×3 sub-convolution kernels; and configuring a convolutional neural network hardware accelerator according to the layered 3×3 sub-convolution kernels. According to the large convolution kernel hardware implementation method provided by the invention, the large convolution kernel can be split into a plurality of 3×3 sub-convolution kernels, and these 3×3 sub-convolution kernels have overlapping parts; the complex large convolution kernel operation is thus deployed directly on the existing simple convolution hardware in an NPU, so that the complexity of the NPU hardware is reduced and the processing performance of the NPU is improved.

Description

Technical field

[0001] The present application relates to the technical field of convolutional neural networks, in particular to a large convolution kernel hardware implementation method, computer device and storage medium.

Background technique

[0002] In the field of convolutional neural network technology, large convolution kernels generally refer to 5×5 and larger convolution kernels. Existing NPUs (Neural-network Processing Units, embedded neural network processors) already have convolution hardware that can directly implement smaller convolution kernels such as 1×1 and 3×3 convolution kernels, but no convolution hardware that can directly and simply implement large convolution kernels such as 5×5 convolution kernels.

[0003] When the NPU encounters convolution operations of large convolution kernels during processing, these convolution operations are generally reloaded into the CPU (central processing unit, central pro...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N3/063, G06N3/04
CPC: G06N3/063, G06N3/045
Inventors: 王丹阳, 杨东天, 陶为, 王中风, 林军
Owner: 南京风兴科技有限公司