
An ARM-based Embedded Convolutional Neural Network Acceleration Method

This application concerns convolutional neural network technology in the field of embedded convolutional neural network acceleration. It addresses the inefficiency of running such networks directly on embedded devices, achieving broad applicability and improved computing efficiency.

Active Publication Date: 2022-03-25
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0005] Although many lightweight convolutional neural networks exist, directly deploying them to run on embedded devices is inefficient.



Examples


Embodiment Construction

[0035] The optimization method of the present invention will be further described in detail below in conjunction with the drawings and MobileNetV1; however, the present invention is also applicable to other neural networks that use 1×1 convolution and 3×3 depthwise separable convolution.

[0036] As shown in Figure 3, the ARM-based embedded convolutional neural network acceleration method provided by the present invention comprises the following steps:

[0037] Step 1, use Caffe or other deep learning frameworks to train the lightweight convolutional neural network MobileNetV1.

[0038] Step 2, export the trained MobileNetV1 network structure and weights to a file.

[0039] Step 3, design a program that imports the weight file and implements the forward computation of the neural network according to the trained network structure. Different layers in the neural network can be represented by different functions, whose parameters include layer specification parameters, input feature maps, etc.
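As a rough illustration of Step 3, the sketch below represents a layer as a function taking specification parameters plus input and output feature maps, using a scalar 1×1 convolution as the example. The struct and function names are hypothetical, not taken from the patent text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layer descriptor: the patent only says each layer is a
   function taking specification parameters and feature maps. */
typedef struct {
    int in_c, out_c;   /* input / output channel counts        */
    int h, w;          /* spatial size of the input feature map */
} ConvSpec;

/* Scalar reference for a 1x1 convolution: each output pixel is a dot
   product over input channels (layout: channel-major, C x H x W). */
void conv1x1(const ConvSpec *s, const float *in, const float *weight,
             float *out)
{
    int hw = s->h * s->w;
    for (int oc = 0; oc < s->out_c; ++oc)
        for (int p = 0; p < hw; ++p) {
            float acc = 0.0f;
            for (int ic = 0; ic < s->in_c; ++ic)
                acc += weight[oc * s->in_c + ic] * in[ic * hw + p];
            out[oc * hw + p] = acc;
        }
}
```

In this naive form the inner loop strides through memory by `hw` floats per step, which is what the memory rearrangement of the next steps is meant to fix.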



Abstract

The invention discloses an ARM-based embedded convolutional neural network acceleration method, which addresses the limited hardware resources of embedded devices and the high computational complexity of convolutional neural networks. The time-consuming 1×1 convolution and 3×3 depthwise separable convolution commonly used in lightweight convolutional neural networks are optimized using ARM NEON technology. Specifically, for the 1×1 convolution, memory rearrangement is performed first and ARM NEON vector optimization is then applied; for the 3×3 depthwise separable convolution, ARM NEON vector optimization is applied directly. This accelerates the computation of convolutional neural networks and makes full use of the hardware computing resources of embedded devices, so that convolutional neural networks deployed on embedded terminals run faster and are more practical.
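The abstract's two-stage treatment of the 1×1 convolution (rearrange memory, then vectorize) can be sketched in portable scalar C. Assuming a channel-major input layout and 4-pixel blocks chosen to match NEON's 4-lane `float32x4_t` registers (both assumptions; the patent page does not give the exact layout), the rearrangement places the four pixels processed together into consecutive memory so the inner loop becomes directly vectorizable:

```c
#include <assert.h>

/* Rearrange in[C][HW] into packed[HW/4 blocks][C][4]: within each block
   of 4 pixels, the 4 values of every channel sit contiguously.
   (Illustrative layout, not taken verbatim from the patent.) */
void pack_input(const float *in, float *packed, int c, int hw)
{
    for (int b = 0; b < hw / 4; ++b)            /* pixel block */
        for (int ch = 0; ch < c; ++ch)          /* channel     */
            for (int i = 0; i < 4; ++i)         /* lane        */
                packed[(b * c + ch) * 4 + i] = in[ch * hw + b * 4 + i];
}

/* 1x1 convolution over the packed layout.  The innermost 4-wide loop
   now reads 4 consecutive floats per channel, which is the pattern a
   NEON implementation would turn into one vector load + multiply-add. */
void conv1x1_packed(const float *packed, const float *weight, float *out,
                    int in_c, int out_c, int hw)
{
    for (int oc = 0; oc < out_c; ++oc)
        for (int b = 0; b < hw / 4; ++b) {
            float acc[4] = {0, 0, 0, 0};
            for (int ic = 0; ic < in_c; ++ic)
                for (int i = 0; i < 4; ++i)
                    acc[i] += weight[oc * in_c + ic] *
                              packed[(b * in_c + ic) * 4 + i];
            for (int i = 0; i < 4; ++i)
                out[oc * hw + b * 4 + i] = acc[i];
        }
}
```

A NEON version would replace the 4-wide inner loop with `vld1q_f32`, `vmlaq_n_f32`, and `vst1q_f32`; the packed layout is what makes those loads and stores contiguous.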

Description

Technical Field

[0001] The present invention relates to the technical field of embedded convolutional neural network acceleration, and in particular to an ARM-based embedded convolutional neural network acceleration method.

Background

[0002] Deep learning algorithms based on convolutional neural networks have achieved great success in many areas of computer vision. However, as the performance of deep convolutional neural networks has improved, their parameter counts and computational cost have grown steadily. Because deep convolutional neural networks place high demands on hardware computing power, deploying them on devices with limited computing resources, such as embedded devices, has become a challenge.

[0003] At present, a feasible approach is to design a lightweight convolutional neural network structure and deploy that structure to embedded devices.
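For reference, the other operation named in the abstract, the 3×3 depthwise convolution at the core of depthwise separable convolution, applies one 3×3 kernel per channel rather than mixing channels. A minimal scalar sketch (stride 1, no padding; layout and names are illustrative, not from the patent):

```c
#include <assert.h>

/* Scalar reference for 3x3 depthwise convolution, stride 1, "valid"
   padding.  Each channel is filtered independently by its own 3x3
   kernel; a NEON version would compute several adjacent output pixels
   per vector instruction instead of this innermost scalar loop. */
void dwconv3x3(const float *in, const float *kernel, float *out,
               int c, int h, int w)
{
    int oh = h - 2, ow = w - 2;                 /* output size */
    for (int ch = 0; ch < c; ++ch)
        for (int y = 0; y < oh; ++y)
            for (int x = 0; x < ow; ++x) {
                float acc = 0.0f;
                for (int ky = 0; ky < 3; ++ky)
                    for (int kx = 0; kx < 3; ++kx)
                        acc += kernel[ch * 9 + ky * 3 + kx] *
                               in[(ch * h + y + ky) * w + x + kx];
                out[(ch * oh + y) * ow + x] = acc;
            }
}
```

Because each channel is independent, the 3×3 depthwise case has no cross-channel reduction, which is why the patent can vectorize it directly without the memory rearrangement needed for the 1×1 convolution.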

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC (IPC(8)): G06N3/04; G06N3/063; G06N3/08
CPC: G06N3/063; G06N3/08; G06N3/045
Inventor: 毕盛, 张英杰, 董敏
Owner: SOUTH CHINA UNIV OF TECH