
A hardware architecture for inference acceleration for convolutional neural network and its working method

A neural network hardware architecture technology, applied in the field of integrated circuit processor hierarchy design, which addresses problems such as the inability to achieve parallel computing, high power consumption, and idle computing units.

Active Publication Date: 2021-03-02
SHANDONG LINGNENG ELECTRONIC TECH CO LTD

AI Technical Summary

Problems solved by technology

However, although the convolutional neural network (CNN) is widely used, relying on traditional CPU and GPU processors to run CNN inference has many limitations. The inference process of a convolutional neural network requires a large number of calculations, but a CPU has relatively few units responsible for logic operations (ALU modules), and its calculation instructions are executed sequentially one by one, so parallel computing cannot be achieved.
Although the GPU can do parallel computing, it can only process one image at a time, which limits inference speed, and it consumes a lot of power, so it cannot be applied to embedded devices.
Inference based on FPGA has been proposed in the prior art, but the prior art does not make full use of the logic computing units, and many computing units sit idle during inference.

Method used



Examples


Embodiment 1

[0066] A hardware architecture for inference acceleration for convolutional neural networks, as shown in Figure 1, comprising a preprocessing unit, a hardware acceleration unit, and a storage unit;

[0067] The preprocessing unit is used to preprocess the input original image frame;

[0068] The hardware acceleration unit is used to read in the padded image frame to be convolved, the convolution kernel coefficients, and the bias parameters, and to perform the convolution. After the convolution is completed, the fully connected layer calculation is performed; after the fully connected layer calculation is completed, the calculated feature judgment result is output. Calculating the feature judgment result means judging the probability that the input picture corresponds to each of the different results; in other words, inference is performed on the input picture and the result, the probability that the picture matches each possible outcome, is output.
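To make this data path concrete, the sketch below mirrors the sequence described above in plain C: a convolution with kernel coefficients and a bias, a fully connected layer, and a per-class probability output. The sizes, the ReLU activation, and the softmax used to turn the fully connected outputs into probabilities are illustrative assumptions and are not specified by the patent.

```c
#include <math.h>

/* Rough software analogue of the data path described above:
 * convolution (kernel + bias) -> fully connected layer -> per-class
 * probability.  All sizes, the ReLU, and the softmax are assumptions. */

#define IN_H    8                 /* padded input height (assumed) */
#define IN_W    8                 /* padded input width  (assumed) */
#define K       3                 /* convolution kernel size (assumed) */
#define OUT_H   (IN_H - K + 1)
#define OUT_W   (IN_W - K + 1)
#define CLASSES 4                 /* number of judgment results (assumed) */

/* 2-D convolution of one channel with one kernel plus a bias term. */
static void conv2d(const float in[IN_H][IN_W],
                   const float kernel[K][K], float bias,
                   float out[OUT_H][OUT_W])
{
    for (int r = 0; r < OUT_H; r++)
        for (int c = 0; c < OUT_W; c++) {
            float acc = bias;
            for (int i = 0; i < K; i++)
                for (int j = 0; j < K; j++)
                    acc += in[r + i][c + j] * kernel[i][j];
            out[r][c] = acc > 0.0f ? acc : 0.0f;   /* ReLU (assumed) */
        }
}

/* Fully connected layer over the flattened feature map, followed by a
 * softmax so each possible result gets a probability. */
static void fc_softmax(const float feat[OUT_H][OUT_W],
                       const float w[CLASSES][OUT_H * OUT_W],
                       const float b[CLASSES], float prob[CLASSES])
{
    float sum = 0.0f;
    for (int k = 0; k < CLASSES; k++) {
        float logit = b[k];
        for (int i = 0; i < OUT_H * OUT_W; i++)
            logit += w[k][i] * feat[i / OUT_W][i % OUT_W];
        prob[k] = expf(logit);
        sum += prob[k];
    }
    for (int k = 0; k < CLASSES; k++)
        prob[k] /= sum;             /* probability of each result */
}
```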

[0069] The...

Embodiment 2

[0071] A hardware architecture for inference acceleration for convolutional neural networks according to Embodiment 1, with the difference that:

[0072] The preprocessing unit includes an ARM core, and the preprocessing unit is connected to the hardware acceleration unit through the AXI bus controller; the CPU (ARM) is the FPGA's own embedded CPU and supports the AXI bus structure, and the FPGA logic exchanges data with the ARM over the AXI bus. The hardware acceleration unit includes several RAMs, RAM controllers, cropping modules, address control modules, data allocation control modules, and a convolution array module; every RAM is built as a double buffer to improve efficiency. Double buffering the RAMs increases data sharing, reduces redundant data reads, and is designed to maximize support for parallel PE computation.
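As an illustration of the double-buffered (ping-pong) RAM scheme, the sketch below shows the idea in C: while the convolution array consumes one bank, the other bank is being filled, and the two banks then swap roles. The bank size, data type, and function names are assumptions; in the actual architecture this logic would live in the FPGA fabric rather than in software.

```c
#include <stdint.h>
#include <string.h>

/* Ping-pong double buffer: while the convolution array (consumer) works
 * on one bank, the AXI side (producer) fills the other, then the roles
 * swap.  Bank size and data type are illustrative assumptions. */
#define BANK_WORDS 1024

typedef struct {
    int16_t bank[2][BANK_WORDS];  /* two RAM banks of fixed-point data */
    int     write_sel;            /* bank currently being filled       */
} pingpong_ram_t;

/* Producer side: fill the write bank with the next tile of data. */
static void fill_bank(pingpong_ram_t *r, const int16_t *src, int n)
{
    memcpy(r->bank[r->write_sel], src, (size_t)n * sizeof(int16_t));
}

/* Consumer side: the PE array reads from the other bank. */
static const int16_t *read_bank(const pingpong_ram_t *r)
{
    return r->bank[r->write_sel ^ 1];
}

/* Swap roles once the PE array has consumed the read bank. */
static void swap_banks(pingpong_ram_t *r)
{
    r->write_sel ^= 1;
}
```

The swap is what keeps the PE array busy: it never waits for a fresh tile, because the next tile was loaded into the other bank while the current one was being processed.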

[0073] ARM sequentially performs image padding on the input original image frame, converts floating-point data to fixed-poi...

Embodiment 3

[0076] The working method of the hardware architecture described in Embodiment 2 includes:

[0077] (1) ARM preprocesses the input original image frame. The preprocessing includes, in order: image padding, converting floating-point data to fixed-point data, and configuring the FPGA's logic registers. Converting floating-point data to fixed-point data means converting floating-point data to 8-bit fixed-point data. Configuring the FPGA's logic registers means sending data such as weights and biases to the logic registers over the AXI bus; after the configuration is completed, inference can be performed on the input image. The connection relationship of the FPGA's logic registers is fixed inside the FPGA, as shown in Figure 1;
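A minimal sketch of the float-to-8-bit fixed-point conversion step follows. The patent only states that floating-point data is converted to 8-bit fixed-point data, so the Q-format (number of fractional bits), rounding, and saturation below are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Convert a float to signed 8-bit fixed point.  The number of
 * fractional bits (FRAC_BITS), rounding, and saturation behaviour are
 * assumptions; the patent only says floats become 8-bit fixed point. */
#define FRAC_BITS 5   /* assumed Q2.5: 1 sign, 2 integer, 5 fraction bits */

static int8_t float_to_q(float x)
{
    float scaled = x * (float)(1 << FRAC_BITS);
    /* round to nearest and saturate to the int8 range */
    long v = (long)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
    if (v >  127) v =  127;
    if (v < -128) v = -128;
    return (int8_t)v;
}

static float q_to_float(int8_t q)
{
    return (float)q / (float)(1 << FRAC_BITS);
}
```

With 5 fractional bits, for example, float_to_q(1.25f) yields 40 and q_to_float(40) recovers 1.25 exactly; values outside roughly [-4, 4) saturate.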

[0078] (2) The AXI bus controller reads the padded image frame to be convolved, the convolution kernel coefficients, and the bias parameters into several RAMs; including: the AXI bus controller judges the origin...
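The step above is truncated, so only what it does state, that the AXI bus controller reads image data, kernel coefficients, and bias parameters into several RAMs, is sketched below. The tag values, RAM sizes, and dispatch routine are illustrative assumptions, not the patent's design.

```c
#include <stdint.h>
#include <stddef.h>

/* Route words arriving over the AXI bus into separate RAMs according to
 * what they carry.  The tag enum and dispatch routine are assumptions;
 * the patent only states that image data, kernel coefficients, and bias
 * parameters are read into several RAMs. */
typedef enum { DATA_IMAGE, DATA_KERNEL, DATA_BIAS } axi_tag_t;

#define RAM_WORDS 4096

static int16_t image_ram[RAM_WORDS];
static int16_t kernel_ram[RAM_WORDS];
static int16_t bias_ram[RAM_WORDS];

static void axi_dispatch(axi_tag_t tag, const int16_t *burst, size_t n,
                         size_t offset)
{
    int16_t *dst = (tag == DATA_IMAGE)  ? image_ram
                 : (tag == DATA_KERNEL) ? kernel_ram
                 :                        bias_ram;
    for (size_t i = 0; i < n && offset + i < RAM_WORDS; i++)
        dst[offset + i] = burst[i];
}
```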



Abstract

The present invention relates to a hardware architecture and working method for inference acceleration of a convolutional neural network, including a preprocessing unit, a hardware acceleration unit, and a storage unit. The preprocessing unit is used to preprocess the input original image frame. The hardware acceleration unit is used to read in the preprocessed image frame to be convolved, the convolution kernel coefficients, and the bias parameters, to perform the convolution and, after the convolution is completed, the fully connected layer calculation, and, after the fully connected layer calculation is completed, to output the calculated feature judgment result. The storage unit is used to store the input original image frame, the convolution kernel coefficients, the bias parameters, the output data of each convolution, and the output data of the fully connected layer. The invention solves the problems of traditional processors such as slow speed, large delay, and inability to realize real-time inference, and provides a new solution for the design of a processor for CNN convolutional neural network inference calculations.

Description

Technical field

[0001] The invention relates to a hardware architecture and working method for inference acceleration for a convolutional neural network, and belongs to the technical field of hierarchical structure design of integrated circuit processors.

Background technique

[0002] With the rapid development of artificial intelligence technology, the CNN convolutional neural network has developed into an advanced computer vision target recognition algorithm with wide application in feature extraction, target recognition, face recognition, and other fields. However, although the convolutional neural network (CNN) is widely used, relying on traditional CPU and GPU processors to run CNN inference has many limitations: the inference process of a convolutional neural network requires a large number of calculations, but a CPU has relatively few units responsible for logic operations (ALU modu...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06N3/04, G06N3/063
CPC: G06N3/063, G06N3/045
Inventor: 朱顺意
Owner: SHANDONG LINGNENG ELECTRONIC TECH CO LTD