Hardware architecture for inference acceleration of convolutional neural networks, and working method thereof

A neural network hardware architecture technology, applied in the field of hierarchical structure design of integrated circuit processors, that addresses the problems of idle computing units, high power consumption, and inapplicability to embedded devices.

Active Publication Date: 2018-06-01
SHANDONG LINGNENG ELECTRONIC TECH CO LTD
3 Cites, 27 Cited by

AI Technical Summary

Problems solved by technology

However, although the convolutional neural network (CNN) is widely used, running CNN inference on traditional CPU and GPU processors has many limitations. Inference requires a large number of calculations, but a CPU has relatively few units responsible for logical operations (ALU modules), and its calculation instructions are executed sequentially, one by one, so parallel computing cannot be achieved.
Although a GPU can compute in parallel, it can only process one image at a time, which limits inference speed, and it consumes a lot of power, so it cannot be applied to embedded devices.
FPGA-based inference methods have been proposed in the prior art, but they do not make full use of the logic computing units, and many computing units sit idle during inference.



Examples


Embodiment 1

[0066] A hardware architecture for inference acceleration of convolutional neural networks, as shown in Figure 1, includes a preprocessing unit, a hardware acceleration unit, and a storage unit;

[0067] The preprocessing unit preprocesses the original input image frame;

[0068] The hardware acceleration unit reads in the padded image frame to be convolved, the convolution kernel coefficients, and the bias (offset) parameters, and performs the convolution. After the convolution is completed, it performs the fully connected layer calculation and then outputs the feature judgment result. The feature judgment result is the probability that the input picture conforms to each possible result; that is, the unit performs inference on the input picture and outputs these probabilities.
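The data flow in [0068] (convolution with a bias, then a fully connected layer, then a per-result probability) can be sketched in software. This is only an illustrative model, not the patented hardware: the function names `conv2d`, `fully_connected`, `softmax` and `infer` are hypothetical, a single kernel and a single layer are assumed, and softmax is an assumption for how scores become probabilities, since the source does not name the normalization.

```python
import math


def conv2d(frame, kernel, bias):
    """Slide the kernel over an already padded frame (no extra padding here)
    and add the bias to every output value."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(frame) - kh + 1
    out_w = len(frame[0]) - kw + 1
    return [
        [
            bias + sum(frame[r + i][c + j] * kernel[i][j]
                       for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def fully_connected(features, weights, biases):
    """One output score per weight row: dot(features, row) + bias."""
    return [sum(f * w for f, w in zip(features, row)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Assumed normalization: turn scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def infer(padded_frame, kernel, conv_bias, fc_weights, fc_biases):
    """Convolution, then fully connected layer, then probabilities."""
    feature_map = conv2d(padded_frame, kernel, conv_bias)
    flat = [v for row in feature_map for v in row]
    return softmax(fully_connected(flat, fc_weights, fc_biases))
```

In the architecture described here these stages run on an FPGA convolution array; the sketch only mirrors the order of operations.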

[0069] The...

Embodiment 2

[0071] The hardware architecture for inference acceleration of convolutional neural networks is as described in Embodiment 1, with the following differences:

[0072] The preprocessing unit includes an ARM processor and is connected to the hardware acceleration unit through the AXI bus controller. The CPU (ARM) is the FPGA's own CPU and supports the AXI bus structure, through which the FPGA logic exchanges data with the ARM. The hardware acceleration unit includes several RAMs, RAM controllers, cropping modules, address control modules, data allocation control modules, and convolution array modules. All RAMs are double-buffered, which improves efficiency, increases data sharing, reduces redundant data reads, and is intended to maximize support for parallel PE computation.
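The double-buffered RAM in [0072] is commonly organized as a ping-pong buffer: one bank is read by the compute units while the other bank is being loaded, and the two swap roles after each tile. The following is a minimal sequential sketch of that idea, assuming hypothetical names (`DoubleBuffer`, `process_tiles`) and a tile-by-tile schedule that the source does not specify. In hardware the load and the compute overlap in time; here they simply run one after the other.

```python
class DoubleBuffer:
    """Two banks: the front bank is read, the back bank is written."""

    def __init__(self):
        self._banks = [[], []]
        self._front = 0  # index of the bank that is currently readable

    def write(self, data):
        """Load the next tile into the back bank."""
        self._banks[1 - self._front] = list(data)

    def read(self):
        """Return the tile held in the front bank."""
        return self._banks[self._front]

    def swap(self):
        """Exchange the roles of the two banks."""
        self._front = 1 - self._front


def process_tiles(tiles, compute):
    """Run compute() on every tile while prefetching the following tile."""
    results = []
    if not tiles:
        return results
    buf = DoubleBuffer()
    buf.write(tiles[0])  # prime the first bank
    buf.swap()
    for upcoming in list(tiles[1:]) + [None]:
        if upcoming is not None:
            buf.write(upcoming)  # in hardware this overlaps with compute
        results.append(compute(buf.read()))
        buf.swap()
    return results
```

For example, `process_tiles([[1, 2], [3, 4], [5, 6]], sum)` processes the tiles in order while each following tile is already staged in the other bank.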

[0073] The ARM sequentially performs image padding on the original input image frame, converts floating-point data to fixed-poi...

Embodiment 3

[0076] The working method of the hardware architecture described in Embodiment 2 includes:

[0077] (1) The ARM preprocesses the original input image frame. The preprocessing consists of, in sequence, image padding, converting floating-point data to fixed-point data, and configuring the logic registers of the FPGA. Converting floating-point data to fixed-point data means converting it to 8-bit fixed-point data. Configuring the logic registers of the FPGA means sending data such as weights and biases to the logic registers over the AXI bus. Once the configuration is complete, the input image can be used for inference. The connection relationship of the FPGA's logic registers is fixed inside the FPGA, as shown in Figure 1;
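Two of the preprocessing steps in [0077], padding the image and converting floating-point values to 8-bit fixed point, can be illustrated as follows. This is a sketch under stated assumptions: the source only says the data become 8-bit fixed-point values, so the signed format with 4 fractional bits, the saturation behaviour, zero as the padding value, and the names `pad_frame`, `to_fixed8` and `from_fixed8` are all hypothetical choices made for illustration.

```python
def pad_frame(frame, pad, fill=0.0):
    """Surround a 2-D frame with `pad` rows and columns of `fill` on each side."""
    width = len(frame[0]) + 2 * pad
    padded = [[fill] * width for _ in range(pad)]
    for row in frame:
        padded.append([fill] * pad + list(row) + [fill] * pad)
    padded.extend([fill] * width for _ in range(pad))
    return padded


def to_fixed8(value, frac_bits=4):
    """Quantize a float to a signed 8-bit fixed-point integer.

    Assumed format: `frac_bits` fractional bits, saturating at -128 and 127.
    Python's round() rounds exact halves to the nearest even integer.
    """
    scaled = round(value * (1 << frac_bits))
    return max(-128, min(127, scaled))


def from_fixed8(q, frac_bits=4):
    """Recover the approximate float represented by a fixed-point integer."""
    return q / (1 << frac_bits)
```

With 4 fractional bits the step size is 1/16, so `to_fixed8(1.5)` gives 24 and values beyond roughly ±8 saturate. A real design would pick the number of fractional bits from the range of the weights and activations.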

[0078] (2) The AXI bus controller reads the padded image frame to be convolved, the convolution kernel coefficients, and the bias parameters into the RAMs; this includes: the AXI bus controller judges the origin...


Abstract

The invention relates to a hardware architecture for inference acceleration of convolutional neural networks, and a working method thereof. The hardware architecture comprises a preprocessing unit, a hardware acceleration unit and a storage unit. The preprocessing unit preprocesses the original input image frame. The hardware acceleration unit reads the preprocessed image frame to be convolved, the convolution kernel coefficients and the bias (offset) parameters, performs the convolution, executes the fully connected layer calculation after the convolution is finished, and then outputs the feature judgment result. The storage unit stores the original input image frame, the convolution kernel coefficients, the bias parameters, the output data of each convolution and the output data of the fully connected layer. The hardware architecture addresses the low speed and high latency of traditional processors and their inability to perform real-time inference, and provides a new approach to designing processors for convolutional neural network (CNN) inference.

Description

Technical field

[0001] The invention relates to a hardware architecture and a working method for inference acceleration of convolutional neural networks, and belongs to the technical field of hierarchical structure design of integrated circuit processors.

Background technique

[0002] With the rapid development of artificial intelligence technology, the convolutional neural network (CNN) has developed into an advanced computer vision target recognition algorithm, with a wide range of applications in feature extraction, target recognition, face recognition and other fields. However, although the CNN is widely used, running CNN inference on traditional CPU and GPU processors has many limitations: inference requires a large number of calculations, but a CPU has relatively few parts responsible for logic operations (ALU modu...


Application Information

IPC(8): G06N3/04, G06N3/063
CPC: G06N3/063, G06N3/045
Inventor: 朱顺意
Owner: SHANDONG LINGNENG ELECTRONIC TECH CO LTD