A hardware architecture for inference acceleration for convolutional neural network and its working method

A neural-network hardware architecture technology, applied in the field of integrated circuit processor hierarchy design, that addresses problems such as the inability to achieve parallel computing, high power consumption, and idle computing units.

Active Publication Date: 2021-03-02
SHANDONG LINGNENG ELECTRONIC TECH CO LTD

AI Technical Summary

Problems solved by technology

However, although the convolutional neural network (CNN) is widely used, relying on traditional CPU and DPU processors to run CNN inference has many limitations. The inference process requires a very large number of calculations, but in a CPU only a small part of the chip is responsible for arithmetic and logic operations (the ALU modules), and calculation instructions are executed sequentially one by one, so parallel computing cannot be achieved.
Although a GPU can perform parallel computing, it can only process one image at a time, which limits inference speed, and it consumes a lot of power, so it cannot be applied to embedded devices.
FPGA-based inference methods have been proposed in the prior art, but they do not make full use of the logic computing units, and many computing units remain idle during the inference process.



Examples


Embodiment 1

[0066] A hardware architecture for inference acceleration of convolutional neural networks, as shown in Figure 1, comprising a preprocessing unit, a hardware acceleration unit, and a storage unit;

[0067] The preprocessing unit is used to preprocess the input original image frame;

[0068] The hardware acceleration unit is used to read in the padded image frame to be convolved, the convolution kernel coefficients, and the bias parameters, and to perform the convolution. After the convolution is completed, the fully connected layer calculation is performed; after the fully connected layer calculation is completed, the feature judgment result is output. Calculating the feature judgment result means judging the probability that the input picture matches each of the different possible results; that is, the unit performs inference on the input picture and outputs, for each possible result, the probability that the picture matches it.
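The data flow of this unit can be pictured with a minimal NumPy sketch (convolution, then fully connected layer, then per-class probabilities). All shapes, layer sizes, and helper names below are illustrative assumptions, not the architecture's actual configuration.

```python
# Minimal sketch of the inference pipeline described above:
# convolution -> fully connected layer -> per-class probabilities.
import numpy as np

def conv2d(image, kernel, bias):
    """Naive valid convolution of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow), dtype=np.float32)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return np.maximum(out, 0.0)            # ReLU after the convolution

def fully_connected(features, weights, bias):
    """Fully connected layer on the flattened convolution output."""
    return weights @ features.ravel() + bias

def class_probabilities(logits):
    """Softmax: probability that the input matches each possible result."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy example: 8x8 padded image, 3x3 kernel, 4 output classes (all assumed sizes).
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8)).astype(np.float32)
kernel = rng.standard_normal((3, 3)).astype(np.float32)
fc_w = rng.standard_normal((4, 36)).astype(np.float32)   # 6x6 conv output -> 36 features
fc_b = np.zeros(4, dtype=np.float32)

feat = conv2d(image, kernel, 0.1)
print(class_probabilities(fully_connected(feat, fc_w, fc_b)))
```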

[0069] The...

Embodiment 2

[0071] Based on the hardware architecture for inference acceleration of convolutional neural networks described in Embodiment 1, the difference is that:

[0072] The preprocessing unit includes an ARM core, and the preprocessing unit is connected to the hardware acceleration unit through the AXI bus controller; the CPU (ARM) is the FPGA's built-in processor, supports the AXI bus structure, and the FPGA logic exchanges data with the ARM through the AXI bus. The hardware acceleration unit includes several RAMs, RAM controllers, cropping modules, address control modules, data allocation control modules, and convolution array modules. Each RAM is organized as a double buffer to improve efficiency: double buffering increases data sharing, reduces redundant data reads, and is designed to maximize support for parallel PE (processing element) computation.
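The double-buffer (ping-pong) idea can be illustrated with a short software analogy: while the convolution array consumes one buffer, the controller fills the other, then the two swap roles. The buffer size, tile count, and the load/compute helpers below are hypothetical stand-ins, not the actual RAM controller interface.

```python
# Ping-pong buffering sketch: computation reads the "active" buffer while the
# "shadow" buffer is being refilled, so the compute array never waits for a full reload.
import numpy as np

BUF_WORDS = 1024                    # assumed buffer depth
buffers = [np.zeros(BUF_WORDS, dtype=np.int8), np.zeros(BUF_WORDS, dtype=np.int8)]

def load_tile(buf, tile_index):
    """Stand-in for the AXI/RAM controller writing the next data tile."""
    buf[:] = tile_index             # placeholder payload

def compute_tile(buf):
    """Stand-in for the convolution array consuming the current tile."""
    return int(buf.sum())

results = []
active = 0
load_tile(buffers[active], 0)       # prime the first buffer
for tile in range(1, 8):
    shadow = 1 - active
    load_tile(buffers[shadow], tile)          # fill the idle buffer (in hardware: concurrently)
    results.append(compute_tile(buffers[active]))
    active = shadow                           # swap roles for the next tile
results.append(compute_tile(buffers[active]))  # drain the last tile
print(results)                      # [0, 1024, 2048, ..., 7168]
```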

[0073] ARM sequentially performs image padding on the input original image frame, converts floating-point data to fixed-poi...

Embodiment 3

[0076] The working method of the hardware architecture described in Embodiment 2 includes:

[0077] (1) ARM preprocesses the input original image frame. The preprocessing consists of sequentially performing image padding, converting floating-point data to fixed-point data, and configuring the logic registers of the FPGA. Converting floating-point data to fixed-point data means converting the floating-point data to 8-bit fixed-point data. Configuring the logic registers of the FPGA means sending data such as weights and biases to the logic registers over the AXI bus; after the configuration is completed, the input image can be used for inference. The connection relationship of the logic registers inside the FPGA is fixed, as shown in Figure 1;
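The floating-point-to-fixed-point step can be sketched in a few lines. The Q-format used here (a scale factor of 2^6, i.e. 6 fractional bits) is an illustrative assumption; the text only specifies that the data are converted to 8-bit fixed point.

```python
# Hedged sketch of "floating point -> 8-bit fixed point" by scale, round, saturate.
import numpy as np

FRACTIONAL_BITS = 6                 # assumed number of fractional bits
SCALE = 1 << FRACTIONAL_BITS

def float_to_fixed8(x: np.ndarray) -> np.ndarray:
    """Quantize float data to signed 8-bit fixed point."""
    q = np.round(x * SCALE)
    return np.clip(q, -128, 127).astype(np.int8)

def fixed8_to_float(q: np.ndarray) -> np.ndarray:
    """Inverse mapping, useful for checking the quantization error."""
    return q.astype(np.float32) / SCALE

pixels = np.array([0.0, 0.5, -0.25, 1.9, -2.2], dtype=np.float32)
q = float_to_fixed8(pixels)
print(q)                            # int8 values, saturated to the representable range
print(fixed8_to_float(q))           # reconstruction error <= 1/128 where not saturated
```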

[0078] (2) The AXI bus controller reads the padded image frame to be convolved, the convolution kernel coefficients, and the bias parameters into several RAMs; including: the AXI bus controller judges the origin...
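Since the exact address mapping is cut off above, the following is only a hedged illustration of the idea behind step (2): the padded frame (kernel coefficients and bias values are omitted for brevity) is split across several RAM banks so that parallel PEs can read it concurrently. The bank count and the row-band interleaving scheme are assumptions, not the patent's specified mapping.

```python
# Illustrative banking of a padded frame across several RAM banks.
import numpy as np

NUM_RAMS = 4                                   # assumed number of RAM banks

def pad_frame(frame, k):
    """Zero-pad so that a k x k convolution keeps the original spatial size."""
    p = k // 2
    return np.pad(frame, p, mode="constant")

def distribute_rows(frame):
    """Assign image rows to RAM banks round-robin (bank i gets rows i, i+N, ...)."""
    return [frame[i::NUM_RAMS] for i in range(NUM_RAMS)]

frame = np.arange(36, dtype=np.int8).reshape(6, 6)   # toy 6x6 quantized frame
padded = pad_frame(frame, 3)                         # 8x8 after padding for a 3x3 kernel
banks = distribute_rows(padded)
print([b.shape for b in banks])                      # each bank holds 2 of the 8 rows
```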



Abstract

The present invention relates to a hardware architecture and working method for inference acceleration of a convolutional neural network, including a preprocessing unit, a hardware acceleration unit, and a storage unit. The preprocessing unit is used to preprocess the input original image frame; the hardware acceleration unit is used to read in the preprocessed image frame to be convolved, the convolution kernel coefficients, and the bias parameters, to perform the convolution, then the fully connected layer calculation, and, after the fully connected layer calculation is completed, to output the feature judgment result; the storage unit is used to store the input original image frame, the convolution kernel coefficients, the bias parameters, the output data of each convolution, and the output data of the fully connected layer. The invention solves problems of traditional processors such as slow speed, large latency, and the inability to realize real-time inference, and provides a new solution for the design of processors for CNN inference computation.

Description

Technical field

[0001] The invention relates to a hardware architecture and a working method for inference acceleration of a convolutional neural network, and belongs to the technical field of hierarchical structure design of integrated circuit processors.

Background technique

[0002] With the rapid development of artificial intelligence technology, the CNN convolutional neural network has developed into an advanced computer vision target recognition algorithm, with a wide range of applications in feature extraction, target recognition, face recognition and other fields. However, although the convolutional neural network (CNN) is widely used, relying on traditional CPU and DPU processors to run CNN inference has many limitations: the inference process requires a very large number of calculations, but in a CPU only a small part of the chip is responsible for logic operations (ALU modu...


Application Information

Patent Type & Authority: Patent (China)
IPC (8): G06N3/04, G06N3/063
CPC: G06N3/063, G06N3/045
Inventor: 朱顺意
Owner: SHANDONG LINGNENG ELECTRONIC TECH CO LTD