Hardware architecture for inference acceleration of convolutional neural networks, and working method thereof
A neural network and hardware architecture technology, applied in the field of integrated circuit processor architecture design, addressing the problems of idle computing units, high power consumption, and inapplicability to embedded devices
Examples
Embodiment 1
[0066] A hardware architecture for inference acceleration of convolutional neural networks, as shown in figure 1, comprising a preprocessing unit, a hardware acceleration unit, and a storage unit;
[0067] The preprocessing unit is used to preprocess the input original image frame;
[0068] The hardware acceleration unit is used to read in the padded image frame to be convolved, the convolution kernel coefficients, and the bias parameters, and to perform the convolution. After the convolution is completed, the fully connected layer is computed; after the fully connected layer is completed, the feature judgment result is output. Computing the feature judgment result means judging the probability that the input picture matches each of the different possible results; that is, the unit performs inference on the input picture and outputs, for each possible result, the probability that the picture matches it.
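To illustrate this final stage, below is a minimal C sketch of a fully connected layer followed by softmax normalization, which turns the convolution output into one probability per possible result. The dimensions (NUM_FEATURES, NUM_CLASSES) and all names are hypothetical illustrations, not taken from the patent.

```c
#include <math.h>
#include <stdio.h>

#define NUM_FEATURES 64   /* hypothetical size of the flattened conv output */
#define NUM_CLASSES  10   /* hypothetical number of possible results */

/* Fully connected layer followed by softmax: turns the convolution
 * output into a probability for each possible result, as described
 * for the feature judgment step. */
static void fc_and_softmax(const float feat[NUM_FEATURES],
                           const float weight[NUM_CLASSES][NUM_FEATURES],
                           const float bias[NUM_CLASSES],
                           float prob[NUM_CLASSES])
{
    float sum = 0.0f;
    for (int c = 0; c < NUM_CLASSES; c++) {
        float acc = bias[c];
        for (int i = 0; i < NUM_FEATURES; i++)
            acc += weight[c][i] * feat[i];
        prob[c] = expf(acc);
        sum += prob[c];
    }
    for (int c = 0; c < NUM_CLASSES; c++)
        prob[c] /= sum;   /* normalize so the probabilities sum to 1 */
}

int main(void)
{
    static float feat[NUM_FEATURES];                /* toy inputs, all zero */
    static float weight[NUM_CLASSES][NUM_FEATURES]; /* toy weights, all zero */
    static float bias[NUM_CLASSES];
    float prob[NUM_CLASSES];
    fc_and_softmax(feat, weight, bias, prob);
    printf("P(class 0) = %f\n", prob[0]);  /* uniform 0.1 with zero weights */
    return 0;
}
```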
[0069] The...
Embodiment 2
[0071] A hardware architecture for inference acceleration of convolutional neural networks according to Embodiment 1, the difference being that:
[0072] The preprocessing unit includes an ARM core, and the preprocessing unit is connected to the hardware acceleration unit through the AXI bus controller; the CPU (ARM) is the FPGA's own embedded CPU and supports the AXI bus structure, and the FPGA logic exchanges data with the ARM over the AXI bus. The hardware acceleration unit includes several RAMs, RAM controllers, cropping modules, address control modules, data allocation control modules, and a convolution array module. All RAMs are organized as double buffers to improve efficiency: double buffering increases data sharing, reduces redundant data reads, and maximizes support for parallel PE computation, as sketched below.
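The following C sketch shows the ping-pong (double-buffer) pattern the RAMs implement: the PE array computes on one bank while the controller refills the other, so compute and data movement overlap. Bank sizes, function names, and the sequential stand-ins for concurrent hardware are assumptions for illustration only.

```c
#include <stddef.h>

#define BANK_WORDS 1024   /* hypothetical RAM bank depth */

/* Two RAM banks used as a ping-pong pair. */
static int ram[2][BANK_WORDS];

/* Placeholder stand-ins for the real RAM controller and PE array. */
static void fill_bank(int *bank, size_t n)
{
    for (size_t i = 0; i < n; i++)
        bank[i] = (int)i;            /* stage the next tile of data */
}
static void pe_compute(const int *bank, size_t n)
{
    (void)bank; (void)n;             /* convolution work happens here */
}

void process_tiles(int num_tiles)
{
    int active = 0;                  /* bank currently being computed on */
    fill_bank(ram[active], BANK_WORDS);
    for (int t = 0; t < num_tiles; t++) {
        int next = active ^ 1;
        /* In hardware these two steps run concurrently; shown
         * sequentially here for clarity. */
        fill_bank(ram[next], BANK_WORDS);     /* prefetch the next tile */
        pe_compute(ram[active], BANK_WORDS);  /* compute on current tile */
        active = next;               /* swap bank roles for the next tile */
    }
}
```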
[0073] ARM sequentially performs image padding on the input original image frame, converts floating-point data to fixed-poi...
Embodiment 3
[0076] The working method of the hardware architecture described in Embodiment 2 includes:
[0077] (1) ARM preprocesses the input original image frame. The preprocessing consists of sequentially performing image padding, converting floating-point data to fixed-point data, and configuring the logic registers of the FPGA. Converting floating-point data to fixed-point data means converting the floating-point data to 8-bit fixed-point data. Configuring the logic registers of the FPGA means sending data such as weights and biases to the logic registers over the AXI bus; once configuration is complete, inference can be run on the input image. The connection relationship of the FPGA's logic registers is fixed inside the FPGA, as shown in figure 1;
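A minimal C sketch of the float-to-8-bit conversion step is given below. The patent specifies only 8-bit fixed point, so the number of fractional bits (a Q4.3-style format here) and the saturating rounding are assumptions.

```c
#include <stdint.h>
#include <math.h>

/* Assumed Q-format: 1 sign bit, 4 integer bits, 3 fractional bits. */
#define FRAC_BITS 3

/* Convert one float value to signed 8-bit fixed point by scaling,
 * rounding to nearest, and saturating to the int8 range. */
static int8_t float_to_q8(float x)
{
    float scaled = x * (float)(1 << FRAC_BITS);
    long  r = lroundf(scaled);
    if (r >  127) r =  127;   /* saturate on overflow */
    if (r < -128) r = -128;
    return (int8_t)r;
}
```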
[0078] (2) The AXI bus controller reads the padded image frame to be convolved, the convolution kernel coefficients, and the bias parameters into several RAMs; including: the AXI bus controller judges the origin...
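For reference, the C sketch below shows how the three kinds of data staged into RAM (padded frame, kernel coefficients, bias) combine in a direct 2-D convolution: each output pixel is the weighted sum of a KxK window plus the bias. The sizes and the identity-kernel test are hypothetical, for illustration only.

```c
#include <stdio.h>

#define IMG 6                /* hypothetical padded frame size */
#define K   3                /* hypothetical kernel size */
#define OUT (IMG - K + 1)    /* output size for a valid convolution */

/* Direct 2-D convolution: weighted sum of each KxK window plus bias. */
static void conv2d(const int img[IMG][IMG],
                   const int kernel[K][K],
                   int bias,
                   int out[OUT][OUT])
{
    for (int r = 0; r < OUT; r++)
        for (int c = 0; c < OUT; c++) {
            int acc = bias;
            for (int i = 0; i < K; i++)
                for (int j = 0; j < K; j++)
                    acc += kernel[i][j] * img[r + i][c + j];
            out[r][c] = acc;
        }
}

int main(void)
{
    static int img[IMG][IMG];  /* toy padded frame, all zero */
    int kernel[K][K] = {{0,0,0},{0,1,0},{0,0,0}};  /* identity kernel */
    int out[OUT][OUT];
    conv2d(img, kernel, 5, out);            /* bias of 5 */
    printf("out[0][0] = %d\n", out[0][0]);  /* prints 5 for a zero frame */
    return 0;
}
```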