Programmable Deep Neural Network Processor

Applied in the field of programmable deep neural network processors, this deep neural network processor technology addresses problems such as high power consumption and frequent chip data transfers, and achieves low power consumption, low cost, and improved hardware utilization.

Active Publication Date: 2020-09-04
周军

AI Technical Summary

Problems solved by technology

Repeatedly loading the same filter causes frequent on-chip or off-chip data transfers, which results in high power consumption.
[0009] Fifth: In deep convolutional neural networks, multiply-accumulate operations account for most of the power consumption.


Examples

Embodiment 1

[0058] Embodiment 1: see Figures 1 to 7. As Figure 1 shows, in the prior art the points in the output feature map are calculated row by row, so it is necessary to wait until multiple rows of the output feature map are complete. This makes pipelining difficult, and it also requires a first-in-first-out (FIFO) memory to store all the points in a row, which increases hardware overhead.

[0059] As Figure 2 shows, unlike the approach in Figure 1, the present invention uses a cluster-based convolution operation, which computes the points in the output feature map in units of clusters instead of rows.
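The difference between the two traversal orders can be illustrated with a minimal sketch. The function names and the 2*2 cluster shape below are assumptions made for illustration; this excerpt does not state the cluster dimensions used by the invention.

```python
def row_order(height, width):
    """Yield output coordinates one full row at a time (row-based order)."""
    for r in range(height):
        for c in range(width):
            yield (r, c)


def cluster_order(height, width, cluster_h=2, cluster_w=2):
    """Yield output coordinates one cluster at a time.

    cluster_h and cluster_w are assumed values, not taken from the patent.
    """
    for r0 in range(0, height, cluster_h):
        for c0 in range(0, width, cluster_w):
            for r in range(r0, min(r0 + cluster_h, height)):
                for c in range(c0, min(c0 + cluster_w, width)):
                    yield (r, c)


rows = list(row_order(4, 4))
clusters = list(cluster_order(4, 4))
print(rows[:4])      # first four points all lie in row 0
print(clusters[:4])  # first four points form one 2*2 block spanning rows 0 and 1
```

Both orders visit the same set of points. In the cluster order, a small block that spans several rows is complete after a few points, so under these assumptions a consumer of that block would not need to wait for, or buffer, entire rows.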

[0060] As Figure 3, Figure 4, and Figure 5 show, in the prior art the output results of different input feature maps must be added together after convolution (as in Figure 3). This is usually done by computing the points at the same location from the different input feature maps and adding them together (e.g. Figure 4),...
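The point-wise addition described for the prior art can be sketched as follows. This is only an illustration of summing same-location points across per-input-map convolution results; the function name and the sample values are hypothetical, and the sketch does not represent the invention's own accumulation scheme, which is elided in this excerpt.

```python
def accumulate_feature_maps(partial_outputs):
    """Add same-location points across per-input-map convolution results.

    partial_outputs is a list of equally sized 2D lists, one per input
    feature map (an assumed representation for this sketch).
    """
    height = len(partial_outputs[0])
    width = len(partial_outputs[0][0])
    total = [[0] * width for _ in range(height)]
    for partial in partial_outputs:
        for r in range(height):
            for c in range(width):
                total[r][c] += partial[r][c]
    return total


# Hypothetical convolution results from three input feature maps.
partials = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],
]
result = accumulate_feature_maps(partials)
print(result)  # [[111, 222], [333, 444]]
```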

Embodiment 2

[0081] Embodiment 2: see Figure 8, a system block diagram of a specific embodiment. The DDR3, JTAG, DDR controller, selector, arbiter, feature map buffer, and filter buffer constitute the storage part of the programmable deep neural network processor. The data comes from three sources: data loaded through the JTAG port, namely user instructions and other upper-level instructions; data such as weights and feature maps; and intermediate data processed by the present invention, which needs to be temporarily stored in DDR3.

[0082] DDR3 is therefore used to store data. JTAG is used to write all data into DDR3, and when the program control unit is working, data is read from DDR3 onto the chip. The DDR controller controls whether DDR3 is read or written. After passing through the read/write control of the DDR controller, the data enters the arbiter through the selector, where the selector is used...

Embodiment 3

[0085] Embodiment 3: see Figure 3 and Figure 4. Assume one input feature map and one output feature map.

[0086] The input feature map has Xin*Xin = 256*256 pixels, the corresponding weight data has 11*11 pixels, and the convolution stride S is 4;

[0087] The processing method is as follows:

[0088] (1) The program control unit obtains the user instruction and parses it to obtain the parameters of the convolutional neural network. The parameters are: input feature map size Xin*Xin = 256*256 pixels, weight data size Y*Y = 11*11 pixels, convolution stride S = 4, one input feature map, and one output feature map;

[0089] Then, the program control unit reads a feature map from the feature map buffer as the input feature map and obtains the corresponding weight data from the filter buffer according to the input feature map, wherein the pixel of the input image is Xin*Xin , the pixel ...
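As a rough worked example for the Embodiment 3 parameters (Xin = 256, Y = 11, S = 4), the sketch below computes the output feature map width and the number of multiply-accumulate operations for one input and one output feature map. It assumes a convolution without padding, which this excerpt does not state, and the function names are hypothetical.

```python
def conv_output_size(x_in, y, stride):
    """Output width of a convolution with no padding (an assumption here)."""
    return (x_in - y) // stride + 1


def mac_count(x_in, y, stride):
    """Multiply-accumulate operations for one input map and one output map."""
    x_out = conv_output_size(x_in, y, stride)
    # Each output point needs one multiply-accumulate per weight in the Y*Y filter.
    return x_out * x_out * y * y


x_out = conv_output_size(256, 11, 4)
macs = mac_count(256, 11, 4)
print(x_out, macs)  # 62 465124
```

Under the no-padding assumption, the last filter window starts at column 244 and ends at column 254, so the final input column does not contribute to any output point.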

Abstract

The invention discloses a programmable deep neural network processor that includes a program control unit, a filter buffer area, and a feature map buffer area. The feature map buffer area caches multiple feature maps, and the filter buffer area caches the weight data corresponding to the feature maps. The processor also includes a layer processing engine whose convolution unit part comprises a multiply-accumulate unit, a convolution accumulation unit, and a feature map accumulation unit arranged in sequence. The feature map buffer area and the filter buffer area are connected to the input of the layer processing engine, and a data shaping and multiplexing unit is arranged between the feature map buffer area and the input of the layer processing engine. Through multiplexing control of the multiply-accumulate unit, feature map data read control, feature map accumulation control, and redundant data elimination control, the invention realizes a low-power, low-cost programmable deep neural network processor.

Description

Technical field

[0001] The invention relates to a deep neural network processor, in particular to a programmable deep neural network processor.

Background

[0002] Today, artificial intelligence based on deep neural networks has been shown to assist or even replace humans in many applications, such as autonomous driving, image recognition, medical diagnosis, gaming, financial data analysis, and search engines. This has made artificial intelligence algorithms a research hotspot. However, these algorithms lack matching hardware support, especially core chips. Traditional CPUs and GPUs were not developed specifically for artificial intelligence algorithms and have major problems in terms of performance, power consumption, and hardware overhead. In recent years, some dedicated artificial intelligence processors have appeared, mainly based on FPGA (Field Programmable Gate Array) or ASIC (Application Specific Integrated Circuit) platforms, such ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06N3/04, G06N3/063
CPC: G06N3/063, G06N3/045
Inventor: 周军, 王波
Owner: 周军