Programmable deep neural network processor

A programmable deep neural network processor technology that addresses problems such as high power consumption and frequent on-chip and off-chip data transfer, achieving low power consumption and low cost.

Active Publication Date: 2018-09-11
周军

AI Technical Summary

Problems solved by technology

Repeated loading of the same filter will lead to frequent data transmission on-chip or off-chip, resulting in large power consum

Examples

Embodiment 1

[0058] Embodiment 1: see Figure 1 to Figure 7. In the prior art, as Figure 1 shows, the points in the output feature map are calculated row by row, so it is necessary to wait for multiple rows of the output feature map to complete. This makes pipelining difficult, and it also requires a first-in-first-out (FIFO) memory to store all points in a row, which increases hardware overhead.

[0059] As Figure 2 shows, unlike the approach in Figure 1, the present invention proposes a cluster-based convolution operation, which computes the points in the output feature map in units of clusters instead of rows.
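The difference between the two visiting orders can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2x2 cluster size, the 6x6 test image, the 3x3 kernel, and every function name are hypothetical and are not taken from the patent, which does not give the cluster dimensions in this excerpt. The sketch shows that both orders produce the same output feature map and differ only in which points become available first.

```python
# Illustrative sketch only: the 2x2 cluster size and all names here are
# assumptions, not values taken from the patent.

def conv_point(image, kernel, row, col, stride):
    """Compute one output point of a no-padding 2-D convolution
    (CNN convention: the kernel is not flipped)."""
    k = len(kernel)
    return sum(
        image[row * stride + i][col * stride + j] * kernel[i][j]
        for i in range(k)
        for j in range(k)
    )


def row_order(out_size):
    """Row-by-row visiting order of output points (the prior-art scheme)."""
    return [(r, c) for r in range(out_size) for c in range(out_size)]


def cluster_order(out_size, cluster=2):
    """Visit output points one cluster (cluster x cluster tile) at a time."""
    order = []
    for r0 in range(0, out_size, cluster):
        for c0 in range(0, out_size, cluster):
            for r in range(r0, min(r0 + cluster, out_size)):
                for c in range(c0, min(c0 + cluster, out_size)):
                    order.append((r, c))
    return order


def convolve(image, kernel, stride, order):
    """Fill the output feature map following the given visiting order."""
    out_size = (len(image) - len(kernel)) // stride + 1
    out = [[0] * out_size for _ in range(out_size)]
    for r, c in order(out_size):
        out[r][c] = conv_point(image, kernel, r, c, stride)
    return out


image = [[r * 6 + c for c in range(6)] for r in range(6)]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

by_rows = convolve(image, kernel, 1, row_order)
by_clusters = convolve(image, kernel, 1, cluster_order)
first_cluster = cluster_order(4)[:4]
```

In this sketch the first four points visited in cluster order already form a complete 2x2 tile spanning two output rows, whereas the row order finishes an entire row before it touches the next one.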

[0060] As Figure 3, Figure 4 and Figure 5 show, in the prior art, after convolution, the output results of different input feature maps need to be added together (as in Figure 3). This is usually done by computing the points at the same location from different input feature maps and adding them together (e.g., Figure 4),...
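The prior-art accumulation described above can be written out as a minimal Python sketch: each input feature map is convolved with its own weight data, and the results at the same output location are summed. The function names and the small test values are hypothetical, chosen only to make the arithmetic easy to check by hand; this is not the patent's hardware scheme.

```python
# Illustrative sketch of summing per-input convolution results.
# All names and test values are assumptions for demonstration.

def conv2d(image, kernel, stride=1):
    """No-padding 2-D convolution (CNN convention, kernel not flipped)."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    return [
        [
            sum(
                image[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(out_size)
        ]
        for r in range(out_size)
    ]


def accumulate_feature_maps(inputs, kernels, stride=1):
    """Convolve each input map with its kernel, then add the results
    point by point to form one output feature map."""
    partials = [conv2d(img, ker, stride) for img, ker in zip(inputs, kernels)]
    size = len(partials[0])
    return [
        [sum(p[r][c] for p in partials) for c in range(size)]
        for r in range(size)
    ]


inputs = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
kernels = [
    [[1, 0], [0, 1]],
    [[2, 2], [2, 2]],
]
output = accumulate_feature_maps(inputs, kernels)
print(output)  # [[14, 16], [20, 22]]
```

The first input contributes [[6, 8], [12, 14]] and the second contributes 8 at every point, which gives the printed sum.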

Embodiment 2

[0081] Embodiment 2: see Figure 8, a system block diagram of a specific embodiment. DDR3, JTAG, the DDR controller, the selector, the arbiter, the feature map buffer and the filter buffer constitute the storage part of the programmable deep neural network processor. The data comes from three sources: one part is loaded through the JTAG port, namely user instructions and other upper-level instructions; one part is data such as weights and feature maps; and one part is intermediate data produced by the present invention during processing, which needs to be temporarily stored in DDR3.

[0082] DDR3 is therefore used to store data. When the program control unit is working, data is read from DDR3 onto the chip. JTAG is used to write all data into DDR3, and the DDR controller controls whether DDR3 is read or written. After passing through the read/write control of the DDR controller, the data enters the arbiter through the selector, where the selector is used...

Embodiment 3

[0085] Embodiment 3: see Figure 3 and Figure 4, assuming one input feature map and one output feature map.

[0086] The input feature map has Xin*Xin = 256*256 pixels, the corresponding weight data has 11*11 pixels, and the convolution stride S is 4.

[0087] The processing method is as follows:

[0088] (1) The program control unit obtains the user instruction, parses it, and obtains the parameters of the convolutional neural network. The parameters are: an input feature map of Xin*Xin = 256*256 pixels, corresponding weight data of Y*Y = 11*11 pixels, a convolution stride S of 4, one input feature map, and one output feature map;
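As a rough check on these parameters, the following Python sketch collects them in a small record and derives the output feature map size. The class and method names are hypothetical, and the size formula assumes no padding and rounding down, which the excerpt does not state; the patent's own output dimensions may differ if padding is used.

```python
# Illustrative sketch: parameter names mirror the text (Xin, Y, S), but the
# no-padding, round-down size formula is an assumption, not from the patent.
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvParams:
    x_in: int = 256        # input feature map is Xin*Xin pixels
    y: int = 11            # weight data is Y*Y pixels
    stride: int = 4        # convolution stride S
    num_inputs: int = 1    # number of input feature maps
    num_outputs: int = 1   # number of output feature maps

    def output_size(self) -> int:
        """Output width/height assuming no padding and floor division."""
        return (self.x_in - self.y) // self.stride + 1

    def last_window_start(self) -> int:
        """Row/column index where the final window begins."""
        return (self.output_size() - 1) * self.stride


params = ConvParams()
print(params.output_size())        # 62
print(params.last_window_start())  # 244
```

Under these assumptions the final window covers indices 244 to 254, so the last row and column of the 256*256 input (index 255) are not read.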

[0089] Then, the program control unit reads a feature map from the feature map buffer as the input feature map, and obtains the corresponding weight data from the filter buffer according to the input feature map, wherein the input image has Xin*Xin pixels, the pixel ...

Abstract

The invention discloses a programmable deep neural network processor comprising a program control unit, a filter buffer and a feature map buffer. The feature map buffer caches a plurality of feature maps, and the filter buffer caches the weight data matched with those feature maps. The processor further comprises a layer processing engine whose convolution unit comprises a multiply-accumulate unit, a convolution accumulation unit and a feature map accumulation unit, arranged in that order. The feature map buffer and the filter buffer are connected to the input of the layer processing engine, and a data shaping and multiplexing unit is arranged between the feature map buffer and the input of the layer processing engine. The invention performs multiplexing control of the multiply-accumulate unit, feature map data read control and feature map accumulation control, and achieves redundant data removal control, thereby realizing a programmable deep neural network processor with low power consumption and low cost.

Description

Technical field

[0001] The invention relates to a deep neural network processor, in particular to a programmable deep neural network processor.

Background art

[0002] Today, artificial intelligence based on deep neural networks has been shown to assist or even replace humans in many applications, such as autonomous driving, image recognition, medical diagnosis, gaming, financial data analysis, and search engines. This has made artificial intelligence algorithms a research hotspot. However, these algorithms lack matching hardware support (especially the core chip). Traditional CPUs and GPUs were not developed specifically for artificial intelligence algorithms, and have major problems in terms of performance, power consumption, and hardware overhead. In recent years, some dedicated artificial intelligence processors have appeared, mainly based on FPGA (Field Programmable Gate Array) or ASIC (Application Specific Integrated Circuit) platforms, such ...

Application Information

IPC(8): G06N3/04, G06N3/063
CPC: G06N3/063, G06N3/045
Inventor: 周军, 王波
Owner: 周军