Looking for breakthrough ideas for innovation challenges? Try Patsnap Eureka!

Data parallelism-based deep learning processor architecture and method

A processor architecture and deep learning technology, applied in the field of deep learning processor architecture, can solve problems such as energy loss, inability to optimize energy consumption, consume power consumption and data bandwidth, etc., to achieve improved utilization, optimized storage structure, The effect of reducing system latency

Active Publication Date: 2018-07-27
SHANDONG LINGNENG ELECTRONIC TECH CO LTD
View PDF6 Cites 6 Cited by
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

Today's mainstream neural networks need to perform a large amount of data calculation and transmission during operation. The two-way data transmission between on-chip and off-chip causes a lot of energy loss, and the access of intermediate data and output data consumes a lot of power consumption and data bandwidth. , can not achieve the optimization of energy consumption

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
View more

Image

Smart Image Click on the blue labels to locate them in the text.
Viewing Examples
Smart Image
  • Data parallelism-based deep learning processor architecture and method
  • Data parallelism-based deep learning processor architecture and method
  • Data parallelism-based deep learning processor architecture and method

Examples

Experimental program
Comparison scheme
Effect test

Embodiment 1

[0037] A deep learning processor architecture based on data parallelism, such as figure 1 As shown, it includes an input buffer area (In_Buf), a PE array, several on-chip buffer areas (SRam), and an output buffer area (Out_Buf). A group of on-chip buffer areas is arranged between two adjacent PE arrays;

[0038] Since the reading and writing of data from off-chip to on-chip consumes a lot of energy, the off-chip data is written into the input buffer area for temporary storage and pre-read, so that when the PE array reads the data in the input buffer area, the The input buffer area reads data from off-chip synchronously; it increases the continuity of data read and write.

[0039] The PE array is used to read the input buffer data and perform data convolution and pooling calculations;

[0040] The on-chip buffer area is used to store temporary data processed by the PE array; it does not return to off-chip storage to reduce power consumption.

[0041] The output buffer area i...

Embodiment 2

[0044] According to the data parallel-based deep learning processor architecture described in Embodiment 1, the difference is that the PE array includes N*N PE units. Such as figure 2 shown.

[0045] The data is written into the PE unit by the input buffer for processing. Since the amount of data read each time cannot be determined, the number of PE units required needs to be determined according to the amount of data read each time.

Embodiment 3

[0047] The processor framework described in embodiment 1 or 2 is based on the deep learning method of data parallelism, such as image 3 As shown, it is set that m frames of data need to be processed, m frames of data are stored in the input buffer area, and the PE array is set to include: {PE_A_1 matrix, PE_A_2 matrix...PE_A_k-2 matrix, PE_A_k-1 matrix, PE_A_k matrix}, PE_A_i matrix has N columns, 1≤i≤k, PE_A_1 matrix, PE_A_2 matrix...PE_A_k-2 matrix, PE_A_k-1 matrix, PE_A_k matrix The sum of rows is N; for example, divide the 5*5 matrix into rows 2*5 matrix (including the first row and the second row), 2*5 matrix (including the third row and the fourth row), 1*5 matrix (including the fifth row); including:

[0048] (1) In the first calculation cycle, the first frame of data is read from the input buffer into the PE_A_1 array for the first layer of convolution calculation, and the calculated feature sequence is stored in the SRAM of the PE_A_1 array;

[0049] (2) In the seco...

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

PUM

No PUM Login to View More

Abstract

The invention relates to a data parallelism-based deep learning processor architecture and method. The architecture comprises an input buffer region (In_Buf), PE arrays, multiple on-chip buffer regions and an output buffer region (Out_Buf); and a group of the on-chip buffer regions is arranged between the two adjacent N*N PE arrays. By configuring the N*N PE arrays, on-chip transmission of data can be realized and two-way transmission of the data and off-chip is reduced to the maximum extent; energy consumption of on-chip and off-chip transmission of conventional neural network data is reduced; and a new solution is provided for reducing the energy consumption problem of a neural network.

Description

technical field [0001] The invention relates to a deep learning processor architecture and method based on data parallelism, and belongs to the technical field of integrated circuit processor architecture design. Background technique [0002] The concept of deep learning originated from the study of artificial neural networks. The concept of deep learning was proposed by Hinton et al. in 2006. Based on the deep belief network (DBN), a non-supervised greedy layer-by-layer training algorithm is proposed, which brings hope to solve the optimization problems related to the deep structure, and then a multi-layer autoencoder deep structure is proposed. In addition, the convolutional neural network proposed by Lecun et al. is the first real multi-layer structure learning algorithm, which uses the spatial relative relationship to reduce the number of parameters to improve training performance. [0003] Deep learning is a method based on representation learning of data in machine l...

Claims

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

Application Information

Patent Timeline
no application Login to View More
IPC IPC(8): G06F15/78G06F9/50
CPCG06F9/5016G06F15/781Y02D10/00
Inventor 朱顺意
Owner SHANDONG LINGNENG ELECTRONIC TECH CO LTD
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Patsnap Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Patsnap Eureka Blog
Learn More
PatSnap group products