Data parallelism-based deep learning processor architecture and method

A processor architecture and deep learning technology, applied in the field of deep learning processor architectures. It addresses the energy loss, power consumption, and data bandwidth cost of data transfer between on-chip and off-chip storage, which otherwise prevents energy consumption from being optimized. The effects are improved utilization, an optimized storage structure, and reduced system latency.

Active Publication Date: 2018-07-27

SHANDONG LINGNENG ELECTRONIC TECH CO LTD



AI Technical Summary

Problems solved by technology

Today's mainstream neural networks perform a large amount of data calculation and transmission during operation. Two-way data transmission between on-chip and off-chip storage causes substantial energy loss, and accessing intermediate data and output data consumes a great deal of power and data bandwidth, so energy consumption cannot be optimized.



Examples


Embodiment 1

[0037] A deep learning processor architecture based on data parallelism, as shown in Figure 1, includes an input buffer area (In_Buf), a PE array, several on-chip buffer areas (SRAM), and an output buffer area (Out_Buf). A group of on-chip buffer areas is arranged between two adjacent PE arrays.

[0038] Because reading and writing data between off-chip and on-chip storage consumes a lot of energy, off-chip data is written into the input buffer area for temporary storage and pre-reading. While the PE array reads data from the input buffer area, the input buffer area reads data from off-chip at the same time, which increases the continuity of data reads and writes.

[0039] The PE array reads the data in the input buffer area and performs convolution and pooling calculations.

[0040] The on-chip buffer area stores temporary data processed by the PE array. This data is not written back to off-chip storage, which reduces power consumption.

[0041] The output buffer area i...
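As a rough illustration of why keeping intermediate data on chip matters, the sketch below counts off-chip transfers per frame for a network with a given number of layers. The function name `off_chip_transfers`, the cost model of one read and one write per layer, and the layer count are assumptions made for illustration; the patent text gives no such figures.

```python
def off_chip_transfers(num_layers: int, on_chip_intermediate: bool) -> int:
    """Count off-chip transfers needed to process one frame.

    Assumed cost model (not from the patent): every layer reads its input
    once and writes its output once, and each transfer counts as one unit.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if on_chip_intermediate:
        # Only the first read (into In_Buf) and the last write (from Out_Buf)
        # cross the chip boundary; intermediate results stay in on-chip SRAM.
        return 2
    # Every layer fetches its input from off-chip and writes its output back.
    return 2 * num_layers


if __name__ == "__main__":
    layers = 4  # hypothetical network depth
    print(off_chip_transfers(layers, on_chip_intermediate=False))  # 8
    print(off_chip_transfers(layers, on_chip_intermediate=True))   # 2
```

Under this simplified model the off-chip traffic no longer grows with network depth once intermediate results are held in the on-chip buffer areas.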

Embodiment 2

[0044] The data parallelism-based deep learning processor architecture described in Embodiment 1, with the difference that the PE array includes N*N PE units, as shown in Figure 2.

[0045] Data is written from the input buffer area into the PE units for processing. Because the amount of data read each time is not known in advance, the number of PE units required is determined according to the amount of data read each time.
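One way to read paragraph [0045] is that only as many PE units are used as the current read requires. A minimal sketch, assuming each PE unit handles a fixed number of data items per read (`items_per_pe`, a hypothetical parameter not given in the source) and that the count is capped at the N*N units available:

```python
def pe_units_needed(num_items: int, items_per_pe: int, n: int) -> int:
    """Return how many PE units to use for one read from the input buffer.

    Assumptions (illustrative only): each PE unit processes `items_per_pe`
    data items per read, and the array holds at most n * n PE units.
    """
    if num_items < 0 or items_per_pe < 1 or n < 1:
        raise ValueError("num_items must be >= 0; items_per_pe and n must be >= 1")
    # Ceiling division using integers only, to avoid float rounding.
    wanted = -(-num_items // items_per_pe)
    return min(wanted, n * n)


if __name__ == "__main__":
    print(pe_units_needed(10, items_per_pe=4, n=5))    # 3
    print(pe_units_needed(1000, items_per_pe=4, n=5))  # 25
```

The cap at `n * n` reflects the fixed array size; how the hardware handles reads larger than the array can absorb is not described in the visible text.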

Embodiment 3

[0047] A data parallelism-based deep learning method using the processor architecture described in Embodiment 1 or 2, as shown in Figure 3. Suppose m frames of data need to be processed and the m frames are stored in the input buffer area. The PE array is divided into the matrices {PE_A_1, PE_A_2, ..., PE_A_k-2, PE_A_k-1, PE_A_k}. Each PE_A_i matrix (1≤i≤k) has N columns, and the row counts of PE_A_1 through PE_A_k sum to N. For example, a 5*5 matrix can be divided by rows into a 2*5 matrix (the first and second rows), a 2*5 matrix (the third and fourth rows), and a 1*5 matrix (the fifth row). The method includes:

[0048] (1) In the first calculation cycle, the first frame of data is read from the input buffer into the PE_A_1 array for the first layer of convolution calculation, and the calculated feature sequence is stored in the SRAM of the PE_A_1 array;

[0049] (2) In the seco...
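The row-wise split of the PE array described in [0047] can be sketched as follows. The helper name `partition_rows` is hypothetical; it only checks that the sub-array row counts sum to N and returns the rows each PE_A_i covers. How later calculation cycles assign frames to the sub-arrays is truncated in the source and is not modeled here.

```python
from typing import List, Tuple


def partition_rows(n: int, row_counts: List[int]) -> List[Tuple[int, int]]:
    """Split an n*n PE array by rows into sub-arrays PE_A_1..PE_A_k.

    Returns 1-based inclusive (first_row, last_row) pairs, one per sub-array.
    Every sub-array keeps all n columns, as described in [0047].
    """
    if any(rows < 1 for rows in row_counts):
        raise ValueError("each sub-array needs at least one row")
    if sum(row_counts) != n:
        raise ValueError("row counts must sum to n")
    ranges = []
    first = 1
    for rows in row_counts:
        ranges.append((first, first + rows - 1))
        first += rows
    return ranges


if __name__ == "__main__":
    # The 5*5 example from [0047]: 2 rows, 2 rows, 1 row.
    print(partition_rows(5, [2, 2, 1]))  # [(1, 2), (3, 4), (5, 5)]
```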


Abstract

The invention relates to a data parallelism-based deep learning processor architecture and method. The architecture comprises an input buffer region (In_Buf), PE arrays, multiple on-chip buffer regions, and an output buffer region (Out_Buf); a group of on-chip buffer regions is arranged between every two adjacent N*N PE arrays. Configuring the N*N PE arrays in this way keeps data transmission on chip and minimizes two-way data transmission between on-chip and off-chip storage. This reduces the energy consumed by on-chip/off-chip transmission in conventional neural network processing and provides a new approach to reducing the energy consumption of neural networks.

Description

Technical field

[0001] The invention relates to a deep learning processor architecture and method based on data parallelism, and belongs to the technical field of integrated circuit processor architecture design.

Background art

[0002] The concept of deep learning originated from the study of artificial neural networks and was proposed by Hinton et al. in 2006. Based on the deep belief network (DBN), they proposed an unsupervised greedy layer-wise training algorithm, which offered hope for solving the optimization problems associated with deep structures; a multi-layer autoencoder deep structure was proposed subsequently. In addition, the convolutional neural network proposed by LeCun et al. was the first true multi-layer structure learning algorithm; it exploits spatial relationships to reduce the number of parameters and thereby improve training performance.

[0003] Deep learning is a method based on representation learning of data in machine l...


Application Information


IPC(8): G06F15/78, G06F9/50

CPC: G06F9/5016, G06F15/781, Y02D10/00

Inventor 朱顺意

Owner SHANDONG LINGNENG ELECTRONIC TECH CO LTD
