Method for accelerating multi-exit DNN inference with heterogeneous processors under edge computing

A heterogeneous-processor and edge-computing technology, applied to neural learning methods, neural architectures, biological neural network models, etc. It addresses the problem that existing methods cannot meet the low-latency requirements of edge intelligence applications, with the effect of improving user experience and ensuring stability.

Pending Publication Date: 2022-06-24
SOUTHEAST UNIV

AI Technical Summary

Problems solved by technology

[0007] Therefore, existing deep learning inference acceleration methods still have significant limitations when applied to edge computing and artificial intelligence scenarios, and cannot meet the low-latency, high-accuracy operating requirements of edge intelligence applications.



Examples


Embodiment 1

[0066] Example 1: As shown in figure 1, the present invention predicts the computation time overhead of each network layer by training a delay prediction model, and uses this prediction to characterize the computational cost of each layer. When only model parameters are considered, the computational cost of each layer is mainly represented by its floating-point operations (FLOPs). The layer types include convolutional layers, pooling layers, activation layers and fully connected layers. The important parameters that determine the computational cost include: the size of the input feature map (W×H), the number of input and output channels (C_in, C_out), the convolution kernel window size (k_w×k_h), the pooling window size (p_w×p_h), and the number of input and output neurons of the fully connected layer (F_in, F_out). The floating-point operations are calculated as follows:

[0067]
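The explicit expression behind [0067] is not reproduced in this extract. As an illustration, the sketch below uses the standard textbook FLOPs approximations for the layer types and parameters named above (W, H, C_in, C_out, k_w, k_h, p_w, p_h, F_in, F_out); the patent's exact equation may differ.

```python
# Minimal sketch of per-layer FLOPs estimation. The formulas below are the
# standard approximations for each layer type and are an assumption; the
# patent's own equation in [0067] is not shown in this extract.

def conv_flops(W, H, C_in, C_out, k_w, k_h):
    # Each output element needs k_w * k_h * C_in multiply-accumulates (~2 FLOPs each).
    return 2 * W * H * C_out * k_w * k_h * C_in

def pool_flops(W, H, C, p_w, p_h):
    # Roughly one operation per element inside each pooling window.
    return W * H * C * p_w * p_h

def fc_flops(F_in, F_out):
    # A fully connected layer is a matrix-vector product.
    return 2 * F_in * F_out

def activation_flops(W, H, C):
    # Activation layers cost roughly one operation per element.
    return W * H * C

if __name__ == "__main__":
    # Example: 224x224 feature map, 3 -> 64 channels, 3x3 kernel.
    print(conv_flops(224, 224, 3, 64, 3, 3))
```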

[0068] Combined with the actual system, and further considering the influence of CPU utilization (u)...
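Paragraph [0068] only names CPU utilization u as an additional factor in the delay prediction model. A minimal sketch of such a model is given below, assuming a simple least-squares fit over FLOPs and u; the regression form, feature set, and sample values are illustrative assumptions, not the patent's actual model.

```python
import numpy as np

# Hypothetical per-layer delay prediction model: predict a layer's execution
# time from its FLOPs and the current CPU utilization u. The linear form
# t = a*flops + b*u + c*flops*u + d and the sample numbers are assumptions.

def fit_delay_model(flops, cpu_util, measured_ms):
    X = np.column_stack([flops, cpu_util, flops * cpu_util, np.ones_like(flops)])
    coef, *_ = np.linalg.lstsq(X, measured_ms, rcond=None)
    return coef

def predict_delay(coef, flops, u):
    return coef[0] * flops + coef[1] * u + coef[2] * flops * u + coef[3]

# Illustrative profiling samples (not measured data).
flops = np.array([1e6, 5e6, 1e7, 5e7])
u = np.array([0.1, 0.3, 0.5, 0.8])
t_ms = np.array([0.4, 2.5, 6.0, 40.0])
coef = fit_delay_model(flops, u, t_ms)
print(predict_delay(coef, 2e7, 0.4))
```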

Embodiment 2

[0071] Example 2: As shown in figure 1, different CPU utilization conditions call for different execution strategies, each with a different execution time. Based on the latency prediction model obtained above, the "early exit probability" of each branch, and the transmission delay of the intermediate feature data, the multi-exit deep neural network is partitioned by the partition decision model. The invention parallelizes the inference of the model backbone and the branch-exit networks, thereby improving resource utilization while reducing the inference delay of the multi-exit model and avoiding the performance degradation caused by redundant computation in multi-exit network inference in extreme cases. In the model partition strategy, it is necessary to specify the model partition decision variable X_b, where b = 1, ..., n indexes the n branches; the backbone part is executed by the GPU, and the network model...
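As a rough illustration of the partition decision described above, the sketch below enumerates candidate split points and picks the one with the lowest expected latency, combining predicted GPU latency for the backbone, predicted CPU latency for the exit branch, the branch's early-exit probability, and the intermediate-feature transfer delay. The cost expression and all names are assumptions; the patent's actual decision model is not reproduced here.

```python
# Hypothetical partition decision: the backbone runs on the GPU, the exit
# branch attached after layer s runs on the CPU in parallel. The expected-
# latency formula below is an illustrative assumption, not the patent's model.

def expected_latency(s, gpu_ms, cpu_branch_ms, exit_prob, transfer_ms):
    backbone_before = sum(gpu_ms[:s])            # GPU backbone up to the split point
    branch = transfer_ms[s] + cpu_branch_ms[s]   # move intermediate feature + CPU branch inference
    backbone_after = sum(gpu_ms[s:])             # rest of the backbone, run in parallel on the GPU
    p = exit_prob[s]
    # With probability p the sample exits at the branch; otherwise we wait for
    # whichever of the two parallel paths (branch or remaining backbone) is slower.
    return backbone_before + p * branch + (1 - p) * max(branch, backbone_after)

def choose_split(gpu_ms, cpu_branch_ms, exit_prob, transfer_ms):
    candidates = range(1, len(gpu_ms))
    return min(candidates, key=lambda s: expected_latency(
        s, gpu_ms, cpu_branch_ms, exit_prob, transfer_ms))
```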

Embodiment 3

[0077] Embodiment 3: Asynchronous execution of multi-exit network inference. As shown in figure 2, the online stage includes model partitioning and cooperative multi-exit parallel inference on heterogeneous processors. After the offline stage is completed, the generated multi-exit deep neural network is deployed on the heterogeneous processors of the mobile intelligent terminal device, which performs online multi-exit parallel inference. The intelligent terminal device monitors the local computing load in real time and, after receiving a task, predicts the corresponding task allocation decision X_b according to the real-time load. Execution of the task is divided into two parallel parts: inference of the trunk network and inference of the branch-exit network. When the branch-exit network is assigned to the CPU for execution, if the task can exit from the branch-exit network, execution of the trunk network is stopped. Hierarchical re...
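A minimal sketch of this asynchronous execution is shown below, using two Python threads to stand in for the GPU (trunk) and CPU (branch-exit) workers: the trunk hands the intermediate feature to the branch at the split point and stops early when the branch signals a confident exit. All function names, the queue-based hand-off, and the confidence threshold are illustrative assumptions, not the patent's API.

```python
import threading
import queue

# Hypothetical sketch of asynchronous multi-exit inference: the trunk worker
# (standing in for the GPU) runs the backbone layer by layer and, at the split
# point, hands the intermediate feature to the branch worker (standing in for
# the CPU). A confident branch result sets an event that stops the trunk.

def infer(sample, trunk_layers, split, branch_net, threshold=0.9):
    exit_event = threading.Event()
    feature_q = queue.Queue(maxsize=1)
    result = {}

    def trunk_worker():
        x = sample
        for i, layer in enumerate(trunk_layers):
            if exit_event.is_set():            # early exit already taken: stop the trunk
                return
            x = layer(x)
            if i == split:                     # hand the intermediate feature to the branch
                feature_q.put(x)
        result["trunk"] = x

    def branch_worker():
        feat = feature_q.get()
        probs = branch_net(feat)               # branch-exit inference on the CPU
        if max(probs) >= threshold:
            result["branch"] = probs
            exit_event.set()

    gpu = threading.Thread(target=trunk_worker)
    cpu = threading.Thread(target=branch_worker)
    gpu.start(); cpu.start()
    gpu.join(); cpu.join()
    return result.get("branch", result.get("trunk"))
```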



Abstract

The invention discloses a method for accelerating multi-exit DNN (Deep Neural Network) inference with heterogeneous processors under edge computing, which comprises the following steps: first, separately profiling the computing cost of each layer of the deep neural network on the CPU (Central Processing Unit) and the GPU (Graphics Processing Unit) under different loads, the classification capability of each early-exit branch, and the intermediate data volume of each network layer; then analyzing these data to obtain an optimal parallel combination model that assigns each layer of the deep neural network to the CPU or GPU under a specific load condition; and finally, monitoring and analyzing the load and current computing power of the CPU and GPU online on the terminal device, partitioning the deep neural network inference task with the goal of minimizing inference latency, and distributing the task blocks to the GPU and CPU respectively, finally forming an inference acceleration framework based on heterogeneous processors. The method improves inference flexibility, guarantees accuracy, reduces total inference latency, and meets the real-time and high-accuracy requirements of edge intelligence applications.
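The offline profiling step in the abstract records, for every layer, its CPU and GPU cost under different loads, the classification capability of the exit attached to it, and its intermediate data volume. A minimal sketch of such a per-layer profile is given below; the field names and structure are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict

# Sketch of the per-layer profile gathered in the offline stage described in
# the abstract: CPU/GPU latency under different load levels, the accuracy of
# the early-exit branch attached to the layer, and the size of the intermediate
# feature that would be transferred at this point. Field names are assumptions.

@dataclass
class LayerProfile:
    name: str
    cpu_ms: Dict[float, float] = field(default_factory=dict)  # CPU load -> latency (ms)
    gpu_ms: Dict[float, float] = field(default_factory=dict)  # GPU load -> latency (ms)
    exit_accuracy: float = 0.0        # accuracy of the branch exit after this layer
    feature_bytes: int = 0            # intermediate data volume at this layer

def profile_model(layer_names):
    """Builds empty profiles to be filled in by offline measurements."""
    return {name: LayerProfile(name=name) for name in layer_names}
```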

Description

Technical field

[0001] The invention belongs to the fields of intelligent terminals and deep learning. It relates to optimizing and accelerating the inference of the deep learning models that applications depend on, in the scenario where intelligent applications are deployed on intelligent terminals equipped with heterogeneous processors, and specifically relates to a method for accelerating multi-exit DNN inference with heterogeneous processors under edge computing.

Background technique

[0002] In recent years, with the continuous development of deep learning technology and the rapid popularization of smart terminals such as smart phones, smart bracelets, and various Internet of Things (IoT) devices, running deep learning applications on smart terminals has become an inevitable trend. In this mode, the intelligent terminal collects massive data such as the surrounding environment and user behavior in real time, and mines and analyzes these environment and u...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N3/04; G06N3/08
CPC: G06N3/08; G06N3/045
Inventor: 东方, 蔡光兴, 沈典, 王慧田, 张竞慧
Owner: SOUTHEAST UNIV