Compiling method based on OPU instruction set and compiler

A compiling method and instruction-set technology, applied in the field of compiling methods and compilers based on an OPU instruction set, which solves the problem of high hardware-upgrade complexity, saves data transmission time, and improves hardware utilization efficiency.

Active Publication Date: 2019-07-30
Owner: 梁磊

AI Technical Summary

Problems solved by technology

Existing FPGA acceleration work aims to generate a specific, separate accelerator for each CNN. This guarantees reasonably high performance for RTL-based or HLS-to-RTL-based templates, but it entails high hardware-upgrade complexity whenever the target network is tuned.
Therefore, to avoid generating specific hardware description code for each individual network and to avoid reprogramming the FPGA, the entire deployment process is completed by instruction configuration: different target networks are configured through instructions, and the FPGA accelerator is not reconfigured. This requires defining an instruction set, which the compiler compiles into instruction sequences. During compilation, however, two problems must be solved: the communication delay of off-chip memory, and the universality of finding the optimal performance configuration for different target networks.

Method used



Examples


Embodiment 1

[0077] A compiling method based on an OPU instruction set comprises the following steps:

[0078] Convert the CNN definition files of different target networks, select the optimal accelerator configuration mapping according to the defined OPU instruction set, and generate instructions for the different target networks to complete the mapping;

[0079] Wherein, the conversion includes file conversion, network layer reorganization, and generation of a unified intermediate representation (IR);

[0080] The mapping includes parsing the IR, searching the solution space according to the parsing information to obtain a mapping method that guarantees the maximum throughput, and expressing the above mapping solution as an instruction sequence according to the defined OPU instruction set to generate instructions for different target networks.
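
Under the assumption that the conversion stage produces a unified IR of layer groups and the mapping stage selects one configuration from a small solution space, the flow of paragraphs [0077]-[0080] might be sketched as follows. All names, data structures, and the toy throughput model below are illustrative placeholders, not the patent's actual implementation or OPU instruction encoding.

```python
# Minimal illustrative sketch of the convert -> map flow described in [0077]-[0080].
# Every identifier and data structure here is an assumption made for illustration only.

def convert(cnn_definition):
    """Conversion stage: file conversion, layer regrouping, and a unified IR."""
    layers = cnn_definition["layers"]            # stands in for parsing the CNN definition file
    return {"layer_groups": regroup(layers)}     # unified intermediate representation (IR)

def regroup(layers):
    """Placeholder for the layer-reorganization step (detailed in Embodiment 2)."""
    return [[layer] for layer in layers]

def estimate_throughput(ir, config):
    """Toy throughput model; the real model is described in Embodiment 3."""
    return config["parallelism"]

def map_to_instructions(ir, instruction_set):
    """Mapping stage: parse the IR, pick the mapping with the highest estimated
    throughput from a small solution space, and emit an instruction sequence."""
    candidates = [{"parallelism": p} for p in (8, 16, 32)]      # assumed solution space
    best = max(candidates, key=lambda c: estimate_throughput(ir, c))
    return [(instruction_set["compute"], best["parallelism"], group)
            for group in ir["layer_groups"]]

if __name__ == "__main__":
    cnn = {"layers": ["conv1", "pool1", "fc1"]}
    opu_isa = {"compute": "OPU_COMPUTE"}          # stand-in for the defined OPU instruction set
    for instruction in map_to_instructions(convert(cnn), opu_isa):
        print(instruction)
```

Embodiments 2 and 3 refine the regrouping step and the throughput model, respectively.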

[0081] A compiler based on the OPU instruction set, comprising:

[0082] a conversion unit, used for file conversion, network layer reorganization a...

Embodiment 2

[0097] Based on Embodiment 1: a conventional CNN definition contains various types of layers connected from top to bottom to form a complete flow. The intermediate data passed between layers is called a feature map; it usually requires a large amount of storage and can only be kept in off-chip memory. Since off-chip memory communication delay is the main optimization factor, the problem of reducing data communication with off-chip memory must be overcome. Through layer reorganization, a main layer and auxiliary layers are defined to reduce off-chip DRAM accesses and avoid unnecessary write-back/read-back operations. The technical details are as follows:

[0098] After analyzing the format of the CNN definition file, convert the file, compress and extract the network information;

[0099] The network operations are reorganized into multiple layer groups. Each layer group includes a main layer and multiple auxiliary layers. The results between the layer gr...
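
One possible reading of the layer-group reorganization in paragraphs [0097]-[0099] is sketched below. The classification of layer types into main layers (e.g., convolution, fully connected) and auxiliary layers (e.g., pooling, activation, padding) and the dictionary layout are assumptions for illustration; the patent's actual grouping rules may differ.

```python
# Illustrative layer-regrouping pass for the scheme in [0097]-[0099].
# The split into "main" and "auxiliary" layer types below is an assumption.

MAIN_LAYERS = {"conv", "fc"}
AUX_LAYERS = {"pool", "relu", "pad", "bn"}

def regroup_layers(layers):
    """Group each main layer with the auxiliary layers that follow it.

    Only the output of a whole group is written back to off-chip DRAM;
    intermediate results of the auxiliary layers stay on chip.
    """
    groups, current = [], None
    for layer in layers:
        if layer["type"] in MAIN_LAYERS:
            current = {"main": layer, "aux": []}
            groups.append(current)
        elif layer["type"] in AUX_LAYERS and current is not None:
            current["aux"].append(layer)
        else:
            raise ValueError(f"unsupported or misplaced layer: {layer}")
    return groups

if __name__ == "__main__":
    net = [{"type": "conv", "name": "conv1"},
           {"type": "relu", "name": "relu1"},
           {"type": "pool", "name": "pool1"},
           {"type": "fc",   "name": "fc1"}]
    for group in regroup_layers(net):
        print(group["main"]["name"], "->", [a["name"] for a in group["aux"]])
```

Because only the output of each group is written back to DRAM, the intermediate feature maps of the auxiliary layers never leave the chip, which is the stated goal of the reorganization.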

Embodiment 3

[0107] Based on Embodiment 1 or 2, in order to find the optimal performance configuration and to solve the universality problem of the optimal performance configuration, the solution space is searched during the mapping process to obtain a mapping method that guarantees the maximum throughput. Mapping with this method includes the following steps:

[0108] Step a1: Calculate the theoretical peak value, as shown in the following formula:

[0109] T = f × TN_PE

[0110] Among them, T represents the throughput (operations per second), f represents the operating frequency, and TN_PE represents the total number of PEs available on the chip;

[0111] Step a2: Define the minimum value L of the time required for the entire network calculation, as shown in the following formula:

[0112]

[0113] Among them, α_i indicates the PE efficiency of the i-th layer, and C_i indicates the amount of...
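
Steps a1 and a2 define the theoretical peak throughput T = f × TN_PE and a lower bound L on the time needed to compute the whole network. The formula for L in paragraph [0112] is not reproduced in the source text; the sketch below assumes the natural form L = Σ_i C_i / (α_i × f × TN_PE), i.e., each layer's operation count divided by its achieved throughput, and uses it to compare candidate configurations. All numbers and the candidate set are made up for illustration.

```python
# Illustrative throughput/latency model for the search in Embodiment 3.
# Peak throughput: T = f * TN_PE (operations per second), as in step a1.
# The bound L from step a2 is not shown in the source text; it is ASSUMED here
# to be L = sum_i C_i / (alpha_i * f * TN_PE), where alpha_i is the PE
# efficiency of layer i under a given configuration and C_i its operation count.

def peak_throughput(f_hz, tn_pe):
    return f_hz * tn_pe                                   # T = f * TN_PE

def network_time(layer_ops, pe_efficiency, f_hz, tn_pe):
    t = peak_throughput(f_hz, tn_pe)
    return sum(c / (alpha * t) for c, alpha in zip(layer_ops, pe_efficiency))

def search_best_config(layer_ops, candidate_configs, f_hz, tn_pe):
    """Pick the configuration with the smallest estimated network time,
    i.e., the highest overall throughput."""
    def time_for(config):
        return network_time(layer_ops, config["alpha_per_layer"], f_hz, tn_pe)
    return min(candidate_configs, key=time_for)

if __name__ == "__main__":
    ops = [1.2e9, 0.8e9, 0.1e9]                           # C_i per layer (made up)
    configs = [
        {"name": "cfg_a", "alpha_per_layer": [0.85, 0.70, 0.60]},
        {"name": "cfg_b", "alpha_per_layer": [0.75, 0.90, 0.80]},
    ]
    best = search_best_config(ops, configs, f_hz=200e6, tn_pe=1024)
    print("best configuration:", best["name"])
```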



Abstract

The invention discloses a compiling method and a compiler based on an OPU instruction set, and relates to the field of CNN-acceleration compiling methods. The method comprises the following steps: converting the CNN definition files of different target networks, selecting the optimal accelerator configuration mapping according to the defined OPU instruction set, and generating instructions for the different target networks to complete the mapping. The conversion comprises file conversion, layer reorganization of the network, and generation of a unified intermediate representation (IR). The mapping comprises parsing the IR, searching a solution space according to the parsed information to obtain a mapping mode that guarantees the maximum throughput, and generating instructions for the different target networks from the solved mapping mode according to the defined OPU instruction set. The method overcomes the communication delay problem of off-chip memory and the problem of finding the optimal accelerator configuration for different target networks, outputs instructions that can be executed by the OPU, and completes CNN acceleration by means of instruction configuration without reconfiguring the FPGA accelerator.

Description

Technical field

[0001] The invention relates to the field of CNN-acceleration compilation methods, in particular to a compiling method and a compiler based on an OPU instruction set.

Background technique

[0002] Deep convolutional neural networks (CNNs) have demonstrated high accuracy in various applications, such as visual object recognition, speech recognition, and object detection. However, this breakthrough in accuracy comes at the cost of high computational expense, which needs to be accelerated by computing clusters, GPUs, and FPGAs. Among them, the FPGA accelerator has the advantages of high energy efficiency, good flexibility, and strong computing power, especially for deep CNN applications on edge devices, such as speech recognition and visual object recognition on smartphones. FPGA acceleration usually involves architecture exploration and optimization, RTL programming, hardware implementation, and software-hardware interface development. With the development, people have conduc...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F9/30, G06F8/41, G06N3/04
CPC: G06F9/30003, G06F8/41, G06N3/045, Y02D10/00
Inventor: 喻韵璇, 王铭宇
Owner: 梁磊