CNN acceleration method and system based on OPU

An accelerator and mapping technology, applied in the field of OPU-based CNN acceleration methods and systems, which can solve the problems of high hardware upgrade complexity and poor versatility.

Active Publication Date: 2019-07-26
AI Technical Summary

Problems solved by technology

[0004] The object of the present invention is to provide a CNN acceleration method and system based on an OPU, solving the problems of existing FPGA acceleration work, which generates a specific, separate accelerator for each different CNN: hardware upgrade complexity is high when the target network changes, and versatility is poor.


Examples


Embodiment 1

[0091] An OPU-based CNN acceleration method comprises the following steps:

[0092] Define the OPU instruction set;

[0093] A compiler converts the CNN definition files of different target networks, selects the optimal accelerator configuration mapping according to the defined OPU instruction set, and generates instructions for each target network to complete the mapping;

[0094] The OPU reads the above compiled instructions, runs them according to the parallel computing mode defined by the OPU instruction set, and completes the acceleration of the different target networks;

[0095] Here, the OPU instruction set includes unconditional instructions, which are executed directly and provide configuration parameters for the conditional instructions, and conditional instructions, which are executed once their trigger conditions are met. The OPU instruction set is defined so as to optimize instruction granularity according to CNN network research results a...
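The three steps above form a compile-then-execute pipeline: a compiler maps a network definition onto an OPU instruction sequence, and the OPU then fetches and runs that sequence. The following is a minimal Python sketch of that flow under assumed data structures; every name here (OPUInstruction, compile_network, run_on_opu, the layer fields) is a hypothetical illustration, not the patent's actual implementation.

```python
# Minimal sketch of the compile-then-execute flow of steps [0092]-[0094].
# All class and function names below are illustrative assumptions; the patent
# does not disclose source code.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OPUInstruction:
    opcode: str                                    # e.g. "LOAD", "COMPUTE", "STORE"
    params: Dict[str, int] = field(default_factory=dict)


def compile_network(cnn_definition: List[Dict[str, int]]) -> List[OPUInstruction]:
    """Map a target-network definition onto an OPU instruction sequence."""
    instructions: List[OPUInstruction] = []
    for layer in cnn_definition:
        instructions.append(OPUInstruction("LOAD", {"layer": layer["id"]}))
        instructions.append(OPUInstruction("COMPUTE", dict(layer)))
        instructions.append(OPUInstruction("STORE", {"layer": layer["id"]}))
    return instructions


def run_on_opu(instructions: List[OPUInstruction]) -> None:
    """Toy 'OPU' front end: fetch each compiled instruction and execute it.

    On real hardware the COMPUTE instructions would drive the parallel
    computing mode defined by the instruction set; here we only print them.
    """
    for inst in instructions:
        print(f"{inst.opcode:8s} {inst.params}")


if __name__ == "__main__":
    toy_net = [{"id": 0, "in_ch": 3, "out_ch": 16, "kernel": 3},
               {"id": 1, "in_ch": 16, "out_ch": 32, "kernel": 3}]
    run_on_opu(compile_network(toy_net))
```

The point of the sketch is only the division of labor: the same OPU loop accelerates a different network by swapping the compiled instruction list, without changing the hardware.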

Embodiment 2

[0124] Based on Embodiment 1, the definition of the OPU instruction set of this application is refined as follows:

[0125] The instruction set defined in this application needs to overcome a problem that limits the universality of the processor corresponding to the instruction set: in existing CNN acceleration systems, instruction execution time is highly uncertain, so the instruction sequence cannot be accurately predicted. The technical means adopted are: define conditional instructions, define unconditional instructions, and set the instruction granularity. For conditional instructions, define their composition and set the conditional-instruction registers and execution method; the execution method is to execute after the trigger condition written by the hardware is satisfied, and the registers include a parameter register and a trigger condition register; set the parameter configur...
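To visualize the register scheme described in [0125], here is a small Python sketch of the conditional/unconditional split: unconditional instructions execute immediately and write configuration parameters, while conditional instructions wait until a hardware-written trigger condition is satisfied. The register names, the trigger encoding, and the event name are assumptions for illustration only.

```python
# Sketch of the conditional/unconditional instruction scheme of [0125].
# Register names and trigger encodings are illustrative assumptions.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instruction:
    opcode: str
    params: Dict[str, int] = field(default_factory=dict)
    trigger: str = ""                 # empty string -> unconditional


@dataclass
class OPUState:
    parameter_register: Dict[str, int] = field(default_factory=dict)
    trigger_condition_register: Dict[str, bool] = field(default_factory=dict)

    def hardware_event(self, name: str) -> None:
        """Model the hardware writing a trigger condition (e.g. a DMA-done flag)."""
        self.trigger_condition_register[name] = True


def step(state: OPUState, inst: Instruction) -> bool:
    """Execute one instruction; return False if a conditional is still waiting."""
    if not inst.trigger:                                     # unconditional
        state.parameter_register.update(inst.params)         # provide configuration
        return True
    if state.trigger_condition_register.get(inst.trigger):   # conditional, trigger set
        print(f"run {inst.opcode} with {state.parameter_register}")
        state.trigger_condition_register[inst.trigger] = False
        return True
    return False                                             # keep waiting


if __name__ == "__main__":
    state = OPUState()
    program: List[Instruction] = [
        Instruction("SET_SIZE", {"rows": 224, "cols": 224}),
        Instruction("COMPUTE", trigger="input_loaded"),
    ]
    assert step(state, program[0])        # unconditional: runs at once
    assert not step(state, program[1])    # conditional: trigger not yet written
    state.hardware_event("input_loaded")  # hardware satisfies the trigger
    assert step(state, program[1])        # now it executes
```

The design choice being illustrated is that uncertainty in execution time is absorbed by the trigger conditions rather than by the instruction sequence itself, so the same compiled sequence remains valid regardless of when each hardware stage finishes.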

Embodiment 3

[0141] Based on Embodiment 1, the compilation steps are refined as follows:

[0142] Convert the CNN definition files of different target networks, select the optimal accelerator configuration mapping according to the defined OPU instruction set, and generate instructions for each target network to complete the mapping;

[0143] The conversion includes file conversion, network layer reorganization, and generation of a unified intermediate representation (IR);
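A unified IR with layer reorganization might look like the following sketch, where lightweight layers are regrouped into the preceding node. The IR node fields and the fusion rule (folding batch-norm and ReLU into the preceding convolution) are assumptions chosen for illustration, not the patent's definition of its IR.

```python
# Sketch of layer reorganization into a unified IR, as described in [0143].
# Node fields and the fusion rule are illustrative assumptions.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IRNode:
    main_op: str                                          # e.g. "conv"
    fused_ops: List[str] = field(default_factory=list)    # ops merged into this node
    attrs: Dict[str, int] = field(default_factory=dict)


def reorganize(layers: List[dict]) -> List[IRNode]:
    """Regroup a flat layer list into IR nodes: batch-norm and ReLU layers
    are merged into the preceding convolution node."""
    ir: List[IRNode] = []
    for layer in layers:
        if layer["type"] in ("bn", "relu") and ir:
            ir[-1].fused_ops.append(layer["type"])
        else:
            ir.append(IRNode(layer["type"],
                             attrs={k: v for k, v in layer.items() if k != "type"}))
    return ir


if __name__ == "__main__":
    net = [{"type": "conv", "out_ch": 16}, {"type": "bn"}, {"type": "relu"},
           {"type": "conv", "out_ch": 32}, {"type": "relu"}]
    for node in reorganize(net):
        print(node)
```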

[0144] The mapping includes parsing the IR, searching the solution space according to the parsing information to obtain a mapping method that guarantees the maximum throughput, and expressing the above mapping solution as an instruction sequence according to the defined OPU instruction set to generate instructions for different target networks.
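The mapping step of [0144] can be pictured as a search over candidate hardware configurations that keeps the one with the highest estimated throughput. In the sketch below, the configuration parameters (PE count, tile size) and the throughput model are placeholder assumptions used only to show the structure of such a solution-space search; they are not the patent's actual cost model.

```python
# Sketch of the solution-space search of [0144]: enumerate candidate mapping
# configurations, estimate throughput for each, and keep the maximum.

from itertools import product
from typing import Dict, Tuple


def estimated_throughput(config: Dict[str, int], layer: Dict[str, int]) -> float:
    """Toy throughput model: PEs times utilization, where utilization drops
    when the tile size exceeds the layer's channel count (idle PEs)."""
    utilization = min(1.0, layer["channels"] / config["tile_size"])
    return config["pe_count"] * utilization


def search_mapping(layer: Dict[str, int]) -> Tuple[Dict[str, int], float]:
    """Exhaustively search a small solution space for the best mapping."""
    best_cfg: Dict[str, int] = {}
    best_tp = -1.0
    for pe, tile in product([256, 512, 1024], [8, 16, 32, 64]):
        cfg = {"pe_count": pe, "tile_size": tile}
        tp = estimated_throughput(cfg, layer)
        if tp > best_tp:
            best_cfg, best_tp = cfg, tp
    return best_cfg, best_tp


if __name__ == "__main__":
    layer = {"channels": 48, "rows": 56, "cols": 56}
    cfg, tp = search_mapping(layer)
    print("best configuration:", cfg, "estimated throughput:", tp)
    # The chosen configuration would then be expressed as an OPU instruction
    # sequence according to the defined instruction set.
```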

[0145] The corresponding compiler includes a conversion unit, which is used to perform file conversion and network layer reorganization and to generate the IR a...



Abstract

The invention discloses a CNN acceleration method and system based on OPU, and relates to the field of FPGA-based CNN acceleration methods. The method includes: an OPU instruction set is defined; a compiler converts the CNN definition files of different target networks, selects the optimal accelerator configuration mapping according to the defined OPU instruction set, and generates instructions for the different target networks to complete the mapping; the OPU reads the compiled instructions, runs them according to the parallel computing mode defined by the OPU instruction set, and completes the acceleration of the different target networks. In the method, the instruction types are defined, the instruction granularity is set, network recombination optimization is carried out, a mapping mode that guarantees the maximum throughput is obtained by searching the solution space, and the hardware adopts a parallel computing mode. This solves the problem that existing FPGA acceleration work generates a specific, independent accelerator for each different CNN, and achieves the effect that acceleration of different network configurations is rapidly realized through instructions, without reconstructing the FPGA accelerator.

Description

Technical field
[0001] The invention relates to the field of FPGA-based CNN acceleration methods, in particular to an OPU-based CNN acceleration method and system.
Background technique
[0002] Deep convolutional neural networks (CNNs) have demonstrated high accuracy in various applications, such as visual object recognition, speech recognition, and object detection. However, this breakthrough in accuracy comes at the cost of heavy computation, which needs to be accelerated by computing clusters, GPUs, and FPGAs. Among these, the FPGA accelerator has the advantages of high energy efficiency, good flexibility, and strong computing power, especially for deep CNN applications on edge devices such as speech recognition and visual object recognition on smartphones; FPGA acceleration usually involves architecture exploration and optimization, RTL programming, hardware implementation, and software-hardware interface development. With this development, people have conducted in-depth researc...

Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F9/30; G06F8/41; G06N3/04
CPC: G06F9/30003; G06F9/3005; G06F8/41; G06N3/045; G06N3/063; G06N3/08; Y02D10/00; G06N20/10; G06F9/30072; G06F9/5027; G06F9/5066
Inventor: 喻韵璇, 王铭宇
Owner: 梁磊