A data allocation method for CPU-FPGA heterogeneous multi-core systems

A data allocation technique for CPU-FPGA heterogeneous multi-core systems that addresses suboptimal system performance and improves overall performance.

Active Publication Date: 2021-06-01
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

Because the design space is complex, data allocations chosen manually by the system programmer can lead to suboptimal system performance.

Examples

Embodiment 1

[0093] To understand the performance of the memory hierarchy in a CPU-FPGA HMPSoC, a memory access latency model is constructed.

[0094] To measure the access latency of each level of the memory hierarchy on a Xilinx Zynq-7020 SoC clocked at 100 MHz, a microbenchmark FPGA kernel was designed to perform millions of memory accesses of a specific type. For example, a kernel that repeatedly accesses a large matrix in column order through the ACP can be used to measure the average ACP read-miss (i.e., L2 cache read-miss) latency.

[0095] Sizing the matrix according to the size of the given L2 cache ensures that every column-order access results in a cache miss.
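
The effect of matrix sizing on the miss rate can be illustrated with a small software model. The following is a minimal sketch, not the microbenchmark kernel itself: it assumes a set-associative cache with LRU replacement, and the geometry (4 KB, 4-way, 32-byte lines), the matrix shape, and the function names are all illustrative choices that do not come from the source.

```python
from collections import OrderedDict


def miss_rate(addresses, cache_bytes, ways, line_bytes):
    """Fraction of accesses that miss in a set-associative LRU cache (assumed model)."""
    num_sets = cache_bytes // (ways * line_bytes)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    total = 0
    for addr in addresses:
        total += 1
        line = addr // line_bytes
        cache_set = sets[line % num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            cache_set[line] = True
            if len(cache_set) > ways:
                cache_set.popitem(last=False)  # evict the least recently used line
    return misses / total


def matrix_addresses(rows, cols, elem_bytes, column_order):
    """Byte addresses of a row-major matrix traversed in column or row order."""
    if column_order:
        return [(i * cols + j) * elem_bytes for j in range(cols) for i in range(rows)]
    return [(i * cols + j) * elem_bytes for i in range(rows) for j in range(cols)]


# Illustrative geometry only: 4 KB cache, 4 ways, 32-byte lines.
# Each matrix row (8 elements * 4 bytes) fills exactly one cache line, and
# the 1024 rows span 32 KB, far more than the cache can hold.
CACHE = dict(cache_bytes=4096, ways=4, line_bytes=32)
col_rate = miss_rate(matrix_addresses(1024, 8, 4, column_order=True), **CACHE)
row_rate = miss_rate(matrix_addresses(1024, 8, 4, column_order=False), **CACHE)
print(col_rate, row_rate)
```

In this model the column-order walk touches a different line on every access and returns to a line only after it has been evicted, so every access misses, whereas the row-order walk misses only on the first element of each line.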

[0096] The experimental results are shown in Table 1:

[0097] Table 1

[0098]

[0099] These measured parameters are used in the formulas of step (3), for example:

[0100] In the definition of the execution time variable of the basic block:

[0101]

[0102] If a node represents a memory load instruction on array a_i...

Embodiment 2

[0118] The purpose of this example is to show that data allocation has a significant impact on system performance on the ZedBoard platform, which is based on the Zynq-7020 SoC.

[0119] The experiment uses the General Matrix Multiplication (GEMM) kernel from PolyBench as the motivating example:

[0120] Algorithm 1 Matrix Multiplication Algorithm

[0121]

[0122] We assume that at most 1 of the 3 arrays A, B, and C can fit in the on-chip BRAMs.
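
For reference, the PolyBench gemm kernel computes C = alpha*A*B + beta*C over the three arrays A, B, and C. The following is a minimal Python sketch of that computation, offered only to show which arrays are read and written; the function name, the loop order, and the small test matrices are assumptions, and this is not the HLS kernel used in the experiment.

```python
def gemm(alpha, beta, A, B, C):
    """Compute C = alpha * A * B + beta * C in place on nested lists."""
    ni = len(A)        # rows of A and C
    nk = len(B)        # columns of A, rows of B
    nj = len(B[0])     # columns of B and C
    for i in range(ni):
        # Scale the existing row of C by beta (C is read and written).
        for j in range(nj):
            C[i][j] *= beta
        # Accumulate the product row (A and B are read only).
        for k in range(nk):
            for j in range(nj):
                C[i][j] += alpha * A[i][k] * B[k][j]
    return C


# Illustrative 2x2 inputs.
result = gemm(2, 3, [[1, 2], [3, 4]], [[5, 6], [7, 8]], [[1, 1], [1, 1]])
print(result)
```

In this formulation every element of C is both read and written in the inner loops, while A and B are only read.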

[0123] The experimental results are shown in Figure 4, where A_a B_b C_h denotes the allocation scheme in which arrays A, B, and C are allocated to the ACP, BRAM, and HP ports, respectively.

[0124] The results show that the best allocation (A_b C_a B_h) is 3.14 times faster than the worst allocation (A_h C_h B_b).
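
The size of this design space can be made concrete by enumerating the candidate schemes. The following is a minimal sketch, assuming the only constraint is that at most one array is placed in BRAM; the function name and the label format (array name, underscore, port letter) are illustrative, and the source does not state how many of these schemes Figure 4 reports.

```python
from itertools import product

# Port letters as used in the allocation labels: a = ACP, b = BRAM, h = HP.
PORTS = {"a": "ACP", "b": "BRAM", "h": "HP"}
ARRAYS = ("A", "B", "C")


def feasible_allocations(max_in_bram=1):
    """List every allocation label with at most `max_in_bram` arrays in BRAM."""
    schemes = []
    for choice in product(PORTS, repeat=len(ARRAYS)):
        if choice.count("b") <= max_in_bram:
            schemes.append(" ".join(f"{arr}_{port}" for arr, port in zip(ARRAYS, choice)))
    return schemes


schemes = feasible_allocations()
print(len(schemes))
print(schemes[:3])
```

Even for three arrays and three ports, the programmer would have to compare 20 feasible schemes by hand, which is the motivation for solving the allocation automatically.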

[0125] In addition, some results run counter to traditional CPU-based scratchpad memory (SPM) allocation schemes.

[0126] (1) Compared with the other two arrays, array C has significantly higher read and...

Abstract

The present disclosure provides a data allocation method for a CPU-FPGA heterogeneous multi-core system, comprising: compiling the source code into LLVM (Low Level Virtual Machine) intermediate code through the Clang front end; executing the intermediate code with LLVM on the received input data to obtain a data access trace and an instruction trace; generating a dynamic data dependency graph (DDDG) from the instruction trace to represent the control flow and data flow of the FPGA kernel; feeding the data access trace to a cache simulator to obtain a cache conflict graph (CCG); and constructing an integer linear programming (ILP) formulation and solving it according to the DDDG and the CCG to obtain the optimal data allocation scheme.
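
The DDDG step can be illustrated with a small example. The following is a minimal sketch, assuming a hypothetical trace format in which each dynamic instruction is a (destination, sources) pair; the register names, the latency values, and both function names are invented for illustration and are not taken from the disclosure. The sketch adds an edge from the last writer of each source to the consuming instruction and then computes the critical-path latency of the resulting graph.

```python
def build_dddg(trace):
    """Return read-after-write edges (producer_index, consumer_index) of a trace."""
    last_writer = {}
    edges = []
    for idx, (dest, sources) in enumerate(trace):
        for src in sources:
            if src in last_writer:
                edges.append((last_writer[src], idx))
        if dest is not None:
            last_writer[dest] = idx
    return edges


def critical_path(num_nodes, edges, latency):
    """Longest dependency chain, in cycles; trace order is a topological order."""
    preds = {}
    for producer, consumer in edges:
        preds.setdefault(consumer, []).append(producer)
    finish = [0] * num_nodes
    for idx in range(num_nodes):
        start = max((finish[p] for p in preds.get(idx, [])), default=0)
        finish[idx] = start + latency[idx]
    return max(finish, default=0)


# Hypothetical trace: two loads, a multiply, and an add.
trace = [
    ("r1", []),            # 0: load
    ("r2", []),            # 1: load
    ("r3", ["r1", "r2"]),  # 2: mul
    ("r4", ["r3", "r1"]),  # 3: add
]
latency = [3, 3, 2, 1]     # illustrative cycle counts, not measured values
edges = build_dddg(trace)
cycles = critical_path(len(trace), edges, latency)
print(edges, cycles)
```

In such a model, changing the memory to which an array is allocated changes the latency of its load and store nodes, which in turn changes the critical path that an allocation solver would try to minimize.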

Description

Technical field

[0001] The disclosure relates to a data allocation method for a CPU-FPGA heterogeneous multi-core system.

Background technique

[0002] The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.

[0003] Field-programmable gate arrays (FPGAs) are becoming an increasingly popular design choice in computer systems ranging from low-power embedded systems to high-performance computing architectures. Traditional FPGA design with register-transfer level (RTL) programming requires extensive architecture and circuit experience and is error-prone and time-consuming. A high-level synthesis (HLS) tool compiles a C/C++ kernel into a corresponding hardware description language (HDL) module. In recent years, HLS tools have been widely used in complex FPGA heterogeneous system design, shortening time to...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F9/50
CPC: G06F9/5027
Inventor: 鞠雷, 荣雅洁, 李世清
Owner: SHANDONG UNIV