A data allocation method for cpu-fpga heterogeneous multi-core system

What is Al technical title?
Al technical title is built by PatSnap Al team. It summarizes the technical point description of the patent document.
A CPU-FPGA and heterogeneous multi-core technology, which is applied in the field of data distribution for CPU-FPGA heterogeneous multi-core systems, can solve problems such as not optimizing system performance, and achieve the effect of improving overall performance

Active Publication Date: 2021-06-01

SHANDONG UNIV

View PDF0 Cites 0 Cited by

Summary
Abstract
Description
Claims
Application Information

AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology

Problems solved by technology

Due to the complexity of the design space, artificial data allocations made by the system programmer can lead to suboptimal system performance

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine

Image

Smart Image Click on the blue labels to locate them in the text.

Viewing Examples

Smart Image

Examples

Experimental program

Comparison scheme

Effect test

Embodiment 1

[0093] In order to understand the memory-level performance of CPU-FPGA HMPSoC, a memory access latency model is constructed.

[0094] To measure the access latency of a single memory hierarchy using a Xilinx Zynq-7020 SoC at a clock frequency of 100MHz, a microbenchmark FPGA core was designed to perform millions of memory accesses of a specific type. For example, a kernel that repeatedly accesses a huge matrix column sort via ACP can be used to measure the average ACP read miss (i.e. L2 cache read miss) latency.

[0095] Sizing the matrix according to the size of the given L2 cache ensures that all column-ordered accesses result in a cache miss.

[0096] The experimental results are shown in Table 1:

[0097] Table 1

[0098]

[0099] The parameters will be used in some formula operations in step (3), for example:

[0100] In the definition of the execution time variable of the basic block:

[0101]

[0102] If a node represents an array a i memory load instructions...

Embodiment 2

[0118] The purpose of this example is to show that the allocation of data has a significant impact on the system performance of the Zedboard platform based on the Zynq-7020 SoC.

[0119] The experiment used the stimulus example of General Matrix Multiplication (GEMM) from Polybench:

[0120] Algorithm 1 Matrix Multiplication Algorithm

[0121]

[0122] We assume that at most 1 of the 3 arrays A, B, and C can fit in the on-chip BRAMs.

[0123] Experimental results such as Figure 4 shown, where A a B b C h Indicates the allocation scheme of arrays A, B, and C respectively allocated to ACP, BRAM, and HP ports.

[0124] The results show that the optimal allocation (A b C a B h ) than the worst allocation (A h C h B b ) is accelerated by a factor of 3.14.

[0125] In addition, it can be seen that some results may run counter to the traditional CPU-based SPM allocation scheme.

[0126] (1) Compared with the other two arrays, array C has significantly higher read and...

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine

Login to View More

PUM

Login to View More

Abstract

The present disclosure discloses a data allocation method for a CPU-FPGA heterogeneous multi-core system, comprising: compiling the source code into an intermediate code of the low-level virtual machine LLVM through the Clang front end; using the low-level virtual machine LLVM to execute the intermediate code of the low-level virtual machine LLVM , and receive input data to obtain the data access trajectory and instruction trajectory; generate a dynamic data dependency graph DDDG through the instruction trajectory to represent the control flow and data flow of the FPGA core; send the obtained data access trajectory to the cache simulator CacheSimulator, Obtain the cache conflict graph CCG; construct the integer linear programming formula, and solve the integer linear programming formula according to the dynamic data dependency graph DDDG and the cache conflict graph CCG to obtain the optimal data allocation scheme.

Description

technical field [0001] The disclosure relates to a data allocation method for a CPU-FPGA heterogeneous multi-core system. Background technique [0002] The statements in this section merely enhance the background related to the present disclosure and may not necessarily constitute prior art. [0003] Field-programmable gate arrays (Field-programmable Gate Arrays, FPGAs) are becoming an increasingly popular design choice in computer systems ranging from low-power embedded systems to high-performance computing architectures. Traditional FPGA design with Register-Transfer Level (RTL) programming requires extensive architecture and circuit experience, which is error-prone and time-consuming. A high-level synthesis (High-level Synthesis, HLS) tool compiles the C / C++ kernel into a corresponding hardware description language (Hardware Description Language, HDL) module. In recent years, HLS tools have been widely used in complex FPGA heterogeneous system design, shortening time to...

Claims

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine

Login to View More

Application Information

Patent Timeline

Login to View More

Patent Type & Authority Patents(China)

IPC IPC(8): G06F9/50

CPCG06F9/5027

Inventor 鞠雷荣雅洁李世清

Owner SHANDONG UNIV

Features

R&D
Intellectual Property
Life Sciences
Materials
Tech Scout

Why Patsnap Eureka

Unparalleled Data Quality
Higher Quality Content
60% Fewer Hallucinations

Social media

Patsnap Eureka Blog

Learn More

Browse by: Latest US Patents, China's latest patents, Technical Efficacy Thesaurus, Application Domain, Technology Topic, Popular Technical Reports.

A data allocation method for cpu-fpga heterogeneous multi-core system

AI Technical Summary This helps you quickly interpret patents by identifying the three key elements: Problems solved by technologyMethod usedBenefits of technology

Problems solved by technology

Method used

Image

Examples

Embodiment 1

Embodiment 2

PUM

Abstract

Description

Claims

Application Information

AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology