Small and irregular matrix multiplication optimization method based on ARMv8 multi-core processor

An optimization method for multi-core processors, applied in the field of high-performance computing. It addresses the difficulty of adapting existing libraries to small and irregular GEMMs and their low efficiency on such problems, with the effect of saving packing overhead, optimizing performance, and promoting the development of applications on the processor.

Active Publication Date: 2022-01-28
NAT UNIV OF DEFENSE TECH
Cites: 2; Cited by: 0

AI Technical Summary

Problems solved by technology

Existing BLAS (Basic Linear Algebra Subprograms) libraries are difficult to adapt to small and irregular GEMMs, and they solve such GEMMs inefficiently on ARMv8-based CPU architectures. To address this problem, the invention discloses a small and irregular matrix multiplication optimization method based on an ARMv8 multi-core processor.


Examples

Embodiment Construction

[0041] In order to better understand the contents of the present invention, an example is given here.

[0042] Figure 1 shows the kernel design for small matrix multiplication in NN mode; Figure 2 shows the packing microkernel design for matrix multiplication in NT mode; Figure 3 shows the edge microkernel design; Figure 4 is the microkernel flow chart for irregular matrix multiplication in NT mode; Figure 5 shows the performance of single-threaded small matrix multiplication (hot cache); Figure 6 shows the performance of single-threaded small matrix multiplication (cold cache); Figure 7 shows the performance of multithreaded irregular matrix multiplication on Phytium2000+; Figure 8 shows the performance of multithreaded irregular matrix multiplication on KP920 and Thunder X2; Figure 9 shows the performance of LibShalom on the matrices used in CP2K; Figure 10 shows the performance of LibShalom on the matrices used in VGG.

[0043] Aiming at the problem that the existing BLAS (Basic Linear Algebra Subprograms) library is difficult to adapt ...

Abstract

The invention discloses a small and irregular matrix multiplication optimization method based on an ARMv8 multi-core processor. The method is implemented on the ARMv8 multi-core processor and comprises the following steps: a matrix storage space is established for the result matrix C obtained by multiplying matrix A by matrix B; the processor then packs matrix B, performing the packing operation at the same time as the computation of the small matrix multiplication. The method selects a different packing strategy for each matrix multiplication mode to save packing overhead, uses a more efficient edge microkernel to handle edge cases, and adopts a more reasonable parallelization method for matrix multiplication. Together these greatly improve the performance of small and irregular matrix multiplication on ARMv8 multi-core processors, which in turn can promote the development of other practical applications on ARMv8 multi-core processors.
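The idea of packing matrix B while the multiplication is already running, together with separate handling of edge blocks, can be illustrated with a minimal pure-Python sketch. This is not the patented ARMv8 implementation: the function name `gemm_pack_on_the_fly` and the block sizes `mr` and `nr` are hypothetical, and the sketch only shows the control flow (pack each panel of B during its first use, reuse the packed copy afterwards, shrink the block at the matrix edges).

```python
def gemm_pack_on_the_fly(A, B, mr=4, nr=4):
    """Return C = A * B for row-major lists of lists.

    Illustrative only: mr and nr are hypothetical block sizes, not
    values taken from the patent.
    """
    M, K = len(A), len(A[0])
    N = len(B[0])
    C = [[0.0] * N for _ in range(M)]  # storage for the result matrix C

    for j0 in range(0, N, nr):
        nb = min(nr, N - j0)       # edge panel may be narrower than nr
        packed = [0.0] * (K * nb)  # contiguous K x nb copy of this B panel

        for i0 in range(0, M, mr):
            mb = min(mr, M - i0)   # edge block may be shorter than mr
            first_pass = (i0 == 0)

            for k in range(K):
                if first_pass:
                    # Pack row k of the panel at the moment it is first
                    # needed, so packing and computation are interleaved.
                    packed[k * nb:(k + 1) * nb] = B[k][j0:j0 + nb]
                b_row = packed[k * nb:(k + 1) * nb]

                # Rank-1 update of the mb x nb block of C.
                for i in range(mb):
                    a = A[i0 + i][k]
                    c_row = C[i0 + i]
                    for j in range(nb):
                        c_row[j0 + j] += a * b_row[j]
    return C
```

Calling `gemm_pack_on_the_fly(A, B)` with a 5 x 3 matrix `A` and a 3 x 7 matrix `B` exercises both edge paths, since neither 5 nor 7 is a multiple of the default block size of 4. A real kernel would replace the inner loops with ARMv8 SIMD instructions and choose block sizes to fit the register file and caches.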

Description

Technical field

[0001] The invention relates to the field of high-performance computing, in particular to an optimization method for small and irregular matrix multiplication based on an ARMv8 multi-core processor.

Background technique

[0002] General matrix multiplication (GEMM) is a fundamental building block from traditional scientific simulations to emerging high-performance computing (HPC) applications of deep learning. How to optimize GEMMs is a heavily studied area, but existing linear algebra library methods are mainly for large and regularly shaped GEMMs (i.e. when the two dimensions of the matrix are more or less the same).

[0003] Due to the variety and continuous evolution of HPC workloads, the size and shape of the input matrix to the GEMM kernel may vary depending on the application algorithm and input data used. For example, computational fluid dynamics (CFD), such as the finite element method and the wave equation, are often implemented with GEMMs operatin...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/16, G06F7/57, G06F9/30
CPC: G06F17/16, G06F7/57, G06F9/30101, Y02D10/00
Inventors: 董德尊, 方建滨, 杨维玲, 苏醒, 庞征斌
Owner NAT UNIV OF DEFENSE TECH