Small and irregular matrix multiplication optimization method based on ARMv8 multi-core processor
An optimization method for small and irregular matrix multiplication on ARMv8 multi-core processors, applied in the field of high-performance computing. It addresses the difficulty that existing libraries have in adapting to small and irregular GEMMs and their low efficiency in solving them, with the effects of saving packing costs, optimizing performance, and promoting development.
Embodiment Construction
[0041] In order to better understand the contents of the present invention, an example is given here.
[0042] Figure 1 shows the kernel design for small matrix multiplication in NN mode; Figure 2 shows the packing microkernel design for matrix multiplication in NT mode; Figure 3 shows the edge microkernel design; Figure 4 is the microkernel flow chart of irregular matrix multiplication in NT mode; Figure 5 shows the performance of single-threaded small matrix multiplication (hot cache); Figure 6 shows the performance of single-threaded small matrix multiplication (cold cache); Figure 7 shows the performance of multithreaded irregular matrix multiplication on Phytium2000+; Figure 8 shows the performance of multithreaded irregular matrix multiplication on KP920 and Thunder X2; Figure 9 shows the performance of LibShalom on the matrices used in CP2K; Figure 10 shows the performance of LibShalom on the matrices used in VGG.
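For reference, the NN and NT modes named in the figure descriptions follow the usual BLAS convention: the two letters state whether A and B, respectively, are used as stored (N) or transposed (T). The following is a minimal unoptimized sketch of the two access patterns, not the microkernels of the invention. The function names `gemm_nn_ref` and `gemm_nt_ref`, the single-precision type, and the row-major storage are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* NN mode (illustrative): C (M x N) += A (M x K) * B (K x N).
 * All matrices are assumed row-major and contiguous. */
void gemm_nn_ref(size_t M, size_t N, size_t K,
                 const float *A, const float *B, float *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < M; ++i) {
        for (size_t p = 0; p < K; ++p) {
            float a = A[i * K + p];
            /* B is walked along a row, i.e. with unit stride. */
            for (size_t j = 0; j < N; ++j) {
                C[i * N + j] += a * B[p * N + j];
            }
        }
    }
}

/* NT mode (illustrative): C (M x N) += A (M x K) * B^T,
 * where B is stored as an N x K row-major matrix. */
void gemm_nt_ref(size_t M, size_t N, size_t K,
                 const float *A, const float *B, float *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            /* Row i of A and row j of B are both read with unit stride. */
            for (size_t p = 0; p < K; ++p) {
                acc += A[i * K + p] * B[j * K + p];
            }
            C[i * N + j] += acc;
        }
    }
}
```

Both routines accumulate into C, so the caller is expected to zero C first when a plain product is wanted. The only difference between the two modes is how B is indexed, which is why the memory access pattern, and hence the kernel design, differs between them.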
[0043] Aiming at the problem that the existing BLAS (Basic Linear Algebra Subprograms) library is difficult to adapt ...