High-performance realization method of BLAS (Basic Linear Algebra Subprograms) three-level function GEMM on the basis of SW platform

A linear algebra technology and implementation method, applied in electrical digital data processing, code compilation, program code conversion, etc., with the stated effects of facilitating instruction arrangement and reducing the number of instructions.

Active Publication Date: 2016-07-27
INST OF SOFTWARE - CHINESE ACAD OF SCI
Cites: 8; Cited by: 14

AI Technical Summary

Problems solved by technology

[0014] The problem solved by the invention: no BLAS math library has been specifically optimized for the Shenwei SW1600 platform, and open-source math libraries applied directly to the platform do not perform well. The Level 3 function of the basic linear alg...

Examples

Embodiment Construction

[0029] The embodiment of the present invention is a high-performance GEMM implementation for the domestic Shenwei SW1600 platform, which was designed and developed by the Jiangnan Institute of Computing Technology. The algorithm is designed for one CPU of the platform, and the function adopts a three-layer code design framework of "interface layer function, driver layer function, kernel (assembly core) layer function", as shown in Figure 1. The calling relationship is interface, then driver, then kernel, where the driver layer calls the kernel layer function multiple times. The specific implementation of the three-layer code design framework is as follows:

[0030] (1) Interface layer function: this layer is the function interface. It is responsible for checking the input parameters, mainly judging whether parameters such as the matrix sizes are legal, and returning an error code when the input parameters are illegal; judging A, For the transposition of...
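
The interface, driver, kernel layering described above can be sketched in portable C. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the error codes, the block size `BLOCK`, and the column-major, non-transposed layout are all hypothetical, and the kernel is plain C standing in for the hand-written assembly kernel. Transposition handling is omitted because the source text is truncated at that point.

```c
#include <stddef.h>

enum { BLOCK = 4 };  /* hypothetical block size, not taken from the source */

/* Kernel layer: plain-C stand-in for the assembly kernel.
   Computes C[mb x nb] += alpha * A[mb x k] * B[k x nb], column-major. */
static void gemm_kernel(int mb, int nb, int k, double alpha,
                        const double *a, int lda,
                        const double *b, int ldb,
                        double *c, int ldc)
{
    for (int j = 0; j < nb; ++j)
        for (int i = 0; i < mb; ++i) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p)
                acc += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
            c[i + (size_t)j * ldc] += alpha * acc;
        }
}

/* Driver layer: scales C by beta, then walks C block by block and
   calls the kernel once per block (so the kernel runs multiple times). */
static void gemm_driver(int m, int n, int k, double alpha,
                        const double *a, int lda,
                        const double *b, int ldb,
                        double beta, double *c, int ldc)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double *cij = &c[i + (size_t)j * ldc];
            *cij = (beta == 0.0) ? 0.0 : beta * *cij;
        }

    for (int j = 0; j < n; j += BLOCK) {
        int nb = (n - j < BLOCK) ? n - j : BLOCK;
        for (int i = 0; i < m; i += BLOCK) {
            int mb = (m - i < BLOCK) ? m - i : BLOCK;
            gemm_kernel(mb, nb, k, alpha,
                        a + i, lda,
                        b + (size_t)j * ldb, ldb,
                        c + i + (size_t)j * ldc, ldc);
        }
    }
}

/* Interface layer: validates the parameters and returns a (hypothetical)
   negative error code when they are illegal, 0 on success. */
int gemm_interface(int m, int n, int k, double alpha,
                   const double *a, int lda,
                   const double *b, int ldb,
                   double beta, double *c, int ldc)
{
    if (m < 0 || n < 0 || k < 0) return -1;
    if (lda < (m > 1 ? m : 1)) return -2;
    if (ldb < (k > 1 ? k : 1)) return -3;
    if (ldc < (m > 1 ? m : 1)) return -4;
    if (m == 0 || n == 0) return 0;  /* nothing to compute */

    gemm_driver(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
    return 0;
}
```

A caller only ever sees `gemm_interface`; keeping validation out of the driver and kernel means the inner layers can assume legal inputs and stay free of branches.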

Abstract

The invention proposes a high-performance implementation method for the BLAS (Basic Linear Algebra Subprograms) Level 3 function GEMM on the SW platform. Targeting the domestic SW1600 platform, it adopts a three-layer code design framework of "interface, driver, kernel (assembly core code)". Assembly-level manual optimization is carried out with techniques tied to the platform architecture, including multiply-add instructions, loop unrolling, software-pipelined instruction rearrangement, SIMD (Single Instruction Multiple Data) vector operations, and register blocking. This addresses the problem that a compiler cannot sufficiently optimize the compute-intensive function GEMM, and it greatly improves function performance. Compared with the open-source BLAS math library GotoBLAS, the method achieves an average speedup of 4.72 and a maximum speedup of 5.61.
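
Two of the techniques named in the abstract, register blocking and loop unrolling, can be illustrated in portable C. The sketch below is an assumption-laden stand-in, not the patent's SW1600 assembly: the function name `micro_kernel_4x4`, the 4x4 block shape, and the unroll factor of 2 are hypothetical. The 16 accumulators are meant to stay in registers across the whole k loop, and each `acc += a * b` is the multiply-add pattern that a multiply-add instruction would execute in one step.

```c
#include <stddef.h>

/* Hypothetical 4x4 register-blocked micro-kernel, column-major:
   C[4 x 4] += A[4 x k] * B[k x 4]. */
void micro_kernel_4x4(int k,
                      const double *a, int lda,
                      const double *b, int ldb,
                      double *c, int ldc)
{
    /* acc[j][i] accumulates C(i, j); small enough to live in registers. */
    double acc[4][4] = {{0.0}};
    int p = 0;

    /* Main loop over k, unrolled by 2: two rank-1 updates per iteration. */
    for (; p + 1 < k; p += 2) {
        const double *a0 = a + (size_t)p * lda;  /* column p of A     */
        const double *a1 = a0 + lda;             /* column p + 1 of A */
        for (int j = 0; j < 4; ++j) {
            double b0 = b[p + (size_t)j * ldb];
            double b1 = b[p + 1 + (size_t)j * ldb];
            for (int i = 0; i < 4; ++i) {
                acc[j][i] += a0[i] * b0;
                acc[j][i] += a1[i] * b1;
            }
        }
    }

    /* Remainder iteration when k is odd. */
    for (; p < k; ++p) {
        const double *a0 = a + (size_t)p * lda;
        for (int j = 0; j < 4; ++j) {
            double b0 = b[p + (size_t)j * ldb];
            for (int i = 0; i < 4; ++i)
                acc[j][i] += a0[i] * b0;
        }
    }

    /* Write the block back to C once, after all of k has been consumed. */
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            c[i + (size_t)j * ldc] += acc[j][i];
}
```

Holding the block of C in accumulators means each element of C is loaded and stored once per call rather than once per k step, which is the point of register blocking; unrolling exposes independent multiply-adds that a scheduler (or a human writing assembly) can interleave.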

Description

Technical field

[0001] The present invention relates to a high-performance implementation method for general matrix multiplication (GEMM) in the basic linear algebra library BLAS, one of the most basic mathematical libraries widely used in scientific and engineering computing, which mainly comprises basic vector and matrix operations. The method improves function performance through a series of optimizations tied to the platform architecture.

Background

[0002] BLAS (Basic Linear Algebra Subprograms) is a collection of core linear algebra subprograms, mainly comprising basic vector and matrix operations. It is one of the most basic and important mathematical libraries widely used in scientific and engineering computing. At present, almost all software involving matrix operations calls a BLAS library; the bottom layers of important dense linear algebra software packages (such as EISPACK, LINPACK, LAPACK and ScaLAPACK, etc.) are all supporte...

Application Information

IPC(8): G06F9/45, G06F17/15, G06F17/16
CPC: G06F8/44, G06F8/441, G06F17/15, G06F17/16
Inventor: 刘昊, 杨超, 刘芳芳, 赵玉文, 张鹏, 孙乔
Owner: INST OF SOFTWARE - CHINESE ACAD OF SCI