SW26010-Pro processor-oriented high-performance implementation method for Level-1 and Level-2 BLAS function libraries

An implementation method and function library technology in the field of implementing the BLAS (Basic Linear Algebra Subprograms) library. It addresses the low performance of open-source math libraries, improving performance and resolving inter-thread data dependence problems.

Active Publication Date: 2021-11-12
INST OF SOFTWARE - CHINESE ACAD OF SCI
Cites: 13; cited by: 0

AI Technical Summary

Problems solved by technology

[0006] The present invention provides a high-performance implementation method of Level-1 and Level-2 BLAS function libraries for SW26010-Pro processors, so as to meet the demand for Level-1 and Level-2 BLAS functions on SW26010-Pro many-core processors and to solve the existing problem of low performance in open-source math libraries.




Embodiment Construction

[0059] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts fall within the protection scope of the present invention.

[0060] The high-performance implementation method of the present invention is characterized in that it comprises:

[0061] Feature 1. According to the scale of the input problem, the matrix or vector is partitioned into tasks, generating several subtasks, and each subtask is assigned to a thread.
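Feature 1 can be illustrated with a Level-2 example. The following is a minimal sketch, not the patented implementation: it partitions the rows of a general matrix-vector product y = A·x into contiguous subtasks and gives each subtask to one thread, using fewer threads when the problem is small. Portable `std::thread` stands in for the SW26010-Pro thread interface, and the threshold `kMinRowsPerThread`, the `RowRange` type and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical threshold: below this many rows per thread, extra threads are
// assumed to cost more than they save. Not taken from the patent.
constexpr std::size_t kMinRowsPerThread = 64;

struct RowRange {
    std::size_t begin;
    std::size_t end;  // exclusive
};

// Split rows [0, m) into near-equal contiguous subtasks, choosing the thread
// count from the problem scale (at most max_threads, at least 1).
std::vector<RowRange> partition_rows(std::size_t m, std::size_t max_threads) {
    const std::size_t by_size = (m + kMinRowsPerThread - 1) / kMinRowsPerThread;
    const std::size_t nthreads =
        std::max<std::size_t>(1, std::min(max_threads, by_size));
    std::vector<RowRange> parts;
    const std::size_t base = m / nthreads, rem = m % nthreads;
    std::size_t start = 0;
    for (std::size_t t = 0; t < nthreads; ++t) {
        const std::size_t len = base + (t < rem ? 1 : 0);  // spread remainder
        parts.push_back({start, start + len});
        start += len;
    }
    assert(start == m);  // every row belongs to exactly one subtask
    return parts;
}

// y = A * x for a row-major m-by-n matrix A (y must have m elements). Each
// thread owns one row range, so threads write disjoint slices of y.
void gemv_parallel(const std::vector<double>& A, const std::vector<double>& x,
                   std::vector<double>& y, std::size_t m, std::size_t n,
                   std::size_t max_threads) {
    std::vector<std::thread> workers;
    for (const RowRange& r : partition_rows(m, max_threads)) {
        workers.emplace_back([&, r] {
            for (std::size_t i = r.begin; i < r.end; ++i) {
                double acc = 0.0;
                for (std::size_t j = 0; j < n; ++j) acc += A[i * n + j] * x[j];
                y[i] = acc;
            }
        });
    }
    for (std::thread& w : workers) w.join();
}
```

Because each thread owns a disjoint slice of `y`, the per-thread results are spliced together simply by writing them in place; no reduction is needed for this operation.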

[0062] Feature 2. A thread reduction mechanism based on RMA communication and a thread communication mechanism based on point-to-point synchroniza...
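Operations that produce a single value, such as a dot product, need a reduction across threads. The sketch below assumes a binary-tree reduction: each thread computes a partial sum, then at each level a receiving thread waits on a per-partner flag (point-to-point synchronization) and reads the partner's partial sum from shared memory, which here stands in for an RMA (remote memory access) transfer. The patent's own mechanism is truncated above, so this is an illustrative assumption; `dot_tree_reduce` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: dot product x.y with a binary-tree reduction across
// nthreads threads. partial[t] holds thread t's running sum; done[t] becomes 1
// (release) once thread t will no longer modify partial[t].
double dot_tree_reduce(const std::vector<double>& x, const std::vector<double>& y,
                       std::size_t nthreads) {
    assert(x.size() == y.size() && nthreads > 0);
    const std::size_t n = x.size();
    std::vector<double> partial(nthreads, 0.0);
    std::vector<std::atomic<int>> done(nthreads);
    for (auto& d : done) d.store(0);

    auto work = [&](std::size_t t) {
        // Local phase: each thread sums its own contiguous chunk.
        const std::size_t begin = n * t / nthreads, end = n * (t + 1) / nthreads;
        double acc = 0.0;
        for (std::size_t i = begin; i < end; ++i) acc += x[i] * y[i];

        // Tree phase: at level `stride`, thread t receives from t + stride if
        // t is a multiple of 2*stride; otherwise it stops and publishes.
        for (std::size_t stride = 1; stride < nthreads; stride *= 2) {
            if (t % (2 * stride) != 0) break;
            const std::size_t partner = t + stride;
            if (partner < nthreads) {
                // Point-to-point wait on one partner only (no global barrier).
                while (done[partner].load(std::memory_order_acquire) == 0)
                    std::this_thread::yield();
                acc += partial[partner];  // stands in for an RMA read
            }
        }
        partial[t] = acc;
        done[t].store(1, std::memory_order_release);
    };

    std::vector<std::thread> workers;
    for (std::size_t t = 1; t < nthreads; ++t) workers.emplace_back(work, t);
    work(0);  // thread 0 finishes only after absorbing every other partial sum
    for (auto& w : workers) w.join();
    return partial[0];
}
```

A tree needs about log2(p) dependent steps for p threads instead of p - 1 sequential additions on one thread, and each wait involves a single partner rather than a global barrier.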


Abstract

The invention discloses a high-performance implementation method of Level-1 and Level-2 BLAS function libraries for the SW26010-Pro processor. The method comprises the following steps: dividing a problem into tasks to generate a plurality of sub-problems, where the structure of the problem is a vector, a general matrix, a symmetric matrix or a triangular matrix; if the structure is a vector, a general matrix or a symmetric matrix, assigning the operation of each sub-problem to a corresponding thread; if the structure is a triangular matrix, assigning the operation on the diagonal part of each sub-problem to thread 0 and the operation on the off-diagonal part to the other corresponding threads; and combining the results of all threads to obtain the solution of the problem. The method parallelizes BLAS Level-1 and Level-2 functions, resolves the data dependence between threads, and further improves function performance through an adaptive tuning mechanism.
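The triangular case can be sketched with a blocked forward substitution for a lower-triangular solve (TRSV-style). Assuming a row-major matrix and a hypothetical block size `nb`, the calling thread plays the role of thread 0 and solves each diagonal block, after which worker threads apply the off-diagonal block to the remaining rows in parallel. Joining the workers before the next diagonal block enforces the data dependence. This illustrates the idea only and is not the patent's code; `trsv_lower_blocked` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Solve L * x = b in place (b is overwritten with x) for a row-major,
// lower-triangular n-by-n matrix L. nb = block size, nworkers = number of
// threads used for the off-diagonal updates (both hypothetical parameters).
void trsv_lower_blocked(const std::vector<double>& L, std::vector<double>& b,
                        std::size_t n, std::size_t nb, std::size_t nworkers) {
    assert(nb > 0 && nworkers > 0);
    for (std::size_t k0 = 0; k0 < n; k0 += nb) {
        const std::size_t k1 = std::min(n, k0 + nb);

        // Diagonal part, done by "thread 0" (the caller): forward substitution
        // inside the block; b[k0..k1) already holds earlier blocks' updates.
        for (std::size_t i = k0; i < k1; ++i) {
            double s = b[i];
            for (std::size_t j = k0; j < i; ++j) s -= L[i * n + j] * b[j];
            assert(L[i * n + i] != 0.0);
            b[i] = s / L[i * n + i];
        }

        // Off-diagonal part: rows below the block subtract L[i][k0..k1) * x,
        // split into disjoint row ranges, one per worker thread.
        const std::size_t rows = n - k1;
        std::vector<std::thread> workers;
        for (std::size_t w = 0; w < nworkers; ++w) {
            const std::size_t r0 = k1 + rows * w / nworkers;
            const std::size_t r1 = k1 + rows * (w + 1) / nworkers;
            if (r0 == r1) continue;  // more workers than rows
            workers.emplace_back([&, r0, r1, k0, k1] {
                for (std::size_t i = r0; i < r1; ++i) {
                    double s = 0.0;
                    for (std::size_t j = k0; j < k1; ++j) s += L[i * n + j] * b[j];
                    b[i] -= s;
                }
            });
        }
        // The next diagonal block depends on these updates: wait for all.
        for (auto& t : workers) t.join();
    }
}
```

Spawning threads on every block step keeps the sketch short but is expensive; a tuned implementation would more likely keep persistent threads and signal block completion with point-to-point synchronization. The block size `nb` and the worker count are natural candidates for the adaptive tuning mentioned above, though the source does not say which parameters it tunes.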

Description

Technical field

[0001] The invention relates to the field of implementing the basic linear algebra library BLAS (Basic Linear Algebra Subprograms), and in particular to a high-performance method for implementing Level-1 and Level-2 BLAS function libraries for SW26010-Pro processors.

Background art

[0002] BLAS is a library of basic linear algebra subroutines, consisting mainly of basic vector and matrix operations. It is one of the most fundamental and important mathematical libraries and is widely used in scientific computing, weather forecasting, astrophysics and other fields. The BLAS library is at the core of many specialized software packages. In particular, Level-1 and Level-2 BLAS functions are called repeatedly by almost all applications involving matrix operations and by dense linear algebra packages (such as LAPACK and ScaLAPACK). Practice in numerical matrix analysis and deep learning has shown that BLAS Level-1 and Level-2 functions are of great significance...


Application Information

Patent type & authority: Application (China)
IPC (8): G06F17/16
CPC: G06F17/16; Y02D10/00
Inventors: Hu Yi (胡怡), Chen Daokun (陈道琨), Yang Chao (杨超), Liu Fangfang (刘芳芳), Ma Wenjing (马文静)
Owner: INST OF SOFTWARE - CHINESE ACAD OF SCI