A sparse matrix storage method on a SIMD many-core processor with multi-level cache

A sparse matrix storage technology for many-core processors, applied in the field of parallel programming. It addresses cache misses, high memory access latency overhead, a low reuse rate of x-vector data, and low SIMD utilization, and it yields denser sub-blocks, higher SIMD utilization, and improved computing efficiency.

Active Publication Date: 2017-07-25
UNIV OF SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

To obtain high sparse matrix-vector multiplication performance on this kind of SIMD processor with multi-level Cache, it is necessary to overcome the computing bottlenecks caused by the irregular distribution of the non-zero elements of the sparse matrix:
[0005] (1) SIMD utilization is low;
[0006] (2) The reuse rate of data in the x vector is low, which makes cache misses and memory access latency overhead very large;


Examples


Embodiment Construction

[0027] This section applies the present invention to a typical sparse matrix-vector multiplication on a SIMD many-core processor with multi-level Cache, in order to further describe the object, advantages, and key technical features of the invention. This implementation is only a typical example of the solution; any technical solution formed by replacement or equivalent transformation falls within the scope of protection claimed by the present invention.

[0028] For a sparse matrix A to be computed:

[0029]

[0030] First, following the optimization method proposed by the present invention, the matrix A undergoes feature extraction, matrix scanning, column blocking, column compression, row blocking, and row-by-row storage, and is thereby converted to the ERB storage format shown in Figure 5; alternatively, the sparse matrix to be calculated is stored directly in the Figure 5 storage format and saved to a file.
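The first stages of this conversion can be sketched with the padding rule that the abstract describes. This is a minimal sketch, not the patented implementation: the function name `pad_rows`, the dense list-of-lists input, and the reading of "larger than a" as "not smaller than a" are assumptions made for illustration.

```python
def pad_rows(dense, b):
    """Build padded Value and Colidx arrays from a dense row-major matrix.

    `dense` and the name `pad_rows` are illustrative assumptions.
    """
    # Collect (value, column) pairs of the nonzeros of each row, in column order.
    rows = [[(v, j) for j, v in enumerate(row) if v != 0] for row in dense]
    a = max(len(r) for r in rows)  # largest nonzero count in any row
    # Temporary row width: smallest multiple of b that is >= a (assumed reading).
    width = -(-a // b) * b
    # Pad short rows at the tail: 0 in Value, -1 in Colidx.
    value = [[v for v, _ in r] + [0] * (width - len(r)) for r in rows]
    colidx = [[j for _, j in r] + [-1] * (width - len(r)) for r in rows]
    return value, colidx, width


A = [[1, 0, 0, 2],
     [0, 0, 3, 0],
     [4, 5, 6, 0],
     [0, 0, 0, 0]]
value, colidx, width = pad_rows(A, b=2)
```

With b = 2 the longest row of this example holds three nonzeros, so every row is padded to a temporary row width of 4.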

[0031] When calculating, read from the file Figure 5 The sparse matr...


Abstract

The invention discloses a storage method for a sparse matrix on a SIMD many-core processor with a multi-level cache. The method includes the following steps: first, the maximum number a of non-zero elements in any row of a matrix A and the number b of non-zero elements that the processor's SIMD unit can process simultaneously are obtained, and the smallest multiple of b that is larger than a is taken as a temporary row width; second, for the matrix A, the arrays Value and Colidx store, row by row and in order, the non-zero element values and their column coordinates, respectively, and every row whose number of non-zero elements does not reach the temporary row width is padded at its tail with 0 in Value and -1 in Colidx; third, Colidx and Value are partitioned into column blocks of b columns each; fourth, each column block is compressed so that the rows containing non-zero elements are gathered in the upper part of the column block; fifth, the column blocks are partitioned every b rows to obtain sub-blocks; finally, all-zero sub-blocks are removed and the sub-blocks are stored row by row. The method has the advantages that the sparse matrix is divided into dense sub-blocks, so the utilization of the processor's SIMD processing units and registers increases while the locality of the non-zero elements is preserved, and sparse matrix-vector multiplication performance improves.
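The blocking and compression steps can be illustrated on already padded Value and Colidx arrays. The sketch below is one possible reading of the abstract, not the patent's reference implementation: the function name `to_sub_blocks`, the dictionary layout, and the `rows` list that records each compressed row's original index (needed later to know which entry of y a row updates) are all assumptions.

```python
def to_sub_blocks(value, colidx, b):
    """Split padded Value/Colidx arrays into dense b-by-b sub-blocks.

    The returned layout (dicts with "rows", "values", "colidx") is hypothetical.
    """
    n_rows, width = len(value), len(value[0])
    blocks = []
    for c0 in range(0, width, b):  # column blocks, b columns each
        # Compression: keep only rows that hold a stored element in this
        # column block, in order, so they gather at the top of the block.
        live = [r for r in range(n_rows)
                if any(c != -1 for c in colidx[r][c0:c0 + b])]
        # Row blocking: cut the compressed column block every b rows.
        # Empty rows were dropped above, so no all-zero sub-block is emitted.
        for r0 in range(0, len(live), b):
            rows = live[r0:r0 + b]
            vals = [value[r][c0:c0 + b] for r in rows]
            cols = [colidx[r][c0:c0 + b] for r in rows]
            # Pad a short final sub-block so every sub-block is b x b.
            while len(rows) < b:
                rows.append(-1)
                vals.append([0] * b)
                cols.append([-1] * b)
            blocks.append({"rows": rows, "values": vals, "colidx": cols})
    return blocks


# Padded arrays for a 4x4 example matrix with b = 2 (temporary row width 4).
value = [[1, 2, 0, 0], [3, 0, 0, 0], [4, 5, 6, 0], [0, 0, 0, 0]]
colidx = [[0, 3, -1, -1], [2, -1, -1, -1], [0, 1, 2, -1], [-1, -1, -1, -1]]
blocks = to_sub_blocks(value, colidx, b=2)
```

Each sub-block holds b rows of b values stored row by row, so one row maps onto one SIMD operation of width b; padded slots carry a column index of -1 and contribute nothing to the product.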

Description

Technical field

[0001] The invention relates to the field of parallel program design, in particular to a sparse matrix storage method on a SIMD many-core processor with multi-level Cache.

Background technique

[0002] Sparse matrix-vector multiplication (SpMV) is an important computational kernel of many scientific and engineering applications, and its efficiency is key to the computational performance of those applications. The main function of the algorithm is to calculate y=y+Ax, where A is a two-dimensional sparse matrix, and both x and y are one-dimensional dense vectors. However, on modern SIMD many-core processors with multi-level Cache, the irregular distribution of the non-zero elements of the sparse matrix makes SIMD utilization very low, resulting in poor SpMV performance. To improve the performance of the algorithm, we often need to comprehensively consider the characteristics of th...
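To make the y=y+Ax kernel and its irregular access pattern concrete, here is a minimal baseline sketch. It uses the common CSR layout only as a familiar point of comparison; the source text does not name CSR, and the function name `spmv_csr` and its parameter names are assumptions.

```python
def spmv_csr(row_ptr, col_idx, values, x, y):
    """Accumulate y = y + A x in place, with A given in CSR form (assumed layout)."""
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read through col_idx, so the access pattern depends on
            # where the nonzeros happen to fall in row i.
            acc += values[k] * x[col_idx[k]]
        y[i] += acc
    return y


# 4x4 example with nonzeros (0,0)=1, (0,3)=2, (1,2)=3, (2,0)=4, (2,1)=5, (2,2)=6.
row_ptr = [0, 2, 3, 6, 6]
col_idx = [0, 3, 2, 0, 1, 2]
values = [1, 2, 3, 4, 5, 6]
y = spmv_csr(row_ptr, col_idx, values, x=[1, 2, 3, 4], y=[1, 1, 1, 1])
```

Rows here hold two, one, three, and zero nonzeros, so a fixed-width SIMD unit would be partly idle on most rows, and the reads of x jump between unrelated positions.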


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F12/0811, G06F9/38, G06F12/1009
Inventor: 韩文廷, 张爱民, 江霞, 安虹, 陈俊仕, 孙荪, 汪朝辉
Owner: UNIV OF SCI & TECH OF CHINA