
A GPU acceleration method for multiplying a large-scale sparse matrix by a transposed matrix of the large-scale sparse matrix

A technique for multiplying a sparse matrix by its transpose, applied in the fields of multiprogramming arrangements, program synchronization, and data processing according to predetermined rules. It addresses programs that cannot fully exploit the GPU because their thread design lacks in-depth optimization, and it reduces both the floating-point computation and the time the operation requires.

Pending Publication Date: 2019-04-09
SOUTHEAST UNIV +3

AI Technical Summary

Problems solved by technology

The operation of multiplying a large sparse matrix by its transpose can be completed quickly through reasonable scheduling between the CPU and the GPU. Researchers in China and abroad have begun to study this multiplication on the GPU, but without optimizing the thread design in depth: existing work derives the thread design only from the distribution of the computational load and does not examine the per-thread computation method or the data indexing method, so the resulting programs cannot take full advantage of the GPU.



Examples


Embodiment Construction

[0044] The technical solution of the present invention will be further described below in conjunction with the accompanying drawings.

[0045] As shown in Figure 3, the GPU acceleration method of the present invention for multiplying a large sparse matrix by its transpose comprises the following steps:

[0046] (1) On the CPU, the large sparse matrix A is stored in the CSR (compressed sparse row) storage format, which consists of three vectors: the row offsets A_RowPtr, the column indices A_ColInd, and the values A_Val;
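To make the three CSR vectors in step (1) concrete, here is a minimal sketch in plain Python. The helper name dense_to_csr and the 3×3 example matrix are illustrative assumptions, not part of the patent text; the vector names follow the source.

```python
def dense_to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR vectors.

    Hypothetical helper for illustration only; zero-based indexing assumed.
    """
    row_ptr = [0]   # row_ptr[i] is the offset of the first stored entry of row i
    col_ind = []    # column index of each stored (nonzero) entry
    val = []        # value of each stored (nonzero) entry
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                col_ind.append(j)
                val.append(x)
        row_ptr.append(len(val))  # offset one past the last entry of this row
    return row_ptr, col_ind, val


# Small example matrix (an assumption, chosen only to keep the output short).
A = [
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 0.0],
    [4.0, 0.0, 5.0],
]
A_RowPtr, A_ColInd, A_Val = dense_to_csr(A)
print(A_RowPtr)  # [0, 2, 3, 5]
print(A_ColInd)  # [0, 2, 1, 0, 2]
print(A_Val)     # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Row i of A occupies positions A_RowPtr[i] up to, but not including, A_RowPtr[i + 1] in A_ColInd and A_Val, so A_RowPtr always has one more element than A has rows.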

[0047] (2) From the CPU, call the cuSPARSE function cusparseDcsrgemm2 to execute A×A^T, obtain the CSR sparse storage format of the sparse matrix C, and generate its COO sparse storage format; the CSR format of C consists of the row offsets C_RowPtr, the column indices C_ColInd, and the values C_Val, and the row indices in the COO format of C are C_RowInd...
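The COO format mentioned in step (2) differs from CSR only in that it stores an explicit row index for every nonzero instead of per-row offsets. A minimal sketch of that expansion follows, assuming zero-based indexing; the function name csr_to_coo_rows and the sample C_RowPtr are hypothetical, and the source does not say how the patent performs this conversion.

```python
def csr_to_coo_rows(row_ptr):
    """Expand CSR row offsets into one explicit row index per stored entry."""
    row_ind = []
    for i in range(len(row_ptr) - 1):
        count = row_ptr[i + 1] - row_ptr[i]  # number of stored entries in row i
        row_ind.extend([i] * count)
    return row_ind


# Sample row offsets for a 3x3 matrix C with 5 nonzeros (assumed for illustration).
C_RowPtr = [0, 2, 3, 5]
C_RowInd = csr_to_coo_rows(C_RowPtr)
print(C_RowInd)  # [0, 0, 1, 2, 2]
```

With C_RowInd available, the k-th nonzero of C can be located directly as (C_RowInd[k], C_ColInd[k]) without searching C_RowPtr, which is presumably why the COO row indices are generated alongside the CSR form.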



Abstract

The invention discloses a GPU acceleration method for multiplying a large sparse matrix by its transpose. The method comprises the following steps: storing the large sparse matrix A in the CSR sparse storage format on the CPU; calling the cusparseDcsrgemm2 function from the CPU to execute A×A^T, obtaining the CSR sparse storage format of the sparse matrix C and generating its COO sparse storage format; transferring the data required by the GPU kernel function from the CPU to the GPU; and executing the kernel function SparseMM on the GPU to compute C = A×A^T, the product of the sparse matrix and its transpose. By having the CPU control the program flow and handle basic data processing while the GPU handles the intensive floating-point operations, the method makes multiplying a large sparse matrix by its transpose more efficient and addresses the high time cost of computing the information matrix in power system state estimation.
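The arithmetic behind C = A×A^T can be sketched sequentially: entry (i, j) of C is the dot product of rows i and j of A. The Python reference below fills one value per known nonzero position of C. It is not the patent's SparseMM kernel, whose thread design is not given in this excerpt; mapping one loop iteration to one GPU thread is an assumption, as are the function names and the requirement that column indices within each row are sorted.

```python
def sparse_row_dot(row_ptr, col_ind, val, i, j):
    """Dot product of rows i and j of a CSR matrix (sorted column indices assumed)."""
    p, p_end = row_ptr[i], row_ptr[i + 1]
    q, q_end = row_ptr[j], row_ptr[j + 1]
    total = 0.0
    while p < p_end and q < q_end:
        if col_ind[p] == col_ind[q]:      # both rows have an entry in this column
            total += val[p] * val[q]
            p += 1
            q += 1
        elif col_ind[p] < col_ind[q]:     # advance whichever row is behind
            p += 1
        else:
            q += 1
    return total


def sparse_mm_reference(a_row_ptr, a_col_ind, a_val, c_row_ind, c_col_ind):
    """Compute the value of every known nonzero position of C = A x A^T.

    Each list element corresponds to one (row, column) pair of C; on a GPU,
    each pair could be handled by its own thread (an assumption, not the source's design).
    """
    return [
        sparse_row_dot(a_row_ptr, a_col_ind, a_val, i, j)
        for i, j in zip(c_row_ind, c_col_ind)
    ]


# A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form (example data, assumed).
A_RowPtr = [0, 2, 3, 5]
A_ColInd = [0, 2, 1, 0, 2]
A_Val = [1.0, 2.0, 3.0, 4.0, 5.0]
# Nonzero positions of C in COO form (row index, column index).
C_RowInd = [0, 0, 1, 2, 2]
C_ColInd = [0, 2, 1, 0, 2]

C_Val = sparse_mm_reference(A_RowPtr, A_ColInd, A_Val, C_RowInd, C_ColInd)
print(C_Val)  # [5.0, 14.0, 9.0, 14.0, 41.0]
```

Because A^T is never formed explicitly, only the CSR vectors of A and the index vectors of C are needed, which matches the abstract's description of transferring just the data the kernel requires.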

Description

Technical field

[0001] The invention belongs to the application field of high-performance computing in power systems, and in particular relates to a GPU acceleration method for multiplying a large sparse matrix by its transpose.

Background art

[0002] Power system state estimation is an important part of the energy management system (EMS) in modern power dispatching systems and is the basis of power system dispatching, control, and security assessment. The functions of the energy management system fall into two parts: online applications that analyze real-time changes in the power grid, and offline applications that analyze typical power flow sections. As the basis of many advanced online applications, the main function of state estimation is to filter the real-time information provided by the supervisory control and data acquisition (SCADA) system, use redundancy to improve data accuracy, automatically eliminate interference and n...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/52, G06F7/78
CPC: G06F7/78, G06F9/52
Inventor: 周赣, 姚瑶, 冯燕钧, 傅萌, 张涛, 鹿军, 贺欢, 李强, 李静
Owner: SOUTHEAST UNIV