Single-accuracy matrix multiplication optimization method based on loongson chip 3A

A matrix multiplication optimization method, applied in the field of electrical digital data processing, that addresses the low performance of single-precision matrix multiplication, improving operation efficiency and overcoming invalid prefetching.

Inactive Publication Date: 2011-10-12

UNIV OF SCI & TECH OF CHINA

1 Cites, 33 Cited by

Problems solved by technology

Loongson 3A provides distinctive instructions such as 128-bit memory access and parallel single-precision floating-point operations. However, the basic linear algebra subroutine library GotoBLAS, developed by the High Performance Computing Group of the Supercomputing Center at the University of Texas at Austin, is not specifically optimized for the characteristics of Loongson 3A, so its single-precision matrix multiplication does not perform well on the Loongson 3A platform.

Embodiment 1

[0016] The present invention is a single-precision matrix multiplication optimization method based on Loongson 3A. First, the two single-precision source matrices are divided into two sub-matrices, following the principle that the block size is no larger than half of the first-level cache and no larger than half of the second-level cache, respectively. Next, the 128-bit memory access instructions and parallel single-precision floating-point instructions of Loongson 3A are used in the matrix multiplication core calculation code, which is built on Loongson 3A's 32-bit memory access instructions, single-precision floating-point multiply-add instructions and prefetch instructions. Finally, data is prefetched using a prefetch address calculated from twice the size of the operation data set minus the size of one operation data unit.
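
The prefetch address rule in [0016] can be sketched in C as follows. This is an illustration under stated assumptions, not the patent's code: the helper names `prefetch_address` and `prefetch_next_set` are hypothetical, and adding the offset to the first address of the operation data set is one reading that combines [0016] with the Abstract. Under that reading, the target is the last data unit of the data set that follows the current one. `__builtin_prefetch` is the GCC/Clang builtin, used here in place of a Loongson prefetch instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (assumed reading of the rule, not the patent's code):
 *   prefetch address = set_base + 2 * set_bytes - unit_bytes
 * which is the start of the last data unit of the NEXT data set. */
static inline uintptr_t prefetch_address(uintptr_t set_base,
                                         size_t set_bytes,
                                         size_t unit_bytes)
{
    assert(unit_bytes <= set_bytes); /* a unit is a piece of the set */
    return set_base + 2 * (uintptr_t)set_bytes - (uintptr_t)unit_bytes;
}

/* Issue a read prefetch hint for the address computed above.
 * A prefetch hint does not fault, even if the address is not mapped. */
static inline void prefetch_next_set(const float *set,
                                     size_t set_floats,
                                     size_t unit_floats)
{
    uintptr_t addr = prefetch_address((uintptr_t)set,
                                      set_floats * sizeof(float),
                                      unit_floats * sizeof(float));
    __builtin_prefetch((const void *)addr, 0, 3);
}
```

With a data set of 64 bytes starting at address 1000 and a 16-byte data unit, the helper returns 1000 + 128 - 16 = 1112.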

[0017] In this embodiment, the two single-precision source matrices of Loongson 3A are first divided into two...
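
A minimal sketch of the blocking principle, assuming row-major storage and square blocks. The names `max_block_dim` and `sgemm_blocked` are hypothetical, the cache size is a parameter rather than a stated Loongson 3A value, and a single block size is used for simplicity, whereas the patent sizes one division against the first-level cache and the other against the second-level cache.

```c
#include <assert.h>
#include <stddef.h>

/* Largest n such that an n x n block of floats occupies at most half of
 * a cache of cache_bytes bytes (hypothetical helper). */
static inline size_t max_block_dim(size_t cache_bytes)
{
    size_t budget = cache_bytes / 2 / sizeof(float); /* floats that fit */
    size_t n = 0;
    while ((n + 1) * (n + 1) <= budget)
        n++;
    return n;
}

static inline size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* C += A * B with A (m x k), B (k x n), C (m x n), all row-major.
 * The three outer loops walk over bs x bs blocks so that the working
 * set of the inner loops stays within the chosen cache budget. */
static inline void sgemm_blocked(size_t m, size_t n, size_t k,
                                 const float *a, const float *b,
                                 float *c, size_t bs)
{
    assert(bs > 0);
    for (size_t ii = 0; ii < m; ii += bs)
        for (size_t pp = 0; pp < k; pp += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t i_end = min_size(ii + bs, m);
                size_t p_end = min_size(pp + bs, k);
                size_t j_end = min_size(jj + bs, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t p = pp; p < p_end; p++) {
                        float aip = a[i * k + p];
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aip * b[p * n + j];
                    }
            }
}
```

For example, with 4-byte floats, `max_block_dim(64 * 1024)` gives 90, because a 90 x 90 block takes 32,400 bytes and a 91 x 91 block would exceed the 32,768-byte half-cache budget.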

Abstract

The invention discloses a single-precision matrix multiplication optimization method based on Loongson 3A. The method comprises the following steps: dividing the two single-precision source matrices into two sub-matrices according to the principle that the block size is no larger than half of the first-level cache and no larger than half of the second-level cache, respectively; using the 128-bit memory access instructions and parallel single-precision floating-point instructions of Loongson 3A in the matrix multiplication core computation code, which is built on Loongson 3A's 32-bit memory access instructions, single-precision floating-point multiply-add instructions and prefetch instructions; and prefetching data with a prefetch address calculated from the first address of the operation data set minus the size of an operation data unit, so that the floating-point unit can run at close to full load. The method overcomes invalid prefetching of address-unaligned data, so the execution efficiency of address-unaligned single-precision matrix multiplication approaches that of the address-aligned case. Compared with the basic linear algebra subroutine library GotoBLAS version 2-1.07, single-precision matrix multiplication optimized by this method runs more than 90 percent faster on average.
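
To illustrate why a 128-bit access helps, the sketch below processes four single-precision values per step, since four 32-bit floats fill 128 bits. It is a portable C stand-in, not Loongson 3A assembly: `quad_f32`, `load_quad` and `saxpy_quad` are hypothetical names, and whether the compiler emits wide loads or parallel floating-point instructions depends on the target and flags. Copying through `memcpy` keeps the access valid for addresses that are not 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 128-bit group: four 32-bit floats. */
typedef struct {
    float v[4];
} quad_f32;

/* Load four floats from a possibly unaligned address. */
static inline quad_f32 load_quad(const float *p)
{
    quad_f32 q;
    memcpy(q.v, p, sizeof q.v);
    return q;
}

/* y[i] += alpha * x[i] for i in [0, n): four lanes per step, then a
 * scalar tail for the remaining 0 to 3 elements. */
static inline void saxpy_quad(size_t n, float alpha,
                              const float *x, float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        quad_f32 xv = load_quad(x + i);
        quad_f32 yv = load_quad(y + i);
        for (int lane = 0; lane < 4; lane++)
            yv.v[lane] += alpha * xv.v[lane]; /* multiply-add per lane */
        memcpy(y + i, yv.v, sizeof yv.v);     /* one 128-bit store */
    }
    for (; i < n; i++)
        y[i] += alpha * x[i];
}
```

Compared with a loop that loads one float at a time, this form issues one load per four elements, which is the kind of saving the 128-bit memory access instructions are meant to provide.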

Description

Technical field

[0001] The invention belongs to the technical field of electrical digital data processing, and in particular relates to an optimization method for single-precision matrix multiplication based on Loongson 3A.

Background

[0002] Loongson 3A is China's first quad-core central processing unit (CPU) with completely independent intellectual property rights. In the field of high-performance computing, Loongson 3A needs the support of a basic linear algebra subroutine library. A comparatively good library available on Loongson 3A is GotoBLAS, developed by the High Performance Computing Group of the Supercomputing Center at the University of Texas at Austin. GotoBLAS achieves efficient single-precision matrix by using optimization te...

Application Information

IPC IPC(8): G06F17/16

Inventor 顾乃杰, 何颂颂, 张斌, 许耿纯

Owner UNIV OF SCI & TECH OF CHINA
