Matrix multiplication calculation method and device

A matrix multiplication calculation method and matrix multiplier, applicable to complex mathematical operations, processor architecture/configuration, and related fields. The technology addresses the problems of large chip-space occupation and low matrix multiplication efficiency. It reduces the number of memory accesses and the number of calculation steps, thereby improving efficiency.

Pending Publication Date: 2019-11-05
HUAWEI TECH CO LTD

AI Technical Summary

Problems solved by technology

[0005] Correspondingly, for the SM in the GPU, the matrix multiplier is an important component. It is the basis for the GPU to perform matrix multiplication operations using various algorithms. At presen...



Examples


Embodiment Construction

[0034] In a GPU, data storage is usually organized in banks. Figure 1 shows the structure of a bank group. As shown in Figure 1, a bank group is composed of several column storage blocks, and each column storage block is a bank; each storage block has a size of 32 bits or 64 bits. A bank group is row-contiguous by default: when values are assigned to the bank group, consecutive elements are stored consecutively by row. When an instruction is executed in the SM, the load/store unit (Load / Store Units, abbreviated LD / ST) loads data from video memory into the banks, and when an SP executes a specific calculation instruction, it must read the data from the banks. Therefore, an SM contains a large number of SPs and banks (usually, the number of SPs in an SM is the same as the number of bank groups), and each ...
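The row-contiguous layout described above can be illustrated with a small sketch. It assumes that element `i` of a linear array lands in bank `i % num_banks` at row `i // num_banks`. This mapping is a common convention, not something the excerpt states explicitly, and the function names are hypothetical.

```python
def bank_location(element_index, num_banks):
    """Return (row, bank) for a linear element index.

    Assumes row-contiguous storage: consecutive elements fill one row
    across all banks before moving on to the next row.
    """
    return divmod(element_index, num_banks)


def distinct_banks(indices, num_banks):
    """Return True if every index in `indices` falls in a different bank."""
    banks = [bank_location(i, num_banks)[1] for i in indices]
    return len(set(banks)) == len(banks)


if __name__ == "__main__":
    # With 4 banks, elements 0..3 occupy row 0 and elements 4..7 occupy row 1.
    for i in range(8):
        row, bank = bank_location(i, 4)
        print(f"element {i}: row {row}, bank {bank}")
```

Under this assumed mapping, a run of consecutive elements no longer than the number of banks always touches distinct banks, whereas elements a multiple of `num_banks` apart share a bank.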



Abstract

The invention provides a matrix multiplier. The fully connected network contained in an existing matrix multiplier occupies a large amount of chip space, and matrix multiplication requires a large number of memory accesses, so the matrix multiplication efficiency of a streaming multiprocessor is low. The objective of the invention is to improve the matrix multiplication efficiency of a graphics processor. The matrix multiplier provided by the invention exploits the fact that different groups of memory banks can be accessed at the same time: in each step, one row of elements of the multiplicand matrix and one column of elements of the multiplier matrix are loaded into the corresponding calculation units, and the calculations are carried out simultaneously. Using this matrix multiplier reduces both the number of steps required to complete a matrix multiplication and the number of memory accesses, thereby improving the matrix multiplication efficiency of the graphics processor.
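As a rough illustration of the row-and-column loading described in the abstract, the sketch below multiplies two matrices by fetching one whole row of the multiplicand and one whole column of the multiplier per step. It is only one possible reading: the function name, the step counter, and the way work is assigned to calculation units are assumptions, and the list comprehension merely stands in for units that would operate simultaneously in hardware.

```python
def matmul_row_column(a, b):
    """Multiply a (n x k) by b (k x m), one (row, column) pair per step.

    Returns the product and the number of (row, column) steps performed.
    """
    n, k, m = len(a), len(b), len(b[0])
    # Gather each column of b once so that a step can fetch it as a unit.
    columns = [[b[r][j] for r in range(k)] for j in range(m)]
    c = [[0] * m for _ in range(n)]
    steps = 0
    for i in range(n):
        row = a[i]  # one row of the multiplicand
        for j in range(m):
            col = columns[j]  # one column of the multiplier
            # The k products are independent; hardware units could
            # compute them at the same time. Here they run sequentially.
            products = [x * y for x, y in zip(row, col)]
            c[i][j] = sum(products)
            steps += 1
    return c, steps


if __name__ == "__main__":
    product, steps = matmul_row_column([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product, steps)
```

In this sketch each step consumes a full row and a full column rather than individual elements, which is the property the abstract credits with reducing the number of steps and memory accesses.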

Description

Technical field

[0001] The present invention relates to the technical field of graphics, and in particular to the technical field of matrix multiplication calculation.

Background

[0002] A graphics processing unit (GPU) is a microprocessor used to perform image calculations on devices such as hosts. In the GPU, the streaming multiprocessor (Streaming Multiprocessor, abbreviated SM) is a basic computing unit. It adopts a single-instruction, multiple-thread execution mode, which allows multiple threads to execute simultaneously. Roughly speaking, an SM includes an instruction buffer (Instruction Buffer), a warp scheduler (Warp Scheduler), an instruction dispatch unit (Dispatch Unit), stream processors (Streaming Processor, abbreviated SP), a double-precision floating-point unit (Double precision floating-point unit, abbr...


Application Information

IPC(8): G06T1/20, G06F17/16
CPC: G06T1/20, G06F17/16
Inventors: 方民权, 吴小蓉, 程剑
Owner: HUAWEI TECH CO LTD