Matrix multiply with reduced bandwidth requirements

A technique for reducing the memory bandwidth needed to read the input operands of a matrix multiply operation. It addresses the problem that operand bandwidth limits the overall computing performance of matrix multiplication on multi-threaded and vector processors.

Active Publication Date: 2007-11-21
NVIDIA CORP

AI Technical Summary

Problems solved by technology

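Each multiply-add operation in a matrix-matrix multiplication requires two source operands from memory, so T threads (or T vector lanes) executing simultaneously require 2T memory operands. The memory bandwidth needed for these accesses limits the overall computational performance of the processing device for matrix multiplication.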

Embodiment Construction

[0013] In the following description, numerous specific details are set forth to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the present invention.

[0014] FIG. 1A illustrates a conceptual diagram of matrix A 101 and matrix B 102 that are multiplied to produce matrix C 103 in accordance with one or more aspects of the disclosure. Conventionally, a dot product is computed using the elements in the rows of matrix A 101 and the columns of matrix B 102 to produce the elements in the columns of matrix C 103. For example, elements in row 107 of matrix A 101 and elements in column 105 of matrix B 102 (e.g., 131, 132 and 146) are used to generate element 152 in column 104 of matrix C 103. When multiple threads of execution are used...
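The conventional scheme described in [0014] can be sketched in Python as follows. This is a minimal illustration, not code from the patent: the function name `matmul_dot_product` and the `reads` counter are assumptions introduced here to make the operand traffic visible.

```python
def matmul_dot_product(A, B):
    """Conventional matrix multiply: each element of C is a row-by-column dot product.

    A is a list of rows (n_rows x n_inner), B is a list of rows (n_inner x n_cols).
    Returns (C, reads), where reads counts input-matrix elements fetched.
    The read counter is an illustrative assumption, not part of the patent text.
    """
    n_rows, n_inner, n_cols = len(A), len(B), len(B[0])
    C = [[0.0] * n_cols for _ in range(n_rows)]
    reads = 0
    for i in range(n_rows):
        for j in range(n_cols):
            acc = 0.0
            for k in range(n_inner):
                # Every multiply-add fetches one element of A and one of B.
                acc += A[i][k] * B[k][j]
                reads += 2
            C[i][j] = acc
    return C, reads


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    B = [[7, 8], [9, 10], [11, 12]]
    C, reads = matmul_dot_product(A, B)
    print(C, reads)  # reads is 2 * n_rows * n_inner * n_cols = 24 here
```

In this form every multiply-add fetches two operands, which is the per-operation cost the background section describes.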

Abstract

Systems and methods for reducing the bandwidth needed to read the inputs to a matrix multiply operation may improve system performance. Rather than reading a row of a first input matrix and a column of a second input matrix to produce a column of a product matrix, a column of the first input matrix and a single element of the second input matrix are read to produce a column of partial dot products of the product matrix. Therefore, the number of input matrix elements read to produce each product matrix element is reduced from 2N to N+1, where N is the number of elements in a column of the product matrix.
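The approach summarized in the abstract can be sketched in Python as below, assuming the N+1 figure counts the reads for one step that updates a column of N partial dot products. The function name `matmul_column_broadcast` and the `reads` counter are hypothetical names used for illustration only. The inner loop over `i` stands in for the parallel threads or vector lanes.

```python
def matmul_column_broadcast(A, B):
    """Matrix multiply built from columns of A scaled by single elements of B.

    For each output column j and each inner index k, one element B[k][j] is
    read once and shared by all lanes, while each lane i reads its own A[i][k].
    Returns (C, reads), where reads counts input-matrix elements fetched.
    The read counter is an illustrative assumption, not part of the patent text.
    """
    n_rows, n_inner, n_cols = len(A), len(B), len(B[0])
    C = [[0.0] * n_cols for _ in range(n_rows)]
    reads = 0
    for j in range(n_cols):
        for k in range(n_inner):
            b_kj = B[k][j]          # single element of B, read once per step
            reads += 1
            for i in range(n_rows):  # each i models one thread or lane
                # Accumulate a partial dot product into column j of C.
                C[i][j] += A[i][k] * b_kj
                reads += 1
    return C, reads


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    B = [[7, 8], [9, 10], [11, 12]]
    C, reads = matmul_column_broadcast(A, B)
    print(C, reads)  # reads is n_cols * n_inner * (n_rows + 1) = 18 here
```

With N lanes, each step in this sketch reads N elements of A plus one element of B, compared with 2N reads when every lane fetches both of its operands separately.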

Description

Technical field

[0001] Embodiments of the invention relate generally to performing matrix multiplication using multi-threaded processing or vector processing, and more specifically to reducing memory bandwidth.

Background

[0002] Matrix-matrix multiplication is an important building block for many calculations in the field of high-performance computing. Each multiply-add operation for performing a matrix-matrix multiplication requires access to two source operands in memory. Thus, in a multi-threaded processor executing T threads simultaneously, each thread performing a multiply-add operation, 2T memory operands are required to supply the operands for the multiply portion of the operation. Similarly, in a vector processor that executes T data lanes in parallel (e.g., a T-lane Single Instruction Multiple Data (SIMD) vector processor), each vector multiply-add requires 2T memory operands. In general, providing memory bandwidth for 2T simultaneous accesses becomes pr...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F9/38
CPC: G06F17/16, G06F9/46, G06F9/38, G06F15/80
Inventors: Norbert Juffa, John R. Nickolls
Owner: NVIDIA CORP