Split-path fused multiply-accumulate operation using first and second sub-operations

A partial-product and product technology in the field of microprocessor design, specifically microprocessors that integrate floating-point multiply-accumulate operations. It addresses problems such as limited improvement, failure to fully achieve FMA design goals, and increased implementation cost.

Active Publication Date: 2016-08-10
上海兆芯集成电路股份有限公司
Cites: 10; Cited by: 24

AI Technical Summary

Problems solved by technology

However, this approach has many disadvantages.
[0014] First, FMA hardware is more expensive and more complex to implement than separate add and multiply functional units.
Second, when performing a simple addition or multiplication, FMA hardware has higher latency and generally consumes more power than separate addition and multiplication functional units.
This delay is sometimes modeled as a delay due to "long wires".
Therefore, to reduce the impact of the atomic (indivisible) FMA operation on instruction-level parallelism, the additional functional blocks can provide quit...



Examples


Other embodiments

[0332] In other embodiments, the rounding cache can be implemented as addressable register bits, a content-addressable memory, queue storage, or a mapping function.
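Paragraph [0332] lists a mapping function as one way to hold information between the two sub-operations. As a rough illustration of that option only, the sketch below models a rounding cache as a Python dictionary keyed by a tag. The tag (for example, a destination register or reorder-buffer index), the capacity, the full-cache behavior, and the names `ControlIndicators` and `MappingRoundingCache` are assumptions made for illustration, not details taken from the patent; the register-bit, content-addressable-memory, and queue variants are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ControlIndicators:
    """Hypothetical subset of the calculation control indicators for one split FMA."""
    round_bit: int
    sticky_bit: int


class MappingRoundingCache:
    """Illustrative rounding cache built on a mapping keyed by a tag.

    Assumption: the first sub-operation writes its indicators under a tag, and the
    second sub-operation later reads them back using the same tag.
    """

    def __init__(self, capacity: int = 8) -> None:
        self.capacity = capacity
        self._entries: Dict[int, ControlIndicators] = {}

    def write(self, tag: int, indicators: ControlIndicators) -> bool:
        """Store indicators when the first sub-operation completes.

        Returns False when the cache is full; a hardware design would presumably
        stall or replay the first sub-operation in that case (assumed behavior).
        """
        if tag not in self._entries and len(self._entries) >= self.capacity:
            return False
        self._entries[tag] = indicators
        return True

    def read_and_release(self, tag: int) -> Optional[ControlIndicators]:
        """Return the indicators for the second sub-operation and free the entry."""
        return self._entries.pop(tag, None)


if __name__ == "__main__":
    cache = MappingRoundingCache(capacity=2)
    cache.write(7, ControlIndicators(round_bit=1, sticky_bit=0))
    print(cache.read_and_release(7))  # ControlIndicators(round_bit=1, sticky_bit=0)
    print(cache.read_and_release(7))  # None: the entry was released after use
```

Releasing the entry on read mirrors the idea that each set of indicators is consumed once by the matching second sub-operation; whether real hardware frees entries this way is not stated in this excerpt.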

[0333] Other embodiments may provide multiple independent hardware or execution units to execute the first microinstruction and/or multiple independent hardware or execution units to execute the second microinstruction. Likewise, where advantageous, these embodiments may also provide multiple rounding caches for different source-code instruction streams or data streams, or for various multi-core computer processor embodiments.

[0334] Although this embodiment targets superscalar, out-of-order instruction dispatch, other embodiments can be adapted to in-order instruction dispatch, for example by removing the instruction from the instruction cache and providing a data forwarding network from the provided multiply unit to a separate add unit. The classification of FMA operations introduced by the present invention, and the minimal hardware changes it requires, are also advantageous in sequenti...



Abstract

A microprocessor executes a fused multiply-accumulate operation of the form ±A*B±C by dividing the operation into first and second sub-operations. The first sub-operation selectively accumulates the partial products of A and B, with or without C, and generates an intermediate result vector and a plurality of calculation control indicators. The calculation control indicators indicate how the subsequent calculation that generates a final result from the intermediate result vector should proceed. The intermediate result vector, combined with the calculation control indicators, provides sufficient information to generate a result indistinguishable from an infinitely precise calculation of the compound arithmetic operation whose result is reduced in significance to a target data size.
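The two-stage flow in the abstract can be illustrated with a small software model. This is a hypothetical simplification, not the patented hardware: it assumes binary64 operands, round-to-nearest-even only, no NaN, infinity, subnormal, or overflow handling, and it always accumulates C in the first stage (the abstract allows the first sub-operation to proceed with or without C). Python's `fractions.Fraction` stands in for the wide partial-product accumulator, and the calculation control indicators are reduced to a round bit and a sticky bit. The names `first_suboperation`, `second_suboperation`, and `split_fma` are illustrative only.

```python
import math
from fractions import Fraction

P = 53  # binary64 significand precision, including the hidden bit


def first_suboperation(a, b, c, negate_product=False, negate_addend=False):
    """Hypothetical stage 1: form the exact value of ±a*b ± c, then keep a
    (P+1)-bit intermediate significand (lowest bit = round bit) plus a sticky bit.

    Returns (sign, significand, exponent, sticky); the truncated magnitude is
    significand * 2**exponent, and sticky records whether anything was discarded.
    """
    product = Fraction(a) * Fraction(b)
    addend = Fraction(c)
    exact = (-product if negate_product else product) + (-addend if negate_addend else addend)
    if exact == 0:
        return 0, 0, 0, 0
    sign = 1 if exact < 0 else 0
    mag = abs(exact)
    # Find e such that 2**e <= mag < 2**(e + 1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    # Scale into [2**P, 2**(P + 1)): P result bits followed by one round bit.
    scaled = mag * Fraction(2) ** (P - e)
    significand = scaled.numerator // scaled.denominator
    sticky = 1 if scaled != significand else 0  # any nonzero bits below the round bit?
    return sign, significand, e - P, sticky


def second_suboperation(sign, significand, exponent, sticky):
    """Hypothetical stage 2: round to nearest, ties to even, using only the
    intermediate vector and its round/sticky indicators."""
    if significand == 0:
        return 0.0
    round_bit = significand & 1
    kept = significand >> 1  # the P-bit significand of the result
    exponent += 1            # dropping the round bit doubles each remaining bit's weight
    if round_bit and (sticky or (kept & 1)):
        kept += 1
        if kept == 1 << P:   # rounding carried out of the top bit: renormalize
            kept >>= 1
            exponent += 1
    result = math.ldexp(kept, exponent)  # exact for results in the normal range
    return -result if sign else result


def split_fma(a, b, c, negate_product=False, negate_addend=False):
    """Compose both stages into one singly rounded ±a*b ± c."""
    return second_suboperation(*first_suboperation(a, b, c, negate_product, negate_addend))


if __name__ == "__main__":
    a = 1.0 + 2.0 ** -52     # 1 + ulp(1)
    c = -(1.0 + 2.0 ** -51)
    print((a * a) + c)                                   # 0.0: product rounded before the add
    print(split_fma(a, a, c) == math.ldexp(1.0, -104))   # True: single rounding keeps 2**-104
```

Because the first stage forwards round and sticky information instead of a rounded product, the second stage rounds only once; a separate multiply followed by an add rounds twice and, in the example above, loses the low-order term entirely.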

Description

[0001] Related applications
[0002] This application claims priority to U.S. Provisional Application No. 62/020,246, "Non-Atomic Split-Path Fused Multiply-Accumulate with Rounding cache", filed July 2, 2014, and U.S. Provisional Application No. 62/173,808, "Non-Atomic Temporally-Split Fused Multiply-Accumulate Apparatus and Operation Using a Calculation Control Indicator Cache and Providing a Split-Path Heuristic for Performing a Fused FMA Operation and Generating a Standard Format Intermediate Result", filed June 10, 2015. Both provisional applications are incorporated herein by reference in their entirety.
[0003] This application is also related to the following applications filed concurrently with this application: U.S. Application No. 14/748,870, entitled "Temporally Split Fused Multiply-Accumulate Operation"; U.S. Application No. 14/748,924, entitled "Calculation Control Indicator Cache"; U.S. Application No. 14/748,956, entitled "Cal...


Application Information

IPC (8): G06F7/483, G06F7/485, G06F7/499, G06F7/544, G06F9/30, G06F9/38
CPC: G06F7/483, G06F7/485, G06F7/499, G06F7/49915, G06F7/49957, G06F7/544, G06F7/5443, G06F9/223, G06F9/30, G06F9/30014, G06F9/3017, G06F9/30185, G06F9/38, G06F9/3893, G06F9/30145, G06F7/4876, G06F9/3001, G06F17/16
Inventor: 汤玛士·艾欧玛
Owner: 上海兆芯集成电路股份有限公司