Method in microprocessor

A microprocessor and microinstruction technology, applied in electrical digital data processing, digital data processing components, instruments, etc., that addresses the high cost and complexity of FMA hardware and its increased power consumption.

Active Publication Date: 2016-11-16
VIA ALLIANCE SEMICON CO LTD
Cites: 11; Cited by: 6

AI Technical Summary

Problems solved by technology

However, this approach has several disadvantages.
[0015] First, FMA hardware is more expensive and more complex to implement than separate add and multiply functional units.
Second, when performing a simple addition or multiplication, FMA hardware has higher latency and generally consumes more power than separate addition and multiplication functional units.
This latency is sometimes attributed to "long wires".
Therefore, adding functional blocks to reduce the impact of the indivisible (atomic) FMA operation on instruction-level parallelism yields only limited improvement once the required die area, power consumption, and arithmetic latency are taken into account.
[0017] As a result, the best proposals and implementations usually (but not always) produce correct results (conforming to IEEE rounding and other specifications) and sometimes achieve high instruction throughput. However, they clearly require additional hardware circuitry, which increases implementation cost, and they increase power consumption by executing simple multiply and add operations on complex FMA hardware.
[0018] The goals of modern FMA designs have not yet been fully achieved.


Examples


Other embodiments

[0256] In other implementations, the rounding cache can be implemented as addressable register bits, a content-addressable memory, queue storage, or a mapping function.
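The difference between a mapping-style and a queue-style rounding cache can be illustrated with a small model. This is a sketch under stated assumptions, not the patent's design. It assumes that each cache entry holds rounding indicators produced by the first microinstruction and consumed by the second, and that entries are identified by a tag. The names `RoundingIndicators`, `round_bit`, `sticky_bit`, `MappedRoundingCache`, and `QueuedRoundingCache` are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RoundingIndicators:
    # Hypothetical fields: bits the first microinstruction might pass to the second.
    round_bit: bool
    sticky_bit: bool


class MappedRoundingCache:
    """Mapping / content-addressable style: entries are looked up by tag."""

    def __init__(self):
        self._entries = {}

    def write(self, tag, indicators):
        self._entries[tag] = indicators

    def read(self, tag):
        # Consuming an entry frees it; any outstanding tag can be read in any order.
        return self._entries.pop(tag)


class QueuedRoundingCache:
    """Queue style: suitable only when second microinstructions consume in issue order."""

    def __init__(self):
        self._fifo = deque()

    def write(self, tag, indicators):
        self._fifo.append((tag, indicators))

    def read(self, tag):
        stored_tag, indicators = self._fifo.popleft()
        if stored_tag != tag:
            # A FIFO cannot serve an out-of-order consumer.
            raise RuntimeError(f"expected {stored_tag!r}, got {tag!r}")
        return indicators
```

The mapping form tolerates out-of-order consumption, which suits out-of-order dispatch. The queue form is simpler but assumes that the second microinstructions retire their reads in the same order in which the first microinstructions wrote.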


[0257] Other embodiments may provide multiple independent hardware or execution units to execute the first microinstruction, and/or multiple independent hardware or execution units to execute the second microinstruction. Likewise, these embodiments may provide multiple rounding caches for different instruction streams or data streams, or for various multi-core processor configurations, where advantageous.

[0258] This implementation targets superscalar, out-of-order instruction dispatch, but other implementations can also be used with in-order instruction dispatch, for example by taking instructions from the instruction cache and providing them to a data forwarding network that forwards results from the multiplication unit to a separate addition unit. The classification of FMA operations introduced by the present invention, and the minimal hardware adjustments it requires, also have advantages in sequenti...


Abstract

According to a method in a microprocessor, the microprocessor prepares a fused multiply-accumulate operation of the form ±A*B±C for execution by issuing first and second multiply-accumulate microinstructions to one or more instruction execution units, which together complete the fused multiply-accumulate operation. The first multiply-accumulate microinstruction causes an unrounded, nonredundant result vector to be generated from a first accumulation of a selected one of (a) the partial products of A and B, or (b) C together with the partial products of A and B. If the first accumulation did not include C, the second multiply-accumulate microinstruction causes a second accumulation of C with the unrounded nonredundant result vector. The second multiply-accumulate microinstruction also causes a final rounded result to be generated from the unrounded nonredundant result vector; this final rounded result is the complete result of the fused multiply-accumulate operation.
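The numeric contract of this two-step split can be checked in software. The sketch below is illustrative and is not the patented hardware. It models the "unrounded nonredundant result vector" as an exact rational number (`fractions.Fraction`) and the final rounding as a single round-to-nearest-even conversion back to a double. The function names and the `sign_ab`, `sign_c`, and `accumulate_c_first` parameters are hypothetical. Signed zeros, NaN, and infinity are not modeled.

```python
from fractions import Fraction


def first_fma_microinstruction(a, b, c, sign_ab=1, sign_c=1, accumulate_c_first=True):
    """First step: exact (unrounded) accumulation of +/-A*B, optionally with +/-C."""
    acc = sign_ab * Fraction(a) * Fraction(b)  # exact product; nothing is rounded
    if accumulate_c_first:
        acc += sign_c * Fraction(c)
    return acc


def second_fma_microinstruction(intermediate, c, sign_c=1, c_already_added=True):
    """Second step: add C if the first step skipped it, then round exactly once."""
    acc = intermediate if c_already_added else intermediate + sign_c * Fraction(c)
    return float(acc)  # single correctly rounded conversion to the nearest double


def split_fma(a, b, c, sign_ab=1, sign_c=1, accumulate_c_first=True):
    """Hypothetical driver that issues the two steps back to back."""
    vec = first_fma_microinstruction(a, b, c, sign_ab, sign_c, accumulate_c_first)
    return second_fma_microinstruction(vec, c, sign_c, c_already_added=accumulate_c_first)
```

For example, with `a = 1 + 2**-52`, `b = 1 - 2**-52`, and `c = -1.0`, the exact product is `1 - 2**-104`. Unfused `a * b + c` therefore rounds the product to `1.0` and returns `0.0`, whereas `split_fma(a, b, c)` returns `-2**-104` whether C is accumulated in the first step or the second. This is why the intermediate result must stay unrounded between the two microinstructions.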

Description

[0001] This application is a divisional of the application filed on June 24, 2015, with application number 201580003388.3 (international application number PCT/US2015/037508), titled "Branch fusion product-accumulation operation using the first and second sub-operations". [0002] Related applications [0003] This application claims priority to U.S. Provisional Application No. 62/020,246, "Non-Atomic Split-Path Fused Multiply-Accumulate with Rounding cache", filed on July 2, 2014, and to U.S. Provisional Application No. 62/173,808, "Non-Atomic Temporally-Split Fused Multiply-Accumulate Apparatus and Operation Using a Calculation Control Indicator Cache and Providing a Split-Path Heuristic for Performing a Fused FMA Operation and Generating a Standard Format Intermediate Result", filed on June 10, 2015. The entirety of these priority applications is incorporated into this application by reference. [0004] This applica...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F7/483, G06F7/485, G06F7/499, G06F7/544, G06F9/30, G06F9/38
CPC: G06F7/483, G06F7/485, G06F7/499, G06F7/49915, G06F7/49957, G06F7/544, G06F7/5443, G06F9/223, G06F9/30, G06F9/30014, G06F9/3017, G06F9/30185, G06F9/38, G06F9/3893, G06F9/30145, G06F7/4876, G06F9/3001, G06F17/16
Inventor: 汤玛士·艾欧玛
Owner: VIA ALLIANCE SEMICON CO LTD