Parallel floating-point multiply-add unit

A floating-point multiply-add technology, applicable to concurrent instruction execution, machine execution devices, instruments, etc., that addresses problems such as reduced precision, reduced execution efficiency, and increased hardware overhead.

Active Publication Date: 2008-05-14
TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

[0013] Adopting a separate addition unit and multiplication unit cannot remedy the deficiencies of the prior art shown in Figure 1. First, the hardware overhead increases. Second, a multiply-add instruction must be split into two instructions for execution, which reduces execution efficiency, and the resulting double rounding reduces accuracy. Finally, this scheme cannot accelerate data-dependent instructions.
Using a multiply-add unit together with an add unit can make up for part of the deficiencies of the prior art shown in Figure 1, but the increase in hardware overhead is too large, and this solution likewise does nothing for data-dependent instructions.
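
The accuracy loss from double rounding can be illustrated with a small sketch. It is not taken from the patent: it assumes Python floats are IEEE 754 double precision (the embodiment itself is single precision), uses exact rational arithmetic to stand in for a fused datapath, and the function names are hypothetical.

```python
from fractions import Fraction

def multiply_then_add(a, c, d):
    """Two separate instructions: the product is rounded, then the sum is rounded."""
    product = c * d          # first rounding
    return a + product       # second rounding

def fused_multiply_add(a, c, d):
    """Model of a fused operation: exact a + c*d, rounded only once."""
    exact = Fraction(a) + Fraction(c) * Fraction(d)
    return float(exact)      # single rounding to the nearest double

# Operands chosen so that the exact product 1 - 2**-54 is not representable.
a = -1.0
c = 1.0 + 2.0 ** -27
d = 1.0 - 2.0 ** -27

separate = multiply_then_add(a, c, d)   # the product rounds to 1.0, so the sum is 0.0
fused = fused_multiply_add(a, c, d)     # keeps the low-order term -2**-54
```

With these operands the two-instruction path returns exactly zero, while the single-rounding path preserves the small nonzero result.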



Embodiment Construction

[0188] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0189] The invention is implemented as a three-stage pipeline in Verilog HDL; after verification, the circuit is synthesized with a 0.18 micron standard cell library.

[0190] The single-precision parallel floating-point unit of the present invention is divided in time into three pipeline stages. The whole working process is described below with reference to FIG. 2. In this embodiment, A+B+C×D again denotes a parallel multiply-add operation, and B is less than or equal to A, an ordering arranged in advance by the compiler.
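
As a rough behavioral reference for the operation named above, the following sketch computes A+B+C×D with a single rounding and also returns C×D. It is an assumption-laden model, not the patented circuit: the helper names are hypothetical, exponent range, infinities and NaNs are ignored, and results are rounded to 24 significant bits (the single-precision significand width) with ties to even.

```python
from fractions import Fraction

def round_to_precision(x, p=24):
    """Round an exact value to p significant bits, ties to even.
    Exponent range is ignored, so overflow and subnormals are not modeled."""
    x = Fraction(x)
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    m = abs(x)
    # Choose e so that 2**(p-1) <= m / 2**e < 2**p.
    e = m.numerator.bit_length() - m.denominator.bit_length() - p
    while m / Fraction(2) ** e >= 2 ** p:
        e += 1
    while m / Fraction(2) ** e < 2 ** (p - 1):
        e -= 1
    q = round(m / Fraction(2) ** e)   # Fraction rounds halves to even
    return sign * q * Fraction(2) ** e

def parallel_multiply_add(a, b, c, d, p=24):
    """Return (A+B+C*D, C*D), each rounded once from the exact value.
    The caller is expected to pass b <= a, as the text says the compiler arranges."""
    a, b, c, d = (Fraction(v) for v in (a, b, c, d))
    product = c * d
    return round_to_precision(a + b + product, p), round_to_precision(product, p)

total, product = parallel_multiply_add(1.0, 0.5, 3.0, 0.25)
# total == Fraction(9, 4) and product == Fraction(3, 4)
```

Returning both values from one call mirrors the idea that a single pass through the unit yields the add-type result and the multiply result together.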

[0191] First pipeline stage: alignment shift of A and B, Booth encoding and partial product compression of C×D.

[0192] The radix-4 Booth encoder 3 encodes the mantissa of C, and then multiplies the encoded result with the mantissa of D to obtain ...
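
The radix-4 (base 4) encoding step is conventionally known as Booth recoding. The sketch below shows the textbook form of that recoding on plain integers; it is an illustration under that assumption, not the patent's encoder, and the function names and the 26-bit width (24 mantissa bits plus two zero guard bits so the value is treated as non-negative) are hypothetical choices.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a two's-complement multiplier of even width `bits` into
    radix-4 Booth digits in {-2, -1, 0, 1, 2}, least significant first."""
    assert bits % 2 == 0
    m = multiplier & ((1 << bits) - 1)
    digits = []
    prev = 0                          # implicit bit to the right of bit 0
    for i in range(0, bits, 2):
        b0 = (m >> i) & 1
        b1 = (m >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits

def booth_multiply(c, d, bits):
    """Sum the partial products digit * d * 4**k selected by the Booth digits."""
    return sum(digit * d * 4 ** k
               for k, digit in enumerate(booth_radix4_digits(c, bits)))

# 24-bit mantissas with the hidden leading one set, padded to 26 bits.
mant_c = 0xC00000
mant_d = 0xA00001
product = booth_multiply(mant_c, mant_d, 26)   # equals mant_c * mant_d
```

Each digit selects 0, ±D or ±2D, so a 24-bit mantissa needs 13 partial products here instead of 24, which is what makes the subsequent compression stage cheaper.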


Abstract

The invention relates to a parallel floating-point fused multiply-add unit which simplifies similar techniques. It performs the multiply-add operation A+B+C*D (A is equal to or greater than B) and also produces the result of C*D, using a three-stage pipeline: in the first stage, A and B are shifted and aligned, and C*D is encoded and its partial products are compressed; in the second stage, the aligned A and B and the partially compressed C*D are compressed in a 4:2 CSA, and then leading-zero prediction, sign prediction, half-add operation and normalization shift are performed; in the third stage, the final addition and rounding of A+B+C*D are performed and the exponent is computed, and the mantissa and exponent of C*D are computed from the output of the first stage. The invention has the advantages of achieving instruction-level parallelism: it completes an add instruction and a multiply instruction at the same time, and it can also accelerate two consecutive data-dependent instructions.
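
The 4:2 CSA (carry-save adder) step can be sketched on non-negative integers. This is a generic illustration, not the patent's circuit: building the 4:2 compressor from two 3:2 carry-save stages is an assumption, the function names are hypothetical, and the four inputs are presumed to be the aligned A, the aligned B, and the two vectors left by partial product compression.

```python
def csa_3to2(x, y, z):
    """3:2 carry-save adder: reduce three operands to a sum word and a carry word
    without propagating carries. The result satisfies s + c == x + y + z."""
    s = x ^ y ^ z                                # bitwise sum
    c = ((x & y) | (x & z) | (y & z)) << 1       # bitwise majority, weight 2
    return s, c

def csa_4to2(w, x, y, z):
    """4:2 compression built from two 3:2 stages (an assumed construction)."""
    s1, c1 = csa_3to2(w, x, y)
    return csa_3to2(s1, c1, z)

# Hypothetical operand values standing in for aligned A, aligned B,
# and the sum and carry vectors of the compressed C*D partial products.
s, c = csa_4to2(13, 7, 250, 99)
# s + c == 13 + 7 + 250 + 99; only the final adder propagates carries.
```

Deferring carry propagation this way leaves a single full-width addition for the last pipeline stage, which is where the abstract places the final add and rounding.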

Description

Technical field

[0001] The invention relates to the design of a floating-point operation unit, specifically a high-speed floating-point multiply-add unit for high-performance floating-point computation.

Background

[0002] Published data show that almost 50% of floating-point multiplication instructions are followed by floating-point addition or subtraction instructions. The floating-point fused multiply-add operation A+B×C has therefore become a basic operation in scientific computing and multimedia applications. Because this operation occurs so frequently in applications, modern high-performance commercial processors commonly implement it with a floating-point fused multiply-add unit (referred to simply as the MAF unit). This implementation has two main advantages: (1) only one rounding is required instead of two; (2) circuit delay and hardware overhead can be reduced by s...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F9/38, G06F7/556, G06F7/533, G06F7/501, G06F7/499
Inventor: 李兆麟, 李恭琼
Owner: TSINGHUA UNIV