Vector floating point unit

A vector floating point unit (VFPU) that addresses two problems of conventional vector processors: they cannot meet the processing speed requirements of high-performance digital signal processing systems, and their pipeline latency cannot be reduced below a certain point. The effects achieved are the ability to fine-tune performance to a particular application, rapid configuration of the VFPU, and reduced pipeline latency.

Inactive Publication Date: 2005-07-26
NVIDIA CORP

AI Technical Summary

Benefits of technology

[0007]The present invention provides a vector floating point unit (FPU) comprising a product-terms bus, a summation bus, a plurality of FIFO (first in first out) registers, a reconfigurable multiplexor, a floating point multiplier, and a floating point adder. The floating point multiplier and the floating point adder are disposed between the crossbar operand multiplexor and the product-terms and summation buses, and are in parallel to each other. The floating point multiplier is separated from the floating point adder by the product-terms bus so that the multiplication operation can be executed separately and independently of the addition operation.
[0008]The overall vector floating point operation is controlled by a command controller, an instruction buffer, and a command sequencer. The instruction buffer stores and decodes microcode instructions, and issues control signals to the FIFO registers, the crossbar operand multiplexor, the floating point adder, and the floating point multiplier. The command sequencer is coupled to the instruction buffer and is responsible for decoding the microcode instructions and providing control signals to various parts of the VFPU, including control signals for the sequencing of the execution of the instruction buffer. The invention also includes a configuration register and a command register in order to permit rapid configuration of the VFPU and to provide a flexible architecture and the capability to fine-tune the performance to a particular application.
[0009]In operation, vector input operands are stored in FIFO (first in first out) registers. The reconfigurable multiplexor routes data in the FIFO registers to the floating point multiplier or the adder depending on the desired application. The multiplication operation is executed in a pipelined fashion. Once the pipeline is filled, the invention outputs at least one multiplication output at each clock cycle. The outputs of the multiplication are stored in FIFO registers. If necessary, the outputs of the multiplication stored in the FIFO registers are routed to the floating point adder for an addition operation. The addition operation is also executed in a pipelined fashion. Once the pipeline is filled, at least one addition output is produced at each clock cycle. For a separate multiplication or addition, the invention reduces the pipeline latency to the latency required for an execution of multiplication or addition.
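The operation described in [0009] can be illustrated with a small cycle-level toy model. This is only a sketch under stated assumptions, not the patented design: the class `PipelinedUnit`, the helper `stream`, the stage depths (3 for the multiplier, 2 for the adder), and the operand values are all hypothetical and chosen for illustration. It shows operands leaving an input FIFO one pair per cycle, products collecting in an output FIFO, and those products then being routed to a separate adder.

```python
import operator
from collections import deque


class PipelinedUnit:
    """Toy fixed-depth pipelined functional unit (hypothetical model)."""

    def __init__(self, op, depth):
        self.op = op
        # One slot per pipeline stage; None marks an empty stage (a bubble).
        self.stages = deque([None] * depth, maxlen=depth)

    def tick(self, operands=None):
        """Advance one clock cycle.

        The value is computed on entry and then simply shifted through the
        stages; the content of the final stage is returned (None if empty).
        """
        entering = None if operands is None else self.op(*operands)
        self.stages.appendleft(entering)  # maxlen drops the oldest slot
        return self.stages[-1]


def stream(unit, operand_pairs):
    """Issue one operand pair per cycle from an input FIFO.

    Returns (results collected in an output FIFO, cycles used).
    """
    in_fifo = deque(operand_pairs)
    expected = len(in_fifo)
    out_fifo = []
    cycles = 0
    while len(out_fifo) < expected:
        operands = in_fifo.popleft() if in_fifo else None
        result = unit.tick(operands)
        if result is not None:
            out_fifo.append(result)
        cycles += 1
    return out_fifo, cycles


# Assumed example data and stage depths (not taken from the patent).
a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
c = [1.0, 1.0, 1.0, 1.0]

multiplier = PipelinedUnit(operator.mul, depth=3)
adder = PipelinedUnit(operator.add, depth=2)

# Multiply-only pass: products land in a FIFO, one per cycle once filled.
products, mul_cycles = stream(multiplier, zip(a, b))

# Optional second pass: route the stored products to the adder.
sums, add_cycles = stream(adder, zip(products, c))
```

In this model a multiply-only pass over four element pairs pays only the multiplier's fill time plus one cycle per additional result; the adder's stages are involved only when the products are explicitly routed to it.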

Problems solved by technology

Unfortunately, conventional FPUs fail to deliver the high vector processing speed required by high-performance digital signal processing systems.
However, even with a pipelined architecture, conventional vector FPUs do not deliver the processing speed demanded by high-performance digital signal processing systems because of their architectural limitations.
Due to the sequential execution of the multiplication and the addition, the pipeline latency in a conventional vector processor cannot be reduced below a certain point because the pipeline includes both multiplication and addition stages.
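The latency argument can be made concrete with a little arithmetic. The sketch below is an illustration under assumptions, not a measurement: the function names and the stage counts (3 multiply stages, 2 add stages) are hypothetical. It compares a pipeline in which every operation passes through the multiply and add stages in series with a unit that can run a multiplication or an addition on its own.

```python
def cycles_serial(n, mul_stages, add_stages):
    """Cycles to produce n results when each operation traverses the
    multiply stages and then the add stages in one combined pipeline."""
    if n == 0:
        return 0
    # Fill the whole pipeline once, then one result per cycle.
    return mul_stages + add_stages + (n - 1)


def cycles_separate(n, stages):
    """Cycles to produce n results from a single stand-alone unit
    (multiplier only or adder only) with the given number of stages."""
    if n == 0:
        return 0
    return stages + (n - 1)


# Assumed stage counts, for illustration only.
MUL_STAGES = 3
ADD_STAGES = 2

# Latency of the first result for a multiply-only operation.
first_serial = cycles_serial(1, MUL_STAGES, ADD_STAGES)    # 3 + 2
first_separate = cycles_separate(1, MUL_STAGES)            # 3

# Total cycles for a 4-element multiply-only vector operation.
total_serial = cycles_serial(4, MUL_STAGES, ADD_STAGES)
total_separate = cycles_separate(4, MUL_STAGES)
```

Under these assumed numbers the serial pipeline can never deliver a first result in fewer than five cycles, whereas a stand-alone multiplication needs three. Throughput after the pipeline fills is the same in both cases, so the difference is purely the fill latency.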
Further, conventional FPUs lack flexibility and are cost-inefficient.
Often the conventional FPUs do not have a flexible architecture to handle the various types of memory accesses in an efficient manner.
Also, constructing a flexible-architecture FPU can be prohibitively expensive using conventional technology.

Embodiment Construction

[0021]The invention is particularly applicable to a vector FPU for vector floating point operations, and it is in this context that the invention will be described. It will be appreciated, however, that the VFPU in accordance with the invention has greater utility, such as to other types of floating point or non-floating point calculations. To understand the VFPU in accordance with the invention, the basic structure of the VFPU and its operations will be described.

[0022]Overview—Vector FPU

[0023]FIG. 1 illustrates a system diagram where a vector floating point unit (VFPU) 107 of the invention is provided to perform floating point calculations. As shown in FIG. 1, a VFPU is typically used in conjunction with one or more general microprocessor(s). In FIG. 1, a host processor 101 and a co-processor 103 are coupled to a VFPU 107 via a bus 105. A RAM (random access memory) 109 and a ROM (read only memory) 111 are also coupled to the bus 105. In addition, USB (universal serial bus) interfa...

Abstract

The present invention provides a vector floating point unit (FPU) comprising a product-terms bus, a summation bus, a plurality of FIFO (first in first out) registers, a crossbar operand multiplexor, a floating point multiplier, and a floating point adder. The floating point multiplier and the floating point adder are disposed between the crossbar operand multiplexor and the product-terms and summation buses, and are in parallel to each other. The invention also provides the configuration register and the command register in order to provide flexible architecture and the capability to fine-tune the performance to a particular application. The invention performs the multiplication operation and the addition operation in a pipelined fashion. Once the pipeline is filled, the invention outputs one multiplication output and one addition output at each clock cycle. The invention reduces the latency of the pipelined operation and improves the overall system performance by separating the floating point multiplier from the floating point adder so that the multiplication operation can be executed separately and independently of the addition operation.

Description

FIELD OF THE INVENTION

[0001]This invention relates generally to a floating point computation unit. More specifically, the invention relates to a vector floating point unit using pipelined and parallel processing architecture and a reconfigurable multiplexor.

BACKGROUND OF THE INVENTION

[0002]An FPU (floating point unit) is a type of coprocessor embedded in a more general microprocessor that manipulates numbers more quickly than the general, basic microprocessor. A coprocessor refers to a computer processor which assists the main processor by performing certain special functions, usually much faster than the main processor could perform them in software. The coprocessor often decodes instructions in parallel with the main processor and executes only those instructions intended for it. For example, an FPU coprocessor performs mathematical computations, particularly floating point operations. FPU coprocessors are also called numeric or math coprocessors. An FPU is often built into person...

Application Information

Patent Type & Authority: Patents (United States)
IPC(8): G06F7/38, G06F13/40, G06F9/302, G06F15/76, G06F9/00, G06F9/30, G06F9/38
CPC: G06F9/30145, G06F9/3875, G06F9/30014
Inventor: KIM, JASON SEUNG-MIN; QUAN, ROBERT
Owner: NVIDIA CORP